Evaluation of the Space Syntax Measures Affecting Pedestrian Density through Ordinal Logistic Regression Analysis

Abstract: This paper examines the relationship between pedestrian density and space syntax measures on a university campus using ordinal logistic regression analysis. Pedestrian density, the dependent variable of the regression analysis, was categorised into low, medium, and high classes using the Jenks natural break classification. The data elements of the classes were derived from 132 pedestrian counts performed at 22 gates. The counting period, grouped in nominal categories, was taken as one independent variable. The other independent variable was one of the 15 measures derived from axial analysis and visual graph analysis. The statistically significant model results indicated that the integration measure of axial analysis was the most reasonable measure for explaining pedestrian density. The changes in integration values between the current and master plan datasets were then analysed using a paired sample t-test. The calculated p-value of the t-test indicated that the master plan would change the campus morphology for pedestrians.


Introduction
A geographic area with buildings to visit and streets where pedestrians can easily walk encourages people to travel on foot [1][2][3][4][5][6][7][8]. Pedestrians generally prefer to move along linear lines characterised by a minimum number of turns, with movement guided by the perception of visual spaces. These spaces, defined by Hillier [9] as walkable physical environments for pedestrians in city centres where buildings' density is high, physically coincide with the street network [10]. Space syntax (SS) techniques are often used to analyse pedestrian walking behaviour in an urban environment consisting of street and building configurations. Axial analysis (AA) and visual graph analysis (VGA) are two of them [11].
The last decade has seen a remarkable increase in studies that examine walking behaviours via AA measures. Lerman and Omer [12], for example, analysed the relationship between pedestrian movement in two neighbouring regions located in Tel-Aviv based on integration, choice, and connectivity measures. Monokrousou and Giannopoulou [13] incorporated integration and choice into a correlation analysis to explain pedestrian movement and predict the future condition of Athens. Ahmed et al. [14] inquired into the morphological changes that occurred from 1947 to 2007 in Dhaka City, Bangladesh, using the integration, connectivity, and control measures. Hajrasouliha and Li [15] explained pedestrian movement in the centre of Buffalo City in New York through the connectivity measure. Liu et al. [16] used control and integration as the basis in determining the walking behaviours of tourists in scenic Mt. Sanqingshan, one of China's major tourism regions. Jabbari et al. [17] investigated the association between connectivity and pedestrian movement by conducting multi-criteria analyses in the city centre of Oporto, Portugal. In addition, some recent studies examine the relationships between measures obtained from VGA and pedestrian movements and evaluate them together with AA measures. Dessylas and Duxburry [18] and Hölscher et al. [19] compared some measures obtained from VGA and AA to determine the correlation of data with pedestrian movement. Heitor et al. [20] evaluated a campus of the Technical Institute of Lisbon in Portugal with the connectivity measures derived via the integration of VGA and AA. Bendjedidi et al. [21] used the connectivity and integration measures of VGA to look into the intensity of pedestrian movement in the city squares of Biskra in Algeria. Koutsolampros et al. [22] employed VGA measures to analyse indoor walking. Pagkratidou et al. [23] explored the behaviours of individuals unfamiliar with the Temple University campus using the integration, choice, and connectivity measures of both VGA and AA.
All of the studies mentioned above show that researchers have made a considerable effort to explain pedestrian movement using measures obtained from AA and VGA. It is striking, however, that none of them settled on a single measure that was valid in its study area; most used more than one. This situation points to a need to determine the proper measure that explains pedestrian movement in each study area. The appropriate measure should be a specific one that reflects the walkability character of the area concerned, rather than a general measure valid for all study areas. This paper therefore aims to determine the most appropriate measure defining the intensity of pedestrian movement in a specified test area. For this purpose, the relationships between pedestrian movement and the measures are evaluated using regression analysis. In addition, the study aims to reveal how future changes made in accordance with the master plan will affect pedestrian density. The effects of these changes are assessed using a paired sample t-test, which is commonly used to compare the differences between two related samples.
The remainder of the paper is organised as follows. Section 2 briefly describes the study area and the datasets used in this study. Section 3 introduces the use of SS techniques and statistical methods to determine the most proper measure related to pedestrian density, and Section 4 presents the results. Section 5 concludes the paper with final remarks.

Study Area
University campuses with a high student population are substantial settlements with characteristics that are similar to those of urban areas. Several university campuses can be found in most of the densely populated metropolitan cities in Europe. One of these cities is Istanbul, which has a population of more than 15 million and is home to the leading higher education institutions of Turkey. Among these institutions, Yildiz Technical University is the fourth oldest, founded in 1911. It offers most of its educational services in Davutpasa Campus, which occupies an area of approximately 124 ha and is the largest educational expanse in Istanbul. The campus had eight faculties, two institutes, and various administrative units at the end of 2019. The total area of the buildings and length of the road network is about 13 ha and 14.5 km, respectively (grey areas in Figure 1).
The historical building in the centre of the campus was built in 1832. According to inventory records, the rest of the buildings were constructed after 2001. The master plan that the university administration intends to implement after 2019 will expand the total building area and road network length by approximately 1 ha and 4.2 km, respectively (orange areas in Figure 1). The new buildings will be built on open spaces in the northwestern and southeastern sides of the campus, and the planned roads will provide access to these buildings. Some walking paths leading to the open spaces in the campus centre will also be built. The morphological changes occurring from 2001 to 2019, together with the data in the master plan, reflect the dynamic structure of the study area.

Materials and Methods
The datasets needed to derive AA and VGA measures in the study area were obtained by digitizing from the existing orthophotos of the region. The axial map and building footprints are the baseline data used in the AA and VGA, respectively. The axial map used for AA consists of a minimum number of axial lines. Each axial line in this map shows the direction between the two points where the pedestrian movement starts and ends [24,25]. All of them must be compatible with the street network configurations [24,26,27]. VGA is implemented depending on the building configuration of the space [28][29][30]. This technique entails dividing geographic areas into grid cells and subsequently calculating measures through an examination of empty cells, together with nearby neighbours where pedestrians can move. In this study, a total of 15 measures, described below, were obtained using both analyses.

AA and VGA Measures
Open-source tools were used to calculate the measures derived from the AA and VGA. One of these was the SS toolkit, a plug-in for the Geographic Information System (GIS) software Quantum GIS (QGIS) that is used to obtain AA measures from an axial map. It is a multi-platform spatial network analysis tool that integrates SS techniques with GIS data analysis and visualisation features [31]. The other open-source tool used in this study was DepthMap, which was developed as a platform for visibility analysis of urban and architectural designs [30]. Both the SS toolkit and DepthMap allow researchers to use several measures (Figure 2).
Integration describes accessibility from a location to all other locations [32]. Accessibility is one of the most important factors that increase or decrease the movement of pedestrians [33][34][35]. Both analyses can determine integration based on source data. Hillier and Hanson's [24] integration is a global measure obtained from AA [11]; it shows the topological proximity of an axial line to the other lines in an axial map. When integration is calculated for axial lines delimited by a radius value, it is called local integration. AA mean depth is an inverse of global integration and provides the mean of the total depth of an axial line connected to other axial lines. VGA integration, developed by Hillier and Hanson [24], yields the ratio of the visibility of any grid cell in the space to the visibility of other cells. Teklenburg [36] suggested normalising VGA integration so that the integration values obtained from analyses performed in different areas can be compared. Similarly, Campos and Fong [37] normalised integration using a p-value. The mean depth of VGA is the mean of the depth values calculated from one cell to the other grid cells. Connectivity is another measure that can be obtained from both AA and VGA. AA connectivity is the total number of axial lines connected to an axial line, whereas VGA connectivity refers to the total number of adjacent grid cells connected to a grid cell in the space. AA can be used to derive another measure called choice, which reflects the total distance of an axial line to other axial lines. It provides the number of shortest paths that connect an axial line to other axial lines. The distance between two axial lines is the shortest route calculated topologically [38]. The other measures evaluated in this study were obtained through VGA. Hillier and Hanson's [24] control measure and Turner's [30] controllability are two of them. Control involves visually selecting dominant areas, whereas controllability determines easily monitored regions during a walk. Visual entropy is adopted in the examination of the depth values of a grid cell and other adjacent cells. If the depth value of neighbouring cells is approximately equal to that of an investigated cell, entropy is high. If the depth values of adjacent cells differ from one another, entropy is low. Relativised entropy is a calculation of the expected distribution of the mean depth value. The clustering coefficient is employed to ascertain how a pedestrian's perception of space changes when moving away from a current location. The loss of visual information due to pedestrian movement reduces the value of the clustering coefficient [30].
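The AA definitions of connectivity and mean depth given above can be illustrated on a small graph in which each node is an axial line and each edge is an intersection between two lines. The following is only a minimal sketch under that assumption: the toy graph and the function names are hypothetical, and the code does not reproduce the SS toolkit or DepthMap implementations.

```python
from collections import deque


def depths_from(graph, start):
    """Breadth-first search returning the topological depth
    (number of steps) from `start` to every reachable axial line."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        line = queue.popleft()
        for neighbour in graph[line]:
            if neighbour not in depth:
                depth[neighbour] = depth[line] + 1
                queue.append(neighbour)
    return depth


def connectivity(graph, line):
    """Number of axial lines directly connected to `line`."""
    return len(graph[line])


def mean_depth(graph, line):
    """Mean topological depth from `line` to all other reachable lines."""
    depth = depths_from(graph, line)
    others = [d for other, d in depth.items() if other != line]
    return sum(others) / len(others)


# Hypothetical toy axial map with intersections A-B, B-C, B-D, D-E.
toy_axial_graph = {
    "A": ["B"],
    "B": ["A", "C", "D"],
    "C": ["B"],
    "D": ["B", "E"],
    "E": ["D"],
}

for name in toy_axial_graph:
    print(name, connectivity(toy_axial_graph, name), mean_depth(toy_axial_graph, name))
```

In this toy map the central line B has the highest connectivity and the lowest mean depth, which corresponds to the most integrated position, whereas the peripheral line E has the highest mean depth.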

Determining the Most Proper Measure
In this study, regression analysis was used to explain the relationships between the abovementioned measures and pedestrian density. The most appropriate measure was selected by evaluating the results of the regression models. Regression analysis is a statistical approach implemented to analyse the relationship between a dependent variable and one or more independent variables. In this study, the pedestrian density data calculated from the number of people counted at 22 gates in the campus area (blue marks in Figure 1) was taken as the dependent variable. People counting, also known as the gate-count method [39], is the most straightforward approach to sensitively determining pedestrian movement in a location where an automated data collection method cannot be used [40]. The appropriate selection of the places where people will be counted is critical in effectively reflecting pedestrian mobility. Enough points with different characteristics that reveal how frequently pedestrians use a region are also needed [41]. The counts at the 22 gates continued on weekdays for two weeks. Each gate was counted for 15 min at least once in each of four periods (08:30-10:30, 10:30-12:30, 12:30-14:30, 14:30-16:30). In addition, each gate was counted two more times in random periods. In total, 132 15-min counts were made at the 22 gates. The authors then categorised all data elements into three classes that represent ordered density groups (low, medium, and high). Three groups were sufficient for the 132 gate-count data in this study. Researchers who obtain more repeated counts at more gates in a more extensive study area can increase the number of density groups.
Before the classification, a pre-check was applied to determine whether there were potential outliers among the gate-count data. A standard way to reveal potential outliers is to compute the z-score of each data element and compare it with a criterion (threshold) value. Data elements whose z-scores lie above the positive threshold or below the negative threshold are investigated as potential outliers. In this study, the z-score of each data element in the dataset was calculated using Equation (1) [42]:
$$z = \frac{x - \mu}{\sigma} \qquad (1)$$
where x indicates the value of the data element, and μ and σ are the mean and the standard deviation of the dataset, respectively. The default threshold values in this study were selected as ±2.5. The identified potential outliers were not treated as insignificant data elements that need to be deleted; instead, it was investigated whether they distorted the limit values of the classes.
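The outlier pre-check can be sketched as follows, taking the z-score of a data element as its deviation from the dataset mean divided by the dataset standard deviation and using the ±2.5 threshold stated above. The counts are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say whether the sample or the population form was used.

```python
from statistics import mean, stdev


def z_scores(values):
    """z-score of each element: (x - mean) / standard deviation.
    Assumption: the sample standard deviation (n - 1 denominator) is used."""
    mu = mean(values)
    sigma = stdev(values)
    return [(x - mu) / sigma for x in values]


def flag_outliers(values, threshold=2.5):
    """Return the elements whose z-score lies outside +/- threshold."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > threshold]


# Hypothetical 15-min gate counts, not the data collected in the study.
hypothetical_counts = [40, 55, 60, 72, 80, 95, 110, 120, 135, 150, 160, 1200]

print(flag_outliers(hypothetical_counts))
```

Flagged elements are only candidates for inspection; as described above, they are examined for their effect on the class limits rather than deleted outright.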
In this study, the Jenks natural break (JNB) algorithm, which identifies the best groups of similar values, was used to categorise the data into three ordinal classes. JNB is an optimisation technique that maximises the differences among classes. It addresses the problem of how to split a range of numbers into contiguous classes so as to minimise the squared deviation within each class [43]. The algorithm iteratively processes the data and assigns the elements to classes, the number of which is specified by the user. For each class, it computes the sum of squared deviations (SSD) of the data elements from the class mean (Equation (2)) and then adds the values of all classes to obtain a total:
$$SSD = \sum_{i=1}^{n} (x_i - \bar{x})^2 \qquad (2)$$
where $x_i$ are the $n$ data elements of a class and $\bar{x}$ is their mean. The classification that gives the minimum total value among the iterations is the optimum [44].
The goodness of variance fit (GVF) for the specified class elements gives the strength of the classification process (Equation (3)). GVF takes a value between 0 and 1. The closer the GVF is to 1, the better the fit.
$$GVF = \frac{SSD_{t} - SSD_{c}}{SSD_{t}} \qquad (3)$$
where $SSD_{c}$ is the minimum total value of SSD for the optimum classification, and $SSD_{t}$ is the sum of squared deviations of all elements in the dataset.
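The classification criterion and the GVF can be illustrated with a small sketch. It assumes that GVF is computed as one minus the ratio of the within-class SSD total to the SSD of the whole dataset, and it finds the optimum by exhaustively testing every pair of break positions, which is practical only for small datasets and is not the iterative procedure of the original algorithm. The data and function names are hypothetical.

```python
from itertools import combinations


def ssd(values):
    """Sum of squared deviations of the values from their own mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def jenks_exhaustive(values, n_classes=3):
    """Split the sorted data into contiguous classes that minimise the
    total within-class SSD. Assumption: brute-force search over all break
    positions stands in for the iterative JNB procedure."""
    data = sorted(values)
    best_total, best_classes = None, None
    for cuts in combinations(range(1, len(data)), n_classes - 1):
        bounds = (0, *cuts, len(data))
        classes = [data[a:b] for a, b in zip(bounds, bounds[1:])]
        total = sum(ssd(c) for c in classes)
        if best_total is None or total < best_total:
            best_total, best_classes = total, classes
    return best_classes, best_total


def gvf(values, ssd_classes):
    """Goodness of variance fit: 1 - (within-class SSD / total SSD)."""
    return 1 - ssd_classes / ssd(values)


# Hypothetical counts with three visually obvious groups.
toy_counts = [1, 2, 3, 10, 11, 12, 30, 31, 32]
classes, ssd_classes = jenks_exhaustive(toy_counts)
print(classes, round(gvf(toy_counts, ssd_classes), 3))
```

For real gate-count data, a dedicated natural-breaks implementation would normally replace the brute-force search, while the GVF function can be applied unchanged to its result.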
The first independent variable used to create a regression model is the measure whose relationship with pedestrian density is investigated. The measures take continuous values calculated with the software. Since the measures obtained from AA and VGA are typically inter-correlated, a multicollinearity problem occurs when all measures are entered into the same regression model. Therefore, a separate model was established for each measure, and the models were evaluated separately. The second independent variable for the regression model was the period of the pedestrian count. Since the counts were collected in four different periods, this independent variable takes nominal categorical values from 1 to 4.
An analyst has to use a proper regression model depending on the data types of the dependent and independent variables entering the model. The dependent variable used in this study consists of numerical data grouped in three ordered classes. When the dependent variable has at least three groups in a natural order, it is modelled using ordinal logistic regression [45]. The ordinal logistic regression model is based on the existence of a continuous, unobserved latent variable Y* underlying the categorical dependent variable Y. The categories of Y are treated as sequential intervals on the continuous scale of Y*, separated by cut-points called threshold values [46,47]. Four separate tests check the robustness of the ordinal logistic regression model: (1) model fitting, (2) goodness of fit, (3) pseudo R², and (4) parallel lines. Model fitting examines the −2 log-likelihood. A significant (p < 0.05) change in the −2 log-likelihood statistic between the baseline model and the final model indicates that the predictors are significant [48]. Two goodness-of-fit tests, the Pearson chi-square and deviance statistics, evaluate the discrepancy between the current model and the full model. However, these tests are sensitive to empty cells. If there are many empty cells in the model, the −2 log-likelihood is considered a more robust indicator than the Pearson chi-square and deviance statistics [49,50]. Pseudo R² values are used to estimate the variance explained by the independent variables. The closer these values are to zero, the poorer the model fit. Low R² values do not, however, prevent the parameters from being interpreted. One of the assumptions underlying ordinal logistic regression is that the relationship between each pair of outcome groups is the same. In other words, ordinal logistic regression assumes that the coefficients defining the relationship between the lowest category and all higher categories of the response variable are the same as those that describe the relationship between the next lowest category and all higher categories. This is called the parallel lines assumption, and a chi-square test is used to check its validity. At the end of the regression analysis process, these tests were checked for each ordinal logistic regression model created in this study. The most suitable model reflecting the pedestrian density was determined according to the results of the parameter estimation tables.
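The threshold structure and the parallel lines assumption can be made concrete with a short sketch of the cumulative logit form of the model, in which P(Y ≤ j) = 1 / (1 + exp(−(θ_j − βx))) and the same β is used at every threshold θ_j. The sign convention, the threshold values, and the coefficient are assumptions chosen for illustration; they are not estimates from this study.

```python
import math


def logistic(t):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-t))


def category_probabilities(x, beta, thresholds):
    """Probabilities of the ordered categories (low, ..., high) for one
    observation. The same coefficients `beta` are applied at every
    threshold, which is the parallel lines assumption.
    Assumed convention: P(Y <= j) = logistic(theta_j - beta . x),
    with thresholds given in increasing order."""
    eta = sum(b * xi for b, xi in zip(beta, x))
    cumulative = [logistic(theta - eta) for theta in thresholds] + [1.0]
    probabilities, previous = [], 0.0
    for c in cumulative:
        probabilities.append(c - previous)
        previous = c
    return probabilities


# Hypothetical parameters: one predictor (a normalised measure value)
# and two thresholds separating low | medium | high.
hypothetical_beta = [2.0]
hypothetical_thresholds = [0.5, 2.0]

p_low_value = category_probabilities([0.2], hypothetical_beta, hypothetical_thresholds)
p_high_value = category_probabilities([0.9], hypothetical_beta, hypothetical_thresholds)
print(p_low_value)
print(p_high_value)
```

Under this convention a positive coefficient shifts probability from the low class towards the high class as the predictor increases, which is how a positive β for a measure would be read. Statistical packages differ in their sign conventions and reference categories, so fitted output should be interpreted according to the package documentation.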

Analysing the Effect of Master Plan
This stage aimed to determine how the new walkways to be added under the master plan could affect the pedestrian density in the study area. Pedestrian density was examined according to the reference measure determined by the regression analysis in the previous stage. The evaluations were made by analysing the measure values of the same features in both the current situation and the master plan.
The number of features in the master plan dataset is higher than in the current dataset because of the newly added walking paths. The reference measure values of the features in both datasets were normalised to allow a reasonable comparison between the newly added and current walking paths. In this study, the values were normalised to the interval between zero and one (Equation (4)):
$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \qquad (4)$$
where $x$ represents the measure values, and $x_{\max}$ and $x_{\min}$ pertain to the maximum and minimum values in the datasets, respectively.
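The normalisation to the interval between zero and one can be sketched in a few lines: each value has the dataset minimum subtracted and is then divided by the range (maximum minus minimum). The input values and the function name are hypothetical.

```python
def min_max_normalise(values):
    """Rescale values to [0, 1] using the dataset minimum and maximum."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        raise ValueError("all values are equal; the range is zero")
    return [(v - lowest) / (highest - lowest) for v in values]


# Hypothetical measure values for three axial lines.
print(min_max_normalise([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```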
The differences between the current and master plan datasets were compared using a paired sample t-test, which is a parametric inferential procedure designed to evaluate two means calculated from two related samples. The three assumptions of the paired sample t-test are the same as those of an independent sample t-test. The first assumption is that the dependent variable is measured on an interval or ratio scale and is normally distributed. The second assumption holds that the variances in raw scores are equivalent to populations estimated by . The third assumption posits that the represented populations exhibit homogeneous variances. The number of samples in the two groups for the before and after situations must be equal so that score pairs can be generated from the relevant samples. The test is based on a null hypothesis (H0) and an alternative hypothesis (Ha). The t-value is used in the statistical evaluation of the test. The calculated t-value is compared with the corresponding critical value from the t distribution table for the selected confidence level. If the absolute value of the calculated t-value is higher than the critical value, H0 is rejected. Rejection shows that the means differ statistically from one another [42].
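The test statistic can be sketched as follows: the paired differences are formed, and their mean is divided by their standard error, with n − 1 degrees of freedom for n pairs. The paired values are hypothetical and are not the integration values of the campus; the critical value mentioned in the comment is the standard two-tailed table value for four degrees of freedom at the 0.05 level.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Paired sample t statistic and degrees of freedom.
    t = mean(d) / (sd(d) / sqrt(n)), where d = after - before."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    t_value = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t_value, n - 1


# Hypothetical normalised values for five paths, before and after a change.
before = [0.40, 0.35, 0.50, 0.45, 0.38]
after = [0.37, 0.34, 0.46, 0.43, 0.35]

t_value, df = paired_t(before, after)
# With df = 4, the two-tailed critical value at the 0.05 level is 2.776,
# so H0 is rejected when abs(t_value) exceeds it.
print(round(t_value, 3), df, abs(t_value) > 2.776)
```

A statistics package would normally also report the exact p-value, which this sketch omits because it requires the t distribution function.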

Calculated Values for Measures
The AA and VGA values calculated for the 22 gates are given in Tables 1 and 2, respectively. Since the local integration of AA was calculated for two different radii (r = 3 and r = 6), the total number of measures in the two tables was 16. In addition, a grid spacing of 10 m was used when calculating the VGA measures.

Ordinal Logistic Regression Analysis
The dependent variable of the ordinal logistic regression process is based on the gate-count data. The mean and the standard deviation of the 132 data elements were calculated as 189.16 and 228.97, respectively. Using these values, z-scores were generated for each element with Equation (1), and the elements that could be outliers were determined. Figure 3 shows the distribution range of the gate-count data. In this graph, the three data elements with z-scores outside the ±2.5 default threshold can be considered potential outliers (shown with the blue rings). Attention should be paid to these elements in a classification process using the JNB algorithm. In the first JNB classification, with 132 data elements, 106 elements were assigned to the low class, 25 to the medium class, and only one to the high class (Table 3). The GVF value for this classification was 0.823. The assignment of only one element to the high class and the GVF value below 0.850 led the authors to exclude the outliers from the JNB classification. In the second classification, with 129 data elements, 75 elements were assigned to the low class, 44 to the medium class, and 10 to the high class (Table 3). The GVF value for this classification was 0.870. The more balanced distribution of the data among the classes and the GVF value above 0.850 showed that this procedure was applicable. The three outliers were then added to the high class, which brought the number of elements in this class to 13.
Descriptive information about the dependent and independent variables used in this study is presented in Table 4. A total of 16 different models were created. The dependent variable (pedestrian density) and the independent variable for periods were used in every model. The measures were included as the other independent variable one at a time, one per model. The significance of the models created in this study was examined by the model fitting and parallel lines tests (Table 5).
The goodness of fit and pseudo R² tests were not evaluated because the data contained a large number of empty cells and sorting the models was unnecessary. All 16 models satisfied the parallel lines assumption (p > 0.05). When the fit values explaining the significance of the models were evaluated, the four models highlighted in grey in Table 5 were found to be statistically significant (p < 0.05). The effects of the independent variables (periods and measures) on the dependent variable (pedestrian density) were analysed with parameter estimation statistics for these four significant models. Parameter estimation uses a table of statistics to explain the direction, strength, and statistical significance of the relationships between variables.
Table 6 gives the model results of the integration measure obtained from AA. Since the authors wished to analyse the variables that increased the pedestrian density, they selected the high pedestrian density group as the reference in this study. With this choice, the β-values of the independent variables that increase the pedestrian density are positive. The p-value lower than 0.05 for AA integration indicates that integration has a significant effect in the established model. For the periods variable, with T4 selected as the reference, T2 was statistically significant (p < 0.05), whereas T1 and T3 were not (p > 0.05). The non-significant effects for T1 and T3 mean that the pedestrian density in these periods was not significantly different from that in T4. β is a coefficient that expresses the proportional effect of a variable on the dependent variable. A variable with a high β-value has more effect in the model than the others. In this study, since AA integration had a greater β-value than the other variables, integration was accepted as the most significant variable in explaining the pedestrian density.
On the other hand, the periods, judged by their β-values, are statistically less effective than integration in explaining the pedestrian density. The $e^{\beta}$ and w-values are used, like the β-values, to interpret the effects of the variables. The $e^{\beta}$-value is less than one when the β-value is negative and greater than one when it is positive. A high β coefficient for a variable indicates that its effect on the dependent variable is considerable. Briefly, AA integration in this model was more effective than the periods in explaining the pedestrian density.
Table 7 gives the model results of the mean depth measure obtained from AA. The p-value lower than 0.05 for AA mean depth indicates that mean depth has a significant effect. The results for the periods variable are similar to those of the model established with AA integration. Because mean depth is an inverse of integration, its β-value is negative, and therefore its $e^{\beta}$-value is lower than one. The β-value, $e^{\beta}$-value, and w-value for mean depth were lower than those for integration. These statistical results show that mean depth does not reflect pedestrian density as successfully as integration does.
Table 8 gives the model results of the choice measure obtained from AA. The p-value higher than 0.05 for AA choice indicates that choice does not have a significant effect on pedestrian density. Therefore, the authors could not evaluate the β-value, $e^{\beta}$-value, and w-value for choice. The results for the periods variable are similar to those of the models established with AA integration and mean depth.
Table 9 gives the model results of the control measure obtained from VGA. The p-value higher than 0.05 for VGA control indicates that control does not have a significant effect on pedestrian density.
Therefore, the authors could not evaluate the β-value, $e^{\beta}$-value, and w-value for control. The results for the periods variable are similar to those of the models established with AA integration, mean depth, and choice. The four tables prepared for the parameter estimations of the ordinal logistic regression analyses indicate that integration was the most effective measure for predicting the pedestrian density in this study area. Since the p-values of the T1 and T3 groups, which were among the periods used in all models, were higher than 0.05, the counts made in these periods were not significantly different from the reference T4 in determining the pedestrian density. The effect of the T2 group, which was statistically significant in each model, was negative compared to the reference T4 group. The significant effect for T2 implies that the pedestrian volume at T2 was significantly lower than at the reference T4. If the aim were to determine the low pedestrian density of the study area, it would be appropriate to make the counts in the period corresponding to the T2 group.
The new axial lines that were planned to be connected to the axes categorised in the high class in the current axial map were generally grouped in the high class. These axial lines often appeared in the central part of the campus in the master plan. However, the other new axes planned to be added around the campus boundaries were in the medium and low classes. They increased overall accessibility by helping to spread the centrality to the edges as well. Table 10 shows the paired sample t-test results for the current and master plan datasets. There are 96 axial lines in the current dataset, and their mean integration value is 0.388. According to the master plan, when 41 new walkways were added to the campus area, the mean integration value decreased to 0.356. The paired sample t-test was performed for the 96 walkways common to both datasets. The mean integration value of these 96 paths in the master plan was calculated as 0.370.
These differences between the mean values showed that the new paths to be added would reduce the mean value of the current ones. In addition, the mean value of the new paths was lower than that of the current ones. The p-value of 0.002 indicated that the changes in the means mentioned above were statistically significant. This result indicated that the master plan would change the campus morphology for pedestrians.

Conclusions
This study investigated the relationship between pedestrian density and measures obtained from AA and VGA. These two analysis methods enabled the derivation of 15 different measures. The relationship of each measure to pedestrian density was analysed using ordinal logistic regression. The integration and mean depth of AA yielded statistically significant results to explain the pedestrian density. However, the coefficient values of the regression analysis revealed that integration was more effective than mean depth. For this reason, integration was accepted as the most effective tool to explain the pedestrian movement in the study area.
According to the regression analysis results in the Davutpaşa Campus case, the AA integration value of an axial line reflects the usage intensity of the walking path that physically coincides with this line. An axial line with a high integration value makes it possible to identify a walking path with high pedestrian density. Moreover, the analysis results indicate that AA, as an SS technique, explains pedestrian density better than VGA. This assessment emphasises the importance of AA integration obtained from a two-dimensional axial map. However, environmental characteristics such as the topography of the region and land use are other factors affecting the walkability and pedestrian density of the walkways. Since Davutpaşa Campus has far fewer street connections than a large urban area, the environmental characteristics of the walking paths can be easily observed. Considering the environmental features, the walking paths that have high pedestrian density according to the integration measure are also suitable for walking in the campus area. However, the effects of these features have not been investigated statistically in this study. In the future, the authors intend to examine the relationships between walkways with high integration values and environmental factors in a broader study area that has many street connections.
Maximising accessibility and pedestrian mobility by adding the minimum number of new walkways is the optimal solution for planners. Statistical analysis of the effects of new walkways on the existing ones under alternative plans helps planners to decide on the most appropriate solution. In the second part of this study, the changes in the integration values of axial lines between the current and master plan datasets were analysed using a paired sample t-test. This comparison only determines whether the master plan significantly affects the existing walkways. Where there is more than one alternative plan, the significance of the changes that each plan creates on the existing walkways should be examined in order to compare the plans.
Funding: This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. It was prepared from part of the first author's master's thesis.