Exploring effective built environment factors for evaluating pedestrian volume in high-density areas: a new finding for the CBD in Melbourne, Australia

Previous studies have mostly examined how sustainable cities try to promote non-motorized travel by creating a walking-friendly environment. Few of them identify how the built environment affects pedestrian volume in high-density areas. This paper presents a methodology that combines Pearson correlation analysis, stepwise regression, and principal component analysis to explore the internal correlation and potential impact of built environment variables. To study this relationship, cross-sectional data from the Melbourne central business district were selected. Pearson's correlation coefficient confirmed that visible green ratio and intersection density were not correlated with pedestrian volume. The results from stepwise regression showed that land-use mix degree, public transit stop density, and employment density could be associated with pedestrian volume. Moreover, two principal components were extracted by factor analysis. The first component yielded an internal correlation in which land use and amenities were positively associated with pedestrian volume. Component 2 represents parking facility density, which is negatively related to pedestrian volume. Based on the results, existing street problems were identified and policy recommendations were put forward: diversifying community services within walking distance, improving the service level of the public transit system, and restricting on-street parking in Melbourne.


Introduction
Walking and the built environment are considered very important for sustainable cities because of their environmental benefits [1]. Scholars and practitioners often treat built environment variables as a reflection of the urban fabric and a significant component that influences travel behaviour [2,3]. A great deal of effort has been expended on exploring land use and walking behaviour over the last two decades [3][4][5][6]. Most often, mixed land use is used as a strategy for operationalizing and promoting non-auto travel (e.g., walking, cycling or public transit) [7]. A mixed-use area usually incorporates banks, restaurants, retail, businesses, workplaces and housing, all close to each other [6]. Greenwald and Boarnet [3] found that land use affects pedestrian travel behaviour in Portland, Oregon. Ewing et al. highlighted that a single land-use type does not attract pedestrians [4]. In addition, Hatamzadeh et al. [5] measured walking behaviour in commuting to work and noted that higher mixed use can be an effective policy to promote walking in the city of Rasht, Iran. From the literature discussed so far, one may suppose that land-use mix degree is a valid variable in walking-related research. However, very few studies highlight the relationship between pedestrian volume and land-use diversity in high-density areas.
Urban development density influences travel behaviour in the modern city. For example, Kerr et al. [8] found through logistic regression analyses that neighbourhood features such as residential density and intersection density were related to walking behaviour. Azmi and Ahmad [9] argued that high transit stop density encourages walking between leisure, work, and home. In addition, Laatikainen et al. [10] found a significant effect of transit stop density on older adults' walking in Helsinki, Finland.
An area with high street connectivity offers more potential routes for pedestrians and increases the walkability of neighbourhoods, because small block sizes and a flexible street network yield a higher intersection density [4]. Some studies found that intersection density was significantly and positively associated with walking [5,10,12,13]. Knuiman et al. [12] found evidence from Perth residents of a positive correlation between built environment variables (street connectivity and land-use mix) and walking frequency. The positive correlation between walking and intersection density is also supported by Laatikainen et al. [10] and Hatamzadeh et al. [5]. In addition, Koohsari et al. [13], in a case study conducted in Adelaide, Australia, explored the relationship between the street network (intersection density and street integration) and adults' walking for transport. Their findings show that about 42% of the association of street integration with walking for transport can be explained by perceived destination accessibility [13].
Street trees, sidewalks and pedestrian routes are built environment components that reflect the quality of a street and, according to previous studies, influence the walking experience. Yang et al. [14] noted that the visibility of street greenery was positively related to walking time and walking frequency in older adults, while Rollo et al. [15] highlighted the importance of the quality and effect of green attributes within the overall streetscape experience. However, a few studies have questioned the positive association between street tree coverage and walking behaviour [16,17]. For example, Ferrer et al. [16] argued that sidewalk cafes and trees create physical obstacles that narrow the sidewalk and therefore make walking difficult. In addition, Cerin et al. [17] asserted that greenery and aesthetically pleasing scenery are not associated with walking in the older generation.
The findings drawn from the literature review indicate that built environment variables influence the walking frequency, walking time and walking distance. However, the existing body of studies has not taken into account the integration effects of land use, street form, facilities density, and the quality of sidewalk with respect to pedestrian volume in the high-density metropolitan area. The present study intends to bridge this research gap by analysing the relationship between built environment variables and pedestrian volume.
This paper aims to identify the walking peak periods and determine the relationship between the built environment factors and the pedestrian volume recorded by 52 pedestrian counting sensors [18] in the Melbourne central business district (CBD). Specifically, the study evaluates the following questions:
1) What are the trends of the pedestrian volume in the Melbourne CBD? (If walking occurs in several peak periods, one would expect to categorize the data by peak period for the correlation analysis.)
2) Do all built environment factors under consideration correlate with respect to the pedestrian volume in the different peak periods? If not, then can we isolate the irrelevant factor/factors and identify the correlation between built environment factors and pedestrian volume?
3) What components comprise the principal component analysis, and do these relate to pedestrian volume within the Melbourne CBD?
Interrelated variables were grouped into principal components and evaluated to assess how they relate to pedestrian volume. Additionally, two equations were proposed to assess pedestrian volume. Based on the results, design intervention areas and policy measures can be identified to increase walkability in the Melbourne CBD. This study is unique in that it considers the internal relationships among built environment variables when exploring their correlation with pedestrian volume in a high-density area.

Study area
The present study uses Melbourne's CBD as a case study. Melbourne is Australia's second-largest city and the capital of the State of Victoria, with an estimated resident population of 183,756 in 2020 [19]. The CBD is located at the centre of the city. The dominant 200x200 meter grid defining the CBD covers an area of roughly 1.0x0.5 miles (1.87x0.95 km) when street width is considered, with the major north-south and east-west streets being 30 meters wide (Figure 1). All major CBD blocks measure 200x200 meters, and the Hoddle grid further subdivides each major block along the east-west axis into two 94x200 meter half blocks separated by a 12-meter lane, yielding a total of 64 blocks. The development density of the CBD is higher than that of other suburbs. Melbourne CBD has a pedestrian- and transit-friendly environment according to the pedestrian- and transit-friendly neighbourhood standard by Ewing [11]. Therefore, an investigation in Melbourne CBD provides a better understanding of the relationship between the built environment and pedestrian volume in a high-density area.
Many studies assume that the buffer zone in transit-related research is between 400 and 800 meters [20,21]. Gori et al. [21] likewise found that the maximum distance for pedestrian-friendly walking was an 800-meter radius or 10 minutes of walking time. Similarly, Hatamzadeh et al. [22] found that 0.25 miles (approximately 402 meters) is the sensitive distance for walking to school. In this study, we define a 500-meter buffer area around each pedestrian counting sensor to capture the built environment variables. Figure 2 shows the 52 pedestrian volume counting sensors and their 500-meter buffer areas in the Melbourne CBD.

Study design
Since the study's objective is to identify the relationship between built environment variables and pedestrian volume, this paper presents a new methodology that combines stepwise regression and principal component analysis for illustrating the correlation and internal operation between the built environment factors and pedestrian volume.
The flow diagram in Figure 3 illustrates the methodology, which includes four parts. In the first step, data preparation, we extract the pedestrian volume and built environment variables from different databases. In the second step, the Pearson correlation coefficient identifies the association between built environment factors and pedestrian volume; it tests the linear relationship between built environment variables and pedestrian volume in each peak period. In the third step, stepwise regression evaluates the order of importance of the variables and selects a valuable subset of them [23]. The fourth step involves factor analysis and principal component analysis. Factor analysis with a varimax rotation is used to reduce the dimensionality of the datasets, increase interpretability, minimize information loss, and extract the principal components [24]. The principal component analysis results identify the internal correlation between the principal components and pedestrian volume. The final result and further recommendations are based on a comparison of the stepwise regression and the principal component analysis.

Data collection
In this study, unapplied cross-sectional data were included in the data collection. Regarding the built environment, Cervero and Kockelman [7] conceptualized the factors of development density, land-use diversity, and street network design as the 3Ds and examined the correlation between the 3Ds and travel behaviour. Ewing and Cervero [1] expanded the 3Ds with destination accessibility and distance to transit into the 5Ds. In this study, land-use mix degree, employment density, intersection density, public transit stop density, parking facility density, visible green ratio, and restaurant seating density were used to measure the built environment of the buffer zones around the sensors.
Land-use mix degree can be defined as the diversity of land-use types. The Shannon diversity index is a commonly used and valid method to measure land-use mix diversity [25][26][27][28]. In the land-use mix degree function, LUMD is the land-use mix degree of sensor i and Pi is the number of land-use types of sensor i. The land-use dataset of the Melbourne CBD was collected from the Census of Land Use and Employment in 2018 [29]. In addition, the classification of land-use codes was based on the Australian and New Zealand Standard Industrial Classification.
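As a minimal sketch of how a Shannon-type land-use mix degree could be computed for one sensor buffer, the Python snippet below assumes that each land-use type enters through its share of the buffer (here, hypothetical floor areas by type). The function name, the input format, and the use of the natural logarithm are illustrative assumptions, not the paper's exact formulation.

```python
import math


def land_use_mix_degree(area_by_type):
    """Shannon diversity index over land-use shares within one sensor buffer.

    area_by_type maps a land-use label to a non-negative quantity
    (assumed here to be floor area); zero entries are ignored.
    """
    total = sum(area_by_type.values())
    if total <= 0:
        raise ValueError("buffer contains no classified land use")
    shares = [a / total for a in area_by_type.values() if a > 0]
    return -sum(p * math.log(p) for p in shares)


# Hypothetical floor areas (m^2) by land-use type for one buffer.
example = {"office": 500.0, "retail": 300.0, "residential": 200.0}
lumd = land_use_mix_degree(example)
```

Under this formulation a buffer with a single land-use type scores zero, and the score grows as more types share the buffer more evenly.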
Employment density is the number of jobs within the buffer zone around a pedestrian counting sensor. In the employment density function, ED is the employment density of sensor i, FTEMSi is the number of full-time equivalent members of staff in the buffer zone around sensor i, and Ai is the area of the buffer zone around sensor i. The employment figures (number of jobs) of the given areas were collected from the Census of Land Use and Employment in 2018 [29].
Intersection density is the number of intersections measured within the buffer area of a pedestrian counting sensor. Intersection density is given by Eq. (3), where ID is the intersection density of sensor i, βi is the intersection density coefficient of the buffer zone around sensor i, Ii is the number of three-way or four-way intersections in the buffer zone around sensor i, and Ai is the area of the buffer zone around sensor i. Google Earth and OpenStreetMap provided the street networks with all intersections within the Melbourne CBD. The intersection density coefficient was based on the intersection density penalty of the Walk Score Methodology [30] (Table 1).
Public transit stop density relates to the number of stops, service level, and distance decay. The data were collected from the network maps of Public Transport Victoria and the Citymapper platform. The distance decay coefficient and service level coefficient of public transport (Table 1) were provided by the Walk Score Methodology [30]. Public transit stop density is given by Eq. (4), where PTSD is the public transit stop density of sensor i, σi is the distance decay function coefficient (Table 1), ωi is the service level coefficient of public transportation (Table 1), PTSi is the number of public transit stops (including bus, tram, train, and V/Line) of sensor i, and Ai is the area of the buffer zone around sensor i.
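The three density measures above share one pattern: a count within the buffer, optionally scaled by a coefficient, divided by the buffer area. The sketch below illustrates that pattern under stated assumptions: the buffer is an unclipped circle of 500 m radius, the coefficients enter multiplicatively, and all input values are hypothetical. The helper name `weighted_density` is introduced here for illustration only.

```python
import math

BUFFER_RADIUS_M = 500.0
# Area of an unclipped circular buffer in km^2 (assumption: no clipping at the CBD edge).
BUFFER_AREA_KM2 = math.pi * (BUFFER_RADIUS_M / 1000.0) ** 2


def weighted_density(count, weight=1.0, area_km2=BUFFER_AREA_KM2):
    """Return weight * count / area; weight=1 gives a plain density."""
    if area_km2 <= 0:
        raise ValueError("buffer area must be positive")
    return weight * count / area_km2


# Hypothetical inputs for one sensor buffer (not taken from the study's data).
employment_density = weighted_density(count=12000)  # FTE staff per km^2
intersection_density = weighted_density(count=40, weight=0.9)  # assumed beta = 0.9
transit_stop_density = weighted_density(count=25, weight=0.8 * 1.5)  # assumed sigma * omega
```

A 500 m radius gives a buffer of roughly 0.785 km², so every count in this sketch is scaled by about 1.27 to obtain a per-km² density.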
Parking facility density reflects the number of parking spaces (both on-street and off-street) available within the 500 m buffer area of each pedestrian counting sensor (Eq. (5)). The information about parking spaces was based on the Census of Land Use and Employment 2018 dataset [29]. In Eq. (5), PFD is the parking facility density of sensor i, PFi is the number of parking facilities of sensor i, and Ai is the area of the buffer zone around sensor i. The visible green ratio reflects the street-side greenery at the human scale (Figure 4). The visible green ratio dataset was based on human-scale street views in Google Earth, processed with Photoshop 2020. The calculation equation was proposed by Li et al. [31].
In Eq. (6), VGR is the visible green ratio of sensor i, Gi is the total number of green pixels in the street image of sensor i, and Pi is the total number of pixels in the street image of sensor i.
Restaurant seating density is the number of seats in cafés, restaurants, and bistros within the 500-meter buffer area around a sensor and is given by Eq. (7), where RSD is the restaurant seating density of sensor i, Ri is the number of seats in restaurants around sensor i, and Ai is the area of the buffer zone around sensor i. The information about the number of seats in restaurants is based on the Census of Land Use and Employment 2018 [29].
The pedestrian volume profiles of the 52 counting sensors in the Melbourne CBD are presented in Figure 5. The pedestrian volume of the sensors was based on the dataset of the Melbourne pedestrian counting system. In Figure 5, most walking trips occurred in the morning (6:00 to 10:00), noontime (11:00 to 15:00), and evening (16:00 to 20:00) peak periods. Therefore, this study categorized the pedestrian volume of each sensor into three groups: morning-peak, noontime-peak, and evening-peak pedestrian volume. To exclude outlying data from the analysis, each peak period's pedestrian volume was computed with the Trim mean function, which takes the mean after excluding a percentage of data points from the tails of the pedestrian volume dataset.
Table 2 shows the correlation between the built environment variables and pedestrian volume in the different peak periods. Land-use mix degree is related to employment density, public transit stop density, and restaurant seating density. Also, employment density is associated with land-use mix degree, public transit stop density, parking facility density, and restaurant seating density.
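The pre-processing and correlation steps described above can be sketched in a few lines of plain Python. This is an illustration under assumptions: the trimming is taken to be symmetric (half of the stated proportion removed from each tail, in the spirit of spreadsheet trimmed-mean functions), and the hourly counts are hypothetical rather than taken from the Melbourne pedestrian counting system.

```python
import math


def trimmed_mean(values, proportion):
    """Mean after dropping proportion/2 of the points from each tail."""
    if not 0 <= proportion < 1:
        raise ValueError("proportion must be in [0, 1)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion / 2)  # points removed per tail
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical noon-peak hourly counts for one sensor, with two outliers.
noon_counts = [1200, 1350, 1280, 90, 1310, 1295, 4000, 1330]
noon_peak_volume = trimmed_mean(noon_counts, proportion=0.25)
```

In a full analysis, `pearson_r` would be applied once per pair of variables, for example land-use mix degree against noon-peak volume across the 52 sensors.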
A high restaurant seating density area in the Melbourne CBD is associated with high levels of land-use mix degree, employment density, parking facility density, intersection density, and public transit stop density. Table 2 shows a strong association between the noon-peak pedestrian volume and land-use mix degree, employment density, public transit stop density, and restaurant seating density. The noon-peak group was therefore chosen as the dependent variable for pedestrian volume. This study uses stepwise regression to gain insights into the relationship between built environment variables and pedestrian volume in the peak time. Table 3 illustrates some of the main characteristics of the built environment variables and pedestrian volume. In Table 3, R² is 0.668, which indicates that about 66.80% of the variation of pedestrian volume in the noontime peak is explained by the built environment variables.

Summary of correlation and stepwise regression
In this model, F is 32.203 and the corresponding p-value is 0.000 (less than 0.050), which means that at least one variable is related to pedestrian volume. The VIF values of the variables are less than 5.00, which verifies that the model does not suffer from multicollinearity. The residuals are not autocorrelated, because the D-W value is 1.891. In Table 3, both land-use mix degree and public transit stop density are significantly and positively related to the pedestrian volume in the noon-peak time. The stepwise regression equation is given by Eq. (8), where PV is the pedestrian volume, PTSD is the public transit stop density of sensor i, LUMD is the land-use mix degree of sensor i, and ED is the employment density of sensor i.
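The diagnostics reported above (variable selection, VIF, and the Durbin-Watson statistic) can be illustrated with a short NumPy sketch. Note the assumptions: the selection rule used here is a simplified forward selection on adjusted R², swapped in for whatever entry and removal criteria the original stepwise procedure applied, and the data are synthetic. All function names are introduced for illustration.

```python
import numpy as np


def ols_fit(X, y):
    """Ordinary least squares with an intercept; returns (beta, residuals, R^2)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    return beta, resid, r2


def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def forward_stepwise(X, y, min_gain=0.01):
    """Greedy forward selection; stops when adjusted R^2 gains less than min_gain."""
    n, p = X.shape
    selected, best = [], 0.0  # the intercept-only model has adjusted R^2 = 0
    while len(selected) < p:
        scores = {}
        for j in range(p):
            if j in selected:
                continue
            _, _, r2 = ols_fit(X[:, selected + [j]], y)
            scores[j] = adjusted_r2(r2, n, len(selected) + 1)
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best < min_gain:
            break
        selected.append(j_best)
        best = scores[j_best]
    return selected


def durbin_watson(resid):
    """Durbin-Watson statistic; values near 2 suggest no first-order autocorrelation."""
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)


def vif(X):
    """Variance inflation factor of each column, regressed on the other columns."""
    factors = []
    for j in range(X.shape[1]):
        _, _, r2 = ols_fit(np.delete(X, j, axis=1), X[:, j])
        factors.append(1.0 / (1.0 - r2))
    return factors


# Synthetic demonstration: 52 "sensors", 4 candidate variables, 2 of them informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(52, 4))
y = 3.0 * X[:, 0] + 2.0 * X[:, 2] + rng.normal(scale=0.1, size=52)
chosen = forward_stepwise(X, y)
_, resid, r2 = ols_fit(X[:, chosen], y)
```

With real data, the selected columns would then be checked against the thresholds quoted in the text (VIF below 5.00, D-W value close to 2).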

Result of factor analysis
Factor analysis is used in this study to increase the interpretability of the correlation between built environment variables and pedestrian volume before principal component analysis is applied. The Kaiser-Meyer-Olkin (KMO) measure and Bartlett's test were used to verify the dataset's adequacy for factor analysis [32]. According to the acceptable value range of the Measure of Sampling Adequacy by Kaiser [32], the minimum eligible value is 0.500, and the p-value of Bartlett's test should be less than 0.050. The KMO and Bartlett test results indicate that the datasets of built environment variables are suitable for principal component analysis and factor analysis: the KMO value is 0.609, which exceeds both the 0.500 minimum and the stricter 0.600 threshold, and the significance value of Bartlett's test (p-value = 0.000) is less than 0.050. To explore the potential connection between independent variables, the factors were rotated, which improves their reliability and simplifies the factor structure (Table 4).
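For readers who want to reproduce the adequacy checks, the sketch below computes the overall KMO measure and the Bartlett sphericity statistic from a data matrix using the standard textbook formulas. It is an illustration on synthetic data, not the study's computation; the function names are assumptions, and the p-value step is left out (it would follow from a chi-square distribution with the returned degrees of freedom).

```python
import numpy as np


def kmo_overall(data):
    """Overall Kaiser-Meyer-Olkin measure for an (observations x variables) array."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    # Partial correlations follow from the inverse correlation matrix.
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diag] ** 2)
    a2 = np.sum(partial[off_diag] ** 2)
    return r2 / (r2 + a2)


def bartlett_sphericity(data):
    """Bartlett's test of sphericity; returns (chi-square statistic, degrees of freedom)."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    _, logdet = np.linalg.slogdet(corr)
    stat = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) // 2
    return stat, df


# Synthetic data: four variables driven by one shared factor plus noise.
rng = np.random.default_rng(1)
factor = rng.normal(size=(200, 1))
data = factor + 0.5 * rng.normal(size=(200, 4))
kmo = kmo_overall(data)
stat, df = bartlett_sphericity(data)
```

A KMO value closer to 1 indicates that partial correlations are small relative to raw correlations, which is the condition under which factor extraction is meaningful.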

Result of principal component analysis
The literature review has noted the importance of the internal effects of built environment variables. However, very few studies have examined how the variables affect each other and how they correlate with pedestrian volume as different sets. Principal component analysis is a way to reduce the dimensionality of datasets, increase interpretability, and minimize information loss. The built environment variables were categorized into three principal components, and the principal component analysis results are summarized in Table 5.
Note (Table 5): the dependent variable is noon-peak PV; D-W value is 1.847; * p<0.050, ** p<0.010; F(3,48) = 28.520, p = 0.000.
As shown in Table 5, about 64.10% of the variation of pedestrian volume in Melbourne can be explained by components 1, 2, and 3 (R² = 0.641). The model passed the F-test, and at least one component is related to the noon-peak pedestrian volume, because the F value is 28.520 and the p-value is 0.000 (less than 0.050). Also, the samples show no collinearity or autocorrelation, because the VIF value (1.00) is less than 5.00 and the D-W value (1.847) is around 2.00.
The principal component analysis reveals how components 1 and 3 correlate with pedestrian volume. In Table 5, there is a clear positive correlation (standardized coefficient = 0.735, p = 0.00 < 0.01) between component 1 (diversity of land use and amenities) and pedestrian volume. The finding supports the benefits of mixed-use design: a mixed-use environment can promote walking in people's daily travel. No relationship was found between component 2 (walking friendly) and pedestrian volume (standardized coefficient = -0.089, p = 0.308 > 0.050). In addition, component 3 reflects the supply of vehicle parking spaces. A negative correlation (standardized coefficient = -0.304, p = 0.001 < 0.010) was found between component 3 and pedestrian volume. Finally, the principal component analysis model is reduced to the original variable form, where PV is the pedestrian volume, Component1 is the diversity of land use and amenities, Component2 is walking friendly, Component3 is vehicle parking friendly, LUMD is the land-use mix degree, ED is the employment density, PFD is the parking facility density, ID is the intersection density, PTSD is the public transit stop density, VGR is the visible green ratio, and RSD is the restaurant seating density.
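The extraction and rotation steps can be sketched as follows. This is a minimal NumPy illustration under assumptions: components are extracted from the correlation matrix, the number of retained components is passed in by hand, the rotation is a standard iterative varimax, and the data are synthetic stand-ins for the seven built environment variables.

```python
import numpy as np


def pca_loadings(data, n_components):
    """Unrotated component loadings from the correlation matrix of the data."""
    corr = np.corrcoef(data, rowvar=False)
    eigval, eigvec = np.linalg.eigh(corr)
    order = np.argsort(eigval)[::-1][:n_components]  # largest eigenvalues first
    return eigvec[:, order] * np.sqrt(eigval[order])


def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a (variables x components) loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated ** 3 - rotated @ np.diag(np.sum(rotated ** 2, axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = np.sum(s)
        if criterion != 0.0 and new_criterion / criterion < 1.0 + tol:
            break
        criterion = new_criterion
    return loadings @ rotation


# Synthetic stand-in: 52 observations of 7 variables (not the study's data).
rng = np.random.default_rng(2)
data = rng.normal(size=(52, 7))
data[:, 1] += data[:, 0]  # induce some shared structure
data[:, 3] += data[:, 2]
loadings = pca_loadings(data, n_components=3)
rotated = varimax(loadings)
```

Because the rotation is orthogonal, each variable's communality (the row sum of squared loadings) is unchanged; only the distribution of loadings across components becomes easier to interpret. Component scores derived from the rotated solution could then be regressed on pedestrian volume, as in Table 5.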

Discussion and design intervention
This study explores the correlation between built environment variables and pedestrian volume in a high-density area. The analysis of the datasets reveals some general characteristics of the variables. Data pre-processing identified three peak periods of walking, with most walking occurring in the noontime peak (11:00 to 15:00). Pearson's correlation coefficient indicates that land-use mix degree and public transit stop density were positively correlated with pedestrian volume in all three peak periods. On the other hand, pedestrian volume in the morning peak was negatively affected by parking facility density.
The factor analysis aims to extract the principal components from the built environment variables, and its results showed different aspects of these components. About 64.10% of the variation of pedestrian volume in Melbourne can be explained by components 1, 2, and 3. Land-use mix degree, employment density, public transit stop density, and restaurant seating density were all positively related to component 1. Intersection density and visible green ratio were directly related to component 2, and parking facility density was correlated with component 3. Based on the variables loading onto them, the three principal components were named 'diversity of land use and amenities', 'walking friendly', and 'vehicle parking friendly'.
The principal component analysis highlights that component 1 had a positive effect on pedestrian volume, which means that the diversity of land use and amenities was positively related to pedestrian volume. Component 2 reflects the quality and comfort level of the walking environment in the given area. Previous studies assumed that greenery and high intersection density promote walking. However, component 2 and pedestrian volume are uncorrelated in this study. Although the visible green ratio and intersection density do not correlate with pedestrian volume, a walkable environment with sidewalk trees and small to medium block lengths may still affect walking behaviour. The present study agrees that mixed-use neighbourhood design and diversity of amenities support walking. Meanwhile, a rational arrangement of public transit stops is another way to promote walking.
Some studies identified parking facility density as a critical factor associated with walking. In this study, a negative relationship was found between the vehicle parking friendly component (component 3) and pedestrian volume. The findings suggest that parking facility density is negatively associated with pedestrian volume in the Melbourne CBD. As parking facility density increases, on-street parking spaces create physical obstacles that narrow the sidewalk and make walking difficult in a high-density area. Besides, a higher parking facility density indicates a vehicle-friendly environment, and motor vehicles are the primary travel mode in areas with high parking facility density.
This study contributes a framework to explore the association between the built environment and pedestrian volume in a high-density area. In addition, existing street problems and potential improvements to walkability in the Melbourne CBD can be identified (Figure 6). Urban design considerations in Melbourne's CBD should focus not only on the neighbourhood layout and urban fabric in the intervention areas in Figure 6 but also on the integrity of amenities. Figure 6 shows the intervention areas with low land-use mix degree, employment density, public transit stop density, and restaurant seating density. The darker circles in Figure 6 mark the intervention areas with low diversity of land use and amenities. Furthermore, the following policy recommendations are put forward for the intervention areas: adopting mixed-use design or diversifying community services, providing an environment accessible by walking, increasing the density of public transit stops, improving the service level of the public transport system, and restricting the supply of on-street parking facilities.

Conclusions
The study contributes to understanding the factors affecting pedestrian volume in a high-density area. First, the characteristics of the dataset were illustrated. Second, land-use mix degree, employment density, public transit stop density, and restaurant seating density were correlated with pedestrian volume in the correlation analysis. On further analysis, only two variables (land-use mix degree and public transit stop density) were related to pedestrian volume according to the results of stepwise regression. Third, the factor analysis extracted three principal components from the built environment variables, which represent different aspects of the built environment of Melbourne's CBD and help to explain the correlations among them. Component 1 represents the diversity of land use and amenities and is associated with pedestrian volume in the peak period. Component 3 reflects vehicle parking friendliness (density of parking facilities), which is negatively associated with pedestrian volume.
Previous studies assumed that the quality of the walking environment is associated with walking. However, this study indicates that the walking environment is uncorrelated with pedestrian volume. The visible green ratio and intersection density, which load onto component 2, reflect the quality of the walking environment, and component 2 is uncorrelated with pedestrian volume. However, this result may vary depending on the season and changes in the weather, which have not been factored into this study. In addition, the sensor array is not able to differentiate between pedestrian route selection and opportunities for pedestrian route choice.
Two quantitative models of pedestrian volume were presented in this study, one from the stepwise regression and one from the principal component analysis. Together, these results provide valuable insights into how the built environment variables were grouped into different components and how the components were associated with pedestrian volume in a high-density area. In addition, this study provides suggestions for planners to help create a walkable environment and promote walking in the Melbourne CBD. This research has some limitations. The dataset was collected around the sensors of the pedestrian counting system in the Melbourne CBD; hence, further studies are required to process data gathered from the surrounding suburban areas in order to validate the results presented in this paper. This study provides a path for analysing the correlation between pedestrian volume and built environment variables. More variables explaining pedestrian volume, such as the impact of topography on pedestrian route selection or variations in weather and season, could be incorporated and assessed in further studies. In addition, the research can be extended to other case study cities presenting a range of street patterns, with both regular and irregular grid or morphology structures, in order to better understand the correlation between built environment variables and pedestrian volume with respect to improving walkability in a variety of high-density areas.