Influence Area of Transit-Oriented Development for Individual Delhi Metro Stations Considering Multimodal Accessibility

Abstract: Understanding the influence areas for transit stations in Indian cities is a prerequisite for adopting transit-oriented development (TOD). This study provides insights into the last mile patterns for selected Delhi Metro Rail (DMR) stations, specifically Karkardooma, Dwarka Mor, Lajpat Nagar, and Vaishali, and into the extent of the influence area based on different access modes. The variation in the extent of the influence areas with the access mode and with the locational characteristics of stations is considered. The last mile distances reported in the survey were affected by rounding and heaping, and they were subjected to multiple imputation to remove the resulting bias. The spatial extent of the influence areas for various modes was estimated based on the compound power exponential distance decay function. Further, the threshold walking distances were calculated using receiver operating characteristic (ROC) curves. The last mile distances varied among stations. The walking distances (mean and 85th percentile) did not vary considerably among stations; however, large variations were noted for the other modes. These differences in accessibility must be taken into account when considering multimodal accessibility and multimode-based TOD. The study can provide useful inputs for planning and implementing TOD in New Delhi.


Introduction
The influence area is a critical part of transit-oriented development (TOD), as this is the area around the station where the TOD principles are applied, including high densities, mixed land uses, mixed-income housing and improved infrastructure for non-motorized transport. Andersen and Landex [1] define the influence area of public transit as a "vicinity of a stop or station of a public transport line" and state that the "area is where most of the non-transferring passengers at the particular stop or station come from". The influence area of a transit station is therefore the area around the station that serves as the customer base for transit services. It is also the area that receives the maximum benefits of transit. Often these influence areas are based on the distances people are willing to travel to transit in a specified time. These specifications are further based on the travel modes that are used for last-mile connectivity, most often walking. In the literature, this area has been specified based on access distance, which directly provides the geographic extent of the TOD. A distance of 2000 ft (600 m) was introduced by Calthorpe [2,3]. Untermann [4] and Dittmar and Ohland [5] determined the distance as 1/2 mi (800 m). These distances have been specified based on the distance that people prefer to walk to transit [6][7][8]. To determine the extent of the TOD influence areas, some studies refer to a single distance, whereas others use a distance range. Guerra et al. [8] and Flamm and Rivasplata [9] emphasized that in the U.S., the radius of the influence area can vary between 1/4 and 1/2 mi (400-800 m). Consequently, various cities have adopted different radii for TOD. Guerra et al. [8] raised doubts about the feasibility of adopting 1/2 mi (800 m) as the de facto standard for TOD in the United States as it is "more an artifact of historical precedent than a statistical or analytical construct".
In the National TOD Policy of India, the influence area of TOD has been set between 500 m and 800 m [10]. The capital city of New Delhi is among the first in the country to adopt TOD into its city planning. The Unified Traffic and Transportation Infrastructure (Planning & Engineering) Centre (UTTIPEC) suggested that the influence zones in New Delhi be classified into an intense zone with a radius of 300 m, a standard TOD zone with a radius of 800 m (which corresponds to a 10-min walk), and a TOD transition zone with a radius of 2000 m (which corresponds to a 10-min cycle ride) [11]. In the Delhi Master Plan 2021, which lays out the specifications for city planning for the coming years, the city authorities have adopted a TOD influence zone of 500 m on both sides of the mass rapid transit system (MRTS) corridor and the Delhi Metro Rail (DMR).
The currently adopted standards are heavily influenced by the TOD standards adopted in developed countries. Quite often, the distances that are used range between approximately 400 m and 800 m, considering walking as the last mile mode [3][4][5][6][7][8][9]. It has been noted that the influence areas for transit vary depending on the type of access mode, the type of main mode, the trip purpose, and the area type [12]. Walking is undoubtedly the most popular access mode worldwide and has been widely studied. In addition, in many cities in developed countries, bicycles are popular as access modes, with many transit agencies allowing bicycles to be taken aboard. The access distances for cyclists have been noted to have a large range (from 1.96 to 4.8 km), varying among studies and cities [9,13,14]. Lee et al. [14] explored the possibility of introducing a bicycle-based TOD in Seoul, Korea, which could enable the coverage of 74-94% of the area, as compared to the coverage of 30% by walk-only catchment areas. The catchment ranges of feeder buses and cars (kiss and ride) were estimated to be 1.24-3.73 miles (approx. 2000-6000 m) and 0.62-4.35 miles (approx. 1000-7000 m), respectively [15], which considerably enlarges the influence area of transit services. Therefore, TOD need not be based on walking alone; it should also account for other modes of last mile connectivity. The influence of modes other than walking on the catchment areas of transit stations must thus be thoroughly investigated.
Johar et al. [16] studied the distances walked by commuters from bus stops to various destinations in New Delhi and found that the mean walking distances (based on a lognormal distribution) were 677, 660, 654, and 637 m for shopping, recreation, education, and work trips, respectively. Research shows that commuters walk longer distances to access rail transit than to access bus transit [17][18][19]. Therefore, it can be assumed that commuters walk longer distances to reach metro stations in Delhi. Additionally, in Indian cities, modes such as cycle rickshaws, auto rickshaws, mini vans (commonly known as gramin seva in New Delhi) and other forms of informal transport are commonly used for last mile connectivity. Considering the multimodal nature of last mile connectivity in New Delhi, Ann et al. [20] estimated the influence zones for DMR stations in New Delhi. In that study, the mean access distances were estimated to be 700 m for walking, 2900 m for informal transit, and 6300 m for buses and private transport. Moreover, using the distance decay function, the 85th percentile access distances were estimated to be 1400 m for walking, 5600 m for informal transit, and 11,900 m for private transport and buses. Zhao et al. [21], El-Geneidy et al. [22] and Hochmair [23] have also used the 85th percentile value to establish the catchment areas around transit stations for modes such as walking and cycling. In addition, Ann et al. [20] found that the threshold distance for access was 1200 m for walking, which is close to the 80th-85th percentile values from the decay analysis. These results are considerably different from those in India's national TOD policy, which specifies the extent of the influence area for walking to be 500-800 m.
Sustainability 2019, 11, 4295
According to this policy, the TOD development and the associated higher densities and infrastructure will be concentrated within this 500-800 m radius, confining the planned development to a small area without much justification. Moreover, according to the estimated decay curves, an influence area of 500-800 m can cover only 50-65% of the current transit passengers who walk to stations [20], excluding the rest from the spatial extent of TOD. Hence, in order to capture the benefits of TOD and extend them to the real users of the transit system, the guidelines on the size and extent of the influence areas need to be reconsidered on the basis of ground reality. This highlights the need to examine TOD principles and standards based on the mobility characteristics of Indian cities. It has already been shown that commuters to the DMR system in New Delhi travel much longer distances than 500-800 m. The difference in last mile connectivity patterns between Indian cities and cities of the developed world can affect the spatial extent of the TOD influence areas. Multimodal accessibility, if not accounted for in TOD planning, may lead to the exclusion of some existing transit users. A brownfield development in such conditions may not be cost effective in developing countries and could also displace low-income households that may not be able to afford to live in the new developments.
The larger size of the influence areas estimated by Ann et al. [20], as compared to the size specified in the national TOD policy, can help planners and policy makers identify and plan for the real catchment areas of the DMR in New Delhi. They need not restrict the development plans to only 500-800 m around stations. Additionally, it can help transit agencies to identify catchment areas and to estimate demand. However, Ann et al. [20] estimated the influence area with all DMR stations pooled together and did not consider station-specific characteristics. The locational differences between stations may lead to different accessibility patterns and travel preferences. Therefore, whether the extent of the influence area differs for individual stations is an aspect worth investigating.
This study focuses on estimating the influence areas for various last mile modes for individual metro stations in New Delhi. Four stations were chosen from the DMR network: Karkardooma, which is a city station serving as an interchange and an urban regional center; Dwarka Mor, which is a subcity residential area station; Lajpat Nagar, which is an interchange and market station in a central city environment; and Vaishali, which is an outer city station. A questionnaire survey was conducted at these stations to collect information regarding the last mile mobility patterns of metro commuters. The methodology for estimating the sizes of the influence areas was drawn from Ann et al. [20]. The reported distance data from the surveys revealed considerable heaping and rounding. Thus, multiple imputation (MI), derived from the work of Heitjan and Rubin [24], Drechsler and Kiesl [25] and Yamamoto et al. [26], was applied to remove the rounding bias before employing the distance decay and receiver operating characteristic (ROC) curves.
The remainder of the paper is organized as follows: Section 2 describes the data collection process; Section 3 describes the estimation of the influence areas for the different modes, along with the distance decay analysis and the ROC analysis; and Section 4 summarizes the study and presents the derived conclusions.

Study Area and Data
It is essential to understand how last mile distance patterns vary across different types of stations. Therefore, a survey was planned and executed at four existing stations of the DMR network in New Delhi, India, namely, Karkardooma, Dwarka Mor, Lajpat Nagar, and Vaishali. The Karkardooma and Dwarka Mor stations were selected by the city authorities for development in line with TOD. Karkardooma is part of a TOD project of the Delhi Development Authority (DDA) and has been planned for development with over 30 hectares of residential and commercial centers. In addition, Karkardooma is an interchange station and a place of commercial importance in East Delhi. The station has been developed as a complex in a mixed land use area and has a parking facility for private modes (privately owned cars and two-wheelers). The DDA has selected Dwarka to be developed into a smart subcity in the South-West region of New Delhi, with commercial, residential, and entertainment facilities being established according to TOD norms. The area is sought after for residential purposes and has medium to high density. There are also some institutions and government offices around the station. The Lajpat Nagar station area represents a mixed land use, mixed income, highly dense area located in New Delhi, and lies at the interchange of the Violet and Pink lines of the DMR network. One of the major markets in the city lies in close proximity to the station, and the area is of considerable commercial and residential importance. Vaishali is an end station on the Red Line. The station is located in the suburbs with highly dense and mixed-income housing. Although this region is outside New Delhi, it belongs to the National Capital Region (NCR) of Delhi. Mixed land use is predominant around the station area. The modeling is not intended to assess prospective locations for transit stations but rather to understand the influence areas of the four existing DMR stations, to help in planning brownfield development for TOD around these stations. The locations of these stations are shown in Figure 1 on a map of New Delhi, and aerial shots of each station and its surrounding area are shown in Figure 2.
The purpose of the survey was to collect data pertaining to the last mile connectivity of commuters at the selected stations and to estimate the distances travelled by commuters to access these stations. The survey contained questions concerning the access and egress travel patterns of DMR passengers, such as the trip purposes, the travel modes, the travel distances, and the travel time. The passengers' attributes, such as gender, age, and income, were also included. The preferred mode for covering the last mile distance and the various alternatives available to the commuters were also determined. Furthermore, additional information pertaining to the passengers' willingness to travel and the motives behind choosing a particular mode was collected.
The distances collected from surveys often suffer from rounding and heaping, leading to biased results. To increase the accuracy of the distance data reported by the respondents in the survey, spatial data was initially planned to be obtained by plotting the origin/destination points of the commuters on a map. However, this idea was dismissed because during the pilot survey, the respondents were reluctant to provide this information, which they deemed to be sensitive and personal. In addition, during the survey, low participation of female commuters was noticed, as they appeared to be uncomfortable interacting with the surveyors, likely because they did not wish to talk to strangers. The survey was conducted at the platform when the commuters were waiting for the train. The commuters deboarding the train were always in a hurry to exit the station, and it was difficult to engage them in the survey. A total of 1061 respondents were interviewed during the survey across the four stations (Karkardooma: 267, Dwarka Mor: 250, Lajpat Nagar: 286, Vaishali: 258). The responses for access and egress were then combined according to the stations.
An analysis of the last mile modes indicated that walking is the most preferred mode, followed by informal modes. Auto rickshaws, electric rickshaws, mini vans (gramin sevas), shared auto rickshaws and cycle rickshaws were considered under the category of informal modes. The shares of the bus and the private modes were low, and they were combined for ease in the analysis. The absence of bicycles as a last mile mode was noticed in the survey, which was also noted by Ann et al. [20]. The mode share for last mile connectivity is shown in Figure 3.

Rounding Problem of Reported Distance Data
The histograms and cumulative frequency graphs were studied to observe the heaping behavior of the reported distances for all stations. The reported distances were heaped at 100, 500, and 1000 m for walking, and at 500, 1000, and 5000 m for the bus and the private modes across all stations. The coarseness of the rounding increased with the distance. A histogram of walking distances for the Dwarka Mor station is shown in Figure 4. The heaping can be clearly seen at multiples of 100, 500 and 1000 m. The longer distances were commonly rounded to the nearest 1000 or 5000 m across modes. Compared to the raw, heaped reported distances, the imputed data can lead to a better analysis and interpretation, as illustrated by Ann et al. [20]. That study highlighted that when rounding is present, the imputed data give a better fit and statistically significant estimates for a distance decay analysis. The imputed data also yielded results for cases where the raw data could not. In the case of the ROC analysis, the imputed data gave a smoother curve and a unique Youden's index for each distance range. With heaping in the raw data, multiple distance ranges corresponded to the same Youden's index, and hence the threshold extended over multiple distance ranges. Therefore, this study also applied a heaping model to account for the heaping and performed MI to obtain an imputed dataset free of rounding bias. Appendix A describes the imputation process adopted for the reported distance data. The subsequent analysis was conducted using the imputed dataset.
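The heaping described above can be screened for with a simple count of reports that fall exactly on round values. The following is a minimal sketch, not the heaping model or imputation procedure of Appendix A; the function name `heaping_shares`, the choice of bases, and the sample distances are hypothetical and serve only as illustration.

```python
from collections import Counter


def heaping_shares(distances_m, bases=(1000, 500, 100)):
    """Share of reported distances that are exact multiples of each base.

    Each report is credited to the coarsest base that divides it, so a
    value of 1000 m counts toward 1000 only, not toward 500 or 100.
    Distances are assumed to be positive integers in metres.
    """
    counts = Counter()
    for d in distances_m:
        for base in sorted(bases, reverse=True):
            if d > 0 and d % base == 0:
                counts[base] += 1
                break
        else:
            # No base divides the report: treated as not heaped.
            counts["other"] += 1
    n = len(distances_m)
    return {key: counts[key] / n for key in list(bases) + ["other"]}


# Hypothetical reported walking distances in metres (not survey data).
reported = [100, 200, 500, 500, 1000, 1000, 1500, 750, 320, 2000]
shares = heaping_shares(reported)
for key, share in shares.items():
    print(key, round(share, 2))
```

A large combined share on the round bases, relative to what a smooth distribution would place there, is the pattern that motivates imputing the distances before fitting decay curves.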


Distance Decay Analysis
The exponential form of distance decay was proposed by Zhao et al. [21], El-Geneidy et al. [22], Hochmair [23] and Larsen et al. [27] to forecast travel demand. Compared to a buffer analysis, the exponential form of distance decay provides a better understanding of transit catchment areas because it allows the demand to vary with the distance. This method was adopted by Ann et al. [20] to estimate the influence areas for the metro stations in New Delhi, and the following equation was used:

$$y = e^{-\alpha d} \qquad (1)$$

where y is the percentage of passengers traveling longer than a particular distance d, and α is the exponential decay constant to be estimated. This function has a limitation in that it cannot reflect changes in the shape of the curve with the distance. In our study, the coverage curve declined gently for short distances, implying that people do not mind an increase in distance for short trips. However, it decreased rapidly over a certain range of distances, which means that an increase in distance in that range has a strong impact on commuters' perception. Finally, the curve decreased slowly with a long tail for long trips, as shown in Figure 5. This tendency was observed for all the stations studied.
Halas et al. [28] suggested a compound power exponential form of the distance decay with two parameters to investigate daily travel-to-work flows. The equation of this function can be written as:

$$y = e^{-\alpha d^{\beta}} \qquad (2)$$

where d is the distance from the center, and α and β are positive parameters. The function follows a bell-shaped curve that reflects the changes in shape with the distance. The curve is concave in the beginning and then changes to a convex shape. The parameter α indicates the variation in the interaction with distance, i.e., the extent of interaction, and β explains the perception of commuters at various distances, determining the shape of the curve. Therefore, this function is expected to reflect our data properly.
In this study, both Equations 1 and 2 were used to estimate the distance decay curves for the different stations with different modes. The estimates of the two decay functions for each station are presented in Table 1. The high t-statistic values for the parameter estimates signify satisfactory outcomes for the estimation. The correlation coefficient was close to one for all the categories in the compound power exponential function, indicating the closeness of the observed and estimated data. In the compound function, when the value of β is close to one, the function reduces to the simpler exponential form. All the estimated values of β are greater than 1, which indicates the limitation of the simple exponential function. In addition, the estimated β values differ among the four stations and among the various modes. Therefore, this function makes it possible to capture the differences in the effects of the station locations and the access modes on transit passengers.
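As an illustration of how the two parameters of Equation 2 can be recovered from a coverage curve, note that taking logarithms twice gives ln(-ln y) = ln α + β ln d, which is linear in ln d. The sketch below fits this line by ordinary least squares. It is a simplification under stated assumptions, not the estimation procedure used for Table 1 (which is not described in this section); the function name, the synthetic data, and the use of kilometres are hypothetical.

```python
import math


def fit_compound_power_exponential(distances, shares):
    """Estimate (alpha, beta) in y = exp(-alpha * d**beta).

    Uses the linearization ln(-ln y) = ln(alpha) + beta * ln(d) and
    ordinary least squares. Points with d <= 0 or y outside (0, 1)
    are dropped because the transform is undefined there.
    """
    pts = [
        (math.log(d), math.log(-math.log(y)))
        for d, y in zip(distances, shares)
        if d > 0 and 0 < y < 1
    ]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_v = sum(v for _, v in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxv = sum((x - mean_x) * (v - mean_v) for x, v in pts)
    beta = sxv / sxx                      # slope of the fitted line
    alpha = math.exp(mean_v - beta * mean_x)  # exp(intercept)
    return alpha, beta


# Synthetic, noise-free coverage curve (hypothetical parameters, d in km).
true_alpha, true_beta = 1.2, 1.5
d_km = [0.2, 0.4, 0.6, 0.8, 1.0, 1.5, 2.0]
y = [math.exp(-true_alpha * d ** true_beta) for d in d_km]

alpha_hat, beta_hat = fit_compound_power_exponential(d_km, y)
print(alpha_hat, beta_hat)
```

Setting β = 1 in the same function gives the simple exponential form of Equation 1, so a fitted β well above 1 is a direct indication that the single-parameter curve is too restrictive. With noisy data the log-log transform reweights the observations, so a nonlinear least-squares fit on the original scale may give different estimates.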
Comparing the goodness of fit of Model 1 (the exponential function, Equation 1) and Model 2 (the compound power exponential function, Equation 2), the correlation coefficient is close to 1 for both models and the residual standard error is also very small (close to zero). However, Model 2 gives a better fit than Model 1 with respect to the shape of the distribution of the imputed data. As an example, the distance decay curves estimated for the Lajpat Nagar station considering the informal mode are shown in Figure 6. The compound power exponential curve closely follows the distribution of the imputed data, whereas the exponential curve takes a simple form that does not reflect the observed data. This phenomenon was witnessed across the three modes.
Because Vaishali is an end station outside the city limits, it can be assumed that commuters rely on informal modes to access the metro station from long distances, so informal modes play a more prominent role around this station. In the case of the bus and private modes, Dwarka Mor and Vaishali show gradual decay and longer distances than the other two stations, highlighting the preference for these modes over long distances.
The compound power exponential decay function was used to estimate the influence areas based on different percentiles. The mean, median, and percentile distances were estimated accordingly. Tables 2-4 summarize the estimated travel distances pertaining to walking, the informal modes, and the bus and the private modes, respectively. The distances were rounded to the nearest 100 m so that they can serve as references for TOD planning. The statistical summary of the imputed data was compared with the estimates obtained using the distance decay function. The mean and median estimated from the distance decay function were slightly larger than those from the statistical summary.
The mean, median, and percentile walking distances were comparable for all the stations. The decay function estimation provided a mean walking distance of 800 m for Karkardooma, Lajpat Nagar and Dwarka Mor and 900 m for Vaishali. The mean walking distance of 800 m for Karkardooma, Dwarka Mor and Lajpat Nagar is 14% higher than the 700 m estimated for access trips by Ann et al. [20] for the DMR network.
The 85th percentile distance, used to define the catchment areas for the transit stations, was 1200, 1200, 1100 and 1300 m for Karkardooma, Dwarka Mor, Lajpat Nagar, and Vaishali, respectively. These 85th percentile walking distances were in agreement with the value of 1200-1400 m estimated by Ann et al. [20], with Lajpat Nagar exhibiting a slightly smaller distance. Compared to these distances, the influence area (500-800 m) specified in the National TOD policy is extremely conservative.

The mean distances travelled by informal modes for Karkardooma, Dwarka Mor and Lajpat Nagar were 2300, 3100, and 2800 m, respectively, whereas the mean distance suggested by Ann et al. [20] for informal modes was 2900 m. For the outer city station, Vaishali, the mean distance was as much as 59% higher than that estimated for New Delhi. The 85th percentile distance determined for the informal modes in the previous study was 5600 m. In this study, the 85th percentile distances for the three stations within the city boundaries (Karkardooma: 3300 m, Dwarka Mor: 4000 m, and Lajpat Nagar: 3700 m) were considerably smaller than that estimated for the DMR network, although for Vaishali, the distance was 30% higher than the distance estimated for the regions within New Delhi.

The variations in the mean and the 85th percentile distances were more evident when the distances for the bus and private modes were examined. For Karkardooma, Dwarka Mor, Lajpat Nagar and Vaishali, the estimated mean distances were 4700, 7800, 5000, and 7800 m, respectively, and the estimated 85th percentile distances were 6800, 11,200, 6700, and 10,700 m, respectively. Dwarka Mor (subcity station) and Vaishali (outer city station) exhibited larger mean and 85th percentile values compared to the other two stations situated in the core urban areas of New Delhi.
As these stations are far from the city center and the metro is one of the easiest ways to reach other parts of the city, commuters are willing to travel farther by bus to these stations than to the other stations. Vaishali and Dwarka Mor reported a 26% increase in the mean distance relative to the DMR network. The 80th and 85th percentile estimates for Vaishali and Dwarka Mor were smaller than the estimate for the DMR (access).
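To illustrate how a percentile distance can be read off a fitted decay curve, the sketch below assumes the compound power exponential form f(d) = exp(-a·d^b) and interprets f(d) as the share of commuters travelling farther than d. Both the functional form and this interpretation are assumptions made here for illustration, and the parameter values `a` and `b` are hypothetical, not the fitted values behind Tables 2-4.

```python
import math

def decay(d, a, b):
    """Assumed compound power exponential decay: share of trips longer than d."""
    return math.exp(-a * d ** b)

def percentile_distance(p, a, b):
    """Distance d at which a share p of trips is covered, i.e. decay(d) = 1 - p."""
    return (-math.log(1.0 - p) / a) ** (1.0 / b)

def round_to_100(d):
    """Round a distance in metres to the nearest 100 m (halves round up)."""
    return int(100 * math.floor(d / 100 + 0.5))

# Hypothetical parameters, chosen only for illustration.
a, b = 1.0e-6, 2.0
d85 = percentile_distance(0.85, a, b)   # 85th percentile distance in metres
d85_rounded = round_to_100(d85)         # rounded as in the reported tables
```

Under these assumptions, inverting the curve gives each percentile in closed form, so the mean, median, and percentile distances can all be derived from the same two fitted parameters.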

ROC Analysis
Adopted from the field of medicine, the ROC analysis has been applied in the field of transportation studies. The ROC curves have been used to estimate the threshold distances walked or cycled by students to a school or university [29][30][31]. This method provides a simple yet effective approach to estimate the threshold distances by comparing the number of active users (people who walk) versus the number of passive users (commuters who use other modes) for different distance ranges. In this research, the threshold walking distance for each station was estimated by taking into account the tradeoff between the true positive rate (sensitivity) and the false positive rate (1 − specificity) across a series of distance ranges by using a ROC analysis. For each distance range, the active users were the commuters who walked to access transit stations, and the passive users were commuters who used informal modes, buses, and private modes to access transit. The ROC curves for each station are shown in Figure 8. The threshold distance is calculated using Youden's index, which is described as the maximum vertical distance from the ROC curve to the diagonal running from the lower left corner to the top right corner of the graph. However, it was not possible to estimate the threshold distance of the informal, bus and private modes by using the ROC curve, since the false positive rates for long distance travel with those modes could not be obtained.
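A minimal sketch of this threshold search is given below. It assumes that walking is coded as the positive class and that a commuter is predicted to walk when the access distance is at or below the cutoff; the sign of the resulting index depends on this coding. The function name `youden_threshold` and the sample data are hypothetical and are not the survey data.

```python
def youden_threshold(distances, walked, cutoffs):
    """Return (cutoff, J) maximising J = sensitivity - false positive rate."""
    positives = sum(walked)                 # commuters who walked (active users)
    negatives = len(walked) - positives     # commuters on other modes (passive users)
    best_cutoff, best_j = None, float("-inf")
    for c in cutoffs:
        # Walkers within the cutoff are true positives,
        # non-walkers within the cutoff are false positives.
        tp = sum(1 for d, w in zip(distances, walked) if w and d <= c)
        fp = sum(1 for d, w in zip(distances, walked) if not w and d <= c)
        j = tp / positives - fp / negatives
        if j > best_j:                      # keep the first cutoff reaching the maximum
            best_cutoff, best_j = c, j
    return best_cutoff, best_j

# Hypothetical access distances (m) and whether each commuter walked.
distances = [200, 400, 600, 800, 1000, 1200, 1500, 2000, 3000, 5000]
walked = [True, True, True, True, True, False, True, False, False, False]
cutoff, j = youden_threshold(distances, walked, range(100, 5100, 100))
```

Scanning a grid of candidate cutoffs mirrors the "series of distance ranges" described above; a finer grid gives a finer threshold at the cost of more evaluations.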
As shown in Table 5, Youden's index was calculated as −0.575, −0.798, −0.767, and −0.763 for Karkardooma, Dwarka Mor, Lajpat Nagar, and Vaishali, respectively. The area under the curve (AUC) values for the curves were approximately 1, and thus, the analysis can be considered effective to obtain the threshold values. The estimation results are presented in Table 5. The threshold walking distance for Dwarka Mor and Vaishali was 1300 m, and for Karkardooma and Lajpat Nagar, it was 1200 m. These values are located between the 80th and 85th percentile values of the decay analysis shown in the previous section. This is consistent with the results of Ann et al. [20].
The results for the threshold distances indicate that commuters are willing to walk similar distances to all stations. The mean distances walked to the stations are also comparable to each other. Although the mean distances travelled by modes other than walking differed among the stations, the threshold distances were not affected, as the maximum distances for walking were 2500 m for Karkardooma, Lajpat Nagar, and Dwarka Mor and 3500 m for Vaishali. Ker and Ginn [32] implied that walking distances in urban areas are larger than those walked to the stations in suburban areas, as demonstrated in the case of Perth. However, such a conclusion cannot be drawn from the cases considered in the present study. Further research needs to be conducted with more types of stations and more samples in each station type to enable the derivation of conclusive remarks.

Conclusions
In this study, the focus was on the last mile distances travelled to individual stations of the DMR network in New Delhi. The objective was to compare the last mile distances travelled on different modes among stations, and to establish the TOD influence zones for the metro stations. The results are aimed at influencing the TOD policy in India and helping create TOD policies that are suited to the urban and transport characteristics in India.
In the primary survey carried out for the study, the issues of rounding and heaping were observed, highlighting the issues in the data collection for transportation studies in India, where there is already a dearth of sufficient data. The potential bias in the results of the estimation was removed by creating an imputed dataset, which was subsequently used to perform a distance decay analysis and a ROC analysis for determining the extent of the TOD influence areas.
The bell-shaped curve of the compound power exponential form of distance decay was found to be reliable for investigating the decreasing interaction between the distance from the stations and the percentage coverage of passengers. The decay function estimates indicated that the extent of the TOD influence area varies with the access mode as well as with the location of the station. The mean and the percentile values of the travel distances increase in the order of walking, the informal modes, and the bus and private modes.
For walking, the difference among stations was not significant, implying that the willingness to walk does not vary much with the location of stations. Vaishali, the outer city station, exhibited slightly higher distances than the other three stations. The mean walking distance for Karkardooma, Dwarka Mor and Lajpat Nagar was 14% higher than the mean estimated for access trips by Ann et al. [20] for the DMR network, and the outer station, Vaishali, showed a 29% increase. The threshold distances estimated using the ROC analysis were in agreement with the 80th-85th percentile distances for walking. The threshold walking distance for the four stations lies in the range of 1200-1300 m, which is close to the result of the general case across all stations in New Delhi [20]. These distances also indicate that the size of the influence area (500-800 m) specified in the National TOD policy and the Master Plan for Delhi 2021 is extremely conservative.
In the case of informal modes, there was considerable variation among stations. For Vaishali, which is an outer station, the distance was nearly twice that for the other three stations. For Vaishali, the mean distance was as much as 59% higher than that estimated for New Delhi, whereas the 85th percentile distance was 30% higher. This suggests that people who live outside the city usually travel longer distances on informal modes to reach stations compared to those who live inside the city.
This large variation was also noted when comparing the last mile distances for the bus and private modes. Dwarka Mor (subcity station) and Vaishali (outer city station) corresponded to larger distances compared to those of the other two stations situated in the core urban areas of New Delhi. The mean distances and the 85th percentile distances for Dwarka Mor and Vaishali are nearly twice as much as those for Karkardooma and Lajpat Nagar. Vaishali and Dwarka Mor also reported a 26% increase in the mean distance relative to the DMR network. The 80th and 85th percentile estimates for Vaishali and Dwarka Mor were smaller than the estimate for the DMR (access) [20].
It can thus be concluded that variations are present in the last mile distances among stations. Although the walking distances did not vary considerably among stations, large variations were observed when other modes were compared. The outer city station, Vaishali, exhibited longer distances for informal modes, buses, and private modes, which illustrates that commuters accessing such metro stations tend to travel longer distances on motorized modes. Therefore, when considering multimodal accessibility and multimode-based TOD, these differences in accessibility must be taken into account.
The study provides insights into the last mile patterns for the four DMR stations, and the extent of the influence area for each mode was calculated. The results are not the same across the selected stations; however, they are not considerably different either. Further research needs to be conducted across more station types to arrive at a conclusive remark regarding the size of the influence areas for specific station types.
The estimates of the influence areas for each station can be used to delineate the influence areas for these stations in New Delhi, enabling planners and policy makers to cater to a larger population and maximize the benefits of TOD. The influence areas based on walking should be at least 1200 m based on this study and Ann et al. [20]. These estimates can be used for a brownfield development (existing stations) and are based on the last mile patterns of existing commuters. Given that informal modes play a very important role in last mile connectivity, the inclusion of informal modes increases the size of the influence areas considerably as shown in this study. The findings from this study can be applied in other cities in India as well as in other developing countries with the presence of informal modes as a preliminary guideline to assess the influence areas and to understand that these areas can extend further than 500-800 m. The methodology can be applied to last mile distances of transit systems and to individual stations. The MI need be applied only in cases where the respondents tend to round the distances.
The cases of rounding in the reported distance data for the four stations are presented in Table A1. The reported distances were heaped at 100, 500, and 1000 m for walking, and at 500, 1000, and 5000 m for bus and private modes across all stations. For informal transport, the rounding was observed at four levels, i.e., 100, 500, 1000, and 5000 m for two stations, Karkardooma and Dwarka Mor. For the other two stations, the rounding was observed at 500, 1000, and 5000 m.

Yamamoto et al. [26] encountered the issue of rounding in the reported vehicle kilometers travelled in their study, and a heaping model was used to account for the rounding and heaping errors. The model takes the form of a discrete mixture of an ordered probit model. Ann et al. [20] used the same heaping model to account for rounding and heaping issues in last mile distances, which followed a log normal distribution. Higher coarseness was observed at larger distances in both studies. The present study draws from the two preceding studies [20,26].

The first part of the model uses a distance function to articulate the distribution of the heaping data, as defined in Equation (A1), where $y_i^*$ is the actual distance of individual $i$, and $y_i$ is the reported distance. $\beta$ is a parameter vector, $x_i$ is a vector of explanatory variables, and $\varepsilon_i$ is a random variable that follows a normal distribution with a mean of 0 and variance of $\sigma_\varepsilon^2$. Three modes were considered: walking, informal modes, and bus and private modes, and for each mode, $x_i$ was treated differently. Based on the cases of rounding presented in Table A1, different categories of rounding were considered. The walking distances were most likely to be rounded to multiples of 100, 500, and 1000 m at all four stations. The distances travelled by informal transport were assumed to be rounded to multiples of 500, 1000, and 5000 m for the Lajpat Nagar and Vaishali stations.
However, for Karkardooma and Dwarka Mor, the rounding was considered to occur at multiples of 100, 500, 1000, and 5000 m. As mentioned earlier, bus and private modes were treated as a single category owing to the small sample size, and the corresponding rounding was considered at multiples of 500, 1000, and 5000 m. Considering these rounding ranges, the actual distance $y_i^*$ was expected to lie in the ranges $[y_i - 50, y_i + 50]$, $[y_i - 250, y_i + 250]$, $[y_i - 500, y_i + 500]$, and $[y_i - 2500, y_i + 2500]$, respectively, if the reported distance was rounded to multiples of 100, 500, 1000, and 5000 m.
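The interval rule above can be sketched as follows. The half-widths are taken from the text; the function name `candidate_intervals` and the idea of listing every level a reported value is compatible with are illustrative assumptions, since a reported 1000 m could have been rounded at the 100, 500, or 1000 m level.

```python
# Half-widths (m) of the plausible interval for each rounding level, as stated in the text.
HALF_WIDTH = {100: 50, 500: 250, 1000: 500, 5000: 2500}

def candidate_intervals(reported, levels=(100, 500, 1000, 5000)):
    """Map each rounding level the reported distance is a multiple of
    to the interval (low, high) in which the actual distance may lie."""
    intervals = {}
    for level in levels:
        if reported % level == 0:
            half = HALF_WIDTH[level]
            intervals[level] = (reported - half, reported + half)
    return intervals

# A reported 1000 m is compatible with three rounding levels;
# a reported 1300 m can only have been rounded to the nearest 100 m.
walk_1000 = candidate_intervals(1000)
walk_1300 = candidate_intervals(1300)
# Bus and private modes use only the 500, 1000, and 5000 m levels.
bus_5000 = candidate_intervals(5000, levels=(500, 1000, 5000))
```

Which of the compatible levels actually applies to a given respondent is not observed, which is why the coarseness is treated as a latent variable in the model that follows.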
The latent variable, which indicates the coarseness of the reported distance, is a function of the actual distance, and the coarseness function can be defined as in Equation (A2), where $z_i^*$ denotes the unobserved tendency of the coarseness of the reported data, $\alpha$ and $\gamma$ are parameters, and $\zeta_i$ is a normally distributed random variable with a mean of 0 and variance of $\sigma_\zeta^2$. $X_i$ is the socioeconomic parameter. The coarseness of the reported data, $z_i$, can be discretized as given in Equation (A3), considering only $\theta_1$ if three cases of rounding are present, and considering two threshold values $\theta_1$ and $\theta_2$ if four cases are present.
In the ordered response model, the reported distances can be assigned to specific rounding categories based on the known coarseness levels. Specifically, if $z_i = 1$, the distance is assumed to be rounded to the nearest 100 m. Similarly, if $z_i = 2, 3, 4$, the reported distance is assumed to be rounded to the nearest multiple of 500, 1000, and 5000 m, respectively. $(\ln y_i^*, z_i^*)$ is assumed to follow a bivariate normal distribution with mean and covariance as given in Equations (A4) and (A5), respectively. The parameters are estimated by the maximum likelihood (ML) method; the log-likelihood function is given by Equation (A7),
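As an illustration of how two latent equations of this kind combine into a bivariate normal, consider a generic specification consistent with the symbols defined above. The form below is an assumption made for illustration only (in particular, the independence of $\varepsilon_i$ and $\zeta_i$ and the omission of $X_i$) and may differ from Equations (A1)-(A5).

```latex
% Assumed generic form, not necessarily the authors' exact equations
\ln y_i^* = \beta' x_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma_\varepsilon^2)
z_i^* = \gamma + \alpha \ln y_i^* + \zeta_i, \qquad \zeta_i \sim N(0, \sigma_\zeta^2)

% If \varepsilon_i and \zeta_i are independent, (\ln y_i^*, z_i^*) is bivariate normal with
E = \begin{pmatrix} \beta' x_i \\ \gamma + \alpha \beta' x_i \end{pmatrix}, \qquad
V = \begin{pmatrix}
      \sigma_\varepsilon^2        & \alpha \sigma_\varepsilon^2 \\
      \alpha \sigma_\varepsilon^2 & \alpha^2 \sigma_\varepsilon^2 + \sigma_\zeta^2
    \end{pmatrix}
```

Under this assumed form, a positive $\alpha$ makes the two latent variables positively correlated, which corresponds to coarser rounding at longer distances.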
where $f(\ln y_i^*, z_i^*)$ is the bivariate normal density with mean $E$ and covariance $V$. The estimation results of the bivariate ordered response probit model are presented in Table A2. The estimates have high statistical significance. Variables such as age, gender, income, and vehicle ownership were not found to be significant in explaining the reported distance or the coarseness. The log-distance coefficient, $\alpha$, was positive for all modes across the four stations, signifying that coarseness increases with travel distance. Comparing across stations, $\alpha$ was the smallest for Vaishali station for all modes, which signifies that commuters from Vaishali had the least tendency to round travel distances. The estimate of $\alpha$ for bus and private modes for Dwarka Mor was comparable to that for Karkardooma. It may be said that for these two stations, the coarseness of the last mile distances travelled by bus increased less with distance than for other modes and other stations. The coarseness function constant, $\gamma$, was also comparatively lower for Vaishali, implying less rounding in the last mile distances reported in Vaishali. Considering the distance function, $\beta$ signifies the distances travelled for each mode. The estimates of $\beta$ were the highest for Vaishali followed by Dwarka Mor, i.e., stations far away from the city center had longer last mile distances for all modes compared to stations nearer to the city center.

Figure A1. Histogram of distances travelled on informal modes for Karkardooma station (before imputation).

The distance intervals of the true distance $y_i^*$ were determined using the random heaping model described in the previous section. The true distance can be obtained using the reported distance values and the estimated parameters of the heaping model, as presented in Table A1. The simple rejection sampling approach, based on the work of Heitjan and Rubin [24], Drechsler and Kiesl [25], and Ann et al. [20], is used to conduct the imputation process.
In this process, the estimated parameters of the heaping model and the fixed observed data were used to derive values of $(\ln y_i^*, z_i^*)$ for each individual $i = 1, \ldots, n$, confining $(\ln y_i^*, z_i^*)$ to the plausible region defined by Equations (A3) and (A6) for each mode.
The candidate values were derived 1000 times from a truncated bivariate normal distribution and tested for boundary conditions to obtain an imputed dataset with 1000 points. Figures A1 and A2 display the relative distribution of the distance data before and after imputation, respectively, for informal modes at Karkardooma. It can be seen from these graphs that using the MI process, the shape of the distribution is maintained, and the heaping of the data is eliminated. Similar results were obtained for all stations for all modes.
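A simplified sketch of such a rejection step is shown below. It differs from the procedure described above in that candidates are drawn from an untruncated bivariate normal and then rejected, rather than drawn from a truncated one. The latent equations, the cut points in `THRESHOLDS`, the use of a single mean `mu` in place of the explanatory variables, and all parameter values are hypothetical and are not the estimates in Table A2.

```python
import math
import random

LEVELS = (100, 500, 1000, 5000)   # rounding levels (m) named in the text
THRESHOLDS = (0.0, 1.0, 2.0)      # hypothetical cut points between the four levels

def rounding_level(z_star):
    """Map the latent coarseness z* to a rounding level via ordered cut points."""
    for level, cut in zip(LEVELS, THRESHOLDS):
        if z_star <= cut:
            return level
    return LEVELS[-1]

def impute_distance(reported, mu, sigma_eps, gamma, alpha, sigma_zeta, rng,
                    max_tries=100_000):
    """Draw one plausible true distance for a reported (heaped) distance."""
    for _ in range(max_tries):
        ln_y = rng.gauss(mu, sigma_eps)                              # candidate ln(y*)
        z_star = gamma + alpha * ln_y + rng.gauss(0.0, sigma_zeta)   # candidate z*
        level = rounding_level(z_star)
        y_star = math.exp(ln_y)
        # Accept only if rounding y* at this level reproduces the reported value.
        if level * math.floor(y_star / level + 0.5) == reported:
            return y_star
    raise RuntimeError("no candidate accepted; check the parameters")

# Hypothetical parameters; impute 200 values for a respondent who reported 1000 m.
rng = random.Random(0)
draws = [impute_distance(1000, math.log(1000), 0.8, -6.0, 1.0, 0.5, rng)
         for _ in range(200)]
```

Every accepted draw is consistent with the reported value under some rounding level, so the imputed values spread over the plausible interval instead of piling up at 1000 m.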