Exploring the Impact of Built Environment Factors on the Relationships between Bike Sharing and Public Transportation: A Case Study of New York

Abstract: Bike sharing offers a usable form of feeder transportation for connecting to public transportation and effectively meets unmet travel demands, alleviating the pressure on public transportation systems by diverting urban commuters. To advance the comprehension of how the built environment shapes the relationship between bike-sharing systems and public transport modes, we implement a categorization framework that divides bike-sharing data into three distinct patterns: competition, integration, and complementation, based on their coordination with public transportation. A spatial lag model (SLM) is employed to investigate the complex correlations between the relationship patterns and four key groups of environmental factors encompassing land use, transportation systems, urban design, and social economy. We find a strong correlation between the four groups of environmental factors and the three relationship patterns. Furthermore, the built environment variables exhibit significant variations across the three patterns. Users in the competitive mode prefer the flexibility of shared bikes and place a higher value on the sightseeing and leisure benefits. In contrast, users in the integration and complementation modes tend to prefer shared bikes to supplement unmet travel demand and place a higher value on commuting benefits. These findings can benefit urban planners seeking to encourage greater diversity in transportation modes and incentivize more commuting.


Introduction
Bike-sharing systems offer a sustainable [1] and rapidly expanding transportation option [2], allowing individuals to rent bikes for short-distance trips [3]. Previous research has recognized the numerous benefits of bike sharing, such as its convenience [4], cost-effectiveness [5], and promotion of healthy lifestyles through increased physical activity [6]. Bike sharing can also reduce the use of motor vehicles, decreasing road congestion and saving road space [7]. Compared to privately owned bikes, shared bikes are used on demand, eliminating concerns related to maintenance costs and liability [8].
The association between bike sharing and public transportation is complex. Bike sharing can function as a convenient feeder mode that connects to and integrates with public transportation systems, while simultaneously competing with them by offering commuters a viable alternative. There are three types of usage patterns: competition, integration, and complementation [9]. Integrating bike sharing and public transportation systems is considered an essential method for improving public transportation efficiency [10], particularly where transfer stations are in close proximity to where a person lives or works [11]. Martens et al. [12] proposed that "improving public transportation integration" could be one of the most critical paths for the future of bike sharing. Conversely, bike sharing competes with public transportation during specific periods. For example, a study in major cities in the United States emphasized that bike sharing may have a negative effect on public transportation [9]. Therefore, the substitution effect accompanies the integration effect and cannot be ignored [13].
According to recent studies, the built environment plays an essential role in determining the frequency of bike-sharing usage, particularly in urban areas. The layout and infrastructure of a city's physical environment strongly influence the accessibility, safety, and convenience of bike sharing, which in turn affect the usage patterns of this mode of transportation [14,15]. For instance, residential and commercial areas generally account for a significant percentage of bike-sharing commute trips [16]. In addition, a study conducted on the Vélo'v program found that leisure activities, such as dining out, going to the cinema, or visiting recreational places, primarily take place in urban centers [17]. Ni and Chen [18] found that residential and office areas have a higher demand for bike sharing and metro integration, while Guo et al. [19] found a positive correlation between land use mixture and integration. Despite extensive research on the impact of the built environment on bike sharing, a number of areas still require further investigation and analysis. Firstly, the understanding of the impacts of built environment factors on the connections between bike sharing and public transportation is limited. Secondly, the relationship between bike sharing and public transportation has not been thoroughly explored. It is unclear whether bike sharing is meeting previously unmet travel needs or diverting commuters away from public transport.
To address the research gaps identified above, the objective of this research is to explore the impact of built environment factors on the relationships between bike sharing and public transportation by investigating the case of New York. It is important to note that the aim of this study is not to establish a precise causal relationship between shared-bike riding patterns and variables related to the built environment, nor is it to develop a predictive model for shared-bike riding patterns based on built environment variables. Instead, the focus is on analyzing the variations in correlation between different shared-bike modes and a diverse range of built environment variables.
Based on the literature review, this study enriches the existing research in three dimensions. Firstly, this study enhances the understanding of the built environment patterns associated with the transfer between public transportation and bike sharing. The built environment comprises three essential components: land use, transportation systems, and urban design [20]. Nonetheless, the contribution of urban design factors as explanatory variables has been disregarded. In this study, a more comprehensive definition of the built environment is described by utilizing multisource data covering land use, transportation systems, urban design, and social economy. In addition, compared to previous studies, we have implemented a novel approach of categorizing bike-sharing data for regression analysis, aiming to enhance the accuracy of the obtained results.
Secondly, the relationship between bike sharing and public transportation is further revealed in New York using a systematic analytical method. Numerous studies have examined the correlation between public transportation and bike usage, yet no consensus has been reached on this matter. Based on the classification method proposed by Cui et al. [9], this study developed a more sophisticated classification algorithm using the Python programming language, which improved the accuracy of classifying the three distinctive patterns of competition, integration, and complementation.
Thirdly, this study contributes to future research by proposing a quantitative research framework that classifies bike-sharing data and builds a regression model with the built environment. To promote future research and the sharing of this innovation, the source data and research code are available at "https://github.com/tingfeng6/CITIBIKE-DATACLASSIFICATION (accessed on 17 July 2023)".

Literature Review
Since the aim of this study is to explore the impact of built environment factors on the relationships between bike sharing and public transportation, this section reviews previous literature from the following three aspects: (1) the effects of the built environment on bike sharing; (2) the relationship between bike sharing and public transportation; (3) modeling approaches for the relationship between bike sharing and the built environment.

The Effects of the Built Environment on Bike Sharing
Scholars and researchers have shown increasing interest in understanding the connection between shared-bike activity and the built environment [21]. According to Handy et al.'s definition [20], the built environment consists of three parts: land use, transportation systems, and urban design. In this study, we extend the dimensions of the urban built environment by adding social economy aspects. Therefore, this paper presents a review of the connection focusing on land use, transportation systems, urban design, and social economy.
The utilization patterns of shared bikes can vary depending on the land use characteristics of the areas where bike stations are located. Research has found that the presence of shopping malls is positively associated with overall bike-sharing usage [22][23][24][25]. Faghih-Imani et al. pointed out that recreational points of interest (POI) are significant contributors to high volumes of shared-bike ridership [26]. Wang et al. found that the closer the distance between bike stations and lakes, rivers, and parks, the greater the passenger capacity of shared bikes parked there [17]. Meanwhile, residential and commercial land use usually generates much shared-bike commuting [16].
In addition, many academic papers have analyzed the interconnectedness of bike sharing and transportation systems. In general, there is a positive correlation between the number of nearby bike stations and the usage of bike-sharing services [21]. Ni and Chen [18] found that in both the United States and China, bus stops within a radius of 400 to 500 m from metro stations impact the combined utilization of bike sharing and metro services. Similar findings were reported by Guo and He [27] in China. According to research conducted in the United States [28] and China [29], road density plays a crucial role in bike-sharing usage. Within high-density road areas, pedestrians have easy and quick access to many bikes within walking distance.
Research evidence suggests that urban design, including density and compactness, as well as the scale of the city, can impact bike usage [27,30]. For example, individuals may be more likely to use motorized transportation modes in modern cities with sprawling urban designs due to longer travel distances. On the other hand, compact urban areas are often associated with higher levels of active transportation modes such as walking and cycling. This is partly because shorter travel distances between origin and destination within compact urban areas make active transportation modes more convenient and attractive than motorized options [31].
The social economy conditions of an entire region can also impact the demand for bike-sharing services. Multiple studies conducted in the U.S. [32] and China [25,29,33] have demonstrated that population density has a significant positive impact on bike-sharing usage. However, a Polish study by Radzimski and Dzięcielski [24] found that population density has a negative impact on longer trips. The impact of population density on bike-sharing usage appears inconsistent across various case studies, likely because of the nonlinear influence of density on travel behavior [34]. Regarding employment density, studies conducted in Canada [35] and China [29] indicate that this variable is positively associated with various factors related to bike-sharing services, such as the destination choice behavior of bike-sharing users, the demand for bike-sharing services, and the usage of bike-sharing services among immigrant populations.

The Relationship between Bike Sharing and Public Transportation
There are many studies focusing on the relationship between public transportation and bike usage. However, the conclusion on this point remains debatable. Some researchers argued that there is a significant level of integration between bike sharing and public transportation. For example, the findings of a Washington D.C. case study indicate that a 10% rise in bike trips results in a 2.8% increase in ridership for public transportation [36]. Some researchers found that areas with more bus or subway stops are typically associated with higher usage rates of shared bikes [37].
In contrast, bike-sharing services may compete with public transportation under certain circumstances. A study conducted across major cities in the United States indicates that bike sharing could decrease the number of individuals using buses as their mode of transportation [38]. Campbell and Brakewood's research discovered a substantial decrease in the number of individuals using buses upon the expansion of NYC's bike-sharing systems [39]. Guo and He [27] conducted research in Shenzhen, revealing that an increase in the number of metro stations does not necessarily correlate with the promotion of bike-metro integrated usage. One possible explanation for this is that the metro system is relatively advanced, and the density of metro stations is high in contemporary Chinese cities. Therefore, people can readily access these metro stations by walking instead of cycling, rendering the use of bikes for accessing metro stations redundant.

Modeling Approach of the Relationship between Bike Sharing and Built Environment
Table 1 presents a comprehensive summary of pertinent studies investigating shared-bike systems that have employed spatial regression models. The table contains details regarding the data utilized, criteria for model selection, independent and dependent variables examined, as well as the methodology employed for model validation. As shown in Table 1, many researchers conducted their studies based on big data (8 out of 14) because big data improve the efficiency of data collection and processing at the city level. As for the independent and dependent variables, 7 of 14 studies used ridership or usage rate as the target variable, indicating the wide applicability of this variable for studying travel-related issues. However, these studies have primarily focused on the utilization of shared bikes in isolation, neglecting the intricate and multifaceted relationship between shared bikes and public transportation systems. Transportation systems and social economy variables were the most frequently used types of explanatory variables and were considered in 12 out of 14 studies. Urban design factors were hardly considered as explanatory variables. In terms of the models and verification methods, the geographically weighted regression (GWR) model and the spatial lag model (SLM) were used frequently in existing research because the influence of space-time factors can be fully considered by these models. In contrast, machine-learning models such as random parameters were less used by researchers because these models cannot account for geospatial autocorrelation and spatial heterogeneity. Moran's I has traditionally been a widely employed statistical measure for assessing the presence of spatial autocorrelation among variables. R² functioned as a suitable measure of adequacy for ordinary least squares (OLS) regressions, while the p-value is frequently employed to analyze the associations among variables (11 out of 14).

Research Framework
The three major stages of the research methodology for the present study are presented in Figure 1.
In the first step, bike-sharing data are collected from the Citi Bike website, the Google API is called, and Python code is written to divide the raw data into three categories: competition, integration, and complementation. In the second step, multiple built environment variables are collected and divided into four categories: land use, transportation systems, urban design, and social economy to ensure the comprehensiveness of the data. Since different data have different dimensions and ranges, STATA is used to standardize the data for the convenience of subsequent data analysis. The third step is to analyze the impact of built environment variables on competition, integration, and complementation. The study area is evenly covered with a square grid with a side length of 500 m, and the sum of the values of the variables contained in each grid is calculated. Moran's I is used to analyze spatial autocorrelation, and the LM test establishes the SLM as a suitable model. The results are analyzed from multiple perspectives, including descriptive statistical analysis, OLS regression, and SLM regression, to ensure the validity of the results. Finally, the analysis results provide suggestions for urban design.
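The standardization in the second step can be illustrated with a minimal sketch. The study performs this step in STATA and does not state the exact transformation, so z-score standardization with the population standard deviation is an assumption here; the function name and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores for a list of numbers (assumed transformation)."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        # A constant variable carries no variation; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical counts of bus stops in five grid cells.
bus_stops_per_cell = [0, 2, 4, 6, 8]
z_scores = standardize(bus_stops_per_cell)
```

After this step every variable has zero mean and unit spread, so regression coefficients for variables measured in different units can be compared on a common scale.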

Study Area
New York City, the most populous city in the United States, occupies an area of 302 square miles and spans five county-level administrative districts. Due to the high cost of living in Manhattan, approximately one-third of daily commutes occur between different administrative districts. In contrast to typical car-oriented American cities, less than 10% of trips are made by private cars, owing to the existing public transportation-oriented infrastructure and policies. Thus, commuters in New York City heavily rely on the Metropolitan Transportation Authority (MTA), which includes buses, subways, and commuter railroads. With the successful development of the sharing economy, bike sharing has become increasingly popular. As the dominant dock-based system in the greater New York City area, Citi Bike provides service coverage with more than 13,000 bikes and over 850 stations in the city core. Given that Citi Bike stations are relatively concentrated in Manhattan and its surrounding areas, we select 19 communities with relatively evenly distributed station locations as the study objects to ensure the validity of the experimental results. Figure 2 shows the distribution of Citi Bike docking stations and public transportation stations in the study area.

Selection and Processing of Variables
Previous studies have investigated the relationship between the built environment and bike usage using diverse data types that vary according to context. Early research relied on surveys to gather information on the characteristics of bike users and factors in the built environment that may affect their transportation decisions [48]. More recently, scholars in Europe and North America have shared bike-sharing data online to foster research on these programs. The large amount of data available has provided a foundation for many empirical studies on bike sharing. In this study, we extract user data from Citi Bike "https://citibikenyc.com (accessed on 10 May 2020)" and classify it as the dependent variable. Additionally, we use a range of open geospatial data as independent variables to enhance the model's explanatory power. Employing QGIS tools, the study area is divided into grids with a side length of 500 m. Statistical measures of the three travel modes and explanatory variables for each grid are computed.
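The grid aggregation can be sketched as follows. The study performs this step in QGIS; the Python version below is only an illustration, and it assumes that point coordinates are already projected to metres. The function names and the sample points are hypothetical.

```python
import math
from collections import defaultdict

CELL_SIZE_M = 500.0  # side length of one grid cell, as stated in the text


def cell_index(x_m, y_m, cell_size=CELL_SIZE_M):
    """Map projected coordinates (metres) to a (column, row) grid index."""
    return (math.floor(x_m / cell_size), math.floor(y_m / cell_size))


def sum_by_cell(points):
    """Sum the value of every (x, y, value) point falling in each cell."""
    totals = defaultdict(float)
    for x_m, y_m, value in points:
        totals[cell_index(x_m, y_m)] += value
    return dict(totals)


# Hypothetical points: (x in metres, y in metres, value to aggregate).
sample_points = [
    (120.0, 80.0, 3),
    (499.9, 10.0, 2),
    (500.0, 10.0, 5),
    (760.0, 940.0, 1),
]
cell_totals = sum_by_cell(sample_points)
```

A point lying exactly on a cell boundary is assigned to the cell on its upper side by the floor operation, which is a convention chosen for this sketch.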

Dependent Variables
The trip data were collected for five consecutive working days of a week from the Citi Bike system's data website. This dataset comprises approximately 275,000 trips across the five administrative districts of New York City between 6 July and 10 July 2020. Each trip entry provides information on the start time, end time, start dock ID, end dock ID, and trip duration. We only consider trips within 1 min to 3 h, as trips outside this range are deemed atypical. Subsequently, we utilize the Google API "https://developers.google.com/maps/documentation/distance-matrix?hl=zh-cn (accessed on 1 March 2023)" and develop Python code to classify the trip data into three dimensions: competition, integration, and complementation.
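The duration filter can be written as a short sketch. The field names `start` and `end` are hypothetical, and treating both bounds as inclusive is an assumption, since the text only says "within 1 min to 3 h".

```python
from datetime import datetime, timedelta

MIN_DURATION = timedelta(minutes=1)
MAX_DURATION = timedelta(hours=3)


def is_typical(start, end):
    """Keep trips lasting between 1 min and 3 h (bounds assumed inclusive)."""
    return MIN_DURATION <= (end - start) <= MAX_DURATION


# Hypothetical trip records within the study period.
trips = [
    {"start": datetime(2020, 7, 6, 8, 0, 0), "end": datetime(2020, 7, 6, 8, 0, 40)},
    {"start": datetime(2020, 7, 6, 8, 5, 0), "end": datetime(2020, 7, 6, 8, 19, 0)},
    {"start": datetime(2020, 7, 7, 9, 0, 0), "end": datetime(2020, 7, 7, 13, 30, 0)},
]
typical_trips = [t for t in trips if is_typical(t["start"], t["end"])]
```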

Independent Variables
Grounded on the classic composition elements of the built environment proposed by Handy et al. [20], this study selects 30 independent variables within four categories: land use, transportation systems, urban design, and social economy factors. Land use includes two types of data: land use type and POI data. It is concerned with the distribution of various types of land and the spatial location and density of activities. Land use has a direct impact on accessibility from origin to destination. Based on the current research addressing the effects of the built environment on bike sharing [30,49], this paper selects 14 categories to represent the land use status. The transportation systems typically comprise physical infrastructure to support and facilitate transportation. By providing connections between different activities, the transportation systems affect the ease with which individuals reach their destination from their origin. In this study, four types of transportation facilities (bus stations, metro stations, parking facilities, and bike-share stations), together with the total length of highways within the unit grid, are selected as explanatory variables. Finally, urban design pertains to the appearance and layout of physical elements, such as the shape of blocks and the coverage of trees. It affects mode choice by influencing individuals' attractiveness judgments and sense of safety. Furthermore, the dimension of the built environment is extended by introducing social economy factors such as population density, crime rate, rental prices, total land value, and poverty rate. Table 2 shows the classification and the data sources of independent variables.

Citi Bike Data Classification
To investigate the correlation between shared bikes and public transportation, the data are classified using three criteria [9]: bike travel time, proximity to public transportation stations, and bike trip length. The first criterion entails assessing the duration of a bike trip in comparison to the equivalent journey taken using public transportation. Citi Bike supplies the individual transit time of shared-bike rentals. At the same time, we utilize the Google Maps Directions application programming interface (API) for the calculation of public transit time. More specifically, for each bike trip, we obtain the public transit time using the same starting and ending points of bike rental. It is noteworthy that the transit time generated by the Directions API encompasses the multimodal aspects of public transportation, including the time of walking to and from public transit stations and the time spent during transfers. The Directions API generates transit time with a focus on minimizing commute duration. It selects the best public transportation combination, including intermodal routing, to ensure efficient and fast travel. This comprehensive consideration provides a realistic estimation of transit time.
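How a transit-time lookup of this kind might be prepared and parsed is sketched below. This is not the authors' code: it only builds a request URL for the Directions API web service with `mode=transit` and reads the leg durations (in seconds) from a response of the documented JSON shape. The helper names, the coordinates, the placeholder key, and the sample response are hypothetical, and no network call is made.

```python
from urllib.parse import urlencode

DIRECTIONS_URL = "https://maps.googleapis.com/maps/api/directions/json"


def build_transit_request(origin, destination, api_key):
    """Build a Directions API URL for a transit route between two docks."""
    params = {
        "origin": f"{origin[0]},{origin[1]}",
        "destination": f"{destination[0]},{destination[1]}",
        "mode": "transit",
        "key": api_key,
    }
    return f"{DIRECTIONS_URL}?{urlencode(params)}"


def transit_minutes(response):
    """Return the duration of the first route in minutes, or None if absent."""
    if response.get("status") != "OK" or not response.get("routes"):
        return None
    legs = response["routes"][0]["legs"]
    return sum(leg["duration"]["value"] for leg in legs) / 60.0


# Hypothetical dock coordinates and placeholder key.
url = build_transit_request((40.7506, -73.9935), (40.7128, -74.0060), "YOUR_API_KEY")

# Hand-written response fragment used only to exercise the parser.
sample_response = {
    "status": "OK",
    "routes": [{"legs": [{"duration": {"value": 900, "text": "15 mins"}}]}],
}
minutes = transit_minutes(sample_response)
```

In practice the returned transit time depends on the requested departure time, so a comparison with bike trips would need the request to reflect when each trip started; that detail is left out of this sketch.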
In terms of proximity to public transportation stations, previous articles have indicated that the pedestrian-friendly distance between public transportation stations and public bike stations is approximately 300 m [42,50]. Against this benchmark, the 100 m buffer zone that the original method [9] sets for the integration mode is too small. However, considering the high density of public transportation stations in New York City, a 300 m buffer zone is too large. Therefore, the screening radius for the integration mode is increased from 100 m to 200 m instead of 300 m. It is important to acknowledge that a smaller maximum value imposes a more stringent validation condition. Conversely, as the maximum value increases, there is a possibility of including some transfer behavior data that may not be representative.
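The 200 m proximity check can be sketched as follows. The text does not say how distances are measured, so great-circle (haversine) distance on latitude and longitude is an assumption, as are the function names and the sample coordinates.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius
BUFFER_M = 200.0              # screening radius for the integration mode


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def near_transit(dock, stations, buffer_m=BUFFER_M):
    """True if any transit station lies within buffer_m of the dock."""
    return any(
        haversine_m(dock[0], dock[1], s[0], s[1]) <= buffer_m for s in stations
    )


# Hypothetical dock and stations: about 111 m and about 334 m to the north.
dock = (40.7500, -73.9900)
close_station = (40.7510, -73.9900)
far_station = (40.7530, -73.9900)
```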
Regarding the length of bike trips, the Google Maps Directions API is used for the calculation of cycling distance. In accordance with the research findings of Cui et al. [9], it is inferred that employing bike sharing for distances within two miles, wherein the starting point or destination is a public transit station, presents a greater probability of integration between bike sharing and public transit. Thus, we consider two miles as the threshold for the preference of bike usage. A 10 min travel time threshold criterion has been set to ensure the practical selection of shorter trips, as a typical bike ride covering a two-mile distance usually takes less than ten minutes.
The classification framework is shown in Figure 1. A bike trip is considered to be in competition with public transit if the alternative public transit option can be completed in a shorter duration. This implies that travelers might choose bike sharing even if it requires more time compared to using public transit. The integration mode refers to a journey that satisfies the following conditions simultaneously: (1) it takes less time than using public transit as an alternative; (2) either the starting point or the destination is within a 200 m radius of a public transit station; (3) the distance of the trip is less than two miles, and the ride takes no more than ten minutes. Such an integration trip represents a short-distance travel option that enables seamless mode switching between the bike and public transit systems. A bike trip is deemed complementary to the public transit system when it falls outside the aforementioned scenarios. This complementary trip, although more time efficient compared to using public transit, is either situated far away from a public transit station or involves a long-distance journey, indicating that its purpose is unlikely to involve mode switching.
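The three rules above can be expressed as a small decision function. This is a sketch of the stated conditions rather than the authors' published code; the function name and argument names are hypothetical. One case is not covered by the text: when bike and transit times are exactly equal, this sketch lets the trip fall through to complementation.

```python
def classify_trip(bike_min, transit_min, near_station, trip_miles):
    """Classify one bike trip relative to public transit.

    bike_min     -- bike trip duration in minutes
    transit_min  -- duration of the transit alternative in minutes
    near_station -- True if origin or destination is within 200 m of a station
    trip_miles   -- cycling distance in miles
    """
    # Competition: the transit alternative would have been faster.
    if transit_min < bike_min:
        return "competition"
    # Integration: faster than transit, near a station, short and quick.
    if (bike_min < transit_min and near_station
            and trip_miles < 2.0 and bike_min <= 10.0):
        return "integration"
    # Everything else, including exact ties (assumption), is complementation.
    return "complementation"


# Hypothetical trips.
labels = [
    classify_trip(12.0, 9.0, True, 1.0),   # transit faster
    classify_trip(8.0, 15.0, True, 1.2),   # short feeder trip
    classify_trip(8.0, 15.0, False, 1.2),  # far from any station
    classify_trip(25.0, 40.0, True, 3.5),  # long-distance trip
]
```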

Spatial Autocorrelation
The present study conducts a spatial autocorrelation analysis on the three dependent variables of competition, integration, and complementation using GeoDa software. Moran's I test is the most commonly used method to evaluate spatial autocorrelation, and the statistic can be expressed as follows [51]:

$$I = \frac{n}{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}} \cdot \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(y_i - \bar{y})(y_j - \bar{y})}{\sum_{i=1}^{n}(y_i - \bar{y})^2} \quad (1)$$

As shown in Equation (1), $n$ represents the number of spatial units; $w_{ij}$ is the weight between positions $i$ and $j$; $y_i$ and $y_j$ represent attribute values at locations $i$ and $j$, respectively; and $\bar{y}$ is the mean value of all observations. The Moran's I statistic ranges between −1 and +1, with a higher positive value indicating spatial aggregation where nearby observations tend to have similar attribute values and distant observations have different ones. Negative values indicate spatial dispersion, while values near zero suggest random spatial distribution. The null hypothesis of the Moran's I test assumes that the attribute values are independent in space, implying that the Moran's I statistic is close to zero. The Z-score is typically used as an indicator of the significance of the Moran's I statistic to test the null hypothesis, and it is calculated as follows:

$$Z = \frac{I - E(I)}{\sqrt{\mathrm{Var}(I)}} \quad (2)$$

where $E(I)$ and $\mathrm{Var}(I)$ are the expected value and variance of Moran's I statistic, respectively. The significance level of this study is p < 0.05.
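The statistic described by the symbols above can be computed directly, as the following sketch shows. The study uses GeoDa for this step; the function name, the binary adjacency weights, and the sample values are hypothetical, and the Z-score is left out because it needs the variance of I under a chosen null model.

```python
def morans_i(values, weights):
    """Global Moran's I for values y_i and a square weight matrix w_ij."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Cross-products of deviations, weighted by spatial proximity.
    numerator = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    denominator = sum(d * d for d in dev)
    return (n / w_sum) * (numerator / denominator)


# Four cells in a row; each cell neighbours the adjacent ones (assumption).
chain_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1, 2, 3, 4], chain_weights)      # similar neighbours
alternating = morans_i([1, -1, 1, -1], chain_weights)  # dissimilar neighbours
```

A smoothly increasing sequence gives a positive value, while a perfectly alternating one gives a value of −1 in this configuration, matching the interpretation of aggregation and dispersion given in the text.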

Spatial Lag Model (SLM)
While machine-learning nonlinear models have the capability to capture intricate relationships, they are not employed in this study due to the spatial data's autocorrelation [52], which does not adhere to the assumption of independent distribution followed by the majority of traditional machine-learning models. Despite the potential of the geographically and temporally weighted regression (GTWR) model to analyze the explanatory variables through spatiotemporal weighted regression [53], it has been deemed unsuitable for implementation in this particular research. The primary emphasis of this study lies in the macroscopic comparison of the relationships between the three bike-sharing usage patterns and the built environment rather than conducting a spatial heterogeneity analysis of individual dependent variables through the application of geospatial weighted regression. As such, the adoption of the GTWR model is not warranted in the present context.
In this study, the influence of spatial distribution on regional behavior is significant. To account for spatial autocorrelation, a spatial lag model (SLM) is used instead of a spatial error model, as the Lagrange multiplier test and robust LM test show significant results in all three models. The SLM incorporates spatial lag terms of the dependent variable, allowing for neighborhood effects and spatial externalities analysis across different spatial units [48]. This is described in detail by Sun et al. [54], and the model specification is provided below:

$$y = \rho W y + X\beta + \varepsilon \quad (3)$$

where $y$ is a vector representing the number of rides on shared bikes, $X$ is a matrix containing the explanatory variables, $Wy$ is the spatial lag of $y$ formed with the spatial weight matrix $W$, $\rho$ is the spatial autoregressive coefficient, $\beta$ is the coefficient vector, and $\varepsilon$ is the error term.
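To make the spatial lag term Wy concrete, the sketch below computes it for a toy example. Row-standardizing the weight matrix, so that each cell's lag is the average of its neighbours' values, is a common convention but an assumption here, since the text does not describe how W is built. The function names and the sample data are hypothetical.

```python
def row_standardize(weights):
    """Scale each row of a weight matrix so that it sums to one."""
    result = []
    for row in weights:
        total = sum(row)
        result.append([w / total if total else 0.0 for w in row])
    return result


def spatial_lag(weights, y):
    """Return Wy: for each unit, the weighted sum of the other units' values."""
    return [sum(w * v for w, v in zip(row, y)) for row in weights]


# Four cells in a row with binary adjacency (assumption).
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
rides = [10.0, 20.0, 30.0, 40.0]  # hypothetical ride counts per cell
lagged_rides = spatial_lag(row_standardize(adjacency), rides)
```

Each entry of the result is the mean ride count of the neighbouring cells, which is the quantity that enters the regression as an additional explanatory term.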

Descriptive Statistical Analysis
According to the study findings, the proportion of integrated travel (54%) is the highest, followed by competitive travel (39%), but complementary travel (6%) accounts for a relatively small proportion of travel modes (Figure 3). The findings indicate a substantial integration of bike sharing with public transportation. Despite the fact that competitive travel often takes more time than alternative options via public transportation, bike sharing can compete with public transportation to a significant degree and attract potential public transportation users.
Figure 3 shows the spatial distribution of the three travel modes. Generally speaking, shared-bike utilization under the three modes is concentrated on Manhattan Island and less frequent in eastern Queens and Brooklyn, in alignment with prior research conducted by Yang et al., indicating that shared-bike usage rates decrease as the distance from the CBD increases [55]. The competitive mode is mainly distributed in the south-central region of Manhattan Island and along its eastern and western coasts. This distribution may be due to the prevalence of cultural institutions, central parks, and museums that generate a high volume of urban tourism demand, causing more people to choose bikes as an alternative to urban public transportation. In contrast, the integrated and complementary modes are concentrated in the southern portion of Manhattan Island, which may be due to the high concentration of corporate headquarters, high-end office buildings, and working commuters with little commuting needs. These factors promote the integration and complementary effects between shared bikes and public transportation in the southern commercial and financial center of New York City. Results of descriptive statistical analysis of the variables are attached in Table 3.
To further analyze the spatial distribution of the three modes, this study employs a 500 m uniform grid to calculate the frequency of usage for each mode per grid unit. The top 20% of grid units with the highest frequency of usage are identified as high competition, high integration, and high complementation for each mode, respectively (Figure 4). The high competition mode is most prevalent in Area A, while the other two are less common. This may be due to the quiet and pleasant environment near Central Park, with a population of higher income and more leisure time, which gives bike sharing considerable tourism value rather than commuting value. Both high integration and high complementation modes are prevalent in Area B, while the high competition mode is less common. This may be because traffic flow is high and road conditions are more complicated in the central and western districts, with a large population of international students living in rented apartments, who are more likely to use bike sharing for commuting. The high complementation mode is less common in Area C than the other two modes. This may be due to the presence of landmark buildings in this area, which provide high tourism value, and its location in the financial district of New York, with a large population of workers for whom commuting value is more important. As the transportation system in this area is well developed, bike sharing is highly integrated with public transportation. For Area D, the high complementation and high integration modes prevail notably over the competitive mode, likely due to Brooklyn's relatively limited accessibility and diminished cityscape compared to Manhattan. Consequently, bike sharing predominantly serves as a tool for urban commuting rather than a mode of travel.
The explanatory variables corresponding to the 500 m grid under each mode are standardized and averaged to understand the explanatory variables associated with the modes of high competition, high integration, and high complementation (Figure 5). The results are used to characterize the geographical environment under each mode. The study finds that commercial official buildings, rented houses, companies, poverty rate, and crimes in the high competition mode are significantly lower than in the other two modes. In contrast, multifamily elevator buildings, street trees, and population are significantly higher than in the other two modes. This suggests that high competition occurs in less office-dense, relatively affluent, safe areas with more greenery and comfortable cycling environments. The values of multiple explanatory variables in the high integration and complementation modes are relatively close. These two modes often occur in office areas surrounded by relatively poor communities with higher crime rates, lower greenery, and fewer people.
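The grid-level procedure described above (selecting the top 20% of grid units by usage and averaging standardized explanatory variables over them) can be sketched as follows. This is a minimal sketch under stated assumptions: the variables are z-scored over all grid units before averaging, the cut-off is the 80th percentile of usage, and the function name `high_usage_profile` and the toy data are hypothetical.

```python
import numpy as np

def high_usage_profile(usage, features, top_share=0.2):
    """Flag the top `top_share` of grid units by usage and return the mean
    z-score of every explanatory variable over the flagged units."""
    usage = np.asarray(usage, dtype=float)
    features = np.asarray(features, dtype=float)
    # Standardize each variable over all grid units (assumed order of operations).
    z = (features - features.mean(axis=0)) / features.std(axis=0)
    threshold = np.quantile(usage, 1.0 - top_share)
    high = usage >= threshold
    return high, z[high].mean(axis=0)

# Toy data: 10 grid units, two made-up variables.
usage = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
features = np.column_stack([usage * 2.0, 10.0 - usage])
high, profile = high_usage_profile(usage, features)
```

A positive entry in `profile` means the variable is above the study-area average in the high-usage units, and a negative entry means it is below, which is the kind of comparison summarized in Figure 5.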

Bike-Sharing Riding Clustering
A spatial autocorrelation analysis using GeoDa software is conducted. The resulting Moran's I statistics (999 permutations) are 0.337, 0.522, and 0.472, respectively, and are all significant at the 0.001 level, indicating spatial dependence and clustering among the three modes. Figure 6 depicts a local indicators of spatial association (LISA) cluster map, which reveals the relationship between the local Moran's I value and bike-sharing usage at the urban grid level. Specifically, in the competitive mode, high-high cluster areas are primarily concentrated in the southern regions of Manhattan Island and on both sides of Central Park, while low-low cluster areas are found in the northeastern and southern regions. In the integrated and complementary modes, it is noteworthy that the high-high and low-low clusters reflect that bike-sharing usage is highly concentrated in the central and southern parts of Manhattan Island, while all low-usage values are concentrated in the northeastern and southern edges. Moreover, the complementary mode exhibits a low-low cluster area almost entirely covering the Bronx and Queens. Finally, the low-high and high-low cluster areas under the three modes are observed to occupy only a small portion.
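For readers who want to see what a global Moran's I with a permutation test computes, a minimal NumPy sketch follows. It is not GeoDa's implementation: the binary rook-contiguity weights, the 6 × 6 toy grid, the one-sided pseudo p-value, and the function names are assumptions made for illustration.

```python
import numpy as np

def rook_weights(n_rows, n_cols):
    """Binary rook-contiguity weights for a regular grid (assumed scheme)."""
    n = n_rows * n_cols
    W = np.zeros((n, n))
    for i in range(n):
        r, c = divmod(i, n_cols)
        if c + 1 < n_cols:                     # neighbor to the right
            W[i, i + 1] = W[i + 1, i] = 1.0
        if r + 1 < n_rows:                     # neighbor below
            W[i, i + n_cols] = W[i + n_cols, i] = 1.0
    return W

def morans_i(y, W):
    """Global Moran's I: (n / sum(W)) * (z' W z) / (z' z), with z = y - mean(y)."""
    z = y - y.mean()
    return (len(y) / W.sum()) * (z @ W @ z) / (z @ z)

def permutation_test(y, W, n_perm=999, seed=0):
    """Observed Moran's I and a one-sided pseudo p-value from random relabeling."""
    rng = np.random.default_rng(seed)
    observed = morans_i(y, W)
    sims = np.array([morans_i(rng.permutation(y), W) for _ in range(n_perm)])
    p_value = (np.sum(sims >= observed) + 1) / (n_perm + 1)
    return observed, p_value

# Toy surface: values increase smoothly from the top row to the bottom row,
# so neighboring cells are similar and Moran's I should be strongly positive.
W = rook_weights(6, 6)
y = np.repeat(np.arange(6.0), 6)
observed, p_value = permutation_test(y, W)
```

With 999 permutations the smallest attainable pseudo p-value is 1/1000, which matches the 0.001 significance level quoted for the three modes.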
Notably, Moran's I reveals considerable spatial autocorrelation. A spatial regression analysis is therefore performed to investigate the correlation between the operating volume and the chosen variables. Based on the Lagrangian multiplier test results, LM-Lag is significant in all three models, whereas LM-Error, Robust LM-Lag, and Robust LM-Error do not reach significance in all three models simultaneously. Consequently, a spatial lag model (SLM) is employed (Table 4).

OLS and SLM Results
To mitigate multicollinearity among the explanatory variables, STATA is used to calculate the variance inflation factor (VIF) for each candidate explanatory variable. Variables with VIF values greater than 10 are commonly considered multicollinear. The test indicates that the VIF values of five variables exceed 10: building using units, depth of buildings, widths of buildings, multifamily walkup buildings, and one- and two-family buildings. Two of these variables, widths of buildings and depth of buildings, are excluded because of their relatively strong linear correlation, while the remaining three are retained. In a second examination, the VIF values of all variables are below 10 (Table A1).
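The VIF screening step can be illustrated with a short NumPy sketch. It uses the textbook definition VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing variable j on all other variables plus an intercept. The synthetic data and the function name `vif` are assumptions; the study itself reports using STATA.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        # Regress column j on an intercept and the remaining columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic example: c is almost a copy of a, b is independent of both.
rng = np.random.default_rng(1)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.05 * rng.normal(size=200)
values = vif(np.column_stack([a, b, c]))
flagged = values > 10   # the threshold of 10 follows the rule of thumb in the text
```

In this toy case the two near-duplicate columns are flagged together, and dropping one of them and recomputing, as in the second examination described above, would bring the remaining values close to 1.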
Following the resolution of multicollinearity, a least squares regression analysis is conducted using the STATA tool. In this study, a significance level of p < 0.05 is adopted, and variables with a p-value exceeding 0.05 are assumed to have no significant influence on the dependent variable. As the analysis involves regressing the independent variables on three distinct dependent variables, the performance of each independent variable varies across the different regression models (Table A2).
Notably, the results for Moran's I statistics and LM-Lag are significant across all three models, which motivates the adoption of a spatial lag model (SLM) for the regression analysis, as shown in Table 4. A comparative analysis of the OLS and SLM models is presented in Table 5, revealing that the SLM improves the adjusted R² values by 0.03, 0.077, and 0.048 for the three models. This suggests that the SLM achieves a better goodness of fit than the OLS model. The SLM results are shown in Table 6. It is worth noting that, once spatial interaction is included, most variables show little change in the sign, magnitude, or significance of their parameter estimates, with only a few exceptions. The results show that commercial official buildings, one- and two-family buildings, multifamily walkup buildings, schools, playgrounds, metro stations, and poverty rate are negatively correlated with the competition mode, while bike-share stations, street trees, building floors, building using units, value of lots, house rent price, and population are positively correlated with it. Rented houses, companies, distance to the nearest park, and bike-share stations are positively correlated with the integration mode, while multifamily walkup buildings, multifamily elevator buildings, playgrounds, parking facilities, metro stations, and poverty rate are negatively correlated with it. Multifamily walkup buildings, multifamily elevator buildings, healthcare facilities, playgrounds, metro stations, and population are negatively correlated with the complementation mode, while rented houses, companies, bike-share stations, value of lots, house rent price, and crime are positively correlated with it. These results are broadly consistent with the analysis in Figure 5, supporting the validity of the results.
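Because the model comparison above rests on adjusted R², a small helper showing the usual adjustment may be useful. The formula is the standard one for linear regression; how the adjusted value was obtained for the SLM in the study is not stated, so applying the same formula there is an assumption. The sample size of 998 follows Table 3, while the raw R² of 0.40 and the count of 20 predictors are made-up inputs.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Hypothetical inputs: raw R^2 of 0.40 with 20 predictors on 998 grid units.
example = adjusted_r2(0.40, 998, 20)
```

The adjustment penalizes additional predictors, so a gain in adjusted R² between two models fitted to the same grid units is not simply an artifact of adding the spatial lag term.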

Sensitivity Analysis
To examine the validity of the shared-bike riding pattern classification algorithm and the SLM model developed in this study, a sensitivity analysis is conducted. Since the usage and distribution characteristics of different shared-bike riding patterns cannot be directly measured through built environment variables, and considering the wide range of explanatory variables involved in this study, the stability of the model cannot be tested by simply adjusting model parameters or covariates. The sensitivity analysis is therefore designed to examine the robustness of the classification algorithm and the SLM model by expanding and shrinking the time period of the research dataset. Two sets of data, one with an expanded time period and one with a reduced time period, are used to build SLM models with the same set of built environment variables. The results of these models are compared with the results from the original dataset, allowing for an assessment of the stability of the shared-bike riding pattern classification algorithm and the SLM model developed in this study (Table A3).
It is important to note that the models built on the three different time periods are not numerically comparable. Therefore, the comparison focuses mainly on changes in the correlation of the explanatory variables. To ensure that the shared-bike riding volume in the dataset is not affected by seasonal variations, the selected time periods for both model A and model B are close to each other and consist of weekdays in the month of July. Specifically, the dataset for model A includes all weekdays in July 2022, while the dataset for model B consists of a single day, 6 July 2020.
The outcomes demonstrate that the models constructed using the two sets of data both achieve an R² value of 0.3, as confirmed by the significance tests. The results of the two sets of robustness tests align closely with the findings of the research. Therefore, the shared-bike data classification methodology employed in this study, as well as the correlation between the three riding patterns and the built environment, can be considered reliable.
Regarding land use, both model A and model B show no correlation between companies and the competition mode but a positive correlation with the integration mode. Similarly, rented houses have no correlation with the competition mode but a strong positive correlation with both the integration and complementation modes. In terms of the transportation system, metro stations are negatively correlated with all three modes in both model A and model B. Parking facilities show no correlation with the competition mode but a negative correlation with both the integration and complementation modes in model B. On the other hand, bike-share stations exhibit a positive correlation with all three modes. Concerning urban design, both model A and model B demonstrate a positive correlation between street trees and buildings using units with the competition mode, while no correlation is found with the integration and complementation modes. Regarding the social economy, house rent price shows a strong correlation with all three modes in both model A and model B. The population has a significant negative correlation with the complementation mode, while its correlation with the other two modes is relatively weak. The correlations of the fundamental variables remain relatively stable, safeguarding the integrity of the findings.
The correlation between the poverty rate and the three travel modes is weak in both models. Crime is positively correlated with the integration mode in both models, but this correlation is weak with the competition mode. The correlation of crime with the complementation mode shows inconsistent results between the two models, being unrelated in model A and positively correlated in model B. Model A has a lower R² than model B, indicating a weaker overall correlation between the built environment and the three travel modes. This could be due to the longer time period covered by the dataset used in model A, which may have influenced the characteristics of the bike-sharing users and thus affected the experimental results. However, it is important to note that these inconsistent results do not refute the findings of this study but rather indicate weaker correlations with the explanatory variables, supporting the robustness and effectiveness of the experimental results presented in the main text.
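Since the sensitivity analysis compares the direction and significance of correlations rather than coefficient magnitudes, the comparison can be sketched as a sign-agreement table. The coefficients and p-values below are invented placeholders, not values from Table A3, and the function names `sign_label` and `compare_models` are hypothetical.

```python
def sign_label(coef, p_value, alpha=0.05):
    """Return '+', '-' or '0' (not significant at level alpha)."""
    if p_value >= alpha:
        return "0"
    return "+" if coef > 0 else "-"

def compare_models(baseline, alternative, alpha=0.05):
    """Map each shared variable to (baseline label, alternative label, agreement)."""
    table = {}
    for name in baseline:
        if name in alternative:
            base = sign_label(*baseline[name], alpha=alpha)
            alt = sign_label(*alternative[name], alpha=alpha)
            table[name] = (base, alt, base == alt)
    return table

# Placeholder (coefficient, p-value) pairs for one travel mode; not real results.
baseline = {"metro stations": (-0.12, 0.001), "street trees": (0.08, 0.010), "crime": (0.05, 0.030)}
model_a = {"metro stations": (-0.10, 0.004), "street trees": (0.06, 0.020), "crime": (0.01, 0.400)}

table = compare_models(baseline, model_a)
share_consistent = sum(agree for *_, agree in table.values()) / len(table)
```

A variable that is significant in the baseline but labeled "0" in an alternative model corresponds to the weakened correlations discussed above rather than to a reversal of sign.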

Discussion
The significance of the four attribute groups in explaining the three modes of bike-sharing usage is subsequently examined with SLM models, which are presented in Table 7. The adjusted R² values of the four models all surpass 0.300, confirming the impact that each group has on the three usage patterns. It is worth noting that the same environmental variable exhibits significant differences across the three usage patterns. As the SLM provides better explanatory power than the OLS model, the discussion focuses primarily on the SLM results.

Priority of the Four Environmental Variables
In the three different models, the priority order of the four environmental variables exhibits distinct patterns (Table 7). More specifically, in the competition model, the explanatory power of the variables is ranked in descending order as follows: urban design, social economy, land use, and transportation systems. Conversely, in the integration and complementation models, the explanatory power of the variables is ranked in descending order as follows: transportation systems, land use, social economy, and urban design. This finding indicates that the same environmental variables affect the three travel patterns in markedly different ways. The competition model is more closely associated with social economy and urban design, while the integration and complementation models demonstrate stronger relationships with transportation systems and land use. This implies that users in the competition model value bike sharing for its recreational and leisure functions, as they generally have more free time. On the other hand, users in the integration and complementation models place greater emphasis on the efficiency gains bike sharing offers for their daily commutes, as commuting represents their primary motivation for using the service. Therefore, transportation systems and land use variables have more substantial explanatory power in these models.

Land Use Factors Analysis
In the competition model, commercial official buildings and schools are significantly negatively correlated with bike-sharing usage. This result suggests that, compared to the efficient urban commuting function of public transportation, users in the competition model may not necessarily be New York workers who prioritize commuting efficiency while working in office buildings [56,57]. Furthermore, users in the competition mode are unlikely to be students.
In the integration model, rented houses and companies are positively correlated with bike-sharing usage. At the same time, playgrounds, multifamily elevator buildings, and multifamily walkup buildings show significant negative correlations. The distance to the park is positively correlated with the integration model, indicating that customers in the integration model are more likely to choose bike sharing when the park is farther away. This suggests that customers in the integration model have a more substantial travel purpose and attach greater importance to the traffic efficiency gains brought about by integrating bike sharing with public transportation. Notably, rented houses are positively correlated with the integration model, but multifamily elevator buildings and multifamily walkup buildings are significantly negatively correlated with it. This suggests that customers in the integration model are more likely to be New York renters than permanent residents. This is consistent with Ni and Chen's research on the correlation between residential and office use and the degree of integration between bike sharing and subways [18].
In the complementation model, rented houses and companies are significantly positively correlated with bike-sharing usage, while playgrounds, multifamily walkup buildings, multifamily elevator buildings, and healthcare facilities show significant negative correlations. As in the integration model, customers in the complementation model are predominantly New York renters who primarily commute to work.

Transportation Systems Factors Analysis
All three models are strongly positively correlated with bike-share stations, which indicates that urban planners can increase or decrease the number of parking spaces based on demand for bike sharing [25,28,35]. Research indicates that the size of a bike-sharing station positively impacts the likelihood of bike-sharing usage. Opting for a station with a larger capacity can improve the chances of finding an available bike or parking space, particularly during weekends, holidays, and peak periods on weekday mornings and evenings. This finding can serve as a valuable reference for the planning and design of future bike-sharing stations. The integration model is negatively correlated with parking facilities, indicating that this model supplements the commuting needs that private cars cannot meet. In areas where private automobiles have not been extensively established, the integration mode can play an important role due to its low-cost and reliable service, as mentioned in a previous report [58]. It is worth noting that there is a negative correlation between metro stations and all three modes. This may be attributed to the fact that, when selecting the dependent variables, we considered the total number of bus and metro stations. This indirectly indicates that the integration of shared bicycles and public transportation should consider not only metro stations but also comprehensive bus and metro systems.

Urban Design Factors Analysis
The analysis reveals a positive correlation between the competition model and the street trees, buildings using units, and total building floors in each grid. In contrast, the correlation between the integration and complementation models and urban design factors is comparatively weaker. These findings provide further support that users in the competition model place a stronger emphasis on the recreational and leisure functions of bike sharing. Green spaces, diverse building types, and higher building floors often indicate an ideal urban environment. Conversely, customers in the integration and complementation models prioritize the commuting function of bike sharing over its recreational benefits. This explains why these groups exhibit weaker correlations with urban design variables, as previously noted in the discussion on the priority of different attribute groups and their association with the dependent variable. Therefore, varying approaches to the provision of bikes should be adopted in order to address these concerns.

Social Economy Factors Analysis
The competition model is positively correlated with population size and negatively correlated with poverty rates, suggesting that the competition model tends to occur in economically prosperous areas with larger populations. The integration model shows a positive correlation with the crime rate and a negative correlation with the poverty rate, indicating that the surrounding environment in the integration mode, despite its vibrancy, exhibits a higher crime rate. Meanwhile, the complementation model shows a positive correlation with the crime rate and a negative correlation with population size, suggesting that the areas where the complementation model occurs are typically suburban areas with lower population densities, inadequate security measures, and lower coverage of public transportation. Therefore, bike sharing provides a valuable supplement to public transportation in these areas.

Policy Recommendations
This study provides practical evidence for landscape and urban planning. Our research indicates that competition and integration relationships between bike sharing and public transportation are important. Different relationship patterns reflect different urban demands. Therefore, urban planners should consider the relationships between bike sharing and public transportation to make more human-centered decisions. For example, locations with stronger competition relationship attributes can be selected for parking spots suitable for long-distance cycling. In contrast, locations with stronger integration and complementary relationship attributes can be selected for parking spots suitable for short-distance cycling.
Firstly, users in the competition model place more emphasis on the recreational value of bike sharing, and this model is therefore more strongly correlated with urban design and social economy. Urban planners should formulate city design policies based on the spatial distribution of the competitive model. For example, they should consider the establishment of urban greenways in the riverside areas on both sides of Manhattan Island while also implementing targeted urban revitalization in this area. Urban planners can also integrate the bike-sharing system with the New York City tourism strategy, planning urban bike touring and sightseeing routes and connecting relevant urban leisure nodes.
Secondly, users in the integration and complementation models place more emphasis on the commuting value of bike sharing. These models are therefore more strongly correlated with land use and transportation systems, positively correlated with workplaces, and negatively correlated with leisure parks. Urban planners should integrate bike sharing and public transportation through careful planning based on the distribution of bike-sharing parking spots with stronger integration and complementary relationship attributes. At the same time, urban planners need to increase parking spots in areas with high rental demand, particularly those in suburban regions that lack reliable, fixed-route public transportation services.
Thirdly, the surroundings of the integration model have inadequate security measures and are positively correlated with rental properties, with many users in poverty. The public security department should strengthen public safety in the areas surrounding the integration model to improve the living environment for underprivileged workers in New York.
Lastly, the complementation model is negatively correlated with population size and positively correlated with crime rates, indicating low public transportation coverage and high crime rates in suburban areas. The planning department should strengthen public transportation construction and enhance the efficacy of security patrols in suburban areas to improve social equity.

Conclusions
This study enriches the literature on bike sharing and the built environment, further revealing the impact of the built environment on the collaborative interaction between bike sharing and public transportation.
Firstly, the understanding of the quantitative relationship between shared bikes and public transportation has been deepened. The study shows that both competitive and integrative relationships between bike sharing and public transportation are significant, indicating that while bike sharing solves the first- and last-mile problems in urban transportation systems, it also competes with the urban transportation systems to a large extent. Secondly, this study advances our understanding of how the built environment influences the relationship between bike-sharing systems and public transportation modes. Users of the competition mode place more value on the sightseeing and leisure benefits of bike sharing. In contrast, users of the integration and complementation modes place more value on its commuting benefits. Moreover, this study contributes to future research by introducing a quantitative research framework that categorizes bike-sharing data and constructs a regression model incorporating the built environment. This framework expands the definition of the built environment by incorporating diverse data sources, including land use, transportation systems, urban design, and social economy, thereby providing a more holistic understanding. The effectiveness of this approach is demonstrated through sensitivity analysis experiments that involve extending and reducing the time span of the dataset. These experiments support the validity of the method.
Therefore, bike-sharing policymakers should consider differentiated strategies based on the value orientation of each parking spot. For example, since bike sharing benefits residents and provides affordable and reliable connections between homes or workplaces, bike-sharing systems can be encouraged in rental housing areas, office-concentrated areas, and suburban areas with low-quality fixed-route public transportation services. At the same time, because the sightseeing value of bike sharing allows it to compete with public transportation systems, more bike-sharing parking spots should be set up in areas with strong entertainment attributes and relatively complete urban design, and bike-sharing sightseeing routes can even be planned there to fully display the cityscape. Dedicated bike lanes can also be set up within green spaces and parks to avoid traffic, injury, and traffic signals and to fully realize the sightseeing value of bike sharing. In addition, as many complementation modes occur in areas with smaller populations and higher crime rates, environmental enclosure and security monitoring should be enhanced to improve the safety of users.
The study has some limitations. Firstly, data were extracted for five non-holiday weekdays from the Citi Bike website for analysis, resulting in a small sample size that does not cover weekends or holidays. Secondly, the variables examined in this study primarily emphasized the quantity of public transport facilities, while the quality aspects of public transport, such as frequency, service time, speed, and vehicle type, were not fully taken into account. In addition, the study ignored the influence of other objective physical features such as the street view environment, green view rate, street surface, and slope. These omissions may have introduced discrepancies in the research outcomes. Future studies can incorporate variables related to these aspects to enhance the scientific rigor of the argument. Furthermore, because Citi Bike data contain only the coordinates of the starting point and the end point of each ride, the current research adopted a buffering algorithm for classification, which introduced some error into the results. Future studies can use graphics-based methods for this analysis. Finally, the study overlooked individual travel behavior (e.g., travel purpose, mode preferences, and path selection). Therefore, mobile data mining must be combined with traditional datasets (e.g., structured travel surveys) to further elucidate bike-sharing behavior and context. The study provides a direction for further research. Future studies can apply this method to cities outside New York and conduct more extensive time-sliced studies, which will enable a deeper understanding of the complex relationship between bike sharing and the built environment.
ISPRS Int. J. Geo-Inf. 2023, 12, x FOR PEER REVIEW 7 of 27
infrastructure and policies. Thus, commuters in New York City heavily rely on the Metropolitan Transportation Authority (MTA), which includes buses, subways, and commuter railroads. With the successful development of the sharing economy, bike sharing has become increasingly popular. As the dominant dock-based system in the greater New York City area, Citi Bike provides service coverage with more than 13,000 bikes and over 850 stations in the city core. Given that Citi Bike stations are relatively concentrated in Manhattan and its surrounding areas, we select 19 communities with relatively evenly distributed station locations as the study objects to ensure the validity of the experimental results. Figure 2 shows the distribution of Citi Bike docking stations and public transportation stations in the study area.

Figure 2. The distribution of Citi Bike docking stations and public transportation stations in the study area.

Figure 3. The spatial distribution and proportion of three travel modes.

Figure 4. Top 20% grid with high distribution for competition, integration, and complementation and four focus regions named A-D.

Figure 5. Analysis of the independent variables for three high-distribution modes.

Figure 6. The LISA cluster map displaying local Moran's I values in relation to the classified dependent variables.

Table 1. Summary of spatial regression models employed in selected studies.

Table 2. The classification and the data sources of independent variables.
Notes: The sum of the number of variables inside each grid is calculated in 500 m units. NYCOD: New York City Open Data.

Table 3. General descriptive statistics of the variables (n = 998).

Table 4. Lagrangian multiplier test of three dependent variables.

Table 5. Comparative analysis of the OLS and SLM models.

Table 7. SLM regression analyses for the four variable groups.