Evaluating the Efficiency of Bike-Sharing Stations with Data Envelopment Analysis

This paper focuses on the efficiency evaluation of bike-sharing systems (BSSs) and develops an approach based on data envelopment analysis (DEA) to support decisions regarding the performance evaluation of BSS stations. The proposed methodology is applied and tested for the Malmöbybike BSS in Malmö, Sweden. This was done by employing spatial analyses and data about the BSS usage trends as well as taking into account the transport, land use, and socioeconomic context of the case study. The results of the application demonstrate consistency with the literature and highlight meaningful associations between the relative efficiency of stations and the urban context. More specifically, the paper provides in-depth knowledge about data preprocessing, the selection of input and output variables, and the underlying analytical approach, which can potentially be applied to other cases and urban contexts. Overall, the DEA-based methodology presented in this study could assist decision-makers and planners in developing operational strategies for the planning and management of BSS stations and networks.


Introduction
A bike-sharing system (BSS) is considered an alternative to cars. It is a measure designed to encourage a modal shift from short car trips to cycling and intermodal travel. The primary function of BSSs, typically regarded as a last-mile solution for metropolitan areas, has motivated investments to provide such services in cities around the world [1,2]. Two main types of BSS exist in cities today: the conventional BSS and the free-floating BSS. The conventional BSS requires users to borrow and return the bicycle from/to fixed stations. The free-floating BSS, introduced more recently, does not have fixed stations for picking up and dropping off bicycles; users are allowed to park the bikes potentially "everywhere" (or within areas with geo-fenced boundaries) as close as possible to their destinations [3]. Both BSS types allow users to cycle in a city without owning a bike. This study focuses on conventional BSS.
The first bicycle-sharing scheme was introduced in Amsterdam, the Netherlands in 1965 and it was followed by a station-based BSS implemented in Denmark in 1991 [4]. The first Swedish BSS was a pilot project introduced in Gothenburg in 2005 which operated exclusively in the northern part of the city. The project led to the development of the current BSS in Gothenburg, Styr and Ställ, which was launched in 2010 with 300 bicycles distributed in 20 stations (operating between April and October) [5] and expanded in 2020 to provide 1750 bicycles in 135 stations available throughout the year [6].
The objective of this study is to propose and test a method to evaluate the relative efficiency of each shared-bicycle station within a given system and identify its determinants to establish an operational strategy for public BSSs. The proposed method will not only evaluate the efficiency of shared-bicycle stations but also consider the influence of the external variables, thereby contributing to the literature as a methodology for analyzing the efficient operation of BSS stations and the management of the shared-bicycle systems.
The method is proposed and tested through carrying out an analysis of the comparative efficiency of bike-sharing stations, putting forward a general methodology to apply potentially to any context and proposing a numerical application for the city of Malmö, Sweden. The efficiency measures are calculated by a nonparametric approach known as data envelopment analysis (DEA), showing its particular applicability to BSSs. The evaluation result is expected to help in reallocating the existing resources and assist policymakers when deciding where to allocate new stations (planning stages). In this way, it is possible to discover those stations that work better, that are more efficient according to the considered parameters, and optimize the system with low costs, i.e., reallocating racks where they are more needed (moving them from less used to more used stations, for instance).
The paper is structured as follows. Section 2 provides the introduction of the proposed DEA methodology from a general perspective, specifying the variables that, according to literature and planning guides, mostly characterize BSSs. Section 3 details the study material and method for the application of DEA to the BSS in Malmö, Sweden, including the detailed description of the explanatory analysis on the dataset to identify a subselection of significant variables. Section 4 presents and discusses the obtained results in Malmö. Section 5 concludes the paper with final remarks and reflections on the proposed approach and its implications.

Proposed Methodology
The methodology presented in this section first defines the input and output variables that best characterize BSS stations. More specifically, inputs refer to BSS station, built environment, and population-related variables; outputs refer to station usage trends and are based on the trips made using the system. Data related to BSS usage have to be cleaned and prepared before applying DEA (i.e., removing anomalies that can indicate temporary malfunctioning of the system or broken bicycles/stations) so that the efficiency of each station can be calculated.
Furthermore, to obtain a sufficient differentiation between the efficiency scores and remove from the analysis any potential outliers among the pool of BSS stations (DEA is sensitive to outliers), we propose to use Robust CoPlot (more details in Section 3.4). Robust CoPlot allows choosing inputs, outputs, and stations more significant for the studied context, considering the available data.
After this preliminary data preparation, DEA can be applied to determine the different degrees of efficiency associated with each BSS station. In the following subsections, we provide a more detailed description of the DEA methodology and the inputs/outputs that we suggest to include in the analysis. The data cleaning, elaboration and variable selection are more extensively described when presenting the case study (Section 3).

Data Envelopment Analysis (DEA)
Mathematically, DEA is a linear programming-based model for evaluating the relative efficiency of a set of decision making units (DMUs) which are homogeneous in the sense that they use the same types of resources (inputs) to produce the same kinds of goods or services (outputs) [22]. DEA evaluates the efficiency of each DMU relative to an estimated production possibility frontier determined by all DMUs. It has been used in several contexts (including education systems, health care units, agricultural production, military logistics, etc.); however, when analyzing the areas approached thus far, energy and transportation have the highest number of applied studies [23]. The application of the method in the transport sector is widespread, especially in the evaluation of airports, ports, railways, and urban transport companies [24,25]. In this paper, we suggest applying DEA to evaluate the relative efficiency of bike-sharing stations: hence, each DMU, in this case, corresponds with a bike-sharing station of a selected system.
To our knowledge, only two recent studies present an application of DEA in bike-sharing research. The first one, from Hong et al. [21], is applied to a station-based BSS, but it does not include any external variable in the first stage of the model. The second one, from Chang and Wei [26], uses DEA to evaluate and determine the optimal bike-sharing parking points for free-floating bicycles. We believe that the application of DEA to shared systems, although unconventional, is an interesting line of research that is worthy of further investigation.
DEA does not require any functional relationship between inputs and outputs, although it is important to provide their accurate measurements to apply it successfully. This means that only those variables that could appropriately capture the nuances in the efficiency of the DMUs have to be selected as inputs and outputs.
Since the DEA model employed in this paper relies on the standard input-oriented CCR model [22], the DMUs that obtain an efficiency value equal to 1 are considered efficient. Conversely, efficiency scores less than 1 denote some inefficiency of the considered DMU.
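As an illustration of how such scores can be obtained, the following is a minimal sketch of the input-oriented CCR model in its envelopment form, solved once per DMU as a linear program. It is not the implementation used in this study: the function name `ccr_input_efficiency`, the use of SciPy's `linprog`, and the three toy DMUs are assumptions made only for this example.

```python
import numpy as np
from scipy.optimize import linprog


def ccr_input_efficiency(X, Y):
    """Input-oriented CCR scores (envelopment form).

    X: (n_dmus, n_inputs) array, Y: (n_dmus, n_outputs) array,
    both assumed strictly positive. Returns one score per DMU.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, m = X.shape
    s = Y.shape[1]
    scores = np.empty(n)
    for o in range(n):
        # Decision vector: [theta, lambda_1, ..., lambda_n]; minimize theta.
        c = np.r_[1.0, np.zeros(n)]
        # Inputs: sum_j lambda_j * x_ij - theta * x_io <= 0
        A_in = np.hstack([-X[o][:, None], X.T])
        # Outputs: -sum_j lambda_j * y_rj <= -y_ro
        A_out = np.hstack([np.zeros((s, 1)), -Y.T])
        A_ub = np.vstack([A_in, A_out])
        b_ub = np.r_[np.zeros(m), -Y[o]]
        # linprog's default bounds (0, None) keep theta and lambdas non-negative.
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, method="highs")
        if not res.success:
            raise RuntimeError(f"LP failed for DMU {o}: {res.message}")
        scores[o] = res.fun
    return scores


# Toy data (hypothetical): one input and one output for three DMUs.
X = [[2.0], [4.0], [4.0]]
Y = [[1.0], [2.0], [1.0]]
scores = ccr_input_efficiency(X, Y)
print(np.round(scores, 3))
```

In this toy example the first two DMUs have the same output-to-input ratio and lie on the frontier (score 1), while the third produces half the output per unit of input and therefore scores 0.5.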
Note that to obtain sufficient differentiation between the efficiency scores, the number of DMUs should not be too small when compared to the total number of inputs and outputs. In the literature, there is no theoretical treatment that gives a unique suggestion on this issue, but there are different rules of thumb. In this paper, we follow the recommendation by Dyson et al. [27], keeping the number of DMUs greater than or equal to twice the product of the number of inputs and the number of outputs.
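The rule of thumb above amounts to a one-line check. The sketch below uses the counts reported for the Malmö case later in the paper (94 stations after data cleaning, 20 inputs and five outputs before variable selection, 11 inputs and three outputs after); the function name is hypothetical, and the station count ignores any further removal of outlier stations.

```python
def satisfies_dyson_rule(n_dmus, n_inputs, n_outputs):
    """Rule of thumb from Dyson et al.: n_dmus >= 2 * n_inputs * n_outputs."""
    return n_dmus >= 2 * n_inputs * n_outputs


# Before variable selection: 2 * 20 * 5 = 200 DMUs would be required.
print(satisfies_dyson_rule(94, 20, 5))
# After variable selection: 2 * 11 * 3 = 66 DMUs are required.
print(satisfies_dyson_rule(94, 11, 3))
```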

List of Inputs to Include in the Model
Input variables for DEA represent the aspects that impact the usage of the BSS and travel behavior in general and may explain the differences in the performance of the stations. To include such aspects in the DEA model, they need to be quantified and recorded as a set of variables. Nevertheless, other relevant qualitative parameters, such as weather and seasonal conditions, that may influence the use of BSS network as a whole, could play an important role in the step of interpreting the result.
In this study, a set of input variables was identified based on a review of the literature on the usage of BSSs and travel behavior. In particular, the research by Ewing and Cervero [28] and the review study carried out by Eren and Uz [18] were used as key literature for establishing the list of input variables, which are described in Table 1 below.

Table 1. Suggested input variables for measuring the efficiency of the bike-sharing system (BSS) stations using DEA.

Input Variables
Rationality and Description of the Variables

Station age
The variable is relevant for more complex/old systems, particularly if the system has been developed during several stages and groups of stations have been added at different points in time. It can be measured according to the age context of the stations.

Visibility of stations
Visibility of the stations should consider if they are placed next to public transport, or in green areas (i.e., partially hidden by trees/bushes), or in a well-lit environment [29][30][31]. It can be measured taking into account the involved elements, i.e., by assessing the distance to the bus stops/metro stations, and/or the area of the bushes around the stations, etc.

Density of BSS station
The proximity of BSS stations to each other contributes to increasing the demand for BSS services [32,33]. Different buffers have been suggested for effective BSSs in various contexts [18].

Built environment variables
Bicycle infrastructure
Increasing the usage of bicycles requires good bicycle infrastructure [34]. The proximity of BSS stations to the cycling infrastructure impacts the motivation for cycling [35]. This variable can be measured by computing the total length of bike lanes within the catchment area of each station, possibly weighted by the type of the bike lanes (e.g., separated paths versus paths shared with traffic).

Street connectivity
Street connectivity reflects the level of infrastructure and traffic safety in the network surrounding a BSS station [36,37]. The variable can be applied or not according to the context; it can be measured by calculating the number of intersections and the density of the road network in the area.

Public transport (PT) impact factors
BSS likely promotes the mode share of public transport by serving as a feeder mode for PT [38,39], and vice versa, the provision of the PT service can impact the usage of BSS. Three dimensions related to public transport can be measured: (1) distance to the PT stations (i.e., bus stops, railway stations); (2) level of provision, which can be measured by the number of stops and stations, number of bus lines, and ride frequency; (3) price scheme and approach for accessing PT and BSS services, e.g., using a smart card for accessing both services at a fair price is likely to increase the usage of the BSS service [28,36,40,41].

Land Use
Land use impacts the demand for trips and affects the choice of travel modes. Residential areas, public and commercial areas, green areas in the city and outskirt, and mixed level of land use are the main parameters for measuring the impact of land use [28,41].

Slope (morphology of the territory)
Slope is one of the main barriers for motivating cyclists to cycle and it strongly affects bicycle usage [18]. It can be measured by assessing the level of slope in specific streets, and the portion of the streets with a certain slope within the city and catchment area [42]. It should be included/considered as a parameter in the general model formulation especially for those cities that have hilly topographies.

Population size
Population size in the catchment area is an important factor that influences the usage of the BSS service [43]. It can be measured by calculating the number of individuals residing in the catchment area.

Sociodemographic
Age, gender, education, income, employment, ownership of transit mode are the individual factors that most impact the travel behavior [18,41,43]; therefore, these are the parameters within the catchment area suggested to be measured.

List of Outputs to Include in the Model
The outputs are needed in the model to analyze the performance of BSS stations and calculate generation/attraction factors connected to (the usage of) each station. We propose the following three classes of indicators (five outputs in total), all able to appropriately capture the nuances in the efficiency of bike-sharing stations.
The usage trend of each BSS station shows a cyclical trend, i.e., a pattern that repeats itself after a certain time interval ∆t. Here, we suggest calculating the output indicators as daily averages (∆t = 24 h). Note that the output values have to be normalized according to the number of racks of the largest BSS station in the analyzed system, meaning that each station score is adjusted for the number of racks available at that station (this is the reason why we did not include them among the inputs of the model).

• Station daily amplitude: The station daily amplitude expresses the daily variation of the number of bicycles in each station. A higher value (higher amplitude) corresponds to a station that is more regularly used throughout the day. We suggest calculating the amplitude of each station using the fast Fourier transform [44], which converts a time-domain waveform (amplitude versus time) into a series of discrete waves in the frequency domain. The daily amplitude for each station can be calculated starting from the bicycle variations (usage trends) in ∆T, obtaining their frequency domain using the fast Fourier transform, and assessing the (daily) amplitude value for frequency (cycles/day) = 1.
• Station prevalence: This indicator is a proxy for the share of bicycle trips that start (departure prevalence) or end (arrival prevalence) in each station. Given n BSS stations in the system, we count the number of trips starting in each station s_i (picked-up bicycles) during ∆t. Then, the stations are ordered from the one that originates the most trips (assigning it a score equal to n) to the one that originates the fewest trips (score = 1). The scores are assigned progressively, i.e., the second one in the list has n-1, the third one n-2, and so forth. This process is repeated for every day ∆t in the timeframe ∆T of the analysis (since every station may show a different behavior according to ∆t), and the daily scores assigned to each station s_i are summed. From these final scores, an average daily value is calculated by dividing the total score assigned to each station by the number of days ∆t included in ∆T. This is the station prevalence calculated for the departures from each station (departure prevalence); the same reasoning can be applied to the arrivals (i.e., repeating the calculations for the number of bicycles dropped off in each station during ∆t and then obtaining the average arrival prevalence in ∆T).
• Station attractiveness: Attractiveness is understood as a way to assess how appealing the station is for BSS users compared to the other stations in the network. More specifically, we propose to distinguish an active attractiveness from a passive one, considering the trips that connect each BSS station with the other stations in the network. The unit of these indicators is km/day, associated with each station. To calculate the active station attractiveness, we count the trips that start in the origin station s_i in ∆T, and we multiply each trip by the kilometers (real network distance, shortest path) necessary to reach the destination station. Then, this value is divided by the number of days included in ∆T, to obtain an average daily value (km/day). The same reasoning, in the opposite direction, is applied to calculate the passive station attractiveness, i.e., considering the trips that have their destination in s_i in ∆T and computing again an average daily value (km/day). Note that round trips (that is, those trips having both origin and destination in s_i) should not be included in the calculations.
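The prevalence and attractiveness indicators described above can be sketched as follows. This is only an illustration under stated assumptions: trips are given as hypothetical `(day, origin, destination)` tuples, ties in the daily ranking are broken by station identifier (the text does not specify a tie rule), round trips are counted for prevalence but excluded from attractiveness, and the distances in `dist_km` are invented toy values. The normalization by the number of racks is omitted.

```python
from collections import Counter


def prevalence(trips, stations, days, arrivals=False):
    """Average daily rank score (n = busiest station, 1 = least busy)."""
    n = len(stations)
    totals = {s: 0 for s in stations}
    for day in days:
        counts = Counter((d if arrivals else o) for t, o, d in trips if t == day)
        # Assumption: ties are broken by station identifier.
        ranked = sorted(stations, key=lambda s: (-counts[s], s))
        for position, s in enumerate(ranked):
            totals[s] += n - position
    return {s: totals[s] / len(days) for s in stations}


def attractiveness(trips, stations, days, dist_km, passive=False):
    """Average km/day of trips leaving (active) or reaching (passive) a station."""
    km = {s: 0.0 for s in stations}
    for _, o, d in trips:
        if o == d:
            continue  # round trips are excluded
        km[d if passive else o] += dist_km[(o, d)]
    return {s: km[s] / len(days) for s in stations}


# Toy example (hypothetical stations, trips, and network distances).
stations = ["A", "B", "C"]
days = [1, 2]
trips = [
    (1, "A", "B"), (1, "A", "C"), (1, "B", "A"),
    (2, "A", "C"), (2, "A", "B"), (2, "C", "A"), (2, "C", "C"),
]
base = {("A", "B"): 1.0, ("A", "C"): 2.0, ("B", "C"): 1.5}
dist_km = {**base, **{(b, a): v for (a, b), v in base.items()}}

dep_prev = prevalence(trips, stations, days)
active = attractiveness(trips, stations, days, dist_km)
passive = attractiveness(trips, stations, days, dist_km, passive=True)
print(dep_prev, active, passive)
```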

Context Description and Related Variables
Malmö, with more than 344,000 inhabitants [45], is the third-largest urban area in Sweden. The central-northern part (city center) has the highest population concentration, while smaller urban agglomerations exist in the southwest and eastern parts (Figure 1). As illustrated in Figure 2, the public transportation network follows a similar configuration and is concentrated in areas with higher population density. The cycling infrastructure (Figure 3) includes a bike path network with 520 km of completely separated (from motor vehicle traffic) bike paths and prioritized bike paths shared with other road users [46]. In 2016, Malmöbybike (i.e., the Malmö BSS) started operating with 50 stations in the central areas of the city; during 2019, the network expanded to a total of 100 stations. The travel survey conducted in 2018 indicates that the modal shares of cycling and public transport in Malmö are, respectively, 25.5% and 25.4% [47].
The spatial data about the population statistics and the built environment characteristics in Malmö were extracted from multiple sources including Statistics Sweden (SCB) [45], Lantmäteriet (Swedish mapping, cadastral and land registration authority) [48] and Trafikverket (Swedish Transport Administration) [49]. The population size data were in a grid format of 100 × 100 m, while other socioeconomic data (such as employment status, education level, income level, etc.) were available with two different cell sizes (250 × 250 m for urban areas and 1000 × 1000 m for suburban areas). Land use data made available by Lantmäteriet were employed to map three types of land use, namely residential, public and commercial, and green areas. Moreover, the transport-related geodata capture the existing cycling infrastructure as well as the public transport network including bus stops and train stations [50].

BSS Data Description and Preparation
The available dataset on Malmöbybike (January 2018-July 2020) was provided by Clear Channel [51]. It covers all the OD trips in the system during this timeframe, and it makes it possible to have detailed information about the usage of the system, allowing different analyses and data aggregations.
For this application, we selected one month of data, ∆T = June 2020, i.e., the month that registered the largest number of movements (64,763 trips) in the available dataset. At that date, 100 BSS stations were built and operating in the network. According to Weather Spark [52], the average daylight time in June is 17.5 h, with an average temperature of 28 °C; the summer vacation in Sweden usually starts from the last week of June. This background offers attractive conditions for outdoor activities. Regarding the restrictions related to the COVID-19 pandemic, in June 2020 Sweden restricted social gatherings in restaurants and public spaces (which should not exceed 50 people) and advised everyone to keep social distance in outdoor activities.
Out of the 100 stations, five (namely, stations no. 21, 61, 62, 69, and 79) were not used at all during June; hence, they were removed from the dataset. Stations that were only partially used during the month (e.g., due to malfunctioning on some days) were excluded only if they had not been used for more than 50% of the observation time (station no. 41 was removed at this stage). The reason is that we were performing a monthly (∆T) efficiency analysis, determining which stations were more efficient in the considered period; minor malfunctioning of the stations should be part of the calculations.
An additional data cleaning was performed concerning those bikes that have been used longer than 1 h (i.e., picked up, and not dropped off by 60 min). According to the Malmöbybike terms of use [53], a bike should be used for a maximum of 60 min at a time, and in the case that a bike is not returned within an hour the user would be charged a fine. Therefore, it is assumed that the trips longer than 60 min are due to bikes that are broken or not functioning correctly. The result of data cleaning was a dataset with 94 stations and 63,338 OD-trips.
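The two cleaning rules described above (station exclusion and the 60 min cap) can be sketched as follows. The record layout, the function name, and the choice to drop trips that touch an excluded station are assumptions made for this illustration; the actual order and details of the authors' processing are not specified in the text.

```python
def clean_trips(trips, stations, n_days, max_minutes=60, max_idle_share=0.5):
    """Return (kept_stations, cleaned_trips).

    trips: list of dicts with hypothetical keys
           'day', 'origin', 'destination', 'minutes'.
    A station is dropped when it was unused on more than `max_idle_share`
    of the observed days; a trip is dropped when it lasts longer than
    `max_minutes` or touches a dropped station (assumption).
    """
    used_days = {s: set() for s in stations}
    for t in trips:
        used_days[t["origin"]].add(t["day"])
        used_days[t["destination"]].add(t["day"])
    kept = {
        s for s in stations
        if (n_days - len(used_days[s])) / n_days <= max_idle_share
    }
    cleaned = [
        t for t in trips
        if t["minutes"] <= max_minutes
        and t["origin"] in kept
        and t["destination"] in kept
    ]
    return kept, cleaned


# Toy example (hypothetical data): four observed days, three stations.
trips = [
    {"day": 1, "origin": 1, "destination": 2, "minutes": 10},
    {"day": 2, "origin": 1, "destination": 2, "minutes": 75},  # too long
    {"day": 3, "origin": 2, "destination": 1, "minutes": 20},
    {"day": 1, "origin": 3, "destination": 1, "minutes": 15},  # station 3 mostly idle
]
kept, cleaned = clean_trips(trips, stations=[1, 2, 3], n_days=4)
print(sorted(kept), len(cleaned))
```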
Considering the previous research [54] as well as the contextual conditions in Malmö (e.g., the urban area size, the Malmöbybike coverage area), a radius R = 300 m was considered acceptable to define the catchment area (buffer) around each BSS station.
The selected input and output variables are explained and listed in the following Section 3.3.

Specification of Inputs and Outputs
Based on the input variables suggested in Section 2.2, we used publicly available statistical data to calculate the following list of input variables to apply DEA to the Malmöbybike BSS (Table 2). Note that all the numbers in the final input table are non-negative; the zero values were eliminated by adding a small positive constant, to meet the "positivity" requirement of DEA [55].

Table 2. Description of input categories and variable notations for the DEA applied to the Malmöbybike bike-sharing system.

Input Variables
Description of the Variables DEA Notation

BSS station-related variables
Density of BSS stations
Number of BSS stations within a 1 km radius of each BSS station.
DEA notation: I1 Density of BSS stations (within 1 km)

Built environment variables (within the catchment area)
Land use
The area of each land use category. Three types of land use are calculated: residential; public and commercial; green areas [48].

Bicycle infrastructure
The total length of bike lanes. We computed separated bike lanes and shared bike lanes. Separated bike lanes refer to designated road space, clearly defined by signs and regulations, that should only be used for cycling; shared bike lanes are road spaces shared with pedestrians or cars but recommended for cycling in the interest of creating a more continuous cycling network across the city [49].
DEA notation: I5 Separated bike lanes (m); I6 Shared bike lanes (m)

Public transport impact factors
The number of tracks/bus lines passing by each station/bus stop, to have a proxy of the actual connectivity granted by the public transport system [56].

Population size
The average number of residents. Since each catchment area is delimited by a circle, and the population is available in a grid format, we calculated the portion of the area of each element of the grid (square) falling within the circle, and the corresponding share of population assuming a uniform population density in each element of the grid. Provided in grid format (2018), 100 × 100 m [57].

Although the station age was listed among the suggested input variables (Section 2.2), we did not include this variable for the case study of Malmöbybike. The decision was made because the system is fairly recent, and it has been mainly built in two steps (50 stations in 2016 and 50 more stations in 2019). As previously explained, since DEA provides a relative efficiency of each station, it is important to provide indicators able to capture in a nuanced way the differences among stations from a certain perspective. The (50 + 50) BSS stations were not opened simultaneously, but gradually over the year(s). Since the information about the exact days/weeks/months of operation of each station is not available and the station age input would have had only two values (the two known years: 2016 and 2019), it was not added to the model.
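The area-weighted population share described for the population size input can be approximated numerically. The sketch below is an assumption-laden illustration rather than the GIS procedure used in the study: instead of computing the exact circle-square intersection, it samples each 100 × 100 m cell on a regular sub-grid and uses the fraction of sample points inside the R = 300 m buffer as the area share. Cell coordinates, populations, and the function name are hypothetical.

```python
import numpy as np


def catchment_population(cells, station_xy, radius=300.0, cell=100.0, k=50):
    """Approximate population inside a circular catchment area.

    cells: iterable of (x0, y0, population), with (x0, y0) the lower-left
           corner of a square grid cell of side `cell` (metres).
    The share of each cell inside the circle is estimated from k * k
    sample points, assuming uniform population density within the cell.
    """
    sx, sy = station_xy
    offsets = (np.arange(k) + 0.5) * cell / k  # sample-point centres
    total = 0.0
    for x0, y0, pop in cells:
        gx, gy = np.meshgrid(x0 + offsets, y0 + offsets)
        inside = (gx - sx) ** 2 + (gy - sy) ** 2 <= radius ** 2
        total += pop * inside.mean()
    return total


# Toy example: one cell fully inside, one fully outside, one cut by the circle.
station = (0.0, 0.0)
inside_pop = catchment_population([(-50.0, -50.0, 10.0)], station)
outside_pop = catchment_population([(1000.0, 1000.0, 99.0)], station)
partial_pop = catchment_population([(250.0, -50.0, 100.0)], station)
print(inside_pop, outside_pop, round(partial_pop, 1))
```

A GIS library would give the exact intersection area; the sub-grid approach is used here only to keep the example dependency-free.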
In the following Table 3, some descriptive statistics (mean, median, minimum, maximum, standard deviation) of the input variables used in this analysis are provided. Regarding the output calculation, notation and descriptive statistics are summarized in the following Table 4. While the calculation of station prevalence (O2 and O3) and attractiveness (O4 and O5) is straightforward following the description in Section 2.3, we provide a more detailed explanation for the assessment of the station daily amplitude O1 using the fast Fourier transform.
Using the Clear Channel database [51] for the Malmöbybike BSS, it was possible to obtain the usage trend of each station in ∆T (June 2020). We did not have any information about bicycle relocations among stations performed by the operator; hence, we made an assumption based on the available data, which indicate the origin and destination of each bike-sharing trip in the network. If the bicycle b_k is in the station s_i at a certain time h_1, but the previously registered trip of that bicycle (ended at h_2) does not have s_i as the destination station, we assumed that a relocation happened in the time interval between h_2 and h_1, more specifically at the midpoint h_3 (so that the interval h_2-h_3 has the same length as h_3-h_1).
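The midpoint assumption for relocations can be sketched as follows for the trips of a single bicycle. The record keys and the function name are hypothetical; the logic only restates the rule given above (a relocation is inferred when a trip starts from a station other than the one where the previous trip of the same bicycle ended, and it is placed halfway between the two events).

```python
from datetime import datetime


def infer_relocations(bike_trips):
    """Infer operator relocations for one bicycle.

    bike_trips: list of dicts with hypothetical keys
                'start', 'end' (datetime), 'origin', 'destination'.
    """
    trips = sorted(bike_trips, key=lambda t: t["start"])
    relocations = []
    for prev, nxt in zip(trips, trips[1:]):
        if prev["destination"] != nxt["origin"]:
            h2, h1 = prev["end"], nxt["start"]
            h3 = h2 + (h1 - h2) / 2  # midpoint of the interval
            relocations.append(
                {"time": h3, "from": prev["destination"], "to": nxt["origin"]}
            )
    return relocations


# Toy example: the bike is dropped at station 5 but next picked up at station 9.
bike_trips = [
    {"start": datetime(2020, 6, 1, 9, 0), "end": datetime(2020, 6, 1, 10, 0),
     "origin": 2, "destination": 5},
    {"start": datetime(2020, 6, 1, 14, 0), "end": datetime(2020, 6, 1, 14, 20),
     "origin": 9, "destination": 2},
]
relocations = infer_relocations(bike_trips)
print(relocations)
```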
After obtaining the final usage trends (i.e., the bicycle variations) in ∆T taking into account relocations as just described, the fast Fourier transform was applied to convert the time domain waveforms to the frequency domain. The value of each station daily amplitude is the one corresponding to frequency (cycles/day) = 1 (Figure 6).
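A minimal sketch of this step is given below, assuming the number of bikes in a station is sampled at a regular hourly interval (the sampling rate, the mean removal, and the amplitude scaling 2|X_k|/N are choices made for this illustration, not details confirmed by the text). The amplitude is read at the frequency bin closest to 1 cycle/day.

```python
import numpy as np


def daily_amplitude(bikes, samples_per_day=24):
    """Amplitude of the 1 cycle/day component of a station usage trend."""
    x = np.asarray(bikes, dtype=float)
    n = x.size
    spectrum = np.fft.rfft(x - x.mean())  # remove the constant offset
    freqs = np.fft.rfftfreq(n, d=1.0 / samples_per_day)  # in cycles/day
    amplitudes = 2.0 * np.abs(spectrum) / n
    k = np.argmin(np.abs(freqs - 1.0))  # bin closest to 1 cycle/day
    return amplitudes[k]


# Synthetic 30-day hourly trend: a daily cycle of amplitude 4 bikes
# plus a slower 10-day cycle of amplitude 1.5 bikes.
t = np.arange(30 * 24) / 24.0  # time in days
bikes = 10 + 4 * np.sin(2 * np.pi * t) + 1.5 * np.sin(2 * np.pi * t / 10)
amp = daily_amplitude(bikes)
print(round(amp, 3))
```

For this synthetic signal the function recovers the amplitude of the daily component (4 bikes), while the slower 10-day component falls in a different frequency bin.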
The following Figures 4-6 show a practical example for two bike-sharing stations in the system.
Transforming the temporal domain (trend over time of the number of bikes in each station) into the frequency domain allows finding signal periodicity that otherwise would not be easy to identify. Figure 6 highlights a series of peaks representing the different amplitudes of the periodicities identified using the fast Fourier transform. Larger amplitudes show the prevailing periodicities.
We chose to visualize the stations 1 and 15 (in Figures 4 and 6) since they are representatives of the different behaviors that the stations in the Malmöbybike system had during ∆T. Some of them (39.4% of the BSS stations) show a peak corresponding to frequency (cycles/day) = 1 (such as the one shown in Figure 6, Station 1): this means that a typical (daily) periodic behavior (∆t = 24 h) was detected for these stations (look at the corresponding time domain, Figure 4, station 1; Figure 5, over 10 days of observations).
The other stations (look at the representative trend of Station 15, Figure 6) show a smaller amplitude corresponding to frequency (cycles/day) = 1, and peak(s) at lower frequencies (i.e., with cycles longer than 24 h).
The highest frequency peak that was found in the entire database for all the BSS stations is the one corresponding to frequency (cycles/day) = 1, that is, the smallest cyclical temporal unit that can be detected in the system corresponds to ∆t.

Inputs, Outputs, and Station Selection
Since DEA is sensitive to outliers [60] and CoPlot has often been used in the literature as a supplemental tool to cluster analysis, DEA, and outlier detection methods [61][62][63], we suggest applying it to the proposed analysis [64][65][66]. Additionally, this analysis allows reducing the number of variables/DMUs to obtain a sufficient differentiation between the efficiency scores, while following the rule of Dyson et al. [27].
We propose to use Robust CoPlot, an adaptation of multidimensional scaling (MDS) that facilitates rich interpretation of multivariate data [67]; it has the capacity to work better than CoPlot with datasets containing outliers since it is not affected by their presence.
Both CoPlot and Robust CoPlot are able to reduce multidimensional data into a two-dimensional structure, by superimposing two graphs [68][69][70], simultaneously evaluating associations between variables and between observations. The first map uses a nonmetric version of MDS to spatially represent the distances between observations (in our case, the observations are the DMUs, that is, the bike-sharing stations in Malmöbybike): similar observations are located close to one another, and the goodness-of-fit of this representation is summarized by a single parameter, the Kruskal stress value, σ [71]. The second map, which is conditional on the first, generates vectors that display the relationships among the variables (which, in our case, are inputs and outputs, Section 3.3). Each variable has its vector: if two variables are highly correlated, the vectors describing them are close together, and if their correlation is negative, the vectors describing them go in opposite directions. In this case, we have a goodness-of-fit for each variable, which expresses the goodness of the regression with respect to the observations, and is visualized by the length (magnitude) of the vector (for more details, see [62,67]).
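For reference, and assuming the standard "stress-1" definition introduced by Kruskal (the exact variant used by the Robust CoPlot software is not stated here), the stress value can be written as follows, where d_ij is the distance between observations i and j on the two-dimensional map and \hat{d}_ij is the corresponding disparity obtained from a monotone regression on the original dissimilarities:

```latex
\sigma = \sqrt{\frac{\sum_{i<j} \left( d_{ij} - \hat{d}_{ij} \right)^{2}}{\sum_{i<j} d_{ij}^{2}}}
```

Under Kruskal's commonly cited benchmarks, values around 5% indicate a good fit and values around 10% a fair fit, with lower values being better.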
The procedure to identify correlated variables and outliers consists of repeating the Robust CoPlot several times, removing, before each repetition, some of the variables correlated to each other and the outliers. DMUs characterized by a specific input/output variable are positioned in the same direction as that input/output vector. Correlated variables are represented by vectors having the same directions in space, while outlier DMUs are represented by points positioned far from the center of gravity (the point where the vectors diverge) compared to the other points of the chart. Figure 7 shows, for example, the Robust CoPlot obtained for the 20 inputs and five outputs described in Section 3.3 in the first repetition.
The DMUs (bike-sharing stations) are graphically represented by red dots: as explained above, similarities between the stations in the dataset are transformed into distances on the map such that similar stations are closer together than less similar stations. The Kruskal stress value σ is 9.18%, showing a goodness-of-fit between good and fair [71].
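For reference, a stress value of this kind is commonly computed with Kruskal's stress-1 formula. The notation is an assumption, since it is not defined in the text: $d_{ij}$ is the distance between stations $i$ and $j$ on the map, and $\hat{d}_{ij}$ is the corresponding disparity obtained by monotone regression on the original dissimilarities. Robust CoPlot may rely on a variant of this measure.

```latex
\sigma = \sqrt{\frac{\sum_{i<j} \left( d_{ij} - \hat{d}_{ij} \right)^{2}}{\sum_{i<j} d_{ij}^{2}}}
```

Kruskal's rule-of-thumb benchmarks are approximately 20% (poor), 10% (fair), 5% (good), 2.5% (excellent), and 0% (perfect), so a value slightly above 9% lies between good and fair.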
The inputs and outputs are each represented by a black vector (labeled with its notation and magnitude). Vectors pointing in the same direction correspond to highly correlated variables; hence, we decided to discard some of them and repeat the procedure, so as to apply the DEA considering only the most significant variables.
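The visual check for vectors pointing in the same direction can be approximated numerically. The sketch below is not the authors' procedure (which relies on inspecting the plot and on knowledge of the local context); it simply flags pairs of variables whose arrows differ by less than an angular threshold. The variable names, the angles, and the 20° threshold are all hypothetical.

```python
from itertools import combinations

def angular_gap(a_deg, b_deg):
    """Smallest angle, in degrees, between two directions."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)

def near_parallel_pairs(angles, max_gap_deg=20.0):
    """Return pairs of variables whose vectors point in nearly the same direction."""
    return [(u, v) for u, v in combinations(sorted(angles), 2)
            if angular_gap(angles[u], angles[v]) <= max_gap_deg]

# Made-up arrow directions in degrees; these are NOT read from Figure 7.
hypothetical_angles = {"v1": 100.0, "v2": 20.0, "v3": 23.0, "v4": 200.0, "v5": 215.0}
print(near_parallel_pairs(hypothetical_angles))
```

One variable from each flagged pair would then be a candidate for removal before the plot is recomputed, with the final choice left to the analyst.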
Note that the analysis to remove highly correlated variables has to be done separately for inputs and outputs. Looking at the outputs (Figure 7), we can see that O2 and O3 are almost overlapping, and O4 and O5 have a similar direction. Hence, we selected O1, O3, and O5, since they seem to be the least correlated outputs and the most significant for this dataset. Similar reasoning was applied to the 20 inputs, also taking into account those more meaningful in the Malmö context. The procedure was repeated three times, progressively removing the vectors with higher correlation, and finally yielding the configuration shown in Figure 8, with 11 inputs and three outputs (the rule of Dyson et al. [27] is satisfied). When a variable is removed, the remaining ones are rearranged in the Robust CoPlot map, depicting the associations in the new configuration.
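As a worked illustration of why the variable set had to be reduced, the sketch below assumes the form in which the rule of Dyson et al. is usually quoted: the number of DMUs should be at least twice the product of the number of inputs and the number of outputs. The function name is hypothetical, and the exact statement of the rule used in this study is the one given with reference [27].

```python
def min_dmus_dyson(n_inputs, n_outputs):
    """Rule-of-thumb lower bound on the number of DMUs: 2 x inputs x outputs."""
    return 2 * n_inputs * n_outputs

# Initial pool of variables (20 inputs, five outputs).
print(min_dmus_dyson(20, 5))  # 200
# Final configuration (11 inputs, three outputs).
print(min_dmus_dyson(11, 3))  # 66
```

Under this assumption, the reduction from 20 inputs and five outputs to 11 inputs and three outputs lowers the required number of DMUs from 200 to 66.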
Looking at Figure 8, the efficient DMUs (bike-sharing stations) are represented with a blue cross (28 in total), while the less efficient are represented with a red dot. By eliminating variables with low correlations, the goodness-of-fit is slightly improved and the Kruskal stress value σ is equal to 9.01%. We did not remove any DMU since we did not notice any significant cluster/variable positioned too far from the center of gravity.
The estimated efficiency scores for the remaining DMUs as well as the inputs and outputs are presented and further discussed in the next section.
Figure 9 presents the efficiency scores yielded by DEA. It shows an overall pattern of the relative efficiency for the BSS stations included in the analysis based on the data from June 2020. As represented by the color ramp (dark green to light yellow), stations exhibit clear differences in their efficiency levels. Mapping the efficiency scores across space is helpful both for identifying the most/least efficient stations and for comparing a subset of the stations to one another or to the contextual conditions. The variation in the relative efficiency scores demonstrates a meaningful pattern concerning the contextual factors and highlights three categories of stations according to their level of efficiency: (1) the efficient BSS stations (having efficiency = 1); (2) the medium efficient BSS stations; (3) the least efficient BSS stations. Each efficiency category is further addressed and discussed in the following subsections.
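To make the notion of relative efficiency concrete, the following minimal sketch covers only the simplest DEA setting: one input, one output, and constant returns to scale, where the score of each DMU reduces to its output/input ratio divided by the best ratio in the sample. The model applied in this study uses several inputs and outputs, which requires solving a linear program for each station, so this is not the authors' computation. The station labels, dock counts, and trip counts are invented for illustration.

```python
def ratio_efficiency(inputs, outputs):
    """Single-input, single-output relative efficiency under constant returns
    to scale: each DMU's output/input ratio divided by the best ratio."""
    ratios = {k: outputs[k] / inputs[k] for k in inputs}
    best = max(ratios.values())
    return {k: r / best for k, r in ratios.items()}

# Hypothetical stations: number of docks (input) and monthly trips (output).
docks = {"A": 20, "B": 10, "C": 16}
trips = {"A": 400, "B": 250, "C": 160}

scores = ratio_efficiency(docks, trips)
for station in sorted(scores, key=scores.get, reverse=True):
    print(station, round(scores[station], 2))
```

The station with the best ratio defines the frontier and receives a score of 1; all other stations are scored relative to it, which is why the scores are informative only within the analyzed sample and time window.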

The Efficient BSS Stations
The stations visualized in the darkest green color represent efficient stations, that is, those having efficiency equal to one (for instance, stations no. 30, 18, or 63). These stations are located in different areas of the city, and their efficiency may be attributed to varying land use contexts. However, the availability of separated cycling lanes indicates that the catchment areas of these stations contain a high level of bicycle infrastructure. This pattern reflects the results found in previous studies [35,72,73]. Consistent with the literature [4,15,20], another common property of this category is the proximity to a green area or an activity center such as commercial buildings, public facilities, and job centers. Considering the spatial properties and the urban context of the station locations, three groups can be identified.
The first group includes stations located in the northern part of the city with good access to nature, e.g., green areas and the waterfront. Trips originating from or ending at these stations are likely made by cyclists visiting the area for outdoor activities. Therefore, the presence of natural resources seems to contribute positively to the efficiency of these stations. This result is similar to the finding reported by Kim et al. [74].
The weather or the seasonal conditions may be considered another external factor contributing to a larger number of trips connected to this area [18]. The last week in June coincides with the start of summer vacations in Sweden, hence the increased usage of shared bikes in areas with a larger share of recreational activities. In general, a combination of the mentioned contextual factors is likely to improve the DEA-evaluated efficiency of these stations.
The second group of efficient stations is located in areas with a high level of access to public transport (no. 18, 16, 1, 24, 25, next to railway stations) and close to the city center. In this case, the users of shared bicycles are likely public transport passengers who use bikes as a first/last-mile feeder mode. Such trips can be both commuting and noncommuting trips, meaning that the efficiency of these stations may be less affected by the weather or seasonal conditions in June. Hence, good access to public transport may be a major contributor to the higher efficiency of these stations. This result confirms the findings of previous research suggesting that successful BSSs complement existing transport infrastructure such as public transport [16,75].
The third group includes stations located farther from the city center than the first two groups, but still in the urban area, e.g., stations no. 46, 57, 89. Most of them are newly added stations with a station age of less than one year. They are located in areas with high population density, next to public facilities or commercial centers, with good bicycle infrastructure available, and close to bus stops. Previous studies have provided strong evidence that these factors contribute to increased use of BSS services [4,74,76]. In some cases, the density of BSS stations within 1 km is lower than average (the next station is more than 500 m away, e.g., stations no. 57, 89), which could contribute to the efficiency of these stations. The pattern of this group may indicate that, in less dense areas located farther from the city center, locations next to public facilities and commercial centers, where bus stops are often planned, are likely to be the optimal spots for efficient BSS stations. At the same time, good-quality cycling infrastructure should be provided.

The Medium Efficient BSS Stations
Those stations colored in mid-range green are categorized as medium efficient stations, such as stations no. 11, 14, 99. Most of these stations are located in the central area of the city with a higher concentration of public facilities and commercial buildings. The central area is often characterized by a high density in terms of population and jobs which, in turn, implies that it generates or attracts a larger number of trips and, due to the densely built environment, makes traveling by bikes or public transport more convenient than by cars [77]. Similarly, this context may create a higher demand for cycling compared to the peripheral areas, which often motivates the need for a medium/high level of BSS service provision in urban centers.
In the case of Malmö, although these stations did not fall into the efficient station group, many of them have obtained an efficiency score close to 1 (that is, the maximum efficiency score in DEA). Their slightly lower efficiency scores are probably due to the very high density of the BSS stations in the area. Most of the stations in this category have overlapping catchment areas and/or more than one BSS station may be present within their 300 m catchment area. Reducing the density by removing some stations would likely make the remaining ones more efficient. However, given the urban form context in the city central area, the level of the current efficiency of all the stations rather demonstrates the success of the BSS service in the area. In a similar urban context, previous studies have suggested the buffer to be between 200 and 400 m when planning for new stations [18,29,78]. In general, a smaller radius seems to contribute positively to the usage of the service.
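The density argument above can be checked with a simple neighbor count per station. The sketch below assumes straight-line (great-circle) distances and a 300 m radius; the study's catchment areas may instead be defined on the street network. The station identifiers and coordinates are invented, although they fall roughly within the Malmö area.

```python
import math

EARTH_RADIUS_M = 6_371_000.0

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def neighbors_within(stations, radius_m=300.0):
    """Count, for each station, how many other stations lie within radius_m."""
    counts = {}
    for sid, (lat, lon) in stations.items():
        counts[sid] = sum(
            1 for other, (olat, olon) in stations.items()
            if other != sid and haversine_m(lat, lon, olat, olon) <= radius_m
        )
    return counts

# Hypothetical station coordinates (illustrative only).
hypothetical_stations = {
    "S1": (55.6050, 13.0000),
    "S2": (55.6068, 13.0000),  # about 200 m north of S1
    "S3": (55.6250, 13.0000),  # more than 2 km north of S1
}
print(neighbors_within(hypothetical_stations))
```

Stations with one or more neighbors inside the radius share part of their catchment area, which is the situation described for the central stations.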
The stations located farther away from the city center (no. 64, 87, 90) are commonly placed within a maximum distance of 600 m from one another. While this radius falls within the reasonable distance range noted in previous studies, these stations seem to benefit further from proximity to bus stops or large public facilities/commercial buildings. Additionally, despite the lower population size in the peripheral areas, a higher residential density in the form of apartment housing, as opposed to single-family housing areas, could be observed in the catchment areas of these stations. In general, the observed pattern confirms the results discussed in the earlier section and in previous studies: for the noncentral urban area, the density of BSS stations, the proximity to bus stops and large public buildings, and a high population density could contribute to the efficiency of the BSS stations.

The Least Efficient BSS Stations
The least efficient stations, visualized in the lightest shade of green/yellow, mostly include those added during 2019, meaning that their age is less than one year (e.g., stations no. 52, 53, 56, 60, 66, 68, 71, 74, 83, 85, 86, 93). Most of these stations are located farther away from the city center and in areas with lower population density. While some of the stations (such as no. 71, 74, 75) are located close to small-scale public facilities and commercial buildings, the low population density in their catchment areas indicates a low travel demand [28]. Similarly, the cycling infrastructure connected to the stations is rather poor, which can significantly affect cycling behavior [18]. Station no. 60 is an exception to this pattern: its low efficiency is most likely due to its location next to two other BSS stations (no. 59 and 63), which are next to a train station (no. 59) and public facility buildings (no. 63), respectively, and which already provide sufficient service for the demand in the area. In this single case, removing station no. 60 would perhaps make stations no. 59 and 63 more efficient, reducing the running cost in general. This shows that in areas far from the city center, where the population density is relatively low, a higher density of BSS stations may not be needed even though there is demand due to the connection to public transport and the access to public facilities or commercial areas. This issue has been discussed in previous studies, which have suggested different buffers according to the distance between the station location and the central area [11,18].
Station no. 85 is located in an area of detached houses (villas). The low efficiency of the station may be due to the low population density around the station and to the socioeconomic features of the population living in the catchment area. More specifically, the residents in the area seem to be associated with larger household sizes and a higher income group that is more likely to travel by car than by bicycle [79]. However, we would argue that, although the station has low efficiency, from a behavior nudging perspective it is still worth placing the BSS service here to promote and normalize cycling for the groups living in these contexts.
Based on the examination of the three efficiency categories in relation to the urban contexts, the relative efficiencies evaluated by the DEA method seem highly reasonable and well supported by the previous studies.

Conclusions
The study proposed and tested a method based on data envelopment analysis (DEA) for evaluating the relative efficiency of BSS stations. The method was tested by applying DEA to a Swedish case study, the BSS Malmöbybike in Malmö.
The efficiencies were evaluated starting from a pool of input and output variables supported by literature, reports, and BSS planning guides, defined in a way that allows the same procedure to be applied, potentially, to any city. The method not only evaluates the efficiency of each shared-bicycle station but also makes it possible to consider the influence of external variables, thereby contributing to the literature as a methodology for analyzing the efficient operation of shared-bicycle stations and the management of shared-bicycle systems.
The results provided by the application to the Malmöbybike BSS are meaningful in relation to both the specificities of the urban context and the findings reported in previous studies. This seems to indicate that the suggested method can provide a reliable evaluation of the BSS efficiency and that it can be used by decision-makers and planners for developing operational strategies to plan BSS stations and networks.
One of the limitations of the proposed methodology is related to the identification of a specific timeframe under evaluation. If external factors change during the days/weeks/months after the analysis, the calculated efficiencies may no longer be valid. Furthermore, the analyst should have a good knowledge of the urban context under examination in order to include the variables most capable of representing it.
It is important to point out that the objective of the study is to propose and test the DEA methodology rather than carrying out a comprehensive evaluation for the BSS in Malmö. In future studies, broader spatial and temporal information should be included and compared to achieve a more complete evaluation of the Malmöbybike efficiency. The evaluation should be carried out during the seasons when cycling is the most and the least popular. The differences between and within days, weeks, and months should all be analyzed and compared to gain a good overview of the efficiency for supporting effective operational and planning strategies.
Some of the input variables, such as the station visibility, may be difficult to express quantitatively. This type of variable could be defined through fuzzy sets. A new formulation of the methodology proposed here, which considers a fuzzy DEA approach [66], is currently being prepared.
A further line of research should possibly investigate the inclusion of the suggested methodology in bike-sharing network design models, to take into account the potential efficiency of BSS stations when planning or expanding such a system.