# A Data-Driven Based Dynamic Rebalancing Methodology for Bike Sharing Systems


## Abstract


## 1. Introduction

- Characterization and identification of historical critical unbalanced situations among nearby stations through frequent geospatial patterns, leveraging the definitions of positively and negatively critical station status.
- Proposal of a rebalancing approach, based on the extracted patterns, to address the dynamic bike sharing rebalancing problem. The approach (i) trains contextualised models offline and (ii) plans the rebalancing operations online, in a few seconds, whenever rebalancing actions are needed.
- Contextualised rebalancing operations.

## 2. Problem Statement and Preliminaries

#### 2.1. Neighbourhood of a Station

#### 2.2. Station Neighbourhood and Criticality

#### 2.3. Frequent Itemsets and Association Rules

## 3. Related Work

#### 3.1. The BRP Problem

#### 3.1.1. Static Rebalancing

#### 3.1.2. Dynamic Rebalancing

#### 3.2. Association Rule Mining

## 4. Methodology and Methods

#### 4.1. Pattern-Based Model Training

- **Neighbourhood identification.** Given the stations’ geographical locations and the neighbourhood radius d, the neighbourhood N(s) of each station $s\in \mathcal{S}$ is computed.
- **Occupation rate computation.** Given the input dataset $\mathcal{D}$ and the neighbourhood N(s) of each station $s\in \mathcal{S}$, the occupation rate $O{R}_{s}\left(t\right)$ is computed for all stations $s\in \mathcal{S}$ and all timestamps $t\in \mathcal{D}$, i.e., for each pair $(s,t)\in \mathcal{D}$.
- **Identification of critical stations.** Given the criticality threshold $cr$, the occupation rate of each pair $(s,t)\in \mathcal{D}$ and the identified neighbourhoods, the critical rate of every pair $(s,t)$ is computed. Only the pairs $(s,t)$ associated with either a positively or a negatively critical situation are selected and stored in the dataset ${\mathcal{D}}_{\mathcal{CR}}$, enriched with the critical status (positive or negative).
- **Contextualised data partitioning.** Given a contextualised partitioning schema based on the timestamp, ${\mathcal{D}}_{\mathcal{CR}}$ is split into N non-overlapping partitions ${\mathcal{P}}_{i}$. A partition ${\mathcal{P}}_{i}$ is a logical group of input data related to a specific temporal context, for which a tailored model is trained. For example, to obtain a contextualised model for each day of the week, ${\mathcal{D}}_{\mathcal{CR}}$ is split into seven partitions (one for each day of the week).
- **Generation of transactional datasets.** For each partition ${\mathcal{P}}_{i}$, a transactional dataset ${\mathcal{TR}}_{i}$ that encodes the critical stations at each timestamp t is built. Each transaction $t{r}_{t}\in {\mathcal{TR}}_{i}$ includes the set of stations that are positively or negatively critical at timestamp t, together with their status (positive or negative).
- **Rule extraction.** Finally, association rules are mined from each transactional dataset ${\mathcal{TR}}_{i}$ to extract, for each context, the set of frequent patterns ${\mathcal{R}}_{i}$ representing recurrent critical situations among nearby stations.
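
The last two steps above can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions, not the implementation used in this work: itemsets are enumerated by brute force rather than with an efficient miner, rules are limited to a single item in the head, the restriction to nearby stations is omitted, and the record layout `(timestamp, station, status)` as well as the function names `build_transactions` and `mine_rules` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_transactions(critical_records):
    """Group (timestamp, station, status) records into one itemset per timestamp.

    Items are strings such as '+s4' (positively critical) or '-s11'
    (negatively critical). The record layout is an assumption.
    """
    by_timestamp = defaultdict(set)
    for timestamp, station, status in critical_records:
        by_timestamp[timestamp].add(f"{status}{station}")
    return [frozenset(items) for _, items in sorted(by_timestamp.items())]


def mine_rules(transactions, min_support, min_confidence, max_size=4):
    """Brute-force mining of rules of the form body -> single-item head.

    Returns tuples (body, head, support, confidence).
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(1 for tr in transactions if itemset <= tr) / n

    rules = []
    for size in range(2, max_size + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            sup = support(itemset)
            if sup == 0 or sup < min_support:
                continue
            for head in candidate:
                body = itemset - {head}
                conf = sup / support(body)
                if conf >= min_confidence:
                    rules.append((tuple(sorted(body)), head, sup, conf))
    return rules


# Toy data: stations s4 and s7 are often full while s11 is empty.
records = [
    ("t1", "s4", "+"), ("t1", "s7", "+"), ("t1", "s11", "-"),
    ("t2", "s4", "+"), ("t2", "s7", "+"), ("t2", "s11", "-"),
    ("t3", "s4", "+"), ("t3", "s7", "+"),
    ("t4", "s11", "-"),
]
rules = mine_rules(build_transactions(records), min_support=0.5, min_confidence=0.6)
```

On real data the exhaustive enumeration would be replaced by a scalable frequent itemset miner; the definitions of support and confidence stay the same.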

#### 4.1.1. Contextualised Data Partitioning and Models

- **Per month partitioning.** Data belonging to the same month are kept together and monthly models are trained.
- **Per day of the week partitioning.** Data belonging to the same day of the week are included in the same partition and analysed together. The mined patterns therefore describe stations that are critical on the same day of the week. A total of seven partitions ${\mathcal{P}}_{1},...,{\mathcal{P}}_{7}$ is generated.
- **Per time slot partitioning.** Three time slots are defined: 5:00–13:00, 13:00–21:00, and 21:00–05:00. One partition is defined for each time slot. In this case, the association rules describe stations that are frequently critical in a given time slot, independently of the day of the week.
- **Per day of the week and time slot partitioning.** This approach combines the previous two, defining one partition for each combination (time slot, day of the week).
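
The four schemas above amount to choosing a key function on the timestamp. The following Python sketch shows one possible way to express them; the names `time_slot`, `PARTITION_KEYS` and `partition` are hypothetical, the record layout `(timestamp, station, status)` is an assumption, and so are the half-open slot boundaries (a record at exactly 13:00 falls in the second slot) and the choice of grouping by calendar month regardless of the year.

```python
from collections import defaultdict
from datetime import datetime


def time_slot(ts):
    """Map a timestamp to one of the three time slots defined in the text."""
    if 5 <= ts.hour < 13:
        return "05-13"
    if 13 <= ts.hour < 21:
        return "13-21"
    return "21-05"  # covers 21:00-23:59 and 00:00-04:59


# One key function per partitioning schema.
PARTITION_KEYS = {
    "month": lambda ts: ts.month,
    "weekday": lambda ts: ts.weekday(),  # 0 = Monday ... 6 = Sunday
    "timeslot": time_slot,
    "weekday_timeslot": lambda ts: (ts.weekday(), time_slot(ts)),
}


def partition(records, schema):
    """Split records into non-overlapping groups according to the schema."""
    key_of = PARTITION_KEYS[schema]
    partitions = defaultdict(list)
    for record in records:
        partitions[key_of(record[0])].append(record)
    return dict(partitions)


records = [
    (datetime(2008, 8, 24, 21, 44), "s280", "-"),
    (datetime(2008, 9, 25, 4, 52), "s223", "-"),
    (datetime(2008, 6, 16, 17, 20), "s108", "-"),
]
by_slot = partition(records, "timeslot")
```

Because each record maps to exactly one key, the partitions are non-overlapping by construction, and a separate set of rules can be mined from each of them.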

#### 4.1.2. Transactional Dataset Generation and Rule Extraction

#### 4.2. Planning of Rebalancing Operations by Means of Association Rules

#### 4.3. Parallelization, Frameworks, Hardware and Tools

## 5. Experiments and Results

- The dataset used to test the developed framework is described and analysed, together with the preprocessing steps applied to clean the data.
- The effects of different input parameters are evaluated to properly define the preprocessing pipeline for the data cleaning process. In particular, we analyse the effects of the thresholds f and varianceThreshold.
- The effects of the framework’s parameters on the rule extraction process are analysed.
- The effects of all the framework’s parameters and the number of available vehicles ${N}_{t}$ on the rebalancing process are analysed.

#### 5.1. Dataset Description

#### 5.2. Preprocessing and Data Cleaning

- Removal of records in which the total number of slots of a station is equal to 0 or 1. Such records are likely caused by the system not working properly or by maintenance operations ongoing while the data were collected.
- Removal of bike stations that are logged rarely. In particular, we removed stations that were not present in at least $f\%$ of the available timestamps, where f is an input parameter of our framework. We refer to such stations as infrequent stations.
- Removal of bike sharing stations whose total number of slots fluctuates heavily. Specifically, we analysed the variance of the total number of slots of each station and filtered out the stations whose variance was higher than a threshold named varianceThreshold. We refer to such stations as unstable stations.
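
The three filters above can be sketched as follows. This is a hedged illustration only: the function name `clean` and the record layout `(station, timestamp, used_slots, free_slots)` are hypothetical, the total number of slots is assumed to be the sum of used and free slots, f is treated as a fraction (as in the parameter table), the population variance is an assumption, and so is the order in which the filters are applied.

```python
from collections import defaultdict
from statistics import pvariance


def clean(records, f, variance_threshold):
    """Apply the three cleaning filters to (station, timestamp, used, free) records.

    Returns the kept records plus the sets of infrequent and unstable stations.
    """
    # 1. Drop records whose total number of slots is 0 or 1.
    records = [r for r in records if r[2] + r[3] > 1]

    timestamps = {r[1] for r in records}
    seen = defaultdict(set)     # station -> timestamps at which it is logged
    totals = defaultdict(list)  # station -> observed total number of slots
    for station, ts, used, free in records:
        seen[station].add(ts)
        totals[station].append(used + free)

    # 2. Infrequent stations: present in fewer than a fraction f of the timestamps.
    infrequent = {s for s in seen if len(seen[s]) / len(timestamps) < f}
    # 3. Unstable stations: variance of the total slots above the threshold.
    unstable = {s for s in totals if pvariance(totals[s]) > variance_threshold}

    dropped = infrequent | unstable
    kept = [r for r in records if r[0] not in dropped]
    return kept, infrequent, unstable


records = (
    [("A", t, 8, 19) for t in ("t1", "t2", "t3", "t4")]  # stable, always logged
    + [("B", "t1", 5, 16)]                                # logged only once
    + [("C", "t1", 8, 19), ("C", "t2", 5, 15),            # capacity fluctuates
       ("C", "t3", 3, 24), ("C", "t4", 4, 6)]
    + [("D", "t1", 0, 1)]                                 # total slots equal to 1
)
kept, infrequent, unstable = clean(records, f=0.8, variance_threshold=5)
```

In this toy input only station A survives: B is infrequent, C is unstable, and the single record of D is discarded by the first filter.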

#### 5.3. Analyses of Parameters’ Impact on Data Cleaning Operations

#### 5.3.1. Effect of the Frequency Threshold

#### 5.3.2. Effect of the Variance Threshold

#### 5.4. Impact of Parameters on Rule Extraction

#### 5.5. Quality of the Rebalancing Operations

#### 5.5.1. Best Performing Configuration

- $minSupport=10\%$;
- $minConfidence=50\%$;
- ${N}_{t}=5$;
- data partitioning strategy = Per day of the week partitioning.

#### 5.5.2. Effect of the Contextualised Data Partitioning Strategy

#### 5.5.3. Effect of Minimum Support

#### 5.5.4. Effect of Number of Trucks ${N}_{t}$

#### 5.5.5. Effect of Neighbourhood Radius d and Critical Threshold cr

## 6. Discussion and Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest


**Figure 2.** Example of local rebalancing. The extracted rule r contains 4 stations: 3 in the body of the rule and 1 in the head. Blue stations are positively critical; the red station is negatively critical. Once such a pattern is detected in the new batch of data $\mathcal{B}$, the framework suggests moving bicycles from the blue stations to the red station so that the final occupancy rate of the 4 stations is the same.
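
The equalisation described in the caption above can be made concrete with a small Python sketch. It is an assumption-laden illustration rather than the planning procedure of the framework: the function name `plan_local_rebalancing` is hypothetical, the occupancy rate is assumed to be the ratio of used slots to total slots, every station is assumed to have at least one slot, and since bicycles are indivisible the rates are equalised only up to rounding.

```python
def plan_local_rebalancing(stations):
    """Equalise the occupancy rate across the stations of a matched rule.

    stations maps a station id to (used_slots, total_slots).
    Returns {station: delta}: a negative delta is the number of bikes to
    pick up, a positive delta the number of bikes to deliver.
    """
    total_bikes = sum(used for used, _ in stations.values())
    total_slots = sum(total for _, total in stations.values())

    # Ideal share of bikes per station, rounded down (exact integer arithmetic).
    targets = {
        s: total_bikes * total // total_slots
        for s, (_, total) in stations.items()
    }
    # Bikes lost to rounding go to the stations with the largest remainder.
    leftover = total_bikes - sum(targets.values())
    by_remainder = sorted(
        stations,
        key=lambda s: (total_bikes * stations[s][1]) % total_slots,
        reverse=True,
    )
    for s in by_remainder[:leftover]:
        targets[s] += 1

    return {s: targets[s] - stations[s][0] for s in stations}


# Three positively critical stations and one negatively critical station.
plan = plan_local_rebalancing({
    "a": (18, 20),
    "b": (9, 10),
    "c": (15, 20),
    "d": (0, 10),
})
```

The deltas always sum to zero, so the plan only redistributes the bicycles already present in the four stations and does not require bikes from a depot.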

**Figure 7.** Performance of the reference configuration, in terms of the absolute number of fixed stations, for the different data partitioning approaches.

**Figure 8.** Performance of the reference configuration, in terms of fixed stations per movement, for the different data partitioning approaches.

Id | Neighbourhood |
---|---|
1 | ${s}_{45},{s}_{22},{s}_{6},{s}_{87}$ |
2 | ${s}_{4},{s}_{7},{s}_{11},{s}_{91}$ |
3 | ${s}_{103},{s}_{1},{s}_{47}$ |
4 | ${s}_{31},{s}_{72},{s}_{134}$ |
5 | ${s}_{10},{s}_{40},{s}_{91},{s}_{52}$ |

| # | Extracted Rules | Support | Confidence |
|---|---|---|---|
| 1 | $+{s}_{6},+{s}_{45},+{s}_{22}\to +{s}_{87}$ | 30% | 73% |
| 2 | $+{s}_{4},+{s}_{7}\to -{s}_{11}$ | 60% | 80% |
| 3 | $+{s}_{103},+{s}_{47},+{s}_{31},+{s}_{72}\to +{s}_{134}$ | 23% | 90% |
| 4 | $-{s}_{10},-{s}_{40},-{s}_{91}\to -{s}_{52}$ | 82% | 50% |
| 5 | $+{s}_{4},+{s}_{7},+{s}_{1}\to -{s}_{11}$ | 31% | 60% |

StationId | Longitude | Latitude | Name |
---|---|---|---|
1 | 2.180019 | 41.397978 | Gran Via Corts Catalanes |
2 | 2.176414 | 41.394381 | Plaça de Tetuan |
3 | 2.181164 | 41.393750 | Ali Bei |
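
Coordinates such as those in the table above are the only input needed to identify station neighbourhoods for a given radius d. The sketch below is one possible way to do it and rests on several assumptions: the great-circle (haversine) distance is used as the distance measure, d is expressed in metres, a station is not included in its own neighbourhood, and the names `haversine_m` and `neighbourhoods` are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two (longitude, latitude) points."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = (
        sin((lat2 - lat1) / 2) ** 2
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def neighbourhoods(stations, d):
    """Return {station_id: set of the other stations within d metres}."""
    result = {sid: set() for sid in stations}
    ids = sorted(stations)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if haversine_m(*stations[a], *stations[b]) <= d:
                result[a].add(b)
                result[b].add(a)
    return result


# (longitude, latitude) of the three stations listed in the table.
stations = {
    1: (2.180019, 41.397978),
    2: (2.176414, 41.394381),
    3: (2.181164, 41.393750),
}
near = neighbourhoods(stations, d=450)
```

The pairwise loop is quadratic in the number of stations, which is acceptable for a few hundred stations; a spatial index would be the natural replacement for much larger systems.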

StationId | Timestamp | Used Slots | Free Slots |
---|---|---|---|
280 | 2008-08-24 21:44:00 | 8 | 19 |
223 | 2008-09-25 04:52:00 | 5 | 22 |
108 | 2008-06-16 17:20:00 | 3 | 24 |
67 | 2008-06-26 10:24:00 | 5 | 16 |

Parameter | Values |
---|---|
f | 0.1, 0.8, 0.85, 0.9 |
varianceThreshold | 3, 5, 7 |

**Table 6.** Number of filtered infrequent stations with variable f and fixed $\mathit{varianceThreshold}=5$.

| | f = 0.1 | f = 0.8 | f = 0.85 | f = 0.9 |
|---|---|---|---|---|
| # infrequent stations | 1 | 4 | 8 | 12 |

| | varianceThreshold = 3 | varianceThreshold = 5 | varianceThreshold = 7 |
|---|---|---|---|
| # unstable stations | 232 | 155 | 114 |

Parameter | Values |
---|---|
data partitioning strategy | Per month partitioning, Per day of the week partitioning, Per time slot partitioning, Per day of the week and time slot partitioning |
minSupport | 10%, 20%, 30% |
minConfidence | 50% |
${N}_{t}$ | 5, 10, 20 |


© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Cipriano, M.; Colomba, L.; Garza, P.
A Data-Driven Based Dynamic Rebalancing Methodology for Bike Sharing Systems. *Appl. Sci.* **2021**, *11*, 6967.
https://doi.org/10.3390/app11156967
