# Predicting Short-Term Subway Ridership and Prioritizing Its Influential Factors Using Gradient Boosting Decision Trees


## Abstract


## 1. Introduction

## 2. Methodology

#### 2.1. Gradient Boosting Decision Trees Approach

Given a set of basis functions h(x; a_1), …, h(x; a_M), the approximating function f(x) can be expressed as an additive expansion of the basis functions h(x; a_m) as follows:

$$f(x) = \sum_{m=1}^{M} \beta_m h(x; a_m)$$

In the gradient boosting decision trees approach, each basis function h(x; a_m) is a decision tree that partitions the predictor space into disjoint regions R_{1m}, …, R_{Jm} and predicts a constant value γ_{jm} for region R_{jm}. The parameters β_m represent weights given to the nodes of each tree in the collection and determine how predictions from the individual decision trees are combined [25,26]. The parameters a_m encode the splitting variables, the split locations, and the terminal-node means of each individual decision tree. The parameters β_m and a_m are estimated by minimizing a specified loss function L(y, f(x)) that indicates a measure of prediction performance [27].

Because jointly optimizing the whole set of parameters {β_m, a_m} is computationally infeasible, a forward stagewise strategy is adopted: starting from an initial model f_0(x), at each stage m the parameters β_m and a_m should be determined as follows [28]:

$$(\beta_m, a_m) = \underset{\beta,\, a}{\arg\min} \sum_{i=1}^{N} L\big(y_i,\; f_{m-1}(x_i) + \beta h(x_i; a)\big) \quad (4)$$

and the model is updated by f_m(x) = f_{m−1}(x) + β_m h(x; a_m). Gradient boosting solves Equation (4) in two steps. First, the tree h(x; a_m) is fit by least squares to the pseudo-residuals, i.e., the negative gradient of the loss function evaluated at the current model f_{m−1}:

$$\tilde{y}_{im} = -\left[\frac{\partial L(y_i, f(x_i))}{\partial f(x_i)}\right]_{f = f_{m-1}}, \qquad a_m = \underset{a,\, \beta}{\arg\min} \sum_{i=1}^{N} \big[\tilde{y}_{im} - \beta h(x_i; a)\big]^2 \quad (5)$$

Second, the optimal weight β_m can be determined given h(x; a_m):

$$\beta_m = \underset{\beta}{\arg\min} \sum_{i=1}^{N} L\big(y_i,\; f_{m-1}(x_i) + \beta h(x_i; a_m)\big) \quad (6)$$

In this way, the fitted tree h(x; a_m) can be introduced into Equation (6) for a single-parameter optimization. Thus, for any h(x; a) for which a feasible least-squares algorithm exists, optimal solutions can be computed by solving Equations (4) and (6) via any differentiable loss function in conjunction with forward stagewise additive modeling. Based on the above discussion, the algorithm for the gradient boosting decision trees is summarized in Figure 1 [24,28,29].
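For squared-error loss, the negative gradient in Equation (5) is simply the current residual, so each stage reduces to fitting a regression tree to the residuals. The following is a minimal illustrative sketch of this stagewise loop (not the authors' implementation), using scikit-learn's `DecisionTreeRegressor` as the base learner h(x; a):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbdt_fit(X, y, n_trees=100, shrinkage=0.1, max_depth=2):
    """Forward stagewise gradient boosting for squared-error loss.

    For L(y, f) = (y - f)^2 / 2 the negative gradient equals the residual
    y - f(x), so each stage fits a tree to the current residuals.
    """
    f0 = float(y.mean())                 # initial constant model
    pred = np.full(y.shape[0], f0)
    trees = []
    for _ in range(n_trees):
        residual = y - pred              # pseudo-residuals (negative gradient)
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residual)            # least-squares fit of h(x; a_m)
        pred += shrinkage * tree.predict(X)   # damped update f_m = f_{m-1} + v*h
        trees.append(tree)
    return f0, trees

def gbdt_predict(f0, trees, X, shrinkage=0.1):
    pred = np.full(X.shape[0], f0)
    for tree in trees:
        pred += shrinkage * tree.predict(X)
    return pred

# Illustrative fit on synthetic data.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(200, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2
f0, trees = gbdt_fit(X, y, n_trees=100, shrinkage=0.1, max_depth=2)
pred = gbdt_predict(f0, trees, X, shrinkage=0.1)
```

Shrinking each update by a factor ν < 1 (the shrinkage parameter of Section 2.2) trades a larger number of trees for better generalization.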

#### 2.2. Regularization Parameters

#### 2.3. Relative Importance of Influential Factors

For a single decision tree T, Breiman et al. [33] proposed the following measure of the relative importance of a predictor variable x_κ in predicting the response:

$${\widehat{I}}_{\kappa}^{2}(T) = \sum_{t=1}^{J-1} {\widehat{\tau}}_{t}^{2}\; I\big(v(t) = \kappa\big) \quad (8)$$

where the summation is over the J − 1 non-terminal nodes of the tree, v(t) is the splitting variable associated with node t, and ${\widehat{\tau}}_{t}^{2}$ is the corresponding empirical improvement in squared error as a result of using the splitting variable x_κ at the non-terminal node t. For a collection of decision trees ${\left\{{T}_{m}\right\}}_{1}^{M}$, obtained through the gradient boosting approach, Equation (8) can be generalized by its average over all of the additive trees:

$${\widehat{I}}_{\kappa}^{2} = \frac{1}{M} \sum_{m=1}^{M} {\widehat{I}}_{\kappa}^{2}(T_m) \quad (9)$$
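The measure in Equations (8) and (9) — split-wise squared-error improvements accumulated per variable and averaged over the M trees — is what scikit-learn exposes, in normalized form, as `feature_importances_` for gradient boosting. A sketch on synthetic data in which only the first five of ten features influence the response:

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data: only features x0..x4 are informative; x5..x9 are noise.
X, y = make_friedman1(n_samples=500, n_features=10, random_state=42)

model = GradientBoostingRegressor(n_estimators=200, learning_rate=0.05,
                                  max_depth=3, random_state=42)
model.fit(X, y)

# Split-gain importances, averaged over all trees and normalized to sum to 1;
# scaling to percent mirrors how relative importances are reported in the text.
importance_pct = 100.0 * model.feature_importances_
ranking = np.argsort(importance_pct)[::-1]
for i in ranking:
    print(f"x{i}: {importance_pct[i]:.2f}%")
```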

## 3. Data Sources and Preparation

## 4. Model Results

#### 4.1. Model Setup

The pseudo-R² is used as the performance measure in this study. The gradient boosting process determines the number of iterations that maximizes the likelihood or, equivalently, the pseudo-R². The pseudo-R² is defined as R² = 1 − L1/L0, where L1 and L0 are the log likelihoods of the full model and the intercept-only model, respectively. In the case of Gaussian (normal) regression, the pseudo-R² turns into the familiar R² that can be interpreted as the "fraction of variance explained by the model" [31]. For Gaussian regression, it is sometimes convenient to compute R² as follows:

$$R^{2} = 1 - \frac{\sum_{i=1}^{n}\big(y_i - f(x_i)\big)^{2}}{\sum_{i=1}^{n}\big(y_i - \overline{y}\big)^{2}}$$

where y_i is the observed value, f(x_i) is the model prediction, and $\overline{y}$ is the mean of the test samples. In this study, the test R² is calculated on the test data using the model fitted to the training data.

#### 4.2. Model Optimization

An asterisk (*) indicates that, within the maximum allowed number of trees, the pseudo-R² did not reach its best value in the following tables.

The optimal test pseudo-R² values for the three models are 0.9806, 0.9893 and 0.9916, respectively. This indicates good prediction accuracy, since the GBDT model is able to handle different types of predictor variables, capture interactions among the predictor variables, and fit complex nonlinear relationships [39]. Hence, in this study the GBDT model can accommodate the nonlinear features of short-term subway ridership and achieves superior prediction accuracy. Similar applications of gradient boosting trees can be found in travel time prediction [24] and auto insurance loss cost prediction [32].
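The optimization procedure of this section — a grid over shrinkage and tree complexity, with the number of trees chosen to maximize held-out R² — can be sketched as follows. The grid values and data are illustrative, not the paper's setup:

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = make_friedman1(n_samples=800, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

best = None  # (R2, shrinkage, tree complexity, number of trees)
for shrinkage in (0.10, 0.05, 0.01):       # learning rate
    for depth in (1, 2, 3):                # tree complexity
        gbm = GradientBoostingRegressor(learning_rate=shrinkage,
                                        max_depth=depth,
                                        n_estimators=500,
                                        random_state=0)
        gbm.fit(X_tr, y_tr)
        # staged_predict yields predictions after 1, 2, ..., n trees, so the
        # optimal number of trees is found from a single fit per setting.
        r2_by_stage = [r2_score(y_te, p) for p in gbm.staged_predict(X_te)]
        n_best = int(np.argmax(r2_by_stage)) + 1
        if best is None or r2_by_stage[n_best - 1] > best[0]:
            best = (r2_by_stage[n_best - 1], shrinkage, depth, n_best)

r2, shrinkage, depth, n_trees = best
print(f"best R2 = {r2:.4f} (shrinkage={shrinkage}, depth={depth}, trees={n_trees})")
```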

#### 4.3. Model Comparison

The root mean squared error (RMSE) and R² are adopted as the model performance indicators. A lower RMSE value or a higher R² value means higher accuracy. The RMSE is defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\big(y_i - f(x_i)\big)^{2}}$$
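A direct implementation of the RMSE definition:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt of the mean squared residual."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Residuals 10, -10, 0 -> mean square 200/3 -> RMSE of about 8.165
print(rmse([100, 200, 300], [110, 190, 300]))
```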

As the comparison results show, the GBDT model outperforms the other approaches in terms of both the RMSE and R² measurements. As another ensemble learning method, Random Forest yields the best prediction results among the three approaches other than GBDT, with at most a 36% increase in the RMSE value. On the contrary, SVM shows the worst performance for subway ridership prediction at the three stations. This finding further confirms the advantage of the GBDT model in modeling the complex relations between subway boarding ridership and bus transfer activities.
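The four-way comparison can be reproduced in outline with scikit-learn. The data here are synthetic and the hyperparameters illustrative, so the resulting scores do not correspond to the values reported for the three stations:

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

X, y = make_friedman1(n_samples=1000, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)

models = {
    "NN": make_pipeline(StandardScaler(),
                        MLPRegressor(hidden_layer_sizes=(32,),
                                     max_iter=2000, random_state=1)),
    "SVM": make_pipeline(StandardScaler(), SVR()),
    "RF": RandomForestRegressor(n_estimators=200, random_state=1),
    "GBDT": GradientBoostingRegressor(n_estimators=500, learning_rate=0.05,
                                      max_depth=3, random_state=1),
}

scores = {}  # name -> (RMSE, R2) on the held-out set
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    scores[name] = (float(np.sqrt(mean_squared_error(y_te, pred))),
                    float(r2_score(y_te, pred)))
    print(f"{name:4s}  RMSE = {scores[name][0]:6.3f}  R2 = {scores[name][1]:.4f}")
```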

#### 4.4. Model Interpretation

The previous subway ridership METRO_{t−1} contributes most in predicting short-term subway ridership, with a relative importance of 82.03%, 85.06% and 92.28% for the three subway stations, respectively. This finding falls within our expectation that the immediately preceding ridership is closely related to the current subway ridership. The current number of bus alighting passengers, BUS_{t}, with contributions of 9.41%, 4.42% and 0.08% to the short-term subway ridership prediction, ranks as the second, third and eighth most influential predictor variable for the DWL, FXM and HLG subway stations, respectively. This result indicates that the bus transfer activities around the DWL subway station have the most potentially significant effects on the subway ridership, while the bus transfer activities around the HLG subway station have little effect. This is consistent with the fact that the DWL subway station is an important transfer station, whereas many residents living around the HLG subway station do not need to transfer. For the FXM station, bus transfer activities contribute less than 5% to ridership prediction. This is because the station is within walking distance of Beijing Finance Street, where office workers can directly take the subway for commuting; meanwhile, 26 bus stops surround the FXM station and still transfer a certain number of passengers to the subway system.

The subway ridership at time step t − 2 (METRO_{t−2}) and at time step t − 3 (METRO_{t−3}) each contribute slightly over 1% in predicting subway ridership for the DWL subway station, while their contributions fall below 1% for the FXM and HLG subway stations.

## 5. Summary and Discussion

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## References

1. Horowitz, A. Simplifications for single-route transit-ridership forecasting models. *Transportation* **1984**, 12, 261–275.
2. Taylor, B.; Miller, D.; Iseki, H.; Fink, C. Nature and/or nurture? Analyzing the determinants of transit ridership across US urbanized areas. *Transp. Res. Part A Policy Pract.* **2009**, 43, 60–77.
3. Chan, S.; Miranda-Moreno, L. A station-level ridership model for the metro network in Montreal, Quebec. *Can. J. Civ. Eng.* **2013**, 40, 254–262.
4. Idris, A.; Habib, K.; Shalaby, A. An investigation on the performances of mode shift models in transit ridership forecasting. *Transp. Res. Part A Policy Pract.* **2015**, 78, 551–565.
5. Zhao, J.; Deng, W.; Song, Y.; Zhu, Y. What influences Metro station ridership in China? Insights from Nanjing. *Cities* **2013**, 35, 114–124.
6. Zhao, J.; Deng, W.; Song, Y.; Zhu, Y. Analysis of Metro ridership at station level and station-to-station level in Nanjing: An approach based on direct demand models. *Transportation* **2014**, 41, 133–155.
7. Cheu, R.L.; Galicia, L.D. Geographic information system-system dynamic procedure for bus rapid transit ridership estimation. *J. Adv. Transp.* **2013**, 47, 266–280.
8. Azar, K.T.; Ferreira, J. Integrating geographic information systems into transit ridership forecast models. *J. Adv. Transp.* **1995**, 29, 263–279.
9. Dill, J.; Schlossberg, M.; Ma, L.; Meyer, C. Predicting transit ridership at stop level: Role of service and urban form. In Proceedings of the 92nd Annual Meeting of the Transportation Research Board, Washington, DC, USA, 13–17 January 2013.
10. Zhang, D.; Wang, X. Transit ridership estimation with network kriging: A case study of Second Avenue Subway, NYC. *J. Transp. Geogr.* **2014**, 41, 107–115.
11. Chow, L.; Zhao, F.; Liu, X.; Li, M.; Ubaka, I. Transit ridership model based on geographically weighted regression. *Transp. Res. Rec. J. Transp. Res. Board* **2006**, 1972, 105–114.
12. Tsai, T.; Lee, C.; Wei, C. Neural network based temporal feature models for short-term railway passenger demand forecasting. *Expert Syst. Appl.* **2009**, 36, 3728–3736.
13. Zhao, S.; Ni, T.; Wang, Y.; Gao, X. A new approach to the prediction of passenger flow in a transit system. *Comput. Math. Appl.* **2011**, 61, 1968–1974.
14. Sun, Y.; Leng, B.; Guan, W. A novel wavelet-SVM short-time passenger flow prediction in Beijing subway system. *Neurocomputing* **2015**, 166, 109–121.
15. Chen, M.; Wei, Y. Exploring time variants for short-term passenger flow. *J. Transp. Geogr.* **2011**, 19, 488–498.
16. Wei, Y.; Chen, M. Forecasting the short-term metro passenger flow with empirical mode decomposition and neural networks. *Transp. Res. Part C Emerg. Technol.* **2012**, 21, 148–162.
17. Ma, X.; Wu, Y.; Chen, F.; Liu, J.; Wang, Y. Mining smart card data for transit riders' travel patterns. *Transp. Res. Part C Emerg. Technol.* **2013**, 36, 1–12.
18. Xue, R.; Sun, D.; Chen, S. Short-term bus passenger demand prediction based on time series model and interactive multiple model approach. *Discret. Dyn. Nat. Soc.* **2015**, 2015, 682390.
19. Karlaftis, M.; Vlahogianni, E. Statistical methods versus neural networks in transportation research: Differences, similarities and some insights. *Transp. Res. Part C Emerg. Technol.* **2011**, 19, 387–399.
20. Ma, X.; Tao, Z.; Wang, Y.; Yu, H.; Wang, Y. Long short-term memory neural network for traffic speed prediction using remote microwave sensor data. *Transp. Res. Part C Emerg. Technol.* **2015**, 54, 187–197.
21. Vlahogianni, E.; Karlaftis, M.; Golia, J. Short-term traffic forecasting: Where we are and where we are going. *Transp. Res. Part C Emerg. Technol.* **2014**, 43, 3–19.
22. Guo, Z. Transfers and Path Choice in Urban Public Transport Systems. Ph.D. Thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, 2008.
23. Gallotti, R.; Barthelemy, M. Anatomy and efficiency of urban multimodal mobility. *Sci. Rep.* **2014**, 4, 6911.
24. Zhang, Y.; Haghani, A. A gradient boosting method to improve travel time prediction. *Transp. Res. Part C Emerg. Technol.* **2015**, 58, 308–324.
25. De'ath, G. Boosted trees for ecological modeling and prediction. *Ecology* **2007**, 88, 243–251.
26. Chung, Y.S. Factor complexity of crash occurrence: An empirical demonstration using boosted regression trees. *Accid. Anal. Prev.* **2013**, 61, 107–118.
27. Saha, D.; Alluri, P.; Gan, A. Prioritizing highway safety manual's crash prediction variables using boosted regression trees. *Accid. Anal. Prev.* **2015**, 79, 133–144.
28. Friedman, J.H. Stochastic gradient boosting. *Comput. Stat. Data Anal.* **2002**, 38, 367–378.
29. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning, 2nd ed.; Springer Series in Statistics; Springer: Berlin, Germany, 2009.
30. Friedman, J.H. Greedy function approximation: A gradient boosting machine. *Ann. Stat.* **2001**, 29, 1189–1232.
31. Schonlau, M. Boosted regression (boosting): An introductory tutorial and a Stata plugin. *Stata J.* **2005**, 5, 330–354.
32. Guelman, L. Gradient boosting trees for auto insurance loss cost modeling and prediction. *Expert Syst. Appl.* **2012**, 39, 3659–3667.
33. Breiman, L.; Friedman, J.; Stone, C.J.; Olshen, R.A. Regression Trees. In Classification and Regression Trees; CRC Press: Boca Raton, FL, USA, 1984; pp. 56–63.
34. Si, B.; Fu, L.; Liu, J.; Shiravi, S.; Gao, Z. A multi-class transit assignment model for estimating transit passenger flows—A case study of Beijing subway network. *J. Adv. Transp.* **2015**, 50, 50–68.
35. Guerra, E.; Cervero, R.; Tischler, D. Half-mile circle: Does it best represent transit station catchments? *Transp. Res. Rec. J. Transp. Res. Board* **2012**, 2276, 101–109.
36. Cervero, R. Transit-oriented development's ridership bonus: A product of self-selection and public policies. *Environ. Plan. A* **2007**, 39, 2068–2085.
37. Ma, X.; Wang, Y.; Chen, F.; Liu, J. Transit smart card data mining for passenger origin information extraction. *J. Zhejiang Univ. Sci. C* **2012**, 13, 750–760.
38. Transportation Research Board. Basic Freeway Segments. In Highway Capacity Manual; National Research Council: Washington, DC, USA, 2010; pp. 163–171.
39. Elith, J.; Leathwick, J.R.; Hastie, T. A working guide to boosted regression trees. *J. Anim. Ecol.* **2008**, 77, 802–813.
40. Sheela, K.G.; Deepa, S.N. Review on methods to fix number of hidden neurons in neural networks. *Math. Probl. Eng.* **2013**, 2013, 425740.
41. Her, J.; Park, S.; Lee, J.S. The effects of bus ridership on airborne particulate matter (PM10) concentrations. *Sustainability* **2016**, 8, 636.

**Figure 2.** Layout of adjacent bus stop locations for the three subway stations: (**a**) Da-Wang-Lu (DWL) station; (**b**) Fu-Xing-Men (FXM) station; and (**c**) Hui-Long-Guan (HLG) station.

**Figure 3.** Weekly ridership change at the three subway stations (weekdays refer to 15–19 October, and weekends refer to 20–21 October): (**a**) Da-Wang-Lu (DWL) station; (**b**) Fu-Xing-Men (FXM) station; and (**c**) Hui-Long-Guan (HLG) station.

**Table 1.** Description of the predictor variables.

| Categories | Variables | Variable Description | Value Set |
|---|---|---|---|
| Subway station characteristics | METRO_{t−1} | Short-term subway ridership at time step t − 1 | Continuous variable: R+ |
| | METRO_{t−2} | Short-term subway ridership at time step t − 2 | Continuous variable: R+ |
| | METRO_{t−3} | Short-term subway ridership at time step t − 3 | Continuous variable: R+ |
| Bus transfer activities characteristics | BUS_{t} | Number of bus alighting passengers at time step t | Continuous variable: R+ |
| | BUS_{t−1} | Number of bus alighting passengers at time step t − 1 | Continuous variable: R+ |
| | BUS_{t−2} | Number of bus alighting passengers at time step t − 2 | Continuous variable: R+ |
| | BUS_{t−3} | Number of bus alighting passengers at time step t − 3 | Continuous variable: R+ |
| Temporal characteristics | Time of day | Fifteen-minute time step of a given day, indexed from 1 to 96 | Categorical variable: {1, 2, 3, …, 96} |
| | Date of month | Serial date number of a given month, from 1 to 31 | Categorical variable: {1, 2, 3, …, 31} |
| | Day of week | Serial day number of a given week, from Monday to Sunday | Categorical variable: {1, 2, 3, …, 7} |
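The predictor set above — lagged subway ridership, current and lagged bus alighting volumes, and the three calendar indicators — can be assembled from raw 15-minute counts with pandas. The column names and synthetic counts below are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical 15-minute counts for one station over two days (96 steps/day).
idx = pd.date_range("2012-10-15", periods=192, freq="15min")
rng = np.random.default_rng(7)
df = pd.DataFrame({"metro": rng.poisson(300, size=192),   # METRO_t
                   "bus": rng.poisson(80, size=192)},     # BUS_t
                  index=idx)

# Lagged predictors, as in Table 1.
for k in (1, 2, 3):
    df[f"metro_t-{k}"] = df["metro"].shift(k)
    df[f"bus_t-{k}"] = df["bus"].shift(k)

# Temporal predictors: 15-min slot of the day (1-96), date of month, day of week (1-7).
df["time_of_day"] = df.index.hour * 4 + df.index.minute // 15 + 1
df["date_of_month"] = df.index.day
df["day_of_week"] = df.index.dayofweek + 1

df = df.dropna()  # drop the leading rows that lack a full lag history
print(df.head())
```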

**Table 2.** Correlation between short-term subway ridership (METRO_{t}) and bus transfer volumes at previous time steps.

| Subway Ridership (METRO_{t}) | Bus Transfer Volume (BUS_{t−1}) | Bus Transfer Volume (BUS_{t−2}) | Bus Transfer Volume (BUS_{t−3}) |
|---|---|---|---|
| Da-Wang-Lu (DWL) | 0.243 ** | 0.174 ** | 0.044 * |
| Fu-Xing-Men (FXM) | 0.371 ** | 0.277 ** | 0.201 ** |
| Hui-Long-Guan (HLG) | 0.311 ** | 0.276 ** | 0.226 ** |
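Lagged correlations of this kind can be computed with `scipy.stats.pearsonr`, which also returns the p-value behind significance flags; a sketch on synthetic series in which ridership is driven by bus arrivals one step earlier:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(3)
bus = rng.poisson(80, size=500).astype(float)          # BUS_t series
# Hypothetical ridership partially driven by bus arrivals one step earlier.
metro = 200.0 + 2.0 * np.roll(bus, 1) + rng.normal(0.0, 20.0, size=500)

results = {}
for k in (1, 2, 3):
    r, p = pearsonr(metro[k:], bus[:-k])               # corr(METRO_t, BUS_{t-k})
    results[k] = (float(r), float(p))
    flag = "**" if p < 0.01 else ("*" if p < 0.05 else "")
    print(f"lag {k}: r = {r:+.3f} {flag}")
```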

**Table 3.**Performance of gradient boosting decision trees (GBDT) models for Da-Wang-Lu (DWL) subway station.

| Shrinkage | R² (TC = 1) | Trees (TC = 1) | R² (TC = 2) | Trees (TC = 2) | R² (TC = 3) | Trees (TC = 3) | R² (TC = 4) | Trees (TC = 4) | R² (TC = 5) | Trees (TC = 5) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.10 | 0.9565 | 1571 | 0.9742 | 2400 | 0.9764 | 547 | 0.9753 | 675 | 0.9802 | 429 |
| 0.05 | 0.9577 | 5730 | 0.9733 | 3383 | 0.9770 | 2894 | 0.9792 | 1617 | 0.9806 | 604 |
| 0.01 | 0.9605 | 26,851 | 0.9771 | 18,709 | 0.9798 | 14,063 | 0.9796 | 10,257 | 0.9807 | 12,479 |
| 0.005 | 0.9595 | 29,772 | 0.9771 | 29,972 | 0.9802 | 27,468 | 0.9803 | 23,201 | 0.9811 | 19,177 |
| 0.001 | 0.9523 | 30,000 * | 0.9724 | 29,999 * | 0.9776 | 29,999 * | 0.9796 | 29,999 * | 0.9806 | 29,999 * |

TC = tree complexity.

**Table 4.** Performance of gradient boosting decision trees (GBDT) models for Fu-Xing-Men (FXM) subway station.

| Shrinkage | R² (TC = 1) | Trees (TC = 1) | R² (TC = 2) | Trees (TC = 2) | R² (TC = 3) | Trees (TC = 3) | R² (TC = 4) | Trees (TC = 4) | R² (TC = 5) | Trees (TC = 5) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.10 | 0.9738 | 10,527 | 0.9795 | 2202 | 0.9859 | 464 | 0.9863 | 602 | 0.9869 | 219 |
| 0.05 | 0.9759 | 25,516 | 0.9809 | 2107 | 0.9871 | 1570 | 0.9881 | 1013 | 0.9879 | 533 |
| 0.01 | 0.9756 | 29,997 | 0.9835 | 23,318 | 0.9876 | 10,289 | 0.9891 | 7209 | 0.9893 | 2912 |
| 0.005 | 0.9743 | 30,000 * | 0.9836 | 29,817 | 0.9878 | 19,983 | 0.9890 | 11,473 | 0.9893 | 6743 |
| 0.001 | 0.9653 | 30,000 * | 0.9819 | 30,000 * | 0.9875 | 29,995 | 0.9888 | 29,982 | 0.9891 | 29,992 |

TC = tree complexity.

**Table 5.** Performance of gradient boosting decision trees (GBDT) models for Hui-Long-Guan (HLG) subway station.

| Shrinkage | R² (TC = 1) | Trees (TC = 1) | R² (TC = 2) | Trees (TC = 2) | R² (TC = 3) | Trees (TC = 3) | R² (TC = 4) | Trees (TC = 4) | R² (TC = 5) | Trees (TC = 5) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.10 | 0.9808 | 4236 | 0.9888 | 1072 | 0.9916 | 217 | 0.9916 | 457 | 0.9894 | 303 |
| 0.05 | 0.9827 | 5386 | 0.9888 | 1965 | 0.9905 | 1365 | 0.9905 | 1757 | 0.9910 | 1113 |
| 0.01 | 0.9835 | 28,242 | 0.9898 | 10,598 | 0.9916 | 7895 | 0.9914 | 5694 | 0.9915 | 5493 |
| 0.005 | 0.9831 | 29,978 | 0.9901 | 19,910 | 0.9915 | 17,596 | 0.9914 | 14,583 | 0.9916 | 8431 |
| 0.001 | 0.9786 | 30,000 * | 0.9896 | 29,984 | 0.9911 | 30,000 * | 0.9915 | 29,927 | 0.9915 | 29,981 |

TC = tree complexity.

**Table 6.** Prediction performance of the neural network (NN), support vector machine (SVM), random forest (RF) and GBDT models, measured by root mean squared error (RMSE) and R².

| Subway Station | NN: RMSE | NN: R² | SVM: RMSE | SVM: R² | RF: RMSE | RF: R² | GBDT: RMSE | GBDT: R² |
|---|---|---|---|---|---|---|---|---|
| DWL | 134.2033 | 0.9599 | 171.4534 | 0.9346 | 107.6754 | 0.9742 | 65.9933 | 0.9806 |
| FXM | 60.9258 | 0.9825 | 88.1399 | 0.9633 | 68.2797 | 0.9780 | 37.4414 | 0.9893 |
| HLG | 99.4166 | 0.9837 | 149.4753 | 0.9631 | 125.6164 | 0.9739 | 64.0564 | 0.9916 |

**Table 7.** Relative importance of the predictor variables for the three subway stations.

| Variable | DWL: Rank | DWL: Relative Importance (%) | FXM: Rank | FXM: Relative Importance (%) | HLG: Rank | HLG: Relative Importance (%) |
|---|---|---|---|---|---|---|
| METRO_{t−1} | 1 | 82.03 | 1 | 85.06 | 1 | 92.28 |
| METRO_{t−2} | 4 | 1.71 | 4 | 0.95 | 4 | 0.20 |
| METRO_{t−3} | 5 | 1.65 | 5 | 0.77 | 3 | 0.46 |
| BUS_{t} | 2 | 9.41 | 3 | 4.42 | 8 | 0.08 |
| BUS_{t−1} | 6 | 0.55 | 6 | 0.44 | 6 | 0.11 |
| BUS_{t−2} | 8 | 0.40 | 7 | 0.34 | 9 | 0.06 |
| BUS_{t−3} | 7 | 0.41 | 8 | 0.27 | 5 | 0.16 |
| Time of day | 3 | 3.55 | 2 | 7.59 | 2 | 6.54 |
| Date of month | 10 | 0.12 | 9 | 0.10 | 7 | 0.10 |
| Day of week | 9 | 0.17 | 10 | 0.06 | 10 | 0.01 |

© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Ding, C.; Wang, D.; Ma, X.; Li, H.
Predicting Short-Term Subway Ridership and Prioritizing Its Influential Factors Using Gradient Boosting Decision Trees. *Sustainability* **2016**, *8*, 1100.
https://doi.org/10.3390/su8111100
