# Towards a Better Understanding of Public Transportation Traffic: A Case Study of the Washington, DC Metro

## Abstract

## 1. Introduction

- How discriminative is metro traffic data? Given the daily time series of inflow and outflow at a station, is it possible to infer the name of the station and the date of the time series?
- Based on the answer to the previous question, to what degree is it possible to predict the inflow and outflow at metro stations over the next few hours?

## 2. Related Work

#### 2.1. Modeling Passenger Flow Using PCA

#### 2.2. Public Transportation Traffic Prediction

#### 2.3. Road Traffic Prediction

#### 2.4. Time Series Prediction

## 3. Passenger Volume Data Description

## 4. Problem Definition

**Definition 1** (Passenger Flow Database).

**Definition 2** (Passenger Traffic Prediction).

## 5. Methodology

#### 5.1. Feature Extraction

#### 5.2. Unsupervised Labeling of Stations and Days

#### 5.3. Classification

**Task I:** Classifying the type of a station, using the unsupervised grouping of stations into clusters as described in Section 7.2.

**Task II:** Classifying the exact station label.
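As a rough illustration of Task II, the sketch below assigns a station label to a daily time series by majority vote among its k nearest neighbors. It is only a minimal sketch under stated assumptions: the Euclidean distance on raw values, the toy four-value series, the station names, and the helper `knn_classify` are all hypothetical and are not taken from the paper, whose features and settings are not shown in this section.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(query, labeled_series, k=3):
    """Return the majority label among the k series closest to `query`.

    `labeled_series` is a list of (series, label) pairs.
    """
    neighbors = sorted(labeled_series, key=lambda item: euclidean(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical training data: four passenger counts per day and station.
train = [
    ([90, 20, 15, 30], "Station A"),  # morning-heavy profile
    ([85, 25, 10, 35], "Station A"),
    ([95, 18, 12, 28], "Station A"),
    ([20, 30, 25, 90], "Station B"),  # evening-heavy profile
    ([25, 28, 30, 85], "Station B"),
    ([18, 35, 22, 95], "Station B"),
]

# Classify an unseen morning-heavy day.
predicted = knn_classify([88, 22, 14, 32], train, k=3)
```

The same function would serve Task I if the labels were cluster identifiers instead of station names.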

**Example 1.**

#### 5.4. Prediction

## 6. Proof of Concept

#### 6.1. Clustering of Stations

#### 6.2. Clustering of Days

## 7. Experiments

#### 7.1. Data

#### 7.2. Classification

#### 7.2.1. Number of Nearest Neighbors

#### 7.2.2. Classification Accuracy

#### 7.3. Prediction

#### 7.3.1. MLP Settings

#### 7.3.2. Prediction Quality

#### 7.3.3. Relative Gain in Error

#### 7.3.4. Computational Cost

## 8. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

**Figure 3.** Explained variance per principal component of the PCA on (**a**) the time series of stations and (**b**) the time series of days.

**Figure 13.** Absolute prediction error: kNN vs. weekly and daily periodicity prediction of the three station clusters.
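The two periodicity baselines named in the Figure 13 caption are not defined in this section. One common reading, assumed here, is that the daily baseline predicts each hourly value as the value observed 24 hours earlier, and the weekly baseline as the value observed 168 hours earlier. The sketch below computes the mean absolute error of both on a synthetic hourly series; the series generator `toy_count` and all numbers are hypothetical and do not reproduce the paper's data or results.

```python
HOURS_PER_DAY = 24
HOURS_PER_WEEK = 7 * HOURS_PER_DAY


def periodic_forecast(series, period):
    """Predict each value as the observation exactly one period earlier."""
    return [series[t - period] for t in range(period, len(series))]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def toy_count(hour_index):
    """Synthetic passenger count: rush-hour peaks, halved on days 5 and 6."""
    day = hour_index // HOURS_PER_DAY
    hour = hour_index % HOURS_PER_DAY
    base = 100 if hour in (8, 17) else 10
    return base // 2 if day % 7 >= 5 else base


# Three weeks of hourly counts.
series = [toy_count(t) for t in range(3 * HOURS_PER_WEEK)]

daily_pred = periodic_forecast(series, HOURS_PER_DAY)
weekly_pred = periodic_forecast(series, HOURS_PER_WEEK)

# Compare each forecast with the observations it was meant to predict.
daily_mae = mean_absolute_error(series[HOURS_PER_DAY:], daily_pred)
weekly_mae = mean_absolute_error(series[HOURS_PER_WEEK:], weekly_pred)
```

On this toy series the weekly baseline is exact because the data repeat every seven days, while the daily baseline errs at every weekday/weekend transition. Real passenger data would not be this regular.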

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Truong, R.; Gkountouna, O.; Pfoser, D.; Züfle, A.
Towards a Better Understanding of Public Transportation Traffic: A Case Study of the Washington, DC Metro. *Urban Sci.* **2018**, *2*, 65.
https://doi.org/10.3390/urbansci2030065
