# Towards a Better Understanding of Public Transportation Traffic: A Case Study of the Washington, DC Metro

## Abstract

## 1. Introduction

- How discriminative is metro traffic data? Given the daily time series of a station's inflow and outflow, is it possible to infer the name of the station and the date of the time series?
- Based on the results of the previous question, to what degree is it possible to predict the inflow and outflow of metro stations over the next few hours?

## 2. Related Work

#### 2.1. Modeling Passenger Flow Using PCA

#### 2.2. Public Transportation Traffic Prediction

#### 2.3. Road Traffic Prediction

#### 2.4. Time Series Prediction

## 3. Passenger Volume Data Description

## 4. Problem Definition

## 5. Methodology

#### 5.1. Feature Extraction

#### 5.2. Unsupervised Labeling of Stations and Days

#### 5.3. Classification

- **Task I:** Classifying the type of a station, using the unsupervised grouping of stations into clusters as described in Section 7.2.
- **Task II:** Classifying the exact station label.
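As a rough illustration of Task I, the sketch below assigns a station type to a daily inflow series by majority vote among its nearest neighbors. It is a minimal sketch under stated assumptions, not the paper's implementation: the synthetic `daily_profile` generator, the labels `"residential"` and `"commercial"`, the Euclidean distance, and the use of raw hourly values as features are all hypothetical choices made for this example.

```python
import math
import random
from collections import Counter


def daily_profile(peak_hour, rng, hours=24):
    """Hypothetical inflow curve: a Gaussian bump around peak_hour plus noise."""
    return [
        100 * math.exp(-((h - peak_hour) ** 2) / 8) + rng.uniform(0, 5)
        for h in range(hours)
    ]


def knn_predict(train, query, k=3):
    """Return the majority label among the k training series closest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


rng = random.Random(0)

# Toy training set of (series, cluster label) pairs for two assumed station types:
# a morning inflow peak for one type, an evening inflow peak for the other.
train = [(daily_profile(8, rng), "residential") for _ in range(10)]
train += [(daily_profile(17, rng), "commercial") for _ in range(10)]

# Classify an unseen day with a morning peak.
predicted = knn_predict(train, daily_profile(8, rng), k=3)
print(predicted)
```

Task II could reuse the same function with station names as labels instead of cluster types. With many more classes the vote among `k` neighbors is less clear-cut, so the choice of `k` matters more.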

#### 5.4. Prediction

## 6. Proof of Concept

#### 6.1. Clustering of Stations

#### 6.2. Clustering of Days

## 7. Experiments

#### 7.1. Data

#### 7.2. Classification

#### 7.2.1. Number of Nearest Neighbors

#### 7.2.2. Classification Accuracy

#### 7.3. Prediction

#### 7.3.1. MLP Settings

#### 7.3.2. Prediction Quality

#### 7.3.3. Relative Gain in Error

#### 7.3.4. Computational Cost

## 8. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## References

**Figure 3.** Explained variance per principal component of the PCA on (**a**) the time series of stations and (**b**) the time series of days.

**Figure 13.** Absolute prediction error: kNN vs. weekly and daily periodicity prediction of the three station clusters.

