Development of High-Speed Rail Demand Forecasting Incorporating Multi-Station Access Probabilities

Hong, Seo-Young; Park, Ho-Chul

doi:10.3390/engproc2025102002

Open Access Proceeding Paper

by Seo-Young Hong and Ho-Chul Park *

Department of Transportation Engineering, Myongji University, Yongin 17058, Republic of Korea

* Author to whom correspondence should be addressed.

Presented at the 2025 Suwon ITS Asia Pacific Forum, Suwon, Republic of Korea, 28–30 May 2025.

Eng. Proc. 2025, 102(1), 2; https://doi.org/10.3390/engproc2025102002

Published: 22 July 2025

(This article belongs to the Proceedings of The 2025 Suwon ITS Asia Pacific Forum)


Abstract

This study develops a high-speed rail demand prediction model based on access probability, which quantifies the likelihood of passengers choosing a departure station among multiple alternatives. Traditional models assign demand to the nearest station or rely on manual calibration, often failing to reflect actual travel behavior and requiring excessive time and resources. To address these limitations, this study integrates survey data, real-world datasets, and machine learning techniques to model station choice behavior more accurately. Key influencing factors, including headway, access time, parking availability, and transit connections, were identified through passenger surveys and incorporated into the model. Machine learning algorithms improved prediction accuracy, with SHAP analysis providing interpretability. The proposed model achieved high accuracy, with an average error rate below 3% for major stations. Scenario analyses confirmed its applicability in network expansions, including GTX openings and the integration of mobility as a service. This model enhances data-driven decision-making for rail operators and offers insights for rail network planning and operations. Future research will focus on validating the model across diverse regions and refining it with updated datasets and external data sources.

Keywords: transportation demand forecasting; high-speed rail; multi-station access probability; travel behavior; machine learning; data-driven; multi-layer perceptron (MLP)

1. Introduction

Traffic demand forecasting plays a critical role in railway investment and operational planning, yet traditional four-step models fail to accurately reflect passenger behavior. These models typically assign demand to the nearest station within an administrative district, disregarding alternative station choices. Studies show that only 47% of railway users select the closest station; among multimodal travelers, this figure drops to 40% [1,2]. To address this, network calibration is often applied, but it is resource-intensive and reliant on subjective adjustments. Research highlights the importance of access time in station choice, with one study finding it nearly twice as influential as in-vehicle time [3]. However, existing studies primarily focus on identifying influencing factors rather than integrating station choice and demand estimation into a unified forecasting model.

Recent advancements in big data, automation, and machine learning offer promising solutions for improving demand forecasting accuracy. Machine learning models have demonstrated superior predictive performance compared to traditional logit-based approaches [4]. However, their lack of interpretability remains a challenge, necessitating methods like SHAP (SHapley Additive exPlanations) to enhance model transparency [5].

This study aims to develop a machine learning-based demand forecasting model by estimating access probabilities, overcoming the limitations of distance-based assignment models. By incorporating access probabilities into station choice modeling, the proposed framework enhances prediction accuracy and provides a more precise station-level demand estimation method for high-speed rail networks. Its performance will be validated through real-world transportation scenarios to ensure practical applicability.

2. Data & Methodology

This study models access probability to enhance high-speed rail (HSR) demand forecasting, estimating the likelihood of passengers selecting a departure station (Figure 1). Given that the exact demand distribution across stations is unknown, an optimization technique is applied, leveraging station-specific demand data and inter-zone rail traffic volumes. The analysis identifies three alternative stations per zone based on centroid distances, ensuring a realistic representation of station choice behavior. The estimated access probabilities serve as outputs in the MLP model, while station-specific characteristics act as input variables to refine demand predictions.
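As a rough illustration of the two ideas in this paragraph, the sketch below picks the three stations nearest to a zone centroid and then splits each zone's trips across its candidate stations using given access probabilities. It does not reproduce the optimization step that estimates those probabilities. The station names, coordinates, trip counts, and probabilities are hypothetical values chosen for illustration, and straight-line distance is an assumption.

```python
import math


def nearest_stations(zone_centroid, stations, k=3):
    """Return the names of the k stations closest to a zone centroid."""
    ranked = sorted(stations, key=lambda name: math.dist(zone_centroid, stations[name]))
    return ranked[:k]


def allocate_demand(zone_trips, access_prob):
    """Split each zone's trips across its candidate stations.

    zone_trips:  {zone: number of rail trips originating in the zone}
    access_prob: {zone: {station: access probability}}, summing to 1 per zone
    """
    demand = {}
    for zone, trips in zone_trips.items():
        probs = access_prob[zone]
        if abs(sum(probs.values()) - 1.0) > 1e-9:
            raise ValueError(f"access probabilities for zone {zone!r} must sum to 1")
        for station, p in probs.items():
            demand[station] = demand.get(station, 0.0) + trips * p
    return demand


# Hypothetical station coordinates (km) and one zone centroid.
stations = {"A": (0.0, 0.0), "B": (5.0, 0.0), "C": (0.0, 8.0), "D": (20.0, 20.0)}
candidates = nearest_stations((1.0, 1.0), stations)

# Hypothetical zone-level trips and access probabilities.
zone_trips = {"z1": 100, "z2": 50}
access_prob = {
    "z1": {"A": 0.6, "B": 0.3, "C": 0.1},
    "z2": {"B": 0.5, "C": 0.3, "D": 0.2},
}
station_demand = allocate_demand(zone_trips, access_prob)
print(candidates, station_demand)
```

Summing the allocated trips per station gives a station-level demand estimate that can be compared with observed station boardings.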

To support this estimation, the study utilizes 2019 KTDB O-D data, focusing on five major rail lines, connecting 12 key departure stations in the Seoul metropolitan area to 1135 zones nationwide. Each zone considers three alternative stations, resulting in 16,566 observations. The dataset integrates station accessibility, service frequency, and operational factors, including access time (TMAP API, smart card data), train headways (timetables), parking capacity, and transit connections. These variables, sourced from 2021–2023 datasets, account for regional and station-specific differences, forming a robust foundation for access probability estimation and improving demand forecasting accuracy.

This study employs a multi-layer perceptron (MLP) model, a type of supervised artificial neural network (ANN), to enhance high-speed rail (HSR) demand forecasting by capturing complex, non-linear relationships between input variables and station choice behavior. The model consists of an input layer, two hidden layers (with 8 and 6 units, respectively), and an output layer. Connection weights are adjusted through backpropagation, minimizing errors using the Levenberg–Marquardt training algorithm. The ReLU function is used for hidden layer activation, while Softmax represents station choice probabilities in the output layer.
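The architecture described above can be sketched as a plain forward pass: two ReLU hidden layers with 8 and 6 units, followed by a softmax output. This is a minimal pure-Python illustration with random, untrained weights, not the authors' Keras implementation. The number of input features (4) and the three output units (one per alternative station) are assumptions made for the example.

```python
import math
import random


def relu(values):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Numerically stable softmax: outputs are positive and sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """Fully connected layer; weights holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases, for illustration only (no training)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(features, layers):
    """ReLU on every hidden layer, softmax on the final layer."""
    hidden = features
    for weights, biases in layers[:-1]:
        hidden = relu(dense(hidden, weights, biases))
    weights, biases = layers[-1]
    return softmax(dense(hidden, weights, biases))


rng = random.Random(0)
N_FEATURES = 4      # assumption: e.g. access time, headway, parking, transit links
N_ALTERNATIVES = 3  # assumption: one output per alternative station
layers = [
    init_layer(N_FEATURES, 8, rng),      # hidden layer 1: 8 units
    init_layer(8, 6, rng),               # hidden layer 2: 6 units
    init_layer(6, N_ALTERNATIVES, rng),  # output layer
]
probs = forward([0.2, -1.0, 0.5, 1.3], layers)
print(probs)
```

Because the last layer is a softmax, the three outputs can be read directly as access probabilities for the three candidate stations of a zone.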

The dataset was split into 70% training and 30% testing sets, and input values were standardized using Standard Scaler. The model was implemented in Keras and trained with Mean Squared Error (MSE) as the loss function and the Adam optimizer. The batch size was set to 32 and the learning rate to 0.001, and the model was trained for 10 epochs. Performance was assessed with the Area Under the Curve (AUC) metric.
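The preprocessing described here can be illustrated without any library: shuffle the observations, hold out 30% for testing, and standardize each feature to zero mean and unit variance. The sketch below is a stand-in for what a standard scaler does, not the authors' code. Fitting the scaler on the training rows only, the random seed, and the toy feature rows are assumptions.

```python
import random
import statistics


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(rows):
    """Per-feature mean and population standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Guard against constant features, whose standard deviation is 0.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def transform(rows, means, stds):
    """Standardize each feature: (x - mean) / std."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical rows: [access time (min), headway (min)]
rows = [[10.0 + i, 5.0 * (i % 4) + 5.0] for i in range(10)]
train_rows, test_rows = train_test_split(rows)

# Assumption: statistics come from the training rows and are reused on the test rows.
means, stds = fit_scaler(train_rows)
train_scaled = transform(train_rows, means, stds)
test_scaled = transform(test_rows, means, stds)
print(len(train_rows), len(test_rows))
```

Reusing the training statistics on the test rows keeps information from the held-out data out of the fitted scaler.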

3. Results and Discussion

The optimization process effectively distributed trip volumes across three alternative stations per zone, with access probabilities validated against Railway Statistical Yearbook data, showing an error rate close to 0%. The MLP model demonstrated strong predictive performance (Table 1), achieving a Mean Squared Error (MSE) of 0.0214, a Mean Absolute Error (MAE) of 0.0990, and a coefficient of determination (R²) of 0.7529. Errors were below 1% for major stations (Seoul, Yeongdeungpo, Suseo, Gwangmyeong, Yongsan) and within 3% for Dongtan, Haengsin, and Sangbong, though stations with low demand (<5000 trips) showed higher errors, indicating limitations in sparsely used locations. SHAP analysis identified train frequency, bus routes, headway, and parking capacity as key factors influencing station choice, with lower train frequency and longer headways reducing access probability.
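The station-level figures in Table 1 follow from simple arithmetic: the difference is the estimated value minus the actual value, and the error rate is that difference divided by the actual value. The sketch below recomputes these for four rows taken from Table 1 and includes generic helper definitions of MSE, MAE, and R². The helper names are illustrative, and the data used to obtain the reported MSE, MAE, and R² values are not reproduced here.

```python
def error_rate(actual, estimated):
    """Return (difference, percentage error) of an estimate against the actual value."""
    diff = estimated - actual
    return diff, 100.0 * diff / actual


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# (actual, estimated) trips for four stations, taken from Table 1.
table_1 = {
    "Seoul": (85022, 85011),
    "Dongtan": (8249, 8475),
    "Cheongnyangni": (5055, 5980),
    "Yangpyeong": (839, 5),
}
for station, (actual, estimated) in table_1.items():
    diff, pct = error_rate(actual, estimated)
    print(f"{station}: difference {diff:+d}, error rate {pct:+.1f}%")
```

The same difference produces very different error rates depending on the denominator, which is why low-demand stations such as Yangpyeong show large percentage errors.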

The case study analysis assessed the impact of key variable changes on access probability and rail demand, demonstrating the model’s applicability in real-world scenarios. One scenario analyzed the effect of mobility as a service (MaaS) on transit efficiency, showing that reducing public transit access time by 10–30% led to a 2–5% increase in access probability, with greater improvements observed in peripheral areas.

This study introduces a novel approach to rail demand forecasting, addressing the limitations of traditional models while enhancing predictive accuracy. By incorporating access probability, the proposed method provides valuable insights for optimizing station accessibility and supporting data-driven policy and network planning.

Author Contributions

The authors confirm the following contributions to this paper: study conception and design: S.-Y.H. and H.-C.P.; data collection: S.-Y.H.; analysis and interpretation of results: S.-Y.H. and H.-C.P.; and draft manuscript preparation: S.-Y.H. and H.-C.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (RS-2024-00348596).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available upon request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Debrezion, G.; Pels, E.; Rietveld, P. Modelling the Joint Access Mode and Railway Station Choice. Transp. Res. Part E Logist. Transp. Rev. 2009, 9, 270–283.
2. Cheon, M.J.; Choi, H.J.; Park, J.W.; Choi, H.Y.; Lee, D.H.; Lee, O. A Study on the Traffic Prediction through CatBoost Algorithm. J. Korea Acad.-Ind. Coop. Soc. 2021, 22, 58–64.
3. Lee, J. A Development of Intercity Travel Mode Choice Model for High-Speed Rail Demand Analysis. J. Transp. Res. Korean Soc. Transp. 2009, 16, 27–40.
4. Zhang, X.; Zhao, X. Machine Learning Approach for Spatial Modeling of Ridesourcing Demand. J. Transp. Geogr. 2022, 100, 103310.
5. Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. In Proceedings of the 31st Conference on Neural Information Processing Systems (NeurIPS 2017), Long Beach, CA, USA, 4–9 December 2017; pp. 4766–4777.

Figure 1. The concept of access probability in OD trips.

Table 1. MLP estimation results.

Station	Actual Value (trips)	Estimated Value (trips)	Difference (trips)	Error Rate
Seoul	85,022	85,011	−11	0%
Suseo	41,438	41,086	−352	−1%
Gwangmyeong	27,581	27,676	95	0%
Yongsan	25,764	25,782	18	0%
Dongtan	8,249	8,475	226	3%
Cheongnyangni	5,055	5,980	925	18%
Haengsin	4,291	4,170	−121	−3%
Pyeongtaek/Jije	3,741	4,592	851	23%
Suwon	3,686	2,861	−825	−22%
Yeongdeungpo	1,320	1,315	−5	0%
Sangbong	977	1,010	33	3%
Yangpyeong	839	5	−834	−99%


© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
