Proceeding Paper

Development of High-Speed Rail Demand Forecasting Incorporating Multi-Station Access Probabilities †

Department of Transportation Engineering, Myongji University, Yongin 17058, Republic of Korea
* Author to whom correspondence should be addressed.
† Presented at the 2025 Suwon ITS Asia Pacific Forum, Suwon, Republic of Korea, 28–30 May 2025.
Eng. Proc. 2025, 102(1), 2; https://doi.org/10.3390/engproc2025102002
Published: 22 July 2025

Abstract

This study develops a high-speed rail demand prediction model based on access probability, which quantifies the likelihood of passengers choosing a departure station among multiple alternatives. Traditional models assign demand to the nearest station or rely on manual calibration, which often fails to reflect actual travel behavior and requires excessive time and resources. To address these limitations, this study integrates survey data, real-world datasets, and machine learning techniques to model station choice behavior more accurately. Key influencing factors, including headway, access time, parking availability, and transit connections, were identified through passenger surveys and incorporated into the model. Machine learning algorithms improved prediction accuracy, and SHAP analysis provided interpretability. The proposed model achieved high accuracy, with an average error rate below 3% for major stations. Scenario analyses confirmed its applicability to network changes, including the opening of GTX (Great Train Express) lines and the integration of mobility as a service (MaaS). This model supports data-driven decision-making for rail operators and offers insights for rail network planning and operations. Future research will focus on validating the model across diverse regions and refining it with updated datasets and external data sources.

1. Introduction

Traffic demand forecasting plays a critical role in railway investment and operational planning, yet traditional four-step models fail to reflect actual passenger behavior. These models typically assign demand to the nearest station within an administrative district, disregarding alternative station choices. Studies show that only 47% of railway users select the closest station; among multimodal travelers, this figure drops to 40% [1,2]. Network calibration is often applied to compensate, but it is resource-intensive and relies on subjective adjustments. Research also highlights the importance of access time in station choice, with one study finding it nearly twice as influential as in-vehicle time [3]. However, existing studies primarily focus on identifying influencing factors rather than integrating station choice and demand estimation into a unified forecasting model.
Recent advances in big data, automation, and machine learning offer promising ways to improve demand forecasting accuracy. Machine learning models have demonstrated superior predictive performance compared to traditional logit-based approaches [4]. However, their lack of interpretability remains a challenge, which motivates methods such as SHAP (SHapley Additive exPlanations) to make model outputs transparent [5].
This study develops a machine learning-based demand forecasting model that estimates access probabilities, overcoming the limitations of distance-based assignment models. By incorporating access probabilities into station choice modeling, the proposed framework improves prediction accuracy and provides a more precise method for station-level demand estimation in high-speed rail networks. The model's performance is validated through real-world transportation scenarios to ensure practical applicability.
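For reference, SHAP attributes a prediction to individual features using the Shapley value from cooperative game theory [5]. For feature i, with F the full feature set and f_S the model evaluated using only the features in subset S, the attribution is a weighted average of the feature's marginal contributions over all subsets; the attributions then add up to the model output, where φ_0 is the expected (baseline) prediction:

```latex
\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,\bigl(|F| - |S| - 1\bigr)!}{|F|!}\left[f_{S \cup \{i\}}\bigl(x_{S \cup \{i\}}\bigr) - f_S\bigl(x_S\bigr)\right],
\qquad
f(x) = \phi_0 + \sum_{i \in F} \phi_i
```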

2. Data & Methodology

This study models access probability, the likelihood that passengers select a given departure station, to improve high-speed rail (HSR) demand forecasting (Figure 1). Because the actual distribution of each zone's demand across stations is unknown, an optimization technique is applied that uses station-specific demand data and inter-zone rail traffic volumes. For each zone, three alternative stations are identified based on centroid distances to represent station choice behavior realistically. The estimated access probabilities serve as the outputs (targets) of the MLP model, while station-specific characteristics serve as its input variables.
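The paper does not specify which optimization technique distributes zone trips over the three candidate stations. One plausible reading, sketched here purely as an assumption, is to take the three stations nearest each zone centroid and then balance the access probabilities with iterative proportional fitting (IPF), so that each zone's probabilities sum to 1 and the trips assigned to each station reproduce its observed demand. The function names, the Euclidean distance, and the inverse-distance prior are illustrative choices, not the authors' method.

```python
import numpy as np


def nearest_stations(zone_xy, station_xy, k=3):
    """Return indices and distances of the k stations nearest each zone centroid.

    Euclidean distance on projected coordinates is an assumption; the paper
    only states that alternatives are chosen by centroid distance.
    """
    # Pairwise zone-to-station distances, shape (n_zones, n_stations).
    d = np.linalg.norm(zone_xy[:, None, :] - station_xy[None, :, :], axis=2)
    order = np.argsort(d, axis=1)[:, :k]
    return order, np.take_along_axis(d, order, axis=1)


def balance_access_probabilities(zone_trips, candidates, prior, station_demand,
                                 n_iter=500, tol=1e-10):
    """Hypothetical IPF balancing of access probabilities.

    zone_trips     (n_zones,)     HSR trips generated in each zone
    candidates     (n_zones, k)   station index of each alternative
    prior          (n_zones, k)   positive starting weights, e.g. 1 / distance
    station_demand (n_stations,)  observed boardings; must sum to zone_trips.sum()
    Returns p (n_zones, k): each row sums to 1 and the trips assigned to each
    station approximate station_demand.
    """
    n_stations = station_demand.shape[0]
    factors = np.ones(n_stations)
    for _ in range(n_iter):
        w = prior * factors[candidates]
        p = w / w.sum(axis=1, keepdims=True)  # zone constraint: rows sum to 1
        assigned = np.bincount(candidates.ravel(),
                               weights=(zone_trips[:, None] * p).ravel(),
                               minlength=n_stations)
        # Station constraint: rescale each station's weight toward its target.
        ratio = np.divide(station_demand, assigned,
                          out=np.ones(n_stations), where=assigned > 0)
        factors *= ratio
        if np.max(np.abs(ratio - 1.0)) < tol:
            break
    return p
```

IPF converges only when zone and station totals agree and every station with demand appears in some zone's candidate set, so both conditions are worth checking before balancing.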
To support this estimation, the study uses 2019 Korea Transport Database (KTDB) origin–destination (O-D) data for five major rail lines that connect 12 key departure stations in the Seoul metropolitan area to 1135 zones nationwide. Each zone considers three alternative stations, resulting in 16,566 observations. The dataset integrates station accessibility, service frequency, and operational factors, including access time (from the TMAP API and smart card data), train headways (from timetables), parking capacity, and transit connections. These variables, sourced from 2021–2023 datasets, account for regional and station-specific differences and form the basis for access probability estimation.
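This setup pairs zone-specific access times with station-level attributes for each of a zone's three candidates. The sketch below, with hypothetical argument names and an assumed column order (access time, then headway, parking spaces, and bus routes), flattens them into one fixed-length input row per zone so that a model with a three-way output sees all alternatives at once. The numbers are placeholders, not survey or KTDB values.

```python
import numpy as np


def build_inputs(candidates, access_time, station_attrs):
    """Flatten each zone's k alternatives into one fixed-length input row.

    candidates    (n_zones, k)           station index of each alternative
    access_time   (n_zones, n_stations)  zone-to-station access time (min)
    station_attrs (n_stations, m)        station-level attributes, assumed here
                                         to be headway, parking spaces, bus routes
    Returns X of shape (n_zones, k * (1 + m)), ordered alternative by alternative.
    """
    n_zones, k = candidates.shape
    rows = np.arange(n_zones)[:, None]
    t = access_time[rows, candidates]   # (n_zones, k): zone-specific access time
    s = station_attrs[candidates]       # (n_zones, k, m): attributes per alternative
    per_alt = np.concatenate([t[..., None], s], axis=2)
    return per_alt.reshape(n_zones, -1)


# Placeholder inputs: 2 zones, 3 stations.
candidates = np.array([[0, 1, 2], [2, 1, 0]])
access_time = np.array([[10.0, 20.0, 30.0], [35.0, 25.0, 15.0]])
station_attrs = np.array([[5.0, 100.0, 3.0], [10.0, 50.0, 2.0], [15.0, 0.0, 1.0]])
X = build_inputs(candidates, access_time, station_attrs)
print(X.shape)  # (2, 12)
```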
This study employs a multi-layer perceptron (MLP), a type of supervised artificial neural network (ANN), to capture complex, non-linear relationships between the input variables and station choice behavior. The model consists of an input layer, two hidden layers (with 8 and 6 units, respectively), and an output layer. The network weights are updated through backpropagation to minimize the prediction error, using the Levenberg–Marquardt training algorithm. The ReLU function is used as the hidden-layer activation, while a Softmax output layer produces the station choice probabilities, which sum to 1 across the alternatives of each zone.
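A minimal NumPy sketch of the forward pass described above: two ReLU hidden layers of 8 and 6 units followed by a three-way softmax, one output per alternative station. The He-style weight initialization and the function names are assumptions for illustration; training is omitted.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z):
    # Subtract the row maximum for numerical stability.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def init_params(n_in, hidden=(8, 6), n_out=3, seed=0):
    """Random weights for an n_in -> 8 -> 6 -> 3 network (He-style, an assumption)."""
    rng = np.random.default_rng(seed)
    dims = [n_in, *hidden, n_out]
    return [(rng.normal(0.0, np.sqrt(2.0 / a), size=(a, b)), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]


def forward(params, X):
    h = X
    for W, b in params[:-1]:
        h = relu(h @ W + b)      # hidden layers: 8 then 6 ReLU units
    W, b = params[-1]
    return softmax(h @ W + b)    # one probability per alternative station
```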
The dataset was split into 70% training and 30% testing sets, and input features were standardized using StandardScaler. The model was implemented in Keras, trained with mean squared error (MSE) as the loss function, and optimized using the Adam optimizer with a learning rate of 0.001, a batch size of 32, and 10 epochs. Performance was assessed using the area under the curve (AUC) metric.
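The preprocessing steps can be sketched in NumPy as a random 70/30 split followed by standardization whose mean and standard deviation are fitted on the training rows only and reused for the test rows, which is how scikit-learn's StandardScaler is normally applied. The class below is a hypothetical stand-in that mimics StandardScaler's behavior (population standard deviation, constant columns left unscaled); the Keras training settings listed above are not reproduced here.

```python
import numpy as np


def split_70_30(X, y, seed=0):
    """Random 70/30 train/test split (the seed is an arbitrary choice)."""
    idx = np.random.default_rng(seed).permutation(len(X))
    cut = int(round(0.7 * len(X)))
    return X[idx[:cut]], X[idx[cut:]], y[idx[:cut]], y[idx[cut:]]


class StandardScalerSketch:
    """Hypothetical stand-in mimicking sklearn's StandardScaler: z = (x - mean) / std."""

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        self.scale_ = X.std(axis=0)            # population std (ddof=0), as sklearn uses
        self.scale_[self.scale_ == 0.0] = 1.0  # leave constant columns unscaled
        return self

    def transform(self, X):
        return (X - self.mean_) / self.scale_


# Placeholder data: 100 rows, 2 features, 3 target probabilities.
X = np.arange(200, dtype=float).reshape(100, 2)
y = np.zeros((100, 3))
X_tr, X_te, y_tr, y_te = split_70_30(X, y)
scaler = StandardScalerSketch().fit(X_tr)      # fit on training rows only
X_tr_s, X_te_s = scaler.transform(X_tr), scaler.transform(X_te)
```

Fitting the scaler on the training rows alone keeps test-set statistics out of the model, so the reported test error is not optimistically biased.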

3. Results and Discussion

The optimization process distributed trip volumes across the three alternative stations of each zone, and the resulting access probabilities were validated against Railway Statistical Yearbook data, with an error rate close to 0%. The MLP model demonstrated strong predictive performance (Table 1), achieving a mean squared error (MSE) of 0.0214, a mean absolute error (MAE) of 0.0990, and a coefficient of determination (R²) of 0.7529. Errors were within 1% for the major stations (Seoul, Yeongdeungpo, Suseo, Gwangmyeong, Yongsan) and within 3% for Dongtan, Haengsin, and Sangbong, whereas low-demand stations (<5000 trips) such as Pyeongtaek/Jije (23%), Suwon (−22%), and Yangpyeong (−99%) showed much larger errors, indicating limitations at sparsely used locations. SHAP analysis identified train frequency, bus routes, headway, and parking capacity as key factors influencing station choice, with lower train frequency and longer headways reducing access probability.
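The metrics and station-level error rates can be reproduced with the short sketch below. The MSE, MAE, and R² functions use their standard definitions (the paper's values were computed on predicted access probabilities, which are not available here), and the error rate is assumed to be (estimated − actual) / actual, which matches every rounded value in the Error Rate column of Table 1.

```python
import numpy as np


def mse(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean((y - y_hat) ** 2))


def mae(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs(y - y_hat)))


def r2(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def error_rate(actual, estimated):
    """Signed station-level error in percent: (estimated - actual) / actual * 100."""
    return 100.0 * (estimated - actual) / actual


# (actual, estimated) station demand from Table 1
TABLE1 = {
    "Seoul": (85022, 85011), "Suseo": (41438, 41086),
    "Gwangmyeong": (27581, 27676), "Yongsan": (25764, 25782),
    "Dongtan": (8249, 8475), "Cheongnyangni": (5055, 5980),
    "Haengsin": (4291, 4170), "Pyeongtaek/Jije": (3741, 4592),
    "Suwon": (3686, 2861), "Yeongdeungpo": (1320, 1315),
    "Sangbong": (977, 1010), "Yangpyeong": (839, 5),
}

for station, (actual, est) in TABLE1.items():
    # round() returns an int, so near-zero errors print as 0 rather than -0
    print(f"{station:<16}{est - actual:>7}{round(error_rate(actual, est)):>5}%")
```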
The case study analysis assessed how changes in key variables affect access probability and rail demand, demonstrating the model's applicability to real-world scenarios. One scenario examined the effect of mobility as a service (MaaS) on transit efficiency: reducing public transit access time by 10–30% increased access probability by 2–5%, with larger gains in peripheral areas.
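A scenario of this kind amounts to editing one input feature and re-running the trained model. The sketch below shows that pattern with a hypothetical `predict` callable. Because the trained MLP is not available, the usage example substitutes a toy softmax over access times with an arbitrary coefficient (`beta=0.05`) and placeholder data, so its output illustrates only the direction of the effect and does not reproduce the 2–5% result.

```python
import numpy as np


def access_time_scenario(predict, X, time_cols, reductions=(0.1, 0.2, 0.3)):
    """Mean change in predicted access probability when access times shrink.

    predict    callable mapping an input matrix to (n_zones, k) probabilities
               (any input scaling the model needs should happen inside it)
    X          unscaled input matrix
    time_cols  indices of the public-transit access-time columns to reduce
    """
    base = predict(X)
    changes = {}
    for r in reductions:
        X_s = X.astype(float, copy=True)
        X_s[:, time_cols] *= 1.0 - r
        changes[r] = (predict(X_s) - base).mean(axis=0)
    return changes


def toy_predict(X, beta=0.05):
    """Hypothetical stand-in for the trained MLP: softmax over -beta * access time."""
    u = -beta * X
    e = np.exp(u - u.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


# Placeholder access times (min) to three alternative stations for two zones;
# only alternative 0 benefits from the hypothetical MaaS improvement.
X = np.array([[30.0, 20.0, 25.0], [40.0, 35.0, 50.0]])
for r, delta in access_time_scenario(toy_predict, X, [0]).items():
    print(f"{r:.0%} shorter access time -> change {np.round(delta, 3)}")
```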
This study introduces a novel approach to rail demand forecasting, addressing the limitations of traditional models while enhancing predictive accuracy. By incorporating access probability, the proposed method provides valuable insights for optimizing station accessibility and supporting data-driven policy and network planning.

Author Contributions

The authors confirm the following contributions to this paper: study conception and design: S.-Y.H. and H.-C.P.; data collection: S.-Y.H.; analysis and interpretation of results: S.-Y.H. and H.-C.P.; and draft manuscript preparation: S.-Y.H. and H.-C.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (RS-2024-00348596).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available upon request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Debrezion, G.; Pels, E.; Rietveld, P. Modelling the Joint Access Mode and Railway Station Choice. Transp. Res. Part E Logist. Transp. Rev. 2009, 9, 270–283.
  2. Cheon, M.J.; Choi, H.J.; Park, J.W.; Choi, H.Y.; Lee, D.H.; Lee, O. A Study on the Traffic Prediction through CatBoost Algorithm. J. Korea Acad.-Ind. Coop. Soc. 2021, 22, 58–64.
  3. Lee, J. A Development of Intercity Travel Mode Choice Model for High-Speed Rail Demand Analysis. J. Transp. Res. Korean Soc. Transp. 2009, 16, 27–40.
  4. Zhang, X.; Zhao, X. Machine Learning Approach for Spatial Modeling of Ridesourcing Demand. J. Transp. Geogr. 2022, 100, 103310.
  5. Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. In Proceedings of the 31st Conference on Neural Information Processing Systems (NeurIPS 2017), Long Beach, CA, USA, 4–9 December 2017; pp. 4766–4777.
Figure 1. The concept of access probability in OD trips.
Table 1. MLP estimation results.
| Station | Actual Value (trips) | Estimated Value (trips) | Difference | Error Rate |
|---|---|---|---|---|
| Seoul | 85,022 | 85,011 | −11 | 0% |
| Suseo | 41,438 | 41,086 | −352 | −1% |
| Gwangmyeong | 27,581 | 27,676 | 95 | 0% |
| Yongsan | 25,764 | 25,782 | 18 | 0% |
| Dongtan | 8249 | 8475 | 226 | 3% |
| Cheongnyangni | 5055 | 5980 | 925 | 18% |
| Haengsin | 4291 | 4170 | −121 | −3% |
| Pyeongtaek/Jije | 3741 | 4592 | 851 | 23% |
| Suwon | 3686 | 2861 | −825 | −22% |
| Yeongdeungpo | 1320 | 1315 | −5 | 0% |
| Sangbong | 977 | 1010 | 33 | 3% |
| Yangpyeong | 839 | 5 | −834 | −99% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Hong, S.-Y.; Park, H.-C. Development of High-Speed Rail Demand Forecasting Incorporating Multi-Station Access Probabilities. Eng. Proc. 2025, 102, 2. https://doi.org/10.3390/engproc2025102002

Chicago/Turabian Style

Hong, Seo-Young, and Ho-Chul Park. 2025. "Development of High-Speed Rail Demand Forecasting Incorporating Multi-Station Access Probabilities" Engineering Proceedings 102, no. 1: 2. https://doi.org/10.3390/engproc2025102002
