Predicting Public Transit Demand Using Urban Imagery with a Dual-Latent Deep Learning Framework

Eunseo Ko; Gitae Park; Sangho Choo

doi:10.3390/su18010067

¹ School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA

² Department of Urban Design and Planning, Hongik University, Seoul 04066, Republic of Korea

* Author to whom correspondence should be addressed.

Sustainability 2026, 18(1), 67; https://doi.org/10.3390/su18010067

This article belongs to the Special Issue Research on Sustainable Public Transportation System: Behavior, Safety and Planning—2nd Edition


Abstract

Public transit demand forecasting is a foundational component of sustainable urban mobility, enabling efficient operation, equitable service provision, and planning of public transit systems. Urban imagery, such as aerial images, contains rich information about urban sociodemographic characteristics and the built environment, offering particular value for data-scarce regions where conventional datasets are limited or outdated. However, there is limited research on using these images for public transit demand forecasting. This study introduces a deep learning approach for predicting transit ridership using aerial images. The method employs an encoder–decoder architecture to functionally separate image-derived latent representations into sociodemographic and physical environment vectors, which are subsequently used as inputs to a neural network for ridership prediction. Using data from Seoul, South Korea, the effectiveness of the proposed method is evaluated against three baseline configurations. The results show that the sociodemographic latent vector captures spatially organized residential characteristics, while the physical environment vector encodes distinct urban landscape patterns such as dense housing, traditional street grids, open spaces, and natural environments. The proposed model, which uses only imagery-derived latent features, substantially outperforms the pure image baseline and narrows the performance gap with census-informed models, reducing sMAPE by 25–60% depending on the mode. Combining imagery with census variables yields the highest accuracy, confirming their complementary nature. These findings highlight the potential of imagery-based approaches as a scalable, cost-efficient, and sustainable tool for data-driven transit planning.
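The abstract reports accuracy in terms of sMAPE (symmetric mean absolute percentage error) but does not say which variant is used. As a minimal sketch, assuming the common definition that divides each absolute error by the mean of the absolute actual and predicted values, the metric could be computed as follows. The function name `smape` and the sample ridership values are hypothetical and are not taken from the study.

```python
from typing import Sequence


def smape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Symmetric mean absolute percentage error, in percent (0 to 200).

    Assumed variant: mean of |p - a| / ((|a| + |p|) / 2), times 100.
    Pairs where both values are zero contribute zero error.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("inputs must not be empty")

    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2.0
        if denom > 0.0:  # skip the 0/0 case, treated as a perfect prediction
            total += abs(p - a) / denom
    return 100.0 * total / len(actual)


if __name__ == "__main__":
    # Hypothetical ridership counts for two zones, for illustration only.
    observed = [100.0, 200.0]
    forecast = [110.0, 180.0]
    print(f"sMAPE = {smape(observed, forecast):.2f}%")
```

Because the denominator uses both the observed and the predicted value, this variant is bounded at 200%, which keeps zones with very low ridership from dominating the average.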

Keywords: public transit; sustainable transportation planning; demand forecasting; ridership prediction; deep learning; urban imagery
