Evaluating Statistical Models of Railway Dwell Time: Video-Based Evidence from Regional Railways in Victoria, Australia

Ng, Kenneth; Shiwakoti, Nirajan; Stasinopoulos, Peter

doi:10.3390/su172410968


by Kenneth Ng, Nirajan Shiwakoti * and Peter Stasinopoulos

School of Engineering, RMIT University, Melbourne 3000, Australia

* Author to whom correspondence should be addressed.

Sustainability 2025, 17(24), 10968; https://doi.org/10.3390/su172410968

Submission received: 28 October 2025 / Revised: 28 November 2025 / Accepted: 4 December 2025 / Published: 8 December 2025

(This article belongs to the Special Issue System Design and Operation in Sustainable Transport Networks)


Abstract

Accurate prediction and management of train dwell times are essential for achieving efficient and sustainable public transport operations. This study evaluates established statistical dwell-time models within the context of Victoria’s regional railway network, contrasting their predictions with empirical data from video-based observations. Historically, these models—rooted in linear and non-linear regression analyses—have been designed for urban settings in peak periods. However, their applicability to regional railways, characterised by lower service frequencies with unique infrastructure and operational constraints, has been underexplored. The models were assessed for their ability to predict both passenger flow time and total dwell time under regional operating conditions. Results show that while passenger flow time can be predicted with moderate accuracy (best model R² ≈ 0.65), total dwell time models perform considerably worse (best model R² ≈ 0.25), largely due to unmodelled operational delays. The analysis identifies door operation cycles and conductor procedures as the primary operational variables influencing variability in total dwell time. Additionally, variations in passenger behaviour between peak and off-peak periods affect model performance. The findings underscore the need to incorporate local operational and behavioural factors into dwell-time models to enhance their predictive reliability for regional rail contexts. This study provides an empirical foundation for refining dwell time modelling approaches, supporting policymakers and operators in improving scheduling efficiency and overall service sustainability in regional rail networks.

Keywords: regional transport; trains; empirical; dwell times; sustainable transport; service sustainability

1. Introduction

Efficient and reliable train dwell time management plays a critical role in promoting sustainable railway operations by improving timetable stability, reducing energy-intensive idling, and enhancing passenger satisfaction. Understanding how dwell time behaves under regional operational constraints supports the optimisation of service schedules and resource allocation—key pillars of sustainable transport systems [1]. Accurate prediction and management of dwell times are essential for optimising train schedules, ensuring smooth passenger flow, and enhancing the overall performance and service sustainability of transportation networks.

Historically, efforts to understand and forecast dwell times have produced a variety of models, which generally fall into one of three categories: statistical models, simulation models, and so-called ‘advanced’ models [1]. Statistical models help clarify the relationships between explanatory variables and can be applied generically to all appropriate stations on a rail network. Simulation models help explain passenger flow dynamics during dwell time for specific environments and circumstances, such as a particular station design and train type. The ‘advanced’ models apply newer technologies and methods to dwell time modelling, such as digital twins, fuzzy logic, and machine learning. All three categories have predominantly been developed for, and based on, metro or suburban railways, and little is known about how well dwell time models suit other types of railways, specifically regional railways.

Over the past two decades, the demand for regional rail travel has expanded markedly worldwide. In the United Kingdom, for example, regional rail patronage rose by approximately 42% between 2002/03 and 2008/09, representing an average annual growth rate of around 6% [2]. This upward trend continued, reaching a 66% increase by 2014/15, accompanied by rises of 74% in passenger kilometres and 151% in passenger revenue over the same period [3]. In the Australian context, regional rail has also experienced significant growth. In Victoria, annual regional passenger trips increased from 6.7 million in 2006 to 17.9 million in 2017, marking an overall expansion of 167% [4]. Following the opening of the Regional Rail Link (RRL) in 2015, the Geelong line, Victoria’s busiest regional corridor, recorded a further 80% increase in ridership [5]. Meanwhile, New South Wales maintained a considerably larger passenger base, carrying over 40 million regional rail journeys annually between 2017 and 2019, prior to the COVID-19 pandemic [6].

Unlike urban railways, regional rail systems are characterised by single-track operations, longer interstation distances, and mixed-traffic interfaces with freight and suburban services. These factors introduce variability in dwell time through procedural, signalling, and crew coordination effects not typically captured in urban dwell models. For instance, single-track sections and junction scheduling introduce operational buffers that affect total dwell beyond passenger-related factors [7,8]. Our previous study [9] utilised CCTV data from regional stations in Australia and identified unique operational patterns, including the “blinded phenomenon” affecting conductors during afternoon peaks. Recognising these distinctions is crucial for understanding why traditional dwell models, developed for metro networks, may underperform in regional contexts.

This study focuses on the regional railway network of Victoria, Australia, which serves as the basis for defining a regional rail system (see Figure 1 for the network layout, adapted from [9]). Approximately 70–95% of Victorian regional passengers travel between major provincial centres, such as Geelong, Ballarat, Bendigo, Seymour, and Traralgon, and Melbourne’s central business district (CBD). These regional cities are located roughly 70–165 kilometres from Melbourne and are connected through a series of intermediate regional stations (e.g., Ballan) and peri-urban stations (e.g., Rockbank). During peak periods, these stations typically experience 6–7 train services per hour, compared with 1.5–3 services per hour during off-peak times. Several lines also extend to outer destinations such as Warrnambool, Ararat, Maryborough, Swan Hill, Echuca, Shepparton, and Bairnsdale, although these operate with only 2–5 services daily. Despite offering fewer services than metropolitan networks, passenger behaviour and its influence on dwell times remain crucial because regional lines operate within constrained infrastructure environments—including single-track sections, flat junctions, and shared corridors with suburban and metropolitan services. For instance, the Geelong, Ballarat, and Bendigo corridors converge at Sunshine Station, approximately 13 kilometres from the Melbourne CBD, funnelling 18–19 regional trains per hour during peak periods. In such a setting, unplanned or extended dwell times can disrupt carefully timed paths through junctions, passing loops, or shared suburban tracks, causing cascading delays across both regional and metropolitan services.

The primary aim of this study is to evaluate the transferability and contextual reliability of established statistical dwell-time models within regional railway operations. We hypothesise that established statistical dwell time models, while effective for predicting passenger flow, will be insufficient for predicting total dwell time in regional settings due to significant and variable operational overheads. While existing research has largely focused on densely populated metropolitan systems, limited evidence exists on how these well-known analytical models perform under the operational and behavioural conditions unique to regional railways. Rather than seeking to introduce new theoretical paradigms or models, this study deliberately positions itself as a foundational contribution that validates and contextualises existing theories for non-urban settings. Such evidence-based validation is essential before any generalised theoretical framework can be extended across different rail typologies, as transferability without empirical grounding risks misrepresentation of regional operating realities [9,10]. Therefore, the originality of this paper lies in bridging the gap between theoretical constructs and applied operational environments—offering an empirically verified baseline that future, more theory-oriented research can build upon. In this way, the study contributes to methodological advancement through contextual robustness rather than through abstraction alone, aligning with calls for practical, evidence-driven extensions of public transport modelling theory [10,11,12].

2. Related Works

In the realm of rail operations research, statistical dwell-time models play a pivotal role in enhancing the operational efficiency of rail systems. These models, which are fundamentally derived from multivariate regression analyses, can be categorised as either linear or non-linear. The evolution of dwell time models has seen significant contributions from various scholars, including Wirasinghe and Szplett [13], Weston [14], Lin and Wilson [15], Lam et al. [16], Puong [17], and Douglas [18]. A generalised representation of these models can be succinctly expressed as follows:

$$DT = c_{0} + c_{1} A + c_{2} B \tag{1}$$

where dwell time ($DT$) is a function of the number of passengers boarding ($B$) and alighting ($A$); the constant $c_{0}$ represents a fixed operational and mechanical time; and the constants $c_{2}$ and $c_{1}$ represent the boarding and alighting flow factors, respectively.
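As a minimal sketch of Equation (1), the linear form can be written as a small Python function. The function name and the coefficient values in the usage example are hypothetical placeholders chosen only for illustration; they are not taken from any of the cited models.

```python
def linear_dwell_time(alighting: int, boarding: int,
                      c0: float, c1: float, c2: float) -> float:
    """Equation (1): DT = c0 + c1 * A + c2 * B, with DT in seconds."""
    if alighting < 0 or boarding < 0:
        raise ValueError("passenger counts must be non-negative")
    return c0 + c1 * alighting + c2 * boarding


# Hypothetical coefficients (illustration only, not from the cited models):
# 10 s fixed time, 1.0 s per alighting passenger, 2.0 s per boarding passenger.
example_dt = linear_dwell_time(alighting=5, boarding=3, c0=10.0, c1=1.0, c2=2.0)
print(example_dt)  # 10 + 1*5 + 2*3 = 21.0
```

Keeping the fixed term separate from the two flow terms mirrors the distinction between fixed operational time and passenger-driven time that the generalised form expresses.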

The inception of these investigations by Wirasinghe and Szplett [13] introduced a model that underscored the average passenger demand for boarding and alighting, coupled with a fixed operational time component. Subsequent developments by Weston [14] tailored a non-linear model specific to the London Underground, incorporating variables such as the number of double-door-width doors on the train, the peak door factor, and the existing passenger count, aiming to account for the unequal distribution of passengers across the train. Lin and Wilson [15] further diversified the landscape by including the number of car sets in their analysis, based on data from the Massachusetts Bay Transportation Authority (MBTA) Green Line.

The studies by Lam et al. [16] and Puong [17], although distinct in their approach, shared a common foundation in examining the impact of boarding and alighting passenger numbers. However, Puong’s [17] model notably diverged by considering the per-door perspective and the influence of through-standing passengers on dwell time. This line of inquiry was further refined by Douglas [18], who, while building upon Puong’s [17] model, introduced modifications tailored to the Sydney suburban rail network. Douglas’s [18] research not only accounted for the base variables of earlier models but also incorporated additional factors to better reflect the passenger behaviours encountered on that network.

In addition to these developments, Douglas [18] also highlighted a model devised for the Thameslink and Crossrail projects by John Rosser and Peter Howarth. This model, which utilised Weston’s work as a foundational concept, focuses on the intricacies of passenger boarding and alighting dynamics without delving into door operation times, thus presenting a specialised approach for specific rail projects. The academic community has shown a particular interest in the Weston [14] model for its comprehensive and adaptable nature. Notably, studies by Harris [19] and Harris and Anderson [20] have underscored its applicability across a wide range of conditions and locations, despite identifying some overestimations in scenarios of high passenger volume.

More recently, research has advanced toward machine learning (ML), hybrid, and simulation-based dwell-time models that leverage large-scale, sensor-derived datasets. Simulation models, covered here for broader context, are useful for accounting for cascading delays and passenger interactions in rail operations. Studies by Zhang et al. [21] and Jiang et al. [22] used micro- and macro-simulation models to analyse passenger movement and the relationship between train and passenger delays, respectively. Yamamura and Inagi [23] developed a multi-agent model to estimate train dwell times considering congestion. Jiang et al. [24] introduced a time-driven micro-simulation model to optimise dwell times on Shanghai’s rail transit Line 8. Perkins et al. [25] used agent-based modelling to reduce dwell times, finding that a combination of an active passenger information system and designated doors decreased loading times by 7.3%. Ahn et al. [26] improved passenger satisfaction in Brisbane through an agent-based simulation that relayed carriage occupancy levels, leading to more evenly distributed occupancy and reduced crowding. Simulation models are effective for localised studies but can be time-intensive for broader network outcomes.

There are also those ‘advanced’ models that incorporate methods like ML, fuzzy logic, probabilistic/stochastic approaches, and digital twins. Alvarez et al. [27] used a fuzzy-logic-based AI method to estimate dwell times at metro stations in Panama City, accounting for passenger preferences, yielding accurate approximations of actual dwell times. Wen-jun et al. [28] proposed an extreme learning machine (ELM) neural network model to analyse and model factors influencing urban rail dwell times in Beijing, outperforming other algorithms and existing statistical models. Glatin and Clarke [29] conducted a feasibility study for the Rail Safety and Standards Board on a real-time digital twin to reduce dwell-time variations on the UK Thameslink route, focusing on real-time passenger-flow prediction. Coulaud et al. [10] developed a hybrid approach using machine learning to create statistical models for the Paris Metro, validating their approach with extensive railway operations and passenger flow data.

A hybrid data-driven approach combining deep reinforcement learning (Proximal Policy Optimisation) and machine learning was proposed in [12] to optimise train trajectory reconstruction under service interruptions. Applied to the Wuhan–Guangzhou high-speed railway, the method achieved over 12% improvement in timetable rescheduling and a 20% reduction in train delays compared with conventional control decisions, demonstrating its effectiveness and practical applicability. A simulation-based Digital Twin (DT) prototype was developed by Padovano et al. [30] for a major Italian railway station to enhance operational control. The case study demonstrates the DT’s effectiveness in synchronising virtual and physical systems, integrating diverse data sources, and achieving a practical balance between accuracy, performance, and scalability, thereby bridging the gap between theoretical models and real-world transport hub management. A study by Bapaume et al. [31] introduces a computer vision–based deep learning framework for real-time prediction of passenger loads and train headways in urban metro systems, formulating the task as an image completion problem. Using three years of data from Paris Metro Line 9, France, the research compares several architectures, including transformer-based models, and demonstrates the framework’s robustness across both typical and atypical operating conditions, such as strikes and disruptions. Pritchard et al. [32] introduced a data-driven approach proposing a novel metric referred to as excess probability of delays to quantify how specific factors affect train delays, using two years of Swedish railway data. Results show that train meets and passes, particularly on single-track lines, substantially increase dwell-time delays, contributing roughly 4% of total dwell-time delays and highlighting opportunities for targeted operational improvements.

Through review of speed optimisation and dwell time control as measures for intelligent traffic management in rail-bound public transportation systems, Abrecht et al. [33] introduced operational target points and windows to improve throughput and reduce energy consumption. Case studies across three different rail-bound passenger systems in Germany demonstrate the potential benefits of these strategies, which can be implemented via Driver Advisory Systems (DAS) or Automatic Train Operation (ATO). A study by Kecman and Goverde [34] develops and compares data-driven models for estimating running and dwell times in railway traffic, using high-granularity historical track-occupation data. Both global models (robust linear regression, regression trees, and random forests) and refined local models are evaluated, with local models showing the best performance in terms of accuracy and computational efficiency, demonstrating their applicability for real-time railway operation.

A peer-to-peer train rescheduling system using Genetic Algorithm–based local search and negotiation protocols, tested on a UK railway bottleneck, shows faster computation and comparable optimality to centralised approaches [35]. A multi-agent system for real-time train rescheduling, which decomposes the network into single-junction levels and utilises a Condorcet voting–based collaboration mechanism [36], demonstrates a 34% increase in line capacity compared with conventional methods on a UK railway network. In another UK study, a reinforcement learning–based Q-learning approach with a tiered reward mechanism for very-short-term train rescheduling in a bi-directional single-track corridor was proposed [37], demonstrating improved solution quality, computational efficiency, and knowledge reusability compared with existing methods. A safety-oriented, origin–destination-based, time-dependent fare optimisation model, solved using an Iterated Local Search algorithm and applied to the Beijing Metro Batong Line in China, effectively reduced passenger accumulation risk and improved safety on overcrowded metro lines [38].

For readers interested in state-of-the-art modelling advancements, several recent reviews [39,40,41] provide detailed discussions of the integration of digital twins, machine learning, and data-driven methods in railway operations and dwell-time analysis.

This extensive body of work on dwell time models not only exemplifies the depth of research dedicated to understanding and optimising rail system operations but also highlights the continuous need for model evolution to adapt to the nuanced demands of different rail systems and passenger behaviours worldwide.

3. Methodology

3.1. Data Collection and Locations

To obtain essential data on passenger boarding and alighting times (passenger flow time), door preferences, and total dwell times for regional trains, video-based observations from CCTV footage at Cobblebank and Rockbank stations, Victoria, Australia, were used. These stations were selected for their high-quality CCTV footage, capable of clearly distinguishing passenger movements in and out of the train and capturing the entire train length. The selection followed extensive discussions with the rail operator, who provided a sample of CCTV footage from various stations. The use of two stations—Cobblebank and Rockbank—was based on their representativeness of mid-tier regional stops with distinct passenger access configurations and consistent CCTV visibility, which ensured high data quality. This follows best practice in exploratory validation studies where observational precision outweighs geographic breadth [42]. Future research can extend this dataset to additional stations and longer durations as higher-fidelity video and sensor data become available. The stations’ location with respect to the broader network is shown in Figure 1. It is worth noting that a portion of the dataset used in this study was also included in our earlier work [9]; however, the present study focuses specifically on the evaluation and comparison of model performance.

The configuration of the two stations is illustrated in Figure 2. At Cobblebank, passengers access Platform 1 via a staircase or a long ramp at the Ballarat end, whereas Platform 2 has two open entrances, one leading directly to the car park and another accessible via a staircase or ramp (see Figure 2a). In contrast, Rockbank has a large, central, open entrance on both platforms connecting to the overpass and parking area, which all passengers must use to board or alight (Figure 2b).

3.2. Ethics Consideration

The rail operator granted permission for the use of CCTV footage for research purposes. All extracted footage contained only non-identifiable visual information, ensuring that individual passengers could not be recognised. As the dataset consisted solely of secondary, de-identified material, the university’s ethics committee subsequently granted an exemption from full ethical review.

3.3. Data

CCTV footage was acquired for Cobblebank and Rockbank Stations throughout the day as two sets from Monday, 7 February 2022, to Wednesday, 9 February 2022, and from Monday, 5 September, to Friday, 10 September 2022. These time periods were selected because they were free of significant external influences on train patronage. The Victorian populace had adapted to the “new COVID-normal,” resuming in-person activities where possible.

The surveillance system footage obtained comprised short video files capturing only the periods when trains were at the station to minimise viewing risk. Each station had four camera angles, but no single camera covered all the necessary details. For complete visual coverage of the platform and train doors, videos from all four cameras were synchronised in Movavi Academic 23 (video editing software) and merged into one composite file to enable thorough review. Data were collected from 398 VLocity train services at Cobblebank and Rockbank Stations. Each train stopped at the relevant marker on the platform depending on whether it was a 3-car or 6-car VLocity. Our sample of 398 services substantially exceeds the sample sizes reported in similar observational studies, such as Oliveira et al. [42], which used only nine departures at one station.

The train services observed throughout the day were categorised as follows. Morning peak (AMP Up) services, arriving at Southern Cross Station (Melbourne) between 07:00 and 09:00, used Platform 1. Interpeak Up (INP Up) services, arriving at Southern Cross Station (Melbourne) between 09:01 and 15:30, used Platform 1. Interpeak Down (INP Down) services, departing Southern Cross Station (Melbourne) between 09:01 and 15:30, used Platform 2. Afternoon peak (PMP Down) services, departing Southern Cross Station (Melbourne) between 15:30 and 18:30, used Platform 2. Post PM Up (POP Up) services, arriving at Southern Cross Station (Melbourne) after 18:30, used Platform 1. Post PM Down (POP Down) services, departing Southern Cross Station (Melbourne) after 18:30, used Platform 2. Services in the counter-peak direction (i.e., AMP Down, PMP Up) were excluded because of their very low service counts and patronage.
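The categorisation above can be sketched as a small lookup function. This is an illustrative sketch only: the function name is hypothetical, times are assumed to be recorded to the whole minute, and because 15:30 appears in both the INP Down and PMP Down bands, the sketch assumes (as a tie-break not stated in the text) that a Down departure at exactly 15:30 counts as PMP Down. Services outside the stated bands, or in the excluded counter-peak direction, return `None`.

```python
from datetime import time
from typing import Optional


def classify_service(direction: str, scs_time: time) -> Optional[str]:
    """Return the service band, or None if excluded or outside the stated bands.

    direction: "Up" (arriving at Southern Cross) or "Down" (departing it).
    scs_time:  arrival time at Southern Cross for Up services,
               departure time from Southern Cross for Down services.
    """
    if direction == "Up":
        if time(7, 0) <= scs_time <= time(9, 0):
            return "AMP Up"
        if time(9, 1) <= scs_time <= time(15, 30):
            return "INP Up"
        if scs_time > time(18, 30):
            return "POP Up"
        return None  # includes the excluded counter-peak PMP Up window
    if direction == "Down":
        # Assumption: 15:30 exactly is treated as PMP Down (tie-break).
        if time(9, 1) <= scs_time < time(15, 30):
            return "INP Down"
        if time(15, 30) <= scs_time <= time(18, 30):
            return "PMP Down"
        if scs_time > time(18, 30):
            return "POP Down"
        return None  # includes the excluded counter-peak AMP Down window
    raise ValueError("direction must be 'Up' or 'Down'")


print(classify_service("Up", time(8, 15)))     # AMP Up
print(classify_service("Down", time(16, 45)))  # PMP Down
print(classify_service("Down", time(8, 0)))    # None (AMP Down is excluded)
```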

The data breakdown and the number of services examined are detailed in Table 1.

3.4. Model Selection

The most prominent statistical dwell time models for heavy rail in the current literature are those of Wirasinghe and Szplett [13], Weston [14], Lam et al. [16], Puong [17], and Douglas [18]; these five models are performance tested in this study. Together, they represent the chronological and methodological progression of regression-based approaches developed over the past four decades. These models were selected because they are the most frequently cited and/or operationally applied dwell time formulations in both academic and practitioner literature, each contributing a distinct advancement to the understanding of how passenger boarding and alighting behaviour influences total dwell duration [1,43,44].

While more recent studies have demonstrated the predictive superiority of data-driven and machine learning (ML) methods, such as random forests or neural networks [10,12], traditional regression models continue to play a critical role in rail operations research due to their transparency, interpretability, and low data requirements, allowing explicit estimation of how individual factors, such as passenger boarding, alighting activity, and operational constants, affect total dwell time. This feature is particularly valuable for railway planners and policymakers who require clear, quantifiable relationships rather than opaque algorithmic outputs [45,46]. In contrast, most ML techniques, while often delivering higher predictive accuracy, function as “black-box” systems, providing limited insight into causal mechanisms or the practical levers available for intervention [47,48]. Classical statistical frameworks, such as those five models used in this study, remain indispensable for benchmarking emerging algorithms, validating their outcomes, and explaining the physical and behavioural mechanisms underlying dwell-time variability [43,44]. Accordingly, this study examines these foundational models in a regional railway context, not to replicate past work but to test their contextual robustness and delineate the boundaries of their applicability, providing a reference point for future hybrid or ML-based model development.

Wirasinghe and Szplett [13] introduced a series of models, developed from survey observations (unknown time period), distinguishing between scenarios with dominant alighting, mixed flow, and dominant boarding. In the dominant alighting model, the total dwell time (t) is expressed as a linear function of the number of alighting (a) and boarding (b) passengers, with alighting passengers weighted more heavily. This model underscores the disproportionate impact of alighting passengers on dwell times. Conversely, the mixed flow model adopts a more balanced approach, albeit with different coefficients, suggesting a nuanced interaction between boarding and alighting processes. The dominant boarding model mirrors the structure of the dominant alighting model but assumes equal weights for both boarding and alighting, highlighting scenarios where boarding activities predominate. What this paper terms as the ‘partial model (Pf)’, for Wirasinghe and Szplett [13], refers to the complete model without the ‘fixed time lost (l)’, as this aims to represent the operational nuances as a fixed constant.

The Weston [14] model, developed from survey observations (unknown time period), offers a complex formulation that incorporates not only the numbers of boarding and alighting passengers but also the train’s seating capacity, the number of through passengers, and the train’s door factors, among others. This comprehensive approach aims to capture the multifaceted dynamics affecting dwell time, accounting for physical constraints and passenger behaviours. What this paper terms the ‘partial model (Pf)’ for Weston [14] is the complete model without the ‘function time’, which likewise represents the operational nuances as a fixed constant. Lam et al. [16] proposed a simpler model, developed from peak-period survey observations, linearly relating dwell time to the number of boarding and alighting passengers, with a fixed time added to account for operational constants. This model emphasises the direct impact of passenger flow on dwell times, providing a straightforward method for estimation. Again, what this paper terms the ‘partial model (Pf)’ for Lam et al. [16] refers to the complete model without the 10.5 s fixed constant representing operational nuance.

Puong [17] developed a model from peak-period survey observations that focuses on the per-door dynamics of passenger boarding and alighting, considering the number of passengers per door and the cubic impact of standing passengers per door. This model highlights the significance of door-level passenger flows and the non-linear effects of passenger congestion on dwell times. The ‘partial model (Pf)’ for Puong [17], as termed in this paper, refers to the complete model without the 12.22 s fixed constant representing operational nuance. Douglas [18] further explored the non-linear dynamics of passenger interactions at the door level, incorporating both boarding and alighting passengers, as well as the estimated number of standing through-passengers per door. This model, developed from a combination of peak-period survey observations and controlled live simulations, accounts for the complex interplay between different types of passengers and their collective impact on dwell times. The ‘partial model (Pf)’ for Douglas [18], as termed in this paper, refers to the complete model without the 10 s fixed constant representing operational nuance. Both the complete and partial models used for testing in this study are stated in Table 2.
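The relationship between the partial and complete states can be sketched as follows for the three models whose fixed constants are quoted above. This is a hedged illustration: it assumes the fixed constant is purely additive, as the description suggests, and it takes the partial (passenger flow) prediction as an input rather than reproducing the model equations in Table 2. The dictionary and function names are hypothetical.

```python
# Fixed operational constants (seconds) quoted in the text for three models.
# Wirasinghe and Szplett [13] and Weston [14] are omitted because their fixed
# terms ('fixed time lost' and 'function time') are not given numerically here.
FIXED_CONSTANT_S = {
    "Lam et al. [16]": 10.5,
    "Puong [17]": 12.22,
    "Douglas [18]": 10.0,
}


def complete_from_partial(model: str, partial_flow_time_s: float) -> float:
    """Complete-model prediction = partial (Pf) prediction + fixed constant.

    Assumes the fixed constant is simply added to the passenger flow time.
    """
    return partial_flow_time_s + FIXED_CONSTANT_S[model]


# Hypothetical partial prediction of 20 s of passenger flow time:
for name in FIXED_CONSTANT_S:
    print(name, complete_from_partial(name, 20.0))
```

Under this additive assumption, the partial and complete predictions of one model differ only by a constant offset, so any difference in how well they match observations comes from how well that constant represents the real operational time.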

3.5. Performance Testing

The performance testing of the selected models began by segmenting the observed data into distinct ‘conditions’. These conditions corresponded to specific time periods and directions during which the data was recorded. The categories included AMP Up, INP Up, INP Down, PMP Down, POP Up, and POP Down.

For each observation under each condition, relevant recorded data—such as the number of boarders and alighters and the door that was used—was processed through each selected model. The models were evaluated in two distinct states: the partial state, which represented only passenger flow time, and the complete state, which represented total dwell time.

Each model generated predictions for both passenger flow time and total dwell time based on the input data for each condition. These predicted times were then compared against the observed times. The comparisons were visualised through plotting, which provided an intuitive understanding of how each model performed under each condition.

To quantitatively assess model performance, the coefficient of determination (R²) was calculated. This statistic measures the proportion of variance in the observed times (both passenger flow time and total dwell time) that is explained by the model predictions. An R² value close to 1 indicates a high proportion of variance explained by the model, suggesting good predictive accuracy. By combining visual and quantitative analyses, this methodology ensured a comprehensive evaluation of the selected models’ performance across different conditions.
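The coefficient of determination can be computed in more than one way, and the text does not state which definition was used. The sketch below therefore shows two common definitions side by side: one based on residuals (1 − SS_res/SS_tot) and one based on the squared Pearson correlation between observed and predicted values. The function names and the sample values are hypothetical and are not drawn from the study's dataset.

```python
from math import sqrt
from typing import Sequence


def r2_residual(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """R^2 = 1 - SS_res / SS_tot; penalises systematic offsets in predictions."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def r2_pearson(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Squared Pearson correlation; insensitive to a constant offset or scale."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    mean_pred = sum(predicted) / len(predicted)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    r = cov / sqrt(var_obs * var_pred)
    return r * r


# Hypothetical values: predictions are uniformly 2 s too long.
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 22.0, 32.0]
print(r2_residual(observed, predicted))  # about 0.94
print(r2_pearson(observed, predicted))   # 1.0
```

The two definitions diverge when predictions carry a constant bias, which matters when comparing partial and complete models that differ by a fixed constant.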

4. Results and Analysis

A summary of dwell time observations, highlighting key insights, is presented first, followed by model performance results for both the partial and complete models.

4.1. Summary of Dwell Time Observations

The summary of dwell time observations includes the results on observed dwell time components, observed passenger flow rates and observed concentrated boarding and alighting.

4.1.1. Observed Dwell Time Components

The data collection methodology allowed for the different components of the dwell process to be distinguished, including the time until doors opened (C1), the time from when the doors started to open until passengers began to board or alight (C2), the time taken for boarding or alighting (C3), additional time spent by conductors on various operational tasks (C4), the time for doors to close (C5), and the preparation time before departure (C6). The observed number of trips, along with the average number of passengers joining and alighting, was also recorded, providing a comprehensive overview of the dwell time at the stations investigated. These results are summarised in Table A1 in Appendix A, while Table 3 shows only C3 (Board/Alight), C4 (Conductor Time) and Total Dwell for each band/train type/direction, focusing on the most critical comparative metrics.
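The decomposition can be expressed as a simple record. The sketch below is illustrative only: it assumes that total dwell is the sum of C1 to C6 and that the passenger flow time targeted by the partial models corresponds to C3. The class name, field names and sample values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class DwellObservation:
    c1_until_doors_open: float   # C1: time until doors opened (s)
    c2_open_to_first_pax: float  # C2: doors start opening to first passenger movement (s)
    c3_board_alight: float       # C3: boarding and alighting (s)
    c4_conductor: float          # C4: additional conductor time (s)
    c5_doors_close: float        # C5: doors closing (s)
    c6_pre_departure: float      # C6: preparation before departure (s)

    @property
    def total_dwell(self) -> float:
        # Assumption: the six components are contiguous and sum to total dwell.
        return (self.c1_until_doors_open + self.c2_open_to_first_pax
                + self.c3_board_alight + self.c4_conductor
                + self.c5_doors_close + self.c6_pre_departure)

    @property
    def non_flow_time(self) -> float:
        # Everything other than passenger flow (C3), i.e. operational time.
        return self.total_dwell - self.c3_board_alight

# Hypothetical single observation, values in seconds.
obs = DwellObservation(3.0, 2.0, 12.0, 9.0, 4.0, 5.0)
```

Keeping C3 separate from the remaining components mirrors the distinction between the partial state (passenger flow time) and the complete state (total dwell time) used in the performance testing.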

During the Morning Peak (AMP), the data indicated generally higher dwell times, with 3VL trains in the Up direction (n = 19) showing the highest average dwell time at 40.84 s. This can most likely be attributed to only six single-stream doors being available for an average of 19.79 passengers to choose from and board through. A notable distinction was observed between 3VL and 6VL trains, with the latter showing a higher average number of joins (53.63), suggesting their increased use during peak times due to larger capacity. Despite this, their average dwell time of 37.92 s was similar to that of 3VL trains at 40.84 s, pointing to a less efficient boarding and alighting process on 3VL trains relative to the number of people boarding. The average “Additional Conductor Time (C4)” of 9.74 s for a 3VL and 5.33 s for a 6VL emerged as a significant factor impacting dwell times, especially for trains in the Up direction, indicating potential areas for operational improvement.

In the Inter-Peak (INP) period, dwell times were found to be moderately high but less variable than during the AMP, with 6VL trains in the Down direction (n = 23) experiencing the highest average dwell time at 37.57 s. This would suggest passenger flows similar to those of the AMP; however, this was not the case, with an average of 15.35 alights and 1.91 joins. The average totals of joins and alights were significantly higher for 6VL trains, especially in the Down direction, highlighting their role in managing passenger volumes efficiently during these quieter times. For both 3VL and 6VL trains, “Additional Conductor Time (C4)” was a key factor affecting dwell times, with longer durations of up to 10.39 s experienced by 6VL trains, likely due to handling more passengers.

The Evening Peak (PMP) recorded the longest dwell times, particularly for 6VL trains in both directions, reflective of the evening rush when commuters return from their daily activities. The highest average dwell time observed was 45 s for 6VL trains in the Up direction (n = 8), which was unexpected; however, it was better explained with the average C4 time of 18.25 s, which can be suggestive of a service’s ‘waiting time’ to depart. In the peak Down direction, 6VL trains had an average dwell of 46.54 s (n = 59). Here again, 6VL trains showed higher average total alights, especially in the down direction, with 46.54 average alights, underscoring their crucial role during the evening peak.

During Off-Peak (POP) hours, generally lower dwell times were noted, except for 6VL trains in the Down direction (n = 2), which had an average dwell time of 40.50 s. This was indicative of a less hurried boarding/alighting process in times of lower train occupancy. Remarkably, 6VL Up trains (n = 2) demonstrated extraordinarily low average dwell times of 24.5 s, indicating highly efficient operations likely facilitated by lower passenger volumes. The significant disparity in Board/Alight times between directions and train types hinted at variability in passenger behaviour and train operations, suggesting opportunities for further efficiency improvements.

Overall, the results confirm that C4 time (conductor procedures) is a significant and variable determinant of total dwell time. Its consistent influence indicates that operational factors, beyond passenger flow, are critical drivers of dwell time variability in regional railway settings.

4.1.2. Observed Passenger Flow Rates

The analysis of passenger flow rates through the peak door, per condition, provided detailed insights into the efficiency and variability of passenger movement (Figure 3). By focusing on the interquartile ranges (IQRs) and means, a nuanced understanding of the flow dynamics was achieved, revealing trends and patterns crucial for understanding why some models may have achieved a better fit with the observed data than others.

The AMP Up condition (which predominantly had passengers boarding) exhibited an IQR from 1.40 to 2.14 s per passenger, indicating a relatively uniform flow, with the mean (μ = 1.83) suggesting efficiency in passenger movement.

The INP conditions showed greater variability, especially in the Up direction, with an IQR from 1.48 to 3.47 s per passenger, highlighting potential flow inefficiencies. The mean flow for the Up direction (μ = 2.68) was slower than that of the Down direction (μ = 2.18), which had a tighter IQR from 1.35 to 2.42 s per passenger, suggesting moderate variability but generally efficient flow.

The PMP Down condition (which predominantly had passengers alighting) was highly consistent, with an IQR from 1.17 to 1.47 s per passenger and a mean (μ = 1.42) indicating efficiency.

For the POP conditions, the Down direction had an IQR from 1.28 to 1.79 s per passenger, showcasing efficient and consistent flow rates, similar to the PMP band, with the mean (μ = 1.66) reflecting swift passenger movements. The Up direction demonstrated an IQR from 1.71 to 2.38 s per passenger, indicating less variability compared to other bands, with the mean (μ = 2.02) flow rate being slower than that of the Down direction, highlighting a trend toward slower yet more consistent flows in the Up direction.
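The summary statistics used above can be computed with the Python standard library. The sketch below assumes that a flow rate is the flow time at the peak door divided by the number of passengers using that door, and it uses the ‘inclusive’ quartile method; the quartile convention actually used in the study is not stated. The function name and sample values are hypothetical.

```python
import statistics

def flow_rate_summary(flow_times_s, passengers):
    """Summarise flow rates (s per passenger) as quartiles and mean.

    Services with no passenger movements are skipped, since a rate
    cannot be defined for them.
    """
    rates = [t / n for t, n in zip(flow_times_s, passengers) if n > 0]
    q1, _median, q3 = statistics.quantiles(rates, n=4, method="inclusive")
    return {"q1": q1, "q3": q3, "mean": statistics.mean(rates)}

# Hypothetical peak-door flow times (s) and passenger counts for five services.
summary = flow_rate_summary([5.0, 6.0, 6.0, 10.0, 6.0], [5, 4, 3, 4, 2])
```

Different quartile conventions give slightly different IQR bounds on small samples, which matters for conditions with few observations such as the POP band.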

4.1.3. Observed Concentrated Boarding and Alighting

3-Car VLocity Train (6 Single Stream Doors)

The percentage of total passengers (PAX) that boarded or alighted through the peak door of a 3-car VLocity train service, categorised by condition, is shown in Figure 4. It is important to note that a 3-car VLocity train has six doors through which passengers can board or alight; therefore, the theoretical peak door-through percentage under even loading is 16.67%.
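The peak door percentage and its even-loading baseline follow directly from per-door counts. The sketch below is illustrative; the function names and the per-door counts are hypothetical.

```python
def peak_door_share(door_counts):
    """Share of all boarding/alighting passengers who used the busiest door."""
    total = sum(door_counts)
    if total == 0:
        raise ValueError("no passenger movements recorded")
    return max(door_counts) / total

def even_loading_share(n_doors):
    """Peak door share if passengers spread evenly over all doors."""
    return 1 / n_doors

# Hypothetical counts at the six doors of a 3-car service.
share = peak_door_share([6, 4, 3, 3, 2, 2])
baseline_pct = round(100 * even_loading_share(6), 2)
```

Here the busiest door carries 6 of 20 passengers (30%), compared with the 16.67% baseline for six doors.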

The findings revealed a range of peak door usage percentages across the conditions. In the AMP Up condition, the IQR for peak door usage ranged from 28% to 36% with a mean of 31% (n = 19), which suggests passengers were more willing to ‘spread out’, potentially due to the higher number of passengers waiting to board through the six possible doors.

Regarding the INP conditions, peak door usage exhibited more consistency across both directions. In the Down direction, where passengers mostly alighted, the IQR for the peak door usage ranged from 29% to 53% with a mean of 43% (n = 65), while in the Up direction, where passengers mostly boarded, the IQR similarly spanned from 27% to 54% with a mean of 42% (n = 115).

For the PMP Down condition, the IQR for peak door usage ranged between 28% and 36%, with a mean of 32% (n = 9), similar to the AMP Up condition. Analysis of the POP conditions also showcased diverse patterns in peak door usage. In the Down direction, the IQR for peak door usage ranged from 27% to 35% with a mean of 32% (n = 17), which is very similar to the PMP Down, whereas in the Up direction, the IQR for peak door usage varied from 67% to 80% with a mean of 72% (n = 7). These observations underscored the nuanced nature of passenger behaviour and demand across different time periods and directions.

6-Car VLocity Train (12 Single Stream Doors)

The percentage of total passengers (PAX) that boarded or alighted through the peak door of a 6-car VLocity train service, categorised by condition, is shown in Figure 5. It is important to note that a 6-car VLocity train has 12 single-stream doors through which passengers can board or alight; therefore, the theoretical peak door-through percentage under even loading is 8.33%.

In the AMP Up condition, the IQR for peak door usage ranged between 15% and 20%, with a mean of 19% (n = 51). Within the INP band, the IQR peak door usage for Down direction services was between 22% and 36% with a mean of 33% (n = 23), while for Up direction services, it ranged between 22% and 39% with a mean of 36% (n = 29), which is consistent for the time period. Moving to the PMP Down condition, the IQR for peak door usage ranged from 20% to 27%, with a mean of 23% (n = 59). Finally, in the POP band, Down direction services displayed an IQR for peak door usage ranging from 25% to 30%, with a mean of 28% (n = 2), whereas only one entry was recorded for Up direction services, indicating 100% (n = 1).

4.2. Model Performance

4.2.1. Passenger Flow Time Only (Partial Model)

The observed and modelled passenger (PAX) flow times (partial model) are summarised in Table 4. In the analysis of all services (All), with a sample size of 398, Douglas [18] achieved the highest r² value of 0.6473, indicating the strongest fit among the ‘partial’ models. The p-values were <0.001 for all studies, indicating a consistently significant relationship between observed and modelled passenger flows. The scatter plot for ‘All services’ is shown in Figure 6. It is to be noted that ‘All services’ aggregates all the variability (peak, off-peak, different directions).

In the AMP Up scenario, which had a sample size of 70, the highest r² was reported by Puong [17] at 0.5745, with Douglas [18] closely following. The p-values were statistically significant, demonstrating strong predictive capabilities for passenger flow (predominantly boarding) during the AMP Up scenario. The AMP Up scatter plot is shown in Figure 7.

For the INP Up scenario, with 144 observations, Douglas [18] reported the highest r² value of 0.2302. However, the overall lower r² values in this scenario suggested less predictive accuracy. The variation in p-values indicated differing levels of significance among the studies. The INP Up scatter plot is shown in Figure 8.

In the INP Down scenario, comprising 88 observations, the highest r² was seen in Douglas [18] with 0.6703, suggesting a strong model fit. The findings were supported by very significant p-values, indicating a reliable predictive capability for passenger flow during this time period. The INP Down scatter plot is shown in Figure 9.

The PMP Down scenario, with 68 observations, showcased particularly high r² values, with Puong [17] leading at 0.8553, followed by Douglas [18]. The significant p-values emphasised the reliability of these findings, showing strong predictive strength for passenger flow (predominantly alighting) during the PMP Down scenario. The PMP down scatter plot is shown in Figure 10.

In the POP Up scenario, which included a small sample size of 9, Douglas [18] achieved the highest r² of 0.7954, indicating a robust model fit. Despite the small sample size, the statistically significant p-values, albeit higher, suggested a cautious interpretation but still underscored Douglas's [18] robust predictive capability for passenger flow during this time period and direction. The POP Up scatter plot is shown in Figure 11.

For the POP Down scenario, with 19 observations, the highest r² was by Lam et al. [16] at 0.6127, closely followed by other studies. The variability in p-values across studies suggested differences in the models’ predictive capabilities for passenger flow for this time period and direction. The POP Down scatter plot is shown in Figure 12.

Overall, the analysis indicated that the Douglas [18] model consistently showed strong predictive capabilities across most time periods and directions, suggesting it as a robust and reliable model for predicting passenger flows for a regional railway. However, the effectiveness of the models varied significantly based on the specific time period and direction, highlighting the importance of context in model selection and application for a regional railway. The correlation statistics are presented in Table 4, along with the error metrics: Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE).

4.2.2. Total Dwell Time (Complete Model)

The observed and modelled total dwell times (complete model) are summarised in Table 5, drawing insights from the same studies and time periods as those of Section 4.2.1.

For all services examined, the sample size remained consistent at 398 observations. The coefficients of determination (r²) varied across studies, with Puong [17] yielding the strongest correlation (r² = 0.2452), while Weston [14] demonstrated the weakest relationship (r² = 0.0443). Statistically significant results (p < 0.05) were observed in most studies, except for the dominant alighting variation of Wirasinghe and Szplett [13] and Weston [14], suggesting a notable association between observed and modelled total dwell times. A scatter plot for all services is shown in Figure 13.

Within the AMP Up time period, the sample size remained constant at 70 observations across all studies. Despite this consistency, correlations were generally weak, with the highest coefficient of determination recorded in Puong [17] at r² = 0.0734. However, none of the scenarios yielded statistically significant results (p > 0.05), indicating limited predictive accuracy for AMP Up conditions. The AMP Up scatter plot is shown in Figure 14.

Similarly, the INP Up time period, with a sample size of 144 observations in all studies, demonstrated weak correlations overall. While Weston [14] presented the highest coefficient of determination (r² = 0.1013), none of the correlations were statistically significant (p > 0.05), suggesting challenges in accurately modelling total dwell times for INP Up conditions. The INP Up scatter plot is shown in Figure 15.

In the context of the INP Down time period, characterised by 88 observations in all studies, correlations ranged from weak to moderate, with Douglas [18] exhibiting the highest coefficient of determination (r² = 0.1470). However, statistical significance was elusive in most scenarios (p > 0.05), indicating difficulty in effectively modelling total dwell times under such conditions. The scatter plot for INP Down is shown in Figure 16.

Conversely, the PMP Down time period, with a consistent sample size of 68 observations, demonstrated stronger correlations across studies. Notably, Puong [17] exhibited a high coefficient of determination (r² = 0.4778), with statistically significant results (p < 0.05), suggesting effective model performance in predicting dwell times for PMP Down conditions. The scatter plot for PMP Down is shown in Figure 17.

The POP Up time period, characterised by a notably small sample size of 9 observations, presented high coefficient of determination values, particularly in Weston [14] (r² = 0.6834). Despite this, mixed results were observed in terms of statistical significance, with Weston [14] showing significant results (p = 0.0424), indicating potential in modelling POP Up scenarios despite the limited sample size. The scatter plot for POP Up is shown in Figure 18.

Finally, POP Down scenarios, with 19 observations across studies, demonstrated generally low to moderate correlations. Weston [14] exhibited the highest coefficient of determination (r² = 0.1767), yet statistical significance remained elusive in most instances (p > 0.05), highlighting the challenges in accurately modelling total dwell times in such scenarios. The POP Down scatter plot is shown in Figure 19.

In summary, the analysis demonstrated that utilising the ‘full model’ to predict total dwell time was less effective at predicting observed data across various scenarios, when compared to using only the passenger flow component of the models to predict passenger flow time. The correlation statistics, along with the error metrics (MAPE, RMSE, and MAE) for the full models, are presented in Table 5.

4.2.3. Goodness of Fit Comparison (Partial vs. Complete)

The comparative analysis of model performance across different service scenarios revealed clear distinctions between the effectiveness of the ‘partial models’ based solely on passenger flow and complete models that attempt to predict total dwell time. Across all services, the partial models demonstrated moderate to strong correlations between observed and modelled passenger flows. The study by Douglas [18] yielded the highest correlation (r² = 0.6473, p < 0.001), with an RMSE of 6.16 and MAE of 4.61, indicating both statistical robustness and practical accuracy. Other models, such as Puong [17] and Lam et al. [16], also showed solid performance with r² values above 0.53 and relatively low error rates. In contrast, the complete models produced considerably weaker correlations in the all-services scenario, with r² ranging from just 0.0443 to 0.2452. Although some models (e.g., Puong [17], Douglas [18]) reached statistical significance, error metrics were notably higher (e.g., MAPE > 50%, RMSE up to 32.55), highlighting the challenges in predicting total dwell time.

During the morning peak upwards (AMP Up) scenario, partial models continued to exhibit moderate correlations (r² = 0.3628 to 0.5745), all of which were statistically significant. The best performance was seen in the Puong [17] model (r² = 0.5745, RMSE = 4.62, MAPE = 24.69), suggesting it was relatively accurate and stable under peak conditions. Conversely, the full dwell time models for AMP Up performed poorly, with r² values not exceeding 0.0734 and none reaching statistical significance. These models also showed high error rates (e.g., RMSE > 22, MAPE > 65%), underscoring the limitations of total dwell time predictions during high-demand periods.

The inter-peak upward (INP Up) and downward (INP Down) scenarios presented more varied results. For INP Up, partial models showed weak to moderate correlations (r² = 0.1013 to 0.2302), with statistical significance present in most models except Weston [14] and Lam et al. [16]. The error values in this group were relatively low, especially in Puong [17], which produced an RMSE of 3.17 and an MAE of 2.29. Complete models, on the other hand, remained weak across all INP Up studies, with no significant correlations and high MAPE values (often exceeding 68%). The INP Down scenario was more promising for the partial models, with Douglas [18] reaching an r² of 0.6703, and most models achieving significance and good fit metrics. However, complete models continued to perform poorly (r² ≤ 0.1470) with none achieving statistical significance and all displaying high prediction error.

In the evening peak downwards (PMP Down) scenario, partial models showed excellent performance. Notably, Puong [17] achieved a correlation of r² = 0.8553 (p < 0.001), while Douglas [18] also performed well (r² = 0.8124). These models maintained relatively low RMSE and MAE values, confirming their practical applicability during peak evening operations. Although full models showed improved performance in this scenario compared to others, their r² values (ranging from 0.0940 to 0.4778) remained generally lower. Only Puong’s [17] dwell time model approached acceptable error margins (MAPE = 17.47, RMSE = 10.23), indicating some potential for model refinement.

Finally, in the evening off-peak downwards (POP Down) scenario, partial models again outperformed their complete counterparts, though with somewhat more variability. Lam et al. [16] and Puong [17] reported strong correlations (r² = 0.6127 and 0.6081, respectively) with statistical significance and moderate error margins. Nonetheless, MAPE for some models exceeded 70%, suggesting declining accuracy during lower-demand periods. The full models in this scenario failed to produce any significant correlations, with r² values not exceeding 0.18 and MAPE consistently exceeding 50%, indicating unreliable fit and poor generalizability.

This analysis highlighted the varying degrees of success in modelling passenger flow and total dwell times across different time (and direction) scenarios (as shown in Figure 20). While the passenger flow component of the dwell time models tended to predict passenger flow with reasonable accuracy, particularly under specific peak conditions, the prediction of total dwell times proved to be more challenging, indicating areas for future improvement in transportation modelling.

5. Discussion

5.1. Performance

The consistently higher r² values for passenger flow time only, compared to total dwell time, across studies suggest that while passenger flow can be predicted with a moderate degree of accuracy, total dwell time is influenced by additional factors not captured by passenger flow models alone. This discrepancy may be attributed to variables such as door operations, operator staff procedures, and other inefficiencies not directly related to the number of passengers.

For instance, in scenarios with high passenger flow, one might expect total dwell time to increase due to the longer time required for passengers to board and alight. However, the lower correlation with total dwell time suggests that the number of passengers is not the sole determinant of dwell time, particularly for a regional railway; operational factors play a significant role as well.

One additional point to be conscious of is that weak r² values in this context should not be completely disregarded, as they can still provide meaningful insights into the study. In the field of social sciences, correlations of 0.20 to 0.30 are often deemed meaningful due to the complexity of human behaviour [49,50,51]. Therefore, even weak correlations have been identified and discussed to provide a comprehensive view of the data, identifying trends that contribute to understanding passengers’ boarding or alighting behaviours. This approach ensures no significant patterns or insights are overlooked.

5.2. Model Architecture

Given the higher performance of the models by Puong [17] and Douglas [18] when applied to a regional railway, it is worth discussing why the authors believe this to be the case. One pathway is to examine where these models were developed and the key variables within them.

It is worth noting that the highest performing models in this study, Puong [17] and Douglas [18], are similar in nature. They both account for the same key variables: the number of alighting and boarding passengers per door, and the number of standing passengers per door. One difference between the models is that Douglas [18] uses a power function of 0.7 on the number of boarders and alighters (similar to Weston [14]), as opposed to Puong’s [17] linear function for this component, and adopts a linear function for standing through-passengers that multiplies standing passengers by the combined total of boarding plus alighting passengers, as opposed to Puong’s [17] cubic function. It is also worth noting that the ‘standing through passengers’ component of all models tested was effectively cancelled out, as it was assumed to be zero: patronage data showed that the number of people on board at departure from the two stations from which the data came was less than the number of seats provided, and it was assumed that all passengers would have gravitated to a free seat.

Another difference between the respective models of Puong [17] and Douglas [18] is the magnitude of the flow rate factors used for the estimation of the time required for boarding and alighting. Puong [17] found in their dataset that boarders were slower than alighters, which explained the flow rate factor of 2.27 and 1.82 applied, respectively. Douglas [18], on the other hand, found in their dataset that alighters were slower than boarders, which explained the flow rate factor of 1.9 and 1.4, respectively, which is contrary to Puong’s [17] findings. It is also important to note that both these models were developed based on peak-period datasets.
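To make the structural difference concrete, the sketch below writes both passenger flow (‘partial’) components as functions, using only the coefficients quoted in the text and setting the standing-passenger term to zero, as was effectively the case in this study. The authoritative formulations are those in Table 2; applying the 0.7 exponent to boarders and alighters separately is an assumption drawn from the prose, and the function names are hypothetical.

```python
def puong_partial(boarders_per_door, alighters_per_door):
    """Linear per-door flow time (s): 2.27 s per boarder, 1.82 s per alighter.

    The cubic standing-passenger term is omitted because standing
    passengers are assumed to be zero.
    """
    return 2.27 * boarders_per_door + 1.82 * alighters_per_door

def douglas_partial(boarders_per_door, alighters_per_door, exponent=0.7):
    """Non-linear per-door flow time (s) with factors 1.4 (boarders) and 1.9 (alighters).

    Assumption: the exponent applies to each passenger count separately.
    The standing-passenger term is omitted (assumed zero).
    """
    return (1.4 * boarders_per_door ** exponent
            + 1.9 * alighters_per_door ** exponent)

def complete(partial_time, fixed_constant):
    """Complete model: partial flow time plus the model's fixed constant (s)."""
    return partial_time + fixed_constant

# Hypothetical door with 10 boarders and no alighters.
puong_flow = puong_partial(10, 0)
puong_dwell = complete(puong_flow, 12.22)              # Puong fixed constant
douglas_dwell = complete(douglas_partial(10, 0), 10.0)  # Douglas fixed constant
```

Under these assumptions the linear form grows in proportion to the passenger count, while the power form grows less than proportionally, so the two models diverge as per-door loads increase.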

As this study found that the average passenger flow rates in the AMP Up and PMP Down were 1.99 and 1.42 s per passenger, respectively, it could be suggested that peak-period passenger behaviour aligned more closely with the dataset of Puong [17], where boarders were slower than alighters. Interestingly, however, the flow rates found in this study were essentially the inverse of those of Douglas [18]: the boarding rate found in this study was very close to the alighting rate found by Douglas [18], and vice versa. Potential reasons for this could lie in the rolling stock type used in each of the studies. Puong [17] collected data from rolling stock with single-stream doors, whereas Douglas [18] collected data from the Millennium train sets, where each carriage had large double-stream doors. The VLocity train, from which the data was collected in this study, has single-stream doors and therefore aligns most closely with the train type used by Puong [17]. This can explain why the passenger behaviour observed in the PMP Down was more similar to that of Puong [17] than to that of Douglas [18], which in turn can explain the slightly better passenger flow r² fit of the Puong [17] model (0.8553) compared with that of the Douglas [18] model (0.8124).

The condition that drew the poorest performance from the models by Puong [17] and Douglas [18] was the INP Up, where both models could only achieve r² values of 0.2182 and 0.2302, respectively. The possible explanation for such a poor fit for this particular condition was that the INP Up saw the slowest average flow rate of all (by at least a full second on average), with passengers boarding at an average rate of 3.63 s per passenger. This can indicate that passengers who travel between the peaks on a regional railway can be less confident about the experience and require more time to board (i.e., tourist, occasional traveller). This slower flow rate can therefore have a significant impact on dwell time estimations using known statistical models, as none of the models have been developed using ‘off peak’ data that captures ‘less confident’ travellers and the slower flow rates they may require.

5.3. Implications for Dwell Time Modelling

The prolonged and variable conductor times (C4) observed in this study are likely a direct outcome of the procedural and coordination requirements unique to regional rail operations. Unlike metropolitan systems, regional lines often involve single-track sections, flat junctions, and shared corridors with freight and suburban services, which necessitate additional crew communication, safety checks, and dispatch confirmation before train departure. These operational complexities, sometimes compounded by “blinded sections” or restricted sightlines, contribute to greater variability in conductor procedure time. This finding reinforces the need for dwell time models that explicitly account for regional operational characteristics, rather than relying solely on urban-based formulations.

The implications of the results of this study draw particular attention to three aspects of dwell time modelling for regional railways. The first is that the passenger flow time component of models from previous studies can be appropriately applied to regional railways as well. The second is the need to gain a better understanding of the lost time due to operational factors, and the third is how differing passenger flow rates and peak door percentages can affect the performance of dwell time models.

5.3.1. Passenger Flow Compatibility

The high correlations between observed and modelled passenger (PAX) flow using models from previous studies indicate that these models are robust and can also be effectively applied to a regional railway. This suggests that the assumptions used to develop the passenger flow component of models in previous research, such as those by Wirasinghe and Szplett [13] and Douglas [18], also hold for a regional railway. This transferability is beneficial for a regional rail operator, as it suggests that existing models could be used to forecast passenger flow time and its consequences.

5.3.2. Time Lost to Operation Factors

The weaker correlations for total dwell time predictions across the studies highlight the need for a deeper understanding of the lost time due to operational factors. Factors such as door operations (i.e., how long it takes the doors to open and close) are generally fixed for each train type, and this lost time can be appropriately assumed as a fixed constant in models. In contrast, factors such as operational staff procedures that must be completed for both train and platform can also significantly impact the time lost to operational factors, and these do vary, as seen in the observed results in Table 3. This variability needs to be better understood to inform dwell time modelling; from a rail planning perspective, rail operators who use the available dwell time models may need to account for it quantitatively based on local network procedures.

5.3.3. Flow Rates and Peak Door Percentage

Based on the findings of this study, passenger flow rates in the peak period and direction (see Figure 3, AMP U, PMP D) were found to be the fastest compared to the off-peak findings. The ‘peak door percentage of total passengers per service’ was also observed to be the lowest in the peak period and direction for both 3-car (see Figure 4, AMP U, PMP D) and 6-car (see Figure 5, AMP U, PMP D) VLocity trains, indicating that passengers in the peak periods were more inclined to ‘spread out’ when boarding or alighting.

These findings can be explained by the rationale that, during peak periods, passengers are typically commuters who are likely to be more confident in the station/train environment and to exhibit more efficient boarding and alighting behaviours than off-peak passengers. In the off-peak periods, a higher proportion of passengers can generally be expected to be less confident and efficient in the station/train environment [52], as off-peak travel is more typically made up of family/group travel, occasional travellers, tourists, or the elderly (i.e., fewer commuters) [53]. This aligns with the findings of slower passenger flow rates and a higher percentage of passengers boarding through the peak door (e.g., a family or a group of four friends would nearly always board through the same door to stay together rather than each finding a separate door).

It can, therefore, be suggested that passenger behaviour and potentially, passenger demographics (i.e., commuter, occasional traveller, elderly) can vary significantly across different time periods on a regional railway, which does impact the performance of dwell time models (as seen in Figure 20) if the variations in both flow rates and peak door use are not considered together. Using one of the tested dwell-time models to estimate dwell times for the entire day for a regional railway can have varying results (as seen in Figure 20). Timetable planners should consider using different dwell time values for peak vs. off-peak services to account for these behavioural differences.
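The interaction between the two factors can be illustrated with a back-of-envelope estimate. The sketch below is not one of the tested models: it assumes that passenger flow time is governed by the busiest door and approximates it as total passengers multiplied by the peak door share and the seconds per passenger. The inputs are the 3-car peak door means and condition-level mean flow rates reported in Section 4.1 (AMP Up: 31% and 1.83 s per passenger; INP Up: 42% and 2.68 s per passenger), while the load of 20 passengers and the function name are hypothetical.

```python
def peak_door_flow_time(total_passengers, peak_door_share, seconds_per_passenger):
    """Approximate flow time (s) at the busiest door.

    Assumption: the busiest door determines when passenger flow ends.
    """
    return total_passengers * peak_door_share * seconds_per_passenger

# Same hypothetical load of 20 passengers in two conditions.
amp_up = peak_door_flow_time(20, 0.31, 1.83)  # morning peak, Up
inp_up = peak_door_flow_time(20, 0.42, 2.68)  # inter-peak, Up
ratio = inp_up / amp_up
```

Under these assumptions the same 20 passengers need roughly twice as long in the inter-peak (about 22.5 s against about 11.3 s), because a slower flow rate and a more concentrated door choice compound each other.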

5.4. Policy Implications

The findings have several important policy and operational implications for regional railway management. First, the strong and variable influence of C4 time (conductor procedures) suggests a need for standardised staff training and operational protocols across stations to reduce inconsistency in departure processes. Establishing clearer procedural guidelines and monitoring compliance could help minimise avoidable dwell time extensions.

Second, the results support the development of evidence-based timetable policies that incorporate more realistic dwell time allowances reflecting regional operational complexity, including mixed-traffic conditions, single-track operations, and manual dispatch procedures. Adjusting dwell buffers to match observed variability can enhance service punctuality and minimise cascading delays.

Third, policymakers should consider targeted infrastructure and technology investments, such as door-automation improvements, real-time dwell monitoring systems, or digital dispatch aids, to reduce dependence on manual coordination. These measures can strengthen both efficiency and safety without requiring large-scale infrastructure expansion.

Finally, the methodology used in this study demonstrates the value of micro-level video analysis as a policy tool for evaluating on-ground operational performance. Expanding this data-driven approach to more stations could enable continuous dwell time auditing and support more adaptive, regionally tailored railway policies.

5.5. Limitations

This study has several limitations that should be acknowledged. Firstly, the geographic and operational scope was confined to two stations, Cobblebank and Rockbank, within the Victorian regional railway network. These stations were selected based on the quality and suitability of available CCTV footage; however, they may not capture the broader diversity of infrastructure, operational conditions, or passenger demographics that exists across other regional stations. As such, the results may not be entirely generalisable to other regional railways. Although this study focused on two representative intermediate stations along the Ballarat line, the findings provide an important benchmark for understanding dwell dynamics in regional contexts. By systematically decomposing 398 train services into passenger flow and operational components, the study offers transferable methodological insights for other regional rail networks. The results emphasise that model reconstruction and system-wide extrapolation should account for regional heterogeneity in track layout, crew procedures, and scheduling policies.

The relatively low explanatory power of the complete (total dwell time) models likely reflects unobserved influences from infrastructure and procedural variables not captured in the dataset, such as the speed of door opening, the length of platform announcements, and crew coordination during departure authorisation. While these variables could further explain the variability within the C4 component, they were beyond the scope of this study due to data and access limitations. Future research should seek to disaggregate C4 into regulatory (“mandatory”) and behavioural (“discretionary”) subcomponents and test their effects using multi-level or random-effects models when detailed operational data are available. These findings thus provide an empirical basis for future research aimed at tailoring statistical or hybrid models to regional operating environments.

Secondly, the data covered only two observation periods (February and September 2022), spanning approximately ten days in total. Although these timeframes were chosen to avoid abnormal patterns in patronage (e.g., public holidays or major disruptions), they did not account for potential seasonal fluctuations or other temporal variations in passenger behaviour. This constraint may have limited the comprehensiveness of the findings from a ‘model responsiveness’ perspective; however, this study is intended primarily to inform high-level timetable planning, where it is important to establish the ‘normal’ expected dwell time at a station.

Furthermore, counter-peak services were excluded due to their low service frequencies and patronage, which meant that this study did not explore model performance under conditions that were more conducive to minimal boarding or alighting. Likewise, the study did not consider scenarios involving high service disruption, crowd surges, or emergency operational conditions, which could be relevant for resilience planning in regional rail networks.

The study also focused exclusively on established statistical dwell-time models and did not evaluate simulation-based or machine learning approaches. While this was appropriate given the study’s objectives, it precluded exploration of more computational or adaptive modelling methods that may be better suited to capturing the potentially complex, non-linear nature of regional rail operations.

Given these limitations, the authors reiterate that this study is intended as a foundational piece: it uses empirical evidence to highlight the limited ability of the tested models to predict dwell times accurately at two regional stations. This study does not claim that the empirical findings presented here can be generalised to all regional railway stations, as the data scope was intentionally bounded to ensure data quality and control for local operational variability. Rather than representing a limitation, this focused design provides a robust empirical foundation for understanding model performance in a representative regional context. The findings should be interpreted as an evidence-based prompt for further exploration of dwell-time modelling in regional rail systems, where empirical studies remain limited. Building on this foundation, future research should broaden the spatial and temporal scope by incorporating more stations, larger sample sizes, and varying operational conditions, as well as by developing and testing new or hybrid models specifically calibrated for regional railway environments.

6. Conclusions

This study rigorously tested the performance of established statistical dwell time models against empirical data from a regional railway system, offering novel insights into the accuracy and reliability of these models in non-urban contexts. The findings highlighted several critical aspects of dwell time modelling for regional railways.

Firstly, the strong correlations between observed and modelled passenger flow times indicated that existing models, particularly those developed by Puong [17] and Douglas [18], were robust and could be effectively applied to regional railways. This transferability suggested that regional rail operators could utilise these models to forecast passenger flow times reliably.

Secondly, the weaker correlations for total dwell time predictions emphasised the need for a deeper understanding of the operational factors that influence dwell time. Although some models reached statistical significance, error metrics were notably higher, highlighting the challenges of predicting total dwell time. Variability in door operation times and staff procedures significantly affected dwell time but was not adequately captured in existing models. This variability underscored the need for regional rail operators to account for these factors quantitatively, based on local network procedures.

Thirdly, the study found that passenger flow rates and the percentage of passengers using peak doors varied significantly between peak and off-peak periods. During peak periods, passengers were typically more efficient in boarding and alighting, whereas off-peak periods saw slower flow rates due to potentially less confident travellers, such as tourists and occasional passengers. These behavioural differences affected the performance of dwell-time models and needed to be accounted for to ensure accurate predictions across different time periods.

Policymakers and rail operators should address several critical implications highlighted in this study. For instance, incorporating local operational variables, such as door operation times and staff procedures, into predictive models is essential, as current dwell time models fail to adequately account for the impact of region-specific factors. Policy measures should prioritise integrating these variables to enhance the accuracy and reliability of predictions. Additionally, the variability in passenger behaviour between peak and off-peak periods necessitates differentiated operational strategies. Developing dynamic dwell-time models that consider these temporal variations would enable more efficient resource allocation and scheduling. Furthermore, refining dwell time predictions requires comprehensive data collection systems that capture detailed operational and passenger flow variables. Policymakers should invest in advanced monitoring technologies and data analytics to continuously update and improve these models, ensuring robust and adaptive solutions to evolving operational challenges. Since staff procedures significantly affect dwell times, policies should advocate for regular training and standardised protocols tailored to the regional context. This approach can streamline operations and reduce variability in dwell times. Finally, recognising differences in passenger demographics and behaviours across periods, policies should promote passenger-centric solutions, such as targeted information campaigns and improved station design, to facilitate smoother boarding and alighting processes.

It is important to note that this study was based on two representative intermediate stations along a single corridor, chosen for data quality and operational uniformity. Accordingly, the external validity of the findings is limited to similar regional contexts. The insights gained, however, provide an essential benchmark for extending model calibration to other station types and operational configurations.

Overall, this research contributes to the enhancement of scheduling and operational planning for policymakers and the regional rail services they represent by identifying strengths and limitations in existing models and the potential for necessary refinements. It serves as a foundation for future research into dwell time modelling for regional railways. The findings reinforce the importance of context-aware modelling for sustainable railway planning. By revealing the limitations of urban-derived dwell time models and identifying region-specific operational influences, the research underscores the need for adaptive, data-informed approaches that enhance service reliability and energy efficiency. Extending such modelling efforts across broader regional networks can support more sustainable mobility outcomes by reducing delays, optimising resource use, and improving passenger experience, aligning with the global agenda for low-carbon and equitable transport systems. Future work should focus on integrating the variability of operational factors and passenger behaviours into dwell-time models to improve their predictive accuracy on regional railways. Specifically, future research should extend and refine this work in the following directions:

Develop correction factors or sub-models, including multi-level or random-effects formulations, that explicitly account for operational delay components and infrastructure or regulatory factors, particularly C4 time (conductor procedures), tailored to regional railway conditions.
Expand data collection to include a broader range of stations with diverse infrastructure, service frequency, and operational constraints, enabling broader model validation and generalisation.
Investigate hybrid modelling approaches that combine a statistical core (for passenger flow prediction) with rule-based or stochastic elements to better capture the operational variability inherent in regional railways.
These targeted extensions will help improve dwell time modelling accuracy, enhance operational planning, and support the development of more resilient and efficient regional rail systems.
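
One possible shape for such a hybrid model is sketched below: a deterministic passenger flow time (from any partial model in Table 2) is combined with fixed allowances for the remaining components and a randomly drawn C4 term. The gamma distribution, its parameters, and the fixed allowances are hypothetical placeholders chosen only to be of the same order as the values in Table A1; they are not fitted to the study data.

```python
import random

# Hypothetical fixed allowances (s) for the components other than C3 and C4.
FIXED_ALLOWANCE_S = {"C1": 3.6, "C2": 2.4, "C5": 6.0, "C6": 6.7}


def simulate_dwell(passenger_flow_s, rng, c4_mean_s=8.5, c4_shape=2.0):
    """One simulated dwell: fixed parts + modelled passenger flow + random C4."""
    # random.gammavariate(alpha, beta) has mean alpha * beta,
    # so the scale is set to mean / shape.
    c4 = rng.gammavariate(c4_shape, c4_mean_s / c4_shape)
    return sum(FIXED_ALLOWANCE_S.values()) + passenger_flow_s + c4


def dwell_summary(passenger_flow_s, runs=2000, seed=1):
    """Mean and 85th-percentile dwell over repeated simulated stops."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    samples = sorted(simulate_dwell(passenger_flow_s, rng) for _ in range(runs))
    p85 = samples[(85 * runs) // 100 - 1]
    return {"mean": sum(samples) / runs, "p85": p85}


print(dwell_summary(passenger_flow_s=10.0))
```

Reporting a high percentile alongside the mean is one way a planner could size a dwell allowance against the operational variability; the choice of the 85th percentile here is arbitrary.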

Author Contributions

Conceptualisation, K.N. and N.S.; methodology, K.N.; software, K.N.; validation, K.N. and N.S.; formal analysis, K.N.; investigation, K.N. and N.S.; resources, K.N.; data curation, K.N.; writing—original draft preparation, K.N.; writing—review and editing, N.S. and P.S.; visualisation, K.N.; supervision, N.S. and P.S.; project administration, K.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data will be made available by the first author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Summary of observed dwell time characteristics.

Band	Train Type	Direction	C1 (s)	C2 (s)	C3 (s)	C4 (s)	C5 (s)	C6 (s)	Observed Trips	Avg of Total Joins	Avg of Total Alights	Avg of Total Dwell (s)
AMP	3VL	U	3.8	2.3	12.2	9.7	6.0	6.9	19	19.8	0.4	40.8
AMP	6VL	U	3.6	2.4	15.0	5.3	6.0	5.6	51	53.6	0.7	37.9
INP	3VL	D	3.3	2.5	4.6	8.7	6.0	6.4	65	0.4	7.3	31.5
INP	6VL	D	3.4	2.5	7.1	10.4	6.0	8.2	23	1.9	15.4	37.6
INP	3VL	U	3.6	2.6	8.1	7.3	6.0	7.6	115	9.5	0.5	35.1
INP	6VL	U	3.4	2.5	8.1	7.5	6.0	8.0	29	16.5	0.9	35.4
PMP	3VL	D	3.9	2.1	15.6	7.7	6.0	7.2	9	0.3	36.4	42.4
PMP	6VL	D	3.9	2.3	14.2	11.0	6.0	6.7	59	0.6	46.5	43.9
POP	3VL	D	4.1	2.5	9.8	7.9	6.0	5.8	17	0.4	22.9	36.0
POP	6VL	D	3.5	2.0	6.5	16.0	6.0	6.5	2	0.5	9.0	40.5
POP	3VL	U	3.9	2.9	5.7	7.1	6.0	6.1	7	2.4	2.4	31.7
POP	6VL	U	2.5	2.0	0.5	9.0	6.0	4.5	2	0.5	0.0	24.5

Note: Colour legend: green (shortest amount of time/least no. of PAX) to red (longest amount of time/highest no. of PAX) colour scale with respect to each column.
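
A simple consistency check on Table A1 is sketched below. It assumes that the six components C1 to C6 are additive parts of the total dwell time, which the tabulated means appear to support to within rounding, and it reports the share of each row's dwell attributable to C4 (conductor procedures). The row labels and helper names are illustrative.

```python
# (label, [C1..C6] in seconds, average total dwell in seconds), copied from Table A1
ROWS = [
    ("AMP 3VL U", [3.8, 2.3, 12.2, 9.7, 6.0, 6.9], 40.8),
    ("AMP 6VL U", [3.6, 2.4, 15.0, 5.3, 6.0, 5.6], 37.9),
    ("INP 3VL D", [3.3, 2.5, 4.6, 8.7, 6.0, 6.4], 31.5),
    ("INP 6VL D", [3.4, 2.5, 7.1, 10.4, 6.0, 8.2], 37.6),
    ("INP 3VL U", [3.6, 2.6, 8.1, 7.3, 6.0, 7.6], 35.1),
    ("INP 6VL U", [3.4, 2.5, 8.1, 7.5, 6.0, 8.0], 35.4),
    ("PMP 3VL D", [3.9, 2.1, 15.6, 7.7, 6.0, 7.2], 42.4),
    ("PMP 6VL D", [3.9, 2.3, 14.2, 11.0, 6.0, 6.7], 43.9),
    ("POP 3VL D", [4.1, 2.5, 9.8, 7.9, 6.0, 5.8], 36.0),
    ("POP 6VL D", [3.5, 2.0, 6.5, 16.0, 6.0, 6.5], 40.5),
    ("POP 3VL U", [3.9, 2.9, 5.7, 7.1, 6.0, 6.1], 31.7),
    ("POP 6VL U", [2.5, 2.0, 0.5, 9.0, 6.0, 4.5], 24.5),
]


def component_gap(components, total):
    """Difference between the summed components and the reported total (s)."""
    return sum(components) - total


def c4_share(components, total):
    """Fraction of the reported total dwell taken up by C4 (index 3)."""
    return components[3] / total


for label, comps, total in ROWS:
    gap = component_gap(comps, total)
    print(f"{label}: gap {gap:+.1f} s, C4 share {c4_share(comps, total):.0%}")
```

In this check every row's components sum to within about 0.2 s of the reported total, and C4 accounts for between roughly 14% and 40% of the average dwell.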

References

1. Yang, J.; Shiwakoti, N.; Tay, R. Train dwell time models—Development in the past forty years. In Proceedings of the Australasian Transport Research Forum 2019 Proceedings, Canberra, Australia, 30 September–2 October 2019.
2. Crockett, J.; Mason, A.R.; Segal, J.; Whelan, G.A.; Condry, B. UK Regional Rail Demand in Britain. In Proceedings of the European Transport Conference, Glasgow, UK, 11–13 October 2010.
3. Urban Transport Group. The Transformational Benefits of Investing in Regional Rail. 2017. Available online: https://www.urbantransportgroup.org/system/files/general-docs/The%20Transformational%20Benefits%20of%20Investing%20in%20Regional%20Rail.pdf (accessed on 12 September 2025).
4. Transport for Victoria. Growing Our Rail Network (2018–2025). 2018. Available online: https://www.vic.gov.au/growing-our-rail-network-2018-2025 (accessed on 18 April 2025).
5. Victoria Auditor-General’s Office. Assessing Benefits from the Regional Rail Link Project. 2018. Available online: https://www.audit.vic.gov.au/report/assessing-benefits-regional-rail-link-project?section=32786 (accessed on 16 July 2025).
6. Transport for NSW. Data and Insights—Public Transport Trips. 2024. Available online: https://www.transport.nsw.gov.au/data-and-research/data-and-insights/public-transport-trips-all-modes (accessed on 19 June 2025).
7. Givoni, M.; Banister, D. Moving Towards Low Carbon Mobility; Edward Elgar: Cheltenham, UK, 2013.
8. González-Gil, A.; Palacin, R.; Batty, P.; Powell, J.P. A systems approach to reduce urban rail energy consumption. Energy Convers. Manag. 2014, 80, 509–524.
9. Ng, K.; Shiwakoti, N.; Stasinopoulos, P. Comprehensive examination of regional railway passenger behavior and dwell time components: Insights from video-based observations in Victoria, Australia. J. Rail Transp. Plan. Manag. 2024, 31, 100464.
10. Coulaud, R.; Keribin, C.; Stoltz, G. Modeling dwell time in a data-rich railway environment: With operations and passenger flows data. Transp. Res. Part C Emerg. Technol. 2023, 146, 103980.
11. Kuipers, R.A.; Palmqvist, C.W.; Olsson, N.O.; Winslott Hiselius, L. The passenger’s influence on dwell times at station platforms: A literature review. Transp. Rev. 2021, 41, 721–741.
12. Pang, Z.; Wang, L.; Wang, S.; Li, L.; Peng, Q. Dynamic train dwell time forecasting: A hybrid approach to address the influence of passenger flow fluctuations. Railw. Eng. Sci. 2023, 31, 351–369.
13. Wirasinghe, S.C.; Szplett, D. An investigation of passenger interchange and train standing time at LRT stations: (ii) Estimation of standing time. J. Adv. Transp. 1984, 18, 13–24.
14. Weston, J.G. Train service model—Technical guide. Lond. Undergr. Oper. Res. Note 1989, 89, 18.
15. Lin, T.-m.; Wilson, N.H. Dwell time relationships for light rail systems. Transp. Res. Rec. 1992, 1361, 287–295.
16. Lam, W.H.K.; Cheung, C.Y.; Poon, Y.F. A study of train dwelling time at the Hong Kong mass transit railway system. J. Adv. Transp. 1998, 32, 285–295.
17. Puong, A. Dwell Time Model and Analysis for the MBTA Red Line; Massachusetts Institute of Technology: Cambridge, MA, USA, 2000.
18. Douglas, N. Modelling Train & Passenger Capacity Report to Transport for NSW Modelling Train & Station Demand & Capacity Final Report for Transport for NSW—For Distribution. DOUGLAS Economics. 2012. Available online: https://www.researchgate.net/publication/316979236_Modelling_Train_Passenger_Capacity (accessed on 27 October 2025).
19. Harris, N.G. Train boarding and alighting rates at high passenger loads. J. Adv. Transp. 2006, 40, 249–263.
20. Harris, N.G.; Anderson, R. An international comparison of urban rail boarding and alighting rates. Proc. Inst. Mech. Eng. Part F-J. Rail Rapid Transit 2007, 221, 521–526.
21. Zhang, Q.; Han, B.; Li, D. Modeling and simulation of passenger alighting and boarding movement in Beijing metro stations. Transp. Res. Part C Emerg. Technol. 2008, 16, 635–649.
22. Jiang, Z.; Li, F.; Xu, R.-h.; Gao, P. A simulation model for estimating train and passenger delays in large-scale rail transit networks. J. Cent. South Univ. 2012, 19, 3603–3613.
23. Yamamura, A.; Inagi, T. Dwell Time Analysis in Urban Railway Lines using Multi Agent Simulation. In Proceedings of the World Conference on Transport Research (WCTR13), Rio de Janeiro, Brazil, 15–18 July 2013.
24. Jiang, Z.; Xie, C.; Ji, T.; Zou, X. Dwell Time Modelling and Optimized Simulations for crowded Rail Transit Lines Based on Train Capacity. PROMET—Traffic Transp. 2015, 27, 125–135.
25. Perkins, A.; Ryan, B.; Siebers, P.-O. Modelling and simulation of rail passengers to evaluate methods to reduce dwell times. In Proceedings of the 14th International Conference on Modeling and Applied Simulation, MAS 2015, Bergeggi, Italy, 21–23 September 2015.
26. Ahn, S.; Kim, J.; Bekti, A.; Cheng, L.-C.; Clark, E.; Robertson, M.; Salita, R. Real-time Information System for Spreading Rail Passengers across Train Carriages: Agent-based Simulation Study. In Proceedings of the Australasian Transport Research Forum 2016 Proceedings, Melbourne, Australia, 16–18 November 2016.
27. Alvarez, A.B.; Merchan, F.; Poyo, F.J.C.; George, R.J.C. A Fuzzy Logic-Based Approach for Estimation of Dwelling Times of Panama Metro Stations. Entropy 2015, 17, 2688–2705.
28. Chu, W.J.; Zhang, X.C.; Chen, J.H.; Xu, B. An ELM-Based Approach for Estimating Train Dwell Time in Urban Rail Traffic. Math. Probl. Eng. 2015, 2015, 473432.
29. Glatin, N.L.; Clarke, P. A Feasibility Study Towards the Conceptual Development of a Real-Time Digital Twin to Reduce Dwell Time Variations on the Thameslink Route (COF-DSP-06). 2021. Available online: https://www.rssb.co.uk/research-catalogue/CatalogueItem/COF-DSP-06 (accessed on 27 October 2025).
30. Padovano, A.; Longo, F.; Manca, L.; Grugni, R. Improving safety management in railway stations through a simulation-based digital twin approach. Comput. Ind. Eng. 2024, 187, 109839.
31. Bapaume, T.; Côme, E.; Ameli, M.; Roos, J.; Oukhellou, L. Forecasting passenger flows and headway at train level for a public transport line: Focus on atypical situations. Transp. Res. Part C Emerg. Technol. 2023, 153, 104195.
32. Pritchard, J.; Sadler, J.; Blainey, S.; Waldock, I.; Austin, J. Predicting and mitigating small fluctuations in station dwell times. J. Rail Transp. Plan. Manag. 2021, 18, 100249.
33. Albrecht, T.; Binder, A.; Gassel, C. Applications of real-time speed control in rail-bound public transportation systems. IET Intell. Transp. Syst. 2013, 7, 305–314.
34. Kecman, P.; Goverde, R.M. Predictive modelling of running and dwell times in railway traffic. Public Transp. 2015, 7, 295–319.
35. Liu, J.; Chen, L.; Roberts, C.; Nicholson, G.; Ai, B. Algorithm and peer-to-peer negotiation strategies for train dispatching problems in railway bottleneck sections. IET Intell. Transp. Syst. 2019, 13, 1717–1725.
36. Liu, J.; Chen, L.; Tian, Z.; Zhao, N.; Roberts, C. A Novel Multi-Agent-Based Approach for Train Rescheduling in Large-Scale Railway Networks. Appl. Sci. 2025, 15, 7996.
37. Liu, J.; Lin, Z.; Liu, R. A reinforcement learning approach to solving very-short term train rescheduling problem for a single-track rail corridor. J. Rail Transp. Plan. Manag. 2024, 32, 100483.
38. Yang, J.; Liu, J.; Chen, H.; Shi, J.; Zhang, Y. Safety-oriented OD-based time-dependent fare strategies for overcrowded metro lines. Comput. Ind. Eng. 2025, 209, 111424.
39. Kushwaha, D.; Kumar, A.; Harsha, S.P. Advancements and applications of digital twin in the railway industry: A literature review. Int. J. Rail Transp. 2024, 13, 1–26.
40. Ghaboura, S.; Ferdousi, R.; Laamarti, F.; Yang, C.; El Saddik, A. Digital twin for railway: A comprehensive survey. IEEE Access 2023, 11, 120237–120257.
41. Zhu, L.; Chen, C.; Wang, H.; Yu, F.R.; Tang, T. Machine learning in urban rail transit systems: A survey. IEEE Trans. Intell. Transp. Syst. 2023, 25, 2182–2207.
42. Oliveira, L.; Fox, C.; Birrell, S.; Cain, R. Analysing passengers’ behaviours when boarding trains to improve rail infrastructure and technology. Robot. Comput. Integr. Manuf. 2019, 57, 282–291.
43. Harris, N.; de Simone, F.; Condry, B. A Comprehensive Analysis of Passenger Alighting and Boarding Rates. Urban Rail Transit 2022, 8, 67–98.
44. Kuipers, R. Dwell Time Delays for Commuter Trains: An Analysis of the Influence of Passengers on Dwell Time Delays (No. 331). Ph.D. Thesis, Lund University, Lund, Sweden, 2024.
45. James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning: With Applications in R; Springer: New York, NY, USA, 2013; Volume 103.
46. Burnham, K.P.; Anderson, D.R. (Eds.) Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach; Springer: New York, NY, USA, 2002.
47. Mi, J.X.; Li, A.D.; Zhou, L.F. Review study of interpretation methods for future interpretable machine learning. IEEE Access 2020, 8, 191969–191985.
48. Nguyen, H.; Kieu, L.M.; Wen, T.; Cai, C. Deep learning methods in transportation domain: A review. IET Intell. Transp. Syst. 2018, 12, 998–1004.
49. Cohen, J. Statistical Power Analysis for the Behavioral Sciences, 2nd ed.; Routledge: Oxfordshire, UK, 1988.
50. Hemphill, J.F. Interpreting the Magnitudes of Correlation Coefficients. Am. Psychol. 2003, 58, 78–79.
51. Richard, F.D.; Bond, C.F.; Stokes-Zoota, J.J. One Hundred Years of Social Psychology Quantitatively Described. Rev. Gen. Psychol. 2003, 7, 331–363.
52. Paramita, P. Modelling Commuters’ Mode Choice: Integrating Travel Behaviour, Stated Preferences, Perception, and Socio-Economic Profile. Ph.D. Thesis, Queensland University of Technology, Brisbane, Australia, 2018.
53. Du, B. Estimating travellers’ trip purposes using public transport data and land use information. In Proceedings of the Tenth Triennial Symposium on Transportation Analysis (TRISTAN X), Hamilton Island, Australia, 17–21 June 2019.

Figure 1. Illustrative layout of Victoria’s regional passenger services (adapted from [9]).

Figure 2. Station layouts for (a) Cobblebank and (b) Rockbank.

Figure 3. Passenger flow rates per condition.

Figure 4. Percentage of passengers choosing to board or alight through the ‘peak’ door per service—3-car VLocity (6 available doors).

Figure 5. Percentage of passengers choosing to board or alight through the ‘peak’ door per service—6-car VLocity (12 available doors).

Figure 6. Time required for passenger flow only—observed vs. modelled—all investigated VLocity services [13,14,16,17,18].

Figure 7. Time required for passenger flow only—observed vs. modelled—AMP Up VLocity services [13,14,16,17,18].

Figure 8. Time required for passenger flow only—observed vs. modelled—INP Up VLocity services [13,14,16,17,18].

Figure 9. Time required for passenger flow only—observed vs. modelled—INP Down VLocity services [13,14,16,17,18].

Figure 10. Time required for passenger flow only—observed vs. modelled—PMP Down VLocity services [13,14,16,17,18].

Figure 11. Time required for passenger flow only—observed vs. modelled—POP Up VLocity services [13,14,16,17,18].

Figure 12. Time required for passenger flow only—observed vs. modelled—POP Down VLocity services [13,14,16,17,18].

Figure 13. Time required for total dwell time—observed vs. modelled—all investigated VLocity services [13,14,16,17,18].

Figure 14. Time required for total dwell time—observed vs. modelled—AMP Up VLocity services [13,14,16,17,18].

Figure 15. Time required for total dwell time—observed vs. modelled—INP Up VLocity services [13,14,16,17,18].

Figure 16. Time required for total dwell time—observed vs. modelled—INP Down VLocity services [13,14,16,17,18].

Figure 17. Time required for total dwell time—observed vs. modelled—PMP Down VLocity services [13,14,16,17,18].

Figure 18. Time required for total dwell time—observed vs. modelled—POP Up VLocity services [13,14,16,17,18].

Figure 19. Time required for total dwell time—observed vs. modelled—POP Down VLocity services [13,14,16,17,18].

Figure 20. Goodness of fit comparison between partial (passenger flow only) versus complete (total dwell time) models [13,14,16,17,18].

Table 1. Data selection and sample sizes.

Year	Date Range	Station	Train Type	Platform	Time Period	Services Investigated
2022	07/02–09/02, 05/09–10/09	Cobblebank	6VL	1	AMP	18
				1	INP	13
				2	INP	14
				2	PMP	25
				1	POP	1
				2	POP	1
			3VL	1	AMP	12
				1	INP	66
				2	INP	40
				2	PMP	4
				1	POP	4
				2	POP	9
		Rockbank	6VL	1	AMP	33
				1	INP	16
				2	INP	9
				2	PMP	34
				1	POP	1
				2	POP	1
			3VL	1	AMP	7
				1	INP	49
				2	INP	25
				2	PMP	5
				1	POP	3
				2	POP	8
Total						398

Table 2. The complete and partial models used for validation.

Model	Complete Model (Total Dwell Time)	Partial Model (Passenger Flow Time Only, Pf)	Notation
Wirasinghe and Szplett [13] (Dominant alighting)	$t = 2 + 1.0 a + 2.4 b$	$Pf = 1.0 a + 2.4 b$	t = dwell time; b = passengers boarding; a = passengers alighting
Wirasinghe and Szplett [13] (Mixed flow)	$t = 2 + 0.4 a + 1.4 b$	$Pf = 0.4 a + 1.4 b$
Wirasinghe and Szplett [13] (Dominant boarding)	$t = 2 + 1.4 a + 1.4 b$	$Pf = 1.4 a + 1.4 b$
Weston [14]	$SS = 15 + 1.4 \left[1 + \frac{F}{35} \left(\frac{T - S}{D}\right)\right] \left[\left(\frac{FB}{D}\right)^{0.7} + \left(\frac{FA}{D}\right)^{0.7} + 0.027 \left(\frac{FB}{D}\right) \left(\frac{FA}{D}\right)\right]$	$Pf = 1.4 \left[1 + \frac{F}{35} \left(\frac{T - S}{D}\right)\right] \left[\left(\frac{FB}{D}\right)^{0.7} + \left(\frac{FA}{D}\right)^{0.7} + 0.027 \left(\frac{FB}{D}\right) \left(\frac{FA}{D}\right)\right]$	SS = station stop time (in seconds); 15 = function time (in seconds): train stop to doors open plus time for doors to close and train to start moving; A = number of passengers alighting the train; B = number of passengers boarding the train; D = number of doors on train (double door width); F = peak door/average door factor; T = number of through passengers; S = number of seats on the train
Lam et al. [16]	$DT = 10.5 + 0.021 A + 0.016 B$	$Pf = 0.021 A + 0.016 B$	DT = train dwell time (in seconds); A = number of alighting passengers per train; B = number of boarding passengers per train
Puong [17]	$DT = 12.22 + 2.27 B_{d} + 1.82 A_{d} + 0.00062\, TS_{d}^{3} B_{d}$	$Pf = 2.27 B_{d} + 1.82 A_{d} + 0.00062\, TS_{d}^{3} B_{d}$	DT = dwell time (in seconds); B_d = number of passengers boarding per door (single-width door); A_d = number of passengers alighting per door (single-width door); TS_d = number of standing passengers per door
Douglas [18]	$DT = 10 + 1.9 A_{d}^{0.7} + 1.4 B_{d}^{0.7} + 0.007 (A_{d} + B_{d})\, Std_{d} + 0.005 A_{d} B_{d}$	$Pf = 1.9 A_{d}^{0.7} + 1.4 B_{d}^{0.7} + 0.007 (A_{d} + B_{d})\, Std_{d} + 0.005 A_{d} B_{d}$	DT = train dwell time (in seconds); A_d = number of alighting passengers per door; B_d = number of boarding passengers per door; Std_d = estimated number of standing through-passengers per door
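
To make the partial and complete forms in Table 2 concrete, the sketch below implements three of the models (Lam et al. [16], Puong [17], and Douglas [18]) as plain functions. It assumes passengers are spread evenly over the available doors when computing per-door values, which is a simplification and may differ from the per-door allocation used in the study; the `Stop` class and the function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Stop:
    boarding: float                 # total passengers boarding (B)
    alighting: float                # total passengers alighting (A)
    doors: int                      # doors available on the platform side
    standing_per_door: float = 0.0  # standing (through) passengers per door


def per_door(stop):
    """Assumption: boarding and alighting are split evenly across doors."""
    return stop.boarding / stop.doors, stop.alighting / stop.doors


def lam_pf(stop):
    return 0.021 * stop.alighting + 0.016 * stop.boarding


def puong_pf(stop):
    b_d, a_d = per_door(stop)
    return 2.27 * b_d + 1.82 * a_d + 0.00062 * stop.standing_per_door ** 3 * b_d


def douglas_pf(stop):
    b_d, a_d = per_door(stop)
    return (1.9 * a_d ** 0.7 + 1.4 * b_d ** 0.7
            + 0.007 * (a_d + b_d) * stop.standing_per_door
            + 0.005 * a_d * b_d)


# Partial model (passenger flow time, Pf) and the constant added by the
# corresponding complete model (total dwell time), as listed in Table 2.
MODELS = {
    "lam": (lam_pf, 10.5),
    "puong": (puong_pf, 12.22),
    "douglas": (douglas_pf, 10.0),
}


def partial(model, stop):
    return MODELS[model][0](stop)


def complete(model, stop):
    pf, constant = MODELS[model]
    return constant + pf(stop)


# Example: 12 boarding and 6 alighting on a 3-car set with 6 doors.
example = Stop(boarding=12, alighting=6, doors=6)
for name in MODELS:
    print(name, round(partial(name, example), 2), round(complete(name, example), 2))
```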

Table 3. Summary of observed C3 (board/alight), C4 (conductor time) and total dwell time.

Band	Train Type	Direction	C3 (s)	C4 (s)	Avg of Total Dwell (s)
AMP	3VL	U	12.2	9.7	40.8
AMP	6VL	U	15.0	5.3	37.9
INP	3VL	D	4.6	8.7	31.5
INP	6VL	D	7.1	10.4	37.6
INP	3VL	U	8.1	7.3	35.1
INP	6VL	U	8.1	7.5	35.4
PMP	3VL	D	15.6	7.7	42.4
PMP	6VL	D	14.2	11.0	43.9
POP	3VL	D	9.8	7.9	36.0
POP	6VL	D	6.5	16.0	40.5
POP	3VL	U	5.7	7.1	31.7
POP	6VL	U	0.5	9.0	24.5

Note: Colour legend: green (shortest amount of time/least no. of PAX) to red (longest amount of time/highest no. of PAX) colour scale with respect to each column.

Table 4. Goodness of fit and error metrics between Observed versus Modelled (Partial Model, Passenger Flow Only).

Correlation Between Observed vs. Modelled (PAX Flow Only)		Wirasinghe & Szplett [13] (Dom. Alight)	Wirasinghe & Szplett [13] (Alight & Board)	Wirasinghe & Szplett [13] (Dom. Board)	Weston [14]	Lam, Cheung, and Poon [16]	Puong [17]	Douglas [18]
	Metrics	PAX Flow Only	PAX Flow Only	PAX Flow Only	PAX Flow Only	PAX Flow Only	PAX Flow Only	PAX Flow Only
All	n	398	398	398	398	398	398	398
	r²	0.4430	0.3606	0.5785	0.3037	0.5360	0.5844	0.6473
	t	10.4485	8.1741	14.9928	6.7400	13.4235	15.2282	17.9567
	p	<0.001	<0.001	<0.001	<0.001	<0.001	<0.001	<0.001
	MAPE	N/A	N/A	N/A	N/A	N/A	N/A	N/A
	RMSE	5.49	7.94	6.11	9.32	10.71	6.06	6.16
	MAE	3.83	6.18	4.60	7.71	9.01	4.14	4.61
AMP Up	n	70	70	70	70	70	70	70
	r²	0.5136	0.5097	0.5055	0.4098	0.3628	0.5745	0.5633
	t	4.9358	4.8852	4.8317	3.7048	3.2105	5.7883	5.6225
	p	<0.001	<0.001	<0.001	<0.05	<0.05	<0.001	<0.001
	MAPE	22.69	52.12	48.62	79.32	94.85	24.69	48.58
	RMSE	5.54	9.32	8.88	12.64	14.91	4.62	9.15
	MAE	3.71	7.82	7.34	11.35	13.51	3.48	7.54
INP Up	n	144	144	144	144	144	144	144
	r²	0.1976	0.1915	0.1875	0.1013	0.1527	0.2182	0.2302
	t	2.4020	2.3250	2.2752	1.2139	1.8407	2.6641	2.8193
	p	<0.05	<0.05	<0.05	>0.05	>0.05	<0.05	<0.05
	MAPE	N/A	N/A	N/A	N/A	N/A	N/A	N/A
	RMSE	3.31	5.21	4.80	7.13	8.42	3.17	4.73
	MAE	2.42	4.20	3.81	6.21	7.44	2.29	3.69
INP Down	n	88	88	88	88	88	88	88
	r²	0.4568	0.3920	0.5629	0.3072	0.5991	0.6163	0.6703
	t	4.7616	3.9512	6.3152	2.9939	6.9388	7.2574	8.3756
	p	<0.001	<0.05	<0.001	<0.05	<0.001	<0.001	<0.001
	MAPE	N/A	N/A	N/A	N/A	N/A	N/A	N/A
	RMSE	3.50	4.90	3.20	5.27	6.18	3.85	3.15
	MAE	2.47	3.78	2.16	4.22	5.03	2.80	2.16
PMP Down	n	68	68	68	68	68	68	68
	r²	0.6011	0.5043	0.6970	0.2099	0.6432	0.8553	0.8124
	t	6.1111	4.7450	7.8957	1.7438	6.8236	13.4130	11.3181
	p	<0.001	<0.001	<0.001	>0.05	<0.001	<0.001	<0.001
	MAPE	59.11	82.32	48.62	87.19	93.08	74.25	46.80
	RMSE	9.82	13.07	8.30	13.53	14.55	11.80	8.22
	MAE	8.74	11.93	7.29	12.40	13.38	10.44	7.12
POP Up	n	9	9	9	9	9	9	9
	r²	0.7297	0.7060	0.6895	0.4412	0.5215	0.6900	0.7954
	t	2.8234	2.6376	2.5185	1.3008	1.6169	2.5221	3.4719
	p	<0.05	<0.05	<0.05	>0.05	>0.05	<0.05	<0.05
	MAPE	N/A	N/A	N/A	N/A	N/A	N/A	N/A
	RMSE	1.58	2.54	1.89	5.16	5.27	1.58	2.49
	MAE	1.05	2.02	1.57	4.39	4.49	1.32	1.97
POP Down	n	19	19	19	19	19	19	19
	r²	0.4069	0.2699	0.5742	0.5676	0.6127	0.6081	0.5255
	t	1.8364	1.1558	2.8914	2.8423	3.1969	3.1585	2.5468
	p	>0.05	>0.05	<0.05	<0.05	<0.05	<0.05	<0.05
	MAPE	49.94	77.95	38.92	68.16	95.21	59.56	42.48
	RMSE	5.41	7.95	4.26	6.65	9.47	6.45	4.84
	MAE	4.88	7.48	3.77	6.33	9.02	5.49	4.28

Note: r² colour legend: Red = weak [0.0, 0.3), Orange = moderate [0.3, 0.6), Green = strong [0.6, 1.0]. p-value colour legend: Green = <0.05 or <0.001, Red = ≥0.05. A MAPE of N/A means the metric is not applicable: at least one recorded passenger flow time in that group is 0 (a train service stopped and no passengers boarded or alighted), so the percentage error is undefined.
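The N/A entries for MAPE follow directly from how the metric is defined: each error is divided by the observed value, so a single observed flow time of 0 makes it undefined. The sketch below illustrates this with plain Python. The function name `fit_metrics` and the sample values are hypothetical, and treating r² as the squared Pearson correlation is an assumption, since the table only labels it as a correlation between observed and modelled values.

```python
import math


def fit_metrics(observed, modelled):
    """Return r2, RMSE, MAE and MAPE for paired observed/modelled times (s).

    Assumption: r2 is the squared Pearson correlation between the series.
    MAPE is returned as None when any observed value is zero.
    """
    if len(observed) != len(modelled) or len(observed) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(observed)
    errors = [m - o for o, m in zip(observed, modelled)]

    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n

    # MAPE divides by each observed value, so it is undefined if any is zero.
    if any(o == 0 for o in observed):
        mape = None
    else:
        mape = 100.0 * sum(abs(e) / abs(o) for e, o in zip(errors, observed)) / n

    # Squared Pearson correlation; undefined if either series is constant.
    mean_o = sum(observed) / n
    mean_m = sum(modelled) / n
    cov = sum((o - mean_o) * (m - mean_m) for o, m in zip(observed, modelled))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_m = sum((m - mean_m) ** 2 for m in modelled)
    r2 = (cov * cov) / (var_o * var_m) if var_o > 0 and var_m > 0 else None

    return {"r2": r2, "rmse": rmse, "mae": mae, "mape": mape}


# Hypothetical flow times in seconds, not taken from the study data.
with_zero = fit_metrics([0, 10, 20], [1, 9, 22])
without_zero = fit_metrics([10, 20, 30], [12, 18, 33])
print(with_zero["mape"])     # None: one stop had no boarding or alighting
print(without_zero["mape"])  # a finite percentage
```

RMSE and MAE remain well defined in both cases, which is consistent with the table reporting them for every group even where MAPE is N/A.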

Table 5. Goodness of fit and error metrics for Observed versus Modelled (Complete Model: Total Dwell).

Correlation Between Observed vs. Modelled (Total Dwell)		Wirasinghe & Szplett [13] (Dom. Alight)	Wirasinghe & Szplett [13] (Alight & Board)	Wirasinghe & Szplett [13] (Dom. Board)	Weston [14]	Lam, Cheung, and Poon [16]	Puong [17]	Douglas [18]
	Metrics	Total Dwell Time	Total Dwell Time	Total Dwell Time	Total Dwell Time	Total Dwell Time	Total Dwell Time	Total Dwell Time
All	n	398	398	398	398	398	398	398
	r²	0.0880	0.0597	0.1788	0.0443	0.1930	0.2452	0.2215
	t	1.8679	1.2646	3.8428	0.9382	4.1585	5.3470	4.8029
	p	>0.05	>0.05	<0.05	>0.05	<0.001	<0.001	<0.001
	MAPE	77.58	85.33	81.01	52.47	68.88	34.19	58.34
	RMSE	29.79	32.55	30.79	21.74	27.17	15.20	23.20
	MAE	28.50	31.36	29.70	19.91	25.71	12.57	21.78
AMP Up	n	70	70	70	70	70	70	70
	r²	0.0513	0.0511	0.0494	0.0510	0.0126	0.0734	0.0664
	t	0.4232	0.4222	0.4076	0.4212	0.1035	0.6068	0.5491
	p	>0.05	>0.05	>0.05	>0.05	>0.05	>0.05	>0.05
	MAPE	65.54	77.89	76.64	52.29	69.91	28.55	55.61
	RMSE	27.30	31.69	31.24	22.80	29.08	15.31	23.87
	MAE	25.58	30.30	29.83	20.84	27.49	11.60	22.03
INP Up	n	144	144	144	144	144	144	144
	r²	0.0334	0.0322	0.0323	0.1013	0.0216	0.0335	0.0329
	t	0.3954	0.3841	0.3850	1.2139	0.2575	0.3992	0.3927
	p	>0.05	>0.05	>0.05	>0.05	>0.05	>0.05	>0.05
	MAPE	76.20	83.86	82.55	50.99	68.02	42.57	58.41
	RMSE	27.87	30.38	29.95	20.11	25.44	17.43	22.27
	MAE	26.76	29.39	28.93	18.46	24.18	15.47	20.89
INP Down	n	88	88	88	88	88	88	88
	r²	0.0992	0.0914	0.1036	0.0155	0.1095	0.1183	0.1470
	t	0.9243	0.8514	0.9659	0.1438	1.0213	1.1050	1.3786
	p	>0.05	>0.05	>0.05	>0.05	>0.05	>0.05	>0.05
	MAPE	84.56	89.42	83.81	50.03	66.57	39.45	59.39
	RMSE	28.87	30.48	28.64	18.61	23.58	15.37	21.11
	MAE	27.95	29.58	27.71	17.00	22.33	13.21	19.88
PMP Down	n	68	68	68	68	68	68	68
	r²	0.3070	0.2567	0.3577	0.0940	0.3782	0.4778	0.4342
	t	2.6206	2.1576	3.1117	0.7668	3.3193	4.4188	3.9158
	p	<0.05	<0.05	<0.05	>0.05	<0.05	<0.001	<0.05
	MAPE	82.60	89.80	79.32	60.56	73.02	17.47	59.93
	RMSE	36.99	40.21	35.50	28.11	33.42	10.23	27.58
	MAE	36.14	39.32	34.68	26.80	32.28	7.67	26.51
POP Up	n	9	9	9	9	9	9	9
	r²	0.4968	0.4663	0.5265	0.6834	0.6413	0.3055	0.4315
	t	1.5145	1.3946	1.6385	2.4766	2.2112	0.8488	1.2656
	p	>0.05	>0.05	>0.05	<0.05	>0.05	>0.05	>0.05
	MAPE	78.07	84.93	81.94	48.53	64.11	43.26	57.27
	RMSE	23.53	25.73	24.79	15.56	20.03	13.66	17.77
	MAE	23.32	25.49	24.57	14.94	19.54	13.14	17.38
POP Down	n	19	19	19	19	19	19	19
	r²	0.0432	0.0139	0.1135	0.1767	0.0947	0.0000	0.0026
	t	0.1782	0.0574	0.4710	0.7401	0.3921	0.0001	0.0105
	p	>0.05	>0.05	>0.05	>0.05	>0.05	>0.05	>0.05
	MAPE	81.59	88.87	78.40	49.38	69.56	23.85	57.84
	RMSE	30.26	32.76	29.22	19.04	25.86	11.21	21.70
	MAE	29.88	32.48	28.77	18.33	25.52	9.04	21.28

Note: r² colour legend: Red = weak [0.0, 0.3), Orange = moderate [0.3, 0.6), Green = strong [0.6, 1.0]. p-value colour legend: Green = <0.05 or <0.001, Red = ≥0.05.
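Because the colour coding does not survive in plain text, the r² bands from the note can be applied programmatically. The following is a minimal sketch: the function name `r2_band` and the dictionary name are hypothetical, while the thresholds come from the legend and the r² values are the "All" row of Table 5.

```python
def r2_band(r2):
    """Map an r2 value to the legend bands: weak, moderate or strong."""
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("r2 must lie in [0, 1]")
    if r2 < 0.3:
        return "weak"      # [0.0, 0.3)
    if r2 < 0.6:
        return "moderate"  # [0.3, 0.6)
    return "strong"        # [0.6, 1.0]


# r2 values from the "All" row (n = 398) of Table 5, total dwell time.
table5_all_r2 = {
    "Wirasinghe & Szplett (Dom. Alight)": 0.0880,
    "Wirasinghe & Szplett (Alight & Board)": 0.0597,
    "Wirasinghe & Szplett (Dom. Board)": 0.1788,
    "Weston": 0.0443,
    "Lam, Cheung, and Poon": 0.1930,
    "Puong": 0.2452,
    "Douglas": 0.2215,
}

bands = {model: r2_band(value) for model, value in table5_all_r2.items()}
for model, band in bands.items():
    print(f"{model}: {band}")
```

Every model falls in the weak band for the pooled total dwell data, which matches the all-red r² row the legend describes for that group.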

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

