A Cox Model-Based Workflow for Increased Accuracy in Activity-Travel Patterns Generation

Katsaitis, Dionysios; Rizopoulos, Dimitrios; Gkiotsalitis, Konstantinos

doi:10.3390/app15116237



Department of Transportation Planning and Engineering, School of Civil Engineering, National Technical University of Athens, Zografou Campus, 9, Iroon Polytechniou Str., 15780 Athens, Greece

Corresponding author: Konstantinos Gkiotsalitis

Appl. Sci. 2025, 15(11), 6237; https://doi.org/10.3390/app15116237

Submission received: 6 April 2025 / Revised: 15 May 2025 / Accepted: 18 May 2025 / Published: 1 June 2025

(This article belongs to the Special Issue Intelligent Transportation System Technologies and Applications, 2nd Edition)


Abstract

Understanding how people spend time on daily activities is key to modeling travel behavior. However, accurately estimating the duration of these activities remains a significant challenge, especially when generating synthetic activity-travel data. This article introduces an activity-based approach that addresses this issue by applying statistical and machine learning models to improve the precision of activity duration estimates. The method utilizes real-world Origin-Destination (OD) datasets to generate additional synthetic data that can support transportation planning processes. Unlike conventional approaches that rely solely on OD matrices, this framework incorporates Cox and Cox-based hazard models to more precisely estimate activity durations, as well as arrival and departure times across trip segments. Statistical tests and comparative evaluations show that the proposed method produces more accurate synthetic data than existing open-source tools that do not employ hazard-based modeling. A case study using real-world data from Athens, Greece, demonstrates the effectiveness of the proposed approach.

Keywords:

activity-travel patterns; activity duration estimation; Cox model; hazard-based model; activity-based modeling

1. Introduction

Urban transportation systems face mounting pressure due to rapid urbanization and evolving mobility demands. In many modern cities, worsening traffic congestion underscores the urgent need for strategic interventions to improve transport networks and facilitate smoother daily commutes. Effective planning is essential to achieving these goals, as poorly designed interventions can result in inefficiencies, public dissatisfaction, and adverse environmental impacts.

To support this process, urban and transport planners employ digital modeling tools that enhance efficiency and provide a data-driven foundation for decision-making. These tools allow planners to simulate various transportation scenarios, assess the potential impact of different policies, and optimize infrastructure improvements based on real-world conditions. By integrating scientific methodologies and well-defined criteria, digital modeling ensures that interventions are both effective and sustainable.

A crucial component of digital modeling in transportation planning is the use of activity-travel patterns, which capture individual or household trips by identifying the starting and ending points of travel within an urban area. These data provide valuable insights into commuter behavior, peak traffic flows, and demand for public transportation. By incorporating activity-travel patterns into digital models, planners can design targeted solutions that address congestion peaks, improve transit accessibility, and create more efficient, well-connected transportation networks that meet the needs of growing urban populations.

Except for a few metropolitan areas around the world, usually located in Europe [1] or the United States [2], most cities are still attempting to establish methods and technical procedures for systematically collecting and updating activity-travel pattern data. These efforts, however, are costly, requiring substantial personnel resources and many person-months to collect and update the data.

While technology has reduced the cost of capturing activity-travel patterns [3], the extensive data collection requirements have led researchers and practitioners toward synthetic data generation methodologies. In these methodologies, a population is synthesized based on assumptions and statistical models, with the goal of generating activity-travel patterns that supplement an existing dataset. In all cases, generating synthetic activity-travel patterns requires a set of pre-existing datasets, such as a spatial zoning system, census data, income information, household travel surveys, address databases, and service and facility census data [4].

The method proposed in this article demonstrates its potential as an effective alternative for achieving higher accuracy in the generation of synthetic activity-travel patterns in urban environments. By adopting an activity-based approach, the study analyzes an existing activity-travel dataset (OD matrix) and produces additional synthetic patterns that more closely reflect real-world behavior than those produced by existing methods in the literature or current practice. A key contribution of the approach is the use and comparison of Cox-based models, including standard Cox models, Cox models with latent classes, and Cox model-based machine learning models, which enable more accurate estimation of activity durations within the dataset. This enhancement can be viewed as an extension of the study by Yee and Niemeier [5], who used a simple Cox model in a related application. In our work, we also handle incomplete or implausible data, aiming to predict the closest possible durations based on individual characteristics, thereby enriching the database and yielding more reliable results.

It is also important to note that the current study focuses on the case of Athens, Greece. According to the 2023 TomTom Traffic Index ranking [6], Athens ranks 31st among the cities with the worst traffic in the world, a ranking based on the average time required to travel 10 km. Compared with 2022, this travel time increased by 10 s, highlighting the need for effective interventions in the city’s transport networks. Given this rising trend in travel times, extensive study and re-planning of the existing transport network are required.

In light of the challenges and contributions outlined above, the structure of this article is as follows: Section 2 presents a literature review on relevant topics. Section 3 details the proposed methodology, including the models evaluated, their parameters and characteristics, and the steps taken to estimate activity durations. Section 4 documents the application of the model and presents the study’s results. Section 5 discusses these results, their broader implications, and potential avenues for further development. Finally, Section 6 concludes the article by summarizing the key findings, acknowledging the study’s limitations, and proposing directions for future research.

2. Literature Review

2.1. Data Sources

Historically, the most popular data source for creating Origin-Destination (OD) passenger demand matrices has been household travel surveys. In these surveys, respondents are asked to provide detailed information about their travel activities over a typical period (e.g., a day, a week). This method can offer very high accuracy as the questions can be highly detailed, and the sample size can be substantial. Another similar data source still used today is the national census data. Both sources are highly reliable, but their main drawback is that they are conducted infrequently due to the significant time, workforce, and financial resources required. Nevertheless, they remain essential as they serve as a fundamental base upon which new data are added using more modern data collection methods.

In an era in which technology has become an integral part of daily life, data availability is continuously expanding. According to Canalys [7], approximately 82% of people in Europe use smartphones, while according to Pew Research [8], 3 out of 4 users keep their Global Positioning System (GPS) enabled, as certain applications require it to function. Similar percentages are observed in the United States (U.S.), while in developing countries the figures are lower but still substantial. These location data are collected by various applications and can be exploited for research purposes. The key difference compared to traditional data sources is that these data are available in real time, offering rich insights into the movement of individuals, peak hours, and road congestion.

Another type of data that originates from smartphones is social media data. According to DataReportal [9], approximately 80% of residents in various regions are social media users, with approximately 30% of them sharing their location via check-ins. This creates a vast new data source that can be leveraged, as it could provide even more detailed insights depending on the declared location and the purpose of travel. In recent years, many studies have conducted traffic analysis using social media data (e.g., Gkiotsalitis and Stathopoulos [10]).

Big data also include information collected from mobile networks. This type of data has been used by Li et al. [11] in an effort to simulate the population composition and subsequently create an agent-based model (ABM), as well as by Chen et al. [12]. Locations are collected through mobile phone towers, which exist in sufficient numbers to support such research. With the appropriate assumptions, this type of data is also highly useful for deriving significant conclusions regarding transport systems and traveler behavior. Another type of data that can be utilized comes from smart cards used in public transportation, as demonstrated in the study by Munizaga and Palma [13], where they attempted to create an OD matrix. With appropriate assumptions, these data can provide valuable insights into the characteristics of travel patterns. Lastly, data are also obtained from road sensors, which are devices installed at fixed locations on the road surface that can detect various travel characteristics, such as vehicle type and speed. These sensors contribute to more accurate assessments of traffic flow.

2.2. Modeling Travel Activity Patterns

The second important aspect examined in this literature review concerns the different modeling approaches. The central challenge of modeling is predicting mobility patterns. For activity patterns, prediction becomes even more difficult, since travelers’ behavior is complex and influenced by many different variables. Different modeling methods provide flexibility in research, as they can be adapted to the type of data and the specific case, allowing for a more accurate representation of reality.

Traditionally, the four-step model (Trip Generation–Trip Distribution–Mode Choice–Trip Assignment) has been used. As analyzed by Ahmed [14] for the city of Dhaka, the assumptions of the trip generation and trip distribution steps of the four-step model are relatively simplistic, possibly relying on growth factors for prediction. The activity patterns of travelers are difficult to estimate with this approach because the available data lack the necessary level of detail.

With technological advancements, newer modeling methods have been applied that can incorporate more data and support multidimensional conclusions. One such approach is agent-based modeling. These models define a specific environment with predefined rules, within which the behavior of agents (individuals) is studied. In most cases, a population is synthesized from demographic and socioeconomic data obtained from censuses and surveys, aiming to represent the real population as accurately as possible. The more closely the agents reflect the real population, the better the model and the decisions derived from it. Several research efforts have addressed this topic, each focusing on different aspects of the analysis. For example, Hörl and Balac [4], in Paris, focus on controlling travel modes to develop activity-travel patterns. Ziemke et al. [15] study the city of Berlin and examine mode choice along with the temporal and spatial distribution of trips based on points of interest, providing insights into how different locations influence travel behavior. Similarly, Tozluoğlu [16], in Stockholm, focuses on activity durations and mode choices, shedding light on how individuals allocate time across different activities and transportation options. Joubert [17] applied Bayesian Networks to enhance the realism of population synthesis for agent-based modeling.

Modeling activity-travel patterns has also been essential for simulation purposes and for problems such as understanding how each area is used (e.g., residential neighborhood, industrial zone). Based on this logic, Land Use and Transportation Interaction (LUTI) models have been developed. These models capture the dynamic interaction between land use and transportation, considering how land-use changes affect transport demand and how transportation influences the development and distribution of land use. Moreover, they allow the simulation of different policy and planning scenarios, such as the construction of new transport infrastructure or changes in zoning regulations. Examples include the studies by Aljoufie et al. [18] and Wang et al. [19], which attempt to quantify transport accessibility based on land use, using different parameters and objectives in each study. Furthermore, Guzman et al. [20] analyze Bogotá with the aim of creating travel activity patterns, specifically predicting Home-Work-Home (HWH) and Home-Other-Home (HOH) tours. In another study, Ortega et al. [21] demonstrate how integrating autonomous vehicles and park-and-ride strategies in MATSim simulations can influence daily activity plans and mobility patterns, further underscoring the flexibility and potential of agent-based modeling frameworks in transport planning.

Another crucial approach is machine learning. In a study by Gkiotsalitis and Stathopoulos [22], machine learning was applied using the Kullback–Leibler divergence and a Naïve Bayes algorithm. Similarly, Sallard and Balac [23] utilize Bayesian Networks. In this process, the data are split into training and testing sets: one portion is used to train the model, and the held-out portion is used to evaluate its predictions with statistical metrics. The quantities examined in such analyses again relate to mobility characteristics such as trip purpose, mode choice, and time, although statistical analysis can also be performed.

In some cases, different modeling approaches have been combined. More specifically, He et al. [24] combined tour-based and trip-based models to determine mode choice, drawing conclusions for each scenario separately regarding the temporal and spatial distribution of trips. Tour-based models can be considered a variation of the four-step model; in contrast to trip-based models, which analyze individual trips, they account for the interdependence of trips throughout the day. A similar approach is observed in the model by Jovicic and Hansen [25], which predicts future traffic demand and aligns more closely with the four-step model, primarily focusing on trip volume calculations. A more recent extension of such activity-based methodologies is presented by Rizopoulos et al. [26], who incorporate electric vehicles and charging behaviors into an activity-based framework tailored for evaluating alternative modal share change scenarios in urban environments.

2.3. Cox Proportional Hazards and Hazard-Based Models in Transport Research

Cox models fall under the broader category of survival analysis models and are commonly used in healthcare research. Some applications can be found in the field of transportation, although one may argue that their adoption still remains limited.

Their flexibility makes them suitable for analyzing both short-term and long-term behaviors related to travel activity. These durations may refer to travel times, as in the study by Gong et al. [27] in Shenzhen, China, where they estimated the duration of trips made by public transport users based on selected variables. Similarly, Raux et al. [28] focused on the estimation of both trip start times and travel durations across different modes of transport in eight European cities. In Toronto, Canada, Kalatian and Farooq [29] applied deep survival analysis using a Cox model to estimate pedestrians’ available waiting time, based on traffic-related characteristics. Another application is found in the work of Bhat [30] in Washington, D.C., where the model was used to examine the duration of post-work activities and their relationship with socioeconomic and spatiotemporal factors. In Kochi, Japan, Nishiuchi et al. [31] applied Cox modeling to smart card data to predict the likelihood of discontinuing public transport usage.

Beyond the transport domain, more advanced applications of Cox models have emerged in healthcare research. For example, Mbotwa et al. [32] used a latent class extension of the Cox model to analyze survival patterns among patients with chronic heart failure, demonstrating the value of incorporating unobserved heterogeneity in model structure. Likewise, Sinha et al. [33] discussed methodological practices for latent class analysis in clinical research, reinforcing the importance of probabilistic modeling in uncovering hidden population subgroups with distinct risk trajectories.

2.4. Modeling Activity Durations

In the context of estimating activity durations, the literature predominantly emphasizes hazard-based models, which are among the most widely adopted approaches in activity-based travel behavior research. These models estimate the hazard function, that is, the instantaneous rate (probability per unit time) at which an activity ends at a given time, conditional on it having lasted until that point.

Early studies primarily employed parametric hazard-based models, which assume a specific probability distribution for the duration process, most commonly the Weibull or Exponential distributions. These models offer analytical tractability and interpretability but are limited by their reliance on predefined functional forms, which may not adequately capture complex duration patterns observed in empirical data.
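To make the hazard concept concrete, the following minimal Python sketch computes the hazard of a Weibull-distributed duration as $h(t) = f(t)/S(t)$, where $f$ is the density and $S$ the probability that the activity lasts beyond $t$, and shows how the shape parameter controls the form of the hazard. The parameter values are purely illustrative (in minutes) and are not estimates from any dataset discussed here; the Exponential distribution is the special case with shape equal to 1.

```python
import math

def weibull_pdf(t: float, k: float, lam: float) -> float:
    """Weibull density with shape k and scale lam."""
    return (k / lam) * (t / lam) ** (k - 1) * math.exp(-((t / lam) ** k))

def weibull_survival(t: float, k: float, lam: float) -> float:
    """Probability that the activity lasts longer than t."""
    return math.exp(-((t / lam) ** k))

def hazard(t: float, k: float, lam: float) -> float:
    """Instantaneous rate of ending at t, given the activity lasted until t."""
    return weibull_pdf(t, k, lam) / weibull_survival(t, k, lam)

lam = 60.0  # illustrative scale, in minutes
for k in (0.8, 1.0, 1.5):  # decreasing, constant (Exponential), increasing hazard
    print(k, [round(hazard(t, k, lam), 4) for t in (15.0, 60.0, 120.0)])
```

With a shape below 1 the chance of ending declines as the activity continues, while with a shape above 1 it rises; this is the kind of fixed functional form that semi-parametric models avoid imposing.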

Some key studies in this area include Sreela [34], which was designed specifically for shopping durations and focuses on the influence of various variables. Additionally, Bhat [35] and Hamed and Easa [36] primarily study shopping durations, while also considering other activity purposes. Their work investigates whether unobserved factors not captured in the available data influence results and how such factors should be appropriately accounted for. Another parametric hazard-based model is that of Enam and Auld [37], which utilizes GPS data to correlate activity purpose with duration.

Beyond parametric models, semi-parametric hazard-based models have also been explored. The key difference between these and parametric models is that semi-parametric models do not assume a specific probability distribution for the duration process. Instead, the baseline hazard function remains unspecified and is estimated nonparametrically from the data, while individual characteristics enter the model through covariate effects that scale this baseline. One such model is that of Liu et al. [38], which addresses a gap in the literature by analyzing activity durations during emergency situations (such as COVID-19). This study incorporates socioeconomic and demographic characteristics, as well as land-use variables, as major influences on activity durations. Another semi-parametric model is the Cox model developed by Yee and Niemeier [5], which is based on a five-year study and provides new insights into the dynamic nature of time in activity durations. However, none of these studies explicitly examined the effect of unobserved heterogeneity.
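Leaving the baseline hazard unspecified can be illustrated with the Breslow estimator, a standard nonparametric estimate of the cumulative baseline hazard used alongside Cox models. The sketch below assumes the covariate effects (the linear predictors $X_i \beta$) have already been estimated; the toy durations are hypothetical. When all linear predictors are zero, it reduces to the Nelson–Aalen estimator.

```python
import math

def breslow_cumulative_baseline(times, events, lin_pred):
    """Breslow estimate of the cumulative baseline hazard H0(t).

    times: observed activity durations
    events: 1 if the end of the activity was observed, 0 if censored
    lin_pred: X_i . beta for each observation (beta assumed already estimated)
    Returns a list of (event_time, H0(event_time)) steps.
    """
    risk = [math.exp(eta) for eta in lin_pred]
    steps, cum = [], 0.0
    for t in sorted({ti for ti, di in zip(times, events) if di == 1}):
        d_t = sum(1 for ti, di in zip(times, events) if ti == t and di == 1)  # events at t
        at_risk = sum(r for ti, r in zip(times, risk) if ti >= t)  # weighted risk set at t
        cum += d_t / at_risk
        steps.append((t, cum))
    return steps

print(breslow_cumulative_baseline([10, 20, 20, 30, 40], [1, 1, 0, 1, 0], [0.0] * 5))
```

No distributional form is assumed: the estimate only jumps at observed end times, by an amount determined by how many observations were still at risk.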

Another hazard-based model studied was that of Van den Berg et al. [39]. This research employs an accelerated hazard model to estimate the relationships between social activity durations and the interpersonal relationships of the participants. It also examines details such as the frequency of these activities and the conditions under which they were planned. The main distinction of this model is that the variables act directly on activity duration, unlike the previous models, in which variables act on the hazard (risk) rate, which in turn determines duration.
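The contrast between covariates acting on the hazard and covariates acting directly on duration can be sketched by comparing a proportional hazards survival curve with an accelerated failure time (AFT) curve, the most common accelerated formulation; we assume here that it captures the distinction described above. The Weibull baseline and the coefficient values are hypothetical.

```python
import math

def s0(t, k=1.5, lam=60.0):
    """Hypothetical Weibull baseline survival of an activity (minutes)."""
    return math.exp(-((t / lam) ** k))

def s_ph(t, x, beta):
    """Proportional hazards: covariates scale the hazard, so S(t|x) = S0(t) ** exp(beta * x)."""
    return s0(t) ** math.exp(beta * x)

def s_aft(t, x, gamma):
    """Accelerated failure time: covariates rescale time, so S(t|x) = S0(t * exp(-gamma * x))."""
    return s0(t * math.exp(-gamma * x))

def median(surv, lo=0.0, hi=10_000.0, iters=100):
    """Bisection for the time at which a decreasing survival curve crosses 0.5."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if surv(mid) > 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

base = median(s0)
aft = median(lambda t: s_aft(t, x=1.0, gamma=math.log(2)))
print(aft / base)  # gamma = ln 2 doubles every duration quantile, so this is about 2
```

In the AFT form the covariate stretches the whole duration distribution, whereas in the proportional hazards form it multiplies the instantaneous rate of ending.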

In addition to hazard-based models, a discrete-continuous choice model by Y. Li et al. [40] was also studied. This model simultaneously represents two levels of choice: a discrete choice regarding participation in a maintenance activity and a continuous variable corresponding to the duration of the activity. This approach highlights the interdependence between these two aspects of the problem as modeled by the two variables.

Finally, another alternative approach to modeling activity durations was examined in the work of Tilahun and Levinson [41]. They used a path model to emphasize causal relationships between variables, aiming to capture both direct and indirect effects on activity durations. To summarize the literature and underline the contribution of our study, Table 1 is presented below.

Our study is closely connected to the work of Yee and Niemeier [5], Liu et al. [38], and Bhat [30] in terms of activity duration modeling. The first two go beyond traditional parametric hazard-based models by using a semi-parametric Cox approach, in which the hazard function is not constrained to follow a specific statistical distribution (e.g., Weibull); instead, the baseline hazard is left unspecified and is scaled by individual characteristics drawn from survey responses. Chandra Bhat’s work also provides a fresh perspective by addressing unobserved heterogeneity, a factor not directly captured by survey or diary variables but linked to hidden behavioral patterns that influence duration. Additionally, for the generation of synthetic travel activity patterns, we drew insights from various studies, particularly those employing activity-based models, such as Moeckel et al. [42] and Liao et al. [43], which align with the methodological approach taken in this paper. The contributions of the current research work are as follows:

Development of an activity-based method integrating hazard-based models to enhance the accuracy of synthetic activity-travel pattern generation;
Evaluation and comparison of Cox-based modeling approaches, including models with and without unobserved heterogeneity, to improve activity duration estimation;
Assessment of the realism of the generated synthetic datasets through statistical testing against real-world data from Athens, demonstrating improved behavioral fidelity.

3. Methodology

This section is organized into three parts. The first outlines the theoretical framework underlying the statistical and machine learning models employed in the activity-based approach, while the second presents the activity-based model that is used for the generation of synthetic activity-travel pattern datasets. The third part focuses on the criteria used for the models’ performance evaluation.

3.1. Modeling Unobserved Heterogeneity and Non-Linear Relationships in Source Data by Using the Cox-Based Model

Several modern open-source libraries that are used to generate synthetic activity-travel patterns employ simple statistical distributions for the estimation of synthetic activity durations [48], with only a few exceptions [16]. This context highlights a key challenge: the need to identify and use more suitable statistical models for the generation of synthetic activity-travel patterns. In response, the current study investigates the applicability of three Cox-based models and evaluates their effectiveness for this task. These models are:

A simple Cox model without unobserved heterogeneity considerations,
A Cox model that accounts for unobserved heterogeneity in the source dataset by including two latent classes,
A Cox-based machine learning model, which leverages a neural network to learn complex, non-linear relationships in the data while maintaining the proportional hazards structure of the traditional Cox model.

The remainder of this section describes each of the three models in more detail.

The simple Cox model is the standard textbook approach to survival analysis [49], assuming proportional hazards and no unobserved heterogeneity. This model posits that all differences in survival outcomes can be attributed to observed covariates, treating individuals with identical covariate profiles as having the same hazard function, thereby excluding the influence of latent factors or frailty. Importantly, its simplicity and reliance on fewer parameters make it a strong benchmark model: it is less prone to overfitting than more complex alternatives and is therefore more likely to generalize well to larger and more complex datasets.
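The proportional hazards assumption can be checked on a toy example: for two individuals, the ratio of their Cox hazards is $\exp(\beta^{T}(x_a - x_b))$ at every time $t$, because the baseline hazard cancels. In this minimal sketch the baseline hazard, the coefficients, and the covariate values are hypothetical.

```python
import math

def cox_hazard(t, x, beta, baseline):
    """Cox hazard h(t | x) = h0(t) * exp(beta . x)."""
    return baseline(t) * math.exp(sum(b * xi for b, xi in zip(beta, x)))

def baseline(t):
    """Hypothetical baseline hazard (per minute), increasing with elapsed time."""
    return 0.01 + 0.0005 * t

beta = [0.4, -0.2]                 # hypothetical coefficients
x_a, x_b = [1.0, 3.0], [0.0, 3.0]  # two individuals differing only in the first covariate

ratios = [cox_hazard(t, x_a, beta, baseline) / cox_hazard(t, x_b, beta, baseline)
          for t in (10.0, 60.0, 240.0)]
print(ratios)  # approximately exp(0.4) ≈ 1.4918 at every t
```

Because the ratio does not depend on time, individuals with identical covariates necessarily share the same hazard, which is exactly the restriction the latent-class model relaxes.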

The second model, a Cox model with unobserved heterogeneity, incorporates measurable covariates that explain differences in individual hazard rates, as in the simple model, and additionally accounts for unobserved sources of variation across subjects through latent classes. Considering the size of our dataset and the importance of model interpretability and replicability, two latent classes were deemed more suitable for this model than one, three, or more. This choice reflects findings in recent methodological literature, which highlight that two-class solutions often strike the best balance between interpretability, statistical fit, and model convergence, especially in clinical datasets of moderate size [32,33]. Additionally, adding more classes frequently leads to overfitting and unstable class definitions, offering limited clinical value despite potentially improved statistical fit metrics [33].

More specifically, in the second model, two variables ($\mathrm{frailty}_{1}$ and $\mathrm{frailty}_{2}$) were included to represent the two distinct latent classes identified in the population. The latent classes are two subgroups of observations that are not directly observed but are inferred through statistical modeling. During the duration calculation process, the data are divided into two groups based on their characteristics. Each group reveals a hidden pattern that, although not described by the observed variables, can affect the final duration of the activities; examples include travel by people with young children or travel by people who avoid traffic. In other words, the classes capture hidden characteristics or behaviors that can influence activity durations. Each observation is assigned to one group or the other probabilistically, depending both on its own characteristics and on those of the other observations, as explained further below.

The model is based on the estimation of a Cox model with frailty, using a non-parametric baseline hazard function:

h_{i}(t) = h_{0}(t) \sum_{c=1}^{2} Z_{ic} \exp\left(\beta_{c}^{T} X_{i} + \mathrm{frailty}_{c}\right)

where:

$h_{i}(t)$: The hazard function for individual $i$ at time $t$,
$h_{0}(t)$: The non-parametric baseline hazard function,
$Z_{ic}$: The probability that individual $i$ belongs to latent class $c$,
$X_{i}$: The vector of predictor variables for individual $i$,
$\beta_{c}$: The vector of coefficients for latent class $c$,
$\mathrm{frailty}_{c}$: The frailty term for class $c$, calculated as:

\mathrm{frailty}_{c} = z_{\mathrm{frailty},c} \cdot \sigma_{\mathrm{frailty}}

where:

$z_{\mathrm{frailty},c}$ is an uncentered random factor for class $c$, ranging from 0 to 1,
$\sigma_{\mathrm{frailty}}$ is the standard deviation of the frailty term.
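The latent-class hazard and the frailty definition above can be evaluated directly once the baseline hazard at a given time, the class membership probabilities, and the class-specific coefficients are known. In this minimal sketch all numeric values are hypothetical, and $h_0(t)$ is passed in as a single number for one fixed $t$.

```python
import math

def frailty(z_c, sigma):
    """frailty_c = z_frailty,c * sigma_frailty."""
    return z_c * sigma

def mixture_hazard(h0_t, z_i, betas, x_i, frailties):
    """h_i(t) = h0(t) * sum_c Z_ic * exp(beta_c . x_i + frailty_c), evaluated at one time t."""
    total = 0.0
    for z_ic, beta_c, f_c in zip(z_i, betas, frailties):
        linear = sum(b * x for b, x in zip(beta_c, x_i))
        total += z_ic * math.exp(linear + f_c)
    return h0_t * total

sigma = 0.5                                          # hypothetical sigma_frailty
frailties = [frailty(z, sigma) for z in (0.2, 0.9)]  # hypothetical z_frailty,c values
betas = [[0.3, -0.1], [0.8, 0.2]]                    # hypothetical class coefficients
print(mixture_hazard(0.02, [0.7, 0.3], betas, [1.0, 2.0], frailties))
```

If both classes share the same coefficients and frailty and $Z_{i1} + Z_{i2} = 1$, the expression reduces to an ordinary Cox hazard, which is a useful sanity check.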

The sets, parameters, and variables of the model are presented in more detail in Table 2.

Regarding the model’s input, a research survey conducted by the National Technical University of Athens (NTUA) is used, with the analysis focusing on daily activity chains of a typical working day. Table 3 below presents all the variables included in the study.

To provide context for the input data used in the models and the proposed workflow, the columns of Table 3 are explained here. The table outlines the survey variables used to analyze activity-travel patterns, grouped into Demographic, Spatial, Mode Choice, and Temporal categories. Demographic variables include gender, age, education, employment, income, and car ownership; all are categorical except age, which is continuous. Spatial variables describe destination and home locations, with the home variable assumed to match the destination in categorical form. Mode Choice is represented by the categorical “Mode” variable, which captures different transportation types such as car, taxi, bus, train, and others. Finally, the Temporal variables “Time” and “Distance” are both continuous and represent the trip start time and travel distance, respectively. The survey variables described in Table 3 are used as features in the Cox and Cox-based machine learning models, as detailed in later sections. The synthetically generated datasets contain the same variables, since the goal of the workflow is to create a larger OD dataset from an initial one derived from a small-scale survey.
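Before survey variables such as those in Table 3 can enter a Cox model, categorical variables must be converted into numeric covariates. The sketch below shows one common choice, dummy (one-hot) coding with a dropped reference level. The record, the field names, the units, and the choice of car as the reference category are hypothetical and are not taken from the NTUA survey. A reference level is dropped because a Cox model has no intercept: a full set of dummies would sum to a constant that the baseline hazard already absorbs, leaving the coefficients non-identifiable.

```python
MODES = ["car", "taxi", "bus", "train", "other"]  # levels from the text; "other" groups the rest

def encode(record, reference="car"):
    """Convert one survey record into numeric covariates.

    Continuous fields are kept as floats; the categorical mode is dummy-coded,
    dropping the reference level because a Cox model has no intercept.
    """
    features = {
        "age": float(record["age"]),
        "time": float(record["time"]),          # trip start time (hypothetical unit: hours)
        "distance": float(record["distance"]),  # travel distance (hypothetical unit: km)
    }
    for mode in MODES:
        if mode != reference:
            features[f"mode_{mode}"] = 1.0 if record["mode"] == mode else 0.0
    return features

row = {"age": 34, "time": 8.5, "distance": 12.0, "mode": "bus"}  # hypothetical respondent
print(encode(row))
```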

Latent Class Membership Probabilities

For classifying observations into the two latent classes, we use the probability $Z_{ic}$, which represents the a posteriori probability that observation $i$ belongs to latent class $c$. The prior probabilities, $\mathrm{classprobs}_{c}$, are obtained from a Dirichlet prior distribution, and they must satisfy:

\sum_{c=1}^{2} \mathrm{classprobs}_{c} = 1, \quad \mathrm{classprobs}_{c} \geq 0

This condition ensures that the prior probabilities are valid probabilities, summing to one and being non-negative. $Z_{ic}$ is calculated from the following equation, which combines the prior probability with the data likelihood and normalizes across all classes to yield a posterior probability:

Z_{ic} = \frac{\mathrm{classprobs}_{c} \cdot \mathrm{likelihood}_{c}}{\sum_{c'=1}^{2} \mathrm{classprobs}_{c'} \cdot \mathrm{likelihood}_{c'}}

where:

$\mathrm{classprobs}_{c}$ is the prior probability of belonging to latent class $c$,
$\mathrm{likelihood}_{c}$ represents how well the data fit the parameters of latent class $c$, defined as:

\mathrm{likelihood}_{c} = \begin{cases} \mathrm{WeibullPDF}(t), & \mathrm{status}_{i} = 1 \\ \mathrm{WeibullCCDF}(t), & \mathrm{status}_{i} = 0 \end{cases}

This distinction accounts for whether the event of interest (end of activity) was observed or censored. The Weibull probability density function (PDF) is given by:

\mathrm{WeibullPDF}(t) = \frac{k}{\lambda} \left(\frac{t}{\lambda}\right)^{k-1} \exp\left(-\left(\frac{t}{\lambda}\right)^{k}\right)

with the parameters:

k = \frac{1}{\sigma_{\mathrm{frailty}}} \quad \text{and} \quad \lambda = \exp\left(X_{i} \beta_{c} + \mathrm{frailty}_{c}\right)

These parameters reflect the shape and scale of the Weibull distribution, influenced by covariates and latent class-specific effects. Similarly, the complementary cumulative distribution function (CCDF) of the Weibull is:

\mathrm{WeibullCCDF}(t) = \exp\left(-\left(\frac{t}{\lambda}\right)^{k}\right)

This form is used for censored observations and gives the probability of surviving beyond time $t$. Finally, the denominator of $Z_{ic}$,

\sum_{c} \mathrm{classprobs}_{c} \cdot \mathrm{likelihood}_{c}

normalizes the posterior probabilities so that they sum to 1 across classes.
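To make the membership computation concrete, the following is a minimal sketch of how $Z_{ic}$ could be evaluated for a single observation, assuming time-constant covariates and a shape $k = 1/\sigma_{\mathrm{frailty}}$ shared by all classes, as in the equations above. The function names and all numerical parameter values are hypothetical illustrations, not the authors’ code or estimates; in the actual model these quantities come from the posterior draws.

```python
import math


def weibull_pdf(t: float, k: float, lam: float) -> float:
    """Weibull density: (k/lam) * (t/lam)**(k-1) * exp(-(t/lam)**k)."""
    return (k / lam) * (t / lam) ** (k - 1) * math.exp(-((t / lam) ** k))


def weibull_ccdf(t: float, k: float, lam: float) -> float:
    """Weibull survival function (CCDF): exp(-(t/lam)**k)."""
    return math.exp(-((t / lam) ** k))


def class_posteriors(t, status, x, betas, frailties, classprobs, sigma_frailty):
    """Posterior membership probabilities Z_ic of one observation.

    t: observed duration; status: 1 if the activity end was observed, 0 if censored;
    x: covariate vector; betas / frailties / classprobs: one entry per latent class.
    """
    k = 1.0 / sigma_frailty  # shape parameter, shared by all classes
    weighted = []
    for beta_c, frailty_c, prior_c in zip(betas, frailties, classprobs):
        lam = math.exp(sum(xj * bj for xj, bj in zip(x, beta_c)) + frailty_c)
        likelihood = weibull_pdf(t, k, lam) if status == 1 else weibull_ccdf(t, k, lam)
        weighted.append(prior_c * likelihood)
    total = sum(weighted)  # the normalizing denominator
    return [w / total for w in weighted]


# Hypothetical parameters for two latent classes
z = class_posteriors(
    t=2.0, status=1, x=[1.0, 0.5],
    betas=[[0.2, 0.1], [0.8, -0.3]], frailties=[0.0, 0.4],
    classprobs=[0.6, 0.4], sigma_frailty=0.8,
)
print(z)
```

A useful sanity check: when both classes share identical parameters, the likelihoods cancel and the posterior equals the prior.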

Survival Function for Each Latent Class

The survival function for each latent class c is given by:

S_{i,c}(t) = \exp\left(-\int_{0}^{t} h_{0}(u) \exp(X_{i} \beta_{c} + \mathrm{frailty}_{c}) \, du\right)

where:

$h_{0}(t)$: the non-parametric baseline hazard function;
$X_{i} \beta_{c}$: the inner product of the class-specific coefficient vector $\beta_{c}$ and the covariate vector $X_{i}$ of individual $i$;
$\mathrm{frailty}_{c}$: the frailty term of latent class $c$.

Overall Survival Function

The overall survival function is a weighted combination of the survival functions for the two latent classes:

S_{i}(t) = \pi_{1} S_{i,1}(t) + \pi_{2} S_{i,2}(t)

where:

$\pi_{1} = Z_{i1}$ (probability of belonging to latent class 1)
$\pi_{2} = Z_{i2}$ (probability of belonging to latent class 2)
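Because the covariates and the frailty term do not vary over time, $\exp(X_{i}\beta_{c} + \mathrm{frailty}_{c})$ can be taken out of the integral, so $S_{i,c}(t) = \exp(-H_{0}(t)\exp(X_{i}\beta_{c} + \mathrm{frailty}_{c}))$, where $H_{0}(t)$ is the cumulative baseline hazard. The sketch below evaluates the class-specific and overall survival functions under the assumption that $H_{0}$ is available as a step function (for example, a Breslow-type estimate); the function names and all numbers are hypothetical.

```python
import bisect
import math


def cumulative_baseline_hazard(t, times, h0_cum):
    """Step-function lookup of H0(t), the integral of h0 from 0 to t.

    times: sorted event times; h0_cum: cumulative baseline hazard at each time.
    """
    idx = bisect.bisect_right(times, t) - 1
    return 0.0 if idx < 0 else h0_cum[idx]


def class_survival(t, x, beta_c, frailty_c, times, h0_cum):
    """S_{i,c}(t) = exp(-H0(t) * exp(x . beta_c + frailty_c))."""
    eta = sum(xj * bj for xj, bj in zip(x, beta_c)) + frailty_c
    return math.exp(-cumulative_baseline_hazard(t, times, h0_cum) * math.exp(eta))


def mixture_survival(t, x, z, betas, frailties, times, h0_cum):
    """S_i(t) = sum over classes c of Z_ic * S_{i,c}(t)."""
    return sum(
        z_c * class_survival(t, x, beta_c, frailty_c, times, h0_cum)
        for z_c, beta_c, frailty_c in zip(z, betas, frailties)
    )


times = [1.0, 2.0, 3.0]   # hypothetical event times (hours)
h0_cum = [0.1, 0.3, 0.7]  # hypothetical cumulative baseline hazard
# Two classes with equal membership; class 2 has double the hazard
s = mixture_survival(2.5, [0.0], [0.5, 0.5], [[0.0], [0.0]],
                     [0.0, math.log(2.0)], times, h0_cum)
print(s)
```

The weights $Z_{i1}$ and $Z_{i2}$ are the posterior membership probabilities defined earlier, so the mixture remains a valid survival function (equal to 1 before the first event time and non-increasing afterwards).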

The Cox-based machine learning model

The third model employed in this study is a Cox-based machine learning model, which extends the classical semi-parametric Cox proportional hazards model by introducing a non-linear predictor through a fully connected neural network. This approach preserves the proportional hazards assumption while allowing the model to learn complex, non-linear interactions among covariates. Specifically, the hazard function for an individual i at time t is defined as:

h_{i}(t \mid x_{i}) = h_{0}(t) \exp(f_{\theta}(x_{i}))

where $h_{0}(t)$ denotes the unspecified baseline hazard function, $x_{i}$ is the covariate vector of individual $i$, and $f_{\theta}(\cdot)$ is a neural network parametrized by weights and biases $\theta$. Model training is performed by maximizing the Cox partial log-likelihood function:

L(\theta) = \sum_{i \in E} \left[ f_{\theta}(x_{i}) - \log \sum_{j \in R_{i}} \exp(f_{\theta}(x_{j})) \right]

where $E$ is the set of uncensored events and $R_{i}$ is the risk set of subject $i$, i.e., the subjects whose activity has not yet ended at the event time of $i$. This modeling approach, proposed by Faraggi et al. [50] and further developed by Katzman et al. [51], enhances the Cox model’s expressiveness while maintaining its foundational structure, making it potentially suitable for capturing complex behavioral patterns in activity-travel data.
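The partial log-likelihood can be sketched directly from its definition. In the minimal version below, the risk scores are plain numbers standing in for the network outputs $f_{\theta}(x_{i})$; a training loop would maximize this quantity (equivalently, minimize its negative) with respect to $\theta$. Ties in event times are handled with the simple Breslow convention, which is an assumption of this sketch rather than a detail stated in the text.

```python
import math


def cox_partial_log_likelihood(scores, times, events):
    """Cox partial log-likelihood: sum over observed events i of
    f(x_i) - log(sum over j in R_i of exp(f(x_j))).

    scores: risk scores f(x_i), e.g. neural network outputs;
    times: observed durations; events: 1 = activity end observed, 0 = censored.
    The risk set R_i holds every j with times[j] >= times[i] (Breslow ties).
    """
    total = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored subjects enter only through risk sets
        risk = [scores[j] for j, t_j in enumerate(times) if t_j >= t_i]
        m = max(risk)  # log-sum-exp trick for numerical stability
        log_denominator = m + math.log(sum(math.exp(s - m) for s in risk))
        total += scores[i] - log_denominator
    return total


# With all scores equal, each event contributes -log(size of its risk set)
print(cox_partial_log_likelihood([0.0, 0.0, 0.0], [1.0, 2.0, 3.0], [1, 0, 1]))
```

The baseline hazard $h_{0}(t)$ cancels out of every term, which is why the network only needs to learn the relative risk $f_{\theta}(x)$.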

3.2. Activity-Based Model Structure and Integration with Cox-Based Models

An important component of the proposed methodology is a pre-existing open-source ABM that already includes several software routines for generating synthetic activity-travel patterns from an initial real-world OD dataset. As shown in Figure 1, the ABM takes the survey as input and, using the open-source Python framework Population Activity Modeler (PAM), generates the population and synthesizes its travel movements. For a possible implementation of the ABM, Python 3.12 may be used alongside PAM version 0.3.2.

This workflow ultimately produces synthetic activity-travel patterns that can serve as input to transport planning models. In addition to generating the synthetic population, PAM can perform macroscopic traffic analysis on the generated synthetic data, and the results of this analysis can feed directly into transport planning processes.

While the open-source software can generalize an initial OD dataset into a larger synthetic one, it does so by applying simplistic statistical rules to the source OD data. Moreover, the input survey (the ABM Travel Survey in our case) is often incomplete, which forces the synthetic data generation software to rely on flawed data, including:

Missing activity durations,
Invalid activity durations (negative values that arose when durations were calculated from inconsistent trip start times, requiring correction through estimation),
Missing “return-to-home” activities.

More specifically, the proposed workflow detects when any of the problems listed above occurs in the initial travel survey and then uses the Cox and Cox-based models to estimate the missing activity durations and the corresponding travel times.

For example, when a return-to-home activity is missing, the workflow detects it and applies one of the three models to estimate the duration of the last activity and, from it, the return time. Figure 2 gives such an example: there is no return trip home, while the last trip is for work purposes and starts at 18:00. After the application of the proposed Cox model-based workflow, the return-home trip is added: the model first estimates the duration of the work activity and then the return time. In this way, the initial travel survey is supplemented and its missing data are filled in, resulting in a more accurate input dataset for the ABM, which then creates the synthetic activity-travel patterns.
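The text does not specify how a single duration is read off a model’s predicted survival curve. A common choice, assumed here purely for illustration, is the median survival time: the first time at which the predicted survival probability falls to 0.5 or below. The sketch converts that median into the departure time of the inferred return-home trip for a Figure 2-style case, assuming durations in hours, an activity that begins at 18:00, and no travel time; the function names and curve values are hypothetical.

```python
def median_survival_time(times, surv):
    """First time at which a step survival curve falls to 0.5 or below."""
    for t, s in zip(times, surv):
        if s <= 0.5:
            return t
    return None  # median not reached within the predicted range


def return_departure_minutes(activity_start_min, times_h, surv):
    """Departure time (minutes after midnight) of the inferred return-home trip.

    activity_start_min: start of the last out-of-home activity;
    times_h / surv: predicted survival curve of its duration, in hours.
    """
    median_h = median_survival_time(times_h, surv)
    if median_h is None:
        return None
    return (activity_start_min + round(median_h * 60)) % (24 * 60)  # wrap past midnight


# Last (work) activity assumed to start at 18:00 = 1080 minutes
dep = return_departure_minutes(1080, [0.5, 1.0, 1.5, 2.0], [0.9, 0.7, 0.45, 0.2])
print(f"{dep // 60:02d}:{dep % 60:02d}")  # 19:30
```

Other summaries, such as the mean of the survival curve or a random draw from it, would be equally valid design choices; a random draw preserves more variability in the synthetic population.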

It is important to note that the two Cox models and the Cox-based machine learning model are trained on the source survey data using an 80/20 train-test split and 5-fold cross-validation before being integrated into the workflow illustrated in Figure 3. The trained models are then used to identify and correct missing or erroneous activity durations, taking into account the individual characteristics of each observation (Table 3). The corrected duration values are reintegrated into the original survey, which undergoes a preprocessing phase involving minor adjustments. This produces an updated and enriched survey dataset, which is then given as input to PAM for the standard modeling and synthetic data generation processes.
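The detection step that precedes the repair can be sketched as a scan over each respondent’s time-ordered trip chain, flagging the three problem types listed earlier. The `Trip` record layout, its field names, and the convention that the final stay at home needs no duration are hypothetical choices made for this illustration; the actual survey schema is the one summarized in Table 3.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trip:
    person_id: int
    purpose: str                 # activity at the destination, e.g. "work", "home"
    start_min: int               # trip start time, minutes after midnight
    duration_h: Optional[float]  # duration of the destination activity; None if missing


def flag_problems(trips):
    """Return {person_id: [issues]} for the three survey problem types."""
    chains = {}
    for trip in sorted(trips, key=lambda tr: (tr.person_id, tr.start_min)):
        chains.setdefault(trip.person_id, []).append(trip)

    problems = {}
    for pid, chain in chains.items():
        issues = []
        for pos, trip in enumerate(chain):
            if pos == len(chain) - 1 and trip.purpose == "home":
                continue  # the final stay at home is open-ended
            if trip.duration_h is None:
                issues.append(f"missing duration: {trip.purpose}")
            elif trip.duration_h < 0:
                issues.append(f"negative duration: {trip.purpose}")
        if chain[-1].purpose != "home":
            issues.append("missing return-to-home")
        if issues:
            problems[pid] = issues
    return problems


trips = [
    Trip(1, "market", 600, -0.5),  # inconsistent start times gave a negative value
    Trip(1, "work", 1080, None),   # last trip, no return home (cf. Figure 2)
    Trip(2, "work", 480, 8.5),
    Trip(2, "home", 1020, None),
]
print(flag_problems(trips))
```

Each flagged entry would then be passed to one of the trained models, with the respondent’s covariates, to estimate the missing or invalid duration.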

3.3. Model Performance Evaluation

To assess the suitability of the proposed Cox models for generating synthetic activity-travel pattern data, their performance is evaluated in two phases. In the first phase (Section 3.3.1), the models are compared on five metrics (C-index, MAE, RMSE, AIC, and WAIC) obtained from an 80/20 train-test split and K-fold cross-validation. In the second phase (Section 3.3.2), a synthetic dataset is generated with the ABM for each input survey configuration and compared with the real-world data using MAE, RMSE, and the Kolmogorov–Smirnov (K–S) test. Crucially, each test is conducted per trip purpose; pooling all purposes would make the comparison too complex to yield reliable conclusions.

3.3.1. Cox Models Comparison According to Metrics

First, the three Cox-based models (the simple Cox model, the Cox model with latent classes, and the Cox-based machine learning model) are trained and evaluated using both an 80/20 train-test split and K-fold cross-validation. Performance is assessed with the Concordance Index (C-index), which measures how well the predicted durations rank the observed ones, and with the Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE), which measure the magnitude of the duration prediction errors. Using two validation schemes checks that the results do not depend on a single data partition.
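Harrell’s C-index can be sketched as the share of comparable pairs that a model orders correctly. In this minimal version, a pair is comparable when the subject with the shorter observed duration actually experienced the event (was not censored), and it is concordant when that subject also has the shorter predicted duration; tied predictions count as one half. Pairs with tied observed times are simply skipped, a simplification of this sketch. If a model outputs risk scores instead of durations, the comparison flips, since a higher risk implies a shorter duration. Library implementations, such as `concordance_index` in `lifelines.utils`, handle these details more completely.

```python
def harrell_c_index(times, events, predicted):
    """Harrell's C for predicted durations.

    A pair (i, j) is comparable if times[i] < times[j] and events[i] == 1;
    it is concordant if predicted[i] < predicted[j]; prediction ties count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if predicted[i] < predicted[j]:
                    concordant += 1.0
                elif predicted[i] == predicted[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


print(harrell_c_index([1, 2, 3], [1, 1, 1], [1.0, 2.0, 3.0]))  # 1.0
```

A value of 0.5 corresponds to random ordering, which is why the C-index values near or below 0.5 reported in Section 4.1 indicate little discriminative capability.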

Additionally, the two traditional Cox models are compared using the Akaike Information Criterion (AIC) and the Watanabe–Akaike Information Criterion (WAIC). Both criteria assess model fit while penalizing complexity: AIC is standard for conventional models, whereas WAIC is better suited to Bayesian models. Since the Cox model with frailty is estimated in a Bayesian framework, calculating AIC for it directly may not be ideal, but large differences in magnitude can still offer insight. AIC and WAIC are, however, not appropriate for evaluating the machine learning model, owing to differences in model structure, regularization techniques, and optimization objectives. These comparisons help highlight the benefit of accounting for unobserved heterogeneity when generating realistic synthetic data. Lower AIC, WAIC, MAE, and RMSE values, and higher C-index values, indicate better model performance.
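Both criteria can be written in a few lines. The sketch below uses the textbook definitions: AIC $= 2p - 2\log L$ (for a Cox model, $L$ is the maximized partial likelihood and $p$ the number of coefficients), and WAIC on the deviance scale, $-2(\mathrm{lppd} - p_{\mathrm{WAIC}})$, computed from a matrix of pointwise log-likelihoods with one row per posterior draw. The input values are illustrative, and the exact conventions used by the authors’ software are not stated in the text.

```python
import math


def aic(log_likelihood, n_params):
    """AIC = 2p - 2 log L (for a Cox model, L is the partial likelihood)."""
    return 2 * n_params - 2 * log_likelihood


def waic(loglik_draws):
    """WAIC on the deviance scale, -2 * (lppd - p_waic).

    loglik_draws: S x N nested list of pointwise log-likelihoods
    (S >= 2 posterior draws, N observations).
    """
    n_draws = len(loglik_draws)
    n_obs = len(loglik_draws[0])
    lppd = 0.0
    p_waic = 0.0
    for i in range(n_obs):
        col = [loglik_draws[s][i] for s in range(n_draws)]
        m = max(col)  # log-mean-exp computed stably
        lppd += m + math.log(sum(math.exp(v - m) for v in col) / n_draws)
        mean = sum(col) / n_draws
        p_waic += sum((v - mean) ** 2 for v in col) / (n_draws - 1)  # sample variance
    return -2.0 * (lppd - p_waic)


print(aic(-10.0, 3))                       # 26.0
print(waic([[-1.0, -2.0], [-1.0, -2.0]]))  # identical draws: no penalty
```

The penalty $p_{\mathrm{WAIC}}$ plays the role of the parameter count in AIC, measured as the posterior variance of each observation’s log-likelihood, which is one reason the two values are only loosely comparable.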

3.3.2. Comparison of Synthetic Data vs. Real Data

In the second stage of the comparison, statistical tests are used to evaluate the realism of the synthetic data. Specifically, each synthetic dataset generated by the Activity-Based Model from four different input survey configurations (the survey as-is and the survey repaired with each of the three models) is compared with the real survey data.

Using MAE, RMSE, and the K–S test, we assess which methodology produces the most realistic synthetic data. MAE gives the average magnitude of the errors, while RMSE emphasizes larger deviations because the error terms are squared. The two-sample K–S test compares the empirical cumulative distributions of the synthetic and real durations: the D-statistic is the maximum vertical distance between them, and the p-value indicates whether this distance is statistically significant. The mathematical formulas for MAE and RMSE are given in Appendix A.
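The D-statistic follows directly from its definition: evaluate both empirical cumulative distribution functions at every observed value and keep the largest absolute gap (the supremum is always attained at one of the sample points). The minimal sketch below computes only D; for the p-value, an existing routine such as `scipy.stats.ks_2samp` can be used. The sample values are illustrative.

```python
import bisect


def ks_two_sample_d(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max_x |F_a(x) - F_b(x)|."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    for x in a + b:  # the supremum is attained at an observed value
        fa = bisect.bisect_right(a, x) / len(a)  # empirical CDF of sample a at x
        fb = bisect.bisect_right(b, x) / len(b)  # empirical CDF of sample b at x
        d = max(d, abs(fa - fb))
    return d


real = [1, 2, 3, 4]       # illustrative real durations
synthetic = [3, 4, 5, 6]  # illustrative synthetic durations
print(ks_two_sample_d(real, synthetic))  # 0.5
```

Since D only depends on the ordering of values, it captures differences in the shape of the duration distribution that MAE and RMSE, which summarize error magnitudes, can miss.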

4. Results

Before presenting the results, we describe the convergence safeguards and training configurations used for each model. For the simple Cox model, no specific measures were needed, as convergence was straightforward to achieve. For the Cox model with frailty, Hamiltonian Monte Carlo sampling was carried out with four chains, each with 2000 warm-up and 3000 sampling iterations. Initial values for the regression coefficients, latent frailties, and class probabilities were drawn from weakly informative priors, while a high target acceptance probability and a maximum tree depth of 20 were used to avoid divergent transitions. These settings ensured convergence and robust posterior estimates, as verified by diagnostic statistics. For the Cox-based machine learning model, a fully connected neural network with a single hidden layer of 8 units, ReLU activation, and a dropout rate of 0.4 was trained for 100 epochs with the Adam optimizer (learning rate = 0.01, batch size = 4). To detect overfitting and assess generalizability, both an 80/20 train-test split and 5-fold cross-validation were used. Together with consistent random seed initialization, these procedures provided a stable and reproducible framework for model training and evaluation.
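The two validation schemes can be reproduced with a seeded shuffle of the observation indices. The following minimal sketch produces an 80/20 split and five disjoint folds that together cover every observation, illustrated for the 311 work-related observations of the source survey. The seed value 42 and the round-robin assignment of shuffled indices to folds are hypothetical choices; the text only states that a consistent random seed was used.

```python
import random


def train_test_split_indices(n, test_fraction=0.2, seed=42):
    """Shuffle 0..n-1 with a fixed seed and hold out test_fraction of them."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]  # (train, test)


def k_fold_indices(n, k=5, seed=42):
    """Yield (train, test) index lists for k folds of a seeded shuffle."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # round-robin fold assignment
    for f in range(k):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, folds[f]


# Example: the 311 work-related observations
train, test = train_test_split_indices(311)
print(len(train), len(test))  # 249 62
for fold_train, fold_test in k_fold_indices(311):
    print(len(fold_train), len(fold_test))
```

Because each trip purpose is modeled separately, the same procedure would be applied per purpose; for the 27 Market observations, each 5-fold test set holds only five or six trips, which makes the metrics for that purpose particularly noisy.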

It is also important to mention that the input dataset consists of 513 individual observations, including 311 work-related, 43 educational, 27 market, and 132 trips categorized as “other.” The models are trained on these data but are also used to repair this initial dataset, as shown in the workflow in Figure 3, which ultimately results in a synthetic population of approximately 13,500 observations.

4.1. Cox and Cox-Based Models Comparison

Table 4, Table 5 and Table 6 present a comparative analysis of three Cox-based models: the simple Cox model, the Cox model with frailty (i.e., incorporating latent class heterogeneity), and the Cox-based machine learning model. Each model is evaluated for four distinct trip purposes (Work, Market, Education, and Other) using the C-index, MAE, and RMSE, with evaluations performed using both an 80/20 train-test split and 5-fold cross-validation. Table 7 complements these results with information criteria (AIC and WAIC) for the two traditional Cox models.

Table 4 shows the performance of the simple Cox model. In the 80/20 split, the model achieves moderate discrimination for work-related trips (C-index of 0.6746) and good discrimination for “Other” trips (0.8421). Its performance degrades markedly for market trips (C-index of 0.45 and 0.393) and for education trips in the 80/20 split (0.4545); values near or below 0.5 indicate little or no discriminative capability. Error metrics are correspondingly high, particularly for the “Other” category under 5-fold cross-validation (MAE = 7.3472, RMSE = 12.9006).

Table 4. Simple Cox model evaluation results.

Trip Purpose	Model Type	C-index	MAE	RMSE
Work	80/20 Split	0.6746	5.9995	6.6242
Work	5-Fold CV	0.6788	5.9963	6.7622
Market	80/20 Split	0.45	3.72	4.8572
Market	5-Fold CV	0.393	3.0232	3.8979
Education	80/20 Split	0.4545	3.7411	4.7305
Education	5-Fold CV	0.6718	4.5595	5.3528
Other	80/20 Split	0.8421	3.4673	4.5029
Other	5-Fold CV	0.7071	7.3472	12.9006

Table 5 evaluates the Cox model with frailty. The inclusion of latent classes models unobserved heterogeneity, but its effect on performance is mixed across trip purposes. For Market trips, the C-index rises to about 0.5, and RMSE improves under both evaluation schemes relative to the simple Cox model (3.7552 vs. 4.8572 and 3.5082 vs. 3.8979), while MAE improves only in the 80/20 split. For Work trips, MAE and RMSE decrease, but the C-index falls below 0.5. For the “Other” category, the C-index is lower than that of the simple Cox model in both the 80/20 train/test split and the 5-fold cross-validation.

Table 5. Cox model with frailty evaluation results.

Trip Purpose	Model Type	C-index	MAE	RMSE
Work	80/20 Split	0.4319	4.5439	5.2035
Work	5-Fold CV	0.4753	4.6526	5.2832
Market	80/20 Split	0.5	3.4341	3.7552
Market	5-Fold CV	0.5164	3.2440	3.5082
Education	80/20 Split	0.7879	4.1424	4.8065
Education	5-Fold CV	0.507	3.4398	4.0277
Other	80/20 Split	0.2632	6.2377	6.7870
Other	5-Fold CV	0.4907	5.5641	6.3158

Table 6 shows the performance of the Cox-based machine learning model, which achieves the lowest MAE and RMSE of the three models for every trip purpose under both evaluation schemes. For example, the MAE for Work trips drops to approximately 2.24, compared with approximately 6.0 for the simple Cox model. Its C-index is the highest of the three models for Work trips (0.7005 and 0.6814), but it is lower than that of the simple Cox model for “Other” trips and very low for Market trips in the 80/20 split (0.15), a result that may reflect the small number of Market observations. Overall, the lower error metrics suggest that the machine learning model captures complex, non-linear relationships in the data.

Table 6. Cox-based machine learning model evaluation results.

Trip Purpose	Model Type	C-index	MAE	RMSE
Work	80/20 Split	0.7005	2.2353	3.1947
Work	5-Fold CV	0.6814	2.2546	3.2137
Market	80/20 Split	0.15	1.4444	1.7321
Market	5-Fold CV	0.404	2.3028	2.9354
Education	80/20 Split	0.4848	3.2857	4.0883
Education	5-Fold CV	0.6707	2.2667	2.9981
Other	80/20 Split	0.6842	1.8571	2.2361
Other	5-Fold CV	0.651	2.4238	3.1286

Table 7 compares the simple Cox model, using AIC, with the Cox model with frailty, using WAIC. For all trip purposes, the frailty-based model yields lower values (e.g., WAIC = 103.1 for Market trips vs. AIC = 285.44), suggesting a better fit once unobserved heterogeneity and model complexity are accounted for. These findings suggest that incorporating latent class structures improves the model’s explanatory power, whereas the machine learning model remains superior in predictive performance. However, AIC and WAIC are not directly comparable, so these differences should be interpreted only as rough indicators at this stage of the analysis.

Table 7. Cox models comparison.

Trip Purpose	Model Type	Criterion	Value
Work	Cox without frailty	AIC	3162.45
Work	Cox with frailty	WAIC	1590.6
Other	Cox without frailty	AIC	171.83
Other	Cox with frailty	WAIC	166.17
Market	Cox without frailty	AIC	285.44
Market	Cox with frailty	WAIC	103.1
Education	Cox without frailty	AIC	257.65
Education	Cox with frailty	WAIC	229.44

The tests in Section 4.1 are preliminary and serve to benchmark the relative performance of the Cox-based models in isolation. While they suggest a predictive advantage for the machine learning model, they do not yet reflect the models’ real-world impact on synthetic data generation. Section 4.2 therefore extends the analysis by directly comparing the synthetic data generated with each model against the observed data. This evaluation helps determine whether improvements in model structure translate into more realistic and reliable synthetic activity-travel patterns.

4.2. Comparison of Synthetically Generated Data with Real-World Data

Table 8, Table 9, Table 10 and Table 11 assess the realism of the synthetically generated activity duration data against the observed real-world survey data. Four synthetic data configurations are evaluated per trip purpose using the K–S test, MAE, and RMSE. “Synth. Data 1” denotes the synthetic data produced when the input travel survey is given to the ABM as-is, without using any of the Cox or Cox-based models to repair its missing activity durations or the other inconsistencies presented in Section 3. “Synth. Data 2” and “Synth. Data 3” are the synthetic datasets obtained when the input travel survey is repaired with the simple Cox model and the Cox model with two latent classes, respectively. Finally, “Synth. Data 4” denotes the synthetic data obtained when the input travel survey is repaired with the Cox-based machine learning model.

Table 8 focuses on Work-related activities. Synth. Data 3 has the lowest MAE (3.66) and RMSE (4.72), closely followed by Synth. Data 2, while Synth. Data 4 has the lowest K–S statistic (D = 0.16116), followed by Synth. Data 3 (D = 0.20822). All repaired configurations improve on the unrepaired Synth. Data 1 in all three metrics, although the K–S test still rejects the equality of the synthetic and real distributions for every configuration (p < 0.001).

Table 8. Comparison of real vs. synthetic data for work purposes.

Metrics	Synth. Data 1	Synth. Data 2	Synth. Data 3	Synth. Data 4
K–S test (D/p-value)	0.2939/ <2.2 × 10⁻¹⁶	0.25486/ <2.2 × 10⁻¹⁶	0.20822/ 3.46 × 10⁻¹¹	0.16116/ 6.885 × 10⁻⁸
MAE	4.572079	3.707881	3.664099	4.195042
RMSE	5.609498	4.739945	4.723046	5.323675

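Table 9 pertains to Market trips. Synth. Data 3 has the lowest RMSE (3.59) and a K–S statistic (D = 0.4235) almost identical to the lowest one, obtained by Synth. Data 4 (D = 0.4213); both are roughly half that of the unrepaired Synth. Data 1 (D = 0.8619). The lowest MAE is obtained by Synth. Data 2 (2.32). Overall, Synth. Data 3 offers the best balance across the three metrics, although the K–S test rejects distributional equality for all configurations.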

Table 9. Comparison of real vs. synthetic data for Market purposes.

Metrics	Synth. Data 1	Synth. Data 2	Synth. Data 3	Synth. Data 4
K–S test (D/p-value)	0.8619/ 2.2 × 10⁻¹⁶	0.62409/ 1.73 × 10⁻¹⁴	0.4235/ 3.71 × 10⁻⁷	0.4213/ 2.033 × 10⁻⁷
MAE	2.685109	2.3248	2.407971	2.841279
RMSE	3.932881	3.943628	3.591605	4.174001

Table 10 presents the results for Educational activities. All synthetic datasets exhibit similar MAE and RMSE values. Synth. Data 3 has the lowest K–S statistic (D = 0.18736) and is the only configuration for which the K–S test does not reject the equality of the synthetic and real distributions at the 5% level (p = 0.1031). Given the minimal intervention in educational trip durations, these results suggest that this activity type is inherently stable.

Table 10. Comparison of real vs. synthetic data for education purposes.

Metrics	Synth. Data 1	Synth. Data 2	Synth. Data 3	Synth. Data 4
K–S test (D/p-value)	0.25078/ 0.000972	0.2253/ 0.02735	0.18736/ 0.1031	0.24032/ 0.01534
MAE	4.047721	3.91789	3.951568	3.971024
RMSE	5.132352	5.062376	5.040041	4.999081

Table 11 evaluates the “Other” category. Although all synthetic datasets show higher errors than for the other trip purposes, Synth. Data 3 again has the lowest MAE (4.39) and RMSE (5.85), while Synth. Data 2 has the lowest K–S statistic (D = 0.2801). All repaired configurations improve on the unrepaired Synth. Data 1 in all three metrics. These outcomes support the use of more sophisticated duration modeling (e.g., latent class or machine learning-based) to enhance the realism of synthetic data.

Table 11. Comparison of real vs. synthetic data for other purposes.

Metrics	Synth. Data 1	Synth. Data 2	Synth. Data 3	Synth. Data 4
K–S test (D/p-value)	0.47753/ 8.42 × 10⁻⁸	0.2801/ 0.005868	0.35482/ 0.0001707	0.31586/ 0.001193
MAE	4.755	4.5188	4.3898	4.45858
RMSE	6.1339	5.9359	5.8461	5.921535

Collectively, the analysis suggests that advanced Cox-based modeling approaches, particularly those that incorporate unobserved heterogeneity or leverage machine learning to capture complex, non-linear relationships, improve the realism and predictive accuracy of synthetic activity-travel datasets. The gains are most visible in the more variable trip categories, such as Work and Other, where conventional approaches often fail to capture nuanced duration patterns. By combining model diagnostics (Section 4.1) with the validation of synthetic data (Section 4.2), the results underscore the role of both latent class modeling and neural network-based survival analysis in generating synthetic travel patterns that are behaviorally realistic and statistically robust.

4.3. ABM’s Outputs and Traffic Analysis Results

One of the features of the open-source ABM used in this research is traffic analysis based on the generated synthetic data. The results of this analysis, plotted in Figure 4, Figure 5, Figure 6, Figure 7 and Figure 8 of this section and in Figure A1 and Figure A2 of Appendix A, can serve as useful input to any transport planning process.

Figure 4 and Figure 5 show the hourly distribution of trips by income group before and after the intervention, respectively. While the overall structure of trip activity remains consistent, noticeable shifts can be observed, particularly among lower-income groups, whose trip frequencies are redistributed toward the midday period after the intervention. This suggests that the refined activity duration estimates bring the synthetic behavior closer to known empirical patterns. In contrast, higher-income groups maintain a bimodal distribution centered on conventional commuting hours, with slightly sharper peaks after the intervention, indicative of improved modeling of work-related trip timings.

Figure 6 further reinforces this finding by juxtaposing the aggregate hourly trip percentages across the two scenarios (pre- and post-intervention). The inclusion of Cox-based duration estimation yields a smoother and more realistic temporal curve, especially during late afternoon and evening periods, where unprocessed data previously underrepresented trip volume. These refinements support the hypothesis that data-driven duration correction enhances the fidelity of synthetic activity chains.

Figure 4. Percentage of Trips by Hour and by Income group before the intervention proposed in this research.

Figure 5. Percentage of Trips by Hour and by Income group after the intervention proposed in this research.

Figure 6. Hourly comparison of the percentage of trips for the scenarios in which the workflow does not include the Cox-based models for activity duration estimation (purple) and in which it does (orange).

In spatial terms, Figure 7 and Figure 8 depict traffic density distributions before and after the intervention, focusing on market-related and car-mode trips, respectively. The maps use a color gradient, from light blue for lower densities to magenta for higher densities, to highlight variations across both time and space. Notably, post-intervention distributions show broader and more nuanced traffic dispersal, particularly in peripheral zones, indicating a better representation of non-central urban activity. This suggests that the intervention enables the ABM to capture not only core commuting behaviors but also discretionary and decentralized trip patterns more effectively.

Figure A1 and Figure A2, included in Appendix A, present cumulative frequency plots of trips per age group generated by the ABM using the original and enhanced workflows, respectively. A comparative inspection reveals that the intervention leads to a more proportionate distribution of trips across age groups, particularly among younger cohorts (e.g., 18–35), who exhibit increased activity in the updated model. This correction likely reflects the influence of latent behavioral variables captured by the Cox model with frailty, which were previously underrepresented in the original workflow.

Taken together, these figures demonstrate the multidimensional value of incorporating advanced duration modeling into synthetic activity generation. Not only do they enhance temporal realism and spatial accuracy, but they also improve the demographic representativeness of the synthetic population. These improvements are of particular relevance for transport planning applications where precision in demand modeling across time, space, and user profiles is critical for robust policy evaluation and infrastructure design.

Figure 7. A comparative analysis of Market Trips density before and after the intervention: Areas shaded in light blue indicate lower traffic density, whereas regions highlighted in magenta correspond to periods of higher traffic concentration (sourced from PAM).

Figure 8. A comparative analysis of Car Mode trip density before and after the intervention: Areas shaded in light blue indicate lower traffic density, whereas regions highlighted in magenta correspond to periods of higher traffic concentration (sourced from PAM).

5. Discussion

Considering the results of our study, the missing durations are filled in more accurately by the Cox-based machine learning model and the Cox frailty model with two latent classes than by a simple Cox model, and markedly better than by assuming a normal or Weibull distribution (as done in the ABM’s standard scenario).

With the proposed Cox-based workflow, the daily movement schedules by trip purpose and transport mode show no major differences across the approaches followed. However, considerable changes appear, mainly toward the end of the day, since most interventions concern activity chains in which the end-of-day “return home” was not reported. In that regard, the synthetic data generated with the proposed workflow differ noticeably from those of the original workflow.

Some more specific conclusions regarding the traffic behavior of the travelers in the generated datasets are:

In shopping-related trips, a wider range of durations is observed, with several trips lasting 2–3 h, as expected. Activity is higher in the afternoon, particularly between 18:00 and 19:30.
Work-related trips show greater variability in duration and end times, unlike the unprocessed data, where trip endings concentrate in the afternoon.
Other and service-related trips display a previously absent range of durations, with more trips lasting 1.5–3 h. While most occur in the afternoon, service-related trips show increased morning activity.
Educational trips remain largely unchanged, likely due to minimal intervention. Their start times, end times, and durations follow a more irregular pattern than those of other trip purposes.
Increased percentage of use of cars and trains/metro during peak afternoon hours.
Slightly reduced percentage of use of motorcycles and buses in the afternoon.
Higher usage percentage of e-scooters and bicycles at night.
Greater walking activity in the afternoon.
Rise in both percentage and absolute number of trips during peak hours (17:00–20:00, 22:00–00:00), with a relative decline at 16:00 and 20:00.
The high-income category aligns with typical working hours, showing peak trip percentages in both morning and afternoon. The low-income category exhibits a midday peak, absent in other groups.
Fewer trips are associated with the high-income category, while the remaining groups follow similar patterns.
Older age groups correspond to fewer trips, while younger groups travel more in the synthetic data produced by the updated workflow.

Some limitations of the proposed research are discussed next. This study used a small survey conducted by NTUA, comprising 513 observations with different daily activity chains. Adding the two latent classes, which capture unobservable patterns affecting durations (hidden groups), increases model complexity, so the more training data the model receives, the more reliable its results will be. Moreover, since all models are estimated separately for each travel purpose, complexity is reduced, but so is the data available to each model (e.g., only 27 Market trips), which makes it harder to uncover these hidden patterns. Finally, the authors acknowledge that the Cox models’ scalability has to be examined with larger datasets, since the stability of the latent classes may vary with larger or more complex datasets.

The issue of data collection remains both significant and persistent, particularly in the context of model generalizability. While large-scale surveys are essential for accurate forecasting and effective transport planning, their absence limits the robustness and scalability of modeling approaches. The current study, based on a relatively small sample, faces inherent challenges in generalizing its findings to larger populations. This limitation is especially relevant when using models such as the Cox model with frailty, which relies on identifying latent behavioral groups. Expanding the number of latent classes (e.g., from two to four) and refining model variables would require more comprehensive survey data, ideally incorporating observations from multiple days, household-level characteristics, and broader demographic representation, to improve both accuracy and applicability at scale.

Another important aspect of this research is the application of the proposed models within an open-source software environment. Specifically, the PAM library was used for population synthesis and travel pattern generation in the context of the ABM. While the use of open-source tools facilitates the integration of the Cox model with frailty and supports adaptability across different applications, a limitation of the current study is its reliance on a single ABM framework. Future research could benefit from evaluating the proposed methodology across a broader range of ABM platforms to assess its generalizability and performance in varied modeling environments. Nonetheless, the open nature of such software promotes transparency, methodological reuse, and continued innovation in activity-pattern modeling and traffic analysis.

6. Conclusions

In this paper, a Cox model-based workflow is presented to fill in missing or erroneous durations and thereby enrich OD surveys. Since these surveys are the input to ABMs, the ultimate goal is to generate more accurate synthetic activity-travel patterns. The results highlight the importance of accounting for unobserved heterogeneity and/or non-linear patterns, as travelers’ behavior is multidimensional and hard for a survey to describe fully through its variables. Nonetheless, some limitations should be acknowledged. First, the small survey size may limit the generalizability of the proposed workflow and models, as unobserved heterogeneity might not be fully captured. Second, while the Cox model allows the baseline hazard to vary over time, it assumes that the effect of the covariates on the hazard rate remains constant over time (the proportional hazards assumption). This implies that external conditions, such as changes in weather or congestion, do not influence activity durations in the way they might in a more complex model. Finally, assumptions regarding missing data patterns could introduce bias if the unobserved mechanisms differ from those modeled.

Future research could explore expanding the dataset to include more diverse scenarios and behavioral contexts, which could help improve the model’s generalization. Additionally, integrating dynamic modeling approaches that account for temporal fluctuations in activity patterns (e.g., seasonal trends, time-of-day effects) could enhance the model’s ability to capture variability in activity durations. Finally, exploring alternative statistical techniques to account for unobserved heterogeneity more flexibly could also improve the precision of estimated durations.

Author Contributions

Conceptualization, D.K., D.R., and K.G.; methodology, D.K., D.R., and K.G.; software, D.K., and D.R.; validation, D.K., D.R., and K.G.; data curation, D.K.; writing—original draft preparation, D.K., D.R., and K.G.; writing—review and editing, D.K., D.R., and K.G.; visualization, D.K.; supervision, K.G.; funding acquisition, K.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Restrictions apply to the availability of these data. Data were obtained from the LoTE Group at NTUA and are available from the corresponding author with the permission of the LoTE Group at NTUA.

Acknowledgments

We would like to thank Panagiotis G. Tzouras for his support during the initial stages of this research.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ABM	Activity-based Modeling
AIC	Akaike Information Criterion
C-index	Concordance Index
DOAJ	Directory of open access journals
GPS	Global Positioning System
HOH	Home-Other-Home
HWH	Home-Work-Home
IBS	Integrated Brier Score
K–S	Kolmogorov–Smirnov
MAE	Mean Absolute Error
MDPI	Multidisciplinary Digital Publishing Institute
NTUA	National Technical University of Athens
OD	Origin-Destination
PAM	Population Activity Modeler
RMSE	Root Mean Square Error
WAIC	Watanabe–Akaike Information Criterion

Appendix A

The Mean Absolute Error (MAE) is defined as:

\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_{i} - \hat{y}_{i} \right|

where $y_{i}$ is the actual duration for observation $i$, $\hat{y}_{i}$ is the predicted duration for the same observation, and $n$ is the number of observations. MAE measures the average magnitude of the prediction errors.

Similarly, the Root Mean Square Error (RMSE) is defined as:

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_{i} - \hat{y}_{i} \right)^{2}}

Using the same notation, RMSE gives more weight to large errors because the differences are squared, which highlights the influence of potential outliers.
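The two error metrics can be computed directly from their definitions. The sketch below uses plain Python and hypothetical activity durations in minutes. These values are not data from the study. It only shows how MAE and RMSE react differently to a single large error.

```python
import math
from typing import Sequence


def _check(actual: Sequence[float], predicted: Sequence[float]) -> int:
    """Validate inputs and return n, the number of observations."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    return len(actual)


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Error: average of |y_i - y_hat_i|."""
    n = _check(actual, predicted)
    return sum(abs(y - y_hat) for y, y_hat in zip(actual, predicted)) / n


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root Mean Square Error: square root of the mean of (y_i - y_hat_i)^2."""
    n = _check(actual, predicted)
    return math.sqrt(sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / n)


# Hypothetical activity durations in minutes (illustrative only).
actual = [30.0, 45.0, 120.0, 60.0]
predicted = [35.0, 40.0, 90.0, 60.0]

print(f"MAE  = {mae(actual, predicted):.2f} min")
print(f"RMSE = {rmse(actual, predicted):.2f} min")
```

With errors of 5, 5, 30 and 0 minutes, MAE is 10 minutes. RMSE is about 15.4 minutes because the single 30-minute error dominates the squared sum.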

Figure A1. Cumulative Frequency of Number of Trips per Age group for a synthetic population based on the initial workflow (without the intervention proposed in this paper).

Figure A2. Cumulative Frequency of Number of Trips per Age group for a synthetic population based on the Cox-based workflow (including the intervention proposed in this paper).

References

1. Fina, S.; Joshi, J.; Wittowsky, D. Monitoring travel patterns in German city regions with the help of mobile phone network data. Int. J. Digit. Earth 2021, 14, 379–399.
2. Shen, S.; Koech, W.; Feng, J.; Rice, T.M.; Zhu, M. A cross-sectional study of travel patterns of older adults in the USA during 2015: Implications for mobility and traffic safety. BMJ Open 2017, 7, e015780.
3. Barmpounakis, E.; Geroliminis, N. On the new era of urban traffic monitoring with massive drone data: The pNEUMA large-scale field experiment. Transp. Res. Part C Emerg. Technol. 2020, 111, 50–71.
4. Hörl, S.; Balac, M. Synthetic population and travel demand for Paris and Île-de-France based on open and publicly available data. Transp. Res. Part C Emerg. Technol. 2021, 130, 103291.
5. Yee, J.L.; Niemeier, D.A. Analysis of activity duration using the Puget Sound transportation panel. Transp. Res. Part A Policy Pract. 2000, 34, 607–624.
6. TomTom. 2024. Available online: https://www.tomtom.com/traffic-index/ranking/ (accessed on 31 March 2025).
7. Canalys. Available online: https://www.canalys.com/newsroom/europe-smartphone-market-Q1-2024 (accessed on 31 March 2025).
8. PewResearch. 2024. Available online: https://www.pewresearch.org/internet/2012/05/11/three-quarters-of-smartphone-owners-use-location-based-services/ (accessed on 31 March 2025).
9. DataReportal. 2024. Available online: https://datareportal.com/reports/digital-2024-deep-dive-5-billion-social-media-users (accessed on 31 March 2025).
10. Gkiotsalitis, K.; Stathopoulos, A. A utility-maximization model for retrieving users’ willingness to travel for participating in activities from big-data. Transp. Res. Part C Emerg. Technol. 2015, 58, 265–277.
11. Li, J.; Rombaut, E.; Vanhaverbeke, L. A Stepwise Approach of Generating Agent-based Simulation Model for Brussels Using Ubiquitous Big Data. Transp. Res. Procedia 2023, 72, 2261–2268.
12. Chen, C.; Ma, J.; Susilo, Y.; Liu, Y.; Wang, M. The promises of big data and small data for travel behavior (aka human mobility) analysis. Transp. Res. Part C Emerg. Technol. 2016, 68, 285–299.
13. Munizaga, M.A.; Palma, C. Estimation of a disaggregate multimodal public transport Origin–Destination matrix from passive smartcard data from Santiago, Chile. Transp. Res. Part C Emerg. Technol. 2012, 24, 9–18.
14. Ahmed, B. The Traditional Four Steps Transportation Modeling Using Simplified Transport Network: A Case Study of Dhaka City, Bangladesh. Int. J. Adv. Sci. Eng. Technol. Res. 2012, 1, 19–40.
15. Ziemke, D.; Kaddoura, I.; Nagel, K. The MATSim Open Berlin Scenario: A multimodal agent-based transport simulation scenario based on synthetic demand modeling and open data. Procedia Comput. Sci. 2019, 151, 870–877.
16. Tozluoğlu, Ç.; Dhamal, S.; Yeh, S.; Sprei, F.; Liao, Y.; Marathe, M.; Barrett, C.L.; Dubhashi, D. A synthetic population of Sweden: Datasets of agents, households, and activity-travel patterns. Data Brief 2023, 48, 109209.
17. Joubert, J.W. Synthetic populations of South African urban areas. Data Brief 2018, 19, 1012–1020.
18. Aljoufie, M.; Zuidgeest, M.; Brussel, M.; van Vliet, J.; van Maarseveen, M. A cellular automata-based land use and transport interaction model applied to Jeddah, Saudi Arabia. Landsc. Urban. Plan. 2013, 112, 89–99.
19. Wang, Y.; Monzon, A.; Di Ciommo, F. Assessing the accessibility impact of transport policy by a land-use and transport interaction model – The case of Madrid. Comput. Environ. Urban. Syst. 2015, 49, 126–135.
20. Guzman, L.A.; Gomez, A.M.; Rivera, C. A Strategic Tour Generation Modeling within a Dynamic Land-Use and Transport Framework: A Case Study of Bogota, Colombia. Transp. Res. Procedia 2017, 25, 2536–2551.
21. Ortega, J.; Hamadneh, J.; Esztergár-Kiss, D.; Tóth, J. Simulation of the Daily Activity Plans of Travelers Using the Park-and-Ride System and Autonomous Vehicles: Work and Shopping Trip Purposes. Appl. Sci. 2020, 10, 2912.
22. Gkiotsalitis, K.; Stathopoulos, A. Predicting Traveling Distances and Unveiling Mobility and Activity Patterns of Individuals from Multisource Data. J. Transp. Eng. A Syst. 2020, 146, 04020025.
23. Sallard, A.; Balac, M. Travel demand generation using Bayesian Networks: An application to Switzerland. Procedia Comput. Sci. 2023, 220, 267–274.
24. He, B.Y.; Zhou, J.; Ma, Z.; Chow, J.Y.J.; Ozbay, K. Evaluation of city-scale built environment policies in New York City with an emerging-mobility-accessible synthetic population. Transp. Res. Part A Policy Pract. 2020, 141, 444–467.
25. Jovicic, G.; Hansen, C.O. A passenger travel demand model for Copenhagen. Transp. Res. Part A Policy Pract. 2003, 37, 333–349.
26. Rizopoulos, D.; Esztergár-Kiss, D. A modal share scenario evaluation framework including electric vehicles. Res. Transp. Bus. Manag. 2024, 56, 101201.
27. Gong, L.; Han, P.; Lei, T.; Li, B.; Luo, Q.; Zhu, C. Analyzing the transfer duration of public transport passengers using classification and regression tree-multiple-Cox proportional hazards (CART-Multi-Cox) model. Transp. Lett. 2024, 1–16.
28. Raux, C.; Ma, T.-Y.; Joly, I.; Kaufmann, V.; Cornelis, E.; Ovtracht, N. Travel and activity time allocation: An empirical comparison between eight cities in Europe. Transp. Policy 2011, 18, 401–412.
29. Kalatian, A.; Farooq, B. DeepWait: Pedestrian Wait Time Estimation in Mixed Traffic Conditions Using Deep Survival Analysis. In Proceedings of the 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand, 27–30 October 2019.
30. Bhat, C.R. A generalized multiple durations proportional hazard model with an application to activity behavior during the evening work-to-home commute. Transp. Res. Part B Methodol. 1996, 30, 465–480.
31. Nishiuchi, H.; Chikaraishi, M. Identifying Passengers Who Are at Risk of Reducing Public Transport Use: A Survival Time Analysis Using Smart Card Data. Transp. Res. Procedia 2018, 34, 291–298.
32. Mbotwa, J.; de Kamps, M.; Baxter, P.D.; Gilthorpe, M.S. Application of Cox Model to predict the survival of patients with Chronic Heart Failure: A latent class regression approach. arXiv 2019, arXiv:1907.07957.
33. Sinha, P.M.; Calfee, C.S.M.; Delucchi, K.L. Practitioner’s Guide to Latent Class Analysis: Methodological Considerations and Common Pitfalls. Crit. Care Med. 2021, 49, e63–e79.
34. Sreela, P.K.; Melayil, S.; Anjaneyulu, M.V.L.R. Modeling of Shopping Participation and Duration of Workers in Calicut. Procedia Soc. Behav. Sci. 2013, 104, 543–552.
35. Bhat, C.R. A hazard-based duration model of shopping activity with nonparametric baseline specification and nonparametric control for unobserved heterogeneity. Transp. Res. Part B Methodol. 1996, 30, 189–207.
36. Hamed, M.M.; Easa, S.M. Integrated Modeling of Urban Shopping Activities. J. Urban. Plan. Dev. 1998, 124, 115–131.
37. Enam, A.; Auld, J. Hazard-Based Model of Activity Generation Using Vehicle Trajectory Data. Procedia Comput. Sci. 2020, 170, 764–770.
38. Liu, C.; Zuo, X.; Gu, X.; Shao, M.; Chen, C. Activity Duration under the COVID-19 Pandemic: A Comparative Analysis among Different Urbanized Areas Using a Hazard-Based Duration Model. Sustainability 2023, 15, 9537.
39. Van den Berg, P.; Arentze, T.; Timmermans, H. A latent class accelerated hazard model of social activity duration. Transp. Res. Part A Policy Pract. 2012, 46, 12–21.
40. Li, Y.; Dai, Z.; Zhu, L.; Liu, X. Analysis of spatial and temporal characteristics of citizens’ mobility based on e-bike GPS trajectory data in Tengzhou City, China. Sustainability 2019, 11, 5003.
41. Tilahun, N.; Levinson, D. Contacts and Meetings: Location, Duration and Distance Traveled. 2009. Available online: https://ideas.repec.org/p/nex/wpaper/contactsandmeetings.html (accessed on 15 May 2025).
42. Moeckel, R.; Huang, W.-C.; Ji, J.; Llorca, C.; Moreno, A.T.; Staves, C.; Zhang, Q.; Erhardt, G.D. The Activity-based model ABIT: Modeling 24 hours, 7 days a week. Transp. Res. Procedia 2024, 78, 499–506.
43. Liao, X.; Jiang, Q.; He, B.Y.; Liu, Y.; Kuai, C.; Ma, J. Deep Activity Model: A Generative Approach for Human Mobility Pattern Synthesis. arXiv 2024, arXiv:2405.17468.
44. Alsger, A.; Tavassoli, A.; Mesbah, M.; Ferreira, L.; Hickman, M. Public transport trip purpose inference using smart card fare data. Transp. Res. Part C Emerg. Technol. 2018, 87, 123–137.
45. Kharoufeh, J.P.; Goulias, K.G. Nonparametric identification of daily activity durations using kernel density estimators. Transp. Res. Part B Methodol. 2002, 36, 59–82.
46. Golshani, N.; Shabanpour, R.; Auld, J.; (Kouros) Mohammadian, A. Activity start time and duration: Incorporating regret theory into joint discrete–continuous models. Transp. A Transp. Sci. 2018, 14, 809–827.
47. He, B.Y.; Zhou, J.; Ma, Z.; Wang, D.; Sha, D.; Lee, M.; Chow, J.Y.; Ozbay, K. A validated multi-agent simulation test bed to evaluate congestion pricing policies on population segments by time-of-day in New York City. Transp. Policy 2021, 101, 145–161.
48. Shone, F.; Chatziioannou, T.; Pickering, B.; Kozlowska, K.; Fitzmaurice, M. PAM: Population Activity Modeller. J. Open Source Softw. 2024, 9, 6097.
49. Therneau, T.M.; Grambsch, P.M. The Cox Model. In Modeling Survival Data: Extending the Cox Model; Springer: New York, NY, USA, 2000; pp. 39–77.
50. Faraggi, D.; Simon, R. A neural network model for survival data. Stat. Med. 1995, 14, 73–82.
51. Katzman, J.L.; Shaham, U.; Cloninger, A.; Bates, J.; Jiang, T.; Kluger, Y. DeepSurv: Personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Med. Res. Methodol. 2018, 18, 24.

Figure 1. ABM workflow before the intervention proposed by the current research.

Figure 2. An example of how the “return-to-home” intervention improves an activity-travel pattern in the ABM survey.

Figure 3. ABM workflow after the intervention proposed in this article.

Table 1. Summary of the most closely related articles.

Research Area	Authors	Year	Decision Variables	Study Purpose	Type of Model
Travel Demand Modeling and Forecasting	J.L. Yee et al. [5]	2000	Socioeconomic Characteristics and Trip Characteristics	Estimating activity durations per purpose based on socioeconomic and trip characteristics	Non-Linear Model (Cox parametric hazard-based)
	Munizaga et al. [13]	2012	Location and Duration between consecutive payments	Recording OD matrices from public transport data	Discrete Non-Linear Model (Trip-based model)
	Bayes Ahmed [14]	2012	Population, Income, Land Cost, Unemployment	Forecasting OD matrices in 10 years	Linear Model
	Hörl et al. [4]	2021	Trip Characteristics	Creation of synthetic travel demand using open data to facilitate methodology replication	Discrete Non-Linear Model (Agent-based model)
	Aljoufie et al. [18]	2013	Land Use, Availability, Trip Cost	OD matrices and accessibility estimation	Discrete Non-Linear Model (LUTI model)
	Gkiotsalitis et al. [22]	2020	Location, Type of Activity, Travel Distance	Retrieve information for trip characteristics based on social media data	Hybrid Activity-Mobility Model with Machine Learning
	Enam et al. [37]	2020	Socioeconomic Characteristics and Trip Characteristics	Activity generation modeling from vehicle trajectory data to improve travel behavior prediction	Non-Linear Model (Weibull parametric hazard-based)
	Chunguang Liu et al. [38]	2023	Socioeconomic characteristics, Land Use, travel distances	Estimating activity duration based on socioeconomic characteristics and land use	Semi-parametric model (Cox)
	Tilahun et al. [41]	2009	Socioeconomic Characteristics and Trip Characteristics	Estimating locations, trip distance, and duration based on social characteristics	Non-Linear Model (path model)
	Rolf Moeckel et al. [42]	2024	Socioeconomic Characteristics and Trip Characteristics	Creation of a simulation environment, retrieving and forecasting trip characteristics information	Activity-based model
	Liao et al. [43]	2024	Socioeconomic Characteristics and Trip Characteristics	Generating accurate activity chains through a deep learning process	Activity-based model
	Alsger et al. [44]	2018	Spatial and Temporal variables, socioeconomic characteristics, Land Use	Estimation of travel purposes	Discrete Non-Linear Model (rule-based model)
Activity Duration estimation research	Tozluoglu et al. [16]	2023	Socioeconomic and spatial characteristics	Estimation of travel activity patterns and their Spatial-Temporal Distribution	Discrete Non-Linear Model (rule-based model)
	Dominik Ziemke et al. [15]	2019	Socioeconomic and trip characteristics	Population Synthesis and creation of a MATSim environment	Discrete Non-Linear Model (Agent-based model)
	Sallard et al. [23]	2023	Socioeconomic characteristics and trip purposes	Estimation of travel activity patterns	Discrete Non-Linear Model (machine learning)
	He, B.Y., et al. [24]	2020	Mode choice, Trip cost, trip duration	Spatial and Temporal Distribution of trips and Mode choice based on specific scenarios	Discrete Non-Linear Model (Tour-based model)
	Jovicic et al. [25]	2003	Population, Land Use, Car Ownership	Estimation of the number of trips and purposes for toll policy examination	Discrete Non-Linear Model (Tour-based model)
	Sreela P. et al. [34]	2013	Socioeconomic Characteristics and Trip Characteristics	Estimating workers’ shopping duration based on socioeconomic characteristics	Non-Linear Model (Weibull parametric)
	Chandra R. Bhat [35]	1996	Socioeconomic Characteristics and Trip Characteristics	Estimating shopping durations based on trip characteristics while accounting for unobserved heterogeneity	Non-Linear Model (Weibull parametric vs. nonparametric)
	M. Hamed [36]	1998	Socioeconomic and Trip Characteristics	Disaggregate modeling of shopping urban activities based on social characteristics and household	Non-Linear Model (Weibull parametric hazard)
	Kharoufeh et al. [45]	2002	Socioeconomic characteristics/gender	Examining a nonparametric pattern recognition tool to investigate covariate effects and heterogeneity in duration models	Non-Linear Model (Kernel density estimator)
	N. Golshani et al. [46]	2018	Socioeconomic characteristics and Start trip time	Estimating activity duration based on socioeconomic characteristics and travel time	Non-Linear Model (copula joint-based model)
Activity-Based Modeling and Simulation	Li, J., et al. [11]	2023	Population, Location, Start trip time	Estimation of activity chains through ABM simulation	Discrete Non-Linear Model (Agent-based model)
	Chen et al. [12]	2016	Location, Start trip time, Duration	Estimation of OD Matrix and trip purposes	Discrete Choice Non-Linear Model
	Yixiao Li et al. [40]	2019	Location, Start trip time and travel speed	Estimation of travel activity patterns	Discrete Non-Linear Model (spatial statistical model)
	Pauline Van den Berg et al. [39]	2012	Socioeconomic Characteristics and Trip Characteristics	Estimating social activity durations by latent class based on social characteristics	Non-Linear Model (Weibull parametric)
Scenario Analysis and Policy Evaluation	Gkiotsalitis et al. [10]	2015	Start trip time and type of trip	Forecasting traveled distances and travel patterns	Discrete Non-Linear Model (machine learning)
	He, B.Y., et al. [47]	2021	Mode choice, Spatial Distribution	Examination of toll policy scenarios based on different pricing policies	Discrete Non-Linear Model (Agent-based model)
	Joubert J. [17]	2018	Socioeconomic Characteristics	Population Synthesis in order to be input for MATSim	Discrete Non-Linear Model (Agent-based model)
	Y. Wang et al. [19]	2015	Socioeconomic characteristics, accessibility, land use	Exploration of scenarios and their evaluation based on financial conclusions	Discrete Non-Linear Model (LUTI model)

Table 2. Cox model’s characteristics.

Sets
$K$	Set of covariates included as predictive factors in the model
$C$	Set of latent classes
$I$	Set of observations
Parameters
$\beta_{c,k}$	Coefficient of covariate $k$ in latent class $c$
$z_{\text{frailty}}$	Latent frailty variables for each class
$\sigma_{\text{frailty}}$	Standard deviation of the frailty terms
$\text{class}_{\text{probs}}$	Probabilities of belonging to each latent class
Variables
$\text{activity dur}_{i}$	Duration of the activity (survival time) for observation $i$
$X_{i,k}$	Value of predictor variable $k$ for observation $i$ (the row $X_{i}$ is the covariate vector)
$\text{status}_{i}$	Censoring indicator (1 = event occurred, 0 = censored)
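Table 2 lists the ingredients of the latent-class Cox model. To show how the core pieces (durations, censoring status, covariates and coefficients) enter estimation, the sketch below computes the standard Cox partial log-likelihood for a single class, using Breslow's treatment of tied durations. It is not the authors' implementation. The latent classes, class probabilities and frailty terms of Table 2 are deliberately left out, and the function and variable names are hypothetical.

```python
import math
from typing import Sequence


def cox_partial_log_likelihood(
    activity_dur: Sequence[float],
    status: Sequence[int],
    X: Sequence[Sequence[float]],
    beta: Sequence[float],
) -> float:
    """Cox partial log-likelihood for one class, Breslow handling of tied durations.

    activity_dur[i]: observed duration (survival time) of observation i
    status[i]:       1 if the activity ended (event observed), 0 if censored
    X[i]:            covariate vector of observation i
    beta:            one coefficient per covariate
    """
    n = len(activity_dur)
    # Linear predictor eta_i = beta . X_i; no intercept, since it cancels in the ratio.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    log_lik = 0.0
    for i in range(n):
        if status[i] != 1:
            continue  # censored observations contribute only through the risk sets
        # Risk set: observations whose activity is still ongoing at activity_dur[i].
        denom = sum(math.exp(eta[j]) for j in range(n) if activity_dur[j] >= activity_dur[i])
        log_lik += eta[i] - math.log(denom)
    return log_lik


# Hypothetical toy data: a larger covariate value goes with a shorter activity.
durations = [10.0, 20.0, 30.0]
events = [1, 1, 1]
covariates = [[2.0], [1.0], [0.0]]

print(cox_partial_log_likelihood(durations, events, covariates, [0.0]))  # equals -log(6)
print(cox_partial_log_likelihood(durations, events, covariates, [1.0]))  # about -0.72
```

A positive coefficient fits this toy data better (higher log-likelihood), because observations with larger covariate values end their activities first. An optimizer would search over `beta` to maximize this quantity.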

Table 3. Survey variables for the activity-travel patterns that are also used as features for training the models.

Category	Variable Name	Type	Value Mappings
Demographic	Gender	Categorical	1: male, 0: female
	Age	Continuous	Age in years
	Education	Categorical	1: Primary School, 2: High School, 3: Bachelor, 4: Master or PhD
	Employment	Categorical	1: Inactive, 2: Unemployed, 3: Student, 4: Active
	Income	Categorical	0: No income, 1: ≤750, 2: 750–1500, 3: 1500–2500, 4: ≥2500
	Car_own	Categorical	0: No, 1: Yes
Spatial Variables	Dest	Categorical	1: Central Athens, 2: West Athens, 3: East Attica, 4: South Athens, 5: North Athens, 6: Piraeus
	Home	Categorical	Same as Dest
Mode Choice	Mode	Categorical	1: car, 2: taxi, 3: bus, 4: train, 5: motorcycle, 6: bicycle, 7: walk, 8: E-scooter
Temporal Variables	Time	Continuous	Start trip time (24-h format)
Distance	Dist	Continuous	Distance
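Table 3 mixes continuous variables with integer-coded categorical answers. Codes such as Mode = 3 (bus) carry no numeric meaning, so a common way to pass them to a Cox model is dummy (one-hot) encoding with one reference level per variable. The reference level is needed because the Cox model has no intercept (the baseline hazard absorbs it), so a full set of dummies would be collinear. The sketch below applies this to Mode and Employment only. The encoding choice, the decimal-hour format assumed for Time and all function names are assumptions, not the authors' documented preprocessing.

```python
from typing import Dict, Mapping

# Code lists transcribed from Table 3; the dictionary names are hypothetical.
MODE_CODES: Dict[int, str] = {
    1: "car", 2: "taxi", 3: "bus", 4: "train",
    5: "motorcycle", 6: "bicycle", 7: "walk", 8: "e_scooter",
}
EMPLOYMENT_CODES: Dict[int, str] = {1: "inactive", 2: "unemployed", 3: "student", 4: "active"}


def one_hot(value: int, codes: Mapping[int, str], prefix: str) -> Dict[str, int]:
    """Dummy-encode a coded answer, dropping the lowest code as the reference category."""
    if value not in codes:
        raise ValueError(f"unknown {prefix} code: {value}")
    non_reference = sorted(codes)[1:]
    return {f"{prefix}_{codes[k]}": int(k == value) for k in non_reference}


def encode_record(record: Mapping[str, float]) -> Dict[str, float]:
    """Turn one survey record (column names as in Table 3) into a flat feature dict."""
    features: Dict[str, float] = {
        "gender": int(record["Gender"]),    # already coded 0/1
        "car_own": int(record["Car_own"]),  # already coded 0/1
        "age": float(record["Age"]),
        "time": float(record["Time"]),      # start trip time, assumed as decimal hours
        "dist": float(record["Dist"]),
    }
    features.update(one_hot(int(record["Mode"]), MODE_CODES, "mode"))
    features.update(one_hot(int(record["Employment"]), EMPLOYMENT_CODES, "employment"))
    return features


# Hypothetical record: an employed bus user starting a trip at 08:30.
example = {"Gender": 1, "Age": 34, "Employment": 4, "Car_own": 0, "Mode": 3, "Time": 8.5, "Dist": 5.2}
print(encode_record(example))
```

Dest, Home, Education and Income could be encoded in the same way. Alternatively, the ordinal Education and Income codes could be kept as numbers if a monotonic effect is an acceptable assumption.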

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Katsaitis, D.; Rizopoulos, D.; Gkiotsalitis, K. A Cox Model-Based Workflow for Increased Accuracy in Activity-Travel Patterns Generation. Appl. Sci. 2025, 15, 6237. https://doi.org/10.3390/app15116237

