Proceeding Paper

Urban Mobility Modeling: Application to Seoul Bike-Sharing Data †

1 Euromed Polytechnic School, Euromed University of Fes, Fez 30030, Morocco
2 Artificial Intelligence Research and Applications Laboratory (AIRA), Faculty of Science and Technology, Hassan First University, Settat 26000, Morocco
* Author to whom correspondence should be addressed.
† Presented at the 7th edition of the International Conference on Advanced Technologies for Humanity (ICATH 2025), Kenitra, Morocco, 9–11 July 2025.
Eng. Proc. 2025, 112(1), 51; https://doi.org/10.3390/engproc2025112051
Published: 27 October 2025

Abstract

This study applies a model from the normal variance–mean mixture family to capture daily demand in urban bike sharing. We fit both a mixture-based model and a standard Gaussian model to the logarithmic returns of total daily rental counts from the Seoul Bike-Sharing Demand dataset. Parameter estimation is performed, and model performance is assessed using mean squared error (MSE). Using one year of hourly rental data aggregated to daily counts from the Seoul Bike dataset, we find that the mixture-based model substantially outperforms the Gaussian counterpart, achieving a lower MSE. These results suggest that models from the normal variance–mean mixture family are more effective at capturing the large fluctuations and outliers inherent in bike-sharing demand data compared to models assuming normally distributed returns.

1. Introduction

Bike-sharing systems have been widely implemented across many countries as part of urban strategies to encourage sustainable transportation. In South Korea, the capital city of Seoul introduced its public bike-sharing program, locally named Ddareungi and commonly referred to as Seoul Bike, in 2015. The initiative aimed to alleviate urban issues such as rising fuel costs, traffic congestion, and environmental pollution, while also encouraging a more active and healthier lifestyle for citizens. Initially launched along the Han River in October 2015, the program quickly expanded, with approximately 150 stations and 1500 bikes in operation within the first few months. By 2016, the system had grown substantially, eventually offering more than 20,000 bikes across Seoul. Today, with the aid of digital infrastructure, over 1500 rental stations operate city-wide, accessible 24/7 via mobile apps and internet-connected devices that provide real-time bike availability. Cycling has become a key urban transport mode owing to safety concerns with public transit, changing travel patterns, and remote work. It supports sustainability by cutting emissions, improving health, and enhancing public transport integration [1,2]. Predicting cycling demand is therefore crucial for effective planning and resource allocation, ensuring that these systems meet urban mobility goals efficiently.
Given the scale and dynamic nature of Seoul’s bike-sharing program, accurate modeling of rental demand is essential for operational efficiency and policy planning. In this study, we apply a flexible probabilistic framework to model the daily variability in bike rental demand. Specifically, we focus on the logarithmic returns of daily rental counts, aggregated from hourly usage data in the Seoul Bike-Sharing Demand dataset. To capture the heavy tails and irregular fluctuations often observed in such demand patterns due to factors like weather, holidays, and local events, we employ a model from the normal variance–mean mixture family. This class of distributions generalizes the normal model by allowing for skewness and heavy tails, which are common in real-world data but inadequately captured by Gaussian assumptions [3].
We estimate the parameters of the mixture model using regression on quantiles, which provides a data-driven approach to identifying the structure of variability across different points in the distribution. Model performance is assessed by comparing the fitted series to a Gaussian benchmark using mean squared error (MSE). The results indicate that the mixture-based model yields a considerably better fit, effectively reflecting the variation and extremes observed in the daily demand series. These findings highlight the relevance of flexible, non-Gaussian modeling frameworks in transportation analytics, particularly those enhanced by machine learning and deep learning, as demonstrated in recent lead time prediction models developed for industrial systems [4] and in deep anomaly detection methods using spatio-temporal clustering and congestion pattern analysis in urban mobility contexts [5].
The remainder of the paper is organized as follows. Section 2 gives an overview of the variance–mean mixture framework, its statistical properties, and the estimation procedure. Section 3 describes the dataset, the preprocessing, and the empirical results. Finally, Section 4 concludes with a discussion of our findings and directions for future research.

2. The Normal Variance–Mean Mixture Model

Normal variance–mean mixture models provide a flexible framework for modeling data with skewness and heavy tails, extending the normal distribution by allowing its mean and variance to vary according to a latent mixing distribution. A prominent example within this class is the Normal Tempered Stable (NTS) model, introduced in [6]. The NTS model arises when the normal distribution is mixed with a tempered stable distribution, yielding a rich structure that can capture both asymmetry and heavy-tailed behavior. From a stochastic process perspective, it is equivalent to a Brownian motion time-changed by a tempered stable subordinator (for more details on tempered stable distributions, see [7,8,9,10,11,12]). Due to its flexibility and mathematical tractability, the NTS distribution has been successfully applied in a range of financial modeling tasks, as shown in [3,13,14,15].
In essence, the tempered stable distribution is a variant of the stable distribution adjusted through exponential tempering (see [6], page 3). Let $T$ be a tempered $\alpha$-stable random variable, where $\alpha \in (0,1)$ is the stability index. The Laplace transform of $T$ is given, for all $\theta < \gamma^{1/\alpha}/2$, by
$$\mathbb{E}\left(e^{\theta T}\right) = \exp\left\{\delta\gamma\left[1 - \left(1 - 2\gamma^{-1/\alpha}\theta\right)^{\alpha}\right]\right\},$$
where $\delta > 0$ and $\gamma > 0$ (see [6], page 6). Without loss of generality, we take $\delta\gamma = 1$ and set $\lambda = 2\gamma^{-1/\alpha}$. In what follows, we denote by $\pi_{\alpha,\lambda}$ the distribution of $T$. Consider now a standard normal random variable $Z \sim N(0,1)$ independent of $T$. The normal tempered stable model describes the distribution of the random variable $Y$ that can be represented as $Y \overset{d}{=} \sqrt{T}\,Z$, where $\overset{d}{=}$ denotes equality in distribution (see [6], page 7). We denote by $f_{\alpha,\lambda}$ the density function of $Y$. When $y \to +\infty$, the density $f_{\alpha,\lambda}(y)$ has the following approximation:
$$f_{\alpha,\lambda}(y) = \frac{e}{\sqrt{2\pi}} \int_0^{+\infty} \frac{e^{-\frac{y^2}{2x}}}{\sqrt{x}}\,\pi_{\alpha,\lambda}(dx) \sim c_{\alpha,\lambda}\, y^{-\alpha-1} e^{-\sqrt{2/\lambda}\,y}, \tag{1}$$
where $c_{\alpha,\lambda} = \dfrac{2^{\alpha+1}\, e\, (2/\lambda)^{\frac{\alpha}{2}+\frac{1}{4}}\, \Gamma(\alpha+1)}{\Gamma(\alpha)\,\Gamma(1-\alpha)}$ (for more details, see [6], pages 7 and 8).

2.1. Estimation

In this section, we introduce a method for estimating the parameters $\alpha$ and $\lambda$ of the normal tempered stable model, which we term “the tail regression method”. It is based on an approximation of the cumulative distribution function of the response variable. This approximation establishes a linear relationship between certain transformations of the quantiles and the cumulative distribution function. The unknown parameters are then estimated by the least squares method.
We first need to approximate the cumulative distribution function of the normal tempered stable distribution, expressed as
$$F_Y(z) = P(Y \le z) = \int_{-\infty}^{z} f_{\alpha,\lambda}(y)\,dy, \quad z \in \mathbb{R}.$$
Equation (1) and l'Hôpital's rule imply that the survival function satisfies
$$\bar{F}_Y(z) = 1 - F_Y(z) \sim \int_z^{+\infty} c_{\alpha,\lambda}\, y^{-\alpha-1} e^{-\sqrt{2/\lambda}\,y}\,dy, \tag{2}$$
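when $z \to +\infty$. By applying integration by parts, we obtain
$$\int_z^{+\infty} c_{\alpha,\lambda}\, y^{-\alpha-1} e^{-\sqrt{2/\lambda}\,y}\,dy = c_{\alpha,\lambda}\, z^{-\alpha-1} \sqrt{\frac{\lambda}{2}}\, e^{-\sqrt{2/\lambda}\,z} - c_{\alpha,\lambda}\,(\alpha+1) \sqrt{\frac{\lambda}{2}} \int_z^{+\infty} y^{-\alpha-2} e^{-\sqrt{2/\lambda}\,y}\,dy.$$
Let $K(z) = \dfrac{(\alpha+1) \int_z^{+\infty} y^{-\alpha-2} e^{-\sqrt{2/\lambda}\,y}\,dy}{z^{-\alpha-1} e^{-\sqrt{2/\lambda}\,z}}$. Since
$$0 \le \int_z^{+\infty} \frac{e^{-\sqrt{2/\lambda}\,y}}{y^{\alpha+2}}\,dy \le \int_z^{+\infty} \frac{e^{-\sqrt{2/\lambda}\,y}}{z^{\alpha+2}}\,dy = \sqrt{\frac{\lambda}{2}}\, \frac{e^{-\sqrt{2/\lambda}\,z}}{z^{\alpha+2}},$$
we obtain $0 \le K(z) \le (\alpha+1)\sqrt{\lambda/2}\; z^{-1}$. This implies that $K(z) \to 0$ when $z \to +\infty$. Hence $\int_z^{+\infty} c_{\alpha,\lambda}\, y^{-\alpha-1} e^{-\sqrt{2/\lambda}\,y}\,dy \sim c_{\alpha,\lambda}\, z^{-\alpha-1} \sqrt{\lambda/2}\; e^{-\sqrt{2/\lambda}\,z}$ when $z \to +\infty$. This, together with (2), gives the following approximation of the normal tempered stable cumulative distribution function:
$$F_Y(z) \approx 1 - c_{\alpha,\lambda}\, z^{-\alpha-1} \sqrt{\frac{\lambda}{2}}\, e^{-\sqrt{2/\lambda}\,z}, \quad \text{when } z \to +\infty. \tag{3}$$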
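The integration-by-parts argument can be checked numerically. The following sketch is an illustration and not part of the original study: it evaluates the ratio between the tail integral and its leading term, which should stay below 1 and approach 1 at a rate of order $1/z$. The function name `tail_ratio`, the truncation point, and the grid size are assumptions made for this example, and the exponential rate is taken as $\sqrt{2/\lambda}$. The constant $c_{\alpha,\lambda}$ cancels in the ratio and is therefore not needed.

```python
import numpy as np

def tail_ratio(z, alpha, lam, n_grid=200_001, t_max_scaled=40.0):
    """Ratio of int_z^inf y^(-alpha-1) exp(-a*y) dy to z^(-alpha-1) exp(-a*z) / a,
    where a = sqrt(2/lam).

    Substituting y = z + t factors out z^(-alpha-1) * exp(-a*z), so the ratio
    equals a * int_0^inf (1 + t/z)^(-alpha-1) exp(-a*t) dt and no underflow
    occurs for large z.
    """
    a = np.sqrt(2.0 / lam)
    # The integrand decays like exp(-a*t); truncating at t = 40/a leaves a
    # remainder of order exp(-40), negligible in double precision.
    t = np.linspace(0.0, t_max_scaled / a, n_grid)
    g = (1.0 + t / z) ** (-alpha - 1.0) * np.exp(-a * t)
    h = t[1] - t[0]
    integral = h * (g.sum() - 0.5 * (g[0] + g[-1]))  # composite trapezoid rule
    return a * integral

if __name__ == "__main__":
    for z in (5.0, 20.0, 80.0):
        print(z, tail_ratio(z, alpha=0.5, lam=1.0))
```

The printed ratios should increase towards 1 as $z$ grows, which is the behaviour the approximation of the survival function relies on.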
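Let $Y_1, Y_2, \ldots, Y_k$ be $k$ independent copies of the normal tempered stable random variable $Y$, and let $z_1, z_2, \ldots, z_n$ be $n$ points on the tail of the empirical cumulative distribution function. According to the law of large numbers, for $k$ large enough, we have the approximation $f_i = \frac{1}{k} \sum_{j=1}^{k} \mathbf{1}_{\{Y_j > z_i\}} \approx \bar{F}_Y(z_i)$. This, together with (3), implies that for all $i \in \{1, 2, \ldots, n\}$,
$$\ln(f_i) = \ln\left(c_{\alpha,\lambda} \sqrt{\frac{\lambda}{2}}\right) - \sqrt{\frac{2}{\lambda}}\, z_i - (\alpha+1) \ln(z_i) + \varepsilon_i,$$
where $\varepsilon_i$ is a random error variable. This implies that the empirical mean of $(\ln(f_i))_{1 \le i \le n}$ is given by
$$\overline{\ln(f)} = \ln\left(c_{\alpha,\lambda} \sqrt{\frac{\lambda}{2}}\right) - \sqrt{\frac{2}{\lambda}}\, \bar{z} - (\alpha+1)\, \overline{\ln(z)} + \bar{\varepsilon},$$
where $\overline{\ln(f)} = \frac{1}{n} \sum_{i=1}^{n} \ln(f_i)$, $\bar{z} = \frac{1}{n} \sum_{i=1}^{n} z_i$, $\overline{\ln(z)} = \frac{1}{n} \sum_{i=1}^{n} \ln(z_i)$ and $\bar{\varepsilon} = \frac{1}{n} \sum_{i=1}^{n} \varepsilon_i$. Therefore
$$\ln(f_i) - \overline{\ln(f)} = -\sqrt{\frac{2}{\lambda}}\, (z_i - \bar{z}) - (\alpha+1) \left(\ln(z_i) - \overline{\ln(z)}\right) + (\varepsilon_i - \bar{\varepsilon}),$$
for all $i \in \{1, 2, \ldots, n\}$. Setting
$$\mathbf{X} = \begin{pmatrix} \ln(z_1) - \overline{\ln(z)} & z_1 - \bar{z} \\ \ln(z_2) - \overline{\ln(z)} & z_2 - \bar{z} \\ \vdots & \vdots \\ \ln(z_n) - \overline{\ln(z)} & z_n - \bar{z} \end{pmatrix}, \quad \mathbf{Y} = \begin{pmatrix} \ln(f_1) - \overline{\ln(f)} \\ \vdots \\ \ln(f_n) - \overline{\ln(f)} \end{pmatrix}, \quad \boldsymbol{\epsilon} = \begin{pmatrix} \varepsilon_1 - \bar{\varepsilon} \\ \vdots \\ \varepsilon_n - \bar{\varepsilon} \end{pmatrix}, \quad \theta = \begin{pmatrix} -\alpha - 1 \\ -\sqrt{2/\lambda} \end{pmatrix},$$
we obtain the following linear regression model:
$$\mathbf{Y} = \mathbf{X}\theta + \boldsymbol{\epsilon}.$$
Using the least squares method (for more details, see [12], page 155), we obtain the estimator $\hat{\theta} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{Y}$, where $\mathbf{X}^T$ is the transpose of $\mathbf{X}$. Since $\theta = (-\alpha-1, -\sqrt{2/\lambda})^T$, projecting $\theta$ on the canonical basis $e_1 = (1, 0)$ and $e_2 = (0, 1)$ of $\mathbb{R}^2$ gives $\langle \theta, e_1 \rangle = -\alpha - 1$ and $\langle \theta, e_2 \rangle = -\sqrt{2/\lambda}$. It follows that the estimators $\hat{\alpha}$ and $\hat{\lambda}$ of $\alpha$ and $\lambda$ are given by
$$\hat{\alpha} = -\langle \hat{\theta}, e_1 \rangle - 1 \quad \text{and} \quad \hat{\lambda} = \frac{2}{\langle \hat{\theta}, e_2 \rangle^2}.$$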
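The regression step can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `fit_tail` and `empirical_tail` are hypothetical, and the slope on $z_i$ is read as $-\sqrt{2/\lambda}$, which is the reading consistent with $\hat{\lambda} = 2/\langle \hat{\theta}, e_2 \rangle^2$. The thresholds $z_i$ must be positive and the tail frequencies $f_i$ non-zero so that the logarithms exist.

```python
import numpy as np

def fit_tail(z, f):
    """Tail regression: least-squares fit of the centred model
    ln f_i - mean(ln f) = -(alpha+1) (ln z_i - mean(ln z))
                          - sqrt(2/lam) (z_i - mean(z)) + error.
    Returns (alpha_hat, lam_hat)."""
    z = np.asarray(z, dtype=float)
    f = np.asarray(f, dtype=float)
    log_z, log_f = np.log(z), np.log(f)
    # Design matrix with columns in the same order as theta = (-alpha-1, -sqrt(2/lam)).
    X = np.column_stack([log_z - log_z.mean(), z - z.mean()])
    y = log_f - log_f.mean()
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    alpha_hat = -theta[0] - 1.0
    lam_hat = 2.0 / theta[1] ** 2
    return alpha_hat, lam_hat

def empirical_tail(sample, z):
    """Empirical survival frequencies f_i = (1/k) * #{Y_j > z_i}."""
    sample = np.asarray(sample, dtype=float)
    return np.array([(sample > zi).mean() for zi in z])

if __name__ == "__main__":
    # Synthetic check: tail frequencies that follow the asymptotic form exactly,
    # with alpha = 0.5 and lam = 1 (the constant 0.3 is arbitrary).
    z = np.linspace(1.0, 6.0, 40)
    f = 0.3 * z ** (-1.5) * np.exp(-np.sqrt(2.0) * z)
    print(fit_tail(z, f))
```

On these noise-free synthetic frequencies the fit should return $\alpha \approx 0.5$ and $\lambda \approx 1$ up to floating-point error. With real data, `empirical_tail` would supply the $f_i$ from the observed sample.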

2.2. Numerical Illustration

In this section, we present an algorithm for simulating the normal tempered stable model and provide a numerical illustration of the estimation of the parameters $\alpha$ and $\lambda$.
Our approach begins with simulating the standard stable distribution, as outlined in [16], page 20, which allows us to approximate the density of the standard stable model. Given that the tempered stable distribution can be viewed as a stable distribution altered through exponential tempering (see [6], page 3), we then approximate the density of the tempered stable model. Using this approximation, we derive an estimate of the cumulative distribution function of the tempered stable distribution. By employing the inverse transform sampling method, we generate simulations of the tempered stable model. Figure 1 shows the tempered stable distribution curves for various values of $\alpha$ and $\lambda$.
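The last step, inverse transform sampling from a numerically approximated density, can be sketched as follows. This is a generic illustration rather than the authors' code: the helper name `sample_from_grid_density` is hypothetical, and an exponential density stands in for the tempered stable density, whose numerical approximation is not reproduced here.

```python
import numpy as np

def sample_from_grid_density(x, pdf, size, rng):
    """Inverse transform sampling from density values `pdf` tabulated on grid `x`."""
    x = np.asarray(x, dtype=float)
    pdf = np.asarray(pdf, dtype=float)
    # The cumulative trapezoid rule gives the CDF at each grid point.
    increments = 0.5 * (pdf[1:] + pdf[:-1]) * np.diff(x)
    cdf = np.concatenate([[0.0], np.cumsum(increments)])
    cdf /= cdf[-1]  # renormalise to absorb truncation and discretisation error
    u = rng.uniform(size=size)
    # Invert the CDF by linear interpolation: x = F^{-1}(u).
    return np.interp(u, cdf, x)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grid = np.linspace(0.0, 30.0, 3001)
    # Placeholder density exp(-x); replace with the approximated target density.
    draws = sample_from_grid_density(grid, np.exp(-grid), 10_000, rng)
    print(draws.mean())
```

The grid must cover essentially all of the probability mass, since the renormalisation silently discards whatever lies outside it.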
Suppose we have a set of $n$ independent observations from the tempered stable distribution, denoted as $x_1, x_2, \ldots, x_n$. Using the Monte Carlo integration technique (see [17]) and the expression for the normal tempered stable density provided in Equation (1), we obtain the following result:
$$f_{\alpha,\lambda}(y) = \lim_{n \to +\infty} \frac{1}{n} \sum_{i=1}^{n} \frac{e}{\sqrt{2\pi x_i}}\, e^{-\frac{y^2}{2 x_i}}.$$
As a result, inverse transform sampling enables us to generate simulations of the normal tempered stable model as well. Figure 2 shows the curves of the normal tempered stable distribution for various combinations of $\alpha$ and $\lambda$.
Next, we generate a dataset of 1000 observations from the normal tempered stable distribution. We then select 1000 data points from the upper tail of the empirical cumulative distribution function, such that $F_Y(z_i) > 0.8$ for $i = 1, \ldots, 1000$. The estimate $\hat{\theta}$ yields the estimates $\hat{\alpha}$ and $\hat{\lambda}$, and we consider 1000 samples of these estimators. Figure 3 shows the histograms of $\hat{\alpha}$ and $\hat{\lambda}$.
It is worth noting that the histograms of the estimators $\hat{\alpha}$ and $\hat{\lambda}$ are approximately normal in shape. To assess the performance of the estimators over the $K$ samples, we use the mean squared error (MSE), defined as
$$MSE(\hat{\alpha}) = \frac{1}{K} \sum_{i=1}^{K} (\hat{\alpha}_i - \alpha)^2 \quad \text{and} \quad MSE(\hat{\lambda}) = \frac{1}{K} \sum_{i=1}^{K} (\hat{\lambda}_i - \lambda)^2.$$
We also calculate their coefficients of variation (CV), defined as
$$CV(\hat{\alpha}) = \frac{\sqrt{MSE(\hat{\alpha})}}{mean(\hat{\alpha})} \quad \text{and} \quad CV(\hat{\lambda}) = \frac{\sqrt{MSE(\hat{\lambda})}}{mean(\hat{\lambda})},$$
where $mean(\hat{\alpha}) = \frac{1}{K} \sum_{i=1}^{K} \hat{\alpha}_i$ and $mean(\hat{\lambda}) = \frac{1}{K} \sum_{i=1}^{K} \hat{\lambda}_i$. The results of our simulations are given in Table 1.
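As a small illustration of these two summary statistics, the sketch below computes them from an array of replicated estimates. The function name `mse_and_cv` is hypothetical, and the CV is read here as the square root of the MSE divided by the mean of the estimates, which is the reading that agrees with the values in Table 1 (for example, $\sqrt{0.0016}/0.1875 \approx 0.213$ against a reported 0.2147).

```python
import numpy as np

def mse_and_cv(estimates, true_value):
    """Return (MSE, CV) of replicated estimates of a known parameter.

    MSE = mean((estimate_i - true_value)^2)
    CV  = sqrt(MSE) / mean(estimate_i)   # assumed reading of the definition
    """
    estimates = np.asarray(estimates, dtype=float)
    mse = np.mean((estimates - true_value) ** 2)
    cv = np.sqrt(mse) / estimates.mean()
    return mse, cv

if __name__ == "__main__":
    # Toy replicates around a true value of 1.0, not data from the study.
    print(mse_and_cv([0.9, 1.1, 1.0, 1.05], true_value=1.0))
```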
We then plot the mean squared errors (MSEs) of $\hat{\alpha}$ and $\hat{\lambda}$ under two scenarios: one with $\alpha = 0.2$ and $\lambda = 1$, and the other with $\alpha = 0.5$ and $\lambda = 1$.
From Figure 4, we observe that the mean squared error of $\hat{\alpha}$ remains below $5 \times 10^{-3}$ for $\alpha = 0.2$, $\lambda = 1$, and below $4 \times 10^{-3}$ for $\alpha = 0.5$, $\lambda = 1$. Additionally, the mean squared error of $\hat{\lambda}$ is less than $6.5 \times 10^{-3}$.

3. Modeling Bike-Sharing Data

The dataset comprises the total count of rental bikes in Seoul [18]. To facilitate our analysis, we introduce the log-return transformation
$$S_i = \log(R_i / R_{i-1}),$$
where $R_i$ is the total count of rental bikes on day $i$. Based on the data $S_i$, we use the tail regression method to estimate the parameters $\alpha$ and $\lambda$ of the normal tempered stable law. To allow a comparison with the normal distribution, we also estimate the parameters of the normal distribution, namely the mean $\mu$ and the variance $\sigma^2$, on the same dataset. The results of our estimations are summarized in Table 2.
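The preprocessing described here, from hourly counts to daily log returns and the Gaussian benchmark parameters, might look as follows. This is a sketch under assumptions that are not stated in the paper: the hourly series is taken to consist of complete 24-hour days in chronological order, and days with a zero total are dropped so that the logarithm is defined. The function names are hypothetical.

```python
import numpy as np

def daily_log_returns(hourly_counts):
    """Aggregate hourly rental counts to daily totals R_i and return
    the log returns S_i = log(R_i / R_{i-1})."""
    hourly = np.asarray(hourly_counts, dtype=float)
    daily = hourly.reshape(-1, 24).sum(axis=1)  # assumes complete 24-hour days
    daily = daily[daily > 0]                    # assumption: drop zero-count days
    return np.diff(np.log(daily))

def normal_fit(s):
    """Maximum-likelihood mean and variance of the log returns."""
    s = np.asarray(s, dtype=float)
    return s.mean(), s.var()

if __name__ == "__main__":
    # Toy input of three days, not the Seoul data.
    hourly = np.concatenate([np.full(24, 10.0), np.full(24, 20.0), np.full(24, 5.0)])
    s = daily_log_returns(hourly)
    print(s, normal_fit(s))
```

Dropping zero-count days changes which days are treated as consecutive, so a different treatment of such days would give different returns.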
To compare the dataset with the simulated normal distribution (with estimated parameters $\hat{\mu}$ and $\hat{\sigma}^2$) and with the simulated normal tempered stable distribution (with estimated parameters $\hat{\alpha}$ and $\hat{\lambda}$), Figure 5 shows the curves corresponding to the dataset, the simulated normal tempered stable distribution, and the simulated normal distribution.
To quantify the errors between the simulated models and the actual data, we report the mean squared error (MSE) in Table 3.
Table 3 indicates that the normal tempered stable (NTS) model outperforms the normal distribution in describing the log-return data. Therefore, given a simulation of the NTS distribution, denoted $S_{NTS}$, we can substitute the log-return values with the simulated values from $S_{NTS}$. This allows us to express the relationship between $R_i$ and $R_{i-1}$ as follows:
$$R_i = R_{i-1} \exp(S_i) \approx R_{i-1} \exp(S_{NTS}).$$
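Applied recursively, this relation turns a sequence of returns back into a path of daily counts. The sketch below illustrates the recursion. The function name `rebuild_counts` is hypothetical, and the returns passed in are arbitrary placeholders, since a sampler for $S_{NTS}$ is not reproduced here.

```python
import numpy as np

def rebuild_counts(r0, returns):
    """Apply R_i = R_{i-1} * exp(S_i) recursively, starting from R_0 = r0.

    Equivalent to R_i = R_0 * exp(S_1 + ... + S_i); returns R_0, ..., R_n.
    """
    returns = np.asarray(returns, dtype=float)
    cumulative = np.concatenate([[0.0], np.cumsum(returns)])
    return r0 * np.exp(cumulative)

if __name__ == "__main__":
    counts = np.array([100.0, 150.0, 90.0, 120.0])  # toy counts, not Seoul data
    s = np.diff(np.log(counts))                      # observed log returns
    print(rebuild_counts(counts[0], s))              # reproduces `counts`
```

Passing simulated returns instead of the observed ones would give a simulated demand path that starts from the same initial count.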

4. Conclusions

This paper applies the Normal Tempered Stable (NTS) distribution to daily bike-sharing demand, offering an alternative to Gaussian models that captures heavy-tailed fluctuations. We recall its tail behavior for demand log-returns and propose a novel tail-regression estimation method. Empirical tests on the Seoul Bike-Sharing data show that the NTS model fits the log-returns better than the Gaussian model, with an MSE of 0.0077 against 0.0130 (Table 3). Future work may extend the model to hourly demand, integrate weather and demographic factors, and test it across other cities and transport modes to enhance generalizability.

Author Contributions

The authors F.M., M.F. and N.R. contributed equally to this research. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data will be made available on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. El Amrani, A.M.; Fri, M.; Benmoussa, O.; Rouky, N. The integration of urban freight in public transportation: A systematic literature review. Sustainability 2024, 16, 5286.
  2. Benjdiya, O.; Rouky, N.; Benmoussa, O.; Fri, M. On the use of machine learning techniques and discrete choice models in mode choice analysis. LogForum 2023, 19, 321–336.
  3. Charfi, S.; Mselmi, F. Modeling exchange rate volatility: Application of GARCH models with a Normal Tempered Stable distribution. Quant. Financ. Econ. 2022, 6, 206–222.
  4. Fri, M. Lead time prediction using advanced deep learning approaches: A case study in the textile industry. LogForum 2024, 20, 145–155.
  5. Oucheikh, R.; Fri, M.; Fedouaki, F.; Hain, M. Deep anomaly detector based on spatio-temporal clustering for connected autonomous vehicles. In Proceedings of the International Conference on Ad Hoc Networks, Paris, France, 17 November 2020; pp. 201–212.
  6. Barndorff-Nielsen, O.E.; Shephard, N. Normal modified stable processes. Theor. Probab. Math. Stat. 2001, 65, 1–19.
  7. Bianchi, M.L.; Tassinari, G.L. Estimation for multivariate normal rapidly decreasing tempered stable distributions. J. Stat. Comput. Simul. 2024, 94, 103–125.
  8. Xia, Y.; Grabchak, M. Estimation and simulation for multivariate tempered stable distributions. J. Stat. Comput. Simul. 2022, 92, 451–475.
  9. Baeumer, B.; Kovács, M. Approximating multivariate tempered stable processes. J. Appl. Probab. 2012, 49, 167–183.
  10. Mselmi, F. Lévy processes time-changed by the first-exit time of the inverse Gaussian subordinator. Filomat 2018, 32, 2545–2552.
  11. Mselmi, F. Characterization of the inverse stable subordinator. Stat. Probab. Lett. 2018, 140, 37–43.
  12. Rencher, A.C.; Christensen, W.F. Methods of Multivariate Analysis; John Wiley & Sons: Hoboken, NJ, USA, 2012.
  13. Bianchi, M.L.; Tassinari, G.L.; Fabozzi, F.J. Riding with the four horsemen and the multivariate normal tempered stable model. Int. J. Theor. Appl. Financ. 2016, 19, 1650027.
  14. Sabino, P. Normal tempered stable processes and the pricing of energy derivatives. SIAM J. Financ. Math. 2023, 14, 99–126.
  15. Sabino, P. Exact simulation of normal tempered stable processes of OU type with applications. Stat. Comput. 2022, 32, 81.
  16. Nolan, J.P. Stable Distributions: Models for Heavy Tailed Data; Birkhäuser: Boston, MA, USA, 2008.
  17. Geweke, J. Monte Carlo simulation and numerical integration. Handb. Comput. Econ. 1996, 1, 731–800.
  18. UCI Machine Learning Repository. Bike Sharing. Available online: https://archive.ics.uci.edu/dataset/275/bike+sharing+dataset (accessed on 25 June 2025).
Figure 1. Curves of the tempered stable distribution for different values of $\alpha$ and $\lambda$.
Figure 2. Curves of the normal tempered stable distribution for different values of $\alpha$ and $\lambda$.
Figure 3. Histograms of $\hat{\alpha}$ and $\hat{\lambda}$.
Figure 4. Curves of $MSE(\hat{\alpha})$ and $MSE(\hat{\lambda})$ when $\{\alpha = 0.2, \lambda = 1\}$ and $\{\alpha = 0.5, \lambda = 1\}$.
Figure 5. The curve of the Log Return data compared with the simulated normal tempered stable distribution and the simulated normal distribution.
Table 1. The estimators $\hat{\alpha}$ and $\hat{\lambda}$, along with their mean squared errors (MSEs) and coefficients of variation (CVs).

| $\alpha$ | $\lambda$ | $\hat{\alpha}$ | $\hat{\lambda}$ | MSE($\hat{\alpha}$) | MSE($\hat{\lambda}$) | CV($\hat{\alpha}$) | CV($\hat{\lambda}$) |
|---|---|---|---|---|---|---|---|
| 0.2 | 1 | 0.1875 | 1.0699 | 0.0016 | 0.0049 | 0.2147 | 0.0653 |
| 0.2 | 2 | 0.1468 | 2.1259 | 0.0041 | 0.0162 | 0.4412 | 0.0598 |
| 0.5 | 1 | 0.4951 | 1.0745 | 0.0010 | 0.0056 | 0.0630 | 0.0698 |
| 0.5 | 2 | 0.4626 | 2.1551 | 0.0022 | 0.0240 | 0.1031 | 0.0719 |
| 0.7 | 1 | 0.6876 | 1.0340 | 0.0015 | 0.0012 | 0.0534 | 0.0336 |
| 0.7 | 2 | 0.6707 | 2.0912 | 0.0018 | 0.0086 | 0.0638 | 0.0440 |
Table 2. Estimated parameters for the Log Return data under the normal and normal tempered stable models.

| Parameters | $\hat{\mu}$ | $\hat{\sigma}^2$ | $\hat{\alpha}$ | $\hat{\lambda}$ |
|---|---|---|---|---|
| Normal model | $1.1504 \times 10^{-4}$ | 0.4253 | | |
| Normal tempered stable model | | | 0.4558 | 0.3968 |
Table 3. MSE between the Log Return data and the normal distribution, and between the Log Return data and the normal tempered stable distribution.

| | Normal Model | Normal Tempered Stable Model |
|---|---|---|
| MSE | 0.013002 | 0.0076658 |

Mselmi, Farouk, Mouhsene Fri, and Naoufal Rouky. 2025. "Urban Mobility Modeling: Application to Seoul Bike-Sharing Data" Engineering Proceedings 112, no. 1: 51. https://doi.org/10.3390/engproc2025112051
