Applying Spectral Clustering to Decode Mobility Patterns in Athens, Greece

Andrinopoulou, Eirini; Tzouras, Panagiotis G.

doi:10.3390/app15073419


by Eirini Andrinopoulou and Panagiotis G. Tzouras *

Department of Infrastructure and Rural Development, School of Rural, Surveying and Geoinformatics Engineering, National Technical University of Athens, Iroon Politechneiou 9, 15780 Zografou, Greece

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(7), 3419; https://doi.org/10.3390/app15073419

Submission received: 29 January 2025 / Revised: 12 March 2025 / Accepted: 18 March 2025 / Published: 21 March 2025

(This article belongs to the Special Issue Sustainable Urban Mobility)


Abstract

The limited availability of mobility data makes it challenging to model demand, especially its spatiotemporal variations. Simultaneously, traditional transport modeling tools often rely on less disaggregated approaches, leading to gaps in understanding. To overcome these limitations, this study introduces the spectral clustering method to uncover major demand patterns considering various transport modes. It focuses on Athens, Greece, and utilizes a set of 1347 reported trips. The study reveals six distinct trip clusters. The first group, “an evening stroll nearby”, captures short distance tours typically undertaken by walking. The second cluster, “my work is nearby but I use my car” highlights a significant trend where individuals with short commutes, less than 6 km, predominantly use private cars. The third cluster, “commuting by metro”, features long-distance trips primarily for work. The fourth cluster reveals long-distance work-related trips with private cars, favored by active residents with high income. The fifth pattern, “trips of young people”, involves midnight recreational and moderate-distance morning trips for education, with an increased usage of public transport. The sixth cluster concerns short distance tours for various activities. The findings indicate that the second cluster’s high reliance on private cars for short trips is problematic. Reducing this reliance should be a priority for policymakers.

Keywords: spectral clustering; mode choice; travel pattern; urban mobility; trip attributes

1. Introduction

Climate change, rapid urbanization, and emerging social inequalities are some of the present issues that challenge the efficiency and sustainability of transport networks and cities [1,2]. Mobility has become a major factor in energy consumption, CO₂ emissions, air pollution, and traffic noise in urban areas [3,4,5], and even in the spread of viruses like COVID-19 during the pandemic [6,7,8]. Nevertheless, mobility still remains an unknown and unpredictable factor [9]. As a result, the effectiveness of policies, such as road pricing or parking restrictions, and the efficiency of the promising and innovative new modes that shared mobility encompasses are often assessed not in advance but only after their implementation [10].

The scarcity of mobility data, such as travel diaries, is one of the major reasons why travel behavior is an x-factor in many cities in both the developed and developing world. For example, in Athens, Greece, the last Origin-Destination (OD) survey was conducted by Transport for Athens in 2006, when the first metro lines were constructed [11]. Since then, systematic data about transport demand have not been recorded again by any official body. While some municipalities collect travel data to support their plans, access to these data is often restricted due to privacy concerns [12]. The same issue arises with platforms and services that handle large datasets, such as GPS trajectory data and location posts on social media. Simultaneously, traditional transport modeling tools utilized a less disaggregated approach, often overlooking significant determinants of travel demand and their interactions, such as land use changes, socio-demographic characteristics, spatial inequalities, and economic shifts. The classic four-stage model was indeed more static than dynamic [13]. Rasouli and Timmermans [14] have mentioned that integrity, interdependencies, high spatiotemporal aggregation, and behavioral basis are the main drawbacks of classical modeling techniques. Sallard et al. [15] argue that four-stage models fail to account for individuals’ decisions and their interactions that reshape travel behavior. Heinen [16] has shown that self-determination as a user of a particular transport service increases the likelihood of using it. Individual “mobility identity” explains not only the transport mode choice for a single trip but also the frequency of using a transport mode.

Before heavily criticizing past models, it is important to consider that computational power was limited in the past. This made it challenging to process and analyze the large datasets required for disaggregate models. Additionally, artificial intelligence and advanced machine learning algorithms were not available at the time, so complicated interrelationships among variables of transport demand could not be handled and modeled. Various advanced analytical methods are now applied in transport science to create synthetic populations and better predict “the unpredictable” travel demand. Factor Analysis and Analysis of Variance (ANOVA) have been used to reduce the multiple dimensions of mobility [17]. Iterative Proportional Fitting has been employed in developing synthetic populations and activity-based models [18]. Monte Carlo Simulation methods have been utilized to account for heterogeneity in traveler behavior and the factors influencing it [19]. Hierarchical models that integrate graph-theoretical and combinatorial optimization concepts have contributed to reconstructing trip sequences from OD matrices in cities with significant data limitations [20]. Hidden Markov Model methods have been applied to forecast travel behavior [21], while neural networks are now used for the deep generative modeling of synthetic travelers [22]. Nevertheless, a recent challenge that has emerged is not the accuracy of projections, which has generally improved, but the interpretability of the numerous and complex transport models.

Clustering is a type of unsupervised learning in which objects are grouped based on their inherent similarities. Various clustering techniques exist, including hierarchical, partitional, grid-based, density-based, and model-based approaches [23]. In travel behavior research, cluster analyses have been applied to group passengers with similar preferences and needs [24]. The most widely used method is Latent Class Cluster Analysis (LCCA), which is ideal when the imported dataset consists solely of categorical variables. For instance, Muchlisin et al. [25] applied LCCA to survey data to explore how variations in trip patterns are influenced by socio-demographics, household characteristics, and travel-related attitudes. Similarly, Soza-Parra and Cats applied Latent Profile Analysis to identify socio-demographic groups with a high dependence on cars [26]. However, grouping travel plans is quite complex because the collected datasets include both continuous and categorical (or dummy) variables. Allahviranloo et al. [27] found clustering techniques effective in identifying significant similarities among the plans of social groups. Indeed, using a k-means clustering method, they managed to identify popular travel schedules. Hafezi et al. [28,29] proposed an activity scheduler by applying a novel fuzzy C-Means clustering method. In this approach, each data point is assigned a probability of belonging to multiple clusters, which leads to the formation of more homogeneous activity patterns. Although there is much research on clustering travelers and daily travel schedules, no study has yet concentrated on trips.

This study assumes that trip patterns are associated with noticeable repetitions that can be utilized to overcome the data availability limitations of the Athens Metropolitan Area (AMA), Greece. Moreover, it hypothesizes that these repetitions are correlated with socio-demographic characteristics. This starting hypothesis is reasonable, as it has already been confirmed by Susilo and Axhausen [30]. Therefore, the objective of this research is to identify key demand clusters for each transport mode by applying a flexible and scalable clustering method called Spectral Clustering. To do so, trips collected by a recent revealed preferences survey are classified into clusters, and their dimensions are analyzed in depth. The developed tools are also useful for training and improving an open-source package called AthensPop (version 0.1) (AthensPop GitHub repository: https://github.com/Theodore-Chatziioannou/athenspop (accessed on 28 February 2025)), taking into account key demand repetitions. AthensPop regenerates synthetic disaggregate demand by importing small sets of travel diaries that can be quickly collected in Athens. The paper is structured as follows: the overall methodological framework is given in Section 2, the results are presented in Section 3 and discussed in Section 4, and the paper ends with conclusions and recommendations in the last section.

2. Data and Methods

The analysis is based on 1347 reported trips, examined using Spectral Clustering. The trips exhibit four key dimensions: purpose, transport mode, departure time, and distance. These attributes are processed before running the cluster analysis. The interpretation of clusters is later performed by estimating descriptive statistics of the socio-demographic characteristics of travelers (the “trip owners”) included in each cluster.

2.1. Spectral Clustering Analysis

In k-means clustering, the distances between homogeneous groups can be calculated by determining the distance of each examined item from the center of each cluster. The center of each cluster is represented by a vector of means (called the cluster center) and corresponds to the variables used for grouping the candidates. Each examined item is assigned to a cluster by calculating the distance between the item and each cluster center and assigning it to the nearest cluster. K-means makes strong assumptions from the beginning of the estimation process. Yet, k-means cluster analysis does not seem to be effective in high-dimensional problems due to the increased sparsity of data. At the same time, hierarchical clustering techniques cannot be seen as an alternative that ensures higher flexibility. Indeed, these techniques require much time and computational storage [23]. Therefore, it is essential to proceed to a dimensionality reduction process, such as the technique of spectral clustering.

Spectral clustering is a graph-based clustering method derived from graph theory. This technique transforms the data into a spectral domain. Spectral domain refers to the space where data points are represented using eigenvalues and eigenvectors. This “space transformation” makes clusters more distinguishable. Spectral clustering groups observations based on the graph connectivity, which refers to the “strength” of relationships between data points. In this analysis, the measure of “strength” is the similarity of two data points (i.e., trips). This is beneficial because many real-world datasets do not conform to simple spherical distributions (like in k-means clustering), presenting complex mathematical challenges. In graph-based representation, nodes represent data points connected by edges, which often represent the similarity level [31]. Spectral clustering excels in handling data that is non-linearly separable and does not conform to specific cluster shapes, making it ideal for data dispersed in multiple directions. That is why this method is characterized by its flexibility. Although computationally demanding, this method prevents the merging of discrete clusters that lack significant similarities [32]. According also to Saxena et al. [23], compared to k-means and DBSCAN—a density-based clustering technique—spectral clustering offers greater scalability. This attribute is particularly useful in this case, as the size of the trip dataset can vary significantly, with additional trips continuously being added. A significant challenge of this method is the interpretation of the output clusters. Although spectral clustering is a flexible method, it does not guarantee high interpretability [23]. This means that the applicability of this method cannot be evaluated beforehand.

This technique includes the following steps. The first step of the algorithm is the calculation of distances between observations based on the Manhattan distance (see Equation (1)). The Manhattan distance between two points is the sum of the absolute differences in their coordinates. It works better than the Euclidean distance when data contain many zeros (e.g., in dummy coding schemes) or small values. Yet, it treats the continuous variables in a very “linear” way. In this case, the most important continuous variable is trip distance, which can reasonably be assumed to have a more linear impact on travel choices. What is more, the Manhattan distance is less sensitive to outliers and to datasets with much noise, and it does not increase the computational cost. Nevertheless, this distance metric may not be able to capture complex interactions between dummy and continuous variables.

d(p, q) = \sum_{i=1}^{n} |p_{i} - q_{i}|    (1)

where
d(p, q): the Manhattan distance between trips p and q;
n: the total number of dimensions included in the set (trip purpose, mode, departure time, and distance);
p_{i}: the coordinate of trip p in dimension i;
q_{i}: the coordinate of trip q in dimension i.
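Equation (1) can be illustrated with a minimal NumPy sketch. The toy trips, their column layout, and the function name `manhattan_distance_matrix` are assumptions made for illustration only; the text does not state whether the continuous variables were rescaled before the distances were computed, so raw values are used here.

```python
import numpy as np


def manhattan_distance_matrix(X):
    """Pairwise Manhattan (L1) distances between the rows of X, as in Equation (1)."""
    X = np.asarray(X, dtype=float)
    # Broadcasting: (Q, 1, n) - (1, Q, n) -> (Q, Q, n); sum over the n dimensions.
    return np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)


# Hypothetical toy trips: [distance_km, departure_hour, mode_car, mode_walk]
trips = np.array([
    [1.5, 21.0, 0, 1],   # short evening walk
    [4.0, 8.0, 1, 0],    # short morning car trip
    [12.0, 9.0, 1, 0],   # long morning car trip
])

D = manhattan_distance_matrix(trips)
print(D)
```

The two car trips end up closer to each other than either is to the walking trip, because they share the same dummy values and differ mainly in distance.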

After creating the distance matrix, the computation of the similarity matrix follows. To do so, the Gaussian similarity function is utilized (see Equation (2)). This mathematical process requires the definition of the parameter σ. This parameter sets the similarity scale between points and determines the influence that one point has on others in terms of similarity. A common practice for calculating σ is to use the median of the distances between all pairs of points in the dataset. This method provides a good balance, as the median represents a central tendency of the distances and avoids extreme values that could distort the scale of similarity.

S(p, q) = \exp\left(-\frac{d(p, q)^{2}}{2\sigma^{2}}\right)    (2)

where
S(p, q): the similarity level between trips p and q;
σ: the scale parameter; in this case, it is equal to the median of the Manhattan distances.
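A short sketch of Equation (2) follows. It assumes that the median is taken over the distinct pairs of trips (the upper triangle of the distance matrix, excluding the zero diagonal); the text does not specify this detail. The distance matrix and the function name `gaussian_similarity` are illustrative.

```python
import numpy as np


def gaussian_similarity(D, sigma=None):
    """Gaussian similarity matrix from a distance matrix, as in Equation (2)."""
    D = np.asarray(D, dtype=float)
    if sigma is None:
        # Assumption: median over distinct pairs p < q (diagonal zeros excluded).
        upper = np.triu_indices_from(D, k=1)
        sigma = np.median(D[upper])
    S = np.exp(-(D ** 2) / (2.0 * sigma ** 2))
    return S, sigma


# Hypothetical Manhattan distances between three trips
D = np.array([
    [0.0, 17.5, 24.5],
    [17.5, 0.0, 9.0],
    [24.5, 9.0, 0.0],
])

S, sigma = gaussian_similarity(D)
print(sigma)
print(S.round(3))
```

Each trip has similarity 1 with itself, and a pair whose distance equals σ receives a similarity of exp(-1/2).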

In the third step, the Laplacian matrix is constructed. Initially, a degree matrix was created by calculating the sum of each row of the similarity matrix and placing these sums on the diagonal of a diagonal matrix. The degree matrix is a diagonal matrix where each diagonal element represents the degree of point p, i.e., the sum of the similarity values of all edges connected to this point:

d_{p} = \sum_{q=1}^{Q} S(p, q)    (3)

where
d_{p}: the degree of trip p;
Q: the total number of trips included in the set.

The next step is computing the Laplacian matrix by subtracting the similarity matrix from the degree matrix, as shown below:

L = D - S    (4)

where
D: the degree matrix;
S: the similarity matrix;
L: the Laplacian matrix.
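Equations (3) and (4) can be sketched in a few lines of NumPy. The similarity values below are hypothetical, and the function name `unnormalized_laplacian` is chosen for illustration; following Equation (3) as written, each row sum includes the self-similarity on the diagonal.

```python
import numpy as np


def unnormalized_laplacian(S):
    """Degree vector (Equation (3)) and Laplacian L = D - S (Equation (4))."""
    S = np.asarray(S, dtype=float)
    degrees = S.sum(axis=1)        # d_p: row sums of the similarity matrix
    D = np.diag(degrees)           # degree matrix with d_p on the diagonal
    return D - S, degrees


# Hypothetical similarity matrix for three trips
S = np.array([
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
])

L, degrees = unnormalized_laplacian(S)
print(degrees)
print(L)
```

By construction, every row of L sums to zero, which is why the constant vector is an eigenvector with eigenvalue zero.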

Next, the eigenvalues and eigenvectors of the Laplacian matrix were calculated. The Laplacian matrix is symmetric, so its eigenvalues are real and the decomposition can be computed efficiently. From the set of eigenvectors, the k eigenvectors associated with the smallest eigenvalues were selected, where k is the number of desired clusters. These eigenvectors were used to transform the data into a new space with fewer dimensions. This transformation is intended to place closely situated points within the same cluster, while those distanced from each other are categorized into separate clusters. Subsequently, the rows of the matrix containing the selected eigenvectors were normalized so that each row has unit length, which prepares the data for clustering. Finally, the normalized data were clustered using the k-means method, as presented above. The algorithm was executed on the normalized rows of the matrix formed by the k selected eigenvectors of the Laplacian matrix.
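The eigenvector selection, row normalization, and final grouping can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact pipeline of the study: the block-structured similarity matrix is synthetic, and `simple_kmeans` is a small NumPy-only Lloyd iteration with a deterministic farthest-point start that stands in for the k-means step, whose exact settings are not reported in the text.

```python
import numpy as np


def spectral_embedding(L, k):
    """Rows of the k eigenvectors with the smallest eigenvalues, scaled to unit length."""
    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
    _, eigenvectors = np.linalg.eigh(L)
    U = eigenvectors[:, :k]
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # guard against division by zero
    return U / norms


def simple_kmeans(X, k, n_iter=50):
    """Minimal Lloyd's algorithm (illustrative stand-in for a library k-means)."""
    # Deterministic farthest-point initialisation, starting from the first row.
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Synthetic similarity matrix: two groups of three trips, similar within, dissimilar between.
S = np.full((6, 6), 0.01)
S[:3, :3] = 0.9
S[3:, 3:] = 0.9
np.fill_diagonal(S, 1.0)

L = np.diag(S.sum(axis=1)) - S
embedding = spectral_embedding(L, k=2)
labels = simple_kmeans(embedding, k=2)
print(labels)
```

In the embedded space, the three trips of each group collapse onto nearly the same point, so even a very simple k-means separates the two groups.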

Both the Gap Statistic Method and Silhouette scores were utilized to determine the optimal number of clusters and to iterate the process as needed. In the first method, this is achieved by comparing the within-cluster variation in the actual data to the expected variation from a null reference distribution. This distribution represents random data without any inherent structure [33]. In the second method, the selection of the optimal number of clusters is performed by estimating the Silhouette score [34]. Equation (5) gives its formula. A score close to +1 indicates that the data point is well placed within its cluster, while a score close to −1 suggests that the data point is misclassified.

SL(q) = \frac{\beta(q) - a(q)}{\max(a(q), \beta(q))}    (5)

where
SL(q): the Silhouette score of trip q;
a(q): the average distance (similarity level) between a point q and all the other points in the same cluster;
β(q): the average distance (similarity level) between a point q and all the points in the nearest different cluster.
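Equation (5) can be computed directly from a distance matrix and a vector of cluster labels. The sketch below is illustrative: the four points on a line and the function name `silhouette_scores` are assumptions, and the text does not specify which distance matrix the study passed to the Silhouette computation. A point that is alone in its cluster is given a score of 0, a common convention.

```python
import numpy as np


def silhouette_scores(D, labels):
    """Per-point Silhouette score from a pairwise distance matrix, as in Equation (5)."""
    D = np.asarray(D, dtype=float)
    labels = np.asarray(labels)
    scores = np.zeros(len(labels))
    for q in range(len(labels)):
        same = labels == labels[q]
        same[q] = False                      # exclude the point itself
        if not same.any():
            continue                         # singleton cluster: score stays 0
        a = D[q, same].mean()                # mean distance within own cluster
        b = min(                             # mean distance to the nearest other cluster
            D[q, labels == c].mean()
            for c in np.unique(labels) if c != labels[q]
        )
        scores[q] = (b - a) / max(a, b)
    return scores


# Hypothetical one-dimensional example: two tight, well-separated pairs.
x = np.array([0.0, 1.0, 10.0, 11.0])
D = np.abs(x[:, None] - x[None, :])
labels = np.array([0, 0, 1, 1])

scores = silhouette_scores(D, labels)
print(scores.round(3), scores.mean().round(3))
```

Repeating this for several candidate values of k and comparing the mean scores is one way to use the Silhouette criterion for selecting the number of clusters.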

2.2. Data Collection and Processing

The data used to conduct this research were collected from a revealed preferences online survey distributed through the Hellenic Broadcasting Corporation (ERT) websites and radio frequencies in 2022. A total of 513 people participated in the survey, coming from almost all zones of the AMA (see the maps in Appendix A). The innovation of the data collection process lies in the distribution strategy, as these channels are accessed by a diverse group of people. The survey was anonymous, and respondents were not required to provide all their socio-demographic information. Consent was obtained from all participants in the research.

Initially, participants responded to questions about demographic and socio-economic content. In more detail, they were asked about their gender, age, level of education, employment status, income category, residential zone, and whether they own a private car. Subsequently, respondents were asked to describe five trips they made on a typical workday. For each trip, the purpose of the trip, the departure time, the mode of transport, and the destination zone were to be mentioned. The available mode options were car (driver or passenger), taxi, bus (or trolley bus), train (metro, tram, and suburban railway), motorcycle, bicycle (or e-bike), walking, and e-scooter. The purposes of the trips could be work, returning home, education, shopping, recreation, service, and other. The travel time was given in a 24 h timeframe.

The AMA was divided into 36 zones, with one of these zones referring to travel outside of the study area. Through this zoning system, respondents reported their residential zone without further specifying their location. A centroid per zone was defined, so that the travel distances could be estimated. In the measurement process, the Euclidean distance multiplied by 1.3 was used. In other words, the network distance in Athens is assumed to be 30% larger than the Euclidean distance. The parameter value is based on a mean factor derived from previous accessibility analyses conducted in the same city [35]. Another assumption made in the data collection is that the first trip of the day starts from home. It is also assumed that the destination zone of trip i is the origin zone of trip i + 1, creating a chain of activities.
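The distance estimate and the trip-chaining assumption can be sketched as follows. The zone names and centroid coordinates are hypothetical, and the coordinates are assumed to be in a projected, metric coordinate system; only the 1.3 multiplier and the chaining rule come from the text.

```python
import math

DETOUR_FACTOR = 1.3  # network distance assumed 30% longer than Euclidean (from the text)


def network_distance_m(origin, destination, factor=DETOUR_FACTOR):
    """Estimated network distance between two zone centroids, in metres."""
    dx = destination[0] - origin[0]
    dy = destination[1] - origin[1]
    return factor * math.hypot(dx, dy)


def chain_trips(home_zone, destination_zones):
    """Build (origin, destination) pairs: the first trip starts from home and
    each destination becomes the origin of the next trip."""
    origins = [home_zone] + list(destination_zones[:-1])
    return list(zip(origins, destination_zones))


# Hypothetical centroids (x, y) in metres
centroids = {"zone_a": (0.0, 0.0), "zone_b": (3000.0, 4000.0)}

pairs = chain_trips("zone_a", ["zone_b", "zone_a"])
distances = [network_distance_m(centroids[o], centroids[d]) for o, d in pairs]
print(pairs)
print(distances)
```

With these toy centroids the Euclidean distance is 5000 m, so each leg of the round trip is estimated at about 6500 m.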

In this analysis, each trip constituted a separate and independent observation. The final dataset contained 1347 trips. Table 1 gives a summary of the variables that are considered in two different stages of the data analysis process. The variables that were imported into the spectral clustering were the travel distance (from centroid to centroid), the start time of the trip, the transport mode, and the trip purpose. These constitute the main attributes of trips. The data were processed by applying a dummy coding scheme to all the categorical variables. This means that for each level of each categorical variable, one variable in binary format (0 or 1) was created in the dataset. This approach enables the exploration of potential non-linear relationships between categories that cannot be naturally ordered (e.g., transport mode or trip purpose). Socio-demographic characteristics were not imported into the cluster analysis, but they were utilized later to interpret the clusters. A chi-squared test is later conducted to investigate potential statistically significant dependencies between clusters and socio-demographic characteristics. In this process, dummy coding is applied to both socio-demographic characteristics and trip clusters. At this point, it should be mentioned that socio-demographic attributes describe the trip “owner” and not the trip itself.
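The dummy coding scheme described above, with one 0/1 column per level, can be sketched with NumPy. The four toy trips, the reduced list of mode levels, and the helper name `dummy_code` are assumptions for illustration; the actual survey offered more modes and also coded the trip purpose in the same way.

```python
import numpy as np


def dummy_code(values, levels):
    """Return one binary (0/1) column per level of a categorical variable."""
    values = np.asarray(values)
    levels = np.asarray(levels)
    # Compare every value with every level: shape (n_trips, n_levels).
    return (values[:, None] == levels[None, :]).astype(int)


# Hypothetical trips with a reduced set of mode levels
modes = ["car", "walking", "train", "car"]
mode_levels = ["car", "train", "walking"]
mode_dummies = dummy_code(modes, mode_levels)

# Continuous attributes are kept as they are and stacked next to the dummies.
distance_km = np.array([4.0, 1.5, 9.0, 12.0])
departure_hour = np.array([8.0, 21.0, 7.0, 9.0])
X = np.column_stack([distance_km, departure_hour, mode_dummies])

print(mode_dummies)
print(X.shape)
```

Each row of the dummy block contains exactly one 1, so two trips with different modes always differ by 2 in Manhattan distance on that block, regardless of which modes they are.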

The data analysis with spectral clustering was conducted in Python, using the following open-source packages: NumPy (version 1.21.5), Scikit-learn (version 1.0.2), Matplotlib (version 3.5.2), SciPy (version 1.9.1), and Pandas (version 1.4.4). The computations were carried out on a system equipped with an Intel i8 processor and 16 GB of RAM.

3. Results

From the clustering process, six clusters emerged. The pie chart in Figure 1 gives the share of trips allocated to each cluster. As can be observed, Clusters 2 and 4 account for the highest shares of trips, 39.0% and 28.4%, respectively. The combined share of the other clusters does not exceed 33% (i.e., approximately one-third of the sample). These clusters represent distinct types of trips that are performed daily in Athens, Greece. This clustering constitutes an alternative way to classify trips and understand mobility.

The first cluster consists of 108 reported trips (8.02%), 42% of which involve walking (see Figure 2). Indeed, 65% of the trips in this group are made either on foot or by bus, while 85% of the total trips are made on foot, by bus, or by train. Thus, the majority of trips in this cluster are carried out by transport modes that are more “environmentally friendly.” Regarding the purpose of trips included in the first cluster, 48% of them are for recreation. Furthermore, at least 75% of the trips begin during the afternoon and evening hours, between 16:00 and 22:00 (see Figure 3). The peak hour of these trips is at 21:00. The travel distance in this cluster has a mean value of about 1687 m (see Figure 4), and 75% of the trips have a distance of less than or equal to 5 km. Considering these attributes, it is concluded that the trips of the first cluster concern intra-zonal movements or movements to neighboring or adjacent zones of Athens.

The second cluster incorporates 525 trips, i.e., 38.98% of the total observations of the sample. Private cars dominate as the mode of transport, accounting for 47% of the trips included. In other words, nearly half of the trips in the second cluster are made by car. The bus contributes 13%, train 13%, bicycle 2%, and walking 17%. Thus, 40% of the trips are made by alternative means of transport. As for the purposes, 35% of the movements are made for commuting to work, while a notable share concerns recreational travel (21%) (see Figure 2). The departure times of trips included in this cluster fall within the morning and afternoon peak hours, as can be observed in Figure 3. Specifically, at least 75% of the commutes of this group take place from 7:00 to 11:00 in the morning and from 13:00 to 19:00, when this cluster reaches its peak. It turns out that the start times of 50% of the trips in the second cluster correspond approximately to those not covered by the first cluster. Additionally, it is evident in Figure 4 that the trip distances of the second cluster are similar to those of the first cluster. Indeed, they range between 1000 and 6000 m. In fact, 75% of the reported trips are less than 5 km long.

The third cluster consists of 121 trips (8.98%). Regarding the mode of transport, 45% of the trips in this cluster are served by metro, tram, or suburban railway (see Figure 2). Additionally, 14% of the trips are made by bus. So, around 60% of the trips are carried out using public transport, while about 30% are made by car (private cars or taxis). In this group, commuting to work accounts for 22% and recreation for 16%. For around 75% of the trips in the third cluster, the departure time ranges between 7:00 and 10:00 or between 16:00 and 20:00. Regarding the distance of the trips, 50% of the observations show distances between 5 and 10 km, as can be observed in Figure 4.

The fourth cluster consists of 383 trips (28.43%). The large majority of the commutes in the fourth cluster (70%) are made by car. The main purpose of the trips is going to work, which accounts for 58%. The start times for 75% of the trips range between 6:00 and 17:00. The peak hour for this cluster is at 09:00 in the morning (see Figure 3). The mean trip distance in Cluster 4 is 12.95 km, and the distribution of this variable appears to follow a normal pattern, as illustrated in Figure 4.

The fifth cluster gathers 148 trips (10.99%), 65% of which are made using train services, buses, or walking. The reasons for traveling are mainly education and recreation, as revealed in Figure 2. The start times of 75% of these journeys are between 0:00 and 12:00 (see Figure 3). The latter constitutes a unique characteristic of Cluster 5. The distances of the trips included in the fifth cluster have a wide range of values (see Figure 4).

Last but not least, the sixth cluster includes 62 trips (4.60%) and is dominated by the use of private cars, which account for around 45%; adding taxi use, the car usage rate reaches 50% (see Figure 2). Buses, trains, and walking together also account for a significant share (49% in total). In this cluster, all trip purposes appear with almost similar percentages. Additionally, the start times of these trips were between 8:00 and 12:00 in the morning, with an interesting high peak at 11:00. The distance variable for 75% of the trips takes values from 1000 to 5000 m. Table 2 presents all the previously mentioned descriptive statistics.

Appendix A presents six maps illustrating the spatial distribution of trip origin locations for each cluster. The maps reveal some spatial patterns that have to be discussed. In Cluster 1, nearly one-third of trips originate from the city center. Cluster 2 trips also show a strong presence in the city center and neighboring zones, with lower percentages in the suburbs. In contrast, Cluster 3 highlights a high share of trips that start from zones along metro line 3, particularly in the eastern parts of Athens. The opposite trend emerges in Cluster 4, where a greater proportion of trips originate from the outskirts. Specifically, the northern zones concentrate one-quarter of the trips. The spatial patterns of Cluster 5 exhibit a strong alignment with metro line 1, showing similarities to the spatial distribution observed in Cluster 3. Lastly, Cluster 6 is similar to Cluster 1, with central zones once again displaying the highest concentration of trip origins.

4. Discussion

4.1. Clusters’ Interpretation and Main Findings

The six clusters showed diverse trip attributes. The socio-demographic characteristics of the travelers who reported these trips should be considered at this point. Table 3 presents the socio-demographic characteristics and respective frequencies per cluster. In addition, a chi-squared test was conducted to investigate the significance of relationships among the dummy variables. The first cluster could be titled “an evening stroll nearby”, as it incorporates short-distance journeys performed by people who are probably not students, at the 95% confidence level. These trips predominantly occur during evening hours. Walking or riding the bus are the preferred transport modes. The second cluster, which is the largest, sheds light on the commuting patterns of people whose job is a short distance from their homes. A high and statistically significant share of them belong to the age group 50 to 65. They still prefer to travel by private car, which is why the group is titled “my work is nearby but I use my car”. The third pattern is called “commuting by metro”. It concerns long-distance trips, mainly to work, performed by people who are not between 31 and 50 years old, as the chi-squared test revealed. The pattern that emerged from the fourth cluster concerns trips for work purposes which, in contrast with the second pattern, cover long distances. This pattern is mainly detected in the morning hours, and the car is mostly chosen as the preferred mode of transport. It predominantly includes economically active individuals with very high income. The fourth cluster can be titled “my work is far away, so I have to use my car”. The fifth pattern is characterized as “trips of young people”; it concerns midnight trips for recreational activities and morning trips for education at moderate distances, with increased use of public transport. This travel pattern is significantly composed of travelers aged 18–30 years old. Finally, the last pattern, which emerged from the sixth cluster, concerns movements over relatively short distances for other obligations, outside of work and education. In this pattern, balanced shares of car and public transport trips were reported. These trips are mainly made by elderly residents (65 or more years old). That is why it is called “other trips of inactive residents”.
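The kind of chi-squared test mentioned above, applied to two dummy variables (cluster membership and one socio-demographic attribute), reduces to a 2 × 2 contingency table. The sketch below uses invented counts purely for illustration and does not reproduce any result of the study; it computes the Pearson statistic without continuity correction, and the p-value uses the fact that a chi-squared variable with one degree of freedom is the square of a standard normal variable.

```python
import math

import numpy as np


def chi_square_2x2(table):
    """Pearson chi-squared statistic and p-value (1 degree of freedom) for a 2x2 table."""
    observed = np.asarray(table, dtype=float)
    row_totals = observed.sum(axis=1, keepdims=True)
    col_totals = observed.sum(axis=0, keepdims=True)
    expected = row_totals @ col_totals / observed.sum()   # counts under independence
    chi2 = ((observed - expected) ** 2 / expected).sum()
    p_value = math.erfc(math.sqrt(chi2 / 2.0))            # survival function for 1 dof
    return chi2, p_value


# Hypothetical counts (NOT from the study):
# rows: trip in the cluster / not in the cluster
# columns: traveler aged 18-30 / other age group
table = [[30, 10],
         [20, 40]]

chi2, p_value = chi_square_2x2(table)
print(round(chi2, 3), p_value)
```

A p-value below 0.05 would indicate, at the 95% confidence level, that cluster membership and the attribute are not independent in such a table.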

Regarding preferred transport modes, walking is directly associated with recreational trips. Additionally, most trips for educational purposes in Athens are carried out using public transport. Furthermore, shopping journeys are predominantly conducted on foot. Short-distance journeys are sometimes served by bus, while longer-distance trips are typically dominated by private car usage. Regarding temporal mobility trends, it can be concluded that morning trips are mainly conducted by car and train, while afternoon and evening trips are primarily carried out by walking, motorcycle, and bicycle. In relation to the age of commuters, young people use public transport significantly more compared to older age groups, who prefer using cars. The analysis reveals that reported trips exhibit meaningful repetitions, as evident from the distinct characteristics observed within each cluster (i.e., group of trips). The existence of such repetitions was a key hypothesis of this study, and its confirmation aligns with previous research [30], further supporting the robustness of the findings.

Overall, this study focuses on clustering trips rather than travelers, allowing for a more dynamic understanding of mobility patterns without restricting individuals to a single cluster. One individual can “own” trips from more than one cluster. Considering the findings, this assumption appears to be quite reasonable. Previous approaches focused on defining traveler profiles and understanding the trip generation process [25,26,27,28,29]. While they successfully uncovered the underlying reasons for travel behavior, overall mobility patterns remained largely unknown in the past. The applied approach effectively decodes urban mobility in a city, resulting in six highly meaningful clusters. In general, it provides a more comprehensive understanding of the possible range of alternatives—something that initially seemed unfeasible given the limited sample size. This, of course, has policy implications.

4.2. Study Limitations

Several limitations emerged during the research. Specifically, the analysis is based on a sample of 1347 trips. The clusters represent the prevailing mobility patterns observed in 2022 and do not capture long-term variations or trends. Socio-demographic groups were not equally represented in this set. Young people are highly represented, which is why a distinct cluster describing their trips was formed. Another limitation was the lack of real-life trip distances, i.e., network distances. Network distances deviate from Euclidean ones; a uniform multiplier was applied to all distances to account for these deviations. Nevertheless, this assumption may not be realistic in all cases. A finer breakdown of the zones inside the study area would yield smaller zone extents and thus more accurate distances. Additionally, trip duration could serve as an additional dimension in the estimation of clusters. This would allow traffic congestion, public transport frequencies, and network coverage to be considered, revealing interesting spatial patterns in mode choice. Furthermore, public transport was not differentiated beyond the categories of bus and train services. This research did not consider multimodal trips, since survey respondents reported only the main transport mode. Lastly, every respondent in the dataset described at most five trips, which constrained the ability to report a more complex travel schedule. How spectral clustering can handle complex itineraries remains an open question that requires further research. In addition, there were no plans with zero trips; this case would correspond to remote workers, who have emerged after the COVID-19 pandemic. Remote workers may also be included in the clusters associated with intra-zonal trips, namely clusters 1 and 6.
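
The distance assumption discussed above can be sketched as follows. The detour factor of 1.3 and the projected zone-centroid coordinates are assumptions made for illustration; the multiplier actually applied in the study is not stated in this section.

```python
import math

# Assumed detour factor, for illustration only; the value used in the
# study is not reported in this section.
DETOUR_FACTOR = 1.3


def euclidean_km(origin, destination):
    """Straight-line distance in km between two zone centroids.

    Both points are (x, y) tuples in metres in a projected coordinate system.
    """
    dx = destination[0] - origin[0]
    dy = destination[1] - origin[1]
    return math.hypot(dx, dy) / 1000.0


def network_km(origin, destination, detour_factor=DETOUR_FACTOR):
    """Approximate network distance as a uniform multiple of the Euclidean one."""
    return detour_factor * euclidean_km(origin, destination)


# Hypothetical centroids of an origin zone and a destination zone.
origin = (0.0, 0.0)
destination = (3000.0, 4000.0)
print(euclidean_km(origin, destination))
print(network_km(origin, destination))
```

Because the same factor scales every origin-destination pair, the ranking of trips by distance is preserved, but local effects such as one-way streets or physical barriers are ignored. This is why the uniform multiplier may be unrealistic in individual cases.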

4.3. Scientific and Practical Recommendations

Future research could explore the integration of cluster analysis results into predictive models of travel behavior. The reduction in dimensionality achieved through clustering would result in less complicated and more interpretable transport models. Clusters can contribute to the development of synthetic populations and synthetic trip chains based on sociodemographic characteristics. Yet, this is not a straightforward process. Treating clusters as discrete alternatives would introduce high endogeneity, because these alternatives are not exogenously defined but rather identified from the observed behavior that we aim to understand. This is a theoretical question that needs to be addressed in the future. In this context, additional variables could be added. These could concern other characteristics of the trip itself, such as its cost, comfort, and accessibility. It is also suggested to investigate to what extent the residents of Athens who frequently use the car for their daily trips are truly dependent on it and which factors influence their choice. The lack of parking space is a severe problem in some neighborhoods of Athens and constitutes an additional variable that shapes travel behavior. Last but not least, the development of a data harmonization framework to integrate all available mobility data (from official or unofficial bodies) is more crucial than ever. Such a pipeline would ensure that valuable datasets are not lost and would enable further analysis using the spectral clustering approach. The latter would make it possible to analyze long-term changes in the identified mobility patterns.

Nevertheless, the outputs of this analysis can already be utilized in the transport planning process. High-cost projects currently underway, such as the construction of the new metro line 4, are likely to benefit long-distance commuters, enhancing the competition between metro and private car use for work-related trips (cluster 3 vs. cluster 4). But they are not enough. Cluster 2 exposes a critical mobility problem: a relatively high percentage of respondents use a private car for short-distance trips. Reducing this type of trip should be a priority for policymakers, pursued through a multi-level strategy. Improving bus services in terms of frequency and coverage is more necessary than ever in order to decrease intra-zonal traffic flows. Ride-hailing and other shared mobility services can complement local transport systems and increase their efficiency. Push measures, such as parking restrictions, are always an additional option to protect the neighborhoods of Athens from high volumes of attracted traffic. In addition, municipalities should take action to promote active mobility for daily short-distance trips not only in the evening (as in cluster 1) but also in the morning. This can be achieved by increasing the walkability of the road network and creating cycling infrastructure that also serves micro-mobility modes (e.g., e-bikes and e-scooters). Overall, the existence of cluster 2 necessitates the implementation of multiple small-scale policies and interventions to protect the neighborhoods of Athens, rather than large-scale projects.

5. Conclusions

The research utilized mobility data from a revealed preferences survey to explore the personal mobility habits of the residents of Athens, Greece. In essence, it applied spectral clustering to analyze mode choices for the first time. Six distinct groups of trips emerged from this process, giving a new classification of trips in AMA. Overall, it can be concluded that trip clustering has led to a more comprehensive understanding of the possible range of alternatives, something that did not seem feasible at the outset given the limited sample size. The approach introduced in this study simplified the problem by reducing the multiple dimensions of mobility. The findings can provide valuable and interpretable insights for transport planning and policymaking. The results make clear that the elimination of short-distance car trips, which constitute a significant proportion of all trips, should be prioritized. Finally, addressing the last-mile problem seems to be the key to mitigating car dependency and ensuring sustainability. Attractive shared mobility services and active modes can reverse this negative trend.

Author Contributions

Conceptualization, P.G.T.; Data curation, E.A.; Formal analysis, E.A.; Investigation, E.A.; Methodology, P.G.T.; Supervision, P.G.T.; Validation; Visualization, E.A.; Writing—original draft, E.A. and P.G.T.; Writing—review and editing, P.G.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Hellenic Broadcasting Corporation (ERT). Project code: 91008900, Project name: “Scientific Support for the Analysis of Current Transport Trends in Greece”.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki. The overall research project methods have been approved by the Ethics Committee of Research of National Technical University of Athens (project code: 91008900, Date of approval: 15 September 2021).

Informed Consent Statement

Informed consent was obtained from all subjects involved in this study.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

We sincerely acknowledge the Hellenic Broadcasting Corporation (ERT) for distributing the revealed preferences survey. Their contribution has been invaluable to our study. Additionally, we would like to express our gratitude to Theodore Chatziioannou for his invaluable contributions as an external collaborator, particularly for his work on the AthensPop package.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Figure A1. Spatial distribution of origin locations of cluster 1 trips.

Figure A2. Spatial distribution of origin locations of cluster 2 trips.

Figure A3. Spatial distribution of origin locations of cluster 3 trips.

Figure A4. Spatial distribution of origin locations of cluster 4 trips.

Figure A5. Spatial distribution of origin locations of cluster 5 trips.

Figure A6. Spatial distribution of origin locations of cluster 6 trips.

References

1. Knowles, R.D.; Ferbrache, F.; Nikitas, A. Transport’s Historical, Contemporary and Future Role in Shaping Urban Development: Re-Evaluating Transit Oriented Development. Cities 2020, 99, 102607.
2. Tsigdinos, S.; Tzouras, P.G.; Bakogiannis, E.; Kepaptsoglou, K.; Nikitas, A. The Future Urban Road: A Systematic Literature Review-Enhanced Q-Method Study with Experts. Transp. Res. Part D Transp. Environ. 2022, 102, 103158.
3. Peeters, P.; Dubois, G. Tourism Travel under Climate Change Mitigation Constraints. J. Transp. Geogr. 2010, 18, 447–457.
4. Gurram, S.; Stuart, A.L.; Pinjari, A.R. Agent-Based Modeling to Estimate Exposures to Urban Air Pollution from Transportation: Exposure Disparities and Impacts of High-Resolution Data. Comput. Environ. Urban Syst. 2019, 75, 22–34.
5. Sælensminde, K. Stated Choice Valuation of Urban Traffic Air Pollution and Noise. Transp. Res. Part D Transp. Environ. 1999, 4, 13–27.
6. Müller, J.; Straub, M.; Richter, G.; Rudloff, C. Integration of Different Mobility Behaviors and Intermodal Trips in MATSim. Sustainability 2021, 14, 428.
7. Müller, S.A.; Balmer, M.; Neumann, A.; Nagel, K. Mobility Traces and Spreading of COVID-19. MedRxiv 2020.
8. Garrido-Jiménez, F.J.; Rodríguez-Rojas, M.I.; Vallecillos-Siles, M.R. Recovering Sustainable Mobility after COVID-19: The Case of Almeria (Spain). Appl. Sci. 2024, 14, 1258.
9. Huertas, J.I.; Stöffler, S.; Fernández, T.; García, X.; Castañeda, R.; Serrano-Guevara, O.; Mogro, A.E.; Alvarado, D.A. Methodology to Assess Sustainable Mobility in LATAM Cities. Appl. Sci. 2021, 11, 9592.
10. Chatziioannou, I.; Nakis, K.; Tzouras, P.G.; Bakogiannis, E. How to Monitor and Assess Sustainable Urban Mobility? An Application of Sustainable Urban Mobility Indicators in Four Greek Municipalities. In Smart Energy for Smart Transport; Nathanail, E.G., Gavanas, N., Adamos, G., Eds.; Lecture Notes in Intelligent Transportation and Infrastructure; Springer Nature: Cham, Switzerland, 2023; pp. 1689–1710. ISBN 978-3-031-23720-1.
11. Kepaptsoglou, K.; Karlaftis, M.G.; Gkotsis, I.; Vlahogianni, E.; Stathopoulos, A. Urban Regeneration in Historic Downtown Areas: An Ex-Ante Evaluation of Traffic Impacts in Athens, Greece. Int. J. Sustain. Transp. 2015, 9, 478–489.
12. Tzamourani, E.; Tzouras, P.G.; Tsigdinos, S.; Kosmidis, I.; Kepaptsoglou, K. Exploring the Social Acceptance of Transforming Urban Arterials to Multimodal Corridors. The Case of Panepistimiou Avenue in Athens. Int. J. Sustain. Transp. 2023, 17, 333–347.
13. Te Brömmelstroet, M.; Bertolini, L. Developing Land Use and Transport PSS: Meaningful Information through a Dialogue between Modelers and Planners. Transp. Policy 2008, 15, 251–259.
14. Rasouli, S.; Timmermans, H. Applications of Theories and Models of Choice and Decision-Making under Conditions of Uncertainty in Travel Behavior Research. Travel Behav. Soc. 2014, 1, 79–90.
15. Sallard, A.; Balać, M.; Hörl, S. An Open Data-Driven Approach for Travel Demand Synthesis: An Application to São Paulo. Reg. Stud. Reg. Sci. 2021, 8, 371–386.
16. Heinen, E. Identity and Travel Behaviour: A Cross-Sectional Study on Commute Mode Choice and Intention to Change. Transp. Res. Part F Traffic Psychol. Behav. 2016, 43, 238–253.
17. Szmelter-Jarosz, A.; Suchanek, M. Mobility Patterns of Students: Evidence from Tricity Area, Poland. Appl. Sci. 2021, 11, 522.
18. Choupani, A.-A.; Mamdoohi, A.R. Population Synthesis in Activity-Based Models: Tabular Rounding in Iterative Proportional Fitting. Transp. Res. Rec. 2015, 2493, 1–10.
19. Farooq, B.; Bierlaire, M.; Hurtubia, R.; Flötteröd, G. Simulation Based Population Synthesis. Transp. Res. Part B Methodol. 2013, 58, 243–263.
20. Ballis, H.; Dimitriou, L. Revealing Personal Activities Schedules from Synthesizing Multi-Period Origin-Destination Matrices. Transp. Res. Part B Methodol. 2020, 139, 224–258.
21. Saadi, I.; Mustafa, A.; Teller, J.; Cools, M. Forecasting Travel Behavior Using Markov Chains-Based Approaches. Transp. Res. Part C Emerg. Technol. 2016, 69, 402–417.
22. Borysov, S.S.; Rich, J.; Pereira, F.C. How to Generate Micro-Agents? A Deep Generative Modeling Approach to Population Synthesis. Transp. Res. Part C Emerg. Technol. 2019, 106, 73–97.
23. Saxena, A.; Prasad, M.; Gupta, A.; Bharill, N.; Patel, O.P.; Tiwari, A.; Er, M.J.; Ding, W.; Lin, C.-T. A Review of Clustering Techniques and Developments. Neurocomputing 2017, 267, 664–681.
24. Xu, R.; Wunsch, D. Survey of Clustering Algorithms. IEEE Trans. Neural Netw. 2005, 16, 645–678.
25. Muchlisin, M.; Soza-Parra, J.; Susilo, Y.O.; Ettema, D. Unraveling the Travel Patterns of Ride-Hailing Users: A Latent Class Cluster Analysis across Income Groups in Yogyakarta, Indonesia. Travel Behav. Soc. 2024, 37, 100836.
26. Soza-Parra, J.; Cats, O. Who Is Ready to Live a Car-Independent Lifestyle? A Latent Class Cluster Analysis of Attitudes towards Car Ownership and Usage. Transp. Res. Part A Policy Pract. 2024, 190, 104271.
27. Allahviranloo, M.; Regue, R.; Recker, W. Modeling the Activity Profiles of a Population. Transp. B Transp. Dyn. 2017, 5, 426–449.
28. Hafezi, M.H.; Daisy, N.S.; Millward, H.; Liu, L. Ensemble Learning Activity Scheduler for Activity Based Travel Demand Models. Transp. Res. Part C Emerg. Technol. 2021, 123, 102972.
29. Hafezi, M.H.; Liu, L.; Millward, H. Learning Daily Activity Sequences of Population Groups Using Random Forest Theory. Transp. Res. Rec. 2018, 2672, 194–207.
30. Susilo, Y.O.; Axhausen, K.W. Repetitions in Individual Daily Activity–Travel–Location Patterns: A Study Using the Herfindahl–Hirschman Index. Transportation 2014, 41, 995–1011.
31. Von Luxburg, U. A Tutorial on Spectral Clustering. Stat. Comput. 2007, 17, 395–416.
32. Shang, Q.; Yu, Y.; Xie, T. A Hybrid Method for Traffic State Classification Using K-Medoids Clustering and Self-Tuning Spectral Clustering. Sustainability 2022, 14, 11068.
33. Khan, I.K.; Daud, H.B.; Zainuddin, N.B.; Sokkalingam, R.; Farooq, M.; Baig, M.E.; Ayub, G.; Zafar, M. Determining the Optimal Number of Clusters by Enhanced Gap Statistic in K-Mean Algorithm. Egypt. Inform. J. 2024, 27, 100504.
34. Shahapure, K.R.; Nicholas, C. Cluster Quality Analysis Using Silhouette Score. In Proceedings of the 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA), Sydney, Australia, 6–9 October 2020; pp. 747–748.
35. Tsigdinos, S.; Tzouras, P.G.; Kosmidis, I.; Bakogiannis, E.; Kepaptsoglou, K. Examining the Impact of Bicycle-Oriented Multimodality on Accessibility and Transport Equity in the Metropolitan Area of Athens, Greece. Int. J. Urban Sci. 2024, 28, 495–521.

Figure 1. Percentage of trips included in each cluster.

Figure 2. Percentage of trips included in each cluster vs. (a) selected transport mode and (b) trip purpose.

Figure 3. Distribution of departure time (24 h time format) per cluster: (a) cluster 1, (b) cluster 2, (c) cluster 3, (d) cluster 4, (e) cluster 5 and (f) cluster 6.

Figure 4. Probability density of trip distance per cluster: (a) cluster 1, (b) cluster 2, (c) cluster 3, (d) cluster 4, (e) cluster 5 and (f) cluster 6.

Table 1. Overview of variables used in the analysis.

Variable	Type	Description {Levels} (If Categorical Variable)
Variables that are imported in the spectral clustering process
transport mode	categorical	{car, taxi, bus, train, motorcycle, bicycle, walk, e-scooter}
trip departure time	integer	hours in 24 h format, from 0 to 24.
trip distance	continuous	distance in m between trip origin and destination zone
trip purpose	categorical	{work, return home, education, market, recreation, service, other}
Variables that are used to interpret the clusters
gender	categorical	{female, male}
age group	categorical	{18–30, 31–40, 41–50, 51–65, >65} years old
education level	categorical	{primary school, high school, bachelor, master/PhD}
employment status	categorical	{inactive, unemployed, student, active}
income level	categorical	{0, <750, 751–1500, 1501–2500, >2500} euros
car ownership	categorical	{no, yes}
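
The four clustering inputs listed in Table 1 can be represented as a typed record, as in the minimal sketch below. The class name, the field names, and the validation logic are illustrative assumptions; only the category levels and value ranges are taken from the table.

```python
from dataclasses import dataclass

# Category levels copied from Table 1.
MODES = {"car", "taxi", "bus", "train", "motorcycle", "bicycle", "walk", "e-scooter"}
PURPOSES = {"work", "return home", "education", "market", "recreation", "service", "other"}


@dataclass(frozen=True)
class Trip:
    """One reported trip; field names are hypothetical, levels follow Table 1."""

    mode: str            # transport mode (categorical)
    departure_hour: int  # hour in 24 h format, from 0 to 24
    distance_m: float    # distance in m between origin and destination zone
    purpose: str         # trip purpose (categorical)

    def __post_init__(self):
        if self.mode not in MODES:
            raise ValueError(f"unknown transport mode: {self.mode!r}")
        if self.purpose not in PURPOSES:
            raise ValueError(f"unknown trip purpose: {self.purpose!r}")
        if not 0 <= self.departure_hour <= 24:
            raise ValueError("departure hour must lie between 0 and 24")
        if self.distance_m < 0:
            raise ValueError("distance must be non-negative")


# Hypothetical example: a morning car trip to work over 2.7 km.
example = Trip(mode="car", departure_hour=8, distance_m=2700.0, purpose="work")
print(example)
```

Validating each record against the listed levels before clustering helps to catch coding errors in the survey export early; how the categorical fields are subsequently encoded for the similarity computation is a separate choice that this sketch does not address.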

Table 2. Main attributes of identified clusters.

Cluster	Preferred Transport Modes	Mean Trip Distance (Std. Dev)	Departure Time Period, 75% of Trips	Main Trip Purposes
Cluster 1	Walking (41.7%)	1.68 km (±1.17)	16:00–22:00	Home (50.9%)
Cluster 1	Bus (23.1%)			Recreation (48.1%)
Cluster 2	Car (47.4%)	2.70 km (±1.82)	07:00–11:00	Work (35.4%)
Cluster 2	Walking (16.6%)			Recreation (20.6%)
Cluster 3	Train (44.6%)	14.04 km (±7.46)	07:00–20:00	Home (50.4%)
Cluster 3	Car (28.1%)			Work (22.3%)
Cluster 4	Car (70.2%)	12.95 km (±7.67)	06:00–17:00	Work (58.5%)
Cluster 4	Train (14.6%)			Home (13.8%)
Cluster 5	Train (40.5%)	10.60 km (±9.35)	00:00–12:00	Home (56.1%)
Cluster 5	Car (23.6%)			Recreation (16.2%)
Cluster 6	Car (45.2%)	3.67 km (±4.30)	08:00–12:00	Other (22.6%)
Cluster 6	Train (21.0%)			Recreation (21.0%)

Table 3. Clusters vs. socio-demographic characteristics (in parentheses, the p-value estimated by the chi-square test of independence; p-values below 0.05 denote dependencies that are significant at the 95% confidence level).

The Trip “Owner” Is/Has:	Cluster 1	Cluster 2	Cluster 3	Cluster 4	Cluster 5	Cluster 6
Female	59	337	77	217	84	35
Male	47 (p: 0.250)	189 (p: 0.016)	46 (p: 0.618)	164 (p: 0.160)	65 (p: 0.401)	27 (p: 0.816)
18–30 years	50 (p: 1.000)	233 (p: 0.185)	63 (p: 0.160)	167 (p: 0.327)	90 (p: <0.001)	23 (p: 0.078)
31–40 years	32 (p: 0.051)	116 (p: 0.755)	18 (p: 0.042)	92 (p: 0.293)	27 (p: 0.229)	11 (p: 0.347)
41–50 years	11 (p: 0.094)	71 (p: 0.129)	27 (p: 0.034)	74 (p: 0.012)	20 (p: 0.605)	5 (p: 0.150)
51–65 years	12 (p: 0.371)	101 (p: 0.000)	13 (p: 0.277)	43 (p: 0.026)	9 (p: 0.003)	17 (p: <0.001)
>65 years	3 (p: 0.579)	4 (p: 0.021)	1 (p: 0.367)	4 (p: 0.964)	3 (p: 1.000)	7 (p: <0.001)
Primary School graduate	0	0	1 (p: 0.428)	0	0	0
High School graduate	22 (p: 1.000)	98 (p: 0.340)	27 (p: 0.518)	50 (p: 0.000)	50 (p: 0.000)	22 (p: 0.002)
Bachelor graduate	42 (p: 0.896)	221 (p: 0.217)	50 (p: 0.673)	156 (p: 0.755)	48 (p: 0.059)	20 (p: 0.258)
Master/PhD graduate	46 (p: 0.767)	207 (p: 0.749)	42 (p: 0.264)	175 (p: 0.007)	50 (p: 0.091)	20 (p: 0.263)
Inactive	5 (p: 0.767)	26 (p: 0.749)	1 (p: 0.264)	9 (p: 0.007)	3 (p: 0.091)	16 (p: 0.263)
Unemployed	5 (p: 0.737)	7 (p: 0.763)	22 (p: 0.072)	4 (p: 0.027)	5 (p: 0.192)	1 (p: <0.001)
Student	22 (p: 0.005)	80 (p: 0.528)	98 (p: 0.249)	39 (p: 0.342)	46 (p: 0.185)	12 (p: 1.000)
Active	72 (p: 0.212)	411 (p: 0.335)	9 (p: 0.707)	330 (p: 0.000)	92 (p: 0.000)	32 (p: 0.434)
0 euros income	19 (p: 0.232)	63 (p: 0.718)	15 (p: 1.000)	34 (p: 0.012)	28 (p: 0.028)	12 (p: 0.192)
<750 euros income	22 (p: 1.000)	102 (p: 0.798)	26 (p: 0.939)	65 (p: 0.129)	42 (p: 0.011)	12 (p: 1.000)
751–1500 euros income	51 (p: 0.789)	269 (p: 0.211)	53 (p: 0.421)	194 (p: 0.442)	55 (p: 0.007)	34 (p: 0.365)
1501–2500 euros income	15 (p: 0.634)	96 (p: 0.170)	20 (p: 0.925)	69 (p: 0.392)	15 (p: 0.060)	4 (p: 0.081)
>2500 euros income	4 (p: 0.497)	5 (p: 0.014)	3 (p: 1.000)	16 (p: 0.022)	4 (p: 0.956)	0 (p: 0.422)
not car owner	26	92	26	30	35	0
car owner	82 (p: 0.027)	433 (p: 0.293)	95 (p: 0.119)	353 (p: 0.000)	113 (p: 0.012)	54 (p: 0.599)
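
The chi-square test of independence referred to in the caption of Table 3 can be sketched for a single cell as follows. The sketch assumes a one-vs-rest 2 × 2 layout (trips inside a given cluster vs. all other trips, by gender) and no continuity correction; both choices are assumptions, so the published p-values are not necessarily reproduced for every cell. The gender counts are taken from Table 3, and the p-value uses the identity that, for one degree of freedom, the chi-square survival function equals erfc(sqrt(x/2)).

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for the 2 x 2 table [[a, b], [c, d]].

    No continuity correction is applied. Returns (statistic, p_value).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Trip counts per cluster (clusters 1 to 6) by gender, from Table 3.
female = [59, 337, 77, 217, 84, 35]
male = [47, 189, 46, 164, 65, 27]

k = 1  # zero-based index of cluster 2
chi2, p = chi_square_2x2(
    female[k], male[k],
    sum(female) - female[k], sum(male) - male[k],
)
print(f"cluster 2 vs. rest: chi2 = {chi2:.2f}, p = {p:.3f}")
```

The gender counts sum to 1347, the number of reported trips, which offers a quick consistency check on the table before any test is run.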

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Andrinopoulou, E.; Tzouras, P.G. Applying Spectral Clustering to Decode Mobility Patterns in Athens, Greece. Appl. Sci. 2025, 15, 3419. https://doi.org/10.3390/app15073419
