Applying Spectral Clustering to Decode Mobility Patterns in Athens, Greece
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The study aims to analyze travel demand patterns for some classified modes of transport by utilizing an advanced clustering approach to explain the travel behaviors of different socio-economic groups of people. The authors nicely discussed the gap in research with references in the introduction and clearly explained the methodology. However, the article lacks novelty and insightful discussion to be considered for publication in its current form. The data itself is valuable, and the way you analyze it and develop clustering is interesting. However, I would suggest you add more discussion that explains how your results are better than the existing approaches with some references. Moreover, in the conclusion, you claimed that this is the first time spectral clustering has been used to analyze modes of transport and this provides you with valuable insight for transportation planning. But reading the article I just found some generalized statements rather than conclusive arguments that prove your claim. Additional analysis is also needed.
Some minor corrections that I found while reading:
In Figure 4, please add the unit for trip distance.
Edit the title of Table 1.
Author Response
The study aims to analyze travel demand patterns for some classified modes of transport by utilizing an advanced clustering approach to explain the travel behaviors of different socio-economic groups of people. The authors nicely discussed the gap in research with references in the introduction and clearly explained the methodology.
Thank you for your kind feedback. We appreciate your recognition of our approach to analyzing travel demand patterns using clustering, and we hope our findings contribute meaningfully to the understanding of urban mobility.
Each point you provided below was carefully considered, and we have addressed these suggestions to further improve the manuscript. Please note that text modifications are highlighted in yellow, and additions are shown in green.
However, the article lacks novelty and insightful discussion to be considered for publication in its current form. The data itself is valuable, and the way you analyze it and develop clustering is interesting. However, I would suggest you add more discussion that explains how your results are better than the existing approaches with some references.
We understand your concerns regarding the novelty of our research. After examining all (potential) dimensions of the problem (and adding more figures with results), we can confidently assert that our research is novel, as it successfully interprets the mobility patterns of Athens by running a single clustering approach. The clarity and consistency of the identified clusters, as reflected in all figures, serve as strong evidence supporting this conclusion. This study decodes mobility, as the title says. Last but not least, we have to keep in mind that this is applied science research; therefore, the novelty should lie in the results. Section 4.1 was expanded to underline the novelty of our study and its findings (see paragraphs 2 and 3) compared to previous studies.
Moreover, in the conclusion, you claimed that this is the first time spectral clustering has been used to analyze modes of transport and this provides you with valuable insight for transportation planning. But reading the article I just found some generalized statements rather than conclusive arguments that prove your claim. Additional analysis is also needed.
It is indeed the first time that spectral clustering has been applied to investigate mobility and mode choices. To address this comment, we extended paragraphs 3 and 4 of the introduction. Previous approaches are now presented in detail, which helps clarify the research gap more effectively.
Some minor corrections that I found while reading: In Figure 4, please add the unit for trip distance. Edit the title of Table 1.
Thank you for pointing this out. Both corrections were made. Note that new figures and tables were added, so their order has changed substantially. Thank you again for taking the time to review our manuscript.
Reviewer 2 Report
Comments and Suggestions for Authors
- The choice of spectral clustering over other clustering techniques (e.g., DBSCAN, hierarchical clustering) is not sufficiently justified. While spectral clustering is noted for handling non-linear data, a comparative analysis or references to prior studies in similar contexts would strengthen the rationale. I suggest adding a brief comparison of spectral clustering with alternative methods, highlighting its advantages for this specific dataset (e.g., high dimensionality, non-spherical clusters).
- The survey distribution via ERT websites/radio may skew the sample toward younger or tech-literate demographics, as acknowledged in the limitations. This raises concerns about generalizability. Please discuss potential biases explicitly and consider weighting adjustments or sensitivity analyses to assess how overrepresentation affects cluster validity.
- The use of Euclidean distance multiplied by 1.3 to approximate network distances lacks validation. The arbitrary factor (1.3) is not referenced or justified. It will be better to provide empirical or literature-based justification for the 1.3 multiplier or validate it against a subset of real network distances.
- Socio-demographic differences between clusters (e.g., car usage in Cluster 2) are presented descriptively but lack statistical tests (e.g., chi-square, ANOVA) to confirm significance. Include statistical tests to verify whether observed differences (e.g., mode choice, age groups) are significant, strengthening the interpretative claims.
- The conclusion states that high-cost metro projects may have limited impact, yet Cluster 3 highlights metro usage for long commutes, suggesting metro expansion could still be beneficial. Reconcile this contradiction by clarifying that metro improvements may target long-distance commuters, while short trips require alternative interventions (e.g., buses, micro-mobility).
- The term "vulnerable populations" in Cluster 6 (e.g., elderly, inactive residents) is undefined and lacks socio-demographic specificity. Define vulnerability metrics (e.g., income, age, employment status) and provide data to support this classification.
- The assumption that trips form a chain (destination of trip i = origin of trip i+1) may not account for complex itineraries (e.g., starting from work). Discuss how deviations from this assumption (e.g., multi-stop trips) were handled or propose limitations more explicitly.
- The socio-demographic analysis (Table 2) lacks exploration of causal factors (e.g., why Cluster 2 prefers cars for short trips). Incorporate qualitative insights (e.g., survey comments) or literature references to hypothesize behavioral motivations (e.g., convenience, safety perceptions).
- The limitation regarding "returning home" trips is mentioned but not quantitatively addressed (e.g., how these trips influenced clustering). Provide a brief analysis of how excluding "returning home" trips might alter cluster composition or interpretations.
Author Response
The choice of spectral clustering over other clustering techniques (e.g., DBSCAN, hierarchical clustering) is not sufficiently justified. While spectral clustering is noted for handling non-linear data, a comparative analysis or references to prior studies in similar contexts would strengthen the rationale. I suggest adding a brief comparison of spectral clustering with alternative methods, highlighting its advantages for this specific dataset (e.g., high dimensionality, non-spherical clusters).
This was indeed a serious limitation of our work. Thank you for the comment; it helped us improve the paper.
To address this issue, we extended two parts of the paper. In the third paragraph of the introduction, we present previous travel behavior studies that applied clustering techniques. We highlight their objective to underline the research gap. We also highlight the necessity to process mixed data (dummy and continuous variables) with a flexible and scalable technique.
In the first three paragraphs of section 2.1, we explain how spectral clustering meets this requirement compared to other techniques you mentioned. We also considered the review study of Saxena et al. (2017) that compared clustering techniques.
At this point, we would like to let you know that text modifications are highlighted in yellow, and additions are shown in green.
The survey distribution via ERT websites/radio may skew the sample toward younger or tech-literate demographics, as acknowledged in the limitations. This raises concerns about generalizability. Please discuss potential biases explicitly and consider weighting adjustments or sensitivity analyses to assess how overrepresentation affects cluster validity.
This is not entirely accurate, as ERT websites and radio tend to be followed more frequently by individuals aged 50 and above. Therefore, this strategy improved the sample representation, considering the main social groups of Athenians. Of course, there is a relatively higher representation of young people, which influences the number of trips included in each cluster (see Figure 1). This limitation is mentioned in section 4.2. However, it does not affect the formation of the clusters: even if a cluster contains a single trip, it is still a cluster. To address this comment, we clarified the objective of this study, which is the identification of key trip clusters. In addition, we modified Table 3 so that it provides the frequencies per social group and cluster. It now gives a better overview of the sample representation.
The use of Euclidean distance multiplied by 1.3 to approximate network distances lacks validation. The arbitrary factor (1.3) is not referenced or justified. It will be better to provide empirical or literature-based justification for the 1.3 multiplier or validate it against a subset of real network distances.
It is a mean factor based on previous accessibility analyses considering different transport modes in Athens. In Athens, network distance is on average 1.3 times larger than Euclidean distance. We added a reference in the second paragraph of section 2.2. Of course, the use of this flat value is a limitation of the study. It is mentioned in section 4.2. Yet, the effect of this limitation on the final results can be assumed negligible, since the study follows a quite macroscopic perspective, using zone centroids instead of very specific origin locations.
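The approximation described in this response can be sketched as follows. This is a minimal illustration, assuming that zone centroids are available as planar (x, y) coordinates in kilometres; the function names and the sample coordinates are hypothetical and do not come from the manuscript.

```python
import math

# Flat detour factor quoted in the response for Athens.
DETOUR_FACTOR = 1.3


def euclidean_km(origin, destination):
    """Straight-line distance between two centroids given as (x, y) in km."""
    return math.hypot(destination[0] - origin[0], destination[1] - origin[1])


def approx_network_km(origin, destination, factor=DETOUR_FACTOR):
    """Approximate network distance as factor * Euclidean distance."""
    return factor * euclidean_km(origin, destination)


# Hypothetical centroids, for illustration only.
origin_centroid = (0.0, 0.0)
destination_centroid = (3.0, 4.0)
trip_km = approx_network_km(origin_centroid, destination_centroid)
```

Validating the factor, as the reviewer suggests, would amount to comparing `approx_network_km` against routed distances for a sample of origin-destination pairs.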
Socio-demographic differences between clusters (e.g., car usage in Cluster 2) are presented descriptively but lack statistical tests (e.g., chi-square, ANOVA) to confirm significance. Include statistical tests to verify whether observed differences (e.g., mode choice, age groups) are significant, strengthening the interpretative claims.
This was indeed a second problematic point of our paper. Thank you for pointing it out. We decided to incorporate a chi-square test to investigate the significance of these relationships. In Table 3, the resulting p-values are provided in parentheses. In most cases, this analysis confirmed our observations. We have added details to Section 4.1 to include the confidence interval, ensuring a clearer interpretation of the validity of each observation.
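For readers unfamiliar with the test, the sketch below computes the Pearson chi-square statistic of independence for a contingency table of a social group against cluster membership. It is an illustration under stated assumptions: the counts are invented for the example and are not taken from Table 3, and the function name is hypothetical.

```python
def chi_square_statistic(table):
    """Return (statistic, degrees_of_freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Invented counts: rows = two age groups, columns = in / not in a given cluster.
observed_counts = [[30, 10],
                   [20, 40]]
statistic, dof = chi_square_statistic(observed_counts)

# With 1 degree of freedom, the 5% critical value is about 3.841.
is_significant = statistic > 3.841
```

A statistic above the critical value indicates that cluster membership and the social group are unlikely to be independent; it does not by itself say which cells drive the dependence.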
The conclusion states that high-cost metro projects may have limited impact, yet Cluster 3 highlights metro usage for long commutes, suggesting metro expansion could still be beneficial. Reconcile this contradiction by clarifying that metro improvements may target long-distance commuters, while short trips require alternative interventions (e.g., buses, micro-mobility).
We agree that this argument was not written correctly. We made some modifications based on your suggestion. They appear at the beginning of the last paragraph of chapter 4. Metro line projects may be effective in moving trips from cluster 4 to cluster 3. Yet, it seems that cluster 2 will remain, and it is a serious issue. This analysis provides some valuable insights into mobility patterns, helping to better understand mobility and effectively prioritize policy interventions.
The term "vulnerable populations" in Cluster 6 (e.g., elderly, inactive residents) is undefined and lacks socio-demographic specificity. Define vulnerability metrics (e.g., income, age, employment status) and provide data to support this classification.
We completely modified it. The chi-square test indicates that the dependence is between the 65+ age group and cluster 6. The new sentence is at the end of paragraph 1 of section 4.1.
The assumption that trips form a chain (destination of trip i = origin of trip i+1) may not account for complex itineraries (e.g., starting from work). Discuss how deviations from this assumption (e.g., multi-stop trips) were handled or propose limitations more explicitly.
The limitation does not lie in the way trip origins and destinations are kept. The formation of chains is important when collecting diaries. The limitation lies in the number of trips a respondent can report, as well as the inability to fully describe multimodal trips. In the dataset, only the primary transport mode was recorded. Complex itineraries could not be perfectly described due to these limitations. Both limitations are mentioned in section 4.2. It is indeed questionable how spectral clustering can treat complex itineraries. This requires further research, as said in section 4.2.
The socio-demographic analysis (Table 2) lacks exploration of causal factors (e.g., why Cluster 2 prefers cars for short trips). Incorporate qualitative insights (e.g., survey comments) or literature references to hypothesize behavioral motivations (e.g., convenience, safety perceptions).
The observation is absolutely correct. Our study does not provide quantitative insights into why these clusters emerge. This is a key distinction from previous studies that employed clustering techniques. While those studies were successful in explaining trip generation, they did not fully decode overall mobility patterns. Our study achieved the opposite. It decoded mobility trends, which we believe offers a unique contribution to policy making. To further clarify this, we have expanded the discussion in Section 4.1 to include a comparison with previous studies.
The limitation regarding "returning home" trips is mentioned but not quantitatively addressed (e.g., how these trips influenced clustering). Provide a brief analysis of how excluding "returning home" trips might alter cluster composition or interpretations.
The new figures include return-home trips. This limitation has therefore been overcome, and we removed it from the text. Thank you again for your thoughtful comments and for taking the time to review our manuscript.
Reviewer 3 Report
Comments and Suggestions for Authors
The Authors employ the spectral clustering method to uncover major demand patterns across various transport modes. This approach enables them to identify issues such as the problem of short trips by car. The study, conducted in Athens, Greece, is based on previously collected survey data, which were not part of the presented research. The Authors propose a methodology that enhances the analysis of these data.
The manuscript requires minor expansions in the introduction and more substantial additions to the methodological description, particularly regarding the process of transforming source data for analysis. Additionally, some revisions are necessary.
Introduction
Overall, the introduction effectively outlines the research problem. However, the literature review appears too general (concise). For instance, in lines 59–63, the Authors state:
"Due to the technological advancements, methods such as: Factors Analysis and Analysis of Variance (ANOVA) [16], Iterative Proportional Fitting [17], Monte Carlo Simulation [18], Hidden Markov models [19], Hierarchical models [20], Clustering Algorithms [21,22], and Neural Networks [23] are applied now in transport science to accurately describe travel demand."
Referring to such general methods could be further specified with more details.
The introduction should be expanded to include a discussion on applying spectral clustering as a dimensionality reduction process in scientific research, particularly in transportation studies or other relevant fields. Some of this information is already present in the methodology section, for example, in line 105.
Data and methods
The authors clearly describe the applied methodology; however, the explanation requires further clarification of key terms. The process of data preprocessing and preparation for analysis should be described more clearly and in greater detail.
In the methodology section (lines 104–115), several terms require definition or explanation, or at least references to literature where they are properly defined. However, it would be preferable to include their definitions directly in the text. These terms include: "a spectral domain," "network connectivity," and "weights" (are these similarity weights from the similarity matrix?). What do these terms represent, and what is their significance?
Line 146: Does "the sum of the weights" refer to "the sum of the similarity weights"? Please clarify.
Formula (3): The symbol E appears—please provide its definition.
Formula (4): The symbol D appears—please explicitly introduce it earlier as the "degree matrix," for example, in line 143, or add explanations for S and D below the formula.
Lines 167–168: The authors refer to "the Gap Statistic Method and Silhouette scores." Please add references to literature that describe these methods.
Figure 1: Please add the spatial data source, explain the color scheme (legend), and consider including a cartographic grid.
The data preparation process has been described too briefly. Please provide a more detailed explanation. In lines 196-201, the Authors wrote:
"The variables that imported in the spectral clustering were the travel distance (from centroid to centroid), the start time of the trip, the transport mode, as well as the trip purpose. Dummy coding is applied in all variables and for all potential options in the form, except the trip distance which is considered as a continuous variable. Socio-demographic characteristics are not imported in the cluster analysis, but they utilized later to interpret the clusters."
Please list the variables used for clustering, possibly in a table, and specify the method for determining dummy coding (at least by providing concrete examples). Additionally, clarify the units in which distances were expressed and whether they were normalized.
Results
Please add a chart showing the percentage of trips in each cluster. This will facilitate the interpretation of the results.
In lines 212–213, the authors state:
"Furthermore, at least 75% of the trips begin during the afternoon and evening hours, between 16:00 and 22:00 (see Figure 2)."
However, I do not see any information about trip timing in Figure 2. Should this reference be to Figure 3 instead? Please verify and correct if necessary.
Discussion and Conclusions: The Authors present an interesting analysis of the socio-economic characteristics of trip participants in each cluster. The conclusions are well-formulated, and the Authors appropriately acknowledge the limitations of the applied methodology.
Author Response
The Authors employ the spectral clustering method to uncover major demand patterns across various transport modes. This approach enables them to identify issues such as the problem of short trips by car. The study, conducted in Athens, Greece, is based on previously collected survey data, which were not part of the presented research. The Authors propose a methodology that enhances the analysis of these data.
We attempted to address all of your comments in the new version of the manuscript. Please note that text modifications are highlighted in yellow, and additions are shown in green.
The manuscript requires minor expansions in the introduction and more substantial additions to the methodological description, particularly regarding the process of transforming source data for analysis. Additionally, some revisions are necessary.
We acknowledge that section 2.2 was quite short. To fix this issue and explain better how trip data were imported in the spectral clustering analysis, Table 1 was added. It gives the variables and their levels that were utilized to perform spectral clustering and interpret the clusters. Also, we expanded the last paragraph of section 2.2 to explain how data were processed.
Overall, the introduction effectively outlines the research problem. However, the literature review appears too general (concise). For instance, in lines 59–63, the Authors state: "Due to the technological advancements, methods such as: Factors Analysis and Analysis of Variance (ANOVA) [16], Iterative Proportional Fitting [17], Monte Carlo Simulation [18], Hidden Markov models [19], Hierarchical models [20], Clustering Algorithms [21,22], and Neural Networks [23] are applied now in transport science to accurately describe travel demand." Referring to such general methods could be further specified with more details.
Thank you for the comment. This was a problematic part of the introduction in the previous version. We decided to give more details about how these techniques have been used in the past to describe travel demand, so this part was significantly expanded. Of course, there are numerous studies and models that have attempted to predict travelers' behavior, which is itself a problem. We tried to write a conclusion that explains why clustering techniques are more appealing methods for transport planners and daily practice (see paragraph 3 of chapter 1).
The introduction should be expanded to include a discussion on applying spectral clustering as a dimensionality reduction process in scientific research, particularly in transportation studies or other relevant fields. Some of this information is already present in the methodology section, for example, in line 105.
The second and third paragraphs of the introduction were extended to answer this comment too. As mentioned, in the second paragraph, we added a number of innovative techniques that have been applied in transport modeling apart from the classic 4-stage model. Of course, one of these is cluster analysis. Therefore, in the third paragraph, we present travel behavior studies that utilized clustering algorithms. In the methodology section, we focused only on spectral clustering. Lastly, this clustering method is not explicitly intended for dimensionality reduction but rather as a solution to address data availability limitations by leveraging the power of repetition. This is now better explained in the last paragraph of the introduction.
The authors clearly describe the applied methodology; however, the explanation requires further clarification of key terms. The process of data preprocessing and preparation for analysis should be described more clearly and in greater detail.
The introduction paragraph of chapter 2 was rewritten. The methodological process has three steps. First, the trip data are processed, followed by their integration into the spectral clustering analysis. The resulting outputs are then examined through descriptive statistics, allowing a comprehensive interpretation of the results.
In the methodology section (lines 104–115), several terms require definition or explanation, or at least references to literature where they are properly defined. However, it would be preferable to include their definitions directly in the text. These terms include: "a spectral domain," "network connectivity," and "weights" (are these similarity weights from the similarity matrix?). What do these terms represent, and what is their significance?
Thank you for the comment. In paragraph 2 of section 2.1, the definitions of spectral domain and graph (and not network) connectivity are given. We replaced the word "weight" with "similarity level" to be more consistent.
Line 146: Does "the sum of the weights" refer to "the sum of the similarity weights"? Please clarify.
We changed it to similarity values. Thank you for this recommendation.
Formula (3): The symbol E appears—please provide its definition.
It denotes all the trips (data points) included in the set. We simplified the formula to help the reader.
Formula (4): The symbol D appears—please explicitly introduce it earlier as the "degree matrix," for example, in line 143, or add explanations for S and D below the formula.
They are all matrices. We now provide an explanation below the equation. Thank you for pointing this out.
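As a reading aid for the symbols discussed here, the sketch below builds a degree matrix D from a similarity matrix S, where each diagonal entry is the sum of the similarity values in the corresponding row. Formula (4) itself is not reproduced in this record, so the unnormalized Laplacian L = D - S used below is an assumption (it is the most common textbook form), and the similarity values are invented.

```python
def degree_matrix(S):
    """Diagonal matrix whose i-th entry is the sum of row i of S."""
    n = len(S)
    return [[sum(S[i]) if i == j else 0.0 for j in range(n)] for i in range(n)]


def unnormalized_laplacian(S):
    """Assumed form L = D - S; the manuscript may use a normalized variant."""
    D = degree_matrix(S)
    n = len(S)
    return [[D[i][j] - S[i][j] for j in range(n)] for i in range(n)]


# Invented symmetric similarity values between three trips.
S = [[0.0, 0.8, 0.1],
     [0.8, 0.0, 0.3],
     [0.1, 0.3, 0.0]]
D = degree_matrix(S)
L = unnormalized_laplacian(S)
```

Under this form, every row of L sums to zero, which is a quick sanity check when implementing the method.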
Lines 167–168: The authors refer to "the Gap Statistic Method and Silhouette scores." Please add references to literature that describe these methods.
We created a new paragraph at the end of section 2.1 explaining both methods. We also added the formula of Silhouette score. For both methods, we added references. Thank you for the suggestion. We think that the theoretical part of the paper was generally improved.
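For reference, the Silhouette score of a point compares its mean distance a to the other members of its own cluster with the smallest mean distance b to any other cluster, as (b - a) / max(a, b). The sketch below is a minimal pure-Python illustration of that standard definition; the data, labels, and function name are invented, and scoring singleton clusters as 0 is the usual convention rather than something stated in the manuscript.

```python
def silhouette_scores(points, labels, dist):
    """Per-point Silhouette scores for a given labeling and distance function."""
    scores = []
    for i, p in enumerate(points):
        # Collect distances from point i to every other point, grouped by cluster.
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        own = by_cluster.pop(labels[i], [])
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster or only one cluster present
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_cluster.values())
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Invented one-dimensional example with two well-separated groups.
values = [0.0, 1.0, 10.0, 11.0]
cluster_labels = [0, 0, 1, 1]
scores = silhouette_scores(values, cluster_labels, lambda x, y: abs(x - y))
mean_score = sum(scores) / len(scores)
```

Scores close to 1 indicate compact, well-separated clusters; comparing `mean_score` across candidate numbers of clusters is one way to choose that number.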
Figure 1: Please add the spatial data source, explain the color scheme (legend), and consider including a cartographic grid.
The figure was indeed complicated. We decided to remove it. As a replacement, there are maps in Appendix A that present the spatial distribution of trip origins per cluster, as well as the zones and their centroids.
The data preparation process has been described too briefly. Please provide a more detailed explanation. In lines 196-201, the Authors wrote: "The variables that imported in the spectral clustering were the travel distance (from centroid to centroid), the start time of the trip, the transport mode, as well as the trip purpose. Dummy coding is applied in all variables and for all potential options in the form, except the trip distance which is considered as a continuous variable. Socio-demographic characteristics are not imported in the cluster analysis, but they utilized later to interpret the clusters." Please list the variables used for clustering, possibly in a table, and specify the method for determining dummy coding (at least by providing concrete examples). Additionally, clarify the units in which distances were expressed and whether they were normalized.
Thank you for this comment. The description of the data preparation process was indeed poor in the previous version of the manuscript. We created Table 1, which lists the variables that were utilized in the two stages of the analysis process: clustering analysis and interpretation of clusters. Table 1 also gives the levels of the variables. What is more, the last paragraph of section 2.2 was expanded to explain what a dummy coding scheme is. In simple terms, one column of binary values was created per level of each categorical variable.
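The dummy coding scheme described in this response can be illustrated with a short sketch. It creates one binary column per level of each categorical variable and leaves the trip distance untouched as a continuous variable. The variable names, levels, and records below are hypothetical examples, not the contents of Table 1.

```python
def dummy_code(records, categorical):
    """Return (column_names, rows) with one 0/1 column per variable level."""
    # Levels are sorted so that the column order is reproducible.
    levels = {var: sorted({r[var] for r in records}) for var in categorical}
    columns = [f"{var}={lvl}" for var in categorical for lvl in levels[var]]
    rows = [
        [1 if r[var] == lvl else 0 for var in categorical for lvl in levels[var]]
        for r in records
    ]
    return columns, rows


# Hypothetical trip records; distance_km stays continuous and is not coded.
trips = [
    {"mode": "car", "purpose": "work", "distance_km": 2.1},
    {"mode": "metro", "purpose": "work", "distance_km": 9.4},
    {"mode": "walk", "purpose": "leisure", "distance_km": 0.8},
]
columns, rows = dummy_code(trips, ["mode", "purpose"])
```

Each row contains exactly one 1 per categorical variable, so the coded columns can be placed next to the continuous distance column before computing similarities.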
Please add a chart showing the percentage of trips in each cluster. This will facilitate the interpretation of the results.
Thank you for your feedback. We have added Figure 1 to provide an overview of the trip distribution across clusters. The pie charts help illustrate which clusters contain the highest number of observations, offering a clearer perspective on the data. Additionally, this visualization enhances the flow of the text, guiding the results presentation from general to specific.
In lines 212–213, the authors state: "Furthermore, at least 75% of the trips begin during the afternoon and evening hours, between 16:00 and 22:00 (see Figure 2)." However, I do not see any information about trip timing in Figure 2. Should this reference be to Figure 3 instead? Please verify and correct if necessary.
We have corrected all figure references in Chapters 4 and 5. Since Chapter 4 is now structured by cluster rather than by figure, we found it more appropriate to reference each figure whenever a specific value is mentioned. These additions should improve the readability of the results presentation.
Discussion and Conclusions: The Authors present an interesting analysis of the socio-economic characteristics of trip participants in each cluster. The conclusions are well-formulated, and the Authors appropriately acknowledge the limitations of the applied methodology.
Thank you for the kind words. We would like to inform you that we have expanded our analysis by incorporating the chi-square test to further examine the relationships between socio-economic characteristics and cluster membership. This addition strengthens our conclusions by providing statistical validation of the observed dependencies.
Thank you again for the feedback and the time you dedicated to reviewing our paper.
Reviewer 4 Report
Comments and Suggestions for Authors
The idea is interesting for beginners who have an interest in applying Spectral Clustering and could be beneficial to reduce the limitation of mobility data in a particular city. The model is only applicable to Athens and does not cover broader impact within the field. How is your model beneficial for other cities? My comments are as follows:
I suggest that the authors add more demographic survey data in order to cover a wide range instead of only a particular group of people.
The authors used 6 clusters. Where are those clusters in your study area? I did not see any of them.
Data collection: How did you collect the data? Is it a survey? This should be explained clearly. The authors should cite references for the equations or at least explain them in detail.
How do you cluster the travel behaviors? I suggest that the authors add a block diagram to let the readers understand how the model works.
English could be polished as well.
Add the limitations of your work.
Comments on the Quality of English Language
English needs to be polished as well.
Author Response
The idea is interesting for beginners who have an interest in applying Spectral Clustering and could be beneficial to reduce the limitation of mobility data in a particular city. The model is only applicable to Athens and does not cover broader impact within the field. How is your model beneficial for other cities?
We think that this method can be applied in other cities. As we now mention in the last paragraph of the introduction, spectral clustering is a flexible and scalable method. We provide more justification for this in the second paragraph of section 2.1. It can treat larger trip datasets that may exist in other cities. The only challenge (also added in 2.1) is the interpretability of the clusters, which cannot be guaranteed a priori. Nevertheless, the results are quite promising in the end. The clusters seem quite distinct and logical. At this point, we would like to let you know that text modifications are highlighted in yellow, and additions are shown in green.
My comments are as follows:
I suggest that the authors add more demographic survey data in order to cover a wide range instead of only a particular group of people.
Unfortunately, this was not possible due to data limitations. Nevertheless, it is important to keep in mind that this study does not search for the most popular cluster of trips that exists in Athens. It does not seek to explain the behavior of the majority. It aims to capture the full spectrum of travel behavior, regardless of the cluster size. It identifies key demand patterns, as mentioned in paragraph 5 of chapter 1. Therefore, a larger dataset that includes the same social groups and trip demand patterns would not contribute much to meeting this objective.
The authors used 6 clusters. Where are those clusters in your study area? I did not see any of them.
Good comment, thank you. It led us to investigate the spatial distribution of trip origins per cluster. There are some interesting findings, which are presented in the last paragraph of chapter 4. The 6 maps are now given in Appendix A. The distinction between clusters is evident in space too.
Data collection: How did you collect the data? Is it a survey? This should be explained clearly. The authors should cite references for the equations or at least explain them in detail.
Thank you for pointing this out, because the first sentence of section 2.2 was not correct. It was a revealed preferences online survey distributed by ERT. We also improved paragraph 1 of this section to give more details about the data collection process.
How do you cluster the travel behaviors? I suggest that the authors add a block diagram to let the readers understand how the model works.
We acknowledge this limitation of our study. To address this problem, we added Table 1, which gives a quite comprehensive overview of the variables that were used to perform spectral clustering and to interpret the clusters later. In addition, we expanded section 2.2 to explain how data processing was performed.
English could be polished as well.
The English was generally improved through a second reading of the manuscript. A colleague helped us correct some syntax and grammar errors.
Add the limitations of your work.
Section 4.2 explicitly states the study limitations. Overall, we improved this section. Thank you again for the feedback.
Reviewer 5 Report
Comments and Suggestions for Authors
First of all, congratulations on this research. It's a great paper.
If it is of help to you, the article presents a well-structured study that effectively integrates the spectral clustering method with mobility data from Athens, Greece. The research is conceptually strong, and it contributes valuable insights into urban mobility analysis. The introduction provides a comprehensive background on the research problem, clearly describing the challenges related to mobility data availability and the relevance of using spectral clustering for mobility pattern analysis. However, one aspect that could improve the introduction is a more explicit explanation of how the lack of mobility data since 2006 has been addressed. While the study mentions the use of a revealed preference survey from 2022, detailing additional sources or methodological assumptions that bridge the gap between 2006 and 2022 would enhance the study’s credibility. The article follows a logical and coherent structure, evolving from a broad presentation of the problem to a conceptually strong connection between spectral clustering and survey data. This transition is well-executed, yet the conceptual connection could be further strengthened by discussing how data scarcity over time was handled. If there were methodological adjustments to compensate for the lack of historical data, these should be explicitly stated.
The methods section is well-detailed and appropriately describes the application of spectral clustering, including its advantages over traditional clustering techniques. The explanation of the clustering process and the choice of metrics, such as Manhattan distance, is clear. However, a potential improvement would be to expand the discussion on the combination of survey data with assumptions made to address low representativeness. Given that the dataset consists of only 1,347 trips, which is a small sample relative to the total volume of trips in Athens, it would be useful to describe in more detail how this limitation was addressed to ensure meaningful clustering results. For example, were weighting techniques or post-sampling adjustments applied? It'd be a useful overview.
The results are well-conceptualized and logically aligned with the study’s objectives, and the clustering analysis effectively differentiates mobility patterns, providing insights into urban travel behavior. However, the addition of graphical visualizations would significantly enhance the clarity and impact of the findings. While the textual description of the six clusters is detailed, including heat maps or spatial distribution graphs of mobility clusters, time-based charts illustrating peak travel hours per cluster, and mode share distribution graphs per cluster would make the results more accessible and easier to interpret. The conclusions are well-founded in the results and provide meaningful implications for urban mobility policies, successfully linking the spectral clustering findings with policy recommendations.
Author Response
First of all, congratulations on this research. It's a great paper.
Thank you for the positive feedback and the time you dedicated to reviewing our paper.
If it is of help to you, the article presents a well-structured study that effectively integrates the spectral clustering method with mobility data from Athens, Greece. The research is conceptually strong, and it contributes valuable insights into urban mobility analysis. The introduction provides a comprehensive background on the research problem, clearly describing the challenges related to mobility data availability and the relevance of using spectral clustering for mobility pattern analysis.
We have tried to reply specifically to all your concerns. At this point, we would like to let you know that text modifications are highlighted in yellow, and additions are shown in green.
However, one aspect that could improve the introduction is a more explicit explanation of how the lack of mobility data since 2006 has been addressed. While the study mentions the use of a revealed preference survey from 2022, detailing additional sources or methodological assumptions that bridge the gap between 2006 and 2022 would enhance the study’s credibility.
We agree with this comment. Nevertheless, it is not within the scope of the study, as we attempted to exploit the full power of the current dataset. We mention the 2006 data gap in the introduction only to underline the importance of the problem we have in Athens. This can serve as a recommendation for future research and development. To this end, we propose the creation of a pipeline integrated within the spectral clustering framework. This will facilitate the harmonization and utilization of all available data from 2006 until now.
The article follows a logical and coherent structure, evolving from a broad presentation of the problem to a conceptually strong connection between spectral clustering and survey data. This transition is well-executed, yet the conceptual connection could be further strengthened by discussing how data scarcity over time was handled. If there were methodological adjustments to compensate for the lack of historical data, these should be explicitly stated.
This comment is closely connected with the previous one. It is definitely a limitation of our study that we could not overcome. The collected trips reflect the situation as it was in 2022. We do not have any evidence with which to investigate long-term changes in mobility patterns. Cities with historical trip data may be able to do this analysis. The applied method can be used for this, as it decodes mobility patterns well.
Therefore, to address this comment, we added an extra limitation in chapter 4.2. In addition, we transformed it into a recommendation for further research. It is now mentioned at the end of paragraph 1 of section 4.1.
The methods section is well-detailed and appropriately describes the application of spectral clustering, including its advantages over traditional clustering techniques. The explanation of the clustering process and the choice of metrics, such as Manhattan distance, is clear. However, a potential improvement would be to expand the discussion on the combination of survey data with assumptions made to address low representativeness. Given that the dataset consists of only 1,347 trips, which is a small sample relative to the total volume of trips in Athens, it would be useful to describe in more detail how this limitation was addressed to ensure meaningful clustering results. For example, were weighting techniques or post-sampling adjustments applied? It'd be a useful overview.
We addressed this comment by improving the last paragraph of chapter 1. We clearly state our starting hypothesis: trip patterns are associated with noticeable repetitions and, as Susilo and Axhausen have mentioned, these repetitions are associated with socio-demographic characteristics. So, the magic word is repetitions. Once we have captured the full spectrum of social groups and travel behaviors, we can be confident that all (key demand) patterns have been accounted for. If we succeed in capturing them all, regardless of the cluster size, we can then regenerate synthetic demand based on the total population characteristics. Also, we have to note that spectral clustering is flexible and scalable to larger datasets.
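For readers less familiar with the method discussed here, the following is a minimal sketch of spectral clustering on a Manhattan-distance affinity, reduced to the two-cluster case. It is not the authors' implementation: the feature columns, the `sigma` bandwidth, the exponential kernel, and the function names are all assumptions made for illustration, and the paper's six-cluster solution would need additional eigenvectors and a final k-means step.

```python
import numpy as np

def manhattan_affinity(X, sigma):
    """Turn pairwise Manhattan (L1) distances into similarities in (0, 1]."""
    dist = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    return np.exp(-dist / sigma)

def spectral_bipartition(X, sigma=5.0):
    """Split the rows of X into two clusters using the Fiedler vector."""
    W = manhattan_affinity(np.asarray(X, dtype=float), sigma)
    np.fill_diagonal(W, 0.0)                    # no self-loops
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))   # D^{-1/2} as a vector
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L)              # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]                     # eigenvector of the 2nd-smallest eigenvalue
    return (fiedler > 0).astype(int)            # the sign of each entry gives the cluster

# Hypothetical trips: [distance in km, departure hour]
trips = np.array([[1.0, 8.0], [1.2, 8.5], [0.8, 7.5],
                  [15.0, 17.0], [14.0, 18.0], [16.0, 17.5]])
labels = spectral_bipartition(trips)
```

Which group receives label 0 or 1 depends on the arbitrary sign of the eigenvector, so only the grouping itself is meaningful. In this toy input the three short morning trips and the three long evening trips end up in different clusters.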
The results are well-conceptualized and logically aligned with the study’s objectives, and the clustering analysis effectively differentiates mobility patterns, providing insights into urban travel behavior. However, the addition of graphical visualizations would significantly enhance the clarity and impact of the findings. While the textual description of the six clusters is detailed, including heat maps or spatial distribution graphs of mobility clusters, time-based charts illustrating peak travel hours per cluster, and mode share distribution graphs per cluster would make the results more accessible and easier to interpret.
We have refined the figure definitions and standardized the color scheme to enhance clarity and accessibility throughout the paper. We added maps with the spatial distribution of trip origins per cluster in Appendix A. They also give some interesting insights. Regarding the peak hour per cluster, we now note this in the presentation of the results. Additionally, a pie chart has been added that gives the percentage of trips allocated to each cluster. Thank you for the comment.
The conclusions are well-founded in the results and provide meaningful implications for urban mobility policies, successfully linking the spectral clustering findings with policy recommendations.
Thank you for the kind words. We reinforced the statistical validity of our findings by adding the results from a chi-square analysis in Table 3.
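As an illustration of what such a chi-square analysis involves, the sketch below computes the Pearson chi-square statistic for a contingency table of cluster membership against a categorical variable. The counts, the choice of transport mode as the variable, and the function name are hypothetical; the actual variables and values are those reported in Table 3 of the paper.

```python
import numpy as np

def chi_square_independence(table):
    """Return the Pearson chi-square statistic and degrees of freedom."""
    observed = np.asarray(table, dtype=float)
    row_totals = observed.sum(axis=1, keepdims=True)
    col_totals = observed.sum(axis=0, keepdims=True)
    # Expected counts if cluster and category were independent
    expected = row_totals * col_totals / observed.sum()
    statistic = ((observed - expected) ** 2 / expected).sum()
    dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
    return statistic, dof

# Hypothetical counts: rows are two clusters, columns are three transport modes
counts = [[30, 10, 10],
          [10, 20, 20]]
statistic, dof = chi_square_independence(counts)
```

For these made-up counts the statistic is about 16.67 with 2 degrees of freedom, which exceeds the 5% critical value of 5.991, so independence between cluster and mode would be rejected in this toy case.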
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
Great work.
Reviewer 2 Report
Comments and Suggestions for Authors
My comments in the previous round have been dealt with in an acceptable way, and as far as I can see, the authors have also amended the paper adequately in response to the other reviewer's comments. The revised version of the paper is much improved, and I appreciate the authors' thorough responses to the comments.
Reviewer 4 Report
Comments and Suggestions for Authors
While the authors did not address some of my comments, the paper in its present form has improved a lot. English seems to be improved as well.
Comments on the Quality of English Language
It has been improved.