Article
Peer-Review Record

Delineating Source and Sink Zones of Trip Journeys in the Road Network Space

ISPRS Int. J. Geo-Inf. 2024, 13(5), 150; https://doi.org/10.3390/ijgi13050150
by Yan Shi 1, Bingrong Chen 1, Jincai Huang 2,*, Da Wang 1, Huimin Liu 1 and Min Deng 1
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 23 February 2024 / Revised: 25 April 2024 / Accepted: 26 April 2024 / Published: 30 April 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This paper includes some well-designed experiments and comprehensive discussions. The illustrations in the paper are also quite exquisite. A few minor revisions are listed below.

I have noticed that there are differences in the display times chosen for the various experiments you conducted. However, you have not provided a clear explanation for these disparities. I suggest that you include relevant explanations regarding the choice of display times during the revision process. This will enhance the coherence and persuasiveness of your article.

Author Response

Response to Reviewer 1’s Comments

 

First and foremost, I would like to express my sincere gratitude for the invaluable comments and suggestions provided by you and the other reviewers. These insights have been instrumental in enhancing the quality of our manuscript (ID ijgi-2901882), titled "Delineating source and sink zones of trip journeys in road network space". I have meticulously revised our manuscript in accordance with your recommendations, and below is a summary of the modifications made:

 

I have noticed that there are differences in the display times chosen for the various experiments you conducted. However, you have not provided a clear explanation for these disparities. I suggest that you include relevant explanations regarding the choice of display times during the revision process. This will enhance the coherence and persuasiveness of your article.

 

Response: Thank you very much for your insightful comments and for highlighting the issue regarding the disparities in display times chosen for the various experiments presented in our manuscript.  Your feedback has prompted a thorough review and reflection on this aspect of our study.

Upon re-evaluation, we recognized that the inconsistencies in the choice of display times might indeed detract from the coherence and overall persuasiveness of our article.  In response to your suggestion, we have harmonized the display times across all experiments to ensure uniformity. We substituted the data from June 4th, a typical weekday, in place of the previously used May 31st for our comparative experiment.

 

 

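Table 3. Quantitative evaluations of different methods using clustering validation indexes.

| Date | Time period | Method | Dunn | Sil | DB | SD | S_Dbw |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Weekend (1st Jun) | Morning | Liu’s | 0.016 | 0.490 | 0.653 | 0.326 | 1.101 |
| Weekend (1st Jun) | Morning | Zhu’s | 0.734 | 0.743 | 0.687 | 0.196 | 0.456 |
| Weekend (1st Jun) | Morning | Fang’s | 0.580 | 0.825 | 0.340 | 0.075 | 0.159 |
| Weekend (1st Jun) | Morning | Jia’s | 0.741 | 0.803 | 0.622 | 0.145 | 1.389 |
| Weekend (1st Jun) | Morning | Proposed | 0.880 | 0.954 | 0.257 | 0.045 | 0.108 |
| Weekend (1st Jun) | Evening | Liu’s | 0.592 | 0.754 | 0.963 | 1.002 | 2.269 |
| Weekend (1st Jun) | Evening | Zhu’s | 0.847 | 0.636 | 0.829 | 0.628 | 0.662 |
| Weekend (1st Jun) | Evening | Fang’s | 0.438 | 0.817 | 0.446 | 0.251 | 0.195 |
| Weekend (1st Jun) | Evening | Jia’s | 0.122 | 0.596 | 0.759 | 0.432 | 8.207 |
| Weekend (1st Jun) | Evening | Proposed | 1.578 | 0.919 | 0.423 | 0.202 | 0.179 |
| Workday (4th Jun) | Morning | Liu’s | 0.645 | 0.815 | 0.836 | 0.413 | 0.804 |
| Workday (4th Jun) | Morning | Zhu’s | 0.537 | 0.734 | 0.764 | 0.723 | 2.478 |
| Workday (4th Jun) | Morning | Fang’s | 0.802 | 0.657 | 0.805 | 0.564 | 6.574 |
| Workday (4th Jun) | Morning | Jia’s | 0.315 | 0.810 | 0.973 | 0.654 | 1.978 |
| Workday (4th Jun) | Morning | Proposed | 1.286 | 0.927 | 0.631 | 0.275 | 0.167 |
| Workday (4th Jun) | Evening | Liu’s | 0.582 | 0.815 | 0.934 | 0.497 | 2.547 |
| Workday (4th Jun) | Evening | Zhu’s | 0.704 | 0.938 | 0.749 | 0.305 | 0.957 |
| Workday (4th Jun) | Evening | Fang’s | 0.679 | 0.804 | 0.631 | 0.482 | 1.367 |
| Workday (4th Jun) | Evening | Jia’s | 0.457 | 0.733 | 0.834 | 0.592 | 4.578 |
| Workday (4th Jun) | Evening | Proposed | 0.834 | 1.174 | 0.627 | 0.257 | 0.844 |
| Holiday (7th Jun) | Morning | Liu’s | 0.572 | 0.658 | 0.869 | 0.361 | 0.705 |
| Holiday (7th Jun) | Morning | Zhu’s | 0.540 | 0.803 | 0.939 | 0.738 | 3.379 |
| Holiday (7th Jun) | Morning | Fang’s | 0.791 | 0.584 | 0.756 | 0.431 | 8.317 |
| Holiday (7th Jun) | Morning | Jia’s | 0.232 | 0.781 | 0.932 | 0.542 | 1.289 |
| Holiday (7th Jun) | Morning | Proposed | 1.688 | 1.029 | 0.533 | 0.312 | 0.059 |
| Holiday (7th Jun) | Evening | Liu’s | 0.527 | 0.933 | 0.899 | 0.341 | 0.369 |
| Holiday (7th Jun) | Evening | Zhu’s | 0.642 | 1.041 | 0.738 | 0.291 | 0.424 |
| Holiday (7th Jun) | Evening | Fang’s | 0.595 | 0.959 | 0.338 | 0.324 | 0.375 |
| Holiday (7th Jun) | Evening | Jia’s | 0.306 | 0.785 | 0.943 | 0.616 | 8.391 |
| Holiday (7th Jun) | Evening | Proposed | 0.924 | 1.246 | 0.547 | 0.273 | 0.214 |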
 

Additionally, we have tabulated the daily data volume and the number of trajectories contained within the experimental dataset as follows. Based on this data, we have selected June 1st to represent weekend travel characteristics, June 4th to epitomize weekday travel patterns, and June 7th to serve as a representative for the travel behavior during the Dragon Boat Festival:

| Date | Number of GPS points | Number of trips with passengers |
| --- | --- | --- |
| 31 May 2019 | 7073934 | 180728 |
| 1 June 2019 | 8478981 | 215514 |
| 2 June 2019 | 3725119 | 104195 |
| 3 June 2019 | 6564268 | 171328 |
| 4 June 2019 | 7764173 | 208118 |
| 5 June 2019 | 666596 | 173824 |
| 6 June 2019 | 8572970 | 202482 |
| 7 June 2019 | 9519063 | 258272 |

 

 

Thank you for your detailed and constructive feedback. I believe these revisions will further enhance the quality of our manuscript. I look forward to your further guidance and am hopeful for the acceptance of our manuscript.

 

Sincerely,

 

Jincai Huang

Big Data Institute, Central South University


 

 

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

The work described in the paper is very interesting and addresses a very relevant problem. However, some rework is necessary to make it suitable for publication. In my opinion, the paper is very difficult to understand for readers who do not have expertise in this field.

1. Most of the related work described in Section 2 is old (the most recent was published in 2021). Maybe it is necessary to make a new review in the literature to try to gather the most recent papers related to the theme approached in the paper;

2. In line 216, the authors state that their work considers each trajectory as a document that describes a potential topic. However, throughout the paper, it is not clear how the trajectories are transformed into documents nor how the topics are extracted by using NLP techniques;

3. Throughout the paper, it is possible to find the concepts “topic” and “theme” associated with trajectories. Do these terms have the same meaning? This is not clear in this current version.

4. In line 316, I do not know if it makes sense to compare an input vector to a single neuron in a neural network. It looks very strange.

5. In section 4.1 the authors say that the trajectory dataset used in this work was obtained from DCIC 2020. However, they do not provide any information about the structure of these data (which information is available for each trajectory). If a specific feature was used to convert each trajectory to a document, this information must be clear along the paper.

6. In Figure 6 (and along the text) it is not clear which part of the figure represents periods of weekdays, ordinary weekends, and Dragon Boat Festival;

7. In line 442 the authors state that the coverage of R1 and R7 in Figure 6 shows a dynamic trend. How can the reader notice that characteristic by looking at this figure? This needs to be explained in the text.

8. In section 4.2.5 the authors selected four related works to make comparative experiments.  What were the criteria used for selecting these specific works? 

9. During the experimental evaluation, which dataset was used to compare the work proposed in the paper and the other related work? It is necessary to provide more details about how this process was accomplished. 

 

Author Response

Response to Reviewer 2’s Comments

First and foremost, I would like to express my sincere gratitude for the invaluable comments and suggestions provided by you and the other reviewers. These insights have been instrumental in enhancing the quality of our manuscript (ID ijgi-2901882), titled "Delineating source and sink zones of trip journeys in road network space". I have meticulously revised our manuscript in accordance with your recommendations, and below is a summary of the modifications made:

 

  1. Most of the related work described in Section 2 is old (the most recent was published in 2021). Maybe it is necessary to make a new review in the literature to try to gather the most recent papers related to the theme approached in the paper;

Response: Thank you very much for your valuable feedback concerning the recency of the literature reviewed in Section 2 of our manuscript. We acknowledge the importance of integrating the most current research findings to ensure that our work is both relevant and contributes effectively to the existing body of knowledge.

In response to your suggestion, we have conducted an extensive search for the latest studies related to our paper's theme. As a result, we have updated our literature review to include nine new references that have been published since 2021. These additional references enhance our discussion of recent advancements and trends in the field, ensuring our work's alignment with current research trajectories.

The newly added references are as follows:

  • Cao, Wenpu et al. 2023. “Constructing Multi-Level Urban Clusters Based on Population Distributions and Interactions.” Computers, Environment and Urban Systems 99: 101897.
  • Huang, Weiming et al. 2022. “Estimating Urban Functional Distributions with Semantics Preserved POI Embedding.” International Journal of Geographical Information Science 36(10): 1905–
  • Kang, Chaogui, Zhuojun Jiang, and Yu Liu. 2022. “Measuring Hub Locations in Time-Evolving Spatial Interaction Networks Based on Explicit Spatiotemporal Coupling and Group Centrality.” International Journal of Geographical Information Science 36(2): 360–
  • Liu, Kai, Yuji Murayama, and Toshiaki Ichinose. 2021. “Exploring the Relationship between Functional Urban Polycentricity and the Regional Characteristics of Human Mobility: A Multi-View Analysis in the Tokyo Metropolitan Area.” Cities 111: 103109.
  • Shi, Haochen et al. 2023. “Capturing Urban Recreational Hotspots from GPS Data: A New Framework in the Lens of Spatial Heterogeneity.” Computers, Environment and Urban Systems 103: 101972.
  • Wang, Jiaxin, Feng Lu, and Shuo Liu. 2023. “A Classification-Based Multifractal Analysis Method for Identifying Urban Multifractal Structures Considering Geographic Mapping.” Computers, Environment and Urban Systems 101: 101952.
  • Xing, Xiaoyue et al. 2022. “Flow Trace: A Novel Representation of Intra-Urban Movement Dynamics.” Computers, Environment and Urban Systems 96: 101832.
  • Yang, Jing et al. 2022. “A Constraint-Based Approach for Identifying the Urban–Rural Fringe of Polycentric Cities Using Multi-Sourced Data.” International Journal of Geographical Information Science 36(1): 114–
  • Zhang, Hong, Tian Lan, and Zhilin Li. 2022. “Fractal Evolution of Urban Street Networks in Form and Structure: A Case Study of Hong Kong.” International Journal of Geographical Information Science 36(6): 1100–

Following the inclusion of new references, we have refined the literature review section, specifically in lines 85-162 of the manuscript document.

 

  2. In line 216, the authors state that their work considers each trajectory as a document that describes a potential topic. However, throughout the paper, it is not clear how the trajectories are transformed into documents nor how the topics are extracted by using NLP techniques.

Response: We would like to express our sincere appreciation for your thoughtful and constructive feedback on our manuscript. We realize that our initial presentation may not have provided sufficient detail or clarity on this matter. We deeply regret this oversight and the confusion it may have caused.

Initially, each trajectory is conceptualized as a sequence of spatial locations (road segments). These segments act analogously to words in traditional text documents. This approach is rooted in the principle that just as words form sentences to convey meaning, sequential road segments in trajectories encapsulate movement patterns. For vectorization, each unique segment in our dataset is treated as a distinct "word," and each trajectory as a "document." This preprocessing step involves the conversion of raw trajectory data into a list of segment identifiers (akin to words), thereby forming a collection of "documents" suitable for NLP analysis.

We then employ the Latent Dirichlet Allocation (LDA) model, a widely recognized method in natural language processing (NLP) for topic discovery. First, a dictionary is constructed from our "documents," mapping each unique segment to a unique integer ID. This step is crucial because the LDA algorithm operates on numerical representations. Each trajectory-document is then converted into a bag-of-words (BoW) format, in which a document is represented as a list of (wordID, frequency) tuples, where "wordID" corresponds to a road segment and "frequency" to its number of occurrences within the document. The LDA model is trained on this corpus and identifies latent topics, which are essentially clusters of frequently co-occurring segments. Each topic is characterized by a distribution over the "words" (road segments), indicating the likelihood of each segment being part of that topic. These topics reflect underlying patterns in urban mobility, with each topic representing a distinct pattern of movement or behavior. After training, the topics are extracted and formatted for clarity: each topic is expressed as a list of segments and their corresponding probabilities, illustrating its composition in terms of road segments.
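The dictionary and bag-of-words steps described above can be sketched in plain Python. This is only an illustration under stated assumptions: the segment IDs and the helper names `build_dictionary` and `doc2bow` are hypothetical, and in a gensim-based workflow the same roles would typically be played by `gensim.corpora.Dictionary` and its `doc2bow` method before the corpus is passed to `gensim.models.LdaModel`.

```python
from collections import Counter

# Hypothetical trip routes: each "document" is the ordered list of road-segment
# IDs traversed by one trip (the IDs are invented for illustration).
routes = [
    ["s12", "s13", "s27", "s13"],
    ["s13", "s27", "s40"],
    ["s05", "s12", "s13"],
]

def build_dictionary(docs):
    """Map every distinct segment ID to an integer ID, in order of first appearance."""
    token2id = {}
    for doc in docs:
        for token in doc:
            token2id.setdefault(token, len(token2id))
    return token2id

def doc2bow(doc, token2id):
    """Return one route as a sorted list of (wordID, frequency) tuples."""
    counts = Counter(token2id[token] for token in doc)
    return sorted(counts.items())

token2id = build_dictionary(routes)
corpus = [doc2bow(doc, token2id) for doc in routes]

for bow in corpus:
    print(bow)
```

Note that the bag-of-words form discards the order in which segments were traversed; only the segment counts per trip reach the topic model.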

To facilitate analysis, we further refine the output, arranging the topics and their segment components in a structured format (e.g., data frames, pivot tables). This allows us to quantitatively assess the contribution of each segment to different mobility patterns, enhancing our understanding of urban dynamics.

In summary, by treating trajectories as documents and employing the LDA model, we effectively uncover latent patterns in the data, akin to topics in textual analysis.  This innovative approach enables us to interpret complex urban mobility data through the lens of NLP, offering novel insights into the structure and characteristics of movement within urban spaces.  We appreciate the opportunity to clarify our methodology and hope this response adequately addresses your query.

 

  3. Throughout the paper, it is possible to find the concepts “topic” and “theme” associated with trajectories. Do these terms have the same meaning? This is not clear in this current version.

Response: Thank you for your insightful observation regarding the use of the terms “topic” and “theme” throughout our manuscript. Upon reflection, we recognize that our application of these terms was not as consistent and precise as it should have been, leading to potential confusion about their meanings and implications within the context of our study.

To clarify, in the context of our research, we primarily intended “topic” to denote a specific pattern or cluster of road segments that frequently appear together within trajectories, as analyzed through the Latent Dirichlet Allocation (LDA) model. These “topics” reveal underlying patterns of urban mobility, reflecting how individuals navigate through city spaces, and indicating clusters of movement that suggest specific urban functionalities or behavioral patterns. For example, a “topic” might represent a cluster of road segments commonly traveled in conjunction, suggesting a route predominantly used for commuting to business districts or residential areas. In light of your feedback, we will undertake a thorough revision to ensure consistent use of the term “topic” throughout the manuscript to refer to the specific patterns of road segments identified through the LDA model. Additionally, we will either clarify or limit the use of the term “theme” to avoid ambiguity, ensuring that our terminology accurately reflects the analytical concepts underpinning our research.

We appreciate your constructive feedback, which has prompted this necessary clarification and will undoubtedly enhance the clarity and coherence of our manuscript. We hope that this explanation satisfactorily addresses your concern regarding our use of terminology and the concept of “topic” within our study.

 

 

  4. In line 316, I do not know if it makes sense to compare an input vector to a single neuron in a neural network. It looks very strange.

Response: Thank you for your insightful question regarding the comparison of input vectors to single neurons within the GeoSOM model, as outlined in line 316 of our manuscript. We appreciate the opportunity to clarify this aspect of our methodology.

The GeoSOM model, an extension of the traditional Self-Organizing Map (SOM), is specifically designed to handle geospatial data, making it particularly suitable for semantic trajectory clustering. The rationale behind comparing input vectors (representing semantic trajectory vectors processed by the LDA model) to individual neurons in the GeoSOM network is twofold:

Topology Preservation: The core principle of the SOM, and by extension GeoSOM, is to preserve the topological properties of the input space in the network's two-dimensional representation. By mapping input vectors to the closest neurons based on similarity, the model ensures that similar input vectors are mapped to nearby neurons. This process facilitates the identification of clusters within the data, where each neuron effectively represents a centroid of the input vectors it is closest to.

Spatial Contextualization: The GeoSOM model leverages the spatial context inherent in geospatial data, which is crucial for semantic trajectory clustering. By comparing input vectors to individual neurons, the model can spatially contextualize the semantic information from the trajectories, allowing for a more nuanced understanding of spatial patterns and relationships.

During the GeoSOM training process, the weight vectors of the neurons are iteratively adjusted to better reflect the distribution of the input trajectory vectors. In each iteration, a trajectory vector is selected, and the neuron whose weight vector is most similar to it (i.e., the distance between the weight vector and the trajectory vector is minimized) is identified. The weight vector of this neuron and those of its neighboring neurons are then updated to more closely match the currently processed trajectory vector. This process ensures that similar trajectory vectors are ultimately mapped to adjacent or nearby neurons on the GeoSOM grid. After training, the neurons on the GeoSOM grid are clustered based on their weight vectors. As the trained GeoSOM preserves the topological properties of the input data, similar trajectory vectors (i.e., similar trajectories) are mapped to proximate neurons. Standard clustering algorithms.
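The training iteration described above (select a vector, find the closest neuron, update it and its neighbours) can be sketched as follows. This is a minimal sketch of a plain SOM update, not the authors' implementation: the geographic constraint that GeoSOM adds to the best-matching-unit search is omitted, and the Gaussian neighbourhood, learning rate, radius, grid layout, and function names are all illustrative assumptions.

```python
import math

def best_matching_unit(weights, x):
    """Index of the neuron whose weight vector is closest to x (Euclidean distance)."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))

def som_step(weights, grid, x, lr=0.5, radius=1.0):
    """One training iteration: pull the BMU and its grid neighbours toward x."""
    bmu = best_matching_unit(weights, x)
    for i, w in enumerate(weights):
        d = math.dist(grid[i], grid[bmu])  # distance on the neuron grid, not in data space
        if d <= radius:
            h = math.exp(-(d * d) / (2 * radius * radius))  # assumed Gaussian neighbourhood
            weights[i] = [wj + lr * h * (xj - wj) for wj, xj in zip(w, x)]
    return bmu

# Hypothetical 1x3 grid of neurons with 2-D weight vectors.
grid = [(0, 0), (0, 1), (0, 2)]
weights = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
bmu = som_step(weights, grid, x=[0.9, 0.8])
print(bmu, weights)
```

The comparison between an input vector and a single neuron is therefore a comparison between two vectors of equal dimension: the input and that neuron's weight vector.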

 

  5. In section 4.1 the authors say that the trajectory dataset used in this work was obtained from DCIC 2020. However, they do not provide any information about the structure of these data (which information is available for each trajectory). If a specific feature was used to convert each trajectory to a document, this information must be clear along the paper.

Response: In response to the reviewer's request for clarification on the structure of the trajectory dataset used in our work, as well as the specifics regarding how each trajectory was converted into a document, we offer the following detailed explanation:

The trajectory dataset utilized in our study was sourced from the DCIC 2020 competition.  Each trajectory in this dataset represents the movement of an object, typically a vehicle, through a network of roads over time. The fundamental components of each trajectory include spatial locations (expressed as latitude and longitude coordinates), timestamps indicating the time at which each location was recorded, and a unique identifier for each trajectory. Additional metadata are listed as follows:

| Attribute Name | Description | Data Type |
| --- | --- | --- |
| CARNO | License Plate Number | String |
| POSITION_TIME | Positioning Time | Date Time |
| LONGITUDE | Longitude | Float |
| LATITUDE | Latitude | Float |
| DIRECTION | Driving Direction Angle | Integer |
| SPEED | GPS Speed (km/h) | Float |
| ORDER_ID | Order Number | String |

The details of the data are presented in the table below:

| Date | Number of GPS points | Number of trips with passengers |
| --- | --- | --- |
| 31 May 2019 | 7073934 | 180728 |
| 1 June 2019 | 8478981 | 215514 |
| 2 June 2019 | 3725119 | 104195 |
| 3 June 2019 | 6564268 | 171328 |
| 4 June 2019 | 7764173 | 208118 |
| 5 June 2019 | 666596 | 173824 |
| 6 June 2019 | 8572970 | 202482 |
| 7 June 2019 | 9519063 | 258272 |

 

Initially, each trajectory point is mapped to the spatially adjacent road segment based on its (longitude, latitude, direction) information. This process extracts the sequence of road segments traversed by the travel trajectory, thereby re-expressing the trajectory as illustrated in the figure below.
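The matching of trajectory points to road segments described above can be sketched as a nearest-segment search filtered by heading. This is a simplified illustration rather than the authors' procedure: coordinates are assumed to be already projected to metres, DIRECTION is assumed to be a compass angle in degrees, and the segment records, the 45-degree tolerance, and all function names are hypothetical.

```python
import math

def point_segment_distance(p, a, b):
    """Planar distance from point p to the line segment a-b."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    t = 0.0 if seg_len2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def heading_gap(h1, h2):
    """Smallest absolute difference between two compass headings, in degrees."""
    d = abs(h1 - h2) % 360
    return min(d, 360 - d)

def match_point(point, heading, segments, max_gap=45.0):
    """Nearest segment whose bearing is within max_gap degrees of the GPS heading."""
    candidates = [s for s in segments if heading_gap(heading, s["bearing"]) <= max_gap]
    best = min(candidates or segments, key=lambda s: point_segment_distance(point, s["a"], s["b"]))
    return best["id"]

def to_route(points, segments):
    """Match every GPS fix, then collapse consecutive repeats into a segment sequence."""
    route = []
    for xy, heading in points:
        seg_id = match_point(xy, heading, segments)
        if not route or route[-1] != seg_id:
            route.append(seg_id)
    return route

# Hypothetical network: one eastbound and one northbound segment.
segments = [
    {"id": "s1", "a": (0, 0), "b": (100, 0), "bearing": 90},
    {"id": "s2", "a": (100, 0), "b": (100, 100), "bearing": 0},
]
points = [((10, 2), 90), ((60, -3), 88), ((101, 40), 2), ((99, 80), 358)]
route = to_route(points, segments)
print(route)
```

The resulting list of segment IDs is the "document" form of the trip that the later topic-modeling step consumes.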

 

The LDA model is fed with the processed trajectory data (now in the form of documents), identifying latent topics within the dataset. Each topic is characterized by a distribution over the "words" (road segments), indicating the likelihood of each segment being part of a particular topic.  A dictionary is constructed from our "documents," mapping each unique segment to a unique integer ID. This step is crucial for the LDA algorithm, which operates on numerical representations.  Each trajectory-document is then converted into a bag-of-words (BoW) format.  In BoW, a document is represented as a list of (wordID, frequency) tuples, where "wordID" corresponds to a road segment and "frequency" to its occurrence within the document.  The LDA model is trained on this corpus, extracting topics that are essentially clusters of frequently co-occurring segments. Then we utilize a hierarchical clustering algorithm, taking into account the similarity between trip routes based on their topic compositions.  The Word Mover's Distance (WRD) is introduced as a measure of dissimilarity between trip routes to effectively group similar routes together.

We have revised Section 4.1 to include a detailed description of the specific features used for document conversion, ensuring a comprehensive understanding of our methodological framework.  We hope this clarification adequately addresses the reviewer's concerns.

 

  6. In Figure 6 (and along the text) it is not clear which part of the figure represents periods of weekdays, ordinary weekends, and Dragon Boat Festival.

Response: We sincerely apologize for not specifying the time categories in the results images of thematic road segments. We are grateful for your valuable feedback and appreciate the opportunity to amend the information presented in our images. To illustrate, we have selected June 1st as a representative for weekend travel characteristics, June 4th to represent weekday travel patterns, and June 7th as a representative for travel behavior during the Dragon Boat Festival. The updated figure below facilitates an easier distinction for readers between regular weekdays, ordinary weekends, and the Dragon Boat Festival.

Figure. Spatial distribution of road segment topics in different time periods.

 

  7. In line 442 the authors state that the coverage of R1 and R7 in Figure 6 shows a dynamic trend. How can the reader notice that characteristic by looking at this figure? This needs to be explained in the text.

Response: We sincerely apologize for any confusion caused by the ambiguous analysis of our experimental results. In line 442, we mention that the coverage of spatial line element clusters R1 and R7 exhibits a dynamic trend over the observed periods.

Spatial distribution density is analyzed based on the convex hull and total road length of line elements associated with similar topic labels, enabling an in-depth examination of the intensity of the main thoroughfares and their connections with surrounding roads for different topic labels. R1, representing Chenggong Avenue, serves as a critical north-south arterial route traversing the main urban areas of Xiamen City. Its connectivity strength with neighboring areas such as Dianqian Street to the north, Zhongshan Street in the center, Jiaotong Street, and Kaiyuan Street to the south exhibits significant variations across different time periods. R7, identified as Jiahe Road, demonstrates a close road linkage with Jialian Street during weekend midday peak times, weekdays, and holiday evening peak phases, sharing the same thematic label. Conversely, during weekend peak hours in the morning and evening, it forms a strong spatial association with Jiaotong Street, and during holiday morning and midday peaks, it intensively connects with roads within Jialian Street and Yundang Subdistrict, revealing dynamic spatial associations between major roads and the surrounding neighborhood roadways.
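One possible reading of a density based on the convex hull and total road length is sketched below. The response does not give the formula, so defining density as total segment length divided by the hull area of the segment endpoints is an assumption made here for illustration; the coordinates are hypothetical projected metres and the function names are invented.

```python
import math

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(vertices):
    """Area of a simple polygon by the shoelace formula."""
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1] - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2

def road_density(segments):
    """Assumed definition: total segment length over the convex-hull area of all endpoints."""
    total_length = sum(math.dist(a, b) for a, b in segments)
    area = polygon_area(convex_hull([p for seg in segments for p in seg]))
    return total_length / area if area > 0 else float("inf")

# Hypothetical segments sharing one topic label: the four sides of a 1 km square.
topic_segments = [
    ((0, 0), (1000, 0)),
    ((1000, 0), (1000, 1000)),
    ((0, 0), (0, 1000)),
    ((0, 1000), (1000, 1000)),
]
density = road_density(topic_segments)
print(density)
```

Under this definition, comparing the value for the same topic label across time periods would indicate whether the roads attached to a thoroughfare become more compact or more spread out.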

In the manuscript, we have amended and supplemented the relevant experimental analysis between lines 487 and 494.

 

  8. In section 4.2.5 the authors selected four related works to make comparative experiments. What were the criteria used for selecting these specific works?

Response: The selection of the four comparison methods for our analysis—namely, Liu's method, Zhu's method, Fang's method, and Jia's method—was strategically made to encompass a broad spectrum of existing approaches in source-sink zone delineation within urban settings. Each of these methods embodies a distinct perspective on handling origin-destination (OD) data, offering a comprehensive backdrop against which the merits and limitations of our proposed semantic aggregation approach could be evaluated. Here's a summary of the rationale behind choosing each method for comparative analysis:

Liu's Method (2012): This method, focusing on the temporal dynamics of inflow and outflow differences to delineate zones, provides a time-series analytical perspective.  Its ability to identify zones with significant gathering and dispersing patterns makes it a vital comparator for evaluating our method's effectiveness in capturing similar dynamic patterns through semantic aggregation.

Zhu's Method (2014): Utilizing hierarchical clustering to aggregate OD flows, Zhu's method offers insights into how spatial clustering techniques can be leveraged to identify source and sink zones based on flow magnitudes. This method's tendency to generate zones with large spatial sizes due to dominant OD pairs contrasts with our approach's capability to discern cross-regional urban zones, emphasizing the added value of considering route sequences and semantic features.

Fang's Method (2017): By prioritizing the stability of inflow and outflow time series for regionalization, Fang's method introduces the concept of temporal stability in zone delineation. Its limitation in identifying functionally significant small areas, such as snack-gathering places in suburbs, highlights the need for a method like ours that incorporates semantic understanding of route sequences to capture such nuances.

Jia's Method (2021): This recent approach, which employs a spatial interaction network for community detection, represents the cutting-edge in leveraging spatial graphs and network parameters. Its resulting isolation of zones, especially in suburban areas, underscores the challenge of using rigid spatial constraints, contrasting with our method's flexibility in semantic aggregation to link disparate urban areas meaningfully.

 

  9. During the experimental evaluation, which dataset was used to compare the work proposed in the paper and the other related work? It is necessary to provide more details about how this process was accomplished.

Response: Building upon our previous response regarding the selection of specific works for comparative experiments, we would like to provide further clarification on the dataset used during the experimental evaluation and elaborate on the methodology employed for this comparative analysis.

The dataset utilized for the experimental evaluation comprised GPS-enabled taxi trajectory data collected from Xiamen, covering a comprehensive range of urban movements within a specified period, typically [specific time frame, e.g., one month]. This dataset was chosen for its high resolution and the rich temporal and spatial information it provides, allowing for detailed analysis of urban mobility patterns, including source and sink dynamics.

Data Preprocessing: Initially, the dataset underwent a thorough preprocessing stage, which involved cleaning, normalization, and the extraction of relevant features, such as the sequences of road segments traversed by each trajectory.

Feature Extraction and Representation: Following preprocessing, we extracted the starting point of each trajectory, along with the grid and traffic zone in which the starting point is located, based on the information of longitude, latitude, and direction, thereby constructing OD flows. This process transforms the raw trajectory data into a structured format, facilitating further analysis.

Method Implementation: We implemented the proposed method as well as the four comparative methods (Liu et al., 2012; Zhu et al., 2014; Fang et al., 2017; Jia et al., 2021) on the preprocessed dataset.
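The construction of OD flows from trip endpoints can be illustrated with a short sketch. It assumes projected coordinates in metres, a square grid with a hypothetical 500 m cell size, and that each trip's last point is used as its destination; the function names are invented, and the traffic-zone lookup mentioned above is not shown.

```python
from collections import Counter

def grid_cell(x, y, cell_size=500.0):
    """Index of the square grid cell containing a projected point (metres)."""
    return (int(x // cell_size), int(y // cell_size))

def od_flows(trips, cell_size=500.0):
    """Count trips per (origin cell, destination cell) pair."""
    flows = Counter()
    for trip in trips:
        origin = grid_cell(*trip[0], cell_size)
        destination = grid_cell(*trip[-1], cell_size)
        flows[(origin, destination)] += 1
    return flows

def net_flow(flows):
    """Inflow minus outflow per cell; positive suggests a sink, negative a source."""
    net = Counter()
    for (origin, destination), n in flows.items():
        net[origin] -= n
        net[destination] += n
    return dict(net)

# Hypothetical trips, each given as a list of projected (x, y) points.
trips = [
    [(100, 100), (900, 100), (1600, 200)],
    [(200, 300), (1700, 450)],
    [(1550, 50), (120, 80)],
]
flows = od_flows(trips)
net = net_flow(flows)
print(flows, net)
```

A table of this kind is the common input that flow-based comparison methods can share, which keeps the comparison on the same preprocessed data.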

By providing these additional details, we aim to offer a comprehensive view of the dataset and experimental procedures employed in our study. This rigorous comparative analysis underscores the effectiveness and novelty of our proposed method in extracting meaningful insights from complex urban trajectory data.

 

 

Thank you for your detailed and constructive feedback. I believe these revisions will further enhance the quality of our manuscript. I look forward to your further guidance and am hopeful for the acceptance of our manuscript.

 

Sincerely,

 

Jincai Huang

Big Data Institute, Central South University


 

 

Author Response File: Author Response.docx

Reviewer 3 Report

Comments and Suggestions for Authors

This paper proposes a novel OD zone delineation approach based on trip-route topic modeling and trajectory aggregations in road networks.

The main strategy of the proposed methodology is first to reconstruct trip routes using road segment sequences covered by trajectories, then to apply the LDA model in order to learn distinct topics hidden in a series of trip routes, then to apply a hierarchical clustering algorithm to aggregate the topic-embedded trip routes by introducing the WRD, and finally to vectorize the trajectory clusters for each basic spatial unit and to detect the source and sink zones through a spatially constrained SOM network.

Comparative experiments were conducted on real data (taxi trajectories in Xiamen, China) with promising results.  

Further Comments/Suggestions/Improvements:

1. A summarized algorithm/pseudocode for the proposed methodology would be very helpful for the reader.

2. According to the Data Availability Statement the proposed methodology was implemented in Python/Jupyter Notebook. Please provide some more information into the paper:

- What kind of data structures and libraries were used?

- What were the temporal/spatial requirements?

- Technical specifications for the conducted experiments (CPUs/RAM/...). On what kind of systems were they conducted?

3. Time performance was not measured in the experiments.

- What was the required execution time to provide the results for each case?

- What is the time-complexity for the proposed methodology?

 

Author Response

Response to Reviewer3’s Comments

I would like to express my sincere gratitude for the invaluable comments and suggestions provided by you and the other reviewers. These insights have been instrumental in enhancing the quality of our manuscript (ID ijgi-2901882), titled "Delineating source and sink zones of trip journeys in road network space". I have revised the manuscript in accordance with your recommendations, and a summary of the modifications follows:

  1. A summarized algorithm/pseudocode for the proposed methodology would be very helpful for the reader.

Response: We are grateful for your insightful feedback and constructive suggestions, which have helped us improve the quality of our paper. We have added summarized pseudocode for the algorithm at line 775 of the manuscript.

 

  2. According to the Data Availability Statement the proposed methodology was implemented in Python/Jupyter Notebook. Please provide some more information into the paper:

- What kind of data structures and libraries were used?

Response: In our study, we utilized a range of Python libraries to support data processing, geospatial analysis, natural language processing (NLP), and statistical modeling. Specifically, we employed libraries such as numpy and pandas for handling and manipulating numerical and tabular data, arcpy for geospatial data analysis, and additional libraries like glob, simpledbf, dbfread for file operations and database interactions. For text analysis and topic modeling, we used jieba and gensim, and for visualization of the results, pyLDAvis and scipy were utilized.

 

- What were the temporal/spatial requirements?

Response: Temporal requirements: The trajectory points are segmented into hourly slices based on their recorded time, which facilitates the analysis of spatiotemporal trends during the daily peak periods, namely the morning rush from 7:00 to 9:00, the midday peak from 11:00 to 13:00, and the evening rush from 17:00 to 19:00.

Spatial requirements: Our methodology accommodates a variety of geospatial data formats, including GeoJSON and Shapefiles. The implementation in Python/Jupyter Notebook uses libraries such as geopandas to process these formats. Geospatial data are projected onto a single projected coordinate system (EPSG:3857, Pseudo-Mercator) so that spatial measurements, such as distances and areas, are computed consistently. This uniformity is crucial for precise spatial analysis and density calculations. The study area covers the Siming and Huli districts of Xiamen City, Fujian Province, China. The boundaries of the study area are delineated and used to filter the dataset for the relevant line elements.
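The two requirements above can be sketched with the standard library only. This is an illustration under stated assumptions, not the authors' implementation: the peak periods are treated as half-open hour ranges (so 9:00 itself falls outside the morning peak), and the projection uses the standard spherical Web Mercator formulas behind EPSG:3857 rather than a geopandas call. The names `PEAK_PERIODS`, `peak_period`, and `to_web_mercator` are hypothetical.

```python
import math
from datetime import datetime

# Peak periods named in the response, as half-open hour ranges [start, end).
# Treating the end hour as exclusive is an assumption.
PEAK_PERIODS = {"morning": (7, 9), "midday": (11, 13), "evening": (17, 19)}

def peak_period(timestamp):
    """Return the peak-period label for a datetime, or None outside the peaks."""
    for label, (start, end) in PEAK_PERIODS.items():
        if start <= timestamp.hour < end:
            return label
    return None

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis, used as the sphere radius in EPSG:3857

def to_web_mercator(lon_deg, lat_deg):
    """Project WGS84 longitude/latitude in degrees to EPSG:3857 coordinates in metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y
```

One point worth keeping in mind when reading distances or densities in this projection: Web Mercator stretches lengths by a factor of 1/cos(latitude), so measurements are internally consistent across a city-sized study area but are larger than ground distances.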

 

- Technical specifications for the conducted experiments (CPUs/RAM/...). In what kind of systems were conducted?

Response: The experiments described in our study were executed in a computing environment based on the Windows operating system. The technical specifications of the system utilized for these experiments are as follows:

Operating System: All experiments were carried out on systems running the Windows operating system.  This choice was made to ensure broad accessibility and compatibility with commonly used geospatial data processing libraries in the Python ecosystem, such as geopandas, scipy, and numpy.

Central Processing Unit (CPU): The experiments were conducted on a standard CPU-based system. All experiments were designed to be compatible with general-purpose CPUs without requiring specialized hardware acceleration (e.g., GPUs).

Random Access Memory (RAM): To accommodate the computational demands of processing large spatial datasets and performing complex spatial-temporal analyses, the system was equipped with a minimum of 32GB of RAM.

Software Environment: The methodology was implemented using Python programming language within the Jupyter Notebook environment.  This setup provided a flexible and interactive platform for developing and testing our spatial-temporal analysis algorithms, facilitating code sharing and reproducibility.

 

  3. Time performance was not measured in the experiments.

- What was the required execution time to provide the results for each case?

Response: Thank you for bringing to our attention the absence of time performance metrics in our initial manuscript. We greatly value your insightful feedback, which underscores the importance of providing comprehensive details to assess the practical applicability and efficiency of our proposed methodology. In response to your valuable suggestion, we have measured the time performance of our method. Attached below is a table listing the runtime for each case, measured across various dataset sizes:

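| Date | Time period | Method | Average Execution Time (s) | Dataset Size (MB) |
|---|---|---|---|---|
| Weekend (1st Jun) | Morning | Liu’s | 62 | 68.4 |
| Weekend (1st Jun) | Morning | Zhu’s | 116 | 68.4 |
| Weekend (1st Jun) | Morning | Fang’s | 213 | 68.4 |
| Weekend (1st Jun) | Morning | Jia’s | 184 | 68.4 |
| Weekend (1st Jun) | Morning | Proposed | 634 | 68.4 |
| Weekend (1st Jun) | Evening | Liu’s | 53 | 62.8 |
| Weekend (1st Jun) | Evening | Zhu’s | 109 | 62.8 |
| Weekend (1st Jun) | Evening | Fang’s | 197 | 62.8 |
| Weekend (1st Jun) | Evening | Jia’s | 168 | 62.8 |
| Weekend (1st Jun) | Evening | Proposed | 582 | 62.8 |
| Workday (4th Jun) | Morning | Liu’s | 85 | 80.7 |
| Workday (4th Jun) | Morning | Zhu’s | 138 | 80.7 |
| Workday (4th Jun) | Morning | Fang’s | 301 | 80.7 |
| Workday (4th Jun) | Morning | Jia’s | 217 | 80.7 |
| Workday (4th Jun) | Morning | Proposed | 716 | 80.7 |
| Workday (4th Jun) | Evening | Liu’s | 74 | 75.6 |
| Workday (4th Jun) | Evening | Zhu’s | 123 | 75.6 |
| Workday (4th Jun) | Evening | Fang’s | 228 | 75.6 |
| Workday (4th Jun) | Evening | Jia’s | 191 | 75.6 |
| Workday (4th Jun) | Evening | Proposed | 671 | 75.6 |
| Holiday (7th Jun) | Morning | Liu’s | 67 | 71.3 |
| Holiday (7th Jun) | Morning | Zhu’s | 120 | 71.3 |
| Holiday (7th Jun) | Morning | Fang’s | 218 | 71.3 |
| Holiday (7th Jun) | Morning | Jia’s | 187 | 71.3 |
| Holiday (7th Jun) | Morning | Proposed | 651 | 71.3 |
| Holiday (7th Jun) | Evening | Liu’s | 57 | 66.3 |
| Holiday (7th Jun) | Evening | Zhu’s | 114 | 66.3 |
| Holiday (7th Jun) | Evening | Fang’s | 197 | 66.3 |
| Holiday (7th Jun) | Evening | Jia’s | 172 | 66.3 |
| Holiday (7th Jun) | Evening | Proposed | 595 | 66.3 |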

The table shows that our method takes longer to run than the four baseline methods in every case. This is because it extracts a more extensive set of feature information from the trajectory data.
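To make the size of the gap concrete, the reported runtimes can be turned into ratios. The sketch below uses only the weekend-morning figures from the table above; the function name `slowdown` and the throughput calculation are illustrative additions, not part of the authors' analysis.

```python
# Average execution times in seconds, copied from the table above
# (weekend morning, 1st Jun; dataset size 68.4 MB).
runtimes_s = {"Liu": 62, "Zhu": 116, "Fang": 213, "Jia": 184, "Proposed": 634}
dataset_mb = 68.4

def slowdown(runtimes, method="Proposed"):
    """Ratio of one method's runtime to every other method's runtime."""
    return {name: runtimes[method] / t for name, t in runtimes.items() if name != method}

ratios = slowdown(runtimes_s)

# Data processed per second by the proposed method for this case.
throughput_mb_per_s = dataset_mb / runtimes_s["Proposed"]
```

For this case the proposed method is about 10.2 times slower than the fastest baseline (Liu's) and about 3.0 times slower than the slowest baseline (Fang's).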

 

- What is the time-complexity for the proposed methodology?

Response: We appreciate the opportunity to discuss the time complexity of our proposed methodology. For our methodology, which involves multiple computational steps including data preprocessing, spatial analysis, and statistical modeling, the overall time complexity is determined by the combined complexities of these individual processes.

Data Preprocessing: This phase encompasses data loading and cleansing, which have a linear time complexity of O(n), where n is the number of data points. Spatial indexing and sorting within the data processing tasks raise the complexity of this phase to O(n log n).

Statistical Modeling (e.g., LDA): The time complexity for techniques such as Latent Dirichlet Allocation (LDA) and similar statistical modeling approaches typically stands at O(nkd), where n denotes the number of trajectories, k represents the number of topics, and d indicates the average number of features per trajectory.

Hierarchical Clustering: After feature vectors are derived from trajectories through the LDA model, the trajectories are subjected to hierarchical clustering. The time complexity associated with this algorithmic segment typically scales as O(n^2), where n represents the number of trajectories. This complexity arises due to the pairwise comparison and agglomeration steps inherent to hierarchical clustering methodologies.
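The quadratic term comes from comparing every pair of trajectories: n trajectories give n(n-1)/2 pairs. The sketch below shows that step only. It uses Euclidean distance as a stand-in, because the WRD measure from the manuscript is not reproduced here, and the topic vectors are hypothetical.

```python
import math
from itertools import combinations

def euclidean(u, v):
    """Stand-in distance between two vectors; not the manuscript's WRD."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def pairwise_distances(vectors, distance=euclidean):
    """Return {(i, j): d} for every unordered pair i < j, i.e. n(n-1)/2 entries."""
    return {
        (i, j): distance(vectors[i], vectors[j])
        for i, j in combinations(range(len(vectors)), 2)
    }

# Hypothetical topic-proportion vectors for four trajectories.
topic_vectors = [[0.9, 0.1], [0.7, 0.3], [0.1, 0.9], [0.2, 0.8]]

distances = pairwise_distances(topic_vectors)

# An agglomerative method would merge the closest pair first.
closest_pair = min(distances, key=distances.get)
```

Four trajectories produce six distances, and the first merge would join trajectories 2 and 3. The repeated merging that follows adds further cost on top of this distance computation, depending on the linkage rule used.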

GeoSOM: The time complexity of the GeoSOM (Geographical Self-Organizing Map) model primarily depends on the number of iterations, the size of the input dataset, and the dimensions of the map. It is generally expressed as O(tnm), where t represents the number of iterations, n is the number of zones in the input dataset, and m denotes the number of neurons in the GeoSOM grid.
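If the four stages are assumed to run one after another, their costs add. The expression below is a reading of the stage-level figures above rather than a result stated in the manuscript. Because n is used for three different quantities in those figures, it is renamed here for illustration only: N_p for the number of data points, N_r for the number of trajectories, and N_z for the number of zones.

```latex
T_{\text{total}}
  = O\!\left(N_p \log N_p\right)
  + O\!\left(N_r\, k\, d\right)
  + O\!\left(N_r^{2}\right)
  + O\!\left(t\, N_z\, m\right)
```

Under this reading, the clustering term would be expected to dominate once the number of trajectories N_r is large relative to the product kd and to the other stage sizes.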

 

 

Thank you for your detailed and constructive feedback. I believe these revisions will further enhance the quality of our manuscript. I look forward to your further guidance and am hopeful for the acceptance of our manuscript.

 

Sincerely,

 

Jincai Huang

Big Data Institute, Central South University

[email protected]

 

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

This version represents a good improvement of the first version. I recommend accepting the article for publication.

Author Response

Thank you for your comments.
