Article
Peer-Review Record

Generating Large-Scale Origin–Destination Matrix via Progressive Growing Generative Adversarial Networks Model

ISPRS Int. J. Geo-Inf. 2025, 14(4), 172; https://doi.org/10.3390/ijgi14040172
by Zehao Yuan 1,2,3,*, Xuanyan Chen 1,2,3, Biyu Chen 1,2,3, Yubo Luo 4, Yu Zhang 1,2,3, Wenxin Teng 1,2,3 and Chao Zhang 1,2,3
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 15 January 2025 / Revised: 8 April 2025 / Accepted: 10 April 2025 / Published: 14 April 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The manuscript proposes a new GAN model for generating OD matrices, whose main innovation lies in the ability to generate large-scale OD matrices while protecting data privacy. There are some issues that need to be revised:

1. In the experimental section, it is necessary to analyze and discuss the experimental results of Figure 4 and 5, explain why the self-designed model can achieve good results, and combine the inner structure of the model for analysis and discussion.

2. It is necessary to use some good ITS algorithms or models to validate OD matrix data, and compare and explain it with real OD data to verify the usability of the generated OD data.

Comments on the Quality of English Language

Overall, the English proficiency of the paper is acceptable, but there are still some minor issues, such as lines 96-97, 71-72, 239-241, etc.

 

Author Response

Comment 1. In the experimental section, it is necessary to analyze and discuss the experimental results of Figure 4 and 5, explain why the self-designed model can achieve good results, and combine the inner structure of the model for analysis and discussion

Reply:

Thank you for your valuable feedback. To provide a more detailed explanation of why our designed model achieves good results, we have redesigned our comparative experiments with the baseline models. Additionally, we further examined the necessity of adopting the PGGANs framework by attempting to train the model without using the PGGANs architecture. However, due to the sparsity of large-scale OD matrices, the training process struggled to converge effectively, indicating that alternative GAN architectures might be required for training. In future work, we will explore the application of other GAN architectures for large-scale OD matrix generation to further enhance model performance and robustness.

Comment 2. It is necessary to use some good ITS algorithms or models to validate OD matrix data, and compare and explain it with real OD data to verify the usability of the generated OD data.

Reply:

Thank you for your constructive comment. In recent years, synthetic data has received widespread attention, with related studies primarily focusing on generating data that closely align with the distribution of real datasets. However, in the field of OD matrix generation, research on the usability of synthetic data remains limited.

We fully agree with the reviewer’s perspective that validating the usability of GAN-generated OD matrices through ITS algorithms or models is essential. We apologize for the lack of validation experiments in this study due to time constraints. In future work, we plan to further explore validation methodologies by applying well-established ITS models to assess the effectiveness of our generated OD matrices in real-world applications.

Comment 3. Overall, the English proficiency of the paper is acceptable, but there are still some minor issues , such as lines 96-97, 71-72, 239-241, etc.

Reply:

Thank you for your valuable feedback. We apologize for the minor errors in the manuscript. We have carefully reviewed and made the necessary revisions throughout the paper to ensure grammatical accuracy and improve the overall clarity of the text.

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

The paper titled "Generating large scale Origin-Destination Matrix via Progressive Growing Generative Adversarial Networks Models" explores the application of PG-GAN for OD matrix generation, which is an intriguing approach. However, there are several areas that require further clarification and improvement. Below are my review comments:

1. Purpose
An OD matrix serves as a computational representation of traffic flow, which is fundamental to transportation modeling. It captures spatiotemporal features that reflect the dynamic movement of humans or vehicles. When discussing the OD matrix, both spatial and temporal resolutions must be considered. Traditional models have the specified application scenarios, such as the gravity model performing well for generation of large-scale regional traffic rather than fine scale, and also this model cannot capture the temporal dynamic.

Researchers have developed various transportation models to analyze traffic with different spatiotemporal resolutions. While the introduction and related work sections extensively discuss GANs and their application in image generation, the specific advantages of using GANs for OD matrix generation—compared to traditional models—remain unclear. The scientific contribution of the paper is not sufficiently articulated.

2. Data and Methodology
GANs learn from real data distributions, which assumes that each individual image or matrix holds meaningful information. In this study, the authors use cell phone trajectories for 31 days, covering 17 million users. However, it is unclear why only 500,000 users were selected for each OD matrix conversion. The reliability of the OD matrix depends on the representativeness of the dataset, and selecting just a subset of 500,000 users could introduce sampling bias, distorting the real distribution. The paper does not justify how these 500,000 users were chosen. If the dataset is not comprehensive, the GAN may fail to capture realistic travel behavior and may instead generate arbitrary random OD flows.

To fit the training/testing/validation framework, the authors could aggregate trajectories at various spatial and temporal scales (e.g., smaller zones or shorter time intervals) rather than selecting the trajectories of 500,000 people for conversion.

3. Results
In Figure 5, the comparison between the gravity/radiation model and OD-PGGAN is not entirely clear, as the former generates a single matrix while the latter produces multiple matrices. It is unclear whether an averaged GAN-generated matrix was used for comparison, which could lead to an unfair evaluation.

Although OD-PGGAN presents a novel approach for large-scale OD matrix generation, its methodological limitations hinder its practical applicability in transportation modeling. To enhance the study's credibility, the following improvements are recommended: (1) Address the sampling bias by justifying the selection of the 500,000 users or evaluating how different sample sizes affect model performance; (2) Ensure a fair comparison of OD-PGGAN against traditional OD models; (3) Expand the model to generate time-sensitive OD matrices, which would enhance its utility for traffic forecasting and transit planning.

Comments on the Quality of English Language

Typos:

Line 97: "to simulate and generate urban traffic flow Studies have shown that", missing stop between "flow" and "Studies".

Line 184-185: "higher solution and clarity of images", or should be "resolution"?

Line 216: "Aggregation of Teacher Ensembles framework (PATE)[42, 43] framework", should be one "framework"?

Line 354: title "Definition 7. OD Network. " is the same of Line 349.

Author Response

Reviewer 2

Comment 1. An OD matrix serves as a computational representation of traffic flow, which is fundamental to transportation modeling. It captures spatiotemporal features that reflect the dynamic movement of humans or vehicles. When discussing the OD matrix, both spatial and temporal resolutions must be considered. Traditional models have the specified application scenarios, such as the gravity model performing well for generation of large-scale regional traffic rather than fine scale, and also this model cannot capture the temporal dynamic. Researchers have developed various transportation models to analyze traffic with different spatiotemporal resolutions. While the introduction and related work sections extensively discuss GANs and their application in image generation, the specific advantages of using GANs for OD matrix generation—compared to traditional models—remain unclear. The scientific contribution of the paper is not sufficiently articulated.

Reply:

Thank you for your valuable feedback. We have revised the introduction section to explicitly state the advantages of using GANs for the OD matrix generation task compared to other models. Additionally, we have redesigned our comparative experiments with baseline models to further demonstrate the strengths of GANs in capturing spatiotemporal dynamics and structural variability in OD matrices. These modifications aim to better articulate the scientific contributions of our study.

Comment 2. GANs learn from real data distributions, which assumes that each individual image or matrix holds meaningful information. In this study, the authors use cell phone trajectories for 31 days, covering 17 million users. However, it is unclear why only 500,000 users were selected for each OD matrix conversion. The reliability of the OD matrix depends on the representativeness of the dataset, and selecting just a subset of 500,000 users could introduce sampling bias, distorting the real distribution. The paper does not justify how these 500,000 users were chosen. If the dataset is not comprehensive, the GAN may fail to capture realistic travel behavior and may instead generate arbitrary random OD flows. To fit the training/testing/validation framework, authors could aggregate trajectories at various spatial and temporal scales (e.g., smaller zones or shorter time intervals) other than selected 500,000 people’s trajectories for conversion. To enhance the study's credibility, the following improvements are recommended: Address the sampling bias by justifying the selection of the 500,000 users or evaluating how different sample sizes affect model performance.

Reply:

Thank you for your constructive feedback. We have revised the data description section to provide a detailed explanation of our sampling process and the rationale for selecting 500,000 users per OD matrix conversion. If no sampling method were used and the entire dataset was utilized, we would only have 31 OD matrices, which would be insufficient for effectively training GANs. The use of sampling enables us to generate a larger dataset for training.

To further evaluate the impact of different sample sizes on model performance, we conducted additional experiments comparing OD matrices generated from different sample sizes. The results indicate that the sampled datasets maintain statistical consistency with the full dataset across several key metrics. This suggests that the sampling approach is a feasible strategy for constructing training datasets for GANs.

We acknowledge that sampling may introduce certain biases, and we plan to further investigate methods to mitigate this issue in future research. Potential solutions include data augmentation techniques to enhance the representativeness of training samples.

 

We have made the following revisions:

To ensure sufficient training data, this study adopted a random sampling approach without replacement to construct the OD matrices. Specifically, for each day, 500,000 users were randomly sampled without replacement to construct an OD matrix, and this process was repeated iteratively until the remaining number of users was insufficient to form another complete OD matrix. In total, this sampling approach constructs 1,020 OD matrices as the real samples over the 31-day period. Each OD matrix represents the OD flow information derived from 500,000 users sampled on a given day.
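The sampling procedure described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `build_daily_od_matrices`, the `trips_by_user` input layout, and the sparse dictionary representation of an OD matrix are all assumptions made for the example.

```python
import random
from collections import defaultdict


def build_daily_od_matrices(trips_by_user, sample_size, seed=0):
    """Split one day's users into disjoint random groups, one OD matrix per group.

    trips_by_user: dict mapping a user id to a list of (origin, destination)
    zone pairs for that day (hypothetical input layout).
    Returns a list of sparse OD matrices, each a dict {(origin, destination): flow}.
    """
    rng = random.Random(seed)
    users = list(trips_by_user)
    # Shuffling once and slicing consecutive blocks is equivalent to
    # repeatedly sampling without replacement.
    rng.shuffle(users)
    matrices = []
    # Stop as soon as the remaining users cannot fill another complete group.
    for start in range(0, len(users) - sample_size + 1, sample_size):
        od = defaultdict(int)
        for user in users[start:start + sample_size]:
            for origin, destination in trips_by_user[user]:
                od[(origin, destination)] += 1
        matrices.append(dict(od))
    return matrices
```

Calling this once per day and concatenating the results would yield the pool of real samples; users left over at the end of a day are simply discarded, matching the stopping rule in the text.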

 

Table 4. Statistics of CPC, NRMSE, and JSD for OD matrices constructed using different sample sizes.

Sample size    CPC      NRMSE    JSD (inflow)    JSD (outflow)    JSD (OD flow)
100,000        0.772    0.368    0.083           0.086            0.342
200,000        0.775    0.375    0.081           0.085            0.342
500,000        0.783    0.367    0.079           0.082            0.326
1,000,000      0.785    0.362    0.080           0.081            0.328
All users      0.792    0.357    0.081           0.082            0.311

To ensure an adequate training dataset, this study adopts a sampling approach to construct OD matrices. To assess the bias introduced by the number of sampled users in OD matrix construction, we generate OD matrices using different sample sizes while applying the same sampling methodology. Table 4 presents the statistics of CPC, NRMSE, and JSD for OD matrices constructed using different sample sizes. It is important to note that when using the entire dataset to construct OD matrices, the results reflect variations across different days within the month rather than biases introduced by sampling. The results indicate that as the sample size increases, the CPC value also increases. This trend occurs because larger sample sizes result in fewer OD matrices, thereby reducing sampling bias. NRMSE also exhibits variations across different sample sizes. The inflow JSD remains relatively stable regardless of the sample size. The outflow JSD gradually stabilizes as the sample size increases, while the OD flow JSD decreases progressively with increasing sample size. These findings suggest that the OD matrices constructed using the sampling approach in this study can approximate the distribution characteristics of the full dataset to a certain extent.
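For readers unfamiliar with the three metrics, the sketch below shows one common way to compute them for a pair of dense OD matrices. The record does not give the manuscript's exact definitions, so the normalisation used in `nrmse` (division by the mean flow of the first matrix) and the base-2 logarithm in `jsd` are assumptions, as are all function names.

```python
import math


def cpc(a, b):
    """Common Part of Commuters: 2 * sum(min) / (sum(a) + sum(b))."""
    common = sum(min(x, y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    total = sum(map(sum, a)) + sum(map(sum, b))
    return 2.0 * common / total if total else 0.0


def nrmse(a, b):
    """RMSE divided by the mean flow of `a` (one of several conventions)."""
    sq = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    rmse = math.sqrt(sum(sq) / len(sq))
    mean = sum(map(sum, a)) / len(sq)
    return rmse / mean if mean else 0.0


def jsd(p, q):
    """Jensen-Shannon divergence (base 2) between two non-negative vectors."""
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    m = [(x + y) / 2 for x, y in zip(p, q)]

    def kl(u, v):
        # Terms with u_i == 0 contribute nothing; v_i > 0 whenever u_i > 0.
        return sum(x * math.log2(x / y) for x, y in zip(u, v) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def inflow(od):
    """Total flow arriving at each zone (column sums)."""
    return [sum(col) for col in zip(*od)]


def outflow(od):
    """Total flow leaving each zone (row sums)."""
    return [sum(row) for row in od]
```

Under these assumptions, the inflow JSD of two matrices would be `jsd(inflow(a), inflow(b))`, the outflow JSD `jsd(outflow(a), outflow(b))`, and the OD flow JSD the same function applied to the flattened matrices.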

Comment 3. In Figure 5, the comparison between the gravity/radiation model and OD-PGGAN is not entirely clear, as the former generates a single matrix while the latter produces multiple matrices. It is unclear whether an averaged GAN-generated matrix was used for comparison, which could lead to an unfair evaluation. Although OD-PGGAN presents a novel approach for large-scale OD matrix generation, its methodological limitations hinder its practical applicability in transportation modeling. To enhance the study's credibility, the following improvements are recommended: Ensure a fair comparison of OD-PGGAN against traditional OD models.

Reply:

Thank you for your valuable suggestion. To improve the credibility of our study and ensure a fair comparison between OD-PGGANs and traditional OD models, we have redesigned our baseline comparison experiments. In our previous approach, we aggregated the OD matrices from all days before fitting the Gravity and Radiation models. However, in our revised experiments, we no longer perform this summation. Instead, we fit the Gravity and Radiation models separately for each day using the corresponding daily flow data, ultimately generating 31 OD matrices for each model. This refined approach better reflects whether the Gravity and Radiation models can capture the daily variations observed in the real OD dataset, providing a more accurate and meaningful comparison against OD-PGGANs.

 

We have made the following revisions:

Physics-based methods generate only a single OD matrix. To ensure a fair comparison with OD-PGGANs, which generate multiple OD matrices, we adapt these models to generate multiple OD matrices as well. Specifically, we fit the gravity and radiation models for each day using the corresponding daily data, allowing us to obtain 31 OD matrices for each model. Figure 6 illustrates the distribution of the CPC, NRMSE, and JSD metrics for the four datasets: the test set, the synthetic set generated by OD-PGGANs, the synthetic set generated by the gravity model, and the synthetic set generated by the radiation model. Across all five metrics, the distribution of the OD-PGGANs-generated dataset closely aligns with the test dataset, suggesting that the synthetic OD matrices effectively preserve the statistical characteristics and variability of real OD matrices. In contrast, the Gravity and Radiation models exhibit distinct distributions, indicating that physics-based methods struggle to capture the heterogeneity of the real dataset.

Table 2 presents the statistical characteristics. The results indicate that OD-PGGANs achieve CPC and NRMSE values that are nearly identical to those of the test dataset, confirming the model's ability to generate OD matrices that reflect real-world mobility patterns. In contrast, the Gravity and Radiation models tend to generate highly similar matrices, which is reflected in their higher CPC values and lower NRMSE and JSD values. This indicates that the OD matrix dataset generated by physics-based models is insufficient to capture the variability between matrices in the real dataset. On the other hand, the synthetic dataset generated by OD-PGGANs effectively captures this variability, demonstrating that GAN-generated OD matrices better align with the real-world distribution.
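To make the per-day baseline concrete, here is a hedged sketch of fitting a simplified gravity model separately for each day. It assumes the form T_ij = K * m_i * m_j / d_ij^beta with unit mass exponents and a log-space least-squares fit; the record does not state which gravity formulation, zone masses, or fitting procedure the authors used, so every name and modelling choice below is illustrative only.

```python
import math


def fit_gravity(od, mass, dist):
    """Fit T_ij = K * m_i * m_j / d_ij**beta by least squares in log space.

    od: observed flows (list of rows); mass: one mass per zone;
    dist: pairwise distances. Only off-diagonal pairs with positive
    flow enter the regression. Returns (K, beta).
    """
    xs, ys = [], []
    n = len(od)
    for i in range(n):
        for j in range(n):
            if i != j and od[i][j] > 0:
                xs.append(math.log(dist[i][j]))
                ys.append(math.log(od[i][j] / (mass[i] * mass[j])))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx  # log(T / (m_i m_j)) = log K - beta * log d
    return math.exp(my - slope * mx), -slope


def predict_gravity(K, beta, mass, dist):
    """Dense OD matrix predicted by the fitted model (zero diagonal)."""
    n = len(mass)
    return [[0.0 if i == j else K * mass[i] * mass[j] / dist[i][j] ** beta
             for j in range(n)] for i in range(n)]


def fit_per_day(daily_ods, daily_masses, dist):
    """Fit and predict independently for each day: one matrix per day."""
    out = []
    for od, mass in zip(daily_ods, daily_masses):
        K, beta = fit_gravity(od, mass, dist)
        out.append(predict_gravity(K, beta, mass, dist))
    return out
```

Because each day gets its own K and beta, the resulting matrices can differ from day to day, which is what allows a distribution-level comparison against the GAN-generated set.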

 

Comment 4. Expand the model to generate time-sensitive OD matrices, which would enhance its utility for traffic forecasting and transit planning.

Reply:

Thank you for your helpful comment. We completely agree with your perspective that generating time-sensitive OD matrices would significantly enhance the applicability of OD-PGGANs in traffic forecasting and transit planning within ITS research. We sincerely apologize for not addressing this aspect in our current study due to time constraints. Given the scale of our dataset, reconstructing OD matrices at different time intervals and retraining the generative model could not be completed within the time available for this manuscript.

However, we recognize the importance of this research direction, and we plan to explore the generation of OD matrices at different time intervals throughout the day in our future studies. This extension will further improve the practical utility of OD-PGGANs for dynamic traffic analysis and real-time transportation modeling.

Comment 5. Line 97: "to simulate and generate urban traffic flow Studies have shown that", missing stop between "flow" and "Studies". Line 184-185: "higher solution and clarity of images", or should be "resolution"? Line 216: "Aggregation of Teacher Ensembles framework (PATE)[42, 43] framework", should be one "framework"? Line 354: title "Definition 7. OD Network. " is the same of Line 349.

Reply:

Thank you for your valuable suggestion. We apologize for the language errors in the manuscript. We have carefully reviewed and revised the paper to correct grammatical mistakes and improve overall clarity.

Author Response File: Author Response.docx

Reviewer 3 Report

Comments and Suggestions for Authors

The paper falls within the scope of the journal and has the potential to be considered for publication. However, the paper needs some improvements described through the following suggestions.

- Keywords should be modified because for example human mobility I found only four times in the text of the paper (not including keyword and one reference in the list of references). Also, two more keywords should be added.

- The paper contains a lot of technical errors, for example, no spacing ..."transportation modeling[1, 2]." Please refine the whole paper.

- Contributions and novelties should be with more details described in the introduction section. Also, more details should be provided about the motivation to perform this research.

- At the end of the introduction section as a paragraph, should be provided a short structure of the paper.

- A diagram of flow for the whole research should be added and well described.

- Please give emphasis on research questions, gaps, and your contributions from aspect to solve these gaps.

- Sections Results, Conclusion, and future research should be numbered.

- Section Results should be more elaborated and described.

- Clear limitations should be elaborated on in the conclusion. Also, clear discussion should be added to provide justification for your work.

- I would like to propose that authors provide more diversity in references and consider more sources and newer dates. Currently, a lot of used references are from Chinese authors.

Author Response

Comment 1. Keywords should be modified because for example human mobility I found only four times in the text of the paper (not including keyword and one reference in the list of references). Also, two more keywords should be added.

Reply:

Thank you for your valuable suggestion. We have revised the keywords accordingly, removing "human mobility" and adding "Intelligent Transportation Systems," "Gravity model," and "Radiation model" as new keywords to better reflect the content and focus of the study.

Comment 2. The paper contains a lot of technical errors, for example, no spacing ..."transportation modeling[1, 2]." Please refine the whole paper.

Reply:

Thank you for your valuable suggestion; we appreciate the attention to detail. We apologize for the technical errors in the manuscript. We have carefully reviewed and refined the entire paper to correct formatting issues.

Comment 3. Contributions and novelties should be with more details described in the introduction section. Also, more details should be provided about the motivation to perform this research.

Reply:

Thank you for your valuable comment. We have carefully revised the introduction section to provide a more detailed explanation of the contributions and novelties of this study.

We have made the following revisions:

Although GAN-based methods can effectively generate synthetic OD matrices, research on using GANs to generate large-scale OD matrices remains limited. Due to rapid urbanization, urban areas have expanded significantly. Large-scale OD matrices with more nodes can capture urban traffic flows at a finer scale, which is important for fine-scale urban studies and ITS research. Using GANs to generate large-scale OD matrices presents two challenges. The first is how to generate OD matrices with thousands of nodes. In the OD matrix generation task, the output size of the GAN directly determines the scale of the OD matrix. Current GAN-based research can typically generate only OD matrices with tens of nodes [12]. Generating large-scale OD matrices means handling the topological connections between millions of nodes, which significantly increases the complexity of the training process. The second challenge is how to capture the implicit spatial relationships in the OD matrix. OD matrix elements inherently encode spatial dependencies and geographical correlations. Directly applying image-based processing algorithms may fail to preserve these implicit geographical relationships.

 

The main contributions of this study are as follows. Firstly, the proposed OD-PGGANs model is capable of generating large-scale OD matrices with thousands of nodes. Compared to traditional models, the GAN-based model can produce more diverse synthetic OD matrices that follow the same distribution as the real samples, demonstrating the potential of GAN-based models for large-scale OD matrix generation tasks. Secondly, this study introduces a geography-based upsampling and downsampling algorithm that can capture the inherent spatial relationships between OD matrices at different spatial resolutions. This approach ensures that the progressive training process preserves the spatial relationships of the OD matrix, allowing for a more accurate representation of real-world traffic flow patterns across varying scales.
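The record does not describe the geography-based resampling algorithm itself. As a purely illustrative sketch of what moving an OD matrix between spatial resolutions can look like, the code below aggregates fine zones into coarse parent zones and spreads coarse flows back evenly. The `parent` mapping (the coarse zone containing each fine zone) and the even-split rule are assumptions, not the authors' method.

```python
def downsample(od, parent):
    """Aggregate a fine OD matrix into coarse zones.

    od: dense fine-resolution OD matrix (list of rows).
    parent[i]: index of the coarse zone that spatially contains fine zone i.
    """
    k = max(parent) + 1
    coarse = [[0.0] * k for _ in range(k)]
    for i, row in enumerate(od):
        for j, flow in enumerate(row):
            coarse[parent[i]][parent[j]] += flow
    return coarse


def upsample(coarse, parent):
    """Spread each coarse flow evenly over the fine zone pairs it covers."""
    size = [parent.count(c) for c in range(len(coarse))]
    n = len(parent)
    return [[coarse[parent[i]][parent[j]] / (size[parent[i]] * size[parent[j]])
             for j in range(n)] for i in range(n)]
```

With this pairing, downsampling an upsampled matrix returns the original coarse matrix, so total flow between any two coarse zones is preserved when moving between resolutions. A plain image-style resize would not guarantee this, because adjacent rows and columns of an OD matrix need not be adjacent zones in space.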

Comment 4. At the end of the introduction section as a paragraph, should be provided a short structure of the paper.

Reply:

Thank you for your suggestion. We have added a brief paragraph at the end of the introduction section to outline the structure of the paper.

We have made the following revisions:

The structure of this paper is as follows: Section 2 reviews related work on GAN models and OD matrix generation methods. Section 3 introduces the methodology, covering key concepts, the definition of the research problem, and the architecture of OD-PGGANs and its improvements, including the multi-scale generators and discriminators and the geography-based upsampling and downsampling algorithm. Section 4 presents the experimental setup, including experimental data, baseline models and evaluation metrics. Section 5 presents the experimental results, demonstrating the superior performance of OD-PGGANs in the large-scale OD matrix generation task. Section 6 summarizes the main content of the research and discusses future research directions.

Comment 5. A diagram of flow for the whole research should be added and well described.

Reply:

Thank you for your valuable comment. We have added a workflow diagram illustrating the overall research process and included a detailed description to enhance clarity and comprehensibility.

Comment 6. Please give emphasis on research questions, gaps, and your contributions from aspect to solve these gaps.

Reply:

Thank you for your valuable comment. We have revised the introduction section to clearly define the two main research questions, identify the existing research gaps, and elaborate on how our proposed approach addresses these gaps.

 

We have made the following revisions:

There are two challenges. The first is how to generate OD matrices with thousands of nodes. In the OD matrix generation task, the output size of the GAN directly determines the scale of the OD matrix. Current GAN-based research can typically generate only OD matrices with tens of nodes [12]. Generating large-scale OD matrices means handling the topological connections between millions of nodes, which significantly increases the complexity of the training process. The second challenge is how to capture the implicit spatial relationships in the OD matrix. OD matrix elements inherently encode spatial dependencies and geographical correlations. Directly applying image-based processing algorithms may fail to preserve these implicit geographical relationships.

Comment 7. Sections Results, Conclusion, and future research should be numbered.

Reply:

Thank you for your valuable comment. We apologize for the formatting errors in the manuscript. We have carefully reviewed and revised the paper to correct all formatting inconsistencies.

Comment 8. Section Results should be more elaborated and described.

Reply:

Thank you for your valuable comment. We have expanded the experimental section to provide a more detailed analysis and explanation of our results, ensuring a clearer presentation of our findings and conclusions.

Comment 9. Clear limitations should be elaborated on in the conclusion. Also, clear discussion should be added to provide justification for your work.

Reply:

Thank you for your valuable suggestion. We have revised the conclusion section to elaborate on the limitations of our study and provide a clearer discussion to justify our work.

 

We have made the following revisions:

Despite OD-PGGANs demonstrating the potential of GAN-based models for the large-scale OD matrix generation task, there are several limitations that need to be addressed in future research. Firstly, GAN-based models typically require a substantial amount of data to accurately capture the complex statistical properties of OD matrices. However, in real-world scenarios, OD matrix data may be limited due to privacy concerns, data scarcity, or high acquisition costs. To address this issue, this study introduces a sampling approach to construct the training dataset, ensuring sufficient data for model learning. While this method provides a viable solution, it also introduces sampling biases. Future research should explore robust data augmentation techniques to mitigate sampling biases and improve model generalizability under limited data conditions. Secondly, regional factors such as population distribution, points of interest, urban land use, and road network density have a direct impact on traffic flows, yet they are not explicitly incorporated into the current model. Future research should consider integrating socio-demographic factors into the generative model to improve the interpretability and accuracy of synthetic OD matrices. Another promising direction is the geographical transferability of OD-PGGANs. Future studies should investigate whether OD-PGGANs trained in one city can be effectively adapted for use in other cities. Developing a generalizable generative model that integrates socio-demographic factors would help address the challenge of OD matrix estimation in cities where mobility data are scarce or unavailable.

Comment 10. I would like to propose that authors provide more diversity in references and consider more sources and newer dates. Currently, a lot of used references are from Chinese authors.

Reply:

Thank you for your valuable suggestion. We have carefully reviewed the references and made necessary adjustments to ensure a more diverse selection of sources, incorporating a broader range of studies, including more recent publications.

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

The authors responded point-for-point to the reviewers' concerns and revised the article. However, before acceptance, a few key concerns warrant further discussion:

What is synthetic data and why is the synthesis of the OD matrix important, or why are OD matrix generation tasks important? Lines 62-64 are very unsolid statements. What is the “real distribution”? If the real data is available, why is synthetic data needed? Normally two reasons are claimed for data synthesis: no data due to the high cost of collecting, and no “publicly available” data due to privacy. Lines 56-58 and 60-70 only give a very basic explanation. Based on the later mentioned workflow of this study, OD matrix generation based on mobile phone data, the workflow does not follow the claim of “data sparse”. So the authors should focus on the privacy issues in the introduction to better elaborate the purpose.

“Same distribution of real sample” is not a straightforward argument in this study; it could be “due to data privacy, the synthetic data, as a replacement of real data with the same distribution as the privately owned data, could be generated and used publicly for other research and applications”. Lines 192-202 are good examples, which need to be added to the introduction section.

The physical models are population and distance based estimation of global OD flow; while ML approach focuses on detecting connection between the OD flow with impact features. The reasoning parts in the line 81-86 are very weak. Authors could consider references to the reasoning part in other papers. 

Here I list some sentences from the referenced paper as examples: 

“Using it (synthetic data) in replace of real data. By controlling the data generation process, the end-user can, in principle, adjust the amount of private information released by synthetic data and control its resemblance to real data.” (ref to paper [4]). 

“OD matrix forecasting aims to generate the ODmatrix for a city without any historical flow information.” (Rong 2023) 

“Physics-based methods have relatively poor performance, while machine learning-based methods have recently achieved better results and wider applications [1, 14, 16, 42] aided by the sophisticated model structure.”(Rong 2023) 

For the mobile phone signaling data, what’s the location reference? Is it the cell tower based location or the GPS location?  

For lines 589-591, how did the authors “fit the gravity and radiation models for each day using the corresponding daily data, allowing us to obtain 31 OD matrix for each model”? Does each matrix represent the corresponding 500k users' data selected for each day?

Other issues:

References are missing for many statements, such as lines 72-76 and lines 88-100.

Ref: Rong, 2023, Complexity-aware Large Scale Origin-Destination Network Generation via Diffusion Model (Rong’s other papers are also recommended as related work)

For future work, examples such as GenAI (Zhou 2023) and LLM (Prabin 2024) are suggested in the discussion of synthetic OD flows:

- Zhilun Zhou, Jingtao Ding, Yu Liu, Depeng Jin, and Yong Li. 2023. Towards Generative Modeling of Urban Flow through Knowledge-enhanced Denoising Diffusion. In Proceedings of the 31st ACM International Conference on Advances in Geographic Information Systems (SIGSPATIAL '23). Association for Computing Machinery, New York, NY, USA, Article 91, 1–12. https://doi.org/10.1145/3589132.3625641
- Prabin Bhandari, Antonios Anastasopoulos, and Dieter Pfoser. 2024. Urban Mobility Assessment Using LLMs. In Proceedings of the 32nd ACM International Conference on Advances in Geographic Information Systems (SIGSPATIAL '24). Association for Computing Machinery, New York, NY, USA, 67–79. https://doi.org/10.1145/3678717.3691221



Author Response

First and foremost, we wish to express our sincere gratitude to the Editor and the reviewer for their patience and constructive suggestions, which helped to improve the quality of the paper significantly. The paper has been substantially revised according to these suggestions.

Reviewer 2

Comment 1. What is synthetic data, and why is the synthesis of the OD matrix (that is, the OD matrix generation task) important? The statements in lines 62-64 are not well supported. What is the “real distribution”? If real data is available, why is synthetic data needed? Two reasons are normally given for data synthesis: no data exists because collection is costly, or no publicly available data exists because of privacy. Lines 56-58 and lines 60-70 give only a very basic explanation. The workflow described later in this study, OD matrix generation based on mobile phone data, does not fit the claim of “data sparse”. The authors should therefore focus on privacy issues in the introduction to better explain the purpose.

Reply:

Thank you for your valuable suggestion. We have revised the introduction to provide a clearer explanation of what synthetic data is and why OD matrix generation is important. In particular, we emphasized the role of privacy protection as a primary motivation for generating synthetic OD matrices, especially in scenarios where real mobility data cannot be shared publicly due to regulatory or ethical concerns. Additionally, we have added several relevant references in the introduction to support and enrich the discussion.

 

We have made the following revisions:

60-78:

To overcome the limitations of high costs and privacy concerns associated with collecting real-world traffic mobility data, researchers have turned to synthetic data generation as a promising solution [6-8]. Synthetic data is generated by a model and has the same statistical distribution as the real data. This enables synthetic data to serve as a viable substitute for real data in both research and application by preserving the statistical characteristics while safeguarding sensitive information [9]. Synthetic data helps to address privacy concerns and to compensate for incomplete, scarce, or biased datasets [10, 11]. In this context, the OD matrix generation task has garnered considerable attention. The OD matrix generation task aims to construct synthetic OD matrices that have the same distribution as the real OD matrices [12]. OD matrix generation is particularly important in two scenarios. First, in cities where traffic flow data is limited or absent due to the high cost of deploying sensors or conducting surveys, OD matrix generation allows researchers to obtain a usable OD matrix in the absence of traffic flow data. Second, in cases where real mobility data (e.g., mobile phone signaling data, GPS floating car data) are available but not publicly accessible due to privacy regulations, a synthetic OD matrix can serve as a substitute in research while protecting individuals' privacy. Therefore, using synthetic OD matrix data helps to address the challenge of obtaining OD matrix data in ITS research, especially in cases where real-world datasets are either difficult to obtain because of high costs or unsuitable for public use due to privacy concerns.

 

79-96:

Previous studies on OD matrix generation generally fall into two primary categories: traditional physics-based methods and data-driven machine learning methods [13]. Physics-based methods apply physical laws to model traffic flow, such as the gravity model, the radiation model, and the intervening opportunities model. The second category is data-driven machine learning methods. With the development of machine learning, models such as random forest [14], Kalman filters [15], and neural networks [16, 17] are widely used in traffic flow simulation. These machine-learning-based models construct complex fitting functions to capture the complex nonlinear relationships between traffic flow and urban features such as transportation networks, urban land use, and points of interest (POIs), while also capturing the spatial relationships among different urban regions. As a result, machine-learning-based models often achieve better empirical performance than physics-based models [18, 19]. However, existing research still has limitations, particularly in the synthetic data generation task. Physics-based models have fewer parameters and often ignore multiple factors that influence movement behavior (e.g., transportation facilities and the urban built environment), resulting in suboptimal performance. On the other hand, machine-learning-based models primarily focus on fitting training data instead of capturing the intrinsic distribution of movements, which leads to poor generalization capability [13]. Neither method is an effective solution for the OD matrix generation task.
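
For reference, the two physics-based baselines named in the passage above are commonly written as follows. These are the standard textbook forms and not necessarily the exact variants fitted in the manuscript. The notation is chosen here for illustration: $T_{ij}$ is the flow from region $i$ to region $j$, $m_i$ and $n_j$ are the origin and destination populations, $d_{ij}$ is the distance between them, $T_i$ is the total outflow from $i$, $s_{ij}$ is the population within radius $d_{ij}$ of $i$ (excluding the populations of $i$ and $j$), and $K$, $\alpha$, $\beta$, $\gamma$ are fitted parameters.

```latex
% Gravity model (power-law deterrence form)
T_{ij} = K \, \frac{m_i^{\alpha} \, n_j^{\beta}}{d_{ij}^{\gamma}}

% Radiation model (parameter-free once T_i is given)
T_{ij} = T_i \, \frac{m_i \, n_j}{(m_i + s_{ij})\,(m_i + n_j + s_{ij})}
```

The gravity form has only four free parameters and the radiation form has none, which is consistent with the remark that physics-based models have few parameters.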

Comment 2. For the mobile phone signaling data, what’s the location reference? Is it the cell tower based location or the GPS location? 

Reply:

Thank you for your constructive feedback. The mobile phone signaling data used in this study is based on cell tower locations, not GPS. We have added this clarification to the data description section of the manuscript.

 

We have made the following revisions:

476-479:

This data records the cell tower location information of users during activities such as phone calls, text messaging, switching between cell towers, and regular communication with cell towers. In Shanghai, the average coverage area of each tower is approximately 0.447 .

 

Comment 3. For line 589-591, how did authors “ fit the gravity and radiation models for each day using the corresponding daily data, allowing us to obtain 31 OD matrix for each model.” Does each matrix represent the corresponding 500k users data selected by each day?

Reply:

Thank you for your valuable suggestion. When fitting the gravity and radiation models, we used the full set of users for each day to construct the daily OD matrix. This process was repeated for all 31 days, resulting in a total of 31 OD matrices for each model. We have clarified this point in the revised manuscript.

 

We have made the following revisions:

596-600:

Physics-based models generate only a single OD matrix. To ensure a fair comparison with OD-PGGANs, which generate multiple OD matrices, we adapt these models to generate multiple OD matrices as well. Specifically, we used all users' data for each date to construct a daily OD matrix, and then fit the Gravity and Radiation models to each daily OD matrix, which allowed us to obtain 31 OD matrices for each model.
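
The per-day fitting procedure described above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a power-law gravity model fitted by ordinary least squares on log-transformed flows, and the names `fit_gravity`, `predict_gravity`, `fit_daily_gravity`, `pop`, and `dist` are hypothetical, introduced only for this example.

```python
import numpy as np


def fit_gravity(od, pop, dist):
    """Fit log T_ij = log K + a*log m_i + b*log m_j - g*log d_ij by least squares.

    Assumption: zero flows and zero distances (e.g. the diagonal) are dropped,
    because the logarithm is undefined there.
    """
    rows, cols = np.nonzero((od > 0) & (dist > 0))
    design = np.column_stack([
        np.ones(rows.size),
        np.log(pop[rows]),
        np.log(pop[cols]),
        -np.log(dist[rows, cols]),
    ])
    target = np.log(od[rows, cols])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef  # [log K, alpha, beta, gamma]


def predict_gravity(coef, pop, dist):
    """Generate an OD matrix from fitted gravity parameters."""
    log_k, alpha, beta, gamma = coef
    safe_dist = np.where(dist > 0, dist, 1.0)  # avoid division by zero
    flows = np.exp(log_k) * np.outer(pop ** alpha, pop ** beta) / safe_dist ** gamma
    return np.where(dist > 0, flows, 0.0)  # no flow where distance is zero


def fit_daily_gravity(daily_od, pop, dist):
    """Fit one gravity model per daily OD matrix and return one generated matrix per day."""
    return [predict_gravity(fit_gravity(od, pop, dist), pop, dist) for od in daily_od]
```

Passing a list of 31 daily OD matrices to `fit_daily_gravity` returns 31 generated matrices, one per day, which mirrors the comparison setup described in the response. A radiation-model counterpart would follow the same loop, with a different prediction function.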

 

Comment 4. Missing reference for many statements, such as line 72-76, line 88-100.

Reply:

Thank you for your helpful comment. We have carefully reviewed the manuscript and added the missing references, especially in the sections around lines 72–76 and 88–100. In particular, we have cited Rong’s work, which we find highly relevant and valuable to our study.

 

Comment 5. For future work, examples such as GenAI (Zhou 2023) and LLM (Prabin 2024) are suggested in the discussion of synthetic OD flows.

Reply:

Thank you for your valuable suggestion. We have added a discussion of recent developments in GenAI and LLM-based approaches to the discussion section of the manuscript to highlight their potential for future research in OD matrix generation.

 

We have made the following revisions:

695-700:

In addition, in recent years, GeoAI techniques and large language models (LLMs) have attracted significant attention in urban traffic flow research [59, 60]. In future work, we plan to incorporate GeoAI and LLM-based approaches into the large-scale OD matrix generation task to further enhance model performance and scalability.

Author Response File: Author Response.docx

Reviewer 3 Report

Comments and Suggestions for Authors

The paper has been improved and all my suggestions have been adopted.

Author Response

Please allow us to once again express our sincere gratitude to the reviewer for the thoughtful comments and constructive suggestions. 
