
Open Access Article

Peer-Review Record

Logistics Sprawl and Urban Congestion Dynamics Toward Sustainability: A Logistic Regression and Random-Forest-Based Model

Sustainability 2025, 17(13), 5929; https://doi.org/10.3390/su17135929

by Manal El Yadari*, Fouad Jawab, Imane Moufad and Jabir Arif

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Reviewer 3: Anonymous

Reviewer 4: Anonymous

Submission received: 16 May 2025 / Revised: 22 June 2025 / Accepted: 24 June 2025 / Published: 27 June 2025

(This article belongs to the Special Issue Sustainable Operations and Green Supply Chain)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This paper is innovative and relevant in studying the relationship between logistics sprawl and urban congestion. However, there are still some areas for improvement in the literature review, data processing, results presentation, and conclusion sections. Here are the detailed review comments:

Insufficient Depth and Breadth of the Literature Review: There are few references to leading international studies, and existing research is not critically analyzed. The specific definition and manifestations of logistics sprawl are not described in detail, and the relationship between logistics sprawl and urban spatial structure is not discussed. The authors are advised to include more recent international research, especially important findings on logistics sprawl and urban congestion from recent years. The definition, characteristics, and manifestations of logistics sprawl in different cities should be elaborated, and specific cases or data could be used to illustrate its impact.
General Discussion on the Relationship between Logistics Sprawl and Congestion: The discussion of the relationship between logistics sprawl and congestion is broad and lacks in-depth analysis. The summary of existing research is not systematic; some references are cited sporadically and are not integrated.
In line 175, Table 3 lists a large number of models and cites numerous references, but the authors merely enumerate the models and application fields without in-depth analysis. Is it necessary to simply list them in this way? It is suggested that the authors categorize and discuss the different machine learning models, analyze their advantages and disadvantages, and application scenarios in detail, and then select the models to be used in this study.
From line 17 onward, many paragraphs consist of only one sentence. This writing style resembles note-taking rather than an academic research paper. The language should be revised.
Insufficient explanation of model selection in the methodology section: The specific advantages and applicability of Random Forest and Logistic Regression for analyzing the relationship between logistics sprawl and congestion are not discussed.
The conclusion section mentions the limitation of data scarcity but does not elaborate on how insufficient data affects the research results or how this issue could be addressed in future studies.
Although the suggestion of using more up-to-date and reliable datasets is made, specific directions for further exploration in future research are not clearly pointed out. For example, the impact of logistics sprawl on congestion levels in different types of cities (such as large cities and small and medium-sized cities) could be explored. The model could be further optimized by integrating real-time traffic data and Geographic Information System (GIS) technology. The indirect environmental impacts of logistics sprawl (such as carbon emissions) could also be investigated.

Comments on the Quality of English Language

Some sentence structures are relatively simple and lack diversity. For example, many paragraphs consist of only one sentence. This writing style resembles note-taking more than a complete academic paper. The authors are advised to increase the complexity and variety of sentences to enhance the readability of the article. There are occasional grammatical and spelling errors. For example, “Gaussien noise” should be “Gaussian noise,” and a comma should be added before “and” in “Logistics Sprawl, Logistic Regression and Traffic Congestion.” The language expression in some paragraphs is rather rigid and lacks coherence. For example, when introducing models and data, the authors list a large number of models and references but do not conduct in-depth analysis and integration, making the paragraphs lengthy and unfocused. Some sentences are not expressed clearly and need further polishing. For example, “We developed a model that uses logistic regression and Random Forest to identify and predict the impact of logistics sprawl on congestion and define the relationship between variables.” This sentence can be further simplified and optimized.

Author Response

We would like to express our sincere gratitude to the reviewer for their valuable time, thoughtful evaluation, and constructive feedback. Your insightful comments have been instrumental in enhancing the quality and clarity of our manuscript. We truly appreciate the effort and attention you devoted to reviewing our work.

1. Insufficient Depth and Breadth of the Literature Review:

We sincerely thank the reviewer for this valuable comment and for highlighting the importance of a robust and comprehensive literature review. In response, we carefully revisited the literature review and have focused on incorporating the most relevant and high-impact international studies that directly align with the specific scope and objectives of our research. Rather than expanding the number of references extensively, we prioritized depth over breadth by selecting key works that provide meaningful insights into the definition, characteristics, and spatial manifestations of logistics sprawl.

2. General Discussion on the Relationship between Logistics Sprawl and Congestion

In response, we have revised the discussion section to provide a more structured and in-depth analysis of the relationship between logistics sprawl and congestion. We reorganized the content to ensure a more systematic integration of existing research, and we strengthened the logical flow by grouping studies according to key thematic areas—such as trip length extension, vehicle-kilometer increase, and urban traffic diffusion.

3. In line 175, Table 3 lists a large number of models and cites numerous references...

We have reviewed the table and categorized the algorithms into several distinct groups, while also analyzing the frequency of their application.

4. Since line 17, many paragraphs consist of only one sentence...

In response, we have worked to improve the flow and coherence of the manuscript by merging several one-sentence paragraphs into more developed and cohesive units.

5. Insufficient detailed explanation of model selection in the methodology section

We have integrated several paragraphs explaining the model selection, as well as the metrics used to evaluate the potential of the different models and justify our choice.

6. Conclusion section mentions the limitation of data scarcity...

We have addressed the issue of data scarcity by applying data augmentation techniques. This increase in the dataset size allows the model to train more effectively and to better generalise across a wide range of possible cases. We have clarified this point in the revised conclusion and highlighted the role of data augmentation in mitigating the limitations caused by insufficient data.

7. Although the suggestion of using more up-to-date and reliable datasets is made...

Thank you for your valuable remark. We plan to integrate these variables and GIS technology in future work. The current study focuses primarily on identifying the relationship between the variables identified in the literature. The integration of such data as inputs does not impact the core model architecture or its performance directly, but rather serves to increase the dataset size and improve the model’s learning process through data augmentation.

8. Some sentence structures are relatively simple and lack diversity

Thank you for your observation regarding the sentence structure. We have reviewed the manuscript and worked to improve the variety and complexity of the sentences to enhance readability and overall flow.

Reviewer 2 Report

Comments and Suggestions for Authors

This research proposes an innovative and useful approach to investigating the link between logistics sprawl and urban congestion. The authors successfully use machine learning techniques, notably logistic regression and random forest algorithms, to model and predict urban congestion levels. The study is well-structured, with separate sections for methodology, model creation, and results analysis.

The combination of logistic regression (for interpretability) and random forest (for predictive strength) is a thoughtful strategy. The authors also use data augmentation techniques such as SMOTE, Gaussian noise, and linear interpolation to increase dataset quality and ensure robust model training. The thorough examination of the factors influencing congestion is likewise commendable.
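For readers less familiar with these augmentation techniques, two of them can be sketched in a few lines. The following is a minimal pure-Python illustration, not the authors' implementation (which the record does not show), assuming numeric feature rows; the feature names and the helpers `smote_like_oversample` and `add_gaussian_noise` are hypothetical. The first places each new minority-class row on the segment between an existing minority row and one of its nearest minority neighbours, which is the core idea of SMOTE and a form of linear interpolation; the second jitters every value with zero-mean Gaussian noise.

```python
import math
import random

def euclidean(a, b):
    """Straight-line distance between two numeric rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic rows, each placed on the segment between a
    minority-class row and one of its k nearest minority-class neighbours.
    Requires at least two minority rows."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest neighbours of `base` within the minority class (itself excluded).
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: euclidean(base, row),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic

def add_gaussian_noise(rows, sigma=0.05, seed=0):
    """Return a copy of `rows` with zero-mean Gaussian noise (std = sigma) added."""
    rng = random.Random(seed)
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in rows]

# Toy minority-class rows: [scaled distance, scaled density] (hypothetical features).
minority = [[0.2, 1.0], [0.4, 1.2], [0.3, 0.9], [0.5, 1.1]]
new_rows = smote_like_oversample(minority, n_new=5, k=2, seed=42)
jittered = add_gaussian_noise(minority, sigma=0.02)
```

Because SMOTE-style interpolation only moves between existing minority rows, every synthetic row stays within the range already spanned by the original data: it balances the classes but cannot add patterns the source data never contained.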

However, there are some areas for improvement, such as clarity, additional model validation, and the possible investigation of new variables.

1. While the work is generally well-organized, some sections, notably those dealing with model creation and data augmentation, should be simplified further. For example, the explanation of data augmentation strategies could be briefer, emphasizing their specific contribution to model performance.

2. The tables giving data, such as the performance metrics for each model, are valuable, but they might be supplemented with further information on the possible causes of the observed performance discrepancies.

3. Although the authors provide strong results, the validation of the models could be enhanced with a more detailed explanation of cross-validation techniques. For example, using k-fold cross-validation or providing additional tests to assess the generalizability of the model would strengthen the robustness of the findings.

4. The dataset used for training the model is augmented and balanced, but the paper would benefit from a more detailed discussion on how the model performs on real-world datasets and whether it has been tested on unseen data or in different urban settings.

5. The paper largely identifies logistics sprawl, population density, and automobile ownership as the primary causes of congestion. While these are significant, other variables such as road network capacity, infrastructure development, and the impact of emerging transportation technologies (e.g., self-driving cars) could be investigated.

6. Policy interventions, such as congestion pricing or road usage laws, could also be explored as potential solutions to congestion, providing a more comprehensive view of the problem.

7. While the article mentions the scarcity of data, it would be helpful to more explicitly recognize the limits of the current investigation and the assumptions made during model construction. For example, the influence of missing data and the findings' applicability to cities with vastly different logistics and transportation systems should be examined.

8. The study lays a solid platform for future research, notably in extending the model to include new variables, enhancing machine learning models, and applying the approach to real-world case studies. Suggestions for further research should be highlighted, directing the reader to potential avenues for developing this research.

Author Response

1. While the work is generally well-organized, some sections...

We have reviewed the methodology section, streamlined it, and clarified the augmentation techniques used.

2. The tables giving data, such as the performance metrics...

We have sought to enhance the clarity and depth of our analysis by adding more detailed explanations of the potential causes of the observed performance differences between the models.

3. Although the authors provide strong results, the validation of the models...

We have integrated k-fold cross-validation to validate the model's performance.

4. The dataset used for training the model is augmented and balanced...

We would like to clarify that the model is not adversely impacted by real-world datasets, as it is designed to generalize well and learn from a variety of scenarios. Furthermore, our dataset was carefully split, with 80% used for training and 20% reserved for testing and validation. This means the model was evaluated on unseen data, ensuring its robustness and ability to perform well in different urban settings.
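The 80/20 split described above can be sketched as follows, assuming a simple random (unstratified) split over row indices; `train_test_split_indices` is a hypothetical helper, not the authors' code. The sketch draws the held-out indices before any augmentation, a common precaution: if augmented rows derived from test rows ended up in the training set, the test set would no longer be fully unseen.

```python
import random

def train_test_split_indices(n_rows, test_fraction=0.2, seed=0):
    """Shuffle row indices and hold out `test_fraction` of them for testing."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_fraction)
    # indices[:n_test] are held out; the rest form the training set (disjoint).
    return indices[n_test:], indices[:n_test]

# Split first, then augment only the training part (augmentation not shown).
train_idx, test_idx = train_test_split_indices(100)
print(len(train_idx), len(test_idx))  # 80 20
```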

5. The paper largely identifies logistics sprawl, population density...

Thank you for this insightful suggestion. We acknowledge that factors such as road network capacity, infrastructure development, and emerging transportation technologies like self-driving cars are important variables that can influence congestion. We plan to incorporate the investigation of these factors in our future research to provide a more comprehensive understanding of congestion dynamics.

6. Policy interventions...

Thank you for this valuable recommendation. We have highlighted policy interventions such as congestion pricing and road usage regulations in the conclusion section to emphasize their potential as future directions for addressing congestion. Exploring these solutions further will help provide a more comprehensive view of the problem in subsequent research.

7. While the article mentions the scarcity of data...

We would like to clarify that the model is not negatively impacted by real-world datasets, as it is designed to generalize well and learn from a variety of possible cases. Additionally, our dataset was carefully split, with 80% used for training and 20% reserved for testing and validation (k-fold). Therefore, the model has been evaluated on unseen data, ensuring its robustness and ability to perform effectively across different urban settings.

8. The study lays a solid platform for future research...

We have highlighted this in the conclusion. We agree that highlighting suggestions for further research will strengthen the paper, and we emphasize potential avenues such as incorporating additional variables, improving machine learning techniques, and applying the model to real-world case studies to guide readers toward future developments in this area.

Reviewer 3 Report

Comments and Suggestions for Authors

1 General Comments

1) Originality and Relevance:

The combination of Logistic Regression and Random Forest to study urban congestion in relation to logistics sprawl is a novel methodological choice. While both models are widely used individually, their complementary application to this specific urban planning context is relatively rare.

2) Methodology Improvements:

While the paper evaluates both logistic regression and random forest models using standard metrics such as accuracy, AUC, precision, and recall, it would benefit from a more in-depth error analysis. To enhance the robustness and generalizability of the results, the authors could incorporate cross-validation techniques, such as 10-fold cross-validation.

3) Consistency of Conclusions:

The conclusions of the paper are generally consistent with the methodology and results, but there are some areas where the reasoning could be more cautious.

For example, the conclusion suggests that logistics sprawl generally reduces congestion, but the article's data are constructed and augmented rather than collected uniformly across cities. It is recommended that conclusions be qualified with “in the modeled data” or “based on this simulation” to avoid overstating real-world generality.

4) Appropriateness of References:

The appropriateness of references in the paper is generally acceptable.

2 Specific comments

1) Wording:

In the abstract, the dashes in “in-creases” on line 8, “ex-plored” and “dif-ferent” on lines 12 and 13, “Ma-chine” on line 17, “Ran-dom Forest” on line 18, “exe-cuted” on line 25, and “varia-bles” on line 27 should be removed. Unnecessary dashes also appear throughout the paper.

2) Comments on Tables and Figures:

Figures 3–5 should use properly formatted axis labels instead of variable names with underscores (e.g., congestion_on_roads). Descriptive and reader-friendly labels will improve clarity. Additionally, each figure should include a detailed caption that clearly explains what is being shown and why it is significant.


Author Response

2. Methodology Improvements:

Thank you for your valuable suggestion. We have integrated cross-validation techniques, specifically 15-fold cross-validation, into our evaluation process. We reviewed the results obtained from this approach, which further support the robustness and generalizability of our models. We will include this detailed analysis in the revised manuscript to enhance the study’s rigor.
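A 15-fold cross-validation loop of the kind described in this response can be sketched as below. It is a minimal pure-Python version assuming a plain (unstratified) shuffle; `k_fold_indices` is a hypothetical helper, not the authors' code. Every row is used for testing exactly once, and fold sizes differ by at most one.

```python
import random

def k_fold_indices(n_rows, k=15, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and the number of rows")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so that fold sizes differ by at most one.
    sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(indices[start:start + size])
        start += size
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(k_fold_indices(100, k=15))
print([len(test) for _, test in splits])
# [7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 6, 6, 6, 6, 6]
```

In use, the model would be refitted on each `train` list and scored on the matching `test` list; reporting the mean and standard deviation of the 15 scores shows how stable the performance is across splits.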

3. Consistency of Conclusions:

We have reviewed the conclusion in light of the suggested direction.

Specific comments (wording, tables, and figures): These have also been reviewed and revised as requested.

Reviewer 4 Report

Comments and Suggestions for Authors

This paper investigates the relationship between logistics sprawl and urban congestion, employing logistic regression and random forest models. It combines data extracted from literature and databases, uses data augmentation techniques (e.g., SMOTE, MixUp), and evaluates model performance using accuracy, AUC, F1-score, precision, and recall. The authors conclude that combining both models yields robust and interpretable results.
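The five indicators named here can be computed from a model's outputs as sketched below, assuming a binary target (for example, high versus not-high congestion; the paper's actual class coding may differ) and a 0.5 decision threshold. Accuracy, precision, recall, and F1 use hard predicted labels, whereas AUC uses the predicted scores and equals the probability that a randomly chosen positive case is scored above a randomly chosen negative one. The labels and scores are toy values, not data from the study, and the helper names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels coded 1 (positive) and 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from hard labels (0.0 when undefined)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}

def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 1 = high congestion, 0 = not high (hypothetical coding).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.8, 0.3, 0.2, 0.6, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(classification_metrics(y_true, y_pred))
# {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
print(roc_auc(y_true, scores))  # 0.9375
```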

Urban freight, logistics sprawl, and congestion are critical challenges for sustainable cities. The study addresses a real and impactful problem.
Use of multiple machine learning models and comparative evaluation is thorough and appropriate.
Combining logistic regression (for interpretability) with random forest (for accuracy) is well-justified.
Incorporating SMOTE, Gaussian noise, and MixUp for dataset balancing and augmentation is innovative for this context.
It compiles a rich overview of prior work and modeling approaches.
Use of standard and relevant indicators (Accuracy, AUC, etc.) to compare models is strong and informative.

However:

1. The study uses synthetic data derived from the literature and augmentation tools rather than empirical data collected in a systematic or real-time manner. The authors should therefore clearly state that the model is a conceptual/methodological prototype, not yet validated on empirical urban datasets, and encourage future work to test it on real-world congestion data.

2. Some sentences are awkward or grammatically incorrect, especially in the abstract and introduction. For example, “We have considered the congestion level as the dependent variable and we fixed the increasing distance…” (line 14) is awkward. I recommend having the language checked by a native English speaker or professional editor and streamlining syntax and grammar throughout.

3. The paper repeats definitions of model metrics (precision, recall, F1) in the methodology and results sections in near-identical language. These explanations should be consolidated to avoid redundancy; a reference to the metric table would suffice later in the text.

4. The limitations of the models, especially logistic regression’s assumptions (e.g., linearity, independence), are mentioned but not critically discussed in the context of the study. I recommend adding a subsection discussing model assumptions, potential biases, and how these might affect interpretation and generalizability.

5. While the results are detailed, the discussion and conclusion are underdeveloped. There is limited reflection on what the findings imply for urban planners, policy makers, or logistics operators. I recommend expanding the conclusion to include real-world implications, recommendations for sustainable logistics planning, and suggestions for integrating this model into policy tools or urban planning dashboards.

6. Terms like "logistics sprawl", “urban core”, and “peripheral areas” should be clearly defined early and used consistently.

This paper shows strong methodological contribution and relevance. However, language, structure, and empirical grounding need improvement before publication. With these changes you will improve the paper remarkably.

Author Response

1. The study uses synthetic data derived from the literature and augmentation tools...

Thank you. This is now highlighted in the research paper.

2. Some sentences are awkward or grammatically incorrect, especially in the abstract and introduction. For example...

We have integrated this observation in the paper.

3. The paper repeats definitions of model metrics (precision, recall, F1) in both the methodology and results sections in near-identical language

We consolidated these explanations and removed the redundancy.

4. The limitations of the models, especially logistic regression’s assumptions (e.g., linearity, independence), are mentioned but not critically discussed

We have discussed these assumptions in the research paper as requested.

5. While the results are detailed, the discussion and conclusion are underdeveloped. There is limited reflection on what the findings imply for urban planners, policy makers, or logistics operators.

We have integrated this into the paper's conclusion as requested.

6. Terms like "logistics sprawl", “urban core”, and “peripheral areas” should be clearly defined early and used consistently.

We have introduced them as requested

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

In Section 2, the Research Background, the author has written in a bullet-point style. This does not read like a formal academic paper, but rather like notes. Please revise this section to conform to the standards of academic writing. The font in Figure 1 is not consistent. Please make the necessary corrections.

Author Response

I would like to sincerely thank the reviewers for their valuable feedback and insightful suggestions. Your thorough and thoughtful comments have significantly contributed to improving the clarity, structure, and quality of the paper. I greatly appreciate the time and effort you all invested in evaluating the manuscript, and I am confident that the revisions made in response to your feedback have strengthened the overall work.

Reviewer 1:

Comments

In Section 2, the Research Background, the author has written in a bullet-point style. This does not read like a formal academic paper, but rather like notes. Please revise this section to conform to the standards of academic writing. The font in Figure 1 is not consistent. Please make the necessary corrections.

Answer

The bullet-point style in Section 2, "Research Background," has been revised and reformulated into a more formal, narrative format to align with the standards of academic writing. The section now presents a cohesive and structured discussion rather than a list of points.
Regarding Figure 1, the font inconsistencies have been addressed. The font style, size, and formatting have been reviewed and adapted to ensure consistency with the overall style of the paper.


Reviewer 4 Report

Comments and Suggestions for Authors

While you acknowledged that the dataset is synthetic and derived from secondary sources, this should be stated more prominently and explicitly in the Abstract and Conclusion. This ensures the reader does not assume the model has already been validated on real-time urban datasets.

While grammar has improved, some residual awkward phrases remain (e.g., “we defined the behavior of logistics sprawl in different cities…” could be “we analyzed how logistics sprawl manifests in various cities”). The article would benefit from a final language revision.

Although the assumptions of logistic regression were acknowledged in the revised version, a deeper critique is still warranted. For instance:

How might multicollinearity, linearity in the logit, or feature dependence bias the model's outcome?
How robust is the model if tested on real-world (possibly unbalanced or noisy) datasets?
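The first question refers to the standard logistic regression form, written here with illustrative symbols: p is the modeled probability of a high congestion level and x_1, ..., x_k are the predictors (for example, warehouse distance from the urban core or population density; these labels are assumptions for illustration, not the paper's notation).

```latex
\begin{aligned}
\operatorname{logit}(p) &= \ln\frac{p}{1-p} = \beta_0 + \sum_{j=1}^{k} \beta_j x_j
  && \text{(linearity in the logit)} \\
\operatorname{logit}(p) &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12}\, x_1 x_2
  && \text{(with one interaction term)}
\end{aligned}
```

Linearity in the logit means that each predictor shifts the log-odds by a constant \beta_j per unit, whatever the values of the other predictors; the interaction term \beta_{12} x_1 x_2 lets the effect of x_1 depend on x_2. Multicollinearity does not in itself bias the predicted probabilities, but when x_1 and x_2 are strongly correlated the separate estimates of \beta_1 and \beta_2 become unstable (large standard errors), which weakens coefficient-level interpretation.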

The paper is lengthy and occasionally repetitive in the modeling and methodology sections. Consider summarizing technical details or shifting them to an appendix.

Tables are dense and could benefit from visual simplification. Consider highlighting the best-performing model in Table 5 (e.g., bold or color shading).

The paper is methodologically solid and makes a notable contribution to urban freight and sustainability modeling. Once the issues above are addressed, especially clarity about the conceptual nature of the model and improved language editing, it will be suitable for publication.

Author Response

Comments

While you acknowledged that the dataset is synthetic and derived from secondary sources, this should be stated more prominently and explicitly in the Abstract and Conclusion. This ensures the reader does not assume the model has already been validated on real-time urban datasets.

Answer

We sincerely thank the reviewer for this valuable observation. We understand the importance of clearly stating the nature and origin of the dataset used in our study to avoid any misunderstanding regarding its empirical validation.

To clarify, the dataset used in our model is not purely synthetic. It was collected from real and reliable sources, namely a literature review and open-source databases, including transit-related data repositories. In order to enhance the dataset’s robustness, balance, and representativeness, we applied standard data augmentation techniques such as SMOTE, MixUp, Gaussian noise, and linear interpolation. These methods were used exclusively to address issues like class imbalance or limited data instances, particularly for underrepresented congestion levels or flow categories.

We have intentionally avoided generating artificial or fabricated data, and instead relied on verified secondary sources before performing augmentation. To reflect this more explicitly and avoid confusion, we have reworded the Abstract to specify the dataset’s origin and the nature of the augmentation process. This should ensure that readers clearly understand that the model has not yet been tested on real-time primary urban data, but is based on an enriched dataset derived from actual open data and published literature.

→ The dataset is based on actual secondary data sources and open-source databases, and was subsequently augmented using standard techniques to improve balance and data quality.
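MixUp, mentioned in this answer, differs from SMOTE in that it blends two rows and their labels. Below is a minimal sketch, assuming numeric feature rows and one-hot class labels; `mixup_pair` is a hypothetical helper, not the authors' code. The mixing weight is drawn from a Beta(α, α) distribution as in the original MixUp formulation, with α = 0.2 chosen only for illustration.

```python
import random

def mixup_pair(x_i, y_i, x_j, y_j, alpha=0.2, rng=None):
    """Blend two rows and their one-hot labels with a Beta(alpha, alpha) weight."""
    rng = rng or random.Random(0)
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    x_new = [lam * a + (1 - lam) * b for a, b in zip(x_i, x_j)]
    y_new = [lam * a + (1 - lam) * b for a, b in zip(y_i, y_j)]
    return x_new, y_new, lam

# Two hypothetical rows: [distance_km, density_index], labels one-hot [low, high].
x_new, y_new, lam = mixup_pair([5.0, 0.2], [1, 0], [25.0, 0.9], [0, 1])
print(round(lam, 3), [round(v, 3) for v in x_new], [round(v, 3) for v in y_new])
```

The blended label is fractional (for instance 0.7 "low" and 0.3 "high"), so a classifier that expects hard class labels would need the soft labels rounded or expressed as sample weights.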

Comments

While grammar has improved, some residual awkward phrases remain (e.g., “we defined the behavior of logistics sprawl in different cities…” could be “we analyzed how logistics sprawl manifests in various cities”). The article would benefit from a final language revision.

Answer

The grammar has been improved throughout the paper, and I have made revisions to address the awkward phrases pointed out. For example, the phrase “we defined the behavior of logistics sprawl in different cities…” has been reworded to “we analyzed how logistics sprawl manifests in various cities.”
I have also conducted a thorough final review of the paper’s language to further optimize clarity, readability, and flow.

I appreciate your helpful suggestions and believe the language has been strengthened as a result.

Comments

How might multicollinearity, linearity in the logit, or feature dependence bias the model's outcome?

Answer

While we did not include an explicit test such as the Variance Inflation Factor (VIF), we mitigated potential multicollinearity concerns by using a Random Forest model alongside logistic regression. Random Forest is inherently robust to multicollinearity and serves as a complementary tool that confirms the stability and importance of the features. This hybrid approach strengthens confidence in our findings by combining interpretability (Logistic Regression) with robustness to feature dependencies (Random Forest).

Additionally, we included interaction terms to better capture the relationships between variables, which partially addresses the assumption of linearity in the logit. We therefore consider that our model design already incorporates mechanisms to minimize the risk of biased results due to multicollinearity or non-linearity.
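The Variance Inflation Factor mentioned above is straightforward to compute. For predictor j, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing x_j on the other predictors; equivalently, VIF_j is the j-th diagonal entry of the inverse of the predictors' correlation matrix. The pure-Python sketch below takes that second route; the function names are hypothetical, the columns are toy values, and the often-quoted warning thresholds of 5 or 10 are rules of thumb rather than fixed limits.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix of equally long numeric columns."""
    n = len(columns[0])
    centered = [[v - sum(c) / n for v in c] for c in columns]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centered, norms)]
            for ci, ni in zip(centered, norms)]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    size = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(size)]
           for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix (perfect multicollinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]

def variance_inflation_factors(columns):
    """VIF_j = j-th diagonal entry of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]

x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]  # correlation with x1 is 0.8
print([round(v, 3) for v in variance_inflation_factors([x1, x2])])  # [2.778, 2.778]
```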

Comments

How robust is the model if tested on real-world (possibly unbalanced or noisy) datasets?

Answer

Our model was built on data collected from a combination of literature review and open-source databases, representing diverse urban contexts. Recognizing the limitations in real-world data availability, we employed several data augmentation techniques (SMOTE, MixUp, Gaussian noise, and linear interpolation) to simulate the challenges of class imbalance and data variability, ensuring a more generalized model structure.

To further ensure robustness, we applied 15-fold cross-validation, which reduces the risk of overfitting and tests model stability across different data splits. The consistency in performance metrics across these folds indicates that the model is capable of handling a variety of input patterns, similar to those found in real-world scenarios.

Thus, while the dataset is derived from secondary sources, we believe the model is built on a sound methodological foundation that makes it suitable for application and validation on real-world datasets in future work.

Comments

The paper is lengthy and occasionally repetitive in the modeling and methodology sections. Consider summarizing technical details or shifting them to an appendix.

Answer

I appreciate your suggestion to move some technical details to an appendix. While I understand the concern about length, I have streamlined these sections where possible so that the core technical aspects are presented clearly and concisely. However, I believe the modeling and methodology sections are critical for understanding the workflow and validating our model, and moving these details to an appendix could compromise the reader's ability to follow the steps involved in model development and validation. Keeping them in the main body provides essential context and ensures that the methodology is transparent and well understood.

Thank you for your feedback, and I hope this explanation clarifies the reasoning behind keeping these sections in the main body of the paper.

Comments

Tables are dense and could benefit from visual simplification. Consider highlighting the best-performing model in Table 5 (e.g., bold or color shading).

Answer

I have made efforts to optimize it visually for better readability. Specifically, in Table 5, the best-performing model has been highlighted in bold to draw attention and improve clarity.

