Peer-Review Record

Machine Learning-Driven Multimodal Feature Extraction and Optimization Strategies for High-Speed Railway Station Area

Land 2025, 14(5), 1039; https://doi.org/10.3390/land14051039

by Xiang Li, Fa Zhang, Ziyi Liu, Yao Wei, Runlong Dai, Zhiyue Qiu, Yuxin Gu and Hong Yuan*

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Reviewer 3: Anonymous


Submission received: 5 April 2025 / Revised: 4 May 2025 / Accepted: 7 May 2025 / Published: 9 May 2025

(This article belongs to the Topic Spatial Decision Support Systems for Urban Sustainability)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This study developed a multimodal feature extraction and evaluation framework specifically for large-scale analysis of HSR station areas. It has certain innovativeness. The content is complete and has strong logic. The main problems are as follows:

1. What do Q1 and Q3 in Table 2 refer to?

2. The correlation between independent variables was not tested.

3. Why was Random Forest chosen for the study? Why were other ensemble algorithms (e.g., GBDT) not selected?

4. Figure 9 is not clear; it is recommended to revise its presentation.

5. Regarding the calculation of target variables "land supply capacity (LS)" and "demand intensity," please provide supporting references.

Author Response

Comments 1: [What do Q1 and Q3 in Table 2 refer to?]

Response 1: Thank you for pointing this out. We agree with this comment. An explanatory note regarding Q1 and Q3 has been added to the footnote of Table 2. This clarification is now highlighted in red on Lines 200-201 of Page 7 in the revised manuscript. [Q1 (lower quartile) and Q3 (upper quartile) correspond to the 25th and 75th percentiles, respectively, of the sample data sorted in ascending order.]

Comments 2: [The correlation between independent variables was not tested.]

Response 2: We acknowledge the constructive nature of this technical suggestion. It should be noted, however, that the Boruta-Random Forest feature selection methodology implemented in our study exhibits built-in robustness to feature dependencies, as explicitly stated with supporting citations in Lines 97-100 on Page 3 of the revised manuscript: [The Random Forest algorithm requires no variance inflation factor (VIF) diagnostics [42], while the Boruta mechanism automatically determines feature importance through shadow feature comparisons without subjective intervention [43]]
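The shadow-feature comparison mentioned in this response can be illustrated with a minimal sketch. It is not the authors' pipeline: Boruta scores features with Random Forest importances and applies a statistical test to the hit counts, whereas this example substitutes absolute Pearson correlation as a stand-in importance score so that it runs on the standard library alone. The names `pearson_abs` and `shadow_feature_hits`, and the synthetic data, are hypothetical.

```python
import random


def pearson_abs(xs, ys):
    """Absolute Pearson correlation, used here as a stand-in importance score."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5


def shadow_feature_hits(columns, target, n_rounds=50, seed=0):
    """Count, per feature, how often it beats the best shadow feature."""
    rng = random.Random(seed)
    hits = {name: 0 for name in columns}
    for _ in range(n_rounds):
        # Shadow features: each real column shuffled, which breaks any
        # relationship with the target while keeping the value distribution.
        shadow_scores = []
        for values in columns.values():
            shuffled = values[:]
            rng.shuffle(shuffled)
            shadow_scores.append(pearson_abs(shuffled, target))
        threshold = max(shadow_scores)
        # A real feature scores a "hit" when it outperforms every shadow.
        for name, values in columns.items():
            if pearson_abs(values, target) > threshold:
                hits[name] += 1
    return hits


# Synthetic data (assumption): one informative feature, one pure-noise feature.
data_rng = random.Random(42)
n = 200
signal = [data_rng.gauss(0, 1) for _ in range(n)]
noise = [data_rng.gauss(0, 1) for _ in range(n)]
target = [2.0 * s + data_rng.gauss(0, 0.5) for s in signal]

hits = shadow_feature_hits({"signal": signal, "noise": noise}, target)
print(hits)
```

The informative feature beats the shadows in every round, while the noise feature does so only occasionally; Boruta turns that difference in hit counts into an accept or reject decision per feature.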

Comments 3: [Why was Random Forest chosen for the study? Why were other ensemble algorithms (e.g., GBDT) not selected?]

Response 3: [Compared to bagging, random forests are more computationally efficient on a tree-by-tree basis since the tree building process only needs to evaluate a fraction of the original predictors at each split, although more trees are usually required by random forests. Combining this attribute with the ability to parallel process tree building makes random forests more computationally efficient than boosting. The HSR station samples exhibit high-dimensional features. The RF algorithm eliminates the need for VIF diagnostics to address multicollinearity among features, thereby significantly streamlining methodological procedures. Compared with other ensemble algorithms, random forest demonstrates superior predictive accuracy, enhanced noise resistance, reduced parameter tuning complexity, lower computational costs, and minimized overfitting risks.] We are grateful for your rigorous identification of this gap in the methodological discussion. The rationale for algorithm selection now features an expanded explanation in Section 4.1.2 (Lines 360-368 on Page 12), supported by additional references to seminal works in ensemble learning. These enhancements are highlighted in red for immediate reviewer reference.

Comments 4: [Figure 9 is not clear; it is recommended to revise its presentation.]

Response 4: We sincerely appreciate your astute observation regarding the visual clarity of Figure 9. The revised version now appears in its optimized form at Line 684 on Page 22.

Comments 5: [Regarding the calculation of target variables "land supply capacity (LS)" and "demand intensity," please provide supporting references.]

Response 5: We thank the reviewer for the suggestion. We apologize that the calculations of LS and LD in our manuscript were derived from some relevant references that were not previously listed. They have now been added on Page 9, Line 306 of the revised manuscript.

Reviewer 2 Report

Comments and Suggestions for Authors

This paper aims to address land resource wastage and inefficient planning in HSR station areas in China by building a national database of 1018 stations, applying Boruta random forest for feature selection and K-means++ clustering, and providing typology-specific planning strategies. The paper offers an interesting database with a large sample size, clear strategic recommendations that can be operationalized, and a strong theoretical basis that links land supply-demand equilibrium to sustainable planning.

However:

- The article lacks a validation section (either through empirical testing or case studies). You could discuss at least one small case study to demonstrate practical applicability.
- Why was a 3 km buffer chosen with the Voronoi method?
- Why is IQR used, and why is it more suitable for land use variables? Add a short clarification.

This manuscript has significant potential for publication after the above concerns are addressed.

Comments on the Quality of English Language

The English language used in this manuscript is generally clear, formal and appropriate.

Author Response

Comments 1: [The article lacks a validation section (either through empirical testing or case studies). You could discuss at least one small case study to demonstrate practical applicability.]

Response 1: We sincerely appreciate this valuable suggestion, which we had carefully considered during the manuscript preparation. [A concise discussion of this aspect was indeed included (Lines 819-829, Page 25), though positioned outside the main narrative to preserve structural focus. Based on media reports, the reason for the closure of Wuhan Pu'an Station (a Cluster 7 station in the 2020 study sample) in 2022 corroborates our core contradiction diagnosis and risk warning related to Cluster 7 HSR stations.] This small real-world case lends some support to the applicability of our proposed development strategy.

Comments 2: [Why was a 3 km buffer chosen with the Voronoi method?]

Response 2: We appreciate the valuable suggestion regarding the rationale for the 3 km buffer zone in our Voronoi method application. As originally stated on Page 4, Lines 124-126: [Given accelerated urban land expansion that expands HSR stations' actual spatial reach [48], this study delineates station influence zones as circular areas with a 3 km radius.] Recognizing the need for stronger justification, we have revised this section (Lines 124-130, Page 4 of the revised manuscript) with the following additions highlighted in red: [Our nationwide HSR station samples cover diverse regions in China, including highly developed areas where stations exhibit broader spatial influence [48]. Additionally, policy considerations necessitate sufficiently large influence ranges: edge effects may distort spatial metrics when analysis zones are too small, as proximity to boundaries compromises measurement validity [49]. Given these factors, this study delineates station influence zones as circular areas with a 3 km radius.]
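One way to read the combination of Voronoi partitioning and a 3 km circular buffer is sketched below. The response does not spell out how the two are combined, so this is an assumption: a location belongs to a station's influence zone when that station is its nearest one (the Voronoi condition) and lies within 3 km (the buffer condition). The station names, coordinates, and the helper `assign_to_station` are hypothetical, and coordinates are treated as planar kilometres.

```python
import math

BUFFER_KM = 3.0  # radius quoted in the response


def assign_to_station(point, stations, radius_km=BUFFER_KM):
    """Return the station whose clipped influence zone contains `point`, or None.

    A point lies in a station's Voronoi cell when that station is its nearest
    one; clipping the cell by the circular buffer adds the distance check.
    """
    nearest_name, nearest_dist = None, math.inf
    for name, (sx, sy) in stations.items():
        dist = math.hypot(point[0] - sx, point[1] - sy)
        if dist < nearest_dist:
            nearest_name, nearest_dist = name, dist
    return nearest_name if nearest_dist <= radius_km else None


# Two hypothetical stations 4 km apart, so their 3 km circles overlap.
stations = {"A": (0.0, 0.0), "B": (4.0, 0.0)}

print(assign_to_station((1.0, 0.0), stations))  # nearest to A and inside its buffer
print(assign_to_station((2.5, 0.0), stations))  # inside both circles, but nearer to B
print(assign_to_station((9.0, 0.0), stations))  # farther than 3 km from every station
```

The Voronoi condition is what resolves the overlap: a location inside two circles is counted only once, for the nearer station.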

Comments 3: [Why is IQR used, and why is it more suitable for land use variables? Add a short clarification.]

Response 3: We sincerely appreciate your suggestion. [The robust standardization using the interquartile range (IQR) was implemented to address the outlier sensitivity inherent in traditional methods like Z-score normalization. Given the presence of skewed distributions and numerous outliers in our dataset with substantial value variations across variables, IQR-based standardization provides more stable and reliable standardized results.] We have added this clarification in Lines 344-349 on Page 12 of the revised manuscript.
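The contrast drawn here between IQR-based and Z-score standardization can be shown with a small sketch. The response does not give the exact formula, so centring on the median and dividing by Q3 - Q1 is an assumption (it is the common form of robust scaling); the function names and the sample values are hypothetical.

```python
import statistics


def robust_scale(values):
    """Centre on the median and divide by the interquartile range (Q3 - Q1)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; robust scaling is undefined")
    return [(v - median) / iqr for v in values]


def z_score(values):
    """Classic standardization: centre on the mean, divide by the standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


sample = [1, 2, 3, 4, 5, 100]  # hypothetical values with one extreme outlier
robust = robust_scale(sample)
classic = z_score(sample)

# Spread of the five ordinary observations under each scheme.
robust_spread = max(robust[:5]) - min(robust[:5])
classic_spread = max(classic[:5]) - min(classic[:5])
print(f"robust spread:  {robust_spread:.2f}")
print(f"z-score spread: {classic_spread:.2f}")
```

Because the outlier inflates the mean and standard deviation, Z-scores squeeze the ordinary observations into a narrow band, whereas the median and IQR are barely affected by a single extreme value.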

Reviewer 3 Report

Comments and Suggestions for Authors

Review

I am grateful for the opportunity I have had to review the article “Machine Learning-Driven Multimodal Feature Extraction and Optimization Strategies for High-Speed Railway Station Area”, as it has been a very interesting experience. I would therefore like to congratulate the authors. It is not always easy to combine different techniques and achieve success, although as I have read it, some ideas have come to mind that will help to improve the reading and understanding of the content. In line with this, I have the following observations:

General aspects:
The article focuses on the application of different techniques from the machine learning environment, although their contribution makes them more ambitious in trying to limit the problems that arise from the use of this type of technique. Section 4 stands out, where the authors present the methodology they have followed, explaining its basic foundations. In addition, they present interesting results in a solid way and in line with the objective of the research. However, I think that the discussion of the results should be separated, as the distinction between them is not clearly appreciated. Furthermore, within the discussion I would recommend including the problems that arise from the use of machine learning algorithms. Similarly, the use of other techniques, also well established in Geographic Information Systems, could be addressed.

Specific comments:

- Introduction: From my point of view, the authors have done a good job of compiling the literature, setting out the different lines of research and their limitations. Personally, I fully agree with their statements, since in other environments decision-making lacks the rigor that an in-depth study can provide, especially when it is backed up by multivariate techniques. Perhaps this is the point that is least well resolved, as anyone who is not familiar with Geographic Information Systems and their potential will not fully understand the reason for choosing random forests or any other geospatial technique. In this sense, I would like the authors to specify the reason for this choice, either here or, preferably, in the methodology.

- Area of study and research framework:
The section on the study area is interesting and provides useful figures, such as the distribution of the high-speed network in the study area. This part should be improved by including the complete cartography of the country and then leaving the specific map of the study area. I also recommend including the main cities to have a correct spatial reference. On the other hand, when you construct the Voronoi diagrams with a radius of 3 km, I have some doubts. The authors justify it because most of the literature sets the radius for the case of China at 2-3 km. However, from the reader's point of view, I think it would be helpful to give some justification for choosing this distance. I am convinced that each author chooses a specific radius for a reason, but it is not reflected in the text. I recommend clarification in this regard. On the other hand, section 2.2 clearly reflects the workflow carried out by the authors through a figure. Personally, I think that readers would appreciate a graphic synthesis of all the processes followed.

- According to the structure proposed by the journal, sections 2, 3 and 4 should be merged, as they deal with Materials and Methods. However, I think that given their complexity and length, the authors have acted correctly in dividing them.

- Sections 3, referring to the data, and 4 are very well resolved.

- Section 5 includes results and discussion. I think they should be separated, as proposed in the structure of the articles. There is a clear difference between what are results and what is a discussion. The results are really well explained and presented, although the clarity of some figures could be improved, as their content is not easily appreciated. This is the case with figures 6 and 9, which could be made larger, especially figure 6a, which refers to the distribution of the clusters obtained. It could also be included with higher resolution in the supplementary material. With regard to the discussion, as well as being separate from the results, it should be more ambitious, highlighting the advantages of using the methodological proposal followed by the authors over other analyses such as those focused on MGWR. They could also refer to the problems that can arise from the use of random forests. Among them, the literature highlights that the combination of multiple trees can reduce the interpretability compared to single-tree-based models. It also mentions that continuous predictors or qualitative predictors with many levels are more likely to contain, purely by chance, some optimal cut-off point, which is why they tend to be favored in the creation of trees. In addition, they are not capable of extrapolating outside the range observed in the training data.

- Finally, the conclusions include some of what could be integrated into the discussion.

- Recommended complementary bibliography:
- Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani. Introduction to Statistical Learning. Springer, 2023.
- Max Kuhn and Kjell Johnson. Applied Predictive Modeling. Springer, 2013.
- Trevor Hastie, Robert Tibshirani and Jerome Friedman. The Elements of Statistical Learning. Springer, 2009.
- Bradley Efron and Trevor Hastie. Computer Age Statistical Inference: Algorithms, Evidence, and Data Science (1st ed.). Cambridge University Press, 2016.

Final assessment of the article:
I think it is an interesting article, especially at the methodological level, whose combination of techniques offers novel results. I believe that with a few small adjustments it would be notably improved for the reader.

Good luck

Author Response

Comments 1: [- Introduction: From my point of view, the authors have done a good job of compiling the literature, setting out the different lines of research and their limitations. Personally, I fully agree with their statements, since in other environments decision-making lacks the rigor that an in-depth study can provide, especially when it is backed up by multivariate techniques. Perhaps this is the point that is least well resolved, as anyone who is not familiar with Geographic Information Systems and their potential will not fully understand the reason for choosing random forests or any other geospatial technique. In this sense, I would like the authors to specify the reason for this choice, either here or, preferably, in the methodology.]

Response 1: We are grateful for your rigorous identification of this methodological discourse gap. [As substantively revised, the rationale for algorithm selection now features an expanded explanation in Section 4.1.2 (Lines 360-368 on Page 13), supported by additional references to seminal works in ensemble learning.] These enhancements are visually demarcated with red highlighting for immediate reviewer reference.

Comments 2: [- Area of study and research framework: The section on the study area is interesting and provides useful figures, such as the distribution of the high-speed network in the study area. This part should be improved by including the complete cartography of the country and then leaving the specific map of the study area. I also recommend including the main cities to have a correct spatial reference. On the other hand, when you construct the Voronoi diagrams with a radius of 3 km, I have some doubts. The authors justify it because most of the literature sets the radius for the case of China at 2-3 km. However, from the reader's point of view, I think it would be helpful to give some justification for choosing this distance. I am convinced that each author chooses a specific radius for a reason, but it is not reflected in the text. I recommend clarification in this regard. On the other hand, section 2.2 clearly reflects the workflow carried out by the authors through a figure. Personally, I think that readers would appreciate a graphic synthesis of all the processes followed.]

Response 2: [We appreciate your suggestions regarding the study area, but we would like to clarify that the high-speed rail stations we studied are spread across the entire country, as shown by the red dots in Figure 1(a) of the manuscript, so the specific map of the study area should logically be the complete map of the entire country.] The area in Figure 3(a) is only a representative area, chosen because it is impossible to visualize every one of the nationwide sites studied.

We appreciate the valuable suggestion regarding the rationale for the 3 km buffer zone in our Voronoi method application. Recognizing the need for stronger justification, we have revised this section (Lines 124-130, Page 4 of the revised manuscript) with the following additions highlighted in red: [Our nationwide HSR station samples cover diverse regions in China, including highly developed areas where stations exhibit broader spatial influence [48]. Additionally, policy considerations necessitate sufficiently large influence ranges: edge effects may distort spatial metrics when analysis zones are too small, as proximity to boundaries compromises measurement validity [49]. Given these factors, this study delineates station influence zones as circular areas with a 3 km radius.]

We appreciate your suggestions regarding the methodological workflow. [In the revised manuscript, we have inserted relevant images at some key points to make it easier for readers to understand, as detailed on Page 4, Line 136.]

Comments 3: [- Section 5 includes results and discussion. I think they should be separated, as proposed in the structure of the articles. There is a clear difference between what are results and what is a discussion. The results are really well explained and presented, although the clarity of some figures could be improved, as their content is not easily appreciated. This is the case with figures 6 and 9, which could be made larger, especially figure 6a, which refers to the distribution of the clusters obtained. It could also be included with higher resolution in the supplementary material. With regard to the discussion, as well as being separate from the results, it should be more ambitious, highlighting the advantages of using the methodological proposal followed by the authors over other analyses such as those focused on MGWR. They could also refer to the problems that can arise from the use of random forests. Among them, the literature highlights that the combination of multiple trees can reduce the interpretability compared to single-tree-based models. It also mentions that continuous predictors or qualitative predictors with many levels are more likely to contain, purely by chance, some optimal cut-off point, which is why they tend to be favored in the creation of trees. In addition, they are not capable of extrapolating outside the range observed in the training data.]

Response 3: We fully concur with your observation regarding the separation of results and discussion. [The original '5. Results and Discussion' section has been restructured into two distinct chapters: '5. Results' and '6. Discussion' in the revised manuscript.]

We endorse your observations regarding Figures 6 & 9. [Both visualizations have been redesigned/reformatted with enhanced clarity in the revised version (see updated Figures 6 & 9).]

We sincerely appreciate your detailed and constructive comments on the discussion section. [In response to the suggestions, we have revised the manuscript by incorporating additional content on Page 22, Lines 685–700, which has been highlighted in red for easy identification.]

[We are grateful for the bibliographic references that you recommended, and we have added citations to these in the appropriate places.]
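The extrapolation limitation raised in Comments 3 can be illustrated with a minimal sketch. It uses a single, hand-written one-dimensional regression tree rather than a Random Forest, which is an assumption made to keep the example dependency-free; a forest averages the predictions of many such trees, so its output is bounded by the training targets in the same way. The functions `fit_tree` and `predict` and the toy data are hypothetical.

```python
def fit_tree(xs, ys, depth=3):
    """Fit a tiny 1-D regression tree; each leaf stores the mean of its targets."""
    mean = sum(ys) / len(ys)
    candidates = sorted(set(xs))[1:]  # thresholds that leave both sides non-empty
    if depth == 0 or not candidates:
        return mean
    best_sse, best_threshold = None, None
    for threshold in candidates:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best_sse is None or sse < best_sse:
            best_sse, best_threshold = sse, threshold
    left = [(x, y) for x, y in zip(xs, ys) if x < best_threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x >= best_threshold]
    return (
        best_threshold,
        fit_tree([x for x, _ in left], [y for _, y in left], depth - 1),
        fit_tree([x for x, _ in right], [y for _, y in right], depth - 1),
    )


def predict(tree, x):
    """Walk from the root to a leaf and return the leaf mean."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x < threshold else right
    return tree


xs = list(range(10))         # training inputs 0..9
ys = [2.0 * x for x in xs]   # linear target y = 2x
tree = fit_tree(xs, ys)

inside = predict(tree, 9)    # largest input seen during training
outside = predict(tree, 100) # far outside the training range
print(inside, outside)
```

Although the underlying relationship is linear, the prediction for an input of 100 is the same leaf mean returned for 9, because every split threshold lies inside the observed range.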

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

I have no comments now

Reviewer 2 Report

Comments and Suggestions for Authors

The authors have addressed most of my previous comments so I am generally satisfied with the revisions made.
