Machine Learning-Based Prediction Performance Comparison of Marshall Stability and Flow in Asphalt Mixtures
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Q1: Line 1
Although three algorithms were used for feature importance analysis, similar multi-algorithm comparative analyses have also been reported in other studies.
It is suggested that the authors further strengthen the novelty of the research, for example by exploring new feature combinations or improving algorithm structures to enhance predictive performance, or by approaching the problem from new perspectives (such as the impact of special working conditions or new materials on the performance of asphalt mixtures).
Q2: Line 153
Although you mention that these models have been widely used in the road engineering literature, you could explain in more detail why these specific models are the most suitable for this research question. For example, you state that 'LR is simple and a good baseline model for comparing the performance of more advanced algorithms', but the specific reasons for choosing LR as the baseline model are not explained in detail.
Q3: Line 178
For the data sources, the authors could conduct more detailed screening and preprocessing of the data, or apply data quality assessment methods to ensure the reliability of the data. The article should also discuss in detail the potential impacts of data source diversity and the countermeasures taken.
Q4: Line 417
Although the results section provides detailed performance metrics for each model on the training and testing sets, there is insufficient in-depth analysis of the results. For example, when overfitting occurs in certain models (such as the GBM model), it is only briefly mentioned without in-depth analysis of the specific reasons for overfitting, such as whether it is caused by factors such as dataset size, model complexity, or data characteristics.
Q5: Line 466
In the conclusions section, a discussion on the practical application value of the research results should be added to help readers better understand the significance of this study.
Q6: The following relevant references should be considered:
https://doi.org/10.1177/0361198119846473
Comments on the Quality of English Language
Fluent.
Author Response
Comment 1: Although three algorithms were used for feature importance analysis, similar multi-algorithm comparative analyses have also been reported in other studies.
It is suggested that the authors further strengthen the novelty of the research, for example by exploring new feature combinations or improving algorithm structures to enhance predictive performance, or by approaching the problem from new perspectives (such as the impact of special working conditions or new materials on the performance of asphalt mixtures).
Response 1: Thank you for your suggestion. This study focused on using established machine learning algorithms to analyze the importance of features in predicting Marshall Stability and Flow. Our approach aimed to provide insights into the key influencing factors while maintaining a clear and focused scope. We appreciate your suggestions and will consider them as potential directions for future research that would broaden the scope of this work.
Comment 2: Although you mention that these models have been widely used in the road engineering literature, you could explain in more detail why these specific models are the most suitable for this research question. For example, you state that 'LR is simple and a good baseline model for comparing the performance of more advanced algorithms', but the specific reasons for choosing LR as the baseline model are not explained in detail.
Response 2: Thank you for your comment. We have addressed this point in the manuscript (see line 155). LR is one of the simplest ML models, which makes it a natural starting point for comparison. It was selected as the baseline model because it provides a reference point for evaluating the performance of more complex algorithms. Because LR assumes a linear relationship between the features and the target variable, it shows how much of the variation in MS and MF a purely linear model can explain, and can therefore serve as a benchmark. If more advanced models such as RF or ANN do not significantly outperform LR, this suggests that the relationships in the data may be simpler than expected and that the additional complexity might not be justified.
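To make the baseline logic concrete, the following is a minimal sketch in plain Python (standard library only) of how an LR baseline can be fitted by ordinary least squares and scored with R² on held-out data. The function names (`fit_ols`, `predict`, `r2_score`) and the synthetic data are hypothetical illustrations; the study itself implemented LR with scikit-learn. A more complex model would then be judged by how far its test R² exceeds the value this baseline reaches on the same split.

```python
import random


def fit_ols(X, y):
    """Fit y ~ b0 + b1*x1 + ... + bp*xp by solving the normal equations (A^T A) b = A^T y."""
    # Prepend a column of ones so that the first coefficient is the intercept.
    A = [[1.0] + [float(v) for v in row] for row in X]
    p = len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [[sum(a[i] * a[j] for a in A) for j in range(p)]
         + [sum(a[i] * t for a, t in zip(A, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting reduces M to diagonal form.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [u - factor * v for u, v in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


def predict(coef, X):
    """Evaluate the fitted linear equation for each row of X."""
    return [coef[0] + sum(c * x for c, x in zip(coef[1:], row)) for row in X]


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical synthetic data: two features with a linear effect plus noise.
    rng = random.Random(0)
    X = [[rng.uniform(0, 10), rng.uniform(0, 5)] for _ in range(50)]
    y = [3.0 + 1.5 * a - 2.0 * b + rng.gauss(0, 1) for a, b in X]
    coef = fit_ols(X[:40], y[:40])          # fit on the first 40 rows
    baseline = r2_score(y[40:], predict(coef, X[40:]))  # score on the last 10
    print("LR baseline test R^2:", round(baseline, 3))
```

Because the fitted coefficients are explicit, the baseline also yields an interpretable equation, which the ensemble and neural models do not.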
Comment 3: For the data sources, the authors could conduct more detailed screening and preprocessing of the data, or apply data quality assessment methods to ensure the reliability of the data. The article should also discuss in detail the potential impacts of data source diversity and the countermeasures taken.
Response 3: Thank you for your comment. We have added the following text to the manuscript (see line 168). To ensure data reliability in this study, we implemented several preprocessing steps. Missing data were handled by identifying and addressing gaps before analysis, so that the dataset was complete. StandardScaler was applied for feature scaling, which is necessary to keep variables with different ranges and units on a consistent scale. In addition, to remove any potential bias arising from the order of the original dataset and to ensure that the training and testing sets are representative of the whole dataset, the data were randomly shuffled before splitting.
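The preprocessing steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the study's code: the response does not say whether incomplete rows were dropped or imputed, so this sketch simply drops them; the seed, the helper names (`drop_incomplete`, `shuffle_split`, `Standardizer`) and the example rows are hypothetical. `Standardizer` mimics the transformation scikit-learn's `StandardScaler` applies (z = (x − mean) / std with the population standard deviation), and here it is fitted on the training set only so that test-set statistics do not leak into training.

```python
import random
import statistics


def drop_incomplete(rows):
    """Keep only rows with no missing (None) values; imputation is one alternative."""
    return [r for r in rows if all(v is not None for v in r)]


def shuffle_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows with a fixed seed, then split into train and test."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]


class Standardizer:
    """Column-wise z = (x - mean) / std, the transformation StandardScaler applies."""

    def fit(self, rows):
        cols = list(zip(*rows))
        self.means = [statistics.fmean(c) for c in cols]
        # Population std (ddof=0) as in StandardScaler; constant columns get scale 1.
        self.stds = [statistics.pstdev(c) or 1.0 for c in cols]
        return self

    def transform(self, rows):
        return [[(v - m) / s for v, m, s in zip(r, self.means, self.stds)]
                for r in rows]


if __name__ == "__main__":
    # Made-up rows [air voids (%), VMA (%), Gmb], not the study's data; None = missing.
    data = [[4.1, 15.2, 2.35], [3.8, None, 2.37], [5.0, 16.1, 2.31],
            [4.5, 15.8, 2.33], [3.2, 14.6, 2.40], [4.9, 16.4, 2.30]]
    clean = drop_incomplete(data)
    train, test = shuffle_split(clean, test_fraction=0.2)
    scaler = Standardizer().fit(train)      # statistics from the training set only
    train_z, test_z = scaler.transform(train), scaler.transform(test)
    print(len(clean), "complete rows:", len(train), "train /", len(test), "test")
```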
Comment 4: Although the results section provides detailed performance metrics for each model on the training and testing sets, there is insufficient in-depth analysis of the results. For example, when overfitting occurs in certain models (such as the GBM model), it is only briefly mentioned without in-depth analysis of the specific reasons for overfitting, such as whether it is caused by factors such as dataset size, model complexity, or data characteristics.
Response 4: Thank you for your comment. The study noted and briefly addressed overfitting in the GBM models. We acknowledge that further analysis of its causes, such as dataset size, model complexity, or data characteristics, could provide additional insights, and we will consider this in future work.
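A learning curve is one common way to separate these causes: the model is retrained on growing subsets of the training data and the gap between training and test R² is tracked. If the gap shrinks as the training set grows, limited data is a likely contributor; if it stays large for a flexible model but narrows for a more constrained one, model complexity is the more likely driver. The sketch below is hypothetical and stdlib-only; a 1-D k-nearest-neighbour regressor stands in for GBM purely because its complexity is controlled by a single parameter (`k`, where `k=1` memorises the training data). It is not the study's model, and the data are synthetic.

```python
import random


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def knn_regressor(k):
    """Return a fit function for a 1-D k-nearest-neighbour regressor.
    Smaller k means a more complex model (k=1 reproduces the training data)."""
    def fit(xs, ys):
        data = list(zip(xs, ys))

        def predict(x):
            nearest = sorted(data, key=lambda d: abs(d[0] - x))[:k]
            return sum(y for _, y in nearest) / len(nearest)
        return predict
    return fit


def learning_curve(fit, x_tr, y_tr, x_te, y_te, sizes):
    """Fit on growing training subsets; return (n, train R^2, test R^2, gap) rows."""
    rows = []
    for n in sizes:
        model = fit(x_tr[:n], y_tr[:n])
        r_train = r2(y_tr[:n], [model(x) for x in x_tr[:n]])
        r_test = r2(y_te, [model(x) for x in x_te])
        rows.append((n, r_train, r_test, r_train - r_test))
    return rows


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.uniform(0, 10) for _ in range(200)]
    ys = [x ** 0.5 + rng.gauss(0, 0.3) for x in xs]   # hypothetical noisy target
    x_tr, y_tr, x_te, y_te = xs[:160], ys[:160], xs[160:], ys[160:]
    for k in (1, 15):
        for n, r_tr, r_te, gap in learning_curve(knn_regressor(k), x_tr, y_tr,
                                                 x_te, y_te, [20, 80, 160]):
            print(f"k={k:2d} n={n:3d} train R^2={r_tr:.2f} "
                  f"test R^2={r_te:.2f} gap={gap:.2f}")
```

For GBM itself, the analogous complexity settings would be quantities such as the number of boosting stages, tree depth and learning rate.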
Comment 5: In the conclusions section, a discussion on the practical application value of the research results should be added to help readers better understand the significance of this study.
Response 5: Thank you for your comment. We have incorporated your suggestion in the manuscript (see line 516). Identifying influential features such as VMA, air voids (%) and Gmb can help engineers understand their impact on MS and MF, which may allow them to optimise asphalt mix designs and produce durable, flexible pavements that withstand traffic loads.
This study also highlighted the importance of selecting suitable algorithms for specific prediction tasks. The superior performance of the RF algorithm compared with the other algorithms can serve as a recommendation for other researchers to leverage ensemble-based approaches for similar engineering tasks.
The methodology applied in this study can also be extended to predict other critical pavement properties such as fatigue resistance, rutting potential and thermal cracking resistance. This approach allows the use of ML in a broader range of pavement design and analysis tasks.
Comment 6: The following relevant references should be considered:
https://doi.org/10.1177/0361198119846473
Response 6: Thank you for the suggestion. We will review the reference and consider its relevance to the study.
Reviewer 2 Report
Comments and Suggestions for Authors
The manuscript aims to compare various ML techniques used to predict the Marshall Stability and Flow parameters using published data. The manuscript is suitable for publication; however, it needs some improvements.
1. The terms asphalt and bitumen are both used in the manuscript; it is better to use either asphalt or bitumen consistently.
2. Why were the data divided into 80% for training and 20% for testing? A better practice is 70:30.
3. Please incorporate R² in Table 2. Also, if simple regression analysis gives an R² value of more than 85%, why have you used such complex methodologies? Is it just for the sake of novelty?
4. I suggest including the developed models/equations for all the ML techniques used in the manuscript.
5. Table 1 should show some initial data points and then some data points from the end.
6. The color combinations of all the figures could be improved for better representation, and the text could be made consistent with the manuscript text.
Comments on the Quality of English Language
The overall sentence structure needs to be improved.
Author Response
Comment 1: The terms asphalt and bitumen are both used in the manuscript; it is better to use either asphalt or bitumen consistently.
Response 1: Thank you for your comment. We have made the necessary edits where possible (see lines 46 & 50).
Comment 2: Why were the data divided into 80% for training and 20% for testing? A better practice is 70:30.
Response 2: Thank you for your comment. While the 70:30 split is indeed a common practice, we chose the 80:20 ratio because it provides more data for training, which can be beneficial when the dataset is of moderate size. This choice helps ensure that the model has enough data to learn effectively while still leaving a sufficient portion for testing its performance.
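One way to reduce the dependence of the reported metrics on a single split ratio is k-fold cross-validation: with k = 5, every fold uses an 80:20 partition, and each sample is used for testing exactly once, so the averaged score does not hinge on one particular random split. The sketch below is illustrative and does not describe the study's procedure; `kfold_indices` and the dataset size of 100 are hypothetical. Its fold-size rule matches that of scikit-learn's `KFold`.

```python
import random


def kfold_indices(n, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation.
    As in scikit-learn's KFold, the first n % k folds get one extra sample."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


if __name__ == "__main__":
    n = 100  # hypothetical dataset size, not the study's
    for train, test in kfold_indices(n, k=5):
        print(len(train), len(test))   # each fold: 80 training, 20 testing
```

Reporting the mean and spread of R² across folds would also show how sensitive each model's score is to the particular samples held out.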
Comment 3: Please incorporate R² in Table 2. Also, if simple regression analysis gives an R² value of more than 85%, why have you used such complex methodologies? Is it just for the sake of novelty?
Response 3: Thank you for your comment. Table 2 is designed to summarize the descriptive statistics of the parameters, providing an overview of their distributions and characteristics. The R² values of all ML algorithms for MS and MF are presented in Table 7, which focuses on evaluating model performance.
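For reference, the R² values reported in Table 7 are assumed here to follow the standard coefficient of determination, which is also what scikit-learn's `r2_score` computes (the formula is not restated in this exchange). For $n$ observations $y_i$ with mean $\bar{y}$ and model predictions $\hat{y}_i$:

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}
```

An R² of 0.85 therefore means that the residuals still account for 15% of the total variance of the target around its mean; comparing the LR and ensemble R² values on the same test set indicates how much of that remainder the more complex models recover.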
Comment 4: I suggest including the developed models/equations for all the ML techniques used in the manuscript.
Response 4: Thank you for your suggestion. The models used in this study were implemented using scikit-learn, which relies on internal algorithms. However, the principles of each model are discussed in the manuscript.
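Although fitted tree ensembles and neural networks do not reduce to a single compact equation, their general model forms can still be stated. The block below gives the standard textbook forms, not the fitted models of this study. The notation is assumed for illustration: $p$ features $x_j$; $B$ regression trees $T_b$ for RF; an initial constant prediction $F_0$, $M$ boosting stages $h_m$ and learning rate $\nu$ for GBM; and an ANN with one hidden layer, weights $\mathbf{W}_1, \mathbf{w}_2$, biases $\mathbf{b}_1, b_2$ and activation $\sigma$ (the study's actual network architecture is not specified here).

```latex
\begin{align}
\text{LR:}  \quad & \hat{y} = \beta_0 + \sum_{j=1}^{p} \beta_j x_j \\
\text{RF:}  \quad & \hat{y}(\mathbf{x}) = \frac{1}{B} \sum_{b=1}^{B} T_b(\mathbf{x}) \\
\text{GBM:} \quad & \hat{y}(\mathbf{x}) = F_0 + \nu \sum_{m=1}^{M} h_m(\mathbf{x}) \\
\text{ANN:} \quad & \hat{y}(\mathbf{x}) = \mathbf{w}_2^{\top} \sigma\!\left( \mathbf{W}_1 \mathbf{x} + \mathbf{b}_1 \right) + b_2
\end{align}
```

In GBM, each $h_m$ is a regression tree fitted to the negative gradient of the loss (the residuals, for squared error). Only the LR form yields an explicit equation with a handful of numbers: scikit-learn's `LinearRegression` exposes the fitted $\beta_0$ and $\beta_j$ as `intercept_` and `coef_`, so that equation could be reported directly.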
Comment 5: Table 1 should show some initial data points and then some data points from the end.
Response 5: Thank you for your suggestion. Table 1 is designed to summarize the parameter value ranges along with the corresponding references; we therefore opted not to include initial and final data points, in order to maintain this format.
Comment 6: The color combinations of all the figures could be improved for better representation, and the text could be made consistent with the manuscript text.
Response 6: Thank you for your comment. We have updated the color combinations of the figures where possible for better representation and aligned the text in the figures with the manuscript for consistency.