Synthetic Data Generation Methodology for Construction Machinery Assembly Optimization
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
1.Although multiple engineering and mathematical literature sources are cited, the positioning of synthetic data in the legal and ethical dimensions is not explained, and the theoretical foundation therefore remains weak. For example, issues such as data ownership, algorithmic bias, and model accountability mechanisms are not addressed. It is recommended to supplement the discussion with the provisions of the EU’s Artificial Intelligence Act (2024) and the GDPR concerning the risks of data re-identification, so as to enhance the institutional feasibility of the study.
2.The author should discuss the regulatory risks associated with the generation of synthetic data, including data traceability and the accountability of regulators. If this technology is applied to public construction or urban engineering, it must consider the compliance procedures required under data protection laws; otherwise, the article would face problems concerning data legality and privacy protection.
3.With only ten engineering projects in the Czech region as a sample and a deviation of 13% used as the validation criterion, the statistical foundation is weak, revealing the narrow scope of the model validation methods. It is recommended to employ statistical indicators such as RMSE or the K–S test, or to expand the sample toward cross-national comparisons to enhance external validity.
4.The author emphasizes cost and efficiency, but there is insufficient discussion of the interaction among land use efficiency, tax policies, and sustainable urban development, which results in a lack of connection between the theoretical and practical aspects of the article. If the model were applied to urban land development or public construction, its optimization results might affect land value taxation and regional finance. Expanding this discussion would enhance the public policy relevance of the research.
5.The text displays repetitive English sentence structures and inadequate transitions between paragraphs (especially between the second and third sections). It is suggested to shorten or refine sentences and to add logical transition expressions—such as notably, in fact, or as one might expect—to improve readability and avoid a mechanical tone. Professional English editing is also recommended.
6.English translations or brief explanations should be provided for several Russian and Czech references to improve accessibility for an international audience.
7.An enhanced Data Governance Framework diagram should be included, illustrating the processes of data generation, verification, and regulatory oversight.
8.Consistency in the use of symbols—such as “Ks” and “Ks1”—must be ensured, and table units (e.g., USD thousands or integers) should be standardized.
9.Appendix A is excessively long; it is recommended to submit it separately as supplementary material.
10.This paper stands out in terms of engineering and technological contribution but lacks depth in the institutional and ethical dimensions. If it could:
(1) Add a chapter on the Legal and Regulatory Analysis of Synthetic Data.
(2) Explore the potential impact of the technology on land use and urban finance.
(3) Reflect on the limitations of the model assumptions in the discussion section,
then this paper would have the potential to become a representative study that integrates AI engineering with urban governance.
Comments on the Quality of English Language
The text displays repetitive English sentence structures and inadequate transitions between paragraphs (especially between the second and third sections). It is suggested to shorten or refine sentences and to add logical transition expressions—such as notably, in fact, or as one might expect—to improve readability and avoid a mechanical tone. Professional English editing is also recommended.
Author Response
Dear Reviewer,
Thank you for giving me the opportunity to submit a revised version of my manuscript titled “Synthetic Data Generation for AI-Driven Optimization of Construction Machinery Assemblies” to Buildings.
I appreciate the time and effort that you have dedicated to providing your valuable feedback on my manuscript, and I am grateful for your insightful comments. I have incorporated changes that reflect most of your suggestions.
Here is a point-by-point response to the reviewers’ comments and concerns.
Comment 1:
1.Although multiple engineering and mathematical literature sources are cited, the positioning of synthetic data in the legal and ethical dimensions is not explained, and the theoretical foundation therefore remains weak. For example, issues such as data ownership, algorithmic bias, and model accountability mechanisms are not addressed. It is recommended to supplement the discussion with the provisions of the EU’s Artificial Intelligence Act (2024) and the GDPR concerning the risks of data re-identification, so as to enhance the institutional feasibility of the study.
Response 1:
In accordance with the reviewer’s comment, the discussion section was expanded to elaborate on the legal and ethical dimensions of the proposed approach (Lines 684-709).
Comment 2:
2.The author should discuss the regulatory risks associated with the generation of synthetic data, including data traceability and the accountability of regulators. If this technology is applied to public construction or urban engineering, it must consider the compliance procedures required under data protection laws; otherwise, the article would face problems concerning data legality and privacy protection.
Response 2:
In accordance with the reviewer’s comment, the discussion section was expanded to elaborate on the legal and ethical dimensions of the proposed approach (see the response to Comment 1 and lines 684-709).
Comment 3:
3.With only ten engineering projects in the Czech region as a sample and a deviation of 13% used as the validation criterion, the statistical foundation is weak, revealing the narrow scope of the model validation methods. It is recommended to employ statistical indicators such as RMSE or the K–S test, or to expand the sample toward cross-national comparisons to enhance external validity.
Response 3:
I sincerely apologize for having provided only a summary of the verification procedure in the previous submission. The validation of the synthetic data generator was conducted across ten large-scale projects, each comprising between three and seven independent construction tasks. For each task, real-world data were independently collected and evaluated. In the revised manuscript, the verification section has been expanded to include additional indicators and datasets. Validation of the synthetic data involved statistical verification through correlation analysis (R², Pearson correlation) and comparison of key statistical measures, particularly the Root Mean Square Error (RMSE), applied to both the real and generated datasets. Furthermore, an F-test was performed for each category, including total cost, completion time, CO₂ emissions, fuel consumption, and number of failures. To examine whether the real and synthetic datasets followed a normal distribution, the Kolmogorov–Smirnov test was applied to each category (lines 586-615). Each construction task was treated as an independent data record, capturing task volumes, parameter specifications, and the actual deployment of machine assemblies. Synthetic data were subsequently generated using the developed model and compared against the corresponding real-world datasets. In total, 49 comparisons were conducted, and the results were aggregated and summarized by project into ten consolidated records. The revised description of the verification process can be found in the manuscript on lines 636–642. The proposal to verify the research results outside the Czech Republic toward cross-national comparisons has been discussed in the expanded Discussion section on lines 663–664.
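For readers who wish to reproduce this kind of comparison, a minimal sketch of two of the indicators named above (RMSE and the Pearson correlation) is given here. It is an illustration only: the function names and the sample values are hypothetical and are not taken from the manuscript or its supplementary code.

```python
import math


def rmse(real, synthetic):
    """Root mean square error between paired real and synthetic values."""
    if len(real) != len(synthetic) or not real:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(r - s) ** 2 for r, s in zip(real, synthetic)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(x, y):
    """Pearson correlation coefficient of two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical total-cost figures for five tasks (placeholder values only).
real_cost = [120.0, 95.0, 143.0, 110.0, 101.0]
synthetic_cost = [115.0, 99.0, 150.0, 104.0, 108.0]

error = rmse(real_cost, synthetic_cost)
r = pearson_r(real_cost, synthetic_cost)
print(f"RMSE = {error:.2f}, Pearson r = {r:.3f}")
```

RMSE is expressed in the units of the compared quantity, so it is best reported per category (cost, time, emissions, and so on) rather than pooled. For a simple linear fit, R² equals the square of the Pearson coefficient.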
Comment 4:
4.The author emphasizes cost and efficiency, but there is insufficient discussion of the interaction among land use efficiency, tax policies, and sustainable urban development, which results in a lack of connection between the theoretical and practical aspects of the article. If the model were applied to urban land development or public construction, its optimization results might affect land value taxation and regional finance. Expanding this discussion would enhance the public policy relevance of the research.
Response 4:
The Discussion section has been expanded (lines 652-655) to address a highly relevant contemporary challenge, the efficient use of resources, which may have implications for land value taxation and regional finance.
Comment 5:
5.The text displays repetitive English sentence structures and inadequate transitions between paragraphs (especially between the second and third sections). It is suggested to shorten or refine sentences and to add logical transition expressions—such as notably, in fact, or as one might expect—to improve readability and avoid a mechanical tone. Professional English editing is also recommended.
Response 5:
I agree with the suggestion to improve the readability of the article and to refine the academic English. A professional language editing service has been used to carry out an expert revision of the manuscript’s English. The corresponding certificate is attached. The paper has undergone English language editing by MDPI. The text has been checked for correct use of grammar and common technical terms and edited to a level suitable for reporting research in a scholarly journal. MDPI uses experienced, native English-speaking editors.
Comment 6:
6.English translations or brief explanations should be provided for several Russian and Czech references to improve accessibility for an international audience.
Response 6:
I appreciate the reviewer’s valuable comment. To improve accessibility for an international readership, I added English translations for several Russian and Czech references (references 11, 16, 17, 19, 21-23, 25-29, 34 and 45-48). I believe these revisions make the manuscript more readable and relevant for a global audience.
Comment 7:
7.An enhanced Data Governance Framework diagram should be included, illustrating the processes of data generation, verification, and regulatory oversight.
Response 7:
Figure 7 has been revised according to the reviewer’s suggestion and now presents an enhanced Data Governance Framework diagram, highlighting the key components of data generation, verification, and regulatory oversight (line 498).
Comment 8:
8.Consistency in the use of symbols—such as “Ks” and “Ks1”—must be ensured, and table units (e.g., USD thousands or integers) should be standardized.
Response 8:
I sincerely thank the reviewer for this helpful comment. The use of symbols (such as “Ks” and “Ks1”) has been reviewed for consistency throughout the manuscript, and the units in all tables have been standardized accordingly (lines 290 and 585).
Comment 9:
9.Appendix A is excessively long; it is recommended to submit it separately as supplementary material.
Response 9:
I acknowledge the reviewer’s valid concern that Appendix A may appear overly detailed. Nevertheless, I believe that including a comprehensive description of the simulation in one document enhances the clarity and coherence of the paper, since the theoretical exposition is inherently connected to the simulation process. The supplementary materials already include the generated and verification datasets, along with the corresponding source codes.
Comment 10:
10.This paper stands out in terms of engineering and technological contribution but lacks depth in the institutional and ethical dimensions. If it could:
(1) Add a chapter on the Legal and Regulatory Analysis of Synthetic Data.
Response 10-1:
In accordance with the reviewer’s comment, the discussion section was expanded to elaborate on the legal and ethical dimensions of the proposed approach (see the responses to Comments 1 and 2; lines 684-710).
(2) Explore the potential impact of the technology on land use and urban finance.
Response 10-2:
In accordance with the reviewer’s comment, the discussion section was expanded to explore the potential impact of the model on land use and urban finance (lines 652-655).
(3) Reflect on the limitations of the model assumptions in the discussion section,
Response 10-3:
In accordance with the reviewer’s comment, the discussion section was expanded to reflect on the limitations of the model assumptions (lines 643-668).
then this paper would have the potential to become a representative study that integrates AI engineering with urban governance.
Comment 11:
The text displays repetitive English sentence structures and inadequate transitions between paragraphs (especially between the second and third sections). It is suggested to shorten or refine sentences and to add logical transition expressions—such as notably, in fact, or as one might expect—to improve readability and avoid a mechanical tone. Professional English editing is also recommended.
Response 11:
I agree with the suggestion to improve the readability of the article and to refine the academic English. A professional language editing service has been used to carry out an expert revision of the manuscript’s English. The corresponding certificate is attached. The paper has undergone English language editing by MDPI. The text has been checked for correct use of grammar and common technical terms and edited to a level suitable for reporting research in a scholarly journal. MDPI uses experienced, native English-speaking editors.
I look forward to hearing from you in due course regarding my submission and remain available to address any further questions or comments you may have.
Sincerely, Author of the manuscript.
Reviewer 2 Report
Comments and Suggestions for Authors
The author presents an interesting topic regarding the investigation of synthetic data generation for AI-driven optimization of construction machinery assemblies. However, there are several aspects that require further consideration such as references, structure of the paper, and the novelties. I attach my specific comments below:
- Abstract
- Who will benefit from this study?
- Introduction
- There are too many claims made by the author without supporting references, especially in lines 27-65
- Line 78-80: what are their works?
- I highly suggest that the author separate the general background of the study from the literature review for better clarity
- Methods
- Line 154: what are they?
- Figure 2 is not clear. Does it present a linear flow? Which one is input, process, output?
- The Table 3 needs further elaboration and analysis.
- Where is the case study?
- Where is the discussion part? This cannot be embedded in the conclusion part.
- What are the novelties of the study?
Author Response
Dear Reviewer,
Thank you for giving me the opportunity to submit a revised version of my manuscript titled “Synthetic Data Generation for AI-Driven Optimization of Construction Machinery Assemblies” to Buildings.
I appreciate the time and effort that you have dedicated to providing your valuable feedback on my manuscript, and I am grateful for your insightful comments. I have incorporated changes that reflect most of your suggestions.
Here is a point-by-point response to the reviewers’ comments and concerns.
Comment 1:
The author presents an interesting topic regarding the investigation of synthetic data generation for AI-driven optimization of construction machinery assemblies. However, there are several aspects that require further consideration such as references, structure of the paper, and the novelties. I attach my specific comments below:
- Abstract
- Who will benefit from this study?
Response 1:
Thank you for pointing this out. I agree with the comment and have added further details to the abstract to address it (lines 010-028).
The study will primarily benefit construction equipment manufacturers and contractors by improving design efficiency, reducing costs, and enhancing machine performance. It will also support maintenance providers and researchers through easier servicing, predictive diagnostics, and the advancement of AI applications in construction engineering.
Comment 2:
- Introduction
- There are too many claims made by the author without supporting references, especially in lines 27-65
Response 2:
Thank you very much for your valuable comment. I have revised the Introduction section and added the relevant references 1–10 (lines 034-070, 923-944).
Comment 3:
- Line 78-80: what are their works?
Response 3:
I agree with this comment and have incorporated your suggestion throughout the introduction. Additional descriptions have been added to the manuscript to clarify these points (lines 082-105).
Comment 4:
- I highly suggest that the author separate the general background of the study from the literature review for better clarity
Response 4:
I agree with the comment. For greater clarity and improved readability, the Introduction has been divided into two sections: General Background and Literature Review (lines 034-180).
Comment 5:
- Methods
- Line 154: what are they?
Response 5:
Thank you for your feedback. Additional details and references on the mathematical modeling have been included in lines 193–202.
Comment 6:
- Figure 2 is not clear. Does it present a linear flow? Which one is input, process, output?
Response 6:
Figure 2 presents a linear process flow. To improve clarity, the individual procedural steps, which represent the modeling phases, have been added to the diagram (line 240).
Comment 7:
- The Table 3 needs further elaboration and analysis.
Response 7:
I fully agree with this comment and apologize for omitting the explanation of the table. The clarification has been added to the revised version of the manuscript (lines 552-559).
Comment 8:
- Where is the case study?
Response 8:
I apologize for this oversight. The description of the case study object has now been included in the Methodology section (lines 500-505).
Comment 9:
Where is the discussion part? This cannot be embedded in the conclusion part.
Response 9:
Thank you for your suggestion. I agree with this comment. The Discussion section has been completely revised and reorganized into several subsections to improve clarity and readability (lines 617-710).
Comment 10:
- What are the novelties of the study?
Response 10:
I agree with this comment and have incorporated your suggestion throughout the manuscript. The novelty of the study is now stated in the expanded Discussion section (lines 675-683) and in the expanded Introduction section (lines 160-168).
I look forward to hearing from you in due course regarding my submission and remain available to address any further questions or comments you may have.
Sincerely, Author of the manuscript.
Reviewer 3 Report
Comments and Suggestions for Authors
Subject: Peer Review Report for Manuscript – "Synthetic Data Generation for AI-Driven Optimization of Construction Machinery Assemblies"
Manuscript ID: buildings-3963322
Dear Author,
This manuscript explores the development of a methodology and model for generating synthetic data to train AI systems aimed at optimizing construction machinery assemblies—an area where data scarcity poses a significant challenge for the construction industry. The study presents a commendable understanding of an important industrial problem and outlines a promising research direction. However, to achieve the level of scientific rigor, clarity, and impact expected for publication, the manuscript requires revisions. Below, I provide my detailed comments and recommendations.
Overall Assessment
The manuscript addresses the important and timely problem of data scarcity for training AI models in construction management. The proposed solution—a synthetic data generator based on queuing theory and Monte Carlo simulation—is a logical and necessary direction for the field. However, the manuscript in its current form suffers from three critical weaknesses.
Firstly, its strategic contribution is poorly framed; it reads more like a technical report on a specific software tool than a research article making a significant, generalizable contribution to knowledge.
Secondly, the methodology lacks the requisite scientific rigor, with key parameters and assumptions left unjustified and the validation process being statistically weak.
Finally, the manuscript's narrative is disjointed, with a significant logical gap between the theoretical economic principles presented and their practical implementation in the simulation model, which hinders readability and undermines the paper's core argument.
Section-Specific Comments and Recommendations
- Abstract (Lines 9-25):
- Lines 19-20: The abstract states that "Selected generated data were compared with and validated against real construction projects." However, it omits the key quantitative result of this validation. Including the main finding (e.g., "validation against 10 real-world projects showed a mean deviation of less than 13%") would significantly increase the abstract's impact and transparency.
- Introduction (Lines 26-141):
- Lines 35-38: The introduction is verbose and contains generic statements that add little value, such as "From this reflection, a simple conclusion arises: the contemporary professional must employ modern scientific methods, mathematical models, and AI models to keep pace with constantly advancing development." This section should be rewritten to be more direct, immediately stating the problem, the specific gap in existing models, and the contribution of this work in a concise manner.
- Lines 98-99: The introduction identifies the research gap as the "critical lack of high-quality training data." While a valid practical problem, this is not framed as a sharp, theoretical research gap. A top-tier paper must be driven by clear tension in the literature, leading to explicit research questions (RQs), which are currently absent from the manuscript.
- Methods and Models (Section 2 & Appendix A):
- Lines 197-313 (Pages 5-9): The paper presents a lengthy and detailed section on microeconomic theory (isoquants, isocosts, MRTS). However, the manuscript fails to create a clear link—a "golden thread"—showing how these theoretical concepts are implemented or used to constrain the queuing theory and Monte Carlo model described later. A new paragraph or subsection is needed to explicitly bridge this gap and explain how the theory is operationalized in the simulation.
- Lines 376-377: The paper states the decision to apply "Queueing Theory and Monte Carlo." However, it fails to rigorously justify this choice over simpler or alternative simulation approaches (e.g., discrete-event simulation frameworks). A comparative discussion illustrating the strategic advantage of this specific combination is required.
- Lines 557-618 (Appendix A): The appendix, which serves as a worked example of the methodology, contains numerous unjustified assumptions that undermine the entire model's credibility:
- The calculation begins with an extension coefficient Kn = 1.25 (Line 562) without any source or justification.
- Table A1 presents key input parameters ("Average travel time," "Probability of failure," etc.) as fixed values without citing any literature, manufacturer data, or empirical study.
- In lines 591-595, the model introduces stochasticity with specific numerical adjustments (TN = 6 + 0.058, Ty = 42 + 0.052) that appear without any explanation. The origin of these critical values is missing.
- The appendix concludes that "11 dumpers" are required (line 609), but the exact decision criterion used to draw this conclusion from the data in Table A3 is not explicitly stated.
- Generation and Verification of Data (Sections 3 & 4):
- Lines 426-427 (Figure 7): The workflow shown in Figure 7 is too generic. The descriptions ("Input system parameters," "Calculate variant parameters") need to be explained in much greater detail. The manuscript must be self-contained in explaining the model's logic, perhaps with pseudo-code or a more detailed walkthrough in Section 3.
- Lines 472-473 (Figure 8): This figure is difficult to interpret. The y-axis is labeled "Links," but it is unclear what this value represents, and its relationship to "Queue" is not defined. The timeline on the x-axis lacks units. The caption and labels need to be changed so that all of the axes and data series are clear.
- Line 615 (Figure A1): The appendix presents a clear visualization of diminishing returns, a key economic finding of the simulation. However, this important result and its practical implications (e.g., the high marginal cost of adding dumpers beyond the optimum) are never mentioned or analyzed in the main Results or Discussion sections, representing a missed opportunity to connect the model's output to practical insights.
- Lines 494-501 (Validation): The validation described in Section 4 is statistically weak. The comparison between synthetic and real data is presented as an aggregate "Deviation" percentage (Table 4). This method is insufficient to validate a stochastic model. Furthermore, the claim that an overall deviation below 13% (Line 500) constitutes a successful validation is an arbitrary and unjustified threshold. This section must be reworked to include appropriate statistical hypothesis testing (e.g., Kolmogorov-Smirnov test) to compare the data distributions.
- Conclusions and Discussion (Section 5):
- Lines 515-530: The limitations list (starting line 515) and the "Further research" list (starting line 522) are highly repetitive. These sections should be combined and restructured so that each identified limitation is directly and logically linked to a specific future research direction.
- Lines 538-542 (Author Contributions): The extensive list of author contributions for a single-author paper is unconventional and should be removed or formatted according to the journal's standard guidelines.
Sincerely.
Author Response
Dear Reviewer,
Thank you for giving me the opportunity to submit a revised version of my manuscript titled “Synthetic Data Generation for AI-Driven Optimization of Construction Machinery Assemblies” to Buildings.
I appreciate the time and effort that you have dedicated to providing your valuable feedback on my manuscript, and I am grateful for your insightful comments. I have incorporated changes that reflect most of your suggestions.
Here is a point-by-point response to the reviewers’ comments and concerns.
Comment 1:
This manuscript explores the development of a methodology and model for generating synthetic data to train AI systems aimed at optimizing construction machinery assemblies—an area where data scarcity poses a significant challenge for the construction industry. The study presents a commendable understanding of an important industrial problem and outlines a promising research direction. However, to achieve the level of scientific rigor, clarity, and impact expected for publication, the manuscript requires revisions. Below, I provide my detailed comments and recommendations.
The manuscript addresses the important and timely problem of data scarcity for training AI models in construction management. The proposed solution—a synthetic data generator based on queuing theory and Monte Carlo simulation—is a logical and necessary direction for the field. However, the manuscript in its current form suffers from three critical weaknesses.
Firstly, its strategic contribution is poorly framed; it reads more like a technical report on a specific software tool than a research article making a significant, generalizable contribution to knowledge.
Response 1:
I fully understand and agree with this comment. The manuscript has been substantially revised, with all changes highlighted in yellow in the updated version. The Introduction, Methodology, Verification, and Discussion sections have been thoroughly rewritten to ensure greater clarity and readability. For better transparency of the applied algorithm, a pseudo-code of the synthetic data generator has been added as an additional appendix. The scientific procedures, including the methodology, modeling, validation, and verification, have been refined to be not only easily understandable and consistent but also replicable across other domains and problem areas. Please refer to the revised version of the manuscript.
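To indicate what a generator built on queuing theory and Monte Carlo simulation can look like, a minimal sketch follows. It is not the pseudo-code from the manuscript's appendix: the single-loader closed cycle, the exponential time distributions, the function names, and all numerical values (shift length, mean loading time, mean haul-and-return time) are assumptions chosen purely for illustration.

```python
import heapq
import random


def simulate_shift(n_dumpers, shift_min=480.0, load_mean=6.0, haul_mean=42.0, rng=None):
    """Count the loads completed in one shift by one loader serving n_dumpers.

    All times are in minutes; every default value is a hypothetical placeholder.
    """
    if n_dumpers < 1:
        raise ValueError("at least one dumper is required")
    rng = rng or random.Random()
    # Times at which each dumper next reaches the loader (all waiting at t = 0).
    arrivals = [0.0] * n_dumpers
    heapq.heapify(arrivals)
    loader_free_at = 0.0
    loads = 0
    while True:
        arrival = heapq.heappop(arrivals)            # earliest arrival is served first
        start = max(arrival, loader_free_at)         # wait if the loader is busy
        finish = start + rng.expovariate(1.0 / load_mean)
        if finish > shift_min:                       # load would end after the shift
            break
        loader_free_at = finish
        loads += 1
        # The dumper hauls, dumps, returns, and rejoins the queue.
        heapq.heappush(arrivals, finish + rng.expovariate(1.0 / haul_mean))
    return loads


def mean_loads(n_dumpers, replications=200, seed=1):
    """Monte Carlo estimate of the expected number of loads per shift."""
    rng = random.Random(seed)
    total = sum(simulate_shift(n_dumpers, rng=rng) for _ in range(replications))
    return total / replications


for n in range(1, 13):
    print(n, round(mean_loads(n), 1))
```

Under these assumptions the output per shift grows almost linearly for small fleets and flattens once the loader becomes the bottleneck, which is the diminishing-returns pattern the reviewer refers to in connection with Figure A1. Each simulated shift can be stored as one synthetic record.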
Comment 2:
Secondly, the methodology lacks the requisite scientific rigor, with key parameters and assumptions left unjustified and the validation process being statistically weak.
Response 2:
I sincerely apologize for having provided only a summary of the verification procedure in the previous submission. The validation of the synthetic data generator was conducted across ten large-scale projects, each comprising between three and seven independent construction tasks. For each task, real-world data were independently collected and evaluated. In the revised manuscript, the verification section has been expanded to include additional indicators and datasets. Validation of the synthetic data involved statistical verification through correlation analysis (R², Pearson correlation) and comparison of key statistical measures, particularly the Root Mean Square Error (RMSE), applied to both the real and generated datasets. Furthermore, an F-test was performed for each category, including total cost, completion time, CO₂ emissions, fuel consumption, and number of failures. To examine whether the real and synthetic datasets followed a normal distribution, the Kolmogorov–Smirnov test was applied to each category (lines 586-615). Each construction task was treated as an independent data record, capturing task volumes, parameter specifications, and the actual deployment of machine assemblies. Synthetic data were subsequently generated using the developed model and compared against the corresponding real-world datasets. In total, 49 comparisons were conducted, and the results were aggregated and summarized by project into ten consolidated records. The revised description of the verification process can be found in the manuscript on lines 636–642.
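Because the reviewer asks for a test that compares the data distributions, a minimal sketch of the two-sample form of the Kolmogorov–Smirnov statistic is given here. It is an illustration under stated assumptions, not the procedure used in the manuscript: the function names and the sample values are hypothetical, and the critical value is the usual large-sample approximation, which is only rough for small samples.

```python
import math


def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic: the largest gap between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both empirical CDFs past every observation equal to x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_critical_value(n, m, alpha=0.05):
    """Asymptotic critical value for the two-sample K-S statistic."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c_alpha * math.sqrt((n + m) / (n * m))


# Hypothetical completion times in days (placeholder values only).
real_days = [31.0, 28.0, 35.0, 40.0, 33.0, 29.0, 37.0, 30.0]
synthetic_days = [30.0, 27.0, 36.0, 42.0, 34.0, 31.0, 38.0, 29.0]

d = ks_statistic(real_days, synthetic_days)
limit = ks_critical_value(len(real_days), len(synthetic_days))
print("D =", round(d, 3), "| reject equality of distributions:", d > limit)
```

With small samples such as the ten consolidated project records, exact tables or a library routine (for example `scipy.stats.ks_2samp`) would be preferable to the asymptotic threshold used in this sketch.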
Comment 3:
Finally, the manuscript's narrative is disjointed, with a significant logical gap between the theoretical economic principles presented and their practical implementation in the simulation model, which hinders readability and undermines the paper's core argument.
Response 3:
Thank you very much for this valuable comment. In the revised version of the manuscript, a logical bridge between the theoretical economic principles presented and their practical implementation in the simulation of synthetic data has been added (lines 362–372). To enhance understanding of the algorithm’s functionality, Appendix B (lines 807–921) now includes the pseudo-code, which provides a detailed description, including the integration of economic model parameters into the simulation process.
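For orientation, the textbook relations on which such a bridge usually rests can be written as follows. The notation is an assumption made for this note and is not taken from the manuscript: x_1 and x_2 denote the quantities of two substitutable inputs (for example, two machine types in an assembly), p_1 and p_2 their unit costs, and f the production function.

```latex
% Isoquant: all input combinations that deliver the same output Q_0
f(x_1, x_2) = Q_0
% Isocost: all input combinations with the same total cost C
C = p_1 x_1 + p_2 x_2
% Marginal rate of technical substitution along an isoquant
\mathrm{MRTS}_{12}
  = -\left.\frac{\mathrm{d}x_2}{\mathrm{d}x_1}\right|_{Q = Q_0}
  = \frac{\partial f / \partial x_1}{\partial f / \partial x_2}
% Cost-minimising combination (interior solution): isoquant tangent to isocost
\mathrm{MRTS}_{12} = \frac{p_1}{p_2}
```

In a simulation setting, f is not known in closed form; each simulated variant supplies one point (x_1, x_2, Q), so the tangency condition can only be checked approximately on the generated data.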
Comment 4:
Section-Specific Comments and Recommendations
- Abstract (Lines 9-25):
- Lines 19-20: The abstract states that "Selected generated data were compared with and validated against real construction projects." However, it omits the key quantitative result of this validation. Including the main finding (e.g., "validation against 10 real-world projects showed a mean deviation of less than 13%") would significantly increase the abstract's impact and transparency.
Response 4:
I fully agree with the suggestion to include the quantitative validation results in the abstract. This enhancement is expected to substantially increase the manuscript’s impact and improve its clarity for the readers (lines 010–031).
Comment 5:
- Introduction (Lines 26-141):
- Lines 35-38: The introduction is verbose and contains generic statements that add little value, such as "From this reflection, a simple conclusion arises: the contemporary professional must employ modern scientific methods, mathematical models, and AI models to keep pace with constantly advancing development." This section should be rewritten to be more direct, immediately stating the problem, the specific gap in existing models, and the contribution of this work in a concise manner.
Response 5:
I agree with this comment. The Introduction section has been reformulated and visually as well as conceptually divided into several subsections. Some parts have been shortened, while others were expanded with relevant references. Additional subsections describing the Research Gap and the Novelty of the contribution have also been added (lines 134-139, 160–169).
Comment 6:
- Lines 98-99: The introduction identifies the research gap as the "critical lack of high-quality training data." While a valid practical problem, this is not framed as a sharp, theoretical research gap. A top-tier paper must be driven by clear tension in the literature, leading to explicit research questions (RQs), which are currently absent from the manuscript.
Response 6:
The Introduction section was extended with additional subsections describing the Research Gap and the Novelty of the contribution, which clearly define the aims and objectives of this study (lines 134-139, 160–169).
Comment 7:
- Methods and Models (Section 2 & Appendix A):
- Lines 197-313 (Pages 5-9): The paper presents a lengthy and detailed section on microeconomic theory (isoquants, isocosts, MRTS). However, the manuscript fails to create a clear link—a "golden thread"—showing how these theoretical concepts are implemented or used to constrain the queuing theory and Monte Carlo model described later. A new paragraph or subsection is needed to explicitly bridge this gap and explain how the theory is operationalized in the simulation.
Response 7:
In the revised version of the manuscript, a logical bridge between the theoretical economic principles presented and their practical implementation in the simulation of synthetic data has been added (lines 362-372). To enhance understanding of the algorithm’s functionality, Appendix B (lines 807–921) now includes the pseudo-code, which provides a detailed description, including the integration of economic model parameters into the simulation process.
Comment 8:
- Lines 376-377: The paper states the decision to apply "Queueing Theory and Monte Carlo." However, it fails to rigorously justify this choice over simpler or alternative simulation approaches (e.g., discrete-event simulation frameworks). A comparative discussion illustrating the strategic advantage of this specific combination is required.
Response 8:
I fully understand this comment. In the Introduction section, a brief overview of existing methods for simulating discrete and continuous variables in construction was added. The foundation of this research is based on my previous doctoral dissertation, which already contained a detailed review of mathematical approaches. To simplify the presented article and improve the clarity of the results and simulation procedures, I decided to focus solely on the Monte Carlo Method and Queuing Theory. I agree that, for greater credibility of the results, a comparison of different modeling approaches would be beneficial. However, such an analysis would require an almost identical and extensive procedure for each method, which would significantly exceed the scope of a single paper. Therefore, the Discussion section now includes a note on potential future research directions, specifically the examination and comparison of other stochastic modeling techniques for discrete processes (e.g., Markov Chain).
Comment 9:
- Lines 557-618 (Appendix A): The appendix, which serves as a worked example of the methodology, contains numerous unjustified assumptions that undermine the entire model's credibility:
Response 9:
Appendix A has been revised. The missing variable descriptions and references have been added, and all changes are highlighted in yellow.
Comment 10:
- The calculation begins with an extension coefficient Kn = 1.25 (Line 562) without any source or justification.
Response 10:
The relevant reference has been added to the revised manuscript (line 735).
Comment 11:
- Table A1 presents key input parameters ("Average travel time," "Probability of failure," etc.) as fixed values without citing any literature, manufacturer data, or empirical study.
Response 11:
The justification of key input parameters in Table A1 has been added (lines 738-740).
Comment 12:
- In lines 591-595, the model introduces stochasticity with specific numerical adjustments (TN = 6 + 0.058, Ty = 42 + 0.052) that appear without any explanation. The origin of these critical values is missing.
Response 12:
The justification of critical values has been added (lines 767–771).
Comment 13:
- The appendix concludes that "11 dumpers" are required (line 609), but the exact decision criterion used to draw this conclusion from the data in Table A3 is not explicitly stated.
Response 13:
A clarification regarding the final decision has been added to the manuscript (lines 768–800).
Comment 14:
- Generation and Verification of Data (Sections 3 & 4):
- Lines 426-427 (Figure 7): The workflow shown in Figure 7 is too generic. The descriptions ("Input system parameters," "Calculate variant parameters") need to be explained in much greater detail. The manuscript must be self-contained in explaining the model's logic, perhaps with pseudo-code or a more detailed walkthrough in Section 3.
Response 14:
In the revised version of the manuscript, Figure 7 has been extended with additional blocks (line 499). To enhance the understanding of the algorithm’s functionality, Appendix B (lines 808-922) now includes the pseudo-code, which provides a detailed description, including the integration of economic model parameters into the simulation process.
Comment 15:
- Lines 472-473 (Figure 8): This figure is difficult to interpret. The y-axis is labeled "Links," but it is unclear what this value represents, and its relationship to "Queue" is not defined. The timeline on the x-axis lacks units. The caption and labels need to be changed so that all of the axes and data series are clear.
Response 15:
I agree with the suggestion to improve the readability of Figure 8. An additional description has been added (lines 549-559), and the x-axis and y-axis units and labels have been reformulated.
Comment 16:
- Line 615 (Figure A1): The appendix presents a clear visualization of diminishing returns, a key economic finding of the simulation. However, this important result and its practical implications (e.g., the high marginal cost of adding dumpers beyond the optimum) are never mentioned or analyzed in the main Results or Discussion sections, representing a missed opportunity to connect the model's output to practical insights.
Response 16:
I agree with this comment. In the revised version of the manuscript, I have added a logical bridge between the theoretical economic principles presented and their practical implementation in the simulation of synthetic data (lines 362–372). To facilitate understanding of the algorithm’s functionality, Appendix B (lines 808–922) now includes the pseudo-code, which provides a detailed description of the algorithm, including the integration of the economic model parameters into the simulation process. The Discussion section has been expanded to outline future directions for the research (line 663).
Comment 17:
- Lines 494-501 (Validation): The validation described in Section 4 is statistically weak. The comparison between synthetic and real data is presented as an aggregate "Deviation" percentage (Table 4). This method is insufficient to validate a stochastic model. Furthermore, the claim that an overall deviation below 13% (Line 500) constitutes a successful validation is an arbitrary and unjustified threshold. This section must be reworked to include appropriate statistical hypothesis testing (e.g., Kolmogorov-Smirnov test) to compare the data distributions.
Response 17:
I sincerely apologize for having provided only a summary of the verification procedure in the previous submission. The validation of the synthetic data generator was conducted across ten large-scale projects, each comprising between three and seven independent construction tasks. For each task, real-world data were independently collected and evaluated. In the revised manuscript, the verification section has been expanded to include additional indicators and datasets. Validation of the synthetic data involved statistical verification through correlation analysis (R², Pearson correlation) and comparison of key statistical measures, particularly the Root Mean Square Error (RMSE), applied to both the real and generated datasets. Furthermore, an F-test was performed for each category, including total cost, completion time, CO₂ emissions, fuel consumption, and number of failures. To examine whether the real and synthetic datasets followed a normal distribution, the Kolmogorov–Smirnov test was applied to each category (lines 586-615). Each construction task was treated as an independent data record, capturing task volumes, parameter specifications, and the actual deployment of machine assemblies. Synthetic data were subsequently generated using the developed model and compared against the corresponding real-world datasets. In total, 49 comparisons were conducted, and the results were aggregated and summarized by project into ten consolidated records. The revised description of the verification process can be found in the manuscript on lines 636–642.
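The indicators named in this response can be illustrated with a minimal sketch. It is not the code used in the study: the function names (`rmse`, `pearson_r`, `ks_two_sample`) and the sample values are hypothetical, and the Kolmogorov–Smirnov test is shown in its two-sample form (comparing the real and synthetic distributions directly, as suggested in Comment 17), with an asymptotic critical value that is only approximate for small samples.

```python
import math


def rmse(real, synthetic):
    """Root mean square error between paired real and synthetic values."""
    if len(real) != len(synthetic) or not real:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((r - s) ** 2 for r, s in zip(real, synthetic))
    return math.sqrt(squared / len(real))


def pearson_r(x, y):
    """Pearson correlation coefficient of two paired samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ks_two_sample(a, b, alpha=0.05):
    """Two-sample Kolmogorov-Smirnov statistic D and its asymptotic critical value.

    D is the largest gap between the two empirical CDFs. The critical value
    uses the large-sample approximation c(alpha) * sqrt((n + m) / (n * m)),
    with c(alpha) = sqrt(-ln(alpha / 2) / 2).
    """
    n, m = len(a), len(b)
    d = 0.0
    for x in list(a) + list(b):
        cdf_a = sum(v <= x for v in a) / n
        cdf_b = sum(v <= x for v in b) / m
        d = max(d, abs(cdf_a - cdf_b))
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))
    return d, critical


if __name__ == "__main__":
    # Hypothetical placeholder values for one category (e.g. total cost);
    # they are NOT taken from the manuscript.
    real = [102.0, 98.5, 110.2, 95.0, 105.3]
    synthetic = [100.0, 101.0, 108.0, 97.5, 103.0]

    print("RMSE:", rmse(real, synthetic))
    r = pearson_r(real, synthetic)
    print("Pearson r:", r, "R^2:", r ** 2)
    d, critical = ks_two_sample(real, synthetic)
    print("K-S D:", d, "critical value at alpha=0.05:", critical)
```

In this sketch, the hypothesis that both samples come from the same distribution would be rejected at the chosen significance level when D exceeds the critical value; RMSE and Pearson correlation require paired records, whereas the K-S statistic does not.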
Comment 18:
- Conclusions and Discussion (Section 5):
- Lines 515-530: The limitations list (starting line 515) and the "Further research" list (starting line 522) are highly repetitive. These sections should be combined and restructured so that each identified limitation is directly and logically linked to a specific future research direction.
Response 18:
Thank you for your suggestion. I agree with this comment. The Discussion section has been completely revised and reorganized into several subsections to improve clarity and readability (lines 616-710).
Comment 19:
- Lines 538-542 (Author Contributions): The extensive list of author contributions for a single-author paper is unconventional and should be removed or formatted according to the journal's standard guidelines.
Response 19:
The Author Contributions section was revised to reflect a single (solo) author (line 712).
I look forward to hearing from you in due course regarding my submission and remain available to address any further questions or comments you may have.
Sincerely, Author of the manuscript.
Reviewer 4 Report
Comments and Suggestions for Authors
Dear author,
Thank you for the effort you have put into producing this paper.
Also, thank you for considering Buildings for the submission.
In general, the paper looks good, but some technical issues need to be considered in the next stage of review. Therefore, I would recommend this paper with minor revisions. The author is recommended to address the following suggestions to raise the quality of the paper:
1. Title
The title does not truly reflect the content of the whole manuscript. The author is recommended to choose a consistent and attractive title.
2. Abstract.
The abstract seems not well developed. For example, some awkward terms and sentences can be easily detected.
3. Keywords
The keywords are overused; please consider how this part can be developed.
4. Introduction
This section is lacking in related citations that can prove the statements you have provided.
Also, some content is unrelated or unnecessary to the paper.
State the aim and objective clearly and include the contribution and novelty in a single paragraph.
5. Methods and Models
This section is lacking in statistical and methodological citations that can prove the statements you have provided.
Figure 7 should be presented at the beginning of the section so the readers can be well guided.
6. Conclusions and Discussion
I believe there is a massive difference between conclusion and discussion, so the author is recommended to separate this section into two. This can help the manuscript to be more meaningful and more informative.
General suggestion.
The author is recommended to conduct proofreading and editing for the whole manuscript. For example, the author uses AI directly without clarifying what this is, even in the title, abstract, and the remaining sections, except for abbreviations.
Hope the above helps.
Comments on the Quality of English Language
Please refer to my suggestions to the author.
Author Response
Dear Reviewer,
Thank you for giving me the opportunity to submit a revised version of my manuscript titled “Synthetic Data Generation for AI-Driven Optimization of Construction Machinery Assemblies” to Buildings.
I appreciate the time and effort that you have dedicated to providing your valuable feedback on my manuscript. I am grateful for your insightful comments on the paper. I have incorporated changes to reflect most of the suggestions provided in your review.
Here is a point-by-point response to the reviewers’ comments and concerns.
Comment 1:
- Title
The title does not truly reflect the content of the whole manuscript. The author is recommended to choose a consistent and attractive title.
Response 1:
After thorough and careful consideration, it has been decided to make a minor modification to the title of the paper and remove the reference to the AI-driven model. I fully agree that the title should accurately reflect the main objectives and outcomes of the study. Although the research results will serve as input data for subsequent AI model training, in this paper AI represents more of the target direction of the research rather than the core of the model or methodology.
The revised title of the paper is: Synthetic Data Generation Methodology for Construction Machinery Assembly Optimization.
Comment 2:
- Abstract.
The abstract seems not well developed. For example, some awkward terms and sentences can be easily detected.
Response 2:
I fully agree with the suggestion to rephrase several awkward terms and sentences in the abstract. An edited version of the abstract can be found in the revised manuscript. This improvement is expected to substantially enhance the manuscript’s impact and improve its clarity for readers (lines 010-028).
Comment 3:
- Keywords
The keywords are overused; please consider how this part can be developed.
Response 3:
I understand and agree with this comment. The list of keywords has been refined to ensure that it fully corresponds to the key research points presented in the revised manuscript. The updated list can be found on lines 029–031.
Comment 4:
- Introduction
This section is lacking in related citations that can prove the statements you have provided.
Also, some content is unrelated or unnecessary to the paper.
State the aim and objective clearly and include the contribution and novelty in a single paragraph.
Response 4:
I agree with this comment. The Introduction section has been reformulated and visually as well as conceptually divided into several subsections. Some parts have been shortened, while others were expanded with relevant references. Additional subsections describing the Research Gap and the Novelty of the contribution have also been added (lines 134-139, 160-169). I have added the relevant references 1–10 (lines 034-070).
Comment 5:
- Methods and Models
This section is lacking in statistical and methodological citations that can prove the statements you have provided.
Response 5:
I sincerely apologize for having provided only a summary of the verification procedure in the previous submission. The validation of the synthetic data generator was conducted across ten large-scale projects, each comprising between three and seven independent construction tasks. For each task, real-world data were independently collected and evaluated. In the revised manuscript, the verification section has been expanded to include additional indicators and datasets. Validation of the synthetic data involved statistical verification through correlation analysis (R², Pearson correlation) and comparison of key statistical measures, particularly the Root Mean Square Error (RMSE), applied to both the real and generated datasets. Furthermore, an F-test was performed for each category, including total cost, completion time, CO₂ emissions, fuel consumption, and number of failures. To examine whether the real and synthetic datasets followed a normal distribution, the Kolmogorov–Smirnov test was applied to each category (lines 586-615). Each construction task was treated as an independent data record, capturing task volumes, parameter specifications, and the actual deployment of machine assemblies. Synthetic data were subsequently generated using the developed model and compared against the corresponding real-world datasets. In total, 49 comparisons were conducted, and the results were aggregated and summarized by project into ten consolidated records. The revised description of the verification process can be found in the manuscript on lines 636–642.
Comment 6:
Figure 7 should be presented at the beginning of the section so the readers can be well guided.
Response 6:
Figure 7 has been moved to the beginning of the section “Generation of Synthetic Data for Model Optimization” to improve clarity and readability (line 499).
Comment 7:
- Conclusions and Discussion
I believe there is a massive difference between conclusion and discussion, so the author is recommended to separate this section into two. This can help the manuscript to be more meaningful and more informative.
Response 7:
Thank you for your suggestion. I agree with this comment. The Discussion section has been completely revised and reorganized into several subsections to improve clarity and readability (lines 617-710).
Comment 8:
General suggestion.
The author is recommended to conduct proofreading and editing for the whole manuscript. For example, the author uses AI directly without clarifying what this is, even in the title, abstract, and the remaining sections, except for abbreviations.
Quality of English Language: The English could be improved to more clearly express the research.
Response 8:
I agree with the suggestion to improve the readability of the article and to refine the academic English. A professional language editing service has been used to carry out an expert revision of the manuscript’s English. The corresponding certificate is attached. The paper has undergone English language editing by MDPI. The text has been checked for correct use of grammar and common technical terms and edited to a level suitable for reporting research in a scholarly journal. MDPI uses experienced, native English-speaking editors.
I look forward to hearing from you in due course regarding my submission and remain available to address any further questions or comments you may have.
Sincerely, Author of the manuscript.
Author Response File:
Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
All my comments have been addressed in the revised version; hence, it is acceptable.
Author Response
Dear Reviewer,
Thank you very much for taking the time to review my manuscript.
Comment 1: All my comments have been addressed in the revised version, hence, it is acceptable.
Response 1: I appreciate the time and effort that you have dedicated to providing your valuable feedback on my manuscript.
Comment 2: The English could be improved to more clearly express the research.
Response 2: A professional language editing service has been used to carry out an expert revision of the manuscript’s English. The corresponding certificate is attached. The paper has undergone English language editing by MDPI. The text has been checked for correct use of grammar and common technical terms and edited to a level suitable for reporting research in a scholarly journal. MDPI uses experienced, native English-speaking editors.
Sincerely, Author of the manuscript.
Author Response File:
Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
Dear Author, thank you for responding to my specific comments. However, the revised manuscript still requires further improvement, as follows:
- Instead of writing the subtitle in a bold format - I suggest making it in a subsection. For example: 1. Introduction, then the subsection would be 1.1 General background and 1.2 Literature review. This also applies to others.
- The conclusion should be placed in a single section. Currently, it blends into the discussion part.
Author Response
Dear Reviewer,
Thank you very much for taking the time to review my manuscript. I am grateful for your insightful comments on the paper. I have incorporated changes to reflect most of the suggestions provided in your review.
Here is a point-by-point response to the reviewers’ comments and concerns.
Comment 1:
Dear Author, thank you for responding to my specific comments. However, the revised manuscript still requires further improvement, as follows:
- Instead of writing the subtitle in a bold format - I suggest making it in a subsection. For example: 1. Introduction, then the subsection would be 1.1 General background and 1.2 Literature review. This also applies to others.
Response 1:
I agree with the suggestion. After careful consideration, a minor modification has been made to the structure of the paper (lines 33–184).
Comment 2:
- The conclusion should be placed in a single section. Currently, it blends into the discussion part.
Response 2:
I agree with the suggestion. After careful consideration, a minor modification has been made to the structure of the paper (lines 620–717).
Sincerely, Author of the manuscript.
Reviewer 3 Report
Comments and Suggestions for Authors
Thank you for your revision. Congratulations on addressing most of my previous concerns. However, there is still a minor issue to address.
For line 735: Task volume: 6,000 m³; excavation class: 3; extending coefficient, Kn = 1.25 [47]. I do not have access to reference number 47.
Author Response
Dear Reviewer,
Thank you very much for taking the time to review my manuscript.
Comment:
Thank you for your revision. Congratulations on addressing most of my previous concerns. However, there is still a minor issue to address.
For line 735: Task volume: 6,000 m³; excavation class: 3; extending coefficient, Kn = 1.25 [47]. I do not have access to reference number 47.
Response: After careful consideration, I have decided to expand the references that clearly confirm the origin of the coefficient Kn = 1.25.
In the general theory for adjusting the processable volume of soil, the basic general formula [47, formula 6.4, page 122] is used; the scanned pages of the source are provided below. The specific coefficient refers to a certain type of soil and can be found in the additionally included source — the Czech Technical Standard 73 3050, on page 23. The scanned pages of the standard are also provided below.
Table 5 of the standard [53, page 23], which is used to determine the coefficient, has been translated into English and gives a value of 125%, i.e., 1.25. The new reference will be included in the revised manuscript on lines 743 and 1035.
For more details, please see the attached document containing scanned pages from the references.
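As a brief illustration of the role of the coefficient, and assuming that Kn simply multiplies the in-situ task volume to give the loosened volume to be handled (formula 6.4 of [47] is not reproduced here, and the symbol V_loose is introduced only for this example), the task volume of 6,000 m³ quoted for line 735 would be adjusted as follows:

```latex
V_{\mathrm{loose}} = K_n \cdot V = 1.25 \times 6000\ \mathrm{m^3} = 7500\ \mathrm{m^3}
```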
Sincerely, Author of the manuscript.
Author Response File:
Author Response.docx

