Analyzing Time and Cost Deviations in Educational Infrastructure Projects: A Data-Driven Approach Using Colombia’s Public Data Platform
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
I suggest that the authors address the following comments. They are minor in nature but must be addressed before final acceptance.
- Focus is only on educational infrastructure; the conclusion extrapolates too broadly to “public infrastructure” without sufficient cross-sector validation.
- The Random Forest model is tuned but lacks a train/test split, cross-validation, or performance metrics (R², RMSE, MAE). Current results risk overfitting.
- Phrases like “optimism blindness” are used without a rigorous definition.
- Report how many data points were removed as outliers and test robustness with/without them.
- Expand discussion of how Colombian procurement rules shape results.
Author Response
Please see the attachment.
Author Response File:
Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
Please revise and address the following in your paper:
Could the analysis be extended beyond finalized projects to include ongoing projects, thereby capturing early warning signals of cost and time deviations?
How might incorporating qualitative variables such as stakeholder behavior, political context, or site-specific challenges improve the explanatory power of the models?
Did the authors assess whether regional differences (urban vs. rural) or geographic constraints influence deviation patterns?
Would integrating geospatial data (e.g., terrain type, accessibility) provide additional insights into time delays?
How sensitive are the results to the choice of statistical techniques—would advanced machine learning models (e.g., random forests, gradient boosting) uncover non-linear relationships missed by traditional regression?
To improve the depth of the literature review, include the following citations where you discuss ML:
https://ieeexplore.ieee.org/document/7400560
https://doi.org/10.1016/B978-0-12-823432-7.00007-0
Did the authors consider interaction effects among predictors such as project type × contract value or bidders × award growth?
Could time-series or survival analysis approaches capture dynamics of project delays more effectively than cross-sectional methods?
Were robustness checks performed to account for potential data quality issues (e.g., missing values, reporting inconsistencies) in the open government datasets?
How reproducible are the findings across other sectors of infrastructure (e.g., health, transport)—does the educational sector exhibit unique deviation patterns?
Would benchmarking Colombian projects against international datasets provide comparative insights into systemic vs. country-specific challenges?
Could the study quantify the economic or social impact of delays and overruns beyond statistical correlations?
How might incorporating procurement process transparency indices improve understanding of bidder-related effects?
Did the authors explore whether project governance mechanisms (e.g., contract monitoring, auditing) mitigate deviations?
Could clustering analysis help identify typologies of projects with similar risk profiles for delays and overruns?
How scalable is the proposed methodology for real-time monitoring dashboards that inform policymakers during project execution rather than after completion?
Author Response
Please see the attachment
Author Response File:
Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
Dear Authors,
This article addresses a relevant topic and is devoted to time and cost deviations in educational infrastructure. However, I have a number of comments that could help improve the manuscript.
Main note: please use the MDPI template to format the text, the in-text references, and the reference list.
- Abstract: this section is rather informative; nevertheless, the main object of the study should be added.
- Introduction: this section seems quite rational, but a description of the gap in existing studies of deviations in educational infrastructure that this research aims to fill is missing. I recommend stating the research problem and hypothesis more clearly, for example, after line 213.
- Materials and Methods: A clearer final overview of the methods used in the exploratory, bivariate, and machine learning analyses should be given, since the article is devoted to an overview of the methods. I suggest combining them into a table that shows the strengths and weaknesses of each method. Also, equations must be formatted as in the MDPI template.
- Results: Table 1, “Independent variables”, is actually part of the “Materials and Methods” section. Also, Figures 1-6 and Tables 1-5 should be analyzed in detail to confirm the authors' hypothesis.
- Discussion: The Discussion section should contain the main breakthrough results, a description of the obstacles and limitations of the study, and future ways to overcome them.
Table 6 is part of the “Results” section.
- Conclusions: no notes for this section.
Good luck!
Comments on the Quality of English Language
Dear Authors,
This manuscript contains a number of grammatical and syntactic errors that should be corrected before publication.
Regards,
Author Response
Please see the attachment
Author Response File:
Author Response.pdf
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
The paper has been improved and I recommend its acceptance.
Reviewer 3 Report
Comments and Suggestions for Authors
Good luck!
