Article

Stochastic Cost Estimation in Transportation Infrastructure Projects Using Monte Carlo Simulation and Correlated Risk Variables

1 Escuela de Ingeniería Civil, Universidad Nacional Mayor de San Marcos, Lima 15081, Peru
2 Facultad de Ingeniería Civil, Universidad Nacional Federico Villarreal, Lima 15088, Peru
3 Escuela de Ingeniería Civil, Universidad Nacional Santiago Antúnez de Mayolo, Huaraz 020105, Peru
* Author to whom correspondence should be addressed.
Future Transp. 2025, 5(4), 176; https://doi.org/10.3390/futuretransp5040176 (registering DOI)
Submission received: 15 September 2025 / Revised: 28 October 2025 / Accepted: 12 November 2025 / Published: 20 November 2025

Abstract

Peru faces critical challenges in the development and maintenance of its national road infrastructure, comprising over 32,000 km, of which only 26% are classified as being in good condition. This infrastructural deficit significantly elevates logistics costs and undermines national competitiveness, particularly in key sectors such as agriculture and mining. In this context, improving the accuracy and reliability of cost estimation in road infrastructure projects is imperative to optimize resource allocation and mitigate the risk of cost overruns. This study proposes a stochastic cost estimation framework that integrates Monte Carlo simulation with correlation matrices, enabling the modeling of uncertainty and the complex interdependencies among critical cost drivers. The methodology was applied to the Oyón–Ambo highway in Peru. Historical input cost databases were analyzed to define probabilistic distributions, and correlation coefficients were employed to represent the dependencies between variables such as material prices, labor productivity, and equipment efficiency. The stochastic model produced probabilistic cost forecasts with associated confidence intervals and quantified risk exposure. The findings demonstrate that the proposed integrated approach significantly enhances the precision and robustness of cost estimates, providing project managers and decision-makers with a rigorous, data-driven tool for risk-informed budgeting and strategic financial planning in complex infrastructure projects.

1. Introduction

Road infrastructure is a highly valued aspect of current policies aimed at achieving economic development and competitiveness, as it facilitates market integration, reduces logistics costs, and shortens transportation times [1,2,3]. Recent IDB estimates indicate that meeting SDG-consistent service levels by 2030 would require annual infrastructure investment of about 3.12% of regional GDP across water and sanitation, energy, transport, and telecommunications (with roughly 1.37% of GDP in transport), levels that remain above current investment trends [4]. As a consequence, road quality in the region ranks among the lowest globally, with a direct impact on competitiveness and production costs. In Latin America, logistics costs account for nearly 20% of total production costs, double the global average [5].
One of the critical factors exacerbating this situation is the frequent occurrence of cost overruns and delays in infrastructure projects. Globally, infrastructure investment projects show average cost overruns of 28%, but in Latin America this figure rises to 48% [6]. Moreover, it has been documented that around 75% of projects face budget increases and 65% experience schedule delays. Unlike other regions where these indicators tend to decrease, in Latin America they have continued to rise over time. This situation not only increases the fiscal burden but also generates uncertainty in execution and undermines the efficiency of public spending: the IDB estimates that between 20% and 53% of the budget allocated to public investment remains unexecuted due to administrative and technical delays [7].
In Peru, this problem takes on special significance. The country ranks 110th out of 141 in terms of road infrastructure quality according to the World Economic Forum [2], and faces paradigmatic cases such as the Nueva Carretera Central, whose budget more than doubled, from S/11 billion to over S/24 billion, after alignment adjustments during the study phase [8]. Adding to this is a troubling panorama of paralyzed works: by the end of 2023, the Comptroller General of the Republic reported 2298 halted projects with S/26.992 billion already invested, and an additional S/13.772 billion would be required for their completion [9]. The transport sector leads this group with 628 stalled works, many of them roads, mainly due to contractual breaches (23.5%) and lack of financing (22.4%) [10].
This combination of insufficient investment, poor construction quality, and frequent budget deviations seriously limits a country’s ability to effectively execute its infrastructure programs. The economic consequences are significant: it is estimated that if these gaps persist over the next decade, Latin America’s GDP would be 15 percentage points lower, with an accumulated loss close to USD 900 billion for the region.
In this context, the need to substantially improve cost estimation methodologies in road projects becomes evident, incorporating approaches that more realistically reflect the inherent uncertainty of this type of work. Traditional techniques, based on deterministic values, often underestimate the variability of inputs and the interdependencies among key factors such as material prices, labor productivity, and equipment efficiency. Hence, recent studies have proposed the use of stochastic tools such as Monte Carlo simulation to capture this complexity [11,12]. In a previous study [13], it was argued that Monte Carlo simulation is a suitable tool in contexts like Latin America, where the use of stochastic methods is still limited. Following this line of improvement, the present study proposes a stochastic cost estimation framework that integrates Monte Carlo simulation with correlation matrices, with the objective of explicitly modeling the relationships among interdependent variables using historical data or expert judgment.
Several international studies have demonstrated the usefulness of Monte Carlo simulation in estimating costs and schedules in construction projects, integrating correlation matrices to model interdependencies among variables. Research by Firouzi et al. [14], Moselhi and Roghabadi [15] and Sobieraj and Metelski [16] has shown that ignoring correlation among items can lead to underestimating project uncertainty, affecting the accuracy of contingency and schedule estimates. However, these approaches have been developed mainly in European, North American, or Asian contexts, where there is greater availability of data and specialized tools. In contrast, in Latin America—and particularly in Peru—the application of such methodologies is still incipient, requiring the development of practical, adaptable, and rapidly implementable tools that allow for a robust initial approach to quantitative risk analysis. Therefore, this research proposes a model tailored to Latin American conditions, with direct application to a case study in a Peruvian road project.
Unlike previous studies that apply Monte Carlo simulation under the assumption of independence among variables, this research explicitly incorporates correlations between critical cost factors such as prices, productivity, and operational efficiency [17,18,19,20]. This methodological differentiation is key in real-world contexts where these variables are strongly interrelated, yet the literature often treats them as independent. The proposed model, in addition to presenting a correlation structure adaptable through historical data and expert judgment, also enables its practical implementation in data-limited environments such as those prevalent in Latin America. With this, the study seeks to close a concrete gap in the quantitative risk analysis of infrastructure projects in the region.

2. Methodology

2.1. Methodological Framework of the Research

The methodology of this research is structured in three main phases, as shown in Figure 1. Phase 1 comprises the identification of the problem, the literature review, and the selection of the case study. In Phase 2, the stochastic model is developed through data collection, goodness-of-fit testing, the selection of probability distributions, and the construction of the correlation matrix. Finally, Phase 3 involves a simulation using the Monte Carlo method and the analysis of the resulting indicators. This comprehensive approach allows for a realistic representation of uncertainty and interdependencies among critical variables in road infrastructure projects.

2.2. Case Study Description

The model was applied to the improvement of a 49 km section of the Oyón–Ambo highway, located in the central-northern region of Peru, which connects Lima, Pasco, and Huánuco. This case was selected due to its technical and logistical complexity, as well as the availability of reliable historical data, making it representative for validating a stochastic cost estimation methodology under real conditions of uncertainty. This project has been the subject of previous research using qualitative approaches [21,22] and a quantitative approach based on Monte Carlo simulation [13]. However, unlike earlier studies that relied solely on Monte Carlo simulation, this research introduces correlation matrices to capture interdependencies among key variables, representing a significant methodological advancement in contexts such as Latin America.

2.3. Data Collection and Historical Database

The model was based on the analysis of more than 500 unit records related to input prices, labor productivity, and equipment efficiency from road projects executed between 2019 and 2023 in the central-northern region of the country. The information sources included technical files from the Ministry of Transport and Communications (MTC), regional contractors' databases, the Peruvian Unit Price Bank (BPU), and records from the National Public Investment System (SNIP). The data were cleaned to remove incomplete records and outliers. Subsequently, an exploratory analysis was conducted using statistical tools to identify the probability distribution of each variable. The Kolmogorov–Smirnov (Equation (1)) and Anderson–Darling goodness-of-fit tests were applied to validate the adequacy of the selected functions.
$$D = \sup_x \left| F_n(x) - F(x) \right|$$
$$A^2 = -n - \frac{1}{n}\sum_{i=1}^{n} (2i-1)\left[\ln F(X_i) + \ln\left(1 - F(X_{n+1-i})\right)\right]$$
The goodness-of-fit tests, shown in Table 1 and Figure 2, indicated that the key variables of the model (Portland cement for concrete, fuel, fine aggregate, and coarse aggregate) are consistent with a normal distribution: neither the Kolmogorov–Smirnov nor the Anderson–Darling test rejected normality.
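For the Kolmogorov–Smirnov test, the statistic D is the largest vertical gap between the empirical cumulative distribution function F_n(x) and the theoretical one F(x). For the Anderson–Darling test, A² is a distance between the two that gives extra weight to the tails, with the observations X_i sorted in ascending order. Normality is rejected when the p-value falls below the chosen significance level (α = 5% in this study) or, equivalently, when the test statistic exceeds its critical value.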
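The sketch below shows how these two tests can be run outside @RISK, using SciPy's `stats.kstest` and `stats.anderson` on a single price series. It assumes the normal parameters are estimated from the same sample. That makes the plain K–S p-value conservative, which the Lilliefors variant corrects for. The helper `normality_checks` and the synthetic "cement" series are illustrative only and are not the study's data.

```python
import numpy as np
from scipy import stats


def normality_checks(x, alpha=0.05):
    """Kolmogorov-Smirnov and Anderson-Darling checks of a normal fit to x."""
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std(ddof=1)

    # K-S: largest vertical gap between the empirical CDF and the fitted normal CDF.
    # Because mu and sigma come from the same sample, this p-value is conservative.
    ks = stats.kstest(x, "norm", args=(mu, sigma))

    # A-D: tail-weighted distance; SciPy reports critical values at fixed levels (in %).
    ad = stats.anderson(x, dist="norm")
    level = int(np.argmin(np.abs(ad.significance_level - 100 * alpha)))
    ad_crit = ad.critical_values[level]

    return {
        "ks_stat": ks.statistic,
        "ks_pvalue": ks.pvalue,
        "ks_reject": ks.pvalue < alpha,        # reject normality if p < alpha
        "ad_stat": ad.statistic,
        "ad_crit": ad_crit,
        "ad_reject": ad.statistic > ad_crit,   # reject if statistic > critical value
    }


# Synthetic example: a roughly normal price series (hypothetical values).
rng = np.random.default_rng(0)
cement = rng.normal(loc=30.0, scale=1.5, size=200)
print(normality_checks(cement))
```

With real data, the input would be the cleaned (and, per the next paragraph, deflated) observations for each variable.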
Given the 2019–2023 time span, marked by COVID-19 and inflation shocks, we interpret unit-price observations as regime-conditional rather than strictly stationary. To preserve transferability, budgeting inputs should be deflated to constant currency (using official construction/producer price indices) and de-trended before distribution fitting; correlations (ρ) should be estimated on normalized series or residuals. In practical applications, we recommend calibrating the chosen percentile (e.g., P90 or P95) to contemporaneous indices at the time of funding decisions, treating the percentiles reported here as conditional on the 2019–2023 regime. This regime-aware reading mitigates spurious inferences from temporary shocks without changing the scope of this study.
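The sketch below is one way to express this recommendation in code. It assumes a monthly price series, an official price index relative to a base of 100, and a linear trend in log prices; all three are assumptions made for illustration. Prices are deflated to constant currency and de-trended, and the correlation is then estimated on the residuals rather than on nominal levels. `normalize_prices` is a hypothetical helper, and the index and prices are synthetic.

```python
import numpy as np


def normalize_prices(nominal, price_index, base_index=100.0):
    """Deflate a nominal price series and remove a linear trend in log space."""
    nominal = np.asarray(nominal, dtype=float)
    price_index = np.asarray(price_index, dtype=float)
    real = nominal * base_index / price_index          # constant-currency prices
    t = np.arange(real.size)
    log_real = np.log(real)
    slope, intercept = np.polyfit(t, log_real, deg=1)  # fitted log-linear trend
    return log_real - (intercept + slope * t)          # de-trended residuals


# Synthetic 60-month example: both prices follow the same inflation index,
# but their month-to-month shocks are independent.
rng = np.random.default_rng(42)
months = np.arange(60)
index = 100.0 * 1.005 ** months                        # hypothetical price index
cement = 30.0 * index / 100 * np.exp(0.002 * months + rng.normal(0, 0.02, 60))
fuel = 15.0 * index / 100 * np.exp(0.004 * months + rng.normal(0, 0.02, 60))

raw_r = np.corrcoef(cement, fuel)[0, 1]
resid_r = np.corrcoef(normalize_prices(cement, index),
                      normalize_prices(fuel, index))[0, 1]
print(f"nominal r = {raw_r:.2f}, residual r = {resid_r:.2f}")
```

The nominal correlation is inflated by the shared trend. The residual correlation reflects only the co-movement of shocks, which is the quantity the correlation matrix is meant to capture.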
The normal distribution is used to model phenomena with a central tendency and symmetrical variability, making it useful when values are evenly distributed around a mean. Its probability density function is given by (2), where μ represents the mean and σ the standard deviation. This distribution is applied in cost and schedule estimations when variations result from multiple independent factors, in accordance with the Central Limit Theorem.
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
The triangular distribution is used when information is limited and only the minimum (a), most likely (b), and maximum (c) values are known. This model, described in (3), is applied in cost and schedule estimation when subjective assessments are available but there is insufficient historical data to use more complex distributions.
$$f(x) = \begin{cases} \dfrac{2(x-a)}{(b-a)(c-a)}, & a \le x \le b \\[6pt] \dfrac{2(c-x)}{(c-a)(c-b)}, & b < x \le c \\[6pt] 0, & \text{otherwise} \end{cases}$$
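A direct transcription of this piecewise density follows, together with NumPy's built-in triangular sampler. It assumes a < b < c; a mode at either bound would need separate handling. The steel-price bounds are hypothetical.

```python
import numpy as np


def triangular_pdf(x, a, b, c):
    """Triangular PDF with minimum a, mode b and maximum c (requires a < b < c)."""
    x = np.asarray(x, dtype=float)
    pdf = np.zeros_like(x)                      # zero outside [a, c]
    rising = (x >= a) & (x <= b)
    falling = (x > b) & (x <= c)
    pdf[rising] = 2 * (x[rising] - a) / ((b - a) * (c - a))
    pdf[falling] = 2 * (c - x[falling]) / ((c - a) * (c - b))
    return pdf


# Hypothetical steel price bounds (USD/kg): minimum, most likely, maximum.
a, b, c = 3.8, 4.2, 5.0
grid = np.linspace(a, c, 200_001)
area = triangular_pdf(grid, a, b, c).sum() * (grid[1] - grid[0])  # should be close to 1
peak = triangular_pdf(np.array([b]), a, b, c)[0]                  # equals 2 / (c - a)

rng = np.random.default_rng(7)
draws = rng.triangular(a, b, c, size=100_000)  # sample mean near (a + b + c) / 3
print(area, peak, draws.mean())
```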
The Beta-PERT distribution is a variant of the Beta distribution used to represent uncertainty in project activity estimates. It is defined in terms of the minimum (a), maximum (b), and mode (m) values, with its probability density function shown in (4). Note that here b denotes the maximum, whereas in the triangular case it denotes the mode.
$$f(x) = \frac{(x-a)^{\alpha-1}(b-x)^{\beta-1}}{B(\alpha,\beta)\,(b-a)^{\alpha+\beta-1}}, \qquad a \le x \le b$$
where B(α, β) is the Beta function, and the shape parameters α and β are given by (5).
$$\alpha = 1 + 4\,\frac{m-a}{b-a}, \qquad \beta = 1 + 4\,\frac{b-m}{b-a}$$
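Under this parameterization α + β = 6. A Beta-PERT draw can therefore be obtained by scaling a standard Beta(α, β) variate onto [a, b], and its mean reduces to (a + 4m + b)/6. The sketch below uses NumPy's `Generator.beta`. The helper names and the productivity bounds are hypothetical.

```python
import numpy as np


def pert_params(a, m, b):
    """Beta-PERT shape parameters for minimum a, mode m and maximum b."""
    alpha = 1 + 4 * (m - a) / (b - a)
    beta = 1 + 4 * (b - m) / (b - a)
    return alpha, beta


def sample_pert(a, m, b, size, rng):
    """Draw Beta-PERT variates by rescaling Beta(alpha, beta) from [0, 1] to [a, b]."""
    alpha, beta = pert_params(a, m, b)
    return a + (b - a) * rng.beta(alpha, beta, size)


# Hypothetical labor productivity (m3/day): minimum, most likely, maximum.
a, m, b = 80.0, 100.0, 110.0
alpha, beta = pert_params(a, m, b)
draws = sample_pert(a, m, b, size=100_000, rng=np.random.default_rng(3))
print(alpha, beta, draws.mean(), (a + 4 * m + b) / 6)
```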

2.4. Definition of Random Variables and Probability Distributions

Based on the historical data collected, the main random variables that significantly influence the cost estimation of the Oyón–Ambo project were identified and selected. These variables were grouped into three main categories: (i) input prices, (ii) labor productivity, and (iii) equipment efficiency. Each variable was analyzed individually to determine its statistical behavior and the probability distribution function that best represented its variability.
Specialized tools such as @RISK were applied to fit probability distribution functions to the collected data, using triangular distributions for inputs with low dispersion, PERT distributions for productivity rates and operating times, and normal distributions for symmetric variables with a large sample size. The distributions were parameterized by their minimum, maximum, and most likely values, or by their means and standard deviations. In total, 15 random variables, as shown in Table 2, were modeled with their respective distributions, providing a solid statistical basis to simulate uncertainty scenarios and obtain more precise and realistic cost estimates.
Distribution types were selected based on previous studies [13], data availability, goodness-of-fit, and the nature of each variable, consistent with established cost-risk guidance. For unit input prices with sufficient observations and no rejection of normality (cement, fuel, and fine and coarse aggregates; K–S and Anderson–Darling at α = 5%), we used the normal distribution. Where data were sparse and bounded by procurement ranges (e.g., steel price and plasticizer additive), we used the triangular distribution, a recommended choice when only the minimum, most likely, and maximum values can be credibly specified and tail behavior is uncertain. For labor productivity and equipment efficiency (bounded rates typically elicited from technical files as min–mode–max values with potential skew), we used Beta-PERT, which emphasizes the most likely value while respecting the bounds.
These criteria are aligned with AACE guidance on selecting probability distributions [23,24,25] and with government handbooks that favor triangular/PERT distributions under limited-data and elicitation settings [26,27]; our earlier study [13] documents the same justifications and diagnostics in road-project applications.

2.5. Correlation Matrix and Interdependency Modeling

In stochastic cost modeling, assuming independence among random variables can lead to biased estimates of the project’s total risk. In real contexts, factors such as material prices, labor productivity, and equipment efficiency exhibit significant interdependencies. Ignoring these correlations limits the accuracy of uncertainty analysis and may underestimate the contingencies required to mitigate cost overruns.
To capture interdependencies, we computed a Pearson-based correlation matrix from the cleaned and temporally normalized historical data and retained only pairs with technical justification and |r| ≥ 0.30 (a pragmatic “moderate-effect” screening cut-off that balances signal and noise in data-constrained settings [28]). The selected correlations were then implemented in the simulation using @RISK's correlation engine. Cholesky factorization was used to verify that the matrix is positive-definite (and therefore a valid correlation matrix), and rank-correlated sampling (Iman–Conover) was applied to generate correlated draws while preserving each variable's marginal distribution. The resulting correlation matrix is consistent with the variance–covariance structure shown in Equation (3) [26]:
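$$V = \begin{bmatrix} v_{11} & v_{12} & v_{13} \\ v_{21} & v_{22} & v_{23} \\ v_{31} & v_{32} & v_{33} \end{bmatrix} = \begin{bmatrix} \sigma_1^2 & \rho_{12}\sigma_1\sigma_2 & \rho_{13}\sigma_1\sigma_3 \\ \rho_{12}\sigma_1\sigma_2 & \sigma_2^2 & \rho_{23}\sigma_2\sigma_3 \\ \rho_{13}\sigma_1\sigma_3 & \rho_{23}\sigma_2\sigma_3 & \sigma_3^2 \end{bmatrix}$$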
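The sketch below covers the screening rule, the positive-definiteness check and the covariance structure above. The helpers are hypothetical. `justified_pairs` stands in for the study's technical-justification step, which is expert judgment rather than code, and the data are synthetic. A Cholesky factorization does not repair a matrix; it only succeeds when the matrix is positive-definite. A failure therefore signals that the screened coefficients must be revised.

```python
import numpy as np


def screen_correlations(data, justified_pairs, threshold=0.30):
    """Keep Pearson coefficients only for justified pairs with |r| >= threshold."""
    r = np.corrcoef(data, rowvar=False)
    screened = np.eye(r.shape[0])
    for i, j in justified_pairs:
        if abs(r[i, j]) >= threshold:
            screened[i, j] = screened[j, i] = r[i, j]
    return screened


def is_positive_definite(R):
    """A Cholesky factorization exists only for positive-definite matrices."""
    try:
        np.linalg.cholesky(R)
        return True
    except np.linalg.LinAlgError:
        return False


def covariance_from_correlation(R, sigma):
    """V_ij = rho_ij * sigma_i * sigma_j, i.e. V = D R D with D = diag(sigma)."""
    sigma = np.asarray(sigma, dtype=float)
    return R * np.outer(sigma, sigma)


# Synthetic example with three drivers: 0 and 1 co-move, 2 is unrelated.
rng = np.random.default_rng(11)
x0 = rng.normal(size=500)
data = np.column_stack([x0, x0 + 0.3 * rng.normal(size=500), rng.normal(size=500)])
R = screen_correlations(data, justified_pairs=[(0, 1), (0, 2)])
V = covariance_from_correlation(R, sigma=[1.5, 0.8, 2.0])
print(R.round(2), is_positive_definite(R), V.round(3), sep="\n")
```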
Table 3 summarizes the pairs of correlated random variables incorporated into the model, along with their respective correlation coefficients.
A Pearson-based correlation matrix, coupled with Iman–Conover rank mapping, is employed as a pragmatic first-order representation of co-movement under data constraints. Because Iman–Conover operates on ranks, the Pearson estimates serve as approximate rank-correlation targets. This choice preserves the marginal fits and propagates dependence for contingency sizing; however, it does not capture non-linear or asymmetric tail dependence, so extreme co-movements may be understated. The objective is to quantify the value of moving from independence to an empirically supported dependence structure rather than to fully characterize tail risk. Consistent with Firouzi et al. [14], richer copula models can reshape tail behavior.

2.6. Monte Carlo Simulation Framework and Risk Metrics

Using @RISK [29], simulations with 10,000 iterations were executed to model cost uncertainty for the Oyón–Ambo section. This number of iterations was adopted based on prior Monte Carlo analyses conducted on the same project [13], which evaluated sample sizes of 5000, 10,000, and 50,000 iterations. The results demonstrated that convergence of key statistical outputs (mean, percentiles, and standard deviation) was achieved from approximately 10,000 iterations onward, indicating that higher sample sizes did not significantly improve accuracy. Therefore, this iteration count ensures reliable convergence while optimizing computational resources. The model incorporated triangular, PERT, and normal input distributions, enforced the empirically derived correlation matrix, and was applied to the project's bill of quantities.
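This kind of convergence check can be repeated by rerunning the model at each candidate sample size and comparing the key outputs. The sketch assumes a user-supplied function that returns N simulated totals. The toy model (three independent normal cost deltas around a USD 84 M base) is purely hypothetical and is not the Oyón–Ambo model.

```python
import numpy as np


def convergence_table(simulate, sizes=(5_000, 10_000, 50_000), seed=0):
    """Mean, standard deviation and P95 of simulated totals for each sample size."""
    rows = []
    for n in sizes:
        totals = simulate(n, np.random.default_rng(seed))
        rows.append((n, totals.mean(), totals.std(ddof=1), np.percentile(totals, 95)))
    return rows


def toy_model(n, rng):
    # Hypothetical: base cost plus three independent cost deltas (USD M).
    deltas = rng.normal(loc=[0.0, 0.0, 0.0], scale=[0.4, 0.3, 0.35], size=(n, 3))
    return 84.0 + deltas.sum(axis=1)


table = convergence_table(toy_model)
for n, mean, sd, p95 in table:
    print(f"N={n:>6}: mean={mean:.3f}  sd={sd:.3f}  P95={p95:.3f}")
```

Tail percentiles such as P95 converge more slowly than the mean, so they are the outputs worth watching when choosing N.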
Each iteration produced a total-cost scenario, yielding a full probability distribution of project cost. We then computed decision-oriented outputs: expected total cost (mean), standard deviation, percentiles (P10, P50, P90), 95% Value-at-Risk (VaR), and the probability of exceeding the base budget. Together, these metrics quantify central tendency, dispersion, tail risk, and budget-overrun likelihood, supporting contingency sizing, proactive financial management, and stronger controls under uncertainty.
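These outputs can be computed directly from the vector of simulated totals, as in the minimal sketch below. The 95% VaR is taken here as P95 minus the mean. That is one common convention and an assumption, since other definitions (e.g., P95 minus the base budget) are also used.

```python
import numpy as np


def risk_metrics(totals, base_budget):
    """Decision-oriented summary of simulated total costs."""
    totals = np.asarray(totals, dtype=float)
    p10, p50, p90, p95 = np.percentile(totals, [10, 50, 90, 95])
    mean = totals.mean()
    return {
        "mean": mean,
        "std": totals.std(ddof=1),
        "P10": p10, "P50": p50, "P90": p90, "P95": p95,
        "VaR95": p95 - mean,  # assumed convention: P95 in excess of the mean
        "p_exceed": float(np.mean(totals > base_budget)),  # share of iterations over budget
    }


# Tiny deterministic check: totals 1..100 against a budget of 90.
metrics = risk_metrics(np.arange(1, 101), base_budget=90)
print(metrics)
```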

2.7. Monte Carlo Workflow for Correlated Cost Estimation

Inputs comprise the fitted marginal distributions for each cost driver (X_i), a target correlation matrix (R), and the iteration count (N). Outputs include the empirical distribution of total cost, summary statistics (mean, P50, P90, P95), and exceedance probabilities relative to specified budget thresholds. The procedure is as follows:
  • Fit an appropriate marginal distribution to each driver X_i and store the parameter estimates;
  • Generate N independent pseudo-random variates per driver and rank them;
  • Impose the target dependence using Iman–Conover rank mapping, reordering the variates so that their rank correlation approximates R while each driver's marginal values are preserved;
  • Map the correlated ranks to physical space via the inverse CDFs of the fitted marginals {X_i}, obtaining simulated cost deltas per driver;
  • Aggregate the simulated drivers into total cost for each iteration and record the sample of totals;
  • Compute summary statistics and decision quantiles (P50, P90, P95), and evaluate budget exceedance probabilities.
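The steps above can be sketched in Python as follows. This is not @RISK's internal implementation. It is a textbook Iman–Conover reordering (van der Waerden scores, Cholesky adjustment, rank matching) applied to uniform draws, which are then mapped through each marginal's inverse CDF. The three drivers, their parameters, the 0.5/0.3 rank targets, the USD 84 M base and the budget threshold are all hypothetical.

```python
import numpy as np
from scipy import stats


def iman_conover(X, R, rng):
    """Reorder the columns of X so that their rank correlation approximates R."""
    n, k = X.shape
    # van der Waerden scores, shuffled independently for each column
    scores = stats.norm.ppf(np.arange(1, n + 1) / (n + 1))
    S = np.column_stack([rng.permutation(scores) for _ in range(k)])
    # Remove the sample correlation of S, then impose the target R
    F = np.linalg.cholesky(np.corrcoef(S, rowvar=False))
    C = np.linalg.cholesky(R)
    T = S @ np.linalg.inv(F).T @ C.T
    # Give each column of X the same ranking as the matching column of T
    out = np.empty_like(X)
    for j in range(k):
        ranks = np.argsort(np.argsort(T[:, j]))
        out[:, j] = np.sort(X[:, j])[ranks]
    return out


# Hypothetical marginals for three cost deltas (USD M)
marginals = [
    stats.norm(loc=0.0, scale=0.40),               # e.g. a normally distributed price
    stats.triang(c=0.25, loc=-0.3, scale=1.2),     # e.g. a triangular price
    stats.beta(3.67, 2.33, loc=-0.5, scale=1.0),   # e.g. a PERT-like productivity effect
]
R = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.0],
              [0.3, 0.0, 1.0]])                    # target rank correlations

rng = np.random.default_rng(2025)
N = 10_000
U = rng.uniform(size=(N, len(marginals)))          # independent draws
U_corr = iman_conover(U, R, rng)                   # impose dependence
X = np.column_stack([d.ppf(U_corr[:, j]) for j, d in enumerate(marginals)])
totals = 84.0 + X.sum(axis=1)                      # aggregate per iteration

p50, p90, p95 = np.percentile(totals, [50, 90, 95])
budget = 84.5                                      # hypothetical threshold
print(f"P50={p50:.2f} P90={p90:.2f} P95={p95:.2f} "
      f"P(total > {budget}) = {np.mean(totals > budget):.3f}")
```

Reordering uniforms before applying the inverse CDFs gives the same result as reordering the marginal samples themselves, because each inverse CDF is monotone and so preserves ranks.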

3. Results

3.1. Distribution of Results

The execution of the Monte Carlo simulation with 10,000 iterations produced a probabilistic distribution of the total estimated cost for the Oyón–Ambo road project, integrating the variability of the input variables and their respective interdependencies. From these results, the most likely budget execution scenarios were identified, as well as the levels of exposure to economic risk.
When comparing the results of both simulations shown in Figure 3, a significant difference is observed in the projected values for the 95th percentile of the project’s total cost. In the model without correlations, P95 was USD 84.63 million, while in the model incorporating correlation matrices this value increased to USD 86.26 million. This increase reflects a direct effect of the dependencies introduced among random variables, which amplify risk propagation within the system. The wider interval between the 5th and 95th percentiles suggests greater dispersion in the possible scenarios, translating into higher volatility in cost estimation. This behavior is consistent with statistical principles, which indicate that positive correlation among variables increases the joint variance of the system. Therefore, the results show that the use of correlation matrices not only alters the expected value but also affects the shape and spread of the simulated outcome distribution, which is particularly relevant for contingency estimation in contexts with multiple interrelated sources of uncertainty.
In the uncorrelated simulation (red curve), the distribution exhibits a slight positive skew, with a longer right tail produced by a small number of isolated high-cost scenarios. In contrast, the correlated model (blue curve) is more symmetric about its center but more widely spread, reflecting a more uniform propagation of risk due to dependencies among variables.
This behavior is consistent with statistical theory: incorporating positive correlations among variables increases the joint variance and redistributes risk, making the curve more symmetric but broader. The result is a more conservative and realistic estimate for budget planning.
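The statistical principle invoked here is the variance of a sum of random cost drivers. Take total cost $T = \sum_i X_i$, with standard deviations $\sigma_i$ and pairwise correlations $\rho_{ij}$. The standard identity below equals the sum of all entries of the variance–covariance matrix $V$. Under independence the cross terms vanish; with positive $\rho_{ij}$ they are positive, so the spread of the total widens.

$$\operatorname{Var}(T) = \sum_{i=1}^{k} \sigma_i^2 + 2\sum_{i<j} \rho_{ij}\,\sigma_i\,\sigma_j$$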

3.2. Statistical Analysis

The main statistical indicators derived from both Monte Carlo simulations are presented below: one without considering correlations among random variables, and another integrating a correlation matrix. The objective is to contrast the effects of explicitly incorporating interdependencies on the shape and dispersion of the simulated total cost distribution. The results, as shown in Table 4, reveal relevant differences in the mean, key percentiles, and shape measures (skewness and kurtosis), showing greater dispersion and a slight reduction in kurtosis when correlations are considered.

3.3. Sensitivity Analysis

Based on the cost distribution of the correlated model reported in Section 3.1 (Mean = USD 84.01 M, P50 = USD 83.98 M, P90 = USD 85.00 M, and P95 = USD 86.26 M), exceedance probabilities were estimated for different budget thresholds:
  • If the approved budget were USD 84.00 M, the probability of exceedance would be 45.4%.
  • If the approved budget were USD 84.50 M, the probability of exceedance would be 28.7%.
  • If the approved budget were set at P90 (USD 85.00 M), the residual risk would decrease to 16.8%.
  • If the approved budget were set at P95 (USD 86.26 M), the residual risk would be ≈ 3.0%.
These figures were obtained by fitting a skew-normal distribution to the reported quantiles and evaluating its cumulative distribution function (CDF) at each threshold. Because the fitted skew-normal does not reproduce all of the reported summary values exactly, its exceedance probabilities at P90 and P95 (16.8% and ≈3.0%) differ from the nominal 10% and 5%. Compared with the approved budget in the technical file, the probability of exceedance can be directly interpreted depending on where the budget lies relative to P50/P90/P95: if set around USD 84.5 M, the project faces moderate risk (~29%); if raised to P90, that risk falls to ~17%; and at P95, the residual risk falls to ~3%.
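The fitting step can be approximated along the lines below. The study does not state its loss function, so the code assumes an unweighted least-squares match of the reported mean and quantiles, using SciPy's `stats.skewnorm` and `optimize.least_squares`. The helper names are hypothetical. The resulting probabilities depend on that choice and need not match the figures above exactly.

```python
import numpy as np
from scipy import optimize, stats


def fit_skewnorm(mean, quantiles):
    """Least-squares skew-normal fit to a mean and {probability: value} quantiles."""
    probs = np.array(sorted(quantiles))
    targets = np.array([quantiles[p] for p in probs])

    def residuals(theta):
        a, loc, log_scale = theta
        scale = np.exp(log_scale)                       # keeps the scale positive
        q = stats.skewnorm.ppf(probs, a, loc=loc, scale=scale)
        m = stats.skewnorm.mean(a, loc=loc, scale=scale)
        return np.append(q - targets, m - mean)

    # Start from the normal distribution through the lowest and highest quantiles.
    z = stats.norm.ppf(probs)
    scale0 = (targets[-1] - targets[0]) / (z[-1] - z[0])
    loc0 = targets[0] - scale0 * z[0]
    fit = optimize.least_squares(residuals, x0=[0.5, loc0, np.log(scale0)])
    a, loc, log_scale = fit.x
    return a, loc, np.exp(log_scale)


def exceedance(thresholds, a, loc, scale):
    """P(total cost > threshold) under the fitted skew-normal."""
    return stats.skewnorm.sf(thresholds, a, loc=loc, scale=scale)


# Reported summary of the correlated model (USD M)
params = fit_skewnorm(84.01, {0.50: 83.98, 0.90: 85.00, 0.95: 86.26})
print(exceedance(np.array([84.00, 84.50, 85.00, 86.26]), *params))
```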
Figure 4 shows the probability density of simulated project costs, highlighting the expected distribution around the mean and indicating the position of the approved budget. This visualization allows decision-makers to directly observe the likelihood of costs clustering around central estimates or deviating toward higher values.
Figure 5 presents the CDF of simulated costs, which is particularly useful for interpreting exceedance probabilities. By comparing the position of the approved budget with the cumulative curve, one can directly quantify the risk of cost overruns. Together, these figures link the statistical analysis with practical financial decision-making, reinforcing the connection between percentiles (P50, P90, P95) and the residual budgetary risk identified in the sensitivity analysis.

4. Discussion

4.1. Comparison with Similar Methodology Studies

The results of this study’s stochastic cost estimation—particularly the inclusion of correlated risk factors—align with and extend findings from existing literature. The Monte Carlo simulations showed that incorporating correlation between cost drivers significantly widened the cost distribution and raised the high-end risk estimate. For instance, the 95th percentile (P95) cost forecast increased from about USD 84.63 million under an independence assumption to USD 86.26 million when correlations were modeled. This ~2% rise in the P95, coupled with a jump in the standard deviation (from 0.61 to 0.73), indicates greater volatility and a more conservative contingency requirement once interdependencies are acknowledged. Such behavior is consistent with statistical theory and has been observed by other authors—positively correlated inputs lead to higher joint variance and broader outcome ranges [30,31,32,33].
These findings corroborate prior studies emphasizing the critical role of accounting for correlations in project cost estimates [34,35,36,37]. Decades ago, Touran and Wiser [38] demonstrated that ignoring correlations among cost elements can severely underestimate total cost variance. More recently, Firouzi et al. [14] applied a copula-based Monte Carlo model and found that different dependency structures yield markedly different cost distribution shapes. They concluded that modeling such interdependencies (via copulas) improves the accuracy of total cost prediction. Likewise, Moselhi and Roghabadi underscored that correlation is “the most important issue” for accurate contingency estimation. In their fuzzy Monte Carlo approach, they even introduced subjective correlation techniques to handle situations with limited data on correlation coefficients [15]. Our results directly echo these studies: by explicitly integrating a correlation matrix, we obtain a more realistic risk profile, avoiding the underestimation of uncertainty that occurs when cost drivers are treated in isolation. This agreement with global research validates our methodological choice and highlights a key improvement over traditional Monte Carlo analyses that assume independent cost items.
Methodologically, our approach distinguishes itself by using actual historical data and expert judgment to construct the correlation matrix for cost variables. This contrasts with earlier works that often assumed simplistic or no dependencies, or relied on preset correlation categories (e.g., the “high/medium/low” subjective correlations in Touran [39]). By capturing nuanced interdependencies, our model provides a refined risk quantification. Notably, Moselhi and Roghabadi's fuzzy-simulation method [15] achieved accuracy comparable to a full Monte Carlo simulation by incorporating uncertainty in the correlation inputs: their “developed method” was shown to match the performance of a Monte Carlo simulation with correlation even when run in an analytical (non-simulation) mode. This highlights that, while the implementation may vary, the inclusion of correlation in cost risk models is indispensable for realism. In sum, the consensus across these studies is that acknowledging the interconnected nature of cost drivers leads to more robust and defensible estimates of contingency and risk exposure.

4.2. Global Context and Implications

Placing our findings in a global context, it becomes clear that advanced risk analysis techniques like ours are not just academic exercises but are crucial for addressing well-documented cost overrun trends. Worldwide, infrastructure projects commonly suffer cost overruns averaging around 28%, whereas in Latin America the average overrun soars to roughly 48% [6]. Traditional deterministic estimating approaches, still prevalent in many projects, have been heavily criticized for contributing to these overruns due to their inability to represent uncertainty. They tend to produce point estimates that ignore variability and correlations, resulting in budgets that are too optimistic. Our Monte Carlo-based framework directly tackles this shortcoming by providing probability-based forecasts. For example, rather than a single-valued contingency, we can quantify the likelihood of exceeding the baseline budget. This probabilistic insight allows decision-makers to prepare more effectively for worst-case scenarios, a practice aligned with modern risk management but often missing in conventional estimates.
It is important to note that while quantitative risk analysis methods like Monte Carlo simulation are well-established in many developed regions, their adoption in developing contexts has lagged. The literature observes that most prior applications of Monte Carlo cost simulation with correlation have been in North America, Europe, or Asia, where abundant data and specialized tools facilitate such analyses. In contrast, in Latin America—and Peru in particular—the use of these stochastic methods is still incipient. This study contributes to bridging that gap by demonstrating a practical implementation tailored to a data-scarce environment. The current approach shows that even with limited historical information, a carefully constructed model can yield meaningful risk metrics [40,41,42]. The case of the Oyón–Ambo road project serves as an example of how integrating Monte Carlo simulation into local project planning can enhance estimate reliability, offering a template for similar efforts in the region.
The implications of our findings extend to improving contingency practices. By capturing a more realistic spread of possible costs, our model suggests contingencies that are neither overly conservative nor dangerously low, but rather calibrated to a chosen confidence level. In a recent related study of Peruvian road projects, Ariza and Zavala [13] reported that applying Monte Carlo–based quantitative risk analysis reduced uncertainty significantly, resulting in recommended cost contingencies between 1.34% and 11% of project budgets—far lower than the ~32% cost overruns historically observed on similar projects. This improvement underscores how embracing formal risk analysis can narrow the chronic gap between estimated and actual costs. In the current study, we similarly see that a rigorous stochastic estimation can elucidate a plausible contingency range (on the order of only a few percent of total cost for high-confidence budgeting), which can be compared against traditional ad hoc contingencies or past project outcomes. Such comparisons reveal that quantitative risk models can substantially improve the accuracy of budget forecasts, thereby enhancing financial control.
The results agree closely with international research on cost estimation under uncertainty [12,43], and they also shed light on context-specific considerations. The gains in accuracy and robustness obtained here come from modeling correlations and uncertainty comprehensively, and they represent a valuable advance for infrastructure project management in developing regions. Comparing our findings with those of other authors and methods supports the effectiveness of our approach. It also shows how outcomes differ when modern stochastic methods are used instead of traditional techniques. These differences underscore the importance of methodological choice in cost estimation. Incorporating interdependencies and probabilistic risk assessment leads to more reliable budgets, ultimately improving the likelihood of project success in both local and global settings.

4.3. Cost–Benefit and Value of Information (VOI)

Modeling dependencies among variables raised P95 from USD 84.63 M (independence) to USD 86.26 M (with correlations), i.e., by USD 1.63 M (~2%). It also widened the cost distribution, with the standard deviation rising from USD 0.61 M to USD 0.73 M. This wider upper tail translates into material changes in exceedance risk at typical budgeting thresholds. Under the correlated model, the exceedance probability is 45.4% for a budget of USD 84.00 M, 28.7% at USD 84.50 M, 16.8% at P90 = USD 85.00 M, and about 3.0% at P95 = USD 86.26 M. To assess whether incorporating correlations is worthwhile, we propose a conservative proxy for the expected underestimation cost. The proxy is the gap between P95 (with ρ) and the adopted threshold, multiplied by the exceedance probability at that threshold. This metric approximates the minimum benefit (in expected value) of aligning contingency with the true risk when dependencies are present.
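The widening of the distribution follows from the variance of a sum. When cost items are positively correlated, the covariance terms ρᵢⱼσᵢσⱼ add to the total variance, so the spread of total cost grows, and P90 and P95 grow with it. The sketch below shows this analytically for two hypothetical cost items with an SD of USD 0.40 M each, using the fuel–machinery coefficient of 0.78 from Table 3. It is a minimal illustration, not the study's 15-variable simulation.

```python
import math


def total_sd(sds: list[float], corr: list[list[float]]) -> float:
    """SD of a sum of cost items: sqrt(sum_i sum_j rho_ij * s_i * s_j)."""
    n = len(sds)
    variance = sum(corr[i][j] * sds[i] * sds[j] for i in range(n) for j in range(n))
    return math.sqrt(variance)


# Hypothetical example: two cost items, each with SD = USD 0.40 M
sds = [0.40, 0.40]
independent = [[1.0, 0.0], [0.0, 1.0]]
correlated = [[1.0, 0.78], [0.78, 1.0]]  # fuel-machinery coefficient from Table 3
print(round(total_sd(sds, independent), 3))  # 0.566
print(round(total_sd(sds, correlated), 3))   # 0.755
```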
As shown in Table 5, budgeting USD 84.00 M implies an expected overrun proxy of ≈USD 1.03 M, and budgeting USD 84.50 M implies ≈USD 0.51 M. Budgeting at P90 = USD 85.00 M reduces the proxy to ≈USD 0.21 M, and at P95 = USD 86.26 M it is zero by construction. Therefore, any analysis cost below these expected values yields a positive VOI; this cost covers data preparation, fitting the marginals and the ρ-matrix, and simulation. In large projects (>USD 80 M), the incremental effort to build and maintain a correlation matrix and run the model is small relative to the expected exposure avoided. Moreover, the reduction in residual risk supports funding policies consistent with a P90–P95 risk appetite.
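The Table 5 values follow directly from the proxy Pr(Cost > Threshold) × max(P95 with ρ − Threshold, 0). The sketch below recomputes them from the reported exceedance probabilities, taking ~3.0% as 0.030, and adds a simple net-VOI check. The analysis cost of USD 0.05 M is a hypothetical placeholder, not a figure from the study.

```python
P95_WITH_RHO = 86.26  # USD M, 95th percentile of the correlated model (Section 4.3)

# (budget threshold in USD M, exceedance probability) as reported in Table 5
THRESHOLDS = [(84.00, 0.454), (84.50, 0.287), (85.00, 0.168), (86.26, 0.030)]


def overrun_proxy(threshold: float, p_exceed: float, p95: float = P95_WITH_RHO) -> float:
    """Pr(Cost > threshold) * max(P95 with rho - threshold, 0), in USD M."""
    return p_exceed * max(p95 - threshold, 0.0)


def net_voi(threshold: float, p_exceed: float, analysis_cost: float) -> float:
    """Expected overrun avoided minus the cost of the analysis (positive means worthwhile)."""
    return overrun_proxy(threshold, p_exceed) - analysis_cost


for t, p in THRESHOLDS:
    print(f"USD {t:.2f} M: proxy = USD {overrun_proxy(t, p):.2f} M")

# Hypothetical analysis cost of USD 0.05 M (data preparation, fitting, simulation)
print(net_voi(85.00, 0.168, analysis_cost=0.05) > 0)  # True
```

The loop prints proxies of 1.03, 0.51, 0.21 and 0.00, which matches Table 5.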
The conservative expected-underestimation proxy is computed as Pr(Cost > Threshold) × max(P95 with ρ − Threshold, 0), where P95 with ρ is the 95th percentile under the correlated model. Intuitively, the proxy uses the gap between P95 with ρ and the threshold as a stand-in for the size of the overrun whenever the budget is exceeded. Multiplying that gap by the exceedance probability gives a screening-level estimate of the expected overrun that would arise if dependencies were ignored. The proxy does not integrate the full tail, as CVaR (expected shortfall) does. In particular, it ignores costs above P95, so it understates tail risk at thresholds near P95. It equals zero when Threshold ≥ P95 with ρ and is positive otherwise. It is suitable for quick cost–benefit/VOI screening with the evidence available here. Comprehensive funding decisions should also consider tail metrics and stress tests.
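Because the proxy ignores everything above P95, it can be compared with full tail measures computed from the same simulated draws. The sketch below is a minimal illustration under stated assumptions. It computes the proxy, the expected overrun E[max(Cost − T, 0)], and a simple empirical CVaR at 95% (the mean of the worst 5% of draws). The draws are synthetic normal values that use the correlated-model mean and SD from Table 4. This is an assumption made only for illustration, so the numbers will not reproduce Table 5. `tail_metrics` is a hypothetical name.

```python
import random
import statistics


def tail_metrics(samples: list[float], threshold: float) -> dict[str, float]:
    """Screening proxy versus full tail measures for one budget threshold (USD M)."""
    ordered = sorted(samples)
    n = len(ordered)
    k95 = int(0.95 * n)
    p95 = ordered[k95]                                    # simple empirical 95th percentile
    p_exceed = sum(c > threshold for c in ordered) / n
    proxy = p_exceed * max(p95 - threshold, 0.0)
    expected_overrun = sum(max(c - threshold, 0.0) for c in ordered) / n
    cvar95 = statistics.fmean(ordered[k95:])              # mean of the worst 5% of draws
    return {"p95": p95, "p_exceed": p_exceed, "proxy": proxy,
            "expected_overrun": expected_overrun, "cvar95": cvar95}


# Assumption for illustration: normal draws with the Table 4 correlated-model mean and SD.
rng = random.Random(42)
draws = [rng.gauss(84.01, 0.73) for _ in range(50_000)]
for t in (84.50, 85.00):
    print(t, {k: round(v, 3) for k, v in tail_metrics(draws, t).items()})

at_p95 = tail_metrics(draws, tail_metrics(draws, 84.50)["p95"])
print(at_p95["proxy"], at_p95["expected_overrun"] > 0)  # 0.0 True
```

When the threshold equals the empirical P95, the proxy is exactly zero while the expected overrun stays positive. This is the understatement noted above.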

4.4. Limitations and Future Research

This study is constrained by data availability, distributional assumptions, and the chosen dependence structure. First, correlations among cost drivers were estimated from limited historical evidence and expert judgment. Sampling error and potential non-stationarity (e.g., inflation regimes, FX shocks) may bias both the correlation matrix and the resulting tail risk. Second, the fitted marginal distributions and the skewness implied by our model are approximations, and model misspecification can propagate to percentile-based contingencies. Third, linear correlation matrices capture co-movement but may under-represent tail dependence and asymmetric co-fluctuations. Future work should:

(i) estimate dependencies empirically from multi-project panels and test vine copulas against linear correlation;
(ii) integrate dynamic price processes, such as stochastic differential equations (SDEs) for asphalt and steel prices, to link market volatility to budget risk;
(iii) adopt hierarchical or Bayesian updating to recalibrate estimates during execution;
(iv) jointly model cost and schedule risk; and
(v) back-test forecasts against realized outcomes, including extreme-value stress tests and bootstrap validation, to quantify calibration and improve transferability across contexts.
Furthermore, the fitting assumes stationarity after deflation, yet the 2019–2023 window likely contains structural breaks (e.g., the pandemic and inflation) and potentially time-varying correlations. Our goal here is to quantify the VOI of modeling dependencies for contingency setting. A full time-series treatment is therefore beyond the scope of this study. Such a treatment would include structural-break tests and segmentation, fitting distributions to residuals from ARIMA/ETS models where needed, and rolling or regime-specific ρ. In practice, percentiles should be recalibrated to contemporaneous construction or producer price indices at the time of funding decisions, and documented structural breaks should guide regime conditioning. When correlations appear time-varying, rolling-window or dynamic conditional correlation (DCC) diagnostics can be used. Future extensions will incorporate these diagnostics and report CVaR alongside percentile metrics to strengthen out-of-sample applicability.
Among these extensions, copula-based dependence models (such as vine or t-copulas) are a priority once broader panel data are available. These models can represent the tail dependence and asymmetric co-fluctuations that average linear correlations miss.
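The sketch below shows why tail dependence matters. Given that one variable exceeds its 99th percentile, it estimates how often the other does too. It compares a Gaussian dependence with a Student t dependence that has the same linear correlation. This illustrates t-copula tail behavior; it is not a vine-copula fit. The value ρ = 0.65 is borrowed from the labor–compaction entry in Table 3, and ν = 4 is a hypothetical choice. Thresholds are empirical quantiles, so the result depends only on the dependence structure, not on the margins.

```python
import math
import random
from typing import Optional


def co_exceedance(rho: float, nu: Optional[float] = None, q: float = 0.99,
                  n: int = 100_000, seed: int = 3) -> float:
    """Estimate Pr(Y above its q-quantile | X above its q-quantile).

    Pairs are bivariate normal with correlation rho. If nu is given, both
    components are scaled by the same sqrt(nu / W) with W ~ chi-square(nu),
    giving a bivariate Student t (the dependence of a t-copula).
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x, y = g1, rho * g1 + math.sqrt(1.0 - rho * rho) * g2
        if nu is not None:
            w = rng.gammavariate(nu / 2.0, 2.0)  # chi-square(nu) as Gamma(shape nu/2, scale 2)
            scale = math.sqrt(nu / w)
            x, y = x * scale, y * scale
        xs.append(x)
        ys.append(y)
    k = int(q * n)
    qx, qy = sorted(xs)[k], sorted(ys)[k]  # empirical (rank-based) thresholds
    ys_when_x_high = [b for a, b in zip(xs, ys) if a > qx]
    return sum(b > qy for b in ys_when_x_high) / len(ys_when_x_high)


# Same linear correlation (0.65, the labor-compaction value in Table 3), different tails
gaussian = co_exceedance(0.65)
student_t = co_exceedance(0.65, nu=4)  # nu = 4 is a hypothetical choice
print(f"Gaussian: {gaussian:.2f}, t (nu=4): {student_t:.2f}")
```

The t case should show a noticeably higher conditional co-exceedance, even though both models share the same linear correlation.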

5. Conclusions

This paper advances stochastic cost estimation for transportation infrastructure by embedding correlated Monte Carlo simulation within a transparent risk framework. Relative to independence assumptions, modeling interdependencies among cost drivers produces a wider and more realistic cost distribution, materially lifting upper-tail percentiles (e.g., P90–P95) and, hence, contingency requirements. The approach yields interpretable outputs—exceedance probabilities for any budget threshold and sensitivity insights—that directly inform governance decisions on contingency sizing and risk appetite. In contexts where deterministic budgeting remains prevalent, these results demonstrate the practical value of probabilistic planning to reduce the likelihood and severity of cost overruns.
Beyond its empirical findings, the study offers a replicable template for agencies and practitioners operating under data constraints. By combining defensible marginal distributions with an explicit dependence structure, the method balances rigor and implementability, and can be progressively enriched with better data (panel histories), richer dependence models (copulas), and dynamic drivers (commodity price processes). The broader implication is clear: adopting correlated, percentile-based budgeting shifts practice from optimistic point estimates to risk-informed decisions, improving fiscal discipline and project credibility across the transportation sector.

Author Contributions

Conceptualization, V.A.F. and G.Z.; methodology, V.A.F.; software, V.A.F.; validation, G.Z., R.S. and J.B.C.; formal analysis, V.A.F.; investigation, V.A.F.; resources, G.Z.; data curation, V.A.F.; writing—original draft preparation, V.A.F.; writing—review and editing, G.Z.; visualization, R.S.; supervision, J.B.C.; project administration, V.A.F.; funding acquisition, V.A.F. All authors have read and agreed to the published version of the manuscript.

Funding

The APC was funded by Ariza Ingenieros Consulting Firm.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

We thank Ariza Ingenieros Consulting Firm for its support of this research.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. IMD World Competitiveness Center. IMD World Competitiveness Booklet 2023. Lausanne, June 2023. Available online: https://worldcompetitiveness.imd.org/ (accessed on 21 September 2024).
2. World Economic Forum. The Global Competitiveness Report, 1st ed.; World Economic Forum: Geneva, Switzerland, 2019; Volume 1.
3. ComexPeru. Infraestructura Vial: Gobiernos Subnacionales Estancados. Available online: https://www.comexperu.org.pe/articulo/infraestructura-vial-gobiernos-subnacionales-estancados#:~:text=La%20provisi%C3%B3n%20de%20carreteras%2C%20puentes,semanarios%201007%20%20y%2010 (accessed on 10 April 2025).
4. Brichetti, J.P.; Mastronardi, L.; Rivas, M.E.; Serebrisky, T.; Solís, B. The Infrastructure Gap in Latin America and the Caribbean: Investment Needed Through 2030 to Meet the Sustainable Development Goals; Inter-American Development Bank: Washington, DC, USA, 2021.
5. Portal Portuario. BID: Urge Reducir Costos Logísticos Para Enfrentar Efectos de la Pandemia en el Comercio de América Latina. Portal Portuario. Available online: https://portalportuario.cl/bid-urge-reducir-costos-logisticos-para-enfrentar-efectos-de-la-pandemia-en-el-comercio-de-america-latina/#:~:text=%E2%80%9CEl%20costo%20de%20transporte%2C%20log%C3%ADstica,o%20m%C3%A1s%20y%20est%C3%A1n%20relacionados (accessed on 30 April 2025).
6. Serebrisky, T.; Suárez-Alemán, A.; Pastor, C.; Wohlhueter, A. Increasing the Efficiency of Public Infrastructure Delivery: Evidence-based Potential Efficiency Gains in Public Infrastructure Spending in Latin America and the Caribbean; Inter-American Development Bank: Washington, DC, USA, 2017.
7. Serebrisky, T.; Cavallo, E.A.; Powell, A. From Structures to Services: The Path to Better Infrastructure in Latin America and the Caribbean; Inter-American Development Bank: Washington, DC, USA, 2020.
8. Infobae. La Nueva Carretera Central: Primer Tramo Hubiera Costado S/11 Millones por 40 Kilómetros; Ahora el Precio Será más del Doble. Lima. 2024. Available online: https://www.infobae.com/peru/2024/03/04/carretera-central-el-primer-tramo-costaria-s-11-millones-por-40-kilometros-mas-ahora-el-precio-sera-mas-del-doble/#:~:text=un%20trazo%20de%20145%20kil%C3%B3metros,tendr%C3%A1%20una%20extensi%C3%B3n%20de%20185 (accessed on 27 April 2025).
9. Contraloría General de la República. Reporte de Obras Paralizadas 2024; Contraloría General de la República: Bogotá, Colombia, 2024.
10. Flyvbjerg, B.; Ansar, A.; Budzier, A.; Buhl, S.; Cantarelli, C.; Garbuio, M.; Glenting, C.; Holm, M.S.; Lovallo, D.; Lunn, D.; et al. Five things you should know about cost overrun. Transp. Res. Part A Policy Pract. 2018, 118, 174–190.
11. Jaco, J.C.E.; Galarza, C.M.L.; Venero, R.M.; Quispe, J.A.D. Monte Carlo Simulation in a Peruvian Highway. Civ. Eng. Archit. 2021, 9, 1727–1734.
12. Gómez, H.D.; Orobio, A. Effects of uncertainty on scheduling of highway construction projects. Dyna 2015, 82, 155–164.
13. Flores, V.A.A.; Ascaño, G.Z. Quantitative Risk Analysis Framework for Cost and Time Estimation in Road Infrastructure Projects. Infrastructures 2025, 10, 139.
14. Firouzi, A.; Yang, W.; Li, C.-Q. Prediction of Total Cost of Construction Project with Dependent Cost Items. J. Constr. Eng. Manag. 2016, 142.
15. Moselhi, O.; Roghabadi, M.A. Risk quantification using fuzzy-based Monte Carlo simulation. J. Inf. Technol. Constr. 2020, 25, 87–98.
16. Sobieraj, J.; Metelski, D. Project Risk in the Context of Construction Schedules—Combined Monte Carlo Simulation and Time at Risk (TaR) Approach: Insights from the Fort Bema Housing Estate Complex. Appl. Sci. 2022, 12, 1044.
17. Pal, S. Quantitative risk analysis for institutional building construction. Mater. Today Proc. 2022, 69, 127–132.
18. Nabawy, M.; Khodeir, L.M. A systematic review of quantitative risk analysis in construction of mega projects. Ain Shams Eng. J. 2020, 11, 1403–1410.
19. Chen, L.; Lu, Q.; Han, D. A Bayesian-Driven Monte Carlo Approach for Managing Construction Schedule Risks of Infrastructures Under Uncertainty. Expert Syst. Appl. 2022, 212, 118810.
20. Chen, L.; Lu, Q.; Li, S.; He, W.; Yang, J. Bayesian Monte Carlo Simulation–Driven Approach for Construction Schedule Risk Inference. J. Manag. Eng. 2021, 37, 04020115.
21. Flores, V.A.A.; Portocarrero, E. Integrating Resilience in Construction Risk Management: A Case Study on Peruvian Road Infrastructure. E3S Web Conf. 2024, 497, 02019.
22. Flores, V.A.A.; Salvador, R. Adaptive Risk Management in Road Construction: Oyon-Ambo Highway Insights, El Niño 2019 Case Study. E3S Web Conf. 2024, 497, 02020.
23. AACE International. 57R-09: Integrated Cost and Schedule Risk Analysis Using Risk Drivers and Monte Carlo Simulation of a CPM Model; AACE: Morgantown, WV, USA, 2019.
24. AACE International. 42R-08: Risk Analysis and Contingency Determination Using Parametric Estimating; AACE: Morgantown, WV, USA, 2021.
25. AACE International. 66R-11: Selecting Probability Distribution Functions for Use in Cost and Schedule Risk Simulation Models; AACE: Morgantown, WV, USA, 2012.
26. Damnjanovic, I.; Reinschmidt, K. Data Analytics for Engineering and Construction Project Risk Management. In Risk, Systems and Decisions; Springer International Publishing: Berlin/Heidelberg, Germany, 2020.
27. Hofstadler, C.; Kummer, M. Chancen- und Risikomanagement in der Bauwirtschaft; Springer: Berlin/Heidelberg, Germany, 2017.
28. Cohen, J. Statistical Power Analysis for the Behavioral Sciences; Routledge: Abingdon, UK, 2013.
29. Palisade Corporation. @RISK, version 8.4; Palisade Corporation: Ithaca, NY, USA, 2024.
30. Senić, A.; Dobrodolac, M.; Stojadinović, Z. Development of Risk Quantification Models in Road Infrastructure Projects. Sustainability 2024, 16, 7694.
31. Canesi, R.; Gallo, B. Risk Assessment in Sustainable Infrastructure Development Projects: A Tool for Mitigating Cost Overruns. Land 2023, 13, 41.
32. Haddad, R.K.; Harun, Z. Development of a Novel Quantitative Risk Assessment Tool for UK Road Tunnels. Fire 2023, 6, 65.
33. Chou, J.-S. Cost simulation in an item-based project involving construction engineering and management. Int. J. Proj. Manag. 2011, 29, 706–717.
34. El-Kholy, A.M.; Tahwia, A.M.; Elsayed, M.M. Prediction of simulated cost contingency for steel reinforcement in building projects: ANN versus regression-based models. Int. J. Constr. Manag. 2022, 22, 1675–1689.
35. Halil, F.M.; Ismail, H.; Hasim, M.S.; Hashim, H. A Conceptual Study on the Monte Carlo Simulation for Cost Forecasting in the Green Building Project. Environ.-Behav. Proc. J. 2020, 5, 75.
36. Halil, F.M.; Ismail, H.; Hasim, M.S.; Hashim, H. Monte Carlo Simulation for Cost Forecasting in the Green Building Project. Asian J. Qual. Life 2020, 5, 33–42.
37. Zhu, B.; Yu, L.-A.; Geng, Z.-Q. Cost estimation method based on parallel Monte Carlo simulation and market investigation for engineering construction project. Clust. Comput. 2016, 19, 1293–1308.
38. Touran, A.; Wiser, E.P. Monte Carlo Technique with Correlated Random Variables. J. Constr. Eng. Manag. 1992, 118, 258–272.
39. Touran, A. Probabilistic Cost Estimating with Subjective Correlations. J. Constr. Eng. Manag. 1993, 119, 58–71.
40. Smirnova, E. The use of the Monte Carlo method for predicting environmental risk in construction zones. J. Phys. Conf. Ser. 2020, 1614, 012083.
41. Zamani, V.; Yavari, E.; Taghaddos, H. A science mapping lens on discrete event simulation applications in construction engineering and management. Autom. Constr. 2024, 166, 105625.
42. Qazi, A.; Simsekler, M.C.E. Risk assessment of construction projects using Monte Carlo simulation. Int. J. Manag. Proj. Bus. 2021, 14, 1202–1218.
43. Zhasmukhambetova, A.; Evdorides, H.; Davies, R.J. Integrating Risk Assessment and Scheduling in Highway Construction: A Systematic Review of Techniques, Challenges, and Hybrid Methodologies. Future Transp. 2025, 5, 85.
Figure 1. Methodological Framework of the Research.
Figure 2. Simulated distributions of unit prices for key materials, in US dollars (USD). Panel (a) shows the fitted distribution for fine aggregate, and panel (b) shows the distribution for cement. Both curves include a normal distribution fit based on historical data from the Peruvian market.
Figure 3. Results of the Monte Carlo simulation of the Oyón–Ambo project cost. Panel (a) shows the total cost distribution without correlations among variables. Panel (b) incorporates a correlation matrix between the random variables and shows greater dispersion and a higher 95th percentile of the simulated total cost.
Figure 4. Histogram of Simulated Total Costs under the Correlated Monte Carlo Model.
Figure 5. Cumulative Distribution Function (CDF) of Simulated Total Costs and Budget Exceedance Probability.
Table 1. Goodness-of-fit tests for key variables of the stochastic model.
Variable | Test | Test Statistic (D / A²) | p-Value / Critical Value | Result
Portland cement for concrete | Kolmogorov–Smirnov | 0.0508 | 0.9467 | Normality not rejected
Portland cement for concrete | Anderson–Darling | 0.2534 | 0.7590 (5%) | Normality not rejected
Fuel | Kolmogorov–Smirnov | 0.0733 | 0.6293 | Normality not rejected
Fuel | Anderson–Darling | 0.6214 | 0.7590 (5%) | Normality not rejected
Fine aggregate | Kolmogorov–Smirnov | 0.0543 | 0.9136 | Normality not rejected
Fine aggregate | Anderson–Darling | 0.4066 | 0.7590 (5%) | Normality not rejected
Coarse aggregate | Kolmogorov–Smirnov | 0.0668 | 0.7405 | Normality not rejected
Coarse aggregate | Anderson–Darling | 0.3864 | 0.7590 (5%) | Normality not rejected
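The statistics in Table 1 can be computed along the lines below. This minimal sketch uses only the Python standard library. It fits a normal distribution using the sample mean and SD and returns the KS statistic D and the Anderson–Darling statistic A²; it does not compute p-values. Because the parameters are estimated from the same data, textbook KS p-values are conservative (the Lilliefors issue). Library routines such as `scipy.stats.kstest` and `scipy.stats.anderson` are better suited to formal testing. The sample data are synthetic and hypothetical.

```python
import math
import random
import statistics


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_and_ad(data: list[float]) -> tuple[float, float]:
    """KS statistic D and Anderson-Darling A^2 against a normal fitted to the data."""
    xs = sorted(data)
    n = len(xs)
    mu, sigma = statistics.fmean(xs), statistics.stdev(xs)
    f = [normal_cdf(x, mu, sigma) for x in xs]
    # D: largest gap between the empirical CDF (a step function) and the fitted CDF
    d = max(max((i + 1) / n - fi, fi - i / n) for i, fi in enumerate(f))
    # A^2 = -n - (1/n) * sum_{i=1..n} (2i - 1) [ln F(x_i) + ln(1 - F(x_{n+1-i}))]
    s = sum((2 * i + 1) * (math.log(f[i]) + math.log(1.0 - f[n - 1 - i])) for i in range(n))
    return d, -n - s / n


# Hypothetical sample standing in for 100 deflated cement price observations
rng = random.Random(0)
cement = [rng.gauss(30.0, 1.5) for _ in range(100)]
d_stat, a2_stat = ks_and_ad(cement)
print(f"D = {d_stat:.4f}, A^2 = {a2_stat:.4f}")
```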
Table 2. Variables and inputs with probabilistic behavior in the stochastic model.
Category | Random Variable | Type of Distribution
Input prices | 1. Portland cement price | Normal
Input prices | 2. Fine aggregate price | Normal
Input prices | 3. Coarse aggregate price | Normal
Input prices | 4. Fuel price | Normal
Input prices | 5. Steel price | Triangular
Input prices | 6. Cost of plasticizer additive | Triangular
Labor productivity | 7. Productivity in earthworks (m³/day) | PERT
Labor productivity | 8. Productivity in granular base spreading | PERT
Labor productivity | 9. Productivity in concrete placement | PERT
Labor productivity | 10. Productivity in asphalt layer placement | PERT
Equipment efficiency | 11. Hourly cost of front loader | Normal
Equipment efficiency | 12. Hourly cost of compactor roller | Normal
Equipment efficiency | 13. Roller efficiency (m³/h) | PERT
Equipment efficiency | 14. Motor grader efficiency | PERT
Equipment efficiency | 15. Hourly cost of concrete mixer truck | Triangular
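The three distribution families in Table 2 can be sampled with the Python standard library, as sketched below. The PERT draw uses the common λ = 4 Beta parameterization; we assume this here because the table does not state it. All parameter values are hypothetical.

```python
import random
import statistics


def sample_triangular(rng: random.Random, low: float, mode: float, high: float) -> float:
    # Note the argument order of random.triangular: (low, high, mode)
    return rng.triangular(low, high, mode)


def sample_pert(rng: random.Random, low: float, mode: float, high: float) -> float:
    """Standard PERT draw (lambda = 4): low + (high - low) * Beta(alpha, beta)."""
    span = high - low
    alpha = 1.0 + 4.0 * (mode - low) / span
    beta = 1.0 + 4.0 * (high - mode) / span
    return low + span * rng.betavariate(alpha, beta)


# Hypothetical parameters, for illustration only
rng = random.Random(2025)
cement = [rng.gauss(30.0, 1.5) for _ in range(20_000)]                       # Normal
steel = [sample_triangular(rng, 0.95, 1.00, 1.20) for _ in range(20_000)]    # Triangular (price factor)
earthworks = [sample_pert(rng, 300.0, 400.0, 450.0) for _ in range(20_000)]  # PERT, m3/day
print(round(statistics.fmean(earthworks), 1))  # close to (300 + 4*400 + 450) / 6 = 391.7
```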
Table 3. Correlated random variables in the stochastic model.
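Variable | Fuel | Heavy Machinery | Labor | Fine Aggregate | Coarse Aggregate | Roller | Compaction
Fuel | 1.00 | 0.78 |  |  |  |  | 
Heavy Machinery | 0.78 | 1.00 |  |  |  |  | 
Labor |  |  | 1.00 |  |  |  | 0.65
Fine Aggregate |  |  |  | 1.00 | 0.83 |  | 
Coarse Aggregate |  |  |  | 0.83 | 1.00 |  | 
Roller |  |  |  |  |  | 1.00 | 0.71
Compaction |  |  | 0.65 |  |  | 0.71 | 1.00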
Table 4. Comparison of statistical indicators between simulations with and without correlation.
Indicator | Without Correlation (Red) | With Correlation (Blue)
Mean (USD million) | 83.55 | 84.01
Percentile 10, P10 (USD million) | 82.77 | 83.06
Percentile 50, P50 (USD million) | 83.50 | 83.98
Percentile 90, P90 (USD million) | 84.39 | 85.00
Standard deviation (USD million) | 0.61 | 0.73
Skewness | 0.32 | 0.18
Excess kurtosis | −0.44 | −0.52
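The indicators in Table 4 can be computed from simulated totals as sketched below. Ordinary (non-excess) kurtosis cannot be negative, so the reported values of −0.44 and −0.52 can only be excess kurtosis, which is what this sketch returns. Estimator conventions vary between tools (sample versus population moments, percentile interpolation), so small differences from @RISK output should be expected. The draws in the usage example are hypothetical.

```python
import random
import statistics


def summarize(costs: list[float]) -> dict[str, float]:
    """Table 4-style indicators for simulated total costs (USD million)."""
    n = len(costs)
    mean = statistics.fmean(costs)
    deciles = statistics.quantiles(costs, n=10, method="inclusive")
    m2 = sum((c - mean) ** 2 for c in costs) / n  # central moments (population form)
    m3 = sum((c - mean) ** 3 for c in costs) / n
    m4 = sum((c - mean) ** 4 for c in costs) / n
    return {
        "mean": mean,
        "P10": deciles[0],
        "P50": deciles[4],
        "P90": deciles[8],
        "sd": statistics.stdev(costs),          # sample SD (n - 1 denominator)
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,  # 0 for a normal distribution
    }


# Hypothetical draws, not the study's simulation output
rng = random.Random(5)
costs = [rng.gauss(84.0, 0.7) for _ in range(10_000)]
print({k: round(v, 2) for k, v in summarize(costs).items()})
```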
Table 5. Exceedance Risk and Expected Overrun Proxy by Budget Threshold.
Budget Threshold | Exceedance Probability (%) | Gap vs. P95 with ρ (USD M) | Expected Overrun Proxy (USD M)
USD 84.00 M | 45.4 | 2.26 | 1.03
USD 84.50 M | 28.7 | 1.76 | 0.51
P90 = USD 85.00 M | 16.8 | 1.26 | 0.21
P95 = USD 86.26 M | ~3.0 | 0.00 | 0.00
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zavala, G.; Ariza Flores, V.; Santos, R.; Blas Cano, J. Stochastic Cost Estimation in Transportation Infrastructure Projects Using Monte Carlo Simulation and Correlated Risk Variables. Future Transp. 2025, 5, 176. https://doi.org/10.3390/futuretransp5040176

AMA Style

Zavala G, Ariza Flores V, Santos R, Blas Cano J. Stochastic Cost Estimation in Transportation Infrastructure Projects Using Monte Carlo Simulation and Correlated Risk Variables. Future Transportation. 2025; 5(4):176. https://doi.org/10.3390/futuretransp5040176

Chicago/Turabian Style

Zavala, Gerber, Victor Ariza Flores, Ricardo Santos, and Jaime Blas Cano. 2025. "Stochastic Cost Estimation in Transportation Infrastructure Projects Using Monte Carlo Simulation and Correlated Risk Variables" Future Transportation 5, no. 4: 176. https://doi.org/10.3390/futuretransp5040176

APA Style

Zavala, G., Ariza Flores, V., Santos, R., & Blas Cano, J. (2025). Stochastic Cost Estimation in Transportation Infrastructure Projects Using Monte Carlo Simulation and Correlated Risk Variables. Future Transportation, 5(4), 176. https://doi.org/10.3390/futuretransp5040176
