Article

Project Cost Overrun Risk Prediction Using Hidden Markov Chain Analysis

Department of Civil and Construction Engineering, National Taiwan University of Science and Technology, 43 Keelung Road, Section 4, Taipei 106, Taiwan
*
Author to whom correspondence should be addressed.
Buildings 2023, 13(3), 667; https://doi.org/10.3390/buildings13030667
Submission received: 9 January 2023 / Revised: 26 February 2023 / Accepted: 1 March 2023 / Published: 2 March 2023
(This article belongs to the Section Construction Management, and Computers & Digitization)

Abstract

Construction project cost overrun is a common problem in the construction industry. The cost of construction projects is thought to have increased by approximately 33% on average. Several studies of construction project cost overrun have been conducted, and these generally rely on historical data. However, because each project has its own characteristics and cost trend, real-time project cost data are more reliable for forecasting that project's cost trend. This paper proposes a real-time hidden Markov chain (HMM) model to predict cost overrun risk based on project-owned cost performance data and on the corrective actions, if adopted. The occurrence of cost overrun events in this model was assumed to follow a Poisson arrival pattern. A real-time HMM with a particle filter was used to run the simulation. One SRC building project in Taiwan was used for model validation and comparison. The posterior probabilities from the real-time HMM model were highly consistent with the cost overrun ratios of real construction projects. The proposed cost overrun prediction model could provide an early alert of cost overruns to the project manager. Based on the survey of cost overrun risk and significantly influential factors, we propose effective cost management plans to reduce the frequency of project cost overrun.

1. Introduction

One common problem in the construction industry is project cost overrun. The cost of construction projects is thought to have increased by approximately 33% on average [1]. In the construction project cost domain, many studies have focused on developing methodologies that incorporate the effects of uncertainty on project cost overruns. Most of them rely heavily on historical data. Nevertheless, each project may have its own characteristics and cost trend. Historical data are best used as prior information at the beginning of the project. During project operation, real-time project cost data are more reliable for forecasting the project's own cost trend. As explored in past research, one of the most important requirements of a cost system is to give a trustworthy warning of cost overruns as early as possible [2,3].
Many cost prediction techniques have been explored in the construction industry, such as regression, simulation, artificial neural network (ANN), and fuzzy sets [4,5,6,7]. The main difference is the input data for training and testing; i.e., whether the historical data are from the industry or the project itself. Additionally, in practice, the project manager needs to assess the effect of the corrective actions if they are adopted to minimize the expected variances from planned performance. It is more productive for a cost overrun prediction model to take the effect of corrective action into consideration.
Many previous studies have developed macro-level prediction models that need a large amount of historical construction project data or questionnaire data as input for model construction. Few studies support the assessment of cost overrun based on real-time project-owned data (micro-level). Additionally, previous models seldom consider the effect of corrective action. To overcome these limitations of traditional cost overrun prediction models, the model in this paper uses an HMM with a Poisson process to relate the influence of cost overrun factors to the occurrence of cost overrun events, and then adopts the particle filter algorithm to perform the sampling based on the assessment data. In this model, only the project’s own cost data and status reports are used. The model attempts to forecast the cost overrun probability based on the project-owned cost performance data and the corrective action, if adopted. The significant features of the model are: (1) the model is constructed without historical data as prior information; (2) it mainly relies on the project cost records and status reports to perform real-time prediction; (3) it can assess the effect of the cost corrective action after the occurrence of cost overrun; and (4) the potential cause combinations with a high possibility of affecting the project cost overrun can be supplementarily surveyed using a sensitivity analysis inside HMM inference.
This paper is organized as follows. First, the literature on cost overrun prediction models and on the factors affecting cost overrun is reviewed. Second, the Poisson cost overrun model is presented, followed by a real-time HMM algorithm with a particle filter for estimating the cost overrun probability. Finally, an SRC project in Taiwan is employed to verify the applicability of the real-time HMM algorithm.

2. Literature Review

The prediction models can be generally classified into two categories: the causal model and the time-sequential model. The causal prediction model must collect, compare, and summarize the common significant causes to build the model. The time-sequential model mainly relies on the historical data of surveyed targets to make the model, such as EVM extrapolation. In previous studies, various statistical and artificial intelligence methods and tools have been used to solve the problem of predicting construction costs and cost overruns in the construction projects, such as regressions, neural networks, machine learning, fuzzy logic, Bayesian network, simulation, etc. [4,6,7,8,9,10,11,12,13,14]. These previous studies have mainly focused on the macro level for the overall assessment (e.g., early budget estimates) using various statistical and artificial intelligence methods. Using the comprehensive evaluation, macro-level factors are generally defined for the model construction. They can be project scope, project size, project duration, etc.
As stated above, whilst each project has its own project characteristics and cost trend, real-time project-owned cost data are more reliable for forecasting their own cost trend. This study plans to develop a prediction model only based on the project’s own cost data. In addition to conventional EVM extrapolation methods (e.g., linear, exponential, and trend extrapolation), some deterministic and stochastic models were proposed in the past. Chen et al. [15] proposed a straightforward modeling method for improving the predictive power of the planned value (PV) so that the earned value (EV) and actual cost value (AC) could also be correspondingly improved. Acebes et al. [16] drew upon Monte Carlo simulation to obtain information about the expected behavior of the project and then used statistical learning methodologies to detect the project deviations. Sackey et al. [17] adopted linear regression and time series to predict duration at completion based on the actual time spent on each activity. Zhao and Zi [18] applied the exponential smoothing technique to forecast project costs at EVM. Yu et al. [19] proposed an active construction dynamic schedule management model based on fuzzy earned value management and a BP neural network to predict project duration under risk.
Based on the survey above, the limitations of previous research can be summarized as follows: (1) many previous researchers have developed macro-level models that require a large amount of historical construction project data or questionnaire data as input for model construction, whereas few studies support the assessment of cost overrun based on real-time project-owned data (micro-level); and (2) previous models have seldom considered the effect of corrective action. The model presented in this research estimates the cost overrun probability from the project-owned cost performance data and the corrective action, if adopted during project execution. Furthermore, based on the common definition of cost overrun factors for a project, the potential cause combinations with a high possibility of affecting the project cost overrun were supplementarily surveyed using sensitivity analysis inside HMM inference.
As stated above, the model proposed in this paper focused on the construction of the time-sequential prediction model using HMM with a particle filter approach. For the further identification of the potential causes and combinations that affect the project cost overruns using sensitivity analysis inside HMM inference, this paper surveyed the common classification of cost overrun factors. This classification definition was used to consistently categorize project-specific cost overrun causes. The classification of cost overrun factors is diversified based on the research focuses and purposes [1,20,21,22,23,24,25,26,27,28]. For the overall assessment, macro-level factors are generally defined for the model construction. They can be project scope, project size, project duration, etc. As discussed above, it may be more reliable to adopt project-owned cost performance data to estimate and control project cost overrun during project execution. These factors belong to the project-specific level (micro level); i.e., they are generally stepwise assessed and recorded in the project cost reports based on cost performance outcome and the corresponding influence factors. Yeo [27] claimed that the scope and quantity increases, engineering and design changes, underestimation, and unforeseen conditions could cause cost overrun risks. Elinwa and Buba [1] summarized three influence factors of cost overrun: the cost of materials, management practices, and fluctuation in material prices. Based upon the study of Kaming et al. [29], inflationary increases in material cost, inaccurate material estimation, and project complexity were the three main cost overrun factors. Dissanayaka and Kumaraswamy [21] indicated the cost overrun factors to be the construction team, risk retained by a client, project complexity, and payment modality. Wang and Demsetz [25,26] summarized five significant cost overrun factors: approval delay, weather, material delivery, labor, and equipment. 
In the study of Elhag et al. [22], several external and internal cost overrun factors were summarized as client characteristics, consultant and design parameters, contractor attributes, project characteristics, contract procedures, and procurement methods, as well as external factors and market conditions. Aljohani et al. [30] intensively surveyed the causes of construction project overrun based on a literature review and summarized 173 causes of cost overrun in seventeen internal and external frameworks. Xie et al. [28] surveyed the critical influence factors in construction projects using fuzzy synthetic evaluation. There were 65 critical factors covered in the research which were classified into four categories: project macro, project management, project environment, and core stakeholder.
The cost overrun factors clearly differ across the aforementioned studies. This research attempts to forecast the cost overrun probability based on the project-owned cost performance data and the corrective action, if adopted. By unifying the factors proposed in the previous studies, these attributes were re-classified based on their common characteristics. Five significant project-specific classification factors were defined: weather, productivity, material, equipment, and management. The project cost tends to overrun if these factors are in poor status during project execution. The real-time status of these factors can be summarized and surveyed from the project reports and checklists. Based on the performance data and the corrective actions input to the model, the cost overrun risk can be assessed in a timely manner and the effect of the corrective action can also be surveyed. Accordingly, the potential cause combinations with a high possibility of affecting the project cost overrun can be supplementarily surveyed using sensitivity analysis. Based on the survey of cost overrun risk and the most likely influence factors, the project management division can establish effective cost–risk treatment plans in a timely manner.

3. Real-Time Cost Overrun Prediction Method

To achieve the aforementioned objective, we propose a real-time HMM method to forecast the cost overrun probability based on the cost performance data and the adopted corrective actions. In the model, a Poisson process was used to simulate cost overrun events with unknown arrival rates and impacts. The effect of corrective action was also unknown and was defined as an unknown modeling parameter. An HMM algorithm using a particle filter was proposed to learn the unknown parameters and update the cost overrun probability in real time. The overall analysis process of the proposed model is illustrated in Figure 1. It is mainly composed of a Poisson cost overrun model and a real-time Bayesian updating model, which are explained in detail in the following.

3.1. Poisson Cost Overrun Model

The proposed Poisson cost overrun model consists of three modules: (1) cost overrun events occurrence module; (2) corrective action module; and (3) cost status assessment module. They are described in detail in the following:

3.1.1. Cost Overrun Events Occurrence Module

This study modeled the cost overrun events over the project operation lifecycle as a stochastic process. Cost overruns can be regarded as discrete rare events compared with regular cost conditions [9,31]. Following Touran [9], this paper assumes a Poisson arrival pattern and independent random variables for the cost overrun events. A cost overrun event is described as a random event following a Poisson process with a mean rate of occurrence equal to μ per unit of time, and each occurrence contributes a cost overrun amount equal to λ. In most cases, λ and μ are unknown.
The discrete-time index $k$ represents time. Let $X_k$ be the accumulated number of cost overrun events at discrete time $k$; then

$$X_{k+1} = X_k + V_k, \qquad k = 0, 1, \ldots, T-1 \tag{1}$$

where $T$ is the total discrete-time duration of interest, and $V_k$ follows a Poisson distribution with mean $\mu \Delta t$, i.e.,

$$P(V_k = \tau) = \frac{(\mu \Delta t)^{\tau} e^{-\mu \Delta t}}{\tau!} \tag{2}$$

Assume that there is no cost overrun at the beginning of the project, i.e., $X_0 = 0$. Since the Poisson process is memoryless, $V_0, V_1, \ldots, V_{T-1}$ are independent and identically distributed, so $X_0, X_1, \ldots, X_T$ form a Markov chain. The actual accumulated amount of cost overrun at time instant $k$ is $\lambda X_k$, and the cost overrun probability at time $k$ is $P(\lambda X_k > 1)$. If a previously known value of $X_n$ ($n < k$) is $x_n$, the cost overrun probability at time $k$ is

$$P[\lambda X_k > 1 \mid \lambda, \mu, x_n] = P\left(X_k - x_n > \frac{1}{\lambda} - x_n \,\middle|\, \lambda, \mu, x_n\right) = \sum_{\tau = \lceil 1/\lambda - x_n \rceil}^{\infty} \frac{[(k-n)\mu \Delta t]^{\tau} e^{-(k-n)\mu \Delta t}}{\tau!} \tag{3}$$

where $\lceil \cdot \rceil$ represents the smallest integer greater than the enclosed real number. Here, we use the fact that $X_k - x_n$ follows a Poisson distribution with mean $(k-n)\mu \Delta t$.
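As a rough illustration of the cost overrun probability above, the Poisson tail sum can be evaluated numerically as one minus a finite sum. The sketch below is not the authors' code; the function names `poisson_tail` and `overrun_probability` and the default `dt=1.0` are assumptions made for illustration.

```python
import math


def poisson_tail(m: int, mean: float) -> float:
    """Return P(V >= m) for V ~ Poisson(mean)."""
    if m <= 0:
        return 1.0
    # Accumulate the pmf for tau = 0 .. m-1, building each term from the
    # previous one so that no large factorial is ever formed.
    term = math.exp(-mean)  # pmf at tau = 0
    cdf = term
    for tau in range(1, m):
        term *= mean / tau
        cdf += term
    return max(0.0, 1.0 - cdf)


def overrun_probability(lam, mu, x_n, n, k, dt=1.0):
    """P(lam * X_k > 1 | lam, mu, X_n = x_n) for k > n (hypothetical helper)."""
    # Smallest integer strictly greater than 1/lam - x_n, matching the
    # strict inequality X_k - x_n > 1/lam - x_n.
    m = math.floor(1.0 / lam - x_n) + 1
    return poisson_tail(m, (k - n) * mu * dt)


if __name__ == "__main__":
    # Illustrative values only: lam = 0.25, mu = 0.1 per step, one event so far.
    print(overrun_probability(0.25, 0.1, 1, 0, 10))
```

If the project is already overrun at time n (λx_n > 1), the lower summation limit is non-positive and the function returns 1.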

3.1.2. Corrective Action Module

The effect of corrective actions is also defined in the model, to overcome the limitation of previous research in which corrective action was neither covered nor assessed. It is assumed that if the cost is overrun at time instant $m$, corrective action is taken at that time. In practice, it is hoped that the actual cost (AC) returns to the planned value (PV) after the corrective action is applied. However, because of a remaining performance gap, AC may not return to PV even after the corrective action is taken. It is therefore reasonable to assume that, if the corrective action is taken at time instant $m$, $\lambda X_m$ is set equal to a random number $\nu \in [0, 1]$ (i.e., $X_m = \nu / \lambda$).

3.1.3. Cost Status Assessment Module

The judgment of cost status may be affected by noisy information, such as incomplete progress data and subjective experience. An assessment random variable $Y_i$ at time instant $i$ is therefore defined to judge whether the cost is overrun, i.e., to determine whether $\lambda X_i$ is greater than 1. Given $\lambda$ and $X_i = x_i$, the probability of cost overrun is

$$P(Y_i = 1 \mid \lambda, x_i, \alpha) = \frac{1}{1 + e^{-\alpha(\lambda x_i - 1)}} \tag{4}$$

It is assumed that the cost overrun status assessment data are $D_{1:k} = \{\hat{Y}_i : i = 1, \ldots, k\}$, where $\hat{Y}_i$ is the noisy assessment at time $i$: 1 means that the cost is identified as overrun, −1 means that the cost is identified as underrun, and 0 means that the cost is identified as within budget. The variable $\alpha$ is an unknown parameter that characterizes the degree of uncertainty of the cost status assessment outcome. A large $\alpha$ means a more accurate assessment, and a small $\alpha$ represents a poor and noisy assessment.
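The role of α in the assessment model can be seen with a small numerical sketch. The function name `assessment_overrun_probability` and the sample values are assumptions for illustration only.

```python
import math


def assessment_overrun_probability(lam: float, x: float, alpha: float) -> float:
    """Logistic probability that the assessment reports an overrun.

    Equals 1 / (1 + exp(-alpha * (lam * x - 1))), evaluated in a
    numerically stable way for large |alpha|.
    """
    z = alpha * (lam * x - 1.0)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


if __name__ == "__main__":
    # Illustrative values: lam * x = 1.2, i.e. the true cost is overrun.
    for alpha in (2.0, 10.0, 31.0):
        print(alpha, assessment_overrun_probability(0.1, 12, alpha))
```

For a true overrun, the reported overrun probability moves from near 0.5 toward 1 as α grows, which matches the reading of α as assessment accuracy.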

3.2. Real-Time Bayesian Updating Model

As discussed above, the model parameters λ, μ, α, and ν are usually unknown. Their most probable values must be determined from the actual performance data in the project report. This paper uses a Bayesian updating approach to estimate the probability distribution of the parameters from the project performance data. The overall data sampling process of the real-time Bayesian updating model based on the particle filter is depicted in Figure 2.

3.2.1. Real-Time Estimation and Prediction Algorithms

Given the assessment data $D_{1:k}$, samples of $\lambda, \mu, \alpha, \nu, X_k$ can be drawn from $f(\lambda, \mu, \alpha, \nu, x_k \mid D_{1:k})$ by the stochastic simulation methods discussed below. Let those samples be denoted by $\{\lambda_k^{(j)}, \mu_k^{(j)}, \alpha_k^{(j)}, \nu_k^{(j)}, X_k^{(j)} : j = 1, \ldots, N\}$, where $N$ is the total number of samples. Once the initial samples are appropriately obtained, the real-time estimation algorithms are derived as follows.

3.2.2. Real-Time Cost Overrun Probability

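According to the law of large numbers, the real-time cost overrun probability can be estimated as

$$\begin{aligned}
P(\lambda X_k > 1 \mid D_{1:k}) &= \int P(\lambda X_k > 1 \mid D_{1:k}, \lambda, x_k)\, f(\lambda, x_k \mid D_{1:k})\, d\lambda\, dx_k \\
&\approx \frac{1}{N} \sum_{j=1}^{N} P(\lambda X_k > 1 \mid D_{1:k}, \lambda_k^{(j)}, X_k^{(j)}) \\
&= \frac{1}{N} \sum_{j=1}^{N} P(\lambda X_k > 1 \mid \lambda_k^{(j)}, X_k^{(j)}) \\
&= \frac{1}{N} \sum_{j=1}^{N} I(\lambda_k^{(j)} X_k^{(j)} > 1)
\end{aligned} \tag{5}$$

where $I(\cdot)$ is the indicator function.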
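In practice, this estimate is simply the fraction of samples whose accumulated overrun exceeds 1. A minimal sketch follows, in which the function name and the four sample pairs are made up for illustration:

```python
def realtime_overrun_probability(lams, xs):
    """Fraction of samples (lam_j, X_j) with lam_j * X_j > 1."""
    if len(lams) != len(xs) or not lams:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for lam, x in zip(lams, xs) if lam * x > 1.0)
    return hits / len(lams)


if __name__ == "__main__":
    # Four hypothetical samples of (lam, X_k); two of them are overrun.
    lams = [0.2, 0.5, 0.1, 0.3]
    xs = [6, 1, 5, 4]
    print(realtime_overrun_probability(lams, xs))
```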

3.2.3. Future Cost Overrun Probability

Moreover, if $k > T$, $P(\lambda X_k > 1 \mid D_{1:T})$ stands for the cost overrun probability at future time $k$ given the past data $D_{1:T}$. Based on the law of large numbers,

$$\begin{aligned}
P(\lambda X_k > 1 \mid D_{1:T}) &= \int P(\lambda X_k > 1 \mid \lambda, \mu, x_T, D_{1:T})\, f(\lambda, \mu, x_T \mid D_{1:T})\, d\lambda\, d\mu\, dx_T \\
&= \int P(\lambda X_k > 1 \mid \lambda, \mu, x_T)\, f(\lambda, \mu, x_T \mid D_{1:T})\, d\lambda\, d\mu\, dx_T \\
&\approx \frac{1}{N} \sum_{j=1}^{N} P(\lambda X_k > 1 \mid \lambda_T^{(j)}, \mu_T^{(j)}, X_T^{(j)}) \\
&= \frac{1}{N} \sum_{j=1}^{N} \left[ \sum_{\tau = \lceil 1/\lambda_T^{(j)} - X_T^{(j)} \rceil}^{\infty} \frac{[(k-T)\mu_T^{(j)} \Delta t]^{\tau} e^{-(k-T)\mu_T^{(j)} \Delta t}}{\tau!} \right]
\end{aligned} \tag{6}$$

where $\{\lambda_T^{(j)}, \mu_T^{(j)}, X_T^{(j)} : j = 1, \ldots, N\}$ are samples from $f(\lambda, \mu, x_T \mid D_{1:T})$, under the assumption that, conditional on $X_T$, $D_{1:T}$ and $X_k$ are independent.
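The future probability averages a Poisson tail over the samples available at time T. The following sketch assumes each sample is a `(lam, mu, x_T)` tuple and that Δt defaults to one step; these names and the example values are illustrative, not taken from the paper.

```python
import math


def poisson_sf(m: int, mean: float) -> float:
    """Return P(V >= m) for V ~ Poisson(mean)."""
    if m <= 0:
        return 1.0
    term = math.exp(-mean)  # pmf at tau = 0
    cdf = term
    for tau in range(1, m):
        term *= mean / tau
        cdf += term
    return max(0.0, 1.0 - cdf)


def future_overrun_probability(samples, steps_ahead, dt=1.0):
    """Average over samples of P(lam * X_k > 1), with k - T = steps_ahead."""
    total = 0.0
    count = 0
    for lam, mu, x_T in samples:
        # Events still needed: smallest integer strictly above 1/lam - x_T.
        m = math.floor(1.0 / lam - x_T) + 1
        total += poisson_sf(m, steps_ahead * mu * dt)
        count += 1
    return total / count


if __name__ == "__main__":
    # Two hypothetical samples: one already overrun, one needing 4 more events.
    samples = [(0.5, 0.1, 3), (0.25, 0.1, 1)]
    print(future_overrun_probability(samples, steps_ahead=10))
```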

3.2.4. Simulation Sampling

A prerequisite for computing all of these estimates is the ability to draw samples from $f(\lambda, \mu, \alpha, \nu, x_k \mid D_{1:k})$. It is therefore vital to find a real-time sample drawing mechanism: given samples $\{\lambda_k^{(j)}, \mu_k^{(j)}, \alpha_k^{(j)}, \nu_k^{(j)}, X_k^{(j)} : j = 1, \ldots, N\}$ distributed as $f(\lambda, \mu, \alpha, \nu, X_k \mid D_{1:k})$ and the new data $\hat{Y}_{k+1}$, samples $\{\lambda_{k+1}^{(j)}, \mu_{k+1}^{(j)}, \alpha_{k+1}^{(j)}, \nu_{k+1}^{(j)}, X_{k+1}^{(j)} : j = 1, \ldots, N\}$ distributed as $f(\lambda, \mu, \alpha, \nu, X_{k+1} \mid D_{1:k+1})$ can be acquired with no reference to the results from earlier time steps. The real-time Bayesian updating algorithm uses a particle filter approach for this purpose.

3.2.5. Model Revisited

Before the brief introduction of the particle filter algorithm, the model below is defined following Equation (1).

$$\begin{bmatrix} X_{k+1} \\ \lambda_{k+1} \\ \mu_{k+1} \\ \alpha_{k+1} \\ \nu_{k+1} \end{bmatrix} = \begin{bmatrix} X_k \\ \lambda_k \\ \mu_k \\ \alpha_k \\ \nu_k \end{bmatrix} + \begin{bmatrix} V_k \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \qquad X_0 = 0, \quad V_k \sim \text{Poisson}(\mu_k \Delta t), \quad k = 0, 1, \ldots, T-1 \tag{7}$$

$$\lambda_0 \sim f(\lambda), \qquad \mu_0 \sim f(\mu), \qquad \alpha_0 \sim f(\alpha), \qquad \nu_0 \sim f(\nu)$$

where $X_k$, $\lambda_k$, $\mu_k$, $\alpha_k$, and $\nu_k$ are the model “state variables”. This model explicitly states the prior probability density functions (PDFs) for the uncertain variables $\lambda$, $\mu$, $\alpha$, and $\nu$. Note that the values of the parameters $\lambda_k$, $\mu_k$, $\alpha_k$, and $\nu_k$ remain fixed over time.
The model in Equation (7) describes the evolution of the model state without considering corrective actions. If a corrective action is conducted at time $m$, $X_m$ is readjusted to $\nu_m / \lambda_m$, where $\nu_m = 0$ if AC returns to PV when the corrective action is taken and $\nu_m > 0$ if AC does not return to PV even under the corrective action. Notice that, although Equation (7) describes the state evolution, the actual values of the state are undetermined, since $\lambda_0$, $\mu_0$, $\alpha_0$, and $\nu_0$ are uncertain and $\{V_k : k = 0, \ldots, T-1\}$ are also uncertain.

3.2.6. Particle Filter Approach and Process

The values of $X_k$, $\lambda_k$, $\mu_k$, $\alpha_k$, and $\nu_k$ based on Equation (7) are further simulated from $f(\lambda, \mu, \alpha, \nu, X_k \mid D_{1:k})$ by incorporating the real-time assessment data $D_{1:k}$ with the particle filter algorithm. To simplify notation, the state at time $k$ is defined as $Z_k$, i.e., $Z_k = \{\lambda_k, \mu_k, \alpha_k, \nu_k, X_k\}$. The simulation process using the particle filter algorithm is explained as follows.
Given samples $\{Z_k^{(j)} : j = 1, \ldots, N\}$ distributed as $f(z_k \mid D_{1:k})$ and the new assessment data $\hat{Y}_{k+1}$, samples $\{Z_{k+1}^{(j)} : j = 1, \ldots, N\}$ distributed as $f(z_{k+1} \mid D_{1:k+1})$ can be obtained without referring to the results of earlier time instants. Once the initial samples are drawn from $f(z_0 \mid D_{1:0})$, the particle filter algorithm makes it possible to sample from $f(z_k \mid D_{1:k})$ at any time instant $k$. Note that the initial PDF $f(z_0 \mid D_{1:0})$ is simply the prior $f(\lambda_0, \mu_0, \alpha_0, \nu_0, x_0)$, which can be easily sampled. The subsequent steps of the particle filter algorithm are presented as pseudocode right after the derivations.
Let $\{Z_k^{(j)} : j = 1, \ldots, N\}$ be the samples from $f(z_k \mid D_{1:k})$. By the law of large numbers, $f(z_k \mid D_{1:k})$ can be approximated as:

$$f(z_k \mid D_{1:k}) \approx \frac{1}{N} \sum_{j=1}^{N} \delta(z_k - Z_k^{(j)}) \tag{8}$$
where $\delta$ is the Dirac delta function. According to Bayes’ rule:

$$\begin{aligned}
f(z_{k+1} \mid D_{1:k+1}) &= \frac{f(z_{k+1}, D_{1:k+1})}{f(D_{1:k+1})} = \frac{\int f(z_k, z_{k+1}, D_{1:k}, \hat{Y}_{k+1})\, dz_k}{f(D_{1:k+1})} \\
&= \frac{\int f(\hat{Y}_{k+1} \mid z_k, z_{k+1}, D_{1:k})\, f(z_{k+1} \mid z_k, D_{1:k})\, f(z_k \mid D_{1:k})\, dz_k}{f(\hat{Y}_{k+1} \mid D_{1:k})} \\
&= \frac{\int f(\hat{Y}_{k+1} \mid z_{k+1})\, f(z_{k+1} \mid z_k)\, f(z_k \mid D_{1:k})\, dz_k}{f(\hat{Y}_{k+1} \mid D_{1:k})} \\
&\approx \frac{\int f(\hat{Y}_{k+1} \mid z_{k+1})\, f(z_{k+1} \mid z_k)\, \frac{1}{N} \sum_{j=1}^{N} \delta(z_k - Z_k^{(j)})\, dz_k}{f(\hat{Y}_{k+1} \mid D_{1:k})} \\
&= \frac{1}{N} \sum_{j=1}^{N} \left[ \frac{f(\hat{Y}_{k+1} \mid z_{k+1})\, f(z_{k+1} \mid Z_k^{(j)})}{f(\hat{Y}_{k+1} \mid D_{1:k})} \right]
\end{aligned} \tag{9}$$

where the derivation assumes that, conditional on $Z_k$, $D_{1:k}$ and $Z_{k+1}$ are independent, and that, conditional on $Z_{k+1}$, $\hat{Y}_{k+1}$ is independent of $Z_k$ and $D_{1:k}$.
Drawing samples from the mixture of $N$ PDFs in proportion to $f(\hat{Y}_{k+1} \mid z_{k+1})\, f(z_{k+1} \mid Z_k^{(j)})$ in Equation (9) amounts to drawing samples from $f(z_{k+1} \mid D_{1:k+1})$. There are several ways to draw $N$ samples from the mixture PDF; a simple one is sampling-importance resampling (SIR), which is explained as follows.
Given a previous sample $Z_k^{(j)}$, the main SIR task is to draw $Z_{k+1}^{C(j)}$ (C stands for candidate) from $f(z_{k+1} \mid Z_k^{(j)})$. The candidates $\{Z_{k+1}^{C(j)} : j = 1, \ldots, N\}$ are drawn first. Suppose the previous sample $Z_k^{(j)}$ contains $X_k^{(j)}$, $\lambda_k^{(j)}$, $\mu_k^{(j)}$, $\alpha_k^{(j)}$, and $\nu_k^{(j)}$; a sample from $f(z_{k+1} \mid Z_k^{(j)})$ can then be obtained by letting

$$\begin{aligned}
X_{k+1}^{C(j)} &= X_k^{(j)} + V_k, \qquad V_k \sim \text{Poisson}(\mu_k^{(j)} \Delta t) \\
\lambda_{k+1}^{C(j)} &= \lambda_k^{(j)} \\
\mu_{k+1}^{C(j)} &= \mu_k^{(j)} \\
\alpha_{k+1}^{C(j)} &= \alpha_k^{(j)} \\
\nu_{k+1}^{C(j)} &= \nu_k^{(j)}
\end{aligned} \tag{10}$$
The drawing process above applies directly when the cost is underrun. When a cost overrun or an in-budget status is assessed at time $k+1$, i.e., $\hat{Y}_{k+1} \neq -1$, these candidates are not yet distributed as $f(z_{k+1} \mid D_{1:k+1})$, because they do not include the new information $\hat{Y}_{k+1}$. An importance weight from $\{w_{k+1}^{(j)} : j = 1, \ldots, N\}$ is therefore attached to each candidate:

$$\begin{aligned}
w_{k+1}^{(j)} &= \frac{f(Z_{k+1}^{C(j)} \mid D_{1:k+1})}{f(Z_{k+1}^{C(j)} \mid D_{1:k})} \propto \frac{f(Z_{k+1}^{C(j)} \mid D_{1:k})\, f(\hat{Y}_{k+1} \mid Z_{k+1}^{C(j)})}{f(Z_{k+1}^{C(j)} \mid D_{1:k})} = f(\hat{Y}_{k+1} \mid Z_{k+1}^{C(j)}) \\
&= \left[ \frac{1}{1 + e^{-\alpha_{k+1}^{C(j)} (\lambda_{k+1}^{C(j)} X_{k+1}^{C(j)} - 1)}} \right]^{\hat{Y}_{k+1}} \left[ 1 - \frac{1}{1 + e^{-\alpha_{k+1}^{C(j)} (\lambda_{k+1}^{C(j)} X_{k+1}^{C(j)} - 1)}} \right]^{1 - \hat{Y}_{k+1}}
\end{aligned} \tag{11}$$

where $w_{k+1}^{(j)}$ reflects the relative plausibility of candidate $Z_{k+1}^{C(j)}$ in light of the new information $\hat{Y}_{k+1}$. Once the weights are obtained, the samples of $f(z_{k+1} \mid D_{1:k+1})$, denoted by $\{Z_{k+1}^{(j)} : j = 1, \ldots, N\}$, can be obtained by resampling $\{Z_{k+1}^{C(j)} : j = 1, \ldots, N\}$ according to their weights $\{w_{k+1}^{(j)} : j = 1, \ldots, N\}$, i.e., for each $j = 1, \ldots, N$, let $Z_{k+1}^{(j)} = Z_{k+1}^{C(l)}$ with the probability

$$\frac{w_{k+1}^{(l)}}{\sum_{i=1}^{N} w_{k+1}^{(i)}}, \qquad l = 1, \ldots, N \tag{12}$$
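One SIR update (candidate draw, weighting, resampling) can be sketched in a few lines. This is a simplified illustration rather than the authors' implementation, and it rests on several assumptions: the assessment is treated as binary (1 for overrun, 0 otherwise) so that the Bernoulli weight is well defined, Δt is one step, each particle is a plain dictionary with hypothetical keys `x`, `lam`, `mu`, `alpha`, `nu`, and Poisson draws use Knuth's multiplication method, which is adequate only for small means.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw from Poisson(mean) using Knuth's multiplication method."""
    limit = math.exp(-mean)
    count, prod = 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def overrun_prob(lam, x, alpha):
    """Stable logistic 1 / (1 + exp(-alpha * (lam * x - 1)))."""
    z = alpha * (lam * x - 1.0)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def sir_step(particles, y_new, rng):
    """Advance all particles by one time step given assessment y_new (0 or 1)."""
    candidates, weights = [], []
    for p in particles:
        c = dict(p)  # parameters lam, mu, alpha, nu stay fixed
        c["x"] = p["x"] + sample_poisson(p["mu"], rng)  # candidate draw
        p_over = overrun_prob(c["lam"], c["x"], c["alpha"])
        candidates.append(c)
        weights.append(p_over if y_new == 1 else 1.0 - p_over)  # importance weight
    if sum(weights) == 0.0:
        weights = [1.0] * len(candidates)  # degenerate case: fall back to uniform
    # Resample N candidates with probability proportional to their weights.
    chosen = rng.choices(candidates, weights=weights, k=len(candidates))
    return [dict(c) for c in chosen]


if __name__ == "__main__":
    rng = random.Random(0)
    particles = [
        {"x": 0, "lam": rng.uniform(0.0001, 1.0), "mu": rng.uniform(0.001, 0.5),
         "alpha": rng.uniform(10.0, 31.0), "nu": rng.uniform(0.1, 1.0)}
        for _ in range(1000)
    ]
    particles = sir_step(particles, y_new=0, rng=rng)
    print(sum(p["lam"] * p["x"] > 1.0 for p in particles) / len(particles))
```

Resampling with replacement keeps the particle count at N while concentrating particles on candidates that agree with the new assessment.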

3.2.7. Check for Corrective Actions

If a corrective action is executed at time instant $k+1$ because the actual cost (AC) goes above the planned value (PV), let $X_{k+1}^{(j)} = \nu_{k+1}^{(j)} / \lambda_{k+1}^{(j)}$ for $j = 1, \ldots, N$. If a corrective action is taken and AC is assumed to return to PV, let $X_{k+1}^{(j)} = 0$ for $j = 1, \ldots, N$. The simulated samples $\{Z_k^{(j)} : j = 1, \ldots, N\}$ are distributed as $f(z_k \mid D_{1:k})$: the $\lambda, \mu, \alpha, \nu$ parts of the samples are distributed as $f(\lambda, \mu, \alpha, \nu \mid D_{1:k})$, and the $X_k$ parts are distributed as $f(x_k \mid D_{1:k})$. These samples can be combined to estimate the cost overrun probability at every time instant in real time.
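The corrective-action reset is a per-particle overwrite of the accumulated overrun count. A minimal sketch, assuming the same hypothetical dictionary layout for particles (keys `x`, `lam`, `nu`) and a flag `returns_to_pv` that is not part of the original notation:

```python
def apply_corrective_action(particles, returns_to_pv=False):
    """Reset X for every particle after a corrective action.

    returns_to_pv=True models AC returning to PV (X = 0); otherwise
    X = nu / lam, so that lam * X equals the particle's own nu.
    """
    updated = []
    for p in particles:
        q = dict(p)
        q["x"] = 0.0 if returns_to_pv else p["nu"] / p["lam"]
        updated.append(q)
    return updated


if __name__ == "__main__":
    particles = [{"x": 5, "lam": 0.25, "nu": 0.5}, {"x": 3, "lam": 0.5, "nu": 0.2}]
    for p in apply_corrective_action(particles):
        print(p["x"], p["lam"] * p["x"])
```

After the reset, λX equals ν for each particle, so a particle with ν < 1 is back below the overrun threshold.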

4. Model Validation and Explanation

4.1. Parameter Tuning

Before the validation against a real project case, the parameters of the real-time HMM cost overrun prediction model were tuned against a simulated example taken from Barraza et al. [32]. The simulated example is a bridge construction project consisting of a prestressed concrete girder bridge of three 30 m spans and a cast-in-situ deck, supported on two river piers and two abutments on level banks. The planned cost and the duration of the bridge project activities are presented in Table 1. The project duration and the budget were 289 days and USD 632,669, respectively.
The time interval $\Delta t$ is ten months and the assessment basis is taken as monthly. The actual evolution of $\{X_k : k = 0, \ldots, T\}$ is simulated according to Equation (7), and the assessment result $\hat{Y}_k$ at the k-th month is simulated according to Equation (4), where $\lambda$, $\mu$, and $\alpha$ are prescribed real numbers. If a cost overrun is reported from the assessment in the k-th month, the project manager immediately takes corrective action, i.e., $X_k$ is set to (1 − ν)/λ right after the assessment, where ν is a prescribed real number that depends on the corrective action.
The assessment and the validation were conducted as follows. First, a blind examination was conducted, i.e., no prior knowledge of $\lambda$, $\mu$, $\alpha$, $\nu$, and $\{X_k : k = 0, \ldots, T\}$ was assumed. The assessment result is denoted as $\hat{Y}_i$ and its value is defined as 1 for cost overrun, 0 for in budget, and −1 for under budget at time instant $i$. Because there is no prior information about the parameter $\lambda$, its prior PDF is assumed to be uniform over a relatively broad interval [0.0001, 1], and the prior PDF for $\alpha$ is assumed to be uniform over [10, 31]. The prior PDF for $\nu$ is defined as uniform over [0.1, 1], and the prior PDF for $\mu$ as uniform over [0.001, 0.5]. The number of simulation samples $N$ is 5000.
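Drawing the initial samples from these uniform priors is straightforward. The sketch below uses the intervals stated above; the dictionary layout, the name `draw_initial_particles`, and the fixed seed are assumptions for illustration.

```python
import random

# Prior intervals as stated in the text (uniform on each interval).
PRIORS = {
    "lam": (0.0001, 1.0),
    "alpha": (10.0, 31.0),
    "nu": (0.1, 1.0),
    "mu": (0.001, 0.5),
}


def draw_initial_particles(n, rng):
    """Draw n initial particles; X_0 = 0 because no overrun exists at the start."""
    particles = []
    for _ in range(n):
        p = {name: rng.uniform(lo, hi) for name, (lo, hi) in PRIORS.items()}
        p["x"] = 0
        particles.append(p)
    return particles


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed only so the sketch is repeatable
    particles = draw_initial_particles(5000, rng)
    print(len(particles), particles[0])
```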
Figure 3 shows the real-time samples of the unknown parameters, i.e., the samples drawn from $f(\lambda, \mu, \alpha, \nu \mid D_{1:k})$ for k = 0, 50, 100, and 200. These ranges reflect the updated values of the unknown parameters. As shown in Figure 3, the $\{\lambda, \mu\}$ samples evolve with time and finally cluster around their actual values, while the $\{\alpha, \nu\}$ parameters seem unidentifiable from the assessment data. To compare the result from Barraza et al. [32] with our model on the same basis, the cost overrun probabilities in Barraza et al. [32] were computed assuming a normal distribution, in which the overrun mean and standard deviation were estimated from historical data.
Table 2 shows the comparison of the cost overrun probabilities between Barraza et al. [32] and our model. The cost overrun probabilities in Barraza et al. [32] tend to increase as the project duration becomes longer. In contrast, the cost overrun probabilities from our model fell after the cost overrun events were recorded. The basic reason is that the proposed model takes corrective action into consideration. Once a cost overrun was indicated (e.g., at day 50), the corrective action would be taken in practice. The cost overrun probability fell from day 50 to 100. The project cost control became poorer from day 50 to 289, as indicated by the probability increase. In the real cost records, the cost overrun occurred again at day 289. This means that, after the corrective action is taken, the project cost is generally under control and within budget. Nevertheless, there is still a chance of overrun if control of the project cost is gradually lost. In practice, in addition to project duration, the cost overrun probability trend depends significantly on the status of the influencing factors (such as management and material). If they are under control, the cost overrun probability becomes lower.

4.2. SRC Building Project in Taiwan

An SRC building project with a comprehensive cost report was further employed to illustrate the use of the proposed method. This SRC building project is located in Taipei, Taiwan. It is a compound building composed of two towers (12F/2B) and six towers (5F/2B). The project duration and the budget are 35 months and NTD 956,912,592 (USD 29,903,520), respectively. The status of the cost overrun in the project is shown in Table 3 based on the project cost report. The project report recorded eight cost overrun events (in the second to eighth months and in the tenth month), and Table 4 lists the status of the influential factors assessed over the project duration. The statuses were defined as 1 for cost overrun, 0 for in budget, and −1 for under budget at time instant i.
The assessment simulation was conducted on a monthly basis. The evolution of $X_k$ was simulated according to Equation (7), and the assessment result was simulated according to Equation (4) and the actual values of $X_k$. If a cost overrun is reported at time instant i, the corrective action is taken right after the cost overrun event; i.e., $X_k$ is set to (1 − ν)/λ right after the assessment. Initially, the influence factors are assumed to occur individually and independently of each other for simplicity. Based on the project cost records, i.e., how many times each factor affects the cost overrun in one month, the prior PDFs of μ are assumed to be uniform over [0.001, 0.1], [0.001, 0.2], [0.001, 0.2], [0.001, 0.1], and [0.001, 0.25] for weather, productivity, material, equipment, and management, respectively. Because there is no such prior information about the parameters λ, ν, and α, their prior PDFs were assumed to be uniform over the relatively broad intervals [0.00001, 1], [0.1, 1], and [10, 31], respectively. For all the assessments, the number of simulation samples N was 5000.
Figure 4 plots the real-time cost overrun probability trend for each influence factor in the SRC project. Asterisks in the figure indicate cost overrun events reported in the assessment. The dashed line marks the boundary between the assessment simulation (i.e., the 1st–10th months) and the prediction. When the assessment data and the parameter μ are similar for two factors, e.g., productivity and material, their trend plots look similar. Note also that the cost overrun probability and the cost overrun rate decrease right after each reported cost overrun event, because each reported event is followed by corrective action.
In real project execution, many influence factors may affect project cost overruns simultaneously. This paper therefore used sensitivity analysis to survey potential combinations of the influence factors and determine which combination best fits the real cost overrun trend. Table 5 shows the potential factor combinations. The threshold of the simulation outcome is set to 0.5. If the combined probability is less than 0.5, the outcome is recorded as “U”; otherwise, it is recorded as “O”. “O” means that a cost overrun is likely at that time instant, and “U” means a cost underrun. Table 6 shows that combinations 5 and 7 match the real cost trend status better than the other combinations. This suggests that productivity and material may have a more significant impact on the project cost overrun.
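The thresholding and matching step can be illustrated with a short Python sketch. The function names `classify` and `count_matches` are hypothetical, and the probabilities and real statuses below are made-up illustration values rather than data from Table 6; the sketch also does not model how the probabilities of several factors are combined.

```python
THRESHOLD = 0.5  # simulation outcome threshold stated in the text


def classify(probability, threshold=THRESHOLD):
    """Return 'U' (underrun) below the threshold, otherwise 'O' (overrun)."""
    return "U" if probability < threshold else "O"


def count_matches(predicted, actual):
    """Count the stages where the predicted status equals the real status."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum(p == a for p, a in zip(predicted, actual))


# Hypothetical combined probabilities for five stages (illustration only).
probabilities = [0.12, 0.35, 0.71, 0.64, 0.48]
real_status = ["U", "U", "O", "O", "U"]

labels = [classify(p) for p in probabilities]
matches = count_matches(labels, real_status)
```

Repeating the count for every combination column and ranking the results is one simple way to identify the best-matching combinations.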
Finally, the project cost forecast from earned value management (EVM) was compared with the real-time HMM assessment model. To place both on the same basis, the EVM-predicted cost was converted into cost overrun probability values using a normal distribution whose overrun mean and standard deviation were estimated from the SRC project cost data. Table 7 compares the accuracy of the two approaches: 77.2% for EVM and 82.9% for our model. Our model is therefore more accurate than EVM for this project. In addition, our model accounts for the effect of corrective action, which EVM hardly considers.
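As a rough illustration of this comparison, the sketch below shows one plausible reading of the conversion and the accuracy calculation. The function `overrun_probability` is an assumption: the text does not give the exact conversion formula, so the normal CDF of the predicted overrun is used here only as an example, and its arguments are hypothetical. The accuracy function uses the matched and total counts from Table 7.

```python
from statistics import NormalDist


def overrun_probability(predicted_overrun, mean_overrun, std_overrun):
    """Map a predicted overrun to a probability via a normal CDF.

    Assumed form for illustration; mean_overrun and std_overrun would be
    estimated from the project's own cost data.
    """
    return NormalDist(mean_overrun, std_overrun).cdf(predicted_overrun)


def accuracy_percent(matched, total):
    """Percentage of time instants where prediction and real status agree."""
    return 100.0 * matched / total


our_model_accuracy = accuracy_percent(29, 35)  # counts from Table 7
evm_accuracy = accuracy_percent(27, 35)        # counts from Table 7
```

With the Table 7 counts, 29/35 rounds to 82.9%, while 27/35 evaluates to about 77.1%, marginally below the 77.2% printed for EVM.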

5. Conclusions and Recommendations

This paper proposed a new method for predicting project cost overrun probability in real time. The applicability of the proposed model and algorithm was verified against an SRC building project in Taiwan. The posterior probabilities from the real-time HMM model were highly consistent with the cost overrun ratios of the real construction project. The model overcomes several limitations of classical project cost overrun prediction approaches. It provides a fast and timely estimate of cost overrun probability, and it does not require historical data from other projects, only the latest data from the assessed project. The method also considers the effect of corrective action, which has rarely been considered in past research. Furthermore, the combinations of causes most likely to affect the project cost overrun were identified through a supplementary sensitivity analysis compared against the HMM inference. In practice, the analysis of cost overrun risks, the effect of corrective action, and the most influential factors can be used to develop effective cost management plans that alleviate the risk of construction project cost overrun.
The study is exploratory in nature, and further research is needed in this area. In this model, the cost overrun influence factors are assumed to be independent of each other, whereas in some real construction projects they may not be. Additionally, the method relies strongly on the validity of the Poisson arrival assumption, which may not be reasonable for some construction projects. Nonetheless, the method provides a realistic preliminary model for predicting real-time project cost overrun probability. In the future, the accuracy and applicability of the model may be improved if the assumption of factor independence is relaxed. The model would also benefit from examining other possible distributions of cost overrun events, as well as more realistic corrective actions and cost status assessments.

Author Contributions

Conceptualization, S.-S.L.; methodology, S.-S.L. and Y.L.; software, Y.L.; validation, S.-S.L. and Y.L.; formal analysis, S.-S.L. and Y.L.; investigation, S.-S.L. and Y.L.; resources, S.-S.L. and Y.L.; data curation, S.-S.L., Y.L. and P.-L.W.; writing—original draft preparation, S.-S.L. and Y.L.; writing—review and editing, P.-L.W.; visualization, S.-S.L. and Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data used to support the findings of this study are included in the article.

Acknowledgments

The authors thank the many experienced engineers who provided valuable information about construction cost management, as well as Jianye Ching at NTU for modeling suggestions and the comments provided by anonymous reviewers.

Conflicts of Interest

The authors declare that they have no conflict of interest.

References

  1. Elinwa, A.; Buba, S. Construction cost factors in Nigeria. J. Constr. Eng. Manag. ASCE 1993, 119, 698–713.
  2. Hartley, J.R.; Okamoto, S. Concurrent Engineering: Shortening Lead Times, Raising Quality, and Lowering Costs; Productivity Press: Shelton, Conn, 1997.
  3. Teicholz, P. Forecasting final cost and budget of construction projects. J. Comput. Civ. Eng. ASCE 1993, 126, 511–529.
  4. Baccarini, D. The maturing concept of estimating project cost contingency—A review. In Proceedings of the 31st Australasian University Building Educators Association Conference (AUBEA 2006): Building in Value, Sydney, Australia, 11–14 July 2006; pp. 2327–2337.
  5. Barraza, G.A.; Back, E.; Mata, F. Probabilistic monitoring of project performance using SS-curves. J. Constr. Eng. Manag. ASCE 2000, 126, 142–148.
  6. Touran, A.; Lopez, R. Modeling cost escalation in large infrastructure projects. J. Constr. Eng. Manag. ASCE 2006, 132, 853–860.
  7. Plebankiewicz, E.; Wieczorek, D. Prediction of cost overrun risk in construction projects. Sustainability 2020, 12, 9341.
  8. Knight, K.; Fayek, A.R. Use of fuzzy logic for predicting design cost overruns on building projects. J. Constr. Eng. Manag. ASCE 2002, 128, 503–512.
  9. Touran, A. Probabilistic model for cost contingency. J. Constr. Eng. Manag. ASCE 2003, 129, 280–284.
  10. Odeck, J. Cost overrun in road construction—what are their sizes and determinants? Transp. Policy 2004, 11, 43–53.
  11. El-Kholy, A.M. Predicting cost overrun in construction projects. J. Constr. Eng. Manag. ASCE 2015, 4, 95–105.
  12. Ahiaga-Dagbui, D.D.; Smith, S.D. Dealing with construction cost overruns using data mining. J. Constr. Manag. Econ. 2014, 32, 682–694.
  13. Huo, T.; Ren, H.; Cai, W.; Shen, G.Q. Measurement and dependence analysis of cost overruns in megatransport infrastructure projects: Case study in Hong Kong. J. Constr. Eng. Manag. ASCE 2018, 144, 05018001.
  14. Ashtari, M.A.; Ansari, R.; Hassannayebi, E.; Jeong, J. Cost overrun risk assessment and prediction in construction projects: A Bayesian network classifier approach. Buildings 2022, 12, 1660.
  15. Chen, H.; Chen, L.T.; Lin, Y.L. Earned value project management: Improving the predictive power of planned value. Int. J. Proj. Manag. 2016, 34, 22–29.
  16. Acebes, F.; Pereda, M.; Poza, D.; Pajares, J.; Galán, J.M. Stochastic earned value analysis using Monte Carlo simulation and statistical learning techniques. Int. J. Proj. Manag. 2015, 33, 1597–1609.
  17. Sackey, S.; Lee, D.E.; Kim, B.S. Duration estimate at completion: Improving earned value management forecasting accuracy. KSCE J. Civ. Eng. 2020, 24, 693–702.
  18. Zhao, M.; Zi, X. Using earned value management with exponential smoothing technique to forecast project cost. J. Phys. Conf. Ser. 2021, 1955, 21–23.
  19. Yu, F.; Chen, X.; Cory, C.A.; Yang, Z.; Hu, Y. An active construction dynamic schedule management model: Using the fuzzy earned value management and BP neural network. KSCE J. Civ. Eng. 2021, 25, 2335–2349.
  20. Akinci, B.; Fischer, M. Factors affecting contractors’ risk of cost overburden. J. Constr. Eng. Manag. ASCE 1998, 14, 67–76.
  21. Dissanayaka, S.M.; Kumaraswamy, M.M. Evaluation of factors affecting time and cost performance in Hong Kong building project. Eng. Constr. Archit. Manag. 1999, 6, 287–298.
  22. Elhag, T.M.S.; Boussabaine, A.H.; Ballal, T.M.A. Critical determinants of construction tendering costs: Quantity surveyors’ standpoint. Int. J. Proj. Manag. 2005, 23, 538–545.
  23. Nassar, K.M.; Nassar, W.M.; Hegab, M.Y. Evaluating cost overruns of asphalt paving project using statistical process control methods. J. Constr. Eng. Manag. ASCE 2005, 7, 1173–1178.
  24. Alshihri, S.; Al-Gahtani, K.; Almohsen, A. Risk factors that lead to time and cost overruns of building projects in Saudi Arabia. Buildings 2022, 12, 902.
  25. Wang, W.C.; Demsetz, L.A. Model for evaluating networks under correlated uncertainty—NETCOR. J. Constr. Eng. Manag. ASCE 2000, 126, 458–466.
  26. Wang, W.C.; Demsetz, L.A. Application example for evaluating networks considering correlation. J. Constr. Eng. Manag. ASCE 2000, 126, 46–474.
  27. Yeo, K.T. Risks, classification of estimates and contingency management. J. Constr. Eng. Manag. ASCE 1990, 6, 458–470.
  28. Xie, W.; Deng, B.; Yin, Y.; Lv, X.; Deng, Z. Critical factors influencing cost overrun in construction projects: A fuzzy synthetic evaluation. Buildings 2022, 12, 2028.
  29. Kaming, P.F.; Olomolaiye, P.O.; Holt, G.D.; Harris, F.C. Factors influencing construction time and cost overruns on high-rise projects in Indonesia. Constr. Manag. Econ. 1997, 15, 83–94.
  30. Aljohani, A.; Ahiaga-Dagbui, D.; Moore, D. Construction projects cost overrun: What does the literature tell us? Int. J. Innov. Manag. Technol. 2017, 8, 137–143.
  31. Monaka, M.; Xhu, L.; Babar, M.A.; Staples, M. Project cost overrun simulation in software product line development. In Proceedings of the Product-Focused Software Process Improvement, 8th International Conference, Riga, Latvia, 2–4 July 2007.
  32. Barraza, G.A.; Back, E.; Mata, F. Probabilistic forecasting of project performance using stochastic S-curves. J. Constr. Eng. Manag. ASCE 2004, 130, 25–32.
Figure 1. Overall analysis process of proposed method.
Figure 2. Overall data sampling process of the real-time Bayesian updating model.
Figure 3. Real-time samples of the {λ, μ} and {ν, α}.
Figure 4. Cost Overrun Probability Trend for Each Influence Factor at SRC Project.
Table 1. Three Span Bridge—Project Activity Data.

| Act | Pred. | Description | Dur (Days) | Cost (USD) | Work (%) |
|---|---|---|---|---|---|
| 1 | | Mobilization | 30 | 9000 | 1.4 |
| 2 | | Girder casting yard | 30 | 12,600 | 2.0 |
| 3 | 1 | Drive piles in Abutment A | 24 | 7800 | 1.3 |
| 4 | 9 | Drive piles in Abutment B | 24 | 8100 | 1.3 |
| 5 | 7, 12 | Drive piles in Pier no. 1 | 23 | 8100 | 1.0 |
| 6 | 8, 13 | Drive piles in Pier no. 2 | 23 | 6000 | 1.0 |
| 7 | 3 | Cofferdam—install at Abutment A | 15 | 16,000 | 2.9 |
| 8 | 5, 16 | Cofferdam remove—install Pier 1 | 20 | 21,000 | 3.3 |
| 9 | 6, 17 | Cofferdam remove—install Pier 2 | 20 | 21,000 | 3.3 |
| 10 | 4, 19 | Cofferdam remove—install Abut. B | 20 | 21,000 | 3.3 |
| 11 | 21 | Cofferdam remove from Abut. B | 15 | 3000 | 0.5 |
| 12 | 1 | Erect falsework in Span 1 | 25 | 12,000 | 1.9 |
| 13 | 12 | Erect falsework in Span 2 | 25 | 12,000 | 1.9 |
| 14 | 13 | Erect falsework in Span 3 | 25 | 12,000 | 1.9 |
| 15 | 28 | Remove falsework, all spans | 20 | 6000 | 0.9 |
| 16 | 7, 12 | Reinforced concrete, Abutment A | 20 | 15,000 | 2.4 |
| 17 | 8, 13 | Reinforced concrete, Pier 1 (1/2) | 20 | 16,500 | 2.6 |
| 18 | 17 | Reinforced concrete, Pier 1 (1/2) | 20 | 16,500 | 2.6 |
| 19 | 9, 14 | Reinforced concrete, Pier 2 (1/2) | 20 | 16,500 | 2.6 |
| 20 | 18, 19 | Reinforced concrete, Pier 2 (1/2) | 20 | 16,500 | 2.6 |
| 21 | 10, 16 | Reinforced concrete, Abutment B | 20 | 15,000 | 2.4 |
| 22 | 2 | Manufacture PC Girders, Span 1 | 70 | 96,000 | 15.2 |
| 23 | 22 | Manufacture PC Girders, Span 2 | 65 | 96,000 | 15.2 |
| 24 | 23 | Manufacture PC Girders, Span 3 | 65 | 96,000 | 15.2 |
| 25 | 18, 22 | Erection of PC Girders, Span 1 | 15 | 144,400 | 2.3 |
| 26 | 20, 23, 25 | Erection of PC Girders, Span 2 | 15 | 150,000 | 2.3 |
| 27 | 11, 24, 26 | Erection of PC Girders, Span 3 | 15 | 156,000 | 2.4 |
| 28 | 27 | In situ concrete deck, Span 3 | 15 | 9000 | 1.4 |
| 29 | 27 | Approaches, handrails, etc. | 30 | 21,000 | 3.3 |
| 30 | 29 | Clean up and move out | 10 | 6000 | 0.9 |
Table 2. Cost Overrun Probability Comparison between Barraza et al. [32] and Our Model.

| Time (Days) | Planning Cost (USD) | Actual Cost (USD) | Barraza et al. [32] | Our Model |
|---|---|---|---|---|
| 50 | 65,616 | 65,621 | 0.115 | 0.968 |
| 100 | 204,914 | 206,620 | 0.579 | 0.116 |
| 289 | 632,669 | 635,367 | 0.893 | 0.468 |
Table 3. Status of Cost Overrun Report at SRC Project.

| Time (Month) | Activity | Status |
|---|---|---|
| 1 | Diaphragm wall finished 30% | Underrun |
| 2 | Diaphragm wall finished 50% | Overrun |
| 3 | Diaphragm wall finished 70% | Overrun |
| 4 | Diaphragm wall finished | Overrun |
| 5 | Excavation finished 30% | Overrun |
| 6 | Excavation finished 60% | Overrun |
| 7 | Excavation finished | Overrun |
| 8 | Floating raft pump concrete finished (FS plate) | Overrun |
| 9 | B2F plate RC finished | Underrun |
| 10 | B1F plate RC finished | Overrun |
| 11 | Platform construction | Underrun |
| 12 | Steel-reinforced erection construction (Section 1) | Underrun |
| 13 | Steel-reinforced erection construction (Section 2) | Underrun |
| 14 | Steel-reinforced erection construction (Section 3 and Section 4) | Underrun |
| 15 | 1F structure finished | Underrun |
| 16 | 2F, 3F structure finished | Underrun |
| 17 | 4F, 5F structure finished | Underrun |
| 18 | 6F, 7F structure finished | Underrun |
| 19 | 8F, 9F structure finished | Underrun |
| 20 | 10F structure finished | Underrun |
| 21 | 11F, 12F structure finished | Underrun |
| 22 | RF structure finished | Underrun |
| 23 | PHF structure finished | Underrun |
| 24 | Interior partition, door frame, windows frame finished 50% | Underrun |
| 25 | Interior partition, door frame, windows frame finished | Underrun |
| 26 | Exterior decoration under coating varnish, roof water resist, insulation layer finished 51% | Underrun |
| 27 | Exterior decoration under coating varnish, roof water resist, insulation layer finished | Underrun |
| 28 | Exterior shelf, frame, railing, brick wall finished 50% | Underrun |
| 29 | Exterior shelf, frame, railing, brick wall finished | Underrun |
| 30 | Occupation license | Underrun |
| 31 | Interior decorations finished, inspection and turn over | Underrun |
| 32 | Interior decorations finished, inspection and turn over | Underrun |
| 33 | Interior decorations finished, inspection and turn over | Underrun |
| 34 | Interior decorations finished, inspection and turn over | Underrun |
| 35 | Interior decorations finished, inspection and turn over | Underrun |
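Table 4. Assessment Data at SRC Project.

Influence factor status by month:

| Month | Weather | Productivity | Material | Equipment | Management |
|---|---|---|---|---|---|
| 1 | −1 | −1 | −1 | −1 | −1 |
| 2 | −1 | 1 | 1 | 1 | 1 |
| 3 | −1 | 1 | 1 | 1 | 1 |
| 4 | −1 | 1 | 1 | 1 | 1 |
| 5 | 1 | −1 | −1 | −1 | 1 |
| 6 | 1 | −1 | −1 | −1 | 1 |
| 7 | 1 | −1 | −1 | −1 | 1 |
| 8 | 1 | 1 | 1 | −1 | −1 |
| 9 | −1 | −1 | −1 | −1 | −1 |
| 10 | −1 | 1 | 1 | −1 | −1 |

Note: [1] = overrun, [0] = in budget, [−1] = underrun.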
Table 5. Potential Factor Combinations and Their Logic Gate.

| Combination | Factor Considered |
|---|---|
| 1 | Management |
| 2 | Management |
| 3 | Weather |
| 4 | Weather |
| 5 | Productivity |
| 6 | Productivity |
| 7 | Material |
| 8 | Material |
| 9 | Equipment |
| 10 | Equipment |
| 11 | Management and productivity |
| 12 | Management and productivity |
| 13 | Management and weather |
| 14 | Management and weather |
| 15 | Management and material |
| 16 | Management and material |
| 17 | Management and equipment |
| 18 | Management and equipment |
| 19 | Management, productivity and weather |
| 20 | Management, productivity and weather |
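Table 6. Sensitivity Test of Potential Influence Factor Combinations.

Columns 1–20 are the combinations of Table 5; the last column is the real value.

| Stage | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | Real Value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | U | U | U | U | U | U | U | U | U | U | U | U | U | U | U | U | U | U | U | U | U |
| 2 | U | O | U | U | U | O | U | O | U | O | U | O | U | O | U | O | U | O | U | O | U |
| 3 | O | O | U | U | O | O | O | O | O | O | O | O | U | O | O | O | O | O | U | O | O |
| 4 | O | O | U | U | O | O | O | O | O | O | O | O | U | O | O | O | O | O | U | O | O |
| 5 | U | O | U | U | U | O | U | O | U | O | U | O | U | O | U | O | U | O | U | U | U |
| 6 | O | O | O | U | U | O | U | O | U | O | U | O | U | O | U | O | U | O | U | U | U |
| 7 | O | O | O | U | U | O | U | O | U | O | U | O | U | O | U | O | U | O | U | U | U |
| 8 | U | O | O | U | O | O | O | O | U | O | U | O | U | O | U | O | U | O | U | O | O |
| 9 | U | U | U | U | U | U | U | U | U | U | U | U | U | U | U | U | U | U | U | U | U |
| 10 | U | O | U | O | O | O | O | O | U | O | U | O | U | O | U | O | U | O | U | O | O |
| 11 | U | O | U | O | U | O | U | O | U | O | U | O | U | O | U | O | U | O | U | U | O |
| 12 | U | O | U | O | U | O | U | O | U | O | U | O | U | O | U | O | U | O | U | O | O |
| 13 | U | O | U | O | U | O | U | O | U | O | U | O | U | O | U | O | U | O | U | O | U |
| 14 | U | O | U | O | U | O | U | O | U | O | U | O | U | O | U | O | U | O | U | O | U |
| 15 | U | O | U | O | U | O | U | O | U | O | U | O | U | O | U | O | U | O | U | O | U |
| 16 | U | O | U | O | U | O | U | O | U | O | U | O | U | O | U | O | U | O | U | O | U |
| 17 | U | O | U | O | U | O | U | O | U | O | U | O | U | O | U | O | U | O | U | O | U |
| 18 | U | O | U | O | U | O | U | O | U | O | U | O | U | O | U | O | U | O | U | O | U |
| 19 | U | O | U | O | U | O | U | O | U | O | U | O | U | O | U | O | U | O | U | O | U |

Note: (O) = overrun; (U) = underrun.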
Table 7. Comparison between Our Model and EVM.

| | Our Model | EVM |
|---|---|---|
| Total number matched | 29 | 27 |
| Total prediction number | 35 | 35 |
| Percentage of accuracy (%) | 82.9 | 77.2 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Leu, S.-S.; Liu, Y.; Wu, P.-L. Project Cost Overrun Risk Prediction Using Hidden Markov Chain Analysis. Buildings 2023, 13, 667. https://doi.org/10.3390/buildings13030667

AMA Style

Leu S-S, Liu Y, Wu P-L. Project Cost Overrun Risk Prediction Using Hidden Markov Chain Analysis. Buildings. 2023; 13(3):667. https://doi.org/10.3390/buildings13030667

Chicago/Turabian Style

Leu, Sou-Sen, Yanni Liu, and Pei-Lin Wu. 2023. "Project Cost Overrun Risk Prediction Using Hidden Markov Chain Analysis" Buildings 13, no. 3: 667. https://doi.org/10.3390/buildings13030667
