Adaptation of a Cost Overrun Risk Prediction Model to the Type of Construction Facility

Abstract: To assess the risk of project cost overrun, it is necessary to consider large amounts of symmetric and asymmetric data. This paper proposes a cost overrun risk prediction model, the structure of which is based on the fuzzy inference model of Mamdani. The model has multiple inputs and one output (multi-input-single-output (MISO)) and is based on processes running consecutively in three blocks (the fuzzification block, the inference block, and the block of sharpening the representative output value, i.e., defuzzification). The input variables of the model include the share of element costs in the building costs (SE), predicted changes in the number of works (WC), and expected changes in the unit price (PC). For the input variable SE, it is proposed to adjust the fuzzy set shapes to the type of building object. Single-family residential buildings, multi-family residential buildings, office buildings, highways and expressways, and sports fields were analyzed. The output variable is the value of the risk of exceeding the costs of a given element of a construction investment project (R). In all, 27 rules were assumed in the inference block. Considering the possibility of applying sharpening methods in the cost overrun risk prediction model, the following defuzzification methods were investigated: the first of maxima, middle of maxima, and last of maxima methods, the center of gravity method, and the bisector of area method. Considering their advantages and disadvantages, the authors assumed that the proper and basic defuzzification method in the cost overrun risk prediction model was the center of gravity method. In order to check the correctness of the assumption made at the stage of designing the rule base, result diagrams were generated for the relationships between the variable (R) and the input variables of individual types of buildings. The results obtained confirm the correctness of the assumptions made and indicate that the input variable (SE), adjusted individually to the model for each type of construction object, is crucial in terms of its impact on the value of the output variable (R).


Introduction
Cost overruns in construction projects are a common phenomenon, occurring in different market and legal conditions and often negatively influencing the achievement of project goals. Numerous research results indicate the scale of this problem. For instance, Love et al. [1] analyzed cost overruns in 276 construction and engineering projects and found a mean cost overrun of 12.22%. According to research performed by Andrić et al. [2] on cost overruns in infrastructure projects in Asia, the mean value of cost overrun is 26.24%. Senouci et al. [3], in their study on time and cost increases in 122 construction contracts in Qatar, showed that 54% had their costs increased and 72% had their deadlines extended. Larsen et al. [4] established that more than half of Malaysian construction projects (55%) experienced cost overruns.

Main Assumptions of the Model
The construction of the cost overrun risk prediction model was based on the fuzzy inference model of Mamdani. This model has been frequently used in the field of construction management, for instance, to build fuzzy risk inference models, in the context of assessing:
• exceeding the time and cost of construction investments [34],
• exceeding the time, cost, and impact on quality and other technical considerations in the implementation of construction projects [35,36],
• occupational risks on construction sites [37],
• level of safety of construction workers [38],
• technological, financial, political, environmental, and legal risk factors in the life cycle of buildings [39],
• technological risk factors for old buildings [40].
The cost overrun risk prediction model is a model with multiple inputs and one output (multi-input-single-output (MISO)), based on processes that run sequentially in three blocks (the fuzzification block, the inference block, and the block of sharpening the representative output value). Share of element costs in the building costs (SE), predicted changes in the number of works (WC), and expected changes in the unit price (PC) are the input variables of the model. A base of 27 individually designed rules supports the inference process in the inference block, and the level of risk of exceeding the costs of a given element of a construction project (R) is the output variable (y).
To construct the cost overrun risk prediction model, the authors chose possibility theory and fuzzy logic, because risk is related to so-called measurable uncertainty. Its measurable character also results from the fact that risk is quantifiable and can be directly translated into the size of the parameters necessary, for example, to determine the value of the risk of cost overrun. In practice, it often happens that an expert who evaluates risk does not have sufficient historical data to perform statistical research that would result in a probabilistic distribution, and thus determines the size of the parameters necessary for risk assessment subjectively.

Block of Fuzzification
The input variables, namely share of element costs in the building costs (SE), predicted changes in the number of works (WC), and expected changes in the unit price (PC), are described with appropriate linguistic terms (fuzzy sets) in the consideration spaces on the so-called universes X1, X2, and X3. The domain (range of arguments) of the universes was determined as a percentage within the interval [0; 100%] for each input variable, with the model using the decimal notation corresponding to the interval [0; 1]. In defining the X consideration spaces, for all variables described by the linguistic terms "high", "average", and "low", it was assumed that the adjacent fuzzy sets (representing consecutive linguistic terms) would overlap. According to Hovde and Moser [41], only this modelling of the linguistic terms for the input variables gives a favorable effect in the inference process. Table 1 represents the fuzzy sets for the linguistic terms L(X2) and L(X3), that is, for the input variables WC and PC. For the description of linguistic terms, membership functions with line graphs were used (triangular functions and classes Γ and L). The qualitative definition of fuzzy sets was based on the selection of appropriate types of membership functions. The quantitative definition was performed on the basis of the selection of the values of parameters characterizing the functional curves, which made it possible to precisely determine the degrees of membership of individual fuzzy sets. Degrees of membership for fuzzy sets are described in Table 1 (in the last column) by means of four numbers {α1, α2, α3, α4}. These parameters indicate, respectively, the intervals of achieving the value of membership degree 1.0 {α2, α3} and the left or right width of the distribution of the membership function to the value of the membership degree 0.0 {α1, α4}.
It was assumed that linguistic values for both input variables (WC and PC) would remain unchanged regardless of the type of the building object. The data presented in Table 1 correspond to the graphic interpretation of fuzzy sets of linguistic values for WC and PC, which is illustrated in Figure 1.
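The four-number description of a fuzzy set can be illustrated with a short sketch. It assumes that α1 and α4 are the absolute points at which membership falls to 0.0 (rather than widths measured from α2 and α3), and the parameter values below are hypothetical, not those of Table 1.

```python
def membership(x, a1, a2, a3, a4):
    """Membership degree of x in a fuzzy set described by {a1, a2, a3, a4}.

    Membership is 1.0 on [a2, a3] and falls linearly to 0.0 at a1 and a4.
    a2 == a3 gives a triangular function, a1 == a2 an L-class function,
    and a3 == a4 a Gamma-class function.
    """
    if a2 <= x <= a3:
        return 1.0
    if a1 < x < a2:
        return (x - a1) / (a2 - a1)
    if a3 < x < a4:
        return (a4 - x) / (a4 - a3)
    return 0.0


# Hypothetical parameters on the universe [0; 1] (assumption, not Table 1).
LOW = (0.0, 0.0, 0.2, 0.4)      # L-class
AVERAGE = (0.2, 0.5, 0.5, 0.8)  # triangular
HIGH = (0.6, 0.8, 1.0, 1.0)     # Gamma-class


def fuzzify(x):
    """Return the membership degree of a sharp input in each linguistic term."""
    return {
        "low": membership(x, *LOW),
        "average": membership(x, *AVERAGE),
        "high": membership(x, *HIGH),
    }


degrees = fuzzify(0.3)
```

With these illustrative sets, an input of 0.3 belongs partly to "low" and partly to "average", which is the overlap of adjacent sets that the text requires.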
For the input variable share of element costs in the building costs (SE), the shapes of the fuzzy sets described by the linguistic terms "high", "average", and "low" should be adjusted individually, depending on the type of the building object. To determine the parameters denoting the intervals in which the membership degree reaches 1.0 and the left or right width of the distribution of the membership function down to the membership degree 0.0, the authors analyzed the following types of building objects:
• single-family residential buildings,
• multi-family residential buildings,
• office buildings,
• highways and expressways,
• sports fields.
Each of the buildings was divided according to cost elements following the tables of billing elements for an average of five buildings of each type. Table 2 presents the range of cost elements for cubature facilities, highways and expressways, as well as sports fields.

| Type of Building | Cost Elements |
|---|---|
| Cubature facilities (single- and multi-family residential buildings, office buildings) | Earthworks, foundations (including walls and insulation of the ground floor of the building), ground walls, ceilings, stairs, partition walls, roof (construction and covering), sleepers and canals inside the building, insulation of the ground, plaster and interior cladding, windows and doors, painting work, floors (with layer), facades with works outside the building, water and sewage installations, central heating installations and electrical installations. |
| Highways and expressways | Preparatory works, earthworks, drainage of road body, substructures, surfaces, finishing works, traffic safety equipment, street and road elements and other works. |
| Sports fields | Site preparation and earthworks, substructures, sports surfaces, landscaping and equipment. |

For each building object, based on the data from an average of five objects, the average percentage of each cost component was determined. Then, the values of quartiles Q1 and Q3 and the median were calculated using statistical measures. The results are presented in Table 3. It should be noted that the research sample (five objects) is relatively small. However, it can be concluded that for standard material and technological solutions, the deviations from the results obtained for a given type of building are small. In the case of non-standard solutions, the share of component costs should be modified, taking into account the specificity of a given building object.
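The quartile calculation described above can be sketched as follows. The five shares are hypothetical values for one cost element, and the "inclusive" quartile convention is an assumption, since the text does not state which convention was used.

```python
import statistics


def quartiles(shares):
    """Return (Q1, median, Q3) of the cost shares of one element."""
    q1, median, q3 = statistics.quantiles(shares, n=4, method="inclusive")
    return q1, median, q3


# Hypothetical shares of one cost element in five buildings of the same
# type, as fractions of the total building cost (not data from Table 3).
shares = [0.06, 0.08, 0.09, 0.11, 0.14]

q1, median, q3 = quartiles(shares)
```

With only five observations, the result is sensitive to the quartile convention, which is consistent with the remark that the research sample is relatively small.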
On the basis of the data presented in Table 3, a fuzzy interpretation of the linguistic input variable SE for each of the buildings was proposed. It was assumed that for fuzzy sets:
• "high": the description of the variable would relate to the value "about or above quartile Q3",
• "average": the description of the variable would relate to the value "about median",
• "low": the description of the variable would relate to the value "about or below quartile Q1".
Table 4. Fuzzy interpretation of the linguistic input variable share of element costs in the building costs (SE).
In Figures 2-6, graphical interpretations of the input variable consideration space are presented for the subsequent types of buildings subjected to analysis. These interpretations accurately reproduce the fuzzy sets for linguistic terms "high", "average", and "low", which are described in Table 4.

Block of Inference
In the inference block of the Mamdani fuzzy inference model of the MISO type, the resulting membership function µ(y) is calculated for the output variable. Its calculation is based on the degrees of membership of the sharp input values, µ(x1), µ(x2), and µ(x3), in the individual fuzzy sets of linguistic values. The resulting function often has a complex shape, and its calculation is done by so-called inference (the inference process). The inference block consists of two basic elements, namely the rule base and the inference mechanism, the operation of which is based on the following three consecutive mathematical operations: aggregation of simple premises, implication of fuzzy inference rules, and aggregation of the conclusions of all rules.
The rule base designed for the cost overrun risk prediction model has a conjunctive form, owing to the logical conjunction "and" used in the conditional sentences, which combines all three simple premises. The model proposes five result conclusions that inform about the size of the calculated risk of cost overruns, i.e., "very low" (Vl), "quite low" (Ql), "average" (Av), "quite high" (Qh), and "very high" (Vh).
For the purpose of developing the rule base, the authors assumed that with an increase in the share of element costs in the building costs (SE), predicted changes in the number of works (WC), and expected changes in the unit price (PC), the value of the risk level of exceeding the costs of a given element in the construction project (R) will naturally and smoothly increase. To this end, the following weights were assumed for the linguistic values of the input variables SE, WC, and PC: 1 for "low", 2 for "average", and 3 for "high". The products of these weights were then examined for all 27 possible combinations of input values (rules), and the results were assigned to the five possible result conclusions on the assumption that the minimum products correspond to the "very low" conclusion, the maximum to the "very high" conclusion, and the intermediate ones to the "quite low", "average", and "quite high" conclusions, respectively and proportionally. Table 5 illustrates the rule base of the inference block consisting of 27 rules, for each of which the degree of validity of the fuzzy relationship is assumed to be 1.0.
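The weighting scheme described above can be sketched in a few lines. The way the products are grouped into five conclusions here (the ten distinct product values, sorted and split into five consecutive pairs) is only one possible reading of "proportionally" and is an assumption; Table 5 remains the authoritative rule base.

```python
from itertools import product

WEIGHTS = {"low": 1, "average": 2, "high": 3}
CONCLUSIONS = ["Vl", "Ql", "Av", "Qh", "Vh"]


def build_rule_base():
    """Map every (SE, WC, PC) combination of linguistic values to a conclusion."""
    combos = list(product(WEIGHTS, repeat=3))  # 3 * 3 * 3 = 27 rules
    scores = {c: WEIGHTS[c[0]] * WEIGHTS[c[1]] * WEIGHTS[c[2]] for c in combos}
    # Distinct products in ascending order: 1, 2, 3, 4, 6, 8, 9, 12, 18, 27.
    distinct = sorted(set(scores.values()))
    # Assumed grouping: consecutive pairs of product values share a conclusion.
    per_group = len(distinct) // len(CONCLUSIONS)
    label = {s: CONCLUSIONS[i // per_group] for i, s in enumerate(distinct)}
    return {c: label[scores[c]] for c in combos}


rules = build_rule_base()
```

Under this assumed grouping, the rule ("low", "low", "low") yields "Vl", ("average", "average", "average") yields "Av", and ("high", "high", "high") yields "Vh", so the conclusion never decreases when any single input moves to a higher linguistic value.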
In the inference block, the processes of premise aggregation and rule conclusion aggregation are performed. Aggregation of simple premises consists in calculating the degree of truth of the fuzzy rule created by these premises. Because the logical conjunction "and" was used in the conditional sentences, which in fuzzy logic is represented by the intersection (product) of the fuzzy sets, the operation of premise aggregation was reduced to finding the degree of membership in the fuzzy relationship (FR). This value was determined by applying the Mamdani fuzzy implication rule (T-norm), calculated according to the following formula:
The final stage of the inference block is the aggregation of the conclusions of all activated fuzzy rules (the so-called output aggregation). This procedure consists of summing up the conclusions of the activated rules, which are responsible for the shape of the resulting membership function µ(y). According to the calculation algorithm, the first step is to define separately the modified membership functions of the fuzzy sets of the output variable for the rules involved in the inference, and then to sum up these fuzzy sets using one of the formulas for the S-norm. In the cost overrun risk prediction model, the basic S-norm is the following formula of Mamdani:
The output variable (y) is described in the space (universe) Y. The scope of the Y universe was determined as a percentage, [0; 100%]. As in the case of all input variables, the argument domain was recorded in the decimal interval [0; 1]. The fuzzy sets of the output variable correspond to the result conclusions in the rule base ("very low", "quite low", "average", "quite high", and "very high").
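As a hedged reconstruction, and assuming the standard Mamdani operators (minimum as the T-norm, maximum as the S-norm), the premise aggregation and output aggregation steps described above can be written as follows. The set symbols A, B and the rule index k are notation introduced here for illustration only.

```latex
% Premise aggregation for rule k (minimum T-norm, assumed):
\mu_{F_R}^{(k)} = \min\bigl(\mu_{A_1^{(k)}}(x_1),\ \mu_{A_2^{(k)}}(x_2),\ \mu_{A_3^{(k)}}(x_3)\bigr)

% Implication (clipping of the conclusion set B^{(k)}) and
% output aggregation over all 27 rules (maximum S-norm, assumed):
\mu(y) = \max_{k = 1, \dots, 27} \ \min\bigl(\mu_{F_R}^{(k)},\ \mu_{B^{(k)}}(y)\bigr)
```

Here A_1^{(k)}, A_2^{(k)}, and A_3^{(k)} stand for the fuzzy sets of SE, WC, and PC named in the premise of rule k, and B^{(k)} for the fuzzy set of its result conclusion.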
The fuzzy sets for the extreme result conclusions ("very low" and "very high") and the intermediate conclusion ("average") were parameterized in such a way that their membership function graphs did not overlap one another but were continuous over the full scope of the Y universe. For the relative conclusions ("quite low" and "quite high"), the same procedure was followed, with the fuzzy sets placed symmetrically between the extreme and intermediate conclusions. The parameterization was performed in such a way that adjacent fuzzy sets overlap with a membership degree of 0.5 at the intermediate points, that is, µ(0.2) = µ(0.4) = µ(0.6) = µ(0.8) = 0.5. Table 6 presents the sets of linguistic terms L(Y) for the output variable (y). The membership of all fuzzy sets was defined as in the case of the input variables, that is, using four numbers {α1, α2, α3, α4}. Figure 7 presents a graphic interpretation of the consideration space of the output variable (y), which is represented by the fuzzy sets of all five result conclusions described in Table 6.

Block of Defuzzification
The defuzzification process is a mathematical operation performed on the resultant membership function shape (the resulting fuzzy set) obtained after aggregating the conclusions of all inference rules. This operation aims to determine one sharp value of the variable (y) that will appropriately represent the output fuzzy set and indicate unambiguously the result conclusion.
Considering the possibility of using sharpening methods in the cost overrun risk prediction model, the following defuzzification methods were investigated: the first of maxima, middle of maxima, and last of maxima methods, the center of gravity method, and the bisector of area method. The advantages and disadvantages, as well as the conditions for the application of individual methods, were highlighted. The suggestions and observations contained in [42] were especially taken into account, according to which the methods of maxima:
• are not able to implement the assumption adopted for the purposes of building the rule base, that with the increase in the share of element costs in the building costs (SE), predicted changes in the number of works (WC), and expected changes in the unit price (PC), the value of the risk level of exceeding the costs of a given element of the construction investment (R) will naturally and smoothly increase,
• result in sharp values that will not in every case adequately represent the output fuzzy set, because only the most activated fuzzy set of the output variable influences the sharp result.
Taking into account the above observations, it was assumed that the proper and basic defuzzification method in the cost overrun risk prediction model would be the center of gravity method.
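The difference between the methods of maxima and the center of gravity method can be illustrated with a minimal sketch on a sampled output fuzzy set. The triangular sets and clipping levels are hypothetical, and the discrete sums are a simple approximation, not the implementation used by the authors.

```python
def tri(y, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    return max(0.0, min((y - a) / (b - a), (c - y) / (c - b)))


def defuzzify(ys, mus):
    """Return five sharp values for a membership function sampled at ys."""
    peak = max(mus)
    at_peak = [y for y, m in zip(ys, mus) if abs(m - peak) < 1e-12]
    total = sum(mus)
    # Center of gravity: membership-weighted mean of the sample points.
    cog = sum(y * m for y, m in zip(ys, mus)) / total
    # Bisector: first sample point where the cumulative area reaches half.
    running, bisector = 0.0, ys[-1]
    for y, m in zip(ys, mus):
        running += m
        if running >= total / 2:
            bisector = y
            break
    return {
        "first_of_maxima": at_peak[0],
        "middle_of_maxima": (at_peak[0] + at_peak[-1]) / 2,
        "last_of_maxima": at_peak[-1],
        "center_of_gravity": cog,
        "bisector": bisector,
    }


# Hypothetical aggregated output: "average" activated to degree 0.4 and
# "quite high" to degree 0.8, combined with the maximum operator.
ys = [i / 100 for i in range(101)]
mus = [max(min(0.4, tri(y, 0.3, 0.5, 0.7)), min(0.8, tri(y, 0.5, 0.7, 0.9)))
       for y in ys]
result = defuzzify(ys, mus)
```

In this example the three maxima methods depend only on the plateau of the most activated set ("quite high"), whereas the center of gravity is pulled toward lower values by the partly activated "average" set, which matches the second observation in the list above.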

Discussion
A cost overrun risk prediction model was developed separately for each type of construction object using the "Fuzzy Logic Designer" application available in the MATLAB R2013a software package (The MathWorks, Inc., Natick, MA, USA) for scientific and engineering calculations.
Two questions were investigated: the correctness of the assumption made at the design stage of the rule base (i.e., that as the share of element costs in the building costs (SE), predicted changes in the number of works (WC), and expected changes in the unit price (PC) increase, the value of the risk level of exceeding the costs of a given element of the construction project (R) will increase naturally and smoothly), and the impact of changing the membership function of the input variable SE for individual types of building objects on the values obtained for the output variable (R). For this purpose, the following result diagrams were generated for the relationships between the variable R and the input variables:
• diagrams of the result area for the output variable (R) as a function of the input variables PC and SE in the cross-section WC = 0.5, and of WC and SE in the cross-section PC = 0.5,
• diagrams of the result area for the output variable (R) as a function of the input variables PC and WC in the cross-section SE = 0.5,
• flat diagrams of the result curves for the output variable (R) as a function of PC in the cross-section WC = SE = 0.5, of WC in the cross-section PC = SE = 0.5, and of SE in the cross-section PC = WC = 0.5.
The following figures show flat and spatial diagrams for the relationships between the output variable (R) and the input variables (SE, WC, and PC) for all types of buildings under analysis (single- and multi-family residential buildings, office buildings, highways and expressways, and sports fields). Figure 9 shows the result area for the output variable (R) in terms of the PC and WC variables (left diagram) and the relationship between the output variable (R) and the PC input variable (right diagram). It should be noted that both the result areas and the dependencies on the output variable (R) are analogous for each type of building object because, in the cost overrun risk prediction model, it was assumed that the PC and WC input variables would remain the same for all buildings. Figures 10-14 show the result area for the output variable (R) in terms of the variables PC and SE (diagrams on the left) and the relationships between the output variable (R) and the input variable SE (diagrams on the right). It should be noted that both the result area and the dependencies with respect to the output variable (R) are analogous for the set of input variables WC and SE.
Diagrams of the result areas and of the relationship between the output variable (R) and the input variables confirm the correctness of the assumptions made when designing the rule base of the cost overrun risk prediction model. Figures 9-14 indicate unequivocally that with an increase in the share of element costs in the building costs (SE), predicted changes in the number of works (WC), and expected changes in the unit price (PC), the value of the risk level of exceeding the costs of a given element of a construction investment (R) increases naturally and smoothly.
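The kind of check reported above can be sketched with a compact Mamdani pipeline (minimum for premises and implication, maximum for aggregation, center of gravity for defuzzification). All fuzzy set parameters and the grouping of weight products into conclusions are hypothetical stand-ins for Tables 1 and 4-6, and a single set of input shapes is used for SE, WC, and PC, so this is an illustration rather than a reproduction of the authors' MATLAB model.

```python
from itertools import product


def trap(x, a1, a2, a3, a4):
    """Membership function described by {a1, a2, a3, a4}."""
    if a2 <= x <= a3:
        return 1.0
    if a1 < x < a2:
        return (x - a1) / (a2 - a1)
    if a3 < x < a4:
        return (a4 - x) / (a4 - a3)
    return 0.0


# Hypothetical parameters; the paper's tables are authoritative.
INPUT_SETS = {"low": (0.0, 0.0, 0.2, 0.5),
              "average": (0.2, 0.5, 0.5, 0.8),
              "high": (0.5, 0.8, 1.0, 1.0)}
OUTPUT_SETS = {"Vl": (0.0, 0.0, 0.1, 0.3), "Ql": (0.1, 0.3, 0.3, 0.5),
               "Av": (0.3, 0.5, 0.5, 0.7), "Qh": (0.5, 0.7, 0.7, 0.9),
               "Vh": (0.7, 0.9, 1.0, 1.0)}
WEIGHTS = {"low": 1, "average": 2, "high": 3}
# Assumed assignment of weight products to conclusions.
LABEL = {1: "Vl", 2: "Vl", 3: "Ql", 4: "Ql", 6: "Av",
         8: "Av", 9: "Qh", 12: "Qh", 18: "Vh", 27: "Vh"}
RULES = {c: LABEL[WEIGHTS[c[0]] * WEIGHTS[c[1]] * WEIGHTS[c[2]]]
         for c in product(WEIGHTS, repeat=3)}
YS = [i / 200 for i in range(201)]  # sampled output universe [0; 1]


def risk(se, wc, pc):
    """Sharp risk value R for sharp inputs SE, WC, and PC in [0; 1]."""
    # Premise aggregation (min), keeping the strongest rule per conclusion.
    strength = dict.fromkeys(OUTPUT_SETS, 0.0)
    for (s, w, p), concl in RULES.items():
        fire = min(trap(se, *INPUT_SETS[s]), trap(wc, *INPUT_SETS[w]),
                   trap(pc, *INPUT_SETS[p]))
        strength[concl] = max(strength[concl], fire)
    # Implication (min) and output aggregation (max) on the sampled universe.
    mus = [max(min(strength[c], trap(y, *OUTPUT_SETS[c])) for c in OUTPUT_SETS)
           for y in YS]
    # Center of gravity defuzzification.
    return sum(y * m for y, m in zip(YS, mus)) / sum(mus)


# Flat diagram: R as a function of SE in the cross-section WC = PC = 0.5.
curve = [risk(se / 10, 0.5, 0.5) for se in range(11)]
```

Under these assumptions, the sampled curve does not decrease as SE grows, and the risk at the lowest inputs is below that at the highest inputs, in line with the behavior described for Figures 9-14.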
In turn, Figure 15 superimposes the diagrams of the dependence between the output variable (R) and the input variable SE in the cross-section WC = PC = 0.5 for all five types of construction objects.
Figure 15. The diagrams of the dependence between the output variable (R) and the input variable SE for all five types of construction objects (single-family residential buildings, multi-family residential buildings, office buildings, highways and expressways, sports fields).
From the comparison of the flat dependence diagrams (Figure 15), the input variable share of element costs in the building costs (SE), adjusted individually to the model for each building type, should be considered crucial in terms of its impact on the value of the output variable (R). The lower the membership for the values of the arguments of the X1 universe domain for the linguistic terms "average" and "high" of the SE variable, the more the resulting value of the risk of construction investment cost overrun (R) increases for the smaller arguments of the X1 universe, that is, for SE in the interval of approximately [0.1; 0.3]. This conclusion is confirmed in particular by the comparison of the course of the result curves for office buildings (blue line) and sports fields (purple line).

Conclusions
The phenomenon of exceeding planned investment costs is often encountered in the construction industry, and the determination of the risk associated with it may be of key importance for achieving the objectives of the project. This paper discusses a cost overrun risk prediction model, the development of which was based on the fuzzy inference model of Mamdani. The model input variables include the following: share of element costs in the building costs (SE), predicted changes in the number of works (WC), and expected changes in the unit price (PC). The basic problem is to adjust the shape of the fuzzy sets for a given input SE to the type of building object. The paper proposes a shape for cubature buildings (residential and office ones), highways and expressways, and sports fields.
In order to check the correctness of the assumption made when designing the rule base, result diagrams were generated for the relationships between the variable R and the input variables for individual types of buildings. The results obtained confirm the correctness of the assumptions. With an increase in the input variables, the value of the risk level of exceeding the costs increases naturally and smoothly. The results indicate that the input variable SE, adjusted individually to the model for each type of construction object, is crucial in terms of its influence on the output value. The lower the membership for the values of the arguments of the X1 universe domain for the linguistic terms "average" and "high" of the SE variable, the more the resulting value of the risk of construction investment cost overrun (R) increases for the smaller arguments of the X1 universe.
The model requires further research, both in terms of the input data taken into account and the diversity of the analyzed construction projects. Further testing of the model on actual construction projects is needed to confirm its usefulness in determining the risk of cost overruns.
Author Contributions: E.P. carried out a review of the literature for the introduction. E.P. and D.W. described all assumptions of the cost overrun risk prediction model. D.W. prepared all figures and tables. E.P. and D.W. discussed the results and drew conclusions. All authors have read and agreed to the published version of the manuscript.