Statistical Methods in Bidding Decision Support for Construction Companies

A contractor is selected at the border between two phases of a building life cycle (LC): the programming phase (conception and design) and the execution phase. The tendering system is a particularly appropriate method of selecting a contractor in the construction market. It is usually based on quality and price criteria. The price criterion may involve either the price (namely, the direct costs of executing the works plus mark-ups, mainly overhead costs and profit) or the cost (based on the life cycle costing (LCC) method of cost efficiency). A contractor's decision to participate in a tender and to prepare a priced bid requires an investment of time and company resources. This decision is often made in a limited time frame and based on the contractor's experience and subjective judgement, so a number of models have been proposed in the literature to support it. The present paper proposes the use of statistical classification methods. The response obtained from the classification model is a recommendation to participate or not. A database of historical data was used for the analyses. Two models were proposed: the LOG model, using logit regression, and the LDA model, using linear discriminant analysis; the latter obtained better results. In the construction of the LDA model, the equation of the discriminant function was sought by identifying the statistically significant variables. For this purpose, the backward stepwise method was applied: initially all input variables, namely the 15 identified bidding factors, were introduced, and then in subsequent steps the least statistically significant variables were removed. Finally, six variables (factors) were identified that significantly discriminate between groups: type of works, contractual conditions, project value, need for work, possible participation of subcontractors, and the degree of difficulty of the works. 
The model proposed in this paper, based on discriminant analysis with six input variables, achieved good performance. The results obtained indicate that it can be used in practice. It should be emphasized, however, that mathematical models cannot replace the decision-maker's thought process, although they can increase the effectiveness of bidding decisions.


Introduction
With the development of new technologies and advanced building materials, increasing demands are placed on the construction industry. Modern buildings should have as little impact as possible on the environment [1][2][3], using sustainable materials (such as natural or recycled materials) [4][5][6] and environmentally friendly construction technologies [7][8][9]. They should have low energy consumption [10,11] and allow repairs resulting from wear and tear [12][13][14], as well as from possible breakdowns [15,16]. Preferably, they should also allow the resulting construction waste to be recycled or disposed of [17,18]. These aspects are considered by all participants in the investment process (the investor, the contractor, and the user) against the background of the different stages of the building life cycle. The phases identified in the literature include the following: the programming phase (study and conceptual analysis, as well as design), the execution phase (construction of the facility), the operation phase (operation, use, and maintenance of the facility), and the decommissioning phase (demolition of the facility). This paper focuses on the programming phase, and in particular on the conclusion of the design phase, which must be followed by the selection of a contractor for the construction work before the execution phase begins (Figure 1). The methods of sourcing contractors in the construction market depend on the type of market (private or public sector) and the value of the project. Owing to the individualized nature of construction production and the long production cycle, the tendering system is particularly suited to the operating conditions of the construction market [19]. The bidding procedure ensures that competition takes place properly and that its results are objective. It is also a factor that keeps prices in the construction industry objective. 
Any investor looking for a contractor can organize a tender, but it is the potential contractor who must decide whether to bid and begin the laborious process of preparing an offer.
The selection of the most advantageous tender is normally based on quality and price criteria [20]. The price criterion may involve either the price or the cost, the latter being based on a cost-effectiveness method such as life-cycle costing (LCC). In the former case, the price is determined from the direct costs of executing the works plus mark-ups, mainly overhead costs and profit [21][22][23]. In the latter case, life cycle costs (LCC) should be estimated, including the costs of planning, design, operation, maintenance, and decommissioning minus the residual value, if any [24]. The literature contains many mathematical models for estimating building life cycle costs [25][26][27], which are described and compared, for example, in [28]. A contractor's decision to enter a tender requires preparing the offer, which involves an investment of time and the commitment of staff, that is, the direct use of company resources. The costs of preparing the tender are incurred irrespective of its outcome. Efficient bidding is therefore essential for every construction company. Choosing the right tenders affects a company's image, its financial condition, and its pursuit of success [29].
The contractor often has to decide whether to participate in a tender within a limited time frame, and the decision is often based on his or her own experience. Various models have been developed to support this process and improve the effectiveness of the decision. In this context, a bidding decision model should be understood as a mathematical representation of reality, combined with a technique that helps the construction contractor decide whether to participate in the tender while avoiding errors and randomness. Efficient decision making is one of the greatest challenges of contemporary construction [30].
Different methods and tools are used to build models supporting construction contractors' bidding decisions. A summary of selected existing models (published after 2000) is provided in Table 1. The models differ in the methods used, as researchers seek and apply different methods, techniques, and approaches to obtain the most effective models. Notably, bid decision modeling has remained an active subject of research for at least 20 years.
The models proposed in the literature are generally based on factors (also called criteria) affecting the decision, which are used as input parameters. The number of publications on identifying these factors is considerable, as each country and region has a characteristic group of factors that may not be found in other markets [48][49][50]. It can therefore be concluded that the factors influencing tender decisions depend not only on the project to be tendered but also on the environment and market in which the contractor operates.
Bidding problems are also known in procurement auctions [51,52]. The study in [53] analyzes the relationship between the award price and the bidding price in public procurement in Spain and proposes an award price estimator, which is considered particularly useful for companies and public procurement agencies. Procurement auctions have long been employed in the logistics and transportation industry [54]. In combinatorial auctions, each carrier must determine the set of profitable contracts to bid on and the associated ask prices; this is known as the bid construction problem (BCP) [55]. Different approaches to the BCP in transportation procurement auctions have been proposed in the literature. For example, the authors of [56] solved the BCP for heterogeneous truckloads using exact and heuristic methods.
This paper proposes the use of statistical methods to support the construction contractor's decision-making process related to preparing a price offer and entering a tender. Two classification methods were used as decision support models. The response obtained from the classification model is either a recommendation to participate in the tender (assignment to class W, win) or a recommendation to withdraw (assignment to class L, loss). The analyses required a database of historical data, that is, resolved tenders. The research framework diagram is presented in Figure 2.

Data Acquisition
A literature survey and research gap analysis of statistical methods used to optimize bids were presented in [57]. The present paper attempts to build a decision-making model using two statistical methods: regression analysis and discriminant analysis. In methods derived from regression analysis, the values of the explained variable Y are given before the model is determined, and the model parameters are estimated from them and from the adopted factors. In discriminant analysis, by contrast, the values of the variable (the discriminant scores) are obtained when the model is determined.
Factors influencing decision-making were used as the input parameters of the models (explanatory variables). As a result of research (a questionnaire survey) conducted by the author in Poland, presented and described in previous works [29,44], 15 factors were identified:
• x1 - type of works,
• x2 - experience in similar projects,
• x3 - contractual conditions,
• x4 - investor reputation,
• x5 - project value,
• x6 - need for work,
• x7 - the size of the project,
• x8 - profits made in the past from similar undertakings,
• x9 - duration of the project,
• x10 - tender selection criteria,
• x11 - project location,
• x12 - time to prepare the offer,
• x13 - possible participation of subcontractors,
• x14 - the need for specialized equipment,
• x15 - degree of difficulty of the works.
The tender outcome was the model output variable (Y), representing the class:
• W - win, interpreted as a recommendation to take part in the tender,
• L - loss, interpreted as a recommendation to abandon the tender.
The starting point for the selected methods was the construction of a database. The research performed in Poland was primary research, based on information collected to solve the given decision problem. With regard to the type of research material, the study comprised quantitative research (evaluation of the factors) and qualitative research (the result obtained by the contractor in a given evaluated tender). The identified factors were used to evaluate the tenders entered by the contractors participating in the research. Each factor, from x1 to x15, was rated on a scale from 1 to 7, where 1 meant a very unfavorable and 7 a very favorable influence of the factor on the decision to participate in the tender. This scale has already been used successfully in previous works [44]. The result of each evaluated tender was then recorded (W - win, L - loss). In the end, the database contained 88 evaluated tenders, of which 64 were lost (L) and 24 won (W). Selected database records of evaluated tenders, including the factor evaluations and the corresponding tender result (W - win, L - loss), are presented in Table 2.
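To make the structure of such a database concrete, here is a minimal Python sketch of one evaluated tender as a record, assuming the 15 ratings are stored in the order x1 to x15. The names `FACTORS`, `TenderRecord` and `class_priors` are hypothetical and do not come from the study. The helper also computes the class shares n_r / n, which for 24 won and 64 lost tenders are clearly unequal, a point that matters later for the discriminant analysis.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass

# Hypothetical short labels for factors x1..x15, in the order listed in the text.
FACTORS = (
    "type of works", "experience in similar projects", "contractual conditions",
    "investor reputation", "project value", "need for work", "size of the project",
    "past profits from similar undertakings", "duration of the project",
    "tender selection criteria", "project location", "time to prepare the offer",
    "possible participation of subcontractors", "need for specialized equipment",
    "degree of difficulty of the works",
)

@dataclass(frozen=True)
class TenderRecord:
    """One evaluated tender: 15 factor ratings and the recorded outcome."""
    ratings: tuple[int, ...]  # x1..x15, each rated on the 1-7 scale
    outcome: str              # "W" (win) or "L" (loss)

    def __post_init__(self) -> None:
        if len(self.ratings) != len(FACTORS):
            raise ValueError("expected one rating per factor x1..x15")
        if any(not 1 <= r <= 7 for r in self.ratings):
            raise ValueError("ratings must lie on the 1-7 scale")
        if self.outcome not in ("W", "L"):
            raise ValueError("outcome must be 'W' or 'L'")

def class_priors(records: list[TenderRecord]) -> dict[str, float]:
    """Share of each outcome class, n_r / n."""
    counts = Counter(rec.outcome for rec in records)
    n = len(records)
    return {label: counts[label] / n for label in ("W", "L")}
```

With the study's 24 won and 64 lost tenders, the shares are 24/88 ≈ 0.27 and 64/88 ≈ 0.73.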

Regression Analysis Model
The main task of the qualitative decision-making model is to determine the probability of the contractor's success in the tender (winning) and to identify the variables that significantly affect the outcome of the tender. A binomial (dichotomous) model is sought in which the explained (dependent) variable Y is a zero-one variable. It takes two possible values, coded "1" for W (win) and "0" for L (loss). If $p_i$ is the probability of the event $Y_i = 1$, then $1 - p_i$ is the probability of the event $Y_i = 0$. The expected value of the variable $Y_i$ is [58,59]:

$$E(Y_i) = 1 \cdot p_i + 0 \cdot (1 - p_i) = p_i$$

In binomial models, it is assumed that $p_i$ is a function of the vector of values of the explanatory variables $\mathbf{x}_i$ for the i-th object and of the parameter vector $\boldsymbol{\beta}$ [58,59]:

$$p_i = F(\mathbf{x}_i^{T}\boldsymbol{\beta})$$

Depending on the type of the function F, different types of models are distinguished [60]: the linear probability model, the logit model, and the probit model. Using the simplest of the binomial models, the linear probability model, has many negative consequences described in the literature [58,61]. Probit and logit models, on the other hand, as indicated by some authors [60], are similar to each other, and in practice only one of them is used. Therefore, the search for a binomial model for the phenomenon in question was limited to the logistic regression model. The general form of the logit model is as follows [58,59]:

$$Y_i^{*} = \ln\frac{p_i}{1 - p_i} = \beta_0 + \sum_{j=1}^{k} \beta_j X_{ji} + u_i$$

where $\beta_j$ are the structural parameters of the model, $u_i$ is the random component, $Y_i^{*} = \ln\frac{p_i}{1 - p_i}$ is the unobservable qualitative variable, $X_{ji}$ are the values of the explanatory variables of the model, and $p_i$ is the probability that the dependent variable $Y_i$ takes the value "1", calculated from the logistic cumulative distribution function.
The unobservable variable $Y_i^{*}$ is a latent variable, because only the binary variable $Y_i$ can be observed:

$$Y_i = \begin{cases} 1, & Y_i^{*} > 0 \\ 0, & Y_i^{*} \le 0 \end{cases}$$

According to [53], the logit is the natural logarithm of the odds of $Y_i$ taking the value "1" rather than "0". It equals zero when $p_i = 0.5$, is negative when $p_i < 0.5$, and is positive when $p_i > 0.5$.
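The link between the linear predictor, the probability p_i and the latent-variable rule can be checked numerically. The following is a minimal Python sketch using only the standard library; the function names are illustrative and not part of any statistical package.

```python
import math

def logistic_cdf(z: float) -> float:
    """Logistic CDF F(z) = 1 / (1 + e^(-z)): maps the linear predictor to p."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # rearranged form avoids overflow for large negative z
    return e / (1.0 + e)

def logit(p: float) -> float:
    """Log-odds ln(p / (1 - p)), the inverse of logistic_cdf on (0, 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def observed_class(latent: float) -> int:
    """Latent-variable rule: Y = 1 (win) if Y* > 0, otherwise Y = 0 (loss)."""
    return 1 if latent > 0 else 0
```

For example, `logit(0.5)` is exactly 0, `logit(0.3)` is negative and `logit(0.8)` is positive, so `observed_class` returns 1 precisely when the modeled probability of winning exceeds 0.5.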

Discriminant Analysis Model
Linear discriminant analysis (LDA), introduced in 1936 [62], enables cases (objects) to be classified into one of several predetermined groups based on explanatory variables (case characteristics). LDA is commonly used in the literature to classify objects (cases) [63] and to support decision-making processes [64]. The aim of discriminant methods is to determine which of the explanatory variables differentiate the groups the most. The discrimination problem can be solved by means of discriminant functions, which are most often linear functions of the input variables characterizing the cases [65]. If group sizes are not comparable, a modified form of the classification function should be used [65]:

$$K_r = c_{r0} + \sum_{j=1}^{m'} c_{rj} x_j, \qquad p_r = \frac{n_r}{n}$$

where $K_r$ is the classification function for the r-th group of cases, $c_{rj}$ is the coefficient of the r-th classification function for the j-th input variable with significant discriminatory power ($j = 1, \dots, m'$), $c_{r0}$ is the absolute term, which in this modified form includes $\ln p_r$, $p_r$ is the a priori probability of assigning an object to the r-th group, $n_r$ is the size of that group, and $n$ is the sample size.
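The classification rule with unequal group sizes can be sketched in Python with NumPy. This is a minimal sketch of the textbook linear classification functions under the assumption of a common (pooled) within-group covariance matrix; it is not the software used in the study, and the function names are hypothetical. As the text describes, the absolute term c_r0 carries the prior ln p_r, with p_r = n_r / n by default.

```python
import numpy as np

def fit_classification_functions(X, y, priors=None):
    """Estimate linear classification functions K_r = c_r0 + sum_j c_rj * x_j.

    Assumes a common within-group covariance matrix (pooled estimate), so
    c_r = S^-1 m_r and c_r0 = -0.5 * m_r' S^-1 m_r + ln p_r.
    If priors is None, p_r = n_r / n is used.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labels = np.unique(y)
    n = len(y)
    means = np.array([X[y == r].mean(axis=0) for r in labels])
    # Pooled within-group covariance matrix (divisor n - number of groups).
    S = sum((X[y == r] - m).T @ (X[y == r] - m) for r, m in zip(labels, means))
    S = S / (n - len(labels))
    if priors is None:
        priors = np.array([np.mean(y == r) for r in labels])
    coef = means @ np.linalg.inv(S)  # row r holds c_r1 ... c_rm
    const = -0.5 * np.sum(coef * means, axis=1) + np.log(priors)  # c_r0
    return labels, const, coef

def classify(X, labels, const, coef):
    """Assign each case to the group whose classification function is largest."""
    scores = np.asarray(X, dtype=float) @ coef.T + const
    return labels[np.argmax(scores, axis=1)]
```

Changing the priors shifts only the constants, so a borderline case can move to the larger group when group sizes are as unequal as 64 lost versus 24 won tenders; this is why the modified form is recommended when group sizes are not comparable.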
Modeling takes place in several stages. In the first step, the discriminant function equation is sought by identifying the variables that significantly discriminate between groups. The next step is to check the statistical significance of the discriminant function and determine its coefficients. The next stage of the analysis is the classification procedure using classification functions.

Evaluation of the Proposed Models
To assess the quality and relevance of the proposed classification models [66], the following were used:
• A classification (confusion) matrix, which gives the number, and often the proportion, of correctly and incorrectly classified cases;
• Diagnostic test parameters based on the contingency matrix: sensitivity (7), specificity (8), positive (9) and negative (10) predictive value, and test reliability (11):

$$SE = \frac{TP}{TP + FN} \tag{7}$$
$$SP = \frac{TN}{TN + FP} \tag{8}$$
$$PPV = \frac{TP}{TP + FP} \tag{9}$$
$$NPV = \frac{TN}{TN + FN} \tag{10}$$
$$ACC = \frac{TP + TN}{TP + TN + FP + FN} \tag{11}$$

where TP denotes true positive results, FP false positive results, TN true negative results, and FN false negative results, a positive result being a case classified as W (win).

Sensitivity is the ratio of true positives to the sum of true positives and false negatives. In the problem under analysis, it describes the ability to detect winning cases.
Specificity is the ratio of true negatives to the sum of true negatives and false positives. In the problem examined, it describes the ability to detect losing cases.
PPV (positive predictive value) is the probability that a case identified by the classifier as winning is indeed a winning case.
NPV (negative predictive value) is the probability that a case identified by the classifier as losing is indeed a losing case.
Accuracy (ACC), the effectiveness of the decision rule, is the proportion of all cases classified correctly, that is, the extent to which the classification results reflect reality.
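Equations (7) to (11) translate directly into code. Below is a minimal Python sketch with a hypothetical function name, following the convention, consistent with the coding "1" = W, that a positive result is a case classified as W.

```python
def diagnostic_parameters(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Diagnostic test parameters of a two-class classifier, Eqs. (7)-(11).

    "Positive" means a case classified as W (win), "negative" as L (loss).
    """
    return {
        "SE": tp / (tp + fn),                    # sensitivity: actual wins detected
        "SP": tn / (tn + fp),                    # specificity: actual losses detected
        "PPV": tp / (tp + fp),                   # predicted wins that were won
        "NPV": tn / (tn + fn),                   # predicted losses that were lost
        "ACC": (tp + tn) / (tp + fp + fn + tn),  # share of all cases correct
    }
```

For instance, with hypothetical counts TP = 8, FP = 2, FN = 4 and TN = 26, accuracy is (8 + 26) / 40 = 0.85, while sensitivity is only 8 / 12 ≈ 0.67. A high accuracy can therefore hide a weak ability to detect the smaller W class.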

LOG Model: The Model Using Regression Analysis
Using logit regression, an attempt was made to model the qualitative variable Y and to explain which factors affect the chance of tender success, with what strength, and in what direction. The parameter estimates are summarized in Table 3.
At the assumed significance level α = 0.1, only two variables significantly affect the model: x3 - contractual conditions and x6 - need for work. However, the p-values for x12 - time to prepare the offer and x15 - the degree of difficulty of the works are only slightly higher than 0.1, so these variables were retained and the model was recalculated. The parameter estimates for the logistic regression model with four explanatory variables are summarized in Table 4.
Finally, three variables were retained (the non-significant variable x12 - time to prepare the offer was discarded) and the model was recalculated.
The parameter estimates for the logistic regression model with three explanatory variables are summarized in Table 5.
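The elimination procedure described above can be automated. The sketch below fits a logit model by Newton-Raphson (iteratively reweighted least squares) with NumPy, computes two-sided Wald p-values, and repeatedly drops the least significant variable until all remaining p-values are at most α = 0.1. It is an illustration under assumptions, not the author's software. The function names are hypothetical, and the fully automatic rule differs from the text, where variables with p-values just above 0.1 (x12, x15) were deliberately kept for one more round.

```python
import math
import numpy as np

def fit_logit(X, y, max_iter=50, tol=1e-10):
    """Fit ln(p / (1 - p)) = b0 + X b by Newton-Raphson; return b and Wald p-values."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X1.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        info = X1.T @ (X1 * (p * (1.0 - p))[:, None])  # Fisher information
        step = np.linalg.solve(info, X1.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X1 @ beta))
    cov = np.linalg.inv(X1.T @ (X1 * (p * (1.0 - p))[:, None]))
    z = beta / np.sqrt(np.diag(cov))
    # Two-sided p-value of a standard normal Wald statistic: erfc(|z| / sqrt 2).
    pvals = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
    return beta, pvals

def backward_eliminate(X, y, names, alpha=0.1):
    """Remove the least significant variable until all p-values <= alpha."""
    X = np.asarray(X, dtype=float)
    keep = list(range(X.shape[1]))
    while keep:
        _, pvals = fit_logit(X[:, keep], y)
        slope_p = pvals[1:]  # skip the intercept
        worst = int(np.argmax(slope_p))
        if slope_p[worst] <= alpha:
            break
        keep.pop(worst)
    return [names[k] for k in keep]
```

With only 88 tenders, and only 24 of them won, Wald p-values are approximate, which is one reason to treat borderline variables with judgement rather than a hard cut-off.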
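The proposed logit model (LOG model) takes the following form, with the parameter estimates given in Table 5:

$$\ln\frac{p_i}{1 - p_i} = \hat\beta_0 + \hat\beta_3 X_{3i} + \hat\beta_6 X_{6i} + \hat\beta_{15} X_{15i}$$

This means that the probability $p_i$ (that is, of the situation $Y_i = 1$) is estimated as:

$$\hat p_i = \frac{\exp(\hat\beta_0 + \hat\beta_3 X_{3i} + \hat\beta_6 X_{6i} + \hat\beta_{15} X_{15i})}{1 + \exp(\hat\beta_0 + \hat\beta_3 X_{3i} + \hat\beta_6 X_{6i} + \hat\beta_{15} X_{15i})}$$

The statistical verification of the logit model, which consisted in determining how well the model fits the data and testing the statistical significance of its parameters, was successful. The odds ratio of the classification table is 9.62. A value greater than 1 indicates that the model classifies better than chance: the odds of a tender being classified as won are about 9.6 times higher for tenders actually won than for tenders actually lost. Using the proposed logit model, it is possible to estimate the probability with which a given tender will be won.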

LDA Model: The Model Using Discriminant Analysis
In the first step of building the model, the equation of the discriminant function was sought by identifying the variables that significantly discriminate between groups. To achieve this, the backward stepwise method was applied. In this approach, all variables are entered into the model (step 0), and then, in each subsequent step, the least statistically significant variable is removed.
The results for the model and the evaluation of all 15 input variables (step 0) are given in Table 6. At the assumed significance level α = 0.1, only four variables (x3, x5, x6, x13) discriminated significantly between groups. The model parameters are as follows: Wilks' Lambda = 0.41344, F(15, 72) = 6.8100, p < 0.0001.
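One backward step can be sketched with NumPy via partial Wilks' lambda. For each candidate variable, the F-to-remove is computed from the ratio of Wilks' lambda with and without it, and the variable with the smallest value is dropped. This is a minimal sketch of the standard F-to-remove criterion, assuming an F threshold of 2.7 (roughly α = 0.1 for one numerator degree of freedom in large samples) instead of exact p-values. The function names are hypothetical, and the statistical package used in the study may differ in details.

```python
import numpy as np

def wilks_lambda(X, y):
    """Wilks' lambda = det(W) / det(T) (within-group over total SSCP matrix)."""
    centered = X - X.mean(axis=0)
    T = centered.T @ centered
    W = np.zeros_like(T)
    for r in np.unique(y):
        dev = X[y == r] - X[y == r].mean(axis=0)
        W += dev.T @ dev
    return np.linalg.det(W) / np.linalg.det(T)

def backward_stepwise_lda(X, y, names, f_to_remove=2.7):
    """Drop, one per step, the variable with the smallest F-to-remove."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, g = len(y), len(np.unique(y))
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        p = len(keep)
        full = wilks_lambda(X[:, keep], y)
        f_values = []
        for k in keep:
            reduced = wilks_lambda(X[:, [j for j in keep if j != k]], y)
            partial = full / reduced  # partial lambda of variable k
            f_values.append((1 - partial) / partial * (n - g - p + 1) / (g - 1))
        worst = int(np.argmin(f_values))
        if f_values[worst] >= f_to_remove:
            break  # every remaining variable still discriminates
        keep.pop(worst)
    return [names[k] for k in keep]
```

Because removing a variable can only increase Wilks' lambda, the partial lambda lies between 0 and 1. A variable whose removal barely changes lambda gets a small F-to-remove and is dropped first.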
In the first step of the analysis, the variable x2, which discriminated least between groups, was removed. Subsequent steps (k = 2, ..., 15) made it possible to select the most significant variables (Table 7). Finally, six input variables discriminate significantly between groups: x1 - type of works, x3 - contractual conditions, x5 - project value, x6 - need for work, x13 - possible participation of subcontractors, and x15 - the degree of difficulty of the works. The model parameters are as follows: Wilks' Lambda = 0.47047, F(6, 81) = 15.195, p < 0.0001. The smaller the value of Wilks' Lambda (which ranges from 0 to 1), the greater the discriminating power of the model; in the analyzed example (0.47047), it is acceptable. The tolerance coefficient $T_k$ is the proportion of the variance of the variable $x_k$ that is not explained by the other variables in the model. If $T_k$ is smaller than the default value of 0.01, the variable is more than 99% redundant with the other variables in the model, and entering variables with low tolerance into the model may make it highly inaccurate. In the model under consideration, the $T_k$ coefficients of all the selected variables exceed 0.5.
The next step of the analysis is to check the statistical significance of the discriminant function (Table 8) and to determine its coefficients. The eigenvalue of a discriminant function is the ratio of the between-group variance to the within-group variance; large eigenvalues characterize functions with high discriminatory power. Canonical correlation measures the strength of the association between the grouping variable and the discriminant function scores. It ranges from 0 to 1, where 0 means no relationship and 1 means a perfect relationship. The value of 0.727691 indicates a fairly strong association between the function and the grouping variable. The value of Wilks' Lambda is acceptable, and p < 0.0001, well below 0.05. 
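With two groups there is a single discriminant function, and Wilks' lambda Λ determines the other reported statistics exactly: F = ((1 − Λ)/Λ)·((n − p − 1)/p) with (p, n − p − 1) degrees of freedom, the eigenvalue (1 − Λ)/Λ, and the canonical correlation √(1 − Λ). The short Python check below (standard library only; the function name is hypothetical) applies these two-group identities to the reported values with n = 88.

```python
import math

def two_group_statistics(wilks: float, n: int, p: int) -> dict:
    """Exact two-group relations between Wilks' lambda and F, eigenvalue, R."""
    df1, df2 = p, n - p - 1
    eigenvalue = (1.0 - wilks) / wilks
    return {
        "F": eigenvalue * df2 / df1,
        "df": (df1, df2),
        "eigenvalue": eigenvalue,
        "canonical_r": math.sqrt(1.0 - wilks),
    }

step0 = two_group_statistics(0.41344, n=88, p=15)  # all 15 variables
final = two_group_statistics(0.47047, n=88, p=6)   # six selected variables
print(step0["df"], round(step0["F"], 3))
print(final["df"], round(final["F"], 3), round(final["canonical_r"], 4))
```

The check reproduces F(15, 72) ≈ 6.81 for step 0 and F(6, 81) ≈ 15.195 for the final model, and gives a canonical correlation of about 0.7277, in line with the reported 0.727691. The implied eigenvalue is about 1.13.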
The proposed discriminant function is statistically significant and is a linear combination of the six selected variables. The next stage of the analysis is the classification procedure using classification functions. In the problem under analysis, two groups were assumed (W - win, L - loss), so two classification functions were defined:
• K0, classifying to the "L - loss" group;
• K1, classifying to the "W - win" group.
Each has the form

$$K_r = c_{r0} + c_{r1}x_1 + c_{r3}x_3 + c_{r5}x_5 + c_{r6}x_6 + c_{r13}x_{13} + c_{r15}x_{15}, \qquad r \in \{0, 1\}$$

A given case is assigned to the group for which the classification function takes the highest value.

Evaluation of Models: Discussion of Results
To evaluate the models, the classification efficiency, expressed as the number of cases correctly classified into the predefined classes, was used. A summary of the performance of the proposed models is presented in Tables 9 and 10. The data in these tables allow the basic parameters of the classification models to be determined; the results are given in Table 11. From the values in Table 10, the LOG model correctly classified 79.55% of the cases and predicted tender failure more reliably than success: 83.82% of the cases it classified as L were actual losses. The values obtained show a good fit of the model, but a limitation is that the model identified only three tender factors as statistically significant: x3 - contractual conditions, x6 - need for work, and x15 - degree of difficulty of the works. The LDA model, like the LOG model, predicts tender failure more accurately than winning: 87.14% of the cases it classified as L were actual losses, compared with 83.33% for the cases classified as W. The LDA model performed better overall, with 86% of cases correctly classified. Apart from x3, x6, and x15, the discriminant analysis also identified x1 - type of works, x5 - project value, and x13 - possible participation of subcontractors as significant variables, with x6 - need for work exerting the greatest independent influence on the value of the discriminant function and x3 - contractual conditions the least. An extract from the LDA model results sheet, with the values of the classification functions compared with the observed (actual) outcomes, is shown in Table 12. The analyses presented in this paper do not exhaust the issue of modeling contractors' decisions to participate in tenders for construction works; they can supplement the models proposed so far in the literature. 
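The reported percentages can be cross-checked against a 2 × 2 classification table. The counts below are not copied from Tables 9 and 10. They are reconstructed under stated assumptions: 24 won and 64 lost tenders, 83.82% and 87.14% read as negative predictive values, and 83.33% read as the LDA positive predictive value. Treat them as a consistency check rather than the author's tables. The names `MODELS` and `summarize` are hypothetical.

```python
# Counts reconstructed from the reported percentages (assumption, not Tables 9-10).
MODELS = {
    "LOG": {"tp": 13, "fp": 7, "fn": 11, "tn": 57},
    "LDA": {"tp": 15, "fp": 3, "fn": 9, "tn": 61},
}

def summarize(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Diagnostic parameters plus the odds ratio of the classification table."""
    return {
        "ACC": (tp + tn) / (tp + fp + fn + tn),
        "SE": tp / (tp + fn),
        "SP": tn / (tn + fp),
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "OR": (tp * tn) / (fp * fn),  # odds ratio; 1 would mean chance level
    }

for name, counts in MODELS.items():
    stats = summarize(**counts)
    print(name, {key: round(value, 4) for key, value in stats.items()})
```

For the LOG counts this gives ACC ≈ 0.7955, NPV ≈ 0.8382 and an odds ratio of about 9.62, matching the values reported for the logit model. For the LDA counts it gives ACC ≈ 0.8636 (the "86%"), PPV ≈ 0.8333 and NPV ≈ 0.8714. Under this reconstruction, both models detect actual losses far better than actual wins, consistent with the observation that they predict tender failure more reliably than success.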
It should be noted that building classification models requires an appropriate database, which is built on the tender factors selected by the author of each model. It is also worth emphasizing that, in the face of fierce competition in the construction market, contractors are looking for ways to maximize their chances of winning tenders. The authors of [67] observed that the bid preparation process, which is time-consuming and requires considerable effort, may create the need to employ appropriate specialists. Large companies are typically better able to employ such specialists, whereas small and medium-sized companies are more likely to need tools that support the proper selection of orders and the decision to tender. The proposal to build and use mathematical models therefore appears appropriate.
In further research, the author intends to apply artificial intelligence methods to the same database. Using the same database and the same model input and output parameters will allow an objective comparison of the effectiveness of the statistical and artificial intelligence approaches.

Conclusions
A construction company has to make a number of important decisions at each stage of its activity. One of them is the decision to enter a tender. Although it involves company finances and resources, this decision is usually taken quickly and based on subjectively perceived information. A number of models and mathematical methods have been proposed in the literature to assist the decision maker and to increase the effectiveness of the decisions taken. In this paper, two statistical classification methods are used for modeling: logistic (logit) regression and linear discriminant analysis. The response obtained from the classification model is either a recommendation to participate in the tender (assignment to class W, win) or a recommendation to withdraw (assignment to class L, loss). The analyses required a database of historical data, that is, resolved tenders. The comparison of the classification models shows that the model using linear discriminant analysis performed better, correctly classifying 86% of cases. The backward stepwise method was used to eliminate the least statistically significant variables. Finally, from the set of 15 identified factors, six input variables (factors) were identified that significantly discriminate between groups: x1 - type of works, x3 - contractual conditions, x5 - project value, x6 - need for work, x13 - possible participation of subcontractors, and x15 - the degree of difficulty of the works. With these variables, the model achieved good performance. In the survey presented in [44], construction contractors selected the following as the most important factors influencing the decision to enter a tender: type of works, contractual conditions, experience in similar projects, project value, and need for work. These largely coincide with the results obtained from the statistical methods. 
The results obtained (classification effectiveness and model evaluation parameters) indicate that the LDA model can be used in practice.
Funding: This research received no external funding.