Classifying the Level of Bid Price Volatility Based on Machine Learning with Parameters from Bid Documents as Risk Factors

The purpose of this study is to classify the level of bid price volatility using machine learning, with parameters from bid documents serving as risk factors. To this end, a preliminary review identified project-oriented risk factors affecting the bid price and examined the pre-bid clarification document as a measure of uncertainty in bid documents. The authors collected Caltrans's bid summaries and pre-bid clarification documents from 2011–2018 as data samples. To train the classification model, the data were preprocessed into a final dataset of 269 projects consisting of input and output parameters. Projects in which bid inquiries were not resolved during pre-bid clarification had higher bid averages and bid ranges than projects in which the risk was resolved. In addition, of the two classification models built with neural network (NN) algorithms, Model 2, which included the uncertainty in the bid documents as a parameter, predicted the bid average risk and bid range risk more accurately (52.5% and 72.5%, respectively) than Model 1 (26.4% and 23.3%, respectively). The accuracy of Model 2 was verified with 40 verification test datasets.


Introduction
Sustainability concerns the whole life cycle of a facility, from siting to design, construction, operation, maintenance, renovation, and deconstruction [1,2]. Research that traditionally focused on maximizing profit during the design and construction stages has gradually expanded to the entire life cycle of construction projects to realize sustainable development. Accordingly, many researchers have conducted valuable studies to minimize environmental impact by improving the energy performance of buildings and reducing waste. As a result, sustainability research has advanced remarkably around maintenance and the subsequent stages. Recently, this trend has expanded further as researchers seek to put the results of these studies into practice [3]. Consequently, many studies have refocused, from a sustainability perspective, on project management, which covers the preceding stages of the life cycle [4]. The success or failure of a construction project starts at its initial stage. More precisely, feasibility at the bid and contract phases, which stipulate plans for the rest of the project, enables the completion of a sustainable project.
Contracts for construction projects are awarded through competitive bidding. In general, the bidder who offers the lowest price is selected as the winner. Therefore, determining the bid price is crucial for bidders [5]. It is also difficult, because the bid price affects both the likelihood of winning the project and the likelihood of earning a satisfactory profit [6]. The client provides a bid document to the bidders, who examine it to estimate their bid prices. Thus, the bid document plays an essential role in determining the bid price. If the content of the bid document is uncertain, the intent of the construction object may be ambiguous and cause mistakes during the construction phase, which may lead to construction rework. The bid price $B_i$ can be expressed as

$$B_i = C_i + M_i \qquad (1)$$

where $C_i$ represents the construction cost and $M_i$ is the markup (i.e., contingency), which represents the risk cost due to uncertainty in the bid document [9]. In other words, if the risk cost increases owing to uncertainty in the bid documents, the bid price increases [10].
The uncertainty factors in bid documents that give rise to risk cost must be identified and investigated. However, because reviewing all bid documents within a limited time frame is difficult [11], businesses often rely on experience rather than on quantitative measurements of uncertainty [12–15]. Because uncertainty in bid documents is affected by complex factors that are difficult to measure quantitatively, most academic studies on the subject have been qualitative. Therefore, determining the risk cost remains a difficult challenge [16–20]. Bid prices that are not set appropriately harm bidders, clients, and users alike. Bidders who take on excessive risk may not only fail to earn the expected return but also face more serious financial difficulties. At the same time, such a bid price may increase the cost of completion through frequent design changes during project execution, increasing the burden on the client. Eventually, the quality of a project completed under these conditions may suffer, causing considerable inconvenience to users.
Although reviewing all documents may be challenging, pre-bid clarification documents concentrate information about uncertainty far more than other documents do. This document type contains bidders' inquiries and clients' answers about uncertainty factors in the bid documents, and this information can be used as an input parameter for a machine learning-based model of the bid price. This study examines whether the uncertainty measured in pre-bid clarification documents affects the bid price. This uncertainty may change the mean or the variance of the bid prices. In this paper, these two changes are operationally defined as "bid price volatility." A data sample from the California Department of Transportation (Caltrans) in the US is used to examine how uncertainty in the bid documents changes bid price volatility. Analyzing the uncertainty of bid documents is very difficult. In particular, the volume of bid documents is enormous because of the size of construction projects. Crucially, construction project data are in unstructured text format, which makes quantitative analysis even more difficult. The authors address this problem by using the pre-bid clarification document, which records inquiries about this uncertainty, as a proxy, together with the bid summary. This study suggests that the uncertainty of bid documents changes bid price volatility. This finding can help bidders set a reasonable price that balances earning a profit against winning the project. Further, such a reasonable price can improve project performance and client satisfaction. Ultimately, it can extend sustainability across the life cycle of the project.

Definition of Project Risk in Bid Phase
The definition of risk depends on the subject and purpose of a field. In dictionaries, risk is defined as the "possibility of loss or injury." Because risk is a concept used to quantify the uncertainty regarding danger, it differs from danger itself. In academia, risk is more precisely defined as a factor or condition that can cause loss or injury owing to uncertainty. This definition focuses on the possibility of loss rather than on the loss itself, and it is typically expressed in the following equation [21]:
Risk magnitude is one of several attempts to measure risk [16]. It is a useful indicator for setting priorities among risk factors, using "risk" and "uncertainty" as variables. However, this is also a limitation: it permits relative comparisons between risk factors but not absolute ones. Therefore, the quantitative relationship between risk in the actual bidding process and the resulting bid price remains unclear. This suggests that construction projects, at least, need a new indicator that reflects the risk in bid prices, and many studies have addressed this. Abotaleb and El-Adaway [9] attempted to measure bidding risk as a markup percentage: to pursue profit while preparing for risks, bidders submit as their bid price the total construction cost plus a specific rate. Another study used the contrast between the successful bid price and the average bid price to determine whether the successful bid carried more risk than necessary [22]. Lee et al. [23] measured bid risk with an equation similar to that of Williams [22], but with the engineer's estimate in place of the successful bid price. However, these equations are difficult to use in this study for the following reasons.
First, markups cannot be observed. The contingency is certainly included in the price, but third parties other than the bidders, such as researchers, cannot identify it from the bid history, because the contingency is included in one or more of the bid items of the bill of quantities (BOQ). Second, the successful bid price and the average bid price are determined only after the bid date, so they cannot be used to predict the risks of similar projects during the bid phase.
In this study, risk is defined as the quantified uncertainty regarding danger (i.e., loss or injury), and a risk factor is a factor that causes uncertainty regarding time, cost, and quality. Section 4.1.2 introduces two metrics that match this definition of risk. The scope of this study covers construction projects, and project risk corresponds to the uncertainty that arises from the characteristics of the construction project. Because the bid process varies with the project delivery method in actual construction projects, the bid phase in this study is the period in which the construction bid is made under the design-bid-build method. Uncertainty in the bid document is one of many project risk factors in the bid phase.

Project Risk Factors Affecting the Bid Price
Several researchers have studied project risk factors that affect the bid price. Construction projects can be classified into several types, and every project type involves risk. Therefore, researchers have analyzed risk without considering project type. In this study, risk factors that affect the bid price are extracted from 13 previous studies in the field of transportation (Table 1).
These studies are significant in that they have substantially advanced risk identification, a critical stage of risk management. Many studies have extracted common factors to consider when bidding for projects. By prioritizing or breaking down the risks to be considered during project execution based on expert surveys, existing studies have enabled more detailed risk management. Other studies have analyzed how the number of bidders affects the bid price, or have predicted the bid price through simulation using multiple variables instead of a single variable. However, these studies do not consider how project risk is incorporated into the initial bid price of a project.

Uncertainty in Bid Document as a Project Risk Factor
Uncertainty in bid documents is one of the most crucial risk factors. The bid phase is the first stage of a project contract. The bidder submits the bid price after reviewing the extensive bid documents, which contain information on three aspects: (1) the bid procedure (e.g., the announcement, guide, participation application, participation notice, and bid), (2) the contract (e.g., the general and special conditions), and (3) the construction (e.g., the drawings, specifications, and pre-bid clarification document). Each bid document has a different scope and form. For example, the specifications contain a set of documented requirements, and the drawings present the building requirements. The special conditions are contract clauses that apply only to the project under contract. They are created by changing, adding to, or deleting existing content in the General Conditions. In other words, the bid documents set out the standards and procedures for the design, construction method, materials, and inspection required to complete the construction object. Thus, they form the basis for calculating the bid price. In addition, because bid documents are contract documents, they are the basis for judgment when legal problems arise later. Cost overruns can occur if the bidder fails to review the risk factors in the bid documents in advance [23]. Therefore, analyzing the uncertainty in bid documents is crucial.
Discrepancies, errors, and omissions cause uncertainty in bid documents and are leading causes of legal adjustment, arbitration, and litigation over project costs. According to Tanaka [33], 74.4% of construction-related claims in the United States are due to uncertainty in bid documents. Erdis and Ozdemir [34], studying disputes between clients and bidders, argued that uncertain expressions in a bid document can lead to construction disputes.
Public projects in the US include a pre-bid clarification procedure intended to resolve uncertainty in bid documents before the bid. If bidders find uncertainty in a bid document while preparing their quotation, they can contact the client, who must respond by a deadline. Resolving uncertainty in bid documents in this way helps bidders present the correct bid price [35]. New York State in the US has emphasized that pre-bid clarification is a significant procedure for both the client and the bidder [36]. With less uncertainty, the client can calculate the project cost more accurately. Pre-bid clarification is an institutional method that helps present accurate project costs and prevents possible future design changes, extensions, additional construction costs, and disputes [37].

Pre-Bid Clarification Document as a Proxy for Uncertainty in Bid Documents
Uncertainty in bid documents includes (1) unclear communication caused by discrepancies, errors, or omissions of information and (2) unclear requirements regarding the project object. In general, the bid process for a construction project involves many bid documents [23]. Each document may contain risk factors independently, and documents may also interact to create further risk factors. Hence, all bid documents must be carefully reviewed to determine the level of uncertainty. However, it is difficult for bidders to identify all hidden risks within the short bid preparation time [16–20,23]. Therefore, both in practice and in academia, the uncertainty of bid documents has been considered a complex problem [11,38] and a risk beyond control [28].
The uncertainty raised in the pre-bid clarification procedure stems from factors in all the bid documents that bidders read. Because these factors are collected in the pre-bid clarification document, it can serve as a proxy variable for the uncertainty of the entire set of bid documents. For example, Daoud and Allouche [39] analyzed pre-bid clarification documents to examine which uncertainty factors occur in the bid documents of construction projects.

Hypothesis Development
The literature review reveals a widely held proposition: at the theoretical level, uncertainty during the bid phase affects the bid price. The problem, however, is that the factors classified as risks mix the measurable with the unmeasurable and the controllable with the uncontrollable, which makes quantitative analysis impossible. For this reason, practitioners who calculate prices guess at these uncertainties and reflect them in prices without a factual basis. We made two assumptions to establish the hypotheses. (1) As uncertainty increases, bidders reflect it in their prices, which raises the overall average bid price. (2) The greater the uncertainty, the larger the differences among the prices offered by bidders, which widens the range of bid prices. Under these assumptions, we set up the following two hypotheses at the empirical level.

Hypothesis 1 ($H_1$).
Factors derived from the bid summary and pre-bid clarification document affect $F_1(x)$, representing the volatility of the bid price.

Hypothesis 2 ($H_2$).
Factors derived from the bid summary and pre-bid clarification document affect $F_2(x)$, representing the volatility of the bid price.
In $H_1$ and $H_2$, the factors consist of seven independent parameters obtained from the bid summary and pre-bid clarification document. $F_1(x)$ in $H_1$ is the Bid Average Risk (Equation (3)), and $F_2(x)$ in $H_2$ is the Bid Range Risk (Equation (4)), both discussed in Section 4.

Research Gaps and Research Questions
According to the Project Management Body of Knowledge [40], risk management research is organized around (1) risk identification, (2) risk assessment, and (3) risk planning and control. Risk assessment, which precedes risk planning and control, quantifies the potential impact of uncertain factors [11]. However, the many interrelated variables involved make analysis difficult both in practice and in research [14,24].
The fundamental difficulty lies in risk identification, the step that precedes assessment. The general approach is to subdivide all project risk factors into controllable units based on specific criteria. Analyzing segmented risks can reduce uncertainty in the bid phase [37]. However, this approach has not been applied, either quantitatively or qualitatively, to risk factors in bid documents [23], because these documents contain vast amounts of information whose content differs from project to project. Therefore, uncertainty in bid documents has been classified as an uncontrollable risk [28]. As mentioned in Section 2.2, researchers have progressed only to Level 1 by suggesting that uncertainty in bid documents is a risk factor. Sufficiently specific studies at Level 2 are lacking [23]. In other words, published management studies have mainly focused on high-level risk factors and surveys of expert groups [11,38]. However, such data [41] serve only as references for determining the bid price in the bid phase.
These gaps must be investigated for two reasons. First, in practice, a reasonable bid price is crucial for establishing a reasonable project budget for both bidder and client [20]. This requires a decision support tool for practitioners who have difficulty predicting bid prices. Second, in research, a more quantitative study based on actual bid data is required to assess whether uncertainty in bid documents affects bid prices. This study aims to meet both academic and practical needs by analyzing whether uncertainty in bid documents affects the bid price.
Uncertainty in a bid document is expected to affect the bid price in two ways. First, each bidder will reflect this risk factor in his or her bid price, which raises the project's overall bid price (i.e., the bid price average). Second, because bidders reflect this risk factor in their prices to different degrees, the range of the bid price band widens. In this study, these two effects are defined as "bid average risk" and "bid range risk," respectively. Bid price volatility comprises both types of bid price fluctuation caused by uncertainty in a bid document (i.e., increases in the bid price average and range). This study aims to answer the following two questions:
The results of this study are two bid price volatility level classification models:
• Model 1: level classification model without uncertainty in the bid documents;
• Model 2: level classification model with uncertainty in the bid documents.
In this study, the performance of Model 2 is evaluated to support decision-making about bid prices.

Materials and Methods: Modeling Approach
Within risk management, this study addresses risk assessment. Risk assessment differs from risk planning and control, which support the decision of whether to participate in a bid. This study instead supports the bid price decisions of bidders who have already decided to participate. In general, risk assessment studies can be classified into studies of $B_i$ and studies of $M_i$ (Equation (1)). Because collecting and analyzing sufficient data is difficult, most studies have examined $M_i$. In this study, $B_i$ is evaluated empirically based on actual bid results. In addition, following other published studies, the uncertainty factors of bid documents are analyzed through a proxy (i.e., the pre-bid clarification document).
Uncertainty factors in bid documents arise naturally because construction projects are typically one-off projects. Toxic clauses, which appear in some special conditions, are different: they are artificial content that deliberately defines who is responsible in certain circumstances, so treating them as uncertainty factors would be problematic. Therefore, the uncertainty factors in bid documents considered in this study are limited to those that occur naturally because of the specific characteristics of the construction industry.

Materials
The variables of interest are the bid-result data related to the project risk factors discussed in Section 2.2. To study their effects, the analysis should use data in which the influence of other factors is minimized. Caltrans public construction projects meet this requirement for three reasons. First, the uncertainty in the bid documents can be analyzed. Because Caltrans includes a pre-bid clarification process in the bid phase, the pre-bid clarification documents are publicly available. Second, the quantity and quality of available project data are sufficient. Caltrans invests approximately $1.7 billion per year in approximately 450 projects, the largest amount among the 50 US states. Thousands of standardized Caltrans projects have produced large amounts of high-quality data and excellent, experience-based project management capabilities. Third, the absence of special conditions reduces the influence of factors other than the variables of interest. Standard contracts used worldwide, such as FIDIC (Fédération Internationale Des Ingénieurs-Conseils), JCT (Joint Contracts Tribunal), NEC (New Engineering Contract), and AIA (American Institute of Architects), are mainly applied to private projects. By contrast, Caltrans uses federal-aid construction contracts (FHWA-1273) for public projects. In this case, unlike private projects, only general conditions are applied, without special conditions, so the projects are relatively standardized. The bid documents of standardized projects are less affected by numerous external factors. Because they lack special conditions, they are suitable for observing the effect of uncertainty in bid documents on the bid price.
Caltrans has published all bid results online since 2004, including bid documents, bid summaries, and other important information. However, the online service for pre-bid clarification documents has operated only since 2011. From 2011 to 2018, 3584 projects were available, and 3578 datasets were collected. Six cases were excluded because they could not be accessed owing to system errors.

Methods
Pre-Data Analysis (Data Preprocessing Based on Bid Summary and Pre-Bid Clarification Document): in this step, information obtained from the bid results is preprocessed into input parameters (IPs) and output parameters (OPs). Caltrans publishes a bid summary containing the critical details of the bid results. In this study, the data related to the project risk factors affecting the bid price (discussed in Section 2.2) are extracted from the bid summary (B. S.) and pre-bid clarification document (P. C. D.), and the final dataset is then constructed from the raw data.
Data Analysis (Two Classification Models of Bid Price Volatility Based on Machine Learning): methods of data analysis can be classified into several categories depending on the purpose of the study and the characteristics of the data. Large datasets composed of various factors are mainly analyzed with data mining through machine learning (ML), and data mining has been actively applied in recent risk analysis studies [42]. Data mining techniques can be divided mainly into prediction techniques and classification techniques. Prediction techniques derive a regression equation, as in statistical analysis. Classification techniques determine the category of the data. Therefore, this study uses an ML-based data mining classification technique to classify the level of bidding risk from a large amount of data composed of various variables. That is, ML is used to classify the OPs from data consisting of multiple IPs. The classes of each OP are designated so that the model algorithm learns, in MATLAB, to classify the levels of bid price volatility. The final dataset from the pre-data analysis is split into training and validation data. The validation data serve two purposes: evaluating model performance during training (training validation) and validating the models in the post-data analysis (validation test). As a result, Model 1 (whose IPs do not include the uncertainty in the bid documents) and Model 2 (whose IPs include the uncertainty in the bid documents) are generated.
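The split into training, training-validation, and validation-test data can be sketched as follows. This is a minimal Python sketch, not the authors' MATLAB workflow. It assumes a simple seeded random shuffle and uses the 189/40/40 allocation reported in the Design section. The helper name `split_dataset` is hypothetical, and the paper does not state whether the original split was stratified by class.

```python
import random

def split_dataset(records, n_train=189, n_val=40, n_test=40, seed=0):
    """Shuffle records and split them into training, training-validation,
    and validation-test subsets (hypothetical helper)."""
    if n_train + n_val + n_test != len(records):
        raise ValueError("split sizes must sum to the number of records")
    shuffled = list(records)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

# Stand-ins for the 269 final project records.
projects = list(range(269))
train, val, test = split_dataset(projects)
print(len(train), len(val), len(test))  # 189 40 40
```

A purely random split can leave some volatility classes under-represented in a 40-project subset. Stratifying by class is one common alternative, but this sketch does not do that.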
Post-Data Analysis (Validation): Models 1 and 2 are tested in a validation test to determine whether they perform similarly on new data that were not used for training. The test results are presented in a confusion matrix, which is analyzed and discussed.
The parameters used in the models are listed in Table 2. The information extracted from the bid summary requires preprocessing before it can be used as a model parameter. Text data must be standardized by coding them into nominal categories. Numeric data must be filtered to a chosen range so that outliers do not affect model performance. The pre-bid clarification document contains the inquiries and answers for the project (Table 3).
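The confusion-matrix evaluation can be illustrated with a small sketch. The functions `confusion_matrix` and `accuracy` are hypothetical helpers, and the labels reuse the OP-1 class names. The toy predictions are invented for illustration and are not results from this study. Accuracy is assumed here to be the share of cases on the matrix diagonal.

```python
from collections import Counter

def confusion_matrix(actual, predicted, classes):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]

def accuracy(matrix):
    """Share of cases on the diagonal (correctly classified)."""
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0

classes = ["--", "-", "+", "++"]
# Toy labels for illustration only; not results from this study.
actual = ["--", "-", "+", "++", "+", "-"]
predicted = ["--", "+", "+", "++", "-", "-"]
m = confusion_matrix(actual, predicted, classes)
for label, row in zip(classes, m):
    print(f"{label:>3} {row}")
print(f"accuracy = {accuracy(m):.3f}")  # accuracy = 0.667
```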
Most inquiries aim to estimate the bid price accurately by resolving uncertainty, but some do not. The inquiries must therefore be preprocessed so that only those related to uncertainty remain. Inquiries that were resolved with appropriate answers are then excluded. The seven input parameters presented in Table 2 are described in detail below for each risk type.

Time Risk
IP-1 (Working Days): IP-1 represents the project completion period required by the client. If IP-1 is relatively short for the size of the project, the bid price may increase because rush or night work is required. The projects considered in this study have IP-1 values between 47 and 1530 working days.
IP-2 (Project Location): in the raw data, IP-2 is an address near the construction site. IP-2 is related to the local price index, which affects the bid price. In the US, the price index is generally determined at the state level. However, prices can differ within a single state. Because California is a large state, Caltrans divides its administration into 12 districts, which are separate from its counties. Accordingly, in this study, IP-2 is coded from 1 to 12 according to the district.

IP-3 (Engineer's Estimate): IP-3 is the project cost, published at the time of the announcement, to which bidders can refer when setting the bid price. Because the IP-3 distribution in the raw data is too wide, the range must be restricted. In this study, projects with estimates between $10,000,000 and $280,000,000 are used in the model.
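The range filter on the engineer's estimate can be sketched as a simple predicate. This sketch assumes inclusive bounds, which the text does not specify. The project IDs and estimates are hypothetical.

```python
# Bounds of the engineer's estimate (IP-3) retained in the model, in US dollars.
# Treating both bounds as inclusive is an assumption.
EE_MIN = 10_000_000
EE_MAX = 280_000_000

def within_ee_range(engineers_estimate, lo=EE_MIN, hi=EE_MAX):
    """Return True if the engineer's estimate lies inside [lo, hi]."""
    return lo <= engineers_estimate <= hi

# Hypothetical project IDs and estimates, for illustration only.
estimates = {"P1": 5_000_000, "P2": 42_000_000, "P3": 280_000_000, "P4": 310_000_000}
kept = [pid for pid, ee in estimates.items() if within_ee_range(ee)]
print(kept)  # ['P2', 'P3']
```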

IP-4 (Bid Preparation Days): IP-4 is the period during which bidders review the bid documents, including their uncertainties, so it affects the accuracy of the bid price. IP-4 is calculated as the period from the bid announcement date to the bid opening date, both extracted from the raw data (values between 18 and 237 days).
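IP-4 can be computed as a date difference. This sketch assumes that IP-4 counts calendar days between the two dates, excluding the announcement day. The paper does not say whether either endpoint is counted. The dates below are hypothetical.

```python
from datetime import date

def bid_preparation_days(announcement: date, opening: date) -> int:
    """Calendar days from bid announcement to bid opening (IP-4)."""
    days = (opening - announcement).days
    if days < 0:
        raise ValueError("bid opening precedes the announcement")
    return days

# Hypothetical dates, for illustration only.
print(bid_preparation_days(date(2015, 3, 2), date(2015, 4, 14)))  # 43
```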
IP-5 (Number of Bidders): to use IP-5 as an input parameter, it must be confirmed that this information is known before the bid opening. Researchers have argued that IP-5 influences the bid price [24,25,32]: as it increases, bidders deliberately lower their bid prices to win the project [43,44]. This implies that bidders know the number of competitors in advance. Christodoulou [43] studied the optimal $M_i$ (Equation (1)) on this premise. Therefore, IP-5 is included as an input parameter, with values from 2 to 12.

Quality Risk
IP-6 (Project Type): IP-6 is classified mainly into roads (e.g., highways, freeways, or roadways) and bridges, coded as "1" and "2," respectively.
IP-7 (Uncertainty of Bid Documents): screening the 3578 raw datasets as described in Section 4.1.1 yields 269 final datasets with 6682 bid inquiries. As mentioned in Section 2.4, there are two uncertainty factors in bid documents: unclear communication (BI. 1–3) and unclear requirements (BI. 4–5). Unclear communication includes discrepancies, errors, and omissions of information in the bid document, and these meanings overlap. For example, an omission means that necessary information is missing owing to an error, so it can be interpreted as an error itself. Therefore, each term is defined according to the mutually exclusive and collectively exhaustive principle. When identical information in different bid documents conflicts, the case is classified as BI. 1: discrepancy. An inquiry about an error in a single item of information is classified as BI. 2: error. Uncertainty due to the lack of specific information is classified as BI. 3: omission. Unclear requirements are classified into two types: inquiries that ask for insufficient but non-essential information (BI. 4: insufficient information) and inquiries that ask whether alternatives to the existing guidelines are acceptable (BI. 5: alternative information). Only inquiries in the pre-bid clarification documents that correspond to BI. 1–5 are regarded as uncertainty factors. Through this process, 52 bid inquiries are excluded. After the inquiries whose uncertainty was resolved with appropriate answers (4336) are also excluded, 1994 unresolved bid inquiries remain, with values between 2 and 59 per project (i.e., IP-7).
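The screening that produces IP-7 can be sketched as a filter over labeled inquiries. This sketch assumes that each inquiry has already been labeled with a BI. code (or none) and a resolved flag. The labeling itself was done on the document text and is not reproduced here. The `BidInquiry` record, its field names, and the sample inquiries are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# BI. 1-3: unclear communication; BI. 4-5: unclear requirements.
UNCERTAINTY_CATEGORIES = {
    1: "discrepancy",               # identical information conflicts across documents
    2: "error",                     # a single item of information is wrong
    3: "omission",                  # specific information is missing
    4: "insufficient information",  # non-essential information is lacking
    5: "alternative information",   # asks whether an alternative is acceptable
}

@dataclass
class BidInquiry:
    project_id: str
    category: Optional[int]  # BI. code 1-5, or None if unrelated to uncertainty
    resolved: bool           # True if the client's answer resolved the uncertainty

def ip7_counts(inquiries):
    """Count unresolved uncertainty-related inquiries per project (IP-7)."""
    counts = Counter()
    for q in inquiries:
        if q.category in UNCERTAINTY_CATEGORIES and not q.resolved:
            counts[q.project_id] += 1
    return counts

# Hypothetical inquiries, for illustration only.
inquiries = [
    BidInquiry("A", 1, False),
    BidInquiry("A", 3, False),
    BidInquiry("A", 2, True),      # resolved: excluded
    BidInquiry("A", None, False),  # not about uncertainty: excluded
    BidInquiry("B", 5, True),
    BidInquiry("B", 4, False),
]
counts = ip7_counts(inquiries)
print(counts["A"], counts["B"], counts["C"])  # 2 1 0
```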
All coded IPs are used to train the model in the normalized form.
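The paper does not state which normalization is used. The sketch below assumes min-max scaling to [0, 1], with the bounds learned on the training data and then reused for the validation data. The helper names are hypothetical.

```python
def fit_min_max(values):
    """Learn the minimum and maximum from the training values."""
    return min(values), max(values)

def transform_min_max(values, lo, hi):
    """Scale values to [0, 1] using bounds learned on training data.
    Values outside [lo, hi] fall outside [0, 1]."""
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: map everything to 0
    return [(v - lo) / (hi - lo) for v in values]

# Toy numbers of bidders (IP-5) spanning the reported range of 2 to 12.
lo, hi = fit_min_max([2, 7, 12])
print(transform_min_max([2, 7, 12], lo, hi))  # [0.0, 0.5, 1.0]
```

Reusing the training bounds for the validation data keeps the validation sets from influencing the scaling. Z-score standardization would be an equally plausible reading of "normalized form."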

Output Parameters
As stated in Section 2.6, the output parameters of the models are the bid average risk (OP-1) and the bid range risk (OP-2), both based on the bid prices in the raw data. The bid average risk is the ratio of the average bid price to the engineer's estimate (Equation (3)):

$$\text{OP-1} = \frac{1}{n \cdot EE}\sum_{i=1}^{n} B_i \qquad (3)$$

where $B_i$ is the bid price of bidder $i$, $n$ is the number of bidders, and $EE$ is the engineer's estimate.
For example, suppose the engineer's estimates of projects A and B are both $10 billion and their average bid prices are $10 billion and $13 billion. The bid average risks are then 1.0 and 1.3, respectively, so bidders can be assumed to expect a higher risk for project B. The bid range risk (OP-2) is the difference between the maximum and minimum bid prices relative to the engineer's estimate (Equation (4)):

$$\text{OP-2} = \frac{\max_i B_i - \min_i B_i}{EE} \qquad (4)$$

Consider projects A and B with estimates of $10 billion and $1 billion, respectively. The differences between their maximum and minimum bid prices may be identical ($2 billion), yet the uncertainties of the two projects cannot be considered identical: project B (OP-2 = 2.0) is riskier than A (OP-2 = 0.2). Finally, as mentioned in Section 2, the bid average risk and bid range risk defined in this section serve as F1(x) of H1 and F2(x) of H2.
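Equations (3) and (4) translate directly into two functions. The helper names and the individual bid lists are hypothetical. The bids were chosen only so that the averages and ranges reproduce the worked examples for projects A and B.

```python
from statistics import mean

def bid_average_risk(bids, engineers_estimate):
    """OP-1: average bid price divided by the engineer's estimate (Equation (3))."""
    return mean(bids) / engineers_estimate

def bid_range_risk(bids, engineers_estimate):
    """OP-2: (maximum bid - minimum bid) divided by the engineer's estimate (Equation (4))."""
    return (max(bids) - min(bids)) / engineers_estimate

# Hypothetical bids (US dollars) reproducing the worked examples in the text.
avg_a = bid_average_risk([9_000_000_000, 10_000_000_000, 11_000_000_000], 10_000_000_000)
avg_b = bid_average_risk([12_000_000_000, 13_000_000_000, 14_000_000_000], 10_000_000_000)
rng_a = bid_range_risk([9_000_000_000, 11_000_000_000], 10_000_000_000)
rng_b = bid_range_risk([500_000_000, 2_500_000_000], 1_000_000_000)
print(avg_a, avg_b, rng_a, rng_b)  # 1.0 1.3 0.2 2.0
```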

Impact of IP-7 on Bid Price Volatility
The raw data of 3578 projects are preprocessed into the final dataset of 269 projects. To answer research question 1 in Section 2.6, the final dataset is divided into groups with and without uncertainty factors. Projects with IP-7 values between 2 and 59 form the group with uncertainty factors, and projects with IP-7 values of 0 or 1 form the group without uncertainty factors. OP-1 and OP-2 for each group are presented in Figure 1 and Table 4. As shown in Figure 1 and Table 4, projects with uncertainty (Project 1) have higher bid average risk and bid range risk than projects without uncertainty (Project 2). In other words, uncertainty in bid documents increases bid price volatility.
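The group comparison amounts to splitting projects at an IP-7 threshold of 2 and averaging the two output parameters. The dictionary keys and the toy records below are hypothetical and do not reproduce the values in Table 4.

```python
from statistics import mean

def compare_groups(projects, threshold=2):
    """Split projects by IP-7 (unresolved inquiries) and average OP-1 and OP-2.

    projects: list of dicts with hypothetical keys 'ip7', 'op1', 'op2'.
    Projects with ip7 >= threshold form the group with uncertainty factors.
    """
    groups = {
        "with_uncertainty": [p for p in projects if p["ip7"] >= threshold],
        "without_uncertainty": [p for p in projects if p["ip7"] < threshold],
    }
    summary = {}
    for name, group in groups.items():
        if not group:
            continue  # skip empty groups rather than averaging nothing
        summary[name] = {
            "n": len(group),
            "mean_op1": mean(p["op1"] for p in group),
            "mean_op2": mean(p["op2"] for p in group),
        }
    return summary

# Toy records for illustration only; the study's group results are in Table 4.
toy = [
    {"ip7": 0, "op1": 0.95, "op2": 0.20},
    {"ip7": 1, "op1": 1.00, "op2": 0.30},
    {"ip7": 5, "op1": 1.10, "op2": 0.40},
    {"ip7": 12, "op1": 1.20, "op2": 0.60},
]
for name, stats in compare_groups(toy).items():
    print(name, stats)
```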

Design
In the data analysis stage, the final dataset described in Section 4.1 is used to implement a model for classifying the level of bid price volatility. This study presupposes that the uncertainty of the bid document is related to the bid price and ultimately seeks to improve the accuracy of the bid price volatility level classification model by adding a variable representing bid document uncertainty to the existing bid-related variables.
SPSS, MATLAB, R, and Python are commonly used for machine learning-based data mining classification. In this study, data analysis and model development were performed with MathWorks' MATLAB.
Because the models predict the classes of OP-1 and OP-2 from the input parameters, they are trained by supervised learning after the classes are designated. Choosing a suitable number of classes is important: too many classes reduce the reliability of the prediction results, whereas too few make the results difficult to interpret. In this study, four levels are defined according to the distribution of bid price volatility.
When the boundaries between classes are set, natural breakpoints are preferable; a breakpoint is a point at which the distribution of the OP values changes abruptly. If there is no natural breakpoint, the boundaries should be set so that the data are evenly distributed among the classes to ensure reliability. Figure 2 shows the distributions of OP-1 and OP-2 for the 269 final projects; neither parameter has a natural breakpoint. The boundaries between the classes of the models are presented in Table 5. OP-1 and OP-2 each have four classes. For OP-1, the class much smaller than 0 is "--," the slightly smaller class is "-," the slightly larger class is "+," and the much larger class is "++." The classes of OP-2, on the other hand, are named "+," "++," "+++," and "++++" in order of increasing distance from 0.
Not all of the final dataset is used to train the model; a portion (typically 30% of the total data) is held out to check whether the classification model performs consistently on new data. Half of the held-out data is used for validation during training, and the rest is used in the validation test (Section 4.3). Accordingly, the 269 projects are allocated as follows: 189 for model training, 40 for validation during training, and 40 for the validation test (the 80 held-out projects are about 30% of the total). Several machine learning algorithms are available for training classification models; in this study, a neural network (NN), which generally performs well, is used. In addition, three other algorithms are used for comparison with the NN results. In total, four classification algorithms are used for the models: NN, decision tree (Tree), support vector machine (SVM), and K-nearest neighbor (KNN).
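The two preparation steps described above, class designation without natural breakpoints and the 189/40/40 hold-out split, can be sketched in Python. Equal-frequency (quantile) cut points are assumed here as one way to spread the data evenly across the four classes; the actual boundaries are those in Table 5 and are not reproduced. The fixed random seed and all function names are hypothetical.

```python
import random
from bisect import bisect_right
from typing import List, Sequence, Tuple


def equal_frequency_boundaries(values: Sequence[float], n_classes: int = 4) -> List[float]:
    """Cut points that place roughly the same number of projects in each class."""
    ordered = sorted(values)
    size = len(ordered)
    # The k-th cut point is the value at rank floor(k * size / n_classes).
    return [ordered[(k * size) // n_classes] for k in range(1, n_classes)]


def assign_class(value: float, boundaries: Sequence[float], labels: Sequence[str]) -> str:
    """Label of the interval containing value; a value equal to a cut point goes to the upper class."""
    return labels[bisect_right(boundaries, value)]


def holdout_split(n_items: int, n_valid: int = 40, n_test: int = 40,
                  seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle project indices and split them into training, training-validation, and test sets."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    valid = indices[:n_valid]
    test = indices[n_valid:n_valid + n_test]
    train = indices[n_valid + n_test:]
    return train, valid, test


OP1_LABELS = ["--", "-", "+", "++"]
OP2_LABELS = ["+", "++", "+++", "++++"]

if __name__ == "__main__":
    train, valid, test = holdout_split(269)
    print(len(train), len(valid), len(test))  # 189 40 40
```

With many tied OP values, quantile cut points can still produce uneven classes, which is one reason the boundaries are worth inspecting against the distribution in Figure 2.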

Results
In this section, the accuracies of Model 1 (trained without IP-7) and Model 2 (trained with IP-7) in classifying the bid price volatility level, based on the model design described above, are compared to answer research question 2 in Section 2.6 (Table 6). Three observations follow from Table 6. First, Model 2 performs better than Model 1 for all algorithms, including the NN. Second, Models 1 and 2 achieve higher accuracies for OP-2 than for OP-1 in most cases. Third, the NN exhibits the best performance in both Models 1 and 2.
Among the other algorithms, the accuracies of Model 1 follow the order SVM > KNN > Tree, whereas those of Model 2 follow the order Tree > SVM > KNN. Overall, IP-7 (a parameter representing the uncertainty in bid documents) improves the performance of the model that classifies the level of bid price volatility. For the NN classifier, the accuracies of Model 1 (without IP-7) are 37.5% (OP-1) and 42.5% (OP-2), whereas those of Model 2 (with IP-7) are 63.9% (OP-1) and 65.8% (OP-2), that is, 26.4 and 23.3 percentage points higher, respectively. The same trend holds for the averages across all algorithms: the accuracies of Model 1 are 34.1% (OP-1) and 38.4% (OP-2), whereas those of Model 2 are 60.9% (OP-1) and 60.8% (OP-2).
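The improvements quoted above are differences in percentage points, not relative gains. The short Python check below recomputes them from the reported accuracies and shows the 25% chance level for a four-class problem; the dictionary layout is only an illustrative container for the numbers in the text.

```python
from typing import Dict, Tuple

# Accuracies (%) for (OP-1, OP-2) as reported in the text.
REPORTED: Dict[str, Dict[str, Tuple[float, float]]] = {
    "NN": {"Model 1": (37.5, 42.5), "Model 2": (63.9, 65.8)},
    "All-algorithm average": {"Model 1": (34.1, 38.4), "Model 2": (60.9, 60.8)},
}

# Expected accuracy (%) of picking one of four classes uniformly at random.
CHANCE_LEVEL = 100 / 4


def point_gains(model1: Tuple[float, float], model2: Tuple[float, float]) -> Tuple[float, ...]:
    """Percentage-point improvement of Model 2 over Model 1, rounded to one decimal."""
    return tuple(round(b - a, 1) for a, b in zip(model1, model2))


if __name__ == "__main__":
    for name, acc in REPORTED.items():
        op1_gain, op2_gain = point_gains(acc["Model 1"], acc["Model 2"])
        print(f"{name}: +{op1_gain} pp (OP-1), +{op2_gain} pp (OP-2), chance level {CHANCE_LEVEL}%")
```

Running it gives gains of 26.4 and 23.3 points for the NN and 26.8 and 22.4 points for the all-algorithm average. Expressed relative to Model 1, the NN gain for OP-1 would be about 70% (26.4 / 37.5), so the unit matters when these figures are quoted.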

Post-Data Analysis: Validation
In the validation test, the accuracy of a classification model is evaluated with data that were not used for training. In this section, the training performance of Model 2 (with IP-7) is validated for the NN classifier. There are various approaches for validating a machine learning model (e.g., N-fold cross-validation, bootstrap, and sliding window methods); N-fold cross-validation is the most widely used. The results of the validation test are presented as a confusion matrix, from which the accuracy of the model and the true positive rate (TPR, also called recall or sensitivity) of each class can be determined:
\[ \mathrm{TPR} = \frac{TP}{TP + FN} \]
where "True Positive" (TP) is the number of samples of a class that are correctly classified into that class, and "False Negative" (FN) is the number of samples of that class that are incorrectly classified into another class. In other words, the TPR of a class is the ratio of correctly classified samples to all samples that actually belong to that class. The validation results are summarized in the confusion matrices shown in the figure. The validation accuracies of Model 2 for OP-1 and OP-2 are 52.5% and 72.5%, respectively; that is, the model correctly classifies the bid price volatility levels of 21 and 29 of the 40 test projects. First, the TPR of Model 2 for OP-1 is highest in the "+" class (66.7%) and lowest in the "-" class (44.4%). However, the confusion matrix shows that incorrectly classified samples are assigned to neighboring classes rather than to entirely different ones. This is because continuous values are split by the artificial boundaries of the class designation. Moreover, the accuracy of the model for the test data (52.5%) is lower than that for the training data (69.3%). Second, the TPR of Model 2 for OP-2 is highest in the "+++" class (76.9%) and lowest in the "++" class (66.7%). As with OP-1, the model classifies most of the OP-2 samples into the same or neighboring classes. In addition, the accuracy of the model for the validation test data (72.5%) is higher than that for the training data (64.6%).
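The TPR definition and the observation that misclassified projects fall into neighboring classes can be made concrete with a small Python sketch. It builds a confusion matrix (rows are actual classes, columns are predicted classes), computes accuracy and per-class TPR, and adds a hypothetical helper, `adjacent_error_share`, that measures how many errors land in a neighboring ordinal class. The labels and predictions in the example are invented.

```python
from collections import Counter
from typing import Dict, List, Sequence


def confusion_matrix(actual: Sequence[str], predicted: Sequence[str],
                     labels: Sequence[str]) -> List[List[int]]:
    """Rows are actual classes, columns are predicted classes, both in `labels` order."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix: List[List[int]]) -> float:
    """Share of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(map(sum, matrix))


def true_positive_rates(matrix: List[List[int]], labels: Sequence[str]) -> Dict[str, float]:
    """Per-class TPR = TP / (TP + FN): diagonal cell divided by its row total."""
    rates = {}
    for i, label in enumerate(labels):
        row_total = sum(matrix[i])
        rates[label] = matrix[i][i] / row_total if row_total else float("nan")
    return rates


def adjacent_error_share(matrix: List[List[int]]) -> float:
    """Fraction of misclassified samples that land in a neighboring class (|i - j| == 1)."""
    size = len(matrix)
    errors = [(i, j, matrix[i][j]) for i in range(size) for j in range(size) if i != j]
    total = sum(c for _, _, c in errors)
    adjacent = sum(c for i, j, c in errors if abs(i - j) == 1)
    return adjacent / total if total else 0.0


if __name__ == "__main__":
    labels = ["--", "-", "+", "++"]
    # Hypothetical predictions for five test projects (not the study's results).
    actual = ["+", "+", "-", "-", "--"]
    predicted = ["+", "-", "-", "-", "-"]
    cm = confusion_matrix(actual, predicted, labels)
    print(accuracy(cm))                     # 0.6
    print(true_positive_rates(cm, labels))
    print(adjacent_error_share(cm))         # 1.0
    print(21 / 40, 29 / 40)                 # 0.525 0.725, the reported validation accuracies
```

A class with no actual samples gets a TPR of NaN rather than 0, since its recall is undefined.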

Discussion
Building on the background presented in the Introduction and Research Gaps sections, a decision support tool for bid price budgeting, intended for both field practice and research, was developed based on quantitative analyses of project-based bid results, informed by other studies. Based on these requirements, two research questions were formulated, and related parameters were selected on the basis of previous studies and populated with Caltrans's bid results. Using the 269 final project datasets obtained after preprocessing, the uncertainty level in bid documents was quantified as the number of unresolved inquiries in the pre-bid clarification document that correspond to BI. 1-5.
To answer research question 1, the 269 projects were classified into two groups: a group with IP-7 values of 0-1 and another group with IP-7 values of 2-59. The group with uncertainty in the bid documents generally had a higher bid average risk and bid range risk than the group without uncertainty. Thus, uncertainty in bid documents increases the bid price volatility. Notably, projects with an IP-7 value of 1 were also considered projects without uncertainty in this study, because the perceived level of uncertainty depends on the stakeholders' opinions. For example, for a project with an IP-7 value of 1, some bidders may believe that there is little uncertainty, whereas others may believe that the uncertainty level is relatively high. Therefore, with \(x\) denoting the IP-7 value, the boundary of uncertainty can be expressed as \(\{x \mid x = 0\}\) (without uncertainty) \(\cup\) \(\{x \mid x > 0\}\) (with uncertainty) for risk-averse bidders, and as \(\{x \mid 0 \le x \le n\}\) (without uncertainty) \(\cup\) \(\{x \mid x > n\}\) (with uncertainty), where \(n \in \{1, \ldots, 59\}\), for risk-takers. In other words, for projects with an IP-7 value of \(n\), the perceived level of uncertainty depends on the stakeholder; this study used \(n = 1\) as an example.
Furthermore, Models 1 and 2 were trained with an NN to examine whether IP-7 affects the bid price volatility classification. The accuracies of Model 1 for the bid average risk and bid range risk were 37.5% and 42.5%, respectively; those of Model 2 were 63.9% and 65.8%, respectively.
The accuracies of Model 1 exceeded 25%, the expected accuracy of randomly selecting one of four classes; nevertheless, they were too low to support decision making. This suggests that parameters not included in the model have a substantial influence on the bid price. By contrast, the accuracies of Model 2 were higher, and the same pattern was observed for the three other algorithms. In other words, the influence of IP-7 is relatively strong compared with that of the other parameters included in the model. However, the fact that the accuracy of the models plateaued at the current level suggests that other parameters that significantly affect the bid price are still missing from the models. Part of this limitation can also be attributed to the nature of IP-7 itself: IP-7 represents the number of unresolved bid inquiries, a surrogate measure adopted because it is impossible to estimate the cost of the situation described in each inquiry. For example, two inquiries might be worth $100 and $10,000, respectively, yet each is counted as 1 in the models. This may be why the models achieve higher accuracies for the bid range risk than for the bid average risk. Superficially, this implies that the model's application may be limited by its accuracy. However, the more time bidders spend accurately costing each inquiry, the less time remains to identify other risks, which in turn increases the uncertainty of the bid document. In this respect, achieving such a relatively high accuracy with only the number of unresolved bid inquiries is encouraging.
Moreover, the results were validated with test data, which answered research question 2. Bidders should determine the bid price by integrating project-related information. Before doing so, bidders should include the cost of project uncertainty in the markup (Mi), in addition to Ci (Equation (1)), which is calculated through construction budgeting. Because "uncertainty" literally means "lack of knowledge," there are no raw data from which to calculate it directly; Model 2 can be applied instead. For example, if a project's bid average risk is classified as "--," the average bid price is likely to be much lower than the engineer's estimate. If the currently calculated Ci is high, Mi must be lowered, or Ci must be adjusted to win the bid. If the resulting markup falls below the lower bound of the expected profit, the bidder may decide not to participate in the project. Likewise, if the bid range risk is classified as "+++," many bidders may submit prices that deviate widely from the engineer's estimate, and the bidder can use this information to adjust Ci and Mi.
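The decision logic in this paragraph can be sketched as a small Python function. Equation (1) is not reproduced in this section, so the sketch assumes, as a simplification, that the bid price equals Ci + Mi. The mapping `EXPECTED_AVERAGE_RATIO` from OP-1 class to a representative average-bid ratio is hypothetical; in practice it would be derived from the class boundaries in Table 5.

```python
from typing import Dict, Optional

# HYPOTHETICAL representative ratio of the average bid to the engineer's estimate
# for each OP-1 class; real values would follow from the class boundaries in Table 5.
EXPECTED_AVERAGE_RATIO: Dict[str, float] = {"--": 0.80, "-": 0.95, "+": 1.05, "++": 1.20}


def suggest_markup(cost: float, engineers_estimate: float, op1_class: str,
                   min_markup: float) -> Optional[float]:
    """Largest markup Mi that keeps the bid (assumed to be Ci + Mi) at or below the
    expected average bid; None if that markup is below the bidder's minimum profit.
    """
    expected_average_bid = EXPECTED_AVERAGE_RATIO[op1_class] * engineers_estimate
    markup = expected_average_bid - cost
    return markup if markup >= min_markup else None


if __name__ == "__main__":
    # Hypothetical project: Ci = 90, engineer's estimate = 100, minimum markup = 2 (same units).
    for op1_class in EXPECTED_AVERAGE_RATIO:
        m = suggest_markup(90.0, 100.0, op1_class, min_markup=2.0)
        print(op1_class, "consider not bidding" if m is None else f"markup up to {m:.1f}")
```

Returning `None` rather than a negative markup mirrors the case in the text where the bidder may stop participating because the expected profit falls below its lower bound.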
Because many uncertainties affect the bid price of a construction project and many of them are risks, researchers have performed many valuable studies, and practitioners have learned through trial and error. However, once a specific time and place are fixed, many risks are eventually settled. For example, a city's price index fluctuates over time, but it takes a single value at the time and place of the project. Bidders therefore face a dilemma between winning the bid and securing project profit: whether to bid higher or lower than expected. These fine-grained adjustments are ultimately determined by project-intrinsic risks that have not yet been settled. The bid document is a kind of contract document, and all of its explicit contents are directly related to the price. However, uncertain content cannot be mapped one-to-one to a price. Moreover, because the contents of bid documents differ from project to project, this problem could not be solved with existing linear methods. Instead, the authors measured the uncertainty of each bid document, rather than its content, and analyzed the resulting risk with the NN algorithm. This study thus estimated how much the uncertainty in bid documents raises or lowers the already expected price.
Nevertheless, the method presented in this study has several limitations that should be addressed in future research. First, the data are limited. Caltrans's project data were used to minimize the influence of external factors; however, because the data are standardized, the impact of uncertainty in bid documents (one of the internal factors) may be relatively weak. It may therefore be difficult to generalize the results of this study. Second, the preprocessing of IP-7 has limitations. Because the pre-bid clarification document contains unstructured text, which is difficult to process computationally, and no automated method exists for quantifying it, the authors manually analyzed 6682 datasets. Because this process can introduce human error, the authors cross-checked each other's results. Third, this paper does not address all of the factors affecting the bid price; it deals only with the risk of uncertainty in bid documents. The risk of fluctuations in the material and labor markets is a significant factor affecting the bid amount and must be considered at the bidding stage; bidders scrutinize it through market research. These fluctuations vary with time and location, so they are variables; once they are fixed, the corresponding volatility risk is fixed as well. Because bidders bid at the same time for work performed at the same location, these costs will converge to a common value. However, this value cannot be confirmed as a single figure, so it remains a risk in the bid price. Future studies should include the remaining major fluctuation risks as parameters, which is expected to improve model accuracy significantly.

Conclusions
In this study, a classification model for the bid price volatility level was developed by analyzing the relationship between the uncertainty in bid documents and the bid price based on bid data. The results indicate that uncertainty in bid documents causes bid prices to rise or fall more than necessary. Therefore, it is essential to review the items causing uncertainty thoroughly before bidding; we identify these items as discrepancy, error, omission, insufficient information, and alternative information. In addition, a pre-bid clarification process allows the project price to converge to a more appropriate level if the remaining uncertainties are eliminated at the time of bidding. This study makes the following contributions. First, it is a first step toward quantifying the uncertainty in bid documents: the results of published qualitative risk-identification studies were evaluated quantitatively with bid result data. Second, the proposed model enables risk management at a lower level: in the new approach, the uncertainty in bid documents, which was considered an uncontrollable risk, is analyzed with the pre-bid clarification document, which closes the theoretical gaps. Third, the results of this study can help bidders determine the bid price. According to the results, pre-bid clarification is an essential process in the bid phase, because resolving the uncertainty in bid documents can reduce the bid average risk and bid range risk. Accordingly, the bid price of a project whose risks have been resolved during pre-bid clarification is expected to converge to a more acceptable price. Construction objectives set on the basis of this price will improve bidders' profitability and meet the client's expectations, helping to ensure a successful construction project. The final contribution of this study is that it further extends the concept of sustainability in construction projects.
In construction, the idea of sustainability is meaningful in that it extends project management, which has traditionally focused on the design and construction stages, to the entire life cycle. Until now, researchers have carried out valuable studies on sustainable energy use, noting the importance of the operation and maintenance phases after completion. However, the early stages of a project are also of great importance. This study sought to support the success and sustainability of projects through research on reasonable project prices. In the future, the authors will develop a method that comprehensively analyzes the uncertainty in unstructured text data from public projects of various institutions that provide pre-bid clarification documents, so that the IP-7 content can be extracted automatically. The authors will then combine this method with the results of this study to establish a more general and more accurate risk classification model.