# Predicting Project’s Uncertainty Risk in the Bidding Process by Integrating Unstructured Text Data and Structured Numerical Data Using Text Mining


## Abstract


## 1. Introduction

## 2. Materials and Methods

#### 2.1. Literature Review

#### 2.2. Pre-Bid Clarification

## 3. Materials and Method

#### 3.1. Data Collection

#### 3.2. Modeling Process

## 4. Data Modeling

#### 4.1. Text Mining

#### 4.1.1. Data Pre-Processing

#### 4.1.2. Text Topic Modeling

- Topic 1: Surface roughness requirements
- Topic 2: Disadvantaged Business Enterprise (DBE) work category
- Topic 3: Guardrail system detail
- Topic 4: Excavation quantity and payment
- Topic 5: Highway planting and irrigation
- Topic 6: Hot Mixed Asphalt (HMA) spec.
- Topic 7: Type of bid item and quantity clarification
- Topic 8: Manufacturing and supplying of equipment
- Topic 9: Monitoring report and duties
- Topic 10: Working day and schedule
- Topic 11: Temporary traffic control
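
The topic labels above can be attached to a document's topic proportions to find which topic dominates it. The following is a minimal sketch, not the authors' implementation: it assumes that variables T01 to T11 in the integrated data set correspond to Topics 1 to 11 in order, and the helper name `dominant_topic` is hypothetical. The example values are the topic proportions of the third sample row in the integrated data table.

```python
# Topic labels as listed above (Topic 1 .. Topic 11).
TOPIC_LABELS = [
    "Surface roughness requirements",
    "Disadvantaged Business Enterprise (DBE) work category",
    "Guardrail system detail",
    "Excavation quantity and payment",
    "Highway planting and irrigation",
    "Hot Mixed Asphalt (HMA) spec.",
    "Type of bid item and quantity clarification",
    "Manufacturing and supplying of equipment",
    "Monitoring report and duties",
    "Working day and schedule",
    "Temporary traffic control",
]


def dominant_topic(proportions):
    """Return (topic_number, label, weight) for the largest topic proportion."""
    if len(proportions) != len(TOPIC_LABELS):
        raise ValueError("expected one proportion per topic")
    index = max(range(len(proportions)), key=lambda i: proportions[i])
    return index + 1, TOPIC_LABELS[index], proportions[index]


# Assumption: position i in this list is variable T(i+1) of one project.
doc = [0.044, 0.063, 0.217, 0.044, 0.073, 0.188, 0.111, 0.073, 0.063, 0.063, 0.063]
print(dominant_topic(doc))
```

A single dominant topic discards most of the distribution, which is one reason to keep all eleven proportions as separate input variables instead.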

#### 4.2. Integrated Data Set

- if the bidders have submitted a query and the owner has not responded by the deadline;
- if the bidders have requested additional information on unclear content in the bidding documents, but the owner has declined to provide it; and,
- if the bidders have made further proposals in case of danger and the owner has rejected them.

- N01 is the number of bidders; N02 indicates whether the project is pre-clarified or unclarified; N03 is the number of NTB changes; N04 is the number of drawing changes; N05 is the number of specification changes; N06 is the number of bid item changes; N07 is the number of wage rate changes; and N08 is the number of addenda issued.
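
To make the shape of one integrated record concrete, the sketch below concatenates eleven topic proportions (T01 to T11) with the eight numeric variables (N01 to N08) into a single named row. It is an illustration under stated assumptions: the class `NumericRecord`, its field names, the function `integrate`, and the 1/0 coding of N02 are all hypothetical. The values come from the first sample row of the integrated data table.

```python
from dataclasses import dataclass, astuple

TEXT_COLUMNS = [f"T{i:02d}" for i in range(1, 12)]    # T01..T11
NUMERIC_COLUMNS = [f"N{i:02d}" for i in range(1, 9)]  # N01..N08


@dataclass
class NumericRecord:
    """Structured bid data; field names are illustrative, not from the paper."""
    bidders: int                # N01
    pre_clarified: int          # N02: 1 = pre-clarified, 0 = unclarified (assumed coding)
    ntb_changes: int            # N03
    drawing_changes: int        # N04
    specification_changes: int  # N05
    bid_item_changes: int       # N06
    wage_rate_changes: int      # N07
    addenda_issued: int         # N08


def integrate(topic_proportions, record):
    """Concatenate topic proportions and numeric fields into one named row."""
    if len(topic_proportions) != len(TEXT_COLUMNS):
        raise ValueError("expected 11 topic proportions")
    values = list(topic_proportions) + [float(v) for v in astuple(record)]
    return dict(zip(TEXT_COLUMNS + NUMERIC_COLUMNS, values))


topics = [0.083, 0.108, 0.096, 0.096, 0.096, 0.083, 0.146, 0.070, 0.070, 0.070, 0.083]
numeric = NumericRecord(4, 1, 0, 0, 0, 0, 1, 1)
row = integrate(topics, numeric)
print(len(row), row["T07"], row["N01"])
```

Keeping column names alongside the values makes it easy to drop either block later and compare a numeric-only model against an integrated one.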

#### 4.3. Prediction Model Algorithm

- ANNs: An artificial neural network (ANN) is a model that computes the output values of artificial neurons by learning weights (the connection strengths between neurons) from a large amount of input data. Input neurons are activated through sensors that perceive the environment, and other neurons are activated through weighted connections from previously active neurons [30]. A key characteristic of ANNs is that hidden layers allow sophisticated models to be developed.
- SVM: The support vector machine (SVM) is an algorithm for determining the optimal separating hyperplane, that is, the one with minimal generalization error [31]. SVM covers both linear and nonlinear classifiers [32]. For nonlinear classification, the data are mapped to a high-dimensional feature space, and a kernel function is used to perform this mapping efficiently.
- KNN: k-nearest neighbor (KNN) classification is also called a lazy learner. ANNs and SVMs first develop a model from the collected data and then apply it to the data to be classified, whereas KNN defers the work until a data point needs to be classified. The KNN algorithm classifies a new object according to the outcome of the closest object or the outcomes of several closest objects in the feature space of the training set [32]. Although the single nearest neighbor of the target data can be used for classification, the k nearest neighbors are generally used.
- NB: The Naïve Bayes (NB) classifier considerably simplifies learning by assuming that the features are independent given the class [33]. Using Bayes' theorem, it computes the posterior probability of each group from that group's prior probability and likelihood, and assigns the observation to the group with the highest posterior probability.
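
Two of the four classifiers are simple enough to sketch with the standard library alone. The code below is a hedged illustration, not the authors' implementation: KNN uses Euclidean distance with a majority vote, and Naïve Bayes assumes Gaussian feature likelihoods (the text does not state which likelihood was used). The function names and the toy training data are hypothetical. ANNs and SVM are omitted because they need an optimization routine that would not fit in a short example.

```python
import math
from collections import Counter, defaultdict


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def fit_gaussian_nb(train_x, train_y, var_floor=1e-6):
    """Estimate the prior and per-feature mean/variance of each class."""
    grouped = defaultdict(list)
    for x, y in zip(train_x, train_y):
        grouped[y].append(x)
    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # var_floor keeps the variance positive when a feature is constant.
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(train_x), means, variances)
    return model


def nb_predict(model, query):
    """Pick the class with the highest log prior plus summed log likelihoods."""
    def log_posterior(params):
        prior, means, variances = params
        score = math.log(prior)
        for v, m, var in zip(query, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=lambda label: log_posterior(model[label]))


# Hypothetical two-feature training set with three risk groups.
train_x = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
           (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
           (9.0, 1.0), (9.1, 1.2), (8.9, 0.9)]
train_y = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]

nb_model = fit_gaussian_nb(train_x, train_y)
print(knn_predict(train_x, train_y, (5.1, 5.0)), nb_predict(nb_model, (5.1, 5.0)))
```

Note the contrast described above: `fit_gaussian_nb` builds a model before any query arrives, while `knn_predict` does all of its work at query time.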

## 5. Results

## 6. Discussion

## 7. Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## References

1. Ahmad, I.; Minkarah, I. Questionnaire survey on bidding in construction. J. Manag. Eng. **1988**, 4, 229–243.
2. Shash, A.A. Factors considered in tendering decisions by top UK contractors. Constr. Manag. Econ. **1993**, 11, 111–118.
3. Fayek, A. Competitive bidding strategy model and software system for bid preparation. J. Constr. Eng. Manag. **1998**, 124, 1–10.
4. Chua, D.K.H.; Li, D.Z.; Chan, W.T. Case-based reasoning approach in bid decision making. J. Constr. Eng. Manag. **2001**, 127, 35–45.
5. El-Mashaleh, M.S. Empirical framework for making the bid/no-bid decision. J. Manag. Eng. **2012**, 29, 200–205.
6. Brook, M. Estimating and Tendering for Construction Work, 3rd ed.; Butterworth-Heinemann: Oxford, UK, 2004.
7. Laryea, S. Quality of tender documents: Case studies from the UK. Constr. Manag. Econ. **2011**, 29, 275–286.
8. Duzkale, A.K.; Lucko, G. Exposing uncertainty in bid preparation of steel construction cost estimating: II. Comparative analysis and quantitative CIVIL classification. J. Constr. Eng. Manag. **2016**, 142, 04016050.
9. Lee, J.H.; Yi, J.S.; Son, J.W.; Jang, Y.E. Pre-bid clarification for construction project risk identification using unstructured text data analysis. In Proceedings of the Joint Conference on Computing in Construction, Heraklion, Greece, 4–7 July 2017; pp. 219–227.
10. Carr, P.G. Investigation of bid price competition measured through prebid project estimates, actual bid prices, and number of bidders. J. Constr. Eng. Manag. **2005**, 131, 1165–1172.
11. Zeng, J.; An, M.; Smith, N.J. Application of a fuzzy based decision making methodology to construction project risk assessment. Int. J. Proj. Manag. **2007**, 25, 589–600.
12. Leśniak, A.; Plebankiewicz, E. Modeling the decision-making process concerning participation in construction bidding. J. Manag. Eng. **2013**, 31, 04014032.
13. Han, S.H.; Diekmann, J.E. Approaches for making risk-based go/no-go decision for international projects. J. Constr. Eng. Manag. **2001**, 127, 300–308.
14. Han, S.H.; Diekmann, J.E. Making a risk-based bid decision for overseas construction projects. Constr. Manag. Econ. **2001**, 19, 765–776.
15. Mak, S.; Picken, D. Using risk analysis to determine construction project contingencies. J. Constr. Eng. Manag. **2000**, 126, 130–136.
16. Laryea, S.; Hughes, W. Risk and price in the bidding process of contractors. J. Constr. Eng. Manag. **2010**, 137, 248–258.
17. Williams, T.P.; Gong, J. Predicting construction cost overruns using text mining, numerical data and ensemble classifiers. Autom. Constr. **2014**, 43, 23–29.
18. Christodoulou, S. Optimum bid markup calculation using neurofuzzy systems and multidimensional risk analysis algorithm. J. Comput. Civ. Eng. **2004**, 18, 322–330.
19. Abotaleb, I.S.; El-Adaway, I.H. Construction Bidding Markup Estimation Using a Multistage Decision Theory Approach. J. Constr. Eng. Manag. **2016**, 143, 04016079.
20. Weiss, S.M.; Indurkhya, N.; Zhang, T. Fundamentals of Predictive Text Mining; Springer: London, UK, 2010; Volume 41.
21. Tan, A.H. Text mining: The state of the art and the challenges. In Proceedings of the PAKDD Workshop on Knowledge Discovery from Advanced Databases, Beijing, China, 26–28 April 1999; pp. 65–70.
22. Manning, C.D.; Raghavan, P.; Schütze, H. Introduction to Information Retrieval; Cambridge University Press: New York, NY, USA, 2008.
23. Steyvers, M.; Griffiths, T. Probabilistic Topic Models. Handbook of Latent Semantic Analysis; Routledge: New York, NY, USA, 2007; Volume 427, pp. 424–440.
24. Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent Dirichlet allocation. J. Mach. Learn. Res. **2003**, 3, 993–1022.
25. Erosheva, E.A. Grade of Membership and Latent Structure Models with Application to Disability Survey Data. Ph.D. Thesis, Carnegie Mellon University, Pittsburgh, PA, USA, 2002.
26. Buntine, W.; Löfström, J.; Perkiö, J.; Perttu, S.; Poroshin, V.; Silander, T.; Tirri, H.; Tuominen, A.; Tuulos, V. A Scalable Topic-Based Open Source Search Engine. In Proceedings of the IEEE/WIC/ACM Conference on Web Intelligence, Beijing, China, 20–24 September 2004; pp. 228–234.
27. Griffiths, T.L.; Steyvers, M. Finding Scientific Topics; National Academy of Sciences: Washington, DC, USA, 2004; Volume 101, pp. 5228–5235.
28. Buntine, W.L. Estimating Likelihoods for Topic Models. In Proceedings of the 1st Asian Conference on Machine Learning, Nanjing, China, 2–4 November 2009; pp. 51–64.
29. Sievert, C.; Shirley, K.E. LDAvis: A method for visualizing and interpreting topics. In Proceedings of the Workshop on Interactive Language Learning, Visualization, and Interfaces, Baltimore, MD, USA, 27 June 2014; pp. 63–70.
30. Schmidhuber, J. Deep learning in neural networks: An overview. Neural Netw. **2015**, 61, 85–117.
31. Burges, C.J. Tutorial on support vector machines for pattern recognition. Data Min. Knowl. Discov. **1998**, 2, 121–167.
32. Ledolter, J. Data Mining and Business Analytics with R; Wiley: Hoboken, NJ, USA, 2013.
33. Rish, I. An empirical study of the naive Bayes classifier. In Proceedings of the IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence, Seattle, WA, USA, 4–10 August 2001; Volume 3, pp. 41–46.

**Figure 2.** Distribution of bid price relative to estimated cost between pre-clarified projects and unclarified projects.

Item | Value |
---|---|
Number of documents | 243 documents |
Number of terms (before preprocessing) | 5009 terms |
Number of terms (after preprocessing) | 3948 terms |
Minimum word length | 3 characters |
Sparsity | 98% |
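
The quantities in this table can be related to a document-term matrix. The sketch below assumes that sparsity means the share of zero cells in that matrix, which is the usual definition, and it applies only the minimum word length filter of 3 characters. Other preprocessing steps such as stop-word removal are not shown, and the three example queries are hypothetical, not taken from the study's corpus.

```python
import re


def tokenize(text, min_length=3):
    """Lowercase, split on non-letters, and drop tokens shorter than `min_length`."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if len(t) >= min_length]


def document_term_matrix(documents, min_length=3):
    """Return (sorted vocabulary, count matrix with one row per document)."""
    tokenized = [tokenize(doc, min_length) for doc in documents]
    vocabulary = sorted({t for tokens in tokenized for t in tokens})
    column = {term: j for j, term in enumerate(vocabulary)}
    matrix = [[0] * len(vocabulary) for _ in documents]
    for i, tokens in enumerate(tokenized):
        for t in tokens:
            matrix[i][column[t]] += 1
    return vocabulary, matrix


def sparsity(matrix):
    """Fraction of zero cells in the matrix."""
    cells = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / cells if cells else 0.0


# Hypothetical pre-bid queries, used only to exercise the functions.
docs = [
    "Is the excavation quantity paid per cubic yard?",
    "Please clarify the guardrail system detail.",
    "Is rock excavation paid separately?",
]
vocabulary, matrix = document_term_matrix(docs)
print(len(vocabulary), round(sparsity(matrix), 3))
```

With only three short documents the matrix is far denser than the 98% reported for 243 documents; sparsity rises quickly as the vocabulary grows, because each document uses only a small part of it.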

Variables T01–T11 are derived from text modeling; variables N01–N08 are derived from numeric data.

T01 | T02 | T03 | T04 | T05 | T06 | T07 | T08 | T09 | T10 | T11 | N01 | N02 | N03 | N04 | N05 | N06 | N07 | N08 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.083 | 0.108 | 0.096 | 0.096 | 0.096 | 0.083 | 0.146 | 0.070 | 0.070 | 0.070 | 0.083 | 4 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |
0.083 | 0.083 | 0.083 | 0.068 | 0.113 | 0.128 | 0.083 | 0.068 | 0.157 | 0.068 | 0.068 | 5 | 1 | 0 | 1 | 1 | 0 | 1 | 1 |
0.044 | 0.063 | 0.217 | 0.044 | 0.073 | 0.188 | 0.111 | 0.073 | 0.063 | 0.063 | 0.063 | 11 | 1 | 0 | 1 | 1 | 1 | 1 | 2 |
0.056 | 0.056 | 0.068 | 0.081 | 0.105 | 0.056 | 0.056 | 0.068 | 0.266 | 0.093 | 0.093 | 7 | 1 | 1 | 1 | 0 | 0 | 0 | 1 |
0.143 | 0.110 | 0.069 | 0.078 | 0.078 | 0.208 | 0.061 | 0.086 | 0.069 | 0.061 | 0.037 | 7 | 1 | 2 | 1 | 1 | 3 | 1 | 6 |

Classification Algorithm | Accuracy | Precision (A) | Precision (B) | Precision (C) | Recall (A) | Recall (B) | Recall (C) |
---|---|---|---|---|---|---|---|
ANNs | 45.83% | 55.6% | 41.7% | 38.9% | 55.6% | 33.3% | 46.7% |
SVM | 52.08% | 57.9% | 42.9% | 53.3% | 61.1% | 40% | 53.3% |
KNN | 50% | 52.6% | 46.2% | 50% | 55.6% | 40% | 53.3% |
NB | 45.83% | 50% | 33.3% | 53.3% | 50% | 33.3% | 53.3% |

Classification Algorithm | Accuracy | Precision (A) | Precision (B) | Precision (C) | Recall (A) | Recall (B) | Recall (C) |
---|---|---|---|---|---|---|---|
ANNs | 70.83% | 63.6% | 58.3% | 92.9% | 77.8% | 46.7% | 86.7% |
SVM | 72.92% | 63.6% | 63.6% | 93.9% | 77.8% | 46.7% | 93.3% |
KNN | 70.83% | 64.7% | 53.3% | 93.8% | 61.1% | 53.3% | 100% |
NB | 70.83% | 71.4% | 57.9% | 86.7% | 55.6% | 73.3% | 86.7% |
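
Accuracy, class precision, and class recall in tables like these all derive from a confusion matrix. The sketch below shows the arithmetic for three risk groups. The counts are hypothetical, chosen only for illustration, and are not taken from the paper; the function name `classification_metrics` is likewise an assumption.

```python
def classification_metrics(confusion, labels):
    """Compute accuracy plus per-class precision and recall.

    confusion[i][j] counts cases whose true class is labels[i] and whose
    predicted class is labels[j].
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(labels)))
    precision, recall = {}, {}
    for i, label in enumerate(labels):
        predicted = sum(row[i] for row in confusion)  # column total
        actual = sum(confusion[i])                    # row total
        precision[label] = confusion[i][i] / predicted if predicted else 0.0
        recall[label] = confusion[i][i] / actual if actual else 0.0
    return correct / total, precision, recall


labels = ["A", "B", "C"]
# Hypothetical counts for 48 test projects (rows: true group, columns: predicted).
confusion = [
    [14, 4, 0],
    [7, 7, 1],
    [1, 0, 14],
]
accuracy, precision, recall = classification_metrics(confusion, labels)
print(f"accuracy={accuracy:.2%}")
for label in labels:
    print(f"{label}: precision={precision[label]:.1%} recall={recall[label]:.1%}")
```

Precision divides the diagonal count by the column total (everything predicted as that group), while recall divides it by the row total (everything that truly belongs to that group), so a class can score well on one and poorly on the other.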

Risk Group | Text Topic | Key Terms |
---|---|---|
Low risk group (A) | Type of bid item and quantity clarification (Topic 7) | Type, quantity, depth, grade, location |
Moderate risk group (B) | Monitoring report and duties (Topic 9) | Monitor, survey, permit, report, biologist |
High risk group (C) | Excavation quantity and payment (Topic 4) | Excavation, remove, rock, backfill, pay |

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Lee, J.; Yi, J.-S. Predicting Project’s Uncertainty Risk in the Bidding Process by Integrating Unstructured Text Data and Structured Numerical Data Using Text Mining. *Appl. Sci.* **2017**, *7*, 1141.
https://doi.org/10.3390/app7111141
