Risk-Informed Prediction of Dredging Project Duration Using Stochastic Machine Learning
Abstract
1. Introduction
2. Literature Review
3. Methods
3.1. Stochastic Optimized Machine-Learning Framework
- Define the goal and collect the data accordingly.
- Identify the input variables of the model.
- Determine the distributions of the input variables.
- Draw random numbers from the finalized distributions as inputs to the machine-learning model.
- Establish the model and analyze the results of the simulations.
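The steps above can be sketched as a small Monte Carlo loop. This is only an illustration under stated assumptions: the two input variables, their distribution parameters, the 400-day target, and the linear `predict_duration` rule are hypothetical placeholders, not the trained model or fitted distributions of the study.

```python
import math
import random


def sample_inputs(rng):
    """Draw one random realization of the input variables.

    Both variables and their parameters are hypothetical placeholders.
    """
    rainfall_days = rng.weibullvariate(60.0, 2.0)  # scale, shape (assumed)
    volume_norm = rng.random()                     # uniform on [0, 1) (assumed)
    return rainfall_days, volume_norm


def predict_duration(rainfall_days, volume_norm):
    """Stand-in for the trained prediction model: a made-up linear rule (days)."""
    return 150.0 + 1.2 * rainfall_days + 200.0 * volume_norm


def simulate(n_runs=10_000, seed=42):
    """Run the Monte Carlo loop and return the predicted durations, sorted."""
    rng = random.Random(seed)
    return sorted(predict_duration(*sample_inputs(rng)) for _ in range(n_runs))


def percentile(sorted_values, p):
    """Nearest-rank percentile of an already sorted list (0 < p <= 100)."""
    rank = max(1, math.ceil(p / 100.0 * len(sorted_values)))
    return sorted_values[rank - 1]


durations = simulate()
p50 = percentile(durations, 50)
p90 = percentile(durations, 90)
# Share of simulated projects that run longer than a hypothetical 400-day target.
prob_exceed = sum(d > 400.0 for d in durations) / len(durations)

print(f"median = {p50:.1f} days, 90th percentile = {p90:.1f} days")
print(f"P(duration > 400 days) = {prob_exceed:.3f}")
```

Reading percentiles off the sorted outputs gives the points of a cumulative risk curve; replacing `predict_duration` with a fitted model leaves the rest of the loop unchanged.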
3.2. AI-Based Prediction Model
3.3. Swarm Optimization
3.4. Model Performance Evaluation
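As a minimal sketch of one of the evaluation measures, the snippet below computes MAPE and maps it to the rating bands given in the article's MAPE table (<10 High Accuracy, 10~20 Excellent, 20~50 Reasonable, >50 Inaccuracy). The sample durations are hypothetical, and the treatment of values that fall exactly on a band boundary is an assumption, since the table does not specify it.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def rate_mape(value):
    """Map a MAPE value (%) to the rating used in the article's table.

    Boundary values (exactly 10, 20 or 50) are assigned to the lower-error
    band they touch; this choice is an assumption.
    """
    if value < 10:
        return "High Accuracy"
    if value <= 20:
        return "Excellent"
    if value <= 50:
        return "Reasonable"
    return "Inaccuracy"


# Hypothetical actual and predicted durations, in days.
actual_days = [200, 300, 400]
predicted_days = [220, 270, 380]

score = mape(actual_days, predicted_days)
print(f"MAPE = {score:.2f}% -> {rate_mape(score)}")
```

Under this mapping, the PSO–SVR test MAPE of 22.93% reported in the conclusions falls in the "Reasonable" band.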
4. Risk-Informed Simulation for Dredging Project
4.1. Collection and Preprocessing of Data
4.1.1. Data Preprocessing
4.1.2. Feature Selection
4.1.3. Sensitivity Analysis on Influential Factors
4.2. Establishment of Deterministic Project Duration Model
4.2.1. Model with Default Parameter Settings
4.2.2. Model Parameter Optimization and Stability Test
4.3. Project Duration Risk Simulation
4.3.1. Risk Probability Density Function of Model Variables
4.3.2. Risk-Informed Curve
5. Development and Design of Risk Quantification System
5.1. Background and Tools for Interface Creation
5.2. Interface Creation Process
5.3. System Verification
5.4. System Application
6. Conclusions and Suggestions
- The database used herein comprised 48 cases and eight factors: the proportions of sand, gravel and soil in the river; the average price of soil and gravel; the total dredging volume; the total tender price; the number of days with rainfall; and the cost of road transport. Relatively few data were obtained because this study, to the best of our knowledge, represents the first attempt of this kind for dredging projects in Taiwan. Data integrity is inadequate because no comprehensive database exists for analysis. Future studies should involve collaboration with the WRA to ensure that each river management office uploads relevant data.
- Combining machine learning with the optimization algorithm (PSO–SVR) improved the model performance: R increased from 0.318 to 0.647; MAE decreased from 77.415 to 63.020 (days); RMSE decreased from 100.234 to 76.555 (days); and MAPE decreased from 27.55% to 22.93%. Accordingly, PSO is suitable for use with the project duration model. After PSO was conducted to optimize the model parameters, ten-fold cross-validation was performed to test the stability of the model.
- Overfitting was detected because the applicable data were insufficient: the problems that river management office personnel encountered during dredging differed, and the dredging projects considered varied in scale. Consequently, the model lacked empirical references and was not applicable to all construction projects. Adding the number of days with rainfall and the cost of road transport to the six factors already in use yielded an error rate of 16.36%. These two added factors strongly influenced accuracy, so the error can be reduced by adding highly influential factors.
- The machine-learning model, combined with Monte Carlo simulation (MCS), is accessed through a graphical user interface that presents the predictions and risk assessment to the user. To test the applicability of the model, two newly collected dredging projects were considered. The prediction errors were 26.46% and 16.77% for Cases 1 and 2, respectively. Hence, although the predictive accuracy was acceptable, the stability of the model requires improvement.
- The early duration prediction obtained in this way can serve as a reference for evaluating the appropriateness of construction planning, equipment access, and environmental and social issues. If a project is delayed, the continuous siltation of the river may cause overflows and floods, which threaten the lives and property of people living along the river.
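The PSO–SVR improvement reported in the conclusions above can also be restated as relative changes, which makes the three error metrics easier to compare. The sketch below uses only the figures quoted in the text; the function name is illustrative.

```python
# Baseline SVR versus PSO-SVR, as quoted in the conclusions.
baseline = {"MAE": 77.415, "RMSE": 100.234, "MAPE": 27.55}
optimized = {"MAE": 63.020, "RMSE": 76.555, "MAPE": 22.93}


def relative_reduction(before, after):
    """Percentage by which an error metric fell relative to its old value."""
    return 100.0 * (before - after) / before


reductions = {
    name: relative_reduction(baseline[name], optimized[name]) for name in baseline
}
# Absolute increase in the correlation coefficient R.
r_gain = 0.647 - 0.318

for name, pct in reductions.items():
    print(f"{name}: {pct:.1f}% lower after optimization")
print(f"R: +{r_gain:.3f}")
```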
Author Contributions
Funding
Conflicts of Interest
MAPE (%) | Prediction Performance |
---|---|
<10 | High Accuracy |
10~20 | Excellent |
20~50 | Reasonable |
>50 | Inaccuracy |
Variable | Model_1 | Model_2 | Model_3 | Model_4 | Model_5 |
---|---|---|---|---|---|
Sand (%) | Original | Original | Original | Original | Original |
Gravel (%) | Original | Original | Original | Original | Original |
Soil (%) | Original | Original | Original | Original | Original |
Average soil and gravel price (NTD/ton) | Original | Min_Max | Log | Original | Original |
Total dredging volume (ton) | Original | Min_Max | Log | Min_Max | Min_Max |
Transport road cost (NTD) | Original | Min_Max | Log | Original | Original |
Rainfall day (day) | Original | Min_Max | Log | Original | Original |
Project cost (NTD) | Original | Min_Max | Log | Original | Min_Max |
Project duration (day) | Original | Min_Max | Log | Original | Original |
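The "Min_Max" and "Log" entries in the table above refer to variable transformations. A minimal sketch of both follows; the base-10 logarithm and the sample dredging volumes are assumptions, since the table does not state the log base or list raw values.

```python
import math


def min_max(values):
    """Rescale values linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max scaling is undefined for constant data")
    return [(v - lo) / (hi - lo) for v in values]


def log_transform(values):
    """Apply a base-10 logarithm (base assumed) to strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform requires strictly positive values")
    return [math.log10(v) for v in values]


# Hypothetical total dredging volumes, in tons.
volumes = [10_000, 55_000, 100_000]

print(min_max(volumes))
print(log_transform(volumes))
```

In practice the minimum and maximum would be taken from the learning data only and reused for new cases, so that test cases do not influence the scaling.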
Model | Parameter | Value |
---|---|---|
LR | copy_x | TRUE |
LR | fit_intercept | TRUE |
LR | n_jobs | 1 |
LR | normalize | ####### |
SVR | C | 1 |
SVR | cache_size | 200 |
SVR | coef0 | 0 |
SVR | degree | 3 |
SVR | epsilon | 0.1 |
SVR | gamma | auto |
SVR | kernel | RBF |
SVR | max_iter | −1 |
SVR | shrinking | TRUE |
SVR | tol | 0.001 |
SVR | verbose | ####### |
CART | criterion | mse |
CART | max_depth | None |
CART | max_features | None |
CART | max_leaf_nodes | None |
CART | min_impurity_decrease | 0 |
CART | min_impurity_split | None |
CART | min_samples_leaf | 1 |
CART | min_samples_split | 2 |
CART | min_weight_fraction_leaf | 0 |
CART | presort | ####### |
CART | random_state | None |
CART | splitter | best |
ANN | activation | relu |
ANN | alpha | 0.0001 |
ANN | batch_size | auto |
ANN | beta_1 | 0.9 |
ANN | beta_2 | 0.999 |
ANN | early_stopping | ####### |
ANN | epsilon | ####### |
ANN | hidden_layer_size | (100,) |
ANN | learning_rate | constant |
ANN | learning_rate_init | 0.001 |
ANN | max_iter | 200 |
ANN | momentum | 0.9 |
ANN | nesterovs_momentum | TRUE |
ANN | power_t | 0.5 |
ANN | random_state | None |
ANN | shuffle | TRUE |
ANN | solver | adam |
ANN | tol | 0.0001 |
ANN | validation_fraction | 0.1 |
ANN | verbose / warm_start | ####### |
Numerical Experiment | Model | R | MAE (Day) | RMSE (Day) | MAPE (%) |
---|---|---|---|---|---|
Model_1 | LR | 0.381 | 80.626 | 98.286 | 29.98 |
Model_1 | SVR | N/A | 80.508 | 103.362 | 32.05 |
Model_1 | CART | 0.317 | 90.313 | 113.536 | 33.59 |
Model_1 | ANN | N/A | N/A | N/A | N/A |
Model_2 | LR | 0.363 | 80.169 | 98.527 | 29.91 |
Model_2 | SVR | 0.348 | 76.607 | 97.249 | 28.50 |
Model_2 | CART | 0.304 | 93.292 | 125.434 | 34.96 |
Model_2 | ANN | 0.136 | 90.099 | 111.084 | 34.35 |
Model_3 | LR | 0.334 | 80.986 | 102.554 | 28.70 |
Model_3 | SVR | 0.318 | 77.415 | 100.234 | 27.55 |
Model_3 | CART | 0.056 | 93.417 | 122.531 | 34.25 |
Model_3 | ANN | N/A | N/A | N/A | N/A |
Model_4 | LR | 0.336 | 84.291 | 102.482 | 31.45 |
Model_4 | SVR | N/A | 81.647 | 103.888 | 32.43 |
Model_4 | CART | 0.227 | 111.000 | 139.179 | 41.03 |
Model_4 | ANN | N/A | N/A | N/A | N/A |
Model_5 | LR | 0.337 | 81.113 | 100.844 | 30.04 |
Model_5 | SVR | 0.198 | 80.209 | 102.836 | 31.89 |
Model_5 | CART | 0.339 | 83.417 | 111.400 | 30.42 |
Model_5 | ANN | N/A | 182.968 | 226.733 | 56.85 |
Fold | Data | R | MAE (Day) | RMSE (Day) | MAPE (%) |
---|---|---|---|---|---|
1 | Learning Data | 0.882 | 48.92 | 57.497 | 17.008 |
1 | Test Data | 0.418 | 37.009 | 46.173 | 15.234 |
2 | Learning Data | 0.862 | 46.704 | 55.709 | 16.796 |
2 | Test Data | 0.894 | 64.051 | 75.907 | 17.889 |
3 | Learning Data | 0.904 | 55.194 | 66.466 | 18.823 |
3 | Test Data | 0.41 | 41.913 | 48.998 | 17.906 |
4 | Learning Data | 0.808 | 45.891 | 53.819 | 15.692 |
4 | Test Data | 0.72 | 96.681 | 131.325 | 31.481 |
5 | Learning Data | 0.854 | 49.255 | 57.43 | 17.177 |
5 | Test Data | 0.605 | 45.471 | 52.602 | 16.592 |
6 | Learning Data | 0.822 | 50.369 | 57.888 | 17.829 |
6 | Test Data | 0.89 | 44.073 | 53.312 | 16.29 |
7 | Learning Data | 0.869 | 45.331 | 52.041 | 16.265 |
7 | Test Data | 0.768 | 90.926 | 100.692 | 23.166 |
8 | Learning Data | 0.614 | 66.896 | 84.349 | 24.811 |
8 | Test Data | 0.61 | 67.015 | 74.528 | 22.089 |
9 | Learning Data | 0.804 | 50.268 | 59.369 | 16.91 |
9 | Test Data | 0.483 | 82.018 | 100.144 | 51.364 |
10 | Learning Data | 0.777 | 58.51 | 70.391 | 21.06 |
10 | Test Data | 0.669 | 62.044 | 81.869 | 17.333 |
Data | R | MAE (Day) | RMSE (Day) | MAPE (%) |
---|---|---|---|---|
Learning Data | 0.820 | 51.734 | 61.532 | 18.237 |
Test Data | 0.647 | 63.020 | 76.555 | 22.934 |
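The averaged test-data row above can be recovered from the per-fold results. As a worked check, the sketch below averages three of the test-data columns (R, RMSE and MAPE) over the ten folds, using the values listed in the fold-by-fold table, and also reports the spread of the fold MAPE values as a rough indication of stability. The variable names are illustrative.

```python
from statistics import mean

# Test-data results for folds 1 to 10, copied from the fold-by-fold table.
test_folds = {
    "R": [0.418, 0.894, 0.41, 0.72, 0.605, 0.89, 0.768, 0.61, 0.483, 0.669],
    "RMSE": [46.173, 75.907, 48.998, 131.325, 52.602,
             53.312, 100.692, 74.528, 100.144, 81.869],
    "MAPE": [15.234, 17.889, 17.906, 31.481, 16.592,
             16.29, 23.166, 22.089, 51.364, 17.333],
}

# Ten-fold average of each metric.
averages = {name: mean(values) for name, values in test_folds.items()}

# Range of the fold MAPE values: a wide range signals fold-to-fold instability.
mape_range = (min(test_folds["MAPE"]), max(test_folds["MAPE"]))

for name, value in averages.items():
    print(f"mean test {name} = {value:.3f}")
print(f"test MAPE ranges from {mape_range[0]}% to {mape_range[1]}%")
```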
Model Variable | Variable Description | Probability Density Function |
---|---|---|
X1 | Sand (%) | Logistic distribution |
X2 | Gravel (%) | Weibull distribution |
X3 | Soil (%) | Lognormal distribution |
X4 | Average soil and gravel price normalization | Minimum extreme value distribution |
X5 | Total dredging volume normalization | Weibull distribution |
X6 | Project total cost | Weibull distribution |
X7 | Rainfall day | Weibull distribution |
X8 | Transport road cost | Logistic distribution |
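Random values from the four distribution families in the table above can be drawn with the Python standard library plus two inverse-transform samplers. This is a sketch only: every location, scale and shape parameter below is a hypothetical placeholder, as the fitted parameters are not listed here, and the variables are sampled independently for simplicity.

```python
import math
import random


def open_unit(rng):
    """Uniform draw on the open interval (0, 1), avoiding log(0) below."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def logistic(rng, loc, scale):
    """Logistic draw by inverting the CDF F(x) = 1 / (1 + exp(-(x - loc) / scale))."""
    u = open_unit(rng)
    return loc + scale * math.log(u / (1.0 - u))


def min_extreme_value(rng, loc, scale):
    """Minimum extreme value (Gumbel-min) draw.

    Inverts the CDF F(x) = 1 - exp(-exp((x - loc) / scale)).
    """
    u = open_unit(rng)
    return loc + scale * math.log(-math.log(1.0 - u))


rng = random.Random(0)
# One realization of four of the model variables; all parameters are assumed.
draw = {
    "X1 sand (%)": logistic(rng, 40.0, 5.0),
    "X2 gravel (%)": rng.weibullvariate(35.0, 2.0),   # scale, shape
    "X3 soil (%)": rng.lognormvariate(3.0, 0.3),      # mu, sigma of log(X)
    "X4 price (normalized)": min_extreme_value(rng, 0.5, 0.1),
}

for name, value in draw.items():
    print(f"{name}: {value:.3f}")
```

The remaining variables follow the same pattern, with `weibullvariate` for X5 to X7 and `logistic` for X8.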
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Chou, J.-S.; Lin, J.-W. Risk-Informed Prediction of Dredging Project Duration Using Stochastic Machine Learning. Water 2020, 12, 1643. https://doi.org/10.3390/w12061643