Open Access Article
A Machine Learning Approach for Factor Analysis and Scenario-Based Prediction of Construction Accidents
by Ki-nam Kim 1, Dae-gu Cho 2 and Min-jae Lee 1,*
1 Department of Civil Engineering, Chungnam National University (CNU), Engineering Hall #2, 99 DaeHakRo, Yuseong-gu, Daejeon 34134, Republic of Korea
2 Ninetynine Co., Ltd., Heeseong Plaza #312, 370 Wolgye-ro, Nowon-gu, Seoul 01905, Republic of Korea
* Author to whom correspondence should be addressed.
Buildings 2025, 15(23), 4343; https://doi.org/10.3390/buildings15234343 (registering DOI)
Submission received: 29 October 2025 / Revised: 25 November 2025 / Accepted: 27 November 2025 / Published: 28 November 2025
Abstract
The construction industry has persistently high accident rates, and major accidents continue to occur despite strengthened safety management systems. This study analyzes 19,456 accident records from the national Construction Safety Management Integrated Information (CSI) system and applies a Light Gradient Boosting Machine (LightGBM) model to classify each accident outcome as fatal or injury. SHapley Additive exPlanations (SHAP) were used to identify influential factors and quantify each variable’s contribution. Fatal events represented about 5% of cases, a substantial class imbalance. To address this, three oversampling methods (SMOTE, Borderline-SMOTE, and ADASYN) were tested. The ADASYN model showed the best performance (F1-score = 0.905, AUC = 0.879) and was selected as the final model. Oversampling was applied exclusively to the training folds during stratified 10-fold cross-validation on the training set. After the optimal number of iterations was identified, the model was retrained on the full training data and its final performance was evaluated on the independent test set. SHAP results indicated that Type of Accident, Accident Object, and Work Process were the primary drivers of fatal outcomes, whereas Safety Management Plan and Public/Private Ownership helped lessen severity. Project Cost, Progress Rate, and Number of Workers moderated prediction strength through interactions with key variables. Using a LightGBM–SHAP framework that captures nonlinear interactions, this study clarifies the structural relationships among factors affecting accident outcomes and supports explainable artificial intelligence (AI)–based safety management and risk monitoring.
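The fold-wise resampling described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code: the function names (`stratified_folds`, `oversample_minority`, `cv_splits`) and the toy labels are hypothetical, and plain random duplication stands in for ADASYN, which would synthesize new minority samples instead. The sketch only shows the structural point that resampling touches the training folds and never the validation fold.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign each index to one of k folds, keeping class ratios roughly equal."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets its share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def oversample_minority(indices, labels, seed=0):
    """Duplicate minority-class indices at random until classes are balanced.

    Assumption: a simple stand-in for ADASYN/SMOTE, which create synthetic
    points rather than exact duplicates.
    """
    rng = random.Random(seed)
    counts = Counter(labels[i] for i in indices)
    majority_n = max(counts.values())
    out = list(indices)
    for cls, n in counts.items():
        pool = [i for i in indices if labels[i] == cls]
        out.extend(rng.choice(pool) for _ in range(majority_n - n))
    return out


def cv_splits(labels, k=10, seed=0):
    """Yield (resampled_train_indices, untouched_validation_indices) per fold."""
    folds = stratified_folds(labels, k, seed)
    for i, val_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        # Resample the training part only; the validation fold keeps
        # its original class ratio.
        yield oversample_minority(train_idx, labels, seed + i), val_idx


# Toy labels: 1 = fatal (5%), 0 = injury. Made-up data, not the CSI records.
labels = [1] * 50 + [0] * 950
splits = list(cv_splits(labels, k=10))
```

Each element of `splits` pairs a balanced training index list with a validation index list that still reflects the original imbalance, so validation scores are not inflated by duplicated or synthetic minority cases.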