This is an early access version, the complete PDF, HTML, and XML versions will be available soon.
Article

A Machine Learning Approach for Factor Analysis and Scenario-Based Prediction of Construction Accidents
Ki-nam Kim, Dae-gu Cho, and Min-jae Lee

1 Department of Civil Engineering, Chungnam National University (CNU), Engineering Hall #2, 99 DaeHakRo, Yuseong-gu, Daejeon 34134, Republic of Korea
2 Ninetynine Co., Ltd., Heeseong Plaza #312, 370 Wolgye-ro, Nowon-gu, Seoul 01905, Republic of Korea
* Author to whom correspondence should be addressed.
Buildings 2025, 15(23), 4343; https://doi.org/10.3390/buildings15234343 (registering DOI)
Submission received: 29 October 2025 / Revised: 25 November 2025 / Accepted: 27 November 2025 / Published: 28 November 2025
(This article belongs to the Section Construction Management, and Computers & Digitization)

Abstract

The construction industry has persistently high accident rates, and major accidents continue to occur despite strengthened safety management systems. This study analyzes 19,456 accident records from the national Construction Safety Management Integrated Information (CSI) system and applies a Light Gradient Boosting Machine (LightGBM) model to predict fatal versus injury outcomes. SHAP (SHapley Additive exPlanations) was used to identify influential factors and quantify each variable’s contribution. Fatal events represented about 5% of cases, reflecting substantial class imbalance. To address this, three oversampling methods (SMOTE, Borderline-SMOTE, and ADASYN) were tested. Oversampling was applied exclusively to the training folds during stratified 10-fold cross-validation on the training set. After identifying the optimal number of iterations, the model was retrained on the full training data, and its final performance was evaluated on the independent test set. The ADASYN model showed the best performance (F1-score = 0.905, AUC = 0.879) and was selected as the final model. SHAP results indicated that Type of Accident, Accident Object, and Work Process were the primary drivers of fatal outcomes, whereas Safety Management Plan and Public/Private Ownership were associated with reduced severity. Project Cost, Progress Rate, and Number of Workers moderated prediction strength through interactions with key variables. This study clarifies structural relationships among factors affecting accident outcomes using a LightGBM-SHAP framework that captures nonlinear interactions, supporting explainable artificial intelligence (AI)-based safety management and risk monitoring.
Keywords: construction safety; accident prediction; LightGBM; SHAP; explainable AI (XAI)
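
The validation protocol described in the abstract, in which oversampling touches only the training folds of a stratified 10-fold cross-validation, can be illustrated with a minimal standard-library sketch. This is not the authors' code. The toy labels (5% fatal) are hypothetical and are not CSI data, the helper names are invented for illustration, and plain random duplication stands in for ADASYN, which synthesizes new minority samples rather than copying existing ones. Model fitting with LightGBM is omitted; the sketch shows only how the splits would be prepared so that no oversampled record leaks into a validation fold.

```python
import random
from collections import Counter


def stratified_folds(labels, k, seed=0):
    """Assign sample indices to k folds while preserving class ratios."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets its share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def oversample_minority(indices, labels, seed=0):
    """Randomly duplicate minority-class indices until classes are balanced.

    Assumption: a simple stand-in for ADASYN, used only to show where
    oversampling sits in the pipeline.
    """
    rng = random.Random(seed)
    counts = Counter(labels[i] for i in indices)
    target = max(counts.values())
    balanced = list(indices)
    for cls, n in counts.items():
        pool = [i for i in indices if labels[i] == cls]
        balanced.extend(rng.choice(pool) for _ in range(target - n))
    return balanced


def cv_splits(labels, k=10, seed=0):
    """Yield (oversampled training indices, untouched validation indices)."""
    folds = stratified_folds(labels, k, seed)
    for f, val_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        # Oversampling is applied after the split, to the training part only.
        yield oversample_minority(train_idx, labels, seed + f), val_idx


# Hypothetical toy labels: 1 = fatal (5%), 0 = injury (95%).
labels = [1] * 50 + [0] * 950
splits = list(cv_splits(labels, k=10))

for fold_no, (train_idx, val_idx) in enumerate(splits):
    train_counts = Counter(labels[i] for i in train_idx)
    val_counts = Counter(labels[i] for i in val_idx)
    print(fold_no, dict(train_counts), dict(val_counts))
```

Each validation fold keeps the original 5% fatal rate, while the corresponding training indices are balanced. Oversampling before splitting would instead place copies or synthetic neighbours of validation records in the training data and inflate the cross-validated scores.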

Share and Cite

MDPI and ACS Style

Kim, K.-n.; Cho, D.-g.; Lee, M.-j. A Machine Learning Approach for Factor Analysis and Scenario-Based Prediction of Construction Accidents. Buildings 2025, 15, 4343. https://doi.org/10.3390/buildings15234343

