Analyzing OSHA Construction Accident Reports Using BERTopic Topic Modeling for Thematic Insights

Yuntao Cao; Ziyi Qu; Shujie Wu; Yuting Chen; Martin Skitmore; Xingguan Ma; Jun Wang

doi:10.3390/buildings16010010

¹ School of Municipal and Environmental Engineering, Shenyang Jianzhu University, Shenyang 110168, China
² School of Management Engineering, Qingdao University of Technology, Qingdao 266520, China
³ Civil Engineering Technology and Construction Management, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
⁴ Faculty of Society and Design, Bond University, Robina, QLD 4226, Australia

Buildings 2026, 16(1), 10; https://doi.org/10.3390/buildings16010010

This article belongs to the Special Issue Research on Safety Control and Risk Management in Construction Engineering: Progress, Challenges and Strategies

Abstract

Hazards at construction sites can lead to severe accidents, posing significant risks to worker safety, financial stability, and public confidence in industry safety standards. As a result, understanding and preventing these accidents has become increasingly critical. Although previous studies have examined historical accidents through detailed reports, few have systematically applied automated natural language processing (NLP) techniques to uncover hidden topics and patterns in large datasets without manual intervention. This study addresses this gap by applying topic modeling to 22,623 accident reports from the Occupational Safety and Health Administration (OSHA) spanning 2004 to 2023. The results demonstrate that BERTopic substantially outperforms the traditional LDA model across multiple accident datasets, achieving higher topic coherence and topic diversity. Leveraging contextual embeddings, BERTopic identifies nuanced risk scenarios, occupation–accident patterns, and temporal trends that earlier text-mining approaches often overlooked. The findings also generate actionable managerial insights, including peak accident periods, vulnerable worker groups, and scenario-specific risk factors. Overall, this study provides a clearer and more data-driven understanding of construction accident mechanisms through advanced topic modeling. Applying BERTopic for topic extraction and content analysis introduces a novel and effective approach to analyzing construction accident reports. The insights derived provide valuable guidance for decision-makers in risk mitigation and accident prevention, while helping to rebuild public confidence in safety standards. Moreover, the approach’s reproducibility and potential for broader safety applications contribute to fostering a safer construction environment.
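The abstract compares BERTopic and LDA on topic coherence and topic diversity but does not define either metric. The sketch below shows one common way to compute each. It assumes that topics are ranked word lists and that documents are pre-tokenized. Topic diversity is taken as the share of unique words among the top-k words of all topics. Coherence is taken as the mean normalized pointwise mutual information (NPMI) between word pairs, estimated from document co-occurrence. This section does not state the paper's exact coherence variant (for example C_v, UMass, or NPMI), its top-k setting, or its reference corpus. The function names `topic_diversity` and `npmi_coherence`, their parameters, and the toy data are therefore hypothetical illustrations, not the authors' evaluation code.

```python
import math
from itertools import combinations


def topic_diversity(topics, top_k=10):
    """Fraction of unique words among the top-k words of all topics.

    1.0 means no two topics share a top word; lower values mean more overlap.
    Hypothetical helper for illustration; not the study's own code.
    """
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        return 0.0
    return len(set(top_words)) / len(top_words)


def npmi_coherence(topics, documents, top_k=10):
    """Mean NPMI over all word pairs in each topic, averaged across topics.

    Probabilities are estimated from document-level co-occurrence in
    `documents` (a list of token lists). NPMI is an assumed choice of
    coherence measure, not necessarily the one used in the study.
    """
    doc_sets = [set(doc) for doc in documents]
    if not doc_sets:
        raise ValueError("documents must not be empty")
    n_docs = len(doc_sets)

    def prob(*words):
        # Share of documents that contain every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    topic_scores = []
    for topic in topics:
        pair_scores = []
        for w1, w2 in combinations(topic[:top_k], 2):
            p12 = prob(w1, w2)
            if p12 == 0.0:
                pair_scores.append(-1.0)  # never co-occur: minimum NPMI
                continue
            denom = -math.log(p12)
            if denom == 0.0:
                pair_scores.append(1.0)  # both words appear in every document
                continue
            pmi = math.log(p12 / (prob(w1) * prob(w2)))
            pair_scores.append(pmi / denom)
        if pair_scores:  # topics with fewer than two words are skipped
            topic_scores.append(sum(pair_scores) / len(pair_scores))
    return sum(topic_scores) / len(topic_scores) if topic_scores else 0.0


if __name__ == "__main__":
    # Toy, hypothetical data: four tokenized "reports" and two candidate topics.
    docs = [
        ["fall", "ladder", "roof"],
        ["fall", "roof", "scaffold"],
        ["trench", "collapse", "excavation"],
        ["trench", "excavation", "soil"],
    ]
    topics = [["fall", "roof", "ladder"], ["trench", "excavation", "collapse"]]
    print("diversity:", topic_diversity(topics, top_k=3))
    print("NPMI coherence:", npmi_coherence(topics, docs, top_k=3))
```

Under these definitions, a diversity of 1.0 means that no word is shared between topics. NPMI ranges from -1, where the two words never appear in the same document, to 1, where they always appear together. Higher values on both metrics are read as better, which matches the direction of the BERTopic versus LDA comparison reported in the abstract. For a corpus the size of the OSHA dataset, an established implementation such as gensim's `CoherenceModel` would usually be preferred over this loop-based sketch.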

Keywords:

topic modeling; BERTopic; LDA; construction industry; accident report analysis
