Abstract
Hazards at construction sites can lead to severe accidents, posing significant risks to worker safety, financial stability, and public confidence in industry safety standards. Understanding and preventing these accidents has therefore become increasingly critical. Although previous studies have examined historical accidents through detailed accident reports, few have systematically applied automated natural language processing (NLP) techniques to uncover latent topics and patterns in large datasets without manual intervention. This study addresses this gap by applying topic modeling to 22,623 accident reports from the Occupational Safety and Health Administration (OSHA) spanning 2004 to 2023. The results show that BERTopic substantially outperforms the traditional latent Dirichlet allocation (LDA) model across multiple accident datasets, achieving higher topic coherence and topic diversity. By leveraging contextual embeddings, BERTopic identifies nuanced risk scenarios, occupation–accident patterns, and temporal trends that earlier text-mining approaches often overlooked. The findings also yield actionable managerial insights, including peak accident periods, vulnerable worker groups, and scenario-specific risk factors. Overall, applying BERTopic to topic extraction and content analysis introduces a novel and effective approach to analyzing construction accident reports and provides a clearer, more data-driven understanding of construction accident mechanisms. The resulting insights offer decision-makers valuable guidance for risk mitigation and accident prevention and can help rebuild public confidence in safety standards. Moreover, the approach’s reproducibility and potential for broader safety applications can contribute to a safer construction environment.