New Trends in the Use of Artificial Intelligence and Natural Language Processing for Occupational Risks Prevention
Abstract
1. Introduction
2. Methodology
2.1. Data Sources and Search Strategy
2.2. Eligibility Criteria
2.3. Screening and Selection Process
2.4. Data Extraction and Synthesis
- Bibliographic and contextual information, such as first author, year of publication, country or region (when reported), publication venue and study type (methodological paper, empirical case study, review, or framework proposal), as well as industrial sector and setting: dominant sector(s) addressed (e.g., aviation, construction, mining, chemical and process industries, manufacturing, transportation, healthcare, public sector and other services) and any specific workplace or process characteristics relevant to OSH.
- Data sources and modalities, including type and origin of the safety-related data (e.g., free-text accident/incident reports, near-miss and hazard observations, PSIF/FRCP datasets, occupational injury or disease registries, compensation claims, exposure or environmental monitoring data, job descriptions, safety-inspection reports, training materials, video or image data for PPE/unsafe-condition detection, sensor and process data in multi-source frameworks). Particular attention was given to whether unstructured text was the primary data source or part of a multimodal pipeline.
- AI/NLP/LLM and vision methods: main model families employed (e.g., classical supervised machine learning; topic models and other unsupervised text-mining techniques; word and sentence embeddings; recurrent or convolutional neural networks; transformer-based NLP models; LLMs and retrieval-augmented generation pipelines; ensemble models; computer vision and vision–language architectures) and any domain-adaptation or fine-tuning strategies reported.
- Analytical objectives (e.g., automated classification of incident types or causes; extraction of causal chains and contributing factors; topic modeling of safety concerns; prediction of accident occurrence, likelihood or severity, including PSIF and FRCP levels; risk-index estimation; early warning and anomaly detection; generation of job- or task-specific safety guidance and reports; PPE compliance or unsafe-condition detection; monitoring of safety-culture or safety-climate indicators).
- Evaluation and performance metrics (e.g., accuracy, precision, recall, F1-score, AUC, confusion matrices) and/or qualitative assessments (e.g., expert validation, comparative analyses with baseline methods, user or practitioner feedback) used to evaluate model performance and practical utility.
- Advantages, limitations and implementation aspects: reported advantages (e.g., improved prediction accuracy, better handling of unstructured narratives, ability to integrate multi-source data) and limitations or implementation challenges (e.g., data quality, under-reporting, representativeness issues, and limited transparency and explainability).
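As an illustration of the performance metrics extracted from the reviewed studies, the following minimal sketch computes per-class precision, recall and F1-score, plus the support-weighted F1-score, from two label lists. The function names and the example labels are hypothetical and are not taken from any reviewed study.

```python
from collections import Counter


def per_class_prf(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for two equal-length label lists."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label was wrong
            fn[truth] += 1  # true label was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def weighted_f1(y_true, y_pred):
    """Average the per-class F1-scores, weighted by each class's support."""
    if not y_true:
        raise ValueError("y_true must not be empty")
    scores = per_class_prf(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[lbl][2] * n for lbl, n in support.items()) / len(y_true)


# Hypothetical accident-type labels, for illustration only.
y_true = ["fall", "fall", "fall", "struck", "struck", "caught"]
y_pred = ["fall", "fall", "struck", "struck", "struck", "fall"]
print(per_class_prf(y_true, y_pred))
print(round(weighted_f1(y_true, y_pred), 3))
```

Weighting by support matters for accident datasets, where rare categories are common; a macro average would instead give every class equal weight regardless of frequency.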
3. Results
3.1. Aviation
3.2. Construction
| Objective | Methods | Results | Reference |
|---|---|---|---|
| Effective retrieval of relevant historical cases to prevent occupational risks in the construction industry. | Euclidean distance measure, cosine similarity measure and the co-occurrence and structured term vector model to represent unstructured textual cases. | Demonstration of the superior information retrieval of NLP-based models over traditional methods in a construction management information system. | [66] |
| Classify accident causes and identify common hazardous objects from construction accident reports using text mining and NLP techniques. | Five baseline models (Support Vector Machine, Linear Regression, K-Nearest Neighbor, Decision Tree, Naive Bayes) and an ensemble model, with the Sequential Quadratic Programming (SQP) algorithm to optimize the weights of classifiers within the ensemble. | Optimized models in terms of average weighted F1-score, even with low support, enabling automatic extraction of common objects responsible for accidents. | [1] |
| Identify injury precursors from construction accident reports to predict and prevent workplace injuries. | Convolutional Neural Networks (CNNs) and Hierarchical Attention Networks (HANs), combined with Term Frequency-Inverse Document Frequency (TF-IDF) and Support Vector Machines (SVMs). | Improve the understanding, prediction, and prevention of workplace injuries and provide tools that allow users to visualize and understand the predictions. | [69] |
| Effective management of occupational risks in the field of construction safety. | NLP with a Named Entity Recognition (NER) scheme specifically designed for the construction safety domain. | Effective and reliable annotator scheme with an agreement rate of 0.79 F-Score, overcoming previous limitations such as scope issues within hazard classification and the lack of coverage for specific construction activities, body parts injured, harmful consequences and protective measures. | [74] |
| Analysis of near-miss reports to prevent potential accidents in the construction industry. | Bidirectional Encoder Representations from Transformers (BERT) for automatic classification of near-miss data. | Outperforms other state-of-the-art automatic text classification methods. | [70] |
| More effective precautionary strategies and, consequently, improved safety assessments for construction projects. | Symbiotic Gated Recurrent Unit (SGRU) using NLP for text data preprocessing. | Improved classification accuracy and removal of human error in accident analysis and root cause identification. | [72] |
| Identification of the critical causes of metro construction accidents in China. | Development of a text mining strategy incorporating an information-entropy-weighted term frequency (TF-H) metric to evaluate the importance of terms. | Successful extraction of 37 safety risk factors from 221 metro construction accident reports, demonstrating effective distillation of important factors from accident reports regardless of their length. | [73] |
| Extract and categorize safety risks from records, focusing on high-frequency but low-severity risks that are often missed by traditional methods. | Text mining with Word2Vec models integrated with NLP. | Seven unsafe-act-related and nine unsafe-condition-related risks were uncovered, revealing predominant inappropriate human behaviors and the primary sources of safety hazards on site. | [68] |
| To establish an automatic inspection mechanism. | Use of NLP to integrate Building Information Modeling (BIM) with a safety rule library. | Development of a safety rule-checking system for the construction process. | [75] |
| Prevention of Fall From Height (FFH) accidents in the context of occupational safety. | NLP combined with knowledge graphs (KGs). | A robust approach to enhance occupational safety, using NLP and knowledge graphs, to mitigate FFH risks and improve prevention strategies. | [76] |
| Occupational risk prevention in the construction industry using NLP and semi-supervised machine learning techniques. | Yet Another Keyword Extractor (YAKE) with Guided Latent Dirichlet Allocation (GLDA). | Effectiveness of the YAKE-GLDA approach, achieving an F1-score of 0.66 for OSHA injury narratives and an F1-score of 0.86 for specific categories, significantly reducing the need for manual intervention. | [71] |
| Mining of safety hazard information in construction documents presented in unstructured or semi-structured formats. | Term recognition models using semantic similarity and information correlation and term frequency-inverse document frequency methods (TF-IDF). | Automatic extraction and visualization of safety hazard information. | [67] |
| Extract information from construction accident investigation reports in China to identify causes and reveal underlying patterns. | Text mining techniques combined with latent Dirichlet allocation (LDA) models. | Delayed hazard identification and inadequate safety management on construction sites are the most frequent causal factors. | [85] |
| Analyze the causes and trends of industrial accidents at small-scale construction sites in South Korea to improve safety management and prevention strategies. | Statistical analysis, latent Dirichlet allocation (LDA) topic modeling and network analysis were applied to KOSHA accident data from 2018 to 2022, focusing on small-scale construction sites. | Scaffolding and working platforms were identified as the most critical cause of accidents, with falls being the predominant type; findings provide evidence to enhance safety culture and preventive measures for construction workers. | [86] |
| Improve construction safety management by automatically extracting semantic information from on-site video data, enabling more effective monitoring of worker performance and safety conditions. | A visual attention framework integrating frame extraction with interframe differences and a ResNet101–LSTM attention model was developed to generate natural language descriptions from construction video frames and validated on offline scene image datasets. | The framework accurately captured objects, relationships and attributes, enhancing automated safety monitoring, worker assessment and video management. | [77] |
| Identify critical safety risks and key transfer pathways in subway construction environments | Text mining, association rules and complex network modeling were applied to incident reports to extract risk factors and map their interrelationships. | Key risks include inadequate safety management, unimplemented responsibilities, operational violations and insufficient training; controlling these factors or disrupting transfer paths effectively mitigates accidents. | [87] |
| Improve accident analysis in highway construction by leveraging LLM to extract insights from textual injury reports and identify major causes of severe incidents. | OpenAI’s GPT-3.5 was applied to OSHA’s Severe Injury Reports (SIR) database, integrating natural language processing, dimensionality reduction, clustering algorithms and LLM-based summarization to analyze and categorize accident narratives. | LLM-assisted cluster and causal analysis identified key accident types, demonstrating AI’s potential to support data-driven safety strategies and enhance accident prevention in construction. | [78] |
| Extracting useful information from road construction accident reports using LLM. | OpenAI’s GPT-3.5 was applied to OSHA Severe Injury Reports (SIR), integrating NLP techniques, dimensionality reduction, clustering algorithms, and LLM-based prompting to identify patterns and causes of major accidents. | The most significant types of accidents were identified, including those related to heat and pedestrian accidents, associating recurring factors in the cases, demonstrating the potential of AI analysis to support more effective accident prevention and intervention strategies. | [79] |
| Elucidate the underlying causes of construction accidents in highway work zones—among the most hazardous environments in the transportation sector—to inform targeted safety interventions. | Employed advanced text mining and latent Dirichlet allocation (LDA) modeling on OSHA narrative reports, complemented by social network analysis (SNA) to quantify interrelationships and criticality among root causes. | Four dominant root causes—supervisory negligence, low safety awareness, poor work environments and risk-taking behavior—were identified as critical to improving highway work zone safety. | [88] |
| Develop an automated system to extract and manage construction safety knowledge, enhancing risk assessments and reducing reliance on individual expertise within the construction sector. | Combined natural language processing (NLP) with graph-based models to extract predefined knowledge from unstructured construction data and construct an entity-relationship knowledge base, including entity-name recognition and keyword-extraction engines. | The proposed method efficiently and effectively generated a construction risk-assessment knowledge base, outperforming existing approaches and providing a foundation for automated knowledge management in construction safety. | [89] |
| Enhance accident prediction and safety management in the construction sector by integrating ontologies with deep learning models to leverage knowledge from construction accident reports. | Developed a construction safety ontology using domain word discovery and literature analysis, transformed accident reports into conceptual vectors via TransH and implemented a TextCNN model, comparing performance against five traditional machine learning models. | The ontology-integrated TextCNN model outperformed all baseline models, achieving 88% accuracy and 0.92 AUC, demonstrating improved predictive performance and actionable insights for construction site safety management. | [90] |
| Evaluate the effectiveness of a Retrieval-Augmented Generation GPT (RAG-GPT) model for generating accurate and detailed construction safety information. | The RAG-GPT model was evaluated against four GPT variants, with responses assessed by researchers, safety experts, and construction workers using quantitative and qualitative metrics. | RAG-GPT outperformed other models, providing more accurate and contextually relevant safety information, demonstrating the efficacy of retrieval-augmented strategies in construction safety management. | [82] |
| Predict and prevent construction accidents by leveraging large language models to identify key accident types from textual reports. | Transfer learning was used with a fine-tuned generative pre-trained transformer (GPT). | The resulting model achieved 82% accuracy in predicting six types of accidents, enabling proactive safety interventions. | [80] |
| Strengthen safety risk management in the construction sector by automatically generating high-quality, activity-specific safety guidance using LLM. | A Retrieval-Augmented Generation framework was employed to retrieve pertinent information from 64,740 construction accident reports, integrating domain-adapted text embeddings with LLM-based natural language generation to produce context-specific safety guidance. | The generated safety risk management guidance was found to be of equivalent or superior quality to those written by experienced practitioners through a double-blind peer review. | [22] |
| Identify gaps in construction site safety inspections within the construction sector, highlighting which leading indicators fail to capture hazards associated with workplace incidents. | Natural language processing (NLP), text mining, and deep learning (SBERT) techniques were applied to generate embeddings from 633 incident reports and 9681 inspection descriptions, followed by root cause analysis and visualization using bow-tie and Sankey diagrams. | High-risk hazards—working at heights (81%), equipment handling/storage (17%) and ergonomics (0.4%)—were inadequately captured during inspections, providing actionable insights to enhance predictive and proactive risk management in construction. | [83] |
| Construct an automated framework to identify and quantify Fall From Height (FFH) risk factors in construction. | LLM generated a FFH knowledge graph from 1097 accident reports, with clustering and network analysis applied for quantitative risk assessment. | GPT-4o achieved high extraction accuracy (F1 = 0.94; precision = 0.90), revealing key risk factors and unsafe behaviors, supporting enhanced construction site safety management. | [91] |
| Identify technological opportunities to prevent occupational incidents on construction sites by analyzing incidents and patent textual data. | Applied text mining and self-organizing maps to integrate incident reports and patent documents, categorizing potential safety technologies into five groups and performing gap analysis to assess feasibility. | The study revealed actionable technology solutions across machine tool work, high-place work, vehicle-related facilities, hydraulic machines and miscellaneous tools, providing strategic guidance for enhancing workplace safety for business owners and safety managers. | [92] |
| Assess the ability of LLM to support workplace management in the radiology healthcare sector | ChatGPT-3.5, ChatGPT-4.0, Gemini and Gemini Advanced answered 31 workplace management questions; responses were scored for quality, clarity and implementability. | ChatGPT-4.0 performed best across all metrics, followed by Gemini Advanced, showing that LLMs can aid workplace management in healthcare without specialized management training. | [93] |
| Design a platform for training in construction safety. | The proposed system integrates a validated safety knowledge base, an LLM-driven scenario and feedback generator, game-based instructional elements and a user interface. | The use of personalized and contextually realistic risk scenarios facilitated student decision-making, thereby enhancing the adoption of safe practices in workplace settings. | [84] |
| Use Generative Pre-trained Transformer (GPT) models for the automated analysis of subway construction accident investigation reports, with the goal of improving the efficiency of accident identification and analysis | Developed the AIR Agent, a GPT-based system with conversation, instruction and knowledge modules and validated it on 50 subway accident reports using ablation studies. | The AIR Agent achieved 80.32% accuracy in identifying accident types and extracting key details, demonstrating its capability to standardize, structure and expedite accident investigation analysis. | [81] |
| Examine the application of ML to construction accident report analysis, identifying methodological gaps and challenges in processing textual safety data. | A systematic literature review of ML-based studies was conducted, focusing on data preprocessing, algorithm selection, testing and implementation. | Findings reveal underutilized unsupervised learning and NLP and inconsistent validation and emphasize standardized pipelines, robust preprocessing and LLM adoption to advance construction safety decision-making. | [65] |
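Several studies in the table above retrieve or compare accident narratives with term-weighting and similarity measures such as TF-IDF and cosine similarity. The sketch below shows one generic way such case retrieval can work, using only the Python standard library. It is an illustrative assumption and not the implementation of any cited study; the tokenizer, the smoothed IDF variant, the function names and the example cases are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) and the IDF table for a list of texts."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF so that terms present in every document keep a non-zero weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(query, docs, top_k=3):
    """Return up to top_k (doc_index, similarity) pairs, most similar first."""
    vectors, idf = tfidf_vectors(docs)
    query_vec = {t: c * idf[t] for t, c in Counter(tokenize(query)).items() if t in idf}
    ranked = sorted(((cosine(query_vec, v), i) for i, v in enumerate(vectors)), reverse=True)
    return [(i, score) for score, i in ranked[:top_k]]


# Hypothetical historical cases, for illustration only.
cases = [
    "Worker fell from scaffold while installing formwork",
    "Excavator struck a worker near the trench",
    "Electrician received shock from exposed wiring",
]
print(retrieve("fall from scaffold", cases, top_k=2))
```

Because the query word "fall" does not literally match "fell", the match here rests on "from" and "scaffold" alone, which hints at why embedding-based methods are often preferred over purely lexical weighting.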
3.3. Chemical, Mines and Other High-Risk Industrial Environments
| Objective | Methodology | Results | Reference |
|---|---|---|---|
| Analyze coal mine accident risks using LLMs and probabilistic modeling. | Use a large language model to extract risk factors from 700 coal mine accident investigation reports; apply a priori association rule mining to derive strong association rules; build a 127-node Bayesian network and conduct sensitivity and critical path analyses. | Identify multiple layers of risk factors (direct, composite, specific) and seven primary drivers mainly related to on-site safety management, execution of operational procedures and safety supervision, providing a basis for data-driven early warning and policy design. | [32] |
| Prioritize causes and types of mine accidents using a structured decision framework. | Apply the Analytic Hierarchy Process (AHP) to accident data (2011–2020) from the Indian mining industry, treating six accident types as alternatives and three criteria (human error, environmental factors, equipment faults); use expert-based pairwise comparisons implemented in R. | Show that transport machinery accidents have the highest priority, followed by ground movement and falls; human error emerges as the dominant causal factor across accident categories, guiding targeted prevention strategies in mines. | [37] |
| Integrate heterogeneous data sources for incident likelihood analysis in process industries. | Combine natural language processing-based feature extraction from CSB loss-of-containment narratives (2002–2021) via a co-occurrence network with operational parameters; perform scenario-based model verification and sensitivity analysis. | Develop a multi-source likelihood model that improves prediction of loss-of-containment events; reveal that inadequate written procedures and management/organizational failures have the highest sensitivity, supporting Safety 4.0 monitoring and control. | [36] |
| Build a dynamic, data-driven coal mine environmental safety risk assessment system. | Construct an environmental safety indicator system and threshold rules; integrate expert judgments, sensor data and reported data; harmonize heterogeneous data via fuzzy linguistic transformation and range standardization; fuse information using FAHP, CRITIC, grey clustering (GCL–RCV) and linear weighting models. | Achieve objective, real-time environmental risk assessment in coal mines; case studies demonstrate good accuracy and responsiveness, enabling identification and control of critical risks with strong industrial application potential. | [34] |
| Develop major process accident (MA) indicators supported by Industrial Internet data. | Use process safety management software linked to Industrial Internet infrastructures to define MA indicators; employ STAMP to map logical relationships between indicators and accidents; retrospectively analyze 212 accident reports with a large language model. | Produce SMART-compliant MA indicators empirically linked to accident patterns; show that the combination of STAMP and LLM-based analysis strengthens causal interpretation and practical usability of the indicator set. | [35] |
| Perform process risk assessment and fault diagnosis from safety reports using text mining. | Propose a hybrid framework that combines accident theory and prior hazard information with finite-state rule-based chunking of incident descriptions; apply an ensemble of unsupervised and semi-supervised models (clustering, logistic regression, association rules) to identify hazardous elements, chains of events and fault trees. | Identify 56 chains of events and 13 fault trees in Indian steel plant incident reports; achieve high agreement (~85%) with HSE expert assessments, demonstrating the effectiveness of chunking-based text mining for fault detection, diagnosis and accident modeling. | [101] |
| Objectively classify occurrence types in industrial accident cases to support prevention planning. | Develop and compare three AI models based on the KoBERT natural language processing architecture; implement a pipeline including sentence preprocessing, keyword replacement and morphological analysis tailored to Korean-language accident narratives. | Show that the best-performing model achieves 93.1% accuracy and allows up to three occurrence-type labels per case, reducing subjectivity and improving data quality for industrial accident prevention policies and strategies. | [26] |
| Identify and analyze chemical safety risk factors from accident reports using modern AI. | Apply text mining and an improved LDA topic model to chemical safety accident cases to extract 33 main risk factors; use association rule mining and Bayesian network modeling to reveal correlations, causal relationships and critical accident development paths; perform sensitivity analysis of key nodes. | Demonstrate that the LDA–Bayesian network approach effectively extracts keywords, uncovers causal structures and critical paths in accident development, overcoming the subjectivity and limited scalability of traditional expert-based analyses. | [33] |
| Predict adverse events by learning from experience in the chemical industry. | NLP combined with Interpretive Structural Model (ISM) in a probabilistic approach. | Identify critical factors that contribute to fire and explosion incidents, mainly management issues and lack of procedures and training. | [46] |
| Analyze and improve the understanding of flare system failures in the oil and gas industry. | Fault Tree Analysis (FTA) and Dynamic Bayesian Network (DBN) approaches. | A comprehensive and accurate assessment of flare system reliability is provided. | [94] |
| Predict and prevent incidents in aboveground onshore oil and refined products pipelines. | Artificial Neural Network (ANN) models that predict root causes and sub-causes using 108 incident-relevant attributes. | 80–92% accuracy range in predicting incident causes and sub-causes for aboveground onshore oil and refined products pipelines. | [48] |
| Reduce occupational risks associated with confined-space work by automatically extracting and classifying contributory factors from accident reports. | BERT-BiLSTM-CRF and CNN models. | Effective quantification and frequency estimation of the factors contributing to risks associated with work in confined spaces. | [96] |
| Improve hot work accident prevention in the chemical industry through an automated system that can classify and predict the causes, overcoming the limitations of manual analysis of unstructured accident records. | AI and NLP models, such as the Latent Dirichlet Allocation (LDA) model for topic extraction and Convolutional Neural Networks (CNN) for cause prediction. | F1-score of 0.89 in predicting key causes of hot work accidents in the chemical industry. | [97] |
| Extract information from free-text chemical accident reports to enhance the prevention of occupational risks. | NLP and AI techniques combining word embeddings and bidirectional long short-term memory (BiLSTM) models with attention mechanisms. | Classification of accident causes (unsafe acts, behaviors, equipment, material conditions and management strategies) with an average precision (p) of 73.1% and recall (r) of 72.5%, together with identification of common trends, characteristics, causes and high-frequency types of chemical accidents. | [98] |
| Accident prevention in the chemical industry, using NLP to construct a knowledge graph of chemical accidents. | Named entity recognition (NER) using SoftLexicon and BERT-Transformer-CRF, with the extracted accident knowledge structured and stored in a Neo4j graph database. | Automatic extraction and categorization of risk factors from 290 Chinese chemical accident reports, outperforming previous models. | [99] |
| Enhance the early stages of quantitative risk analysis (QRA) to prevent occupational risks associated with hazardous substances. | Text mining and fine-tuned pre-trained bidirectional encoder representations from transformers (BERT) models. | Identify potential accident outcomes and rank them by severity and probability, achieving mean accuracies of 97.42%, 86.44% and 94.34%, respectively. Delivered through a user-friendly web-based app called HALO (hazard analysis based on language processing for oil refineries). | [95] |
| Detection of anomalous conditions in accidents by mining text information from accident report documents. | AI and NLP, with text mining-based Local Outlier Factor (LOF) algorithm. | Four major types of anomaly accidents in chemical processes are identified, and risk keywords are extracted and compared to provide a comprehensive view of the anomalous conditions. | [100] |
| NLP application for unsupervised anomaly detection and efficient evaluation of chemical accident risk factors. | A Variational Autoencoder (VAE) is used for unsupervised anomaly detection in industrial accident reports. Doc2Vec is utilized as the ‘Vector Space Model’. | Quantitative risk factors are extracted from narrative-based accident reports using an outlier factor (OF) function. The six most anomalous accident reports were identified. | [102] |
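Several studies in the table above apply association rule mining to risk factors extracted from accident reports. As a minimal sketch of the quantities such rules rest on, the code below computes support, confidence and lift for a single candidate rule. It does not implement the full Apriori search, and the function names, risk-factor names and reports are hypothetical, not data from any cited study.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    if not transactions:
        raise ValueError("transactions must not be empty")
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    s_a = support(transactions, a)
    s_c = support(transactions, c)
    s_ac = support(transactions, a | c)
    confidence = s_ac / s_a if s_a else 0.0
    # Lift above 1 means the consequent is more likely when the antecedent is present.
    lift = confidence / s_c if s_c else 0.0
    return s_ac, confidence, lift


# Hypothetical risk factors extracted from five accident reports.
reports = [
    {"inadequate_training", "procedure_violation", "weak_supervision"},
    {"inadequate_training", "procedure_violation"},
    {"weak_supervision", "equipment_fault"},
    {"inadequate_training", "procedure_violation", "equipment_fault"},
    {"equipment_fault"},
]
s, conf, lift = rule_metrics(reports, {"inadequate_training"}, {"procedure_violation"})
print(f"support={s:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

In a full pipeline, rules that pass minimum support and confidence thresholds would typically inform the structure of a downstream model such as a Bayesian network, as several of the listed studies describe.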
3.4. Transport System
3.5. Healthcare and Assistive Services Systems
| Objective | Methodology | Results | Reference |
|---|---|---|---|
| Analyze complex narrative reports from clinicians to prevent medical errors and enhance patient safety. | Convolutional and recurrent neural networks, coupled with an attention mechanism. NLP techniques to identify and categorize harm events in patient care narratives. | Improved medical error detection in large datasets, enhanced data analysis and root cause understanding, and better allocation of resources to address safety incidents, supporting the prevention of patient harm. | [116] |
| Explore potential applications of NLP methods in the analysis of critical incident reports in healthcare to enhance patient safety and quality of care. | Faceted search for intuitive report retrieval and text mining to uncover relationships between reported events. Mapping incident reports to the International Classification of Patient Safety (ICPS) to facilitate faceted searching and semantic annotation. | Requirements for automated processing include entity recognition, information categorization, event detection and temporal analysis. | [117] |
| Reduce musculoskeletal disorder (MSD) risks among home healthcare workers by leveraging a machine learning and large language model (LLM)-based AI system to predict long-term postures and deliver personalized ergonomic health recommendations. | Developed ERG-AI, a sustainable AI pipeline that combines multi-sensor, uncertainty-aware posture prediction with LLM-driven natural language generation to communicate individualized ergonomic insights. | Utilizing the DigitalWorker Goldicare dataset, ERG-AI demonstrated high predictive accuracy under uncertainty, low computational and environmental costs and effective generation of clear, context-specific ergonomic guidance. | [119] |
| Improve anomaly detection in electronic health records (EHRs) to enhance patient safety and data reliability. | Developed EHR-BERT, a BERT-based framework using Sequential Masked Token Prediction to learn bidirectional clinical event sequences and identify anomalies. | Outperformed existing models on large multi-domain EHR datasets, reducing false positives, improving detection accuracy and minimizing information loss. | [118] |
| Evaluate Swiss clinicians’ use, knowledge and perceptions of large language models (LLMs) and identify factors associated with their adoption. | An adoption/perception survey (not a model) distributed through Swiss medical societies, assessing the frequency of LLM use through quantitative and qualitative analysis. | 32.8% reported frequent LLM use; younger, male and research-active clinicians showed higher use and knowledge. Main benefits were administrative and analytical support, while key concerns involved ethics and output quality. | [120] |
3.6. Other Sectors
3.7. Severity, PSIF and Proactive Risk Prediction
3.8. Text Mining, Topic Modeling and Hybrid Risk-Assessment Frameworks
3.9. Vision and Vision–Language Models for PPE Compliance and Unsafe Conditions
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Zhang, F.; Fleyeh, H.; Wang, X.; Lu, M. Construction site accident analysis using text mining and natural language processing techniques. Autom. Constr. 2019, 99, 238–248.
- ILO. A Call for Safer and Healthier Working Environments; International Labour Organization: Geneva, Switzerland, 2023; Available online: https://researchrepository.ilo.org/esploro/outputs/report/995343988202676 (accessed on 25 December 2025).
- Badri, A.; Boudreau-Trudel, B.; Souissi, A.S. Occupational health and safety in the industry 4.0 era: A cause for major concern? Saf. Sci. 2018, 109, 403–411.
- Kim, Y.; Park, J.; Park, M. Creating a Culture of Prevention in Occupational Safety and Health Practice. Saf. Health Work 2016, 7, 89–96.
- Sharma, A.; Singh, B.J. Evolution of Industrial Revolutions: A Review. Int. J. Innov. Technol. Explor. Eng. 2020, 9, 66–73.
- Vinitha, K.; Ambrose Prabhu, R.; Bhaskar, R.; Hariharan, R. Review on industrial mathematics and materials at Industry 1.0 to Industry 4.0. Mater. Today Proc. 2020, 33, 3956–3960.
- Zhang, C.; Yang, J. Second Industrial Revolution. In A History of Mechanical Engineering; Zhang, C., Yang, J., Eds.; Springer: Singapore, 2020; pp. 137–195.
- Fonseca, L.M. Industry 4.0 and the digital society: Concepts, dimensions and envisioned benefits. Proc. Int. Conf. Bus. Excell. 2018, 12, 386–397.
- Roberts, B. The Third Industrial Revolution: Implications for Planning Cities and Regions; Working Paper Urban Frontiers 1; Urban Frontiers: Brisbane, Australia, 2015.
- Leesakul, N.; Oostveen, A.-M.; Eimontaite, I.; Wilson, M.L.; Hyde, R. Workplace 4.0: Exploring the Implications of Technology Adoption in Digital Manufacturing on a Sustainable Workforce. Sustainability 2022, 14, 3311.
- Milea, A.; Cioca, L.-I. Work evolution and safety and health at work in Industry 4.0/Industry 5.0. MATEC Web Conf. 2024, 389, 00074.
- Gomes-Miranda, L.; Gonçalvez, F. The impact of Industry 4.0 on occupational health and safety: A systematic literature review. J. Saf. Res. 2024, 90, 254–271.
- Miraz, M.H.; Hasan, M.T.; Sumi, F.R.; Sarkar, S.; Hossain, M.A. Industry 5.0: The Integration of Modern Technologies. In Machine Vision for Industry 4.0; CRC Press: Boca Raton, FL, USA, 2022.
- Wang, Y.; Chen, H.; Liu, B.; Yang, M.; Long, Q. A Systematic Review on the Research Progress and Evolving Trends of Occupational Health and Safety Management: A Bibliometric Analysis of Mapping Knowledge Domains. Front. Public Health 2020, 8, 81.
- Garvin, T.; Kimbleton, S. Artificial intelligence as ally in hazard analysis. Process Saf. Prog. 2021, 40, 43–49.
- Howard, J.; Schulte, P. Managing workplace AI risks and the future of work. Am. J. Ind. Med. 2024, 67, 980–993.
- Mollaei, N.; Fujao, C.; Rodrigues, J.; Cepeda, C.; Gamboa, H. Occupational health knowledge discovery based on association rules applied to workers’ body parts protection: A case study in the automotive industry. Comput. Methods Biomech. Biomed. Eng. 2023, 26, 1875–1888.
- Pishgar, M.; Issa, S.F.; Sietsema, M.; Pratap, P.; Darabi, H. REDECA: A Novel Framework to Review Artificial Intelligence and Its Applications in Occupational Safety and Health. Int. J. Environ. Res. Public Health 2021, 18, 6705. [Google Scholar] [CrossRef]
- Westhoven, M. Requirements for AI Support in Occupational Safety Risk Analysis. Proc. Mensch. Comput. 2022, 2022, 561–565. [Google Scholar] [CrossRef]
- Yimyam, W.; Ketcham, M. Occupational Disease Risk Assessment System Using Artificial Intelligence System and Chatbot. In Proceedings of the 2022 International Conference on Cybernetics and Innovations (ICCI), Ratchaburi, Thailand, 28 February–2 March 2022; pp. 1–5. [Google Scholar] [CrossRef]
- Van Gulijk, C. El Desarrollo de una Evaluación de Riesgos Dinámica y sus Implicaciones para la Salud y Seguridad en el Trabajo [The Development of Dynamic Risk Assessment and Its Implications for Occupational Safety and Health]; Agencia Europea para la Seguridad y la Salud en el Trabajo (EU-OSHA): Bizkaia, Spain, 2021. [Google Scholar]
- Baek, S.; Park, C.Y.; Jung, W. Automated safety risk management guidance enhanced by retrieval-augmented large language model. Autom. Constr. 2025, 176, 106255. [Google Scholar] [CrossRef]
- Bernardi, M.L.; Cimitile, M.; Panella, G.; Pecori, R.; Simoncelli, G. Automatic generation of job safety reports with explainable RAG-based LLMs. Inf. Syst. Front. 2025. [Google Scholar] [CrossRef]
- Kim, T.-Y.; Baek, S.-U.; Lim, M.-H.; Yun, B.; Paek, D.; Zoh, K.E.; Youn, K.; Lee, Y.K.; Kim, Y.; Kim, J.; et al. Occupation Classification Model Based on DistilKoBERT: Using the 5th and 6th Korean Working Condition Surveys. Ann. Occup. Environ. Med. 2024, 36, e19. [Google Scholar] [CrossRef]
- Li, N.; Kang, B.; De Bie, T. LLM4Jobs: Unsupervised Occupation Extraction and Standardization Leveraging Large Language Models. Knowl.-Based Syst. 2025, 316, 113302. [Google Scholar] [CrossRef]
- Song, J.-H.; Shin, S.-H.; Kang, S.-Y.; Won, J.-H.; Yoo, K.-H. Occurrence type classification for establishing prevention plans based on industrial accident cases using the KoBERT model. Appl. Sci. 2024, 14, 9450. [Google Scholar] [CrossRef]
- Khairuddin, M.Z.F.; Sankaranarayanan, S.; Hasikin, K.; Abd Razak, N.A.; Omar, R. Contextualizing Injury Severity from Occupational Accident Reports Using an Optimized Deep Learning Prediction Model. PeerJ Comput. Sci. 2024, 10, e1985. [Google Scholar] [CrossRef] [PubMed]
- Sarker, B.; Barman, A.; Garg, A.; Maiti, J. Natural Language Processing-Based Ensemble Technique to Predict Potential Accident Severity. Int. J. Syst. Assur. Eng. Manag. 2025, 16, 1975–1991. [Google Scholar] [CrossRef]
- Parikh, P.; Penfield, J.; Juaire, M. Automatic Identification of Incidents Involving Potential Serious Injuries and Fatalities (PSIF). Sci. Rep. 2024, 14, 8091. [Google Scholar] [CrossRef]
- Chen, Z.; Chen, H.; Imani, M.; Chen, R.; Imani, F. Vision Language Model for Interpretable and Fine-Grained Detection of Safety Compliance in Diverse Workplaces. Expert Syst. Appl. 2025, 265, 125769. [Google Scholar] [CrossRef]
- Liu, Q.; Li, F.; Ng, K.K.H.; Han, J.; Feng, S. Accident investigation via LLMs reasoning: HFACS-guided Chain-of-Thoughts enhance general aviation safety. Expert Syst. Appl. 2025, 269, 126422. [Google Scholar] [CrossRef]
- Du, G.; Chen, A. Coal Mine Accident Risk Analysis with Large Language Models and Bayesian Networks. Sustainability 2025, 17, 1896. [Google Scholar] [CrossRef]
- Zhou, Z.; Guo, J.; Huang, J. Chemical safety risk identification and analysis based on improved LDA topic model and Bayesian networks. Appl. Sci. 2025, 15, 6197. [Google Scholar] [CrossRef]
- Lu, C.; Li, S.; Xu, K.; Zhang, Y. Research on data-driven coal mine environmental safety risk assessment system. Saf. Sci. 2025, 183, 106727. [Google Scholar] [CrossRef]
- Ni, Z.; Wang, X.; Zhang, Z.; Wang, L. Development of major process accident indicators based on Industrial Internet. J. Loss Prev. Process Ind. 2024, 92, 105418. [Google Scholar] [CrossRef]
- Kamil, M.Z.; Khan, F.; Amyotte, P.; Ahmed, S. Multi-source heterogeneous data integration for incident likelihood analysis. Comput. Chem. Eng. 2024, 185, 108677. [Google Scholar] [CrossRef]
- Kar, M.B.; Aruna, M.; Kunar, B.M. An analytical hierarchy approach for studying the impact of human error, environmental factors, and equipment failure on mine accidents: A case study in India. Int. J. Syst. Assur. Eng. Manag. 2024, 15, 2163–2169. [Google Scholar] [CrossRef]
- Zhao, Y.; Diao, X.; Smidts, C. Preliminary Study of Automated Analysis of Nuclear Power Plant Event Reports Based on Natural Language Processing Techniques. 2018. Available online: https://www.researchgate.net/publication/329687258_Preliminary_Study_of_Automated_Analysis_of_Nuclear_Power_Plant_Event_Reports_Based_on_Natural_Language_Processing_Techniques (accessed on 25 December 2025).
- Yang, C.; Huang, C. Natural Language Processing (NLP) in Aviation Safety: Systematic Review of Research and Outlook into the Future. Aerospace 2023, 10, 600. [Google Scholar] [CrossRef]
- Kierszbaum, S.; Lapasset, L. Applying Distilled BERT for Question Answering on ASRS Reports. In Proceedings of the 2020 New Trends in Civil Aviation (NTCA), Prague, Czech Republic, 23–24 November 2020; pp. 33–38. [Google Scholar] [CrossRef]
- Ricketts, J.; Barry, D.; Guo, W.; Pelham, J. A Scoping Literature Review of Natural Language Processing Application to Safety Occurrence Reports. Safety 2023, 9, 22. [Google Scholar] [CrossRef]
- Khairuddin, M.Z.F.; Hasikin, K.; Abd Razak, N.A.; Lai, K.W.; Osman, M.Z.; Aslan, M.F.; Sabanci, K.; Azizan, M.M.; Satapathy, S.C.; Wu, X. Predicting occupational injury causal factors using text-based analytics: A systematic review. Front. Public Health 2022, 10, 984099. [Google Scholar] [CrossRef] [PubMed]
- Sarkar, S.; Maiti, J. Machine learning in occupational accident analysis: A review using science mapping approach with citation network analysis. Saf. Sci. 2020, 131, 104900. [Google Scholar] [CrossRef]
- Dalal, S.; Bassu, D. Deep analytics for workplace risk and disaster management. IBM J. Res. Dev. 2020, 64, 14:1–14:9. [Google Scholar] [CrossRef]
- Westhoven, M.; Jadid, A. Using Natural Language Processing to Generate Risk Assessment Checklists From Workplace Descriptions. In Proceedings of the 33rd European Safety and Reliability Conference, Southampton, UK, 3–8 September 2023; pp. 2636–2637. [Google Scholar] [CrossRef]
- Kamil, M.Z.; Khan, F.; Halim, S.Z.; Amyotte, P.; Ahmed, S. A methodical approach for knowledge-based fire and explosion accident likelihood analysis. Process Saf. Environ. Prot. 2023, 170, 339–355. [Google Scholar] [CrossRef]
- Zhao, Y.; Diao, X.; Huang, J.; Smidts, C. Automated Identification of Causal Relationships in Nuclear Power Plant Event Reports. Nucl. Technol. 2019, 205, 1021–1034. [Google Scholar] [CrossRef]
- Macêdo, J.B.; das Chagas Moura, M.; Aichele, D.; Lins, I.D. Identification of risk features using text mining and BERT-based models: Application to an oil refinery. Process Saf. Environ. Prot. 2022, 158, 382–399. [Google Scholar] [CrossRef]
- Liu, C.; Yang, S. Using text mining to establish knowledge graph from accident/incident reports in risk assessment. Expert Syst. Appl. 2022, 207, 117991. [Google Scholar] [CrossRef]
- Miyamoto, A.; Bendarkar, M.V.; Mavris, D.N. Natural Language Processing of Aviation Safety Reports to Identify Inefficient Operational Patterns. Aerospace 2022, 9, 450. [Google Scholar] [CrossRef]
- Dong, T.; Yang, Q.; Ebadi, N.; Luo, X.R.; Rad, P. Identifying Incident Causal Factors to Improve Aviation Transportation Safety: Proposing a Deep Learning Approach. J. Adv. Transp. 2021, 2021, e5540046. [Google Scholar] [CrossRef]
- Jiao, Y.; Dong, J.; Han, J.; Sun, H. Classification and Causes Identification of Chinese Civil Aviation Incident Reports. Appl. Sci. 2022, 12, 10765. [Google Scholar] [CrossRef]
- Kierszbaum, S.; Klein, T.; Lapasset, L. ASRS-CMFS vs. RoBERTa: Comparing Two Pre-Trained Language Models to Predict Anomalies in Aviation Occurrence Reports with a Low Volume of In-Domain Data Available. Aerospace 2022, 9, 591. [Google Scholar] [CrossRef]
- Madeira, T.; Melício, R.; Valério, D.; Santos, L. Machine Learning and Natural Language Processing for Prediction of Human Factors in Aviation Incident Reports. Aerospace 2021, 8, 47. [Google Scholar] [CrossRef]
- Rose, R.L.; Puranik, T.G.; Mavris, D.N. Natural Language Processing Based Method for Clustering and Analysis of Aviation Safety Narratives. Aerospace 2020, 7, 143. [Google Scholar] [CrossRef]
- Tanguy, L.; Tulechki, N.; Urieli, A.; Hermann, E.; Raynal, C. Natural language processing for aviation safety reports: From classification to interactive analysis. Comput. Ind. 2016, 78, 80–95. [Google Scholar] [CrossRef]
- de Vries, V. Classification of Aviation Safety Reports using Machine Learning. In Proceedings of the 2020 International Conference on Artificial Intelligence and Data Analytics for Air Transportation (AIDA-AT), Singapore, 3–4 February 2020; pp. 1–6. [Google Scholar] [CrossRef]
- Posse, C.; Matzke, B.; Anderson, C.; Brothers, A.; Matzke, M.; Ferryman, T. Extracting information from narratives: An application to aviation safety reports. In Proceedings of the 2005 IEEE Aerospace Conference, Big Sky, MT, USA, 5–12 March 2005; pp. 3678–3690. [Google Scholar] [CrossRef]
- Perboli, G.; Gajetti, M.; Fedorov, S.; Giudice, S.L. Natural Language Processing for the identification of Human factors in aviation accidents causes: An application to the SHEL methodology. Expert Syst. Appl. 2021, 186, 115694. [Google Scholar] [CrossRef]
- Ahadh, A.; Binish, G.V.; Srinivasan, R. Text mining of accident reports using semi-supervised keyword extraction and topic modeling. Process Saf. Environ. Prot. 2021, 155, 455–465. [Google Scholar] [CrossRef]
- Luo, Y.; Shi, H. Using lda2vec Topic Modeling to Identify Latent Topics in Aviation Safety Reports. In Proceedings of the 2019 IEEE/ACIS 18th International Conference on Computer and Information Science (ICIS), Beijing, China, 17–19 June 2019; pp. 518–523. [Google Scholar] [CrossRef]
- Kuhn, K.D. Using structural topic modeling to identify latent topics and trends in aviation incident reports. Transp. Res. Part C Emerg. Technol. 2018, 87, 105–122. [Google Scholar] [CrossRef]
- Robinson, S.D. Visual representation of safety narratives. Saf. Sci. 2016, 88, 123–128. [Google Scholar] [CrossRef]
- Marev, K.; Georgiev, K. Automated Aviation Occurrences Categorization. In Proceedings of the 2019 International Conference on Military Technologies (ICMT), Brno, Czech Republic, 30–31 May 2019; pp. 1–5. [Google Scholar] [CrossRef]
- Shayboun, M.; Kifokeris, D.; Koch, C. A review of machine learning for analysing accident reports in the construction industry. J. Inf. Technol. Constr. (ITcon) 2025, 30, 439–460. [Google Scholar] [CrossRef]
- Fan, H.; Li, H. Retrieving similar cases for alternative dispute resolution in construction accidents using text mining techniques. Autom. Constr. 2013, 34, 85–91. [Google Scholar] [CrossRef]
- Tian, D.; Li, M.; Shen, Y.; Han, S. Intelligent mining of safety hazard information from construction documents using semantic similarity and information entropy. Eng. Appl. Artif. Intell. 2023, 119, 105742. [Google Scholar] [CrossRef]
- Wang, G.; Liu, M.; Cao, D.; Tan, D. Identifying high-frequency–low-severity construction safety risks: An empirical study based on official supervision reports in Shanghai. Eng. Constr. Archit. Manag. 2021, 29, 940–960. [Google Scholar] [CrossRef]
- Baker, H.; Hallowell, M.R.; Tixier, A.J.-P. Automatically learning construction injury precursors from text. Autom. Constr. 2020, 118, 103145. [Google Scholar] [CrossRef]
- Fang, W.; Luo, H.; Xu, S.; Love, P.E.D.; Lu, Z.; Ye, C. Automated text classification of near-misses from safety reports: An improved deep learning approach. Adv. Eng. Inform. 2020, 44, 101060. [Google Scholar] [CrossRef]
- Gadekar, H.; Bugalia, N. Automatic classification of construction safety reports using semi-supervised YAKE-Guided LDA approach. Adv. Eng. Inform. 2023, 56, 101929. [Google Scholar] [CrossRef]
- Cheng, M.-Y.; Kusoemo, D.; Gosno, R.A. Text mining-based construction site accident classification using hybrid supervised machine learning. Autom. Constr. 2020, 118, 103265. [Google Scholar] [CrossRef]
- Xu, N.; Ma, L.; Liu, Q.; Wang, L.; Deng, Y. An improved text mining approach to extract safety risk factors from construction accident reports. Saf. Sci. 2021, 138, 105216. [Google Scholar] [CrossRef]
- Thompson, P.; Yates, T.; Inan, E.; Ananiadou, S. Semantic Annotation for Improved Safety in Construction Work. In Proceedings of the Twelfth Language Resources and Evaluation Conference; Calzolari, N., Béchet, F., Blache, P., Choukri, K., Cieri, C., Declerck, T., Goggi, S., Isahara, H., Maegaard, B., Mariani, J., et al., Eds.; European Language Resources Association: Paris, France, 2020; pp. 1990–1999. Available online: https://aclanthology.org/2020.lrec-1.245 (accessed on 25 December 2025).
- Shen, Q.; Wu, S.; Deng, Y.; Deng, H.; Cheng, J.C.P. BIM-Based Dynamic Construction Safety Rule Checking Using Ontology and Natural Language Processing. Buildings 2022, 12, 564. [Google Scholar] [CrossRef]
- Ben Abbes, S.; Temal, L.; Arbod, G.; Lanteri-Minet, P.-L.; Calvez, P. Combining Ontology and Natural Language Processing Methods for Prevention of Falls from Height. In Knowledge Graphs and Semantic Web; Villazón-Terrazas, B., Ortiz-Rodriguez, F., Tiwari, S., Sicilia, M.-A., Martín-Moncunill, D., Eds.; Springer International Publishing: Cham, Switzerland, 2022; pp. 47–61. [Google Scholar] [CrossRef]
- Zhong, B.; Shen, L.; Pan, X.; Lei, L. Visual attention framework for identifying semantic information from construction monitoring video. Saf. Sci. 2023, 163, 106122. [Google Scholar] [CrossRef]
- Smetana, M.; Sukharev, I.; Salles de Salles, L.; Khazanovich, L. Leveraging State-of-the-Art Large Language Models for Accident Analysis in the Highway Construction Industry. In Proceedings of the 13th International Conference on Construction Processes (ICCP), Online, 29 August 2024; pp. 810–831. Available online: https://iccp-portal.com/index.php/proceedings/article/view/134 (accessed on 25 December 2025).
- Smetana, M.; Salles de Salles, L.; Sukharev, I.; Khazanovich, L. Highway construction safety analysis using large language models. Appl. Sci. 2024, 14, 1352. [Google Scholar] [CrossRef]
- Yoo, B.; Kim, J.; Park, S.; Ahn, C.R.; Oh, T. Harnessing generative pre-trained transformers for construction accident prediction with saliency visualization. Appl. Sci. 2024, 14, 664. [Google Scholar] [CrossRef]
- Zhang, L.; Hou, Y.; Ren, F. AIR Agent—A GPT-based subway construction accident investigation report analysis chatbot. Buildings 2025, 15, 527. [Google Scholar] [CrossRef]
- Uhm, M.; Kim, J.; Ahn, S.; Jeong, H.; Kim, H. Effectiveness of retrieval augmented generation based large language models for generating construction safety information. Autom. Constr. 2024, 170, 105926. [Google Scholar] [CrossRef]
- Elizabeth, R.M.C.; Sattari, F.; Lefsrud, L.; Gue, B. Visualizing what’s missing: Using deep learning and Bow Tie diagrams to identify and visualize missing leading indicators in industrial construction. J. Saf. Res. 2025, 93, 1–11. [Google Scholar] [CrossRef]
- Naderi, H.; Shojaei, A. Large-language model (LLM)–Powered system for situated and game-based construction safety training. Expert Syst. Appl. 2025, 283, 127887. [Google Scholar] [CrossRef]
- Liu, Y.; Wang, J.; Tang, S.; Zhang, J.; Wan, J. Integrating information entropy and latent Dirichlet allocation models for analysis of safety accidents in the construction industry. Buildings 2023, 13, 1831. [Google Scholar] [CrossRef]
- Hwang, J.-M.; Won, J.-H.; Jeong, H.-J.; Shin, S.-H. Identifying critical factors and trends leading to fatal accidents in small-scale construction sites in Korea. Buildings 2023, 13, 2472. [Google Scholar] [CrossRef]
- Wu, K.; Zhang, J.; Huang, Y.; Wang, H.; Li, H.; Chen, H. Research on safety risk transfer in subway shield construction based on text mining and complex networks. Buildings 2023, 13, 2700. [Google Scholar] [CrossRef]
- Do, Q.; Le, T.; Le, C. Uncovering critical causes of highway work zone accidents using unsupervised machine learning and social network analysis. J. Constr. Eng. Manag. 2024, 150, 04024039. [Google Scholar] [CrossRef]
- Lee, W.; Lee, S. Development of a knowledge base for construction risk assessments using BERT and graph models. Buildings 2024, 14, 3359. [Google Scholar] [CrossRef]
- Shi, D.; Li, Z.; Zurada, J.; Manikas, A.; Guan, J.; Weichbroth, P. Ontology-based text convolution neural network (TextCNN) for prediction of construction accidents. Knowl. Inf. Syst. 2024, 66, 2651–2681. [Google Scholar] [CrossRef]
- Liu, Q.; Ding, Y.; Luo, X. Automated knowledge graph based risk assessment for fall from height accidents in construction. Autom. Constr. 2025, 158, 106482. [Google Scholar] [CrossRef]
- Suh, Y. Identifying safety technology opportunities to mitigate safety-related issues on construction sites. Buildings 2025, 15, 847. [Google Scholar] [CrossRef]
- Leutz-Schmidt, P.; Palm, V.; Mathy, R.M.; Grözinger, M.; Kauczor, H.-U.; Jang, H.; Sedaghat, S. Performance of large language models ChatGPT and Gemini on workplace management questions in radiology. Diagnostics 2025, 15, 497. [Google Scholar] [CrossRef]
- Kabir, S.; Taleb-Berrouane, M.; Papadopoulos, Y. Dynamic reliability assessment of flare systems by combining fault tree analysis and Bayesian networks. Energy Sources Part A Recovery Util. Environ. Eff. 2023, 45, 4305–4322. [Google Scholar] [CrossRef]
- Kumari, P.; Wang, Q.; Khan, F.; Kwon, J.S.-I. A unified causation prediction model for aboveground onshore oil and refined product pipeline incidents using artificial neural network. Chem. Eng. Res. Des. 2022, 187, 529–540. [Google Scholar] [CrossRef]
- Wang, B.; Zhao, J. Automatic frequency estimation of contributory factors for confined space accidents. Process Saf. Environ. Prot. 2022, 157, 193–207. [Google Scholar] [CrossRef]
- Xu, H.; Liu, Y.; Shu, C.-M.; Bai, M.; Motalifu, M.; He, Z.; Wu, S.; Zhou, P.; Li, B. Cause analysis of hot work accidents based on text mining and deep learning. J. Loss Prev. Process Ind. 2022, 76, 104747. [Google Scholar] [CrossRef]
- Jing, S.; Liu, X.; Gong, X.; Tang, Y.; Xiong, G.; Liu, S.; Xiang, S.; Bi, R. Correlation analysis and text classification of chemical accident cases based on word embedding. Process Saf. Environ. Prot. 2022, 158, 698–710. [Google Scholar] [CrossRef]
- Luo, X.; Feng, X.; Ji, X.; Dang, Y.; Zhou, L.; Bi, K.; Dai, Y. Extraction and analysis of risk factors from Chinese chemical accident reports. Chin. J. Chem. Eng. 2023, 61, 68–81. [Google Scholar] [CrossRef]
- Song, B.; Suh, Y. Narrative texts-based anomaly detection using accident report documents: The case of chemical process safety. J. Loss Prev. Process Ind. 2019, 57, 47–54. [Google Scholar] [CrossRef]
- Sahoo, S.; Mukane, P.; Maiti, J.; Tewari, V.K. A framework for process risk assessment incorporating prior hazard information in text mining models using chunking. Process Saf. Environ. Prot. 2024, 189, 486–504. [Google Scholar] [CrossRef]
- Rybak, N.; Hassall, M. Deep Learning Unsupervised Text-Based Detection of Anomalies in U.S. Chemical Safety and Hazard Investigation Board Reports. In Proceedings of the 2021 International Conference on Electrical, Computer, Communications and Mechatronics Engineering (ICECCME), Mauritius, 7–8 October 2021; pp. 1–7. [Google Scholar] [CrossRef]
- Hughes, P.; Robinson, R.; Figueres-Esteban, M.; van Gulijk, C. Extracting safety information from multi-lingual accident reports using an ontology-based approach. Saf. Sci. 2019, 118, 288–297. [Google Scholar] [CrossRef]
- Valcamonico, D.; Baraldi, P.; Amigoni, F.; Zio, E. A framework based on Natural Language Processing and Machine Learning for the classification of the severity of road accidents from reports. Proc. Inst. Mech. Eng. Part O J. Risk Reliab. 2022, 238, 957–971. [Google Scholar] [CrossRef]
- Jidkov, V.; Abielmona, R.; Teske, A.; Petriu, E. Enabling Maritime Risk Assessment Using Natural Language Processing-based Deep Learning Techniques. In Proceedings of the 2020 IEEE Symposium Series on Computational Intelligence (SSCI), Canberra, ACT, Australia, 1–4 December 2020; pp. 2469–2476. [Google Scholar] [CrossRef]
- Wang, Z.; Yin, J. Risk assessment of inland waterborne transportation using data mining. Marit. Policy Manag. 2020, 47, 633–648. [Google Scholar] [CrossRef]
- Zhang, X.; Srinivasan, P.; Mahadevan, S. Sequential deep learning from NTSB reports for aviation safety prognosis. Saf. Sci. 2021, 142, 105390. [Google Scholar] [CrossRef]
- Ricketts, J.; Pelham, J.; Barry, D.; Guo, W. An NLP framework for extracting causes, consequences, and hazards from occurrence reports to validate a HAZOP study. In Proceedings of the 2022 IEEE/AIAA 41st Digital Avionics Systems Conference (DASC), Portsmouth, VA, USA, 18–22 September 2022; pp. 1–8. [Google Scholar] [CrossRef]
- Hughes, P.; Shipp, D.; Figueres-Esteban, M.; van Gulijk, C. From free-text to structured safety management: Introduction of a semi-automated classification method of railway hazard reports to elements on a bow-tie diagram. Saf. Sci. 2018, 110, 11–19. [Google Scholar] [CrossRef]
- Figueres-Esteban, M.; Hughes, P.; van Gulijk, C. Visual analytics for text-based railway incident reports. Saf. Sci. 2016, 89, 72–76. [Google Scholar] [CrossRef]
- Wu, H.; Zhong, B.; Medjdoub, B.; Xing, X.; Jiao, L. An Ontological Metro Accident Case Retrieval Using CBR and NLP. Appl. Sci. 2020, 10, 5298. [Google Scholar] [CrossRef]
- Heidarysafa, M.; Kowsari, K.; Barnes, L.; Brown, D. Analysis of Railway Accidents’ Narratives Using Deep Learning. In Proceedings of the 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), Orlando, FL, USA, 17–20 December 2018; pp. 1446–1453. [Google Scholar] [CrossRef]
- Ebrahimi, H.; Sattari, F.; Lefsrud, L.; Macciotta, R. A machine learning and data analytics approach for predicting evacuation and identifying contributing factors during hazardous materials incidents on railways. Saf. Sci. 2023, 164, 106180. [Google Scholar] [CrossRef]
- Hua, L.; Zheng, W.; Gao, S. Extraction and Analysis of Risk Factors from Chinese Railway Accident Reports. In Proceedings of the 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand, 27–30 October 2019; pp. 869–874. [Google Scholar] [CrossRef]
- Kim, M.S. Topics model of overwork-related deaths in Korea and the implications of SDGs’ decent work perspective. Saf. Sci. 2023, 166, 106239. [Google Scholar] [CrossRef]
- Cohan, A.; Fong, A.; Ratwani, R.M.; Goharian, N. Identifying Harm Events in Clinical Care through Medical Narratives. In Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, Boston, MA, USA, 20–23 August 2017; pp. 52–59. [Google Scholar] [CrossRef]
- Denecke, K. Automatic Analysis of Critical Incident Reports: Requirements and Use Cases. Stud. Health Technol. Inform. 2016, 223, 85–92. [Google Scholar]
- Niu, H.; Omitaomu, O.A.; Langston, M.A.; Olama, M.; Ozmen, O.; Klasky, H.B.; Laurio, A.; Ward, M.; Nebeker, J. EHR-BERT: A BERT-based model for effective anomaly detection in electronic health records. J. Biomed. Inform. 2024, 174, 104605. [Google Scholar] [CrossRef]
- Sen, S.; Gonzalez, V.; Husom, E.J.; Tverdal, S.; Tokas, S.; Tjøsvoll, S.O. ERG-AI: Enhancing occupational ergonomics with uncertainty-aware ML and LLM feedback. Appl. Intell. 2024, 54, 12128–12155. [Google Scholar] [CrossRef]
- Egli, S.B.; Apargaus, A.; Amacher, S.A.; Hunziker, S.; Bassti, S. Use, knowledge and perception of large language models in clinical practice: A cross-sectional mixed-methods survey among clinicians in Switzerland. BMJ Health Care Inform. 2025, 32, e101470. [Google Scholar] [CrossRef]
- Ganguli, R.; Miller, P.; Pothina, R. Effectiveness of Natural Language Processing Based Machine Learning in Analyzing Incident Narratives at a Mine. Minerals 2021, 11, 776. [Google Scholar] [CrossRef]
- Shekhar, H.; Agarwal, S. Automated Analysis through Natural Language Processing of DGMS Fatality Reports on Indian Non-Coal Mines. In Proceedings of the 2021 5th International Conference on Information Systems and Computer Networks (ISCON), Mathura, India, 22–23 October 2021; pp. 1–6. [Google Scholar] [CrossRef]
- Qiu, Z.; Liu, Q.; Li, X.; Zhang, J.; Zhang, Y. Construction and analysis of a coal mine accident causation network based on text mining. Process Saf. Environ. Prot. 2021, 153, 320–328. [Google Scholar] [CrossRef]

| Application | Model | Domain/Dataset | Advantages | Limitations | Years of Review | Reference |
|---|---|---|---|---|---|---|
| Aviation safety | NLP | Analysis of aviation incident/accident reports and air traffic control communications | 1. Enhance situational awareness 2. Reduce workload 3. Improve decision-making capabilities | 1. Ambiguity in language interpretation 2. Scarcity of adequate training data 3. Lack of multilingual support | 2010–2022 | [39] |
| Aviation safety | BERT | Aviation Safety Reporting System dataset | 1. About 70% accuracy in correctly answering the posed question 2. Uncovers information not present in the dataset | 1. More questions are necessary to improve the model 2. Transparency of the model | 2011–2019 | [40] |
| Safety-critical industries | NLP | Safety occurrence reports | 1. Automatically classifies occurrence reports 2. Extracts critical information 3. Allows semantic searches | 1. Limited availability of occurrence reporting databases 2. Data privacy restrictions | 2012–2022 | [41] |
| Occupational injury | NLP | Narratives from occupational injury reports | 1. Classify accident types 2. Identify causal factors 3. Predict occupational injuries | 1. Low quality and quantity of data 2. Unbalanced data distribution 3. Inconsistent terminologies | 2016–2021 | [42] |
| Occupational injury | ML | Occupational accident analysis | 1. Prediction of incident outcomes 2. Extraction of rule-based patterns 3. Prediction of injury risk 4. Prediction of injury severity | 1. Review focused on citation network analysis, with no critical comments on limitations | 1995–2019 | [43] |

| Objective | Methodology | Results | Reference |
|---|---|---|---|
| Categorize and visualize the textual narratives of safety incident reports from the Aviation Safety Reporting System (ASRS) | NLP and clustering techniques: k-means clustering and t-distributed Stochastic Neighbor Embedding (t-SNE) | Seven major categories and 23 sub-clusters of flight delay causes were identified, revealing that maintenance issues, rather than weather conditions, are the main contributors to delays | [50] |
| Analysis of voluminous aviation incident reports to prevent occupational hazards | NLP techniques: Universal Language Model Fine-Tuning (ULMFiT) and Averaged Stochastic Gradient Descent Weight-Dropped LSTM (AWD-LSTM) for unsupervised language modeling and text classification. Deep recurrent neural networks and attention-based Long Short-Term Memory (LSTM) models | High accuracy in predicting multiple primary factors, providing a better understanding of incident factors, but limited to the six most common incident categories, with rarer categories not addressed due to insufficient data | [51] |
| Classify and extract risk factors from Chinese civil aviation incident reports, which are traditionally underutilized due to their incoherence, large volume, and poor structure | Machine learning: Extreme Gradient Boosting (XGBoost) classifier, combined with Occurrence Position (OC-POS) vectorization strategy | Identification of incident causes from 25 empirically determined factors covering equipment, human, environmental and organizational domains | [52] |
| Comparison of two language models in aviation safety: the domain-specific pre-trained ASRS-CMFS and RoBERTa, which lacks domain-specific pre-training | Natural Language Understanding (NLU) and fine-tuning | Despite its larger size, RoBERTa does not outperform ASRS-CMFS, which is more computationally efficient. This highlights the advantage of pre-training compact models in scenarios where domain-specific data are limited | [53] |
| Prediction of human factors in aviation safety incidents, identification and classification of human factor categories in aviation incident reports | NLP for feature extraction, coupled with semi-supervised Label Spreading (LS) and supervised Support Vector Machine (SVM) techniques for data modeling. Use of TF-IDF models as an alternative to Doc2Vec (D2V), and Bayesian optimization to find near-optimal hyper-parameter combinations | The semi-supervised LS algorithm is particularly suitable for classification with fewer labels, while the supervised SVM is more reliable for larger and more uniformly labeled datasets | [54] |
| To enhance flight safety by analyzing aviation safety reports | NLP with preprocessing routines, in particular TF-IDF text representation model for document classification. Categorization and visualization of narratives through k-means clustering and t-distributed Stochastic Neighbor Embedding (t-SNE) and post-processing through metadata-based statistical analysis | Robust and repeatable framework for identifying class categories in aviation safety event narratives, capable of identifying 31 class categories for ASRS event narratives | [55] |
| Management and analysis of aviation incident reports | Advanced NLP and text mining techniques, including algorithm design for active learning approaches, document content similarity methods and topic modelling using TreeTagger and the Gensim library | A range of developed tools to improve access to and analysis of aviation safety data | [56] |
| Overcome the difficulties of manually reviewing over 45,000 aviation reports | Automatic text classification. Random forest algorithm for ICAO Occurrence Category | Text classification with an accuracy range of 80–93% | [57] |
| Prevention of occupational hazards in aviation safety by efficiently extracting critical information from complex narratives | Common pattern specification language and normalized template expression matching in context | Overcome previous issues in these narratives, handle variants of multi-word expressions and improve accuracy | [58] |
| Automated identification of human factors in aviation accidents | NLP techniques, Semantic Text Similarity approaches, Distributional Semantic theory, Vector Space Model (VSM), and document embeddings, integrated with the Software Hardware Environment Liveware (SHEL) accident causality model | Precision rate exceeding 86% and 30% reduction in time and cost compared to conventional methods | [59] |
| Improve the analysis of accident reports by overcoming the difficulty of effectively analyzing unstructured information | Automated, semi-supervised, domain-independent approach | The approach uses user-defined classification topics and domain-specific literature, such as handbooks and glossaries, to autonomously identify and categorize domain-specific keywords, with an average classification accuracy of 80%, rivalling traditional supervised learning methods | [60] |
| Address the reliance on manually labeled datasets in traditional classification modelling of aviation safety reports, which has proven inadequate | Latent Dirichlet Allocation (LDA) topic modeling to cluster aviation safety reports into meaningful sets for subsequent analysis | Considerable reduction in dependence on aviation experts and improvements in flexibility and efficiency | [61] |
| Delve into the vast repository of over a million confidential aviation safety incident reports within the Aviation Safety Reporting System (ASRS) to uncover latent structures and hidden trends | NLP and structural topic modeling, demonstrating flexibility and reduced dependence on subject matter experts | Uncover previously unreported issues, such as fuel pump, tank and landing gear problems, while underscoring the relative insignificance of smoke and fire incidents in private aircraft safety | [62] |
| Visualization of safety narratives to prevent occupational risks through the integration of NLP techniques | Latent semantic analysis (LSA) to uncover latent relationships and interpret meaning within safety narratives, followed by isometric mapping to project this information | Primary safety problems at the different phases of flight were revealed | [63] |
| Classification of aviation safety reports to avoid the time-consuming and resource-intensive process of manually categorizing and classifying narratives | NLP models with ULM-FiT procedures | Outperformed alternative models, increasing the F1 score from 0.484 to 0.663 | [64] |
| Evaluate the ability of LLMs to infer human errors in general aviation accidents and enhance their reasoning capabilities | Development of two specialized prompts (HFACS-CoT and HFACS-CoT+) and integration of knowledge into the HFACS 8.0 domain | Creation of a new General Aviation Accident Dataset (GAHFACS) and benchmarking using GPT-4o | [31] |
| Design and evaluate an explainable RAG-based LLM framework that can automatically generate accurate and interpretable occupational safety reports from unstructured accident records | Integration of BERT/SciBERT embeddings into a RAG pipeline; evaluation of several LLMs; use of the ASRS aviation dataset; application of quantitative metrics; and layer-wise relevance propagation (LRP) analysis | High-quality reports were generated with F1-scores of up to 0.909 and robust GLEU/METEOR performance. Domain-specific SciBERT embeddings consistently outperformed general-purpose ones | [23] |
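
Several of the studies listed above represent report narratives with TF-IDF weights before classifying or clustering them [54,55]. The following is a minimal sketch of that representation in pure Python, not a reproduction of any cited pipeline. The three narratives are invented for illustration, the smoothed IDF formula is one common variant, and the names `tokenize`, `tfidf_vectors` and `cosine` are hypothetical helpers.

```python
import math
import re
from collections import Counter

# Toy narratives, invented for illustration (not taken from ASRS).
NARRATIVES = [
    "fuel pump failure during climb, crew returned to airport",
    "fuel pump pressure low, engine failure on climb",
    "landing gear did not extend on approach",
]


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (tf times smoothed idf)."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


vectors = tfidf_vectors(NARRATIVES)
sim_fuel = cosine(vectors[0], vectors[1])  # two fuel-pump narratives
sim_gear = cosine(vectors[0], vectors[2])  # fuel pump vs. landing gear
print(f"fuel/fuel similarity: {sim_fuel:.3f}")
print(f"fuel/gear similarity: {sim_gear:.3f}")
```

In this toy example the two fuel-pump narratives share several terms, whereas the landing-gear narrative shares none with the first, so its similarity is zero. In practice, vectors of this kind would be passed to a classifier or clustering step such as those reported in the table.
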

| Objective | Methodology | Results | Reference |
|---|---|---|---|
| Enhance occupational risk prevention in the transport system through the application of NLP and AI. | Text cleansing, tokenizing, tagging and clustering, followed by analysis through NLP and a graph database to facilitate the querying of incident reports. | A true positive rate of 98.5% on a dataset of 5065 incident reports from the Swiss Federal Office of Transport, written in German, French, or Italian. | [103] |
| Overcome the limitations of expert interpretation of accident reports in road safety analysis, which stem from the voluminous nature of textual reports and the subjectivity of expert judgments. | NLP with textual report representations based on Hierarchical Dirichlet Processes (HDPs) and Doc2vec, and ML-based classification by means of Artificial Neural Networks (ANNs), Decision Trees (DTs) and Random Forests (RFs), applied to a repository of road accident reports from the US National Highway Traffic Safety Administration. | Accurate automatic extraction of the critical factors influencing road accident severity from accident reports. | [104] |
| Development of a robust AI-based system capable of analyzing, categorizing and extracting relevant information from unstructured maritime data sources to assist in the prediction and prevention of maritime incidents. | DL and NLP are used to identify, classify and extract relevant maritime incident reports. NLP techniques include the bag-of-words approach, Named Entity Recognition (NER) and advanced word embeddings like Word2Vec, FastText and BERT. ML models include convolutional neural networks (CNNs), artificial neural networks (ANNs) and long short-term memory (LSTM) networks optimized using Keras Tuner for hyperparameter tuning. | Accuracy up to 98.6% for binary incident classification. Incident date extraction achieved 61.8% accuracy. | [105] |
| Assess and identify key risk factors in maritime accidents through text mining applied to accident reports. | Text mining and association rule mining using the FP-Growth algorithm. | The main problems related to maritime accidents were unveiled, including overloading, poor navigational visibility, inadequate sailor competence and insufficient government supervision of shipowners and shipping companies. Practical recommendations were made to government and regulatory bodies | [106] |
| Predict traffic accidents by learning from textual data describing event sequences. | Data labeling from the National Transportation Safety Board (NTSB) accident investigation reports and Long Short-Term Memory (LSTM) neural networks to predict adverse events. | Prototype query interface to predict and analyze traffic accidents from accident investigation reports. | [107] |
| Automatic extraction of hazards, causes and consequences from free-text occurrence reports to validate and refine safety measures for aircraft subsystems. | NLP framework with rule-based phrase matching, combined with a spaCy Named Entity Recognition (NER) model. | Improved hazard identification system capable of reducing manual intervention to accurately determine causes, consequences and hazards in HAZOP studies of aircraft transport systems. | [108] |
| Extraction of safety-related information from a large number of close call records in the GB railway industry, previously unfeasible for human analysis due to their sheer volume. | NLP analysis of free-text hazard reports and its application to accident causation models, with categorization based on specific tokens. | Semi-automated technique for classifying close call reports in the GB railway industry. | [109] |
| Extracting safety information from GB railways’ Close Call System records, which accumulate over 150,000 text-based archives that are unmanageable using traditional methods. | Visual text analysis techniques applied to GB railways’ Close Call System records. | The evaluation used 150 datasets covering incidents such as trespassing, slip/trip hazards and level-crossing issues. The method worked well with small, controlled groups of data but not with larger datasets in which different groups of people describe events in many different ways. | [110] |
| Enhance the efficiency and accuracy of decision making in metro accident response. | NLP techniques to automate the annotation of accident cases to facilitate information retrieval, and Case-Based Reasoning (CBR) and Rule-Based Reasoning (RBR) to efficiently determine the most appropriate actions based on existing regulations and emergency plans. | Average accuracy of 91%. | [111] |
| Apply NLP to occupational risk prevention by helping to avoid railroad accidents in the United States. | NLP with advanced word embeddings like Word2Vec and GloVe. | Precise classification of accident causes from report narratives, with classification accuracy improving as the number of reports analyzed increases. | [112] |
| Predicting the need for evacuation following railway incidents involving hazardous materials (hazmat). | NLP and co-occurrence network analysis to scrutinize railway incident descriptions and supervised machine learning models, mainly Random Forest (RF), to evaluate the impact of different variables on evacuation prediction. | Elucidation of causal relationships through detailed network mapping of causes and contributing factors to emergencies in hazardous materials (hazmat) railway incidents. | [113] |
| Analyze Chinese railway accident reports to better prevent future accidents. | NLP and text mining techniques, specifically a multichannel convolutional neural network (M-CNN) and a conditional random field (CRF) model, are used to extract critical accident risk factors from text data. | Efficient extraction and summarization of risk factors. | [114] |
| Improvement of occupational risk prevention in railway safety. | Hidden Markov model, conditional random field (CRF) algorithm, bidirectional long short-term memory (Bi-LSTM) and Bi-LSTM-CRF deep learning network for named entity recognition of the reports. Random forest (RF) algorithm to standardize entity classification. Knowledge graph (KG) for railway hazard identification and risk assessment with a visual representation of the relationships between hazards, incidents and accidents in the railway system. | Visualization and quantification of potential risk factors are needed to provide more effective risk prevention measures for railways. | [49] |
| Identify the main issues related to deaths caused by overwork in Korea. | Data from the Big Kinds database modeled with the NetMiner 4 program, which is used primarily for text network analysis. | Postal workers, civil servants and delivery drivers are at risk of dying from overwork. | [115] |
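
The token-based categorization of close call reports mentioned above [109] can be illustrated with a small rule-based sketch. It is only an assumption about how such matching might look, not the method of the cited study. The category names follow the incident types mentioned for [110], while the keyword sets, the example reports and the function name `categorize` are hypothetical.

```python
import re

# Hypothetical keyword sets per category, chosen for illustration only.
CATEGORY_TOKENS = {
    "trespass": {"trespass", "trespasser", "trespassing"},
    "slip_trip": {"slip", "slipped", "slippery", "trip", "tripped"},
    "level_crossing": {"crossing", "barrier"},
}


def categorize(report):
    """Return the sorted categories whose keywords occur in the report."""
    tokens = set(re.findall(r"[a-z]+", report.lower()))
    matches = sorted(
        category
        for category, keywords in CATEGORY_TOKENS.items()
        if tokens & keywords
    )
    # Reports that match no keyword are left for manual review.
    return matches or ["unclassified"]


# Invented example reports.
print(categorize("Member of public seen trespassing near the crossing barrier"))
print(categorize("Worker slipped on wet platform"))
print(categorize("Signal lamp appears dim"))
```

A report can receive more than one category, and reports without any matching token fall back to `unclassified`, which mirrors the semi-automated character of the approach: the rules handle the clear cases and the remainder goes to a human analyst.
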

| Objective | Methodology | Results | Reference |
|---|---|---|---|
| Enhance the safety and operation of nuclear power plants by automatically analyzing event reports, using NLP to efficiently extract and identify causal relationships. | The rule-based expert system, named Causal Relationship Identification (CaRI), has been augmented with a curated set of 11 keywords and 184 rules to identify causal relationships. | The CaRI system successfully captured 86% of the causal relationships within the test data, improving on manual procedures, which are inefficient given the immense volume and unstructured nature of these reports. | [47] |
| Automated analysis of event reports from the nuclear power generation sector, specifically focusing on the US Nuclear Regulatory Commission Licensee Event Report database. | Manual keyword identification, followed by automated analysis and identification of causal relationships using Stanford CoreNLP. | 85% success rate in identifying causal relationships. | [38] |
| Automate the analysis of Mine Health and Safety Management Systems (HSMS) data. | NLP and ML methods, with 9 Random Forest (RF) models developed to classify narratives from the Mine Safety and Health Administration (MSHA) database into nine different accident types | Models dedicated to individual categories outperformed those designed for multiple categories. 96% successful automated classification, as confirmed through manual evaluation. | [121] |
| Prevention of fatal and non-fatal injuries through the automated analysis of Directorate General Mines Safety (DGMS) fatality reports for non-coal mines in India. | Data acquisition from annual reports, followed by text mining (TM) and NLP with Python libraries (Pandas, NumPy and scikit-learn) to format the data and regular expressions (RegEx) to detect patterns. Tokenization was then performed with the spaCy library and part-of-speech (POS) tagging with Python’s NLTK library. Finally, Matplotlib and Seaborn, along with Tableau, were used for data analysis and visualization. | The most common accidents involve falling objects impacting workers aged between 28 and 32, specifically the ‘mazdoor’ (laborer) class. Most accidents occur between 10 AM and 2 PM. | [122] |
| Automatic identification and quantification of the contributing factors in coal mine accidents, overcoming the limitations of human analysis methods. | Text mining, association rule extraction and network theory. Text mining to extract key accident causes, reduce dimensionality and classify factors within the risk model. Apriori algorithm to identify associations between causes, revealing core causes and critical causal pathways. | Fifty-two root causes were identified and categorized. | [123] |
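
The association rule step reported for coal mine accidents [123] can be sketched with a small level-wise Apriori search. This is an illustrative sketch under stated assumptions, not the cited implementation: the accident records and cause labels are invented, the thresholds of 0.6 (support) and 0.9 (confidence) are arbitrary, and the function names `support`, `apriori` and `rules` are hypothetical.

```python
from itertools import combinations

# Toy records, invented for illustration: causes coded for each accident.
ACCIDENTS = [
    {"inadequate_training", "poor_ventilation", "gas_accumulation"},
    {"inadequate_training", "rule_violation"},
    {"poor_ventilation", "gas_accumulation"},
    {"poor_ventilation", "gas_accumulation", "rule_violation"},
    {"inadequate_training", "poor_ventilation", "gas_accumulation"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def apriori(transactions, min_support):
    """Level-wise search for frequent itemsets; returns {itemset: support}."""
    items = sorted({item for t in transactions for item in t})
    level = [
        frozenset([item])
        for item in items
        if support(frozenset([item]), transactions) >= min_support
    ]
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset, transactions)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        level = [
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
            and support(c, transactions) >= min_support
        ]
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for each strong rule."""
    found = []
    for itemset, supp in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                a = frozenset(antecedent)
                confidence = supp / frequent[a]
                if confidence >= min_confidence:
                    found.append((a, itemset - a, confidence))
    return found


frequent = apriori(ACCIDENTS, min_support=0.6)
found_rules = rules(frequent, min_confidence=0.9)
for antecedent, consequent, confidence in found_rules:
    print(sorted(antecedent), "->", sorted(consequent), f"conf={confidence:.2f}")
```

With these toy records, poor ventilation and gas accumulation always occur together, so the two rules linking them reach a confidence of 1.0. Chaining such rules across many real reports is what allows core causes and causal pathways to be mapped as a network.
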

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Orviz-Martínez, N.; Pérez-Santín, E.; López-Sánchez, J.I. New Trends in the Use of Artificial Intelligence and Natural Language Processing for Occupational Risks Prevention. Safety 2026, 12, 7. https://doi.org/10.3390/safety12010007

