Each RQ is answered in the following subsections, starting with RQ1 on AI-driven tools and techniques for construction safety.
3.1. RQ1: What AI-Driven Tools and Techniques Are Utilized to Enhance Construction Safety?
The first RQ explores different AI techniques including ML, DL, CV, NLP, IoT, Multimodal (MM), Big Data (BD), Data Mining (DM), and Robotics. To systematically evaluate AI applications in construction safety, the selected studies are categorized based on the AI techniques applied. Each category is examined to determine its role in enhancing safety measures within the construction sector. A summary of the frequency of AI techniques utilized in construction safety research is presented in Table 2. This table outlines the most applied AI technologies, emphasizing their significance in accident prevention, real-time monitoring, and compliance enforcement.
As illustrated in Figure 4, ML and DL are the most frequently applied AI techniques in construction safety research, while CV, NLP, IoT, and robotics are expanding as important areas of innovation.
Table 3 presents the most frequently used ML models, emphasizing their roles in hazard prediction, safety compliance automation, and worker health monitoring.
ML serves a foundational role in construction safety by analyzing historical data to predict incidents and enable proactive risk management. ML techniques are widely applied to detect hazards, predict equipment failures, and monitor worker health in real time, enabling predictive maintenance and automated safety compliance. Common ML models include SVM, KNN, DT, RF, and NB, which are particularly effective for classification and prediction tasks. Additional techniques such as logistic regression, Bayesian networks, XGBoost, bagging trees, and stochastic gradient tree boosting are employed to improve predictive accuracy and manage complex datasets. Moreover, unsupervised and semi-supervised learning methods address the challenge of limited labeled data, while RL is explored for building adaptive, dynamic safety management systems.
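As an illustration of the classification workflow these models support, the sketch below implements a minimal k-nearest-neighbors classifier on hypothetical tabular incident features; the feature names, values, and labels are invented for illustration and are not drawn from any study in this review.

```python
from collections import Counter
import math

def knn_predict(train, labels, query, k=3):
    """Classify a feature vector by majority vote of its k nearest neighbors."""
    nearest = sorted(range(len(train)), key=lambda i: math.dist(train[i], query))[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical features per task: [hours worked, working height (m), noise (dB)]
train = [[8, 1.0, 70], [12, 6.0, 95], [9, 0.5, 65], [11, 8.0, 90]]
labels = ["low-risk", "high-risk", "low-risk", "high-risk"]

print(knn_predict(train, labels, [10, 7.0, 92]))  # -> high-risk
```

Libraries such as scikit-learn provide production-grade versions of these classifiers; the point here is only the shape of the task: numeric site features in, a risk label out.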
Figure 5 shows that SVM, DT, KNN, and RF are the most used ML models, reflecting their effectiveness in classification and prediction tasks. Advanced models such as XGBoost and unsupervised learning approaches are less dominant but are gaining interest.
Table 4 demonstrates the DL models most commonly applied in safety-related tasks, emphasizing their ability to detect unsafe conditions and support automated safety monitoring.
DL techniques have advanced construction safety by enabling real-time detection of hazards and unsafe conditions through complex data analysis. DL models such as CNNs are commonly used for processing visual data captured by site cameras and drones. These models recognize unsafe conditions, PPE violations, and structural anomalies. Additionally, RNNs and ANNs are utilized for analyzing sequential and time-series data related to worker activity and environmental changes. DNNs enhance the modeling of non-linear patterns, while GNNs and DBNs offer innovative approaches for understanding spatial and relational safety data. Collectively, these DL methods support the automation of safety monitoring systems and the implementation of early warning mechanisms.
Figure 6 highlights the dominance of CNNs among DL models, with RNNs and ANNs also frequently employed for hazard detection and site monitoring.
Table 5 indicates the CV models employed for visual safety assessment, emphasizing their effectiveness in real-time site surveillance, PPE detection, and behavioral analysis.
CV technologies are extensively used for visual safety inspection and monitoring on construction sites. CV-enabled systems use camera footage and drone imaging to detect PPE violations, unsafe worker behaviors, and structural risks. These systems automate the safety inspection process, reduce reliance on manual observation, and enable continuous surveillance. By analyzing visual input in real-time, CV enhances situational awareness and supports timely intervention to prevent accidents.
Table 6 explains the NLP models applied to unstructured text data, emphasizing their role in hazard identification, safety communication, and voice-based alert systems.
NLP contributes to construction safety by extracting meaningful information from unstructured text sources such as safety reports, inspection logs, and regulatory documents. NLP models help identify patterns and indicators of potential hazards. Additionally, NLP supports the development of voice-enabled assistants that provide real-time safety alerts and guidance to on-site workers, thereby improving awareness and adherence to safety protocols. LLMs such as Generative Pre-trained Transformer (GPT) 3.5 process free-text accident narratives with strong contextual understanding, enabling clustering and summarization, cause extraction, and direct classification with minimal feature engineering [
50,
109,
113]. In contrast, conventional NLP relies on Term Frequency-Inverse Document Frequency (TF-IDF), topic models, and embeddings combined with SVM, RF, or logistic regression, which depend on hand-crafted features and often miss long-range semantics [
50]. Fine-tuned GPT classifiers provide saliency explanations, and highway safety pipelines that combine embedding-based clustering with LLM summarization reveal patterns that traditional methods overlook [
26].
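To make the contrast with LLM-based approaches concrete, the sketch below computes plain TF-IDF weights over invented incident narratives; it omits the smoothing, normalization, and n-gram options used in practice.

```python
import math
from collections import Counter

def tfidf(docs):
    """TF-IDF weights for a list of tokenized documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return weights

# Invented incident narratives, tokenized naively
docs = [
    "worker fell from scaffold".split(),
    "worker struck by excavator".split(),
    "scaffold collapsed near excavator".split(),
]
w = tfidf(docs)
# "worker" occurs in two of three narratives, so it is down-weighted
# relative to the rarer, more informative term "fell"
print(w[0]["fell"] > w[0]["worker"])  # True
```

Such sparse weights typically feed an SVM or logistic regression classifier, which is precisely where the dependence on hand-crafted features noted above arises.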
IoT technologies facilitate the integration of wearable devices and environmental sensors for continuous safety monitoring. IoT wearables track worker health indicators such as heart rate, fatigue, and location, while embedded sensors detect hazardous site conditions like gas leaks, temperature anomalies, or equipment malfunctions. This network of connected devices enables real-time data transmission and immediate risk detection, contributing to safer and more responsive work environments.
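The core alerting logic of such systems reduces to threshold rules evaluated over streamed readings. A minimal sketch follows; the field names and threshold values are illustrative, not taken from any cited system.

```python
def check_vitals(reading, hr_max=120, temp_max=38.0):
    """Return alerts for a wearable reading that breaches safety thresholds."""
    alerts = []
    if reading["heart_rate"] > hr_max:
        alerts.append("elevated heart rate")
    if reading["body_temp"] > temp_max:
        alerts.append("possible heat stress")
    return alerts

# A hypothetical stream of wearable readings from two workers
stream = [
    {"worker": "W1", "heart_rate": 88,  "body_temp": 36.9},
    {"worker": "W2", "heart_rate": 131, "body_temp": 38.4},
]
for r in stream:
    for alert in check_vitals(r):
        print(f"{r['worker']}: {alert}")
```

Real deployments layer smoothing, personalized baselines, and escalation policies on top of such rules, but the real-time detection step remains this simple comparison.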
Data Analytics plays a crucial role in processing large datasets generated by IoT devices, sensors, and site operations. It enables the identification of safety risks, trend analysis, and the evaluation of safety measures. By analyzing real-time data, data analytics helps refine construction protocols, optimize resource allocation, and support informed decision-making to enhance safety performance.
A Multimodal (MM) model on a construction site is an intelligent system that integrates information from multiple sources such as images, videos, text, and sensor data to interpret and understand the dynamic conditions of the site. It combines visual perception with language understanding to analyze workers, machinery, materials, and safety conditions in real time. By linking visual cues with contextual descriptions, the model can identify tasks, detect potential hazards, and assess compliance with safety regulations. This integration enables comprehensive situational awareness, allowing automated monitoring of operations, recognition of unsafe behaviors, and generation of descriptive safety or progress reports. Through its ability to reason across different modalities, the MM model serves as a foundation for adaptive, data-driven management of construction sites, enhancing safety, efficiency, and decision-making [
117,
118,
119,
120,
121,
122,
123,
124,
125,
126,
127,
128].
AI serves as an umbrella framework that integrates various intelligent systems, including vision, learning, and sensing technologies. AI improves safety by detecting hazards, predicting risks, and facilitating compliance using data collected from cameras, wearables, and sensors. AI-driven drones and robots also minimize human exposure by performing safety-critical operations in hazardous environments.
Furthermore, DM is employed to uncover hidden patterns in historical safety records, sensor outputs, and worker behavior logs. It supports the identification of risk factors that may lead to incidents and helps organizations design more effective safety interventions. DM improves safety by enabling data-driven decision-making and enhancing the predictive capabilities of AI systems. DM comprises analytic techniques such as association rule mining, clustering, text or topic mining, sequential pattern mining, and feature engineering that convert unstructured or heterogeneous safety data (including incident narratives, sensor logs, and video annotations) into model-ready signals; AI methods then learn task-specific models (including PPE detection, proximity alerts, and risk prediction) from these signals.
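Association rule mining, one of the DM techniques just listed, scores candidate rules by support and confidence. A minimal sketch over invented incident records:

```python
def rule_stats(transactions, antecedent, consequent):
    """Support and confidence for the rule: antecedent -> consequent."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    n_a = sum(1 for t in transactions if a <= t)
    n_both = sum(1 for t in transactions if both <= t)
    support = n_both / len(transactions)
    confidence = n_both / n_a if n_a else 0.0
    return support, confidence

# Invented incident records, each a set of attributes
incidents = [
    {"night_shift", "no_harness", "fall"},
    {"night_shift", "fall"},
    {"day_shift", "no_harness", "struck_by"},
    {"night_shift", "no_harness", "fall"},
]
sup, conf = rule_stats(incidents, {"night_shift", "no_harness"}, {"fall"})
print(sup, conf)  # 0.5 1.0: every night-shift record without a harness is a fall
```

Algorithms such as Apriori or FP-Growth enumerate candidate rules efficiently over large record sets; the scoring is the same as shown here.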
In addition, BD techniques are used to manage and analyze vast quantities of safety-related information gathered from IoT networks, surveillance systems, and digital platforms. These techniques facilitate real-time risk analysis and provide construction managers with insights to prevent accidents and ensure compliance. BD contributes to the development of scalable, cloud-based safety monitoring systems that can process information from multiple sources simultaneously. In this review, BD platforms built on Hadoop and Spark are used for offline and online processing of historical and live streams to support rapid safety prediction, and AI models are trained within the same pipeline to improve accuracy and reduce latency. Accordingly, BD is reported under data infrastructure and modality (ingestion, storage, processing), and AI under techniques (model development and validation).
Lastly, Robotics is increasingly being adopted in construction safety for automating hazardous tasks such as inspection, material handling, and demolition. Robots equipped with cameras and sensors perform real-time environmental assessments, detect potential hazards, and execute safety-critical operations with high precision, reducing the need for human presence in dangerous areas, lowering injury risks, and improving operational efficiency. Robotics and aerial drones further enhance safety by removing personnel from hard-to-reach locations, accelerating inspection and progress monitoring, and supplying continuous visual and sensor data for AI-based analysis of behaviors and conditions such as PPE use, worker posture, and proximity to energized lines or unprotected edges. These systems can reduce manual inspection errors and provide broader, faster site coverage for timely safety decisions. At the same time, wider deployment introduces new risks and constraints, including technical instability in dynamic site environments, limited labeled datasets, and privacy concerns from pervasive imaging. Effective controls include operator training and certification, clear operating procedures and change management, and secure data pipelines with access controls and audit trails.
3.2. RQ2: Which AI Models Are Most Effective in Improving Safety Performance on Construction Sites?
A wide range of AI and ML models can be explored to enhance safety performance on construction sites, utilizing various data sources such as textual reports, imagery, sensor readings, physiological signals, and audio data.
Table 7 summarizes the datasets used, performance metrics, and the most effective AI architectures reported across the recent studies reviewed in this work.
One prominent trend is the application of CNNs beyond conventional image recognition tasks. For example, time series sensor data, including accelerometer, gyroscope, and barometer readings collected from construction workers, was processed using CNNs to classify physical activity with high precision, achieving an accuracy of 94.9% and an F1 score of 94.75% [
77]. This demonstrates the suitability of CNNs for behavior classification using wearable data.
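The core operation behind such models, a learned filter slid along the sensor stream, can be illustrated with a toy 1D convolution over a synthetic accelerometer trace. The kernel here is hand-set for illustration; a CNN would learn many such filters from labeled data.

```python
def conv1d(signal, kernel):
    """Valid-mode 1D convolution (cross-correlation) over one sensor channel."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

def relu_maxpool(feature_map):
    """ReLU then global max pooling, a stand-in for deeper CNN layers."""
    return max(max(v, 0.0) for v in feature_map)

# Synthetic accelerometer traces: a sharp spike mimics a fall impact
walking = [0.1, 0.2, 0.1, 0.2, 0.1, 0.2, 0.1, 0.2]
impact  = [0.1, 0.2, 0.1, 3.0, -2.5, 0.2, 0.1, 0.2]
edge_kernel = [1.0, -1.0]  # responds to abrupt changes

print(relu_maxpool(conv1d(impact, edge_kernel)))   # large response
print(relu_maxpool(conv1d(walking, edge_kernel)))  # small response
```

The impact trace produces a far larger filter response than steady walking; a trained CNN aggregates many such responses into an activity label.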
While image-based datasets dominate AI research in construction safety, several studies have successfully applied ML techniques to structured datasets such as the KALIS and OSHA SIR databases. These datasets support regression tasks like injury severity prediction, where models such as DNNs and logistic regression have demonstrated strong performance. For instance, DNNs applied to the KALIS dataset achieved a low mean absolute error (MAE = 0.043) and a high correlation coefficient of 0.9936, indicating their suitability for risk modeling in tabular safety data [
97]. Similarly, logistic regression models applied to the OSHA SIR database attained an accuracy of 93.7%, further highlighting the effectiveness of traditional ML approaches in structured-data contexts.
CV applications continue to dominate in visual compliance monitoring. YOLO-based object detection models, particularly YOLOv5, trained on a hybrid dataset comprising Pictor v3, publicly available datasets, and localized construction imagery, achieved mAP@50 values of 83.1% for PPE detection and 92% for heavy equipment detection [
63]. These findings highlight YOLO’s robustness in dynamic and complex environments. Similarly, Roboflow and MakeML datasets, among others, were used to train models such as Faster R-CNN, YOLOv5x, and YOLOv3; these models consistently achieved accuracy, precision, and recall above 90%, confirming their effectiveness in real-time site surveillance and PPE compliance [
57,
60,
66,
70].
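The mAP@50 metric used in these studies counts a predicted box as correct only when its intersection-over-union (IoU) with a ground-truth box reaches 0.5. A minimal sketch with invented box coordinates:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# A hypothetical predicted hard-hat box vs. its ground-truth annotation
pred  = (10, 10, 50, 50)
truth = (12, 14, 48, 52)
score = iou(pred, truth)
print(score >= 0.5)  # counts as a true positive at the mAP@50 threshold
```

Averaging precision over all classes and images at this IoU threshold yields the mAP@50 values the studies report.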
NLP models have also gained traction. OpenAI’s GPT-3.5, for instance, was applied to textual data from OSHA’s Severe Injury Reports, attaining an accuracy of 93.7% and an F1 score of 96.7% [
50]. This reflects the growing use of LLMs in extracting actionable insights from narrative safety records and injury logs, tasks that previously required manual review.
Beyond vision and text, other modalities have shown promise. AutoML frameworks were employed to classify accident types in imbalanced datasets, such as Chinese construction accident records, with performance improving from 83.6% to 84.4% after applying the synthetic minority oversampling technique (SMOTE) [
15]. SVMs were applied to physiological signals gathered from Empatica E4 wristbands; these effectively identified stress and fatigue patterns with an accuracy of 81.2% [
16]. In the oil and gas sector, SVMs outperformed ensemble models for injury type and severity prediction, while a stacked XGBoost and RF combination performed better for incident type and body part classification.
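SMOTE, used above to rebalance accident classes, synthesizes new minority samples by interpolating between existing ones. The toy version below interpolates between random minority pairs rather than k-nearest neighbors as the original algorithm does, and the feature values are invented.

```python
import random

def smote_like(minority, n_new, seed=42):
    """Create synthetic minority samples by interpolating between pairs."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        lam = rng.random()  # position along the segment between a and b
        synthetic.append([x + lam * (y - x) for x, y in zip(a, b)])
    return synthetic

# Hypothetical minority-class accident records (two numeric features)
minority = [[1.0, 5.0], [1.2, 4.8], [0.9, 5.3]]
new = smote_like(minority, 4)
print(len(new))  # 4 synthetic samples inside the minority region
```

Because each synthetic point lies on a segment between two real minority samples, the oversampled class stays within its original feature region rather than duplicating records outright.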
Emerging audio-based safety monitoring also illustrates the adaptability of CNNs. When applied to classify hazardous events from construction activity audio clips extracted from YouTube, CNNs achieved up to 98.52% accuracy, even in challenging acoustic backgrounds [
62]. Meanwhile, SSD MobileNet, a lightweight yet effective CV model, was trained on site-captured and web-crawled videos; it reached 95% precision and 77% recall, highlighting its potential for mobile and drone-based applications despite its lower mAP [
68].
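The precision/recall profile reported for SSD MobileNet follows directly from the standard confusion-count definitions; the counts below are chosen only to reproduce rates of roughly that magnitude and are not taken from the study.

```python
def precision_recall_f1(tp, fp, fn):
    """Detection metrics from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative counts yielding about 95% precision and 77% recall
p, r, f1 = precision_recall_f1(tp=77, fp=4, fn=23)
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.95 0.77 0.85
```

High precision with lower recall, as here, means the model rarely raises false alarms but misses some true hazards, a trade-off that matters when choosing lightweight models for drones.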
Additional studies leveraged simulated or synthetic data for hazard prediction. For instance, finite element model (FEM) simulations provided strain data used to train an SVM, which achieved a 96% classification accuracy, supporting its use in structural risk analysis [
19]. YOLOv3 with a Darknet-53 backbone performed well on a mixed COCO and custom dataset, further affirming YOLO’s versatility [
72]. Traditional ML models such as RF and stochastic gradient tree boosting were also effectively applied to legacy datasets from industries including mining and infrastructure [
9,
44]. CV-based models are best suited for unstructured visual data tasks such as detecting PPE violations and unsafe behaviors, while tabular ML models such as SVM and RF are more effective in processing structured datasets like safety inspection logs or physiological data to predict injury type and risk severity.
Overall, the findings in Table 7 indicate that CNN-based models dominate in image, audio, and sensor-driven applications, capitalizing on their superior spatial and temporal recognition capabilities. CNNs consistently outperform traditional ML models in image-based safety applications owing to stronger spatial feature extraction. For instance, while traditional models such as SVM or RF typically achieve accuracy levels in the 70–80% range on tabular or sensor-based datasets [
9,
14], CNN-based models like YOLOv5 and Faster R-CNN have demonstrated mAP@50 values exceeding 90% on complex construction images [
63,
70]. In contrast, structured numerical data is best managed using traditional ML models such as SVM, RF, and ensemble methods like gradient boosting. The successful application of GPT-3.5 illustrates the emerging value of LLMs in interpreting unstructured text for safety insights. Ultimately, model selection should be guided by the nature of the data and the safety task at hand, underscoring the importance of aligning AI architecture with specific construction site challenges to maximize predictive accuracy and operational utility.
Recent MM safety frameworks further extend this progress by enhancing generalization and interpretability across visual and textual domains. A zero-shot system integrating Florence-2, SAM-2, and GPT-4o achieved an F1 score of 82.2%, task accuracy of 79.0%, and idle-state recognition of 93.2% on the Alberta Construction Image Dataset and YouTube clips [
128]. The Clip2Safety framework, combining BLIP2-OPT-2.7B, YOLO-World, CLIP, and GPT-4o, reported an overall accuracy of 77.2% and an AUC of 0.76 across multiple datasets [
122]. Similarly, a zero-shot CLIP model applied to OSHA prevention images and hazard descriptions attained 70% accuracy in human-verified labeling [
119]. Collectively, these studies indicate a transition toward explainable, cross-modal, and context-aware AI capable of integrating visual and linguistic cues for proactive safety monitoring in dynamic construction environments.
The dominance of models such as CNNs and YOLO reflects the field’s heavy emphasis on visual data sources, including camera and drone imagery. This trend is driven by the relative ease of collecting labeled images compared to structured logs or unstructured text. It also indicates that many research efforts prioritize observable hazards (e.g., PPE violations, proximity detection) over less tangible factors like human intent or systemic risks. Meanwhile, textual data (e.g., reports, near-misses) and audio-based sensing remain underutilized, suggesting methodological gaps. These patterns reveal both a reliance on mature, well-supported CV models and a need to explore underrepresented modalities using NLP, LLMs, or MM architectures.
To support practical adoption, this paper proposes a straightforward decision-making framework that aligns AI model selection with data type and task complexity. For tasks involving visual data such as PPE detection or proximity monitoring, DL models, especially CNNs and object detection frameworks like YOLO, are most appropriate due to their superior spatial recognition. When working with structured tabular data (e.g., injury logs, sensor outputs), traditional ML models such as RF, SVM, or logistic regression are more effective and easier to interpret. For unstructured textual data (e.g., incident reports), NLP techniques like BERT or GPT-based models are suitable. Practitioners with limited computational resources may benefit from using ensemble models like XGBoost, which offer strong performance with relatively lower overhead. Ultimately, the choice of method should balance the nature of the input data, real-time processing needs, interpretability, and available expertise.
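The decision framework can be condensed into a small lookup; the mappings below merely restate the guidance above as illustrative defaults, not prescriptive rules.

```python
def recommend_model(data_type, low_resource=False):
    """Suggest a model family for a given safety data modality (illustrative)."""
    if low_resource:
        return "gradient-boosted ensemble (e.g., XGBoost)"
    return {
        "image":   "CNN / YOLO-style object detector",
        "tabular": "RF, SVM, or logistic regression",
        "text":    "BERT- or GPT-based NLP model",
        "sensor":  "1D CNN or RNN for time-series data",
    }.get(data_type, "start with an interpretable baseline")

print(recommend_model("image"))                       # vision task
print(recommend_model("tabular", low_resource=True))  # constrained hardware
```

In practice, such a lookup would be refined with further dimensions (real-time constraints, interpretability requirements, available labels), but it captures the data-type-first logic of the framework.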
Despite promising performance across tasks, the majority of AI applications in construction safety remain in the experimental or pilot phase. Moving toward real-world implementation requires addressing several infrastructural, data-related, and human-factor challenges. On the technical side, site-specific variability and lack of standardized data pipelines hinder model generalization and scalability. Data-related issues include inconsistent labeling practices, privacy constraints, and limited access to large, diverse, and high-quality multimodal datasets. From a human-centered perspective, frontline personnel may lack the training or trust to interact with AI systems effectively. Resistance can stem from concerns about job displacement, opaque decision logic, or additional cognitive burden. Overcoming these challenges will require not only technical refinement, but also organizational readiness, stakeholder engagement, and regulatory guidance tailored to the unique dynamics of construction environments.
3.3. RQ3: What Are the Key Challenges of Integrating AI Technologies into Construction Safety?
Adoption of AI technologies in the construction sector remains limited due to a variety of practical, technical, and organizational challenges. This section identifies and categorizes the key barriers impeding successful AI implementation in construction safety.
Table 8 outlines the primary challenges associated with the integration of AI technologies into construction safety, highlighting common issues reported across current literature.
A primary technical limitation concerns the quality and availability of data. Construction environments often produce incomplete, noisy, imbalanced, or unstructured datasets, which diminish the accuracy and generalizability of AI models [
9,
14,
15,
22,
47,
61]. In parallel, real-time monitoring, particularly in applications involving CV, poses computational challenges. These systems require high processing power to analyze live video feeds and are susceptible to performance degradation under adverse environmental conditions such as low light or occlusion [
18,
40,
58,
60,
65,
70,
71,
72,
75].
Integration with existing workflows remains a barrier: AI tools often misalign with established safety protocols and site practices, which can result in low adoption rates or active resistance from field personnel unfamiliar with or skeptical of digital technologies [
9,
17,
57,
67,
75,
98,
102]. This is further compounded by the lack of model interpretability, especially with DL techniques, which often function as “black boxes.” Without transparent decision-making processes, safety managers may be reluctant to trust AI-generated insights [
14,
20,
50,
66,
71,
74].
Domain adaptability is also limited; AI models trained in specific regional or environmental contexts (e.g., UAE or Hong Kong) often struggle to perform reliably in different geographic or regulatory settings [
57,
60]. Performance can decline with shifts in appearance and environment (PPE styles, illumination, humidity or dust that cause blur), differences in sensors and cameras, and jurisdiction-specific task definitions. For instance, a UAE protocol trains a YOLO-based model for working-at-height compliance and validates it across RGB, grayscale, high-brightness, dust, and blur conditions, revealing condition-sensitive performance and the need for local augmentation and fine-tuning. By contrast, a Hong Kong study uses fixed cameras for detection, tracking, and hazard status; because it depends on site layout and camera geometry, thresholds and calibration need region-specific adjustment. Ethical concerns, particularly regarding privacy, arise when AI systems involve continuous worker surveillance via wearables or site cameras. These practices introduce concerns about data governance, informed consent, and workplace monitoring ethics [
16,
58,
60,
73,
101,
108].
Another frequently cited limitation is the shortage of skilled personnel. The effective implementation and maintenance of AI solutions in construction require data scientists, software engineers, and technicians, roles that are in short supply, particularly in small and medium-sized enterprises (SMEs) [
9,
50]. Compounding this, the accuracy of AI detection systems remains an issue, with false positives and negatives leading to misplaced trust or neglect of real hazards [
60,
63,
65,
69,
71].
Furthermore, reliance on hardware systems such as wearables and sensors introduces maintenance, durability, and user compliance challenges [
16,
57,
68,
77,
98,
101]. From a behavioral modeling perspective, current AI tools are limited in their ability to assess human psychological factors such as stress, fatigue, or risk perception, elements critical to proactive safety management [
16,
40,
72,
77].
The scalability and generalizability of AI systems also remain unresolved. Models developed for one site often require substantial retraining to perform effectively elsewhere, limiting the economic feasibility of AI deployment across multiple projects [
49,
64,
66,
69,
73,
75,
76]. In addition, the burden of manual annotation and dataset preparation poses a resource bottleneck. Labeling data necessary for supervised learning algorithms is time-consuming and expensive in dynamic construction settings [
19,
20,
22,
61,
64,
74].
Finally, there is a widely acknowledged gap in AI-capable workforce availability. Many firms lack the in-house expertise needed to deploy, fine-tune, and manage AI tools, especially those operating in resource-constrained environments [
61,
64,
67,
74,
75,
109]. These workforce limitations hinder both initial AI adoption and long-term sustainability. In addition, prompt sensitivity and contextual bias remain critical barriers in multimodal AI, as model outputs often fluctuate with changes in question phrasing, visual cropping, or environmental context. Moreover, domain-specific knowledge gaps persist because general-purpose VLMs such as CLIP and Florence-2 lack construction-oriented semantics, limiting their ability to reason about complex, context-dependent hazards [
119,
122,
128].
Together, these findings illustrate the multifaceted nature of the challenges facing AI integration in construction safety and highlight the need for interdisciplinary solutions that address technical, social, and organizational dimensions simultaneously.
3.4. RQ4: What Are the Future Directions and Opportunities for AI in Construction Safety?
The construction industry is increasingly adopting AI to transform safety management through enhanced risk prediction, real-time monitoring, and data-driven decision-making. Advanced techniques such as CV, DL, and NLP are being applied to address critical challenges, including hazard identification, PPE compliance, and accident prediction. Ongoing research continues to expand the scope of AI applications, contributing to safer and more efficient construction environments.
Table 9 outlines the applications of AI in predictive analytics for construction safety, including accident forecasting, injury prediction, fall risk analysis, and safety consequence forecasting. It highlights future directions such as BIM integration, real-time dashboards, AutoML, and explainable AI, along with relevant research publications.
Table 10 presents the applications of CV in construction safety management, such as PPE compliance detection, site hazard recognition, UAV-based monitoring, and Visual Question Answering (VQA). Future enhancements include 3D detection, integration with BIM and drones, attention mechanisms, and real-time monitoring systems.
Table 11 summarizes NLP-based approaches for improving construction safety, focusing on hazard classification, incident report analysis, OSHA regulation automation, and knowledge graph creation. Future opportunities include multilingual NLP, mobile deployment, LLM-based tools (e.g., ChatGPT), and integration with BIM and XR platforms.
Table 12 details integrated and multimodal AI approaches in construction safety, including AI-augmented training, VR-based eye tracking, audio-based hazard detection, physiological monitoring with wearables, and sensor-based monitoring systems. It also covers hybrid techniques like vision-rule semantic matching and lightweight ML models for mobile PPE detection.
AI is being applied to forecast accidents using ML models trained on inspection and incident data. Techniques such as NLP, fuzzy logic, and AutoML are being integrated with BIM systems to create real-time dashboards and risk alerts. For instance, AutoML and RF models are used to predict the severity of safety outcomes, while fuzzy logic and unsupervised ML enhance excavation and fall hazard analysis [
9,
14,
15,
17,
38,
44,
47,
49,
61,
98,
112]. CV has become pivotal in monitoring on-site activities. Applications range from detecting worker-equipment interactions using convolutional and recurrent neural networks (CNNs, LSTMs) to real-time monitoring of PPE compliance via YOLO and other object detection algorithms. Future directions involve integrating these models with drone surveillance, 3D mapping, and mobile alert systems [
57,
60,
63,
66,
68,
69,
70,
71,
73,
75,
83,
84,
86,
92].
NLP is utilized for semantic classification of safety reports, near-miss incident analysis, and automated OSHA regulation compliance. Models such as BERT and TF-IDF are enhancing multilingual classification, root cause detection, and dashboard integration. Knowledge graphs generated from NLP outputs offer risk detection and are being linked to BIM/XR environments [
22,
23,
25,
52,
61,
64,
67,
74,
79,
87,
98,
111,
112]. LLMs such as ChatGPT are being explored for narrative safety data analysis and multilingual reasoning. VQA models combining transformers with AR/XR interfaces are being developed for enhanced inspection and safety training capabilities [
16,
18,
50,
62,
65,
71,
101,
103,
108,
113].
Recent MM AI frameworks in construction safety employ zero-shot and interpretable Vision-Language Models (VLMs) for hazard and PPE detection. They enable real-time, explainable, and cross-modal monitoring by integrating visual, textual, and contextual cues for proactive hazard prevention [
119,
122,
128]. AI models analyze audio signals to detect high-risk events such as collisions in noisy construction environments. Wearable sensor data and physiological monitoring systems assess fatigue and stress levels in real-time, providing adaptive safety responses. Integrated systems are merging IoT, ML, and photogrammetry to support BIM-linked hazard prediction [
16,
18,
62,
65,
72,
102]. Hybrid approaches combine rule-based reasoning with vision models to match observed site behaviors with formal safety rules. Lightweight models such as HOG and CHT are optimized for mobile applications, ensuring real-time helmet and PPE compliance monitoring on resource-constrained devices [
21,
68,
75,
78,
82,
88].
AI is increasingly supporting adaptive safety training modules tailored to individual risk levels. Virtual AI environments simulate rare or complex hazard scenarios for DL model training. Eye-tracking integrated with VR facilitates immersive training by analyzing user attention and perception of hazards [
58,
100,
104,
109,
113]. During the Construction Hazard Prevention through Design (CHPtD) phase, LLMs assist in identifying potential risks and safety clashes before construction begins. These tools support early intervention and safer design practices [
108].
Table 13 shows this study’s proposed conceptual framework that integrates artificial intelligence domains with construction safety workflows. It illustrates how different AI techniques support each stage of the safety process, from data acquisition and hazard detection to risk prediction, information extraction, MM reasoning, and continuous feedback. The table highlights representative models, datasets, and applications that demonstrate how AI contributes to real-time monitoring, contextual awareness, and adaptive learning within construction safety management.
To enhance trust and adoption among safety personnel, future research should prioritize the development of explainable and human-centered AI systems. This includes incorporating interpretable outputs, visual explanations, and customizable dashboards that align with how decisions are made on-site. In parallel, the creation of user-friendly GUI-based tools that embed ensemble and DL models can empower practitioners without technical backgrounds to interact with AI systems more intuitively. Together, these efforts can bridge the gap between complex model outputs and practical, actionable insights for improving construction safety.