1. Introduction
Effectively identifying unsafe factors in nuclear power construction operations is crucial for reducing personal injuries and mass casualty incidents, ensuring critical processes remain under control, and safeguarding both construction safety and project quality. Nuclear construction sites are characterized by extensive operational areas, dense interdisciplinary workflows, strict hierarchical segregation of critical zones, and frequent process transitions. These conditions frequently involve multiple overlapping risks, including work at height, lifting operations, temporary electrical installations, confined spaces, hot work, and intersecting transport routes. Traditional online monitoring and offline inspections primarily rely on manual supervision, spot checks, and experiential judgment. These methods suffer from incomplete coverage, delayed detection, inconsistent standards, and difficulties in quantifiable traceability, failing to meet the real-time, accurate, and closed-loop management requirements of large-scale nuclear power projects [1,2,3,4,5]. Recent years have witnessed rapid advancements in visual intelligence, object detection, and edge computing technologies. Integrating these into nuclear power construction sites to establish risk factor identification and early warning systems has become a key research focus in smart construction site development; such systems enable automatic detection of, and coordinated response to, personnel violations such as mobile phone use while walking, unauthorized absence from posts, failure to wear prescribed protective equipment, and unauthorized zone entry [6,7,8,9,10,11].
Globally, safety research has evolved from accident statistics to mechanism analysis, then to risk assessment, and finally to dynamic control. Early studies centered on accident causation theories and statistical regression, examining the interplay of human error, organizational deficiencies, and environmental factors. Subsequently, risk assessment frameworks incorporated methods such as the analytic hierarchy process, fuzzy comprehensive evaluation, and Bayesian inference for hazard classification and priority ranking. In recent years, driven by digital construction and information-based supervision, research has increasingly emphasized dynamic risk management [12]. By integrating on-site sensors, video surveillance, work permits, and progress schedules, continuous risk status updates and closed-loop early warning have been achieved. The pivotal shift in this phase is the upgrade of safety management from static record-keeping to online perception, with a strong emphasis on traceability, quantifiability, and auditability [13,14,15].
Nuclear power engineering construction differs fundamentally from standard commercial building construction in risk characteristics, environmental constraints, and consequence severity, so general construction safety models cannot meet the practical needs of nuclear power sites. First, nuclear power construction adheres to the core nuclear safety principle of defense in depth, and the construction area is strictly divided into controlled areas, supervised areas, and non-controlled areas. Unauthorized entry into a controlled area may cause not only a general safety accident but also radiation hazards and nuclear safety violations, which are far more serious than unauthorized entry into a commercial building construction area. Second, key posts in nuclear power construction are highly specialized (such as nuclear island hoisting and hot work in radioactive areas), and the consequences of unsafe behaviors such as leaving posts without authorization or operating in violation of regulations can be irreversible. Even a small human error may introduce nuclear safety hazards that persist throughout the life cycle of the plant, whereas human errors in commercial building construction can usually be corrected through subsequent rectification. Third, high-risk operations in nuclear power construction (e.g., confined-space operations on the nuclear island and temporary electrical installations for nuclear power equipment) are subject to mandatory nuclear safety regulations and industry standards, and safety supervision is more stringent than for commercial buildings. Fourth, unsafe behavior in nuclear power construction exhibits pronounced long-tail characteristics: high-risk, low-frequency behaviors (such as unauthorized operation of special nuclear power equipment) are scarce in the sample, which challenges general object detection models and leads to unstable identification. In contrast, risk behaviors in commercial building construction fall into relatively few types with sufficient sample data, so a general model can achieve basic identification. Therefore, it is necessary to develop a dedicated behavioral risk identification and early warning method tailored to the unique risk characteristics of nuclear power engineering construction.
Internationally, computer vision was applied early to construction site safety oversight, with typical tasks including personnel/machinery/vehicle detection, personal protective equipment recognition, and hazard zone identification around edges and openings. This has progressively formed an engineered chain of detection, tracking, and alerting [16]. Domestically, driven by smart construction site policies and engineering demands, rapid development has led to extensive practical applications around tasks such as safety helmets, high-visibility clothing, safety harnesses, personnel crowding, unauthorized vehicle parking, and tower crane danger radius intrusions [17,18]. Overall, research has expanded from object recognition to event comprehension: on one hand, multi-target tracking and trajectory analysis enable detection of events such as personnel trespassing, lingering in hazardous zones, or loitering beneath suspended loads; on the other, temporal modeling combined with pose estimation or action recognition identifies behaviors like phone use, smoking, climbing over barriers, or fatigue-indicating postures [19]. For scenarios like nuclear power construction, where area classification and critical role constraints are stricter, integrating visual recognition with electronic fence rules has become a research priority. This approach overlays spatial constraints with identity and role restrictions to achieve consistent verification of personnel, zones, time, and events, thereby elevating alerts from detecting individuals to detecting violations [20].
Object detection technology has evolved from two-stage methods to single-stage real-time detection, progressing towards end-to-end detection systems [21]. The overarching trend involves enhanced feature representation, more efficient multi-scale fusion, and reduced inference latency. Research in construction site scenarios typically addresses three critical challenges: first, missed detections caused by small or densely clustered objects; second, false positives resulting from occlusions, strong light or backlighting, rain, fog, dust, and complex backgrounds; third, constraints on deployment speed and computational cost. To address these, common improvement approaches include enhancing multi-scale feature pyramids and context aggregation to boost small object recall; introducing attention mechanisms to suppress background noise and highlight critical regions; employing adaptive operators such as deformable convolutions to better fit pose variations and occlusion scenarios; and designing more efficient feature fusion units within neck and head structures to balance speed and accuracy. Furthermore, research increasingly emphasizes optimization of training strategies and data processing, such as hard-example mining, class-imbalance reweighting, augmentation techniques, and transfer learning, to enhance cross-site and cross-process generalization [22,23,24,25,26].
Overseas engineering practices stress integrating recognition outcomes with safety management workflows: alerts must have explicit trigger conditions, severity thresholds, responsibility assignments, and review mechanisms to prevent alert fatigue and frontline complacency. Domestic smart site platforms are progressively evolving from isolated algorithm demonstrations towards systematic development [27]. They typically incorporate modules for video integration, model inference, alert delivery, log retention, rectification verification, and statistical reporting, while exploring integration with access control, public address systems, positioning, and work permit systems. Research consensus holds that algorithms serve merely as entry points; genuine safety performance enhancement lies in institutionalized processes [28,29]. In nuclear power engineering, systems must meet heightened reliability and compliance requirements, such as explainable alerts, traceable evidence, configurable critical zone policies, and robust permission and audit mechanisms.
Despite significant domestic and international research progress, common bottlenecks persist in nuclear power construction scenarios. First, the data distribution is complex with pronounced long-tail characteristics, and samples of high-risk yet low-frequency unsafe behaviors are scarce, leading to unstable model identification of critical violations. Second, small targets and occlusions are prevalent, particularly under dense scaffolding, stacked components, and long-distance monitoring, which reduces detection accuracy [30]. Third, real-time constraints and deployment limitations are pronounced, necessitating stable, low-latency inference at the edge or under constrained computational conditions. Fourth, pure visual outputs struggle to directly meet operational rules and compliance audit requirements, so detection results urgently need to be integrated with management elements such as area classification, job responsibilities, and work permits to form interpretable, configurable alert strategies [31,32].
Based on this, this paper clarifies the key risk factors affecting nuclear power engineering construction projects, constructs an identification system for unsafe factors in nuclear power construction, and establishes a dataset of identification targets for such unsafe factors. It then incorporates DCN, GELAN, ECA, and ASPP modules into the YOLOv8 algorithm, verifies the accuracy of the resulting DGEAYoLo-NPE (nuclear power engineering construction risk identification based on YoLo) model through a series of ablation and comparative experiments, and designs and develops a risk factor identification and early warning system for nuclear power engineering construction projects.
This study makes four key original contributions to the field of AI-empowered safety management in nuclear power engineering construction:
- (1) A nuclear power-specific behavioral risk indicator system is constructed via enhanced text mining, which identifies 20 core unsafe factors and categorizes monitoring targets into critical position behaviors and critical area violations, addressing the inadaptability of general construction safety models to nuclear power scenarios with defense-in-depth principles and long-tail risk characteristics.
- (2) A novel object detection model DGEAYoLo-NPE is proposed by integrating DCN, GELAN, ECA, and ASPP modules into the YOLOv8 backbone, neck, and head structures, which effectively improves the detection accuracy of small targets, occluded objects, and low-frequency high-risk behaviors in complex nuclear power construction environments.
- (3) A high-quality real-world dataset for nuclear power construction behavioral risk is established, containing 42,800 labeled images and 32 h of continuous video from three active nuclear power construction sites in China (2023–2025), without any simulated data, filling the gap of scarce professional datasets in this field.
- (4) An integrated behavioral risk identification and early warning system is developed, which fuses computer vision detection with nuclear power construction management elements, and supports edge device deployment, realizing closed-loop management of “detection-alarm-verification-rectification” for unsafe behaviors.
2. Methods
Based on the essential differences between nuclear power construction and commercial building construction in risk characteristics and environmental constraints, this section first adopts text-mining technology to construct a nuclear power construction risk indicator system with nuclear safety characteristics, which is different from the general construction safety indicator system. The indicator system focuses on the key unsafe factors that are highly correlated with nuclear safety (such as venturing into the danger zone, leaving key posts without authorization), and provides a clear identification target for the subsequent design of the computer vision model (DGEAYoLo-NPE), ensuring that the model is highly adapted to the nuclear power construction scene.
Using text mining techniques (Figure 1), key unsafe influencing factors were extracted from existing accident investigation reports. This formed a catalog of unsafe factors in nuclear power engineering construction projects, thereby establishing an identification system for such unsafe factors.
Based on the corpus and lexicon, the text data was segmented into word groups. Preprocessing was then performed using segmentation encoding to obtain raw feature entries. Words with high frequency but no substantive meaning were removed through stopword filtering. Building upon this foundation, the extraction of key influencing factors was undertaken, enabling the identification and statistical analysis of keywords related to unsafe construction factors in nuclear power engineering projects within accident investigation reports.
Through an in-depth text-mining analysis process, this study systematically organized and statistically analyzed various unsafe influencing factors. The final text mining statistical results identified the top 20 key influencing factor features, with specific statistical outcomes presented in
Table 1.
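The keyword-extraction step described above can be illustrated with a minimal sketch. It assumes whitespace-tokenized English text; the stopword list, the sample reports, and the function name `top_keywords` are hypothetical, and the corpus, lexicon, and segmentation tool actually used in the study are not specified here.

```python
from collections import Counter

# Hypothetical stopword list; the study's actual list is not given.
STOPWORDS = {"the", "a", "of", "and", "to", "was", "in", "by", "due"}

def top_keywords(reports, k=20):
    """Count non-stopword tokens across reports and return the k most frequent."""
    counts = Counter()
    for text in reports:
        for token in text.lower().split():
            token = token.strip(".,;:()")  # drop surrounding punctuation
            if token and token not in STOPWORDS:
                counts[token] += 1
    return counts.most_common(k)

# Illustrative input only, not data from the study.
reports = [
    "Accident caused by unauthorized operation and inadequate supervision.",
    "Inadequate supervision of the lifting operation.",
]
print(top_keywords(reports, k=3))
```

In practice, multi-word factors such as "unauthorized absence from posts" would require phrase-level segmentation against a domain lexicon rather than single-token counting.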
The analysis results, as shown in
Figure 2, indicate that keywords such as inadequate safety supervision, unauthorized operations, and unauthorized absence from posts are prominent, suggesting a strong correlation with accidents. Among these, unauthorized operations and unauthorized absence from posts are factors that can directly cause accidents. Based on past experience, elements such as safety supervision, safety training, and safety awareness are often key contributing factors to the long-standing unsafe conditions in nuclear power construction. Therefore, the extraction of accident case keywords through visual word cloud analysis reveals that strengthening supervision of nuclear power construction projects and enhancing off-duty management are crucial for ensuring safe construction practices.
In summary, unsafe factors are identified by considering both ‘critical positions’ and ‘critical areas’. Specifically, the identification of critical position factors primarily targets non-compliant behavior by personnel in key roles, including inappropriate actions such as using mobile phones during working hours or leaving posts without authorization. Such conduct not only distracts attention from work but may also trigger safety incidents, posing a serious threat to the safety and order of the construction site. To effectively monitor and rectify such unsafe behaviors, enhancing surveillance and identification of personnel in critical positions is paramount. Critical area identification, meanwhile, focuses on unauthorized access to construction sites. The deployment of intelligent safety monitoring equipment, such as electronic fencing, enables real-time monitoring and early warning of unauthorized intrusions. This effectively prevents unauthorized personnel or objects from entering hazardous zones, thereby significantly reducing the probability of accidents. Within nuclear power construction project sites, managing unauthorized personnel access to critical areas is vital for ensuring construction safety and enhancing operational efficiency. The diagram illustrating the influence of unsafe factors is shown in
Figure 3.
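To make the electronic fencing idea concrete, the following is a minimal sketch of how a detected person could be checked against a fenced zone and a role whitelist. It assumes the fence is a polygon in image coordinates and that the bottom-center of the bounding box approximates the person's foot position; the function names and the role labels are hypothetical and are not taken from the system described in this paper.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon (list of (x, y) vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def fence_violation(bbox, fence, role, allowed_roles):
    """Flag a person whose foot point is inside the fence without an authorized role."""
    x_min, y_min, x_max, y_max = bbox
    foot_x, foot_y = (x_min + x_max) / 2, y_max  # bottom-center in image coordinates
    return point_in_polygon(foot_x, foot_y, fence) and role not in allowed_roles

# Illustrative values only.
fence = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(fence_violation((4, 2, 6, 8), fence, "visitor", {"hoist_operator"}))
```

Combining the spatial test with a role check is what turns a plain person detection into a rule-based violation alert.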
3. Results
3.1. Construction of a DGEAYoLo-NPE Risk Monitoring Model
This paper addresses the specific requirements of nuclear power engineering construction projects by enhancing the You Only Look Once v8 (YOLOv8) detection algorithm in four key aspects: replacing the Convolution + Batch Normalization + SiLU (CBS) block with a Deformable Convolutional Network (DCN) module; substituting the Cross Stage Partial bottleneck with two convolutions (C2f) with the Generalized Efficient Layer Aggregation Network (GELAN); embedding an Efficient Channel Attention (ECA) module; and replacing Spatial Pyramid Pooling-Fast (SPPF) with Atrous Spatial Pyramid Pooling (ASPP). These enhancements aim to improve the accuracy and robustness of object detection while preserving real-time performance.
The DGEAYoLo-NPE algorithm employs additional DCN modules within its backbone layer, enhancing the model’s flexibility. The primary advantage of DCN lies in its adaptive adjustment of convolutional operations, enabling more efficient processing of objects with varied shapes. Replacing SPPF with ASPP and integrating GELAN into the neck layer introduces a global edge-aware mechanism during feature extraction, thereby improving the capture of object edge information. The Neck layer incorporates additional ECA modules, further enhancing the network’s adaptability to diverse objects. The refined DGEAYoLo-NPE architecture undergoes multi-faceted optimization, enabling superior performance in handling complex objects, capturing fine details, and enhancing edge perception capabilities. Consequently, it achieves higher accuracy and robustness in object detection tasks. The improved DGEAYoLo-NPE network structure is illustrated in
Figure 4.
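Two of the modules above have simple closed-form properties that help explain their effect. The sketch below computes the span covered by a dilated (atrous) 3 × 3 kernel, which is why ASPP enlarges the receptive field without adding parameters, and the adaptive 1D kernel size proposed in the original ECA-Net formulation. The dilation rates (6, 12, 18) and the ECA defaults (`gamma=2`, `b=1`) are assumptions drawn from common practice, not settings reported for DGEAYoLo-NPE.

```python
import math

def effective_kernel(kernel_size, dilation):
    """Span covered by a dilated (atrous) convolution kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)

def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1D kernel size in the ECA-Net style: (log2(C) + b) / gamma, forced to an odd value."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1

# Assumed dilation rates; a 3x3 kernel then spans 13, 25, and 37 pixels.
for rate in (6, 12, 18):
    print(rate, effective_kernel(3, rate))

# Assumed channel widths for illustration.
for c in (64, 256, 512):
    print(c, eca_kernel_size(c))
```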
3.2. Risk Monitoring Model Performance Evaluation
The experimental training parameter settings for model performance evaluation are shown in
Table 2.
The performance of object detection models is primarily characterized by detection accuracy and speed. Accuracy indicators typically include precision, recall, average precision (AP), and mean average precision (mAP), whilst speed and complexity are characterized by frames per second (FPS) and the number of floating-point operations (FLOPs). Following model training, accuracy is assessed through three metrics: mAP, precision, and recall.
All model training and inference experiments were conducted on a consistent hardware platform to ensure fair performance comparison: (1) Training hardware: Intel Core i9-13900K CPU (24 cores, 32 threads), NVIDIA RTX 4090 GPU (24 GB VRAM), 128 GB DDR5 RAM, 2 TB NVMe SSD; (2) Inference hardware: For FPS testing, two scenarios were considered: (a) Server-side deployment: same GPU as training (RTX 4090); (b) Edge device deployment: NVIDIA Jetson AGX Orin (32 GB module), Intel Core i7-12700H CPU.
The FPS values reported in Table 4 were measured under the server-side deployment scenario (RTX 4090), which is consistent with the hardware configuration of mainstream nuclear power construction site monitoring centers. For edge device deployment (Jetson AGX Orin, Hunan Chuanglebo Intelligent Technology Co., Ltd., Changsha, China), the DGEAYoLo-NPE model achieves 38.6 FPS, which meets the real-time requirement for on-site video analysis (≥30 FPS), verifying its feasibility for edge deployment.
Precision (P) denotes the proportion of samples correctly predicted as positive relative to all samples predicted as positive, and therefore measures how reliable the model's detections are. The expression for precision is given by Equation (1):
$$P = \frac{TP}{TP + FP} \tag{1}$$
Recall (R) denotes the proportion of actual positive samples that are correctly detected, and therefore measures the completeness of detection. The expression for recall is given by Equation (2):
$$R = \frac{TP}{TP + FN} \tag{2}$$
where TP denotes the number of samples correctly predicted as positive by the model, FN represents the number of positive samples incorrectly predicted as negative, and FP indicates the number of negative samples incorrectly predicted as positive. The recall value ranges between 0 and 1; the closer it is to 1, the stronger the model’s ability to identify samples from the positive category. A recall of 1 indicates that the model identifies all positive samples; conversely, a recall of 0 indicates that it identifies none of them.
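As a worked illustration of Equations (1) and (2), the following minimal sketch computes precision and recall from raw counts. The function name and the example counts are hypothetical and are not results from this study; returning 0.0 for an empty denominator is a convention chosen here.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from counts; 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Illustrative counts only, not results from the study.
p, r = precision_recall(tp=90, fp=10, fn=30)
print(f"precision={p:.3f}, recall={r:.3f}")
```

With these counts, 90 of 100 detections are correct (precision 0.9), while 90 of 120 actual positives are found (recall 0.75), which shows how the two metrics can diverge.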
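Average Precision (AP) measures the model’s detection performance for a single category and corresponds to the area under that category's precision-recall curve. The expression for average precision is given by Equation (3):
$$AP = \int_{0}^{1} P(R)\, dR \tag{3}$$
Mean Average Precision (mAP) denotes the average of the AP values across all classes, serving as a measure of the model’s overall detection performance. The expression for mean average precision is given by Equation (4):
$$mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i \tag{4}$$
where N is the number of classes and AP_i is the average precision of class i.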
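The AP and mAP definitions can be sketched numerically as follows. This is one common way to approximate the area under the precision-recall curve (all-point interpolation with a monotone precision envelope); the paper does not state which interpolation its tooling uses, so the choice here is an assumption, and the function names are hypothetical.

```python
def average_precision(recalls, precisions):
    """Area under the precision-recall curve using all-point interpolation.

    `recalls` must be sorted in ascending order and paired with `precisions`.
    """
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Build the precision envelope: non-increasing when read left to right.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangle areas over each recall step.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))

def mean_average_precision(ap_per_class):
    """Mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)

# Illustrative curve only: precision 1.0 up to recall 0.5, then 0.5 up to recall 1.0.
print(average_precision([0.5, 1.0], [1.0, 0.5]))
```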
Detection speed (frames per second, FPS) serves as a crucial metric for evaluating an algorithm’s real-time performance. FPS denotes the number of image frames a model can process within one second, reflecting the algorithm’s computational speed and efficiency. The calculation of FPS is given by Equation (5):
$$FPS = \frac{1}{t} \tag{5}$$
where t (single-frame processing time) denotes the duration required for the model to complete forward inference on a single input image. The model’s structural design, number of parameters, and computational load all exert a significant influence on FPS: the more complex the model, the more parameters it has, and the heavier its computational load, the lower its FPS.
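A minimal sketch of how FPS might be measured and related to per-frame latency is given below. The `infer` callable stands in for a model's forward pass and is hypothetical; the warm-up count is also an assumption. The last line converts the 38.6 FPS reported for the edge device into the implied per-frame time of roughly 25.9 ms, compared with about 33.3 ms available at the 30 FPS real-time threshold.

```python
import time

def measure_fps(infer, frames, warmup=2):
    """Average frames per second of `infer` over `frames`, after a short warm-up."""
    for frame in frames[:warmup]:
        infer(frame)  # warm-up calls are not timed
    start = time.perf_counter()
    for frame in frames:
        infer(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed

def latency_ms(fps):
    """Per-frame processing time in milliseconds implied by an FPS value."""
    return 1000.0 / fps

print(round(latency_ms(38.6), 1))
```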
The dataset used in this study is a real-world nuclear power construction behavioral risk dataset collected from three active nuclear power construction sites (Sites A, B, and C), with no simulated or synthetic data included, to ensure the model’s adaptability to actual construction scenarios.
A total of 15,600 high-resolution images (1920 × 1080 pixels) and 32 h of continuous video footage (25 fps) were collected, covering typical nuclear power construction scenarios including nuclear island hoisting, confined-space operations, temporary electrical installations, hot work in radioactive areas, and high-altitude operations. The collected data also includes complex environmental conditions such as night construction, rainy weather, dense scaffolding occlusion, and long-distance monitoring, to fully reflect the actual working environment of nuclear power construction sites. After frame extraction, de-duplication, and removal of blurred/overexposed samples, 42,800 effective images were finally obtained for model training and evaluation.
Based on the nuclear power construction risk indicator system constructed in
Section 2, the detection targets are divided into two major categories (14 subcategories) of unsafe behaviors, which are highly correlated with nuclear safety: critical position unsafe behaviors (8 subcategories): non-compliant operation, unauthorized absence from posts, mobile phone use during work, fatigue operation, illegal command, unlicensed operation, incorrect use of protective equipment, and operational error; critical area violation behaviors (6 subcategories): unauthorized entry into controlled areas, venturing into danger zones, personnel crowding in restricted areas, unauthorized vehicle parking in hazardous zones, hot work in unapproved areas, and confined space entry without permission.
Labeling was completed by a professional team consisting of three nuclear power safety engineers (each with more than 5 years of on-site safety management experience) and four computer vision researchers, using the open-source Label Studio 1.23.0 tool for bounding box (Bbox) annotation and behavioral category classification. The process followed a standardized three-step workflow combined with strict quality control measures to ensure annotation reliability. First, in initial labeling, team members labeled samples independently, marking the Bbox of each target object and its behavioral category. Second, in cross review, the nuclear power safety engineers and computer vision researchers checked each other's labeled samples and marked inconsistent results for joint discussion. Third, in revision and confirmation, inconsistent samples were revised through team discussion and the annotation results were finalized. For label quality control, Cohen’s Kappa coefficient was used to evaluate inter-annotator consistency, with the final coefficient reaching ≥ 0.92 (indicating almost perfect agreement). In addition, 10% of the labeled samples were randomly selected for a second review by a senior nuclear power safety engineer, and samples with labeling errors, ambiguous Bboxes, or mismatched categories were deleted or revised.
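For reference, Cohen's Kappa for a pair of annotators can be computed as in the minimal sketch below. The paper does not state how the coefficient was aggregated across the seven annotators, so a pairwise computation is an assumption; the function name and the example labels are hypothetical.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    Undefined (division by zero) when chance agreement equals 1,
    i.e. when both annotators use a single identical label throughout.
    """
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance from each annotator's label frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Illustrative labels only, not annotations from the dataset.
a = ["phone", "phone", "absent", "absent"]
b = ["phone", "phone", "absent", "phone"]
print(cohens_kappa(a, b))
```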
To avoid data leakage and verify the model’s generalization ability, the dataset was partitioned by construction site and camera group at a training:validation:test ratio of 7:1:2. The training set (29,960 images) was collected from Site A and Site B and used for model parameter optimization. The validation set (4280 images) was collected from independent camera groups at Site B and used for hyperparameter tuning and overfitting monitoring. The test set (8560 images + 10,000 video frames) was collected from Site C (completely unseen before training), includes complex scenario samples such as night construction and rainy weather operations, and was used for independent performance verification.
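The split sizes follow directly from the 7:1:2 ratio applied to the 42,800 effective images, as the short check below shows. The helper name is hypothetical, and assigning any rounding remainder to the first split is a convention chosen here.

```python
def split_counts(total, ratios=(7, 1, 2)):
    """Split `total` items by integer ratios; any remainder goes to the first split."""
    denom = sum(ratios)
    counts = [total * r // denom for r in ratios]
    counts[0] += total - sum(counts)
    return counts

print(split_counts(42800))  # [29960, 4280, 8560]
```

Note that the ratio only fixes the sizes; avoiding leakage depends on assigning whole sites and camera groups to a single split, as described above.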
3.3. Ablation Test
This ablation study was conducted to validate the effectiveness of the DGEAYoLo-NPE enhanced algorithm. Evaluation metrics, including Precision, Recall, and mAP-50, were employed to assess the DGEAYoLo-NPE algorithm. The ablation results are presented in
Table 3. The baseline model (YOLOv8s) was modified by replacing the Convolution + Batch Normalization + SiLU (CBS) block with a Deformable Convolutional Network (DCN) module; substituting the C2f module with the GELAN module; introducing an Efficient Channel Attention (ECA) module; and replacing the original Spatial Pyramid Pooling-Fast (SPPF) module with an Atrous Spatial Pyramid Pooling (ASPP) module. These enhancements aim to improve the accuracy and robustness of object detection while preserving real-time performance.
As demonstrated by the ablation results in
Table 3, the introduction of the DCN module elevated Precision, Recall, and mAP-50 to 92.9%, 91.5%, and 92.1%, respectively, indicating that the DCN module marginally improved detection performance by enhancing the network’s adaptability to object deformations. The introduction of the ECA module further elevated Precision and Recall to 93.6% and 91.9%, respectively, with mAP-50 reaching 92.2%. This validates that the ECA module enhances feature extraction capabilities by improving channel attention. Simultaneously incorporating DCN, GELAN, and ECA modules further elevated Precision and mAP-50 to 93.8% and 92.7%, respectively, demonstrating that the synergistic interaction of multiple modules significantly enhances detection accuracy and contextual modeling capabilities. The simultaneous integration of DCN, GELAN, ECA, and ASPP modules elevated Precision, Recall, and mAP-50 to 94.3%, 93.7%, and 94.5%, respectively. Compared to the baseline YOLOv8s model, mAP-50 improved by 2.95 percentage points, comprehensively validating the efficacy of the DGEAYoLo-NPE algorithm.
To visually illustrate the performance of each enhancement module, this section presents the ablation study evaluation results as depicted in
Figure 5. This figure compares Precision (%), Recall (%), and mAP-50 across the different improvement configurations. The bar chart shows Precision for each configuration, with an overall upward trend. The red dashed line denotes Recall (%), while the blue solid line indicates the mAP-50 value. The Y + D + G + E + A configuration achieves the highest Recall while also improving mAP-50, demonstrating the DGEAYoLo-NPE algorithm’s superior performance in object detection tasks.
3.4. Comparative Experiment
This comparative experiment aims to further validate the performance of the DGEAYoLo-NPE algorithm. Several algorithms (SSD, YOLOv5, YOLOv8s, YOLOv10s, RT-DETRv3-L, and YOLO-World-S) are evaluated against it at the same input size across four metrics: precision, recall, mAP-50, and frames per second (FPS). The performance comparison is summarized in
Table 4.
The comparative experimental results show that the proposed DGEAYoLo-NPE achieves the highest Precision (94.3%), Recall (93.7%), and mAP-50 (94.5%) among all compared models, with respective increases of 2.2, 1.9, and 2.95 percentage points over the baseline YOLOv8s. Among the state-of-the-art models, RT-DETRv3-L has a relatively high Precision (93.5%) but lower Recall and mAP-50 than DGEAYoLo-NPE; YOLOv10s has the highest FPS (145.2) yet underperforms in accuracy metrics; YOLO-World-S shows moderate real-time performance with inferior accuracy. The traditional SSD and the earlier YOLOv5 models perform notably worse across all metrics. Although the FPS of DGEAYoLo-NPE is 7.2 FPS lower than that of YOLOv8s, it remains faster than RT-DETRv3-L and YOLO-World-S and meets the practical demands of nuclear power engineering construction. Overall, DGEAYoLo-NPE strikes a favorable balance between detection accuracy and real-time inference efficiency, making it well suited to high-precision, real-time behavior risk identification in actual nuclear power engineering construction scenarios.
To confirm the statistical significance of the performance improvements of the DGEAYoLo-NPE model, a two-tailed t-test was conducted between the proposed model and the baseline YOLOv8s. The test results for Precision (t = 8.76, p < 0.001), Recall (t = 7.32, p < 0.001), and mAP-50 (t = 9.54, p < 0.001) show that the differences are statistically significant at the 0.01 level. This indicates that the improvement obtained by integrating the DCN, GELAN, ECA, and ASPP modules reflects a genuine performance gain rather than random fluctuation.
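The form of the test statistic can be sketched as follows. The paper does not specify whether the samples were paired or independent, nor how many repeated runs were used, so the paired form below (the same runs or data batches evaluated with both models) is an assumption, and the function name is hypothetical.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(scores_a, scores_b):
    """t statistic for paired samples, e.g. the same runs evaluated with two models."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    # Mean difference divided by its standard error (sample standard deviation).
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))
```

The two-tailed p-value is then obtained from the t distribution with n − 1 degrees of freedom, for which a statistics library would normally be used.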
3.5. Analysis of Results
This paper analyses risk identification detection training results using electronic fencing as an example, as illustrated in
Figure 6. The upper section of
Figure 6 shows three quantities during training: loss, precision, and recall. The loss comprises bounding box loss (box_loss), classification loss (cls_loss), and distribution focal loss (dfl_loss). The lower section shows loss, precision, and recall during validation, with the horizontal axis representing the number of training epochs (epoch). Bounding box loss primarily measures the distance between predicted and ground-truth boxes, serving as a key metric for optimizing the model’s localization capability. Classification loss quantifies categorization errors for object classes. Precision denotes the proportion of samples correctly predicted as positive out of all samples predicted as positive. Recall represents the proportion of actual positive samples correctly predicted as positive.
As demonstrated by the detection training results in
Figure 6, both training and validation losses decrease rapidly, with consistent trends across the validation and training sets. This indicates the model converges well without overfitting. The classification loss value drops from 2.5 to approximately 0.5, signifying the model’s predictions for object classification progressively become more accurate. Recall gradually increases from 0.5 to around 0.8, indicating a significant improvement in the model’s detection rate for objects. Intersection over Union (IoU) serves as a common evaluation metric for object detection, quantifying the overlap between predicted and actual bounding boxes; metrics/mAP50 denotes the mean average precision at an IoU threshold of 0.5. The mean average precision increases from 0.5 to over 0.9, demonstrating the exceptional detection performance of DGEAYoLo-NPE.
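The IoU measure referred to above can be computed as in this minimal sketch, which assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max); the function name and the example boxes are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Illustrative boxes only: overlap area 1, union area 7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

Under mAP-50, a prediction counts as a true positive only when its IoU with a ground-truth box of the same class is at least 0.5.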
In the PR curve, the X-axis represents recall and the Y-axis denotes precision. As recall increases, precision gradually decreases, indicating that lowering the confidence threshold to detect more targets also admits more false positives. The label “all classes 0.859 mAP@0.5” signifies that, at an IoU threshold of 0.5, the model achieves a mean average precision of 0.859 across all classes, demonstrating robust overall performance in object detection tasks. The PC (precision-confidence) curve plots the model’s prediction confidence on the X-axis and precision on the Y-axis, where precision is the proportion of detections retained at a given confidence threshold that are true positives. The PR and PC curves for unauthorized personnel intrusion events are shown in Figure 7. At confidence levels exceeding 0.8, precision approaches 1.0, indicating that DGEAYoLo-NPE reliably predicts unauthorized personnel intrusion in critical zones at high confidence thresholds. The notation “all classes 1.00 at 0.923” indicates that at a confidence threshold of 0.923, precision across all classes reaches 1.00. In other words, detections retained at or above this threshold contain no false positives, although some true targets detected with lower confidence may be missed.
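To make the reading of a precision-confidence curve concrete, the following sketch computes precision over the detections that survive a given confidence threshold. The detection list is hypothetical and does not reproduce the curves in Figure 7.

```python
def precision_at(detections, threshold):
    """Precision over detections whose confidence is >= threshold.

    detections: list of (confidence, is_true_positive) pairs.
    Returns None when no detection passes the threshold.
    """
    kept = [is_tp for conf, is_tp in detections if conf >= threshold]
    if not kept:
        return None
    return sum(kept) / len(kept)

# Hypothetical detections (NOT from the paper): confidence and match status.
detections = [
    (0.97, True), (0.95, True), (0.93, True), (0.88, True), (0.81, False),
    (0.74, True), (0.62, False), (0.55, True), (0.41, False),
]

high_conf_precision = precision_at(detections, 0.90)  # 3 of 3 kept are correct
mid_conf_precision = precision_at(detections, 0.50)   # 6 of 8 kept are correct
```

Raising the threshold here removes the false positives but also discards correct detections at 0.88, 0.74, and 0.55, which is the precision and recall trade-off that the PR curve summarizes.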
To avoid overestimation of model performance (which may occur when using only validation set results), an independent test set (completely unseen during training) was used for final performance verification. The verification steps are as follows:
Test set preparation: 8560 images from Site C (a site separate from the training and validation sites) plus 10,000 frames extracted from continuous video footage, covering scenarios not included in the training set (e.g., night construction, rainy weather operations).
Verification indicators: consistent with training/validation, including Precision, Recall, mAP-50, and FPS, with additional calculation of False Positive Rate (FPR) and False Negative Rate (FNR) to evaluate practical application reliability.
Verification process: The trained DGEAYoLo-NPE model was evaluated on the test set without any parameter adjustment. For video frames, continuous inference was performed to simulate real-time monitoring, and the average performance over 10 consecutive batches was recorded.
Verification results: The model achieved Precision = 93.1%, Recall = 92.5%, mAP-50 = 93.7%, FPR = 2.3%, FNR = 1.8%, and FPS = 100.7 f/s on the independent test set. Compared with the validation set results, the performance degradation is less than 1.5%, indicating that the model has strong generalization ability and no obvious overfitting.
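The verification indicators listed above can be sketched from confusion counts as follows. The paper does not specify how true negatives were counted for FPR, so the sketch assumes frame-level decisions (each frame is either an alarm or not), and the counts are hypothetical.

```python
def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0

def detection_rates(tp, fp, tn, fn):
    """Frame-level indicators from confusion counts (assumed definition)."""
    return {
        "precision": safe_div(tp, tp + fp),  # correct alarms / all alarms
        "recall": safe_div(tp, tp + fn),     # correct alarms / all real events
        "fpr": safe_div(fp, fp + tn),        # false alarms / all non-events
        "fnr": safe_div(fn, fn + tp),        # missed events / all real events
    }

# Hypothetical frame counts (NOT from the paper).
rates = detection_rates(tp=90, fp=10, tn=890, fn=10)
```

Under this frame-level definition, FNR equals 1 minus recall computed on the same frames; a box-level Recall, as reported by detection toolkits, is computed on a different unit and need not satisfy that identity against a frame-level FNR.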
4. Discussion
This paper details the development of a comprehensive nuclear power construction risk early-warning system featuring an intuitive human–machine interface. The system enables real-time monitoring, early warning, and prevention of potential safety hazards during nuclear power project construction. The creation of this hazard identification and early-warning system for nuclear power construction not only enhances safety management standards for such projects but also provides scientific and effective decision support for safety management to construction enterprises involved in nuclear power engineering.
4.1. System Functional Design
The system comprises multiple functionally independent modules, including monitoring, early warning, management, and statistical analysis, enabling effective resolution of multi-dimensional monitoring and safety management challenges at nuclear power construction sites. The overall functional design and implementation of the system are illustrated in Figure 8.
The system administration interface comprises system management, notification and announcement management, real-time monitoring, safety training management, and early warning management. It supports issue reporting and progress tracking. The system management module enables multiple operations, including account management, parameter configuration, log recording, and personnel management.
4.2. AI Energy Forecasting
Beyond risk identification, the proposed system also aligns with the intelligent energy safety concept advocated by recent studies. Drawing on the predictive analytics methodology of Enemuo et al. [33], the system can be extended from passive monitoring to proactive prevention by integrating construction schedules, personnel shift data, and equipment energy consumption records. For instance, predicting peaks in high-risk behaviors (e.g., fatigue operations during intensive construction) from time-series data would support the transition to preventive safety management, complementing the energy consumption optimization logic used in industrial scenarios.
From an economic perspective, as highlighted by Morgoeva et al. [34], although implementing the video analysis system involves hardware and software costs, it avoids substantial economic losses from project shutdowns and accidents. The value of the system is reflected not only in safety improvement but also in cost savings driven by resource optimization, which is consistent with the resource-saving technology concept emphasized in energy research. In terms of data integration, following the multi-source data fusion method in [4], future iterations can enrich the model with time-series data (e.g., crane operation logs) and metadata (e.g., work permit information), thereby improving the accuracy of context-dependent violation detection (e.g., verifying unlicensed operation through work permit matching). Integrating visual data with non-visual contextual information in this way illustrates how AI can support the management of both human and economic risk at critical facilities.
4.3. System Applications
Based on the system functional design diagram in Figure 8, a comprehensive and highly secure system has been designed from the perspective of each functional module. This system not only enables real-time monitoring and early warning of unsafe behaviors but also offers high security, user-friendliness, and robust administrative capabilities. The system’s functional menu encompasses twelve core modules, including ‘Personal Centre’, ‘Alert Management’, ‘Real-time Monitoring Video’, ‘Data Query’, ‘Notifications and Announcements’, ‘Personnel Management’, and ‘System Administration’, catering to users’ multi-dimensional requirements. Through its visualized interactive interface and clearly delineated functionalities, the system enables users to efficiently undertake tasks such as monitoring unsafe conduct at construction sites, managing alarm records, and conducting data analysis. This enhances the informatisation level of safety management in nuclear power engineering construction projects. Monitoring results are illustrated in Figure 9.
The proposed model also strikes a reasonable trade-off between precision and real-time performance. Compared with the baseline YOLOv8s, the frame rate of the DGEAYoLo-NPE model decreases by 7.2 FPS (from 109.5 to 102.3 FPS), which is fully acceptable for nuclear power construction scenarios. First, the model’s frame rate remains far above the 25 FPS real-time threshold for on-site monitoring systems, supporting smooth multi-channel video processing. Second, the decrease is caused by integrating scenario-specific optimization modules, which bring a 2.95% improvement in mAP-50; this gain is critical for identifying high-risk but low-frequency behaviors in nuclear power construction. Third, edge deployment tests (NVIDIA Jetson AGX Orin) show 38.6 FPS, meeting single-channel monitoring needs, and lightweight optimization (e.g., quantization) can further improve speed without significant precision loss. This trade-off prioritizes safety and practical applicability, aligning with the core demands of nuclear power construction safety management.
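The arithmetic behind this trade-off can be checked with a short sketch. The frame rates and the 25 FPS threshold are taken from the text; the channel estimate additionally assumes that aggregate throughput divides evenly across video channels with no decoding or batching overhead, which is a simplification rather than a measured result.

```python
import math

REALTIME_FPS = 25.0   # per-channel real-time threshold stated in the text
BASELINE_FPS = 109.5  # YOLOv8s
PROPOSED_FPS = 102.3  # DGEAYoLo-NPE on the test workstation
EDGE_FPS = 38.6       # DGEAYoLo-NPE on NVIDIA Jetson AGX Orin

def max_channels(total_fps, per_channel_fps=REALTIME_FPS):
    """Upper bound on real-time channels, assuming throughput splits evenly."""
    return math.floor(total_fps / per_channel_fps)

absolute_drop = BASELINE_FPS - PROPOSED_FPS   # about 7.2 FPS
relative_drop = absolute_drop / BASELINE_FPS  # about 6.6% of the baseline rate

workstation_channels = max_channels(PROPOSED_FPS)  # 4
edge_channels = max_channels(EDGE_FPS)             # 1
```

Under this assumption the edge figure corresponds to one real-time channel, which agrees with the single-channel claim above.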
4.4. Future Deployment Optimization and Prospective Solutions
The DGEAYoLo-NPE model and its matching behavioral risk identification and early warning system have achieved preliminary on-site application in nuclear power engineering construction, with core detection performance meeting basic safety monitoring demands. However, subsequent large-scale, full-scenario deployment will face practical challenges in actual nuclear power construction environments, including complex target occlusion, extreme illumination changes, barriers to multi-camera collaborative monitoring, and high false alarm risks. Follow-up work will address each of these problems through targeted research and optimization.
For target occlusion caused by dense scaffolding, stacked components, and cross-operation on site, we will combine algorithm optimization with improved engineering deployment. On the algorithm side, a target feature complementation module will be integrated into the DGEAYoLo-NPE model to capture valid local features of occluded targets and reconstruct complete features through historical frame feature fusion. On the deployment side, multi-angle camera layouts will be implemented in key areas so that multi-camera data fusion eliminates occlusion-induced monitoring blind spots. For illumination changes such as strong backlight, low light at night, and dimness in rain or fog, we will take three measures: add an adaptive image preprocessing module at the system front end for real-time image quality improvement; expand the training dataset with illumination variation augmentation to enhance the model’s robustness; and deploy visible light and infrared thermal imaging cameras in combination to achieve all-weather monitoring that does not depend on natural light.
For the multi-camera collaborative monitoring required by large-scale, zoned nuclear power construction sites, we will build a unified collaborative monitoring platform based on spatial-temporal calibration, including camera parameter calibration and timestamp unification. On this platform, a partitioned monitoring and cross-area early warning mechanism with an integrated cross-camera target tracking algorithm will provide continuous tracking of personnel and machinery and accurate identification of cross-zone unsafe behaviors. For false alarm management, we will construct a multi-dimensional control system combining algorithm optimization, data fusion, and human–computer interaction. Specifically, we will set hierarchical dynamic confidence thresholds according to behavioral risk levels; cross-verify visual detection results against work permits, personnel scheduling, and other construction management data; add a false alarm feedback module that collects marked false alarm samples for retraining; and establish a closed-loop management process of false alarm reporting, verification, analysis, and correction. In subsequent research, we will progressively verify these optimization strategies on actual sites and continue to iterate the DGEAYoLo-NPE model and early warning system according to application results, so as to further enhance the system’s practicality and scenario adaptability and provide more reliable intelligent technical support for nuclear power engineering construction safety management.
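One possible shape for the planned false alarm control, combining risk-level confidence thresholds with work permit cross-verification, is sketched below. This is an illustration of the idea rather than the system's implementation: the threshold values, the `Detection` fields, the behavior label `zone_intrusion`, and the assumption that a worker identifier is available for each detection are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical thresholds: higher-risk behaviors alert at lower confidence,
# trading more false alarms for fewer missed events.
THRESHOLDS = {"high": 0.50, "medium": 0.65, "low": 0.80}

@dataclass
class Detection:
    behavior: str      # e.g. "zone_intrusion" (hypothetical label)
    risk_level: str    # "high", "medium", or "low"
    confidence: float  # detector confidence in [0, 1]
    worker_id: str     # assumed to come from a separate identification step
    zone: str

def should_alert(det, permits):
    """Decide whether to raise an alarm.

    permits: set of (worker_id, zone) pairs that hold a valid work permit.
    """
    if det.confidence < THRESHOLDS[det.risk_level]:
        return False  # below the threshold for this risk level
    if det.behavior == "zone_intrusion" and (det.worker_id, det.zone) in permits:
        return False  # presence is covered by a permit, so suppress the alarm
    return True

permits = {("W001", "reactor_hall")}
permitted = Detection("zone_intrusion", "high", 0.90, "W001", "reactor_hall")
unpermitted = Detection("zone_intrusion", "high", 0.90, "W002", "reactor_hall")
weak_low_risk = Detection("phone_use", "low", 0.70, "W003", "yard")

alerts = [should_alert(d, permits) for d in (permitted, unpermitted, weak_low_risk)]
```

Samples that operators mark as false alarms could then be logged with the detection record and fed back into retraining, which is the closed-loop step described in the text.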