1. Introduction
With the rapid development of smart grids, underground power cables have become increasingly important in urban power distribution systems. Compared with overhead lines, underground cables can reduce visual impact and improve land-use efficiency in urban areas [1,2]. Previous studies have also compared underground and overhead power lines in terms of cost, reliability, and environmental adaptability [3,4,5]. Among different underground cable laying methods, duct laying, as shown in Figure 1a, is widely used in urban distribution networks because protective pipes and cable manholes provide regular installation channels and facilitate cable traction, splicing, and inspection [6,7]. However, the enclosed and narrow structure of underground cable ducts also makes fault inspection difficult. As shown in Figure 1b, the remaining space inside a small-diameter duct is limited by the duct size, cable size, and cable position, which constrains the installation and movement of robotic inspection equipment.
Figure 2 further illustrates the developed pipe-inspection robot and its positional relationship with the cable duct under such geometric constraints. In the proposed robotic inspection configuration, the thermal imaging module is mounted on the front part of the pipe-inspection robot and oriented toward the exposed cable surface. The limited residual clearance between the cable and duct wall constrains the sensor size, viewing angle, and installation height. Therefore, the detector must tolerate short imaging distances, partial cable occlusion, and viewpoint variation caused by robot motion. The onboard edge device performs thermal-image acquisition, inference, and fault localization during slow robot movement inside the duct. Traditional manual inspection may involve restricted access, poor visibility, oxygen deficiency, toxic gas leakage, traffic disturbance, and possible service interruption, which limits the frequency and coverage of cable condition monitoring [8].
Conventional underground cable diagnosis mainly relies on electrical-signal-based methods. Traveling-wave and impedance-based methods are commonly used for cable fault location [9,10,11], while partial discharge detection and sheath-current monitoring are used for insulation-condition assessment and online monitoring [12,13]. These techniques are effective for line-level fault localization and condition assessment, but they generally provide signal-level or position-level diagnostic results rather than image-level localization of defect morphology inside cable ducts. In particular, early-stage defects such as insulation degradation, sheath damage, local aging, and structural damage may produce weak or localized abnormal responses that are difficult to inspect visually or locate precisely using conventional methods alone.
Infrared thermography has been widely applied in electrical equipment diagnosis because many faults generate abnormal heat before severe failure occurs. It has been used for infrared image-based diagnosis of power equipment [14], high-voltage cable accessories [15], and broader thermal-vision-based fault diagnosis in power systems [16]. Compared with visible-light imaging, thermal imaging is less dependent on illumination and is suitable for dark and enclosed inspection environments. Robot-mounted sensing has been widely explored for infrastructure and industrial inspection tasks [17,18]. For underground cable environments, inspection robots equipped with visual, infrared, gas, LiDAR, or navigation sensors have been developed to reduce manual workload and improve inspection safety [19,20,21,22]. However, thermographic fault detection for underground small-diameter duct cables remains insufficiently studied. In such scenarios, the imaging distance is short, the viewing angle is unstable, the background is narrow and repetitive, and the onboard computing resources of inspection robots are limited.
Deep learning-based object detection has achieved strong performance in industrial defect detection [23]. The YOLO series is especially attractive for robotic inspection because of its high inference speed and deployment convenience [24,25]. YOLO26, as a recent edge-oriented YOLO variant, introduces end-to-end inference and deployment-oriented architectural improvements, providing a strong baseline for real-time defect detection [26]. Nevertheless, directly applying YOLO26 to thermographic duct cable inspection may still introduce unnecessary computational cost, especially when deployed on embedded edge devices. Moreover, thermal fault regions in underground duct cables usually present spatially continuous medium- or large-scale abnormal temperature fields, whereas tiny isolated hotspots may be caused by debris, wall reflection, sensor noise, or background interference. Therefore, a task-specific lightweight detector is required.
Another challenge is the lack of large-scale real thermographic datasets for underground duct cable faults. Real fault samples are difficult to collect because in-service cable ducts are subject to strict access restrictions, cable faults occur with low probability, and different defect types are difficult to reproduce under controlled field conditions. To address these challenges, this study constructs a Cable-Thermo dataset using ANSYS-based thermoelectric coupling simulation and proposes a lightweight YOLO26-Thermo framework for thermographic fault detection of underground duct cables.
The main contributions of this study are summarized as follows:
A simulation-based Cable-Thermo dataset is constructed for underground duct cable thermographic fault detection, covering four representative defect categories: hollow-type damage, conductor burnout, sheath damage, and severe damage.
A YOLO26-Thermo framework is proposed by introducing the CDA and SimSPPF modules to improve the balance between detection accuracy and computational efficiency.
A deployment-oriented variant, YOLO26-Thermo-H, is developed by removing the small-scale detection branch according to the thermal-field characteristics of duct cable faults, reducing computational complexity while maintaining high detection accuracy.
The proposed models are evaluated on both an RTX 4060 workstation and a Jetson Orin NX edge platform, demonstrating the feasibility of real-time thermographic inspection for robot-mounted edge deployment.
2. Related Work
2.1. Conventional Fault Detection and Inspection of Underground Cables
Underground power cables are critical components of urban power distribution systems. Compared with overhead lines, they provide better environmental adaptability and reduced exposure to external disturbances, but their buried or ducted installation environment makes fault detection and maintenance more difficult [2,3,4,5]. Conventional underground cable diagnosis mainly depends on electrical-signal-based methods, including impedance measurement, traveling-wave analysis, partial discharge detection, sheath current monitoring, and distributed temperature sensing [9,10,11,12,13]. These methods have been widely used for fault localization and condition monitoring.
However, most conventional methods are designed for post-fault localization, global condition assessment, or offline testing rather than image-level localization of visible or thermal defect regions inside cable ducts. Traveling-wave and impedance-based methods can estimate the fault position along the cable line, but they usually cannot provide two-dimensional defect morphology. Partial discharge and distributed sensing methods can reflect abnormal electrical or thermal trends, but their outputs are generally signal-level or point/line-level measurements. Therefore, non-contact image-based inspection is still needed for robot-assisted detection in confined duct environments.
2.2. Infrared Thermography for Power Cable and Electrical Equipment Diagnosis
Infrared thermography has been widely applied in electrical equipment diagnosis because many faults generate abnormal heat before severe failure occurs. Thermal imaging has been used for transformers, insulators, switchgear, cable accessories, high-voltage cable joints, and other power equipment [14,15,16]. Existing studies show that infrared images can provide useful diagnostic information for overheating detection, condition assessment, and abnormal-region localization.
Nevertheless, most thermographic studies focus on exposed equipment, cable joints, cable accessories, substations, or tunnel-scale inspection environments [14,15,16]. These scenarios are substantially different from small-diameter underground cable ducts, where the camera view is constrained by duct geometry, robot motion, cable occlusion, wall reflection, dust, water vapor, and limited sensor installation space. In addition, publicly available thermographic datasets specifically designed for underground duct cable defects remain scarce. Therefore, thermographic object detection for underground duct cable inspection still requires further investigation.
2.3. Robot-Based Inspection in Underground Cable Environments
Robot-based inspection, as illustrated by the developed platform in Figure 2 and the inspection schematic in Figure 3, has become an effective way to reduce manual workload and improve safety in underground cable maintenance. Existing studies have developed cable trench and tunnel inspection robots with different system emphases. Yuan et al. investigated the modeling and simulation of a pipe-arranged cable inspection robot [19]. Jia et al. developed underground cable trench inspection robots based on SLAM and path planning [20,21]. Xia et al. further explored unmanned inspection systems for urban high-voltage cable tunnels using 3D SLAM and digital-twin techniques [22].
However, most existing robot-inspection studies emphasize platform design, navigation, mapping, or system integration, while relatively few works jointly consider thermographic defect detection, lightweight neural network design, and edge deployment validation for small-diameter duct inspection. For practical duct inspection robots, the detection model must satisfy strict requirements on latency, memory usage, and power consumption. This motivates the development of a lightweight thermal defect detector suitable for embedded robotic platforms. The robustness of the thermal imaging module against electromagnetic interference is further discussed in Section 4.4.
2.4. Deep Learning-Based Industrial Defect Detection
Deep learning has been widely used in industrial defect detection because of its strong feature extraction and end-to-end learning capability [23]. Object detection models are particularly suitable for defect localization, as they can simultaneously predict defect categories and bounding-box positions. Among them, YOLO-based detectors are widely adopted in real-time industrial inspection because of their high inference speed and convenient deployment pipeline [24,25].
Transformer-based and hybrid CNN–Transformer detectors have also shown promising localization performance in industrial defect detection [27]. However, these models often require higher computational cost and memory consumption than lightweight CNN-based detectors, which may limit their use on embedded robotic platforms. For underground duct cable inspection, detection accuracy alone is insufficient; model size, computational complexity, inference latency, memory usage, and power consumption must also be considered. Therefore, a task-specific lightweight detector is more suitable than directly applying a generic high-complexity model.
2.5. Research Gaps
In summary, existing studies have made progress in underground cable fault localization, infrared diagnosis of electrical equipment, robot-based cable inspection, and deep learning-based industrial defect detection. However, three gaps remain. First, conventional electrical diagnostic methods mainly provide signal-level or line-level information and are not designed for image-level localization of thermal defect regions inside cable ducts [9,10,11,12,13]. Second, thermographic fault detection for underground small-diameter duct cables remains insufficiently explored, and public datasets for this specific scenario are limited [14,15,16]. Third, existing underground cable robot studies rarely combine thermal defect detection, lightweight model design, and edge deployment validation in a unified framework [19,20,21,22].
To address these gaps, this study constructs a simulation-based Cable-Thermo dataset and proposes a lightweight YOLO26-Thermo framework for thermographic fault detection of underground duct cables. The proposed method aims to balance detection accuracy, computational complexity, and edge deployment efficiency, providing an algorithmic foundation for real-time robot-based inspection.
3. Materials and Methods
3.1. YOLO26 Overview
YOLO26 (2025) is an edge-oriented redesign of the YOLO family that emphasizes end-to-end (NMS-free) inference and robust outputs [26,28]. In addition to end-to-end inference and DFL removal, YOLO26 also incorporates training- and deployment-oriented optimization strategies such as ProgLoss, STAL, and MuSGD to improve convergence stability, localization robustness, and edge-oriented detection performance. Since these mechanisms are not the focus of this study, this section mainly introduces the architectural components directly related to the proposed YOLO26-Thermo modifications. The overall structure of YOLO26 is largely consistent with that of YOLO11 (Figure 4); the differences lie in the following aspects.
3.1.1. New Core Parameter Configurations Are Introduced
The parameter end2end: True is adopted to enable the native end-to-end inference mode, which can directly output detection results without NMS post-processing, thus simplifying the model export and deployment process. The parameter reg_max: 1 is added, and combined with the removal of the DFL layer, it realizes parametric control over the bounding box regression process, which alleviates the limitations of YOLO11 in positioning accuracy for small objects and objects with complex shapes.
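The effect of reg_max: 1 on box decoding can be illustrated with a minimal sketch. It assumes the usual DFL-style convention in which each box side is the expectation of a softmax distribution over reg_max integer bins; the function name `decode_side` and the sample values are hypothetical and are not taken from the YOLO26 code base.

```python
import math


def decode_side(logits):
    """Decode one box-side distance from raw head outputs.

    len(logits) plays the role of reg_max. With reg_max > 1 the side is the
    expectation of a softmax distribution over integer bins (DFL-style).
    With reg_max == 1 the single raw value is used directly, so no DFL
    layer is required.
    """
    if len(logits) == 1:
        return logits[0]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return sum(i * e / total for i, e in enumerate(exps))


# Illustrative values only: a uniform distribution over bins 0..3
# and a single directly regressed distance.
dfl_side = decode_side([0.0, 0.0, 0.0, 0.0])
direct_side = decode_side([2.5])
```

In the direct case the decoding step reduces to an identity mapping, which is one reason the exported graph becomes simpler.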
3.1.2. Improvements to the Backbone Structure
The SPPF module, which is critical for feature extraction, receives an extended parameter list. In YOLO11, the SPPF configuration only specifies the number of output channels and the pooling kernel size (SPPF, (1024, 5)); in YOLO26 it is extended to SPPF, (1024, 5, 3, True). Multiple rounds of pooling are added to capture features of different resolutions hierarchically. Meanwhile, shortcut residual connections are enabled to mitigate gradient vanishing and preserve the integrity of feature transmission (Figure 5).
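The idea of hierarchical pooling can be illustrated with a one-dimensional sketch. It assumes stride-1 max pooling with "same" padding, so that cascading three 5-wide pools yields effective windows of 5, 9, and 13 samples; the helper `max_pool_1d` and the test signal are hypothetical, and the real module operates on 2D feature maps.

```python
def max_pool_1d(values, k):
    """Stride-1 max pooling with 'same' padding (window clipped at the edges)."""
    r = k // 2
    n = len(values)
    return [max(values[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


signal = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]

# Three cascaded rounds of 5-wide pooling, as in an SPPF-style block.
p1 = max_pool_1d(signal, 5)
p2 = max_pool_1d(p1, 5)
p3 = max_pool_1d(p2, 5)

# Each extra round widens the effective window by 4 samples.
same_as_9 = p2 == max_pool_1d(signal, 9)
same_as_13 = p3 == max_pool_1d(signal, 13)

# An SPPF-style block concatenates the input with all pooled versions.
pyramid = [signal, p1, p2, p3]
```

Reusing the previous pooling result is cheaper than running three independent pools with large kernels, while covering the same receptive fields.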
3.1.3. Improvements to the C3k2 Module in the Head Structure
In the middle layers, the C3k2 modules of YOLO11 without feature enhancement enabled (C3k2, (512, False) and C3k2, (256, False)) are updated to C3k2, (512, True) and C3k2, (256, True), activating the feature enhancement mechanism to strengthen the discriminability of key features. In the last layer, the YOLO11 configuration (1024, True) is extended to (1024, True, 0.5, True); an expansion ratio is added to balance the model complexity and efficiency, and the attention mechanism is enabled simultaneously to improve the detection sensitivity for small targets and low-contrast targets.
3.1.4. Optimization of Network Depth and Complexity
Through parameter extension and the addition of core modules, the overall network depth and the variety of parameter configurations are increased. Feature processing is strengthened in key modules such as SPPF and C3k2, while the deployment pipeline is simplified through the end-to-end parameters. While remaining suitable for edge deployment, the network thereby compensates for the performance shortcomings of YOLO11 in complex scenarios.
3.1.5. Upgrade of Feature Fusion Strategy
The shallow feature fusion mode of YOLO11, which mainly relies on simple channel concatenation and convolution, is abandoned. Boolean flags set to True in the head network activate cross-channel feature interaction in the C3k2 module and residual feature transmission in the SPPF module. Combined with the expansion ratio and attention mechanism of the last-layer C3k2, a multi-level and multi-dimensional feature fusion mechanism is constructed to reduce information loss, strengthen the retention of critical target features, and improve detection performance for complex scenes and cross-scale objects.
3.2. Model Improvement
To better adapt the model to thermographic defect recognition for underground duct cables, we reduce the model size and computational complexity while sacrificing as little detection accuracy (mAP) as possible, so that accurate real-time detection can be achieved on edge devices.
3.2.1. Lightweight Upgrade of Backbone Network
To realize a lightweight model for adapting to edge hardware such as embedded robots and portable detection terminals, we first replace the native C3k2 module of YOLO26 with the C2f module (Figure 6).
Before presenting the feature-fusion process, the notation is defined as follows: $B$ is the batch size, $C_1$ is the number of input channels, $H$ and $W$ are the spatial dimensions, $c = e\,C_2$ is the number of hidden channels ($e$ is the expansion ratio, default 0.5), $\mathrm{Conv}_{1\times1}$ is the 1 × 1 convolution operation, $C_2$ is the final number of output channels, $n$ is the number of stacked blocks, and $\mathrm{DWS}$ is the depthwise separable convolution module.
The C2f module first splits the input features in the channel dimension to obtain two feature subsets $y_0, y_1 \in \mathbb{R}^{B \times c \times H \times W}$, then repeatedly applies the standard Bottleneck module (Concat fusion) to the split features, and completes the fusion of all intermediate processing results through channel dimension concatenation:
$$y_{i+1} = \mathrm{Bottleneck}_i(y_i), \quad i = 1, \dots, n, \qquad Y = \mathrm{Concat}(y_0, y_1, \dots, y_{n+1})$$
Finally, the fused features are mapped to the final number of output channels through a 1 × 1 convolution operation.
However, in this structure the list of intermediate features $y$ grows linearly with the number of Bottlenecks, so the channel count of the final Concat, $(2+n)c$, also grows linearly, and the storage and computation of intermediate features increase significantly. To further reduce the model size and computational cost while keeping the number of intermediate feature channels constant, we upgrade the C2f module to a customized CDA (C2f_DWS_Add) module (Figure 7). The CDA removes this intermediate Concat and instead uses Add fusion (a residual connection) to update $y_1$, with the number of channels kept constant at all times:
$$y_1 \leftarrow y_1 + \mathrm{DWS}_i(y_1), \quad i = 1, \dots, n$$
Finally, $y_1 \in \mathbb{R}^{B \times c \times H \times W}$ (the number of channels remains constant). Only $y_0$ and the updated $y_1$ are concatenated at the end:
$$Y = \mathrm{Conv}_{1\times1}\big(\mathrm{Concat}(y_0, y_1)\big)$$
which reduces the input width of the final 1 × 1 convolution from $(2+n)c$ to $2c$ channels and thus greatly reduces the computational complexity.
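The saving from replacing Concat fusion with Add fusion can be made concrete with a small parameter count. The sketch below assumes that each branch carries c hidden channels, that n blocks are stacked, and that bias and normalization parameters are ignored; the channel numbers are hypothetical and do not correspond to a specific layer of the proposed network.

```python
def conv_params(c_in, c_out, k=1):
    """Weights of a standard k x k convolution (bias and BN ignored)."""
    return k * k * c_in * c_out


def dws_params(c_in, c_out, k=3):
    """Depthwise k x k convolution followed by a pointwise 1 x 1 convolution."""
    return k * k * c_in + c_in * c_out


def fusion_width_concat(c, n):
    """C2f-style fusion: both split halves plus one output per block."""
    return (2 + n) * c


def fusion_width_add(c, n):
    """CDA-style fusion: one half is updated in place, so n does not matter."""
    return 2 * c


# Hypothetical layer: 128 hidden channels, 256 output channels, 3 blocks.
c, c_out, n = 128, 256, 3

c2f_final = conv_params(fusion_width_concat(c, n), c_out)
cda_final = conv_params(fusion_width_add(c, n), c_out)

standard_3x3 = conv_params(c, c, k=3)
separable_3x3 = dws_params(c, c, k=3)

print(c2f_final, cda_final)         # final 1x1 convolution weights
print(standard_3x3, separable_3x3)  # per-block 3x3 convolution weights
```

Under these assumptions the final 1 × 1 convolution shrinks by a factor of (2 + n) / 2, and each depthwise separable 3 × 3 block needs roughly one eighth of the weights of a standard 3 × 3 convolution.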
3.2.2. Speed-Up Optimization of Feature Enhancement Module
The SimSPPF module proposed by Dong Xiuhuan et al. achieved higher detection efficiency than the traditional SPPF module in their experiments [29]. It is therefore adopted here to replace the original SPPF module (Figure 8). SimSPPF simplifies the pooling process and the feature integration logic, and optimizes the data interaction path based on the feature correlation between basic layers. In the pooling stage, the direct mapping between basic-layer and high-level features is retained, which removes redundant intermediate steps of inter-layer feature conversion and lowers data transmission and computation overhead. In the feature integration stage, multi-scale features are aggregated through a similarity weighting strategy, which avoids the loss of key features in the fault area while shortening the feature processing path and improving computation speed [29]. This optimization helps avoid detection delay caused by the limited computing power of edge devices, suits the real-time requirements of dynamic working conditions such as mobile robotic inspection, and supports timely fault warning through efficient inter-layer feature transmission.
3.2.3. Targeted Adjustment of Detection Scale
In the present simulation-based study, most fault-induced thermal anomalies appear as spatially continuous medium- or large-scale regions. Tiny isolated thermal responses may originate from debris, local noise, wall reflection, or other background disturbances in practical duct environments. Therefore, the removal of the small-scale detection branch is treated as a deployment-oriented design choice rather than a field-validated physical conclusion. This assumption will be further evaluated using real duct thermographic data in future work.
Combined with the scale distribution characteristics of underground cable faults, redundant detection branches are pruned: the original small target detection module is removed, and only the medium and large two-scale branches (P4/16 (40 × 40), P5/32 (20 × 20)) are retained, corresponding to medium-scale fault-induced thermal fields and large-scale heat accumulation regions, which are consistent with the spatially continuous characteristics of thermographic temperature fields [30,31]. This design is expected to reduce sensitivity to small isolated thermal responses and to lower computational cost. In the present simulation-based evaluation, it maintains high detection accuracy, but its effectiveness in suppressing real duct interference should be further verified using field thermographic data [32].
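The reduction in candidate predictions from removing the small-scale branch can be estimated with simple arithmetic. The sketch assumes a 640 × 640 input, which is implied by the 40 × 40 grid at stride 16 and the 20 × 20 grid at stride 32, and assumes that the removed branch is the usual P3 branch at stride 8; `grid_cells` is a hypothetical helper.

```python
def grid_cells(img_size, strides):
    """Number of grid cells (candidate predictions) per detection stride."""
    return {s: (img_size // s) ** 2 for s in strides}


IMG_SIZE = 640  # assumed input resolution

full = grid_cells(IMG_SIZE, (8, 16, 32))  # P3, P4, P5
pruned = grid_cells(IMG_SIZE, (16, 32))   # P4 and P5 only

total_full = sum(full.values())
total_pruned = sum(pruned.values())
reduction = 1.0 - total_pruned / total_full

print(full)                          # {8: 6400, 16: 1600, 32: 400}
print(total_full, total_pruned)      # 8400 2000
print(f"{reduction:.1%} fewer candidate cells")
```

Under these assumptions the stride-8 grid alone accounts for about three quarters of all candidate cells, so pruning it removes most of the head-side workload.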
Compared with the original YOLO26 model (Figure 4), the improved model (Figure 9) has a more streamlined structure on the premise of ensuring detection accuracy, which greatly reduces the computational overhead and enhances the real-time response capability of the model. At the same time, it is expected to improve deployment efficiency and may enhance robustness to viewpoint and background variations, although its real-world robustness still requires validation using real duct thermographic data.
3.3. Optimization of Model Training Strategy
3.3.1. INT8 Quantization
The INT8 quantization scheme is adopted to reduce the model storage volume and inference delay. A calibration mechanism is enabled during training, and the quantization range is estimated with a min-max observer so that the clipping boundaries are controlled accurately and quantization accuracy is improved. Batch normalization layers are fused before quantization to reduce inter-layer redundancy, so that the model remains both efficient and reliable on embedded terminals [33,34].
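How a min-max observer turns calibration statistics into INT8 parameters can be sketched as follows. This is a generic affine (asymmetric) quantization example under stated assumptions, not the toolchain used in this study; the function names and calibration values are hypothetical.

```python
def minmax_qparams(values, qmin=-128, qmax=127):
    """Derive scale and zero point from the observed min/max of calibration data."""
    lo = min(min(values), 0.0)  # keep 0.0 exactly representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate case: all calibration values are zero
    zero_point = round(qmin - lo / scale)
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a float to INT8, clipping to the representable range."""
    return max(qmin, min(qmax, round(x / scale) + zero_point))


def dequantize(q, scale, zero_point):
    """Map an INT8 value back to an approximate float."""
    return (q - zero_point) * scale


# Hypothetical activations collected during calibration.
calibration = [-1.0, -0.2, 0.0, 0.7, 3.0]
scale, zero_point = minmax_qparams(calibration)

round_trip = [dequantize(quantize(v, scale, zero_point), scale, zero_point)
              for v in calibration]
max_error = max(abs(a - b) for a, b in zip(calibration, round_trip))
```

For values inside the observed range, the round-trip error is bounded by half a quantization step, which is why an accurate range estimate matters: outliers that inflate the range also inflate the step size.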
3.3.2. Data Preprocessing Optimization
Data preprocessing is tailored to the actual detection environment and device characteristics. Translation and rotation are retained to simulate slight position changes of the detection device inside the duct, which increases data diversity, reduces overfitting, and suits the dynamic scenario of mobile robotic inspection. Thermal-intensity perturbation (simulating differences in spectral response and thermographic configuration among thermal imagers) and multi-intensity noise (reproducing the interference of dust and water vapor in the duct) are added to narrow the gap between simulated data and actual imaging, so that the model adapts to complex working conditions. A multi-degree trapezoidal (perspective) transformation simulates the viewing perspective of the thermal imager at different positions and angles on the edge device, compensating for image distortion caused by placement deviation and improving the viewpoint robustness of the model.
The augmentation was applied to normalized thermal-intensity maps rather than arbitrary RGB color jittering. The perturbation was restricted to small monotonic intensity changes to mimic thermal-imager response variation, so that the relative temperature distribution and the physical meaning of the simulated thermal field were preserved.
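A monotonic intensity perturbation of this kind can be sketched as a small gain and gamma change applied to a normalized thermal map. The functional form, the parameter ranges, and the 3 × 3 sample map below are assumptions chosen for illustration; the exact perturbation used in this study is not specified here.

```python
import random


def perturb_intensity(thermal, gain, gamma):
    """Apply x -> clip(gain * x**gamma, 0, 1) to a normalized thermal map.

    For gain > 0 and gamma > 0 the mapping is monotonic, so the ranking of
    pixel intensities (hotter vs. colder) is preserved.
    """
    return [[min(1.0, max(0.0, gain * (v ** gamma))) for v in row]
            for row in thermal]


def sample_params(rng, max_gain_dev=0.05, max_gamma_dev=0.10):
    """Draw a small random gain and gamma around the identity mapping."""
    gain = 1.0 + rng.uniform(-max_gain_dev, max_gain_dev)
    gamma = 1.0 + rng.uniform(-max_gamma_dev, max_gamma_dev)
    return gain, gamma


rng = random.Random(0)
gain, gamma = sample_params(rng)

# Hypothetical normalized thermal map with a hot spot in the center.
thermal = [
    [0.10, 0.20, 0.15],
    [0.20, 0.90, 0.40],
    [0.15, 0.40, 0.25],
]
augmented = perturb_intensity(thermal, gain, gamma)
```

Because the mapping is monotonic, the hottest region of the simulated field remains the hottest region after augmentation, unlike arbitrary color jitter on a pseudo-color image.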
4. Cable-Thermo Dataset Construction
4.1. Motivation for Simulation-Based Dataset Construction
A reliable thermographic dataset is essential for training deep-learning-based fault detection models for underground duct cables. However, in practical power-grid operation and maintenance, real thermographic data of underground cables inside small-diameter ducts are extremely difficult to obtain. This difficulty is mainly caused by three factors.
First, underground power cable systems are usually managed by government departments, power-grid enterprises, or municipal infrastructure operators. Data acquisition in actual operating cable ducts is therefore subject to strict safety regulations, enterprise management rules, and access approval procedures. Directly collecting large-scale images from in-service underground cable ducts is difficult in both engineering and administrative practice.
Second, cable faults are low-probability events during normal operation. Severe defects such as conductor burnout, insulation damage, and structural failure rarely occur under controlled and observable conditions. Even when faults occur, emergency repair is usually prioritized over data collection, which makes it difficult to systematically record thermographic images before, during, and after fault development.
Third, different fault types have different thermal characteristics and development mechanisms. It is almost impossible to obtain sufficient real thermographic samples covering multiple fault categories, damage degrees, cable structures, and duct environments through field collection alone. In particular, soft faults such as sheath damage and hollow-type insulation-related defects are difficult to identify in the early stage and are rarely captured in real inspection records.
Therefore, this paper constructs a thermographic fault dataset based on thermoelectric coupling simulation following the workflow illustrated in Figure 10. The purpose of simulation is not to replace real-world inspection completely, but to provide controllable, repeatable, and physically interpretable thermal images for algorithm development under conditions where large-scale real fault data are unavailable.
4.2. Thermoelectric Coupling Simulation
The Cable-Thermo dataset was constructed using ANSYS 2025 R2 Workbench through thermoelectric coupling simulation. The simulation process considered the structural characteristics of underground duct cables and the heat-generation mechanism caused by abnormal electrical and material conditions. Equations (1)–(5) describe the main thermoelectric coupling mechanism used to generate the simulated thermal fields in ANSYS Workbench.
The main physical mechanism considered in the simulation was the coupling between electric conduction and heat transfer. The electric potential distribution in the cable conductor was governed by the steady-state current-conservation equation:
$$\nabla \cdot \mathbf{J} = 0, \qquad \mathbf{J} = -\sigma \nabla \varphi \tag{1}$$
where $\mathbf{J}$ is the current density, $\sigma$ is the electrical conductivity, and $\varphi$ is the electric potential. Therefore, the electric conduction equation can be written as
$$\nabla \cdot (\sigma \nabla \varphi) = 0 \tag{2}$$
The heat generated by current conduction was described by Joule heating:
$$Q = \mathbf{J} \cdot \mathbf{E} = \sigma \lvert \mathbf{E} \rvert^{2} \tag{3}$$
where $Q$ is the volumetric Joule heat source and $\mathbf{E} = -\nabla \varphi$ is the electric field intensity. The temperature field of the cable, surrounding air domain, and duct structure was then solved using the heat-transfer equation:
$$\rho c_p \frac{\partial T}{\partial t} = \nabla \cdot (k \nabla T) + Q \tag{4}$$
where $T$ is the temperature, $\rho$ is the density, $c_p$ is the specific heat capacity, and $k$ is the thermal conductivity. For steady-state thermal analysis, the transient term can be omitted. Heat transfer between the cable surface, surrounding air domain, and duct wall was described using conductive and convective boundary conditions. The convective heat-transfer boundary can be expressed as
$$-k \nabla T \cdot \mathbf{n} = h \,(T_s - T_\infty) \tag{5}$$
where $\mathbf{n}$ is the outward normal vector, $h$ is the convective heat-transfer coefficient, $T_s$ is the surface temperature, and $T_\infty$ is the ambient temperature.
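To give a feel for the orders of magnitude involved, the following lumped estimate combines Joule heating with a convective surface balance for one metre of cable. It is a strongly simplified illustration rather than a substitute for the finite-element model: heat conduction through the insulation and sheath is ignored, and all numerical values (current, cross-section, outer diameter, heat-transfer coefficient, ambient temperature) are hypothetical assumptions.

```python
import math


def joule_heat_per_metre(current, sigma, area):
    """Heat generated per metre of conductor, in W/m.

    The volumetric source is Q = J**2 / sigma with J = I / A, so the heat
    per unit length is Q * A = I**2 / (sigma * A).
    """
    current_density = current / area
    q_volumetric = current_density ** 2 / sigma
    return q_volumetric * area


def surface_temperature(heat_per_metre, h, outer_diameter, t_ambient):
    """Steady-state surface temperature from q' = h * pi * d * (T_s - T_amb)."""
    surface_per_metre = math.pi * outer_diameter
    return t_ambient + heat_per_metre / (h * surface_per_metre)


# Hypothetical operating point (all values are illustrative assumptions).
SIGMA_CU = 5.8e7      # S/m, approximate conductivity of copper
CURRENT = 400.0       # A
AREA = 240e-6         # m^2 conductor cross-section
DIAMETER = 0.05       # m outer cable diameter
H_CONV = 8.0          # W/(m^2 K), natural convection in still air
T_AMBIENT = 25.0      # degrees Celsius

q_normal = joule_heat_per_metre(CURRENT, SIGMA_CU, AREA)
t_normal = surface_temperature(q_normal, H_CONV, DIAMETER, T_AMBIENT)

# A local defect that leaves only 25 % of the conducting cross-section.
q_fault = joule_heat_per_metre(CURRENT, SIGMA_CU, 0.25 * AREA)
t_fault = surface_temperature(q_fault, H_CONV, DIAMETER, T_AMBIENT)
```

Even in this crude model, reducing the effective conducting cross-section to one quarter quadruples the local heat generation, which is consistent with burnout-type faults producing pronounced hot regions.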
A standard underground cable model was established according to the duct-laying scenario. The model included the conductor, insulation layer, filling layer, sheath, surrounding air domain, and duct structure. Material properties, including electrical conductivity, thermal conductivity, specific heat capacity, and density, were assigned to each component. Electrical and thermal boundary conditions were then applied to simulate the heat-transfer process during normal operation and fault conditions.
Five operating conditions were considered: normal operation, sheath damage, conductor burnout, hollow-type damage, and severe damage. These conditions correspond to different levels of structural and thermal abnormality. Sheath damage mainly affects the outer protective layer and usually produces relatively weak local thermal variation. Conductor burnout and severe damage generate more obvious high-temperature regions due to stronger electrical and thermal disturbance. Hollow-type damage represents a soft-fault condition with less obvious external appearance but a distinguishable internal thermal response.
To improve the physical reliability of the simulation results, mesh refinement was applied near the fault regions and cable interfaces. The boundary conditions and material parameters were kept consistent across different simulations except for the predefined fault-related changes. This ensured that the thermal differences among categories were mainly caused by the fault mechanisms rather than by random simulation settings.
4.3. Dataset Composition and Annotation
The annotation was performed manually using LabelMe based on the simulated thermographic images and the corresponding predefined fault locations in the ANSYS model. For each fault image, the bounding box was drawn to cover the main visible abnormal thermal region associated with the simulated defect. When a fault produced multiple separated abnormal thermal responses, multiple bounding boxes were assigned. Normal-operation images were used as reference samples for thermal-field comparison and were not assigned fault bounding boxes. The annotation files were saved in LabelMe JSON format. All annotations were checked to ensure consistency between the fault category, abnormal thermal region, and bounding-box location before dataset splitting. The final Cable-Thermo dataset contains 6200 thermographic images.
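Because YOLO-style training typically expects normalized text labels rather than LabelMe JSON, a conversion step is usually needed. The sketch below shows one possible conversion; it relies on the standard LabelMe fields `shapes`, `label`, `points`, `imageWidth`, and `imageHeight`, while the class-name strings and the sample record are hypothetical and may differ from the labels actually used in the Cable-Thermo dataset.

```python
import json

# Hypothetical label strings; the real dataset may use different names.
CLASS_NAMES = ["hollow_damage", "conductor_burnout", "sheath_damage", "severe_damage"]


def labelme_to_yolo(record, class_names):
    """Convert one LabelMe record into YOLO lines: 'cls cx cy w h' (normalized)."""
    width, height = record["imageWidth"], record["imageHeight"]
    lines = []
    for shape in record["shapes"]:
        xs = [p[0] for p in shape["points"]]
        ys = [p[1] for p in shape["points"]]
        x_min, x_max = min(xs), max(xs)
        y_min, y_max = min(ys), max(ys)
        cx = (x_min + x_max) / 2 / width
        cy = (y_min + y_max) / 2 / height
        w = (x_max - x_min) / width
        h = (y_max - y_min) / height
        cls = class_names.index(shape["label"])
        lines.append(f"{cls} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return lines


# Minimal illustrative record; a normal-operation image has an empty "shapes" list.
sample_json = """
{
  "imagePath": "example.png",
  "imageWidth": 640,
  "imageHeight": 480,
  "shapes": [
    {"label": "sheath_damage", "shape_type": "rectangle",
     "points": [[100, 200], [300, 400]]}
  ]
}
"""
fault_lines = labelme_to_yolo(json.loads(sample_json), CLASS_NAMES)
normal_lines = labelme_to_yolo(
    {"imageWidth": 640, "imageHeight": 480, "shapes": []}, CLASS_NAMES)
```

Taking the minimum and maximum over all points also handles polygon annotations, and an image without shapes simply yields an empty label list.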
Figure 11 shows the normal-operation reference case and representative thermographic examples of the four fault categories used in the Cable-Thermo dataset.
The normal operating condition was used as the physical reference for simulation and thermal field comparison, while the detection task focused on identifying and locating fault regions. Each image was annotated with the fault category and bounding box location according to the main abnormal thermal region.
The dataset was divided into training, validation, and testing subsets. Normal-operation samples were generated and used as physical references for thermal-field comparison, but they were not included as an independent detection class during model training and evaluation because this study focuses on fault-region localization. To reduce data leakage, images generated from the same simulation condition and similar viewpoint configuration were not simply shuffled at random. Instead, the dataset split considered fault type, thermal distribution pattern, and viewpoint variation to reduce excessive similarity between training and testing samples.
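One way to implement such a leakage-aware split is to assign whole groups of related images, rather than individual images, to each subset. The sketch below assumes that every image carries a fault type and a group identifier (for example, one simulation condition plus viewpoint configuration); the 70/20/10 ratios, the identifiers, and the synthetic sample list are hypothetical and are not the actual split of the Cable-Thermo dataset.

```python
import random
from collections import defaultdict


def group_split(samples, ratios=(0.7, 0.2, 0.1), seed=0):
    """Split (image_id, fault_type, group_id) samples without splitting a group.

    Groups are shuffled per fault type, so each subset stays balanced across
    categories and near-duplicate images never straddle two subsets.
    """
    by_type = defaultdict(lambda: defaultdict(list))
    for image_id, fault_type, group_id in samples:
        by_type[fault_type][group_id].append(image_id)

    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for fault_type in sorted(by_type):
        groups = sorted(by_type[fault_type])
        rng.shuffle(groups)
        n_train = round(ratios[0] * len(groups))
        n_val = round(ratios[1] * len(groups))
        parts = {
            "train": groups[:n_train],
            "val": groups[n_train:n_train + n_val],
            "test": groups[n_train + n_val:],
        }
        for name, selected in parts.items():
            for group_id in selected:
                splits[name].extend(by_type[fault_type][group_id])
    return splits


# Synthetic example: 4 fault types x 10 groups x 5 images per group.
FAULT_TYPES = ["hollow", "burnout", "sheath", "severe"]
samples = [(f"{ft}_{g}_{k}", ft, g)
           for ft in FAULT_TYPES for g in range(10) for k in range(5)]
group_of = {image_id: (ft, g) for image_id, ft, g in samples}

splits = group_split(samples)
groups_in = {name: {group_of[i] for i in ids} for name, ids in splits.items()}
```

Keeping groups intact trades a slightly less flexible ratio for a test set that does not contain near-copies of training images.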
The dataset was divided as shown in Table 1.
This balanced design ensures that each fault category contributes equally to the training and evaluation process. It also avoids performance bias caused by category imbalance.
4.4. Image Generation and Data Diversity
Although the dataset is simulation-based, several strategies were adopted to improve the diversity of the generated images and reduce the gap between ideal simulation results and practical thermographic inspection conditions.
First, the temperature field distribution was generated under different fault locations and damage degrees. This allowed the dataset to contain thermal abnormal regions with different sizes, shapes, and intensities.
Second, multiple imaging viewpoints were introduced to simulate the relative position changes between the thermal camera and the cable during robotic inspection. Since a pipeline inspection robot may not always maintain a perfectly centered observation angle, viewpoint variation is important for improving model robustness.
Third, image-level transformations were applied during training, including rotation, translation, perspective transformation, thermal intensity perturbation, and random noise injection. These operations were designed to imitate camera vibration, installation deviation, thermal imager response differences, and environmental interference inside cable ducts.
Unlike natural visible-light images, thermographic images mainly reflect temperature distribution rather than texture and color appearance. Therefore, data augmentation was carefully controlled to avoid destroying the physical meaning of the thermal field.
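The idea of physically controlled augmentation can be sketched for two of the listed operations, thermal intensity perturbation and random noise injection. The example is a minimal illustration, assuming 8-bit intensity images stored as nested lists; the function name `perturb_thermal` and all parameter ranges are hypothetical and are not the settings used in this study. Plain Python is used only to keep the sketch self-contained; an array library would normally be preferred.

```python
import random


def perturb_thermal(image, gain_range=(0.95, 1.05), offset_range=(-3.0, 3.0),
                    noise_std=1.5, seed=None):
    """Apply one global gain/offset plus additive Gaussian noise to a 2D intensity image.

    image: list of rows, each a list of intensities in [0, 255].
    All ranges are hypothetical illustration values.
    """
    rng = random.Random(seed)
    gain = rng.uniform(*gain_range)      # imitates imager response differences
    offset = rng.uniform(*offset_range)  # imitates a small calibration shift
    output = []
    for row in image:
        new_row = []
        for value in row:
            noisy = gain * value + offset + rng.gauss(0.0, noise_std)
            new_row.append(min(255.0, max(0.0, noisy)))  # keep the valid range
        output.append(new_row)
    return output


# Toy 2 x 2 "thermal field" used only to show the call.
augmented = perturb_thermal([[10, 200], [50, 120]], seed=0)
```

Using a single gain and offset for the whole image keeps the relative ordering of hot and cold regions, so the hotspot remains the hottest area; only the small noise term acts per pixel.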
It should be noted that the thermographic sensor used in the proposed inspection scheme passively measures infrared radiation emitted from the cable surface, and the image formation mechanism is mainly determined by the surface temperature field rather than by the electromagnetic signal itself. Therefore, electromagnetic disturbances caused by cable defects are not directly modeled as thermal-image features in this study. However, in practical robotic deployment, electromagnetic interference may affect the power supply, signal transmission, and onboard electronics of the thermal imaging module. For this reason, shielding, grounding, stable power supply, sensor calibration, and hardware-level electromagnetic compatibility tests are necessary for field deployment. In the present simulation-based study, robustness is considered mainly through thermal-intensity perturbation, noise injection, and viewpoint variation, while full hardware robustness will be further evaluated in future real-duct experiments.
4.5. Dataset Characteristics and Interpretation of High Accuracy
It should be noted that Cable-Thermo is a simulation-based dataset. This is a necessary compromise caused by the difficulty of obtaining real thermographic data from in-service underground cable ducts. Real underground cable fault data are restricted by enterprise management rules, government safety regulations, limited field access, and the low occurrence probability of cable faults.
Therefore, the dataset cannot fully cover all complex environmental factors in actual underground ducts, such as dust, water vapor, wall reflection, thermal camera noise, cable aging diversity, and uncertain robot motion. These factors may introduce a domain gap between simulation and real-world deployment.
To mitigate this limitation, this study focuses not only on detection accuracy but also on model lightweighting and deployment feasibility. The simulation dataset provides a controllable benchmark for comparing different detection models under the same fault mechanisms. Future work will gradually collect real inspection samples under approved engineering conditions and introduce domain adaptation or simulation-to-real transfer learning to further improve real-world generalization.
4.6. Role of the Dataset in This Study
The Cable-Thermo dataset serves as the experimental basis for evaluating thermographic fault detection algorithms under controlled physical conditions. Since the fault categories, thermal distributions, and annotation rules are generated from the same simulation framework, the dataset allows different models to be compared fairly.
The purpose of this dataset is not to claim complete replacement of real-world inspection data, but to provide a feasible and physically interpretable research foundation for underground cable thermographic fault detection, especially under the current condition that large-scale real fault datasets are unavailable.
5. Results
5.1. Experimental Environment
The experiments were conducted to evaluate both the detection accuracy and deployment feasibility of the proposed method for thermographic fault detection in underground duct cables. All models were trained and tested under the same experimental protocol to ensure a fair comparison.
The training experiments were performed on a workstation equipped with an NVIDIA RTX 4060 GPU and an AMD Ryzen 9 7940HX CPU. The software environment consisted of Python 3.9.10, PyTorch 2.5.1, CUDA 12.1, and cuDNN 9.1.0 acceleration. All input images were resized to 640 × 640 pixels. The batch size was set to 32, and each model was trained for 200 epochs. The initial learning rate was 0.01, and the optimizer settings were kept identical for all comparison methods. During training, the same data augmentation strategy was applied to all models, including translation, rotation, thermal intensity perturbation, random noise injection, and perspective transformation.
Considering that the target application is real-time inspection by pipeline robots, deployment experiments were further conducted on an NVIDIA Jetson Orin NX edge computing platform. The deployed models were exported to ONNX and accelerated using TensorRT. Both FP16 and INT8 inference modes were evaluated. The batch size was fixed to 1 during deployment testing to match the actual operating condition of robotic online inspection.
5.2. Compared Methods
To comprehensively evaluate the proposed model, several representative object detection models were selected as comparison methods. All YOLO-based models adopted the medium-scale configuration to ensure consistency in model capacity.
The compared models include YOLOv8m, YOLOv10m, YOLO11m, YOLOv12m, RT-DETR-R50, the original YOLO26m, and the proposed YOLO26-Thermo. YOLOv8m, YOLOv10m, and YOLO11m were selected as widely used CNN-based real-time detection baselines. YOLOv12m was introduced as a recent attention-enhanced real-time detector. RT-DETR-R50 was selected as a representative Transformer-based real-time detector to evaluate the difference between CNN-based and Transformer-based detection frameworks. The original YOLO26m was used as the direct baseline for verifying the effectiveness of the proposed structural modifications.
According to different deployment requirements, two variants of the proposed model were evaluated. YOLO26-Thermo-E adopts the CDA and SimSPPF modules while retaining the original detection scales and is designed for accuracy-prioritized scenarios. YOLO26-Thermo-H further removes the redundant small-scale detection branch and is designed for real-time edge deployment.
5.3. Evaluation Metrics
The detection accuracy was evaluated using precision, recall, mAP50, and mAP50–95. mAP50 reflects the basic defect recognition capability, while mAP50–95 provides a stricter evaluation of localization quality under different IoU thresholds. Since the dataset contains four defect categories with different fault severity levels, class-wise AP was also calculated to analyze the detection performance for each fault type.
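The difference between the two mAP metrics can be made concrete with a short sketch of the IoU test that underlies them. mAP50 counts a prediction as correct when its IoU with a ground-truth box is at least 0.50, whereas mAP50–95 averages AP over the ten thresholds 0.50, 0.55, ..., 0.95. The helper names and the example boxes below are illustrative only and are not taken from the evaluation code used in this study.

```python
def box_iou(a, b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds averaged by mAP50-95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred, truth):
    """Return the IoU thresholds at which pred would count as a true positive."""
    iou = box_iou(pred, truth)
    return [t for t in IOU_THRESHOLDS if iou >= t]


truth = (0, 0, 100, 100)
pred = (10, 0, 110, 100)  # same size, shifted 10 px to the right
print(round(box_iou(pred, truth), 3), thresholds_passed(pred, truth))
```

In this toy case a 10-pixel shift on a 100-pixel box still passes the 0.50 threshold but fails the three strictest ones, which is why a model can keep a high mAP50 while its mAP50–95 drops.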
The computational complexity was evaluated using the number of parameters and GFLOPs. For deployment evaluation, inference latency, frames per second, memory consumption, and power consumption were measured on the edge platform. These metrics jointly reflect whether the model can satisfy the real-time and low-power requirements of underground duct cable inspection robots.
6. Performance Evaluation
6.1. Detection Performance on Local GPU
Table 2, Figure 12, and Figure 13 present the detection performance of different models on the Cable-Thermo dataset using the RTX 4060 platform.
The results show that YOLO26-Thermo-E achieves the highest mAP50 among all compared models, reaching 99.20%. Compared with the original YOLO26m, it improves mAP50 by 0.10 percentage points, while reducing GFLOPs from 68.2 to 59.6 and parameters from 20.4 M to 18.6 M. This corresponds to a 12.6% reduction in computational complexity and an 8.8% reduction in model parameters, indicating that the CDA and SimSPPF modules improve feature extraction efficiency without weakening defect representation capability.
YOLO26-Thermo-H achieves the lowest computational cost and the highest inference speed on the RTX 4060 platform. Its GFLOPs are reduced to 44.8, and the number of parameters is reduced to 17.1 M. Compared with YOLO26m, the computational complexity decreases by 34.3%, while the FPS increases from 30.8 to 45.3, representing a 47.1% speed improvement. Although its mAP50–95 decreases to 92.20%, its mAP50 remains high at 99.00%, showing that it still maintains strong defect recognition capability. This indicates that YOLO26-Thermo-H is more suitable for deployment scenarios where inference speed and hardware resource consumption are more critical than extremely strict localization accuracy.
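The percentage changes quoted above follow directly from the reported GFLOPs, parameter counts, and FPS values. The short check below recomputes them from the numbers given in the text; the helper name `relative_change` is only an illustrative choice.

```python
def relative_change(before, after):
    """Percentage change from before to after (negative means a reduction)."""
    return (after - before) / before * 100.0


# Values reported in the text: YOLO26m -> YOLO26-Thermo-E.
e_gflops = relative_change(68.2, 59.6)
e_params = relative_change(20.4, 18.6)

# Values reported in the text: YOLO26m -> YOLO26-Thermo-H (RTX 4060).
h_gflops = relative_change(68.2, 44.8)
h_params = relative_change(20.4, 17.1)
h_fps = relative_change(30.8, 45.3)

for name, value in [("E GFLOPs", e_gflops), ("E params", e_params),
                    ("H GFLOPs", h_gflops), ("H params", h_params),
                    ("H FPS", h_fps)]:
    print(f"{name}: {value:+.1f}%")
```

The reductions are relative to the YOLO26m baseline, so the 34.3% figure for YOLO26-Thermo-H is not additive with the 12.6% figure for YOLO26-Thermo-E.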
RT-DETR-R50 obtains the highest mAP50–95, reaching 94.80%, which indicates that the Transformer-based detection framework has advantages in precise localization. However, its computational burden is much higher than that of YOLO-based models, with 126.0 GFLOPs, 42.1 M parameters, and only 15.9 FPS on the RTX 4060 platform. Therefore, although RT-DETR-R50 performs well in localization accuracy, its high computational cost limits its suitability for lightweight robotic deployment.
6.2. Class-Wise AP Comparison on Cable-Thermo Dataset
To further analyze the recognition capability of different models for different defect types, class-wise AP was calculated for the four cable fault categories, as shown in Table 3 and Figure 14.
Among the four categories, hollow-type damage and sheath damage are relatively more difficult to detect because they usually present weaker thermal contrast and less obvious boundary characteristics. YOLO26-Thermo-E achieves the best performance on these two categories, indicating that CDA and SimSPPF can strengthen fault-related thermal feature representation.
YOLO26-Thermo-H maintains high AP for severe damage and conductor burnout because these hard faults usually produce more concentrated and distinct thermal abnormal regions. However, the AP of hollow-type damage and sheath damage is slightly lower than that of YOLO26-Thermo-E. This is mainly because the removal of the small-scale detection branch weakens the representation of subtle local thermal changes. Therefore, YOLO26-Thermo-E is more suitable for accuracy-prioritized diagnosis, while YOLO26-Thermo-H is more suitable for real-time robotic inspection.
6.3. Edge Deployment Performance
To verify the feasibility of practical deployment, the models were tested on the Jetson Orin NX platform. The inference batch size was set to 1, and TensorRT acceleration was adopted.
Table 4 and Figure 15 present the edge deployment results.
The edge deployment results show that YOLO26-Thermo-H achieves the best real-time performance among FP16 models on the Jetson Orin NX platform. Under FP16 inference, it reaches 34 FPS, with an average latency of 29.4 ms per image. After INT8 quantization, the inference speed further increases to 45 FPS, and the latency decreases to 22.2 ms. This performance satisfies the computational real-time requirement for the proposed robotic inspection scenario under the tested edge-deployment conditions.
Compared with YOLO26m, YOLO26-Thermo-H improves the FP16 inference speed from 18 FPS to 34 FPS, representing an 88.9% increase. Meanwhile, the latency decreases from 55.6 ms to 29.4 ms, power consumption decreases from 17.8 W to 14.8 W, and memory usage decreases from 1.1 GB to 0.8 GB. These results confirm that the proposed lightweight structural design can effectively reduce hardware resource consumption on edge devices.
RT-DETR-R50 shows a clear disadvantage in edge deployment. Although it has strong localization ability on the local GPU, its FPS is only 7 on Jetson Orin NX, with a latency of 146.8 ms, power consumption of 22.3 W, and memory usage of 2.8 GB. This indicates that Transformer-based detectors still face deployment challenges in resource-constrained robotic inspection scenarios.
YOLOv10m also shows competitive edge performance because of its relatively lightweight structure, reaching 20 FPS with 0.9 GB memory usage. However, compared with YOLO26-Thermo-H, its FPS is lower and its latency is higher. Therefore, YOLO26-Thermo-H provides a better balance between detection accuracy, inference speed, power efficiency, and memory consumption.
Although the inspection robot moves slowly in a cable duct, a higher inference frame rate is still useful for robotic deployment. First, it provides sufficient processing margin for onboard image acquisition, model inference, data storage, communication, and possible multi-sensor fusion. Second, continuous frame-level detection can increase the spatial overlap between adjacent observations and reduce the risk of missing weak or short-duration thermal anomalies caused by camera vibration or local viewpoint changes. Third, the reported 34 FPS under FP16 does not imply that the robot must operate at this image acquisition rate; rather, it indicates that the detector has enough computational reserve for real-time operation under resource-constrained edge hardware. Therefore, 34 FPS should be interpreted as an edge-deployment capability margin rather than a strict requirement imposed by the slow motion of the robot.
6.4. Effect of INT8 Quantization
To evaluate the influence of quantization on deployment performance, YOLO26-Thermo-H was tested under FP16 and INT8 inference modes. Compared with FP16 inference, INT8 quantization increases FPS from 34 to 45, corresponding to a 32.4% improvement. The latency decreases from 29.4 ms to 22.2 ms, representing a 24.5% reduction. Meanwhile, memory usage decreases from 0.8 GB to 0.7 GB, and power consumption decreases from 14.8 W to 12.2 W.
The improvement demonstrates that INT8 quantization can reduce computational and memory overhead on edge hardware. Therefore, INT8 deployment is especially suitable for long-duration robotic inspection tasks where power consumption, real-time response, and limited onboard memory are critical.
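The relationship between the reported frame rates and latencies at batch size 1 can be checked with a few lines of arithmetic. The sketch below also derives an energy-per-frame figure, which is not reported in this paper. It is an illustrative quantity that assumes the reported power is the average draw while the detector runs continuously at the reported frame rate, and the function names are hypothetical.

```python
def latency_ms(fps):
    """Per-image latency implied by a frame rate at batch size 1."""
    return 1000.0 / fps


def energy_per_frame_j(power_w, fps):
    """Energy per processed frame in joules.

    Assumption: power_w is the average power while running at fps.
    """
    return power_w / fps


# YOLO26-Thermo-H on Jetson Orin NX, values reported in the text.
fp16 = {"fps": 34, "power_w": 14.8}
int8 = {"fps": 45, "power_w": 12.2}

for name, cfg in (("FP16", fp16), ("INT8", int8)):
    print(name,
          f"{latency_ms(cfg['fps']):.1f} ms",
          f"{energy_per_frame_j(cfg['power_w'], cfg['fps']):.3f} J/frame")
```

Under that assumption, INT8 lowers both the time and the energy spent per frame, which is the combination that matters for battery-powered, long-duration inspection runs.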
6.5. Discussion
The experimental results demonstrate that the proposed method provides two practical configurations for different deployment requirements. YOLO26-Thermo-E achieves the best detection accuracy, with the highest mAP50 of 99.20%, and is suitable for scenarios where diagnostic reliability is prioritized, such as offline inspection, model-assisted fault review, or deployment on devices with sufficient computing resources. YOLO26-Thermo-H achieves the best computational efficiency and edge inference speed, making it more suitable for real-time inspection robots operating in underground duct environments.
The comparison with YOLOv8m, YOLOv10m, YOLO11m, YOLOv12m, and RT-DETR-R50 shows that the proposed model is not only effective relative to its direct baseline YOLO26m, but also competitive against recent CNN-based and Transformer-based real-time detectors. In particular, YOLO26-Thermo-H achieves the highest FPS on both RTX 4060 and Jetson Orin NX, while maintaining high detection accuracy. This verifies the effectiveness of combining CDA, SimSPPF, and detection-scale adjustment for underground cable thermographic fault detection.
The results also show that model selection should depend on the target application. For subtle thermal defects such as hollow-type damage and sheath damage, retaining all detection scales is beneficial for preserving local thermal details. Therefore, YOLO26-Thermo-E is preferred when fault sensitivity is the main concern. For robotic online inspection, where inference speed, power consumption, and memory usage are constrained, YOLO26-Thermo-H provides a more suitable deployment solution. In particular, its INT8 version reaches 45 FPS, 22.2 ms latency, 12.2 W power consumption, and 0.7 GB memory usage on Jetson Orin NX, showing strong potential for real-time embedded deployment.
YOLO26-Thermo achieves high detection accuracy on the Cable-Thermo dataset. This result is mainly attributed to two factors. First, the simulated thermal images are generated under controlled thermoelectric coupling conditions, which ensure clear physical consistency between fault types and thermal distributions. Second, the dataset is balanced across defect categories, reducing the influence of class imbalance on model training. However, because real duct environments may introduce additional interference such as dust, water vapor, wall reflection, sensor noise, and cable aging diversity, the reported mAP should be regarded as the performance under controlled simulation conditions rather than a direct estimate of field performance.
Overall, the proposed method achieves a favorable balance between detection accuracy and edge deployment efficiency. YOLO26-Thermo-E provides the highest mAP50, while YOLO26-Thermo-H, especially under INT8 inference, provides the best real-time performance and resource efficiency. These results provide an algorithmic foundation for real-time thermographic inspection of underground duct cables.
7. Ablation Study
To clarify the individual contributions and synergistic effects of the three core modules in the improved YOLO26-Thermo model, namely CDA (C2f_DWS_Add), ML (detection-scale adjustment that removes the small-object detection branch), and SimSPPF, and to verify their roles in balancing lightweight design and detection accuracy, a series of ablation experiments was designed and conducted.
The experiments took the original YOLO26 as the baseline (Group A), and Groups B, D, F, and H were set as control groups. The individual optimization effects of each module, the synergistic mechanism between modules, and the differences in accuracy (mAP50, mAP50–95) and engineering performance (GFLOPs, number of parameters) under different configurations were investigated, and the experimental data were recorded (Table 5).
As shown in Table 5, CDA provides an effective lightweight feature extraction design. Compared with the baseline YOLO26, introducing CDA reduces GFLOPs from 68.2 to 59.6 and parameters from 20.4 M to 18.6 M, while only causing a slight decrease in mAP50 from 99.1% to 98.9%. This indicates that CDA can reduce computational cost with limited influence on detection accuracy.
The ML strategy further reduces computational complexity by removing the small-scale detection branch. When ML is used alone, GFLOPs decrease from 68.2 to 53.1 and parameters decrease from 20.4 M to 18.9 M. However, mAP50–95 drops from 94.0% to 91.8%, indicating that removing the small-scale branch weakens fine localization and subtle thermal anomaly detection. When CDA and ML are directly combined, the model becomes much lighter, with GFLOPs reduced to 44.8 and parameters reduced to 17.1 M, but the accuracy degradation becomes more obvious. This suggests that lightweight operations require an additional feature compensation mechanism.
SimSPPF serves as such a compensation module. When CDA and SimSPPF are combined without scale pruning, the model achieves the highest mAP50 of 99.2% while reducing GFLOPs to 59.6 and parameters to 18.6 M. This configuration corresponds to YOLO26-Thermo-E and is suitable for accuracy-prioritized diagnosis. More importantly, when SimSPPF is added to the CDA + ML configuration, mAP50 increases from 96.8% to 99.0% and mAP50–95 increases from 87.5% to 92.2%, while GFLOPs and parameters remain unchanged. This demonstrates that SimSPPF can effectively compensate for the feature representation loss caused by lightweight pruning.
Overall, the ablation results verify the complementary roles of the three components. CDA reduces computational cost, ML provides deployment-oriented scale pruning, and SimSPPF compensates for the accuracy loss caused by lightweight design. Therefore, two final variants are adopted in this study: YOLO26-Thermo-E, which combines CDA and SimSPPF for higher detection accuracy, and YOLO26-Thermo-H, which combines CDA, SimSPPF, and ML for real-time edge deployment.
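The compensation effect of SimSPPF can be read off as simple differences between ablation rows. The sketch below uses only the configurations for which the text above states all four values (the baseline, CDA + ML, and CDA + ML + SimSPPF); the `AblationRow` container and the `delta` helper are illustrative names and are not part of the original experiments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AblationRow:
    name: str
    gflops: float
    params_m: float
    map50: float      # percent
    map50_95: float   # percent


# Values as stated in the text of this section.
baseline = AblationRow("YOLO26", 68.2, 20.4, 99.1, 94.0)
cda_ml = AblationRow("CDA + ML", 44.8, 17.1, 96.8, 87.5)
full = AblationRow("CDA + ML + SimSPPF", 44.8, 17.1, 99.0, 92.2)


def delta(before, after):
    """Absolute change in each metric when moving from before to after."""
    return {
        "gflops": round(after.gflops - before.gflops, 1),
        "params_m": round(after.params_m - before.params_m, 1),
        "map50": round(after.map50 - before.map50, 1),
        "map50_95": round(after.map50_95 - before.map50_95, 1),
    }


print("SimSPPF added to CDA + ML:", delta(cda_ml, full))
print("Full H variant vs baseline:", delta(baseline, full))
```

Read this way, adding SimSPPF to CDA + ML recovers 2.2 points of mAP50 and 4.7 points of mAP50–95 at no change in GFLOPs or parameters, leaving the H variant 0.1 and 1.8 points below the baseline on the two metrics.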
8. Confusion Matrix and Error Analysis
The detection confusion matrices of YOLO26, YOLO26-Thermo-E, and YOLO26-Thermo-H are compared to further analyze the recognition behavior of different models in Figure 16. The confusion matrices were calculated at the bounding-box instance level rather than the image level; therefore, the number of entries may exceed the number of test images because some images contain multiple annotated fault regions. Since the confusion matrix in object detection includes the background class, the defect-to-background entries indicate missed detections, while the background-to-defect entries indicate false alarms.
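How the background row and column are read can be shown with a small sketch. It assumes a matrix whose rows are true classes and whose columns are predicted classes, with background as the last index; some toolkits plot the transpose, so the orientation should be checked before reuse. The two-class matrix and its counts are hypothetical and are not taken from Figure 16.

```python
def detection_errors(matrix, num_classes):
    """Summarize a detection confusion matrix with background as the last index.

    matrix[i][j] = number of instances with true class i predicted as class j
    (assumed orientation; background is index num_classes).
    """
    bg = num_classes
    classes = range(num_classes)
    return {
        "correct": sum(matrix[i][i] for i in classes),
        "missed": sum(matrix[i][bg] for i in classes),        # defect -> background
        "false_alarms": sum(matrix[bg][j] for j in classes),  # background -> defect
        "inter_class": sum(matrix[i][j] for i in classes for j in classes if i != j),
    }


# Hypothetical counts for two defect classes plus background.
toy = [
    [50, 2, 5],   # true class 0
    [1, 40, 4],   # true class 1
    [3, 2, 0],    # true background
]
print(detection_errors(toy, num_classes=2))
```

Separating the four totals in this way makes it easy to see whether residual errors come from confusing one defect type with another or from foreground-background discrimination.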
Overall, the three models show strong discrimination among the four cable defect categories. Most samples are concentrated on the diagonal, and only limited inter-class confusion is observed among cable hollow-type damage, cable conductor burnout, cable sheath damage, and severe damage. This indicates that the thermal patterns of different simulated fault types are distinguishable and can be effectively learned by the detectors. Therefore, the remaining errors are mainly caused by foreground-background discrimination rather than confusion among different defect categories.
For the original YOLO26 model, the diagonal entries for the four defect categories are 203, 345, 189, and 206, respectively. The total number of correctly detected defect instances is 943. The model also produces 91 defect-to-background errors and 18 background-to-defect errors. These results show that the original YOLO26 provides a strong and balanced baseline for thermographic cable fault detection, but it still suffers from missed detections when the thermal response is weak or the defect boundary is ambiguous.
YOLO26-Thermo-E achieves the highest total number of correctly detected defect instances, with 945 diagonal detections. Compared with the original YOLO26, it slightly improves the correct detections of hollow-type damage, sheath damage, and severe damage. This indicates that the CDA and SimSPPF modules can maintain or enhance defect feature representation while reducing model complexity. Since YOLO26-Thermo-E retains all detection scales, it is more suitable for accuracy-prioritized diagnosis and detailed fault review. However, some defect instances are still classified as background, especially for hollow-type damage and severe damage. This suggests that even the accuracy-oriented variant may still miss samples with weak thermal contrast, unclear boundaries, or unstable thermal distributions.
YOLO26-Thermo-H obtains 942 correctly detected defect instances, which is very close to the original YOLO26. Although it removes the small-scale detection branch, its recognition ability for the main defect categories is largely preserved. For example, its correct detection number for cable conductor burnout is the same as that of YOLO26. This result indicates that the medium- and large-scale detection branches are sufficient to capture most fault-induced thermal anomalies in the Cable-Thermo dataset. More importantly, YOLO26-Thermo-H achieves comparable detection behavior with significantly lower computational complexity, making it more suitable for edge-oriented robotic inspection.
The comparison also clarifies the different roles of YOLO26-Thermo-E and YOLO26-Thermo-H. YOLO26-Thermo-E focuses on complete multi-scale representation and diagnostic sensitivity, whereas YOLO26-Thermo-H focuses on real-time screening and edge deployment efficiency. In underground duct cable inspection, real cable faults usually generate spatially continuous medium- or large-scale abnormal temperature fields through heat conduction, while small isolated thermal responses may be caused by duct debris, local noise, wall reflection, or other background interference. Therefore, removing the small-scale detection branch in YOLO26-Thermo-H is not only a lightweight design but also an application-oriented adjustment for suppressing excessive sensitivity to small isolated responses. Nevertheless, because a small number of missed detections and false alarms remain in all three models, the high mAP values should be interpreted as strong performance under controlled simulation conditions rather than perfect detection capability in real-world duct environments.
In summary, the confusion matrix analysis demonstrates that the proposed YOLO26-Thermo variants preserve the strong category discrimination ability of the original YOLO26. YOLO26-Thermo-E is more suitable for accuracy-prioritized thermographic diagnosis, while YOLO26-Thermo-H provides a better trade-off between detection capability and edge deployment efficiency. The main challenge for all models lies in foreground-background separation, especially for weak thermal anomalies and background disturbances, which will be further addressed by incorporating real-scene samples and simulation-to-real adaptation in future work.
9. Conclusions
This study proposed a lightweight thermographic fault detection framework for underground duct cables based on an improved YOLO26 model. The work was motivated by the practical difficulty of inspecting small-diameter underground cable ducts, where manual inspection is labor-intensive and hazardous, while conventional electrical diagnostic methods mainly provide signal-level or line-level fault information rather than image-level localization of thermal defect regions. To support model development under the condition that real thermographic fault samples are difficult to obtain, a Cable-Thermo dataset was constructed using thermoelectric coupling simulation. The dataset provides physically interpretable thermal images for four representative defect categories, including hollow-type damage, conductor burnout, sheath damage, and severe damage.
To meet different application requirements, two variants of the proposed YOLO26-Thermo model were developed. YOLO26-Thermo-E integrates the CDA and SimSPPF modules while retaining the original detection scales. It is designed for accuracy-prioritized scenarios where subtle thermal anomalies and weak defect boundaries need to be preserved. The experimental results show that YOLO26-Thermo-E achieves the highest mAP50 of 99.20%, indicating that the proposed feature extraction and fusion improvements can enhance thermographic defect representation while reducing model complexity.
YOLO26-Thermo-H is designed for real-time robotic inspection and edge deployment. In the present simulation dataset, and consistent with the heat-conduction characteristics of cable faults, most fault-induced thermal anomalies appear as spatially continuous medium- or large-scale regions. Based on this observation, YOLO26-Thermo-H removes the small-scale detection branch and focuses on medium- and large-scale thermal anomaly regions, although the underlying assumption still needs to be validated using real duct thermographic data. This design not only reduces the risk of false detections caused by small-scale thermal interference but also significantly decreases computational cost. Compared with the original YOLO26 model, YOLO26-Thermo-H reduces GFLOPs by 34.3% and parameters by 16.2%, while maintaining an mAP50 of 99.00%. Although its mAP50–95 is lower than that of the accuracy-prioritized variant, the overall detection performance remains competitive for real-time fault screening.
Edge deployment experiments further verified the practical feasibility of the proposed method. On the NVIDIA Jetson Orin NX platform, YOLO26-Thermo-H achieved 34 FPS with an average latency of 29.4 ms under FP16 inference. After INT8 quantization, the inference speed increased to 45 FPS and the latency decreased to 22.2 ms, while memory usage and power consumption were further reduced. These results indicate that YOLO26-Thermo-H has the computational feasibility for robot-mounted edge deployment under the tested hardware and simulation-based evaluation conditions. Field validation is still required before practical inspection performance can be confirmed. In contrast, YOLO26-Thermo-E is more suitable for offline diagnosis, detailed fault review, or edge platforms with relatively sufficient computing resources. Therefore, the proposed framework provides two complementary configurations: YOLO26-Thermo-E for higher diagnostic accuracy and YOLO26-Thermo-H for real-time lightweight deployment.
Despite these improvements, this study still has several limitations. First, the Cable-Thermo dataset is mainly constructed through thermoelectric coupling simulation. Although simulation enables controllable, repeatable, and physically interpretable dataset generation, it cannot fully reproduce all real-world factors in underground ducts, such as dust, water vapor, wall reflection, cable aging diversity, camera vibration, sensor noise, and uncertain robot motion. Second, the current model evaluation is mainly based on simulated thermographic images and edge inference experiments. More real-scene thermal images collected from actual inspection environments are needed to further verify model robustness and simulation-to-real generalization. Third, removing the small-scale detection branch in YOLO26-Thermo-H improves deployment efficiency and suppresses small-scale interference, but it may weaken the detection of extremely subtle early-stage thermal defects. Therefore, the choice between YOLO26-Thermo-E and YOLO26-Thermo-H should depend on the target application: E is preferred for high-sensitivity diagnosis, while H is preferred for real-time robotic inspection under strict hardware constraints.
Future work will focus on three aspects. First, real thermographic samples from underground duct cable inspection will be gradually collected to expand the dataset and evaluate the domain gap between simulation and field deployment. Second, simulation-to-real transfer learning, domain adaptation, and physics-guided data augmentation will be introduced to improve model generalization under real duct conditions. Third, multi-sensor fusion will be explored by combining thermal images with temperature, electrical, positional, or environmental sensing data, thereby improving the reliability and interpretability of underground cable fault diagnosis. Overall, the proposed YOLO26-Thermo framework provides a feasible algorithmic foundation for real-time, robot-based thermographic fault detection of underground duct cables under controlled simulation conditions. The reported results should be regarded as preliminary evidence for algorithm development and edge-deployment feasibility rather than field-validated performance.