Article

A Confidence Calibration Based Ensemble Method for Oriented Electrical Equipment Detection in Thermal Images

Ying Lin, Zhuangzhuang Li, Bo Song, Ning Ge, Yiwei Sun and Xiaojin Gong
1 State Grid Shandong Electric Power Research Institute, Jinan 250002, China
2 State Grid Shandong Electric Power Company, Jinan 250013, China
3 College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China
* Author to whom correspondence should be addressed.
Energies 2025, 18(12), 3191; https://doi.org/10.3390/en18123191
Submission received: 6 May 2025 / Revised: 27 May 2025 / Accepted: 3 June 2025 / Published: 18 June 2025
(This article belongs to the Section F5: Artificial Intelligence and Smart Energy)

Abstract

Detecting oriented electrical equipment plays a fundamental role in enabling intelligent defect diagnosis in power systems. However, existing oriented object detection methods each have their own limitations, making it challenging to achieve robust and accurate detection under varying conditions. This work proposes a model ensemble approach that leverages the complementary strengths of two representative detectors—Oriented R-CNN and S2A-Net—to enhance detection performance. Recognizing that discrepancies in confidence score distributions may negatively impact ensemble results, this work first designs a calibration method to align the confidence levels of predictions from each model. Following calibration, a soft non-maximum suppression (Soft-NMS) strategy is employed to fuse the outputs, effectively refining the final detections by jointly considering spatial overlap and the calibrated confidence scores. The proposed method is evaluated on an infrared image dataset for electric power equipment detection. Experimental results demonstrate that our approach not only improves on each individual model, raising the mean Average Precision (mAP) by 1.9 points over the stronger baseline, but also outperforms other state-of-the-art methods.

1. Introduction

Equipment inspection plays a crucial role in ensuring the safety, reliability, and efficiency of electrical systems. Through regular inspections, potential hazards, such as physical damage and insulation degradation, can be promptly identified and addressed, thereby reducing the risk of equipment failure, minimizing unexpected downtime, and preventing potential safety incidents. In modern electrical equipment inspection, thermal imaging cameras have been widely used due to their non-contact nature, which significantly enhances inspector safety while maintaining operational continuity. These infrared cameras capture detailed thermal images that reveal precise temperature distribution patterns across electrical components, enabling maintenance teams to conduct comprehensive thermal analysis and identify potential anomalies. For accurate equipment status diagnosis, a fundamental step is to detect electrical equipment in thermal images, which serves as the foundation for subsequent thermal pattern analysis and fault diagnosis.
To date, a variety of methods, including segmentation-based approaches [1,2,3] and object detection-based techniques [4,5,6], have been developed for the detection of electrical equipment in thermal images. While segmentation-based methods excel at identifying equipment regions, they are often susceptible to background interference, which can disrupt the precise localization of component centerlines—a crucial aspect for some subsequent diagnostic tasks. On the other hand, object detection methods typically utilize upright bounding boxes to identify equipment components. However, due to the handheld capture manner commonly employed during routine inspections, equipment may appear in various orientations, as shown in Figure 1. As a result, these upright bounding boxes can incorporate excessive background information, introducing noise and potentially compromising the accuracy of subsequent diagnostics. For these reasons, this work opts to use oriented object detection methods to identify power equipment more accurately.
Although numerous oriented object detection methods, including both one-stage [7,8,9,10] and two-stage approaches [11,12,13,14], have been developed, their performance in detecting electrical equipment within complex environments remains suboptimal. As illustrated in Figure 2, challenges such as variations in component sizes and background interference frequently lead to issues like missed detections and false positives. The localizations of oriented bounding boxes also differ across methods. These limitations significantly hinder the practical effectiveness of these methods in real-world electrical equipment detection applications.
Motivated by the observation that different oriented object detection methods exhibit unique strengths and complementary characteristics, this work proposes an ensemble approach that integrates two representative detectors: a two-stage method, Oriented R-CNN [15], which excels at producing detections with accurate localization, and a one-stage method, S2A-Net [8], which demonstrates strong performance in detecting dense and small objects with higher precision. However, directly ensembling the detection results from these two models leads to degraded performance. This is primarily due to inconsistencies in the confidence scores of the predictions, as illustrated in Figure 3. These inconsistencies may arise from differences in model architectures, training objectives, and output distributions. To address this issue, our method employs a confidence calibration method [16,17] to align the predicted confidence levels with their true likelihoods. Once calibrated, it ensembles the results using a post-processing strategy based on soft non-maximum suppression, which further refines the final detections by considering the overlap and confidence of bounding boxes in a more nuanced manner. The effectiveness of our proposed ensemble framework has been validated on a dataset consisting of infrared images for electric power equipment detection, demonstrating considerable improvements over individual models in terms of both precision and recall.

2. Related Works

2.1. Electrical Equipment Detection

Electrical equipment detection is a fundamental step towards the intelligent diagnosis of power system conditions. Consequently, various deep learning-based techniques have been employed for equipment detection. Existing methods can be broadly categorized into segmentation-based and object-detection-based approaches. For instance, Yan et al. [1] and Meng et al. [2] adopted Mask R-CNN [18] for segmenting power equipment, while Tang et al. [3] utilized DeeplabV3+ to segment equipment structures susceptible to thermal faults. Although segmentation-based methods excel at detecting equipment regions, they are often sensitive to background interference, particularly when pinpointing the centerline of equipment—a key element for certain diagnostic tasks.
In object-detection-based methods, Han et al. [4] utilized MobileNet for feature extraction and devised a region of interest (ROI) selection method to identify equipment. Zhang et al. [5] employed Faster R-CNN for electrical equipment detection, while Qi et al. [6] implemented a lightweight model of the Single Shot MultiBox Detector (SSD) [19] for real-time detection of electrical apparatus. Most of these methods detect equipment using axis-aligned bounding boxes, which often encompass excessive background regions, potentially interfering with subsequent anomaly diagnostics. An exception is the work by Gong et al. [20], who developed an oriented YOLO [21] model to detect slanted electrical equipment. Our study similarly focuses on oriented equipment detection.

2.2. Oriented Object Detection

In many real-world scenarios, such as remote sensing [12] and power inspection [20], objects of interest often appear in tilted or skewed orientations due to the shooting angle. Object detection using upright bounding boxes may therefore include excessive noisy background, making oriented object detection, which encloses objects more tightly, a topic of significant interest. So far, various deep learning-based methods have been developed to detect oriented objects. For example, a set of methods, including RRPN [11], RoI Transformer [12], ReDet [13], Gliding Vertex [14] and Oriented R-CNN [15], have built their frameworks on two-stage detectors like Faster R-CNN, generating either horizontal or oriented region proposals for oriented bounding box regression. Other methods, like RetinaNet [7], S2A-Net [8], R3Det [9] and RTMDet [10], design their frameworks as one-stage or anchor-free detectors, directly producing oriented bounding boxes for efficiency. Additionally, some approaches [22] have introduced new losses, such as the Pixels-IoU (PIoU) loss, which leverages both the angle and the Intersection over Union (IoU) for precise oriented bounding box regression.
Unlike the aforementioned methods that focus on designing new network architectures or loss functions, our work observes that different methods perform variably across scenarios and may complement each other. This work proposes an ensemble approach of multiple oriented object detection methods to enhance detection performance for electrical equipment. Specifically, it ensembles two different models, Oriented R-CNN [15] and S2A-Net [8], which were selected for their complementary strengths in handling different aspects of oriented object detection. Oriented R-CNN demonstrates superior performance in generating high-quality oriented proposals, while S2A-Net shows strong performance in detecting densely packed or small objects. Their complementary capabilities enable our work to achieve high oriented object detection performance. Notably, our ensemble method can be applied to any number of other models as well, offering flexibility for various detection scenarios and computational requirements.

2.3. Ensemble Learning

Ensemble learning, a methodology that combines multiple base models to create a more powerful model, has been incorporated into deep learning [23,24] to improve performance and reduce the risk of overfitting. Ensemble deep learning techniques vary in the choice of homogeneous or heterogeneous base models and in the fusion of decisions through voting, meta-learning, or other strategies. In conventional object detection, where bounding boxes are upright, Terzi [25] designed four ensemble strategies to combine nine object detectors, such as YOLOv3, Faster R-CNN, and RetinaNet, boosting the performance of anatomical and pathological object detection. Casado-García and Heras [26] combined affirmative, consensus, and unanimous strategies to fuse the bounding boxes detected by different object detection methods. Although various ensemble strategies have been designed, they essentially average the confidence scores of intersecting bounding boxes.
However, in the context of upright object detection, Oksuz et al. [16,17] observed that directly ensembling the outputs of multiple detection models can result in performance degradation. This issue arises from inconsistencies in confidence distributions, which are caused by variations in network architectures, the use of different classifiers, or discrepancies in localization heads. To address this, they proposed a confidence calibration method prior to ensembling and demonstrated improved performance by integrating the calibrated results. Although various ensemble methods have been proposed for upright (axis-aligned) object detection, there is still a lack of approaches specifically tailored to oriented object detection. Our work extends the concept of confidence calibration to the oriented setting, aiming to enhance ensemble performance by better integrating predictions from different oriented object detectors.

3. The Proposed Method

This work proposes a confidence calibration-based ensemble method to fuse the detection results from two representative oriented object detection models: Oriented R-CNN [15] and S2A-Net [8]. An overview of our method is illustrated in Figure 4. Given an infrared image of electrical equipment, our approach first uses Oriented R-CNN and S2A-Net to detect oriented object instances independently. Next, a confidence calibration module is applied to adjust the predicted confidence scores, aligning them more closely with the actual likelihood of correctness. After calibration, our method ensembles the results using a soft non-maximum suppression (Soft-NMS) strategy, which refines the final detections by considering both the spatial overlap and the calibrated confidence scores of the bounding boxes.
Formally, given an image I, two sets of oriented bounding boxes, $\{(\hat{c}_i^1, \hat{ob}_i^1, \hat{p}_i^1)\}$ and $\{(\hat{c}_j^2, \hat{ob}_j^2, \hat{p}_j^2)\}$, are detected by Oriented R-CNN and S2A-Net, respectively. Here, $\hat{c}$ is the category label, $\hat{ob}$ denotes the oriented bounding box, and $\hat{p}$ is the confidence score. Our method first calibrates their confidence distributions; the calibrated bounding boxes are then fused using a soft non-maximum suppression (Soft-NMS) strategy to produce the final ensembled results.
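To make the pipeline concrete, the following is a minimal Python sketch of the ensemble flow. The OrientedDetection record and the calibrator and fusion callables are illustrative names we introduce here, not the authors' implementation; the calibration and fusion steps themselves are detailed in Sections 3.2 and 3.3.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple

@dataclass(frozen=True)
class OrientedDetection:
    """One prediction: category label, oriented box, and confidence score."""
    label: int
    box: Tuple[float, ...]  # e.g., (cx, cy, w, h, theta) in the detector's convention
    score: float

def ensemble(dets_a: List[OrientedDetection],
             dets_b: List[OrientedDetection],
             calib_a: Callable[[float], float],
             calib_b: Callable[[float], float],
             fuse: Callable[[List[OrientedDetection]], List[OrientedDetection]]
             ) -> List[OrientedDetection]:
    """Calibrate each model's confidences, pool the detections, then fuse."""
    pooled = (
        [replace(d, score=calib_a(d.score)) for d in dets_a]    # Oriented R-CNN branch
        + [replace(d, score=calib_b(d.score)) for d in dets_b]  # S2A-Net branch
    )
    return fuse(pooled)  # e.g., Soft-NMS fusion, as described in Section 3.3
```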

3.1. The Base-Oriented Object Detection Models

3.1.1. Oriented R-CNN

Oriented R-CNN [15] is a two-stage framework designed for oriented object detection. Built on the widely used Faster R-CNN architecture with a feature pyramid network (FPN) backbone, Oriented R-CNN introduces an oriented region proposal network (Oriented RPN) to generate high-quality oriented proposals directly, eliminating the need for complex intermediate steps. It also employs an oriented region of interest (RoI) alignment module to extract rotation-invariant features, together with an oriented R-CNN head for proposal classification and regression, as illustrated in Figure 5. By integrating the designed components, Oriented R-CNN achieves a high accuracy and efficiency in oriented object detection, outperforming previous approaches while maintaining a streamlined, end-to-end trainable pipeline.
For training, Oriented R-CNN adopts the cross-entropy loss for classification and the Smooth L1 loss for bounding box regression. Each bounding box is parameterized by six values, $(x, y, w, h, \Delta\alpha, \Delta\beta)$, where $(x, y)$ denotes the centroid, $(w, h)$ represents the width and height of the enclosing horizontal bounding box, and $(\Delta\alpha, \Delta\beta)$ corresponds to the midpoint offsets.
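As an illustration of this representation, the sketch below decodes the six values into four polygon vertices, assuming the midpoint-offset convention of [15], in which $\Delta\alpha$ displaces the midpoint of the top edge of the enclosing horizontal box along x and $\Delta\beta$ displaces the midpoint of its right edge along y; the helper name is ours.

```python
import numpy as np

def decode_midpoint_offsets(x, y, w, h, da, db):
    """Decode (x, y, w, h, da, db) into the four vertices of the oriented box.

    Assumes the midpoint-offset convention: (x, y, w, h) give the enclosing
    horizontal box; da shifts its top-edge midpoint along x, db shifts its
    right-edge midpoint along y; the remaining vertices follow by symmetry.
    """
    v1 = (x + da, y - h / 2.0)  # top vertex
    v2 = (x + w / 2.0, y + db)  # right vertex
    v3 = (x - da, y + h / 2.0)  # bottom vertex (point-symmetric to v1)
    v4 = (x - w / 2.0, y - db)  # left vertex (point-symmetric to v2)
    return np.array([v1, v2, v3, v4])
```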

3.1.2. S2A-Net

The single-shot alignment network (S2A-Net) [8] is a one-stage oriented object detection method built upon the RetinaNet framework. To better handle objects of varying sizes, the standard FPN backbone is replaced with the LSKNet backbone [27], which adaptively selects convolutional kernels better suited to detecting large or small objects. S2A-Net also introduces a feature alignment module (FAM), which refines anchor boxes and aligns features through alignment convolution. Additionally, it incorporates an oriented detection module (ODM) that utilizes rotating filters to effectively encode orientation information, as illustrated in Figure 6.
For training, S2A-Net employs the Focal Loss for classification and the Smooth L1 loss for regression. Each oriented bounding box is parameterized by five values, $(x, y, w, h, \theta)$, where $(x, y)$ denotes the center coordinates, $(w, h)$ represents the width and height, and $\theta$ is the rotation angle.

3.2. Confidence Calibration

Due to differences in network architectures, including varying backbones, one-stage versus two-stage frameworks, distinct bounding box representations, and loss functions, Oriented R-CNN and S2A-Net exhibit significant discrepancies in the confidence scores of their predicted detections, as illustrated in Figure 3. As highlighted by MoCAE [17], such inconsistencies may lead to performance degradation when performing a model ensemble. Therefore, it is essential to calibrate the confidence scores to ensure they accurately reflect the true quality of the predictions.

3.2.1. Calibration Error

The first step toward calibrating an oriented object detector is to define a suitable calibration error that quantifies the misalignment between prediction confidence and actual performance. Our work extends the localization-aware calibration error (LaECE) [16], originally proposed for upright object detection, to the oriented detection setting.
Let an oriented object detector be represented as $f: \mathcal{X} \rightarrow \{(\hat{c}_i, \hat{ob}_i, \hat{p}_i)\}_{i=1}^{N}$, where $\hat{c}_i$, $\hat{ob}_i$, and $\hat{p}_i$ denote the predicted class, oriented bounding box, and confidence score of the i-th prediction, respectively, and N is the number of predictions. Then, the extended calibration error is defined as follows:

$$\mathrm{LaECE} = \frac{1}{C} \sum_{c=1}^{C} \sum_{j=1}^{J} \frac{|\hat{\mathcal{D}}_j^c|}{|\hat{\mathcal{D}}^c|} \left| \bar{p}_j^c - \mathrm{precision}_c(j) \times \overline{\mathrm{IoU}}_c(j) \right|,$$

where C is the number of classes, J is the number of confidence bins, $\hat{\mathcal{D}}_j^c$ denotes the set of predictions for class c falling into bin j, $\hat{\mathcal{D}}^c$ the set of all predictions for class c, $\bar{p}_j^c$ is the average predicted confidence in bin j for class c, $\mathrm{precision}_c(j)$ is the precision of the j-th bin for the c-th class, and $\overline{\mathrm{IoU}}_c(j)$ is the mean Intersection over Union (IoU) of the predictions in that bin. This formulation jointly accounts for both classification confidence and localization quality, making it well suited for evaluating calibration in oriented object detection tasks. A lower LaECE indicates that the model's confidence scores are more reliable and better aligned with its true detection performance.
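For clarity, a minimal NumPy sketch of this binned computation is given below, assuming detections have already been matched to ground truth so that each prediction carries a confidence, a true-positive flag, and an IoU (zero for false positives); following [16], the mean IoU in each bin is taken over its true positives.

```python
import numpy as np

def laece(scores, ious, is_tp, labels, num_classes, num_bins=25):
    """Localization-aware calibration error over equal-width confidence bins.

    scores: (N,) predicted confidences; ious: (N,) IoU with the matched ground
    truth (0.0 for false positives); is_tp: (N,) boolean true-positive flags;
    labels: (N,) predicted class indices.
    """
    scores, ious = np.asarray(scores), np.asarray(ious)
    is_tp, labels = np.asarray(is_tp, bool), np.asarray(labels)
    edges = np.linspace(0.0, 1.0, num_bins + 1)
    per_class_errors = []
    for c in range(num_classes):
        m = labels == c
        if not m.any():
            continue
        s, i, t = scores[m], ious[m], is_tp[m]
        bin_ids = np.clip(np.digitize(s, edges) - 1, 0, num_bins - 1)
        err = 0.0
        for j in range(num_bins):
            in_bin = bin_ids == j
            if not in_bin.any():
                continue  # empty bins contribute nothing
            mean_conf = s[in_bin].mean()   # \bar{p}_j^c
            precision = t[in_bin].mean()   # precision_c(j)
            tp_in_bin = in_bin & t
            mean_iou = i[tp_in_bin].mean() if tp_in_bin.any() else 0.0
            err += (in_bin.sum() / len(s)) * abs(mean_conf - precision * mean_iou)
        per_class_errors.append(err)
    return float(np.mean(per_class_errors))
```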

3.2.2. Model Calibration

An oriented object detection model is calibrated if its classification and localization performance jointly align with the predicted confidence $\hat{p}_i$. Formally, this condition can be expressed as:

$$\underbrace{\mathbb{P}\left(\hat{c}_i = c_i \mid \hat{p}_i\right)}_{\text{classification perf.}} \times \underbrace{\mathbb{E}_{\hat{ob}_i \in B_i(\hat{p}_i)}\left[\mathrm{IoU}\left(\hat{ob}_i, ob_{\psi(i)}\right)\right]}_{\text{localization perf.}} = \hat{p}_i, \quad \forall \hat{p}_i \in [0, 1],$$

where $\mathbb{P}(\hat{c}_i = c_i \mid \hat{p}_i)$ denotes the probability that the predicted class $\hat{c}_i$ matches the ground-truth class $c_i$ given the confidence score $\hat{p}_i$, and $\mathrm{IoU}(\hat{ob}_i, ob_{\psi(i)})$ measures the overlap between the predicted oriented bounding box $\hat{ob}_i$ and the ground-truth box $ob_{\psi(i)}$. The expectation is taken over all predictions falling within the confidence level $\hat{p}_i$, denoted as the bin $B_i(\hat{p}_i)$.

Thus, this work trains a calibrator $\zeta_\theta: [0, 1] \rightarrow [0, 1]$ on input–target pairs $(\hat{p}_i, \mathrm{IoU}(\hat{b}_i, b_{\psi(i)}))$, in which $\zeta_\theta$ is a simple linear function with only two learnable parameters; linear regression is therefore employed for efficient optimization.
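Since $\zeta_\theta$ has only two parameters, the fit reduces to ordinary least squares. A minimal sketch under that reading follows; clipping the output back into [0, 1] is our addition to keep calibrated scores valid confidences.

```python
import numpy as np

def fit_linear_calibrator(confidences, target_ious):
    """Fit zeta_theta(p) = a * p + b by least squares on (confidence, IoU) pairs."""
    confidences = np.asarray(confidences, dtype=float)
    target_ious = np.asarray(target_ious, dtype=float)
    X = np.stack([confidences, np.ones_like(confidences)], axis=1)
    (a, b), *_ = np.linalg.lstsq(X, target_ious, rcond=None)
    return a, b

def apply_calibrator(score, a, b):
    """Map a raw confidence to a calibrated one, clipped to [0, 1] (our choice)."""
    return float(np.clip(a * score + b, 0.0, 1.0))
```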

3.3. Model Ensemble

After calibrating each oriented object detector, our method aggregates the predictions with the calibrated confidence scores using a soft non-maximum suppression (Soft-NMS) strategy [17]. In Soft-NMS, the confidence score of each remaining box $\hat{b}_i$ is updated with respect to the currently selected highest-scoring box $\hat{b}_\alpha$ as follows:

$$\hat{p}_i = \begin{cases} \hat{p}_i, & \text{if } \mathrm{IoU}(\hat{b}_i, \hat{b}_\alpha) < \mathrm{IoU}_{\mathrm{NMS}} \\ \hat{p}_i \times \left(1 - \mathrm{IoU}(\hat{b}_i, \hat{b}_\alpha)\right), & \text{otherwise,} \end{cases}$$

where $\mathrm{IoU}_{\mathrm{NMS}}$ is the predefined IoU threshold of standard NMS. This soft strategy penalizes overlapping boxes rather than discarding them entirely, which helps preserve potentially useful detections and improves overall ensemble performance.
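The update above can be realized with a simple greedy loop. Below is a minimal sketch of the linear-penalty Soft-NMS used for fusion; the rotated-IoU function is assumed to be supplied externally (for example, mmcv.ops.box_iou_rotated from the MMCV/MMRotate stack, or a polygon-intersection routine).

```python
import numpy as np

def soft_nms_fuse(boxes, scores, iou_fn, iou_thr=0.3):
    """Greedy Soft-NMS with a linear penalty over pooled, calibrated detections.

    boxes: list of oriented boxes; scores: (N,) calibrated confidences;
    iou_fn(a, b): rotated IoU between two oriented boxes (assumed available).
    Returns the selection order and the decayed scores.
    """
    scores = np.asarray(scores, dtype=float).copy()
    remaining = list(range(len(boxes)))
    keep = []
    while remaining:
        # Select the highest-scoring box still in the pool.
        i = max(remaining, key=lambda k: scores[k])
        remaining.remove(i)
        keep.append(i)
        # Decay, rather than discard, boxes that overlap it beyond the threshold.
        for j in remaining:
            overlap = iou_fn(boxes[i], boxes[j])
            if overlap >= iou_thr:
                scores[j] *= (1.0 - overlap)
    return keep, scores
```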

4. Experiments

4.1. Dataset and Evaluation Metric

Due to the proprietary nature of substation data, there is currently no publicly available thermal image dataset specifically tailored for electrical equipment detection. To overcome this limitation and facilitate the evaluation of our ensemble method, this work curated a comprehensive dataset comprising infrared images captured during routine maintenance inspections at various substations in Shandong Province, China. These images were acquired using handheld thermal cameras. Each image in the dataset has been carefully annotated with oriented bounding boxes and labeled according to the corresponding equipment component categories.
The dataset includes four primary categories of power transformation equipment: Current Transformer (CT), Potential Transformer (PT), Surge Arrester (SA), and Circuit Breaker (CB). It also covers six critical component types: insulator, expander, grading ring, flange, interrupter, and coupler. In total, the dataset consists of 4062 thermal images, with 3046 allocated for training and 508 reserved for testing. These images exhibit considerable diversity in terms of appearance, viewing angles, and background contexts, posing significant challenges for object detection. This variability enhances the robustness of the dataset, enabling the comprehensive evaluation of detection algorithms under realistic and complex operational scenarios.
For evaluation metrics, as is common practice [8,12,14,15], this work adopts Average Precision (AP) and mean Average Precision (mAP), which are standard metrics in object detection and oriented object detection. AP is defined as the area under the precision–recall curve for a specific category, reflecting both the precision and recall of the detector. mAP is the mean of AP values across all categories, providing an overall assessment of detection performance. Our experiments follow the common practice of computing AP at an Intersection over Union (IoU) threshold of 0.5, denoted as AP@0.5, unless otherwise specified.
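As a reference for how these metrics are computed, the sketch below implements per-class AP as the area under the precision–recall curve, assuming detections have already been matched to ground truth at the chosen IoU threshold (0.5 here); it follows the standard all-point interpolation, though the exact protocol in [8,12,14,15] may differ in details. mAP is then the mean of these per-class values.

```python
import numpy as np

def average_precision(scores, is_tp, num_gt):
    """AP for one class: area under the precision-recall curve.

    scores: (N,) detection confidences; is_tp: (N,) boolean flags marking
    detections matched to ground truth at IoU >= 0.5; num_gt: number of
    ground-truth boxes of this class.
    """
    order = np.argsort(-np.asarray(scores))
    tps = np.asarray(is_tp, bool)[order]
    tp = np.cumsum(tps)
    fp = np.cumsum(~tps)
    recall = tp / max(num_gt, 1)
    precision = tp / np.maximum(tp + fp, 1)
    # Enforce a monotonically non-increasing precision envelope, then integrate.
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    recall = np.concatenate([[0.0], recall])
    return float(np.sum(np.diff(recall) * precision))
```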

4.2. Experimental Setup

This work ensembles two oriented object detection methods: Oriented R-CNN [15] and S2A-Net [8]. Each model is independently trained on our training set using its default parameter settings. Both detectors are trained for 12 epochs, with Oriented R-CNN employing ResNet-50-FPN [28,29] and S2A-Net utilizing LSKNet [27] as their respective feature extraction backbones. All training is conducted on a single RTX 2080Ti GPU with a batch size of 2. The experiments are implemented and evaluated using the MMRotate framework [30]. For confidence calibration, the number of bins J is set to 25, and in Soft-NMS, the $\mathrm{IoU}_{\mathrm{NMS}}$ threshold is set to 0.3.
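For orientation, a hedged sketch of how the two base detectors might be run under the MMRotate 0.x / MMDetection 2.x APIs is shown below; the config and checkpoint paths are placeholders for the models trained in this work, and the result format (a per-class list of (cx, cy, w, h, angle, score) arrays) follows that framework's convention.

```python
# Assumes the MMRotate 0.x / MMDetection 2.x APIs; paths are placeholders.
import mmrotate  # noqa: F401  (importing registers rotated detectors with mmdet)
from mmdet.apis import init_detector, inference_detector

orcnn = init_detector('configs/oriented_rcnn_thermal.py',
                      'work_dirs/oriented_rcnn/latest.pth', device='cuda:0')
s2anet = init_detector('configs/s2anet_lsknet_thermal.py',
                       'work_dirs/s2anet/latest.pth', device='cuda:0')

img = 'demo/thermal_equipment.jpg'  # an infrared inspection image
dets_orcnn = inference_detector(orcnn, img)    # per-class arrays of oriented boxes
dets_s2anet = inference_detector(s2anet, img)  # same format, to be calibrated and fused
```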

4.3. The Results of Confidence Calibration

Figure 7 presents the reliability diagrams, illustrating the calibration performance for each class across confidence bins. The performance is measured by the product of precision and IoU. In a well-calibrated model, the predicted confidence should closely match the actual detection accuracy; this relationship is visualized by the proximity of the reliability curve to the diagonal line of perfect calibration, along which all histogram bars would align, indicating ideal confidence-to-accuracy correspondence. From the figure, it is observed that calibration significantly improves the alignment of predictions, bringing the histograms closer to the dashed diagonal line. Specifically, the LaECE for Oriented R-CNN decreases from 35.34% before calibration to 11.86% after calibration, while for S2A-Net, it drops from 13.65% to 4.43%. This demonstrates the effectiveness of the calibration process in reducing miscalibration and enhancing reliability.

4.4. Quantitative Evaluation

Table 1 presents a comparative analysis of the performance of our calibration-based ensemble method, the ensemble method without calibration, and the two individual detectors: Oriented R-CNN and S2A-Net. For broader comparison, this work also includes results from three additional oriented object detection methods: RoI Transformer [12], Gliding Vertex [14], and YOLOv11-OOD [31].
The results in Table 1 indicate that S2A-Net achieves the best overall performance among the four individual methods, particularly excelling in detecting small components such as couplers. However, directly ensembling S2A-Net with Oriented R-CNN without calibration leads to a slight decline in performance. In contrast, our calibration-based ensemble improves the mean Average Precision (mAP) by 1.9% and outperforms all other compared methods. Notably, the performance gains are especially significant for small components such as grading rings and couplers, highlighting the effectiveness of the proposed calibration-enhanced ensemble strategy.

4.5. Qualitative Visualization

Figure 8 presents qualitative comparisons between the individual methods, the ensemble without calibration, and our calibration-based ensemble. As shown in the figure, individual methods may suffer from missed detections, false positives, or inaccurate localization. These issues are significantly mitigated by our calibration-based ensemble, leading to noticeably improved detection performance.

4.6. Complexity Analysis

Note that our proposed method is designed to enhance detection performance by post-processing the outputs of existing oriented object detection (OOD) models. Importantly, our approach does not modify the internal architectures of the base models nor introduce additional trainable parameters during inference. Therefore, the complexity of each individual model remains unchanged. The additional computational overhead introduced by our method is minimal, primarily consisting of confidence calibration (via lightweight linear regression) and bounding box fusion across models. This post-hoc ensembling strategy allows us to improve detection accuracy without significantly increasing computational cost or inference time.

5. Conclusions

This paper presents a confidence calibration-based model ensemble framework to improve oriented object detection for electrical equipment in infrared imagery. By integrating Oriented R-CNN and S2A-Net through a regression-based confidence calibration followed by soft non-maximum suppression, our method effectively leverages the complementary strengths of both detectors. The calibrated fusion not only enhances detection accuracy but also improves robustness under diverse imaging conditions. Experimental results on a specialized dataset for power system equipment confirm that our approach significantly outperforms the individual baseline models. These findings highlight the potential of confidence-aware ensemble strategies in advancing intelligent defect diagnosis within power systems. Future work may explore dynamic weighting schemes or extend the framework to real-time applications in complex environments.

Author Contributions

Conceptualization, Y.L.; methodology, Y.L., Y.S. and Z.L.; software, Z.L., B.S. and N.G.; validation, Y.L. and Y.S.; data curation, N.G.; writing—original draft preparation, Y.L.; writing—review and editing, X.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Science and Technology Project fund of State Grid Shandong Electric Power Company (No. 2022A-020).

Data Availability Statement

The data presented in this study are not publicly available due to restrictions imposed by company confidentiality agreements. The dataset contains proprietary information that is protected under legal and contractual obligations, and as such, cannot be shared openly. Access to the data may be granted by the corresponding author upon reasonable request and with permission from the company, solely for the purposes of peer review and in accordance with applicable data protection regulations.

Conflicts of Interest

Author Ning Ge was employed by State Grid Shandong Electric Power Company. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Yan, N.; Zhou, T.; Gu, C.; Jiang, A.; Lu, W. Instance Segmentation Model for Substation Equipment Based on Mask R-CNN. In Proceedings of the 2020 International Conference on Electrical Engineering and Control Technologies (CEECT), Melbourne, Australia, 10–13 December 2020; pp. 1–7. [Google Scholar] [CrossRef]
  2. Meng, Z.; Wang, Z.; Li, S. Electric Equipment Panel Detection and Segmentation based on Mask R-CNN. In Proceedings of the 2021 China International Conference on Electricity Distribution (CICED), Shanghai, China, 7–9 April 2021; pp. 343–347. [Google Scholar] [CrossRef]
  3. Tang, Z.; Jian, X. Thermal fault diagnosis of complex electrical equipment based on infrared image recognition. Sci. Rep. 2024, 14, 5547. [Google Scholar] [CrossRef] [PubMed]
  4. Han, S.; Yang, F.; Yang, G.; Gao, B.; Zhang, N.; Wang, D. Electrical equipment identification in infrared images based on ROI-selected CNN method. Electric Power Syst. Res. 2020, 188, 106534. [Google Scholar] [CrossRef]
  5. Zhang, Q.; Chang, X.; Meng, Z.; Li, Y. Equipment detection and recognition in electric power room based on faster R-CNN. Procedia Comput. Sci. 2021, 183, 324–330. [Google Scholar] [CrossRef]
  6. Qi, C.; Chen, Z.; Chen, X.; Bao, Y.; He, T.; Hu, S.; Li, J.; Liang, Y.; Tian, F.; Li, M. Efficient real-time detection of electrical equipment images using a lightweight detector model. Front. Energy Res. 2023, 11, 1291382. [Google Scholar] [CrossRef]
  7. Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. arXiv 2017, arXiv:1708.02002. [Google Scholar]
  8. Han, J.; Ding, J.; Li, J.; Xia, G.S. Align deep features for oriented object detection. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5602511. [Google Scholar] [CrossRef]
  9. Yang, X.; Yan, J.; Feng, Z.; He, T. R3det: Refined single-stage detector with feature refinement for rotating object. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 2–9 February 2021; Volume 35, pp. 3163–3171. [Google Scholar]
  10. Lyu, C.; Zhang, W.; Huang, H.; Zhou, Y.; Wang, Y.; Liu, Y.; Zhang, S.; Chen, K. Rtmdet: An empirical study of designing real-time object detectors. arXiv 2022, arXiv:2212.07784. [Google Scholar]
  11. Ma, J.; Shao, W.; Ye, H.; Wang, L.; Wang, H.; Zheng, Y.; Xue, X. Arbitrary-oriented scene text detection via rotation proposals. IEEE Trans. Multimed. 2018, 20, 3111–3122. [Google Scholar] [CrossRef]
  12. Ding, J.; Xue, N.; Long, Y.; Xia, G.S.; Lu, Q. Learning RoI transformer for oriented object detection in aerial images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2849–2858. [Google Scholar]
  13. Han, J.; Ding, J.; Xue, N.; Xia, G.S. Redet: A rotation-equivariant detector for aerial object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 2786–2795. [Google Scholar]
  14. Xu, Y.; Fu, M.; Wang, Q.; Wang, Y.; Chen, K.; Xia, G.S.; Bai, X. Gliding vertex on the horizontal bounding box for multi-oriented object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 1452–1459. [Google Scholar] [CrossRef] [PubMed]
  15. Xie, X.; Cheng, G.; Wang, J.; Yao, X.; Han, J. Oriented R-CNN for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 3520–3529. [Google Scholar]
  16. Oksuz, K.; Joy, T.; Dokania, P.K. Towards building self-aware object detectors via reliable uncertainty quantification and calibration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 9263–9274. [Google Scholar]
  17. Oksuz, K.; Kuzucu, S.; Joy, T.; Dokania, P.K. MoCaE: Mixture of Calibrated Experts Significantly Improves Object Detection. arXiv 2023, arXiv:2309.14976. [Google Scholar]
  18. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R.B. Mask R-CNN. In Proceedings of the ICCV, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
  19. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016. [Google Scholar]
  20. Gong, X.; Yao, Q.; Wang, M.; Lin, Y. A Deep Learning Approach for Oriented Electrical Equipment Detection in Thermal Images. IEEE Access 2018, 6, 41590–41597. [Google Scholar] [CrossRef]
  21. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  22. Chen, Z.; Chen, K.; Lin, W.; See, J.; Yu, H.; Ke, Y.; Yang, C. PIoU Loss: Towards Accurate Oriented Object Detection in Complex Environments. In Computer Vision, Proceedings of the ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part V; Springer: Berlin/Heidelberg, Germany, 2020; pp. 195–211. [Google Scholar] [CrossRef]
  23. Mohammed, A.; Kora, R. A comprehensive review on ensemble deep learning: Opportunities and challenges. J. King Saud Univ.-Comput. Inf. Sci. 2023, 35, 757–774. [Google Scholar] [CrossRef]
  24. Ganaie, M.A.; Hu, M.; Malik, A.K.; Tanveer, M.; Suganthan, P.N. Ensemble deep learning: A review. Eng. Appl. Artif. Intell. 2022, 115, 105151. [Google Scholar] [CrossRef]
  25. Terzi, R. An Ensemble of Deep Learning Object Detection Models for Anatomical and Pathological Regions in Brain MRI. Diagnostics 2023, 13, 1494. [Google Scholar] [CrossRef] [PubMed]
  26. Casado-García, A.; Heras, J. Ensemble Methods for Object Detection. 2019. Available online: https://www.unirioja.es/cu/joheras/papers/ensemble.pdf (accessed on 5 May 2025).
  27. Li, Y.; Hou, Q.; Zheng, Z.; Cheng, M.M.; Yang, J.; Li, X. Large Selective Kernel Network for Remote Sensing Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; pp. 16794–16805. [Google Scholar]
  28. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  29. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [Google Scholar] [CrossRef] [PubMed]
  30. Zhou, Y.; Yang, X.; Zhang, G.; Wang, J.; Liu, Y.; Hou, L.; Jiang, X.; Liu, X.; Yan, J.; Lyu, C.; et al. Mmrotate: A rotated object detection benchmark using pytorch. In Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, 10–14 October 2022; pp. 7331–7334. [Google Scholar]
  31. Khanam, R.; Hussain, M. YOLOv11: An Overview of the Key Architectural Enhancements. arXiv 2024, arXiv:2410.17725. [Google Scholar]
Figure 1. Illustrations of typical examples. Semantic segmentation is easily affected by background interference, and object detection using upright bounding boxes tends to include background regions. In contrast, oriented object detection can enclose objects more precisely.

Figure 2. Illustrations of results predicted by two different methods. Oriented R-CNN predicts with more accurate localization, but suffers from missed detections or false positives. S2A-Net achieves better precision, but the bounding box localization is less accurate.

Figure 3. The distribution of confidence scores for bounding boxes predicted by Oriented R-CNN and S2A-Net. Oriented R-CNN tends to produce more predictions with higher confidence scores, while S2A-Net generates a more conservative set of predictions with generally lower confidence values.

Figure 4. An overview of the proposed method. Given an image, our approach first applies Oriented R-CNN and S2A-Net to independently generate detection results. The predicted confidence scores are then calibrated using a confidence calibration module. Finally, the calibrated results are fused using a soft non-maximum suppression strategy to produce the final refined detections.

Figure 5. The framework of Oriented R-CNN. It builds on the FPN backbone, followed by an oriented region proposal network (RPN), an oriented region of interest alignment module and an oriented R-CNN head.

Figure 6. The framework of S2A-Net. It builds on the LSKNet backbone, followed by a feature alignment module for feature refinement, and an oriented detection module for anchor box classification and regression.

Figure 7. Reliability diagrams of Oriented R-CNN and S2A-Net before and after calibration are shown in the first and second rows, respectively.

Figure 8. Qualitative comparison of different methods.
Table 1. Quantitative comparison results (mAP and per-class AP, %, for the 6 classes in the EPED dataset).

Model           | mAP   | Insulator | Expander | Grading-Ring | Flange | Interrupter | Coupler
RoI Transformer | 89.35 | 90.48     | 90.78    | 88.94        | 89.16  | 98.75       | 77.98
Gliding Vertex  | 88.97 | 89.85     | 89.56    | 89.11        | 88.41  | 97.39       | 79.48
YOLOv11-OOD     | 91.76 | 90.25     | 90.56    | 92.21        | 89.41  | 98.30       | 89.82
Oriented R-CNN  | 89.45 | 90.26     | 90.38    | 88.67        | 89.25  | 98.77       | 79.39
S2A-Net         | 90.95 | 89.70     | 89.75    | 90.35        | 89.18  | 98.04       | 88.70
Ensemble        | 90.87 | 90.03     | 90.35    | 95.29        | 89.21  | 98.96       | 81.41
Cal-Ensemble    | 92.85 | 90.58     | 90.62    | 96.50        | 89.80  | 99.11       | 90.48