Peer-Review Record

Fusion of Electrical and Optical Methods in the Detection of Partial Discharges in Dielectric Oils Using YOLOv8

Electronics 2025, 14(19), 3916; https://doi.org/10.3390/electronics14193916

by José Miguel Monzón-Verona^1,2,*, Santiago García-Alonso^2,3 and Francisco Jorge Santana-Martín^1

Reviewer 1: Anonymous

Reviewer 2: Xianhao Fan

Reviewer 3: Soo-Whang Baek

Submission received: 1 September 2025 / Revised: 24 September 2025 / Accepted: 29 September 2025 / Published: 1 October 2025

(This article belongs to the Special Issue Fault Detection Technology Based on Deep Learning)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

* Please explain in detail any limitations or shortcomings of previous research and clearly explain how this study fills these gaps.
* A comparative table can be added to the Introduction section to allow for comparison of studies. This will also help readers understand the innovation.

* The unique contribution of the study should be emphasized more clearly, particularly in the Introduction and Conclusion sections.
* Discuss how the proposed method/analysis improves existing methods, improves performance, or provides practical benefits over existing methods.

* Simplify excessively long or complex sentences.
* Paragraphs should focus on a single main idea, and section headings can be made more descriptive to effectively guide readers.

* Figure 2 is not necessary and can be removed.
* The resolution of Figure 15 should be increased.
* Was the same oil used in all tests, or was the oil changed for each test? Please explain.

Author Response

Answer to Reviewer 1

Thank you very much for your thorough review and constructive suggestions. Your feedback is very helpful for improving the clarity and impact of our paper. We have carefully addressed each of your points below.

Please explain in detail any limitations or shortcomings of previous research and clearly explain how this study fills these gaps.

Answer: We agree this is a critical point. A clear positioning of our work with respect to the state-of-the-art is essential.
Proposed action: We will restructure the Introduction section to improve its logical flow. First, we will present a more detailed review of PD measurement methods, discussing electrical methods (such as IEC 60270, which is our starting point) and optical methods separately, highlighting their respective strengths and limitations. Next, we will explicitly justify the need to combine both methods, arguing that while the electrical detector quantifies the severity of the problem (magnitude of the charge), the optical detector provides information about its nature and location (morphology and spatial location). This fusion offers a much more complete and robust diagnosis than any single-mode method.

We have added the following text on line 77:

“Complementing electrical methods, optical techniques have emerged as a valuable diagnostic tool, focusing on the detection of the weak electroluminescence emitted during a discharge. Research in this field covers a wide spectrum of sensor technologies and analysis methodologies. To overcome this signal weakness, highly sensitive sensors such as silicon photomultipliers (SiPMs) have been developed, capable of detecting PD events with high accuracy, demonstrating their feasibility for real-time monitoring [8]. Beyond mere point detection, direct visualization of the phenomenon offers invaluable diagnostic value. Advances in imaging sensors have enabled the visual characterization of discharges, such as the corona effect, even using low-cost sensors in specific conditions such as aeronautical ones [9].

The current trend is towards combining advanced image sensors with artificial intelligence techniques to automate analysis. For example, the use of high-resolution CMOS sensors, whose rich visual information is processed by CNN, has been demonstrated to characterize and classify PDs in dielectric oils [10]. More advanced research explores the capture of information across multiple multispectral light spectra, such as ultraviolet and visible, to improve the detection, recognition, and assessment of discharges, providing a more complete optical signature of the phenomenon [11]. In turn, more sophisticated deep learning models, such as those based on the transformer architecture, are being developed to analyze composite optical sensing data and achieve even more accurate and robust PD identification [12].

Overall, the main advantage of these optical methods is their ability to provide direct and intuitive visual evidence of the physical location, morphology, and propagation of the discharge, which is vital for identifying the exact point of insulation degradation. However, optical data are often qualitative and may be less sensitive for quantifying the electrical severity of the discharge compared to the standardized electrical method.”

[8] Ren, M.; Zhou, J.; Song, B.; Zhang, C.; Dong, M.; Albarracín, R. Towards Optical Partial Discharge Detection with Micro Silicon Photomultipliers. Sensors 2017, 17, 2595. https://doi.org/10.3390/s17112595

[9] Riba, J.-R.; Gómez-Pau, Á.; Moreno-Eguilaz, M. Experimental Study of Visual Corona under Aeronautic Pressure Conditions Using Low-Cost Imaging Sensors. Sensors 2020, 20, 411. https://doi.org/10.3390/s20020411

[31] Monzón-Verona, J.M.; González-Domínguez, P.; García-Alonso, S. Characterization of Partial Discharges in Dielectric Oils Using High-Resolution CMOS Image Sensor and Convolutional Neural Networks. Sensors 2024, 24, 1317. https://doi.org/10.3390/s24041317

[10] Guo, J.; Zhao, S.; Huang, B.; Wang, H.; He, Y.; Zhang, C.; Zhang, C.; Shao, T. Identification of Partial Discharge Based on Composite Optical Detection and Transformer-Based Deep Learning Model. IEEE Trans. Plasma Sci. 2024, 52 (10), 4935–4942. https://doi.org/10.1109/TPS.2024.3382320

[11] Xia, C.; Ren, M.; Chen, R.; Yu, J.; Li, C.; Chen, Y.; Wang, K.; Wang, S.; Dong, M. Multispectral optical partial discharge detection, recognition, and assessment. IEEE Trans. Instrum. Meas. 2022, 71, 7380. https://doi.org/10.1109/TIM.2022.3162284

The following paragraph has been added to the end of the introduction on line 256, explaining that both optical and electrical methods have been combined:

“Novel bimodal fusion for contextualized diagnosis: the second key improvement is the novel bimodal fusion that provides diagnostic context unattainable with unimodal models. Our approach fuses and combines this quantitative electrical information —the "how much"— automatically extracted from the detector with the qualitative and morphological characterization of direct optical images —the "where" and the "how." Therefore, the improvement lies not only in the classification algorithm itself, but in the contextual richness of the fused data provided to it. This synergy enables a much more comprehensive diagnosis, correlating the electrical severity of an event with its precise physical manifestation in space.”

A comparative table can be added to the Introduction section to allow for comparison of studies. This will also help readers understand the innovation.

Answer: This is an excellent suggestion for visually summarizing the contributions and highlighting the novelty of our work. We agree that a table would significantly enhance reader understanding.
Proposed action: We have added a comparative table to the Introduction. The table compares representative existing methods with our proposed approach, focusing on aspects like the sensors used, the information obtained, and the key innovation.

The following comparative table has been added to the Introduction section on line 105:

“Table 1 summarizes a comparison of representative existing methods with our proposed approach, focusing on aspects like the sensors and data sources used, the information obtained, and the key limitations.”

The unique contribution of the study should be emphasized more clearly, particularly in the Introduction and Conclusion sections.

Answer: We completely agree. Clearly articulating the core contributions is paramount.
Proposed action: We have added dedicated paragraphs at the end of the Introduction to explicitly define our two fundamental improvements. A similar summary has also been added to the Conclusion section to reinforce these contributions.

We have added the following text at the end of the Introduction section, on line 245:

“Unlike traditional PD diagnostic models, which typically analyze pre-processed numerical data or pattern images, the approach in this work presents two fundamental improvements in the data acquisition and analysis paradigm itself:

Data source automation: The first and fundamental improvement lies in the automation of the data source itself. Conventional approaches typically start with already digitized data. Our system, on the other hand, uses computer vision not only to analyze but also to generate structured data in the first place. By directly reading the display of a conventional, non-smart instrument, it transforms it into a self-contained, structured, digital data source. This eliminates the dependence on manual operator interpretation or proprietary data interfaces, offering an innovative solution for the modernization and automation of existing equipment.
Novel bimodal fusion for contextualized diagnosis: The second key improvement is the novel bimodal fusion that provides diagnostic context unattainable with unimodal models. Our approach fuses and combines this quantitative electrical information —the "how much"— automatically extracted from the detector with the qualitative and morphological characterization of direct optical images —the "where" and the "how." Therefore, the improvement lies not only in the classification algorithm itself, but in the contextual richness of the fused data provided to it. This synergy enables a much more comprehensive diagnosis, correlating the electrical severity of an event with its precise physical manifestation in space.”
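As a rough illustration of what "structured data" read from a display can look like, the following minimal Python sketch converts text strings into a typed record. The input strings, the units (pC and kV), the field names, and the helper `parse_display_reading` are all assumptions made for this example; the detection and character-recognition pipeline used in the manuscript is not reproduced here.

```python
import re
from dataclasses import dataclass


@dataclass
class DisplayReading:
    charge_pc: float   # discharge magnitude in picocoulombs (assumed unit)
    voltage_kv: float  # applied voltage in kilovolts (assumed unit)


# Hypothetical display text: a number followed by a unit, e.g. "125,4 pC".
_NUMBER_UNIT = re.compile(r"(-?\d+(?:[.,]\d+)?)\s*(pC|kV)", re.IGNORECASE)


def parse_display_reading(ocr_lines):
    """Return a DisplayReading if both quantities are found, else None."""
    values = {}
    for line in ocr_lines:
        for number, unit in _NUMBER_UNIT.findall(line):
            # Accept a decimal comma as well as a decimal point.
            values[unit.lower()] = float(number.replace(",", "."))
    if "pc" in values and "kv" in values:
        return DisplayReading(charge_pc=values["pc"], voltage_kv=values["kv"])
    return None


# Made-up strings standing in for text recognized on the instrument screen.
reading = parse_display_reading(["Q = 125,4 pC", "U = 13.0 kV"])
print(reading)
```

Returning `None` when either quantity is missing keeps incomplete readings out of the structured record instead of filling them with guessed values.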

Discuss how the proposed method/analysis improves existing methods, improves performance, or provides practical benefits over existing methods.

Answer: Thank you for this question. It is important to highlight the tangible benefits of our approach.
Proposed action: The new paragraphs added to the Introduction and Conclusion (as mentioned in the point above) explicitly address this. We highlight:
- Improvement over existing methods: Our method generates structured digital data from a visual source, unlike methods that rely on pre-digitized data from dedicated sensors.
- Performance improvement: The performance is enhanced not by a marginally better classification accuracy, but by providing a much richer, contextualized diagnosis (fusing "how much" with "where" and "how").
- Practical benefit: The key practical benefit is the ability to modernize legacy equipment (like the DDX-9101) into an automated data source without costly hardware upgrades or proprietary interfaces, making it a scalable and cost-effective solution.

Simplify excessively long or complex sentences. Paragraphs should focus on a single main idea, and section headings can be made more descriptive to effectively guide readers.

Answer: We appreciate this valuable feedback on writing style and structure. Clarity is our priority.
Proposed action: We have conducted a thorough editorial review of the entire manuscript to simplify complex sentences and ensure each paragraph has a single, clear focus. We have also revised section headings to be more descriptive. For example:
- The heading for Section 3, previously "CNN training and inference from DDX images", has been revised to "Method 1: Automating the Electrical Detector via Computer Vision".
- The heading for Section 4, previously "CNN training and inference from HQ images", has been revised to "Method 2: Optical Characterization of PDs via Computer Vision".

Figure 2 is not necessary and can be removed.

Answer: We agree. Figure 2 (a) is not essential for understanding the methodology.
Proposed action: We have removed Figure 2 (a) from the manuscript.

Thank you very much. We agree that Figure 2 (a) is not essential for understanding the methodology and can be removed. However, we think that Figure 2 (b) is useful for visualizing the DDX-9101 analyzed in this paper, so we have kept Figure 2 (b).

The resolution of Figure 15 should be increased.

Answer: Thank you. We agree that the resolution was insufficient.
Proposed action: We have replaced Figure 15 with an improved, higher-resolution version to ensure all details are clearly visible.

Was the same oil used in all tests, or was the oil changed for each test? Please explain.

Answer: This is an important experimental detail that was not sufficiently clarified. Thank you for pointing it out.
Proposed action: We have added a clarifying sentence to Section 2, "Experimental Setup", on line 300:

"To ensure the comparability of results across different voltage levels within a single experimental run, the same sample of dielectric oil was used. The oil was subsequently replaced before initiating a new series of tests to prevent potential degradation effects from influencing the measurements."

In writing this article, we have tried to use short sentences and paragraphs to facilitate reading and understanding. The manuscript has been carefully reviewed by an English-speaking colleague, who has improved the quality and style of the English.

Reviewer 2 Report

Comments and Suggestions for Authors

The article proposes a PD identification method based on discharge pulse images, and the topic discussed is of significant importance to industrial society. However, the introduction is poorly structured, the information included is limited, and the paper's novelty is unclear.

The abstract section does not need to provide such a detailed introduction to the work carried out. It should focus on highlighting the improvements and contributions of the image recognition algorithm presented in the paper.
The introduction lacks logical flow and needs to be redesigned. There is no comprehensive review of optical and electrical partial discharge measurement methods, nor is there an explanation as to why both optical and electrical methods should be combined.
The experimental platform uses a camera to capture optical images, but how can this method be applied to real transformers?
The data used in the paper is the pulse data shown in Figure 4, but how can the signal-to-noise ratio of this raw data be guaranteed? How is the noise included in the data removed?
The paper focuses on image recognition algorithms, but why is the training data not based on the more commonly used PRPD pattern?
In the evaluation model section, what improvements or differences does the model used in the paper have compared to traditional models?
Figure 15 is too blurry.

Comments on the Quality of English Language

None

Author Response

Answer to Reviewer 2

The abstract section does not need to provide such a detailed introduction to the work carried out. It should focus on highlighting the improvements and contributions of the image recognition algorithm presented in the paper.
- Answer: We appreciate this comment. We agree that the summary could be more concise and focused more on the contributions.
- Proposed action: We will revise the abstract to reduce the introductory section and give greater emphasis to the key contributions of our work. We will further emphasize our innovation: the transformation of a conventional electrical detector into a smart and autonomous data source by applying YOLOv8 to read its display, and the synergy achieved by merging this quantitative data with the morphological and spatial information obtained from the optical system, demonstrating a bimodal and automated diagnostic approach.

The proposed new abstract is the following:

“This work presents an innovative bimodal approach for laboratory partial discharge (PD) analysis using a YOLOv8-based convolutional neural network (CNN). The main contribution consists, first, in the transformation of a conventional DDX-type electrical detector into a smart and autonomous data source. By training the CNN, a system capable of automatically reading and interpreting the data from the detector display —discharge magnitude and applied voltage— is developed, achieving an average training accuracy of 0.91 and converting a passive instrument into a digitalized and structured data source. Second, and simultaneously, an optical visualization system captures direct images of the PDs with a high-resolution camera, allowing their morphological characterization and spatial distribution. For electrical voltages of 10, 13, and 16 kV, PDs were detected with a confidence level of up to 0.92. The fusion of quantitative information intelligently extracted from the electrical detector with qualitative characterization from optical analysis offers a more complete and robust automated diagnosis of the origin and severity of PDs.”

The introduction lacks logical flow and needs to be redesigned. There is no comprehensive review of optical and electrical partial discharge measurement methods, nor is there an explanation as to why both optical and electrical methods should be combined.
- Answer: This is a crucial point, and we appreciate you pointing it out.
- Proposed action: We will restructure the Introduction section to improve its logical flow. First, we will present a more detailed review of PD measurement methods, discussing electrical methods (such as IEC 60270, which is our starting point) and optical methods separately, highlighting their respective strengths and limitations. Next, we will explicitly justify the need to combine both methods, arguing that while the electrical detector quantifies the severity of the problem (magnitude of the charge), the optical detector provides information about its nature and location (morphology and spatial location). This fusion offers a much more complete and robust diagnosis than any single-mode method.

The text added on line 77 and the paragraph added to the end of the introduction on line 256, which explains that both optical and electrical methods have been combined, are reproduced in our answer to Reviewer 1 above.

The experimental platform uses a camera to capture optical images, but how can this method be applied to real transformers?
- Answer: This is a very interesting question, because our method was developed in a laboratory and its in-situ application has limitations. Despite these limitations, the proposed method sets a solid precedent for smart sensing with optical data, opening new avenues for in-situ diagnosis of high-voltage phenomena with an unprecedented level of detail.
- Proposed action: We will explicitly address the challenges of applying the optical method in situ, including sensitivity to lighting conditions, possible attenuation of the optical signal by oil turbidity or aging, the need for direct optical access to the discharge zone, and the overall challenge of scaling this technique to real transformers.

We have added the following text at the end of section 2, on line 313:

“The proposed method is performed in a controlled laboratory environment and has proven to be an effective tool for the detection and characterization of PDs.

However, it is essential to understand the limitations inherent in its in-situ application. Below, we mention some aspects to consider in these cases that are beyond the scope of this study and that may guide future lines of research.

Sensitivity to lighting conditions and optical environment: The performance of both YOLOv8 object detection and OCR character recognition is intrinsically dependent on the quality of the captured images. The methodology is sensitive to variations in ambient lighting conditions. Factors such as reflections, shadows, or uneven lighting can introduce noise and affect the accuracy of the algorithms. A controlled environment was maintained in our laboratory, but in-situ implementation would require more robust solutions, such as the use of controlled and polarized light sources or more advanced image preprocessing algorithms to normalize the captures.
Transparency of the dielectric medium: The effectiveness of optical discharge detection relies on the assumption that the dielectric medium, in this case the oil, is optically transparent. In real-world applications, insulating oils degrade over time due to thermal and electrical stress, which can increase their turbidity, change their color, or generate suspended byproducts. This degradation would cause attenuation of the optical signal through scattering or absorption, making it difficult or even impossible to capture the discharge morphology, especially for low-intensity events.
Direct optical access requirement: A fundamental requirement of this technique is the existence of a direct line of sight to the area where the discharges occur. Our experimental setup used a vessel with transparent walls, simulating an inspection window. However, most high-voltage electrical equipment in service is sealed metal vessels. Widespread application of this method would require the availability of equipment with inspection windows or the possibility of making significant structural modifications to install them, which is not always feasible, safe, or economically viable.
Scalability to field equipment: The transition from a laboratory environment to on-site diagnostics on large equipment, such as power transformers, presents considerable challenges. The large internal volume of this equipment makes it complex to determine the optimal location of one or more cameras to cover all potential risk areas. Furthermore, integrating a vision system into the existing monitoring infrastructure and ensuring its durability in the harsh environmental conditions of an electrical substation are engineering hurdles that must be addressed for practical, large-scale implementation.

Despite these limitations, the proposed method sets a solid precedent for the "smart sensing" of visual sources and their fusion with optical data, opening new avenues for the in-situ diagnosis of high-voltage phenomena with an unprecedented level of detail.”

The data used in the paper is the pulse data shown in Figure 4, but how can the signal-to-noise ratio of this raw data be guaranteed? How is the noise included in the data removed?
- Answer: Our approach differs from traditional signal processing: a CNN learns the visual patterns of the signal pulses, with the existing noise left in the images.
- Proposed action: We will clarify this point in section 3.1.2, Manual labeling of DDX images. We will explain that the goal is not to filter the electrical signal to remove noise, but to train a YOLOv8 computer vision model to recognize the pulse patterns as they appear on the DDX detector screen. The visual noise of the baseline is, in fact, part of the background that the model learns to ignore. The high performance metrics (accuracy of 0.91) and the confusion matrix results demonstrate that the model successfully differentiated the pulses from the background signal (the visual noise), validating the robustness of this image recognition-based approach.

We have added the following paragraph in section 3.1.2, Manual labeling of DDX images, below Figure 5 on line 399:

“It is important to clarify that the approach in this work does not rely on traditional signal processing to ensure a high signal-to-noise ratio or to remove noise from the underlying electrical signal. Instead, the problem is approached as a computer vision task, where the objective is to train a YOLOv8 model to recognize the visual patterns of pulses as presented on the DDX detector screen. From this perspective, the baseline visual noise, including the sine wave and low-level fluctuations, is not filtered out but constitutes the image background.

During the training process, the CNN learns to identify the distinctive visual characteristics of the pulses, the signal, and to differentiate them from the background, the visual noise, implicitly learning to ignore it. The effectiveness of this approach is validated by the high-performance metrics obtained, as detailed later. The high accuracy scores, above 0.91, and the confusion matrix results quantitatively demonstrate that the model was able to successfully differentiate the pulses of interest from the background, validating the robustness of this image recognition-based method for the proposed task.”

The paper focuses on image recognition algorithms, but why is the training data not based on the more commonly used PRPD pattern?
- Answer: We sincerely thank the reviewer for this excellent question, as it gives us the opportunity to clarify one of the most novel and fundamental contributions of our work.
- Proposed action: While the analysis of PRPD (Phase Resolved Partial Discharge) patterns is a standard and highly valuable technique for PD classification, the primary objective of our study is to address a prior and more fundamental challenge: the automation of data acquisition from conventional, non-digital instrumentation.

Our purpose is to demonstrate the feasibility of transforming a passive, conventional measuring instrument, such as the DDX-9101, into a smart and structured data source in a non-invasive manner. To achieve this, we trained a YOLOv8 model to read and interpret its most basic and direct visualization mode: the time-domain signal. This innovation allows for the modernization of existing legacy equipment without the need for hardware modifications or costly upgrades, thereby democratizing access to digital monitoring.

Indeed, our approach does not preclude PRPD analysis; rather, it enables it. The data that our system automatically extracts (discharge magnitude and its time of occurrence) are precisely the ingredients required to construct PRPD patterns. As can be inferred from the data presented in Figure 17, these patterns could be generated and, in a subsequent research phase, be analyzed by another neural network (which could be another model from the YOLOv8 family or a different architecture) to perform a detailed classification of the PD type.

In summary, our work focuses on the innovative first step of automating the primary data source from legacy equipment, thus laying a robust foundation for more advanced analyses, such as the one the reviewer rightly suggests. We will include this clarification in the manuscript to better contextualize our contribution.

The following clarifying text has been included on line 692:

“The data that our system automatically extracts (discharge magnitude and its time of occurrence) are precisely the ingredients required to construct PRPD patterns. As can be inferred from the data presented in Figure 17, these patterns could be generated and, in a subsequent research phase, be analyzed by another neural network (which could be another model from the YOLOv8 family or a different architecture) to perform a detailed classification of the PD type.”
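To make the step from (magnitude, time of occurrence) pairs to a phase-resolved view concrete, here is a minimal Python sketch. It assumes a 50 Hz applied voltage and pulse times measured from a positive-going zero crossing; both assumptions, the function names, and the sample values are hypothetical and are not taken from the manuscript.

```python
def to_prpd_points(pulses, mains_hz=50.0):
    """Map (time_s, charge_pc) pulses to (phase_deg, charge_pc) points.

    time_s is assumed to be measured from a positive-going zero crossing
    of the applied voltage; mains_hz = 50 Hz is an assumption.
    """
    period = 1.0 / mains_hz
    points = []
    for time_s, charge_pc in pulses:
        # Fold the time into one voltage cycle and express it as a phase angle.
        phase_deg = (time_s % period) / period * 360.0
        points.append((phase_deg, charge_pc))
    return points


def phase_histogram(points, n_bins=8):
    """Count pulses per phase bin, the simplest phase-resolved summary."""
    counts = [0] * n_bins
    for phase_deg, _ in points:
        index = min(int(phase_deg / 360.0 * n_bins), n_bins - 1)
        counts[index] += 1
    return counts


# Made-up pulses: (time in seconds, signed charge in pC).
pulses = [(0.004, 120.0), (0.014, -95.0), (0.024, 110.0)]
points = to_prpd_points(pulses)
print(points)
print(phase_histogram(points, n_bins=4))
```

A full PRPD pattern would accumulate many cycles into a two-dimensional phase-charge map; the histogram above only shows the folding step that makes such a map possible.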

In the evaluation model section, what improvements or differences does the model used in the paper have compared to traditional models?
- Answer: We appreciate the opportunity to emphasize the novelty of our approach.
- Proposed action: We will improve the introduction and discussion to explicitly highlight the differences. Unlike traditional models that typically analyze pre-processed numerical data, our model features two key improvements:
  1. Data source automation: our system automatically converts a non-smart instrument into a structured data source by reading its screen. This eliminates the need for manual interpretation or proprietary data interfaces.
  2. Novel bimodal fusion: we fuse this electrical information (the "how much") with direct optical images (the "where" and the "how"), providing context that unimodal models cannot offer. The improvement is not only in the classification algorithm, but in the richness of the fused data it is provided with.

To clearly frame the contributions of the work we have included the following text at the end of the introduction, on line 245:

“Data source automation: the first and fundamental improvement lies in the automation of the data source itself. Conventional approaches typically start with already digitized data. Our system, on the other hand, uses computer vision not only to analyze but also to generate structured data in the first place. By directly reading the display of a conventional, non-smart instrument, it transforms it into a self-contained, structured, digital data source. This eliminates the dependence on manual operator interpretation or proprietary data interfaces, offering an innovative solution for the modernization and automation of existing equipment.
Novel bimodal fusion for contextualized diagnosis: the second key improvement is the novel bimodal fusion that provides diagnostic context unattainable with unimodal models. Our approach fuses and combines this quantitative electrical information —the "how much"— automatically extracted from the detector with the qualitative and morphological characterization of direct optical images —the "where" and the "how." Therefore, the improvement lies not only in the classification algorithm itself, but in the contextual richness of the fused data provided to it. This synergy enables a much more comprehensive diagnosis, correlating the electrical severity of an event with its precise physical manifestation in space.”

And at the end of section 4, on line 993, we have added the following text:

“The results of this study should be interpreted considering the substantial improvements our model offers compared to traditional approaches. The innovation of this work lies not simply in the application of an advanced classification algorithm, but in a redesign of the PD diagnostic workflow.

First, we have demonstrated that it is possible to convert a conventional measuring instrument into an intelligent, autonomous sensor. Unlike models that require already digitized data, our system automates the extraction of information directly from a screen, eliminating the need for manual intervention or costly hardware upgrades. This intelligence applied to legacy equipment represents a practical and scalable contribution.

Second, the fusion of quantified electrical information with the spatial and morphological characterization of optical images offers an enriched diagnosis. While a traditional unimodal model can identify the severity of a discharge (the "quantum"), our bimodal approach adds crucial context by also answering the "where" and "how" questions. This synergy between quantitative and qualitative data is the system's main strength, allowing for a much deeper and more complete understanding of the PD phenomenon.”
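One simple way to picture the pairing of electrical and optical information is a nearest-timestamp join, sketched below in Python. The pairing rule, the tolerance `max_gap_s`, the record layouts, and all sample values are assumptions for illustration only; the text quoted here does not specify how the two data streams are aligned.

```python
from bisect import bisect_left


def fuse_by_time(electrical, optical, max_gap_s=0.5):
    """Pair each optical detection with the nearest electrical reading.

    electrical: list of (time_s, charge_pc, voltage_kv), sorted by time_s.
    optical:    list of (time_s, bbox, confidence).
    Detections with no reading within max_gap_s are dropped.
    """
    times = [row[0] for row in electrical]
    fused = []
    for t_opt, bbox, confidence in optical:
        i = bisect_left(times, t_opt)
        # Only the readings just before and just after t_opt can be nearest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
        if not candidates:
            continue
        j = min(candidates, key=lambda k: abs(times[k] - t_opt))
        if abs(times[j] - t_opt) <= max_gap_s:
            _, charge_pc, voltage_kv = electrical[j]
            fused.append({
                "time_s": t_opt,
                "charge_pc": charge_pc,    # the "how much"
                "voltage_kv": voltage_kv,
                "bbox": bbox,              # the "where"
                "confidence": confidence,
            })
    return fused


# Made-up readings and detections.
electrical = [(0.0, 80.0, 10.0), (1.0, 140.0, 13.0), (2.0, 210.0, 16.0)]
optical = [(1.1, (40, 52, 18, 9), 0.92), (5.0, (10, 10, 5, 5), 0.60)]
fused = fuse_by_time(electrical, optical)
print(fused)
```

Dropping unmatched detections, instead of attaching the closest reading regardless of distance, avoids correlating an optical event with an electrical value recorded under different conditions.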

Figure 15 is too blurry.
- Answer: Thank you very much. We agree.
- Proposed action: We have replaced Figure 15 with an improved, higher resolution version.

Reviewer 3 Report

Comments and Suggestions for Authors

The idea of transforming existing equipment (DDX) from a simple measuring instrument into a "smart data source" has practical and academic value. The reliability of the system is enhanced by the fact that it was tested across 64 independent experiments over a wide voltage range (6–18 kV). Performance was evaluated from various perspectives, including precision, recall, mAP, and a confusion matrix. It has the potential to be directly applied to real-time monitoring and transformer preventive diagnostics.

However, key areas for improvement are noted below.

1. The description does not sufficiently highlight the differences between this study and existing multi-sensor-based PD studies. Direct comparison and emphasis are needed to highlight the core contributions of this study and the improvements achieved over existing methods.

2. The impact of class imbalance in the dataset (e.g., positive/negative pulses) on performance is crucial.

3. While only YOLOv8 was used, the validity of this study would be greatly enhanced if performance comparisons with other models, such as Faster R-CNN and EfficientDet, were presented. The labels in the figures should be consistent with the text terms (e.g., positive_pulse, positive_value). A brief mention of the latest YOLO versions (v9-v12) and a detailed explanation of why YOLOv8 was chosen in terms of "performance and stability" would be more persuasive.

4. Since it was mentioned that validation loss increased after 375 epochs (overfitting) during the HQ image training process, complementary methods such as early stopping, dropout, and data augmentation should be applied or at least discussed.

5. The phenomenon of pulse width decreasing with increasing voltage (negative correlation) is interesting, but the physical explanation is insufficient. Rather than simply presenting numerical results, linking it to the physics of discharge would enhance the paper's persuasiveness.

6. The limitations of optical detection methods (illumination conditions, noise, limitations in field applications, etc.) should also be clearly addressed.

Author Response

Answer to Reviewer 3

Thank you very much for your time and valuable feedback on our work "Fusion of electrical and optical methods in the detection of partial discharges in dielectric oils using YOLOv8". We have carefully reviewed your suggestions and responded to each of the proposed points. We are pleased that you have recognized the practical and academic value of transforming existing equipment into a "smart data source."

1. The description does not sufficiently highlight the differences between this study and existing multi-sensor-based PD studies. Direct comparison and emphasis are needed to highlight the core contributions of this study and the improvements achieved over existing methods.

Answer: We completely agree. It's essential to place our work in the proper context and highlight its uniqueness.
Proposed action: In the revised version, we will expand the Introduction section to include a more direct comparison with other existing multi-sensor studies. We will emphasize that, while many works fuse data from established sensors (e.g., acoustic and UHF), our contribution lies in: (a) the novel "smart sensing" of a visual electrical detector using OCR and object detection, and (b) the fusion of this quantitative electrical information with high-resolution direct optical images, which provide morphological and spatial characterization not available with other sensors.

Therefore, we introduce the following text in the Introduction section on line 225:

“PD detection has advanced significantly through multi-sensor fusion strategies, which typically combine data from established sensors such as UHF, acoustic, and ground transient voltages. While these approaches are effective for detection and localization, they often lack direct correlation with quantitative electrical measurements displayed on standard test equipment, as well as a high-resolution visual context that captures the physical manifestation and morphology of the discharge.

Our work breaks away from this paradigm to address these limitations. The fundamental contribution of this study is twofold. First, we introduce a novel concept of smart sensing, through which we transform a conventional visual electrical detector —a passive display device— into an active and intelligent data source. Through advanced computer vision techniques, such as OCR and object detection, we automate the extraction of quantitative electrical values directly from the device's screen. Second, we merge this now digitized electrical information with high-resolution optical images of the discharge phenomenon itself. This fusion creates a unique dataset that enables a morphological and spatial characterization that other sensors cannot offer, directly correlating the electrical magnitude of the event with its physical manifestation (shape, color, intensity, and precise location). Therefore, while existing studies focus on data fusion from sensors operating in distinct physical domains (acoustic, electromagnetic, etc.), our contribution lies in the creation of a new smart sensor from a visual information source and its subsequent fusion with another optical modality, offering an unprecedented level of diagnostic detail."

2. The impact of class imbalance in the dataset (e.g., positive/negative pulses) on performance is crucial.

Answer: We agree, this is an important point in any machine learning study.
Proposed action: We will analyze and discuss the class balance in the results section (3.2.2). As can be seen in the confusion matrix for the test set (Figure 14), the number of instances for negative_pulse (327 correct out of 374) and positive_pulse (316 correct out of 353) is comparable. Therefore, there is no severe class imbalance that could significantly bias the model's performance. However, we will add a brief discussion on this point to confirm that the balance was adequate.

Therefore, we have introduced the following text after the confusion matrix presented in Figure 14, on line 592:

"A fundamental aspect in validating the robustness of a classification model is the analysis of the class balance in the dataset, since a severe imbalance could bias the training and evaluation metrics. To address this point, the distribution of the main classes in our test set has been examined. As evident from the confusion matrix (Figure 14), the total number of instances for the negative_pulse class is 374, while for the positive_pulse class it is 353. This distribution, with a ratio close to 1:1, confirms that there is no significant class imbalance. Therefore, it can be concluded that the high performance of the model in identifying both pulse polarities is genuine and not an artifact derived from an over-representation of one of the classes, which confers greater reliability to the presented results."
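The balance check described above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not the authors' code: it uses only the counts quoted in this response (374 and 353 instances, 327 and 316 correct), and the function names are illustrative assumptions.

```python
# Instance totals and correct detections per class, as quoted in the
# response above from the test-set confusion matrix (Figure 14).
counts = {"negative_pulse": 374, "positive_pulse": 353}
correct = {"negative_pulse": 327, "positive_pulse": 316}


def imbalance_ratio(class_counts):
    """Largest class size divided by the smallest (1.0 = perfectly balanced)."""
    return max(class_counts.values()) / min(class_counts.values())


def fraction_correct(class_counts, class_correct):
    """Share of each class total that was classified correctly."""
    return {name: class_correct[name] / class_counts[name] for name in class_counts}


ratio = imbalance_ratio(counts)
shares = fraction_correct(counts, correct)

print(f"imbalance ratio: {ratio:.3f}")
for name, share in shares.items():
    print(f"{name}: {share:.3f} correct")
```

A ratio this close to 1 is consistent with the statement that the two pulse polarities are represented almost equally in the test set.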

3. While only YOLOv8 was used, the validity of this study would be greatly enhanced if performance comparisons with other models, such as Faster R-CNN and EfficientDet, were presented. The labels in the figures should be consistent with the text terms (e.g., positive_pulse, positive_value). A brief mention of the latest YOLO versions (v9-v12) and a detailed explanation of why YOLOv8 was chosen in terms of "performance and stability" would be more persuasive.

Answer: We appreciate these excellent suggestions for strengthening the methodological justification.

• Proposed action:

We will add a more detailed justification for the choice of YOLOv8 in section 3.1. We will explain that it was selected for its optimal balance between high inference speed and high accuracy, making it ideal for potential real-time video analysis. Although a comprehensive comparison with other models such as Faster R-CNN is beyond the scope of this work, our choice is supported by a well-established body of literature. We will cite comparative studies and comprehensive reviews of the field [23,24] that quantitatively demonstrate that the YOLO family of architectures offers superior speed with highly competitive accuracy, thus validating our selection for the purposes of this project.

The following text is added in the introduction on line 149:

"Our choice aligns with the conclusions of comprehensive state-of-the-art reviews [23], which establish that one-stage detectors like YOLO are optimized for real-time applications, unlike two-stage models like Faster R-CNN. Comparative studies such as those presented in the development of previous YOLO architectures [24] already demonstrate with benchmarks on standard datasets like COCO that they achieve a superior balance between speed and accuracy compared to models like Faster R-CNN."

[23] Z. Zou, K. Chen, Z. Shi, Y. Guo and J. Ye, "Object Detection in 20 Years: A Survey," in Proceedings of the IEEE, vol. 111, no. 3, pp. 257-276, March 2023, doi: 10.1109/JPROC.2023.3238524.

[24] C.-Y. Wang, A. Bochkovskiy and H.-Y. M. Liao, "YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors," 2022, https://arxiv.org/abs/2207.02696.

We will add a brief mention of the most recent versions of YOLO (up to v12 at the time of writing, as mentioned in the article), and justify the use of v8 as a mature, stable, and well-documented version at the time of the research.

Therefore, we have added the following text in the introduction on line 192:

"After evaluating the state of the art, the YOLOv8 architecture was selected. This family of models is paradigmatic in one-stage object detection, offering a balance between speed and accuracy that outperforms two-stage architectures such as Faster R-CNN for real-time applications. Although the YOLO family continues its rapid evolution (with versions such as v9 - v12 already available), the choice of YOLOv8 was based on its status as a mature and stable release at the time of this project. Its robust performance, combined with broad community support and comprehensive documentation, provided the ideal framework to ensure the reliability and reproducibility of our results."

Thank you. We will review all figures and text in the manuscript to correct any inconsistencies in labels.

4. Since it was mentioned that validation loss increased after 375 epochs (overfitting) during the HQ image training process, complementary methods such as early stopping, dropout, and data augmentation should be applied or at least discussed.

Answer: We agree. Your observation about overfitting is very pertinent.
Proposed action: We will expand on this discussion in section 4.1.2. We will clarify that, for the final evaluation, the model was used with weights corresponding to the point of minimum validation loss (around epoch 375), which is a form of implicit early stopping. We will also discuss how additional techniques such as dropout or more aggressive data augmentation could be applied in future work to further mitigate this effect and improve the model's generalization.

We have added the following text in section 4.1.2., on line 906:

"To mitigate this effect and ensure the selection of the model with the best generalization capacity, a strategy that works as an implicit early stopping mechanism was implemented. Specifically, for the final evaluation and all subsequent inferences, the model weights corresponding to the last training epoch were not used, but rather those saved from the epoch that recorded the minimum validation loss. This practice ensures that the selected model is the one that demonstrated the best performance on data not seen during training.

While this method was effective for the scope of our study, it is worth mentioning that there are additional regularization techniques that could be explored in future work to make the model more robust against overfitting. Incorporating dropout layers into the architecture or applying a more aggressive data augmentation pipeline (including transformations such as random crops, stronger color variations, or mixup) could allow for longer training periods without the risk of overfitting, potentially improving the model's generalization."
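The checkpoint selection described above can be sketched as follows. This is an illustrative example rather than the training code used in the study: the loss curve is a toy list invented for the demonstration, and the helper names `best_epoch` and `patience_stop_epoch` are assumptions. The second helper shows, for contrast, how explicit early stopping with a patience threshold would behave on the same curve.

```python
def best_epoch(val_losses):
    """Return (epoch, loss) for the minimum validation loss; epochs start at 1."""
    if not val_losses:
        raise ValueError("val_losses must not be empty")
    index = min(range(len(val_losses)), key=val_losses.__getitem__)
    return index + 1, val_losses[index]


def patience_stop_epoch(val_losses, patience):
    """Epoch at which explicit early stopping would halt training, or None."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Toy validation-loss curve (hypothetical values): it falls, then rises again.
toy_losses = [1.00, 0.80, 0.65, 0.60, 0.62, 0.66, 0.71]

selected_epoch, selected_loss = best_epoch(toy_losses)
stop_epoch = patience_stop_epoch(toy_losses, patience=2)
print(f"weights kept from epoch {selected_epoch} (val loss {selected_loss})")
print(f"explicit early stopping with patience=2 would halt at epoch {stop_epoch}")
```

Keeping the minimum-loss weights after a full run selects the same checkpoint as explicit early stopping would, but it does not save the compute spent on the later epochs.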

5. The phenomenon of pulse width decreasing with increasing voltage (negative correlation) is interesting, but the physical explanation is insufficient. Rather than simply presenting numerical results, linking it to the physics of discharge would enhance the paper's persuasiveness.

Answer: Thank you very much for your observation, which encourages us to delve deeper into the analysis.
Proposed action: In section 3.3.2, where Table 1 is presented, we will expand on the negative correlation (-0.41) between ocr_voltage and Width. We will hypothesize that at higher voltages, the discharges in the oil are more energetic but may be faster and more spatially concentrated, resulting in shorter pulse durations and therefore narrower pulses on the DDX screen. A thorough physical explanation would require a deeper analysis of the plasma dynamics, but this finding, automatically extracted by the model, is in itself an interesting result that links signal morphology to the physics of the phenomenon.

We have introduced the following text in section 3.3.2 on line 716:

“A particularly interesting finding that emerges from the correlation analysis in Table 1 is the moderate negative correlation (-0.41) observed between the applied voltage, ocr_voltage, and the detected pulse width, Width. Although this result might seem counterintuitive at first glance, it may have a plausible physical explanation linked to the dynamics of PDs in dielectric oils.

From a physical perspective, we hypothesize that this behavior is related to the energy and speed of the discharge process. As voltage increases, the energy injected into the dielectric medium increases. This could lead to more accelerated ionization processes and the formation of discharge channels that are more energetic but also more ephemeral and spatially concentrated. A shorter, i.e., faster, discharge event would appear directly on the measuring equipment screen as a visual pulse with a smaller temporal width. Therefore, the system would be capturing a morphological manifestation of the greater intensity and brevity of discharges at higher voltages.

While a comprehensive characterization of the underlying plasma dynamics to validate this hypothesis is beyond the scope of this work, this finding is significant in itself. It demonstrates the ability of our computer vision system not only to quantify parameters in isolation but also to uncover subtle correlations that link the morphology of the visual signal to the physical principles of the PD phenomenon. This is a promising result that paves the way for future analyses that can corroborate these relationships with more detailed physical models.”
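For readers who want to see how such a coefficient is obtained, the sketch below computes a Pearson correlation between two columns. It is a minimal illustration under stated assumptions: the `voltages_kv` and `widths_px` values are synthetic placeholders invented for the example (they are not the study's measurements and are not meant to reproduce the reported -0.41), and the `pearson` helper is written out by hand rather than taken from the authors' pipeline.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / (spread_x * spread_y)


# Synthetic placeholder data: applied voltage (kV) and detected pulse width
# (pixels). The voltages span the 6-18 kV range mentioned in the review; the
# widths are made up so that they tend to shrink as voltage rises.
voltages_kv = [6, 8, 10, 12, 14, 16, 18]
widths_px = [30, 34, 27, 31, 25, 29, 24]

r = pearson(voltages_kv, widths_px)
print(f"Pearson r between voltage and width: {r:.2f}")
```

A negative value of r indicates that wider pulses tend to occur at lower voltages; the coefficient alone says nothing about the physical mechanism, which is why the hypothesis above is stated separately.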

6. The limitations of optical detection methods (illumination conditions, noise, limitations in field applications, etc.) should also be clearly addressed.

Answer: We agree. A transparent discussion of the limitations is essential.
Proposed action: We will add explanatory text at the end of Section 2. There we will explicitly address the challenges of the optical method, including sensitivity to lighting conditions, possible attenuation of the optical signal by oil turbidity or aging, the need for direct optical access to the discharge zone, and the overall challenge of scaling this technique to large field equipment.

We add the following text at the end of section 2, on line 313:

“The proposed method is performed in a controlled laboratory environment and has proven to be an effective tool for the detection and characterization of PDs.

· Sensitivity to lighting conditions and optical environment: The performance of both YOLOv8 object detection and OCR character recognition is intrinsically dependent on the quality of the captured images. The methodology is sensitive to variations in ambient lighting conditions. Factors such as reflections, shadows, or uneven lighting can introduce noise and affect the accuracy of the algorithms. A controlled environment was maintained in our laboratory, but in-situ implementation would require more robust solutions, such as the use of controlled and polarized light sources or more advanced image preprocessing algorithms to normalize the captures.

· Transparency of the dielectric medium: The effectiveness of optical discharge detection relies on the assumption that the dielectric medium, in this case the oil, is optically transparent. In real-world applications, insulating oils degrade over time due to thermal and electrical stress, which can increase their turbidity, change their color, or generate suspended byproducts. This degradation would cause attenuation of the optical signal through scattering or absorption, making it difficult or even impossible to capture the discharge morphology, especially for low-intensity events.

· Direct optical access requirement: A fundamental requirement of this technique is the existence of a direct line of sight to the area where the discharges occur. Our experimental setup used a vessel with transparent walls, simulating an inspection window. However, most high-voltage electrical equipment in service is sealed metal vessels. Widespread application of this method would require the availability of equipment with inspection windows or the possibility of making significant structural modifications to install them, which is not always feasible, safe, or economically viable.

· Scalability to field equipment: The transition from a laboratory environment to on-site diagnostics on large equipment, such as power transformers, presents considerable challenges. The large internal volume of this equipment makes it complex to determine the optimal location of one or more cameras to cover all potential risk areas. Furthermore, integrating a vision system into the existing monitoring infrastructure and ensuring its durability in the harsh environmental conditions of an electrical substation are engineering hurdles that must be addressed for practical, large-scale implementation.”

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

I have no further comments.

Comments on the Quality of English Language

None

Reviewer 3 Report

Comments and Suggestions for Authors

I am confident that the revised manuscript has incorporated the reviewers’ valuable suggestions and has achieved a clear qualitative enhancement.
