1. Introduction
In the first half of 2025, 140,000 fires caused by electrical faults were reported across China, accounting for 25.4% of all reported fires during this period, the largest share of any fire cause [1]. The difficulty in accurately detecting arc faults makes them a persistent and major cause of electrical fires. Consequently, research into effective detection techniques is critically needed.
Traditional arc fault detection technologies primarily rely on the physical characteristics of arcs and time-frequency domain features. Zhao S. et al. [
2] extracted the electromagnetic radiation signals from steadily burning arcs in different DC systems to develop a fault detection method based on steady patterns in the frequency domain, such as the structural similarity index and 6 dB bandwidth bins. This approach can effectively avoid nuisance tripping and demonstrates good adaptability across different DC power systems. Xiong Qing et al. [
3] captured arc-induced high-frequency signals using parallel capacitors and employed a Rogowski coil to measure the amplitude of current pulses and differences in the integrated Fast Fourier Transform, enabling the detection and localization of series arc faults. Their method proved robust under various environmental and load conditions. Nashrulloh M. et al. [
4] investigated the current spectrum characteristics at different arc fault locations through PSIM simulation and MATLAB-based FFT analysis, achieving fault location by distinguishing harmonic components. However, this method is dependent on simulation environments and fixed load conditions. Kim et al. [
5] proposed detecting series AC arc faults using only voltage waveforms, identifying them by analyzing the unique symmetric energy profile formed by harmonics generated during arc ignition and extinction. Kavi et al. [
6] introduced a time-domain technique based on mathematical morphology, termed the Decomposed Open-Close Alternating Sequence (DOCAS), for large grid-connected photovoltaic systems. This method detects faults by correlating sustained random spikes in the algorithm’s output with the rate of change in DC arc current and voltage and utilizes the increased effective fault resistance for localization and noise suppression. For naval shipboard DC power systems with pulsed loads, Maqsood et al. [
7] adopted a clustering-based approach to extract unique feature vectors from Short-Time Fourier Transform (STFT) analysis, enabling the differentiation of load transients, shunt faults, and series arcing faults. Balamurugan et al. [
8] employed a computer-controlled mechatronic testbed to generate repeatable arc conditions and compared the effectiveness of Fast Fourier Transform (FFT) and STFT in analyzing arc voltage/current waveforms for PV arc fault detection. Cho et al. [
9] focused on optimizing frequency feature extraction for DC series arcs through FFT analysis by adjusting sampling frequencies and points, aiming to distinguish fault conditions from normal operations. He et al. [
10] proposed a dual-signal multi-timescale feature extraction framework guided by arcing physical characteristics, combined with a decision tree and criteria, achieving high detection accuracy in complex scenarios involving diverse loads, topologies, and arc-generating modes. Xiong et al. [
11] developed an accurate DC arc model incorporating steady-state impedance, high-frequency, and dynamic characteristics, and proposed a detection algorithm that integrates a K-line diagram with the spectrum integral difference in arc current. This approach can effectively discriminate between normal operation, arc faults, switching actions, and load mutations.
In recent years, significant advancements in artificial intelligence have provided new solutions for arc fault detection. Deep learning methods can automatically extract deeper features from raw signals, enabling more accurate fault identification. Numerous studies have focused on different approaches to data preprocessing, feature extraction, and model architecture. For instance, Y. Wang et al. [
12] proposed a hybrid detection method based on improved Mel-Frequency Cepstral Coefficients (MFCC) preprocessing and a lightweight neural network, achieving high recognition accuracy under various loads. Lu S. et al. [
13] addressed arc detection in photovoltaic systems using a Lightweight Transfer Convolutional Neural Network with Adversarial Data Augmentation (LTCNN-ADA), tackling challenges such as the discrepancy between source and target domain data and the scarcity of fault data in the target domain. Q. Yu et al. [
14] focused on low-voltage three-phase systems, proposing a fault arc phase selection method based on a global temporal convolutional network, which enhances feature extraction through an attention mechanism. However, such end-to-end intelligent diagnostic methods still commonly face issues including lack of model interpretability [
15], shortage of training data, and limited generalization capability due to models often being trained on single-load scenarios. To address these challenges, subsequent research has explored various improvements in feature enhancement, data representation, and model efficiency. Chu et al. [
16] proposed converting time-domain current signals into grayscale images and feeding them into a Long Short-Term Memory (LSTM) network for identification, achieving good performance in residential applications, though with reduced accuracy for thyristor-based loads. Yang et al. [
17] innovatively transformed current time series into visibility graphs and utilized a graph convolutional network for learning, improving detection robustness in environments with variable loads. Park et al. [
18] integrated artificial intelligence with Time-Frequency Domain Reflectometry (TFDR), employing denoising autoencoders and generative adversarial networks to enhance noise-resistant detection of series arc faults in DC grids. At the level of feature engineering, Dai et al. [
19] transformed signals into images via a relative position matrix, employing a mixed-attention residual network to detect singular features. Qu et al. [
20] proposed a multi-domain deep feature association framework, which integrates time-domain, frequency-domain, and wavelet packet energy features through a stacked neural network. Gong et al. [
21] devised a detection model that synergizes wavelet transform, eigenvalue decomposition, and a deep neural network, notable for its minimal input requirements and training efficiency. For multi-branch load circuits, Tian et al. [
22] introduced a feature enhancement method based on seasonal-trend decomposition and recursive least squares, effectively extracting fault features obscured by normal operating signals. Although these methods have achieved high accuracy under specific conditions, their generalizability, real-time performance, and interpretability in broader, more complex real-world scenarios require further verification and improvement.
The effectiveness of conventional machine learning algorithms in arc fault diagnosis is often constrained by their dependence on manually selected features [
23]. In contrast, prototype learning has emerged as a promising paradigm that integrates representation learning with case-based reasoning [
24]. It operates by learning representative prototypes (typical samples or patterns) for each category. During prediction, the model makes decisions by comparing the similarity between an input sample and all learned prototypes.
This paper proposes an arc fault detection method that combines a hybrid attention mechanism for extracting key arc features with the ideas of prototype learning. Experimental results demonstrate the effectiveness of the method.
The main contributions of this paper are as follows.
- (1)
An arc test platform covering 12 kinds of household loads is established. The proposed detection method achieves an accuracy of no less than 99% when an arc occurs in different load branches. After deployment to hardware, the method realizes arc fault detection under multi-load conditions.
- (2)
This paper proposes a hybrid attention mechanism to extract multi-dimensional arc features, facilitating the construction of a prototype set. Building on these features, we establish an arc fault detection model. The three-dimensional features serve as spatial coordinates, enabling the visualization of arc characteristics and thereby enhancing model interpretability.
- (3)
To achieve precise visualization of the prototype set, we propose a novel convex hull algorithm that iteratively approximates and refines the decision boundaries under different working conditions, thereby enhancing the representational accuracy of the prototype set.
3. Improved Attention Mechanism Fusion Feature Extraction Method
3.1. Prototype Learning
Prototype learning is a methodology rooted in metric learning. Its core idea is to learn a representative prototype for each class and perform classification or representation learning based on the distance between a query sample and each class prototype. This approach typically employs an embedding function that maps input data into a low-dimensional space, where samples cluster around their corresponding prototypes while prototypes of different classes are separated from each other. Owing to this mechanism, prototype learning can rapidly establish and update class representations with only a few samples, thereby effectively addressing classification problems under data scarcity. Moreover, it is structurally simple, computationally efficient, interpretable, and exhibits strong generalization capability.
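The nearest-prototype decision rule described above can be sketched in a few lines. This is an illustrative NumPy sketch under simple assumptions (class prototypes as mean embeddings, Euclidean distance); the function names are ours, not the paper's.

```python
import numpy as np

def build_prototypes(embeddings, labels):
    """Average the embedded samples of each class into one prototype vector."""
    classes = sorted(set(labels))
    return {c: np.mean([e for e, l in zip(embeddings, labels) if l == c], axis=0)
            for c in classes}

def classify(query, prototypes):
    """Assign the query to the class whose prototype is nearest (Euclidean)."""
    return min(prototypes, key=lambda c: np.linalg.norm(query - prototypes[c]))

# Toy example: two well-separated classes in a 2-D embedding space.
emb = np.array([[0.0, 0.1], [0.1, 0.0], [1.0, 1.1], [1.1, 1.0]])
lab = [0, 0, 1, 1]
protos = build_prototypes(emb, lab)
print(classify(np.array([0.05, 0.05]), protos))  # → 0 (nearest to class-0 prototype)
```

Because only the prototypes need to be stored and compared, class representations can be built or updated from a handful of samples, which is the property exploited under data scarcity.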
3.2. Multidimension Feature Extraction
In low-voltage AC systems, a series arc fault is characterized by a significant increase in line current harmonics and the emergence of a current-zero ‘flat shoulder’ in the time domain, alongside distinct high-frequency spectral signatures. To effectively construct a representative arc fault prototype and facilitate subsequent visualization, we select a tri-dimensional feature set encompassing the time domain, frequency domain, and the time derivative (rate of change). This multi-perspective approach leverages comprehensive signal information, thereby enhancing prototype representativeness. Time-domain features provide raw morphological information, offering the most intuitive depiction. Frequency-domain features reveal spectral composition, explaining the signal’s frequency structure. The time derivative captures the dynamic characteristics by describing the signal’s instantaneous rate of change. By integrating these three complementary dimensions within a unified feature space, the constructed prototype can characterize the target fault mode more comprehensively and intrinsically, mitigating the limitations inherent in any single-dimensional representation.
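The three complementary views of one sampled current cycle can be sketched as follows. This is an illustrative reduction, not the paper's exact pipeline: the FFT magnitude stands in for the frequency-domain features, and the first difference is a discrete stand-in for the rate of change di/dt.

```python
import numpy as np

def tri_domain_views(cycle):
    """Derive time-domain, frequency-domain, and rate-of-change views of a cycle."""
    time_view = np.asarray(cycle, dtype=float)   # raw morphology
    freq_view = np.abs(np.fft.rfft(time_view))   # spectral composition
    rate_view = np.diff(time_view)               # instantaneous rate of change
    return time_view, freq_view, rate_view

# One 50 Hz cycle sampled at 1000 points, matching the setup described later.
t = np.linspace(0.0, 0.02, 1000, endpoint=False)
cycle = np.sin(2 * np.pi * 50 * t)
tv, fv, rv = tri_domain_views(cycle)
print(tv.shape, fv.shape, rv.shape)
```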
3.3. Attention Mechanism
The attention mechanism in deep learning focuses on key information within the global context while ignoring secondary, irrelevant information, thereby improving the expressive ability and performance of the model [26], as shown in Figure 4.
The core of the attention mechanism is to compute weights for the Values according to the similarity between the Query and the Keys, and to generate the output by weighted summation. The specific calculation is given in Formula (1), where Q denotes the Query, K the Key, V the Value, and d_k the dimension of the key.
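As a concrete illustration, the scaled dot-product attention of Formula (1) can be written in a few lines of NumPy (shapes are toy examples):

```python
import numpy as np

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, as in Formula (1)."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # Query-Key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted sum of Values

Q = np.array([[1.0, 0.0]])
K = np.array([[1.0, 0.0], [0.0, 1.0]])
V = np.array([[10.0], [20.0]])
print(attention(Q, K, V))  # output leans toward the first Value
```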
By integrating multiple attention forms, the hybrid attention mechanism effectively strengthens the ability of the deep learning model to capture and select key features. Let the input feature map be X ∈ R^(H×W×C), where H, W, and C denote the height, width, and number of channels, respectively. The global information extraction of channel attention is shown in Equation (2).
Z_c denotes the global channel statistics vector, and i, j index the spatial positions. The channel weight calculation is expressed in Equation (3).
F_1 and F_2 are the weights of the fully connected layers, σ is the sigmoid function, δ is the Rectified Linear Unit (ReLU) activation function, and A_c is the channel attention weight. In spatial attention, spatial feature extraction is represented by Formulas (4) and (5).
M_avg and M_max represent the average feature map and the maximum feature map, respectively, where c is the channel index. The calculation of the spatial weight is shown in Equation (6).
In Equation (6), M_cat is the concatenation of the two feature maps, f is a k × k convolution operation (usually k = 7), and A_s is the spatial attention weight. The weight combination of the mixed attention is expressed in Equation (7).
In Equation (7), α and β are the learnable fusion weights, with α + β = 1.
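The channel and spatial attention pipeline of Equations (2) through (7) can be sketched as follows. This is a hedged NumPy illustration: F_1, F_2 are random stand-ins for learned weights, a simple mean filter stands in for the k × k convolution, and α, β are fixed rather than learned.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
H, W, C, r = 4, 4, 8, 2
X = rng.standard_normal((H, W, C))

# Channel attention, Eqs (2)-(3): z = GAP(X); A_c = sigmoid(F2 · ReLU(F1 · z)).
z = X.mean(axis=(0, 1))                      # Eq (2): global channel statistics
F1 = rng.standard_normal((C // r, C))        # bottleneck weights (random stand-in)
F2 = rng.standard_normal((C, C // r))
A_c = sigmoid(F2 @ np.maximum(F1 @ z, 0.0))  # Eq (3), shape (C,)

# Spatial attention, Eqs (4)-(6): pool over channels, filter, sigmoid.
M_avg = X.mean(axis=2)                       # Eq (4)
M_max = X.max(axis=2)                        # Eq (5)
M_cat = np.stack([M_avg, M_max], axis=0)     # concatenated feature maps
A_s = sigmoid(M_cat.mean(axis=0))            # Eq (6) with a mean-filter stand-in

# Eq (7): fusion of the two attention weights, alpha + beta = 1 (fixed here).
alpha, beta = 0.6, 0.4
out = X * (alpha * A_c[None, None, :] + beta * A_s[:, :, None])
print(out.shape)
```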
The ordinary single-head attention mechanism is good at capturing global long-term dependencies within a sequence, but its computational complexity is high, it easily loses key positional information owing to the lack of a built-in position-awareness mechanism, and its weights may become unstable in the presence of noise. The Squeeze-and-Excitation Network (SENet), a lightweight and efficient channel attention mechanism, effectively improves the model's sensitivity to important channel features; however, it entirely ignores the spatial dimension, and its channel weights, generated by global pooling, easily become unstable under noise interference [27]. The Convolutional Block Attention Module (CBAM) is a hybrid attention mechanism that achieves more comprehensive feature enhancement by combining the channel and spatial dimensions [28]. However, its spatial attention module is redundant when processing one-dimensional signals, and its channel attention remains insufficient for modeling complex cross-channel nonlinear interactions [29].
3.4. TDDA Module
The one-dimensional nature of arc fault signals necessitates a model capable of precisely identifying and extracting the decisive regions for fault diagnosis. To construct a highly discriminative feature representation for low-voltage AC arc faults, this study begins with an analysis of the underlying physical characteristics: the fault signature manifests as millisecond-level high-frequency oscillations within the current waveform. Its energy distribution varies nonlinearly with load conditions and is frequently obscured by background noise. Conventional feature extraction methods face a fundamental trade-off between capturing transient details and maintaining adaptability across varying operational conditions. While fixed-threshold techniques are susceptible to missing subtle arcs, purely data-driven models often overfit in limited-data scenarios. To overcome these limitations, we propose a triadic attention fusion paradigm. This approach integrates three complementary components: a static unit matrix A0 encoding foundational physical constraints, a dynamic convolution weight A1 that adapts to load-induced feature drift, and a trainable offset matrix A2 dedicated to modeling nonlinear inter-channel interactions. Instantiating this paradigm, we present the Tri-Domain Dynamic Attention (TDDA) module, a lightweight, embeddable component designed for CNN architectures.
Based on the theoretical principles of attention mechanisms detailed in
Section 3.3, this work recombines the core design elements of global statistical aggregation, channel-wise reweighting, and spatial-sequence focusing as encapsulated in Equation (1) through Equation (7). The core design of the TDDA module is to decompose the hybrid attention mechanism into three dedicated yet cooperative matrices. The static identity matrix A0 establishes a stable prior for channel attention, providing a constant and robust initial reference state aligned with the concept formalized in Equation (3). The dynamic convolution weight A1, structured through a learnable bottleneck operation corresponding to Equation (8), emulates the input-adaptive weight generation of self-attention from Equation (1). It also incorporates the focused regional emphasis characteristic of the spatial attention mechanisms described in Equation (4) through Equation (6). The trainable bias matrix A2 introduces higher-order nonlinear interactions, a component often abstracted in standard attention formulations. This matrix functions as a parameter set optimized via gradient-based learning, serving to refine the combined weighting of A0 and A1. This fusion and refinement process is analogous to the weighting principle illustrated in Equation (7) and enhances the modeling of complex cross-channel dependencies. Through the synergistic operation of A0, A1, and A2, the TDDA module preserves the representational strength of the classical attention framework while specifically augmenting its stability, selectivity, and nonlinear modeling capacity for processing one-dimensional fault signals. The structure of the module is shown in
Figure 5. The fusion operation at the bottom of
Figure 5 implements the final step of the ternary attention mechanism. It computes a weighted sum of the static prior A0 and the dynamic weight A1, then adds the trainable offset A2 to generate the final attention weights for feature modulation. This design directly extends the core fusion principle outlined in Equation (7).
In this model, B represents the batch size, L the sequence length, and C the number of channels; the dimension of the input data is B × L × C. First, Global Average Pooling (GAP) compresses the sequence dimension and generates a channel-level statistical description vector; by computing the global average of each channel, the overall characteristics of that channel are captured. The pooled data is then transposed and given an additional dimension to meet the input requirements of the 1 × 1 convolution. The 1 × 1 convolution reduces the channel dimension to C/r (r is the preset compression ratio), and the dynamic weight expansion module then expands the result to obtain the symmetric matrix A1. The generation of A1 can be expressed by Formula (8), where U is given by Formula (9).
In Formulas (8) and (9), W1 is the learnable weight matrix, r is the compression ratio, and d = C/r is the scaling factor.
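The A1 generation path (GAP, bottleneck projection, symmetric expansion) can be sketched as follows. This is a hedged illustration of Formulas (8) and (9): the outer-product expansion and the random W1 are our illustrative assumptions, not the paper's exact operators.

```python
import numpy as np

rng = np.random.default_rng(1)
B, L, C, r = 2, 1000, 16, 8
d = C // r                               # scaling factor d = C/r
X = rng.standard_normal((B, L, C))       # batch of one-cycle signals

u = X.mean(axis=1)                       # GAP over the sequence: (B, C)
W1 = rng.standard_normal((d, C))         # learnable weight (random stand-in)
U = u @ W1.T                             # bottleneck projection: (B, d)
A1 = U[:, :, None] * U[:, None, :]       # symmetric expansion: (B, d, d)

print(A1.shape)
```

The outer product guarantees that each B × d × d slice of A1 is symmetric, matching the "symmetric matrix A1" described above.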
A1 participates in subsequent processing as a dynamic weight, A0 serves as the static identity matrix, and A2 acts as the offset matrix, as shown in Figure 6.
The matrix A2 is initialized as an all-zero matrix, with dimensions B × C/r × C/r consistent with A1. Its parameters are then optimized via backpropagation based on the model’s loss function. Through this process, A2 learns to represent nonlinear interactions across channels, which enhances the model’s adaptability to noise interference and nonlinear patterns and provides a corrective adjustment to the static and dynamic weights. This generation process for A2 is summarized by Equation (10).
Formula (10) gives the result of the t-th iteration of the gradient descent update, where η is the learning rate and L is the loss function. The accuracy and loss curves of the A2 training process are shown in Figure 7.
The synergistic design of the three matrices is tailored to precisely capture the distinctive physical characteristics of arc faults. The static identity matrix A0 provides a stable, physics-aligned prior, ensuring consistent attention to fundamental channel features that characterize the arc’s core energy distribution. This enhances model robustness against strong noise interference. The dynamic convolution weight A1 adaptively learns from the input signal to locate and amplify the key millisecond-level high-frequency oscillatory transients in the current waveform, which vary with load conditions. This enables dynamic focus on the most discriminative temporal regions for fault identification. The trainable offset matrix A2 is dedicated to modeling complex nonlinear interactions across channels—an essential aspect of arc behavior. By applying a learned nonlinear correction to the initial attention weights formed by A0 and A1, A2 refines the feature representation, allowing the model to better distinguish genuine arc signatures from background noise or load harmonics.
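The fusion of the three matrices described above can be sketched as follows. This is a minimal illustration of the step shown in Figure 5 and Equation (7): a weighted sum of the static prior A0 and the dynamic weight A1, plus the trainable offset A2 (shown here at its all-zero initialization, before training).

```python
import numpy as np

rng = np.random.default_rng(2)
B, d = 2, 4
A0 = np.broadcast_to(np.eye(d), (B, d, d))   # static identity prior
A1 = rng.standard_normal((B, d, d))          # dynamic, input-adaptive weight
A2 = np.zeros((B, d, d))                     # trainable offset, zero-initialized

alpha, beta = 0.5, 0.5                       # learnable in the model; fixed here
A = alpha * A0 + beta * A1 + A2              # final attention weights
print(A.shape)
```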
To illustrate the contribution of the different components of TDDA, we designed an ablation experiment using 7500 samples, comprising 3744 arc samples and 3756 non-arc samples. Based on the CNN model in Section 4.1, we successively add the A1 matrix, then A1 and A0 together, and finally the complete TDDA module, as shown in Table 2.
The TDDA module introduces three matrices: A1, A2, and A0. Among these, A1 is the core component for extracting key features from arc samples, while A2 is a trainable offset matrix that enables nonlinear channel interactions. The introduction of A1 and A2 accounts for the majority of the module’s parameter increase compared to the baseline. A0 provides a channel-attention prior, further refining the feature selection. Collectively, these components allow the TDDA-enhanced model to focus on more discriminative features, leading to a significant improvement in accuracy over the conventional CNN baseline. Ultimately, this design achieved an 8.51% increase in accuracy with only a 3.1% growth in model parameters.
By comparing the accuracy, parameter increment, and FLOPs increment of the proposed model with the TDDA module against other hybrid attention mechanisms, the effect of the TDDA module is further illustrated, as shown in Table 3. The reduction ratio of SE and CBAM is 16, the convolution kernel size of Efficient Channel Attention (ECA) is 3, and the reduction ratio of TDDA is 8. Compared with other hybrid attention mechanisms, TDDA adds a static identity matrix A0 to provide a channel-independence prior, which solves SENet's problem of weight instability under noise. For the one-dimensional signal scenario, the redundant spatial attention branch of CBAM is removed and its two-dimensional convolution is replaced with a lightweight one-dimensional convolution. By introducing the trainable offset matrix A2 into the channel attention, CBAM's shortcomings in cross-channel nonlinear modeling are addressed. Compared with ordinary attention mechanisms, TDDA uses a dilated convolution hierarchy instead of positional encoding to avoid the loss of absolute position information, and performs temporal compression through global average pooling, significantly reducing the computational complexity.
4. Arc Fault Detection Method Based on Fusion Prototype Learning Model
4.1. TDDA-CNN Prototype Learning Model
This paper proposes a backbone network architecture for multi-dimensional feature extraction, which is optimized for three types of derivative features of 50 Hz AC current signals: the time-domain waveform, the frequency-domain distribution, and the time-domain rate of change (di/dt). Based on data sampled at 100 kHz (1000 points per power frequency cycle), the network uses three parallel Tri-Domain Dynamic Attention-Convolutional Neural Network (TDDA-CNN) branches that take one-dimensional data as input, extract features through three convolution modules, and finally generate high-resolution prototype vectors through Dense-layer compression, yielding the basic characteristics of the prototype, as shown in Figure 8.
The features extracted from the backbone network are embedded into a three-dimensional feature space that jointly represents information from the time domain, the frequency domain, and their respective rates of change, thereby constructing the visual prototype. In the first convolution block, 1-Dimensional Convolution (Conv1D) is used to extract local features, ReLU activation introduces nonlinearity, and 1-Dimensional Max Pooling (MaxPool1D) downsampling retains the main features. In the second convolution block, Conv1D is used to expand the receptive field, ReLU introduces nonlinearity, TDDA module is used for channel attention calibration, noise suppression, and MaxPool1D is used for downsampling. In the third convolution block, Conv1D is used to further expand the convolution receptive field, ReLU introduces nonlinearity, TDDA is used for secondary calibration of deep features, and 1-Dimensional Global Average Pooling (GlobalAvgPool1D) is used to globally average along the sequence dimension to output global feature vectors. After the feature vectors of the three dimensions are generated, the feature fusion, dimension reduction and visualization operations are further performed to generate the prototype in the three-dimensional scene. The introduction of Gated Linear Unit (GLU) in the change rate branch can better capture the characteristics of the current change signal. The activation function ReLU enhances the nonlinear expression ability of arc oscillation characteristics in the time domain branch, strengthens the nonlinear relationship between frequency bands in the frequency domain branch, and realizes the gated control and characteristic nonlinear enhancement in the rate of change branch.
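The convolution blocks described above can be sketched in a deliberately simplified single-channel NumPy form. The kernel values and sizes are arbitrary illustrations, not the trained network's parameters, and the TDDA and GLU stages are omitted for brevity.

```python
import numpy as np

def conv1d(x, kernel):
    """Valid-mode 1-D convolution (cross-correlation) of x with kernel."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

def relu(x):
    return np.maximum(x, 0.0)

def maxpool1d(x, size=2):
    """Non-overlapping max pooling; trailing remainder is dropped."""
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)

def global_avg_pool(x):
    return float(x.mean())

x = np.sin(np.linspace(0, 4 * np.pi, 1000))            # toy 1000-point input
h = maxpool1d(relu(conv1d(x, np.array([1.0, -1.0]))))  # block 1: local features
h = maxpool1d(relu(conv1d(h, np.array([0.5, 0.5]))))   # block 2: wider receptive field
feat = global_avg_pool(relu(conv1d(h, np.array([1.0, 0.0, -1.0]))))  # block 3
print(type(feat))
```

In the real branches, each block carries many channels and the second and third blocks insert the TDDA module before pooling; this sketch only traces the shape flow from a 1000-point cycle down to a global feature value.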
The structure of each layer of the backbone network is shown in Table 4, where L is the sequence length and C is the number of channels; a 1000 × 1 input sample is used as an example.
Current data under both normal and arc fault conditions were collected for 12 types of household loads using the experimental platform described in
Section 2.1. These data were then used to train the backbone network. The sampling rate is 100 kHz, one cycle lasts 20 ms, and one cycle of data is used as one sample. The signals collected by the experimental platform were normalized, averaged, and processed by the Fourier transform to obtain the dataset, comprising both arc and arc-free data. The dataset was partitioned into training, validation, and test sets using a stratified random sampling approach. This method ensures that the class distribution across all subsets reflects the original proportion of arc-containing and arc-free samples. Specifically, samples were first grouped by label into two separate subsets. Each subset was then independently shuffled and randomly divided into proportions of 75%, 15%, and 10% to form the training, validation, and test splits, respectively. Finally, the corresponding splits from both categories were combined to create the final datasets, as shown in Table 5.
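The stratified split procedure can be sketched as follows (a hedged illustration; the function name and integer rounding of the 75/15/10 proportions are ours):

```python
import random

def stratified_split(samples, labels, seed=42):
    """Group by label, shuffle each group, slice 75/15/10, then merge."""
    random.seed(seed)
    train, val, test = [], [], []
    for c in set(labels):
        group = [s for s, l in zip(samples, labels) if l == c]
        random.shuffle(group)                # independent shuffle per class
        n = len(group)
        n_tr, n_va = int(0.75 * n), int(0.15 * n)
        train += group[:n_tr]
        val += group[n_tr:n_tr + n_va]
        test += group[n_tr + n_va:]
    return train, val, test

# Toy run: 100 samples, two balanced classes.
tr, va, te = stratified_split(list(range(100)), [i % 2 for i in range(100)])
print(len(tr), len(va), len(te))
```

Because the split is performed per class before merging, each subset preserves the original ratio of arc to arc-free samples.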
Training was performed in batches for a total of 10 epochs with a batch size of 32. Adam was selected as the optimizer, the learning rate was set to 0.0001, the random seed was fixed at 42, and cross entropy was used as the loss function. The accuracy and loss curves during training are shown in Figure 9.
The arc fault recognition model achieved an accuracy of 99.65%, with a precision of 99.75%, a recall of 99.41%, a False Positive Rate (FPR) of 0.174%, and a False Negative Rate (FNR) of 0.594%. The specific results are shown in
Table 6. These results were confirmed through repeated training runs to verify the model's generalization.
The confusion matrix presented in
Figure 10 categorizes the 12 tested loads into four types: resistive loads, power electronic loads, motor loads, and gas discharge loads. Specifically, the resistive load group comprises a water heater, a bathroom heater, an electric iron, and a water dispenser. The power electronic load group includes LED lights, a switching power supply, an induction cooker, and a microwave oven. Motor loads consist of a vacuum cleaner, a refrigerator, and a washing machine, while the gas discharge load is represented by a fluorescent lamp.
To evaluate the model’s robustness against typical household electromagnetic interference—such as equipment transients, background noise, and AC powerline disturbances—we conducted tests by injecting three types of noise: Gaussian white noise, impulse noise, and periodic noise [
30,
31]. The Signal-to-Noise Ratio (SNR) was varied from 30 dB to −5 dB to simulate conditions ranging from typical to harsh home environments. The detailed performance metrics under these noise conditions are summarized in
Table 7.
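The noise injection used in these tests can be sketched as follows for the Gaussian white noise case. This is an illustrative implementation of scaling noise to a prescribed SNR in dB; the impulse and periodic noise generators are analogous but omitted here.

```python
import numpy as np

def add_noise_snr(signal, snr_db, seed=0):
    """Add Gaussian white noise scaled so the SNR equals snr_db (in dB)."""
    rng = np.random.default_rng(seed)
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))   # target noise power
    noise = rng.standard_normal(len(signal)) * np.sqrt(p_noise)
    return signal + noise

t = np.linspace(0.0, 0.02, 1000, endpoint=False)
clean = np.sin(2 * np.pi * 50 * t)
noisy = add_noise_snr(clean, snr_db=0.0)  # 0 dB: equal signal and noise power
measured = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
print(round(measured, 1))  # close to the requested 0 dB
```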
The robustness evaluation across varying SNRs and noise types reveals a consistent performance trend. The model’s accuracy monotonically declines as the SNR decreases, with a more pronounced degradation observed below 5 dB. Notably, even under extreme noise conditions at 0 dB and −5 dB, the model maintains robust recognition accuracy. Among the three noise types, impulse noise causes the most significant performance drop due to its transient similarity to arc signatures, whereas periodic noise is relatively easier to suppress owing to its regular pattern.
The horizontal and vertical axes of the confusion matrix correspond to the predicted and actual categories, respectively, and the values on the main diagonal are the numbers of correctly identified samples. Because the gas discharge category contains only a single load type, the prototype-learning-based model has fewer samples available for feature extraction than for the other load types, so the arc fault detection accuracy for gas discharge loads is lower than that for the other load types.
To illustrate the contribution of each branch of the model, we designed ablation experiments comparing the accuracy and parameter counts of the time-domain, frequency-domain, and rate-of-change branches in turn, as shown in Table 8, where TIM, FRE, and DEL denote the time-domain, frequency-domain, and rate-of-change branches, respectively. It can be seen from Table 8 that, under the action of TDDA, even a single branch achieves high detection accuracy; among the branches, the frequency-domain branch shows the most significant improvement in accuracy.
4.2. Visualization of 3D Prototype Feature Set
Through the processing of arc data by the TDDA-CNN prototype learning model, the arc prototype feature set under the corresponding load can be obtained. The end of each branch is a 32-dimensional fully connected layer, which transforms the learned features into a 32-dimensional vector. For each sample i, we define a three-dimensional mapping function as shown in Equation (11).
The calculation of each point P is shown in Formula (12), where the three f terms in M denote the k-th elements of the time-domain, frequency-domain, and rate-of-change branch feature vectors, respectively.
For each class c, the generation of the prototype point set is shown in Equation (13), where Nc is the number of samples in class c.
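The mapping of Equations (11) through (13) can be sketched as follows. This is a hedged illustration: the choice of index k and the random branch features are stand-ins for the real 32-dimensional branch outputs.

```python
import numpy as np

def to_point(f_time, f_freq, f_rate, k=0):
    """Map a sample's three branch feature vectors to one 3-D point P_i."""
    return np.array([f_time[k], f_freq[k], f_rate[k]])

def prototype_set(samples, labels, cls, k=0):
    """Collect the 3-D points of all samples belonging to class cls."""
    return np.stack([to_point(*s, k=k)
                     for s, l in zip(samples, labels) if l == cls])

# Toy data: ten samples, each with three 32-dimensional branch vectors.
rng = np.random.default_rng(3)
samples = [tuple(rng.standard_normal(32) for _ in range(3)) for _ in range(10)]
labels = [0] * 5 + [1] * 5
P0 = prototype_set(samples, labels, 0)
print(P0.shape)  # five 3-D points for class 0
```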
To visualize the prototype features, we construct a three-dimensional coordinate system. Its axes correspond to the time-domain features, frequency-domain features, and the feature change rate, respectively. For each arc fault sample, the output vector from the fully connected layer of the TDDA-CNN model is mapped to a point in this space, forming a single prototype. By aggregating a large number of such prototypes generated under the same working condition, a cluster emerges. The processed visualization of this cluster constitutes the arc prototype feature set for that specific condition, as shown in
Figure 11.
In Figure 11, the arc fault prototype sets of different loads occupy different ranges and reach different maxima along each axis, yet each clusters within a bounded region, which lays the foundation for delimiting the arc prototype feature set under different working conditions.
4.3. Arc Fault Prototype Feature Set Correction
To enhance the detection accuracy, this paper proposes a corrective method that refines the arc prototype set by leveraging the non-arc prototype set. The procedure consists of three key steps. First, the non-arc prototype set is constructed from normal load data to serve as a reference for the negative class (normal operation) in the feature space. Second, an improved convex hull algorithm is employed to delineate the geometric boundaries for both the arc and non-arc prototype sets, respectively. Finally, the overlapping regions between these two boundaries are identified and removed from the arc prototype set. This yields a refined decision region for arc faults, which is more discriminative by explicitly excluding ambiguous zones prone to confusion with normal states.
4.3.1. Convex Hull Approximation Algorithm
The proposed arc fault detection method utilizes a prototype feature set derived from the TDDA-CNN model. For an input sample under unknown operating conditions, the model outputs a prototype point, which is then visualized as a set of coordinates in a three-dimensional feature space. A fault is detected if this point falls within the spatial region occupied by the arc fault prototype set.
However, a simple convex hull is often insufficient for precisely delineating the complex boundary of this region, potentially leading to misclassification. To address this, we design a convex hull approximation algorithm to more accurately model the boundary, particularly for the arc-free point set. The specific steps of this algorithm are described below.
First, n boundary points are randomly selected as the initial vertex set V. These points should lie at or near the boundary of the real prototype set so that the algorithm iterates from a reasonable starting point. In the tth iteration, each point in the vertex set V(t) is pre-expanded and then corrected. Finally, convergence is judged after the search space is updated.
Convex Hull Pre-Expansion and Boundary Correction
For each vertex vi(t), removing it from the current vertex set V(t) yields a reduced vertex set. A point v is then sought in the search space Fi(t) such that the convex hull formed by adding v to the reduced set has the largest volume. This newly found point, denoted vi(pre), is the pre-expansion point.
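The pre-expansion step can be sketched as a search over candidate points for the replacement that maximizes hull volume. The exhaustive loop over a finite candidate set is an illustrative simplification of the search over Fi(t).

```python
import numpy as np
from scipy.spatial import ConvexHull

def pre_expand_vertex(vertices, i, search_points):
    # Remove vertex i, then pick the candidate from the search space
    # whose inclusion maximizes the convex-hull volume. The winner is
    # the pre-expansion point v_i^(pre).
    reduced = np.delete(vertices, i, axis=0)
    best_v, best_vol = None, -np.inf
    for v in search_points:
        vol = ConvexHull(np.vstack([reduced, v])).volume
        if vol > best_vol:
            best_v, best_vol = v, vol
    return best_v
```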
Since the pre-expansion point vi(pre) may not lie on the boundary of the real prototype set, it must be corrected onto the boundary. For a point v, the loss function L(v) is defined as the classification confidence, and its gradient is given in Formula (14).
The iterative update formula is shown in Equation (15), where αt is the adaptive step size and Projn denotes projection along the normal vector.
The adaptive step size is shown in Equation (16), where η is the basic learning rate and Lmax is the boundary threshold.
Following these steps, iterative correction with vi(pre) as input yields the point vi(corr), which lies close to the real boundary.
Space Update and Convergence Judgment
After obtaining the correction point vi(corr), the search space associated with the vertex must be updated for subsequent iterations. First, the normal vector ni(t) at the correction point vi(corr) is calculated; it is perpendicular to the real boundary at this point and points outward from the convex hull. The search space Fi(t) is then updated as the intersection of the original search space and a half-space. This half-space is defined by the normal vector ni(t) and the point vi(corr) and retains only the points on or inside the boundary, thereby narrowing the search range.
After completing a round of iterations over all vertices, a new vertex set V(t+1) is obtained. The relative volume change between the new and old convex hulls is then computed. If this relative change is less than a preset threshold, the convex hull is considered to have converged and the iteration stops; otherwise, the next iteration proceeds.
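The half-space intersection and the volume-based convergence test might be sketched as follows, assuming the search space is represented by a finite set of candidate points.

```python
import numpy as np
from scipy.spatial import ConvexHull

def update_search_space(candidates, v_corr, n_vec):
    # Keep only candidates on the inner side of (or on) the boundary,
    # i.e., the half-space defined by the outward normal n_vec at v_corr.
    keep = (candidates - v_corr) @ n_vec <= 0
    return candidates[keep]

def has_converged(old_vertices, new_vertices, tol=1e-3):
    # Stop when the relative volume change between hulls falls below tol.
    v_old = ConvexHull(old_vertices).volume
    v_new = ConvexHull(new_vertices).volume
    return abs(v_new - v_old) / v_old < tol
```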
4.3.2. Convex Hull Correction Effect
By constructing convex hulls and selecting subsets of points in three-dimensional space as the boundaries of the resulting solids, the prototype feature set can be fitted intuitively, as shown in
Figure 12.
To address the potential spatial overlap between non-arc and arc prototype sets, which can degrade detection accuracy, we implement a correction step. As
Figure 13 illustrates, we first train the TDDA-CNN model exclusively on non-arc samples. Then, following the method in
Section 4.3.1, we identify the overlapping regions between the non-arc and arc prototype sets and excise them from the arc prototype set. This refinement of the arc prototype set sharpens the detector's overall discrimination capability.
As shown in
Figure 13, the yellow and blue regions represent the arc and non-arc prototype set regions, respectively, and the green region represents their overlap. After the non-arc data are processed by the TDDA-CNN prototype learning model, the generated non-arc prototype set either overlaps the arc prototype set or does not. In the non-overlapping case, the arc prototype set is unaffected. In the overlapping case, the overlap is removed from the arc prototype set, thereby correcting it.
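The overlap-removal step can be sketched by testing each arc prototype point against the convex hull of the non-arc set. Using the raw Delaunay hull here is a simplification of the corrected hull from Section 4.3.1.

```python
import numpy as np
from scipy.spatial import Delaunay

def remove_overlap(arc_points, non_arc_points):
    # Excise from the arc prototype set every point that falls inside
    # the convex hull of the non-arc prototype set.
    hull = Delaunay(non_arc_points)
    outside = hull.find_simplex(arc_points) < 0
    return arc_points[outside]
```

The surviving points define the corrected arc decision region.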
After the prototype sets of the different loads are corrected, the data to be detected are processed by the TDDA-CNN prototype learning model, yielding the arc fault detection accuracy before and after convex hull correction under each load. To further illustrate the effect of the correction, these accuracies are compared with those of the minimum bounding rectangle, ellipse fitting, and Gaussian mixture model methods, as shown in
Table 9. As shown in
Table 9, the improved convex hull algorithm achieves superior performance after correction under all loads compared with the other methods.
In this paper, the TDDA module, the CNN network, and prototype learning form a unified architecture that is closely coordinated and deepens layer by layer. As an attention mechanism embedded in each branch, the TDDA module dynamically strengthens the channel information most relevant to the arc state in the features extracted by the CNN, improving feature discrimination. On this basis, the CNN further fuses and abstracts the local patterns of each branch through multi-layer convolution and pooling, finally mapping the high-dimensional features of each branch into a 32-dimensional feature vector. These three 32-dimensional vectors are transformed into X, Y, and Z coordinates that together form a point in three-dimensional space, turning multivariate time-series features into a geometric representation that can be intuitively expressed and measured.
Based on this geometric representation, prototype learning constructs a self-evolving decision framework: points of similar samples are clustered in three-dimensional space to form the 'arc' and 'no arc' prototype point sets, which are continuously updated. New samples are classified by comparing their spatial distances to the prototype points, and their features in turn participate in the iterative correction of the prototype set. TDDA and the CNN thus jointly perform feature extraction and structuring, transforming the raw signal into a highly discriminative spatial point, while prototype learning provides interpretable continual learning and inference on top of it. The three stages progress from feature optimization through space construction to dynamic discrimination and are unified in an end-to-end arc fault detection system.
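The distance-based classification of new samples can be sketched as a nearest-prototype rule. The Euclidean metric and the dictionary layout of the prototype sets are assumptions made for this sketch.

```python
import numpy as np

def classify(point, prototypes):
    # prototypes: dict mapping a class label ("arc" / "no_arc") to an
    # (N, 3) array of its prototype points in the 3-D feature space.
    # Assign the label whose nearest prototype point is closest.
    best_label, best_d = None, np.inf
    for label, pts in prototypes.items():
        d = np.min(np.linalg.norm(pts - point, axis=1))
        if d < best_d:
            best_label, best_d = label, d
    return best_label
```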
5. Experimental Verification
To evaluate the performance of the proposed model, an intelligent circuit breaker running the proposed arc fault judgment algorithm is connected to the multi-load topology, as shown in
Figure 14. Load types include common household loads such as fluorescent lamps, LED lamps, switching power supplies, water dispensers, vacuum cleaners, refrigerators, microwave ovens, induction cookers, electric irons, and water heaters. These loads can be energized individually or in combination. On an energized load branch, a carbonized cable is used to simulate the kind of arc fault that can develop into an electrical fire in a real scenario, testing whether a circuit breaker deploying this method can detect the arc and cut off the line in time to prevent an electrical fire.
The model is deployed on an STM32H750 microcontroller (manufactured by STMicroelectronics, Geneva, Switzerland; sourced from Huai'an Shenbiao Intelligent Technology Co., Ltd., Huai'an, China). The single-inference latency of the optimized model on this platform is 7.9 ms, which meets the real-time requirements of the protection action. Under continuous operation, the CPU load is about 65-70%, and the model weights and peak Random Access Memory (RAM) occupancy are 412 KB and 89 KB, respectively. This resource usage is well below the chip's limits, reflecting the lightweight deployment. Under a 3.3 V supply, the average operating current of the system is 42 mA, showing good power consumption characteristics.
The breaking results of the circuit breaker under the scenario of single load arc are shown in
Figure 15. It can be seen from
Figure 15 that a circuit breaker deploying this method can operate shortly after the arc occurs, clearing the arc fault and protecting the equipment.
There may also be multiple loads working at the same time in the household scenario. In the case of arcing of multiple loads, the breaking results of the circuit breaker are shown in
Figure 16. It can be seen from
Figure 16 that a circuit breaker deploying this method likewise operates shortly after the arc occurs, clearing the arc fault and protecting the equipment.
To verify the reliability of the test results, multiple tests were carried out on single-load and multiple-load arcing scenarios in accordance with the requirements of the IEC 62606 standard [
25]. Each load type was tested three times, as shown in
Table 10. The effect under different loads is further illustrated by
Figure 17. In the tested household electricity scenarios, the circuit breaker using this method identified low-voltage AC arc faults reliably, and no missed detection or false tripping was observed.
Figure 17 presents the statistical response times across all tested load scenarios. This bar chart aggregates the 24 data points (8 scenarios × 3 trials) from
Table 10. It can be observed that the variation in detection time (indicated by the error bars) for each load is minimal. Furthermore, all measured action times fall within the required limits stipulated by the IEC 62606 standard for the corresponding test conditions. These results collectively demonstrate the high stability and reliability of the algorithm under various and complex real working conditions.
To systematically evaluate the limit performance and generalization capability of the proposed algorithm under unknown and complex working conditions, this section designs an advanced stress test that goes beyond the standard test specifications. The test aims to proactively explore the performance boundaries of the algorithm. With reference to the line combination configuration of the test platform described in
Section 2, multiple sets of random load and multi-line combination scenarios that were absent from the training phase are constructed, with their complexity sequentially increasing. This design simulates unpredictable extreme power usage combinations that may occur in actual household grids, thereby verifying the model’s robustness when confronted with out-of-distribution samples. Each condition was tested 100 times. The key performance indicators obtained from this stress test, including the action time, number of false trips, and number of missed detections, are presented in
Table 11 below.
The test results delineate the trend of algorithm performance with increasing system complexity. Firstly, the algorithm maintains a 100% correct action rate across most untrained random combination scenarios, demonstrating its strong generalization capability. However, under the most complex condition of a random four-line combination, a single missed detection occurred, concomitant with the widest action time range observed. These concurrent findings indicate that when the number and complexity of arc signals requiring concurrent processing reach a certain threshold, the system’s real-time computing resources and decision margin encounter their limits. This point, therefore, defines the performance boundary of the algorithm under the current deployment configuration. This discovery holds significant engineering value, as it clearly delineates the stable operational range of the algorithm and provides a quantitative basis for subsequent hardware selection and system capacity design.
Table 12 and
Figure 18 compare the performance of the proposed method with other low-voltage AC series arc fault detection algorithms. Our method shows superior performance in feature extraction, recognition method, load applicability, and accuracy. It is important to note that the “Number of loads” metric refers to the count of distinct load types tested, not the number operating concurrently. Specifically, within the IEC 62606 standard framework, our method achieves detection for the greatest variety of loads and the highest accuracy, which substantiates its effectiveness and advanced nature.
Figure 18 presents a radar chart comparing the performance of the proposed method and three existing approaches across two key dimensions: number of load types covered and detection accuracy. The radar chart is constructed with five concentric rings, where each ring represents an effectiveness level on a scale from 1 to 5, with 1 being the least effective and 5 being the most effective. As shown, the proposed method occupies the outermost region in the chart, signifying its superior overall performance. Specifically, it achieves the highest load coverage while maintaining a top-tier accuracy of 99.65%. This result demonstrates an effective balance between generalizability and precision.
The practical deployment of the proposed method should consider several potential limitations. First, under extreme electromagnetic interference—such as that generated by large motor drives or radio frequency devices—the acquired current signal quality may degrade, which could theoretically affect the stability of feature extraction and increase variance in response times. Second, while the hardware validation included a representative set of household loads, it does not encompass all possible appliance types or combinations. Therefore, the model’s ability to generalize to unseen loads, particularly those with novel topologies or operating principles, requires further verification. Furthermore, the tests validated performance for single arc faults during multi-load operation; however, more complex scenarios involving concurrent intermittent arcs at multiple locations were not evaluated. The system’s capability to reliably identify and discriminate such rare but theoretically possible fault conditions remains an area for future study. These limitations represent common engineering challenges and point toward specific directions for subsequent research and optimization.
6. Conclusions
This study tackles the critical challenge of balancing diagnostic accuracy with model interpretability in arc fault detection. We propose a novel low-voltage AC arc fault detection method based on a TDDA-CNN prototype learning framework. Its core innovation is the seamless integration of a hybrid attention module for enhanced feature extraction and a prototype learning mechanism for intrinsic interpretability. Experimental results show that our method achieves outstanding accuracy (exceeding 99% under single-load conditions) with robust generalization in multi-load scenarios. More importantly, it fundamentally addresses the ‘black box’ problem of conventional AI models by providing a three-dimensional visualizable prototype set and corrective guidance from an arc-free prototype, thereby offering transparent decision-making insights.
In summary, the principal contribution of this work is a unified framework that simultaneously delivers high accuracy and strong interpretability—a combination essential for building trustworthy AI systems in safety-critical applications like electrical fire prevention. The successful validation on an experimental prototype confirms its practical potential for reliable arc detection in complex household environments.
Future work will proceed along two main trajectories to transition this technology from laboratory validation to field deployment: first, to enhance deployment feasibility, we will focus on model compression and optimization for embedded systems. This includes implementing specific techniques such as structured pruning and post-training quantization to reduce model size and computational latency, enabling cost-effective hardware integration. Second, to ensure long-term robustness, we will expand the validation to assess the impact of critical environmental factors including temperature and humidity variations, as well as more diverse, unseen load types and concurrent fault scenarios.