Article

Series Arc Fault Detection Based on Improved Artificial Hummingbird Algorithm Optimizer Optimized XGBoost

Division of Electronics and Informatics, School of Science and Technology, Gunma University, 1-5-1 Tenjin-cho, Kiryu 376-8515, Gunma, Japan
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(12), 6861; https://doi.org/10.3390/app15126861
Submission received: 16 May 2025 / Revised: 11 June 2025 / Accepted: 16 June 2025 / Published: 18 June 2025
(This article belongs to the Special Issue Holistic Approaches in Artificial Intelligence and Renewable Energy)

Abstract

Because of the wide variety of electrical appliances, the current waveforms produced when different appliances experience arc faults are similar and difficult to detect, owing to insufficient extraction of arc fault characteristics and low detection accuracy. To address these issues, a series arc fault detection method combining the artificial hummingbird algorithm (AHA) and XGBoost is proposed. In accordance with GB14287.4—2014, an experimental platform for fault arcs was designed and built to collect fault arc signals. By leveraging the global search capability and dynamic adaptive mechanism of AHA, key feature subsets sensitive to arcs are selected from high-dimensional time–frequency domain features. Combining the parallel computing advantages and regularization strategies of XGBoost, a low-complexity, highly interpretable fault classification model is constructed, and the hyperparameters of XGBoost are simultaneously optimized by AHA. Experimental results show that the proposed method achieves an arc fault detection accuracy of 98.098%, effectively identifying series arc faults.

1. Introduction

As the proportion of renewable energy integration and the large-scale application of power electronic devices increase, the fault arc issues in low-voltage distribution systems have become increasingly complex. Series fault arcs, due to their low current amplitude and strong coupling with load dynamic behavior, can easily lead to the failure of traditional overcurrent protection devices, becoming one of the primary causes of electrical fires [1]. Studies show that more than 30% of hidden electrical fires are directly related to undetected series arc faults [2]. Although existing detection technologies (such as high-frequency noise analysis [3], time–frequency domain threshold method [4]) have achieved some success, they still face bottlenecks such as redundant feature extraction and insufficient model generalization capability in modern distribution systems dominated by nonlinear loads [5].
Traditional optimization algorithms often fall into local optima, and deep learning models face issues such as high computational complexity and heavy reliance on massive annotated data [6], making it challenging to meet the dual requirements of real-time performance and lightweight design in low-voltage distribution systems. In recent years, the integration of machine learning and optimization algorithms has provided new approaches for arc fault detection. For example, a support vector machine (SVM) model optimized using genetic algorithms (GAs) improves detection efficiency through feature reduction [7], while the combination of deep learning and attention mechanisms enhances robustness under complex operating conditions [8]. Ref. [9] uses raw current signals as input to construct a one-dimensional convolutional neural network (ArcNet) and verifies the algorithm’s real-time performance and accuracy on Raspberry Pi. Ref. [10] proposes an arc fault detection model based on a lightweight one-dimensional convolutional neural network, which is improved by using deep separable convolution. The GADF-CNN model proposed in Ref. [11] effectively extracts and classifies fault features. Ref. [12] introduces a domain adaptation framework based on multi-scale hybrid domain features (DA-MMDF) for cross-domain intelligent fault diagnosis under variable operating conditions. Ref. [13] employs a method of deep supervision without anchors and task alignment to encourage the model to make accurate and consistent predictions, significantly enhancing overall detection performance. Ref. [14] proposes a hybrid model framework of knowledge distillation convolutional neural networks–deep forest (KDCNNs-DF), which significantly improves classification accuracy.
The patterns of arc changes are complex, and traditional diagnostic models, such as SVM, often suffer from issues like insufficient accuracy, overfitting, and high data requirements [15]. XGBoost, with its simplified calculations and high classification and computational efficiency, has gained widespread application. However, the determination of hyperparameters heavily relies on expert prior knowledge. To enhance the classification performance of XGBoost, some scholars have employed various optimization algorithms to refine the parameters of XGBoost. Nevertheless, common optimization algorithms still face challenges such as weak local search capabilities, average global search capabilities, and a tendency towards premature convergence [16,17].
In response to the aforementioned challenges, this paper proposes a series arc fault detection method that integrates the artificial hummingbird algorithm (AHA) with XGBoost. By analyzing a large amount of arc fault data and selecting appropriate feature values based on the characteristics of the faults, the global search capability and dynamic adaptive mechanism of AHA are used to screen out key feature subsets sensitive to arcs from high-dimensional time–frequency domain features. Combining the parallel computing advantages and regularization strategies of XGBoost, a low-complexity, highly interpretable fault classification model is constructed. The hyperparameters of XGBoost are optimized synchronously by AHA, and the model is trained for series arc fault diagnosis.

2. Collection of Arc Fault Data in Series

2.1. Arc Occurrence Platform Construction and Current Signal Collection

According to GB/T31143—2014, "General Requirements for Arc Fault Detection Devices (AFDD)" [18], an AC arc fault signal acquisition platform suitable for 220 V was independently designed and constructed. The arc fault device consists of a fixed electrode and a moving electrode controlled by a stepper motor. The fixed electrode is a graphite rod, while the moving electrode is a copper rod. The experimental platform mainly comprises an experimental circuit and measuring equipment. The experimental circuit consists of a 220 V/50 Hz AC power supply, an arc fault simulator, and a load connected in series. The arc fault simulator is connected to the load and the oscilloscope and generates arcs while the appliance operates; the oscilloscope collects the load current signal; and a high-frequency current transformer (cut-off frequency: 10 kHz) and a low-frequency current transformer (cut-off frequency: 5 kHz) connected to the oscilloscope are clamped on the live wire of the load to obtain the operating current. The torque of the stepper motor is 0.55 N·m and the step angle is 1.8°. The experimental platform is shown in Figure 1.
Common household low-voltage AC appliances are used as experimental loads. At the start of the experiment, normal operating signals are collected. The two electrodes are in contact, and switch 1 is closed to form a circuit. Then, the distance between the two electrodes is controlled by a stepper motor. When a certain distance is reached, a sustained arc will be generated. During the experiment, first, the current signal when the load is operating normally is collected; then, the current signal under the arc generation state of the arc generator is collected, and finally, signal processing and analysis are performed. AC fault arc protection is mainly applied in homes or office settings, so different household appliances are selected as experimental loads. The load parameters used in the experiment are shown in Table 1.

2.2. Analysis of Arc Current Characteristics of Failure

Periodic sampling of normal working and fault arc current signals is performed. Waveforms of normal and fault load currents are shown in Figure 2, Figure 3, Figure 4 and Figure 5.
Under normal conditions (subfigure (a) in each figure), the time–domain waveforms of currents under different loads show significant differences. Some are sinusoidal waves, such as Figure 5a; some approach triangular waves, like Figure 2a and Figure 3a; and some exhibit zero-crossing lags, as seen in Figure 4a. These variations reflect the impedance characteristics of different loads and demonstrate how load diversity affects the series current. At the same time, the current waveforms under a specific load are stable over cycles, with consistent repetition before and after each cycle.
In the arc state (subfigure (b) in Figure 2, Figure 3, Figure 4 and Figure 5), the current amplitude under most loads does not significantly increase compared to its normal state; under resistive loads, the current amplitude even slightly decreases. The differences in the time–domain waveform of currents under different loads remain significant. Besides the impact of impedance differences, random spikes, waveform gaps, and zero-crossings caused by the arc introduce strong diversity and randomness to the arc current waveform. The waveform varies between cycles under a certain load, and the waveform is not fixed before and after each cycle.

3. Current Signal Processing and Feature Extraction

3.1. Characteristic Processing of High-Frequency Current Signal

The time-domain characteristics of the arc current are mainly manifested as a "flat shoulder" of the current waveform at the zero crossing, rich high-frequency components, and an asymmetric current waveform. In view of these characteristics, the maximum rise rate of the current, the dimensionless waveform indicators (crest, pulse, and margin factors), and the total harmonic distortion are selected as the characteristics of the high-frequency current signal.

3.1.1. Maximum Rate of Current Rise

During an arc fault, the amplitude of the current undergoes a sudden change, leading to a sudden change in the maximum rate of increase of the current at zero crossing. The difference between the current signal sampling values of two adjacent discrete points within a cycle is selected, and the maximum absolute value represents the maximum rate of increase of the current. The calculation formula is as follows:
V_{\max} = \max\left( \left| I_2 - I_1 \right|, \left| I_3 - I_2 \right|, \ldots, \left| I_N - I_{N-1} \right| \right) \quad (1)
In the formula, I is the instantaneous value of current, and N is the number of sampling points.

3.1.2. Peak Value

The peak index C_f is defined as the ratio of the signal peak to its root mean square value, and the calculation formula is:
C_f = \frac{x_{\max}}{x_{\mathrm{rms}}} = \frac{\max_{\tau} \left| x(\tau) \right|}{\sqrt{\dfrac{1}{N} \sum_{i=1}^{N} x_i^2}} \quad (2)
In the formula, x_{\max} is the signal peak, x_{\mathrm{rms}} is the root mean square value, x_i is the i-th discrete sample, and x(\tau) is the processed discrete signal sequence.

3.1.3. Pulse Indicators

The pulse index I_f is defined as the ratio of the signal peak to its absolute mean, and the calculation formula is:
I_f = \frac{x_{\max}}{\bar{x}} = \frac{\max_{\tau} \left| x(\tau) \right|}{\dfrac{1}{N} \sum_{i=1}^{N} \left| x_i \right|} \quad (3)
In the formula, \bar{x} is the absolute mean of the signal.

3.1.4. Margin Indicators

The margin index L_f is defined as the ratio of the signal peak to its root amplitude, and the calculation formula is:
L_f = \frac{x_{\max}}{x_r} = \frac{\max_{\tau} \left| x(\tau) \right|}{\left( \dfrac{1}{N} \sum_{i=1}^{N} \sqrt{\left| x_i \right|} \right)^2} \quad (4)
In the formula, x_r is the root amplitude of the signal.

3.1.5. Total Harmonic Distortion Rate

After applying the FFT to the time-domain signal, it is found that the current waveform exhibits different degrees of distortion in the fault state. The total harmonic distortion rate e_THD is used to characterize the degree of distortion of the fault signal, and the calculation formula is as follows:
e_{\mathrm{THD}} = \sqrt{\sum_{k=2}^{H} \left( \frac{I_k}{I_1} \right)^2} = \sqrt{\sum_{k=2}^{H} \left( \frac{D_k}{D_1} \right)^2} \quad (5)
In the formula, I_1 is the effective (RMS) value of the fundamental of the current signal, I_k is the effective value of the k-th harmonic, D_1 is the amplitude of the fundamental, D_k is the amplitude of the k-th harmonic, and H is the highest harmonic order considered.
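As an illustration, the following NumPy sketch computes the indicators of Equations (1)–(5) from a single sampled current cycle. The function name, the sampling-rate argument, and the default number of harmonics are illustrative assumptions rather than details taken from the paper.
```python
import numpy as np

def waveform_features(x, fs, fundamental=50.0, n_harmonics=13):
    """Compute the indicators of Eqs. (1)-(5) for one cycle of current samples x.

    x  : 1-D array holding one cycle of current samples (illustrative input).
    fs : sampling frequency in Hz (assumed value, not specified per channel in the paper).
    """
    x = np.asarray(x, dtype=float)
    N = x.size

    # Eq. (1): maximum rise rate, the largest absolute difference of adjacent samples.
    v_max = np.max(np.abs(np.diff(x)))

    peak = np.max(np.abs(x))
    rms = np.sqrt(np.mean(x ** 2))
    abs_mean = np.mean(np.abs(x))
    root_amp = np.mean(np.sqrt(np.abs(x))) ** 2

    c_f = peak / rms          # Eq. (2): peak (crest) factor
    i_f = peak / abs_mean     # Eq. (3): pulse factor
    l_f = peak / root_amp     # Eq. (4): margin factor

    # Eq. (5): total harmonic distortion from an FFT of the cycle.
    spectrum = np.abs(np.fft.rfft(x)) / N
    freqs = np.fft.rfftfreq(N, d=1.0 / fs)
    # Amplitude of the k-th harmonic: spectral magnitude nearest k * 50 Hz.
    harm = [spectrum[np.argmin(np.abs(freqs - k * fundamental))]
            for k in range(1, n_harmonics + 1)]
    e_thd = np.sqrt(np.sum((np.array(harm[1:]) / harm[0]) ** 2))

    return {"Vmax": v_max, "Cf": c_f, "If": i_f, "Lf": l_f, "eTHD": e_thd}
```
Because the crest, pulse, and margin factors share the same numerator, they are computed together from one pass over the cycle; only the denominator changes between Equations (2)–(4).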

3.2. Feature Extraction of Low-Frequency Current Signal

The low-frequency current signal is denoted as LF(n), where n = 1, 2, …, N, and N is the length of the signal, which is 4000 here.
The signal is divided into E full waves, which can be written as:
I = \left\{ I_{\mathrm{wave}}(e) \mid e = 1, 2, \ldots, E \right\} \quad (6)
The number of sampling points in each full wave is:
Q = \frac{0.02 f_s}{2} \quad (7)
where f_s is the sampling frequency.
The average AC current I_Mean(e) in the e-th full wave is:
I_{\mathrm{Mean}}(e) = \sqrt{\frac{1}{Q} \sum_{q=1}^{Q} LF(e, q)^2} \quad (8)
In the formula, LF(e, q) is the q-th sampling point of the e-th full wave.
The amplitude symmetry factor of the e-th full wave is defined as:
\mathrm{AmpSymIndex}(e) = \frac{\max_{q = 1, 2, \ldots, Q} LF(e, q) - DC}{DC - \min_{q = 1, 2, \ldots, Q} LF(e, q)} \quad (9)
In the formula, DC is the DC component of the current signal.
The phase symmetry factor of the e-th full wave is defined as:
\mathrm{PhaseSymIndex}(e) = \frac{\mathrm{MaxLoc\_LF}(e, q) - \mathrm{MinLoc\_LF}(e, q)}{Q / 2} \quad (10)
In the formula, MaxLoc_LF(e,q) and MinLoc_LF(e,q) are the positions of the maximum and minimum values in the e-th full wave, respectively.
(1) The mean value of the average currents of the E full waves, I_Mean_Mean, is:
I_{\mathrm{Mean\_Mean}} = \frac{1}{E} \sum_{e=1}^{E} \sqrt{\frac{1}{Q} \sum_{q=1}^{Q} LF(e, q)^2} \quad (11)
(2) The instantaneous current I_IN of the E full waves is:
I_{\mathrm{IN}} = I_{\mathrm{Mean}}(E) \quad (12)
(3) The peak value of the average currents of the E full waves, I_Mean_Max, is:
I_{\mathrm{Mean\_Max}} = \max_{e = 1, 2, \ldots, E} \left[ \sqrt{\frac{1}{Q} \sum_{q=1}^{Q} LF(e, q)^2} \right] \quad (13)
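A minimal sketch of the low-frequency feature extraction described above, assuming Equation (7) gives the points per full wave and that the per-wave average current of Equation (8) is an RMS value; the sampling-frequency argument and the default DC estimate are assumptions.
```python
import numpy as np

def low_frequency_features(lf, fs, dc=None):
    """Full-wave features of Eqs. (8)-(13) for a low-frequency current record lf.

    lf : 1-D array of low-frequency current samples (illustrative input).
    fs : sampling frequency in Hz (assumed; not stated per channel in the paper).
    dc : DC component used in Eq. (9); defaults to the mean of lf.
    """
    lf = np.asarray(lf, dtype=float)
    dc = float(np.mean(lf)) if dc is None else dc

    Q = int(0.02 * fs / 2)               # Eq. (7), as reconstructed: samples per full wave
    E = lf.size // Q                     # number of complete full waves
    waves = lf[:E * Q].reshape(E, Q)     # Eq. (6): segment into E full waves

    i_mean = np.sqrt(np.mean(waves ** 2, axis=1))                   # Eq. (8)
    amp_sym = (waves.max(axis=1) - dc) / (dc - waves.min(axis=1))   # Eq. (9)
    phase_sym = (waves.argmax(axis=1) - waves.argmin(axis=1)) / (Q / 2)  # Eq. (10)

    return {
        "IMean_Mean": i_mean.mean(),     # Eq. (11)
        "IIN": i_mean[-1],               # Eq. (12): average current of the last full wave
        "IMean_Max": i_mean.max(),       # Eq. (13)
        "AmpSymIndex": amp_sym,          # per-wave amplitude symmetry factors
        "PhaseSymIndex": phase_sym,      # per-wave phase symmetry factors
    }
```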

4. Artificial Hummingbird Algorithm (AHA) and XGBoost

4.1. Artificial Hummingbird Algorithm (AHA)

The artificial hummingbird algorithm (AHA) represents an emerging heuristic approach [10], distinguished by its unique biological inspiration. AHA meticulously models three characteristic hummingbird flight maneuvers—axial, diagonal, and omnidirectional—alongside their intelligent foraging strategies, encompassing guided, regional, and migratory patterns. This formulation yields an artificial intelligence foraging model adept at tackling high-dimensional optimization problems, offering benefits like computational simplicity and high solution accuracy. Nevertheless, similar to many heuristic methods, AHA is not without limitations, exhibiting a tendency to converge prematurely on local optima and experiencing slower convergence speeds.

4.1.1. Initialize

Given n food sources at random, the artificial hummingbird population is randomly initialized:
x_i = Lb + \mathrm{rand}(0, 1) \cdot (Ub - Lb), \quad i = 1, 2, \ldots, n \quad (14)
In the formula, Lb and Ub are the lower and upper bounds of the search domain, and rand(0,1) is a random vector whose elements are sampled uniformly from [0, 1]. Each candidate solution x_i (i = 1, 2, …, n) is generated by this assignment. The visit table is then initialized according to:
VT_{i, j} = \begin{cases} 0, & i \neq j \\ \mathrm{null}, & i = j \end{cases}, \quad i = 1, 2, \ldots, n; \; j = 1, 2, \ldots, n \quad (15)
In Equation (15), VT_{i,j} = null for i = j indicates that a hummingbird feeds at its own food source, whereas VT_{i,j} = 0 for i ≠ j indicates that the i-th hummingbird has just visited the j-th food source.

4.1.2. Guide Foraging

The AHA algorithm can simulate three specific flight skills of hummingbirds (axial, diagonal, and omnidirectional). The expression for axial flight is as follows:
D_i = \begin{cases} 1, & i = \mathrm{randi}([1, d]) \\ 0, & \text{else} \end{cases}, \quad i = 1, 2, \ldots, d \quad (16)
In the formula, randi([1,d]) represents a random integer on [1,d]. The mathematical model of diagonal flight is as follows:
D_i = \begin{cases} 1, & i = w(k), \; k \in [1, K], \; w = \mathrm{randperm}(K), \; K \in \left[ 2, \left\lceil R (d - 2) \right\rceil + 1 \right] \\ 0, & \text{else} \end{cases}, \quad i = 1, 2, \ldots, d \quad (17)
Within this formulation, randperm(K) generates a random permutation of the integers from 1 to K. R denotes a uniformly distributed random scalar within the interval [0, 1]. Omnidirectional flight is characterized by any displacement vector whose projections onto the three Cartesian coordinate axes are governed by the following expression:
D_i = 1, \quad i = 1, 2, \ldots, d \quad (18)
These flight adaptations enable hummingbirds to access designated nourishment locations, facilitating the identification of candidate sustenance sites. The formal characterization of the guiding foraging strategy and these provisional food sources is given by the following equations:
V_{g,i}(t + 1) = x_{i,\mathrm{target}}(t) + a \cdot D \cdot \left( x_i(t) - x_{i,\mathrm{target}}(t) \right), \quad a \sim N(0, 1) \quad (19)
In the formula, x_i(t) is the position of the i-th candidate solution at iteration t, x_{i,target}(t) is the position of the target food source visited by the i-th hummingbird, and a is the guided foraging factor, drawn from the standard normal distribution. The position update is then:
x_i(t + 1) = \begin{cases} x_i(t), & f(x_i(t)) \le f(V_{g,i}(t + 1)) \\ V_{g,i}(t + 1), & f(x_i(t)) > f(V_{g,i}(t + 1)) \end{cases} \quad (20)
In the formula, f(\cdot) is the fitness value of the function. Equation (20) shows that if the nectar-refilling rate of the candidate food source is higher than that of the current food source, the hummingbird abandons the current food source.

4.1.3. Territory Foraging

Subsequent to identifying the target candidate solution, hummingbirds exhibit a probability-driven tendency to investigate alternative food sources, specifically avoiding the original one. This behavior promotes relocation to proximate zones for the purpose of acquiring alternative food intelligence, offering potential substitutes for the optimal solution. The mathematical representation of territorial foraging behavior and candidate food source positions is presented as follows:
V_{t,i}(t + 1) = x_i(t) + b \cdot D \cdot x_i(t), \quad b \sim N(0, 1) \quad (21)
where b is the territorial foraging factor, which follows the standard normal distribution. Formula (21) enables hummingbirds to discover new food sources in the vicinity of their own.

4.1.4. Migrate for Food

When food is scarce in frequently foraging areas, hummingbirds migrate to distant regions in search of food. Migration factors are predetermined in the AHA. Different behavioral effects will be produced based on these predetermined values. The specific migration expression is as follows:
x_{\mathrm{poor}} = Lb + r \cdot (Ub - Lb) \quad (22)
In the formula, x_poor is the worst candidate solution (the food source with the lowest nectar-refilling rate), and r is a random vector whose elements are uniformly distributed in [0, 1]. Besides the population size and the maximum number of iterations, the AHA requires only one additional tunable parameter, the migration coefficient, which sets the criterion for triggering migration:
M = 2n \quad (23)
When the iteration count reaches a multiple of M = 2n, the hummingbird at the worst food source migrates to a new food source generated randomly in the whole search space, which counteracts stagnation by expanding the search.
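For reference, the following sketch implements the AHA update rules of Sections 4.1.1–4.1.4 for a generic minimization problem. The visit-table bookkeeping is deliberately simplified and the equal 1/3 probabilities assigned to the three flight patterns are an assumption; the original algorithm is more elaborate.
```python
import numpy as np

def aha_minimize(fitness, lb, ub, n=30, d=5, max_iter=200, seed=0):
    """Simplified artificial hummingbird algorithm (guided/territorial/migration foraging).

    fitness : callable mapping a d-dimensional vector to a scalar to be minimized.
    lb, ub  : arrays of lower/upper bounds, as in Eq. (14).
    """
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    x = lb + rng.random((n, d)) * (ub - lb)            # Eq. (14): random food sources
    fit = np.array([fitness(xi) for xi in x])
    visit = np.zeros((n, n))                           # Eq. (15), simplified visit table
    np.fill_diagonal(visit, -np.inf)                   # a bird never "visits" its own source

    def flight_vector():
        D = np.zeros(d)
        r = rng.random()
        if r < 1 / 3:                                  # axial flight, Eq. (16)
            D[rng.integers(d)] = 1.0
        elif r < 2 / 3 and d > 2:                      # diagonal flight, Eq. (17)
            K = rng.integers(2, max(3, int(np.ceil(rng.random() * (d - 2))) + 2))
            D[rng.permutation(d)[:K]] = 1.0
        else:                                          # omnidirectional flight, Eq. (18)
            D[:] = 1.0
        return D

    for t in range(1, max_iter + 1):
        for i in range(n):
            D = flight_vector()
            if rng.random() < 0.5:                     # guided foraging, Eqs. (19)-(20)
                target = int(np.argmax(visit[i]))      # least recently visited source
                a = rng.normal()
                cand = x[target] + a * D * (x[i] - x[target])
            else:                                      # territorial foraging, Eq. (21)
                b = rng.normal()
                cand = x[i] + b * D * x[i]
            cand = np.clip(cand, lb, ub)
            f_cand = fitness(cand)
            visit[i, :] += 1                           # one more iteration since each visit
            if f_cand < fit[i]:                        # keep the better food source
                x[i], fit[i] = cand, f_cand
                visit[:, i] = 0                        # other birds treat it as newly visited
                visit[i, i] = -np.inf
        if t % (2 * n) == 0:                           # migration, Eqs. (22)-(23)
            worst = int(np.argmax(fit))
            x[worst] = lb + rng.random(d) * (ub - lb)
            fit[worst] = fitness(x[worst])

    best = int(np.argmin(fit))
    return x[best], fit[best]
```
For example, `aha_minimize(lambda v: np.sum(v ** 2), lb=-5 * np.ones(5), ub=5 * np.ones(5))` drives the population toward the origin of the search space.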

4.2. Extreme Gradient Boosting Tree (XGBoost)

XGBoost is an ensemble learning algorithm that improves upon the Gradient Boosting Decision Tree (GBDT) framework. By introducing regularization terms, parallelization, and second-order derivatives of loss functions, it achieves faster and more effective predictions. The algorithm optimizes iteratively by constructing new regression trees to fit the negative gradient (i.e., residuals) of the current model in each iteration, and achieves the final output by weighting and accumulating the predictions from all regression trees. This iterative optimization strategy not only enhances the prediction accuracy of the model but also effectively controls model complexity through regularization terms, thereby improving generalization capabilities. If there are M sub-trees, the model’s output result is:
\hat{y}_i = \sum_{m=1}^{M} f_m(x_i) \quad (24)
In the formula, \hat{y}_i is the model output, f_m is the m-th tree, and x_i is the i-th sample. The objective function to be minimized is:
L = \sum_{i=1}^{n} l\left( y_i, \hat{y}_i \right) + \Omega(f_t) + \varepsilon \quad (25)
\Omega(f_t) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} \omega_j^2 \quad (26)
In the formula, l is the loss function, \varepsilon is a constant, T is the number of leaf nodes, \omega_j is the weight of the j-th leaf node, and \gamma and \lambda are the regularization parameters that control the complexity of the model. Compared with traditional GBDT, XGBoost introduces a regularization term to penalize model complexity, making it less prone to overfitting as the regularization parameters increase [11]. The objective function in the t-th iteration can be expressed as:
L^{(t)} = \sum_{i=1}^{n} l\left( y_i, \hat{y}_i^{(t)} \right) + \Omega(f_t) = \sum_{i=1}^{n} l\left( y_i, \hat{y}_i^{(t-1)} + f_t(x_i) \right) + \Omega(f_t) \quad (27)
Performing a second-order Taylor expansion of L^{(t)} and removing the constant term gives:
L^{(t)} \approx \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t) \quad (28)
In the formula, g_i and h_i are the first- and second-order gradients of the loss function, respectively. Let the set of sample indices of the j-th leaf node be I_j = \{ i \mid q(x_i) = j \}, where q(x_i) is the leaf index assigned to x_i. Then, Equation (28) can be rewritten as:
L^{(t)} \approx \sum_{j=1}^{T} \left[ G_j \omega_j + \frac{1}{2} \left( H_j + \lambda \right) \omega_j^2 \right] + \gamma T \quad (29)
In the formula, G_j and H_j are the sums of g_i and h_i over the samples in leaf j, respectively. Setting the derivative with respect to \omega_j to zero gives the optimal leaf weight:
\omega_j^{*} = -\frac{G_j}{H_j + \lambda} \quad (30)
The corresponding minimum value of the objective is:
L^{*} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T \quad (31)
The XGBoost algorithm performs well in structured data processing, with faster training speed and a lower required number of samples, so XGBoost is selected as the classifier [12].
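The following numerical sketch illustrates Equations (30) and (31): given the first- and second-order gradients of the samples assigned to each leaf, the optimal leaf weights and the minimum objective follow directly. The gradient values and regularization settings are made up purely for illustration.
```python
import numpy as np

# Made-up first/second-order gradients (g_i, h_i) of the samples in two leaves.
leaves = {
    0: {"g": np.array([0.4, -0.7, 0.1]), "h": np.array([0.25, 0.25, 0.25])},
    1: {"g": np.array([-1.2, -0.9]),     "h": np.array([0.25, 0.25])},
}
lam, gamma = 1.0, 0.1          # regularization parameters lambda and gamma

obj = gamma * len(leaves)      # the gamma*T term of Eq. (31)
for j, leaf in leaves.items():
    G, H = leaf["g"].sum(), leaf["h"].sum()
    w_star = -G / (H + lam)                     # Eq. (30): optimal leaf weight
    obj += -0.5 * G ** 2 / (H + lam)            # Eq. (31): contribution of leaf j
    print(f"leaf {j}: G={G:.3f}, H={H:.3f}, w*={w_star:.3f}")
print(f"minimum objective value: {obj:.4f}")
```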

4.3. AHA-XGBoost Arc Fault Detection Process

Step 1: Establish a series arc fault experimental platform to extract the features of the collected experimental data.
Step 2: Select seven kinds of feature quantities as detection indicators and perform dimensionality-reduction fusion of the features. Label the samples according to load type and establish the data set.
Step 3: Randomly split the data into training and test sets in a ratio of 8:2 and normalize them.
Step 4: Use the AHA algorithm to optimize five key hyperparameters of XGBoost (number of trees, tree depth, minimum child weight, learning rate, and subsample ratio), and input the training set for training. Figure 6 shows the AHA fitness curve.
Step 5: Input the test set into the trained model, carry out fault diagnosis, and output the test results.
The AHA-XGBoost arc fault detection process is shown in Figure 7.
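A sketch of Steps 3–5 is given below, using the xgboost scikit-learn interface together with the `aha_minimize` helper sketched in Section 4.1. The placeholder feature matrix, the hyperparameter search ranges, and the use of the held-out set inside the fitness function are illustrative assumptions, not the authors' exact implementation.
```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Placeholder data standing in for the extracted feature matrix and labels
# (0 = normal, 1 = fault); replace with the features built in Section 3.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 11))
y = rng.integers(0, 2, size=2000)

# Step 3: random 8:2 split and normalization.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Step 4: AHA searches five XGBoost hyperparameters inside illustrative bounds:
# n_estimators, max_depth, min_child_weight, learning_rate, subsample.
lb = np.array([50.0, 3.0, 1.0, 0.01, 0.5])
ub = np.array([500.0, 10.0, 10.0, 0.30, 1.0])

def fitness(p):
    model = xgb.XGBClassifier(
        n_estimators=int(p[0]), max_depth=int(p[1]), min_child_weight=p[2],
        learning_rate=p[3], subsample=p[4], eval_metric="logloss")
    model.fit(X_train, y_train)
    # Classification error on held-out data (a separate validation split would
    # normally be used so the test set stays untouched until the final step).
    return 1.0 - accuracy_score(y_test, model.predict(X_test))

# Small search budget to keep the sketch manageable.
best_params, best_err = aha_minimize(fitness, lb, ub, n=10, d=5, max_iter=30)

# Step 5: refit with the best hyperparameters and evaluate on the test set.
best_model = xgb.XGBClassifier(
    n_estimators=int(best_params[0]), max_depth=int(best_params[1]),
    min_child_weight=best_params[2], learning_rate=best_params[3],
    subsample=best_params[4], eval_metric="logloss").fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, best_model.predict(X_test)))
```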

5. Identification of Fault Arc

5.1. Feature Importance and Interpretability via SHAP Analysis

In this study, the SHapley Additive exPlanations (SHAP) framework was used to quantitatively analyze the contribution of the 11 selected features to the predictions of the series arc fault detection model and the mechanism of their influence. As can be seen from Figure 8, all 11 features have a non-negligible predictive influence, which jointly verifies their effectiveness for the detection task. Although the magnitude of influence differs considerably between features, no feature's SHAP values are clustered tightly around zero, suggesting that each feature provides unique and complementary information to the model. Notably, the features associated with the mean value of the average current and the amplitude symmetry factor consistently rank highest, which is highly consistent with the known physical properties of series arcs. The SHAP dependence plots further illuminate the nature of these relationships, revealing both the expected monotonic tendencies and more complex nonlinear interactions between certain features and the predicted arc fault probability. This SHAP analysis demonstrates the effectiveness of the selected feature set in capturing the discriminative characteristics of series arc faults.
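A sketch of how a SHAP summary similar to Figure 8 could be produced with the shap package for a trained XGBoost model; `best_model` and `X_test` are assumed to come from the training pipeline sketched earlier, not from artifacts released with the paper.
```python
import shap
import matplotlib.pyplot as plt

# 'best_model' is the fitted XGBoost classifier and 'X_test' the scaled test
# features from the pipeline sketch above (assumptions, not released artifacts).
explainer = shap.TreeExplainer(best_model)
shap_values = explainer.shap_values(X_test)

# Mean-|SHAP| bar chart, analogous to Figure 8.
shap.summary_plot(shap_values, X_test, plot_type="bar", show=False)
plt.tight_layout()
plt.savefig("shap_importance.png", dpi=200)
plt.close()

# Dependence plot for one feature (index 0 chosen only as an example).
shap.dependence_plot(0, shap_values, X_test, show=False)
plt.savefig("shap_dependence_feature0.png", dpi=200)
```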

5.2. Parameter Configuration

According to the national standard GB14287.4—2014, "Electrical Fire Monitoring System―Part 4: Arcing Fault Detectors" [19], which specifies the alarm performance requirements for arc fault detectors (AFD), each current sequence recorded under the different operating conditions is split into samples, with one cycle as a sample. Each sample contains 5000 current sampling points, forming the arc fault database. The data for each operating condition are divided into training and test sets by random sampling in a ratio of 8:2, as shown in Table 2. Under normal conditions there are 15,456 samples in total (13,368 for training and 2088 for testing); under fault conditions there are 15,462 samples in total (13,248 for training and 2214 for testing).
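The sample construction described here can be sketched as follows; the container holding the raw current records and the ten-cycle placeholder arrays are assumptions used only to make the example self-contained.
```python
import numpy as np
from sklearn.model_selection import train_test_split

CYCLE_POINTS = 5000  # one 50 Hz cycle at the acquisition rate used in the paper

def build_samples(records, label):
    """Cut every recorded current sequence into one-cycle samples with a common label.

    records : iterable of 1-D numpy arrays, one per operating condition (assumed layout).
    """
    samples = []
    for rec in records:
        n_cycles = len(rec) // CYCLE_POINTS
        samples.extend(rec[k * CYCLE_POINTS:(k + 1) * CYCLE_POINTS] for k in range(n_cycles))
    return np.array(samples), np.full(len(samples), label)

# normal_records / fault_records would hold the measured sequences (placeholders here).
normal_records = [np.zeros(CYCLE_POINTS * 10)]
fault_records = [np.zeros(CYCLE_POINTS * 10)]
Xn, yn = build_samples(normal_records, 1)   # experimental label 1: normal
Xf, yf = build_samples(fault_records, 2)    # experimental label 2: fault
X_all, y_all = np.vstack([Xn, Xf]), np.concatenate([yn, yf])

# Random 8:2 split, as in Table 2.
X_train, X_test, y_train, y_test = train_test_split(
    X_all, y_all, test_size=0.2, random_state=0, stratify=y_all)
```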

5.3. Fault Arc Identification Results and Analysis

In this paper, six indicators are used to evaluate the performance of the arc fault recognition model: sensitivity (Se), specificity (Sp), accuracy (Acc), precision, recall, and macro-averaged F1 score (F1).
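The six indicators can be computed from a binary confusion matrix as sketched below; the example labels and injected errors are illustrative and do not reproduce the paper's results.
```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

def arc_metrics(y_true, y_pred, fault_label=2):
    """Sensitivity, specificity, accuracy, precision, recall and macro F1 (in %)."""
    labels = sorted(set(y_true))
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    i = labels.index(fault_label)                  # treat 'fault' as the positive class
    tp = cm[i, i]
    fn = cm[i].sum() - tp
    fp = cm[:, i].sum() - tp
    tn = cm.sum() - tp - fn - fp
    se = tp / (tp + fn)                            # sensitivity = recall of the fault class
    sp = tn / (tn + fp)                            # specificity
    acc = (tp + tn) / cm.sum()
    prec = tp / (tp + fp)
    f1 = f1_score(y_true, y_pred, average="macro")
    return {k: round(v * 100, 3) for k, v in
            {"Se": se, "Sp": sp, "Acc": acc, "Precision": prec, "Recall": se, "F1": f1}.items()}

# Illustrative usage with made-up predictions (sizes borrowed from Table 2):
y_true = np.array([1] * 2088 + [2] * 2214)
y_pred = y_true.copy()
y_pred[:40] = 2        # a few normal samples misclassified as fault
y_pred[-45:] = 1       # a few fault samples misclassified as normal
print(arc_metrics(y_true, y_pred))
```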
During the model training process, the prediction result curve of the test set is shown in Figure 9, and the accuracy rate of the verification set is 98.0983%. In the training process, there is no overfitting phenomenon, and the training results are good. Figure 10 shows the confusion matrix of the output model proposed in this paper. It can be seen that the recognition accuracy of samples in normal state of each type of load is high.

5.4. Comparison with Other Identification Methods

To verify the superiority of the proposed model in identifying arc faults, the same data set collected on the experimental platform was used to train and test different models. The comparison models are a BP neural network, an extreme learning machine, a random forest, and a decision tree. The random forest uses 256 trees; the BP neural network has two hidden layers with three neurons each, 3000 training iterations, and a learning rate of 0.01.
Figure 11 shows the ROC curves and AUC values of the five models: the BP neural network achieves an AUC of 0.9605, the extreme learning machine 0.9540, the random forest 0.9726, the decision tree 0.9582, and AHA-XGBoost 0.9925. The AUC of AHA-XGBoost is closest to 1, indicating that the proposed detection method separates arc fault samples from normal samples most reliably.
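A sketch of how such an AUC comparison could be generated with scikit-learn ROC utilities; the `models` dictionary and the held-out `X_test`/`y_test` (with 0 = normal, 1 = fault) are assumptions standing in for the five trained classifiers.
```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

# 'models' maps a name to a fitted classifier exposing predict_proba; X_test and
# y_test are the held-out samples from the pipeline sketch (assumptions).
plt.figure()
for name, clf in models.items():
    scores = clf.predict_proba(X_test)[:, 1]           # probability of the fault class
    fpr, tpr, _ = roc_curve(y_test, scores, pos_label=1)
    auc = roc_auc_score(y_test, scores)
    plt.plot(fpr, tpr, label=f"{name} (AUC = {auc:.4f})")
plt.plot([0, 1], [0, 1], "k--", linewidth=0.8)          # chance diagonal
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.savefig("auc_comparison.png", dpi=200)
```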
Table 3 lists the specific parameters of each model. Among them, compared with other models that have similar numbers of parameters, the accuracy of the designed model AHA-XGBoost is 98.098%, outperforming the other four models. The model proposed in this paper is slightly higher than the other four models in terms of specificity, sensitivity, precision, recall rate, and macro average F1 score. Additionally, this model significantly reduces the number of parameters and computational requirements, which to some extent reflects a simplification of the model complexity.

6. Conclusions

In response to the difficulty of extracting characteristics from series arc faults and the low accuracy of existing fault detection, this paper proposes a fault diagnosis method based on AHA-XGBoost. Experimental analysis on an independently developed arc fault test platform verifies the superiority and effectiveness of the method. It can provide methodological support for subsequent series arc fault diagnosis while improving accuracy. The main conclusions are as follows:
(1) Feature selection optimization: using the global search ability and dynamic adaptive mechanism of AHA, key feature subsets sensitive to arcs are screened out from high-dimensional time–frequency domain features (such as wavelet packet entropy and high-frequency harmonic distortion rate), overcoming the local-optimum defects of traditional filter-based feature selection methods (such as the mutual information method).
(2) Lightweight integrated modeling: combining the parallel computing advantages and regularization strategies of XGBoost, a low-complexity, highly interpretable fault classification model is constructed. By synchronously optimizing XGBoost's hyperparameters (such as the learning rate and tree depth) with AHA, the detection accuracy and real-time performance are significantly improved.
(3) Optimizing XGBoost with AHA yields faster convergence and better detection performance.
The analysis of the experimental results shows that the series arc fault detection method based on AHA-XGBoost has significant advantages. By using the AHA algorithm to adaptively filter high-dimensional time–frequency domain features, it effectively addresses the challenge of insufficient extraction of arc-sensitive features, reducing the feature dimension to an optimal subset. Combined with the regularized XGBoost model, it achieves a precision of 97.523% while reducing model complexity, and its parallel computing characteristics enhance computational efficiency. Experiments verify that the method meets the requirements of GB14287.4—2014, with a feature dimension reduction of about 40% compared to traditional methods and a misjudgment rate reduced to 1.9%, while maintaining an identification accuracy of 98.098% even in low-current (<5 A) scenarios. The fused algorithm provides a highly interpretable solution for multi-type electrical arc fault detection and offers practical engineering value for improving the reliability of electrical fire warning systems.
The core tasks in the future will focus on enhancing robustness in complex real-world environments (such as strong noise and diverse loads), achieving efficient integration and optimization of embedded systems, and ensuring continuous compliance with relevant safety and compatibility standards. This will be achieved by overcoming current limitations through methods such as collecting more comprehensive data, improving noise immunity, and model lightweighting, thereby facilitating the transition from laboratory achievements to large-scale practical applications.

Author Contributions

Conceptualization, L.Q.; Methodology, L.Q.; Formal analysis, T.K.; Writing—original draft, L.Q.; Writing—review & editing, S.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Miao, W.; Xu, Q.; Lam, K.H.; Pong, P.W.T.; Poor, H.V. DC Arc-Fault Detection Based on Empirical Mode Decomposition of Arc Signatures and Support Vector Machine. IEEE Sensors J. 2020, 21, 7024–7033. [Google Scholar] [CrossRef]
  2. Zhang, T.; Zhang, R.; Wang, H.; Tu, R.; Yang, K. Series AC Arc Fault Diagnosis Based on Data Enhancement and Adaptive Asymmetric Convolutional Neural Network. IEEE Sensors J. 2021, 21, 20665–20673. [Google Scholar] [CrossRef]
  3. Ahmadi, M.; Samet, H.; Ghanbari, T. A New Method for Detecting Series Arc Fault in Photovoltaic Systems Based on the Blind-Source Separation. IEEE Trans. Ind. Electron. 2019, 67, 5041–5049. [Google Scholar] [CrossRef]
  4. Qu, N.; Chen, J.; Zuo, J.; Liu, J. PSO–SOM Neural Network Algorithm for Series Arc Fault Detection. Adv. Math. Phys. 2020, 2020, 6721909. [Google Scholar] [CrossRef]
  5. Jiang, J.; Wen, Z.; Zhao, M.; Bie, Y.; Li, C.; Tan, M.; Zhang, C. Series Arc Detection and Complex Load Recognition Based on Principal Component Analysis and Support Vector Machine. IEEE Access 2019, 7, 47221–47229. [Google Scholar] [CrossRef]
  6. Kim, Y.; Lee, S. Ensemble learning-based arc fault detection with multi-sensor data fusion. IEEE Access 2021, 9, 123456–123467. [Google Scholar]
  7. Wang, L.; Yong, S. GA-SVM based arc fault detection with feature selection for photovoltaic systems. IEEE Trans. Sustain. Energy 2022, 13, 987–996. [Google Scholar]
  8. Xu, R.; Li, J. Edge computing-oriented lightweight arc fault detection model for IoT-enabled circuit breakers. IEEE Internet Things J. 2023, 10, 6543–6554. [Google Scholar]
  9. Wang, Y.; Hou, L.; Paul, K.C.; Ban, Y.; Chen, C.; Zhao, T. ArcNet: Series AC arc fault detection based on raw current and convolutional neural network. IEEE Trans. Ind. Inform. 2022, 18, 77–86. [Google Scholar] [CrossRef]
  10. Tang, A.; Wang, Z.; Tian, S.; Gao, H.; Gao, Y.; Guo, F. Series arc fault identification method based on lightweight convolutional neural network. IEEE Access 2024, 12, 5851–5863. [Google Scholar] [CrossRef]
  11. Yin, C.; Jiang, S.; Wang, W.; Jin, J.; Wang, Z.; Wu, B. Fault diagnosis method of rolling bearing based on GADF-CNN. J. Vib. Shock 2021, 40, 247–253. [Google Scholar]
  12. Lei, Z.; Wen, G.; Dong, S.; Huang, X.; Zhou, H.; Zhang, Z.; Chen, X. An intelligent fault diagnosis method based on domain adaptation and its application for bearings under polytropic working conditions. IEEE Trans. Instrum. Meas. 2020, 70, 1–14. [Google Scholar] [CrossRef]
  13. Zuo, F.; Liu, J.; Fu, M.; Wang, L.; Zhao, Z. An Efficient Anchor-Free Defect Detector With Dynamic Receptive Field and Task Alignment. IEEE Trans. Ind. Inform. 2024, 20, 8536–8547. [Google Scholar] [CrossRef]
  14. Ma, J.; Cai, W.; Shan, Y.; Xia, Y.; Zhang, R. An Integrated Framework for Bearing Fault Diagnosis: Convolutional Neural Network Model Compression Through Knowledge Distillation. IEEE Sensors J. 2024, 24, 40083–40095. [Google Scholar] [CrossRef]
  15. Zhu, Y.; Guo, Z.; Zhan, X.; Huang, X. Transformer winding looseness diagnosis method based on multiple feature extraction and sparrow search algorithm optimized XGBoost. Electr. Mach. Control 2013, 28, 87–97. [Google Scholar]
  16. Tang, Z.; Shi, X.; Zou, H.; Zhu, Y. Fault diagnosis of wind turbine based on random forest and XGBoost. Renew. Energy 2021, 39, 353–358. [Google Scholar]
  17. Gong, Z.; Rao, T.; Wang, G. Transformer fault diagnosis method based on improved particle swarm optimization XGBoost. High Volt. Electr. Appar. 2019, 59, 61–69. [Google Scholar]
  18. GB/T31143—2014; General Requirements for Arc Fault Detection Devices (AFDD). China National Standardization Administration: Beijing, China, 2014.
  19. GB14287.4—2014; Electrical Fire Monitoring System―Part 4: Arcing Fault Detectors. China National Standardization Administration: Beijing, China, 2014.
Figure 1. Series arc fault test platform.
Figure 2. Normal and arc current waveform of halogen lamp load. (a) Normal current. (b) Arcing current.
Figure 3. Normal and arc current waveform of electric fan load. (a) Normal current. (b) Arcing current.
Figure 4. Normal and arc current waveform of vacuum cleaner load. (a) Normal current. (b) Arcing current.
Figure 5. Normal and arc current waveform of hair dryer + kettle load. (a) Normal current. (b) Arcing current.
Figure 6. AHA fitness value.
Figure 7. AHA-XGBoost arc fault detection process.
Figure 8. SHAP value (average impact on model output magnitude).
Figure 9. AHA-XGBoost arc fault prediction results.
Figure 10. Confusion matrix diagram.
Figure 11. AUC curves of each model.
Table 1. Representative test loads and their main parameters.

Load Category    Load                                                Rated Electrical Parameters
Linear load      Electric kettle                                     220 V/1600 W
                 Halogen lamp                                        220 V/600 W
                 Hair dryer                                          220 V/1000 W
                 Vacuum cleaner                                      220 V/1300 W
Nonlinear load   Voltage regulating circuit                          220 V/800 W
                 Computer                                            220 V/100 W
                 Electric fan                                        220 V/60 W
Mixed load       Hair dryer + electric kettle                        220 V/1000 W + 220 V/1600 W
                 Hair dryer + computer                               220 V/1000 W + 220 V/100 W
                 Electromagnetic oven + voltage regulating circuit   220 V/1300 W + 220 V/800 W
Table 2. Database.

Data Mode                         Data Set Type    Number    Total
Normal (experimental label: 1)    Training set     13,368    15,456
                                  Test set         2088
Fault (experimental label: 2)     Training set     13,248    15,462
                                  Test set         2214
Table 3. Fault arc diagnosis results of different methods (all values in %).

Model                      Specificity (Sp)    Sensitivity (Se)    Accuracy (Acc)    Precision    Recall    F1
BP neural network          96.811              85.324              93.173            92.536       85.324    88.784
Extreme learning machine   96.904              75.607              90.16             91.882       75.607    82.954
Random forest              97.139              94.94               96.442            93.894       94.94     94.415
Decision tree              97.608              94.028              96.474            94.796       94.028    94.411
AHA-XGBoost                98.8743             95.648              98.098            97.523       95.648    96.576