Series Arc Fault Detection Method Based on Category Recognition and Artificial Neural Network

The influence of a series arc on the line current differs from load to load, which makes it difficult to extract arc fault characteristics from the line current signal that are suitable for all loads. To improve the accuracy of arc fault detection, a series arc fault detection method based on category recognition and an artificial neural network is proposed, building on an analysis of the current characteristics of arc faults under different loads. According to the waveforms of current and voltage, loads are divided into three types: resistive category (Re), resistive-inductive category (RI), and rectifying circuit with a capacitive filter category (RCCF). Based on the wavelet transform, the time-domain and frequency-domain characteristics of the line current when a series arc occurs under different types of loads are analyzed. The time and frequency indicators are then taken as the inputs of the artificial neural network, and three-layer neural networks corresponding to the three load types are established to detect series arc faults on lines under different categories of loads. To keep the neural network from falling into a local optimum, its initial weights and thresholds are optimized by a genetic algorithm, which further improves the accuracy of arc identification. The experimental results show that the proposed arc detection method offers a high recognition rate and a simple neural network model.


Introduction
Arc faults are an important factor causing electrical fires. At present, many countries have been paying more attention to protection against arc faults and have formulated corresponding product standards. For example, the United States formulated UL1699 [1], the International Electrotechnical Commission formulated IEC62606 [2], and China formulated GB14287.4 [3] and GB/T31143 [4]. The difficulty of arc fault protection lies in the accurate detection of series arc faults. Therefore, in recent years, many scholars have done much research on the accurate detection of series arc faults.
When the arc fault occurs, the current signals of lines with different loads have various time-frequency characteristics [5,6]: The randomness and fluctuation of the current signal increase and the periodicity in the strict definition is lost. There will be a "zero off" phenomenon at the zero crossing point (namely, where the current waveform shows an obvious shoulder phenomenon); in addition, multiple pulse currents may occur within a period, leading to a significant increase in the high-frequency components of current signals.
... of training data compared with the method of identifying arc faults by training a neural network on the data directly. However, the category recognition criterion is easily disturbed by power grid harmonics and by the harmonics that electric appliances produce in normal operation, so there is room for further optimization.
Based on this idea, this paper proposes a series arc fault detection method based on category recognition and an artificial neural network. According to the waveforms of voltage and current and their phase relationship, the load is classified into the resistive category (Re), resistive-inductive category (RI), and rectifying circuit with a capacitive filter category (RCCF). Moreover, the classification criterion has strong anti-interference ability and is not easily affected by power grid harmonics. Therefore, the characteristic indicators extracted from the wavelet transform of the current signal are more representative, which lays a theoretical foundation for greatly simplifying the artificial neural network. For each load category, a simple "2-5-1" three-layer artificial neural network can produce better results. Experiments show that this method can cover more load types, achieves higher arc fault detection accuracy, and can effectively prevent misjudgment.

Influence of Arc Fault on Current Signal under Different Loads
When an arc fault occurs in the circuit, the waveform of the current signal changes significantly, and the waveforms of different categories of loads have different characteristics.
For the resistive category (Re), Figure 1a shows that there is no shoulder in the waveform in the normal state. In the event of a fault, while the arc gap voltage is less than the air gap breakdown voltage, the air is not broken down and the current is zero. When the arc gap voltage exceeds the breakdown voltage, the air is broken down and an arc is generated immediately; the current jumps to a certain value, after which the arc keeps burning and the current follows the voltage. When the arc gap voltage falls below the breakdown voltage again, the arc extinguishes, the current becomes zero, and the cycle repeats. Because the current remains at zero whenever the arc gap voltage is below the breakdown voltage, the current waveform shows an obvious shoulder during an arc fault.
For the rectifying circuit with a capacitive filter category (RCCF), under normal conditions, when the voltage across the capacitor is lower than the rectifying side voltage, the capacitor is charged and current flows on the power supply side. When the voltage across the capacitor is higher than the rectifying side voltage, the capacitor discharges to the load and the supply-side current is zero, so there is a shoulder phenomenon even in the normal state. To improve the current waveform, an inductive filter is usually added after the rectifier circuit of such loads to reduce the slope of the current while the capacitor is charging. In the fault state, when the arc gap voltage exceeds the breakdown voltage and an arc is generated, most of the current charges the capacitor in the load, and the current value increases rapidly at the moment of charging.
At the same time, because the capacitance is generally small, the voltage across the capacitor rises faster than the rectifying side voltage, which causes the arc gap voltage to fall rapidly below the breakdown voltage; the arc extinguishes and the current drops to zero, so the current appears as a pulse. As the supply voltage rises, the arc gap is broken down again and the current again appears as a pulse, and so on. At this time, there is a certain randomness to whether the diode is turned on. The reason is as follows [26]: because the arc gap distance is uncertain, the arc extinction voltage at a given current is also uncertain. After the arc gap is broken down, if the arc gap voltage stays above the arc extinction voltage while the capacitor voltage rises, the arc does not extinguish and the diode remains turned on. If the arc gap voltage falls below the arc extinction voltage, the arc extinguishes and the diode is not turned on. Therefore, Figure 1b shows that such a load may have multiple pulse currents in a half period when an arc fault occurs, and the number of pulse currents is not fixed.
For the resistive-inductive category (RI), there is no shoulder phenomenon in the normal state. There is a parasitic capacitance between the turns of the inductive coil of the resistive-inductive load, which can be treated as equivalently connected in parallel with the load. When an arc fault occurs, the arc extinguishes near the zero-crossing time of the current, and the equivalent resistance of the arc gap is approximately infinite. At this time, the supply voltage is applied across the arc gap, and the gap can only be broken down once the arc gap voltage exceeds the breakdown voltage. The current then charges the parasitic capacitance and generates a pulse current (the principle is the same as for the RCCF category); that is, when the load is resistive-inductive, pulse currents may occur at the shoulder. This analysis is consistent with Figure 1c.
The above is a brief summary and analysis of the shoulder phenomenon and mutation phenomenon caused by the arc fault under different categories of loads. In addition, there is a common feature of the above load categories, which is the randomness of the current waveform at the time of an arc fault. During the generation of the arc, the burning of the arc is accompanied by the volatilization of the electrode. The volatilization of the electrode causes the arc gap distance to change, so the arc resistance also changes. According to Ohm's Law of the whole circuit, when the power supply voltage remains unchanged and the resistance of the arc changes continuously, the current will also fluctuate as shown in Figure 1. If the current is analyzed in the frequency domain, there will be a high frequency noise component, and random characteristics in the frequency domain will be exhibited.

Detection Method
The flow of the detection method is shown in Figure 2. This paper will explain each step in accordance with the detection sequence.

Basis for Dividing the Half Period
The arc fault detection standard UL1699 requires that the arc fault circuit breaker perform protection when there are more than eight arcing half periods within 0.5 s. GB14287.4 requires that the detector issue an alarm signal when 14 arcs occur within 1 s. GB/T31143 requires that, when the line current reaches 75 A, the arc fault protector break the circuit if there are 12 arcing half periods within 1 s. Using the half period as the time unit for arc fault detection makes it easy to count the number of arcing half cycles per unit time, which better meets the requirements of the related standards.
When an arc fault occurs, the current waveform has the characteristics of randomness and loses periodicity in the strict definition, so the time unit cannot be divided by the current waveform. The waveform of the power supply voltage is almost not affected by the load. Therefore, the time unit can be divided according to the voltage waveform. Figure 3 shows that the period between two adjacent zero-crossing points in the voltage waveform can be divided into a half period.
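The division into half periods can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `split_half_periods`, the 10 kHz sampling rate, and the 50 Hz test voltage are assumptions chosen for the example.

```python
import math


def split_half_periods(voltage):
    """Return (start, end) sample-index pairs, one per voltage half period."""
    # A zero crossing lies between two consecutive samples of opposite sign.
    crossings = [
        i + 1
        for i in range(len(voltage) - 1)
        if (voltage[i] < 0 <= voltage[i + 1]) or (voltage[i] >= 0 > voltage[i + 1])
    ]
    # Two adjacent zero crossings bound one half period.
    return list(zip(crossings[:-1], crossings[1:]))


# Hypothetical test signal: 50 Hz voltage sampled at 10 kHz with a phase offset.
fs, f0 = 10_000, 50
voltage = [math.sin(2 * math.pi * f0 * k / fs + 0.3) for k in range(600)]
half_periods = split_half_periods(voltage)
```

Each returned index pair can then be used to slice the current signal, so that the indicators are computed per half period even when the current itself has lost its periodicity.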

Wavelet Transform to Process Signals
The wavelet transform is convenient for detecting the information of sudden points (singular points) in the signal [7]. Therefore, the wavelet transform is used to process the current waveform.

Choosing the Wavelet Base
Daubechies 3 (db3) has a good denoising effect [27]. It is orthogonal and sensitive to irregular signals [12]. Therefore, db3 is suitable for wavelet analysis of non-stationary signals that contain much noise.

Frequency Band Distribution of the Wavelet Transform
Based on the theory of multi-resolution analysis, Mallat proposed the fast wavelet transform (FWT) algorithm, known as the Mallat algorithm [28]. Its core recursive formulas are shown in Formulas (1) and (2):
$$a_{i+1}(k) = \sum_{n} h(n - 2k)\, a_i(n) \tag{1}$$
$$d_{i+1}(k) = \sum_{n} g(n - 2k)\, a_i(n) \tag{2}$$
where $a_i(k)$ is the discrete approximation coefficient, $d_i(k)$ is the discrete detail coefficient, $h(k)$ is the filter satisfying the two-scale difference equation, and $g(k)$ is the corresponding high-pass filter.
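One step of the Mallat recursion amounts to filtering the current approximation coefficients and keeping every second output. The sketch below is illustrative only: it uses the two-tap Haar filter in place of db3 so that the numbers are easy to check by hand, derives the high-pass filter from the low-pass filter through the quadrature mirror relation, and extends the signal periodically at the edges. The function name `mallat_step` and these choices are assumptions, not details taken from the paper.

```python
import math


def mallat_step(a, h):
    """One Mallat decomposition step: returns (approximation, detail) coefficients."""
    taps = len(h)
    # Assumed quadrature mirror relation: g(k) = (-1)^k * h(taps - 1 - k).
    g = [(-1) ** k * h[taps - 1 - k] for k in range(taps)]
    n = len(a)
    a_next, d_next = [], []
    for k in range(n // 2):
        # Filter around position 2k (periodic extension), i.e. downsample by two.
        a_next.append(sum(h[m] * a[(2 * k + m) % n] for m in range(taps)))
        d_next.append(sum(g[m] * a[(2 * k + m) % n] for m in range(taps)))
    return a_next, d_next


# Haar low-pass filter, used here as a stand-in for db3.
haar = [1 / math.sqrt(2), 1 / math.sqrt(2)]
approx, detail = mallat_step([4.0, 6.0, 10.0, 12.0], haar)
```

Applying the step repeatedly to the approximation output gives the multi-layer decomposition. In practice a library routine such as PyWavelets' `pywt.wavedec(signal, 'db3', level=5)` would typically be used instead of a hand-written loop.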
A previous study [29] found that the frequency range of the arc current is 2-100 kHz and that its spectrum is not smooth. By analyzing the spectrum of different levels after wavelet decomposition, reference [15] found that the amplitude fluctuation is more obvious in the 2.4-39 kHz range. Reference [30] performed spectrum analysis on incandescent lamps, hand drills, fluorescent lamps, and other loads and determined that the high frequency band of interest is below 400 kHz. Therefore, this paper selects 3-100 kHz as the research frequency band; that is, the signal after wavelet transformation should include this frequency band.
The detailed signal and approximate signal of each decomposition layer are reconstructed from the discrete wavelet coefficients $a_i(k)$ and $d_i(k)$. The frequency ranges of the two types of signals are:
$$d_i: \left[\frac{f_s}{2^{i+1}},\ \frac{f_s}{2^{i}}\right], \qquad a_i: \left[0,\ \frac{f_s}{2^{i+1}}\right]$$
where $f_s$ is the sampling frequency. Taking a sampling frequency of 200 kHz as an example, the frequency band and spectrum of each layer of wavelet decomposition are shown in Table 1 and Figures 4 and 5, respectively. According to the research frequency band, the number of decomposition layers is set to five, so that the detailed signals d1 to d5 together span 3.125-100 kHz. Figures 4 and 5 show that, after the original signal is decomposed and reconstructed, the amplitude of the approximate signal is much greater than that of the detailed signal. In addition, the frequency spectrum of the first-layer detailed signal is concentrated at 50-100 kHz, and that of the first-layer approximate signal is concentrated within 50 kHz. The frequency spectrum of the second-layer detailed signal is concentrated at 25-50 kHz, and that of the second-layer approximate signal is concentrated within 25 kHz, and so on, consistent with the theoretical frequency band distribution presented in Table 1.
Figure 5. Spectral analysis of approximate signals.
As the frequency band of the signal decreases, the noise content decreases and the waveform backbone becomes clear. Table 1 shows that the fifth-layer approximate signal (a5) has the lowest frequency. In addition, Figure 6 shows that after wavelet decomposition, the waveform backbone of the a5 signal is more obvious than that of the original signal, which reduces the influence of interference.
In summary, after the original signal is decomposed and reconstructed by the wavelet transform, two different signals are generated. The detailed signal captures the time-frequency characteristics of the waveform and can be used to detect whether the current signal contains a pulse mutation or high-frequency components. The approximate signal effectively filters out interference and keeps the waveform backbone of the current, which makes it convenient to calculate current indicators such as the "shoulder proportion".
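The theoretical band limits of each layer follow directly from the sampling frequency by repeated halving. The short sketch below tabulates them for the 200 kHz example; the function name `wavelet_bands` is hypothetical.

```python
def wavelet_bands(fs_khz, levels):
    """Return {name: (low_kHz, high_kHz)} for the signals d1..dN and a1..aN."""
    bands = {}
    for i in range(1, levels + 1):
        # Detailed signal of layer i: upper half of the previous approximation band.
        bands[f"d{i}"] = (fs_khz / 2 ** (i + 1), fs_khz / 2 ** i)
        # Approximate signal of layer i: lower half of the previous approximation band.
        bands[f"a{i}"] = (0.0, fs_khz / 2 ** (i + 1))
    return bands


bands = wavelet_bands(200, 5)
for name in ("d1", "d2", "d3", "d4", "d5", "a5"):
    print(name, bands[name])
```

With five layers the detailed signals reach down to 3.125 kHz, which is why five layers are enough to cover the 3-100 kHz research band at this sampling frequency.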

Primary Classification
According to the analysis in Section 2, in the normal state the current waveforms of the resistive category and the resistive-inductive category have no shoulder, whereas the current waveform of the rectifying circuit with a capacitive filter category has a shoulder. Therefore, loads are first divided into two types: normal with shoulder and normal without shoulder.
By performing wavelet decomposition and reconstruction on the current waveform when various loads are normal, the fifth layer approximate signal a5 is selected to calculate the shoulder ratio of the current signal.
The shoulder always occurs near the zero-crossing point of the current, and the amplitude $x_{mid}$ about the shoulder is roughly determined from $x_{max}$ and $x_{min}$, the maximum and minimum currents in a half cycle of power frequency.
To generalize the results, it is necessary to determine the detection bandwidth of the shoulder: sampling points whose deviation above or below $x_{mid}$ does not exceed 5% of the amplitude are considered shoulder points. The "shoulder proportion" $\varepsilon$ is the percentage of shoulder points among the $n$ sampling points in a half cycle of power frequency.
For each of the three categories of loads, 100 sets of data under normal conditions are selected to calculate the shoulder proportion.
The load examples in Figure 7 show that the loads can be divided into two types (normal with shoulder and normal without shoulder) for any threshold between 4% and 42%. To leave a certain margin, the threshold is set as 10%:
$$\begin{cases} \varepsilon \ge 10\%, & \text{normal with shoulder} \\ \varepsilon < 10\%, & \text{normal without shoulder} \end{cases} \tag{7}$$
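The shoulder proportion and the 10% decision rule can be sketched as below. Because the exact expression for the shoulder level is not reproduced here, the sketch takes the shoulder level and the reference amplitude as explicit inputs; the function names, the synthetic waveforms, and the choice of a zero shoulder level with unit amplitude are all assumptions made for illustration.

```python
import math


def shoulder_proportion(samples, x_mid, amplitude, band=0.05):
    """Percentage of samples lying within +/- band * amplitude of x_mid."""
    limit = band * amplitude
    n_shoulder = sum(1 for x in samples if abs(x - x_mid) <= limit)
    return 100.0 * n_shoulder / len(samples)


def primary_class(eps, threshold=10.0):
    """Apply the 10% threshold to the shoulder proportion (in percent)."""
    return "normal with shoulder" if eps >= threshold else "normal without shoulder"


n = 200  # hypothetical number of samples per half cycle
# Synthetic resistive current: a clean half sine wave.
resistive = [math.sin(math.pi * k / n) for k in range(n)]
# Synthetic RCCF-like current: a short charging pulse, zero elsewhere.
rccf = [math.sin(math.pi * (k - 80) / 40) if 80 <= k < 120 else 0.0 for k in range(n)]

eps_re = shoulder_proportion(resistive, x_mid=0.0, amplitude=1.0)
eps_rccf = shoulder_proportion(rccf, x_mid=0.0, amplitude=1.0)
print(primary_class(eps_re), primary_class(eps_rccf))
```

For a pure sine wave only a few percent of the samples fall inside the 5% band, while the pulse-shaped current spends most of the half cycle there, which is the kind of separation that makes a 10% threshold workable.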

Secondary Classification
For a normal load type without shoulder, the voltage and current of the resistive category are in the same phase, and the voltage of the resistive-inductive category leads the current. Therefore, this can be further classified by the phase relationship of the voltage and current signals.
The sampling points at which the voltage signal and the current signal reach their maximum values within a given half period are compared to obtain the time offset between the two:
$$\Delta = \frac{m_I - m_U}{n} \times 100\% \tag{8}$$
where $n$ is the number of sampling points in the half cycle of the power frequency, $m_I$ and $m_U$ are the labels of the sampling points at which the current and the voltage reach their maximum, and $\Delta$ is the "phase difference".
For the loads without shoulder under normal conditions, 100 sets of data are likewise selected for each category to calculate the phase difference. The load examples in Figure 8 show that the loads can be divided into two types (resistive-inductive category and resistive category) for any threshold between 1% and 17%. To leave a certain margin, the threshold is set as 10%:
$$\begin{cases} \Delta \ge 10\%, & \text{RI} \\ \Delta < 10\%, & \text{Re} \end{cases} \tag{9}$$
The load can be classified according to the above classification criteria, as shown in Table 2.

| Primary Classification | Secondary Classification |
| --- | --- |
| Normal without shoulder | Resistive category (Re) |
| Normal without shoulder | Resistive-inductive category (RI) |
| Normal with shoulder | Rectifying circuit with a capacitive filter category (RCCF) |
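The two-stage classification can be sketched as a small decision function. The phase difference here is computed as the offset between the sample indices of the current and voltage maxima, expressed as a percentage of the half period; this expression, its sign convention, and the function names are assumptions for illustration, as are the synthetic waveforms.

```python
import math


def phase_difference(voltage, current):
    """Offset of the current maximum after the voltage maximum, in percent of n."""
    n = len(voltage)
    m_u = max(range(n), key=lambda k: voltage[k])  # index of the voltage maximum
    m_i = max(range(n), key=lambda k: current[k])  # index of the current maximum
    return 100.0 * (m_i - m_u) / n


def classify_load(eps, delta, eps_threshold=10.0, delta_threshold=10.0):
    """Primary split on shoulder proportion, secondary split on phase difference."""
    if eps >= eps_threshold:
        return "RCCF"
    return "RI" if delta >= delta_threshold else "Re"


n = 200  # hypothetical number of samples per half cycle
voltage = [math.sin(math.pi * k / n) for k in range(n)]
i_resistive = [math.sin(math.pi * k / n) for k in range(n)]
i_inductive = [math.sin(math.pi * k / n - math.pi / 6) for k in range(n)]  # 30 degree lag

print(classify_load(3.5, phase_difference(voltage, i_resistive)))
print(classify_load(3.5, phase_difference(voltage, i_inductive)))
```

Using the position of the maximum rather than the zero crossing keeps the measure usable when the current waveform is distorted near zero.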

Time-Frequency Indicators Selection
Using several time-frequency indicators together makes the detection more convincing and reliable. For the resistive category (Re), the current shoulder ratio increases significantly when arc faults occur, so the "shoulder proportion" is selected as the time domain indicator. The currents of the resistive-inductive category (RI) and the rectifying circuit with a capacitive filter category (RCCF) show more current pulses during faults, so the "degree of fluctuation" is selected as their time domain indicator. For the frequency domain indicator, the high-frequency components of the current increase during a fault for all load categories, and the "average energy" is selected to quantify this.
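The selected indicators are shown in Table 3. The methods for obtaining the "effective value of the current", "degree of fluctuation", and "average energy" are described as follows.
Table 3. Selection of load indicators and comparison of characteristics.

| Load category | Time domain indicator | Frequency domain indicator |
| --- | --- | --- |
| Re | Shoulder proportion | Average energy |
| RI | Degree of fluctuation | Average energy |
| RCCF | Degree of fluctuation | Average energy |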

Effective Value of the Current
By calculating the root-mean-square of the approximate signal a5 after wavelet decomposition and reconstruction of the current signal, the effective value of the current is obtained:
$$\alpha = \sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^{2}}$$
where $x$ is the data column, $\alpha$ is the effective value, and $n$ is the number of sampling points in a half cycle of power frequency.

Degree of Fluctuation
The forward differential of the current a5 signal after wavelet reconstruction is:
$$z_i = x_{i+1} - x_i$$
Then, the magnitudes of the differential series are summed to reflect the overall fluctuation of the current, and the sum is divided by the effective value of the current over the half cycle to obtain the degree of fluctuation:
$$\gamma = \frac{1}{\alpha}\sum_{i=1}^{n-1} \left| z_i \right|$$
where $z$ is the differentiated data, $\gamma$ is the degree of fluctuation, and $\alpha$ is the effective value.
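The two time domain quantities can be sketched in a few lines. The sketch assumes that the magnitudes of the forward differences are summed (a plain sum of differences would collapse to the last sample minus the first); the function names and the two sample sequences are hypothetical.

```python
import math


def effective_value(x):
    """Root-mean-square of the samples in one half cycle."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def degree_of_fluctuation(x):
    """Sum of absolute forward differences, normalized by the effective value."""
    z = [x[i + 1] - x[i] for i in range(len(x) - 1)]  # forward difference
    return sum(abs(v) for v in z) / effective_value(x)


smooth = [0.0, 3.0, 4.0, 3.0, 0.0]  # single hump, like a normal half cycle
pulsed = [0.0, 4.0, 0.0, 4.0, 0.0]  # repeated pulses, like an arcing half cycle
print(degree_of_fluctuation(smooth), degree_of_fluctuation(pulsed))
```

Dividing by the effective value makes the indicator relative, so that loads drawing very different currents can be compared on the same scale.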

Average Energy
x is the data column, α is the effective value, and n is the number of sampling points in a half cycle of power frequency.

Degree of Fluctuation
Table 1 shows that the detailed signal is more suitable for analyzing the high-frequency characteristics of the current signal when a fault occurs. In addition, the load examples in Figures 9-11 show that, after wavelet decomposition, the amplitude of each of the five layers of the detailed signal is greater during an arc fault than during normal operation.
Synthesizing the five layers of detailed information from the wavelet decomposition, the "average energy" is selected as the frequency-domain indicator for all loads. That is, after the five-layer wavelet decomposition, the energies of the detailed signals d1, d2, d3, d4, and d5 of the different layers are averaged to give η. Here η is the average energy, d_k (k = 1, ..., 5) is the detailed signal of layer k, and n is the number of sampling points in a half cycle of power frequency.
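A minimal sketch of the average-energy indicator is given below. It assumes that the energy of a layer is the sum of its squared samples over the half cycle and that η is the mean of the five layer energies; the exact normalization is an assumption. The detailed signals are taken as already computed by a five-layer wavelet decomposition, which is outside this sketch.

```python
def layer_energy(d):
    """Energy of one detailed signal: sum of squared samples (assumed)."""
    return sum(v * v for v in d)

def average_energy(details):
    """Mean energy over the detailed signals d1..d5."""
    if not details:
        raise ValueError("need at least one detailed signal")
    return sum(layer_energy(d) for d in details) / len(details)

# Hypothetical detailed signals, two samples per layer for brevity.
details = [[1.0, 1.0], [2.0, 0.0], [0.0, 0.0], [0.0, 3.0], [1.0, 0.0]]
eta = average_energy(details)
```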

Establishing the BP Neural Network
After the current waveform of the load is quantified by the indicators, arc faults could be identified by setting thresholds. However, if the thresholds of multiple indicators for different loads and the weights between indicators are set manually, random fluctuation of the indicators can easily lead to misjudgment. Therefore, artificial neural networks are used to adaptively find the optimal thresholds and weights.
The error back propagation neural network (BP neural network) is a multilayer feedforward neural network. Its basic idea is to use gradient search to continuously modify the weights and thresholds between layers so as to minimize the error between the actual output of the network and the expected output [31]. In this paper, category recognition is performed first, and distinctive characteristic indicators are found for each load category, which lays the foundation for simplifying the artificial neural networks. Therefore, a three-layer BP neural network of the form "input layer-hidden layer-output layer", as shown in Figure 12, is used for each load category.
The BP neural network has a strong multivariate mapping ability and can be used for binary pattern recognition. According to Table 3, the time-frequency indicators of each load category are taken as the inputs of the neural network for that category, and the identification result is a binary output, "arc fault 1" or "normal operation 0". The number of neurons in the input layer (m1) is therefore 2, and the number of neurons in the output layer is 1. The number of hidden-layer neurons is determined by an empirical formula.
Writing m2 for the number of hidden-layer neurons, the empirical formula gives
$$m_2 = 2 \times m_1 + 1 = 5 \qquad (14)$$
There are many types of training functions for BP neural networks. To meet the need for a binary 0-1 output, the transfer function of the hidden-layer neurons is the S-type tangent function Tansig, and the transfer function of the output-layer neuron is the S-type logarithmic function Logsig. The training function is Trainlm, which is based on the Levenberg-Marquardt algorithm.
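The hidden-layer size and the two transfer functions can be illustrated with a short forward pass. This is a sketch, not the trained network: the weights and thresholds below are hypothetical placeholders, and training with the Levenberg-Marquardt algorithm is not shown. Tansig is mathematically the hyperbolic tangent and Logsig is the logistic sigmoid.

```python
import math

def hidden_size(n_inputs):
    """Empirical formula: 2 * m1 + 1."""
    return 2 * n_inputs + 1

def tansig(s):
    """Tangent sigmoid, equal to tanh(s); output in (-1, 1)."""
    return math.tanh(s)

def logsig(s):
    """Logistic sigmoid, output in (0, 1); written to avoid overflow."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: inputs -> tansig hidden layer -> logsig output."""
    hidden = [tansig(sum(w * x for w, x in zip(row, inputs)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return logsig(sum(w * h for w, h in zip(w_out, hidden)) + b_out)

# Hypothetical, untrained parameters for a 2-5-1 network.
n_hidden = hidden_size(2)
w_hidden = [[0.0, 0.0] for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [0.0] * n_hidden
y = forward([0.3, 0.8], w_hidden, b_hidden, w_out, b_out=2.0)
```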
The data used to train the network should be normalized in advance to the interval [0,1] to eliminate the influence of order of magnitude and unit. The normalization method used in this paper is the maximum-minimum method:
$$x_h = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \qquad (15)$$
where x is a value in the data column, x_min is the minimum value of the data, x_max is the maximum value of the data, and x_h is the normalized value of the data.
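A minimal sketch of the maximum-minimum normalization is shown below. The handling of a constant column (all values mapped to 0) is an assumption added to avoid division by zero.

```python
def min_max_normalize(column):
    """Scale a data column linearly to the interval [0, 1]."""
    x_min, x_max = min(column), max(column)
    span = x_max - x_min
    if span == 0:
        # Assumption: a constant column carries no information; map it to 0.
        return [0.0 for _ in column]
    return [(x - x_min) / span for x in column]

normalized = min_max_normalize([2.0, 4.0, 6.0])  # [0.0, 0.5, 1.0]
```

When new samples are normalized for detection, reusing the minimum and maximum found on the training data keeps both sets on the same scale; this is a common design choice rather than something stated in the text.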
According to the load category, three BP neural networks with the structure "2-5-1" are built. When the networks are trained with the normalized data, the output of the output-layer neuron lies in the range [0,1]. To generalize the training results, the output value y is then converted to one of the two labels, "arc fault 1" or "normal operation 0".
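The conversion of the continuous output y to a binary label can be sketched as below. The decision threshold of 0.5 is an assumption made for illustration; the rule actually used is not reproduced in the text.

```python
def to_label(y, threshold=0.5):
    """Map a network output in [0, 1] to 1 (arc fault) or 0 (normal)."""
    if not 0.0 <= y <= 1.0:
        raise ValueError("output must lie in [0, 1]")
    return 1 if y >= threshold else 0

outputs = [0.03, 0.48, 0.52, 0.97]  # hypothetical network outputs
labels = [to_label(y) for y in outputs]  # [0, 0, 1, 1]
```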

Genetic Algorithm Optimized Neural Network
Because BP training relies on the gradient information of the error function, it may fail to reach a good solution when the initial values are poorly chosen or the gradient information is hard to obtain. Before formally training the BP neural network, a genetic algorithm is therefore used to find good initial values, which can improve the performance of the neural network [32]. The algorithm flow [33] is shown in Figure 13.
The "2-5-1" network adopted in this paper has a total of 2 × 5 + 5 × 1 = 15 weights and 5 + 1 = 6 thresholds, so the number of parameters to be optimized is 15 + 6 = 21. Each parameter is encoded with 10 binary bits, so each individual in the genetic algorithm is 210 bits long. The norm of the error on the test samples is used to measure the quality of the network, and each individual's fitness value is calculated from this error norm. The larger the fitness value, the better the individual.
The other parameters of the genetic algorithm are shown in Table 4. In this way, BP neural networks optimized by GA (GA-BP networks) were established for the resistive category, the resistive-inductive category, and the rectifying circuit with a capacitive filter category.
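The parameter count and the 10-bit binary encoding can be checked with a short sketch. Two points are assumptions made for illustration and are not given in the text: each 10-bit gene is decoded linearly into the range [-1, 1], and the fitness is taken as the reciprocal of the error norm so that a smaller error gives a larger fitness. Selection, crossover, and mutation are not shown.

```python
import math

N_IN, N_HID, N_OUT = 2, 5, 1
N_WEIGHTS = N_IN * N_HID + N_HID * N_OUT   # 15 weights
N_THRESHOLDS = N_HID + N_OUT               # 6 thresholds
N_PARAMS = N_WEIGHTS + N_THRESHOLDS        # 21 parameters
BITS = 10
CHROMOSOME_LENGTH = N_PARAMS * BITS        # 210 bits

def decode(chromosome, lo=-1.0, hi=1.0):
    """Decode a list of 210 bits into 21 real-valued parameters.

    Assumption: each gene maps linearly onto [lo, hi].
    """
    if len(chromosome) != CHROMOSOME_LENGTH:
        raise ValueError("chromosome must have 210 bits")
    params = []
    for k in range(N_PARAMS):
        gene = chromosome[k * BITS:(k + 1) * BITS]
        value = int("".join(str(b) for b in gene), 2)
        params.append(lo + (hi - lo) * value / (2 ** BITS - 1))
    return params

def fitness(errors):
    """Assumed fitness: reciprocal of the Euclidean norm of the errors."""
    norm = math.sqrt(sum(e * e for e in errors))
    return 1.0 / (norm + 1e-12)
```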

Experimental Device and Experimental Object Selection
Following the U.S. UL1699 standard for arc fault circuit interrupters, an arc fault generating device was built in the laboratory. Figure 14a shows the laboratory test platform. The digital oscilloscope is a Tektronix DPO3054. A TCP0030 current probe is used to acquire the current signal, and a P5200A/50 MHz high-voltage differential probe is used to acquire the voltage signal. The arc generator consists of a stationary electrode and a moving electrode; one electrode is a carbon-graphite rod and the other is a copper rod. As shown in Figure 14b, a stepping motor adjusts the distance between the copper rod and the carbon rod to generate an arc. The input voltage is 220 V. Current waveforms in the normal state and in the arc fault state are recorded with the oscilloscope at a sampling frequency of 200 kHz.
Several loads are randomly selected as the test objects, as shown in Table 5. The current waveforms of the loads measured in the laboratory under the normal state and the fault state are shown in Figure 15. After a series arc occurs, the current waveforms of different load categories show different characteristics, which is consistent with the preceding analysis.

Category Recognition Results
Category recognition is performed before the GA-BP neural networks are used for detection. The sample data are classified according to the method described in Section 3.3, and the results are shown in Table 6. Category recognition is based on the voltage and current waveforms under normal conditions. The waveforms of different load categories differ greatly, as shown in Figure 1, so category recognition readily achieves high accuracy. As Table 6 shows, a total of 630 sets of data were used in the category recognition test, which achieved a high accuracy rate.

Fault Arc Identification Results
There are currently no generally accepted mathematical rules regarding the division of data sets for the development of artificial neural networks. Only some empirical rules divide the collected data into training sets, test sets, and validation sets, or training sets and test sets [34]. Generally, the ratio of the training set to the test set is roughly 7:3; for example, reference [35] recommended that 65% of the data be used to train the neural network and 35% of the data should be used for test verification. Therefore, for each load category, this paper selects 70% of the data as training samples and 30% of the data as test samples.
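A 70/30 division of the samples of one load category might be sketched as follows. Random shuffling with a fixed seed is an assumption; the text does not say how the samples were assigned to the two sets, and the function name is hypothetical.

```python
import random

def split_samples(samples, train_fraction=0.7, seed=0):
    """Shuffle the samples and split them into training and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # assumed: random assignment
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical category with 100 samples, identified by index.
train, test = split_samples(range(100))
```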
The time-frequency indicators listed in Table 3 are calculated for the loads in the training samples, and the corresponding neural networks are trained. Then, the 630 sets of test samples are tested one by one according to the arc fault detection method shown in Figure 2.
The experimental results are shown in Table 7, which demonstrates that the method proposed in this paper can efficiently identify the generation of arcs. The comprehensive accuracy of the BP networks fluctuates between 90.79% (572 out of 630 tests) and 93.65% (590 out of 630 tests). By contrast, the recognition accuracy of the GA-BP neural networks is 99.21% (625 out of 630 tests), which indicates that the genetic algorithm can effectively improve the recognition accuracy of the neural network for arc generation.
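The reported percentages follow directly from the counts of correct tests, as this short check shows. The helper name is hypothetical.

```python
def accuracy_percent(correct, total):
    """Accuracy in percent, rounded to two decimals."""
    return round(100.0 * correct / total, 2)

bp_low = accuracy_percent(572, 630)    # 90.79
bp_high = accuracy_percent(590, 630)   # 93.65
ga_bp = accuracy_percent(625, 630)     # 99.21
```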
The accuracy of the unoptimized neural networks fluctuates because the initial thresholds and weights of the BP neural network are random numbers, so training can easily fall into a local optimum. The weights and thresholds obtained in that case are not the most appropriate parameters, which reduces accuracy. Using the genetic algorithm to find good initial values before BP training avoids this problem. Subsequent BP training can then obtain the globally optimal thresholds and weights, so the accuracy of the GA-BP neural networks remains constant. In addition, Figure 16 shows that the comprehensive recognition results of the GA-BP neural networks are relatively concentrated, while those of the BP neural networks are relatively dispersed. Therefore, using the genetic algorithm to optimize the BP neural network significantly improves the stability of the accuracy.

Comparison with Previous Detection Methods
The four representative methods mentioned previously and the method proposed in this paper are compared in four aspects: the category recognition method, the neural network structure, the optimization of the neural network, and the detection accuracy of arc recognition. Table 8 shows that, in terms of the category recognition method, the recognition standards of reference [25] are susceptible to interference from grid harmonics and hold only under certain circumstances, whereas the recognition standards in this paper are not subject to harmonic interference and are simple and easy to implement. In terms of neural network structure, this paper greatly simplifies the neural network on the basis of category recognition: there is only one hidden layer, with two input neurons and one output neuron, and the three neural networks used after category recognition have the same structure. In terms of neural network optimization, this paper uses a genetic algorithm to optimize the initial values of the network, which keeps the recognition accuracy stable compared with the other methods. In terms of arc detection accuracy, the proposed method, based on category recognition and a simplified network structure, achieves 99.21% accuracy, and the test data include various load categories, which makes the results more general and reliable.
In addition, the upper limit on the accuracy of the GA-BP networks is set by the amount and variety of the training data. If the amount and variety of training data are increased, the accuracy will improve further, but the training time will also increase.
It should be noted that the focus of this paper is on proposing a detection method. The algorithm was run in MATLAB on the measured data using a laptop with an Intel(R) Core(TM) i5-6300HQ CPU @ 2.30 GHz. As Table 9 shows, the training time of the GA-BP network is less than 403.23 s, and the running time of the detection function is less than 0.011 s. For a practical implementation on an MCU or DSP, see reference [25] or reference [26].

Conclusions
Based on a summary of the volt-ampere characteristics of arcs, this paper analyzes the characteristics of different load categories when an arc fault occurs. A series arc fault detection method based on category recognition and an artificial neural network is then proposed. First, the loads are classified in two levels based on the voltage and current waveforms, and quantitative indicators are then selected according to the characteristics of each load category. The indicator data are used to train the artificial neural network corresponding to the load category, which is then used to detect arcs. Taking the half period as the time unit, within the time specified by the detection standard, the arc count is increased by one each time an arc is detected, and the count is accumulated. When the count reaches the threshold specified by the standard, an arc fault is considered to have occurred, and the protection equipment cuts off the circuit.
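The half-cycle counting logic can be sketched as a sliding-window counter. The window length and the trip count below are hypothetical parameters; the values required by a given detection standard are not stated here and must be taken from that standard.

```python
from collections import deque

def first_trip_index(half_cycle_flags, window, trip_count):
    """Return the index of the half cycle at which the arc count within
    the last `window` half cycles first reaches `trip_count`, or None.

    half_cycle_flags: one 0/1 detection result per half cycle.
    """
    recent = deque(maxlen=window)  # keeps only the newest `window` flags
    for index, flag in enumerate(half_cycle_flags):
        recent.append(1 if flag else 0)
        if sum(recent) >= trip_count:
            return index
    return None

# Hypothetical detector outputs, one per half cycle.
trip_at = first_trip_index([0, 1, 0, 1, 1, 0], window=4, trip_count=3)
```

Using a bounded window means that isolated detections far apart in time do not accumulate into a trip.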
In summary, the arc fault identification method proposed in this paper is novel, and its principle is clear. Experimental verification shows that its arc identification accuracy is high, so the method can be further promoted and applied.

Conflicts of Interest:
The authors declare no conflict of interest.