A DC Series Arc Fault Detection Method Based on a Lightweight Convolutional Neural Network Used in Photovoltaic System

Although photovoltaic (PV) systems play an essential role in distributed generation systems, they also raise serious safety concerns because of DC series arc faults. To address this issue, this paper proposes a lightweight convolutional neural network-based method for detecting DC series arc faults in PV systems. An experimental platform is built according to UL 1699B, and current data ranging from 3 A to 25 A are collected. Test conditions including PV inverter startup and irradiance mutation are also considered to evaluate the robustness of the proposed method. Before fault detection, the current data are preprocessed with power spectrum estimation. Because it has fewer parameters, the lightweight convolutional neural network imposes a lower computational burden, which makes it suitable for edge applications based on embedded microprocessors. Compared with similar lightweight convolutional network models such as EfficientNet-B0, B2, and B3, the EfficientNet-B1 model shows the highest arc fault detection accuracy, 96.16%. Furthermore, an attention mechanism is combined with EfficientNet-B1 to focus the algorithm on arc features, which helps it avoid unnecessary computation. The test results show that the detection accuracy of the proposed method reaches 98.81% under all test conditions, which is higher than that of general networks.


Introduction
As climate change caused by global warming becomes more frequent [1], environmental problems have attracted increasing attention. To reduce carbon dioxide emissions, the use of fossil energy is being limited and green energy is increasingly used. Solar energy is a form of green energy that does not pollute the environment. Photovoltaic (PV) systems convert solar energy into electric energy for convenient use [2]; they play an essential role in distributed generation systems [3] and are therefore widely used in households and other places where solar energy is plentiful [4]. However, arc faults on the DC side of a PV system can reach temperatures above 5000 °C, which may ignite surrounding combustible material and cause severe electrical fires [5].
of the input data. Other neural networks can also be applied to arc fault detection. Neural network methods achieve high accuracy and do not require manually set thresholds. To further improve the accuracy of arc fault detection, the depth, width, and input resolution of the network need to be scaled up. However, scaling all three dimensions together dramatically increases the required computing resources. Therefore, most existing methods scale only one dimension to improve accuracy.
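The cost of scaling all three dimensions at once can be made concrete with the compound-scaling rule from the original EfficientNet paper. The sketch below is illustrative only: the constants α = 1.2, β = 1.1, and γ = 1.15 come from that paper, not from this study, and the cost model assumes that FLOPs grow linearly with depth and quadratically with width and resolution.

```python
# Compound-scaling constants reported in the EfficientNet paper (Tan and Le, 2019),
# used here only for illustration; they are not values from this study.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # bases for depth, width, input resolution


def flops_factor(depth_phi=0.0, width_phi=0.0, res_phi=0.0):
    """Approximate FLOPs multiplier of a ConvNet after scaling.

    Cost grows linearly with depth and quadratically with width and
    resolution, so FLOPs scale as alpha**d * (beta**2)**w * (gamma**2)**r.
    """
    return ALPHA ** depth_phi * (BETA ** 2) ** width_phi * (GAMMA ** 2) ** res_phi


# Scaling one dimension at a time versus all three together.
depth_only = flops_factor(depth_phi=1)   # about 1.2
width_only = flops_factor(width_phi=1)   # about 1.21
res_only = flops_factor(res_phi=1)       # about 1.32
compound = flops_factor(1, 1, 1)         # about 1.92
compound_phi3 = flops_factor(3, 3, 3)    # about 7.08
```

With these constants, scaling every dimension with φ = 1 roughly doubles the FLOPs, and φ = 3 multiplies them by about seven, whereas any single dimension alone grows much more slowly.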
Although existing AI-based arc fault detection methods achieve good accuracy, above 95%, the accuracy needs further improvement to reduce fire risk. Moreover, existing methods have not considered high current values or the influence of normal operations, such as PV inverter startup and irradiance mutation, on arc fault detection; therefore, their robustness needs to be improved. Furthermore, existing methods have a very large number of model parameters, and the resulting computational burden is too heavy for industrial embedded microcontrollers.
In this paper, we propose a lightweight convolutional neural network-based method for detecting DC series arc faults in PV systems.
The main contributions of this paper are as follows:
1. Since actual DC arc faults in PV systems are highly stochastic, it is difficult to capture a large amount of DC series arc fault data directly for arc fault detection research. Therefore, this paper establishes a DC arc fault experimental platform following the UL 1699B configuration in which the arc fault detection device (AFDD) is installed within the inverter, and analyzes DC series arc faults at current values from 3 A to 25 A. PV inverter startup and irradiance mutation are also considered to evaluate the robustness of the algorithm.

2. Because the working conditions of PV systems are complex, it is essential to find distinctive characteristics of the current signal for arc fault detection. This paper takes the DC series arc current signal in PV systems as the research object and analyzes its power spectrum characteristics. After examining the principles of commonly used power spectrum estimation methods, an AR model of the DC current signal is established to obtain power spectrum images. The results show that the power spectrum images of current data in the normal state and the arc fault state differ markedly.

3. An algorithm based on a lightweight convolutional neural network is proposed to detect DC series arc faults in PV systems. Grayscale images of the power spectrum of the DC current data are fed into the network model, and the proposed method achieves a detection accuracy of 98.81%, which is higher than that of GoogLeNet, AlexNet, and other existing general networks. With fewer model parameters, the algorithm has a low computational burden, runs efficiently, and is feasible to deploy on an embedded microprocessor.

Arc Fault Experiment Platform
Since the actual DC arc faults in PV systems have stochastic characteristics, it is challenging to directly capture a large amount of DC arc current data for the arc fault detection algorithm. Therefore, a DC arc fault experimental platform is established to generate DC series arc faults under different working conditions for collecting current data.
The experimental platform mainly consists of a PV string, an arc fault generator, signal acquisition devices, and a PV inverter. The UL 1699B standard includes four application examples for different AFDD installation positions; this platform uses the first case, in which the AFDD is installed within the inverter. A GOODWE GW36K-MT three-phase inverter was used as the load, with the AFDD installed inside it. Since UL 1699B permits a PV simulator to replace the actual PV string, an ITECH IT6018C PV simulator was used to make the experiments more convenient and diversified. Its voltage range is 0-1500 V and its current range is 0-40 A, and it can emulate I-V curves under various weather conditions, such as different irradiance levels. In accordance with UL 1699B, two circuit forms were used: (1) one PV string feeding a centralized power inverter; and (2) two PV strings feeding a centralized power inverter. Figure 1 shows the circuit with two PV strings for a centralized power inverter.
The location of an arc fault affects the DC-side current of the PV system differently. In accordance with UL 1699B, an arc generator was added to the circuit to simulate arc faults at positions 1, 2, and 3 in Figure 1, which are between the PV strings, at the end of the PV strings, and at the start of the PV strings, respectively. The arc generator was connected in series with the circuit to generate series arc faults.
To simulate the parasitic capacitance and inductance of the long line (80 m) between the AFDD and the PV string, an impedance network module was added to the circuit to reproduce the high-frequency characteristics of the PV system. The impedance network parameters shown in Figure 1 were set in accordance with UL 1699B. Because the arc fault is most severe when C1 is 300 nF or 20 µF, both values had to be tested. The standard also requires a decoupling network in front of the impedance network to control the output capacitance of the PV simulator and reproduce the DC characteristics of the PV system. The decoupling network is shown in Figure 2. According to UL 1699B, R3 = R4 = 27 Ω when $I_{mpp}$ = 3 A, and R3 = R4 = 4.5 Ω when $I_{mpp}$ = 16 A. According to IEC 63027, R3 = R4 = 2.5 Ω when $I_{mpp}$ = 25 A.

Different Operating Conditions in Experiments
Different operating conditions were used for data collection to verify the generalization ability of the algorithm. Three tests were selected from the UL 1699B and IEC 63027 standards, as shown in Table 1. To simulate the worst-case arc fault, the impedance network component C1 was tested at two values, 300 nF and 20 µF. Each test in Table 1 was performed at the three arc fault locations shown in Figure 1 to verify the reliability of the algorithm. The minimum $I_{arc}$ represents a realistic arc event with one or two strings at low irradiance; $I_{mpp}$ and $V_{mpp}$ are the current and voltage at the maximum power point, respectively; and $V_{oc}$ is the open-circuit voltage. These four parameters are set on the PV simulator, and a stepping motor controller sets the arc gap and the arcing speed. Two additional situations were included, as shown in Table 1: (1) PV inverter startup, and (2) irradiance mutation, which causes an abrupt current change. Both belong to the normal state and were tested to verify the robustness of the algorithm.
In the time domain, the DC arc current in PV systems exhibits stochastic high-frequency spikes. In the frequency domain, its spectrum amplitude increases only slightly within a specific band (e.g., 40-100 kHz). When the PV inverter operates in PWM mode, it generates high-frequency noise in a similar band with an amplitude equal to or even higher than that of the arc current signal. It is therefore difficult to distinguish the normal state from the arc fault state by amplitude alone. However, the PWM noise generated by power electronic devices is regular because of periodic modulation and system inertia, so current signals under different working conditions can be distinguished by analyzing the power spectrum. The power spectrum describes a stochastic signal by expressing its power as a function of frequency; it is sensitive to changes in the signal and thus reflects the underlying pattern of signal variation. The process of obtaining the power spectrum is called power spectrum estimation. Modern power spectrum estimation methods fall mainly into parametric and nonparametric methods. Nonparametric estimation offers better estimation performance than parametric estimation, but its heavy computation and model complexity make it difficult to meet the real-time requirements of DC arc fault detection in practical applications. Therefore, parametric power spectrum estimation, which requires less computation, is selected. Common parametric methods include the autoregressive (AR) model and the autoregressive moving average (ARMA) model.


AR Model
The time series x(n) of a p-order AR model is the superposition of a weighted sum of its previous p values and white noise:
$$x(n) = -\sum_{m=1}^{p} a_m x(n-m) + w(n) \tag{1}$$
In Equation (1), $a_m$ is the coefficient of the corresponding time series value, and w(n) is Gaussian white noise with zero mean and variance $\sigma^2$.
The system transfer function of the p-order AR model is:
$$H(z) = \frac{1}{1 + \sum_{m=1}^{p} a_m z^{-m}} \tag{2}$$
According to Equation (2), the AR model is an all-pole model, which directly reflects the distribution of peaks in the power spectrum. Evaluating Equation (2) on the unit circle, $z = e^{j\omega}$, yields the power spectrum, as shown in Equation (3):
$$P(\omega) = \frac{\sigma^2}{\left|1 + \sum_{m=1}^{p} a_m e^{-j\omega m}\right|^2} \tag{3}$$
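A minimal numerical illustration of Equations (1) and (3), assuming NumPy and the sign convention x(n) = −Σ a_m x(n−m) + w(n). The AR(2) coefficients are hypothetical toy values chosen to place a spectral peak near ω = π/4; they are not parameters of the measured PV current.

```python
import numpy as np


def simulate_ar(a, sigma2, n, seed=0):
    """Generate n samples of x(n) = -sum_m a_m x(n-m) + w(n), Eq. (1)."""
    a = np.asarray(a, dtype=float)
    p = len(a)
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, np.sqrt(sigma2), n + p)
    x = np.zeros(n + p)  # the first p zeros act as initial conditions
    for t in range(p, n + p):
        # x[t-p:t][::-1] = [x(t-1), ..., x(t-p)], aligned with [a_1, ..., a_p]
        x[t] = -np.dot(a, x[t - p:t][::-1]) + noise[t]
    return x[p:]


def ar_psd(a, sigma2, n_freq=1024):
    """Evaluate Eq. (3) on n_freq normalized frequencies in [0, pi]."""
    a = np.asarray(a, dtype=float)
    w = np.linspace(0.0, np.pi, n_freq)
    m = np.arange(1, len(a) + 1)
    denom = 1.0 + np.exp(-1j * np.outer(w, m)) @ a  # A(e^{jw})
    return w, sigma2 / np.abs(denom) ** 2


# Hypothetical AR(2) with a pole pair at radius r and angle theta.
r, theta = 0.95, np.pi / 4
a_toy = [-2 * r * np.cos(theta), r ** 2]
w, P = ar_psd(a_toy, sigma2=1.0, n_freq=4096)
peak = w[np.argmax(P)]  # lies close to theta
x_toy = simulate_ar(a_toy, sigma2=1.0, n=2500)
```

Because the denominator of Equation (3) becomes small near a pole, the spectrum peaks close to θ; this is the practical meaning of an all-pole model reflecting the peak distribution.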

ARMA Model
The time series of a (p, q)-order ARMA model is:
$$x(n) = -\sum_{m=1}^{p} a_m x(n-m) + \sum_{r=0}^{q} b_r w(n-r), \qquad b_0 = 1 \tag{4}$$
According to Equation (4), the system transfer function of the ARMA model is:
$$H(z) = \frac{\sum_{r=0}^{q} b_r z^{-r}}{1 + \sum_{m=1}^{p} a_m z^{-m}} \tag{5}$$
According to Equation (5), the ARMA model is a pole-zero model, which directly reflects both the peaks and the valleys of the power spectrum. Evaluating Equation (5) on the unit circle gives the power spectrum, as shown in Equation (6):
$$P(\omega) = \sigma^2 \frac{\left|\sum_{r=0}^{q} b_r e^{-j\omega r}\right|^2}{\left|1 + \sum_{m=1}^{p} a_m e^{-j\omega m}\right|^2} \tag{6}$$
The AR model has a simpler structure and requires less computation than the ARMA model. Therefore, the AR model is selected as the power spectrum estimation model. After the AR model of the DC current signal is established, its parameters must be calculated.
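The same evaluation extended to Equation (6) shows how numerator zeros create valleys in the spectrum. This is a NumPy sketch with hypothetical coefficients, assuming a = [a_1, ..., a_p] and b = [b_0, ..., b_q]; with b = [1] it reduces to the AR spectrum of Equation (3).

```python
import numpy as np


def arma_psd(a, b, sigma2, n_freq=1024):
    """Evaluate Eq. (6) on n_freq normalized frequencies in [0, pi].

    a = [a_1, ..., a_p] (denominator 1 + sum a_m z^-m),
    b = [b_0, ..., b_q] (numerator sum b_r z^-r, with b_0 = 1 in Eq. (4)).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    w = np.linspace(0.0, np.pi, n_freq)
    A = 1.0 + np.exp(-1j * np.outer(w, np.arange(1, len(a) + 1))) @ a  # poles -> peaks
    B = np.exp(-1j * np.outer(w, np.arange(len(b)))) @ b                # zeros -> valleys
    return w, sigma2 * np.abs(B) ** 2 / np.abs(A) ** 2


# Hypothetical example: one pole plus a zero pair on the unit circle at w = pi/2.
w, P = arma_psd(a=[-0.5], b=[1.0, 0.0, 1.0], sigma2=1.0, n_freq=1025)
notch = P[512]  # w[512] is pi/2, where B(e^{jw}) = 1 + e^{-j pi} = 0

# With b = [1] the ARMA spectrum reduces to the AR spectrum.
w_ar, P_ar = arma_psd(a=[0.5], b=[1.0], sigma2=2.0, n_freq=3)
```

The extra numerator is what makes ARMA more expressive and also more expensive to fit, which is the trade-off behind choosing the AR model here.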

The Selection of Optimal Parameters in the AR Model
Equation (3) shows that the accuracy of the estimated power spectrum depends on the coefficients $a_m$ and the order p, so a suitable parameter estimation method must be chosen. Commonly used methods include the Levinson-Durbin algorithm and the Burg algorithm. In this paper, the Burg algorithm was selected to estimate the parameters of the AR model of the PV system current signal because it minimizes the sum of the forward and backward prediction error powers. The calculation process is as follows.
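In the Burg algorithm, the forward and backward prediction errors of order m are obtained recursively from those of order m − 1:
$$e_m^f(n) = e_{m-1}^f(n) + k_m e_{m-1}^b(n-1), \qquad e_m^b(n) = e_{m-1}^b(n-1) + k_m e_{m-1}^f(n) \tag{7}$$
In Equation (7), $k_m$ is the reflection coefficient. The forward and backward prediction error power ε is defined as:
$$\varepsilon_m = \sum_{n=m}^{N-1}\left[\left(e_m^f(n)\right)^2 + \left(e_m^b(n)\right)^2\right] \tag{8}$$
To minimize the error power, setting $\partial \varepsilon_m / \partial k_m = 0$ and combining Equations (7) and (8) gives the reflection coefficient:
$$k_m = \frac{-2\sum_{n=m}^{N-1} e_{m-1}^f(n)\, e_{m-1}^b(n-1)}{\sum_{n=m}^{N-1}\left[\left(e_{m-1}^f(n)\right)^2 + \left(e_{m-1}^b(n-1)\right)^2\right]} \tag{9}$$
In Equation (9), m = 1, 2, 3, ..., p, and the recursion starts from $e_0^f(n) = e_0^b(n) = x(n)$. Since the reflection coefficient $k_m$ is an estimate of the partial correlation coefficient, the autocovariance function $R_{xx}$ of orders 0 to p, which is related to the model parameters, can be derived from the Yule-Walker equations, Equation (10), with l = 1, 2, ..., m − 1. Equation (11) is obtained by iterating this calculation. The reflection coefficient $k_m$ from Equation (9) can be used as the estimate of $c_m$; substituting it into Equation (11) gives the Levinson recursion, from which the AR model coefficients are calculated:
$$a_m(l) = a_{m-1}(l) + k_m a_{m-1}(m-l), \qquad a_m(m) = k_m \tag{12}$$
In Equation (12), $a_m(l)$ is the l-th coefficient of the m-order model and l = 1, 2, ..., m − 1. After each step, m is incremented by 1 and the steps are repeated until m = p.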
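The recursion can be sketched in NumPy as follows, assuming real-valued data and the sign convention x(n) = −Σ a_m x(n−m) + w(n). It implements the standard Burg steps (error order update, reflection coefficient, Levinson coefficient update, and the error-power recursion used later for AIC). The AR(2) test signal is hypothetical, not measured PV current.

```python
import numpy as np


def burg(x, p):
    """Burg estimate of an AR(p) model with A(z) = 1 + sum_m a_m z^-m.

    Returns the coefficients a_1..a_p, the reflection coefficients
    k_1..k_p, and the prediction-error powers sigma2_0..sigma2_p.
    """
    x = np.asarray(x, dtype=float)
    N = len(x)
    f = x.copy()  # f[n] holds the forward error e^f_{m-1}(n)
    b = x.copy()  # b[n] holds the backward error e^b_{m-1}(n)
    a = np.zeros(0)
    ks = []
    sigma2 = [np.dot(x, x) / N]
    for m in range(1, p + 1):
        ef = f[m:].copy()           # e^f_{m-1}(n),   n = m..N-1
        eb = b[m - 1:N - 1].copy()  # e^b_{m-1}(n-1), n = m..N-1
        # Reflection coefficient minimizing forward + backward error power
        k = -2.0 * np.dot(ef, eb) / (np.dot(ef, ef) + np.dot(eb, eb))
        # Order update of both prediction errors
        f[m:] = ef + k * eb
        b[m:] = eb + k * ef
        # Levinson update: a_m(l) = a_{m-1}(l) + k a_{m-1}(m-l), a_m(m) = k
        a = np.concatenate([a + k * a[::-1], [k]])
        ks.append(k)
        sigma2.append(sigma2[-1] * (1.0 - k * k))
    return a, np.array(ks), np.array(sigma2)


# Hypothetical check: x(n) = 1.2 x(n-1) - 0.5 x(n-2) + w(n), i.e. a = [-1.2, 0.5].
rng = np.random.default_rng(1)
true_a = np.array([-1.2, 0.5])
noise = rng.standard_normal(20_000)
x = np.zeros(20_000)
for n in range(2, 20_000):
    x[n] = -true_a[0] * x[n - 1] - true_a[1] * x[n - 2] + noise[n]
a_hat, k_hat, s2 = burg(x, 2)
```

Because each reflection coefficient satisfies |k_m| ≤ 1, the error power never increases with order, and the final value approximates the driving-noise variance.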
After the Burg algorithm yields the AR model coefficients $a_m$, the optimal model order p must be determined. If the order is chosen incorrectly, the estimated spectrum will not match the true one: too low an order smooths out spectral peaks, whereas too high an order can introduce spurious peaks. The Akaike information criterion (AIC), an asymptotically unbiased estimate of the discrepancy between a fitted model and the true process, can determine the best order when the true model is unknown. The smaller the AIC value, the better the model fit.
The general form of the AIC criterion is:
$$\mathrm{AIC} = 2k - 2\ln L \tag{13}$$
where k is the number of parameters and L is the maximized likelihood. Assuming N current samples with Gaussian residuals and letting SSR be the sum of squared residuals, Equation (13) can be converted, up to an additive constant, to:
$$\mathrm{AIC} = N \ln\left(\frac{\mathrm{SSR}}{N}\right) + 2k \tag{14}$$
When Equation (14) is applied to AR order selection, k is the order p, N is the number of samples, and SSR/N is the prediction error variance of the AR model, which can be replaced by $\sigma_p^2$. Equation (14) then becomes:
$$\mathrm{AIC}(p) = N \ln \sigma_p^2 + 2p \tag{15}$$
In Equation (15), $\sigma_p^2$ can be calculated recursively from the reflection coefficient $k_p$ of Equation (9):
$$\sigma_p^2 = \left(1 - k_p^2\right)\sigma_{p-1}^2, \qquad \sigma_0^2 = \frac{1}{N}\sum_{n=0}^{N-1} x^2(n) \tag{16}$$
To obtain the optimal order of the AR model, eighteen groups of DC-side current data were collected on the arc fault experimental platform in tests no. 1, no. 2, and no. 3 at a sampling rate of 250 kHz. Because the disordered, stochastic arc current would affect the result, current in the normal state was used to calculate and analyze the AIC values. Each group of data covered a 10 ms time window, so each window contained 2500 samples (250 kHz × 10 ms), enough for a valid estimate while keeping the sample size manageable. The order p was varied from 1 to 20, and the AIC value for each order was obtained from Equations (15) and (16). The results are shown in Figure 3.
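Order selection with Equations (15) and (16) can be sketched by reusing the Burg error-power recursion. The AR(2) test signal is hypothetical; its length of 2500 samples matches one 10 ms window at 250 kHz. The sketch shows only the mechanics, not the p = 12 result obtained on the measured data.

```python
import numpy as np


def burg_error_powers(x, p_max):
    """Prediction-error powers sigma2_0..sigma2_pmax from the Burg recursion."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    f, b = x.copy(), x.copy()  # forward / backward prediction errors
    s2 = [np.dot(x, x) / N]    # sigma2_0
    for m in range(1, p_max + 1):
        ef, eb = f[m:].copy(), b[m - 1:N - 1].copy()
        k = -2.0 * np.dot(ef, eb) / (np.dot(ef, ef) + np.dot(eb, eb))
        f[m:], b[m:] = ef + k * eb, eb + k * ef
        s2.append(s2[-1] * (1.0 - k * k))  # sigma2_p = (1 - k_p^2) sigma2_{p-1}
    return np.array(s2)


def aic_curve(x, p_max=20):
    """AIC(p) = N ln(sigma2_p) + 2p for p = 1..p_max."""
    s2 = burg_error_powers(x, p_max)
    orders = np.arange(1, p_max + 1)
    return orders, len(x) * np.log(s2[1:]) + 2 * orders


# Hypothetical signal: 2500 samples of an AR(2) process.
rng = np.random.default_rng(2)
e = rng.standard_normal(2500)
x = np.zeros(2500)
for n in range(2, 2500):
    x[n] = 1.2 * x[n - 1] - 0.5 * x[n - 2] + e[n]
orders, aic = aic_curve(x, p_max=20)
best_p = int(orders[np.argmin(aic)])
```

The 2p penalty grows slowly, so once the model order is adequate further orders reduce N ln σ_p² only slightly, which produces the flat, gently rising AIC curve beyond the minimum.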
Figure 3 shows that the AIC value of the current data was smallest at order p = 12. For higher orders, the AIC value changed little and showed a slight upward trend. Therefore, the optimal order of the AR model of the DC current signal is p = 12.
Energies 2022, 15, 2877
The Burg algorithm was used to estimate the 12-order AR model coefficients of the current signal, giving the transfer function:
$$H(z) = \frac{1}{1 + \sum_{m=1}^{12} a_m z^{-m}} \tag{17}$$
where $a_1, \dots, a_{12}$ are the coefficients estimated by the Burg algorithm. According to Equation (17), the power spectrum estimate of the 12-order AR model of the current signal is:
$$P(\omega) = \frac{\sigma_{12}^2}{\left|1 + \sum_{m=1}^{12} a_m e^{-j\omega m}\right|^2} \tag{18}$$
When the AR model is used to calculate the power spectrum of PV system current data, a suitable time window length must be selected so that the differences between the power spectra of different time windows are amplified. In particular, in the arc fault state these differences reflect the changing characteristics of the arc current, which differ significantly from those of the normal state. Since the correlation coefficient measures the relationship between two variables, the correlation coefficients between the power spectra of different time windows were calculated separately for three groups of current data from test no. 1, together with the variance of these coefficients; tests no. 2 and no. 3 showed characteristics similar to those of test no. 1. The larger the variance, the more pronounced the power spectrum difference between time windows. Table 2 shows that with 10 ms and 17 ms time windows, the variance of the correlation coefficient in the arc fault state was large, whereas the variance in the normal state was much smaller. However, the 17 ms window is too long for fast data processing. Therefore, 10 ms was selected as the time window for calculating the DC current power spectrum.
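The window-selection procedure can be sketched as below. Several details are assumptions not stated in the text: windows are non-overlapping, correlations are computed between consecutive windows, and a plain periodogram stands in for the 12-order AR spectrum of Equation (18) to keep the sketch short. The white-noise record is hypothetical.

```python
import numpy as np

FS = 250_000  # sampling rate of the experimental platform, Hz


def split_windows(x, window_ms, fs=FS):
    """Cut x into non-overlapping windows of window_ms milliseconds, dropping the remainder."""
    n = int(round(fs * window_ms / 1000))
    n_win = len(x) // n
    return np.asarray(x[: n_win * n], dtype=float).reshape(n_win, n)


def periodogram(win):
    """Stand-in spectrum for this sketch only (not the AR spectrum used in the paper)."""
    win = win - win.mean()
    return np.abs(np.fft.rfft(win)) ** 2 / len(win)


def corr_variance(spectra):
    """Variance of Pearson correlations between spectra of consecutive windows."""
    r = [np.corrcoef(spectra[i], spectra[i + 1])[0, 1] for i in range(len(spectra) - 1)]
    return float(np.var(r))


# Usage on a hypothetical 0.1 s record (25,000 samples of white noise).
x = np.random.default_rng(3).standard_normal(25_000)
w10 = split_windows(x, 10)  # 10 windows of 2500 samples
w17 = split_windows(x, 17)  # 5 windows of 4250 samples
v10 = corr_variance([periodogram(w) for w in w10])
```

A large variance means the spectrum shape changes from window to window, which is the behavior expected of the arc fault state and not of the normal state.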
After the time window was determined, the power spectra of the current signals were plotted for comparison. Six groups of current data were selected from tests no. 1, no. 2, and no. 3, a 12-order AR model was established for each, and the power spectrum was calculated. One result from test no. 1 is shown in Figure 4. In Figure 4, the orange line represents the power spectrum of the current in the arc fault state: its values decrease gradually with increasing frequency, and the low-frequency values are significantly higher than the high-frequency values. The blue line represents the power spectrum of the current in the normal state, whose values remain nearly constant across frequency except in the 0-10 kHz range. In addition, the power spectrum values in the arc fault state are higher than those in the normal state. The spike at 32 kHz is caused by noise interference from the PV inverter. The power spectrum therefore differs significantly between the arc fault state and the normal state and can be used as the neural network input for arc fault detection.

Data Processing and Creating the Dataset
Because the power spectra of the DC current differ between the normal state and the arc fault state, they can be used as the input for training the neural network model. The experimental platform was used to collect current data for the tests shown in Table 1. Tests no. 1 to no. 3 produced eighteen groups of data covering both the normal state and the arc fault state. Test no. 4 produced three groups of data and test no. 5 produced six groups; both tests no. 4 and no. 5 belong to the normal state.
The raw data were segmented to extract arc fault data and normal data. Since neural network training requires a large amount of data, the current data collected on the experimental platform were processed into a dataset for the neural network, with a unified format and size to facilitate training. To unify the size, the arc fault and normal data were processed to the same time scale, using the 10 ms sampling window as the unit time window.
Since the sampling rate was 250 kHz, each unit time window contained 2500 sampling points. The 12-order AR model was used to obtain the power spectrum data. Because the range of power spectrum values varies with working conditions, the values were normalized to [0, 1] using min-max normalization (deviation standardization):
$$x^* = \frac{x - \min}{\max - \min} \tag{19}$$
In Equation (19), max and min are the maximum and minimum values of the power spectrum data, and $x^*$ is the normalized value. After normalization, each group's power spectrum data have the same order of magnitude.
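Equation (19) in NumPy, applied to one group of power spectrum values at a time. The guard for a flat spectrum (max equal to min) is an addition of this sketch, not something the text specifies.

```python
import numpy as np


def min_max_normalize(spectrum):
    """Deviation standardization, Eq. (19): x* = (x - min) / (max - min)."""
    spectrum = np.asarray(spectrum, dtype=float)
    lo, hi = spectrum.min(), spectrum.max()
    if hi == lo:  # a flat spectrum would divide by zero; map it to all zeros
        return np.zeros_like(spectrum)
    return (spectrum - lo) / (hi - lo)


WINDOW_SAMPLES = int(250_000 * 10 / 1000)         # 250 kHz x 10 ms = 2500 points
normalized = min_max_normalize([2.0, 4.0, 6.0])   # -> [0.0, 0.5, 1.0]
```

Normalizing each group separately removes the absolute level, so the network sees spectrum shape rather than current magnitude, which is what allows data from 3 A to 25 A to share one input scale.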
The normalized data were converted into two-dimensional images. To improve training efficiency, the images were converted to the grayscale images shown in Figure 5, and their resolution was set to 240 × 240 to meet the EfficientNet-B1 input requirement. The dataset contains 10,000 images in total: 6000 in the training set, 2000 in the validation set, and 2000 in the test set. After the data were converted into images, each image was labeled as one of two classes: arc fault or normal.
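A reproducible random split into the 6000/2000/2000 subsets can be sketched as follows. The text does not say how images were assigned to subsets, so the uniform (non-stratified) shuffle and the fixed seed are assumptions.

```python
import numpy as np


def split_indices(n_total, n_train, n_val, seed=0):
    """Randomly partition sample indices into train / validation / test subsets."""
    idx = np.random.default_rng(seed).permutation(n_total)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


# 10,000 images split 6000 / 2000 / 2000, matching the dataset sizes above.
train_idx, val_idx, test_idx = split_indices(10_000, 6_000, 2_000)
```

If the two classes were unbalanced, a stratified split (shuffling each class separately) would keep the arc fault ratio equal across the three subsets.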

Methodology
Convolutional neural networks (CNNs) have become a fundamental feature extraction tool for image tasks. However, the multiple complex behaviors of the arc current in PV systems make some convolutional frameworks suboptimal for arc fault detection. Because the DC series arc fault current in PV systems is complex, it is difficult to find a suitable combination of CNN depth, width, and input resolution that effectively distinguishes the arc fault state from the normal state. Inspired by EfficientNet and the attention mechanism, this paper proposes a model based on a lightweight convolutional neural network with a channel and spatial attention mechanism for arc fault detection, named ArcDetectionNet (ADNet).

Lightweight Convolutional Backbone Network Structure
A lightweight convolutional backbone network structure based on the idea of EfficientNet is shown in Figure 6, where H, W, and C denote the height, width, and channel dimensions of the feature map. First, a 1 × 1 pointwise convolution is applied to the input data, changing the output channel dimension according to the expansion ratio and aggregating features along the channel dimension of the feature map; a k × k depthwise convolution is then carried out. Second, an excitation (squeeze-and-excitation) operation is performed on the output, with the channel count of its 1 × 1 convolution scaled by the ratio R, and a final 1 × 1 pointwise convolution restores the original channel dimension.
Finally, drop connect (connection deactivation) and a skip connection from the input are applied; this gives the model stochastic depth, which reduces training time and improves model performance. This structure is called mobile inverted bottleneck convolution (MBConv). Each convolution in this module is followed by normalization and the swish activation function, defined as:
$$f(x) = x \cdot \mathrm{sigmoid}(\beta x) = \frac{x}{1 + e^{-\beta x}} \tag{20}$$
In Equation (20), β is a constant or trainable parameter, which defaults to 1. In deep network models, swish performs better than ReLU: it is bounded below but unbounded above, and it is smooth and non-monotonic.
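To make the MBConv description more concrete, the sketch below implements Equation (20) and a squeeze-and-excitation step in NumPy for one feature map of shape (C, H, W). Layer sizes, random weights, and the ratio R = 0.25 are hypothetical, and using swish inside the excitation branch follows common EfficientNet implementations; it is an assumption here, not a detail given in the text.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def swish(x, beta=1.0):
    """Eq. (20): f(x) = x * sigmoid(beta * x)."""
    return x * sigmoid(beta * x)


def squeeze_excite(x, w_reduce, b_reduce, w_expand, b_expand):
    """Squeeze-and-excitation on a feature map x of shape (C, H, W).

    Squeeze: global average pooling over H and W gives one value per channel.
    Excitation: two 1x1 convolutions (matrix products on the pooled vector)
    with swish in between and a sigmoid gate, then per-channel rescaling of x.
    """
    z = x.mean(axis=(1, 2))                  # (C,)
    s = swish(w_reduce @ z + b_reduce)       # (C*R,)
    gate = sigmoid(w_expand @ s + b_expand)  # (C,), values in (0, 1)
    return x * gate[:, None, None]


# Hypothetical sizes and random weights; R = 0.25 is an assumed ratio.
rng = np.random.default_rng(0)
C, H, W, R = 16, 8, 8, 0.25
Cr = max(1, int(C * R))
feat = rng.standard_normal((C, H, W))
out = squeeze_excite(feat,
                     rng.standard_normal((Cr, C)) * 0.1, np.zeros(Cr),
                     rng.standard_normal((C, Cr)) * 0.1, np.zeros(C))

grid = np.linspace(-6, 6, 12001)
swish_min = swish(grid).min()  # bounded below, at roughly -0.28
```

The grid check illustrates the properties named in the text: swish has a finite minimum, grows without bound for positive inputs, and dips before rising, so it is non-monotonic.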

Arc Detection Attention Mechanism Module
An attention mechanism lets a neural network assign different connection weights between layers when computing a layer's output, so the network can focus on specific input features, reduce the number of network operations, and improve performance. This paper proposes an arc detection attention mechanism (ADAM) module. ADAM computes attention along the channel and spatial dimensions of the feature map generated by the convolutional neural network, and the results are multiplied by the input data for adaptive feature learning. Because the module is designed for convolutional neural networks, it can be combined with various CNNs and trained end to end. In ADAM, the channel attention mechanism is placed first, followed by the spatial attention mechanism. The structure of the ADAM module is shown in Figure 7.
As shown in Figure 7, the ADAM module extracts data features from two dimensions: channel and space. The channel attention mechanism performs pooling and convolution operations for the input data. The output data of the above processes are each channel's weight coefficient, and the weight coefficient is multiplied by the input data to weight and fuse the channels. The output features weighted by the channel attention mechanism are used as the input of the spatial attention mechanism module to weight the crucial regions in the spatial dimension.
The channel attention mechanism module and the spatial attention mechanism module are connected in serial. By changing the combination and position of the two modules, the optimal combination was selected to construct the ADNet model.  Figure 6. Lightweight convolutional backbone network structure.
In Equation (20), β is a constant or trainable parameter with a default value of 1. In deep network models, the swish function performs better than the ReLU function. It is bounded below but unbounded above, and it is smooth and non-monotonic. The connection deactivation (drop connect) in MBConv gives the model stochastic depth, which reduces the time required for model training and improves model performance.
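To make Equation (20) concrete, the following minimal Python sketch (standard library only; it is not the authors' TensorFlow implementation) evaluates swish with the default β = 1. It exhibits the properties described above: outputs dip below zero for negative inputs and then rise again, so the function is non-monotonic and bounded below, while large positive inputs pass through almost unchanged. Setting β = 0 reduces it to x/2.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow of exp(-z) for very negative z
    return e / (1.0 + e)


def swish(x: float, beta: float = 1.0) -> float:
    """Swish activation, Equation (20): f(x) = x * sigmoid(beta * x)."""
    return x * sigmoid(beta * x)


if __name__ == "__main__":
    # Negative inputs give small negative outputs (the curve dips below zero
    # before rising again); large positive inputs are passed through almost unchanged.
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"swish({x:+.1f}) = {swish(x):+.4f}")
```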

Arc Detection Attention Mechanism Module
An attention mechanism lets a neural network assign different connection weights between layers when computing each layer's output, so that the network can focus on specific input characteristics, reduce the number of network operations, and improve performance. This paper proposes an arc detection attention mechanism (ADAM) module. ADAM computes attention weights along the channel and spatial dimensions of the feature map generated by the convolutional neural network, and these weights are multiplied by the input data for adaptive feature learning. Because the module is designed for convolutional neural networks, it can be combined with various such networks and trained end to end. For example, the channel attention mechanism can be placed first, followed by the spatial attention mechanism. The structure of the ADAM module is shown in Figure 7.
As shown in Figure 7, the ADAM module extracts features along two dimensions: channel and space. The channel attention mechanism applies pooling and convolution operations to the input data to produce a weight coefficient for each channel; these coefficients are multiplied by the input data to weight and fuse the channels. The features weighted by the channel attention mechanism then serve as the input of the spatial attention mechanism module, which weights the crucial regions in the spatial dimension.
The channel attention mechanism module and the spatial attention mechanism module are connected in series. The combination used to construct the ADNet model was selected by varying the combination and position of the two modules.
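The serial channel-then-spatial ordering can be illustrated with a small pure-Python sketch. The text specifies only that channel attention uses pooling and convolution and that spatial attention weights important regions, so the concrete operations here are assumptions borrowed from common attention designs: average plus max pooling followed by a 1-D convolution across neighbouring channels for channel attention, and a weighted combination of the channel-wise mean and max maps (a 1 × 1 stand-in for a spatial convolution) for spatial attention. The kernel and weight values are hypothetical placeholders for parameters that ADNet would learn.

```python
import math
from typing import List, Sequence

# Feature map indexed as fmap[channel][row][col] (C x H x W).
FeatureMap = List[List[List[float]]]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def channel_attention(fmap: FeatureMap,
                      kernel: Sequence[float] = (0.25, 0.5, 0.25)) -> FeatureMap:
    """Scale each channel by a weight in (0, 1).

    Assumed design: global average + max pooling per channel, then a small 1-D
    convolution across neighbouring channels (zero padded), then a sigmoid.
    The kernel values are hypothetical; ADNet would learn them.
    """
    desc = []
    for ch in fmap:
        vals = [v for row in ch for v in row]
        desc.append(sum(vals) / len(vals) + max(vals))
    half = len(kernel) // 2
    weights = []
    for i in range(len(desc)):
        acc = sum(k * desc[i + j - half]
                  for j, k in enumerate(kernel)
                  if 0 <= i + j - half < len(desc))
        weights.append(sigmoid(acc))
    return [[[w * v for v in row] for row in ch] for w, ch in zip(weights, fmap)]


def spatial_attention(fmap: FeatureMap, w_avg: float = 0.5,
                      w_max: float = 0.5) -> FeatureMap:
    """Scale each (row, col) position by a weight in (0, 1).

    Assumed design: channel-wise mean and max maps combined with two scalar
    weights (hypothetical 1x1 stand-in for the spatial convolution), then a sigmoid.
    """
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    mask = [[sigmoid(w_avg * sum(fmap[k][r][q] for k in range(c)) / c
                     + w_max * max(fmap[k][r][q] for k in range(c)))
             for q in range(w)]
            for r in range(h)]
    return [[[mask[r][q] * ch[r][q] for q in range(w)] for r in range(h)]
            for ch in fmap]


def adam_module(fmap: FeatureMap) -> FeatureMap:
    """Serial 'CS' order: channel attention first, then spatial attention.

    ADAM here is the paper's arc detection attention mechanism, not the Adam optimizer.
    """
    return spatial_attention(channel_attention(fmap))


if __name__ == "__main__":
    x = [[[float(c + r + q) for q in range(4)] for r in range(4)] for c in range(3)]
    y = adam_module(x)
    print(len(y), len(y[0]), len(y[0][0]))  # shape is preserved: C, H, W
```

Because every attention weight lies in (0, 1), each output element is a scaled-down copy of its input and the C × H × W shape is preserved, which is what allows such a module to be inserted at different positions in the network.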
The ADAM module could be placed either at the front of the network, after the 3 × 3 convolution layer, or at the end of the network, after the 16 MBConv modules. The optimal placement was determined through the experiments described in the next section.
In addition to the ADAM module, the other components of ADNet also had to be configured. The adaptive moment estimation (Adam) algorithm was selected as the optimizer for weight updates, cross-entropy was chosen as the loss function, and swish was used as the activation function.
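The two training choices named above can be written out explicitly. The sketch below is a standard-library illustration, not the TensorFlow code used for ADNet: it computes the cross-entropy of one prediction and applies the Adam update rule with the learning rate of 0.001 used in this work. The moment-decay rates β1 = 0.9 and β2 = 0.999 and ε = 1e-8 are the common defaults from the original Adam formulation and are assumed here, since the paper does not list them. This Adam optimizer is unrelated to the ADAM attention module despite the similar name.

```python
import math
from typing import List, Sequence


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Cross-entropy loss for one sample: -log of the predicted probability of the true class."""
    return -math.log(probs[label])


class AdamOptimizer:
    """Adaptive moment estimation (Adam) for a flat list of scalar weights.

    beta1, beta2, and eps are assumed common defaults; lr = 0.001 follows the text.
    """

    def __init__(self, lr: float = 0.001, beta1: float = 0.9,
                 beta2: float = 0.999, eps: float = 1e-8) -> None:
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m: List[float] = []  # first-moment (mean) estimates
        self.v: List[float] = []  # second-moment (uncentred variance) estimates
        self.t = 0                # number of updates performed so far

    def step(self, params: Sequence[float], grads: Sequence[float]) -> List[float]:
        if not self.m:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            # Exponential moving averages of the gradient and squared gradient.
            self.m[i] = self.beta1 * self.m[i] + (1.0 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1.0 - self.beta2) * g * g
            # Bias correction compensates for initialising m and v at zero.
            m_hat = self.m[i] / (1.0 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1.0 - self.beta2 ** self.t)
            updated.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated
```

For example, `AdamOptimizer().step([1.0, -2.0], [0.5, -3.0])` returns approximately `[0.999, -1.999]`: on the first step the bias-corrected update moves each weight by about one learning rate against the sign of its gradient, regardless of the gradient's magnitude.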

Experimental Results and Analysis
This section analyzes the experimental results to select the optimal structure of the proposed ADNet algorithm. The dataset contains current data in the arc fault and normal states from tests no. 1 to no. 5, and the test set contains no samples from the training or validation sets.
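The requirement that the test set shares no samples with the training and validation sets can be enforced by splitting a single shuffled list of sample identifiers. In this sketch the 70/15/15 split fractions and the seed are hypothetical, since the proportions are not given in this section, and the function name `split_dataset` is likewise illustrative. It assumes each identifier refers to a unique sample.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(sample_ids: Sequence[int], train_frac: float = 0.7,
                  val_frac: float = 0.15, seed: int = 0
                  ) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle once, then cut into disjoint training, validation, and test subsets.

    The default fractions are hypothetical placeholders.
    """
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # fixed seed for a reproducible split
    n_train = int(len(ids) * train_frac)
    n_val = int(len(ids) * val_frac)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # everything not used for training or validation
    return train, val, test
```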

The Optimal Model Selection Based on EfficientNet
Since ADNet is built on EfficientNet, and the EfficientNet family comprises eight models, the best base model was selected first. EfficientNet-B1 to B7 are scaled up from the baseline model EfficientNet-B0. To find the most suitable network model, the program was built in PyCharm (JetBrains, Prague, Czech Republic) with Python 3.7 (Guido van Rossum, Haarlem, The Netherlands) and TensorFlow 2.4.0 (Google Brain, Mountain View, CA, USA). Given the size of the dataset, smaller networks in the EfficientNet series were preferred, which reduces the number of parameters and unnecessary computation and improves training speed. From EfficientNet-B0 to EfficientNet-B7, the input image resolution increases, so the height and width of the feature map output by each layer increase accordingly, as does GPU memory usage. Therefore, EfficientNet-B0 to B3 were selected for training on the dataset. Their basic parameters and training results are shown in Table 3. As Table 3 shows, the detection accuracy of EfficientNet-B0 to B3 exceeds 95%, and EfficientNet-B1 achieves the highest accuracy, indicating that it is suitable for DC series arc fault detection in PV systems. At the same time, it avoids the loss of computation speed that comes with increasing network complexity, which is an advantage of the EfficientNet series.
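The memory argument follows from EfficientNet's compound scaling, in which each larger variant increases network width, depth, and input resolution together. The sketch below tabulates the width coefficient, depth coefficient, default input resolution, and dropout rate published for EfficientNet-B0 to B3, and computes how much larger each variant's feature-map area (H × W) is than B0's, plus a crude activation-memory proxy that also multiplies by the width ratio. It ignores channel rounding and the extra layers added by depth scaling, so the numbers are only indicative. B1's default resolution of 240 × 240 coincides with the size of the gray images used in this work.

```python
# Compound-scaling settings published with the original EfficientNet models:
# (width coefficient, depth coefficient, default input resolution in pixels, dropout rate).
EFFICIENTNET_CONFIGS = {
    "B0": (1.0, 1.0, 224, 0.2),
    "B1": (1.0, 1.1, 240, 0.2),
    "B2": (1.1, 1.2, 260, 0.3),
    "B3": (1.2, 1.4, 300, 0.3),
}


def relative_area(variant: str, base: str = "B0") -> float:
    """Feature-map area (H x W) of `variant` relative to `base` at default resolution."""
    res = EFFICIENTNET_CONFIGS[variant][2]
    base_res = EFFICIENTNET_CONFIGS[base][2]
    return (res / base_res) ** 2


def rough_activation_ratio(variant: str, base: str = "B0") -> float:
    """Area ratio times width ratio: a crude proxy for per-layer activation memory.

    Ignores channel rounding and the extra layers added by depth scaling.
    """
    width = EFFICIENTNET_CONFIGS[variant][0]
    base_width = EFFICIENTNET_CONFIGS[base][0]
    return relative_area(variant, base) * width / base_width


if __name__ == "__main__":
    for name in EFFICIENTNET_CONFIGS:
        print(f"{name}: area x{relative_area(name):.2f}, "
              f"activations x{rough_activation_ratio(name):.2f}")
```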
Not every part of the power spectrum image is equally important. To further improve the generalization ability and accelerate the convergence of ADNet, the channel attention mechanism was therefore used to weight and fuse the channels, whose various features are captured by different convolution kernels. In addition, whether the circuit has an arc fault is judged mainly from a few critical areas of the power spectrum image, so the features of different parts of the image should not be treated equally. The spatial attention mechanism was therefore used to weight important regions in space, strengthening important information and suppressing unimportant information.
Experiments were then carried out to find the optimal ADNet model. The results for the different ADAM types used in ADNet are shown in Table 4, where C denotes the channel attention mechanism and S the spatial attention mechanism; Q denotes placing the attention mechanism at the front of the network, after the 3 × 3 convolution layer, and H denotes placing it at the end of the network, after the 16 MBConv modules. According to Table 4, compared with the original EfficientNet-B1 model, ADNet improves both the feature extraction ability and the arc fault detection accuracy. Among the configurations, the CS-H model achieved the highest training and test accuracies: 99.96% on the training set and 98.81% on the test set. Therefore, applying the channel attention mechanism first and then the spatial attention mechanism at the end of the network improves detection accuracy. The ADAM module was more effective in the deep layers of the network than in the shallow layers, because deep-layer features are more robust after multiple stages of feature extraction. Thus, after the ADAM operation, the ADNet model captures crucial features of the power spectrum images with better robustness and performance.
According to the above analysis, EfficientNet-B1 combined with the CS-H ADAM type was selected; the optimal structure of the ADNet model is shown in Figure 8.

The Selection of ADNet Training Parameters
The learning rate determines the step size of each weight update and therefore directly affects the convergence of the network model. If the learning rate is too large, the model will not converge; if it is too small, convergence becomes slow and the model may fail to learn effectively. A good initial learning rate is usually found by a search that trains the model with learning rates increasing from small to large. After many experiments, 0.001 was chosen as the learning rate to accelerate convergence and save training time.
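The learning-rate search can be illustrated on a toy problem. The sketch below is a simplified grid version of the small-to-large search, not the procedure used for ADNet: it runs plain gradient descent on the hypothetical loss L(x) = 0.5·x² with each candidate rate, in increasing order, and keeps the rate with the lowest final loss. On this toy loss, very small rates barely move x, while rates above 2 make the iterates diverge.

```python
from typing import Dict, Iterable, Tuple


def final_loss(lr: float, steps: int = 50, curvature: float = 1.0,
               x0: float = 1.0) -> float:
    """Run plain gradient descent on the hypothetical toy loss L(x) = 0.5 * curvature * x**2."""
    x = x0
    for _ in range(steps):
        x -= lr * curvature * x  # dL/dx = curvature * x
    return 0.5 * curvature * x * x


def lr_search(candidates: Iterable[float]) -> Tuple[float, Dict[float, float]]:
    """Try candidate learning rates from small to large; keep the lowest final loss."""
    results = {lr: final_loss(lr) for lr in sorted(candidates)}
    best = min(results, key=results.get)
    return best, results


if __name__ == "__main__":
    best, results = lr_search([1e-4, 1e-3, 1e-2, 1e-1, 1.0, 3.0])
    for lr, loss in results.items():
        print(f"lr={lr:g}: final loss {loss:.3g}")
    print("best:", best)
```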
Batch size is the number of randomly drawn samples used in each step of the gradient descent algorithm, and it affects the generalization performance of a convolutional neural network model. Within a certain range, increasing the batch size improves the stability of convergence and memory utilization and speeds up data processing. Based on many experiments with the ADNet model, the batch size was set to 8. Because the ADNet model has a complex structure, dropout with a rate of 0.2 was used to avoid over-fitting, and the number of iterations was 120.
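The reported hyperparameters can be collected in one configuration object. This sketch also shows how the batch size determines the number of mini-batches per pass over the training set, and how a dropout rate of 0.2 is commonly applied in its inverted form, which rescales surviving activations during training so that their expected value is unchanged. The class and function names are hypothetical, and the inverted-dropout form is an assumption about how dropout is applied.

```python
import math
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters reported for ADNet (class name is hypothetical)."""
    learning_rate: float = 0.001
    batch_size: int = 8
    dropout_rate: float = 0.2
    iterations: int = 120  # reported as "iterations"; whether these are epochs is not stated


def batches_per_pass(num_samples: int, cfg: TrainConfig) -> int:
    """Mini-batches needed to visit every training sample once (last batch may be partial)."""
    return math.ceil(num_samples / cfg.batch_size)


def inverted_dropout(activations: List[float], rate: float,
                     rng: random.Random) -> List[float]:
    """Zero each unit with probability `rate` and rescale survivors by 1 / (1 - rate),
    so the expected activation is unchanged during training (assumed dropout form)."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```

For a hypothetical training set of 1,000 samples, `batches_per_pass(1000, TrainConfig())` gives 125 mini-batches of 8.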

Influence of Different Current Values on Detection Results
To study the influence of the current level on the arc fault detection accuracy of ADNet, experiments were carried out with the 3 A, 16 A, and 25 A current data from the dataset. PV inverter startup and irradiance mutation were labeled as normal states to improve the robustness of the network. The results are shown in Table 5. According to Table 5, the accuracy on both the training set and the test set decreased gradually as the current increased. A comparison of Figures 4 and 9 shows that the power spectrum values of the current data also increased with the current. Because the difference in power spectrum values between the high-frequency and low-frequency parts decreased in the arc fault state, the power spectra of the arc fault and normal states became more similar, which made arc fault detection somewhat harder. However, as Figure 9 shows, for both the original and the normalized power spectrum, the power spectrum values in the arc fault state were generally higher than those in the normal state, so ADNet could still detect arc faults accurately, as shown in Table 5. Across all three current levels, the overall training and test accuracies of ADNet were 99.96% and 98.81%, respectively, indicating that the method detects arc faults accurately.
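The section does not spell out how the power spectra were normalized. Assuming a simple min-max normalization followed by mapping to 8-bit gray levels (an assumption, as are the function names), the sketch below shows why normalization helps compare spectra recorded at different current levels: a spectrum that differs from another only by a constant scale factor becomes identical after normalization, so the classifier sees the spectral shape rather than its absolute magnitude.

```python
from typing import List, Sequence


def min_max_normalize(spectrum: Sequence[float]) -> List[float]:
    """Rescale power spectrum values linearly to [0, 1] (assumed normalization)."""
    lo, hi = min(spectrum), max(spectrum)
    if hi == lo:  # flat spectrum: avoid division by zero
        return [0.0] * len(spectrum)
    return [(v - lo) / (hi - lo) for v in spectrum]


def to_gray_levels(normalized: Sequence[float]) -> List[int]:
    """Map [0, 1] values to 8-bit gray levels 0..255."""
    return [round(255 * v) for v in normalized]


if __name__ == "__main__":
    low_current = [2.0, 4.0, 6.0]      # hypothetical spectrum values
    high_current = [20.0, 40.0, 60.0]  # same shape, ten times larger
    print(min_max_normalize(low_current), min_max_normalize(high_current))
```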

Detection Accuracy of Different Existing Neural Networks
Existing research has rarely used power spectrum images as the input data of neural networks. Therefore, to verify whether ADNet detects arc faults more accurately than existing neural network models, GoogLeNet and AlexNet models were built, trained, and tested on the same dataset as ADNet, and the accuracies of several existing arc fault detection networks were compared. The results are shown in Table 6. According to Table 6, the detection accuracies of GoogLeNet, AlexNet, the BP neural network, and DA-DCGAN are 96.23%, 96.83%, 95.23%, and 97.68%, respectively, whereas that of ADNet is 98.81%. ADNet therefore achieves the highest arc fault detection accuracy among the compared networks.
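Because the accuracies in Table 6 are all high, the differences can be easier to judge as error rates. Using only the values reported in Table 6, the sketch below converts each accuracy into a misclassification rate and computes the fraction of a baseline's errors that ADNet avoids; for DA-DCGAN, for instance, the error falls from 2.32% to 1.19%, roughly half. This treats the reported accuracies as directly comparable, which holds only to the extent that they were measured under comparable conditions.

```python
# Test accuracies (%) as listed in Table 6.
REPORTED_ACCURACY = {
    "GoogLeNet": 96.23,
    "AlexNet": 96.83,
    "BP neural network": 95.23,
    "DA-DCGAN": 97.68,
    "ADNet": 98.81,
}


def error_rate(accuracy_pct: float) -> float:
    """Misclassification rate in percent."""
    return 100.0 - accuracy_pct


def relative_error_reduction(baseline: str, proposed: str = "ADNet") -> float:
    """Fraction of the baseline's errors that the proposed model avoids."""
    e_base = error_rate(REPORTED_ACCURACY[baseline])
    e_prop = error_rate(REPORTED_ACCURACY[proposed])
    return (e_base - e_prop) / e_base


if __name__ == "__main__":
    for name in REPORTED_ACCURACY:
        if name != "ADNet":
            print(f"vs {name}: {relative_error_reduction(name):.1%} fewer errors")
```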

Feasibility Analysis of Application in the Embedded Modules
The ADNet model can be used in edge applications based on embedded processors or modules of arc fault detection equipment, such as the Raspberry Pi, for three reasons. (1) The AR model-based data preprocessing captures the arc features and removes insensitive parts of the power spectrum, which reduces the amount of input data. (2) ADNet is based on EfficientNet-B1, a commonly used lightweight convolutional neural network, and the attention mechanism combined with EfficientNet-B1 makes the algorithm concentrate on the arc features while ignoring the remaining information. Specifically, spatial attention locates the more sensitive parts of the input signal, while channel attention determines the more valuable channels or layers in the model [29]. The proposed method therefore stays lightweight while retaining high detection accuracy. (3) Owing to this lightweight design, the proposed ADNet model has only 6.58 × 10^6 parameters in total, fewer than other commonly used methods, while its detection accuracy is higher; Table 7 gives a detailed comparison. The more parameters a model has, the greater its computational load and the slower it runs [33]. We compared the number of parameters of ADNet with those of the GoogLeNet and AlexNet models built here and of several other commonly used networks. Table 7 lists the total number of parameters of each model. The totals for the commonly used convolutional neural networks GoogLeNet, AlexNet, Inception V3, Xception, and ResNet50 are 10.31 M, 14.59 M, 23.63 M, 22.86 M, and 23.48 M, respectively. These counts are large, leading to heavier computation and slower running speed.
However, the lightweight ADNet model has only 6.58 M parameters in total, fewer than any of the above networks. The results show that, among the compared models, the proposed method achieves the best detection accuracy with the smallest parameter count, owing to its lightweight design. The ADNet model is therefore suited to edge applications and could be implemented on embedded processors or modules such as the Raspberry Pi 3B, which has a quad-core 1.2 GHz CPU and 1 GB of RAM; such an implementation is left for future research.
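A rough sense of what these parameter counts mean on an embedded module comes from the storage needed for the weights alone. The sketch below uses the totals from Table 7 and assumes 32-bit floating-point weights (4 bytes each), so one million parameters occupy about 4 MB: ADNet's 6.58 M parameters need about 26.3 MB, compared with about 93.9 MB for ResNet50. Activations, the runtime, and the operating system need additional memory, so this is only a lower bound relative to the 1 GB of RAM on the Raspberry Pi 3B.

```python
# Total parameters in millions, as listed in Table 7.
TOTAL_PARAMS_M = {
    "ADNet": 6.58,
    "GoogLeNet": 10.31,
    "AlexNet": 14.59,
    "Inception V3": 23.63,
    "Xception": 22.86,
    "ResNet50": 23.48,
}

BYTES_PER_FLOAT32 = 4  # assumption: weights stored as 32-bit floats


def weight_storage_mb(params_millions: float,
                      bytes_per_param: int = BYTES_PER_FLOAT32) -> float:
    """Megabytes (10**6 bytes) needed just to hold the weights.

    10**6 parameters * bytes_per_param bytes = bytes_per_param MB per million parameters.
    """
    return params_millions * bytes_per_param


if __name__ == "__main__":
    for name, p in sorted(TOTAL_PARAMS_M.items(), key=lambda kv: kv[1]):
        print(f"{name:12s} {p:6.2f} M params -> {weight_storage_mb(p):6.2f} MB "
              f"({p / TOTAL_PARAMS_M['ADNet']:.2f}x ADNet)")
```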

Conclusions
In this paper, an experimental platform based on the UL1699B standard was established to collect DC current data efficiently and build a dataset. The power spectrum images of the current data clearly distinguish the normal state from the arc fault state and can therefore serve as the input of the arc fault detection algorithm. To avoid excessive consumption of computing resources as algorithm complexity grows, this paper proposed a method for detecting DC series arc faults in PV systems based on a lightweight convolutional neural network with fewer parameters and a low computational burden. The power spectrum images were normalized and converted into 240 × 240 gray images to form the dataset. Among the EfficientNet series models compared, EfficientNet-B1 was selected as the optimal network. The channel attention mechanism and the spatial attention mechanism were added to the deep layers of EfficientNet-B1 to construct the ADNet model, improving the network's detection accuracy and effectiveness. The method considers PV inverter startup and irradiance mutation, which enhances the robustness of the network. The results showed a training set accuracy of 99.96% and a test set accuracy of 98.81%, higher than those of GoogLeNet, AlexNet, and other commonly used networks. Therefore, this method can be used in PV systems to detect DC series arc faults accurately and reduce arc fire hazards, improving the safety of PV systems and allowing solar energy to be used more fully and reliably.

Conflicts of Interest:
The authors declare no conflict of interest.