Enhanced Distributed Parallel Firefly Algorithm Based on the Taguchi Method for Transformer Fault Diagnosis
Zhi-Jun

Abstract: To improve the reliability and accuracy of a transformer fault diagnosis model based on a backpropagation (BP) neural network, this study proposed an enhanced distributed parallel firefly algorithm based on the Taguchi method (EDPFA). First, a distributed parallel firefly algorithm (DPFA) was implemented and then the Taguchi method was used to enhance the original communication strategies in the DPFA. Second, to verify the performance of the EDPFA, this study compared the EDPFA with the firefly algorithm (FA) and DPFA under the test suite of Congress on Evolutionary Computation 2013 (CEC2013). Finally, the proposed EDPFA was applied to a transformer fault diagnosis model by training the initial parameters of the BP neural network. The experimental results showed that: (1) The Taguchi method effectively enhanced the performance of the EDPFA. Compared with the FA and DPFA, the proposed EDPFA had a faster convergence speed and better solution quality. (2) The proposed EDPFA improved the accuracy of transformer fault diagnosis based on the BP neural network (by up to 11.11%).


Introduction
Since swarm intelligence optimization algorithms were proposed, they have been accepted by more and more non-computer researchers due to their efficient optimization performance, especially because they do not need special information about the problems to be optimized [1]. Their application fields have rapidly expanded to scientific computing [2], workshop scheduling optimization [3], transportation configuration [4], combination problems [5], digital image processing [6], engineering optimization design [7] and other fields. They have become an indispensable part of artificial intelligence and computer science. However, compared with traditional optimization algorithms, the development history of swarm intelligence optimization algorithms is still relatively short and there are many imperfections. In particular, the lack of a mathematical foundation has always been a hindrance to their development [8]. Therefore, there are still many problems to be explored and solved in this field.
The Taguchi method is a robust industrial design method that is used to evaluate and implement improvements in products, processes and equipment [9]. It is an experimental design method that focuses on minimizing process variability or making products less sensitive to environmental variability [10]. The genetic algorithm (GA) is a well-known optimization algorithm [11]. It has good global search ability and can quickly explore the solution space, but its local search ability is poor and its search efficiency drops in the late stage of evolution [12]. Chou and his associates combined the Taguchi method with the GA, which improved the quality of the resulting solutions.
The main contributions of this paper are as follows:
1. The distributed parallel firefly algorithm (DPFA) was implemented and then a new enhanced distributed parallel firefly algorithm (EDPFA) based on the Taguchi method was proposed.
2. The Taguchi method selected the better dimensions of different solutions to obtain a new solution, which was used as a new communication strategy for the EDPFA.
3. The proposed EDPFA was tested using the CEC2013 suite and performed better than the standard FA and DPFA.
4. The proposed EDPFA was used to train the parameters of the BP neural network and improve the accuracy of the transformer fault diagnosis model based on the BP neural network.
The rest of the paper is structured as follows. Section 2 describes the original DPFA and the Taguchi method. Section 3 introduces the Taguchi method into the original DPFA and analyses the details of algorithm improvements. Section 4 focuses on testing the proposed EDPFA under the CEC2013 suite and compares it with other algorithms. Section 5 implements the proposed EDPFA in the field of transformer fault diagnosis. Section 6 sums up this paper.

Distributed Parallel Firefly Algorithm and Taguchi Method
This section provides a brief introduction to the original DPFA and Taguchi method.

Distributed Parallel Firefly Algorithm
The distributed parallel firefly algorithm (DPFA) was proposed by Pan and his associates in 2021 [24]. The DPFA is an updated version of the firefly algorithm (FA) proposed in 2007 [15]. The core idea of the DPFA is that the initial solutions are divided into several subgroups, which share information based on different communication strategies after a fixed number of iterations.

The Mathematical Form of the DPFA
The search process of FA relates to two significant concepts: attractiveness and brightness. The attractiveness exists between two fireflies and indicates the position movement relationship between fireflies. The brightness is an individual characteristic of fireflies and is proportional to the fitness function. The standard FA satisfies the following three characteristics [15]: (1) Suppose that all fireflies can attract each other. (2) Fireflies' attractiveness is only related to distance and brightness. A firefly with a strong brightness will attract a firefly with a weak brightness. (3) The fitness function determines the brightness.
The mathematical form of the DPFA is as follows:

$\beta(r) = \beta_0 e^{-\gamma r^2}$  (1)

In Formula (1), $\beta(r)$ represents the attractiveness between two fireflies and $\beta_0$ represents the maximum degree of attractiveness (at $r = 0$). Because the brightness gradually weakens with increasing distance and with absorption by the medium, the light absorption coefficient $\gamma$ can be set as a constant to reflect these characteristics.

$r_{ij} = \lVert x_{i,g} - x_{j,g} \rVert = \sqrt{\sum_{k=1}^{D} (x_{i,g,k} - x_{j,g,k})^2}$  (2)

In Formula (2), $r_{ij}$ is the Cartesian distance between two fireflies, $x_{i,g}$ is the $i$th firefly in the $g$th group and $x_{i,g,k}$ is the $k$th component of the spatial coordinate of firefly $x_{i,g}$.

$x_{i,g}(t+1) = x_{i,g}(t) + \beta_0 e^{-\gamma r_{ij}^2} \big(x_{j,g}(t) - x_{i,g}(t)\big) + \alpha\,\varepsilon_i$  (3)

In Formula (3), the fitness value $f(x_{i,g})$ determines the brightness of firefly $x_{i,g}$, $t$ represents the current iteration, $\alpha$ is the randomization parameter and $\varepsilon_i$ is a random vector. $i = 1, 2, 3, \ldots, N_g$; $j = 1, 2, 3, \ldots, N_g$, where $N_g$ represents the number of fireflies in group $g$.
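The update rule in Formulas (1)-(3) can be sketched as follows. The default parameter values mirror the settings reported in Section 4 (α = 0.25, β₀ = 0.2, γ = 1), and the uniform random perturbation is one common choice for the FA's stochastic term rather than a detail taken from the paper:

```python
import numpy as np

def move_firefly(x_i, x_j, beta0=0.2, gamma=1.0, alpha=0.25, rng=None):
    """Move firefly i toward a brighter firefly j (Formulas (1)-(3))."""
    rng = np.random.default_rng() if rng is None else rng
    r = np.linalg.norm(x_i - x_j)                  # Formula (2): Cartesian distance
    beta = beta0 * np.exp(-gamma * r ** 2)         # Formula (1): attractiveness
    eps = rng.uniform(-0.5, 0.5, size=x_i.shape)   # random perturbation (assumed form)
    return x_i + beta * (x_j - x_i) + alpha * eps  # Formula (3): position update

# A dim firefly at the origin drifts toward a brighter one at (1,1,1,1);
# alpha=0 isolates the deterministic attraction term.
x_new = move_firefly(np.zeros(4), np.ones(4), alpha=0.0)
```

With the stochastic term switched off, the new position is exactly the attraction step $\beta_0 e^{-\gamma r^2}(x_j - x_i)$.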

Communication Strategies
In the DPFA, when t = nR (n = 1, 2, 3, . . .), the subgroups trigger communication strategies; t and R represent the current iteration and the fixed communication interval, respectively. The DPFA has four communication strategies, namely, the maximum of the same subgroup, the average of the same subgroup, the maximum of different subgroups and the average of different subgroups. The core idea of the communication strategies is to select some better solutions to replace the poorer ones in the subgroups; different communication strategies have different ways of selecting the better solutions. Take the maximum of the same subgroup as an example: in strategy 1, when t = nR (n = 1, 2, 3, . . .), the brightest firefly x_max,g(t) in a group replaces the darkest k fireflies in the same group. Figure 1 shows strategy 1.
The other three communication strategies are as follows; more details of the DPFA are described in the literature [24].
Strategy 2: The average of the same subgroup: the darkest fireflies in the $g$th group are replaced by the average position $x_{avg,g}(t) = \frac{1}{k}\sum_{i=1}^{k} x_{i,g}(t)$, where $x_{1,g}(t), \ldots, x_{k,g}(t)$ represent the $k$ brightest fireflies' positions in the $g$th group.
Strategy 3: The maximum of different subgroups: the darkest fireflies are replaced by the brightest firefly over all groups, where $x_1(t), x_2(t), \ldots, x_N(t)$ represent all fireflies' positions in all groups.
Strategy 4: The average of different subgroups: the darkest fireflies are replaced by the average position $x_{avg}(t) = \frac{1}{G}\sum_{g=1}^{G} x_{max,g}(t)$, where $x_{max,1}(t), x_{max,2}(t), x_{max,3}(t), \ldots, x_{max,G}(t)$ represent the brightest fireflies' positions in all groups.
For more detail on the DPFA, please refer to [24]. Algorithm 1 shows the pseudocode of the DPFA.

Algorithm 1: The pseudocode of the DPFA
Initialize the N fireflies and divide them evenly into G groups.
1: while t < MaxGeneration do
2:   for g = 1:G do
3:     Calculate the light intensity I_{i,g} at x_{i,g} using f(x_{i,g}) and rank the fireflies
4:     for i = 1:N/G do
5:       for j = 1:i do
6:         if (I_{j,g} > I_{i,g})
7:           Move firefly i toward j in the gth subgroup in all D dimensions by using Equation (3)
8:   if t = nR then apply one of communication strategies 1-4
Output: the global best firefly x_gbest and the value of f(x_gbest).
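Strategy 1 can be sketched as a simple array operation. The function name and the convention that a larger brightness value is better are illustrative choices, not taken from [24]:

```python
import numpy as np

def strategy1_max_same_subgroup(group, brightness, k):
    """Within one subgroup, copy the brightest firefly's position over
    the k darkest fireflies (larger brightness = better here)."""
    group = group.copy()                  # do not modify the caller's array
    best = group[np.argmax(brightness)]
    darkest = np.argsort(brightness)[:k]  # indices of the k darkest fireflies
    group[darkest] = best
    return group

# Four fireflies in one subgroup; the two darkest are overwritten by
# the position of the brightest one.
group = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
brightness = np.array([0.1, 0.9, 0.3, 0.2])
updated = strategy1_max_same_subgroup(group, brightness, k=2)
```

The other three strategies differ only in which position is copied (a group average, the global maximum, or the average of the group maxima).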

The Taguchi Method
The Taguchi method includes two major tools: (1) orthogonal arrays and (2) the signalto-noise ratio (SNR) [10]. In the following, the concepts of these two tools are reviewed.
An array is said to be orthogonal if it satisfies two conditions: (1) each column represents a different level value of a considered factor and these considered factors can be evaluated independently, and (2) each row represents a set of parameters for an experiment. The orthogonal array can be described as $L_M(Q^K)$, where K represents the number of columns (factors) and is a positive integer, Q represents the number of level values of a considered factor and is also a positive integer, and M represents the number of experiments, where M = K × (Q − 1) + 1. For instance, suppose that there are three sets of solutions with four parameters in an experiment. This means that each of the four factors can be at three levels. Then, Table 1 shows the orthogonal array $L_9(3^4)$. In the absence of the orthogonal array, if one wishes to find the optimal combination of parameters, the total number of experiments is 3^4 = 81. However, the orthogonal array provides a set of just nine experiments. The orthogonal array proposed by [12] can effectively reduce the number of experiments needed to obtain the optimal combination of parameters.

Table 1. The orthogonal array L9(3^4).

Number of Experiments | Considered Factors
                      |  A  B  C  D
1                     |  1  1  1  1
2                     |  1  2  2  2
3                     |  1  3  3  3
4                     |  2  1  2  3
5                     |  2  2  3  1
6                     |  2  3  1  2
7                     |  3  1  3  2
8                     |  3  2  1  3
9                     |  3  3  2  1

Energies 2022, 15, 3017
The SNR tool is used to find the parameters' optimal combination from all the combinations listed. To be more specific, the SNR is used to determine the appropriate level for each factor. The SNR can be calculated in various ways. For optimization problems, the value of the objective function can generally be regarded as the SNR.
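The two tools can be combined in a short sketch: the L9(3^4) array from Table 1 is checked for level balance, and then a level is chosen per factor by comparing mean SNRs. Treating a smaller SNR as better matches a minimization objective; the function name is illustrative:

```python
import numpy as np

# The L9(3^4) orthogonal array from Table 1 (levels coded 1..3): nine
# experiments cover four factors at three levels instead of 3**4 = 81 runs.
L9 = np.array([
    [1, 1, 1, 1], [1, 2, 2, 2], [1, 3, 3, 3],
    [2, 1, 2, 3], [2, 2, 3, 1], [2, 3, 1, 2],
    [3, 1, 3, 2], [3, 2, 1, 3], [3, 3, 2, 1],
])

def best_levels(array, snr):
    """For every factor (column), pick the level whose experiments have
    the best mean SNR; smaller is treated as better (minimization)."""
    picks = []
    for col in range(array.shape[1]):
        means = [snr[array[:, col] == lv].mean() for lv in (1, 2, 3)]
        picks.append(int(np.argmin(means)) + 1)
    return picks

# Structural sanity checks: M = K * (Q - 1) + 1 and level balance.
M, K, Q = L9.shape[0], L9.shape[1], 3
assert M == K * (Q - 1) + 1
assert all((L9[:, c] == lv).sum() == 3 for c in range(K) for lv in (1, 2, 3))
```

Because every pair of columns is balanced, the contributions of the other factors average out, so the per-factor level comparison is meaningful even though only nine of the 81 combinations were run.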

Enhanced DPFA and Communication Strategy
In the original DPFA, the four communication strategies improve the algorithm through the group optimal solution or the global optimal solution, which has a great influence on the performance of the algorithm [24]. However, these strategies ignore the influence of various dimensions (parameters) in the optimal solution. Therefore, this study extracted all the dimensions (parameters) in the optimal solution and then used the Taguchi method to recombine the dimensions (parameters) to obtain a better solution.

Operation Strategy of the Taguchi Method
The operation strategy of the Taguchi method is described as follows:
Step 1: Choose k sets of solutions, denoted $x_{1,g,d}, x_{2,g,d}, \ldots, x_{k,g,d}$, where g represents the gth group and d represents the dth dimension of the solution space (d = 1, 2, 3, . . . , D). D represents the total number of dimensions of the solution space.
Step 2: Each dimension of candidate solutions corresponds to a factor (the number of factors is D). The different values of candidate solutions denote different level values (the number of level values is k). The value of the objective function corresponding to each candidate solution is used as an SNR to judge whether the solution is good or bad. Next, it can combine these dimensions into a better solution (x better ) using the Taguchi method.
Step 3: The better solution (x better ) replaces the worst solution in the original groups.
To facilitate the reader's understanding, the following example is given.
Given the objective function $f(x) = x_1^2 + x_2^2 + x_3^2 + x_4^2$, minimize it. Assume three solutions: the Taguchi method was used to combine these three solutions into a better solution, and Table 2 shows the results of the solution combinations. According to Table 2, the best combination is x_better = [2, 0, 0, 0], with f(x_better) = 4.
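A minimal sketch of this example follows. The paper's three candidate solutions did not survive extraction, so the candidates below are hypothetical, chosen so that the combination reproduces the stated result x_better = [2, 0, 0, 0] with f(x_better) = 4. Because f is separable, the Taguchi combination reduces here to keeping the best value per dimension:

```python
def f(x):
    """The example objective f(x) = x1^2 + x2^2 + x3^2 + x4^2."""
    return sum(v ** 2 for v in x)

# Hypothetical candidate solutions (the originals are not legible in
# the source); picked so the combination yields x_better = [2, 0, 0, 0].
candidates = [
    [2, 3, 1, 4],
    [5, 0, 2, 1],
    [3, 1, 0, 0],
]

def taguchi_combine(cands, dim_cost=lambda v: v ** 2):
    """For a separable objective, the Taguchi combination amounts to
    keeping, in each dimension, the candidate value with the smallest
    per-dimension cost (a simplification of the full orthogonal-array
    procedure)."""
    D = len(cands[0])
    return [min((c[d] for c in cands), key=dim_cost) for d in range(D)]

x_better = taguchi_combine(candidates)
```

For non-separable objectives, the orthogonal-array experiments and SNR comparison of Section 2 replace the per-dimension minimum.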


New Communication Strategies
In the original DPFA, the communication strategies are divided into two ways: intragroup information exchange (strategies 1 and 2) and inter-group information exchange (strategies 3 and 4). If the parameters of the solutions are independent, it is easier to obtain better results with the former. If the parameters of the solutions are loosely correlated, it is easier to obtain better results with the latter [38]. To improve the efficiency of information exchange, the Taguchi method is used to enhance the original communication strategies.

New Strategy 1
New strategy 1 follows the three steps in Section 3.1. The candidate solutions are the best k solutions in the group. New strategy 1 is an enhanced version of strategies 1 and 2 in the original DPFA. Figure 2 shows the new communication strategy 1.

New Strategy 2
New strategy 2 also follows the three steps in Section 3.1. The candidate solutions are the best solution in each group. New strategy 2 is an enhanced version of strategies 3 and 4 in the original DPFA. Figure 3 shows the new communication strategy 2.


The Pseudocode of the EDPFA
In the EDPFA, all initial solutions are divided into G subgroups. After a fixed number of iterations, these subgroups use new communication strategy 1 or 2 to achieve the benefits of intra-group and inter-group collaboration. Algorithm 2 shows the pseudocode of the EDPFA.
Algorithm 2: The pseudocode of the EDPFA
Objective function f(x), x = (x_1, x_2, . . . , x_D);
Initialize a population of N fireflies x_i (i = 1, . . . , N); set the number of groups G.
while t < MaxGeneration do
  for g = 1:G do
    Calculate the light intensity I_{i,g} using f(x_{i,g}) and rank the fireflies.
    for i = 1:N/G do
      for j = 1:i do
        if (I_{j,g} > I_{i,g})
          Move firefly i toward j in the gth subgroup in all D dimensions by using Equation (3)
  if t = nR then apply new communication strategy 1 or 2
Output: the global best firefly x_gbest and the value of f(x_gbest).

Test Functions and Parameter Settings
This study chose the CEC2013 suite to test the proposed EDPFA. The CEC2013 suite included unimodal functions (f_1 ~ f_5), multimodal functions (f_6 ~ f_20) and composite functions (f_21 ~ f_28), and their dimensions were set to 30. The search range was set to [−100, 100]. More details of CEC2013 are presented in [39,40].
This study compared the proposed EDPFA with the FA and DPFA to test the performance of the algorithms. To ensure the fairness of the experiment, the 28 test functions were evaluated with 51 runs and 500 iterations. Because the operation of the Taguchi method calls the test functions, the population size of the EDPFA was set to 94, while the population size of the FA and DPFA was set to 100; in the experimental comparison, the number of function calls was therefore the same for all algorithms. In addition, the three algorithms maintained consistent parameter settings (α = 0.25, β = 0.2, γ = 1, G = 4). The programming was based on MATLAB 2019a, and all the simulations were performed on a laptop with an AMD Ryzen 7 2.90 GHz CPU and 16 GB RAM.

Table 3 shows the performance comparison results of the FA, DPFA and EDPFA in terms of the "Mean" of 51 runs; the smaller the "Mean", the better the final result. The experimental results of the FA and DPFA on each test function were compared with the EDPFA. The symbol (=) indicates that the performance of the two algorithms was similar, the symbol (>) indicates that the EDPFA performed better and the symbol (<) indicates that the EDPFA performed worse. The last row of Table 3 counts the results over all benchmark functions.

As shown in Table 3, compared with the FA, the proposed EDPFA had 22 better results, 2 similar results and 2 worse results on the 28 test functions. This shows that the EDPFA had competitive search ability and solution accuracy. Compared with the DPFA, the proposed EDPFA had 19 better results, 1 similar result and 8 worse results on all test functions. This shows that the EDPFA was stronger than the DPFA in performance and that the DPFA was enhanced by the Taguchi method. However, regarding the results for test functions f_1 ~ f_5, which are the unimodal functions, the proposed EDPFA was not as good as the DPFA. The comparison results showed that the EDPFA was not well suited to solving unimodal functions.

Comparison with the Original FA and DPFA
Next, to further evaluate the performances of the algorithms, the convergence curves of the FA, DPFA and EDPFA were compared. Each curve represents the convergence of the median value of the total 51 runs by a given algorithm, and some of them are presented in Figure 4. Table 4 summarizes the convergence figures under IEEE CEC 2013 for the 30D optimization. As shown in Figure 4, the proposed EDPFA obtained a better convergence speed on some test functions.

Comparison with Other Algorithms
This section compares the performance of the EDPFA with some famous algorithms. All settings of the EDPFA were the same as in Sections 4.1 and 4.2. Table 5 shows the performance comparison results of particle swarm optimization (PSO) [41], parallel particle swarm optimization (PPSO) [42], the genetic algorithm (GA) [11], the multi-verse optimizer (MVO) [43], the whale optimization algorithm (WOA) [44] and the ant lion optimizer (ALO) [45] in terms of the "Mean" of 51 runs. According to the data in Table 5, it is obvious that the proposed EDPFA performed better under the CEC2013 test suite. Compared with PSO, PPSO, the GA, the MVO, the WOA and the ALO, the proposed EDPFA achieved 24, 23, 24, 18, 26 and 21 better results, respectively.


Application for Transformer Fault Diagnosis
In machine learning, the backpropagation (BP) neural network has a strong ability to fit nonlinear systems. It is very suitable for solving prediction and classification problems [46]. Transformer fault diagnosis is essentially a fault classification problem. Therefore, introducing a BP neural network into the field of transformer fault diagnosis has been a research hotspot [47][48][49][50]. As described in this section, the proposed EDPFA was used to train the initial parameters of a BP neural network to improve the performance of the transformer fault diagnosis model based on a BP neural network.

Structure of Transformer Fault Diagnosis Model Based on a BP Neural Network
The steps to establish the transformer fault diagnosis model based on a BP neural network were as follows: Step 1: First, the characteristic gas content of transformers and the corresponding fault were composed into a data set.
Step 2: Then, 80% of the samples in the data set were used to train the BP neural network model. The other 20% of samples in the data set were used to test the trained BP neural network model.
Step 3: Finally, the transformer fault classification accuracy of the test set was counted to judge the performance of the model.
The transformer fault diagnosis data for dissolved gas in oil mainly include five fault gases (H 2 , CH 4 , C 2 H 2 , C 2 H 4 , C 2 H 6 ) and their corresponding six fault types (normal state, NS; low-energy discharge, LED; arc discharge, AD; middle-and-low-temperature overheating, MLTO; high-temperature overheating, HTO; partial discharge, PD). Figure 5 shows the transformer fault diagnosis model based on BP neural network.
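A minimal sketch of how one training sample could be encoded: five gas concentrations as inputs and a six-element one-hot target for the BP network's six output nodes. The class ordering and the gas values below are illustrative assumptions, not the codes from Table 7:

```python
FAULT_TYPES = ["NS", "LED", "AD", "MLTO", "HTO", "PD"]  # ordering is illustrative

def one_hot(fault):
    """Encode a fault label as the six-element target vector for a BP
    network with six output nodes."""
    vec = [0.0] * len(FAULT_TYPES)
    vec[FAULT_TYPES.index(fault)] = 1.0
    return vec

# One hypothetical sample: H2, CH4, C2H2, C2H4, C2H6 contents (made-up
# values) paired with its fault-type target.
sample = {"features": [14.7, 3.8, 0.0, 10.5, 2.7], "target": one_hot("HTO")}
```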


Structure of Transformer Fault Diagnosis Model Based on EDPFA-BP Neural Network
Even though the fitting ability of a traditional BP neural network is very strong, it still has some inherent defects, including low accuracy and slow convergence, which can no longer meet the requirements of a power system regarding transformer reliability [33]. The main reason is that all the thresholds and weights are randomly generated before the training of a BP neural network. These unoptimized initial values often lead to slow convergence and low accuracy of fault diagnosis results. Therefore, this study adopted the EDPFA to optimize the initial value of the BP neural network to improve the performance of the model. Figure 6 shows the transformer fault diagnosis model based on the EDPFA-BP neural network.
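A sketch of what the EDPFA optimizes in this model: each firefly is a flat vector holding all the initial weights and thresholds of the 5-12-6 network, and its fitness is the network's training error. The packing order and the use of mean squared error are assumptions for illustration:

```python
import numpy as np

N_IN, N_HID, N_OUT = 5, 12, 6  # topology from Section 5: 5 gases -> 12 hidden -> 6 faults
DIM = N_IN * N_HID + N_HID + N_HID * N_OUT + N_OUT  # all weights + thresholds

def unpack(theta):
    """Split one flat candidate solution (a 'firefly') into the BP
    network's weight matrices and threshold vectors."""
    i = 0
    W1 = theta[i:i + N_IN * N_HID].reshape(N_IN, N_HID); i += N_IN * N_HID
    b1 = theta[i:i + N_HID]; i += N_HID
    W2 = theta[i:i + N_HID * N_OUT].reshape(N_HID, N_OUT); i += N_HID * N_OUT
    b2 = theta[i:i + N_OUT]
    return W1, b1, W2, b2

def fitness(theta, X, Y):
    """Objective for the EDPFA to minimize: the mean squared training
    error of the network initialized from theta."""
    W1, b1, W2, b2 = unpack(theta)
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    out = sig(sig(X @ W1 + b1) @ W2 + b2)
    return float(np.mean((out - Y) ** 2))
```

The best firefly found by the EDPFA then seeds the BP network before ordinary backpropagation training begins.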


The Data Collection and Pretreatment
In this study, there were 465 sets of transformer fault data (including labels and features), some of which are shown in Table 6. Table 7 shows the codes of the transformer fault types. Figure 7 shows the sample distribution of the transformer fault types, in which the HTO faults had the highest number and the PD faults had the lowest number. To verify the model, 80% of the data of each fault type was randomly selected as the training set and 20% as the test set. In total, there were 375 sets of training data and 90 sets of testing data.
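The per-fault-type 80/20 split can be sketched as follows; the seed and the rounding of the 80% cut per class are illustrative choices:

```python
import random

def stratified_split(samples, labels, train_frac=0.8, seed=0):
    """Randomly take train_frac of each fault type for training and the
    rest for testing, as described for the 465-sample data set."""
    rng = random.Random(seed)
    by_label = {}
    for s, y in zip(samples, labels):
        by_label.setdefault(y, []).append(s)
    train, test = [], []
    for y, group in by_label.items():
        rng.shuffle(group)
        cut = round(train_frac * len(group))  # 80% cut per fault type
        train += [(s, y) for s in group[:cut]]
        test += [(s, y) for s in group[cut:]]
    return train, test
```

Splitting per fault type (rather than over the whole set) keeps rare classes such as PD represented in both the training and test sets.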


The Parameter Setting of a BP Neural Network
A BP neural network is a kind of mathematical model that can simulate complex nonlinear relations and automatically modify its parameters. In a BP neural network, there are input layers, hidden layers and output layers. The signal first travels through the input layer, then to the hidden layer and finally to the output layer. In this process, the relevant information is processed by regulating the internal relations between the many nodes. Figure 8 shows the topology of the BP neural network adopted in this study. The number of input nodes was 5 (five fault gases), the number of hidden-layer nodes was 12 and the number of output nodes was 6 (six fault types). In addition, after many experimental trials, this study set the maximum number of iterations and the learning precision goal of the BP neural network to 1000 and 0.0001, respectively. The activation function was a sigmoid function, and the BP neural network introduced error backpropagation into the multilayer network.
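A minimal BP training loop matching the stated settings (sigmoid activations, at most 1000 iterations, error goal 0.0001) might look as follows; the learning rate and weight initialization are assumptions, not values from the paper:

```python
import numpy as np

def train_bp(X, Y, n_hidden=12, lr=0.5, max_iter=1000, goal=1e-4, seed=0):
    """Gradient-descent BP training with the stopping settings from the
    text (at most 1000 iterations, error goal 0.0001)."""
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], Y.shape[1]
    W1 = rng.normal(0.0, 0.5, (n_in, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, (n_hidden, n_out)); b2 = np.zeros(n_out)
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    for _ in range(max_iter):
        H = sig(X @ W1 + b1)            # input layer -> hidden layer
        O = sig(H @ W2 + b2)            # hidden layer -> output layer
        err = O - Y
        mse = float(np.mean(err ** 2))
        if mse < goal:                  # learning precision goal reached
            break
        dO = err * O * (1 - O)          # backpropagate through the output sigmoid
        dH = (dO @ W2.T) * H * (1 - H)  # ...and through the hidden sigmoid
        W2 -= lr * H.T @ dO / len(X); b2 -= lr * dO.mean(axis=0)
        W1 -= lr * X.T @ dH / len(X); b1 -= lr * dH.mean(axis=0)
    return (W1, b1, W2, b2), mse
```

In the EDPFA-BP model, the random initialization above is replaced by the best initial weights and thresholds found by the EDPFA.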

Experiment Results and Analysis
To ensure the objectivity of the experiment process, all parameters in each transformer fault diagnosis model were the same, and the parameters used for the EDPFA were consistent with Section 4. Figure 9 shows the diagnosis results of four models (the BP neural network, FA-BP neural network, DPFA-BP neural network and EDPFA-BP neural network). In Figure 9, the ordinate represents the six transformer fault types and the abscissa represents the 465 sets of transformer faults. The "○" in Figure 9 represents a predicted fault type and the "✱" represents an actual fault type. If the "○" and the "✱" overlap, the transformer fault was correctly predicted; otherwise, the prediction was wrong. To make the results more intuitive, one marker highlights samples that the improved BP neural network identified correctly while the original BP neural network misidentified, and a second marker indicates the opposite case. Table 8 shows the diagnosis accuracy of each model.
" overlap, this transformer fault was correctly predicted; otherwise, the prediction was wrong. To make the results more intuitive, " " represents the improved BP neural network identifies correctly, while the original BP neural network made an incorrect identification. " " is the opposite. Table 8 shows the diagnosis accuracy of each model. As shown in Figure 9, compared with other models based on the improved BP neural network (b-d), there were more " " faults in the unimproved BP model (a). This shows that the transformer fault diagnosis models based on the BP neural network had poor fault classification ability. Furthermore, it is obvious that, compared with other neural networks, the EDPFA-BP neural network had better performance regarding fault 4 (middleand-low-temperature overheating), where it identified fault 4 more often. From Table 8, the fault classification accuracy of the BP-EDPFA neural network was the highest (up to 84.44%). Compared with the other models, the accuracy of EDPFA-BP neural network was higher by 11.11%, 6.66% and 3.34%. The recall and precision of each model are shown in Tables 9 and 10. As shown in Table 9, the EDPFA-BP neural network had the highest recall rate for six fault types. Especially regarding the PD fault, its recall rate reached 100%. From Table 10, the precision of the BP neural network was the lowest, and the precision of the EDPFA-BP neural network was the highest. This indicated that the EDPFA-BP neural network had a better classification effect and fewer fault classification errors. From the above three aspects, it can be concluded that the proposed EDPFA could better optimize the initial parameters of the BP neural network and manage the transformer fault diagnosis model based on a BP neural network.   As shown in Figure 9, compared with other models based on the improved BP neural network (b-d), there were more "○" faults in the unimproved BP model (a). 
This shows that the transformer fault diagnosis models based on the BP neural network had poor fault classification ability. Furthermore, it is obvious that, compared with other neural networks, the EDPFA-BP neural network had better performance regarding fault 4 (middleand-low-temperature overheating), where it identified fault 4 more often. From Table 8, the fault classification accuracy of the BP-EDPFA neural network was the highest (up to 84.44%). Compared with the other models, the accuracy of EDPFA-BP neural network was higher by 11.11%, 6.66% and 3.34%. The recall and precision of each model are shown in Tables 9 and 10. As shown in Table 9, the EDPFA-BP neural network had the highest recall rate for six fault types. Especially regarding the PD fault, its recall rate reached 100%. From Table 10, the precision of the BP neural network was the lowest, and the precision of the
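The three measures used above (accuracy, per-fault recall and per-fault precision) can be computed with a short helper; the function name is illustrative:

```python
def diagnosis_metrics(actual, predicted, classes):
    """Accuracy plus per-fault-type recall and precision, the three
    measures used to compare the diagnosis models."""
    acc = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
    recall, precision = {}, {}
    for c in classes:
        tp = sum(a == c and p == c for a, p in zip(actual, predicted))
        fn = sum(a == c and p != c for a, p in zip(actual, predicted))
        fp = sum(a != c and p == c for a, p in zip(actual, predicted))
        recall[c] = tp / (tp + fn) if tp + fn else 0.0
        precision[c] = tp / (tp + fp) if tp + fp else 0.0
    return acc, recall, precision
```

Recall answers "of the actual faults of type c, how many were found", while precision answers "of the predictions of type c, how many were right", which is why both are reported alongside overall accuracy.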

Conclusions
An enhanced distributed parallel firefly algorithm (EDPFA) based on the Taguchi method was proposed and applied to transformer fault diagnosis. The Taguchi method was used to improve the effectiveness of the original communication strategies in the DPFA, which enhanced the influence of the various dimensions (parameters) of the optimal solution. On the test functions, the implemented EDPFA achieved faster convergence and found better solutions: compared with the FA and DPFA, the EDPFA had 24 and 19 better results, respectively. Quickly diagnosing and predicting existing or latent transformer faults is important for the safety and stability of a power system. The proposed EDPFA was used to train the BP neural network to implement diagnoses, and the experimental results showed that the proposed EDPFA could effectively improve the accuracy of the transformer fault diagnosis model based on a BP neural network (by up to 11.11%). However, the EDPFA has not been fully studied and there is still much room for optimization, especially regarding unimodal optimization problems.