Electrostatic Field for Positive Lightning Impulse Breakdown Voltage in Sphere-to-Plane Air Gaps Using Machine Learning

Abstract: Breakdown (BD) voltage is significant in high-voltage electric power machines. Currently, BD voltages are mainly predicted by a semi-empirical formula in strongly inhomogeneous electric fields. However, this equation cannot be applied to electrodes with weakly inhomogeneous electric fields. In this paper, positive lightning impulse BD voltages are predicted in various sphere-to-plane air gaps using three machine learning methods: support vector regression (SVR), Bayesian regression (BR) and multilayer perceptron (MLP). Unlike previous studies, a method is also proposed that introduces streamer propagation characteristics as new features and removes electric field gradients as unnecessary features, in order to reduce the feature dimension. The streamer propagation characteristics are suggested to reflect the possibility of a discharge process between electrodes. Predicted voltages from the machine learning algorithms are compared with the experimental results and with voltages calculated from the semi-empirical formula. Firstly, the predictions from each model agreed well with the datasets. The new features were found to be applicable to machine learning algorithms and to be as important as the known electrostatic features before discharge. Secondly, predicted BD voltages were more accurate than voltages calculated from the semi-empirical equation in strongly inhomogeneous electric fields. Predictions from each model also agreed well with the experimental results in weakly inhomogeneous electric fields. The prediction accuracy of SVR was better than those of BR and MLP. Machine learning algorithms were also shown to be applicable to electrodes with a wide range of inhomogeneities, unlike the semi-empirical method. We expect that the suggested features and machine learning algorithms can be used for accurately calculating BD voltages.


Introduction
The dielectric design of electric equipment requires predicting the breakdown voltage. The electric field distribution of an air gap influences its dielectric strength [1]. In strongly inhomogeneous field distributions (radius ≪ gap distance), such as needle-to-plane gaps, the dielectric strength of air depends mainly on either the maximum electric field or the streamer propagation characteristics. However, electrical breakdown becomes complex in weakly inhomogeneous field distributions (radius ≤ gap distance).
Many studies have sought to predict electrical breakdown by using a particle-in-cell method, a fluid model, an air discharge mechanism and machine learning. The particle-in-cell method is applied to gaps of less than tens of µm at pd values lower than Paschen's minimum to predict minimum ignition voltages in circuit breakers. The fluid model also shows that it is possible to model the electrical breakdown of air at pd values higher than Paschen's minimum [2,3]. Various methods have been suggested to predict the breakdown voltage of air gaps with a typical electrode [4][5][6][7][8][9]. However, more research is required to resolve several concerns [10]. Recently, the streamer process during discharge has been studied to calculate the breakdown voltage. This process involves inception and propagation, which are determined by the ratio between the positive electrode shape and the gap distance in inhomogeneous electric fields [11]. This method is still imperfect and can mainly be used to calculate BD voltages in strongly inhomogeneous electric fields.
Neural networks have been applied to predict the electrical breakdown voltages of transformer oil or to evaluate partial discharge in transformer oil [12][13][14]. A neural network is mainly used to analyze precisely the performance of a complex system that involves various materials, such as pure oils, water contents, impurities and solid insulators. A support vector machine (SVM) has been used to predict the BD voltages of air for various electrodes, such as rod-to-plane and sphere-to-sphere [15][16][17]. A genetic algorithm (GA) and a least squares algorithm have been applied for feature selection [18,19]. Previous studies consider electrostatic field features that characterize electrode shapes before discharge. However, such inputs cannot represent the discharge process after ignition. Some of these features may also not be useful. Thus, in this paper, a method is suggested that introduces streamer propagation characteristics as new features and removes electric field gradients as unnecessary features. Unlike former studies, electrostatic field features are calculated based on the electric field that exceeds 90% of the maximum electric field. Machine learning algorithms are also investigated for application to inhomogeneous electric fields with various nonuniformities.
Electric fields change during the discharge. Complete breakdown takes place when the electric fields stay sufficiently large during the discharge. Thus, streamer propagation characteristics are considered to represent the possibility of a sustainable discharge process between electrodes. If electrostatic fields before discharge and streamer propagation characteristics are considered at the same time, it may be possible to predict the BD voltage more accurately. For this purpose, various electrostatic fields before discharge are calculated to characterize the capacitive energy and inhomogeneity of each gap structure. Streamer propagation characteristics are expressed as the ratio of electric fields to the critical electric field for streamer propagation. These features are used as inputs of machine learning to calculate the electrical BD voltage. Machine learning is necessary for automating the entire electrical breakdown analysis process based on automatic electric field calculations. In particular, SVR and Bayesian regression can train models even with a small amount of data. The computational efficiency of these two algorithms is high, so they produce results within a short time, as shown in self-driving applications [20]. SVR and Bayesian regression are also compared with a multilayer perceptron (MLP) neural network.
In this paper, the positive lightning impulse BD voltage is predicted by SVR, Bayesian regression and MLP in a sphere-to-plane electrode with air gaps of <200 mm. Predicted BD voltages from machine learning are compared with both experimental results and calculated voltages in strongly inhomogeneous electric fields. Moreover, the prediction accuracy of BD voltages is analyzed in weakly inhomogeneous electric fields. Section 2 presents the limits of the semi-empirical method and the need for machine learning. In Section 3, the suggested parameters are defined as new features, and the electrostatic features are expressed. Section 4 explains the machine learning algorithms and parameter tuning. Section 5 provides experimental BD results used to build datasets for machine learning. In Section 6, simulation results are discussed. Section 7 gives the conclusion of this paper.

Semi-Empirical Methods for BD Voltage and Necessity of Machine Learning
Air insulation design criteria are mainly based on semi-empirical methods related to streamer inception and propagation. Firstly, the streamer breakdown criterion is used to calculate inception voltages or breakdown voltages in inhomogeneous electric fields in various gases [21]. The criterion is expressed by Equation (1), where x_c is either the gap distance or a critical avalanche length, K is the critical number of electrons required for electrical breakdown and α is the ionization coefficient. K and α are influenced by various factors, such as humidity, air density, gas mixture ratio and electrode shape [22,23].
Secondly, a streamer propagates between electrodes if the applied voltage is large enough to sustain the propagation process [24]. Semi-empirical Equation (2) expresses the BD voltage in terms of the voltage drop of the streamer head and the propagation length. In (2), d is the distance between electrodes in mm, E_st is the internal field strength along the positive ion channels behind the head and U_0 is the equivalent potential required for ionization, which is approximately 20-45 kV [25,26]. Voltages calculated from (2) are compared with voltages predicted by machine learning in strongly inhomogeneous fields.
These equations can be used only when the gap distances or electrode shapes satisfy certain conditions: (1) can be applied to weakly inhomogeneous electric fields in needle-plane or sphere-plane gaps, whereas (2) can be used in strongly inhomogeneous electric fields. Incorrect ionization coefficients (α) and improper K values cause errors in the BD voltages calculated from (1). Breakdown in ambient air is also influenced by metallic particles and protrusions, which distort the electric field [27,28]. However, these effects are not included in the semi-empirical equations. Thus, machine learning is necessary for finding non-linear relationships between multiple variables and BD voltages through the kernel function of SVR or the hidden layer of MLP. Moreover, machine learning quantifies the effect of each variable on BD voltages in detail, making it possible to categorize the variables affecting BD voltages in each sphere-to-plane air gap. Machine learning can therefore predict BD voltages using only the accurate variables related to electrode shapes, while excluding inaccurate data. This allows machine learning to be applied to various electrode shapes.
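The role of the quantities in the semi-empirical propagation formula can be illustrated with a short calculation. The sketch below assumes the commonly quoted linear form U_BD = U_0 + E_st · d, which matches the description of (2) as a head voltage drop plus a contribution along the propagation length; the exact form of Equation (2) is not reproduced in this text, and the default values E_st = 0.5 kV/mm and U_0 = 30 kV are illustrative choices (U_0 merely lies inside the 20-45 kV range quoted above), not values taken from the paper.

```python
def streamer_breakdown_voltage(d_mm, e_st_kv_per_mm=0.5, u0_kv=30.0):
    """Estimate a positive-streamer BD voltage in kV.

    Assumed form (not confirmed by the source): U_BD = U_0 + E_st * d.
    d_mm            gap distance in mm
    e_st_kv_per_mm  internal streamer field in kV/mm (illustrative default)
    u0_kv           equivalent ionization potential in kV (illustrative default)
    """
    if d_mm <= 0:
        raise ValueError("gap distance must be positive")
    return u0_kv + e_st_kv_per_mm * d_mm


if __name__ == "__main__":
    # Tabulate the estimate over a few gap distances.
    for gap in (80, 100, 110):
        print(f"d = {gap} mm -> U_BD ~ {streamer_breakdown_voltage(gap):.1f} kV")
```

Because the estimate is linear in d, any error in the assumed E_st grows with gap distance, which is one reason a data-driven model may be attractive when the field is only weakly inhomogeneous.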

Energies 2023, 16, 6221

Electric Field Properties of Sphere-to-Plane Electrodes
As shown in Figure 1, the electric field between electrodes is calculated along the shortest path with MAXWELL software (ANSYS MAXWELL, version 19). A voltage of 1 kV is applied to each sphere, and the ground potential is applied to the plane electrode. The electric field distribution and the nonuniformity coefficient (NUC) are shown in Figure 2. The maximum electric field is inversely proportional to the radius. Electric fields vary exponentially with gap distance. As the radius decreases, the electric field gradient increases around the sphere electrodes. Nonuniformity coefficients range from 4.7 to 73 across all the test electrodes. For a radius of 3 mm, the slope of the NUC is much larger than for the other sphere electrodes. As the radius increases from 3 to 10 mm, the slope decreases rapidly. Strongly inhomogeneous electric fields appear in sphere-to-plane air gaps with a 3 mm radius. The inhomogeneity is not large for spheres of radius 10 and 25 mm.
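The nonuniformity coefficient is defined in the caption of Figure 2 as the maximum electric field divided by the average electric field. A minimal sketch of that ratio is given below; it assumes the field has been sampled at uniformly spaced points along the shortest path, so that a plain arithmetic mean approximates the path average. The sample values are made up for illustration and are not simulation results from the paper.

```python
def nonuniformity_coefficient(fields_kv_per_mm):
    """NUC = E_max / E_ave for fields sampled along the shortest path.

    Assumes uniformly spaced samples, so the arithmetic mean stands in
    for the path-averaged field.
    """
    if not fields_kv_per_mm:
        raise ValueError("at least one field sample is required")
    e_max = max(fields_kv_per_mm)
    e_ave = sum(fields_kv_per_mm) / len(fields_kv_per_mm)
    return e_max / e_ave


if __name__ == "__main__":
    # Hypothetical profile: field decays away from the sphere surface.
    profile = [0.30, 0.12, 0.06, 0.04, 0.03]  # kV/mm, illustrative only
    print(f"NUC = {nonuniformity_coefficient(profile):.2f}")
```

A perfectly uniform field gives NUC = 1, and the coefficient grows as the field concentrates near the sphere.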

Suggested Input Features: Streamer Propagation Characteristics
A positive streamer triggers an electrical BD in a nonuniform field with positive polarity if the air gap is <200 mm [29]. The critical electric field for streamer penetration of the gap is 0.4-0.5 kV/mm, with a standard deviation of approximately 3% under standard atmospheric conditions [30,31]. The probability of electrical BD increases as either the average electric field between electrodes or the electric field at the plane electrode approaches the critical electric field. Thus, the ratio of the average electric field to the critical electric field and the ratio of the electric field at the plane electrode to the critical electric field are used as input parameters to represent streamer propagation characteristics.

Electrostatic Fields as Input Parameters
Electrostatic field features for machine learning are listed in Table 1. Eleven physical quantities are calculated and then classified into four groups. The maximum electric field (E_max) and the electric field deviation (E_std) are fundamental features in electrode systems. The capacitive characteristics involve the stored energy (E_s) and the average stored energy in the air gap (E_s_ave). Streamer propagation characteristics are considered to represent the probability that a streamer penetrates between the sphere and the plane electrode. Inhomogeneity is a measure of nonuniformity imparted by variations in either the radius of the sphere or the gap distance. The eleven input parameters are defined at the point of the maximum electric field or along the path with an electric field that exceeds 90% of the maximum field. These parameters are defined as follows:

1. Maximum electric field: E_max, where E_i is the electric field of the i-th element and n is the number of elements in the shortest path;
2. Electric field standard deviation: E_std, where E_std is calculated along the shortest path and E_ave is the average electric field;
3. Energy stored along the shortest path in the air gap: E_s, where d_i is the fine distance of the i-th element and E_s is the energy stored along the shortest path. E_s depends on the electrode shape; stored energy tends to become concentrated around the sphere owing to the nonuniformity;
4. Average energy stored along the shortest path in the air gap: E_s_ave, where E_i is the electric field of the i-th element and n is the number of elements in the shortest path;
5. Ratio of the average electric field to the critical electric field: E_ave/E_c0, where E_c0 is 0.5 kV/mm (the critical electric field for streamer propagation);
6. Ratio of the plane electrode electric field to the critical electric field: E_g/E_c0, where E_c0 is 0.5 kV/mm (the critical electric field for streamer propagation) and E_g is the electric field at the plane electrode;
7. Voltage drop: V_E90, where V_E90 is the voltage drop in the region where the electric field strength exceeds 90% of the maximum electric field;
8. Path length: L_E90, where L_E90 is the sum of the fine distances in that region and accounts for the ionization and avalanche formation that precede streamer inception;
9. Energy stored in the region where the field exceeds 90% of the maximum field: E_s_E90;
10-11. Relative ratios of the voltage and of the length: V_r_E90, L_r_E90.
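The feature definitions above can be sketched as a small routine that takes a sampled field profile along the shortest path. The defining equations are not reproduced in this text, so several choices below are assumptions made only for illustration: the population form of the standard deviation, an energy per unit area of ½·ε0·E_i²·d_i for each element, the average energy taken as E_s/n, and the relative ratios taken as V_E90 over the total voltage drop and L_E90 over the total path length. The function name `field_features` and the sample numbers are hypothetical.

```python
import math

EPS0 = 8.854e-12   # vacuum permittivity in F/m
E_C0 = 0.5         # critical field for streamer propagation in kV/mm (from the text)


def field_features(fields, steps, e_plane):
    """Compute the eleven features from a field profile.

    fields   E_i in kV/mm for each element along the shortest path
    steps    d_i in mm (fine distance of each element)
    e_plane  E_g, field at the plane electrode, in kV/mm
    """
    n = len(fields)
    e_max = max(fields)
    e_ave = sum(fields) / n
    # Assumption: population standard deviation about E_ave.
    e_std = math.sqrt(sum((e - e_ave) ** 2 for e in fields) / n)
    # Assumption: energy per unit area, 0.5*eps0*E^2*d, in SI units (J/m^2).
    energies = [0.5 * EPS0 * (e * 1e6) ** 2 * (d * 1e-3)
                for e, d in zip(fields, steps)]
    e_s = sum(energies)
    # Elements whose field exceeds 90% of the maximum field.
    in_e90 = [e > 0.9 * e_max for e in fields]
    v_e90 = sum(e * d for e, d, m in zip(fields, steps, in_e90) if m)
    l_e90 = sum(d for d, m in zip(steps, in_e90) if m)
    e_s_e90 = sum(w for w, m in zip(energies, in_e90) if m)
    v_total = sum(e * d for e, d in zip(fields, steps))
    return {
        "E_max": e_max,
        "E_std": e_std,
        "E_s": e_s,
        "E_s_ave": e_s / n,               # assumption: average per element
        "E_ave/E_c0": e_ave / E_C0,
        "E_g/E_c0": e_plane / E_C0,
        "V_E90": v_e90,                   # kV
        "L_E90": l_e90,                   # mm
        "E_s_E90": e_s_e90,
        "V_r_E90": v_e90 / v_total,       # assumption: share of total voltage
        "L_r_E90": l_e90 / sum(steps),    # assumption: share of total length
    }


if __name__ == "__main__":
    demo = field_features([0.30, 0.28, 0.15, 0.08, 0.05], [1.0] * 5, e_plane=0.05)
    for name, value in demo.items():
        print(f"{name:12s} {value:.4g}")
```

Restricting three of the features to the region above 90% of E_max keeps them focused on the zone where ionization and avalanche formation are expected, which is the motivation stated for L_E90.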

Machine Learning Algorithms and Parameter Tuning

Support Vector Regression
An SVM is a machine learning algorithm, and an SVR is an SVM that solves regression problems. An ε-SVR identifies the hyperplane for which the loss range (ε) is acceptable; the soft margin is represented by a slack variable (ξ). The primal optimization problem addressed by an ε-SVR is represented in (16). C and ε serve as hyperparameters that improve the accuracy of the predicted values [16,17]. The ε-SVR is a convex quadratic problem that can be converted to a dual problem. The decision function of the ε-SVR is defined in (20). The radial basis function serves as the kernel that maps values from the original space to higher dimensions, where the data can be fitted in a linear manner. γ is a parameter of the kernel function, which depends on the Euclidean distance between two points:
$$K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^2\right)$$
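The building blocks of an ε-SVR with a radial basis function kernel can be sketched in a few lines. The code below uses the textbook definitions: the kernel exp(−γ‖x − y‖²), the ε-insensitive loss max(0, |y − f(x)| − ε), and a decision function formed as a kernel-weighted sum over support vectors plus a bias. It does not solve the optimization problem; the support vectors, coefficients and bias in the demo are hypothetical numbers rather than a trained model, and the function names are chosen here for illustration.

```python
import math


def rbf_kernel(x, y, gamma):
    """Radial basis function kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Loss is zero inside the epsilon tube and grows linearly outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def svr_decision(x, support_vectors, dual_coefs, bias, gamma):
    """Evaluate f(x) = sum_i coef_i * K(sv_i, x) + bias.

    dual_coefs holds the differences of the dual variables for each
    support vector, as produced by any epsilon-SVR solver.
    """
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + bias


if __name__ == "__main__":
    # Hypothetical two-feature model with two support vectors.
    svs = [[0.2, 0.8], [0.9, 0.1]]
    coefs = [40.0, -15.0]
    prediction = svr_decision([0.3, 0.7], svs, coefs, bias=60.0, gamma=1.0)
    print(f"predicted value: {prediction:.2f}")
    print("loss:", epsilon_insensitive_loss(95.0, prediction, epsilon=1.0))
```

A larger γ makes each support vector influence a narrower neighbourhood, while C and ε (handled by the solver) trade off flatness of the fit against tolerance to deviations.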

Bayesian Regression
Bayesian models are used in various fields, such as artificial intelligence and machine learning. Bayes' theorem connects the prior probability and the posterior probability, as expressed in Equation (22) [32]:
$$p(\theta \mid E) = \frac{p(E \mid \theta)\,p(\theta)}{p(E)} \quad (22)$$
where θ is a parameter to be estimated, p(θ) is the prior probability of θ, E is the data, p(E) is a constant and p(E|θ) is the likelihood. Bayesian inference uses the prior probability and the likelihood to estimate the posterior probability, as in Equation (23) [28,29]:
$$p(\theta \mid E) \propto p(E \mid \theta)\,p(\theta) \quad (23)$$
This equation shows that the posterior probability keeps changing with new data. Since posterior probabilities can be used as new prior probabilities, data inference can be automated.
Bayesian regression uses Bayesian inference (conditional probability) for regression analysis between the target (Y) and the independent variables (X). The formula of Bayesian regression is (24) [33]; X consists of n attributes x_1, x_2, ..., x_n; each attribute is assumed to be independent; Y, X and are random variables; is a singular value. The hyperparameters are found by GridSearch.
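The statement that a posterior can serve as the next prior can be made concrete with a discrete example. The sketch below applies Bayes' theorem to a finite set of candidate parameter values and then repeats the update with the previous posterior as the prior. The hypotheses "theta_a" and "theta_b" and their likelihood values are invented for illustration; this is not the regression model used in the paper.

```python
def bayes_update(prior, likelihood):
    """Return the posterior p(theta | E) over a discrete set of theta values.

    prior       dict mapping each theta to p(theta)
    likelihood  dict mapping each theta to p(E | theta) for the observed data E
    """
    unnormalized = {theta: prior[theta] * likelihood[theta] for theta in prior}
    evidence = sum(unnormalized.values())   # p(E), the normalizing constant
    if evidence == 0:
        raise ValueError("data has zero probability under every hypothesis")
    return {theta: value / evidence for theta, value in unnormalized.items()}


if __name__ == "__main__":
    belief = {"theta_a": 0.5, "theta_b": 0.5}          # initial prior
    observation = {"theta_a": 0.8, "theta_b": 0.2}     # hypothetical likelihoods
    for step in (1, 2):
        # The posterior from one step becomes the prior for the next.
        belief = bayes_update(belief, observation)
        print(f"after observation {step}: {belief}")
```

Each new observation sharpens the belief without revisiting earlier data, which is the sense in which the inference can be automated.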

Multilayer Perceptron Neural Networks
A multilayer perceptron (MLP) neural network is a supervised learning algorithm that has been used in breakdown analysis [34]. Given a set of input parameters (X = X_1, X_2, ..., X_m) and a target (Y), it can learn a model for regression. Figure 3 shows the conceptual multilayer perceptron for predicting BD voltages. The MLP is a two-layer model consisting of an input layer, one hidden layer and an output layer. The input layer consists of a set of neurons representing the input parameters. Each neuron in the hidden layer transforms the values from the previous layer with a weighted linear summation (W_1 X_1 + W_2 X_2 + ... + W_m X_m), followed by the activation function. The output layer receives values from the hidden layer and transforms them into the output values. For a single hidden layer, the MLP learns Equation (25), where W_1 and W_2 are the weights of the input layer and the hidden layer, b_1 and b_2 are the biases added to the hidden layer and the output layer, and g is the activation function, for which the identity function is used. The 'lbfgs' solver is used. The number of neurons in the hidden layer is found by GridSearch.
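The forward computation of a one-hidden-layer perceptron can be sketched as follows. The code assumes the usual form f(X) = W_2 · g(W_1 X + b_1) + b_2 with a single output neuron, which is consistent with the description of the weights, biases and activation function given for Equation (25) but is not copied from it. The weights in the demo are arbitrary numbers, not trained values.

```python
def identity(z):
    """Identity activation, g(z) = z."""
    return z


def mlp_forward(x, w1, b1, w2, b2, activation=identity):
    """Evaluate a one-hidden-layer MLP with a single output.

    x   list of m input features
    w1  list of hidden-neuron weight rows, each of length m
    b1  list of hidden-layer biases, one per hidden neuron
    w2  list of output weights, one per hidden neuron
    b2  output bias (float)
    """
    hidden = [
        activation(sum(w * xi for w, xi in zip(row, x)) + bias)
        for row, bias in zip(w1, b1)
    ]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


if __name__ == "__main__":
    # Hypothetical network: two inputs, two hidden neurons.
    output = mlp_forward(
        x=[1.0, 2.0],
        w1=[[0.5, -1.0], [2.0, 0.0]],
        b1=[0.0, 1.0],
        w2=[1.0, 0.5],
        b2=0.1,
    )
    print(f"network output: {output:.3f}")
```

With the identity activation the composition of the two layers is itself a linear map of the inputs, so the number of hidden neurons changes how the fit is parameterized rather than adding non-linearity.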

Feature Normalization and Parameter Tuning
The input parameters are normalized to eliminate the effects of differing value ranges and units; this normalization improves machine learning performance. The input parameters are normalized using (26):
$$X_i = \frac{x - x_{min}}{x_{max} - x_{min}} \quad (26)$$
where X_i is the normalized value of the input parameter x, and x_min and x_max are its minimum and maximum values, respectively. K-fold cross-validation is used, considering the limited amount of data, with K = 3. A test dataset involves nine samples, which are randomly divided into three sub-datasets; two of these sub-datasets are used to train each of the SVR, Bayesian regression and MLP models, whereas the remaining sub-dataset is used to validate the models. GridSearch is used to select the hyperparameters and tune the models.
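Min-max normalization and the three-fold split of a nine-sample dataset can be sketched with the standard library alone. The helper names `min_max_normalize` and `k_fold_indices` and the fixed random seed are choices made here for illustration; the paper does not state how the random division was implemented.

```python
import random


def min_max_normalize(values):
    """Scale values to [0, 1] using (x - x_min) / (x_max - x_min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, validation_indices) for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, validation


if __name__ == "__main__":
    print(min_max_normalize([2.0, 4.0, 6.0]))
    # Nine samples, K = 3: six samples train, three validate in each round.
    for train, validation in k_fold_indices(9, 3):
        print("train:", sorted(train), "validate:", sorted(validation))
```

In a hyperparameter grid search, each candidate setting would be scored by averaging the validation error over the three rounds, and the best-scoring setting would be kept.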

Breakdown Experimental Results for Dataset Design
The experimental setup used to measure lightning impulse BD voltages is shown in Figure 4. This setup involved an impulse generator, electrodes and measurement systems. The generator delivered standard 1.2/50 µs impulses. The measurement systems stored voltage waveforms in an oscilloscope and a personal computer; measured voltages were obtained from the voltage divider. BD experiments were conducted using the up-and-down method. BD voltages were measured 15 times under each condition. The reported BD voltages were mean values calculated after excluding the maximum and minimum voltages.
Nonuniformity coefficients (NUC = E_max/E_ave) were evaluated to indicate the degree of nonuniformity of the sphere-to-plane air gaps. Mean BD voltages and calculated NUCs are shown in Table 2. NUC ranged from 7.134 to 55.647. Three samples for each electrode were extracted uniformly and randomly by considering NUC. Each dataset consisted of 9 samples.
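The averaging rule used for the measured BD voltages, a mean taken after dropping the largest and smallest readings, can be sketched as follows. The fifteen readings in the demo are invented values used only to exercise the function.

```python
def mean_bd_voltage(shots_kv):
    """Mean of the measurements after excluding one maximum and one minimum."""
    if len(shots_kv) < 3:
        raise ValueError("need at least three measurements")
    trimmed = sorted(shots_kv)[1:-1]   # drop the smallest and the largest value
    return sum(trimmed) / len(trimmed)


if __name__ == "__main__":
    # Fifteen hypothetical BD voltages (kV) for one electrode condition.
    shots = [92.1, 93.4, 91.8, 94.0, 92.7, 93.1, 92.9, 95.2,
             90.6, 93.0, 92.5, 93.8, 92.2, 93.3, 92.8]
    print(f"mean BD voltage: {mean_bd_voltage(shots):.2f} kV")
```

Dropping the two extremes limits the influence of a single unusually early or late breakdown on the value stored in the dataset.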

Model Learning and Testing
Good datasets are important for enhancing machine learning accuracy. Four groups of datasets were randomly made according to NUC, as shown in Table 3. The samples in all the datasets were unique. Datasets were randomly selected to train the SVR, Bayesian regression and MLP models; one of the four datasets was used to train the models, and the other datasets were used to analyze the prediction accuracy. Three error indices are applied to evaluate predicted voltages: root mean square error (RMSE), mean absolute percentage error (MAPE) and relative error (RE). RMSE is expressed by (28) and MAPE is given by (29); RE is the error of an individual sample.
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(U_{bi} - U_{pi}\right)^2} \quad (28)$$
$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{U_{bi} - U_{pi}}{U_{bi}}\right| \times 100\% \quad (29)$$
where U_bi is the measured BD voltage, U_pi is the value predicted by SVR, Bayesian regression or MLP and n is the number of samples. The error indices of the four predictions are shown in Table 4. Predictions from each model showed good agreement with the experimental results. For SVR, the maximum and minimum RMSE between predictions and the corresponding dataset were about 3.79 and 2.2 kV, and the MAPEs were all <2.28%. The maximum and minimum RMSE between predictions from Bayesian regression and a dataset were about 4.95 and 1.71 kV, and the MAPEs were all <2.89%. The maximum RMSE of MLP was higher than that of SVR but lower than that of Bayesian regression. The prediction accuracy of Bayesian regression was the lowest among the three algorithms, based on RMSE and MAPE. The predicted BD voltages from SVR, Bayesian regression and MLP were compared with both experimental results and calculated voltages in a strongly inhomogeneous electric field with spheres of 3 mm radius, as shown in Figure 5. The predictions from the three models exhibited good agreement with the experimental results. In particular, the maximum RE between the voltages predicted by SVR and the experimental results was 4.13%. The maximum RE between the voltages predicted by Bayesian regression and the experimental results was 6.30%. For MLP, the maximum RE was 7.75%. However, the maximum RE between the voltages calculated from (2) and the experimental voltages was approximately 13%. Predicted BD voltages were therefore more accurate than the voltages calculated from the semi-empirical equation.
The maximum RE of SVR was thus 2.17 percentage points lower than that of Bayesian regression, and SVR was the most accurate among the three algorithms. Predicted BD voltages from the three models were compared with experimental results in a weakly inhomogeneous electric field, as shown in Figure 6. The predictions generally agreed with the experimental results. As shown in Figure 6a, the REs between the voltages predicted by SVR and the experimental results were within 4.05%. Because Bayesian regression showed an error of 8.51% at particular gap distances, the voltages predicted by SVR were more accurate than those from Bayesian regression. In a weakly inhomogeneous electric field with a sphere of radius 25 mm, as shown in Figure 6b, the REs between the voltages predicted by SVR and the experimental results were all <4.15%, except for a gap distance of 70 mm. For predictions from Bayesian regression, the REs were all <4.47%. Moreover, the voltages predicted by SVR were similar to those from MLP, except for a gap distance of 70 mm in weakly inhomogeneous electric fields.
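The three error indices can be computed directly from paired lists of measured and predicted voltages. The sketch below uses the standard definitions of RMSE and MAPE and takes the relative error of a sample as the absolute difference divided by the measured voltage, in percent; treating RE as an absolute value is an assumption made here. The voltages in the demo are invented and are not results from Table 4.

```python
import math


def rmse(measured, predicted):
    """Root mean square error, in the same unit as the inputs (kV)."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def relative_errors(measured, predicted):
    """Per-sample relative error in percent (absolute value assumed)."""
    return [abs(m - p) / m * 100.0 for m, p in zip(measured, predicted)]


def mape(measured, predicted):
    """Mean absolute percentage error in percent."""
    errors = relative_errors(measured, predicted)
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical measured and predicted BD voltages in kV.
    u_measured = [72.0, 85.5, 98.0, 110.2]
    u_predicted = [70.9, 86.8, 99.5, 108.0]
    print(f"RMSE = {rmse(u_measured, u_predicted):.2f} kV")
    print(f"MAPE = {mape(u_measured, u_predicted):.2f} %")
    print(f"max RE = {max(relative_errors(u_measured, u_predicted)):.2f} %")
```

RMSE weights large deviations more heavily and is expressed in kV, whereas MAPE and RE are scale-free, which is why the comparison across gap distances is reported in percent.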

Discussion
Predicting the dielectric insulation strength is challenging owing to the complex relationship between the various variables and breakdown. Nevertheless, predictions from the machine learning algorithms were shown to be close to the datasets. As new features, streamer propagation characteristics were as important as the known electrostatic features characterizing electrode shapes before discharge. Electric field gradients were unnecessary because the electric field, governed by Poisson's equation, varies with time during the discharge process. The suggested features were effective for lowering the feature dimension.
The three machine learning algorithms agreed well with the experimental results for all the test electrodes. In particular, predicted voltages were more precise than calculated results in the range of 80-110 mm. SVR was more precise than MLP, although MLP makes it easier to analyze more complex systems by controlling the number of hidden layers. This suggests that SVR as well as MLP could be adequate for analyzing the breakdown voltages of a single medium affected by various factors, such as dust, salt and metallic particles. The prediction accuracy of Bayesian regression was lower than that of SVR and MLP. This was because the variables affecting BD were dependent on one another to some extent, whereas Bayesian regression assumes that each attribute is independent.
As there are no dominant equations for the physical discharge process under various conditions, more study is needed to confirm whether the suggested method and machine learning algorithms can be applied to compressed single-gas systems as well as insulator creepage structures, which are used in high-voltage power apparatuses.

Conclusions
In this paper, machine learning algorithms were investigated for predicting positive lightning impulse BD voltages in sphere-to-plane electrode systems with various nonuniformities. A method was also suggested that introduces streamer propagation characteristics as new features and removes electrostatic field gradients as unnecessary features, in order to reduce the feature dimension for three algorithms: SVR, Bayesian regression and the MLP neural network. These algorithms were trained on the new features describing the discharge process and on the known electrostatic features characterizing electrode shapes before discharge. The predicted voltages from each model were compared with experimental results and with voltages calculated from the semi-empirical equation: (1) The maximum RMSE between predictions from SVR and datasets was 3.79 kV, while the maximum RMSEs between predictions from the other models (Bayesian regression, MLP) and datasets were 4.95 kV and 4.04 kV, respectively. The predictions agreed well with

Figure 1. Electric field in sphere-to-plane air gaps for a sphere of radius 10 mm and a gap distance of 100 mm. (a) Schematic electrode; (b) electric field. (A voltage of 1 kV is applied to the sphere.)

Figure 2. Electric field distributions as a function of the air gap and the radius of the sphere. (a) Electric field; (b) nonuniformity coefficient (NUC). (NUC = maximum electric field/average electric field.)

Figure 4. Experimental setup.

Table 2. Mean BD voltages and nonuniformity coefficients for datasets.

Table 3. Datasets and sizes.

Table 4. Error indices of four predictions and comparison among SVR, BR and MLP.