Estimation of Effective Length of Type-A Grounding System According to IEC 62305-3 Using a Machine Learning Regression Model

Abstract: Two types of grounding systems are recommended for use in the international standard IEC 62305-3, Part 3: Physical damage to structures and life hazard. One of these is a radial-based grounding system (type-A), which is used in soil resistivities of up to 3000 Ωm and is considered in this paper. It is well known that during lightning strikes, only a part of the grounding wire contributes to dissipating the lightning current into the surrounding soil. This effective part of the grounding system depends on several features, such as soil resistivity, burial depth, and rise time of the dissipated lightning current. The effect of all of these features on the effective length of the type-A grounding system is explored in this paper. A suitable supervised machine learning regression model is developed, which will enable readers to accurately approximate the effective length of the type-A grounding system for realistic values of input features. The trained model yielded an R² value of 0.99998 on the test set. In addition, two simple mathematical formulas are provided, which produce similar but less accurate results (R² values of 0.989883 and 0.998557, respectively).


Introduction
Grounding systems come in various forms and sizes, with their primary function being the safe dissipation of electric current into the surrounding soil. This dissipated current may originate from human activities (including direct current, alternating current, or transient current) or natural phenomena (such as lightning currents). Lightning currents [1], or transient currents in general, including those generated by switching operations [2], uniquely impact grounding systems. Specifically, when a grounding system is subjected to a transient current, only a portion of the grounding system effectively dissipates the current into the surrounding soil, while the remaining part plays a passive role. Over the years, numerous researchers have studied this phenomenon in grounding systems of varying complexity. For example, effective lengths of simple horizontal or vertical grounding electrodes are examined in [3,4] and, more recently, in [5–8], to name a few. More complex grounding electrode configurations and their effective areas are analyzed in [9] and, more recently, in [10–12]. In addition to analyzing the effect of this phenomenon, some researchers have developed approximation formulas to predict the area (or length) of the grounding system that participates in the dissipation of transient current depending on several input parameters, such as soil resistivity, the rise time of the dissipated lightning current, and the peak value of the lightning current [9,13–15]. These approximation formulas generally use the least-squares method to fit the selected formula to the analytically computed data. In this paper, we improve upon this approach by providing our own approximation formulas and by additionally using advanced machine learning (ML) regression models to achieve a much better fit.
In the scope of this paper, only one type of grounding system will be observed: a radial-based grounding system as described in the international standard IEC 62305-3 [16]. This grounding system, the so-called type-A grounding system, is characterized by a set of horizontal ground electrodes, buried parallel to the Earth's surface, all radiating from a central point. The simplest form of this grounding system, which consists of two grounding electrodes, will be the object of observation in this paper. This configuration is mainly used in lightning protection of relatively low, isolated objects, such as meteorological stations, that have only one down-conductor. The standard [16] states that in the type-A arrangement, the minimum number of ground electrodes should be one for each down-conductor and at least two for the entire lightning protection system. Note that this grounding electrode with a central point lightning injection is also observed and analyzed in [11,15]. In this paper, we model the mentioned type-A grounding system using a well-tested frequency-domain-based algorithm [17] and subject the grounding system model to a number of different lightning strikes in order to find the effective length of the grounding system. To further refine our effective length analysis, we vary two additional input parameters alongside the lightning current: the burial depth of the grounding system and the soil resistivity. Note that the influence of the burial depth of the grounding electrode on its effective length was not analyzed in any of the available references. This is to be expected, since the electrodes are almost always buried at a standard prescribed depth. However, as in all practical cases, the actual burial depth may deviate from the prescribed depth, so we included it as an input parameter to our model.

It is also important to note that in our analysis we disregarded the frequency dependence of the soil parameters, and we disregarded the beneficial effect of the soil ionization phenomenon [18]. This nonlinear effect, in reality, reduces the effective length of the grounding electrode, so our analysis conservatively overestimates the effective length values. Because soil ionization is neglected, the peak value of the lightning current does not influence the effective length in our approximation. In addition to disregarding soil ionization, we assumed a constant soil permittivity value in all considered cases since, in our preliminary analysis, we found that the effect of this parameter on the solution was less significant than the effect of the chosen input parameters. By varying the chosen parameters (soil resistivity, rise time of the lightning current, and the burial depth of the grounding system), we aim to generate a comprehensive dataset containing effective length values of the type-A grounding system for various input parameter combinations. This dataset serves as a foundation for applying various regression algorithms with the aim of developing predictive models that enable users to accurately estimate the effective length of the type-A grounding system in most practical cases.
To address this research goal, we explored multiple approaches. Initially, we tested various mathematical regression functions on the input dataset, utilizing nonlinear least-squares estimation to optimize the function parameters for a better fit. Through this process, we identified two regression functions that provided satisfactory results and are relatively easy to calculate. To further improve our approximations, we then applied a range of ML regression models, training them on the dataset. Our testing revealed that only a subset of these ML models delivered superior approximation results. In this paper, we will present only the best-performing ML regression model to avoid redundancy. We validated both the mathematical regression functions and the ML regression model using standard regression quality metrics. This will be elaborated on in the corresponding section.
The paper is organized as follows. In the second section, following the Introduction, we provide a summary of the methodological approach used in this study. This includes a description of the observed type-A grounding system [16], the definition of the effective length used in the paper, and an explanation of the iterative procedure for determining the effective lengths of the type-A grounding system for various input parameters. The third section features an overview of the input parameters that were varied to generate the effective length dataset. The section includes a detailed description of each input parameter along with its observation interval and an initial feature analysis to identify which parameters most significantly impact the effective length. Additionally, this section describes the resulting dataset. In the fourth section, on the basis of the described dataset, we develop two simple mathematical regression functions followed by a detailed regression quality analysis. These formulas are intended for quick engineering use. It is important to emphasize that the developed formulas take into account the burial depth of the grounding electrodes, unlike the formulas available in the literature. In the final section, we explore the possibility of applying a specific ML regression model to our dataset in order to further increase the accuracy of the effective length approximation. The Gaussian process regression model was used since it performs extremely well on smooth functions, especially when a suitable kernel is selected. The model is trained and tested in that section, and the optimized values of the model hyperparameters are provided so that the results may be replicated without the training procedure.

Computation of the Effective Length of Type-A Grounding System According to IEC 62305-3

Description of the Considered Grounding System
As mentioned in the introductory section, the international standard [16] recommends the use of two types of grounding systems depending on the resistivity of the homogeneous soil in which the grounding system is buried. Concerning the soil, this is naturally an oversimplification since the soil is rarely homogeneous and more often heterogeneous in nature [19]. This heterogeneity can, in most cases, be approximated relatively accurately using horizontal layers of varying resistivity. In practice, the heterogeneity of most soils is adequately approximated using a two-layer earth model [20], only in some cases extending to a three-layer earth model. However, in this paper, we will concern ourselves with the homogeneous soil as prescribed in the standard [16].
Regarding the grounding systems mentioned in the standard [16], in the scope of this paper we will observe the type-A grounding system intended for use in homogeneous soils of resistivity up to 3000 Ωm, as stated in [16]. This is a type of radial earth electrode arrangement which comprises horizontal or vertical electrodes that do not form a closed loop. A minimum of two earth electrodes is prescribed, although this type of grounding system can have more. In this paper, we will limit our observations to the simplest arrangement consisting of two horizontal earth electrodes positioned opposite one another, as depicted in Figure 1. The length ℓ₁ of each electrode is also prescribed in the standard [16] depending on the earth resistivity and the desired protection class. However, it is well known that during the dissipation of lightning current into the surrounding soil, only a part of the grounding arrangement contributes to this dissipation, i.e., the effective part of the grounding system. Therefore, it would be beneficial to ascertain the effective length of this type of grounding system when it is subjected to a lightning strike in order to optimize the number and length of the grounding conductors that are installed. This is discussed in the following subsection.

Definition of Effective Length of the Grounding System Used in This Paper
The effective length of the grounding electrode can be defined in various ways. According to [4], the effective length of the grounding electrode represents the optimal length of the electrode whereby increasing its length does not significantly lower the impulse impedance of the grounding electrode. This impulse impedance is a well-known parameter [3], which is defined as the ratio between the peak values of the scalar potential at the injection point and the transient current at the injection point of the grounding system.
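The impulse impedance definition above can be illustrated with a short sketch. The function name and the sample values are hypothetical, and positive-polarity waveforms sampled at the injection point are assumed; the paper itself obtains these waveforms from its electromagnetic model.

```python
def impulse_impedance(potential_v, current_a):
    """Peak potential divided by peak current at the injection point, in ohms.

    The two peaks need not occur at the same instant. Positive-polarity
    waveforms are assumed here.
    """
    if not potential_v or len(potential_v) != len(current_a):
        raise ValueError("waveforms must be non-empty and equally long")
    return max(potential_v) / max(current_a)


# Made-up samples: the potential peaks (50 V) one sample before the current (2 A).
z_demo = impulse_impedance([0.0, 50.0, 30.0], [0.0, 1.0, 2.0])
print(z_demo)  # 25.0
```

Taking the two maxima independently reflects the definition in the text, which relates peak values rather than simultaneous samples.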
In the scope of this paper, we have selected the following condition to define the effective length of each electrode for the observed grounding system: when increasing the length of the grounding electrode does not produce a significant decrease in impulse impedance, the effective length of the grounding electrode $\ell_{\mathrm{eff}}$ is considered to be found. Put more mathematically, the condition is satisfied when the rate of change (ROC) of the impulse impedance Z relative to the increasing length of the grounding system ℓ becomes sufficiently small [4]:

$$\mathrm{ROC} = \left| \frac{\Delta Z}{\Delta \ell} \right| \le \tan(5^\circ) \qquad (1)$$

where ΔZ is the change in the impulse impedance and Δℓ is the change in the grounding electrode length. The 5° angle of the slope, taken from [4], is an often-used cut-off value, after which one can safely assume that the impulse impedance will not decrease significantly when the grounding electrode length is increased.

Methodology Used for Obtaining the Type-A Grounding System Effective Length Dataset
In this section, we will briefly outline the procedure developed for obtaining the effective length of each electrode for the type-A grounding system. The described methodology is repeated a large number of times since we modify the mentioned input parameters in order to ascertain their effect on the effective length (this will be discussed in the following section). The foundation of our methodological approach is a previously developed robust and accurate transient electromagnetic model of a grounding system, which was extensively tested on much more complex grounding grid systems and soil configurations [17,21]. Its accuracy was verified by comparison with published algorithms as well as with commercial software such as XGSLab (version 9.6.1) [22] and CDEGS (version 13.0) [23]. For our case of a simple grounding electrode buried horizontally in homogeneous soil, this model is therefore expected to provide results of sufficient accuracy. It is important to note that the beneficial effect of soil ionization [18] on the impulse impedance of the grounding electrode is disregarded in our computations. Also, as mentioned in the introductory section, the frequency dependence of soil parameters is disregarded, and a constant relative soil permittivity value of 10 is chosen for all simulations. We plan to modify some of these assumptions in our future work and to investigate the effect this produces.
The methodology of obtaining the effective length of each electrode for the type-A grounding system is relatively simple, although time-consuming (Figure 2). In the first step of every simulation, a 0.5 m long type-A grounding system is buried at a certain depth parallel to the Earth's surface, and a lightning current is injected into its central point. Note that this corresponds to two segments, each of length ℓ₁ = 0.25 m, as seen in Figure 1. Next, the impulse impedance is computed and noted for that grounding electrode length. Then, the grounding electrode is extended in each direction by 0.25 m, and the computation procedure is repeated. If the change of the impulse impedance is considered insignificant according to the previously defined condition, then the effective length is considered found. If not, the process is repeated by extending the electrode by an additional 0.25 m in each direction until the effective length is found. This iterative process is extremely time-consuming since it needs to be repeated for every input parameter combination. Note again that three input parameters are modified: the soil resistivity, the rise time of the lightning current injected into the grounding system, and the burial depth of the grounding system. More about the selection of their ranges and chosen sample values is provided in the following section.
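The iterative search described above can be sketched as a short loop. This is only an illustration under stated assumptions: `toy_impulse_impedance` is a made-up decaying curve standing in for the electromagnetic model of [17], the 5° cut-off is read as a slope bound of tan(5°) in Ω/m with Δℓ taken as the 0.25 m extension of one arm, and the extended length is the one reported once the criterion is met.

```python
import math

STEP_M = 0.25                                 # extension of each arm per iteration (from the text)
SLOPE_LIMIT = math.tan(math.radians(5.0))     # assumed reading of the 5 degree cut-off, in ohm/m


def toy_impulse_impedance(arm_length_m):
    """Hypothetical stand-in for the electromagnetic model: decays towards a floor value."""
    return 12.0 + 60.0 * math.exp(-arm_length_m / 8.0)


def find_effective_length(impedance_fn, step=STEP_M, slope_limit=SLOPE_LIMIT, max_length=500.0):
    """Extend each arm by `step` until |dZ/dl| drops to or below `slope_limit`."""
    length = step                             # first simulation: two arms of 0.25 m each
    z_prev = impedance_fn(length)
    while length + step <= max_length:
        length += step
        z_new = impedance_fn(length)
        if abs(z_new - z_prev) / step <= slope_limit:
            return length                     # assumption: report the extended length
        z_prev = z_new
    raise RuntimeError("criterion not met below max_length")


l_eff = find_effective_length(toy_impulse_impedance)
print(f"effective length of each arm: {l_eff:.2f} m")
```

In the actual study every call to the impedance function is a full transient simulation, which is why the procedure is expensive when repeated for every parameter combination.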

Analysis of Input Parameters and an Overview of the Effective Length Dataset
Using the previously described methodology, we performed a series of numerical simulations to determine the influence of various input parameters (features) on the effective length of the type-A grounding system. Specifically, we performed a total of 880 simulations, varying three key features: soil resistivity, burial depth of the ground electrode, and the rise time of the lightning current dissipated into the surrounding soil. Please note that the results of the simulations are available for download as a public dataset [24].
In the international standard [16], a type-A grounding system is recommended for use in soils with resistivity values up to 3000 Ωm, effectively establishing an upper limit for our selected feature set. As for the lower limit of our soil resistivity feature set, reference [25] and several other sources indicate that soils originating from the Cretaceous period exhibit "unusually low" resistivity values, with loam and clay reaching as low as 10 Ωm. With the lower and upper limits established, the other values in our soil resistivity feature set are spread across the defined interval, more densely at the lower end, resulting in a set of 11 soil resistivity values: ρ = {10, 50, 100, 250, 500, 750, 1000, 1500, 2000, 2500, 3000} Ωm. More soil resistivity values were selected in the interval up to 1000 Ωm, which is consistent with reference [25], where a larger number of soil types are also listed in that interval. Note that in [25], soil types of resistivity values 1000 Ωm and 3000 Ωm are characterized as having "high" and "very high" resistivity, respectively, indicating that soils with these kinds of resistivities are less common, which is consistent with our feature set selection.
The second feature we varied in the simulations, which potentially influences the effective length of the grounding electrode, was the electrode burial depth. Similar to the soil resistivity value selection, we chose a realistic interval of electrode burial depths, ranging from 0.25 m to 2 m (see, for example, [26]). This resulted in a set of eight type-A grounding system burial depths: d = {0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0} m.
The third and final feature we varied in the simulations was the rise time of the lightning current that is dissipated into the surrounding soil via the type-A grounding system. Observing the recent statistical analysis of real lightning strikes in [27] and the rise times prescribed for positive and negative subsequent strokes in the international standard [28], the following set of ten rise time values was selected: T₁ = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0} µs.
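The three feature sets listed above form a full factorial grid. The following sketch enumerates it and confirms the simulation count; the variable names are illustrative, and the ordering of the tuples is an arbitrary choice rather than the layout of the published dataset [24].

```python
from itertools import product

# Feature values as listed in the text.
soil_resistivity_ohm_m = [10, 50, 100, 250, 500, 750, 1000, 1500, 2000, 2500, 3000]
burial_depth_m = [0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0]
rise_time_us = [float(t) for t in range(1, 11)]

# Every (resistivity, depth, rise time) combination corresponds to one simulation.
grid = list(product(soil_resistivity_ohm_m, burial_depth_m, rise_time_us))
print(len(grid))  # 880
```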
An excerpt from the dataset [24] containing all combinations of previously described features along with the computed values of the type-A grounding system effective length (of each electrode) is provided in Table 1. A visualization of these data is slightly problematic since the dataset contains three feature columns and one response column, which would correspond to a 4D plot. Although such a plot is possible, we found that the readers would benefit most if we isolated, for example, all the data for the burial depth of 0.5 m. Note that a 3D surface plot has been generated in Figure 3 instead of only the data points. This visual interpolation was performed solely to improve the reader's viewing experience.

After performing all 880 simulations and determining the effective lengths of each type-A grounding electrode, a preliminary analysis of feature importance was performed using the Minimum Redundancy Maximum Relevance (MRMR) algorithm [29]. As expected, it was determined that not every selected feature has a significant impact on the effective length of the grounding electrode. Importance scores of all three selected features are depicted in Figure 4. Similar feature importance was also obtained by the ANOVA F-test, but those results are not provided here to avoid redundancy. Observing Figure 4, it can be seen that the soil resistivity dominates as the most important feature influencing the effective length of the grounding electrode, with an MRMR importance score of 1.7347. Rise time trails significantly behind, with a score of 0.0659, whereas the burial depth of the grounding electrode is practically inconsequential, as is evident from its MRMR importance score of 0.0015.

Nonlinear Regression of the Type-A Grounding System Effective Length
The first approach to developing a predictive interpolant model of the effective length of each electrode based on the available dataset [24] is the selection of a suitable mathematical model function with which we approximate the effect of the independent variables (features) on the electrode effective length. In order to test our approximation function properly, we first divided the 880 simulation results into a randomly chosen training set comprising 90% of the simulation results, whereas the remaining 10% of the results represent the test set. The main idea, as usual, is to use the training set to perform the regression procedure and then use the test set to validate the quality of the produced approximation function and its estimated coefficients. After a number of numerical experiments, a simple product of power functions was selected as the model function for predicting the effective length of each electrode of the type-A grounding system (Equation (2)), in which the βs are the unknown regression function coefficients. We fitted the selected model function to the input data from the training set using a nonlinear least-squares regression algorithm [30] and thus obtained optimized values of the unknown coefficients. These estimated coefficients, along with their standard errors, are given in Table 2.

In order to confirm the validity of the derived regression function, we conducted a comprehensive analysis of our regression model's performance using several regression quality metrics. This included creating scatter plots of actual vs. predicted values, as well as a residuals plot, for both the training and test sets. Additionally, we evaluated three standard regression metrics [31,32] for both datasets: the Root Mean Squared Error (RMSE), the R² value, and the Mean Absolute Error (MAE).
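The workflow described above (random 90%/10% split, fit of a product of power functions, evaluation with RMSE, R², and MAE) can be sketched as follows. Several things here are assumptions and not taken from the paper: the functional form β₀·ρ^β₁·T₁^β₂·d^β₃ with this coefficient indexing, the coefficient values, and the synthetic data, which merely stand in for the dataset [24]. The fit is also a log-linearized ordinary least-squares fit with NumPy, a simpler substitute for the nonlinear least-squares algorithm [30] used by the authors; it minimizes relative rather than absolute errors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the paper's dataset): 880 samples of (rho, T1, d).
n = 880
rho = rng.uniform(10.0, 3000.0, n)
t1 = rng.uniform(1.0, 10.0, n)
d = rng.uniform(0.25, 2.0, n)
true_beta = np.array([1.5, 0.45, 0.40, -0.02])      # hypothetical coefficients
l_eff = true_beta[0] * rho ** true_beta[1] * t1 ** true_beta[2] * d ** true_beta[3]
l_eff = l_eff * np.exp(rng.normal(0.0, 0.02, n))    # multiplicative noise

# Random 90 % / 10 % split into training and test indices.
idx = rng.permutation(n)
n_train = n * 9 // 10
train, test = idx[:n_train], idx[n_train:]


def design(rows):
    """Design matrix of the log-linearized model: [1, ln rho, ln T1, ln d]."""
    return np.column_stack(
        [np.ones(rows.size), np.log(rho[rows]), np.log(t1[rows]), np.log(d[rows])]
    )


coef, *_ = np.linalg.lstsq(design(train), np.log(l_eff[train]), rcond=None)
beta = np.concatenate([[np.exp(coef[0])], coef[1:]])


def predict(rows):
    return beta[0] * rho[rows] ** beta[1] * t1[rows] ** beta[2] * d[rows] ** beta[3]


def metrics(y_true, y_pred):
    resid = y_true - y_pred
    return {
        "RMSE": float(np.sqrt(np.mean(resid ** 2))),
        "R2": float(1.0 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2)),
        "MAE": float(np.mean(np.abs(resid))),
    }


train_metrics = metrics(l_eff[train], predict(train))
test_metrics = metrics(l_eff[test], predict(test))
print(train_metrics, test_metrics)
```

Reporting the same three metrics on both subsets, as the paper does, makes overfitting visible: a large gap between training and test values would indicate that the fitted coefficients do not generalize.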
Figure 5a,b depict the actual vs. predicted values of the effective length of the type-A grounding system obtained using Equation (2) for the training set and the test set, respectively. Similarly, Figure 6a,b depict the residuals of the predicted values obtained using Equation (2) for the training set and the test set, respectively. Observing Figure 5a, when the regression formula given by Equation (2) is applied to the training set with the optimized coefficient values given in Table 2, the produced results deviate moderately from the perfect-score line. They are, nevertheless, relatively accurate, although the data show clear signs of heteroscedasticity [33], i.e., heterogeneity of variance. The moderately accurate results provided by the regression function are further validated on the test set (Figure 5b), where the regression function defined by Equation (2) performs similarly. A different visualization of this model error is presented in Figure 6a,b, where the residuals of the regression function are depicted relative to the predicted values. It is evident that, for this case, the residuals vary within roughly ±10 m of the predicted responses, especially for greater values of the grounding system's effective length. Note that the regression function displays a certain amount of heteroscedasticity in both plot types: the variance progressively increases as the predicted values increase. A Breusch-Pagan test was performed for the regression function given by Equation (2) with the parameters given in Table 2, and the p-values obtained were 1.2231 × 10⁻⁵⁸ for the training set and 9.7 × 10⁻⁷ for the test set, which are well below the usual cut-off value of 0.05. Attempts to increase these p-values using the usual methods, such as input data transformation, had little to no effect, although this was to be expected considering the simplicity of the regression function in question.

The numerical regression quality metrics mentioned before confirm the previous visual analysis (Table 3). The R² value of the regression function is high for both the training and test sets, which is to be expected given that the results cluster around the perfect-score line. The average error, measured by the RMSE, is consistent across both sets at approximately 4 m. In conclusion, this simplest regression function yields moderately satisfactory results.

In addition to this simplest regression formula, we also attempted to fit the input data using a slightly more complex model function, a sum of two products of power functions (Equation (3)), in the hope of obtaining more accurate results. As before, we fitted the selected model function to the input data from the training set using a nonlinear least-squares regression algorithm and obtained optimized values of the unknown coefficients. These values, along with their standard errors, are given in Table 4.

Again, the regression quality metrics confirm the visual analysis of Figures 7 and 8 (Table 5): the R² is closer to 1, and the RMSE is almost three times lower for both the training and test sets. The other metrics reflect the same level of approximation quality.
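The Breusch-Pagan check used in this section can be sketched with the standard library alone. The version below is the studentized (Koenker) form with a single auxiliary regressor: the squared residuals are regressed on the predicted values and the statistic n·R² is compared with a chi-squared distribution with one degree of freedom. Both the choice of regressor and the synthetic residuals are assumptions; the paper does not state which variant or regressors were used.

```python
import math
import random


def breusch_pagan_pvalue(residuals, regressor):
    """Studentized Breusch-Pagan p-value with one auxiliary regressor."""
    n = len(residuals)
    squared = [r * r for r in residuals]
    mean_x = sum(regressor) / n
    mean_y = sum(squared) / n
    sxx = sum((x - mean_x) ** 2 for x in regressor)
    syy = sum((y - mean_y) ** 2 for y in squared)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(regressor, squared))
    r2_aux = (sxy * sxy) / (sxx * syy)        # R^2 of the auxiliary regression
    lm = n * r2_aux                           # Lagrange multiplier statistic
    return math.erfc(math.sqrt(lm / 2.0))     # chi-squared survival function, 1 dof


random.seed(1)
predicted = [random.uniform(5.0, 130.0) for _ in range(800)]
# Residual spread grows with the predicted value: heteroscedastic by construction.
hetero = [random.gauss(0.0, 0.03 * p) for p in predicted]
# Constant residual spread: homoscedastic by construction.
homo = [random.gauss(0.0, 1.0) for _ in predicted]

p_hetero = breusch_pagan_pvalue(hetero, predicted)
p_homo = breusch_pagan_pvalue(homo, predicted)
print(p_hetero, p_homo)
```

A p-value below 0.05 rejects the hypothesis of constant residual variance, which is how the very small p-values reported for the two regression formulas are interpreted in the text.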

Using Advanced Supervised ML Regression Models for a Better Fit
In an attempt to enhance the accuracy of the predictive model and, possibly, reduce the heteroscedasticity present in the models from the previous section, we utilized a comprehensive range of supervised ML regression algorithms. These included various types of regression trees (fine, medium, and coarse) [34], support vector machines with diverse kernels such as quadratic, cubic, and Gaussian kernels of varying coarseness [35], Gaussian process (GP) regression models with different kernels [36], kernel approximation regression models, ensembles of trees [37], and neural networks [38]. Note that we performed a random search procedure to optimize the selection of suitable hyperparameters for each tested ML regression model. We obtained the best results using the GP regression models, followed closely by the neural network approach, whereas the other ML algorithms did not prove to be suitable for our interpolation problem. Therefore, we decided to provide only the best fit, obtained using GP regression, in this section to avoid redundancy.
The mentioned GP regression is a well-known non-parametric Bayesian approach to regression that utilizes stochastic Gaussian processes and is generally considered an excellent choice for multidimensional interpolation problems. It completely eliminates the usage of the least-squares method in the regression analysis, which could possibly eliminate the previously observed heteroscedasticity of predicted results. In this paper, the GP regression model will be implemented using the scikit-learn library [39].
In essence, the GP regression model applied to any input data as an approximation model can be briefly described as a combination of a prior mean function and a covariance or kernel function:

$$f(\mathbf{x}) \sim \mathcal{GP}\left(m(\mathbf{x}),\, k(\mathbf{x}, \mathbf{x}')\right)$$

where $m(\mathbf{x})$ is the prior mean function and $k(\mathbf{x}, \mathbf{x}')$ is the covariance (kernel) function.

The regression quality metrics also reflect the visual analysis of Figures 9 and 10 (Table 6): the R² is much closer to 1, and the RMSE is almost nine times lower for both the training and test sets. In Table 6, the R² values are 1.000000 for the training set and 0.999980 for the test set. Although the predicted results look homoscedastic in nature, a Breusch-Pagan test was again performed to confirm the elimination of the previously observed heteroscedasticity. The trained and optimized GP regression model yielded p-values of 0.18147 for the training set and 0.087989 for the test set, which are above the usual cut-off value of 0.05. Therefore, we can conclude that when using the trained GP regression model to interpolate effective length values in the prescribed input data intervals, heteroscedasticity will be absent from the predicted results.
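To make the combination of a prior mean function and a kernel more concrete, the sketch below computes the GP posterior mean directly with NumPy. It is not the scikit-learn model used in the paper: the constant prior mean, the squared-exponential kernel, the fixed hyperparameters, and the two-feature toy target are all assumptions chosen for illustration, and no hyperparameter optimization is performed.

```python
import numpy as np


def rbf_kernel(a, b, length_scale, signal_var):
    """Squared-exponential covariance between the rows of a and the rows of b."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return signal_var * np.exp(-0.5 * sq_dist / length_scale ** 2)


def gp_predict(x_train, y_train, x_new, length_scale=0.5, signal_var=1.0, noise_var=1e-6):
    """Posterior mean of a GP with a constant prior mean and an RBF kernel."""
    prior_mean = y_train.mean()                         # constant prior mean m(x)
    k_tt = rbf_kernel(x_train, x_train, length_scale, signal_var)
    k_tt[np.diag_indices_from(k_tt)] += noise_var       # jitter for numerical stability
    chol = np.linalg.cholesky(k_tt)                     # k_tt = chol @ chol.T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train - prior_mean))
    k_nt = rbf_kernel(x_new, x_train, length_scale, signal_var)
    return prior_mean + k_nt @ alpha


def target(x):
    """Smooth toy response standing in for the effective length surface."""
    return np.sin(3.0 * x[:, 0]) + x[:, 1] ** 2


# Training points on an 8 x 8 grid over the unit square (features assumed pre-scaled).
axis = np.linspace(0.0, 1.0, 8)
x_train = np.array([[a, b] for a in axis for b in axis])
y_train = target(x_train)

rng = np.random.default_rng(0)
x_new = rng.uniform(0.0, 1.0, size=(200, 2))
pred = gp_predict(x_train, y_train, x_new)
max_err = float(np.max(np.abs(pred - target(x_new))))
print(f"largest interpolation error: {max_err:.2e}")
```

The example illustrates why GP regression suits smooth, gridded data such as the simulated effective lengths: with a suitable kernel, the posterior mean interpolates between grid points with very small error.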

Discussion and Conclusions
In this section, we will briefly summarize the main features of the three regression procedures previously presented. We developed two mathematical regression formulas and an ML regression model based on GP to predict the effective length of a subset of the type-A grounding system with varying degrees of accuracy. All developed procedures are practically interpolant predictive models, which are valid for the following ranges of input parameters: 10 Ωm ≤ ρ ≤ 3000 Ωm, 1 µs ≤ T₁ ≤ 10 µs, and 0.25 m ≤ d ≤ 2 m. We believe these intervals encompass most practical cases of potential use.
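Since all three models are interpolants, it may be useful to guard any prediction with a range check. The helper below is a hypothetical convenience function; only the numerical limits are taken from the text.

```python
# Validity limits of the interpolant models, as stated in the text.
VALID_RANGES = {
    "rho_ohm_m": (10.0, 3000.0),
    "rise_time_us": (1.0, 10.0),
    "depth_m": (0.25, 2.0),
}


def out_of_range_inputs(rho_ohm_m, rise_time_us, depth_m):
    """Return the names of the inputs that fall outside the validated intervals."""
    values = {"rho_ohm_m": rho_ohm_m, "rise_time_us": rise_time_us, "depth_m": depth_m}
    return [
        name
        for name, value in values.items()
        if not VALID_RANGES[name][0] <= value <= VALID_RANGES[name][1]
    ]


print(out_of_range_inputs(500.0, 5.0, 0.5))    # []
print(out_of_range_inputs(5000.0, 5.0, 3.0))   # ['rho_ohm_m', 'depth_m']
```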
The analyzed subset of the type-A grounding system consists of two opposing grounding electrodes. This configuration is used mainly in grounding systems of very small, isolated objects such as hydrological or meteorological stations. These kinds of objects, due to their size, most often have only one down-conductor, whereas the standard [16] prescribes a minimum of two ground electrodes per grounding system, hence the centrally fed system observed here. The standard [16] also prescribes curves of minimal length ℓ₁ of type-A grounding electrodes depending only on the soil resistivity and the Lightning Protection Level (LPL) chosen. In our paper, we analyzed the effective lengths of the electrodes for different values of soil resistivity, rise time, and even burial depth for this subset of the type-A grounding system. Although a comparison between these two lengths would be interesting, it is somewhat difficult to perform since the standard provides, for example, a curve of minimal length for LPL I that is independent of the lightning current rise time and the burial depth. If we consider, for example, our subset of the type-A grounding system buried at a depth of 0.5 m in a soil of resistivity 3000 Ωm, our analysis yields effective lengths of each of the electrodes ranging from 78.25 m for a 1 µs rise time to 134 m for a 10 µs rise time. The standard [16] yields a minimal length ℓ₁ of the grounding electrode of 80 m for all cases of our chosen scenario. On the other hand, our proposed regression formulas and the ML regression model based on GP provide the readers with optimal lengths of the grounding electrodes for the considered subset of the type-A grounding system.
All the presented regression approaches yield satisfactory results, especially when the accuracy of the prediction is put into context with the complexity of usage of that particular approach. Naturally, the simplest regression formula, which is the easiest to use, produces results with the highest deviations between the predicted effective length values and the actual values, although these deviations are not substantial. The slightly more complex regression formula produces significantly lower deviations, as demonstrated in the previous sections. Both formulas are intended for quick engineering estimates.
In contrast, the last approach, a GP-based ML regression model, is designed for more demanding estimations of the effective length of the considered grounding system where accuracy is more important. This model provides significantly more accurate prediction results compared to the other two models, as seen in the previous section. We presented only one ML regression model in the paper: the one that consistently yielded the best prediction results in our numerical experiments. Many other models were tested but were intentionally excluded from the paper to maintain clarity and avoid clutter.
It is worth pointing out at the end of the paper that in our future work we plan to extend our effective length analysis to complex grounding systems. These include, for example, end-fed simple grounding rods as well as various interpretations of the type-A and type-B grounding systems according to [16]. In addition to this, we will explore

Figure 2. Methodology of obtaining the effective length dataset.

Figure 3. Visualization of the dataset for a burial depth of 0.5 m.

Figure 4. Feature importance scores sorted using the MRMR algorithm.

Figure 7a,b depict the actual vs. predicted values obtained using Equation (3) for the training set and the test set, respectively. Figure 8a,b depict the residuals vs. predicted results obtained using Equation (3) for the training set and the test set, respectively. It is immediately noticeable that by introducing a sum of two products of power functions, more accurate predicted results were obtained. Results are clustered much closer to the perfect-score line for both the training set and the test set (Figure 7a,b). This is also seen when observing the residuals plot (Figure 8a,b), where the maximum residual values peak at approximately 4 m. Note that in this case, heteroscedasticity is still present in the model, which is observable from the previous figures and from the p-values of the Breusch-Pagan test: 3.4499 × 10⁻³⁸ for the training set and 0.000237 for the test set.

Figure 9. Actual vs. predicted values obtained using the GP regression model: (a) Training set, (b) Test set.

Figure 10. Residuals vs. predicted values obtained using the GP regression model: (a) Training set, (b) Test set.

Table 2. Estimated values of coefficients for the regression function given by Equation (2).

Table 3. Regression quality metrics for the first regression function given by Equation (2).

Table 4. Estimated values of coefficients for the regression function given by Equation (3).

Table 5. Regression quality metrics for the second regression function given by Equation (3).

Table 6. Regression quality metrics for the trained GP regression model on the training set and the test set.