The General Regression Neural Network Based on the Fruit Fly Optimization Algorithm and the Data Inconsistency Rate for Transmission Line Icing Prediction

Abstract: Accurate and stable prediction of icing thickness on transmission lines is of great significance for ensuring the safe operation of the power grid. In order to improve the accuracy and stability of icing prediction, an innovative prediction model based on the generalized regression neural network (GRNN) and the fruit fly optimization algorithm (FOA) is proposed. Firstly, a feature selection method based on the data inconsistency rate (IR) is adopted to select the optimal features, which aims to reduce redundant input vectors. Then, the FOA is utilized to optimize the smoothing factor of the GRNN. Lastly, the icing forecasting method FOA-IR-GRNN is established. Two cases in different locations and different months are selected to validate the proposed model. The results indicate that the new hybrid FOA-IR-GRNN model presents better accuracy, robustness, and generality in icing forecasting.


Introduction
Transmission line ice coating can cause many types of accidents, including flashover of ice-covered insulators, breakage of the ground line, and the collapse of towers [1]. These accidents seriously affect the stability and security of power system operation. Since the recording of icing accidents began, cases of transmission line ice coating causing the fall of high-voltage (HV) transmission line towers as well as wire breakages have been reported at home and abroad. Some accidents are serious. In January 1998, a week-long ice disaster occurred in Canada, which caused a blackout for one million users [2]. January 2008 witnessed four successive large-scale rainy and snowy storms in the south of China. The electricity grid was seriously iced and power lines were repeatedly broken, resulting in a direct economic loss of 10.45 billion CNY [3]. Therefore, establishing a prediction model of icing thickness and accurately predicting the icing thickness of transmission lines are of great significance for ensuring the security and stability of the power grid.
Currently, scholars at home and abroad are researching icing thickness prediction of transmission lines. They have put forward a variety of forecasting models, mainly including mathematical physics prediction models, statistical prediction models, and intelligent prediction models. The mathematical physics prediction model mostly predicts the icing thickness of the transmission line based on the fluid motion law and the heat transfer mechanism of wire icing [4]. The authors of [5], from the view of aerodynamics and thermodynamics, establish an icing forecasting model including the super-cooled water drop and the heat transfer process on ice. The authors of [6] point out that the icing of transmission lines is the result of the coupling effect of thermodynamics, hydromechanics, and the electric current and field; on this basis, a physics prediction model of icing thickness is built. In addition, typical mathematical physics prediction methods for icing thickness include the Imai model [7], the Goodwin model [8], and the Lenhard model [9]. However, because some of the parameters in the mathematical physics prediction model are difficult to obtain through measurement on the actual line, such models are difficult to apply directly to icing prediction for actual transmission lines. The statistical prediction model is based on the statistical laws of icing thickness of transmission lines [10], mainly including the extrema prediction model [11], the Markov chain prediction model [12], and so on. However, the icing thickness prediction model based on data statistics cannot be extended to other transmission lines in different geographical environments, so the effect of this model is not satisfactory. Therefore, against the background of the rapid development of artificial intelligence technology, it is more significant to predict the icing thickness of transmission lines by using intelligent prediction methods.
Intelligent prediction methods mainly include artificial neural networks (ANNs) [13] and the support vector machine (SVM) [14]. Here, the back-propagation neural network (BPNN) is typical of ANNs. Luo et al. [15] presented an icing forecasting model of the BPNN based on Levenberg-Marquardt and obtained a higher prediction accuracy than the statistical forecasting model. However, the BPNN has many parameters to set and can easily fall into over-fitting or a local optimum. To avoid the local optimum problem, some scholars began to adopt the SVM model in the field of icing prediction. Li et al. [16] proposed a model based on the SVM for icing forecasting, and its generalization ability is better than that of the model based on the BPNN. Ma et al. [17] introduced a short-term prediction model of icing thickness based on the grey SVM, and pointed out that the model can achieve a better prediction effect in ice-prone areas. However, it is difficult for the SVM model to deal with large-scale training samples, so it cannot obtain ideal prediction accuracy. The generalized regression neural network (GRNN) is a kind of radial basis function neural network proposed by Specht, which has a strong ability for nonlinear mapping [18]. Compared with the BPNN and the SVM, the GRNN has fewer adjustment parameters, does not easily fall into local minima, and is good at processing large-scale training samples. In addition, the GRNN has an advantage in forecasting volatile data. Therefore, the GRNN has been widely employed in the field of prediction, such as electricity price forecasting [19], energy consumption forecasting [20], and traffic flow forecasting [21]. Zhang et al. [19] introduced a novel hybrid forecasting model using the GRNN combined with the wavelet transform for electricity price forecasting, and this model obtained better forecasting performance than the BPNN and SVM. Zhao et al. [20] utilized the GRNN model to forecast annual energy consumption due to its good ability to deal with nonlinear problems. Leng et al. [21] established a short-term forecasting model of traffic flow based on the GRNN, and it has stronger approximation capability and higher forecasting accuracy than forecasting models based on the radial basis function (RBF) and back-propagation (BP) neural networks.
However, it is difficult to determine the smoothing factor in the GRNN model exactly, and the selection of this parameter has a significant influence on its forecasting performance. Intelligent optimization algorithms such as the genetic algorithm (GA) [22] and particle swarm optimization (PSO) [23] are usually adopted to select parameters for forecasting models. Gao et al. [22] proposed the GA to optimize the initial weights and thresholds of the BPNN for housing price prediction, which accelerated the convergence rate of the BPNN and improved the prediction accuracy of house prices. Ye [23] presented a kernel extreme learning machine model based on particle swarm optimization (PSO-KELM) to predict the power interval of wind power; the PSO algorithm is utilized to optimize the output weights of the KELM, and satisfactory prediction results are obtained. The above algorithms effectively improved the forecasting accuracy but also showed the drawback of easily falling into local optima. In order to overcome these drawbacks, the fruit fly optimization algorithm (FOA) [24], based on the food-finding behaviors of fruit flies, was proposed by Pan in 2011. This method only needs a few parameters to be set and performs optimum searching at a relatively high speed, with wide applications [25]. Sun et al. [26] introduced a new model based on the wavelet transform and the least-squares support vector machine (LSSVM) optimized by the FOA for short-term load forecasting and compared the forecasting results between the proposed model and the least-squares SVM optimized by PSO, which demonstrated that the FOA performed better than PSO. In addition, Li et al. [27] presented an LSSVM-based annual electric load forecasting model optimized by the FOA, and the proposed model obtained better forecasting effectiveness than the LSSVM optimized by the coupled simulated annealing algorithm (CSA). Hence, the FOA is utilized here to adjust the smoothing factor of the GRNN model.
In addition, many factors can influence the formation of icing on the transmission line. If all the influencing factors are used as input indicators of the forecasting model, there will be a lot of redundant data [28]. Hence, feature selection is also of great significance. Feature selection is about identifying and selecting the appropriate input vector of the prediction model to reduce redundant data and improve computational efficiency. The inconsistency rate (IR) model divides the feature set into many feature subsets and calculates the minimum inconsistency under each partition mode, so as to determine the optimal feature subsets and complete the feature selection [29]. Ma et al. [30] employed the IR model to select the input features of a short-term load forecasting model; their simulation results demonstrated that the IR model gave the prediction model an input vector of strong pertinence and reduced the redundancy of the input information, thus improving the accuracy of load forecasting. Liu et al. [31] also selected the optimal features for forecasting power load by adopting the IR model so as to reduce the redundancy of input vectors, and the IR model obtained an ideal feature selection effect. Using the IR model for feature selection can not only eliminate redundant features by utilizing the inconsistency of the data set, but also take the correlative characteristics among the features into consideration; it does not ignore the relationships among features, so that all the statistical information can be well expressed by the selected optimal features. Hence, this paper adopts the IR model for feature selection.
According to the above research, a GRNN model integrating the IR with the FOA is proposed. It is the first time these three models are combined for icing thickness forecasting, and several comparison methods are utilized to validate the effectiveness of the proposed hybrid model. This paper is organized as follows: Section 2 introduces the implementation process of the IR and the GRNN optimized by the FOA. Section 3 presents the evaluation criteria of the results. Section 4 provides a case to validate the proposed model. Section 5 analyzes another case in a different place at another time to prove the generalization of the forecasting method. Section 6 presents the conclusions of this paper.

Fruit Fly Optimization Algorithm
The FOA is a new global optimization method based on foraging behaviors. There are two steps in the food searching of a fruit fly swarm: (1) use the olfactory organ to collect odors floating in the air and fly towards the food location; and (2) use vision to find the food and the gathering positions of other fruit flies, and fly in that direction. The iterative food searching process of the fruit fly swarm is presented in Figure 1.
The steps of the FOA are as follows:
(1) Initialize the population size Sizepop, the maximum number of iterations Maxgen, and the position coordinates (X_0, Y_0) of the random fruit fly population.
(2) Give each individual fruit fly a random flight direction and step size so that it can search for food by smell:
X_i = X_0 + RandomValue, Y_i = Y_0 + RandomValue
(3) Since the fruit flies cannot obtain the food position, the distance Dist_i between the individual and the origin is estimated first, and the taste concentration determination value S_i, the reciprocal of the distance, is calculated:
Dist_i = sqrt(X_i^2 + Y_i^2), S_i = 1/Dist_i
(4) Substitute the taste concentration determination value S_i into the fitness function to determine the taste concentration Smell_i of the individual position:
Smell_i = Fitness(S_i)
(5) Identify the individual with the highest taste concentration among the fruit fly population, including its concentration and coordinates:
[bestSmell, bestIndex] = max(Smell_i)
(6) Retain the maximum taste concentration value bestSmell and its individual coordinates. The fruit fly population uses vision to fly in that direction:
X_0 = X(bestIndex), Y_0 = Y(bestIndex)
Then the stage of iterative refinement is entered: repeat steps (2)-(5), judge whether the maximum taste concentration is superior to that of the previous generation and whether the current iteration count is less than the maximum number of iterations Maxgen, and if so, execute step (6).
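The paper gives no implementation, but the six steps above can be sketched as follows. The quadratic fitness function and the search ranges used here are illustrative placeholders, not the paper's GRNN-based fitness:

```python
# Minimal sketch of the fruit fly optimization algorithm (FOA) for a
# one-dimensional parameter search, following steps (1)-(6) above.
import math
import random

def foa(fitness, sizepop=20, maxgen=100):
    # Step (1): random initial swarm location
    x0, y0 = random.uniform(0, 1), random.uniform(0, 1)
    best_smell, best_s = -float("inf"), None
    for _ in range(maxgen):
        smells, candidates = [], []
        for _ in range(sizepop):
            # Step (2): random flight direction and step size
            xi = x0 + random.uniform(-1, 1)
            yi = y0 + random.uniform(-1, 1)
            # Step (3): distance to origin and taste concentration value
            dist = math.sqrt(xi ** 2 + yi ** 2)
            s = 1.0 / dist
            # Step (4): evaluate the smell (taste) concentration
            smells.append(fitness(s))
            candidates.append((xi, yi, s))
        # Step (5): best individual of this generation
        gen_best = max(range(sizepop), key=lambda i: smells[i])
        # Step (6): keep it only if it improves on the previous generation
        if smells[gen_best] > best_smell:
            best_smell = smells[gen_best]
            x0, y0, best_s = candidates[gen_best]
    return best_s, best_smell

# Toy usage: maximize -(s - 0.5)^2, whose optimum lies at s = 0.5
s_opt, f_opt = foa(lambda s: -(s - 0.5) ** 2)
```

In the paper, S_i plays the role of the GRNN smoothing factor and the fitness is built from the training-sample prediction accuracy and the number of selected features.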

Data Inconsistency Rate
The aim of feature selection under large amounts of historical data of transmission line icing is to distinguish the data characteristics most strongly correlated with the icing thickness, so as to ensure that the input vector of the icing prediction model has strong pertinence, reducing the redundancy of input information and consequently improving the accuracy of icing prediction for transmission lines. The inconsistency rate of the data can accurately describe the discrete characteristics of the input features. Different feature patterns can be obtained by different division modes, and different frequency distributions can be obtained by different partition patterns. The calculation of the IR can be used to measure the ability of a feature set to distinguish data categories: the smaller the data IR is, the stronger the classification ability of the feature vector is.
To perform feature selection with the inconsistency method, it is necessary to know how the inconsistency rate is computed. Assume that the collected icing thickness data have g characteristics (such as temperature, humidity, wind speed, etc.), which are respectively expressed as G_1, G_2, ..., G_g; Γ stands for the full feature set and L stands for a feature subset of Γ. It is stipulated that the qualification M has c categories and N data instances according to the degree of severity of the lines' icing. Z_ji stands for the eigenvalue of instance j corresponding to feature G_i, and λ_i stands for the value of M, so a data instance can be expressed as [Z_j, λ_i], where Z_j = [Z_j1, Z_j2, Z_j3, ..., Z_jg]. The data inconsistency rate of a feature subset is then calculated as
τ = (1/N) · Σ_{k=1..p} ( Σ_{l=1..c} f_kl − max_l f_kl )
where f_kl is the number of data instances of pattern X_k belonging to category l in the data set, and X_k means that there are in total p patterns of the feature partition range (k = 1, 2, ..., p; p ≤ N). The steps for using the inconsistency rate to perform feature selection are as follows: (1) Initialize the optimal feature subset as the null set Γ = {}.
(2) Calculate the inconsistency rate of the data sets G_1, G_2, ..., G_g in the feature subsets made up of the remaining features of each subset. (3) Select the feature G_i which corresponds to the minimum inconsistency rate as the optimum feature, and then update the optimum feature subset to Γ = {Γ, G_i}. (4) Calculate the inconsistency rate statistics table of the feature subsets and arrange them from small to large. (5) Select the feature subset L with the smallest number of features as the optimal feature subset if it satisfies the condition that τ_L ≈ τ_Γ or τ_L/τ_L′ is the minimum of the inconsistency-rate ratios over all adjacent feature subsets, where L′ is an adjacent feature subset of L.
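As an illustration of the inconsistency-rate formula, the following minimal sketch groups instances by their value pattern on a candidate feature subset and counts the instances outside each pattern's majority class; the toy icing-severity data are invented for the example:

```python
# Sketch of the data inconsistency rate tau for a feature subset,
# assuming discretized feature values. For each distinct pattern X_k of
# values on the subset, every instance outside the pattern's majority
# class counts as inconsistent; tau is that count divided by N.
from collections import Counter, defaultdict

def inconsistency_rate(instances, labels, subset):
    # Group instances by their value pattern on the chosen feature subset
    patterns = defaultdict(list)
    for z, lam in zip(instances, labels):
        key = tuple(z[i] for i in subset)
        patterns[key].append(lam)
    inconsistent = 0
    for class_labels in patterns.values():
        counts = Counter(class_labels)
        # sum over classes of f_kl, minus the majority-class count
        inconsistent += len(class_labels) - max(counts.values())
    return inconsistent / len(instances)

# Toy data: feature 0 separates the two severity classes perfectly,
# feature 1 is noise, so the subset {0} attains tau = 0
Z = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = ["light", "light", "severe", "severe"]
assert inconsistency_rate(Z, y, [0]) == 0.0
```

The greedy loop of steps (2)-(3) would call this function for each remaining feature and add the one with the smallest τ to the subset.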
Calculating the inconsistency rate can not only eliminate redundant features by utilizing the inconsistency of the data set, but also take the correlative characteristics among the features into consideration; it does not ignore the relationships among features, so that all the statistical information can be well expressed by the selected optimal features.

Generalized Regression Neural Network
The general regression neural network (GRNN) was proposed by the American scholar Donald F. Specht in 1991, with nonlinear regression analysis as its theoretical basis. As shown in Figure 2, the GRNN consists of four components: (1) The input layer: the original variables enter the network, correspond to the neurons one by one, and are submitted to the next layer. (2) The pattern layer: a nonlinear transformation is applied to the values received from the input layer.
The transfer function of the ith neuron in the pattern layer is
P_i = exp( −(X − X_i)^T (X − X_i) / (2σ^2) ), i = 1, 2, ..., n
where X represents the input variable, X_i is the learning sample corresponding to the ith neuron, and σ is the smoothing parameter. (3) The summation layer: calculate the sum and the weighted sum of the pattern-layer outputs.
The summation layer contains two types of neurons. One neuron, S_A, performs an arithmetic summation of the outputs of all pattern layer neurons, with the connection weight from each pattern layer neuron to this neuron equal to 1. Its transfer function is
S_A = Σ_{i=1..n} P_i
The outputs of all neurons in the pattern layer are weighted and summed to obtain the other neurons S_Nj in the summation layer. Their transfer function is
S_Nj = Σ_{i=1..n} y_ij · P_i, j = 1, 2, ..., m
where y_ij is the connection weight between the ith neuron in the pattern layer and the jth neuron in the summation layer; y_ij is the jth element of the ith output sample y_i.
(4) The output layer: the forecasting results are derived. The output of each neuron is
y_j = S_Nj / S_A, j = 1, 2, ..., m
where y_j is the output of the jth neuron.
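The four layers described above reduce to a kernel-weighted average of the training targets. A minimal forward-pass sketch, with illustrative training data, might look like:

```python
# Compact sketch of the GRNN forward pass: pattern-layer Gaussian
# kernels P_i, the summation layer (S_A and S_N), and the output layer
# y = S_N / S_A.
import math

def grnn_predict(x, X_train, y_train, sigma):
    # Pattern layer: Gaussian kernel around each training sample
    p = [math.exp(-sum((a - b) ** 2 for a, b in zip(x, xi)) / (2 * sigma ** 2))
         for xi in X_train]
    # Summation layer: plain sum S_A and weighted sum S_N
    s_a = sum(p)
    s_n = sum(w * yi for w, yi in zip(p, y_train))
    # Output layer: kernel-weighted average of the training targets
    return s_n / s_a

# With a small smoothing factor the prediction collapses onto the
# nearest training sample
X = [(0.0,), (1.0,)]
y = [2.0, 4.0]
pred = grnn_predict((0.05,), X, y, sigma=0.1)
```

This makes the role of the smoothing factor concrete: small σ fits the training data tightly, large σ averages over many samples, which is why its value matters so much to forecasting performance.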

The Forecasting Model of FOA-IR-GRNN
The icing thickness forecasting model combining the FOA, IR, and GRNN is constructed as illustrated in Figure 3. It can be seen from the figure that the icing prediction model proposed in this paper mainly includes three parts: the first part is the feature selection based on the inconsistency rate, the second part is the sample training based on the GRNN model, and the third part is the icing prediction based on the GRNN model. When the established feature subset L cannot satisfy the algorithm stopping criteria, the program continues to cycle until reaching the expected precision and then outputs the optimal feature subset. Therefore, in the icing prediction model proposed in this paper, the purpose of the first part is to find the optimal feature subset and the best value of the smoothing factor in the GRNN by iterative calculation. The purpose of the second part is to calculate the prediction accuracy of the training samples in every iteration, so that the fitness function can be calculated. In the third part, the optimum feature subset and parameters obtained from the above two parts are utilized to perform the final prediction of the icing thickness of the test samples by retraining the GRNN model.

The specific steps for icing thickness prediction are listed as follows:
(1) Determine the initial candidate features. In this paper, ambient temperature, relative humidity, wind speed, wind direction, light intensity, atmospheric pressure, altitude, condensation height, conductor direction, the height of conductor suspension, load current, precipitation, and conductor surface temperature are selected as the candidate features of the factors that influence icing. In addition, the icing thickness value, temperature, relative humidity, and wind speed at time point t − i (i = 1, 2, 3, 4) are also selected as main influencing factors of line icing. All the initial candidate features are shown in Table 1.
(2) Initialize the feature subset. In the IR algorithm, the optimal feature subset needs to be initialized as an empty set Γ = {}.
(3) Calculate the inconsistency rate of the data sets G_1, G_2, ..., G_g in the feature subsets made up of the remaining features of each subset, select the feature G_i which corresponds to the minimum inconsistency rate as the optimum feature, and update the optimum feature subset to Γ = {Γ, G_i}.
(4) Get the optimal feature subset and the best value of the smoothing factor in the GRNN. Put the current feature subsets into the GRNN model, and calculate the prediction accuracy during the learning process of the circular training samples. Then the fitness function Fitness(j) can be worked out. The optimum feature subset is obtained by comparing the fitness function across generations and judging whether the iterations have reached the algorithm stopping conditions.
If not, re-initialize a new feature subset and put it into a new circulation until the optimum feature subset meeting all the conditions is obtained. It should be noted that the smoothing factor of the GRNN also needs to be optimized, and its initial value is assigned randomly. In this paper, a fitness function is established based on the two factors of prediction accuracy and feature selection:
Fitness(j) = a · Numfeature(x_i) − b · r(j)
where Numfeature(x_i) is the number of optimum features selected in each iteration, both a and b are constants in [0, 1], and r(j) represents the prediction accuracy of ice cover thickness at each iteration. The number of selected features is proportional to the fitness function over all iterations, and the accuracy of the icing prediction is inversely proportional to the fitness function. Different smoothing factors will result in different forecasting results and different prediction accuracies, indicating that the smoothing factor of the GRNN also influences the value of the fitness function Fitness(j). Hence the optimal feature subset and the best value of the smoothing factor in the GRNN are obtained at the same time in this step.
(5) Stop optimization and start prediction. The circulation ends at the maximum number of iterations.
Here, the optimum feature subset and the best value of the smoothing factor can be substituted into the GRNN model for icing thickness forecasting. (In Table 1, ST represents the surface temperature of the transmission line.)

Performance Evaluation Index
The primary issue is to determine which forecasting model outperforms the others; the performance of prediction models is usually assessed by statistical criteria: the relative error (RE), root mean square error (RMSE), mean absolute percentage error (MAPE), and average absolute error (AAE). The smaller the values of these four indicators are, the better the forecasting performance is. Furthermore, the indicators RMSE, MAPE, and AAE reflect the overall error of the prediction model and the degree of error dispersion: the smaller the values of these three indicators are, the more concentrated the distribution of errors is. These four error indexes are defined as follows:
RE_t = (y*_t − y_t) / y_t × 100%
RMSE = sqrt( (1/N) Σ_{t=1..N} ((y_t − y*_t)/y_t)^2 ) × 100%
MAPE = (1/N) Σ_{t=1..N} |(y_t − y*_t)/y_t| × 100%
AAE = ( (1/N) Σ_{t=1..N} |y_t − y*_t| ) / ( (1/N) Σ_{t=1..N} y_t ) × 100%
where y_t and y*_t are the actual and forecast icing thickness at time point t, respectively, and N refers to the number of data groups.
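Under the definitions above, with all indexes expressed in percent to match the magnitudes reported later, the four error indexes can be computed as in this sketch; the exact normalization used in the paper's formulas is an assumption here:

```python
# Sketch of the four error indexes (RE, RMSE, MAPE, AAE) in percent.
import math

def error_indexes(actual, forecast):
    n = len(actual)
    rel = [(f - a) / a for a, f in zip(actual, forecast)]
    re = [100 * r for r in rel]                            # RE per point
    rmse = 100 * math.sqrt(sum(r ** 2 for r in rel) / n)   # relative RMSE
    mape = 100 * sum(abs(r) for r in rel) / n              # MAPE
    aae = 100 * (sum(abs(f - a) for a, f in zip(actual, forecast)) / n) \
          / (sum(actual) / n)                              # AAE
    return re, rmse, mape, aae

# Toy usage with invented actual/forecast icing thickness values
re, rmse, mape, aae = error_indexes([10.0, 20.0], [10.1, 19.8])
```

For a symmetric ±1% error pattern like the toy input, all three aggregate indexes come out near 1%, which is the kind of magnitude reported for the proposed model in Figure 8.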

Data Collection and Pretreatment
In 2008, China was hit by a disaster of frozen rain and snow rarely seen in history. It brought huge losses to life and seriously affected the national economy. Hunan Province was one of the worst-hit provinces in this icing disaster. During the frozen period, icing accidents brought down 182 towers on 500-kV power transmission lines, 633 towers on 220-kV lines, 1427 towers on 110-kV lines, 1064 towers on 35-kV lines, and 63,036 towers on 10-kV lines. On lines of 10 kV and above, 50,000 wires were broken. Yueyang and Loudi (cities in Hunan Province) as well as other areas had large-area power outages. The Hunan power grid suffered the most serious threat in its history, and the direct economic losses were more than 1 billion CNY. Therefore, this paper chooses transmission lines of the Hunan Province power grid for the empirical analysis.
In this paper, the power transmission line named "Kunxia line" in Yueyang, Hunan Province, is selected as the case to verify the effectiveness of the proposed model. All the data are provided by the Key Laboratory of Disaster Prevention and Mitigation of Power Transmission and Transformation Equipment (Changsha, China).
The data from the "Kunxia line" cover 10 January 2008 to 12 January 2008 and include 288 data groups, collected at 15-min intervals. The first 230 groups are adopted as the training samples and the latter 58 are utilized as the testing samples in Case 1. The main micro-meteorology data, including temperature, wind speed, and humidity, are shown in Figure 4.
In order to better train the proposed model and ensure prediction accuracy, it is important to normalize all the original data into the range [0, 1], using the following equation:
z_i = (x_i − x_min) / (x_max − x_min)
where x_i is the actual value; x_min and x_max are the minimum and maximum values of the sample data, respectively; and z_i represents the value of the adjusted ith sample point.
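The normalization equation translates directly into code; the sample values here are illustrative:

```python
# Min-max normalization of a sample series into [0, 1],
# z_i = (x_i - x_min) / (x_max - x_min)
def normalize(samples):
    x_min, x_max = min(samples), max(samples)
    return [(x - x_min) / (x_max - x_min) for x in samples]

z = normalize([2.0, 4.0, 6.0])  # → [0.0, 0.5, 1.0]
```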

Feature Selection
Based on the IR model, this section selects the optimal feature subset and determines the input indexes of the prediction model. This paper uses Matlab R2014b for programming; the test platform is an Intel Core i5-6300U with 4 GB of memory running the Windows 10 Professional Edition system. It can be seen from Figure 5 that the FOA converges when the number of iterations is 51, and the optimal fitness function value is −0.88; at this time the prediction accuracy of the training sample reaches 98.6%. This shows that through the learning and training of the algorithm, the fitting ability of the GRNN is strengthened, and the prediction accuracy of the training samples is the highest. Moreover, when the FOA runs for the 51st time, the number of selected features also tends to be stable. It can be concluded that the algorithm eliminates 23 redundant features from the 29 candidate features, and the final input features are the tth time point's ambient temperature, relative air humidity, and wind speed, and the (t − 1)th time point's icing thickness, ambient temperature, and relative air humidity.
In Figure 5, the accuracy curve describes the prediction accuracy of the training samples produced by the GRNN in different iterations. The fitness curve describes the fitness function values calculated during the iterations. The number of selected features indicates the optimal number of features calculated by the IR model in the convergence process, and the number of feature reductions is the number of features that the FOA eliminates during the convergence process.

The GRNN for Icing Forecasting
After the optimal feature subset is obtained, the input vector is put into the model proposed in this paper for training and testing. The smoothing factor of the GRNN model, calculated by the running program, is 0.0031.
A k-fold cross-validation (K-CV) test is conducted here, so as to show whether the forecasting results of the proposed model are obtained at a local or global optimum and whether the proposed model can be generalized to unseen data. The K-CV method randomly divides the samples into k disjoint subsets of roughly equal size. Using k − 1 subsets, a model is established for a given set of parameters, and the RMSE on the remaining subset is used to evaluate the performance of the parameters. The procedure is repeated k times, so that each subset is tested once. Hence, the 288 groups of data are randomly divided into 12 disjoint datasets of 24 groups each. After 12 runs, every sub-dataset has been tested and the RMSE of each sample is obtained, as shown in Table 2. From Table 2, it can be found that the average RMSE and the RMSE standard deviation of the proposed model are 0.0122 and 0.0010, respectively. This indicates that the validation error of the icing prediction model proposed in this paper can reach its global minimum.
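The K-CV procedure described above can be sketched as follows; the stand-in model that predicts the training-target mean is purely illustrative, not the paper's GRNN:

```python
# Sketch of k-fold cross-validation: shuffle, split into k disjoint
# folds, train on k-1 folds and score the RMSE of the held-out fold.
import math
import random

def k_fold_rmse(X, y, k, fit_predict):
    idx = list(range(len(X)))
    random.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]        # k disjoint subsets
    scores = []
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        preds = fit_predict([X[j] for j in train], [y[j] for j in train],
                            [X[j] for j in test])
        # RMSE of the held-out fold for this split
        scores.append(math.sqrt(sum((y[j] - p) ** 2
                                    for j, p in zip(test, preds)) / len(test)))
    return scores

# Stand-in model: predict the training-target mean for every test point
mean_model = lambda Xtr, ytr, Xte: [sum(ytr) / len(ytr)] * len(Xte)
scores = k_fold_rmse(list(range(12)), [1.0] * 12, 4, mean_model)
```

In the paper's setting, X and y would be the 288 normalized data groups split into k = 12 folds, and fit_predict would retrain the FOA-IR-GRNN on each training split.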
In order to verify the performance of the proposed model, this paper employs the GRNN model that is not optimized by the FOA, together with the mature BP neural network model and the SVM model, as contrast experiments, supported by the test sample data in Section 4.1. In addition, the FOA-GRNN model without the IR model for feature selection is also utilized for icing forecasting so as to demonstrate the respective effects of the IR and the FOA. The smoothing factor of the single GRNN model is 0. The actual values and forecasting values of the GRNN, BPNN, SVM, FOA-GRNN, and the model presented in this paper are shown in Figure 6. The relative error of each model is shown in Figure 7. Figure 8 displays the RMSE, MAPE, and AAE of each prediction model. Table 3 displays part of the predicted values and errors.
Figure 6 and Table 3 describe the forecasting results of the five prediction models together with the actual icing thickness. Figure 6 shows the relative distance between the predicted and actual values of each model. In general, the overall forecasting trends of the five models are close to the actual values. The forecasting curve of the proposed model is the closest to the actual curve, whereas the other prediction curves show some deviation. The forecasting curve of the FOA-GRNN is closer to the actual curve than that of the GRNN alone, demonstrating that the FOA improves the forecasting performance of the GRNN. However, the prediction accuracy of the FOA-GRNN model is not as good as that of the FOA-IR-GRNN model, indicating that the IR-based feature selection method can further improve the forecasting effectiveness of the GRNN. In addition, the forecasting curve of the GRNN model is closer to the actual curve than those of the BPNN and SVM models, indicating that the GRNN performs better than the BPNN and SVM for icing forecasting.
Figure 7 reflects the relative error distribution of the five models and shows the differences in prediction performance more clearly. The relative error (RE) ranges [−3%, 3%] and [−1%, 1%] are popularly regarded as a standard to evaluate the performance of a prediction model [32]. From Figure 7, we can obtain that: (1) there are only nine relative error values of the BPNN model in the range of [−3%, 3%] and only one value in the range of [−1%, 1%]; the maximum relative error is 4.99% at the 24th sample point, while the minimum is −4.98% at the sixth sample point; (2) the relative error of the SVM model has 35 forecasting points in the range of [−3%, 3%] and three forecasting points in the range of [−1%, 1%]; the maximum relative error is 3.48% at the 15th sample point, and the minimum is −4.41% at the 51st point; (3) in the GRNN model, the relative errors of 43 sample points are in the range of [−3%, 3%], and the relative errors of five sample points are in the range of [−1%, 1%]; the maximum value is 3.38% at the 40th predicted point, while the minimum is −3.95% at the 23rd point; and (4) there are 52 relative error values of the FOA-GRNN model in the range of [−3%, 3%].
The RMSE, MAPE, and AAE of the BPNN, SVM, GRNN, FOA-GRNN, and FOA-IR-GRNN models are shown in Figure 8. From Figure 8, we can conclude that the RMSE, MAPE, and AAE of the proposed model are 1.2326%, 1.2006%, and 1.2059%, respectively, which are all the smallest among the five models. In addition, the RMSE, MAPE, and AAE of the FOA-GRNN model are 2.0485%, 1.9462%, and 1.9994%, respectively; those of the GRNN model are 2.6514%, 2.5375%, and 2.5086%; those of the SVM model are 2.8999%, 2.8295%, and 2.8200%; and those of the BPNN model are 3.6889%, 3.5612%, and 3.5252%. These indicators reflect the overall error of each prediction model and the degree of error dispersion. Hence, it can be further proved that the overall prediction performance of the GRNN model is better than that of the SVM and BPNN models, while the SVM model performs better than the BPNN model. The prediction accuracy of the FOA-GRNN model is better than that of the GRNN model, which demonstrates that adopting the FOA to choose the smoothing parameter of the GRNN achieves a satisfactory optimization effect. Meanwhile, the FOA-IR-GRNN model obtains better overall forecasting accuracy than the FOA-GRNN model. This result proves that the IR model not only reduces redundant data but also ensures the integrity of the input information, thus obtaining ideal prediction results.
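The three criteria quoted throughout can be computed by their standard definitions, sketched below (the paper's exact percentage scaling of RMSE and AAE is assumed; the formulas here are the conventional ones):

```python
import math

def error_metrics(actual, forecast):
    """Standard definitions of the three evaluation criteria:
    root-mean-square error (RMSE), mean absolute percentage error
    (MAPE, in percent), and average absolute error (AAE)."""
    n = len(actual)
    rmse = math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)
    mape = sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / n * 100
    aae = sum(abs(a - f) for a, f in zip(actual, forecast)) / n
    return rmse, mape, aae
```

RMSE penalizes large deviations and so reflects error dispersion, while MAPE and AAE reflect the average error level, which is why the three are reported together.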

Case Study 2
In order to verify that the proposed model has good adaptability at different times and in different places, another case, which selects the relevant data of the "Tianshang line" located in Loudi, Hunan Province, is provided in this paper. The study is carried out with data from 17 January 2008 to 10 February 2008 as the training set and data from 11 February 2008 to 15 February 2008 as the testing set.
The iterative process on the sample data of the "Tianshang line" using the FOA-IR-GRNN model is presented in Figure 10. From Figure 10, we can conclude that the optimal fitness function value calculated by the IR model is −0.91. When the FOA achieves the optimum at the 47th iteration, the prediction accuracy of the sample reaches 98.3%. It can also be seen that 25 redundant features are eliminated from the 29 candidate features, and the final input features include the ambient temperature, relative air humidity, and wind speed at time t, together with the icing thickness at time t − 1. In addition, the smoothing factor of the GRNN, optimized by the FOA, is 0.0056. The results of the k-fold cross-validation for the proposed icing prediction model are described in Table 4, the forecasting results are displayed in Figure 11 and Table 5, and the error analyses are presented in Figures 12 and 13. As shown in Table 4, the average RMSE and the RMSE standard deviation of the proposed model are 0.0118 and 0.0011, respectively, which again illustrates that the generalization performance of the proposed icing prediction model is improved.
It can be concluded from Figure 11 and Table 5 that the predicted values of the FOA-IR-GRNN model are the closest to the actual values, which demonstrates that the proposed model is not only accurate but also robust. Comparing the forecasting curves of the FOA-IR-GRNN and FOA-GRNN models, we can conclude that adopting the IR model for feature selection significantly improves the prediction accuracy, in that this feature selection method enhances the effectiveness of the input information. Furthermore, the forecasting curve of the FOA-GRNN model is closer to the actual curve than that of the GRNN model, indicating that, in addition to the IR model, the FOA also makes a significant contribution to the improvement of the GRNN prediction accuracy. Compared with the SVM and BPNN, the forecasting values of the GRNN model are closer to the actual ice thickness, which demonstrates once again that the approximation and classification ability of the GRNN is better than that of the SVM and BPNN models, and that the GRNN performs better in dealing with unstable data.
Figure 12 presents the relative errors of the five models. From the calculation results, we can conclude that: (1) the fitting and learning ability of the FOA-IR-GRNN model is the strongest, in that its relative errors are all in the range of [−3%, 3%] and 16 sample points fall in the range of [−1%, 1%]; the maximum relative error is 2.21% at the 24th point, and the minimum is −1.85% at the 33rd point; (2) there are 55 relative error values of the FOA-GRNN model in the range of [−3%, 3%] and nine values in the range of [−1%, 1%]; the maximum relative error is 3.37% at the 33rd sample point, while the minimum is −3.70% at the 41st sample point; (3) the GRNN model has 49 sample points in the range of [−3%, 3%] and seven sample points in the range of [−1%, 1%]; the maximum value is 3.84% at the 39th point, and the minimum is −4.19% at the 35th point; (4) the SVM model has 27 sample points in the range of [−3%, 3%] and five sample points in the range of [−1%, 1%]; the maximum value is 4.24% at the tenth point, while the minimum is −5.82% at the 25th point; and (5) the BPNN model has nine points in the range of [−3%, 3%] and only two points in the range of [−1%, 1%]; the maximum value is 5.94% at the 45th point, while the minimum is −5.89% at the 23rd point. This further demonstrates that the nonlinear fitting ability of the proposed model is the strongest, so that its prediction accuracy and robustness are the most satisfactory.
The RMSE, MAPE, and AAE of the five prediction models are shown in Figure 13. It can be concluded that the RMSE, MAPE, and AAE of the FOA-IR-GRNN model are still the lowest, at 1.2016%, 1.1534%, and 1.1535%, respectively. This proves that the proposed model obtains the highest prediction accuracy and the best stability under different conditions. The model eliminates the interference of redundant factors through feature selection, so as to ensure the accuracy and stability of prediction. This result is consistent with the results obtained in Section 4.3.
In summary, the proposed model optimizes the GRNN with the FOA and obtains an appropriate smoothing parameter, which effectively reduces the icing prediction error. The IR model not only reduces the noise in the input variables, improving the effectiveness of the input information, but also ensures its integrity, thus improving the accuracy and robustness of icing prediction. The validity of the proposed icing prediction model is proved by the calculation results.
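The inconsistency rate at the core of this feature selection can be sketched as follows. This is a standard formulation consistent with the paper's description, not the authors' code: for each group of samples that agree on the selected features, the samples outside the majority label count as inconsistent.

```python
from collections import Counter, defaultdict

def inconsistency_rate(samples, labels, feature_idx):
    """Data inconsistency rate (IR) of a feature subset: group samples by
    their values on the selected features; within each group, the
    inconsistency count is the group size minus the majority-label count.
    IR is the total inconsistency divided by the number of samples."""
    groups = defaultdict(list)
    for row, label in zip(samples, labels):
        key = tuple(row[i] for i in feature_idx)
        groups[key].append(label)
    inconsistent = sum(len(g) - max(Counter(g).values()) for g in groups.values())
    return inconsistent / len(samples)
```

A subset with a low IR preserves the label information of the full feature set, which is how redundant inputs can be dropped without sacrificing the integrity of the input information.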

Conclusions
This paper presents a hybrid icing forecasting model that combines the IR with a GRNN optimized by the FOA. First, in order to predict the icing thickness, the IR combined with the FOA is employed to select the input features. Furthermore, the FOA is adopted to optimize the smoothing factor of the GRNN. Finally, after obtaining the optimized feature subset and the best value of the smoothing factor, the proposed model is utilized for icing forecasting. Several conclusions can be drawn from the studies: (1) by utilizing the IR, the influence of unrelated noise can be reduced and the forecasting performance effectively improved; (2) the FOA adds strong global searching capability to the model, and the GRNN optimized by the FOA shows good performance; (3) based on the error evaluation criteria, the FOA-IR-GRNN model is a more promising methodology for icing forecasting than the three classical forecasting models (SVM, BPNN, and GRNN); and (4) according to the empirical analysis of the two cases, the proposed model retains good prediction performance for forecasting the icing thickness of transmission lines at different times and places. Hence, the proposed FOA-IR-GRNN icing forecasting method is effective and feasible, and it may be an effective alternative for icing forecasting in the electric power industry.

Figure 1. Iterative food searching process of the fruit fly swarm.

Figure 2. The structure of the generalized regression neural network (GRNN).

(2) Initialize the parameters of the FOA. Suppose the population size is 20, the maximum iteration number is 200, and the range of the random flight distance is set as [−10, 10]. (3) Calculate the inconsistency rate. After completing steps (1) and (2), put the candidate features into the IR feature selection model gradually and calculate the inconsistency rate of the data sets G1, G2, …
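As a sketch, the random-flight search these settings parameterize looks like the following generic FOA minimization loop. The fitness function here is a toy stand-in; in the paper the fitness couples the IR-based feature evaluation with the GRNN validation error, and the smell concentration judgment value S = 1/Dist is what ultimately becomes the candidate smoothing factor.

```python
import math
import random

def foa_minimize(fitness, pop_size=20, max_iter=200, flight_range=10.0):
    """Fruit fly optimization: each fly takes a random flight around the
    swarm location (x_axis, y_axis); its distance to the origin maps to a
    candidate solution S = 1/Dist, and the swarm relocates to the
    best-smelling (lowest-fitness) position found."""
    x_axis, y_axis = random.uniform(0, 1), random.uniform(0, 1)
    best_s, best_fit = None, float("inf")
    for _ in range(max_iter):
        for _ in range(pop_size):
            x = x_axis + random.uniform(-flight_range, flight_range)
            y = y_axis + random.uniform(-flight_range, flight_range)
            dist = math.hypot(x, y) or 1e-12   # guard against division by zero
            s = 1.0 / dist                      # candidate solution value
            f = fitness(s)
            if f < best_fit:                    # swarm flies to the best spot
                best_s, best_fit, x_axis, y_axis = s, f, x, y
    return best_s, best_fit
```

Because S = 1/Dist concentrates candidates near small positive values, the FOA is well suited to searching for a small smoothing factor such as the 0.0031 and 0.0056 values found in the two cases.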

Figure 4. Original data chart of icing thickness, temperature, wind speed, and humidity. Note: (a) represents the original data of icing thickness; (b) represents the original data of temperature; (c) represents the original data of wind speed; and (d) represents the original data of humidity.

Figure 5 presents the iteration process of the FOA-IR-GRNN model for training-sample feature extraction. The accuracy curve in the figure describes the prediction accuracy of the training samples produced by the GRNN in different iterations. The fitness curve describes the fitness function values calculated during the iteration process. The number of selected features indicates the optimal number of features calculated by the IR model in the convergence process, and the number of feature reductions is the number of features that the FOA eliminates during the convergence process.

Figure 5. The curve of convergence for feature selection. Note: (a) represents the fitness value; (b) represents the forecasting accuracy; (c) represents the reduced number of candidate features; and (d) represents the selected number of optimized features.
The topological structure of the BPNN model is 9-7-1; the hidden-layer transfer function is the tansig function, and the output-layer transfer function is the purelin function. The maximum number of training epochs is 100, the minimum training target error is 0.0001, and the training rate is 0.1. The initial weights and thresholds are obtained by training. In the SVM model, the penalty parameter c is 9.236, obtained by training; the kernel function parameter g is 0.0026; and the ε loss function parameter p is 2.3572.

Figure 6. The forecasting values of the proposed method and the comparison methods. Note: (a) the forecasting values from sample points 1-20; (b) the forecasting values from sample points 21-40; and (c) the forecasting values from sample points 41-58. BPNN: back-propagation neural network; SVM: support vector machine.

Figure 7. The relative error curve of each method.

Figure 9. Original data chart of icing thickness, temperature, wind speed, and humidity. Note: (a) represents the original data of icing thickness; (b) represents the original data of temperature; (c) represents the original data of wind speed; and (d) represents the original data of humidity.

Figure 10. The curve of convergence for feature selection. Note: (a) represents the fitness value; (b) represents the forecasting accuracy; (c) represents the reduced number of candidate features; and (d) represents the selected number of optimized features.

Figure 11. The forecasting values of the proposed method and the comparison methods. Note: (a) the forecasting values from sample points 1-20; (b) the forecasting values from sample points 21-40; and (c) the forecasting values from sample points 41-60.

Figure 12. The relative error curves of each method.

Table 1. The full candidate features.

Table 2. Results of the k-fold cross-validation.

Table 3. Part of the forecasting values and relative errors of each model.

Table 4. Results of the k-fold cross-validation.

Table 5. Part of the forecasting values and relative errors of each model.