An Enhanced Genetic Algorithm for Parameter Estimation of Sinusoidal Signals

Abstract: Estimating the parameters of sinusoidal signals is a fundamental problem in signal processing and in time-series analysis. Although various genetic algorithms and their hybrids have been introduced to the field, the problems of complex implementation, premature convergence, and limited accuracy remain unsolved. To overcome these drawbacks, an enhanced genetic algorithm (EGA) based on biological evolutionary and mathematical ecological theory is proposed in this study, wherein a prejudice-free selection mechanism, a two-step crossover (TSC), and an adaptive mutation strategy are designed to preserve population diversity and to maintain a synergy between convergence and search ability. To validate the performance, benchmark function-based studies are conducted, and the results are compared with those of the standard genetic algorithm (SGA), particle swarm optimization (PSO), cuckoo search (CS), and the cloud model-based genetic algorithm (CMGA). The results reveal that the proposed method outperforms the others in terms of accuracy, convergence speed, and robustness against noise. Finally, parameter estimations of real-life sinusoidal signals are performed, validating the superiority and effectiveness of the proposed method.


Introduction
Many practical signals, such as voice and audio signals, power system transient signals, and sensor response signals, can be modeled as sums of sinusoidal signals. The parameter estimation of sinusoidal signals from periodic time-series data is a classical problem of ongoing interest in signal processing. In recent years, it has drawn considerable attention from researchers in various fields, including speech analysis [1], electrical power and energy systems [2], and sensor signal analysis [3], among others. The existing methods can generally be classified into two categories: classical techniques [3][4][5][6][7][8][9][10] and intelligent algorithms [11][12][13][14][15][16][17][18][19][20]. The classical techniques, when compared with intelligent ones, are easier to implement and computationally more efficient. However, they have limitations, including limited frequency resolution, low accuracy, and heavy mathematical demands. On the other hand, intelligent algorithms and their hybrids are favored because they impose fewer requirements on the problem and offer excellent resolution. A few recent applications include atomic physics [21], electrical machines [22], big data [23], the traveling salesman problem [24], and fuel cells [25]. Accordingly, intelligent techniques have been extended significantly to provide effective and reliable solutions for parameter estimation of sinusoidal signals [11][12][13][14][15][16][17][18][19][20][26][27][28][29].

Spavieri et al. [11] used a particle swarm optimization (PSO)-based approach for the parameterization of a power capacitor model fed by harmonic voltages in power distribution systems. For parameter estimation of exponentially damped sinusoidal signals, Xiao et al. [15] presented a specific artificial neural network (ANN)-based estimator, by which the parameters of each exponentially damped sinusoidal signal component could be directly calculated with high precision. However, the performance strongly relies on the converged weights of the ANN, and hence an absence of optimal weights or insufficient training results in poor performance. Based on a genetic algorithm (GA), Coury et al. [18,20] presented an efficient estimator to evaluate the frequency and phase for a phasor measurement unit. In comparison to the phase-locked loop (PLL) and the discrete Fourier transform (DFT)-based methods, the GA showed the fastest response at a sufficient precision.

To minimize bias and improve estimation performance, researchers generally combine different methods to overcome their individual drawbacks. Djurović et al. [17] presented a hybrid genetic algorithm (HGA) fused with a maximum likelihood approach to estimate the phase parameters of frequency-modulated signals. Zang et al. [30] proposed a cloud model-based genetic algorithm (CMGA) for optimization problems. Benefiting from the cloud model, the CMGA continuously produces new individuals, so exploration and randomness are significantly improved, yielding better optimization results than other methods. Raja et al. [31][32][33] introduced hybrid techniques combining ANN, GA, and other algorithms for the dynamics of a nonlinear singular heat conduction model of the human head, nonlinear Painlevé II systems, and nonlinear singular Thomas-Fermi systems. Comparisons between the proposed schemes and standard numerical solutions as well as analytical methods revealed the feasibility and effectiveness of the proposed schemes. Similarly, a hybrid approach using differential evolution (DE) and GA was proposed by Ali et al. [34], where a DE-based multi-parent crossover operation was introduced to enhance the search ability and to avoid premature convergence by exploring more solutions in the problem search space. In combination with a local search optimization algorithm, Zaman et al. [35] proposed an HGA to estimate the amplitude, frequency, range, elevation angle, and azimuth angle of near-field sources; the proposed schemes achieved good results in terms of accuracy, convergence rate, and robustness, but at the cost of considerably higher complexity.
In light of these facts, the authors are motivated in this work to design an enhanced genetic algorithm (EGA) for estimating the parameters of sinusoidal signals. The EGA combines a prejudice-free selection mechanism for preserving population diversity, a two-step crossover (TSC) operator for diversifying individuals, and an adaptive mutation strategy for balancing fast convergence and search performance. The prominent features of the proposed method are briefly summarized as follows:
• Validation through the results of a comparative analysis in terms of performance-monitoring metrics based on the mean, the coefficient of determination ($R^2$), and the sum of squared residuals (SSR) of each algorithm;
• An intelligible concept with easy realization and a worry-free tradeoff between global and local search in comparison to general hybrid techniques.
The rest of the work is organized as follows. The mathematical ecological theory foundation of the proposed approach is presented in Section 2. Section 3 describes the proposed EGA in detail, including the prejudice-free selection, the two-step crossover operator, and the adaptive mutation operator. In Section 4, benchmark function-based numerical analyses are carried out to verify the validity of the EGA. The parameter estimations of real-life sinusoidal signals are performed and discussed in Section 5. Finally, we summarize our work and highlight the main contributions in Section 6.

Mathematical Ecological Theory Foundation
In ecology, it is uncommon for the number of offspring to be restricted to be equal to or less than that of the parents. Although this may happen for reasons such as disease or shortage of food and water, the species may then become extinct or be restricted by genetic competition as well. When the number of offspring per pair is more than two, the population size grows rapidly with more competitive individuals [24], resulting in an increased chance of producing better offspring. Thus, it is desirable to have a larger number of individuals in genetic algorithms.
Population dynamics were first studied by Verhulst [36] in 1838, who described the size and age composition of populations as dynamical systems driven by biological and environmental processes. The population dynamics model of a species can be mathematically expressed by:

$$\frac{dP}{dt} = rP\left(1 - \frac{P}{K}\right) \tag{1}$$

where $P$ represents population size, $t$ is time, $r$ defines the growth rate, and $K$ is a constant denoting the environment carrying capacity. The solution to Equation (1) with an initial population $P_0$ can be written as:

$$P(t) = \frac{K P_0 e^{rt}}{K + P_0\left(e^{rt} - 1\right)} \tag{2}$$

Taking the birth and mortality rates into account, Equation (2) can be rewritten as:

$$\Delta P = \frac{K P_0 e^{(r_1 - r_2)t}}{K + P_0\left(e^{(r_1 - r_2)t} - 1\right)} \tag{3}$$

where $\Delta P$ is the total population size, and $r_1$ and $r_2$ are the birth rate and mortality rate, respectively. Over time, the population dynamics lead to different outcomes depending on the relation between $r_1$ and $r_2$.
1. When the birth rate is higher than the mortality rate, $r_1 > r_2$, the final population size as $t \to \infty$ can be simplified as: $$\lim_{t \to \infty} \Delta P = K$$ This means that a species can exist forever.
2. When the birth rate is equal to the mortality rate, $r_1 = r_2$, the exponential terms in the numerator and denominator become 1, so Equation (3) is given as follows: $$\Delta P = \frac{K P_0}{K + P_0 (1 - 1)} = P_0$$ Here, as time approaches infinity, the final population size equals the initial value. What if the initial population size equals 1 or 2? Although this species can live for a long time, self-reproduction and inbreeding eventually result in extinction.
3. When the birth rate is smaller than the mortality rate, $r_1 < r_2$, it is clear that $e^{(r_1 - r_2)t} \to 0$ as $t \to \infty$. Hence, one gets the following expression: $$\lim_{t \to \infty} \Delta P = 0$$ Obviously, this equation represents the inevitable extinction of a species.
According to the analysis above, on the one hand, it can be concluded that the extinction of a species occurs if the birth rate is equal to or less than the mortality rate; only when $r_1 > r_2$ can the species survive forever. On the other hand, when the initial population size is fixed, the final population size strongly depends on both the growth rate and the time. Thus, in order to increase the number of offspring and obtain excellent individuals within a limited time, it is necessary to improve the reproduction rate of a species [24]. In the standard genetic algorithm (SGA), procreation is artificially controlled so that the population size is unchangeable, and extinction of the population is not allowed. Hence, the evolutionary law is not consistent with the natural law, and the algorithm has no chance to acquire more excellent individuals. In this paper, based on biological theory and the mathematical ecological theory foundation, a genetic algorithm with several enhancements is proposed in order to achieve higher accuracy and faster convergence, and hence to further improve the parameter estimation results of sinusoidal signals.
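The three regimes discussed in this section can be checked numerically. The following is a minimal sketch, assuming the standard closed-form logistic solution with a net growth rate of r1 - r2; the function name `population` and the numerical values of K, P0, r1, and r2 are illustrative choices, not values from this work.

```python
import math

def population(t, p0, k, r1, r2):
    """Logistic population size at time t with net growth rate r1 - r2 (assumed form)."""
    growth = math.exp((r1 - r2) * t)
    return k * p0 * growth / (k + p0 * (growth - 1.0))

K, P0 = 1000.0, 10.0          # illustrative carrying capacity and initial size
cases = [("r1 > r2", 0.5, 0.2), ("r1 = r2", 0.3, 0.3), ("r1 < r2", 0.2, 0.5)]
for label, r1, r2 in cases:
    # A large t approximates the long-time limit of each case.
    print(label, round(population(200.0, P0, K, r1, r2), 4))
```

For a large time value, the first case approaches the carrying capacity K, the second stays at the initial size P0, and the third decays toward zero, in line with the three cases listed above.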

Prejudice-Free Selection
Elitism-based simple selection strategies [37] are popular in GAs; they favor the chromosomes (individuals) with higher fitness and discriminate against those with lower fitness. Consequently, outstanding individuals dominate the iterations, which reduces population diversity and breaks up healthy competition. Besides, in the SGA, producing offspring by randomly pairing two individuals can reduce the population quality, because an outstanding chromosome may combine with a poor one and lose its superior genes. To settle this, in the EGA the individuals are divided into two groups according to their fitness values: the best half of the individuals, with high fitness values, form a benign group, while the remaining ones are assigned to a malignant group. The evolution in each group takes place independently, and a new selection is executed in every iteration to redistribute the individuals. Thus, offspring may stay in their original group or migrate to the other one in accordance with their new fitness values. Compared to traditional selections, this operation not only reduces the overhead of selection, owing to its unbiased selection procedure [38], but also allows parallel computation, providing faster convergence [39]. Thanks to this, population diversity and efficient convergence can be guaranteed.
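The grouping step described above can be sketched in a few lines. This is only an illustration: the helper name `split_groups`, the population size, and the toy fitness function (a negated sphere function, so that higher is better) are assumptions made for the example.

```python
import random

def split_groups(population, fitness):
    """Rank individuals by fitness (higher is better) and split them into two halves."""
    ranked = sorted(population, key=fitness, reverse=True)
    half = len(ranked) // 2
    return ranked[:half], ranked[half:]      # (benign group, malignant group)

random.seed(0)
pop = [[random.uniform(-5, 5) for _ in range(3)] for _ in range(10)]
fitness = lambda x: -sum(v * v for v in x)   # hypothetical fitness for illustration
benign, malignant = split_groups(pop, fitness)
print(len(benign), len(malignant))
```

Calling `split_groups` again on the new population in every iteration reproduces the migration between groups: an individual lands in whichever half its current fitness rank places it.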

Two-Step Crossover (TSC) Operator
In GAs, the crossover operation simulates the process of sexual reproduction, through which offspring inherit parts of the genetic information from their parents. However, conventional crossover operators just follow the genetic concept, weakening the search ability [40,41]. For instance, in a single-point crossover (SPC), two individuals are first selected as the parent chromosomes; then a randomly generated point is used to cut each chromosome into two sub-sequences; the fragments behind the point are exchanged and joined to the front ones to generate two new offspring. The principle of an SPC operation is shown in Figure 1. For a given population of size $N \times M$, the number of newly generated variables after an SPC operation can be calculated by $\lceil N P_{c2} \rceil$, where $N$ is the number of individuals, $M$ denotes the length of each chromosome, $M = n \times l$, $n$ is the number of variables defined by the chromosomes, $l$ is the coding length of each variable, and $P_{c2}$ is the crossover probability. The symbol $\lceil \cdot \rceil$ rounds the number toward positive infinity. It is clear that an SPC can produce new individuals and change the population diversity. However, only the variables whose sequences are cut are updated during the operation, and hence the population diversity is still restricted. Although other types of crossover operations, such as two-point crossover and multi-point crossover, can mitigate this effect, the similar operating mechanism leads to the same outcome.
In this regard, a two-step crossover strategy that comprehensively considers the information exchange among entities and variables is designed in this work. The details are described as follows:
1. Dividing each chromosome into segments based on the number of variables to be solved, and gathering all the segments that belong to a certain variable to form the variable set for that variable;
2. Randomly selecting some variable sets with a probability $P_{c1}$;
3. Implementing the crossover operation on each selected set to update the variables with a probability $P_{c2}$;
4. Once the crossover is completed, reassembling the variable sets to get the new individuals.
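The four steps above can be sketched as follows. This is a minimal illustration rather than the exact implementation of this work: it assumes that each variable is stored as a list of coded digits, that segments inside a selected variable set are paired at random, and that a single-point crossover is applied to each pair. The function names and the toy population are hypothetical.

```python
import random

def single_point_crossover(seg_a, seg_b, rng):
    """Swap the tails of two equal-length gene segments at a random cut point."""
    cut = rng.randrange(1, len(seg_a))
    return seg_a[:cut] + seg_b[cut:], seg_b[:cut] + seg_a[cut:]

def two_step_crossover(population, p_c1, p_c2, rng):
    """population[i][j] is the gene segment (a list) of variable j in individual i."""
    n_ind, n_var = len(population), len(population[0])
    # Step 1: gather the segments of each variable into a variable set.
    var_sets = [[population[i][j] for i in range(n_ind)] for j in range(n_var)]
    for j in range(n_var):
        # Step 2: select this variable set with probability p_c1.
        if rng.random() >= p_c1:
            continue
        # Step 3: pair segments at random (an assumption) and cross each
        # pair with probability p_c2.
        order = list(range(n_ind))
        rng.shuffle(order)
        for a, b in zip(order[0::2], order[1::2]):
            if rng.random() < p_c2:
                var_sets[j][a], var_sets[j][b] = single_point_crossover(
                    var_sets[j][a], var_sets[j][b], rng)
    # Step 4: reassemble the variable sets into individuals.
    return [[var_sets[j][i] for j in range(n_var)] for i in range(n_ind)]

rng = random.Random(1)
# Toy population: 8 individuals, 3 variables, 6 coded digits per variable.
pop = [[[rng.randint(0, 9) for _ in range(6)] for _ in range(3)] for _ in range(8)]
new_pop = two_step_crossover(pop, p_c1=0.8, p_c2=0.7, rng=rng)
```

Because crossover is applied per variable set, genes are only exchanged between segments of the same variable, so every selected variable can be updated in one pass instead of only the variable that contains the cut point.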
For a better understanding, an SPC is employed in the TSC to illustrate the mechanism, as shown in Figure 2. Here, the number of variables newly produced by the TSC can be calculated by

$$\lceil n N P_{c1} P_{c2} \rceil$$

where $P_{c1}$ is the first crossover probability, and the other parameters are the same as those mentioned above. The difference in the number of newly generated variables between the TSC and the SPC is mathematically expressed as

$$\lceil n N P_{c1} P_{c2} \rceil - \lceil N P_{c2} \rceil$$

With some mathematical manipulations, this expression can be simplified as $\lceil n N P_{c2} \rceil \cdot (P_{c1} - 1/n)$. A positive value is always achieved if $(P_{c1} - 1/n) > 0$. So, by simply adjusting $P_{c1}$, the number of updated variables yielded by the TSC can be much larger than that of the SPC, which amounts to indirectly increasing the number of offspring [24]. Hence, the population diversity can be significantly improved. Besides, notice that the SPC applied in the TSC can also be replaced by other types of crossover operations; in the case of multi-point crossover, a considerable number of new variables can be obtained.

Figure 2. The two-step crossover operation for a floating sequence (first step: select variables using Pc1; second step: crossover operation using Pc2).

Adaptive Mutation Operator
The mutation operator has a strong influence on population diversity and convergence speed, playing a significant role in GAs. Usually, the mutation probability, $P_m$, is kept constant throughout the whole iteration in the SGA and HGAs [16,19,36,38,41]. Although a constant mutation probability gives GAs a relatively stable performance, the problem of ever-increasing similar individuals cannot be effectively solved. To address this, an adaptive mutation mechanism that considers the evolutionary features of GAs at different stages is proposed in this work.
At the beginning of the evolution, a small $P_m$ is allocated to boost convergence [42]. As the evolution progresses, the number of similar individuals increases rapidly, which causes the optimization to fall back into local optima repeatedly. To overcome this, a larger mutation probability is assigned at the intermediate stage so that the evolution can easily escape from such predicaments. In the final stage, a small probability is assigned again to ensure global convergence [37].
Inspired also by the characteristics of population dynamics, a combination of a general Logistic function [36] and its mirror is employed to support the proposed idea in its entirety. The function is expressed as follows: where a denotes the initial mutation probability, b is the maximum value of the original Logistic curve, e is the natural logarithm base, c describes the steepness of the curve, g and g 0 are the evolution generation and the curve's midpoint, respectively. G is the total number of evolutionary iterations. Figure 3 shows the curves of the functions.
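To make the shape of such a schedule concrete, the following sketch uses one plausible form: a logistic rise from a toward a + b during the first half of the run, mirrored about the middle generation G/2. This exact expression, the default midpoint g0 = G/4, and all numerical values are assumptions made for illustration; they are not taken from the equation of this work.

```python
import math

def mutation_probability(g, G, a=0.05, b=0.17, c=0.1, g0=None):
    """Hypothetical schedule: logistic rise in the first half, mirrored in the second."""
    if g0 is None:
        g0 = G / 4.0                          # assumed midpoint of the rising curve
    g_eff = g if g <= G / 2.0 else G - g      # mirror about the middle generation
    return a + b / (1.0 + math.exp(-c * (g_eff - g0)))

G = 200
for g in (0, 50, 100, 150, 200):
    print(g, round(mutation_probability(g, G), 4))
```

Under these assumptions the probability is small at the start, peaks in the intermediate stage, and becomes small again at the end, which matches the three-stage behavior described above.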

EGA Procedures
To solve a problem using the proposed EGA, N individuals are randomly generated by a float encoding method as the initial population Init_Pop. Each individual's objective value and fitness value are calculated, and the best individual BestX_Old in the current population is obtained. In order to keep population diversity and promote healthy competition, the individuals are assigned, according to their fitness values, to a Benign group or a Malignant group, each of size N/2. As the individuals in the Benign group have higher fitness values, performing a crossover operation in this group is helpful for generating better descendants. Therefore, the TSC operation with probabilities $P_{c1}$ and $P_{c2}$ is run for them only. The remaining individuals in the Malignant group, which have low fitness but rich diversity, are not discarded directly; instead, they go through the adaptive mutation. Once the crossover and mutation are complete, the fitness values of the individuals in the new population New_Pop are evaluated. The individuals in the Malignant group that have a high fitness migrate to the Benign group, and the reverse movement is carried out at the same time for the individuals in the Benign group that have a low fitness. Again, the best individual Best_New in the current population is compared with BestX_Old, and the better one is saved as Best_Temp. As the optimum may hide near Best_Temp, some random individuals (e.g., 10) are generated in this area, and their fitness values are calculated. If any of them is better than Best_Temp, it replaces Best_Temp; otherwise, Best_Temp keeps its original value. At the end of each iteration, BestX_Old and Old_Pop are updated by Best_Temp and New_Pop, respectively.
The termination criterion is triggered when the maximum number of iterations, MaxGen, is reached. A clearer procedure of the EGA is shown in Figure 4 and Algorithm 1.
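The procedure can be condensed into the following sketch for a minimization problem. It is a simplified illustration under several assumptions, not a reproduction of Algorithm 1: variables are plain floats, an arithmetic blend stands in for the crossover inside the TSC, mutation re-samples a variable uniformly within its bounds, the mutation schedule is a hypothetical mirrored logistic curve, and the local search draws Gaussian perturbations around Best_Temp. The function name `ega_minimise` and all default values are illustrative.

```python
import math
import random

def ega_minimise(objective, bounds, n_pop=50, max_gen=200,
                 p_c1=0.8, p_c2=0.7, n_local=10, seed=0):
    """Simplified EGA loop (lower objective value means higher fitness)."""
    rng = random.Random(seed)
    n_var = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_pop)]
    best = min(pop, key=objective)
    for g in range(max_gen):
        # Prejudice-free selection: rank, then split into two groups.
        pop.sort(key=objective)
        half = n_pop // 2
        benign, malignant = pop[:half], pop[half:]
        # Two-step crossover on the benign group (arithmetic blend, assumed).
        children = [ind[:] for ind in benign]
        for j in range(n_var):
            if rng.random() >= p_c1:
                continue
            order = list(range(half))
            rng.shuffle(order)
            for a, b in zip(order[0::2], order[1::2]):
                if rng.random() < p_c2:
                    w = rng.random()
                    xa, xb = children[a][j], children[b][j]
                    children[a][j] = w * xa + (1 - w) * xb
                    children[b][j] = w * xb + (1 - w) * xa
        # Adaptive mutation on the malignant group (hypothetical schedule).
        g_eff = g if g <= max_gen / 2 else max_gen - g
        p_m = 0.05 + 0.17 / (1 + math.exp(-0.1 * (g_eff - max_gen / 4)))
        mutants = [[rng.uniform(*bounds[j]) if rng.random() < p_m else v
                    for j, v in enumerate(ind)] for ind in malignant]
        pop = children + mutants
        # Keep the better of the previous best and the new best.
        best_temp = min(pop + [best], key=objective)
        # Local search: a few random individuals near best_temp.
        for _ in range(n_local):
            trial = []
            for j, v in enumerate(best_temp):
                lo, hi = bounds[j]
                step = rng.gauss(0.0, 0.01 * (hi - lo))
                trial.append(min(max(v + step, lo), hi))
            if objective(trial) < objective(best_temp):
                best_temp = trial
        best = best_temp
    return best, objective(best)

sphere = lambda x: sum(v * v for v in x)    # toy objective for illustration
best_x, best_val = ega_minimise(sphere, [(-5.0, 5.0)] * 2, max_gen=50, seed=1)
print(best_x, best_val)
```

Since the best individual is tracked separately from the population, the returned value can never be worse than the best individual of the initial population, regardless of how disruptive crossover and mutation are in a given generation.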

Benchmark Function Study
Benchmark functions [14,30,34,39,43] are widely adopted to demonstrate the performance of algorithms. Thus, in the present work, benchmark based numerical experiments are performed for the EGA as well. Here, eight unimodal and multimodal functions [40] are selected (see Appendix A).
As a comparison, studies using the SGA [44] as well as the PSO [11,29], the CS [28,29], and the CMGA [30] are also performed for the functions; the corresponding parameter configurations are listed in Table 1. To ensure a fair comparison, all the algorithms are assigned the same initial population size, N = 50, and the same maximum iteration number, MaxGen = 200. The optimization results are plotted in Figures 5 and 6. To give an insight into how well these algorithms performed, all the vertical coordinates are displayed on a logarithmic scale.
From Figures 5 and 6, it can be seen directly that the performance of the EGA is superior to that of the SGA, PSO, CS, and CMGA for most of the benchmark functions, except for the Zakharov function ($f_2(x)$). Examining the optimization results of $f_2(x)$ shown in Figure 5b, it is noted that the evolutionary curve of the SGA is only partly displayed. This is because the estimated value reaches zero (equal to the optimum), which cannot be shown on a logarithmic y-axis. In this regard, the SGA performs better than the EGA. However, for the other functions, the proposed EGA shows overwhelming advantages over the others, especially for Yang's No. 5 function ($f_8(x)$). Different from the other functions, $f_8(x)$ contains uniform distribution terms, which increase not only the number of local optima but also the uncertainty, and hence the optimization becomes much more difficult. In the results in Figure 6d, frequent oscillations are observed in the evolution curves of the SGA and PSO, while the EGA shows a steady evolution trend, demonstrating its robustness against noise.
To further examine the performance, the obtained best and mean solutions as well as the convergence speed for each function were compared and are listed in Tables 2-4, respectively. All the results were obtained over 50 Monte-Carlo simulations. From Table 2, it is evident that the EGA outperforms the SGA, PSO, CS, and CMGA in most cases.
Looking at the optimization results for $f_2(x)$, although only the SGA achieves the global optimum, the EGA still prevails over the others on the remaining functions. The results listed in Table 3 also confirm this point. In terms of the convergence speed, the maximum, the minimum, and the average termination iterations were measured, where the termination criterion was defined by $|O_{opt} - O_{obt}| < 10^{-4}$, $O_{opt}$ is the optimum of a given problem, and $O_{obt}$ is the obtained optimized result. From Table 4, one can find that the convergence speed of the proposed EGA is fast, making it strongly competitive with its counterparts. From these comparisons, it can be concluded that the EGA shows an overwhelming superiority over the competitors in terms of accuracy, convergence speed, and robustness against noise.
Table 2. Comparison of the best optimal solutions of standard genetic algorithm (SGA), particle swarm optimization (PSO), cuckoo search (CS), cloud model-based genetic algorithm (CMGA), and EGA.
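The convergence-speed measurement can be illustrated with a short sketch. It assumes the criterion is applied to the absolute difference between the known optimum and the best value obtained so far; the function name `termination_iteration` and the three toy convergence histories are made up for the example.

```python
def termination_iteration(history, optimum, tol=1e-4):
    """Return the first 1-based iteration whose best value is within tol of the optimum."""
    for i, value in enumerate(history, start=1):
        if abs(optimum - value) < tol:
            return i
    return None   # criterion never met within the recorded iterations

# Toy best-so-far histories from three hypothetical runs (optimum = 0).
runs = [
    [1.0, 0.1, 5e-5, 1e-6],
    [0.5, 2e-5, 1e-7, 0.0],
    [2.0, 1.0, 0.5, 9e-5],
]
iters = [termination_iteration(h, 0.0) for h in runs]
print(max(iters), min(iters), sum(iters) / len(iters))   # maximum, minimum, average
```

In a Monte-Carlo study, one such history would be recorded per run, and the maximum, minimum, and average of the resulting termination iterations would be reported.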

Parameter Estimation of Sinusoidal Signals
The numerical analyses above have demonstrated the optimization performance of the EGA. To evaluate its performance for parameter estimation of sinusoidal signals, two real-life cases are selected: the voice data from singing the vowel 'ooh' [45] and the Circadian Rhythms dataset [46]. As with the numerical analyses, the SGA, PSO, CS, and CMGA were also employed.

The Voice Dataset
The voice dataset gives the magnitudes of the voice when the vowel 'ooh' was sung at a pitch of 290 Hz. The frequencies and amplitudes found in the signal are used to determine the phonetic vowel, and are of interest in voice synthesis, therapy, and training. In the present work, a ten-parameter model is utilized for the signal, expressed as follows: where $K$ is the offset, $A$ denotes the amplitude of the sinusoidal signal, and $\omega$ and $\phi$ are the frequency and phase, respectively. The parameter initialization is set as The corresponding parameter configurations for the algorithms were the same as the ones in Table 1, except for the crossover and mutation parameters of the EGA. The new values were set as $[P_{c1}, P_{c2}] = [0.8, 0.7]$ and $[P_{m1}, P_{m2}] = [0.05, 0.22]$, respectively. The same N = 200 and MaxGen = 500 were assigned to all the algorithms as well. Figure 7 shows the original data as well as the fitting curves, from which one can observe that all the curves follow the pattern well. To gain insight into the differences among the algorithms, the estimated parameters and the coefficients of determination $R^2$ were examined; they are listed in Table 5. As can be seen from the table, the estimated parameters are very close, so it is difficult to arrive at a conclusion from them alone. However, by comparing the values of $R^2$, it can be determined that the EGA performs best among the algorithms. Smyth et al. [47] considered the same dataset for frequency estimation; their reported results were 0.2299, 0.3408, and 0.1134, which are very close to the results in this work.
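The goodness-of-fit measures used in this section can be sketched as follows. The model form here is an assumption: an offset plus a sum of sine components, each with an amplitude, a frequency, and a phase, which gives ten parameters for three components. The function names and the synthetic data are illustrative; the real voice data are not reproduced.

```python
import math

def sinusoid_model(t, params):
    """params = [K, A1, w1, phi1, A2, w2, phi2, ...] (assumed sine form)."""
    k, rest = params[0], params[1:]
    return k + sum(a * math.sin(w * t + phi)
                   for a, w, phi in zip(rest[0::3], rest[1::3], rest[2::3]))

def ssr(params, ts, ys):
    """Sum of squared residuals between the data and the model."""
    return sum((y - sinusoid_model(t, params)) ** 2 for t, y in zip(ts, ys))

def r_squared(params, ts, ys):
    """Coefficient of determination: 1 - SSR / total sum of squares."""
    mean_y = sum(ys) / len(ys)
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ssr(params, ts, ys) / sst

# Synthetic ten-parameter signal (values chosen only for illustration).
true_params = [0.5, 1.0, 0.23, 0.1, 0.6, 0.34, 1.0, 0.3, 0.11, 2.0]
ts = list(range(100))
ys = [sinusoid_model(t, true_params) for t in ts]

perturbed = true_params[:]
perturbed[2] = 0.25                      # slightly wrong first frequency
print(r_squared(true_params, ts, ys), r_squared(perturbed, ts, ys))
```

An optimizer such as the EGA would minimize `ssr` over the parameter vector; $R^2$ then summarizes how much of the variance in the data the fitted curve explains, which is why it can separate algorithms whose individual parameter estimates look nearly identical.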

The Circadian Rhythms
The Circadian Rhythms dataset comes from an experiment that recorded the temperature of a long-tail pocket mouse every two minutes over three months. As some problems arose during the experiment, a proportion of outliers remained in the dataset. Though these outliers make the estimation quite difficult, they are meaningful and necessary for verifying the performance of an algorithm, because noise is ubiquitous in real-life signals. In this work, a 20 min temperature sample averaged from the dataset was extracted, and a four-component sinusoid model was used for the estimation, expressed as Equation (9): where $K$ is the offset, $A$ denotes the amplitude of the sinusoidal signal, and $\omega$ and $\phi$ are the frequency and phase, respectively. The parameter initialization for the estimation was set as $K \in [y_m - 50, y_m + 50]$, $A \in [-100, 100]$, $\omega \in [0, \pi]$, and $\phi \in [0, \pi]$, where $y_m$ is the median of the real data. The parameter configurations for the algorithms were the same as those set in Section 5.1, except the ones for the EGA, whose new settings were $[P_{c1}, P_{c2}] = [0.7, 0.7]$ and $[P_{m1}, P_{m2}] = [0.15, 0.35]$. The corresponding results are illustrated in Figure 8 and Table 6. Figure 8 displays the original data and the best fitting curve. From this graph, it is evident that the dataset contains a large number of outliers, but the fitting curve successfully ignores them and follows the periodic trend nicely. In addition, the less evident outliers close to the fitting curve did not distort the data fitting either. Examining the results in Table 6, we found that the sum of squared residuals (SSR) of the EGA is less than that of the others, indicating that the performance of the EGA surpasses its competitors. For the same dataset, it was reported that the frequency fitted by the elemental set method [48] was 0.0873, which once again is close to our frequency estimate.
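The initialization ranges given above can be turned into a small sketch. It assumes one set of (K, A, ω, ϕ) bounds for a single sinusoidal component; the helper names `init_bounds` and `init_population` and the toy temperature readings (including one deliberate outlier) are hypothetical.

```python
import math
import random
import statistics

def init_bounds(ys):
    """Search ranges for (K, A, omega, phi), centred on the median of the data."""
    y_m = statistics.median(ys)
    return [(y_m - 50, y_m + 50), (-100, 100), (0, math.pi), (0, math.pi)]

def init_population(bounds, n_pop, rng):
    """Draw each parameter uniformly within its range."""
    return [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_pop)]

ys = [36.1, 37.0, 35.5, 90.0, 36.4]      # toy readings with one outlier
bounds = init_bounds(ys)
pop = init_population(bounds, n_pop=200, rng=random.Random(0))
print(bounds[0])
```

Centring the offset range on the median rather than the mean keeps the initial search region close to the bulk of the data even when outliers are present, as the single extreme reading in the toy data illustrates.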

Conclusions
This work presents an EGA for estimating the parameters of sinusoidal signals. The proposed algorithm takes into account the preservation of population diversity, the balance between convergence speed and accuracy, and the implementation complexity. In contrast to other genetic algorithms, the main features of the EGA are:
• a prejudice-free selection mechanism for preserving population diversity;
• a TSC operation for enhancing information exchange among individuals and variables;
• an adaptive mutation strategy to avoid premature convergence and stagnation.
To evaluate the performance of the proposed method, studies based on benchmark functions are conducted. The results indicate that the EGA not only achieves higher accuracy and faster convergence, but also provides good robustness against noise, showing an overwhelming superiority over the SGA, the PSO, the CS, and the CMGA. Finally, parameter estimations of real-life sinusoidal signals are performed, which demonstrate the superiority and validity of the proposed algorithm.
The EGA has many issues worth studying further, such as the relationship between time consumption and number of offspring, the applications where it excels and fails, and so on.
Author Contributions: Conceptualization, C.J.; formal analysis, M.L.; software, C.J.; Supervision, C.C.; visualization, P.S.; writing-original draft, C.J. and P.S.; writing-review and editing, P.S. and C.C. All authors have read and agreed to the published version of the manuscript.