1. Introduction
Wired and wireless communication channels suffer from multipath fading as well as impulsive noise from various sources [1,2]. Impulsive noise can cause large instantaneous errors and system failure, so enhanced signal processing algorithms are needed to cope with such obstacles. Most adaptive algorithms are designed based on the mean squared error (MSE) criterion, but this criterion often fails in impulsive noise environments [3]. One of the cost functions based on information theoretic learning (ITL), minimum error entropy (MEE), was developed by Erdogmus [4]. As a nonlinear version of MEE, the decision feedback MEE (DF-MEE) algorithm is known to yield superior performance under severe channel distortion and in impulsive noise environments [5]. For shallow underwater communication channels, it has also been shown that the DF-MEE algorithm is not only robust against impulsive noise and severe multipath fading but can be further improved by a modification of the kernel size [6].
One problem with the MEE algorithm is its heavy computational complexity, caused by the double summation required for the gradient estimation at each iteration. In [7], a complexity-reducing method based on recursive gradient estimation of the DF-MEE was proposed for practical implementation. Although these practical difficulties have been removed by the recursive method, an in-depth theoretical analysis of the optimum solutions and their behavior has not yet been carried out for further enhancement of the algorithm.
In this paper, based on an analysis of the optimum weight behavior and of the factors that mitigate the influence of large errors due to impulsive noise, we propose to employ a time-varying step size obtained by normalization with the input power, which is estimated recursively for computational efficiency. The performance is compared with MEE through simulations of channel equalization as well as system identification under impulsive noise of the kind encountered in experiments investigating physical phenomena [8].
  2. MSE Criterion and Related Algorithms
The overall communication system model for this work is described in Figure 1. The transmitter sends a symbol $d_k$ at time $k$ through the multipath channel described by its z-transform $H(z)$, and impulsive noise $n_k$ is then added to the channel output to become the received signal $x_k$, so that the adaptive system input contains the noise $n_k$ and the intersymbol interference (ISI) caused by the channel's multipath [9].
With the input vector $\mathbf{X}_k$ and the weight vector $\mathbf{W}_k$ of the tapped delay line (TDL) equalizer, the output $y_k$ and the error $e_k$ become

$$x_k = \sum_{i} h_i d_{k-i} + n_k \qquad (1)$$

$$y_k = \mathbf{W}_k^T \mathbf{X}_k \qquad (2)$$

$$e_k = d_k - y_k \qquad (3)$$

where $h_i$ denotes the impulse response of the channel $H(z)$. With the current weight $\mathbf{W}_k$, a set of error samples, and a set of input samples, an adaptive algorithm designed according to its own criterion, such as MSE or MEE, produces the updated weight $\mathbf{W}_{k+1}$, with which the adaptive system makes the next output $y_{k+1}$.
Taking the statistical average $E[\cdot]$ of the error power $e_k^2$, the MSE criterion is defined as $\mathrm{MSE} = E[e_k^2]$. For practical reasons, the instantaneous error power $e_k^2$ can be used instead, and the LMS (least mean square) algorithm has been developed based on the minimization of $e_k^2$ [9]. The minimization of $e_k^2$ can be carried out by the steepest descent method utilizing the gradient of $e_k^2$ as

$$\frac{\partial e_k^2}{\partial \mathbf{W}} = -2 e_k \mathbf{X}_k \qquad (4)$$
With Equation (4) and the step size $\mu$, the well-known LMS algorithm is presented as

$$\mathbf{W}_{k+1} = \mathbf{W}_k + 2\mu\, e_k \mathbf{X}_k \qquad (5)$$
By letting the gradient in (4) be zero, we have the optimum condition of the LMS as

$$e_k \mathbf{X}_k = \mathbf{0} \qquad (6)$$
Taking the statistical average $E[\cdot]$ of Equation (6) leads us to the optimum condition of the MSE criterion as

$$E[e_k \mathbf{X}_k] = \mathbf{0} \qquad (7)$$
Inserting (3) into (6), we get the optimum weight of the LMS algorithm, $\mathbf{W}_o$, as

$$\mathbf{W}_o = \left[\mathbf{X}_k \mathbf{X}_k^T\right]^{-1} \mathbf{X}_k d_k \qquad (8)$$
The optimum weight $\mathbf{W}_o$ in (8) can be expected to fluctuate wildly in impulsive noise situations, since it has no protection against impulses appearing in the input vector $\mathbf{X}_k$.
When the effect of fluctuations in the input power level is considered, the step size $\mu$ of the LMS algorithm should be inversely proportional to the power of the input signal; this leads to the normalized LMS (NLMS) algorithm, in which the step size is normalized by the squared norm of the input vector, that is, $\mu / \|\mathbf{X}_k\|^2$ [9]. One of the principal characteristics of the NLMS algorithm is that the parameter $\mu$ is dimensionless, whereas the LMS step size has the dimension of inverse power, as mentioned above. Therefore, we may view the NLMS algorithm as having an input power-dependent adaptation step size, so that the effect of fluctuations in the power level of the input signal is compensated at the adaptation level. When we assume that in the steady state $e_k$ and $\mathbf{X}_k$ are independent, the input vector in the NLMS algorithm can be viewed as being normalized by its squared norm, that is, $\mathbf{X}_k / \|\mathbf{X}_k\|^2$.
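The two update rules above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's code; the function names and default parameter values are our own.

```python
import numpy as np

def lms_update(w, x, d, mu=0.01):
    """One LMS iteration, Equations (3) and (5): w <- w + 2*mu*e*x."""
    e = d - w @ x                          # error e_k = d_k - y_k
    return w + 2 * mu * e * x, e

def nlms_update(w, x, d, mu=0.5, eps=1e-8):
    """One NLMS iteration: the step size is divided by ||x||^2, so mu is
    dimensionless and input power fluctuations are compensated."""
    e = d - w @ x
    return w + mu * e * x / (x @ x + eps), e
```

Here `eps` is a small illustrative constant guarding against division by a near-zero input norm.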
Unlike the LMS or NLMS, the MEE algorithm, based on the error entropy criterion, is known for its robustness against impulsive noise [6]. In the following section, the MEE algorithm is analyzed with respect to its weight behavior under impulsive noise environments.
  3. MEE Algorithm and Magnitude Controlled Input Entropy
The MSE criterion is effective under the assumptions of linearity and Gaussianity since it uses only second order statistics of the error signal. When the noise is impulsive, a criterion considering all the higher order statistics of the error signal would be more appropriate.
Error entropy, as a scalar quantity, provides a measure of the average information contained in a given error distribution. With $N$ error samples $\{e_1, e_2, \ldots, e_N\}$ (sample size $N$), the distribution function of the error, $f_E(e)$, can be constructed by kernel density estimation as in Equation (9) [10].

$$f_E(e) = \frac{1}{N}\sum_{i=1}^{N} G_{\sigma}(e - e_i) \qquad (9)$$

where $G_{\sigma}(\cdot)$ denotes the Gaussian kernel with kernel size $\sigma$.
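The kernel density estimate of Equation (9) can be sketched as follows; this is an illustrative NumPy fragment with our own names, assuming the standard normalized Gaussian kernel.

```python
import numpy as np

def gauss_kernel(u, sigma):
    """Normalized Gaussian kernel G_sigma(u)."""
    return np.exp(-u ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

def error_density(e, samples, sigma=0.7):
    """Kernel density estimate f_E(e) of Equation (9): the average of
    Gaussian kernels centered at the N error samples."""
    return float(np.mean(gauss_kernel(e - np.asarray(samples), sigma)))
```

Since each kernel integrates to one, the estimate itself integrates to one regardless of the sample set.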
Since Shannon's entropy in [9] is hard to estimate and to minimize due to the integral of the logarithm of a given distribution function, Renyi's quadratic error entropy $H(e)$ has been used effectively in ITL methods, as described in (10).

$$H(e) = -\log \int f_E^2(e)\, de \qquad (10)$$
When the error entropy $H(e)$ in (10) is minimized, the error distribution $f_E(e)$ of the adaptive system is contracted and all higher-order moments are minimized [4].
Inserting (9) into (10) leads to the following $H(e)$, which can be interpreted in terms of interactions among pairs of error samples, where the error samples act as physical particles.

$$H(e) = -\log\left[\frac{1}{N^2}\sum_{j=1}^{N}\sum_{i=1}^{N} G_{\sigma\sqrt{2}}(e_j - e_i)\right] \qquad (11)$$
Since the Gaussian kernel $G_{\sigma\sqrt{2}}(\cdot)$ is always positive and decays exponentially with the squared distance, it may be considered to create a potential field. The sum of all pairwise interactions in the argument of $\log[\cdot]$ in (11) is called the information potential $IP$ [4].

$$IP = \frac{1}{N^2}\sum_{j=1}^{N}\sum_{i=1}^{N} G_{\sigma\sqrt{2}}(e_j - e_i) \qquad (12)$$
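The information potential of (12) reduces to a mean over all pairwise Gaussian kernel evaluations, which can be sketched as follows (an illustrative implementation with our own names):

```python
import numpy as np

def information_potential(errors, sigma=0.7):
    """Information potential IP of Equation (12): the double sum of
    Gaussian kernels (kernel size sigma*sqrt(2)) over all error pairs."""
    e = np.asarray(errors, dtype=float)
    diff = e[:, None] - e[None, :]               # all pairs e_j - e_i
    s = sigma * np.sqrt(2)
    g = np.exp(-diff ** 2 / (2 * s ** 2)) / (np.sqrt(2 * np.pi) * s)
    return float(g.mean())                       # (1/N^2) * double sum
```

Tightly clustered error samples give a larger $IP$ (smaller entropy), while spread-out samples give a smaller one.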
Then, minimization of the error entropy is equivalent to maximization of $IP$. For the maximization of $IP$, the gradient of (12) becomes

$$\frac{\partial IP}{\partial \mathbf{W}} = \frac{1}{2\sigma^2 N^2}\sum_{j=1}^{N}\sum_{i=1}^{N} G_{\sigma\sqrt{2}}(e_j - e_i)(e_j - e_i)(\mathbf{X}_j - \mathbf{X}_i) \qquad (13)$$
At the optimum state ($\partial IP / \partial \mathbf{W} = \mathbf{0}$), we have

$$\sum_{j=1}^{N}\sum_{i=1}^{N} G_{\sigma\sqrt{2}}(e_j - e_i)(e_j - e_i)(\mathbf{X}_j - \mathbf{X}_i) = \mathbf{0} \qquad (14)$$
Since the term $(e_j - e_i)$ implies how far the current error $e_j$ is located from each error sample $e_i$, we may define the error pair $\varepsilon_{j,i} = e_j - e_i$, which is generated from the error sample space at each iteration, as in Figure 2. The term $\varepsilon_{j,i}$ can be considered to carry information about the extent of spread of the error samples. Considering that entropy is a measure of how evenly energy is distributed, or of the range of positions of the components of a system, we will refer to this information as error entropy (EE) in this paper for convenience.
Similarly, the term $(\mathbf{X}_j - \mathbf{X}_i)$ indicates the distance between the current input vector $\mathbf{X}_j$ and another input vector $\mathbf{X}_i$ in the input vector space. Therefore, with the following definition, we can say that $\boldsymbol{\chi}_{j,i}$ carries information about the extent of spread of the input vectors, that is, input entropy (IE). Likewise, we will refer to $\boldsymbol{\chi}_{j,i}$ as an IE vector in this paper.

$$\boldsymbol{\chi}_{j,i} = \mathbf{X}_j - \mathbf{X}_i \qquad (15)$$
Then, with the EE sample $\varepsilon_{j,i}$ and the IE vector $\boldsymbol{\chi}_{j,i}$, Equation (14) can be rewritten as

$$\sum_{j=1}^{N}\sum_{i=1}^{N} G_{\sigma\sqrt{2}}(\varepsilon_{j,i})\,\varepsilon_{j,i}\,\boldsymbol{\chi}_{j,i} = \mathbf{0} \qquad (16)$$
If we consider that the summation operation in (16) can be replaced with the statistical average $E[\cdot]$, or vice versa, for practical reasons, the comparison between (16) and the optimum condition of the MSE criterion, $E[e_k \mathbf{X}_k] = \mathbf{0}$ in (7), provides the insight that $e_k$ of the MSE criterion corresponds to the EE sample $\varepsilon_{j,i}$, and $\mathbf{X}_k$ of the MSE criterion can be related to $G_{\sigma\sqrt{2}}(\varepsilon_{j,i})\boldsymbol{\chi}_{j,i}$ as a kind of modified input entropy vector. We also see that the term $G_{\sigma\sqrt{2}}(\varepsilon_{j,i})\boldsymbol{\chi}_{j,i}$ in (16) implies that the magnitude of $\boldsymbol{\chi}_{j,i}$ is controlled by $G_{\sigma\sqrt{2}}(\varepsilon_{j,i})$. At the occurrence of a strong impulse in $\mathbf{X}_j$, the error $e_j$ can be located far away from the other error samples, so that the EE sample $\varepsilon_{j,i}$ has a very large value. Then, the Gaussian function output $G_{\sigma\sqrt{2}}(\varepsilon_{j,i})$ becomes very small, since the Gaussian decays with $\varepsilon_{j,i}^2$. In turn, the magnitude of the IE vector $\boldsymbol{\chi}_{j,i}$ is reduced by the multiplication by $G_{\sigma\sqrt{2}}(\varepsilon_{j,i})$. In this regard, it is appropriate to interpret the term $G_{\sigma\sqrt{2}}(\varepsilon_{j,i})\boldsymbol{\chi}_{j,i}$ in (16) as a magnitude-controlled version of the IE vector. Defining $\mathbf{M}_{j,i}$ as the magnitude controlled input entropy (MCIE) in (17), this process can be described as in Figure 3.

$$\mathbf{M}_{j,i} = G_{\sigma\sqrt{2}}(\varepsilon_{j,i})\,\boldsymbol{\chi}_{j,i} \qquad (17)$$
In an element-wise expression,

$$M_{j,i}(l) = G_{\sigma\sqrt{2}}(\varepsilon_{j,i})\,\chi_{j,i}(l) \qquad (18)$$

where $M_{j,i}(l)$ and $\chi_{j,i}(l)$ denote the $l$th elements of $\mathbf{M}_{j,i}$ and $\boldsymbol{\chi}_{j,i}$, respectively.
With the MCIE in (17), the MEE algorithm becomes

$$\mathbf{W}_{k+1} = \mathbf{W}_k + \mu\,\frac{1}{2\sigma^2 N^2}\sum_{j=1}^{N}\sum_{i=1}^{N} \varepsilon_{j,i}\,\mathbf{M}_{j,i} \qquad (19)$$
The optimum condition in (16) can be rewritten as

$$\sum_{j=1}^{N}\sum_{i=1}^{N} \varepsilon_{j,i}\,\mathbf{M}_{j,i} = \mathbf{0} \qquad (20)$$
We may observe that the optimum condition of the MEE algorithm in (20) is very similar to that in (7) with respect to the error and input terms. One difference is that the MEE algorithm consists of summations over error entropy samples and input entropy vectors, while the LMS has just an error sample and an input vector.
On the other hand, it can be noticed that the MCIE can keep the algorithm stable even at occurrences of large error entropy samples, which arise mostly when the input is contaminated by impulsive noise. The summation over the sample pairs can also mitigate the influence of impulses, but it does not contribute much to deterring the influence of large errors, since even a single impulse can dominate the averaging (summation) operation.
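To make the interplay of EE samples, IE vectors, and the MCIE concrete, the gradient of (13) can be sketched as follows. This is a minimal NumPy illustration with our own names; the magnitude controller is the Gaussian factor applied to each input difference.

```python
import numpy as np

def mee_gradient(errors, X, sigma=0.7):
    """Gradient of the information potential, Equation (13):
    (1/(2*sigma^2*N^2)) * sum_j sum_i G(eps_ji) * eps_ji * chi_ji,
    where G(eps_ji) * chi_ji is the MCIE of Equation (17)."""
    e = np.asarray(errors, dtype=float)      # N error samples
    X = np.asarray(X, dtype=float)           # N input vectors (N x L)
    N = len(e)
    s = sigma * np.sqrt(2)
    eps = e[:, None] - e[None, :]            # EE samples eps_ji (N x N)
    chi = X[:, None, :] - X[None, :, :]      # IE vectors chi_ji (N x N x L)
    g = np.exp(-eps ** 2 / (2 * s ** 2)) / (np.sqrt(2 * np.pi) * s)
    mcie = g[:, :, None] * chi               # magnitude-controlled IE
    return (eps[:, :, None] * mcie).sum(axis=(0, 1)) / (2 * sigma ** 2 * N ** 2)
```

A weight update `w = w + mu * mee_gradient(errors, X)` ascends the information potential; an outlier error makes its Gaussian factor tiny, suppressing that pair's contribution to the sum.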
  4. Recursive Power Estimation of MCIE
The fixed step size of the MEE algorithm requires an understanding of the statistics of the input entropy prior to the adaptive filtering operation. This makes it hard in practice to choose an appropriate step size $\mu$ that controls the learning speed and stability.
Like the NLMS approach, which solves this kind of problem through normalization by the summed power of the current input samples [9,11], we propose heuristically to normalize the step size by the summed power $P_k$ of the current MCIE elements in (18), as

$$\mu_k = \frac{\mu}{P_k}, \qquad P_k = \sum_{j=1}^{N}\sum_{i=1}^{N}\|\mathbf{M}_{j,i}\|^2 \qquad (21)$$
Considering the fact that impulses can defeat the averaging operation, as explained in Section 3, we notice that the denominator may become large at an incidence of impulsive noise; in turn, $\mu_k$ becomes a very small value, which may induce very slow convergence. To avoid this kind of situation, we may adopt a sliding window of length $N_w$, as

$$P_k = \frac{1}{N_w}\sum_{m=k-N_w+1}^{k}\ \sum_{j=1}^{N}\sum_{i=1}^{N}\big\|\mathbf{M}_{j,i}^{(m)}\big\|^2 \qquad (22)$$

where $\mathbf{M}_{j,i}^{(m)}$ denotes the MCIE computed at iteration $m$.
However, this approach places a heavier computational burden on the MEE algorithm. To reduce the computation, we instead track the power recursively using a single-pole low-pass filter, i.e.,

$$P_k = \beta P_{k-1} + (1-\beta)\sum_{j=1}^{N}\sum_{i=1}^{N}\|\mathbf{M}_{j,i}\|^2 \qquad (23)$$

where $\beta$ ($0 < \beta < 1$) controls the bandwidth and time constant of the system, whose transfer function $T(z)$ with respect to its input $u_k = \sum_{j}\sum_{i}\|\mathbf{M}_{j,i}\|^2$ is given by

$$T(z) = \frac{1-\beta}{1-\beta z^{-1}} \qquad (24)$$
Then, the resulting algorithm, which we will refer to in this paper as the normalized MEE (NMEE), becomes

$$\mathbf{W}_{k+1} = \mathbf{W}_k + \frac{\mu}{P_k}\cdot\frac{1}{2\sigma^2 N^2}\sum_{j=1}^{N}\sum_{i=1}^{N} \varepsilon_{j,i}\,\mathbf{M}_{j,i} \qquad (25)$$
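The NMEE update of (25) with the recursive power tracking of (23) can be sketched as follows. This is an illustrative, self-contained implementation with assumed parameter values, not the authors' code.

```python
import numpy as np

class NMEE:
    """Normalized MEE: the MEE update with its step size divided by a
    recursively tracked MCIE power, Equations (23) and (25).
    Names and defaults here are illustrative."""

    def __init__(self, n_taps, mu=0.5, sigma=0.7, beta=0.9):
        self.w = np.zeros(n_taps)
        self.mu, self.sigma, self.beta = mu, sigma, beta
        self.P = 1.0                          # recursive MCIE power estimate

    def update(self, d, X):
        """d: N desired samples; X: N input vectors (N x L) in the window."""
        X = np.asarray(X, dtype=float)
        e = np.asarray(d, dtype=float) - X @ self.w   # error samples
        N = len(e)
        s = self.sigma * np.sqrt(2)
        eps = e[:, None] - e[None, :]         # EE samples
        chi = X[:, None, :] - X[None, :, :]   # IE vectors
        g = np.exp(-eps ** 2 / (2 * s ** 2)) / (np.sqrt(2 * np.pi) * s)
        mcie = g[:, :, None] * chi            # MCIE, Equation (17)
        # single-pole low-pass power tracking, Equation (23)
        self.P = self.beta * self.P + (1 - self.beta) * float((mcie ** 2).sum())
        grad = (eps[:, :, None] * mcie).sum(axis=(0, 1)) / (2 * self.sigma ** 2 * N ** 2)
        self.w = self.w + self.mu / self.P * grad     # normalized step, Equation (25)
        return self.w
```

The single multiply-accumulate of (23) replaces the sliding-window sum, so the power estimate adds negligible cost per iteration.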
On the other hand, the NLMS has been developed based on the principle of minimum disturbance, which states that the change of the tap-weight vector of an adaptive filter from one iteration to the next, that is, the squared Euclidean norm (SEN) of the change in the tap-weight vector, should be minimal [9]. From that perspective, the effectiveness of the proposed NMEE algorithm can be analyzed through the disturbance (SEN) around the optimum state, defined as

$$SEN = \left\|\mathbf{W}_{k+1} - \mathbf{W}_k\right\|^2 \qquad (26)$$
For the existing MEE algorithm of (19),

$$SEN_{MEE} = \mu^2\left\|\frac{1}{2\sigma^2 N^2}\sum_{j=1}^{N}\sum_{i=1}^{N} \varepsilon_{j,i}\,\mathbf{M}_{j,i}\right\|^2 \qquad (27)$$
For the proposed NMEE algorithm,

$$\mathbf{W}_{k+1} - \mathbf{W}_k = \frac{\mu}{P_k}\cdot\frac{1}{2\sigma^2 N^2}\sum_{j=1}^{N}\sum_{i=1}^{N} \varepsilon_{j,i}\,\mathbf{M}_{j,i} \qquad (28)$$

$$SEN_{NMEE} = \frac{\mu^2}{P_k^2}\left\|\frac{1}{2\sigma^2 N^2}\sum_{j=1}^{N}\sum_{i=1}^{N} \varepsilon_{j,i}\,\mathbf{M}_{j,i}\right\|^2 \qquad (29)$$
Comparison of $SEN_{MEE}$ in (27) and $SEN_{NMEE}$ in (29) leads to

$$SEN_{NMEE} = \frac{1}{P_k^2}\,SEN_{MEE} \qquad (30)$$

$$SEN_{NMEE} < SEN_{MEE} \quad \text{for} \quad P_k > 1 \qquad (31)$$

$$\frac{\mu}{P_k} < \mu \quad \text{for} \quad P_k > 1 \qquad (32)$$
This result indicates that the proposed method is more suitable than the conventional MEE when the MCIE power is greater than 1, that is, when a smaller effective step size is demanded, such as when the input signal is contaminated with strong impulsive noise. On the other hand, when the input signal is not strongly disturbed, so that a bigger step size can be employed for faster convergence, the proposed method is not guaranteed to be better than the fixed step size MEE algorithm.
On the other hand, there are many step size selection methods for gradient-based algorithms, and we need to verify that this approach is the right one for the MEE problem. Considering that the proposed step size selection is motivated by the concept of input power normalization, as in the NLMS algorithm, it is reasonable to investigate whether input power normalization itself is effective in the MEE algorithm under impulsive noise.
When we employ the summed squared norm of the raw input difference vectors, that is, $Q_k = \sum_{j}\sum_{i}\|\boldsymbol{\chi}_{j,i}\|^2$, for the step size normalization in the MEE algorithm (we will refer to this as NMEE2 for convenience), the squared Euclidean norm becomes

$$SEN_{NMEE2} = \frac{\mu^2}{Q_k^2}\left\|\frac{1}{2\sigma^2 N^2}\sum_{j=1}^{N}\sum_{i=1}^{N} \varepsilon_{j,i}\,\mathbf{M}_{j,i}\right\|^2 \qquad (33)$$
Assuming the EE sample $\varepsilon_{j,i}$ and the MCIE $\mathbf{M}_{j,i}$ are independent in the steady state, (33) becomes

$$SEN_{NMEE2} = \mu^2\left\|\frac{1}{2\sigma^2 N^2}\sum_{j=1}^{N}\sum_{i=1}^{N} \varepsilon_{j,i}\,\frac{\mathbf{M}_{j,i}}{Q_k}\right\|^2 \qquad (34)$$
The SEN in (29), which adopts the squared norm of the MCIE instead of $Q_k$, can be rewritten as

$$SEN_{NMEE} = \mu^2\left\|\frac{1}{2\sigma^2 N^2}\sum_{j=1}^{N}\sum_{i=1}^{N} \varepsilon_{j,i}\,\frac{\mathbf{M}_{j,i}}{P_k}\right\|^2 \qquad (35)$$
Comparing the two SENs, (34) and (35), the MCIE in $SEN_{NMEE}$ is normalized by the MCIE power, whereas the MCIE in $SEN_{NMEE2}$ is normalized by the simply summed input power. This indicates that $SEN_{NMEE2}$ may vary considerably, since its denominator, which contains the impulsive noise, can fluctuate from small values to large values as strong impulses dominate the sum operation. From this analysis, the fact that the MCIE in $SEN_{NMEE}$ is normalized by the MCIE power, which uses the output of the magnitude controller that cuts the outliers caused by strong impulses, leads us to the argument that the proposed method is appropriate for impulsive noise situations. This will be tested in Section 5. As observed in (30), when the input signal is not in a strong impulsive noise environment, the proposed method may not be better than the existing MEE algorithm.
The effectiveness of the proposed NMEE algorithm under strong impulsive noise will be investigated in the following section.
  5. Results and Discussion
The simulation for observing the optimum weight behavior of the MEE algorithm is carried out in the equalization of the multipath channel of [12]. The transmitted symbol $d_k$ sent at time $k$ is randomly chosen from the symbol set. The impulsive noise $n_k$ in (1) consists of background white Gaussian noise (BWGN) and impulses (IM) with variances $\sigma_{BWGN}^2$ and $\sigma_{IM}^2$, respectively. The impulses are generated according to a Poisson process with incident rate $\varepsilon$ [10]. The distribution $f_{IM}(n)$ of the impulses is

$$f_{IM}(n) = (1-\varepsilon)\,\delta(n) + \frac{\varepsilon}{\sqrt{2\pi}\,\sigma_{IM}}\exp\!\left(-\frac{n^2}{2\sigma_{IM}^2}\right) \qquad (36)$$
The BWGN with variance $\sigma_{BWGN}^2 = 0.001$ is added to the channel output over the whole time. The impulses are generated with variance $\sigma_{IM}^2 = 50$. The TDL equalizer has 11 tap weights. For the MEE algorithm, the sample size $N$, the kernel size $\sigma$, and the convergence parameter $\mu$ are 20, 0.7, and 0.01, respectively. The step size for the LMS algorithm is 0.001. All parameter values are selected to produce the lowest minimum MSE in this simulation.
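The impulsive noise used in the simulations can be sketched as follows: a minimal generator that adds BWGN over the whole time and draws impulses with the given incident rate. The function name, the seed handling, and the per-sample Bernoulli approximation of the Poisson arrivals are our own.

```python
import numpy as np

def impulsive_noise(n_samples, var_bwgn=0.001, var_im=50.0, rate=0.03, seed=0):
    """Background white Gaussian noise plus zero-mean Gaussian impulses that
    occur independently with probability `rate` per sample (variances taken
    from the simulation setup in the text)."""
    rng = np.random.default_rng(seed)
    bwgn = rng.normal(0.0, np.sqrt(var_bwgn), n_samples)
    occurs = rng.random(n_samples) < rate                 # impulse occurrences
    impulses = occurs * rng.normal(0.0, np.sqrt(var_im), n_samples)
    return bwgn + impulses
```

The total noise variance is approximately `var_bwgn + rate * var_im`, so even a small incident rate dominates the background noise power.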
Firstly, the weight traces are investigated through simulation in order to verify the robustness against impulsive noise. The impulses are generated with $\varepsilon = 0.01$ for clear observation of the weight behavior. The impulsive noise depicted in Figure 4 is applied to the channel output in the steady state, that is, after convergence. Figure 5 shows the learning curves of two of the tap weights (only two are shown due to the page limitation). At around 5000 samples, both reach their optima completely, and then they undergo the impulsive noise of Figure 4. In Figure 5, it is observed that MEE and LMS have the same steady-state weight values, and that each weight trace of MEE in the steady state shows no fluctuations, remaining undisturbed under the strong impulses. This is in obvious contrast to the LMS algorithm, whose weight traces exhibit sharp perturbations at impulse occurrences and remain perturbed for a long time, though gradually dying out.
Comparing the weight update of the LMS and its optimum in (8) with the optimum condition of the MEE, we can notice that the latter contains averaging operations and the MCIE. Since the averaging operations can easily be defeated by even a single strong impulse, we can conclude that the dominant source of the robustness against impulsive noise is the MCIE.
Secondly, the effectiveness of the proposed NMEE algorithm (25) designed with the MCIE is investigated through a learning performance comparison with the original MEE algorithm in (19) under the same impulsive noise with $\sigma_{IM}^2 = 50$ and $\varepsilon = 0.03$, as in the work [5], in which the impulsive noise is applied over the whole time. The MSE learning results are shown in Figure 6.
The LMS algorithm converges very slowly and stays at about −8 dB of MSE in the steady state. This result can be explained by the expression of $\mathbf{W}_o$ in (8), which has no measures to protect it from fluctuations caused by impulsive noise, as discussed in Section 2. On the other hand, the MEE algorithm, equipped with the magnitude controller for the IE, converges in about 1000 samples even under the strong impulsive noise. This result supports the analysis that the MCIE keeps the algorithm (19) and its steady-state weights undisturbed by the large error values that may be induced by excessive noise such as impulses.
As for the performance comparison between MEE and NMEE in Figure 6, NMEE shows a lower minimum MSE and a faster convergence speed simultaneously. The difference in convergence speed is about 500 samples, and that in minimum MSE is around 1 dB. Compared under the condition of equal convergence speed, the difference in minimum MSE is about 3 dB. This performance gap indicates that the proposed method of tracking the power of the MCIE recursively and using it to normalize the step size is significantly effective in terms of performance as well as computational complexity.
In Figure 7, the MCIE power becomes large as the MEE algorithm converges, and after convergence the trace shows large variations, mostly above 6. This means the condition of (32) is well satisfied in this simulation, so that when NMEE is employed, the effective step size $\mu/P_k$ remains smaller than the fixed step size, as demanded for better performance under impulsive noise. The fact that this is exactly in accordance with the step size choice described in Figure 6 supports the effectiveness of the proposed method by simulation.
In the same simulation environment, the MSE learning curves of the two input power normalization approaches, NMEE and NMEE2, are compared in Figure 8.
As observed in Figure 8, the input power normalization approach for variable step size selection in the MEE algorithm shows different MSE performance according to which signal power is used for normalization. When NMEE is employed, where the magnitude-controlled input entropy (MCIE) is used for power normalization, the MSE learning performance yields a steady-state MSE better by above 2 dB and a convergence speed faster by about 1000 samples than NMEE2, in which the squared norm of the unprocessed input differences is used for normalization. As discussed in Section 4, under strong impulsive noise the power of the MCIE is the right choice for step size normalization.
In system identification applications of adaptive filtering, as in the work [8], the desired signal is derived by passing a white Gaussian input through the unknown system. The unknown system in this simulation has length 9, and its impulse response follows a triangular waveform symmetric with respect to the central tap [9,13]. The TDL filter has 9 tap weights. The input signal is a white Gaussian process with zero mean and unit variance. The same impulsive noise used in Figure 6, uncorrelated with the input, is added to the output of the unknown system. The MSE learning curves are depicted in Figure 9.
One can observe from Figure 9 that the proposed NMEE achieves a lower steady-state MSE than the conventional MEE algorithm in the system identification problem as well.