Iterative Receiver Design for the Estimation of Gaussian Samples in Impulsive Noise

Featured Application: The communications scenario analyzed in this paper is typical of environments affected by strong electromagnetic interference (EMI), such as, e.g., power line communications or power substations. The transmission of random correlated samples with continuous values, which we analyze, can be seen both as a rough model for multicarrier digital transmission systems (such as OFDM) or, more realistically, as an accurate model for a network of distributed sensors, as typical of an Internet of Things (IoT) scenario. The receivers that we propose in this paper are a convenient choice whenever the sensors communicate (with each other or with a base station) in an environment affected by strong EM interference.

Abstract: Impulsive noise is the main limiting factor for transmission over channels affected by electromagnetic interference. We study the estimation of (correlated) Gaussian signals in an impulsive noise scenario. In this work, we analyze some existing as well as some novel estimation algorithms. Their performance is compared, for the first time, under different channel conditions, including the Markov-Middleton scenario, where the impulsive noise switches between different noise states. Following a modern approach in digital communications, the receiver design is based on a factor graph model and implements a message passing algorithm. The correlation among signal samples as well as among noise states brings about a loopy factor graph, where an iterative message passing scheme must be employed. As is well known, approximate variational inference techniques are necessary in these cases. We propose and analyze different algorithms and provide a complete performance comparison among them, showing that the Expectation Propagation, Transparent Propagation, and Parallel Iterative Schedule approaches reach a performance close to the optimal one, under different channel conditions.


Introduction
The design of a robust communication system for a transmission medium with strong electromagnetic interference (EMI), such as a power line communication (PLC) channel or a wireless network of distributed sensors subject to EMI, is a challenging task. In such scenarios, the dominant source of impairment is the impulsive noise, a non-Gaussian additive noise that arises from neighbouring devices (e.g., in power substations; when (dis)connecting from mains; or due to other electronic switching events) and whose power fluctuates in time [1][2][3][4][5].
Different impulsive noise models have been proposed in the past years, the simplest being a Bernoulli-Gaussian model [1], where a background Gaussian noise with given power switches to another (usually much larger) power level, during an impulsive event. These events induce a

System Model
A sequence of Gaussian samples {s_k}, k = 0, 1, ..., K−1, is transmitted over a channel impaired by impulsive noise. The received samples are thus expressed by

y_k = s_k + n_k,  k = 0, 1, ..., K − 1,   (1)

where {n_k} is a sequence of (zero-mean) additive Gaussian noise samples whose variance depends on the time index k, as detailed in the following.
The transmitted samples are assumed to be correlated according to an autoregressive model of order 1. The AR(1) sequence is thus obtained as the output of a single-pole Infinite Impulse Response (IIR) digital filter, fed by independent and identically distributed (i.i.d.) Gaussian samples {ω_k}, with ω_k ∼ N(0, σ²_ω):

s_k = a_1 s_{k−1} + ω_k,   (2)

where a_1 is the pole of the filter. The variance σ²_s of s_k is taken as a reference, hence we set σ²_ω = (1 − a²_1)σ²_s in (2). The noise samples n_k in (1) follow the statistical description of either the Markov-Gaussian model or of the Markov-Middleton class A noise model. Both models, as detailed in the following subsections, account for a correlation among noise samples, which in turn reflects the physical property of burstiness. As is well known, impulsive noise events occur in bursts, i.e., through sequences of noise samples whose average power (variance) suddenly becomes larger.
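As an illustration, the AR(1) sequence in (2) is straightforward to generate numerically. The sketch below is our own (the function name and the NumPy-based implementation are not part of the paper's system model); it starts the recursion from the stationary distribution so that the variance equals σ²_s from the first sample.

```python
import numpy as np

def generate_ar1(K, a1, sigma2_s, rng=None):
    """Generate K samples of the AR(1) process s_k = a1*s_{k-1} + w_k,
    with innovation variance chosen so that Var(s_k) = sigma2_s."""
    rng = np.random.default_rng(rng)
    sigma2_w = (1.0 - a1**2) * sigma2_s          # innovation variance
    s = np.empty(K)
    s[0] = rng.normal(0.0, np.sqrt(sigma2_s))    # start in steady state
    w = rng.normal(0.0, np.sqrt(sigma2_w), K)
    for k in range(1, K):
        s[k] = a1 * s[k - 1] + w[k]
    return s
```

For a_1 = 0.9 and σ²_s = 1, the empirical variance of a long realization approaches 1 and the lag-1 correlation approaches a_1, as expected from the model.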

Markov-Gaussian noise
The Bernoulli-Gaussian noise [1] is a simple two-state model consisting of a background Gaussian noise sequence {n^G_k}, with variance σ²_G, on top of which another independent Gaussian noise sequence {n^I_k}, with variance σ²_I, may appear, due to a source of extra noise, like an interferer. The occurrence of such an impulsive event follows the Bernoulli distribution: the probability of observing a sequence {n^B_k} = {n^G_k + n^I_k}, i.e., of being in a bad channel condition (the superscripts G, I and B stand for Good, Interferer and Bad, respectively), is usually much smaller than that of being in a good condition, with only background noise present. At the same time, the variance σ²_B = σ²_G + σ²_I of the bad noise samples is usually much larger than that of the good ones, σ²_G, so that the power ratio R = σ²_B/σ²_G is typically much larger than one. According to the Bernoulli-Gaussian model, the probability density function (pdf) of each noise sample n_k is thus expressed as

p(n_k) = P_G g(n_k; 0, σ²_G) + P_B g(n_k; 0, σ²_B),   (3)

where g(x; η, γ²) denotes a Gaussian pdf with mean η and variance γ², and P_G and P_B = 1 − P_G are the probabilities of the good or bad channel states, i.e., without or with an active source of interference. We call i_k the underlying Bernoulli variable, which acts as a switch to turn on/off the interferer's noise, so that the overall noise sample at time k is

n_k = n^G_k + i_k n^I_k.   (4)

In order to model the bursty nature of impulsive noise, a Markov-Gaussian model was proposed in [3]. The transition probability matrix

P = [ P_GG  P_GB ; P_BG  P_BB ]   (5)

accounts for the correlation between successive noise states. As in any transition matrix, the elements are such that P_GG + P_GB = 1 and P_BG + P_BB = 1, so that in the state diagram representation of the Markov-Gaussian noise, in Fig. 1, the outgoing transition probabilities sum to one. Moreover, by introducing a correlation parameter γ = (P_GB + P_BG)^−1, the steady-state probabilities of the two noise states are P_G = γ P_BG and P_B = γ P_GB.
The parameter γ determines the amount of correlation between successive noise states: a larger γ implies an increased burstiness of the impulsive noise. More explicitly, the average duration of permanence in a given noise state is

T_G = P_GB^−1 = γ P_B^−1,  T_B = P_BG^−1 = γ P_G^−1,   (6)

whereas T_G = P_B^−1 and T_B = P_G^−1 would occur in the case of a memoryless Bernoulli-Gaussian process, to which the Markov-Gaussian model reduces in the case γ = 1.
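For concreteness, the two-state Markov-Gaussian noise is easy to simulate. The following sketch is ours (function name and signature are implementation choices); it parametrizes the bad-state power through the ratio R = σ²_B/σ²_G and starts the chain from its stationary distribution.

```python
import numpy as np

def markov_gaussian_noise(K, p_gb, p_bg, sigma2_G, R, rng=None):
    """Sample K noise values from the two-state Markov-Gaussian model.
    State G (0) has variance sigma2_G; state B (1) has variance R*sigma2_G.
    p_gb = P_GB (G -> B transition), p_bg = P_BG (B -> G transition)."""
    rng = np.random.default_rng(rng)
    sigma = np.sqrt(np.array([sigma2_G, R * sigma2_G]))  # per-state std dev
    p_b = p_gb / (p_gb + p_bg)       # stationary probability of the bad state
    state = int(rng.random() < p_b)  # start from the stationary distribution
    states = np.empty(K, dtype=int)
    for k in range(K):
        states[k] = state
        leave = p_gb if state == 0 else p_bg
        if rng.random() < leave:
            state = 1 - state        # switch to the other state
    noise = rng.normal(0.0, 1.0, K) * sigma[states]
    return noise, states
```

With P_GB = 0.01 and P_BG = 0.09, for instance, γ = 10 and the empirical fraction of bad-state samples approaches P_B = 0.1, while bad samples appear in bursts of average length T_B = P_BG^−1 ≈ 11.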

Markov Middleton class A noise
In this model, the number i of interferers active at time k can exceed one and can be virtually any integer, although a maximum number of interferers is usually considered for practical reasons, leading to M noise states i ∈ {0, 1, ..., M − 1}. The Gaussian interferers are all independent of each other and are assumed to have the same power, σ²_I. As in the Bernoulli-Gaussian and Markov-Gaussian models of Sec. 2.1, the interferers are superimposed on an independent background noise sequence {n^G_k} with variance σ²_G, so that the overall noise power is σ²_i = σ²_G + iσ²_I whenever the channel is in state i, i.e., i interferers are active. The pdf of the noise samples is thus

p(n_k) = Σ_{i≥0} P_i g(n_k; 0, σ²_i).   (7)

In the Middleton class A model, the channel state is assumed to follow a Poisson distribution [2], so that in (7) P_i = e^−A A^i (i!)^−1, where A is called the impulsive index and can be interpreted as the average number of active interferers per time unit. When the channel state accounts for i interferers, we can thus express the total noise power as

σ²_i = σ²_G (1 + i/(AΓ)),   (8)

in which the new parameter Γ = σ²_G (Aσ²_I)^−1 represents the ratio between the power of the background noise and the average power of the interferers. The average power of Markov Middleton class A noise is thus

σ²_n = σ²_G + Aσ²_I = σ²_G (1 + Γ^−1).   (9)

In order to avoid the practical complications of an extremely large (and unlikely) number of interferers, by exploiting the rapidly decreasing values of the Poisson distribution, an approximation of the Middleton class A noise is usually introduced by truncating its pdf to a maximum value M − 1 of the channel state, as follows:

p(n_k) = Σ_{i=0}^{M−1} P′_i g(n_k; 0, σ²_i),  with  P′_i = P_i / Σ_{j=0}^{M−1} P_j.   (10)

In order to account for memory in the Middleton class A model, a hidden Markov model (HMM) is assumed to govern the underlying transitions among channel states, so that successive noise states are correlated [4]. The corresponding transition probability matrix has entries

P(i, j) = x δ_{ij} + (1 − x) P′_j,   (11)

in which each row sums to one and x is the correlation parameter. Referring to the state diagram in Fig.
2, the channel can remain in the present state i_k ∈ {0, 1, ..., M − 1} with probability x or otherwise switch to one of the states (including i_k itself), with the pruned (due to (10)) Poisson distribution. The average duration of an impulsive event with m interferers, i.e., the average permanence time within a state m, can be computed as

T_m = [1 − x − (1 − x)P′_m]^−1 = [(1 − x)(1 − P′_m)]^−1.   (12)

The Markov-Middleton class A model reduces to the simpler memoryless Middleton class A model in the case x = 0. At the same time, the Markov-Gaussian model is an instance of the Markov-Middleton class A model, for M = 2. Although the latter is a more general model that includes the former, we illustrate and analyze both, in line with the existing literature that developed along the two tracks. In any case, the expression of the noise samples to be considered in the observation model (1) is the one in (4).
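The state dynamics just described are also simple to simulate. The sketch below is ours (the function name is an implementation choice): with probability x the state is kept, otherwise a new state is redrawn from the pruned Poisson PMF P′, which matches the transition structure in (11).

```python
import math
import numpy as np

def markov_middleton_states(K, A, M, x, rng=None):
    """Sample a length-K channel-state sequence i_k in {0,...,M-1} for the
    Markov-Middleton model with impulsive index A and correlation x."""
    rng = np.random.default_rng(rng)
    # truncated (pruned) Poisson PMF P'_i, i = 0, ..., M-1
    P = np.array([math.exp(-A) * A**i / math.factorial(i) for i in range(M)])
    P /= P.sum()
    states = np.empty(K, dtype=int)
    states[0] = rng.choice(M, p=P)       # start from the truncated Poisson PMF
    for k in range(1, K):
        if rng.random() < x:
            states[k] = states[k - 1]    # remain in the present state
        else:
            states[k] = rng.choice(M, p=P)  # redraw (possibly the same state)
    return states
```

Since redrawing may return the current state, the effective probability of keeping state m is x + (1 − x)P′_m, consistent with the average permanence time (12).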

Factor Graph and Sum-Product Algorithm
As a common technique to perform maximum a posteriori (MAP) symbol estimation, we use a factor graph (FG) representation for the variables of interest and for the relationships among them that stem from the system model at hand. These variables and relationships are in turn the variable nodes and the factor nodes appearing in the FG [15]. The aim is to compute the marginal distributions (pdf) of the random variables to be estimated, i.e., of the transmitted signal samples s_k, conditioned on the observation of the received noisy samples y_k in (1), whereas the overall FG represents the joint distribution of all the variable nodes, i.e., of the transmitted samples as well as of the channel states i_k. Although this marginalization task is nontrivial in the case of many (e.g., thousands of) variable nodes, it is brilliantly accomplished by the sum-product algorithm (SPA), a message-passing algorithm able to reach the exact solution in the case of FGs without loops [15].
Thanks to Bayes' rule, the joint posterior distribution of the signal samples s = {s_k} and channel states i = {i_k}, given the observation samples y = {y_k}, can be factorized as follows:

p(s, i | y) ∝ p(y | s, i) p(s) P(i) = Π_k p(y_k | s_k, i_k) p(s_k | s_{k−1}) P(i_k | i_{k−1}).   (13)

The Markovianity of both the signal sequence and of the noise sequence is a key factor in the application of the chain rule in (13), where simplified expressions result for its factors. Such a short-term memory^1 implies that, besides the two types of variable nodes, i.e., s_k and i_k, the FG consists of three types of factor nodes, i.e., p(y_k | s_k, i_k), p(s_k | s_{k−1}), and P(i_k | i_{k−1}), the latter two involving only pairs of time-adjacent variable nodes. This feature is further evidenced by the FG depicted in Fig. 3, which graphically represents the joint pdf in (13). The pdfs p(s_k | s_{k−1}) and p(y_k | s_k, i_k), and the conditional probability mass function (PMF) P(i_k | i_{k−1}), simply arise from the system model, i.e., from equations (2), (1) and (5) or (11), and from the statistical description of the random variables therein. By denoting with g(x; η, γ²) the Gaussian pdf with mean η and variance γ², we have

p(s_k | s_{k−1}) = g(s_k; a_1 s_{k−1}, σ²_ω),   (14)
P(i_k | i_{k−1}) = P(i_{k−1}, i_k),   (15)
p(y_k | s_k, i_k) = g(y_k; s_k, σ²_{i_k}),   (16)

where σ²_{i_k} is the variance of the observed sample at time k, which depends on the corresponding impulsive noise state i_k, represented by the variable node below factor node (16) in Fig. 3. In (15), the transition probability between successive impulsive noise states is found from the entries P(i, j) of matrix (5) or (11), depending on which noise model is adopted.
A straightforward application of the SPA [15] yields the forward and backward messages exchanged along the top line of the graph:

p_f(s_k) ∝ ∫ p(s_k | s_{k−1}) p_u(s_{k−1}) p_f(s_{k−1}) ds_{k−1},   (17)
p_b(s_k) ∝ ∫ p(s_{k+1} | s_k) p_u(s_{k+1}) p_b(s_{k+1}) ds_{k+1},   (18)

where p_u(·) denotes the upward message reaching the variable node from the lower half of the FG, as detailed below.

^1 The Markov property of the signal and noise sequences could be easily extended to a memory larger than one. This would however complicate the resulting notation without introducing an extra conceptual contribution.
where the initial condition for (17) is p(s_0) = g(s_0; 0, σ²_s), while a constant value, as p_b(s_{K−1}) = 1, is assumed to bootstrap the backward messages, which models the absence of posterior information on the last signal sample. In a similar way, the forward and backward messages exchanged along the bottom line of the FG are

P_f(i_k) = Σ_{i_{k−1}} P(i_k | i_{k−1}) P_d(i_{k−1}) P_f(i_{k−1}),   (19)
P_b(i_k) = Σ_{i_{k+1}} P(i_{k+1} | i_k) P_d(i_{k+1}) P_b(i_{k+1}),   (20)

where the initial conditions clearly depend on the adopted noise model: the PMF of the initial impulsive noise state is the steady-state PMF [P_G, P_B] in the Markov-Gaussian case, or the truncated Poisson PMF {P′_i} in the case of Markov-Middleton noise, according to the descriptions in Secs. 2.1 and 2.2. In both cases, a uniform PMF (obtained by setting a possibly un-normalized equal value, as P_b(i_{K−1}) = 1, for any of its realizations) models the absence of prior assumptions on the last impulsive noise state.
The upward and downward messages, on the vertical branches of the FG, are directed towards the variable nodes s_k and i_k of our interest, respectively, and their expressions are

p_u(s_k) = Σ_{i_k} P_u(i_k) p(y_k | s_k, i_k),   (21)
P_d(i_k) = ∫ p(y_k | s_k, i_k) p_d(s_k) ds_k,   (22)

where, given the Gaussian expression (16), the integral in (22) takes the form of a convolution. The described FG, as evident from Fig. 3, includes loops, hence the SPA does not terminate spontaneously. As is well known, approximate variational inference techniques are required in these cases [15]. We shall describe some of them in Sec. 4 (including both traditional approaches and novel ones), and the performance of the resulting algorithms will be compared in Sec. 5. Nevertheless, all of them share a few fundamental features: i) the resulting algorithms are all iterative, passing messages more than once along the same FG edge and direction; ii) as a consequence, scheduling is a main issue, since it defines the message-passing procedure; iii) during the iterations, the complexity of messages spontaneously tends to "explode", so that clever message approximation strategies must be devised [15,16].
More specifically, regarding message approximations (point iii) above), the usual approach is the following. Cast in qualitative parlance, one should: 1) select an approximating family F, i.e., a family of pdfs/PMFs that is flexible and tractable enough, possibly exhibiting closure properties under multiplication, convolution or other mathematical operations; 2) select an appropriate divergence measure to quantify the accuracy of the approximation, hence to identify the best approximating pdf within the chosen family [17]. While there is no standard method to select the approximating family, the most common choice for the divergence measure is the Kullback-Leibler (KL) divergence, of which we give a few details hereafter.

Kullback-Leibler (KL) Divergence
The KL divergence is an information theoretic tool (see, e.g., [17]) that can be used to measure the similarity between two probability distributions. We can search a pre-selected approximating family F of (possibly "simple") distributions, to find the distribution q(x) closest to a given distribution p(x), which is possibly much more "complicated" than q(x) (such as, e.g., a mixture). Such an operation is called projection and is accomplished by minimizing the KL divergence, as

q*(x) = proj[p(x)] = argmin_{q∈F} D(p || q),  with  D(p || q) = ∫ p(x) log [p(x)/q(x)] dx.   (23)

A common choice is to let q(x) belong to an exponential family,

q(x) ∝ exp( Σ_j v_j G_j(x) ),   (24)

where the functions G_j(x) are the so-called features of the family and the v_j are the natural parameters of the distribution. Gaussian pdfs or Tikhonov pdfs, just to name a couple, are common distributions that naturally belong to exponential families, each with its own set of features (G_j(x) = x^j (j = 0, 1, 2) for the Gaussian; G_1(x) = cos(x) and G_2(x) = sin(x) for the Tikhonov) and each with its own set of constraints on the natural parameters (e.g., v_2 < 0 for the Gaussian), that force the family to include proper distributions only. Exponential families have the pleasant property of being closed under multiplication. Furthermore, it can be shown [17] that the projection operation in (23) simply reduces to equating the expectation of each feature computed under the true distribution p(x) with that computed under the approximating distribution q(x). Such a procedure of matching the expectations (of the features) is at the heart of (and gives the name to) the celebrated EP (Expectation Propagation) algorithm [18], further applied in Sec. 4 to our problem.
To get a better understanding of this procedure, let us apply it to project a Gaussian mixture onto a single Gaussian pdf. Suppose p(x) is the following Gaussian mixture:

p(x) = Σ_i α_i g(x; η_i, σ²_i),   (25)

where Σ_i α_i = 1 for normalization. We shall approximate it by the nearest Gaussian q(x) = g(x; η, σ²) in the sense of the KL divergence, i.e., by applying (23). To that purpose, given that the expectations of the features of a Gaussian distribution are simply the mass, the mean and the mean-square value of that Gaussian (i.e., the expectations of 1, x and x²), the matching of these moments computed under p(x) with those computed under q(x) simply amounts to equating the mean and variance of the two distributions (assuming that both p(x) and q(x) are normalized). Hence,

η = Σ_i α_i η_i,
σ² + η² = Σ_i α_i (σ²_i + η²_i),   (26)

are the resulting algebraic equations from which the parameters (both the natural parameters and the moments) of the best approximating Gaussian q(x) are derived.
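The moment matching just described takes only a few lines of code. The helper below is our own sketch (not part of the paper): it computes the mean and the mean-square value of the mixture and returns the moments of the KL-closest Gaussian.

```python
import numpy as np

def project_mixture_to_gaussian(alpha, eta, var):
    """KL projection of the Gaussian mixture sum_i alpha_i g(x; eta_i, var_i)
    onto a single Gaussian, by matching mass, mean and mean-square value."""
    alpha = np.asarray(alpha, float)
    eta = np.asarray(eta, float)
    var = np.asarray(var, float)
    a = alpha / alpha.sum()              # normalize the mixture weights
    mean = np.sum(a * eta)               # E[x] under the mixture
    msq = np.sum(a * (var + eta**2))     # E[x^2] under the mixture
    return mean, msq - mean**2           # moments of the projected Gaussian
```

For example, the symmetric mixture 0.5 g(x; −1, 1) + 0.5 g(x; 1, 1) projects onto a zero-mean Gaussian with variance 2, since the spread of the component means inflates the matched variance.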

Signal Estimation
Referring to the FG in Fig. 3, which models the problem described in Sec. 2, we shall first show that the message-passing procedure established by the SPA, as detailed in Sec. 3, along the horizontal edges of the FG coincides with two classical estimation problems, and is solved by two equally classical signal processing algorithms. Namely, along the top line of edges in the FG, the forward-backward message-passing of Gaussian estimates for the (continuous) samples s_k coincides with the celebrated Kalman smoother [21]. Along the bottom line of edges in the FG, a similar forward-backward message-passing of (discrete) PMFs exactly follows the famous BCJR algorithm [22] (named after the initials of the authors), which is known to provide a MAP estimate of the channel states i_k at each time epoch, i.e., for the variable nodes along the bottom line of the FG. It is significant that the same BCJR algorithm was recently implemented in [9], in the context of a problem similar to ours, where memoryless Gaussian signal samples are estimated in the presence of Markov-Gaussian impulsive noise (to which our system model reduces in the special case a_1 = 0 and M = 2). It is indeed the memory in both the signal and the noise sequences that makes the loopy FG in Fig. 3 "unsolvable" (in the sense of marginalizing the joint pdf) with the SPA. The problem arises in fact from the messages passed along the vertical edges of the FG, i.e., those that connect the top half (that of the signal samples) to the bottom one (that of the impulsive noise channel states). As discussed in the following in more detail, while the downward messages (22) result from products and convolutions of Gaussian pdfs, hence are themselves Gaussian, this is not the case for the upward messages in (21), which are Gaussian mixtures, basically due to the presence of the discrete variables i_k.

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 30 November 2020, doi:10.20944/preprints202011.0725.v1
The coexistence of discrete and continuous variable nodes is a well-known scenario that makes the number of mixture elements, hence the complexity of messages, increase exponentially at every time step, so that mixture reduction techniques must be employed [16]. Mixture reduction can be accomplished either by taking hard decisions on the impulsive noise channel states or by exploiting the soft information contained in the mixture messages p_u(s_k), through more sophisticated approximation schemes [23].

Upper FG half: Kalman Smoother
Suppose that the lower part of the FG sends messages p_u(s_k) consisting of a single Gaussian distribution, with mean ŝ_{u,k} and variance σ̂²_{u,k}:

p_u(s_k) = g(s_k; ŝ_{u,k}, σ̂²_{u,k}).   (27)

Under this assumption, and based on the SPA, the forward and backward equations (17), (18) along the upper line of the FG coincide with those of a Kalman smoother [15]. To compute the forward and backward messages, we denote by η_{f,k} and η_{b,k} the means of the Gaussian messages p_f(s_k) and p_b(s_k), and by σ²_{f,k} and σ²_{b,k} their variances. Equations (14), (17), and (27) yield

p_f(s_k) = g(s_k; η_{f,k}, σ²_{f,k}),   (28)

in which η_{f,k} and σ²_{f,k} can be recursively updated as

η_{f,k} = a_1 σ²_{p,k−1} ( η_{f,k−1}/σ²_{f,k−1} + ŝ_{u,k−1}/σ̂²_{u,k−1} ),   (29)
σ²_{f,k} = a²_1 σ²_{p,k−1} + σ²_ω,  with  σ²_{p,k−1} = ( 1/σ²_{f,k−1} + 1/σ̂²_{u,k−1} )^−1.   (30)

In the same way, the backward recursion can be obtained from (14), (18), and (27):

p_b(s_k) = g(s_k; η_{b,k}, σ²_{b,k}),   (31)

where

η_{b,k} = σ²_{q,k+1} ( η_{b,k+1}/σ²_{b,k+1} + ŝ_{u,k+1}/σ̂²_{u,k+1} ) / a_1,   (32)
σ²_{b,k} = ( σ²_{q,k+1} + σ²_ω ) / a²_1,  with  σ²_{q,k+1} = ( 1/σ²_{b,k+1} + 1/σ̂²_{u,k+1} )^−1.   (33)

Thus, the message sent from the variable node s_k to the factor node p(y_k | s_k, i_k), here named p_d(s_k), is Gaussian too:

p_d(s_k) = p_f(s_k) p_b(s_k) = g(s_k; η_{d,k}, σ²_{d,k}),   (34)

where, as in all Gaussian product operations, the variance and the mean are found by summing the precisions (i.e., the inverses of the variances) and the precision-weighted means:

1/σ²_{d,k} = 1/σ²_{f,k} + 1/σ²_{b,k},   (35)
η_{d,k}/σ²_{d,k} = η_{f,k}/σ²_{f,k} + η_{b,k}/σ²_{b,k}.   (36)
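As a concrete illustration of the forward (filtering) pass under the single-Gaussian assumption on p_u(s_k), consider the sketch below. It is our own (the function name and array layout are implementation choices, not from the paper): at each step it combines the forward prediction with the upward message by a product of Gaussians (precisions add), then propagates through the AR(1) dynamics; the backward pass is analogous.

```python
import numpy as np

def kalman_forward(u_mean, u_var, a1, sigma2_w, sigma2_s):
    """Forward pass of the Kalman smoother on the upper FG half.
    u_mean[k], u_var[k]: mean/variance of the (single-Gaussian) upward
    messages p_u(s_k). Returns the forward message moments."""
    K = len(u_mean)
    eta_f = np.empty(K)
    var_f = np.empty(K)
    eta_f[0], var_f[0] = 0.0, sigma2_s            # prior p(s_0)
    for k in range(1, K):
        # product of p_f(s_{k-1}) and p_u(s_{k-1}): precisions add,
        # precision-weighted means add
        post_var = 1.0 / (1.0 / var_f[k - 1] + 1.0 / u_var[k - 1])
        post_mean = post_var * (eta_f[k - 1] / var_f[k - 1]
                                + u_mean[k - 1] / u_var[k - 1])
        # propagate through s_k = a1 * s_{k-1} + w_k
        eta_f[k] = a1 * post_mean
        var_f[k] = a1**2 * post_var + sigma2_w
    return eta_f, var_f
```

As a sanity check, with uninformative upward messages (very large variance) the forward variance stays at the stationary value σ²_s, since a²_1 σ²_s + σ²_ω = σ²_s by construction.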

Lower FG half: BCJR
As mentioned earlier, the message P_d(i_k) in (22) is the result of a convolution between two Gaussian pdfs, one provided by the channel observation, p(y_k | s_k, i_k), and the other, p_d(s_k), by the upper FG half:

P_d(i_k) = ∫ g(y_k; s_k, σ²_{i_k}) g(s_k; η_{d,k}, σ²_{d,k}) ds_k = g(y_k; η_{d,k}, σ²_{i_k} + σ²_{d,k}).   (37)

As discussed in Sec. 2.2, σ²_{i_k} takes M different values, each in a one-to-one correspondence with an impulsive noise channel state. According to (15) and (37), the forward and backward recursions (19), (20), along the bottom line of FG edges, are:

P_f(i_k) = Σ_{i_{k−1}} P(i_{k−1}, i_k) P_d(i_{k−1}) P_f(i_{k−1}),   (38)
P_b(i_k) = Σ_{i_{k+1}} P(i_k, i_{k+1}) P_d(i_{k+1}) P_b(i_{k+1}).   (39)

The above equations form the well-known MAP symbol detection algorithm known as BCJR [22], more generally referred to as the forward-backward algorithm [15]. For each variable node i_k, the product of (38), (39) is the message, here named P_u(i_k), that is sent upward to the factor node p(y_k | s_k, i_k):

P_u(i_k) = P_f(i_k) P_b(i_k).   (40)
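The forward-backward pass over the channel states can be sketched as follows. This is our own vectorized sketch (array layout and the per-step normalization, added for numerical stability, are implementation choices): P_trans[i, j] is the transition probability P(i, j), and P_d[k, i] collects the downward messages (37).

```python
import numpy as np

def bcjr(P_trans, P_d, P0):
    """Forward-backward (BCJR) pass over the channel-state chain.
    P_trans[i, j] = P(i_k = j | i_{k-1} = i); P_d[k, i] is the downward
    message at time k; P0 is the PMF of the initial state.
    Returns the upward messages P_u and the normalized state posteriors."""
    K, M = P_d.shape
    Pf = np.empty((K, M))
    Pb = np.empty((K, M))
    Pf[0] = P0
    for k in range(1, K):                        # forward recursion
        Pf[k] = (Pf[k - 1] * P_d[k - 1]) @ P_trans
        Pf[k] /= Pf[k].sum()                     # normalize for stability
    Pb[K - 1] = 1.0                              # uniform bootstrap
    for k in range(K - 2, -1, -1):               # backward recursion
        Pb[k] = P_trans @ (Pb[k + 1] * P_d[k + 1])
        Pb[k] /= Pb[k].sum()
    Pu = Pf * Pb                                 # message toward the factor node
    post = Pu * P_d                              # state posterior (up to scale)
    return Pu, post / post.sum(axis=1, keepdims=True)
```

With a uniform transition matrix the chain carries no memory, and the posterior at each epoch reduces to the (normalized) downward message itself, which is a quick way to sanity-check the implementation.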

Hard Decisions and the Parallel Iterative Schedule algorithm
Substituting (16) and (40) in (21), it is clear that the resulting message

p_u(s_k) = Σ_{i_k} P_u(i_k) g(s_k; y_k, σ²_{i_k})   (41)

is a Gaussian mixture and not a single Gaussian distribution, as assumed in (27). The generation of a Gaussian mixture at every vertical edge of the FG exponentially increases the computational complexity; thus, the use of mixture reduction techniques is unavoidable. To that purpose, one radical approach is to take a hard decision on every i_k, selected as the modal value of the estimated PMF P̂(i_k) = P_u(i_k) P_d(i_k):

î_k = argmax_{i_k} P̂(i_k).

The mixture (41) is thus approximated by its most likely Gaussian term, so that the assumption of having Gaussian priors for each s_k (i.e., the information on the individual sample that does not depend on the correlated sequence), as carried by the upward messages p_u(s_k), is respected, and the algorithm in Sec. 4.1 can be applied.
With this approximation, we can then establish a parallel iterative schedule (PIS) in the loopy FG. The upper and the lower parts of the FG work in parallel, at every iteration, according to their forward-backward procedures (corresponding, as seen, to the Kalman smoother and the BCJR, respectively). At the end of a forward-backward pass, the two halves of the FG exchange information through messages sent along the vertical edges of the graph. Namely, p_d(s_k) is convolved with the corresponding Gaussian observation (16), so as to get P_d(i_k) in (22), while P_u(i_k) is used to update the impulsive noise state PMF P̂(i_k) = P_u(i_k) P_d(i_k), hence the estimate î_k, at every iteration. This scheduling, together with the approximate hard decision strategy that is an essential part of it, guarantees the convergence of the overall iterative algorithm, as shown in Sec. 5.

Soft Decisions: EP and TP algorithms
Hard decisions on every i_k imply that only part of the information provided by the lower FG half (Sec. 4.2) is used by the signal estimation in the upper FG half (Sec. 4.1), while the rest is discarded. This is clearly a suboptimal approach.
A better performing soft-information strategy can instead be pursued, based on approximating the mixture (41) by minimizing the KL divergence. This can be implemented by using either the EP algorithm or the TP algorithm. To gain a better insight into the similarities and differences between EP and TP, consider the posterior marginal p(s_k | y) obtained, according to the SPA rules, as the product of all the incoming messages to the variable node s_k:

p(s_k | y) ∝ p_d(s_k) p_u(s_k),   (42)

where p_d(s_k) in (34) is Gaussian and p_u(s_k) in (41) is a Gaussian mixture; thus, p(s_k | y) is a Gaussian mixture too. An approximation for it can be computed by the projection operation (23), once an approximating exponential family F is selected. In the problem analyzed in this manuscript, the Gaussian family seems a natural choice, so that

p^EP(s_k | y) = proj[ p_d(s_k) p_u(s_k) ].   (43)

This is exactly the approach of the EP algorithm: the posterior marginal is approximated by matching the moments of the approximating Gaussian to those of the mixture, as discussed in Sec. 3.1, and the upward message p_u(s_k), sent to the variable node s_k for the next iteration, is replaced by the following, obtained by Gaussian division:

p^EP_u(s_k) = p^EP(s_k | y) / p_d(s_k).   (44)

This division of pdfs, which is inherent to the EP algorithm, is not painless and can give rise to improper distributions, which in turn introduce instabilities into the algorithm [18,23,24]. This is easily understood by considering a Gaussian division in (44) where the variance of the denominator is smaller than the variance of the numerator, so that an absurd negative variance results. Several techniques have been proposed to avoid these instabilities; we adopt the simple improper-message rejection discussed in [24].
An alternative approach, which spontaneously avoids instabilities, is that of the TP algorithm [19], where it is the individual message p_u(s_k), i.e., the mixture, that is projected onto an approximating (here Gaussian) pdf, instead of the posterior marginal, as done in EP:

p^TP_u(s_k) = proj[ p_u(s_k) ].   (45)

In TP, the posterior marginal, from which the signal sample is estimated, is as usual obtained by multiplying the incoming messages to s_k, i.e., the TP message (45) is multiplied by p_d(s_k):

p^TP(s_k | y) ∝ p_d(s_k) p^TP_u(s_k),   (46)

to finally get the TP symbol estimate from the posterior marginal (46). Note that the result in (45) would be obtained from the EP projection in (44) if only the operator proj[·] in (43) were transparent to p_d(s_k), i.e., if it were proj[p_d(s_k) p_u(s_k)] = p_d(s_k) proj[p_u(s_k)] (which amounts to stating p^EP(s_k | y) = p^TP(s_k | y)), which is in general not the case. Finally, the TP algorithm is inherently stable and does not require any complementary message rejection procedure.
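The two projections can be contrasted in a few lines of code. The sketch below is our own (helper names are ours): `tp_message` projects the upward mixture directly, while `ep_message` forms the posterior mixture, projects it, and divides out the downward Gaussian, returning `None` when the division produces an improper (negative-variance) message, which the rejection step would discard.

```python
import numpy as np

def gauss_product(m1, v1, m2, v2):
    """Product of two Gaussian pdfs: combined moments and scale factor."""
    v = 1.0 / (1.0 / v1 + 1.0 / v2)
    m = v * (m1 / v1 + m2 / v2)
    scale = np.exp(-0.5 * (m1 - m2)**2 / (v1 + v2)) / np.sqrt(2 * np.pi * (v1 + v2))
    return m, v, scale

def project(alpha, eta, var):
    """Moment-matching (KL) projection of a Gaussian mixture onto a Gaussian."""
    a = np.asarray(alpha, float) / np.sum(alpha)
    mean = np.sum(a * np.asarray(eta, float))
    msq = np.sum(a * (np.asarray(var, float) + np.asarray(eta, float)**2))
    return mean, msq - mean**2

def tp_message(alpha, eta, var):
    """TP: project the upward mixture p_u(s_k) itself."""
    return project(alpha, eta, var)

def ep_message(alpha, eta, var, eta_d, var_d):
    """EP: project the posterior mixture p_d*p_u, then divide out p_d.
    Returns None for an improper (negative-variance) result."""
    comp = [gauss_product(e, v, eta_d, var_d) for e, v in zip(eta, var)]
    w = [a * s for a, (_, _, s) in zip(alpha, comp)]
    m_post, v_post = project(w, [m for m, _, _ in comp], [v for _, v, _ in comp])
    prec_u = 1.0 / v_post - 1.0 / var_d          # Gaussian division
    if prec_u <= 0:
        return None                              # improper message: reject
    v_u = 1.0 / prec_u
    return v_u * (m_post / v_post - eta_d / var_d), v_u
```

As a consistency check, when the upward "mixture" has a single component, the EP project-then-divide update recovers that component exactly, and coincides with the TP projection.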

Results
We evaluated, by numerical simulation, the mean square error (MSE) versus the average signal-to-noise ratio (SNR) in a bursty impulsive noise scenario with a maximum of M − 1 interferers, i.e., with M noise states, where the 0-th state corresponds to background noise only. As discussed in Sec. 2, the peculiarity of this system is that the actual ratio of signal to noise power can dramatically change at every time epoch. Fig. 4 shows the MSE for the estimated signals corrupted by impulsive noise with M = 2, 4, 16 states (Figs. 4a, 4b and 4c), where the following system parameters were chosen for the simulation. The correlation parameter x was set to 0.9, meaning that the channel state is expected to switch to a new state (including the present one) with probability 0.1; adjacent noise samples are thus highly correlated, i.e., the noise is bursty. The impulsive index A, i.e., the average number of active interferers, was chosen to be either 0.2 or 1, statistically corresponding to a "hardly present" interferer or to an "almost constantly present" one (at least), respectively. Γ, the ratio between the power of the background noise and the average power of the interferers, was set to 0.01, corresponding to a strongly impulsive noise where interference can boost the noise power by a factor of Γ^−1 = 100. The signal parameters, in the AR(1) model, were set as follows: a_1 = 0.9 and σ²_s = 1 (i.e., normalized signal power). We transmitted 100 frames of 1000 samples each (for a total number of 10^5 received samples).
The curve labeled GAKS in Fig. 4 shows the performance of a genie-aided Kalman smoother and refers to an ideal receiver with perfect channel state information, i.e., with an exact knowledge of i_k, hence of the noise variance at every time epoch. In this case, the system reduces to the classical estimation of correlated Gaussian samples, for which the Kalman smoother is optimal; hence the GAKS curve represents a lower bound for the performance of any non-ideal receiver. The curve labelled PIS in Fig. 4 is obtained through the parallel iterative schedule algorithm discussed in Sec. 4.3. A comparison between Figs. 4a, 4b and 4c reveals that the simulation results are very close to each other. This implies that the performance of the estimator is not considerably changed when we opt for noise models with more than M = 2 states, the latter corresponding to the Markov-Gaussian model of Sec. 2.1. The reason is that a larger maximum number of interferers implies a smaller amount of power per interferer, so that, for a given average SNR, it is the average number of active interferers A, rather than its maximum value M − 1, that determines the performance. Results in Fig. 4 confirm, as expected, that a larger A degrades the performance of the estimator. For this reason, in the following simulations, we limited our attention to the case M = 4 only. We considered the same noise parameters as in [4], i.e., a correlation parameter x = 0.98, corresponding to an impulsive noise with increased burstiness, while the values A = 0.2 or 0.8 were slightly decreased (compared to Fig. 4). The values Γ = 0.01 and 0.001, considered in Figs. 5a and 5b, account for a strongly impulsive noise, where the impulsive events can increase the noise power up to 1000 times the background noise power. Fig. 5 shows the performance of an estimator that implements the EP and TP algorithms described in Sec. 4.4 (curves labelled EP and TP), which pursue the approximation of the Gaussian mixture messages.
These strategies exploit soft decisions on the impulsive noise states and show a superior performance, compared to the PIS algorithm that is based on hard decisions on i_k. As can be seen in Fig. 5, the performance of PIS is degraded especially around SNR values where signal and noise power are balanced. At these SNR levels (0 − 10 dB), signal estimation is neither dominated by noise, nor is it similar to a noiseless scenario. On the contrary, the performance of both EP and TP is close to the lower bound (GAKS), meaning that these algorithms are practically optimal. If we do not consider the issue of convergence, it would thus be irrelevant to choose either of the two. However, we recall that the EP algorithm requires the implementation of an extra improper-message rejection strategy. In contrast, the TP algorithm is inherently stable and requires fewer computations, and it is thus the preferable choice for this problem.
A comparison between Figs. 5a and 5b further shows that the performance of the estimator does not strongly depend on the value of Γ, as one could expect. To complete the picture, Fig. 6 reports simulation results obtained at other impulsive index values (A = 0.1, 0.5, 1.2).
The iterative algorithms that have been analyzed (PIS, EP, TP) showed a fast convergence in all cases, so that the simulations were carried out with 4 iterations; we verified that convergence is practically reached in 3 iterations and that no further modification of the results occurs after the 4th iteration (as tested up to the 10th iteration).

Conclusions
We proposed different algorithms to estimate correlated Gaussian samples in a bursty impulsive noise scenario, where successive noise states are highly correlated. The receiver design was based on a factor graph approach, resulting in a loopy FG due to the correlation among signal samples as well as among noise samples [15]. Due to the joint presence of continuous and discrete random variables (namely, signal samples and impulsive noise channel states), as typically occurs in these cases, the resulting iterative sum-product algorithm has an intractable complexity [17]. The bursty impulsive noise is in fact modelled either as Markov-Gaussian [3] or as Markov-Middleton [4], hence the channel is characterized by (dynamically switching) states that count the sources of electromagnetic interference at every time epoch [2]. Although belonging to the broad class of switching linear dynamical systems [16], the system considered here exhibits remarkable symmetry properties that allow an effective estimation of the signal samples through approximate variational inference. A simple parallel iterative schedule (PIS) of messages, including dynamically updated hard decisions on the channel states [13], was shown to provide a satisfactory, although suboptimal, performance for many different channel conditions [14]. The more computationally costly expectation propagation (EP) [18] is also applicable to this problem, albeit affected by its usual instability problems during the iterations, which can however be solved by methods known in the literature [24]. For this reason, an alternative transparent propagation (TP) algorithm has been introduced, which has a lower computational complexity (compared to EP) and is inherently stable [19]. Both EP and TP reach a performance that is close to the optimal one, i.e., that of a receiver with perfect channel state information.
For the first time, we applied all of these algorithms to bursty impulsive noise channels with a very large number (up to 15) of independently switching interferers [20]. Results demonstrate that, for a given SNR, the degradation induced by many (e.g., 15) interferers with limited power is similar to that produced by fewer (e.g., 3) stronger interferers with correspondingly larger power. Hence, few-state channel models, or even the binary Markov-Gaussian model, are adequate to predict the performance of any of the algorithms discussed here, both suboptimal (PIS) and close to optimal (EP, TP).

Funding: This research received no external funding.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: