Kurtosis-Based Symbol Timing and Carrier Phase/Frequency Tracking

Kurtosis is known to be effective at estimating signal timing and carrier phase offset when the processing is performed in a “burst mode,” that is, operating on a block of received signal in an offline fashion. In this paper, kurtosis-based estimation is extended to provide tracking of timing and carrier phase, and frequency offsets. The algorithm is compared with conventional PLL-type timing/phase estimation and shown to be superior in terms of speed of convergence, with comparable variance in the matched filter output symbols.


Introduction
In [1], a method was introduced for burst-mode symbol timing estimation and carrier phase estimation based upon complex and real kurtosis, respectively, of the received signal. The method involves computing kurtosis at several different parameter values (for both delay and phase) and is thus computationally expensive and more suited to offline computation than real-time implementations or parameter tracking. In this paper, kurtosis-based methods are extended to algorithms that track timing and phase. The carrier estimation is also extended to include both carrier phase and carrier frequency offset. As tracking algorithms (rather than burst-mode algorithms, which obtain one estimate for an entire burst), these algorithm are potentially amenable to real-time tracking application. In one development, tracking is accomplished by moving downhill on an objective function surface that operates without derivatives in a manner analogous to the Nelder-Mead simplex algorithm in one dimension [2]. In another development, a gradient descent algorithm is employed for phase/frequency estimation. The gradient descent method has higher complexity than the non-derivative method but has an otherwise similar performance. As shown in simulations, the kurtosis-based algorithms typically converge in fewer symbols than conventional PLL-type timing and phase tracking methods.
While [1] applied this kurtosis-based method only to QPSK constellations, in fact, it is agnostic with respect to signal constellation. Conventional synchronization algorithms typically employ knowledge of the signal constellation using training symbols and/or in a decision-directed mode (see, e.g., [3]). In some settings, such as in cognitive radio radio settings, in which an "intelligent receiver ... adapt [s] itself to a specific transmission context and blindly estimate[s] the transmitter parameters for self-reconfiguration purposes" [4], signals with unknown signal constellations may be employed. It would be helpful to be able to perform symbol and carrier sync without knowledge of the constellation, following which constellation identification may be more readily undertaken. The algorithms presented here serve that purpose. In addition, many detection problems require the symbols to be appropriately scaled, which often requires the use of automatic gain control (AGC) loops as part of the synchronization process. It may be advantageous to decouple the AGC loop from symbol timing and phase estimation, which the kurtosis-based approach where y gauss is a Gaussian random variable with the same covariance as y. When the negentropy is approximated using higher-order cumulants, the negentropy can be expressed as For a zero-mean variable with symmetric distribution, the negentropy is thus essentially equivalent to the kurtosis. This relationship between an entropy-related quantity and the kurtosis is what suggested this article as a venue of publication. Parameter estimation for communication systems has been widely studied. Textbooks on this topic include [10][11][12][13][14][15][16]. Parameter estimation for communication is also covered in conventional digital communication textbooks, e.g., [3,17,18]. See also [19]. What distinguishes this work from all of these references is the use of complex and real kurtosis as a primary tool for adaptation toward the parameter estimates, which enables the parameter estimators to operate agnostic of the signal constellation. While there are some methods that can be applied without knowing the transmitted signal, such as taking powers of the received data to remove digital symbol information, those methods work only with some constellations. By contrast, the kurtosis-based methods are more general. As shown below, they can converge quickly to parameter estimates, more quickly than, for example, PLL-based methods. This suggests the possibility of kurtosis-based phase estimation to be used in phase acquisition and, where the constellation is known, switching over to a PLL-type technique for tracking. This paper focuses on single-carrier linearly modulated digital communication signals. Extension to multicarrier signals is a topic for future research.

Signal Model
Digital information is transmitted at a rate of 1/T s symbols per second according to the complex bandpass representation where p(t) is a unit-energy, pulse-shaping function satisfying the Nyquist zero ISI theorem (e.g., a square-root raised cosine, SRRC [3] (Appendix A)), s k = a k + jb k is a complex point from the signal constellation, and ω 0 is the carrier frequency. The pulse p(t) is assumed to be symmetric so that the matched filter is the same as p(t). The received signal is where τ is delay resulting from transmission through the channel and n(t) is noise, assumed to be 0 mean. At the receiver, the signal is bandpass filtered and basebanded using a frequency ω 1 ≈ ω 0 . The resulting complex (nearly) basebanded signal is denoted as where ω off = ω 0 − ω 1 is the residual carrier frequency offset and φ accounts for the time delay and changes in index reference at the receiver. This signal is rotated by an estimate of the offset frequencyω off and passed through a matched filter with estimated delayτ to produce the signal x(t) = e −jω off t (u(t) * p(t −τ)), where * denotes convolution. The matched filter output can be expressed in terms of the pulse autocorrelation function where z(t) represents the noise filtered through the matched filter and r p is the pulse autocorrelation function The representation in (1) is accurate provided that the frequency offset does not exceed about 5% of the symbol rate [16] restriction was pointed out by a reviewer).
In modern practice, of course, the processing steps described above are implemented in discrete time and filters must be truncated to finite length. The signal u(t) is sampled at a rate of P (an integer) samples per symbol. Using T = T s /P, we write the basebanded sampled signal as u[n] = u(t)| t=nT . The pulse-shaping function p(t) is truncated to finite duration, which we take to be (Q − 1)T, where Q is an odd integer and T is the sampling interval. The pulse-shaping function is thus represented by Q samples, p(nT). Different authors employ different conventions for the pulse shaping function, representing it either as a noncausal signal, centered around 0, or as a causal signal. We use a notation that accommodates both convention. If the pulse p(t) is centered around t = 0, then there are (Q − 1)/2 samples before and after n = 0 and the peak of r p (nT) is at n = 0. If p(t) and its matched filter are causal, then the samples of interest of p(t) occur at indices for n = 0, 1, . . . , Q − 1. In this case, the peak of the autocorrelation function occurs at time t = (Q − 1)T, corresponding to sample n = (Q − 1). In either case, let O ("offset") be the index offset at which the peak sample of r p occurs: O = 0 for the pulse centered around the origin and O = Q − 1 for the causal pulse.
Due to the zero-ISI property, samples of the autocorrelation at shifts of multiples of T s are 0: Let the downsampled signal be indexed so that the sample at n = 0 corresponds to the full matched filter response of the first symbol. That is, for the causal pulse-shaping function and its matched filter, x[n] = x(t)| t=(O+n)T and z[n] = z(t)| t=(O+n)T . This results in Thus, x[0] corresponds to the symbol s 0 , etc. In what follows, the noise terms z[n] is omitted for brevity. (The phase change due to the difference in O was absorbed here into the phase φ.) The matched filter output is downsampled to the symbol rate, taking samples at indices n = P, = 0, 1, . . .. The downsampled matched filter output is This can be decomposed into the term k = and the other terms are (2) Ifτ = τ, then the second sum, representing intersymbol interference (ISI), disappears and the downsampled matched filter output is y[ ] = s e j((ω off −ω off ) PT+φ ), a single rotated (and rotating) symbol output. On the other hand, whenτ = τ, the sample is corrupted by the terms in the ISI sum. The goal of the timing delay estimation problem is to determineτ to eliminate the ISI. The goal of the phase/carrier offset problem is to eliminate the complex rotation factor e j((ω off −ω off ) PT+φ) .

Review of Kurtosis-Based Estimation
With this communication notation in place, we are now in a position to describe kurtosis-based parameter estimation. The kurtosis of a real zero mean random variable y is defined as [20,21] K The kurtosis of a complex random variable is [21] The kurtosis (either real or complex) has the following key properties: (1) If y and z are independent random variables, then for constants a and b, K(ay + bz) = |a| 4 K(y) + |b| 4 K(z), and (2) the kurtosis of a Gaussian random variable is 0. For non-Gaussian random variables, the kurtosis may be greater than zero or less than zero. Random variables representing points drawn from a symbol constellation have negative kurtosis (that is, they are sub-Gaussian). For a stationary (or nearly stationary) sequence of random variables (y [1], y [2], . . . , y[N kurt ]), the kurtosis may be estimated by using sample averages instead of expectations. Thus,K and, analogously, an estimateK r is computed for the real kurtosis. Note thatK c (y) accepts an entire sequence of data as its argument, over which the averaging occurs to estimate the kurtosis.
The complex kurtosis of a sequence of matched filter outputs y[ ] may be used to determineτ as follows. Considering the matched filter output in (2), whenτ = τ, by the central limit theorem, the sum of terms due to ISI causes y[ ] to tend toward having a Gaussian distribution, which has small absolute kurtosis. On the other hand, whenτ = τ, y[ ] consists only of the phase-rotated point from the signal constellation. Since the phase rotation does not affect the complex kurtosis, this y[ ] has negative kurtosis. This concept is used in [1] to synchronize a communication burst of many symbols (e.g., on the order of 100 symbols) using a method portrayed in Figure 1a. The received signal passes through a bank of N τ matched filters, each having a different delay. (The burst-mode synchronization operation implies that the matched filtering is computed over the entire received sequence.) The delays τ 0 , τ 1 , . . . , τ N τ −1 uniformly sample the range [0, T s ). The kurtosis of each of the downsampled matched filter outputs is computed, and the signal having the lowest (most negative) kurtosis is selected. That is, Since complex magnitudes are employed in the complex kurtosis, a lack of knowledge about phase and carrier offset does not affect the estimation of the delay. By contrast, in conventional time and phase estimation (e.g., [3]), timing and phase are jointly estimated in a joint computational structure, so the convergence of one estimate affects the convergence of the other.   The paper [1] recommends determining carrier frequency offset ω off using Fourier transform techniques. This is still recommended when using the methods described in this paper when the carrier frequency offset exceeds the ability of the algorithm to track. However, even after removing the carrier frequency offset using Fourier transform techniques, some residual carrier may remain to be tracked, which we still refer to as ω off . The method presented below reliably perform such offset frequency tracking.
For the moment, however, assume that the frequency offset is removed, so that, when the symbol timing is correctly estimated, the matched filter output is y[ ] = s e jφ . In terms of real and imaginary parts, y[ ] = y r [ ] + jy i [ ] and s = a + jb . By stacking these components, the rotation can be written, and using the rotated symbols can be written as According to (5), the rotation G(φ) produces a linear combination of the real and imaginary components of the symbol. Computing the kurtosis (separately) on the real and imaginary components y r [ ] and y i [ ], the kurtosis is smaller (nearer to zero) when φ = 0 due to the mixture of a and b . Remarkably, even though y r and y i are mixtures of only two random variables, the presence of the mixture can be detected using kurtosis. A phase estimateφ is selected and minimizes the sum of the kurtosis of the real and imaginary parts: The phase offset can be removed by computing the product y rot [ ] = e −jφ y[ ]. This can be expressed in matrix/vector form by stacking real and imaginary parts, leading to Whenφ = φ, the components of y ,rot are not mixed, so that the sum of the real kurtosis of the real and imaginary parts are maximally negative.
This kurtosis-based estimation is applied as shown in Figure 1b. There are N φ different rotations which span [0, 2π). The time-synchronized matched filter output y[ ] is rotated by each rotation, and the sum of the kurtosis of the real part and the kurtosis of the imaginary part are computed. The symbol having is determined from the kurtosis that is smallest (most negative) kurtosis.

Symbol Timing Tracking: Discrete Downhill Minimization
The kurtosis-based methods portrayed in Figure 1, representing the method in [1], require evaluation using N τ different matched filters in the symbol timing sync, followed by the same number of kurtosis computations, or N φ phase rotations, followed by the same number of kurtosis computations, where the kurtosis was computed using data over an entire burst. In [1], the expense of these computations is ameliorated by the fact that these computations are performed only once per burst. However, there is no provision in [1] to re-estimate or track the parameters. For this reason, the method is referred to as a "burst mode" algorithm, suitable for one-time estimation on a burst, without adaptation. The present paper computes only two kurtosis values for each symbol and computes the kurtosis over smaller segments of the signal, which reduces computational complexity (for each kurtosis). The authors found it surprising that, even using rather short segments to estimate the kurtosis (resulting in noisy estimates of the kurtosis), these kurtosis estimates were able to be used to estimate the timing and phase parameters.
This section describes how to use two kurtosis values computed for each symbol using a method that "slides downhill" toward the most negative kurtosis to estimate the timing offset. Only two kurtosis values are computed per symbol (as opposed to evaluating at N τ different values). This adaptive downhill slide is able to track changes in the parameters.
Let the (ostensibly) basebanded signal be denoted as u [n]. It is assumed that it is available over the necessary length of indices (such as by buffering).
In this paper, instead of evaluating kurtosis at N τ different timing offsets, at each symbol time, kurtosis is evaluated at two timing estimates. Let i 1 and i 2 be the indices of delay of the matched filters being used at the present time, where the indices refer to the delay estimates τ i 1 and τ i 2 . In order to produce the downsampled matched filter outputs used to estimate kurtosis, L = Q + (N kurt − 1)P input symbols are convolved with the matched filters p i 1 and p i 2 , resulting in the matched filter outputs where * denotes convolution. The downsampled matched filter samples are indexed so that the first retained matched filter sample is at n = 0 are denote the complex kurtosis estimated from the sequence y m [n], computed as in (4). The minimization algorithm seeks to minimize the kurtosis K m over a series of symbol times. The minimization operates analogous to a one-dimensional Nelder-Mead algorithm [2], rolling downhill toward minimum kurtosis by moving in the direction of smaller kurtosis. This is referred to as discrete downhill minimization. It has been found by experimentation on communications data that the kurtosis is, in fact, a convex function of the delay.
At some symbol time of the two kurtoses K 1 and K 2 , computed using p i 1 and p i 2 , the lower (closer to −∞) kurtosis is retained, the downsampled matched filter output is saved in y best , and the index of the other delay is adjusted by n τ steps to attempt to move toward lower kurtosis. The operation is detailed in Figure 2 and the logic is explicitly described in lines 48-60 of Algorithm 1 below. After execution of the steps, i 1 indexes the delay with lower kurtosis and i 2 is ready to be tested at the next time step. For example, in Case 1, since K 1 > K 2 , i 1 is set to the value i 2 and i 2 is adjust by some number of steps n τ . n τ is a small integer, say 2 or 3, indicating how fast to adapt. The other cases in Figure 2 correspond to other configurations of the kurtosis as a function of the delay indices i 1 and i 2 .
If i 1 already indexes the delay of minimum kurtosis, i 2 bounces back and forth between i 1 − n τ and i 1 + n τ , leaving i 1 at the delay τ i 1 , producing the lowest kurtosis. The descent algorithm moves at n τ at each step, so that the average number of steps to converge is N τ /(2n τ ).
The sequence of matched filter outputs selected with the lowest kurtosis is referred to as y best .
The number of steps of adjustment, n τ , is adjusted to provide a variable stepsize algorithm. The variance in the changes in index values is computed. When the variance is below a threshold (suggesting that the estimate is converging to a steady value) n τ is decremented, provided that it is > 1. This reduces the jitter of the estimate in steady state. This dynamic encourages rapid convergence. Tuning of the algorithm can be performed by adjusting the decision thresholds. (While not totally satisfying, this is not so different from a PLL-based estimator, for which tuning may be required to obtain a desired performance.) Before: Before: After: Before: After: Before: After: When an index falls outside the range 0, 1, . . . , N τ − 1, it should be reduced modulo N τ . However, in order to preserve the directional ordering between i 1 and i 2 , this modulo reduction occurs only when both indices fall outside this range. Then both indices are reduced. These operations are described in lines 69-77 of Algorithm 1.

Carrier Frequency and Phase Tracking: Discrete Downhill Minimization
In this section, a minimization technique is developed for carrier frequency and phase tracking, similar to that used in the previous section for timing synchronization. This allows the algorithm to track these parameters.
The phase is estimated to have one of N φ different values φ 0 , φ 1 , . . . , φ N φ , which uniformly sample the range [0, 2π). Let ∆ φ denote the increment in phase between adjacent steps, e.g., Phase is estimated by rotating y best using two values of phase indexed by j 1 and j 2 and an estimate of ω off , then by estimating the real kurtosis on the real and imaginary parts of the signal, and by using this information to move downhill. The rotating signals are produced by where denotes element-by-element multiplication. The real kurtosis is computed as J 1 = realkurtosis(Re(y φ1 )) + realkurtosis(Im(y φ2 )) and similarly for J 2 .
The frequency is estimated as follows. Let j last denote the phase index of the best phase at the previous time. The change in phase from the last to the current time is ∆φ = (δ φ )(j 1 − j last ). With an index changes of 0 or ±1, ∆φ may be thought of as a discretetime point process taking values 0 and ±δφ. The average value of this point process is the frequency estimateω. The frequency estimateω taken as the output of a single-pole lowpass filter with unit DC gain and with input ∆φ. The pole of the filter is denoted by α. The filtering is computed according toω = ∆(1 − α) + αω last ,ω last =ω.

Huber Loss Function
The fourth moments computed in the kurtosis open the algorithm to the vulnerability that outlier events unduly affect the estimate. This effect can be mitigated by employing a Huber loss function [5], which is commonly used for robust estimation. This function is defined as follows: behaves quadratically for small values of a (when |a| ≤ δ H ) and linearly for larger values of a, with the transition defined such that the transition from quadratic behavior to linear behavior at |a| = δ H is continuous and continuously differentiable. The real kurtosis of (3) is approximated using the Huber function by In complex kurtosis, for the terms involving the magnitude |y|, the Huber function can be applied directly. For the term involving y 2 , the Huber function is applied to the magnitude, leaving the phase quadratically varying. The complex Kurtosis is approximated using the Huber function as follows: This is estimated from a sequence of observations: Huber functions can be used in Algorithm 1 by simply replacing the computations of K 1 and K 2 at lines 45 and 46 with the complex Huber function, and the computations of J 1 and J 2 at lines 85 and 86 with the real Huber function.

Gradient Descent for Phase Estimation
As an alternative to optimizing over a fixed number of alternatives, gradient descent may also be employed. We illustrate the method here for phase estimation; this can be modified for timing estimation.
Let the real and imaginary components of the rotated signal be denoted as where c(φ) = cos(φ) and s(φ) = sin(φ). The objective function is the sum of the real kurtoses of the real and imaginary parts. Expanding this and identifying the moment terms using the variables A through D, we find

Experimental Results
Kurtosis-based estimation was compared with the PLL-based timing and phase synch algorithms of [3] (Sections 7.4 and 8.4.4). In these algorithms, the timing interpolator is governed by the fractional symbol µ. The phase estimate is represented by the DDS (direct digital synthesizer) value. Two examples are presented here, one with ω off = 0 and one with ω off = 0.01. Example 1. QPSK, SNR = 10 dB, excess bandwidth=0.2. ω off = 0. Algorithm parameters: N kurt = 20, N τ = 60, N φ = 40, Q = 101. ω-filtering parameter α = 0.99. Figure 3a shows the results of the delay estimation. The inset in the figure shows the first 30 samples, illustrating that convergence has occurred in about 10 symbols periods. Figure 3b shows the phase estimation. The phase estimate (top, in radians) shows the phase estimate. The inset shows that the phase estimate has converged in less than 10 symbols. The ∆-phase (middle) shows ∆φ. The bottom plot shows the estimate of ω.
By comparison, Figure 4a shows the PLL-based estimates, with a filter time-bandwidth product B N T = 0.01. The top plot shows the fractional symbol µ, demonstrating convergence somewhere around 200 symbols. The bottom plot shows the phase converges in about 100 symbols.
As another method of comparing performance, the matched filter outputs were clustered (according to nearest signal constellation point) and the average variance of the clusters was computed. This variance was compared with the variance that would occur if the only source of noise were the AWGN. These variance results are shown in Table 1. In the column labeled "Noise Var", the variance due to the AWGN at that SNR is shown. For example, in the first row where E b /N 0 = 10 dB, the noise variance is 0.05. The column labeled "No Offsets" shows the result of estimating the variance of the clustered matched filter outputs when there are no phase or timing offsets. When E b /N 0 = 10 dB, this is 0.05. This value should be close to the "Noise Var." The columns labeled "δ H " and "Grad µ" indicate the settings for the parameters. "δ H " set to a value indicates that the Huber function was used. "Grad µ" set to a value indicates that gradient descent was used. The column labeled "Kurtosis" indicates the variance of the clustered matched filter outputs for the various kurtosis-based estimation algorithms after convergence. For example, the first row with δ H = 0.5 shows that the variance is 0.061. This can be compared with the variance in the "No Offsets" column, indicating that using the estimate does, in fact, increase the variance of the matched filter outputs compared to not having to estimate the parameters at all. The column labeled "PLL-based" shows the variance of the clustered matched filter outputs for the PLL-based estimators. For E b /N 0 = 10dB, this variance is 0.06. Finally, the column "dB(Kurt/PLL)" shows the comparison (in dB) between the kurtosis and PLL-based estimators, where negative numbers indicate superiority of kurtosis-based vs. PLL-based. For example, in the first row, kurtosis-based estimation is 0.06 dB worse (in variance) than PLL-based estimation.
As Table 1 shows that the variance performance between kurtosis-based and PLLbased estimation is generally quite close.
Example 2. This example shows behavior typical of the kurtosis-based phase estimate. In this case, N φ = 100, and SNR = 24 dB. Figure 5 demonstrates an example of the phase estimate. After convergence, the estimator tends to jitter around the bottom of the kurtosis "bowl", because the kurtosis is only estimated. The phase variance thus largely determined by the steps size, essentially 2π/N φ .      Figure 6a shows the results of the delay estimation. The inset in the figure shows the first 30 samples, illustrating that convergence occurred in less than 20 symbols. Figure 6b shows the phase estimation. The phase estimate (top, in radians) shows the phase estimate. The inset shows that the phase estimate converged in less than 10 symbols. The ∆-phase (middle) shows ∆φ. The bottom plot shows the estimate of ω. Figure 7 shows results for the PLL-based methods in this setting. While the phase tracking appears smoother in the PLL-based estimate, as the results for Example 3 in Table 1 show, there is less variance around the constellation points at the matched filter outputs for the kurtosis-based methods than the PLL-based methods when step descent is used. Interestingly, the gradient descent performs significantly worse than PLL-based estimation at all SNRs.

Comparison with Modified Cramer-Rao Lower Bound
Evaluating the performance in terms of how the recovered signal points are clustered, as in Table 1, is a natural way to evaluate the performance of these estimation algorithms, since it demonstrates how all of the estimators work together to achieve what is desired in the receiver: good signal detection. Another way of evaluating performance of estimators is to compare the estimator variance against lower bounds such as the Cramer-Rao lower bound (CRLB) or modified CRLB. The modified CRLB provides a bound when the observed data depends on multiple parameters, but only one parameter at a time is estimated [22]. The modified CRLB is, in general, lower than the CRLB.
The modified CLRB for the estimate of φ is [22] (Equation (31)) In [22], B L = 1/(2LT s ), where LT s is the length of window over which the estimator operates. The modified CLRB for the estimate of τ is [22] (Equation (32)) where where P( f ) is the Fourier transform of the pulse-shaping function p(t).
The kurtosis is computed over N kurt symbols (matched filter outputs). The first matched filter output occurs at sample QT s /P, with symbol outputs occuring every T s seconds thereafter. Thus, the duration over which a kurtosis is computed is QT s /P + (N kurt − 1)T s . We take this as the value of B L T s over which the estimate is computed. Since the estimator steps over several symbol times to converge, this does not apply initially but should be applicable after convergence. Figure 9a shows the variance of estimate of τ using the kurtosis-based method (red) and the conventional (loop-based) method (yellow) compared with the modified CRLB (blue), as a function of E b /N 0 . The variance of the kurtosis-based method was obtained by computing the variance ofτ after convergence, such as seen in Figure 3a, averaged over 10 independent runs. The variance of the loop-estimated variance was computed as the variance of µ (the fractional timing offset in the synchronization algorithm [3]), averaged over 10 runs. The kurtosis-based method performs significantly better than the loop-based method and demonstrates variance decreases with SNR. However, it does not decrease as fast as the modified CRLB. Additionally, it appears that the kurtosis-based method meets the modified CRLB, which is unexpected. It may be that the value for B L T s used to compute the bound in (7) is larger than it should be. Figure 9b shows the variance of the estimate of φ using kurtosis (red) and PLL (yellow) compared with the modified CRLB (blue). The variance of the kurtosis-based method was obtained by computing the variance ofφ after convergence, such as seen in Figure 3b, averaged over 10 independent runs. The variance of the kurtosis method hardly changes with SNR (although it can be seen to vary slightly). The continued variation is due to the fixed step size, and the fact that the kurtosis estimate has some variance associated with it causing jitter in the indices. Depending on the value of N φ , this can achieve variances less than those of the PLL-based method. This suggests that a variable N φ algorithm would be useful: use a larger step size to converge quickly, then reduce the step size to reduce the variance.

Discussion and Conclusions
We demonstrated that kurtosis-based methods can be applied to symbol timing and phase estimation. These offer potential advantages such as being able to synchronize without knowledge of the signal constellation in a scale-invariant way and, generally, convergence as fast as or faster than PLL-based methods. The experiments showed that the variance of the matched filter outputs for kurtosis-based methods is about the same as or slightly better than PLL-based methods. The gradient descent phase estimation, however, does not generally perform as well as discrete minimization methods. We made the following observations: • Using the Huber function instead of the full kurtosis reduces the variance of the estimators. • For lower SNRs, the gradient-based method performs significantly worse than the discrete minimization. • The gradient-based method has a higher complexity than the discrete minimization. • Kurtosis-based methods converge more quickly than PLL-based methods. The number of steps is determined primarily by N τ , N φ , and n τ . (There is a secondary effect due to the bandwidth of the lowpass filter forω off .) The fast convergence time suggests that these kurtosis-based methods may be useful in situations where short packets are used, such as in the Internet of Things.