Selecting FFT Word Length for an OFDM Receiver That Supports Undersampling

Abstract: In this paper, we focus on Orthogonal Frequency Division Multiplexing (OFDM) transceivers where undersampling is employed by the receiver Analog/Digital Converter (ADC) when sparse information is exchanged. Several Fast Fourier Transform (FFT) symmetry properties are exploited to allow the substitution of specific input values by others that have already been sampled by the ADC. Several architectures have been proposed in the literature for efficient FFT implementations in terms of power, speed and hardware resources. The FFT input/output values, twiddle factors, etc., are complex numbers with their real and imaginary parts being represented using fixed point format. A tradeoff has to be made between rounding error and complexity. The optimal minimum FFT word length is investigated by combining the undersampling and the rounding error. A configurable new FFT architecture has been developed in hardware description language to test the error model with various FFT sizes, word lengths and Quadrature Amplitude Modulations (QAM). A system designer can take into account the sparseness of the input data and define the desired rounding and undersampling error relation. The developed error model would then predict the required word length and ADC resolution with average Root Mean Square Error (RMSE) less than 1.


Introduction
Recovering information from fewer samples is possible if data are sparse or compressible. In this case, an ADC can operate at a sampling rate closer to the actual information rate rather than the Nyquist rate [1]. Such ADCs are often called Analog to Information Converters (AIC) [2]. Compressive Sampling or Sensing (CS) methods employ iterative optimization techniques like Regressive Analysis or Orthogonal Matching Pursuit [3] to recover information from fewer measurements. The hardware implementation of a CS algorithm requires a large number of resources due to its increased complexity. CS techniques are applied in image processing applications such as radar, medical imaging (such as Magnetic Resonance Imaging (MRI), ultrasound, X-ray imaging), surveillance systems, etc. For example, in [4], the acquisition time needed for MRI scans is significantly reduced. In [5], radar data are compressed and decompressed using CS techniques with Normalized Mean Square Error (NMSE) ranging between 1.5 and 2.5 under certain measurement conditions.
In OFDM environments, channel estimation is achieved using CS techniques since it can be assumed that the OFDM channel is sparse. In this way, the number of pilots can be substantially reduced as described in [6], where 511 subcarriers and 20 pilots are used, and only a few of the 40 channel taps are assumed non-zero. The Bit Error Rate (BER) is approximately 0.003 if the Signal/Noise Ratio (SNR) is 30 dB and 5 of the 40 channel taps are non-zero. The efficient implementation of sparse FFT and Inverse FFT (IFFT) in OFDM environments is an important target since these modules are computationally intensive and power hungry. In [7] sub-sampling of the input signal is performed with in Section 4. Finally, in Section 5 the simulation results are presented along with a discussion on how the error modeling followed can be useful for other non-OFDM telecommunication systems.

Proposed Undersampling Method on OFDM Receiver Side
In an OFDM transmitter, the binary input data stream is encoded for the generation of a parity bit stream used for error correction on the receiver side. The data and parity bit streams are usually interleaved in order to avoid burst errors. Then, groups of $\log_2 q$ bits are mapped to $q$-QAM constellation symbols $X_k$ ($0 \le k < N$) that form the parallel input to an $N$-point IFFT. The outputs of this IFFT are the symbols $x_n$ ($0 \le n < N$) that are serially transmitted over the channel. Pilot symbols with known value are placed on reserved subcarriers for channel estimation and equalization. A cyclic prefix is also appended to avoid Inter-Symbol Interference (ISI). Digital/Analog Conversion (DAC) is required for the transmission of the resulting symbols using an appropriate pulse shaping method [12,13]. In wired OFDM transceivers the channel noise is assumed to be Additive White Gaussian Noise (AWGN) and the received symbols are $y_n = x_n + z_n$, where $z_n$ is the noise with variance $\sigma_n^2$. A different model is used for wireless channels that takes into consideration the reflections, interference, Rayleigh fading, etc. In optical communications the fiber channels are affected by several sources of distortion including Kerr non-linearity, chromatic dispersion, optical filtering, double Rayleigh scattering, shot and thermal noise and especially Amplified Spontaneous Emission (ASE) [12]. The $y_n$ symbols at the output of the receiver ADC form the input of an FFT. The FFT output symbols $Y_k$ ($0 \le k < N$) are mapped to the closest QAM symbol and then QAM demodulation is performed (e.g., using hard or preferably soft decision demodulators). Forward Error Correction (FEC) decoding (Viterbi, Turbo codes, etc.) exploits the available parity bits in order to correct as many errors as possible on the receiver side.
A Recursive Systematic Convolutional (RSC) FEC encoder can be used, such as the one with feed forward and feedback polynomials $1 + D + D^2 + D^3$ and $1 + D + D^2$, respectively, where $D^p$ denotes a delay of $p$ clock periods. In order to apply the proposed undersampling method, the interleaver should generate $q$-QAM symbols derived from parity bits only or from data bits only. Thus, a pair of small buffers at the FEC encoder output can temporarily store $\log_2(q)$ bits from the systematic and the parity output of the RSC encoder before they are mapped to the corresponding $q$-QAM symbol. Most of the $q$-QAM symbols derived from sparse data bits will have a common value $X_c$. In addition, several $q$-QAM symbols derived from parity bits are likely to have identical values because the parity output of the employed RSC encoder described above remains '0' until a first data '1' appears. Then, the 7-bit pattern "0111010" is repeated as long as the data input remains '0'. A different 7-bit pattern is generated when another '1' appears at the input and this is also repeated until a third '1' appears. The distance between successive '1's at the FEC encoder input is expected to be high due to the sparseness of the input. Consequently, several identical consecutive 7-bit patterns are expected to appear at the parity output of the FEC encoder. These identical parity patterns can be treated as if they were $X_c$ symbols, as will be explained below.
The proposed undersampling method takes advantage of the data sparseness in the time domain in conjunction with some well-known properties of the Discrete Fourier Transform (DFT). Proper symbol arrangement is employed at the IFFT input, allowing the substitution of several symbols at the receiver FFT input by others without significant loss of information. In this way, the receiver ADC can periodically relax (operate at a lower rate) without sampling some $y_n$ symbols, since they can be substituted by others that have already been received. The adopted symbol arrangement at the IFFT input makes several IFFT/FFT operations trivial, so they can be omitted in order to achieve lower power and faster operation. This is similar to the output pruning described in [14]. If $w_N^r = e^{i2\pi r/N}$ are the twiddle factors, then the DFT is defined as:

$$X_k = \sum_{n=0}^{N-1} x_n w_N^{-kn} \quad (1)$$

and the Inverse DFT (IDFT) as:

$$x_n = \frac{1}{N}\sum_{k=0}^{N-1} X_k w_N^{kn} \quad (2)$$

One of the IDFT symmetry properties used by the proposed undersampling method concerns the relationship between $x_n$ and $x_{n+N/2}$ when $n$ is odd (Equations (3) and (4)). According to these equations, $x_n = x_{n+N/2}$ if $X_k = X_{k+N/2}$ for odd $k$. As already mentioned, $X_k$ and $X_{k+N/2}$ with $k$ being odd are likely to have equal value if they are both sparse data $q$-QAM symbols or if they are parity $q$-QAM symbols generated at a relatively close distance. Consequently, if for example data $q$-QAM symbols are placed in the odd positions of the IFFT input, then up to half ($N/4$) of the odd $y_n$ symbols at the FFT input can be replaced by their counterparts at distance $N/2$: $y_{n+N/2}$. No error would occur only if all the data $q$-QAM symbols placed at the odd positions were equal to $X_c$, e.g., if the data input is constantly zero. An error occurs if some of the data $q$-QAM symbols at the odd positions are not trivial (not equal to $X_c$). To reduce the probability of errors, the number of samples $R$ that are substituted at the input of the receiver FFT can be lower than $N/4$, e.g., $N/8$ or $N/16$ [8].
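The symmetry property above can be checked numerically. The following is a minimal sketch, not the implementation used in this work: it assumes the IDFT convention $x_n = \frac{1}{N}\sum_k X_k e^{i2\pi kn/N}$, and the test vector `X` as well as the helper name `idft` are hypothetical choices made only for illustration.

```python
import cmath

def idft(X):
    """Inverse DFT: x_n = (1/N) * sum_k X_k * exp(+i*2*pi*k*n/N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

N = 16
# Hypothetical IFFT input with arbitrary QAM-like values.
X = [complex((3 * k) % 7 - 3, (5 * k) % 4 - 1) for k in range(N)]
# Enforce X[k] == X[k + N/2] for every odd k (assumed symbol arrangement).
for k in range(1, N // 2, 2):
    X[k + N // 2] = X[k]

x = idft(X)
# Largest difference between x_n and x_{n+N/2} over the odd n.
max_diff = max(abs(x[n] - x[n + N // 2]) for n in range(1, N // 2, 2))
print("max |x_n - x_(n+N/2)| for odd n:", max_diff)
```

Under these assumptions the difference vanishes up to floating point precision, which is why the odd samples $y_n$ can be replaced by $y_{n+N/2}$ when the odd-indexed input symbols are pairwise equal.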
Another DFT symmetry property has also been employed in [9] to extend the maximum number of samples that can be replaced from $N/4$ to $3N/8$. It is based on further properties of the twiddle factors $w_N^r$, which allow the IFFT output $x_n$ to be expressed as in Equation (5). From Equation (5) it can be deduced that $x_n = x_{N/2-n}$ ($n < N/4$) if the following three conditions hold: (a) $X_k = X_{N/2-k}$ (with odd $k$ and $k \le N/4$), (b) $X_{N-k} = X_{N/2+k}$ (with odd $k$ and $k \le N/4$) and (c) $X_k = X_{N-k}$ with even $k$ and $0 \le k < N/2$. In a similar way, it can be shown that $x_{N/2+n} = x_{N-n}$ ($n < N/4$). Consequently, the $y_n$ samples with odd $n \le N/4$ can be substituted by the samples $y_{N/2-n}$, and the $y_{N/2+n}$ samples can be substituted by $y_{N-n}$.
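The three conditions can also be verified with a short numerical experiment. This is only an illustrative sketch: the symbol values are arbitrary, the IDFT convention with $e^{+i2\pi kn/N}$ is assumed, and the names `idft`, `first_half_gap` and `second_half_gap` are hypothetical.

```python
import cmath

def idft(X):
    """Inverse DFT: x_n = (1/N) * sum_k X_k * exp(+i*2*pi*k*n/N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

N = 16
X = [0j] * N
# Condition (c): X[k] == X[N-k] for even k (k = 0 and k = N/2 map to themselves).
for k in range(0, N // 2 + 1, 2):
    X[k] = complex(k - 3, 1 - k)
    X[(N - k) % N] = X[k]
# Conditions (a) and (b) for odd k below N/4.
for k in range(1, N // 4, 2):
    X[k] = complex(k, -2)
    X[N // 2 - k] = X[k]          # (a) X[k] == X[N/2 - k]
    X[N - k] = complex(-k, 3)
    X[N // 2 + k] = X[N - k]      # (b) X[N - k] == X[N/2 + k]

x = idft(X)
odd_n = range(1, N // 4, 2)
first_half_gap = max(abs(x[n] - x[N // 2 - n]) for n in odd_n)
second_half_gap = max(abs(x[N // 2 + n] - x[N - n]) for n in odd_n)
print(first_half_gap, second_half_gap)
```

Both gaps are zero up to floating point precision for the odd sample indices, which is the case exploited by the substitution of $y_n$ by $y_{N/2-n}$ and of $y_{N/2+n}$ by $y_{N-n}$.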
If 16-QAM modulation is employed, the IFFT input packet structure shown in Figure 1 can be used in order to apply the proposed undersampling technique. The data QAM symbols have been placed at the even positions and most of them have identical value ($X_c$). If the RSC FEC encoder described earlier is employed, then the repeated 7-bit parity pattern can be padded with one more bit. The Most Significant 4 bits (Parity MSB) and the Least Significant 4 bits (Parity LSB) of the padded 8-bit parity patterns are placed in Figure 1 in appropriate positions in order to apply the proposed undersampling scheme with the lowest possible error. In [9], the proposed undersampling method is extended to wireless OFDM systems with STBC encoding and several other IFFT input packet structures like the one shown in Figure 1 are described.
Figure 1. An appropriate IFFT input structure that supports the undersampling mode proposed in [8] and [9].
The representation of the real numbers is very important to computationally intensive operations like the FFT and IFFT modules of an OFDM system. Due to the FFT's high complexity, the real numbers have to be represented using the minimum word length that does not cause severe rounding errors. The selection of the minimum word length in the case of an OFDM system that supports undersampling is studied in conjunction with the Undersampling Error (UE) caused by the undersampling process. It would be redundant to use more bits for the representation of the real numbers if this does not improve on the UE.

Review of Quantization and Round-off Error Estimation Methods
One significant source of error is the communication channel that can be affected by AWGN, Rayleigh scattering, ASE, instant and thermal noise, etc., according to the physical media it consists of [12,13]. These sources of error are taken into account by the degradation they cause to the channel SNR. In this sense, the channel errors are not combined with the error sources at the receiver that are examined below.
The Quantization Error (QE) caused by ADCs and the Round-off Errors (RE) that stem from the use of finite word length for the representation of real numbers have been extensively studied for several decades. Some of the popular old and newer methods are summarized here in order to combine them with the specific requirements of the undersampling method described in Section 2.1.
Welch [15] studied the effect on the output of the rounding at each stage of a Radix-2 Decimation in Time (DIT) FFT. A block floating point format is assumed, in which the mantissa and the exponent are stored with fixed bit lengths. The exponent is common for all numbers; thus, only the mantissa is stored for each number. The biggest number at the input is downscaled until its absolute value is between 0.5 and 1. The rest of the numbers are downscaled in the same way in order to preserve a common exponent. The upper bound of the error, expressed as RMSE, is estimated in [15] in terms of a parameter $C$, a constant between 0.4 and 0.9 depending on the signal shape. In a more recent paper, Pálfi and Kollár [16] showed experimentally that Welch's results [15] are not valid if the input lies between −1 and +1 instead of between 0 and +1.
A worst-case output Noise-to-Signal Ratio (NSR) is estimated in [17] taking into consideration the QE of the sin/cos coefficients of an FFT. The bound is derived for a radix-2 FFT with a windowed input signal and coefficients represented with $b+1$ bits, in which case the magnitude of the coefficient error is at most $2^{-b}/\sqrt{2}$. The bound also depends on a parameter $m_w$ determined by the window function ($m_w$ values between 3 and 10 are tested in [17]). This limit is compared in [17] against older stochastic approximations presented in [18] and [19].
If $\Delta$ is the minimum difference that can be discriminated between the quantized real numbers (or voltage levels) by an ADC with a resolution of $b_{ADC}$ bits and reference voltage $V_{ref}$, then $\Delta = V_{ref}/2^{b_{ADC}}$ (see Figure 2). For normalization reasons we assume $V_{ref} = 1$. The error caused by the quantization process is between $-\Delta/2$ and $+\Delta/2$. The error probability density is assumed to be uniform ($1/\Delta$) within these limits. The variance of the error is then estimated as $\sigma_{QE}^2 = \Delta^2/12$. The RE caused by the use of finite word length in an $N$-point FFT is also viewed as QE in [10], where the quantization noise power $P_{QE}$ in all real and imaginary parts of the DFT outputs, as defined in Equation (1), is estimated. In FFT implementations, this $P_{QE}$ noise is reduced as indicated by relation (8) [10]. The upper limit of relation (8) corresponds to the classic FFT implementation by Cooley and Tukey [20].
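The uniform quantization error model with variance $\Delta^2/12$ can be illustrated with a small Monte Carlo experiment. This is a sketch under stated assumptions: $V_{ref} = 1$, round-to-nearest quantization, uniformly distributed inputs, and a hypothetical 8-bit resolution; the function name `quantize` is introduced only for this example.

```python
import random

def quantize(v, bits, v_ref=1.0):
    """Round v to the nearest multiple of delta = v_ref / 2**bits."""
    delta = v_ref / 2 ** bits
    return round(v / delta) * delta

random.seed(0)                      # fixed seed so the experiment is repeatable
bits = 8                            # assumed ADC resolution
delta = 1.0 / 2 ** bits
errors = [quantize(v, bits) - v for v in (random.random() for _ in range(100_000))]

mean = sum(errors) / len(errors)
variance = sum((e - mean) ** 2 for e in errors) / len(errors)
predicted = delta ** 2 / 12         # variance of a uniform error in [-delta/2, +delta/2]
print("measured / predicted variance:", variance / predicted)
```

The measured variance should be close to the predicted one, and every individual error stays within $\pm\Delta/2$, as assumed by the model.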
Swartzlander and Saleh [21] describe an FFT implementation with fused floating-point operations and explain the worst-case error caused in Radix-2 and Radix-4 FFT implementations. Errors of ±1/2 Least Significant Bit (LSB) are caused by rounding and normalization at the output of the adder or the multiplier. Thus, one of the Radix-2 butterfly outputs may have a 1/2 LSB error, while the other may have a 2 LSB error. For the fused implementation the second error is reduced to 1 LSB. In Radix-4, all of the butterfly outputs may have 2 1/2 LSB errors. In the fused implementation presented in [21], the rounding and normalization error is reduced to 1 1/2 LSB. Although the Radix-2 butterfly error is smaller, the error in an FFT is expected to be smaller for a Radix-4 implementation due to fewer stages, and this is confirmed in [21] by simulating a 64K-point FFT.
The effect of fixed-point format with limited precision for different FFT algorithms is studied in [22]. The error of a single quantization operation is modelled as above and then the error of a complex multiplication is estimated as $4\sigma_{QE}^2$. A matrix representation of the error propagation model is proposed to analyze the rounding effect in DIT and Decimation in Frequency (DIF) FFTs. A similar propagation model will be examined in the next subsection for the FFT of our case that supports undersampling [8,9]. The radix-2 DIT FFT algorithm has better accuracy in terms of Signal-to-Quantization-Noise Ratio (SQNR) [22]. For this reason, we focus on the error modelling of the DIT FFT in this paper. The overall output error $\Delta Y$ of a DIT FFT is estimated in [22] using the following quantities: $w_{T_i}$ is the equivalent twiddle factor matrix at the $i$-th stage of the DIT FFT algorithm, $B_{d,d-i}$ is the equivalent butterfly matrix at the $i$-th stage of a $2^d$-point FFT, and $e_i$ ($0 \le i \le d$) is the corresponding $N \times 1$ additive noise vector ($w_{F_i}$ denotes the equivalent twiddle factor matrix at the $i$-th stage of the DIF FFT). The total quantization noise power $P_{nt}$ of the DIT FFT algorithm is expressed in terms of $n_{T_i}$, the number of nontrivial twiddle factors at the $i$-th stage.
In [23], the case where the twiddle factor word length is different from the register word length is studied. First, the statistical noise model for the prediction of the RE after a multiplication of two quantized signals $u$ and $a$ of different precision is presented: $\hat{v} = \hat{u}\hat{a} = (u + \varepsilon_1)(a + \varepsilon_2) + \varepsilon_3 = au + \text{noise}$. The parameters $\varepsilon_1$, $\varepsilon_2$, $\varepsilon_3$ are the errors caused by the rounding of $u$, $a$, and $v$, respectively. The total noise is $u\varepsilon_2 + a\varepsilon_1 + \varepsilon_1\varepsilon_2 + \varepsilon_3$. The variance of the noise $\sigma_n^2$ given $u$ and $a$ is expressed in terms of $b_1$, $b_2$, $b_3$, the word lengths of $u$, $a$ and $v$, respectively, and takes a simpler form if all the word lengths are equal ($b_1 = b_2 = b_3 = b$).
The total variance of the RE in the Radix-2 DIT FFT presented in [23] is simplified to $P_{RE} = 8\sigma_n^2$ for large $N$ values. This Radix-2 variance is compared with the Radix-4 and Radix-8 variances (approximately $16\sigma_n^2$ and $24\sigma_n^2$, respectively). The approach presented in [23] will serve as the base for the development of our error model, as will be described in the following section.
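The multiplication noise model $u\varepsilon_2 + a\varepsilon_1 + \varepsilon_1\varepsilon_2 + \varepsilon_3$ can be explored with the sketch below. It is not the formulation of [23]: it assumes that the three rounding errors are independent, zero-mean and uniform within $\pm 2^{-b_i}/2$, in which case the noise variance is $u^2\sigma_{\varepsilon_2}^2 + a^2\sigma_{\varepsilon_1}^2 + \sigma_{\varepsilon_1}^2\sigma_{\varepsilon_2}^2 + \sigma_{\varepsilon_3}^2$. All function names and the values of $u$, $a$, $b_1$, $b_2$, $b_3$ are hypothetical.

```python
import random

def rounding_variance(bits):
    """Variance of a uniform rounding error with step 2**-bits (assumption)."""
    delta = 2.0 ** -bits
    return delta ** 2 / 12

def predicted_noise_variance(u, a, b1, b2, b3):
    """Variance of u*e2 + a*e1 + e1*e2 + e3 for independent zero-mean errors."""
    s1, s2, s3 = (rounding_variance(b) for b in (b1, b2, b3))
    return u * u * s2 + a * a * s1 + s1 * s2 + s3

def simulated_noise_variance(u, a, b1, b2, b3, trials=100_000, seed=1):
    """Monte Carlo estimate of the noise power of (u+e1)*(a+e2)+e3 - a*u."""
    rng = random.Random(seed)

    def err(bits):
        half = 2.0 ** -bits / 2
        return rng.uniform(-half, half)

    total = 0.0
    for _ in range(trials):
        e1, e2, e3 = err(b1), err(b2), err(b3)
        noise = (u + e1) * (a + e2) + e3 - a * u
        total += noise * noise      # the noise is zero-mean, so this is its variance
    return total / trials

predicted = predicted_noise_variance(0.6, 0.8, 8, 10, 8)
simulated = simulated_noise_variance(0.6, 0.8, 8, 10, 8)
print("simulated / predicted:", simulated / predicted)
```

When all word lengths are equal, the dominant term of this variance is proportional to $1 + u^2 + a^2$, which matches the role of the quantity $c$ defined later in the text.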
In [24] fast fixed-point algorithms are used to estimate the RE in the DFT. The RE variance depends on the frequency index. Modelling the RE as a random variable with uniform distribution holds only for the first FFT stage. The RE probability density function is not uniform in the following stages. The discretization error variance at each stage $p$ is estimated recursively in [24] for $k = 0, \ldots, 2^{p-1}-1$, with $\sigma_h^2 = 2^{-2b-4}$ if the operations are performed with $b+1$ precision. The parameter $\sigma_{cs}^2$ is the variance of the discretization error caused by the multiplication with the cos/sin coefficients. These theoretical models are statistically checked in [24] using a DIT FFT with two rounding methods.
In [25], the authors optimize the FFT word length in a memory based architecture, attempting to avoid a very pessimistic estimation. A butterfly operation is expressed as $Z = \frac{1}{2} w_N^{kn}(A + B)$ where $A$ and $B$ are the complex inputs. The norm of $Z$ is related to the norms of $A$ and $B$ by $\|Z\|^2 \le (\|A\|^2 + \|B\|^2)/2$. The covariance of the FFT error at stage $p$ of an $N$-point FFT (with $\log_2 N$ stages) is estimated in terms of $a(p)$ and $b(p)$, the number of bits used to represent the butterfly inputs and the sine/cosine coefficients at stage $p$, respectively. The Signal to Noise and Quantization error Ratio (SNQR) is bounded above by $-10\log_{10}\sigma_p^2$. The target of the optimization problem is to maximize the vector of $q$ elements subject to $\sigma_p^2 \le 10^{-SNQR/10}$. In our approach, the optimization target is to find the minimum number of bits $b$ that can be used as the word length subject to $QE \le UE \times p_f$, where $p_f$ expresses the allowed QE as a fraction or a multiple of the UE, depending on the application specifications.
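The optimization target stated above can be sketched as a simple search. The example below is only an illustration and not the error model developed in this paper: it assumes that the QE power is the uniform quantization variance $\Delta^2/12$ with $\Delta = V_{ref}/2^b$, and the function name `min_word_length` and the numerical values of $P_{UE}$ and $p_f$ are hypothetical.

```python
def min_word_length(p_ue, p_f, v_ref=1.0, max_bits=64):
    """Smallest b with (v_ref / 2**b)**2 / 12 <= p_f * p_ue (assumed QE model)."""
    for b in range(1, max_bits + 1):
        delta = v_ref / 2 ** b
        if delta ** 2 / 12 <= p_f * p_ue:
            return b
    raise ValueError("no word length up to max_bits meets the target")

# Hypothetical example: undersampling error power 1e-3, QE allowed up to 10% of it.
b = min_word_length(p_ue=1e-3, p_f=0.1)
print("minimum word length:", b)   # 5 bits for these example values
```

A larger $p_f$ tolerates more quantization noise and therefore leads to a shorter word length, while a smaller $p_f$ forces additional bits.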

Proposed UE, RE Error Model
We focus on a 16-point FFT on the OFDM receiver side that is used as a case study and is shown in Figure 2. We first estimate how the error caused by the undersampling process described in Section 2.1 propagates, and then calculate the expected error on the FFT outputs. The FFT output error affects the Symbol Error Rate (SER) and the Bit Error Rate (BER) of the OFDM system. However, in this work the FFT output error caused by the undersampling process ($P_{UE}$) is compared to the QE ($P_{QE}$) in order to find an acceptable word length for the FFT.
In the four propagation cases examined, a single error is caused by a QAM symbol $\hat{y}_i$ that does not have the expected trivial value $y_i$ needed to generate FFT outputs equal to zero (that can be pruned) or the expected trivial intermediate butterfly outputs. Instead, let us assume that $\hat{y}_i = y_i + \varepsilon$. The propagation of the error in the first two levels of butterflies (a pair of 2-point FFTs and one 4-point FFT) is denoted with the dashed lines in Figure 2. In the first case, the error $\varepsilon$ appears at the even input $i$ of the 2-point FFT at the top. In the second case, moving from top to bottom in Figure 2, the non-trivial $y_i$ symbol appears at the even input $i+2$ of the 4-point FFT. In the third case the error appears at the odd input $i+1$, and in the last case, at the bottom of Figure 2, the error appears at the odd input $i+3$. In the corresponding expressions of $e_{\log N-1}$, the twiddle factors $w_N^j$ are used without the index $N$ for simplicity. The differences in the four $e_{\log N-1}$ expressions are owed to the different twiddle factors that multiply the error $\varepsilon$ as it propagates through different paths. Figure 3 shows how the error $e_{\log N-p}$ of stage $\log N - p$ propagates to the next butterfly stage $\log N-(p-1)$. The arrows in Figure 3 are buses; Figure 3a shows the propagation of the error from the top butterfly input, while Figure 3b shows the propagation of the error from the bottom input. In the first case $e_{\log N-(p-1)} = \begin{bmatrix} e_{\log N-p} \\ e_{\log N-p} \end{bmatrix}$, while in the latter case $e_{\log N-(p-1)} = \begin{bmatrix} w\,.\!*\,e_{\log N-p} \\ -w\,.\!*\,e_{\log N-p} \end{bmatrix}$. The symbol ".*" implies multiplication of the corresponding elements of the vectors and $w$ is the vector of all the twiddles of a specific butterfly stage. If multiple errors exist in the FFT inputs, their effect is added at the FFT output.
For example, if the errors $\varepsilon_3$ and $\varepsilon_7$ occur in the bit reversed 16-point DIT FFT inputs $y_3$ and $y_7$ (corresponding to the $Y_{12}$ and $Y_{14}$ outputs), the individual output errors are denoted $e_0^{y_3}$ and $e_0^{y_7}$, respectively. The combined output error if both $\varepsilon_3$ and $\varepsilon_7$ occur is $e_0^{y_3,y_7} = e_0^{y_3} + e_0^{y_7}$. Of course, the same error propagation model holds for the IFFT on the transmitter side. Moreover, the estimation of the error $\varepsilon$ is easier on the transmitter side since the IFFT inputs are QAM symbols with integer values.
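The additive behaviour of multiple input errors follows from the linearity of the DFT and can be checked with a direct transform. The sketch below is only illustrative: it uses a plain DFT with natural (not bit reversed) input ordering, and the input vector, the error values and the helper names `dft` and `output_error` are hypothetical.

```python
import cmath

def dft(y):
    """Direct DFT: Y_k = sum_n y_n * exp(-i*2*pi*k*n/N)."""
    N = len(y)
    return [sum(y[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

N = 16
y = [complex(n % 3 - 1, n % 2) for n in range(N)]   # arbitrary reference input

def output_error(errors):
    """FFT output error caused by input errors given as {input index: epsilon}."""
    y_hat = list(y)
    for i, eps in errors.items():
        y_hat[i] += eps
    return [a - b for a, b in zip(dft(y_hat), dft(y))]

eps3, eps7 = 0.5 + 0.25j, -1.0j
e3 = output_error({3: eps3})
e7 = output_error({7: eps7})
e37 = output_error({3: eps3, 7: eps7})

# The combined error equals the sum of the individual errors at every output.
additivity_gap = max(abs(e37[k] - (e3[k] + e7[k])) for k in range(N))
# A single input error reaches every output with unchanged magnitude |epsilon|.
magnitude_gap = max(abs(abs(e3[k]) - abs(eps3)) for k in range(N))
print(additivity_gap, magnitude_gap)
```

The second check reflects the fact that a single input error is only rotated by twiddle factors of unit magnitude on its way to each output.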
Since $X_c = (1,1)$, the minimum error in 16-QAM modulation is $\varepsilon_{min} = \sqrt{(1-1)^2 + (1-(-1))^2} = 2 = \varepsilon_{(2)}$. The rest of the errors $\varepsilon_{(k)}$ are computed in the same way and the expected value of $\varepsilon$ follows from them. For 16-QAM, $E[\varepsilon] = 3.37$ if only the symbols that are different than $X_c$ are taken into consideration and they have equal probability. In general, $E[\varepsilon]$ depends on the sparseness level $s < 1$ of the input and the QAM modulation. The sparseness level $s$ means that a fraction $s$ of the input data bits is non-trivial. If all the non-trivial symbols have equal probability to appear, then $p_{(k)} = s/(N-1)$. In this case, Equation (15) can be rewritten in terms of $s$.
The effect of the employed modulation scheme on the BER/SER of a telecommunication system is explained in detail in Appendix C of [12]. The estimation of $E[\varepsilon]$ for different modulation schemes can be assisted by the analysis performed in [12]. There are several alternative options to place the constellation symbols that correspond to a specific number of bits. The lowest error is achieved when adjacent constellation symbols differ only in one digit (Gray mapping). In one-dimensional or ring constellations, the Gray mapping can be easily found, while square QAM constellations can be Gray encoded hierarchically by examining smaller blocks. There are, however, constellations where there is no perfect Gray mapping. Each constellation symbol $X$ is surrounded by a decision region, called its Voronoi region, that contains the points lying closer to $X$ than to any other constellation symbol. When a received symbol $Y$ resides within the Voronoi region of $X$, it is decoded as $X$ on the receiver. The higher the minimum distance of a modulation scheme, the lower the BER/SER that can be achieved.
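The decision rule based on Voronoi regions amounts to choosing the constellation symbol closest to the received one. The following sketch illustrates this for a square 16-QAM constellation with coordinates in $\{\pm 1, \pm 3\}$, which is an assumption consistent with $X_c = (1,1)$ and a minimum distance of 2; the names `LEVELS`, `CONSTELLATION` and `hard_decision` are hypothetical.

```python
# Assumed square 16-QAM constellation with in-phase/quadrature levels -3, -1, 1, 3.
LEVELS = (-3, -1, 1, 3)
CONSTELLATION = [complex(i, q) for i in LEVELS for q in LEVELS]

def hard_decision(y):
    """Return the constellation symbol whose Voronoi region contains y."""
    return min(CONSTELLATION, key=lambda x: abs(y - x))

x_c = complex(1, 1)
print(hard_decision(x_c + 0.9j))   # still decoded as (1+1j)
print(hard_decision(x_c + 1.1j))   # crosses the decision boundary: (1+3j)
```

An FFT output error smaller than half the minimum distance therefore leaves the decision unchanged, while a larger one may produce a symbol error.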
For example, if hard decoding is employed, i.e., if a received QAM symbol is first demodulated to its corresponding bits and then these bits are corrected by the supported FEC method, the SER can be expressed as in [12], where $Y$ is the received symbol, $X$ ranges over the constellation symbols of a specific modulation scheme, $p(Y|X)$ is the probability to get $Y$ at the receiver given that the symbol transmitted is $X$, and $R_X$ is the decision region (Voronoi region) of $X$. In the 16-QAM modulation examined earlier, if one of the constellation bits is inverted, the effect of this error on the BER is 1/4 of the effect on the SER. It can be stated that the SER represents the worst-case effect of the error on the OFDM system. This is also confirmed by various simulations performed in Appendix C of [12], where several modulation schemes are tested.
In order to estimate the variance of the UE in each one of the FFT outputs, we have to define all the possible paths that the error can follow from the FFT input. The paths can be determined in a systematic way starting from a specific output. Let us assume that, when a butterfly crossing is reached, following the upper branch is denoted by '0' while following the lower branch is denoted by '1'. All the paths can be determined in this way by the combinations of $\log_2 N$ bits. For example, in Figure 2, the dashed line that reaches output $Y_{12}$ shows a potential path that the error has followed from input $y_6$. The error propagation path in this case can be denoted by "0110" and the initial error $\varepsilon$ can be multiplied in each branch by 1, a twiddle factor $w$ or $-w$. Table 1 lists, for example, all the potential errors that can occur at each output of an 8-point FFT as well as the expected output error values (in all cases they are 0 except for $Y_0$, due to orthogonality) and the complex variance.
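The statistics of the output error described above can be reproduced with a few lines of code. The sketch below is an assumption-laden illustration rather than a reconstruction of Table 1: it supposes that a single error $\varepsilon$ is equally likely to occur at any of the $N$ inputs of a plain DFT, and the function name `output_error_stats` is hypothetical.

```python
import cmath

def output_error_stats(N, eps):
    """Mean and complex variance of the error at each output Y_k when a single
    input error eps is equally likely at any input position (assumption)."""
    stats = []
    for k in range(N):
        errs = [eps * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N)]
        mean = sum(errs) / N
        var = sum(abs(e - mean) ** 2 for e in errs) / N
        stats.append((mean, var))
    return stats

eps = 2.0
stats = output_error_stats(8, eps)
for k, (mean, var) in enumerate(stats):
    print(k, round(abs(mean), 6), round(var, 6))
```

Under this assumption the expected error vanishes at every output except $Y_0$, and the complex variance equals $\varepsilon^2$ at those outputs, since every twiddle factor has unit magnitude.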
Since the complex variance is the sum of the variances of the real and imaginary parts and the expected value of the error at each output is 0 (except $Y_0$), the complex variance is actually the sum of the squares of the sine and cosine of the same number (the power of the corresponding twiddle factor), which results in 1. Thus, the complex variance is equal to $1 \times \varepsilon^2$ in all cases but $Y_0$. This fact holds for any $N$-point FFT. If $R$ is the fraction of samples substituted by the undersampling procedure (e.g., $R = 1/16$ means that $N/16$ of the FFT inputs have been substituted by others), the total power of the UE ($P_{UE}$) can be estimated as a function of $N$, $R$, $s$ and $E[\varepsilon]$.
The selection of an appropriate FFT word length and ADC resolution should aim to restrict the QE and RE within a fraction or a multiple $p_f$ of the UE estimated in the way described above (e.g., $P_{QE}, P_{RE} \le 10\%$ of $P_{UE}$). More specifically, using $P_{RE} = 8\sigma_n^2$ from Equation (11) and defining $c = 1 + u^2 + a^2$, $P_{RE}$ can be expressed as in Equation (20). If $g = 2^{-2b+1}$, Equation (20) can be expressed as a second-degree equation in $g$, Equation (21). The discriminant of this equation is $Det = 4c^2 + 24P_{RE} = 4c^2(1 + 6P_{RE})$. Equation (21) is solved for $g$ keeping only the positive square root of the discriminant (the negative square root is not applicable), and the first two terms of the Taylor series approximation of the square root are preserved in the solution. Since the variables $a$ and $u$ in the definition of $c$ change in the various operations of the FFT, only $c$ is estimated and not $a$ and $u$ separately.
Replacing $g$ in Equation (20), the required word length $b$ can be estimated. If $c_1 = \frac{1}{2}\log_2(8c)$, and Equation (18) is written in a general form in order to adjust the weight of the parameters $s$, $N$, $p_f$, $R$, the word length $b$ can be expressed as in Equation (23), which contains the unknown parameters $c_i$. In order to select appropriate values for the unknown $c_i$, the Octave fsolve function for non-linear equations is used with a small number of instances of Equation (23), i.e., with a small number of $b$, $p_f$, $s$, $R$, $N$ combinations. The experimental results show that Equation (23) can then be used to accurately estimate the required word length $b$ for other $p_f$ values given a specific OFDM configuration ($s$, $R$, $N$, $\varepsilon$). The physical meaning of the estimated $c_i$ parameters will be explained in Section 5.

The Employed FFT Architecture
A DFT requires $O(N^2)$ operations that are reduced to $O(N\cdot\log N)$ if the original FFT architecture is employed [18]. The number of points used by the FFT can be expressed as a product of numbers that are powers of 2. Thus, a 1024-point FFT can be implemented by 10 Radix-2 stages, or 5 Radix-4 stages. If the number of points of the FFT is not a power of 2, then Radix-3 or Radix-5 butterflies can also be employed. For example, a 100-point FFT can be implemented with one stage of Radix-4 and two stages of Radix-5 butterflies [26]. The round-off errors depend on the architecture of the FFT (serial/parallel, Decimation in Time or Frequency, etc.) and the number of stages. An FFT can be implemented either in software, if slower operation is acceptable, or in hardware for faster response. Modern telecommunication systems require high speed hardware FFTs. Hardware FFTs can either consist of a large number of hardware resources working in parallel or of reusable components for more compact, low power implementations with a slightly higher latency overhead.
In this paper, a robust memory-based pipeline FFT has been developed to test the effect of round-off errors in conjunction with the undersampling scheme described in Section 2. It consists of $\log_2 N$ stages (one of them appears in Figure 5). The inputs of stage $l$ are stored in the double buffer $l$ (its size is $2\times N\times b$ bits). The word length of a butterfly output can be larger by one bit compared to its inputs for optimal resource utilization. However, we use a constant size of $b$ bits for the inputs/outputs and twiddles of all stages in order to get results similar to the case where a single reusable pipeline stage is used iteratively. One buffer $l$ of the pair stores the real part and the other the imaginary part of the FFT inputs/outputs. Buffer $l$ is accessed for write through the buses w1(l) and w2(l), and for read through the buses r1(l) and r2(l). Each of these buses consists of a $\log_2 N$-bit address bus (ra(l) or wa(l)) and a pair of $b$-bit data buses (Re{rd(l)} and Im{rd(l)}, or Re{wd(l)} and Im{wd(l)}). Each data bus carries real numbers in fixed point format with a size of $b$ bits. The inputs of each Radix-2 butterfly are rd1(l) and rd2(l), while its outputs are wd1(l) and wd2(l). The real and imaginary parts of the twiddle factors $w$ are retrieved from the twiddle Read Only Memory (ROM). The size of the twiddle ROM of stage $l$ is $2\times N/2^{l+1}$.
The operations performed at a Butterfly block are:

$\mathrm{Im}\{O_0\} = \mathrm{Im}\{I_0\} + \mathrm{Re}\{I_1\}\cdot\mathrm{Im}\{tw\} + \mathrm{Im}\{I_1\}\cdot\mathrm{Re}\{tw\}$ (25)

The address buses ra(l) and wa(l) are driven by the Address Generator module, which is based on an up counter with a resolution of $\log_2(N)-1$ bits. In each stage $l$, the pair of addresses used for the retrieval of the butterfly inputs/outputs (Addr0 for $I_0$ and $O_0$, Addr1 for $I_1$ and $O_1$) and AddrT for the corresponding twiddle factor are the following:

In Equations (28)-(30), % is the modulo operator, $\lfloor Cnt/(N/2^{l+1}) \rfloor$ denotes the floor function, and $Cnt$ is the current value of the counter in the Address Generator.

The stages of the developed FFT operate in a ping-pong manner. For example, in the first $N/2$ clock cycles the FFT inputs are loaded on the input Buffer at stage $l=\log_2 N-1$. In the next $N/2$ cycles, the butterfly of stage $l$ drives its outputs to Buffer $l-1$. Then, in the next $N/2$ cycles, stage $l-1$ reads its inputs from Buffer $l-1$ and drives its outputs to Buffer $l-2$. At the same time, the input Buffer $l$ can be loaded with the next set of FFT inputs. The FFT latency is $\frac{N}{2}\log_2 N$ cycles and the throughput is an FFT output completed every $N$ cycles.

The FFT architecture described in this section can be used to evaluate the complexity of the system in relation to the word length. Focus is given to the main FFT blocks: adders/subtractors and multipliers in the butterflies, the counter in the address generator, the input/output buffers and the twiddle factor ROM. The silicon area or gate count of a ripple carry adder/subtractor is proportional to the word length of the operands. However, if the word length of a $b$-bit adder with carry look-ahead is increased by 1 ($b+1$), the required gate count will increase by more than $1/b$, since the carry look-ahead logic needed to generate the additional carry is more complicated than the logic needed to generate the carry of the least significant bits.

The gate count required by a multiplier depends on its architecture. For example, a $b$-bit Scaling Accumulator Multiplier (SAM) consists of $b$ AND gates, a $b$-bit adder, a $b$-bit shift register and the $b$-bit output register. Thus, the SAM gate count is proportional to the word length $b$. The same holds for Serial by Parallel Booth Multipliers, i.e., the gate count is proportional to the word length. In these kinds of multiplier architectures, one of the operands has to be inserted serially, bit by bit. Ripple Carry Multipliers (RCM) require all the operand bits in parallel, but the area needed is proportional to the square of the word length. Other multiplier architectures, like row adder tree and carry-save multipliers, require approximately the same gate count as RCM but can achieve faster operation.
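As an illustration of the butterfly operations referred to above, the following Python sketch models one Radix-2 butterfly. Equation (25) is taken from the text; the other three real/imaginary expressions are not reproduced in this section, so the sketch assumes the standard decimation-in-time form $O_0 = I_0 + tw\cdot I_1$, $O_1 = I_0 - tw\cdot I_1$, which is consistent with Equation (25). The names `quantize`, `butterfly` and `frac_bits` are hypothetical, and rounding the outputs to a grid of `frac_bits` fractional bits is only a rough stand-in for the $b$-bit fixed point format of the VHDL model.

```python
import cmath


def quantize(x, frac_bits):
    """Round a real value to a fixed-point grid with frac_bits fractional bits."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


def butterfly(i0, i1, tw, frac_bits=None):
    """Radix-2 butterfly on complex inputs i0, i1 with twiddle factor tw.

    Assumes the standard decimation-in-time form; only the Im{O0}
    expression is confirmed by Equation (25) of the text.
    """
    # Real and imaginary parts of the product tw * i1
    re_t = i1.real * tw.real - i1.imag * tw.imag
    im_t = i1.real * tw.imag + i1.imag * tw.real  # product terms of Equation (25)

    re_o0, im_o0 = i0.real + re_t, i0.imag + im_t
    re_o1, im_o1 = i0.real - re_t, i0.imag - im_t

    if frac_bits is not None:
        # Model the round-off error by rounding each output component
        re_o0, im_o0, re_o1, im_o1 = (
            quantize(v, frac_bits) for v in (re_o0, im_o0, re_o1, im_o1)
        )
    return complex(re_o0, im_o0), complex(re_o1, im_o1)


if __name__ == "__main__":
    tw = cmath.exp(-2j * cmath.pi / 8)
    exact = butterfly(0.5 + 0.25j, 0.25 - 0.5j, tw)
    rounded = butterfly(0.5 + 0.25j, 0.25 - 0.5j, tw, frac_bits=8)
    print("exact  :", exact)
    print("rounded:", rounded)
```

Comparing the exact and rounded outputs for different `frac_bits` values gives a quick feel for how the per-butterfly round-off error shrinks as the word length grows.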
The storage area needed by the twiddle ROMs and the input/output buffers is proportional to the word length but the area needed by the corresponding address decoders is proportional to the logarithm of the word length. In general, it can be stated that the complexity of the FFT/IFFT modules in an OFDM transceiver is approximately proportional to the employed word length.
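The timing and storage figures quoted for the pipeline FFT can be tabulated with a few lines of Python. This is only a sketch under the figures stated in the text (latency of $\frac{N}{2}\log_2 N$ cycles, one FFT every $N$ cycles, a double buffer of $2\times N\times b$ bits per stage, twiddle ROM size $2\times N/2^{l+1}$ for stage $l$); the function names are hypothetical and the unit of the ROM size is not specified in the text.

```python
def fft_timing_and_storage(n_points, word_len):
    """Return stage count, latency, throughput interval and buffer size.

    n_points must be a power of 2 (Radix-2 pipeline); word_len is b in bits.
    """
    stages = n_points.bit_length() - 1
    if n_points < 2 or (1 << stages) != n_points:
        raise ValueError("n_points must be a power of 2")
    return {
        "stages": stages,                                      # log2(N)
        "latency_cycles": (n_points // 2) * stages,            # (N/2)*log2(N)
        "cycles_per_fft": n_points,                            # one FFT every N cycles
        "double_buffer_bits_per_stage": 2 * n_points * word_len,
    }


def twiddle_rom_size(n_points, stage):
    """Twiddle ROM size of a stage, 2*N / 2^(l+1), as given in the text."""
    return (2 * n_points) // (1 << (stage + 1))


if __name__ == "__main__":
    for n in (256, 1024):
        print(n, fft_timing_and_storage(n, word_len=12))
```

Because the buffer size grows linearly with `word_len`, doubling the word length doubles the buffer storage of every stage, which is in line with the roughly proportional complexity stated above.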
The proposed FFT architecture has been described in synthesizable VHDL and has been tested in ModelSim. The description of this module in VHDL is sufficient for assessing the effect of the finite word length on the overall error in the OFDM receiver. Implementation on a Field Programmable Gate Array (FPGA) would also be useful to estimate the speed and the power consumption of this module, and this will be part of our future work.

Simulation Results and Discussion
In this section, the word length estimation based on Equation (23) … Then, the rest of the $L=116$ configurations were tested and the RMSE between the real value of $b$ and the estimated $b_{est}$ for a specific $p_f$ value is extracted as shown in Equation (31). In this way, the minimum number of non-linear equations that have to be solved in order to determine the $c_i$ parameters precisely is found.
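The error metric used above can be sketched in Python as follows. Equation (31) is not reproduced in this section, so the sketch assumes it is the usual root mean square error over the $L$ tested configurations, $\sqrt{\frac{1}{L}\sum (b - b_{est})^2}$. The function name and the sample word lengths are hypothetical and are not results from the paper.

```python
import math


def rmse(b_real, b_est):
    """Root mean square error between real and estimated word lengths."""
    if len(b_real) != len(b_est) or not b_real:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(r - e) ** 2 for r, e in zip(b_real, b_est)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical word lengths for three configurations
    print(rmse([10, 12, 14], [11, 12, 13]))
```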
The sets of non-linear equations described in Tables 2-5 have been used to estimate the values of the $c_i$ parameters. The average RMSE achieved in the word length estimation of all the 116 configurations is also listed in the first row of these tables, along with the estimated $c_i$ values for each case. As can be seen from Table 3, determining the $c_i$ values from 12 instances of Equation (23) leads to the lowest RMSE (0.736 for 16QAM and 1.09 for QPSK modulation). A relatively low RMSE is also obtained if 16 equations are used, as shown in Table 2. The $c_i$ parameters estimated in Tables 2 and 3 are rounded to $c_1 = -5$, $c_2 = -2$, $c_3 = 2$, $c_4 = 1$, $c_5 = 2$ in order to explain the physical meaning of these values and how they lead to an accurate word length estimation.

Table 3. Using a set of 12 non-linear equations (23) to determine the $c_i$ parameters.

From $c_1 = \frac{1}{2}\log_2(8c) = \frac{1}{2}\log_2\left(8\left(1 + u^2 + a^2\right)\right) = -5$ we get that $1 + u^2 + a^2 = 0.000122$, or $u^2 + a^2 = -0.99988$, which is impossible unless $u$ and $a$ are assumed to be complex numbers. When the set of equations listed in Table 4 is used, $c_1 = +4$ and $u^2 + a^2 = 31$, which is more consistent with the model presented in [22] and Equation (11). However, the initial definition of $c_1$ will be ignored in an attempt to define the overall error model that matches the experimental results more accurately. In this perspective, the rest of the terms in the right side of Equation (23) … is close to a constant since $p_f$ is proportional to the sparse level $s$: if only non-sparse FFT inputs were present, there would be errors in all the FFT outputs and $p_{ns} = P_{QE}/P_{UE}$. The highest value measured for $p_{ns}$ is 0.3. If the input is sparse, the ratio $p_f$ of the quantization error to the undersampling error is proportional to the sparseness level $s$: $p_f \propto p_{ns}\,s$.

When the input is very sparse, the UE and QE errors are both low. When the input is less sparse (the $s$ value is higher), UE rises but QE rises even more. This is because, although UE gets worse, there may still be FFT outputs unaffected by the undersampling process if some samples are replaced by others with identical value. However, if $s$ is higher, more operations with non-zero numbers will be performed and QE will increase accordingly, since all the results of these non-trivial operations will carry QE error. In this sense, $s^2$ counterbalances $p_f^2$ and the maximum value for the 3rd term of Equation (23) will be $\frac{1}{2}\log_2\left(s^{-2}p_f^2\right) = \frac{1}{2}\log_2 p_{ns} = \frac{1}{2}\log_2 0.3 = -0.87$. If $p_{ns}$ is lower, a higher positive offset in Equation (23) occurs.
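The arithmetic in the preceding paragraphs can be checked with a short Python sketch. It inverts the definition $c_1 = \frac{1}{2}\log_2(8c)$ to recover $c = 2^{2c_1}/8$ and then $u^2 + a^2 = c - 1$, and it evaluates $\frac{1}{2}\log_2 0.3$. The function names are hypothetical.

```python
import math


def c_from_c1(c1):
    """Invert c1 = 0.5 * log2(8 * c) to obtain c."""
    return 2.0 ** (2 * c1) / 8.0


def u2_plus_a2(c1):
    """u^2 + a^2 implied by c = 1 + u^2 + a^2."""
    return c_from_c1(c1) - 1.0


if __name__ == "__main__":
    for c1 in (-5, 4):
        print(f"c1 = {c1:+d}: c = {c_from_c1(c1):.6f}, "
              f"u^2 + a^2 = {u2_plus_a2(c1):.5f}")
    print(f"0.5 * log2(0.3) = {0.5 * math.log2(0.3):.2f}")
```

For $c_1 = -5$ this gives $c = 2^{-13} \approx 0.000122$, so $u^2 + a^2$ is negative, while $c_1 = +4$ gives $c = 32$ and $u^2 + a^2 = 31$, matching the values quoted in the text.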
The last term of Equation (23) can have two values in the results presented in this paper: either $\log_2 1024 = 10$ or $\log_2 256 = 8$. This is the largest positive offset, which counterbalances the negative value of the constant $c_1$. The specific $c_i$ parameters have been approximated for these two FFT sizes. Should different FFT sizes be covered, the set of non-linear equations used for the approximation of the $c_i$ parameters must also include configurations with these FFT sizes. If we try to use the approximated $c_i$ parameters of Table 3 for the case of a 64-point FFT size, $c_1 + \log_2 N$ would be 0. The term $-\frac{1}{2}\log_2(3\cdot R\cdot E[\varepsilon])$ results in small signed offsets between −0.185 and 0.2, as explained above, and thus the word length would actually be determined by the factor $-\frac{1}{2}\log_2\left(s^{-2}p_f^2\right) = -\frac{1}{2}\log_2 p_{ns}$. In order to get a realistic estimation of at least 5 bits as a word length, $p_{ns}$ should be $10^{-3}$ or, in other words, the UE error should be 1000 times larger than the QE error. Such a relation between UE and QE errors is not always guaranteed.
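The 5-bit example above can be reproduced numerically. The sketch below evaluates the factor $-\frac{1}{2}\log_2 p_{ns}$ quoted in the text and inverts it to find the $p_{ns}$ needed for a target offset; the function names are hypothetical and the sketch covers only this single term, not the full Equation (23).

```python
import math


def offset_bits(p_ns):
    """Word length offset contributed by the factor -0.5 * log2(p_ns)."""
    return -0.5 * math.log2(p_ns)


def p_ns_for_offset(bits):
    """p_ns required so that -0.5 * log2(p_ns) equals the given offset."""
    return 2.0 ** (-2 * bits)


if __name__ == "__main__":
    print(f"offset for p_ns = 1e-3: {offset_bits(1e-3):.2f} bits")
    print(f"p_ns for a 5-bit offset: {p_ns_for_offset(5):.6f}")
```

An offset of 5 bits corresponds to $p_{ns} = 2^{-10} \approx 0.00098$, i.e., roughly the $10^{-3}$ ratio mentioned in the text.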
The estimated and expected word lengths for all the 16QAM and QPSK OFDM configurations tested, when the $c_i$ parameters listed in Table 3 are used, are compared in Figures 6 and 7, respectively. In these figures, the required minimum ADC resolution is also included. This ADC resolution $b_{ADC}$ has been estimated with Equation (33), which has been derived from Equation (8), the definition $\Delta = V_{ref}/2^{b_{ADC}}$ and the specification that $P_{QE}$ should be equal to $P_{RE}$. $V_{ref}$ was set to 1 V, but approximately the same results would have been achieved if a different voltage reference had been selected, such as 3 V. The ADC resolution $b_{ADC}$ should match the FFT word length $b$; thus, $b_{ADC}$ should be selected equal to $b$, since $b_{ADC} < b$ in all cases, as shown in Figures 6 and 7.

The procedure followed in this paper to define a model that optimizes the word length subject to error restrictions and a predetermined relation between undersampling error and quantization/rounding error can be followed in other non-OFDM telecommunication systems. For example, in optical networks, a model can be created that selects the appropriate modulation (number of bits/symbol: $R_s$) in order to achieve a desired capacity, given a specific power budget $P_o$. Based on the analysis presented in [12], the capacity $C_o$ can be expressed as:

The noise term $N_o R_s$ can be expressed as a weighted combination of the various noise sources in an optical channel [12]: the beat noise $\sigma^2_{beat} = 4S_b^2 N_o P_{LO} B_e$, the shot noise $\sigma^2_{shot} = 2eS_b P_{LO} B_e$ and the thermal/electronic noise $\sigma^2_{elec}$. Concerning the constants used in these noise variance expressions (we assume that their values are known), $S_b$ is the photodetector responsivity, $N_o$ is the noise spectral density, $P_{LO}$ the optical power, $B_e$ the power equivalent bandwidth of the entire receiver and $e$ the elementary charge. The most important optical noise is Amplified Spontaneous Emission (ASE), which actually describes the attenuation of the optical signal by a factor $a_{ASE} = 0.2$ dB/km. The distortion posed by the required $N_A$ repeaters/amplifiers placed at distance $L_a$ can be described by the noise spectral density $N^{EDFA}_{ASE} = N_A\left(e^{a_{ASE}L_a} - 1\right)h v_s n_{sp}$ or $N^{IDRA}_{ASE} = a_{ASE}L_a h v_s K_T$ if Erbium-doped fiber amplifiers (EDFAs) or Ideal Distributed Raman Amplification (IDRA) is used, respectively. The parameters used in these noise spectral densities are also assumed to have known values: $h$ is the Planck constant, $v_s$ is the optical frequency, $K_T$ is the photon occupancy factor and $n_{sp} < 1$ the spontaneous emission factor.

The model can be trained by a number of non-linear equations that combine the channel error sources with various capacity and power requirements for specific predefined modulation schemes. The target of this training would be to estimate the weights of the channel error sources. After updating the error model with these weights, it can be used to select an appropriate modulation scheme for different capacity and power specifications or channel conditions.

Conclusions
An error model has been developed for an OFDM transceiver architecture that supports undersampling when sparse information is exchanged. The error model combines undersampling and round-off noise and determines the word length that should be employed by the FFT/IFFT modules. The desired round-off error can be defined as a fraction or a multiple of the undersampling error, and an appropriate word length is estimated to achieve this goal. A new FFT pipeline module has been developed in synthesizable VHDL in order to evaluate the correctness of the predicted word length. Simulations have been run for all the combinations of QPSK or 16QAM modulation, FFT sizes of $N=256$ or $N=1024$ points, sparseness levels of 0.5%, 1%, 2% and 10%, and FFT input replacement of either $N/4$ or $N/16$ samples. The simulation results show that the appropriate word length can be determined with RMSE less than 1 and that the ADC resolution should match the estimated word length.
Future work will focus on extending the developed error model for pipeline FFTs with different word length in each stage. A different range of FFT sizes and QAM modulations will also be tested. Finally, the FFT/IFFT as well as other OFDM modules will be implemented in real hardware (FPGA) to measure their power consumption and speed.

Patents
The undersampling method described in