Performance Investigation of Peak Shrinking and Interpolating the PAPR Reduction Technique for LTE-Advance and 5G Signals

Orthogonal frequency division multiplexing (OFDM) has become an indispensable part of waveform generation in wideband digital communication since its first appearance in digital audio broadcasting (DAB) in Europe in the 1980s, and it remains in use. OFDM-based waveforms work well with time division duplex operation in 5G new radio (NR) systems, supporting delay-sensitive applications, high spectral efficiency, massive multiple input multiple output (MIMO) compatibility, and ever-larger bandwidth signals, and they have been deployed commercially for 5G downlinks and uplinks with modulation schemes up to 256-QAM. However, OFDM waveforms suffer from a high peak to average power ratio (PAPR), which is undesirable for system designers who want RF power amplifiers (PAs) to operate with high efficiency. Although NR offers some options for maintaining efficiency and spectral demand, such as cyclic prefix based (CP-OFDM) and discrete Fourier transform spread based (DFT-S-OFDM) schemes, which have limiting effects on PAPR, the PAPR is still as high as 13 dB. This value increases when the bandwidth is increased. Moreover, in LTE-Advanced and 5G systems, carrier aggregation is used to increase the bandwidth and data rate, and it increases the PAPR in the same way that a bandwidth increase does; therefore, it is essential to employ PAPR reduction in the signal processing stage before passing the signal to the PA. In this paper, we investigate the performance of an innovative peak shrinking and interpolation (PSI) technique for reducing the PAPR of OFDM-based signals at the waveform generation stage. The main idea behind the PSI technique is to extract high peaks, scale them down, and interpolate them back into the signal.
It is shown that the PSI technique is a possible candidate for reducing PAPR without compromising on computational complexity, and that it is compatible with existing and future telecommunication systems such as 4G, 5G, and beyond. In this paper, the PSI technique is tested with a variety of signals in terms of inverse fast Fourier transform (IFFT) length, type of signal modulation, and application. Additional work has been carried out to compare the proposed technique with other promising PAPR reduction techniques. This paper further validates the PSI technique through experimental measurement with a power amplifier (PA) test bench, achieving an adjacent channel power ratio (ACPR) of less than −55 dBc. The results show an improvement in the output power of the PA versus a given input power; furthermore, an error vector magnitude (EVM) of less than 1% was achieved when comparing the signal before and after modification by the PSI technique.


Introduction
The new 5G wireless networks promise faster data rates and more reliable service to today's mobile users. New technologies are being designed by wireless engineers around the world every day, and many of them, such as autonomous vehicles, virtual reality, and the Internet of Things (IoT), are planning to adapt to the first commercial 5G networks in the early 2020s [1][2][3]. Millimeter waves (mmWaves), small cells, massive multiple input multiple output (MIMO), full duplex, and beam-forming are the main features of 5G systems that seem promising for delivering the ultra-fast systems in demand [4]. At mmWave frequencies, 5G NR supports applications up to 52.6 GHz, and the system uses orthogonal frequency division multiplexing (OFDM) for waveform generation [1]. It should be noted that this orthogonality is different from that of multiple access schemes, which may or may not be orthogonal, as seen in orthogonal multiple access (OMA) and non-orthogonal multiple access (NOMA) schemes [4]. Although orthogonality in the multiple access scheme also increases the peak to average power ratio (PAPR) for multiple users [5], in this paper only the PAPR of a single-user signal waveform is analysed. As the Fourier transform [6,7] plays an essential role in developing OFDM signals, it is fundamental to understand the connection between the Fourier transform and the creation of high peaks in any multi-carrier signal, as used in many telecommunication systems [4].

Fast Fourier Transform; Origin of High Peak to Average Power Ratio or Crest Factor
A block diagram of OFDM is presented in Figure 1, which shows the process of generating an OFDM signal at the transmitter and recovering it at the receiver. The orthogonality is provided by the Fourier transform, which is applied to the signal as an inverse discrete Fourier transform (IDFT) in the transmitter and as a discrete Fourier transform (DFT) in the receiver [8]. As a result of the Fourier transform, there are constructive and destructive effects: some signal peaks cancel each other and some add up, which creates high peaks in the time domain. The peak to average power ratio (PAPR) of the signal S(t) can be calculated by (1):

$$\mathrm{PAPR}_{\mathrm{dB}} = 10 \log_{10} \frac{P(t)_{\max}}{P(t)_{\mathrm{mean}}} \qquad (1)$$

where the maximum power, $P_{\max}$, and the average power, $P_{\mathrm{mean}}$, can be measured as:

$$P_{\mathrm{mean}} = \mathrm{mean}(P(t)) = \mathrm{mean}(|S(t)|^2)$$

$$P_{\max} = \max(P(t)) = \max(|S(t)|^2)$$

where S(t) is the time domain signal, which can be represented as the inverse discrete Fourier transform of the frequency domain signal:

$$S(t) = \frac{1}{N} \sum_{n=0}^{N-1} s(n)\, e^{j 2 \pi n t / N}$$

where N is the transform size, or the number of sample points in the data frame, $j = \sqrt{-1}$, and s(n) is the signal in the frequency domain. It should be noted that the block diagram in Figure 1 represents the signal generation for one user only, and it should not be confused with multiple access techniques such as orthogonal frequency division multiple access (OFDMA) for 4G, or the recently proposed non-orthogonal multiple access (NOMA) techniques for 5G [9]. In practice, the value obtained from Equation (1) is random by nature; it varies from 5 dB to 14 dB, for example, in an LTE signal with a bandwidth of 512 MHz. Therefore, the probability of occurrence of a given PAPR is used to compare two signals. This probability is defined and explained in Section 3.2.
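As an illustration of Equation (1), the following minimal sketch generates one OFDM symbol with a naive inverse DFT and computes its PAPR in dB. The function names `idft` and `papr_db`, the 1/N normalisation, and the use of 64 random QPSK subcarriers are assumptions made for this example only; they are not taken from the paper's simulation set-up.

```python
import cmath
import math
import random


def idft(freq_symbols):
    """Naive O(N^2) inverse DFT; the 1/N normalisation is an assumed convention."""
    size = len(freq_symbols)
    return [
        sum(freq_symbols[n] * cmath.exp(2j * cmath.pi * n * t / size) for n in range(size))
        / size
        for t in range(size)
    ]


def papr_db(samples):
    """PAPR in dB: 10*log10(max(|S|^2) / mean(|S|^2)), following Equation (1)."""
    power = [abs(x) ** 2 for x in samples]
    return 10 * math.log10(max(power) / (sum(power) / len(power)))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical test input: 64 subcarriers carrying random QPSK symbols.
    qpsk = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(64)]
    print(f"PAPR of one random OFDM symbol: {papr_db(idft(qpsk)):.2f} dB")
```

A constant-envelope sequence gives 0 dB, while identical subcarrier symbols add coherently into a single time domain impulse, which is the constructive effect described above.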

Previous Peak to Average Power Ratio or Crest Factor Reduction Works
Here we present a summary of the most popular approaches for PAPR reduction in Table 1.
In the literature, algorithms addressing PAPR reduction are also known as crest factor reduction (CFR) algorithms. To avoid confusion between crest factor (CF) and clipping and filtering (CF), which is a technique for reducing PAPR, only the term PAPR is used in the remainder of the manuscript. Among all the techniques developed to reduce PAPR, a large proportion of the literature focuses on the categories shortlisted in Table 1 (e.g., see [11]). Clipping based techniques [12] are considered distortion based methods (which means that applying them results in out-of-band distortion); however, owing to their good performance and simplicity, they are considered among the most practical PAPR reduction techniques [13]. As shown in Figure 2, when the original LTE signal is clipped, its adjacent sideband power levels increase.
Clipping based techniques are built on the simple idea of limiting the signal in the transmitter to a desired magnitude level without changing the phase [12,14]. As seen in Figure 2, when a sharp clipping technique is applied, about 20 dB of distortion is added to the signal, which is not desired. The sharp clipping process can be presented mathematically as:

$$P_{\mathrm{clipped}}(t) = \begin{cases} P(t), & P(t) \le CR \\ CR, & P(t) > CR \end{cases}$$

where P(t) is the time domain power of the LTE signal S(t) and CR denotes the clipping ratio; in the example presented in Figure 2, CR is 0.0015. As shown in Figure 3, two stages of Fourier transform are required in order to remove the unwanted distortion added by the sharp clipping process. These stages require extra computation; moreover, one filter stage is not sufficient, as the PAPR can rise again after filtering. Clipping then has to be performed again, followed in turn by more filtering. A number of loops through the clipping and filtering stages are required to obtain a clipped signal in which the distortion added to the main spectrum is minimised. An important benefit of this group of techniques is that they do not need to transfer extra information to the receiver about the modification performed on the signal at the transmitter.

For many coding based techniques, such as [15,16], and for SLM [17] or PTS [18,19] based algorithms, the receiver has to be informed about the modifications performed on the signal at the transmitter side. Generally, a compensation block has to be added at the receiver to reverse the process, and this normally affects the symbol error rate (SER) and the error vector magnitude (EVM). Moreover, informing the receiver normally involves transmitting extra bits as side information. In 2011, the DSI-SLM technique was developed [20], which combines dummy sequence insertion (DSI) with the selected mapping (SLM) technique. The block diagram of the DSI-SLM technique is presented in Figure 4.
The DSI-SLM approach can reduce the PAPR by more than 4 dB; however, to achieve that performance the inverse discrete Fourier transform (IDFT) may need to be computed approximately 16 times, which is computationally challenging. Moreover, because the sequence with minimum PAPR is selected at the right side of the block diagram in Figure 4, information about this selection also has to be sent to the receiver. For example, in order to have 16 selections, 4 extra bits have to be sent. Another challenge in implementing DSI-SLM, compared with clipping based techniques, is the need for a compensation algorithm at the receiver. As seen in the block diagram of Figure 4, M different phases (Ph.1, Ph.2, ..., Ph.M) are multiplied into the signal. The received signal has to be divided by the same phase in order to extract the original signal; this means implementing a complex division block at the receiver, which has its own complications, as explained in [21]. Some works aim to reduce these complications [22,23], and their authors have designed algorithms that avoid transferring extra bits; however, the complexity at the transmitter and the channel security remain challenging.
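To make the cost of the selection step concrete, the sketch below implements a plain selected mapping search: it rotates the subcarrier symbols by M candidate phase sequences, runs one IDFT per candidate, keeps the lowest-PAPR result, and reports the number of side-information bits needed to signal the choice. It is only an illustration under assumptions: the dummy sequence insertion part of DSI-SLM is omitted, and the phase alphabet {1, j, −1, −j}, the function names, and the random phase generation are hypothetical choices, not the method of [20].

```python
import cmath
import math
import random


def idft(freq_symbols):
    """Naive inverse DFT (1/N normalisation assumed)."""
    size = len(freq_symbols)
    return [
        sum(freq_symbols[n] * cmath.exp(2j * cmath.pi * n * t / size) for n in range(size))
        / size
        for t in range(size)
    ]


def papr_db(samples):
    """PAPR in dB of a block of complex samples."""
    power = [abs(x) ** 2 for x in samples]
    return 10 * math.log10(max(power) / (sum(power) / len(power)))


def slm_select(freq_symbols, num_candidates, rng):
    """Return (best_index, best_papr_db, side_info_bits) over num_candidates IDFTs."""
    phases = (1, 1j, -1, -1j)  # assumed phase alphabet
    best_index = 0
    best_papr = papr_db(idft(freq_symbols))  # candidate 0: no rotation
    for index in range(1, num_candidates):
        rotation = [rng.choice(phases) for _ in freq_symbols]
        rotated = [s * p for s, p in zip(freq_symbols, rotation)]
        candidate_papr = papr_db(idft(rotated))  # one extra IDFT per candidate
        if candidate_papr < best_papr:
            best_index, best_papr = index, candidate_papr
    side_info_bits = math.ceil(math.log2(num_candidates))
    return best_index, best_papr, side_info_bits
```

With 16 candidates the function performs 16 IDFTs and returns 4 side-information bits, which matches the figures quoted in the text. A real receiver would also need the selected phase sequence itself in order to undo the rotation.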

The Peak Shrinking and Interpolation (PSI) Technique
Although most of the recent research into PAPR reduction has concentrated on modified SLM and PTS methods, the most practical technique is a combination of clipping and filtering. The goal of designing peak shrinking and interpolation (PSI) is to have a technique as practical as clipping and as specific as windowing, that is, an intermediate technique between windowing and clipping. The idea is simply to find problematic peaks in the power domain (time domain), extract them, shrink them, and put them back in their place.
A general block diagram of the PSI technique is illustrated in Figure 5. First the signal is generated; the PSI technique then splits the signal into a number of slices ($N_s$), as shown in Figure 5.

Detection of a Peak and Its Surrounding
The rationale behind splitting the signal into smaller slices is to enable parallelization of the process and to look at peaks closely, as the extent of the peaks varies considerably between slices. The zoomed window (the call-out) in Figure 5 shows the slices overlapping (in this case by 1%), which helps to ensure that a peak point is not missed by lying on the border of a slice. High peaks can be identified by comparison with a threshold, as in many other techniques [11]. Another approach is to find the maximum point of each slice. An example slice is shown in Figure 6. The location of the peak is indicated as $P_{index}$. As seen in Algorithm 1, the indices of the second point before and the second point after the high peak are indicated as $L_{index}$ and $R_{index}$, where $L_{index} = P_{index} - SUR$ and $R_{index} = P_{index} + SUR$, and SUR indicates the surrounding, or skirt, of a peak. It should be noted that, unlike previous techniques that consider a peak only by its raised point, here a peak is defined by its surrounding sample values as well.

As seen in Algorithm 1, peaks and their surrounding points are detected. The process of finding the peak points indicated by $P_{index}$ can be performed in two ways. The first approach is to enforce a threshold value, which is essentially a crest detection technique. This approach results in a matrix containing a number of points, each addressed in terms of its surrounding points. The second approach is to find the maximum point of each slice and define its surrounding points. This simplifies the initial process; however, it often needs to be repeated, as the PAPR may not be satisfactory after the shrinking process has been performed on the detected peak.
There is therefore a trade-off between these two approaches, and depending on the hardware requirements, one may be preferred over the other, as enforcing a threshold requires a number of bits to store the information about each peak and its surroundings. This study tested 100 instances of randomly generated signals. For the first approach, the maximum number of peaks found in each slice was 3, which means that 3 × 5 points have to be stored. The second approach improves on this, as it requires storage for only 5 points; however, the loop might have to be repeated, which multiplies the computational complexity by the number of required iterations.
In the process of developing the PSI algorithm, it was observed that the dimension of a peak and its surrounding is very important; therefore, a parameter named surrounding region (SUR) is defined to indicate the number of points around the main raised point. Simulations were performed for SUR = 4, 3, and 2, and SUR = 2 was found to be the most suitable value, as it did not add unnecessary complication and resulted in changes smooth enough to satisfy the EVM target, discussed later. This means that two points on the left side of the peak point and two points on the right side are considered for each peak; therefore, a total of five points are processed. In the example shown in Figure 6, SUR = 2, and therefore $L_{index} = P_{index} - 2$ and $R_{index} = P_{index} + 2$. It can be seen in Algorithm 1 that if a peak is very close to the beginning of the slice ($L_{index} \le 0$), the first point of the signal in the slice is taken as the left border point ($L_{index} = 1$). Similarly, if the right side border is very close to the end of the slice ($R_{index} \ge N$), the end point of the signal in the slice, with address N, is taken as the right border point ($R_{index} = N$). It should be noted that in these cases the number of extracted points will be less than five, and careful addressing is needed when interpolating the processed peak back into the signal.
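The slicing and the two peak detection options described above can be sketched as follows. This is a minimal illustration rather than the paper's Algorithm 1: the function names, the way the overlap is rounded to whole samples, and the 0-based indexing are assumptions made here.

```python
def split_into_slices(num_samples, num_slices, overlap_fraction=0.01):
    """Return (start, stop) pairs (stop exclusive) of slices that overlap their neighbours."""
    base = num_samples // num_slices
    # Overlap in samples on each side; at least one sample so borders are shared.
    overlap = max(1, round(base * overlap_fraction))
    bounds = []
    for k in range(num_slices):
        start = max(0, k * base - overlap)
        if k == num_slices - 1:
            stop = num_samples  # last slice absorbs any remainder
        else:
            stop = min(num_samples, (k + 1) * base + overlap)
        bounds.append((start, stop))
    return bounds


def peak_index_per_slice(power, bounds):
    """Second approach: index of the maximum-power sample in each slice."""
    return [max(range(start, stop), key=power.__getitem__) for start, stop in bounds]


def peaks_above_threshold(power, threshold):
    """First approach: indices of all samples whose power exceeds the threshold."""
    return [i for i, p in enumerate(power) if p > threshold]
```

Because neighbouring slices share samples, a peak that sits on a slice border is visible to both slices, which is the purpose of the overlap shown in the call-out of Figure 5.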
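The border handling described above can be sketched as a small helper. The names `peak_borders` and `extract_peak` are hypothetical, and the sketch uses 0-based indices (first sample 0, last sample N − 1) rather than the 1-based addresses used in the text.

```python
def peak_borders(p_index, slice_length, sur=2):
    """Return (l_index, r_index) for a peak, clamped to the slice (0-based indices)."""
    if not 0 <= p_index < slice_length:
        raise ValueError("p_index must lie inside the slice")
    l_index = max(0, p_index - sur)                  # peak near the start of the slice
    r_index = min(slice_length - 1, p_index + sur)   # peak near the end of the slice
    return l_index, r_index


def extract_peak(power_slice, p_index, sur=2):
    """Return (l_index, points): the peak with its skirt, plus where to put it back."""
    l_index, r_index = peak_borders(p_index, len(power_slice), sur)
    return l_index, power_slice[l_index : r_index + 1]
```

Returning `l_index` together with the extracted points is one way to keep the addressing correct when fewer than 2 × SUR + 1 points are extracted near a slice edge.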

Peak Shrinking Process
Once the peak is detected and extracted, the shrinking process can be performed. Equation (5) shows that the shrinking process can be implemented by a multiplication. This multiplication is performed on only five data sample points when SUR = 2, as the number of data points is equal to (2 × SUR + 1); therefore, it does not need a large amount of memory. The shrinking process can be presented mathematically as:

$$\mathrm{Peak}_{Sh} = SR \times \mathrm{Peak}_0 \qquad (5)$$

where $\mathrm{Peak}_0$ is the signal before the process, SR is the shrinking ratio, and $\mathrm{Peak}_{Sh}$ is the resulting shrunk peak. In simulations, SR may be modified to achieve different results. The shrinking process illustrated in the examples in Figure 7a-c is performed with SR = 0.5. As observed from Table 1, the performance is best when SR = 0.5, compared with SR = 0.4 and 0.6. It should be noted that SR = 1 means that the peak is not shrunk; therefore, the PAPR remains the same. On the other hand, selecting a very small SR is equivalent to removing the signal point. In the latter case, the EVM is degraded, since there will be a sharp signal discontinuity. This has a similar effect to sharp clipping and requires many filter stages to compensate. As the main aim here is to reduce the complexity of the system, SR is optimised.

Figure 7. (a) An example of a peak border that does not need matching/smoothing; (b) an example of a peak border that needs matching/smoothing; (c) an example of a peak border after the matching/smoothing process [24].
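The multiplication in Equation (5) can be illustrated with the short sketch below. The function name `shrink_peak` and the choice to return a modified copy are assumptions for this example; the sketch scales whatever sequence it is given (power samples or signal samples) and does not include the later matching/smoothing step.

```python
def shrink_peak(samples, p_index, sr=0.5, sur=2):
    """Return a copy of samples with the peak and its skirt multiplied by sr."""
    if not 0.0 < sr <= 1.0:
        raise ValueError("sr is expected in the interval (0, 1]")
    l_index = max(0, p_index - sur)
    r_index = min(len(samples) - 1, p_index + sur)
    shrunk = list(samples)
    for i in range(l_index, r_index + 1):  # at most 2*sur + 1 multiplications
        shrunk[i] = sr * shrunk[i]
    return shrunk
```

For example, `shrink_peak([1, 1, 4, 10, 4, 1, 1], 3)` scales the five points centred on index 3 and leaves the two outer samples unchanged, while `sr=1.0` returns the input values unmodified.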

Matching/Smoothing Algorithm in PSI Technique
Figure 7 presents two cases. In case (1), the shrunk peak does not need the smoothing process: $D_R$ is 0.001, which is smaller than $D_{\max}$. Based on the EVM performance, a suitable value for $D_{\max}$ is 0.005. In case (2), the peak requires the smoothing process, as $D_R$ is greater than $D_{\max}$; the peak after the smoothing process is shown in Figure 7c.

Algorithm 2. Matching/smoothing algorithm (performed after the shrinking process)
D ← absolute distance between the power of the borders;
$D_L$ ← left side border point distance between the peak surrounding and the rest of the signal;
$D_R$ ← right side border point distance between the peak surrounding and the rest of the signal;

It should be noted that in some cases the peak can be found at the very beginning of the slice or at the very end of the signal ($L_{index} = 0$ or $R_{index} = N$). In these cases the smoothing algorithm does not have access to the adjacent points and cannot perform the smoothing correctly. This scenario occurred 5% of the time when a signal with a length of $30 \times 10^3$ was tested. To avoid it, overlapped slicing, as seen in Figure 5, is recommended, which removes the scenario completely.
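The decision of whether a shrunk peak needs matching/smoothing can be sketched as a border distance check. Only the check is shown; the smoothing operation itself is not reproduced because its details are not given in this section. The function names, the use of `None` for a missing neighbour, and the interpretation of $D_L$ and $D_R$ as the absolute power difference between a border sample and its outside neighbour are assumptions.

```python
D_MAX = 0.005  # value reported in the text as suitable for the EVM target


def border_distances(power, l_index, r_index):
    """Return (d_left, d_right); None where the border has no outside neighbour."""
    d_left = abs(power[l_index] - power[l_index - 1]) if l_index > 0 else None
    d_right = (
        abs(power[r_index + 1] - power[r_index]) if r_index < len(power) - 1 else None
    )
    return d_left, d_right


def needs_smoothing(power, l_index, r_index, d_max=D_MAX):
    """True when either border jump exceeds d_max (0-based indices assumed)."""
    return any(
        d is not None and d > d_max for d in border_distances(power, l_index, r_index)
    )
```

A `None` distance corresponds to the edge case discussed above, where the peak touches the start or end of the data and no adjacent point is available.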

Analysing the Error Vector Magnitude (EVM)
Error vector magnitude (EVM) is a commonly used figure of merit for quantifying the performance of a transmitter or receiver. It quantifies the extent to which constellation points deviate from their ideal locations. In many works, the root mean square (RMS) of the EVM is used [25][26][27]; however, here, in order to analyse the worst case, we consider the maximum of the EVM in each symbol, where $I_i$ and $Q_i$ are the I and Q of an ideal signal without PSI modification, and $I_m$ and $Q_m$ are the I and Q of a modified signal after being processed with the PSI algorithm. Therefore, $(I_i - I_m)^2 + (Q_i - Q_m)^2$ is the measured error between the modified I and Q and the ideal I and Q.
In order to provide a more comprehensive test of the technique, LTE signals were used for experimentation [28]. The EVM introduced by signal processing at the transmitter should be minimised as much as possible. Here, over 100 randomly generated signals modulated with offset QPSK (OQPSK) were tested, and the EVM was measured as presented in Table 2. OFDM signals with various inverse fast Fourier transform (IFFT) lengths (K = 512, 1024, 2048, 4096, and 8192) were tested with different values of RP and SR, and the $EVM_{\max}$ in each case was measured and is presented. The maximum EVM was captured in Scenario 4, when K = 512, RP = 10, SR = 0.5, and NS = 16, and the $EVM_{\max}$ had its lowest value in Scenario 6, when K = 2048, RP = 5, and SR = 0.5.

In order to gauge the effectiveness of the proposed technique, simulations were first carried out in Matlab [29]. When comparing signals with K = 512, 1024, 4096, and 8192 analytically, a higher PAPR was expected as K increased, owing to the higher probability of high peaks being present. To measure this probability, the complementary cumulative distribution function (CCDF) was used, which can be represented as:

$$\mathrm{CCDF} = P(\mathrm{PAPR} > \mathrm{PAPR}_{th})$$

where $\mathrm{PAPR}_{th}$ is a threshold PAPR and P denotes the probability. Again, the modulation scheme used in these tests is offset quadrature phase shift keying (OQPSK), and K is varied from 512 to 8192. It should be noted that the LTE toolbox used did not exceed an IFFT length of 2048 [26,30]; however, as carrier aggregated signals in LTE-A and 5G technologies have increased IFFT lengths [31][32][33], the test was also performed for the larger IFFT lengths of K = 4096 and 8192 in order to validate that this scheme is applicable to LTE-A and 5G signals.
The comparison of the original signal and the signal with PSI applied in the time domain is shown in Figure 8a, and the CCDF plot is presented in Figure 8b, in which the common point for performance comparison is a CCDF of $10^{-4}$. This means that, for example, an original signal with a length of K = 1024 is 0.01% likely to have a PAPR higher than 12 dB. This value is reduced to about 7.2 dB by applying the PSI technique. It should be noted that any improvement in CCDF of more than 4 dB aligns with IEEE standards [34] and can be considered useful in practice. A full comparison is displayed in Table 3, which shows that for K = 8192 the PAPR is reduced by 4.4 dB. This reduction is the same as that for K = 4096; the point, however, is that doubling K doubles the length of the IFFT, which is directly related to the number of carrier aggregations or, in other words, the bandwidth of the 5G signal. This consistent performance makes PSI a reliable and adaptive technique for high bandwidth applications such as LTE-Advanced and 5G signals.

An analysis is provided to demonstrate how the value of NS can affect the PAPR reduction performance. It is seen in Figure 9a that increasing the value of NS does not necessarily improve the PAPR reduction performance; here, 16 was the value used for NS. Figure 9b shows how the PAPR reduction performance is improved by increasing the value of RP. However, the computational complexity also increases with RP. It was observed that choosing RP = 10 is sufficient to satisfy the rule-of-thumb minimum of 4 dB PAPR reduction. RP = 10 means that 10 iterations of the loop presented in Figure 11 are performed to search for peaks and shrink them, by means of a search for maximum power points.
It should be noted that if threshold enforcement is employed instead of a search for the maximum point, the loop is replaced with an iterative shrinking process, which has the same effect in terms of computational complexity. It is seen in Table 2 that changing NS, RP, and SR has a marginal effect on the EVM. As observed from the results in Figure 10, the SR value plays an important role in the PAPR reduction performance. Therefore, in Section 3.4, the mathematical relationship between the PAPR reduction and the value of SR is modeled, as seen in Figure 14. It can be observed from the plot and the measurements that when SR is selected between about 0.6 and 0.7, the PAPR reduction achieves its optimum performance, which is 3.45 dB for this particular scenario. This simplifies the implementation of PSI, as the SR rev. also has the same value of 0.7; therefore, it can be excluded. It should be noted that this analysis was performed at a CCDF of $10^{-1}$; for the full CCDF analysis, the reduction at a CCDF of $10^{-4}$ was considered, which can be seen in Figure 8b.
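A worst-case EVM of this kind can be sketched as below. The error term is the Euclidean distance between the ideal and modified I/Q samples, as described above; the normalisation by the RMS magnitude of the ideal samples and the expression in percent are assumptions of this sketch, as is the function name `evm_max_percent`.

```python
import math


def evm_max_percent(ideal, modified):
    """Largest per-sample error magnitude, as a percentage of the ideal RMS magnitude.

    ideal and modified are equal-length sequences of complex I/Q samples.
    """
    if not ideal or len(ideal) != len(modified):
        raise ValueError("ideal and modified must be non-empty and equally long")
    reference = math.sqrt(sum(abs(x) ** 2 for x in ideal) / len(ideal))
    if reference == 0:
        raise ValueError("ideal signal has zero power")
    # abs(i - m) equals sqrt((I_i - I_m)^2 + (Q_i - Q_m)^2) for complex samples.
    worst_error = max(abs(i - m) for i, m in zip(ideal, modified))
    return 100 * worst_error / reference
```

Taking the maximum instead of the mean makes the metric sensitive to a single badly distorted sample, which is the intent of analysing the worst case rather than the RMS value.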
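The CCDF used in this section can be estimated empirically from a set of per-symbol PAPR measurements, as in the following minimal sketch. The function names and the list-based interface are illustrative assumptions, not part of the paper's Matlab set-up.

```python
def ccdf(papr_values_db, threshold_db):
    """Empirical P(PAPR > threshold): fraction of measurements above the threshold."""
    if not papr_values_db:
        raise ValueError("at least one PAPR measurement is required")
    exceed = sum(1 for value in papr_values_db if value > threshold_db)
    return exceed / len(papr_values_db)


def ccdf_curve(papr_values_db, thresholds_db):
    """CCDF evaluated at several thresholds, e.g. for plotting on a log axis."""
    return [(th, ccdf(papr_values_db, th)) for th in thresholds_db]
```

Note that resolving a probability as low as 10^-4 with this estimator needs on the order of 10^4 or more PAPR measurements, since the smallest non-zero value it can return is one over the number of measurements.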

Computational Complexity Comparison
As demonstrated in previous sections, simplicity and EVM performance make the PSI algorithm a practical method. In this section, the computational complexity of PSI is analysed and compared with previous techniques. The proposed technique was first compared with a PAPR reduction technique implemented on an IC [13], which is an implemented version of a clipping technique [12,14,35]. In general, when analysing computational complexity, the number of additions and the number of multiplications are studied. It should be noted that if an algorithm requires complex multiplications or complex divisions, they all have to be transformed into real multiplications and additions. For example, as mentioned earlier for SLM based techniques [17,20,36], a compensation block including a complex division is required at the receiver in order to extract the original transmitted signal [37]. The second parameter that plays an important role in implementing a PAPR reduction technique in hardware is the memory that is required. Here we analyse the number of bits required for implementing a technique. Referring to the simplest example of a clipping based technique, presented in Figure 3, the number of multiplications is given by:

$$n_{mul,CF} = n_{mul,Clipping} + n_{mul,FFT} + n_{mul,Filtering} + n_{mul,IFFT}$$

where $n_{mul,Clipping}$ is the number of multiplications needed for clipping; as seen in Figure 3, this involves the peak detection block, the threshold definition block, and at least one if/else for each iteration of the loop. The number of multiplications required for the filtering process is denoted by $n_{mul,Filtering}$, and the numbers of multiplications required for the FFT and IFFT processes are denoted by $n_{mul,FFT}$ and $n_{mul,IFFT}$. For the FFT or IFFT, the number of multiplications needed is calculated from:

$$n_{mul,IFFT} = K \log K$$

where $n_{mul,IFFT} = 3072$ under the assumption that K = 1024.
To understand the filtration process, it can be considered a convolution of two signals of the same length in the frequency domain. The number of multiplications required for filtration, $n_{mul,Filtering}$, can be calculated by:

$$n_{mul,Filtering} = K^2$$

where it is again assumed that K = 1024, and therefore $n_{mul,Filtering} = 1048576$.
For the storage of complex numbers, $\log_2 n$ bits are required for the absolute value of each complex number, and $\log_2 n$ bits are required for the angle of each complex number. For example, if we assume a complex number with a length of n, the number of bits needed to implement it digitally is $2 \log_2 n$. When K = 1024, considering the length of each filter, a total of $2K \log_2 n = 20{,}480$ bits are required in order to implement the CF technique. For the PSI technique, however, by referring to Figures 5 and 11, the number of multiplications needed, $n_{mul,PSI}$, can be calculated as:

$$n_{mul,PSI} = n_{mul,Peak} + n_{mul,Borders} + n_{mul,Extracting} + n_{mul,Shrinking} + n_{mul,Smoothing}$$

where $n_{mul,Peak}$ is the number of multiplications required for peak detection.

Figure 11. The PSI algorithm iteration loop block diagram [24].
It should be noted that the process of detecting peaks, or enforcing the threshold, is very similar to that of the CF technique. In the PSI technique there is a process for defining the borders, and the number of multiplications required for this process, $n_{mul,Borders}$, is two: one for the left border, $L_{index}$, and one for the right border, $R_{index}$. These pieces of information need to be stored temporarily, and therefore two bits of memory are occupied. The indexes are replaced on every run, so they do not have to be multiplied by the number of loop iterations. The peak extraction process needs $n_{mul,Extracting}$ multiplications, which is zero, as it does not require any mathematical calculation; however, extracting each peak requires five bits of memory for the peak point and its surrounding points, two points on the left side and two points on the right side. To aid understanding, let us consider the following example, which shows an extracted peak and its surrounding points:

$$\mathrm{Peak}_0 = [P(t-2),\ P(t-1),\ P(t),\ P(t+1),\ P(t+2)]$$

where P(t) indicates a sample point of the power of the signal S(t), which is the square of the absolute value of the signal:

$$P(t) = |S(t)|^2$$

In order to implement the shrinking process itself, the five points are multiplied by a constant value, the shrinking ratio (SR); therefore, $n_{mul,Shrinking} = 5$. For storing the constant value, one bit of memory is occupied. The last algorithm employed in the PSI technique is the smoothing, or matching, of the border points; this involves two if/then branches and is equivalent to two subtractions and two multiplications [24].
It should be noted that, in general, the total number of bits required for the PSI loops depends on the number of PSI loops, denoted by RP. Assuming that RP = 10, the number of bits required for the PSI loop can be calculated as:

$$n_{mul,Borders} = 20\ \text{bits} \qquad (15)$$

From (8) to (15), the total computational complexity of the PSI technique and the CF technique can be calculated, and Table 4 presents a comparison of the computational complexity of some of the most popular CFR techniques. It should be noted that for PTS based techniques [18,[38][39][40] and SLM based techniques [20,36,41], the set-ups of the algorithms are chosen so that the PAPR reduction performance is comparable with that of the PSI technique. This means that the number of IDFTs processed in both categories is taken to be 16, and the length of the signal is taken to be 1024, which indicates the number of bits for storing the phases that are multiplied into the input signal [17,41].

Table 4. Computational complexity comparison.
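The filtering count quoted above can be checked by counting the multiplications in a direct convolution of two sequences of equal length. The sketch below is a generic textbook convolution written for this purpose; the function name and the returned counter are illustrative and are not part of the CF implementation in [13].

```python
def direct_convolution(a, b):
    """Full linear convolution by definition; also returns the multiplication count."""
    out = [0] * (len(a) + len(b) - 1)
    multiplications = 0
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y  # one multiplication per pair of samples
            multiplications += 1
    return out, multiplications
```

Every sample of one sequence is multiplied by every sample of the other, so two length-K sequences need K × K multiplications; for K = 1024 this is 1024 × 1024 = 1048576, the value given in the text.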

Length of the IFFT: K = 1024.

| Type of CFR technique | Number of required multiplications ($n_{mul}$) | Number of required bits ($n_{bit}$) |
| --- | --- | --- |
| Clipping and Filtering (CF) | $\geq 10^6$ | $\geq 10^3$ |
| Two Step Peak Clipping (TPC) [14] | $\geq 2 \times 10^6$ | $\geq 10^3$ |
| DSI-SLM technique [20] | 47,810 | 1024 |
| DSI-PTS technique [41] | 67,112,615 | 1024 |
| Conventional SLM (C-SLM) technique [36] | 95,620 | 1024 |
| Peak Shrinking and Interpolation (PSI) | 1440 | 80 |

It can be seen in Table 4 that the newly developed two step peak clipping (TPC) technique [14], which shows 3.5 dB PAPR reduction with moderate EVM performance, has about double the complexity of conventional CF, since there are two stages of clipping and, therefore, two stages of FIR filters. The very low computational complexity of the PSI technique, achieved by eliminating the filtration process, makes PSI a very good candidate for existing and future systems.

Signal Spectrum Leakage
As important as it is to analyse a PAPR performance, it should be always considered that by modifying the signal, additional distortion is not added. Here the spectra of the signal with and without PSI technique are compared, and any leakage is carefully measured. This indicates how well the signal is behaving with respect to allocated bandwidth frequency. Figure 12a represents the spectrum of an LTE signal with a bandwidth of 20 Mhz, and as it is seen there is a slight different between the processed signal and original signal (indicated by red spectrum and blue spectrum in Figures 12 and 13); however, this amount of leakage is normal due to the process of shrinking, and the smoothing algorithm keeps this slight different within acceptable range, as it does not interfere with adjacent channels. However, it should be noted that in case of 5G signals, considering carrier aggregation of for example 20 aggregation process, this leakage should be further tested. Here, the leakage behaviour, as well as PAPR reduction performance are mathematically modeled in order to have good estimation for signals with high bandwidth, and/or carrier aggregation. The PAPR reduction value is formulated as PAPR reduction = −35 SR 4 + 53 SR 3 − 25 SR 2 + 4.7 SR + 2.9. Here, a simulation is performed to understand this effect more clearly. The leakage is also measured for variety of values of SR, and it should be noted that in both cases, the re-sampled signal is re-sampled in order to match, The analysis results are presented in Figure 14a,b. As it is observable from Figure 14a, the leakage is reduced by increasing the value of SR. The behaviour is reasonable, as a value of SR shows the amount of shrinking process. The optimum of value observed from other results (SR opt ) is sufficient in terms of PAPR reduction, EVM performance, and this leakage. Here, an analysis is presented that explains how PSI technique can be equally used for different types of LTE signal, as described in Table 5. 
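The fitted fourth-order polynomial for the PAPR reduction given above can be evaluated with Horner's rule, as sketched below. The coefficients are copied from the text; the function name, the assumption that the result is in dB, and the SR values used in any evaluation are choices made for this example, and the range of SR over which the fit is valid is not stated in this section.

```python
# Coefficients of the fitted model, highest power of SR first (from the text).
PAPR_MODEL_COEFFS = (-35.0, 53.0, -25.0, 4.7, 2.9)


def papr_reduction_model(sr):
    """Evaluate -35*SR^4 + 53*SR^3 - 25*SR^2 + 4.7*SR + 2.9 with Horner's rule."""
    result = 0.0
    for coefficient in PAPR_MODEL_COEFFS:
        result = result * sr + coefficient
    return result


if __name__ == "__main__":
    for sr in (0.4, 0.5, 0.6, 0.7, 0.9):
        print(f"SR = {sr:.1f}: modeled PAPR reduction = {papr_reduction_model(sr):.2f}")
```

Horner's rule needs only four multiplications and four additions per evaluation, which keeps such a model cheap to use when sweeping SR.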
A parameter known as the number of resource blocks (NRB) [30] is varied and the spectra are compared. The NRB indicates how many blocks of 180 kHz are occupied by the signal structure. The spectrum results in Figures 12a and 13a were captured when the original and processed signals were both re-sampled with a re-sampling factor of two. It can be observed that the leakage does not reach the next harmonic and is acceptable. Figures 12b and 14b show the spectrum results when NRB values of 5 and 15 are employed and the signal is not re-sampled. It should be noted that these tests were performed with NS = 16, SR = 0.9, and RP = 10. Table 5 shows the structure of the LTE standard and the PAPR reduction that can be achieved by applying the PSI technique with different settings. It can be seen that when NRB is increased, the length of the IFFT is also increased (for instance, when NRB = 6, the IFFT/FFT length is 128 and the bandwidth is 1.4 MHz). The PSI technique was tested with all of these signal varieties, and as seen in Table 5, the spectrum leakage remained small in all cases. It should be noted that here we consider spectrum leakage very close to the main spectrum in both cases, with and without the re-sampling process. The worst case shows about 9.29 dB leakage when NRB is 75. It should be noted that by optimising the parameters of the PSI algorithm, such as SR, PR, and NS, the leakage is minimised, as shown in Figure 12a. However, in order to give a clear picture, worse scenarios were deliberately considered here. This comparison was performed because the variation is comparatively small and close to the main spectrum; in many works, when the variation and spectrum reshaping are noticeably high, the adjacent channel power ratio (ACPR) or adjacent channel leakage ratio (ACLR) must be calculated. ACPR or ACLR is the ratio of the power in the adjacent channel to the power in the main channel.
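The ACPR definition given above (adjacent-channel power divided by main-channel power) can be sketched as follows. This is an illustrative calculation on a sampled power spectrum, not the measurement procedure used in this work; the function names, the channel centred at 0 Hz, and the synthetic spectrum are all assumptions.

```python
import math


def band_power(freqs_hz, psd, f_lo, f_hi):
    """Sum the linear PSD bins whose frequency lies in [f_lo, f_hi)."""
    return sum(p for f, p in zip(freqs_hz, psd) if f_lo <= f < f_hi)


def acpr_dbc(freqs_hz, psd, channel_bw_hz, offset_hz):
    """Adjacent-channel power relative to main-channel power, in dBc.

    Assumes the main channel is centred at 0 Hz and the adjacent channel
    has the same width, centred at offset_hz (negative for the lower side).
    """
    half = channel_bw_hz / 2.0
    main = band_power(freqs_hz, psd, -half, half)
    adjacent = band_power(freqs_hz, psd, offset_hz - half, offset_hz + half)
    return 10.0 * math.log10(adjacent / main)


if __name__ == "__main__":
    # Synthetic spectrum (assumed): flat 20 MHz channel, leakage 30 dB lower.
    step = 100e3
    freqs = [k * step for k in range(-300, 300)]
    psd = [1.0 if -10e6 <= f < 10e6 else 1e-3 for f in freqs]
    print(f"Upper ACPR: {acpr_dbc(freqs, psd, 20e6, 20e6):.1f} dBc")
    print(f"Lower ACPR: {acpr_dbc(freqs, psd, 20e6, -20e6):.1f} dBc")
```

Summing bins in linear power before converting to decibels matters here: averaging values that are already in dB would bias the ratio.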
In this paper, the modified signal and the original signal without PSI applied are shown in red and blue, respectively, in Figures 12 and 13.
It should be noted that all the tests were performed using the LTE toolbox with QPSK or OQPSK modulation schemes; however, according to the LTE standard documents [28], if test model 3.1 or test model 3.2 is employed, the waveform is slightly different, since the modulation scheme in test model 3.1 is 64QAM and the modulation scheme in test model 3.2 is 16QAM. PSI was tested with the aforementioned test model signals; by running one test with each, it was found that the original signal with test model 3.1 experiences 12.1859 dB PAPR, and test model 3.2 experiences 16QAM PAPR. The PSI technique was able to reduce this PAPR to about 8 dB, which is very similar to the result for the QPSK and OQPSK schemes. Further EVM analysis for the different test models can be performed in future studies.
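The PAPR figures quoted above follow the standard definition: peak instantaneous power divided by mean power, expressed in dB. A minimal sketch of that calculation for one block of complex baseband samples is given below; the function name and the toy signals are assumptions for illustration and do not reproduce the LTE test models.

```python
import cmath
import math


def papr_db(samples):
    """Peak-to-average power ratio of a block of complex samples, in dB."""
    powers = [abs(s) ** 2 for s in samples]
    peak = max(powers)
    mean = sum(powers) / len(powers)
    return 10.0 * math.log10(peak / mean)


if __name__ == "__main__":
    n = 64
    # Constant-envelope tone: every sample has the same power.
    tone = [cmath.exp(2j * math.pi * 3 * k / n) for k in range(n)]
    # Worst case for n subcarriers: all carriers add in phase at k = 0.
    aligned = [
        sum(cmath.exp(2j * math.pi * c * k / n) for c in range(n))
        for k in range(n)
    ]
    print(f"Tone PAPR:            {papr_db(tone):.2f} dB")
    print(f"Aligned-carrier PAPR: {papr_db(aligned):.2f} dB")
```

The second toy signal shows why PAPR grows with IFFT length: when all n carriers align, the peak power scales with n squared while the mean power scales with n.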

Testing PSI Performance with WCDMA, DVB S2, 4G, and 5G Signals
Wideband code division multiple access (WCDMA) is one of the earliest signal types to experience high PAPR [43]. Here, a signal with a 20 MHz bandwidth was generated and tested with the PSI technique. This test was also performed with a Digital Video Broadcasting Satellite Second Generation (DVB S2) signal, a typical 4G orthogonal frequency division multiplexing (OFDM) signal, and a 5G signal based on orthogonal frequency division multiple access (OFDMA) [44].
PAPR reduction is needed for DVB S2 signals in order to maximise the power efficiency of amplitude and phase-shift keying (APSK) transmission over satellite channels, where bandwidth is very limited. Power efficiency is mostly affected by the high power amplifiers (HPAs), and, as explained in the efficiency section, PAPR determines the efficiency of HPAs for modern signals. Therefore, it is beneficial to show that the PSI technique is usable in the DVB S2 case and for other types of signals. In order to have a comparable test, the signals were simulated with similar characteristics; for example, the length of the inverse fast Fourier transform (IFFT) in the three cases of WCDMA, DVB S2, and 4G signals was kept at 2048 in order to fulfil a bandwidth of 20 MHz. It was observed that, when testing about 1000 symbols of these signals, the PAPR value was about 10.3 dB. As shown in previous test results, this value increases when the test is extended to 10,000 symbols, as shown in Figures 8b and 10. However, it should be noted that the IFFT length in 5G signals starts at 4096, which affects the probability of the PAPR value in the same way that increasing the bandwidth does; therefore, as seen in Figure 15, the PAPR value of the reference 5G signal is about 12 dB at a CCDF of $10^{-3}$. The probability of PAPR after the PSI technique is shown by the purple plots in Figure 15. It is observed that for the WCDMA, DVB S2, and 4G signals, PSI reduces the PAPR by about 2.4 dB at a CCDF of $10^{-3}$; it should be noted that this improvement increases at a CCDF of $10^{-4}$. Here, due to time limitations, the simulation was performed only 1000 times.
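The CCDF values quoted above are empirical: the fraction of simulated symbols whose PAPR exceeds a threshold. A minimal sketch of that estimate is shown below, together with the resolution limit implied by the number of symbols; the function names and the sample PAPR values are illustrative assumptions.

```python
def papr_ccdf(papr_values_db, thresholds_db):
    """Empirical CCDF: fraction of symbols whose PAPR exceeds each threshold."""
    n = len(papr_values_db)
    return [
        sum(1 for p in papr_values_db if p > t) / n
        for t in thresholds_db
    ]


def ccdf_floor(num_symbols):
    """Smallest non-zero probability an empirical CCDF can resolve."""
    return 1.0 / num_symbols


if __name__ == "__main__":
    # Made-up per-symbol PAPR values in dB, for illustration only.
    papr_values = [8.0, 9.0, 10.0, 11.0]
    print(papr_ccdf(papr_values, [7.0, 9.5, 11.0]))
    print(f"Resolution with 1000 symbols:  {ccdf_floor(1000):.0e}")
    print(f"Resolution with 10000 symbols: {ccdf_floor(10000):.0e}")
```

With 1000 symbols the estimate cannot resolve probabilities below $10^{-3}$, so reading the curve at $10^{-4}$ requires a run of at least 10,000 symbols.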

Experimental Validation
In this section, an implementation with an FPGA evaluation board is presented. The general implementation setup is shown in Figure 16, and the laboratory test bench in Figure 17. As seen in this figure, a ZedBoard FPGA board (ZedBoard Zynq-7000 ARM/FPGA SoC Development Board [45]) was connected to a computer running the Matlab R2016b simulation tool. This connection can be established using the 10/100/1000 Ethernet port [46]. The ZedBoard was connected to an AD-FMCOMMS2-EBZ board [47], which is an FMC board for the AD9361, a highly integrated agile RF transceiver [48]. However, the instantaneous bandwidth of the transceiver is limited to 56 MHz or less, which restricts the implementation of DPD in this experiment to signals with a 5 MHz bandwidth. The signal was passed through to the power amplifier (PA). In this work, an RFHIC RTP26010-N1 was used for amplification [49]. This Doherty power amplifier [50] has a maximum output power of 47 W.
A major component of the experimental validation is to confirm that the PSI technique does not introduce additional distortion outside the main channel. As such, it is beneficial to validate the PSI technique experimentally on the bench with a nonlinear PA. DPD linearisation was also applied to the signal in order to linearise the PA output and to keep the test as close to a real-world scenario as possible. The resulting spectrum and the amplitude-to-amplitude (AM/AM) behaviour of the PA were compared with and without DPD and the PSI technique. DPD linearisation is widely used in many applications, such as cellular telecommunications and satellite links, because of its relatively simple implementation [51][52][53]. Here, a DPD based on the least squares (LS) optimisation technique was used [54]. For simplicity, the number of coefficients was minimised; the AM/AM behaviour of the PA is plotted in Figure 18b. The inverse of this dynamic nonlinearity was applied to the signal prior to passing it through the PA. As mentioned earlier, because DPD relies on the initial response of the PA, a wide bandwidth is needed for it to operate; for example, if the generated signal has a 5 MHz bandwidth, the FMCOMMS2 board and FPGA should be able to handle five times that bandwidth [47]. Therefore, the test signal presented in Figure 18a,b had a 25 MHz bandwidth. It is seen in Figure 18a that when a signal is passed through the PA, the distorted output signal has an ACPR or ACLR of approximately −30 dBc. Once DPD is applied, this distortion is reduced to −55 dBc. Applying PSI to the signal also does not distort the spectrum beyond that of the pre-distorted signal. The PA AM/AM behaviour with and without DPD and PSI is shown, and again, the PSI technique does not impair the overall performance. It is important to note that applying the PSI technique allows the PA to operate for longer in its more efficient mode.
This can be seen from the more frequent data points in the saturation region, as illustrated in the zoomed rectangular frame in Figure 18b. The test presented in Figures 17 and 18a showed that an increase of about 1 dB in the PA output was achieved by applying the PSI algorithm.
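The least-squares DPD step described above can be illustrated with a much simplified sketch. The DPD model actually used is not listed in this section, so everything here is an assumption: a hypothetical memoryless cubic PA model, real-valued AM/AM only (no memory or AM/PM), and an odd-order polynomial inverse fitted by least squares from PA output back to PA input.

```python
ORDERS = (1, 3, 5)  # assumed odd-order polynomial terms for the inverse


def pa_model(x):
    """Hypothetical memoryless PA: unit small-signal gain, cubic compression."""
    return x - 0.15 * x ** 3


def solve_linear(a, b):
    """Solve a*c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    c = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][k] * c[k] for k in range(i + 1, n))
        c[i] = (m[i][n] - s) / m[i][i]
    return c


def fit_predistorter(pa_in, pa_out):
    """LS fit of pa_in ~ sum_k c_k * pa_out**ORDERS[k] via normal equations."""
    basis = [[y ** p for p in ORDERS] for y in pa_out]
    n = len(ORDERS)
    gram = [[sum(r[i] * r[j] for r in basis) for j in range(n)] for i in range(n)]
    rhs = [sum(r[i] * x for r, x in zip(basis, pa_in)) for i in range(n)]
    return solve_linear(gram, rhs)


def predistort(u, coeffs):
    """Apply the fitted inverse to the desired output amplitude u."""
    return sum(c * u ** p for c, p in zip(coeffs, ORDERS))


def max_linearity_error(coeffs=None, points=81, u_max=0.8):
    """Worst |PA output - desired output| over a sweep of drive levels."""
    worst = 0.0
    for i in range(points):
        u = u_max * i / (points - 1)
        x = predistort(u, coeffs) if coeffs is not None else u
        worst = max(worst, abs(pa_model(x) - u))
    return worst


if __name__ == "__main__":
    drive = [i / 199 for i in range(200)]
    measured = [pa_model(x) for x in drive]
    coeffs = fit_predistorter(drive, measured)
    print(f"Max error without DPD: {max_linearity_error(None):.4f}")
    print(f"Max error with DPD:    {max_linearity_error(coeffs):.4f}")
```

Fitting from PA output to PA input lets the inverse be obtained in a single linear least-squares solve; a practical DPD would work on complex samples and typically include memory terms.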

Conclusions
This article is an extension of work on a novel technique for PAPR reduction of OFDM based signals presented at the 2019 Wireless-Days Conference in Manchester [24]. A PAPR reduction of more than 4 dB at a CCDF of $10^{-4}$ (0.0001) was demonstrated, while the EVM performance degraded by less than 1%. In order to explore the practicality of the PSI technique, more evaluations and tests have been performed. The CCDF analysis was carried out using LTE signals with IFFT lengths of 4096 and 8192 (in the case of an aggregated signal), with bandwidths of 40 MHz and beyond. Furthermore, the PSI technique was tested with a Doherty PA and an FPGA evaluation board. It was seen that the DPD is able to operate in almost the same way as it does without PSI; however, with PSI applied, the PA has more chance to operate in the higher power region without experiencing non-linearity. It was observed from experimental tests that the back-off is improved by about 1 dB. Studies showed similar performance for 16QAM and 64QAM modulation schemes. For future work, generating a co-simulation block and compiling the algorithm directly on the FPGA is recommended; this will indicate the exact amount of hardware resources required for implementing the proposed technique in hardware. It is expected that by optimising all the parameters of the PSI algorithm, the achievable output power can be increased further in future work. Moreover, bit error rate (BER) and symbol error rate (SER) can be investigated in future works.

Funding: This publication has emanated from research conducted with the financial support of Science Foundation Ireland (SFI) and was co-funded under the European Regional Development Fund under grant number 13/RC/2077.