Reducing the Cost of Implementing Filters in LoRa Devices

This paper presents two methods to optimize LoRa (Low-Power Long-Range) devices so that implementing multiplier-less pulse shaping filters is more economical. Basic chirp waveforms can be generated more efficiently using chirp segmentation, so that only a quarter of the samples need to be stored in the ROM. Quantization can also be applied to the basic chirp samples in order to reduce the number of unique input values to the filter, which in turn reduces the size of the lookup table for multiplier-less filter implementation. Various tests were performed on a simulated LoRa system in order to evaluate the impact of the quantization error on the system performance. By examining the occupied bandwidth, the fast Fourier transform used for symbol demodulation, and the bit-error rates, it is shown that even a high level of quantization does not cause significant performance degradation. Therefore, the memory requirements of LoRa devices can be significantly reduced by using chirp segmentation and quantization, which improves the feasibility of implementing multiplier-less filters in LoRa devices.


Introduction
The LoRa (Low-Power Long-Range) modulation technique is an attractive solution for many internet-of-things (IoT) applications due to its low energy consumption, link robustness, and long-range capabilities, at the expense of low bit rates [1,2]. LoRa uses a modified form of chirp spread spectrum (CSS) modulation, wherein the carrier frequency of a sinusoid is linearly varied across a specific bandwidth. This results in a set of signals known as chirps [3], which are distinguishable by their starting frequencies.
The behavior of a LoRa chirp is controlled by both the spreading factor, SF, and the bandwidth parameter, BW. The spreading factor is an integer value, typically ranging from 6 to 12, while the specified bandwidth can be chosen from values in the range of 7.8 to 500 kHz [4,5]. Each chirp (or symbol) is encoded with SF bits, which means that there are $M = 2^{SF}$ possible symbol values, where $M$ is the modulation order [6]. The instantaneous frequency of a chirp linearly increases or decreases across the bandwidth specified by BW over the symbol duration [3]. The tradeoff between the range capabilities and the nominal bit rate depends on SF and BW. For instance, high SF and low BW allow for higher receiver sensitivity, but at a lower bit rate, whereas low SF and high BW lead to reduced receiver sensitivity, but a higher bit rate.
As the demand for long-range, low-power IoT devices increases, so does the need to improve the spectral efficiency of these devices' transmission. One promising solution is to implement a set of pulse shaping and matched square-root raised cosine (SRRC) filters in LoRa transmitters and receivers, respectively. The use of these filters can significantly reduce the bandwidth containing 99% of the total mean signal power while also reducing the out-of-band emissions created by LoRa devices [6][7][8]. The increase in spectral efficiency allows us to accommodate a larger number of IoT devices.
The challenge is that since LoRa devices are characterized by their low complexity, it is more difficult to justify the added resources required for filtering, especially when longer filters are required (which is often the case for LoRa devices with lower bandwidth settings [8]). Since the cost of implementing multipliers in hardware can prove significant, a preferred solution should eliminate the need for multiplications altogether. As such, the main objective of this paper is to investigate the feasibility of implementing a "multiplier-less" pulse shaping filter in a LoRa transmitter.
Replacing hardware multipliers in a pulse-shaping filter can be done with a look-up table (LUT), provided there is a finite number of input sample values to the filter that are known. Instead of multiplying each incoming sample by a filter coefficient using a hardware multiplier, the result of every possible multiplication can be precalculated and stored in the LUT. Then, the LUT can output the correct product of the required multiplication based on the associated input sample value. The size of the LUT depends on both the number of unique input sample values and the filter length. In such an implementation, the complexity of the filter is measured by the cost of the memory instead of hardware multipliers.
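The LUT idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the five filter taps are hypothetical (they are not SRRC coefficients), the inputs are assumed to lie on a uniform grid of step $2^{-B}$ between -1 and 1, and the names `build_lut` and `lut_fir` are invented for this example. Only half of the symmetric taps are stored, and negative inputs are handled by negating the looked-up product.

```python
import numpy as np

def build_lut(coeffs, frac_bits):
    """Precompute coefficient-times-level products for the unsigned levels 0, Q, ..., 1.

    Q = 2**-frac_bits. Only the first half of the taps (including the centre tap)
    is stored, because the coefficients are assumed symmetric about the midpoint.
    """
    levels = np.arange(2**frac_bits + 1) / 2**frac_bits
    half = (len(coeffs) + 1) // 2
    return np.outer(coeffs[:half], levels)

def lut_fir(samples, coeffs, frac_bits):
    """FIR filter whose per-tap products are read from the LUT instead of being multiplied."""
    lut = build_lut(coeffs, frac_bits)
    n_taps = len(coeffs)
    out = np.zeros(len(samples) + n_taps - 1)
    for n, x in enumerate(samples):
        col = int(round(abs(x) * 2**frac_bits))   # unsigned input level selects the LUT column
        for k in range(n_taps):
            row = min(k, n_taps - 1 - k)          # mirrored taps share one LUT row
            product = lut[row, col]
            out[n + k] += -product if x < 0 else product  # sign fixed by negation only
    return out

# Hypothetical symmetric taps and inputs on a 2-fractional-bit grid (assumptions).
coeffs = np.array([-0.05, 0.10, 0.30, 0.10, -0.05])
samples = np.array([1.0, -0.75, 0.25, 0.0, -0.5, 0.5])
filtered = lut_fir(samples, coeffs, frac_bits=2)
```

In this sketch the LUT holds (number of stored taps) x (number of unsigned levels) entries, which is why both the filter length and the number of unique inputs drive the memory cost. The column for the zero level is kept only for simplicity; it could be dropped since its products are all zero.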
The problem with LoRa is that, since discrete-time LoRa chirps are made up of $M = 2^{SF}$ samples, the multiplier-less filter must be able to accommodate the $M$ possible input values for each chirp waveform. Furthermore, supporting multiple SF settings and ensuring the continuous phase of modulated chirp waveforms greatly increase the already large number of unique filter inputs. While many of the chirp sample values are repeated among spreading factors and/or symbol values, the memory requirement of the LUT is still significant. LoRa end-devices are particularly constrained by the additional memory requirements, as these devices have a greater need for low energy consumption and few complex operations than the LoRa gateway.
In this paper, two methods of optimizing LoRa transmitters are proposed in order to reduce the complexity of filtering. First, waveform segmentation is used to generate an entire basic LoRa chirp waveform from only a portion of the total number of chirp samples in order to reduce the size of the chirp generation ROM. While this does not directly impact the LUT size, it reduces the overall memory requirement. This method is inspired by the CSS transceiver design presented in [9] and it has been adapted for LoRa.
The second method involves quantizing the LoRa chirp samples to a significant degree so as to reduce the number of unique input values to the multiplier-less filter, which helps to reduce the LUT size. While chirp segmentation does not add any error, quantization adds some rounding errors to the quantized chirp signals. It is important to ensure that the desired sample reduction can be achieved without significant performance degradation.
The feasibility of implementing a multiplier-less SRRC pulse shaping filter in a LoRa transmitter will be evaluated in terms of the tradeoff between the potential sample reduction and the impact of quantization noise on the performance. In order to quantify the effectiveness of the sample reduction, the number of samples required to form LoRa chirp waveforms and subsequent filtered signals shall be compared to that of an unoptimized LoRa device. The degree of sample reduction, therefore, depends on which spreading factors are supported, the length of the pulse shaping filter, whether chirp segmentation is used, and the quantization step size (if any).
Finally, the performance of the system will be evaluated for LoRa signals with various degrees of quantization. A LoRa communication system is simulated in Matlab, and the performance is evaluated in terms of the occupied bandwidth (OBW) of transmitted LoRa signals, the output of the fast Fourier transform (FFT) performed for symbol demodulation, and the bit-error rate (BER). The goal is to find appropriate levels of quantization in order to significantly reduce the overall memory requirement while maintaining excellent performance.
Figure 1 illustrates a block diagram of a LoRa system that implements a pulse shaping filter at the transmitter and a matched filter at the receiver. While the performance benefits brought by implementing these filters are demonstrated in detail in [6,8], this paper focuses on reducing the complexity of implementing the pulse shaping filter in a LoRa end device's transmitter.

Sample Reduction Methods
Basic LoRa chirp waveforms are used as the basis for the LoRa modulation technique. They are used in both the preamble and payload of transmitted LoRa packets. The expression for a continuous-time basic LoRa chirp waveform is shown in (1), where $T_{sym}$ is the symbol duration in seconds and $\mu$ is the chirp rate in Hz/second. The continuous-time chirp waveform is then sampled at a rate of $F_s = 1/T_s = BW$ for digital implementation [6]. The expression for a discrete-time basic LoRa chirp is given in (2), where $t = nT_s = n/BW$. As an example, Figure 2 plots the real and imaginary components of both $x_0[n]$ and $x_0(t)$ with SF = 6 and BW = 125 kHz. LoRa symbols are modulated by cyclically shifting the basic chirp waveform by the symbol value, $m$, as shown in (3).
Furthermore, in order to maintain the phase continuity between subsequent chirps, each modulated chirp waveform obtained from (3) is multiplied by the complex conjugate of its first sample, $x_m^*[0]$. This causes the instantaneous phase of the chirp to be zero at both the beginning and end of the symbol duration, rather than causing sharp phase discontinuities between consecutive modulated chirps [3,6]. Performing phase correction changes the modulated chirp waveform expression given by (3) to that of (4).
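The chirp generation, cyclic-shift modulation, and phase correction steps described above can be sketched numerically. Since the numbered expressions are not reproduced in this text, the sketch assumes the commonly used discrete-time form $x_0[n] = \exp\left(j2\pi\left(\frac{n^2}{2M} - \frac{n}{2}\right)\right)$ for the basic chirp and $x_m[n] = x_0[(n + m) \bmod M]$ for the modulated chirp; both are assumptions rather than quotations of (2) to (4), and the function names are illustrative.

```python
import numpy as np

def basic_chirp(sf):
    """Basic upchirp with M = 2**sf samples (assumed form, see lead-in)."""
    m_order = 2**sf
    n = np.arange(m_order)
    # Phase in units of pi/M, reduced modulo 2M with exact integer arithmetic.
    k = (n * n - n * m_order) % (2 * m_order)
    return np.exp(1j * np.pi * k / m_order)

def modulated_chirp(sf, symbol, phase_correct=True):
    """Cyclically shift the basic chirp by the symbol value, then optionally phase-correct."""
    x0 = basic_chirp(sf)
    xm = np.roll(x0, -symbol)            # xm[n] = x0[(n + symbol) mod M]
    if phase_correct:
        xm = xm * np.conj(xm[0])         # multiply by the conjugate of the first sample
    return xm

x0 = basic_chirp(6)
xm = modulated_chirp(6, symbol=20)
```

With phase correction, the first sample of every modulated chirp is 1 (zero phase), while all samples keep unit magnitude, so the number of generation samples is unchanged but the set of values seen by the filter grows.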
While performing the complex multiplication will not affect the number of samples required for chirp generation, it will drastically affect the number of unique filter inputs. However, as in the case of basic chirp samples, it turns out that many phase-corrected sample values are shared among multiple SF settings, as well as modulated chirp waveforms associated with other symbols.
In terms of implementation, the real and imaginary components are considered separately as they correspond to the I and Q channels in a practical system. However, since the magnitude of $x_0[n]$ at each sample index is always equal to 1, the sequences of sample values for the I and Q channels are subject to the same patterns. Therefore, the properties discussed in this paper that are used to simplify the chirp generation shall apply to both the real and imaginary samples of $x_0[n]$. Both components also contain the same sample values, although the signs and sample indices may differ.
With that in mind, the following subsections detail the proposed methods and the resulting sample reduction compared to a standard LoRa device. It is also important to note that while basic chirp samples obtained from (2) are used for chirp generation, the LUT must account for each possible phase-corrected modulated chirp sample obtained from (4).

Chirp Waveform Segmentation
It is perhaps more intuitive to begin by examining the inherent symmetry in the sequence of $M = 2^{SF}$ basic chirp samples that make up the waveform. For instance, consider the basic chirp waveform shown in Figure 2 once again. Both components appear to exhibit a symmetry about the midpoint located at $n = M/2$, which is $n = 32$ in this case. While it may not be obvious from Figure 2, there are several other patterns present in the sequence of chirp sample values as well. A close inspection reveals that a basic chirp waveform can be divided into four segments, each containing $M/4$ samples according to (5) and (6), where $k$ is an integer representing the segment number. More importantly, each of these segments contains identical sample values, but they differ with predictable patterns of opposing signs and/or sample order.
Using (5) with (2) gives the exponential form of $x_{0,k}[n]$ shown in (7). Substituting each value of $k$ into (7) gives the individual waveform segment expressions shown in (8).
As an example, consider the basic chirp sample values of $x_{0,k}[n]$ by segment for SF = 6 shown in Table 1. While the analysis below refers to this specific set of data, the following relationships between segments hold for all SF settings. The first segment, $x_{0,1}[n]$, can be manipulated in order to obtain the remaining three segments with relatively simple operations.
First, consider segments 1 and 3. Table 1 shows that every odd sample of $x_{0,3}[n]$ has the opposite sign of $x_{0,1}[n]$ at the same sample index, $n$. This relationship can be obtained mathematically by comparing the expressions for $k = 1$ and $k = 3$ in (8):
$x_{0,3}[n] = -x_{0,1}[n]$, for $n$ odd.
Next, consider the relationship between segments 1 and 4. Here, $x_{0,4}[n]$ is a reverse-indexed copy of $x_{0,1}[n]$. This can be proven by first finding $x_{0,1}[M/4 - n]$ and then comparing the resulting expression to that of $x_{0,4}[n]$.
Lastly, $x_{0,2}[n]$ is a reverse-indexed copy of $x_{0,1}[n]$ with opposing signs at every odd value of $n$. This is confirmed by comparing the expressions for $x_{0,1}[M/4 - n]$ and $x_{0,2}[n]$. As a result, the number of samples stored in the ROM can be reduced from a total of $2M$ to $M/2$ real and imaginary samples without introducing any error.
In order to quantify the impact of chirp segmentation on the complexity of a practical system, the actual number of real and imaginary samples contained in the ROM and LUT must be considered. Let $N_{gen}$ represent the number of samples required for chirp generation, while $N_{in}$ represents the number of unique samples at the input of the pulse shaping filter. The calculated values of $N_{gen}$ and $N_{in}$ are shown in Table 2 for a LoRa system using the chirp segmentation method. Here, "all" means support for spreading factors ranging from 6 to 12, which is the scope of the study. $N_{gen}$ is calculated as $M/2$, while $N_{in}$ is found by counting the number of unique sample values given by (4) for each possible symbol value. Furthermore, $N_{in}$ is counted based on the absolute (unsigned) value of each sample. This is because the sign of the input samples to the filter can be easily detected and the sign of the corresponding LUT output can be corrected accordingly (by taking the two's complement), if necessary.
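The segment relationships used for chirp segmentation can be checked numerically with a short sketch. It assumes the basic chirp form $x_0[n] = \exp\left(j2\pi\left(\frac{n^2}{2M} - \frac{n}{2}\right)\right)$ and that segment $k$ covers the indices $(k-1)M/4$ to $kM/4 - 1$; both are assumptions, since (2), (5), and (6) are not reproduced in this text. The reverse indexing $x_{0,1}[M/4 - n]$ needs the sample at index $M/4$ when $n = 0$; under the assumed chirp form that sample equals $x_0[0] = 1$ for the spreading factors considered, so a rotated reversal of the stored quarter is sufficient.

```python
import numpy as np

def full_chirp(sf):
    """Reference basic chirp computed directly (assumed form, see lead-in)."""
    m_order = 2**sf
    n = np.arange(m_order)
    k = (n * n - n * m_order) % (2 * m_order)
    return np.exp(1j * np.pi * k / m_order)

def chirp_from_quarter(quarter):
    """Rebuild the whole chirp from segment 1 using only sign flips and index reversal."""
    odd_flip = np.where(np.arange(len(quarter)) % 2 == 0, 1.0, -1.0)  # -1 at odd n
    # reversed_q[n] = x_{0,1}[M/4 - n]; index M/4 wraps to index 0 (both equal 1).
    reversed_q = np.roll(quarter[::-1], 1)
    seg2 = odd_flip * reversed_q   # reverse-indexed copy with opposite sign at odd n
    seg3 = odd_flip * quarter      # same order, opposite sign at odd n
    seg4 = reversed_q              # reverse-indexed copy
    return np.concatenate([quarter, seg2, seg3, seg4])

sf = 6
reference = full_chirp(sf)
stored = reference[: 2**sf // 4]          # the only samples kept in the ROM
rebuilt = chirp_from_quarter(stored)
```

Here `stored` holds $M/4$ complex values, that is, $M/2$ real and imaginary samples, which matches the $N_{gen}$ count given in the text.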
The number of samples required in the filter LUT depends on both the number of filter coefficients ($N_{filt}$) and $N_{in}$. The output values of the LUT are found by multiplying each filter coefficient by each unsigned filter input value. Since the filter coefficients are symmetric about the midpoint of the filter, the number of stored multiplications can be reduced to just over half the number of filter coefficients. As a result, the total number of samples that must be stored in the transmitter for each spreading factor can be found using (12). It should be noted that the filter input value of zero included in $N_{in}$ can be disregarded since the output of the coefficient multiplication(s) will simply be zero as well.
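The storage count described above can be illustrated with a small calculation. Equation (12) is not reproduced in this text, so the form used here, $N_{TX} = N_{gen} + \frac{N_{filt}+1}{2}\,(N_{in} - 1)$ for an odd filter length, is an assumption inferred from the description (half of the symmetric taps, zero input excluded). It is not claimed to reproduce the values in the paper's tables, and the function names are illustrative.

```python
def lut_entries(n_filt, n_in):
    """Stored products: (N_filt + 1) / 2 distinct taps times the non-zero unsigned inputs."""
    if n_filt == 0:
        return 0                       # no filter, so no LUT
    return ((n_filt + 1) // 2) * (n_in - 1)

def total_samples(n_gen, n_filt, n_in):
    """Assumed form of the total transmitter storage: generation ROM plus filter LUT."""
    return n_gen + lut_entries(n_filt, n_in)

# SF = 12 (M = 4096): segmentation gives N_gen = M / 2 = 2048, and Table 4 lists
# N_in = 257 for 8 fractional bits.
with_filter = total_samples(n_gen=2048, n_filt=17, n_in=257)
# Standard device without segmentation or filter: 2M real and imaginary samples.
standard = total_samples(n_gen=2 * 4096, n_filt=0, n_in=0)
```

Under these assumptions, the length-17 case stores 2048 + 9 x 256 = 4352 samples, compared with 8192 for the unfiltered standard device at the same spreading factor.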
As an example, Table 3 displays the total numbers of samples required for three different systems calculated with (12). This example considers a standard LoRa device that does not use chirp segmentation or a multiplier-less filter ($N_{filt} = 0$), and two devices using chirp segmentation with length-17 and length-81 multiplier-less SRRC filters, respectively. The filter lengths were selected based on their ability to reduce the occupied bandwidth of LoRa signals for different BW settings [8]. When supporting individual spreading factors, the length-17 filter requires almost double the number of stored samples compared to the standard system. However, when supporting multiple spreading factors, the difference is not as substantial. Furthermore, if the standard device implements filtering with hardware multipliers, it will require at least $(N_{filt}+1)/2$ multipliers for the filter in addition to the samples provided in Table 3.
In this regard, chirp segmentation improves the feasibility of implementing the length-17 filter without significant resource usage. For accommodating longer filters, the use of chirp segmentation alone does not provide a significant reduction in complexity due to the large number of samples. However, these results were obtained by modelling the system with a very small quantization step size (i.e., Matlab precision) in order to exactly represent the theoretical response. If the quantization step size is increased, it is possible to reduce the number of unique filter input values in order to accommodate the use of longer filters. This is discussed further in the next section.

Quantization
In order to implement a practical LoRa system, some level of quantization is necessary to represent the LoRa chirp signals. Since the chirp sample values are normally between ±1, two integer bits are needed to represent a signed chirp signal. Thus, only the number of fractional bits can be varied and investigated. Let $B$ represent the number of fractional bits used for the system such that the uniform quantization step size is $Q = 2^{-B}$.
Assuming the use of chirp segmentation, the values of $N_{gen}$ will be those found in Table 2 as before. However, $N_{in}$ depends on the quantization factor, i.e., the number of fractional bits, $B$. Table 4 contains the values of $N_{in}$ found for five different values of $B$, namely 2, 4, 6, 8, and 10 bits.
Once again, the total number of samples required to implement the LoRa transmitter for each quantization factor can be found from (12). The calculated values of $N_{TX}$ for LoRa devices utilizing chirp segmentation and quantization are shown in Tables 5 and 6 with filter lengths of 17 and 81 taps, respectively. Note that the results for the standard system do not change with quantization.
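The effect of the step size $Q = 2^{-B}$ on the number of unique unsigned filter inputs can be explored with a short sketch. It is illustrative only: the chirp form $x_0[n] = \exp\left(j2\pi\left(\frac{n^2}{2M} - \frac{n}{2}\right)\right)$ is assumed, and so are two details that the text does not specify, namely that values are rounded to the nearest grid point and that quantization is applied to the phase-corrected modulated samples. The counts it produces may therefore differ from Table 4.

```python
import numpy as np

def quantize(x, frac_bits):
    """Round real and imaginary parts to the nearest multiple of Q = 2**-frac_bits."""
    scale = 2**frac_bits
    return (np.rint(x.real * scale) + 1j * np.rint(x.imag * scale)) / scale

def count_unique_inputs(sf, frac_bits):
    """Count distinct unsigned I/Q values over all phase-corrected modulated chirps."""
    m_order = 2**sf
    n = np.arange(m_order)
    k = (n * n - n * m_order) % (2 * m_order)
    x0 = np.exp(1j * np.pi * k / m_order)            # assumed basic chirp
    levels = set()
    for symbol in range(m_order):
        xm = np.roll(x0, -symbol)                    # cyclic shift by the symbol value
        xm = quantize(xm * np.conj(xm[0]), frac_bits)
        parts = np.abs(np.concatenate([xm.real, xm.imag]))
        # Store levels as integer grid indices to avoid floating-point comparisons.
        levels.update(np.rint(parts * 2**frac_bits).astype(int).tolist())
    return len(levels)

n_in_b2 = count_unique_inputs(sf=6, frac_bits=2)
n_in_b6 = count_unique_inputs(sf=6, frac_bits=6)
```

Whatever the spreading factor, the unsigned grid between 0 and 1 contains only $2^B + 1$ points, which is the upper bound visible in the "All" row of Table 4 (5, 17, 65, 257, and 1025).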
By comparing the results in Tables 5 and 6 with those of Table 3, it is clear that quantization provides a more significant reduction in stored samples than using chirp segmentation alone. In fact, using chirp segmentation and quantization can not only match, but improve upon the results obtained for a standard system that does not use multiplier-less filtering. Representing the real and imaginary chirp samples with 8-bit fractional precision for both filter lengths would be an appropriate solution in this regard. Not only is there a significant sample reduction from the standard case, but no hardware multipliers would be required. The main concern with quantizing the LoRa chirp signals is the potential impact on the decoding performance. The performance results and comparison presented in the next section address this concern.
Table 4. $N_{in}$ for LoRa systems with various levels of quantization, by number of fractional bits (B).
SF    M      B=2   B=4   B=6   B=8   B=10
6     64     5     17    23    24    25
7     128    5     17    40    47    48
8     256    5     17    58    89    95
9     512    5     17    65    160   185
10    1024   5     17    65    233   352
11    2048   5     17    65    257   637
12    4096   5     17    65    257   929
All   -      5     17    65    257   1025
Table 5. Total number of samples ($N_{TX}$) for a system with a length-17 filter, chirp segmentation, and quantization.

Fractional Bit Precision
alongside the measurement guidelines for LoRa devices provided by Semtech [12] in order to comply with FCC regulations as well. Each set of transmitted signals consists of 10 preamble symbols and 250 modulated chirp symbols. The signals were generated in Matlab and then sent to a Keysight N5182B Signal Generator [13]. The central carrier frequency was set to 915 MHz and the transmit power was set to 0 dBm. Since the upsampling factor, $L$, is equal to 4, the sampling frequency of both the signal generator and spectrum analyzer was set to $F_s = L \times BW = 4BW$. Measurements were obtained with the occupied BW mode using the peak detector and max hold traces on the spectrum analyzer. The values were recorded once the traces had stabilized after sweeping across over 350 points. The spectrum analyzer settings, which are summarized in Table 7, were varied with the LoRa bandwidth to comply with the standard [11]. The LoRa spreading factor was set to 10, while the specified bandwidth was set to 125, 250, and 500 kHz. The measured OBW results can be found in Table 8 for each tested bandwidth setting and fractional-bit precision. It is clear from these results that most tested levels of quantization do not impact the measured OBW. Even in the worst case, the difference is only a little over 100 Hz. Additionally, consider the sample screenshots taken from the spectrum analyzer shown in Figure 3. There is some noticeable distortion in the passband of Figure 3c and a small change in OBW due to the quantization error. However, there are no noticeable differences between the spectra shown in Figure 3a,b, even though the latter is for signals that were quantized to 8 fractional bits. While the example presents results for SF = 10, the measured OBWs for devices with different spreading factors and bandwidth settings also showed only minor differences at high levels of quantization.
In general, quantizing the chirp samples in the transmitter to a moderate degree does not appear to significantly distort the OBW measurements or the shape of the signal spectra.
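For readers who want to estimate the occupied bandwidth in simulation rather than on a spectrum analyzer, the 99% power definition can be sketched as below. This is a simplified software estimate under stated assumptions (a single FFT over the whole record, with 0.5% of the power excluded on each side); it does not reproduce the analyzer settings or the peak-detector and max-hold procedure described above, and the function name is illustrative.

```python
import numpy as np

def occupied_bandwidth(signal, sample_rate, fraction=0.99):
    """Width of the band that contains `fraction` of the total signal power."""
    spectrum = np.fft.fftshift(np.fft.fft(signal))
    power = np.abs(spectrum) ** 2
    freqs = np.fft.fftshift(np.fft.fftfreq(len(signal), d=1.0 / sample_rate))
    cumulative = np.cumsum(power) / np.sum(power)
    tail = (1.0 - fraction) / 2.0                       # power excluded on each side
    lo = np.searchsorted(cumulative, tail)              # first bin reaching the lower tail
    hi = min(np.searchsorted(cumulative, 1.0 - tail), len(freqs) - 1)
    return freqs[hi] - freqs[lo]

# Two synthetic checks: a flat spectrum (unit impulse) and a single on-bin tone.
impulse = np.zeros(1000, dtype=complex)
impulse[0] = 1.0
obw_flat = occupied_bandwidth(impulse, sample_rate=1000.0)
tone = np.exp(2j * np.pi * 100 * np.arange(1000) / 1000)
obw_tone = occupied_bandwidth(tone, sample_rate=1000.0)
```

A flat spectrum yields an OBW close to 99% of the sampled span, whereas a pure tone yields an OBW near zero, which gives two simple sanity checks before applying the function to upsampled, filtered LoRa signals.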

FFT and Signal Spectrograms
As illustrated in Figure 1, LoRa symbol demodulation begins with a process known as dechirping [6]. Each received LoRa chirp waveform is multiplied by the complex conjugate of a basic upchirp having the same SF and BW. The product is a pure sinusoid whose frequency corresponds to the frequency offset associated with the modulated symbol value, m [3,6]. The frequency of the dechirped signal is then found by taking the M-point FFT and detecting which frequency bin contains the maximum energy. The index of the detected frequency bin is the LoRa symbol value.
Since decoding the proper symbol value depends on accurate peak detection, it is important to ensure that the peak associated with the symbol value is clearly distinguishable from that of the quantization noise. Figure 4 shows the M-point FFTs and spectrograms associated with quantized and non-quantized LoRa signals. Each tested LoRa signal corresponds to a symbol value of $m = 841$, a spreading factor of 10, and a bandwidth of 125 kHz. While there is a clear increase in the noise level when the quantization step size increases, the desired FFT peaks and frequency ramps remain clearly visible in each case. Even with only 2-bit fractional precision, the quantization noise is well over 20 dB below the peak of the desired frequency bin. Furthermore, the spectrograms for the cases where $B = 8$ and $B = 10$ show hardly any noticeable distortion from the non-quantized case. It is clear that even with significant amounts of quantization noise, information symbols can be decoded properly provided there is no severe noise introduced by the channel.
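The dechirping and FFT peak detection described in this section can be sketched for the noise-free case. The sketch assumes the basic chirp form $x_0[n] = \exp\left(j2\pi\left(\frac{n^2}{2M} - \frac{n}{2}\right)\right)$, sampling at the chirp bandwidth (no upsampling or pulse shaping), and rounding-based quantization of the phase-corrected samples; these are simplifying assumptions and not the full simulated system of Figure 1.

```python
import numpy as np

def basic_chirp(sf):
    """Assumed basic upchirp with M = 2**sf samples."""
    m_order = 2**sf
    n = np.arange(m_order)
    k = (n * n - n * m_order) % (2 * m_order)
    return np.exp(1j * np.pi * k / m_order)

def quantize(x, frac_bits):
    """Round real and imaginary parts to the nearest multiple of 2**-frac_bits."""
    scale = 2**frac_bits
    return (np.rint(x.real * scale) + 1j * np.rint(x.imag * scale)) / scale

def demodulate(received, sf):
    """Dechirp with the conjugate basic upchirp, then pick the strongest FFT bin."""
    dechirped = received * np.conj(basic_chirp(sf))
    magnitudes = np.abs(np.fft.fft(dechirped))       # M-point FFT
    return int(np.argmax(magnitudes)), magnitudes

sf, symbol = 10, 841
tx = np.roll(basic_chirp(sf), -symbol)
tx = tx * np.conj(tx[0])                             # phase correction
decoded_exact, _ = demodulate(tx, sf)
decoded_b2, mags_b2 = demodulate(quantize(tx, 2), sf)
decoded_b8, _ = demodulate(quantize(tx, 8), sf)
```

Without channel noise, the detected bin equals the transmitted symbol even for $B = 2$ in this sketch, because the rounding error per sample (at most about 0.18 in magnitude) is too small to move the peak; `mags_b2` can be plotted to inspect the raised noise floor.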

Bit-Error Rate
The BER tests were performed for filtered LoRa signals with and without quantization to see how they would perform under the effects of noise in an AWGN channel. The bandwidth was set to a fixed value of 125 kHz, while the spreading factor was varied from 6 to 12 for all tests. The BER results for each case are shown in Figure 5. The BER was tested at each desired signal-to-noise ratio (SNR) level by transmitting 175,000 data bursts containing 10 LoRa symbols each. The transmit signal is normalized to have unit power and hence the noise power is calculated based on the desired SNR level (in dB) as $P_{noise} = 10^{-SNR/10}$. The results for the non-quantized case can be corroborated by the simulated BER measurements of typical LoRa systems given in [6,14]. It is evident that the results for most of the quantized cases match the non-quantized results, with the exception of the case with two-bit fractional precision. As such, it can be concluded that quantizing LoRa chirp signals to a certain degree does not affect the symbol decoding capabilities of LoRa devices in the presence of noise from an AWGN channel.
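The noise scaling used in the BER tests can be sketched as follows. The sketch assumes a unit-power complex baseband signal and circularly symmetric Gaussian noise whose total power $P_{noise} = 10^{-SNR/10}$ is split equally between the real and imaginary parts; the equal split and the function name `awgn` are assumptions made for illustration.

```python
import numpy as np

def awgn(signal, snr_db, rng):
    """Add complex white Gaussian noise to a unit-power signal at the given SNR in dB."""
    noise_power = 10.0 ** (-snr_db / 10.0)
    sigma = np.sqrt(noise_power / 2.0)               # standard deviation per component
    noise = sigma * (rng.standard_normal(signal.shape)
                     + 1j * rng.standard_normal(signal.shape))
    return signal + noise

rng = np.random.default_rng(0)                       # fixed seed for repeatability
silence = np.zeros(200_000, dtype=complex)
noise_only = awgn(silence, snr_db=-10.0, rng=rng)    # expected noise power: 10
measured_power = np.mean(np.abs(noise_only) ** 2)
```

At an SNR of -10 dB the noise power is ten times the signal power, which is a typical operating region for LoRa because the dechirping and FFT stage concentrates the symbol energy into a single bin.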

Conclusions
Two methods were presented to reduce the complexity of implementing multiplier-less SRRC pulse shaping filters in LoRa transmitters. These methods focus on reducing the required number of samples in the ROM used to generate basic chirp signals, as well as those required for the multiplier-less filter LUT. Chirp segmentation can be used to generate the entire basic LoRa chirp waveform from only a quarter of its samples without introducing any error to the signal. Quantization can also be used to greatly decrease the number of unique samples at the input to the multiplier-less pulse shaping filter at the cost of introducing small errors to the transmitted signal.
Using both methods allows for a reduction in the number of stored samples so as to not only match, but also improve upon the results obtained from a standard LoRa device that does not contain a multiplier-less filter. For example, a system using 10-bit fractional precision and a length-17 multiplier-less pulse shaping filter requires fewer samples to be stored in memory compared to a standard LoRa system when supporting spreading factors 6 to 12. Even a device with a length-81 filter requires fewer stored samples than a standard device by quantizing the LoRa chirp samples to 8 fractional bits.
Furthermore, it was shown that moderate levels of quantization do not hinder the decoding performance of LoRa devices, even under harsh channel conditions. Therefore, the quantization factor can be chosen based on the complexity requirements of the system. For example, devices intended for long-range communication require larger spreading factors and, as a result, a higher quantization factor to compensate for the added complexity. In conclusion, using the proposed sample reduction methods can aid in further alleviating the complexity concerns associated with implementing SRRC filters in LoRa devices.

Conflicts of Interest:
The authors declare no conflict of interest. The funding sponsors contributed to defining and approving the NSERC/Cisco IRC Program, which includes this research project.

Abbreviations
The following abbreviations are used in this manuscript: