Article

Fast Scalable Architecture of a Near-ML Detector for a MIMO-QSM Receiver

1
Electronics and Electrical Engineering Department, Instituto Tecnologico de Sonora, Ciudad Obregon 85000, Mexico
2
Department of Electronics, Systems and Computer Science, Instituto Tecnologico y de Estudios Superiores de Occidente, Tlaquepaque 45604, Mexico
*
Author to whom correspondence should be addressed.
Electronics 2019, 8(12), 1509; https://doi.org/10.3390/electronics8121509
Submission received: 25 October 2019 / Revised: 25 November 2019 / Accepted: 29 November 2019 / Published: 9 December 2019
(This article belongs to the Special Issue New Applications and Architectures Based on FPGA/SoC)

Abstract

This paper presents a proposal for an architecture in FPGA for the implementation of a low complexity near maximum likelihood (Near-ML) detection algorithm for a multiple input-multiple output (MIMO) quadrature spatial modulation (QSM) transmission system. The proposed low complexity detection algorithm is based on a tree search and a spherical detection strategy. Our proposal was verified in the context of a MIMO receiver. The effects of the finite length arithmetic and limited precision were evaluated in terms of their impact on the receiver bit error rate (BER). We defined the minimum fixed point word size required not to impact performance adversely for $n_T$ transmit antennas and $n_R$ receive antennas. The results showed that the proposal performed very near to optimal with the advantage of a meaningful reduction in the complexity of the receiver. The performance analysis of the proposed detector of the MIMO receiver under these conditions showed a strong robustness on the numerical precision, which allowed having a receiver performance very close to that obtained with floating point arithmetic in terms of BER; therefore, we believe this architecture can be an attractive candidate for its implementation in current communications standards.

1. Introduction

A wireless communications system employing multiple antennas achieves a better performance over the wireless channel in terms of capacity and bit error rate (BER) [1] by spreading the transmitted information over space and time in a pattern specified by a space-time block code [2]. Classic examples include V-BLAST, which maximizes the spatial multiplexing gain [3], and the Alamouti code [4], which maximizes the diversity gain. A wide variety of space-time coding techniques has been proposed in the past two decades, each achieving a specific combination of rate and diversity gains and whose decoding requires a certain computational complexity.
Recently, the spatial dimension of MIMO systems was used to modulate the transmitted signal instead of using it for diversity or multiplexing. This technique is known as spatial modulation (SM) [5,6,7]. SM considers the complete array of transmit (Tx) antennas as a spatial constellation, where each Tx antenna in this array represents a point in the spatial constellation. In SM, only one Tx antenna is activated at a time, while the other Tx antennas stay off the air during one symbol transmission. Ideally, there is a different and unique channel between each Tx antenna and a receive (Rx) antenna. Therefore, the receiver can determine which Tx antenna has been utilized in the transmission. In this way, spatially modulated bits add to the received quadrature amplitude modulation (QAM) symbol [8].
Quadrature spatial modulation (QSM) is an SM based transmission technique that uses the quadrature (Q) and the in-phase (I) QAM components to independently modulate a spatial constellation. As a result, QSM has better spectral efficiency (SE) since it transmits twice the number of bits in the spatial domain [9,10,11] with the aim of further improving the performance of SM systems in terms of bit error rate.
The complexity in the receiver is an important factor to take into account for hardware implementations [12], mainly in MIMO and massive MIMO systems, where the number of antennas used significantly increases the complexity of the receivers. Some previous works have analyzed this problem for basic QSM transmission schemes. For example, low complexity QSM detectors based on compressive sensing [13,14], sphere decoding [15], minimum mean squared error (MMSE) [16], equivalent maximum likelihood [17], and zero forcing precoding [18] have recently been proposed. However, to the best of our knowledge, practical implementations of hardware architectures for the detection process in MIMO-QSM systems have not been discussed.
Against this background, this research proposes an architecture in FPGA to implement a low complexity near maximum likelihood (Near-ML) detector for a MIMO-QSM system. The proposed low complexity detection algorithm is compared with the ML criterion in terms of BER performance and complexity. Results show that the proposed low complexity Near-ML algorithm performs very near to the optimal detector while achieving a complexity reduction of up to 88% for the analyzed cases.
The contribution of the paper is three-fold:
  • A modified low complexity Near-ML detection algorithm based on sphere detectors for the MIMO-QSM system is proposed.
  • A new reconfigurable architecture to implement in FPGA the Near-ML detection algorithm is proposed.
  • The proposed architecture for the MIMO-QSM receiver can operate for different combinations in transmitter and receiver antennas, including all different sizes of modulation schemes M-QAM, which makes it attractive to be used in the new wireless communication standards.
The remainder of the paper is organized as follows: A description of the QSM general system model with a brief explanation of the QSM implementation, the channel effects at the QSM receiver, and the considerations for an ML detection are presented in Section 2; Section 3 describes the low complexity detection algorithm implemented at the MIMO-QSM receiver; Section 4 gives a full description of the proposed architecture for the MIMO-QSM receiver; Section 5 shows the BER performance results and the implementation results of the proposed receiver architecture for the MIMO-QSM system. Finally, conclusions are summarized in Section 6.
Notation: Uppercase boldface letters denote matrices, whereas lowercase boldface letters denote vectors. The transpose, Hermitian transpose, and Frobenius norm of $\mathbf{A}$ are denoted by $\mathbf{A}^T$, $\mathbf{A}^H$, and $\|\mathbf{A}\|_F^2$, respectively. Finally, $\mathcal{CN}(\mu, \sigma^2)$ is used to represent the circularly symmetric complex Gaussian distribution with mean $\mu$ and variance $\sigma^2$.

2. MIMO-QSM Transmission System

The system model of the MIMO-QSM transmission scheme is presented in Figure 1. We considered a transmitter with $n_T$ transmit antennas and a receiver with $n_R$ receive antennas. Thus, the end-to-end configuration can be considered as an $n_R \times n_T$ MIMO transmission system. We assumed a rich scattering, Rayleigh wireless channel with flat and slow fading, where the channel between transmitter antenna $j$ and receiver antenna $i$ can be modeled as a complex Gaussian gain $h_{ij} \sim \mathcal{CN}(0, 1)$ of zero mean and variance 0.5 per dimension. This gain remains constant for several symbol intervals, after which it changes to a new independent channel realization. The system considered here can transmit $m_{QSM} = \log_2(M) + 2\log_2(n_T)$ bits in each time slot, where $M$ is the size of the M-ary quadrature amplitude modulation (QAM) constellation $S = \{s_1, s_2, \ldots, s_M\}$ and $n_T$ is the size of the QSM transmission vector. Thus, the general MIMO-QSM communication system can be mathematically modeled as:
$$\begin{bmatrix} y_1 \\ \vdots \\ y_{n_R} \end{bmatrix} = \gamma \begin{bmatrix} h_{1,1} & \cdots & h_{1,n_T} \\ \vdots & \ddots & \vdots \\ h_{n_R,1} & \cdots & h_{n_R,n_T} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_{n_T} \end{bmatrix} + \begin{bmatrix} n_1 \\ \vdots \\ n_{n_R} \end{bmatrix}, \tag{1}$$
which can be expressed equivalently as:
$$\mathbf{y} = \gamma \mathbf{H} \mathbf{x} + \mathbf{n}, \tag{2}$$
where $\mathbf{x} \in \mathbb{C}^{n_T \times 1}$ is the overall transmission vector and $\mathbf{y} \in \mathbb{C}^{n_R \times 1}$ is the received vector. $\mathbf{H} \in \mathbb{C}^{n_R \times n_T}$ is the channel matrix; $\gamma$ is the signal-to-noise ratio (SNR) at each antenna; and $\mathbf{n} \in \mathbb{C}^{n_R \times 1}$ represents the AWGN vector. The generated noise samples are independent and identically distributed (i.i.d.) with $\mathcal{CN}(0, \sigma^2)$.
Further assumptions were that all the antennas transmit information symbols from the same M-QAM constellation, the receiver has perfect channel state information (CSI), and the receiver is perfectly synchronized to the transmitter.
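As a quick check of the rate expression in this section, the following minimal Python sketch computes the number of bits carried per time slot. The function name `qsm_bits_per_slot` is hypothetical, and the sketch assumes that both the constellation size and the number of transmit antennas are powers of two.

```python
def qsm_bits_per_slot(M: int, n_t: int) -> int:
    """Bits per time slot: log2(M) + 2 * log2(n_T).

    Assumes M (constellation size) and n_t (transmit antennas)
    are powers of two; raises ValueError otherwise.
    """
    for name, value in (("M", M), ("n_t", n_t)):
        if value < 1 or value & (value - 1):
            raise ValueError(f"{name} must be a power of two")
    # bit_length() - 1 is an exact integer log2 for powers of two.
    return (M.bit_length() - 1) + 2 * (n_t.bit_length() - 1)
```

For four-QAM and two transmit antennas this gives four bits per slot: two for the QAM symbol and one for each spatial index.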

2.1. QSM Modulation

In order to generate the QSM signals $\mathbf{x}$, the input sequence of bits $\mathbf{a}$ was split into three flows. One flow was utilized to modulate an M-QAM signal, and the other two flows (spatial bits) were used to modulate the position in the output vector $\mathbf{x}$. For an input bit sequence of length $m_{QSM}$ bits, the first $\log_2(M)$ bits modulated an M-QAM symbol. The remaining $2\log_2(n_T)$ bits were split into two flows of $\log_2(n_T)$ spatial bits. These spatial bits modulated the position in the output vector $\mathbf{x}$ as follows: the real part of the QAM symbol was assigned to one specific position in the output vector of length $n_T$, and the imaginary part of the QAM symbol was assigned to another position, or even to the same one. Finally, these two SM signals were combined to obtain the QSM output vector [9].
Table 1 shows a mapping rule example for QSM using four-QAM and $n_T = 2$. The first column shows the input bit sequence of length $m_{QSM}$, where the first two bits modulate a four-QAM symbol and the remaining two bits modulate the position of the non-zero entries in the output vector $\mathbf{x}$ as follows: the real part of the QAM symbol was assigned to one Tx antenna out of the $n_T$ available in order to modulate $\log_2(n_T) = 1$ bit, whereas the imaginary part of the QAM symbol was assigned to another or even the same position to modulate $\log_2(n_T) = 1$ bit. These two bits define the transmit antenna indices $\ell_\Re$ and $\ell_\Im$, respectively, as shown in the third column of Table 1.
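The mapping just described can be sketched in a few lines of Python. This is only an illustration: the function name `qsm_map`, the natural-binary labeling of symbols and antennas, and the order in which the two spatial bit groups are read are assumptions, and the actual bit labeling of Table 1 may differ.

```python
def _bits_to_int(bits):
    """Interpret a list of 0/1 values as an unsigned integer (MSB first)."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


def qsm_map(bits, constellation, n_t):
    """Map log2(M) + 2*log2(n_T) bits to a QSM transmission vector.

    Assumed labeling (hypothetical): natural binary for both the
    constellation index and the antenna indices.
    """
    k = len(constellation).bit_length() - 1   # log2(M) symbol bits
    a = n_t.bit_length() - 1                  # log2(n_T) bits per spatial flow
    if len(bits) != k + 2 * a:
        raise ValueError("wrong number of input bits")
    symbol = constellation[_bits_to_int(bits[:k])]
    re_idx = _bits_to_int(bits[k:k + a])      # antenna carrying the real part
    im_idx = _bits_to_int(bits[k + a:])       # antenna carrying the imaginary part
    x = [0j] * n_t
    x[re_idx] += symbol.real
    x[im_idx] += 1j * symbol.imag
    return x
```

With an assumed four-QAM constellation `[-1-1j, -1+1j, 1-1j, 1+1j]` and two antennas, the bits `[0, 0, 0, 1]` place the real part on the first antenna and the imaginary part on the second, while `[1, 1, 1, 1]` places both parts on the second antenna.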

2.2. ML Detection

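The MIMO transmitted signal is:
$$\mathbf{r} = \gamma \mathbf{H} \mathbf{x}, \tag{3}$$
where the transmitted symbol $\mathbf{x}$ can be split into two real valued signals, $s_\Re$ and $s_\Im$, according to the QSM mapping rule described in Table 1. Since the QAM symbol $s$ is expanded by the matrix $\mathbf{H} \in \mathbb{C}^{n_R \times n_T}$, (3) can be expressed as:
$$\mathbf{r} = \mathbf{h}_{(\ell_\Re)} \cdot \mathbf{s}_\Re + j\,\mathbf{h}_{(\ell_\Im)} \cdot \mathbf{s}_\Im, \tag{4}$$
where $\mathbf{h}_{(\ell_\Re)}$ and $\mathbf{h}_{(\ell_\Im)}$ denote the $\ell_\Re$th and $\ell_\Im$th columns of $\mathbf{H}$, respectively, with $\ell_\Re, \ell_\Im \in \{1, 2, \ldots, n_T\}$, and the vectors $\mathbf{s}_\Re = \Re\{S\}$ and $\mathbf{s}_\Im = \Im\{S\}$, of dimension $q_\Re$ and $q_\Im$, respectively, represent the different real and imaginary parts of the symbols belonging to the M-QAM constellation $S$; finally, the symbol $\cdot$ represents the product of the $\ell_\Re$th and $\ell_\Im$th columns of $\mathbf{H}$ with each element of the vectors $\mathbf{s}_\Re$ and $\mathbf{s}_\Im$.
Assuming that the receiver had perfect channel state information, the ML estimation compared the distance between the received signal and all possible received signals. The ML criterion is defined as:
$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x} \in \mathcal{X}} \left\| \mathbf{y} - \gamma \mathbf{H} \mathbf{x} \right\|_F^2, \tag{5}$$
where $\|\cdot\|_F^2$ is the Euclidean distance between two vectors and $\mathcal{X} \in \mathbb{C}^{n_T \times 2^{m_{QSM}}}$ is the full QSM spatial modulation set used in the transmitter; therefore, the ML detector jointly estimated the two possible active Tx antenna indices, $\hat{\ell}_\Re$ and $\hat{\ell}_\Im$, and the corresponding real valued signals $\hat{s}_\Re$ and $\hat{s}_\Im$. Accordingly, (5) can be written as:
$$\left[ \hat{\ell}_\Re, \hat{\ell}_\Im, \hat{s}_\Re, \hat{s}_\Im \right] = \arg\min_{\ell_\Re, \ell_\Im, s_\Re, s_\Im} \left\| \mathbf{y} - \mathbf{h}_{(\ell_\Re)} \cdot s_\Re - j\,\mathbf{h}_{(\ell_\Im)} \cdot s_\Im \right\|_F^2. \tag{6}$$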
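To make the exhaustive search implied by (6) concrete, the sketch below evaluates the metric for every combination of two antenna indices and two real valued amplitudes. It is a reference illustration only, not the architecture proposed in this paper; the name `ml_detect`, the list-of-rows layout of the channel matrix, and the 0-based antenna indices are assumptions.

```python
from itertools import product


def ml_detect(y, H, s_re, s_im):
    """Brute-force search over (real antenna, imag antenna, real amp, imag amp).

    y: received vector (length n_R); H: channel matrix as a list of rows;
    s_re, s_im: candidate real and imaginary amplitudes.
    Returns the best (l_re, l_im, a, b) with 0-based indices and its metric.
    """
    n_r, n_t = len(H), len(H[0])
    best, best_metric = None, float("inf")
    for l_re, l_im, a, b in product(range(n_t), range(n_t), s_re, s_im):
        # Squared Euclidean distance between y and the candidate signal.
        metric = sum(
            abs(y[i] - H[i][l_re] * a - 1j * H[i][l_im] * b) ** 2
            for i in range(n_r)
        )
        if metric < best_metric:
            best, best_metric = (l_re, l_im, a, b), metric
    return best, best_metric
```

The number of candidates grows as the product of the squared number of antennas and the constellation size, which is the cost that the low complexity detector of Section 3 aims to avoid.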

3. Low Complexity Detection Algorithm

In recent works, some low complexity algorithms have been presented for MIMO, SM, and QSM signal detection [10,13,19]. In the works presented in [14,16], optimization algorithms and trigonometric functions were required in the receiver to detect the most likely antenna combinations. After the antenna indices were detected, a reduced ML detector was utilized to identify the transmitted symbols in the MIMO-QSM system. However, these schemes demand many hardware implementation resources. Other detection techniques for MIMO-SM-QSM were reported in [15,19,20]. These detectors were based on tree search and spherical detection. These algorithms had an excellent BER performance, and their detection complexity in terms of flops was low enough for hardware implementation in FPGA.
In this section, a modified low complexity Near-ML detector for MIMO-QSM signals is presented. The proposed detector is based on a tree search and a spherical algorithm [20,21].
The ML solution to (6) may be expressed as a tree search: each branch in this tree was assigned a distance metric, and the symbols with the smallest overall distance were selected as possible optimum solutions [22] in each level of the tree. To carry out this process, an adaptive M-algorithm based on a breadth first sorted tree search was used. The proposed algorithm reduced the search complexity by storing at most the best $L$ branches at a time [23] in each level of the tree. Hence, a small $L$ resulted in low complexity and relatively sub-optimal performance. As $L$ increased, the complexity of the detector in terms of flops also increased, and the performance of the algorithm approached the ML solution.
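The pruning step of an M-algorithm can be illustrated with a very small Python sketch. The representation of a branch as a `(distance, path)` pair and the name `prune_level` are assumptions made for this example only.

```python
import heapq


def prune_level(branches, L):
    """Keep at most the L branches with the smallest accumulated distance.

    branches: iterable of (distance, path) pairs for one tree level.
    Returns the survivors sorted by ascending distance.
    """
    return heapq.nsmallest(L, branches, key=lambda branch: branch[0])
```

A small `L` keeps the per-level work low at the risk of discarding the true solution, whereas a large `L` approaches an exhaustive search.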
The decision metrics $d_1$ and $d_T$ required in the proposed Near-ML detector for the MIMO-QSM scheme can be established as follows:
$$d_1 = \left\| \mathbf{y} - \mathbf{h}_{(\hat{\ell}_\Re)} \cdot \hat{\mathbf{s}}_\Re \right\|_F^2, \tag{7}$$
$$d_T = \left\| \mathbf{y} - \mathbf{h}_{(\hat{\ell}_\Re)} \cdot \hat{s}_\Re(l) - j\,\mathbf{h}_{(\hat{\ell}_\Im)} \cdot \hat{s}_\Im(m) \right\|_F^2. \tag{8}$$
We denote the $l$th and $m$th elements of the vectors $\mathbf{s}_\Re$ and $\mathbf{s}_\Im$ by $s_\Re(l)$ and $s_\Im(m)$, respectively; the factor $j$ in (8) follows from (6), since both amplitudes are real valued. The goal of the decoder is to find the optimum solution to the ML criterion in (6), using the distances calculated in (7) and (8). The distance $d_1$ is a vector of distances calculated for each valid combination of the $\hat{\ell}_\Re$th channel column $\mathbf{h}_{(\hat{\ell}_\Re)}$ and each element of the vector $\mathbf{s}_\Re$. The decoding procedure was split into two parts. First, a pre-ordering with $\mathbf{s}_\Re$ was carried out; specifically, the distance in (7) was calculated, and the symbols were sorted in ascending order of distance. In this way, a set of $N = q_\Re n_T$ tuples was obtained, each tuple $(\hat{\ell}_\Re(l), \hat{s}_\Re(l))$ being formed of a Tx antenna index $\hat{\ell}_\Re \in \{1, \ldots, n_T\}$ and a Tx symbol $\hat{s}_\Re(l)$. This part is summarized in Algorithm 1.
The second part corresponds to an optimized detector based on the detector for SM signals published in [21]. Since the first part of the Near-ML algorithm estimated the real part of the transmitted QSM symbol, $(\hat{\ell}_\Re(l), \hat{s}_\Re(l))$, the second part considered that only one Tx antenna was active, which corresponded to the imaginary part of the transmitted QSM symbol, $(\hat{\ell}_\Im(m), \hat{s}_\Im(m))$. Therefore, the proposed method performed the search using the following modified Rx vector:
$$\mathbf{y}^{(1)} = \mathbf{y} - \mathbf{h}_{(\hat{\ell}_\Re(l))}\, s_\Re(l), \quad l = 1, \ldots, N. \tag{9}$$
The decision metric $d_2$ required in the second part of the proposed Near-ML detector for the MIMO-QSM scheme can be established as follows:
$$d_2 = \left\| \mathbf{y}^{(1)} - j\,\mathbf{h}_{(\hat{\ell}_\Im)} \cdot \hat{\mathbf{s}}_\Im \right\|_F^2. \tag{10}$$
Algorithm 1 QSM real part decoding.
Require: Channel matrix $\mathbf{H}$, received vector $\mathbf{y}$, $\mathbf{s}_\Re$, $n_T$
Ensure: The ordered set of tuples $(\hat{\ell}_\Re(l), \hat{s}_\Re(l))$
 1: Let $d_1 = [\,\cdot\,]$, tuple $= [\,\cdot\,]$
 2: for $i = 1 : n_T$ do
 3:  Let $d_1 = d_1 \cup \| \mathbf{y} - \mathbf{h}_{(i)} \cdot \mathbf{s}_\Re \|_F^2$
 4:  for $l = 1 : q_\Re$ do
 5:   Let tuple $=$ tuple $\cup\, (\hat{\ell}_\Re(l) = i, \hat{s}_\Re(l))$
 6:  end for
 7: end for
 8: Let $[d, \mathrm{ord}] = \mathrm{sort}(d_1)$ in ascending order.
 9: Order tuple with the same order of ord
10: Return tuple, $d$, ord
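A minimal software sketch of the pre-ordering in Algorithm 1 is given below. It is an illustration rather than the hardware implementation: the name `real_part_decoding`, the list-of-rows layout of the channel matrix, and the 0-based antenna indices are assumptions, and the distance, antenna index, and amplitude are returned together instead of as separate tuple and order lists.

```python
def real_part_decoding(y, H, s_re):
    """Rank every (antenna, real amplitude) pair by its distance to y.

    Returns a list of (d1, antenna_index, amplitude) sorted by ascending d1,
    with n_T * len(s_re) entries.
    """
    n_r, n_t = len(H), len(H[0])
    candidates = []
    for i in range(n_t):                      # loop over transmit antennas
        for s in s_re:                        # loop over real amplitudes
            d1 = sum(abs(y[r] - H[r][i] * s) ** 2 for r in range(n_r))
            candidates.append((d1, i, s))
    candidates.sort(key=lambda c: c[0])       # stable ascending sort on distance
    return candidates
```

The first entry of the returned list is the most likely real part under this metric, and the remaining entries give the order in which the complete detector visits the other candidates.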
The distance $d_2$ is a vector of distances calculated for each valid combination of the $\hat{\ell}_\Im$th channel column $\mathbf{h}_{(\hat{\ell}_\Im)}$ and each element of the vector $\mathbf{s}_\Im$. This part of the Near-ML algorithm is summarized in Algorithm 2. We denote the $i$th column of $\mathbf{H}$ by $\mathbf{h}_{(i)}$, and the $j$th rows of $\mathbf{h}_{(i)}$ and $\mathbf{y}^{(1)}$ by $h_{(i)}(j)$ and $y^{(1)}(j)$, respectively.
Algorithm 2 QSM imaginary part decoding.
Require: Channel matrix $\mathbf{H}$, modified received vector $\mathbf{y}^{(1)}$, $\mathbf{s}_\Im$, $n_T$, $n_R$, $V_{th2}$
Ensure: The optimum tuple $(\hat{\ell}_\Im, \hat{s}_\Im)$, $d_T$
 1: Let $d_2 = [\,\cdot\,]$, tuple $= [\,\cdot\,]$
 2: for $i = 1 : n_T$ do
 3:  Let $d_2 = d_2 \cup \| y^{(1)}(1) - j\,h_{(i)}(1) \cdot \mathbf{s}_\Im \|_F^2$
 4:  for $m = 1 : q_\Im$ do
 5:   Let tuple $=$ tuple $\cup\, (\hat{\ell}_\Im(m) = i, \hat{s}_\Im(m))$
 6:  end for
 7: end for
 8: Let $[d, \mathrm{ord}] = \mathrm{sort}(d_2)$ in ascending order.
 9: Order tuple with the same order of ord
10: Let $lim = q_\Im n_T$
11: for $i = 2 : n_R$ do
12:  Let $d_{min} = \infty$
13:  for $m = 1 : lim$ do
14:   Let $\hat{\ell} = \ell_{\mathrm{ord}(m)}$ and $\hat{s} = s_{\mathrm{ord}(m)}$
15:   Let $err = | y^{(1)}(i) - j\,h_{(\hat{\ell})}(i) \cdot \hat{s} |^2$
16:   Let $d(m) = d(m) + err$
17:   if $d(m) < d_{min}$ then
18:    Let $d_{min} = d(m)$
19:    Let $\hat{\ell}_\Im = \ell_{\mathrm{ord}(m)}$ and $\hat{s}_\Im = s_{\mathrm{ord}(m)}$
20:   else if $d(m+1) > V_{th2}$ then
21:    $lim = m$
22:    break
23:   end if
24:  end for
25: end for
26: Return $[\hat{\ell}_\Im, \hat{s}_\Im]$, $d_{min}$
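The structure of Algorithm 2 (rank candidates using the first receive antenna, then accumulate the remaining antennas while shrinking the candidate list) can be sketched in Python as follows. This is a hedged illustration: the name `imag_part_decoding` is hypothetical, indices are 0-based, the imaginary amplitude is multiplied by `1j` explicitly as in (6), and a guard for a single receive antenna is added that the pseudocode does not spell out.

```python
def imag_part_decoding(y1, H, s_im, v_th2):
    """Search the imaginary part given the modified received vector y1.

    Returns ((antenna_index, amplitude), d_min) with a 0-based antenna index.
    """
    n_r, n_t = len(H), len(H[0])
    # Lines 1-9: distances from the first receive antenna only, then sort.
    cands = sorted(
        ((abs(y1[0] - 1j * H[0][i] * s) ** 2, i, s)
         for i in range(n_t) for s in s_im),
        key=lambda c: c[0],
    )
    d = [c[0] for c in cands]
    lim = len(cands)                              # Line 10
    # Guard (assumption): with one receive antenna the first candidate wins.
    best, d_min = (cands[0][1], cands[0][2]), d[0]
    for r in range(1, n_r):                       # Line 11
        d_min = float("inf")                      # Line 12
        for m in range(lim):                      # Line 13
            _, i, s = cands[m]
            d[m] += abs(y1[r] - 1j * H[r][i] * s) ** 2   # Lines 15-16
            if d[m] < d_min:                      # Lines 17-19
                d_min, best = d[m], (i, s)
            elif m + 1 < lim and d[m + 1] > v_th2:
                lim = m + 1                       # keep candidates 0..m (Line 21)
                break
    return best, d_min
```

Because partial distances can only grow as more receive antennas are accumulated, a candidate whose partial distance already exceeds the threshold cannot fall back inside the sphere, which is what justifies shrinking `lim`.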
The complete Near-ML detector is described in detail in Algorithm 3. Each iteration produced a symbol estimate $(\hat{\ell}_\Re, \hat{s}_\Re, \hat{\ell}_\Im, \hat{s}_\Im)$ with distance $d_T$. Candidate pairs formed by a tuple and $(\hat{\ell}_\Im, \hat{s}_\Im)$ whose distance $d_{min}$ was not smaller than the previous ones were skipped. In each iteration, we used the thresholds $V_{th1} = n_R \gamma$ and $V_{th2} = 2 V_{th1}$. The detector used the metrics of the sphere detector to stop the search and discard branches of the tree that were not viable solutions because they exceeded the maximum radius of the detection sphere according to [15,24]. With this modification, the number of branches for each level was adaptive and depended on the SNR and the channel. For these reasons, the proposed algorithm had a significantly reduced complexity.
Algorithm 3 Complete Near-ML detector.
Require: Channel matrix $\mathbf{H}$, received vector $\mathbf{y}$, $\mathbf{s}_\Re$, $\mathbf{s}_\Im$, $n_T$, $n_R$, $q$, $N_b$, $\gamma$, $N$
Ensure: Optimum $\hat{\ell}_\Re, \hat{s}_\Re, \hat{\ell}_\Im, \hat{s}_\Im$
 1: Let $d_T = \infty$
 2: Let $V_{th1} = n_R \gamma$ and $V_{th2} = 2 V_{th1}$
 3: Execute the QSM real part decoding to obtain tuple, $d$, ord
 4: Let $lim = \mathrm{length}(\text{tuple})$
 5: for $m = 1 : lim$ do
 6:  Let $(\hat{\ell}_\Re(m), \hat{s}_\Re(m)) =$ tuple$(m)$
 7:  Let $\mathbf{y}^{(1)} = \mathbf{y} - \mathbf{h}_{(\hat{\ell}_\Re)}\, \hat{s}_\Re$
 8:  Execute the QSM imaginary part decoding to obtain $[\hat{\ell}_\Im, \hat{s}_\Im]$, $d_{min}$
 9:  if $d_{min} < d_T$ then
10:   Let $d_T = d_{min}$
11:   Let $\hat{\ell}_\Re = \ell_\Re(m)$
12:   Let $\hat{s}_\Re = s_\Re(m)$
13:   Let $\hat{\ell}_\Im = \ell_\Im$
14:   Let $\hat{s}_\Im = s_\Im$
15:   if $d_{min} < V_{th1}$ then
16:    break
17:   end if
18:  else if $d_T > V_{th1}$ and $V_{th2} < d_{min} < V_{th1}$ then
19:   break
20:  end if
21: end for
22: Return $\hat{\ell}_\Re, \hat{s}_\Re, \hat{\ell}_\Im, \hat{s}_\Im$
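The outer loop of Algorithm 3 can be sketched in Python by treating the two decoding stages as interchangeable callables. The names `near_ml_detect`, `real_stage`, and `imag_stage` and their signatures are assumptions made for this illustration; the stopping conditions are transcribed as printed in the listing.

```python
def near_ml_detect(y, H, gamma, real_stage, imag_stage):
    """Outer loop of the complete detector (sketch).

    real_stage(y, H) -> iterable of (d1, l_re, s_re), sorted by ascending d1.
    imag_stage(y1, H, v_th2) -> ((l_im, s_im), d_min).
    Returns ((l_re, s_re, l_im, s_im), d_T).
    """
    n_r = len(H)
    v_th1 = n_r * gamma                  # Line 2
    v_th2 = 2 * v_th1
    d_t, best = float("inf"), None       # Line 1
    for _, l_re, s_re in real_stage(y, H):                    # Lines 3-6
        # Line 7: remove the hypothesised real part from the received vector.
        y1 = [y[r] - H[r][l_re] * s_re for r in range(n_r)]
        (l_im, s_im), d_min = imag_stage(y1, H, v_th2)        # Line 8
        if d_min < d_t:                                       # Lines 9-14
            d_t, best = d_min, (l_re, s_re, l_im, s_im)
            if d_min < v_th1:                                 # Lines 15-16
                break
        elif d_t > v_th1 and v_th2 < d_min < v_th1:           # Line 18, as printed
            break
    return best, d_t
```

Passing the stages as arguments keeps the sketch short and makes it easy to swap a stage for an exhaustive reference version when checking results.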
It is also worth noting that in the proposed algorithm, it is possible to adjust the complexity/BER performance trade-off of the detector through the maximum limit of the thresholds $V_{th1}$ and $V_{th2}$. The advantages of our proposal with respect to other similar schemes recently proposed were the following: it did not require calculating the QR decomposition and was therefore less complex; additionally, it did not require using complex operations and was therefore more suitable for hardware implementation.
In the following subsections, the BER performance and the detection complexity of the proposed low complexity detection algorithm are compared with those of the conventional ML detection algorithm for the MIMO-QSM scheme.

3.1. BER Performance Comparison of the MIMO-QSM Scheme

In this subsection, two different configurations were used in order to compare the BER performance of the proposed MIMO-QSM scheme with ML detection under uncorrelated Rayleigh fading channels. The systems were analyzed considering the same spectral efficiency, the same number of Tx and Rx antennas, and a normalized transmission power in the transmitter. For all computer simulations, we targeted a BER of $10^{-4}$.
Figure 2 shows the performance comparison between the optimal ML algorithm and the proposed low complexity Near-ML algorithm for the MIMO-QSM scheme using the 2 × 2 and 8 × 8 configurations with QPSK modulation. In both cases, the proposed detector performed very near to the ML algorithm, and for this reason, we call the proposed detector Near-ML.

3.2. Complexity

The ML detection complexity $\eta$ for the ML criterion in (6) was measured in terms of complex operations (CO). One arithmetic operation was counted as one CO, as was one comparison. The lattice in the system had $2^{m_{QSM}}$ points. Subtraction, obtaining the square, and finding the minimum in (6) resulted in $2^{m_{QSM}+2}$ CO. Considering $n_R$ receive antennas, the complexity for ML in (6) can be approximated by:
$$\eta \approx 2^{m_{QSM}+2}\, n_R. \tag{11}$$
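As a small numeric aid, the sketch below evaluates the ML complexity estimate, reading the expression above as two raised to the power of the bits per slot plus two, multiplied by the number of receive antennas. Both this reading of the exponent and the function name `ml_complexity` are assumptions.

```python
def ml_complexity(m_qsm: int, n_r: int) -> int:
    """Approximate ML detection cost in complex operations (CO).

    Assumes the estimate 2**(m_qsm + 2) * n_r, i.e. about four
    operations per lattice point and per receive antenna.
    """
    return 2 ** (m_qsm + 2) * n_r
```

Under this reading, each additional bit per time slot doubles the ML detection cost, while the cost grows only linearly with the number of receive antennas.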
Table 2 shows a comparison of the complexity for two MIMO configurations. In Table 2, the QSM scheme with ML detector is considered as the reference with 100% of complexity for an SNR of 9 dB. The last column shows the complexity of the proposed Near-ML algorithm where it is observed how this detector outperformed significantly the QSM ML detector in terms of complexity.
The results showed that the proposed Near-ML algorithm performed very near to the optimal one with the advantage of a reduction in detection complexity of 88%.

4. Proposed Hardware Architecture for the Digital QSM Detector

A hardware architecture that implements the detection algorithm presented in Section 3 was proposed. In order to illustrate it, the top module is shown first, and then, the internals of each module are explored.
The design of the architecture followed a top-down approach, and it was composed of a data-path that included four modules and a control unit. The detector is presented in Figure 3. The det1 and det2 modules implemented the first and second parts of the detection process, described in Algorithms 1 and 2, respectively; the sort module performed the sort operation used in the three algorithms; the fd module compared the distance metrics needed to determine the received symbol (inside the for cycle in Algorithm 3); and the ctrl module provided the timing signals and activation flags for the data-path.
From this point onwards, a bold line represents a bit vector, whereas a thin line represents a single bit.

4.1. Top View of the QSM Detector Architecture

Using the description of the QSM communication system in Section 2.1 and the architecture proposed in Figure 3, a top view of the detector is presented considering the following parameters:
  • NT: number of Tx antennas.
  • NR: number of Rx antennas.
  • WL: word length used for fixed point representation.
  • IP: amount of bits used to represent the integer part of data.
  • FP: amount of bits used to represent the fractional part of data, equal to WL minus IP.
  • HXQC: number of columns of an auxiliary matrix needed for calculations in det1.
  • HXQ1S: size of an auxiliary vector needed for calculations in det2.
Considering Table 3, which summarizes the inputs and outputs of the whole system, the detection process started when the received signal $\mathbf{y}$ was mapped into ports yt_real and yt_imag. This input signal was then processed by det1, which executed Algorithm 1 in hardware. After that, det2 received data in its yt1_real and yt1_imag ports (equivalent to the modified received vector $\mathbf{y}^{(1)}$) to continue with Algorithm 2.
The sort operation was utilized by both detection processes; therefore, a single sort module was shared by det1 and det2.
At the final step of the detection process, the fd module compared the current detection results with predefined parameters ( V t h 1 , V t h 2 , d m i n , in Algorithm 3); if the process matched these parameters, the current results became the final results; otherwise, the detection process was restarted until the conditions were met. The ctrl module controlled and synchronized the whole interoperability of the architecture.
The outputs detected_index1 and detected_index2 represent the pair [transmit antenna, symbol sent] for QSM.
In what follows, the modules of the detector are described in detail.

4.2. det1 Module

The detection started with det1, implementing Algorithm 1. Its architecture is shown in Figure 4, and an explanation of the inputs and outputs is in Table 4.
Modules:
  • Norm (norm): operation of Euclidean distance.
  • RegisterArray (ra): register array for the results of norm.
  • SortingRegs (sregs): register array for the results of sort.
When the detection started, the norm modules computed the Euclidean distance operation between (yt_real, yt_imag) and (aux_real, aux_imag) corresponding to the norm operation presented in Line 3 of Algorithm 1.
The results were then added and stored in the corresponding ra registers in an orderly manner. When every register had data stored, these results were sent to the sort module in a parallel way, as better described in Section 4.4.
When the sorted data returned, as seen in “From sort” in Figure 4, the data and indexes were stored, in the same order as they came out of the sorting module, in their respective sregs, which was another register array.

4.3. det2 Module

The second detection process, det2, was at the same time divided into two parts representing the for cycles in Lines 2 and 11, respectively, of Algorithm 2. Figure 5 shows the first part (det2_p1) with the first norm blocks and the second part (det2_p2) with the last block. When det2_p1 ended, det2_p2 immediately started. Table 5 shows the inputs and outputs of the module.
Modules:
  • Norm (norm): operation of Euclidean distance.
  • RegisterArray for det2 (ra_det2): register array for the results of norm in det2_p1.
  • SortingRegs_SP (sregs_sp): register array for the sorting results in det2.
  • SortingRegs_SP2 (sregs_sp2): register array for the sorting indexes results in det2.
  • RearrangerP (rp): rearranger for the sorting indexes.
After the calculations with the norm modules were done, ra_det2 sent its stored data to sort. When the data came back from the module, they were split into the data vector and index vector to be stored in sregs_sp and sregs_sp2, respectively.
sregs_sp was a register array with both serial and parallel inputs and outputs that stored the sorting data. It also had outputs such as data1_out, equivalent to $d(m+1)$ in Line 20 of Algorithm 2, that fed the control module.
sregs_sp2 stored the indexes given by the sort module. It had only a parallel input, but both serial and parallel outputs; this configuration is useful both when all indexes must be reordered (parallel output, as in Line 9 of Algorithm 2) and when only one index must be read (serial output).
In det2_p2, another norm module was used; its result was added to one of the values already stored in sregs_sp, then fed back to the same register and sent to sorting again, emulating the behavior of Lines 15 and 16 of Algorithm 2.
When coming back from sorting, rp rearranged the old indexes stored in sregs_sp2 based on the new ones and replaced them. For example, if the stored vector was [ 3 , 1 , 2 , 4 ] and the new indexes were [ 2 , 1 , 4 , 3 ] , then the rearranged vector would be [ 1 , 3 , 4 , 2 ] .
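The rearranging step can be reproduced in software with a one-line permutation. The sketch below is only a behavioral illustration of the rp block; the function name `rearrange` is hypothetical, and the new indexes are taken to be 1-based, as in the example above.

```python
def rearrange(stored, new_indexes):
    """Reorder previously stored indexes using 1-based positions from the sorter."""
    return [stored[i - 1] for i in new_indexes]
```

Calling `rearrange([3, 1, 2, 4], [2, 1, 4, 3])` yields `[1, 3, 4, 2]`, matching the example given for the rp module.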

4.4. sort Module

The sort module was the most used during the detection process. Sorting networks were chosen as the option for sorting in FPGA [25].
Sorting networks are among the most efficient and traditional ways of sorting in FPGA. They are attractive for two main reasons: they do not require control instructions, and they are relatively easy to parallelize due to the simplicity of the data flow. Sorting networks are well suited to sorting short arrays whose length is known beforehand.
According to the detection algorithm, besides sorting, the network must indicate the position each element occupied in the array before it entered the network, similar to the MATLAB built-in function [B, I] = sort(A) [26]. A previous work already addressed this problem in hardware [27], so the architecture proposed there was used here.
The implemented sorting network consisted of purely combinational comparators; therefore, as the network grew according to the sorting needs, the critical path of the system grew as well.
Figure 6 shows the structure of the sorting network, and Table 6 describes the inputs and outputs of the module.
The vector Sorting_Elements came from the registers in det1 and det2, and Sorted_Elements was composed of the sorted data and their respective indexes that were going to be stored in sregs.
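The behavior of a sorting network that also reports original positions can be modeled in software with fixed compare-exchange stages. The sketch below uses an odd-even transposition network purely as an assumption for illustration; the network topology actually taken from [27] may differ, and the names `compare_exchange` and `sorting_network` are hypothetical.

```python
def compare_exchange(a, b):
    """One comparator: return two (value, index) pairs in ascending value order."""
    return (a, b) if a[0] <= b[0] else (b, a)


def sorting_network(values):
    """Odd-even transposition network over (value, original position) pairs.

    Returns (sorted values, 1-based original positions), mirroring the two
    outputs of MATLAB's [B, I] = sort(A).
    """
    wires = [(v, i + 1) for i, v in enumerate(values)]
    n = len(wires)
    for stage in range(n):                      # n stages are sufficient
        # Comparator positions depend only on the stage, never on the data.
        for k in range(stage % 2, n - 1, 2):
            wires[k], wires[k + 1] = compare_exchange(wires[k], wires[k + 1])
    return [v for v, _ in wires], [i for _, i in wires]
```

In this model the number of stages grows with the number of inputs, which mirrors why a purely combinational network lengthens the critical path as more elements are sorted.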

4.5. fd and ctrl Modules

The fd module made the decision of accepting the current detected indexes as valid or not. It achieved this by comparing the detection results with predefined parameters specific to QSM detectors ( V t h 1 , V t h 2 , d m i n , in Algorithm 3). In case the results were accepted, the detection process ended; otherwise, det2 started over with different data. This was implemented with counters and a state machine, so it emulated the behavior of the for cycle in Algorithm 3.
The ctrl module grouped three independent modules that controlled the three main parts of the detector (det1, det2_p1, and det2_p2).
The control modules for det1 and det2_p1 were composed mainly of counters that checked the number of elapsed clock cycles, representing their respective for cycles in the algorithms.
The control module for det2_p2 differed from the other two control modules. This one consisted of a state machine and counters that performed the for cycles in Algorithm 2, specifically Lines 11 and 13. As seen in Line 13, the lim variable was not known until execution time and changed depending on the data, which is why a state machine was required for proper control.

5. Analysis of Hardware Implementation and Verification Results

5.1. Hardware Budget

In order to implement the proposed design, an Intel-Altera Cyclone IV EP4CE115 FPGA was used. In Table 7, the amount of resources, maximum frequency, and throughput, are shown for two representative cases of the QSM communication systems. The number in brackets represents the percentage of resources used out of the total available in the FPGA device.
Table 8 presents a breakdown of post-synthesis resources used by the different modules of the architecture for the 2 × 2 QPSK and 2 × 2 16-QAM configurations, respectively. Naturally, the amount of resources went up as the order of modulation and the number of antennas increased.
According to the results, the SORT block was the module that used the greatest amount of hardware resources. It used only combinational components (as it was based on sorting networks) [27] to compare elements, and it increased in size as the number of elements to sort rose. Given these characteristics, the critical path of the whole implementation was established by this module and affected the general performance of the architecture.
Because the algorithm is inherently recursive, the number of clock cycles was data dependent. Table 7 shows the maximum frequency considering the slow 1200 mV 0 °C model, and the throughput, which is the number of detection processes that can be completed per second in the worst case.

5.2. Simulation Results

In order to verify the results, a MATLAB implementation was used to generate random input vectors written in text files for the architecture. These files were fed into a test bench that controlled the reading of the test vectors and the writing of the results.
Figure 7 summarizes the testing process to obtain the results and compare them with the outputs of the MATLAB algorithm.
Fixed point simulations were performed in order to determine the ideal IP and FP parameters depending on the BER performance of the algorithm. Figure 8 shows the comparison for different configurations of word length against the floating point model used (4 × 4, QPSK). Taking as a reference Figure 8, the fixed point format IP = 5 and FP = 11 had a close performance to the floating point model.
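The effect of a fixed point format such as IP = 5 and FP = 11 can be explored with a small quantization model. The sketch below is an assumption-laden illustration: the name `to_fixed` is hypothetical, and it assumes a signed two's complement format in which the sign bit is counted in IP, with round-to-nearest and saturation, none of which is specified in the text.

```python
def to_fixed(x: float, ip: int = 5, fp: int = 11) -> float:
    """Quantize x to a signed fixed-point grid and return it as a float.

    Assumptions: word length WL = ip + fp, sign bit counted in ip,
    round-to-nearest, saturation on overflow.
    """
    scale = 1 << fp                       # 2**FP steps per unit
    wl = ip + fp
    lo, hi = -(1 << (wl - 1)), (1 << (wl - 1)) - 1
    q = max(lo, min(hi, round(x * scale)))
    return q / scale
```

Under these assumptions the representable range is from -16 to just under 16 with a resolution of 2^-11, so the rounding error of any in-range value is at most 2^-12.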
Figure 9 shows the timing diagram of the detection process. When start = 1, yt_real and yt_imag were processed by det1. After five cycles, in the case of QPSK modulation, the flag done_detection1 was set to one (END_DET1 in Figure 9), indicating that det2_p1 could start. At this point, Algorithm 1 had been executed.
The det2_p1 module read the yt1_real and yt1_imag inputs and processed them according to Algorithm 2. Immediately afterwards, det2_p2 did the same with the yl_real and yl_imag inputs and set done_detection2 to one (END DET2) when it finished, informing the fd module that the data were ready for a decision. At this point, Algorithm 2 had been executed.
The fd module required one clock cycle to decide whether to accept the currently detected indexes. In this simulation, the indexes were not accepted at first, so the fd module sent the appropriate signals to start det2 over.
det2 then read its respective signals and ran again. When it finished (represented by the rightmost END DET2 in Figure 9), the fd module took another clock cycle to decide. This time, the detected indexes were accepted and the finished_detection flag was raised, meaning that the detection process (Algorithm 3) was over and the detected data (FINAL DATA in Figure 9) were valid.
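The control sequence described above can be summarized with a small behavioral sketch in Python. It models only the order of events (det1 once, then det2_p1, det2_p2 and fd repeated until fd accepts), not the clock-accurate timing; the function name and the event labels are hypothetical and simply reuse the signal names from the text.

```python
def detection_trace(fd_decisions):
    """Return the ordered list of events for one detection.

    fd_decisions: iterable of booleans, one per fd evaluation;
    True means the detected indexes were accepted.
    Behavioral sketch only; cycle counts are not modeled.
    """
    trace = ["start", "det1", "done_detection1"]
    for accepted in fd_decisions:
        # Each pass runs both halves of det2 and then the final decision.
        trace += ["det2_p1", "det2_p2", "done_detection2", "fd"]
        if accepted:
            trace.append("finished_detection")
            return trace
    raise ValueError("fd never accepted the detected indexes")


if __name__ == "__main__":
    # Scenario of Figure 9: rejected on the first pass, accepted on the second.
    print(detection_trace([False, True]))
```

With the decisions `[False, True]`, det1 appears once and det2 runs twice before finished_detection is raised, matching the simulated case in which the indexes were rejected on the first pass.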

6. Conclusions

A low complexity detection algorithm based on a tree search and spherical detection, in the context of MIMO QSM transmission, was presented. It was shown that the proposed algorithm achieved a performance similar to that of the ML detector, but with a significant reduction in the number of operations required in its software and hardware implementation. Fixed point analysis showed that BER performance was maintained in the detection process, allowing a simpler hardware implementation than one based on floating point arithmetic. The novel hardware architecture showed the feasibility of implementing the proposed algorithm in hardware using fixed point precision, as well as the process of transforming the algorithm into a hardware architecture.
A possible way to shorten the critical path would be to add pipeline stages inside the sorting module in order to improve the maximum frequency and throughput significantly, even though the added registers would require a redesign of the control unit.

Author Contributions

Conceptualization, I.L., J.C. and L.P.-E.; Methodology, I.L., J.C. and L.P.-E.; Software, I.L. and J.C.; Experimentation, I.L., J.C., L.P.-E. and O.L.-G.; Validation, I.L., J.C., L.P.-E. and O.L.-G.; Formal Analysis, I.L., J.C., L.P.-E., O.L.-G. and A.G.; Investigation, I.L., J.C., L.P.-E., O.L.-G. and A.G.; Resources, J.C. and A.G.; Data Curation, I.L.; Writing—Original Draft Preparation, I.L.; Writing—Review & Editing, I.L., J.C., L.P.-E. and O.L.-G.; Visualization, I.L., J.C., L.P.-E., O.L.-G. and A.G.; Supervision, I.L., J.C., L.P.-E. and O.L.-G.; Project Administration, J.C. and A.G.; Funding Acquisition, J.C. and A.G.

Funding

The present article was jointly funded by PFCE 2019, CONACYT scholarship and PROFAPI 2019.

Conflicts of Interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

References

  1. Wang, C.X.; Haider, F.; Gao, X.; You, X.H.; Yang, Y.; Yuan, D.; Aggoune, H.M.; Haas, H.; Fletcher, S.; Hepsaydir, E. Cellular architecture and key technologies for 5G wireless communication networks. IEEE Commun. Mag. 2014, 52, 122–130.
  2. Zheng, L.; Tse, D.N.C. Diversity and multiplexing: A fundamental tradeoff in multiple-antenna channels. IEEE Trans. Inf. Theory 2003, 49, 1073–1096.
  3. Foschini, G.J. Layered space-time architecture for wireless communication in a fading environment when using multi-element antennas. Bell Labs Tech. J. 1996, 1, 41–59.
  4. Alamouti, S.M. A simple transmit diversity technique for wireless communications. IEEE J. Sel. Areas Commun. 1998, 16, 1451–1458.
  5. Basar, E.; Wen, M.; Mesleh, R.; Di Renzo, M.; Xiao, Y.; Haas, H. Index Modulation Techniques for Next-Generation Wireless Networks. IEEE Access 2017, 5, 16693–16746.
  6. Bai, Z.; Peng, S.; Zhang, Q.; Zhang, N. OCC-Selection-Based High-Efficient UWB Spatial Modulation System Over a Multipath Fading Channel. IEEE Syst. J. 2019, 13, 1181–1189.
  7. Wen, M.; Zheng, B.; Kim, K.J.; Di Renzo, M.; Tsiftsis, T.A.; Chen, K.; Al-Dhahir, N. A Survey on Spatial Modulation in Emerging Wireless Systems: Research Progresses and Applications. IEEE J. Sel. Areas Commun. 2019, 37, 1949–1972.
  8. Di Renzo, M.; Haas, H.; Ghrayeb, A.; Sugiura, S.; Hanzo, L. Spatial Modulation for Generalized MIMO: Challenges, Opportunities, and Implementation. Proc. IEEE 2014, 102, 56–103.
  9. Mesleh, R.; Ikki, S.S.; Aggoune, H.M. Quadrature Spatial Modulation. IEEE Trans. Veh. Technol. 2015, 64, 2738–2742.
  10. Castillo-Soria, F.; Cortez, J.; Gutierrez, C.; Luna-Rivera, M.; Garcia-Barrientos, A. Extended quadrature spatial modulation for MIMO wireless communications. Phys. Commun. 2019, 32, 88–95.
  11. Hussein, H.S.; Elsayed, M.; Mohamed, U.S.; Esmaiel, H.; Mohamed, E.M. Spectral Efficient Spatial Modulation Techniques. IEEE Access 2019, 7, 1454–1469.
  12. Mesleh, R.; Hiari, O.; Younis, A. Generalized space modulation techniques: Hardware design and considerations. Phys. Commun. 2018, 26, 87–95.
  13. Xiao, L.; Yang, P.; Fan, S.; Li, S.; Song, L.; Xiao, Y. Low-Complexity Signal Detection for Large-Scale Quadrature Spatial Modulation Systems. IEEE Commun. Lett. 2016, 20, 2173–2176.
  14. Yigit, Z.; Basar, E. Low-complexity detection of quadrature spatial modulation. Electron. Lett. 2016, 52, 1729–1731.
  15. Al-Nahhal, I.; Dobre, O.A.; Ikki, S.S. Quadrature Spatial Modulation Decoding Complexity: Study and Reduction. IEEE Wirel. Commun. Lett. 2017, 6, 378–381.
  16. Li, J.; Jiang, X.; Yan, Y.; Yu, W.; Song, S.; Lee, M.H. Low Complexity Detection for Quadrature Spatial Modulation Systems. Wirel. Pers. Commun. 2017, 95, 4171–4183.
  17. Zheng, B.; Chen, F.; Wen, M.; Ji, F.; Yu, H.; Liu, Y. Low-Complexity ML Detector and Performance Analysis for OFDM With In-Phase/Quadrature Index Modulation. IEEE Commun. Lett. 2015, 19, 1893–1896.
  18. Li, J.; Wen, M.; Cheng, X.; Yan, Y.; Song, S.; Lee, M.H. Generalized Precoding-Aided Quadrature Spatial Modulation. IEEE Trans. Veh. Technol. 2017, 66, 1881–1886.
  19. Al-Nahhal, I.; Basar, E.; Dobre, O.A.; Ikki, S. Optimum Low-Complexity Decoder for Spatial Modulation. IEEE J. Sel. Areas Commun. 2019, 37, 2001–2013.
  20. Jiang, Y.; Lan, Y.; He, S.; Li, J.; Jiang, Z. Improved Low-Complexity Sphere Decoding for Generalized Spatial Modulation. IEEE Commun. Lett. 2018, 22, 1164–1167.
  21. Zheng, J.; Yang, X.; Li, Z. Low-complexity detection method for spatial modulation based on M-algorithm. Electron. Lett. 2014, 50, 1552–1554.
  22. Anderson, J.; Mohan, S. Sequential Coding Algorithms: A Survey and Cost Analysis. IEEE Trans. Commun. 1984, 32, 169–176.
  23. Hassibi, B.; Vikalo, H. On the sphere-decoding algorithm I. Expected complexity. IEEE Trans. Signal Process. 2005, 53, 2806–2818.
  24. Xiao, L.; Yang, P.; Xiao, Y.; Fan, S.; Di Renzo, M.; Xiang, W.; Li, S. Efficient Compressive Sensing Detectors for Generalized Spatial Modulation Systems. IEEE Trans. Veh. Technol. 2017, 66, 1284–1298.
  25. Mueller, R.; Teubner, J.; Alonso, G. Sorting Networks on FPGAs. VLDB J. 2012, 21, 1–23.
  26. MathWorks. Sort Array Elements—MATLAB Sort. Available online: https://www.mathworks.com/help/matlab/ref/sort.html (accessed on 6 September 2018).
  27. López Mendoza, I.; Pizano Escalante, J.L.; Cortez González, J.; Longoria Gándara, O.H. Implementation of a parameterizable sorting network for spatial modulation detection on FPGA. In Proceedings of the 2019 IEEE Colombian Conference on Communications and Computing (COLCOM), Barranquilla, Colombia, 5–7 June 2019; pp. 1–6.
Figure 1. The MIMO-QSM system model.
Figure 2. Performance comparison for the 2 × 2 and 8 × 8 configurations at m_QSM = 4 and m_QSM = 8 bpcu, respectively.
Figure 3. QSM detector architecture.
Figure 4. Architecture of det1.
Figure 5. Architecture of det2.
Figure 6. Architecture of sort.
Figure 7. Process of the generation of simulation results.
Figure 8. Fixed point analysis for the algorithm, 16-bit word length, 4 × 4, QPSK.
Figure 9. Simulation in Modelsim of the proposed detector.
Table 1. Example of the QSM mapping rule with n_T = 2.

| Input a | QAM Symbol s | Position Antenna | Output Vector x_i |
|---|---|---|---|
| 0000 | +1 + j | (1, 1) | [+1 + j, 0] |
| 0001 | +1 + j | (1, 2) | [+1, +j] |
| 0010 | +1 + j | (2, 1) | [+j, +1] |
| 0011 | +1 + j | (2, 2) | [0, +1 + j] |
| 0100 | -1 + j | (1, 1) | [-1 + j, 0] |
| 0101 | -1 + j | (1, 2) | [-1, +j] |
| 0110 | -1 + j | (2, 1) | [+j, -1] |
| 0111 | -1 + j | (2, 2) | [0, -1 + j] |
| 1000 | -1 - j | (1, 1) | [-1 - j, 0] |
| 1001 | -1 - j | (1, 2) | [-1, -j] |
| 1010 | -1 - j | (2, 1) | [-j, -1] |
| 1011 | -1 - j | (2, 2) | [0, -1 - j] |
| 1100 | +1 - j | (1, 1) | [+1 - j, 0] |
| 1101 | +1 - j | (1, 2) | [+1, -j] |
| 1110 | +1 - j | (2, 1) | [-j, +1] |
| 1111 | +1 - j | (2, 2) | [0, +1 - j] |
Table 2. Comparison of complexity (η).

| Scheme | QSM ML (η) | QSM Near-ML (η) |
|---|---|---|
| 4 bpcu, 2 × 2, QPSK | 256 (100%) | 282 (73%) |
| 8 bpcu, 8 × 8, QPSK | 12,288 (100%) | 1475 (12%) |
Table 3. Inputs and outputs of the detector.

| Input | Size (Bits) | Goes into | Description |
|---|---|---|---|
| yt_real | WL × NR | det1 | Real part of the received data, y, in antennas for det1 |
| yt_imag | WL × NR | det1 | Imaginary part of the received data, y, in antennas for det1 |
| aux_real | WL × NR × NT | det1 | Real part of the operation h(i)s for det1 (Algorithm 1, Line 3) |
| aux_imag | WL × NR × NT | det1 | Imaginary part of the operation h(i)s for det1 (Algorithm 1, Line 3) |
| yt1_real | WL | det2 | Real part of the received data, y, in antennas for det2 |
| yt1_imag | WL | det2 | Imaginary part of the received data, y, in antennas for det2 |
| aux1_real | WL × HXQ1S | det2 | Real part of the operation h(i)s for det2 (Algorithm 2, Line 3) |
| aux1_imag | WL × HXQ1S | det2 | Imaginary part of the operation h(i)s for det2 (Algorithm 2, Line 3) |
| yl_real | WL | det2 | Real part of the received data, y, in antennas for det2_p2 |
| yl_imag | WL | det2 | Imaginary part of the received data, y, in antennas for det2_p2 |
| auxl_real | WL | det2 | Real part of the operation h(i)s for det2_p2 (Algorithm 2, Line 14) |
| auxl_imag | WL | det2 | Imaginary part of the operation h(i)s for det2_p2 (Algorithm 2, Line 14) |

| Output | Size (Bits) | Goes to | Description |
|---|---|---|---|
| detected_index1 | log2(NT × HXQC) | Detector output | First detected index |
| detected_index2 | log2(NT × HXQC) | Detector output | Second detected index |
Table 4. Inputs and outputs of det1.

| Input | Size (Bits) | Comes from | Description |
|---|---|---|---|
| yt_real | WL × NR | System input | Real part of the received data, y, in antennas |
| yt_imag | WL × NR | System input | Imaginary part of the received data, y, in antennas |
| aux_real | WL × NR × NT | System input | Real part of the operation h(i)s for det1 (Algorithm 1, Line 3) |
| aux_imag | WL × NR × NT | System input | Imaginary part of the operation h(i)s for det1 (Algorithm 1, Line 3) |

| Output | Size (Bits) | Goes to | Description |
|---|---|---|---|
| data_out (data) | WL | fd | Final data of det1 (d in Algorithm 1) |
| data_out (indexes) | log2(NT × HXQC) | fd | Final index of det1 (ord in Algorithm 1) |
Table 5. Inputs and outputs of det2.

| Input | Size (Bits) | Comes from | Description |
|---|---|---|---|
| yt1_real | WL | System input | Real part of the received data in antennas minus the influence of the detected data in det1 (as in Line 7 of Algorithm 2) |
| yt1_imag | WL | System input | Imaginary part of the received data in antennas minus the influence of the detected data in det1 (as in Line 7 of Algorithm 2) |
| aux1_real | WL × HXQ1S | System input | Real part of the operation h(i)s for det2 (Algorithm 2, Line 3) |
| aux1_imag | WL × HXQ1S | System input | Imaginary part of the operation h(i)s for det2 (Algorithm 2, Line 3) |
| yl_real | WL | System input | Real part of the received data in the remaining antennas for det2_p2 |
| yl_imag | WL | System input | Imaginary part of the received data in the remaining antennas for det2_p2 |
| auxl_real | WL | System input | Real part of the operation h(i)s for det2_p2 (Algorithm 2, Line 14) |
| auxl_imag | WL | System input | Imaginary part of the operation h(i)s for det2_p2 (Algorithm 2, Line 14) |

| Output | Size (Bits) | Goes to | Description |
|---|---|---|---|
| data_out (data) | WL | fd | Final data of det2 (d in Algorithm 2) |
| data_out (indexes) | log2(NT × HXQC) | fd | Final index of det2 (ord in Algorithm 2) |
Table 6. Inputs and outputs of sort.

| Input | Size (Bits) |
|---|---|
| Sorting_Elements | WL × NT × HXQC |
| Elements | WL |
| Index | log2(NT × HXQC) |

| Output | Size (Bits) |
|---|---|
| Sorted_Elements | WL + log2(WL × HXQC) × E2S × NT × HXQC |
Table 7. Overall implementation results of the detector in a Cyclone IV FPGA.

| Configuration | Logic Elements | Embedded Multipliers | Max Frequency | Throughput |
|---|---|---|---|---|
| 2 × 2, QPSK | 2385 (2%) | 28 (5%) | 37.11 MHz | 416,966 ops |
| 2 × 2, 16-QAM | 6863 (5%) | 36 (6%) | 20.74 MHz | 171,404 ops |
Table 8. Resources used per module of the architecture.

| 2 × 2, QPSK Detector Module | LC Combinational | LC Registers | DSP Elements |
|---|---|---|---|
| det1 | 308 | 152 | 16 |
| det2 | 408 | 140 | 12 |
| sort | 1279 | 0 | 0 |
| fd | 33 | 32 | 0 |
| ctrl | 69 | 37 | 0 |

| 2 × 2, 16-QAM Detector Module | LC Combinational | LC Registers | DSP Elements |
|---|---|---|---|
| det1 | 386 | 320 | 16 |
| det2 | 835 | 288 | 20 |
| sort | 4696 | 0 | 0 |
| fd | 34 | 36 | 0 |
| ctrl | 84 | 42 | 0 |

Share and Cite

MDPI and ACS Style

Lopez, I.; Pizano-Escalante, L.; Cortez, J.; Longoria-Gandara, O.; Garcia, A. Fast Scalable Architecture of a Near-ML Detector for a MIMO-QSM Receiver. Electronics 2019, 8, 1509. https://doi.org/10.3390/electronics8121509
