1. Introduction
A wireless communications system employing multiple antennas achieves better performance over the wireless channel in terms of capacity and bit error rate (BER) [1] by spreading the transmitted information over space and time in a pattern specified by a space-time block code [2]. Classic examples include V-BLAST, which maximizes the spatial multiplexing gain [3], and the Alamouti code [4], which maximizes the diversity gain. A wide variety of space-time coding techniques has been proposed in the past two decades, each achieving a specific combination of rate and diversity gains and requiring a certain computational complexity for decoding.
Recently, the spatial dimension of MIMO systems has been used to modulate the transmitted signal instead of providing diversity or multiplexing. This technique is known as spatial modulation (SM) [5,6,7]. SM considers the complete array of transmit (Tx) antennas as a spatial constellation, where each Tx antenna in the array represents a point in the spatial constellation. In SM, only one Tx antenna is active at a time, while the other Tx antennas remain silent during the symbol transmission. Ideally, the channel between each Tx antenna and a receive (Rx) antenna is different and unique; therefore, the receiver can determine which Tx antenna was used in the transmission. In this way, the spatially modulated bits are conveyed in addition to the bits carried by the quadrature amplitude modulation (QAM) symbol [8].
Quadrature spatial modulation (QSM) is an SM-based transmission technique that uses the in-phase (I) and quadrature (Q) components of the QAM symbol to independently modulate a spatial constellation, with the aim of further improving the BER performance of SM systems. As a result, QSM achieves a better spectral efficiency (SE), since it transmits twice as many bits in the spatial domain as SM [9,10,11].
Receiver complexity is an important factor to take into account in hardware implementations [12], mainly in MIMO and massive MIMO systems, where the number of antennas significantly increases the complexity of the receivers. Some previous works have analyzed this problem for basic QSM transmission schemes. For example, low complexity QSM detectors based on compressive sensing [13,14], sphere decoding [15], minimum mean squared error (MMSE) [16], equivalent maximum likelihood [17], and zero forcing precoding [18] have been recently proposed. However, to the best of our knowledge, practical hardware architectures for the detection process in MIMO-QSM systems have not been discussed.
Against this background, this research proposes an FPGA architecture that implements a low complexity near maximum likelihood (Near-ML) detector for a MIMO-QSM system. The proposed low complexity detection algorithm is compared with the ML criterion in terms of BER performance and complexity. Results show that the proposed Near-ML algorithm performs very close to the optimal detector while achieving a complexity reduction of up to 88% for the analyzed cases.
The contribution of the paper is three-fold:
A modified low complexity Near-ML detection algorithm based on sphere detectors is proposed for the MIMO-QSM system.
A new reconfigurable architecture to implement the Near-ML detection algorithm in FPGA is proposed.
The proposed architecture for the MIMO-QSM receiver can operate with different numbers of transmit and receive antennas and with M-QAM modulation schemes of different sizes, which makes it attractive for new wireless communication standards.
The remainder of the paper is organized as follows: Section 2 describes the general QSM system model, with a brief explanation of the QSM implementation, the channel effects at the QSM receiver, and the considerations for ML detection; Section 3 describes the low complexity detection algorithm implemented at the MIMO-QSM receiver; Section 4 gives a full description of the proposed architecture for the MIMO-QSM receiver; Section 5 shows the BER performance results and the results of the proposed architecture implementing the MIMO-QSM receiver. Finally, conclusions are summarized in Section 6.
Notation: Uppercase boldface letters denote matrices, whereas lowercase boldface letters denote vectors. The transpose, Hermitian transpose, and Frobenius norm of $\mathbf{A}$ are denoted by $\mathbf{A}^T$, $\mathbf{A}^H$, and $\|\mathbf{A}\|_F$, respectively. Finally, $\mathcal{CN}(\mu, \sigma^2)$ is used to represent the circularly symmetric complex Gaussian distribution with mean $\mu$ and variance $\sigma^2$.
2. MIMO-QSM Transmission System
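The system model of the MIMO-QSM transmission scheme is presented in Figure 1. We considered a transmitter with $N_T$ transmit antennas and a receiver with $N_R$ receive antennas. Thus, the end-to-end configuration can be considered as an $N_T \times N_R$ MIMO transmission system. We assumed a rich-scattering Rayleigh wireless channel with flat and slow fading, where the channel between transmit antenna $j$ and receive antenna $i$ can be modeled as a complex Gaussian gain $h_{i,j}$ with zero mean and variance 0.5 per dimension. This gain remains constant for several symbol intervals, after which it changes to a new independent channel realization. The system considered here can transmit $\log_2(M) + 2\log_2(N_T)$ bits in each time slot, where $M$ is the size of the $M$-ary quadrature amplitude modulation (QAM) constellation and $N_T$ is the size of the QSM transmission vector. Thus, the general MIMO-QSM communication system can be mathematically modeled as:
$$\mathbf{y} = \sqrt{\rho}\,\mathbf{H}\mathbf{x} + \mathbf{n}, \tag{1}$$
which can be expressed equivalently as:
$$\begin{bmatrix} y_1 \\ \vdots \\ y_{N_R} \end{bmatrix} = \sqrt{\rho} \begin{bmatrix} h_{1,1} & \cdots & h_{1,N_T} \\ \vdots & \ddots & \vdots \\ h_{N_R,1} & \cdots & h_{N_R,N_T} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_{N_T} \end{bmatrix} + \begin{bmatrix} n_1 \\ \vdots \\ n_{N_R} \end{bmatrix}, \tag{2}$$
where $\mathbf{x} \in \mathbb{C}^{N_T \times 1}$ is the overall transmission vector and $\mathbf{y} \in \mathbb{C}^{N_R \times 1}$ is the received vector; $\mathbf{H} \in \mathbb{C}^{N_R \times N_T}$ is the channel matrix; $\rho$ is the signal-to-noise ratio (SNR) at each antenna; and $\mathbf{n} \in \mathbb{C}^{N_R \times 1}$ represents the AWGN vector. The generated noise samples are independent and identically distributed (i.i.d.) with $\mathcal{CN}(0, 1)$.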
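The system model above can be illustrated with a short simulation sketch. The following pure-Python code is a minimal illustration, not the authors' simulator: it assumes i.i.d. channel gains with variance 0.5 per real dimension (as stated above) and $\mathcal{CN}(0,1)$ noise per receive antenna, and the helper names `complex_gaussian`, `rayleigh_channel`, and `receive` are hypothetical.

```python
import math
import random

def complex_gaussian(var_per_dim, rng):
    """One circularly symmetric complex Gaussian sample (zero mean)."""
    sd = math.sqrt(var_per_dim)
    return complex(rng.gauss(0.0, sd), rng.gauss(0.0, sd))

def rayleigh_channel(n_r, n_t, rng):
    """N_R x N_T matrix of i.i.d. gains h_ij: zero mean, variance 0.5 per dimension."""
    return [[complex_gaussian(0.5, rng) for _ in range(n_t)] for _ in range(n_r)]

def receive(H, x, rho, rng, noisy=True):
    """y = sqrt(rho) * H x + n, with n_i ~ CN(0, 1) drawn independently per Rx antenna."""
    y = []
    for row in H:
        hx = sum(h * xj for h, xj in zip(row, x))
        n = complex_gaussian(0.5, rng) if noisy else 0j
        y.append(math.sqrt(rho) * hx + n)
    return y

rng = random.Random(1)
H = rayleigh_channel(n_r=4, n_t=4, rng=rng)
x = [1 + 0j, 0j, 1j, 0j]   # example QSM vector: real part on Tx 1, imaginary part on Tx 3
rho = 10 ** (9 / 10)        # 9 dB SNR in linear scale
y = receive(H, x, rho, rng)
print(len(y))  # 4
```

Setting `noisy=False` returns the noise-free term $\sqrt{\rho}\,\mathbf{H}\mathbf{x}$, which is convenient for checking detectors.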
Further assumptions were that all the antennas transmit information symbols from the same M-QAM constellation, the receiver has perfect channel state information (CSI), and the receiver is perfectly synchronized to the transmitter.
2.1. QSM Modulation
In order to generate the QSM signal $\mathbf{x}$, the input bit sequence was split into three flows. One flow was used to modulate an $M$-QAM symbol, and the other two flows (spatial bits) were used to modulate the positions in the output vector $\mathbf{x}$. For an input bit sequence of length $\log_2(M) + 2\log_2(N_T)$ bits, the first $\log_2(M)$ bits modulated an $M$-QAM symbol. The remaining $2\log_2(N_T)$ bits were split into two flows of $\log_2(N_T)$ spatial bits. These spatial bits modulated the positions in the output vector $\mathbf{x}$ as follows: the real part of the QAM symbol was assigned to one specific position in the output vector of length $N_T$, and the imaginary part of the QAM symbol was assigned to another position, or even to the same one, in the output vector $\mathbf{x}$. Finally, these two SM signals were combined to obtain the QSM output vector [9].
Table 1 shows a mapping rule example for QSM using four-QAM and $N_T = 2$. The first column shows the input bit sequence of length $\log_2(4) + 2\log_2(2) = 4$ bits, where the first two bits modulate a four-QAM symbol and the remaining two bits modulate the positions of the non-zero entries in the output vector $\mathbf{x}$ as follows: the real part of the QAM symbol was assigned to one Tx antenna out of the $N_T = 2$ available in order to modulate $\log_2(N_T) = 1$ bit, whereas the imaginary part of the QAM symbol was assigned to another or even the same position to modulate $\log_2(N_T) = 1$ bit. These two spatial bits define the transmit antenna indices $\ell_\Re$ and $\ell_\Im$, respectively, as shown in the third column of Table 1.
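The QSM mapping rule can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the mapping of Table 1 itself: it assumes a square $M$-QAM constellation whose real and imaginary levels are selected by natural-binary bit groups (Table 1 may use a different labeling, e.g., Gray coding), and it takes each antenna index as the natural binary value of its group of $\log_2(N_T)$ spatial bits. The names `bits_to_int` and `qsm_map` are hypothetical.

```python
import math

def bits_to_int(bits):
    """Natural binary value of a bit list, most significant bit first."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value

def qsm_map(bits, M, n_t):
    """Map log2(M) + 2*log2(n_t) bits to a QSM transmission vector of length n_t."""
    k = int(math.log2(M))              # QAM bits
    a = int(math.log2(n_t))            # spatial bits per flow
    side = math.isqrt(M)               # amplitude levels per dimension
    assert side * side == M, "square M-QAM assumed"
    assert 2 ** a == n_t, "n_t must be a power of two"
    assert len(bits) == k + 2 * a, "wrong input length"
    half = k // 2
    # Real-valued PAM levels -(side-1), ..., -1, +1, ..., (side-1)
    re = 2 * bits_to_int(bits[:half]) - (side - 1)
    im = 2 * bits_to_int(bits[half:k]) - (side - 1)
    l_re = bits_to_int(bits[k:k + a])            # antenna carrying the real part
    l_im = bits_to_int(bits[k + a:k + 2 * a])    # antenna carrying the imaginary part
    x = [0j] * n_t
    x[l_re] += re          # real part on antenna l_re
    x[l_im] += 1j * im     # imaginary part on antenna l_im (may coincide with l_re)
    return x

# Four-QAM and N_T = 2, i.e., 4 bits per channel use as in the Table 1 example
print(qsm_map([1, 0, 0, 1], M=4, n_t=2))
```

When both spatial bit groups select the same antenna, the full complex symbol is sent from that antenna, which matches the "or even the same position" case described above.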
2.2. ML Detection
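The received MIMO signal is:
$$\mathbf{y} = \sqrt{\rho}\,\mathbf{H}\mathbf{x} + \mathbf{n}, \tag{3}$$
where the transmitted symbol $s \in \mathcal{S}$ can be split into two real-valued signals, $s_\Re$ and $s_\Im$, according to the QSM mapping rule described in Table 1. Since the QAM symbol $s$ is expanded by the matrix $\mathbf{H}$, (3) can be expressed as:
$$\mathbf{y} = \sqrt{\rho}\left(\mathbf{h}_{\ell_\Re} \cdot \mathbf{s}_\Re + j\,\mathbf{h}_{\ell_\Im} \cdot \mathbf{s}_\Im\right) + \mathbf{n}, \tag{4}$$
where $\mathbf{h}_{\ell_\Re}$ and $\mathbf{h}_{\ell_\Im}$ denote the $\ell_\Re$-th and $\ell_\Im$-th columns of $\mathbf{H}$, respectively, with $\ell_\Re, \ell_\Im \in \{1, \dots, N_T\}$, and the vectors $\mathbf{s}_\Re$ and $\mathbf{s}_\Im$ contain the distinct real and imaginary parts, respectively, of the symbols belonging to the $M$-QAM constellation $\mathcal{S}$ (for a square constellation, each has dimension $\sqrt{M}$); finally, the symbol $\cdot$ represents the product of the $\ell_\Re$-th and $\ell_\Im$-th columns of $\mathbf{H}$ with each element of the vectors $\mathbf{s}_\Re$ and $\mathbf{s}_\Im$.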
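Assuming that the receiver has perfect channel state information, the ML detector compares the distance between the received signal and all possible received signals. The ML criterion is defined as:
$$\left[\hat{\ell}_\Re, \hat{\ell}_\Im, \hat{s}_\Re, \hat{s}_\Im\right] = \arg\min_{\mathbf{x} \in \mathcal{X}} \left\|\mathbf{y} - \sqrt{\rho}\,\mathbf{H}\mathbf{x}\right\|^2, \tag{5}$$
where $\|\cdot\|$ denotes the Euclidean norm, so that the objective measures the Euclidean distance between two vectors, and $\mathcal{X}$ is the set of all QSM transmission vectors that can be generated at the transmitter; therefore, the ML detector jointly estimates the two active Tx antenna indices, $\ell_\Re$ and $\ell_\Im$, and the corresponding real-valued signals $s_\Re$ and $s_\Im$. Hence, (5) can be written as:
$$\left[\hat{\ell}_\Re, \hat{\ell}_\Im, \hat{s}_\Re, \hat{s}_\Im\right] = \arg\min_{\ell_\Re, \ell_\Im, s_\Re, s_\Im} \left\|\mathbf{y} - \sqrt{\rho}\left(\mathbf{h}_{\ell_\Re} s_\Re + j\,\mathbf{h}_{\ell_\Im} s_\Im\right)\right\|^2. \tag{6}$$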
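To make the exhaustive search in (6) concrete, the following sketch enumerates all $MN_T^2$ combinations of $(\ell_\Re, \ell_\Im, s_\Re, s_\Im)$ for a square $M$-QAM constellation and returns the one with the smallest squared Euclidean distance. It is a reference illustration in plain Python with 0-based antenna indices; the function name `qsm_ml_detect` is hypothetical, and this is not the implementation used in the paper.

```python
import itertools
import math

def qsm_ml_detect(y, H, rho, M):
    """Exhaustive ML detection for QSM: argmin ||y - sqrt(rho)(h_lr s_r + j h_li s_i)||^2."""
    n_r, n_t = len(H), len(H[0])
    side = math.isqrt(M)
    levels = [2 * k - (side - 1) for k in range(side)]   # real PAM levels (square QAM assumed)
    best, best_dist = None, math.inf
    for l_re, l_im, s_re, s_im in itertools.product(range(n_t), range(n_t), levels, levels):
        dist = 0.0
        for i in range(n_r):                               # accumulate over Rx antennas
            pred = math.sqrt(rho) * (H[i][l_re] * s_re + 1j * H[i][l_im] * s_im)
            dist += abs(y[i] - pred) ** 2
        if dist < best_dist:                               # keep the smallest distance
            best, best_dist = (l_re, l_im, s_re, s_im), dist
    return best, best_dist
```

Every candidate is visited for every receive antenna, so the cost grows with $M$, $N_T^2$, and $N_R$; this is the cost that the reduced search of Section 3 aims to avoid.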
3. Low Complexity Detection Algorithm
Several low complexity algorithms have recently been presented for MIMO, SM, and QSM signal detection [10,13,19]. In [14,16], optimization algorithms and trigonometric functions are required at the receiver to detect the most likely antenna combinations. After the antenna indices are detected, a reduced ML detector is used to identify the transmitted symbols in the MIMO-QSM system. However, these schemes demand substantial hardware resources. Other detection techniques for MIMO-SM and MIMO-QSM were reported in [15,19,20]. These detectors are based on tree search and sphere detection. They achieve excellent BER performance, and their detection complexity in terms of flops is relatively low, which makes them suitable for hardware implementation in FPGA.
In this section, a modified low complexity Near-ML detector for MIMO-QSM signals is presented. The proposed detector is based on a tree search and a sphere detection algorithm [20,21].
The ML solution to (6) may be expressed as a tree search: each branch of the tree is assigned a distance metric, and at each level of the tree the symbols with the smallest overall distance are selected as possible optimum solutions [22]. To carry out this process, an adaptive M-algorithm based on a breadth-first sorted tree search was used. The proposed algorithm reduces the search complexity by storing, at most, the best L branches at a time at each level of the tree [23]. Hence, a small L results in low complexity and somewhat sub-optimal performance. As L increases, the complexity of the detector in terms of flops also increases, and the performance of the algorithm approaches the ML solution.
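The breadth-first M-algorithm described above can be sketched generically: at each tree level, every surviving branch is extended by all possible symbols, the accumulated metrics are sorted, and only the best L branches are kept. The toy sketch below assumes the total metric is a sum of non-negative per-level branch costs supplied by a user function (`branch_cost` and `m_algorithm` are hypothetical names); with L equal to the number of leaves it reduces to an exhaustive search.

```python
def m_algorithm(num_levels, symbols, branch_cost, L):
    """Breadth-first M-algorithm: keep at most L best partial paths per tree level."""
    survivors = [((), 0.0)]                         # (partial path, accumulated metric)
    for level in range(num_levels):
        children = []
        for path, metric in survivors:
            for s in symbols:                       # extend every survivor by every symbol
                children.append((path + (s,), metric + branch_cost(level, path, s)))
        children.sort(key=lambda c: c[1])           # sorted tree search: ascending metric
        survivors = children[:L]                    # prune to the L best branches
    return survivors[0]                             # best complete path and its metric

# Toy usage: the cost is the squared error against a hidden target path
target = (1, -1, 1)

def cost(level, path, s):
    return (s - target[level]) ** 2

print(m_algorithm(3, (-1, 1), cost, L=2))
```

In this toy tree the correct path is never pruned, but in general a small L may discard the branch that leads to the ML solution, which is the complexity/performance trade-off described above.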
The two decision metrics required by the proposed Near-ML detector for the MIMO-QSM scheme are defined in (7) and (8). We denote the $p$-th and $q$-th elements of the vectors $\mathbf{s}_\Re$ and $\mathbf{s}_\Im$ by $s_{\Re,p}$ and $s_{\Im,q}$, respectively. The goal of the decoder is to find the optimum solution to the ML criterion in (6) using the distances calculated in (7) and (8). The distance in (7) is a vector of distances calculated for each valid combination of the $\ell_\Re$-th transmit antenna, $\mathbf{h}_{\ell_\Re}$, and each element of the vector $\mathbf{s}_\Re$. The decoding procedure was split into two parts. First, a pre-ordering was carried out; specifically, the distance in (7) was calculated, and the candidates were re-ordered in ascending order of distance. In this way, a set of tuples was obtained, each tuple being formed by a combination of a Tx antenna index $\ell_\Re$ and a Tx symbol $s_{\Re,p}$. This part is summarized in Algorithm 1.
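The pre-ordering step can be sketched as follows. This is an illustration under a loudly stated assumption: since the exact expression of (7) is not reproduced here, the sketch scores each (antenna, real level) tuple with the partial distance $\|\mathbf{y} - \sqrt{\rho}\,\mathbf{h}_{\ell}\, s\|^2$ as a stand-in, then sorts the tuples in ascending order as Algorithm 1 does. Antenna indices are 0-based, and `real_part_preorder` is a hypothetical name.

```python
import math

def real_part_preorder(y, H, rho, levels):
    """Score every (antenna, real level) tuple and return them sorted by ascending distance.

    ASSUMPTION: ||y - sqrt(rho) * h_l * s||^2 is used as a stand-in for metric (7).
    """
    n_r, n_t = len(H), len(H[0])
    scored = []
    for l in range(n_t):            # candidate Tx antenna for the real part
        for s in levels:            # candidate real amplitude level
            d = sum(abs(y[i] - math.sqrt(rho) * H[i][l] * s) ** 2 for i in range(n_r))
            scored.append((d, l, s))
    scored.sort(key=lambda t: t[0])   # ascending distance (stable for ties)
    tuples = [(l, s) for _, l, s in scored]
    dists = [d for d, _, _ in scored]
    return tuples, dists
```

Returning the tuples and their distances in the same order mirrors Lines 8 and 9 of Algorithm 1, where the distances are sorted and the tuples are reordered accordingly.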
The second part corresponds to an optimized detector based on the detector for SM signals published in [21]. Since the first part of the Near-ML algorithm estimated the real part of the transmitted QSM symbol, $\hat{s}_\Re$, together with its Tx antenna $\hat{\ell}_\Re$, the second part considered that only one Tx antenna was active, corresponding to the imaginary part of the transmitted QSM symbol, $s_\Im$. Therefore, the proposed method performed the search using the following modified Rx vector:
$$\tilde{\mathbf{y}} = \mathbf{y} - \sqrt{\rho}\,\mathbf{h}_{\hat{\ell}_\Re}\,\hat{s}_\Re. \tag{9}$$
The decision metric required in the second part of the proposed Near-ML detector for the MIMO-QSM scheme can be established as follows:
Algorithm 1 QSM real part decoding.
Require: Channel matrix $\mathbf{H}$, received vector $\mathbf{y}$, , 
Ensure: The set of tuples ordered
1: Let ,
2: for do
3:     Let
4:     for do
5:         Let
6:     end for
7: end for
8: Let in ascending order.
9: Order with the same order of
10: Return ,
The distance used in the second part is a vector of distances calculated for each valid combination of the transmit antenna $\mathbf{h}_{\ell_\Im}$ and each element of the vector $\mathbf{s}_\Im$. This part of the Near-ML algorithm is summarized in Algorithm 2. We denote the column of like and the row of and like as and , respectively.
Algorithm 2 QSM imaginary part decoding.
Require: Channel matrix $\mathbf{H}$, modified received vector , , , 
Ensure: The optimum tuple ,
1: Let ,
2: for do
3:     Let
4:     for do
5:         Let
6:     end for
7: end for
8: Let in ascending order.
9: Order with the same order of
10: Let
11: for do
12:     Let
13:     for do
14:         Let and
15:         Let
16:         Let
17:         if then
18:             Let
19:             Let and
20:         else if then
21:
22:
23:         end if
24:     end for
25: end for
26: Return ,
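Given a real-part hypothesis from the first stage, the core of the second stage can be sketched as below: the real-part contribution is removed to form the modified Rx vector, and every (antenna, imaginary level) pair is scored against it. This is a simplified illustration assuming the modified vector has the form $\mathbf{y} - \sqrt{\rho}\,\mathbf{h}_{\ell_\Re} s_\Re$ and that the metric is the squared Euclidean distance; it omits the refinement loop of Lines 10 to 25 of Algorithm 2, uses 0-based indices, and `imag_part_search` is a hypothetical name.

```python
import math

def imag_part_search(y, H, rho, l_re, s_re, levels):
    """Best (antenna, imaginary level) for a fixed real-part estimate (l_re, s_re)."""
    n_r, n_t = len(H), len(H[0])
    # Modified Rx vector: remove the contribution of the estimated real part (assumed form)
    y_mod = [y[i] - math.sqrt(rho) * H[i][l_re] * s_re for i in range(n_r)]
    best, best_d = None, math.inf
    for l in range(n_t):            # candidate Tx antenna for the imaginary part
        for s in levels:            # candidate imaginary amplitude level
            d = sum(abs(y_mod[i] - 1j * math.sqrt(rho) * H[i][l] * s) ** 2
                    for i in range(n_r))
            if d < best_d:
                best, best_d = (l, s), d
    return best, best_d
```

Because only one antenna is hypothesized per candidate, this stage visits $N_T$ times the number of imaginary levels candidates instead of the full QSM lattice.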
The complete Near-ML detector is described in detail in Algorithm 3. Each iteration produced a symbol estimate with an associated distance. Symbol pairs whose distance was not smaller than that of the previous ones were skipped. In each iteration, the stopping criterion of Algorithm 3 was checked. The detector used the metrics of the sphere detector to stop the search and to discard branches of the tree that were not viable solutions because they exceeded the maximum radius of the detection sphere, according to [15,24]. With this modification, the number of branches at each level was adaptive and depended on the SNR and the channel. For these reasons, the proposed algorithm had a significantly reduced complexity.
Algorithm 3 Complete Near-ML detector.
Require: Channel matrix $\mathbf{H}$, modified received vector , , , , q, , , 
Ensure: Optimum
1: Let
2: Let and
3: Execute the QSM real part decoding to obtain ,
4: Let
5: for do
6:     Let
7:     Let
8:     Execute the QSM imaginary part decoding to obtain ,
9:     if then
10:         Let
11:         Let
12:         Let
13:         Let
14:         Let
15:         if then
16:             break
17:         end if
18:     else if and then
19:         break
20:     end if
21: end for
22: Return
It is also worth noting that, in the proposed algorithm, the complexity/BER performance trade-off of the detector can be adjusted through the maximum limits of the two thresholds used in Algorithm 3. The advantages of our proposal with respect to other similar, recently proposed schemes are as follows: it does not require computing the QR decomposition, so it is less complex; additionally, it does not require complex-valued arithmetic, which makes it well suited for hardware implementation.
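The way the two stages combine with an early-termination rule can be sketched as follows. This is an illustrative sketch, not Algorithm 3 itself: `max_candidates` and `radius` are hypothetical knobs standing in for the two thresholds, the first-stage metric is the same assumed partial distance used for illustration earlier, and indices are 0-based. Note that the second-stage distance equals the full ML metric of (6) for the combined hypothesis, which is why it can be compared against a sphere radius.

```python
import math

def sq_dist(v, w):
    """Squared Euclidean distance between two complex vectors."""
    return sum(abs(a - b) ** 2 for a, b in zip(v, w))

def near_ml_sketch(y, H, rho, levels, max_candidates, radius):
    """Two-stage QSM search with early termination (illustrative only).

    max_candidates and radius are hypothetical knobs: fewer candidates or a
    larger radius means less work and possibly a sub-optimal decision.
    """
    n_r, n_t = len(H), len(H[0])
    g = math.sqrt(rho)
    cols = [[H[i][l] for i in range(n_r)] for l in range(n_t)]
    # Stage 1 (assumed metric): rank real-part hypotheses by ||y - g h_l s||^2
    stage1 = sorted((sq_dist(y, [g * h * s for h in cols[l]]), l, s)
                    for l in range(n_t) for s in levels)
    best, best_d, visited = None, math.inf, 0
    for _, l_re, s_re in stage1[:max_candidates]:
        visited += 1
        # Modified Rx vector: remove the hypothesized real-part contribution
        y_mod = [yi - g * h * s_re for yi, h in zip(y, cols[l_re])]
        # Stage 2: best imaginary-part hypothesis; d equals the full ML metric of the pair
        d, l_im, s_im = min((sq_dist(y_mod, [1j * g * h * s for h in cols[l]]), l, s)
                            for l in range(n_t) for s in levels)
        if d < best_d:
            best, best_d = (l_re, l_im, s_re, s_im), d
        if best_d <= radius:    # solution lies inside the detection sphere: stop early
            break
    return best, best_d, visited
```

With `max_candidates` covering all real-part hypotheses and `radius = 0`, the sketch returns the same decision as the exhaustive search in (6); shrinking `max_candidates` or enlarging `radius` trades BER for fewer visited candidates.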
In the following subsections, the BER performance and the detection complexity of the proposed low complexity detection algorithm are analyzed and compared with those of the conventional MIMO-QSM scheme using the ML detection algorithm.
3.1. BER Performance Comparison of the MIMO-QSM Scheme
In this subsection, two different configurations were used to compare the BER performance of the MIMO-QSM scheme with the proposed Near-ML detector and with ML detection under uncorrelated Rayleigh fading channels. The systems were analyzed considering the same spectral efficiency, the same number of Tx and Rx antennas, and a normalized transmission power at the transmitter. For all computer simulations, we targeted a BER of .
Figure 2 shows the performance comparison between the optimal ML detector and the proposed low complexity Near-ML algorithm for the MIMO-QSM scheme using the two $N_T \times N_R$ configurations considered, with QPSK modulation. In both cases, the proposed detector performed very close to the ML algorithm; for this reason, we call the proposed detector Near-ML.
3.2. Complexity
The detection complexity of the ML criterion in (6) was measured in terms of complex operations (CO). One arithmetic operation was counted as one CO, and one comparison was also counted as one CO. The lattice in the system had $MN_T^2$ points. Subtraction, squaring, and finding the minimum in (6) resulted in a fixed number of CO per lattice point. Considering $N_R$ receive antennas, the complexity of ML detection in (6) can be approximated by:
Table 2 shows a comparison of the complexity for two MIMO configurations. In Table 2, the QSM scheme with the ML detector is taken as the reference, with 100% complexity, for an SNR of 9 dB. The last column shows the complexity of the proposed Near-ML algorithm, which is significantly lower than that of the QSM ML detector.
The results show that the proposed Near-ML algorithm performs very close to the optimal detector, with the advantage of a reduction in detection complexity of up to 88%.
4. Proposed Hardware Architecture for the Digital QSM Detector
This section presents a hardware architecture that implements the detection algorithm described in Section 3. The top module is presented first, and then the internals of each module are explored.
The design of the architecture followed a top-down approach; it was composed of a data path, which included four modules, and a control unit. The detector is presented in Figure 3. The det1 and det2 modules implemented the first and second parts of the detection process, described in Algorithms 1 and 2, respectively; the sort module performed the sort operation used in the three algorithms; the fd module compared the distance metrics needed to determine the received symbol (inside the for loop in Algorithm 3); and the ctrl module provided the timing signals and activation flags for the data path.
From this point onwards, a bold line represents a bit vector, whereas a thin line represents a single bit.
4.1. Top View of the QSM Detector Architecture
Using the description of the QSM communication system in Section 2.1 and the architecture proposed in Figure 3, a top view of the detector is presented considering the following parameters:
NT: number of Tx antennas.
NR: number of Rx antennas.
WL: word length used for fixed point representation.
IP: number of bits used to represent the integer part of data.
FP: number of bits used to represent the fractional part of data, computed as WL minus IP.
HXQC: number of columns of an auxiliary matrix needed for calculations in det1.
HXQ1S: size of an auxiliary vector needed for calculations in det2.
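As an illustration of the WL, IP, and FP parameters, the sketch below quantizes a real value to a signed two's-complement fixed-point word with FP = WL - IP fractional bits. The rounding and overflow behavior are assumptions (round to nearest, saturation), as is counting the sign bit inside IP; the paper does not state which modes were used, and `to_fixed` is a hypothetical name.

```python
def to_fixed(value, WL, IP):
    """Quantize a real value to signed fixed point; returns (integer code, represented value).

    Assumptions: two's complement, IP includes the sign bit, round to nearest, saturation.
    """
    FP = WL - IP                        # fractional bits, as defined above
    scale = 1 << FP
    code = round(value * scale)         # round to nearest (Python rounds ties to even)
    lo, hi = -(1 << (WL - 1)), (1 << (WL - 1)) - 1
    code = max(lo, min(hi, code))       # saturate instead of wrapping around
    return code, code / scale

print(to_fixed(0.7, WL=8, IP=3))  # (22, 0.6875)
```

With WL = 8 and IP = 3, the resolution is 1/32 and the representable range is [-4, 3.96875]; out-of-range values such as 10.0 saturate to the largest code, 127.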
As summarized in Table 3, which lists the inputs and outputs of the whole system, the detection process started when the received signal y was mapped into the ports yt_real and yt_imag. This input signal was then processed by det1, which executed Algorithm 1 in hardware. After that, det2 received data in its yt1_real and yt1_imag ports (equivalent to the modified received vector) to continue with Algorithm 2.
Since both detection processes require sorting, the sort module was shared by det1 and det2.
At the final step of the detection process, the fd module compared the current detection results with the predefined parameters of Algorithm 3; if the results met these conditions, they became the final results; otherwise, the detection process was restarted until the conditions were met. The ctrl module controlled and synchronized the interoperation of the whole architecture.
The outputs detected_index1 and detected_index2 represent the pair [transmit antenna, symbol sent] for QSM.
In what follows, the modules of the detector are described in detail.
4.2. det1 Module
The detection started with det1, which implemented Algorithm 1. Its architecture is shown in Figure 4, and its inputs and outputs are explained in Table 4.
Modules:
Norm (norm): operation of Euclidean distance.
RegisterArray (ra): register array for the results of norm.
SortingRegs (sregs): register array for the results of sort.
When the detection started, the norm modules computed the Euclidean distance operation between (yt_real, yt_imag) and (aux_real, aux_imag) corresponding to the norm operation presented in Line 3 of Algorithm 1.
The results were then added and stored, in order, in the corresponding ra registers. When every register held data, these results were sent in parallel to the sort module, as described in more detail in Section 4.4.
When the sorted data returned (see “From sort” in Figure 4), the data and indexes were stored, in the same order as they came out of the sorting module, in their respective sregs, another register array.
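The norm blocks and the accumulation into the ra registers can be described with real-valued arithmetic only, which matches the split real and imaginary ports (yt_real, yt_imag, aux_real, aux_imag). The sketch below assumes that each norm block produces the squared distance for one receive antenna and that the auxiliary matrix is stored as N_R rows by HXQC columns, one candidate per column; both are assumptions about the layout, and the square root is omitted because it does not change the sorting order. The function names are hypothetical.

```python
def norm_block(yt_real, yt_imag, aux_real, aux_imag):
    """Squared distance for one receive antenna using only real multiplies and adds."""
    dr = yt_real - aux_real
    di = yt_imag - aux_imag
    return dr * dr + di * di

def det1_distances(yt_real, yt_imag, aux_real, aux_imag):
    """One accumulated distance per candidate column (what the ra registers would hold).

    yt_*: length-N_R received vector; aux_*: N_R x HXQC candidate matrix (assumed layout).
    """
    n_r, n_cols = len(aux_real), len(aux_real[0])
    ra = []
    for c in range(n_cols):
        # Partial results of the N_R norm blocks are added for this candidate
        ra.append(sum(norm_block(yt_real[i], yt_imag[i], aux_real[i][c], aux_imag[i][c])
                      for i in range(n_r)))
    return ra
```

In hardware, the N_R norm blocks would run in parallel and an adder tree would replace the `sum`; the sketch only reproduces the arithmetic.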
4.3. det2 Module
The second detection process, det2, was in turn divided into two parts, representing the for loops in Lines 2 and 11 of Algorithm 2, respectively. Figure 5 shows the first part (det2_p1), with the first norm blocks, and the second part (det2_p2), with the last block. When det2_p1 ended, det2_p2 immediately started. Table 5 shows the inputs and outputs of the module.
Modules:
Norm (norm): operation of Euclidean distance.
RegisterArray for det2 (ra_det2): register array for the results of norm in det2_p1.
SortingRegs_SP (sregs_sp): register array for the sorting results in det2.
SortingRegs_SP2 (sregs_sp2): register array for the sorting indexes results in det2.
RearrangerP (rp): rearranger for the sorting indexes.
After the calculations with the norm modules were done, ra_det2 sent its stored data to sort. When the data came back from the module, they were split into the data vector and index vector to be stored in sregs_sp and sregs_sp2, respectively.
sregs_sp was a register array with both serial and parallel inputs and outputs that stored the sorting data. It also had outputs such as data1_out that fed the control module, equivalent to the corresponding variable in Line 19 of Algorithm 2.
sregs_sp2 stored the indexes given by the sort module. It had only a parallel input, but both serial and parallel outputs; this configuration is useful when all indexes need to be reordered (parallel output, as in Line 9 of Algorithm 2) or when only one index needs to be read.
In det2_p2, another norm module was used; its result was added to one of the values already stored in sregs_sp, fed back to the same register, and sent to sorting again, emulating the behavior of Lines 14 and 15 of Algorithm 2.
When coming back from sorting, rp rearranged the old indexes stored in sregs_sp2 based on the new ones and replaced them. For example, if the stored vector was and the new indexes were , then the rearranged vector would be .
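The index update performed by rp amounts to composing two permutations: if the sorter returns, for each output position, the position its element occupied at the sorter input, then the original index of that element is found by reading the stored index vector at that position. A minimal sketch with 0-based indexes (the function name `rearrange` and the example values are hypothetical, not the example from the text):

```python
def rearrange(stored_idx, new_idx):
    """rp: compose stored original indexes with the permutation returned by a new sort.

    new_idx[k] is the position (0-based) that the k-th sorted element occupied at the
    sorter input, so the original index of that element is stored_idx[new_idx[k]].
    """
    return [stored_idx[k] for k in new_idx]

# Hypothetical values
print(rearrange([3, 1, 2, 0], [2, 0, 3, 1]))  # [2, 3, 0, 1]
```

In hardware, this corresponds to a bank of multiplexers selecting entries of sregs_sp2, with new_idx driving the select lines.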
4.4. sort Module
The sort module was the most used module during the detection process. Sorting networks were chosen to implement sorting in the FPGA [25].
A sorting network is one of the most efficient and traditional ways of sorting in FPGA. Sorting networks are attractive for two main reasons: they do not require control instructions, and they are relatively easy to parallelize owing to the simplicity of their data flow. They are adequate for sorting short arrays whose length is known beforehand.
According to the detection algorithm, besides sorting, the network must indicate the position that each element originally occupied in the array before entering the network and being sorted, similar to what MATLAB does with its built-in sorting function [26]. This problem had already been addressed in hardware [27], so the architecture proposed there was used here.
The implemented sorting network consisted of purely combinational comparators; therefore, as the network grew according to the sorting needs, the critical path of the system increased as well.
Figure 6 shows the structure of the sorting network, and Table 6 describes the inputs and outputs of the module.
The vector Sorting_Elements came from the registers in det1 and det2, and Sorted_Elements was composed of the sorted data and their respective indexes that were going to be stored in sregs.
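The behavior required from the sort module, sorting while tracking original positions, can be emulated in software with a fixed network of compare-exchange operations. The sketch below uses an odd-even transposition network, chosen only because it is simple and regular; the network from [27] may use a different topology. Each comparator swaps (value, index) pairs together, so the output is the sorted data plus the original indexes, analogous to Sorted_Elements. The function names are hypothetical.

```python
def odd_even_transposition_network(n):
    """Comparator stages for an n-input odd-even transposition sorting network."""
    stages = []
    for stage in range(n):                      # n stages guarantee a sorted output
        start = stage % 2
        stages.append([(i, i + 1) for i in range(start, n - 1, 2)])
    return stages

def sort_with_indexes(values):
    """Apply the fixed network to (value, original index) pairs; returns (sorted, indexes)."""
    pairs = [(v, k) for k, v in enumerate(values)]
    for stage in odd_even_transposition_network(len(values)):
        for i, j in stage:                      # comparators in a stage are independent
            if pairs[i][0] > pairs[j][0]:       # compare-exchange carries the index along
                pairs[i], pairs[j] = pairs[j], pairs[i]
    return [v for v, _ in pairs], [k for _, k in pairs]

print(sort_with_indexes([0.9, 0.1, 0.5, 0.3]))
```

The comparator positions do not depend on the data, which is what makes a sorting network purely combinational; its depth (here n stages) is what lengthens the critical path as the network grows.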
4.5. fd and ctrl Modules
The fd module decided whether the current detected indexes were accepted as valid. It did so by comparing the detection results with the predefined parameters of Algorithm 3 that are specific to QSM detectors. If the results were accepted, the detection process ended; otherwise, det2 started over with different data. This was implemented with counters and a state machine, emulating the behavior of the for loop in Algorithm 3.
The ctrl module grouped three independent modules that controlled the three main parts of the detector (det1, det2_p1, and det2_p2).
The control modules for det1 and det2_p1 were composed mainly of counters that tracked the number of elapsed clock cycles, representing their respective for loops in the algorithms.
The control module for det2_p2 differed from the other two. It consisted of a state machine and counters that performed the for loops in Lines 11 and 13 of Algorithm 2. As seen in Line 13, the lim variable was not known until execution time and changed depending on the data, which is why a state machine was required for proper control.
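To illustrate why a data-dependent bound calls for a state machine rather than a free-running counter, the following sketch models a controller for two nested loops whose inner limit is only known at run time, in the spirit of Lines 11 and 13 of Algorithm 2. The state names, the `get_lim` callback, and the class name are hypothetical; each call to `step` stands for one clock cycle.

```python
IDLE, LOAD_LIM, INNER, NEXT_OUTER, DONE = range(5)

class NestedLoopCtrl:
    """Cycle-by-cycle controller for: for i in range(n_outer): for j in range(lim(i))."""

    def __init__(self, n_outer, get_lim):
        self.n_outer, self.get_lim = n_outer, get_lim
        self.state, self.i, self.j, self.lim = IDLE, 0, 0, 0

    def step(self, start=False):
        """Advance one clock cycle; return the (i, j) pair enabled this cycle, or None."""
        enabled = None
        if self.state == IDLE and start:
            self.i, self.state = 0, LOAD_LIM
        elif self.state == LOAD_LIM:
            self.lim, self.j = self.get_lim(self.i), 0     # bound sampled at run time
            self.state = INNER if self.lim > 0 else NEXT_OUTER
        elif self.state == INNER:
            enabled = (self.i, self.j)                     # activation flag for the data path
            self.j += 1
            if self.j == self.lim:
                self.state = NEXT_OUTER
        elif self.state == NEXT_OUTER:
            self.i += 1
            self.state = LOAD_LIM if self.i < self.n_outer else DONE
        return enabled

def run(ctrl, max_cycles=1000):
    """Drive the controller until DONE and collect the enabled (i, j) pairs."""
    trace = [ctrl.step(start=True)]
    for _ in range(max_cycles):
        if ctrl.state == DONE:
            break
        trace.append(ctrl.step())
    return [t for t in trace if t is not None]

lims = [2, 0, 3]
print(run(NestedLoopCtrl(3, lambda i: lims[i])))
```

A plain counter would need the bound before the loop starts; the LOAD_LIM state samples it each time the outer index advances, which also handles a zero bound by skipping the inner loop.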