Adaptive Bit-Labeling Design for Probabilistic Shaping Based on Residual Source Redundancy

By using the residual source redundancy to achieve the shaping gain, a joint source-channel coded modulation (JSCCM) system has been proposed as a new solution for probabilistic amplitude shaping (PAS). However, the source and channel codes in the JSCCM system must be designed specifically for a given source probability to ensure optimal PAS performance, which is undesirable for systems with dynamically changing source probabilities. In this paper, we propose a new shaping scheme that optimizes the bit-labeling of the JSCCM system. Instead of the conventional fixed labeling, the proposed bit-labelings are adaptively designed according to the source probability and the source code. Since switching between labelings is simple, the proposed design is a promising low-complexity alternative for obtaining the shaping gain for sources with different probabilities. Numerical results show that the proposed bit-labelings can significantly improve the bit-error rate (BER) performance of the JSCCM system.


Introduction
Bit-interleaved coded modulation (BICM) [1] with uniformly distributed symbols leads to a shaping loss and prevents the performance from approaching the Shannon limit for higher-order modulation [2]. To close the gap to the Shannon limit, probabilistic amplitude shaping (PAS) schemes were introduced using a distribution matcher (DM) [3]. In [4,5], PAS schemes using non-binary and protograph low-density parity-check (LDPC) codes were investigated for BICM systems. To further reduce the complexity of the binary DM, a PAS scheme via simplified sign-bit shaping was proposed for high spectral-efficiency coding in [6]. In [7], a probabilistic shaping scheme for non-Gray-labelled QAM constellations was proposed.
Although PAS schemes for BICM have been extensively investigated in recent years, most of them focus only on uniform inputs. Specifically, the constant composition distribution matching (CCDM) utilized in the standard PAS works can only transform Bernoulli(1/2)-distributed input bits into output symbols [8].
However, in realistic applications, natural sources often contain substantial amounts of redundancy due to the non-uniform distribution of the source symbols and the source memory. In such cases, source codes should be utilized, which can be divided into two categories: variable-length codes (VLCs) and fixed-length codes (FLCs). Although VLCs exhibit high compression rates, a few bit errors after the channel decoder may dramatically corrupt the decoded source data. To prevent this catastrophic error propagation, FLCs have been proposed and successfully demonstrated for several applications such as GSM and wide-band adaptive multi-rate (WB-AMR) speech transmission [9,10] and image/video transmission [11-13]. Because of their limited length, FLCs still exhibit residual redundancy in their outputs.
For non-uniform sources, in [14], protograph LDPC codes are optimized under binary modulation with unequal power allocation. By using the source residual redundancy after source coding to achieve probabilistic shaping, a joint source-channel coded modulation (JSCCM) system was proposed [15]. In [15], the design of the source code parameter is targeted specifically at a given source probability, and thus the residual redundancy left after source coding can be properly used to shape the transmitted symbols. Accordingly, the source-channel code pairs optimized at one source probability may lead to extremely poor performance at another source probability due to the mismatch of the source code parameter and the source probability. However, this characteristic is undesirable for some systems with varying levels of data redundancy since the source-channel code pairs should be optimized at different source probabilities to ensure the PAS performance.
In this paper, a low-complexity method to obtain the shaping gain for sources with different probabilities is proposed for the JSCCM system. The main contributions of this paper are as follows:
• By studying the effects of bit-labeling in JSCCM systems, it is found that good bit-labelings for different source codes or different source probabilities could be different.
• Based on the achievable system rate analysis, a new shaping scheme for the JSCCM system is proposed by optimizing the bit-labeling.
• In contrast to the fixed Gray labeling [16], the adaptive design of bit-labelings for the JSCCM system is proposed according to the source codes and the source probabilities.
Since it is much simpler to switch between labelings than to optimize the source-channel code pairs for different source probabilities, the proposed design is attractive for systems with changing source statistics.
The remainder of the paper is organized as follows. Section 2 presents the JSCCM system model. Section 3 proposes an adaptive design algorithm of the bit-labeling. Section 4 discusses the performance of the system using the adaptively designed labelings in comparison with the fixed labeling. Finally, Section 5 concludes the paper.

System Model
The structure of the JSCCM system over AWGN channels is shown in Figure 1. A nonuniform memoryless binary ("0" and "1") source is considered in this paper, where the probability of "1" is denoted by $p$ ($p \neq 0.5$). The source entropy is therefore given by

$$H(p) = -p \log_2 p - (1 - p) \log_2 (1 - p). \tag{1}$$

Let us consider source protograph LDPC codes with base matrix $B_s$ and channel protograph LDPC codes with base matrix $B_c$. Then, the progressive edge growth (PEG) algorithm is employed to generate the corresponding low-density matrices $H_s$ and $H_c$ by the copy-and-permute operation [17]. The encoding process of the JSCCM system comprises two steps. First, the source bit sequence $s$ is compressed by using the source code as

$$b^{T} = H_s s^{T}, \tag{2}$$

where $b$ is the compressed bit sequence. Then, the compressed bit sequence is protected by the channel code. For systematic binary channel encoding, a systematic generator matrix can be constructed from the parity-check matrix $H_c$ and is represented by

$$G = [\, I \;\; P \,], \tag{3}$$

where $I$ is the identity matrix. The systematic channel codeword $x$ is thus obtained as

$$x = bG = [\, b \;\; c \,], \tag{4}$$

where $c = bP$ is the parity bit sequence. Assume a quadrature amplitude modulation (QAM) alphabet with $2^m$ signal points, where $m$ is the modulation order. By setting the channel code rate to $(m - 1)/m$, the channel codeword $x$ is composed of a parity bit sequence $c$ of length $N$ and a compressed bit sequence $b$ of length $N \times (m - 1)$, where $N$ represents the lifting factor. Thus, the sequence $x$ is organized as an $N \times m$ binary matrix by the interleaver as

$$[\, c^{T} \;\; b_1^{T} \;\; \cdots \;\; b_{m-1}^{T} \,], \tag{5}$$

where $b_i$, $1 \le i \le m - 1$, denotes the $i$-th block of $b$. In (5), the parity bits in $c$ are all transferred to the bit level $b_0$ and the compressed bits in block $b_i$, $1 \le i \le m - 1$, are transferred to the bit level $b_i$. Subsequently, the QAM symbol sequence $X = (X_1, X_2, \ldots, X_N)$ for transmission is obtained by row-wise mapping according to the mapping rule.
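The encoding chain described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the small matrix `H_s` is a made-up toy rather than a lifted protograph, compression is assumed to be the modulo-2 product of `H_s` with the source bits, and the blocks of the compressed sequence are assumed to be contiguous length-N segments.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a memoryless binary source with P(1) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def compress(s, H_s):
    """Each compressed bit is the modulo-2 sum of the source bits
    selected by one row of the (toy) parity-check matrix H_s."""
    return [sum(h * x for h, x in zip(row, s)) % 2 for row in H_s]


def to_bit_levels(c, b, m):
    """Arrange parity bits c (length N) and compressed bits b
    (length N*(m-1)) into N rows [b0, b1, ..., b_{m-1}].
    Parity bits go to level 0; block i of b goes to level i
    (contiguous blocks are an assumption of this sketch)."""
    N = len(c)
    assert len(b) == N * (m - 1)
    blocks = [b[i * N:(i + 1) * N] for i in range(m - 1)]
    return [[c[n]] + [blk[n] for blk in blocks] for n in range(N)]


# Toy usage: 4 source bits compressed to 2 bits by a hypothetical H_s.
H_s = [[1, 1, 0, 0],
       [0, 1, 1, 1]]
print(compress([1, 0, 0, 1], H_s))
# Toy usage: N = 2 symbols of m = 4 bits each.
print(to_bit_levels([1, 0], [0, 0, 1, 1, 0, 1], m=4))
```

Each row returned by `to_bit_levels` is the m-bit label that the symbol mapper would turn into one QAM point.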
With the constellation scaling $\alpha > 0$, the AWGN channel is described by the input-output relationship

$$Y = \alpha X + Z, \tag{6}$$

where $X$ is the modulated symbol sequence, $Y$ is the received symbol sequence, and $Z$ is the AWGN sequence. The overall transmission rate of the JSCCM system is defined in "source bits/channel symbol" as

$$R = \frac{m R_c}{R_s}, \tag{7}$$

where $R_s$ and $R_c$ represent the source coding rate and the channel coding rate, respectively. At the receiver, after demodulation and de-interleaving, the joint source and channel decoder is applied to reconstruct the source bit sequence.
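As a quick worked check of the rate definition, assume the overall rate takes the form $R = m R_c / R_s$ (an assumption of this note, chosen because it reproduces the system rates quoted in the experimental section). For 16-QAM, $m = 4$ and $R_c = (m - 1)/m = 3/4$, so each channel symbol carries three compressed bits:

```latex
\begin{align*}
R_s = \tfrac{1}{2}: \quad R &= \frac{m R_c}{R_s} = \frac{4 \cdot \tfrac{3}{4}}{\tfrac{1}{2}} = 6 \ \text{source bits/symbol},\\
R_s = \tfrac{3}{8}: \quad R &= \frac{m R_c}{R_s} = \frac{4 \cdot \tfrac{3}{4}}{\tfrac{3}{8}} = 8 \ \text{source bits/symbol}.
\end{align*}
```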

Analysis and Design of Bit-Labeling
As shown in [15], in JSCCM systems, the parity bits transferred to the bit level $b_0$ are uniformly distributed. Meanwhile, the compressed bits in block $b_i$, $1 \le i \le m - 1$, which are transferred to the bit level $b_i$, are computed by (2) as the modulo-2 sum of the source bits. Therefore, the probability distribution of the compressed bits is determined essentially by the row weight distribution of $B_s$ in combination with the source probability $p$ as [15,18]

$$p_i(1) = \sum_{k\ \mathrm{odd}} \binom{w_i}{k} p^{k} (1 - p)^{w_i - k}, \qquad p_i(0) = \sum_{k\ \mathrm{even}} \binom{w_i}{k} p^{k} (1 - p)^{w_i - k}, \tag{8}$$

where $w_i$ represents the weight of the $i$-th row of $B_s$ and $\binom{w_i}{k}$ denotes the binomial coefficient. Note that $p_i(1) + p_i(0) = 1$ and $p_i(1) \neq p_i(0)$ for $i = 1, 2, \ldots, m - 1$ when $p \neq 0.5$, thus leading to nonuniform input symbol distributions.
Let $\mathcal{X}$ denote a set of $2^m$-ary QAM constellation points. After interleaving, a symbol mapper maps an $m$-bit vector $[b_0, \ldots, b_{m-1}]$ to a symbol $X \in \mathcal{X}$. Assuming that the bits at the input of the modulator are independent, the symbol probabilities for transmission are obtained as

$$P_{B_s}(X, \phi) = \prod_{i=0}^{m-1} p_i(b_i), \qquad X = \phi(b_0, \ldots, b_{m-1}), \tag{9}$$

where $p_i(u)$ represents the probability of transmitting a bit $u \in \{0, 1\}$ at bit position $b_i$, with $p_0(0) = p_0(1) = 1/2$ for the parity bit level. For a given source probability $p$ and a source code matrix $B_s$, the probability of every bit level can be identified. Then, the symbol probability is determined by the mapping function $\phi(\cdot)$, which maps each length-$m$ binary vector $[b_0, \ldots, b_{m-1}]$ to a corresponding symbol $X$, and thus we explicitly indicate this dependence in (9).
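The bit-level probabilities can be checked numerically. The sketch below assumes, as the text states, that a compressed bit is the modulo-2 sum of `w` independent source bits, each equal to 1 with probability `p`; the function names are hypothetical. The brute-force version enumerates every length-`w` pattern and serves only as a cross-check of the odd-`k` binomial sum.

```python
from itertools import product
from math import comb


def prob_bit_one(p, w):
    """P(compressed bit = 1): an odd number of the w source bits are 1."""
    return sum(comb(w, k) * p**k * (1 - p)**(w - k)
               for k in range(1, w + 1, 2))


def prob_bit_one_bruteforce(p, w):
    """Same quantity by enumerating all 2**w source-bit patterns."""
    total = 0.0
    for bits in product((0, 1), repeat=w):
        ones = sum(bits)
        if ones % 2 == 1:
            total += p**ones * (1 - p)**(w - ones)
    return total


# Compare p = 0.04 and p = 0.96 for a few hypothetical row weights.
for w in (2, 3, 4):
    print(w, prob_bit_one(0.04, w), prob_bit_one(0.96, w))
```

Running the loop shows how the row weight, not only the source entropy, shapes the distribution seen at each bit level.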
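To make the dependence of the symbol distribution on the mapping concrete, here is a small sketch that multiplies independent bit-level probabilities for every label and assigns the product to the mapped point. Everything named here is illustrative: a toy 4-point real constellation with two hypothetical Gray labelings stands in for the 16-QAM labelings of the paper, and average energy is printed only as a rough proxy for how well a labeling suits the bit statistics (the paper's actual criterion is capacity).

```python
from itertools import product


def symbol_probabilities(bit_probs_one, labeling):
    """bit_probs_one[i] is P(b_i = 1); labeling maps each m-bit tuple
    to a constellation point. Returns {point: probability}."""
    m = len(bit_probs_one)
    probs = {}
    for bits in product((0, 1), repeat=m):
        pr = 1.0
        for i, u in enumerate(bits):
            pr *= bit_probs_one[i] if u else 1 - bit_probs_one[i]
        probs[labeling[bits]] = pr
    return probs


def average_energy(probs):
    """Mean squared magnitude of the transmitted points."""
    return sum(pr * abs(x) ** 2 for x, pr in probs.items())


# Level 0 is uniform (parity bits); level 1 is biased toward 0.
bit_probs_one = [0.5, 0.1]
# Two hypothetical Gray labelings of the points -3, -1, +1, +3.
labeling_a = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}
labeling_b = {(0, 1): -3, (0, 0): -1, (1, 0): 1, (1, 1): 3}

for name, lab in (("A", labeling_a), ("B", labeling_b)):
    probs = symbol_probabilities(bit_probs_one, lab)
    print(name, probs, average_energy(probs))
```

Both labelings are Gray, yet one places the likely bit value on the outer points and the other on the inner points, so the same bit statistics produce very different symbol distributions.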

Effects of Bit-Labelings
To illustrate the effects of bit-labelings on the JSCCM system, we compare the BER performance of two different Gray labelings. Figure 2 shows a 16-QAM constellation with two labelings, both of which follow the Gray rule that adjacent binary labels differ in exactly one bit. Figure 3 compares the BER performance of the two labelings for different source codes ($B_{s,1}$ [15] and $B_{R4JA}$ [19]) and different source probabilities ($p = 0.04$ and $p = 0.96$). Note that the two source probabilities yield the same source entropy according to (1). It can be seen that, when $B_{s,1}$ is utilized, the labeling with a competitive advantage at $p = 0.04$ is outperformed by the other labeling at $p = 0.96$. Meanwhile, the better labeling for $B_{s,1}$ results in worse performance for $B_{R4JA}$ when $p = 0.96$. Therefore, the good bit-labelings for different source probabilities or different source codes can be different, and an adaptive design of the bit-labeling according to the source probability and the source code is essential for the JSCCM system.
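One possible way to see why two source probabilities with equal entropy can favor different labelings is to write the probability of a modulo-2 sum of $w_i$ independent source bits in closed form. The derivation below is an explanatory sketch based only on the binomial theorem; it is not taken from the paper.

```latex
\begin{align*}
p_i(1)\big|_{p}
  &= \sum_{k\ \mathrm{odd}} \binom{w_i}{k} p^{k}(1-p)^{w_i-k}
   = \frac{\big((1-p)+p\big)^{w_i} - \big((1-p)-p\big)^{w_i}}{2}
   = \frac{1-(1-2p)^{w_i}}{2},\\
p_i(1)\big|_{1-p}
  &= \frac{1-(-1)^{w_i}(1-2p)^{w_i}}{2}
   = \begin{cases}
       p_i(1)\big|_{p}, & w_i \ \text{even},\\[2pt]
       1 - p_i(1)\big|_{p}, & w_i \ \text{odd}.
     \end{cases}
\end{align*}
```

Replacing $p$ by $1 - p$ therefore leaves even-weight bit levels unchanged but swaps the roles of "0" and "1" on odd-weight levels, so a labeling that places the likely bit value on low-energy points for $p = 0.04$ may place it on high-energy points for $p = 0.96$, depending on the row weights of the source code.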

An Adaptive Design Scheme of Bit-Labeling
In this section, we consider the adaptive design procedure that optimizes the bit-labeling according to the source probability and the source code for a given target transmission rate, based on the achievable system rate analysis.
For the reliable transmission of an asymmetric source with entropy $H(p)$ over a memoryless AWGN channel with capacity $C$ at a rate of $R$ source bits per channel symbol, the Shannon limit can be expressed as [20]

$$H(p) R < C, \tag{10}$$

where $C = \log_2 (1 + R \cdot E_s/N_0)$ represents the capacity of two independent Gaussian channels in parallel, $E_s$ is the average energy per source bit, and $N_0$ is the one-sided noise power spectral density. Given $p$ and $R$, the Shannon limit can be interpreted as the smallest $E_s/N_0$ satisfying (10) and expressed as

$$\left(\frac{E_s}{N_0}\right)_{\mathrm{Shannon}} = \frac{2^{H(p) R} - 1}{R}. \tag{11}$$

For the QAM-modulated AWGN channel with the alphabet $\mathcal{X}$ and the input probability distribution $P_{B_s}(X, \phi)$ given by (9), the channel capacity is calculated as

$$C_{B_s}\!\left(\frac{E_s}{N_0}, \phi\right) = \sum_{X \in \mathcal{X}} P_{B_s}(X, \phi) \int P_\alpha(Y|X) \log_2 \frac{P_\alpha(Y|X)}{\sum_{X' \in \mathcal{X}} P_{B_s}(X', \phi) P_\alpha(Y|X')} \, dY, \tag{12}$$

where

$$\alpha = \sqrt{\frac{(E_s/N_0) R N_0}{\sum_{X \in \mathcal{X}} P_{B_s}(X, \phi) |X|^2}} \quad \text{and} \quad P_\alpha(Y|X) = \frac{1}{\pi N_0} \exp\!\left(-\frac{|Y - \alpha X|^2}{N_0}\right).$$

According to (12), $C_{B_s}(E_s/N_0, \phi)$ for a given source probability $p$ and a source code matrix $B_s$ depends on $E_s/N_0$ and the mapping function $\phi$. Thus, by properly designing $\phi$, a shaping gain in $E_s/N_0$ can be obtained. However, it is impossible to derive an analytical solution for $\phi$, and the search for $\phi$ has to resort to an exhaustive method.
In Algorithm 1, an adaptive design procedure is proposed to search for the optimal mapping function $\phi$, also known as the optimal bit-labeling, according to the given source probability $p$ and the source code $B_s$ at a target transmission rate $R$. For a practical search, we start from the smallest $E_s/N_0$ satisfying (10), denoted by $(E_s/N_0)_{\mathrm{Shannon}}$, and gradually increase the value of $E_s/N_0$ until the achievable system rate, denoted as $C_{B_s}(E_s/N_0, \phi)/H(p)$, is around $R$, which is the operating point at which we wish to optimize the bit-labeling. Then, we evaluate the channel capacity $C_{B_s}(E_s/N_0, \phi)$ for different mapping functions and select the mapping function that exhibits the highest capacity. Note that the number of different labelings is very large for constellations with high-order modulations such as 16-QAM, so we focus on Gray codes due to their good error rate performance.
Example: Consider $p = 0.05$, $m = 4$, $R = 6$ bits/symbol, and $\Delta = 0.1$ dB. The source code is the rate-1/2 $B_{s,1}$ code [15]. In Figure 4, the achievable system rates $C_{B_s}(E_s/N_0, \phi)/H(p)$ versus $E_s/N_0$ for a 16-QAM constellation using the optimized labeling $L_{opt1}$ and labeling $L$ are provided. For comparison, the Shannon limits $C/H(p)$ are also provided. Compared to labeling $L$, when $R$ is 6 bits/symbol, the JSCCM system with the optimized labeling obtains a 0.5 dB shaping gain, which reduces the gap to the Shannon limit.
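The two quantities in this paragraph can be evaluated numerically. The sketch below is a hedged illustration, not the authors' code: it assumes the Shannon limit is obtained by solving $H(p)R = \log_2(1 + R \cdot E_s/N_0)$ for $E_s/N_0$, normalizes $N_0 = 1$, scales the constellation so that the average symbol energy equals $R \cdot E_s$, and replaces the capacity integral with a Monte Carlo average. The function names and the sample count are arbitrary choices.

```python
import math
import random


def binary_entropy(p):
    """Entropy in bits of a binary source with P(1) = p, 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def shannon_limit(p, R):
    """Smallest Es/N0 (linear scale) with H(p)*R = log2(1 + R*Es/N0)."""
    return (2 ** (binary_entropy(p) * R) - 1) / R


def capacity_mc(points, probs, es_n0, R, n_samples=20000, seed=0):
    """Monte Carlo estimate (bits/symbol) of the mutual information of
    Y = alpha*X + Z for complex points with the given prior, N0 = 1."""
    rng = random.Random(seed)
    n0 = 1.0
    avg_energy = sum(q * abs(x) ** 2 for x, q in zip(points, probs))
    alpha = math.sqrt(es_n0 * R * n0 / avg_energy)
    sigma = math.sqrt(n0 / 2)  # noise std per real dimension
    total = 0.0
    for _ in range(n_samples):
        x = rng.choices(points, weights=probs)[0]
        y = alpha * x + complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
        num = math.exp(-abs(y - alpha * x) ** 2 / n0)
        den = sum(q * math.exp(-abs(y - alpha * xp) ** 2 / n0)
                  for xp, q in zip(points, probs))
        total += math.log2(num / den)
    return total / n_samples


# Shannon limit in dB for the example parameters p = 0.05, R = 6.
print(10 * math.log10(shannon_limit(0.05, 6)))
```

Passing the symbol prior induced by a particular labeling as `probs` gives the capacity for that labeling, and dividing the result by `binary_entropy(p)` gives the achievable system rate in source bits per symbol.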
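The search described above can be sketched as a simple sweep. This is one possible reading of the procedure, not a reproduction of Algorithm 1: it steps $E_s/N_0$ upward from a starting value and stops at the first grid point where some candidate labeling reaches the target rate, returning the candidate with the highest rate there. The function name, the `rate_fn` callback (which would return capacity divided by $H(p)$ for a labeling at a given $E_s/N_0$ in dB), the 0.1 dB default step, and the 30 dB cap are all assumptions of this sketch.

```python
def design_labeling(candidates, rate_fn, target_rate, start_db,
                    step_db=0.1, max_db=30.0):
    """Sweep Es/N0 (dB) upward and pick the best candidate labeling.

    candidates: dict mapping a name to a labeling object.
    rate_fn(labeling, es_n0_db): achievable system rate of a labeling.
    Returns (best_name, es_n0_db), or (None, None) if the target rate
    is not reached below max_db.
    """
    k = 0
    while True:
        # Use an integer step counter to avoid accumulating float error.
        es_n0_db = start_db + k * step_db
        if es_n0_db > max_db:
            return None, None
        rates = {name: rate_fn(lab, es_n0_db)
                 for name, lab in candidates.items()}
        best = max(rates, key=rates.get)
        if rates[best] >= target_rate:
            return best, es_n0_db
        k += 1


# Toy usage with a made-up linear rate model (not a real capacity curve):
# each "labeling" is just a slope, and the rate grows linearly in dB.
toy_candidates = {"A": 1.0, "B": 1.2}


def toy_rate(slope, es_n0_db):
    return slope * (es_n0_db + 5.0)


print(design_labeling(toy_candidates, toy_rate, target_rate=6.05,
                      start_db=-4.0))
```

In a real design, `candidates` would hold the Gray labelings under consideration and `rate_fn` would evaluate the capacity for the symbol distribution that each labeling induces.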

Experimental Results
The system settings for different cases are presented in Table 1. In particular, since the source coding rates are 1/2 and 3/8, the resulting system rates derived from (7) are 6 bits and 8 bits per symbol, respectively. In all experiments, the length of the source sequence is set to 3600 bits. For cases with different source coding rates, the source and channel codes are designed case by case with respect to the source and channel coding rates, respectively. The optimal labelings obtained by Algorithm 1 for different cases are shown in Figures 5 and 6 above the constellation points.
The BER curves of the system with the optimized labelings (solid lines) and the labeling $L$ (dashed lines) for different cases are depicted in Figures 7-10. For instance, in Figure 7, for the source code $B_{s,1}$, the optimized labeling $L_{opt1}$ achieves approximately 0.5 dB gain over labeling $L$ at a BER of $10^{-6}$. Additionally, for the source code $B_{R4JA}$, our designed $L_{opt2}$ outperforms $L$ by approximately 1.3 dB at a BER of $10^{-5}$. Similar observations can be made in Figures 8-10. In particular, the source code $B_{s,2}$ in Figure 10, which has extremely poor BER performance with labeling $L$, achieves a performance gain of at least 3 dB after labeling design. More generally, the performance gain depends on how well the utilized labeling matches the source code and the source probability. For example, a bit-labeling that is highly mismatched to the source probability and the source code can yield a loss of shaping gain, in which case the performance can be greatly improved by the proposed design of bit-labeling. If the labeling is already well matched to the source probability and the source code, the advantage of our method diminishes.

Conclusions
In this paper, a new PAS scheme via bit-labeling design for the JSCCM system is proposed. In contrast to the fixed labeling, an adaptive bit-labeling design algorithm is proposed according to the source probability and the source code, based on the achievable system rate analysis. Simulation results demonstrate that the adaptively designed bit-labelings significantly improve the BER performance of the JSCCM system.

Author Contributions:
Conceptualization, C.C. and Q.C.; methodology, C.C. and S.L.; software, C.C.; validation, S.L., L.Z., and Q.C.; formal analysis, C.C. and S.L.; investigation, C.C.; resources, S.L.; data curation, C.C.; writing-original draft preparation, C.C.; writing-review and editing, Q.C.; visualization, C.C.; supervision, Q.C.; project administration, C.C.; funding acquisition, C.C. and Q.C. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: