Joint Intra/Inter-Slot Code Design for Unsourced Multiple Access in 6G Internet of Things

Unsourced multiple access (UMA) is a technology for massive, low-power, and uncoordinated Internet-of-Things (IoT) access in 6G wireless systems, improving connectivity and energy efficiency under guaranteed reliability. The design of the multi-user coding scheme is a critical problem for UMA. This paper proposes a UMA coding scheme based on the T-Fold IRSA (irregular repetition slotted Aloha) paradigm with joint intra/inter-slot code design and optimization. Our scheme adopts interleave-division multiple access (IDMA) to enhance the intra-slot coding gain and a low-complexity joint intra/inter-slot SIC (successive interference cancellation) decoder to recover the multi-user payloads. Based on error event decomposition and density evolution analysis, we build a joint intra/inter-slot coding parameter optimization algorithm that minimizes the SNR (signal-to-noise ratio) required at a target system packet loss rate. Numerical results indicate that the proposed scheme achieves an energy efficiency gain by balancing the intra-slot and inter-slot coding gains while maintaining relatively low implementation complexity.


Background
Recently, academia and industry have proposed many prospects for the evolution of the IoT (Internet-of-Things) in the next generation [1]. In general, the 6G IoT system faces three main challenges: the rapid growth of connectivity [2], the guarantee of low latency under specific reliability [3], and the requirement of low power consumption and low implementation complexity [4].
To address these challenges, Polyanskiy proposed the concept of unsourced multiple access (UMA) in 2017 [5]. UMA removes the coordination center from the network to reduce the transmission cost and latency caused by the frequent access of vast numbers of short packets, which is the major drawback of traditional schemes such as OMA (Orthogonal Multiple Access) [6], coordinated NOMA (Non-Orthogonal Multiple Access) [7], and grant-free access [8]. The uplink channel is always available to users, and the network works in an unsourced manner, adapting to large-scale, frequent connection requests. As a result, the system optimization criterion changes from the sum rate to the PUPE (Per-User Probability of Error), that is, achieving massive access under a specific average packet loss rate. The achievability bound of UMA was given in [5], laying the foundation for research in this field.

Related Works
After that, many works emerged, exploring how to approach the performance bound through practical coding design under UMA [9,10]. These schemes are based on three basic paradigms: random spreading, T-Fold Aloha [11], and T-Fold IRSA (Irregular Repetition Slotted Aloha) [12].
The random spreading scheme regards the entire transmission frame as the available length of code chips for each user. The users' data packets are superimposed after symbol-level spreading. Representative works are the sparse IDMA (Interleave-Division Multiple Access) [13] based on LDPC (Low-Density Parity-Check) codes, and Polar-RS (Random Spreading) [14] and Polar-SS (Sparse Spreading) [15] based on polar codes. Such schemes occupy the entire frame to form a low-rate code, thereby obtaining the highest coding gain and a performance close to the achievability bound. However, their high implementation complexity is a problem that cannot be ignored.
T-Fold Aloha divides the transmission frame into slots of equal length. On each slot, a multi-access code is designed to control the error rate under multi-user superposition, ensuring that up to T superimposed users remain decodable. A classical realization is the concatenated code scheme [16], where the outer code guarantees the T-Fold feature and the inner code combats noise. The compressed sensing (CS) encoder in the CCS (Coded Compressed Sensing) [17] and SPARC (SParse Regression Code) [18] schemes plays both roles at the same time and has higher coding efficiency. However, the tree code used in CCS suffers from error propagation, and the CS coding gain is still limited when the number of users is high.
The T-Fold IRSA paradigm originated from the research on contention resolution Aloha [19] by G. Liva. Its design consists of two main aspects: the T-Fold intra-slot code and the packet-level inter-slot code. Cheng et al. proposed the SC-LDPC+IRSA scheme [20], spreading the SC-LDPC (Spatially Coupled LDPC) packets by IRSA. Polar-IRSA [21] replaced the SC-LDPC code with an SCL-decoded multi-user polar code, which performs better at short code lengths, and achieved a higher performance gain. In addition, [22] gives a theoretical analysis of the IRSA framework based on the finite block length bound [23] and asymptotic analysis. Such schemes have performance advantages over T-Fold Aloha; under certain conditions, their performance is close to that of the high-complexity random spreading schemes. However, these schemes rely on idealized asymptotic assumptions, their exploitation of the intra-slot coding gain is limited, and a gap to the achievability bound remains.
In addition, some works transfer the theoretical analysis [24] and the coding scheme design [25] from the AWGN channel to the Rayleigh block fading channel. Besides, MIMO (Multi-Input Multi-Output) transmission [26], channel estimation, activity detection [27], and their co-design [28] push the horizon from theory to practical deployment.

Contributions
Following the works under the T-Fold IRSA paradigm, we continue to investigate the trade-off between the intra-slot coding gain and the inter-slot diversity gain. Besides, under the premise of practical realizability, we attempt to improve or balance performance and complexity. The main contributions of the proposed scheme are as follows:

1. Enhanced intra-slot coding structure. We apply the IDMA scheme with a CS header to the intra-slot code design. The user payload is split into two parts, encoded by the IDMA and CS encoders separately, and combined as the intra-slot codeword. The CS part carries the user-specific interleaver pattern of the IDMA codeword, and the IDMA part is the multi-access code resolving the superposition interference.

2. Joint intra/inter-slot iterative decoder. Under the IRSA paradigm, the superimposed spreading pattern of the intra-slot code packets is expanded as a compound inter-slot factor graph, which is recovered by combining the CS pilot decoding results among the slots. The ESE+BP decoder of the intra-slot IDMA code is the operation embedded on the slot nodes. The inter-slot SIC iteration is performed on the graph to eliminate the interference on the slot nodes, making the overload slot nodes decodable.

3. Joint intra/inter-slot coding parameter optimization. To minimize the required SNR under the given PUPE target and a limited number of channel resources, we follow the idea of error event decomposition to build the parameter optimization framework. The error caused by each coding module is modeled as a function of its coding parameters. In particular, the inter-slot degree distribution is analyzed by density evolution under finite-length realization and energy cost conditions. We then integrate these error functions into a global optimization problem and design a heuristic bootstrap search algorithm to jointly optimize all related parameters, including the intra-slot CS pilot length, the IDMA coding rate, and the inter-slot degree distribution.

Content Organization
This article organizes its content in the following way. Section 2 gives some basic concepts and universal notations to help clarify the whole framework. Section 3 focuses on the encoder's design and the proposed scheme's decoding algorithm. Next, Section 4 discusses the coding parameter optimization problem based on error analysis by decomposition and the unified joint optimization algorithm. Finally, by numerical simulation, Section 5 evaluates its energy efficiency and computational complexity under the optimized configuration.

Definitions and Notations
The core problem of UMA is to construct a coding scheme that allows $K_a$ unsourced active users to each transmit a length-$B$ data payload over a fixed frame of length $N_{tot}$ at a target PUPE $\epsilon$ and a given SNR level. By definition [5], the PUPE can be expressed as:
$$P_e = \frac{1}{K_a} \sum_{k=1}^{K_a} \Pr\left(\hat{w}_k \neq w_k\right) \le \epsilon, \qquad (1)$$
where $k$ is the user index and $\hat{w}_k$ is the decoded version of the transmitted payload $w_k$. When $\epsilon$, $N_{tot}$, and $B$ are given, for each $K_a$ there exists an optimal configuration of a coding scheme that minimizes the required SNR, which is referred to as the SNR threshold. Thus, the energy efficiency of any UMA scheme can be characterized by its $K_a$-SNR threshold curve. Under the T-Fold IRSA paradigm, the entire frame of length $N_{tot}$ is sliced into $V$ slots, each of length $N = N_{tot}/V$. The degree distribution $\lambda(x)$ determines the distribution of the repetition rate of the intra-slot coded packets for each user. Since the packets are randomly distributed over the slots, the number of superimposed packets $L_v$ on slot $v$ is also random. The multi-user interference increases with $L_v$. T-Fold means that the intra-slot code guarantees correct decoding of up to $T_{th}$ superimposed users. Thus, $T_{th}$ is the threshold of the T-Fold IRSA system.
Some notations for matrices and vectors are as follows. A bold uppercase letter $A$ represents a matrix, and $A_{i,j}$ represents its element at the $i$-th row and the $j$-th column. A bold lowercase letter $c$ represents a vector, and $c_i$ is its $i$-th element. Vectors are column vectors by default.
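As a small illustration of the PUPE definition, the following Python sketch estimates the empirical per-user error from one decoding attempt. It assumes the unsourced convention that the decoder returns an unordered list of payloads without user identities; the function name `empirical_pupe` and the toy payload strings are hypothetical and not part of the original scheme.

```python
from typing import Sequence


def empirical_pupe(sent: Sequence[str], decoded_list: Sequence[str]) -> float:
    """Fraction of transmitted payloads that are missing from the decoder output."""
    recovered = set(decoded_list)
    # A user is in error when its payload does not appear in the output list.
    misses = sum(1 for w in sent if w not in recovered)
    return misses / len(sent)


# Toy example with K_a = 4 users and B = 4 bits per payload.
sent = ["0101", "1100", "0011", "1111"]
decoded = ["0101", "0011", "1111", "0000"]  # one miss and one false alarm
print(empirical_pupe(sent, decoded))  # 0.25
```

Averaging this quantity over many simulated frames gives a Monte-Carlo estimate that can be compared with the target $\epsilon$.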

Joint Intra/Inter-Slot Coding Scheme
The structure of the proposed coding scheme is given in this section. In general, it is an intra/inter-slot nested structure, as Figure 1 shows.
The payload $w_k$ of each user is encoded by the intra-slot code to produce $v_k$, which is then randomly repeated $\beta_k$ times to form an inter-slot packet-level spreading pattern. On the receiver side, the slot-by-slot intra-slot decoding is integrated into the packet-level SIC iteration on the inter-slot factor graph. The intra-slot codeword $v_k$ contains the IDMA codeword $x_k$ and a CS pilot $s_k$ that indicates both the intra-slot interleaving pattern and the inter-slot spreading pattern. The inter-slot code is an IRSA structure enabling the SIC. The proposed scheme is introduced under this framework.

Encoder
The intra-slot encoder is concatenated with the inter-slot encoder. The overall scheme of the encoder is depicted in Figure 2a. For each user k, the intra-slot coded packets are the non-zero 'chips' of the inter-slot code. The intra-slot codeword consists of the CS pilot and the IDMA-coded part, each carrying part of the payload. Due to the unsourced feature, users must adopt a common codebook. Therefore, the CS pilot not only determines the user-specific configuration of the intra-slot code but also represents the structure of the inter-slot code, which are both randomized to deal with the multi-user interference.

Intra-Slot Encoding
To enable decoding, the receiver must know the permutation pattern of each user, which is therefore transmitted as an encoded pilot placed ahead of the IDMA-coded data part. Thus, we slice $w_l$, the information payload of user $l$, into two parts: the pilot info sequence $d_l = [w_{l,1}, \cdots, w_{l,B_s}]$ of length $B_s$, which carries both the permutation pattern and part of the user info, and the data info sequence $b_l = [w_{l,B_s+1}, \cdots, w_{l,B}]$ of length $B_c = B - B_s$ for the IDMA encoder.
The function of the CS encoder is to map $d_l$ into the pilot $s_l$. The binary sequence $d_l$ is first converted to the decimal number $\tau_l$. Then $\tau_l$ is mapped bijectively to a column index of the $N_s$-by-$M_s$ sensing matrix $A$, usually configured as a normalized Gaussian random matrix. As the length of $d_l$ is $B_s$, the number of columns of $A$ must satisfy $M_s \ge 2^{B_s}$. A natural but effective mapping is to choose the $(\tau_l + 1)$-th column of $A$ as the length-$N_s$ pilot, i.e., $s_l = a_{\tau_l + 1}$.
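The pilot mapping can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the sensing matrix is stored column by column, its entries are i.i.d. Gaussian scaled by the square root of the pilot length, and the names `make_sensing_matrix`, `bits_to_index`, `cs_pilot`, and `pilot_len` are hypothetical.

```python
import math
import random


def make_sensing_matrix(pilot_len, num_columns, seed=0):
    """Return the columns of a normalized Gaussian matrix (assumed common to all users)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(pilot_len)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(pilot_len)]
            for _ in range(num_columns)]


def bits_to_index(bits):
    """Convert the pilot info bits (MSB first) to the decimal index tau."""
    tau = 0
    for b in bits:
        tau = (tau << 1) | b
    return tau


def cs_pilot(bits, columns):
    """Pick the (tau + 1)-th column, i.e. 0-based index tau, as the pilot."""
    tau = bits_to_index(bits)
    return tau, columns[tau]


B_s = 4          # pilot info bits per user (toy value)
pilot_len = 16   # pilot length in symbols (toy value)
columns = make_sensing_matrix(pilot_len, 2 ** B_s)
tau, s = cs_pilot([1, 0, 1, 1], columns)
print(tau, len(s))  # 11 16
```

Because every user draws from the same matrix, two users with identical pilot bits pick the same column, which is the collision event discussed later in the error analysis.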
The data sequence $b_l$ is sent to a common rate-$R_L$ LDPC encoder identical among all users, generating the codeword bit sequence $u_l$ of length $B_u = B_c/R_L$. To introduce intra-slot coding diversity, $u_l$ is repeated $R_r$ times to produce a low-rate codeword $c_l$ of length $R_r B_u$, where the repetition rate is also the same for all users. The next step is the user-specific interleaver. As mentioned before, the permutation pattern is determined by $d_l$. The decimal $\tau_l$ converted from $d_l$ is used to choose the $\tau_l$-th pattern $f_{\tau_l} \in F$ in the common pattern set $F$. Through the permutation function $\tilde{c}_l = f_{\tau_l}(c_l)$, we get the interleaved bit sequence $\tilde{c}_l$. According to the constellation $G$, $\tilde{c}_l$ is modulated to the IDMA codeword symbol sequence $x_l$ of length $N_c$. Finally, $s_l$ and $x_l$ are stitched together to form the whole packet of user $l$, i.e., $v_l = [s_l; x_l]$. After assembling, the length of $v_l$ is $N = N_s + N_c$. In general, the intra-slot encoder is common to every user. This not only keeps the encoding process simple but also satisfies the common codebook requirement of the unsourced setting. The ability of the intra-slot code to distinguish superimposed user packets mainly comes from the compressed sensing pilot code and the user-specific random interleaver of IDMA, both of which depend on the randomness of the pilot info slice $d_l$. The whole structure of the intra-slot encoder is shown in Figure 2b.
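The repetition, interleaving, and modulation steps can be sketched as follows. This is only an illustration under assumptions that go beyond the text: the LDPC encoder is replaced by a fixed stand-in bit sequence, repetition is applied bit by bit, the common pattern set is generated by seeding a shuffle with the pattern index, and the constellation is BPSK. All function names are hypothetical.

```python
import random


def repeat(bits, r):
    """Repeat every coded bit r times (one possible repetition layout)."""
    return [b for b in bits for _ in range(r)]


def interleaver_pattern(tau, length):
    """Pattern number tau of a common set, generated here by a seeded shuffle."""
    perm = list(range(length))
    random.Random(tau).shuffle(perm)
    return perm


def interleave(bits, perm):
    return [bits[p] for p in perm]


def deinterleave(bits, perm):
    out = [0] * len(bits)
    for i, p in enumerate(perm):
        out[p] = bits[i]
    return out


def bpsk(bits):
    """Map bit 0 to +1.0 and bit 1 to -1.0."""
    return [1.0 - 2.0 * b for b in bits]


u = [1, 0, 1, 1, 0, 0]                  # stands in for the LDPC codeword
c = repeat(u, 3)                        # low-rate codeword, length 18
perm = interleaver_pattern(11, len(c))  # pattern selected by the pilot index
c_tilde = interleave(c, perm)
x = bpsk(c_tilde)

# The receiver can undo the permutation once it has recovered the pilot index.
assert deinterleave(c_tilde, perm) == c
```

Since the pattern depends only on the pilot index, a receiver that recovers the index can regenerate the same permutation without any per-user signaling.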

Inter-Slot Irregular Spreading
The inter-slot encoding builds on the intra-slot code and adds irregular spreading diversity among user packets through random scheduling, while the encoder keeps the same structure for all users. Each user $k$ determines its number of packet replicas $\beta_k$ with a local random scheduler such that the distribution $P(\beta_k = i)$ approaches the preset $\lambda(x)$. Under the IRSA scheme, $\lambda(x)$ is the degree distribution polynomial of $\beta_k$, written as:
$$\lambda(x) = \sum_{i=1}^{I_{\max}} \lambda_i x^i, \qquad (2)$$
where $\lambda_i$ is the probability that a user node has degree $i$ on the packet superposition factor graph, satisfying the normalization constraint $\lambda(1) = \sum_{i=1}^{I_{\max}} \lambda_i = 1$. The random scheduler therefore generates $\beta_k \sim \lambda(x)$ and then creates a length-$V$ vector:
$$[\underbrace{1, \cdots, 1}_{\beta_k}, \underbrace{0, \cdots, 0}_{V - \beta_k}]^T, \qquad (3)$$
which is randomly permuted so that the 1-elements are uniformly placed, producing the $\beta_k$-sparse irregular diversity mapping vector $\delta_k$. The intra-slot codeword $v_k$ is spread by taking the Kronecker product with the spreading pattern $\delta_k$:
$$z_k = \delta_k \otimes v_k, \qquad (4)$$
so that $v_k$ is sent on each slot whose element in $\delta_k$ is one. When $\beta_k = 1$ the packet is not repeated, and when $\beta_k > 1$ the packet gains repetition diversity. Each user follows the above structure, and the spread packets $z_k$ are superimposed in the AWGN (Additive White Gaussian Noise) channel. The received signal $y$ is:
$$y = \sum_{k=1}^{K_a} z_k + n, \qquad (5)$$
where $n$ is the Gaussian noise with variance $\sigma^2$. Through this inter-slot encoding, the repeated user packets on different slots form a sparse spreading structure, providing packet-level diversity gain.
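The scheduling and spreading steps can be illustrated with a short Python sketch. It assumes the degree distribution is given as a dictionary from degree to probability and that packets are plain lists of real symbols; the names `sample_degree`, `spreading_pattern`, `spread`, and `superimpose` are hypothetical, and the toy distribution is not an optimized one.

```python
import random


def sample_degree(lam, rng):
    """Draw the number of replicas beta from lam = {degree: probability}."""
    degrees = list(lam)
    weights = [lam[d] for d in degrees]
    return rng.choices(degrees, weights=weights, k=1)[0]


def spreading_pattern(beta, num_slots, rng):
    """beta ones followed by zeros, then randomly permuted."""
    delta = [1] * beta + [0] * (num_slots - beta)
    rng.shuffle(delta)
    return delta


def spread(delta, v):
    """Kronecker product of delta and v as a flat list of length V * N."""
    z = []
    for d in delta:
        z.extend(d * s for s in v)
    return z


def superimpose(frames, sigma, rng):
    """Add all users' frames and white Gaussian noise with standard deviation sigma."""
    length = len(frames[0])
    return [sum(f[i] for f in frames) + rng.gauss(0.0, sigma)
            for i in range(length)]


rng = random.Random(1)
lam = {1: 0.5, 2: 0.3, 3: 0.2}   # toy degree distribution
V = 6                            # number of slots
packets = [[1.0, -1.0], [-1.0, -1.0], [1.0, 1.0]]  # toy intra-slot codewords, N = 2
frames = []
for v in packets:
    beta = sample_degree(lam, rng)
    delta = spreading_pattern(beta, V, rng)
    frames.append(spread(delta, v))
y = superimpose(frames, sigma=0.1, rng=rng)
print(len(y))  # 12
```

Each user runs the same code with its own local randomness, so no coordination is needed to produce the sparse superposition.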

Joint Intra/Inter-Slot Decoder on Compound Factor Graph
Accordingly, the receiver follows the mirrored structure. The inter/intra-slot coding configurations are first recovered by pilot decoding, and then the intra-slot decoder is embedded as an operation on the slot nodes of the inter-slot SIC procedure. The receiver will be introduced in the following subsections under the above framework.

Intra-Slot Decoder
The iteration begins with the intra-slot packet decoding. The received packet on each slot $v$ can be sliced from the whole received signal $y$:
$$y_v = \sum_{l=1}^{L_v} v_l + n_v, \qquad (6)$$
where $L_v$ is the number of superimposed users on slot $v$ and $n_v$ is the noise on that slot. According to the intra-slot encoding structure, $y_v$ can be further separated into two parts: the CS header $y_v^s$ and the IDMA-coded part. Since the IDMA decoding requires the interleaver pattern, the CS pilot is decoded first. $y_v^s$ can be expanded as:
$$y_v^s = \sum_{l=1}^{L_v} s_l + n_v^s = A \sum_{l=1}^{L_v} e_{\tau_l + 1} + n_v^s, \qquad (7)$$
where $e_{\tau_l + 1}$ is a 1-sparse vector whose elements are all zero except at position $\tau_l + 1$, $\sum_{l=1}^{L_v} e_{\tau_l + 1}$ is an $L_v$-sparse vector, assuming no resource collision among users, and $n_v^s$ is the noise on the pilot part. By sending $y_v^s$ to the support recovery algorithm [29], the support set $\hat{D}_v^s$, i.e., the set of active column indices of $A$, is found. The elements of $\hat{D}_v^s$ are remapped back to the decimals $\hat{\tau}_l$ by mapping the $(\hat{\tau}_l + 1)$-th column to $\hat{\tau}_l$. The decimals $\hat{\tau}_l$ are then converted to the length-$B_s$ binary sequences $\hat{d}_l$, which are the pilot info parts of the user payloads. Besides, the interleaver patterns are recovered by selecting the $\hat{\tau}_l$-th pattern $f_{\hat{\tau}_l} \in F$.
After the recovery of $f_{\hat{\tau}_l}$, the IDMA decoding can start. Since the intra-slot repetition rates are the same among users, we can use a simple linear-complexity algorithm, the ESE+BP (elementary signal estimator plus belief propagation) iterative structure [30], as depicted in Figure 3. The ESE+BP iteration runs for a specified number of times $I^1_{\max}$, and then the final hard-decision output $\hat{b}_l$ at the BP decoder of each branch is stitched together with the corresponding pilot info part $\hat{d}_l$, reconstructing the complete decoded payload $\hat{w}_l$.
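To make the ESE step concrete, the sketch below implements one chip-by-chip ESE pass for the simplest setting: BPSK users with unit channel gain on an AWGN slot, using the usual Gaussian approximation of the interference. It is an assumption-laden illustration rather than the decoder of [30]; the BP decoders, interleavers, and repetition combining that surround the ESE are omitted, and the name `ese_extrinsic` is hypothetical.

```python
import math


def ese_extrinsic(y, prior_llr, sigma2):
    """One ESE pass for superimposed BPSK users with unit channel gain.

    y            received chips on one slot
    prior_llr    prior_llr[k][j] is the a priori LLR of chip j of user k
                 (log P(x = +1) / P(x = -1)), fed back by that user's decoder
    sigma2       noise variance, assumed strictly positive
    Returns the extrinsic LLRs with the same shape as prior_llr.
    """
    num_users = len(prior_llr)
    out = [[0.0] * len(y) for _ in range(num_users)]
    for j, yj in enumerate(y):
        means = [math.tanh(prior_llr[k][j] / 2.0) for k in range(num_users)]
        variances = [1.0 - m * m for m in means]
        total_mean = sum(means)
        total_var = sum(variances) + sigma2
        for k in range(num_users):
            # Treat everything except user k's own chip as Gaussian interference.
            interference_mean = total_mean - means[k]
            interference_var = total_var - variances[k]
            out[k][j] = 2.0 * (yj - interference_mean) / interference_var
    return out


# Single user, no prior knowledge: the output reduces to 2 * y / sigma2.
print(ese_extrinsic([1.0], [[0.0]], 0.5))  # [[4.0]]
```

In an iterative receiver, these extrinsic LLRs would be deinterleaved and passed to each user's decoder, whose soft outputs come back as the next `prior_llr`.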

Inter-Slot Decoder
To recover the superimposed multi-user packets on the overload slots, the inter-slot decoding performs SIC on the compound packet-level factor graph obtained by the reconstruction process. After the initial intra-slot decoding on all slots, the CS pilots have been recovered as the pilot info $\hat{d}_v$, while the corresponding IDMA parts $\hat{b}_v$ are not all successfully decoded, especially those on the overload slots.
Using the CS pilots as pointers, the repetition relationship can be confirmed by comparing $\hat{d}_{v,k}$ in one slot with $\hat{d}_{v_1,k_1}$ on the others. According to Section 3.1.2, the number of unique pilots among all slots is the number of users $K_a$. For user packet $k$, the indexes of the slots where a replica of it exists form a set:
$$U_k = \{ v \mid \hat{d}_{v,k'} = \hat{d}_k \text{ for some packet } k' \text{ on slot } v \}, \qquad (8)$$
where $\max_k |U_k| = I_{\max}$. We combine the sets $U_1, \cdots, U_{K_a}$ into the adjacency matrix $U$, which determines whether an edge exists between user node $k$ and slot node $v$. Thus, the intra-slot packet spreading structure can be recovered as the compound factor graph in Figure 4. The inter-slot decoding described in Algorithm 1 is performed on this graph.
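The graph reconstruction amounts to grouping identical pilots across slots. A minimal Python sketch, assuming each slot's pilot decoder returns a list of recovered pilot indices and that no two users share a pilot, is given below; `build_adjacency` is a hypothetical helper name.

```python
def build_adjacency(pilots_per_slot):
    """Build the user-slot adjacency matrix from the decoded pilots.

    pilots_per_slot[v] is the list of pilot indices recovered on slot v.
    Returns (users, U) where users[k] is the pilot index of user k and
    U[k][v] == 1 if a replica of user k's packet lies on slot v.
    """
    users = sorted({tau for slot in pilots_per_slot for tau in slot})
    row_of = {tau: k for k, tau in enumerate(users)}
    U = [[0] * len(pilots_per_slot) for _ in users]
    for v, slot in enumerate(pilots_per_slot):
        for tau in slot:
            U[row_of[tau]][v] = 1
    return users, U


pilots = [[3, 7], [7], [3, 9, 7], []]   # toy pilot decoding results on V = 4 slots
users, U = build_adjacency(pilots)
print(users)  # [3, 7, 9]
print(U)      # [[1, 0, 1, 0], [1, 1, 1, 0], [0, 0, 1, 0]]
```

The number of rows equals the number of distinct pilots, which matches the statement that the unique pilots count the active users.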

Algorithm 1 Inter-slot SIC decoding on the compound packet-level factor graph
Require: The received signal $y_v$ on each slot $v$, compressed sensing matrix $A$, interleaver set $F$, constellation $G$, noise variance $\sigma^2$, threshold $T_{th}$, maximum number of SIC iterations $J$.
Ensure: Decoded payloads of all users $\hat{w}_k$.
1: Initialization: Perform the intra-slot decoding in Section 3.2.1 on each slot, and get the pilot info $\hat{d}_{v,k}$, the IDMA decoding result $\hat{b}_{v,k}$, and the number of superimposed users $L^1_v = |\hat{D}^s_v|$ on each slot; set $j = 1$;
2: Factor graph reconstruction: Compare all the CS pilots $\hat{d}_{v,k}$, and combine the repetition relationship sets $U_k$ into the adjacency matrix $U$ by (8);
3: while $j \le J$ do
4:   Update the slot node subsets $V^+_j$ and $V^-_j$ by (9), and the user node subsets $K^+_j$, $K^-_j$, and $K^+_v$ accordingly;
5:   for all $v' \in V^-_j$ do
6:     if $L^j_{v'} - |K^+_{v'}| \le T_{th}$ then
7:       Forward message passing (from user node $k$ to slot node $v'$):
8:       Remap the user messages in the effective edge set $\hat{b}_k$, $k \in K^+_{v'}$, to IDMA packets $x^{SIC}_k$;
9:       Peel off the known interference $x^{SIC}_k$ as in (11);
10:      Backward message passing (from slot node $v'$ to user node $k$):
11:      Intra-slot decoding: Perform the IDMA decoding part on slot $v'$ and get the recovered user information $\hat{b}_{v',k}$;
12:      User node update: Add/subtract the newly recovered users on slot $v'$ in $K^+_{j+1}$ / $K^-_{j+1}$;
13:      Update the slot counter $L_{v'}$;
14:    end if
15:  end for
16:  $j \leftarrow j + 1$;
17: end while

At the $j$-th SIC iteration, according to the T-Fold IRSA model, the slot node set can be separated into two subsets by the given threshold $T_{th}$:
$$V^+_j = \{ v \mid L^j_v \le T_{th} \}, \qquad V^-_j = \{ v \mid L^j_v > T_{th} \}, \qquad (9)$$
where the initial slot node degree is $L^1_v = |\hat{D}^s_v|$. Likewise, the user node set has two subsets, the successfully decoded set $K^+_j$ and the undecoded set $K^-_j$. For each underload slot node $v \in V^+_j$ that satisfies the decodable condition, its adjacent user nodes $k$ are added to $K^+_j$. Based on this graph, the message passed from a user node to a slot node is the remapped decoded IDMA packet $x^{SIC}_k$. After the SIC peeling step, if the degree of an overload slot node is reduced to the threshold $T_{th}$ or below, this slot becomes decodable, and its output messages are the reliably decoded payloads $\hat{w}_k$. Define the effective edge set of slot $v'$ as $K^+_{v'} = \{ k \mid U_{k,v'} = 1, k \in K^+_j \}$; then the decodable condition of the overload slot $v'$ after the $j$-th SIC iteration is:
$$L^j_{v'} - |K^+_{v'}| \le T_{th}. \qquad (10)$$
If this condition is satisfied, the known interference is peeled off on this slot:
$$y^{SIC}_{v'} = y_{v'} - \sum_{k \in K^+_{v'}} x^{SIC}_k. \qquad (11)$$
Then $y^{SIC}_{v'}$ is sent to the intra-slot decoder in Section 3.2.1, where the reliably decoded messages $\hat{w}_{v',k}$ can be recovered. Consequently, the user node subsets $K^+_j$ and $K^-_j$ can be updated by the backward messages. This process is performed on each overload slot $v' \in V^-_j$. As a result, $|K^-_j|$ decreases after each iteration, because the underload slots keep helping the overload ones through message propagation on the factor graph. Under appropriate conditions, the SIC converges to the required level.
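The packet-level behavior of the SIC iteration can be sketched with an idealized peeling decoder. The sketch below abstracts the intra-slot decoder as perfect whenever at most `t_th` unresolved packets remain on a slot, which is a simplifying assumption: it ignores the residual IDMA and pilot errors analyzed later. The function name `sic_peeling` is hypothetical.

```python
def sic_peeling(U, t_th, max_iter=10):
    """Idealized T-fold SIC on the packet-level factor graph.

    U[k][v] == 1 if user k has a replica on slot v.
    A slot is decodable when it holds between 1 and t_th unresolved packets.
    Returns the set of decoded user indices.
    """
    num_users = len(U)
    num_slots = len(U[0]) if U else 0
    decoded = set()
    for _ in range(max_iter):
        newly = set()
        for v in range(num_slots):
            # Packets of already decoded users are assumed to be cancelled.
            pending = [k for k in range(num_users)
                       if U[k][v] and k not in decoded]
            if 0 < len(pending) <= t_th:
                newly.update(pending)
        if not newly:
            break  # no slot became decodable, the iteration is stuck or done
        decoded |= newly
    return decoded


U = [[1, 0, 1, 0],
     [1, 1, 1, 0],
     [0, 0, 1, 0]]
print(sorted(sic_peeling(U, t_th=1)))  # [0, 1, 2]
```

In this toy graph the singleton on the second slot releases user 1, which unlocks user 0 on the first slot and finally user 2 on the third, mirroring how underload slots help overload ones.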

Performance Analysis and Parameter Optimization
The goal of the coding parameter optimization of the proposed scheme is to minimize the SNR threshold under a given PUPE (see Section 2). However, due to the complicated intra/inter-slot encoding structure, the relationship between the coding parameters and the performance indicator is not explicit. Therefore, this section addresses the problem by error event decomposition [16]. By breaking the system-level PUPE down into module-level error rates, the error contribution of each module can be analyzed and related to its parameters, eventually forming a system-level parameter optimization problem.

Error Rate Analysis by Decomposition
According to the decoder structure described in Section 3.2, the PUPE in (1) can be decomposed into four parts, as depicted in Figure 5. Although, as described in Algorithm 1, the decoding structure consists of the compound intra/inter-slot iteration, it can nonetheless be regarded as a three-stage process when analyzing errors because of its successive style. The first stage is the CS pilot recovery, where the pilot resource collision $d_{k_1} = d_{k_2}$, $k_1 \neq k_2$ (error $\epsilon_1$) occurs at the transmitter side and the support recovery error $\hat{d}_k \neq d_k$ (error $\epsilon_2$) occurs at the receiver side. The pilot plays three significant roles: the first part of the user payload, the selector of the user-specific IDMA interleaver, and the pointer used to reconstruct the packet-level factor graph. Thus, a pilot error causes not only the packet loss of its own user but also a chain effect spreading to the correlated packets of other users. The IDMA decoding error $\hat{b}_k \neq b_k$ (error $\epsilon_3$) and the remaining error after the inter-slot SIC process $\hat{b}^{SIC}_k \neq b_k$ (error $\epsilon_4$) are both conditioned on successfully recovered pilots. The modular errors are then expressed as functions of their corresponding coding parameters in (12).
The CS pilot parameters $N_s$ and $B_s$ are restricted by the collision and detection conditions. $\epsilon_1$ is governed by the resource collision avoidance condition in [16]: the length $B_s$ of the pilot info part separated from the user payload should be large enough to provide non-colliding patterns. As for $\epsilon_2$, the dimension of the sensing matrix is bounded by the Restricted Isometry Property (RIP) [31] condition. In other words, when $T_{th}$ is fixed, a large row dimension $N_s$ reduces $\epsilon_2$. Thus, we can conclude that $N_s \propto T_{th}$ and $B_s \propto T_{th}$.
The IDMA block error rate $\epsilon_3$ is the function $\epsilon_3 = P(B_c, R_L, R_r, L_v, \mathrm{SNR})$. According to the classical analysis of IDMA, $\epsilon_3$ is based on the single-user performance of the inner rate-$R_L$ FEC code and degrades as the interference becomes more severe with the increasing number of superimposed users $L_v$. The SNR cost of $\epsilon_3$ at a required level can be extracted by simulation from the performance curve of a given threshold $T_{th}$.

Inter-Slot Degree Distribution Analysis
As the SNR cost of the IDMA system increases with $T_{th}$, the SIC decoding on the compound factor graph of the inter-slot code can further reduce the threshold $T_{th}$ needed to reach the same level of PUPE, resulting in a reduction of the SNR threshold. Thus, the degree distribution optimization aims to minimize $T_{th}$.
First, we start with an idealized asymptotic investigation. Previously, the T-Fold IRSA was modeled as a factor graph, on which the user node degree distribution $\lambda(x)$ determines the slot node degree distribution $\rho(x)$ through (13), where Bino(·) denotes the binomial distribution. The probability of high-degree slot nodes increases with the average packet repetition rate $\lambda'(1)$ and decreases with more slots $V$, representing intensified superposition. Figure 6a gives an example of this rule. Moreover, the convergence behavior of the SIC on the factor graph can be characterized by density evolution (DE), which is similar to the LDPC message passing procedure under the erasure channel [32]. At the $t$-th iteration, the erasure probabilities of the slot nodes and the user nodes are $\phi_t$ and $\eta_t$, respectively, updated as in (14), where $x_0 = 1$ at the beginning without a priori knowledge of the user nodes, and $\rho(x)$ can be derived from $\lambda(x)$ as in (13). Under an appropriate $\lambda(x)$, the erasure probability of the slot nodes $\phi_t$ gradually descends with the iterations. As $\lambda(\phi_t)$ is a polynomial with positive coefficients, $\eta_{t+1}(\phi_t)$ is monotonically increasing, indicating that $\eta_t$ also converges to 0 as $\phi_t$ descends. Meanwhile, the higher the threshold $T_{th}$ is, the higher the decodable probability of the slot nodes $\sum_{r=1}^{T_{th}} \rho_r$ is. Notice that the two probability expressions correspond to the two message-passing procedures in Algorithm 1, respectively. Therefore, the SIC iteration can make as many overload slot nodes decodable as possible. An example of the SIC procedure simulated by density evolution is depicted in Figure 6b. When $V$ and $\lambda(x)$ are fixed, the cost of carrying more users $K_a$ is an increase of the threshold $T_{th}$ to guarantee convergence under limited iterations, i.e., a rise of the SNR requirement of IDMA.
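The asymptotic behavior described above can be explored numerically. The Python sketch below is not the paper's equations (13) and (14), which are not reproduced here; it uses two standard modeling assumptions instead. First, each user occupies a given slot with probability $\lambda'(1)/V$ independently of the others, so the slot degree is binomial. Second, the density evolution uses the common edge-perspective recursion for T-fold IRSA with a Poisson approximation of the slot degree. The names `slot_degree_pmf`, `poisson_cdf`, and `density_evolution` are hypothetical.

```python
import math


def slot_degree_pmf(r, num_users, num_slots, avg_deg):
    """P(slot degree = r) when each user hits a slot with probability avg_deg / V."""
    p = avg_deg / num_slots
    return math.comb(num_users, r) * p ** r * (1.0 - p) ** (num_users - r)


def poisson_cdf(n, mu):
    """P(X <= n) for X ~ Poisson(mu)."""
    term = math.exp(-mu)
    total = term
    for r in range(1, n + 1):
        term *= mu / r
        total += term
    return total


def density_evolution(lam, load, t_th, iters=200):
    """Asymptotic residual packet loss rate of T-fold IRSA (assumed model).

    lam   node-perspective degree distribution {degree: probability}
    load  K_a / V, users per slot
    t_th  number of superimposed packets a slot can resolve
    """
    avg_deg = sum(i * p for i, p in lam.items())
    p = 1.0  # probability that a slot-to-user message is still an erasure
    for _ in range(iters):
        # User-to-slot erasure: all other replicas of the user are unresolved.
        q = sum(i * pi * p ** (i - 1) for i, pi in lam.items()) / avg_deg
        # Slot-to-user erasure: t_th or more other unresolved packets on the slot.
        p = 1.0 - poisson_cdf(t_th - 1, load * avg_deg * q)
    return sum(pi * p ** i for i, pi in lam.items())


lam = {3: 1.0}  # every user sends three replicas (toy choice)
for load in (0.5, 1.0):
    for t_th in (1, 2):
        print(load, t_th, density_evolution(lam, load, t_th))
```

Under these assumptions, the toy distribution converges at a load of 0.5 users per slot with `t_th = 1` but stalls at a load of 1.0, and raising `t_th` to 2 restores convergence, which illustrates the trade-off between carrying more users and raising the threshold.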
The tool for determining the convergence condition of the density evolution iteration is the EXIT (EXtrinsic Information Transfer) chart [33]. For given $K_a$, $V$, and $T_{th}$, plot the erasure probability curves $\phi^{-1}(\theta)$ and $\eta(\theta)$ from (14) on one chart, and trace a trajectory between the two curves starting from $\eta(1) = 1$. If the trajectory reaches $\eta(0) = 0$, this $\lambda(x)$ enables the SIC iteration to converge; otherwise it does not. Figure 6c,d shows a boundary case where the iteration tunnel is about to close for $T_{th}$. Since $\lambda''(x) > 0$, $\eta(\theta)$ is always concave. Moreover, this effect becomes stronger when the weights of the higher-order coefficients increase. However, it is not easy to express the boundary condition explicitly. We adopt a numerical method, differential evolution [34], to solve the implicit equation and search the range $\Lambda^*$ of suitable $\lambda(x)$ as in (15). Then we move on to practical considerations. An important assumption of density evolution analysis is that $K_a, V \to \infty$, which ensures the isotropy of the distribution, while in practice they are limited. It is challenging for the user-independent random schedulers to guarantee $\beta_k \sim \lambda(x)$ when $K_a$ is relatively small, especially for the repetition counts with low probability (high-order degrees). On the other hand, the finite-length effect causes short cycles and trapping loops in the randomly formed factor graph. In some worse cases, the SIC iteration cannot start or converge. Although the effect of $\epsilon_3$ can be ignored when considering boundary conditions, the remaining errors on the underload slot nodes may cause error propagation. Nonetheless, $\Lambda^*$ gives a basic scope, which only needs to be narrowed.
We use Monte-Carlo simulation to examine the practical effectiveness of the candidates in $\Lambda^*$. Under finite $K_a$, $V$, and $\epsilon_3$, the post-iteration packet loss rate of the user nodes obtained by simulation is the actual PUPE. Besides, the cost of the packet diversity gain is the additional energy spent on the repeated packets. The SNR threshold after inter-slot encoding is given by (16), where $(E_b/N_0)_{T_{th}}$ is the SNR at which $\epsilon_3$ reaches an effective level under threshold $T_{th}$. After that, the trade-off of the intra-slot encoding should also be considered.
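The energy cost of repetition can be illustrated with a short calculation. The sketch below is an assumption about how such a penalty is commonly accounted for, not a reproduction of (16): it supposes that every replica is sent with the same energy, so the per-bit energy grows by the average number of replicas $\lambda'(1)$. The function name `snr_threshold_db` and the numeric inputs are hypothetical.

```python
import math


def snr_threshold_db(ebn0_tth_db, lam):
    """Add the repetition energy penalty (in dB) to the intra-slot SNR requirement.

    ebn0_tth_db  E_b/N_0 in dB needed by the intra-slot code at threshold T_th
    lam          node-perspective degree distribution {degree: probability}
    Assumes equal energy per replica, so the penalty is 10*log10(average degree).
    """
    avg_replicas = sum(i * p for i, p in lam.items())
    return ebn0_tth_db + 10.0 * math.log10(avg_replicas)


lam = {1: 0.5, 2: 0.3, 3: 0.2}       # toy distribution, 1.7 replicas on average
print(snr_threshold_db(2.0, lam))    # 2.0 dB plus about 2.3 dB of repetition cost
```

Under this accounting, a distribution with more high-degree terms must buy back its extra repetition energy through a lower $T_{th}$, which is the tension the joint optimization has to resolve.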

Joint Parameter Optimization Algorithm
Integrating the analysis on the above modules, the complete parameter optimization procedure is proposed as Algorithm 2.

Algorithm 2 Joint optimization of coding parameters
Require: The length of the user payload $B$, number of active users $K_a$, frame length $N_{tot}$, and target PUPE $\epsilon$.
Ensure: The length of the CS pilot info part $B_s$, length of the CS pilot $N_s$, LDPC code rate $R_L$, intra-slot repetition rate $R_r$, and inter-slot packet spreading distribution $\lambda(x)$.
1: Initialization: Determine $N_s$ and $B_s$ by the CS decoding conditions, randomly choose a rate configuration $\{R_L, R_r\}$, and calculate $V$;
2: while $R > R_{\min}$ do
3:   for $T_{th}$ from $T_{\min}$ to $T_{\max}$ do
4:     Obtain $\epsilon_3$ and $(E_b/N_0)_{T_{th}}$ by IDMA simulation;
5:     Search the convergent range $\Lambda^*$ using density evolution by (15);
6:     for $\lambda(x)$ from $\min \lambda'(1)$ to $\max \lambda'(1)$ do
7:       Perform Monte-Carlo simulation to validate $\lambda(x)$ under the residual error $\epsilon_3$;
8:       Output the post-SIC-iteration error rate $\epsilon_4$;
9:       if $\epsilon_4 \le \epsilon$ then
10:        Reserve $\lambda(x)$ and break;
11:      end if
12:    end for
13:  end for
14:  if the SNR threshold (16) increases then
15:    Choose the $\{R_L, R_r\}$ pair with a higher total rate $R$, and adjust $V$ accordingly;
16:  else
17:    Lower the total rate $R$, and adjust $V$ accordingly;
18:  end if
19: end while
20: Calculate the SNR threshold by (16) and output the optimal coding parameters.
Overall, we use a heuristic bootstrap method to jointly optimize the coding parameters analyzed above. The CS pilot configurations control $\epsilon_1$ and $\epsilon_2$ and are basically determined by $K_a$. To tackle the contradiction between the intra-slot coding gain and the inter-slot diversity gain, the IDMA rate $R$ increases with the iterations, while the inter-slot diversity decreases in each iteration. The initial rate $R$ is randomly chosen, and then $V$ can be determined. $(E_b/N_0)_{T_{th}}$ at $\epsilon_3$ is extracted from the simulation curves. The convergence condition is ensured by (15). Then the effectiveness of $\lambda(x)$ with increasing energy cost is checked by simulation until it satisfies the post-SIC-iteration PUPE requirement. If the SNR threshold rises compared with the last iteration after optimization, $R$ should be increased to enhance the inter-slot code. If not, $R$ is lowered to reduce the multi-user interference.

Numerical Results
In this section, we evaluate the proposed scheme using numerical indexes of two aspects: the $K_a$-SNR threshold curve representing energy efficiency and the FLOPf (FLoating-point Operations Per frame) comparison representing computational complexity.

Energy Efficiency Analysis
The SNR threshold, by definition, is the minimum required SNR that achieves a target PUPE under specific configurations. As described in Section 2, the $K_a$-SNR curve is acquired under some fundamental constraints. Thus, we give the primary scenario configurations in Table 1. These parameters are shared in the following simulations. To get the SNR threshold for each $K_a$, we optimize the coding parameters by Algorithm 2 point by point. The results are displayed in Table 2. Under all the configurations, $\epsilon_1 < 10^{-3}$ and $\epsilon_2 < 10^{-4}$, so that the pilot error does not affect the subsequent decoders. Through adjustment, the intra-slot coding gain and the inter-slot diversity gain are balanced, and their sum reaches the optimal point. It can be observed that the proportion of the two types of gains varies with $K_a$. In the low $K_a$ region, the intra-slot coding gain is more effective against multi-user interference, and the cost of reducing $V$ is affordable. However, in the high $K_a$ region, the main problem is to tackle the rise of the threshold $T_{th}$ by the inter-slot SIC. Meanwhile, the severe superposition of packets requires $V$ to increase, so the inter-slot diversity gain dominates.
Next, we compare the proposed scheme with several existing representatives in Figure 7a. As introduced in Section 1, CCS-AMP [36] is based on the T-Fold Aloha scheme, using CS as the intra-slot encoder to resolve superposition and enhancing the AMP algorithm, while SC-LDPC+SIC [20] and Polar-IRSA [21] belong to T-Fold IRSA. Sparse IDMA [13] and Polar-sparse spreading (Polar-SS) [15] are in the random spreading paradigm. Polar-IRSA achieves the best performance in the IRSA category thanks to a better intra-slot code design. The excellent performance of polar codes at short lengths makes polar-code-based schemes stand out in low $K_a$ conditions, where the intra-slot coding gain is critical. Our scheme occupies the middle position among the three because the repetition diversity in its intra-slot IDMA code provides more coding gain than pure LDPC. The random spreading schemes have the best performance of all paradigms; they can be decomposed into a cascaded structure on one frame-length time slot. The sparse outer code can eliminate multi-user interference and provide a certain coding gain in the low $K_a$ region.
As $K_a$ increases, our scheme maintains the lowest slope and eventually achieves a performance advantage in the high-$K_a$ region. This is mainly because we effectively control the increase of the threshold $T_{th}$ through fine optimization of the intra-slot and inter-slot gains. The outer code of the sparse spreading schemes tends to become rateless, and its sparsity and gain gradually diminish. Meanwhile, the other IRSA schemes, which exploit only packet-level diversity, suffer from the same problem; in addition, bit-level spreading outperforms packet-level spreading.

[Figure 7a: SNR threshold $E_b/N_0$ (dB) for our scheme, CCS-AMP [36], SC-LDPC+SIC [20], Polar-IRSA [21], Sparse IDMA [13], Polar-SS [15], and the achievability bound [5].]

Complexity Analysis
Based on the analytical framework proposed in [10], we use FLOPf to represent the complexity of the decoder. For each coding scheme, FLOPf can be expressed as a function of its coding parameters. We compare the proposed scheme with Polar-RS, Polar-IRSA, and CCS-eAMP, whose expressions are listed in Appendix A. When comparing these schemes, it is insightful to examine the complexity cost paid for the corresponding energy efficiency gain. Figure 7b shows the SNR level and the required FLOPf of each scheme when the number of users $K_a$ reaches 150.
Although our scheme and CCS-AMP achieve the same performance, the latter costs about ten times more FLOPf than the former. Meanwhile, sparse IDMA spends 100 times more complexity for a 0.4 dB energy gain, and the polar-RS scheme spends $10^6$ times more for 0.8 dB. In our scheme, the intra-slot decoder is a linear iteration and converges after 3 to 5 iterations owing to the high coding gain. In addition, the joint degree distribution optimization limits the inter-slot iteration to 10 rounds. Therefore, our scheme achieves a better energy-complexity trade-off.
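One way to read these figures is as a marginal cost, that is, how many orders of magnitude of extra FLOPf are paid per dB of energy gain relative to our scheme. The sketch below only rearranges the numbers quoted above; the metric `decades_per_db` is an illustrative assumption and is not a quantity defined in the analysis, and "100 times more" and "$10^6$ times more" are treated as complexity ratios of $10^2$ and $10^6$.

```python
import math

# (complexity ratio relative to the proposed scheme, energy gain in dB),
# taken from the comparison at K_a = 150 quoted in the text.
reported = {
    "Sparse IDMA": (1e2, 0.4),
    "Polar-RS": (1e6, 0.8),
}

def decades_per_db(flop_ratio: float, gain_db: float) -> float:
    """Orders of magnitude of extra FLOPf paid per dB of energy gain."""
    return math.log10(flop_ratio) / gain_db

cost = {name: decades_per_db(ratio, gain)
        for name, (ratio, gain) in reported.items()}
```

Under this reading, sparse IDMA pays about 5 decades of complexity per dB and polar-RS about 7.5, which is consistent with the diminishing return described above.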

Conclusions
The proposed coding scheme achieves a performance gain in the high-$K_a$ region, benefiting from the intra/inter-slot gain balance under jointly optimized parameters. Although it is not as good as existing solutions in the low-$K_a$ region, it achieves a better trade-off between performance and complexity. The joint design of the intra/inter-slot code exploits the potential of the T-Fold IRSA scheme more thoroughly. Thus, our scheme is a promising candidate for UMA design in next-generation IoT.
For future research, we can consider the fading scenario. Our scheme can easily be extended to the block fading channel because of two inherent advantages: (1) inserting the pilot in each slot means that the channel response of each user in each slot can be estimated independently; (2) the intra-slot IDMA is a universal code, so after it is optimized under the AWGN channel, its configuration can be directly applied to the fading scenario.

Table A1. FLOPf expressions of different coding schemes.

Coding scheme: Our scheme
Related parameters: $N_s$ CS pilot length; $B_s$ pilot info length; $T_{th}$ threshold; $N_c$ length of IDMA codeword; $Z_c$ LDPC lifting size; $I_1$ maximum iteration of LDPC SPA decoder; $n_v$ number of variable nodes; $n_c$ number of parity check nodes; $V$ number of slots; $I_2$ maximum iteration of ESE+SPA IDMA decoder; $K_a$ number of users
FLOPf: $T_{th}(2N_sB_s + B_s^2N_s + N_s^3) + 2VT_{th}[I_2(12N_c + 3Z_c(n_c + n_v)^2 I_1)] + K_aN_c$

Coding scheme: CCS-AMP [37]
Related parameters: $N_s$ row size of CS (Compressed Sensing) matrix; $2^G$ column size of CS matrix; $K_a$ number of users; $J$ number of subblocks; $I_{max}$ maximum iteration of CS AMP decoder
FLOPf: $4JI_{max}K_aN_s(2^G + 1) + K_a\log_2(K_a)$

Coding scheme: Sparse IDMA [13]
Related parameters: $N_p$ row size of CS matrix for pilot coding; $2^G$ column size of CS matrix for pilot coding; $K_a$ number of users; $RR = \lambda(1)$ average repetition rate; $RL$ basic LDPC code rate; $N_c = n - N_p$ IDMA code length; $N_c = k - B_p$ IDMA information bit length; $I_1$ BP user layer iteration; $I_2$ BP channel layer iteration
FLOPf: $4N_sK_a(2^G + 1) + I_2[I_1K_aN_c^2((RR + 1)/RL - 1)^2 + (K_a \cdot RR \cdot N_c/RL + n)^2] \cdot 3$

Coding scheme: Polar-RS [14]
Related parameters: $2^{B_s}$ total number of spreading sequences; $N_s$ length of spreading sequences; $N_c$ length of information bits for polar coding; $r$ length of CRC (Cyclic Redundancy Check) parity bits; $B_c$ polar code length; $g$ length of segment in energy detector; $I_{max}$ maximum iteration of polar list decoder
FLOPf: $2n2^{g+B_s}B_c/g + 2N_s2^{B_s} + K_a(N_s^2 2^{B_s} + 2^{3B_s} + 2N_s2^{B_s} + I_{max}B_c\log_2(B_c) + B_cr)$
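As a minimal sketch of how the expressions in Table A1 can be evaluated, the functions below transcribe the entries for our scheme and for CCS-AMP. Two points are assumptions: the factor following $(n_c + n_v)$ in our scheme's expression is read as a square, and the parameter values in `example` are arbitrary placeholders rather than the configuration used in the simulations.

```python
import math

def flopf_ccs_amp(n_s, g, k_a, j, i_max):
    """4 J I_max K_a N_s (2^G + 1) + K_a log2(K_a), as listed for CCS-AMP."""
    return 4 * j * i_max * k_a * n_s * (2 ** g + 1) + k_a * math.log2(k_a)

def flopf_proposed(n_s, b_s, t_th, n_c, z_c, i_1, n_v, n_chk, v, i_2, k_a):
    """FLOPf of the proposed scheme (n_chk is the table's parity-check count n_c).

    Assumption: the term (n_c + n_v) is squared.
    """
    pilot = t_th * (2 * n_s * b_s + b_s ** 2 * n_s + n_s ** 3)
    idma = 2 * v * t_th * (i_2 * (12 * n_c + 3 * z_c * (n_chk + n_v) ** 2 * i_1))
    return pilot + idma + k_a * n_c

# Placeholder values for illustration only (not taken from Table 1 or Table 2).
example = dict(n_s=64, b_s=8, t_th=4, n_c=512, z_c=16, i_1=5,
               n_v=10, n_chk=6, v=20, i_2=10, k_a=150)
flopf_example = flopf_proposed(**example)
```

Sweeping `i_1` and `i_2` in such a function shows directly how the iteration limits discussed in the complexity analysis scale the dominant IDMA decoding term.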