On Compressed Sensing of Binary Signals for the Unsourced Random Access Channel

Motivated by applications in unsourced random access, this paper develops a novel scheme for the problem of compressed sensing of binary signals. In this problem, the goal is to design a sensing matrix A and a recovery algorithm, such that the sparse binary vector x can be recovered reliably from the measurements y=Ax+σz, where z is additive white Gaussian noise. We propose to design A as a parity check matrix of a low-density parity-check code (LDPC) and to recover x from the measurements y using a Markov chain Monte Carlo algorithm, which runs relatively fast due to the sparse structure of A. The performance of our scheme is comparable to state-of-the-art schemes, which use dense sensing matrices, while enjoying the advantages of using a sparse sensing matrix.


Introduction
The emergence of the Internet of Things (IoT) has motivated much research interest in designing communication protocols for massive machine-to-machine type communication. This type of communication setup is characterized by a large number of users that transmit simultaneously to the same receiver, while each of these users has a very short message to send. In addition, since IoT sensors are often required to be extremely cheap, the transmission scheme must be as simple as possible, and the design objective is to minimize the energy-per-bit, E b /N 0 , under a reliability constraint.
In [1], Polyanskiy defined a communication model capturing the challenges in massive machine-to-machine type communication. In this model, there is an unbounded number of potential users, among which only k are active at each frame. Each active user has a message of B bits to transmit, and transmission takes place over a multiple access channel (MAC). Since the number of users is unbounded, the receiver cannot recover the identities of the active users (as this information has unbounded entropy, assuming all potential users are equally likely to transmit, and the channel has bounded capacity). Thus, the receiver's goal is to recover a list of k messages that contains "most" of the transmitted messages, without identifying the sender of each message. This setup is therefore called the unsourced random access channel [2]. The performance of a communication scheme over the unsourced random access channel is assessed by the tradeoff it achieves between energy-per-bit and the per-user probability of error (PUPE), which is the probability that the message transmitted by an active user did not enter the list of messages the receiver outputs.
Over the last few years, there has been great interest in developing efficient lowcomplexity schemes for the unsourced random access channel [2][3][4][5][6][7][8][9][10][11][12][13][14][15]. A natural approach for this setup is for all users to transmit codewords from the same codebook. It can be easily seen that if A ∈ R n×2 B is a matrix whose columns are the codewords of this codebook, and x ∈ {0, 1} 2 B is a vector whose ith entry equals 1 if one of the active users chose message i and 0 otherwise, the channel output is y = Ax + σz, where z is white Gaussian noise (we assume here for simplicity that no message was chosen by more than one user). Since the number of active users k is typically of the order of tens to hundreds, and is much smaller than 2 B , whereas the blocklength n is typically on the order of 10 4 to 10 5 , the problem of designing efficient codebooks and decoding algorithm for the unsourced random access channel corresponds to designing the sensing matrix A and a recovery algorithm for a compressed sensing problem [1]. However, this compressed sensing problem has two non-standard features: (i) the dimensions of the problem are huge (recall that B = 100 is a typical number); and (ii) the sparse vector is binary, in contrast to the standard compressed sensing setup where the nonzero entries can take values in an interval within the real line.
To address the dimensions of the compressed sensing problem, Amalladinne et al. [7] introduced the coded compressed sensing framework, where the B message bits are divided to smaller chunks and are encoded on different sub-blocks. This idea breaks the original compressed sensing problem into a sequence of compressed sensing problems with manageable dimensions, which can be handled via existing tools from the compressed sensing literature (we remark that this is somewhat related to ideas that have previously appeared in the compressed sensing and group testing literature, wherein one constructs the measurements matrix by combining an "outer" and "inner" code-see, e.g., [16][17][18]). A difficulty that arises under this framework is that the sub-messages eventually have to be stitched to one long message, and a tree code was developed by Amalladinne et al. [7] for this purpose. While there has been many important advances in the field since the first appearance of the coded compressed sensing framework [19], the idea of first solving small compressed sensing problems and then leveraging the solutions to obtain estimates of the entire message still appears in one way or another in practically all schemes achieving state of the art performance.
Motivated by the above, the focus of this paper is the design of sensing matrices and efficient decoding algorithms for (small dimensions) compressed sensing of binary signals. Originally, Amalladinne et al. [7] treated this challenge by designing the sensing matrix based on BCH codes and using off-the-shelf recovery algorithms, such as LASSO or non-negative least squares (NNLS), for decoding. The main weakness of this approach is that it fails to exploit the fact that the entries are binary. Later, Fengler, Jung and Caire [9] suggested using Sparse regression codes with approximate message passing (AMP) decoding. The main benefit of the AMP decoder is that it allows incorporating any prior one has on the signal x, and not just sparsity. Consequently, it achieves excellent performance when x is a binary sparse vector and the sensing matrix A is i.i.d. Gaussian. This framework has more benefits, for example it allows one to efficiently jointly decode all compressed sensing problems corresponding to the different sub-blocks and one can even iterate between the AMP decoder and the tree decoder [11].
In this paper, we propose an alternative design for a sensing matrix A and a decoding algorithm. Our sensing matrix A is taken as the parity check matrix of a low-density parity check (LDPC) code, thought of as a matrix over the reals. The decoder is based on the Markov Chain Monte Carlo (MCMC) method, more specifically, Glauber dynamics. This method performs a random walk over a Markov chain whose state space consists of all possible values of x and whose stationary distribution is the conditional probability of x given the measurement y. Due to the sparse structure of the matrix A, each step in the random walk can be simulated with a low computational cost.
For the compressed sensing problem with binary signals problem, our proposed framework achieves comparable performance to that of AMP with a Gaussian sensing matrix. However, in contrast to the AMP framework, which is based on sensing matrices that are Gaussian i.i.d., or "Gaussian i.i.d.-like", our sensing matrix is sparse. The sparsity of the sensing matrix A in compressed sensing of binary signals has several benefits that go beyond the unsourced random access application: • Storage. Storing a sparse matrix requires fewer memory resources than storing a dense unstructured matrix, such as a matrix sampled from the i.i.d. Gaussian ensemble.
We remark, however, that the AMP algorithm often works very well for compressed sensing of binary signals even when the Gaussian i.i.d. matrix A is replaced with a sensing matrix that is dense yet easy to store. For example, Amalladinne et al. [20] suggested taking A as a sub-sampled Hadamard matrix. • Joint source-channel coding with local updates. Consider the problem of storing a sparse binary vector x ∈ {0, 1} M with Hamming weight at most k, in an array of n noisy memory cells. By noisy memory cells, we mean that the value read from memory cell i is modelled as s i + z i , where s i is the stored value and z i is additive noise, e.g. Gaussian. This is a reasonable model for magnetic recording (ignoring intersymbol interference) [21] and for flash memories (ignoring further impairments like cross talk) [22]. Note that this is actually a joint-source channel coding problem where the source is x ∈ {0, 1} M , the channel is Gaussian and can be used n times, and the distortion measure is Hamming distortion. It is often desirable to use update efficient schemes. In such schemes changing one bit in the input vector x, should correspond to changing the content of a small number of memory cells (see, e.g., [23]). When the encoding scheme is s = Ax, an update in one coordinate of x, say x i , corresponds to adding (removing) the ith column of A to (from) s. If each column has a small number of nonzero entries, the update involves changing the stored value in a small number of cells. Thus, using a matrix A with sparse columns is highly desirable. • Group testing. In group testing, the goal is to detect a set of at most k defective items from M possible items. To this end, we designate by x ∈ {0, 1} M the vector whose nonzero entries are defective. We have n measurements of x, each corresponding to a different "pool". Each pool is a subset of [n], and the corresponding measurement is obtained by passing the number of defective items in the pool, denoted by , through some noisy channel P Y|L (y| ) (see, Definitions 3.1 and 3.3 in [24]). The typical case is that the channel depends on the number of defective items only through the indicator on the event { > 0}, but the general model allows the measurement to be distributed as + σz, where z ∼ N (0, 1). Thus, with this model the design of the group testing scheme corresponds to designing a binary sensing matrix A ∈ {0, 1} n×M , and the measurements are y = Ax + σz. Using pools, corresponding to the rows of A, with small Hamming weight, results in simpler tests. For example, the original application for which the group testing framework was developed was detection of syphilis among a large group of patients, using a small number of tests. Using pools with small Hamming weight means that we need to mix samples from fewer patients in each pool, which results in less work for the lab technician.
In Section 2, we formalize the problem of compresses sensing of binary signals and present our suggested construction for the sparse sensing matrix and our MCMC-based recovery algorithm. Some theoretical analysis and justification for our suggested method is also given. In Section 3, we evaluate the performance of the proposed scheme numerically and compare it to other state-of-the art schemes for the compressed sensing of binary signals problem. We also evaluate the performance of an end-to-end communication scheme for the unsourced random access channel with a small amount of feedback, which uses the proposed compressed sensing of binary signals scheme as an important ingredient. Section 4 is devoted to conclusion and additional discussion.

Compressed Sensing of Binary Signals
We now define a formal mathematical model for the problem studied in this paper. Consider a linear inverse problem of the form where x ∈ R M is an unknown signal, to be recovered; A ∈ R n×M is a (known) linear measurement matrix; and z ∈ R n is i.i.d. Gaussian noise: z 1 , . . . , z n i.i.d.
∼ N (0, 1). This problem becomes especially interesting in the under-determined regime, where the number of samples n is smaller than the signal dimension M-here, clearly, one cannot recover x generically, and it is necessary to make additional structural assumptions on x. In compressed sensing, one assumes that x is a sparse vector, where the number of non-zero entries k is very small compared to M. Perhaps the most fundamental result in sparse recovery states that, in order to recover exactly any k-sparse x from noiseless measurements y = Ax, one in fact needs only n = O(k log(M/k)) linear measurements, where the sensing matrix A is taken to be an i.i.d. Gaussian random matrix (see, e.g., [25], Chapter 9). The recovery procedure itself, while not linear, can be formulated as a convex program which is computationally easy to solve. In recent years, a vast literature on compressed sensing has formed, spanning new theory, low-complexity algorithms and new constructions of good sensing matrices, beyond the i.i.d. Gaussian setup. We make no pretense to give a literature review on this topic; for a starting point, we refer primarily to surveys [25][26][27][28][29][30][31][32][33][34].
We consider a setting where x is constrained to be in a discrete set, on top of being sparse. Specifically, we assume it is binary: x ∈ {0, 1} M . As described above, this problem is closely related to communication over the unsourced random access channel, but problems of this form have received some attention in the past (see, e.g., [35][36][37][38]).
Throughout, we assume a sparse binary prior for x. Specifically, let k be the expected sparsity, and denote ρ = k/M. The coordinates of x are assumed i.i.d. Bernoulli random variables: that is, Pr(x i = 1) = ρ and otherwise x i = 0. Clearly, the expected number of non-zero entries is just E x 0 = k. A recovery algorithm for x from y is a mappingx : R n → {0, 1} M . The performance of a recovery algorithm is measured in terms of the bit error rate (BER) it attains where the probability is taken with respect to both the additive noise, as described in (1), and the signal prior (2). Note that the normalization in (3) is by the expected sparsity k, rather than by the length M of the vector x. Since typically k M in compressed sensing, normalizing by M would yield a very small BER for any reasonable estimator, and normalizing by k therefore makes more sense.
Given a signal dimension M and budget of measurements n, one would typically like to: (i) construct "good" sensing matrices A that allow for noise-robust recovery of signals with as little sparsity (large k) as possible; and (ii) come up with low-complexity recovery algorithms for recovering x from y. As for Point (ii), note that one would like to go beyond off-the-shelf compressed sensing algorithms, such as the LASSO [39][40][41] or Non-Negative Least Squares (NNLS) [7,26,[42][43][44][45][46], which are designed with any real or positive signal in mind, and find algorithms that explicitly leverage the binary structure of the signal, to attain an advantage in terms of recovery performance. In this paper, we address these two points: for the sensing matrix, we propose to use sparse matrices based on LDPC codes; as for the recovery algorithm, we propose to use an MCMC sampling method that approximates the optimal (in terms of bit error probability) MAP estimator.

Sensing Matrices from LDPC Codes
We consider sensing matrices based on Gallager's ensemble of LDPC codes [47]. Denote by LDPC(ν, s; M, n) the following ensemble of random bipartite and biregular graphs, described below: • One side of the graph has M vertices, which we call "variables" (the left side), and the other has n vertices, called "factors" (the right side).
• For simplicity, assume νM = sn. Each variable has degree ν, meaning it is connected to exactly ν factors; each factor has degree s. Thus, there are exactly νM = sn edges in the graph. • The edges of G ∼ LDPC(ν, s; M, n) are sampled according to the following procedure. The procedure runs in ν rounds, so that in every round one introduces M/s new factors (we assume M/s is integer for simplicity), by randomly partitioning the variables [n] into M/s parts of size s each, namely, For every new factor 1 ≤ i ≤ M/s introduced in this round, one adds an edge between i and all the variables in the corresponding S i .
The sensing matrix A ∈ {0, 1} n×M is taken to be the adjacency matrix of a randomly sampled graph G ∼ LDPC(ν, s; M, n), that is, 1 there is an edge in G between factor i and variable j 0 otherwise .
The idea of constructing sensing matrices from bipartite graphs is not new. It is known that when G is a sufficiently good expander, the corresponding adjacency matrix A is a good sensing matrix (see, e.g., [48][49][50][51], ( [25], Chapter 13) and the references therein). Specifically, ensembles of LDPC codes have also been considered previously for compressed sensing [52][53][54].
It is worthwhile to recall at this point that the recovery problem we consider here is more structured than the "standard" compressed sensing setup: on top of being sparse, we assume the unknown signal is binary, and in particular non-negative. Past results show that the non-negativity assumption may give a considerable advantage in terms of the required number of measurements, as well as robustness to noise (see, e.g., [26,[42][43][44][45][46]55]).
We especially mention the results of Khajehnejad et al. [55]. We say that a bipartite graph with left degree ν is an (r, ε)-expander if for every set |S| ≤ r of left vertices, one has |N(S)| ≥ (1 − ε)ν|S|, N(S) being the neighbors of vertices in S. The results of Khajehnejad et al. [55] state that a bipartite left-regular (r, 1 − 1/ν)-expander yields, after applying a very small perturbation to the entries of the adjacency matrix, a sensing matrix where all non-negative r/ν − 1 sparse vectors x can be recovered from y = Ax (noiseless measurements). This guarantee, for non-negative signals, is considerably better than what one has without the non-negativity constraint-to get recoverability guarantees for "general" compressed sensing, one needs considerably larger expansion (smaller ε) (see, e.g., [25], Chapter 13). There are well-known connections between the decodability of LDPC codes and their expansion properties [56]. For example, for a slightly different ensemble of LDPC codes (that contains Gallager's ensemble), one can show ( [56], Theorem 8.7) (this result first appeared in [57]) that, with high probability, a random graph is an (α * M, is the binary entropy function. While not precisely applicable for our setup (which uses Gallager's ensemble), the following calculation could nonetheless be thought of as a crude heuristic. For example, in the setup, we consider later on in the numerical experiments, corresponding to a typical use-case for detection in unsourced random access, M = 2 14 , n = 2 11 , ν = 16, s = 128, one can solve the above equation numerically and get α * ≈ 0.993. Together with Khajehnejad et al. [55], this hints that k = α * M/ ν ≈ 101 sparse, non-negative signals can be consistently recovered. In fact, the experiments indicate that, practically, binary signals with considerably more non-zeros can be recovered reliably in this setting (see Section 3.1).

MCMC Algorithm for Recovery
Recall that, for a given sensing matrix A, our goal is to construct an estimator x = x(y) such as to minimize the per-bit error rate (BER), as defined in (3). Clearly, the optimal estimator in the sense of minimizing the BER is simply the per-coordinate maximum a posteriori (MAP) estimator: Computing the posterior Pr(x i | y) is a formidable task: it requires one to marginalize over all other coordinates = i. From a computational point of view, this is highly nontrivial, since the coupling between the coordinates of x, as induced by A, creates a strong cross-coordinate dependency conditioned on y.
We propose to mitigate this difficulty by sampling. Instead of marginalizing and and maximizing, we sample an x ∈ {0, 1} M from the full posterior, given by where λ = log ρ 1−ρ , ρ = k/M and Z is the partition function (to see the correctness of λ = log ρ 1−ρ in the expression above, note that the prior is Pr( Taking the ith coordinate of x, call it x i , we obtain a sample from Pr(x i = ·|y), the desired single-bit posterior distribution.
Intuition suggests that, when x BER,i has small error, the estimator obtained by sampling, call it x SAMP,i , should have small error as well. This is because, if the optimal error is small, the posterior Pr(x i |y) must put most of its mass on x BER,i = x BER,i (y); this in turn means that, with high probability over the sampling procedure, one should in fact get x SAMP,i = x BER,i . This reasoning is formalized in the following Lemma: Lemma 1. Denote x BER = ( x BER,1 , . . . , x BER,M ), with coordinates given by Equation (4). Let x SAMP = x SAMP (y) ∼ Pr(· | y) be a random sample from the posterior (5). Then: In other words, the bit error rate of x SAMP is bounded by twice the optimal bit error rate, over all estimators.
Note that, on the left-hand side, the probability is taken both over the randomness in x and the noise, as well as the sampling procedure used for constructing x SAMP .
Thus, we are left with the problem of sampling from the posterior Pr(x | y)-doing so "directly" might seem, at first glance, essentially just as hard as maximizing the posterior (namely, we would need to go over all 2 M possible signal configurations). Markov-Chain Monte Carlo (MCMC) methods provide a strong toolbox for sampling, approximately, from high-dimensional distributions. The idea is to construct an ergodic Markov chain such that: (i) its stationary distribution is the desired (high-dimensional) distribution one would like to sample from, namely Pr(x | y); and (ii) the chain is easy to propagate in time (e.g., its update rule is local). Having constructed such a chain, and assuming that it mixes sufficiently fast (which is often difficult to ensure), one can therefore efficiently sample from the desired distribution, up to high precision. For further background and discussion on MCMC, we refer to the work of Levin and Peres [61] (Chapter 3). The use of MCMC methods for solving inverse problems in signal processing and for decoding/detection in communication is by no means novel (see, e.g., [62][63][64][65][66][67][68][69]). While both the idea of using LDPC codes as sensing matrices and the idea of using MCMC methods for decoding are not new, our innovation here is in combining the two concepts for the compressed sensing of binary signals problem. As becomes evident below, the sparse structure of the sensing matrix constructed from an LDPC code significantly reduces the computational load from the MCMC decoder by reducing the computational cost of each iteration.
We propose using the well-known Gibbs sampling method, also known as Glauber dynamics, which is a general-purpose recipe for sampling from high-dimensional distributions. Let Q(x) be a distribution over {0, 1} M from which one wants to sample; in our case, of course, Q(x) = Pr(x | y). We construct a chain x (1) , x (2) , . . . ∈ {0, 1} M starting from some (arbitrary) initial state x (0) according to the following transition rule. Suppose that the current state is x (t) ; one samples a coordinate to update at random, i t ∼ Uniform({1, . . . , M}), so that x j for all j = i t . As for coordinate i t , it is sampled according to the conditional distribution of x i t , with all other coordinates fixed and given by x (t) , that is: ∼i t ) (we denote the vector of all coordinates, except for i t , by x ∼i t ). Applied to the posterior in (5), Glauber dynamics reads as follows: It is easy to see that the process x (1) , x (2) , . . . ∈ {0, 1} M is an ergodic Markov chain, and therefore has a unique stationary distribution. Furthermore, it is easy to verify that Q(x) is a stationary distribution of this chain. Thus, for T sufficiently large, we have that indeed x (T) is distributed as a random sample from Q(x). Note that, when A is a sparse LDPC matrix, each iteration of Algorithm 1 is computationally very cheap. One can easily keep track of y (t) = Ax (t) and y − y (t) 2 across iterations, noting that an update to a coordinate of x (t) requires updating only i t =1 stand for setting, in x (t) , the i t th coordinate to 0, 1 respectively).
A proof is given in Appendix A.2. Note that Lemma 2 applies for any y ∈ R n and σ 2 , that do not necessary have anything to do with the model (1). However, when y, σ 2 do correspond to measurements from (1), namely y = Ax + σz, Lemma 2, combined with Lemma 1, allows us to bound the bit error rate of the estimator x = x (T) returned by running T iterations of Glauber dynamics.
Assuming condition (6) holds, Lemma 2 tell us that running T = 4(c+1)σ 2 4σ 2 −4ν(s−1) · M log M = O(M log M) iterations of Glauber dynamics gives, with probability one (over x, y) an output x (T) whose law is M −c -close to the law of x SAMP , in total variation distancehere, c > 0 can be taken as large as one likes. Recall that total variation distance is just

] (recall that the maximum is actually attained at the indicator function
and noting that f is non-negative and bounded by M/k, we deduce which, by Lemma 1, is bounded by twice the optimum BER, up to an inverse polynomial (in M) error. As a remark, we mention that in practice, MCMC methods are often implemented using annealing, which in our case amounts to basically running Glauber dynamics with a noise variance σ 2 which is larger than the true noise. This can help steer the system away from local maxima of Q, by "smoothing" it out.
We would like to emphasize that condition (6) is very pessimistic, and in practice Glauber dynamics appears to mix rapidly at substantially lower noise levels than predicted there. For example, in the setup we consider below, M = 2 14 , n = 2 11 , ν = 16, s = 128, so that the bound σ 2 0 = ν(s − 1)/4 = 508, translates into energy per transmitted bit as E b /N 0 = ν 2σ 2 ·lg 2 (M)· = 1 7(s−1) ≈ 0.001, which is roughly −29.5 dB. This E b /N 0 is very far from sufficient for reliable recovery of x even when k is small (see experiments in Section 3.1). In this regime, while indeed Lemma 2 holds in the sense that Glauber dynamics mixes fast, the error rate of the optimal estimator is too high to be of use. Thus, Lemma 2 should not be thought of as an accurate predictor for the performance of Glauber dynamics for binary compressed sensing. Instead, it should be thought of as a "sanity check"-evidence that Glauber dynamics is a reasonable thing to do, at least in some regime of the problem.
On the same note, we observed, that when k is large, Glauber dynamics tends sometimes to get stuck at "bad" local maxima, even when the noise is moderate. To mitigate this, one can initialize x (0) reasonably close to the true signal x, using an off-the-shelf compressed sensing solver like NNLS-and then use Glauber dynamics as a refinement step. Applying this additional step of Glauber dynamics may improve the performance substantially (see the numerical results in Section 3.1). Of course, the result of Lemma 2 does not predict in any way this behavior; rather, it is completely agnostic to the starting location. Additionally, the bound on the mixing time there does not depend at all on k, which, as just mentioned, is crucial for the behavior of Glauber dynamics in practical regimes. A more sophisticated analysis of Glauber dynamics for compressed sensing of binary signals, which takes into account the points above, is an interesting problem, and, to the best of our judgement, highly nontrivial.

Performance in Compressed Sensing of Binary Signals
We start by demonstrating the performance of Glauber dynamics in the compressed sensing of binary signals setup of Section 2.
We run many random recovery experiments, to recover x ∈ {0, 1} M from y = Ax + σz ∈ R n . In all the experiments, we use M = 2 J , J = 14, n = 2 11 and sparsity values k ∈ {50, 100, 200, 300}. These parameters are representative of a typical setup for unsourced random access (see Section 3.2). For each k, we vary the energy per transmitted bit, E b /N 0 = E m 2σ 2 ·J (here, E m is the average energy per transmitting user-the energy of a column of A) and plot the corresponding bit error rate.
We plot the performance under the following schemes: 1.
The scheme of Amalladinne et al. [7]: A based on BCH codes, and NNLS decoder. To obtain a binary estimator from the NNLS solution, we simply assign every entry to its closest binary value (that is, according to whether it is smaller or greater than 1/2).

2.
A given by a sparse LDPC matrix, with parameters ν = 16 (consequently, s = 128), under the following decoding algorithms: When using Glauber dynamics, we always let it run for T = 10M lg 2 M = 10 MJ iterations.

3.
A is a dense random i.i.d. Gaussian matrix of mean 0 and variance 1/n, with Approximate Message Passing (AMP) decoder (thus, E m = 1; of course, in the experiments, the noise level σ is normalized according to the appropriate choice of E b /N 0 ). The denoiser used in AMP is the optimal denoiser for the i.i.d. Bernoulli source, essentially as proposed by Fengler et al. [70]. AMP is a state-of-the-art algorithm for compressed sensing of binary signals, and is our main benchmark. For convenience, the exact implementation details of AMP are given in Appendix B.
Our results are summarized in Figure 1. We see that when the sparsity is moderate (up to k = 200), our proposed scheme attains essentially state-of-the-art performance. However, when k is large (k = 300), performance falls short of AMP: if initialized at zero, Glauber dynamics consistently gets stuck in a local maximum, far away from the true signal; on the other hand, if one initializes Glauber dynamics with the NNLS solution, the combined scheme eventually attains performance which is substantially better than off-the-self compressed sensing solvers.
In Figure 2, we plot the evolution, across consecutive iterations, of both the BER and the "energy" E(x (t) ) = − 1 2σ 2 y − Ax (t) 2 + λ x (t) 1 along a single run of Glauber dynamics (initialized at x (0) = 0). We use k = 100 and E b /N 0 = 1 dB. Note that the iterations are given in units of MJ = M lg 2 M (meaning, it is t/MJ). Ignoring stochastic fluctuations, we see that Glauber dynamics essentially monotonically minimizes the energy (the error, however, is not monotonically decreasing).

End-to-End Performance in Grant-Based Random Access
As mentioned in the Introduction, the compressed sensing of binary signals problem is an important component of many schemes that have been proposed for communication over the unsourced random access channel. In this model [1], communication is performed in blocks of n channel uses of a Gaussian multiple access channel where (s 1 , . . . , s K tot ) ∈ {0, 1} K tot is the "activity pattern" vector whose Hamming weight is k, x i ∈ R n is the codeword transmitted by user i assuming it was active and z ∼ N (0, I) is additive white Gaussian noise (AWGN). Note that this channel model implicitly assumes perfect power and phase control, which is often difficult to attain in practice. We further assume that all active users have a message of B bits to transmit, and that each of these messages is independently and uniformly distributed over [2 B ]. The activity pattern is assumed unknown to the decoder, and known only locally to the transmitters, i.e., each user only knows whether or not it is active, but does not know which of the other users are active. The decoder's goal is to output a list of k messages that contains as many transmitted messages as possible. The per-user probability of error (PUPE) is defined as the number of transmitted messages that did not enter the list, normalized by k.
In this section, we use the scheme developed above for compressed sensing of binary signals as a building block for an end-to-end communication scheme for the unsourced random access channel. We slightly deviate from the mainstream literature on unsourced random access, by allowing for some feedback to be sent from the receiver to all potential users through a broadcast channel. This option was mostly avoided until now, with the exception of Facenda and Silva [12], as it was believed that the large number of potential users and the small payloads for each active users renders scheduling too wasteful. Recently, Kang and Yu [71] established a connection between scheduling for the unsourced random access channel and perfect hashing and demonstrated that in fact scheduling for the unsourced random access channel can be attained with a very small cost. Based on their observation, we propose the following scheme for the unsourced random access channel with an unbounded number K tot of potential users, among which k are active users that have to send a B bits message each, over n channel uses: • Phase 1: Each active user transmits the first J bits of its message over n 1 < n channel uses. To that end, we use a sensing matrix A drawn from the LDPC(ν, s; M, n 1 ) ensemble, with M = 2 J . Each active user chooses one of the M = 2 J columns of A, corresponding to the first J bits in its message, scales it by α > 0 and transmits them over the channel. Since there are k active users, the channel output after n 1 uses is The vector x consists of entries in Z + (all non-negative integers) and satisfies x 1 = k. If all k active users chose messages that begin with a different string of J bits, the vector x will further be in {0, 1} M . For our choices of J and k described below, typically almost all entries of x will be binary. The basestation (which is now the receiver) applies Algorithm 1 to estimate x. In the end, we compute p The basestation applies a set partitioning scheme for collision-free feedback, as described in [71], for broadcasting to the users a list of the k strings of J prefixes it has decoded in phase 1. Naively, this would require broadcasting a message of k · J bits. However, as shown in [71] using a more intelligent scheme, this can information theoretically be done with about k · lg 2 (e) bits, and practical schemes can encode this information using less than 2k bits. Each active user decodes the message transmitted by the basestation and finds the location of the J bits prefix of its message within the list of k prefixes that was transmitted. • Phase 3: The remaining n 2 = n − n 1 channel uses are split to k slots, each of length n = n 2 /k. Each active user transmits the remaining B − J bits of its message during the slot whose index it has decoded in Phase 2. To this end, off-the-shelf point-to-point codes are used. Active users that did not find their J bits prefix in the list of Phase 2, do not transmit a thing in Phase 3.
Note that in the end of this procedure the receiver outputs a list of at most k messages. The message sent by a particular active user enters the list the decoder outputs whenever neither of the following error events occur: (i) Another active user chose a message with the same J bits prefix, causing a collision in Phase 1 above. (ii) The J bits prefix of the user's message did not enter the list produced by the basestation in Phase 2. (iii) The user failed to decode the message sent from the basestation in Phase 2. (iv) There was a decoding error in the point-to-point transmission of that user in Phase 3.
For the remainder of this discussion, we neglect the cost of Phase 2 in terms of channel resources (energy and bandwidth) and its contribution to the error probability. We do this to avoid the need to model the broadcast channel from the basestation to the active users. In light of Kang and Yu [71], the message sent by the basestation in Phase 2 is significantly shorter than the messages sent by the active users. Adding this to the fact that the basestation is typically less power-constrained than the end-devices in machine-to-machine type communication, it follows that indeed Phase 2 will usually have negligible effect in both aspects (bandwidth and error probability). As mentioned above, our performance figure of merit is the per-user error probability.
We conducted experiments to estimate the expected performance of this end-to-end scheme. In each experiment, each of k users generates a random message of B bits to be transmitted. Let x ∈ {0, 1, . . . , k} 2 J be such that x m = the number of users who sent message m during Phase 1. The per-user error probability for Phase 1 is where L is the list of k messages returned by the base station, and m(i) is the message transmitted by user i. The error probability ε 1 is estimated via Monte-Carlo simulation. For the error of the second phase, we use the finite block normal approximation of Polyanskiy-Poor-Verdú ( [72], Theorem 54): where n P is the total energy per user, C(P) = 1 2 lg 2 (1 + P) is the AWGN capacity and V(P) = P(P+2) 2(P+1) 2 (lg 2 (e)) 2 is the AWGN channel dispersion. Given a target error probability ε 2 , we can solve (7) with an equality to obtain an achievability estimate P * on the power P necessary to attain user-basestation point-to-point error probability at most ε 2 . The total energy per transmitted bit (per user) is just where, as in the previous section, (E b /N 0 ) phase1 = E m 2σ 2 ·J , E m being the energy of a column of A. For every k, we wanted to find the smallest E b /N 0 that achieves total per-user error ε 1 + ε 2 = 0.05. This optimization have has performed numerically.
The performance attained by this end-to-end scheme is plotted in Figure 3. We plot the performance corresponding to Phase 1 implemented by the sensing matrix and recovery algorithm introduced in this paper, as well as an i.i.d. Gaussian sensing matrix and AMP recovery. Both implementations for Phase 1 correspond to similar performance, with slight preference for the latter, and substantially improve the state-of-the-art for unsourced random access with (a small amount of) feedback [12].  . Total E b /N 0 required to achieve end-to-end PUPE ≤ 0.05. We see that, by using a better compressed sensing algorithm for binary signals, significant gains can be achieved over the current state of the art [12].

Conclusions and Additional Discussion
We propose a scheme for compressed sensing of binary signals, consisting of a sparse sensing matrix, based on Gallager's ensemble of LDPC codes, and a decoder based on MCMC. When used as a building block in grant-based random access, the scheme is demonstrated numerically to attain essentially state-of-the-art performance. To conclude, we mention several points that rise up as follow-up questions to our results.
Belief Propagation. One of the most popular algorithms for decoding LDPC codes is Belief Propagation (BP) (see, e.g., [56,63]). We conducted very limited experiments with sum-product and max-product BP (not reported in this paper); our preliminary findings suggest that our MCMC decoder outperforms BP (in terms of its tolerance to noise), at least in the regime considered in Section 3.1. A possible explanation for this could be that the sensing matrix A has many small cycles, which severely violates the tree assumption and is common in BP analysis of LDPC codes. A thorough study of BP for compressed sensing with binary signals is left as an interesting direction for future research.
Grantless unsourced random access. In Section 3.2, we demonstrate that our scheme can attain essentially state-of-the-art performance in grant-based unsourced random access (wherein a compressed sensing problem is solved in the first, scheduling, step). However, most previous works on unsourced random access have considered a different approach, which does not allow for feedback. The idea is to divide transmission into several blocks and perform coding in two steps: (1) an outer code, to allow the decoder to relate ("stitch") messages across different blocks to one another; and (2) an inner code, wherein each user codes its message (payload + parity bits) over an AWGN multiple access channel-in this framework, decoding the inner code boils down to solving a compressed sensing problem with a binary signal. An interesting question is whether our proposed scheme can provide any gains if used to construct an inner code in this framework. In [11], the authors proposed to use a certain tree code (outer code) and an i.i.d. Gaussian sensing matrix for the inner code, together with a certain AMP decoder, which in decoding iteratively passes information between the inner and outer codes. We tried replacing the AMP decoder with our scheme. Specifically, we considered an iterative procedure that alternates between the following steps: (1) run Glauber dynamics on each block, producing a soft decision rule for the (sparse, per-block) activity pattern; and (2) a tree code inference step, that takes the per-block "likelihoods" produced by Glauber dynamics and computes a posterior over the entire activity pattern, by integrating information across all the blocks. The next time we decode the inner code, this posterior is used for the new prior of the signal. Our preliminary experiments indicate that the performance of this combined scheme is rather disappointing and quite far off from state of the art [11].
Generalizing to non-equal channel gains. When discussing random access, we modeled the received signal at the base station as y = Ax + σz where x ∈ {0, 1} M is the pattern of active users and σz is Gaussian noise; namely, the channel between the users and the basestation is an AWGN multiple access channel where all gains are equal. This model is based on the assumption of perfect power and phase control, which is not always realistic, and designing communication schemes for the fading model, where channel gains are not assumed equal, is desired. Generalizing our MCMC decoder to incorporate fading looks somewhat challenging. Consider a model y = AHx + σz where H = diag(h 1 , . . . , h m ) is a diagonal matrix of (random) fading coefficients. We would like to sample from the posterior of x given y: where notice that we now need to marginalize over H = diag(h 1 , . . . , h M ). This marginalization appears to complicate things considerably: in particular, in contrast to the case where H is the identity matrix, in the general case, it is not so straightforward to sample x i conditioned on all other coordinates. Devising an MCMC decoder that can handle fading is an interesting problem for future research. Here, Q (t) is the law of x (t) , the state of Glauber dynamics at time t, starting from x 0 = x, and Q (∞) is the stationary distribution.
The proof of Lemma 2 follows by constructing a contracting coupling between X x and X x for any x ∼ x , and applying Theorem A1. The construction proceeds as follows. Let i ∼ Uniform([M]) be a random coordinate to update. For all ∈ [M], let and where ϕ(x) = 1/(1 + e −x ) is the logistic function. Let p 1 ( ) be defined similarly, with x replaced by x . To obtain X x and X x , sample U ∼ Uniform[0, 1], independent of i. X x , X x coincide with x, x , respectively, on all coordinates = i; as for the ith coordinate, set (X x ) i = 1 if U ≤ p 1 (i) (and otherwise set to 0), and likewise set (X Clearly, the random variables (X x , X x ) constructed in this manner have the "correct" marginal distribution; thus, we have defined a legitimate coupling. It remains to show that this coupling is contracting. Since x ∼ x , there is a unique coordinate on which they differ, call it 0 . Observe that, conditioned on i = 0 , we have X x = X x exactly. On the other hand, when i = 0 , the Hamming distance either stays the same or increases by 1, depending on U. Indeed, the distance increases if and only if min{p 1 (i), p 1 (i) } < U ≤ max{p 1 (i), p 1 (i) }, and, conditioned on i, this happens with probability |p 1 (i) − p 1 (i) |. Thus, and a similar expression holds for p 1 ( ) , with x replaced by x . Since ϕ(·) is 1/4-Lipschitz, where we used that x and x differ only on 0 . Observe that the double sum simply counts the number of pairs ( , f ) such that 0 and = 0 are both connected to the factor f .
Recalling that the degree of all variables is ν and the degree of all factors is s, which is just ν(s − 1). We conclude that which is < 1 whenever 4σ 2 > ν(s − 1) ; this is exactly the condition (6), appearing in the statement of Lemma 2. Applying Theorem A1, This bound is ≤ε whenever t is exceeds the quantity in Lemma 2.

Appendix B. Approximate Message Passing (AMP)
In this section, we provide implementation details for the AMP algorithm used in the numeric comparisons of Section 3.
The AMP algorithm of Donoho et al. [73] uses an iteration of the following form, starting from x (0) = 0, r (0) = 0: Here, f t : R → R is a sequence of univariate functions (for a vector b, f (b) stands for applying f separately to each coordinate). The main idea of AMP ("State Evolution") is that in the large-dimensional limit (and under certain technical assumptions), the iterates b (t) behave as b (t) ≈ x + σ t N (0, I); that is, additively corrupted measurements of the true signal x. The variance σ 2 t can be estimated from r (t) , which approximately has the law N (0, σ 2 t I); conversely, it can be tracked via an explicit recursive formula. For our experiments, we use the proposal of Donoho et al. [74], and use the robust estimator σ t = median(|r (t) |)/Φ −1 (0.75), where Φ −1 (·) is the inverse Gaussian CDF.
The details of the AMP algorithm (A1) rely on the specific choice of functions f t . In compressed sensing of binary signals (1), a natural choice is the MSE-optimal estimator for x from b = x + σ t N (0, I), namely, This is what we used for the experiments presented in Section 3.