Performance Limits of Direct Wideband Coherent 3D Localization in Distributed Massive MIMO Systems

We address the accuracy of wideband direct position estimation of a radio transmitter via a distributed antenna array in 5G cellular systems. Our derivations are based only on the presence of spatially coherent line-of-sight (LoS) signal components, which is a realistic assumption in small cells, especially in the mmWave range. The system model considers collocated time and phase synchronized receiving front-ends with antennas distributed in 3D space at known locations and connected to the front-ends via calibrated coaxial cables or analog radio-frequency-over-fiber links. Furthermore, the signal model assumes spherical wavefronts. We derive the Cramér-Rao bounds (CRBs) for two implementations of the system: with (a) known signals and (b) random Gaussian signals. The results show how the bounds depend on the carrier frequency, number of samples used for estimation, and signal-to-noise ratios. They also show that increasing the number of antennas (such as in massive MIMO systems) considerably improves the accuracy and lowers the signal-to-noise threshold for localization even for non-cooperative transmitters. Finally, our derivations show that the square roots of the bounds are two to three orders of magnitude below the carrier wavelength for realistic system parameters.


Introduction
Recently, there has been a growing interest in massive multiple-input multiple-output (MIMO) systems [1] and millimeter-wave communication (mmWave) technology [2][3][4][5][6][7]. These systems are already finding their place in future 5th generation (5G) cellular systems. A very important application of 5G systems is indoor localization and tracking [8]. Their key objective is to achieve a localization accuracy of about 1 cm [9], which is satisfactory for location-based services in cellular systems. In this paper, we aim to obtain the limits of the accuracy of these systems and answer the question of whether, in theory, it is possible to achieve localization accuracy significantly better than the carrier wavelength. The vision is that if such localization accuracy is possible, one could exploit it for locationaided communications. We derived the Cramér-Rao bounds (CRBs) for models that admit only LoS signal components. We addressed two implementations of the system: (a) with deterministic sequences known to the receiver system and (b) with random Gaussian sequences.
The analysis of localization accuracy is important for various reasons [10]. They include location-aided communication (LAC) and coordinated multipoint (CoMP) transmission. For example, CRBs describe local properties for CoMP, and they are much more important than integer carrier wavelength ambiguity. The authors in [11] analyze the Numerically less expensive methods for LoS localization in a massive MIMO ultradense network are proposed in [24]. They take into account the curvature of the wavefront at the base station arrays (spherical vs. planar). In addition to this curvature, unlike in [24], in this paper, we also consider (1) other useful information due to spatial coherence, and (2) the spatial-wideband effect [25,26]. The localization in our paper is suitable for networks with centralized processing of signals received by distributed antennas such as C-RAN networks [27]. In accordance with [28], we used a physical model for the LoS components and a stochastic model for the NLoS components.
In this paper, we derive the CRBs of localization of an arbitrarily wideband stationary radio frequency (RF) transmitter (Tx) in a spatially coherent scenario (such as the one in [18]). The receiving channels are phase-and time-synchronized, but time synchronization between the Tx and the receiving system is not required, unlike for ToA-based methods [17]. The wavefronts are spherical (the planar wave assumption is not used), contrary to the scenario in [15]. The signals are arbitrarily wideband in the sense that any two TDoAs across the receiving array may differ by more than the reciprocal of the bandwidth. The processing of the received signals is coherent; we exploit the phase differences between all the receiving channels for estimation.
We show that the CRBs are inversely proportional to the squared carrier frequency, and their square roots are two to three orders of magnitude below the carrier wavelength for usual system parameters, as opposed to non-coherent signal scenarios/algorithms, where the ranging/localization CRB is inversely proportional to the squared effective signal bandwidth. Moreover, we show that the CRB is inversely proportional to the square of the signal-to-noise ratio (SNR) for low SNRs when the transmitted sequence is random Gaussian and inversely proportional to the SNR for high SNRs, as well as for known sequences (for any SNR). Finally, we found that the CRB is inversely proportional to the number of samples used for estimation.
The localization system can use an unknown-sequence method (a method suited to random signals) to localize a non-cooperative source in the area of interest, that is completely independent from the system, or a known-sequence method to localize cooperative terminals, especially if they are a part of the communication network that also performs this localization (see different methods in [18]). Known-sequence-based methods can be numerically more complex, but they provide a processing gain which suppresses all interference orthogonal to the searched sequence (similar to CDMA (code-division multiple access) systems). We point out that a system can choose to use unknown-sequence methods in both cases. From the perspective of information theory and attainable accuracy, of interest in this paper; if the localization system knows the sequence, then it has more information at its disposal to localize the source. The latter is reflected in the lower error bounds, as shown in this paper. This paper follows the research direction of extremely large aperture arrays (a group of antennas distributed in a large area that work coherently), [29], and analyzes them as an enabler for highly accurate 3D positioning. The results presented in this paper are a contribution to the idea of using distributed coherent massive MIMO systems as a component (enabling technology) of future 6G wireless networks [30].
The paper is organized as follows. In the next section, we explain the system and signal model used for deriving the CRBs. In Section 3, we present the main derivation results of the paper, and in Section 4, we provide some numerical results obtained from the derived CRBs. In the last section, we provide concluding remarks and directions for future research. In Appendix A, we explain in detail the derivation of the CRBs when the receiving system knows which sequence/waveform the transmitter uses, whereas in Appendix B, we present the derivation details for random transmitted signals.
In the paper, we use the notation given in Table 1.  Note that, for the most part of this paper, we use normalized time and frequency quantities (such as f c ), whereas we use quantities in physical units (non-normalized, such as ν c ) only when necessary. The reason for this decision is that the transition between the analog and the digital domain is then trivial (s(n) ↔ s(t), when t = 0, 1, 2, . . .).

The Setup
A single stationary transmitter, Tx, is at an unknown location r = (x, y, z). A distributed stationary receiving antenna array has M elements/channels, Rx m , m ∈ {1, 2 . . . M}, where the Rx m th antenna is at a known location r m = (x m , y m , z m ), as in Figure 1.
When the distances between adjacent antennas in an array are of the order of λ c /2, where λ c is the carrier wavelength, we call the array a collocated or classical array. If some of the antennas are separated by many wavelengths, we call such an array distributed. Note that a distributed array does not have to contain only independent (single) antennas, but it may also contain collocated arrays (as its subarrays). It may be cheaper to build a distributed array from prefabricated subarrays (with densely packed, i.e., collocated, antennas in each one) and to multiplex the signals from the antennas of the same subarray through a single cable. (This might especially be convenient at higher frequencies.) Next, we define the spatial coherence of a signal component (such as an LoS component) from a transmitter Tx, wherever it may be in a given area of interest, A, across a set of receiving (Rx) antennas, M. We say that the spatial coherence exists, if the difference, denoted by ∆d, in the distances from Rx antennas m and to the Tx implies that the difference in the phase of that component at these two Rx antennas is equal to −2πν c ∆d/ c, for any m, ∈ M, where ν c is the carrier frequency and c is the speed of propagation in that medium. Note that this is a weaker condition than the medium being homogenous and isotropic. It is weaker because only the phase differences are required to follow this law and only for signals coming from inside of A. Furthermore, note that spatial coherence is more likely to exist in smaller cells, and especially indoors. Most importantly, note that spatial coherence is a property of the propagation medium only-it has nothing to do with the front-ends or the cables of the Rx system.
We also define coherent localization methods. We call them coherent if they exploit the information in phase differences between signals in each pair of the receiving channels to increase the accuracy of the localization. This requires (a) spatial coherence in the medium and (b) that the Rx channels work coherently (i.e., that they are time-, frequency-, and phase-synchronized). If they do not work coherently, the phases of the received signals may well be scrambled and this information is lost. Non-coherent methods would still work, because they rely only on signal envelopes. There are also semi-coherent methods, which exploit the phase differences between the signals from the same subarray, but not between different subarrays (because, e.g., the subarrays may be far apart from each other, that is, spread out over a large geographical area where spatial coherence does not exist). Note that spatial coherence, in most cases, exists across collocated arrays.
We now define direct localization methods. These methods estimate the location directly from the raw acquired data (the signal samples), whereas indirect methods estimate the location from some intermediate estimates (such as directions of arrival, times of arrival, time differences of arrival, or received signal strengths). Direct methods usually have better performance at the cost of higher numerical complexity and the need for the transference of raw samples to the signal processor. Furthermore, a wideband localization method is able to cope with signals whose bandwidth is large, such that the differences between envelope delays at different Rx antennas may exceed the inverse of the bandwidth.
Next, we draw some comparisons of our model with other models. Our receiving system model is equivalent to the one in [31,32] for a cell-free network, but the propagation model, however, is not. Namely, the network contains antennas, placed over a wide area, which (thanks to synchronization) work coherently to receive signals. Then, the signal processors process these raw signals, but, unlike in [31,32], where the phases in the Rx signals are scrambled, we consider scenarios with spatial coherence of LoS components, which means the information about the Tx location contained in the phases is preserved. The network may have many more antennas, but only a subset of M of them is chosen. It is assumed that the Tx is in LoS conditions with low path loss with each of the M antennas and that there is spatial coherence of the LoS components across this subset (as explained in [18]). Furthermore, this assumption is made for scalability reasons (see, e.g., [33]). The spatial coherence condition allows coherent localization, which in turn, as we will show, can provide a much higher accuracy. The network also chooses a central processing unit that will run a direct localization algorithm on the acquired signals. A similar comparison can be drawn against the model in [34]. Using the terminology from [34], we have a number of antennas in a distributed antenna system spread over the area of interest, grouped into remote antenna units (RAUs), where the RAUs can have different numbers of antennas; each RAU contains one or more antennas. The signals from the antennas are fed to baseband processing units (BPUs), as separate streams, where processing occurs (centralized processing is enabled, i.e., a direct localization algorithm can be run). The main differences are (a) the elements of the channel matrix in [34] are random, whereas we model the LoS components separately (from NLoS) and deterministically, and (b) the symbols transmitted at the same time arrive at the Rx antennas also at the same time (narrowband approximation) [34], whereas we consider wideband signals.
We also describe a contrast with [35], where a large number of low-cost low-power sensors transmit wirelessly short messages to a fusion center. Instead of performing coherent direct localization (which requires raw signals at the fusion center, and, therefore, probably requires wired infrastructure), it can cooperate with a high performance network, which performs coherent localization and advanced wireless communication (a network which we model) to help it detect the presence of mobile transmitters.
For the receiving channels to work coherently, the cables connecting the Rx antennas to their front-ends (Figure 1), together with the front-ends themselves, should either induce equal delays (including phase delays) in the signals or the deviations should be compensated for in preprocessing (this is a more likely and a more flexible solution). The IQ (in-phase-quadrature-phase) demodulators should use a local carrier from a common source (frequency-synchronization), the A/D converters should have a common clock, and the corresponding samples should have the same indices (timestamps) (time-synchronization). Readers interested in practical implementations of synchronization methods are referred to [36]. The Rx system receives a signal segment of N complex samples (N in-phase samples as real plus N samples from the quadrature-phase branch as imaginary) from each channel (starting, of course, from the same sample index in each channel). The signal processor then runs a localization algorithm on that segment (which contains MN samples). The SNRs are defined for each channel, and they are expected to decrease with the square of the distance from the Tx to an Rx antennas, but we allow them to have any values (deviations could happen because the polarizations are not aligned perfectly, for example; especially if the antennas are distributed in 3D). The system, of course, chooses a carrier frequency and a bandwidth appropriate for the band it wants to monitor (to perform localization in).
Note that the Tx does not need to be either time-or phase-synchronized with the receiving system. This means that even noncooperative transmitters can be localized. If one hopes to estimate the Tx location with accuracy better than the carrier wavelength, then the positions r m of the receiving antennas have to be determined with even better accuracy before the localization algorithm is executed.

Signal Model
The Tx transmits an RF signal (approximately) bandlimited to (ν c − B/2, ν c + B/2), where ν c is the carrier frequency in Hz and B is the signal bandwidth in Hz. The signal received at the Rx m th antenna at a distance d m from Tx, bandlimited to (−B/2, B/2), is sampled at the Nyquist rate ν s = B and is represented in a complex form as where n ∈ {0, 1 . . . N − 1}; a m is a real-valued attenuation factor for the LoS component, s m (n); s(·) is the sequence/waveform; the term u NLoS m (n) models all NLoS components; ω c = 2π f c and f c = ν c /B are the normalized angular and natural carrier frequencies, respectively; the unknown value τ 0 models the asynchronism between the Rx system and the Tx (we use, as a reference, the Rx time axis); τ m = d m /c is the propagation time from Tx to Rx m , where d m = r − r m and · is the Euclidean norm; the normalized propagation speed is c =c/ν s , wherec = 3 × 10 8 m/s; there are L m NLoS paths to the Rx m th antenna with complex-valued coefficients a m, and propagation times τ m, ; w m (n) represents noise; the acquired samples are u m (n).
Note that τ 0 and τ m are fractional dimensionless values (the propagation time delays do not have to be integer multiples of the sampling interval) and that the sequence/ waveform s(·) is defined on the entire (continuous) range R because it has been transformed into a continuous waveform inside the Tx.
Unlike the signal model in [1,37], the signal model in this paper is spatially wideband. Furthermore, the steering vectors are modeled implicitly by τ m for each individual Rx antenna. Besides the spatial-wideband effect, this also models the curvature of the wavefronts, which cannot be neglected due to the size of the subarrays compared to the area of interest. Furthermore, in contrast to [1,37], the coefficients a m are real-valued, thanks to the spatial coherence of the LoS components. Note that the coefficients a m, for the NLoS components may have any phase. In [25,26], it is argued that modeling the spatial-wideband effect is important even for collocated massive arrays and that it is critical for distributed systems.
We scale the signal u m (n), in a preprocessing step, by 1/a m , because the signal-tonoise ratios (SNRs) and noise powers are considered to be known to the receiver. Thereby, the useful signals in all the receiving channels have the same powers. Then, we have where u NLoS m (n) are the (scaled) NLoS components and w m (n) is a circular-complex zeromean Gaussian random process with variance σ 2 m , CN (0, σ 2 m ), independent across time samples and channels and independent from the waveform s(n). Then, the SNR in channel m is defined by The propagation attenuations are modeled by σ 2 m , but the model considers that no information about the transmitter location is contained in the attenuations. In practice, location estimation based on phases is more robust than the location estimation based on attenuations.
Our objective is to find the CRBs of the estimates of unknown parameters for the cases of (a) known and (b) random sequences. In the former case, the unknown parameters are α = (τ 0 , x, y, z) and in the latter, α = (x, y, z). Let us recall that the Tx is not timesynchronized with the Rx system. Therefore, the parameter τ 0 (the shift between the Tx and Rx time axes) is unknown to the signal processor. Even though we are not interested (in this context) in obtaining an estimate of τ 0 (we are only interested in estimating (x, y, z)-the Tx location), τ 0 must be modeled as an unknown and, therefore, as a part of α in case (a). Otherwise, an estimation algorithm could exploit this information to produce a better estimate than could be possible (on average) with the given model (and that would mean that the error bound would be different, too).
In contexts outside of the scope of this paper, an estimate of τ 0 might not be irrelevant. On the contrary, if the Rx system (such as a base station or a group of them) were synchronized to the universal time, and it sent its estimate of τ 0 to the Tx (the terminal), the Tx could adjust its clock to match the universal time as well (as an additional service, the system could provide to the terminal).
Another important thing to note is that, in case (b), regardless of how good or bad an estimation algorithm is, it would be unable to estimate τ 0 at all. An intuitive explanation could be: If τ 0 , s (t), and r were likely estimates of τ 0 , s(t), and the location, then the estimates τ 0 = τ 0 − ∆τ, s (t) = s (t − ∆τ), and r would be equally likely, for any ∆τ, no mater how large |∆τ| is, so the error would be unbounded. A formal explanation is as follows: the Fisher information matrix (see the next section) in case (b), if we chose α = (τ 0 , x, y, z) would be singular, and we need to invert it to obtain the CRB, which would mean that the CRB would tend to infinity.

Cramér-Rao Bounds
The authors in [23,38] provide CRBs for localization when the user terminal and the base station each have a single collocated (small aperture) antenna array. They show that, when the number of antennas grows, the CRBs of the LoS component parameters tend to the appropriate LoS-only CRBs, because the LoS components become separable from the NLoS ones, thanks to the channel sparsity in mmWave. In accordance with that, we derive the CRBs when u NLoS m (n) = 0. This allows us to quantify the impact of the location information in the LoS components on the attainable localization accuracy, regardless of the model for the NLoS components (whether they are modeled as random or deterministic and whether they are changing more slowly of quickly). Even if the NLoS components are not negligible in some scenarios in practice, the LoS-only CRBs are useful for selecting the resolution of a search grid for localization methods that use grids. If the system designer cannot predict how much the NLoS components would degrade the accuracy, it seems logical to use the resolution appropriate for the case without that degradation.

Known Sequences (Cooperative Transmitter)
This case corresponds to a scenario where a base station allocates a sequence for a cooperative user terminal. The terminal then transmits the sequence so that the base station can localize the terminal. From the assumptions about w m (n) in (2), we deduce that Re w m (n) and Im w m (n) have zero-mean Gaussian distributions with variance σ 2 m /2, N 0, σ 2 m /2 , where Re and Im denote the real and imaginary parts of the argument, respectively. It holds then, that E(u m (n)) = s m (n), where E stands for the expectation operator. Furthermore, the probability density function (PDF) of a single time sample is All the observations are given by the vector The PDF of u is given by and thus, the loglikelihood function, L = ln p(u; α), is The Fisher information matrix (FIM) of the unknown parameters is defined by After taking the first partial derivatives of L with respect to the unknown parameters given in α, we get expressions of the form where (if we use the symbol (·) to denote the first derivative) Then, after taking the second partial derivatives of L, we obtain where f 6 (n, m, α) is a deterministic function without an impact on the final result (as explained in Appendix A). These expressions depend on u m (n), which we treat as random variables (not their realizations). After applying the expectation operator to them, we get FIM elements of the form where the multiplication factor is given by and is common to all of the FIM elements. It is interesting to note that the first term in β takes into account the impact of the information in the carrier phase on the error bound, whereas the second term (the derivative of the sequence, s (·)) takes into account the bandwidth. Finally, we need to invert the FIM. The elements on the main diagonal of FIM −1 are the error bounds for τ 0 , x, y, and z, in that order. The bound on the mean squared localization error is then the sum of the elements (2, 2), (3,3), and (4, 4) of FIM −1 for 3D and the sum of (2, 2) and (3, 3) for 2D localization. Appendix A provides the steps of this derivation in more detail. Let SNR 0 be the SNR of a channel whose antenna is 1 m away from the transmitter. Based on the derived expressions for the FIM elements, we obtain that the CRB approximately depends on SNR 0 , the carrier frequency, and the number of samples is as follows:

Random Gaussian Sequences (Non-Cooperative Transmitter)
This case corresponds to a scenario where a base station tries to localize a (possibly non-cooperative) terminal transmitting a signal unknown to the base station (probably with a compact spectrum, e.g., OFDM). We reiterate that in our derivation for random sequences, the beginning of the period of the transmitted sequence, τ 0 , is unknown, but irrelevant for the localization. The reason is that a time-shifted white Gaussian process has the same statistical properties as its version without a time shift. The CRBs derived here are approximate, since the signal is not periodic with period N and the DFT-based time shift is cyclic.
Contrary to the scenario with known sequences, here, all of the MN-acquired time samples are dependent in pairs, which makes it inconvenient to form the joint distribution. Namely, if τ 0 + τ m in (1) were an integer, say 17, then s m (0), as well as u m (0) in (2), would depend only on s (17). However, in reality, τ 0 + τ m is not an integer, but say 17.439, and therefore u m (0) depends on all of s(0), s(1), . . . , s(N − 1) (s(n) being random variables and u m (0) being a function of them). The same is true for u m (1), u m (2), . . . , u m (N − 1) (for that particular channel m)-each of them depends on all s(0), s(1), . . . , s(N − 1). It would be easier if we had groups of variables such that the groups are independent from each other. Then, the total joint PDF would be a simple product of the PDFs of each group individually.
An idea for addressing this is to transform the received signals into the frequency domain (using DFT). We perform the DFT on the time domain vector [u m (0), u m (1), . . . , u m (N − 1)] to get a vector of DFT components, and we repeat this for each Rx channel m, m ∈ {1, 2, . . . , M}. This way, we obtain a new statistical sample, which contains MN frequency samples, instead of the old one, which contains MN time samples, but the amount of information in the new one and in the old one is the same (they are equivalent). Next, at a given frequency with index k, we can group the k-th DFT component from each of the Rx channels into an M-element vector [ u 1 (k), u 2 (k), . . . , u M (k)]. We show that these N groups (random vectors) are independent from each other (exactly what we were looking for). Finally, all that remains is to find the PDF of each group and calculate their product. We proceed by explaining these steps in more detail in the following text. As a convention, we will use the symbol to denote DFT spectra.
The DFT spectra of the signal and noises are where F denotes the DFT operator.
In order to express time shifts in the spectral domain realistically, let the discrete frequency be in the range k ∈ − N 2 , − N 2 + 1, . . . , where s m (k) = F (s m (n)). Since the noise signals w m (n) and the unknown useful signal s(n) are white (uncorrelated time samples) stationary complex Gaussian random processes, their DFT components, w m (k) and s(k), are also i.i.d. (independent and identically distributed) complex Gaussian [39]. The fact that the samples at different frequencies are independent is a very convenient property when we need to find the joint PDF-a property we do not have in the time domain. Note that the time samples s(n) and w m (n) are independent because, in the model, we consider that the sampling is performed at the Nyquist frequency. This, however, does not restrict the analysis in this paper to such systems. Even if a system performs oversampling (which has some advantages), it can reduce the sampling frequency to the Nyquist frequency later in the digital domain.
Regardless, the amount of information is the same (in the band of interest), which is important for the analysis of precision. Therefore, the model is general (it holds for both types of systems). To summarize in a more formal way, the properties of s(n) and w m (n) imply that Re s(k), Im s(k), Re s m (k), and Im s m (k) have Gaussian distributions, i.e., Re s m (k), Im s m (k) ∼ N 0, Nσ 2 s /2 , and Re w m (k), Im w m (k) ∼ N 0, Nσ 2 m /2 . Furthermore, the processes s m and w m are also independent, as are the random variables s m (k 1 ) and s m (k 2 ) (and similarly w m (k 1 ) and w m (k 2 )) for k 1 = k 2 . Finally, the real components are independent from the imaginary components.
Instead of having time samples in the statistical sample, we can compute and use the DFT samples of the received signals. So, the entire statistical sample (containing MN complex scalars from the frequency domain) is Note that the m-th row of the matrix u is the DFT spectrum of the signal from the m-th channel and that the columns (viewed as random vectors) are independent from each other. For the PDF of u, we can write because the random variables for different frequencies are independent [39]. Also, we have p( s(k)) = 1 p( w m (k)) = 1 where s(k), w 1 (k), . . . , w M (k) are independent. In Appendix B, first we show that we can write the loglikelihood as where γ = 1 , and where C m and D m are real and We note that γ and A do not depend on α.
The FIM of the parameters is given by Taking the first partial derivatives of L with respect to the unknown parameters given in α, we get expressions of the form ∂L ∂x Then, taking the second partial derivatives yields expressions of the form Next, we apply the expectation operator to the second derivatives (treating the DFT components u m (k) as random variables) to get the FIM elements, of the form where the multiplication factor is common to all of the FIM elements. Since the FIM in the random sequence case does not have a row/column that corresponds to τ 0 , the bound on the mean squared localization error is simply the trace of FIM −1 for 3D and the sum of the elements (1, 1) and (2, 2) for 2D localization. Appendix B provides the steps of this derivation in more detail. Based on the expressions for the FIM elements, the CRB approximately depends on SNR 0 , the carrier frequency, and the number of samples as follows:

Discussion
In this section, we present experimental results that provide further insights on the localization performance of the studied system. Figure 2 shows a 3D slice representation of the CRB for 3D localization inside a room of size 6 m × 4 m × 2.5 m, with 18 antennas on each wall and the ceiling (90 in total), marked with triangles, for SNR 0 = 25 dB, N = 1024, f c = 60 GHz/100 MHz, and a random Gaussian sequence. For convenience, the parameters are also displayed in Table 2, case 1. In the figure, we plotted √ CRB/λ c as a function of the 3D space, where λ c = c/ f c =c/ν c is the carrier wavelength. We imagined that a distributed array would come in prefabricated panels with subarrays of antennas for easy installation. So, we modeled four smaller subarrays (but still with inter-antenna distances considerably larger than λ c /2) on the walls, and a larger one (with a larger aperture) on the ceiling. Note that the cuboid frame is only a rough model of the room-There would be furniture and people inside and the walls are not expected to be completely flat in practice. Furthermore, they would not be ideal reflective surfaces, but the wave would penetrate the walls to an unknown depth, depending on their conductance and other parameters, changing the carrier phase by a significant amount. Using the information in NLoS components in coherent localization would be much more challenging than LoS components, because ray-tracing-based localization algorithms would have to know the depth, whereas fingerprinting-based methods might require the frequent retraining of their databases (since variations in the phases have to be taken into account here). However, this remains an interesting topic for plausibility studies in future research-Could the obstacles be anchored in space with high enough precision as well (as the Rx antennas are) to enable the gains of coherent localization from their reflected components? This is outside the scope of this paper, so we discard the NLoS components in this error-bound analysis. Figure 3 shows the CRB for a distributed array with five randomly placed antennas for 2D localization in the plain of the array, for SNR 0 = 10 dB, N = 64, and a random Gaussian sequence ( Table 2, case 2). Since the CRB is much lower inside the array aperture, this suggests that a distributed array should be placed in such a way that its aperture encompasses the area where one expects the transmitters.
It is worth noting that both of these example antenna array geometries were chosen to be irregular. An impact on localization performance could be seen as analogous to classical arrays, where nonuniform or sparse arrays perform better than uniform ones for a given number of antennas. In [40], a uniform antenna array with a large number of elements was optimized by reducing the number of used elements greatly (a more cost-effective solution), effectively creating a sparse array with performance comparable to that of the original array. The optimization of an array geometry is also out of scope here, but is an interesting topic for further research. Our idea is that this analysis would also be applicable to geometries that would be set up as ad hoc and then the antenna positions measured.  In Figure 4, we present how the 90% quantile of √ CRB depends on ν c , SNR 0 , and N, for a random Gaussian sequence ( Table 2, case 3). The plots show how the accuracy of localization improves with frequency, N, and SNR 0 . They also demonstrate the advantage of using mmWave because the increased frequency of the carrier allows for higher localization accuracy. The results for known sequences are similar.
It is also worth investigating how the CRBs depend on the number of antennas, M. Since the effect of relative placement of antennas to each other and the effect of increasing M are coupled, we have chosen to evaluate the CRBs for a uniform square distributed array with 2 × 2, 3 × 3, 5 × 5, and 9 × 9 antennas (so M ∈ {4, 9, 25, 81}). The arrays were attached to the ceiling of a room and the CRBs were evaluated in a square of size 4 m × 4 m and centered 1.2 m directly below the array, in a horizontal plane, for SNR 0 = 15 dB, N = 1024, f c = 60 GHz/100 MHz ( Table 2, case 4). We analyzed a constant aperture and a constant antenna distance scenario. In the former, the array's aperture was kept at 4 m × 4 m, whereas in the latter, the adjacent antennas were retained at 0.5 m from each other, effectively making the 9 × 9 arrays identical in both setups. Figure 5 compares the CRBs of the arrays in the constant aperture and Figure 6, in the constant antenna distance scenario.     We have also evaluated the CRB curves as a function of SNR 0 (recall that SNR 0 is the SNR of a channel whose antenna is 1 m away from the transmitter) for random Gaussian sequences and known sequences (Table 2, case 5). Again, the results are obtained by averaging the CRBs for locations of the transmitter over locations in the 4 m × 4 m square of the experiment. The results for the 2 × 2 and 9 × 9 arrays are shown in Figure 8. They suggest that increasing the number of antennas from 4 to 81 produces a gain of approximately 15 dB. The results also show the advantage of using mmWave massive MIMO systems because the localization can successfully be performed in low SNR conditions (low transmitted power in the mmWave range), even for unknown sequences (noncooperative scenarios). In [18], the authors present additional results for CRBs and localization algorithms evaluated for different sets of parameters, including different array geometries. Finally, we point out that in coherent direct localization, besides the main lobe of the criterion function of the localization algorithm, there usually exists a number of high sidelobes (the space between them is on the order of the carrier wavelength), so that some estimates are placed on these lobes, degrading the absolute error. This is called the (integer wavelength) ambiguity problem (for further details, please see [18]). However, for distributed beamforming (which is one of the important applications of coherent localization), this absolute error is irrelevant, whereas the error within the main lobe (the distance of its local maximum to the true location) determines its performance. This error is much smaller than the absolute one and is reflected in the CRB (which describes only local behavior). For this reason, the ambiguity problem is outside the scope of this paper. Some possible solutions, though, are mentioned in [18].

Conclusions
In this paper, we have addressed the performance limits of direct wideband coherent 3D localization in distributed mmWave massive MIMO for 5G cellular systems. In the derivation of the limits, we assume only the presence of spatially coherent line-of-sight signal components. We also assume collocated time-and phase-synchronized receiving front-ends with antennas distributed at known locations. The results for known and random Gaussian signals show that the CRB is inversely proportional to the squared carrier frequency. For a typical indoor setting and realistic system parameters (M = 5 × 18, SNR 0 = 25 dB, N = 1024), the square root of the CRB is smaller than λ c /1000 in 90% of the space. These results suggest important advantages of massive MIMO for 5G systems. Namely, one can preprocess the signals at the base station array to focus energy at the selected user antenna in the downlink and, conversely, to increase the gain in the uplink. This shifts the focus from location-based services to the location aided communication concept. This entails that coherent localization in LoS scenarios, typical for small cells in mmWave range, has the potential to improve the capacity considerably and the overall performance of 5G massive MIMO systems with distributed antennas. The results also show that the studied systems are especially suited for coherent localization due to high accuracy and the ability to perform localization in low SNR conditions (low radiated power), even with non-cooperative transmitters. The ambiguity problem of coherent localization can be solved in a number of ways, e.g, by optimizing the antenna geometry or by using signals from antenna subarrays distributed in the cell.
Finally, we provide some topics for future research. In practice, the propagation conditions inside a cell may vary over time, so it is desirable for a localization system to implement different methods (with different strengths and weaknesses), such as RSS and coherent ones. It would be valuable to compare their performance and design schemes that activate appropriate methods based on the detected conditions. Furthermore, optimization of antenna array geometries based on different criteria is of interest. One criterion is the accuracy, but the main accuracy gain seems to be achieved by placing the array so that the transmitters are inside its aperture. Another criterion is the reduction of the ambiguity problem, which can easily happen in coherent localization when the number of antennas is small. This is of interest in applications where absolute error is important (traffic collision prevention, automated robot and drone movement control).
A challenging direction of research is an error analysis in conditions with more pronounced NLoS components and the possibility to leverage information in strong reflections to improve coherent localization.
The exploration of a fair comparison between the error bounds of RSS (received signal strength; i.e., amplitude only) and coherent localization (from this paper) is also important. RSS localization puts no additional requirements on the hardware. To make the comparison fair, it would have to be carried out in conditions suitable for both types. Namely, RSS localization is robust against synchronization errors and a lack of spatial coherence in the medium. On the other hand, coherent localization is robust against deviations in the received power from a law with a given path loss exponent. This is especially important for 3D localization, where not all of the antenna polarizations would be aligned, so that it could even happen that an Rx antenna closer to the Tx has a weaker received signal power than another Rx antenna farther away.
Another interesting topic for further research is the possibility of using reconfigurable intelligent surfaces (RIS) to aid coherent localization for increased accuracy (the use of a RIS in non-coherent localization was already reported in [19]). Namely, if a localization system could measure very accurately the position of a RIS and if it controls (or knows) the phase changes that the RIS induces in the signal component, it reflects from a transmitter (that needs to be localized). Then, a coherent localization algorithm might be able to use such an NLoS component in addition to the LoS component for better performance (especially if the transmitter is well outside the aperture of the Rx antenna array). The paper [41] presents challenges and opportunities for using RISs in localization.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results. where (·) denotes the first derivative. Then, the first derivatives of the loglikelihood are If we define P m = jω c s m (n) + s pm (n) 2 , then for the second derivatives we obtain x − x m cd m y − y m cd m P m − (s m (n) − u m (n)) * f 6 (n, m, α) , (A15) where f i (n, m, α), 1 ≤ i ≤ 10, are deterministic functions, whose expressions are not given for brevity. They are all deterministic, and each of them is multiplied by the term (s m (n) − u m (n)) * whose expectation is equal to zero, and therefore, their values have no impact on the result in the next step. Then, for the first element of the FIM we write jω c s m (n) + s pm (n) 2 .
If we define β as after some simple derivation, we get because shifting a signal in time does not change its energy. Note that β does not depend on m. Then I 11 and the rest of the FIM elements can be expressed in compact forms respectively as The CRBs for the individual components are given by We define by ∆r = (x − x) 2 + (ŷ − y) 2 + (ẑ − z) 2 the error in locating the transmitter. For the bound of its mean squared value, we write where K ij is the cofactor of the (i, j)th element of I(α). For simplicity of notation, we will call this bound just the (transmitter location) CRB.
If the transmitter's z-coordinate is known (which corresponds to 2D localization), the corresponding FIM, I (2D) , is obtained by removing the last row and column of the FIM for the 3D case, I(α). Namely, Let the quantity SNR 0 be defined as the SNR value in a channel whose antenna would be 1 m away from the transmitter. If the signal power decreases with the square of the distance from the transmitter, and if the noise has equal power across the channels prior to scaling by 1/a m , starting from the derived expressions above, SNR m = SNR 0 d 2 0 /d 2 m , d 0 = 1 m, and the assumption that ω c is large in the sense that β ≈ ∑ N−1 n=0 |jω c s(n)| 2 , it can be shown that the CRBs approximately depend on SNR 0 , the carrier frequency, and the number of samples as follows: (Actually, CRB ∼ 1/ν 2 c . So, the expression CRB ∼ 1/ω 2 c holds if B is constant, since ω c = 2πν c /B. In addition, the dependence on B also exists in c.)

Appendix B. CRB Derivation-Random Gaussian Sequences
We start with writing the joint PDF p( u 1 (k), u 2 (k), . . . , u M (k)). This random vector is a function of M + 1 independent random variables (one for the useful signal, s(k), and M for the noise, w m (k)). We obtain the joint PDF by integrating over all the possible values of the signal random variable, s(k). If we denote Re s(k) = s R (k) and Im s(k) = s I (k), then p( u 1 (k), u 2 (k), . . . , u M (k)) = ∞ −∞ ∞ −∞ p(s R (k))p(s I (k)) × p w 1 (k) ( u 1 (k) − s 1 (k)) × p w 2 (k) ( u 2 (k) − s 2 (k)) × · · · × p w M (k) ( u M (k) − s M (k))ds R (k)ds I (k), where p w m (k) ( u m (k) − s m (k)) is the value of the PDF of the complex random variable w m (k) evaluated at u m (k) − s m (k). After some rearranging, we obtain The FIM is Note that γ, A do not depend on α. For the log likelihood, we write further (A49) Note that the first two terms of L do not depend on α, but the third term does. In the sequel, we use the following identities: (A73) Under similar assumptions as in the known sequence scenario, starting from the derived expressions, and with the approximations