## 1. Introduction

Progressive generations of cellular communication systems share a common property: each generation relies on a larger bandwidth and a higher carrier frequency than the previous one. This is particularly noticeable in 5G, where Frequency Range 2 (FR2) uses carriers above 24 GHz in combination with contiguous bandwidths of up to 400 MHz. Early commercial deployments in the US have demonstrated data rates beyond 1 Gbps [1]. The same increases in bandwidth and carrier frequency also directly benefit positioning [2]. This has led to intense research activity on 5G positioning, as well as a dedicated study item in 3GPP. The final report of this 3GPP study [3] revealed that, by using both delay and angle measurements, it is possible to satisfy commercial positioning performance requirements. Beyond positioning, the sparse nature of the channel at higher carrier frequencies allows the receiver to resolve the multipath components, which turns them from foe into friend [4,5]. In particular, the geometric nature of the channel at mmWave frequencies relates the multipath parameters to the physical environment, since these parameters depend on the location of the user equipment (UE) relative to the base station (BS) and on the propagation environment. The simultaneous localization and mapping (SLAM) problem is then invoked to invert the multipath parameters into geometric information that determines the user's position and the locations of objects, based on the signal from a single base station. Several papers have addressed this problem, exploiting both line-of-sight (LOS) and non-LOS (NLOS) paths for position estimation, synchronization, and mapping in mmWave multiple-input multiple-output (MIMO) [6,7,8] and mmWave multiple-input single-output (MISO) [9,10] contexts. However, this inversion is challenging for several reasons: (i) estimating the channel parameters involves solving a high-dimensional problem; (ii) the association between the objects in the environment and the measurements is not available; (iii) an object may give rise to multiple measurements due to diffuse multipath; and (iv) the measurements per object are not clustered. We now treat each of these four challenges in turn.
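To illustrate how the multipath parameters encode geometry, the following minimal 2D sketch uses the standard image (mirror) method: a single-bounce specular reflection off a planar wall behaves like a LOS path from a virtual anchor, i.e., the BS mirrored across the wall. The function names, the 2D setting, the neglect of UE orientation (angles are expressed in the global frame), and the numeric values are illustrative assumptions. The additive clock bias reflects why the UE must also be synchronized before delays can be turned into distances.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def mirror_point(p, wall_point, wall_normal):
    """Mirror a 2D point p across the wall line through wall_point with unit normal wall_normal."""
    d = (p[0] - wall_point[0]) * wall_normal[0] + (p[1] - wall_point[1]) * wall_normal[1]
    return (p[0] - 2.0 * d * wall_normal[0], p[1] - 2.0 * d * wall_normal[1])


def path_parameters(bs, ue, clock_bias=0.0, wall=None):
    """Return (delay in s, angle of arrival in rad) for the LOS path (wall=None)
    or for a single-bounce specular path off wall = (wall_point, unit_normal).

    The NLOS path is treated as a LOS path from the virtual anchor (the mirrored BS).
    The angle of arrival is the global-frame direction from the UE towards the
    (virtual) source; UE orientation is ignored in this sketch.
    """
    source = bs if wall is None else mirror_point(bs, *wall)
    dx, dy = source[0] - ue[0], source[1] - ue[1]
    delay = math.hypot(dx, dy) / C + clock_bias  # propagation delay plus unknown clock offset
    aoa = math.atan2(dy, dx)
    return delay, aoa


# Example: BS at the origin, UE 10 m away, wall along the line y = 5 m.
bs, ue = (0.0, 0.0), (10.0, 0.0)
wall = ((0.0, 5.0), (0.0, 1.0))
los_delay, los_aoa = path_parameters(bs, ue)
nlos_delay, nlos_aoa = path_parameters(bs, ue, wall=wall)
print(f"LOS:  {los_delay * 1e9:.2f} ns, {math.degrees(los_aoa):.1f} deg")
print(f"NLOS: {nlos_delay * 1e9:.2f} ns, {math.degrees(nlos_aoa):.1f} deg")
```

Moving the wall changes the NLOS delay and angle while leaving the LOS path untouched; SLAM exploits exactly this dependence, in reverse.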

Channel estimation for mmWave is a rich research area that we cannot cover in detail here. Rather, we categorize these methods as search-based or search-free. Search-based methods, such as those using maximum likelihood (ML) [11] and compressed sensing (CS) techniques [12,13], require an exhaustive search in the high-dimensional space of channel parameters, which entails high complexity. On the other hand, search-free methods [14], such as subspace methods based on matrix or tensor decomposition, either directly provide estimates of the channel parameters [15] or rely on a low-dimensional search [16], thus avoiding high-dimensional optimization. An important challenge of channel estimation for SLAM is that the estimates along the different dimensions (angles of arrival and departure, delays, and gains) must be correctly paired per path.
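As a rough illustration of why search-based estimation is costly, the sketch below performs ML delay estimation for a single path by exhaustive search over a delay grid. It assumes one noiseless path observed on $N$ uniformly spaced subcarriers; the subcarrier spacing, grid, and function name are illustrative choices, not the estimators of [11,12,13]. With steering vector $a_n(\tau)=e^{-\jmath 2\pi n\Delta f\tau}$ and an unknown complex gain, the ML delay estimate maximizes $|\mathbf{a}(\tau)^{\mathsf{H}}\mathbf{y}|$, because $\|\mathbf{a}(\tau)\|$ is constant. A joint search over $D$ parameters with $G$ grid points each costs $G^D$ evaluations; for instance, 100 points in each of four dimensions already gives $10^8$ candidates per path.

```python
import cmath
import math


def ml_delay_grid_search(y, delta_f, tau_grid):
    """Single-path ML delay estimate by exhaustive grid search.

    y[n] is the frequency-domain observation on subcarrier n. The steering vector is
    a_n(tau) = exp(-j 2 pi n delta_f tau), so the ML estimate maximises |a(tau)^H y|.
    """
    best_tau, best_metric = None, -1.0
    for tau in tau_grid:
        # a(tau)^H y: conjugated steering vector applied to the observation
        corr = sum(y_n * cmath.exp(2j * math.pi * n * delta_f * tau) for n, y_n in enumerate(y))
        if abs(corr) > best_metric:
            best_tau, best_metric = tau, abs(corr)
    return best_tau


# Illustrative values: 256 subcarriers at 120 kHz spacing, one path at 50 ns.
N, delta_f, true_tau = 256, 120e3, 50e-9
gain = 0.8 * cmath.exp(0.3j)
y = [gain * cmath.exp(-2j * math.pi * n * delta_f * true_tau) for n in range(N)]
tau_grid = [i * 1e-9 for i in range(201)]  # 0 to 200 ns in 1 ns steps
tau_hat = ml_delay_grid_search(y, delta_f, tau_grid)
print(f"estimated delay: {tau_hat * 1e9:.1f} ns")
```

Even this one-dimensional search costs one correlation of length $N$ per grid point; search-free subspace methods such as ESPRIT avoid the grid altogether.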

The unknown association between measurements and objects is a common problem in SLAM, and powerful methods to address it can be found in the literature [17,18]. SLAM with radio-based measurements has been considered in the context of ultra-wideband (UWB) communication [19,20] using only distance measurements, which is referred to as multipath-assisted SLAM or channel SLAM. Focusing on the application of SLAM in a 5G mmWave context (which we designate "5G SLAM"), message passing-based estimators were introduced in References [7,21], based on the concept of non-parametric belief propagation, without considering data association (DA). Such methods can be extended to include the hidden DAs by following the approaches in Reference [22]. In Reference [23], the probability hypothesis density (PHD) filter, a random finite set filter, was used to solve the 5G SLAM problem, considering only one measurement per object. In Reference [24], a more powerful random finite set filter, the Poisson multi-Bernoulli mixture (PMBM) filter, which enumerates all possible DAs, was used.
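To give a sense of why enumerating all DAs is demanding, the sketch below counts the single-scan association hypotheses under the point-object assumption (each object generates at most one measurement, and each measurement is either assigned to a distinct existing object or left unassigned, i.e., attributed to a new object or clutter). This is a simplified counting model for illustration, not the exact hypothesis structure of the PMBM filter in [24]; allowing multiple measurements per object, as diffuse multipath requires, enlarges the space further.

```python
from math import comb, factorial


def num_associations(n_objects: int, n_measurements: int) -> int:
    """Count single-scan association hypotheses under a point-object model.

    k measurements are matched one-to-one with k of the existing objects; the
    remaining measurements are unassigned (new object or clutter).
    """
    return sum(
        comb(n_measurements, k) * comb(n_objects, k) * factorial(k)
        for k in range(min(n_objects, n_measurements) + 1)
    )


for n in (1, 2, 5, 10):
    print(n, num_associations(n, n))
```

The count equals the number of partial one-to-one matchings between objects and measurements and grows faster than exponentially: with ten objects and ten measurements it already exceeds $2\times 10^8$, which is why practical implementations typically retain only the most likely hypotheses.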

Multiple measurements per object, while common in the SLAM and extended object tracking literature [25], have generally been ignored in the above 5G SLAM works. They are caused by diffuse multipath arising from the roughness of objects relative to the wavelength, as depicted in Figure 1. In Reference [5], diffuse multipath is treated as a perturbation that leads to false measurements. In Reference [26], exploiting diffuse multipath in radar is proposed by means of diffuse multipath statistics. In Reference [27], surface roughness was considered in radar applications and modeled as a number of sub-reflectors, in an environment with known wall geometry. A similar model with random sub-reflectors was evaluated in Reference [28], where the estimated diffuse paths were used for positioning and mapping with a simple geometric approach.
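The sub-reflector idea of [27,28] can be illustrated with a minimal 2D sketch in the same spirit: random scatterers placed around the specular reflection point of a rough wall each produce a path whose delay lies close to, but not exactly at, the specular delay, so a single wall yields several measurements. The uniform displacement distributions, the function and parameter names, and all values are illustrative assumptions, not the exact models of the cited works.

```python
import math
import random

C = 299_792_458.0  # speed of light in m/s


def diffuse_path_delays(bs, ue, specular_point, wall_dir, n_sub, spread, roughness, seed=0):
    """Delays of single-bounce paths via random sub-reflectors scattered around the
    specular reflection point of a rough 2D wall.

    Sub-reflectors are displaced uniformly by up to `spread` along the unit wall
    direction `wall_dir` and by up to `roughness` along the wall normal.
    """
    rng = random.Random(seed)
    normal = (-wall_dir[1], wall_dir[0])
    delays = []
    for _ in range(n_sub):
        a = rng.uniform(-spread, spread)
        b = rng.uniform(-roughness, roughness)
        s = (specular_point[0] + a * wall_dir[0] + b * normal[0],
             specular_point[1] + a * wall_dir[1] + b * normal[1])
        delays.append((math.dist(bs, s) + math.dist(s, ue)) / C)
    return delays


bs, ue = (0.0, 0.0), (10.0, 0.0)
specular_point = (5.0, 5.0)  # reflection point on the wall y = 5 m
specular_delay = (math.dist(bs, specular_point) + math.dist(specular_point, ue)) / C
delays = diffuse_path_delays(bs, ue, specular_point, (1.0, 0.0), n_sub=20, spread=1.0, roughness=0.05)
print(f"specular: {specular_delay * 1e9:.2f} ns")
print("diffuse:", [f"{d * 1e9:.2f}" for d in sorted(delays)])
```

For these illustrative values, all diffuse delays fall within half a nanosecond of the specular delay: they are tied to the same wall yet are not a single measurement, which is what motivates clustering ahead of the SLAM filter.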

Finally, when the measurements from an object are not clustered, the unknown grouping can be handled within the SLAM filter [29,30,31], though at a high computational cost. In Reference [28], K-means clustering was used, but this requires a priori knowledge of the number of clusters. In Reference [32], K-power-means was proposed, together with several criteria for deciding the number of clusters. In Reference [24], perfect clustering was assumed.
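For reference, the core iteration of K-power-means [32] can be sketched as K-means in which each centroid is the power-weighted mean of its members, so strong paths pull the centroids more than weak ones. This sketch makes simplifying assumptions: it uses the Euclidean distance rather than a dedicated multipath component distance, initializes the centroids at the K strongest paths, and takes K as given (it does not implement the criteria of [32] for choosing K).

```python
import math


def k_power_means(points, powers, k, max_iter=100):
    """Power-weighted K-means: nearest-centroid assignment, power-weighted centroid update.

    Illustrative assumptions: Euclidean distance, centroids initialised at the
    k strongest points, k fixed in advance. Returns (labels, centroids).
    """
    order = sorted(range(len(points)), key=lambda i: powers[i], reverse=True)
    centroids = [tuple(points[i]) for i in order[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment: argmin_c P_l * d(x_l, c) is the nearest centroid, since P_l > 0.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update: each centroid becomes the power-weighted mean of its members.
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            total = sum(powers[i] for i in members)
            if total > 0:  # keep the previous centroid if the cluster is empty
                dim = len(points[0])
                centroids[c] = tuple(
                    sum(powers[i] * points[i][d] for i in members) / total for d in range(dim)
                )
    return labels, centroids


points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (5.0, 5.0, 5.0), (5.1, 5.0, 5.0)]
powers = [1.0, 0.5, 0.2, 0.9, 0.3]
labels, centroids = k_power_means(points, powers, k=2)
print(labels, centroids)
```

With uniform powers, the update reduces to ordinary K-means.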

In this paper, we address the aforementioned challenges, building on the extensive literature in each of the above research areas, to provide an end-to-end framework for SLAM that harnesses diffuse multipath. The proposed framework provides a general approach for user localization and environment mapping from the 5G downlink transmissions of a single BS. It can therefore be used in many application areas, including personal navigation [4], localization of cars and robots [33], smart homes [34], indoor location analysis [35], immersive customer experiences [36], location-aided communication [37], and personal radar [38], to name but a few. Moreover, the proposed framework can form a foundation for future beyond-5G and 6G localization and sensing approaches [39]. Our framework follows a layered approach comprising three main parts (channel estimation, clustering, and SLAM), which are evaluated both separately and end-to-end. The main contributions of this paper are as follows:

- The description of an end-to-end framework for SLAM harnessing diffuse multipath, and its performance evaluation.
- The evaluation of clustering and assignment methods suitable for estimated channel parameters under both specular and diffuse multipath, as well as a method that uses the estimated channel gains to improve clustering in the 5G SLAM problem.
- The extension of the 5G SLAM likelihood function to harness both specular and diffuse multipath components and to classify different object types according to their roughness, while accounting for clustering errors.

The novelty of the proposed approach compared with the existing random finite set (RFS) based 5G SLAM work [23,40] is three-fold. First, References [23,40] did not use an actual channel estimator, which makes the problem easier. Second, they assumed at most one measurement per object, which does not hold in practice. Third, the PHD filter is not optimal, as it does not enumerate the different data associations. In Reference [24], although the measurements come from the ESPRIT channel estimator and diffuse multipath is considered, the channel gain is still ignored, and the channel estimates are assumed to be correctly grouped by source. In the current paper, we study the whole framework, from downlink signals to the SLAM filter. We also make full use of the information provided by the channel estimator, including diffuse multipath and channel gain. We use the PMBM filter, which is optimal and enumerates all possible data associations.

The remainder of this paper is structured as follows. The system model, including the signal, environment, sensor, and measurement models, is described in Section 2. The end-to-end framework is then presented in Section 3, specifying the components that are detailed in the subsequent sections: channel estimation in Section 4, clustering in Section 5, and the novel likelihood in Section 6. Simulation results are presented in Section 7, followed by our conclusions in Section 8. The paper also contains several appendices describing the geometric expressions of the channel parameters and the details of the SLAM method.

#### Notations

Scalars (e.g., $x$) are denoted in italic, vectors (e.g., $\mathbf{x}$) in bold, matrices (e.g., $\mathbf{X}$) in bold capital letters, sets (e.g., $\mathcal{X}$) in calligraphic, and tensors (e.g., $\boldsymbol{\mathcal{X}}$) in bold calligraphic. The transpose and Hermitian transpose are denoted by $(\cdot)^{\mathsf{T}}$ and $(\cdot)^{\mathsf{H}}$, respectively, and $\jmath=\sqrt{-1}$. Furthermore, $|\cdot|$ denotes the absolute value of a scalar or the cardinality of a set, and $\|\cdot\|$ denotes the Euclidean norm of a vector. A Gaussian density with mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Sigma}$, evaluated at $\mathbf{x}$, is denoted by $\mathcal{N}(\mathbf{x};\boldsymbol{\mu},\boldsymbol{\Sigma})$. Finally, the notations of important variables are summarized in Table 1.

## 8. Conclusions

In this paper, we have treated the 5G SLAM problem from an end-to-end perspective, including downlink data transmission, channel estimation, clustering, and the SLAM filter. In the 5G SLAM problem, we aim to localize and synchronize a user while mapping the propagation environment, using downlink signals from a single base station. We have proposed a novel method to cluster the multipath components (MPCs) by projecting the high-dimensional data onto 3D points and then clustering these points with the DBSCAN algorithm, which we augmented to account for the channel gains. We have also proposed a novel likelihood function for the 5G SLAM filter, which accounts for both the specular path and the diffuse multipath components.
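The augmentation of DBSCAN with channel gains is specified in Section 5. Purely to illustrate the general idea, and not necessarily the rule used there, the sketch below replaces the classical minimum-points condition with a minimum-power condition: a point is a core point if the summed path power (e.g., the squared gain magnitude) within its $\varepsilon$-neighborhood reaches a threshold. The function name, threshold, and example values are hypothetical; with all powers set to 1, the rule reduces to standard DBSCAN with a minimum of `min_power` points.

```python
import math


def power_weighted_dbscan(points, powers, eps, min_power):
    """DBSCAN with a power-based core condition.

    A point is a core point if the total power in its eps-neighborhood (itself
    included) is at least min_power. Returns one label per point; -1 marks noise.
    """
    n = len(points)
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps] for i in range(n)]
    is_core = [sum(powers[j] for j in nb) >= min_power for nb in neighbors]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        # Start a new cluster at an unassigned core point and grow it through core points.
        labels[i] = cluster
        frontier = [i]
        while frontier:
            q = frontier.pop()
            for j in neighbors[q]:
                if labels[j] == -1:
                    labels[j] = cluster
                    if is_core[j]:
                        frontier.append(j)
        cluster += 1
    return labels


points = [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0), (0.0, 0.2, 0.0),  # diffuse group
          (5.0, 5.0, 5.0), (5.1, 5.0, 5.0),                   # strong path plus a weak neighbor
          (10.0, 0.0, 0.0),                                   # isolated strong path
          (0.0, 10.0, 0.0)]                                   # isolated weak path
powers = [0.3, 0.3, 0.3, 1.0, 0.05, 2.0, 0.01]
print(power_weighted_dbscan(points, powers, eps=0.5, min_power=0.5))
print(power_weighted_dbscan(points, [1.0] * len(points), eps=0.5, min_power=2))
```

In the second call, where every path has unit power, the isolated strong path is discarded as noise, whereas the power-based rule keeps it as its own cluster. This is one way in which gains can preserve a strong specular path that has no diffuse neighbors.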

Our results show that the ESPRIT channel estimator can estimate the channel parameters of both specular and diffuse multipath, and that the proposed system can directly use the raw, unclustered channel estimation results by applying the proposed clustering algorithms. With the help of the novel likelihood function, the proposed scheme can accurately estimate the number of landmarks, their types (i.e., roughness), and their positions, and the channel gain helps in clustering, mapping, and positioning. The results also confirm that the proposed method can handle mapping and vehicle state estimation simultaneously, and they highlight the benefit of considering both specular and diffuse multipath. In addition, the channel gains turn out to be highly informative for synchronizing the user to the base station.

The proposed framework has two computational bottlenecks: the ESPRIT channel estimator and the particle filter used in the PMBM SLAM. Enabling real-time execution requires faster solutions for both the channel estimation and the SLAM filter. These could take the form of new algorithms, or of offloading the computation to more powerful edge computing systems, where edge servers provide high-performance computing capability closer to end users [68,69,70].