5G SLAM Using the Clustering and Assignment Approach with Diffuse Multipath

5G communication systems operating above 24 GHz have promising properties for user localization and environment mapping. Existing studies have either relied on simplified abstract models of the signal propagation and the measurements, or are based on direct positioning approaches, which directly map the received waveform to a position. In this study, we consider an intermediate approach, which consists of four phases—downlink data transmission, multi-dimensional channel estimation, channel parameter clustering, and simultaneous localization and mapping (SLAM) based on a novel likelihood function. This approach can decompose the problem into simpler steps, thus leading to lower complexity. At the same time, by considering an end-to-end processing chain, we are accounting for a wide variety of practical impairments. Simulation results demonstrate the efficacy of the proposed approach.


Introduction
Automatic vehicle license plate recognition (AVLPR) is used in a wide range of applications including automatic vehicle access control, traffic monitoring, and automatic toll and parking payment systems. Implementation of AVLPR systems is challenging due to the complexity of the natural images from which the license plates need to be extracted, and the real-time nature of the application. An AVLPR system depends on the quality of its physical components which acquire images and the algorithms that process the acquired images. In this paper, we focus on the algorithmic aspects of an AVLPR system, which includes the localization of a vehicle license plate, character extraction and

Introduction
Progressive generations of cellular communication systems have a common property-each generation relies on a larger bandwidth and higher carrier frequency than the previous one. This is particularly noticeable in 5G, where in Frequency Range 2 (FR2), carriers above 24 GHz are utilized in combination with contiguous bandwidths of up to 400 MHz. Early commercial deployments in the US have demonstrated data rates beyond 1 Gbps [1]. The same increases in bandwidth and carrier frequency also have direct benefits for positioning [2]. This has led to intense research activities into 5G positioning as well as a dedicated study item in 3GPP. The final report of this study in 3GPP [3] revealed that by using both delay and angle measurements, it is possible to satisfy commercial positioning performance requirements. Beyond positioning, the sparse nature of the channel at higher carriers allows the receiver to resolve the multipath components, which transforms them from foe to friend [4,5]. In particular, the geometric nature of the channel at mmWave frequencies leads to a relation between the multipath parameters and the physical environment, as the parameters are related to the location of the user (UE) with respect to the base station (BS) and the propagation environment. The simultaneous localization and mapping problem is then invoked to invert multipath parameters into geometric information that determines the user's position and the locations of objects, based on the signal from a single base station. Several papers have addressed this problem, exploiting both line-of-sight (LOS) and non-LOS (NLOS) paths for position estimation, synchronization, and mapping in mmWave multiple-input multiple-output (MIMO) [6][7][8] and mmWave multiple-input single-output (MISO) [9,10] contexts. However, this inversion is challenging due to a number of reasons: (i) estimation of the channel parameters involves solving a high-dimensional problem; (ii) the association between the objects in the environment and the measurements is not available; (iii) an object may give rise to multiple measurements, due to diffuse multipath; (iv) the measurements per object are not clustered. We will now treat each of these four challenges in turn.
Channel estimation for mmWave is a rich research area, which we cannot cover in detail here. Rather, we categorize these methods as search-based and search-free. Search-based methods, such as those using maximum likelihood (ML) [11] and compressed sensing (CS) techniques [12,13], require an exhaustive search in the high-dimensional space of channel parameters, which entails high complexity. On the other-hand, search-free methods [14], such as matrix or tensor decomposition based subspace methods, directly provide estimates of the channel parameters [15] or rely on low-dimensional search [16], thus avoiding the need for high-dimensional optimization. An important challenge of channel estimation for SLAM is that the different dimensions (angles of arrival and departure, delays and gains) should be correctly matched.
The unknown association between measurements and objects is a common problem in SLAM, and powerful methods to address it can be found in the literature [17,18]. SLAM with radio-based measurements has been considered in the context of ultra-wideband (UWB) communication [19,20] using only distance measurements, which is referred to multipath-assisted SLAM, or channel SLAM. Focusing on the application of SLAM in a 5G mmWave context (which we designate '5G SLAM'), message passing-based estimators were introduced in References [7,21], based on the concept of non-parametric belief propagation, without the data association (DA). Extension of such methods to include the hidden DAs is possible, following the approaches from Reference [22]. In Reference [23], the probability hypothesis density (PHD) filter, which is a random-finite-set filter, was used to solve the 5G SLAM problem, considering only one measurement per object. In Reference [24], a more powerful random-finite-set filter, Poisson multi-Bernoulli mixture (PMBM) filter, was used, which enumerates all possible DAs.
Multiple measurements per object, while common in the SLAM and extended object tracking literature [25], have been generally ignored in the above 5G SLAM works. The multiple measurements per object are caused by diffuse multipath due to object roughness with respect to the wavelength, depicted in Figure 1. In Reference [5], diffuse multipath is seen as a perturbation, leading to false measurements. In Reference [26], exploitation of the diffuse multipath in radar is proposed by means of diffuse multipath statistics. In Reference [27], surface roughness was considered in radar applications, modeled as a number of sub-reflectors, in an environment with known wall geometry. A similar model with random sub-reflectors was evaluated in Reference [28], where the estimated diffuse paths were used for positioning and mapping, using a simple geometric approach.
Finally, when measurements from an object are not clustered, the unknown grouping can be considered within the SLAM filter [29][30][31], though this comes at a high computational cost. In Reference [28], a K-means clustering was utilized, but this requires a priori knowledge of the number of clusters. In Reference [32], K-power-means was proposed, as well as several criteria to decide the number of clusters. In Reference [24], the perfect clustering was assumed.
In this paper, we address the aforementioned challenges, building on the extensive literature in each of the above research areas in order to provide an end-to-end framework for SLAM harnessing diffuse multipath. The proposed end-to-end framework provides a general approach for user localization and environment mapping in 5G downlink transmissions from a single BS. Therefore, the purposed framework can be utilized in many application areas, including personal navigation [4], localization of cars and robots [33], smart homes [34], indoor location analysis [35], immersive customer experiences [36], location-aided communication [37], personal radar [38], to name but a few. Moreover, the proposed framework can form a foundation for future Beyond 5G and 6G localization and sensing approaches [39]. Our framework is built on a layered approach, comprising three main parts (channel estimation, clustering, and SLAM), which are evaluated separately and end-to-end. The main contributions of this paper are as follows: • The description of an end-to-end framework for SLAM harnessing diffuse multipath and its performance evaluation. • The evaluation of clustering and assignment methods, which is suitable for estimated channel parameters under both specular and diffuse multipath, as well as a method to utilize the estimated channel gains for improving the clustering in the 5G SLAM problem. • The extension of the 5G SLAM likelihood function, in order to harness both specular and diffuse multipath components and to classify different object types according to their roughness, while accounting for clustering errors. In mmWave simultaneous localization and mapping (SLAM) applications, each object can give rise to a specular path as well as multiple diffuse paths. The number and spread of these diffuse paths depend on the roughness of the object. At the receiver side, diffuse paths from an object have similar delays and angles, so that they can only be resolved with sufficient bandwidth and number of antennas.
The novelty of the proposed approach compared to the existing random finite set (RFS) based 5G SLAM work [23,40] is three-fold-first of all, References [23,40] did not use a real channel estimator, which makes the problem easier. Secondly, they assumed at most one measurement from an object, which is not the real case. Finally, the PHD filter is not optimal, which does not contain the enumeration of the different data associations. In Reference [24], although the measurements are from the ESPRIT channel estimator and the diffuse multipath is considered, the channel gain is still ignored and the channel estimation results are assumed to be well grouped based on the source. In the current paper, we study the whole framework, from downlink signals to SLAM filter. We also fully use the information given by the channel estimator, including diffuse multipath and channel gain. The PMBM filter is used, which is optimal and enumerates all possible data associations.
The remainder of this paper is structured as follows. The system model is described in Section 2, including the signal model, environment model, sensor, and measurement model. The end-to-end framework is then presented in Section 3, specifying the components that will be detailed in the subsequent sections, starting with channel estimation in Section 4, clustering in Section 5, and the novel likelihood in Section 6. Simulation results are presented in Section 7, followed by our conclusions in Section 8. The paper also contains several appendices describing the geometric expressions of the channel parameters, as well as the details of the SLAM method.

Notations
Scalars (e.g., x) are denoted in italic, vectors (e.g., x) in bold, matrices (e.g., X) in bold capital letters, sets (e.g., X ) in calligraphic, tensors (e.g., X ) in bold calligraphic. Transpose and Hermetian are denoted by ⋅ T and ⋅ H , and  = √ −1. Furthermore, ⋅ denotes the absolute value of a scalar, or the cardinality of a set; ⋅ denotes the Euclidean norm of a vector. A Gaussian density with mean µ and covariance Σ, evaluated in value x is denoted by N (x; µ, Σ). Finally, the notations of important variables are summarized in Table 1.

System Model
In this section, we describe the basic models of the user, the environment, the propagation channel, and the observed waveform. We use the 5G positioning reference signals (PRS) [41] as pilot signals. These signals are broadcast by the BS, and used for positioning according to the 3GPP standards [3,42]. Since the signals are broadcast, there is no multi-user issue or interference, so that we only focus on a single UE, where the proposed algorithms can be assumed to be executed per UE.

User Model
In this paper, we only consider the single-user scenario, so the cooperation between different users is beyond the scope of this paper. The dynamic state of the user at time step k is denoted by s k , which contains the position of the user x UE,k = [x k , y k , z k ] T , heading k , translation speed ζ k , turn rate k and clock bias B k . The dynamic model of s k is where v(⋅) denotes a known transition function; q k is the process noise, modeled as a zero-mean Gaussian with known covariance Q k . The above dynamic model can be expressed, equivalently, in terms of a transition density f (s k s k−1 ) = N (s k ; v(s k−1 ), Q k ).

Environment Model
We consider an environment with a few landmarks. There is a fixed and known BS, located at The other unknown landmarks are modeled as different types of surfaces (see Figure 2). A surface can be described with an extended state comprising: • A point f on a corner of the surface and a vector u normal to the surface. • The size of the surface in length l and height h. The width w is not relevant. • The smoothness of the surface, denoted by α R ≥ 0. • The scattering attenuation 1 ≥ S ≥ 0 and reflection attenuation 1 ≥ R ≥ 0, with R 2 + S 2 ≤ 1, in which the remaining power is absorbed in the surface. To avoid dealing with such a high-dimensional surface state description, we use a simplified state, containing a fixed virtual anchor (VA) with location x VA ∈ R 3 , which is the reflection of the BS with respect to the surface, that is, Note that the VA location is surface-specific. We only consider 3 different types of surfaces: smooth surface (SM, with S = 0, R = 0.8, α R = 100), medium rough surface (MR, with S = 0.4, R = 0.6, α R = 4) and very rough surface (VR, with S = 0.8,

Channel Model
At time step k, the channel from the BS to the user in the frequency domain is given by [11]: where δ(⋅) is the delta Dirac function, f is the frequency, θ is the angle of arrival (AOA), φ is the angle of departure (AOD), I k is the number of landmarks in the environment (i = 0 represents the BS); L i k is the number of paths from each landmark (l = 0 represents the specular path). For each surface, there is at most one deterministic specular component among the paths, while all remaining paths are diffuse components and thus random [28]. We denote the total number of paths by P k = ∑ Each path i, l can be described by a complex gain g i,l k , a time of arrival (TOA) τ i,l k , an AOA pair θ i,l k in both azimuth and elevation, and an AOD pair φ i,l k in both azimuth and elevation. The generative model for each of these parameters is described as • LOS path: When i = 0, L i k = 1, the gain has uniform phase and has power g 0,0 where λ is the wavelength, and the TOA, AOA, and AOD follow the geometric relations between the BS and the UE. They are given in Appendix A.
• Specular path from surface i: For i > 0, l = 0, the point of incidence on the i-th surface (denoted by L i with virtual anchor x VA,i ) is the intersection of the surface L i and the line between the i-th virtual anchor x VA,i and the UE x UE . The specular path gain has uniform phase and power The TOA, AOA, and AOD follow the relative position of the UE, BS, and the incidence point on the surface. They are given in Appendix A. • Diffuse paths from surface i: For i > 0, l > 0, the number of paths per surface and their spread in angle and delay, as well as the channel gains, depend on the roughness of that surface. These paths can be interpreted as coming from random points x i,l k on the surface, with a spatial distribution p i (x sc x UE , x BS ) that depends on the roughness, where x sc is the random variable that describes the position of the diffuse point. The diffuse points are generated from the distribution where L i ⊂ R 3 denotes the set of points that make the i-th surface, ν 1 (x sc ) is the deviation of the scattering angle with respect to the angle of the specular path (i.e,. ν 1 (x sc ) = 0 when x sc is the incidence point of the specular path), ν 2 (x sc ) is the angle between the impinging ray (i.e., from the transmitter to x sc ) and the surface normal, and ν 3 (x sc ) is the angle between the departing ray and the surface normal. The diffuse paths have uniform and independent phases and equal power g i>0,l>0 The locations of the diffuse points x i,l k fully determine their corresponding TOA, AOA, and AOD, provided in Appendix A.
In addition, we consider OFDM transmission with M S = M 5 subcarriers and subcarrier spacing ∆ f . The received signal at subcarrier s at time step k is then [44] Y s,k = where S s ∈ C M T ×T is the pilot signal over subcarrier s (spanning T OFDM symbols), N s,k is white Gaussian noise with vec(N s,k ) ∼ CN (0, σ 2 I M R T ), and H s is the channel frequency response,

Methodology and End-to-End Framework
Using the raw measurements Y s directly in SLAM is challenging, due to the high dimensionality of the measurement, the complex nonlinear relation to the user and landmark states, and the fact that not all paths may be resolvable, due to a limited number of transmitter and receiver antennas and bandwidth. While such direct localization has performance benefits [45][46][47], we instead consider a layered approach, visualized in Figure 3, comprising the following steps after downlink transmission of the signals Y s .

1.
First of all, channel estimation is performed to recover the channel parameters (angles, delays, gains). Due to the finite resolution at the receiver side, not all paths are resolvable. Hence, the number of estimated paths (denoted byP k ) will be much smaller than I k × L i k . The channel estimator thus provides a set of channel parameter estimates Z k at time k, Z k = {z 0 k , z 1 k , . . . , zP k −1 k }. Each element z p k ∈ Z k is either a clutter, which is caused by noise peaks that are detected as paths during channel estimation, with clutter intensity c(z) or follows where w p k denotes the measurement noise; x p k is x BS for LOS, the incidence point of the deterministic specular components, or a (random) point on the surface for a NLOS component. We recall that the underlying geometric relation h(x p k , s k ) can be found in Appendix A. We describe the channel estimator in Section 4.

2.
After channel estimation, we group the unordered elements in Z k in clusters Z i k , where each cluster should correspond to one landmark. This removes the need to consider all possible partitions of the measurements in the SLAM method, drastically reducing overall complexity. Clustering is challenging as measurement clusters may be non-convex. In addition, diffuse paths may be far away from the specular paths, leading to possible miss-classifications. The proposed clustering method is described in Section 5.

3.
Finally, after clustering, the SLAM method requires a likelihood function that expresses the statistical relation between the state and the clustered measurements, (Z i k x, s k ). The SLAM method is deferred to Appendix B, while in the main text we focus on the proposed likelihood function in Section 6. The SLAM filter follows a Rao-Blackwellized approach, where we use a set of particles (indexed by n) to represent the user state, and use PMBM densities conditioned on each particle to represent the map. Clustering and likelihood computation are conditioned in the user state and are thus performed per particle.

Channel estimation
Clustering Z < l a t e x i t s h a 1 _ b a s e 6

l a t e x i t s h a 1 _ b a s e 6 4 = " F l m m W p 4 A l 4 j W + x 3 v e m y r n r 3 q Y 5 U = " > A A A C B H i c b V D L S s N
O m P X s g z 8 w P n 4 A u n O b 1 Q = = < / l a t e x i t > SLAM UE particle weight Estimate of map UE particle

Channel Estimation
In this section, the ESPRIT channel estimator is introduced.

Background
The standard formulation of the channel estimation problem is an ML problem, wherê in which Θ contains the delays, gains, and angles of all the paths, as well as the number of paths. Since the number of paths is unknown and paths are not all resolvable, model order selection techniques can be applied, for example, by adding a regularizer to (17) ( [11], Section 5.2), for example, following the minimum description length (MDL) or Akaike information criterion (AIC). The dense multipath can be treated as a stochastic process, by writing (14) as where W dm s,k is the dense multipath containing all the contributions from the diffuse paths. The dense multipath is then modeled as complex Gaussian process with zero mean and covariance R dm k , possibly with unknown elements. Then the entire noise process has covariance R dm k + σ 2 I M R T . Alternatively, we can estimate the geometric parameters of the dominant diffuse multipath components from MIMO channel measurements, which leads to a multidimensional harmonic retrieval (HR) problem [48]. Numerous HR techniques have been developed, ranging from multidimensional searching or optimization based, such as multiple signal classification (MUSIC), ML and CS techniques [49], and polynomial-rooting or matrix-shifting based search-free methods, such as root-MUSIC [14] and estimation of parameters by rotational invariant techniques (ESPRIT) and their multidimensional extensions [50]. Due to its simplicity and high-resolution capability, ESPRIT-type algorithms have become one of the popular HR techniques [51].

ESPRIT Channel Estimator
For notational convenience, we drop the time index k. From (14), the received signal on subcarrier s is of the form where H s is the channel frequency response, S s is a known pilot signal with orthogonality property (S s S H s = βI M T ×M T ), and N s is i.i.d. Gaussian noise. Then we have where W s is also i.i.d. Gaussian noise with a scaled covariance matrix.

Observations in Tensor Form
We utilize a Tensor framework to exploit the R-D grid structure inherent in the data, as well as the Vandermonde structure in angle and delay domains. To this end, we map from geometric channel parameters to spatial frequencies by For subcarrier i, X i and W i are M 3 M 4 × M 1 M 2 matrices. We convert these M 5 matrices (one per subcarrier) to a 5D tensor of suitable dimension, X , H and W ∈ C M 1 ×M 2 ×M 3 ×M 4 ×M 5 . For the p-th path, the equivalent 5D array steering tensor can be written as where ○ represents the outer product (Note that is equivalent to the uniform linear array steering vector composed of M r sensors, p = 1, 2, ⋯, P, where P is the total number of paths, and r = 1, 2, ⋯, 5. This allows us to express the observation as where g p denotes the complex path gain of the p-th path.

Shift Invariance
We now introduce the multidimensional shift invariant structure in each dimension r, in order to apply Tensor-ESPRIT. We first introduce so-called selection matrices J 1,(r) , out of M r elements belonging to the first and the second subarray in the r-th mode, respectively. For example, when M where I and 0 denote the identity matrix and zero vector, respectively, and the sizes are defined in subscripts. Secondly, we introduce ⋯ a µ P (r) , which has the shift invariance property where is a diagonal matrix that contains the unknown parameters µ (r) l , l = 1, 2, ⋯, L, in dimension r.

Tensor-ESPRIT
In order to obtain the subspace spanned by A (r) , we take CANDECOMP/PARAFAC (CP) decomposition on X [52]. Because the total number of paths P is unknown, model order selection techniques [53] can be utilized to estimateP. In general, the estimatedP ≪ P for rough surfaces with hundreds of closely located diffuse paths.
whereĝ = ĝ 1ĝ2 ⋯ĝP are the estimated path gains, and U (r) = u 1 (r) u 2 (r) ⋯ uP (r) is the estimated signal subspace with normalized column vectors u p (r) . Note that after taking CP decomposition, the path gains g p and channel parameter associations are achieved synchronously. In other words, g p corresponds to a unique u p (r) in each dimension r, so that the output of (26) is Since A (r) and U (r) span the same subspace, A (r) = U (r) T (r) , where T (r) ∈ CP ×P is a non-singular transform matrix. Entering this relation in (25), we obtain where In (27), only Ψ (r) is unknown, but can be estimated via the least squares method:Ψ where ⋅ † denotes the pseudo-inverse. Since the matrix Ψ (r) is similar to the diagonal matrix Φ (r) , they share the same eigenvalues. Hence, the spatial frequency can be recovered asμ . . ,P} is returned as the output of the channel estimator. We denote this combined output as Z.
The most computationally demanding part of channel parameter estimation is the CP decomposition. In (Reference [54], Table 1), the complexities of major computations in popular CP decomposition algorithms is summarized. For example, the alternating least squares (ALS) algorithm with line search has a complexity of order O 2 RP J + RP 3 , where J = ∏ R r=1 M r and R = 5 andP denotes the total number of paths.

Channel Parameter Clustering
At time step k, the channel estimator provides a set of channel parameter estimates Z k , }, and we need to cluster Z k based on different landmarks. However, clustering is challenging, since Z k is in a 6D space, and the delay, angle, and gain are in different scales and spaces, so that they should be properly weighted. We also do not have any advance knowledge of the number of landmarks in the environment. In this section, we try to address these challenges. For brevity, we will omit the time index k.

Background
As a general term, a cluster is defined as a collection of objects that are similar to each other in some agreed-upon sense [55]. In radio channel analysis, a cluster is usually described as a group of multipath components (MPCs) with the same parameter distribution. The MPCs that have similar values in delay and angular domain are jointly classified as a single cluster, which reflects the physical environment, as similar MPCs are usually spatially close to each other, for example, reach the receiver via the same landmark.
Clustering methodology in radio channel analysis has been expanded from visual clustering to automatic clustering, where the visual clustering is usually valid for data with limited dimensions [55,56], while the automatic clustering can handle data with more than three dimensions [57]. Automatic clustering algorithms, focusing on the parameter space of MPCs, such as Hierarchical, K-means, and Gaussian mixture, have been widely used in radio channel characterizations [32,[58][59][60][61]. Hierarchical clustering algorithms cluster the data based on a binary cluster tree, which is limited by the data size, and therefore limited to small data sets. K-means [62] assumes a known number of clusters and iteratively assigns data to each cluster and computes cluster centroids until the cluster centroids are converged. While K-means is a widely used clustering method, it suffers from several drawbacks: (i) the number of clusters is predefined, which limits the capabilities to reflect the reality of the environment; (ii) it cannot cope with outliers, which implies all the observed data will eventually be part of some clusters, even observations that are far away in 6D space and should be considered as outliers; (iii) it usually gives spherical-like shape clusters. MPCs that reflect other physical properties can be assigned as the edge of a spherical cluster, which has notable effects on the cluster properties, that is, cluster centroids, and loses the ability to link to physical reality. The more sophisticated affinity propagation [63] partially avoids these problems, but still leads to circular clusters and has very high complexity. Gaussian mixture clustering [57] gives more variation to the shapes of the clusters extracted with K-means, and follows the similar structure of K-means. It can give clusters in different shape if the original data is not distributed circularly. However, the method is complex and converges slowly.
Density-based spatial clustering of applications with noise (DBSCAN) [64] is one of the most widely applied clustering methods for data sorting. Without a predefined number of clusters, DBSCAN utilizes two critical parameters-the minimum number of points clustered together for a region to be considered dense, and a distance measure to locate the points in the neighborhood of any point. The density-based clustering method has no limitations on the number of clusters, uses a dense distance, and gives more freedom regarding the shape of clusters. While we have evaluated different methods in Section 7, here we only present the best performing method, which is an extension of the DBSCAN algorithm.

Modified DBSCAN
In this section, we will present the DBSCAN clustering method with the obtained channel estimates. The method involves three phases-(i) the mapping of the geometric channel parameters (delay and angles) to a 3D point; (ii) clustering with DBSCAN; (iii) refinement by using the channel gain.

Phase 1: 5D to 3D Mapping
Distinct from existing work where the channel parameters are used as features for clustering [28,61], we transform the 5D channel parameter into a 3D position. In the absence of noise, this point should be on the surface of a landmark for NLOS paths, while in the presence of noise, the most likely estimation of the point can be founded close to the surface. The transformation reduces the dimensionality, and avoids different scales and spaces problems. Hence, the 5D parameters, that is, the delayτ p , the AOA pairθ p = [θ The general idea to transform the 5D parameters into a 3D point is illustrated in Figure 4. Specifically, for the p-th diffuse multipath, a virtual anchor position can be estimated aŝ where c is the speed of light,B is the estimated clock bias, f p R is the unit vector pointing along with the direction of arrival, given by Note that this method also works for the LOS path, and it will return a point nearx BS , sincex   Figure 4. Illustration of the 3D transformation from the 5D parameters: from the channel estimates, a hypothesized landmark (a virtual anchor) locationx p LM is determined. From the landmark and BS locations, the incidence pointx p is derived.

Phase 2: Clustering with DBSCAN
In DBSCAN [64], all points are classified into either clusters or identified as noise. In a specific cluster C, the core points have an -neighbourhood with N min ∈ N points. We define a distance measure d ∶ R 3 × R 3 → R from which the -neighbourhood of the p-th point is defined as In this paper, we use d(x p ,x q ) = x p −x q . The main idea of DBSCAN is that clusters are not characterized by their variance (as in K-means), but by density-reachability. In particular,x q is directly density-reachable fromx p , if and only if C p ≥ N min andx q ∈ C p . In addition,x p is density-reachable fromx q , if there exists an ordered sequence of pointsx p 0 =x q ,x p 1 , . . . ,x p n =x p in D, wherex p i+1 is directly density-reachable fromx p i . If the sphere ofx p contains at least N min points, that is, C p ≥ N min , the pointx p will be a core point of a cluster. The border points can have a smaller size -neighbourhood, but can be density-reachable from the core points. Given the parameters and N min , we can discover a cluster in a two-step approach: (i) choose an arbitrary point from the database satisfying the core point conditions as a seed; (ii) find all points that are density-reachable from the seeds [64]. We follow these two steps, that is, to identify a core point, to find a cluster with the core point, and to find all clusters. The algorithms are provided in Algorithm 1. Note that Algorithm 2 is used in Algorithm 1 to find all points in each cluster. The output of the clustering is a partitioning of Z (or D) into clusters P l and a set of noise points P N .
1: Compute pair-wise Euclidean distance for all points, store them in D, and label all points unvisited; 2: l ← 0 and p ← 0; // cluster index l and point index p 3: while p < P do 4: if Point p is unvisited then 5: Label point p as visited; 6: Find all neighbours within distance of p;

7:
if -neighbourhood is less than N min then 8: Classify the point p as noise; if Pointx j is unvisited then 7: Label pointx j as visited; 8: Find all neighbours within distance ofx j , that is, C j ; 9: if C j ≥ N min then 10: Add all indexes in C j to c; 11: end if 12: end if 13: ifx j is not classified to any cluster then 14: Classifyx j in cluster l; 15: end if 16: k ← k + 1; 17: end while

Phase 3: Extract Isolated Specular Paths and Outliers Using Channel Gain
The DBSCAN clustering has two drawbacks in our application: • The tensor ESPRIT channel estimator from Section 4 can generate estimates, whose 3D points (as obtained in Section 5.2.1) are still on or near the corresponding surfaces, but are far away from the cluster centers. Hence, they are informative for the SLAM algorithm, but are part of P N , so they are not clustered correctly. We have observed that the channel gains of these paths are very small. • The LOS path and specular paths from smooth surfaces are not part of any cluster, as such landmarks have one or few associated paths. We have observed that the channel gains of these paths are very large (approximately following the path loss models from Section 2.3).
To solve these problems, we use the estimated channel gains. We partition P N into two sets according to the channel gains.
where δ is a predefined threshold,B is the estimated clock bias, and c is the speed of light. The scaling with c(τ p −B) 4π λ is added to compensate for the path loss. For each of the high-gain paths in P N,H , we create a new cluster. For each of the low-gain paths in p ∈ P N,L , we find the nearest cluster l * as and add p to cluster P l * . The overall computations in channel parameter clustering consist of three parts, that is, the computations in each phase. Given the number of pathsP, the transition from 5D to 3D mapping requires O P computations. According to Reference [64], the complexity of the DBSCAN algorithm without the use of index structure for acceleration is O P 2 . The last phase requires O P computations. Therefore, the overall complexity is O P 2 .

Likelihood Function for SLAM
After clustering, the measurements Z k are grouped into clusters Z k = {Z 0 k , Z 1 k , . . . , ZÎ k −1 k }, using Algorithm 1, whereÎ k is the number of estimated clusters. The clustered measurements will be the input into the SLAM filter during the update process, via an appropriate likelihood function. For brevity, we omit the time index k again. We work with the compact representation of the landmark state from Section 2.2. Hence, our goal is to determine (Z i x LM , s, m), where x LM is the landmark position (i.e., the BS location or a virtual anchor location) and m ∈ {BS, SM, MR, VR} is the landmark state. This likelihood function will influence the probabilities of different associations between landmarks and clusters, and it will also influence the particle weight and the distribution of the landmark given a state s and the association.

Background
In previous 5G SLAM works, Reference [21] assumed there is at most one measurement from each source; the channel gain is not used, and the inter-path interference is not considered. The measurements are only generated by known multivariate Gaussian distributions, then the likelihood function falls into the likelihood of the signal measurements, which follows the Gaussian format. According to the PMBM formalism, multiple measurements per source should follow a Poisson or Bernoulli distribution. Under the standard point model, the likelihood function corresponds to a Bernoulli distribution, with [65] where p D is the detection probability. Under the extended target model, the likelihood function corresponds to a Poisson point process (PPP) ( [25], Equation (5)): where γ m ≥ 0 is the Poisson rate for surfaces of type m. For both (36) and (37), it has been proven that the PMBM density is a conjugate prior.

Likelihood Function
In this section, we describe the proposed likelihood function. We here assume that the measurements within each cluster Z i are independent, as diffuse points are generated independently. We also assume the number of paths Z i only depends on the source type m. It is important to note that for measurements from diffuse paths, a measurement is a function of a random incidence point on the surface. In order to express the measurements as a function of the compact state, we propose the following approach.
We first separate Z i into two parts, the path with the shortest delay {z i,0 }, which is the closest path to the specular component, and any remaining pathsZ i = {z i,1 , z i,2 , . . . , z i, Z i −1 }, which we view as diffuse multipath. We associate the deterministic incidence point x i,0 with the assumed specular component, which can be derived by (31) using x BS and x LM . Therefore, we can write p(z i,0 x LM , s, m) = p(z i,0 x i,0 , s, m), for z i,0 , which is in the desired form (16). Hence In other words, the object may be miss-detected with probability 1 − p D . If the object is detected, then the likelihood contains a contribution from the first arriving path (the specular path, which is directly related to the VA location or BS location) and from the set of all remaining paths. Note that as a special case for m = BS there can be at most one associated measurement, so that which is in the desired form. Hence, we can focus on the diffuse paths. As the measurements contain both a specular component and diffuse paths, the likelihood function does not follow the standard models (36)- (37). Nevertheless, we use the likelihood in the PMBM filter without any proof of conjugacy or optimality.

Likelihood for Diffuse Paths
We recall from Section 2.3 that diffuse multipath originates from random points x i,l on the surface and from Section 4 that the estimates points will differ from those from Section 2.3, due to the finite resolution of the receiver. Hence, it is fundamentally impossible to estimate the original random incidence points x i,l from the observationsZ i . To avoid estimating x i,l , we propose estimating an artificial incidence pointx i,l from z i,l>0 as the projection ofx i,l (obtained using the method in Section 5.2.1) onto the surface. The projection pointx i,l can be derived bỹ where x e = (x BS + x LM ) 2 is a point on the surface, and e = (x BS − x LM ) x BS − x LM is a normal to the surface. Then, p(z i,l x LM , s, m) can be expressed in the desired form p(z i,l x i,l , s, m), wherex i,l is the artificial incidence point that gave rise to measurement z i,l . The likelihood can be further simplified by noticing the error component ofx i,l , which is only due to the the projection distance ofx i,l to the surface. We therefore use this distance directly as a compressed measurement to replace the delay and anglesd Figure 5 shows the principle of calculatingd i,l (z i,l ).
surface path Figure 5. The principle of findingx i,l and calculatingd i,l using z i,l , x LM and x BS .
We thus have a general model for the likelihood function associated with the diffuse paths, where the cardinality distribution is arbitrary and (cf. (Reference [18], Equation (1)) in which all constituent distributions can be obtained from the simulation of a channel estimator or provided directly in closed-form by a channel estimator. In (42), the first term p( Z i m) represents the cardinality distribution, Z i ! is a normalization constant to ensure that (Z i x LM , s, m) integrates to one overZ i . The element inZ i are represented in compressed format through [d i,l (z i,l ),ĝ i,l ] and are modeled as independent and identically distributed.

Clustering Errors and Marginal Likelihood Function
Due to clustering errors and channel estimator errors, it is possible that the specular path is not associated with the set of the specular paths. Then the shortest path in Z i may be a diffuse path, leading to low likelihood p(z i,0 x LM , s, m), so that the SLAM method is likely to view the source of this measurement cluster as a new landmark, which is not desirable. In order to solve this problem, we introduce p m ∈ (0, 1], which is the probability that the first path in a cluster is in fact the specular path. This probability can be determined by observation of the overall clustering performance. By marginalizing over the event of missing the specular path in a cluster, the likelihood function can be written as p(z i,0 x LM , s, m) = p m p(z i,0 x LM , s, m) where the factor p( Z i + 1 m)( Z i + 1) p( Z i m) is due to there being an additional specular path, which was not present in the cluster. The complexity of the likelihood function is O Z i , as we consider ( Z i + 1) paths.

Results
In the implementation, all the codes are written in MATLAB, and the simulations and experiments are run on a MacBook Pro (15-inch, 2019) with a 2.6 GHz 6-Core Intel Core i7 processor and 16 Gb memory.

Simulation Parameters
We consider a scenario with a single vehicle with the initial state The vehicle moves around the BS with a known constant turn rate. Every time step, the BS sends 10 × 64 OFDM symbols to the vehicle with 200 subcarriers using the transmit power of 5.05 W at carrier frequency of 28 GHz. The subcarrier spacing is 0.5 MHz. The transmitted signals are disturbed by the noise with power spectral density of 4.0049 × 10 −9 mW/Hz. The transition function of the movement, the initial vehicle state, the landmark locations, the process noise, the initial prior, the survival probability, the detection probability, the birth rate, the clutter intensity, and pruning thresholds are the same as in Reference [23].

Channel Estimation Results
In this section, we show the results of the tensor ESPRIT channel estimator for a MR surface, which generated one specular path and 100 diffuse paths. To visualize the channel estimator output, we transform the estimated TOAs, AOAs, AODs into 3D points. The result is shown in Figure 6, where the original diffuse points are marked with blue dots and the estimated diffuse points with red discs. We observe that the channel estimator returns only 5 paths, much less than the original 101 paths, due to the finite resolution of the receiver. All projected points are close to the surface and there is a projected point very close to the deterministic reflection point. This is because the specular path of the MR has larger power than the other diffuse paths, so it is less affected by inter-path interference.

Clustering Performance Evaluation
To evaluate the clustering performance, we generate 40 snapshots of the channel with known landmarks. We then obtain the channel estimates with 5D geometric parameters, and transform them into 3D point positions. The DBSCAN clustering algorithm is applied to each snapshot, with = 30 m and N min = 2. Then we use the estimated channel gain to modify the results, with δ = 0.25. As a comparison, we also compare with other clustering methods, including the well-known K-means, gap statistics ('GS') [66], and affinity propagation ('AP') [63] on the same data set. For the K-means, we assume that the number of clusters, K, is known, while for the remaining methods this priori knowledge is not needed.
To evaluate the clustering performance, we consider the following performance metrics, which use as ground-truth hand-labeled data: the clustering accuracy (CA), which is the percentage of correctly clustered points, and the impurity (IMP), which is the percentage of points clustered into a wrong cluster where Γ is the set of all possible assignments; G = {G 1 , . . . , G N } is the ground truth clusters; C = {C 1 , . . . , CN} is the cluster results provided by the clustering algorithm;P is the number of all target points. The performance is presented in Table 2. The proposed clustering method based on DBSCAN provides the best clustering performance, with an average clustering accuracy over 99%. The modified DBSCAN goes through all noise points using channel gain and refines the results, so it performs slightly better. K-means achieves an average accuracy of 94.63%. GS-based clustering method has an average accuracy of only 68.64% since the estimated value of the number of clusters can be erroneous. The AP clustering method has an average accuracy of 84.35% since it still based on space partition and the estimated value of the number of clusters can be erroneous either. In terms of impurity, the last three algorithms cluster measurements from different sources together when errors occur, but for DBSCANs this never happens, so that all points in each cluster correspond to one landmark. By observing the clustering results of modified DBSCAN, losing the shortest path from the same landmark happens rarely, and we have estimated p m ≈ 1 200. Though the K-means algorithm shows good average clustering performance, it suffers from a few limitations as indicated in Section 5.1. This can be demonstrated in Figure 7, where we present the clustering results with DBSCAN and K-means with the data from the 33rd snapshot. We use different colors to indicate the different clusters obtained from the corresponding clustering methods. The proposed DBSCAN can correctly cluster the measurements into the right clusters shown in Figure 7a; however, K-means shows unstable clustering performance indicated in Figure 7b-d. This is because the clustering performance of K-means is highly dependent on the initialization of the partitions. In addition, we observe from Figure 7b,d that in K-means, the data points from the same cluster can be divided into separated clusters, and the points from different clusters can be clustered in a single cluster. All these mismatches will cause performance degradation in SLAM, which will be further demonstrated in Section 7.5.

Estimated Likelihoods
Given the channel estimation and the clustering performance, we now describe the obtained likelihood functions. For simplicity, we consider all dimensions independent and report the observed distributions of the number of paths and error of the specular path gain, delay and 4 angles. For diffuse paths, we report the distribution of the distancesd i,l and the channel gains. All these distributions are based on gathered channel estimation and clustering results in various vehicle locations. We use the setting in Section 7.1, and run the channel estimator many times, collect all channel estimation results and study statistics of collected data. Since the channel gains are strongly affected by path loss, (4)- (7) reduce the impact of the distance dependence by considerinǧ We first focus on the case m = MR as an example, and study p( Z i MR), p(cτ i,0 x LM , s, MR), p(φ i,0 el x LM , s, MR), p(ǧ i,0 x LM , s, MR), p(d i,l x LM , s, MR) and p(ǧ i,l>0 x LM , s, MR) in Figure 8. Figure 8a shows the histogram of the number of paths, as well as a Poisson approximation. We observe that there are always a (assumed) specular path and 1 to 5 diffuse paths, and the existing of more than one diffuse path follows a Poisson distribution. Figure 8b shows the histogram of the delay error of the first estimated path τ i,0 (i.e., subtracted with the delay of the specular path and multiple by the speed of light c) as well as a Gaussian fit. We observe that there is a shift of delay of 0.17 m, which is caused by inter-path interference. Figure 8c shows the histogram of the elevation AOD error of first estimated path φ i,0 el (i.e., subtracted with the angle of the specular path) as well as a Gaussian fit. We observe that inter-path interference does not influence the angle so much. Figure 8d shows the histogram and Gaussian fit of the distancesd i,l , l > 0. As the inter-path interference leads to a positive delay shift, the estimated diffuse points are more likely to be behind the surface. Finally, Figure 8e,f shows the histogram ofǧ i,0 andǧ i,l>0 as well as Gaussian fits. We observe that the specular path is stronger than diffuse paths, as the mean ofǧ i,0 is larger, and diffuse paths are influenced by inter-path interference more seriously, as the distribution ofǧ i,l>0 is more spread.     Table 3 provides a complete overview of the likelihood function for all landmark types. Based on all components of the likelihood functions for all landmarks, we have the following observations: for both cases m = BS and m = SM, there is always one path presents and all the angles, delays andǧ i,0 follow Gaussian distributions. For case m = MR, apart from the assumed specular path, there are 3 to 12 more paths present, and angles, delays,ǧ i,l andd i,l also follow Gaussian distributions. We also find that the smoother surface has a higher gain for the specular path and lower gain for the diffuse path.
However, since the clustering algorithm is not perfect, it is possible that the cluster associated with a landmark may have fewer or more paths than the ground truth cluster. For example, the cluster associated with the MR may only have a specular path, due to the clustering error. We therefore set those possibility none zero in p( Z i m) manually, in order to increase the tolerance of the wrong clustering, as shown in Table 3.  Firstly, we study the performance of the proposed 5G SLAM scheme in mapping, conditioned on the true vehicle state. We use the generalized optimal subpattern assignment (GOSPA) distance [67] as the metric to evaluate the mapping result. The GOSPA metric considers the estimation error, which is the error between the estimated positions with the real positions of landmarks, the miss-detection error, which is the error of miss-detecting the existing landmarks, and the false alarm error, which is the error of detecting non-existing landmarks. The GOSPA between the estimate landmark position setX = {x i }N i=1 and the real landmark position set X = {x j } N j=1 is defined as where Γ is the set of all possible assignments; N γ miss is the number of miss detection of assignment γ; N γ false is the number of false alarm of assignment γ. The cardinality penalty factor q a , the cut-off distance q c , and the exponent factor q p are set as 20, 2, and 2, respectively.
We make two comparisons: (i) we compare different clustering methods (SLAM filter using modified DBSCAN which is the proposed method, DBSCAN without channel gains, and K-means); (ii) we compare SLAM filters with different numbers of paths (using all paths in every signal cluster based on the proposed likelihood function, which is the proposed method; using all paths but without using channel gains [24]; using the single (specular) path in every cluster based on the proposed likelihood function; using the single (specular) path in every cluster but without using channel gains, which is the PMBM extension of Reference [23]). Figure 9 shows the outcome of the first comparison. We observe that the SLAM filter with K-means clustering performs poorly, even though the number of clusters is already known. Although the K-means performance in Table 2 seems good, grouping measurements from different sources leads to significant errors. The reason is that K-means only divides the space linearly and cannot handle complicated shapes. It often groups projected points from different sources into the same clusters; for example, at time step 17, it groups the measurements from BS and SM together. Modified DBSCAN and simple DBSCAN perform much better, as the blue and red lines are much lower, since both of them never group measurements from different sources together. The tensor ESPRIT channel estimator may create measurements that are far away from the specular path, and have low gains but still contain location information. Treating those outliers individually as single-element clusters in DBSCAN loses some information, and also forces the SLAM filter to create a new landmark, when similar outliers appear in successive time steps. This is apparent in the peak at time step 37-39 in Figure 9a, since there are some clutters pointing to a similar region between time step 36-39. With the help of the channel gains, the modified DBSCAN groups those measurements into the right clusters, which helps in mapping. That is the reason why the blue line is lower than the red line, for example at time step 2 in Figure 9c and time step 3 in Figure 9d.   Figure 10 shows the comparison of the GOSPA results among 4 different settings (with or without using diffuse multipath, with or without using channel gain). We observe that SLAM filter has better performance when using diffuse multipath, as the solid lines are lower than the dashed lines, and it can also help the system to identify the landmark types, as there are no sharp peaks in solid lines, which is because more paths provide more information. We also observe the gain can improve the performance, especially in helping the SLAM filter to recognize the surface type, as the blue lines are lower than the red lines. Without information from channel gains, the SLAM filter has trouble to distinguish SM and MR when only using the specular path, as indicated by the sharp peaks of red dashed lines in Figure 10b,c and the red solid line dropping more slowly than the blue solid line in Figure 10b. This is because the distributions of delays and angles of specular paths from SM and MR are very close to each other, and some disturbance may cause a wrong identification. The information from channel gains will alleviate this problem. However, the channel gain is usually disturbed by inter-path interference, which increases with the roughness of the surface. Therefore, the gain is more informative for paths from SM and MR, but less informative for paths from VR, so the blue dashed lines are much lower than the red dashed lines in Figure 10b,c, but only slightly lower than the red dashed line in Figure 10d.

Localization Performance
Next, we study the performance of the proposed 5G SLAM scheme in vehicle state estimation and compare the estimation results among the SLAM filter with different settings (with or without using diffuse multipath, with or without using channel gain). The landmarks are unknown and are mapped with tracking the vehicle state. We add [0.9 m, 0.9 m, 0m, 4.5 deg, 0m/s, 0rad/s, 3 ns] T bias to the initial state, and the initial covariance is set as diag([0.3 m, 0.3 m, 0 m, 0.1 rad, 0m/s, 0rad/s, 1 ns] 2 ). We use 2000 particles to represent the vehicle state, and obtain the mean absolute error (MAE) between the real vehicle state and the estimated vehicle state after time step 5 until the simulation ends at time step 40 as shown in Figure 11. The clock bias is multiplied with the speed of light c for evaluation. Overall, when using diffuse multipath, the SLAM filter has better performance in positioning, as MAEs are lower. The reason is that diffuse multipath provides more information than the single path. Using channel gain also improves the positioning performance, especially very helpful to estimate the clock bias, since gain contains information about propagation distance directly without being affected by the clock bias. all paths, with gains all paths, without gains specular path, with gains specular path, without gains Figure 11. Comparison of vehicle state estimation performance, considering the utilization of channel gains and different sets of estimated propagation paths.

Complexity Evaluation
We also evaluate the complexity of the proposed framework by measuring the execution time for each phase-channel estimation, clustering, mapping, SLAM (per particle), as shown in Figure 12. We measured a runtime per time step of 8.12 s for the channel estimation. The reason for taking a long time is that the ESPRIT channel estimator is based on a high dimensional tensor decomposition with high complexity. We measured the modified DBSCAN takes 0.21 ms per snapshot. For mapping, it takes 110 ms per time step. Although the proposed SLAM filter considers all possible data associations, as there are not too many landmarks under the simulation scenario, the number of all possible data associations is not large. However, the runtime will increase with more data associations. For SLAM, the proposed SLAM filter takes 149 ms per each particle (298.52 s in total) per time step. As 2000 particles are used and we need to fuse maps of all particles, the complexity is much higher than mapping, where the true vehicle state is given.

Conclusions
In this paper, we have treated the 5G SLAM problem from an end-to-end perspective, including downlink data transmission, channel estimation, clustering, and the SLAM filter. In the 5G SLAM problem, we aim to localize and synchronize a user while mapping the propagation environment, with the help of downlink signals from a single base station. We have proposed a novel method to cluster the MPCs by projecting the high-dimensional data into 3D points and then cluster the points based on the DBSCAN algorithm, which we augmented to account for the channel gains. We have also proposed a novel likelihood function in the 5G SLAM filter, which accounts for both the specular path as well as the diffuse multipath components.
Our results show that the ESPRIT channel estimator can estimate the channel parameters of both specular and diffuse multipath, and that the proposed system can directly use the raw un-clustered channel estimation results by applying the proposed clustering algorithms. With the help of the novel likelihood function, the proposed scheme can accurately estimate the number of landmarks, their types (i.e., roughness), and positions, and the channel gain is helpful in clustering and mapping and positing. The results also confirm that the proposed method can handle mapping and vehicle state estimation simultaneously, and highlight the benefit of considering both specular and diffuse multipath. In addition, the channel gains turn out the be highly informative for synchronizing the user to the base station.
The proposed framework has two computational bottleneck-the ESPRIT channel estimator and the particle filter used in the PMBM SLAM. In order to enable real-time execution, there is a need for faster solutions for both the channel estimation and the SLAM filter. The solutions could be either in the form of new algorithms, or by offloading the computation to more powerful edge computing systems, where edge servers can provide high-performance computing capability closer to end users [68][69][70].
where h indicates the hypotheses index [18], n is the number of potentially detected objects, and f h,j B (⋅) denotes the Bernoulli density of the object j under the global hypothesis h, and l h,j is its weight. Each Bernoulli density follows where r h,j is the existence probability and f h,j (⋅) is the state density. Then, (A2) can be parameterized by λ(x), and (A3) can be parameterized by {{l h,j , r h,j , f h,j (x)} j∈I h } h∈I , where I is the index set.
where p D indicates the detection probability.

(b)
Detections for the first time: Using the grouped measurement Z i k and the PPP parameter λ k k−1 (x), we newly generate the MBM parameters as where c(Z i k ) = δ( Z i k = 1)c(z i,0 k ).