EM Model-Based Device-Free Localization of Multiple Bodies †

In this paper, we discuss the problem of device-free localization and tracking, considering multiple bodies moving inside an area monitored by a wireless network. The presence and motion of non-instrumented subjects leave a specific footprint on the received Radio-Frequency (RF) signals by affecting the Received Signal Strength (RSS) in a way that strongly depends on people location. The paper targets specifically the modelling of the effects on the electromagnetic (EM) field, and the related inference methods. A multiple-body diffraction model is exploited to predict the impact of these bodies on the RSS field, i.e., the multi-body-induced shadowing, in the form of an extra attenuation w.r.t. the reference scenario where no targets are inside the monitored area. Unlike almost all methods available in the literature, that assume multi-body-induced shadowing to sum linearly with the number of people co-present in the monitored area, the proposed model describes also the EM effects caused by their mutual interactions. As a relevant case study, the proposed EM model is exploited to predict and evaluate the effects due to two co-located bodies inside the monitored area. The proposed real-time localization and tracking method, exploiting both average and deviation of the RSS perturbations due to the two subjects, is compared against others techniques available in the literature. Finally, some results, based on experimental RF data collected in a representative indoor environment, are presented and discussed.


Introduction
Recent research activities have shown that electromagnetic (EM) fields used for data communication can be exploited as signals of opportunity for device-free (i.e., passive) environmental radio vision [1][2][3]. Actually, the first studies concerning the effects of obstacles on the radio propagation date back to the 20s of the last century [4] at the dawn of the radio era. Since then, the works presenting and discussing the effects of the human bodies on the Radio-Frequency (RF) propagation became countless [5][6][7]. However, these research activities focused on the impairments that affect the radio communication and how to mitigate these negative effects rather than exploiting them for environmental sensing.
More recently, the first experiments [8,9] about the usage of body-induced perturbations to perform radio detection and localization have revealed that body motions leave a characteristic footprint on the Channel Quality Information (CQI) [1,10,11] or the Received Signal Strength (RSS). In fact, the received power, expressed in terms of PHY layer CQI info or MAC layer RSS values, can be exploited by networked radio devices for body detection and localization tasks. CQI/RSS perturbations and variations produced by bodies (i.e., the targets) on the RF signals are processed to extract an image of the environment and of the bodies that have induced these perturbation.
Despite the extensive research activities on DFL, only a few systems have been discussed in the literature for multi-targets localization [31][32][33][34][35]. The RTI method [32], based on RSS attenuation, models the multi-target RSS footprint as the linear superposition of the perturbations induced by each individual. The localization accuracy can be further improved by the simultaneous processing of RSS measurements from multiple RF channels [33]. In [24], a linear relationship between the RSS fluctuations and the number of targets in the area has been derived experimentally. A successive-cancellation algorithm has been proposed in [34] to address the problem of detection/tracking of new subjects entering/moving in the monitored area. On the other hand, RSS-based compressive sensing methods have been employed in [35] to reduce the device power consumption, still providing a reasonable localization accuracy.
The main limit of the above-mentioned methods is the linear model assumption that was drawn to be unreliable in [31] for an increasing number of targets or closely located subjects. Accurate multi-target DFL requires, in fact, to deal with simple but realistic EMbased models to account for multi-body shadowing interactions. Conventional algorithms based on fingerprinting [14], radio tomographic imaging [10], ray-tracing [36], or ad-hoc statistical [11] techniques are not suited, mostly due to an unfeasible computational burden added to include all target combinations. In addition, the linear model based on the superposition assumption becomes unrealistic for modeling the movements of crowds as shown in [31,37]. In fact, the analytical approaches, based on propagation modeling, focused on the single-target case only [38][39][40].
Starting from recent studies on single-target [40][41][42] and dual-target [37] humaninduced fading modeling, in this paper, we propose a physical-statistical model based on the scalar diffraction theory to predict the effects [37] of two co-located targets (see Figure 1) on both RSS mean and variance. Unlike [31,40], the proposed physical-statistical model does not rely on the paraxial approximation [41] and, therefore, can be used for targets of any size placed anywhere in the monitored area and not only for small targets near the center of the link. In addition, we exploit the proposed physical-statistical model to define a novel Joint Maximum Likelihood (JML) approach that supersedes the method in [31] and a novel Bayesian dual-target DFL algorithm, that is based on the Particle Filter (PF) approach. The PF method is here validated by extensive field measurement data collected by an ad-hoc IEEE 802.15.4 wireless network. With respect to [31], the proposed Bayesian PF algorithm exploits a full two-dimensional (2D) motion model, based on both position and velocity state information, to track two independently moving targets. Finally, the localization performances of the PF method are compared against the JML [31], the successive cancellation (SC) [31], the Maximum Likelihood RTI (RTI-ML) [31] and the Least Squares RTI (RTI-LS) [10] approaches on a real indoor scenario similar to the one shown in Figure 1. The key topic of this paper is the use and evaluation of an EM multi-body model where multipath effects are included as log-normal disturbances. Usually, EM methods are complex and time-consuming. On the other hand, the model proposed here is simple enough to be employed for real-time DFL methods and requires only a very short calibration phase. This is relevant for practical applications, as most of Machine Learning/Deep Learning (ML/DL) methods (not model-based) require a huge amount of data and a long learning phase to extract RSS-position information in scenarios with multiple targets. For details about ML/DL methods, the interested reader can take a look at [13,25,26] and, for the statistical models, at the references included in the cited surveys [2,3,6].
To summarize, the original contributions of this paper are as follows: (i) a novel dualtarget EM model that accounts for small voluntary/involuntary movements of two human bodies and underpins a Bayesian framework for full dual-target localization and tracking; (ii) a procedure for the calibration of the model parameters based on field data, and the evaluation of the dual-target perturbation maps (RSS mean and standard deviations); (iii) a link selection procedure for dropping out unreliable link measurements; (iv) the design of a new family of non-Bayesian (JML, SC and ML-RTI) and Bayesian (PF) methods that exploit the new model for multi-target localization and take advantage of both average and fluctuation of RSS readings to increase the DFL accuracy; and (v) the validation and comparison of the proposed model-based methods in real indoor scenarios.
The paper is organized as follows: the problem formulation is shown in Section 2 where the Bayesian PF framework is introduced to estimate and track two targets by using a 2D motion model. The EM model for the prediction of the multiple bodies-induced shadowing is recalled and discussed in Section 3. As a practical case study, we address the problem of RSS modeling for two targets moving in a monitored area. Section 4 deals with the proposed model-based localization algorithm and its comparison with other DFL methods, used as benchmarks. Experimental results obtained in an indoor environment are shown and discussed in Section 5. Finally, some concluding remarks are drawn in Section 6.

RSS Model
We consider here the localization of K human targets having positions x k = [x k,1 , x k,2 ] T , k = 1, . . . , K, in a 2D area X covered by a wireless network composed of N anchor nodes with L ≤ N(N − 1) radio links. The network is characterized by an undirect graph with links collected in the set L = { : = 1, . . . , L}. The network layout for the experimental tests is sketched in Figure 1 (top): the wireless network, composed by N = 18 fully connected anchor nodes, covers the area X having size 4 × 5 sqm where K = 2 targets can move independently. Being s the RSS in logarithmic scale (i.e., dBm) over the link ∈ L of length d , the objective of the DFL network is the estimation of the K target positions, arranged in the column vector x = [x T 1 · · · x T K ] T of size 2K × 1, by exploiting the column vector s = [s 1 · · · s L ] T of size L × 1 that collects the RSS observations of all links L over the time t.
According to the experimental results [1,14,40] carried out on human-induced fading in a single-target scenario (i.e., K = 1), s can be reasonably modelled as Gaussian distributed. In fact, even if other parametric distributions, e.g., Weibull and Nagakami [43], provide better results, the Gaussian approximation is widely used and provides results in good agreement with RF measures. Extending the Gaussian model to K > 1 targets, we assume that the RSS s depends on the number K (x) ≤ K of the targets that are inside the link sensitivity area X and on their specific locations x. For further details about the link sensitivity area, the reader can refer to [40,41]. An example is sketched in Figure 1 where the 1 -th link identified by the node couple (6,14) is characterized by K 1 (x) = 1 since only the Target #2 falls within the link sensitivity area X (highlighted in red in the aforementioned figure). Similar considerations apply also for the link (9,18), where it is K (x) = 2 since both Target #1 and Target #2 are placed inside the link sensitivity area (highlighted in black). In general, it is: If no target is in the link sensitivity area i.e., K (x) = 0, then the RSS s has a deterministic mean h 0, that accounts for path-loss and static effects due to static obstructing or scattering objects. The random term w 0, ∼ N (0, σ 2 0, ) models the measurement errors and the other small power fluctuations induced by variations and/or changes in the surrounding environment as well. Likewise, when one or more targets are inside the -th link sensitivity area X i.e., K (x) > 0, the received power s shows an extra attenuation ∆h (x), w.r.t. the empty scenario term h 0, , due to the obstruction(s) generated by the presence of the target(s). In addition, s shows an increased fluctuation ∆σ (x) ≥ 0 w.r.t. the reference scenario term σ 0, due to small movements of the target(s) around the nominal position x, such as turning, change of posture, and arm and/or torso movements. Thus, the mean RSS becomes: while the noise term w 1, (x) ∼ N (0, σ 2 1, (x)), that models the random shadowing effects, is given by: The validity of (1) has been demonstrated in several experimental tests on the singletarget case [14,40,44]: both the average ∆h (x) and the standard-deviation ∆σ (x) perturbations increase when the target is obstructing the Line-Of-Sight (LOS), especially if the target is very close to the transmitter or the receiver. Likewise, in the multi-target scenario, perturbations tend to increase with the number of targets [31,37], too. For in-stance, Nicoli et al. [31] shows that, for a link length d = 4 m and for targets placed along the LOS path in selected landmark positions at distance of 0.5 m from the nearest one(s), the RSS mean h 1, (x) decreases of about 3.5 dB w.r.t. the empty scenario while the standard deviation σ 1, (x) increases of about 3 dB w.r.t. the empty scenario when the number of targets raises from K = 1 up to K = 7.
From the above considerations, for each -th link sensitivity area X , both RSS average h 0, , h 1, (x) and standard deviation σ 0, , σ 1, (x) terms of (1) provide important information about the number of targets in the link area and in turn, by combining all L link information, on the target locations x.
To exploit the targets footprint on both RSS average and variance terms, the knowledge of the reference maps {h 0, , σ 0, } for the empty scenario, and the perturbation maps {∆h (x), ∆σ (x)} is required for all links ∈ L, for all landmark positions, and for all targets inside the monitored area X = ∪ X . The reference maps {h 0, , σ 0, } can be easily measured when no target is moving inside the monitored area X . For the single-target scenarios, fingerprinting maps are usually obtained by collecting RSS samples over each link while a person only is subsequently positioned over M ≥ K landmark points {b m } M m=1 of a 2D grid covering the monitored area. However, for K > 1, the evaluation of the maps {∆h (x), ∆σ (x)} is more critical, as it requires time-consuming ray-tracing simulations or extensive fingerprinting acquisition campaigns over different target-landmarks configurations. Given the number M 1 of landmarks, the complexity becomes easily cumbersome even for a small number K > 1 of targets present in the link sensitivity area, as the number of different targets-landmarks configurations to be explored is in the order of O M K . Therefore, modeling is mandatory to simplify the calibration process.
In Rampa et al. [40,41], a closed-form analytical model has been derived for K = 1 based on the diffraction theory, relating both the RSS average and standard deviation to the target location. The diffraction models were shown to fit reasonably well the attenuation effects up to K = 2 targets [31,37], accounting for the joint interactions between targets. Here, we propose a novel physical-statistical model that extends Nicoli et al. [31] by exploiting a full EM model without any paraxial approximations, and Rampa et al. [37] by including small voluntary/involuntary movements of two human bodies around their nominal positions. Although not considered here, the proposed approach can be also exploited to predict the effects of a larger number of targets i.e., for K ≥ 3.
To validate the model, in the experimental Section 5, we consider the RSS maps built by measurement field tests using the conventional fingerprinting approach [14,31]. We compare the new model against the additive perturbation maps [10,33] based on the linear superposition approximation that simplifies processing at the cost of reducing the modeling accuracy. In this case, the multi-target effects are approximated as the linear superposition of single-target terms as: and using the single-target maps {∆h (x k ), ∆σ (x k )} predicted, or measured, assuming the presence of a single target only, located at position x k .

EM-Based Model of Dual-Target RSS Perturbations
In this section, we focus on a single-link scenario [31,37] with K = 2 three-dimensional (3D) objects modeling the human targets, placed near the LOS path connecting the transmitter (TX) and the receiver (RX). The geometrical description of this scenario in the 3D space of coordinates (u 1 , u 2 , u 3 ) is shown in Figure 2, where the TX is placed in the origin O = (0, 0, 0), the RX in (d, 0, 0), and the link is horizontally placed (along the axis u 1 ) at height h from the floor. The floor (i.e., the plane u 3 = −h) does not influence the RF propagation but it is has only the purpose to support the bodies. Thus, the Fresnel's ellipsoid [40] not only has no contact with the floor, but also with ceiling, walls or other obstacles except for the two targets moving inside the link area. These assumptions introduce the geometrical constraint min(h, d w , d c ) where λ is the carrier wavelength, while d w and d c are the minimum distances of the link path from walls and ceiling. Each k-th 3D body is represented as an homogeneous and perfectly absorbing electromagnetic cylinder, with height H k and elliptical base with axes a k , b k . Since each body can move and rotate, it shows a rectangular cross-section in the plane orthogonal to the LOS path having height H k and traversal size c k for k = 1, 2 with b k ≤ c k ≤ a k . Since 3D objects are difficult to model (and time consuming, too), bodies are approximated [37,[40][41][42] as 2D knife-edge obstacles S k , having the barycenter C k that has footprint C k = x k in the horizontal 2D plane containing the LOS path. To model people that stand in specific positions but might change orientation or posture, we assume, as in the single-target case [41], that each object can rotate by an angle θ k around the vertical axis u θ k , i.e., the traversal size c k can arbitrarily change in the range b k ≤ c k ≤ a k . In addition, each object can also slightly change its position to x k + ∆x k w.r.t. the nominal position x k . The link length d is equal to d = d 1,R + d 2,1 + d T,2 where d 1,R and d T,2 are the distances of the knife-edges S 1 and S 2 from the RX and TX, respectively. The distance between the knife-edges S 1 and S 2 is equal to d 2,1 . The knife-edges are numbered in ascending order from the RX up to the TX (i.e., in Figure 2 S 1 is near the RX and S 2 near the TX, respectively).
According to Rampa et al. [37], the total electric field dE at the RX, due to the elementary areas dS k of both 2D knife-edges representing the targets, can be predicted by the forward propagation of the two virtual arrays of Huygens' sources located on the obstacle planes S k but not belonging to the obstacles S k themselves. However, unlike [31], that employs the paraxial approximation [40,44], the EM model adopted here refers to the full scheme presented in Rampa et al. [37]. Neglecting backward/multiple scattering effects between obstacles, and ignoring ground, walls and ceilings reflections, it is: where E 0 is the free-space electric field at the RX when no bodies are present in the link area (i.e., the reference scenario). The terms r 1,R and r 2,1 are the distances of the elementary surface dS 1 from the receiver RX and the elementary surface d 2 , respectively. Likewise, the term r T,2 is the distance of the elementary surface dS 2 from the transmitter TX. By mathematical manipulation of (6), the total received electric field E can be expressed as a function of the link geometry, the position and the size of the knife-edges: where the term E k : is the electric field at the RX when only the k-th target (i.e., k = 1 or k = 2) is present in the link sensitivity area [41] while the joint term E k,i models the interactions between the two knife-edges S k and S i with i = k: Being (6)-(9) based on a forward only method, they hold only in the domain D = (x k , y k ) ∈ R 2 : 0 < x k < d, −∞ < y k < +∞ . As shown in Rampa et al. [37,41] and sketched in Figure 1, the effect due to the target presence near the radio link practically vanishes for large but finite values of |y k |. For each -th link, according to Equations (1) and (7), it is: and where E θ,∆x [·] and S θ,∆x [·] are the Expectation and Standard deviation operators w.r.t. the random vectors θ = [θ 1 θ 2 ] T and ∆x = [∆x T 1 ∆x T 2 ] T , respectively. The terms ∆h C and ∆σ C include the multipath effects that are assumed constant in the link area and independent from the presence of the bodies.
Notice that: (i) if only the k-th body is present then, being E k,i = 0 and E i = E 0 since the i-th target is absent (i = k), then (7) reduces to the single-target case E = E k given in (8); (ii) it is easy to show that, if the knife-edges S k and S i have different size, then it is also E k,i = E i,k since the integration domains are different. It means that, if we exchange the position of the targets, then the extra attenuation results are changed as well. On the contrary, if the targets have equal dimensions, then it is impossible to distinguish them and the final result does not change. Finally, (iii) according to (7), the total extra attenuation 10 log 10 |E/E 0 | 2 w.r.t. the reference scenario, due to the combined effects of both targets, does not simplify to the superposition of the single effects (8) of each target w.r.t. the reference scenario. Thus, it is: 10 log 10 |E/E 0 | 2 = 10 log 10 |E i /E 0 | 2 + 10 log 10 |E k /E 0 | 2 (12) and:

Multi-Target Localization Methods
We will present here a novel Bayesian method (PF) for multi-target localization based on the EM model presented in Section 3, along with a brief review of the non-Bayesian methods introduced in Nicoli et al. [31] for performance comparison. Since all localization algorithms exploit multi-link information, the equations shown in Section 3 for a single link will be used together by considering that all links lay in the horizontal plane having distance h w.r.t. the ground and each -th link has length d . Finally, the performances of the different methods will be compared in Section 5 by extensive on-field experiments. In the following sections, we will assume that the number of targets is known and equal to K = 2. Extension to detection and localization of K > 2 targets can be obtained by enlarging in the search for additional target positions, including states denoting the target absence, following the approach proposed in Savazzi et al. [14] for K = 1.

Dual-Target Particle Filter Estimation (PF)
Within the Bayesian framework, the RSS model (1) can be rewritten to highlight the time dependency of the overall set of observations s 1:t = [s 1 s 2 · · · s t−1 s t ] collected up to time t by the whole network. Since the targets move in the monitored area X changing their positions where the column vectors T of size L × 1 collect the empty-link mean reference and the multi-target induced deviations for all links at time t, respectively. The column vector w( corresponding to the log-normal shadowing effects (assumed mutually independent) having the covariance matrix Q( ). According to Section 3, Equation (7) depends on the parameter set Θ = {h, d, H 1 , H 2 , c 1 , c 2 , x t } ∈ T that allows us to compute all the map elements {∆h (x t ), ∆σ (x t )} by using (10) and (11) as functions of the targets positions x t when both ∆h C and ∆σ C terms are available (or evaluated during the calibration phase as shown in Rampa et al. [40]), and the geometrical sizes H k and c k of all targets are known or estimated (see Section 5.2). The other geometrical terms h and d can be easily measured for each link after the network deployment. It is worth mentioning that the RSS perturbations in (14) are set to ∆h (x t ) = 0 and ∆σ (x t ) = 0 outside the link sensitivity area [40] i.e., ∀x t : K (x t ) = 0. If the model (7) is not available, then the proposed method reduces to the Bayesian fingerprinting approach [14] by exploiting the landmark points {b m } M m=1 to evaluate for all L links the perturbation maps.
We exploit also the spatial continuity of the targets movements by assuming a 2D motion model to describe how the targets positions evolve over time. The constant velocity (CV) model, herein considered to describe the motion with nearly constant velocity of each target k ∈ {1, 2}, is represented by the first-order equation [45]: where the 4 × 1 vector: is the joint position-velocity state for the k-th target while F and L are 4 × 4 and 4 × 2 motion matrices defined as I 2 and 0 2 are the unitary and null matrix of size 2 × 2, respectively while T is the sampling time interval. The noise vector n (k) t of the driving process, having size 2 × 1, is assumed to be Gaussian distributed n (k) t ∼ N (0 2 , Q n ) with zero mean vector 0 2 and covariance matrix Q n = diag(σ 2 x , σ 2 y ). Globally, the joint position-velocity state of both targets is represented by the 8 × 1 vector: Assuming a first order Markov prediction, the PF algorithm combines the measurements collected at time t with the a-priori information obtained at time t − 1 to get the a-posteriori probability density function (pdf) [14] of the target location x t at time t. The apriori pdf p(x t | s 1:t−1 ) is obtained by the prediction step by using the a-posteriori pdf p(x t−1 | s 1:t−1 ) at time t − 1 and the transition pdf p(x t | x t−1 ) of the motion model according to the Chapman-Kolmogorov equation: The a-posteriori pdf p(x t | s 1:t ) is computed in the update step by using the Bayes theorem with the conditional probability p(s t | x t ) evaluated according to the perturbation maps and the a-priori pdf (20): Evaluation of the conditional probability p(s t | x t ) is performed as shown in (28) according to the perturbation maps obtained by exploiting the physical-statistical model (14) or the measurement one (1). Finally, the position estimate is computed as the Minimum Mean Squared Error (MMSE):x To avoid the computational complexity of the grid-based search in the detection area X , we adopted here the Particle Filtering method [46]. This is a Monte Carlo sequential approach, that represents the a-priori pdf with a discrete set of particles: where the N p particles, x t,n = [x T 1,n (t) · · · x T K,n (t)] T , describing the Cartesian coordinates {x t,n } N p n=1 , are independent identically distributed (i.i.d.) random variables with pdf x t ∼ p(x t | s 1:t−1 ). Equation (21) is rewritten as: The conditional pdf p(s t | x t ) is defined as: by normalizing the term q t,n = p(s t |x t =x t,n ) N p . Therefore, (24) simplifies to: The estimated locations of the subjects is found by applying the MMSE criterion as: At time step t + 1 a new set of particles must be generated {x t,n , q t,n } N p n=1 using the resample and propagation steps [14]. Finally, the initial probability p(x 0 | s 0 ) = p 0 (x 0 ) at time t = 0 can be chosen according to the available information about the targets position x 0 . For instance, if the target entrance points are constrained to be in G fixed positions is the Dirac delta function.

Joint-ML Estimation (JML)
Unlike PF, the joint multi-target ML estimation algorithm (JML) obtains the target positions in a snapshot manner asx t = arg max x t ∈X Λ(s t |x t ) using only the information available at time t. Given the Gaussian assumption for w(x t ), the log-likelihood function ln p(s t | x t ) is computed as: where |·| denotes the determinant and a 2 C = a T C a is the weighted squared norm of a vector a by the matrix C. The perturbation maps {∆h (x t ), ∆σ (x t )} L =1 can be model-based or measured in some landmark positions during the initial calibration phase according to the fingerprinting approach (see Section 4.1).

Successive Cancellation Estimation (SC)
The successive cancellation method is an iterative algorithm that estimates the position of one target at a time. Assuming the model (14) with K = 1 target in the monitored area, the SC algorithm estimates the position of the first target, namelyx 1,t , using the ML approach using only the single-target perturbation maps: The contribution of the first target estimatex 1,t is then taken into account for the estimation of the second target, as: where the likelihood Λ is evaluated assuming the perturbation maps ∆h (x 1,t , x 2,t ) = ∆h (x 1,t ) + ∆h (x 2,t |x 1,t ) and ∆σ 2 (x 1,t , x 2,t ) = ∆σ 2 (x 1,t ) + ∆σ 2 (x 2,t |x 1,t ), where the terms ∆h (x 2,t |x 1,t ) and ∆σ 2 (x 2,t |x 1,t ) denote the contributions of the second target when the first one is located inx 1,t . All these maps are computed using the diffraction-based model. The procedure is then repeated till the K-th user. It is apparent the linearity assumption adopted for the evaluation of ∆h (see the remarks in Section 3).

Radio Tomographic Imaging Estimation (RTI)
The tomographic imaging approach herein proposed extends the method [10] originally accounting for average RSS information only. The method derived below jointly accounts for the impact of the targets on the RSS average and RSS fluctuations (14) by estimating a motion image of the area X that captures the power variations w.r.t. the reference scenario. The monitored area X is divided into M voxels centered around the landmark locations {b m } M m=1 , while the estimated vector v = [v 1 · · · v M ] T ∈ V = {∀m = 1, ..., M : v m = 0, 1} of size M × 1 (i.e., the motion image) indicates whether a mov-ing target is observed, or not, in the m-th voxel as v m = 1 or v m = 0, respectively. For sparse motion and assuming linearity in the power measurements, the RSS terms (1) can be approximated as the sum of the contributions generated by all occupied voxels: where ∆h ,m = ∆h (b m ) is the extra attenuation due to a target that occupies the m-th voxel. The corresponding power fluctuation term is modeled as w ∼ N (0, σ 2 (v)) with σ (v) = σ 0, + ∆σ and ∆σ = ∑ M m=1 v m ∆σ 2 ,m where ∆σ ,m = ∆σ (b m ) is the contribution due to the target located in voxel m, i.e., x t = b m . Considering all links, Equation (14) can be rewritten at time t as: where the matrix ∆H = [∆h ,m ] of size L × M collects perturbations for all links and voxels, ) depending on the target locations. Unlike the least squares (LS) radio-tomographic approach (RTI-LS) that employs the closed-form LS estimationv t = ∆H † (s t − h 0 ) by using extra attenuation information only [10], the proposed RTI-ML estimate is obtained as: The targets positions are then estimated as the K voxels associated with the maximum values inv t . The perturbation maps {∆h (b m ), ∆σ (b m )} L =1 are measured in the landmark positions {b m } M m=1 during the initial calibration phase according to the fingerprinting approach.

Experimental Results
In this section, we describe the network setup used for indoor measurements during an experimental on-field campaign. Some indoor and outdoor measurements have been carried but we focus only on the indoor ones as they are the most challenging. We first discuss the dual-body model calibration scheme adopted to estimate the target parameters Θ that are employed in the PF localization algorithm. Then, we compare the results obtained with the proposed method against other model-based and non-model-based DFL algorithms.

Device Configuration and Network Setup
Experimental validation trials have been carried out in a large hall at DEIB, Politecnico di Milano, where two people (female, about 1.65 m, and 50 kg each) freely moves. A fully connected mesh network of N = 18 anchor nodes (and L = 306 links) was uniformly deployed along the perimeter of a 4 m ×5 m area, spaced 1 m apart and placed on stands h = 0.7 m high (see Figure 3). Each anchor node is equipped with a NXP JN5148 Systemon-chip (SoC) wireless micro-controller, that enables time-slotted transmissions within the 2.4 GHz band according to the IEEE 802.15.4 standard, and other ancillary interface components [47]. The RSS dynamic range of the SoC built-in receiver is equal to 75 dB with a minimum sensitivity of −95 dBm while the transmit power is about 0 dBm. All wireless nodes employ omnidirectional vertically-polarized antennas with 2 dBi gain. Node-related noise effects are in line with what reported in Chen and Terzis [48] for other IEEE 802.15.4-compliant devices. A modified MAC layer [1], defined on top of the beacon-enabled mode of the standard IEEE 802.15.4, allows time-slotted communications between nodes while a specific API (Application Programming Interface) has been designed to retrieve the RSS values from each anchor node. The Network Coordinator (NC), see Figure 1, creates a 60 ms custom periodic beacon frame that is transmitted at the beginning of each super-frame [14]. The beacon frame allows every wireless node to synchronize the RSS acquisition: therefore, the RSS values are synchronously sampled and time-stamped every 60 ms over all links. The anchor node 1 acts as a sink node (SN), collecting data from all nodes. Then, through a USB connection, the SN forwards the RSS information, to an external PC where RSS data and target movements are synchronized using a video camera. Finally, RSS values are processed and visualized.
The network layout adopted for double-target localization is sketched in Figure 3. In this picture, two targets follow the same trajectory (i.e., the light blue snake-like path) but in opposite directions: Target #1 enters from the bottom-right corner entrance located near nodes 8 and 9 while Target #2 gets into the area by accessing the top-left entrance located between nodes 17 and 18. Thus, the number of entrance points is G = 2. A regular landmark grid (i.e., the blue dots in Figure 3 [14] by averaging the RSS samples collected over a period of 20 s. During this period, the target is moving/wandering randomly around the nominal landmark position.

Dual-Body Model Calibration
Calibration of the dual-target diffraction model (7) requires the estimation of the pa-rametersΘ from the training RSS dataset. Here, the body-induced RSS perturbations have been measured on the link connecting nodes 6 and 14, namely the link 1 = (6, 14) sketched in Figure 1 Estimation of the EM model parameters for K = 2 targets requires a limited calibration phase that is performed on a few landmarks (M = 7), and not on all landmarks (M = 63) as needed by the full fingerprinting approach. Moreover, to reduce the calibration phase, both targets are assumed to be identical. Thus, the target size parameters are estimated using the LS approach applied to all 21 different dual-target combinations c(x 1 , x 2 ) instead of the full 42 combinations required using different size targets. It is: where ∆h (c(x 1 , x 2 )) are the measurements and E θ,∆x 10 log 10 |E(c(x 1 , x 2 ), Θ)/E 0 | 2 the values predicted by the EM model for the given dual-target combination c( x 1 , x 2 ).
Similarly to what done in Rampa et al. [41,49], for each spatial placement of the two targets, the RSS moments E θ,∆x [10 log 10 |E/E 0 | 2 ] and S θ,∆x [10 log 10 |E/E 0 | 2 ] have been evaluated by averaging over 10 4 realizations of uniform azimuth rotations θ ∼ U(−π, π) and uniform location uncertainties ∆x ∼ U(−B/2, +B/2) within a bin of size B × B with B = 20 cm centered around the nominal landmark points {b m } m∈M 6,14 . Both terms θ and ∆x simulate small voluntary/involuntary body movements ∆x around the landmark positions. According to (35), the diffraction-based parameter estimatesΘ, employed in the experimental tests of Section 5.4, correspond to two targets (female, about 1.65 m, and 50 kg each) having dimensions a 1 = a 2 = 41 cm, b 1 = b 2 = 12 cm, and H 1 = H 2 = 150 cm. With respect to the target(s) size, it has to be noted that both knife-edge height and traversal size are loosely related to the nominal physical size of the target(s) [37,41] as these optimized values refer only to the calibration procedure previously described.
Relevant multipath effects ∆h C , ∆σ C have been observed in the considered indoor environment, especially due to the vertical metal foils covering the pillars along the left and right sides of the room (i.e., near anchor nodes 3, 6, 14 and 17) sketched in Figure 3. These effects are ignored by the diffraction model (7), while they are accounted for in the fingerprinting (i.e., measured) maps as concern the single target impact (i.e., not the interaction between the two targets).
The examples of Figures 4 and 5 show the additive fingerprinting and the diffractionbased perturbation maps, respectively. For a selected link (between devices 7 and 17), the figures show the perturbation measured when Target #1 is placed in the position denoted by the green star, while Target #2 moves all over the grid points of the considered area. In these maps, each pixel indicates the value of the RSS attenuation and standard deviation due to the presence of the two targets. The results show that when Target #1 is far from the LOS link (green line), these values are very similar to those experienced when only Target #2 is in the LOS area [14]. On the other hand, higher variations are tested when target 1 is in LOS and the other one is in the sensitivity area of the link. In both cases, the figures show a clear sensitivity to the presence of both targets along the links. Furthermore, as shown also in Figure 5 of reference [31], the comparison between model (7) and measurements shows that the diffraction-based model better approximates the double-target shadowing with respect to the simplified additive approach, despite the mismodeling due to multipath effects that are not included in the diffraction model (7).
As mentioned before, the dual-body model calibration requires only a few measurements taken in a subset of the landmarks. It is worth noticing that the usage of the fingerprinting methods for the dual-target scenario would require the full M (M − 1) = 3906 target-landmark measurement combination set: this is impractical in almost all DFL scenarios. To solve this problem, we propose here to adopt a model-based approach within the PF framework.

Link Selection Procedure
We recall that the EM diffraction model (7) does not consider propagation phenomena such as reflections and refractions from the floor, ceiling, walls and other massive objects that are present in the monitored area. These effects are included as noisy contributions in the Gaussian terms of (1). Depending on link locations, these noisy effects are observable as introducing outliers in the RF measurements [14]. To handle these cases and avoid performance degradation due to model mismatching, we propose here a novel method for automatic detection of the unreliable links. To this aim, by exploiting the Gaussian assumptions, we define, for each m-th landmark, the following reliability metric η (b m ) > 0 as: Initially, for each m-th landmark, all L links are sorted in the ascending order {η (b m )} L =1 ; then a link is classified as unreliable (i.e., U ,m = 1) or reliable (i.e., U ,m = 0) for the m-th landmark by exploiting the definition U ,m = I(η (b m ) ≥ γ m ) where I(·) is the indicator function and γ m a location-dependent threshold. If a link is labelled as unreliable more than Ω times over the entire landmark set i.e., ∑ M m=1 U ,m > Ω, then it is discarded from the link set L. The remaining setL = L − : ∑ M m=1 U ,m > Ω is then exploited for the DFL algorithms described in Section 4. For each m-th landmark, the threshold γ m is computed as γ m = η¯ +1 (b m ) where¯ ∈ L is the smaller index such that: where δ is a chosen positive constant (0 < δ < 1) valid for all landmark points. The constants δ and Ω are scenario-dependent parameters: a small value of the parameter δ implies that only a few links may be discarded, while the opposite happens when δ is close to one. On the contrary, the parameter 1 < Ω ≤ M sets a threshold on the fraction (Ω/M) of measurement landmarks that must be considered not to discard a link. The example of Figure 6 for the PF algorithms of Section 4.1 shows the deleted links for the network layout of Figures 1 and 3. The color intensity quantifies the unreliability level of each deleted link, namely the number of times each link is discarded: strong colored links refer to links that have been discarded a number of times much greater than the threshold Ω = 25 while lightly colored links indicate that these links have been discarded only a few times over the threshold. This example assumes that the coefficient δ is set to δ = 0.1.
The link selection procedure has been adopted in the early evaluation stages during the algorithm testing since it became apparent that some links were not informative at all. According to the evaluation of the model-based PF algorithm shown in Section 5.4, the degraded RMSE in the unoptimized case without the link selection procedure, for the selected layout and trajectory considered in Figure 3, is in the order of 1.3 m-1.6 m depending on the filtering parameters.
The link selection procedure described here is based on an initial calibration phase that requires only a single-target covering M = 63 landmarks. It is worth noticing that the number of calibration measurements scales linearly with the landmark number M, in line with almost all single-target DFL approaches.

Algorithm Evaluation
As far as the comparison with other methods is concerned, it is not fair to directly compare single-target and multi-target methods due to the interfering effects of one target to the other ones(s) as well as the different layout and trajectory configurations adopted by the each work. In fact, as pointed out in Savazzi et al. [14] for single-target methods, the RMSE depends on the network layout and the specific trajectory adopted. In addition, as shown in [1,40] for the single-target cases, the location accuracy depends also on the number of anchor nodes and the geometry of the wireless links. Similar considerations can be also done for multi-target scenarios [31,37]. In addition to the mutual influence between targets, the target association problem is also more important in multi-target scenarios, as shown e.g., in Nicoli et al. [46], than in single-target ones. This means that the RMSE is not only location-and layout-dependent but its value depends also on the number of the tracked targets. Therefore, it is very difficult to directly compare different configurations, scenarios and number of targets.
For these reasons, in this section, we will compare the performances of the PF algorithm against other multi-target methods in the selected dual-target scenario. In addition, since the key topic of this paper is the proposal of an EM model for attenuation prediction, we will use and compare different types of perturbation maps. More specifically, in the calculation of the location estimates, we will exploit not only the Diffraction Model (DM) of Section 3 but also the Additive Fingerprinting (AF) perturbations maps {∆h (x t ), ∆σ (x t )} of Section 2 as benchmark. To improve the localization accuracy of the PF algorithm, we will also employ the link selection criterion based on link reliability that was previously presented in Section 5.3.
Since the DFL algorithms assume that each position can be visited by only one target at a time, we expect to observe a significant localization error for snapshot (non-Bayesian) methods, such as JML, SC, RTI-ML and RTI-LS, w.r.t. the PF method, when the two targets are close to each other and when they are in the central area where the two trajectories intersect. The RTI-LS method with fingerprinting is described in Wilson and Patwari [10]; it has been adapted here to embed the diffraction model (see Figure 7d).
We assume the following PF parameters: N p = 600, Ω = 25 unreliable links limit, δ = 0.1, and step time T = 360 ms. Moreover, the covariance matrix Q n of the motion model for both targets has standard deviation values set to σ x = σ y = 0.1 m/step 2 equivalent to σ x = σ y = 0.77 m/s 2 . The physical-statistical model parameters (see Section 5.2) are: bin size B = 20 cm, target dimensions a 1 = a 2 = 41 cm, b 1 = b 2 = 12 cm, H 1 = H 2 = 150 cm, and link height h = 70 cm. These values are summarized in Table 1. Assuming K known, the accuracy of the considered algorithms namely PF, JML, SC, RTI-LS and RT-ML is evaluated, using both AF and DM perturbation maps, in Table 2. The location accuracy is evaluated in terms of the location Root Mean Squared Error (RMSE) values: by averaging the location performances of each method over the M = 63 positions x k,m = b m of the considered trajectory and over the K = 2 targets. The k-th target position estimatê x k (t) =x k,m is computed by using the RSS observations collected over 4 s corresponding to the m-th landmark b m . The location accuracy of each method is also shown for each landmark step of the trajectory. For the simulation of the JML and PF methods based on fingerprinting maps, we used the measured single-target perturbations extended to the dual-target case by using the linear superposition model (4) and (5). Similarly, for the dual-target SC and RTI methods with fingerprinting, we used the single-target maps.
The accuracy of the multi-target localization methods is also assessed for each landmark steps of the target trajectories in Figure 7 where the considered algorithms JML, SC, RTI-ML and RT-LS are compared against the PF methods.  The comparisons of the accuracy of multi-target location for PF and the JML algorithms are assessed in Figure 7a. We see, in the first plot (left), that when we used the fingerprinting perturbation maps, the PF algorithm has a low constant error trend with respect to the JML one. In fact, the RMSE of the PF method averaged over the whole trajectory is half of the averaged JML RMSE. The second plot (right) shows the performance of JML and PF when the perturbation maps are given by a diffraction-based model. We also notice that the PF algorithm improves the localization performances with respect to the JML. Figure 7b,c show the performances of SC and RTI-ML with respect to PF, respectively. In these cases, the RMSE of the PF algorithm is lower than the ones of the other algorithms. The performances of the RTI-LS method, already available in the literature [10], are also shown in Figure 7d (left).
The JML, SC, RTI-ML and RTI-LS algorithms, that employ the analytical RSS model (right), have lower accuracy compared to the corresponding fingerprinting approach (left). However, the results of the PF algorithm with the diffraction-based model show that the localization accuracy is better, or similar, to the ones of the other fingerprintingbased methods. Finally, it has to be noted that the PF method based on the diffraction model does not require the full fingerprinting steps thus dramatically reducing the initial calibration phase. Figure 8 shows the trajectories of Target #1 vs. Target #2 for the PF method. It is apparent that in the turn areas there are sudden changes of speed that can introduce location errors. Moreover, in the central part of the layout there are larger errors due to the fact that the two trajectories intersect each other and the targets may become indistinguishable. In spite of the challenging indoor conditions with two close-by users walking over complex trajectories with several u-turns, the proposed EM-model assisted PF DFL is able to discriminate and reconstruct the two trajectories with an average accuracy of 0.61 m. This is obtained by calibrating only few parameters of the EM physical-statistical model without the need of an extensive (unfeasible with 2 targets) full fingerprinting calibration procedure. Finally, it has to be noted that the RMSE of the dual-target PF method with fingerprinting (i.e., 31 cm) is about 2-3 times the best single-target algorithms available in the literature [14,29,39]. The higher error is due to the complex tracing conditions with two nearby people and to the linear superposition assumption used to build the fingerprinting maps from single-target measurements. Model-based methods are the only viable way to solve the passive multi-target tracking problem, since the full fingerprinting approach cannot be used in practical multi-user scenarios due to the expensive calibration steps.

Conclusions
In this paper, we discuss the problem of passive localization and tracking of two moving targets in an area covered by a wireless mesh network. The key topic of the paper is the design and assessment of a dual-target physical-statistical model based on the diffraction theory to describe the fading induced by the two targets and infer their position by a Bayesian particle filter (PF) algorithm. In addition to the PF method, the successive cancellation (SC), the joint maximum likelihood (JML) and the radio tomographic maximum likelihood (RTI-ML) methods have been designed, implemented and compared to estimate the locations of the moving targets by exploiting both RSS average and variance information. Numerical results, obtained by an experimental indoor campaign, show that the proposed PF Bayesian algorithm with the diffraction-based model can achieve an average localization accuracy of about 0.6 m without requiring an extensive calibration phase as mandated by other multi-target fingerprinting methods. The proposed method proves to be a promising and viable solution for multi-target tracking in DFL systems. Funding: This research was partially supported by the CHIST-ERA III Grant RadioSense (Big Data and process modelling for the Smart Industry-BDSI) project. More details at: http://www.chistera. eu/projects/radiosense (accessed on 1 March 2021).

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: Data sharing not applicable.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: