Beyond Stochastic Gradient Descent for Matrix Completion Based Indoor Localization

Abstract: In this paper, we propose a high-accuracy fingerprint-based localization scheme for the Internet of Things (IoT). The proposed scheme employs mathematical concepts based on sparse representation and matrix completion theories. Specifically, the proposed indoor localization scheme is formulated as a simple optimization problem which enables efficient and reliable algorithm implementations. Several approaches, namely Nesterov accelerated gradient (Nesterov), Adaptive Moment Estimation (Adam), Adadelta, Root Mean Square Propagation (RMSProp) and Adaptive Gradient (Adagrad), have been implemented and compared in terms of localization accuracy and complexity. Simulation results demonstrate that Adam outperforms all other algorithms in terms of localization accuracy and computational complexity.


Introduction
The concept of the Internet of Things (IoT), where objects with their own identifiers are able to transfer data over a network without requiring human interaction, is attracting growing interest [1,2]. In addition to communication technology, data management, and data privacy and security, the development of smart applications is strongly related to the notion of physical location and position [3][4][5]. Therefore, the infrastructure has to support finding things according to their location while taking mobility into account. Localization technologies will thus play a crucial role in the future of IoT and may be directly embedded into the infrastructure or into the "things" themselves.
The Global Positioning System (GPS) is the most popular localization solution allowing positioning, with a high accuracy, when at least four satellites are available [6]. In addition, a high signal quality from those satellites is necessary to perform localization and this is why such a solution cannot easily be deployed in an indoor environment. An alternative solution is to use the communication infrastructure to perform localization. Different methods have been proposed and studied. The radio fingerprinting is a well known localization technique which is organized in two steps [7,8]. In the first step, a radio map is constructed and, in the second step, online measurement is compared to the radio map [9]. The communication signal can also be used to perform the trilateration [10][11][12] where the distances between the target and reference positions can be estimated by using the Received Signal Strength Indicator (RSSI) [13]. The performance of such a localization technique depends on the number of detected reference positions.
In indoor environments, where the propagation conditions are very severe, the RSSI is often too weak to be correctly detected by the target receiver. As a result, only some of the pairwise distances between sensor nodes can be calculated from the measured RSSI values. To overcome this problem, we propose in this paper to estimate the complete distance matrix using a matrix completion algorithm. This approach aims to approximate the distance between the target and all the reference positions through the spatial correlation structure of the fingerprints. After the matrix completion, the localization can be performed using either matrix decomposition or fingerprinting [14][15][16][17]. In this paper, given a set of reference nodes and the RSSI information between sensors, we perform sensor localization using the trilateration technique. To the best of our knowledge, this is the first time that matrix completion algorithms are combined with trilateration. We thus propose to improve the localization accuracy of the trilateration technique by using a complete pairwise squared Euclidean distance matrix instead of only the partial set of pairwise distances calculated from the measured RSSI.

Related Works
As mentioned in the previous section, it is difficult, from the available RSSI measurements, to acquire all the pairwise distances between sensors and to obtain a fully connected squared distance matrix, the so-called Euclidean distance matrix (EDM) [18]. This is due to the limited communication range and to multipath effects. To provide reliable distance information to each sensor node, matrix completion is used to recover the complete EDM from an incomplete matrix.
Several algorithms and approaches have been proposed and studied to approximate the missing distances for RSSI measurements based systems. Authors in [14] formulate the matrix completion problem using the least squared minimization. To solve the minimization problem, they introduce a modified iterative Newton's algorithm [19] to optimize the objective function. This solution can be very fast if the parameters are well chosen. The main drawback is that the result is very sensitive to initialization. It can be considered as a good initial location estimation for other fine localization algorithms. Other existing approaches based on Singular Value Thresholding (SVT) [20], Accelerated Proximal Gradient (APG) [21] and Augmented Lagrange Multiplier (ALM) [22] have been proposed in recent years. SVT method which relies on the Singular Value Decomposition (SVD) of a low rank matrix is associated to poor completion performance and slow convergence rate. Motivated by the above reasons, ALM and APG methods have been proposed. However, these methods depend on the choice of relevant parameters and are sensitive to noise. To overcome this problem, a regularized matrix completion model is proposed in [23] introducing the multivariate function Bregman divergence to solve the EDM problem. The major drawback of such a centralized method is its computational cost. The aforementioned methods are all based on RSSI. As an alternative to RSSI measurements, Nguyen et al. focus on connectivity/distance information setting a routing protocol in place [15,16]. They formulate the matrix completion problem as an unconstrained optimization in Smooth Riemannian manifold. Then, the nonlinear conjugate gradient algorithm is applied on the smooth Riemannian manifold. This approach recovers the EDM in noiseless and noisy environments. However, it focused only on two dimensional position information. 
Authors in [24] cover the unknown measurements under the existence of noise which is classified into three categories. This approach achieves a good localization accuracy compared to other existing ones. However, it is a centralized localization approach. Motivated by the above reasons, based on RSSI measurements, we solve the minimization problem using advanced algorithms to reduce the running time and the computational complexity while ensuring a good localization accuracy.
After completing the EDM, the second step is to estimate the location of sensors. For this, the completed pairwise distance information can be transformed into the estimated coordinates of sensor nodes by applying a factorization process [14]. This factorization requires that the matrix is Semi Definite Positive (SDP). The SDP problem can be efficiently and accurately solved by CVX toolbox on Matlab [25]. However, if the matrix is not SDP, the problem can also be solved by introducing Semidefinite Relaxations (SDR) [26,27]. The SDR based localization scheme in [26] relaxes the non convex localization into a convex one and marks a good localization accuracy. However, the computational complexity of SDR-based techniques is closely related to the problem size. Therefore, this algorithm is adapted for medium size network only due to the highly required running time. To reduce the running time, a weighted semidefinite relaxation localization method is carried out in [27]. It aims to improve the accuracy of localization, but it is still not suitable for dealing with large networks. A multiple sources localization problem is formulated in [28] as a unimodality constrained matrix factorization (UMF) and two rotation techniques have been developed to solve the problem. In [29], an eigendecomposition is first applied to find the local locations of sensor nodes, then a rotation matrix and a translation vector are used to transform them to true locations. Zhang et al. in [30] proposed to estimate the positions of nodes by the classic multidimensional scaling algorithm (MDS) [31] using a truncated eigendecomposition. However, the MDS method requires high recovery rate of the EDM which is not guaranteed when working in noisy environments. The localization can be also ensured by the fingerprinting technique. The principal drawback is that an offline radio map construction is needed. Furthermore, due to environment changes this radio map has to be frequently updated. 
In this paper, we propose to use the trilateration method. When the localization is performed in 2D space, it can be applied as soon as at least three pairwise distances between the sensor to be localized and the reference nodes are known. At each sensor node to be localized, data fusion is conducted by combining measurements from different nodes to estimate its location.

Contribution
The contribution of our work is threefold:
• Improving the localization accuracy of the trilateration technique: We develop a high-accuracy fingerprint-based indoor localization scheme which is based on sparse representation and matrix completion theories. As said before, the trilateration technique is based on pairwise distances between the target node and the anchor nodes. Because propagation conditions are not optimal in an indoor environment, several pairwise distances cannot be measured. Our main contribution is therefore to enhance the localization accuracy of trilateration by estimating all pairwise distances.
• Matrix completion resolution: We formulate the indoor localization scheme as a simple optimization problem which enables efficient and reliable algorithm implementation.
• Solving the proposed optimization problem: We develop closed-form algorithms, which can be reproduced by simple implementation, to solve the indoor localization problem. Specifically, we adopt recent methods such as Nesterov accelerated gradient (Nesterov), Adaptive Moment Estimation (Adam), Adadelta, Root Mean Square Propagation (RMSProp) and Adaptive Gradient (Adagrad).
The remainder of this paper is organized as follows: In Section 2, we present the system model. In Section 3, details of our contribution and different algorithms are given. Obtained simulation results are presented and discussed in Section 4. The method verification using real measurements is detailed in Section 5. Finally, we conclude the paper in Section 6.
Notations: The following notations are used throughout the paper. $(\cdot)^T$ is the transpose operation. $\|\cdot\|_2$, $\|\cdot\|_F$ and $\|\cdot\|_*$ denote the $\ell_2$ norm, the Frobenius norm and the nuclear norm, respectively. $\mathrm{diag}(\mathbf{A})$ returns a column vector of the diagonal elements of the matrix $\mathbf{A}$. $\nabla(f(\cdot))$ is the gradient of the function $f$. $\mathbf{A}^{(t)}$ is the matrix obtained at iteration $t$. $\mathbf{A} \cdot \mathbf{B}$ is the product of matrices. $\mathbf{A} \odot \mathbf{B}$ is the Hadamard product of $\mathbf{A}$ and $\mathbf{B}$. Matrices are denoted by non-italic bold capital letters, and variables by italic lowercase characters.

System Model
In a system where our approach is adopted, each sensor node uses the following two steps to find its position (Figure 1).
• Step 1: Refine and complete the squared distance matrix.
• Step 2: Once the matrix is completed, the coordinates of the node can be retrieved using the classic trilateration process.
RSSI measurements corresponding to different sensor nodes are used to construct an RSSI matrix. From the RSSI values, the Euclidean distance matrix $\mathbf{X}$, containing the distance information between each pair of sensor nodes, is built. Due to the limited radio communication range, the matrix of RSSI measurements is only partially known. Thus, the matrix $\mathbf{X}$ is incomplete (only a small number of its entries are available) and can be affected by noise. Such an incomplete matrix cannot serve efficiently for localization and has to be completed. Let us define $\mathbf{X}_{true}$ as the complete EDM. The problem is therefore how to recover the unknown elements of $\mathbf{X}$ given a small number of its known entries.
[Figure 1 labels: From RSSI to distance using a propagation model; Step 1; Step 2; Not detected node]
Assume that $m$ is the number of nodes with known positions, named 'anchor nodes', and that $(n - m)$ is the number of sensors with unknown positions, where $n$ is the total number of sensor nodes (anchor nodes and unknown nodes) placed in the indoor environment. $U_i$ is the $i$th unknown node, where $i = 1, 2, \ldots, (n - m)$, and $A_j$ is the $j$th anchor node, where $j = 1, 2, \ldots, m$. $\mathbf{X}$ is the $(n \times n)$ Euclidean distance matrix. It can be partitioned as follows:
$$\mathbf{X} = \begin{pmatrix} \mathbf{X}_{11} & \mathbf{X}_{12} \\ \mathbf{X}_{21} & \mathbf{X}_{22} \end{pmatrix}$$
where $\mathbf{X}_{22}$ is the distance sub matrix between each pair of anchor nodes. It is obtained by calculating the exact distance between each pair of anchors: $d_{A_i A_j} = \|C_{A_i} - C_{A_j}\|_2$ is the pairwise distance between anchor node $i$ ($A_i$) and anchor node $j$ ($A_j$), and $C_{A_i} \in \mathbb{R}^3$ are the location coordinates of anchor node $i$. $\mathbf{X}_{11}$ is the $(n - m) \times (n - m)$ distance sub matrix between each pair of unknown nodes. $\mathbf{X}_{12}$ and $\mathbf{X}_{21}$, where $\mathbf{X}_{12} = \mathbf{X}_{21}^T$, are the distance sub matrices between anchors and unknown nodes. $\mathbf{X}_{11}$, $\mathbf{X}_{12}$ and $\mathbf{X}_{21}$ are obtained from RSSI measurements using the log-normal shadowing propagation model, which expresses the pathloss in dB [32]:
$$pl = pl_0 + 10\,\eta \log_{10}\left(\frac{d}{d_0}\right) \tag{2}$$
where $pl_0$ is the pathloss value at a reference distance $d_0$ and at the used frequency $f$, $\eta$ is the pathloss exponent and $d$ is the distance between node $i$ and node $j$. The propagation parameters used are defined later in Section 4. Since pairwise distances are dependent, because of the dependency of RSSI in indoor environments, this matrix should have low rank, which motivates the use of matrix completion algorithms. After completing the Euclidean distance matrix, a trilateration process is adopted by each unknown node in order to estimate its location [10]. A reminder of this method is given below. A combination of estimated distances is required. In this combination process, we use only the estimated distance submatrix $\hat{\mathbf{X}}_{12}$, which contains the distances between the unknown nodes and all anchor nodes.
The completion of the total matrix brings more distance information. Distances can be estimated from each other because the columns are dependent (low-rank matrix).
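Let $(\hat{x}_j, \hat{y}_j)$ be the estimated coordinates of an unknown node and $(x_i, y_i)$, $i = 1, \ldots, m$, the coordinates of the anchor nodes.
The estimated coordinates are calculated by solving, in the least-squares sense, the range equations $(\hat{x}_j - x_i)^2 + (\hat{y}_j - y_i)^2 = \hat{d}_{ij}^2$, $i = 1, \ldots, m$, where $\hat{d}_{ij}$ is the estimated distance between the unknown node $j$ and the anchor $i$ obtained from $\hat{\mathbf{X}}_{12}$.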
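The trilateration step can be illustrated with a minimal sketch. It assumes the standard linearized least-squares formulation, in which the range equation of the last anchor is subtracted from the others; this is not necessarily the exact expression used in the paper. The function name `trilaterate` and the anchor layout are hypothetical.

```python
import numpy as np

def trilaterate(anchors, dists):
    """Least-squares 2D position from anchor coordinates and distances.

    Assumption: standard linearization obtained by subtracting the range
    equation of the last anchor from those of the other anchors.
    """
    anchors = np.asarray(anchors, dtype=float)
    dists = np.asarray(dists, dtype=float)
    ref, d_ref = anchors[-1], dists[-1]
    # 2 (a_i - a_ref) . p = |a_i|^2 - |a_ref|^2 - d_i^2 + d_ref^2
    A = 2.0 * (anchors[:-1] - ref)
    b = (d_ref**2 - dists[:-1]**2
         + np.sum(anchors[:-1]**2, axis=1) - np.sum(ref**2))
    position, *_ = np.linalg.lstsq(A, b, rcond=None)
    return position

# Hypothetical layout: four anchors at the corners of a 20 m x 20 m area.
anchors = [(0.0, 0.0), (20.0, 0.0), (0.0, 20.0), (20.0, 20.0)]
true_pos = np.array([6.0, 11.0])
dists = [np.linalg.norm(true_pos - np.array(a)) for a in anchors]
estimate = trilaterate(anchors, dists)
```

With exact distances the estimate coincides with the true position; with distances recovered by matrix completion, the least-squares solution averages the errors over all anchors.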

Proposed Matrix Completion Based Localization
In this section, we formulate our proposed approach as a convex optimization problem, which is solved using closed-form update rules based on gradient descent and its variants.

Problem Formulation
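Our goal is to reconstruct the complete distance matrix from incomplete and noisy data. The problem of recovering a low-rank matrix from a small number of known entries is known as:
$$\min_{\hat{\mathbf{X}}} \ \mathrm{rank}(\hat{\mathbf{X}}) \quad \text{s.t.} \quad \hat{X}_{ij} = X_{ij}, \ (i, j) \in \omega \tag{1}$$
where $\omega$ is the set of known entries. Due to the non-convexity and non-linearity of the rank function [33], the problem in Equation (1) cannot be solved numerically. Inspired by the theory of Compressed Sensing (CS), Candes and Recht proposed to replace the rank function in Equation (1) by the nuclear norm [34]. The model in (1) is reformulated into:
$$\min_{\hat{\mathbf{X}}} \ \|\hat{\mathbf{X}}\|_* \quad \text{s.t.} \quad \hat{X}_{ij} = X_{ij}, \ (i, j) \in \omega$$
where $\|\hat{\mathbf{X}}\|_*$ is the sum of the singular values of $\hat{\mathbf{X}}$ (i.e., $\|\hat{\mathbf{X}}\|_* = \sum_{j=1}^{n} s_j$, where $\hat{\mathbf{X}} = \mathbf{U}\mathbf{S}\mathbf{V}^T$). Considering the low-rank assumption ($r \ll n$) and taking into account that observations are usually affected by noise, the model of matrix completion can be defined as:
$$\min_{\hat{\mathbf{X}}} \ \|\hat{\mathbf{X}}_{\omega} - \mathbf{X}_{\omega}\|_F^2 + \lambda \|\hat{\mathbf{X}}\|_*$$
where $\lambda$ is a tunable parameter, $\|\cdot\|_F$ is the Frobenius norm and the subscript $\omega$ denotes the restriction to the known entries.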
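$\mathbf{H}$ is the matrix whose entries are:
$$H_{ij} = \begin{cases} 1 & \text{if } (i, j) \in \omega \\ 0 & \text{otherwise} \end{cases}$$
We denote the objective function as:
$$J(\hat{\mathbf{X}}) = f(\hat{\mathbf{X}}) + \lambda \, l(\hat{\mathbf{X}})$$
where $f(\hat{\mathbf{X}}) = \|\mathbf{H} \odot (\hat{\mathbf{X}} - \mathbf{X})\|_F^2$ and $l(\hat{\mathbf{X}}) = \|(\mathbf{1} - \mathbf{H}) \odot \hat{\mathbf{X}}\|_*$. The defined optimization problem can be solved efficiently by using the iterative gradient descent method and its variants. The developed algorithm is summarized in Table 1, where $\mathbf{V}^{(t)}$ is the matrix used to update the distance matrix and the index $t$ refers to the update iteration. Several approaches, detailed in the next section, have been adopted to find the matrix update $\mathbf{V}^{(t)}$.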
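As an illustration of the objective function, the following sketch evaluates $J = f + \lambda l$ with NumPy, using the mask $\mathbf{H}$ of known entries. The function name `objective` and the small 3 × 3 example are hypothetical and only meant to show how the two terms are computed.

```python
import numpy as np

def objective(X_hat, X_obs, H, lam):
    """J = ||H * (X_hat - X_obs)||_F^2 + lam * ||(1 - H) * X_hat||_*."""
    f = np.linalg.norm(H * (X_hat - X_obs), "fro") ** 2   # fit on known entries
    l = np.linalg.norm((1.0 - H) * X_hat, "nuc")           # nuclear norm on unknown entries
    return f + lam * l

# Hypothetical 3-node example: entries (0, 2) and (2, 0) are not observed.
X_obs = np.array([[0.0, 4.0, 0.0],
                  [4.0, 0.0, 9.0],
                  [0.0, 9.0, 0.0]])
H = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0]])
X_hat = X_obs.copy()
X_hat[0, 2] = X_hat[2, 0] = 25.0      # candidate values for the unknown entries
J = objective(X_hat, X_obs, H, lam=0.1)
```

Here the known entries are matched exactly, so only the nuclear-norm term contributes to `J`.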

Matrix Completion: Optimization over GD and Its Variants
In the following, we review the different optimization methods that are widely used by the deep learning community and that we apply to update the matrix $\hat{\mathbf{X}}$: Gradient Descent (GD), Nesterov Accelerated Gradient (NAG), Adaptive Gradient (Adagrad), Root Mean Square Propagation (RMSProp), Adadelta and Adaptive Moment Estimation (Adam). We discard the class of algorithms that are computationally very expensive for high-dimensional data sets, e.g., the second-order Newton's method [35].

Gradient Descent (GD)
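Gradient descent is an iterative method that aims to find a local minimum of a differentiable cost function [36]. It is the most common first-order optimization algorithm in machine learning and deep learning. GD updates each element of the matrix $\hat{\mathbf{X}}^{(t)}$ in the direction that decreases the objective function $J(\hat{\mathbf{X}}^{(t)})$. The update $\mathbf{V}^{(t)}$ is given by
$$\mathbf{V}^{(t)} = \alpha \, \nabla(J(\hat{\mathbf{X}}^{(t)}))$$
where $\alpha$ is the learning rate, taken in the range (0, 1), and $\nabla(J(\hat{\mathbf{X}}^{(t)}))$ is the gradient of the cost function with respect to the parameter matrix. It can be computed as follows:
$$\nabla(J(\hat{\mathbf{X}}^{(t)})) = \nabla(f(\hat{\mathbf{X}}^{(t)})) + \lambda \times \nabla(l(\hat{\mathbf{X}}^{(t)})) \tag{11}$$
where $\nabla(l(\hat{\mathbf{X}}^{(t)}))$ is calculated as follows:
$$\nabla(l(\hat{\mathbf{X}}^{(t)})) = (\mathbf{1} - \mathbf{H}) \odot \left(\hat{\mathbf{X}}^{(t)} \cdot \left((\hat{\mathbf{X}}^{(t)})^T \cdot \hat{\mathbf{X}}^{(t)}\right)^{-0.5}\right)$$
Some regularization may be needed because the inverse of the square root of $(\hat{\mathbf{X}}^{(t)})^T \cdot \hat{\mathbf{X}}^{(t)}$ may not exist, e.g.,
$$\nabla(l(\hat{\mathbf{X}}^{(t)})) = (\mathbf{1} - \mathbf{H}) \odot \left(\hat{\mathbf{X}}^{(t)} \cdot \left((\hat{\mathbf{X}}^{(t)})^T \cdot \hat{\mathbf{X}}^{(t)} + \epsilon \times \mathbf{I}\right)^{-0.5}\right)$$
where $\epsilon$ is a regularization parameter and $\mathbf{I}$ is the $(n \times n)$ identity matrix. The resulting update $\mathbf{V}^{(t)}$ splits into a term $\mathbf{U}^{(t)}$ supported on the known entries (through $\mathbf{H}$) and a term $\mathbf{W}^{(t)}$ supported on the unknown entries (through $\mathbf{1} - \mathbf{H}$). With classic gradient descent the known entries are very well estimated, which is why it is used in the rest of the paper to estimate $\mathbf{U}^{(t)}$. To estimate $\mathbf{W}^{(t)}$, we propose to use the following algorithms.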
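The gradient computation above can be sketched as follows. This is an illustrative implementation under stated assumptions: the inverse square root of the regularized Gram matrix is computed through an eigendecomposition, the gradient of the data-fit term is taken as $2\,\mathbf{H} \odot (\hat{\mathbf{X}} - \mathbf{X})$, and the names `grad_nuclear` and `gd_step` as well as the parameter values are hypothetical.

```python
import numpy as np

def grad_nuclear(X_hat, H, eps):
    """(1 - H) * (X_hat (X_hat^T X_hat + eps I)^(-1/2)), regularized gradient of l."""
    n = X_hat.shape[1]
    S = X_hat.T @ X_hat + eps * np.eye(n)     # symmetric positive definite
    w, Q = np.linalg.eigh(S)
    inv_sqrt = (Q / np.sqrt(w)) @ Q.T         # Q diag(w^-1/2) Q^T
    return (1.0 - H) * (X_hat @ inv_sqrt)

def gd_step(X_hat, X_obs, H, alpha, lam, eps):
    """One gradient descent update X_hat <- X_hat - alpha * grad J."""
    grad_f = 2.0 * H * (X_hat - X_obs)        # gradient of the Frobenius data-fit term
    grad_J = grad_f + lam * grad_nuclear(X_hat, H, eps)
    return X_hat - alpha * grad_J

# Hypothetical 3-node example with two unobserved entries.
X_obs = np.array([[0.0, 4.0, 0.0], [4.0, 0.0, 9.0], [0.0, 9.0, 0.0]])
H = np.array([[1.0, 1.0, 0.0], [1.0, 1.0, 1.0], [0.0, 1.0, 1.0]])
X_hat = np.zeros_like(X_obs)
for _ in range(200):
    X_hat = gd_step(X_hat, X_obs, H, alpha=0.1, lam=0.01, eps=1e-6)
```

In this sketch the known entries only receive the data-fit gradient, so they converge geometrically to the observed values, which is consistent with the remark that classic gradient descent estimates the known entries very well.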

Nesterov Accelerated Gradient (NAG)
A second commonly used variant is Nesterov's accelerated gradient, published in [37], which is in the same vein as Momentum [38]. It follows the same intuition of using the gradient history, but it calculates the gradient with respect to the approximate future values of the parameters instead of their current values. To update $\mathbf{W}^{(t)}$ with NAG, the gradient of $l$ is evaluated at the look-ahead point $\hat{\mathbf{X}}^{(t)} - \mu \times \mathbf{W}^{(t-1)}$, where $\mu$ is the momentum parameter:
$$\nabla(l(\hat{\mathbf{X}}^{(t)} - \mu \times \mathbf{W}^{(t-1)})) = \nabla\left(l\left(\hat{\mathbf{X}}^{(t)} - \mu \times (\mathbf{1} - \mathbf{H}) \odot \left(\hat{\mathbf{X}}^{(t-1)} \cdot \left((\hat{\mathbf{X}}^{(t-1)})^T \cdot \hat{\mathbf{X}}^{(t-1)} + \epsilon \times \mathbf{I}\right)^{-0.5}\right)\right)\right) \tag{19}$$
The main advantage of NAG over GD is that the anticipatory update prevents us from going too fast and results in increased responsiveness to the landscape of the loss function [36].
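The look-ahead idea can be sketched generically. The code below assumes the standard NAG formulation from the optimization literature (momentum term plus gradient evaluated at the look-ahead point); it is not a transcription of the paper's equations. The function `nag_minimize` and the quadratic test objective are hypothetical.

```python
import numpy as np

def nag_minimize(grad, X0, alpha, mu, n_iter):
    """Nesterov accelerated gradient on a matrix variable.

    grad: callable returning the gradient at a given matrix (assumption:
    in the localization setting this would be the gradient of J).
    """
    X = np.array(X0, dtype=float)
    W = np.zeros_like(X)                       # accumulated update (momentum)
    for _ in range(n_iter):
        lookahead = X - mu * W                 # approximate future position
        W = mu * W + alpha * grad(lookahead)   # gradient taken at the look-ahead point
        X = X - W
    return X

# Toy objective ||X - target||_F^2, whose gradient is 2 (X - target).
target = np.array([[1.0, -2.0], [0.5, 3.0]])
X_min = nag_minimize(lambda X: 2.0 * (X - target), np.zeros((2, 2)),
                     alpha=0.1, mu=0.9, n_iter=200)
```

The only difference from plain momentum is the point at which the gradient is evaluated, which is what gives NAG its anticipatory behaviour.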

Adaptive Gradient (Adagrad)
Duchi et al. [35] introduced the Adagrad algorithm in the context of projected gradient methods. Adagrad adapts the learning rate to the updated parameters, performing smaller updates (i.e., a low learning rate) when the memory of squared gradients is high and larger updates otherwise. The Adagrad update rule is as follows:
• We set $\mathbf{E}^{(t)}$ to be the gradient of the objective function with respect to the parameter $\hat{\mathbf{X}}^{(t)}$.
• We compute the memory of squared gradients over time, i.e., the accumulated sum of $(\mathbf{E}^{(t)})^2$.
• We modify the general learning rate $\alpha$ at each time step $t$ for every parameter of $\hat{\mathbf{X}}^{(t)}$ based on the sum of the squares of the gradients that have been computed for $\hat{\mathbf{X}}^{(t)}$ up to time step $t$, where $\epsilon$ is a regularizing term used to avoid division by zero.
It is worth mentioning that the learning rate does not need to be adjusted manually.
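The Adagrad steps listed above can be sketched as a single update function. The sketch assumes the standard element-wise Adagrad rule, with the regularizing term placed inside the square root; the function name `adagrad_step` and the example gradient are hypothetical.

```python
import numpy as np

def adagrad_step(X, E, G, alpha, eps=1e-8):
    """One Adagrad update.

    X: current matrix, E: gradient at X, G: memory of squared gradients.
    All operations are element-wise.
    """
    G = G + E * E                            # accumulate all past squared gradients
    X = X - alpha / np.sqrt(G + eps) * E     # per-entry scaled learning rate
    return X, G

E = np.array([[3.0, -4.0], [0.5, 10.0]])     # hypothetical constant gradient
X0, G0 = np.zeros((2, 2)), np.zeros((2, 2))
X1, G1 = adagrad_step(X0, E, G0, alpha=0.1)
X2, G2 = adagrad_step(X1, E, G1, alpha=0.1)
```

With a constant gradient, the second step is smaller than the first by a factor of $\sqrt{2}$, which illustrates how the accumulated memory keeps shrinking the effective learning rate.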

Root Mean Square Propagation (RMSProp)
Tieleman et al. [39] introduced this algorithm in 2012. Instead of using the memory of all squared gradients, RMSProp uses only recent past gradients computed over a restricted time window. It is described in the following two steps.
• We compute the local average of the previous $(\mathbf{E}^{(t)})^2$.
• Then, we apply the update.
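The two RMSProp steps can be sketched as follows, assuming the usual exponentially decaying average with decay factor $\rho$. The function name `rmsprop_step` and the numerical values are hypothetical.

```python
import numpy as np

def rmsprop_step(X, E, G_bar, alpha, rho=0.9, eps=1e-8):
    """One RMSProp update with a decaying average of squared gradients."""
    G_bar = rho * G_bar + (1.0 - rho) * E * E    # local average of E^2
    X = X - alpha / np.sqrt(G_bar + eps) * E     # scaled update
    return X, G_bar

E = np.array([[3.0, -4.0], [0.5, 10.0]])         # hypothetical constant gradient
X, G_bar = np.zeros((2, 2)), np.zeros((2, 2))
steps = []
for _ in range(300):
    X_new, G_bar = rmsprop_step(X, E, G_bar, alpha=0.01)
    steps.append(X_new - X)
    X = X_new
```

Unlike Adagrad, the step size does not vanish under a constant gradient: it settles close to $\alpha$ per entry once the local average has converged.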

Adadelta
Adadelta was introduced in 2012 by Zeiler [39]. It aims at circumventing the weakness of Adagrad, namely its aggressively decreasing learning rate caused by accumulating all past squared gradients in the denominator. Adadelta scales the learning rate using only recent past gradients computed over a restricted time window (i.e., not the whole history). In addition, Adadelta uses an acceleration term that takes past updates into account (as in Momentum). The Adadelta update rule is as follows:
• We compute the gradient $\mathbf{E}^{(t)}$ as in Equation (23).
• We compute the local average $\bar{\mathbf{G}}^{(t)}$ of the previous $(\mathbf{E}^{(t)})^2$.
• We compute a new term accumulating prior updates (Momentum: acceleration term).
• Then, we apply the update.
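The Adadelta steps can be sketched with the standard rule from Zeiler's formulation, in which the ratio of two running averages replaces the learning rate. The names `adadelta_step` and `D_bar` (running average of squared past updates) are hypothetical, as are the default values of $\rho$ and $\epsilon$.

```python
import numpy as np

def adadelta_step(X, E, G_bar, D_bar, rho=0.95, eps=1e-6):
    """One Adadelta update; note that no learning rate alpha is needed."""
    G_bar = rho * G_bar + (1.0 - rho) * E * E              # local average of E^2
    W = np.sqrt(D_bar + eps) / np.sqrt(G_bar + eps) * E    # update scaled by past updates
    D_bar = rho * D_bar + (1.0 - rho) * W * W              # accumulate prior updates
    return X - W, G_bar, D_bar

E = np.array([[3.0, -4.0], [0.5, 10.0]])                   # hypothetical gradient
zeros = np.zeros((2, 2))
X1, G1, D1 = adadelta_step(zeros, E, zeros, zeros)
```

The first update has nearly the same magnitude for every entry, whatever the gradient scale, because it is driven by $\epsilon$ until the accumulated updates grow.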

Adaptive Moment Estimation (Adam)
Another optimization method that computes an adaptive learning rate for each parameter was introduced by Kingma and Ba in [40]. Adam uses the first and second moments of the gradients and has strong similarities with Adadelta. Indeed, it uses the second gradient moment in the denominator and a momentum term. The Adam update rule consists of the following steps.
• Compute the second gradient moment with local accumulation (as in Adadelta/RMSProp).
• Compute the first gradient moment.
• Compute the bias-corrected first and second moment estimates.
• Update the parameters.
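The four Adam steps can be sketched as below, assuming the standard rule of Kingma and Ba with decay rates $\beta_1$ (first moment) and $\beta_2$ (second moment). The function name `adam_step`, the variable names `M` and `S`, and the default values are hypothetical.

```python
import numpy as np

def adam_step(X, E, M, S, t, alpha, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update at iteration t (t starts at 1)."""
    S = beta2 * S + (1.0 - beta2) * E * E      # second gradient moment
    M = beta1 * M + (1.0 - beta1) * E          # first gradient moment
    M_hat = M / (1.0 - beta1 ** t)             # bias-corrected first moment
    S_hat = S / (1.0 - beta2 ** t)             # bias-corrected second moment
    X = X - alpha * M_hat / (np.sqrt(S_hat) + eps)
    return X, M, S

E = np.array([[3.0, -4.0], [0.5, 10.0]])       # hypothetical gradient
zeros = np.zeros((2, 2))
X1, M1, S1 = adam_step(zeros, E, zeros, zeros, t=1, alpha=0.01)
```

Thanks to the bias correction, the very first step already has magnitude close to $\alpha$ in every entry, instead of being damped by the zero initialization of the moments.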

Simulation Results and Discussion
In this section, the different parameters are tuned by simulation. The benefit of matrix completion for localization accuracy is justified empirically. We also study the recovery performance, the localization accuracy and the computational complexity of each of the cited algorithms.

Determining the Best Ratio between the Number of Unknown Nodes and the Number of Anchors
It is reasonable to state that the completion of a high-dimensional matrix improves the localization accuracy. This is due to the diversity of the information introduced by each sensor node. However, beyond a certain dimension, the algorithm converges and the use of additional data only increases the complexity and the execution time. To ensure the best localization accuracy while limiting the execution time of the algorithm, we fix this ratio. Let
$$\vartheta = \frac{Nb_{Un}}{Nb_{An}}$$
where $Nb_{Un}$ is the number of unknown nodes and $Nb_{An}$ is the number of anchors. To determine the best ratio $\vartheta$ in terms of localization accuracy, simulations have been carried out. Using 35 unknown nodes, the number of anchors varies from 5 to 20. The localization error obtained with Adam with respect to the number of anchors is shown in Figure 2. When the number of anchors reaches 10, the localization error becomes almost stable. Figure 3 illustrates the localization error in meters as a function of the number of unknown nodes when using 10 anchor nodes. It can be observed that the localization error remains almost the same beyond 35 unknown nodes. The trade-off between accuracy and complexity leads us to use a ratio $\vartheta$ equal to 3.5. Simulations have been carried out for each optimization algorithm and for each value of sigma shadowing, and the ratio guaranteeing the best performance is always 3.5. Thus, to simplify the presentation and without loss of generality, we present only the simulation results corresponding to Adam with sigma shadowing equal to 2.

Simulation Setup
We consider a wireless sensor network of 45 sensor nodes, 10 of them being anchors and 35 being unknown nodes, placed in an area of 400 m² (i.e., 20 m × 20 m); its architecture is illustrated in Figure 4. The sensor nodes (anchors and unknown nodes) are randomly placed in the studied area. The accuracy of the studied algorithms is investigated over many environment realizations. To simplify the presentation and without loss of generality, we present results for one test environment only, as illustrated in Figure 4. Simulation results are consistent with other environments. Ten RSSI measurements $pr_{ij}$, received from sensor $i$ ($i = 1, 2, 3, \ldots, n$), are taken at each position $j$ ($j = 1, 2, 3, \ldots, n$) for each sigma shadowing value. This value is calculated in dB as:
$$pr_{ij} = pe - pl_{ij} + a_{\sigma}$$
where $pe$ is the transmission power, $a_{\sigma}$ is a Gaussian random variable which describes the random shadowing effects and $pl_{ij}$ is the pathloss calculated using Equation (2). In this paper, we use parameters related to our laboratory: $\eta = 3.23$, $pe = 20$ dBm, $d_0 = 1$ m, $f = 2.4$ GHz. The sigma of the random variable $a$ takes the values 0, 2 and 5 in order to study the effect of its variation on the recovery performance and localization accuracy of the algorithms.
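The conversion between distance and RSSI in this setup can be sketched as follows. The sketch assumes the log-distance pathloss model with exponent 3.23, a transmission power of 20 dBm and a reference distance of 1 m, as stated above. The reference pathloss is computed here with the free-space formula at 2.4 GHz, which is an assumption: the paper may rely on a measured value instead. All function names are hypothetical.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in m/s

def reference_pathloss_db(f_hz, d0=1.0):
    """Free-space pathloss at distance d0 (assumption, see lead-in)."""
    return 20.0 * math.log10(4.0 * math.pi * d0 * f_hz / C_LIGHT)

def pathloss_db(d, pl0, eta, d0=1.0):
    """Log-distance pathloss in dB at distance d."""
    return pl0 + 10.0 * eta * math.log10(d / d0)

def received_power_dbm(d, pe, pl0, eta, shadowing_db=0.0, d0=1.0):
    """Received power: transmit power minus pathloss plus a shadowing term."""
    return pe - pathloss_db(d, pl0, eta, d0) + shadowing_db

def distance_from_rssi(pr, pe, pl0, eta, d0=1.0):
    """Invert the pathloss model to recover a distance from a received power."""
    return d0 * 10.0 ** ((pe - pr - pl0) / (10.0 * eta))

PL0 = reference_pathloss_db(2.4e9)
pr = received_power_dbm(7.5, pe=20.0, pl0=PL0, eta=3.23)
d_est = distance_from_rssi(pr, pe=20.0, pl0=PL0, eta=3.23)
```

Without shadowing the inversion recovers the true distance; a positive shadowing sample makes the node appear closer than it is, which is one source of noise in the observed entries of the distance matrix.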

Verification of the Low Rank Property
To apply the matrix completion technique and recover the EDM from the observed entries, the matrix $\hat{\mathbf{X}}$ should have a low rank $r$, which ensures that the generated distances are strongly correlated. Unknown entries can then be approximated from the known ones because they are dependent. To check whether the data matrix $\hat{\mathbf{X}}$ has a good low-rank approximation, we apply the singular value decomposition [41]. An $n \times n$ matrix $\hat{\mathbf{X}}$ can be decomposed as:
$$\hat{\mathbf{X}} = \mathbf{O} \cdot \mathbf{F} \cdot \mathbf{Q}^T$$
where $\mathbf{O}$ is an $n \times n$ unitary matrix, $\mathbf{Q}$ is an $n \times n$ unitary matrix and $\mathbf{F}$ is an $n \times n$ diagonal matrix. The diagonal elements of $\mathbf{F}$ are the singular values of $\hat{\mathbf{X}}$, organized in decreasing order (i.e., $\mathbf{F} = \mathrm{diag}(\gamma_1, \gamma_2, \ldots, \gamma_r, 0, \ldots, 0)$). $r$ is the rank of the matrix $\hat{\mathbf{X}}$; it is equal to the number of its non-zero singular values. If $\hat{\mathbf{X}}$ is a low-rank matrix, its top $l$ singular values hold all or nearly all of the energy (i.e., $\sum_{i=1}^{l} \gamma_i \approx \sum_{i=1}^{r} \gamma_i$) [42]. The metric used to verify the low-rank property is the fraction of the nuclear norm captured by the top $l$ singular values. This fraction is defined as:
$$\frac{\sum_{i=1}^{l} \gamma_i}{\sum_{i=1}^{n} \gamma_i}$$
Figure 5 illustrates the fraction of the nuclear norm captured by the top $l$ singular values. We find that the top 3 singular values capture 90% of the nuclear norm. This indicates that the matrix $\hat{\mathbf{X}}$ has a good low-rank approximation, so matrix completion can be applied.
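The low-rank check can be reproduced on synthetic data with a short sketch. It builds a noiseless matrix of squared distances for 45 random nodes in a 20 m × 20 m area (a hypothetical realization, not the paper's data) and computes the fraction of the nuclear norm captured by the top $l$ singular values. The function name `nuclear_norm_fraction` is hypothetical.

```python
import numpy as np

def nuclear_norm_fraction(X, l):
    """Fraction of the nuclear norm captured by the top-l singular values."""
    s = np.linalg.svd(X, compute_uv=False)   # singular values in decreasing order
    return s[:l].sum() / s.sum()

# Hypothetical realization: 45 nodes placed uniformly at random.
rng = np.random.default_rng(0)
P = rng.uniform(0.0, 20.0, size=(45, 2))
sq_norms = np.sum(P ** 2, axis=1)
EDM = sq_norms[:, None] + sq_norms[None, :] - 2.0 * P @ P.T   # squared distances
fractions = [nuclear_norm_fraction(EDM, l) for l in range(1, 6)]
```

For noiseless squared distances between points in the plane, the matrix has rank at most 4, so the fraction reaches 1 (up to rounding) at $l = 4$; noise and the RSSI-to-distance conversion spread some energy over the remaining singular values.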

Recovery Performance and Localization Accuracy
For evaluating the studied solutions based on GD and its variants combined with the trilateration process, we define the following two metrics.
• EDM reconstruction error, measured using the mean square error (MSE).
• Localization error.
As mentioned before, $\hat{\mathbf{C}}$ is the matrix of estimated coordinates of the unknown nodes and $\mathbf{C}$ contains their real coordinates. We recall that the parameters $\alpha$, $\lambda$, $\epsilon$, $\mu$, $\rho$, $\beta_1$ and $\beta_2$ are adjusted by simulations and selected to ensure the best result in terms of localization mean error on a validation dataset. To find the best set of these parameters, an empirical process is conducted, so several simulations are required to identify the best value of each parameter. The parameters used in the rest of the paper are presented in Table 2. The rest of the algorithms includes GD, NAG, Adagrad, RMSProp and Adadelta.
To verify that trilateration guarantees better localization accuracy when more distance information is provided, we first apply the trilateration with the observed distances only. It yields the worst localization accuracy among the tested combinations in both noisy and noiseless environments (Figure 6b,d,f). The localization accuracy is much better when a complete EDM is used than when only the observed distances are used. Moreover, to apply the trilateration process, at least three detected anchors are needed. If this is not the case, the sensor node cannot be localized. This problem is solved by using a complete EDM containing all pairwise distances. Therefore, the combination of the matrix completion technique and trilateration is highly recommended.
In the first set of simulations, we investigate the performance, in terms of EDM reconstruction and localization error, of the different cited algorithms in a noiseless environment, as shown in Figure 6a,b. It shows the effectiveness of the location estimation of the proposed schemes over 10 simulations. Figure 6a illustrates the EDM reconstruction error (MSE). The estimation error of the known pairwise distances is of the order of $10^{-30}$. The reconstruction error varies between $10^{-3}$ and $10^{-1}$ for the different cited algorithms.
The best EDM reconstruction rate is obtained by Adadelta, which produces a localization mean error of 0.47 m. This result is close to those obtained by GD, NAG and Adam. The difference lies in the number of iterations required for the algorithm to converge. Adam converges at 790 iterations, which is about 1/18 of the iterations required by Adadelta. GD converges at 5910 iterations, which is about 0.41 of the iterations required by Adadelta, and NAG converges at 3100 iterations.
The performances of GD and NAG in terms of localization accuracy are very close, so that their cumulative distribution functions (CDF) overlap for each sigma shadowing value. However, NAG accelerates the convergence of GD: instead of converging at 5910 iterations, it converges at 3100 iterations in a noiseless indoor environment.
Adagrad exhibits the worst performance in terms of EDM reconstruction error and localization error. This is due to the fact that it accumulates the squared gradients in the denominator. The sum of positive terms keeps growing and the learning rate becomes very small, so the algorithm is no longer able to make updates that would reach a lower minimum. RMSProp, Adadelta and Adam aim to mitigate this decay of the learning rate.
Instead of accumulating all past squared gradients, RMSProp and Adadelta use a window of accumulated past gradients whose size is controlled by $\rho$. RMSProp slightly improves the EDM reconstruction error (Figure 6a) and the localization error compared to Adagrad in a noiseless environment. However, the result is still worse than those obtained by GD and NAG. The performances of RMSProp and Adagrad are quite close in a noisy environment. They converge to nearby minima (Figure 6c,e) and their CDFs almost overlap (Figure 6d,f).
Adadelta performs slightly better than GD and NAG for sigma shadowing equal to 0 and 2. However, it requires 2.4 times the number of iterations required by GD to converge and 4.5 times the number of iterations required by NAG (in a noiseless environment), and each of its iterations is more complex than those of GD and NAG, which increases the execution time of the algorithm. We notice that Adadelta is more affected by noise than the other algorithms: its performance degrades more quickly. In a noiseless indoor environment, Adadelta achieves the best localization accuracy. For sigma shadowing equal to 2, Adam performs better. Furthermore, for sigma shadowing equal to 5, Adam, GD and NAG are better than Adadelta in terms of EDM reconstruction error, localization accuracy and speed of convergence. The advantage of Adadelta is that no default learning rate needs to be set, since $\alpha$ has been eliminated from the update rule. However, this can have a negative effect, as the learning rate cannot be controlled. Adam is used to resolve this flaw.
Adam works well compared to the other algorithms when considering the compromise between localization accuracy and execution time. It requires the smallest number of iterations to converge. This is due to the fact that it uses an acceleration term that takes past updates into account. Furthermore, it achieves the best localization mean error in a noisy environment. For sigma shadowing equal to 2, its localization mean error is 1.2 m, and it reaches 2.6 m when sigma shadowing is equal to 5. As mentioned before, these results have been obtained over 10 simulations, and the variance is about 0.1 m for each value of sigma shadowing. Therefore, Adam is the best suited algorithm for indoor localization schemes.

Analytical Expressions
This section aims at approximating the theoretical complexity of the different studied algorithms. The complexity is assessed by counting the number of multiplications per iteration and neglecting additions and subtractions. The element-wise square root and power operations are also neglected. For matrix multiplication, the classic computational formula is used rather than the Strassen formula [43]. To calculate the square power of a matrix, the binary method is considered [44]. The complexity of the negative square root of a matrix is obtained by calculating the inverse of the square root of this matrix. We calculate the complexity of the inverse of a matrix based on the Gauss method [43]. The complexity of the square root of a matrix is determined through the Denman-Beavers algorithm [45]. We define $c_{GD}$, $c_{NAG}$, $c_{Adagrad}$, $c_{RMSProp}$, $c_{Adadelta}$ and $c_{Adam}$ as the computational complexity per iteration of GD, NAG, Adagrad, RMSProp, Adadelta and Adam, respectively. Furthermore, as defined before, $n$ is the number of sensor nodes (anchors and unknown nodes).

Analysis
Since the value of n is known, the closed-form expressions given in Section 4.5.1 make it possible to numerically calculate the complexity of the different algorithms, as reported in Table 3. The complexity of each iteration and the complexity of the whole algorithm are normalized with respect to the complexity of GD, in order to highlight how the other algorithms compare to GD. The localization mean error in Table 3 corresponds to the best mean error registered by each algorithm at different values of sigma shadowing.
The best localization mean errors reached by GD and NAG are close. However, GD wins in terms of execution time. NAG needs fewer iterations than GD to converge, but each NAG iteration is more expensive: its normalized per-iteration complexity is 1.9839, against 1 for GD. Adagrad yields the worst localization mean error and is also associated with the highest computational complexity. Its number of operations per iteration is the lowest among the studied algorithms (except GD), but the number of iterations it needs to converge is so high that the overall complexity of the algorithm increases significantly. The overall complexity of RMSProp is around 0.4 times that of Adagrad. However, it remains high, so RMSProp is not suited to large networks or to real-time localization systems.
The required number of operations per iteration for Adadelta and Adam is around 1.5 × c_GD. However, Adadelta needs significantly more iterations than Adam to converge. As a result, both the execution time and the computational complexity are higher for Adadelta than for Adam. Adam has the lowest overall complexity of all the studied algorithms, about 0.2 times that of GD. It does not require a high computational time, so it can be adapted to real-time localization systems. Thus, considering the trade-off between localization accuracy and computational complexity, Adam outperforms the other optimization algorithms.
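The relation between per-iteration cost and overall cost can be sketched as follows, assuming that the overall complexity is the per-iteration complexity multiplied by the number of iterations to converge, with both normalized to GD. The helper names are hypothetical, and the inputs are the rounded figures quoted in the text (1.5 and 0.2 for Adam), so the result is only indicative; Table 3 remains the reference.

```python
def normalized_total(per_iter_ratio, iter_ratio):
    """Overall complexity relative to GD (assumed: cost per iteration x iterations)."""
    return per_iter_ratio * iter_ratio

def implied_iteration_ratio(total_ratio, per_iter_ratio):
    """Iteration count relative to GD implied by the two normalized figures."""
    return total_ratio / per_iter_ratio

# Rounded values from the text: Adam costs about 1.5 x c_GD per iteration
# and about 0.2 times the overall complexity of GD.
adam_iter_ratio = implied_iteration_ratio(0.2, 1.5)
```

Under this assumption, Adam would need roughly 13% of the iterations required by GD, which is consistent with the observation that a higher per-iteration cost can be outweighed by faster convergence.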

Method Verification Using Real Measurements
For method verification, we consider a classroom of our university, an indoor area of 88 m² illustrated in Figure 7. The technology used is LoRa, operating in the 868 MHz frequency band. We placed eight sensor nodes, each of which can transmit and receive messages, and collected RSSI data during one afternoon. We then constructed the RSSI matrix, in which some values are missing; these correspond to sensor nodes that were not detected. This matrix is used to obtain the partially known EDM. The technology and the number of sensor nodes used for the experimental verification were dictated by the equipment available in our laboratory. Thus, we do not consider the optimal conditions determined before; the aim is to validate the effectiveness of our method. When applying our localization algorithm (with Adam used to estimate the complete EDM) to localize each sensor node, we obtain a mean localization error of 1.5 m. This error reaches 3.8 m when using classic trilateration. In Figure 8, we present the CDF corresponding to classic trilateration (without MC, i.e., without matrix completion) and the CDF corresponding to our algorithm (with MC, i.e., with matrix completion).
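The trilateration step applied to a row of the completed EDM can be illustrated with a linearized least-squares sketch. This is an assumption about one common way to solve trilateration, not necessarily the solver used in the experiments; the function names, the anchor coordinates (corners of a hypothetical 8 m × 11 m room, chosen only because its area is 88 m²) and the target position are illustrative.

```python
import numpy as np

def trilaterate(anchors, sq_dists):
    """Estimate a 2-D position from anchor positions and squared distances.

    Subtracting the equation of the last anchor from the others removes the
    quadratic term and leaves a linear system solved by least squares.
    """
    anchors = np.asarray(anchors, dtype=float)
    sq_dists = np.asarray(sq_dists, dtype=float)
    ref, d_ref = anchors[-1], sq_dists[-1]
    A = 2.0 * (anchors[:-1] - ref)
    b = (d_ref - sq_dists[:-1]) + np.sum(anchors[:-1]**2, axis=1) - np.sum(ref**2)
    pos, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pos

def mean_localization_error(estimates, truths):
    """Mean Euclidean distance between estimated and true positions."""
    diff = np.asarray(estimates, dtype=float) - np.asarray(truths, dtype=float)
    return float(np.mean(np.linalg.norm(diff, axis=1)))

# Hypothetical anchors at the corners of an 8 m x 11 m room.
anchors = np.array([[0.0, 0.0], [8.0, 0.0], [0.0, 11.0], [8.0, 11.0]])
true_pos = np.array([3.0, 4.0])
sq_dists = np.sum((anchors - true_pos)**2, axis=1)   # noiseless squared distances
est_pos = trilaterate(anchors, sq_dists)
err = mean_localization_error([est_pos], [true_pos])
```

With noiseless distances the estimate coincides with the true position; with distances derived from RSSI, the least-squares solution averages out part of the error, and more available distances (as provided by matrix completion) give more equations to average over.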

Conclusions
In this paper, we aim to improve the localization accuracy of the trilateration technique based on pairwise distances between sensor nodes. To this end, we address the problem of missing data in sensor network localization. Technically, we derive distance information from RSSI measurements using a propagation model, which yields a squared Euclidean distance matrix (EDM) with several unknown entries. Then, based on the matrix completion process, the complete EDM is efficiently generated from the available distances only. For this purpose, advanced gradient descent based methods, some of which use an adaptive learning rate, are introduced. To perform trilateration, all pairwise distances between the node to localize and the other sensor nodes are exploited. To validate the merits of the proposed framework, an extensive set of simulations was carried out in noiseless and noisy environments using a real propagation model. Simulation results suggest that matrix completion based on the proposed gradient descent variants reliably estimates the complete EDM from partially known information. Trilateration combined with matrix completion outperforms the traditional localization system. Our simulation results also indicate that trilateration combined with Adam, used to solve the matrix completion problem, outperforms the other combinations, and that this approach does not require a high computational complexity. Real experiments were conducted to support the simulation results; they show that our approach outperforms classic trilateration.
Since some pairwise distances are better estimated than others, we plan to use weighted distances when applying trilateration in future work, with the weight assigned to a distance increasing with the quality of its estimate. Further improvements in terms of complexity could be obtained using machine learning methods, since these aim to shift the prediction complexity to an offline phase.