The Time-of-Arrival Offset Estimation in Neural Network Atomic Denoising in Wireless Location

With the increasing demand for wireless location services, it is of great interest to reduce the deployment cost of positioning systems. For this reason, indoor positioning based on WiFi has attracted great attention. Compared with the received signal strength indicator (RSSI), channel state information (CSI) captures the radio propagation environment more accurately. However, it is necessary to take signal bandwidth, interferences, noises, and other factors into account for accurate CSI-based positioning. In this paper, we propose a novel dictionary filtering method that uses the direct weight determination method of a neural network to denoise the dictionary and uses compressive sensing (CS) to extract the channel impulse response (CIR). A high-precision time-of-arrival (TOA) is then estimated by peak search. A median value filtering algorithm is used to locate target devices based on the time-difference-of-arrival (TDOA) technique. We demonstrate the superior performance of the proposed scheme experimentally, using data collected with a WiFi positioning testbed. Compared with the fingerprint location method, the proposed location method does not require a site survey in advance and therefore enables a fast system deployment.


Introduction
Due to the emergence of 5G and 6G technology, people have higher and higher requirements for indoor positioning [1,2]. According to Market&Markets research data, the global market size related to indoor location is expected to reach USD 18.74 billion by 2025 [3]. The huge market demand has become a key driver of location-based applications in the future. However, despite the growing demand in a large market, it is not easy to provide a viable indoor positioning solution in many cases. One of the key challenges facing distance-based positioning systems is that the reflection and refraction of wireless signals in indoor environments may lead to a serious decrease in positioning accuracy.
There are currently two CSI-based positioning technologies, which are based on fingerprint and triangulation, respectively. The fingerprint-based localization method needs to collect CSI data at various locations of the deployment site in advance. The FILA approach proposed by Kaishun et al. [17] achieves a 50 percent accuracy at less than 0.6 m in the laboratory. The C-Map proposed by WEN et al. [18] achieves an average positioning error in the comprehensive environment of 0.97 m. CHAUR-HEH et al. [19] adopted a convolutional neural network to localization in a room, which performs the best, resulting in a maximal localization error of 0.92 m with a probability of 99.97%. YUAN et al. In [20], a multi-view discriminant learning approach was developed for indoor localization that exploits both the amplitude and the phase information of CSI to create feature images for each location, and the minimum distance errors for the laboratory and corridor experiments were 0.205 m and 0.109 m, respectively. In [21], the authors proposed to transform the measured data from CSI into images, extract their features, and use deep learning networks to estimate localization, and achieved an accuracy of more than 90% in laboratories. In [22], the authors proposed to transform the measured data of CSI into images and use the classification ability of the convolutional neural network (CNN) for localization, and 70% of the test cases had a localization error under 1.5 m. The fingerprinting-based approach achieves a relatively high positioning accuracy, but requires the deployment site to be surveyed in advance and whenever the environment is changed, which increases the operating costs of commercial applications.
Data fusion-based positioning can achieve a higher positioning accuracy. Zhao et al. [23] proposed a data fusion method of fingerprinting using RSS and CSI data from single access points, which can achieve a positioning accuracy of 1.79 m in a typical laboratory. Li et al. [24] proposed an enhanced particle-filter-based positioning method that combines CSI information and inertial sensor information to achieve an average accuracy of 1.3 m. However, such techniques require more data sources and therefore are not favorable for commercial applications.
Triangulation-based positioning uses angle and/or distance measurements to locate targets. Unlike fingerprinting-based approaches, it does not require data collection in advance, which brings great convenience to commercial deployment. The systems only needs to be updated when the anchor points are moved, which involves surveying the coordinates of anchors only. In [25], Manikanta et al. used multiple measurements of the angle-of-arrival to achieve positioning on standard WiFi equipment, and 60% of the localization error reached approximately one-meter in indoor office deployment. In [14], WiFi devices were located by using the round-trip delay and angle of arrival (AOA) measurements, which require changes to the firmware of target devices. The multiple signal classification (MUSIC) algorithm proposed by Schmidt et al. [26] had a good effect in terms of resolution, estimation accuracy, and stability under the condition of multiple antennas. ArrayTrack [27] proposed by Jie et al. is similar to [26], requiring eight antennas. This paper proposes a novel algorithm to estimate the TOA from CSI, which enables an accurate localization of standard WiFi devices. The main contributions of this work are as follows:

1.
A novel algorithm based on the CS is proposed to analyze the CSI and extract the CIR.

2.
A dictionary denoising method based on the direct weight determination of a neural network is derived for denoising CIR signals in compressed sensing. The sparsity of a dictionary is used as the sparse matrix of compressed sensing to estimate the TOA, which improves the accuracy of estimation.

3.
The proposed method is validated experimentally and shown to outperform conventional approaches based on MUSIC or estimating signal parameters via rotational invariance techniques (ESPRIT).
The remainder of this paper is organized as follows. The system model, including system architecture and signal structure, is presented in Section 2. The problem-solving process is presented in Section 3. We present the process of solving TOA by the signal reconstruction proposed. Experimental validation is provided in Section 4, followed by concluding remarks in Section 5.

System Model
In this section, we briefly introduce the localization testbed that we used, and then introduce the experiment data collected with the system.

System Architecture
In this paper, as shown in Figure 1, we used the WiFi-based wireless ad hoc system for positioning (WiFi-WASP) [2,28,29] developed by the Commonwealth Scientific and Industrial Research Organization (CSIRO) of Australia for our experiments. We installed six custom-built WiFi sniffers as anchors for localization in a conference hall and placed an access point (AP) approximately in the middle of the conference hall. The sniffer measures the time stamp of the communication between the sniffers and the AP to estimate the clock offset and offset of the system clock. The sniffers also measure the time stamp of the communication between the target and the AP to locate the target. A computer was placed in the conference hall to collect time stamps collected by sniffers and estimate the location of target devices by using the TDOA. The system is passive and does not interfere with existing standard WiFi systems. In WiFi-WASP systems, the sniffer clock is post-synchronized by estimating and compensating for clock skew and clock offset to estimate the arrival time of packets based on a common reference clock [30]. This indicates that the locations of the sniffers and the target device is s j = [x j , y j , z j ] T (j = 1, ..., N), where cv = [x, y, z] T is the coordinate value of the device and N is the number of sniffers. Each sniffer in the system has a local clock that is not synchronized with each other or with any external references. Assuming that the target device transmits a packet at t Tx , the arrival time of the corresponding radio signal measured by each sniffer is where α j and β j denote the clock skew and clock offset of sniffer j, d j |s j − cv| denotes the distance between the target device and sniffer j, c is the speed of light, δt j is the hardware delay (e.g., delay caused by radio frequency (RF) circuitry), and m j is the time measurement error. During system operation, the measured packet arrival time is corrected according to the relevant AGC settings. Since the values of α and ∆t are small, in order to directly subtract excessive hardware delay from r j , Equation (1) can be rewritten as By estimating the clock skew α j , clock offset β j , and the hardware delay ∆t j of sniffer j, as well as the measurement value of r j , the measurement value tr j of the synchronous arrival time of sniffer j (less than 0.5 s) in a small time can be estimated, which can be expressed as Suppose a standard WiFi device transmits two packets in succession at t n and t n+1 , respectively. α j can be written approximately as where j and k are different sniffers. By assigning one of the sniffers as the reference clock (i.e., α j = 0), the clock skews of other sniffers can be obtained from (4).
Since the locations of the sniffers are also known, the distance between the transmitter and the sniffer can be readily obtained. Thus, β j can be written approximately as where j and k represent different sniffers andα represents the clock skew estimated by (4).

Signal Structure
The WiFi-WASP system equipment used the orthogonal frequency-division multiplexing (OFDM) modulation scheme. Figure 2 illustrates the system model for obtaining the TOA calculation considered in this paper from OFDM [31]. At the receiving end, the analog data collected by the antenna went through the processes of the analog/digital converter, the serial transmission data were then converted to parallel data, circular prefixes were removed, discrete Fourier transform was conducted [32], and, finally, the frequency domain CSIs were obtained for localization. The CSI in the frequency domain is given by where f 0 is the central frequency and f represents the frequency space between adjacent subcarriers. α k and τ k denote the complex Gaussian channel coefficient and the time delay of the k-th signal propagation path, respectively. L is the number of multipath components.
Since the system adopts an 80 MHz bandwidth, there were 256 subcarriers in total, and the subcarrier space was 80 MHz/256 = 312.5 khz. The number of CSI values in the frequency domain was 245 because 11 subcarriers were unused. f denotes the center frequency of each subcarrier. We denote τ = {τ 0 , τ 1 , ..., τ L−1 } as the set of true L time delays. Multipath propagation characterized by the channel frequency response (CFR) in the frequency domain signals includes the amplitude-frequency response and phase-frequency response, while, in wireless channels, amplitude attenuation, phase offset, and time delay can be characterized by CIR. In the CS method, assuming that the sparse matrix is an identity matrix and there is only a rotation factor in the sparse matrix, the multiplication of the sparse matrix by the time domain signal is equivalent to performing the Fourier transform. The physical meaning of the sparse matrix is to transform the signal to be sampled into another domain, and, in this domain, the signal is sparse. Thus, we used the CS method to convert the CFR into the CIR in the time domain to estimate the TOA, and we obtained where a k and τ k denote the amplitude of the multipath component and the complex attenuation and propagation delay of the kth path, respectively. δ(τ) is the Dirac delta function and L is the number of multipath components, while 1 ≤ k ≤ L are in ascending order. In a line-of-sight (LOS) environment, the first peak of the impulse response is known to be a good choice for the TOA of the direct path.

Proposed Approach
In this section, the direct weight determination method based on a neural network is introduced to filter dictionary atoms, and the filtering effect is proven by using the quintic polynomial. In order to increase the resolution of TOA, an augmented matrix was added for compressed sensing, and the sparsity and coherence of the proposed method are proven. Figure 3 illustrates the steps taken to locate a target based on CSI collected by the considered WiFi positioning testbed. A Hamming window was first applied to the CSI to prevent frequency leakage. In order to estimate high-precision TOA, we used neural network weight determination method to filter dictionary atoms. The phase space reconstruction theory was used to reconstruct the input signal to address the problem of insufficient input training signals. The sparsity of dictionary was used as the sparse matrix of compressed sensing, and the resolution of TOA was improved by adding and expanding matrix. However, in a non-LOS or multipath environment, the first arrival path tends to decay significantly and its signal strength may be lower than the reflection path. A peak filtering algorithm was employed to improve the accuracy of estimated TOA. The system clocks among anchors were then synchronized and used to correct the measured TOA values. Finally, the target was located based on the differences between the TOA measured by different anchors.

Frequency Domain Windowing
In the process of digital signal processing, we can only transform the limited length of signal data, so it is necessary to carry out signal truncation. Even if it is a periodic signal, if the truncation time is not an integer multiple of the period, then the intercepted signal will have frequency leakage. To minimize this leakage error, we need to use a weighting function, also known as a window function. There are many kinds of window functions, such as rectangular window, Hann window, and Hamming window. Rectangle window only requires an accurate reading of the main lobe frequency, regardless of amplitude accuracy. Hann window is superior to rectangular window, but the main lobe of Hann window is widened, which is equivalent to the analysis bandwidth broadening, and the frequency resolution decreases. If the test signal has multiple frequency components, the spectrum performance is very complex, and the purpose of the test is more concerned with the frequency points rather than the size of the energy, so Hann window should be selected. Both Hamming window and Hann window are cosine windows, also known as improved ascending cosine windows, but the weighting coefficient is different, which makes the sidelobe smaller, and the sidelobe attenuation speed of the Hamming window is slower than that of Hann window. Due to the fact that the amplitude-frequency characteristic of Hamming window is that the side lobe attenuation is large [33], the peak attenuation of the main lobe and the first side lobe can reach 40 dB. Thus, we have where α 0 = 25/46, and this value is intended to generate zero-crossing at the frequency 5π/(L − 1). Since the sample point is 256, L = 256. Therefore, the CFR data after applying the Hamming window is expressed as where

Dictionary Atomic Filtering
In order to solve the problem of insufficient data of neural network training, the phase space reconstruction theory is proposed to reconstruct signal space. A direct weight determination method based on neural network is proposed to realize dictionary atomic filtering, and its effectiveness is verified.

The Phase Space Reconstructs the Input Signal
The theory of phase space reconstruction holds that the prime mover system model can be reconstructed from an observation of the system. The evolution of any component in the system is determined by the other components interacting with it, so the information of these related components implies that, in the development process of any component, a spatial system model of the original signal can be reconstructed with an observation of the system.
Due to the fact that each set of signals sampled is different, there are too little data to train the dictionary. According to the phase space reconstruction theory [34,35], for the noise signal F ∈ C K×1 sampled from WiFi-WASP system and through the Hamming window, it can be embedded into the attritor orbit matrix F in dimension K so as to solve the problem of insufficient training data F.
where F ∈ C N×K .

Dictionary Learning Algorithms
The dictionary learning model is widely concerned and has been successfully applied in various fields of signal processing, such as signal processing, image processing, image fusion, video key frame extraction, and so on. In the field of signal processing, sparse representation of signals is a simple and effective signal coding method, which describes signals by using linear combinations of as few basis vectors as possible in the basis space. The span of basis vectors is also called a dictionary, and the basis vectors in the corresponding space are atoms.
The dictionary learning algorithm is also very suitable for signal recovery and reconstruction and image noise and blur removal applications. For the dictionary learning algorithm of signal denoising, the trained data can be divided into two categories [36]. First, the method that uses data without noise for dictionary learning is called the external prior dictionary learning algorithm. Second, the method of using noisy data for dictionary learning is called the internal prior dictionary learning algorithm. The external prior dictionary learning algorithm is not adaptive to signal denoising, and it may not be able to reconstruct some fine-scale noise signals well. Compared with the external prior dictionary learning algorithm, the internal prior dictionary learning algorithm adopts the signal data containing noise as the training data; although it has good adaptability, the prior learning is greatly affected by noise. The denoising effect of using such a dictionary to denoise signals is inevitably lower than that of using a fixed atom dictionary. Therefore, reducing the atomic noise of a learning dictionary will definitely improve the denoising effect of signals. The main idea of dictionary learning is to use the dictionary matrix Ψ ∈ R N * K , which contains K atoms ψ k and a sparse linear representation of the original sample Y ∈ R N * L , where m represents the number of samples, n represents the attributes of samples, and, in the ideal case, Y = ΨX. This can be transformed into an optimization problem.
where X ∈ R K×L is the sparse decomposition coefficient of the signal, and X i is the row vector in the matrix. X i 0 represents the zero-order norm, which represents the number of non-zero numbers in a vector. T 0 represents the maximum sparsity of X i 0 . There are two optimization variables Ψ and X to solve this optimization problem. It is common to fix one optimization variable, optimize the other variable, and so on. The sparse matrix X in Formula (11) is solved by the least absolute shrinkage and selection operator classical algorithm.

Direct Weight Determination of Neural Networks
Neural networks with their massively parallel processing, distributed storage, and elastic topology, and other significant characteristics of highly redundant and nonlinear arithmetic, such as the control and nonlinear signal processing, pattern recognition, and robots, have been widely used in such fields as [37]. According to the model of the artificial neural network, the structure can be divided into a feedforward network (also known as a multilayer perceptron network) and the feedback network (also known as the Hopfield net). Among neural network models, the back propagation (BP) neural network model is one of the most widely used neural network models at present. Its core is the error back propagation algorithm, and its model is shown in Figure 4. The model shows a three-layer BP neural network model that includes an input layer, hidden layer, and output layer. The input layer contains N neurons, so the corresponding input I i = [x 1 , x 2 , · · · , x n , · · · , x N ] is an N dimension vector. There are L neurons in the hidden layer and the excitation function is denoted as f 1 . In addition, the expected output O i is an I dimensional vector and ω nl is the connection weight between the n neuron of the input layer and the L neuron of the hidden layer, whereas lj is used to represent the connection weight between the l neuron of the hidden layer and the j neuron of the output layer. ϑ l and θ j are used to represent the threshold value of the neuron at the hidden layer and the neuron at the output layer, respectively. ν 1 denotes the first l hidden layer neurons of the output, and y j represents the actual output of the j neuron in the output layer, for the sample output. According to the direction of negative gradient descent, the BP algorithm iteratively adjusts the weights and thresholds of the network to achieve the reduction in the error function value, and its weight ω nl and lj correction iterative formula can be described as where In the above formula, v l represents the output of the lth neuron, y j represents the output of the jth neuron, and E represents the neural network learning error function corresponding to the jth sample pair under the incremental processing mode. Their derivation formulas are as follows: The BP network has some problems, such as a slow convergence, local minimization, and fixed learning rate. To solve these problems, Zhang yu-nong et al. proposed an improved power-excited forward neural network model by using polynomial interpolation and approximation theory [38]. Different from the traditional idea of obtaining neural network weights through lengthy BP iterative training, the direct weight determination method can directly obtain the weights, greatly shortening the determination time of network weights. It overcomes the inherent defect that the traditional iterative training algorithm is susceptible to a dynamic learning process and that it is difficult to determine the optimal topology of the neural network. This model can still be regarded as a BP neural network model, as shown in Figure 5. In this figure, the number of hidden layer neurons is n, and the connection weight between hidden layer neurons and output layer neurons is denoted as j , j = 0, 1 · · · , n − 1, but the relation between the input xand the output y of the neural network can be obtained as y = 0 x 0 + 1 x 1 + 2 x 2 + · · · + n−1 x n−1 .
Output data In the figure, the sample set (x i , y i ), i = 1, 2, · · · , n is used to train the power-excited forward neural network, and the learning error function E is defined as . Therefore, the weight iterative formula can be expressed as: where j = [1, 2, · · · , n − 1]. Let = [ 0 , 1 , · · · , n−1 ] T ∈ R n and y = [y 0 , y 1 , · · · , y n−1 ] T ∈ R m , respectively, where the superscript T expresses the matrix transpose operation. Let X for Thus, Formula (15) can be expressed in matrix vector form as follows: For Formula (15), when the network training reaches the steady state, we have Then, there is −ηX T [Xw − y] = 0. The optimal weight of the power-excited neural network can be directly determined as by using the matrix pseudo-inverse idea as follows: (k) = (X T X) −1 X T y.
It can be seen from the above that the direct weight determination method can automatically, quickly, effectively, and accurately determine the optimal weight, so as to achieve network performance optimization.

Dictionary Atomic Denoising Verification
Regardless of using the recursive least squares dictionary learning algorithm (RLS-DLA), K-singular value decomposition (K-SVD)K-SVD, or other dictionary learning algorithms, as long as the training data contain noise, the resulting dictionary is bound to contain noise. In order to prove that the dictionary atom trained by RLS-DLA contains noise and verify the effect of dictionary learning by the direct weight determination method, we set the real quintic polynomial phase signal as the input signal, which can be expressed as where a(n) = 1. v(n) is Gaussian white noise, and the signal-to-noise ratio was set to 10 dB. In order to avoid the ambiguity of a i , it follows |a i | ≤ π m!n m−1 ∆ m , so we have a = (a 5 , a 4 , a 3 , a 2 , a 1 , a 0 ) = (4e − 10, 4e − 9, 2.75e − 6, 1.5e − 3, π/8, π/3).
According to the algorithm idea of a neural network, a dictionary learning algorithm for denoising polynomial phase signals based on the direct weight determination method is proposed. D ∈ R N×L initializes the dictionary. Each column in the dictionary represents an atom, and L represents the number of atoms in the dictionary. Since a redundant dictionary is used, L N. Y ∈ R N×K represents the training set, where K L; thus, the coefficient matrix W ∈ R L×K . Since the denoising signal itself is used as training data, the observation length of the signal needs to be considered. In the processing of polynomial phase signals, the signal length is not very long, and the number of trained signals K in the training signal set Y is much larger than the number of atoms in the redundant dictionary. Therefore, if the length of the object signal y(i)(i = 1, 2, · · · , M) is not long enough, in order to satisfy K L, the training signal set Y is reconstructed by referring to the phase space reconstruction theory, which can be expressed as At the same time, any observed signal in segment K is taken as the initial dictionary D, and the ith atom d i = [d i1 , d i2 , · · · , d iN ] T of dictionary D is obtained. The input excitation matrix X can be expressed as According to the neural network weight determination method, w = (X T X) −1 X T d i . Therefore, the predicted atomic value after filtering can be obtained as d * i = Xw, and the filtered dictionary can be expressed as . As shown in Figure 6, it can be seen from the figure that the real quintic polynomial phase signal with additive Gaussian white noise is a smooth curve after learning by the direct weight determination method. The dictionary atom after RLS-DLA learning obviously contains additive white Gaussian noise. Therefore, the dictionary composed of atoms learned by the direct weight determination method is denoised by sparse representation, and the denoising effect is very obvious.

Sparse Blind the TOA Offset Estimation
In this section, sparse dictionary learning is used to represent signals sparsely. Meanwhile, in order to improve the resolution of TOA, the signal aug-expanding matrix is added and its sparsity and coherence are proven.

Construction of Sparse Basis Matrix
Sparsity is the premise of the compressed sensing algorithm and the necessary condition for signals. However, most signals do not meet the conditions of sparsity or compressibility. In this case, signals need to be transformed into sparse signals in a certain transform domain, and the transformed signals become compressible signals. The signal (F) = [ς 1 , ς 2 , · · · , ς l , · · · , ς L ] T ∈ R sampled by us was a discrete signal in the frequency domain with dimension L. The dictionary Ψ estimated by the direct weight determination method was used to estimate the sparse representation coefficient α as the sparse matrix. The signal (F) can be expressed as where Ψ ∈ R L×L , and L = 256 is the number of dimensions of F of the sampled signal. α = [α 1 , α 1 , · · · , α l , · · · , α L ] T is the coefficient vector, which is another representation of F and is sparse on Ψ L×L .

Construction of Measurement Matrix
In the whole process of compressed sensing, the design of the measurement matrix is a key step. The properties of the measurement matrix are related to whether the compression can be achieved and whether the signal can be reconstructed accurately. How can an appropriate observation matrix be designed, which can not only achieve the purpose of compressed sampling but also ensure that the signal can be reconstructed without distortion? The restricted isometry property (RIP) can solve this problem in theory [39]. The random Gaussian measurement matrix is the most widely used in compressed sensing. We constructed a matrix Φ with the size of K * L, and each element in the matrix independently followed a Gaussian distribution with mean value υ = 0 and variance of σ = √ 1/K, which can be expressed as follows where L is the dimensions of the sampled data, and N is the normal distribution function. K is the number of CIR dimensions to be restored. The random Gaussian measurement matrix has a strong randomness. It can be proven that, when the measurement number K ≥ cM log(L/M) of the random Gaussian measurement matrix, RIP conditions will be met with a great probability. In general, there is M << K ≤ L, and c is a tiny constant. In the compressed sensing process, a random Gaussian measurement matrix is widely used mainly because it is not related to most orthogonal bases or orthogonal dictionaries, and the number of measurements required for accurate reconstruction is relatively small.

CIR Resolution Augmentation Matrix
In order to improve the time domain resolution and recover CIR to estimate a more accurate TOA, we constructed a matrix to extend the arrival delay τ [40], which can be expressed as where sinc is a filter constructed by the sinc function, often used in anti-aliasing techniques.

Sparse Recovery
The K * L dimension matrix formed by the product of measurement matrix Φ K×L and transformation basis Ψ L×L is called the perception matrix, and observes the signal and obtains the observation vector y = [y 1 , y 2 , · · · , y n , · · · , y N ] T ∈ R. The observed signal can be expressed as Meanwhile, in order to improve the resolution, we constructed a matrix S N×K to extend the arrival delay, so (27) can be expressed as In the context of CS, the CIR reconstruction problem is now formulated as where, for some small ε, · 0 denotes l 0 -norm, which counts the number of non-zero elements, and · 2 stands for l 2 -norm. Note that ε approximates σ 2 if we have enough data. However, l 0 -norm minimization is a complex combinatorial problem [41]. In order to improve the computational efficiency, it is usually extended to l 1 -norm minimization, because it can be solved by convex optimization [42,43]. Compared with the inverse Fourier transform, this CS-based approach enhances sparsity and effectively minimizes the pseudo-sidelobe.

CS with Complex-Valued Targets
The CS estimates a sparse signal vector in a probabilistic manner. It maximizes the posterior probability of the sparse signal vector at a given observed value. In the posterior probability maximization, the CS theory is applicable to the processing of real data, while CSI data are complex. In order to apply this theory to TOA estimation, observation data should be realizable in order to build an optimization model of a real number field [44]. For the application of CS in complex valued CIR estimation, Equation (28) is first extended to the equivalent real-valued form by where H ∈ R 2N×1 , Φ ∈ R 2K×2L , Ψ ∈ R 2L×2L , S ∈ R 2N×2K , and α ∈ R 2L×1 . This expansion increases the dimension of the observation matrix and separates the reconstruction of real and imaginary values of a channel magnitude in each delay tap.

Sparse and Coherence Proof
In the whole process of compressed sensing, the design of a measurement matrix is a key step. The properties of the measurement matrix are related to whether the compression can be achieved and whether the signal can be reconstructed accurately. For a given measured value y, the solution of Formula (28) is an underdetermined problem. Candes et al. proposed a solution: as long as the measurement matrix conforms to the RIP, a definite solution can be obtained; that is, the existence, accuracy, and uniqueness of the original signal reconstruction can be guaranteed. RIP is defined as: where δ is a constant and δ ∈ (0, 1). · 2 stands for 2 -norm. In this paper, the measurement matrix is a random Gaussian measurement matrix, and there is a high probability that RIP conditions are met. According to Baraniuk, the equivalent condition of RIP is that the observation matrix Φ and the sparse representation basis Ψ are irrelevant. However, since we added the S matrix to improve the CIR resolution, the equivalent condition of RIP is therefore irrelevant between Φ, Ψ and S. In practical application, we can use an equivalent condition of criterion, namely incoherence, to guide the design of the measurement matrix. Incoherency means that the row vector Φ i of the matrix observation system Φ cannot be sparsely represented by the column vector Ψ i in the basis Ψ, nor can it be represented by the column vectors of the S i matrix. It can be measured by coherence and is expressed as In this paper, Φ K×L is a sensing matrix of random normal distribution with mean υ = 0 and variance σ = √ 1/K. According to the normal distribution function, the ratio within 3 √ 1/K is approximately 99.73%, so The sparse matrix Ψ estimated by dictionary learning is derived from collected CSI data. The indoor location signal model of the wiFI-WASP system used to collect CSI data is suitable for the Saleh-Valenzuela propagation model. The Saleh-Valenzuela propagation model is an indoor radio channel statistical multipath model that fits well with our measurements, through which, the received signal rays arrive in clusters. These rays have an independent uniform phase, as well as independent Rayleigh amplitudes whose variances decay exponentially with the cluster and ray delay. Light in clusters and within clusters forms Poisson arrival processes with different but fixed rates. According to the model, the received signal amplitude follows a normal distribution, which can be expressed as where k indicates the number of paths and N(0, σ 2 k ) shows a value that is in accordance with the normal distribution. Equation (34) creates the Rayleigh fading. The variance σ 2 k is the average power of the k path, so the strength of the paths within the clusters is given as [45,46]: where T l is the arrival time of the lth cluster and τ il is the arrival time of the ith path in the lth cluster. We set the time constants T l and τ il or the inter and intra cluster as 300 ns and 5 ns, respectively, as Poisson distributions. Γ is a constant of cluster arrival decay time, γ indicates a constant of ray arrival decay time, Γ = 60 ns, and γ = 20 ns. According to the normal distribution function, the ratio within 3 1/σ 2 (k) is approximately 99.73%. Thus, it can be seen that the range of l 1 norm of any element in Ψ can be expressed as The sinc function of CIR resolution enhancement matrix S N×K can be expressed as Obviously, the value range of any element in matrix S is Therefore, the coherence of Φ, Ψ, and S when multiplied by any rows and columns is in the range of where the coherence of µ is closely related to the number of rows and columns of Φ and Ψ, but not to the number of rows and columns of S and σ 1. Since the elements in the S matrix are generally less than 1, it is still helpful for incoherence µ.

Experimental Results
The performance of the proposed algorithm was evaluated under outdoor line-ofsight (LOS) conditions and indoor conditions, respectively. In each experiment, an 802.11ac wireless LAN was built, and six sniffers were deployed, running in 149 channels with a bandwidth of 80 MHz. One target device was used for location verification and one PC was used to collect data from the sniffer and target device online to estimate the location of the target device.
In practical applications with LOS propagation, usually the first peak of the impulse response is the TOA of the direct path. However, in a non-LOS or multipath environment, the first-arrival path usually attenuates significantly and may even have a lower signal intensity than the reflection path. In addition, multiple reflected signals that overlap the direct path signal distort the first peak of the combined signal [10]. Existing ToA estimation algorithms are based on thresholding the first peak, using a prior model [3], or the MUSIC/ESPRIT algorithm. The performance of the proposed approach was compared against MUSIC and ESPRIT.

Outdoor Test
The system was tested under outdoor LOS conditions to evaluate its performance. We deployed six sniffers around a WiFi network for the outdoor LOS tests with one AP and one WiFi dongle. The topology of the system is shown in Figure 7. The size of the experimental area was 900 m 2 . The target moved across the test area. Figure 5 shows the positioning results for the outdoor test. It can be seen that the positioning results are consistent with the actual locations.   Figure 8 shows the positioning accuracy of our system when the TOA is estimated using the CS and MUSIC and ESPRIT algorithm. It can be seen that the 72-percentile positioning error is 0.5 m, and 1 m in 90% with the CS. Compared with MUSIC and ESPRIT's TOA estimation methods, the accuracy of CS's TOA estimation method is much higher.

Indoor Test
Localization accuracy depends on the multipath environment, the materials used in the walls, the presence of metal objects, the density of sniffer deployment, and many other factors, so indoor localization testing is an important step in validating wireless positioning methods. The assembly hall covers an area of around 70 m 2 . We deployed six sniffers and a standard Wi-Fi device in the conference hall and placed them at known locations to collect data from 17 locations of target devices inside the conference hall. Figure 9 shows the real and estimated location of the target. It can be seen that, except for the poor reflection effect of the seat in the Y-axis direction, the estimated location is consistent with the actual location, which can meet the needs of many indoor positions. Figure 10 shows the positioning effect when using CS, MUSIC, and ESPRIT algorithms to estimate the TOA. We observed that CS achieves a localization error of approximately the 48th percentile at 0.5 m and 1 m, respectively. Compared with MUSIC and ESPRIT's TOA estimation method, CS's TOA estimation method has a higher accuracy. The SpotFi positioning system is an indoor localization method proposed by m. Kotaru et al. The SpotFi localization system has three antennas and is deployed in the indoor office with 5-6 AP. The localization error of approximately 18% is 0.5 m and approximately 60% is 1 m [25]. The ArrayTrack localization system is an indoor localization method proposed by Jie Xiong et al. In the same experimental environment as SpotFi, the localization error of approximately 30 % is 1 m.

Conclusions
With the increasing demand for positioning, it is difficult for existing positioning methods to meet the convenience of deployment equipment without satellite positioning. The proposed TOA estimation method is expected to better adapt to the business requirements of equipment deployment. In this paper, a method based on a neural network to determine the direct weight of dictionary filtering was designed, and the filtering effect is proven by quintic polynomial. The learned dictionary was used as the sparse matrix of compressed sensing, the augmentation and expansion matrix was added to increase the CIR resolution after sparse recovery, and the sparsity of the dictionary, measurement matrix, augmentation, and expansion matrix was proven. The proposed algorithm was validated by experiments on the indoor and outdoor positioning system. The results show that the proposed algorithm has a high positioning accuracy. Compared with the fingerprint positioning method, the positioning method adopted in this paper has the advantage of not needing to sample the data in the positioning environment in advance, so it has a broad application prospect.
Author Contributions: Y.H.: formal analysis, investigation, software, writing-original draft, writing-review and editing; A.P.: conceptualization, methodology, formal analysis, investigation, funding acquisition; B.T.: data curation, validation, software, project administration; G.O.: methodology, writing-review; X.L.: software, methodology, writing-review. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: As the data came from a third party department, CSIRO, which was not authorized to release the data, not applicable.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: