Device-Free Indoor Location Estimation System Using Commodity Wireless LANs

In recent years, propagation channel characteristics have been used effectively in applications such as motion sensing and position detection, and channel-sounding methods that can be implemented easily on low-cost devices have attracted considerable attention. This paper presents a device-free indoor location estimation method based on the spatio-temporal features of radio propagation channels. The features are measured with a 2.4 GHz-band three-by-three multiple-input multiple-output (MIMO) channel sounder built from commodity wireless local area network (WLAN) devices. The measurement results demonstrate that the proposed method achieves reasonable performance with a small number of antennas.


Introduction
The development of the Internet of Things (IoT) has increased the demand for location services, and position information is crucial in the IoT era. Among these services, indoor positioning-as-a-service, which is closely related to people's daily lives, has attracted a great deal of attention. An indoor positioning system (IPS) can be used in home surveillance systems, nursing care for the elderly, and patient monitoring in hospitals [1]. Most existing surveillance systems use cameras for monitoring [2]. However, cameras have limited coverage and blind areas, and improving the performance requires more cameras, which brings excessive cost and a complex configuration. Further, privacy remains a problem that cannot be ignored.
To address these problems, various methods utilizing radio waves have been considered. By capturing the fluctuation of wireless channel characteristics caused by a person's movement and applying machine learning, a person's position and behavior pattern can be estimated. However, positioning in an indoor environment still poses many technical challenges because the propagation conditions differ from those outdoors [3]. For example, angle-of-arrival (AOA)-based localization has received great attention in the past decades, but its major difficulty is the influence of multipath propagation: the complexity of estimating the unknown channel parameters (amplitudes, delays, and angles) grows as the number of paths increases [4]. Time-of-arrival (TOA)-based systems also struggle to obtain high localization accuracy because it is difficult to measure the TOA with an accuracy on the order of one nanosecond using low-cost hardware in multipath-rich environments. Such systems also suffer from time and frequency offsets between the local clocks of different nodes [5]. Furthermore, range-based indoor positioning is easily degraded in non-line-of-sight (NLOS) propagation environments [6]. In addition, commonly used localization technologies such as the Global Positioning System (GPS) do not work effectively indoors because of signal fading and multipath effects caused by building structures. Therefore, we focused on wireless devices that have already been widely deployed, such as WiFi (WLAN using the IEEE 802.11 standards). WiFi is widely used in homes, hotels, cafes, airports, shopping malls, and other buildings of various sizes, which makes it very attractive for indoor positioning applications. The ubiquity and affordability of WiFi devices will enable the popularization of indoor localization systems.
Recently, various indoor localization systems using WiFi devices have been reported. For instance, IPSs using fingerprinting of the received signal strength indication (RSSI) have been discussed [7,8]. However, owing to the temporal and spatial variation of the indoor environment [9], several access points (APs) are required to improve the accuracy, which makes the configuration complex [10]. Channel state information (CSI)-based IPSs have also been developed [11][12][13][14]. However, owing to the limited bandwidth of WiFi devices, several APs are required to maintain the performance of the system [15]. Note that the latest WiFi standards allow a greater bandwidth, but no open-source tools for CSI acquisition have been provided for them. Therefore, CSI acquisition still relies on specific chipsets such as the Qualcomm Atheros 9380/9580 and Intel 5300 [16], which support only IEEE 802.11n. Moreover, a microwave system using an antenna array has been developed [17], in which predetermined events are identified by monitoring the fluctuation of the signal subspace spanned by the eigenvectors. Because the eigenvector represents the spatial structure of the multipath propagation, spatial filtering by the first eigenvector can reduce the effects of noise, but a large number of antennas is needed to resolve the multipath components more precisely. In device-free localization (DFL) techniques, the performance is significantly influenced by the strong line-of-sight (LOS) path and is prone to bias depending on the person's location [18].
To cope with these problems, this study developed an indoor DFL system based on commodity WiFi devices, which achieves a reasonable performance by using machine learning for the spatio-temporal features of wireless channels. The technical contributions of this study are as follows. First, a 2.4 GHz-band three × three MIMO channel sounder was developed where the channel bandwidth was extended to approximately 68 MHz by concatenating the CSI taken at six consecutive WiFi channels to achieve high resolution in the delay time domain. Second, we developed an indoor location estimation method using the spatio-temporal features of multipath propagation characteristics, which were obtained by separating the multipath components into three angle taps and two delay taps. This can reduce the dependency of the performance on the change in the LOS path. Third, a support vector machine (SVM) was applied to identify a person's location.
The remainder of this paper is organized as follows. In Section 2, a channel-sounding system based on IEEE 802.11n wireless LANs is presented. Next, we present an indoor location estimation method in Section 3. Section 4 presents the results of identifying the sub-area where a person is present in two measurement campaigns, which demonstrate the performance and feasibility of the proposed system. Finally, Section 5 concludes the paper.

Channel Sounding
IEEE 802.11n was standardized [19] as a MIMO-OFDM-based wireless LAN standard and is the successor to IEEE 802.11a/g. For backward compatibility with existing standards, it provides a mixed mode [20] that extends the legacy mode to achieve MIMO transmission. The frame format of the mixed mode is shown in Figure 1. After the L-SIG (legacy signal field), which carries frame information such as the total length of the wireless LAN frame, the transmitter sends the HT-SIG (high-throughput signal field), which includes the transmission parameters for spatial multiplexing. The receiver then uses the HT-STF (high-throughput short training field) and HT-LTF (high-throughput long training field) that follow to estimate the CSI, which is necessary for spatial multiplexing transmission. In IEEE 802.11n, the number of subcarriers over the 20 MHz bandwidth is increased from 52 to 56. Furthermore, a 40 MHz mode using channel bonding is provided as an option. Consequently, with four-stream spatial multiplexing over a 40 MHz bandwidth, a transmission throughput of up to 600 Mbps (64QAM, code rate 5/6) can be achieved.

CSI Acquisition
In this study, we used an open-source software package called the CSI tool [21] to build our system. In IEEE 802.11n, the CSI is estimated at the receiver and then fed back to the transmitter so that the transmitter can use it for calibration and beamforming. In other words, by setting the CSI estimate flag in the transmitted packet, we can make the receiver report its CSI estimate. The CSI tool is built on ath9k, the kernel-module device driver for Atheros network interface cards (NICs), and extracts the CSI from IEEE 802.11n packets. In the CSI tool, the transmitter repeatedly sends packets with the CSI estimate flag set, and the receiver reports the result to a user-space program whenever the CSI is acquired. Here, the CSI acquired for every combination of transmitter and receiver antennas is a set of complex numbers for 56 subcarriers and includes the transmitter/receiver characteristics, propagation channel characteristics, and antenna characteristics.
The CSI report for one packet transmitted with a bandwidth of 20 MHz is obtained as a complex array of size $N_R \times N_T \times N_C$, where $N_R$ and $N_T$ are the numbers of antennas at the receiver and transmitter, respectively, and $N_C$ is the number of subcarriers. Furthermore, information such as the timestamp, channel number, transmission rate, number of antennas ($N_R$, $N_T$), number of subcarriers $N_C$, noise floor, PHY error, received signal strength indicator (RSSI) of every receiver antenna, and payload length is added. Because the real and imaginary parts of the CSI are each represented by 10 bits, their amplitude is automatically scaled according to the magnitude of the received power. Further, the phase of the CSI fluctuates independently from packet to packet, but the offset is common to all subcarriers (common phase offset). Figure 2 shows the MIMO transceiver configuration of the IEEE 802.11n wireless LAN. The stream parser creates $N_{st}$ spatial streams from the encoded bit stream, and each stream is modulated into a symbol stream by constellation mapping. The transmitter can send $N_{st}$ ($\le \mathrm{rank}(H) \le \min(N_T, N_R)$) parallel streams. Then, the spatial streams are shifted cyclically by cyclic delay diversity (CDD), which extends the communication area by transmitting the same signal with a different cyclic shift from each antenna. This prevents the unintended beamforming in a specific direction that might occur when all the antennas send the same signal, such as the common header information. Spatial mapping (SM) is performed when the number of antennas is larger than the number of spatial streams ($N_T > N_{st}$). The frequency samples of each transmission branch are transformed into time-domain signals by the inverse Fourier transform and are transmitted simultaneously from all antennas.
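MIMO Channel Sounding
The MIMO propagation channel matrix $H$ is estimated using the training signal (HT-LTF) contained in the frame format shown in Figure 1. Figure 3 shows the block diagram of the three × three MIMO channel-sounding system. Assuming that the number of spatial streams and the number of transmit antennas are the same ($N_T = N_{st} = 3$), the received signal vector at the $k$th ($k = 1, \ldots, N_C = 56$) subcarrier is expressed as:

$$y^{(k)} = G_{agc} H^{(k)} \Phi^{(k)} d\, x^{(k)} + n^{(k)} \tag{1}$$

where $G_{agc}$, $H^{(k)}$, and $\Phi^{(k)}$ denote the gains of the automatic gain control (AGC), the MIMO channel matrix, and the CDD matrix, respectively. They are expressed as:

$$G_{agc} = \mathrm{diag}\left(G_{agc,1}, G_{agc,2}, G_{agc,3}\right), \quad H^{(k)} = \left[h^{(k)}_{j,i}\right] \in \mathbb{C}^{3 \times 3}, \quad \Phi^{(k)} = \mathrm{diag}\left(1, e^{-j 2\pi k \delta_2 / N_f}, e^{-j 2\pi k \delta_3 / N_f}\right)$$

where $x^{(k)}$ is the transmitted signal, $d$ is the vector of coefficients with which $x^{(k)}$ is mapped onto the three transmitting antennas, and $n^{(k)} = [n^{(k)}_1, n^{(k)}_2, n^{(k)}_3]^T$ is the noise vector. Here, $\delta_2$ and $\delta_3$ are specified as $\delta_2 = 8$ (400 ns) and $\delta_3 = 4$ (200 ns), and $N_f$ is the number of FFT points, which is set to 64. To separate the transmitted signals from multiple transmitting antennas, the signal $x^{(k)}$ is transmitted four times from each transmitting antenna by setting $d$ to $d_1, \ldots, d_4$ in Equation (1). The matrix of the received signals is then obtained as:

$$Y^{(k)} = G_{agc} H^{(k)} \Phi^{(k)} D\, x^{(k)} + N^{(k)} \tag{5}$$

where $D = [d_1 \; d_2 \; d_3 \; d_4]$ collects the four mapping vectors and $N^{(k)}$ is the $3 \times 4$ matrix of the corresponding noise vectors. Then, the CSI $\hat{H}^{(k)}_{CSI}$, which is an estimate of the product $G_{agc} H^{(k)} \Phi^{(k)}$, is obtained from Equation (5).

Note that, unlike in data transmission, the signal-to-noise power ratio (SNR) should be sufficiently large in channel sounding because the target to be discovered is not the data but the unknown radio channel. In a low-SNR regime, however, the SNR can be improved by averaging the channel snapshots under static channel conditions. To obtain the MIMO channel matrix, it is necessary to remove the effects of the amplifier and CDD, which is expressed as:

$$\hat{H}^{(k)} = G_{agc}^{-1} \hat{H}^{(k)}_{CSI} \left(\Phi^{(k)}\right)^{-1} \tag{10}$$

Owing to the nonlinear distortions of the transmitter and receiver circuits contained in the CSI, further calibration of Equation (10) is required.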
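The removal of the AGC gain and the CDD phase ramp from the reported CSI can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the AGC and CDD matrices are taken to be diagonal, the CDD phase on the ith stream is assumed to be exp(-j2πkδ_i/N_f) with cyclic shifts of (0, 8, 4) samples, k is treated as the FFT-bin index, and the function names are hypothetical.

```python
import cmath

N_FFT = 64               # number of FFT points (N_f in the text)
CDD_SHIFT = (0, 8, 4)    # cyclic shifts in samples for streams 1-3 (0, 400 ns, 200 ns)


def cdd_phase(k, shift, n_fft=N_FFT):
    """Phase ramp applied by a cyclic shift of `shift` samples at subcarrier index k.

    The sign convention (a delay gives a negative phase slope) is an assumption.
    """
    return cmath.exp(-2j * cmath.pi * k * shift / n_fft)


def remove_agc_and_cdd(h_csi, k, agc_gain, shifts=CDD_SHIFT):
    """Recover H from H_csi = G_agc @ H @ Phi for one subcarrier.

    Because G_agc and Phi are assumed diagonal, the two matrix inverses reduce
    to an element-wise division: H[j][i] = H_csi[j][i] / (g[j] * phi[i]).
    """
    return [
        [h_csi[j][i] / (agc_gain[j] * cdd_phase(k, shifts[i]))
         for i in range(len(shifts))]
        for j in range(len(agc_gain))
    ]
```

Because both correction matrices are diagonal, no general matrix inversion is needed; the row index j selects the receiver AGC gain and the column index i selects the transmit-stream cyclic shift.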

Developed System
The bandwidth of IEEE 802.11n is approximately 20 MHz with an optional 40 MHz mode. Therefore, the delay time resolution is theoretically limited to 50 ns (=1/20 MHz), which is not sufficient to apply WiFi to sensing with subtle variation of the propagation path. Bandwidth expansion with channel bonding has been proposed [22], and the toolkit has been published as open-source [23]. Utilizing this toolkit, the channel bandwidth could be extended to approximately 67.8125 MHz (delay resolution: approximately 15 ns, distance resolution: approximately 4.5 m) by concatenating the CSIs taken at six consecutive WiFi channels, as shown in Figure 4.
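The bandwidth and resolution figures quoted above can be checked with a short calculation. The sketch below assumes the standard 2.4 GHz channel plan (center frequency 2407 + 5n MHz), a subcarrier spacing of 312.5 kHz, occupied subcarriers reaching ±28 around each center, and channels 1, 3, 5, 7, 9, and 11 as the six measured channels; the names are illustrative only.

```python
SUBCARRIER_SPACING_MHZ = 0.3125      # 20 MHz / 64 FFT points
CHANNELS = [1, 3, 5, 7, 9, 11]       # six channels at two-channel intervals
HALF_SPAN = 28                       # occupied subcarriers reach +/-28 around each center


def center_mhz(channel):
    """Center frequency of a 2.4 GHz-band WiFi channel (channels 1-13)."""
    return 2407.0 + 5.0 * channel


def stitched_grid_points(channels=CHANNELS):
    """Number of subcarrier positions on the common grid after concatenation."""
    f_min = center_mhz(min(channels)) - HALF_SPAN * SUBCARRIER_SPACING_MHZ
    f_max = center_mhz(max(channels)) + HALF_SPAN * SUBCARRIER_SPACING_MHZ
    return round((f_max - f_min) / SUBCARRIER_SPACING_MHZ) + 1


n_points = stitched_grid_points()                            # 217 grid points
bandwidth_mhz = n_points * SUBCARRIER_SPACING_MHZ            # 67.8125 MHz
delay_resolution_ns = 1e3 / bandwidth_mhz                    # about 14.7 ns
distance_resolution_m = 0.299792458 * delay_resolution_ns    # about 4.4 m
```

Under these assumptions the grid has 217 points, which reproduces the 67.8125 MHz figure; the delay and distance resolutions of roughly 14.7 ns and 4.4 m are consistent with the rounded values of 15 ns and 4.5 m.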

System Configuration
The system specifications are listed in Table 1. Two Linux PCs, each equipped with an Atheros AR9380 WLAN card, were used as the transmitter and receiver, as shown in Figure 5. To extend the bandwidth, fifty packets were transmitted and received continuously on every channel while sweeping the channel from 1 to 11 at two-channel intervals. Averaging the amplitude and phase of the CSI separately at each subcarrier filters out abrupt fluctuations and improves the SNR. Here, the beacon interval was set to the minimum value of 15 ms for fast channel switching.
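Separate averaging of amplitude and phase over the packets of one channel might look like the sketch below. It is an illustration only: it assumes the per-packet common phase offset has already been removed, uses a circular mean for the phase (one possible choice that avoids wrap-around at ±π), and the function names are hypothetical.

```python
import cmath


def average_amp_phase(samples):
    """Average the amplitude and phase of complex CSI samples separately."""
    mean_amp = sum(abs(s) for s in samples) / len(samples)
    # Circular mean: phase of the sum of unit phasors (robust to wrap-around at +/-pi).
    unit_sum = sum(s / abs(s) for s in samples if abs(s) > 0)
    mean_phase = cmath.phase(unit_sum)
    return cmath.rect(mean_amp, mean_phase)


def average_packets(packets):
    """packets[p][k]: CSI of packet p at subcarrier k -> one averaged value per subcarrier."""
    n_sub = len(packets[0])
    return [average_amp_phase([pkt[k] for pkt in packets]) for k in range(n_sub)]
```

Averaging the complex values directly would shrink the amplitude whenever the phases disagree; treating amplitude and phase separately avoids that bias.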

Back-To-Back Calibration
The phase variation of the CSI occurs because of hardware imperfections such as the nonlinear distortion of the amplifiers, power control uncertainty, and phase ambiguity [22,24], which should be removed to obtain the genuine propagation channel characteristics. In addition, because the six CSIs taken at consecutive WiFi channels are concatenated to extend the bandwidth, it is necessary to calibrate the inter-channel discontinuity in the amplitude and phase values.
To remove the hardware imperfections contained in the CSI, we measure the system characteristics, including the hardware imperfections, in advance by a back-to-back measurement. This measurement is conducted by connecting the transmit and receive antenna ports directly via an RF cable with known characteristics. We then obtain the channel transfer function by dividing the CSI by the system characteristics, which is called back-to-back calibration (B2B calibration). Here, the CSI $\hat{H}^{(k)}_{CSI,B2B,j,i}$ is obtained by connecting the $i$th transmit and $j$th receive antenna ports directly with a cable, and this is performed for every antenna combination. The MIMO channel matrix is then obtained by Equation (11), which divides each CSI element by the corresponding B2B measurement after compensating for $G_{agc,j}$, the gain of the AGC amplifier of the $j$th receiving branch. The values of the denominator in Equation (11) are assumed to be measured in advance (upon initial setup). In commodity WiFi devices, the AGC amplifier operates automatically according to the received signal level to keep the signal power at a certain level; thus, the CSI values are scaled as in Equation (11). The gain of the AGC amplifier can be deduced from the amplitude of the CSI and the RSSI (Equation (12)), where $P_j = E[|y^{(k)}_j|^2]$ is the received power of the $j$th branch. The B2B calibration allows the CSIs acquired on multiple channels to be connected continuously because the phase and amplitude should become zero and unity, respectively, after calibration. However, a phase jump occurs between adjacent channels owing to the independent random offset of the CSI phase in every packet. This jump can be removed by calculating the inter-channel phase difference using the overlapped subcarriers of adjacent channels. The amplitude of each channel also fluctuates owing to the AGC gain uncertainty; the amplitudes were therefore concatenated smoothly using the averaged values of the overlapped subcarriers. The calibration procedure is shown in Figure 6a,b.
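The inter-channel stitching step can be sketched as follows. This is a simplified illustration rather than the authors' procedure: it assumes each channel's calibrated CSI is given on 57 consecutive grid points (with the DC position already filled in), that adjacent measured channels are 32 grid points apart (10 MHz at 312.5 kHz spacing), and that one phase offset and one gain factor per channel are enough to align it with its neighbor. The function name is hypothetical.

```python
import cmath

STEP = 32   # grid points between adjacent measured channels (10 MHz / 312.5 kHz)


def stitch_channels(channels, step=STEP):
    """Concatenate per-channel CSI vectors into one wideband transfer function.

    channels[c][n]: calibrated CSI of channel c at local grid index n.
    Adjacent channels overlap by len(channels[c]) - step grid points.
    """
    width = len(channels[0])
    overlap = width - step
    wide = list(channels[0])
    for nxt in channels[1:]:
        tail = wide[-overlap:]          # end of what has been stitched so far
        head = nxt[:overlap]            # same frequencies seen by the next channel
        # Inter-channel phase jump, estimated from the overlapped subcarriers.
        phase = cmath.phase(sum(t * h.conjugate() for t, h in zip(tail, head)))
        # Amplitude step caused by AGC gain uncertainty, from averaged magnitudes.
        scale = sum(abs(t) for t in tail) / sum(abs(h) for h in head)
        aligned = [scale * cmath.exp(1j * phase) * h for h in nxt]
        # Average the two estimates in the overlap, then append the new part.
        wide[-overlap:] = [(t + a) / 2 for t, a in zip(tail, aligned[:overlap])]
        wide.extend(aligned[overlap:])
    return wide
```

With six channels of 57 points each, the result has 57 + 5 × 32 = 217 points. Each channel is aligned to the already-stitched spectrum, so the first channel serves as the phase and amplitude reference.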

Evaluation
A channel impulse response test was conducted to validate the system. An unmatched T-junction (BL41-6203-00, Orient Microwave Corp.) [25] was used as the device under test (DUT). Figure 7a shows the measurement configuration. Transmitter antenna port 1 (Tx1) was connected to a power splitter. Port 1 of the splitter was connected to receiver antenna port 1 (Rx1), which was used for the reference measurement. Port 2 of the splitter was connected to port A of the T-junction, and the signal from port B was input to receiver antenna port 2 (Rx2). Port C was connected to a 6 m cable whose far end was left open, which generated reflected waves at the cable end. The attenuation of the cable used (Enviroflex_316, Huber+Suhner) [26] was approximately 1.62 dB/m. The measurement results are shown in Figure 7b. Only the direct wave was observed in Tx1-Rx1, whereas the first reflected wave from the cable end was also observed in Tx1-DUT-Rx2, in good agreement with the calculated value. Here, the calculation assumed a propagation speed in the coaxial cable of about 77% of that in a vacuum and used the round-trip path length of 12 m and the attenuation rate of the cable (delay time: $12 / (3 \times 10^8 \times 0.77) \approx 52$ ns; power attenuation: $12 \times 1.62 \approx 19.4$ dB, i.e., approximately 20 dB). To evaluate the inter-channel phase stability, the Tx1 signal was distributed to Rx1, Rx2, and Rx3 using a splitter. After B2B calibration, 210 channel transfer functions were obtained. The variation of $\angle\hat{H}_{2,1}$ and $\angle\hat{H}_{3,1}$ relative to the reference $\hat{H}_{1,1}$ is shown in Figure 8. The left figure shows the relative phase difference of the 217 subcarriers for the 210 time samples, and the right figure shows the phase difference of all subcarriers acquired for the 210 time samples. It can be seen that the phase variation was less than approximately 20 degrees.
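The expected delay and attenuation of the first reflected wave can be reproduced with a few lines. The sketch uses the values given in the text (6 m cable, velocity factor 0.77, 1.62 dB/m) and deliberately ignores the reflection coefficient at the open end and the loss in the T-junction, so it only covers the cable round trip; the names are illustrative.

```python
C = 3.0e8                 # speed of light in vacuum, m/s
VELOCITY_FACTOR = 0.77    # propagation speed in the coaxial cable relative to vacuum
LOSS_DB_PER_M = 1.62      # cable attenuation quoted in the text


def reflection_delay_ns(cable_length_m):
    """Extra delay of a wave that travels to the open cable end and back."""
    return 2 * cable_length_m / (C * VELOCITY_FACTOR) * 1e9


def reflection_loss_db(cable_length_m):
    """Cable attenuation accumulated over the round trip."""
    return 2 * cable_length_m * LOSS_DB_PER_M


delay_ns = reflection_delay_ns(6.0)   # about 52 ns
loss_db = reflection_loss_db(6.0)     # about 19.4 dB
```

A delay of about 52 ns corresponds to between three and four delay bins at a resolution of roughly 15 ns, so the reflected wave is clearly separable from the direct wave.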

DF Indoor Location Estimation
The DF indoor location estimation system was developed using the WiFi sounder described above. The receiving antennas were arranged linearly at approximately 6.3 cm intervals (half-wavelength of the minimum frequency 2.403 GHz). The bandwidth-broadened channel impulse responses were obtained by concatenating the CSIs taken at six consecutive WiFi channels to extend the bandwidth to approximately 68 MHz (delay resolution: approximately 15 ns, distance resolution: approximately 4.5 m).

Experiment Scenario
The experiment was conducted in a small office and a medium-sized conference room. The measurement specifications are presented in Table 2. Note that WiFi communication was performed between the access point (AP) and station (STA) fixed in the environment, and a target person moving in the environment was not equipped with any device. Here, the CSI was acquired 10 times for each target position and averaged to reduce the effects of noise. The room model of the small office and the position of the antenna are as shown in Figure 9a,b. The receiving and transmitting antennas were placed at a height of 0.9 m and 1.9 m, respectively. For this environment, thirty CSIs were captured while a person moved randomly in a sub-area; thus, one-hundred eighty samples were captured in total.

Conference Room
The room model of the conference room and the position of each antenna are shown in Figure 10a,b. The receiving antenna was a 3-element linear array placed at a height of 2.0 m. One transmitting antenna (Tx1) was placed at a height of 0.9 m inside the room, and one antenna (Tx2) was placed at a height of 1.6 m outside the room. In this environment, the CSI was captured at each position when a person moved along a predetermined route. Here, the person moved at 0.5 m intervals with a random orientation. One-hundred fifty-one CSIs were captured in total.

Signal Processing
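The array output $y(t)$ of the $K$-element linear array is expressed as:

$$y(t) = w^H x(t)$$

where $[\cdot]^H$ denotes the complex conjugate transpose, and $x(t)$ and $w$ denote the input signal vector and the weight vector of the antennas, respectively. Then, the output power can be expressed as:

$$P_{out} = E\left[|y(t)|^2\right] = w^H R_{xx} w$$

where $R_{xx}$ denotes the correlation matrix of the received signal and is expressed as:

$$R_{xx} = E\left[x(t) x^H(t)\right]$$

To direct the main beam of the array antenna to an arbitrary angle $\varphi$, each weight can be set as:

$$w_k = \exp\left(-j \frac{2\pi}{\lambda} (k-1) \Delta \sin\varphi\right) \quad (k = 1, 2, \cdots, K)$$

where $\Delta$ is the element spacing and $\lambda$ is the wavelength. The weight vector can then be expressed as:

$$w = a(\varphi) = [w_1, w_2, \cdots, w_K]^T$$

Here, $a(\varphi)$ denotes the mode vector, and the angle $\varphi$ is a variable. Then, the output power is expressed as:

$$P_{out}(\varphi) = a^H(\varphi) R_{xx} a(\varphi)$$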

Existing Method
This subsection describes an existing method that uses spatial features [17]. We assume that $L$ waves arrive from angles $\theta_1, \cdots, \theta_L$. According to Equation (14), the array signal can be expressed as:

$$x(t) = A s(t) + n(t), \quad A = [a(\theta_1), \cdots, a(\theta_L)]$$

where $A$ denotes the direction matrix, $a(\cdot)$ denotes the mode vector defined by Equation (19), and $s(t)$ and $n(t)$ denote the signal vector and internal noise, respectively. We assume that the internal noise component is independent at each antenna with zero mean and variance $\sigma^2$. Then, the correlation matrix of the input signal is expressed as:

$$R_{xx} = A R_{ss} A^H + \sigma^2 I$$

where $R_{ss}$ is the correlation matrix of the signal vector. The eigenvalues $\lambda_k$ and eigenvectors $e_k$ ($k = 1, 2, \cdots, K$) are obtained via the eigenvalue decomposition (EVD) of the correlation matrix $R_{xx}$ and satisfy:

$$R_{xx} e_k = \lambda_k e_k \quad (k = 1, 2, \cdots, K)$$

Since $R_{xx}$ is a Hermitian matrix, the eigenvalues are real and can be sorted as:

$$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_K$$

Because $\mathrm{rank}(A R_{ss} A^H) = 1$, the eigenvalue distribution is:

$$\lambda_1 > \lambda_2 = \cdots = \lambda_K = \sigma^2$$

Thus, the space spanned by the eigenvector matrix $E$ can be separated into a signal subspace and a noise subspace.
The first eigenvalue obtained by the process above and the corresponding eigenvector (first eigenvector) were used in machine learning. The feature value for machine learning [17] is computed with reference to $v_0$ and $\lambda_0$, the first eigenvector and first eigenvalue obtained in advance. The first eigenvector is considered the optimal weight vector for maximum ratio combining, which maximizes the output SNR. Therefore, it is strongly influenced by the direct wave. As a result, the features hardly fluctuate when a low-power multipath component is blocked, which is a problem.

Proposed Method
To address the problem mentioned above, we propose a method that treats the multipath components separately in the delay time domain and the beamspace. Here, beamforming with three orthogonal beams is used to separate the multipath components into three beamspaces. The output signal obtained at multiple delay taps and multiple beamspaces is expressed as:

$$y_{m,n}(t) = w_m^H x_n(t) = \gamma_{m,n} e^{j\theta_{m,n}}$$

where $w_m \in \mathbb{C}^{3 \times 1}$ denotes the weight vector for the $m$th orthogonal beam ($m = 1, 2, 3$), $x_n(t) \in \mathbb{C}^{3 \times 1}$ denotes the $n$th delay tap of the channel impulse responses ($n = 1, 2$), and $\gamma_{m,n}$ and $\theta_{m,n}$ denote the resulting amplitude and phase, respectively. The weight vector is obtained by Equation (18) with $\varphi \in \{-42^\circ, 0^\circ, 42^\circ\}$, and the beam patterns are shown in Figure 11a. The target space is thus divided into three parts to receive multiple waves. Figure 11b shows the condition in which a single-bounce reflected wave is blocked by a human body. Because the multipath components are received and treated separately, the fluctuation caused by the blockage can be detected, as seen in Figure 11c. Finally, the amplitude and phase of the received signal obtained at multiple delay taps and multiple beamspaces are used as spatio-temporal feature values for machine learning.
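The feature extraction described above (three beams, two delay taps, amplitude and phase of each output) can be sketched in a few lines. This is a minimal illustration under assumptions that are not taken from the paper: a uniform linear array with half-wavelength spacing, a particular sign convention for the mode vector, delay taps taken as the first two bins of an inverse DFT of the stitched transfer function, and hypothetical function names.

```python
import cmath
import math

BEAM_ANGLES_DEG = (-42.0, 0.0, 42.0)   # three orthogonal beams from the text
NUM_TAPS = 2                            # delay taps used as features


def steering_vector(angle_deg, num_elements=3, spacing_wavelengths=0.5):
    """Mode vector a(phi) of a uniform linear array (phase sign is an assumption)."""
    psi = 2 * math.pi * spacing_wavelengths * math.sin(math.radians(angle_deg))
    return [cmath.exp(-1j * psi * k) for k in range(num_elements)]


def impulse_response(transfer_function, num_taps=NUM_TAPS):
    """First `num_taps` taps of the inverse DFT of a stitched transfer function."""
    n = len(transfer_function)
    return [
        sum(h * cmath.exp(2j * math.pi * f * tap / n)
            for f, h in enumerate(transfer_function)) / n
        for tap in range(num_taps)
    ]


def spatio_temporal_features(csi_per_antenna):
    """csi_per_antenna[k][f]: wideband CSI at antenna k, frequency index f.

    Returns amplitude and phase of y_{m,n} = w_m^H x_n for every beam m and tap n.
    """
    taps = [impulse_response(h) for h in csi_per_antenna]   # taps[k][n]
    features = []
    for angle in BEAM_ANGLES_DEG:
        w = steering_vector(angle, num_elements=len(csi_per_antenna))
        for n in range(NUM_TAPS):
            y = sum(w_k.conjugate() * taps[k][n] for k, w_k in enumerate(w))
            features.extend([abs(y), cmath.phase(y)])
    return features
```

For one transmit antenna this yields 3 beams × 2 taps × 2 values = 12 features. With half-wavelength spacing, beams at 0° and about ±42° are nearly orthogonal for a three-element array, because sin(42°) is close to 2/3.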

Result and Discussion
The classification models were created using the support vector machine (SVM) [27,28] and random forest [29]. They were implemented with the scikit-learn [30] library for Python [31], and the hyperparameters of the models were determined using a framework called Optuna [32]. Figure 12 shows the impulse responses captured by the WiFi channel sounder under different conditions (with and without a person). It was observed that channel bonding improved the delay resolution to approximately 15 ns. Therefore, the first delay tap was set to 0 ns, and the second delay tap was set to approximately 15 ns considering the dimensions of the room. Figure 13a shows the features obtained by the existing method in the same office. Here, the first eigenvector and first eigenvalue obtained in the environment without a person were used as reference data to calculate the features. The feature P, which is the correlation between the eigenvectors obtained with and without a person, fluctuated significantly at three locations in Area 1 owing to the shielding of the direct wave. However, the feature was relatively stationary in Areas 3 and 0 compared with Area 1, even though the direct wave also passed over these areas, because the transmitting antenna was placed higher than the subject. Therefore, the direct wave might not have been shielded in Area 3 and might have been only partially shielded in Area 0, depending on the orientation of the person's body. Figure 13b shows the features obtained by the proposed method in the small office. The solid and dotted lines denote Delay Taps 1 and 2, respectively. The colors blue, orange, and green represent Beams 1, 2, and 3, respectively. Relatively large phase fluctuations can be seen at the first delay tap of Beam 1 because the direct wave was shielded.
Furthermore, many fluctuations can also be confirmed at the second delay tap, which are considered to result from the shielding of single-bounce reflected waves. This shows that events can be detected even when low-power multipath components are blocked.
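A classification pipeline of the kind described here could be set up with scikit-learn roughly as follows. The sketch is illustrative only: the data are synthetic Gaussian clusters standing in for the spatio-temporal features, the SVM hyperparameters are fixed by hand rather than tuned with Optuna, and the function name and the 50/50 stratified split are assumptions.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def train_and_evaluate(features, labels, test_size=0.5, seed=0):
    """Fit an RBF-kernel SVM and report success rate and macro-averaged F-score."""
    x_train, x_test, y_train, y_test = train_test_split(
        features, labels, test_size=test_size, stratify=labels, random_state=seed
    )
    # Hyperparameters are placeholders; the text tunes them with Optuna instead.
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
    model.fit(x_train, y_train)
    predicted = model.predict(x_test)
    return accuracy_score(y_test, predicted), f1_score(y_test, predicted, average="macro")


# Synthetic stand-in for 12 spatio-temporal features: three sub-areas,
# each a well-separated Gaussian cluster. Real CSI features are not reproduced here.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 60)
features = rng.normal(size=(180, 12)) + 5.0 * labels[:, None]
accuracy, macro_f1 = train_and_evaluate(features, labels)
```

The stratified split keeps the ratio of training to test samples the same in every sub-area, and standardizing the features prevents the amplitude and phase values from being weighted unequally simply because of their different scales.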

Small Office
The classification results of the existing and proposed methods are shown in Figure 14. The classification was performed using the support vector machine. For the existing method, the success rate was 35.8% and the average F-score was 0.31; the areas were not classified correctly because the features were insufficient, with feature P being relatively stationary. For the proposed method, the success rate and average F-score improved to 64.5% and 0.63, respectively. The remaining misclassification was considered to be due to similar propagation channel conditions in different areas. It could be reduced if the sub-areas were defined according to specific parts of the room, such as the areas covered by the sofa, table, and corridor, rather than by dividing the room equally.

Conference Room
Figure 15 shows the features obtained by the proposed method in the conference room. As shown in Figure 15b, the amplitude of Tx2 fluctuated because Tx2 was placed outside the room and was greatly affected by noise. The classification results are shown in Figure 16a. The success rate was 88.2%, and the average F-score over the areas was 0.84. Here, the features of 76 positions were used to build the machine learning model, and the rest were used as test data. The positions in each area were selected randomly, while the ratio of training positions to test positions was kept the same in every area. As shown in Figure 16b, most of the misclassified positions were near the sub-area boundaries.

Conclusions
In this study, a three-by-three MIMO channel sounder was built, and the CSI taken at six consecutive WiFi channels was concatenated to extend the bandwidth and improve the delay time resolution. Based on this sounder, a practical indoor location estimation system was proposed to cope with the problem that the features are dominated by the strong LOS path. Experiments were conducted in a small office and a conference room, and the results of estimating a person's position by machine learning were presented. The classification results showed that a commodity WiFi device can roughly identify the sub-area of a room in which a target person is present.