# Graph-Based Semi-Supervised Learning for Indoor Localization Using Crowdsourced Data

## Abstract

## 1. Introduction

## 2. Background and Related Works

## 3. Problem Formulation

## 4. Linear Regression Algorithm against Device Diversity Problem

#### 4.1. Pre-Processing of RSS Values

#### 4.2. Linear Regression Algorithm against Device Diversity Problem

- compute ${\mathbf{a}}_{old}$ and ${\mathbf{b}}_{old}:=$ least squares regression estimator based on ${H}_{old}$
- compute the residuals ${d}_{old}\left(i\right)$ for $i=1,\dots ,c$
- sort the absolute values of these residuals in ascending order, $|{d}_{old}\left(1\right)|\le |{d}_{old}\left(2\right)|\le \dots \le |{d}_{old}\left(c\right)|$
- let ${H}_{new}$ be the subset consisting of the nearest neighbors that correspond to the first $h$ absolute residuals in this sorted sequence
- compute ${\mathbf{a}}_{new}$ and ${\mathbf{b}}_{new}:=$ least squares regression estimator based on ${H}_{new}$
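The refinement step listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes a scalar slope and intercept (one AP, RSS from a reference device in `x` and from the new device in `y`), and the names `trimmed_regression_step`, `x`, `y`, and `subset_idx` are hypothetical.

```python
import numpy as np

def trimmed_regression_step(x, y, subset_idx, h):
    """One refinement step: fit on H_old, keep the h smallest residuals, refit.

    Assumption: x and y are 1-D arrays holding the c paired RSS readings.
    """
    # Least squares fit y ~ a * x + b on the current subset H_old
    a_old, b_old = np.polyfit(x[subset_idx], y[subset_idx], deg=1)
    # Residuals d_old(i) for all c points
    residuals = y - (a_old * x + b_old)
    # H_new: indices of the h points with the smallest absolute residuals
    new_idx = np.argsort(np.abs(residuals))[:h]
    # Least squares fit on H_new
    a_new, b_new = np.polyfit(x[new_idx], y[new_idx], deg=1)
    return a_new, b_new, new_idx
```

Calling the function repeatedly, feeding `new_idx` back in as `subset_idx`, would iterate the step until the subset stops changing; whether the source iterates in this way is not stated in the list above.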

#### 4.3. Automatic Device-Transparent Algorithm for Crowdsourcing Indoor Localization System

## 5. AP Localization Using Compressed Sensing Method

- ${\mathbf{y}}_{\ell \times M}$ are the compressive noisy RSS measurements.
- ${\mathsf{\Phi}}_{\ell \times N}$ is the measurement matrix. Each row in this matrix represents the location of one RP, with an element of 1 to indicate the grid point at which the RP is located. Thus, RSS values are collected only at the RP locations rather than at every point of the grid, which reduces the workload in the offline phase.
- ${\mathsf{\Psi}}_{N\times N}$ is the sparsity basis on which the measured signals have sparse coefficients $\mathsf{\Theta}$. In this matrix, ${\mathsf{\Psi}}_{ij}=RSS\left({d}_{ij}\right)$ is the RSS value collected at grid point $i$ from an AP located at grid point $j$, for all $1\le i\le N$ and $1\le j\le N$. Assume that the transmission power of an AP is ${P}_{t}$ (dBm). Then $RSS\left(d\right)$ is calculated based on the empirical indoor propagation model of [20]:$$RSS\left(d\right)=\begin{cases}{P}_{t}-40.2-20\log\left(d\right), & \text{if } d\le 8\\ {P}_{t}-58.5-33\log\left(d\right), & \text{if } d>8\end{cases}$$
- $\epsilon $ is the measurement noise.
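To make the roles of $\mathsf{\Phi}$ and $\mathsf{\Psi}$ concrete, the sketch below builds both matrices for a small grid. It rests on several assumptions that are not stated in the text: distances are in metres, the logarithm is base 10, and the zero distance on the diagonal is clamped to a hypothetical `d_min` so the model stays finite. All function names are illustrative.

```python
import numpy as np

def rss_model(d, p_t):
    """Piecewise propagation model from the text (assumed: d in metres, log base 10)."""
    d = np.asarray(d, dtype=float)
    near = p_t - 40.2 - 20.0 * np.log10(d)
    far = p_t - 58.5 - 33.0 * np.log10(d)
    return np.where(d <= 8.0, near, far)

def build_sparsity_basis(grid_xy, p_t, d_min=0.5):
    """Psi[i, j] = RSS at grid point i from an AP placed at grid point j."""
    diff = grid_xy[:, None, :] - grid_xy[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Assumption: clamp distances so that d = 0 (same grid point) stays finite
    return rss_model(np.maximum(dist, d_min), p_t)

def build_measurement_matrix(rp_grid_idx, n_grid):
    """Phi has one row per RP, with a single 1 at the RP's grid index."""
    phi = np.zeros((len(rp_grid_idx), n_grid))
    phi[np.arange(len(rp_grid_idx)), rp_grid_idx] = 1.0
    return phi
```

With these matrices, `phi @ psi` simply selects the rows of $\mathsf{\Psi}$ at the RP locations. Under the usual compressed sensing observation model $\mathbf{y}=\mathsf{\Phi}\mathsf{\Psi}\mathsf{\Theta}+\epsilon$ (assumed here from the symbol definitions above), that product is the matrix a sparse recovery solver would work with.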

## 6. RSS Difference-Aware Graph-Based Semi-Supervised Learning RSS Smoothing Method

#### 6.1. Estimation of ${\widehat{R}}_{d}({S}_{i},{S}_{j})$

#### 6.1.1. Offline Training Phase

#### 6.1.2. Online Localization Phase

#### 6.2. Finding the Optimal Solution

#### 6.3. Experimental Results

- Set one of the labelled points as unlabelled.
- Use the remaining labelled points, the 125 unlabelled points, and the RG-SSL method to estimate the RSS value of the held-out point.
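The two steps above amount to a leave-one-out evaluation, which can be sketched as follows. The estimator itself is not reproduced here: `estimate_rss` is a hypothetical callable standing in for the RG-SSL method, assumed to take the labelled positions, their RSS values (one AP, scalar per point), and the positions to estimate, and to return one RSS estimate per queried position.

```python
import numpy as np

def leave_one_out_errors(labelled_xy, labelled_rss, unlabelled_xy, estimate_rss):
    """Hold out each labelled point in turn and measure the RSS estimation error."""
    errors = []
    for i in range(len(labelled_xy)):
        # Treat labelled point i as unlabelled
        keep = np.arange(len(labelled_xy)) != i
        # The held-out point joins the unlabelled pool as its last entry
        pool_xy = np.vstack([unlabelled_xy, labelled_xy[i:i + 1]])
        # estimate_rss is a placeholder for the RG-SSL estimator (assumption)
        estimates = estimate_rss(labelled_xy[keep], labelled_rss[keep], pool_xy)
        errors.append(abs(estimates[-1] - labelled_rss[i]))
    return np.array(errors)
```

The returned array holds one absolute error per labelled point, so summary statistics such as the mean can be computed directly from it.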

## 7. RSS Difference-Aware Sparse Graph-Based Semi-Supervised Learning Method and Experimental Results

#### 7.1. Sparse Graph Construction for RG-SSL Using CS Method

#### 7.2. Experimental Results

## 8. Conclusions

**Figure 2.** RSS values of an AP over the corridor area of the fourth floor of the Bahen Building, University of Toronto.

**Figure 10.** Comparison of signal distribution of radio map. (**a**) Original radio map; (**b**) RG-SSL method; (**c**) G-SSL method; (**d**) SCTW method; (**e**) SPORT method.

**Figure 11.** Comparison of signal distribution of test data. (**a**) Original radio map; (**b**) RG-SSL method; (**c**) G-SSL method; (**d**) SCTW method; (**e**) SPORT method.

**Figure 12.** Comparison of localization results. (**a**) Original radio map; (**b**) RG-SSL method; (**c**) G-SSL method; (**d**) SCTW method; (**e**) SPORT method.

**Figure 14.** Comparison of weighted graphs. (**a**) Weighted graph calculated by heat kernel method; (**b**) Weighted graph calculated by CS method.

**Figure 15.** Smoothed signal distribution of radio map and localization results using RSG-SSL. (**a**) Smoothed signal distribution of radio map; (**b**) Localization results.

Algorithm | Cumulative Probability at 2 m Location Error | Average Error (m) | Maximum Error (m) |
---|---|---|---|
RG-SSL | 63.5% | 2.07 | 4 |
G-SSL | 60% | 2.13 | 4.5 |
SPORT | 60% | 2.15 | 4.5 |
SCTW | 54.3% | 2.24 | 5.5 |
Original data | 42.9% | 2.89 | 10 |

