1. Introduction
Indoor localization has become one of the most active research areas of the last decade. The proposal of RADAR [1] pioneered indoor positioning using radio signals. The emergence of smartphones in the 21st century paved the way for localization to flourish, and the inception and penetration of location-based services (LBS) further accelerated research on indoor localization. Today, LBS are offered to a large number of users, both indoors and outdoors. The global positioning system (GPS) can achieve localization accuracy ranging from 17 m to better than a few meters [2]. However, this accuracy depends on many factors, including the number and geometry of collected observations, the mode and type of observation, the measurement model, the level of modeled biases, the design of the GPS receiver, and the surrounding land structure, such as the presence or absence of obstacles [2,3]. GPS is used for outdoor positioning, yet its sensitivity to occlusions such as ceilings and walls makes it inappropriate and inefficient for indoor localization. Although GPS can be used indoors when the user is close to wide windows and the receiver can still acquire signals, the provided location may carry an error that in certain scenarios is larger than the indoor localization area itself. The lower signal-to-noise ratio and multipath phenomena result in a less reliable position [4]. This has led researchers to investigate alternative technologies that could overcome such limitations and work efficiently in indoor environments.
A large body of work has been presented on such technologies, including ultra-wideband (UWB) [5], radio frequency identification (RFID) [6], Wi-Fi [7], and vision [8]. These technologies are, however, limited by their dependence on additional hardware (with the exception of vision), which must be installed in the area intended for localization. Their wide applicability is further restricted by their software and hardware limitations. For example, RFID is based on short-range communication and works only in the small area where RFID tags have been installed. UWB-based indoor localization systems provide precise position information but are expensive; additionally, in complex and occluded environments, more nodes are required to achieve higher accuracy, which further increases the cost [9]. Vision-based indoor localization does not need additional infrastructure, but it requires a significant amount of computational resources to perform image matching. Modern graphics processing units (GPU) can perform the image matching in a reduced amount of time; however, the performance of vision-based systems degrades under low lighting conditions and poor image quality, which can result from the orientation in which the user holds the phone.
The proliferation and wide usage of modern smartphones present a potential solution to many of the above-mentioned limitations. Today, smartphones are equipped with a variety of sensors that can be leveraged to perform indoor localization. Smartphone sensors, including Wi-Fi, Bluetooth, and the camera, have resulted in the development of many localization techniques. Wi-Fi and Bluetooth based localization systems suffer from the inherent limitations of wireless communication, e.g., propagation losses and environmental changes cause substantial changes in the received signal strength (RSS) [1,10,11]. The problems of multipath shadowing, signal fading, and the impact of other dynamic factors, including human mobility, on signal fluctuation may lead to very high localization error. In the same fashion, human body loss causes signal absorption, and the resulting change in RSS leads to higher localization error [12]. Additionally, the RSS has been found to depend on hardware and antenna design, which may be an inherent limitation of Wi-Fi positioning accuracy [13]. The sensors embedded in smartphones are utilized in a variety of practical tasks. The authors of [14] present an object classification framework using a hyperspectral camera. Similarly, wearable sensors are utilized in many practical applications. A triaxial accelerometer-based human motion detection system is proposed in various research works [15,16], where features extracted from the data are used in machine learning-based models. Feature extraction for such applications is very important, and various works aim at finding suitable features for these tasks. The research in [17] works on feature extraction on unmanned aerial vehicles (UAV). An algebraic representation of spatio-temporal real-world objects is presented in [18]. Local descriptors are used to track a person across two different cameras with support vector machines (SVM) [19]. In the same way, similar human interactions are recognized with a supervised framework in [20]. Motion estimation is another important task in today's real-world applications, and a large body of work exists on human activity detection [21], motion estimation, and real-time motion detection through multiple cameras [22,23]. Various sensors have been employed to achieve such tasks. For example, the authors of [24] present a geometric-constrained multi-view image matching method that aims at the efficient and reliable processing of multiple remote sensing images. Machine learning [25], as well as deep learning frameworks [26], have also been utilized for human and human interaction detection.
Geomagnetic field-based localization has emerged as a new paradigm during the last few years [27,28,29]. Today, a large body of work [30,31,32,33] utilizes the earth's magnetic field data for indoor localization. The geomagnetic field (referred to as the magnetic field for convenience) is a natural phenomenon caused by the flow of convection currents in the outer core of the earth. The magnetic field is a vector field and possesses both direction and magnitude, so three parameters are required to represent it at a point. The north, east, and downward components are represented by x, y, and z. A common way to represent the magnetic field is through the total intensity F, the inclination I, and the declination D. However, the most widely used representation is through the magnetic x, y, z, and F. Another way of describing the magnetic field is through the horizontal component H, the vertical component z, and the declination D [34]. The total magnetic strength on the earth's surface varies from 25 to 65 microtesla [35]. The magnetic strength and its direction do not change over a small restricted area, yet man-made structures obstruct the magnetic field and alter it, causing magnetic disturbances. Such disturbances are called anomalies and have been observed to exhibit unique behavior. These magnetic anomalies have been studied and used as fingerprints in many research works [30,36]. However, techniques that utilize magnetic field fingerprints have two major limitations. First, owing to the use of various magnetometers in heterogeneous smartphones, the measured magnetic strength differs even at the same location [27]. This limits the wide applicability of magnetic field based localization systems, as making a common fingerprint for various smartphones is not possible. As a result, the localization error differs even when a single localization approach is adopted across smartphones. Second, multiple distant locations may have very similar magnetic signatures due to the indoor environment; this is highly probable, especially when the localization space is large. On account of the above-mentioned shortcomings, the fingerprinting technique alone is not well suited for indoor positioning with magnetic data. This study aims to leverage deep neural networks (NN) to address these issues.
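The alternative representations of the field mentioned above are related by simple vector geometry: H and F follow from the x, y, z components by the Pythagorean relation, while the declination D and inclination I are the corresponding angles. A minimal sketch (function and variable names are ours, values in microtesla):

```python
import math

def magnetic_representations(x, y, z):
    """Convert north/east/down components (x, y, z) of the geomagnetic
    field into the alternative representations: total intensity F,
    horizontal intensity H, declination D, and inclination I."""
    H = math.sqrt(x**2 + y**2)            # horizontal component
    F = math.sqrt(x**2 + y**2 + z**2)     # total intensity
    D = math.degrees(math.atan2(y, x))    # declination: angle of H east of north
    I = math.degrees(math.atan2(z, H))    # inclination: dip below horizontal
    return {"F": F, "H": H, "D": D, "I": I}

# Example: a mid-latitude reading of roughly (20, 1, 43) microtesla
rep = magnetic_representations(20.0, 1.0, 43.0)
```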
Deep learning has recently been utilized to solve many problems, and indoor localization is no exception. Deep NNs and convolutional neural networks (CNN) have been used for indoor scene recognition, object detection, localization, etc. However, a single NN may perform poorly on noisy data. This is why this study proposes the use of an ensemble of multiple NNs that are trained separately. The prediction from each of these classifiers is then employed to find the final location of the user. Deep learning is a data-intensive technique and requires a large amount of data for training; for this purpose, thousands of magnetic samples have been collected. The key contributions of this research can be summarized as follows:
- A deep neural network (NN) based approach is presented that performs indoor localization based on features extracted from the magnetic data.
- A soft voting criterion is defined to ensemble the predictions of multiple NNs. All NNs are trained with the same magnetic data to predict the user's current location.
- The proposed approach is tested with heterogeneous devices, including the Galaxy S8, LG G6, LG G7, and Galaxy A8, to evaluate the localization accuracy. The results are compared with support vector machines (SVM) and another magnetic localization approach.
- Besides our own collected dataset, the proposed approach is tested on a publicly available magnetic dataset in which the data were collected with a Sony Xperia M2 smartphone.
- The impact of varying device attitude has also been investigated, where the device attitude is changed from 'navigation' to 'call listening' and 'front pocket' modes to analyze the localization performance of the proposed approach.
The rest of the paper is organized in the following manner. Section 2 presents an overview of a few studies related to this research. The current challenges of magnetic field based positioning are discussed in Section 3. Section 4 describes the proposed approach, while Section 5 details the experiment setup and analyzes the results. Finally, conclusions are given in Section 6.
2. Related Work
The application of magnetic field data for indoor localization has been investigated in many research works. Such research includes the analysis of properties of magnetic field data that can be used for localization, as well as the impact of using various devices and of the attitude of these devices [28,30,36]. Research works using the magnetic field can broadly be categorized into three groups: those using magnetic field data alone; hybrid approaches that combine the magnetic field with Wi-Fi, pedestrian dead reckoning (PDR), vision, etc.; and approaches that utilize machine/deep learning. A few works related to each category are discussed here.
The authors of [37] investigated the use of a smartphone magnetometer to perform indoor localization. The investigation aimed at studying the localization performance of magnetic data alone. The localization error is low if more elements of the magnetic field are used; however, the error may rise up to 20 m when the localization area is large and structurally complicated. The proposed technique is based on a fingerprint database of magnetic field data, which is laborious and time-consuming to build. The authors of [31] used a crowdsourcing approach to build the fingerprint database and minimize the labor and cost involved in fingerprinting. They employed a revised Monte Carlo technique to locate a pedestrian indoors. The proposed approach is able to converge to a 5 m area using 30 s of data. The research suggests that the localization error using magnetic field data alone is high and that other assistive technologies could help to lower it. Therefore, many research works focus on the use of magnetic field data together with other localization techniques.
For example, an indoor localization system is presented in [38] that combines Wi-Fi signals with magnetic field data to build the fingerprint database. Initially, Wi-Fi access points (AP) are used to calculate an approximate location. This position is later used to restrict the search space, which helps reduce the localization error to 4.5 m; using magnetic field data alone with the proposed technique results in an error of up to 16.6 m. Similarly, the authors of [39] present an approach based on the fusion of magnetic field data with PDR. An artificial neural network is used to identify the user's walking and stationary modes. The user's movement is tracked at regular intervals, and the relevant position is utilized to refine the magnetic position. The reported accuracy is 2–3 m at 50% with two different smartphones. Furthermore, an approach is proposed in [40] that works with PDR and magnetic data to locate a user in an indoor environment. An approach similar to a particle filter is adopted, which takes into account the PDR and magnetic position of the user and predicts the user's final location. Experiment results show under 2 m accuracy with two different devices.
Research shows that the fusion of more than one localization technology can significantly improve localization accuracy. For instance, the authors of [41] present a system called WAIPO. The system is based on the fusion of Wi-Fi and magnetic fingerprints, image matching, and people's co-occurrence. Initially, the position is estimated using Wi-Fi fingerprints, which can be further refined with image matching and Bluetooth beacons. The final position is then calculated with the magnetic data from the user's smartphone. The reported accuracy of WAIPO is under 2 m at 98%.
Various machine learning models have also been proposed for human detection in indoor environments. Such models work on various features extracted from sensor data and perform human and object detection. For example, the research works [42,43] focus on human interaction recognition with the help of artificial neural networks; a genetic algorithm is applied to identify prominent objects under varying environmental settings. Likewise, the authors of [44] make use of a graph kernel-based SVM and bag-of-words to perform abnormal activity detection. The authors of [45] investigate the use of K-means++ and support vector data description to cluster the data into regions of interest. Additionally, the use of pyroelectric infrared sensors for abnormal activity detection is reported in [46]: the similarity between normal training samples is measured using the Kullback–Leibler divergence, and one-class SVMs perform the activity detection. Similarly, the use of depth information from depth sensors has been shown to improve human activity recognition and tracking in smart houses [47,48].
Recently, the use of deep learning has been reported for localization with smartphone sensors. The authors of [49] propose a system that makes use of a variety of smartphone sensors to localize a pedestrian. The research uses the smartphone camera, motion sensors, compass, magnetometer, and Wi-Fi to perform the localization. A CNN is designed to identify the indoor scene, and the recognized scene is later used to narrow down the search space in the magnetic database. The reported localization error is 1.32 m at 95%. In the same fashion, the research in [50] proposes a multi-story localization approach based on smartphone sensors. The smartphone camera is utilized along with the magnetometer; instead of magnetic intensity, magnetic patterns are used to build the database. Smartphone camera pictures are used for indoor scene recognition, and the CNN model helps to identify the specific floor. It also increases the localization accuracy by narrowing down the magnetic database search space. The reported localization error is 1.04 m at 50%. CNNs have been utilized with magnetic data as well to perform localization. For example, the authors of [51] present a magnetic field-based indoor localization method that utilizes a CNN with a smartwatch. The magnetic data, along with smartwatch orientation data, are used for training, and experiments show promising results. NNs have been utilized in Wi-Fi based localization as well. The authors of [52] propose stacked denoising auto-encoder based feature extraction to extract Wi-Fi fingerprints for localization. The proposed approach tackles the problem of RSS fluctuation in dynamic environments and improves the localization accuracy.
The above-mentioned research works are limited by two factors in essence. The first problem lies in the use of Wi-Fi signals, which, as already discussed, are vulnerable to propagation loss; furthermore, the RSS value is subject to dynamic factors, including the presence of obstacles, shadowing, and human mobility. Second, the impact of device heterogeneity has not been studied very well. The few studies that do consider device heterogeneity, in turn, use longer data samples: for example, the authors of [39] consider 14 s of data, while the authors of [40] employ 8 s of data to calculate the final location of the user. Additionally, the use of a smartphone camera drains the battery quickly and is not an efficient solution. Similarly, the camera has its own inherent limitations, including poor image quality under low-light conditions and in dark environments. It is noteworthy that deep learning has so far been utilized on smartphone camera images alone. This study aims to use deep neural networks on magnetic field data to perform indoor localization.
4. Materials and Methods
This section provides the details of the proposed approach, which is based on the use of deep learning to train NNs for localization. The first task is to find suitable features to feed into the NNs.
4.1. Feature Selection
The major limitation of using the magnetic field is device dependence: the intensity of the collected magnetic data may differ depending on the sensitivity of the magnetometer installed in a given smartphone. Another shortcoming of magnetic data is its low dimensionality. The magnetic x, y, and z are traditionally used to build the fingerprint, and these values may be very similar at multiple locations, especially in a large space. Thus, rather than using the raw magnetic field data, this study works with important features extracted from these data. Initially, a total of 18 features, as shown in Table 1, are shortlisted. Then, a feature analysis is performed and the correlation of each feature to the prediction is evaluated, upon which the features 'coefficient of variance', 'kurtosis', 'Shannon's entropy', and 'skewness' are dropped due to their low correlation with the classification label. The correlation of the features is shown in Figure 3.
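Features of this kind are standard windowed statistics. A sketch of how such features might be computed from one window of magnetic magnitude samples (the function name and the feature subset shown are illustrative, not the paper's full set of 18):

```python
import math
import statistics

def magnetic_features(window):
    """Statistical features of one window of magnetic magnitude samples.
    The low-correlation candidates (coefficient of variance, skewness)
    are included only to illustrate the initial candidate set; the study
    drops them after the correlation analysis."""
    mean = statistics.fmean(window)
    std = statistics.pstdev(window)
    return {
        "mean": mean,
        "std": std,
        "min": min(window),
        "max": max(window),
        "median": statistics.median(window),
        "range": max(window) - min(window),
        "rms": math.sqrt(sum(s * s for s in window) / len(window)),
        # candidates dropped after the correlation analysis:
        "coeff_of_variance": std / mean if mean else 0.0,
        "skewness": (sum(((s - mean) / std) ** 3 for s in window) / len(window)
                     if std else 0.0),
    }
```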
4.2. Proposed Approach
The architecture of the proposed approach is shown in Figure 4. The features extracted from the magnetic data are fed into neural networks. In addition, two other sensors, the accelerometer and the gyroscope, are utilized to approximate the user's relative motion and direction. Three different NNs make use of the magnetic features to predict the user's position; the purpose of using three different NNs is to combine their predictions so that the prediction accuracy can be maximized. Each NN has a different architecture in terms of the number of hidden layers, the sequence of layers, and the activation functions. The structure of each NN is shown in Figure 5: each NN comprises a different number of hidden layers, as well as a different placement of dropout and regularization layers. The features extracted from the magnetic data are used to train the neural networks.
In the same fashion, during the positioning phase, the features are first extracted from the user-collected data and fed into each NN to obtain a prediction. The user data are collected for three consecutive frames at a sampling rate of 10 Hz, where each frame comprises 2 s of data. Pre-processing plays an important role in the prediction process: noise in the training data degrades the performance of the classification models, and the data from smartphone sensors contain noise, so pre-processing is performed to remove it. For this purpose, a low-pass filter is applied to the sensor data before feature extraction. The positioning process follows the steps given in Algorithm 1; each step is described in detail below.
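The low-pass filtering step can be illustrated with a first-order exponential filter; a minimal sketch (the smoothing factor is illustrative, as the paper does not specify the filter's parameters):

```python
def low_pass(samples, alpha=0.2):
    """Simple first-order low-pass (exponential) filter: each output is a
    weighted blend of the new sample and the previous filtered value, so
    high-frequency sensor noise is attenuated."""
    filtered = []
    prev = samples[0]
    for s in samples:
        prev = prev + alpha * (s - prev)
        filtered.append(prev)
    return filtered
```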
Step 1 (line 1): The first step is to get the predictions from the three NNs. Instead of a single prediction from each NN, the top k classes with the highest probability are selected, where the value of k is 10; this value is based on empirical findings. The predictions are collected for the first time frame T1 and denoted as P1, P2, and P3 for NN1, NN2, and NN3, respectively.
Algorithm 1 Find user location
1: get predictions P1, P2, P3 from NN1, NN2, NN3
2: for each prediction p in P1 do
3:     for each pair (q, r) in P2 x P3 do
4:         compute d(p, q) and d(p, r)
5:     end for
6:     select the pairs with distance at most λ
7:     add the matching predictions to Lc
8: end for
9: for T in {T2, T3} do
10:     update Lc with the PDR step length and heading
11:     refine Lc with the new NN predictions using threshold λ + c
12: end for
13: return the centroid of Lc as the user location
Step 2 (lines 2–8): Once the top k predictions have been taken from each NN, a voting mechanism is needed to combine them. For this purpose, the Euclidean distance d between the predictions is calculated and a soft voting scheme is followed. A threshold λ is set to select the common predictions from the three NNs. The value of λ is 2 m, and the location candidates Lc are selected with the following criterion:

Lc = { p1, p2, p3 : p1 in P1, p2 in P2, p3 in P3, d(p1, p2) ≤ λ and d(p1, p3) ≤ λ }.    (1)

Equation (1) states that, if the distance between one prediction by NN1 and any of the predictions by NN2 and NN3 is less than or equal to 2 m, then all the matching predictions from NN1, NN2, and NN3 are added to Lc. Since the three NNs may produce the same predictions, duplicate predictions in Lc are removed. Figure 6 shows the predictions from the three NNs for T1. We can see that a large number of predictions fall in a small area, and a few of them may even overlap; this becomes more vivid when the predictions are drawn together. Figure 7 shows all predictions drawn together, where the predictions that are close in the spatial dimension can easily be seen. Once Equation (1) is applied, the predictions not shared by the three NNs can easily be identified. The circled predictions in Figure 7 represent those made by only one of the NNs and not shared by the others (they do not fulfill the criterion set in Equation (1)). Once the predictions that do not fulfill the criterion given in Equation (1) are removed, the selected Lc can be seen in Figure 8.
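The candidate-selection criterion of Equation (1) can be sketched as follows, assuming each NN returns its top-k predictions as (x, y) coordinates in meters (function and variable names are ours):

```python
import math

def select_candidates(p1, p2, p3, lam=2.0):
    """Soft voting across three NNs: if a prediction from NN1 lies within
    lam metres of some prediction from NN2 and some prediction from NN3,
    all three matching predictions join the candidate set Lc. Using a set
    removes duplicate predictions automatically."""
    def d(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    lc = set()
    for a in p1:
        for b in p2:
            for c in p3:
                if d(a, b) <= lam and d(a, c) <= lam:
                    lc.update((a, b, c))
    return lc
```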
Step 3 (lines 9–12): Now, the location candidates Lc are updated for T2 and T3 using the user's estimated step length and heading. The user's approximate relative position is calculated to this end, for which step length estimation and heading are required. Step detection is performed using the method proposed in [39], while step length estimation is done with the Weinberg model [55]:

l = K · (a_max − a_min)^(1/4),

where a_max and a_min are the maximum and minimum acceleration during a time period and K is a constant. Once the step length l has been estimated, the position (x_t, y_t) can be calculated using l and the heading estimate θ as follows:

x_t = x_{t−1} + l · sin θ,    y_t = y_{t−1} + l · cos θ,

where x_t and y_t give the approximate relative position and show only how much the user has traveled in a particular direction during time T. The candidates in Lc can then be updated using the approximated position of the user.
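The Weinberg step-length estimate and the relative-position update above can be sketched as follows (K is a user-specific calibration constant; the value used here is illustrative):

```python
import math

def weinberg_step_length(a_max, a_min, K=0.5):
    """Weinberg model: step length is proportional to the fourth root of
    the acceleration range observed within one detected step."""
    return K * (a_max - a_min) ** 0.25

def update_position(x, y, step_length, heading_rad):
    """Advance the relative position by one step along the heading
    (heading measured from the y axis, matching l*sin/l*cos above)."""
    return (x + step_length * math.sin(heading_rad),
            y + step_length * math.cos(heading_rad))
```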
After that, Lc is refined with the predictions from the NNs for T2. The refining criterion is the same as that given in Equation (1); however, d is now calculated between the updated candidates in Lc and the new NN predictions. As mentioned before, each frame T comprises 2 s of data, and, at a moderate speed, the user can travel up to 2 m during the considered time window. Thus, if the new predictions are within 2 m of the locations given in Lc, they are selected; otherwise, they are dropped. Since the distance data may contain an error due to noise, a compensating factor c is introduced into the threshold; its value, 0.18 m, is based on the error found during the experiments and represents the average error in pedestrian dead reckoning (PDR) estimation over 2 s. The threshold thus becomes λ + c = 2.18 m. The same process is repeated for T3, where the candidates Lc are again updated with PDR data, and the new predictions from the NNs are refined with respect to Lc.

Step 4 (line 13): After this step, the location candidates converge to a small area. Their centroid is then calculated, which gives the user's current predicted location.
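The final centroid computation of Step 4 can be sketched as (a minimal version, assuming the candidates are (x, y) tuples):

```python
def centroid(candidates):
    """Mean of the candidate coordinates gives the predicted location."""
    xs = [p[0] for p in candidates]
    ys = [p[1] for p in candidates]
    return (sum(xs) / len(xs), sum(ys) / len(ys))
```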