Estimation of Indoor Location Through Magnetic Field Data: An Approach Based On Convolutional Neural Networks

Estimation of indoor location is an interesting research topic since it is a main contextual variable for location-based services (LBS), eHealth applications and commercial systems, among others. For instance, hospitals require the location of their employees, as well as the location of their patients, to offer services at the right moments of need. Several approaches have been proposed to tackle this problem using different types of artificial or natural signals (e.g., WiFi, Bluetooth, RFID, sound, movement). In this work, the development of an indoor location estimation system is proposed, relying on the magnetic field data of the rooms, which has been shown to be unique and quasi-stationary. For this purpose, the spectral evolution of the magnetic field data is analyzed as a bi-dimensional heatmap, avoiding temporal dependencies. A Fourier transform is applied to the bi-dimensional heatmap of the magnetic field data, which then feeds a convolutional neural network (CNN) that generates a model to estimate the user's location in a building. The CNN model for deploying an indoor location system (ILS) is evaluated through the Receiver Operating Characteristic (ROC) curve, which captures its behavior in terms of sensitivity and specificity. Our experiments achieve an Area Under the Curve (AUC) of 0.99 on the training data set and 0.74 on a fully blind data set.


Introduction
People's indoor location estimation (ILE) provides one of the most important context variables, since location is essential data for location-based services (LBS). It allows the features of these services to be extended in many fields, such as eHealth [1], shopping centers [2] and automotive applications [3], among others. Consequently, the development of indoor location systems (ILS) is an active research topic, improved every day by new proposals that take advantage of the availability of mobile devices, such as smartphones, which can detect many natural or generated signals [4].
Regarding generated or artificial data sources, several approaches have been proposed to develop ILS. For instance, there are proposals that use the propagation of signals such as Bluetooth, Zigbee, WiFi and radio frequency identification (RFID), among others [5][6][7][8]. These proposals have proved able to yield accurate and robust ILS through the use of sensors and diverse devices, enabling commercial and well-known systems, e.g., LANDMARC [9], Bluepos [10] and CLIPS [11], to mention some [12,13]. Nevertheless, these approaches are complicated and expensive to use, given the specific infrastructure that must be deployed in the buildings where the ILS will operate. Additionally, the coverage of these systems is constrained by infrastructure materials, line of sight, signal propagation, etc., limiting the area over which the user's location can be estimated.
Approaches that use natural signals, for instance light, environmental sound or magnetic field data, avoid the need to install dedicated infrastructure to generate data for an ILS and increase the capability to cover wide areas. Most natural signals can be sensed with common devices such as smartphones, since these include not only magnetic field sensors but also proximity and acceleration sensors, allowing data on the earth's magnetic field to be collected.
For instance, Majeed et al. [14] propose an indoor location method using LED luminaries, where passive user participation is not needed; however, specific LED lights are. Guan et al. [15] use visible light communication and the Robot Operating System (ROS) to develop an ILS. They propose the use of the double-lamp principle, reporting a 1 cm error in real time; however, it is a specific application of light-based localization for a mobile robot.
One approach that uses sound to develop an indoor positioning system is presented by Chang et al. [16]. This research proposes the use of deep learning to model the behavior of sound in a building, predicting the location based on the received sound and claiming 90% accuracy. A mobile system developed for smartphones, called EchoTag, is proposed by Tung et al. [17]. They claim that one of its major improvements is that no pre-installed infrastructure is needed: acoustic signatures or fingerprints are generated with the phone's speaker. An average accuracy of 90% is reported one week after the fingerprint collection.
However, the previous approaches require signals that are easily affected by external phenomena. For instance, light varies even with the earth's rotation, and artificial light from bulbs or LEDs changes whenever a replacement is made. In the case of environmental sound in a non-controlled environment, any type of noise could lead to estimation errors, in addition to the noise or offset that may be added by the microphone. Therefore, with the development and availability of magnetic sensors in common devices such as smartphones, approaches that use the magnetic field signal as an information source have been proposed [18][19][20], given that it is robust to natural earth phenomena such as rotation and translation [18].
Magnetic field anomalies can be used to describe specific locations inside a building [21], and novel approaches include the use of complex machine learning techniques. Ashraf et al. [22] propose a smartphone approach applying a deep neural network ensemble. A soft voting criterion is defined over the predictions of multiple neural networks (NN), claiming an average error of 2.8 meters. In this sense, convolutional neural networks (CNN), which are commonly used to classify images [23][24][25], have been proposed to develop ILS. Al-homayani et al. [26] propose the use of CNNs to develop an ILS, using a smartphone as the sensor and a matrix of magnetic data points as the CNN input. They propose 317 points of interest and collect 58,374 samples, describing each point with a 2-row by 3-column matrix; however, a maximal error of 40.76 meters is reported. Also, an approach presented by Ashraf et al. [27] applies CNNs to identify the floor on which localization must be done and then combines this with a WiFi approach to obtain the location. However, for several applications, identifying a certain place, such as the floor or a particular room, is enough to provide indoor services.
Based on the above, this study proposes the use of a CNN applied to magnetic field data to identify particular rooms. In this approach, the magnetic field data of a room are viewed as a bi-dimensional heatmap of energy that characterizes that room. These heatmaps are then used to feed a CNN comprising two convolutional layers, one max pooling layer and two dropout layers. The objective of this CNN is to classify each heatmap into one of the possible classes, estimating the room it corresponds to. This supports the development of new ILS approaches based on a natural signal that requires no deployed infrastructure, using common devices, i.e., smartphones, with an algorithm that can be easily upgraded and implemented in any type of building.
Three main goals are pursued in this work: (1) to determine whether magnetic field information can be used as a bi-dimensional data source to develop ILS and other types of applications; (2) to improve, as a first step, related approaches that implement CNNs with magnetic field data, given that CNNs can be improved by changing or adding new layers to the architecture. Finally, two important aspects of any ILS are (a) that it must be a generalized model (i.e., independent of a specific device or user) and (b) that it must be easy to implement on mobile devices. Therefore, goal (3) is to demonstrate that magnetic data contain enough information to develop a CNN model that can be encapsulated and ported to mobile devices.
This paper is organized as follows: Section 2 presents a detailed description of the magnetic field data set used, as well as the methods applied for the development of the ILS. Section 3 presents the experiments performed using the spectral evolution of the magnetic field data from several rooms, together with the results obtained. In Section 4, the discussion and conclusions are presented and, finally, Section 5 outlines future work.

Materials and Methods
In this section, magnetic field sensors and data that describe each of the rooms and the building locations are detailed. Additionally, methods used to develop the indoor location estimation model through a CNN are presented.

Magnetic Field Sensors
As mentioned in the previous section, LBS are one of the principal applications of ILS. Therefore, to promote the adoption of this type of proposal, well-known and common devices are proposed to sense the magnetic field data, i.e., smartphones, which are widely spread and include magnetic field sensors among others, such as light, proximity and acceleration sensors. Three smartphones with different types of sensors were used to collect data on the earth's magnetic field.
On the software side, an application that collects the magnetic sensor data from these smartphones was developed. To add complexity and variability to the data, three different users performed the sensing. The first user performed the magnetic field sensing using a Motorola G2, with a Bosch BMC150 magnetic sensor. The second user used a Sony Xperia 7 with the MPC d sensor and, finally, the last user performed the sensing with a Nexus 7 smartphone. The devices were selected based on sensor quality, so as to cover a wide spectrum from low- to high-end smartphones.

Data Pre-Processing
The original magnetic field readings are composed of an individual reading for each Cartesian axis, i.e., the x, y and z components, as well as anomalous readings due to sudden changes in the magnetic field or erroneous sensor readings. Therefore, pre-processing is required to obtain tidy data. The pre-processing tasks are described below:

Magnitude of Magnetic Field Data
Raw magnetic field data, composed of the three components described above in micro Teslas, are processed using Equation (1) to obtain the magnitude of each reading.
Working with the magnitude of the magnetic field data makes it possible to avoid restricting the position of the smartphone, since the reading obtained by the sensor for each axis depends on the device's orientation (i.e., screen position with respect to the hand).
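As a minimal NumPy sketch of Equation (1) (function and array names are illustrative):

```python
import numpy as np

def magnitude(x, y, z):
    """Euclidean magnitude of 3-axis magnetometer readings, in micro Teslas."""
    return np.sqrt(x**2 + y**2 + z**2)

# A reading of (3, 4, 12) µT has magnitude 13 µT regardless of orientation
m = magnitude(np.array([3.0]), np.array([4.0]), np.array([12.0]))
```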

Normalization
To avoid a possible spatial scaling due to the different sensors used, a Z normalization is applied to each of the magnetic field readings that comprise the magnetic signature, using Equation (2), where z_{i,d} is the normalized reading, r_{i,d} is the i-th observation of the signature in dimension d, µ_d is the mean value of the signature for dimension d and σ_d is its standard deviation.
Equation (2) was applied to all dimensions in R^d.
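As a sketch, Equation (2) applied to a one-dimensional signature (the multi-dimensional case simply repeats this per dimension d):

```python
import numpy as np

def z_normalize(signature):
    """Equation (2): subtract the signature mean and divide by its std."""
    return (signature - signature.mean()) / signature.std()

# The normalized signature has zero mean and unit standard deviation
z = z_normalize(np.array([10.0, 20.0, 30.0, 40.0]))
```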

Energy Grouping
As a final step, an energy grouping process was performed, where the Fast Fourier Transform (FFT) was applied.
The FFT can be described as a simple and efficient method to compute the Discrete Fourier Transform (DFT), which allows us to characterize linear systems and identify the frequency components of a sampled waveform [28]. Therefore, to calculate the DFT of an array with a fast algorithm, i.e., the FFT, Equation (3) can be applied, where z[k] represents the vector of values to transform, h = 1, ..., n, with n the length of z, exp(−2πi(k − 1)(h − 1)/n) is a primitive nth root of 1 and the value returned is the normalized univariate DFT of the sequence of values of z [29].
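Equation (3) corresponds to the normalized DFT available in NumPy; a minimal sketch:

```python
import numpy as np

def spectral_energy(z):
    """Normalized univariate DFT magnitudes of a signature (Equation (3))."""
    return np.abs(np.fft.fft(z) / len(z))

# A constant signal concentrates all of its energy in the DC (first) bin
e = spectral_energy(np.ones(8))
```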
Once initial magnetic signatures were collected in the rooms and stored in a data set created during the signature collection, feature extraction and selection were carried out to get the initial classification model that allows us to estimate the location of users.

Data set Description
The data set of earth's magnetic field signatures comprises the data of 11 rooms. Each signature consists of 1000 points describing the magnitude of the readings from a room, collected in the school building shown in Figure 1 and available at the research group website (the data set can be publicly accessed at http://ingsoftware.reduaz.mx/amidami). The rooms (classrooms, bathrooms, corridors and offices) were selected due to several factors that make them interesting for this study: their location, construction materials and electrical circuits, and the fact that some walls are shared between rooms, creating common spaces that produce similar readings between two rooms and add complexity to the development. To ensure statistical validity, the minimum number of magnetic field signatures for each room was determined using Equation (4), as proposed by Galvan-Tejada et al. [18]. In this equation, the result x is the number of signatures needed to develop the model and N is the number of variables used for the experimentation. In this work, N equals 33,000 (11 rooms multiplied by 1000 unique magnetic field data points, multiplied by three different magnetic sensors), yielding a minimum of 16 signatures per room.
In Table 1, the selected rooms are presented, together with the number of fingerprints of each. It can be observed that the minimal number of magnetic field fingerprints is 36, from one classroom and one corridor. Therefore, 36 was selected as the number of fingerprints used to train and test the CNN.

Table 1. Selected rooms and number of collected fingerprints.

Room  Description             Fingerprints
 –    Partial classroom       37
 10   Partial classroom       36
 11   Partial classroom       42
 12   Partial classroom       37
 13   Partial classroom       39
 14   Partial classroom       42
 22   Lower narrow corridor   42
 23   Upper narrow corridor   36
 24   Upper narrow corridor   37
 25   Right wide corridor     38

Convolutional Neural Network
CNNs are a particular type of artificial neural network (ANN) designed to accurately represent the learned data. They are characterized by up to five base layers: an input layer, a convolutional layer, a pooling layer, a fully connected layer and an output layer. CNNs also present the important characteristic of extracting abstract features directly from the input data, which has led to important contributions in different computing fields, such as classification, detection and segmentation of data [30]. This method, like other similar methods, uses a two-stage process: a feature learning stage and a classification stage. Each stage consists of one or more layers; the feature learning stage is implemented by combining two types of layers, usually convolutional layers and pooling layers. In this stage, the most significant features are extracted from the training subset of the data. These features are then provided to a fully connected ANN layer.
A traditional ANN architecture of N layers, with an input vector X_0, produces an output vector X_n = f(W_n X_{n−1} + b_n), where X_{n−1} is the input vector to the n-th layer. In order to achieve the minimum error between the desired and the actual output, an optimized set of weight vectors, W_n, and bias vectors, b_n, is calculated as shown in Equations (5) and (6).
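The layer recurrence X_n = f(W_n X_{n−1} + b_n) can be sketched in NumPy; the activation f is left as a parameter, since the generic ANN formulation does not fix it (tanh here is only an example):

```python
import numpy as np

def forward(x0, weights, biases, f=np.tanh):
    """N-layer forward pass: X_n = f(W_n @ X_{n-1} + b_n)."""
    x = x0
    for W, b in zip(weights, biases):
        x = f(W @ x + b)
    return x

# One identity layer with zero bias just applies the activation element-wise
y = forward(np.array([0.0, 1.0]), [np.eye(2)], [np.zeros(2)])
```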
Therefore, the first layer of a CNN, as mentioned above, usually is a convolutional layer, which slides a set of filters over the input data. The results obtained from each subset are mapped to a single point, repeating the process over the entire data set. This output is then provided to a pooling layer. The purpose of the pooling layer is to simplify the calculations produced by the convolutional layer by reducing its size and to make the CNN more invariant. To reach this purpose, a convolutional layer with a kernel size equal to the pooling size and a large stride is used.
Consider a feature map, f, of a convolutional layer of size w × h × n, where w and h represent the width and height, respectively, and n the number of filters used in the convolutional layer. Pooling the feature map, f, with a pooling size k and a stride r gives a three-dimensional array, S, represented in Equation (7), where p refers to the order of the p-norm and p → ∞ yields the max pooling operation. g(h, w, i, j, u) = (r · i + h, r · j + w, u) is a mapping function from positions in S to positions in f. Then, in Equation (8), a calculation relating the convolutional layer and the pooling layer is presented, where θ refers to the kernel weights, σ(·) is the rectified linear unit activation function (σ(x) = max(0, x)) and o ∈ N, with N the number of output channels [31,32].
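A numeric check of Equation (7)'s p-norm pooling on a single window shows that, as p grows, the result approaches the max pooling value (a sketch, not the paper's implementation):

```python
import numpy as np

def p_norm_pool(window, p):
    """p-norm pooling of one window; p -> infinity recovers max pooling."""
    w = np.abs(window.astype(float))
    return (w ** p).sum() ** (1.0 / p)

window = np.array([[1.0, 2.0], [3.0, 4.0]])
pooled = p_norm_pool(window, 200)   # close to window.max() == 4.0
```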

Model Training
For the model training, the parameters used and the established CNN architecture are described below and shown in Figure 2. Initially, two convolutional layers (2-dimensional) are included, differing in the number of filters used in each, 32 and 64, respectively. Both layers use a kernel size of 3 × 3 and the rectified linear unit (ReLU) as activation function. Then, a max pooling layer of size 2 × 2 is included, followed by a dropout layer with a rate of 0.25. This layer probabilistically removes 25% of the inputs to the next layer, simulating a large number of networks with different structures, making nodes more robust to their inputs and reducing the risk of overfitting. A flatten layer is then added to reorder the data into a one-dimensional vector for the following dense layer, which has 128 units and uses ReLU as its activation function. A second dropout layer is added with a rate of 0.5 (50% of the inputs are removed). Finally, a second dense layer is added, with 11 units and Softmax as activation function.
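The described stack translates directly to Keras; a sketch assuming a hypothetical 25 × 40 × 1 input (the exact heatmap dimensions are not specified here) and an unspecified optimizer (Adam is only an assumption):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Input shape and optimizer are assumptions; the layer stack follows the text.
model = keras.Sequential([
    layers.Input(shape=(25, 40, 1)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(11, activation="softmax"),   # one output per room
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```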
The activation function, ReLU, represents a non-sigmoidal function, defined by Equation (9), where x refers to the positive value obtained [33].
On the other hand, the objective of the Softmax activation function is to turn numbers (logits) into probabilities that sum to one, producing a vector that represents the probability distribution over a list of potential outcomes. It is calculated with Equation (10), where z is an arbitrary vector with real values z_j, j = 1, . . . , n, and n is the size of the vector [34].
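Equations (9) and (10) in NumPy (the max-subtraction inside softmax is a standard numerical-stability trick, not part of the paper's formulas):

```python
import numpy as np

def relu(x):
    """Equation (9): max(0, x)."""
    return np.maximum(0.0, x)

def softmax(z):
    """Equation (10): exp(z_j) / sum_k exp(z_k), turning logits into probabilities."""
    e = np.exp(z - z.max())   # shift by the max for numerical stability
    return e / e.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))   # sums to one, largest logit wins
```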

Model Validation
In addition to the self-evaluation process, in terms of accuracy and loss, which is done to evaluate the performance of the CNN in classifying the heatmaps of the rooms, a blind test approach is proposed. Therefore, a percentage of the observations is used to train the CNN and the complement is used to test the performance of the trained model on data that the model has never seen.
For the loss function, categorical cross entropy is implemented. This function is used for categorical data, where for n classes the target for each sample is an n-dimensional vector in which all values are zero except for the index corresponding to the class of the sample, where the value is one. The closer the model's outputs are to this vector, the lower the loss. Therefore, this function compares the distribution of the predictions, represented as q in Equation (11), with the true distribution, represented as p, where the probability of the real class is set to one and that of the rest of the classes to zero.
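A sketch of Equation (11) with a one-hot true distribution p (names and the clipping epsilon are illustrative):

```python
import numpy as np

def categorical_cross_entropy(p_true, q_pred, eps=1e-12):
    """Equation (11): -sum_j p_j * log(q_j); p_true is a one-hot vector."""
    return -np.sum(p_true * np.log(np.clip(q_pred, eps, 1.0)))

p = np.zeros(11); p[2] = 1.0                  # sample belongs to room index 2
loss_good = categorical_cross_entropy(p, p)   # confident, correct prediction
loss_flat = categorical_cross_entropy(p, np.full(11, 1 / 11))
```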
On the other hand, to measure the performance of the model, the area under the ROC curve (AUC) is selected as the metric to evaluate it in terms of sensitivity and specificity. This metric allows us to observe the behavior of the model in recognizing true positives (TP), observations that are classified as the room they actually belong to, and false positives (FP), observations that belong to another room but are classified as the currently evaluated room. Therefore, the AUC explains the general performance of the CNN in classifying the rooms.
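The AUC for one room, treated one-vs-rest, can be sketched with the rank-sum (Mann–Whitney) formulation; this sketch does not handle tied scores:

```python
import numpy as np

def auc(labels, scores):
    """AUC via rank statistics: labels are 1 for the evaluated room, 0 otherwise."""
    order = scores.argsort()
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# A model that scores every positive above every negative reaches AUC = 1.0
perfect = auc(np.array([0, 0, 1, 1]), np.array([0.1, 0.2, 0.8, 0.9]))
```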

Experiments and Results
In this section, the experiments performed and the results obtained are presented. Initially, the data set contains 396 raw magnetic signatures of the rooms, 36 for each of the selected rooms, shown in red in Figure 3, where it can be seen that they cover 40% of the total floor area of the building. As mentioned in the previous section, these rooms were selected for several characteristics, mainly their location; nevertheless, the building construction materials and weather conditions are other important features that can be taken into account in the development, as presented in other studies [19,21,35].
Once the fingerprints are selected, the magnitude is calculated for each fingerprint to obtain 1000 unique data points and the FFT is applied to each one. Subsequently, the one-dimensional data vector obtained is reshaped, converting each uni-dimensional fingerprint into a bi-dimensional heatmap, as shown in Figure 4. This step is performed so that the data are in the appropriate format to be submitted to the CNN. The energy distribution calculated per room is presented in Figure 5, where the dispersion and symmetry of the data can be observed and it can be identified that each room has a unique behavior, highlighting the following:

• In general, the boxes that represent the energy measurements do not show symmetry, since Q2 is not in the center.
• The boxes show similar interquartile ranges, which adds complexity to the development, as mentioned in Section 2.3, by having some similar measurements between rooms.
• The average measurements for the regions are not similar between rooms.
• There are no outliers, which demonstrates consistency and reliability in the measurements.
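Putting the preprocessing steps together, the path from one raw fingerprint to a CNN-ready heatmap can be sketched as follows; the 25 × 40 shape is an assumption, since the text only states that the 1000-point vector is reshaped to two dimensions:

```python
import numpy as np

def fingerprint_to_heatmap(xyz, shape=(25, 40)):
    """Raw 1000 x 3 magnetometer samples -> bi-dimensional spectral heatmap."""
    m = np.linalg.norm(xyz, axis=1)              # magnitude, Equation (1)
    z = (m - m.mean()) / m.std()                 # Z normalization, Equation (2)
    energy = np.abs(np.fft.fft(z) / len(z))      # normalized DFT, Equation (3)
    return energy.reshape(shape)

rng = np.random.default_rng(0)                   # synthetic fingerprint for illustration
heatmap = fingerprint_to_heatmap(rng.random((1000, 3)))
```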
In the next stage, the proposed model is trained. For the implementation of the CNN, the Python deep learning library Keras [36] is used. The development of the model is divided into two steps, training and testing. For training, the CNN uses a balanced 70% of the data, i.e., 11 fingerprints per room, and training is carried out over 500 epochs. The performance of this step is presented in Figure 6: the upper graph shows the behavior of the loss function and the lower graph the behavior of the accuracy, which increases along the 500 epochs and reaches 0.99. For the validation of the CNN model, the remaining 30% of the data is used as a blind test, i.e., five fingerprints per room. The ROC curve of this step is then calculated, as shown in Figure 7, to observe the behavior in terms of sensitivity and specificity, appreciating true positives and true negatives, and obtaining an AUC of 0.746.
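A per-room 70/30 split consistent with the counts in the text (11 training and 5 test fingerprints per room when 16 per room are used) can be sketched as follows; the seed and function name are illustrative:

```python
import numpy as np

def split_per_room(labels, train_frac=0.7, seed=0):
    """Boolean mask marking ~train_frac of each room's fingerprints for training."""
    rng = np.random.default_rng(seed)
    train = np.zeros(len(labels), dtype=bool)
    for room in np.unique(labels):
        idx = np.flatnonzero(labels == room)
        rng.shuffle(idx)
        train[idx[:round(train_frac * len(idx))]] = True
    return train

labels = np.repeat(np.arange(11), 16)   # 16 fingerprints per room
mask = split_per_room(labels)           # 11 train / 5 test per room
```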

Discussion and Conclusions
In this work, the implementation of a CNN to classify indoor location magnetic fingerprints for location estimation is presented. This approach uses linear magnetic field data points viewed as a bi-dimensional heatmap so that they can be classified with a CNN. The results presented in Section 3 allow us to identify the following aspects:

• Magnetic field fingerprints can be viewed as bi-dimensional data: even though each magnetic field reading is a single data point, a collection of points from a room can be treated as a bi-dimensional heatmap that allows an ILS to be developed with bi-dimensional techniques. In this proposal, these bi-dimensional representations are viewed as a spectral evolution after applying an FFT, as presented in the results section, which means that spectral information and its properties are preserved; hence, even a partial fingerprint has enough information to identify the room.

• CNNs can be used to work with magnetic data: CNNs are currently widely used for the development of classification models in the field of image processing. However, the application presented in this work shows that magnetic field data seen as a bi-dimensional heatmap have the potential to be used as input for training a CNN in order to develop an ILS. According to the results obtained, the classification of indoor locations based on the modeling of magnetic field data achieves statistically significant accuracy. Nevertheless, the CNN could be improved in several ways; for instance, changing the loss function could upgrade its performance for this specific scenario, and a deeper NN could increase the AUC for complex buildings. These modifications, however, need a dedicated study to be sure that the overfitting problem is avoided.

• Magnetic field data present enough information to develop an ILS: several approaches include the magnetic field as a second data source to complement another type of signal. However, viewed as a bi-dimensional data source, the magnetic field has enough information to develop an ILS, achieving almost 75% AUC. Nevertheless, the reduction of the AUC on the blind set could reflect an overfitting problem that must be studied.

• Deeper networks and longer training improve the fitness of the CNN: the accuracy of the CNN increases along the 500 epochs proposed in this work, meaning a better ILS, complemented by the reduction of the loss metric.
One interesting point of this research is that the magnetic field can be viewed as a bi-dimensional signal, allowing it to be studied with digital image processing techniques, such as CNNs, which are mainly used to classify images.
Nevertheless, even though the AUC of the blind test is almost 0.75, it is much lower than the 0.99 obtained with the training set, meaning that the architecture could be modified to perform better on the blind test, for example, by increasing the number of layers or, alternatively, the number of epochs. However, both options could lead to an overfitting problem, so they must be analyzed.

Future Work
The magnetic field has been shown to be usable for developing an ILS based on a CNN approach. However, it is important to mention that there are several problems and improvements that could be addressed. Therefore, we propose as future work the analysis of other types of architectures to increase the fitness on the blind test without incurring the overfitting that could appear, as well as testing longer training to adjust the weights of the neural network. It is also possible to evaluate the model in different scenarios with variable configurations (size and distribution of spaces, wall materials, etc.) to observe the behavior of the results. In addition, increasing the data set would allow us to apply transfer learning as another approach to evaluate the performance of the CNN, where learning curves can also be used to evaluate its behavior.
Finally, the developed CNN will be implemented in mobile devices through the development of a mobile app for the most common operating systems, such as Android and iOS.