Deep Learning for Fingerprint Localization in Indoor and Outdoor Environments

Wi-Fi and magnetic field fingerprinting-based localization has gained increasing attention owing to its satisfactory accuracy and global availability. However, conventional signal-based fingerprint localization deteriorates because of well-known signal fluctuations. In this paper, we propose a Wi-Fi and magnetic field-based localization system based on deep learning. Owing to the low discernibility of magnetic field strength (MFS) in large areas, an unsupervised density peak clustering algorithm based on the comparison distance (CDPC) is first used to select several center points of MFS as geotagged features to assist localization. Considering the state-of-the-art performance of deep learning in image classification, we design a location fingerprint image from Wi-Fi and magnetic field fingerprints. Localization is performed by a proposed deep residual network (Resnet) that is capable of learning key features from a massive fingerprint image database. To further enhance localization accuracy, an MLP-based transfer learning fine localizer leverages the prior information of the pre-trained Resnet coarse localizer and fine-tunes it. Additionally, we dynamically adjust the learning rate (LR) and adopt several data enhancement methods to increase the robustness of our localization system. Experimental results show that the proposed system achieves satisfactory localization performance in both indoor and outdoor environments.


Introduction
In recent years, location-based services (LBSs), both indoors and outdoors, have attracted growing attention and are in massive demand in industry and academia [1]. The successful application of satellite navigation positioning systems (SNPS), such as the Global Positioning System (GPS) and the Galileo navigation system, provides great convenience for travelers. However, in indoor or complex outdoor environments, GPS cannot provide accurate LBS [2]. The multiple sensors built into smartphones have brought new advances for indoor LBS: by using the received signal measurements, localization with Wi-Fi or magnetic signals becomes possible [3].
Traditional localization methods rely on the signal Time of Arrival (TOA), Time Difference of Arrival (TDOA), and Angle of Arrival (AOA) to determine the position of the User Equipment (UE). However, special equipment is needed to measure the signal round-trip time or angle, which makes these methods inconvenient and impractical in many applications. In contrast, most fingerprint-based positioning methods require no dedicated equipment or infrastructure and can be implemented with a single ubiquitous smartphone. In addition, the low-power sensors built into a smartphone consume little energy, even when continuously active [4].

The initial fingerprint-based localization approach relies on K-Nearest Neighbors (KNN) to find the reference points (RPs) that best match the fingerprint database. Later, Bayesian algorithms, Weighted K-Nearest Neighbors (WKNN), and Support Vector Machines (SVM) were proposed to improve the robustness of the positioning system [6][7][8]. In [9], a magnetic-based indoor subarea localization approach was proposed using an unsupervised learning algorithm. A multi-hop approach was leveraged to reduce localization inaccuracies [10].
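As a point of reference for the KNN and WKNN baselines mentioned above, the following is a minimal WKNN sketch. It assumes each RP is stored as an RSSI vector (one entry per AP, in a fixed AP order) together with its (x, y) coordinates, and it uses inverse-distance weights; the function name `wknn_locate` and the `eps` guard are illustrative choices, not details taken from [6][7][8].

```python
import math


def wknn_locate(query, fingerprints, k=3, eps=1e-6):
    """Estimate (x, y) with weighted k-nearest neighbours in RSSI space.

    query        -- RSSI values in dBm, one per AP (fixed AP order)
    fingerprints -- list of (rssi_vector, (x, y)) pairs, one per RP
    """
    # Euclidean distance in signal space between the query and every RP.
    scored = [(math.dist(query, rssi), pos) for rssi, pos in fingerprints]
    scored.sort(key=lambda item: item[0])
    nearest = scored[:k]
    # Closer RPs get larger weights; eps avoids division by zero on an exact match.
    weights = [1.0 / (d + eps) for d, _ in nearest]
    total = sum(weights)
    x = sum(w * pos[0] for w, (_, pos) in zip(weights, nearest)) / total
    y = sum(w * pos[1] for w, (_, pos) in zip(weights, nearest)) / total
    return x, y
```

Plain KNN corresponds to replacing the inverse-distance weights with equal weights.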
However, the main obstacle to accurate fingerprint localization is signal fluctuation, such as the adverse impact of multipath fading and signal attenuation by furniture, walls, and people. In addition, accurate positioning requires collecting more RPs; therefore, the workload of constructing a fingerprint database tends to be tremendous. Consequently, the main challenge in fingerprint-based localization is to develop a model that can extract reliable features and accurately map massive numbers of RPs with widely fluctuating signals [11]. The aforementioned localization approaches have shallow learning architectures, which limits their representational ability, especially when dealing with massive and noisy data. Positioning with MFS is also problematic: the discernibility of MFS decreases dramatically over a large area, which makes it impossible to use MFS directly for positioning.
In recent years, deep learning has made great progress in both academia and industry. Deep learning with multiple layers has outperformed other techniques in speech recognition, image classification, and other tasks [11,12]. Therefore, in this work, a deep residual network (Resnet) and transfer learning are introduced to develop a highly accurate localization system. Using MFS alone for localization is insufficient because of its low discernibility in a large area. Therefore, considering the outstanding performance of the density peak clustering (DPC) algorithm in feature selection, we propose a novel density peak clustering algorithm based on the comparison distance (CDPC) to select several center points of magnetic field strength (MFS), and we then combine them with Wi-Fi signals to improve the robustness of the proposed localization system. Owing to the state-of-the-art performance of deep learning in image classification, the Wi-Fi received signal strength indicator (RSSI) values and the center points of MFS are converted into images to build the fingerprint image database.
To deal with signal fluctuation, a model with a strong learning ability is needed. In this work, a two-level hierarchical training approach, consisting of a pre-training step and a fine-tuning step, is adopted to obtain the final deep learning model. After the fingerprint image dataset is constructed, the proposed Resnet is first trained on it, yielding a pre-trained model called the coarse localizer. Then, by leveraging the prior knowledge of the pre-trained model, multilayer perceptron (MLP)-based transfer learning is used to train further on the dataset, yielding a fine-tuned model called the fine localizer.
During the training phase, multiple data enhancement approaches are leveraged to improve localization accuracy. The fingerprint images are standardized to 224 × 224 pixels so that the model can more easily learn image features. In addition, some of the images are enlarged by a factor of 1.25 or randomly rotated by 15°. In batch normalization, a momentum term is added to reduce oscillation and accelerate the convergence of the model. Furthermore, the learning rate (LR) is dynamically adjusted to further optimize the model. In the matching phase, a probabilistic method is leveraged to indicate the accuracy of the localization system.
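The text does not specify how the LR is adjusted. One common option, shown here only as an assumed example, is step decay, where the LR is multiplied by a fixed factor every few epochs; the base rate, drop factor, and interval below are hypothetical values rather than the paper's settings.

```python
def step_decay_lr(epoch, base_lr=0.01, drop=0.1, every=10):
    """Learning rate at a given (0-based) epoch under step decay.

    base_lr, drop and every are hypothetical values, not the paper's settings.
    """
    # The LR stays constant within each block of `every` epochs, then drops.
    return base_lr * drop ** (epoch // every)
```

With these values, epochs 0 to 9 use 0.01, epochs 10 to 19 use about 0.001, and so on. In PyTorch, the same behavior is available through `torch.optim.lr_scheduler.StepLR`.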
The main contributions of this paper can be summarized as follows: (1) The unsupervised CDPC algorithm is first used to select center points of MFS that represent the distribution of MFS at each RP; combining Wi-Fi signals with the selected MFS improves positioning accuracy. (2) Unlike ordinary datasets, the selected MFS and Wi-Fi RSSI are transformed into images to form a fingerprint image dataset for localization. To develop a model with strong learning ability, a two-level hierarchical training architecture consisting of a Resnet and MLP-based transfer learning is proposed for localization. (3) Considering the large number of classification points, we dynamically adjust the LR and adopt several data enhancement approaches to enhance the generalization ability of the deep neural network (DNN) model. (4) To verify the effectiveness of the proposed positioning system, experiments were conducted in real indoor and outdoor environments. The experiments show that the proposed positioning system achieves high-precision localization in both environments.
The rest of this paper is organized as follows: Section 2 reviews related work. The proposed positioning system is presented in Section 3. The experiments are described in Section 4. Finally, Section 5 concludes the paper and discusses future work.

Related Work
The great demand for LBS has stimulated the development of localization techniques. Wi-Fi and magnetic signals are available in almost all indoor environments, which makes them useful for localization. Therefore, they have aroused great interest among researchers [13].
Traditional measurement-based localization systems, such as TOA and TDOA, can determine the UE location. However, these approaches require line-of-sight (LOS) signal propagation because they depend on trilateration. Their localization accuracy deteriorates greatly in indoor environments, where signals are often blocked and refracted by objects [14]. Fingerprint-based localization can overcome these drawbacks and has been shown to achieve satisfactory localization performance [12]. Therefore, fingerprint-based localization has attracted widespread attention. Basically, there are three kinds of fingerprints: visual fingerprints, motion fingerprints, and signal fingerprints [3]. Improved image and video processing abilities enable smartphones to handle massive visual searches over large visual fingerprint databases [15], and applications such as Google Goggles and the Vuforia Object Scanner have been successful. With the support of motion sensors, such as accelerometers and electronic compasses, smartphones can identify the real-time dynamics of the UE. The basic idea of motion fingerprint localization is to combine accelerometer and compass measurements and match them with a pre-constructed motion fingerprint database to determine the UE location [16]. Signal fingerprint-based localization captures signals and matches them with a geotagged fingerprint database to determine the UE location [17].
The most commonly used signals are Wi-Fi signals and geomagnetic signals. Each Wi-Fi access point (AP) has a unique media access control (MAC) address, and its limited signal coverage (around 100 meters) enables Wi-Fi signals to be widely used in localization [5]. However, as shown in Figure 2, Wi-Fi signals can fluctuate over a wide range because of surrounding noise, multipath fading, and so on, which may confuse nearby locations in Wi-Fi-based positioning systems. Therefore, collecting Wi-Fi signals from more APs with different MAC addresses can produce higher positioning accuracy. Wi-Fi-based indoor localization systems typically achieve an accuracy of 5-10 meters. In addition, when signals are weak, the Wi-Fi scanning process may take several seconds to obtain all the Wi-Fi signals.

The magnetic field is rather stable over long periods, and it has outstanding spatial discernibility within a small area [18]. The sensors built into a smartphone can collect around 100 magnetic data points per second. Researchers have found that MFS in indoor environments varies from 20 to 80 µT. However, MFS at a given location varies similarly to that at nearby locations, so its discernibility decreases dramatically over a large area, which makes it impossible to use MFS directly for positioning.
This paper investigates whether the CDPC algorithm can be used to select center points of MFS to enhance positioning accuracy.
In [19], KNN was leveraged to find the best match in the constructed fingerprint database. However, the experiments showed that the performance was not very satisfactory because the system was sensitive to signal noise. To enhance the stability of the localization system, Bayesian filtering-based localization approaches were proposed in [20]; however, the traceability of the localization system was influenced by the filter. An SVM-based localization system that converts the localization problem into a classification problem was proposed in [21]. With the development of neural networks (NNs), researchers have leveraged shallow NN models for localization. However, these shallow structures have limited learning ability; therefore, they cannot handle large sets of widely fluctuating signals, and their localization performance is not very good [11]. The increase in computing power and the successful application of deep learning give researchers a new way to improve localization performance. One study [22] investigated the application of convolutional neural networks to localization. Another [11] used a stacked denoising autoencoder and a four-layer DNN to learn reliable features. To further increase localization accuracy, [23] leveraged channel state information (CSI) and deep learning for localization. SVM and DNN were used for indoor and outdoor localization in [24]. Using a convolutional neural network, a hybrid wireless fingerprint localization method was proposed for indoor localization [25]. However, additional expensive hardware is needed to acquire CSI, and the workload of data pre-processing is tremendous, which makes CSI-based approaches inconvenient and impractical [26].
Compared with other works, this work has three main differences. First, the collected signal measurements are converted into fingerprint grayscale images for localization. Second, the unsupervised CDPC algorithm is first used to find the center points of MFS, and these selected MFS values are leveraged to improve localization performance. Third, a two-level hierarchical deep learning structure is leveraged to extract key features from massive, widely fluctuating Wi-Fi and magnetic signals; specifically, MLP-based transfer learning is introduced to fine-tune the trained Resnet coarse localizer to obtain the fine localizer. In addition, our localization system requires no orientation information, so there is no constraint on the phone's orientation during localization. Unlike the aforementioned localization methods, our proposed method does not rely on additional expensive hardware, and the localization task can be accomplished with a smartphone alone. Therefore, our proposed localization system is universal and cost-effective.

Proposed Solution
In this paper, we consider a typical localization environment in which a smartphone receives RSSI and MFS measurements from surrounding Wi-Fi APs and magnetic fields. As shown in Figure 3, the purpose of localization is to find the location of the smartphone from the collected signal measurements. The localization system consists of six functional modules: data collection, data selection, data pre-processing, fingerprint image construction, DNN training, and DNN localization. The multiple sensors built into smartphones make it possible to read Wi-Fi and MFS signals. The data selection module uses the CDPC algorithm to find the center points of MFS; by combining the selected MFS with Wi-Fi RSSI, localization accuracy can be improved. The signal measurements are converted into images to form the fingerprint image dataset, in which each entry contains a fingerprint image and its location. The purpose of data pre-processing is to find signals with high strength and make them suitable for forming fingerprint images. After the fingerprint image database is constructed, the proposed DNN is trained on it, and the DNN parameter database stores the resulting localization model for online localization. In the online phase, the trained DNN model matches the newly constructed fingerprint image against the fingerprint image dataset to estimate the location. The DNN used in this paper includes a Resnet and MLP-based transfer learning. The following sections detail the implementation steps and corresponding algorithms of the proposed localization system.


The Proposed Data Selection Algorithm
For the magnetic field measurements, the unsupervised learning CDPC algorithm is used to select several center points to better reflect the distribution of MFS in each RP. Combining the selected MFS and Wi-Fi RSSI can improve the accuracy of the localization system.
Clustering by fast search and find of density peaks (DPC) is a representative density-based clustering algorithm. The basic idea of the DPC algorithm rests on two assumptions: (1) a cluster center is surrounded by points with a lower density; and (2) cluster centers are at a relatively large distance from any point with a higher density [27].
These two assumptions define the criteria for identifying potential cluster centers, based on two quantities computed for each point: the local density $\rho$ and the relative distance $\delta$.
A clustering dataset is $X = \{x_1, x_2, \ldots, x_n\}$, where each $x_i$ ($1 \le i \le n$) is a vector with $m$ attributes, $x_i = \{x_{i1}, x_{i2}, \ldots, x_{im}\}$. The Euclidean distance between $x_i$ and $x_j$ is
$$d(i, j) = \sqrt{\sum_{k=1}^{m} (x_{ik} - x_{jk})^2}.$$
After calculating the Euclidean distances, the DPC algorithm proceeds as follows.
The local density $\rho_i$ of data point $i$ is defined as
$$\rho_i = \sum_{j \ne i} \chi(d_{ij} - d_c), \qquad \chi(x) = \begin{cases} 1, & x < 0 \\ 0, & x \ge 0 \end{cases},$$
where $d_{ij} = d(i, j)$ and $d_c$ is the cut-off distance, which is usually entered manually based on experience.
Suppose there are $N$ data points, so there are $N_d = \binom{N}{2}$ pairwise distances. These distances are sorted in ascending order, and $d_c$ is the distance at position $\lceil N_d \times p \rceil$ in this order, where $p$ is a manually chosen percentage parameter and $\lceil \cdot \rceil$ is the ceiling function.
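A minimal Python sketch of the cut-off distance selection, assuming $N_d$ counts each unordered pair of points once and that $p$ is given as a fraction (for example 0.02 for 2%); `cutoff_distance` is a hypothetical helper name.

```python
import math
from itertools import combinations


def cutoff_distance(points, p):
    """Return d_c, the ceil(N_d * p)-th smallest pairwise Euclidean distance.

    points -- list of coordinate tuples (a single MFS reading is a 1-tuple)
    p      -- percentage parameter given as a fraction, e.g. 0.02 for 2%
    """
    # N_d = N(N - 1) / 2 distances, one per unordered pair, in ascending order.
    dists = sorted(math.dist(a, b) for a, b in combinations(points, 2))
    n_d = len(dists)
    # 1-based position of d_c in the sorted list; at least the first entry.
    pos = max(1, math.ceil(n_d * p))
    return dists[pos - 1]
```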
The local density $\rho_i$ thus counts the number of points in the data space that lie within $d_c$ of point $i$. The traditional relative distance $\delta_i$ is obtained as follows: for each point $i$, find all points $j$ with a higher density than $i$, compute the distances $d_{ij}$, and take the smallest as $\delta_i$, that is, $\delta_i = \min_{j:\, \rho_j > \rho_i} d_{ij}$. If point $i$ has the largest density, $\delta_i = \max_{j} d_{ij}$ is the maximum distance from that point to the other points.
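The two traditional DPC quantities can be sketched as follows, using the cut-off kernel for $\rho_i$ and the minimum distance to a higher-density point for $\delta_i$. This is an illustrative O(N²) implementation under the assumption that points are coordinate tuples (a single MFS reading would be a 1-tuple); it does not include the comparative distance $\zeta_i$ proposed in this paper.

```python
import math


def density_and_delta(points, d_c):
    """Return (rho, delta) lists for DPC with the cut-off kernel."""
    n = len(points)
    d = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    # rho_i: number of other points strictly closer than d_c to point i.
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < d_c) for i in range(n)]
    delta = []
    for i in range(n):
        higher = [d[i][j] for j in range(n) if rho[j] > rho[i]]
        if higher:
            # Distance to the nearest point of higher density.
            delta.append(min(higher))
        else:
            # Highest-density point(s): largest distance to any other point.
            delta.append(max(d[i][j] for j in range(n) if j != i))
    return rho, delta
```

In the standard DPC algorithm, points with both a large $\rho_i$ and a large $\delta_i$ stand out in the decision graph as cluster centers.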
In this paper, we propose a comparative distance to improve on DPC's second hypothesis. The DPC algorithm does not quantitatively compare $\delta_i$; therefore, a new variable that reflects its relative size is chosen to replace $\delta_i$. Based on the above conditions, a quantity $\zeta_i$, similar to $\delta_i$, is defined as follows, where $\zeta_i$ represents the distance from point $i$ to the low-density area, which makes it a suitable quantity to compare with $\delta_i$. By the hypotheses, a point with both a larger density and a larger relative distance is a cluster center. Hence, the local density $\rho_i$ and the comparative distance $\zeta_i$ are calculated for each point. Figure 4 shows the decision graph for our experiments. The decision value $\gamma_i = \delta_i \times \zeta_i$ is calculated, and its several largest values are selected. The corresponding points are used as the center points and reflect the overall distribution of the magnetic measurements.
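Because the defining equation of $\zeta_i$ is not reproduced here, the final selection step is sketched with the two per-point factors of $\gamma_i$ passed in as precomputed lists. The helper `select_centers` and the sample values are hypothetical; the sketch only shows ranking points by the product of the two factors and keeping the $n$ largest.

```python
def select_centers(values, score_a, score_b, n):
    """Return the n values with the largest decision value gamma_i = a_i * b_i.

    score_a, score_b -- per-point factors of gamma, computed beforehand
    """
    gamma = [a * b for a, b in zip(score_a, score_b)]
    # Rank point indices by decision value, largest first (stable on ties).
    order = sorted(range(len(values)), key=lambda i: gamma[i], reverse=True)
    return [values[i] for i in order[:n]]
```

For example, with hypothetical MFS readings `[20, 21, 50, 51, 80]` and factor lists `[1, 2, 1, 3, 2]` and `[1, 1, 4, 1, 5]`, the decision values are `[1, 2, 4, 3, 10]`, so the two selected centers are 80 and 50.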

Data Pre-Processing
The purpose of data pre-processing is to find signals with high strength and make them suitable for an RGB image. To eliminate the adverse effect of weak Wi-Fi signals on localization, we selected the eight strongest Wi-Fi signals at each RP. In our proposed localization system, the fingerprint database is image-based; therefore, data pre-processing adapts the signal measurements to an image. An ordinary RGB image contains three channel matrices whose values lie between 0 and 255, whereas Wi-Fi RSSI measurements lie between −120 and −30 dBm. Thus, the Wi-Fi measurements are converted using $\eta = |\mathrm{RSSI}|$.
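A minimal sketch of this step, assuming a single scan is stored as a dictionary from AP MAC address to RSSI in dBm and that the conversion is $\eta = |\mathrm{RSSI}|$ as stated above; `preprocess_rssi` is a hypothetical name.

```python
def preprocess_rssi(scan, k=8):
    """Keep the k strongest APs of one scan and map RSSI to eta = |RSSI|.

    scan -- dict mapping an AP MAC address to its RSSI in dBm
    """
    # Stronger signals have RSSI values closer to 0 dBm, so sort descending.
    strongest = sorted(scan.items(), key=lambda kv: kv[1], reverse=True)[:k]
    # Roughly -30..-120 dBm becomes 30..120, inside the 0..255 pixel range.
    return {mac: abs(rssi) for mac, rssi in strongest}
```

Note that with this mapping a stronger signal produces a smaller pixel value.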

Fingerprint Image Construction
Unlike other works that use raw signal data to construct the fingerprint database [13,16], this paper proposes a novel method to construct a fingerprint image dataset. Considering the impact of different data lengths and AP sets on localization accuracy, the fingerprint image construction module normalizes all fingerprint images in each grid to the same size and AP set. This module is used in both the training and matching phases; the difference is that the fingerprint images are labeled in the training phase, whereas their labels must be predicted in the matching phase.
Unlike the traditional way of processing sequence data, we convert the collected data into fingerprint images for feature extraction. The collected sensor data contain series of MFS and RSSI measurements from multiple APs. An ordinary image is a three-channel matrix with red, green, and blue channels. Therefore, to construct the fingerprint image, the collected data must be rearranged.
In the proposed localization system, all constructed fingerprint images are standardized to the same size. The fingerprint image $F$ is composed of a magnetic part $F_{mag}$ and a Wi-Fi RSSI part $F_{rssi}$ and is constructed as follows:
$$F = \begin{bmatrix} F_{mag} \\ F_{rssi} \end{bmatrix},$$
where $n$ is the number of center points selected by the CDPC algorithm, which equals the number of RSSI measurements collected at each RP, and $k$ is the number of APs detected in the localization area. Therefore, the MFS part $F_{mag}$ is stored as a $1 \times n$ vector, the Wi-Fi RSSI part $F_{rssi}$ is stored as a $k \times n$ matrix, and $F$ is a $(k+1) \times n$ matrix. In this paper, $F$ is used to form the red, green, and blue channel matrices, from which the fingerprint image is constructed. The same method is then applied to every sample to form the fingerprint image dataset.
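The construction of $F$ can be sketched in plain Python as follows. It assumes $F_{mag}$ is a list of $n$ values, $F_{rssi}$ is a list of $k$ rows of $n$ values each, and, as one hypothetical reading of the sentence above, that the same matrix $F$ fills all three color channels; the resizing to 224 × 224 pixels is left out.

```python
def build_fingerprint_image(f_mag, f_rssi):
    """Stack F_mag (1 x n) on top of F_rssi (k x n) and copy F into 3 channels.

    Returns a 3 x (k + 1) x n nested list (channels, rows, columns).
    """
    n = len(f_mag)
    if any(len(row) != n for row in f_rssi):
        raise ValueError("each RSSI row must have n columns")
    f = [list(f_mag)] + [list(row) for row in f_rssi]  # (k + 1) x n matrix F
    # Assumed layout: the same F fills the red, green and blue channels.
    return [[row[:] for row in f] for _ in range(3)]
```

Each channel is an independent copy, so later per-channel processing cannot accidentally modify the other channels.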

The Proposed DNN Introduction
In this paper, the proposed DNN contains a Resnet-based coarse localizer and a transfer learning-based fine localizer. The DNN used in our localization system can automatically learn signal features and distinguish between the fingerprint features of different classification points. However, the collected dataset is rather small, which limits the localization accuracy. Therefore, inspired by the idea of transfer learning, a two-level hierarchical training strategy is adopted. First, the Resnet is trained on the fingerprint image database, and the resulting localization model is saved. Then, an MLP is added after the Resnet, and the new model is used for transfer learning.

Deep Residual Network Introduction
A DNN is proposed to predict the UE locations. Because the locations are converted into labels, the predicted results are the IDs of these labels. The proposed localization model consists of a Resnet-based coarse localizer and a transfer learning-based fine localizer.
With the development of deep learning, researchers found that the learning ability of a neural network increases with the number of layers. However, as the network goes deeper, accuracy eventually saturates and then degrades, even on the training data, so this degradation problem is not simply a matter of overfitting. This problem troubled researchers for a long time. With further research, [28] proposed the deep residual model, which successfully improved the learning ability of the network. As shown in Figure 5, the residual model is constructed by adding a skip connection. Learning the target mapping H(x) is reformulated as H(x) = F(x) + x, so the stacked layers only need to learn the residual F(x) = H(x) - x, which is easier than learning H(x) directly. By stacking multiple residual modules, the degradation problem of DNNs can be effectively alleviated and performance improved.
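To illustrate why the residual form eases optimization, the following pure-Python sketch implements y = F(x) + x on plain vectors, with a hypothetical two-layer residual branch that uses SELU. It omits convolutions and batch normalization and does not apply an activation after the addition, so it is a simplified stand-in rather than the basic block used in Figure 6. When the branch weights are zero, the block reduces exactly to the identity mapping.

```python
import math

# SELU constants (alpha and scale of the scaled exponential linear unit).
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(v):
    """Scaled exponential linear unit."""
    return SELU_SCALE * (v if v > 0 else SELU_ALPHA * (math.exp(v) - 1.0))


def linear(weights, bias, x):
    """Dense layer: weights is a list of rows, one per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def residual_block(x, w1, b1, w2, b2):
    """y = F(x) + x, with the hypothetical branch F(x) = W2 selu(W1 x + b1) + b2."""
    hidden = [selu(v) for v in linear(w1, b1, x)]
    f_x = linear(w2, b2, hidden)
    # Skip connection: the input is added back to the residual branch output.
    return [fi + xi for fi, xi in zip(f_x, x)]
```

Because the identity path is always present, the branch only has to model the correction F(x) = H(x) - x.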

Figure 6 shows the proposed Resnet model, which consists of one basic block 2, four basic blocks 2, three basic blocks 3, an average pooling layer, and one MLP layer. Each basic block is a residual module whose skip connection lets the DNN effectively skip residual blocks when overfitting occurs and continue training. In this paper, SELU is used as the activation function, and cross-entropy loss is used as the loss function of the Softmax classifier. The detailed calculation process of the different layers can be found in [29].
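A minimal sketch of the Softmax classifier and its cross-entropy loss for a single sample, where the label is a grid ID. The log-sum-exp form avoids overflow in the exponentials; the function names are illustrative. In PyTorch, `torch.nn.CrossEntropyLoss` computes the same loss directly from raw logits.

```python
import math


def softmax(logits):
    """Class probabilities; subtracting the max avoids overflow in exp."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """-log softmax(logits)[label], computed with the log-sum-exp trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]
```

For the indoor setting with 96 output units, an untrained network that outputs equal logits has a loss of ln 96, about 4.56, for every sample.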


Transfer Learning Introduction
Transfer learning has many merits. As shown in Figure 7, transfer learning yields a higher start, a higher slope, and a higher asymptote. Therefore, to obtain the best localization model in this paper, a Resnet-based coarse localizer model and a transfer learning-based fine localizer model are used to maximize localization accuracy. These two localizer models must be trained separately. Specifically, the Resnet is first trained on the fingerprint image dataset. After the training process is complete, the trained Resnet model is saved and an MLP is added after it for transfer learning. The MLP-based transfer learning model leverages prior information from the trained Resnet to maximize localization accuracy.

As shown in Figure 8, MLP-based transfer learning is leveraged in this paper to fine-tune the Resnet and further increase localization accuracy. First, the Resnet is trained on the fingerprint image database; after training is finished, we obtain a pre-trained model called the coarse localizer. Then, the trained Resnet model is saved and an MLP is added after it. Finally, this newly constructed model is further trained on the fingerprint image database. This transfer learning-based model is used as the final localization model, called the fine localizer.
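The two-stage idea can be sketched without a deep learning library under simplifying assumptions: the pre-trained coarse localizer is represented by a fixed, hypothetical feature function `frozen_backbone`, and the newly added head is a single softmax layer trained by stochastic gradient descent while the backbone stays unchanged. Whether the Resnet weights are also updated during fine-tuning is not stated here; freezing them is one common choice, and a full MLP head would add hidden layers.

```python
import math
import random


def frozen_backbone(x):
    """Hypothetical stand-in for the pre-trained coarse localizer's features."""
    return [x[0], x[1], x[0] * x[1]]


def head_logits(w, b, h):
    """Logits of a linear softmax head for feature vector h."""
    return [sum(wi * hi for wi, hi in zip(row, h)) + bc for row, bc in zip(w, b)]


def train_head(data, n_classes, epochs=200, lr=0.1, seed=0):
    """Train only the new head with SGD on cross-entropy; the backbone is never updated."""
    rng = random.Random(seed)
    n_feat = len(frozen_backbone(data[0][0]))
    w = [[rng.uniform(-0.01, 0.01) for _ in range(n_feat)] for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, y in data:
            h = frozen_backbone(x)  # features come from the fixed, pre-trained part
            z = head_logits(w, b, h)
            m = max(z)
            e = [math.exp(v - m) for v in z]
            s = sum(e)
            for c in range(n_classes):
                # Gradient of cross-entropy w.r.t. logit c: softmax_c - one_hot_c.
                g = e[c] / s - (1.0 if c == y else 0.0)
                for j in range(n_feat):
                    w[c][j] -= lr * g * h[j]
                b[c] -= lr * g
    return w, b


def predict(w, b, x):
    """Grid ID with the highest logit."""
    z = head_logits(w, b, frozen_backbone(x))
    return max(range(len(z)), key=z.__getitem__)
```

On a toy dataset such as `[((1.0, 1.0), 0), ((-1.0, 1.0), 1), ((-1.0, -1.0), 2), ((1.0, -1.0), 3)]`, the trained head classifies every sample correctly while `frozen_backbone` is never modified.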


Setup of the Experiments
Experiments were conducted in both indoor and outdoor environments, which were divided into grids (96 indoor and 54 outdoor). A person walked around holding a smartphone equipped with sensors that could receive MFS and RSSI from the surrounding environment. In each grid, a series of signal measurements was collected at four to six locations to deal with signal instability. In addition, this process was repeated five times, five days apart, so that the measurements reflect the overall distribution of the signals. In the matching phase, the goal was to find the location of the UE given a collection of MFS and RSSI data and compare it with the true location.
The number of training epochs greatly affects the performance of a DNN. Too few training epochs make it difficult for the model to fully extract the features of the dataset, whereas too many lead to overfitting. To address this problem and maximize localization accuracy, the fingerprint dataset was divided into a 60% training set, a 20% validation set, and a 20% test set. After each training epoch, a new localization accuracy was computed, and the DNN stored the model parameters that achieved the best localization accuracy so far. In this way, the DNN was thoroughly trained, and the model with the best localization accuracy was chosen as the final model.
To further increase the robustness of the proposed DNN, multiple data enhancement approaches were adopted. First, fingerprint images were resized to 224 × 224. Second, some fingerprint images were enlarged to 1.25 times their original size, while others were randomly rotated by 15°. In addition, momentum was added to batch normalization to accelerate training.
Figure 9a shows the indoor floor plan used for localization; the area of interest was divided into 96 grids with a size of 2 square meters. The total number of collected APs was 87. The proposed DNN structure consisted of 137 input units and 96 output units. Figure 9b shows the outdoor experimental environment, a community garden. The outdoor localization area was divided into 54 grids with a size of 3 square meters. The total number of collected APs was 161. The localization system was implemented on a Dell PC with an RTX2060 graphics card, which offers far greater data processing capability than smartphone platforms. The proposed positioning models, data pre-processing, and data enhancement methods were implemented in MATLAB and PyTorch.
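The 60/20/20 split and the "keep the best epoch" rule described above can be sketched as two small helpers. This is an illustration under stated assumptions, not the authors' code: we assume samples are shuffled before splitting, that the per-epoch accuracy is measured on the validation set, and that model parameters can be deep-copied; `train_one_epoch` and `evaluate` are hypothetical callbacks standing in for the real training and evaluation steps.

```python
import copy
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
S = TypeVar("S")


def split_60_20_20(samples: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle, then split into 60% train, 20% validation, 20% test (integer arithmetic)."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 60 // 100
    n_val = len(items) * 20 // 100
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


def train_keep_best(
    state: S,
    num_epochs: int,
    train_one_epoch: Callable[[S], None],
    evaluate: Callable[[S], float],
) -> Tuple[S, int, float]:
    """Train for num_epochs and return a copy of the parameters from the best epoch."""
    best_state, best_epoch, best_acc = copy.deepcopy(state), -1, float("-inf")
    for epoch in range(num_epochs):
        train_one_epoch(state)   # updates `state` in place
        acc = evaluate(state)    # assumed: accuracy on the validation set
        if acc > best_acc:       # strict '>' keeps the earliest best epoch on ties
            best_state, best_epoch, best_acc = copy.deepcopy(state), epoch, acc
    return best_state, best_epoch, best_acc


if __name__ == "__main__":
    train, val, test = split_60_20_20(range(100))
    print(len(train), len(val), len(test))

    toy_accuracies = [0.2, 0.5, 0.4, 0.7, 0.6]  # illustrative values only
    toy_state = {"epochs_seen": 0}
    best, epoch, acc = train_keep_best(
        toy_state,
        num_epochs=len(toy_accuracies),
        train_one_epoch=lambda s: s.update(epochs_seen=s["epochs_seen"] + 1),
        evaluate=lambda s: toy_accuracies[s["epochs_seen"] - 1],
    )
    print(best, epoch, acc)
```

Keeping a copy of the best parameters lets training run for a fixed, generous number of epochs without the final model suffering from late-epoch overfitting; the held-out test set is then used only once, on the selected model.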

Influence of MFS and Learning Rate
LR is a critical hyperparameter in deep learning. During training, an appropriate LR helps improve the fitting ability and training speed of the DNN. Conversely, an improper LR can cause the network to converge to a local minimum and greatly reduce its learning ability. However, as shown in Figure 10, a suitable LR is difficult to pick. In addition, a fixed LR may cause the network to oscillate back and forth around the minimum [29]. To solve this problem, the LR needs to be adjusted dynamically to improve the convergence of the network. Therefore, in the designed DNN model, the initial LR was set to 0.001 and halved every 35 epochs.
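The step schedule described above (initial LR of 0.001, halved every 35 epochs) can be written as a closed-form function of the epoch index. The sketch assumes epochs are counted from 0, so the halving takes effect at epochs 35, 70, and so on. In PyTorch, the same schedule is commonly expressed with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=35, gamma=0.5)`, although the text does not say which mechanism was used; `step_lr` is a hypothetical helper name.

```python
def step_lr(epoch: int, initial_lr: float = 1e-3, step_size: int = 35, gamma: float = 0.5) -> float:
    """Learning rate for a 0-based epoch index under a step-decay schedule."""
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    # Integer division counts how many complete 35-epoch steps have elapsed.
    return initial_lr * gamma ** (epoch // step_size)


if __name__ == "__main__":
    for epoch in (0, 34, 35, 69, 70, 105):
        print(epoch, step_lr(epoch))
```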
As shown in Figure 11, we tested the localization performance of the proposed localizer with respect to LR and MFS. Figure 11 shows that localization accuracy was highest when the LR was 1 × 10⁻³, suggesting that this LR allows the DNN to converge to the global minimum. It can also be observed that the MFS effectively enhanced the localization performance of both the coarse localizer and the fine localizer, probably because the selected MFS enriched the localization features. With an inappropriate LR, the fine localizer performed worse than the coarse localizer; this may be because the network was already at a local minimum at the beginning of training and could not converge effectively. With an appropriate LR, the transfer learning-based fine localizer can effectively utilize prior information from the pre-trained coarse localizer to achieve better localization performance.

Influence of Different Numbers of Neurons and Hidden Layers
The number of neurons and hidden layers greatly influences the performance of the DNN. Therefore, we compared their impact on localization performance, where λ denotes the number of hidden layers. Figure 12 shows that, as the number of neurons increased, the localization accuracy first increased and then decreased, although the decrease was slight. The number of hidden layers had a stronger effect: the localization accuracy deteriorated as the DNN went deeper, because excessive layers make it difficult for gradients to propagate through the hidden layers. The best localization performance was obtained with two hidden MLP layers and 200 neurons in each hidden layer.
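To make the depth/width trade-off concrete, the helper below counts the weights and biases of a fully connected MLP for a given list of hidden-layer widths. The input dimension of 96 (the coarse localizer's output, one value per indoor grid) is our assumption; the text only fixes the best configuration at two hidden layers of 200 neurons. `mlp_param_count` is a hypothetical name.

```python
from typing import Sequence


def mlp_param_count(in_dim: int, hidden: Sequence[int], out_dim: int) -> int:
    """Number of trainable weights + biases in a stack of fully connected layers."""
    dims = [in_dim, *hidden, out_dim]
    # Each layer has d_in * d_out weights plus d_out biases.
    return sum(d_in * d_out + d_out for d_in, d_out in zip(dims, dims[1:]))


if __name__ == "__main__":
    # lambda_ plays the role of λ (number of hidden layers), each with 200 neurons.
    for lambda_ in range(1, 5):
        print(lambda_, mlp_param_count(96, [200] * lambda_, 96))
```

Under these assumptions the best configuration (λ = 2) has 78,896 parameters, and each additional 200-neuron hidden layer adds 40,200 more, which also lengthens the path that gradients must travel during back-propagation.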

Influence of Different Dropout Rates
To prevent overfitting, a dropout layer was inserted between consecutive MLP layers. During the training phase, the dropout layer randomly sets a fraction of its input neurons to 0. This reduces the number of active intermediate features, thereby reducing redundancy, that is, increasing the orthogonality between features. Table 1 shows the impact of different dropout rates on localization performance. The localization accuracy reached a peak of 97.1% when the dropout rate was 0.6. Without a dropout layer, the best localization accuracy of the MLP was 94.7%, 2.4 percentage points lower, because overfitting occurred. Therefore, a dropout layer was used to address the overfitting problem.
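The dropout layer described above can be sketched as inverted dropout, the variant used by common frameworks such as PyTorch's `nn.Dropout`: during training each activation is zeroed with probability p and the survivors are scaled by 1/(1 − p), so no rescaling is needed at test time. We assume that the paper's "dropout rate" is this zeroing probability p (0.6 in the best configuration); `inverted_dropout` is a hypothetical name.

```python
import random
from typing import List, Sequence


def inverted_dropout(
    x: Sequence[float], p: float, training: bool, rng: random.Random
) -> List[float]:
    """Zero each element with probability p during training; scale survivors by 1/(1-p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("dropout probability must be in [0, 1)")
    if not training or p == 0.0:
        return list(x)  # identity at inference time
    keep = 1.0 - p
    # rng.random() is uniform on [0, 1), so each element is dropped with probability p.
    return [0.0 if rng.random() < p else v / keep for v in x]


if __name__ == "__main__":
    rng = random.Random(0)
    out = inverted_dropout([1.0] * 10_000, p=0.6, training=True, rng=rng)
    print("fraction zeroed:", sum(v == 0.0 for v in out) / len(out))  # close to 0.6
    print("mean activation:", sum(out) / len(out))  # close to 1.0
```

The rescaling keeps the expected activation unchanged, which is why the same network can be used for inference simply by switching dropout off.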

Influence of Dynamic Learning Rate and Data Enhancement Methods
To further increase the generalization ability of the DNN model, the LR was dynamically adjusted and several data enhancement methods were adopted. Table 2 shows the impact of the dynamic LR and data enhancement methods on localization accuracy. Both methods significantly improved the generalization ability of the DNN.
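The geometric data enhancement used here (enlarging images by 1.25× or randomly rotating them by 15°) can be sketched as a nearest-neighbour affine warp about the image centre. The pure-Python version below treats a fingerprint image as a list of rows; it assumes the rotation angle is drawn uniformly from [−15°, 15°] and that the two transforms are applied as alternatives with equal probability, neither of which is stated in the text. In torchvision, `transforms.RandomRotation(15)` samples from the same ±15° range and `transforms.Resize((224, 224))` performs the size standardisation. Function names are hypothetical.

```python
import math
import random
from typing import List

Image = List[List[float]]


def affine_nearest(img: Image, angle_deg: float, scale: float, fill: float = 0.0) -> Image:
    """Rotate by angle_deg and scale by `scale` about the centre, keeping the output size."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: undo the scaling, then undo the rotation,
            # so every output pixel looks up exactly one source pixel.
            dx, dy = (x - cx) / scale, (y - cy) / scale
            src_x = cos_a * dx + sin_a * dy + cx
            src_y = -sin_a * dx + cos_a * dy + cy
            ix, iy = round(src_x), round(src_y)
            if 0 <= ix < w and 0 <= iy < h:  # pixels mapped from outside keep `fill`
                out[y][x] = img[iy][ix]
    return out


def random_enhance(img: Image, rng: random.Random, zoom: float = 1.25, max_angle: float = 15.0) -> Image:
    """Apply either a 1.25x enlargement or a random rotation (hypothetical policy)."""
    if rng.random() < 0.5:
        return affine_nearest(img, angle_deg=0.0, scale=zoom)
    return affine_nearest(img, angle_deg=rng.uniform(-max_angle, max_angle), scale=1.0)


if __name__ == "__main__":
    image = [[float(10 * r + c) for c in range(5)] for r in range(5)]
    for row in random_enhance(image, random.Random(0)):
        print(row)
```

Because enlargement keeps the output size fixed, a 1.25× zoom effectively crops the border of the fingerprint image, while rotation moves border pixels out of frame; both perturb the image without changing its grid label.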

Comparison with Other Algorithms
To compare the proposed algorithm with other algorithms, several experiments were conducted. Figure 13 compares the localization performance of the proposed algorithm with that of other existing learning algorithms. The raw collected Wi-Fi data and the selected MFS were used to construct fingerprints, which served as the inputs of the GRNN, KNN, WKNN, SVM, and MLP. It is worth mentioning that the fingerprint image dataset was constructed from the raw collected signal measurements. These learning algorithms were then used for comparative experiments. For the multiclass SVM, a Gaussian kernel was used, with the kernel scale set to √P/4, where P is the number of predictors; 80% of the dataset was used for training and the remaining 20% for prediction. For the GRNN, the smoothing factor was set to 1. The MLP contains three hidden layers. The CNN contains one convolution layer, one batch normalization layer, one ReLU activation function, and two feed-forward layers. The experimental results showed that the proposed localizer outperformed the other localization approaches. This is because the other models have shallow structures, which limit their learning ability, whereas the proposed localizer has a deep structure that can extract reliable features from a large set of fluctuating signal samples.
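For readers unfamiliar with the classical baselines, the sketch below shows grid-level KNN/WKNN and GRNN classifiers over fingerprint vectors. It is not the authors' implementation: we assume the fingerprints are already normalised (with raw dBm values the Gaussian kernel can underflow to 0), that WKNN weights each of the k neighbours' grid labels by inverse distance (many WKNN variants instead average neighbour coordinates), and that GRNN classification takes the arg-max of Gaussian-kernel-weighted label votes with the smoothing factor acting as σ = 1. Plain KNN is the special case with uniform weights. All names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence

Fingerprint = Sequence[float]


def euclidean(a: Fingerprint, b: Fingerprint) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def wknn_predict(
    train_x: List[Fingerprint], train_y: List[Hashable], query: Fingerprint,
    k: int = 3, weighted: bool = True, eps: float = 1e-9,
) -> Hashable:
    """Vote among the k nearest fingerprints; weighted=False gives plain KNN."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: euclidean(pair[0], query))
    votes: Dict[Hashable, float] = defaultdict(float)
    for x, y in ranked[:k]:
        weight = 1.0 / (euclidean(x, query) + eps) if weighted else 1.0
        votes[y] += weight
    return max(votes, key=votes.get)


def grnn_predict(
    train_x: List[Fingerprint], train_y: List[Hashable], query: Fingerprint, sigma: float = 1.0,
) -> Hashable:
    """GRNN-style classifier: Gaussian-kernel-weighted votes over all training samples."""
    scores: Dict[Hashable, float] = defaultdict(float)
    for x, y in zip(train_x, train_y):
        d2 = sum((a - b) ** 2 for a, b in zip(x, query))
        scores[y] += math.exp(-d2 / (2.0 * sigma ** 2))
    # Normalising by the total weight would not change the arg-max, so it is skipped.
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy 2-D fingerprints for two grids, "A" and "B" (illustrative values only).
    xs = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
    ys = ["A", "A", "B", "B"]
    print(wknn_predict(xs, ys, [0.2, 0.5]), grnn_predict(xs, ys, [5.0, 5.4]))
```

Both baselines compare a query directly against stored fingerprints, which illustrates the "shallow structure" contrast drawn above: they have no learned feature hierarchy to absorb signal fluctuations.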

Conclusions
In this study, we have proposed a two-level hierarchical training approach comprising a deep learning framework for indoor and outdoor localization with Wi-Fi and magnetic fingerprinting. Using unsupervised learning, the CDPC algorithm picks center points of MFS, which are combined with Wi-Fi measurements to construct the fingerprint image database. Then, a Resnet is trained on the fingerprint image database to obtain a coarse localizer. To increase the localization performance, the MLP-based transfer learning fine localizer refines the localization results based on prior knowledge of the trained coarse localizer. We have evaluated our proposed localization system in indoor and outdoor areas, and the experimental results demonstrate its superiority. In the future, we would like to cooperate with local enterprises to develop applications that can be used in daily life.