A Hybrid CNN-LSTM-Based Approach for Pedestrian Dead Reckoning Using Multi-Sensor-Equipped Backpack

Abstract: Researchers in academia and companies working on location-based services (LBS) are paying close attention to indoor localization based on pedestrian dead reckoning (PDR) because it is an infrastructure-free localization method. PDR is a fundamental localization technique that utilizes human motion to perform localization relative to an initial position. The size, weight, and power consumption of microelectromechanical systems (MEMS) embedded in smartphones are remarkably low, making them appropriate for localization and positioning. Traditional PDR methods predict position and orientation using stride length and continuous integration of acceleration in step-and-heading-system (SHS)-based PDR and inertial navigation system (INS)-based PDR, respectively. However, these two approaches accumulate error and do not effectively leverage the inertial measurement unit (IMU) sequences. The PDR navigation solution relies on the quality of the MEMS sensors, which feed the PDR with acceleration and angular velocity from the accelerometer and gyroscope, respectively. However, low-cost small MEMS sensors suffer from large error sources such as bias and noise; hence, MEMS measurements lead to navigation solution drift when used as inputs to the PDR. As a consequence, numerous methods have been proposed to mitigate and model the errors related to MEMS. Deep learning-based dead reckoning algorithms address the aforementioned issues owing to the end-to-end learning framework. This paper proposes a hybrid convolutional neural network (CNN) and long short-term memory network (LSTM)-based inertial PDR system that extracts features from IMU sequences. The end-to-end learning framework is introduced to leverage the efficiency of low-cost MEMS because data-driven solutions make more complete use of the ever-increasing data volume and computational power than filtering-model approaches.
A CNN-LSTM model was employed to capture local spatial and temporal features. Experiments conducted on odometry datasets collected from multi-sensor backpack devices demonstrated that the proposed architecture outperformed previous traditional PDR methods, with a root mean square error (RMSE) of 0.52 m for the best user. On the handheld smartphone-only dataset, the best achieved R² metric was 0.49.


Introduction
The extraordinary development of state-of-the-art indoor location-based services (LBS) is accelerating the expansion of indoor positioning techniques [1,2]. The current demand for determining the location of dynamic agents such as humans and robots in indoor environments has achieved unprecedented importance for societal and scientific purposes. There are different indoor localization approaches, such as infrastructure-based and infrastructure-free approaches. Wireless fidelity (Wi-Fi) [3][4][5], radio frequency identification (RFID) [6][7][8][9], ultra-wideband (UWB) [10,11], and Bluetooth Low Energy (BLE) [12] are among the techniques that require tailored infrastructure. These approaches need the distribution of Wi-Fi access points (APs), tags, and BLE beacon signals indoors to sense the environment. Infrastructure-free [13] approaches, in contrast, do not require pre-built infrastructure to update the position. Each of these techniques has its own limitations as well as advantages. Among efficient, cost-effective techniques, the self-contained PDR algorithm is the most popular. The convenience of utilizing a smartphone-based indoor localization system is that it requires no infrastructure deployment. In our case, we focused on the cost-effectiveness and scalability of sensors such as inertial and LiDAR sensors in large environments. This approach is usually adopted for mobile indoor localization.
Odometry estimation is a critical ingredient across many domains, such as robot egomotion estimation [14], unmanned aerial systems (UAS) [15], and humans' self-motion estimation through space [16][17][18]. As different sensing modalities have different capabilities, dynamic agents (humans or robots) operating in indoor environments are often outfitted with numerous sensors, such as cameras [19], inertial measurement units (IMU) [20], and LiDARs [21]. Therefore, tracking motion using handheld smartphones has become highly popular in indoor environments [22]. Among motion tracking methods, PDR has become an indispensable relative positioning technology that tracks position and orientation (pose) in an infrastructure-free environment using wearable sensors that provide linear acceleration and rotational velocity. However, a PDR system based on inertial sensors alone is challenging, in part because of its unbounded system error, which leads to the development of complex models [23]. In the positioning and navigation fields, developing new architectures is a well-researched topic, as it enables ubiquitous mobility by providing reliable pose information. Traditional inertial sensor-based odometry estimation methods predict position and orientation using stride length and continuous integration of acceleration in SHS-based PDR and INS-PDR, respectively. Existing traditional PDR solutions rely either on guessing latent states or on the periodicity of human walking and a fixed sensor position. However, these two approaches suffer from error accumulation and do not effectively leverage the full IMU sequence to enhance the accuracy of PDR.
Recently, the demand for deep learning (DL) approaches has been significantly increasing in almost every domain. In climate analysis and weather forecasting, DL techniques such as DNNs and RNNs have been applied [24]. The authors model future climate status by focusing on the use of limited scope and data to investigate their models, and then use parameter tuning and cross-validation on different data. They use a novel CDLSTM model to investigate three important aspects, namely detecting rainfall and temperature trends, analyzing the correlation between temperature and rainfall, and forecasting temperature and rainfall. LSTM is an improvement over the RNN, designed to capture long-range dependencies in time-series data. This network is immensely beneficial for a broad range of circumstances and is now broadly used in various applications, including climate change forecasting and groundwater storage estimation. LSTM is also used for botnet detection and classification in developing fast and efficient networks. The authors in [25] proposed a deep neural network for intrusion detection; they implemented a principal component-based convolutional neural network (PCCNN) approach to improve precision, where PCA is used to reduce the dimension of the feature vector. More recent work in the area of botnet detection and classification uses deep neural network (DNN) models, DNNBoT1 and DNNBoT2, for the detection and classification of internet of things (IoT) botnet attacks [26].
Several studies have proposed deep learning-based data-driven methods to enhance traditional odometry, i.e., the INS algorithm, to make self-localization robust [27]. Hannink et al. [28] proposed a mobile stride length estimation system that constrains double integration approaches from a raw foot-mounted IMU using deep convolutional neural networks. The authors in [29] used a neural network model to classify walking patterns to improve step length estimation. A recent fundamental approach is presented in [30] and repeated in [31], where the authors proposed reliable pose estimation using 6D IMU measurements for attention-driven, rotation-equivariance-supervised inertial odometry. Many studies have focused on applying end-to-end learning approaches to inertial odometry in order to enhance performance in terms of both robustness and accuracy. In the sequence-based approach, LSTMs are applied to learn temporal correlations between multivariate time-series IMU data. The authors in [32] proposed a recurrent neural network (RNN) method to propagate state and regress orientation from IMU measurements. Deep learning-based dead reckoning algorithms have been provided to address the aforementioned issues owing to the end-to-end learning framework. However, conventional models, including RNNs, CNNs, and transformers, have many disadvantages: specifically, they need quadratic time, have high memory usage, and inherit the limitations of the encoder-decoder architecture.
The merits of using deep learning (DL)-based PDR over traditional PDR are its robustness and accuracy. Given their ability to generalize without specifying an explicit analytic model, DL approaches are increasingly being used to learn motion from time-series data. Another advantage of DL-based PDR is that it decreases inertial sensor errors in the multi-mode system and is also capable of estimating motion and generating trajectories directly from raw inertial data without any handcrafted engineering. Conventional PDR approaches have difficulty providing accurate state estimation over long distances and are inadequate to sufficiently control the error explosion from primitive double integration; additional limitations are due to complex user body constraints and motion dynamics. In a departure from other studies, which target estimation of the pose of the smartphone collecting the IMU data, here we focus on the scenario where the input IMU data are collected from a smartphone while the target estimation pose is that of the user holding the phone.
The end-to-end learning framework for inertial odometry mitigates the challenge of IMU data produced at high frequencies, which leads to long sequences [33]. The recurrent neural network is an extremely powerful sequence model suitable for challenges involving sequence processing. However, processing raw IMU data over long-range sequences is challenging and vulnerable to washout, whether an RNN or a CNN architecture is used; to solve this, model-aware preprocessing that compresses the raw IMU data for motion measurement is required. After numerous efforts in the indoor positioning and localization communities over the last couple of years, state-of-the-art inertial odometry (IO) algorithms have demonstrated impressive performance. Due to the disadvantage of computational load, filtering-based PDR solutions using MEMS have proven to be highly challenging, and sliding-window-based optimization methods and end-to-end learning approaches are crucial for position and orientation estimation. To increase performance and efficiency, we provide an end-to-end deep learning architecture for inertial sensor modeling. The contributions of this paper are as follows:

•	We proposed a hybrid CNN and LSTM architecture to learn spatial and temporal features from the input IMU data. We built an end-to-end model designed for pedestrian dead reckoning; from the collected datasets, the aim was to predict the position and orientation of the pedestrian and estimate its trajectory;

•	We presented a 2D LiDAR pose (position and orientation) dataset with IMU data, a new dataset for research on inertial sensor-based pedestrian navigation, intended both to encourage the use of data-driven techniques and to serve as a standard reference;

•	We demonstrated with experimental results that our deep learning-based PDR, trained only on 6-axis IMU data, can estimate 2D pedestrian trajectories; compared with the conventional SHS-PDR approach, it shows good generalizability.
The rest of this work is structured as follows: Section 2 describes related works that are closely connected to the research topic of this paper. Section 3 explains the proposed deep learning approaches and the architecture of the IMU channel. Section 4 describes the experiment used to validate the estimated output pose, and Section 5 offers a comparison with other studies. Section 6 presents and discusses the findings of the experiment. Section 7 concludes the study and outlines future research activities.

Related Work
Inertial odometry has been studied both for odometry in 3D and for classical inertial odometry. The model-based pedestrian dead reckoning (PDR) technique, empowered by either fixed sensor positions or cyclic motion patterns, is broadly considered a type of self-contained indoor positioning system. However, the model-based PDR approach needs an accurate representation of state estimates with regard to the incoming measurements, is inadequate to control the error explosion from primitive double integration, and has basic limitations due to complex user body constraints and motion dynamics.
Chen et al. [34] proposed one of the first data-driven inertial odometry methods using an inertial measurement unit. Recent studies have used IMU data to train neural networks to learn motion models and output velocity estimates directly from IMU measurements. With the use of body-wearable IMUs, data-driven models [35] and filtering models [36] have been implemented in several positioning and navigation fields of application.
There are two indoor positioning approaches utilizing IMU sensors: SHS smartphone-based PDR [37] and foot-mounted IMU-based PDR [38]. Several studies have significantly improved the efficiency of foot-mounted IMU-based inertial navigation systems. The first approach is the zero-velocity update (ZUPT), which improves model capacity by exploiting certain motion constraints. The second approach used to improve the efficiency of pose estimation is the zero angular rate update (ZARU), which reduces gyroscope drift. If a pedestrian dead reckoning system is built on a handheld smartphone, a zero-velocity update cannot be applied when carrying out a filter-based integrated navigation solution. The body-wearable embedded IMU is susceptible to different types of error caused by its intrinsic properties. Step detection, stride length and heading estimation, and position updating are the three basic components of smartphone-based PDR [39]. To identify step peaks and segment the related inertial data, PDR applies a threshold to the inertial data. However, inaccuracies in step length estimation and step detection can still happen, resulting in significant system error drift. Step detection using step-peak methods is still at a preliminary stage of research, and most stride detection is currently performed with learning-based methods. Most step detection currently requires an activity recognition (AR) model, and more attention is given to machine learning (ML) approaches [40]. The second component of SHS-PDR, step length estimation, also has the disadvantage that traditional methods such as regression-based methods, biomechanical models, and empirical relations are not appropriate for running because of larger gait parameter variations. Therefore, the smartphone-embedded IMU required for SHS-PDR is susceptible to different types of error caused by its intrinsic properties.
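As a concrete illustration of the threshold-and-peak step detection discussed above, the following is a minimal sketch in Python. The sampling rate, peak threshold, and minimum step interval are assumed example values, not parameters taken from this paper:

```python
import numpy as np

def count_steps(acc_xyz, fs=100.0, min_peak=1.5, min_step_s=0.3):
    """Count steps by detecting peaks in the acceleration magnitude.

    acc_xyz: (N, 3) accelerometer samples in m/s^2; fs: sampling rate in Hz.
    min_peak: threshold above the gravity-removed magnitude;
    min_step_s: minimum time between steps, to suppress double counting.
    """
    mag = np.linalg.norm(np.asarray(acc_xyz, float), axis=1)
    mag = mag - mag.mean()                 # crude gravity/bias removal
    min_gap = int(min_step_s * fs)
    steps, last = 0, -min_gap
    for i in range(1, len(mag) - 1):
        # a step peak: above threshold, a local maximum, and far enough
        # from the previous detected step
        if (mag[i] > min_peak and mag[i] >= mag[i - 1]
                and mag[i] > mag[i + 1] and i - last >= min_gap):
            steps += 1
            last = i
    return steps
```

For example, a synthetic 10 s walking signal oscillating at 2 Hz yields 20 detected steps.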
Machine learning-based inertial odometry solutions eliminate the need for manual tuning during testing and turn the incorporation of inertial navigation into a continuous time-series learning activity. Commonly used methods in INS modeling are adaptive neuro-fuzzy inference systems (ANFIS) [41], the adaptive fuzzy extended Kalman filter (AFEKF) [42], the rotational symmetry of pedestrian dynamics [43], and support vector machines (SVM) [44]. The authors in [45] first proposed a mobile stride length estimation system that constrains double integration approaches from a raw foot-mounted IMU using deep convolutional neural networks, considering the robust zero-velocity detector (ZVD) method used with a foot-mounted IMU. The authors in [46] employ histogram-based gradient boosting because of its efficiency and achieve comparable results for various types of motion. Many authors [47] focus on orientation estimation, leveraging sequential models to include prior information about the sequential nature of IMU signals; others focus on position estimation, continuously tracking IMU ego-motion to produce IO [48]. Sequence-based IO methods have received increasing attention to address the problem of error propagation. This approach breaks the data into independent windows to segment the inertial data. States such as orientation, velocity, and position are not directly visible but are derived from the inertial data and propagated over time. In the sequence-based approach, LSTMs are applied to learn temporal correlations between multivariate time-series IMU data. In these types of inertial navigation system models, automatically discovered features relevant to the task are used. The authors in [49] proposed an end-to-end deep learning framework to tackle the inertial attitude estimation problem based on IMU measurements.

System Overview
General PDR systems based on end-to-end learning frameworks can practically be categorized into two architectures. One uses a CNN architecture to predict spatial features: the CNN extracts spatial features of the input IMU data from the handheld smartphone and provides the feature maps to the LSTM. The other employs two bidirectional LSTM models to capture time-series temporal features. The proposed hybrid CNN-LSTM-based PDR approach estimates the position and orientation of pedestrians using the multi-sensor-equipped backpack device (accelerometer, gyroscope, and backpack LiDAR global pose). The input of our network is a six-axis IMU sequence of a pedestrian trajectory in an indoor environment; the output is the pedestrian position and orientation. Figure 1 shows the proposed architecture of the CNN-LSTM-based PDR, which mainly consists of two combined modules: a 1D CNN and two bidirectional LSTMs. The CNN-LSTM combination was used to extract effective features for motion estimation. We introduced a recurrent neural network (RNN), which is particularly suited to problems that require sequence processing; the LSTM architecture was developed to enable RNNs to learn longer-term trends. The CNN consists of layers that apply convolution filters to local features. The basic CNN block is composed of standard pointwise linear functions, nonlinearities, and residual connections. The input IMU sequence is sequentially transformed into shorter sequences with more features through pooling layers. For deep learning-based pedestrian dead reckoning, CNN and LSTM cooperate to obtain the benefits of both modules. As shown in Figure 2, the six-axis IMU input sequence with a window of 200 frames is processed by 1D convolutional layers with kernels of size 11, 128 features of fixed dimension, and a maximum pooling layer of size 3. The output of the concatenated 1D CNN model is fed to the first bidirectional LSTM. Since future and previous IMU readings influence the relative pose regression, a two-layer bidirectional LSTM model was used: the output of the first bidirectional LSTM is the input to the second. To prevent overfitting, dropout layers with a rate of 0.25 were used. Finally, the estimated relative pose is generated by a fully connected layer.
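The layer stack described above can be sketched in Keras as follows. This is a minimal sketch: the kernel size of 11, 128 filters, pooling size of 3, two bidirectional LSTMs, and 0.25 dropout follow the text, while the LSTM width, activation choices, and the three-dimensional output head are assumptions:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn_lstm(window=200, channels=6):
    """Sketch of the hybrid CNN-LSTM PDR network (sizes as stated above)."""
    inp = layers.Input(shape=(window, channels))        # 6-axis IMU window
    x = layers.Conv1D(128, kernel_size=11, activation="relu")(inp)
    x = layers.MaxPooling1D(pool_size=3)(x)
    x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)
    x = layers.Dropout(0.25)(x)                         # prevent overfitting
    x = layers.Bidirectional(layers.LSTM(128))(x)
    x = layers.Dropout(0.25)(x)
    out = layers.Dense(3)(x)   # relative pose head (dimension is an assumption)
    return models.Model(inp, out)

model = build_cnn_lstm()
pred = model(np.zeros((2, 200, 6), dtype="float32"))    # batch of 2 windows
```

The fully connected head regresses the relative pose for each 200-frame IMU window, matching the input/output description in the text.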

Six Degrees of Freedom (6DOF) Relative Position and Orientation Representation
A six degrees of freedom (6DOF) relative position and orientation can be described in numerous ways. The first approach extends the polar coordinate system to 3D space by using the spherical coordinate system. The second approach uses a 3D distance vector and a unit quaternion; when dealing with motion in any direction, this characterization captures the orientation accurately. Thus, considering the first approach, the spherical coordinate system, the relative position and orientation are obtained by:

∆l = √((x_t − x_{t−1})² + (y_t − y_{t−1})² + (z_t − z_{t−1})²), ∆θ = arccos((z_t − z_{t−1})/∆l), ∆∅ = arctan2(y_t − y_{t−1}, x_t − x_{t−1}), (1)

where (x_t, y_t, z_t) and (x_{t−1}, y_{t−1}, z_{t−1}) are the current and previous positions, respectively, ∆l is the traveled distance, ∆θ is the inclination change, and ∆∅ is the change in heading.
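The spherical-coordinate relative pose described above can be computed with a short helper. The exact angle conventions (θ measured from the z-axis, ∅ in the x-y plane) are an assumption in this sketch:

```python
import math

def relative_pose(prev, curr):
    """Relative displacement in spherical form.

    prev, curr: (x, y, z) positions. Returns (dl, dtheta, dphi):
    traveled distance, inclination of the step, and heading of the step.
    Angle conventions (theta from +z, phi in the x-y plane) are assumptions.
    """
    dx = curr[0] - prev[0]
    dy = curr[1] - prev[1]
    dz = curr[2] - prev[2]
    dl = math.sqrt(dx * dx + dy * dy + dz * dz)       # traveled distance
    dtheta = math.acos(dz / dl) if dl > 0 else 0.0    # inclination
    dphi = math.atan2(dy, dx)                         # heading
    return dl, dtheta, dphi
```

For a purely horizontal unit step along x, this returns a distance of 1, an inclination of π/2, and a heading of 0.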

Loss Function
The loss function consists of two parts: the position loss (p-loss) and the orientation loss (q-loss). The losses for the position change ∆p and orientation change ∆q are estimated separately because they have different scales in the proposed model. As described in [17], the loss function of the output is trained using multi-task learning. The basic, uncomplicated way to estimate the loss for the six degrees of freedom (6DOF) odometry problem is to consider a consistent weighting of the losses. The total loss function is given in Equation (2):

L_total = Σ_i (1/(2σ_i²)) L_i + (1/2) log σ_i², (2)

where σ_i² and L_i are the variance and loss function of the i-th task. Let the estimated position and orientation be described as follows:

∆p = p_t − p_{t−1}, ∆q = q_{t−1}^{−1} ⊗ q_t,

where the relative pose (∆p, ∆q) is calculated from the previous and current poses; p_{t−1}, q_{t−1}, p_t, and q_t are associated with a given IMU data window.
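The uncertainty weighting of Equation (2) can be sketched in plain Python. Here the log-variances are fixed inputs for illustration; in actual training they would be learnable parameters:

```python
import numpy as np

def total_loss(losses, log_sigma2):
    """Uncertainty-weighted multi-task total loss (sketch of Equation (2)).

    losses: per-task loss values L_i (e.g. position loss, orientation loss).
    log_sigma2: log-variances log(sigma_i^2), fixed here but learnable
    parameters during training (an assumption of this sketch).
    """
    losses = np.asarray(losses, dtype=float)
    log_s2 = np.asarray(log_sigma2, dtype=float)
    # weight each task by 1/(2*sigma_i^2) and add the log-variance term
    return float(np.sum(losses / (2.0 * np.exp(log_s2)) + 0.5 * log_s2))
```

With unit variances (log σ² = 0), the total loss reduces to half the sum of the per-task losses, which matches the consistent-weighting baseline mentioned in the text.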

Experimental Setup
As shown in Figure 3, the whole sensor setup contains a smartphone and a LiDAR scanner. We used a Samsung Galaxy S21 Ultra 5G phone to collect the IMU data in indoor scenes and measured the accuracy of the recovered trajectories. The backpack was equipped with LiDAR equipment. We first conducted experiments to demonstrate the pedestrian dead reckoning problem. The data used in our system were extracted from the backpack device beforehand.
For each location, the floor plan and recorded sequence of LiDAR poses are shown in Figure 2c,d for scenario 1 and scenario 2, respectively. Five data sequences were gathered for scenario 1 and five for scenario 2. To visualize trajectories through a corridor of our building, we plotted the LiDAR pose, which was considered the ground truth, as shown in Figure 3a,b.

Dataset Acquisition and Pre-Processing
In this section, we describe the dataset acquisition, data description, and preprocessing. To obtain a highly accurate neural network model for PDR, the acquisition of a reliable dataset is very important. The data were collected using a backpack laser scanner and a handheld smartphone. A detailed description of the dataset is given in Table 1. The dataset collection was carried out along the corridor of ETRI building 12 on the fifth floor, on flat ground. To reflect the real-world applicability of pose estimation, the dataset was collected considering only one motion mode, i.e., normal walking, by four test subjects in two different scenarios (medium and long distance). The LiDAR pose global coordinates were first converted to meters; a Python script using the Pyproj library transformed them from the national coordinate system of the Republic of Korea, where the data were collected, to WGS84. Data recorded by the IMU inside the smartphone were read out and processed by software written in Python. See Figure 3a,b for the trajectories of the measurements carried out using the backpack scanner. The dataset contains sequences of data labeled with LiDAR absolute positions, which were converted to meters before being given to the model for training and testing.
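The coordinate conversion step can be sketched with Pyproj's Transformer API. The source EPSG code below is an assumption (one of the Korea 2000 projected belts); the paper does not state which Korean CRS the LiDAR poses use:

```python
from pyproj import Transformer

# Assumption: LiDAR poses are in Korea 2000 / Central Belt 2010 (EPSG:5186);
# the dataset's actual source CRS may differ.
to_wgs84 = Transformer.from_crs("EPSG:5186", "EPSG:4326", always_xy=True)

def lidar_to_wgs84(easting, northing):
    """Convert a backpack LiDAR pose from the Korean grid to WGS84 lon/lat."""
    lon, lat = to_wgs84.transform(easting, northing)
    return lon, lat
```

With `always_xy=True` the transformer consumes (easting, northing) and returns (longitude, latitude), so the axis order is unambiguous regardless of the CRS definition.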

Table 1 shows the description of the inertial odometry datasets collected from four different subjects: 1 female and 3 males between the ages of 30 and 49. Each trajectory is about 2 min of normal walking along a corridor in a building, as illustrated in Figure 2c,d. IMU data from the smartphone were used as input to the model, which tries to estimate the positions recorded by the backpack positioning system, as shown in Figure 2a,b.

Model Training Details
The models were implemented in Python. Python 3.8 and CUDA 10.2 were used to build and compile our model. We trained the CNN-LSTM model on the self-collected datasets. Training was run for 100 epochs, as we noticed that performance did not improve with further training. We implemented the framework algorithms with TensorFlow 2.3.0 and Keras 2.4.3, using the Adam optimizer with a learning rate of 0.0001. The computations were performed on Microsoft Windows 10.0 with an NVIDIA GeForce GTX TITAN X GPU. All models were trained with a batch size of 64 samples. The datasets were split using a window size of 200 and a stride of 10. The window size determines the length of lookback IMU readings used to predict the difference between the position and orientation at the beginning and at the end of the stride interval. We split our dataset into training and validation sets with a 9-to-1 ratio. After training was completed, the model saved during the training session with the best validation loss was used for testing.
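The windowing scheme described above (window size 200, stride 10) can be sketched as follows. The data layout, with IMU readings and ground-truth poses aligned row by row, is an assumption of this sketch:

```python
import numpy as np

def make_windows(imu, poses, window=200, stride=10):
    """Slice an IMU stream into training samples.

    imu: (N, 6) accelerometer + gyroscope readings.
    poses: (N, k) ground-truth poses aligned with the IMU rows (assumption).
    Each sample is a (window, 6) IMU slice labeled with the pose change
    between the window's first and last frame.
    """
    xs, ys = [], []
    for start in range(0, len(imu) - window + 1, stride):
        end = start + window
        xs.append(imu[start:end])
        ys.append(poses[end - 1] - poses[start])   # lookback-interval change
    return np.stack(xs), np.stack(ys)
```

For a 250-sample stream this yields 6 overlapping windows (starts 0, 10, ..., 50), each labeled with the pose displacement over its 200 frames.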

Hyper-Parameter Tuning
Tuning the hyperparameters is a very important step in training neural network models. During the tests, the batch size was determined by the limits of the GPU memory. The window size and stride depend on the sampling rate of the sensor data and ground-truth positions and were selected after a preliminary search within a range similar to other studies. At this stage, we were concerned with selecting a model size with sufficient capacity to learn the task on this dataset; reducing the model size to run well on mobile devices was considered at a later stage. We tested two configurations with 128 and 192 filters, and 1,159,689 and 2,599,689 parameters, respectively. The larger model achieved a slightly better minimum validation loss of −12.65, but its testing errors were slightly worse than the smaller model's. Above a certain model size, it seems that performance is not very sensitive to an additional increase in the number of parameters. Therefore, we used the smaller model size in the rest of the experiments.
To determine an appropriate optimizer and learning rate, we performed a grid search over the Adam, SGD, and RMSprop optimizers and learning rates of 1 × 10⁻⁴, 3 × 10⁻⁴, 6 × 10⁻⁴, and 1 × 10⁻³. The training results are visualized in Figure 4. We selected 200 epochs for training, confirming that this was sufficient to observe the flattening of the loss curves. Each optimizer approached a specific loss value that was not sensitive to the learning rate. The SGD optimizer performed worse than Adam and RMSprop, which approached similar loss values; the Adam optimizer slightly outperformed RMSprop. The learning rate affected how fast we approached the minimum loss value. These results motivated us to use the Adam optimizer with a learning rate of 1 × 10⁻⁴ in the later experiments, as indicated in Table 2.
Since we used the lower learning rate, we increased the number of training epochs to 500 to ensure sufficient time for the loss to decay. Increasing the learning rate also resulted in a noisier loss curve at low loss values. From Table 3, we can confirm that the Adam optimizer with a learning rate of 1 × 10⁻³ achieved the minimum validation loss of −12.62; second was Adam with a learning rate of 3 × 10⁻⁴ and a validation loss of −12.58, but with the advantage of a smoother loss curve. Table 4 summarizes the test errors averaged across all test trajectories for each learning rate and optimizer combination. Since we are interested in position estimation accuracy, we report the standard RMSE and MAE measures for the task; for completeness, we add the R² metric too. As expected, the models trained with the Adam and RMSprop optimizers significantly outperformed the models trained with SGD, and Adam models again performed slightly better than RMSprop models. Surprisingly, the minimum MAE and RMSE errors were achieved by Adam with a learning rate of 1 × 10⁻⁴, despite the validation losses for learning rates of 1 × 10⁻³ and 3 × 10⁻⁴ being lower. The reported mean R² values for the Adam optimizer were affected by bad trajectory results in two out of eleven trajectories.

Evaluation of Model Performance
A deep learning architecture was explored to improve accuracy. Using data from a low-cost IMU-embedded smartphone and a LiDAR scanner, qualitative and quantitative analyses were performed for evaluation.
Root mean square error (RMSE): The RMSE was used as a standard statistical metric to measure model performance. The underlying assumption when presenting the RMSE is that the errors are unbiased and follow a normal distribution. In the pose prediction stage, a loss function was employed to determine the prediction error over the training samples; this error shows where the output differed from what was anticipated. To investigate the system performance in terms of the deviation between the ground truth and the estimated position, we defined a cost function based on the RMSE metric as follows:

RMSE = sqrt( (1/n) Σ_{i=1}^{n} ||p_i − p̂_i||² )

where n is the number of reading timestamps, p_i is the actual position, and p̂_i is the predicted position.

Mean absolute error (MAE):
The MAE is another widely used measure in model evaluation. It is commonly used to evaluate the performance of guidance and navigation systems and represents the global accuracy of the estimated position. For a regression problem, the MAE is the average absolute difference between the predicted and actual values:

MAE = (1/n) Σ_{i=1}^{n} |p_i − p̂_i|

where p_i is the actual value, p̂_i is the predicted value, and n is the number of predicted values. These metrics are error rates; thus, a lower MAE indicates a more accurate prediction.
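A minimal sketch of the two metrics as applied to position sequences is shown below. Averaging the MAE over both samples and coordinates, and summing the RMSE over the coordinate components of each position, are assumptions made for illustration, since the exact per-coordinate aggregation is not specified in the text.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between actual and predicted positions.

    Each element of `actual`/`predicted` is a tuple of coordinates
    (e.g. (x, y) or (x, y, z)); the squared Euclidean distance per
    sample is averaged over n samples, then square-rooted.
    """
    n = len(actual)
    return math.sqrt(
        sum(sum((a - p) ** 2 for a, p in zip(pa, pp))
            for pa, pp in zip(actual, predicted)) / n
    )

def mae(actual, predicted):
    """Mean absolute error, averaged over samples and coordinates."""
    n = len(actual)
    dims = len(actual[0])
    return sum(abs(a - p)
               for pa, pp in zip(actual, predicted)
               for a, p in zip(pa, pp)) / (n * dims)
```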

Experimental Results and Analysis
Four users conducted experiments, each collecting five time sequences of IMU data, which were trained with the same hyperparameter settings in both the training and testing phases. The RMSE described in Section 4.5 was used for evaluation. Table 4 details the empirical RMSE results for each user. Compared with the traditional step-and-heading-system-based pedestrian dead reckoning method, our end-to-end model on the lightweight 6-axis IMU provided an important enhancement.
Table 5 summarizes the evaluation results of the proposed method for the walking sequences of the different subjects, with the position estimation accuracy of each sequence reported as RMSE. The best result, with the lowest RMSE, was achieved on the data collected by user 2. For the first scenario dataset (Table 5), the estimated RMSE varied between a maximum of 1.99 and a minimum of 0.51, and, similarly, the estimated MAE varied between a maximum of 1.56095 and a minimum of 0.1210. From the table, the female subject's RMSE and MAE were notably small, even though all users were instructed to perform the experiment in the same walking mode. Figure 5 shows a box plot of the RMSE mean of each user in Table 5. The average root mean square error ranged from 0.51 to 1.61 m across the three datasets. The comparison was between the sequences collected by the users in scenario 1 and the ground truth provided by the LiDAR dataset. The RMSE of user 1 was the largest because he was the tallest of the subjects and his stride was very fast during normal walking, whereas user 2, the only female participant, had a relatively very small MAE.
Table 6 summarizes the corresponding evaluation results for scenario 2. We evaluated the mean RMSE and mean MAE of each sequence to demonstrate its efficiency. The best result, with the lowest RMSE, was again achieved by user 2. For this dataset, the estimated RMSE varied between a maximum of 0.346 and a minimum of 0.258, and similarly the estimated MAE varied between a maximum of 0.999 and a minimum of 0.0761. The trajectory of each sequence pose estimated by the model on our datasets is shown in Figure 8. Figure 9 compares each user's trajectory for scenario 2; we display the root mean square error for each sequence from the four runs plotted in Figure 9a–d.
Figure 10 shows sample sequence 01 of user 1. The orientations shown were obtained by converting the quaternions to Euler angles. As indicated in Figure 10, the orientation affected the position error, as seen by comparing the use of the phone orientation with the ground truth orientation. The orientation module improved the performance of all the position models (quite significantly for CNN-LSTM) and nearly reached the theoretical maximum performance obtained when ground truth orientations were provided directly.
The training and validation loss versus epochs in Figure 11 shows how the epochs were distributed among the different users. The training and validation losses for large numbers of epochs were not smooth because of the variation in the learning rate, and there was a probability of encountering bad data for learning with large epoch counts and batch sizes. Even though the training and validation losses were very smooth at 100 epochs, the trajectory estimation was not entirely consistent within each user. The users' specific validation and learning rate results were used to select the optimal number of epochs and the parameter sensitivity for the learning model. First, we selected a suitable number of epochs for model training. Figure 11c,d shows the training and validation loss results for the model. In each case, the training and validation losses reached their lowest levels when between 100 and 500 epochs were used.
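The quaternion-to-Euler conversion used for the orientations in Figure 10 can be sketched as follows. The ZYX (roll-pitch-yaw) convention is assumed here, as it is the most common for smartphone orientation; the paper does not state which convention was used.

```python
import math

def quaternion_to_euler(w, x, y, z):
    """Convert a unit quaternion (w, x, y, z) to roll, pitch, yaw in
    radians, using the ZYX convention (an assumption, see lead-in)."""
    roll = math.atan2(2 * (w * x + y * z), 1 - 2 * (x * x + y * y))
    # Clamp to avoid math domain errors from numerical noise near
    # gimbal lock (|pitch| = 90 degrees).
    sinp = max(-1.0, min(1.0, 2 * (w * y - z * x)))
    pitch = math.asin(sinp)
    yaw = math.atan2(2 * (w * z + x * y), 1 - 2 * (y * y + z * z))
    return roll, pitch, yaw
```

For example, the identity quaternion (1, 0, 0, 0) maps to zero roll, pitch, and yaw, and a rotation of 90 degrees about the vertical axis yields a yaw of π/2.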

Conventional PDR Trajectory Estimation
Taking a sample trajectory from the collected sequence of IMU data for scenario 2, we estimated the position using conventional SHS-based PDR. First, we detected steps and estimated the step length using biomechanical methods. Second, we estimated the heading using an attitude and heading reference system. Finally, positions were estimated by combining the step length with the estimated heading.
Figure 12 shows the trajectory of the position estimated by conventional SHS PDR. The blue line is the estimated position and the yellow line is the ground truth from the LiDAR pose.
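The SHS position update described above (step length combined with heading) can be sketched as the following 2D dead-reckoning loop. Step detection, stride estimation, and heading estimation are assumed to have been performed upstream; this sketch only shows how the per-step quantities are composed into positions.

```python
import math

def shs_pdr(start, step_lengths, headings):
    """Step-and-heading PDR: advance the 2D position by one stride per
    detected step, along the estimated heading (radians, world frame).

    Returns the full trajectory, starting at `start`.
    """
    x, y = start
    trajectory = [(x, y)]
    for stride, theta in zip(step_lengths, headings):
        x += stride * math.cos(theta)
        y += stride * math.sin(theta)
        trajectory.append((x, y))
    return trajectory
```

Because each position depends on all previous strides and headings, any bias in the stride or heading estimate accumulates along the trajectory, which is exactly the drift behavior discussed for conventional SHS PDR.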


The heading estimation influences the position estimation accuracy; thus, it is very important to select an algorithm with good heading estimation accuracy when simulating the conventional pedestrian dead reckoning process. Figure 13 compares the error accumulation as the trajectory length increases in the traditional SHS PDR method; the CNN-LSTM PDR was used to counteract the resulting position drift. Figure 13 shows the CDF of the location errors in the experiment corresponding to one of the motion types and scenario 2. We evaluated the horizontal position error results using the CDF to compare the degree of divergence within a very short distance. The CDF of the 2D horizontal position error increased with time, and the probability of the error being within 3 m was more than 80%, as indicated in Figure 13. The position error range was greatly reduced by the deep learning-based PDR algorithm, which improved the system's positioning performance.
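The empirical CDF used for the horizontal position error above (e.g., the fraction of errors within 3 m) can be computed with a short helper like the one below; this is a generic sketch, not the paper's evaluation code.

```python
def error_cdf(errors, threshold):
    """Empirical CDF: fraction of horizontal position errors that are
    less than or equal to `threshold` (same units as the errors)."""
    if not errors:
        raise ValueError("errors must be non-empty")
    count = sum(1 for e in errors if e <= threshold)
    return count / len(errors)
```

For example, with errors of 0.5, 1.0, 2.0, and 4.0 m, the CDF at 3 m is 0.75, i.e., 75% of the errors fall within 3 m.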
This work was inspired by the state-of-the-art deep learning that has been used to improve PDR. Previous efforts in the study of PDR for indoor positioning used model-based methods, commonly the INS and SHS approaches; however, each of these methods suffers from error drift. Currently, data-driven methods can constrain system error drift and predict the position and orientation of pedestrians without any handcrafted engineering. In this work, we therefore presented end-to-end learned inertial odometry to outperform previous model-based approaches.
We used a combination of CNN and LSTM to extract features from a multi-channel low-cost IMU carried in a backpack device; this combination was used to extract effective features for motion estimation. In addition, we evaluated the traditional SHS-based PDR, which relied on a three-heading estimation mechanism, to compare the performance of the two PDR approaches. As a result, we found that the CNN-LSTM-based PDR improved the accuracy compared with SHS PDR. To the best of our knowledge, there were no previous efforts to evaluate either of the aforementioned methods qualitatively and quantitatively using a self-collected odometry dataset. Finally, this study concludes that the deep learning-based PDR outperforms the traditional PDR.

Comparison with Other Studies
Deep learning-based PDR is an active research topic, and previous studies provide both a baseline for performance comparison and hints at unexplored subproblems worth pursuing. Previous studies have selected comparison baselines from conventional PDR methods such as SHS-PDR and INS-based PDR. In this section, we compare our results with other contemporary PDR methods based on machine learning and deep learning techniques, with reference to the accuracy observed on the odometry dataset in the context of trajectory length.
Table 7 shows that methods based on the hybrid CNN-SVM achieved significantly better performance than the CNN method alone, based on the accuracy values. In our case, a hybrid CNN-LSTM network trained on different users showed that, in most cases, the MAE and RMSE were lower than those of traditional PDR methods, as shown in Table 8. It is important to note that a small error here represents a significant improvement over the PDR result. The average MAE over all users showed an improvement, and, regarding the RMSE, the effect of noise was limited: the RMSE of each user remained lower than that of existing methods. As an additional qualitative comparison, Figure 14 shows the ground truth and estimated trajectories on a dataset from the public repository provided by [17]. Our results are shown in the right column and were obtained from a model trained with the Adam optimizer and a learning rate of 3 × 10−4 (see Table 4).


Discussion
In this paper, we used a backpack LiDAR system with laser scanners and a handheld smartphone IMU to illustrate its ability to enhance the efficiency of a deep learning-based robust positioning PDR system. Collecting low-cost inertial measurement unit (IMU) data from a handheld smartphone and LiDAR poses from the backpack is crucial for estimating navigation solutions and for providing ground truth, respectively. While such data are relatively difficult to acquire in indoor, GPS-denied environments, ground truth is essential for evaluating pedestrian dead reckoning algorithms. We also introduced a new dataset to encourage the community to adopt deep learning-based PDR evaluation for indoor environments where external observers are unavailable. The proposed approach was trained and tested on our smartphone dataset, collected with the backpack multi-sensor system by five individuals. We inherited the settings of a 2-layer BiLSTM for the position and orientation estimation mechanisms. The input was a sequence of IMU measurements from the handheld smartphone, with LiDAR poses in the world frame as ground truth. We evaluated the CNN-LSTM on five individually collected datasets against conventional filtering-based approaches.
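Feeding "a sequence of IMU measurements" to the network implies slicing the continuous IMU stream into fixed-length training windows. A generic sketch of that preprocessing step is shown below; the window and stride values are illustrative assumptions, as the paper does not state them here.

```python
def make_windows(samples, window, stride):
    """Slice a long IMU stream into fixed-length, possibly overlapping
    training windows. `samples` is a list of per-timestamp readings."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    return [samples[i:i + window]
            for i in range(0, len(samples) - window + 1, stride)]
```

A stride smaller than the window produces overlapping windows, which is a common way to increase the number of training samples from a fixed amount of recorded IMU data.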
By comparing the results of our data-driven approach with model-based or filtering-based PDR requiring no pre-installed infrastructure, we found that the CNN-LSTM neural network improved the accuracy of the PDR compared with conventional SHS PDR. These results show that the proposed deep learning-based pedestrian dead reckoning methodology is a useful position and orientation estimation tool for IMUs captured from handheld smartphones and backpacks. In many scenarios, IMUs are expectedly noisy and thus provide imperfect data. To determine the accuracy of the two approaches in our scenarios and evaluate the influence of the neural network, we prepared two datasets with inertial data (consisting of accelerometer, gyroscope, and LiDAR pose data).

Conclusions
The experiments described in this paper aimed to evaluate the deep learning-based pedestrian dead reckoning under discussion across different sequences of smartphone-embedded IMU data and LiDAR ground truth poses. We evaluated how a hybrid deep learning CNN-LSTM system can efficiently estimate an accurate position even when operating in a largely GPS-denied environment. A hybrid CNN-LSTM model with an end-to-end training mechanism for pedestrian dead reckoning was proposed. The proposed indoor positioning system extracts spatial and temporal features from the input IMU data: the CNN module extracts spatial features from the IMU measurements, followed by two bi-directional LSTMs with SoftMax scoring alignment, which further capture the temporal features. Our experiments validated the effectiveness of the proposed CNN-LSTM-based PDR in terms of pose estimation accuracy.
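In a CNN front end like the one summarized above, each 1-D convolution shortens the temporal dimension of the IMU window before it reaches the BiLSTM. The standard output-length formula makes this concrete; the window length and kernel settings below are illustrative assumptions, not the paper's configuration.

```python
def conv1d_out_len(n, kernel, stride=1, padding=0, dilation=1):
    """Output length of a 1-D convolution applied to a sequence of n
    samples (the standard formula used by deep learning frameworks)."""
    return (n + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

# Hypothetical example: a 200-sample IMU window passed through two
# convolution layers (kernel 3, stride 1, no padding) before the
# bidirectional LSTM. Each layer removes kernel - 1 = 2 timesteps.
n = 200
for _ in range(2):
    n = conv1d_out_len(n, kernel=3)
```

The resulting length (196 here) is the number of timesteps the BiLSTM sees; with padding, the sequence length would instead be preserved.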
The limitation of this study is that only normal walking was considered; other modes of motion, such as side stepping, running, and climbing and descending stairs, are to be considered at a later stage. In addition to unconstrained smartphone-based PDR, in future work we will consider adding a self-attention mechanism to increase robustness by mitigating noise spikes and missing measurements and by improving generalization over a variety of smartphone models. Another promising approach is structured state-space sequence modeling, which targets the problem of long-range dependencies; it would help capture rich building interior contexts and improve performance on trajectories specific to a given building.

Figure 1 .
Figure 1. Overview of the proposed CNN-LSTM model.


Figure 2 .
Figure 2. The device used for the indoor experiment. It contains the multisensorial backpack device used in the indoor environment to collect the dataset. (a) User carrying the backpack device. (b) Monitor for real-time display of the backpack positioning system. (c) Scenario 1 ground truth plotted. (d) Scenario 2 ground truth plotted.

Figure 6
Figure 6 shows a boxplot summarizing the RMSE and MAE values for the different subjects. The box plot visualizes the maximum and minimum of the statistically estimated position errors.

Figure 6 .
Figure 6. RMSE vs. MAE of different users for scenario 2.

Figure 7
Figure 7 illustrates the estimated RMSEs of each coordinate x, y, and z, and of the total trajectory. The x, y, and z coordinates showed increased errors when the user's direction changed. The test trajectories' length along the x-axis was twice the length along the y-axis, which may be related to the observed large RMSE.

Figure 5 .
Figure 5. RMSE vs. MAE of different users for scenario 1.


Figure 7 .
Figure 7. Boxplot of user trajectory RMSEs with the CNN-LSTM model, where (a) is user 1; (b) is user 2; (c) is user 3; (d) is user 4. The trajectory of each sequence pose estimated by the model on our datasets is shown in Figure 8.

Figure 9 .
Figure 9 shows a comparison of each user's trajectory in scenario 2. We display the root mean square error for each sequence from the four runs plotted in Figure 9a–d.

Figure 9 .
Figure 9. The estimated trajectory of the proposed method for scenario 2: (a) user 1; (b) user 2; (c) user 3; (d) user 4.

Figure 11 .
Figure 11. Each user's training and validation loss at selected epochs for the model. (a) Training loss for 100 epochs; (b) validation loss for 100 epochs; (c) training loss for 500 epochs; (d) validation loss for 500 epochs.

Figure 12 .
Figure 12. The trajectory of conventional SHS-PDR. The return position error of the two scenarios was evaluated over a long-range distance for comparison.

Figure 13 .
Figure 13. Cumulative distribution function (CDF) of the conventional SHS PDR horizontal position error (office building, scenario 2).


Figure 14 .
Figure 14. Ground truth and estimated trajectories on a dataset from the public repository provided by [17]. (a,c,e) show the estimated trajectory with normal training data; (b,d,f) show the estimated trajectory with a better-tuned optimizer and learning rate. The results in the left column are from the publicly released model [17]. Our results are shown in the right column and were obtained from a model trained with the Adam optimizer and a learning rate of 3 × 10−4 (see Table 4).

Table 3 .
Minimum validation loss achieved by each combination of optimizer and learning rate.

Table 4 .
Comparison of the test errors for different learning rate and optimizer combinations. Test error metrics are averaged across all test trajectories for each combination.

Table 5 .
The RMSE and MAE comparison for each sequence measurement by each user (scenario 1).

Table 6 .
The RMSE and MAE comparison for each sequence measurement by each user (scenario 2).


Table 7 .
Comparison of CNN-LSTM localization accuracy with traditional PDR trajectory (first scenario).

Table 8 .
Comparison of different deep learning-based and machine learning techniques used for IMU data-based PDR.
