Smartphone Sensor-Based Human Locomotion Surveillance System Using Multilayer Perceptron

Featured Application: The proposed methodology is an application for monitoring people, tracking, and localization, which is evaluated over several challenging benchmark datasets. The technique can be applied in advanced surveillance security systems that help to find targeted persons, to track their functional movements, and to observe their daily actions.

Abstract: Applied sensing technology has made it possible for human beings to experience a revolutionary aspect of the science and technology world. Along with many other fields in which this technology is working wonders, human locomotion activity recognition, which finds applications in healthcare, smart homes, life-logging, and many other fields, is also proving to be a landmark. The purpose of this study is to develop a novel model that can robustly handle divergent data that are acquired remotely from various sensors and make an accurate classification of human locomotion activities. The biggest support for remotely sensed human locomotion activity recognition (RS-HLAR) is provided by modern smartphones. In this paper, we propose a robust model for RS-HLAR that is trained and tested on remotely extracted data from smartphone-embedded sensors. Initially, the system denoises the input data and then performs windowing and segmentation. Then, these preprocessed data go to the feature extraction module, where Parseval's energy, skewness, kurtosis, Shannon entropy, and statistical features from the time domain and the frequency domain are extracted. Advancing further, by using Luca-measure fuzzy entropy (LFE) and Lukasiewicz similarity measure (LS)-based feature selection, the system drops the least-informative features and shrinks the feature set by 25%. In the next step, the Yeo-Johnson power transform is applied, which is a maximum-likelihood-based feature optimization algorithm. The optimized feature set is then forwarded to the multilayer perceptron (MLP) classifier that performs the classification. The MLP uses cross-validation for training and testing to generate reliable results. We designed our system while experimenting on three benchmark datasets, namely MobiAct_v2.0, Real-World HAR, and Real-Life HAR. The proposed model outperforms the existing state-of-the-art models by scoring a mean accuracy of 84.49% on MobiAct_v2.0, 94.16% on Real-World HAR, and 95.89% on Real-Life HAR. Although our system can accurately differentiate among similar activities, excessive noise in data and complex activities have shown an inverse effect on its performance.


Introduction
With the evolution of the technological world, remote sensing has secured an indispensable role in multitudinous fields. It enables researchers to obtain huge amounts of data to design smart applications, and in many cases, it does not even cause any disturbance in the environment. One of the remote applications that have caught the attention of researchers is remotely sensed human locomotion activity recognition (RS-HLAR) [1][2][3][4][5][6][7]. Some examples of human locomotion activities are walking, sitting, jogging, jumping, climbing stairs, coming down the stairs, standing, and running. The accurate recognition of these activities can assist the design process of a variety of applications related to smart homes [8][9][10][11], life logging [12][13][14][15][16][17], indoor localization [18], healthcare [19][20][21], rescue [22], fitness [23], surveillance [24], and entertainment [25]. As an exemplary scenario, consider a person in a huge and crowded shopping mall whose location is to be identified. This seemingly arduous task can be handled easily if we remotely track the person's locomotion activities and record them with their timestamps.
When it comes to remote devices, smartphones are second to none. Present-age smartphones have become very powerful by virtue of the sensors embedded in them [26]. Another quality that makes them eminent is their ubiquity. With these features, smartphone technology proves to be one of the best means to design remote applications. Types of smartphone sensors that have been used for RS-HLAR include inertial sensors and proximity sensors [27,28]. A typical inertial sensor used for HLAR is the accelerometer, which measures the force along the x, y, and z axes [29][30][31]. However, when using smartphone-embedded sensors, the significant issue of the position and orientation of the smartphone arises, as every individual has a preference for carrying their smartphone in their hand, in a pocket, or in a bag. This issue is resolved by adding some assistive sensors to the framework. An accelerometer combined with a gyroscope, which measures the angular velocity about the x, y, and z axes, and a magnetometer, which gives the magnetic field intensity in the x, y, and z directions, resolves the position and orientation issue and makes the system user-independent [32]. In addition to these, other assistive sensors can be added to the system to enhance its accuracy.
Despite all these advances, several challenges are also faced in RS-HLAR. In the wireless transmission of the sensor signals, there is a large possibility of noise intrusion and even complete distortion of the signal in the worst case. In such cases, the identification and classification of the performed locomotion activity becomes burdensome. Another challenge is the compound and complex locomotion activities. These are the activities that are composed of two or more individual activities, e.g., a person has fallen when they were trying to sit on a chair. The proposed system tackles the mentioned challenges in an efficient way and produces better results than those of the available state-of-the-art (SOTA) methods.
In this article, we propose a comprehensive and robust model, Smartphone Sensors Remote Data Classification (SSRDC), for RS-HLAR in a remote sensing environment. We adopted three challenging datasets that provide real-life-depicting data of human locomotion activities. Every dataset provides data for different locomotion activities while using a distinct combination of smartphone sensors, including an accelerometer, a gyroscope, a magnetometer, a global positioning system (GPS), a light sensor, and a microphone. We introduce an efficient blend of features including Parseval's energy, time-domain statistics, and frequency-domain statistics. We also implement feature selection using the Lukasiewicz similarity measure (LS) and Luca-measure fuzzy entropy (LFE), as well as the Yeo-Johnson power transform, a novel feature optimization technique. Finally, locomotion activity recognition is performed by a multilayer perceptron (MLP). The chief contributions of this paper are as follows:

• We implement a feature selection framework based on the Lukasiewicz similarity measure (LS) and Luca-measure fuzzy entropy (LFE). Quality features produce more accurate results.
• To optimize the selected features, the Yeo-Johnson power transform is implemented, which brings the data into a sophisticated shape and consequently supports better classification.
• We implement a multilayer perceptron (MLP) and tune its parameters for the final classification of human locomotion activities.
• We introduce novel features to the field of RS-HLAR in the shape of Parseval's energy and auto-regressive coefficients, which have been popularly used in EEG signal and acoustic signal processing, respectively.
The rest of the article is organized as follows: Section 2 covers the related work in the field of RS-HLAR. Section 3 comprehensively covers the materials and methods used in this study. Section 4 presents the results for all the datasets used in the proposed SSRDC system and compares the system's performance with the available state-of-the-art models. Section 5 discusses the results from a broader perspective and highlights the strengths and limitations of the system, while Section 6 concludes the article and outlines future work.

Related Work
The recent advances in the recognition of remotely sensed human locomotion activities direct us toward very efficient and productive techniques. We conducted a comprehensive literature review to explore the remote sensing techniques previously adopted, so that we can develop a better and more effective model for the concerned research problem. In this section, we divide the literature review into two sub-sections, i.e., RS-HLAR using inertial sensors and RS-HLAR using proximity sensors.

RS-HLAR Using Inertial Sensors
Inertial sensors are held in high esteem when it comes to human locomotion activity recognition. As modern smartphones have very useful inertial sensors such as accelerometers, gyroscopes, and magnetometers embedded in them, RS-HLAR also benefits from them. Bashar et al. [33] extracted hand-crafted features from a smartphone-embedded gyroscope and accelerometer and selected the best ones among them using neighborhood component analysis. Then, they used a dense neural network for the classification of locomotion activities. Using smartphone accelerometer data, Shan et al. [34] extracted deep features from the temporal and spatial domains. Their locomotion activity classification system, namely C4M4BL, consisted of four convolutional layers, four max-pooling layers, and one bidirectional long short-term memory (LSTM) layer. Xie et al. [35] compared the performance of different kernels of a support vector machine (SVM) in recognizing locomotion activities such as climbing stairs, coming down the stairs, walking, and standing. They used a smartphone-embedded accelerometer, gyroscope, and magnetometer to extract various frequency- and time-domain features. After that, they employed a multiclass SVM in a one-vs.-all mode. Their 10-fold cross-validated results acknowledge the credibility of their system. In [36], the authors created a one-dimensional magnitude vector by using instantaneous values of a tri-axial smartphone accelerometer. After that, the magnitude vector was forwarded to a one-dimensional convolutional neural network (CNN) that auto-generated features and then performed the classification. Azmat et al. [37] used the inertial sensors of a smartphone to classify human locomotion activities such as jumping, walking, standing, and jogging. Initially, they split the activities into static and dynamic categories using a cross-correlation-based template matching approach. Then, they separately extracted time- and frequency-domain features for each branch (static and dynamic). Following this, for data optimization, a vector-quantization-based codebook was generated. For dynamic activities, they employed a multiclass SVM classification model, while for static activities, they used a Gaussian naïve Bayes (GNB) classification algorithm.
Another approach, followed in [38], used a hybrid CNN-LSTM model in which an attention mechanism was introduced. The smartphone sensors used in this system were accelerometers, gyroscopes, and magnetometers. In another work, a model that combines a deep bidirectional long short-term memory (DBLSTM) network and a CNN is proposed. Using an abstract approach, the DBLSTM serializes the smartphone accelerometer and gyroscope data and generates a bidirectional output vector. As the DBLSTM model is not good at feature extraction, the feature extraction task is assigned to a CNN. Finally, a softmax function is implemented in the final layer of the network to perform the classification of the activities [39].

RS-HLAR Using Proximity Sensors
Another kind of sensor that has been used for RS-HLAR is the proximity sensor. These are electromagnetic-radiation-based sensors that can sense and recognize activities without any physical contact. Some examples of proximity sensors are acoustic sensors, infrared sensors, and radars. A lot of research has been done on RS-HLAR using proximity sensors. Guo et al. [40] used a two-dimensional array of acoustic sensors that consisted of 4 transmitters and 256 receivers. The transmitters sent ultrasonic sinusoidal waves, and the receivers received the reflected waves. In this way, samples were gathered in the form of tri-axial acoustic data to extract frequency-domain and time-domain features. They then forwarded those features to a vanilla CNN for the classification of walking, sitting, falling, and standing. Tripathi et al. [41] used acoustic sensors to detect the locomotion activities of humans at bus stops and parks and generated perceptual features. They used an ensemble of one-class classifiers based on fuzzy rules. Additionally, they validated their method using real data and then compared the results with those of an SVM classifier. She et al. [42] proposed a human activity recognition model based on a micro-Doppler radar system. A data augmentation method was utilized that consisted of three major operations, i.e., frequency disturbance, time shift, and frequency shift. Radar data were recorded while targeting the subject performing locomotion, and then these data were converted into a spectrogram. The activity was recognized based on the change in speed and frequency. They used various deep architectures and compared their performances using their augmentation concept. The networks they experimented with include Kim and Moon's architecture [43], Inception-ResNet [44], Jordan's net [45], ResNet-18 [46], and CNN-RNN [47].
Li et al. [48] worked on radar-based human activity recognition. Following the traditional approach, they acquired radar signals from the targets and then generated spectrograms. Their major contribution was a transfer-learning-based semi-supervised algorithm consisting of two modules, i.e., supervised semantic transfer and unsupervised domain adaptation. They labeled only a few spectrograms and achieved good results in the classification of a total of six locomotion activities.

Materials and Methods
Sensory data were remotely acquired from smartphones, and then denoising was performed using a second-order Chebyshev type-I filter. After that, we windowed the input signal using a rectangular window of five seconds duration. By combining three windows in each segment, we segmented the data. Then we extracted features from the segmented data. As multiple sensors were used in this research, and some sensors had more than one channel, features were extracted for each channel of each sensor separately. Then, by concatenating all the features, feature vectors were generated and placed in a single data frame. Considering the fact that garbage in leads to garbage out, we rejected the least-informative features using feature selection based on Luca-measure fuzzy entropy (LFE) and Lukasiewicz similarity measure (LS). By selecting the useful features, we reduced the size of the feature data frame by 25%. In the next step, we revamped the feature data frame to have a more sophisticated distribution by utilizing the Yeo-Johnson power transform. The architecture of the proposed system is shown in Figure 1.

Preprocessing
The very first step of data processing is denoising. For this purpose, we used a second-order Chebyshev type-I filter [49] with a cutoff frequency of 0.001. The mentioned filter rejects noise very well and provides a signal with a good signal-to-noise ratio (SNR) at the output. Figure 2 represents the noisy and denoised signals for the x, y, and z channels of the magnetometer for the walking activity for all three datasets. A similar approach was followed to denoise the data obtained from other sensors corresponding to other activities. We chose the walking activity to plot because walking is a common activity among all three datasets, thus supporting a good comparison among datasets.
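The denoising step described above can be sketched with SciPy as follows; the passband ripple value (1 dB) is an assumed parameter not stated in the text, and the cutoff of 0.001 is taken to be normalized to the Nyquist frequency:

```python
import numpy as np
from scipy.signal import cheby1, filtfilt

def denoise(signal, order=2, ripple_db=1.0, cutoff=0.001):
    # Second-order Chebyshev type-I low-pass filter; cutoff is
    # normalized to the Nyquist frequency (scipy convention).
    b, a = cheby1(order, ripple_db, cutoff, btype="low")
    # filtfilt runs the filter forward and backward for zero phase shift
    return filtfilt(b, a, signal)

# Toy signal: slow sinusoid plus broadband noise
noisy = np.sin(2 * np.pi * 0.0005 * np.arange(20000)) \
        + 0.3 * np.random.randn(20000)
clean = denoise(noisy)
```

With such a low cutoff, the filter suppresses nearly all high-frequency content, which matches the strong smoothing visible in the denoised magnetometer plots.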


Windowing and Segmentation
In locomotion activities, there are repeating patterns. For example, if a person is walking, the basic step is "taking a step", and then, this basic step repeats itself throughout the locomotion activity. Processing a signal's windows where each window contains the basic pattern of the locomotion activity produces better results as compared to processing the signal as a whole. According to our experimentation, a five-second window worked in the best way to produce precise results. In addition to windowing, we also produced segments of the signal where each segment contained three windows of the input signal and in this way, covered the complete signal. Equation (1) shows the segmentation of the signal.
seg_p = (w_q, w_{q+1}, w_{q+2}) (1)

where seg_p is the pth segment and w_q represents the qth window. Meanwhile, it is to be asserted that q must be less than r − 2, where r is the last window index of the signal [50].
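The windowing and segmentation scheme above can be sketched as follows; the sampling rate of 50 Hz is an assumed value for illustration, since the text specifies only the five-second window duration and the three-window segments:

```python
import numpy as np

def window_and_segment(signal, fs=50, window_sec=5, windows_per_segment=3):
    # Split the signal into non-overlapping rectangular windows
    win_len = fs * window_sec
    n_windows = len(signal) // win_len
    windows = [signal[q * win_len:(q + 1) * win_len] for q in range(n_windows)]
    # seg_p groups three consecutive windows (w_q, w_{q+1}, w_{q+2}),
    # so the starting index q must stop windows_per_segment - 1 short of the end
    segments = [np.concatenate(windows[q:q + windows_per_segment])
                for q in range(n_windows - windows_per_segment + 1)]
    return windows, segments

sig = np.arange(50 * 30)           # 30 s of dummy samples at 50 Hz
wins, segs = window_and_segment(sig)
# 30 s / 5 s -> 6 windows; 6 - 3 + 1 -> 4 sliding segments of 750 samples
```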

Feature Extraction
After denoising the data, we extracted a total of 15 features from the data, some of which are mentioned along with their descriptions and formulations in Table 1.

Other features, which include the maximum and minimum point difference and their ratio in the frequency domain, and the median, mode, and min and max points of the time and frequency domains [51][52][53][54][55][56][57][58][59][60][61][62][63], require a complete mathematical procedure to be followed for their computation; therefore, we did not mention them in Table 1. The features that are mentioned in Table 1 can be graphically observed in Figure 5.

Feature | Description | Formulation
Parseval's Energy [43] | The energy of the signal in the time domain is equal to the energy of the signal in the frequency domain (Parseval's theorem). | Σ_n |x(n)|^2 = (1/N) Σ_k |X(k)|^2
Skewness [44] | A measure of the asymmetry of a distribution. | E[(x − μ)^3]/σ^3
Kurtosis [44] | Compares the tails of the distribution to the tails of a normal distribution. | E[(x − μ)^4]/σ^4
Shannon Entropy [45] | The expected amount of information in an instance of the distribution. | H = −Σ_j p(x_j) log p(x_j)
AR-Coefficients [46] | The auto-regressive coefficients of a signal modeled as x(n) = Σ_{i=1}^{k} a_i x(n − i) + e(n). | a_1, . . . , a_k
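Several of the tabulated features can be computed per window as sketched below; the histogram bin count used for the Shannon entropy estimate is an assumed implementation detail, and the AR-coefficients are omitted for brevity:

```python
import numpy as np
from scipy.stats import skew, kurtosis

def extract_features(x):
    # Parseval's theorem: sum(|x|^2) == (1/N) * sum(|X|^2)
    X = np.fft.fft(x)
    parseval_energy = np.sum(np.abs(X) ** 2) / len(x)
    # Shannon entropy of a histogram-based probability estimate
    counts, _ = np.histogram(x, bins=20)
    p = counts[counts > 0] / counts.sum()
    shannon_entropy = -np.sum(p * np.log2(p))
    return {
        "energy": parseval_energy,
        "skewness": skew(x),
        "kurtosis": kurtosis(x),
        "entropy": shannon_entropy,
    }

x = np.random.randn(250)   # one 5 s window at 50 Hz (dummy data)
feats = extract_features(x)
```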


Feature Selection
To select the best features from the extracted features, we apply Luca-measure fuzzy entropy (LFE) and Lukasiewicz similarity measure (LS)-based feature selection. It works in such a way that we provide a feature set A = (a_1, a_2, . . . , a_n) to the algorithm, and after the optimization of an objective function, it provides a subset B = (a_1, a_2, . . . , a_m) of A with m < n. The parameters used for the concerned algorithm were LFE and LS. LFE is given by Equation (2), where μ_A(x_j) has a value within the range [0, 1], and LS is defined by Equation (3), where f_r is the feature set. Moreover, x is the input and v is the mean vector for a certain class [64].
S(x, y) = 1 − |x − y| (3)

We experimented using different numbers of features, but we obtained the best results with a set of 11 features. In this way, we reduced the size of our feature set by approximately 25%. The 11 features selected using the mentioned feature selection algorithm were Parseval's energy, skewness, kurtosis, Shannon entropy, AR-coefficients, the minimum, maximum, mode, and median of the time-domain data, the FFT minimum and maximum point difference, and the FFT minimum and maximum point ratio. The feature selection flow diagram is presented in Figure 6.
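A hypothetical sketch of this selection strategy is given below: features are scaled to [0, 1], each sample's Lukasiewicz similarity to its class-mean value is computed (Equation (3)), and the de Luca fuzzy entropy of those similarity values (Equation (2)) scores the feature, with lower entropy indicating a more informative feature. The exact objective function used in [64] may differ from this illustration:

```python
import numpy as np

def rank_features(F, y, keep):
    # Min-max scale each feature into [0, 1]
    F = (F - F.min(axis=0)) / (F.max(axis=0) - F.min(axis=0) + 1e-12)
    scores = []
    for j in range(F.shape[1]):
        sims = []
        for c in np.unique(y):
            v = F[y == c, j].mean()                       # class mean entry
            sims.append(1.0 - np.abs(F[y == c, j] - v))   # Lukasiewicz similarity
        mu = np.clip(np.concatenate(sims), 1e-12, 1 - 1e-12)
        # de Luca fuzzy entropy of the membership values mu
        lfe = -np.sum(mu * np.log(mu) + (1 - mu) * np.log(1 - mu))
        scores.append(lfe)
    return np.argsort(scores)[:keep]   # keep the lowest-entropy features

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
F = rng.normal(size=(100, 4))
F[:, 0] += y * 3                       # feature 0 separates the two classes
selected = rank_features(F, y, keep=2)
```

Samples that sit close to their class mean yield similarities near 1 and hence low fuzzy entropy, which is why the discriminative feature ranks first in this toy example.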


Feature Optimization
By concatenating the selected features, we designed a data frame and labeled each example so that classification could be performed. However, before performing the classification, the data must be optimized so that the classifier can distinguish among classes more efficiently. For optimization, we used the Yeo-Johnson power transform, which brings the data into a more Gaussian-like form obeying the formulation given in Equation (4), where s is the input value and the value of α can be 0.5, 0, or −1. This transformation generated very efficient results and helped in boosting the accuracy of the system [65].

ψ(s, α) = ((s + 1)^α − 1)/α for s ≥ 0, α ≠ 0; ln(s + 1) for s ≥ 0, α = 0; −((1 − s)^(2−α) − 1)/(2 − α) for s < 0, α ≠ 2; −ln(1 − s) for s < 0, α = 2. (4)
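This step can be reproduced with scikit-learn's implementation of the Yeo-Johnson transform; note that here the power parameter is estimated per feature by maximum likelihood rather than fixed to the values quoted in the text:

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(1)
features = rng.exponential(scale=2.0, size=(500, 3))   # skewed toy features

# Yeo-Johnson with per-feature maximum-likelihood lambda estimation;
# standardize=True additionally zero-means and unit-scales each column
pt = PowerTransformer(method="yeo-johnson", standardize=True)
optimized = pt.fit_transform(features)
```

After the transform, each column is approximately Gaussian, which is the "more sophisticated distribution" the classifier benefits from.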

Classification
The final step of the proposed system was classification, which was performed by a multilayer perceptron (MLP). An MLP is a feed-forward neural network (FFNN) that has an input layer, one or more hidden layers, and an output layer. The mathematical representation [66] of an MLP is given in Equation (5), where Y_m is the output of the mth perceptron and w_mi represents the weight that is multiplied with the ith input x_i of the mth perceptron. Other than this, the bias of the mth perceptron is represented by b_m, n is the number of inputs to the perceptron, and f is the activation function.

Y_m = f(Σ_{i=1}^{n} w_mi x_i + b_m) (5)

MLP works in such a way that it takes features as input and multiplies those features with the initial weights in the hidden layers and then sends the weighted features to an activation function that gives the output as a probability distribution. The instance with the highest probability is declared as a class of the input. The structure of an MLP is depicted in Figure 8.
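A minimal NumPy sketch of this forward pass, following Equation (5), is shown below; the layer sizes and the ReLU/softmax activation choices are illustrative assumptions, not the paper's tuned settings:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the max for numerical stability before exponentiating
    e = np.exp(z - z.max())
    return e / e.sum()

def mlp_forward(x, layers):
    a = x
    for i, (W, b) in enumerate(layers):
        z = W @ a + b                                 # Y_m = f(sum_i w_mi x_i + b_m)
        a = relu(z) if i < len(layers) - 1 else softmax(z)
    return a                                          # class-probability distribution

rng = np.random.default_rng(2)
layers = [(rng.normal(size=(16, 11)), np.zeros(16)),  # hidden layer (11 features in)
          (rng.normal(size=(4, 16)), np.zeros(4))]    # output layer (4 classes)
probs = mlp_forward(rng.normal(size=11), layers)
predicted_class = int(np.argmax(probs))
```

The softmax output is the probability distribution mentioned above, and the argmax implements the highest-probability class decision.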

Experimental Results
The primary evaluation metric for our system was mean accuracy. To compute the mean accuracy, we adopted the 10-fold cross-validation technique so that the results are more general and dependable. All the experimentation was performed on a Windows 10 operating system with 16 GB of RAM and an Intel Core i7-7500U CPU @ 2.70 GHz. After a brief description of the datasets, we present the results of four experiments that we performed on each dataset. In the first experiment, we generated receiver operating characteristic (ROC) curves for each class separately. In the second experiment, we compared the accuracies for the classification of individual classes with the help of confusion matrices. In the next experiment, we analyzed the proposed system's performance while using some other well-known classifiers. Finally, we compared our system with the available state-of-the-art techniques.
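The 10-fold cross-validated mean accuracy can be computed as sketched below; the synthetic data and the network size are stand-ins for the real feature frames and the tuned hyperparameters of the proposed system:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier

# Toy stand-in for the 11-feature data frame produced by the pipeline
X, y = make_classification(n_samples=300, n_features=11, n_informative=8,
                           n_classes=4, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
# Stratified 10-fold CV keeps the class proportions in every fold
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
mean_accuracy = scores.mean()
```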

Datasets Description
The MobiAct_v2.0 dataset provides data for smartphone inertial sensors, i.e., accelerometers, gyroscopes, and orientation sensors. For this dataset, the orientation sensor is a software-based sensor that works as a magnetometer and provides data for the magnetic field intensity in the x, y, and z directions. There are 15 locomotion activities to be classified based on the provided sensor data, which are shown in Table 2. A notable point here is that the subjects performing the activities are not consistent for every activity; therefore, some classes have more samples than others.

# | MobiAct_v2.0 | Real-World HAR | Real-Life HAR
1 | Walking | Climbing stairs up | Inactive
2 | Stairs up | Climbing stairs down | Active
3 | Stairs down | Jumping | Walking
4 | Stand-to-sit | Lying | Driving
5 | Sit-to-stand | Running | -
6 | Car step-in | Sitting | -
7 | Car step-out | Standing | -
8 | Jogging | Walking | -
9 | Jumping | - | -
10 | Standing | - | -
11 | Sitting | - | -
12 | Forward lying | - | -
13 | Back-sitting chair | - | -
14 | Front-knees lying | - | -
15 | Sideward lying | - | -

The Real-World HAR dataset collects data from 15 subjects for a total of 8 locomotion activities, which are also listed in Table 2. This dataset collects a large amount of data using six sensors of a smartphone, i.e., an accelerometer, a gyroscope, a magnetometer, a GPS, a light sensor, and a microphone. Moreover, for each locomotion activity, the dataset uses seven smartphones at the same time that are tied to seven different positions of the subject's body. For a single smartphone, the accelerometer, gyroscope, and magnetometer provide data from the following body positions: chest, forearm, head, shin, thigh, upper arm, and waist. However, the GPS, light sensor, and microphone do not provide data for the forearm. Thus, by combining the data coming from all the smartphones and all the sensors, we obtained a 39-dimensional vector against one example of a specific locomotion activity.
The third and last dataset used in this study was the Real-Life HAR dataset, which provides data from four sensors to classify four locomotion activities. This dataset was released at the end of 2020, which makes it the newest among the datasets used in this study. The fact that makes this dataset extremely challenging is that the subjects were free to use their smartphones in any way they preferred. Sensors used to collect this dataset include an accelerometer, a gyroscope, a magnetometer, and a GPS. The respective locomotion activities for each dataset are provided in Table 2.

Experiment I: Receiver Operating Characteristic (ROC) Curves of the SSRDC System
The ROC curve shows the performance of the system in terms of the area under it: the larger the area under the curve, the better the performance. To evaluate the proposed SSRDC system, we plotted the ROC curves for every class of every dataset. Figure 9 depicts the ROC curves for MobiAct_v2.0; the average area under the curve was 0.93, a very good figure for the performance of the proposed model on this dataset. According to Figure 9, the smallest area, 0.77, was covered by "front-knees lying", and the largest area, 1.00, was achieved by the "sitting" and "car step-out" classes. Following the same approach, we plotted the ROC curves for Real-World HAR, shown in Figure 10; the average area under the curve was 0.95, which supports the SSRDC system's performance on Real-World HAR. According to Figure 10, the smallest area, 0.67, was covered by "stairs-down", and the largest area, 1.00, was achieved by the "jumping", "lying", "sitting", "standing", and "walking" classes. The ROC curves for Real-Life HAR are given in Figure 11. The average area under the curves for Real-Life HAR was 0.96, the highest value achieved by our system. The proposed SSRDC system thus performed best on Real-Life HAR, with a minimum area under the curve of 0.90 for the "active" class and a maximum of 1.00, attained by both the "walking" and "driving" classes.
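The per-class ROC analysis described above can be sketched with scikit-learn's one-vs-rest ROC utilities. The data below are synthetic stand-ins (the study's features and labels are not reproduced here), and the network size and iteration count are illustrative assumptions, not the paper's settings.

```python
# Hedged sketch: per-class ROC curves and AUC for a multi-class classifier,
# computed one-vs-rest on synthetic data (not the datasets from the paper).
import numpy as np
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import label_binarize

rng = np.random.default_rng(0)
n_classes = 4                                   # e.g., Real-Life HAR has 4 activities
X = rng.normal(size=(400, 11))                  # 11 selected features, as in the paper
y = rng.integers(0, n_classes, size=400)
X[np.arange(400), y] += 2.0                     # make the synthetic classes separable

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                    random_state=0).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)                # class membership probabilities

# One-vs-rest ROC curve and AUC for every class, then the macro average.
y_bin = label_binarize(y_te, classes=range(n_classes))
aucs = []
for c in range(n_classes):
    fpr, tpr, _ = roc_curve(y_bin[:, c], scores[:, c])
    aucs.append(auc(fpr, tpr))
print(f"mean AUC: {np.mean(aucs):.2f}")
```

Plotting each `(fpr, tpr)` pair would reproduce per-class curves of the kind shown in Figures 9-11.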

Experiment II: Individual Locomotion Activity Classification Accuracies
One of the best options to find out how accurately the proposed system classifies an individual locomotion activity is the confusion matrix. We plotted the confusion matrices for all the concerned datasets. The confusion matrix for MobiAct_v2.0 is given in Figure 12.

Figure 12 shows that the lowest recognition accuracy, 70%, corresponded to the forward-lying activity, while the highest accuracy, 99%, was achieved by the car step-in activity. All other locomotion activities were also predicted with reasonably high accuracy. Following a similar pattern, we analyzed the individual accuracies for Real-World HAR with the help of the confusion matrix given in Figure 13. For Real-World HAR, the lowest classification accuracy, 73%, was for the sitting activity, while jumping and running were always predicted correctly, with a recognition accuracy of 100%. All other activities were also predicted with appreciable accuracy. Last but not least, the individual locomotion activity classification accuracies for the Real-Life HAR dataset are provided in the confusion matrix shown in Figure 14.
The Real-Life HAR confusion matrix shows that all four activities were classified with exceptional accuracy: the walking and active classes were predicted with 95% accuracy, while the driving and inactive classes were predicted 100% correctly.
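The per-class accuracies read off the confusion matrices above are simply the row-normalized diagonal entries. A minimal sketch, using illustrative labels and predictions rather than the datasets' actual outputs:

```python
# Hedged sketch: per-class recognition accuracy from a confusion matrix,
# in the style of Figures 12-14. The labels and predictions are illustrative.
import numpy as np
from sklearn.metrics import confusion_matrix

activities = ["walking", "driving", "active", "inactive"]   # Real-Life HAR classes
y_true = np.array([0, 0, 1, 1, 2, 2, 3, 3, 0, 2])
y_pred = np.array([0, 0, 1, 1, 2, 0, 3, 3, 0, 2])           # one "active" misread

cm = confusion_matrix(y_true, y_pred, labels=range(len(activities)))
# Row-normalizing turns counts into per-class accuracies (the diagonal entries).
per_class_acc = cm.diagonal() / cm.sum(axis=1)
for name, acc in zip(activities, per_class_acc):
    print(f"{name:>8s}: {acc:.0%}")
```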

Experiment III: Comparison with Well-Known Classifiers
In this experiment, we analyzed the performance of the proposed system when using two other well-known classifiers, namely K-nearest neighbors (KNN) and AdaBoost, and compared the results with the original multilayer perceptron (MLP)-based system in terms of precision, recall, and F1-score. We repeated the process for all three datasets used in this study. Table 3 presents the comparison results on MobiAct_v2.0, Table 4 shows the comparison on Real-World HAR, and Table 5 summarizes the results obtained on Real-Life HAR.
The last row of Table 3 shows the mean precision, recall, and F1-score for each classifier. The performance statistics clearly indicate that the AdaBoost classifier was the worst performer of the three, KNN was second-best, and MLP performed the classification best.
Considering the mean values of the performance metrics, the AdaBoost classifier performed better on Real-World HAR than on MobiAct_v2.0 but still produced a poor classification. The K-nearest neighbors classifier performed noticeably well but still could not outperform the multilayer perceptron classifier.
Table 5 makes it evident that the MLP classifier again left the other two methods behind in classification performance. Overall, all the methods performed well, but the SSRDC system was outstanding.
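Experiment III can be sketched as follows: the same feature set is handed to each of the three classifiers, and macro-averaged precision, recall, and F1-score are computed. The synthetic data and hyperparameters below are illustrative assumptions, not the study's values.

```python
# Hedged sketch of the classifier comparison: KNN vs. AdaBoost vs. MLP
# on synthetic stand-in data with 11 features (as in the paper's feature set).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=600, n_features=11, n_informative=8,
                           n_classes=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

classifiers = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "MLP": MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0),
}
results = {}
for name, clf in classifiers.items():
    y_pred = clf.fit(X_tr, y_tr).predict(X_te)
    p, r, f1, _ = precision_recall_fscore_support(y_te, y_pred, average="macro")
    results[name] = (p, r, f1)
    print(f"{name:>8s}  precision={p:.3f}  recall={r:.3f}  F1={f1:.3f}")
```

Averaging these metrics per classifier yields summary rows of the kind shown in the last rows of Tables 3-5.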

Experiment IV: Comparison with Available State-of-the-Art Techniques
Considering all the datasets we used, we compared the performance of the SSRDC system with other available state-of-the-art methods. Various human locomotion activity recognition algorithms have been applied to MobiAct_v2.0, Real-World HAR, and Real-Life HAR; Table 6 shows that the proposed SSRDC system outperformed all of them by a good margin. The authors of [67] used a random forest model with two different system modes on the Real-World HAR dataset: in the first mode, the system was aware of the position of the device on the subject's body, while in the other mode it was not. In the position-unaware case, their system produced 80.2% accurate results, while in the position-aware case it predicted locomotion activity with 83.4% accuracy. In [68], a model based on cross-subject activity recognition scored 83.1% accuracy. The work in [69] used signal visualization alongside a CNN to produce 92.4% accurate results on Real-World HAR. Finally, another framework [70] designed on Real-World HAR used a CNN and scored 93.8% accuracy. In comparison, the proposed system attained 94.2% accuracy and left the others behind.
Regarding MobiAct_v2.0, the works in [71][72][73] used SVM, CNN, and thresholding techniques to predict locomotion activities with 77.9%, 80.7%, and 81.3% accuracy, respectively, while our system proved to be 84.5% accurate. We also compared the performance of our system with state-of-the-art works on the Real-Life HAR dataset. The authors of [74] utilized an SVM with different combinations of the sensors described in Table 6: using an accelerometer and a GPS, they achieved 60.1% accuracy; adding a magnetometer to these two sensors increased the accuracy to 62.6%; and using gyroscope data as well, they obtained 67.2% accurate results. Another work [38] employed an attention-based hybrid model consisting of a CNN and an LSTM and also experimented with different sensors: using a magnetometer, they achieved 70.3% accuracy; with gyroscope data, 95.2%; and with an accelerometer, 95.7%. Our SSRDC system produced 95.9% accurate results for the Real-Life HAR dataset, beating the work in [38] by a small margin. The major difference between the two approaches is that we extracted hand-crafted features, selected the best features, and then used a novel power-transformation technique for feature optimization, whereas they simply preprocessed the data and trained their deep learning model. Our approach evidently proved better on the Real-Life HAR dataset.

Discussion
In the proposed SSRDC framework, we extracted different features from denoised smartphone sensor data, performed feature selection, power-transformed the chosen features, and carried out the classification using a multilayer perceptron. To get started, the proposed system needs smartphone sensor data; for this purpose, we chose three challenging RS-HLAR datasets that depict real-life scenarios and provide data from various sensors. The very first step of our system is to denoise the data using a Chebyshev type-I filter. From the denoised data, we extracted a total of 15 features, including statistical features from the time and frequency domains. With the help of LFE- and LS-based feature selection, we rejected the four least-informative features, leaving the 11 most informative ones. These 11 features were selected because, as a set, they provided the best accuracy; adding or eliminating any feature from this set had a negative effect on the system's performance. Because raw feature values produce poor results, we employed the power transform for feature optimization. Finally, we labeled the data and sent it to a multilayer perceptron (MLP) for classification. After tuning the hyperparameters of the MLP, we achieved excellent classification results.
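The front half of the pipeline summarized above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the sampling rate, filter cutoff, window length, and histogram binning for entropy are all assumptions, and only a few of the 15 named features are computed.

```python
# Hedged sketch of the per-window feature pipeline: Chebyshev type-I denoising,
# a few of the named features (Parseval's energy, skewness, kurtosis, Shannon
# entropy, statistics), then Yeo-Johnson power-transform optimization.
import numpy as np
from scipy.signal import cheby1, filtfilt
from scipy.stats import kurtosis, skew
from sklearn.preprocessing import PowerTransformer

fs = 50.0                                      # assumed sampling rate (Hz)
rng = np.random.default_rng(0)
t = np.arange(0, 20, 1 / fs)
signal = np.sin(2 * np.pi * 1.5 * t) + 0.3 * rng.normal(size=t.size)

# Second-order Chebyshev type-I low-pass filter (0.5 dB ripple, 10 Hz cutoff;
# ripple and cutoff are illustrative, not the paper's values).
b, a = cheby1(N=2, rp=0.5, Wn=10 / (fs / 2), btype="low")
denoised = filtfilt(b, a, signal)

def window_features(x):
    """A few of the features named in the paper, computed for one window."""
    energy = np.sum(x ** 2)                    # Parseval's energy (time domain)
    hist, _ = np.histogram(x, bins=16)
    p = hist / hist.sum()
    p = p[p > 0]
    entropy = -np.sum(p * np.log2(p))          # Shannon entropy of amplitude bins
    return [energy, skew(x), kurtosis(x), entropy, x.mean(), x.std()]

# Sliding windows (2 s, 50% overlap), then Yeo-Johnson feature optimization.
win, hop = int(2 * fs), int(fs)
feats = np.array([window_features(denoised[i:i + win])
                  for i in range(0, denoised.size - win + 1, hop)])
optimized = PowerTransformer(method="yeo-johnson").fit_transform(feats)
print(optimized.shape)                         # (n_windows, n_features)
```

The `optimized` matrix is what would then be labeled and fed to the MLP classifier.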
Although the system classifies human locomotion activities very well, it still has some limitations. When the system has to detect locomotion activities that are a combination of multiple locomotion activities, it finds the task more difficult than the recognition of a single locomotion activity; an example of such a combination is a person performing acrobatics, rapidly moving from one position to another. Moreover, as the SSRDC system operates on remotely sensed data, there is a considerable chance of signal distortion, which in some cases can result in the complete loss of useful data. The failure cases of the SSRDC system are depicted in Figure 15. To summarize the comparative results: the mean accuracy of the proposed system on Real-World HAR was 94.16%, while a previous implementation that included position-unaware and position-aware system modes with a random forest model [67] achieved 80.2% and 83.4% accuracy in the two modes, respectively. The SSRDC system's mean accuracy on MobiAct_v2.0 was 84.49%, while the approaches used in [71][72][73] showed accuracies of 77.9%, 80.7%, and 81.3%, respectively. Finally, the techniques applied to Real-Life HAR in [38,74] scored 70.3% and 95.7% accuracy, respectively, while our system produced up to 95.89% accurate results. More detailed comparisons can be found in Table 6. These statistics show that the proposed system outperforms the available state-of-the-art techniques.
Figure 15. Failure cases of the SSRDC system: (a) complex activity; (b) extreme signal distortion.

Conclusions
The proposed SSRDC system aims to classify human locomotion activities using remotely sensed data. To this end, we first denoised the sensor data using a second-order Chebyshev type-I filter. Secondly, we divided the input signal into windows and generated signal segments from those windows. We then extracted divergent features from the data. Feature extraction was followed by an LFE- and LS-based feature selection algorithm that reduced the size of the feature set by 25%. To optimize the features, the Yeo-Johnson power transform was applied, and the data were then sent to an MLP for classification.

Research Limitations
Although the outstanding performance of the proposed SSRDC system leaves state-of-the-art techniques behind, it certainly has some limitations. The first point of concern is the correct recognition of complex locomotion activities that combine two or more individual locomotion activities. In addition, as our system deals with remote data, there can be cases in which data are lost during wireless communication, and the SSRDC system might then fail to recognize the activity.

Future Work
Revolving around smartphone sensors, the future directions of our research may proceed to pinpointing the location of a human being in an indoor environment along with the recognition of their locomotion activity. Meanwhile, we will work to improve the SSRDC system by refining our algorithms and adding more sensors to the framework.
