SVSL: A Human Activity Recognition Method Using Soft-Voting and Self-Learning

Abstract: Many smart city and society applications, such as smart health (elderly care, medical applications), smart surveillance, sports, and robotics, require the recognition of user activities, an important class of problems known as human activity recognition (HAR). Several issues have hindered progress in HAR research, particularly with the emergence of fog and edge computing, which brings many new opportunities (low latency, dynamic and real-time decision-making, etc.) but also its own challenges. This paper focuses on two important research gaps in HAR: (i) improving HAR prediction accuracy and (ii) managing frequent changes in the environment and in the data related to user activities. To address these gaps, we propose an HAR method based on Soft-Voting and Self-Learning (SVSL). SVSL uses two strategies. First, to enhance accuracy, it combines the capabilities of Deep Learning (DL), Generalized Linear Model (GLM), Random Forest (RF), and AdaBoost classifiers using soft-voting. Second, to classify the most challenging data instances, SVSL is equipped with a self-training mechanism that generates training data and retrains itself. We investigate the performance of SVSL using two publicly available datasets covering six human activities related to lying, sitting, and walking positions. The first dataset consists of 562 features and the second of five features. The data were collected using smartphone accelerometer and gyroscope sensors. The results show that the proposed method provides 6.26%, 1.75%, 1.51%, and 4.40% better prediction accuracy (averaged over the two datasets) than GLM, DL, RF, and AdaBoost, respectively. We also analyze and compare the class-wise performance of the SVSL method with that of DL, GLM, RF, and AdaBoost.


Introduction
Smart cities and societies, also known as Artificially Intelligent cities [1], are characterized by their ability to allow us "to 'engage' with our environments, analyze them, and make decisions, all in a timely manner" [2,3]. We "engage" with our environments using a range of sensors, including smartphones, the Internet of Things (IoT), cameras, GPS, social media, etc. [4]. The data produced by these sensors are analyzed using a range of mathematical methods and simulations to support timely decision-making. Increasingly, artificial intelligence methods, particularly machine and deep learning methods, have become the methods of choice in smart city applications [5][6][7].
Many smart city and society applications, such as smart health (elderly care, medical applications), smart surveillance, sports, and robotics, require the recognition of node and user activities, a class of problems known as human activity recognition (HAR) [8][9][10]. The growing importance of HAR stems from the many smart city applications that dynamically optimize services based on the user's location and the activity the user is performing at a particular time, which is made possible by smartphones, smartwatches, smart wearables, etc. Today's smartphones, smartwatches, and other smart wearables are equipped with several sensors. Wearable sensors are small hardware devices. These could … higher number of misclassification instances (an observation that we plan to investigate further in future work). We also show that the self-training mechanism of the proposed SVSL method increases the average prediction accuracy from 99.26% to 99.37% for dataset I and from 97.15% to 97.59% for dataset II.
The rest of the paper is organized as follows. Section 2 reviews the relevant literature. Section 3 describes the proposed SVSL method. Section 4 evaluates the SVSL method and presents the results. Finally, Section 5 concludes the paper and outlines directions for future work.

Literature Review
The most straightforward and practical application of human activity recognition (HAR) involves using wearable devices to track individuals while they perform their daily activities. Humans perform many activities every day, such as walking, running, playing football, dozing, and eating, and the range of these activities keeps expanding. Data about these activities are therefore highly valuable for understanding different aspects of human life and what they mean to people. HAR has many applications in various domains, for example, smart cities [24], elderly and child care [11], physical rehabilitation [25], identifying criminal behavior and violence [26], real-time content delivery [13], and surveillance [14]. Two types of sensors are commonly used for HAR: visual sensors, such as cameras, which produce video data, and non-visual sensors, such as accelerometers and gyroscopes, which generate numerical data. Machine learning techniques are at the forefront of HAR and play a key role in it. Over time, these AI-based HAR techniques have improved prediction accuracy and the ability to handle complex data. RF, support vector machine (SVM), decision tree, and DL are the best-known decision-making techniques for HAR.
Palaniappan et al. [27] focused on recognizing abnormal human activities, which are sudden events that occur at random. Human activities can be recognized using a variety of methods, the most commonly used being the multi-class SVM. Palaniappan et al. [27] proposed a new scheme that represents human activities in the form of a state transition table. The transition table prevents the classifier from considering states that are unreachable from the current state. By avoiding unreachable states, the computational time for classification is radically reduced compared with that of conventional methods.
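The effect of a state transition table can be illustrated with a minimal Python sketch. The transition table, the state names, and the per-state scoring functions below are hypothetical and chosen only for illustration; they are not taken from [27]. Only the states reachable from the current state are scored, so unreachable states are never evaluated.

```python
from typing import Callable, Dict, Sequence, Set

Scorer = Callable[[Sequence[float]], float]

# Hypothetical transition table: for each current state, the states that may follow it.
# Here we assume, for example, that lying can only be followed by lying or sitting.
TRANSITIONS: Dict[str, Set[str]] = {
    "walking": {"walking", "standing"},
    "standing": {"standing", "walking", "sitting"},
    "sitting": {"sitting", "standing", "lying"},
    "lying": {"lying", "sitting"},
}


def classify_reachable(x: Sequence[float], current: str,
                       transitions: Dict[str, Set[str]],
                       scorers: Dict[str, Scorer]) -> str:
    """Score only the states reachable from `current` and return the best one.

    Scorers of unreachable states are never called, which is where the
    saving in computation comes from.
    """
    candidates = sorted(transitions[current])  # sorted for deterministic tie-breaking
    return max(candidates, key=lambda state: scorers[state](x))


# Toy per-state scores (e.g., one-vs-rest decision values); purely illustrative.
toy_scorers = {
    "walking": lambda x: x[0],
    "standing": lambda x: x[1],
    "sitting": lambda x: x[2],
    "lying": lambda x: x[3],
}
x = [0.9, 0.1, 0.6, 0.3]
print(classify_reachable(x, "lying", TRANSITIONS, toy_scorers))     # sitting
print(classify_reachable(x, "standing", TRANSITIONS, toy_scorers))  # walking
```

Although "walking" has the highest raw score, it is not reachable from "lying" in this assumed table, so "sitting" is returned.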
Many classifiers face the constraints of long training times and large feature vector sizes. Chathuramali and Rodrigo [28] proposed an SVM-based method that addresses these problems in human activity recognition using an existing spatio-temporal feature descriptor. A comparison of their system with existing classifiers on two standard datasets shows that it is much faster in terms of computational time and matches or exceeds the existing recognition rates. Several other SVM-based methods have been proposed in the literature [29][30][31][32]; we refer the reader to these works for further information.
Computing feature importance is a critical task in HAR problems, as it helps to eliminate irrelevant features, and removing such features can improve classification performance. Uddin and Udiny [33] proposed a random forest-based feature importance method for HAR problems. In the first step, a conventional RF algorithm is trained on the HAR dataset to compute the feature importance. In the second step, the feature importance values are transferred to the directed RF algorithm. The authors used the directed RF algorithm because its trees are independent of each other, so parallel computation decreases the training time and minimizes the prediction time. The algorithm developed only two ensembles and showed high selectivity with a small sample, which helped to maintain high prediction accuracy. In [33], five widely used HAR datasets were used, and the authors noted that the directed RF can find smaller feature sets while maintaining high HAR accuracy.
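The two-step idea (compute importances first, then keep a smaller feature set) can be illustrated with a simple cumulative-importance rule. This is a minimal sketch and not the directed RF of [33]: the function name, the `coverage` threshold, and the importance scores are hypothetical.

```python
from typing import Dict, List


def minimal_feature_set(importances: Dict[str, float], coverage: float = 0.9) -> List[str]:
    """Return the top-ranked features whose summed importance first reaches
    `coverage` of the total importance (a simple cumulative-importance rule)."""
    total = sum(importances.values())
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    chosen: List[str] = []
    covered = 0.0
    for name, score in ranked:
        if covered >= coverage * total:
            break  # enough importance is already covered
        chosen.append(name)
        covered += score
    return chosen


# Hypothetical importance scores, e.g., from a first conventional RF pass.
scores = {"f1": 0.50, "f2": 0.30, "f3": 0.15, "f4": 0.05}
print(minimal_feature_set(scores, coverage=0.9))  # ['f1', 'f2', 'f3']
```

Lowering `coverage` trades some retained importance for a smaller feature set; with `coverage=0.75`, only `f1` and `f2` are kept.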
Algorithms 2021, 14, 245 4 of 17
Several other methods exist for further improving HAR prediction [34][35][36]. In recent years, deep learning has grown rapidly in all application domains (natural science, computer science, multimedia, networking, security, finance, etc.) due to its ability to efficiently model complex and non-linear data [30]. Human actions need to be recognized as specific activities, which helps to identify different types of human movement and behavior. HAR uses information gathered from different types of sensors. Wang et al. [12] proposed a deep learning-based method that can recognize two different activities and the transitions between them. This has very important practical uses, particularly in medical care applications. In [12], the authors first designed a deep convolutional neural network (CNN) model to extract distinguishing features from the sensed data. Then, a long short-term memory (LSTM) network was used to capture the dependencies between two different activities. This step improved the HAR accuracy for the two activities. By fusing the CNN and LSTM, they obtained a model for wearable devices that can precisely infer activities and the switching from one activity to another. The test results showed that the proposed method is highly accurate, with a correct classification rate of up to 95.87% for activities and over 80% for transitions, which is superior to comparable HAR models. Another deep LSTM neural network method for HAR, based on IMU sensor data, is introduced in [37].
DL techniques are widely used for classification problems, but they also perform very well on time-series problems. Alawneh et al. [38] examined the pros and cons of time-series data augmentation for enhancing the accuracy of DL models for HAR from smartphone accelerometer data. They critically analyzed gated recurrent units (GRUs), long short-term memory (LSTM), and vanilla recurrent networks and tested them on three publicly available HAR datasets. They applied time-series data augmentation techniques and studied their effect on the accuracy of the target model. The analysis showed that gated recurrent units produced the best results in terms of accuracy and training time. Furthermore, the results showed that data augmentation substantially improves recognition quality. Similar to [8], Ronald et al. [39] focused on the limited computational power of mobile devices when performing HAR using DL models. A very interesting feature fusion framework is proposed by Chen et al. [40], in which handcrafted features are fused with features generated automatically through DL for HAR.
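As an illustration of time-series data augmentation, the sketch below applies two common transformations, jittering (additive Gaussian noise) and per-axis scaling, to windows of tri-axial accelerometer readings. These transformations, their parameter values, and the function names are assumptions for illustration; they are not necessarily the techniques evaluated in [38].

```python
import random
from typing import List, Tuple

Window = List[List[float]]  # samples x axes, e.g., tri-axial accelerometer readings


def jitter(window: Window, sigma: float, rng: random.Random) -> Window:
    """Add independent Gaussian noise to every reading."""
    return [[v + rng.gauss(0.0, sigma) for v in sample] for sample in window]


def scale(window: Window, sigma: float, rng: random.Random) -> Window:
    """Multiply each axis by one random factor drawn around 1.0."""
    factors = [rng.gauss(1.0, sigma) for _ in window[0]]
    return [[v * f for v, f in zip(sample, factors)] for sample in window]


def augment(windows: List[Window], labels: List[str], copies: int = 1,
            jitter_sigma: float = 0.05, scale_sigma: float = 0.1,
            seed: int = 0) -> Tuple[List[Window], List[str]]:
    """Return the original windows plus `copies` augmented versions of each
    (jitter followed by scaling); activity labels are carried over unchanged."""
    rng = random.Random(seed)
    out_x, out_y = list(windows), list(labels)
    for _ in range(copies):
        for w, y in zip(windows, labels):
            out_x.append(scale(jitter(w, jitter_sigma, rng), scale_sigma, rng))
            out_y.append(y)
    return out_x, out_y


windows = [[[0.0, 0.0, 1.0]] * 4, [[1.0, 0.5, 0.0]] * 4]
X, Y = augment(windows, ["sitting", "walking"], copies=2)
print(len(X), Y)  # 6 ['sitting', 'walking', 'sitting', 'walking', 'sitting', 'walking']
```

Because the transformations are small perturbations, each augmented window keeps the label of the window it was derived from.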
HAR is of great importance for managing and controlling pandemics such as COVID-19, today and in the future. Applications for contact tracing, social distancing, and information dissemination have grown significantly during COVID-19 to effectively manage and control the pandemic. Contact tracing applications help to determine the recent history of an infected person, such as where the person went and whom the person met in the last week. Social distancing applications, on the other hand, help to determine whether people follow social distancing guidelines. Location coordinates, proximity data, and HAR data play an important role in contact tracing and social distancing applications, and countries around the world have successfully used these applications. D'Angelo and Palmieri [41] introduced a human movement classifier based on deep convolutional neural networks to enhance the COVID-19 applications mentioned above. Specifically, the raw data from a smartphone's accelerometer were arranged to form multi-channel images (HAR-images), which serve as fingerprints of the activity being performed and can be applied to the applications above; this constitutes one of the contributions of their study. The experimental results showed that HAR-images are effective for human activity recognition. The novel coronavirus has devastated the world and forced researchers to find solutions for controlling and eradicating the infection. These researchers include virologists, clinical specialists, and doctors who are attempting to develop solutions to deal with the COVID-19 pandemic, for example, techniques that can improve the diagnosis of COVID-19, treatment protocols, drugs, and vaccines.
Recent progress in non-contact sensing for improving medical care and controlling COVID-19 outbreaks motivated the work of Khan et al. [42], who proposed an innovative approach to the early diagnosis of COVID-19 symptoms, such as abnormal breathing rates, coughing, and other underlying medical conditions. To develop an effective and practical system from the existing steps, Khan et al. [42] identified the existing methods used for monitoring human activity and health in a non-contact manner. Their paper presented data collection methods, data preprocessing and processing methods, data preparation, feature selection and extraction, classification and prediction methods, and various non-contact sensing steps. Khan et al. [42] also proposed a non-contact sensing platform for the early diagnosis of COVID-19 symptoms and for monitoring human activity and well-being during isolation periods, which plays a critical role in determining how the infection will spread and with what intensity. As previously discussed, the study of D'Angelo and Palmieri [41] was also motivated by advances in non-contact sensing for health care, and their work has contributed to efforts to prevent COVID-19 outbreaks.
From the literature reviewed above, we found no prior work that examines the potential of soft-voting and self-learning mechanisms to improve HAR accuracy. Table 1 provides a comparative analysis of the HAR literature.

Table 1. Comparative analysis of the HAR literature.

| Study | Avg. Accuracy (%) | Disadvantages | Advantages |
|---|---|---|---|
| Braganca et al. [8] | 93 | Accuracy can be higher. | Lightweight, with low computational cost. |
| Gao et al. [9] | 91 | Accuracy can be higher, and the smartphone position may vary. | Addresses two different problems in a single solution: HAR and smartphone position recognition. |
| Ogbuabor et al. [11] | 93.5 | Smartphones need to be carried, which is not always practical. | Has life-saving healthcare applications. |
| Wang et al. [12] | 95.85 | Prediction accuracy must be higher, particularly for healthcare applications. | Can recognize activities and activity transitions. |
| Mehmood et al. [13] | 87 | Tested on a small dataset. | HAR concept used for adaptive content delivery. |
| Alam et al. [24] | 97.1 | Validation needs to be performed on more extensive and diverse IoT datasets. | Better accuracy, memory efficiency, and relatively higher processing speed. |
| Kańtoch [25] | 82 | The proposed prototype is not suitable for the final confirmation of a performed activity. Further study is needed to investigate other features that would allow improved activity differentiation. | A prototype of a battery-operated wearable health-tracking device that tracks body temperature and body motions. |
| Mai et al. [26] | 74.1 | A personal re-identification approach to discriminate the owner from the thief is needed to enhance accuracy, and further work is needed to recognize complex activities. | System proposal for motorbike theft detection in video surveillance systems. |
| Palaniappan et al. [27] | 94.4 | Data from environmental and physiological sensors are not considered. A variety of sensors could be used to capture context information and the patient's health condition to provide better assistance. | The computational time for classification is reduced significantly compared with conventional approaches. The precision and sensitivity of the proposed system are better. |
| Chathuramali et al. [28] | 100 | When training examples are few due to class imbalance, the proposed system performs marginally worse than the established existing system. | The proposed system is superior in terms of computational time for human activity recognition. It can be used as home automation input for a home security system. |
| Zheng [30] | 95.6 | Sensor placement for correct detection is an issue, and no unsupervised approach is used for automatic activity recognition. | The proposed system recognizes a number of human activities, such as walking, running, jumping, standing, sitting, and sleeping, using only a single triaxial accelerometer. |
| Kerboua et al. [31] | 95.3 | The action recognition score needs to be improved, and the detection time needs to be decreased. | The proposed approach maintains a good accuracy score even with a limited number of frames. |
| Subasi et al. [32] | 99.9 | Validation on a larger dataset is required, the number of sensors should be reduced, and a more robust algorithm is needed. | Activity recognition using wearable sensors. |
| Uddin et al. [33] | 95 | More benchmark activity recognition datasets are needed for further validation of the study. | Allows parallel computing and offers low computational cost with high recognition accuracy. It can also select a minimal set of high-quality features without losing classification accuracy. |
| Balli et al. [34] | 97.3 | Human activities such as eating, smoking, cooking, handshaking, and hand waving are not considered. | Classification of human motions with motion sensor data. |
| Nurwulan et al. [35] | 84.5 | When the dataset is large, RF is a time-consuming method for building a model. | Random forest performs better for HAR than KNN, LDA, NB, and SVM. |
| Bustoni et al. [36] | 96 | Feature selection and feature scaling to optimize the classification process are not considered. | Identifies the most effective method by comparing the performance of machine learning methods for classifying sensor data on human motion activities. |
| Alawneh et al. [38] | 95 | A larger dataset is needed to further validate the proposed study. | Enhances recognition quality by using data augmentation; accuracy and training time are also improved. |
| D'Angelo et al. [41] | 99.9 | Telemedicine and personal fitness monitoring also need to be investigated. | Enhances the performance of COVID-19 tracking apps by using HAR. |

Methodology
Machine learning plays a significant role in understanding the complex human activity patterns involved in HAR problems. In this paper, we propose a machine learning-based method that works in two phases. In the first phase, the prediction probabilities of several classifiers are combined by soft voting. In the second phase, the method periodically retrains itself. All simulations are performed using the R statistical computing platform. All experiments are performed on a Dell Precision M4800 workstation with an Intel Core i7 processor (Santa Clara, CA, USA) with eight cores and 16 GB of RAM. We used multiple cores for parallel processing to speed up model training.

Datasets
We used two datasets in this paper, with the main focus on dataset-I. Dataset-I is freely available in the UCI data repository. It was recorded by Anguita et al. [23] in experiments with 30 people aged between 19 and 48 years. Each person performed six activities: walking, walking up the stairs, walking down the stairs, standing, lying, and sitting. The training data consist of 7209 rows, and the test data consist of 3090 rows. The data were sensed using the accelerometer and gyroscope sensors of a Samsung Galaxy S II smartphone; however, in this work, we only used the accelerometer data. The dataset contains 562 feature vectors along with subject and activity vectors. A crucial aspect of this dataset is its vast feature set, which can be challenging for machine learning algorithms in terms of training time, feature importance, and resource requirements, such as processors and RAM. We also used a second HAR dataset, dataset-II [43], to provide more convincing evidence of the proposed method's performance.

Proposed Method
HAR is a problem of great interest due to its wide array of applications, including healthcare, surveillance, adaptive content delivery, and tracking, and machine learning is at the forefront of HAR research. We call our HAR method Soft-Voting and Self-Learning (SVSL). The idea is to use the power of combined decision-making rather than relying on a single classifier. For this purpose, we use soft voting, in which the prediction probabilities of the individual classifiers are combined and the weighted probability is used as the deciding factor in SVSL.
Further, to enhance prediction accuracy, the SVSL method is integrated with a self-training mechanism. By self-training, we mean that the SVSL method periodically retrains itself without requiring any data from the user or any human intervention. We believe that this mechanism can help the method correctly classify the data instances that are most difficult to predict and accommodate dynamic environments efficiently. A functional block diagram of the SVSL method is shown in Figure 1, the self-training process is shown in Figure 2, and Algorithm 1 summarizes the complete procedure.
SVSL works in two phases: (1) soft-voting and (2) self-training. Figure 1 depicts the processing steps of the SVSL method. First, we divided the dataset into training and test data in a 70:30 ratio. Next, DL, GLM, and RF are trained separately. For DL, tanh is used as the activation function. Figure 3 shows that, of the eight trained DL models, the model with 40 epochs and two hidden layers of 64 neurons each produced the best training results; hence, this model is selected.
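The 70:30 split can be sketched as follows. This is a minimal Python sketch assuming a simple seeded random shuffle without stratification; the function name and the seed are hypothetical. For the 10,299 rows of dataset-I (7209 + 3090), a 70:30 cut gives exactly the training and test sizes reported in the Datasets subsection.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(rows: Sequence[T], train_ratio: float = 0.7,
                     seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle row indices with a fixed seed and cut them at `train_ratio`."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)  # reproducible shuffle
    cut = int(train_ratio * len(rows))
    return [rows[i] for i in idx[:cut]], [rows[i] for i in idx[cut:]]


train, test = train_test_split(list(range(10299)))
print(len(train), len(test))  # 7209 3090
```

A stratified split (shuffling within each activity class) would keep the class proportions equal in both parts; the sketch does not do this.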
Then, using soft-voting, we obtained the final model ($Final_{model}$) with the soft-voting formulation given in Equations (1)-(4), where the activity class is denoted by $i$ (walking, standing, etc.), the predicted class probabilities are denoted by $\rho$, and the classifiers used for voting are denoted by $j \leftarrow RF_p, GLM_p, DL_p$.
$$p(i_n \mid x) \leftarrow \frac{RF_{p_n} + GLM_{p_n} + DL_{p_n}}{3} \qquad (3)$$
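Equation (3) averages the class probabilities of the three classifiers. Assuming, as is standard in soft voting, that the predicted class is the one with the highest combined probability, the rule can be sketched in Python as below. The function name, the dictionary representation of the probabilities, and the probability values are hypothetical; with `weights=None`, each of the J classifiers gets weight 1/J, which reduces to Equation (3) for J = 3.

```python
from typing import Dict, Optional, Sequence, Tuple

Probs = Dict[str, float]  # activity class -> predicted probability


def soft_vote(per_model: Sequence[Probs],
              weights: Optional[Sequence[float]] = None) -> Tuple[str, Probs]:
    """Combine class-probability vectors from several classifiers and
    return the class with the highest combined probability."""
    j = len(per_model)
    if weights is None:
        weights = [1.0 / j] * j  # equal weights: plain average, as in Equation (3)
    classes = per_model[0].keys()
    combined = {c: sum(w * p[c] for w, p in zip(weights, per_model)) for c in classes}
    label = max(combined, key=lambda c: combined[c])
    return label, combined


rf = {"sitting": 0.10, "standing": 0.90}
glm = {"sitting": 0.55, "standing": 0.45}
dl = {"sitting": 0.55, "standing": 0.45}
label, combined = soft_vote([rf, glm, dl])
print(label)  # standing
```

In this example, a hard (majority) vote would choose "sitting" (two votes to one), whereas soft voting picks "standing" because RF is very confident (combined probabilities 0.4 vs. 0.6). This is the kind of situation in which combining probabilities rather than labels changes the decision.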
Once $Final_{model}$ is obtained, as shown in Figure 1 and Algorithm 1, we use it to predict the activity class. $Final_{model}$ is retrained periodically on a buffered dataset that is obtained autonomously from the predicted data. The buffered data are the saved data from the previous ten predictions of $Final_{model}$ and are updated periodically. The periodically retrained model is then used to predict human activity labels. Figure 2 depicts the self-training process, which is iterative: after every ten executions, $Final_{model}$ is replaced by the retrained $Final_{model}$.
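The self-training loop can be sketched as below. This is a minimal illustration under stated assumptions: `SelfTrainingHAR` and `NearestCentroid` are hypothetical names, a nearest-centroid classifier stands in for the soft-voting ensemble only to keep the example self-contained, and the buffered pseudo-labeled instances are assumed to be appended to the existing training data before retraining (the text does not specify whether the original training data are retained).

```python
from statistics import fmean
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


class NearestCentroid:
    """Hypothetical stand-in for the soft-voting ensemble (keeps the sketch runnable)."""

    def fit(self, X: List[Vector], y: List[str]) -> "NearestCentroid":
        groups: Dict[str, List[Vector]] = {}
        for x, label in zip(X, y):
            groups.setdefault(label, []).append(x)
        # One centroid (per-feature mean) for each activity class.
        self.centroids = {lab: [fmean(col) for col in zip(*rows)] for lab, rows in groups.items()}
        return self

    def predict(self, x: Vector) -> str:
        def sq_dist(c: List[float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda lab: sq_dist(self.centroids[lab]))


class SelfTrainingHAR:
    """Buffer the model's own predictions and retrain after every `buffer_size` of them."""

    def __init__(self, make_model: Callable[[], NearestCentroid],
                 X_train: List[Vector], y_train: List[str], buffer_size: int = 10):
        self.make_model = make_model
        self.X, self.y = list(X_train), list(y_train)
        self.buffer_size = buffer_size
        self.buffer: List[Tuple[Vector, str]] = []
        self.retrain_count = 0
        self.model = make_model().fit(self.X, self.y)  # initial Final_model

    def predict(self, x: Vector) -> str:
        label = self.model.predict(x)
        self.buffer.append((x, label))  # pseudo-label: no user input involved
        if len(self.buffer) == self.buffer_size:
            self._retrain()
        return label

    def _retrain(self) -> None:
        # Assumption: buffered pseudo-labeled instances are added to the training data.
        for x, label in self.buffer:
            self.X.append(x)
            self.y.append(label)
        self.model = self.make_model().fit(self.X, self.y)  # replaces Final_model
        self.buffer.clear()
        self.retrain_count += 1


trainer = SelfTrainingHAR(NearestCentroid, [[0, 0], [0, 1], [5, 5], [5, 6]],
                          ["sitting", "sitting", "walking", "walking"])
for i in range(25):
    trainer.predict([0.2, 0.3] if i % 2 == 0 else [5.1, 5.4])
print(trainer.retrain_count, len(trainer.buffer), len(trainer.X))  # 2 5 24
```

After 25 predictions, the model has been replaced twice (after the 10th and 20th predictions), and five predictions are waiting in the buffer for the next retraining round.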

Results and Analysis
To evaluate the performance of the SVSL HAR method, we used the confusion matrix as the performance benchmark, from which the prediction accuracy percentage, sensitivity, and specificity can be computed [44]. To demonstrate the validity of the SVSL method's performance, we compared its results with those of four state-of-the-art classifiers: DL, GLM, RF, and AdaBoost. RF is based on bagging, an ensemble learning technique that can perform both classification and regression. RF is a highly capable classifier that can handle high-dimensional datasets, compute variable importance, and avoid overfitting [45]. However, RF is less accurate when performing regression tasks. GLM was proposed by John Nelder and Robert Wedderburn [46] in 1972. It generalizes linear regression by relating the linear model to the dependent variable through a link function.
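The quantities derived from a confusion matrix can be sketched in plain Python as below. The function names and the toy labels are hypothetical; sensitivity and specificity are computed one-vs-rest for a single class, and the row-normalized matrix corresponds to the "normalized" confusion matrices discussed below, whose diagonal holds the fraction of each true class that is predicted correctly.

```python
from typing import Hashable, List, Sequence, Tuple

Matrix = List[List[int]]


def confusion_matrix(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                     labels: Sequence[Hashable]) -> Matrix:
    """Rows are true classes, columns are predicted classes."""
    pos = {label: k for k, label in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[pos[t]][pos[p]] += 1
    return m


def accuracy(m: Matrix) -> float:
    """Share of all instances on the diagonal (correctly classified)."""
    total = sum(map(sum, m))
    return sum(m[k][k] for k in range(len(m))) / total


def sensitivity_specificity(m: Matrix, k: int) -> Tuple[float, float]:
    """One-vs-rest sensitivity (recall) and specificity for class index k."""
    total = sum(map(sum, m))
    tp = m[k][k]
    fn = sum(m[k]) - tp                 # class k predicted as another class
    fp = sum(row[k] for row in m) - tp  # other classes predicted as class k
    tn = total - tp - fn - fp
    return tp / (tp + fn), tn / (tn + fp)


def row_normalize(m: Matrix) -> List[List[float]]:
    """Divide each row by its class count (assumes every class occurs at least once)."""
    return [[c / sum(row) for c in row] for row in m]


labels = ["sitting", "standing", "lying"]
y_true = ["sitting"] * 3 + ["standing"] * 3 + ["lying"] * 2
y_pred = ["sitting", "sitting", "standing", "standing", "standing", "sitting", "lying", "lying"]
m = confusion_matrix(y_true, y_pred, labels)
print(m)                              # [[2, 1, 0], [1, 2, 0], [0, 0, 2]]
print(accuracy(m))                    # 0.75
print(sensitivity_specificity(m, 0))  # sensitivity 2/3 and specificity 0.8 for "sitting"
```

The confusion between sitting and standing in this toy example mirrors the kind of error that the class-wise results below highlight.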
Furthermore, it enables the variance magnitude of every data instance to be its predicted value function. This paper used Feed Forward Deep Neural Networks, also known as a multi-layered perceptron, for DL [47]. They consist of an input layer (data), hidden layers (neurons), and an output layer. All three classifiers, RF, GLM, and DL, are implemented using the H 2 O deep learning library in R [48], which supports parallel and distributed programming. Further, we also compared results with a state-of-the-art boosting classifier known as AdaBoost as it falls in the Ensemble class of algorithms. Figure 4 shows that the SVSL method achieved an average accuracy of 99.26%, which is better than that of the other three classifiers, RF, GLM, DL, and AdaBoost, with 98.09%, 90.84%, 97.90%, and 96.13%, respectively. Furthermore, the SVSL method showed a 0.14% gain with the self-training mechanism, which indicates that the SVSL method can correctly classify the most challenging data instances.  The normalized confusion matrices of the SVSL method, DL, GLM, RF, and AdaBoost are shown in Figures 5-9. Each confusion matrix contains the sum of the rightly classified and misclassified classes. The diagonal of the matrix represents the rightly classified class details. Figure 5 depicts the conventional and normalized confusion matrix for the deeplearning classifier, as we know that the matrix values in the figure can be used to compute the percentage values of the class-wise prediction accuracy. The deep-learning classifier produced the highest class-wise accuracy of 100% for the walking down the stairs and lying activities. The lowest accuracy, 93.9%, produced by the deep-learning classifier was for the standing class. The deep-learning classifier produced accuracies of 99.6%, 98.5%, 98.30%, and 96.1% for the other four activities-walking, walking up the stairs, walking down the stairs, and sitting, respectively. 
It should be noted that, as shown in Figure 4, the deep-learning classifier produced an average accuracy of 97.90% for all the classes.  The normalized confusion matrices of the SVSL method, DL, GLM, RF, and AdaBoost are shown in Figures 5-9. Each confusion matrix contains the sum of the rightly classified and misclassified classes. The diagonal of the matrix represents the rightly classified class details. Figure 5 depicts the conventional and normalized confusion matrix for the deeplearning classifier, as we know that the matrix values in the figure can be used to compute the percentage values of the class-wise prediction accuracy. The deep-learning classifier produced the highest class-wise accuracy of 100% for the walking down the stairs and lying activities. The lowest accuracy, 93.9%, produced by the deep-learning classifier was for the standing class. The deep-learning classifier produced accuracies of 99.6%, 98.5%, 98.30%, and 96.1% for the other four activities-walking, walking up the stairs, walking down the stairs, and sitting, respectively. It should be noted that, as shown in Figure 4, the deep-learning classifier produced an average accuracy of 97.90% for all the classes.    Figure 6 depicts the normalized confusion matrix for the GLM. Using the confusion matrix values in the figure to compute the percentage values of the class-wise prediction accuracy, we note that the GLM classifier produced the highest accuracy of 99.3% for the lying position. The lowest accuracy, 83.3%, was for the standing activity. The GLM classifier produced accuracies of 88.8%, 92.1%, 93.6%, and 88.4% for the other four activities-Walking Up the stairs, Walking, Walking Down the stairs, and Sitting, respectively. It should be noted that, as shown in Figure 4 the GLM classifier produced an average accuracy of 90.84% for all the classes.    Figure 6 depicts the normalized confusion matrix for the GLM. 
Figure 5 depicts the conventional and normalized confusion matrices for the deep-learning classifier. Using the confusion matrix values in the figure to compute the class-wise prediction accuracy (in percent), we note that the deep-learning classifier produced its highest class-wise accuracy, 100%, for the walking down the stairs and lying activities. Its lowest accuracy, 93.9%, was for the standing class. The deep-learning classifier produced accuracies of 99.6%, 98.5%, 98.3%, and 96.1% for the other four activities (walking, walking up the stairs, walking down the stairs, and sitting), respectively. As shown in Figure 4, the deep-learning classifier produced an average accuracy of 97.90% across all classes.
Figure 6 depicts the normalized confusion matrix for the GLM classifier. Computing the class-wise accuracy from these values, we note that the GLM classifier produced its highest accuracy, 99.3%, for the lying position and its lowest, 83.3%, for the standing activity. It produced accuracies of 88.8%, 92.1%, 93.6%, and 88.4% for the other four activities (walking up the stairs, walking, walking down the stairs, and sitting), respectively. As shown in Figure 4, the GLM classifier produced an average accuracy of 90.84% across all classes.
Figure 7 depicts the conventional and normalized confusion matrices for the RF classifier. The RF classifier produced its highest class-wise accuracy, 100%, for the lying position and its lowest, 94.7%, for the standing activity. It produced accuracies of 98.5%, 99.2%, 99.5%, and 97.1% for the other four activities (walking up the stairs, walking, walking down the stairs, and sitting), respectively. As shown in Figure 4, the RF classifier produced an average accuracy of 98.09% across all classes.
Figure 8 depicts the normalized confusion matrix for the AdaBoost classifier. The AdaBoost classifier produced its highest class-wise accuracy, 99.4%, for the lying position and its lowest, 90.9%, for the standing activity. It produced accuracies of 98.2%, 97.6%, 98.5%, and 92.2% for the other four activities (walking up the stairs, walking, walking down the stairs, and sitting), respectively. As shown in Figure 4, the AdaBoost classifier produced an average accuracy of 96.13% across all classes.
Figure 9 depicts the conventional and normalized confusion matrices for the SVSL method. The SVSL method produced its highest class-wise accuracy, 100%, for the lying position and its lowest, 98.3%, for the standing position. It produced accuracies of 99.1%, 99.6%, 99.8%, and 98.9% for the other four activities (walking up the stairs, walking, walking down the stairs, and sitting), respectively. As shown in Figure 4, the SVSL method produced an average accuracy of 99.26% across all classes.
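The class-wise accuracies above are read off the confusion matrices. As a minimal sketch, assuming rows hold the true classes and columns the predicted classes (a common convention, for example in scikit-learn; the orientation used in the figures is an assumption), each class-wise value is the diagonal entry divided by its row total, which is the diagonal of the row-normalized matrix. The 2x2 matrix below is hypothetical.

```python
from typing import List, Sequence


def class_wise_accuracy(cm: Sequence[Sequence[int]]) -> List[float]:
    """Per-class accuracy: correct predictions of a class divided by its true instances.

    Assumes rows are true classes and columns are predicted classes, so these
    values equal the diagonal of the row-normalized confusion matrix.
    """
    return [row[i] / sum(row) for i, row in enumerate(cm)]


def overall_accuracy(cm: Sequence[Sequence[int]]) -> float:
    """Fraction of all test instances that lie on the diagonal."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / sum(sum(row) for row in cm)


# Hypothetical two-class matrix with unequal class sizes (10 vs 30 instances).
cm = [[8, 2],
      [3, 27]]
per_class = class_wise_accuracy(cm)           # [0.8, 0.9]
macro_mean = sum(per_class) / len(per_class)  # about 0.85
overall = overall_accuracy(cm)                # 0.875
```

The two summaries differ when classes have different numbers of test instances: overall accuracy weights each class by its size, whereas the unweighted mean treats all classes equally. An average reported over all classes therefore need not equal the plain mean of the six class-wise values.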

Dataset I
The SVSL method performed better than all five other classifiers: DL, GLM, RF, AdaBoost, and Stacking. Before the self-training mechanism was executed, its prediction accuracy was 8.42, 1.36, and 1.17 percentage points higher than that of GLM, DL, and RF, respectively. Self-training increased the accuracy further, but only slightly; this effect requires further investigation. Figures 4-9 show that the SVSL method not only achieved a higher HAR accuracy but also predicted the sitting and standing classes more accurately (98.9% and 98.3%) than DL, GLM, and RF (at most 97.1% and 94.7%, respectively). Figure 10 depicts the average accuracy gain after each round of self-training. Without retraining, the SVSL method produced an average accuracy of 99.26% across all classes; after five executions of the self-training mechanism, its average accuracy increased by 0.11 percentage points.
Algorithms 2021, 14, x FOR PEER REVIEW 14 of 18

Dataset II
To provide further evidence of the performance of SVSL, we also evaluated it on a second dataset [43]. We keep this subsection brief to limit the length of the paper. Figure 11 shows that the SVSL and self-trained SVSL methods achieved average accuracies of 97.15% and 97.59%, respectively, which are higher than those of the other four classifiers: RF (95.29%), GLM (93.04%), DL (95.01%), and AdaBoost (91.47%). Furthermore, self-training raised the accuracy of the SVSL method by 0.44 percentage points, which suggests that the mechanism helps the method correctly classify some of the most difficult data instances.
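The accuracy gaps for dataset-I and the two-dataset averages given in the abstract (6.26%, 1.75%, 1.51%, and 4.40% over GLM, DL, RF, and AdaBoost) can be checked with simple arithmetic on the reported figures. The sketch assumes that the abstract's averages use the SVSL accuracy before self-training (97.15% for dataset-II) and are truncated to two decimals; this is an inference from the numbers, not a procedure stated in the paper.

```python
# Average accuracies (%) as reported for each dataset (Figure 4 and Figure 11).
# Using the SVSL values before self-training is an assumption.
REPORTED = {
    "dataset-I": {"SVSL": 99.26, "GLM": 90.84, "DL": 97.90, "RF": 98.09, "AdaBoost": 96.13},
    "dataset-II": {"SVSL": 97.15, "GLM": 93.04, "DL": 95.01, "RF": 95.29, "AdaBoost": 91.47},
}
BASELINES = ["GLM", "DL", "RF", "AdaBoost"]


def gaps_in_points(scores: dict) -> dict:
    """Percentage-point gap between SVSL and each baseline, rounded to 2 decimals."""
    return {b: round(scores["SVSL"] - scores[b], 2) for b in BASELINES}


per_dataset = {name: gaps_in_points(s) for name, s in REPORTED.items()}
# dataset-I:  GLM 8.42, DL 1.36, RF 1.17, AdaBoost 3.13
# dataset-II: GLM 4.11, DL 2.14, RF 1.86, AdaBoost 5.68

# Mean gap over the two datasets: about 6.265, 1.75, 1.515, 4.405,
# which the abstract reports (truncated) as 6.26, 1.75, 1.51, 4.40.
mean_gap = {b: sum(g[b] for g in per_dataset.values()) / len(per_dataset) for b in BASELINES}

# Gain from self-training on dataset-II (97.59% vs 97.15%).
self_training_gain_ii = round(97.59 - 97.15, 2)  # 0.44
```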

Figure 12 depicts the training and testing times of all the methods. The SVSL method had the longest training time: 2640 s for dataset-I and 9772 s for dataset-II. This is expected, as it comprises multiple phases. RF trained fastest, taking 245 s and 849 s for dataset-I and dataset-II, respectively, followed by GLM, AdaBoost, and DL in second, third, and fourth place. The testing times in Figure 12 follow a similar pattern. With their current testing performance, none of these methods can be applied to real-time applications; they are better suited to applications that require periodic activity recognition or near-real-time predictions.

Conclusions
With the exponentially growing number of smart devices, we can now sense and record data that we could not have imagined a decade ago. Wearable devices and smartphones allow us to sense, store, and process such data. One important and popular use of these data is HAR, a problem of great interest because of its wide range of applications, including, but not limited to, healthcare, surveillance, adaptive content delivery, and tracking. The use of wearable technology is increasing rapidly, and users have observed positive effects in relation to follow-up healthcare appointments.
In this paper, we identified and investigated two crucial research gaps in HAR. First, individual classifiers each belong to a particular family of algorithms, such as neural networks, decision trees, bagging, or boosting; each performs well on certain types of data patterns but has its own weaknesses. Second, data and the environments in which they are sensed are dynamic, so models need to be updated frequently as the sensed data patterns change.
Machine-learning algorithms play a significant role in developing our understanding of HAR data and in applying the acquired knowledge to various application domains. We proposed a Soft-Voting and Self-Learning (SVSL)-based HAR method, which combines soft voting with a self-learning mechanism to classify human activities. On dataset-I, the SVSL method produced better results than four other state-of-the-art classifiers: DL, GLM, RF, and AdaBoost. Its prediction accuracy was 8.42 and 3.13 percentage points higher than that of GLM and AdaBoost, respectively, and more than 1 percentage point higher than that of DL and RF. Similar patterns of prediction accuracy were observed for dataset-II. We also found that the SVSL method improved prediction accuracy for the classes on which the other three state-of-the-art classifiers produced more misclassifications. Finally, after five executions of the self-training mechanism, the average accuracy of the proposed method increased by 0.11 percentage points on dataset-I and by 0.44 percentage points on dataset-II.
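The soft-voting and self-learning combination can be sketched as follows. Soft voting averages the class-probability vectors produced by the base classifiers and predicts the class with the highest average probability. This sketch uses equal weights, which is an assumption. This section does not specify how the self-training mechanism generates new training data. The loop below therefore assumes a common pseudo-labeling scheme: instances whose averaged probability reaches a confidence threshold are added to the training set with their voted label, and the models are refit, for a fixed number of rounds. `NearestCentroid`, `self_train`, `threshold`, and `rounds` are hypothetical names. The toy nearest-centroid models only stand in for the DL, GLM, RF, and AdaBoost classifiers.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def soft_vote(probas: Sequence[Sequence[float]]) -> Tuple[int, List[float]]:
    """Average class-probability vectors (equal weights); return (winning index, average)."""
    n_classes = len(probas[0])
    avg = [sum(p[c] for p in probas) / len(probas) for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__), avg


class NearestCentroid:
    """Hypothetical toy classifier standing in for DL, GLM, RF, or AdaBoost."""

    def __init__(self, temperature: float = 1.0) -> None:
        self.temperature = temperature
        self.classes: List[str] = []
        self.centroids: List[Vector] = []

    def fit(self, X: Sequence[Vector], y: Sequence[str]) -> "NearestCentroid":
        self.classes = sorted(set(y))  # same class order for all models fit on the same y
        self.centroids = []
        for c in self.classes:
            pts = [x for x, label in zip(X, y) if label == c]
            self.centroids.append([sum(col) / len(pts) for col in zip(*pts)])
        return self

    def predict_proba(self, x: Vector) -> List[float]:
        # Softmax over negative Euclidean distances to each class centroid.
        scores = [-math.dist(x, c) / self.temperature for c in self.centroids]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]


def self_train(models, X, y, unlabeled, threshold: float = 0.9, rounds: int = 5):
    """Assumed pseudo-labeling loop: confidently soft-voted instances join the training set."""
    X, y, pool = list(X), list(y), list(unlabeled)
    for _ in range(rounds):
        for m in models:
            m.fit(X, y)
        remaining = []
        for x in pool:
            idx, avg = soft_vote([m.predict_proba(x) for m in models])
            if avg[idx] >= threshold:
                X.append(x)
                y.append(models[0].classes[idx])
            else:
                remaining.append(x)
        if len(remaining) == len(pool):  # no new confident instances: stop early
            break
        pool = remaining
    for m in models:  # final refit on the enlarged training set
        m.fit(X, y)
    return X, y, pool


# Usage with made-up 2-D "features".
X = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]]
y = ["sitting", "sitting", "walking", "walking"]
unlabeled = [[0.1, 0.05], [4.9, 5.2], [2.5, 2.5]]
models = [NearestCentroid(temperature=t) for t in (0.5, 1.0)]
X2, y2, left = self_train(models, X, y, unlabeled)
```

In this toy run, the two unlabeled points that lie close to a class centroid are added to the training set with the labels "sitting" and "walking". The ambiguous point midway between the classes never reaches the threshold and stays unlabeled.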
In the future, we plan to enhance the performance of the proposed method using additional, more diverse datasets. We also plan to develop an automated system to collect HAR data. This area has enormous real-world application potential and should be exploited to better understand human behavior and our surroundings, which can enhance decision-making.