Improving Human Activity Monitoring by Imputation of Missing Sensory Data: Experimental Study

The automatic recognition of human activities with sensors available in off-the-shelf mobile devices has been the subject of many research studies in recent years. It may be useful for monitoring elderly people to flag warning situations, for monitoring the activity of sports people, and for other purposes. However, the acquisition of data from the different sensors may fail for various reasons, and human activities are recognized with better accuracy when the datasets are complete. This paper focuses on two stages of a system for the recognition of human activities: data imputation and data classification. Regarding data imputation, a methodology for extrapolating the missing samples of a dataset to better recognize human activities is proposed. The K-Nearest Neighbors (KNN) imputation technique was used to extrapolate the missing samples in dataset captures. Regarding data classification, the accuracy of the previously implemented method, i.e., Deep Neural Networks (DNN) with normalized and non-normalized data, was improved in relation to the previous results without data imputation.


Introduction
The evolution of Internet of Things systems and multi-sensor devices contributed to the development of systems for human activity monitoring. One set of applications of these technologies is improving the independent living and rehabilitation of older adults and people with special needs [1]. Likewise, there are approaches for fall detection and risk assessment [2,3]. Usually, human activity monitoring systems transmit the collected data to the cloud for real-time processing and further analysis [4]. In light of that, the network conditions become an important factor in facilitating data transfer [5]. Therefore, the development of optimized online systems and test pilots is important. Moreover, these systems should be prepared for older adults, which raises other sets of challenges related to usability and ergonomics; therefore, the resilience of these systems is essential [6,7].
Different types of activities may be detected with the inertial sensors available in the mobile devices, including running, walking, walking upstairs, walking downstairs, and standing [8,9]. For the detection of human activities, one of the possibilities is the use of artificial intelligence methods combined with the capabilities of the mobile devices for the development of monitoring tools anywhere at any time [10]. Still, the data acquisition may have problems related to low memory, power processing, and battery

Overview
The methodology of this study proposes the automatic identification of five human activities: walking, running, walking upstairs, walking downstairs, and standing. Figure 1 shows the flow diagram of the proposed methodology, in which the missing samples are extrapolated before the data are classified. The method is composed of seven modules: data acquisition (Section 2.2), data imputation (Section 2.3), denoising (Section 2.4), features integration (Section 2.5), data normalization (Section 2.6), model training and evaluation (Section 2.7), and performance comparison (Section 2.8). These stages are explained in the next sections.

Study Participants and Data Acquisition
The data acquisition process is performed with non-intrusive equipment based on a mobile device that incorporates different sensors, including an accelerometer, a magnetometer, and a gyroscope. During the data acquisition, some failures may occur, and the resulting missing samples were detected (Section 2.3). The data acquired cover the performance of different activities, including walking, running, standing, walking upstairs, and walking downstairs [22]. The activities were performed and labeled by 25 individuals aged between 20 and 60 years old with different lifestyles and health states.
In general, the dataset is composed of 2000 captures of 5 s for each activity that corresponds to around 2.78 h of captures, representing 169.44 h of captures related to each activity. Thus, this dataset is composed of 13.9 h of captures shared by different individuals. The data were acquired using an Android application installed on a mobile device to record the sensor data while the activities were performed. All the participants kept the mobile phone in the front pocket of their pants while performing the activities. The mobile device used is the BQ Aquaris 5.7 smartphone with a Quad Core CPU and 16 GB of internal memory [23]. Next, the data were used for the implementation of different techniques for data classification (Section 2.7). Mobile devices have constraints related to low memory, battery, and processing power, which may cause different failures [6,24]. The original dataset, without the application of the data imputation technique, is available in [25], and the dataset with the application of the data imputation technique is available in [26].
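The per-activity and total durations quoted above follow from simple arithmetic. The short check below is only a sketch and assumes exactly 2000 captures of 5 s for each of the five activities; the constant names are illustrative.

```python
# Assumed from the text: 2000 captures of 5 s per activity, five activities.
CAPTURES_PER_ACTIVITY = 2000
CAPTURE_SECONDS = 5
N_ACTIVITIES = 5

# Convert the recorded seconds per activity to hours.
hours_per_activity = CAPTURES_PER_ACTIVITY * CAPTURE_SECONDS / 3600
total_hours = hours_per_activity * N_ACTIVITIES

print(f"{hours_per_activity:.2f} h per activity")
print(f"{total_hours:.1f} h in total")
```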

Data Imputation
Once the dataset was collected, the next step was to analyze the missing samples and then extrapolate them using the data imputation technique. Figure 2 shows the flowchart for extrapolating the missing samples. It can be seen that the data imputation was performed in four major steps: missing samples identification, NULL values insertion, data segmentation, and data imputation.

Missing Samples Identification
After data acquisition and cleaning, each record was checked for missing samples. When training artificial intelligence methods, missing samples affect the correct recognition of human activities. Missing samples may occur for different reasons, including the user not performing an activity for the complete defined activity duration, failures of the sensors, environmental noise, or problems with the mobile device used for data acquisition.
First, the number of missing samples in each record of the dataset was identified by analyzing the duration of each activity and the frequency rate of the sensors. The frequency rate differs between sensors: it was 100 Hz for the accelerometer and gyroscope, and 10 Hz for the magnetometer. Thus, for each 5 s of activity, the methods analyzed 500 samples for the accelerometer and gyroscope sensors, and 50 samples for the magnetometer sensor.
Next, the number of missing samples for each capture was analyzed, excluding the captures that had less than 4 s of data. Thus, the captures with more than 100 missing samples in the accelerometer and gyroscope sensors and the captures with more than 10 missing samples in the magnetometer sensor were discarded. This keeps the data closer to the original recordings than filling large gaps entirely with synthetic samples would.
From the above analysis, the missing samples count was identified with Equation (1).
$$\text{Missing Samples Count} = (\text{Frequency Rate} \times \text{Activity Duration}) - \text{Samples Count in the Given Excerpt} \tag{1}$$
Now, based upon the accelerometer and gyroscope specifications, Equations (1) and (2)
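As a minimal sketch of Equation (1) and of the discard rule described above, the helper functions below compute the missing samples count and decide whether a capture is kept. The function names are hypothetical and are not taken from the original implementation.

```python
def missing_samples_count(frequency_hz, duration_s, samples_in_excerpt):
    """Equation (1): expected number of samples minus the samples recorded."""
    return frequency_hz * duration_s - samples_in_excerpt


def keep_capture(frequency_hz, duration_s, samples_in_excerpt):
    """Keep a capture only if at most one second of samples is missing."""
    missing = missing_samples_count(frequency_hz, duration_s, samples_in_excerpt)
    return missing <= frequency_hz


# Accelerometer capture at 100 Hz with 450 of the expected 500 samples.
print(missing_samples_count(100, 5, 450))
print(keep_capture(100, 5, 450))

# Magnetometer capture at 10 Hz with 38 of the expected 50 samples.
print(keep_capture(10, 5, 38))
```

With these inputs, the accelerometer capture has 50 missing samples and is kept, while the magnetometer capture has 12 missing samples and is discarded.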

NULL Values Insertion
After identifying the missing samples, the next step is to insert NULL values to fill the space of the missing samples. Before inserting the NULL values, the missing samples count is compared with the sample frequency rate, i.e., the number of samples recorded in one second. If the missing samples count is greater than the sample frequency rate, the NULL values are not inserted, and the excerpt is ignored. Thus, an excerpt is ignored if it has more than 100 missing samples in the case of the accelerometer and gyroscope, or more than 10 missing samples in the case of the magnetometer. This keeps the data closer to the original recordings than filling the gaps entirely with synthetic samples would. On the other hand, if the missing samples count is less than or equal to the sample frequency rate, NULL values are inserted at every constant time interval, i.e., every 1/100 s in the case of the accelerometer and gyroscope and every 1/10 s in the case of the magnetometer, to fill the space of missing samples.
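The insertion step can be sketched as follows. This is an illustration under stated assumptions, not the original implementation: it assumes each recorded sample carries a timestamp in seconds from the start of the capture, represents NULL as NaN, and uses the hypothetical helper name `insert_nulls`.

```python
import numpy as np


def insert_nulls(timestamps_s, values, frequency_hz, duration_s=5):
    """Place samples on a regular time grid; empty slots stay NaN (NULL).

    Returns None when more than one second of samples is missing,
    mirroring the rule that such excerpts are ignored.
    """
    values = np.asarray(values, dtype=float)
    n_expected = int(frequency_hz * duration_s)
    n_missing = n_expected - len(values)
    if n_missing > frequency_hz:
        return None
    grid = np.full((n_expected, values.shape[1]), np.nan)
    # Assumption: the slot of a sample is its timestamp rounded to the grid.
    slots = np.rint(np.asarray(timestamps_s, dtype=float) * frequency_hz).astype(int)
    slots = np.clip(slots, 0, n_expected - 1)
    grid[slots] = values
    return grid


# Magnetometer-like example: 10 Hz for 5 s with three samples lost.
t = np.arange(50) / 10.0
keep = np.ones(50, dtype=bool)
keep[[7, 8, 20]] = False
readings = np.column_stack([t, 2 * t, 3 * t])[keep]
grid = insert_nulls(t[keep], readings, frequency_hz=10)
print(int(np.isnan(grid[:, 0]).sum()))
```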

Data Segmentation
After filling the space of missing samples with NULL values, the samples were segmented so that the imputation technique could be applied to extrapolate the missing samples. The samples were segmented in each excerpt with respect to its sample frequency rate. In the case of the accelerometer and gyroscope, the samples were segmented into a window of 100 samples containing 90 known samples and the first 10 unknown samples. In the case of the magnetometer, the samples were segmented into a window of 10 samples containing 9 known samples and one unknown sample. If the missing samples count was less than or equal to 10 in the case of the accelerometer and gyroscope, all unknown samples were included with the known samples to make a window of 100 samples. Likewise, in the case of the magnetometer, if the missing samples count was less than or equal to 2, all unknown samples were included with the known samples to make a window of 10 samples.
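One possible reading of this segmentation is sketched below for the accelerometer and gyroscope case. The text does not state which known samples enter a window, so the choice of the known samples closest in time to the gap is an assumption, as is the helper name `next_window`.

```python
import numpy as np


def next_window(signal, window_size=100, max_unknown=10):
    """Return the row indices of one window: NULL rows plus known rows."""
    is_unknown = np.isnan(signal).any(axis=1)
    unknown = np.flatnonzero(is_unknown)
    known = np.flatnonzero(~is_unknown)
    if unknown.size == 0:
        return None
    # Take at most the first `max_unknown` NULL rows.
    chosen_unknown = unknown[:max_unknown]
    n_known = window_size - chosen_unknown.size
    # Assumption: prefer the known rows closest in time to the chosen gap.
    centre = chosen_unknown.mean()
    order = np.argsort(np.abs(known - centre), kind="stable")
    chosen_known = known[order[:n_known]]
    return np.sort(np.concatenate([chosen_unknown, chosen_known]))


# 5 s of three-axis data at 100 Hz with 25 NULL rows.
signal = np.ones((500, 3))
signal[100:125] = np.nan
window = next_window(signal)
print(len(window), int(np.isnan(signal[window]).any(axis=1).sum()))
```

With 25 NULL rows, the first window holds the first 10 unknown samples and 90 known samples; with 10 or fewer NULL rows, all of them fall into a single window.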

Data Imputation
The KNN imputation technique identifies the k samples in the dataset that are most similar, or closest in space, to a sample with missing values [27]. These k samples are used to estimate the value of the missing points. Generally, the missing value is imputed with the mean value of the k neighboring samples.
Once the data were segmented, the KNN imputation technique was applied to extrapolate the missing samples. First, the k closest neighbors of the missing samples were found, and then the missing samples were imputed from these known neighbors. The data points with the shortest Euclidean distance were considered the closest neighbors. The value of every missing sample was estimated as the mean value of its k closest neighbors. The missing samples count was checked each time before applying the KNN imputation. If the missing samples count was less than 10, the missing samples were filled in the first iteration. However, if the missing samples count was more than 10, the window was moved 10 steps forward to make another chunk of data, and the KNN imputation technique was applied again. As shown in Figure 1, this process continued until all the missing samples were extrapolated.
In short, each recorded activity of the given dataset was analyzed, and the missing samples count was identified. The missing samples count was then compared with the sample frequency rate. If the missing samples count was greater than the frequency rate of the given excerpt, that excerpt was ignored, and the next excerpt was analyzed. However, if the missing samples count was less than or equal to the frequency rate, NULL values were inserted at a fixed time interval until all the missing samples were filled with NULL values. Once the NULL values were inserted, the samples were segmented into a window of 100 samples. Finally, the KNN imputation technique was applied to extrapolate the missing sample values from the known samples, and this process was repeated until all the unknown values were extrapolated.
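The chunked imputation loop summarized above can be sketched in NumPy. This is not the original implementation, and several choices are assumptions: the value of k is not given in the text, a missing sample is assumed to be missing on all axes, and the Euclidean distance is taken over the sample index because that is the only coordinate a NULL row still has.

```python
import numpy as np


def knn_impute(signal, k=5, window_size=100, max_unknown=10):
    """Fill NaN rows, 10 at a time, with the mean of the k nearest known rows."""
    signal = np.asarray(signal, dtype=float)
    filled = signal.copy()
    is_missing = np.isnan(signal).any(axis=1)
    known = np.flatnonzero(~is_missing)
    unknown = np.flatnonzero(is_missing)
    if known.size == 0:
        raise ValueError("no known samples to impute from")
    # Move forward through the NULL rows in chunks of `max_unknown`.
    for start in range(0, unknown.size, max_unknown):
        chunk = unknown[start:start + max_unknown]
        n_known = window_size - chunk.size
        # Known rows sharing the window with this chunk (closest in time).
        order = np.argsort(np.abs(known - chunk.mean()), kind="stable")
        candidates = known[order[:n_known]]
        for row in chunk:
            nearest = np.argsort(np.abs(candidates - row), kind="stable")[:k]
            # Only originally recorded rows are used as neighbours.
            filled[row] = signal[candidates[nearest]].mean(axis=0)
    return filled


# Two-axis toy capture at 100 Hz with 15 NULL rows.
t = np.arange(500) / 100.0
capture = np.column_stack([np.sin(t), np.cos(t)])
mask = np.zeros(500, dtype=bool)
mask[40:55] = True
capture[mask] = np.nan
imputed = knn_impute(capture)
print(bool(np.isnan(imputed).any()))
```

Because every imputed value is a mean of recorded neighbours, it always lies within the range of the known samples in its window.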

Denoising
The data cleaning process is important to remove the environment noise, effects of involuntary movements, and other artifacts, to improve the results of the recognition of human activities. According to the type of sensors used, the implemented method was the low-pass filter [28], which allows extracting features more clearly and is reliable for the implementation of classification methods.
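The text does not specify the design of the low-pass filter. As an illustration only, the sketch below uses a first-order (exponential smoothing) low-pass filter; the cutoff frequency of 5 Hz and the function name are hypothetical choices, not values from the study.

```python
import numpy as np


def low_pass(samples, sample_rate_hz, cutoff_hz):
    """First-order low-pass filter: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    samples = np.asarray(samples, dtype=float)
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * np.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    out = np.empty_like(samples)
    out[0] = samples[0]
    for n in range(1, len(samples)):
        out[n] = out[n - 1] + alpha * (samples[n] - out[n - 1])
    return out


# A signal flipping sign at every sample is the fastest oscillation at 100 Hz.
jitter = np.where(np.arange(200) % 2 == 0, 1.0, -1.0)
smoothed = low_pass(jitter, sample_rate_hz=100, cutoff_hz=5)
print(float(np.abs(smoothed[50:]).max()) < 0.2)
```

A constant input passes through unchanged, while the rapidly alternating input is strongly attenuated, which is the behaviour wanted from a denoising step.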

Features Integration
After the data imputation, all three sensor excerpts of all datasets along with their activity labels were integrated to make a feature vector. The features extracted for each sensor are the five greatest distances between the maximum peaks, the average, standard deviation, variance, and median of the maximum peaks, and the standard deviation, average, maximum value, minimum value, variance, and median of the raw signal.
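The feature set listed above can be sketched for one sensor signal as follows. The sketch makes assumptions that the text does not settle: the input is a one-dimensional signal, a maximum peak is a sample strictly greater than both neighbours, peak distances are measured in samples, and missing distances are zero-padded. The function names are hypothetical.

```python
import numpy as np


def local_maxima(signal):
    """Indices of samples strictly greater than both of their neighbours."""
    s = np.asarray(signal, dtype=float)
    return np.flatnonzero((s[1:-1] > s[:-2]) & (s[1:-1] > s[2:])) + 1


def extract_features(signal, n_distances=5):
    """Return 15 features: 5 peak distances, 4 peak statistics, 6 raw statistics."""
    s = np.asarray(signal, dtype=float)
    peaks = local_maxima(s)
    # Greatest distances between consecutive peaks, zero-padded if too few.
    gaps = np.sort(np.diff(peaks))[::-1][:n_distances].astype(float)
    gaps = np.pad(gaps, (0, n_distances - gaps.size))
    peak_values = s[peaks] if peaks.size else np.zeros(1)
    peak_stats = [peak_values.mean(), peak_values.std(),
                  peak_values.var(), np.median(peak_values)]
    raw_stats = [s.std(), s.mean(), s.max(), s.min(), s.var(), np.median(s)]
    return np.concatenate([gaps, peak_stats, raw_stats])


features = extract_features([0, 1, 0, 2, 0, 0, 3, 0])
print(features.shape)
```

Concatenating the vectors of the accelerometer, magnetometer, and gyroscope, together with the activity label, would give one row of the integrated feature table.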

Data Normalization
After the extraction and integration of the different features, two different analyses were performed, i.e., one with the raw features and another with the normalized features. According to the literature, there are different normalization techniques, but the one best suited to the implementation of the Deep Neural Networks (DNN) method with the DeepLearning4j framework [29] is normalization with the mean and standard deviation [30].
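Normalization with the mean and standard deviation can be sketched as below. Fitting the statistics on the training features only and reusing them for the test features is an assumption of this sketch, and the function names are hypothetical.

```python
import numpy as np


def fit_standardizer(train_features):
    """Column-wise mean and standard deviation of the training features."""
    train_features = np.asarray(train_features, dtype=float)
    mean = train_features.mean(axis=0)
    std = train_features.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant features
    return mean, std


def standardize(features, mean, std):
    """Subtract the mean and divide by the standard deviation."""
    return (np.asarray(features, dtype=float) - mean) / std


train = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])
mean, std = fit_standardizer(train)
train_normalized = standardize(train, mean, std)
print(train_normalized.mean(axis=0), train_normalized.std(axis=0))
```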

Model Training and Evaluation
Once the feature vector was split into a training and test set, the deep learning model was trained over the training set for recognizing the human activities. After the extraction of the different features, the Deep Neural Networks (DNN) method was implemented with the DeepLearning4j framework [29]. During the hyperparameter tuning with a grid search approach [31], we considered the following values for each of the parameters: learning rate ($10^{0}$, $10^{-1}$, $10^{-2}$, $10^{-3}$, $10^{-4}$, $10^{-5}$, $10^{-6}$, $10^{-7}$) with an adaptive learning rate approach [32], number of hidden layers (1-4), regularization (L1, L2), and normalization (min-max normalization, and mean and standard deviation). The following parameters were selected with the grid search and were configured for the final DNN model: •  [30].
Finally, the evaluation of the performance of the trained model was performed, and it was tested over the unseen data, i.e., the test set. Note that the experiments were repeated five times with different seeds causing different training and test splits, as well as different initializations of the DNN network. During the experiments, the hyperparameters were fixed to the above values. Based upon the testing results, the confusion matrix was constructed, which is further used to evaluate the performance of the trained model with respect to different performance metrics. The results and metrics are discussed in Section 3, and they represent the averages of the five repetitions.
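The metrics derived from a confusion matrix can be sketched as follows. The averaging scheme is an assumption (rows are true classes, precision and recall are macro-averaged), and the function names are hypothetical. The F1 score is taken as the harmonic mean of the averaged precision and recall, which matches the figures reported in Section 3: a precision of 19.65% and a recall of 22.9% give 21.15%.

```python
import numpy as np


def harmonic_f1(precision, recall):
    """F1 score as the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def metrics_from_confusion(confusion):
    """Accuracy, macro precision, macro recall, and F1 (rows = true class)."""
    cm = np.asarray(confusion, dtype=float)
    true_pos = np.diag(cm)
    predicted = cm.sum(axis=0)
    actual = cm.sum(axis=1)
    accuracy = true_pos.sum() / cm.sum()
    # Classes that are never predicted (or never occur) contribute zero.
    per_class_precision = np.divide(true_pos, predicted,
                                    out=np.zeros_like(true_pos), where=predicted > 0)
    per_class_recall = np.divide(true_pos, actual,
                                 out=np.zeros_like(true_pos), where=actual > 0)
    precision = per_class_precision.mean()
    recall = per_class_recall.mean()
    return accuracy, precision, recall, harmonic_f1(precision, recall)


accuracy, precision, recall, f1 = metrics_from_confusion([[8, 2], [1, 9]])
print(round(accuracy, 4), round(recall, 4))
print(round(harmonic_f1(19.65, 22.9), 2))
```

With balanced classes, the macro-averaged recall equals the accuracy, which is consistent with the matching accuracy and recall values reported for most experiments.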

Performance Comparison
Since a data imputation technique was applied to extrapolate the missing samples, next, the evaluation of the effectiveness of the proposed data imputation technique was performed. For this purpose, the dataset was trained and tested with a deep learning algorithm first without data imputation. Afterwards, the dataset was trained and tested with the same deep learning model over the imputed dataset. Finally, the comparison of the performance of both the traditional approach and the proposed imputation approach was performed. The results of this experiment are discussed in Section 3.

Data Imputation
This stage started with the identification of the number of missing samples. A sample rate of 100 Hz was considered for the accelerometer and gyroscope sensors, which corresponds to 500 samples per activity, and a sample rate of 10 Hz was considered for the magnetometer sensor, which corresponds to 50 samples per activity. As presented in Table 1, many samples related to the accelerometer sensor are missing. Table 1 shows the number of complete and missing values for the accelerometer; the same analysis was performed for the other sensors. The largest number of missing samples was found during walking upstairs. Next, the data segmentation for the further implementation of data imputation techniques was performed. Table 2 shows an excerpt of accelerometer data during the walking activity, where it is possible to observe that 50 samples of data are missing. Next, the frequency of 100 Hz was considered, and the values of the next 50 samples were measured. Table 3 shows the start of the process, in which the missing values in the missing rows are filled as NULL. After all the missing values were filled as NULL, the data segmentation process was performed, as shown in Table 4. Finally, the KNN imputation method was implemented to extrapolate the NULL values, as shown in Table 5. This technique is implemented for the files that have fewer than 100 missing samples in the case of the accelerometer and gyroscope and fewer than 10 missing samples in the case of the magnetometer. Captures with more than 100 records missing from the accelerometer or gyroscope, and captures with more than 10 records missing from the magnetometer, were discarded. The pattern of the imputed data is similar to the other values in each capture, as explained in Section 2.
Next, Figure 3 shows the representation of the different axes of one capture during walking downstairs, where only 375 records are available. The number of records should be normalized to obtain reliable results in the data classification, i.e., all experiments must have 500 records. Thus, 125 samples are missing. The KNN imputation method was implemented, and the result is presented in Figure 4. The imputed signal has the same pattern as the original, but its amplitude is higher.

Non-Normalized Data
Considering the accelerometer data, the results obtained with non-imputed and non-normalized data are reported in the confusion matrix presented in Table 6. The implemented method reported an accuracy of 22.9%, a precision of 19.65%, a recall value of 22.9%, and an F1 score of 21.15%.
Table 6. Confusion matrix related to non-normalized and non-imputed data from the accelerometer sensor.
Considering the accelerometer and magnetometer sensors' data, the results obtained with non-imputed and non-normalized data are reported in the confusion matrix presented in Table 7. The implemented method reported an accuracy of 40.69%, a precision of 56.4%, a recall value of 40.69%, and an F1 score of 47.27%.
Considering the accelerometer, magnetometer, and gyroscope sensors' data, the results obtained with non-imputed and non-normalized data are reported in the confusion matrix presented in Table 8. The implemented method reported an accuracy of 74.46%, a precision of 78.24%, a recall value of 74.46%, and an F1 score of 76.3%.
Considering the accelerometer data, the results obtained with imputed and non-normalized data are reported in the confusion matrix presented in Table 9. The implemented method reported an accuracy of 20%, a precision of 20%, a recall value of 20%, and an F1 score of 20%.
Table 9. Confusion matrix related to non-normalized and imputed data from the accelerometer sensor.
Considering the accelerometer and magnetometer sensors' data, the results obtained with imputed and non-normalized data are reported in the confusion matrix presented in Table 10. The implemented method reported an accuracy of 20.1%, a precision of 73.34%, a recall value of 20.1%, and an F1 score of 31.55%.
Table 10. Confusion matrix related to non-normalized and imputed data from the accelerometer and magnetometer sensors.
Considering the accelerometer, magnetometer, and gyroscope sensors' data, the results obtained with imputed and non-normalized data are reported in the confusion matrix presented in Table 11. The implemented method reported an accuracy of 20.19%, a precision of 60.02%, a recall value of 20.19%, and an F1 score of 30.22%.
Table 11. Confusion matrix related to non-normalized and imputed data from the accelerometer, magnetometer, and gyroscope sensors.

Normalized Data
Considering the accelerometer data, the results obtained with non-imputed and normalized data are reported in the confusion matrix presented in Table 12. The implemented method reported an accuracy of 85.89%, a precision of 86.21%, a recall value of 85.89%, and an F1 score of 86.05%.
Considering the accelerometer and magnetometer sensors' data, the results obtained with non-imputed and normalized data are reported in the confusion matrix presented in Table 13. The implemented method reported an accuracy of 86.49%, a precision of 86.75%, a recall value of 86.49%, and an F1 score of 86.62%.
Considering the accelerometer, magnetometer, and gyroscope sensors' data, the results obtained with non-imputed and normalized data are reported in the confusion matrix presented in Table 14. The implemented method reported an accuracy of 89.51%, a precision of 89.74%, a recall value of 89.51%, and an F1 score of 89.62%.
Considering the accelerometer data, the results obtained with imputed and normalized data are reported in the confusion matrix presented in Table 15. The implemented method reported an accuracy of 94.56%, a precision of 94.63%, a recall value of 94.56%, and an F1 score of 94.59%.
Considering the accelerometer and magnetometer sensors' data, the results obtained with imputed and normalized data are reported in the confusion matrix presented in Table 16. The implemented method reported an accuracy of 98.24%, a precision of 98.28%, a recall value of 98.24%, and an F1 score of 98.26%.
Considering the accelerometer, magnetometer, and gyroscope sensors' data, the results obtained with imputed and normalized data are reported in the confusion matrix presented in Table 17. The implemented method reported an accuracy of 99.82%, a precision of 99.82%, a recall value of 99.82%, and an F1 score of 99.82%.
Table 17. Confusion matrix related to normalized and imputed data from the accelerometer, magnetometer, and gyroscope sensors.
Table 18 summarizes the results of all previously discussed experiments. In total, 12 experiments were used to analyze the effect of data imputation and data normalization across the different combinations of sensors. These results are presented in Figures 5-7, grouped by sensor combination, i.e., accelerometer (Ac) only, accelerometer and magnetometer (Ac + Mg), and accelerometer, magnetometer, and gyroscope (Ac + Mg + Gy).

Discussion
In Figure 5, only accelerometer sensor data are used to perform the experiments with respect to four scenarios of data normalization and data imputation combinations. Each scenario is evaluated across four performance metrics, i.e., accuracy, precision, recall, and F-measure. It can be observed that the deep learning model performance across all metrics is highest with the application of normalization and imputation on the given dataset.
Similarly, Figure 6 shows the results when using the accelerometer and magnetometer (Ac + Mg) sensor values to test the trained deep learning model with respect to four scenarios of data normalization and data imputation combinations. Each scenario is evaluated across four performance metrics, i.e., accuracy, precision, recall, and F-measure. It can be noticed that the deep learning model performance across all metrics is highest with the application of normalization and imputation on the given dataset.
Likewise, Figure 7 displays the results when using the data of all three sensors, i.e., accelerometer, magnetometer, and gyroscope, to test the trained deep learning model with respect to four scenarios of data normalization and data imputation combinations. Each scenario is evaluated across four performance metrics, i.e., accuracy, precision, recall, and F-measure. It can be observed that the deep learning model performance across all metrics is highest with the application of normalization and imputation on the given dataset.
Figure 7. Performance of deep learning model when utilizing accelerometer, magnetometer, and gyroscope data to classify the human daily living activities across four data normalization and data imputation scenarios.
As a result of this study, it was verified that the pattern of the imputed data is similar to that of the original data. However, its frequency and amplitude are higher than in the original data. Regarding the data classification, depending on the number of sensors, the accuracy was between 22.9% and 74.46% for non-normalized data, and between 85.89% and 89.51% for normalized data. After the data imputation process, depending on the number of sensors, the accuracy changed to between 20.00% and 20.19% for non-normalized data, and between 94.56% and 99.82% for normalized data.
In summary, all the experimental results show that the deep learning model better distinguishes daily living activities when both data normalization and data imputation techniques are applied. Moreover, the deep learning model gives the best results when imputed and normalized data from the combination of all three sensors are used, i.e., the accelerometer, magnetometer, and gyroscope. Furthermore, with all three sensors, the normalized and imputed dataset showed an improvement of 25.36 percentage points in accuracy, 21.58 in precision, 25.36 in recall, and 23.52 in F-measure compared with the non-normalized and non-imputed dataset. Therefore, from the above experimental results, it is verified that the performance of the deep learning model significantly increased when normalization and imputation techniques were applied to the dataset across all three sensors.
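The improvement figures quoted above are differences between the reported percentages for the two three-sensor configurations. The short check below recomputes them from the values given in Section 3; the dictionary names are illustrative.

```python
# Reported metrics (%) for accelerometer + magnetometer + gyroscope.
non_normalized_non_imputed = {"accuracy": 74.46, "precision": 78.24,
                              "recall": 74.46, "f_measure": 76.30}
normalized_imputed = {"accuracy": 99.82, "precision": 99.82,
                      "recall": 99.82, "f_measure": 99.82}

# Difference in percentage points for each metric.
improvement = {
    metric: round(normalized_imputed[metric] - non_normalized_non_imputed[metric], 2)
    for metric in non_normalized_non_imputed
}
print(improvement)
```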
As we are using a proprietary dataset, the results are not comparable with others. However, several limitations were found that are related to the acquisition and positioning of the mobile device, the power processing of the methods implemented, and other involuntary limitations of the study [6,24].
The results obtained are affected by the reduced sample size. Initially, only data normalization was performed, and the maximum accuracy was around 89.51% [34,35] with the recognition of the same activities and the use of the same sensors as in this study. The implementation of imputation techniques increased the results, with a maximum accuracy of 100%. Thus, we can conclude that the data imputation techniques improved the results.

Conclusions
Missing samples in a dataset affect the performance of deep learning models. Therefore, in this paper, a methodology was proposed to extrapolate the missing samples of human activity recognition dataset captures so that deep learning models better classify human daily living activities. The proposed methodology uses the K-Nearest Neighbors (KNN) imputation technique to extrapolate the missing samples in dataset captures. In total, 12 experiments were performed to analyze the effect of data imputation and data normalization, along with different combinations of sensors.
With all three sensors, the proposed methodology, i.e., the normalized and imputed dataset, reported an improvement of 25.36 percentage points in accuracy, 21.58 in precision, 25.36 in recall, and 23.52 in F-measure compared with the non-normalized and non-imputed dataset. The experimental results revealed that the performance of the implemented model increased with the implementation of the data imputation method.
Author Contributions: Conceptualization, methodology, software, validation, formal analysis, investigation, writing-original draft preparation, writing-review and editing, I.M.P., F.H., N.M.G. and E.Z. All authors have read and agreed to the published version of the manuscript.