Activities of Daily Living and Environment Recognition Using Mobile Devices: A Comparative Study

The accurate recognition of Activities of Daily Living (ADL) using the sensors available in off-the-shelf mobile devices is significant for the development of a recognition framework. Previously, a framework comprising data acquisition, data processing, data cleaning, feature extraction, data fusion, and data classification was proposed. However, its results may be improved by the implementation of other methods. In line with the initial proposal of the framework, this paper addresses the recognition of eight ADL, i.e., walking, running, standing, going upstairs, going downstairs, driving, sleeping, and watching television, and nine environments, i.e., bar, hall, kitchen, library, street, bedroom, living room, gym, and classroom, but additionally using the Instance-Based k-nearest neighbour (IBk) and AdaBoost methods. The primary purpose of this paper is to find the best machine learning method for ADL and environment recognition. The results obtained show that IBk and AdaBoost reported better results with complex data than the deep neural network methods.


Introduction
The use of mobile devices while doing daily activities is increasing [1]. These devices have different types of sensors that allow the acquisition of several types of data related to the user, including the accelerometer, magnetometer, gyroscope, Global Positioning System (GPS) receiver, and microphone [2,3]. These sensors allow the creation of intelligent systems that improve the quality of life. The monitoring of older adults or people with chronic diseases is one of the critical purposes. Furthermore, such systems can support sports activities and stimulate the practice of physical activity in teenagers [4]. The development of these systems is included in the research of Ambient Assisted Living (AAL) systems and Enhanced Living Environments (ELE) [5][6][7][8][9][10].

Study Design
This study used the same structure and data acquired by the research presented in [18,21,22,24,25] to implement a comparative study of several classification methods. The tests were conducted with the dataset available in [24], which included data related to the eight ADL and nine environments. The information was acquired from the accelerometer, magnetometer, gyroscope, microphone, and GPS receiver available in the mobile device.
As presented in [21], an Android application was used for the acquisition of the data from the different sensors. This mobile application is responsible for data acquisition and processing using built-in smartphone sensors, i.e., the accelerometer, magnetometer, gyroscope, microphone, and GPS receiver. The software acquired five seconds of data every five minutes. It was installed on a smartphone placed in the front pocket of the trousers of 25 subjects with different lifestyles, aged between 16 and 60 years old. For ADL and environment identification, a minimum of 2000 samples with five seconds of data acquired from the different sensors was available in the dataset used for this research. Different environments were used in the performed tests and were strictly related to specific activities. The volunteers had to select in the mobile application the ADL that would be performed before the start of the test. By default, the mobile application did not save any data without user input. However, the proposed method had limitations related to battery consumption and the processing power needed to perform the tests. Currently, the majority of the smartphones available on the market incorporate high-performance processing units, so the main remaining constraint is power consumption. However, as most people recharge their mobile phones daily, the proposed method can be used in real-life scenarios.

Overview of the Framework for the Recognition of the Activities of Daily Living and Environments
Based on the previously proposed framework [20], Figure 1 shows a framework composed of four modules: data acquisition, data processing, data fusion, and data classification. The data processing module consisted of several phases, including data cleaning and feature extraction. The data classification was divided into three stages: the recognition of simple ADL (Stage 1), the identification of environments (Stage 2), and the recognition of activities without motion (Stage 3). Stage 1 used the data acquired from the accelerometer, magnetometer, and gyroscope sensors. The data received from the microphone were processed in Stage 2. Finally, Stage 3 increased the number of sensors, combining the data acquired from the accelerometer, magnetometer, and gyroscope with the data obtained from the GPS receiver and the environment previously recognised. Mobile devices incorporate several sensors capable of acquiring different types of data, and the proposed framework acquired and analysed five seconds of data to identify the current ADL executed and the current environment frequented. The next module processed the data acquired from the sensors for the further fusion of the different data. The final module classified the data: it processed all features extracted from the sensors available in the mobile device and identified whether the ADL executed was in the set of proposed ADL. In the affirmative case, the ADL performed was presented to the user. Next, the environment frequented was recognised and presented to the user. If no ADL was recognised, or the ADL recognised was standing, the identification of an activity without motion was executed, trying to discover the activity performed by the user.
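The staged decision flow described above can be sketched as follows. This is a minimal illustration in Python; the function and key names are hypothetical, and the authors' implementation was in Java.

```python
def recognise(sample, stage1, stage2, stage3):
    """Three-stage classification flow of the framework (illustrative)."""
    # Stage 1: simple ADL from accelerometer, magnetometer, gyroscope features.
    adl = stage1(sample["motion_features"])
    # Stage 2: environment from microphone features.
    environment = stage2(sample["audio_features"])
    # Stage 3: if nothing (or only "standing") was recognised, retry with the
    # GPS-derived distance and the environment recognised in Stage 2.
    if adl is None or adl == "standing":
        adl = stage3(sample["motion_features"],
                     sample["gps_distance"], environment)
    return adl, environment
```

With stub classifiers plugged in for the three stages, a standing sample in a recognised living room would be refined in Stage 3 into an activity without motion.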

Data Acquisition
This study was based on the same dataset used in [21], which is publicly available in [31]. This dataset was composed of small sets of data (five seconds every five minutes) captured by the sensors available in off-the-shelf mobile phones, i.e., the accelerometer, magnetometer, gyroscope, microphone, and GPS receiver, and stored in the cloud. The dataset used in the presented study was created using an Android mobile application for data collection. On the one hand, the running and walking data were collected in outdoor environments. On the other hand, standing and going downstairs and upstairs were performed inside buildings.
Moreover, the tests were conducted at different times of the day. In total, thirty-six hours of data were collected, which corresponded to 2000 samples with five seconds of raw sensor data each. Before data acquisition, the user had to use the smartphone to select the ADL that would be conducted and the time needed.

Data Cleaning
Data cleaning is a step performed during data processing. It is mainly used to minimise the effects of the environmental noise captured during data acquisition from the sensors. Data cleaning methods depend on the type of data acquired and the sensors used. On the one hand, a low-pass filter was applied to the data obtained from the accelerometer, magnetometer, and gyroscope sensors [37]. On the other hand, the Fast Fourier Transform (FFT) [38] was used to extract the relevant information from the data collected from the microphone. No cleaning methods were needed for the data received from the other types of sensors.
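As an illustration of the cleaning step for the motion and magnetic sensors, a single-pole low-pass filter over one sensor window might look like the sketch below. The paper does not specify the exact filter or its coefficients, so `alpha` is an assumed smoothing factor.

```python
def low_pass(signal, alpha=0.8):
    """Exponential smoothing as a simple low-pass filter (illustrative).

    An alpha close to 1 keeps more of the previous output, attenuating
    high-frequency noise in accelerometer/magnetometer/gyroscope windows.
    """
    out = []
    prev = signal[0]
    for x in signal:
        prev = alpha * prev + (1 - alpha) * x
        out.append(prev)
    return out
```

For the microphone data, the FFT would instead be applied to the raw audio window to obtain its frequency content.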

Feature Extraction
After the cleaning of the data, we extracted the features. Table 1 presents the features extracted from the selected sensors, which were mainly statistical features. Stage 1 mainly used statistical features, i.e., the standard deviation, mean, maximum, minimum, variance, and median of the raw data and of the peaks of the motion and magnetic sensors, and also included the calculation of the five greatest distances between the detected peaks. Stage 2 was composed of the features extracted from the microphone, including the same statistical features of the raw data and the calculation of 25 Mel-frequency cepstral coefficients. Finally, Stage 3 also included the distance travelled, calculated from the Global Positioning System (GPS) receiver data, and the environment recognised in Stage 2. Data fusion and classification were included in the last stage of the ADL and environment recognition framework. The previous studies reported that the best accuracies were achieved with the DNN method [18,21,22,24,25], and all the features are presented in Table 1. This study presents the results of the test and validation of different methods, including IBk, AdaBoost with the decision stump, and AdaBoost with the decision tree, implemented in the Java programming language for compatibility with Android-based devices. The configurations used differed between the implemented methods. Firstly, the DNN method was implemented with the sigmoid activation function, which is widely used in neural networks [39]. Several learning rates were previously studied, and it was verified that better results were obtained with a value equal to 0.1. For this method, the maximum number of training iterations was established as 4 × 10⁶.
The method was implemented without distance weighting, with three hidden layers, a seed value of six, and backpropagation. The Xavier function [40] was used as the initialization function, implementing L2 regularization [41]. Secondly, the IBk method was implemented with a batch size of 100, a k value of 1, and the linear nearest neighbour search algorithm [42]. Finally, the last two methods differed mainly in the weak classifier combined with AdaBoost: the decision stump classifier [43] for the first, and the decision tree classifier [44] for the second. The combination of AdaBoost with the decision stump classifier was implemented with a maximum of 10 training iterations, a seed value of 1, a batch size of 100, a weight threshold of 100, and without resampling. In turn, the combination of AdaBoost with the decision tree classifier was implemented with a seed value of 2, a batch size of 10, a maximum of 4 nodes, and 200 trees.
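The statistical features listed in Table 1 can be computed in a few lines. The sketch below is an illustration in Python (the authors' implementation was in Java) covering the Stage 1 window features and the greatest distances between detected peaks.

```python
import statistics

def statistical_features(window):
    """Standard statistical features over one five-second sensor window."""
    return {
        "mean": statistics.mean(window),
        "std": statistics.stdev(window),
        "max": max(window),
        "min": min(window),
        "variance": statistics.variance(window),
        "median": statistics.median(window),
    }

def greatest_peak_distances(peak_indices, n=5):
    """The n greatest distances between consecutive detected peaks."""
    distances = [b - a for a, b in zip(peak_indices, peak_indices[1:])]
    return sorted(distances, reverse=True)[:n]
```

The same statistical features are reused for the microphone data in Stage 2, with the Mel-frequency cepstral coefficients added on top.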
Initially, we started with the identification of simple ADL, i.e., walking, running, standing, going upstairs, and going downstairs, which was performed with the data acquired from the accelerometer, magnetometer, and gyroscope sensors. Secondly, the recognition of environments, i.e., bar, classroom, gym, library, street, hall, living room, kitchen, and bedroom, was performed with the data retrieved from the microphone. Finally, the recognition of activities without motion, i.e., driving, sleeping, and watching television, was performed with the data collected by the accelerometer, magnetometer, gyroscope, and GPS receiver with the inclusion of the environment recognised. Thus, the framework provided the recognition of eight ADL and nine environments.
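A minimal version of the IBk configuration described above (k = 1, linear nearest-neighbour search, no distance weighting) can be sketched as follows. This is an illustration only, not the Weka implementation used to produce the reported results.

```python
def ibk_predict(train_X, train_y, x):
    """1-nearest-neighbour prediction with brute-force (linear) search."""
    def sq_dist(a, b):
        # Squared Euclidean distance over the feature vectors.
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    # Linear scan over all training samples, as in the linear
    # nearest-neighbour search algorithm; keep the closest one.
    best = min(range(len(train_X)), key=lambda i: sq_dist(train_X[i], x))
    return train_y[best]
```

With k = 1 the predicted label is simply that of the single closest training window in feature space.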
For the implementation of the methods, the following technologies and frameworks were used: the Java programming language, the Weka software, the SMILE framework, and the Deeplearning4j framework.

Recognition of Simple ADL
The results of simple ADL recognition with the IBk method presented around 80% accuracy using the different combinations of motion and magnetic sensors, as presented in Table 2. AdaBoost is a binary classifier that uses a weak classifier to improve the recognition of different events; the algorithm was therefore implemented once per ADL, in a one-vs-rest fashion. The results of simple ADL identification with AdaBoost with the decision stump method implemented with the Weka software are presented in Table 3, verifying that all of the ADL were recognised with an accuracy between 25.61% (going downstairs, recognised with the accelerometer and magnetometer sensors) and 98.44% (standing, recognised with the accelerometer, magnetometer, and gyroscope sensors). In addition, Table 4 clarifies the values obtained in Table 3, presenting the True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) values. As this recognition was performed as binary recognition, i.e., each class was compared against all records, we verified that the TP and TN values were higher than the others, proving the reliability of the method. Moreover, the results on the recognition of simple ADL with AdaBoost with the decision tree method implemented with the SMILE framework are presented in Table 5, verifying that all of the ADL presented an accuracy between 83.79% and 99.55% using the different combinations of motion and magnetic sensors.
Additionally, Table 6 clarifies the values obtained in Table 5, presenting the TP, TN, FP, and FN values. As this recognition was performed as binary recognition, i.e., each class was compared against all records, we verified that the sum of the TP and TN values was 2000, which equals the number of samples per activity; however, the method reported a high number of FP.
Finally, the results previously obtained for the recognition of simple ADL with the DNN method, implemented with the Deeplearning4j framework, are presented in Table 7, verifying that all of the ADL showed an accuracy between 66.70% and 99.35% using the different combinations of motion and magnetic sensors.
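The one-vs-rest boosting scheme evaluated in this section can be sketched as a minimal AdaBoost with decision stumps. This is illustrative Python under assumed defaults; the reported results came from the Weka and SMILE implementations.

```python
import math

def best_stump(X, y, w):
    """Decision stump (one feature, one threshold) minimising weighted error."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted(set(row[f] for row in X)):
            for pol in (1, -1):
                preds = [pol if row[f] >= thr else -pol for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    return best

def adaboost_train(X, y, rounds=10):
    """AdaBoost for binary labels y in {-1, +1}; returns weighted stumps."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, f, thr, pol = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log of 0
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, thr, pol))
        # Increase the weight of misclassified samples for the next round.
        preds = [pol if row[f] >= thr else -pol for row in X]
        w = [wi * math.exp(-alpha * p * yi) for wi, p, yi in zip(w, preds, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, x):
    """Weighted vote of the stumps; +1 means the target ADL is recognised."""
    score = sum(a * (p if x[f] >= t else -p) for a, f, t, p in ensemble)
    return 1 if score >= 0 else -1
```

One such binary ensemble is trained per ADL (the target class against all records), which matches the binary recognition described for Tables 3 to 6.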

Recognition of Environments
The use of the IBk method for the recognition of environments using the microphone data reported an average accuracy of 41.43%, as presented in Table 8. The remaining results presented in Table 9 showed that the AdaBoost with the decision stump method implemented with Weka software had an accuracy between 10.36% and 91.78%. Next, the AdaBoost with the decision tree implemented with the SMILE framework reported an accuracy between 88.74% and 99.08%. Finally, the DNN method implemented with the Deeplearning4j framework presented an accuracy between 19.90% and 98.00%.
In addition, Table 10 clarifies the values obtained in Table 9, presenting the TP, TN, FP, and FN values. As this recognition was performed as binary recognition, i.e., each class was compared against all records, we verified that the TP values were higher in the recognition of bar, library, hall, and street, while in the remaining classes, the TN values were correctly recognised. Furthermore, Table 11 presents the clarification of the values obtained in Table 5, with the same pattern: the TP values were higher in the recognition of bar, library, hall, and street, and the TN values were also correctly recognised in the remaining classes.

Recognition of Activities without Motion
Initially, Table 12 presents the results on the recognition of activities without motion with the IBk method, reporting an accuracy between 99.27% and 100% using the data acquired from the accelerometer, magnetometer, gyroscope, and GPS receiver, and the environment previously identified. Furthermore, the results of the recognition of activities without motion with AdaBoost with the decision stump method implemented with the Weka software are presented in Tables 13 and 14, verifying that the events were recognised with an accuracy between 98.32% and 100% using the same data. Tables 15 and 16 clarify the values obtained in Tables 13 and 14, presenting the True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) values. As this recognition was performed as binary recognition, i.e., each class was compared against all records, we verified that the TP and TN values were higher than the others, proving the reliability of the method. Additionally, the results on the recognition of activities without motion with AdaBoost with the decision tree implemented with the SMILE framework are presented in Tables 17 and 18, verifying that the events were recognised with an accuracy between 98.50% and 100%. Tables 19 and 20 clarify the values obtained in Tables 17 and 18; again, the TP and TN values were higher than the others, proving the reliability of the method.
Finally, the results of the recognition of activities without motion using the DNN method implemented with the Deeplearning4j framework are presented in Tables 21 and 22, verifying that the events were recognised with an accuracy between 79.55% and 98.50%. Based on the results reported, Table 23 presents the average of the results obtained with the different algorithms implemented. As shown, the best results were achieved with the IBk method (99.68%) and AdaBoost with the decision tree as the weak classifier (94.05%).
The training stage was faster with IBk and AdaBoost with the decision tree than with the previously implemented DNN method. These methods were also less complicated to implement than the DNN method and were more efficient. Given the limitations of mobile devices, these methods should be implemented in the ADL and environment recognition framework to improve the results provided to the user. The results showed that the recognition of ADL and environments is possible with the implementation of the AdaBoost, IBk, and DNN methods. This creates opportunities for a personal digital life coach that monitors different lifestyles, which is relevant for the general population because mobile devices are widely used and offer possibilities to improve the quality of life.

Discussion and Conclusions
The implementations of DNN, IBk, AdaBoost with the decision stump, and AdaBoost with the decision tree were successfully performed with the previously acquired dataset, which was based on the data received from the accelerometer, magnetometer, gyroscope, GPS receiver, and microphone. The framework was composed of data acquisition, data processing, data cleaning, feature extraction, data fusion, and data classification to recognise eight ADL and nine environments.
In general, the overall accuracies of the methods depended on the number of sensors and resources available during data acquisition. The framework should therefore adapt to the number of sensors available in the mobile device. The methods with an accuracy higher than 90% were the IBk method and AdaBoost with the decision tree as the weak classifier.
The AdaBoost and IBk methods reported the best results because these methods were less susceptible to overfitting than the DNN method. Notably, one of the reasons for this was AdaBoost's use of a weak classifier, which improved the discrimination of some classes.
According to the previously proposed structure of a framework for the recognition of ADL and environments [2,[17][18][19][20][21][22][23][24][25], the main focus of this study was the data classification module, taking into account the implementations of the other modules performed in previous studies. Previously, the DNN method was implemented, and it reported reliable results. Still, for the recognition of environments with acoustic data, the results obtained were below expectations, and the method demanded substantial processing resources. For the validation of the different implemented methods, we performed cross-validation with 10 folds.
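The validation procedure can be sketched as follows; this is an illustrative 10-fold splitter in Python, where the seed and shuffling strategy are assumptions rather than details from the paper.

```python
import random

def k_fold_indices(n_samples, k=10, seed=1):
    """Shuffled train/test index splits for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for fold in folds:
        test = set(fold)
        train = [i for i in idx if i not in test]
        splits.append((train, sorted(test)))
    return splits
```

Each of the 10 splits holds out one fold for testing and trains on the other nine, and the reported accuracy is the average over the folds.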
Following the tests of the different methods for the recognition of simple ADL, the best results were achieved with AdaBoost with the decision tree implemented with the SMILE framework, reporting an overall accuracy of 91.33% with all combinations of sensors. Still, there was a high number of FP.
In the case of the recognition of environments, the best method was also AdaBoost with the decision tree implemented with the SMILE framework, reporting an overall accuracy of 99.87%. Still, it did not correctly recognise two environments. In contrast, AdaBoost with the decision stump method implemented with the Weka software did not correctly recognise five environments, reporting an overall accuracy of 32.04%. Finally, in the recognition of activities without motion, the results obtained with AdaBoost with the decision tree implemented with the SMILE framework were the same as the results obtained with the DNN method (99.87%).
As future work, the methods should be implemented during the development of the framework for the identification of ADL and environments, adapting the approach to all the sensors available on mobile devices.