Identification of Daily Activities and Environments Based on the AdaBoost Method Using Mobile Device Data: A Systematic Review

Abstract: Using the AdaBoost method may increase the accuracy and reliability of a framework for daily activities and environment recognition. Mobile devices have several types of sensors, including motion, magnetic, and location sensors, that allow accurate identification of daily activities and environments. This paper reviews the studies that use the AdaBoost method with the sensors available in mobile devices. The search covered research works written in English, published between 2012 and 2018, on the recognition of daily activities and environments using the AdaBoost method with data obtained from the sensors available in mobile devices. In total, 13 studies were selected and analysed from 151 records identified in the searched databases. The results support the reliability of the method for daily activities and environment recognition, highlighting the use of several features, including the mean, standard deviation, pitch, roll, azimuth, and median absolute deviation of the signal of motion sensors, and the mean of the signal of magnetic sensors. When reported, the analysed studies presented an accuracy higher than 80% in the recognition of daily activities and environments with the AdaBoost method.


Introduction
AdaBoost, developed by Yoav Freund and Robert Schapire, is one of the first boosting algorithms to be adapted for practical use in many problem-solving tasks. It is a method that uses ensemble learning techniques to combine multiple weak classifiers into a single strong classifier. It is combined with other artificial intelligence methods to increase the accuracy of the recognition [1]. Thus, weak learners, including decision trees and decision boosting, are commonly used with the AdaBoost method. In comparison with other machine learning methods, the AdaBoost method is less susceptible to overfitting.
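To make the idea of combining weak classifiers concrete, the following is a minimal sketch of discrete AdaBoost for binary labels in {-1, +1}, using one-feature threshold rules (decision stumps) as the weak learner. The function names and the choice of stumps are assumptions made for illustration; none of the reviewed studies is claimed to use this exact code.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                preds = [polarity if row[j] <= thr else -polarity for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best

def stump_predict(stump, row):
    """Predict -1 or +1 for one sample with a single stump."""
    _, j, thr, polarity = stump
    return polarity if row[j] <= thr else -polarity

def adaboost_fit(X, y, n_rounds=10):
    """Train an ensemble of weighted stumps; y must contain -1 / +1 labels."""
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)   # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)      # vote of this weak learner
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified samples, decrease the others.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

On a toy one-feature set such as X = [[0], [1], [2], [3], [4], [5]] with labels [1, 1, -1, -1, 1, 1], no single stump separates the classes, while three boosting rounds classify every training sample correctly. This is the sense in which several weak classifiers form a stronger one.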
One of the strategies adopted by the different implementations of AdaBoost consists of combining it with other methods to reduce the errors obtained [2,3]. The primary purpose of ensemble learning techniques is to improve the results by combining the results of different methods [2,3]. These techniques combine several machine learning techniques into a single model with a single purpose, namely to improve the prediction results [4][5][6]. They can be divided into two groups, sequential ensemble methods and parallel ensemble methods. Our focus is on the sequential ensemble methods, because the implementation of AdaBoost consists of applying a base learner that is generated sequentially [7].
In recent years, several studies have focused on the recognition of daily activities using the sensors available in commonly used mobile devices. These studies conclude that it is possible to accurately detect daily activities and environments with the motion, magnetic, location, and acoustic sensors embedded in mobile devices, and the literature reports reliable results with different machine learning methods [8][9][10][11][12][13][14][15][16][17][18][19][20][21][22][23].
Generally, the raw readings of one-dimensional signals (e.g., blood pressure sensor, thermometer, etc.) or multi-dimensional signals (e.g., accelerometer or gyroscope) can be processed directly by AdaBoost, and by classification and regression algorithms in general. To do that, every sensor reading in a specific time window is treated as a separate input. For example, if a thermometer reads data at a frequency of 1 Hz and the window is 60 s, there will be 60 inputs to AdaBoost. Similarly, a three-dimensional gyroscope would present 180 inputs. Many deep learning methods accept the input data in this format. Even so, many algorithms benefit from a feature engineering step [43], which significantly improves the accuracy or reduces the complexity of the models [23,44].
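The window-to-input arithmetic in this paragraph can be sketched as follows. The helper name `window_to_inputs` is hypothetical, and it assumes that samples arrive as a list of scalars (one-dimensional sensor) or of equal-length tuples (multi-dimensional sensor).

```python
def window_to_inputs(samples, rate_hz, window_s):
    """Flatten the first window of raw readings into one input vector.

    samples  : list of scalars or of tuples such as (x, y, z)
    rate_hz  : sampling frequency in Hz
    window_s : window length in seconds
    """
    n = int(rate_hz * window_s)            # number of readings in the window
    if len(samples) < n:
        raise ValueError("not enough samples to fill one window")
    inputs = []
    for reading in samples[:n]:
        if isinstance(reading, (list, tuple)):
            inputs.extend(reading)         # one input per axis
        else:
            inputs.append(reading)         # scalar reading
    return inputs

# A 1 Hz thermometer over 60 s gives 60 inputs; a 3-axis gyroscope gives 180.
thermometer = [36.5] * 60
gyroscope = [(0.0, 0.1, -0.1)] * 60
n_thermo = len(window_to_inputs(thermometer, rate_hz=1, window_s=60))
n_gyro = len(window_to_inputs(gyroscope, rate_hz=1, window_s=60))
```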
Previous studies [33][34][35][36][37][38][39][40][41][42] have shown that the proposed framework includes the correct modules for the reliable recognition of daily activities and environments. However, the results can be improved with other methods, including ensemble learning methods. This paper reviews the studies available in the literature on the implementation of the AdaBoost method for daily activities recognition. This review is part of the research and development of a framework for the identification of daily activities and environments using the sensors available in mobile devices, in which the AdaBoost method may increase the accuracy compared to other implementations. The motivation of this paper is to improve the recognition accuracy reported in previous studies. This review explores the use of the AdaBoost method to verify whether it reports better results than multilayer perceptron (MLP), feedforward neural network (FNN), and deep neural network (DNN) methods for the identification of daily activities.
The main contribution of this review is to provide a basis of study for readers who work on the recognition of daily activities and environments using the sensors available in mobile devices, through an in-depth survey of several research projects that implement the AdaBoost method.
This review shows that the features that reported better results are the mean, standard deviation, pitch, roll, azimuth, and median absolute deviation of the signal of motion sensors, and the mean of the signal of magnetic sensors. According to the results, the AdaBoost method provides high accuracy for the recognition of daily activities and environments.
The remainder of this paper is organized as follows: Section 2 presents the methodology of the review. The results obtained are presented in Section 3. Section 4 presents the discussion of the results. Finally, the conclusions are presented in Section 5.

Research Questions
The leading questions of this review are: (RQ1) What is AdaBoost? (RQ2) How to detect daily activities with AdaBoost? (RQ3) How to identify daily activities with AdaBoost using mobile devices?

Inclusion Criteria
Studies assessing the recognition of daily activities using the AdaBoost method were included in this review if they met the following criteria: (1) detect daily activities using sensors; (2) implement the AdaBoost method for the automatic recognition of daily activities, presenting information about the activities and environments recognized; (3) make use of mobile devices; (4) present the accuracies obtained with the AdaBoost method; (5) were published between 2010 and 2019; (6) were available in open-access libraries; and (7) were written in English.

Search Strategy
The authors of this review searched for studies according to the inclusion criteria in the following electronic databases: IEEE Xplore and Science Direct. Every study was independently evaluated by eight reviewers (JF, IMP, GM, NMG, EZ, PL, FFR, and SS), and all parties evaluated its suitability. The studies were examined to identify the characteristics of AdaBoost and its relevance for the recognition of daily activities and environments using mobile devices.

Extraction of Study Characteristics
The following data were extracted from the studies and tabulated (see Tables 1 and 2): year of publication, population, purpose, equipment used, and outcomes of each publication. All studies cited in Tables 1 and 2 reported that the experiments were performed in laboratory settings. The availability of the raw data was also verified.

Authors Year Outcomes
Kelarev et al. [46] A cardiovascular autonomic neuropathy identification algorithm that uses mobile devices is proposed. The dataset was created using health records collected in a university research project named Diabetes Complications Screening Research Initiative. The main contribution of the paper is the recommendation of AdaBoost and Bagging based on the J48 decision tree.
Xu et al. [47] The paper presents an accurate method for context detection, which uses multiple sensors and machine learning. The context information is used to restrict the set of activities that require classification, increasing the accuracy and decreasing the complexity of the process. Fourteen subjects each carried a tablet, and four 9-DOF sensors were placed on the wrists, ankle, knee, and mid-waist. Each volunteer spent thirty minutes in every context and performed each required activity for two to five minutes. The dataset was then divided into two parts, 30% of the data for training and the remaining 70% for testing. The combined results of the three classifiers achieved higher accuracy for all contexts.
Wisniewski et al. [48] The paper presents a method for the automatic recognition of asthmatic wheezing through the analysis of a breathing sound dataset. A total of 130 records of natural and wheezy breathing, with 1024 samples each, were used for the study. The overall recognition was 93%.
Zhou et al. [49] The authors propose harmonized authentication based on ThumbStroke dynamics (HATS), which provides both entry-point and post-log-in mobile user authentication. The proposed method integrates several authentication methods, such as password, keystroke, gesture, and touch dynamics features, to explore the vulnerabilities of specific approaches to specific security attacks. The participants were required to go through several training sessions to be introduced to the usage of two different keyboards. Twelve volunteers (four men, eight women) took part in the study.
Masri et al. [50] The study proposes active authentication using scrolling behaviors as biometrics and evaluates diverse classification and clustering approaches that support those characteristics. The experiment involved 84 participants and 54 documents. Of the two techniques applied to validate users, k-means clustering was the most accurate, with a success rate of 83.5%.
Xu et al. [51] The authors propose an online learning approach for activity recognition based on data collected using inertial sensors. The data were gathered from fourteen volunteers. Every volunteer spent thirty minutes in each context and performed each required activity for two to five minutes. This algorithm outperformed the benchmark algorithms by 30-40%.
Tang et al. [52] The paper presents an assessment of ten representative classifiers applied to two datasets. The first dataset contains accelerometer time-series data from 22 volunteers. This study concluded that K-Nearest Neighbors is the most suitable classifier.
Yanyun et al. [53] The paper presents a method based on a Convolutional Neural Network approach that provides automatic extraction of features for transportation mode classification. A total of 169 features were used, and the dataset has more than 200 h of transportation data collected from thirty volunteers on diverse transportation modes (bus, car, metro, train). The recognition accuracy was 96.6% for the bus, 99.6% for the car, 99.0% for the metro, and 98.9% for the train, giving an average accuracy of 98.6%.

Li et al. [54] 2017 The authors propose an indoor and outdoor recognition method, which is divided into two parts: the machine learning-based Indoor, Outdoor, and Semi-open areas recognition algorithm and the lightweight WiFi sub-detector. The absolute values and the relative measurements of the WiFi received signal strength are calculated to identify whether the user environment is a semi-open area, indoor, or outdoor. The proposed method achieves an accuracy of 85% for the lightweight WiFi-based technique and 96% using the aggregated IOS-detector.
Yanjun et al. [55] 2017 The article proposes a Bayesian algorithm for traffic pattern recognition. The dataset used consists of 400 h from eight individuals. Five sensors were used: an accelerometer, a barometer, a geomagnetic sensor, a gyroscope, and a base station. The AdaBoost classification method was also implemented to get better results. The proposed method presents an accuracy rate from 83.3% to 91.5%.
Vafeiadis et al. [56] 2017 The paper presents a machine learning approach for occupancy detection. The water and energy consumption data collected using smart meters are used as features for occupancy detection in a domestic environment. In their boosted versions, the Random Forest and Decision Tree classifiers are more accurate than the other classifiers, with overall accuracies of 83.37% and 82.79%, respectively.
Subasi et al. [57] 2018 This study proposes the use of an AdaBoost-based classifier for human activity recognition using data collected from sensors located on the body. The study is based on data from nine inertial sensors, collected from seventeen volunteers who performed 33 fitness exercises. The results show a success rate of 99.98%.
Yuan et al. [58] 2018 The authors present an indoor localization algorithm based on 'Twi-AdaBoost'. The proposed method uses several sensors, such as gyroscope, magnetometer, and accelerometer. The tests used 6304 samples collected from both smartphone and smartwatch devices. The AdaBoost method outperforms the other approaches tested in every metric.

Results
As shown in Figure 1, we identified 151 papers, of which three were duplicates and were removed. The other 148 articles were evaluated according to the title, keywords, and abstract, which excluded 133 citations. After full-text evaluation, two of the remaining 15 papers were removed. The qualitative and quantitative synthesis included information related to the remaining 13 articles.
In conclusion, we examined 13 documents. For further details about the implementations presented in the studies analysed in this review, the reader should consult the original cited works. Table 1 shows the year of publication, a summary of each paper, and its final results. Table 2 shows the population, the purpose of the study, the devices, the settings of the papers, and their pros and cons. When the datasets used in a study are publicly available, or the population information is provided, this is considered a positive aspect. In many cases, the evaluation uses a cross-validation scheme (regular or stratified per class). However, the studies do not consider different subsets of the population for training and testing (i.e., a train/test split based on subjects or patients). This is generally a more rigorous evaluation scheme and is expected to lower the reported accuracy. Other more specific pros and cons are provided for each study.
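The subject-based evaluation scheme mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the record layout `(subject_id, features, label)`, the function name `split_by_subject`, and the 30% test fraction are hypothetical and are not taken from any reviewed study.

```python
import random

def split_by_subject(records, test_fraction=0.3, seed=0):
    """Split records so that no subject appears in both train and test.

    records: list of (subject_id, features, label) tuples (assumed layout).
    """
    subjects = sorted({rec[0] for rec in records})
    rng = random.Random(seed)              # fixed seed for a repeatable split
    rng.shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train = [rec for rec in records if rec[0] not in test_subjects]
    test = [rec for rec in records if rec[0] in test_subjects]
    return train, test
```

With a random split over windows, readings from the same person can land in both sets, so the classifier is partly tested on people it has already seen. Splitting by subject removes that leakage, which is why this scheme usually yields lower, but more realistic, accuracy figures.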
The papers were published between 2012 and 2018: two studies were published in 2018 (15%), four in 2017 (31%), two in 2016 (15%), two in 2015 (15%), two in 2014 (15%), and one in 2012 (8%). Regarding the devices used, 43% of the studies used smartphones and the remaining 57% used other mobile devices. Moreover, 69% of the studies have the raw data available. Finally, we verified that none of the analysed studies shared the source code.
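As a quick consistency check, the screening counts and the yearly shares reported above can be recomputed from the stated numbers. The variable names are illustrative only.

```python
# Screening flow reported in the text.
identified = 151
after_duplicates = identified - 3            # three duplicates removed
after_screening = after_duplicates - 133     # excluded on title/keywords/abstract
included = after_screening - 2               # excluded after full-text reading

# Number of included studies per publication year, as reported.
studies_per_year = {2012: 1, 2014: 2, 2015: 2, 2016: 2, 2017: 4, 2018: 2}
total = sum(studies_per_year.values())

# Share of each year, rounded to the nearest whole percent.
shares = {year: round(100 * n / total) for year, n in studies_per_year.items()}
```

Because each share is rounded independently, the rounded percentages add up to 99% rather than 100%.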

Methods for Identification of Activities in Daily Living
In the study [57], the authors tested different classifiers for the recognition of activities with sensors in order to find the best method. Ten classifiers were used with the AdaBoost method. The dataset used was publicly available. The data came from nine inertial sensors worn by seventeen individuals who performed 33 fitness activities. The sampling rate was 50 Hz. After comparing the accuracies of the AdaBoost method, the authors concluded that its implementation with random forest gives the best accuracy, with a value of 99.98%.
The authors of [49] proposed harmonized authentication based on ThumbStroke dynamics (HATS) for mobile devices. The performance of HATS was tested taking into account the different screen sizes of several mobile devices. Laboratory experiments were conducted to collect data for testing. Participants were required to have prior experience with touch screen devices and a QWERTY keyboard. The study selected four groups of features for learning ThumbStroke models: timing features, spatial features, movement direction features, and operation features. The phrases entered by the participants were adopted from MacKenzie and Soukoreff and varied from 16 to 43 characters. Across all settings and classification models, the final results showed that HATS outperformed the keystroke dynamics-based method. Among all the classification methods used, AdaBoost reported a maximum accuracy of 41.8%.
Li et al. [54] describe an indoor/outdoor detection system (IOS). This method is split into the machine learning-based IOS-detector and the lightweight WiFi sub-detector. The first part infers indoor, outdoor, or semi-open environments based on the classification results. The second part focuses on the implementation on mobile devices. Finally, the combined IOS detection shows high accuracy for the system. In conclusion, the proposed method achieves an accuracy of around 96% for the aggregated IOS detector and over 85% for the lightweight WiFi-based sub-detector.
In the study [50], the authors introduce a method for re-authenticating users that relies on a behavioral biometric based on users' document scrolling traits. More specifically, it focuses on identifying abnormal scrolling behavior while users interact with protected or read-only documents. The dataset was obtained from a previous project that aimed to detect document access activities that indicate cyber attacks. The features for this paper were split into three vectors: vector one is derived from scrolling traits, vector two represents the polarity of scrolling, and vector three treats the dataset as a bipartite graph with two node sets. k-means clustering achieved the best performance, with an 83.5% success rate in predicting the authenticated user.
The paper [48] presents a highly efficient method for the automatic detection of asthmatic wheezing in breathing sounds. The process is suitable for personal asthma monitoring via mobile devices since it is not computationally complex. Most of the data used came from online databases of human lung sounds. However, the authors also used several of their own recordings of regular and wheezy breaths. The authors also confirmed the optimality of the audio spectral envelope (ASE) plus the value of the tonality index (TI) as a feature detector, using the mRMR (minimal redundancy-maximal relevance) method. Thousands of experiments were performed, and the best results were obtained from the fluctuation of the Audio Spectral Envelope descriptor adopted from the MPEG-7 standard, reporting an accuracy around 100%.
The authors of [53] developed a method to collect the sensor data, using four kinds of sensors: acceleration, gyroscope, geomagnetic, and atmospheric pressure. Shallow feature extraction from the raw data takes place before the CNN learns deep features, which reduces the complexity of the network and the training time of the model. This process is critical for smartphones because of their limited resources. Three classes of features are extracted from each frame: statistical, time-domain, and frequency-domain features. Namely, the features used are: mean, standard deviation, variance, median, minimum, maximum, range, interquartile range, kurtosis, skewness, root mean square, integral, double integral, autocorrelation, mean-crossing rate, fast Fourier transform, spectral energy, spectral entropy, spectrum peak position, wavelet entropy, and wavelet magnitude. The final results show that the proposed method can achieve 98% accuracy, meaning that it outperforms SVM (support vector machine) and AdaBoost classification in efficiency and computational cost; the accuracy reported with AdaBoost was 93.6%.
Yuan et al. [58] propose an indoor localization system using the sensors of smartphones and smartwatches. Over 36,000 samples of data were collected in a 185.12 m² real indoor environment by a user with two different devices. Based on the experimental results, the authors concluded that Twi-AdaBoost outperforms the state-of-the-art indoor localization algorithms. The localization errors achieved for positions x and y were 0.387 m and 0.398 m, respectively. The datasets used include the features: Place ID, Timestamp, Accelerometer_X, Accelerometer_Y, Accelerometer_Z, MagneticField_X, MagneticField_Y, MagneticField_Z, X_Axis Angle (Pitch), Y_Axis Angle (Roll), Z_Axis Angle (Azimuth), Gyroscope_X, Gyroscope_Y, and Gyroscope_Z, reporting an accuracy around 99%.
In the paper [55], a novel technique based on the Bayesian voting algorithm, which can be used with low-power sensors for transportation mode detection, is presented. The authors used a dataset that consists of 400 h from eight individuals. Five sensors were used: acceleration, gyroscope, geomagnetic, barometer, and base station, and AdaBoost classification was applied to improve the results. In addition, the Bias algorithm was used to extract the features, in order to reduce the adaptive boosting feature dimensions and determine the critical factors for identifying different transportation modes. The features used are: mean, standard deviation, variance, median, minimum, maximum, range, interquartile range, kurtosis, skewness, root mean square, time integral, double integral, auto-correlation, mean-crossing rate, fast Fourier transform, spectral energy, spectral entropy, spectrum peak position, wavelet entropy, wavelet magnitude, peak volume, intensity, length, variance of peak features, peak frequency, stationary duration, and stationary frequency. Taking into account the final results, the authors concluded that their algorithm could supplement and replace some traffic pattern recognition algorithms and address the problem that different mobile phones have different sensors, reporting accuracy between 64.54% and 96.83%.
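Several of the time-domain features listed above can be computed from a single signal window with the Python standard library alone, as in the sketch below. The function name is hypothetical, population (not sample) statistics are used, and the mean-crossing rate is defined here as the fraction of consecutive sample pairs that lie on opposite sides of the mean; these are assumptions, and the cited studies may define the features differently.

```python
import math
import statistics

def time_domain_features(signal):
    """Compute a few statistical and time-domain features for one window."""
    mean = statistics.fmean(signal)
    centered = [x - mean for x in signal]
    # Count sign changes of the mean-removed signal between neighbours.
    crossings = sum(1 for a, b in zip(centered, centered[1:]) if a * b < 0)
    return {
        "mean": mean,
        "std": statistics.pstdev(signal),
        "variance": statistics.pvariance(signal),
        "median": statistics.median(signal),
        "min": min(signal),
        "max": max(signal),
        "range": max(signal) - min(signal),
        "rms": math.sqrt(sum(x * x for x in signal) / len(signal)),
        "mean_crossing_rate": crossings / (len(signal) - 1),
    }

# Example: an alternating signal crosses its mean between every pair of samples.
features = time_domain_features([1.0, -1.0, 1.0, -1.0])
```

Frequency-domain features such as spectral energy or wavelet entropy would need a transform step on top of this and are left out of the sketch.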
In [51], the authors presented a contextual multi-armed bandits (MAB) approach that enables activity classification. This method performs context adaptation, continuous online learning, and active learning. Since the cost of extracting specific features is very high, the authors decided to use side information as the context. Since features can be used as contexts, this is not a limitation for the project. The proposed algorithm with active learning outperformed the benchmark algorithms by an average of 35%, reporting an accuracy between 70% and 85%.
Xu et al. [47] focus on three challenges: the ability to accurately detect context using sensors and machine learning; the use of context to select the activities for classification, which reduces the complexity and improves the accuracy, speed, and energy usage; and the ability of experts to prescribe sets of physical activities under different environments. The classifiers and features used for the project were: kNN (k-Nearest Neighbor) with time, kNN with wireless media access control (MAC) address and signal strength, and AdaBoost with audio peak frequency, peak energy, average power, and total energy. The features were extracted from raw sensor data using a Java program implementing the IContextFeatureExtractor interface. The data were acquired from 14 participants who carried an Android mobile phone, with four 9-DOF devices placed on the dominant wrists, knee, ankle, and mid-waist. Each subject performed every required activity under every context for 2-5 min. The data were split into training (30%) and testing (70%) sets. The authors concluded that, although the methodology demonstrated effectiveness, efficiency, and potential, a more extensive study needs to be performed to improve privacy, security, and user-friendliness, reporting accuracy between 59% and 100%.
In [56], the problem of occupancy detection in a domestic environment was studied using machine learning techniques and their boosting versions on a dataset collected from electricity and water consumption smart meters. The features were selected using the Mutual Information technique. The dataset contains energy and water consumption time-series data, collected during summer at 1-minute resolution for 16 consecutive days. The features included in the dataset were: central power, refrigerator, television, washing machine, dryer, cold water-kitchen, hot water-kitchen, dishwasher-water, and washing machine-water, reporting accuracy higher than 70%.
The authors of [52] evaluated ten representative classifiers on two available datasets. The first dataset consists of accelerometer readings of walking patterns from 22 participants. The second one contains activity and postural transition data from accelerometer and magnetometer readings acquired from 30 participants. For the Walking dataset, the authors split the data into fixed-width sliding windows with a 50% overlap, extracted nine features from every window, and scaled the features to [−1, 1]. The features were the mean, standard deviation, and median absolute deviation of the different axes of the sensors. For the second dataset, its authors had already pre-processed the sensor signals with a noise filter, partitioned the data into fixed-width sliding windows with a 50% overlap as well, and constructed a 561-feature vector for every window. From those features, the authors selected 24 features, including the mean and standard deviation of the different axes of body acceleration, gravity acceleration, jerk signals of body acceleration, angular velocity, and jerk signals of angular velocity. In conclusion, the authors reported an accuracy between 95.6% and 97.8%.
The study [46] focuses on using mobile devices for the detection and monitoring of cardiovascular autonomic neuropathy. After all the experiments, the authors concluded that the best outcomes were obtained by the novel combined ensemble of AdaBoost and Bagging based on the J48 decision tree, reporting the highest accuracy of 94.53%.
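The windowing pipeline described for the Walking dataset (fixed-width windows with 50% overlap, nine features per window, scaling to [−1, 1]) can be sketched as below. The window width, the helper names, and the use of min-max scaling are assumptions for illustration, since the text does not specify them.

```python
import statistics

def sliding_windows(samples, width, overlap=0.5):
    """Return fixed-width windows; consecutive windows share `overlap` of their samples."""
    step = max(1, int(width * (1 - overlap)))
    return [samples[i:i + width] for i in range(0, len(samples) - width + 1, step)]

def median_absolute_deviation(values):
    """Median of the absolute deviations from the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])

def window_features(window):
    """Nine features for a window of (x, y, z) samples: mean, std, MAD per axis."""
    feats = []
    for axis in zip(*window):              # iterate over the x, y and z series
        feats += [statistics.fmean(axis),
                  statistics.pstdev(axis),
                  median_absolute_deviation(axis)]
    return feats

def scale_columns(rows, lo=-1.0, hi=1.0):
    """Min-max scale every feature column to [lo, hi] (assumed scaling method)."""
    scaled_cols = []
    for col in zip(*rows):
        cmin, span = min(col), max(col) - min(col)
        scaled_cols.append([lo + (hi - lo) * (v - cmin) / span if span else (lo + hi) / 2
                            for v in col])
    return [list(row) for row in zip(*scaled_cols)]

# Synthetic 3-axis accelerometer samples, windows of 4 samples with 50% overlap.
samples = [(float(i), 2.0 * i, -float(i)) for i in range(8)]
windows = sliding_windows(samples, width=4, overlap=0.5)
feature_rows = [window_features(w) for w in windows]
scaled_rows = scale_columns(feature_rows)
```

Each row of `scaled_rows` is one feature vector that could be passed to a classifier such as AdaBoost. Constant feature columns are mapped to the midpoint of the target range to avoid a division by zero.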

Discussion
This review confirms that AdaBoost, and boosting ensemble methods in general, are reliable for the identification of daily activities. Several studies are not well described, and the source code of the algorithms is not publicly available. The verification and reproducibility of the obtained results are not easily possible, for the following reasons: only some authors shared the datasets; in many cases, the methods are not well explained, in particular the preprocessing of the datasets; and the hyper-parameter tuning is poorly described, or the exact algorithm parameters are not given.
The number of studies using the AdaBoost method for the recognition of daily activities is small, and the daily activities mainly recognized are simple activities, including walking, running, walking upstairs and downstairs, and other everyday activities.
Following our literature review, most of the analysed studies (85%) report the best results using AdaBoost methods. Only two studies (15%), presented in [49,58], state that the AdaBoost-based methods do not show the best results when compared with the other approaches for daily activities and environments recognition. Nevertheless, the authors of these studies still recognised the reliable applicability of the AdaBoost method for activity and environment recognition.
In summary, all reviewed works first perform a feature extraction step, which varies somewhat depending on the sensor types used. In cases of multiple sensors, or multi-channel sensors, the feature extraction is performed independently for each time series (i.e., channel or sensor). Generally, various statistical metrics, as listed in Table 3, are computed on the raw signal in the time domain, and features are rarely derived from the frequency domain. Then, after the features are extracted from each sensor as a separate time series, the extracted features are fed into the classifiers. Very often, a systematic approach to feature extraction improves the accuracy [23].
The authors used different features, and the average accuracies obtained with them are comparable. Table 3 presents the average accuracy of the various features extracted, showing that the features that allow the recognition of daily activities with an accuracy higher than 90% are the mean, standard deviation, pitch, roll, azimuth, and median absolute deviation of the signal of motion sensors, and the mean of the signal of magnetic sensors. Moreover, Table 4 presents the advantages and disadvantages of the AdaBoost method, indicating that it can be used for the recognition of daily activities and environments with the recent advancements in the hardware and software of commonly used devices. In comparison with other algorithms, the AdaBoost method uses different algorithms as the weak learner, and these algorithms take into account the features extracted from the signals, such as the mean, standard deviation, variance, and others. In general, AdaBoost is used with complex data, but it can also be used with 1D data. The authors of the analysed studies used AdaBoost with uni-dimensional data, i.e., they used the features extracted from the data to provide the results, and the results obtained support its reliability for physical and physiological data.
In conclusion, the use of mobile devices for daily activities recognition using AdaBoost is limited by the low processing power and battery capacity of these devices [59,60]. According to the studies reported in this review, it is possible to conclude that the AdaBoost method is reliable with mobile devices, as verified by the accuracies reported in the different studies, where only two studies reported accuracies lower than 50%.

Conclusions
This review presents studies available in the literature that use the AdaBoost method for the recognition of daily activities and environments. Thirteen studies were analysed, and the main findings are summarised as follows:
• (RQ1) The AdaBoost method is an ensemble learning method that is used in conjunction with other algorithms. These other algorithms are commonly referred to as weak classifiers, and their combination helps avoid the overfitting problem;
• (RQ2) The AdaBoost method is implemented in conjunction with other algorithms to increase the accuracy of the recognition of daily activities and environments;
• (RQ3) For the recognition of daily activities and environments, the AdaBoost method is combined with a weak classifier. The features that reported better accuracy are the mean, standard deviation, pitch, roll, azimuth, and median absolute deviation of the signal of motion sensors, and the mean of the signal of magnetic sensors.
This review also highlights that smartphones and other mobile devices should be used for a particular purpose, because of their limited battery life and processing capabilities. First, the authors excluded studies that are not focused on the recognition of daily activities and environments with the AdaBoost method. Secondly, the studies that do not use sensors available on mobile devices were excluded. We excluded several studies after analysis of the abstracts and full text of the papers. Another reason for exclusion was the language of the study: studies that were not written in English were excluded. With the features collected, the AdaBoost method allows recognition with an accuracy higher than 80%.
As future work, the AdaBoost method will be implemented in the framework for the recognition of daily activities and environments, where it will be used to recognize seven daily activities and nine environments.

Table 2. Critical analysis of reviewed studies.

Table 3. Average of the accuracy reported in the studies analysed, grouped by features.

Table 4. Advantages and disadvantages of the use of the AdaBoost method in the different studies analysed.