An Ensemble of Condition Based Classifiers for Device Independent Detailed Human Activity Recognition Using Smartphones

Abstract: Human activity recognition is increasingly used for medical, surveillance and entertainment applications. For better monitoring, these applications require identification of detailed activities such as sitting on a chair/floor, brisk/slow walking, running, etc. This paper proposes a ubiquitous solution to detailed activity recognition through the use of smartphone sensors. Use of smartphones for activity recognition poses challenges such as device independence and varied usage behavior in terms of where the smartphone is kept. Only a few works address one or more of these challenges. Consequently, in this paper, we present a detailed activity recognition framework for identifying both static and dynamic activities that addresses the above-mentioned challenges. The framework supports cases where (i) the dataset contains data from the accelerometer only and (ii) the dataset contains data from both the accelerometer and the gyroscope sensor of smartphones. The framework forms an ensemble of condition based classifiers to address the variance due to different hardware configurations and usage behavior in terms of where the smartphone is kept (right pants pocket, shirt pocket or right hand). The framework is implemented and tested on a real data set collected from 10 users with five different device configurations. It is observed that, with our proposed approach, 94% recognition accuracy can be achieved.


Introduction
Human physical activity refers to any body movement produced by skeletal muscles, or any change in the position of the limbs over time against gravity, that results in energy expenditure [1,2]. An activity recognition and monitoring system identifies and evaluates the actions carried out by a person on a daily basis in real conditions of the surrounding environment and provides context aware feedback for healthcare and elder care. Daily activity is a complex concept; it depends on many factors, including physiological, anatomical, psychological, and environmental effects. Human daily activity tracking was traditionally addressed with image processing and vision-based techniques [3,4]. However, these techniques may violate user privacy, mostly require infrastructure support such as video cameras installed in the monitored areas, and depend heavily on lighting conditions. Several works consider wearable sensors, individually or combined with ambient sensors, for activity recognition [5][6][7]. Many of the early efforts focused on detecting falls and daily-life activities, mainly using one or more wearable accelerometers. However, it may not be convenient for patients to carry out daily activities with sensors worn on hands and/or limbs. In contrast, inertial sensors of smartphones can be a convenient option for activity recognition, as most users almost always carry smartphones. Most smartphones are equipped with accelerometer, gyroscope, compass and proximity sensors. In addition, these devices also have communication facilities like Wi-Fi and Bluetooth by which sensor readings can be transferred to a server. Continuous raw data are collected from several sensors of the smartphone during monitoring. The data are processed to extract useful features, which are fed to a classification algorithm to train an appropriate activity model that can recognize a variety of activities.
Daily activities can be categorized in two ways: coarse-grained or simple activity and fine-grained or detailed activity. A coarse-grained activity (Sit, Stand, Walk, etc.) is a simplified, larger component of basic activity, whereas a fine-grained, that is, detailed, activity refers to a smaller distinguishable subcomponent; such subcomponents can be composed together to form a coarse-grained activity. Detailed activities include, for example, Sit on floor and Sit on chair. Identifying detailed activity can be beneficial for many medical applications such as elderly assistance at home, post trauma rehabilitation after a surgery, detection of gestures, motions and fitness of diabetic patients, etc. The population of elderly people who live alone and mostly suffer from chronic diseases has increased significantly. Stroke patients need assistance and require regular monitoring during rehabilitation, where increased walking ability is the focus. Accurate information on daily activity has the potential to improve regular monitoring and treatment for several diseases and can sometimes reduce the high burden of hospitalization costs.
Existing works [8][9][10] mostly focus on coarse grained activity recognition. Few works could be found on detailed activity [11,12], and these use several inertial sensors that may not be present in many smartphone configurations, so the resulting systems are not ubiquitous. In the literature, most works [9,13,14] consider a single classifier approach to study activity recognition with smartphones. In [15,16], the authors use an ensemble learning technique for activity recognition. In real life, the training and testing environments for activity recognition are not always the same. Generally, raw data from several devices are collected to monitor activities. Due to differing hardware configurations and calibration problems, sensor readings vary from one device to another. The orientation of the smartphone, which depends on the usage behavior of the human subject, is also a factor affecting the classification accuracy of the system. Several sensors are used for activity recognition in order to make the recognition system device independent in [17]. A recent work in [18] addresses different usage behavior, such as smartphones kept in a coat pocket or bag. However, no work could be found that enables detailed activity recognition even when training data are collected using one device at one position (say, right trouser pocket) and activity is recognized for test data collected from a different device kept at the same or a different position (say, shirt pocket). Consequently, our main contributions in this work are as follows:
(a) We propose an activity recognition framework using an ensemble of condition based classifiers to identify detailed static activities (Sit on chair, Sit on floor, Lying right, Lying left) as well as detailed dynamic activities (Slow walk, Brisk walk). The proposed technique works irrespective of device configuration and usage behavior.
(b) The process utilizes the accelerometer and gyroscope sensors, which are available in almost all smartphones from most manufacturers, thus making the framework ubiquitous.
(c) The proposed technique can identify the effect of the accelerometer and the gyroscope on identifying each individual detailed activity.
The rest of this paper is organized as follows: Section 2 describes state-of-the-art techniques for activity recognition through smartphones. Definition of the problem is discussed in Section 3. Design of the proposed system is detailed in Section 4. Section 5 describes the experimental setup and summarizes the results. Finally, we conclude in Section 6.

Related Work
A typical activity identification framework mostly follows four phases including (1) Data collection and preprocessing; (2) Feature extraction; (3) Feature selection and (4) Classification as shown in Figure 1. Data are collected through several sensors with respect to human usage and behavior. Preprocessed data are sent to a server for further processing. Several time and frequency domain features are extracted and selected from preprocessed data. The classification techniques are applied on the server side to recognize an activity.
Data are collected from wearable sensors and smart handhelds. One of the most important issues in data collection is the selection of sensors and the attributes to be measured, which play an important role in the activity recognition system's performance. Incorrect selection of sensors may adversely affect the recognition performance. Inertial sensors such as accelerometers and gyroscopes are used for activity recognition. The accelerometer measures non-gravitational acceleration of a smart device while the gyroscope senses the rate of change of orientation, that is, angular velocity. Human Activity Recognition can be broadly classified on the basis of the medium of data collection into two categories: "Using Wearable Sensor" and "Using Smartphone". Some relevant state-of-the-art works are summarized in Table 1. Most of these works use wearable accelerometers, or accelerometer sensors of smartphones. It is evident that work has been done in different directions, not only on detecting detailed daily life activities and falls [7,19], but also on online activity recognition [20], publishing benchmark datasets [21], and analyzing different usage behaviour as in [18,22].
Table 1 (excerpt). [20] (2012); activities: Idle, Walking, Cycling, Driving, Running; placement: smartphone placed in pocket; sensor: accelerometer; remarks: the entire system is implemented offline and online. However, in the online mode, the recognition on the device was performed using only a limited number of randomly chosen instances from the training data due to the limited computational power of the smartphones. Although the online results are almost comparable with the offline results, the system is not entirely user independent.
A lot of research has been done in Human Activity Recognition (HAR) using wearable devices [25], mostly using acceleration data independent of orientation [26]. A combination of an accelerometer and a gyroscope placed on the neck of users is used in [27] to identify activities. The authors evaluated the effect of appending gyroscope data to accelerometer data and maintained an individual threshold value for identifying each of several activities. The authors in [19] used gyroscopes and accelerometers placed on the thigh and chest of the user to identify several activities, as well as falls, using inclination angle and accelerometer values. An unintentional transition to a lying posture is regarded as a fall, where large changes in accelerometer and gyroscope readings can be observed. The authors in [19] differentiate intentional and unintentional transitions by applying thresholds to peak values of acceleration and of angular velocity from the gyroscope. A certain change of angular velocity determines the fall of the subject.
A few works can also be found on wearable sensors using classifiers to recognize activities. In [28], authors considered accelerometers kept in three positions (wrist, ankle, chest) of the human body to monitor daily activities and applied Decision Table on the preprocessed data for activity classification. In [29], several activities are monitored using a customized device that is configured with an accelerometer attached to wrist position. However, the gyroscope sensor alone has not achieved a significant place in human activity recognition, and it can work in combination with other sensors. Using many wearable sensors for activity detection may hamper the movement of a person itself. In [30], "Vital Signs" are detected in addition to acceleration data, where vital signs vary in each activity, for example, when an individual begins running, it is expected that their heart rate and breath amplitude increase. This then becomes the vital sign for that activity and provides better accuracy. However, the advent of smartphones and the exponential increase in their usage in the past decade has resulted in a growing interest in HAR using smartphones for data collection as smartphones provide somewhat more convenient wearable computing environment.
Several works extensively study the use of smartphone based inertial sensors like accelerometers and gyroscopes in activity recognition. The Activity Recognition API by Google [31] provides insights into what users are currently doing and is used by several Android applications to enhance their user experience. The API automatically detects activities by periodically reading short bursts of sensor data and processing them using machine learning models. However, the set of activities that it can recognize is limited to coarse grained activities such as "sit", "stand", "walk", "run", "biking", "device in vehicle", etc. Few works could be found that focus on using minimal sensors, such as only accelerometers [10,22,24], to make the framework energy efficient and ubiquitous. The works in [22] and [24] both use state-of-the-art classifiers including MultiLayer Perceptron (MLP); however, the work in [22] aims at detecting detailed activities like slow walk and fast walk, while, in [24], mainly coarse grained activities are covered. In [22], the average accuracies of a combination of classifiers are used for recognizing activity, resulting in around 91% accuracy. Few works consider the gyroscope along with the accelerometer for gait analysis [32,33] and fall detection [34]. In [33], K-Nearest Neighbor (KNN) is used and achieves around 80% accuracy, while different supervised learning algorithms are explored in [34], with Support Vector Machine (SVM) giving the highest accuracy.
In [14], the authors consider all sensors available in smartphones, such as accelerometer, gyroscope and magnetometer, for identifying human activity and explain the role of each sensor. The combination of accelerometer and gyroscope [14,35] is found to yield better results in some aspects. In [14], several machine learning algorithms are applied for activity recognition, while SVM is used to classify activities in [35]. The authors in [36] show the potential of using only a magnetometer and how it affects activity recognition. In [37], the authors use a combination of an accelerometer and a gyroscope, and the recognition accuracy for some of the activities increases by 3.1-13.4%. The authors in [15] consider Decision tree, Logistic regression and Multilayer neural network algorithms as base classifiers and design a majority voting based ensemble [38] to identify human activity. It is found to increase accuracy by up to 3.6% over a single classifier based approach. In [16], the authors combine multiple classifiers to improve the accuracy of activity recognition by up to 7%, reaching an overall accuracy of 93.5% using a 5-fold cross validation technique. In this way, the output of different classifiers can be combined using several fusion techniques to improve classification accuracy and efficiency.
In [39], the authors identify activity by applying KNN to a combination of data from accelerometers, magnetometers, gyroscopes, linear acceleration and gravity, which performs better than the accelerometer alone. However, most of the works above focus on coarse grained activities [8,9]. Few works could be found on detailed activity recognition [7,11,12]. In [7], the authors classified a number of detailed daily life activities with the help of several wearable sensors. They even predicted a possible classification of slow walk and brisk walk on the basis of speed. The authors in [12] use wearable sensors to monitor detailed activity along with fall detection using a Hidden Markov Model (HMM). In [40], HMM is applied to recognize activities from smartphone and ambient sensors. User ambience is also used in [11], where the authors use accelerometers and gyroscopes for body locomotion, temperature and humidity sensors for sensing the ambient environment, and location (via communication with Bluetooth beacon location tags). Two-level supervised classification is performed to detect the final activity state. A modified conditional random field based supervised activity classifier is designed by the authors for this purpose. However, the use of several sensors makes the systems more expensive and inconvenient for users.
In reality, for smartphone based ubiquitous activity recognition, we cannot impose constraints like the same device being used for training and testing or smartphones needing to be kept at a fixed position (the same as the one used for training the system). Thus, detailed activity recognition works should also consider the usage of different devices for training and testing (device independence) and usage behavior in terms of where the smartphone is kept for training and testing (position independence). The work in [17] considers device independence issues with multiple sensors and, in [41], position independent activity recognition framework is presented. In [17], the authors focus on several challenges like different users, different smartphone models and orientation. They have used several smartphone based sensors like accelerometers, gyroscopes and magnetic field sensors to remove gravity from accelerometer signals and converted accelerometer signal data from the body coordinate system to the earth coordinate system. Frequency domain features are extracted and the KNN (K Nearest Neighbor) classification algorithm is used in the work. In [1], device independent activity monitoring is achieved using Logistic Regression (LR) based two phase classifier where the best training device gets selected in the first phase while the second phase tunes the classifier for better recognition of activities. However, only coarse grained activities are recognized by this technique. Consequently, in this work, a detailed activity recognition framework is proposed that attempts to recognize detailed activities irrespective of the hardware configuration of smartphones and how the smartphones are kept during training and testing phases. 
We have not made use of Google's API, as the activities that we are trying to classify are finer distinctions within a coarse grained class of activities; for example, for "walk", we have the finer distinctions "brisk walk" and "slow walk". Similarly, finer distinctions exist for every coarse grained activity, and we propose a system here that learns such finer classes of activities.

Problem Definition
The activity recognition problem can be defined as follows. Let the set of activities that can be recognized by the Human Activity Recognition System be represented by $A = \{a_1, a_2, a_3, \ldots, a_n\}$, where $A$ comprises both static and dynamic activities. The dataset $DS = \{ds_{d_1 p_1}, ds_{d_1 p_2}, ds_{d_2 p_1}, \ldots, ds_{d_m p_k}\}$ is a set of datasets ($ds_{d_i p_j}$ represents a set of data points), each being a function of the device used ($d_i$) and the device position ($p_j$) used for data collection, that is, $DS = f(\text{Device Used}, \text{Device Position})$, where $m$ denotes the number of devices used and $k$ denotes the number of positions used for collecting the data. The dataset $DS$, when preprocessed, results in $DS'$. $F = \{f_1, f_2, \ldots, f_j\}$ denotes the feature space consisting of all the features extracted from the preprocessed dataset $DS'$, and each feature vector $X_i$ of dataset $DS'$ has an activity label $y_i$, giving instances of the form $(x_1, x_2, x_3, \ldots, x_j, y_i)$ with $x_1, x_2, \ldots, x_j \in ds_k$ and $y_i \in A$. Given a learning algorithm $C$, the Human Activity Recognition problem is to learn to recognize the activity set $A$ from the dataset $DS'$ using the feature space $F$, by using a function $g : DS' \to A$, where $g$ is a member of the hypothesis space that best fits the dataset $DS'$ to $A$ under a loss function $L : A \times A \to \mathbb{R}$: if, for a training instance $i$, the activity label is $y_i$ and the predicted label is $y'$, then the loss is computed as $L(y_i, y')$. The trained model is then tested on an unseen test dataset $DS''$: the trained model $C$, using the function $g : DS'' \to A$, predicts the activity being performed as $y$, and the accuracy of the model is then computed.
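The notation above can be made concrete with a small sketch. It assumes a zero-one loss (the text does not fix a particular $L$), and the names `DS`, `zero_one_loss`, `accuracy` and the toy hypothesis `g`, as well as the feature values, are illustrative only and not taken from the collected data.

```python
from typing import Callable, Dict, List, Tuple

# One instance: a feature vector (x_1, ..., x_j) with its activity label y_i.
Instance = Tuple[Tuple[float, ...], str]

# DS maps a condition (device d_i, position p_j) to its labelled data points.
# The values below are invented toy numbers, purely for illustration.
DS: Dict[Tuple[str, str], List[Instance]] = {
    ("D1", "SPT"): [((0.1, 0.9), "Sit on chair"), ((0.8, 0.2), "Slow walk")],
    ("D1", "RPT"): [((0.2, 0.8), "Sit on chair"), ((0.7, 0.3), "Slow walk")],
}


def zero_one_loss(y_true: str, y_pred: str) -> float:
    """Assumed loss L : A x A -> R; 0 for a correct label, 1 otherwise."""
    return 0.0 if y_true == y_pred else 1.0


def accuracy(g: Callable[[Tuple[float, ...]], str], data: List[Instance]) -> float:
    """Accuracy of hypothesis g on a dataset, as 1 minus the mean zero-one loss."""
    losses = [zero_one_loss(y, g(x)) for x, y in data]
    return 1.0 - sum(losses) / len(losses)


def g(x: Tuple[float, ...]) -> str:
    """Toy hypothesis: a single threshold on the first feature."""
    return "Slow walk" if x[0] > 0.5 else "Sit on chair"


# Evaluate g on a condition it was not designed for (same device, other position).
test_accuracy = accuracy(g, DS[("D1", "RPT")])
```

Keying the dataset by (device, position) mirrors the definition $DS = f(\text{Device Used}, \text{Device Position})$ and makes it easy to train on one condition and test on another.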

Detailed Activity Recognition Framework
The objective of this work is to identify six individual detailed human activities from raw data produced by accelerometer and gyroscope of a smartphone. Four static activities (Sit on floor/chair, Lying left/right) and two dynamic activities (Slow/Brisk walk) are considered for this work. New activities can also be recognized by the system by appropriately updating the training dataset.
Accelerometer and gyroscope sensor readings are collected from individual smartphones (chosen as training and test devices) kept in either the Shirt PockeT (SPT), the Right (front) pants PockeT (RPT) or the Right Hand (Hand) position. Data for the Hand position were collected in a way that replicates day to day usage: during static fine grained activities, the subjects were asked to hold the device in their hands as if they were using it, and, during dynamic fine grained activities, they were asked to perform the activity while holding the device in their hands as they preferred. Thus, the device held in one's hand does not replicate the other positions, that is, SPT or RPT. The raw data plots of the three acceleration axes $(A_x, A_y, A_z)$ for the above-mentioned set of activities are shown in Figure 2a for one device. They reveal that static and dynamic activities show grossly different patterns, which can be easily distinguished using threshold based techniques that measure changes in sensor readings. However, it is difficult to distinguish between two static activities like sitting on the floor and sitting on a chair. The problem becomes more complicated when different devices are involved, as can be observed from Figure 2a,b. Interestingly, sensor readings of one smartphone also vary depending on how it is kept. Figure 3 shows such patterns for different activities using the same device when it is kept at three different positions (SPT, RPT and Hand, respectively). Thus, threshold based techniques are not sufficient to distinguish between detailed static and dynamic activities, especially when device and position heterogeneity are considered. Hence, data transformation, feature extraction, feature selection and classification techniques should be designed in a way that can mitigate these challenges.

Data Preprocessing and Feature Extraction
The raw sensor data may contain noise or abnormal spikes due to a sudden change of position or fall of the device, unintentional change of sensor orientation, etc. Filtering techniques remove accelerometer signal noise and outliers, such as the low frequency acceleration (gravity) that captures the orientation of the smartphone sensors with respect to ground level, as well as noise generated by the dynamic motion of humans, and preserve the medium frequency signal components. Data transformation is a significant process of validating and normalizing filtered data. Data transformation is applied to make a linear fit of one dataset against another. A nonlinear transformation generally increases the linear relationship after applying $Tr$ (the transformation function) to each data point. Taking the square root of the value, taking the inverse of the value, and converting to a logarithmic scale are different nonlinear transformation procedures used for statistical analysis. The logarithm function is applied when the data cover different orders of magnitude. In this work, a base-10 logarithm transformation [42] is applied to $(A_x, A_y, A_z)$ and $(G_x, G_y, G_z)$ in the test and training datasets to improve the linear relationship. An orientation insensitive dimension, Signal Vector Magnitude (SVMag), is added in order to achieve usage behavior independent recognition [14], along with the existing three dimensions of accelerometer readings $(A_x, A_y, A_z)$ and gyroscope readings $(G_x, G_y, G_z)$: $SVMag_A = \sqrt{A_x^2 + A_y^2 + A_z^2}$ and $SVMag_G = \sqrt{G_x^2 + G_y^2 + G_z^2}$. Figure 4 shows the accelerometer readings collected from a smartphone when it is in the right pants pocket and is either faced upright or turned upside down. The plots show variations particularly in $A_x$ and $A_y$; the values of $A_z$ do not show much variation in direction, but a slight variation in magnitude can be observed. However, as is evident in Figure 5, SVMag is found to mitigate the change of orientation of smartphones due to minor changes in usage behaviour, such as turning of the device.
As is evident from the figure, limb movements can occur even during static activities while the posture is maintained, resulting in momentary spikes in the trace. The transformed data are partitioned into small segments, a step known as segmentation [35]. Proper selection of the segment size is necessary to reduce the classification complexity of the system and to compute features from a small set of values. A window that is too short does not provide sufficient information about individual activities, while more than one activity may be present in the same window if the window is too long. The sliding window approach is used to effectively capture cycles in activities. Here, we have considered a 2 s window with 50% (1 s) overlap, following [14], to reduce loss of information at the edge of the window. In [43], a 3 s window is found to achieve only a minimal gain in classification accuracy in comparison to a 2 s window for short daily activities. The features are extracted from the preprocessed data in the next phase of sensor data processing. The objectives of feature extraction are to discover a meaningful representation of the data and to relate the raw sensor data to the knowledge expected for decision-making. Feature vectors $F_i$ are extracted from the set of segments $S$ of the preprocessed dataset, $Dt$, by applying $ft()$. The extracted features constitute the feature space. The features considered are listed in Table 2 for the three dimensions of acceleration $(A_x, A_y, A_z)$ and gyroscope $(G_x, G_y, G_z)$ readings, along with the orientation independent dimensions $SVMag_A$ and $SVMag_G$.
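The preprocessing steps described above (an orientation insensitive magnitude, then 2 s windows with 50% overlap, then per-window statistics) can be sketched as follows. This is a minimal illustration, not the authors' Matlab code: it assumes SVMag is the Euclidean norm of the three axes, the sampling rate `sampling_hz` is a hypothetical value since none is stated here, and all function names are illustrative. The logarithm transformation is omitted because the handling of non-positive readings is not specified.

```python
import math
from statistics import mean, median


def sv_mag(x, y, z):
    """Signal vector magnitude: Euclidean norm of one tri-axial reading."""
    return math.sqrt(x * x + y * y + z * z)


def sliding_windows(samples, window_size, overlap=0.5):
    """Split a sequence into fixed-size windows that overlap by the given fraction."""
    step = max(1, int(window_size * (1 - overlap)))
    last_start = len(samples) - window_size
    return [samples[s:s + window_size] for s in range(0, last_start + 1, step)]


def window_features(values):
    """Time domain statistics named in the text, computed over one window."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
    }


# Hypothetical sampling rate; a 2 s window then spans 2 * sampling_hz samples.
sampling_hz = 4
window_size = 2 * sampling_hz

# Toy tri-axial accelerometer trace (invented values, 16 samples = 4 s).
trace = [(0.1 * i, 0.2, -0.3) for i in range(16)]
magnitudes = [sv_mag(ax, ay, az) for ax, ay, az in trace]

windows = sliding_windows(magnitudes, window_size, overlap=0.5)
features = [window_features(w) for w in windows]
```

Because the magnitude depends only on the squared axis values, flipping the phone (which negates some axes) leaves it unchanged, which is the property the text relies on for orientation insensitivity.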

Frequency Domain Features
Initially, a total of 28 (seven features for four dimensions as mentioned in Table 2) time domain features and eight (two features for four dimensions as mentioned in Table 2) frequency domain features are applied to the preprocessed data. However, all features may not be relevant and informative. Thus, we have used information gain [44] to identify important features for the problem. Information gain value is measured for each attribute (feature) and the Ranker Search method is used to rank attributes by their individual evaluations. Features with low information gain value are removed as they do not add much information. Consequently, for the collected dataset, the following features are found to be informative for the problem considered.
From the time domain features, Min and Max of $A_y$ (accelerometer) and Max of $G_x$, $G_y$, $G_z$ (gyroscope) are selected; from the frequency domain features, the median of $A_y$ and $G_y$ and the mean of $A_z$, $A_x$ and $G_y$ are selected, as listed in Table 2. The mean alone is not sufficient to accurately reflect several activities on skewed data. Min and Max give the minimum and maximum values in each segment, respectively, as the acceleration is expected to be restricted to a certain range for each class of activity. The median is the middle value of the observations when sorted from smallest to largest (the average of the two middle values when the number of observations is even).
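Ranking features by information gain, as described above, can be illustrated with a short sketch. It is not the Weka implementation used in the experiments: it assumes simple equal-width binning to discretize continuous features, and the feature names `max_Ay` and `noise` and their values are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def discretize(values, bins=2):
    """Equal-width binning (an assumption; other discretizers are possible)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid division by zero for constant features
    return [min(int((v - lo) / width), bins - 1) for v in values]


def information_gain(values, labels, bins=2):
    """H(labels) minus the expected entropy of labels within each feature bin."""
    groups = defaultdict(list)
    for b, y in zip(discretize(values, bins), labels):
        groups[b].append(y)
    conditional = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns, labels, bins=2):
    """Return (feature name, information gain) pairs, most informative first."""
    scores = {name: information_gain(vals, labels, bins) for name, vals in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy example with invented values: one informative and one uninformative feature.
labels = ["sit", "sit", "walk", "walk"]
columns = {
    "max_Ay": [0.1, 0.2, 0.9, 1.0],
    "noise": [0.1, 0.9, 0.2, 1.0],
}
ranked = rank_features(columns, labels)
```

Features at the bottom of such a ranking would be the candidates for removal, in the spirit of the Ranker search described in the text.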
With these selected feature sets, in this paper, we design a condition based ensemble as part of the proposed detailed activity recognition framework. This is detailed in the next subsection.

Classification of Detailed Activity
Training position selection is a crucial factor for recognition of detailed activity. Here, the training position selection process is done using several base classifiers. Data are collected for each position using a training device. Data collected for one position is supplied as the training set while data pertaining to all other positions are treated as a test set. Representative positions can be found in this way based on accuracy of activity recognition. This is detailed in the experimental setup discussed in the next section.
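The selection procedure described above (train on one position, test on all others, compare accuracies) can be sketched as follows. A nearest-centroid classifier is used here purely as a stand-in for the base classifiers evaluated in the experiments, and the single-feature toy data are invented for illustration.

```python
from collections import defaultdict


def train_centroids(instances):
    """Stand-in classifier: the mean feature vector of each activity label."""
    grouped = defaultdict(list)
    for features, label in instances:
        grouped[label].append(features)
    return {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(features, centroids[label])),
    )


def accuracy(centroids, instances):
    hits = sum(predict(centroids, x) == y for x, y in instances)
    return hits / len(instances)


def cross_position_accuracy(by_position):
    """Train on each position in turn and test on the data of all other positions."""
    results = {}
    for train_pos, train_data in by_position.items():
        model = train_centroids(train_data)
        test_data = [
            inst
            for pos, data in by_position.items()
            if pos != train_pos
            for inst in data
        ]
        results[train_pos] = accuracy(model, test_data)
    return results


# Invented toy data: one feature per instance, two activities, three positions.
by_position = {
    "SPT": [((0.0,), "sit"), ((1.0,), "walk")],
    "RPT": [((0.1,), "sit"), ((0.9,), "walk")],
    "Hand": [((0.9,), "sit"), ((0.7,), "walk")],
}
results = cross_position_accuracy(by_position)
```

Positions whose models transfer well to the other positions would be kept as representative training positions; the toy numbers here only demonstrate the mechanics and say nothing about the real dataset.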
It is difficult to identify individual activities with reasonable accuracy using a single classifier ($C_i$) with default parameters. As different devices are configured with different sensors of varying sensitivities, the sensor readings for the same activity differ from one device configuration to another, as shown in Figure 2. The position where the smartphone is kept also influences the sensor readings (due to the change in device orientation with respect to the body), as reflected in Figure 3. Hence, keeping a device ($d_i$) at a specific position ($p_i$) for data collection is designated as a condition, denoted by $d_i - p_i$. Data are collected for different conditions. Specifically, for each training device, data are collected keeping the phone at each of the representative positions. A base classifier is applied to the feature set obtained for each of the conditions. However, parameter tuning is needed for individual conditions even with a selected base classifier. Moreover, including every possible condition in one classifier while retaining its power of generalization is not feasible. Instead, classifiers can be individually tuned to effectively classify data for each training condition so that, when one specific classifier fails to achieve the desired result, an ensemble of such classifiers may prove to be a reasonable choice. The ensemble model is a combination of several condition based classifiers ($C_1, C_2, \ldots, C_k$) intended to increase prediction performance. Here, we use Logistic Regression (LR) [10] as the base classifier and take individual training datasets, $ds_{d_i p_i}$ (each collected from a device kept at a representative position), to train each of the classifiers with different parameter values. Hence, each individual classifier ($C_i$) is tuned to effectively classify data collected in a specific condition. The $k$ trained classifiers are represented by $C' = \{C_1, C_2, C_3, \ldots, C_k\}$.
Given a test set (that may be collected using a different device kept at RPT/SPT or held in hand), each $C_i$ classifies the instances of the test set. Then, $C'$ performs weighted majority voting over the decisions made by the individual classifiers to come up with a decision for each of the test instances. The relative performance accuracy for $C'$ is computed using a cost function $R' : C' \times DS' \to W$, where $W = \{w_1, w_2, \ldots, w_k\}$. $R'$ computes the relative weights for the classifiers based on their performance accuracy for each activity, and these weights are represented by $w_i$. Each weight is defined in terms of $accuracy(C_i)$, the classification accuracy of the $i$th classifier, which is obtained experimentally on a training dataset. Thus, the problem of detailed human activity recognition is to form a $k$-condition based ensemble classifier $EC : DS'' \times C' \to A$, where $EC$ performs weighted majority voting over $C'$ using $W$ on the test dataset $DS''$ and returns the activity being performed. The weighted majority voting scheme thus assigns each test instance the class predicted by the weighted majority of the classifiers, as shown in Figure 6. The accuracy of the ensemble $EC$ is then computed by comparing the predictions with the true labels.
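Weighted majority voting over condition based classifiers can be sketched as below. The weighting rule is an assumption: each weight is taken as the classifier's training accuracy divided by the sum of all accuracies, and a single weight per classifier is used rather than one per activity. The function names and the accuracy values are illustrative.

```python
from collections import defaultdict


def relative_weights(accuracies):
    """Assumed weighting: each accuracy normalized by the sum of all accuracies."""
    total = sum(accuracies)
    return [a / total for a in accuracies]


def weighted_majority_vote(predictions, weights):
    """Return the label with the largest total weight among the classifier votes."""
    tally = defaultdict(float)
    for label, weight in zip(predictions, weights):
        tally[label] += weight
    return max(tally, key=tally.get)


def ensemble_predict(classifiers, weights, instance):
    """Apply every condition based classifier to one instance and combine the votes."""
    votes = [clf(instance) for clf in classifiers]
    return weighted_majority_vote(votes, weights)


# Three stand-in classifiers (plain functions) with invented training accuracies.
classifiers = [
    lambda x: "Slow walk",
    lambda x: "Brisk walk",
    lambda x: "Brisk walk",
]
weights = relative_weights([0.9, 0.6, 0.5])

decision = ensemble_predict(classifiers, weights, instance=(0.4, 0.2))
```

With these toy numbers the two weaker classifiers together outweigh the strongest one, which shows how the vote differs from simply trusting the single most accurate classifier.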
The basic block diagram of this detailed activity recognition framework is illustrated in Figure 6 where data are collected for k different conditions and, correspondingly, the selected base classifier is tuned for each of the k feature sets to form k condition based classifiers. We have considered two types of arrangements for collecting datasets-(i) using smartphone accelerometer and (ii) using both accelerometers and gyroscopes. Experimentations are conducted in these two modes to identify the role of each sensor for detailed activity recognition. The experimental results for validating the proposed framework are detailed in the next section.

Performance Evaluation
In this section, the performance of the proposed framework is evaluated on real data collected from five smart handheld devices (D1-D5) for 10 users. The devices are kept in three positions on/around the body, namely, SPT, RPT and Hand. An Android application, Sensor Kinetics Pro [45], collects the embedded tri-axial accelerometer and gyroscope sensor data for six detailed daily activities: Sit on chair, Sit on floor, Slow walk, Brisk walk, Lying left and Lying right. The static fine grained activities are found to depend on the user's posture. For instance, the activities "Sit on floor" and "Sit on chair" are not affected by the hardness of the surface of the chair or floor; data were collected on chairs both with and without cushions. What separates these two similar activities is the posture in which the user sits and the relative position of the rest of the body parts in that posture. On average, the subjects took at least 55-65 steps per minute when "walking slow" and a minimum of around 105-110 steps per minute when "walking briskly". A user carried out each of these activities for 3-4 min while keeping each device at each of the three positions considered. The phone is held upright in each of these three positions while collecting the dataset. Each dataset contains around 54,000 accelerometer and gyroscope records. The experimental setup is detailed in Table 3. After removing the low-frequency acceleration (gravity) and noise, the preprocessed data are grouped into overlapping windows and features are calculated from the acceleration and gyroscope values using Matlab R2013 (MathWorks, Natick, MA, United States) [46]. The Weka 3.7 tool [47] is used for applying classification algorithms, with the default parameters of the classifiers changed as necessary.
Initially, experiments are conducted to find representative training positions. If the classifiers can accurately identify several activities even when the training and test positions of the smartphone are different, then that training position is considered to be a representative position. Two training devices, D1 and D2, are considered (Table 3) to keep the training set minimal while showing the effectiveness of the ensemble on three different test devices. However, the ensemble would work for any number of devices. Details of the experiments for selecting the base classifier for the condition based ensemble are provided, followed by experiments showing the effectiveness of the ensemble with respect to device and position independence. The ensemble is first applied to the accelerometer-only dataset and then to the combined accelerometer and gyroscope dataset. Finally, the overall performance of the framework is verified.

Training Position Selection
The main objective of this section is to verify whether the classifiers can identify different activities when the smartphone is kept at one position to collect training data while test data are collected by keeping it at another position. The following state-of-the-art classifiers are applied: Bayesian Network (BN), Decision Tree (J48), a lazy learner (k-Nearest Neighbor, IBk), an ensemble learner (bagging) and Logistic Regression (LR). LR uses the logistic (sigmoid) function. Bagging, which stands for Bootstrap Aggregation, helps to reduce variance and avoid overfitting.
The results are shown in Figure 7. We consider three positions (SPT, RPT, Hand); each position is used in turn as the training position, with the other positions serving as test positions. For instance, if SPT is the training position, then RPT and Hand are the test positions. Several classification algorithms are applied to the selected feature set extracted from the training and test datasets. From Figure 7a, it can be observed that, when a device is kept at SPT for training and test data are collected by keeping it in RPT or Hand, all the state-of-the-art classifiers show comparable accuracy. However, it becomes difficult to recognize activities when the device is kept in Hand while collecting training data, as shown in Figure 7c. Training with the device held in hand is the most challenging case, as the way a device is held varies from person to person, and other factors, such as how much a person gestures, also affect the sensor values. Hence, it is the position for which we obtained the lowest relative accuracy in learning the various fine-grained activities. Thus, SPT and RPT are found to be the two representative training positions for collecting training datasets, as shown in Figure 7a,b. The default parameters of the classifiers are tuned for accuracy as detailed in Table 4.
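The cross-position protocol described above (train on one position, test on each of the others) can be sketched as below. This is a hedged illustration only: a 1-nearest-neighbour rule stands in for the Weka classifiers, the function names are hypothetical, and the feature values are synthetic numbers invented for the example, not measurements from the study.

```python
def nearest_neighbor_predict(train, x):
    """1-NN on feature tuples; train is a list of (features, label) pairs."""
    closest = min(train, key=lambda fl: sum((a - b) ** 2 for a, b in zip(fl[0], x)))
    return closest[1]

def accuracy(train, test):
    """Fraction of test instances whose predicted label matches the true label."""
    hits = sum(nearest_neighbor_predict(train, x) == y for x, y in test)
    return hits / len(test)

def cross_position_accuracy(data):
    """data maps position -> list of (features, label).
    Returns {(train_position, test_position): accuracy} for all distinct pairs."""
    return {
        (tr, te): accuracy(data[tr], data[te])
        for tr in data for te in data if tr != te
    }

# Synthetic one-feature toy data per position (values are made up).
data = {
    "SPT":  [((0.1,), "Sit on chair"), ((1.0,), "Slow walk")],
    "RPT":  [((0.2,), "Sit on chair"), ((1.1,), "Slow walk")],
    "Hand": [((0.6,), "Sit on chair"), ((0.7,), "Slow walk")],
}
results = cross_position_accuracy(data)
```

Each key of `results` is a (training position, test position) pair, so a position can be judged representative by inspecting the entries in which it appears first.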

Base Classifier Selection
The base classifier for the ensemble is selected from the state-of-the-art classifiers Bayesian network, k-nearest neighbor, LR, multilayer perceptron and decision tree. The "no free lunch theorem" for optimization states that no optimization technique (algorithm/heuristic/meta-heuristic) is the best for the generic case and all special cases (specific problems/categories) [49]. Thus, we find the classifier that consistently performs best on our collected dataset and select it as the base classifier for the ensemble. Test set SPT-D5-Tst is used for classification with respect to two training datasets (SPT-D1 and RPT-D1), as shown in Figure 8. This experiment is also repeated with other datasets. From these experiments, it can be observed that LR consistently performs better than the other classifiers. Figure 7 also shows that LR performs well in most cases. Hence, LR is chosen as the base classifier for our proposed ensemble of condition based classifiers. The main benefit of LR is that it is simple and its cost function is convex, so optimization converges to the global minimum. The maxIts parameter of LR is tuned for the individual condition based classifiers, within the range 20 to 40, in order to get stable output.
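To illustrate how an iteration cap interacts with the convex logistic cost, the following minimal sketch trains a binary logistic regression by plain batch gradient descent. It is not Weka's Logistic implementation, which uses a different optimizer and handles multiple classes; `max_iters` is only a loose analogue of maxIts, and the learning rate and toy data are assumptions.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, max_iters=40, lr=0.5):
    """Binary logistic regression by batch gradient descent.
    max_iters caps the number of full passes (assumed analogue of maxIts)."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(max_iters):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of the logistic loss w.r.t. the score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Return class 1 when the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

# Toy one-feature data (made up): negative values are class 0, positive class 1.
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
w, b = train_logistic(X, y, max_iters=40)
predictions = [predict(w, b, x) for x in X]
```

Because the cost is convex, every gradient step with a suitably small learning rate moves toward the same optimum, so the iteration cap mainly controls how far along that path training stops.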

Activity Classification Using Only an Accelerometer
Table 5 shows the classification accuracy of the condition based classifiers along with that of the majority voting ensemble when the dataset contains only accelerometer data. In most cases, the condition based ensemble classifier improves on the individual classifiers, by 3-20% as shown in Table 5 and Figure 9. When SPT-D5-Tst is the test dataset, accuracy increases from 75% to 90% with the ensemble. In this way, the framework becomes device independent. Reasonable accuracy can be achieved even when test data are collected by holding the device in hand (D4-Hand-Tst as test dataset). This makes the framework not only device independent but position (usage behavior) independent as well. If the test dataset comes from one of the training devices, the ensemble also provides better results: the overall classification accuracy is 91%, as shown in Figure 9. This is the case for the test dataset D1-SPT-Tst, which is collected from training device D1.

Figure 10 shows the performance of the ensemble in identifying individual activities. Most activities are classified by the ensemble with above 90% accuracy, which indicates the effectiveness of the majority voting scheme employed here. However, Sit on floor is not identified effectively by the ensemble, as all the condition based classifiers show only average classification accuracy for it. With accelerometer data alone, it is difficult to differentiate between two closely related static activities.
Hence, experiments are also conducted using both accelerometer and gyroscope sensors in the next subsection. The classification errors are also reported.
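The majority voting step over the condition based classifiers can be sketched as follows. This is a simplified, unweighted illustration: the three classifiers are hypothetical stand-ins with made-up thresholds on a single feature, and the tie-breaking rule is an assumption of the sketch.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the most classifiers.
    Ties go to the tied label that appears first in the list (assumed rule)."""
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label

def ensemble_predict(classifiers, features):
    """classifiers: callables mapping a feature vector to an activity label."""
    return majority_vote([clf(features) for clf in classifiers])

# Hypothetical condition based classifiers, each standing in for a model
# trained under one (position, device) condition; thresholds are made up.
spt_d1 = lambda f: "Brisk walk" if f[0] > 1.0 else "Slow walk"
rpt_d1 = lambda f: "Brisk walk" if f[0] > 1.4 else "Slow walk"
spt_d2 = lambda f: "Brisk walk" if f[0] > 0.8 else "Slow walk"

label = ensemble_predict([spt_d1, rpt_d1, spt_d2], [1.2])
```

Here two of the three stand-in classifiers vote for Brisk walk, so the ensemble returns that label even though one condition based classifier disagrees.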

Activity Classification Using Both an Accelerometer and Gyroscope
Experiments are conducted for device independent activity recognition with the ensemble classifier using data collected from both accelerometer and gyroscope, and the results are reported in this section. The overall experimental procedure is the same as in the previous section. The classification accuracy of the condition based classifiers, along with that of the majority voting ensemble, is shown in Table 6. When SPT-D5-Tst is the test dataset, accuracy increases from 90% to 93% with the ensemble, making the framework device independent. When the test dataset is collected from one of the training devices (D1-SPT-Tst), the overall classification accuracy increases from 88% to 94%.
Classification accuracies of the framework for individual activities are shown in Figure 11. Most of the activities are classified with better accuracy than when using an accelerometer alone. Sit on floor was not effectively detected when using only an accelerometer, as reflected in Figure 10, but is better recognised when gyroscope readings are added to the dataset (Figure 11).
The work in [22] is compared with ours on our collected training dataset D1-SPT and test dataset D5-SPT-Tst. The authors of [22] use the average of probabilities fusion method, which returns the mean of the probability distributions of the single classifiers. Multiple combinations of classifiers are tried to find the highest accuracy using an average of probabilities. Difference of Min and Max, Correlation, Root mean square, Average count of peak (AP), Variance of AP, Mean, and Standard deviation are computed as features in each window (128 samples), as detailed in [22]. A combination of three classifiers, MLP, Random Forest (RF), and Simple Logistic (SL), as mentioned in their paper, is used with the average of probabilities fusion method. Both training and test datasets contain only accelerometer readings. Our proposed ensemble is applied to the selected feature set detailed in Table 2, with the features extracted from the same datasets, that is, training dataset D1-SPT and test dataset D5-SPT-Tst. The results are reported in Table 7. Our proposed framework performs better, showing 90% accuracy with minimal features. The necessity of having a condition based classifier is also reflected in the table.

Figure 11. Classification accuracy (activity wise) for device independent activity recognition when D5 is used as a test device using condition-based and ensemble classifiers for both an accelerometer and a gyroscope.
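For contrast with majority voting, the average of probabilities fusion used in [22] can be sketched as below. The three probability distributions are invented for illustration and merely stand in for the outputs of MLP, RF and SL; they are not results from either paper.

```python
def average_of_probabilities(distributions):
    """distributions: one dict per classifier, mapping label -> probability.
    Returns (winning label, averaged distribution)."""
    labels = distributions[0].keys()
    avg = {
        label: sum(d[label] for d in distributions) / len(distributions)
        for label in labels
    }
    return max(avg, key=avg.get), avg

# Made-up class probabilities from three stand-in classifiers.
mlp = {"Sit on chair": 0.90, "Sit on floor": 0.10}
rf = {"Sit on chair": 0.45, "Sit on floor": 0.55}
sl = {"Sit on chair": 0.45, "Sit on floor": 0.55}

fused_label, fused = average_of_probabilities([mlp, rf, sl])
```

In this toy case, two of the three classifiers individually prefer Sit on floor, yet the averaged distribution favours Sit on chair because one classifier is very confident. This shows how the two fusion rules can disagree on the same inputs.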

Evaluating the Performance of the Proposed System
Confusion matrices for the ensemble classifier on test device D5, considering (a) only accelerometer values and (b) both accelerometer and gyroscope values, are shown in Figure 12. In case (a), the model mostly misclassifies Sit on floor as Sit on chair, and confuses Lying on left side with Lying on right side. The misclassification for these classes decreases in (b), as adding gyroscope values to the accelerometer values enables the model to better learn the finer differences between these similar detailed activity classes.

Performance of the proposed system is evaluated using the following error metric. Here, $E_i$ denotes the error of the $i$th instance and $E_{tot}$ denotes the total error for the individual dataset. These are calculated as follows:

$$E_i = \begin{cases} 0, & \text{if } \mathrm{label}(C_{test}) = \mathrm{label}(C_{classification}) \\ 0.5, & \text{if } \mathrm{label}(C_{test}) \neq \mathrm{label}(C_{classification}) \text{ and } \mathrm{CategoryLabel}(C_{test}) = \mathrm{CategoryLabel}(C_{classification}) \\ 1, & \text{otherwise} \end{cases} \tag{5}$$

$$E_{tot} = \frac{1}{N} \sum_{i=1}^{N} E_i$$

where $N$ is the total number of instances.
If the actual label is Brisk walk and the label is accurately predicted by the classifier, then $E_i$ is 0, according to Equation (5). If the predicted label is Slow walk instead of Brisk walk, then $E_i$ is 0.5, as both are dynamic activities. Otherwise, it is 1.
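The error metric described above can be computed with a few lines of code. The sketch below assumes the static/dynamic grouping of the six activities used in this paper and assumes that the total error is the sum of the per-instance errors divided by the number of instances N; the function names are hypothetical.

```python
# Category of each activity: static or dynamic.
CATEGORY = {
    "Sit on chair": "static", "Sit on floor": "static",
    "Lying left": "static", "Lying right": "static",
    "Slow walk": "dynamic", "Brisk walk": "dynamic",
}

def instance_error(actual, predicted):
    """0 for a correct label, 0.5 for a wrong label in the same category, else 1."""
    if actual == predicted:
        return 0.0
    if CATEGORY[actual] == CATEGORY[predicted]:
        return 0.5
    return 1.0

def total_error(actual_labels, predicted_labels):
    """Assumed normalization: sum of per-instance errors divided by N."""
    errors = [instance_error(a, p) for a, p in zip(actual_labels, predicted_labels)]
    return sum(errors) / len(errors)

actual = ["Brisk walk"] * 4
predicted = ["Brisk walk", "Slow walk", "Sit on chair", "Brisk walk"]
e_tot = total_error(actual, predicted)  # (0 + 0.5 + 1 + 0) / 4 = 0.375
```

The half penalty means that confusing Slow walk with Brisk walk costs less than confusing a walking activity with a sitting one, which matches the intent of the worked example in the text.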
The error of activity recognition ($E_{tot}$) for the test datasets using individual classifiers and the majority voting ensemble is shown in Figure 13. The majority voting ensemble produces an average error of only 5%, which is much lower than the errors of the individual classifiers. The average error decreases further to 2% when gyroscope readings are used along with accelerometer readings in the proposed framework, as shown in Figure 14.

In this way, we can identify the activities when the training and test device positions (usage behavior) are different. The system gives better accuracy when the training devices are D1 and D2 and the test device is D5. The condition based ensemble approach with selected minimum features is found to improve the overall system accuracy by 20% on average, as depicted in Table 5. Our proposed framework performs well with the ensemble, providing 90% accuracy using only an accelerometer for any device. The overall performance improves when data from both sensors (accelerometer and gyroscope) are utilized. Activity-wise accuracy is also found to increase.

Conclusions
In this work, we have proposed a framework for detailed activity recognition that classifies both static activities, like Sit on chair/floor and Lying left/right, and dynamic activities, like Slow walk and Brisk walk, irrespective of usage behavior and differing hardware configurations of smartphones. Feature extraction and selection improve the performance of the individual classifiers. The proposed weighted majority voting based ensemble of condition based classifiers performs detailed activity recognition with considerable accuracy (90%), better than an individual classifier, using only an accelerometer. SPT and RPT are found to be representative positions for keeping a training device while collecting data for activity recognition. The framework performs better when accelerometer and gyroscope sensors are used together, achieving 94% accuracy. Results are taken from 10 users using five devices. The solution is ubiquitous, as it uses only accelerometer and gyroscope sensors, which are widely available in smart handheld devices, and it requires neither a specific device nor a specific device orientation.
We plan to investigate in more detail how to train and test detailed activities when a device is held in hand, as this position is influenced by user gestures, dynamic activities, etc. As part of our future work, we also plan to combine datasets containing both coarse-grained and fine-grained activities and make them public.