A Novel Wearable Sensor-Based Human Activity Recognition Approach Using Artificial Hydrocarbon Networks

Human activity recognition has gained more interest in several research communities given that understanding user activities and behavior helps to deliver proactive and personalized services. There are many examples of health systems improved by human activity recognition. Nevertheless, the human activity recognition classification process is not an easy task. Different types of noise in wearable sensor data frequently hamper the human activity recognition classification process. In order to develop a successful activity recognition system, it is necessary to use stable and robust machine learning techniques capable of dealing with noisy data. In this paper, we present the artificial hydrocarbon networks (AHN) technique to the human activity recognition community. Our novel artificial hydrocarbon networks approach is suitable for physical activity recognition, tolerant to noise from corrupted sensor data and robust with respect to different sensor data issues. We show that the AHN classifier is very competitive for physical activity recognition and is very robust in comparison with other well-known machine learning methods.


Introduction
The interest in human activity recognition research has been growing in context-aware systems for different domain applications. Human activity recognition (HAR) deals with the integration of sensing and reasoning in order to better understand people's actions. Research related to human activity recognition has become relevant in pervasive and mobile computing, surveillance-based security, context-aware computing, health and ambient assistive living. Recognizing body postures and movements is especially important to support and improve health systems, as discussed below.
In their survey, Avci et al. [1] reviewed several medical applications of activity recognition for healthcare, wellbeing and sports systems. Regarding medical applications using HAR with wearable sensors, the authors report examples in the literature of healthcare monitoring and diagnosis systems; rehabilitation; systems to find correlations between movement and emotions; and child and elderly care. They also reviewed assisted living and home monitoring systems that improve the quality of life and ensure the health, safety and wellbeing of children, the elderly and people with cognitive disorders. The authors also state that numerous activity recognition systems using wearable sensors have been proposed for sports and leisure applications, for example: daily and sport activity recognition; detection of motion sequences in martial arts to increase interaction in video games or martial arts education; and monitoring of sport activities in order to train and track performance.
Preece et al. [2] reported activity classification systems to find links between common diseases and levels of physical activity. The authors also reviewed systems that provide information on daily activity patterns to improve the treatment and diagnosis of neurological, degenerative and respiratory disorders. Other reported systems quantify levels of physical activity providing feedback and motivating individuals to achieve physical activity goals. Guidoux et al. [3] presented an approach based on smartphone sensors for estimating energy expenditure recognizing physical activities in free-living conditions. In summary, health systems and assistive technologies can benefit from activity recognition and deliver personalized services.
Automated human activity recognition is a challenging task. Two main approaches are used to perform the task of activity recognition: vision-based and sensor-based activity recognition [4,5]. The vision-based approach is based on image processing of video sequences or digital visual data provided by cameras. No wearable or smartphone sensors are required, but it depends on the image quality. The quality of cameras, lighting and environments, among others, are factors that determine image quality. Visually monitoring the actor behavior entails privacy issues. The sensor-based approach is focused on activity monitoring using wearable [5], smartphone sensors and technologies [6] or object embedded sensors [7]. There are several drawbacks in these approaches: wearing sensors or smartphones for a long period of time is necessary, and there might be battery issues. However, the main problem when using sensor-based approaches is the different types of noise found in input features due to sensor errors or noisy environments. The output class can also have errors. Noise in data hampers the human activity recognition classification process.
Nettleton et al. [8] state that "machine learning techniques often have to deal with noisy data, which may affect the accuracy of the resulting data models." This statement is also true in the activity recognition classification process given the great variations in the types, number and positioning of sensors. Sensor characteristics also change across different subjects and for the same individual [2]. Therefore, in order to develop a successful activity recognition system, it is necessary to use stable and robust machine learning techniques capable of dealing with noisy data.
In this paper, we present a novel machine learning technique to the human activity recognition community: artificial hydrocarbon networks (AHN). Our artificial hydrocarbon networks approach is suitable for physical activity recognition, tolerant to noise from corrupted sensor data and robust with respect to different sensor data issues. To demonstrate these characteristics, a comparative analysis was performed against the supervised classification techniques most commonly used in the HAR community. The performance of the proposed AHN classifier was compared to fourteen supervised techniques frequently used in the activity recognition classification process and reviewed in the literature [2,9-11].
In order to evaluate the performance of the artificial hydrocarbon network-based classifier, four experiments were designed using the public Physical Activity Monitoring dataset (PAMAP2) [12,13]. The first experiment was done using the entire raw dataset. The second experiment was made performing a prior feature reduction using the recursive feature elimination (RFE) method. The third experiment evaluated noise tolerance in all supervised classifiers using three levels of noise: 7%, 15% and 30%. Noise was simulated with random insertion in input features of the testing set. Lastly, a majority voting across windows-based approach for an HAR system using the proposed method was implemented.
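As a rough illustration of the noise-tolerance experiment, the following sketch replaces a chosen fraction of the entries of a test feature matrix with random values. The function name `insert_random_noise`, the per-entry selection and the choice of drawing replacements uniformly within each feature's observed range are assumptions made for illustration only; the exact insertion scheme is not specified in this section.

```python
import numpy as np

def insert_random_noise(X_test, level, rng=None):
    """Replace a fraction `level` of the entries of X_test with random values.

    Assumption: each replacement is drawn uniformly from the observed range of
    its feature (column); the original insertion scheme may differ.
    """
    rng = np.random.default_rng(rng)
    X_noisy = np.array(X_test, dtype=float)  # np.array copies by default
    n_corrupt = int(round(level * X_noisy.size))
    # Choose distinct (row, column) positions to corrupt.
    flat_idx = rng.choice(X_noisy.size, size=n_corrupt, replace=False)
    rows, cols = np.unravel_index(flat_idx, X_noisy.shape)
    lo, hi = X_noisy.min(axis=0), X_noisy.max(axis=0)
    X_noisy[rows, cols] = rng.uniform(lo[cols], hi[cols])
    return X_noisy

# Hypothetical test set: 200 samples with 10 features.
X = np.random.default_rng(0).normal(size=(200, 10))
for level in (0.07, 0.15, 0.30):
    X_noisy = insert_random_noise(X, level, rng=1)
    print(level, np.mean(X_noisy != X))
```

Only the testing set is corrupted in this sketch, so a classifier trained on clean data can be evaluated at each noise level.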
Our results show that the AHN classifier is very competitive for physical activity recognition and is very robust in comparison with the other methods. In addition, this paper contributes a benchmark of fifteen supervised machine learning methods in the human activity recognition field, comparing them in terms of accuracy, macro- and micro-averaged sensitivity, precision and F1-score, and training time, and contrasting the experimental results with recent literature. Notice that the proposed method does not work in real time, and the introduction of artificial hydrocarbon networks in real-time HAR systems is out of the scope of this work.
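The macro- and micro-averaged metrics named above can be computed from a multi-class confusion matrix. The sketch below is illustrative: the function name `averaged_metrics`, the layout (rows are true classes, columns are predicted classes) and the choice of computing each averaged F1-score from the averaged precision and sensitivity are assumptions, not a description of the exact procedure used in the benchmark.

```python
import numpy as np

def averaged_metrics(conf):
    """Accuracy plus macro- and micro-averaged precision, sensitivity and F1.

    conf[i, j] counts samples of true class i predicted as class j (assumed layout).
    """
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)
    fp = conf.sum(axis=0) - tp  # predicted as the class, but belonging elsewhere
    fn = conf.sum(axis=1) - tp  # belonging to the class, but predicted elsewhere

    def ratio(num, den):
        # Element-wise num / den, with 0 / 0 defined as 0.
        num = np.asarray(num, dtype=float)
        den = np.asarray(den, dtype=float)
        return np.where(den > 0, num / np.where(den > 0, den, 1.0), 0.0)

    def f1(p, s):
        return float(ratio(2 * p * s, p + s))

    # Macro: average the per-class ratios. Micro: pool the counts first.
    macro_p = float(ratio(tp, tp + fp).mean())
    macro_s = float(ratio(tp, tp + fn).mean())
    micro_p = float(ratio(tp.sum(), (tp + fp).sum()))
    micro_s = float(ratio(tp.sum(), (tp + fn).sum()))
    return {
        "accuracy": float(tp.sum() / conf.sum()),
        "macro": {"precision": macro_p, "sensitivity": macro_s, "f1": f1(macro_p, macro_s)},
        "micro": {"precision": micro_p, "sensitivity": micro_s, "f1": f1(micro_p, micro_s)},
    }

# Hypothetical two-class confusion matrix.
print(averaged_metrics([[5, 1], [2, 2]]))
```

For single-label classification, the micro-averaged precision and sensitivity both coincide with accuracy, so the macro-averaged values are the ones that reveal weak performance on individual activities.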
The rest of the paper is organized as follows. Section 2 describes the state-of-the-art in sensor-based human activity recognition and discusses noise in the classification process. Then, Section 3 introduces the artificial hydrocarbon networks technique as a supervised learning method, and Section 4 describes our proposal for using an AHN-based classifier in human activity recognition. In order to evaluate the proposed classifier, a case study in physical activity monitoring is presented and described in Section 5. In addition, Section 6 presents the results and a discussion of the proposal, as well as a comparison with fourteen supervised classifiers used in HAR. Lastly, Section 7 concludes the paper and highlights future work in this context.

Sensor-Based Human Activity Recognition
Recognizing a human activity in a wearable sensor-based approach means that: (i) the activity is present in the physical environment; (ii) sensors are able to provide a reliable representation of the physical parameters of the environment affected by the activity; and (iii) a classification algorithm accurately recognizes an activity [14]. In that sense, this work is focused on the last component of the wearable sensor-based human activity recognition approach.
Many learning methods have been used in recent years for human activity recognition. Several reviews have been published analyzing the performance of different classifiers in the human activity recognition research area for applications in home care systems, surveillance, physical therapy and rehabilitation, and sports improvement, among others. The literature reports several surveys and comparisons for sensor-based human activity recognition, like in [1,2,4,9,11,15-17], and surveys for vision-based human activity recognition can be found in [18,19].
Since our work mainly focuses on sensor-based human activity recognition and data-driven approaches, this section reviews related works regarding the stability and robustness of machine learning techniques when confronted with the task of human activity recognition. Thus, noise in the human activity recognition classification process is discussed first. Subsequently, related works on machine learning techniques used for human activity recognition are reviewed.

Noise in the Human Activity Recognition Classification Process
A classification process must be done in order to recognize human activity given that the activity is present and wearable sensors reliably represent physical parameters affected by the activity. The goal of the classification task for human activity recognition is to interpret the features of physical parameters and perform a correct classification of the activity [14].
Noisy data are often provided in machine learning processes, making it more difficult to obtain accurate models for real problems [8]. Different types of noise can also be found in the human activity recognition classification process. Input features may have noise for several reasons, such as: (1) sensor miscalibration; (2) dead or blocked sensors; (3) errors in sensor placement; (4) activities registered in noisy environments; (5) activities interleaved, so that the events are not only related to one activity. Classification labels in the output class need human intervention, and it is therefore likely to have errors, as well. As in other classification problems, noise can be located in training and/or test data.
It is difficult to measure the impact of each type of noisy data in the classification process. Nettleton et al. [8] reviewed works that studied the impact of noise for several learners and presented a comparison of the effect of attribute and class noise on models created by naive Bayes, C4.5 decision tree, an instance-based algorithm, and support vector machines. They compared the techniques' performance with thirteen classification problems (activity recognition is not included). In the latter work, the authors showed that naive Bayes is relatively more robust to noisy data than the other three techniques, and SVM presented the poorest performance. In this regard, we agree with Nettleton et al. [8] on two statements:
1. Developing learning techniques that effectively and efficiently deal with noisy types of data is a key aspect in machine learning.
2. There is a need for comparisons of the effect of noise on different learning paradigms.
These two statements are also pertinent for the human activity recognition domain.

Machine Learning Techniques Used for Human Activity Recognition
The growing interest in human activity recognition and the great advances in sensor technologies create the necessity for developing robust machine learning systems. Applications in the field of activity recognition need to deal with a large number of multimodal sensors that provide high-dimensional data with large variability; thus, data may be missing, and labels can be unreliable.
Recently, some efforts have been made to promote the development of robust machine learning techniques, especially in the domain of activity recognition. The workshop on robust machine learning techniques for human activity recognition is one example of these efforts [20]. An overview of activity recognition describing the major approaches, methods and tools associated with vision and sensor-based recognition was presented by Chen et al. [4]. The authors also made the distinction between data-driven and knowledge-driven approaches. The sensor-based approach is focused on activity monitoring using wearable or smartphone sensors and technologies, while the vision-based approach requires image processing of video sequences or digital visual data provided by cameras [21].
Preece et al. [2] present an introduction and research review of different machine learning techniques used for human activity recognition and their failures. The authors discuss findings and results obtained with the following learning techniques used in activity classification: threshold-based classification, hierarchical methods, decision trees, k-nearest neighbors, artificial neural networks, support vector machines, naive Bayes and Gaussian mixture models, fuzzy logic, Markov models, combined classifiers and some unsupervised learning methods. They made a summary of studies comparing different classifiers and an overview of the advantages and drawbacks of each of the aforementioned methods. Their comparison includes the number and type of activities classified, accelerometer placements and inter-subject classification accuracy. From this overview, we extract and highlight the following statements [2]:
• "The variability in activities, sensors and features means that it is not possible to directly compare classification accuracies between different studies."
• "... there is no classifier which performs optimally for a given activity classification problem."
• "... there is a need for further studies investigating the relative performance of the range of different classifiers for different activities and sensor features and with large numbers of subjects."
Regarding noise, Preece et al. [2] only mentioned wavelet analysis techniques for suppressing noise, but they did not mention anything about the classifier's robustness or stability.
Dohnálek et al. [11] present a comparison, only in terms of accuracy, of several classifiers: two orthogonal matching pursuit techniques, k-nearest neighbors, classification and regression tree (CART) techniques and global merged self-organizing maps. Their dataset contains data from sensors that measure temperature and 3D data from the accelerometer, gyroscope and magnetometer of nine healthy human subjects. Their results confirm that a compromise between speed and accuracy must be made given that the best classifiers are slower than the worst. It is important to notice that only a brief discussion of time complexity was presented, and no discussion regarding the robustness of the classifiers was provided.
Lara et al. presented a summary of classification algorithms used in human activity recognition systems in their survey [9]. They discussed the advantages and limitations of different types of classifiers: decision trees, Bayesian, instance-based, artificial neural networks, domain transform, fuzzy logic, regression methods, Markov models and classifier ensembles. In that survey, the authors did not mention the impact of noise in the process of activity recognition; however, Lara presented experiments addressing this impact in his dissertation [22]. He induced noise by arbitrarily modifying the labels in the dataset to assess the effectiveness of the proposed probabilistic strategies. His results show that some classification algorithms are more tolerant to noise than others.
Lustrek et al. [23] compared the performance of eight machine learning techniques in fall detection and activity recognition. They added Gaussian noise to their input recordings of body tags attached to the shoulders, elbows, wrists, hips, knees and ankles. They presented classification accuracy results for clean and noisy data in support vector machines, random forest, bagging and AdaBoost classifiers. The best accuracy (support vector machines) was 97.7% on clean data and 96.5% on noisy data.
Ross et al. [24] presented a comparative analysis of the robustness of naive Bayes, support vector machines and random forest methods for activity recognition with respect to sensor noise. The authors performed experiments with collections of test data with random insertions, random deletions and dead sensors. They simulated miscalibrated and dead sensors. Random forest models outperformed the other methods in all of their experiments. In their brief study, the three chosen methods were consistent in their relative performance.
To this end, the Opportunity Activity Recognition Challenge was set to provide a common platform to allow the comparison of different machine learning algorithms on the same conditions. Chavarriaga et al. [25] presented the outcome of this challenge. They reported the performance of the following standard techniques over several subjects and recording conditions: k-nearest neighbors, nearest centroid classifier, linear discriminant analysis and quadratic discriminant analysis. One of the subjects had different sensor configurations and noisy data. The dataset used for the challenge is a subset of the one presented by Roggen et al. in [10]. These efforts provide a method of comparison of machine learning techniques using common benchmarks.

Artificial Hydrocarbon Networks as a Supervised Learning Method
Nature-inspired computing promotes methodologies, techniques and algorithms focusing on the computation that takes place in nature [26]. Moreover, in machine learning, heuristic- and meta-heuristic-based methods have been widely explored in order to efficiently tackle real-life problems that are difficult to solve due to their high complexity and the limited resources available to analyze and extract experience from them [26]. Recent works have introduced artificial hydrocarbon networks as a supervised learning algorithm [27], which we use as a classifier for human activity recognition. Thus, this section briefly describes the high-level framework of artificial hydrocarbon networks, called artificial organic networks, and then gives a full description of the artificial hydrocarbon networks algorithm and its characteristics.

Artificial Organic Networks
The artificial organic networks (AON) technique is a machine learning framework that is inspired by chemical organic compounds [27], such that all definitions and heuristics are based on chemical carbon networks. Currently, this technique proposes two representations of artificial organic compounds: a graph structure representing physical properties and a mathematical model behavior representing chemical properties.
The main characteristic of the AON framework is that it packages information into modules, so-called molecules [27]. Similar to chemical organic compounds, artificial organic networks define heuristic mechanisms for generating organized and optimized structures based on chemical energy. In a nutshell, artificial organic networks allow [27]: modularity, inheritance, organizational and structural stability.
Currently, artificial organic networks define a framework for developing useful learning algorithms that inherit from it [27], as shown in Table 1. Reading bottom-up, the first level of this framework defines the basic units that can be used in the machine learning algorithm, and the second level is related to the interactions among components to compute nonlinear relationships. Then, the third level of the framework refers to the chemical heuristic rules that control the interactions over components. These three levels are also mathematically modeled in terms of their structure and functionality, and lastly, the implementation level considers training learning models and then inferring from them [27,28]. Detailed information on the AON framework can be found in [27,28].

Artificial Hydrocarbon Networks Algorithm
The artificial hydrocarbon networks (AHN) algorithm is a supervised learning algorithm with a graphical model structure inspired by chemical hydrocarbon compounds [27]. Similar to chemical hydrocarbon compounds, artificial hydrocarbon networks are composed of hydrogen and carbon atoms that can be linked with at most one and four other atoms, respectively. These atomic units interact among themselves to produce molecules. Particular to this method, the basic unit with information is the CH-molecule. It is made of two or more atoms linked to each other in order to define a mathematical function ϕ centered on the carbon atom and parameterized with the hydrogen-based values attached to it, as shown in Equation (1); where ϕ ∈ ℝ represents the behavior of the CH-molecule, σ is a real value called the carbon value, H_i ∈ ℂ is the i-th hydrogen atom linked to the carbon atom, k represents the number of hydrogen atoms in the molecule and x is the input to that molecule [27,29,30].
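To make the role of the carbon value and the hydrogen values concrete, the sketch below evaluates a CH-molecule under the assumption that its behavior takes the product form ϕ(x) = σ ∏ (x − H_i) for i = 1, ..., k with k ≤ 4. This product form and the function name `ch_molecule` are assumptions of the sketch and should be checked against Equation (1) in [27].

```python
import numpy as np

def ch_molecule(x, sigma, hydrogens):
    """Evaluate an assumed CH-molecule behavior: sigma * prod_i (x - H_i)."""
    hydrogens = np.asarray(hydrogens, dtype=complex)
    if not 1 <= hydrogens.size <= 4:
        raise ValueError("a carbon atom links to between one and four hydrogen atoms")
    x = np.asarray(x, dtype=complex)
    # Broadcast the inputs against the k hydrogen values and multiply the factors.
    return sigma * np.prod(x[..., np.newaxis] - hydrogens, axis=-1)

# Hypothetical CH2 molecule with a complex-conjugate pair of hydrogen values.
print(ch_molecule([0.0, 1.0, 2.0], sigma=1.0, hydrogens=[1j, -1j]))
```

Under this assumed form, complex hydrogen values that come in conjugate pairs yield a real-valued output for real inputs, which is consistent with ϕ being real-valued.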
If a CH-molecule is unsaturated (i.e., k < 4), then it can be joined together with other CH-molecules, forming chains of molecules, so-called artificial hydrocarbon compounds. In [29-31], the authors suggest using saturated and linear chains of molecules like in Equation (2); where CH_k represents a CH-molecule with k hydrogen atoms associated with it, and the line symbol represents a simple bond between two molecules. Notice that outer molecules are CH_3, while inner molecules are CH_2.
CH_3 − CH_2 − ⋯ − CH_2 − CH_3    (2)
Artificial hydrocarbon compounds also have an associated function ψ representing their behavior. For instance, the piecewise compound behavior [27] ψ ∈ ℝ can be expressed as Equation (3); where L_t represents the t-th bound that limits the action of a CH-molecule over the input space. In that sense, if the input domain is the interval x ∈ [L_min, L_max], then L_0 = L_min and L_n = L_max, and the j-th CH-molecule acts over the interval [L_{j−1}, L_j], for all j = 1, ..., n.
ψ(x) = ϕ_j(x) for L_{j−1} ≤ x ≤ L_j, j = 1, ..., n    (3)
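The piecewise behavior described above can be sketched as a lookup of the molecule responsible for each input, followed by an evaluation of that molecule. The function name `compound_behavior`, the representation of molecules as Python callables, the assignment of a shared bound to the molecule on its right and the handling of inputs outside [L_0, L_n] are assumptions made for illustration.

```python
import numpy as np

def compound_behavior(x, bounds, molecules):
    """Evaluate psi(x), where the j-th molecule acts on [L_{j-1}, L_j]."""
    bounds = np.asarray(bounds, dtype=float)
    if len(molecules) != bounds.size - 1:
        raise ValueError("n molecules need n + 1 bounds")
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # Zero-based index of the molecule acting on each input. Inputs outside
    # [L_0, L_n] are handled by the nearest outer molecule (an assumption).
    j = np.clip(np.searchsorted(bounds, x, side="right") - 1, 0, len(molecules) - 1)
    out = np.empty_like(x)
    for idx, phi in enumerate(molecules):
        mask = j == idx
        out[mask] = phi(x[mask])
    return out

# Hypothetical compound with two molecules acting on [0, 1] and [1, 2].
molecules = [lambda x: x, lambda x: 10.0 + 0.0 * x]
print(compound_behavior([0.5, 1.5, 2.0], bounds=[0.0, 1.0, 2.0], molecules=molecules))
```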
To obtain the bounds L_t for all t = 0, ..., n, a distance r between two adjacent bounds, i.e., [L_{t−1}, L_t], is computed as in Equation (4); where r represents the intermolecular distance between two adjacent molecules. In addition, ∆r is computed using a gradient descent method based on the energy of the adjacent molecules (E_{j−1} and E_j) like in Equation (5), where 0 < η < 1 is a learning rate parameter [27,28,31]. For implementability, the energy of molecules can be computed using a loss function [27].
r = r + ∆r    (4)
Lastly, artificial hydrocarbon compounds can interact among themselves in definite ratios forming a mixture S ∈ ℝ. For this method, weights are called stoichiometric coefficients, and they are represented as elements α_i ∈ ℝ, as shown in Equation (6); where c is the number of compounds in the mixture [27]. For this work, the artificial hydrocarbon networks structure considers one compound, such that c = 1 and S(x) = ψ_1(x).
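The update of intermolecular distances and the resulting bounds can be illustrated with a short sketch. It applies r = r + ∆r from Equation (4) and assumes, for illustration only, that ∆r_j = −η (E_{j−1} − E_j) with E_0 taken as zero, and that the distances are rescaled so that L_0 = L_min and L_n = L_max. These two assumptions and the function names are not taken from this section and should be checked against Equation (5) and [27,28].

```python
import numpy as np

def update_distances(r, energies, eta=0.1):
    """Apply r <- r + delta_r (Equation (4)).

    Assumption: delta_r_j = -eta * (E_{j-1} - E_j), with E_0 taken as 0.
    """
    r = np.asarray(r, dtype=float)
    energies = np.asarray(energies, dtype=float)
    previous = np.concatenate(([0.0], energies[:-1]))  # E_{j-1} for each j
    delta_r = -eta * (previous - energies)
    return r + delta_r

def bounds_from_distances(r, l_min, l_max):
    """Build the bounds L_0, ..., L_n from n intermolecular distances.

    Assumption: distances are made positive and rescaled to span [l_min, l_max].
    """
    widths = np.abs(np.asarray(r, dtype=float)) + 1e-12  # avoid a zero total width
    widths = widths / widths.sum() * (l_max - l_min)
    bounds = l_min + np.concatenate(([0.0], np.cumsum(widths)))
    bounds[-1] = l_max  # remove accumulated rounding error
    return bounds

# Hypothetical compound with two molecules and their current energies.
r_new = update_distances([1.0, 1.0], energies=[2.0, 1.0], eta=0.5)
print(r_new, bounds_from_distances(r_new, 0.0, 10.0))
```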
Formally, an artificial hydrocarbon network is a mixture of artificial hydrocarbon compounds (see Figure 1), each one obtained using a chemical-based metaheuristic rule. The training algorithm is known as the AHN-algorithm [28-30]; and for this work, the AHN-algorithm was reduced to Algorithm 1 for saturated and linear hydrocarbon compounds. Notice that Algorithm 1 reflects the restrictions about a saturated linear chain of molecules and a piecewise compound behavior imposed for this work. For a detailed description of the general AHN-algorithm, see [27]; and for the implementability, see [28]. In addition, a numerical example of training and testing AHN is summarized in Appendix A.
Figure 1. Structure of an artificial hydrocarbon network using saturated and linear chains of molecules [29]. For this work, the topology of the proposed classifier considers just one hydrocarbon compound (see Section 4).

Algorithm 1 AHN algorithm for saturated and linear hydrocarbon compounds.
Input: a training dataset Σ = (x, y), the number of molecules in the compound n ≥ 2 and a tolerance value ε > 0.
Output: the saturated and linear hydrocarbon compound AHN.
Initialize an empty compound AHN = {}.
Create a new saturated linear compound C like Equation (2).
Randomly initialize intermolecular distances r_t.
while |y − ψ| > ε do
    Determine all bounds L_t of C using r_t.
    for each j-th molecule in C do
        Determine all parameters of behavior ϕ_j in Equation (1) using an optimization process.
    end-for
    Build the compound behavior ψ of C using Equation (3).
    Update intermolecular distances using Equations (4) and (5).
end-while
Update AHN with C and ψ.
return AHN
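The steps of Algorithm 1 can be sketched for a one-dimensional input as follows. This is a minimal illustration under several assumptions that are not taken from the paper: each molecule is fitted as a least-squares polynomial (degree 3 for the outer CH_3 molecules, degree 2 for the inner CH_2 molecules) using `numpy.polyfit` as the optimization process; the energy of a molecule is its sum of squared errors; the stopping test uses the mean absolute error; the distance update is ∆r_j = −η (E_{j−1} − E_j) with E_0 = 0; an iteration cap is added; and the best compound seen is returned. The names `train_ahn` and `predict_ahn` are hypothetical.

```python
import numpy as np

def _bounds(r, lo, hi):
    # Distances are made positive and rescaled so that L_0 = lo and L_n = hi.
    w = np.abs(r) + 1e-12
    b = lo + np.concatenate(([0.0], np.cumsum(w / w.sum() * (hi - lo))))
    b[-1] = hi
    return b

def _segment(x, bounds):
    # Zero-based index of the molecule acting on each input.
    return np.clip(np.searchsorted(bounds, x, side="right") - 1, 0, bounds.size - 2)

def train_ahn(x, y, n=2, eps=1e-3, eta=0.05, max_iter=50, seed=0):
    """Train one saturated linear compound with n >= 2 molecules (sketch)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    degrees = [3] + [2] * (n - 2) + [3]      # CH3 - CH2 - ... - CH2 - CH3
    r = rng.uniform(0.5, 1.5, size=n)        # random intermolecular distances
    best = None
    for _ in range(max_iter):                # iteration cap (assumption)
        bounds = _bounds(r, x.min(), x.max())
        seg = _segment(x, bounds)
        coeffs, energies, pred = [], np.zeros(n), np.zeros_like(y)
        for j in range(n):
            m = seg == j
            if m.sum() > degrees[j]:
                c = np.polyfit(x[m], y[m], degrees[j])   # least-squares fit
            else:                                        # too few samples
                c = np.array([y[m].mean()]) if m.any() else np.zeros(1)
            coeffs.append(c)
            pred[m] = np.polyval(c, x[m])
            energies[j] = np.sum((y[m] - pred[m]) ** 2)  # energy as a loss
        error = float(np.mean(np.abs(y - pred)))
        if best is None or error < best["error"]:
            best = {"bounds": bounds, "coeffs": coeffs, "error": error}
        if error <= eps:
            break
        previous = np.concatenate(([0.0], energies[:-1]))
        r = r - eta * (previous - energies)              # r <- r + delta_r
    return best

def predict_ahn(model, x):
    """Evaluate the piecewise compound behavior of a trained model."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    seg = _segment(x, model["bounds"])
    out = np.empty_like(x)
    for j, c in enumerate(model["coeffs"]):
        out[seg == j] = np.polyval(c, x[seg == j])
    return out

# Hypothetical one-dimensional regression problem.
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x)
model = train_ahn(x, y, n=2)
print(model["bounds"], model["error"])
```

Fitting polynomial coefficients is used here only because a polynomial of degree k is determined by a leading coefficient and k roots, which is one way to read the carbon and hydrogen values; the actual optimization process is described in [27,28].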

Characteristics of Artificial Hydrocarbon Networks
The artificial hydrocarbon networks algorithm has some characteristics that can be useful in regression and classification problems. In particular, for this work, both monitoring and noise tolerance tasks in human activity recognition are considered. Thus, several characteristics of AHN related to these tasks are discussed below:
• Stability: This characteristic considers that the artificial hydrocarbon networks algorithm minimizes the changes in its output response when inputs are slightly changed [27]. This is the main characteristic that promotes using AHN as a supervised learning method.
• Robustness: This characteristic implies that artificial hydrocarbon networks can deal with uncertain or noisy data. The literature reports that AHN can deal with noisy data as it filters information, e.g., AHN has been used in audio filtering [27,31]. Additionally, ensembles of artificial hydrocarbon networks and fuzzy inference systems can also deal with uncertain data, for example in intelligent control systems, like in [29,30].
• Metadata: Molecular parameters like bounds, intermolecular distances and hydrogen values can be used as metadata to partially understand underlying information or to extract features. It is reported in [27] that the artificial organic networks method packages information in several molecules that might be interpreted as subsystems in the overall domain. For example, these metadata have been used in facial recognition approaches [27].

Artificial Hydrocarbon Networks-Based Classifier for HAR Systems
From above, this work considers training and using an AHN classifier exploiting stability and robustness characteristics in the field of human activity recognition based on wearable sensors, with particular approaches in monitoring and noise tolerance. Previous work in this direction can be found in [21].
In this paper, we propose to build and train an artificial hydrocarbon network as a supervised learning classifier (AHN classifier) aiming to monitor human activities based on wearable sensor technologies. This AHN classifier is computed and employed in two steps: training-and-testing and implementation, as shown in Figure 2. The AHN classifier assumes that sensor data have already been processed into N features x_i for all i = 1, ..., N and organized in Q samples, each one associated with its proper label y_j representing the j-th activity in the set of all possible activities Y for j = 1, ..., J; where J is the number of different activities in the dataset. Thus, samples are composed of features and labels as (N + 1)-tuples of the form (x_1, ..., x_N, y_j)_q for all q = 1, ..., Q.
Considering that there is a dataset of Q samples of the form defined above, the AHN classifier is built and trained using the AHN-algorithm shown in Algorithm 1. It is worth noting that this proposal uses a simplified version of artificial hydrocarbon networks. Thus, the AHN classifier is composed of one saturated and linear hydrocarbon compound, i.e., no mixtures were considered (see Figure 1 for a hydrocarbon compound reference). In that sense, the inputs of the AHN-algorithm are the following: the training dataset Σ is a subset of R samples from the original dataset, as in Equation (7); the number of molecules n in the hydrocarbon compound is proposed to be the number of different activities in the dataset (n = J); and the tolerance value ε is a small positive number selected manually. Notice that the number of molecules in the compound is an empirical value; thus, no pairing between classes and molecules occurs. Lastly, the AHN-algorithm computes all parameters in the AHN classifier: hydrogen and carbon values, as well as the bounds of molecules.
For testing and validating the AHN classifier, the remaining P samples from the original dataset (i.e., such that Q = P + R) form the testing dataset. Then, the testing dataset is introduced to the previously computed AHN classifier. Lastly, the validation of the classifier is calculated using some metrics (see Section 5). Moreover, new sample data can also be used in the AHN classifier for recognizing and monitoring a human activity based on the corresponding features.

Case Study: Physical Activity Monitoring Using Artificial Hydrocarbon Networks
In this section, a case study on physical activity monitoring is presented and described in order to evaluate the performance of the proposed AHN classifier in terms of both monitoring and noise tolerance tasks. In particular, this case study uses a public dataset, and it compares the performance of the AHN-algorithm with other well-known supervised classifiers in the field of human activity recognition. Lastly, several metrics for classification tasks are also described.

Dataset Description
This case study employs the public Physical Activity Monitoring Data Set (PAMAP2) [12,13], which consists of 3,850,505 samples of raw signals from inertial sensors. Those samples were collected from three sensors placed on nine subjects with an average age of 27 years (eight men and one woman), as shown in Figure 3. The subjects performed 18 different activities during intervals of 10 h. However, only eight hours are dedicated to the activities, and the remaining two hours are dedicated to rest and change from one activity to another. Notice that resting and transitional period activities were labeled with a zero value in this dataset. For our case study, we eliminated these zero-labeled activities. The 18 different activities in our modified dataset are summarized in Table 2.
Although the PAMAP2 dataset consists of several measurements from inertial sensors and a heart rate monitor, this case study only considers numerical features from inertial sensors. Each "Colibri" wireless sensor has a total of 17 features: one for temperature, three 3D-acceleration data in inertial measurement units (IMU) sampled at 100 Hz at the scale of ±16 g (13-bits), three 3D-acceleration data (IMU) sampled at 100 Hz at the scale of ±6 g (13-bits), three 3D-gyroscope data (rad/s), three 3D-magnetometer data (µT) and four orientation values. Furthermore, the timestamp was eliminated from the dataset, since it might cause overfitting in supervised classifiers.
To this end, the dataset for the case study is composed of the following samples: 10,200 training samples (600 random samples for each of the first twelve activities and 500 random samples for each of the other activities) and 5100 testing samples (300 random samples for each of the first twelve activities and 250 random samples for each of the remaining activities) chosen randomly from the original dataset. In both cases, samples with missing values were avoided. Notice that since random selection was done, samples in the training and testing sets are not time dependent.
Table 2. Physical activities identified in this case study, adapted from [13].

1
Lying This movement is lying flat, slightly changing position or stretching a little bit.
2 Sitting Refers to sitting in a chair in any posture. It also includes more comfortable positions as leaning or crossing your legs.
3 Standing This position includes the natural movements of a person who is standing, swaying slightly, gesturing or talking.

Walking
This activity is a stroll down the street at a moderate speed of approximately 5 km/h.

Running
The people who made this activity ran at a moderate speed; taking into account non-high level athletes.
6 Cycling A bicycle was used for this movement, and people pedaled as on a quiet ride. An activity requiring great effort was not requested.

7
Nordic walking For this activity, it was required that persons that were inexperienced walked on asphalt using pads. 8 Watching TV This position includes the typical movements of someone who is watching TV and changes the channel, lying on one side or stretching his or her legs.

9
Computer work The typical movements of someone who works with a computer: mouse movement, movement of neck, etc.

10
Car driving All movements necessary to move from the office to the house for testing sensors.

11
Ascending stairs During this activity, the necessary movements up to a distance of five floors were recorded; from the ground floor to the fifth floor.

12
Descending stairs This movement is the opposite of the former. Instead of climbing the stairs, the activity of descending them was recorded.

13
Vacuum cleaning Refers to all of the activities necessary to clean a floor of the office. It also includes moving objects, such as rugs, chairs and wardrobes.
14 Ironing It covers the necessary movements to iron a shirt or a t-shirt. 15 Folding laundry It consists of folding clothes, such as shirts, pants and socks. 16 House cleaning These are the movements that a person makes while cleaning a house; such as moving chairs to clean the floor, throwing things away, bending over to pick up something, etc.

17
Playing soccer In this activity, individuals are negotiating, running the ball, shooting a goal or trying to stop the ball from the goal.

18
Rope jumping There are people who prefer to jump with both feet together, and there are others who prefer to move one foot first and then the other.
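The per-activity random selection described before Table 2 can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function name `stratified_split` and the quota dictionaries are hypothetical, the zero-labeled samples and samples with missing values are assumed to have been removed beforehand, and the training and testing sets are assumed to be disjoint (the text does not say so explicitly).

```python
import random
from collections import defaultdict


def stratified_split(labels, n_train, n_test, seed=0):
    """Draw disjoint random train/test index sets for every class.

    labels:  sequence of class labels, one per sample
    n_train: dict mapping class label -> number of training samples
    n_test:  dict mapping class label -> number of testing samples
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test = [], []
    for label, indices in by_class.items():
        need = n_train[label] + n_test[label]
        if len(indices) < need:
            raise ValueError(f"class {label}: need {need}, have {len(indices)}")
        picked = rng.sample(indices, need)  # sampling without replacement
        train.extend(picked[:n_train[label]])
        test.extend(picked[n_train[label]:])
    return train, test


# Per-class quotas taken from the text: activities 1-12 vs. activities 13-18.
N_TRAIN = {a: 600 if a <= 12 else 500 for a in range(1, 19)}
N_TEST = {a: 300 if a <= 12 else 250 for a in range(1, 19)}
```

The quotas add up to the totals given in the text: 12 × 600 + 6 × 500 = 10,200 training samples and 12 × 300 + 6 × 250 = 5100 testing samples, i.e., 15,300 samples overall.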

Methodology for Building Supervised Models
To show that our AHN classifier is competitive for physical activity recognition in terms of performance and noise tolerance, we compare it with fourteen supervised classifiers and conduct three experimental cases.
In addition, the methodology uses repeated cross-validation (10 folds and five repetitions) for each classifier in order to build suitable supervised models. For this case study, accuracy was the metric used within cross-validation to select the best model for each classifier. Table 3 summarizes the configuration parameters for training these models, using the caret package in R. The configurations column gives the number of different configurations created automatically during cross-validation before a suitable classifier is selected.
On the other hand, each stage of the activity recognition chain (ARC) described by Bulling et al. [32] (i.e., data acquisition, signal preprocessing and segmentation, feature extraction and selection, training and classification) directly influences the overall recognition performance of an HAR system [32]. In particular, feature extraction and selection are common practice to improve the performance of most HAR systems. However, if bad design decisions are made, the processed dataset might contain redundant or irrelevant information [9], which may unnecessarily increase the computational demand and reduce the accuracy of some classification methods [2]. Therefore, some authors choose to experiment with raw data when comparing and evaluating the recognition performance of supervised and/or unsupervised machine learning techniques [32,33].
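The repeated cross-validation scheme described above (10 folds, five repetitions, model selection by accuracy) can be sketched in a few lines. This is a minimal illustration and not the caret implementation: the names `repeated_kfold`, `select_best` and the callback `fit_score` are hypothetical, and the folds here are not stratified by class, which real tooling may do.

```python
import random


def repeated_kfold(n_samples, k=10, repeats=5, seed=0):
    """Yield (train_idx, val_idx) pairs for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)                       # new random partition per repetition
        folds = [order[i::k] for i in range(k)]  # k nearly equal folds
        for i in range(k):
            val = folds[i]
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, val


def select_best(configs, fit_score, n_samples, k=10, repeats=5):
    """Return the configuration with the highest mean validation accuracy.

    fit_score(config, train_idx, val_idx) is a hypothetical callback that
    trains a model with `config` on train_idx and returns its accuracy
    on val_idx.
    """
    def mean_accuracy(config):
        scores = [fit_score(config, tr, va)
                  for tr, va in repeated_kfold(n_samples, k, repeats)]
        return sum(scores) / len(scores)

    return max(configs, key=mean_accuracy)
```

With 10 folds and five repetitions, each candidate configuration is trained and validated 50 times before the best one is kept.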
In our work, we compare the following cases, trying to minimize the influence of feature generation and extraction:
• Case 1: This experiment uses the raw dataset of the case study as the feature set in order to measure the classification, recognition and monitoring performance of all supervised methods over physical activities, as explained above [32,33].
• Case 2: This experiment applies feature reduction to the feature set of the previous case, using the well-known recursive feature elimination (RFE) method [34,35]. Table 4 shows the ten retained features, and Figure 4 shows the accuracy curve of its validation. This experiment aims to perform human activity recognition with a minimal set of raw signals from the sensors' channels, since minimizing the number of sensors and the usage of their channels is a challenging problem in HAR [9]. The features retained by the automatic RFE method may include some that appear redundant (e.g., the 16 g and 6 g accelerometers) or that might lead to overfitting. Regarding these two concerns, Guyon et al. [35] showed with simple examples that "noise reduction and consequently better separation may be obtained by adding variables that are presumably redundant" [35]. Thus, variables that are apparently redundant, as in our case, can sometimes enhance the prediction power when they are combined. Finally, the same measures of classification, recognition and monitoring performance were computed.
• Case 3: This experiment evaluates noise tolerance in all supervised classifiers using noisy datasets. Zhu and Wu [36] describe different types of noise generation: in the input attributes and/or in the output class; in training data and/or in test data; in the most noise-sensitive attribute or in all attributes at once. We decided to generate noise only in some input feature values of some samples of the testing dataset.
In order to add noise to a numeric attribute, the authors in [36] suggest selecting a random value between the minimal and the maximal values of the feature. For our experimentation, we first randomly removed feature values at rates of 7%, 15% and 30% in order to simulate missing values and then automatically replaced the null values with the mean of the related feature, as some data mining tools suggest [37]. This method can be considered random noise insertion, given that the generated missing data are replaced with a value. Notice that the supervised models used in this experiment are the same classifiers as those built in the first experiment. The overall methodology is shown in Figure 5. The experiments were executed on a computer with an Intel Core i5-2400 CPU at 3.10 GHz and 16 GB of RAM, running the Windows 7 Pro (Service Pack 1) 64-bit operating system.
To this end, we conduct another experiment using a majority voting across windows-based approach and the AHN classifier to simulate the data flow in a real HAR system and to validate the performance of the proposed classifier in that situation. We select the first 30 s of each activity carried out by all of the subjects as the testing set, using the same models obtained in the first experiment. Table 5 shows the activities performed by each subject for at least 30 s [12,13]. Then, we apply a fixed window of 2.5 s in size (i.e., 250 samples) without overlapping during the 30 s of each activity. Lastly, a majority voting strategy [32] is employed inside the window in order to output the recognized activity. For this experiment, we build the models with the same strategy as followed in the previous cases.
Table 5. Summary of activities performed by each subject for at least 30 s.

Metrics
This case study uses several metrics to evaluate the performance of the AHN classifier in comparison with the other supervised classifiers: accuracy, sensitivity, precision and F1 score [38]. In addition, the metrics distinguish two ways of computation: macro-averaging (M) and micro-averaging (µ) [38]. The first treats all classes equally, while the second takes the size of each class into account. Thus, macro-averaging measures the overall classification performance without favoring large classes, and micro-averaging weights the performance of the classifiers by class size. The F1 score was calculated using Equations (8) and (9) [38], respectively.
Additionally, other metrics on the classifiers are computed as well: training_time specifies the training time (in seconds) to build and train a model, and testing_time specifies the evaluation time of an input sample (in milliseconds).
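As an illustration of the two averaging schemes, the following sketch computes macro- and micro-averaged F1 scores from a confusion matrix. It follows the commonly used definitions (per-class averaging of precision and recall for macro, pooled counts for micro); the function name `f1_macro_micro` is hypothetical, and the exact form of the Equations (8) and (9) referenced above may differ in detail.

```python
def f1_macro_micro(cm):
    """Return (macro F1, micro F1) for a square confusion matrix.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    tp = [cm[i][i] for i in range(n)]
    fp = [sum(cm[r][i] for r in range(n)) - tp[i] for i in range(n)]  # column minus diagonal
    fn = [sum(cm[i]) - tp[i] for i in range(n)]                       # row minus diagonal

    def ratio(num, den):
        return num / den if den else 0.0

    def f1(precision, recall):
        return ratio(2 * precision * recall, precision + recall)

    # Macro-averaging: average per-class precision and recall, then combine.
    precision_M = sum(ratio(tp[i], tp[i] + fp[i]) for i in range(n)) / n
    recall_M = sum(ratio(tp[i], tp[i] + fn[i]) for i in range(n)) / n

    # Micro-averaging: pool the counts over all classes, then combine.
    precision_mu = ratio(sum(tp), sum(tp) + sum(fp))
    recall_mu = ratio(sum(tp), sum(tp) + sum(fn))

    return f1(precision_M, recall_M), f1(precision_mu, recall_mu)


# Toy two-class example (made-up counts, not data from the paper).
macro_f1, micro_f1 = f1_macro_micro([[8, 2], [1, 9]])
```

Because micro-averaging pools the counts before dividing, large classes dominate the result, whereas macro-averaging gives every class the same weight regardless of its size.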

Experimental Results and Discussion
As said above, three experiments were conducted in order to evaluate the performance in both monitoring and noise tolerance tasks using an artificial hydrocarbon networks-based classifier in the context of the case study previously presented. In addition, a fourth experiment was conducted using a majority voting across windows-based strategy to simulate the data flow in a real HAR system and to validate the performance of the AHN classifier in that situation. Thus, this section presents and analyzes the comparative results obtained in this regard.

Comparative Analysis on Physical Activity Monitoring
To evaluate the performance of the AHN classifier in monitoring physical activities, two experiments were conducted. The first experiment considers the complete dataset of the case study, and the second experiment uses the dataset reduced with the RFE technique (see Section 5). Tables 6 and 7 show comparative results (sorted in descending order by accuracy) of the supervised classifiers in terms of the metrics defined above. In both cases, the AHN classifier ranks above the mean accuracy, and it is positioned in the first quartile of the evaluated classifiers. Using the complete dataset, the AHN classifier is placed close to the decision tree (first place), rule-based (second place) and support vector machine (fourth and fifth places) classifiers, as seen in Table 6. In addition, Table 7 shows that the AHN classifier is placed close to the stochastic gradient boosting (first place), AdaBoost (third place), random forest (fourth place) and rule-based (fifth place) classifiers. For instance, the decision tree-based classifier (the best ranked method in Table 6) is 0.52% above the AHN classifier in accuracy and 0.33% above it in F1-score_µ. Using the same comparison, Table 7 shows that stochastic gradient boosting (the best ranked method) is 1.5% and 0.86% above the AHN classifier in accuracy and F1-score_µ, respectively.
Comparing Tables 6 and 7, the performance of the methods changes. For example, the decision tree-based classifier drops 3.12% in accuracy and 1.35% in F1-score_µ, while stochastic gradient boosting rises 1.77% in accuracy and 0.57% in F1-score_µ. The AHN classifier drops 0.89% in accuracy and 0.46% in F1-score_µ. These comparisons give some insight into the robustness of the AHN classifier in contrast to the two methods that ranked first on either the complete or the reduced dataset.

Comparative Analysis on Supervised Model Performance under Noisy Data
A third experiment was conducted in order to measure the noise tolerance of the selected supervised classifiers. In this case, three noisy datasets (7%, 15% and 30% randomly corrupted) were used (see Section 5). Tables 8-10 show the overall results of this experiment, sorted in descending order by accuracy. With 7% noisy data, the AHN classifier ranks above the mean accuracy, and it is positioned in the first quartile of the evaluated classifiers. The proposed classifier is placed close to the random forest (first place), stochastic gradient boosting (second place), rule-based (fourth place) and decision tree (fifth place) classifiers. The random forest-based classifier is 1.31% above the AHN classifier in accuracy and 0.71% above it in F1-score_µ.
With 15% and 30% noisy data (Tables 9 and 10), the AHN classifier also ranks above the mean accuracy, and it is positioned in the first quartile of the evaluated classifiers. In both experiments, the AHN classifier is very close to naive Bayes, k-nearest neighbors, SVM with radial basis function kernel and stochastic gradient boosting. With the 15% noisy dataset, the AHN classifier is ranked at the top of the table, while with the 30% noisy dataset, it is ranked 0.14% below the naive Bayes-based classifier.
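The way the noisy testing sets were generated (random removal of feature values followed by mean replacement, see Section 5) can be sketched as follows. This is a minimal illustration under assumptions that the text does not settle: the corruption ratio is applied uniformly over all feature cells of the testing set, the replacement mean is computed per feature from the values that were not removed, and the name `corrupt_with_mean` is hypothetical.

```python
import random


def corrupt_with_mean(rows, ratio, seed=0):
    """Remove a fraction `ratio` of feature values at random and replace
    each removed value with the mean of the remaining values of that feature.

    rows: list of equal-length lists of numeric feature values.
    Returns the corrupted copy and the set of (row, column) cells replaced.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(rows), len(rows[0])
    cells = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    missing = set(rng.sample(cells, round(ratio * len(cells))))

    noisy = [list(row) for row in rows]  # leave the input untouched
    for c in range(n_cols):
        kept = [rows[r][c] for r in range(n_rows) if (r, c) not in missing]
        col_mean = sum(kept) / len(kept) if kept else 0.0
        for r in range(n_rows):
            if (r, c) in missing:
                noisy[r][c] = col_mean
    return noisy, missing
```

Calling the function with ratios of 0.07, 0.15 and 0.30 on the same clean testing set would produce three datasets analogous to those used in this experiment, while the trained models remain unchanged.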

Comparative Analysis on the Majority Voting Across Windows-Based Strategy
As already mentioned in Section 5, a majority voting across windows-based approach was also conducted to validate the performance of the AHN classifier in a simulated data flow that can be found in a real HAR system.
First, we extracted the first 30 s of each activity carried out by each of the subjects (see Table 5), and we validated that our AHN classifier, as well as the other supervised models, are able to classify human activities correctly. Table 11 reports the performance results of all methods, sorted in descending order by accuracy. In contrast with Table 6, it can be seen that the AHN classifier is stable in both circumstances, with small (0.9829 in accuracy) and large (0.9845 in accuracy) testing sets. Furthermore, the other top methods (i.e., random forest, rule-based classifier, SVM, decision tree and stochastic gradient boosting) are consistent in both experiments. In addition, Table 12 shows the confusion matrix of the AHN classifier.
Then, a fixed window of 2.5 s was applied to the sequential data, and a majority voting strategy was computed within the window. The results of the AHN classifier, as well as the other fourteen methods, are reported in Table 13, sorted in descending order by accuracy. Notice that the AHN classifier, as well as the rule-based classifier, decision trees, random forest, stochastic gradient boosting and k-nearest neighbors, have 100% accuracy. In particular, the confusion matrix of the AHN classifier is presented in Table 14. The values of this matrix correspond to the number of windows for each activity performed by the related subjects. In contrast with the confusion matrix of Table 12, the majority voting across windows-based approach improves on the performance of the sample-based experiment. This can be explained by the fact that, in the sample-based experiment, each activity has fewer false positives than true positives, so the majority label within a window is the correct one. To this end, an overall perspective of the learning performance of the proposed classifier can be seen in Figure 6, which shows the learning curve of the AHN classifier for this experiment.
Table 13. Comparison of the supervised classifiers using a majority voting across windows-based approach (2.5-s window size).
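The windowed majority voting step can be illustrated with a short sketch. At the 100 Hz sampling rate of the sensors, a 2.5-s window contains 250 samples, so 30 s of one activity yield 12 non-overlapping windows. The function name `majority_vote_windows` is hypothetical, and ties are broken arbitrarily here (by first occurrence), which is an assumption since the text does not specify a tie rule.

```python
from collections import Counter


def majority_vote_windows(predictions, window=250):
    """Return one label per non-overlapping window of per-sample predictions.

    The label of a window is the most frequent prediction inside it.
    A trailing partial window, if any, is dropped.
    """
    n_windows = len(predictions) // window
    voted = []
    for w in range(n_windows):
        chunk = predictions[w * window:(w + 1) * window]
        voted.append(Counter(chunk).most_common(1)[0][0])
    return voted


# Toy stream: 30 s of activity 4 in which 20% of the per-sample
# predictions are wrongly assigned to activity 7 (made-up values).
stream = ([4] * 200 + [7] * 50) * 12
window_labels = majority_vote_windows(stream)  # twelve windows, all labeled 4
```

In this toy stream the sample-level accuracy is 80%, yet every window is labeled correctly, which illustrates why the window-based results can exceed the sample-based ones.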

Discussion
From the first two experiments, the artificial hydrocarbon networks-based classifier showed good performance in terms of accuracy and F1-score_µ in comparison with the other 14 supervised classification methods. In that sense, the AHN classifier can perform physical activity monitoring tasks.
Besides, Tables 15-17 show the confusion matrices of the AHN classifier using the 7%, 15% and 30% noisy datasets, respectively. As shown, the confusion matrices present a few mistaken classifications, most of them close to the diagonal. This behavior can be explained by the nature of the method. Artificial hydrocarbon networks are mainly intended for regression tasks; classification problems are therefore converted into regression problems using numeric labels as the values to approximate. In that sense, similar numeric labels are the cause of misclassification. This misclassification behavior is thus related to the nature of the method and not to the nature of the physical activities.
On the other hand, large values in the confusion matrix are also analyzed. For instance, ascending stairs, cycling and walking are confused with Nordic walking; also, computer work is confused with watching TV. The human performances of these activities are closely related; thus, the performance of the AHN classifier is related to the nature of the physical activity. Notice that these confusion matrices correspond to the AHN classifier performance when data from sensors are corrupted, and as a result, physical activity monitoring is more difficult for the methods. Tables 8-10 show that the AHN classifier has a suitable performance in contrast with the other methods.
From the above experimental results, all methods have advantages and weaknesses. In that sense, the overall performance of the supervised classifiers is also inspected. Table 18 shows the overall performance of the classifiers in terms of the accuracy metric, and Table 19 summarizes the overall results in terms of the F1-score_µ. The first three experiments are summarized in these tables. In order to keep the analysis reliable, only results from the 7% noisy dataset are considered here. The mean (x̄) and the standard deviation (σ) of both metrics were computed. The tables are sorted in descending order by the mean values of the metric, showing that the artificial hydrocarbon networks-based classifier is ranked in second position in both the accuracy and F1-score_µ metrics.
Since the accuracy measures the overall classification performance (Table 18), the AHN classifier is very competitive for physical activity monitoring (x̄ = 0.9756), because it is close to the best ranked method, stochastic gradient boosting (x̄ = 0.9782), representing a relative gap of 0.27%. In addition, the AHN classifier not only performed well in monitoring; it also shows the smallest standard deviation (σ = 0.0055) in comparison with the other methods, indicating that the AHN classifier is very robust across different datasets (complete, reduced and noisy), as shown in Figure 7.
The same analysis can be done using the information from Table 19, in which the F1-score_µ is compared. Since the F1-score_µ measures the tradeoff between sensitivity and precision in unbalanced classes, the AHN classifier is also suitable for physical activity monitoring, as represented by x̄ = 0.9871. This mean value is close to that of the best ranked method, random forest, which obtained x̄ = 0.9895, representing a relative gap of 0.24%. Using the F1-score_µ, the AHN classifier also showed suitable robustness to different datasets (complete, reduced and noisy), obtaining σ = 0.0029, which ranks it in the second position below the random forest-based classifier, as depicted in Figure 8. To this end, the AHN classifier is positioned close to the following classifiers in terms of monitoring task performance and noise tolerance (see Tables 18 and 19) and robustness (see Figures 7 and 8): stochastic gradient boosting, random forest, rule-based classifier, decision trees and artificial neural networks.
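The relative gaps quoted above can be checked with a few lines of arithmetic. The sketch assumes that the gap is expressed relative to the mean of the best ranked method; the helper name `relative_gap` is hypothetical.

```python
def relative_gap(best, other):
    """Gap of `other` below `best`, as a percentage of `best`."""
    return 100.0 * (best - other) / best


# Mean accuracy: stochastic gradient boosting vs. AHN (Table 18 values).
accuracy_gap = relative_gap(0.9782, 0.9756)

# Mean micro-averaged F1-score: random forest vs. AHN (Table 19 values).
f1_gap = relative_gap(0.9895, 0.9871)

print(f"accuracy gap: {accuracy_gap:.2f}%")
print(f"F1-score gap: {f1_gap:.2f}%")
```

Rounded to two decimals, the two gaps come out at 0.27% and 0.24%, matching the figures reported in the text.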
A closer look at the results over the noisy datasets is summarized in Table 20, for which the mean and the standard deviation of accuracy and F1-score_µ were calculated. As shown, the AHN classifier is ranked at the top of the table with 93.43% accuracy and 96.97% F1-score_µ on average. In terms of standard deviation, the AHN classifier is the second best classifier in accuracy over the nearest shrunken centroids; and it is the best classifier in F1-score_µ. These results indicate that the AHN classifier is tolerant to different ratios of noise in raw sensor data.
On the other hand, the above benchmark is consistent with the literature. An overall look into Tables 18 and 19 shows that boosting and bagging methods (e.g., stochastic gradient boosting, AdaBoost and random forest) are positioned above discriminant analysis methods (e.g., linear and mixture), and those are above instance-based classifiers (e.g., k-nearest neighbors and nearest shrunken centroids), as noted in [25]. Furthermore, artificial neural networks are placed above discriminant analysis and instance-based methods, as suggested in [25]. In terms of noise tolerance, instance-based classifiers are easily altered by exclusion of single noisy data, as mentioned in [8], which may explain the low positions of these methods observed in the experimental results. Additionally, decision trees obtained good performance in the benchmark (Table 8), which agrees with the tolerance characteristic reported in [8]: decision trees trained with noisy data are more tolerant than those trained with filtered data and then tested on data corrupted with noise. With respect to support vector machines, the methods used in this benchmark obtained medium to poor performance (see Table 8), which can be explained by the fact that SVMs are easily altered by the exclusion of noisy data, as suggested in [8].
In fact, the above results were computed with raw sensor signals as features in order to minimize the influence of the feature extraction typically done in HAR. Hence, the accuracy of several methods is high. Other factors that influence the high levels of accuracy are the cross-validation process and the selection of the best model based on it. In contrast to the sample-based approach, a fourth experiment was conducted using a majority voting across windows-based approach. As noted, the accuracy of the proposed AHN classifier improves (100%), since calculating a majority voting value per window increases the probability of predicting activities correctly, as expected [32]. Other methods also reach that accuracy in the same way.
To this end, Table 21 summarizes the training time (measured in seconds) that classifiers take to build and train a model and the testing time (measured in milliseconds) that they take to classify one sample. As shown, the AHN classifier has the longest training times on both the complete (72.61 s) and the reduced (61.53 s) datasets, while it is the third worst classifier in terms of testing times on both the complete (1.71 ms) and the reduced (0.92 ms) datasets.
Finally, from the comparative study of the three experiments run in this benchmark, the majority voting across windows strategy and the comparison of the results with the literature, artificial hydrocarbon networks-based classifiers are shown to be: (i) suitable for physical activity monitoring; (ii) tolerant of noise in corrupted sensor data; (iii) robust in terms of different issues with sensor data; and (iv) useful for simulated data flow classification. This indicates that AHN classifiers are suitable in the field of human activity recognition.

Conclusions and Future Work
Automated human activity recognition is a challenging task. Sensor-based approaches in particular present several drawbacks, such as the long periods of time for which sensors must be worn, typical battery issues and the presence of noise in data due to sensor errors or noisy environments. Thus, robust machine learning techniques are required in human activity recognition.
In that sense, this paper presents a novel supervised machine learning method called artificial hydrocarbon networks for human activity recognition. Experimental results over a public physical activity monitoring raw dataset showed that the artificial hydrocarbon networks-based classifier is suitable for human activity recognition when compared to the other fourteen well-known supervised classifiers. In particular, the overall classification performance was measured in terms of accuracy (x̄ = 0.9756) and micro-averaged F1-score_µ (x̄ = 0.9871), while robustness was analyzed in terms of the standard deviation of accuracy (σ = 0.0055) and micro-averaged F1-score_µ (σ = 0.0029) over three different experiments, concluding that the AHN classifier is robust for different data profiles (complete, reduced and noisy). Experimental results with noisy data also confirm that the AHN classifier is tolerant of noise in corrupted raw sensor data (i.e., 7%, 15% and 30% noise levels), achieving 93.43% in accuracy and 96.97% in F1-score_µ. Moreover, when using a majority voting across windows-based approach, the AHN classifier is able to provide an accuracy of 100%, validating that it is useful for simulated data flow classification.
For future work, we must address two important challenges in order to show that our AHN classifier is well suited for human activity recognition. One important challenge for an activity recognition classifier is to determine whether it is sufficiently flexible to deal with inter-person and intra-person differences in the performance of activities. People can perform the same activity differently at different times and in different situations (e.g., day or night, energetic or tired, etc.). Similarly, there is great variability in the performance of an activity depending on the person's characteristics, such as age, weight, gender, health conditions, etc. [9]. The second challenge is to determine whether AHN is capable of finding the most informative and discriminative features, with the goal of developing a real-time HAR system that classifies as many activities as possible with good performance. To this end, we will also revise the artificial hydrocarbon networks algorithm in order to improve the training time and make it more competitive with respect to the other methods.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Artificial Hydrocarbon Networks: A Numerical Example
This section shows the training and testing procedures in artificial hydrocarbon networks (AHN) for classification purposes. To this end, a general-purpose numerical example was chosen.

Appendix A.1. Training Step
Consider a dataset of 20 samples with three features and one label, as shown in Table A1. If an artificial hydrocarbon network model is required, then the training process is as follows: (i) define the training set; (ii) determine the configuration parameters; (iii) run Algorithm 1; and (iv) obtain the AHN-model. For this particular example, the training set is defined to be 50% of the original dataset (Table A1), and the remaining 50% is used for testing. Random selection is applied. For example, the following samples are part of the training set: {1, 5, 6, 7, 13, 14, 15, 16, 18, 19}. Next, Algorithm 1 is run. Following the algorithm, the first step is to initialize an empty hydrocarbon compound AHN = {}. Then, a saturated compound C is created using the number of molecules n and Equation (2), i.e., C = CH3−CH2−CH3. The latter means that the first molecule has three hydrogen values, the second one has two hydrogen values and the third one has three hydrogen values. Then, three intermolecular distances r_t for t = 1, ..., n are randomly created. For this example, Table A2 shows the initial intermolecular distances. Notice that each intermolecular distance is a vector in the feature space, such that r_t = {r_1, r_2, r_3}_t. Then, a loop runs until a tolerance criterion is met. Inside this loop, the set of bounds L_t is computed using Equation (A1), where L_0 is the minimum value of each feature in the training set, i.e., L_0 = (3.2687, 3.6221, −4.3858). Table A2 shows the first iteration of bounds.
Then, these bounds define a subset of samples for each molecule. This subset of samples is used to compute the hydrogen values H_i and the carbon value σ of the specific molecule, using Equation (1) and an optimization process. In this work, the least squares estimates were used, as suggested in [27]. Using these parameters, the compound C is built using Equation (3). A prediction with this compound is made in order to calculate the energy of the molecules. In this example, the mean squared error is employed. It is worth noting that a rounding function was applied to the output of the predicted values. Lastly, the updated values of the intermolecular distances are computed using Equations (4) and (5). For this example, assume that the energy values of the molecules are E_1 = 1.5, E_2 = −0.5 and E_3 = 1.0 with a steady state E_0 = 0.0; then, the intermolecular distance differences are ∆r_1 = 0.15, ∆r_2 = −0.20 and ∆r_3 = 0.15, and the updated intermolecular distances are those summarized in Table A2.
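The intermolecular distance differences in this example can be reproduced with a short sketch. It assumes that the update has the form ∆r_t = η (E_t − E_{t−1}) and that the learning rate is η = 0.1; both the form and the value of η are inferred from the numbers of the example rather than stated in this appendix, and the name `distance_updates` is hypothetical.

```python
def distance_updates(energies, e0=0.0, eta=0.1):
    """Return delta_r_t = eta * (E_t - E_{t-1}) for t = 1..n, with E_0 = e0.

    eta = 0.1 is an assumed learning rate inferred from the worked example.
    """
    previous = [e0] + list(energies[:-1])
    return [eta * (e - p) for e, p in zip(energies, previous)]


# Energies E_1, E_2, E_3 from the example, with steady state E_0 = 0.0.
deltas = distance_updates([1.5, -0.5, 1.0])
print([round(d, 2) for d in deltas])
```

Under these assumptions the sketch returns 0.15, −0.20 and 0.15 (up to floating-point rounding), which agrees with the differences given in the example.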
Once the loop stops, the artificial hydrocarbon network AHN is completed with the following information: the set of CH-molecules, the hydrogen and carbon values of each molecule and the complete set of bounds, such that AHN = {C, ψ(x)}. Table A3 summarizes the parameters of the resultant AHN-model.  Notice that the order of molecules is defined by the algorithm; but once the AHN-model is trained, the order has to remain constant when testing.

Appendix A.2. Testing Step
Once the AHN-model is trained, the testing step validates the output predictions of the classifier, for which the testing set is required. Continuing with this example, the testing set is composed of the samples {2, 3, 4, 8, 9, 10, 11, 12, 17, 20}. Then, the functional ψ(x) with the parameters shown in Table A3 is used. The inputs of this function are the features of the testing set. For instance, consider the first sample in the testing set, x = (4.2141, 2.3912, 6.8321). This value is evaluated in ψ(x), which yields ψ(x) = 1. As noted, the result is the same as the label. Table A4 shows a comparison between the predicted values using the AHN classifier and the target values. For an extended description of training and testing artificial hydrocarbon networks, see [27,28].

Table A4. Comparison between the predicted values y_AHN and the target values y.

| Sample | y_AHN | y |
|---|---|---|
| 2 | 1 | 1 |
| 3 | 1 | 1 |
| 4 | 1 | 1 |
| 8 | 1 | 1 |
| 9 | 1 | 1 |
| 10 | 1 | 1 |
| 11 | 2 | 2 |
| 12 | 2 | 2 |
| 17 | 3 | 3 |
| 20 | 3 | 3 |