Predictive Model for Human Activity Recognition Based on Machine Learning and Feature Selection Techniques

Research into assisted living environments, within the area of Ambient Assisted Living (AAL), focuses on generating innovative technology, products, and services to provide medical treatment and rehabilitation to the elderly, with the purpose of increasing the time in which these people can live independently, whether they suffer from neurodegenerative diseases or disabilities. This key area is responsible for the development of activity recognition systems (ARS), which are a valuable tool to identify the types of activities carried out by the elderly, and to provide them with effective care that allows them to carry out daily activities normally. This article aims to review the literature to outline the evolution of the different data mining techniques applied to this health area, by showing the metrics used by researchers in this area of knowledge in recent experiments.


Introduction
The research area of assisted living environments (AAL) focuses on generating innovative technologies, products, and services to provide aid, medical care, and rehabilitation to elderly people, with the purpose of increasing the time in which these people can live independently, whether they suffer from neurodegenerative diseases or disabilities. This key research area is responsible for the development of activity recognition systems (ARS), which are a valuable tool to identify the types of activities carried out by elderly people, and to provide them with effective assistance that allows them to carry out daily activities normally.
ARS are based on human activity recognition (HAR), which encompasses the recognition of a wide range of activities. Within these, this work focuses especially on activities of daily living (ADL). To evaluate the performance of ARS in the recognition of activities of daily living, it is necessary to use test data sets in experimental scenarios, which have been suitably designed by the scientific community for HAR.
Currently, a large part of the world's elderly population suffers from neurodegenerative diseases. These types of diseases greatly affect the people who suffer from them, since they cause loss of balance, reduced mobility, speech deficiencies, breathing issues, and other alterations in cardiovascular function, which directly lead to a decrease in the cognitive abilities of individuals and, to a great extent, make it difficult to carry out activities of daily living [1]. Alzheimer's, dementia, amyotrophic lateral sclerosis (ALS), and Parkinson's are some of the most common types of neurodegenerative diseases.
However, before implementing these systems, it is necessary to evaluate their performance in the HAR process to optimize the classification of activities in indoor environments. In this project, a functional model for HAR was built, combining the logistic model trees (LMT) classification technique and the One R feature selection technique; from the latter, the most relevant features for the classification process were identified.

The main characteristics of HAR are summarized below.

Data Collection Type [6]: wearable devices and sensors (accelerometers, gyroscopes, GPS, electrocardiogram, magnetometer, and heart rate, among others); environmental sensors (binary sensors and cameras).
Recognition Type [7,8]: gestures, actions, interactions, and activities (e.g., of daily living).
Application Areas [7,9]: computer vision, video surveillance (e.g., banks or airports), sports technique analysis, interaction with video games through gestures, military tactics, and assisted living environments for the health care of elderly people or people with other illnesses.

HAR is an area of research with numerous applications, such as computer vision [6], video surveillance implemented in banks or airports, sports technique analysis, systems that allow interaction with video games through gestures, and military tactics [7], in addition to assisted living environments providing care for the elderly or people with mental illness. This wide range of applications makes HAR a highly relevant and current research topic.
HAR recognizes patterns of human activity from different types of data, which are collected through different devices that contain a variety of sensors, for example: (1) wearable devices that integrate accelerometers, gyroscopes, GPS, and heart rate sensors, among others; or (2) environmental sensors that collect numerical or categorical data, and cameras that record image or video data. Thus, human activity recognition has been approached in two different ways in terms of the source, or the type of device, that collects the data: the first is wearable sensors, which are directly attached to the user; the second is external sensors, which are fixed to objects with which people interact within a given area of interest [8].
Regarding human activities, in [6] these have been categorized or classified at different levels according to their complexity: gestures, actions, interactions, and group activities. Gestures are considered elementary movements of a part of the person's body such as stretching an arm or lifting a leg. Actions are activities that can be composed of multiple gestures organized in a space of time performed by a person, such as walking or jumping. Interactions are human activities involving two or more people and/or objects; for example, two people fighting or one person doing the dishes. Finally, group activities involve multiple people and/or objects, such as a group hike or a fight between two groups.
Although the field of HAR is very broad, this research work focuses on the recognition of activities of daily living (ADL), which were defined in [9] as the set of activities that a person performs independently for their personal care, transport, and communication, such as personal mobility, eating, cleaning, and resting, among others. Indeed, ARS based on HAR to recognize activities of daily living have the potential to bring significant improvements in the quality of life of people suffering from neurodegenerative diseases, but the performance of these systems must first be tested and measured by evaluating various test data sets in experimental scenarios. To achieve this end, the scientific community has developed and promoted a variety of data collections available online, which contain information regarding activities of daily living, performed both in indoor and outdoor environments.

HAR Dataset
In the scientific literature in the field of ADL recognition, seven datasets [10] are highly referenced, and their main characteristics are summarized below in Table 2. The most relevant datasets are (1) the Van Kasteren dataset [11], a collection of binary values collected from a wireless sensor network (WSN) deployed in an enclosure occupied by two men; and (2) the Kyoto [12], Aruba [13], and Multiresident [14] datasets, all of which are part of the CASAS project [12] carried out by WSU (Washington State University). In this project, a variety of environmental sensors were deployed in an apartment consisting of three bedrooms, a bathroom, a kitchen, and a living room.
For this study, we decided to evaluate the Aruba CASAS dataset, since it is a comprehensive dataset whose raw files are available online on the official project site. Although it has been shown that the evaluation metrics [17] reach 100% in terms of accuracy, in this study the second-best result so far was obtained, with an improvement in accuracy and in computation times. This was achieved by evaluating other classification techniques and reducing the dimensionality of the data by applying various feature selection techniques.
In this paper, the single and multiple occupancy dataset known as Aruba CASAS smart home project [13] from WSU (Washington State University) is used. This dataset collected different data sources in the home of an adult volunteer. The resident of the house was a woman who received visits from her children and grandchildren regularly between 4 November 2010 and 11 June 2011. Two data sources gave rise to the information, the first source was binary and was made up of movement and contact sensors, and the second source was made up of temperature sensors.
The binary source consisted of 35 sensors, of which 31 were movement sensors, identified by the letter M. These sensors were installed on the floor and detected the pressure exerted by the individual when stepping on the ground, representing the activation and deactivation states (ON/OFF). The remaining four (4) sensors were contact sensors, installed on the doors and identified by the letter D. These types of sensors detect the opening and closing states of the doors (OPEN/CLOSE). The second source was made up of 5 temperature sensors located in different places in the house and identified by the letter T. This type of sensor detects the temperature of the environment in continuous values represented in degrees Celsius.
The information contained in this dataset is made up of the recorded events, a product of the individual's interactions with each of the sensors. For each event (each activity performed by the individual), the start and end date and time are recorded. In total, eleven activities were labeled, but in this study, only nine (9) were considered because the other two activities have a very low number of samples. For evaluation purposes, the following activities were considered: preparing meals (Meal_Preparation), resting (Relax), eating (Eating), working (Work), sleeping (Sleeping), going from bed to the bathroom (Bed_to_Toilet), getting home (Enter_Home), leaving home (Leave_Home) and cleaning (Housekeeping).
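For illustration, an event stream of the kind described above can be parsed into structured records. The line layout assumed below (timestamp, sensor id, value, optional activity label and begin/end marker) is a sketch based on the public Aruba files; the field names are ours, not the paper's.

```python
from datetime import datetime

def parse_event(line):
    """Parse one CASAS-style event line into a dict.

    Assumed layout (hypothetical): '2010-11-04 00:03:50.209589 M003 ON Sleeping begin'
    where the activity label and begin/end marker are optional.
    """
    parts = line.split()
    return {
        "timestamp": datetime.strptime(parts[0] + " " + parts[1],
                                       "%Y-%m-%d %H:%M:%S.%f"),
        "sensor": parts[2],          # M### motion, D### door, T### temperature
        "value": parts[3],           # ON/OFF, OPEN/CLOSE, or degrees Celsius
        "activity": parts[4] if len(parts) > 4 else None,
        "marker": parts[5] if len(parts) > 5 else None,  # begin/end
    }

ev = parse_event("2010-11-04 00:03:50.209589 M003 ON Sleeping begin")
```

Records of this shape make it straightforward to group events by activity time frame, which is the basis of the pre-processing described next.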

Building Predictive Models for HAR
This section describes the methodology applied: pre-processing of the datasets, aggregation functions, model building, and experimentation.
The proposal described here is based on the pre-processing of the original data [18][19][20] provided by the Aruba CASAS dataset. This resulted in the processed dataset, from which three new subsets of data were generated: Aruba CASAS-raw, Aruba CASAS-duration, and Aruba CASAS-sensor-based.
For each of the three datasets, the process of building a functional model was carried out, followed by a comparison of the quality metrics for each model and, finally, choosing the best-validated model and the correct configuration of the dataset in terms of feature categories.
This whole process is summarised in Figure 1.

Pre-Processing of Datasets
The starting point for this research was the original data provided by the Aruba CASAS dataset (detailed above), made up of the events recorded both by binary sensors (motion and contact) and by temperature sensors. In addition, it includes the start and end date and time of each activity. Initially, a pre-processing phase was carried out, which consisted in generating features from the representation of the activity duration time frames, extracted from the original data instances. This procedure gave rise to the processed dataset, whose structure is detailed below. The processed dataset is made up of a total of 69 features, divided into four (4) categories: count features, average features, aggregation features, and original features. The count features are built from the contact sensors in the doors. In total, there are four (4) contact sensors, and a count was made for both opening and closing (OPEN/CLOSE), within the duration frames of the activities. Therefore, eight (8) door contact sensor features were generated. Count features were also generated from the motion sensors. However, since motion sensors have nearly simultaneous ON and OFF states (i.e., an OFF state is executed immediately after the ON state), an event count was made from the pair of states (ON and OFF) for each sensor. Therefore, 31 motion sensor features were generated. Other count features are the number of events corresponding to a certain activity carried out in the time frame. The duration features represent the difference in seconds between the start date and time and the end date and time of the activity.
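A minimal sketch of the count-feature construction described above, assuming the events of one activity's duration frame have already been extracted; the function and feature names are ours, not the paper's. Motion-sensor ON/OFF pairs are counted once (via the ON state), while door contacts keep separate OPEN and CLOSE counts.

```python
from collections import Counter

def count_features(events, motion_ids, door_ids):
    """Build count features for one activity time frame.

    events: list of (sensor_id, state) pairs recorded inside the frame.
    Motion sensors fire near-simultaneous ON/OFF pairs, so each ON counts as
    one event; door sensors get separate OPEN and CLOSE counts.
    """
    counts = Counter()
    for sensor, state in events:
        if sensor in motion_ids and state == "ON":
            counts[sensor] += 1                  # one count per ON/OFF pair
        elif sensor in door_ids:
            counts[(sensor, state)] += 1         # OPEN and CLOSE kept separate
    features = {s: counts[s] for s in motion_ids}       # 31 motion features
    for d in door_ids:                                  # 2 features per door sensor
        features[f"{d}_OPEN"] = counts[(d, "OPEN")]
        features[f"{d}_CLOSE"] = counts[(d, "CLOSE")]
    features["n_events"] = len(events)           # total events in the frame
    return features

events = [("M001", "ON"), ("M001", "OFF"), ("D001", "OPEN"),
          ("M001", "ON"), ("M001", "OFF"), ("D001", "CLOSE")]
feats = count_features(events, ["M001", "M002"], ["D001"])
```

The duration feature would simply be the difference in seconds between the frame's end and start timestamps.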
Regarding average features, these have been calculated from the temperature sensors, since the values they take are continuous data, adding a total of five (5) features to the dataset. Additionally, other aggregation features were generated from these sensors and, since four (4) statistical formulas were used (range, standard deviation, skew, and kurtosis) for each of the five sensors, a total of 20 features were generated for this category (T001-RANGE, T001-DESV, T001-BIAS, T001-KURT, and so on through T005). The three (3) remaining features are part of the category of original features and correspond to the class label, the start date and time, and the end date and time of the activity. For greater precision, Table 3 contains the structure of the processed dataset.
From the processed dataset, three data subsets were generated, called Aruba CASAS-raw, Aruba CASAS-duration, and Aruba CASAS-sensor-based, which differ in the number of features they contain. These datasets were generated to carry out subsequent tests and identify which dataset produces the best results in terms of the classification capacity of the machine learning techniques, that is, to evaluate the incidence of each category of features on the classification capacity of the technique. Table 4 identifies the number of features of these three datasets based on the categories of features that make them up. Moreover, the data subsets used in the model construction process for training (train) and testing (test) follow the distribution of data instances presented in Table 5.
The proportions correspond to 69.90% for the training subset and 30.10% for the testing subset in each of the three datasets (Aruba CASAS-raw, Aruba CASAS-duration, and Aruba CASAS-sensor-based). For the construction of each subset (training and testing), the instances were selected randomly, keeping approximately the same proportion of instances for each class label, that is, approximately 70% for training and 30% for testing (see Table 6).
Table 6. Distribution of data instances by class for training and testing data subsets for the Aruba CASAS-raw, Aruba CASAS-duration, and Aruba CASAS-sensor-based datasets.
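A stratified random split of this kind can be sketched with scikit-learn's `train_test_split`, which preserves the per-class proportions when `stratify` is given. The feature matrix and labels below are small hypothetical stand-ins, not the actual Aruba data.

```python
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for the feature matrix and activity labels
X = [[i] for i in range(100)]
y = ["Relax"] * 70 + ["Eating"] * 30

# Stratified random split: ~70% train / ~30% test, class proportions preserved
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)
```

With 70 "Relax" and 30 "Eating" instances, the training subset keeps 49 and 21 of each, respectively, mirroring the per-class 70/30 distribution described above.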

Aggregation Functions
To verify the Aruba CASAS-sensor-based dataset, it was necessary to calculate several features from aggregation functions. In this process, the instances were grouped by class criteria, that is, by activity. Specifically, for the temperature features, the functions used were range, standard deviation, skewness, and kurtosis, using the functions defined in [21]. Each of these is detailed below:
-Range: the difference between the largest value and the smallest value in a data set.
-Standard deviation: the square root of the variance (the second central moment m_2) of a data set, denoted σ.
-Skewness: the quotient of the third central moment m_3 of a data set and the standard deviation cubed, m_3/σ^3.
-Kurtosis: the quotient of the fourth central moment m_4 of a data set and the standard deviation σ to the fourth power, m_4/σ^4.
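The four aggregation functions above can be sketched directly from their definitions (population central moments); the function and key names below are ours, for illustration only.

```python
def aggregation_features(values):
    """Range, standard deviation, skewness, and kurtosis of one series.

    Skewness = m3 / sigma^3 and kurtosis = m4 / sigma^4, with m3 and m4 the
    third and fourth central moments (population form, as defined above).
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # variance
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    sigma = m2 ** 0.5
    return {
        "range": max(values) - min(values),
        "std": sigma,
        "skew": m3 / sigma ** 3,
        "kurt": m4 / sigma ** 4,
    }

# e.g., a short, symmetric temperature series: skewness should be zero
feats = aggregation_features([20.0, 21.0, 22.0, 23.0])
```

Applying these four functions to each of the five temperature sensors yields the 20 aggregation features described earlier.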

Model Construction
Different models were built from the three datasets, implementing classification techniques integrated with feature selection techniques. As a result of the evaluation of the quality metrics, the best results were identified for each dataset. That is, each evaluation made it possible to identify the best combinations of classification techniques with feature selection techniques, which generated the highest quality metrics for each evaluated dataset (Aruba CASAS-raw, Aruba CASAS-duration, and Aruba CASAS-sensor-based).
A comprehensive comparative analysis of the results obtained by these evaluations made it possible to identify the dataset that generated the best classification results and the respective classification techniques and feature selection which led to those best results (see Figure 2).

Experimentation
We wanted to build a model that yields the best results in terms of quality metrics. In addition to evaluating different configurations of feature categories for the dataset, which have a major impact on the classification process, three experimentation scenarios were proposed. In our first experimental scenario, different classification techniques were applied to each of the three data subsets (Aruba CASAS-raw, Aruba CASAS-duration, and Aruba CASAS-sensor-based). Then, we wanted to identify the techniques that generate the best quality metrics in each of the experiments. For this evaluation, a random sampling of instances of each dataset was carried out to divide them into training and testing subsets, where each training data set (train) contains approximately 70% of the samples and each testing data set (test) approximately 30%.
In our second experimental scenario, different feature selection techniques were applied to the training and testing datasets of each data subset, and the optimal number of features was identified together with the classification technique that best affects the evaluation process for each of the data subsets. In our third experimental scenario, for each dataset, the performance of the best hybridization of classification technique with feature selection technique was comprehensively evaluated using 10-fold cross-validation. Each of the proposed experimentation scenarios carried out for each data subset (Aruba CASAS-raw, Aruba CASAS-duration, and Aruba CASAS-sensor-based) is detailed below.


Experimentation Scenarios
Here we describe different experimentation scenarios for the creation of a predictive HAR (human activity recognition) model, applying different configurations of classification and feature selection techniques to the Aruba CASAS-raw, Aruba CASAS-duration, and Aruba CASAS-sensor-based data subsets generated from the original Aruba CASAS dataset. Subsequently, to compare the performance of different machine learning approaches, a comparative analysis of the quality metrics was performed on each of the three recreated scenarios: (1) with classification techniques, (2) through hybridization of classification and selection techniques, and (3) evaluating the best results through cross-validation. In the three scenarios, the three pre-processed data subsets were used to identify which data subset, when processed using the respective techniques, generates better quality metrics in the predictive process.

Experimental Scenario No. 1: Comparative Analysis of Classification Techniques on Data Subsets
In this first scenario, three experiments were carried out, each evaluating 31 classification techniques in the three datasets (Aruba CASAS-raw, Aruba CASAS-duration, and Aruba CASAS-sensor-based). For each experiment, subsets of data were used, from each dataset, for the training process (train) and the testing process (test). The classification techniques evaluated in the different experiments of this scenario are presented in Table 7, indicating the subcategory to which they correspond.

Table 7. Classification techniques evaluated, by subcategory.

Decision Tree:
-Logistic Model Trees (LMT) [22]: builds logistic model trees.
-Reduced-Error Pruning Tree (REPTree) [24]: fast tree learning using reduced-error pruning.
-RandomForest [25]: constructs a forest of random trees.
-Random Tree [24]: builds a tree that considers a random number of given features at each node.

Rules:
-JRip [27]: RIPPER (Repeated Incremental Pruning to Produce Error Reduction) algorithm for fast, efficient rule induction.

Multiclassifiers (Meta):
-Random Committee [24]: builds an ensemble of random base classifiers.
-Stacking [29]: combines multiple classifiers using the stacking method.
-LogitBoost [33]: performs additive logistic regression.
-Classification Via Regression [24]: performs classification using a regression method.
-MultiClass Classifier [34]: uses a two-class classifier for multiclass data sets.
-Bagging [35]: bags a base classifier; works for classification as well as regression.
-Vote [37]: combines classifiers using average probability estimates or numerical predictions.
-CVParameterSelection [38]: performs parameter selection through cross-validation.
-MultiScheme [39]: uses cross-validation to select a classifier from multiple candidates.
-AttributeSelectedClassifier [24]: reduces the dimensionality of the data by selecting attributes before classification.
-RandomSubSpace [40]: builds a decision-tree-based classifier that maintains the highest accuracy on the training data.
-Filtered Classifier [39]: runs a classifier on filtered data.

Lazy algorithms:
-IB1 [41]: basic nearest-neighbor instance-based learning algorithm.
-IB2 [41]: instance-based k-nearest-neighbor classifier.
-KStar [42]: a nearest-neighbor classifier with a generalized distance function.
-LWL [43]: a general algorithm for locally weighted learning.
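The comparison loop of this scenario can be sketched as follows: fit every candidate technique on the same train/test split and record its recall. The study evaluated 31 Weka techniques; here two scikit-learn classifiers stand in for them, on synthetic stand-in data rather than the Aruba subsets.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic multiclass stand-in data (the study used the three Aruba subsets)
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)

# Fit every candidate on the same split and record its weighted recall
candidates = {
    "RandomForest": RandomForestClassifier(random_state=0),
    "Logistic": LogisticRegression(max_iter=1000),
}
results = {}
for name, clf in candidates.items():
    clf.fit(X_tr, y_tr)
    results[name] = recall_score(y_te, clf.predict(X_te), average="weighted")

best = max(results, key=results.get)
```

Weighted recall averages the per-class recall values, which is appropriate for multiclass activity labels with unequal class sizes.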
To define the features of the datasets, a series of feature selection algorithms were used in this experiment, which yielded the results shown in Table 8. For the experiments with the Aruba CASAS-raw and Aruba CASAS-sensor-based datasets (see Table 8), the classifiers with the best results in terms of the recall metric were LMT with 94.50% and LogitBoost with 94.20%. In these cases, the ROC area metric was 99.60% and 99.70%, respectively. Regarding the test with the Aruba CASAS-duration dataset, the classification techniques with the highest recall were J48 and JRIP, at 95.60% for both classifiers, with JRIP presenting the highest ROC area metric at 99.30%. It is important to specify the implementation details of this classifier considering the feature selection process identified in Table 9. Table 10 shows the results of the LMT classifier using the GainRatio and OneR algorithms, respectively. LMT is the classification technique that yields the best results in terms of recall with both the Aruba CASAS-raw and Aruba CASAS-sensor-based datasets. Regarding the Aruba CASAS-duration dataset, even though LMT was not the technique with the best classification results, it reached a recall of 95.40%, as shown in Table 11 below.

Experimental Scenario No. 2: Comparative Analysis of the Hybridization of Selection and Classification Techniques on Data Subsets
In this scenario, three experiments were carried out, each with the respective datasets mentioned above (Aruba CASAS-raw, Aruba CASAS-duration, and Aruba CASAS-sensor-based). To minimize computation times, we sought to reduce the size of the three datasets by identifying the set of features that most benefits classification. For this purpose, the Info Gain [44], Gain Ratio [44], Symmetrical Uncert [44], One R [30], and Relief [45] feature selection techniques were combined with each of the four classification techniques. The results were then analyzed for each scenario and each dataset to determine which combination generated better results.
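The hybridization idea (rank features first, then classify on only the k best) can be sketched with a scikit-learn pipeline. Here `mutual_info_classif` stands in for an Info Gain-style ranker and a decision tree stands in for LMT (which scikit-learn does not provide); data and the candidate k values are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with more features than are actually informative
X, y = make_classification(n_samples=300, n_features=20, n_informative=6,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)

# Hybridization: select the k best-ranked features, then classify
best_k, best_recall = None, -1.0
for k in (5, 10, 15, 20):
    pipe = Pipeline([
        ("select", SelectKBest(mutual_info_classif, k=k)),  # Info Gain stand-in
        ("clf", DecisionTreeClassifier(random_state=0)),    # LMT stand-in
    ])
    pipe.fit(X_tr, y_tr)
    r = recall_score(y_te, pipe.predict(X_te), average="weighted")
    if r > best_recall:
        best_k, best_recall = k, r
```

Sweeping k this way mirrors how the optimal number of features was identified for each classifier and dataset.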
Once the quality metrics were evaluated, it was possible to determine that the hybridizations of classification and feature selection techniques that generated the best results were: (1) LMT with Gain Ratio, using both 27 and 24 features, for the Aruba CASAS-raw dataset (see Table 12); (2) JRIP with One R using 47 features and LMT with One R using 33 features for the Aruba CASAS-duration dataset (see Table 13); and (3) LMT with Info Gain using 47 features and LMT with Gain Ratio using 31 features for the Aruba CASAS-sensor-based dataset (see Table 14). In this scenario, different feature selection techniques were applied to choose the features to be included in the experimentation (see Table 15).
Table 15. Feature prioritization algorithms approach.
In the experiment with the Aruba CASAS-duration dataset, even though the J48 classifier (in the first experimentation scenario) had generated very good results, reaching 95.60% recall (with 49 features, as can be seen in Table 3), the hybridization proposals JRIP with One R using 47 features and LMT with One R using 33 features, executed in this second scenario, increased recall, reaching 95.80% and 95.90%, respectively. In addition, a significant reduction in the number of features was achieved for the classification process (see Table 13). Of these two combinations of techniques, it is better to use LMT with One R, because it generates greater recall (95.90%) and because it only requires 33 features for the classification process.
In the first scenario, for the Aruba CASAS-sensor-based dataset with 67 features, a recall of 94.50% and a ROC area of 99.70% were obtained using the LMT technique. In this second scenario, with the same dataset, both proposals (LMT + Info Gain with 47 features and LMT + Gain Ratio with 31 features) showed an increase in recall, which reached 94.90%. LMT with Gain Ratio was the combination that achieved the greater decrease in the number of features (which affects the computation time required by the predictive model), as can be seen in Table 14.
Let us compare the two best hybridizations for each dataset:
-In the Aruba CASAS-raw dataset, the two combinations presented the same results in terms of recall, F-measure, and ROC area. LMT with Gain Ratio using 24 features presented the lowest FP rate, at 0.5%.
-In the evaluation of the Aruba CASAS-duration dataset, the combination with the best recall (95.90%) and ROC area (99.70%) was LMT with One R, using 33 features.
-Regarding the evaluation of the Aruba CASAS-sensor-based dataset, the results for the two hybridizations of classification and selection techniques coincided in the precision, recall, F-measure, and ROC area metrics.
Although LMT with Gain Ratio for 31 features is the combination that presented the highest FP rate, at 0.6%, it is important to highlight that the other combination (LMT with Info Gain) uses 16 more features (see Table 16). Up to this point, it can be deduced that the dataset that generates the best predictive model is Aruba CASAS-duration, after applying the hybridization of the LMT technique with One R, using only 33 of the 49 original features. Accordingly, these 33 features have the best effect on the classification process to predict human activities. Table 17 indicates the priority of incidence in the prediction identified from the One R selection technique.

Experimental Scenario No. 3: Evaluation of the Best Hybridizations through Cross-Validation
In this scenario, a more exhaustive evaluation was carried out to assess whether there is a better combination of classification and feature selection techniques, compared to the previous scenario, for each dataset (Aruba CASAS-raw, Aruba CASAS-duration, and Aruba CASAS-sensor-based). Each dataset was trained and tested using 10-fold cross-validation, generating three experiments, the results of which are detailed in Tables 18-20. In the cross-validation process, each complete dataset was divided into 10 folds of equal size. Iterative tests were then performed in which the model was trained on 9 folds and tested on the remaining fold. Finally, the quality metrics obtained in each of the 10 iterations were averaged to calculate the result. For the test with the Aruba CASAS-raw dataset, with the LMT classification technique and Gain Ratio feature selection (24 features), the recall was 94.10% (see Table 18). This is not an improvement over the results of the second scenario (same dataset and same combination of techniques), where recall was 94.90% (see Table 12).
In the test with the Aruba CASAS-duration dataset, with the LMT classification technique and One R feature selection (33 features), the recall was 94.10% (see Table 19). This is not an improvement over the evaluation carried out for this dataset with the same combination of techniques in the second scenario, where recall was 95.90% (see Table 13).
Regarding the test with the Aruba CASAS-sensor-based dataset, with the LMT classification technique and One R feature selection (31 features), the recall was 94.00% (see Table 20). This is also not an improvement over the evaluation carried out for this dataset with the same combination of techniques in the second scenario, where recall was 94.90% (see Table 14).
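The 10-fold procedure described above (train on 9 folds, test on the held-out fold, average the 10 scores) can be sketched with scikit-learn's `cross_val_score`; the data and the decision-tree stand-in for LMT are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; the study used the three Aruba subsets
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# 10-fold CV: train on 9 folds, test on the held-out fold, average the scores
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y,
                         cv=10, scoring="recall_weighted")
mean_recall = scores.mean()
```

One score is produced per fold, and their mean is the figure reported for each combination of techniques.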
The results obtained in this third experimentation scenario, in terms of recall and ROC area after applying cross-validation, did not show improvements compared to those obtained in the second scenario. This behavior occurred in each of the experiments carried out with the three datasets (Aruba CASAS-raw, Aruba CASAS-duration, and Aruba CASAS-sensor-based) due to overfitting (see Table 21). Overfitting is the result of over-training a model on data adjusted to specific features of the dataset; that is, excessive learning of the behavior of certain classes prevents the model from recognizing behaviors that differ from those class labels. According to [46], this is a result of an imbalance in the training data set. To check whether there are significant differences between the proposed models, a statistical analysis was carried out through the study of their variance. For this, the null hypothesis H0, which posits equality between the means of the models at an alpha significance level of 5%, was proposed, together with an alternative hypothesis H1 that rejects said equality.
In Table 22, the probability values for the three model comparisons (M1: LMT + Gain Ratio, 24 features vs. M2: LMT + One R, 33 features; M1 vs. M3: LMT + Info Gain, 47 features; and M2 vs. M3) are much higher than the 5% alpha level of significance. Therefore, the null hypothesis H0, which posits equality between the means of the models, was accepted. This indicates that there is no significant difference between the three proposed models, and it also confirms the consistency of the data considered for the experimentation.
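This kind of variance analysis can be sketched with a one-way ANOVA over per-fold recall scores. The score lists below are hypothetical illustrations (the real values come from the 10-fold runs reported in Tables 18-20).

```python
from scipy.stats import f_oneway

# Hypothetical per-fold recall scores for the three models M1, M2, and M3
m1 = [0.941, 0.939, 0.942, 0.940, 0.943, 0.938, 0.941, 0.940, 0.942, 0.939]
m2 = [0.941, 0.940, 0.939, 0.942, 0.941, 0.940, 0.938, 0.943, 0.940, 0.941]
m3 = [0.940, 0.941, 0.939, 0.940, 0.942, 0.941, 0.940, 0.939, 0.941, 0.940]

# One-way ANOVA: H0 says the model means are equal (alpha = 0.05)
stat, p_value = f_oneway(m1, m2, m3)
reject_h0 = bool(p_value < 0.05)
```

With near-identical score distributions like these, the p-value far exceeds 0.05 and H0 is retained, matching the conclusion drawn from Table 22.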

Conclusions
This section presents the results and conclusions reached in this research work after evaluating each of the experimentation scenarios proposed above.
In the first scenario, the best result was a recall of 95.60%, obtained when the Aruba CASAS-duration dataset was evaluated using 49 features with the J48 and JRIP classification techniques. This surpassed the results for the Aruba CASAS-raw and Aruba CASAS-sensor-based datasets, both of which yielded a recall of 94.50%. This shows that adding the two count features, for the number of events and the activity duration, improved recall by 1.1 percentage points. On the other hand, the Aruba CASAS-sensor-based dataset did not show any improvement over the Aruba CASAS-duration dataset; on the contrary, the results show an increase in computation time during the classification process. The Aruba CASAS-sensor-based dataset has 20 additional features, calculated by applying aggregation functions to the features generated from the temperature sensors, grouping the instances of the original dataset segmented by class (activity).
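As an illustration of how the two extra count features could be derived, the following pandas sketch computes the number of events and the duration for each activity segment of a raw, timestamped sensor log. The column names and sample records are hypothetical stand-ins, not taken from the actual Aruba CASAS files:

```python
import pandas as pd

# Hypothetical raw sensor log: one row per sensor event, labeled with the
# activity segment it belongs to. Column names are illustrative only.
events = pd.DataFrame({
    "segment_id": [1, 1, 1, 2, 2],
    "timestamp": pd.to_datetime([
        "2010-11-04 05:40:51", "2010-11-04 05:41:03", "2010-11-04 05:43:30",
        "2010-11-04 08:01:12", "2010-11-04 08:05:45"]),
    "activity": ["Bed_to_Toilet"] * 3 + ["Meal_Preparation"] * 2,
})

# Two aggregate features per activity segment: number of sensor events
# and total duration in seconds (last event minus first event).
features = events.groupby("segment_id").agg(
    activity=("activity", "first"),
    n_events=("timestamp", "size"),
    duration_s=("timestamp", lambda t: (t.max() - t.min()).total_seconds()),
).reset_index()
print(features)
```

Each labeled segment of the raw log collapses to one training instance carrying the two new features alongside the activity label.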
In the second scenario, the experiment with the best recall was again the one with the Aruba CASAS-duration dataset. The hybridization of the LMT classification technique with the One R feature selection technique, using 33 features, reached a recall of 95.90%, compared to the 95.80% achieved by JRIP and One R using 47 features. Additionally, the hybridization of LMT and One R achieved a significant reduction of 32.65% in the number of features (16 fewer features), compared to just 4.08% (two fewer features) for the hybridization of JRIP and One R. Thus, using the combination of LMT and One R has a direct impact on reducing the computation times required for the construction and evaluation of the predictive model.
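LMT and One R are Weka techniques with no direct scikit-learn equivalents, but the hybridization pattern itself (rank the features, keep the top k, then train the classifier on the reduced set) can be sketched as follows, using mutual information and a decision tree as rough stand-ins on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a 49-feature activity dataset (not the real Aruba CASAS data).
X, y = make_classification(n_samples=500, n_features=49, n_informative=10,
                           n_classes=3, random_state=0)

# Keep the 33 best-ranked features, then classify. One R and LMT are Weka
# techniques; mutual information and a decision tree are rough analogues.
pipe = Pipeline([
    ("select", SelectKBest(mutual_info_classif, k=33)),
    ("clf", DecisionTreeClassifier(random_state=0)),
])
scores = cross_val_score(pipe, X, y, cv=10, scoring="recall_macro")
print(f"mean macro recall over 10 folds: {scores.mean():.3f}")
```

Putting the selector inside the pipeline ensures the feature ranking is recomputed on each training fold, so the cross-validated recall is not optimistically biased by selecting features on the full dataset.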
On the other hand, in the second scenario, the experiments with the Aruba CASAS-raw and Aruba CASAS-sensor-based datasets also showed a significant reduction in the number of features: the hybridization of LMT and Gain Ratio used 24 features for Aruba CASAS-raw and 31 features for Aruba CASAS-sensor-based. Although the decrease in the number of features is 48.94% and 53.73%, respectively, recall is 1.00 percentage point lower than the value obtained in the Aruba CASAS-duration experiment. It is important to highlight that the classification technique that yielded the best results, in terms of quality metrics, in each of the experiments with the three datasets was LMT. Table 12 presents the ranking of the 33 features that most affect the classification process, as determined by the One R feature selection technique.
In the third scenario, a comprehensive evaluation was carried out to determine the best hybridization for each dataset using 10-fold cross-validation. Here, a decrease in recall was found in each of the experiments with the three datasets (Aruba CASAS-raw, Aruba CASAS-duration, and Aruba CASAS-sensor-based) due to overfitting.
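The kind of overfitting reported here can be made visible by comparing resubstitution recall (the model evaluated on its own training data) against 10-fold cross-validated recall. The following sketch uses a synthetic dataset and an unpruned decision tree purely for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score
from sklearn.metrics import recall_score

# Synthetic multi-class data standing in for an activity dataset.
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           n_classes=3, random_state=1)

# An unpruned tree memorizes the training data...
tree = DecisionTreeClassifier(random_state=1).fit(X, y)
resub = recall_score(y, tree.predict(X), average="macro")  # training-set recall

# ...but 10-fold cross-validation reveals the drop on unseen folds.
cv = cross_val_score(DecisionTreeClassifier(random_state=1),
                     X, y, cv=10, scoring="recall_macro").mean()
print(f"resubstitution recall {resub:.3f} vs 10-fold CV recall {cv:.3f} "
      f"(gap {resub - cv:.3f})")
```

A large gap between the two numbers is the signature of overfitting: the model has learned fold-specific behaviors that do not generalize, which matches the recall decrease observed in this scenario.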
Regarding the recall quality metric for each class label (activity) in each of the experiments, it should be noted that the winning hybridization for the Aruba CASAS-duration dataset, despite yielding a low 83.50% for the "Leave_Home" activity, surpassed by 28.24% the 55.60% achieved in both cases by the winning hybridizations for the Aruba CASAS-raw and Aruba CASAS-sensor-based datasets. This may be due to the two additional features for the number of events and the duration of the activity, included only in the Aruba CASAS-duration dataset, given that for this activity in particular the number of events (sensor readings) is very low (see Table 21).
The recall metric for the cleaning activity (Housekeeping) yielded different values in the experiments with each dataset. Despite achieving 100.00% with the Aruba CASAS-raw dataset, its result with the other two datasets was not the best: 77.80% in Aruba CASAS-duration and 88.90% in Aruba CASAS-sensor-based. The difference in recall across datasets is due to the low number of instances of this activity compared to the others, just 32 data instances (see Table 6).
The highest success rates in terms of quality metrics were obtained when training the model with the Aruba CASAS-duration dataset. A recall of 95.90% indicates a high proportion of correctly identified positive cases, i.e., a high detection rate for the activities. The 99.70% reached in the ROC area indicates that the model has very high predictive quality (see Table 18). In addition, the average false-positive detection rate was very low, with an FP rate of 0.60%. An average accuracy of 95.90% was also reached, which indicates a high proportion of correct predictions, both positive and negative, over the total number of predictions, along with an F-measure of 95.80% (see Table 23). Consequently, the model proposed in this research integrates the LMT classification technique with the One R feature selection technique, using only 33 of the 49 features available in the Aruba CASAS-duration dataset for the recognition of the following human activities: preparing meals (Meal_Preparation), resting (Relax), eating (Eating), working (Work), sleeping (Sleeping), going from bed to the bathroom (Bed_to_Toilet), getting home (Enter_Home), leaving home (Leave_Home), and cleaning (Housekeeping). These data were collected from an indoor environment by the WSU (Washington State University) smart home project.
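The quality metrics discussed above can be reproduced on a toy example as follows. The labels and predictions below are made up for illustration, and the ROC area is omitted because it requires predicted class probabilities rather than hard labels:

```python
import numpy as np
from sklearn.metrics import (recall_score, f1_score, accuracy_score,
                             confusion_matrix)

# Toy ground truth and predictions for three hypothetical activity classes.
y_true = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2, 0])
y_pred = np.array([0, 0, 1, 1, 2, 2, 2, 2, 1, 0])

recall = recall_score(y_true, y_pred, average="weighted")
f_measure = f1_score(y_true, y_pred, average="weighted")
accuracy = accuracy_score(y_true, y_pred)

# Class-support-weighted false-positive rate, derived from the confusion matrix.
cm = confusion_matrix(y_true, y_pred)
fp = cm.sum(axis=0) - np.diag(cm)                       # predicted as c but are not c
tn = cm.sum() - cm.sum(axis=0) - cm.sum(axis=1) + np.diag(cm)
fp_rate = np.average(fp / (fp + tn), weights=cm.sum(axis=1))

print(f"recall={recall:.3f} F={f_measure:.3f} acc={accuracy:.3f} FP rate={fp_rate:.3f}")
```

These are the same per-class-averaged definitions Weka reports in its evaluation summary, so a sketch like this is a quick sanity check when comparing tools.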
Finally, this research work makes two important contributions to the area of human activity recognition (HAR): firstly, the pre-processing of the original Aruba CASAS dataset provided by the WSU smart home project, which is available in an online repository with all its raw records; and secondly, the identification, based on the construction of a model that evaluates said dataset, of the classification and feature selection techniques that yield the best metrics by class criterion.