Proceeding Paper

Human Activity Recognition Using Binary Sensors, BLE Beacons, an Intelligent Floor and Acceleration Data: A Machine Learning Approach †

by Jesús D. Cerón 1, Diego M. López 1,* and Bjoern M. Eskofier 2
1 Telematics Engineering Research Group, Telematics Department, Universidad Del Cauca, 910002 Popayán, Colombia
2 Machine Learning and Data Analytics Lab, Computer Science Department, Friedrich-Alexander University Erlangen-Nürnberg, 91052 Erlangen, Germany
* Author to whom correspondence should be addressed.
Presented at the 12th International Conference on Ubiquitous Computing and Ambient Intelligence (UCAmI 2018), Punta Cana, Dominican Republic, 4–7 December 2018.
Proceedings 2018, 2(19), 1265; https://doi.org/10.3390/proceedings2191265
Published: 19 October 2018
(This article belongs to the Proceedings of UCAmI 2018)

Abstract

Although there have been many studies in the field of Human Activity Recognition, the relationship between what we do and where we do it has been little explored. The objective of this paper is to propose a machine learning approach to address the challenge of the 1st UCAmI Cup: the recognition of 24 activities of daily living using a dataset that allows this relationship to be explored, since it contains data collected from four data sources: binary sensors, an intelligent floor, proximity sensors and acceleration sensors. The CRISP-DM methodology for data mining projects was followed in this work, and a Java desktop application was developed to perform the synchronization and classification tasks. The accuracy achieved in the classification of the 24 activities using 10-fold cross-validation on the training dataset was 92.1%, but only 60.1% was obtained on the test dataset. This low accuracy might be caused by the class imbalance of the training dataset; therefore, more labeled data may be necessary to train the algorithm. Although an optimal result was not obtained, it is possible to iterate over the methodology to improve the results obtained.

1. Introduction

Human Activity Recognition (HAR) has been widely studied in recent years; however, few works have taken advantage of the relationship between what we do (HAR) and where we do it (Indoor Localization) [1,2,3]. On the one hand, supporting HAR with Indoor Localization (IL) makes it possible to increase the number and complexity of the recognized activities, as well as the accuracy of their recognition. On the other hand, supporting IL with HAR increases the accuracy of the estimated location. Initially, the lack of studies taking advantage of this relationship was due to the lack of adequate technology to capture human movement and location data simultaneously in indoor environments; nowadays, such data can be collected by means of Inertial Measurement Units (IMUs), smartphones, wearables and different types of ubiquitous sensors deployed in indoor environments. For this reason, several datasets containing data related to both HAR and IL have recently been collected and published in order to enable experimentation with different approaches [4,5].
The dataset collected for the 1st UCAmI Cup [6] joins this select group of datasets. This paper aims to recognize the 24 human activities included in that dataset. To this end, a machine learning approach is proposed, following the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology [7].

2. Methods

The proposed challenge is addressed with a machine learning approach; for that reason, CRISP-DM was used. CRISP-DM is a freely available methodology and currently one of the most widely used in data mining projects. It defines six phases: business understanding, data understanding, data preparation, modeling, evaluation and deployment, each composed of different tasks. It is important to mention that the CRISP-DM life cycle is not linear. In the present work the deployment phase is not carried out, since it would aim at implementing the models obtained in the previous phases in a real context, which is out of the scope of this paper. In this section the most relevant tasks of the first four phases of the methodology are described.

2.1. Business Understanding and Data Understanding

The context and the main requirements of the project are defined during the business understanding phase. As a result of this phase, both the data mining objective and the way its fulfillment will be measured must be defined. A classification accuracy greater than 80% was considered desirable in order to implement the resulting model in a real context in the future; therefore, the data mining objective of this work is stated as: “classify the 24 activities included in the dataset of the 1st UCAmI Cup with an accuracy of at least 80%”.
The identification and definition of the data to be used in the project is one of the tasks of the data understanding phase. The dataset of the 1st UCAmI Cup was collected in the UJAmI SmartLab of the University of Jaén. It contains data on 24 activities performed by one person over 10 days. The data for each day were collected in three sessions: morning, afternoon and evening. For each session, the dataset contains five comma-separated text files: one contains the labeled activities and the remaining four contain data from the following data sources:
  • Event streams of binary sensors: 30 binary sensors were located in different parts of the SmartLab. Each sends a binary value such as Open-Close, Movement-No movement or Pressure-No pressure, together with its timestamp.
  • Spatial data from an intelligent floor: capacitance data of each of the SmartLab’s smart floor modules, with their respective timestamps.
  • Proximity data between a smart watch worn by the inhabitant and Bluetooth Low Energy (BLE) beacons: 15 BLE beacons were located in different parts of the SmartLab, and their RSSI was collected at a sampling frequency of 0.25 Hz.
  • Acceleration data from the same smart watch worn by the inhabitant: 3D acceleration collected at a sampling frequency of 50 Hz.
The 24 activities included in the dataset, the 30 binary sensors and the 15 BLE beacons deployed in the SmartLab are listed in Appendix A. For the 1st UCAmI Cup, the dataset was divided into two parts: part one contains labeled training data from seven days of recordings and part two contains unlabeled test data from three days of recordings. The objective of the competition is to propose an approach for the classification of the activities using data from part one of the dataset; the approach is then evaluated on part two. For that reason, part one includes the file with the labeled activities, whereas part two does not.
As described in the introduction, previous studies have shown that HAR and indoor localization are directly related. For that reason, all four data sources were used: data sources 1, 2 and 3 provide location information, while data source 4 provides information about body movement.

2.2. Data Preparation

In this phase, the data must be prepared in such a way as to allow the training and evaluation of the classification algorithms used in the next phase, the modeling phase.
The first step in this phase was to generate, for each day, a file with the data from the four data sources synchronized. This was done by matching the timestamps, sample by sample, of the comma-separated files belonging to data sources 1, 2 and 3 against data source 4, which has the highest sampling frequency. When a sample contained missing data, the missing values were filled in with the average of the 50 previous samples, as sketched below.
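The following is a minimal sketch of this imputation step, assuming each sensor channel is held in a double array where missing samples are marked with NaN; the class and method names are illustrative and do not come from the original application.

```java
// Hypothetical sketch: a missing sample is replaced by the mean of the
// (up to) 50 previous samples of the same channel.
public class MissingValueImputer {

    private static final int WINDOW = 50;

    // 'signal' holds one sensor channel; Double.NaN marks missing samples.
    public static double[] impute(double[] signal) {
        double[] out = signal.clone();
        for (int i = 0; i < out.length; i++) {
            if (Double.isNaN(out[i])) {
                double sum = 0.0;
                int count = 0;
                // Average the values of up to 50 previously filled samples.
                for (int j = Math.max(0, i - WINDOW); j < i; j++) {
                    if (!Double.isNaN(out[j])) {
                        sum += out[j];
                        count++;
                    }
                }
                out[i] = (count > 0) ? sum / count : 0.0;
            }
        }
        return out;
    }
}
```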
The next step was to perform a feature extraction process. This process starts with the segmentation of the synchronized data into segments of 5 s, so each segment contains approximately 200 samples. From each segment, one example was obtained, containing the following 87 features computed from the data of the segment (a sketch of the acceleration feature computation follows the list below):
  • Binary sensors: 30 binary features (one for each sensor). If the status of a sensor is Open, Movement or Pressure in any sample belonging to the segment, its corresponding feature has the value “1”; otherwise it is “0”.
  • Intelligent floor: 40 binary features (one for each module). If the capacitance of a module is greater than zero in any of the samples of the segment, its corresponding feature is “1”; otherwise it is “0”.
  • Proximity data: 4 categorical features corresponding to the IDs of the four nearest BLE beacons.
  • Acceleration: 13 statistical features: the mean, median, standard deviation and mean absolute deviation of each axis, plus the mean of the acceleration magnitude (the square root of the sum of the squared axis values).
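As an illustration of the fourth group of features, the sketch below computes the 13 acceleration features from one 5 s segment (roughly 200 × 3 samples at 50 Hz). It is a minimal reconstruction based on the description above; the class and method names are assumptions.

```java
import java.util.Arrays;

// Computes the 13 statistical acceleration features listed above: mean,
// median, standard deviation and mean absolute deviation per axis, plus the
// mean of the acceleration magnitude sqrt(x^2 + y^2 + z^2).
public class AccelerationFeatures {

    // segment: approximately 200 x 3 samples (x, y, z) of one 5 s window.
    public static double[] extract(double[][] segment) {
        double[] features = new double[13];
        int f = 0;
        for (int axis = 0; axis < 3; axis++) {
            double[] values = new double[segment.length];
            for (int i = 0; i < segment.length; i++) {
                values[i] = segment[i][axis];
            }
            double mean = mean(values);
            features[f++] = mean;
            features[f++] = median(values);
            features[f++] = stdDev(values, mean);
            features[f++] = meanAbsDev(values, mean);
        }
        // Mean magnitude over the whole segment.
        double magSum = 0.0;
        for (double[] s : segment) {
            magSum += Math.sqrt(s[0] * s[0] + s[1] * s[1] + s[2] * s[2]);
        }
        features[f] = magSum / segment.length;
        return features;
    }

    private static double mean(double[] v) {
        double s = 0.0;
        for (double x : v) s += x;
        return s / v.length;
    }

    private static double median(double[] v) {
        double[] sorted = v.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return (n % 2 == 1) ? sorted[n / 2]
                            : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    private static double stdDev(double[] v, double mean) {
        double s = 0.0;
        for (double x : v) s += (x - mean) * (x - mean);
        return Math.sqrt(s / v.length);
    }

    private static double meanAbsDev(double[] v, double mean) {
        double s = 0.0;
        for (double x : v) s += Math.abs(x - mean);
        return s / v.length;
    }
}
```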
Once the examples for the complete dataset were obtained, each example belonging to part one of the dataset was labeled with its corresponding activity using the timestamps included in the files that contain the activity labels. In total, 4997 labeled examples were obtained from part one of the dataset, while part two yielded 535 unlabeled examples.

2.3. Modeling

In this phase, the modeling techniques are selected and applied and, if necessary, their parameters are calibrated to improve the results. Based on the results of a previous work in which sedentary behaviors were classified using acceleration and proximity data from BLE beacons [8], the classification algorithms used in this work are: J48, IB1, SVM, Random Forest (RF), AdaBoostM1 (ABM1) and Bagging. The last three are ensemble algorithms, and J48 was set as their base classifier. To perform the training and evaluation of the models, the application developed in the context of the work presented in [8] was adapted. That application, written in Java, incorporates the Waikato Environment for Knowledge Analysis (WEKA) library [9]. The algorithms were trained with the 4997 examples belonging to part one of the dataset and evaluated using 10-fold cross-validation (a minimal sketch of this setup is shown below). It is important to note that all 87 features were used for modeling; that is, no feature selection process was performed.
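As a minimal sketch of this setup, the code below trains and evaluates one of the six algorithms (AdaBoostM1 with J48 as base classifier) using the WEKA API and 10-fold cross-validation. The ARFF file name is an assumption made for illustration; the code of the original application is not reproduced here.

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.meta.AdaBoostM1;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// Sketch: 10-fold cross-validation of AdaBoostM1 (base classifier J48) on the
// labeled examples of part one of the dataset, assumed to be in training.arff.
public class ModelingSketch {

    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("training.arff").getDataSet();
        // The activity label is assumed to be the last attribute.
        data.setClassIndex(data.numAttributes() - 1);

        AdaBoostM1 boost = new AdaBoostM1();
        boost.setClassifier(new J48());   // J48 as base classifier

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(boost, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());

        // Train the final model on the complete training set.
        boost.buildClassifier(data);
    }
}
```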
Taking advantage of the fact that the CRISP-DM life cycle is not linear, it was found that merging the activities “prepare breakfast”, “prepare lunch” and “prepare dinner” into a single class, and “breakfast”, “lunch” and “dinner” into another, increased the classification accuracy by 13% on average. This makes sense, since both the body movement and the location produce similar data for those activities; therefore, the classification algorithms are not able to distinguish between them. The final classification of each of the six merged activities was then made based on the corresponding recording session (morning, afternoon or evening), as sketched below. In a system deployed in a real environment, rules based on the time of day and the sequence of these activities could be incorporated to perform this classification.
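The sketch below illustrates this session-based rule; the merged class names and the method are hypothetical and only intended to show the idea.

```java
// Hypothetical post-processing rule: when the classifier predicts one of the
// merged classes, the recording session decides the final activity label.
public class SessionRule {

    public enum Session { MORNING, AFTERNOON, EVENING }

    public static String resolve(String predictedLabel, Session session) {
        if (predictedLabel.equals("Prepare meal")) {
            switch (session) {
                case MORNING:   return "Prepare breakfast";
                case AFTERNOON: return "Prepare lunch";
                default:        return "Prepare dinner";
            }
        }
        if (predictedLabel.equals("Meal")) {
            switch (session) {
                case MORNING:   return "Breakfast";
                case AFTERNOON: return "Lunch";
                default:        return "Dinner";
            }
        }
        return predictedLabel; // all other activities are kept as predicted
    }
}
```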

2.4. Evaluation

In this phase, the results obtained during the modeling phase are analyzed and the best classification model is chosen according to the evaluation metric proposed in the business understanding phase. The results obtained in this phase are detailed in the following section.

3. Results

Table 1 shows the accuracy obtained by each algorithm using 10-fold cross-validation on part one of the dataset.
The highest accuracy was achieved by AdaBoostM1; therefore, the model obtained with this algorithm was used to classify the examples of part two of the dataset.
Since the original labeling of the activities in the dataset is done in batches of 30 s, while the examples used in our approach span 5 s, a majority voting step was configured in the developed application: the activity assigned to each 30 s batch is the one that appears most frequently among the six corresponding examples (see the sketch below).
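A minimal sketch of this voting step follows; it assumes the six 5 s predictions of a 30 s batch are already grouped in an array, and the method name is illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Majority voting: the label assigned to a 30 s batch is the one that occurs
// most often among its six 5 s predictions.
public class MajorityVoting {

    public static String vote(String[] sixPredictions) {
        Map<String, Integer> counts = new HashMap<>();
        for (String label : sixPredictions) {
            counts.merge(label, 1, Integer::sum);
        }
        String winner = sixPredictions[0];
        int best = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > best) {
                best = e.getValue();
                winner = e.getKey();
            }
        }
        return winner;
    }
}
```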
The resulting accuracy in the classification of the 535 examples belonging to part two of the dataset was 60.1%, meaning that 322 examples were correctly classified. The precision was 56%, the recall was 57.3% and the F-measure was 59.9%. Therefore, the data mining objective proposed in the business understanding phase was not fulfilled, which implies that further iterations of the CRISP-DM life cycle are necessary to improve this level of accuracy. The confusion matrix obtained in this classification process is presented in Appendix B.

4. Discussion and Conclusions

Following the guidelines of the CRISP-DM methodology, a machine learning approach for the recognition of the activities of daily living included in the 1st UCAmI dataset was proposed. The tasks in each phase of its life cycle provided a well-defined workflow throughout this work. Considering that the accuracy achieved using 10-fold cross-validation on part one of the dataset was 92.1%, the accuracy of 60.1% achieved by our approach on part two of the dataset was lower than expected. This may be due to the class imbalance present in part one of the dataset (with which the classification model was trained). For example, only 11 and 78 examples are labeled with the activities ‘Wash dishes’ and ‘Play a video game’, respectively, while other activities contain more than 100 examples. This hypothesis could be initially corroborated using an evaluation method such as Leave-One-Day-Out on the complete dataset, which consists of training the classification algorithm with the data of nine days and evaluating it with the data of the remaining day (a sketch follows below). Depending on the results of this experiment, it could be concluded whether more labeled data are necessary to train the classification algorithm.
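A sketch of this Leave-One-Day-Out evaluation is shown below, assuming the labeled examples have already been grouped into one WEKA Instances object per day; this experiment is proposed as future work and is not part of the reported results.

```java
import java.util.List;

import weka.classifiers.Evaluation;
import weka.classifiers.meta.AdaBoostM1;
import weka.classifiers.trees.J48;
import weka.core.Instances;

// Leave-One-Day-Out: train on nine days and evaluate on the remaining one,
// repeating the process for each of the ten days.
public class LeaveOneDayOut {

    public static void evaluate(List<Instances> days) throws Exception {
        for (int testDay = 0; testDay < days.size(); testDay++) {
            // Empty training set with the same header as the daily sets.
            Instances train = new Instances(days.get(0), 0);
            for (int d = 0; d < days.size(); d++) {
                if (d == testDay) {
                    continue;
                }
                for (int i = 0; i < days.get(d).numInstances(); i++) {
                    train.add(days.get(d).instance(i));
                }
            }
            train.setClassIndex(train.numAttributes() - 1);

            Instances test = new Instances(days.get(testDay));
            test.setClassIndex(test.numAttributes() - 1);

            AdaBoostM1 boost = new AdaBoostM1();
            boost.setClassifier(new J48());
            boost.buildClassifier(train);

            Evaluation eval = new Evaluation(train);
            eval.evaluateModel(boost, test);
            System.out.printf("Day %d held out: accuracy = %.2f%%%n",
                    testDay + 1, eval.pctCorrect());
        }
    }
}
```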
The confusion matrix presented in Appendix B shows three activities with many misclassifications: ‘Idle’, ‘Watch TV’ and ‘Play a video game’. ‘Idle’ represents periods in which there are sensor data but no activity was labeled. This occurs especially at the beginning and end of each session’s recording, just before the first activity starts and after the last one finishes. Therefore, the inclusion of ‘Idle’ would not be necessary for HAR in a real environment; in that scenario, the accuracy would increase to 62.77%. On the other hand, 76 of the 95 examples corresponding to the activity ‘Watch TV’ and 17 of the 23 examples corresponding to ‘Play a video game’ were erroneously classified as ‘Relax on the sofa’. Although one might expect the binary sensors to provide enough information to classify these activities correctly, the results obtained seem to show the opposite. As future work, this case will be analyzed in depth to determine what happened and to propose additional features or rules that allow the correct classification of these activities.

Funding

This research was funded by the Colombian Administrative Department of Science and Technology (Colciencias) under the call 727-2015-National PhD programs, grant number 1061728514.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Binary sensors deployed in the SmartLab.
Door | Kettle | Trash | Cupboard cups
TV | Medication box | Tap | Dishwasher
Sensor Kitchen movement | Fruit platter | Tank | Top WC
Motion sensor bathroom | Cutlery | Laundry basket | Closet
Motion sensor bedroom | Pots | Pyjamas drawer | Washing machine
Motion sensor sofa | Water bottle | Bed | Pantry
Refrigerator | Remote XBOX | Kitchen faucet
Microwave | Pressure sofa | Wardrobe clothes
Table A2. BLE beacons deployed in the SmartLab.
TV controller | Fridge | Pyjama drawer
Book | Pot drawer | Bed
Entrance door | Water bottle | Bathroom tap
Medicine box | Garbage can | Toothbrush
Food cupboard | Wardrobe door | Laundry basket
Table A3. Activities included in the dataset.
ID | Activity name
1 | Take medication
2 | Prepare breakfast
3 | Prepare lunch
4 | Prepare dinner
5 | Breakfast
6 | Lunch
7 | Dinner
8 | Eat a snack
9 | Watch TV
10 | Enter to the SmartLab
11 | Play a video game
12 | Relax on the sofa
13 | Leave the SmartLab
14 | Visit in the SmartLab
15 | Put waste in the bin
16 | Wash hands
17 | Brush teeth
18 | Use the toilet
19 | Wash dishes
20 | Put washing into the washing machine
21 | Work at table
22 | Dressing
23 | Go to the bed
24 | Wake up

Appendix B

Confusion matrix of the classification of the examples of part two of the dataset (rows: actual classes, ‘Idle’ and activities 1–24; columns: predicted classes in the same order).

References

  1. Hardegger, M.; Nguyen-Dinh, L.-V.; Calatroni, A.; Tröster, G.; Roggen, D. Enhancing action recognition through simultaneous semantic mapping from body-worn motion sensors. In Proceedings of the 2014 ACM International Symposium on Wearable Computers (ISWC ’14), Seattle, WA, USA, 13–17 September 2014; pp. 99–106.
  2. Loveday, A.; Sherar, L.B.; Sanders, J.P.; Sanderson, P.W.; Esliger, D.W. Technologies that assess the location of physical activity and sedentary behavior: A systematic review. J. Med. Internet Res. 2015, 17.
  3. Ceron, J.D.; Lopez, D.M. Human Activity Recognition Supported on Indoor Localization and Vice Versa: A Systematic Review. Stud. Health Technol. Inform. 2018, 249, 93–101.
  4. Hardegger, M.; Roggen, D.; Calatroni, A.; Tröster, G. S-SMART: A Unified Bayesian Framework for Simultaneous Semantic Mapping, Activity Recognition, and Tracking. ACM Trans. Intell. Syst. Technol. 2016, 7, 1–28.
  5. Possos, W.; Cruz, R.; Cerón, J.D.; López, D.M.; Sierra-Torres, C.H. Open dataset for the automatic recognition of sedentary behaviors. Stud. Health Technol. Inform. 2017.
  6. UJAmI Smart Lab Repository. Available online: http://ceatic.ujaen.es/ujami/en/repository (accessed on 15 September 2018).
  7. Chapman, P.; Clinton, J.; Kerber, R.; Khabaza, T.; Reinartz, T.; Shearer, C. CRISP-DM 1.0: Step-by-Step Data Mining Guide, 2000. Available online: http://www.citeulike.org/group/1598/article/1025172 (accessed on 15 September 2018).
  8. Ceron, J.D.; Lopez, D.M.; Ramirez, G.A. A mobile system for sedentary behaviors classification based on accelerometer and location data. Comput. Ind. 2017.
  9. Hall, M.; Frank, E.; Holmes, G.; Pfahringer, B.; Reutemann, P.; Witten, I. The WEKA data mining software: An update. SIGKDD Explor. 2009, 11, 10–18.
Table 1. Accuracy of the classification using 10-fold cross-validation on part one of the dataset.
Algorithm | J48 | IB1 | SVM | RF | ABM1 | Bagging
Accuracy (%) | 88 | 91.2 | 89.4 | 90.3 | 92.1 | 89.8