Recognition of Daily Activities of Two Residents in a Smart Home Based on Time Clustering

With the aging of the population, the recognition of elderly activity in smart homes has received increasing attention. In recent years, single-resident activity recognition based on smart homes has made great progress. However, few researchers have focused on multi-resident activity recognition. In this paper, we propose a method to recognize two-resident activities based on time clustering. First, a de-noising method is used to extract the features of the dataset. Second, the dataset is clustered based on the begin time and end time of each activity. Finally, activity recognition is completed using a similarity matching method. To test the performance of the method, we used two two-resident datasets provided by the Center for Advanced Studies in Adaptive Systems (CASAS). We evaluated our method by comparing it with some common classifiers. The results show that our method achieves improvements in accuracy, recall, precision, and F-Measure. At the end of the paper, we explain the parameter selection and summarize our method.


Introduction
The aging of the world's population is becoming an increasingly serious problem: the proportion of elderly people is increasing while the fertility rate continues to decrease. By the end of 2019, the number of elderly people worldwide exceeded the number of infants and young children. The problems brought about by an aging population are not only political and financial; as people age, their mental capacity gradually weakens, and symptoms such as declining memory and brain function occur. The resulting elderly care problem is becoming increasingly prominent. Some empty-nest seniors are old and frail, and there is no one to take care of them in emergencies, such as falls, which may cause irreparable harm.
To address this problem, some researchers have begun to focus on elderly activity recognition in smart homes. Elderly activity recognition is an important function of smart homes, mainly because it can detect abnormalities in the elderly by obtaining information about their daily activities. This can help predict some potential diseases in the elderly, such as Alzheimer's disease, which is also the main motivation for many activity recognition studies in intelligent environments (and has been widely recognized by families affected by Alzheimer's disease) [1].
According to different data collection methods, smart home activity recognition can be roughly divided into intrusive and non-intrusive design.
Intrusive design usually refers to video-based activity recognition. It records the daily lives of the elderly through cameras and then analyzes the video for activity recognition [2][3][4]. Video-based activity recognition performs very well [5], but its disadvantages are also prominent. First, it collects a lot of private and sensitive information. Second, it is sensitive to lighting conditions. Finally, it is very expensive.
Non-intrusive design is mainly divided into wearable and environmental interaction activity recognition. Wearable devices mainly refer to miniature sensors, such as accelerometers and gyroscopes [6][7][8][9], which can be worn on the body. When older people wear them, the devices can collect information anytime, anywhere. However, wearing these devices can be a burden and cause resentment in the wearers.
With the development of Internet of Things technology [10,11], smart homes have become more complete, making environmental interaction activity recognition popular. This method interacts with seniors through sensors embedded in objects in the smart home. These sensors are mainly divided into environmental sensors and binary sensors. Environmental sensors collect environmental information in real time. Binary sensors are triggered by the seniors' interactions and can be used to locate them. This method is unobtrusive and convenient, and does not cause resentment in the elderly.
In this paper, we focus on environment-based interactive activity recognition. Many commonly used methods use statistical sensor frequencies as features and ignore the temporal correlation of the features. To improve on these methods, we propose a data-driven activity recognition method. The method is divided into three stages: feature extraction, temporal clustering, and activity recognition. In the feature extraction stage, a de-noising method removes interference from the other resident. It is generally believed that the daily activity of the elderly has a certain regularity, so we cluster activities that occur at close times. Finally, the similarity matching equation proposed in this paper is used to find the most similar instances, and the class is selected by a vote among them.
The structure of this paper is as follows: Section 2 introduces related work. Section 3 defines some terminologies. Section 4 describes our method. In Section 5, we validate our method and discuss it. Section 6 concludes the paper.

Related Work
Activity recognition based on non-intrusive design can be divided into data-driven and knowledge-driven, according to the differences of the used methods.
The knowledge-driven method abstracts objects, space, and time into domain rules, builds them into a reusable context model that is correlated with activities, and then uses inference and other technologies to determine the activity category. Ontologies are often used in knowledge-driven methods. Chen et al. [12] proposed a formal, explicit ontological modeling and representation approach for the smart home domain to process multisource sensor data streams. Ye et al. [13] proposed a hierarchical domain concept ontology model to represent domain knowledge, which is independent of the particular sensor deployment and activities of interest. Their subsequent research [14] combined ontology and statistical methods to automatically detect the boundaries of different activities and extended the Pyramid Match Kernel (PMK) to accommodate and balance the sensor noise in activity recognition. Ontologies are also used for cross-environment activity recognition. Wemlinger et al. [15] proposed a Semantic Cross-Environment Activity Recognition (SCEAR) system that establishes ontology models for 22 data sets provided by CASAS in order to map between different environments. This method recognizes activities well across two smart homes with similar layouts. The knowledge-driven method can clearly distinguish activities with significant semantic differences, but it cannot reliably distinguish activities with similar semantics.
Compared with knowledge-driven models, data-driven models place more emphasis on using large-scale data to build decision models [16]. Data-driven approaches mainly focus on supervised methods, which recognize unlabeled activities using classifiers trained on labeled activities. Ravi et al. [17] proposed a naive Bayesian approach to activity recognition using accelerometers. Ruben et al. [18] proposed a resident adaptation technique based on Hidden Markov Models (HMMs). This system segments and recognizes six different physical activities using inertial signals from a smartphone. Asghari et al. [19] proposed an online hierarchical hidden Markov model to predict the activity in the environment from any sensor event. This method first uses an HMM to recognize the beginning and end of an activity, and then predicts the next activity by establishing an HMM for each sensor event.
Some ensemble methods have also been applied in activity recognition. Hu et al. [20] proposed a novel splitting strategy based on the separating axis theorem (SAT) and used it to improve the random forest. Anna et al. [21] used a previously developed Cluster-Based Classifier Ensemble (CBCE) method [22] for smart home-based activity recognition; this method proposes a support formula for clustering to solve the recognition problem within clusters. In addition, with the development of deep learning, neural networks have also been applied to activity recognition. Arifoglu et al. [23] converted fixed-length sliding windows into a sparse two-dimensional time matrix so that Convolutional Neural Networks (CNNs) could be used for activity recognition. This is basically the same as the work of Gochoo et al. [24]; both were tested on the Aruba dataset, with roughly the same performance. However, Arifoglu et al. also proposed a novel method for identifying abnormal activities, in which abnormal activities are generated in the data set and then detected. Medina et al. [25] proposed a method that uses fuzzy time windows (FTW) to segment the data set, followed by Long Short-Term Memory (LSTM) for activity recognition.
There are also unsupervised and semi-supervised methods. The semi-supervised method mainly uses a small amount of labeled data to label a large amount of data, thereby reducing the workload of labeling activities. Hu et al. [26] proposed a cross-domain activity recognition (CDAR) algorithm that uses labeled activities to label another set of different but related activities. Wen et al. [27] proposed a similarity measurement formula that uses a small amount of labeled data to label a large amount of unlabeled data. The difficulty of the unsupervised method is the problem of data labeling. Researchers have proposed several unsupervised methods to address data annotation, such as frequent sensor mining [28], frequent periodic pattern mining [29], activity modeling based on a low-dimensional feature space [30], probabilistic models [31,32], and retrieval of activity definitions using Web mining [33].
Some multi-resident activity recognition methods based on smart homes have also been proposed. Hao et al. [34] proposed a knowledge-driven solution based on formal concept analysis (FCA) and sequential pattern mining to analyze the activity rules of different residents and recognize human activities in non-intrusive sensor data. Alemdar et al. [35] used the factorial hidden Markov model and nonlinear Bayesian tracking to recognize the behaviors of two residents. They did not distinguish between the residents and achieved good results. Guo et al. [36] proposed a method to extract features using an improved Term Frequency-Inverse Document Frequency (TF-IDF). They used the improved TF-IDF to calculate the probability of a sensor appearing in an activity, treated it as a new feature, and tested the method on the Tulum2009 and Cairo datasets. Lu et al. [37] used a Beta Process Hidden Markov Model (BP-HMM) to extract daily activity features and a Support Vector Machine (SVM) to recognize daily activities. They introduced the dependent Beta process into the HMM and integrated the state constraints of the sensors into the sampling process.
Many multi-resident activity recognition methods focus on feature extraction. This paper proposes a clustering method based on the time pattern of elderly activity, and then builds a similarity matching formula on top of it. This formula is based on the Levenshtein distance and is used in the activity recognition stage, rather than in the daily activity feature extraction stage. We applied this method to the activity recognition of two residents and obtained promising results. Although the cost of training is higher, in our experiments this method outperforms a number of single and ensemble classifiers.

Terminologies
To better express our method, we define some terminologies and use the activities in Table 1 as examples.
Definition 1. se = (d, t, s, ss, ar, as) is a sensor event, where d is the date when the event occurred, t is the time when the event occurred, s is the activated sensor, ss is the sensor status, ar is the corresponding activity, and as is the activity status.
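As an illustration of Definition 1, a sensor event could be represented as a small record type. This is only a sketch: the whitespace-separated line layout, the `parse_event` helper, and the sample line are assumptions made for the example and are not taken from the datasets or from the paper. The field `as` is renamed `as_` because `as` is a reserved word in Python.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SensorEvent:
    """One sensor event se = (d, t, s, ss, ar, as) as in Definition 1."""
    d: str                      # date when the event occurred
    t: str                      # time when the event occurred
    s: str                      # activated sensor
    ss: str                     # sensor status
    ar: Optional[str] = None    # corresponding activity, if annotated
    as_: Optional[str] = None   # activity status ("as" is a Python keyword)


def parse_event(line: str) -> SensorEvent:
    """Parse one whitespace-separated line (the layout is an assumption)."""
    parts = line.split()
    d, t, s, ss = parts[:4]
    ar = parts[4] if len(parts) > 4 else None
    as_ = parts[5] if len(parts) > 5 else None
    return SensorEvent(d, t, s, ss, ar, as_)


# Hypothetical line, made up for illustration only.
event = parse_event("2009-10-16 03:55:53 M013 ON Bed_to_Toilet begin")
```

Lines without an annotation simply leave `ar` and `as_` empty.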
The sensor sequence of Bed_to_Toilet activity in Table 1   For example, the Bed_to_Toilet activity in Table 1

Definition 4.
We count the importance of other sensors in the activities that begin with a sensor and store them in the SIA. The algorithm is shown in Algorithm 1.

Methodology
Our method is mainly divided into two parts: feature extraction and activity recognition. The process is shown in Figure 1.
Figure 1. Process for the activity recognition.

Feature Extraction
The feature extraction method is shown in Algorithm 2. First, we take the data set D = {se1, se2, …, sen} and extract it as a set of activities A = {a1, a2, …, an}. Next, we de-noise the sensor sequence sq of each activity in A. In a two-resident home, the sensor events triggered by one resident may be disturbed by those of the other resident, and our method removes this interference.

Algorithm 2 (fragment). ... if temp > w and s does not repeat: ... end for; Ar ← ∪{a.bt, a.et, ns}; end for; return Ar
Our de-noising method has two steps. The first step is to remove duplicate sensor features: if sq.s_i is equal to sq.s_{i+1} in sq, then only sq.s_i is kept. In the second step, we propose a sensor importance measurement method. As shown in Definition 4, SIA counts the importance of other sensors for each sensor that appears at the beginning of an activity. If the importance exceeds a set threshold, we keep the sensor.
Through the above two steps of screening, the extracted feature Ar = {bt, et, sq} is finally obtained.
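The two de-noising steps can be sketched as follows. The first step follows the text directly. For the second step, the exact definition of importance is not recoverable from the text here, so the sketch assumes that the importance of a sensor is the fraction of activities beginning with a given sensor in which it appears; `build_sia`, `denoise`, the threshold name `w`, and the toy sequences are all illustrative assumptions rather than the authors' implementation.

```python
from collections import Counter, defaultdict


def drop_consecutive_duplicates(sq):
    """Step 1: when s_i equals s_(i+1), keep only s_i."""
    out = []
    for s in sq:
        if not out or out[-1] != s:
            out.append(s)
    return out


def build_sia(sequences):
    """Assumed SIA: for each starting sensor, the fraction of activities
    beginning with it in which each sensor appears."""
    starts = Counter()
    seen_with = defaultdict(Counter)
    for sq in sequences:
        if not sq:
            continue
        first = sq[0]
        starts[first] += 1
        for s in set(sq):
            seen_with[first][s] += 1
    return {
        first: {s: n / starts[first] for s, n in counts.items()}
        for first, counts in seen_with.items()
    }


def denoise(sq, sia, w):
    """Step 2: keep sensors whose importance for sq[0] exceeds threshold w."""
    sq = drop_consecutive_duplicates(sq)
    if not sq:
        return sq
    importance = sia.get(sq[0], {})
    return [s for s in sq if importance.get(s, 0.0) > w]


# Toy training sequences, made up for illustration only.
training = [["M1", "M2", "M2", "M3"], ["M1", "M2"], ["M1", "M2", "M9"], ["M4", "M5"]]
sia = build_sia(training)
cleaned = denoise(["M1", "M1", "M2", "M9", "M3"], sia, w=0.5)
```

In this toy example, M9 and M3 each appear in only one of the three activities that begin with M1, so they fall below the threshold and are treated as interference.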

Activity Recognition
Our activity recognition method recognizes activities after the features have been extracted. First, we cluster the activities with K-means based on the bt and et in Ar. The number of clusters k is selected using the elbow method. The core index of the elbow method is the SSE (sum of squared errors):
$$\mathrm{SSE} = \sum_{i=1}^{k} \sum_{p \in C_i} \lVert p - m_i \rVert^2 \qquad (1)$$
where C_i is the i-th cluster, p is a sample point in C_i, m_i is the centroid of C_i (the mean of all samples in C_i), and SSE is the clustering error of all samples, which represents the quality of the clustering. The relationship between SSE and k is used to obtain the k with the largest curvature in the graph. The specific algorithm is shown in Algorithm 3.
Algorithm 3 (fragment). ... Cal ← ∪{(n, Val)}; end for; k ← getCur(Cal) // get the k with the highest curvature; return k
After clustering, we first find the cluster to which the input test instance belongs, and then calculate the similarity between the training instances in that cluster and the test instance. Here we propose a similarity matching method. If the test instance is t = {bt, et, sq} and an instance in the training set is a = {bt, et, sq}, the similarity between them is given by Equation (2), where w_1 and w_2 are the weights of the time terms and the sensor sequence term, respectively. We set 2 * w_1 + w_2 = 1, and the constant 24 refers to the 24 h in a day. Levenshtein.ratio is a method for calculating the similarity of two sequences:
$$\mathrm{ratio} = \frac{\mathit{sum} - \mathit{ldist}}{\mathit{sum}} \qquad (3)$$
where sum is the sum of the lengths of sq_1 and sq_2, and ldist is the class edit distance. After obtaining the similarities, we select the n most similar instances in the cluster and let them vote for the label of the test instance. The complete recognition algorithm is shown in Algorithm 4.
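The matching and voting step can be sketched in Python under stated assumptions. The combination used in `similarity` (a weight w1 on the closeness of each of bt and et, scaled by 24 h, plus a weight w2 on the sequence ratio) is one plausible reading of the description and should not be taken as the paper's exact equation. The sketch also assumes that times are expressed in hours and that ldist charges 2 for a substitution and 1 for an insertion or deletion, which is how the ratio is commonly computed; the instance dictionaries, the `label` key, and the toy data are hypothetical.

```python
from collections import Counter


def ldist(a, b):
    """Edit distance with insert/delete cost 1 and substitution cost 2 (assumed)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            sub = prev[j - 1] + (0 if x == y else 2)
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, sub))
        prev = cur
    return prev[-1]


def ratio(sq1, sq2):
    """(sum - ldist) / sum, where sum = len(sq1) + len(sq2)."""
    total = len(sq1) + len(sq2)
    return 1.0 if total == 0 else (total - ldist(sq1, sq2)) / total


def similarity(t, a, w1=0.15, w2=0.7):
    """Assumed form: w1 on each time term, w2 on the sequence ratio."""
    time_part = sum(1 - abs(t[key] - a[key]) / 24 for key in ("bt", "et"))
    return w1 * time_part + w2 * ratio(t["sq"], a["sq"])


def classify(t, cluster, n=3):
    """Let the n most similar training instances vote for the label."""
    ranked = sorted(cluster, key=lambda a: similarity(t, a), reverse=True)
    votes = Counter(a["label"] for a in ranked[:n])
    return votes.most_common(1)[0][0]


# Toy cluster, made up for illustration only (times in hours).
cluster = [
    {"bt": 3.0, "et": 3.1, "sq": ["M1", "M2", "M3"], "label": "Bed_to_Toilet"},
    {"bt": 3.5, "et": 3.6, "sq": ["M1", "M2"], "label": "Bed_to_Toilet"},
    {"bt": 4.0, "et": 4.5, "sq": ["M7", "M8"], "label": "Other"},
]
test_instance = {"bt": 3.2, "et": 3.3, "sq": ["M1", "M2", "M3"]}
predicted = classify(test_instance, cluster, n=2)
```

With w1 = 0.15 and w2 = 0.7 the weights satisfy 2 * w1 + w2 = 1, so the similarity of two identical instances is 1.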
The Cairo dataset was collected in the home of a voluntary adult couple. A couple and a dog live in the smart apartment. The couple's children also visited the house at least once. The sensor layout is shown in Figure 3.

Evaluation Method
In this section, we used all the instances in each dataset and performed five-fold cross-validation to evaluate the performance of our method. In each of the five iterations, four folds are used for training and one fold is used for testing. The final accuracy is the average of the five results. In addition to accuracy, we also used precision, recall, and F-Measure to evaluate the results. We used the confusion matrix in Table 3 to represent the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) for a two-class classification problem. The columns and rows in the matrix refer to the actual and predicted classes of a classification model, respectively. Based on the confusion matrix, the accuracy, precision, recall, and F-Measure for class 1 can be calculated as presented in Equations (4)-(7), respectively:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (4)$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (5)$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (6)$$
$$\text{F-Measure} = \frac{2\,TP}{2\,TP + FP + FN} \qquad (7)$$

The four measures were calculated for each class, taking into account the class imbalance problem; the final result is presented as the average of all classes.
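The per-class measures and their average over classes can be sketched as below. The nested-dictionary layout `confusion[actual][predicted]` and the toy counts are assumptions made for the example; the functions use the standard definitions of precision, recall, and F-Measure.

```python
def per_class_metrics(confusion, cls):
    """Return (precision, recall, f_measure) for one class.

    confusion[actual][predicted] holds the number of instances.
    """
    tp = confusion.get(cls, {}).get(cls, 0)
    fn = sum(confusion.get(cls, {}).values()) - tp
    fp = sum(row.get(cls, 0) for actual, row in confusion.items() if actual != cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = 2 * tp + fp + fn
    f_measure = 2 * tp / denom if denom else 0.0
    return precision, recall, f_measure


def accuracy(confusion):
    """Fraction of instances on the diagonal of the confusion matrix."""
    total = sum(sum(row.values()) for row in confusion.values())
    correct = sum(row.get(actual, 0) for actual, row in confusion.items())
    return correct / total if total else 0.0


def macro_average(confusion):
    """Unweighted mean of each measure over all actual classes."""
    rows = [per_class_metrics(confusion, c) for c in confusion]
    return tuple(sum(col) / len(rows) for col in zip(*rows))


# Toy two-class confusion matrix, made up for illustration only.
confusion = {"A": {"A": 8, "B": 2}, "B": {"A": 1, "B": 9}}
precision_a, recall_a, f_a = per_class_metrics(confusion, "A")
macro_precision, macro_recall, macro_f = macro_average(confusion)
```

Averaging without weighting by class size gives rare activities the same influence on the final score as frequent ones.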

Results and Evaluation
We first compare our results with several common single classifiers: K-Nearest Neighbor (KNN), LibSVM (a Support Vector Machine library), Sequential Minimal Optimization (SMO), Naïve Bayes (NB), RIPPER, C4.5, and Random Forests (RF). In addition to these, we also compare against several more complex classifiers. These classifiers are all implemented in Weka, and the specific results are shown in Tables 4 and 5. It can be seen from Tables 4 and 5 that our method and RF achieve the highest performance, and that our method outperforms RF. To better understand the performance of our method, we construct confusion matrices. Tables 6 and 7 show the confusion matrices for Tulum2010 and Cairo. From Table 6, we can see that in the Tulum2010 dataset there are almost no errors in the recognition of B, B_T, M_P, P_H, S_B, W_B1, W_B2, and other activities. However, recognition is somewhat poorer for E_H and L_H, for E and W_T, and for W_TV and W_L, which may be because the activities in each pair are represented by some of the same sensors, making them more difficult to distinguish. For example, E_H and L_H both occur at the door, W_T and E both occur in the dining room, and W_TV and W_L both occur in the bedroom.
Compared with Tulum2010, the Cairo dataset has fewer activities. From Table 7, we can see that there are three misidentifications of N_W activity as B_T. This may be because the routes and the occurrence times of these three N_W are similar to B_T. However, it can be seen that the overall recognition effect of our method is still very good.

Selection of k
We select k using the elbow method, whose core index is given in Equation (1). The core idea of the elbow method is as follows: as the number of clusters k increases, the samples are divided more finely and each cluster becomes more tightly aggregated, so the SSE naturally becomes smaller. When k is less than the true number of clusters, increasing k greatly increases the degree of aggregation of each cluster, so the SSE declines quickly. Once k reaches the true number of clusters, the gain in aggregation obtained by increasing k drops off sharply, so the decline of the SSE slows abruptly and then gradually flattens as k continues to increase. In other words, the curve of SSE against k has the shape of an elbow, and the value of k at this elbow is taken as the true number of clusters in the data. This is why the method is called the elbow method. Figure 4 shows the relationship between k and SSE in the Tulum2010 dataset. It can be seen that the curvature is the largest when k = 3, so we choose k = 3 as the optimal number of clusters.
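The SSE and the elbow choice can be sketched as follows. The text does not specify how curvature is computed, so the sketch assumes a discrete second difference of the SSE over consecutive values of k as a stand-in; the function names and the SSE values in the example are made up for illustration and are not results from the paper.

```python
def sse(clusters):
    """Sum of squared errors; clusters is a list of lists of (bt, et) points."""
    total = 0.0
    for points in clusters:
        if not points:
            continue
        dim = len(points[0])
        centroid = [sum(p[d] for p in points) / len(points) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in points for d in range(dim))
    return total


def elbow_k(sse_by_k):
    """Return the k with the largest second difference of SSE.

    Assumes the keys are consecutive integers; the first and last k
    cannot be chosen because they have no neighbour on one side.
    """
    ks = sorted(sse_by_k)
    best_k, best_curv = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        curv = sse_by_k[prev_k] - 2 * sse_by_k[k] + sse_by_k[next_k]
        if curv > best_curv:
            best_k, best_curv = k, curv
    return best_k


# Illustrative values only: the SSE drops quickly up to k = 3, then flattens.
example_sse = sse([[(0.0, 1.0), (2.0, 3.0)], [(10.0, 11.0)]])
chosen_k = elbow_k({1: 100.0, 2: 60.0, 3: 12.0, 4: 10.0, 5: 9.0})
```

In practice the SSE for each k would come from running K-means on the (bt, et) pairs; the clustering itself is omitted here to keep the sketch short.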

Selection of n, w1, and w2 Values
We believe that the value of n should be less than the number of kinds of activities in the data set, which makes our results more accurate. Equation (2) gives the detailed similarity formula. For w1 and w2, we always keep 2*w1 ≤ w2 and 2*w1 + w2 = 1. Figure 5 shows the relationship between n and accuracy when w1 = 0.15 and w2 = 0.7 in the Tulum2010 data set. As can be seen, the fluctuation of accuracy is small.

Conclusions
This paper presents a daily activity recognition method based on time clustering for two residents in a smart home. First, noise reduction is performed on the features. Second, clustering is performed to separate activities that occur in the same space but at different times. Finally, a similarity matching formula based on the Levenshtein distance is proposed for daily activity recognition. The proposed method not only reduces the interference caused by the activities of different residents in daily life, but also separates the activities of residents at different times in the same space. We evaluated the proposed method on two public datasets, Tulum2010 and Cairo. The results show that our method works well on both large and small datasets.
