A Two-stage Method for Solving Multi-resident Activity Recognition in Smart Environments

To recognize individual activities in multi-resident environments with pervasive sensors, researchers have pointed out that finding data associations can contribute to activity recognition, and previous methods either require or infer data association when recognizing new multi-resident activities from new sensor observations. However, data associations are often difficult to find, and available approaches to multi-resident activity recognition degrade when the data association is not given or is induced with low accuracy. This paper exploits some simple knowledge of multi-resident activities by defining a Combined label and its state set, and proposes a two-stage method for multi-resident activity recognition. We define Combined label states in the model building phase with the help of data association, and learn Combined label states in the new activity recognition phase without it. Our two-stage method is embodied in the new activity recognition phase, where we figure out multi-resident activities in the second stage after learning Combined label states in the first stage. Experiments on the multi-resident CASAS data demonstrate that our method can increase the recognition accuracy by approximately 10%.


Introduction
Activity recognition has emerged as an Ambient Intelligence (AmI) feature to facilitate the development of applications that are aware of users' presence and context and are adaptive and responsive to their needs and habits. Such assistive technologies have been widely adopted in smart home systems and healthcare applications involving only a single resident [1,2], and have delivered promising results for offering thoughtful services to an elderly resident [3,4,5] and providing responsive help in emergencies [6]. Interest in designing smart environments that can reason about multiple residents is now growing. However, earlier work focused on finding the global behaviors and preferences of a group of residents [7,8], and little attention has been paid to recognizing the activity of every resident in a multi-resident environment. The focus of our study is how to recognize individual activities in multi-resident environments.
Moving from a single resident to multiple ones, many researchers resort to asking residents to carry wearable devices [9] or capturing their behavior with video cameras [10]. However, this might erode the human-centric advantages of AmI, since wearable sensors may cause inconvenience and camera-based solutions may raise residents' privacy concerns. Therefore, some researchers prefer to deploy non-obtrusive and pervasive sensors such as pressure sensors, reed switches, motion sensors and temperature sensors for activity reasoning in a multi-resident environment.
For multi-resident activity recognition using non-obtrusive sensors, most related studies have involved data association, i.e., associating sensor data with the right person. In [11], Wilson and Atkeson first introduced the concept of data association and pointed out that we need some way of determining which occupant generated which observation. Most recent work has also looked into the benefits of data association in activity estimation. Singla et al. [12] first modeled the independent and joint activities among multiple residents using a single Hidden Markov Model (HMM); then, with manual data association, they modeled one HMM for each resident. Based on the Conditional Random Field (CRF), Hsu et al. [13] proposed an iterative procedure to train and infer the activities and the data association in turns, and their empirical studies showed that good data association can greatly improve the accuracy of multi-resident activity recognition. Provided with data association, Chiang and Hsu [14] improved the accuracy of activity estimation by modeling interactions among residents. Wang et al. [15] proposed the Coupled Hidden Markov Model (CHMM) and the Factorial Conditional Random Field (FCRF) to model interaction processes.
Beyond its role in training activity recognition models, however, previous methods either require or infer data association when recognizing new multi-resident activities from new sensor observations. For the training dataset, it is easy to find data associations because which resident is carrying out which activity is known at each moment. For new sensor observations it is hard, because we do not know which resident is carrying out the activity. The problem is that activity estimation in a multi-resident context degrades greatly when data association is not available or is induced with low quality [12,13]. Learning data association is interesting but not easy, and even harder with unreliable data from non-obtrusive and pervasive sensors; moreover, it may vary from application to application, so there is no strong underlying association for recognition algorithms to rely on.
In the smart home setting, activities can be done independently, jointly or exclusively, so we believe there is some simple knowledge about activities (i.e., patterns, global features and trends) that is invariant and easy to establish in the multi-resident context. For instance, two residents may be used to carrying out activities collaboratively (playing chess), one may do one activity while the other does another (resident B usually sleeps while resident A watches TV), or one can carry out some activity exclusively (resident B must do something else when the computer is used by resident A, since there is only one computer in the home). Data association can be a kind of simple knowledge, but not the other way around, in that simple knowledge does not indicate the association between sensor data and a particular person. More precisely, this so-called "simple knowledge" goes beyond data association by expressing something like "what is done together" and "what is done individually"; it aims to exploit properties of multi-resident behavior, either spatial or temporal, which are handy and widely valid in most smart environments.
To exploit simple knowledge in multi-resident activities, we define a Combined label and use its states to signify two or more activities that are performed simultaneously and independently by multiple residents. By learning the states of the Combined label, we can capture the global features and trends of multi-resident activities; our goal is to find the most likely sequence of Combined label states that could have generated the original sensor event sequence, and then use these states to figure out the multi-resident activities. In doing so, this paper proposes a two-stage method for multi-resident activity recognition to improve performance. Our model has two phases: a model building phase and a new activity recognition phase. The two-stage method is embodied in the new activity recognition phase. We build and train the activity recognition model in the model building phase using the training dataset. In the new activity recognition phase, our method learns Combined label states in the first stage and then maps them to multi-resident activities in the second stage.
Compared to previous methods, our method builds activity recognition models from a training dataset with the help of data association, but does not need data association when recognizing new multi-resident activities from new sensor observations. The paper is organized as follows: Section 2 introduces our two-stage method for multi-resident activity recognition in detail. Section 3 then verifies the method experimentally. Finally, Section 4 summarizes the paper.

Two-Stage Method for Multi-Resident Activity Recognition
In this section, we begin with the problem statement and give some definitions. After that, two typical activity recognition models, HMM and CRF, are briefly introduced. Finally, we propose our two-stage multi-resident activity recognition method.

Problem Statement
Activity recognition is often done with a learned state space model with hidden variables to infer a multi-resident activity sequence {y_1, y_2, …, y_T} from an observation sequence {x_1, x_2, …, x_T}, where each y_t (t = 1, 2, …, T) is an activity label vector whose dimension equals the number of residents and each x_t (t = 1, 2, …, T) is an observation vector whose dimension equals the number of sensors.

Some Definitions
Definition 1 (Single-label Problem) Single-label learning is phrased as the problem of finding a model that maps inputs x to a scalar output y.
Definition 2 (Multi-label Problem) Multi-label learning is phrased as the problem of finding a model that maps inputs x to a vector y, rather than a scalar output.
For an observation sequence x_{1:T} = {x_1, x_2, …, x_T}, the task is a Single-label Problem if the label sequence y_{1:T} = {y_1, y_2, …, y_T} that best explains the sequence consists of scalar labels y_t, and a Multi-label Problem if that label sequence consists of label vectors y_t.
Multi-resident activity recognition is a Multi-label Problem if we regard each resident as one label and each activity as one state. The activity label for two residents at time step t is a two-dimensional vector denoted as (y_t^1, y_t^2), where y_t^i (i = 1, 2) is the ID of the activity performed by the i-th resident that triggered the signals at time step t.
In the above two-resident scenario, if we count all activities of the two residents that can happen concurrently, we get 4 × 4 = 16 different two-dimensional label vectors, contained in the set {(0, 0), (0, 1), …, (3, 3)}. However, because some activities do not occur concurrently in real life, only seven two-dimensional label vectors appear, and some of them are the same. For example, a person cannot perform "using computer" if the computer is occupied by another person, so the combined state denoting that two persons perform "using computer" simultaneously is illegal. In the State Event Matrix M of Definition 4, the i-th row is the i-th State Event, the j-th column represents the state of the j-th label, and T is the number of State Events, which equals the total number of training samples. Definition 5 (State Event Set A): A is the set of all distinct State Events, where K is the total number of different State Events; every State Event can be found in set A.
According to the definition of M, the State Event Matrix M1 of the above multi-resident scenario can be expressed row by row as (0, 0), (1, 0), (1, 0), (3, 0), (1, 1), (1, 1), (3, 3). As M1 shows, the second and third State Events are the same, the fifth and sixth State Events are the same, and all State Events are included in the State Event Set A1 = {(0, 0), (1, 0), (3, 0), (1, 1), (3, 3)}.
To represent the possible states of all individual labels simultaneously, we define a Combined label C and a mapping f as follows. Definition 6 (Combined label C): The Combined label C is a single label whose state set is B = {0, 1, …, K − 1}, with one state per distinct State Event. Definition 7 (mapping f): f is the one-to-one mapping from the State Event Set A to the Combined label state set B that assigns each State Event (y^1, y^2, …, y^m) a Combined label state. In our multi-resident scenario, the State Event Set is A1 = {(0, 0), (1, 0), (3, 0), (1, 1), (3, 3)}, the Combined label state set is B1 = {0, 1, 2, 3, 4}, and the mapping between A1 and B1 is listed in Figure 1. Representing the possible states of all individual labels at the same time as Combined label states C, we obtain Figure 2b instead of Figure 2a, and the states of y^1, y^2 can be inferred from the states of C. For example, in our multi-resident scenario, when C = 1 we have y^1 = 1, y^2 = 0, and when C = 3 we have y^1 = 1, y^2 = 1. Therefore, the multi-label problem is converted to a single-label problem. Thus, multi-resident activities can be recognized by finding the states of the Combined label C, backward reasoning the State Events via the inverse mapping f⁻¹, and then reading off the states of the individual labels, which represent the activities of single residents in our case. Given the mapping from the Combined label state set to the State Event Set, the above method can recognize multi-resident activities based on Combined label states.
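As a concrete illustration, the construction of f and f⁻¹ can be sketched in a few lines of Python. This is our own minimal sketch, not the paper's implementation; the function name build_mapping and the first-appearance ordering of Combined label states are assumptions.

```python
from typing import Dict, List, Tuple

def build_mapping(state_events: List[Tuple[int, ...]]) -> Dict[Tuple[int, ...], int]:
    """Assign each distinct State Event a Combined label state,
    in order of first appearance (an assumed convention)."""
    mapping: Dict[Tuple[int, ...], int] = {}
    for event in state_events:
        if event not in mapping:
            mapping[event] = len(mapping)
    return mapping

# State Event Matrix M1 from the two-resident example (rows are (y1, y2) pairs).
M1 = [(0, 0), (1, 0), (1, 0), (3, 0), (1, 1), (1, 1), (3, 3)]
f = build_mapping(M1)                  # A1 -> B1
f_inv = {c: e for e, c in f.items()}   # B1 -> A1
# f maps (0,0)->0, (1,0)->1, (3,0)->2, (1,1)->3, (3,3)->4, matching B1 = {0,1,2,3,4}.
```

With this mapping, C = 1 inverse-maps to the State Event (1, 0) and C = 3 to (1, 1), matching the example in the text.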
If there is a need to know the data association, i.e., to track the residents, we can also derive it from the states of the Combined label C. For example, given C = 1, we can figure out y^1 = 1, y^2 = 0, where y^1 = 1 means that the first person is carrying out the first activity while y^2 = 0 means that the second person does not trigger signals at this time step. Because the first state is non-zero, we say that the first person triggered the signals and generated the observations.

Hidden Markov Model
The Hidden Markov Model (HMM) is a generative probabilistic model that extends Naive Bayes along the sequence dimension. The model consists of a hidden variable y and an observable variable x at each time step. There are two dependency assumptions in an HMM. First, the hidden variable at time t, namely y_t, depends only on the previous hidden variable y_{t-1}. Second, the observable variable at time t, namely x_t, depends only on the hidden variable y_t at that time slice. Figure 3 shows the graph structure of the HMM.
Thus, an HMM can be modeled using three probability distributions: the initial state distribution p(y_1), the transition probability distribution p(y_t | y_{t-1}) and the observation distribution p(x_t | y_t).
Given a labeled dataset {(x_t, y_t)}, t = 1, 2, …, T, we can calculate the initial state distribution π_i = p(y_1 = i), i = 1, 2, …, K, which represents the belief about which state the HMM is in when the first sensor event is seen. For a state (activity) a, this is calculated as the ratio of instances for which the activity label is a. The transition probability a_ij = p(y_t = j | y_{t-1} = i), i, j = 1, 2, …, K, signifies the likelihood of transitioning from a given state to any other state in the model and captures the temporal relationship between the states. For any two states a and b, the probability of transitioning from state a to state b is calculated as the ratio of instances having activity label a followed by activity label b, to the total number of instances. The observation distribution is factorized over sensors as p(x_t | y_t) = ∏_n p(x_t^n | y_t), where each sensor observation is modeled as an independent Bernoulli distribution: p(x_t^n | y_t = i) = μ_in^{x_t^n} (1 − μ_in)^{1 − x_t^n}, with μ_in calculated as the frequency with which the n-th sensor event is observed for activity i.
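The frequency-counting estimates above can be sketched as follows. This is an illustrative sketch under our own naming (fit_hmm); the small additive smoothing of the transition counts is our assumption, since the paper does not specify how zero counts are handled.

```python
import numpy as np

def fit_hmm(states, observations, n_states, n_sensors, eps=1e-6):
    """Estimate HMM parameters by frequency counting.
    states: length-T sequence of (Combined label) state indices;
    observations: T x n_sensors binary matrix of sensor events."""
    states = np.asarray(states)
    obs = np.asarray(observations, dtype=float)
    # Initial distribution pi_i: fraction of time steps labeled i.
    pi = np.bincount(states, minlength=n_states) / len(states)
    # Transition matrix a_ij = p(y_t = j | y_{t-1} = i), from bigram counts.
    trans = np.full((n_states, n_states), eps)
    for prev, cur in zip(states[:-1], states[1:]):
        trans[prev, cur] += 1
    trans /= trans.sum(axis=1, keepdims=True)
    # Bernoulli means mu_in = p(x^n = 1 | y = i): per-state sensor frequencies.
    mu = np.vstack([obs[states == i].mean(axis=0) if np.any(states == i)
                    else np.full(n_sensors, 0.5)
                    for i in range(n_states)])
    mu = np.clip(mu, eps, 1 - eps)   # avoid log(0) at decoding time
    return pi, trans, mu
```

The returned (pi, trans, mu) triple corresponds to p(y_1), p(y_t | y_{t-1}) and the factorized Bernoulli observation model.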

Conditional Random Field
A Conditional Random Field (CRF) is an undirected graphical model whose main idea is that of a maximum entropy model. A CRF models the conditional probability of a particular label sequence Y given a sequence of observations X [17]. Figure 4 shows the graph structure of a linear-chain CRF (LCRF). Like other undirected graphical models, the LCRF is a discriminative model in which the relations between two connected nodes are represented as potentials, which, unlike probabilities, are not restricted to values between 0 and 1. An LCRF is commonly trained by maximizing the conditional likelihood of a labeled training set to estimate the weight vector θ, with log conditional likelihood L(θ) = Σ_t Σ_k θ_k f_k(x, y_{t−1}, y_t, t) − log Z(x), where Z(x) is the normalization term. The feature function f_k(x, y_{t−1}, y_t, t) is either a state function s_k(y_t, x, t) or a transition function t_k(y_{t−1}, y_t, x, t). State functions s_k depend on a single label state and the observations, while transition functions t_k depend on pairs of label states: for each label pair (y', y''), t_k(y_{t−1}, y_t, x, t) = 1 if y_{t−1} = y' and y_t = y'', and 0 otherwise. Given the log likelihood and its gradient, training the model becomes a matter of numerical optimization. In the case of the LCRF, the objective function is convex, and first-order methods such as gradient ascent are directly applicable, although in practice more efficient algorithms such as conjugate gradient offer better performance. In addition to first-order methods, an approximate second-order method, limited-memory BFGS [18], can also be used. For a novel observation sequence x = {x_1, x_2, …, x_T1}, the sequence of activities that best fits the data is found using the Viterbi algorithm [16].
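Viterbi decoding, used at inference time for both models, can be sketched as follows. This is a generic log-space implementation of our own, written in the HMM notation of the previous subsection (pi, transition matrix, per-step emission log-likelihoods); it is not taken from the paper.

```python
import numpy as np

def viterbi(pi, trans, emit_loglik):
    """Most likely hidden state sequence.
    pi: length-K initial distribution; trans: K x K transition matrix;
    emit_loglik: T x K matrix of log p(x_t | y_t = k)."""
    T, K = emit_loglik.shape
    logpi, logA = np.log(pi), np.log(trans)
    delta = logpi + emit_loglik[0]       # best log-score ending in each state
    back = np.zeros((T, K), dtype=int)   # backpointers
    for t in range(1, T):
        scores = delta[:, None] + logA   # scores[i, j]: best path ending in i, then i -> j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + emit_loglik[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):        # follow backpointers in reverse
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

With informative emissions the decoder recovers the state switch; e.g. for pi = [0.9, 0.1], a sticky 2-state transition matrix and emissions favoring state 0, 0, 1, it returns [0, 0, 1].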

Two-Stage Method for Multi-Resident Activity Recognition
The flow charts of our multi-resident activity recognition method are shown in Figure 5. The top is the model building flow chart and the bottom is the activity recognition flow chart. There are two phases in our model, defined as the model building phase and the new activity recognition phase, while our two-stage method is embodied in the new activity recognition phase. In the new activity recognition phase, we learn Combined label states in the first stage and figure out the multi-resident activities from the Combined label states in the second stage.
Table 3 gives the procedure of our two-stage method for recognizing multi-resident activities with HMM (TSM-HMM) and with CRF (TSM-CRF).
Step 1. Model the State Events of the training data based on the multi-labels y_0, cluster them, and get the State Event Set A.
Step 2. Define the Combined label C and the corresponding state set, build the mapping f, and map the State Events of every training sample to Combined label states C_0 using f.
Step 3. Train activity recognition models such as HMM and CRF with x_0 and C_0, and obtain the model parameters.
Step 4. Infer the Combined label states C_1 using the trained model and the testing observations x_1.
Step 5. Backward reasoning using the inverse mapping f⁻¹ to get the estimated State Event for every testing observation.
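The steps above can be strung together as in the following toy sketch. The data are invented for illustration, and the nearest-lookup predictor standing in for a trained HMM/CRF in Step 3 is purely an assumption to keep the sketch self-contained; in the actual method, a sequence model is trained and decoded here.

```python
from collections import Counter

# Steps 1-2: multi-resident labels -> State Events -> Combined label states.
y0 = [(0, 0), (1, 0), (1, 0), (1, 1), (3, 3)]   # toy training multi-labels
x0 = [(0, 0), (1, 0), (1, 0), (1, 1), (0, 1)]   # toy training sensor vectors
f = {}
for event in y0:
    f.setdefault(event, len(f))                 # first-appearance ordering
C0 = [f[e] for e in y0]
f_inv = {c: e for e, c in f.items()}

# Step 3: train a recognizer on (x0, C0). A majority-vote lookup table
# stands in for HMM/CRF here, purely for illustration.
def train(x0, C0):
    table = {}
    for x, c in zip(x0, C0):
        table.setdefault(x, Counter())[c] += 1
    return {x: counts.most_common(1)[0][0] for x, counts in table.items()}

model = train(x0, C0)

# Steps 4-6: infer Combined label states C1, then inverse-map to State Events,
# which directly give the per-resident activity labels.
x1 = [(1, 0), (1, 1)]
C1 = [model[x] for x in x1]
y1 = [f_inv[c] for c in C1]
```

Note that Step 6 is immediate once the State Events are recovered: each component of a State Event is one resident's activity ID.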

Validation
This section is organized as follows: after introducing the dataset and giving some measurement criteria, we carry out two experiments.

Dataset Preparation
In this section, we validate our two-stage method for multi-resident activity recognition using a dataset used in [12] and collected in the CASAS project in the WSU smart apartment testbed, where two residents perform activities of daily living (ADL).
Sensors: The testbed is equipped with motion and temperature sensors as well as analog sensors that monitor water and stove burner use (see Figure 6). The motion sensors are located on the ceiling approximately one meter apart to locate the residents; Voice over IP (VoIP) technology captures phone usage, and switch sensors monitor usage of the phone book, a cooking pot, and the medicine container. Activities: There are 26 files in this dataset, with approximately 400 to 800 sensor events in each file. Each file corresponds to one pair of participants who perform fifteen activities in the apartment. The data represents the observation of two residents asked to perform 15 activities, listed as follows: (1) Filling medication dispenser: Fill the medication dispenser in the kitchen using items obtained from the cabinet. Return the items to the cabinet when done. Some two-dimensional labels do not occur at all in this dataset, some happen rarely, while some occur frequently. There are 15 activities that can be detected in ideal conditions for each resident at a time step, so there are 16 labels for each resident (including label 0, which means that the resident performed an unknown activity and did not trigger the signals). For two residents, there are 16 × 16 = 256 two-dimensional labels theoretically, but only 27 different two-dimensional labels appear in the CASAS dataset, since some concurrent activities do not occur at all in reality.

Measurement Criteria
In the first experiment, we use the measurement criteria of [12] to validate our two-stage method, where accuracy is defined by comparing correctly labeled activities with total activities, and the average indicates that it is averaged across all possible activities.
Activity recognition is in essence a multi-label classification problem, so we choose the corresponding measures [19] to evaluate the performance of our models in the second experiment. Quality of the overall classification is assessed in two ways: macro-averaging and micro-averaging. We use macro-averaging because it treats all classes equally, and take β = 1, which weights recall and precision evenly. The multi-label classification measurement criteria are as follows: Precision_macro = (1/l) Σ_{i=1}^{l} tp_i / (tp_i + fp_i), Recall_macro = (1/l) Σ_{i=1}^{l} tp_i / (tp_i + fn_i), and F1_macro = 2 · Precision_macro · Recall_macro / (Precision_macro + Recall_macro), where l is the total number of classes, tp_i is the number of instances correctly recognized as the i-th class (true positives), tn_i is the number of instances correctly recognized as not belonging to the i-th class (true negatives), fp_i is the number of instances incorrectly recognized as the i-th class (false positives), and fn_i is the number of instances of the i-th class that were incorrectly rejected (false negatives).
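With β = 1, the macro-averaged scores can be computed as in the following sketch (a helper of our own; here macro-F1 is derived from the macro-averaged precision and recall, one of the two common conventions):

```python
def macro_scores(tp, fp, fn):
    """Macro-averaged precision, recall and F1 (beta = 1) over l classes,
    given per-class true-positive, false-positive and false-negative counts."""
    l = len(tp)
    precisions = [tp[i] / (tp[i] + fp[i]) if tp[i] + fp[i] else 0.0 for i in range(l)]
    recalls = [tp[i] / (tp[i] + fn[i]) if tp[i] + fn[i] else 0.0 for i in range(l)]
    p = sum(precisions) / l   # unweighted mean: every class counts equally
    r = sum(recalls) / l
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

For example, with tp = [2, 1], fp = [0, 1], fn = [1, 0], the per-class precisions are 1.0 and 0.5, giving macro-precision 0.75.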

Experiment 1
This experiment carries out the two-stage method with threefold cross validation and compares the results with previous methods. For the training dataset, we map the multi-resident activity labels to Combined label states C_0 after defining the Combined label C and building the mapping f. Then, we train the HMM and the CRF with the training sensor events and the Combined label states. For the testing dataset, with only the testing sensor events, we first estimate the Combined label states using the trained model. The average accuracies of the Combined label states for HMM and CRF are 65.46% and 67.61%, respectively, which represent not the multi-resident activity recognition accuracy but the accuracy of recognizing multi-resident activity knowledge.
To get the multi-resident activity labels, we map the Combined label states to multi-resident activity labels with the built mapping f⁻¹. Figure 7 gives the average recognition accuracies of the four models in recognizing multi-resident activities. Singla et al. [12] reported an average accuracy of 60.60% with a single HMM, and Hsu et al. [13] reported an average accuracy of 64.16% with the Iterative CRF. In contrast, TSM-HMM achieves an average accuracy of 75.77% and TSM-CRF achieves 75.38%, which are clearly higher than the single HMM and the Iterative CRF. TSM-HMM and TSM-CRF outperform the single HMM and the Iterative CRF for the following reasons. First, the previous single-resident activity recognition models are not suitable for multi-resident activity recognition. In the HMM implementation, a single model is implemented for both residents; the model not only needs to represent transitions between activities performed by one person, but also transitions between residents and transitions between different activities performed by different residents. Second, the activity recognition accuracy of the Iterative CRF depends on the accuracy of the inferred data association. Table 7 lists the Combined label states in the test dataset and the corresponding multi-resident activity labels. Figure 8 shows the recognition accuracies of the Combined label states, and Figure 9 shows the recognition accuracies of the multi-resident activities. As they show, the accuracies of activities 1-5 are the same as those of Combined label states 9, 1, 2, 10, 3. This is because those activities are performed separately by one resident. Activity 12 is also performed separately by one resident, but its accuracy is higher than that of Combined label state 17. Activity 15 only exists in Combined label state 20, but its accuracy is higher than that of Combined label state 20. This is because our method can capture the knowledge in the multi-resident activities.

Conclusions
Multi-resident activity recognition is an important yet challenging problem for allowing elderly people to be better assisted with context-aware services. In order to reason about multi-resident activities, this paper exploits some simple knowledge about multi-resident activities by defining a Combined label and its state set, and proposes a two-stage activity recognition method that operates after the typical activity recognition model is built. The simple knowledge we use is not the same as data association, in that it encodes spatial and temporal constraints without indicating which occupant generated which observation. This method converts multi-label problems into single-label problems by treating multi-resident activities at the same moment as a Combined label state, and can capture global activity features and trends of multiple residents from sensor data in smart environments.
We validated the algorithm by recognizing multi-resident activities from the CASAS dataset. In the first experiment, we compared our two-stage method with previous multi-resident activity recognition methods; the results show that our method recognizes multi-resident activities with higher accuracy. In the second experiment, we compared TSM-HMM and TSM-CRF in terms of multi-label classification measurement criteria, and also analyzed the relations between Combined label states and multi-resident activities. Any contribution to multi-resident activity recognition will help people live better. In future work, we will build models to learn more about global activity features and trends, and then use them to recognize complex activities in multi-resident environments.

Figure 2. Graph structure for the two-label problem (a) and graph structure for the individual-label problem (b).

Figure 5. The flow charts of our multi-resident activity recognition method.

Step 6. Figure out the activity labels y_1 based on the estimated State Event.

(2) Hanging up clothes: Hang up clothes in the hallway closet. The clothes are laid out on the couch in the living room. (3) Moving furniture: Move the couch and coffee table to the other side of the living room. Request help from the other person. (4) Reading magazine 1: Sit on the couch and read a magazine. (5) Watering plants: Water plants located around the apartment. Use the watering can located in the hallway closet. Return the watering can to the closet when finished.

Figure 7. Average recognition accuracies of the four models in recognizing activities for multiple residents.

Figure 8. The recognition accuracies of Combined label states.

Figure 9. The recognition accuracies of multi-resident activities.

This is dictated by two reasons: first, for the training dataset, we need the simple knowledge to build the State Event Matrix, to create the State Event Set, and to build the activity recognition model offline; second, the State Event Matrix for the testing dataset is built by inverse mapping from the Combined label states to the State Event Set, where the Combined label states are inferred by the learned activity recognition model and the State Event Set is learned from the training dataset.
To reduce the complexity of labeling, we introduce the concepts of State Event, State Event Matrix M and State Event Set A. Definition 3 (State Event): For any observation x_t with m different labels, we define the State Event as (y_t^1, y_t^2, …, y_t^m), where y_t^i is the state of the i-th label at time step t. A State Event is a multi-dimensional vector representing the possible states of all individual labels at the same time step. In the multi-resident scenario, we use the State Event to represent the multi-resident activities at the same time step, and y_t^i presents the activity of the i-th resident at time step t. Definition 4 (State Event Matrix M): For an observation sequence with m different labels, we define the State Event Matrix M as the matrix whose t-th row is the State Event at time step t. Therefore, we can map the State Event at time t to the state set B.

Table 1. Algorithm 1: the generation process of the State Event Matrix for the training dataset.
Based on the training dataset, we first build the State Event Matrix using the multi-resident activity labels (Definition 4). Then, we create the State Event Set (Definition 5), define the Combined label and the corresponding Combined label state set (Definition 6), and build the mapping f from the State Event Set to the Combined label state set (Definition 7). Finally, we obtain the Combined label states from the State Event Matrix using the mapping f. After mapping the multi-resident activities in the training dataset to the states in the Combined label state set, typical activity recognition models (e.g., HMM and CRF) are built and trained on the training dataset. When new sensor events arrive, we first infer the Combined label state for every observation using the built activity recognition model, and then inversely map the Combined label states to a State Event Matrix using f⁻¹. The State Event Matrix is composed of State Events, which represent the multi-resident activities at the same time step; the component y_t^i of a State Event presents the activity of the i-th resident at time step t. So from the State Event Matrix we can obtain the multi-resident activities easily.

Algorithm 1. Generation process of the State Event Matrix for the training dataset. Input: multi-resident activity labels L = {y_t}, t = 1, …, T.

There are two kinds of State Event Matrices: one corresponding to the training dataset and one corresponding to the testing dataset. The former is defined directly by the multi-resident activity labels, where data association is given. The latter is obtained by backward reasoning from the Combined label states to the State Event Set based on the inverse mapping f⁻¹, where the Combined label states are inferred from the sensor observations using the trained activity recognition model. Algorithm 1 (in Table 1) and Algorithm 2 (in Table 2) are the automatic generation processes of the State Event Matrices corresponding to the training and the testing datasets, where m is the number of residents, T is the number of samples in the training dataset and T1 is the number of observations in the testing dataset.
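The two generation processes can be sketched as a pair of small helpers; the names and signatures are ours, and the paper's tables give the full pseudocode.

```python
def state_event_matrix_train(labels):
    """Algorithm 1 (sketch): for the training dataset, the State Event Matrix
    is just the stacked multi-resident labels, since data association is given."""
    return [tuple(row) for row in labels]

def state_event_matrix_test(C1, f_inv):
    """Algorithm 2 (sketch): for the testing dataset, backward-reason by
    inverse-mapping the inferred Combined label states to State Events."""
    return [f_inv[c] for c in C1]
```

The test-side matrix thus depends only on the inferred Combined label states and the mapping learned from training data, never on test-time data association.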

Algorithm 2. Generation process of the State Event Matrix for the testing dataset. Input: sensor observations in the testing dataset: x = {x_t}, t = 1, …, T1.

Table 4. The occurrence counts of different two-dimensional labels.

Table 7. Combined label states in the test dataset and corresponding multi-resident activity labels.