Real-time Recognition of Interleaved Activities based on Ensemble classifier of Long Short-Term Memory with Fuzzy Temporal Windows

In this paper, we present a methodology for real-time recognition of interleaved activities based on Fuzzy Logic and Recurrent Neural Networks. First, we propose a representation of binary-sensor activations based on multiple Fuzzy Temporal Windows. Second, we propose an ensemble of activity-based classifiers for balanced training and selection of relevant sensors. Each classifier is configured as a Long Short-Term Memory with self-reliant detection of interleaved activities. The proposed approach was evaluated using well-known interleaved binary-sensor datasets comprised of activities of daily living.


Introduction
Globally, the population demographics are gradually shifting from younger to older age groups meaning elderly care is becoming unsustainable [1]. In order to relieve some workload from carers, whilst encouraging older adults to remain independent at home, assistive systems have been developed [2]. Some of these systems are capable of identifying simple to complex human activities, as well as the context in which they occur, from sensor data; this is currently a core aspect of smart assistive technologies [3]. Activity recognition assists with identifying the tasks being carried out and can determine whether the occupant has any difficulties with completing tasks or daily activities [4]. When tasks and scenarios become more complex, they are referred to as interleaved activities [5].
In this paper, we address the real-time recognition of interleaved activities. Real time refers to the recognition of activities while they are taking place; new sensor events are recorded, whilst streaming, without including explicit information on the labelled time interval of the evaluation [6]. In this way, the methodology presented in this work faces two key problems: (i) learning from activities which are developed in any order, interweaving and performing tasks in parallel if desired [7]; and (ii) recognizing activities in real time, without including explicit information on future sensor events [8].
Activity recognition is the process of retrieving high-level knowledge about activities and occurrences taking place in an environment such as a smart home, whilst also learning about the behaviour of those present in the environment [9].
Real-time activity recognition is a challenging area of smart home technology [8]. The main difficulty with real-time activity recognition approaches is correctly defining the size of the temporal window to allow effective recognition of activities [6,10]. The main concern with a single sliding window is that the more sensor events from the past are included, the more noise the data representation introduces into the model [11]. In this work, the use of multiple temporal windows and fuzzy aggregation methods is proposed to enable the long- and middle-term evaluation of sensors.
Furthermore, daily activity datasets suffer from a severe class imbalance problem [12,13], which is present when their classes are not equally represented [14]. We balance the training dataset for each activity classifier in order to solve the imbalance problem within datasets.
In the context of interleaved activities, few approaches address this complex problem. In [15], the authors proposed a multi-layer model for activity recognition, using RFID technology and appliance signatures to identify errors related to cognitive decline in daily activities. Focusing on morning routine activities, they carried out activity recognition in real time through RFID-based localization and the use of electrical sensors. It was found that using multiple sensors increased the accuracy of recognition; however, interleaved activities were not considered. In [16], the authors addressed this challenge using a dataset collected from an elderly person living alone by means of an event-driven approach [6]. It was found that although they were able to effectively distinguish concurrent activities, more event factors could have been used for better accuracy: only the timing and the sensor were taken into account, when the location could have been added.
In [17], the authors present an approach based on ontological and probabilistic reasoning, which requires a significant knowledge engineering effort to define a comprehensive ontology of activities, the home environment, and sensor events. That work does not include real-time evaluation, presenting a result of 81% based on sensor events.
In this work, we face the novel challenge of real-time recognition of interleaved activities, which presents a hard problem due to: (i) the evaluation in real time while the activities are being developed, without including future sensor information; and (ii) the concurrence and disorder of activities performed by the participants.
The remainder of the paper is structured as follows: Section 2 details an overview of the proposed methodology. In Section 3, we highlight the experiments performed, which are discussed in Section 4. Finally, the Conclusions and Ongoing Works are presented in Section 5, providing a critique of the study overall and proposing plans for future work.

Methodology
In this section, we detail the proposed methodology for recognizing interleaved human activities in real time, using an ensemble classifier of Long Short-Term Memory with Fuzzy Temporal Windows. It is based on a previous methodology for sequential learning of activity recognition [18]. This work is based around the following: (i) the concurrent activation, in parallel, of the ensemble of activity-based classifiers to provide a suitable interleaved response; and (ii) the computation of relevant sensors, which are filtered to improve the learning capabilities.
In summary, the proposed methodology for real-time recognition of interleaved activities is focused on three key points:
• A fuzzy temporal representation of long-term and short-term activations, which defines temporal sequences.
• An ensemble of activity-based classifiers, each defined by a suitable sequence classifier: Long Short-Term Memory (LSTM) [19].
• Balanced learning for each activity-based classifier, to avoid the imbalance problem from which daily activity datasets suffer [12,13]. It is optimized by the similarity relation between activities, which: (i) determines the adequate samples within the training dataset, based on the similarity with the activity to learn; and (ii) filters the relevant sensors to take into account in the learning process.
In Figure 1, we show the scheme of the methodology proposed.

Representation of Binary Sensors and Activities
A set of binary sensors is represented by S = {S_1, . . . , S_|S|} and a set of daily activities is represented by A = {A_1, . . . , A_|A|}, where |S| and |A| are the number of sensors and daily activities, respectively. They are described by a set of binary activations within a set of time ranges, each defined by a starting and an ending point of time, by Equation (1):

S_i = {S_i^1, . . . , S_i^|S_i|},  A_i = {A_i^1, . . . , A_i^|A_i|},  S_i^j = [t_j^-, t_j^+]   (1)

where (i) |S_i| and |A_i| are the total numbers of activations for a given binary sensor S_i and daily activity A_i, respectively; and (ii) t_j^- and t_j^+ are the starting and ending points of time of a given activation.

Segmentation of Dataset in Time-Slots
We generated a segmented timeline defined by time-slots (also known as time-steps), which indicate the activation of activities and sensors in a given time interval of fixed duration ∆t. The range for evaluating each time-slot t_i is defined by a sliding window between [t_i, t_i + ∆t].
For each time-slot and a given sensor s, we determine its activation based on whether the sensor has been activated within the time-slot (Equation (2)):

S(t_i, s) = 1 if s has an activation within [t_i, t_i + ∆t], and 0 otherwise   (2)

In a similar way, the activation of an activity a in a time-slot t_i is defined (Equation (3)):

A(t_i, a) = 1 if a is performed within [t_i, t_i + ∆t], and 0 otherwise   (3)

Each sensor or activity is represented as a set of activations by ordered time-slots S(s) = {S(t_0, s), . . . , S(t_n, s)}. For the sake of simplicity, we extensively denote a time-slot t_i in the timeline T by t^+.
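The segmentation above can be sketched as follows (an illustrative sketch, not the authors' code; the function name and the interval representation are hypothetical): activation intervals are swept by a window of fixed duration ∆t, and a slot is marked active if any activation overlaps it.

```python
# Illustrative sketch: segmenting a timeline into fixed-duration time-slots
# and marking, per slot, whether a binary sensor (or activity) was active
# at any point inside [t_i, t_i + dt). Names are hypothetical.

def segment_time_slots(activations, t_start, t_end, dt):
    """activations: list of (on, off) times for one sensor or activity.
    Returns S(t_i, s) as a list of 0/1 values, one per time-slot."""
    slots = []
    t = t_start
    while t < t_end:
        # the slot is active if any activation interval overlaps [t, t + dt)
        active = any(on < t + dt and off > t for on, off in activations)
        slots.append(1 if active else 0)
        t += dt
    return slots

# Example: one sensor activated during [70 s, 130 s), timeline 0-180 s, dt = 60 s
print(segment_time_slots([(70, 130)], 0, 180, 60))  # [0, 1, 1]
```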

Sensor Features Defined by Fuzzy Temporal Windows
In this section, a binary-sensor representation approach based on fuzzy temporal windows (FTWs) is detailed. FTWs are described from a given current time t* to a past point of time t_i as a function of the temporal distance ∆t*_i [20]. For that, a given FTW T_k relates the sensor activation S(t_i, s) at a current time t* to a fuzzy set T_k(∆t*_i), which is characterized by a membership function µ_T. Firstly, for a given FTW T_k and the current time t*, each past sensor activation S(t_i, s) is weighted by calculating the degree of time-activation within the fuzzy temporal window T_k according to Equation (4):

T_k(∆t*_i) = µ_{T_k}(t* − t_i)   (4)
Secondly, the degrees of time-activation are aggregated in order to obtain a single activation degree of both fuzzy sets S(s) ∩ T_k, by Equation (5):

T_k(s, t*) = max_i ( min( S(t_i, s), T_k(∆t*_i) ) )   (5)
We propose using the maximal and minimal operators for the aggregation and the intersection, respectively, which are recommended for representing binary sensors [21].

Sequence Features of FTW
The representation of sensor activations based on FTWs is used to define a sequence for the purposes of classification. We propose defining FTWs of incremental temporal sizes, to collect temporal activations from the long term to the short term.
Each FTW T_k is described by a trapezoidal function based on the time interval from a previous time t_i to the current time t*: T_k(∆t*_i)[l_1, l_2, l_3, l_4] is a fuzzy set characterized by a membership function whose shape corresponds to a trapezoidal function. The well-known trapezoidal membership functions are defined by a lower limit l_1, an upper limit l_4, a lower support limit l_2, and an upper support limit l_3 (Equation (7)):

µ_T(x)[l_1, l_2, l_3, l_4] = 0, if x ≤ l_1; (x − l_1)/(l_2 − l_1), if l_1 < x < l_2; 1, if l_2 ≤ x ≤ l_3; (l_4 − x)/(l_4 − l_3), if l_3 < x < l_4; 0, if x ≥ l_4   (7)

To generate FTWs in a simple manner, we propose to define them from a set of incremental ordered times of evaluation L = {L_1, . . . , L_|L|}, L_{i−1} < L_i, where the limits of the trapezoidal functions are calculated with regard to the index of the temporal window T_k.
In order to define the FTWs, we propose incremental FTWs straightforwardly defined by the Fibonacci sequence [22], L = {1, 2, 3, 5, 8, . . .} · ∆t, an example of which is shown in Figure 2. Thus, L generates a feature vector: (i) whose components are the aggregated degrees T_k(s, t^+) for each time-slot t^+ in the timeline and a given sensor s; and (ii) whose size is equal to the number of FTWs times the number of sensors, |T| × |S|.
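The trapezoidal membership, the max-min aggregation of Equation (5), and the Fibonacci-based windows can be sketched together as follows (an illustrative sketch under our own assumptions, not the authors' code; in particular, the exact placement of the trapezoid limits per window is a hypothetical choice):

```python
# Illustrative sketch: trapezoidal FTW membership and max-min aggregation of
# past sensor activations into one feature per window. Window sizes grow with
# the Fibonacci sequence, as proposed in the text; the limit placement below
# (core up to L[k]*dt, support up to L[k+1]*dt) is an assumption.

def trapmf(x, l1, l2, l3, l4):
    """Trapezoidal membership: support [l1, l4], core [l2, l3]."""
    if x < l1 or x > l4:
        return 0.0
    if x < l2:
        return (x - l1) / (l2 - l1)
    if x <= l3:
        return 1.0
    return (l4 - x) / (l4 - l3)

def ftw_features(slot_activations, dt, L):
    """slot_activations: S(t_i, s) per time-slot, most recent last.
    L: increasing window limits (e.g. Fibonacci numbers, in units of dt).
    Returns one aggregated degree T_k(s, t*) per FTW (max-min aggregation)."""
    n = len(slot_activations)
    features = []
    for k in range(len(L) - 1):
        # hypothetical limits: full membership up to L[k]*dt in the past,
        # decaying to zero at L[k+1]*dt, so window sizes are incremental
        l1, l2, l3, l4 = 0.0, 0.0, L[k] * dt, L[k + 1] * dt
        degree = 0.0
        for i, s in enumerate(slot_activations):
            elapsed = (n - 1 - i) * dt  # temporal distance from t* to this slot
            degree = max(degree, min(float(s), trapmf(elapsed, l1, l2, l3, l4)))
        features.append(degree)
    return features

# An old activation (3 slots ago) is only seen by the widest window:
print(ftw_features([1, 0, 0, 0], 60, [1, 2, 3, 5]))  # [0.0, 0.0, 1.0]
```

Concatenating these per-sensor degrees over all windows yields the |T| × |S| feature vector described above.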

Ensemble of Classifiers for Activities
Each LSTM activity-based classifier is focused on learning a given activity A_i by means of a balanced training dataset. Therefore, each classifier learns a two-class problem: the target activity A_i and not being the target activity, ¬A_i, which represents the other classes and the idle class.
For each time-slot t^+ and a given classifier of A_i, the target class O(t^+) is defined by:

O(t^+) = 1 if A(t^+, A_i) = 1, and 0 otherwise

So, O(t^+) represents the target class to be learned by each classifier, whose activation can be concurrent with several activities: there may exist A_i, A_j, t^+ such that both targets are active in the same time-slot. The feature vector for this time-slot t^+ is formed by the sequence of aggregated activation degrees T_k(s, t^+) from the FTWs T_k for each sensor s, as described in Section 2.4.
Once the learning process is complete, the activation of the target activity A_i is presented when the prediction for the target activity, p_{A_i}, overcomes the prediction of not being the target activity, p_{¬A_i}. We note that several classifiers within the ensemble can (and must be able to) be activated in the same time-slot t^+.
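The ensemble's decision rule can be sketched as follows (an illustrative sketch, not the authors' code; the names and data layout are hypothetical): each per-activity classifier emits a pair of predictions, and every activity whose target prediction overcomes the non-target prediction is reported active, so several activities can fire in the same time-slot.

```python
# Illustrative sketch: combining per-activity binary classifiers into an
# interleaved prediction for one time-slot. Each classifier provides
# (p_target, p_not_target); an activity is active when p_target > p_not_target.

def ensemble_decision(predictions):
    """predictions: dict mapping activity name -> (p_target, p_not_target).
    Returns the set of activities recognized as active in this time-slot."""
    return {a for a, (p_t, p_not) in predictions.items() if p_t > p_not}

# Example: two concurrent activities recognized in the same time-slot
preds = {"cooking": (0.8, 0.2), "phone_call": (0.6, 0.4), "cleaning": (0.1, 0.9)}
print(sorted(ensemble_decision(preds)))  # ['cooking', 'phone_call']
```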

Balancing Learning With Similarity Relation Between Activities and Filtering of Relevant Sensors
In this section, we describe how to build an ad-hoc balanced training dataset for each activity-based classifier from the similarity relation with the other activities, and how to filter the relevant sensors during the learning process.
Based on a given activity A_i and another activity A_j, we define a similarity relation R_a as a function R_a : A_i × A_j → [0, 1], which determines the similarity degree between both activities.
To compute the similarity, we calculate a similarity relation R_s : A_i × S_j → [0, 1] between activities and sensors using the relative frequency of sensor activation within each activity (Equation (8)):

R_s(A_i, S_j) = |S_j ∩ A_i| / |A_i|   (8)

where |S_j ∩ A_i| represents the number of time-slots in which the sensor S_j is activated together with the activity A_i. This measure is also called Mutual Information [6].

First, the similarity R_s(A_i, S_j) is used to compute the relevant sensors S_j^+ for a given activity A_i based on a relevance factor s_α:

S_j^+ = { S_j : R_s(A_i, S_j) > s_α }

Secondly, we evaluate the similarity relation between activities R_a by aggregating the similarity relations between their sensors. Thirdly, we propose to build a balanced-activity training dataset, which contains a weight or percentage of samples for each activity based on the similarity relation:
• w_{A_i} defines a fixed percentage of samples corresponding to the activity to learn.
• w_{A_0} defines a fixed percentage of samples corresponding to no activity (Idle).
• w_{¬A_i} configures a dynamic percentage from all the other activities in the balanced-activity training dataset, with w_{A_i} + w_{A_0} + w_{¬A_i} = 1, which is calculated by weighting the normalized similarity degrees with the percentage allotted to the other activities.
In order to re-sample the time-slots for each balanced-activity training dataset, a straightforward random process is included, which selects random time-slots, rejecting or accepting them based on the activity percentages.

Experimental Setup
In this section, the proposed methodology is evaluated through experiments using the interleaved dataset [7], which provides data from 20 participants who performed eight activities in any order, interweaving and performing tasks in parallel if desired. Up to three activities can be performed concurrently by a participant. We note the complexity of learning in this extreme problem.
The methodology proposed in this work uses the following parameters:
• Number of FTWs: |T| = 10.
• Percentage of samples corresponding to the non-target activities: w_{¬A_i} = 0.6.
We evaluated three different time intervals of fixed duration ∆t to define the time-slots, ∆t = {30 s, 60 s, 90 s}, and three different values of the relevance factor, s_α = {0%, 3%, 6%}, to identify the sensors selected for learning each activity. For this, two metrics are introduced:
• F1-coverage (F1-sc), which provides an insight into the balance between precision (precision = TP/(TP + FP)) and recall (recall = TP/(TP + FN)) over predicted and ground-truth time-slots. Although well known in activity recognition [23], we note a key issue with this metric in time-interval analysis: false positives far from any activation interval are counted equally to false positives close to the end of activities, which are common at the end of activities, even more so for interleaved activities.
• F1 interval-intersection (F1-ii), which evaluates the detection of activity intervals based on the intersection between the recognized intervals and the ground-truth intervals.
For evaluation purposes, we have developed a leave-one-participant cross-validation, where, for each participant, the test set is composed of the activities performed by that participant and the training set is composed of the activities performed by the other participants.
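The time-slot-based F1 evaluation, including the adjacent-slot error margin used in the results, can be sketched as follows (an illustrative sketch, not the authors' code; the function name and the margin-matching rule are assumptions):

```python
# Illustrative sketch: F1 over predicted vs. ground-truth time-slots for one
# activity, optionally tolerating a margin of adjacent slots, as in the
# 0-time-slot / 1-time-slot comparisons reported in the results.

def f1_time_slots(pred, truth, margin=0):
    """pred, truth: sets of activated time-slot indices for one activity."""
    def near(i, other):
        return any(abs(i - j) <= margin for j in other)
    tp = sum(1 for i in pred if near(i, truth))   # predictions matching truth
    fp = len(pred) - tp                           # spurious predictions
    fn = sum(1 for i in truth if not near(i, pred))  # missed ground truth
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Example: an off-by-one prediction counts as correct with a 1-slot margin
print(f1_time_slots({2, 3, 5}, {3, 4, 5}, margin=0))  # strict comparison
print(f1_time_slots({2, 3, 5}, {3, 4, 5}, margin=1))  # 1.0
```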

Results
In this section, we describe the results of F1-sc and F1-ii on the interleaved dataset [7]. Table 1 describes the metrics for the time-slot durations ∆t = {30 s, 60 s, 90 s} and the relevance factors s_α = {0%, 3%, 6%}, for each configuration of values. Because the recognition of activities is frequently adjacent to the ground truth of the activity, we also include: (i) a strict comparison without error margin (0-time-slot); and (ii) a one-time-slot margin comparing the prediction and the ground truth, which evaluates as correct if both match in an adjacent time-slot. Finally, in Tables 2 and 3, we detail the values of the metrics for each activity and the best configuration of the relevance factor s_α. Table 3 details the values of the metric F1-sc for each activity, time-slot duration ∆t, and relevance factor s_α.

Discussion
From the results previously described, the suitable performance on the challenging problem presented by the interleaved dataset [7] is highlighted. The evaluation based on leave-one-participant cross-validation presents a hard comparison, since each participant had the opportunity to carry out activities in any order, thus introducing unseen and unlearned habits into the activity learning process.
It is noted that relevant detection of activity intervals is achieved, with F1-ii close to or above 90% for the three time-slot durations ∆t = {30 s, 60 s, 90 s}. Patently, a higher aggregation of time-slots with duration ∆t = 90 s increases the performance, but at the cost of reducing the evaluation frequency, yielding three times fewer responses than ∆t = 30 s. For the same reason, the difference in error margin between 0-time-slot and 1-time-slot is more relevant with ∆t = 90 s. Furthermore, the use of a relevance factor s_α, which identifies and selects the key sensors for each activity while learning, has increased the accuracy rate. For longer time-slot durations, ∆t = {60 s, 90 s}, the filtering by the relevance factor is noteworthy, due to a greater number of sensors being activated concurrently in the same time-slots. This increases the noise in the feature representation of sensors, but the filtering of relevant sensors reduces the conflicting activations in time-slots.
In [17], the approach presents a result of 81% based on sensor events, without real-time capabilities, where the classification was evaluated only when a change in a sensor was detected. Since our work is based on the evaluation of time-slots over the whole timeline, a direct comparison is not possible. The coverage of time-slots, F1-sc, presents an excellent real-time prediction close to 75%, which is remarkable because no external information or prior modelling is required, and each time-slot must be evaluated in real time without introducing future sensor information.

Conclusions and Ongoing Works
The use of a fuzzy temporal representation of binary sensors, learned by an ensemble of Long Short-Term Memory classifiers, has been demonstrated to be an encouraging methodology for recognizing interleaved activities in real time. The use of multiple FTWs enables a flexible temporal evaluation in interleaved activities, whose durations vary strongly. Moreover, the Fibonacci sequence provides a suitable shape of incremental FTWs that avoids a hard selection of the temporal segmentation.
The results show an encouraging real-time recognition of activity intervals, which represents the intersection of the recognized intervals with the ground-truth intervals, with F1-ii = 90%. The coverage of predicted time-slots in real time within the activity intervals is F1-sc = 75%.
In ongoing works, we will translate the proposed methodology to multi-occupancy and interleaved activities captured by recent devices, such as wearable and vision sensors, which provide a challenging problem to be solved. To evaluate these non-binary sensors in the long and middle term, it will be necessary to extract several temporal features from the signals, including a filter to remove the non-relevant ones. For that, the performance of human-defined features versus deep learning approaches will also be compared.