Activity Learning as a Foundation for Security Monitoring in Smart Homes

Smart environment technology has matured to the point where it is regularly used in everyday homes as well as research labs. With this maturation of the technology, we can consider using smart homes as a practical mechanism for improving home security. In this paper, we introduce an activity-aware approach to security monitoring and threat detection in smart homes. We describe our approach using the CASAS smart home framework and activity learning algorithms. By monitoring for activity-based anomalies we can detect possible threats and take appropriate action. We evaluate our proposed method using data collected in CASAS smart homes and demonstrate the partnership between activity-aware smart homes and biometric devices in the context of the CASAS on-campus smart apartment testbed.


Introduction
Smart homes have long held the promise of making our everyday environments secure and productive. Individuals spend the majority of their time in their home or workplace [1] and many feel that these places are their sanctuaries. In order to preserve that feeling, smart homes can make use of technologies such as embedded sensors and machine learning techniques to detect, identify, and respond to potential threats. While stand-alone security systems have been used in homes for many years, they cannot make adequate use of the rich information that is available from sensors integrated throughout the home and algorithms that can reason about normal and abnormal behavior in the home.
In this paper, we introduce a smart-home based approach to home security. Our proposed approach is built on the foundation of the CASAS smart home system, in which sensors are embedded in the environment. The sensors collect information about the state of the home and resident. Activity learning techniques use this information to identify and reason about routine or normal behavior in terms of recognized and forecasted activities. This identified behavior forms the basis for threat detection based on sensing abnormal behavior. Once the abnormal behavior is identified as a threat, the home selects an action to take as a response.
We evaluate our approach in the context of actual smart home testbeds. The CASAS smart home infrastructure is described in the next section and has been deployed in over 100 residences. We use actual and synthetically-manipulated data from three single-resident smart homes (named B1, B2, and B3) to evaluate our approach to threat detection and response. In addition, we demonstrate the automated-response smart home using data from a multi-resident on-campus smart apartment testbed (named Kyoto). Finally, we describe challenges and strategies for enhancing and utilizing smart home-based security technologies.

CASAS Smart Home
Our proposed secure smart home is built on the foundation of the smart home infrastructure developed at the Center for Advanced Studies in Adaptive Systems (CASAS). The smart home's sense, identify, assess, and act functions represent a continuous cycle (see Figure 1). Using embedded sensors, the home senses the state of the physical environment and its residents. Software provides reasoning capabilities to identify behavior and assess the well-being of residents (Figure 1a). The home can then act on its information in order to ensure the safety, security, comfort, and productivity of the residents. The CASAS infrastructure (Figure 1c) includes components at the physical layer to sense and act on the environment. The components at the middleware layer provide component communication, identification, and synchronization. Finally, the components at the application layer provide specialized services such as activity learning, sensor fusion, and optimization for specific smart home goals. The smart home components communicate with each other via software "bridges", or communication links. Examples of such bridges are the Zigbee bridge to provide network communication, the Scribe bridge to store messages and sensor data in a relational database, and bridges for each application. The smart home testbeds we use to evaluate our proposed secure smart home rely on the streamlined CASAS "smart home in a box", or SHiB (Figure 1 middle). The locations of each SHiB sensor are predefined in terms of functional areas of the home, which supports the creation of generalizable activity models. Each CASAS SHiB is installed by our team or the residents themselves [2], generally in less than two hours, and can be removed in less than twenty minutes.
The lightweight nature of the design allows us to evaluate new software methods, such as the ones described in this paper, outside of laboratory or simulation settings, which is a challenge for many smart home and activity learning projects [3,4]. While each smart home site runs independently, the homes can also securely upload events to be stored in a relational database in the cloud. Figure 2 shows a sample of collected smart home data with associated automatically-generated activity labels. We evaluate our proposed secure smart home using four of the CASAS testbeds. The sensor layouts for these testbeds are shown in Figure 3. As shown in the figures, data is collected by sensors that monitor infrared motion (red circles), door open/close status (green rectangles), and ambient temperature as well as ambient light (yellow stars). The B1, B2, and B3 testbeds each housed one resident, and the Kyoto testbed housed two residents. We modeled the activities Bathe, Bed-toilet transition, Cook, Eat, Enter home, Leave home, Personal hygiene, Relax, Sleep, Take medicine, and Other activity.
Figure 3. Red dots indicate motion sensors, green rectangles indicate door/temperature sensors, and yellow stars indicate ambient temperature sensors.

Activity Learning
Our approach to threat detection in smart homes is unique because it incorporates knowledge of current and forecasted activities to determine deviations from normal behavioral patterns. Those deviations represent potential threats that require further investigation and response. Most cyber-physical systems use a fixed set of parameters, such as time and location, to identify the current context, and this context forms the basis for many context-aware services, including security services. Not only is this parameter set fairly small, but the parameters are typically considered separately. We postulate that learning activities provides a richer source of information for smart homes and can thus improve the security of the home. Furthermore, we maintain that an activity-aware home will provide more powerful security services than one that is simply aware of resident locations and movements, because it can detect deviations from learned complex activity patterns.
Learning and understanding observed activities is essential for enabling secure smart homes to be sensitive to the needs of the humans that inhabit them. Our smart home makes use of activity learning in order to transform the system into one that is activity aware. Here, we describe the activity recognition and discovery algorithms that create the foundation of activity awareness in the secure smart home.

Activity Recognition
Activity recognition algorithms label activities based on the data that are collected from sensors in the environment. Once this information is provided, we can identify situations that are relevant to home security, such as sleeping, entering/leaving the home, cooking, and performing activities that use valuable items. The goal of an activity recognition algorithm is to map a sequence of sensor readings, or sensor events, x = <e1, e2, ..., en>, onto a value from a set of predefined activity labels, a ∈ A. Activity recognition can thus be viewed as a type of supervised machine learning problem. We further assume the availability of a feature function, Φ, that computes a d-dimensional feature vector from a sequence of sensor events. Our activity recognition algorithm, CASAS-AR, learns a function h that maps a feature vector X = Φ(x) ∈ R^d, describing a particular sensor event sequence, onto an activity label, h: R^d → A. CASAS-AR can then use the learned function to recognize and label occurrences of the learned activities.
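The feature function Φ can be illustrated with a minimal Python sketch. It assumes each sensor message is a whitespace-separated line of the form `date time sensor value`, and it uses a small illustrative feature set (hour of day, window duration, per-sensor counts, and the most recent sensor) rather than the full feature set summarized in Table 1. The sensor names, sample messages, and the `parse_event` and `phi` helpers are hypothetical.

```python
from datetime import datetime

import numpy as np

# Hypothetical sensor vocabulary; a real home would use its own sensor IDs.
SENSORS = ["BedroomMotion", "KitchenMotion", "FrontDoor"]


def parse_event(line):
    """Parse a 'YYYY-MM-DD HH:MM:SS sensor value' message into (time, sensor, value)."""
    date, clock, sensor, value = line.split()
    return datetime.fromisoformat(f"{date} {clock}"), sensor, value


def phi(window, sensors=SENSORS):
    """Illustrative feature function: map a window of events to R^d, d = 2 + 2 * len(sensors).

    Features: hour of day of the last event, window duration in seconds,
    per-sensor event counts, and a one-hot encoding of the most recent sensor.
    """
    index = {name: i for i, name in enumerate(sensors)}
    counts = np.zeros(len(sensors))
    for _, sensor, _ in window:
        counts[index[sensor]] += 1
    last_time, last_sensor, _ = window[-1]
    last_onehot = np.zeros(len(sensors))
    last_onehot[index[last_sensor]] = 1.0
    hour = last_time.hour + last_time.minute / 60.0
    duration = (last_time - window[0][0]).total_seconds()
    return np.concatenate(([hour, duration], counts, last_onehot))


# Illustrative input only (not collected data).
events = [parse_event(line) for line in [
    "2024-01-01 07:00:00 BedroomMotion ON",
    "2024-01-01 07:00:30 KitchenMotion ON",
    "2024-01-01 07:01:00 KitchenMotion OFF",
]]
x = phi(events)  # [hour, duration, 3 per-sensor counts, 3-way one-hot]
```

For these three events, `phi` produces an 8-dimensional vector with a 60-second duration, counts of [1, 2, 0], and KitchenMotion as the most recent sensor. Because the sensors only report state changes, time-based features such as duration carry information that a fixed-rate sampler would get from the sample count.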
There are challenges in activity recognition that are unique among machine learning problems. The input data is often sequential and noisy, the data is not clearly partitioned into activity segments, and the data is occasionally multi-label. Some of these challenges are addressed through additional data processing, such as the steps shown in Figure 4, which include collecting and preprocessing sensor data, dividing it into subsequences of manageable size, and then extracting subsequence features. The final feature vectors are either labeled by an expert to use as training data or are input to an already-trained model to generate the corresponding activity label. The raw data we collect in smart homes, together with the features we use to learn activity models from smart home data, are summarized in Table 1. In contrast to sampling-based sensors that generate a continuous stream of values, all of the smart home sensors detect discrete events. As such, they generate text messages with sensor values, or events, only when the sensor internally notes a change in state.
The activity recognition approach that we incorporate into our secure smart home builds upon our prior work to design algorithms that automatically build activity models from sensor data using machine learning techniques [5][6][7][8]. Other groups have also explored a large number of approaches to perform supervised activity recognition [9][10][11][12][13][14][15][16][17][18][19][20][21][22][23]. These have been tested for a variety of sensor modalities, including environment [7,24,25], wearable [26][27][28], object [29,30], smart phones [31,32], and video [33]. The learning methods can be broadly categorized into template, generative, discriminative, and ensemble approaches. Template matching techniques employ a pattern-based classifier such as k-nearest neighbors, often enhanced with dynamic time warping to handle varying window sizes [34].
Generative approaches such as naïve Bayes classifiers, Markov models, and dynamic Bayes networks have yielded promising results for behavior modeling and offline activity recognition when a large amount of labeled data is available [7,35–41]. On the other hand, discriminative approaches that model the boundary between different activity classes offer an effective alternative. These techniques include decision trees, meta classifiers based on boosting and bagging, support vector machines, and discriminative probabilistic graphical models such as conditional random fields [7,42–44]. Other approaches combine these underlying learning algorithms, including boosting and other ensemble methods [36,45,46]. While many activity recognition algorithms have been proposed, they are typically designed for constrained situations with pre-segmented data, a single user, and no activity interruptions. Our recent work has extended these approaches to generalize activity models over multiple users. We use this generalized model in this paper to provide activity labels for the collected sensor data in each of the smart homes. To facilitate learning such a general model, we utilize a common vocabulary of sensor types and locations that allows us to design algorithms that recognize activities even in new spaces with no training data. Because we label data in real time, we do not employ the offline data segmentation found in other approaches [29,47–53]. Instead, we move a sliding window through the sensor data in real time, extracting features from the window of data and mapping them to an activity label that represents the current activity (that is, the activity corresponding to the last reading in the window). Using a random forest of decision trees, we have achieved >95% recognition accuracy for as many as 33 activities in over 20 homes [38,54], including the homes used to evaluate our proposed secure smart home.
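The real-time labeling scheme described above, in which a sliding window is labeled with the activity of its last event, can be sketched with scikit-learn's `RandomForestClassifier`. This is an illustrative sketch, not CASAS-AR: events are assumed to be `(hour_of_day, sensor)` pairs, the features are only the hour of the last event plus per-sensor counts, and `window_features`, `featurize_stream`, and `train_recognizer` are hypothetical helpers. The window length of 30 events matches the sliding-window size mentioned in the evaluation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

WINDOW = 30  # events per window, matching the window size used in the evaluation


def window_features(events, end, sensors, window=WINDOW):
    """Features for the window of up to `window` events ending at index `end`.

    Illustrative features only: hour of the last event plus per-sensor counts.
    Events near the start of the stream simply use a shorter window.
    """
    start = max(0, end - window + 1)
    counts = [0] * len(sensors)
    for _, sensor in events[start:end + 1]:
        counts[sensors.index(sensor)] += 1
    return [events[end][0]] + counts


def featurize_stream(events, sensors, window=WINDOW):
    """One feature vector per event, so each event can be labeled as it arrives."""
    return np.array([window_features(events, i, sensors, window) for i in range(len(events))])


def train_recognizer(events, labels, sensors, seed=0):
    """Fit a random forest mapping each window to the activity of its last event."""
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(featurize_stream(events, sensors), labels)
    return clf


# Toy training stream (illustrative only): night-time bedroom events, evening kitchen events.
sensors = ["BedroomMotion", "KitchenMotion"]
events = [(2.0, "BedroomMotion")] * 60 + [(18.0, "KitchenMotion")] * 60
labels = ["Sleep"] * 60 + ["Cook"] * 60
clf = train_recognizer(events, labels, sensors)
```

Because one feature vector is produced per event, the label for the current activity is simply the prediction for the most recent window, e.g. `clf.predict(featurize_stream(stream, sensors))[-1]`.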

Activity Discovery
In order to offer an activity-aware smart home, we need to enhance activity recognition research in a novel way by combining activity recognition and discovery. The most common approach to reasoning about human activities from sensor data is to identify activities that are interesting to track, then model those activities. However, to model and recognize activities, a large amount of sensor data must be available that is pre-labeled with ground truth labels. For most activities that are performed in real-world spaces, such pre-labeled data is not readily available. As a result, the set of commonly-tracked activities represents a fraction of an individual's total behavioral routine [55,56].
Tracking such a small number of activities can affect learning performance, because the remainder of the data provides important context and because grouping all unlabeled activities into a single "Other activity" category yields a significant class imbalance. In the datasets we evaluate in this paper, the "Other activity" category represents more than half of the total sensor data. This situation is particularly problematic because the "Other activity" category represents a potentially boundless combination of diverse activities and is therefore difficult to model accurately.
While activity discovery can be used in many varied ways for smart home security, here we use it specifically to address the class imbalance problem. To do so, we introduce intra-class clustering as a preprocessing step for activity recognition. Here, majority classes (such as "Other activity") are decomposed into sub-classes. Thus, activity class a_i ∈ A is decomposed into sub-classes S_{a_i} = {c_{i1}, ..., c_{ik}}. Training instances are assigned new class labels corresponding to their respective sub-classes and used to build the new activity models. The classifier will predict activity labels using both predefined and discovered sub-class labels. When the learned models are used to label new data, the sub-class labels are mapped back onto the original parent class labels (in this case, "Other activity"). Multiple types of clustering algorithms can perform this function. Some, such as k-means++ [57], require a predefined number of clusters, in which case we choose the number of clusters so that the resulting sub-classes are close to the mean activity size for balance; others, such as CascadeSimpleK-Means [58], partition data based on natural groupings. Our previous research [59] has indicated that intra-class clustering frequently outperforms sampling-based methods for imbalanced class distributions. In our secure smart home, we utilize k-means++ to perform intra-class clustering in order to enhance CASAS-AR activity recognition.
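Intra-class clustering can be sketched with scikit-learn's `KMeans` using `init="k-means++"`. As an assumption, the number of sub-classes k is the size of the majority class divided by the mean size of the remaining classes, which is one way to make sub-classes "close to the mean activity size". The `#` suffix used for sub-class labels and the helper names `decompose_majority` and `to_parent` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans


def decompose_majority(X, y, majority="Other activity", seed=0):
    """Split the majority class into k k-means++ sub-classes labelled '<majority>#<j>'.

    k is chosen so each sub-class is roughly the mean size of the other classes
    (an assumption about how 'close to the mean activity size' is computed).
    """
    y = np.asarray(y, dtype=object)
    mask = y == majority
    other_sizes = [np.sum(y == label) for label in set(y) if label != majority]
    k = max(1, int(round(mask.sum() / np.mean(other_sizes))))
    km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=seed)
    sub_ids = km.fit_predict(X[mask])
    y_sub = y.copy()
    y_sub[mask] = [f"{majority}#{j}" for j in sub_ids]
    return y_sub, k


def to_parent(labels, majority="Other activity"):
    """Map predicted sub-class labels back onto the original parent class."""
    prefix = majority + "#"
    return [majority if str(label).startswith(prefix) else label for label in labels]
```

A classifier is then trained on `y_sub` instead of the original labels, and `to_parent` is applied to its predictions so that downstream components only ever see the original activity classes.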

Anomaly Detection
One approach to designing secure smart homes is to train the home to recognize specific situations such as resident falls. This approach can be effective if enough realistic labeled data is available to emulate each possible situation. However, gathering this type of data in actual homes is difficult and can be a limitation for this approach. Furthermore, looking at predefined target situations prevents the home from detecting a broader range of security threats.
In order to find a larger collection of possible threats, we turn our attention to anomaly detection. As Chandola states, anomaly detection is "the problem of finding patterns in data that do not conform to expected behavior" [60]. One of the important roles a smart home can play is to automatically find anomalies. Anomalies can be viewed as a threat to an individual's safety in the home or even as a threat to their health [61][62][63][64][65].
There are several standard techniques that are commonly employed for finding anomalies or outliers. One such technique is to cluster data points into groups based on their distance to the cluster centers. Once the data is clustered, points that are far from all of the cluster centers can be labeled as outliers and considered anomalies. In cases where the data is normally distributed, z scores can also be computed. The z score for any data point is its distance from the sample mean, divided by the sample standard deviation. Data points whose z scores have an absolute value greater than 3.5 are typically considered to be outliers. The problem with employing clustering methods is that clustering algorithms rely upon calculating distances between data points. While this can be effective for univariate data, smart home sensors generate complex, multivariate data, and Euclidean distance-based methods do not always reflect the important aspects of the collected data.
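The z-score rule above is simple enough to state directly in code. This sketch uses the sample standard deviation (`ddof=1`) and the 3.5 cut-off mentioned above; `zscore_outliers` and the sample readings are hypothetical.

```python
import numpy as np


def zscore_outliers(values, threshold=3.5):
    """Indices of values whose |z| exceeds `threshold`, with z = (v - mean) / sample std."""
    v = np.asarray(values, dtype=float)
    z = (v - v.mean()) / v.std(ddof=1)
    return np.flatnonzero(np.abs(z) > threshold)


readings = [10, 11, 9, 10, 12, 8] * 10 + [60]  # illustrative univariate readings
print(zscore_outliers(readings))  # [60]
```

Note that an outlier inflates the very standard deviation it is measured against, so in very small samples no point may be able to exceed the threshold. This rule also applies to one variable at a time, which is the limitation the paragraph above raises for multivariate smart home data.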
To address this situation and detect anomalies in smart home sensor data, we employ an isolation forest [66] to generate anomaly scores, where a higher score indicates a greater deviation from normal data patterns. An isolation forest is an ensemble of isolation trees: binary trees that recursively partition the data by querying a randomly selected feature at a randomly selected split value until the data points are isolated. Because anomalies are few in number and differ from the bulk of the data, they tend to be separated from the other points after only a few splits. An outlier can therefore be observed as a data point x that has an unusually short path length, h(x), from the root of the tree to the leaf node containing the data point. Each branch along the path queries the value of a particular feature; thus, this approach to anomaly detection utilizes the same feature vector employed by activity recognition and is sensitive to the activities occurring in the home.
To generate an anomaly score, a set of isolation trees is trained on the same data, where a node's children in each tree are created by randomly selecting a feature to query and randomly selecting a split point within the range of that feature's values. Building multiple trees reduces the tendency to overfit and increases the likelihood that a point with a small average path length, E(h(x)), is a true anomaly (see Figure 5 for an example decision tree and h(x) value). The average path length is normalized by c(n) = 2H(n − 1) − (2(n − 1)/n), where H(i) is the harmonic number (approximately ln(i) + 0.5772156649), n is the number of data points used to build each tree, and c(n) is the average path length of an unsuccessful search in a binary search tree. The anomaly score s of a data point x is thus defined as shown in Equation (1):
s(x, n) = 2^{−E(h(x))/c(n)}     (1)
Scores close to 1 therefore indicate likely anomalies, while scores well below 0.5 indicate normal points. This method has been effective at detecting anomalies in many varied applications, even on streaming data [67]. We report anomalies as the data points with the top anomaly scores. Selecting a threshold of scores to report as anomalies is application dependent.
If the threshold is too high, some anomalies may be missed, reducing the number of true positive anomalies that are detected. If the threshold is too low, many normal situations will be reported as anomalies, increasing the number of false positives. In empirical testing of our approach, we experimented with different fractions of outliers to report and found that reporting the top 0.1% yields successful detection of most known anomalies without generating an excessive number of false positives.
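To make the path-length and normalization steps concrete, here is a minimal from-scratch sketch of an isolation forest in the style of Liu et al. [66]. It computes c(n), the average path length E(h(x)), and the score s(x, n) = 2^{−E(h(x))/c(n)}, then reports the top fraction of scores. In this convention, points isolated after few splits have short paths and scores near 1. The defaults of 100 trees and a sub-sample size of 256 are assumed common isolation forest settings, not values reported in this paper, and all function names are hypothetical.

```python
import numpy as np


def c(n):
    """Average path length of an unsuccessful search in a BST built on n points."""
    if n <= 1:
        return 0.0
    harmonic = np.sum(1.0 / np.arange(1, n))  # exact H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(X, depth, max_depth, rng):
    """Grow an isolation tree using random features and random split points."""
    n = len(X)
    if depth >= max_depth or n <= 1:
        return {"size": n}
    feature = int(rng.integers(X.shape[1]))
    lo, hi = X[:, feature].min(), X[:, feature].max()
    if lo == hi:  # the chosen feature is constant here, so stop splitting
        return {"size": n}
    split = rng.uniform(lo, hi)
    left = X[:, feature] < split
    return {"feature": feature, "split": split,
            "left": build_tree(X[left], depth + 1, max_depth, rng),
            "right": build_tree(X[~left], depth + 1, max_depth, rng)}


def path_length(x, node, depth=0):
    """h(x): edges traversed, plus c(size) for a leaf that still holds several points."""
    if "size" in node:
        return depth + c(node["size"])
    child = node["left"] if x[node["feature"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def isolation_scores(X, n_trees=100, sample_size=256, seed=0):
    """Score s(x, n) = 2^(-E(h(x)) / c(n)) for every row of X (requires len(X) >= 2)."""
    rng = np.random.default_rng(seed)
    n = min(sample_size, len(X))
    max_depth = int(np.ceil(np.log2(n)))
    trees = [build_tree(X[rng.choice(len(X), n, replace=False)], 0, max_depth, rng)
             for _ in range(n_trees)]
    mean_h = np.array([np.mean([path_length(x, t) for t in trees]) for x in X])
    return 2.0 ** (-mean_h / c(n))


def report_top_fraction(scores, fraction=0.001):
    """Indices of the top `fraction` of scores (at least one point is always reported)."""
    k = max(1, int(round(fraction * len(scores))))
    return np.argsort(scores)[::-1][:k]
```

For production use, scikit-learn's `IsolationForest` implements the same idea; note that its `score_samples` method returns the negative of this score, so lower values there indicate more abnormal points.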

Evaluation
Our proposed secure smart home technology is built on the notion of activity-aware anomaly detection. In this section, we analyze the effectiveness of this approach as applied to detecting anomalies and security threat conditions. To do this, we utilize actual smart home data collected in the home testbeds described in Section 2. We first validate the ability of our algorithm to discover known anomalies and then evaluate the technology for security-related threat conditions in an on-campus smart home testbed.

Hybrid Real-Synthetic Data Generator
We evaluate our smart home security detection system on data collected from real smart homes. This allows us to determine the usefulness of the method on actual smart home data as residents perform their normal daily routines. In real-world daily situations, however, anomalies are rare and are not always well documented. Therefore, in addition to testing on unmodified real-world data, we create synthetic data that reflects the real-world data and then modify the synthetic data to contain known anomalies.
To create hybrid real-synthetic data, we designed a Markov model-based generator. The data generator is designed to be probabilistically similar to the original smart home datasets, yet allow some variation that could be detected as anomalies depending upon the methods that are used. A Markov model is a statistical model of a dynamic system. The system is modeled using a finite set of states, each of which is associated with a multidimensional probability distribution over a set of parameters. Transitions between states are governed by transition probabilities. The parameters for our model are the possible sensor event states in the home. We utilize a hidden Markov model (HMM), so the underlying model is a stochastic process that is not observable (i.e., hidden) and is assumed to be a Markov process (i.e., the current state depends on a finite history of previous states). The first-order Markov property specifies that the probability of the next state depends only on the current state. More "memory" can be stored in the states, however, by using a higher-order Markov model. In our case, we employ a third-order Markov model, which means that the probability of the next state x_{i+1} depends on the previous three states, as shown in Equation (2):
P(x_{i+1} | x_i, x_{i−1}, ..., x_1) = P(x_{i+1} | x_i, x_{i−1}, x_{i−2})     (2)
To generate synthetic data, we learn the structure of our HMM from the real smart home data [55] and then use the learned structure to synthesize data that is reflective of the original data but may contain minor variations. As a result, we can generate an arbitrary amount of data and expect that few, if any, anomalies will be found that do not already exist in the original real-world datasets. HMMs are well suited to this type of data generation and have been used for this task in applications including speech synthesis, gene sequences, and cognitive processes [56][57][58].
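The third-order dependence in Equation (2) can be illustrated with a fully observed third-order Markov chain over sensor event states. This is a simplification of the HMM described above (there is no hidden layer), and `learn_third_order` and `generate` are hypothetical helpers; in practice each state might be a `(sensor, value)` pair rather than the single letters used in the toy example.

```python
import random
from collections import Counter, defaultdict


def learn_third_order(states):
    """Count next-state frequencies for each context (x_{i-2}, x_{i-1}, x_i)."""
    model = defaultdict(Counter)
    for i in range(2, len(states) - 1):
        context = tuple(states[i - 2:i + 1])
        model[context][states[i + 1]] += 1
    return model


def generate(model, seed_context, length, seed=0):
    """Sample a sequence whose next state depends only on the previous three states."""
    rng = random.Random(seed)
    out = list(seed_context)
    while len(out) < length:
        counts = model.get(tuple(out[-3:]))
        if not counts:  # unseen context: stop (a fuller generator might back off)
            break
        choices = list(counts)
        out.append(rng.choices(choices, weights=[counts[c] for c in choices])[0])
    return out


# Toy example: a perfectly regular routine is reproduced exactly.
routine = list("ABC") * 4
model = learn_third_order(routine)
synthetic = generate(model, ["A", "B", "C"], 9)
```

With real data, contexts have several possible successors, so repeated sampling produces sequences that follow the learned transition frequencies while varying in detail, which is the property the hybrid generator relies on.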

Validation of Anomaly Detection
Our first evaluation step is to validate the accuracy of our anomaly-based smart home security threat detection approach. The second step is to demonstrate its use in a security-based smart home scenario. To validate our activity anomaly-based approach, we generate one week of hybrid real-synthetic data for the B1, B2, and B3 smart homes shown in Figure 3, using one day of real data from each site as the basis for that site's week of hybrid synthetic data. This allows us to keep the baseline data fairly uniform while allowing for some minor variation in residents' activities, locations, and timings throughout the dataset. In each of these datasets, we pick 20 random locations where the generated data is modified to simulate an anomaly. Each anomaly spans 30 events (the size of our activity recognition sliding window) and is created by randomly modifying the data while maintaining the same dates and times as the original data.
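The anomaly-injection step can be sketched as follows. Assumptions: events are `(timestamp, state)` pairs, the 20 spans are drawn from non-overlapping 30-event slots so that anomalies never overlap, and "randomly modifying the data" is interpreted as replacing each event's state with a uniformly random sensor state. The helper `inject_anomalies` is hypothetical.

```python
import random


def inject_anomalies(events, sensor_states, n_anomalies=20, span=30, seed=0):
    """Overwrite n_anomalies non-overlapping spans of `span` events with random states.

    Timestamps are kept, so only what happened changes, not when it happened.
    Returns the modified events, per-event 0/1 anomaly labels, and span start indices.
    """
    slots = len(events) // span
    if n_anomalies > slots:
        raise ValueError("not enough events for non-overlapping anomalies")
    rng = random.Random(seed)
    events = list(events)
    labels = [0] * len(events)
    starts = sorted(slot * span for slot in rng.sample(range(slots), n_anomalies))
    for start in starts:
        for i in range(start, start + span):
            timestamp, _ = events[i]
            events[i] = (timestamp, rng.choice(sensor_states))
            labels[i] = 1
    return events, labels, starts


# Illustrative stream: 1000 identical events with integer timestamps.
original = [(t, ("M001", "ON")) for t in range(1000)]
states = [("M002", "ON"), ("D001", "OPEN")]
modified, labels, starts = inject_anomalies(original, states)
```

The per-event labels provide the ground truth needed to compute true and false positive rates for the detector.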
In order to determine the role that activity awareness plays in anomaly detection, we perform anomaly detection with alternative feature vectors. The tested feature vectors are listed below.
• f.standard: includes the timing, window, and sensor features listed in Table 1
• f.activity: includes timing information and the activity label for the corresponding window of sensor data, as provided by the discovery-enhanced CASAS-AR activity recognition algorithm
• f.all: includes all of the standard features together with the activity label
To measure performance, we compute true positive rates and false positive rates based on the algorithm's ability to detect the known anomalies. We compute these rates as we vary the fraction of outliers that are reported as anomalies from 0.1% to 0.5%. We then use the rates computed for each of these thresholds to generate a receiver operating characteristic (ROC) curve and report the area under the ROC curve (AUC). This allows us not only to determine how well the algorithm performs overall but also to compare the alternative feature vectors and thus assess the impact of activity awareness on our proposed anomaly detection method.
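The true and false positive rates at each reporting fraction, and an AUC computed from them, can be sketched as below. How the ROC curve is interpolated between the 0.1% and 0.5% operating points is not specified here, so this sketch assumes the points are anchored at (0, 0) and (1, 1) and integrated with the trapezoidal rule; computing `sklearn.metrics.roc_auc_score` over all scores is a common alternative. The helper names are hypothetical.

```python
import numpy as np


def rates_at_fraction(scores, is_anomaly, fraction):
    """(FPR, TPR) when the top `fraction` of points by anomaly score are reported."""
    scores = np.asarray(scores, dtype=float)
    truth = np.asarray(is_anomaly, dtype=bool)
    k = max(1, int(round(fraction * len(scores))))
    reported = np.zeros(len(scores), dtype=bool)
    reported[np.argsort(-scores)[:k]] = True
    tpr = (reported & truth).sum() / truth.sum()
    fpr = (reported & ~truth).sum() / (~truth).sum()
    return float(fpr), float(tpr)


def auc_from_fractions(scores, is_anomaly, fractions=(0.001, 0.002, 0.003, 0.004, 0.005)):
    """Trapezoidal area under the ROC points, anchored at (0, 0) and (1, 1)."""
    points = sorted(rates_at_fraction(scores, is_anomaly, f) for f in fractions)
    points = [(0.0, 0.0)] + points + [(1.0, 1.0)]
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

Running the same computation once per feature set (f.standard, f.activity, f.all) gives directly comparable AUC values.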
The results from this experiment are summarized in Table 2. As the true positive rates show, most of the embedded anomalies are found in each dataset. The B1 dataset presents the easiest case for detecting anomalies because the resident followed a very structured daily routine with predictable activities, times, and locations. Even with this degree of regularity, however, the anomaly detector flags quite a few points in the normal data as anomalous. To an extent this is expected, because human behavior is unpredictable and daily routines vary considerably. This presents a challenge for anomaly detection in general: an algorithm that is sensitive enough to find true anomalies in the data will also detect such natural variations. The B3 dataset was the most challenging for finding the embedded anomalies. This is due in part to the fact that the normal data itself contained anomalous situations, with one of the door sensors reporting a very large number of repeated OPEN/CLOSE messages over short periods of time. Against these preexisting anomalies, the embedded anomalies were not always as obvious. For all of the datasets, we observe that activity awareness improves the ability to detect anomalies. For the B1 and B2 datasets, the AUC obtained using standard features together with activity labels is significantly higher (p < 0.01) than the AUC obtained using standard features alone. For the B2 dataset, the activity labels alone were particularly effective at discovering almost all of the embedded anomalies.

Security Application
We begin the second half of our evaluation with a description of a security case study scenario. A secure smart home is valuable for monitoring the state of the home when the resident has left. An activity-aware smart home can recognize when the resident has left and when a person enters the home. For this type of security monitoring, a camera can also be used to perform face recognition on individuals who enter the residence. However, many residents do not want a camera monitoring them continuously. Additionally, face recognition techniques encounter difficulties when the person's face is obstructed or lighting is poor. In our scenario, the smart home sensors monitor the behavior of a person when they first enter the home. When this behavior differs significantly from normal behavior in the same context (that is, when it is anomalous), the home can automatically respond by turning on the camera and face recognition software.
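The response policy described above, with the camera off by default and switched on only for an anomalous entry, can be sketched as a small handler. This is an illustrative sketch, not the CASAS implementation: the `EntryResponder` class, its score threshold, and the `activate_camera` and `notify_resident` callbacks are hypothetical placeholders for whatever camera, face recognition, and notification hooks a deployment provides.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class EntryResponder:
    """Keep the camera off by default and switch it on only for anomalous entries."""

    threshold: float                        # anomaly score that triggers a response
    activate_camera: Callable[[], None]     # e.g. start recording and face recognition
    notify_resident: Callable[[str], None]  # e.g. pop up a window with the video
    history: List[str] = field(default_factory=list)

    def on_enter_home(self, anomaly_score: float) -> bool:
        """Called once the 'Enter home' activity is recognized and scored."""
        if anomaly_score > self.threshold:
            self.activate_camera()
            self.notify_resident(f"Anomalous entry detected (score {anomaly_score:.2f})")
            self.history.append("anomalous")
            return True
        self.history.append("normal")
        return False
```

Passing the hooks in as callables keeps the decision logic separate from any particular camera or user-interface code.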
In this section, we demonstrate the ability of our secure smart home to perform these tasks. We evaluate the ability of the anomaly detection algorithm to detect threats that fit this scenario, and we then illustrate how our smart home testbed performs for our case study scenarios. For this evaluation, we use the Kyoto smart home testbed shown in Figure 3. We have collected data in this testbed for almost ten years and have labeled the data with activities using the CASAS-AR software. We extracted 762 instances of the "Enter home" activity, together with the five minutes of sensor data following recognition of each instance. We filtered out instances in which the home was vacant or was being used extensively for other smart home studies. Each of these instances represents a single data point. We then generated a feature vector for each data point using the features summarized in Table 1, except for window duration, which is fixed at five minutes in this case. Because the residents did not report any security issues during the data collection, we treat these data points as normal.
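Building one data point from an "Enter home" instance can be sketched as below. Table 1 is not reproduced in this section, so the specific features here (time of day, day of week, event count, distinct sensors, and per-sensor counts) are hypothetical stand-ins for timing and sensor features. Only the fixed five-minute window, which makes window duration uninformative, comes from the text.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple

WINDOW = timedelta(minutes=5)  # fixed window, so window duration is not a feature


def enter_home_features(
    enter_time: datetime,
    events: Sequence[Tuple[datetime, str]],
    sensor_ids: Sequence[str],
) -> List[float]:
    """Build one feature vector from the five minutes following an 'Enter home' label.

    `events` holds (timestamp, sensor_id) pairs; `sensor_ids` fixes the order of the
    per-sensor count features."""
    in_window = [s for t, s in events if enter_time <= t < enter_time + WINDOW]
    counts = Counter(in_window)
    return [
        enter_time.hour + enter_time.minute / 60.0,  # time of day, in hours
        float(enter_time.weekday()),                 # day of week (Monday = 0)
        float(len(in_window)),                       # number of sensor events
        float(len(counts)),                          # number of distinct sensors
        *(float(counts[s]) for s in sensor_ids),     # events per sensor
    ]
```

Fixing the order of `sensor_ids` ensures every data point has the same vector layout.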
Next, we insert three sample scenarios into the data by generating sensor data that fits the dates and times of actual "Enter home" activities but represents three known conditions, one of which is normal and two of which are anomalous. In the first scenario, an individual enters the home through the front door (bottom center of the Kyoto floor plan in Figure 2), goes into the kitchen (bottom right of the floor plan), and puts away groceries. This scenario triggers door sensors for the front door as well as the kitchen pantry and cabinets, and it triggers motion sensors in the entryway and kitchen. In the second scenario, an individual enters through the front door and quickly moves through the entire lower floor of the apartment, opening all of the closet and cabinet doors as though searching for items to steal. This scenario should be detected as an anomaly and a possible security threat (and thus cause the camera to turn on and other appropriate actions to be taken). In the third scenario, an individual enters the apartment by a means other than the front door. For our smart home, this is represented by the individual entering the apartment through the back door. Because the front door is closest to the parking lot, the back door is not typically used to enter the apartment. Entering the apartment through a window, or for this home through the back door, is anomalous and again would be viewed as a security threat.
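The three scenarios can be encoded as ordered sensor steps and timestamped from a real "Enter home" time, as sketched below. The sensor names, the exact step sequences, and the spacing between events (1 second for the hurried search, 5 seconds otherwise) are hypothetical choices made for illustration; the actual Kyoto sensor identifiers and timings are not given here.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Hypothetical sensor names; the actual Kyoto sensor identifiers are not listed here.
# Each scenario: (seconds between events, ordered (sensor, message) steps).
SCENARIOS: Dict[str, Tuple[int, List[Tuple[str, str]]]] = {
    # 1. Normal: enter by the front door and put away groceries in the kitchen.
    "groceries": (5, [
        ("FrontDoor", "OPEN"), ("EntryMotion", "ON"), ("FrontDoor", "CLOSE"),
        ("KitchenMotion", "ON"), ("PantryDoor", "OPEN"), ("PantryDoor", "CLOSE"),
        ("KitchenCabinet", "OPEN"), ("KitchenCabinet", "CLOSE"),
    ]),
    # 2. Anomalous: move quickly through the lower floor opening closets and cabinets.
    "ransack": (1, [
        ("FrontDoor", "OPEN"), ("EntryMotion", "ON"),
        ("HallCloset", "OPEN"), ("HallCloset", "CLOSE"),
        ("LivingMotion", "ON"), ("LivingCloset", "OPEN"), ("LivingCloset", "CLOSE"),
        ("KitchenMotion", "ON"), ("PantryDoor", "OPEN"), ("KitchenCabinet", "OPEN"),
    ]),
    # 3. Anomalous: enter through the rarely used back door.
    "back_door": (5, [
        ("BackDoor", "OPEN"), ("BackMotion", "ON"), ("BackDoor", "CLOSE"),
        ("KitchenMotion", "ON"),
    ]),
}


def scenario_events(name: str, enter_time: datetime) -> List[Tuple[datetime, str, str]]:
    """Timestamp a scenario's steps starting at the time of a real 'Enter home' instance."""
    gap, steps = SCENARIOS[name]
    return [
        (enter_time + timedelta(seconds=i * gap), sensor, message)
        for i, (sensor, message) in enumerate(steps)
    ]
```

The generated events can then be windowed and featurized exactly like the real "Enter home" instances.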
As in the previous section, we run our anomaly detection algorithm on the actual smart home data and the three inserted scenarios. We use a threshold of 0.1%, which we selected empirically based on its consistent performance on smart home data. There are 765 data points (the 762 actual instances plus the three inserted scenarios), two of which are known anomalies. Both anomalies are detected by the software.
The complete set of results is summarized in Table 3. As the table shows, the anomaly detection algorithm was fairly conservative and detected only seven anomalies: the two inserted anomalous scenarios and five baseline data points. All of the data points that were not detected as anomalies were correctly assigned to the baseline (normal activity) category, including the scenario that represented normal putting-away-groceries behavior. In five of the cases, however, baseline data was labeled as anomalous. We analyzed each of these data points to determine why it was selected as an anomaly and possible security threat. In three of the cases, the individual opened and shut many of the downstairs closet doors, creating an activity profile that was very similar to our second scenario and that was unusual for the residents. These data points occurred at the end of the spring semester, when the resident was packing to move out of the apartment. These are indeed anomalous situations, although they may not be considered security threats. In the other two cases, the data again included events from many motion sensors throughout the residence. Because an area of the home containing much of the smart home data collection equipment was accessed, these may have been times when the CASAS team was performing maintenance or upgrades.
Finally, Figures 6 and 7 show screenshots of our actual secure smart home system. We placed cameras throughout the home and captured video as a volunteer performed our three case study scenarios in the smart apartment. The screens indicate when the home is empty, when an individual enters the home, and whether the enter-home activity is normal or anomalous. In the two anomalous situations, the cameras are activated and a window pops up on the screen that shows the resident the captured video. At this point, face recognition software can be employed, or the resident can view the video manually to determine whether the situation is a security threat that requires further action.
We also include images from PyViz [59], our tool that shows the status of sensors in the smart home in real time as the resident moves throughout the space.
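The detection counts reported for the Kyoto case study (765 data points, two known anomalies, both detected, plus five baseline points flagged) can be expressed with the same rates used in the first experiment. The arithmetic below is derived only from those counts.

```latex
\begin{aligned}
\mathrm{TP} &= 2, \quad \mathrm{FP} = 5, \quad \mathrm{FN} = 0, \quad \mathrm{TN} = (765 - 2) - 5 = 758,\\
\mathrm{TPR} &= \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} = \frac{2}{2} = 1,\\
\mathrm{FPR} &= \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}} = \frac{5}{763} \approx 0.0066,\\
\mathrm{precision} &= \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}} = \frac{2}{7} \approx 0.29.
\end{aligned}
```

The low false positive rate reflects the conservative threshold, while the modest precision reflects the naturally occurring anomalies, such as move-out packing, that are unusual but not threats.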

Related Approaches to Secure Smart Homes
As is indicated in Figure 1, secure smart homes need to sense the state of the environment, identify and assess possible threats, and act on those threats. While the approach we have described in this paper is unique in investigating an activity-predictive approach to secure smart homes, the work is one of many research and commercial efforts that have been developed to offer personal and home security in everyday residences [68].
In terms of sensing the state of a home, there are a number of technologies that provide sensor data specifically for security purposes. The most common sensing mechanism for home security is video cameras, and companies including iControl, Nest, SmartThings, Vivint, and Ring have enhanced traditional camera systems to offer home security features. These features include alerting homeowners of detected activity and connecting the camera with the home's doorbell or other sensors [69,70]. Also common is the use of audio, which has been employed by Zhuang et al. [71] to detect falls and Moncrieff et al. [72] to detect unusual home noises. Biometrics have also been integrated into buildings to recognize individuals based on unique anatomical traits including voice, gait, retina, and facial features [73,74], as well as body shape (anthropometry) [75], footstep shape [76], body weight [77], and heart beat pattern [78].
These sensing mechanisms offer rich, fine-grained data, although for a limited view range. The depth of information also makes them not only a useful method of detecting threats but also a possible risk for invasion of privacy. In addition, the technologies rely on residents to interpret the collected data and to select an appropriate action in response to the situation.
In the area of identifying and assessing threats, many researchers take an approach similar to the one we propose by looking for outliers in sensor data patterns and viewing such outliers as threats to well-being. Unlike our approach, these previous methods do not consider activities and their associated patterns. Instead, existing approaches have looked for outliers based on overall movement in the home [79], resident locations in the home [58,80,81], sensor event times [82], or sensor values [83].
As an alternative approach, some investigators predefine the types of threats that are of interest and then train systems to recognize those threats. For example, Teoh and Tan [84] designed a machine learning method specifically to recognize intruders based on input from a large variety of sensors, including motion sensors, closed-circuit televisions, radio frequency identification (RFID) tags, magnetic contact switches, and glass breakage sensors. Aicha et al. [85] also look for intrusion situations by analyzing transition times from when a home is empty to when an individual enters the home. A primary challenge for systems that use supervised learning to detect such highly specialized situations is that sufficient training data is rarely available in realistic home settings. For health applications, Han et al. [86] look for changes in the amount of time spent moving around the house, eating, sleeping, and performing hygiene. Williams and Cook [87] also look for changes in waketime and sleeptime behavior as a means to detect and circumvent sleep disturbances. In other approaches, researchers have constructed methods to look for changes in activity times for a constrained set of rule-based activity patterns [88][89][90][91].

Conclusions
In this paper, we introduce an activity-aware approach to creating secure smart homes. Smart homes have become adept at robustly and unobtrusively collecting sensor data that provides insights into human behavior. When this data is labeled using activity recognition algorithms, the smart home becomes activity aware. We demonstrate how this information can be employed by an anomaly detection algorithm to detect security threats in real time. We validate our approach by detecting known anomalies in hybrid real-synthetic smart home data and demonstrate its use in a home security case study.
While this work demonstrates that smart homes can be automated for security threat detection, there are still many avenues for ongoing research. One problem that we highlighted in our experiments is that many anomalies naturally exist in human behavior data and not all of these represent security threats. In our future work, we will involve smart home residents to train the smart home on the types of anomalies that are of interest. We will also design an ensemble approach that employs multiple types of anomaly detection, including detection of location-based anomalies, current activity anomalies, and forecasted activity anomalies to improve the performance of anomaly detection. Finally, we will incorporate automated face recognition and resident notification to automate not only detection of anomalies but response to security threats.
Author Contributions: Jess Dahmen and Diane Cook conceived, designed, and implemented the anomaly detection algorithms and experiments; Brian Thomas conceived and designed the HMM synthetic data generator; and Xiaobo Wang conceived and designed the security scenarios and use of smart home sensor data for home-based security.

Conflicts of Interest:
The authors declare no conflict of interest.