Computational Topology to Monitor Human Occupancy

Recent advances in sensing technologies, embedded systems, and wireless communication make it possible to develop smart systems that monitor human activities continuously. The occupancy of specific areas or rooms in a smart building is an important piece of information, both for inferring the behavior of people and for triggering an advanced surveillance module. We propose a method based on computational topology to infer the occupancy of a room monitored for a week by a system of low-cost sensors.


Introduction
Improving the energy efficiency of buildings is an active research topic. Technologies for monitoring the environment and intervening on it not only help save resources, but also improve the well-being perceived by building occupants [1]. One of the simplest policies to improve energy efficiency while respecting the well-being of occupants is to regulate energy usage automatically according to the occupancy of the building spaces [2]. If no users are detected in the monitored space, inactive devices can be disconnected; environmental setpoints for air conditioners, heaters, and humidifiers can be relaxed; lights can be lowered or turned off.
The key requisite for implementing occupancy-driven applications is an accurate, inexpensive, and non-intrusive method for monitoring occupancy. Non-intrusive means that users are not required to carry smartphones or RFID tags that report their presence within the space. Such systems are usually referred to as device-free, and they exploit the networks of acoustic, inertial, environmental, and power sensors already installed in buildings [3].
Recently, many authors have proposed analysing data from multiple sensors (electric power meters, accelerometers, and noise meters) to identify the user's presence or absence in a space [4,5]: variations in the sensor time series caused by a user entering or leaving a space are analysed to infer a classification template for the user's presence/absence. Refs. [6][7][8] instead address the problem of identifying user activities by analysing the spectral fingerprint of the energy consumed by the appliances used during those activities. Ref. [9] investigates the possibility of identifying user activity by analysing the energy consumption patterns of electrical appliances.
Device-free sensing of building occupancy poses many challenges, as every sensor behaves noisily under certain conditions. For example, inertial sensors can produce spurious detections caused by pets' movements; CO2 sensors are less sensitive to their placement and to external events, but their effectiveness depends directly on the ventilation of the building. Robustness to noise is therefore a key issue for device-free sensor systems. In this work, we present an algorithm for extracting topological features from sensor time series, to support the implementation of a device-free presence/absence detection algorithm. Based on the theory of Persistent Homology, we evaluate the relevance of topological features by their persistence (the lifespan of a topological event) in the so-called barcodes [10].
The assumption is that persistent features identify important events, while features with a shorter lifespan correspond to noise. The result is a topological fingerprint of the user's presence, which can be used in an unsupervised approach to state classification. The main advantages brought by Persistent Homology are low computational cost and high resilience to noisy data.

Topological Data Analysis
Topology is a branch of mathematics dealing with qualitative information about an object: it looks at the intrinsic and global properties of an object, such as its shape [11]. Topological Data Analysis (TDA) methods are increasingly used to investigate and characterize multidimensional datasets. Research on TDA was recently boosted by the introduction of the Persistent Homology (PH) theory [12], along with fast algorithms for its computation [13] and efficient implementations [14]. Application scenarios include shape and texture analysis [15], biological and molecular data analysis [16], sensor networks [17], and image and signal processing [18][19][20]. In what follows, we offer a brief intuition on PH; we refer the reader to [21] for a rigorous treatment of the subject.
The core idea of PH is to represent data as filtered simplicial complexes. Given a simplicial complex (for example, a triangle mesh), filtering the complex means defining a rule to build it as a sequence of nested sub-complexes. An example of a filtered complex is the Vietoris-Rips complex, whose construction is shown in Figure 1: at the first stage of the filtration, only the nodes are included; in subsequent steps, the edges appear in order of the distance between their endpoints (the filtration parameter). There are many viable ways to filter a complex associated with a dataset: this flexibility is one of the main strengths of PH-based approaches. Once the dataset is encoded as a filtered simplicial complex, one tracks the birth and death of topological events (homology classes) as the filtration parameter grows: for example, when connected components appear and when they merge, when holes are created and when they are closed off, and so on. The lifespan of these events is stored in a stable invariant, the i-barcode. A barcode in a given dimension i is a collection of horizontal bars in a plane: the horizontal axis corresponds to the filtration parameter, while the vertical axis represents an arbitrary ordering of the homology generators in dimension i (see Figure 2 for an example of barcodes in dimensions 0 and 1). The length of each bar is interpreted as the lifespan, or persistence, of the corresponding generator: short bars are interpreted as noise, long bars as important topological features.
In this paper, we compute the persistence barcodes of time series from multiple sensors placed in a room, to detect the points in time when the room is occupied. The assumption is that room occupancy is reflected by some topological features in PH. To compute the barcodes, the first step is to discretise the sensor time series and represent them as filtered complexes, as described in the next Section.
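As a concrete illustration of the dimension-0 case, the following minimal Python sketch computes the 0-barcode of a Vietoris-Rips filtration with a union-find structure: every point is born at filtration value 0, and a bar dies when the edge that merges its component with another one enters the filtration. The function name `h0_barcode` and the toy points are illustrative assumptions and not the implementation used in this paper. The sketch also assumes that an edge enters the filtration at a value equal to its length, whereas some libraries use half that length.

```python
import math
from itertools import combinations

def h0_barcode(points):
    """Return the dimension-0 bars (birth, death) of a Vietoris-Rips filtration.

    Assumption: an edge enters the filtration at its Euclidean length.
    """
    n = len(points)
    if n == 0:
        return []
    parent = list(range(n))

    def find(x):
        # Find the component representative, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All pairwise edges, sorted by length (the filtration parameter).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    bars = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge: one of them dies at this edge length.
            parent[root_i] = root_j
            bars.append((0.0, length))
    # The last surviving component never dies.
    bars.append((0.0, math.inf))
    return bars

# Hypothetical toy point cloud: two close points and one far away.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
bars = h0_barcode(points)
print(bars)
```

In this toy cloud, the short bar records the merge of the two nearby points, while the longer finite bar records the distant point joining them; this is the sense in which long bars are read as significant structure.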

Feature Extraction Algorithm
Given K different sensor time series, each of length M, we represent them as a matrix (Figure 3): each column corresponds to the time series of a single sensor, whereas each row contains the values of all sensors sampled at a given time. Then, we group the rows into sliding windows F_i [19]: each F_i corresponds to W time samples of all K sensors, and each window shares W/2 samples with the previous one. Therefore, each F_i can be seen as a point cloud of W points in the Euclidean space R^K, where the coordinates correspond to the outputs of the different sensors. For each point cloud F_i associated with a sliding window, we build the Vietoris-Rips complex filtered by the Euclidean distance between points, and compute the corresponding barcodes. The analysis of the barcodes then gives insight into the likelihood that a window corresponds to a period when the room is occupied. In particular, we analyse the barcodes through a set of descriptors derived from the topological features summarized in each barcode [20]: the number of 0- and 1-dimensional topological events; the number of long 0- and 1-dimensional topological events (i.e., events with a lifespan greater than a fixed threshold: τ_0 in dimension 0 and τ_1 in dimension 1); and the average lifespan in dimensions 0 and 1.
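The windowing step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `sliding_windows` is hypothetical, W is assumed to be even, trailing samples that do not fill a complete window are dropped (the text does not say how they are handled), and the toy matrix contains made-up values.

```python
def sliding_windows(rows, w):
    """Split a list of K-dimensional samples into windows of w rows.

    Consecutive windows share w // 2 rows, so the hop between window
    starts is w // 2.
    """
    if w < 2 or w % 2:
        raise ValueError("w is assumed to be an even integer >= 2")
    hop = w // 2
    # Assumption: samples at the end that do not fill a whole window are dropped.
    return [rows[start:start + w] for start in range(0, len(rows) - w + 1, hop)]

# Toy data matrix with M = 10 rows and K = 3 columns; the values are made up.
data = [(t % 2, (t // 3) % 2, 0.1 * t) for t in range(10)]
clouds = sliding_windows(data, w=4)

# Each element of `clouds` is a point cloud of 4 points in R^3.
print(len(clouds), [len(c) for c in clouds])
```

Each returned window can be passed directly to a persistent homology routine as a point cloud.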
The feature extraction algorithm is summarised in Algorithm 1. The computations have been carried out using JavaPlex [22].

Algorithm 1 PH-based Algorithm for Feature Extraction.
Step 1.
a. Group the data in blocks of length W, each overlapping the previous one by W/2 samples
b. Compute the associated set of point clouds F_i
c. Mean-center and normalize each F_i
Step 2.
For each F_i:
a. Compute the associated barcodes in dimensions 0 and 1 (i.e., B_{0,i} and B_{1,i})
b. Count the topological events in B_{0,i} and B_{1,i}
c. Count the long topological events
d. Compute the average lifespan for each barcode

The set of descriptors is evaluated as a predictor of the occupancy of the room. The detection procedure works as follows: (i) the windows F_i for which a descriptor reaches its highest values are selected; (ii) this selection is performed for every descriptor; (iii) the intersection of the sets of selected windows is computed. Presence is detected in the windows belonging to the intersection, and absence in the remaining windows.
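The descriptor computation and the intersection-based detection can be sketched as follows. This is a hedged illustration, not the authors' implementation: the function names, the `top_fraction` parameter, and the toy barcodes are assumptions. In particular, the text does not specify how many windows count as reaching the "highest values" of a descriptor, so the sketch keeps the top fraction of windows per descriptor; it also ignores infinite bars when computing lifespans.

```python
import math

def descriptors(bars, tau):
    """Event count, long-event count, and mean lifespan of one barcode.

    Assumption: bars that never die (infinite death) are ignored.
    """
    lifespans = [death - birth for birth, death in bars if math.isfinite(death)]
    count = len(lifespans)
    long_count = sum(1 for life in lifespans if life > tau)
    mean_life = sum(lifespans) / count if count else 0.0
    return {"count": count, "long": long_count, "mean_life": mean_life}

def detect_presence(per_window, top_fraction=0.25):
    """Return the indices of the windows flagged as 'presence'.

    per_window holds one descriptor dict per window. For each descriptor,
    the windows with the highest values are selected (assumption: the top
    `top_fraction` of windows), and the selections are intersected.
    """
    n = len(per_window)
    if n == 0:
        return set()
    k = max(1, round(top_fraction * n))
    selected = None
    for name in per_window[0]:
        ranked = sorted(range(n), key=lambda i: per_window[i][name], reverse=True)
        top = set(ranked[:k])
        selected = top if selected is None else selected & top
    return selected

# Made-up dimension-0 barcodes for four windows (illustrative only).
toy_barcodes = [
    [(0.0, 0.05), (0.0, 0.1)],
    [(0.0, 0.5), (0.0, 0.8), (0.0, 0.3)],
    [(0.0, 0.2), (0.0, 0.9), (0.0, 0.7), (0.0, 0.6)],
    [(0.0, 0.05)],
]
per_window = [descriptors(bars, tau=0.1) for bars in toy_barcodes]
present = detect_presence(per_window, top_fraction=0.5)
print(sorted(present))
```

Intersecting the per-descriptor selections makes the rule conservative: a window is labelled as occupied only when all descriptors agree.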

Experiments
The following subsections describe our scenario and the results obtained in our preliminary study.

Experimental Set-Up
The user's presence was detected using time series acquired by three sensors: a motion sensor, an acoustic sensor, and a power meter. The acoustic and motion sensors have a binary output: they return 1 if motion (respectively, a high level of environmental noise) is detected, and 0 otherwise. The output of the power meter is a non-negative scalar giving the level of consumed electric power.
The sensors' time series were acquired at an average sampling rate of one sample per minute, through a monitoring system installed in an office at the National Research Council of Italy (CNR) in Pisa. The office is 25 m² and is used by a single employee. The typical working day at CNR runs from 9.00 a.m. to 6.00 p.m. (Monday-Friday). Lamps, PCs, and other appliances such as printers, kettles, or coffee machines may be present in the room.
The sensors deployed in the office communicate via the ZigBee protocol with a gateway based on a Raspberry Pi board, which runs an integration middleware for sensing information. The data collected by the gateway are stored on a cloud infrastructure hosted at CNR. The cloud is organized as a set of virtual machines based on a VMware ESX Server and provides three kinds of services: storage, visualization through an interactive dashboard, and tools for data analysis.
The data were collected for one week, during which the occupant was asked to record the ground truth of the room occupancy.
The three time series were temporally aligned: a week of acquisition produced 8840 samples for each time series. The data matrix in Figure 3 therefore has M = 8840 rows and K = 3 columns. Rows were grouped in sliding windows of size W = 18, 34, 68. For each value of the window size W, we obtain a sequence of point clouds F_i in R^3, with the corresponding Vietoris-Rips complexes and persistence barcodes.
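To give a sense of scale, the number of windows implied by these settings can be worked out with a short Python calculation. The counts below are derived, not reported in the paper, and rest on the assumption that consecutive windows start W/2 samples apart and that a trailing partial window is discarded; the helper name `window_count` is hypothetical.

```python
def window_count(m, w):
    """Number of full windows of length w over m samples, with hop w // 2.

    Assumption: a trailing partial window is discarded.
    """
    hop = w // 2
    return (m - w) // hop + 1 if m >= w else 0

# M = 8840 samples and the three window sizes used in the experiments.
counts = {w: window_count(8840, w) for w in (18, 34, 68)}
print(counts)  # {18: 981, 34: 519, 68: 259}
```

Under these assumptions, each window of size W = 68 covers roughly 68 minutes of data at the stated average sampling rate, with a new window starting about every 34 minutes.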
The number of long topological features was counted, for each F_i, using different values of the two thresholds τ_0 and τ_1. Precisely, we characterized topological events on a lattice of points (τ_0, τ_1), both ranging over the interval [0.1, 0.9] with step size 0.2 (five values per threshold, 25 pairs in total). Figure 4 shows the number of counted features for each F_i, with W = 68. The comparison of this graph with the ground truth suggests that the largest numbers of features are obtained for the windows corresponding to the presence of the user in the office. In general, features in dimension 0 showed better correlation with room occupancy than features in dimension 1. This is probably because 0-dimensional features are linked to the correlation between samples within the windows considered, while features in dimension 1 are indicators of possible periodic patterns in the signal time series, which are negligible in our data.
Figure 6 shows the prediction performance obtained with the descriptors extracted from the sensor time series with W = 68 and τ_0 = 0.1, the values that provided the best set of descriptors for the detection procedure. The values 'Presence' and 'no Presence' on the axis 'Verified' are the ground truth, as recorded by the room occupant; the values 'Presence' and 'no Presence' on the axis 'Predicted' are those obtained by our detection algorithm. Figure 6 shows that the detection algorithm detects a 'Presence', which is confirmed by the ground truth. Instead, with a probability of 88% the detection algorithm detects a 'no Presence', which is again confirmed by the ground truth. The performance of the algorithm in our preliminary study is promising compared to the results in [3], considering that their period of experimentation is considerably longer (one month vs one week).
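A 'Verified' versus 'Predicted' comparison of this kind can be tabulated as a confusion matrix. The Python sketch below is an assumption-laden illustration: it normalises each row by the number of verified windows of that class, which may or may not match the normalisation used in Figure 6, and the label sequences are made-up toy values, not the data of this study. The function name `row_normalised_confusion` is hypothetical.

```python
from collections import Counter

LABELS = ("Presence", "no Presence")

def row_normalised_confusion(verified, predicted, labels=LABELS):
    """Fraction of each verified class that received each predicted label."""
    counts = Counter(zip(verified, predicted))
    matrix = {}
    for v in labels:
        total = sum(counts[(v, p)] for p in labels)
        matrix[v] = {p: (counts[(v, p)] / total if total else 0.0) for p in labels}
    return matrix

# Toy per-window labels, for illustration only (not the paper's data).
verified = ["Presence"] * 3 + ["no Presence"] * 5
predicted = ["Presence", "Presence", "no Presence",
             "no Presence", "no Presence", "no Presence", "no Presence", "Presence"]

matrix = row_normalised_confusion(verified, predicted)
for v in LABELS:
    print(v, matrix[v])
```

With row normalisation, the diagonal entries give the fraction of truly occupied (or truly empty) windows that the detector labels correctly.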

Conclusions and Future Work
The results reported in the previous section show that PH-based methods are useful for encoding the information about human activity provided by low-cost monitoring systems. Further analysis may be carried out to understand whether the proposed set of topological features can characterize human activity more precisely, beyond the occupancy of a room. Viable options for developing topology-based activity detection include: increasing the number of low-cost sensors used for data acquisition, extending the computation of PH to dimensions > 1, and complementing topological data analysis with adaptive detection techniques.