Surveying Human Habit Modeling and Mining Techniques in Smart Spaces

A smart space is an environment, mainly equipped with Internet-of-Things (IoT) technologies, able to provide services to humans, helping them to perform daily tasks by monitoring the space and autonomously executing actions, giving suggestions and sending alarms. Approaches suggested in the literature may differ in terms of required facilities, possible applications, amount of human intervention required, and ability to support multiple users at the same time while adapting to changing needs. In this paper, we propose a Systematic Literature Review (SLR) that classifies the most influential approaches in the area of smart spaces according to a set of dimensions identified by answering a set of research questions. These dimensions allow the choice of a specific method or approach according to available sensors, amount of labeled data, need for visual analysis, and requirements in terms of enactment and decision-making on the environment. Additionally, the paper identifies a set of challenges to be addressed by future research in the field.


Introduction
The progress of information and communication technologies has many faces; while computing speed, reliability and level of miniaturization of electronic devices increase year after year, their costs decrease. This allows a widespread adoption of embedded systems (e.g., appliances, sensors, actuators) and of powerful computing devices (e.g., laptops, smartphones), thus turning pervasive (or ubiquitous) computing into reality. Pervasive computing embodies a vision of computers seamlessly integrating into everyday life, responding to information provided by sensors in the environment, with little or no direct instruction from users [1]. At the same time, connecting all these computing devices together, as networked artefacts, using local and global network infrastructures has become easy. The rise of applications that exploit these technologies represents a major characteristic of the Internet-of-Things (IoT) [2].
Smart spaces represent an emerging class of IoT-based applications. Smart homes and offices are representative examples where pervasive computing can take advantage of ambient intelligence (AmI) more easily than in other scenarios, where artificial intelligence (AI) problems soon become intractable [3].
A study about the current level of adoption of commercial smart home systems is provided in [4]. This study reveals that people's understanding of the term "smart" has a more general meaning than what we presented here as AmI; in particular, it also includes non-technological aspects such as the spatial layout of the house. Additionally, an automated behavior is considered smart, especially by people without a technical background, only if it performs a task quicker than the users could by themselves. The research also reveals that interest in smart home systems is subject to a virtuous circle: people experiencing benefits from their services feel the need to upgrade them. The universAAL specification [5] defines a smart space as "an environment centered on its human users in which a set of embedded networked artefacts, both hardware and software, collectively realize the paradigm of ambient intelligence" (AmI).
Many different definitions of AmI are provided in the literature; e.g., [6] introduces a set of distinguishing keywords characterizing AmI systems, namely: sensitivity, responsiveness, adaptivity, ubiquity and transparency. The term sensitivity refers to the ability of an AmI system to sense the environment and, more generally, to understand the context it is interacting with. Strictly related to sensitivity are responsiveness and adaptivity, which denote the capability to act in a timely manner, reactively or proactively, in response to changes in the context according to user preferences (personalization). Sensitivity, responsiveness and adaptivity all contribute to the concept of context awareness. Finally, the terms ubiquity and transparency directly refer to the concept of pervasive computing. AmI is obtained by merging techniques from different research areas [5], including artificial intelligence (AI) and human-computer interaction (HCI).
The proposed approaches largely differ in terms of the models they take as input for both the space dynamics and the human behavior in it, how these models are extracted, what kind of input sensors they need, and the constraints under which they are supposed to work effectively.
The performance evaluation of all these approaches, especially in the academic field, is often conducted in controlled situations where some of the features of a real environment are hidden by the need for repeatability of the experiment.
In this paper, we propose a comparative framework for techniques and approaches for modeling and extracting models to be employed with modern smart spaces. Unlike other surveys in the same area, this work focuses on giving an overview of the approaches by enforcing a taxonomy with dimensions chosen to assess the suitability of a specific technique to a specific setting.
The paper is organized as follows. Section 2 introduces the terminology and basic notions of smart spaces needed to fruitfully follow the discussion. Section 3 explains the inclusion criteria for the literature review. Section 4 discusses sensor types and how they are considered in the relevant literature. Sections 5 and 6 discuss the different types of smart space models employed and how they are constructed. Section 7 compares our work with other similar literature surveys. Finally, Section 8 concludes the paper by providing final considerations.

Background
Figure 1 depicts the closed loops that characterize a running smart space [7]. The main closed loop, depicted using solid arrows and shapes, shows how the knowledge of environment dynamics and of users' behaviors and preferences is employed to interpret sensor output in order to perform appropriate actions on the environment. Sensor data is first analyzed to extract the current context, which is an internal abstraction of the state of the environment from the point of view of the AmI system. The extracted context is then employed to make decisions on the actions to perform on the controlled space. Actions related to these decisions modify the environment (both physical and digital) by means of actuators of different forms.
Sensors can be roughly divided into physical ones, which provide direct measurements about the environment (e.g., humidity, brightness, temperature), the devices and the users, and cyber ones, which provide digital information, not directly related to physical phenomena, such as user calendars.
A cyber sensor often provides information related to the presence of the user in cyberspace (e.g., the calendar of a user, a tweet posted by him/her, etc.).
The term sensor data encompasses raw (or minimally processed) data retrieved from both physical sensors and cyber sensors. We can imagine a smart space producing, at runtime, a sensor log containing raw measurements from the available sensors.

Definition 1 (Sensor Log). Given a set S of sensors, a sensor log is a sequence of measurements of the kind ⟨ts, s, v⟩, where ts is the timestamp of the measurement, s ∈ S is the source sensor and v the measured value, which can be either nominal (categorical) or numeric (quantitative).
Measurements can be produced by a sensor on a periodic basis (e.g., temperature measurements) or whenever a particular event happens (e.g., door openings). As many of the algorithms proposed in the literature borrow the terminology of data mining, the sensor log can be conceived as a sequence of events instead of a sequence of measurements.

Definition 2 (Event Log). Given a set E = {e_1, . . . , e_n} of event types, an event sequence is a sequence of pairs ⟨e, t⟩, where e ∈ E and t is an integer, the occurrence time of the event type e.

Definition 2 is more restrictive than Definition 1: translating a sensor log into an event log can cause a loss of information, especially if discretization of periodic sensor measurements is required.
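The two definitions above can be made concrete with a small sketch. The snippet below is an illustrative Python fragment, not code from any surveyed system; the `Measurement` class and the per-sensor thresholds are our own assumptions. It translates a sensor log into an event log, discretizing numeric values against a threshold and mapping nominal values directly, which makes the potential loss of information explicit.

```python
from dataclasses import dataclass

@dataclass
class Measurement:
    ts: int        # timestamp of the measurement
    sensor: str    # source sensor s in S
    value: object  # nominal (categorical) or numeric value

def to_event_log(sensor_log, thresholds):
    """Translate a sensor log (Definition 1) into an event log
    (Definition 2). Numeric values are discretized against a
    per-sensor threshold, which may lose information."""
    events = []
    for m in sensor_log:
        if isinstance(m.value, (int, float)):
            level = "high" if m.value >= thresholds[m.sensor] else "low"
            events.append((f"{m.sensor}_{level}", m.ts))
        else:  # nominal values map directly to event types
            events.append((f"{m.sensor}_{m.value}", m.ts))
    return events

log = [Measurement(0, "temp", 22.5), Measurement(3, "door", "open")]
print(to_event_log(log, {"temp": 21.0}))
# [('temp_high', 0), ('door_open', 3)]
```

Note that the exact numeric value 22.5 is no longer recoverable from the event `temp_high`, which is the loss of information mentioned above.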
Authors in the field of smart spaces use, sometimes as synonyms, a variety of terms to refer to the state of the environment and the tasks humans perform in it. For the rest of the article, we will use the following terminology:

• Context: The state of the environment, including its human inhabitants. This includes the output of sensors and actuators, but also the state of the human inhabitants, including the actions/activities/habits they are performing. In this very comprehensive meaning, the term situation is sometimes used.

• Action: An atomic interaction with the environment or a part of it (e.g., a device). Recognizing actions can be easy or difficult depending on the installed sensors. Certain methods focus only on actions; they will not be part of our survey. In some cases, methods to recognize activities and habits completely skip the action recognition phase, relying only on the raw measurements in the sensor log.

• Activity: A sequence of actions (one in the extreme case) or sensor measurements/events with a final goal. In some cases an action can be an activity itself (e.g., ironing). Activities can be collaborative, including actions by multiple users, and can interleave with each other. The granularity (i.e., the temporal extension and complexity) of the considered activities cannot be precisely specified: depending on the approach, tidying up a room can be an activity, whereas other approaches may generically consider tidying up the entire house as an activity. In any case, some approaches may define activities hierarchically, where an activity is a combination of sub-activities.

• Habit: A sequence or interleaving of activities that happen in specific contextual conditions (e.g., what the user does every morning between 08:00 and 10:00).
Knowledge plays a central role in AmI systems.
As it intervenes both in context extraction and in decision-making, it takes the form of a set of models describing (i) user behavior, (ii) environment/device dynamics, and (iii) user preferences. However, knowledge should not be considered a static resource, as both user behaviors and preferences change over time. The vast majority of works in the area of ambient intelligence suppose the knowledge to be obtained off-line, independently from the system runtime. A second, optional loop in Figure 1, depicted using dashed arrows, shows that the current context could be employed to update the knowledge by applying learning techniques at runtime. Notably, AmI is not intended to be provided by a centralized entity; on the contrary, its nature is distributed, with embedded devices and software modules, possibly unaware of each other, contributing to its features. Recently introduced smart space appliances, such as the NEST thermostat (see https://nest.com/thermostats/nest-learning-thermostat), contain sensors, actuators and AmI features in a single small package.

Inclusion Criteria and Comparison Framework
In order to conduct our analysis, we took inspiration from SLR (Systematic Literature Review) guidelines. An SLR is a method to identify, evaluate and interpret relevant scientific works with respect to a specific topic. We designed a protocol for conducting the SLR inspired by the guidelines and policies presented in [8], in order to ensure that the results are repeatable and the means of knowledge acquisition are scientific and transparent.
The necessary steps to guarantee compliance with the guidelines include (i) the formulation of the research questions; (ii) the definition of a search string; (iii) the selection of the data sources on which the search is performed; (iv) the identification of inclusion and exclusion criteria; (v) the selection of studies; (vi) the method of extracting data from the studies; and (vii) the analysis of the data.
Our analysis covers the different aspects of habit mining by separately analyzing the features required of the sensor log, the modeling phase, and the runtime phase. The modeling phase is in charge of creating the models of human habits and environmental dynamics, whereas the runtime phase covers the aspects related to how these models are employed at runtime to recognize the context and to act on the environment. These two phases can even overlap when the system is able to refine models at runtime (in collaboration with the user or not) or even to create them from scratch at runtime. In any case, the runtime phase covers only the way models are employed, whereas the modeling phase covers any phase related to model production or update.
On the basis of the above premises, the following research questions RQ-x have been defined:

• RQ-A: Sensor measurements represent the traces of user behavior in the environment. This information is needed at runtime to understand the current context, but the specific available information must be known when the model is defined (through specification or learning). The current review only takes into account the sensors for which the method has been validated. Section 4 presents, for each of the included papers, the following information:
- RQ-A1: which sensors are taken into account?
- RQ-A2: are the sensors only categorical, or also numerical? In the latter case,
- RQ-A3: which discretization strategy is employed?
• RQ-B1: Any proposed method has a way to represent models. Models can be represented using graphical or mathematical/logic formalisms. Some methods propose formalisms specifically designed for the particular approach; other methods conversely employ standard formalisms from machine learning and data mining. Section 5 analyzes the following aspects of the employed models:
- RQ-B1.1: what type of model is adopted?
- RQ-B1.2: how can an instance of the model be represented? Is it human readable?
- RQ-B1.3: at which granularity does the model work?
• RQ-B2: Whichever model type is adopted, each method introduces a different way to construct the model. Section 6 expands on the following analysis:
- RQ-B2.1: for methods involving an (optionally partial) automatic construction of models (i.e., through learning), a training set consisting of a sensor log must be fed as input. Learning methods in the field of ambient intelligence can first be classified according to the effort devoted to labeling the training set;
- RQ-B2.2: does the system consider the possibility of having multiple users? How many?
- RQ-B2.3: in the latter case, is some type of additional labeling needed?
As terms in this specific research area are not yet standardized, we replaced the search string with a crawling-like procedure starting from a set of seed papers, recursively navigating papers through the "Cited by" feature of Google Scholar, and selecting influential works from the last 10 years. This search was performed manually on Google Scholar, as it indexes all the relevant sources of scientific papers.
We then computed the 33rd percentile over the number of citations per year. In general, using the number of citations per year promotes recent works, as they attract citations faster than outdated ones, but since we are analyzing only works from the last 10 years, this approach allows us to highlight the most influential works. Additionally, papers that extend works included by this criterion are included in the results.
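As a minimal illustration of this kind of inclusion criterion (a hypothetical sketch, not the actual selection procedure used for the review; the function name and the nearest-rank percentile choice are our own assumptions), a percentile threshold over citations per year could be computed as follows:

```python
import math

def influential(papers, pct=33):
    """Keep the papers whose citations-per-year rate is at or above
    the pct-th percentile of all rates (nearest-rank percentile).
    papers: dict mapping paper id -> citations per year."""
    rates = sorted(papers.values())
    k = max(0, math.ceil(pct / 100 * len(rates)) - 1)  # nearest-rank index
    threshold = rates[k]
    return {p for p, r in papers.items() if r >= threshold}

print(influential({"A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6}))
# {'B', 'C', 'D', 'E', 'F'}  (paper A falls below the 33rd percentile)
```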
The application of this criterion allowed us to identify 22 primary studies (cf. Table 1) that were included in the final SLR. For each work, in the following tables, a general description is provided in addition to some schematic information.

Supported Sensors
Sensing technologies have made significant progress in designing sensors with smaller size, lighter weight, lower cost, and longer battery life. Sensors can, thus, be embedded in an environment and integrated into everyday objects and onto human bodies without affecting users' comfort. Nowadays, sensors include not only those traditionally employed for home and building automation (e.g., presence detectors, smoke detectors, contact switches for doors and windows, network-attached and closed-circuit cameras) but also more modern units (e.g., IMUs, Inertial Measurement Units, such as accelerometers and gyroscopes, and WSN nodes), which are increasingly available as part of off-the-shelf products.
Measured values are usually affected by a certain degree of uncertainty. Sensors have indeed their own technical limitations, as they are prone to breakdowns, disconnections from the system and environmental noise (e.g., electromagnetic noise). As a consequence, measured values can be out of date, incomplete, imprecise, and contradictory with each other. Techniques for cleaning sensor data do exist [33], but uncertainty of sensor data may still lead to wrong conclusions about the current context, which in turn potentially lead to incorrect behaviors of the system.
Formalisms employed for representing knowledge in AmI systems often need environmental variables to be binary or categorical. A wide category of sensors (e.g., temperature sensors) instead produces numerical values, making it necessary to discretize sensor data before they can be used for reasoning.
Discretization methods in machine learning and data mining are usually classified according to the following dimensions [34]:

• Supervised vs. Unsupervised. Unsupervised methods do not make use of class information in order to select cut-points. Classic unsupervised methods are equal-width and equal-frequency binning, and clustering. Supervised methods instead employ class labels in order to improve discretization results.

• Static vs. Dynamic. Static discretization methods perform discretization as a preprocessing step, prior to the execution of the learning/mining task. Dynamic methods instead carry out discretization on the fly.

• Global vs. Local. Global methods, such as binning, are applied to the entire n-dimensional space. Local methods, such as the C4.5 classifier, produce partitions that are applied to localized regions of the instance space. A local method is usually associated with a dynamic discretization method.

• Top-down vs. Bottom-up. Top-down methods start with an empty list of cut-points (or split-points) and keep on adding new ones to the list by splitting intervals as the discretization progresses. Bottom-up methods start with the complete list of all the continuous values of the feature as cut-points and remove some of them by merging intervals as the discretization progresses.

• Direct vs. Incremental. Direct methods directly divide the range of a quantitative attribute into k intervals, where the parameter k is provided as input by the user. Conversely, incremental methods start from a simple discretization and improve it step by step in order to find the best value of k.
An important aspect to take into account when evaluating the usage of a particular category of sensors is final user acceptance. Wearable sensors or cameras usually provoke a certain level of discomfort in users, which must be taken into account.
Table 2 shows, for each of the selected papers, the answers to research questions A1, A2 and A3. A3 is always empty for methods supporting only discrete sensors.

Model Types
Knowledge is represented in AmI systems using models. The literature about representing models of human habits is wide. In this section, we will review the most adopted approaches, highlighting those formalisms that are human understandable, thus being easy to validate by a human expert or by the final user (once the formalism is known).
Bayesian classification techniques are based on the well known Bayes theorem P(H|X) = P(X|H)P(H)/P(X), where H denotes the hypothesis (e.g., a certain activity is happening) and X represents the set of evidences (i.e., the current values of context objects). As calculating P(X|H) can be very expensive, different assumptions can be made to simplify the computation. For example, naïve Bayes (NB) is a simple classification model which supposes the n single evidences composing X to be independent given the situational hypothesis (i.e., the occurrence of one does not affect the probability of the others); this assumption can be formalized as P(X|H) = ∏_{k=1}^{n} P(x_k|H). The inference process under the naïve Bayes assumption chooses the situation with the maximum a posteriori (MAP) probability.
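A minimal sketch of naïve Bayes MAP inference follows (illustrative Python with invented activity names and probabilities, not a model from any surveyed work; log-probabilities are used to avoid numeric underflow, and the constant denominator P(X) is omitted since it does not affect the argmax):

```python
import math

def naive_bayes_map(priors, likelihoods, evidence):
    """Pick the hypothesis H maximizing P(H) * prod_k P(x_k | H).
    priors: {H: P(H)}; likelihoods: {H: {x_k: P(x_k|H)}};
    evidence: list of observed evidence values x_1..x_n."""
    best, best_score = None, -math.inf
    for h, p_h in priors.items():
        # log turns the product of probabilities into a sum
        score = math.log(p_h) + sum(math.log(likelihoods[h][x])
                                    for x in evidence)
        if score > best_score:
            best, best_score = h, score
    return best

priors = {"cooking": 0.3, "sleeping": 0.7}
lik = {"cooking":  {"kitchen_motion": 0.9,  "stove_on": 0.8},
       "sleeping": {"kitchen_motion": 0.05, "stove_on": 0.01}}
print(naive_bayes_map(priors, lik, ["kitchen_motion", "stove_on"]))
# cooking
```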
Hidden Markov Models (HMMs) represent one of the most widely adopted formalisms to model the transitions between different states of the environment or of humans. Here, hidden states represent situations and/or activities to be recognized, whereas observable states represent sensor measurements. HMMs are a statistical model where the system being modeled is assumed to be a Markov chain, which is a sequence of events. A HMM is composed of a finite set of hidden states (e.g., s_{t-1}, s_t, and s_{t+1}) and observations (e.g., o_{t-1}, o_t, and o_{t+1}) that are generated from states. A HMM is built on three assumptions: (i) each state depends only on its immediate predecessor; (ii) each observation variable depends only on the current state; and (iii) observations are independent from each other. In a HMM, there are three types of probability distributions: (i) prior probabilities over the initial state, p(s_0); (ii) state transition probabilities, p(s_t|s_{t-1}); and (iii) observation emission probabilities, p(o_t|s_t).
A drawback of using a standard HMM is its lack of hierarchical modeling for representing human activities.To deal with this issue, several other HMM alternatives have been proposed, such as hierarchical and abstract HMMs.In a hierarchical HMM, each of the hidden states can be considered as an autonomous probabilistic model on its own; that is, each hidden state is also a hierarchical HMM.
HMMs generally assume that all observations are independent, which could miss long-term trends and complex relationships. Conditional Random Fields (CRFs), on the other hand, eliminate the independence assumptions by modeling the conditional probability of a particular sequence of hypotheses, Y, given a sequence of observations, X; succinctly, CRFs model P(Y|X). Modeling the conditional probability of the label sequence rather than the joint probability of both labels and observations, P(X, Y), as done by HMMs, allows CRFs to incorporate complex features of the observation sequence X without violating the independence assumptions of the model. The graphical model representations of a HMM (a directed graph, Figure 2a) and a CRF (an undirected graph, Figure 2b) make this difference explicit. In [24], a comparison between HMM and CRF is shown, where CRF outperforms HMM in terms of timeslice accuracy, while HMM outperforms CRF in terms of class accuracy.

Figure 2. Examples of HMM and CRF models. Ellipses represent states (i.e., activities); rectangles represent sensors. Arrows between states are state transition probabilities (i.e., the probability of moving from one state to another), whereas those from states to sensors are emission probabilities (i.e., the probability that in a specific state a sensor has a specific value). (a) HMM model example, inspired by CASAS-HMM [14] and CASAS-HAM [15]. (b) CRF model example, inspired by KROS-CRF [24].
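Given the three HMM distributions listed above, the most likely sequence of hidden states (activities) for an observation sequence (sensor events) can be decoded with the standard Viterbi algorithm. The following is an illustrative Python sketch with invented activities and probabilities, not a model from any surveyed work:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence for a list of observations,
    under the HMM assumptions: V[t][s] is the probability of the best
    path ending in state s at time t; back[t][s] is its predecessor."""
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev, p = max(((r, V[t - 1][r] * trans_p[r][s]) for r in states),
                          key=lambda x: x[1])
            V[t][s] = p * emit_p[s][obs[t]]
            back[t][s] = prev
    # follow back-pointers from the best final state
    path = [max(V[-1], key=V[-1].get)]
    for t in range(len(obs) - 1, 0, -1):
        path.insert(0, back[t][path[0]])
    return path

states = ["sleeping", "cooking"]
start = {"sleeping": 0.6, "cooking": 0.4}
trans = {"sleeping": {"sleeping": 0.7, "cooking": 0.3},
         "cooking":  {"sleeping": 0.4, "cooking": 0.6}}
emit = {"sleeping": {"bed_pressure": 0.9, "stove_on": 0.1},
        "cooking":  {"bed_pressure": 0.2, "stove_on": 0.8}}
print(viterbi(["bed_pressure", "bed_pressure", "stove_on"],
              states, start, trans, emit))
# ['sleeping', 'sleeping', 'cooking']
```

For long observation streams a production implementation would work in log-space to avoid underflow, as the path probabilities shrink geometrically.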
Another statistical tool often employed is Markov Chains (MCs), which are based on the assumption that the probability of an event is conditional only on the previous event. Even if they are very effective for some applications, like capacity planning, they are quite limited in the smart space context, because they deal with deterministic transitions, and modeling an intelligent environment with this formalism results in a very complicated model. Support Vector Machines (SVMs) allow the classification of both linear and non-linear data. A SVM uses a non-linear mapping to transform the original training data into a higher dimension; within this new dimension, it searches for the linear optimal separating hyperplane that separates the training data of one class from another. With an appropriate non-linear mapping to a sufficiently high dimension, data from two classes can always be separated. SVMs are good at handling large feature spaces since they employ overfitting protection, which does not necessarily depend on the number of features. Binary classifiers are built to distinguish activities. Due to their characteristics, SVMs are better suited to generating other kinds of models with a machine learning approach than to modeling the smart environment directly. For instance, in [35] the authors use them, combined with naïve Bayes classifiers, to learn the activity model built on the hierarchical taxonomy formalism shown in Figure 3.
Artificial Neural Networks (ANNs) are a sub-symbolic technique, originally inspired by biological neural networks. They can automatically learn complex mappings and extract a non-linear combination of features. A neural network is composed of many artificial neurons that are linked together according to a specific network architecture. A neural classifier consists of an input layer, a hidden layer, and an output layer. Mappings between input and output features are represented in the composition of activation functions f at the hidden layer, which can be learned through a training process performed using gradient descent optimization methods or resilient backpropagation algorithms. Some techniques stem from data mining methods for market basket analysis (e.g., the Apriori algorithm [36]), which apply a windowing mechanism in order to transform the event/sensor log into what is called a database of transactions. Let I = {i_1, . . ., i_n} be a set of binary variables corresponding to sensor event types. A transaction is an assignment that binds a value to each of the variables in I, where the values 1 and 0 respectively denote whether a certain event happened or not during the considered window. A database of transactions T is a (usually ordered) sequence of transactions, each having a, possibly empty, set of properties (e.g., a timestamp). An item is an assignment of the kind i_k = {0, 1}. An itemset is an assignment covering a proper subset of the variables in I. An itemset C has support Supp_T(C) in the database of transactions T if a fraction Supp_T(C) of the transactions in the database contain C.
The techniques following this strategy turn the input log into a database of transactions, each corresponding to a window. Given two different databases of transactions T_1 and T_2, the growth rate of an itemset C from T_1 to T_2 is defined as Supp_{T_2}(C)/Supp_{T_1}(C). Emerging patterns (EPs) are those itemsets showing a growth rate greater than a certain threshold ρ. The rationale behind this definition is that an itemset with high support in its target class (database) and low support in the contrasting class can be seen as a strong signal for discovering the class of a test instance containing it. Market basket analysis is a special case of affinity analysis that discovers co-occurrence relationships among purchased items within a single transaction or across multiple transactions.
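The definitions of support and growth rate can be sketched directly (illustrative Python; the example windows of sensor events and the appliance names are invented):

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset.
    Each transaction is modeled as the set of events observed in
    one window."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def growth_rate(itemset, t1, t2):
    """Growth rate of itemset from database T1 to T2; an emerging
    pattern is one whose growth rate exceeds a threshold rho."""
    s1, s2 = support(t1, itemset), support(t2, itemset)
    if s1 == 0:
        return float("inf") if s2 > 0 else 0.0
    return s2 / s1

# windows of sensor events grouped by contextual condition
morning = [{"kettle", "radio"}, {"kettle", "toaster"}, {"radio"}]
evening = [{"tv"}, {"tv", "kettle"}, {"tv"}]
print(growth_rate({"kettle"}, evening, morning))
# 2.0 -> "kettle" is a candidate emerging pattern for mornings
```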
Initial approaches to the development of context-aware systems able to recognize situations were based on predicate logic. Loke [37] introduced a PROLOG extension called LogicCAP; here the "in-situation" operator captures a common form of reasoning in context-aware applications, which is to ask whether an entity E is in a given situation S (denoted as S*>E). In particular, a situation is defined as a set of constraints imposed on the output or readings that can be returned by sensors; i.e., if S is the current situation, we expect the sensors to return values satisfying some constraints associated with S. LogicCAP rules use backward chaining like PROLOG, but also utilize forward chaining in determining situations; i.e., a mix of backward and forward chaining is used in evaluating LogicCAP programs. The work introduces different reasoning techniques involving situations, including selecting the best action to perform in a certain situation, understanding which situation a certain entity is in (or the most likely one), and defining relationships between situations.
Many approaches are borrowed from other information technology areas and adapted to smart environments. For instance, in [38] the authors use temporal logic and model checking to perform activity modeling and recognition; the proposed system is called ARA. A graphical representation of a model example adopted by this approach is shown in Figure 4. Ontologies (denoted as ONTO) represent the latest evolution of logic-based approaches and have increasingly gained attention as a generic, formal and explicit way to "capture and specify the domain knowledge with its intrinsic semantics through consensual terminology and formal axioms and constraints" [39]. They provide a formal way to represent sensor data, context, and situations in well-structured terminologies, which makes them understandable, shareable, and reusable by both humans and machines. A considerable amount of knowledge engineering effort is expected in constructing the knowledge base, while inference is well supported by mature algorithms and rule engines. Some examples of using ontologies to identify situations are given by [40] (later evolved in [20,21]). Instead of using ontologies to infer activities, they use ontologies to validate the results inferred from statistical techniques. Clearly, each approach adopts the formalism best suited to its application, and many aspects can be modeled at different granularities.
The way an AmI system makes decisions about actions can be compared to decision-making in AI agents. As an example, reflex agents with state, as introduced in [42], take as input the current state of the world and a set of condition-action rules to choose the action to be performed. Similarly, Augusto [43] introduces the concept of an Active DataBase (ADB) composed of Event-Condition-Action (ECA) rules. An ECA rule basically has the form "ON event IF condition THEN action", where conditions can take time into account.
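An ECA rule base of the "ON event IF condition THEN action" form can be sketched in a few lines (illustrative Python; the rule, event names and context fields are made up):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ECARule:
    """ON event IF condition THEN action."""
    event: str                            # triggering event type
    condition: Callable[[dict], bool]     # predicate over the context
    action: Callable[[], str]             # effect on the environment

def fire(rules, event, context):
    """Run the action of every rule triggered by `event` whose
    condition holds in the current context; conditions may inspect
    time, as in the example below."""
    return [r.action() for r in rules
            if r.event == event and r.condition(context)]

rules = [ECARule("motion_detected",
                 lambda ctx: ctx["hour"] >= 22,   # only late at night
                 lambda: "turn_on_night_light")]
print(fire(rules, "motion_detected", {"hour": 23}))
# ['turn_on_night_light']
```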
The first attempts to apply techniques from the business process management (BPM) [44] area employed workflow specifications to anticipate user actions. A workflow is composed of a set of tasks related by qualitative and/or quantitative time relationships. The authors in [45] present a survey of techniques for temporal calculus (i.e., Allen's Temporal Logic and Point Algebra) and spatial calculus aimed at decision-making. The SPUBS system [46,47] automatically retrieves these workflows from sensor data.

Model Construction
Modeling formalisms in the literature can be roughly divided into specification-based and learning-based [1]. Research in the field of AmI started when few kinds of sensors were available and the relationships between sensor data and the underlying phenomena were easy to establish. Specification-based approaches represent hand-made expert knowledge in logic rules and apply reasoning engines to infer conclusions and to make decisions from sensor data. These techniques have evolved in recent years to take uncertainty into account. The growing availability of different kinds of sensors made hand-made models impractical to produce. To solve this problem, learning-based methods employ techniques from machine learning and data mining. Specification-based models are usually more human-readable (even though a basic experience with formal logic languages is required), but creating them is very expensive in terms of human resources. Most learning-based models are instead represented using mathematical and statistical formalisms (e.g., HMMs), which make them difficult to revise by experts and to understand by final users. These motivations are at the basis of the research on human-readable, automatically inferred formalisms.
Learning-based techniques can be divided into supervised and unsupervised ones. The former expect the input to be previously labeled according to the required output function; hence, they require a significant effort for organizing input data into training examples, even though active learning can be employed to ease this task. Unsupervised techniques (or weakly supervised ones, i.e., those where only a part of the dataset is labeled) can be used to face this challenge, but only a limited number of such works is available in the literature.
Unsupervised techniques for AmI knowledge modeling can be useful for two further reasons. Firstly, as stated in the introduction, knowledge should sometimes not be considered a static resource; instead, it should be updated at runtime without direct intervention of the users [15], hence updating techniques should rely on labeled sensor data as little as possible. Moreover, unsupervised techniques may also prove useful for supporting passive users, such as guests, who do not participate in the configuration of the system but should benefit from its services as well.
Performing learning or mining on sequences of sensor measurements poses the issue of how to group events into aggregates of interest (i.e., actions, activities, situations). Even with supervised learning techniques, where labeling is provided at learning time, the same does not hold at runtime, where a stream of events is fed into the AmI system. Even though most approaches proposed in the AmI literature (especially supervised learning ones) ignore this aspect, windowing mechanisms are needed. As described in [17], the different windowing methods can be classified into three main classes, namely explicit, time-based and event-based.

• Explicit segmentation. In this case, the stream is divided into chunks, usually by some kind of classifier previously trained over a training data set. Unfortunately, as the training data set simply cannot cover all the possible combinations of sensor events, such approaches often end up splitting single activities into multiple chunks and merging multiple activities into one.

• Time-based windowing. This approach divides the entire sequence into equally sized time intervals. It is a good approach when dealing with data obtained from sources that operate continuously in time (e.g., sensors like accelerometers and gyroscopes). As can easily be argued, the choice of the window size is fundamental, especially in the case of sporadic sensors: a small window may not contain enough information to be useful, whereas a large window may merge multiple activities when bursts of sensor events occur.

• Sensor event-based windowing. This last approach splits the entire sequence into bins containing an equal number of sensor events. Usually, bins overlap in this case, with each window containing the latest event together with the preceding ones. Although this method usually performs better than the others, it shows drawbacks similar to those introduced for time-based windowing.
In AUG-ECA [9], the rules composing the model are written by experts, so the model formalism is based on knowledge bases. The rules are evaluated by exploiting evidential reasoning (ER), and the ER parameters are trained in a supervised way. The rule structure follows the Event-Condition-Action (ECA) paradigm, borrowed from database techniques. The rules specify how the system has to react to a given event in a specific context. The action performed by the system influences the status of the system itself, potentially generating another event whose reaction is managed by other rules.
Figure 6 shows some examples of ECA rules used to model activities. The structure has the typical layout of an IF-THEN construct, in which events represent sensor triggerings (or their interpretation) and conditions are expressed as probabilities over the context elements and the auxiliary actions performed. The limits of this approach are the considerable effort needed to successfully model the environment and the possibility of producing conflicting rules.
The evolution of specification-based approaches to overcome the previous limits is represented by ontologies. In CHEN-ONT [18], an activity is modeled as a class, and the entities and properties related to it are divided into three groups (see Figure 7): the first group concerns the context, the second represents causal and/or functional relations, and the properties of the third group denote the type of, and interrelationships between, activities. Figure 7 shows (a) an ontology used to model the Smart Environment domain, (b) an ontology used to model activity correlations and (c) an ontology used to model sensor properties (pictures inspired by and taken from CHEN-ONT [18]).
In RIB-PROB [20,21], the multilevel model is obtained by combining ontologies and/or grouping elements of the previous levels. The Atomic Gestures model is obtained by just considering log elements. Manipulative Gestures are computed by considering ontologies and axioms. Simple Activities are obtained by grouping Manipulative Gestures. Finally, for Complex Activities, ontologies are involved. Figure 5c represents a portion of the resulting ontology model. The dashed lines represent super-/sub-class relations. The individual classes have relations that describe dependencies. Moreover, Description Logic is employed to support ontological reasoning, which also allows checking the consistency of the knowledge base and inferring additional information from registered facts.
In NUG-EVFUS [22], the interrelationships between sensors, context and activities are represented as a hierarchical network of ontologies (see Figure 8). A particular activity can be performed in, or associated with, a certain room of the house; this information is modeled with an ontology of the network.
In CASAS-HMM [14], each activity is performed in a protected environment, and the resulting log is recorded and labeled. Then, an HMM is built upon this dataset in a supervised way. The resulting model is shown in Figure 2a. Observations (squares) model the sensor triggerings; the states (circles) model the activities that can generate the observations according to certain probabilities. The goal is to infer the activities by processing the observations. This recognition technique supports single-user data, but the problem of modeling multiple users is introduced. The same team, in CASAS-SVM [17], employs SVMs. In this second work, the authors propose an interesting analysis of the different windowing strategies to be employed to gather measurements into observation vectors. Finally, in CASAS-HMMNBCRF [16], experiments are performed with the same methodology, adding CRF and NB modeling techniques to the analysis.
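As an illustration of how an HMM infers activities from observations, the toy sketch below runs the Viterbi algorithm over invented probabilities; the states, sensor names and all numbers are made up for illustration and are not the CASAS parameters:

```python
def viterbi(obs, states, start, trans, emit):
    """Return the most probable hidden state path for an observation sequence."""
    # path[s] = (best path ending in state s, its probability)
    path = {s: ([s], start[s] * emit[s][obs[0]]) for s in states}
    for o in obs[1:]:
        path = {s: max(((p + [s], prob * trans[prev][s] * emit[s][o])
                        for prev, (p, prob) in path.items()),
                       key=lambda x: x[1])
                for s in states}
    return max(path.values(), key=lambda x: x[1])[0]

states = ["cooking", "sleeping"]
start = {"cooking": 0.5, "sleeping": 0.5}
trans = {"cooking": {"cooking": 0.8, "sleeping": 0.2},
         "sleeping": {"cooking": 0.2, "sleeping": 0.8}}
emit = {"cooking": {"stove": 0.7, "bed": 0.1, "door": 0.2},
        "sleeping": {"stove": 0.05, "bed": 0.8, "door": 0.15}}

print(viterbi(["stove", "stove", "bed"], states, start, trans, emit))
# → ['cooking', 'cooking', 'sleeping']
```

In supervised approaches such as CASAS-HMM, the start, transition and emission tables are estimated from the labeled log rather than written by hand as here.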
In WANG-EP [19], Emerging Patterns are mined from a log of sequential activities, and the resulting set composes the model.
In KROS-CRF [24], the model is trained on a labeled dataset. The log is divided into 60-s-long segments and each segment is labeled. The dataset is composed of multi-day logs: one day is used for testing the approach, the remaining ones for training the models. The resulting model is an undirected graph, as in Figure 2b.
In REIG-SITUATION [25], an SVM model, built on statistical values extracted from the measurements of a given user, is used for classifying roles. Then, this information, combined with other statistically extracted features, is involved in the training of the HMM that models the situations.
In YANG-NN [26], the input vector contains the features to consider and the output vector the classes (activities). The back-propagation learning algorithm is used for training the ANNs. Three neural networks are built on labeled logs: a pre-classifier and two classifiers; static activities and dynamic activities are modeled with separate ANNs. The structure of the neural classifier consists of an input layer, a hidden layer and an output layer.
In LES-PHI [31], given the maximum number of features the activity recognition system can use, the system automatically chooses the most discriminative subset of features and uses it to learn an ensemble of discriminative static classifiers for the activities that need to be recognized. Then, the class probabilities estimated by the static classifiers are used as inputs to HMMs.
In BUE-WISPS [32], the users are asked to perform activities. The resulting log is used for training an HMM.
In FLEURY-MCSVM [28], the classes of the classifier model the activities. Binary classifiers are built to distinguish activities through pairwise combination selection; the number of SVMs for n activities will be n − 1. The features used are statistics computed from the measurements.
The algorithm proposed in CASAS-DISCOREC [11][12][13] aims to improve the performance of activity recognition algorithms by trying to reduce the portion of the dataset that has not been labeled during data acquisition. In particular, for the unlabeled section of the log, the authors employ a pattern mining technique in order to discover, in an unsupervised manner, human activity patterns. A pattern is defined here as a set of events where order is not specified and events can be repeated multiple times. Patterns are mined by iteratively compressing the sensor log. The data mining method used for activity discovery is completely unsupervised, without the need to manually segment the dataset or choose windows, and allows discovering interwoven activities as well. Starting from singleton patterns, at each step the proposed technique compresses the log by exploiting them and iteratively reprocesses the compressed log to recognize new patterns and further compress the log. When the log can hardly be compressed any further, each remaining pattern represents an activity class. Discovered labels are employed to train HMM, BN and SVM models following the same approach as in the supervised works of the same group.
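The iterative-compression idea can be illustrated with a deliberately simplified sketch: the most frequent adjacent pair of events is repeatedly replaced by a fresh composite symbol until no pair repeats. This toy version works on ordered, adjacent pairs only, whereas the actual CASAS-DISCOREC miner handles unordered and interwoven patterns:

```python
from collections import Counter

def compress(log, min_count=2):
    """Iteratively replace the most frequent adjacent pair with a new symbol."""
    patterns = {}
    while True:
        pairs = Counter(zip(log, log[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < min_count:          # nothing repeats: stop compressing
            break
        name = f"P{len(patterns)}"
        patterns[name] = (a, b)
        out, i = [], 0                 # rewrite the log with the new symbol
        while i < len(log):
            if i + 1 < len(log) and (log[i], log[i + 1]) == (a, b):
                out.append(name)
                i += 2
            else:
                out.append(log[i])
                i += 1
        log = out
    return log, patterns

log = ["kettle", "cup", "kettle", "cup", "tv", "kettle", "cup"]
compressed, patterns = compress(log)
print(patterns)    # → {'P0': ('kettle', 'cup')}
print(compressed)  # → ['P0', 'P0', 'tv', 'P0']
```

Each surviving composite symbol plays the role of a discovered pattern; in the real system the discovered patterns then label the log for training HMM, BN and SVM models.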
In CASAS-HAM [15], the sensor log is considered completely unlabeled. Here, temporal patterns (patterns with the addition of temporal information) are discovered similarly as in CASAS-DISCOREC [11][12][13] and are used for structuring a tree of Markov Chains. Different activations at different timestamps generate new paths in the tree. Depending on temporal constraints, a sub-tree is generated containing, at its leaves, Markov Chains that model activities. A technique to update the model is also proposed. Here, the goal of the model is the actuation of target devices rather than recognition.
Authors in STIK-MISVM [27] introduce a weakly supervised approach in which two strategies are proposed for assigning labels to unlabeled data. The first strategy is based on the miSVM algorithm: an SVM with two levels, the first for assigning labels to unlabeled data, the second for recognizing activities in the logs. The second strategy is called graph-based label propagation, where the nodes of a graph are feature vectors. The nodes are connected by weighted edges, whose weights represent the similarity between nodes. Once the entire training set is labeled, an SVM is trained for activity recognition.
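A minimal sketch of the graph-based label propagation idea follows: nodes are feature vectors, edge weights encode similarity, and unlabeled nodes iteratively absorb the weighted votes of their neighbors. The Gaussian similarity, toy points and labels are illustrative assumptions, not the exact formulation of [27]:

```python
import math

def propagate(points, seeds, sigma=1.0, iters=20):
    """seeds: dict index -> class for labeled nodes; returns labels for all."""
    n = len(points)
    # Gaussian similarity weights between feature vectors (no self-edges).
    w = [[math.exp(-sum((a - b) ** 2 for a, b in zip(points[i], points[j])) / sigma)
          if i != j else 0.0 for j in range(n)] for i in range(n)]
    classes = sorted(set(seeds.values()))
    # score[i][k] = soft membership of node i in class classes[k]
    score = [[1.0 if seeds.get(i) == c else 0.0 for c in classes] for i in range(n)]
    for _ in range(iters):
        for i in range(n):
            if i in seeds:                  # labeled nodes keep their label
                continue
            score[i] = [sum(w[i][j] * score[j][k] for j in range(n))
                        for k in range(len(classes))]
            total = sum(score[i]) or 1.0    # normalize to a distribution
            score[i] = [s / total for s in score[i]]
    return {i: classes[max(range(len(classes)), key=lambda k: score[i][k])]
            for i in range(n)}

points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.0)]
seeds = {0: "walk", 2: "sit"}
print(propagate(points, seeds))  # nodes 1 and 3 inherit their neighbors' labels
```

Once every node carries a label, the now fully labeled set can train the final SVM, as in the second strategy of the paper.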
In AUG-APUBS [10], the system generates ECA rules by considering the typology of the sensors involved in the measurements and the time relations between their activations. APUBS makes a clear distinction between different categories of sensors:

• Type O sensors, installed in objects, thus providing direct information about the actions of the users.

• Type C sensors, providing information about the environment (e.g., temperature, day of the week).

• Type M sensors, providing information about the position of the user inside the house (e.g., in the bedroom).
Events in the event part of an ECA rule always come from sets O and M. Conditions are usually expressed in terms of the values provided by Type C sensors. Finally, the action part contains only Type O sensors.
The set of Type O sensors is called mainSeT. The first step of the APUBS method consists of discovering, for each sensor in mainSeT, the set associatedSeT of O and M sensors that can potentially be related to it as triggering events. The method employed is APriori for association rules [36]; the only difference is that candidate association rules X ⇒ Y are limited to those where the cardinality of both X and Y is one and Y only contains events from mainSeT. Obviously, this step requires a window size to be specified in order to create transactions. As a second step, the technique discovers the temporal relationships between the events in associatedSeT and those in mainSeT; during this step, non-significant relations are pruned. As a third step, the conditions for the ECA rules are mined with a JRip classifier [48].
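The first APUBS step can be sketched as follows: a sliding window turns the event sequence into transactions, and only 1-to-1 rules X ⇒ Y whose consequent Y belongs to mainSeT are kept. The sensor names, window size and confidence threshold are illustrative, and frequent-itemset counting stands in for the full APriori machinery:

```python
from collections import Counter
from itertools import combinations

def associated_set(events, main_set, window=3, min_conf=0.6):
    """Map each mainSeT sensor to the sensors plausibly triggering it."""
    # Sliding window over the sequence -> one transaction per position.
    transactions = [set(events[i:i + window])
                    for i in range(len(events) - window + 1)]
    single, pair = Counter(), Counter()
    for t in transactions:
        for x in t:
            single[x] += 1
        for x, y in combinations(sorted(t), 2):
            pair[(x, y)] += 1
            pair[(y, x)] += 1
    rules = {}
    for (x, y), n in pair.items():
        if y in main_set and x not in main_set:
            conf = n / single[x]           # confidence of the rule X => Y
            if conf >= min_conf:
                rules.setdefault(y, set()).add(x)
    return rules

events = ["M_kitchen", "O_kettle", "M_kitchen", "O_kettle", "M_bed", "O_lamp"]
print(associated_set(events, main_set={"O_kettle"}))
```

On this toy log, kitchen motion is reported as a candidate trigger for the kettle, which is the kind of associatedSeT relation the subsequent temporal-pruning step would then refine.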
In WANG-HIER [23], starting from the raw log, the authors use a K-Medoids clustering method to discover template gestures. This method finds the k representative instances that best represent the clusters. Based on these templates, gestures are identified by applying a template-matching algorithm: Dynamic Time Warping, a classic dynamic-programming algorithm for matching two time series with temporal dynamics.
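Dynamic Time Warping itself is compact enough to sketch; the toy 1-D signals below stand in for gesture segments matched against a cluster template:

```python
def dtw(a, b):
    """Return the DTW distance between two 1-D sequences."""
    INF = float("inf")
    n, m = len(a), len(b)
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # insertion
                                 d[i][j - 1],      # deletion
                                 d[i - 1][j - 1])  # match
    return d[n][m]

template = [0, 1, 2, 1, 0]
stretched = [0, 0, 1, 1, 2, 1, 0]   # same shape, performed more slowly
other     = [2, 2, 2, 2, 2]
print(dtw(template, stretched))  # → 0.0 (warping absorbs the speed change)
print(dtw(template, other) > dtw(template, stretched))  # → True
```

The zero distance for the slowed-down signal is exactly why DTW suits gesture matching: the same movement performed at different speeds still matches its template.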
In PALMES-OBJREL [29], the KeyExtract algorithm mines, from the web, the keys that best identify activities. For each activity, the set of most important keys is mined. In the recognition phase, an unsupervised segmentation based on heuristics is performed.

Related Work
The literature contains several surveys attempting to classify works in the field of smart spaces and ambient intelligence. Papers are presented here in chronological order. None of the reported surveys clearly states the modality by which papers have been selected.
Authors in [7] follow an approach similar to this work, i.e., they separately analyze the different phases of the life-cycle of the models. Differently from our work, for what concerns the model construction phase, they focus on classes of learning algorithms instead of analyzing the specific works. Additionally, specification-based methods are not taken into account.
The survey [49] focuses on logical formalisms for representing ambient intelligence contexts and on reasoning about them. The analyzed approaches are solely specification-based. Differently from our work, that survey concentrates on the reasoning aspect.
The work in [50] is an extensive analysis of methods employed in ambient intelligence. It separately analyzes the different methods without clearly defining a taxonomy.
Authors in [1] introduce a clear taxonomy of approaches in the field of context recognition (and, more generally, situation identification). The survey embraces the vast majority of the approaches proposed in the area. Similarly, paper [51] is a complete work covering not only activity recognition but also fine-grained action recognition. Differently from our work, neither survey focuses on the life-cycle of models.
Authors in [52] focus on reviewing the possible applications of ambient intelligence in the specific case of health and elderly care. The work is orthogonal to the present paper and to all the other reported works, as it is less focused on the pros and cons of each approach, concentrating instead on applications and future perspectives.
A manifesto of the applications and principles behind smart spaces and ambient intelligence is presented in [53].
As in [52], authors in [54] start from the health care application scenario in order to describe possible applications. However, this work goes into more detail on the employed techniques, with a particular focus on classical machine learning methods.

Discussion
In this paper, we applied the SLR approach to classify the most prominent approaches in the ambient intelligence area according to how they address the different stages of a model's life cycle. The classification introduced in the paper allows choosing a specific method or approach according to the available sensors, the amount of labeled data, the need for visual analysis, and the requirements in terms of enactment and decision-making on the environment.
The field of ambient intelligence has been very active in the last 10 years, and the first products (such as NEST) have hit the market. Unfortunately, as of 2018, no examples of holistic approaches, such as those proposed in the vast majority of academic works, are available outside research labs. The reasons for this are multiple.
First of all, the vast majority of approaches require a big effort in terms of either (i) expert time to produce hand-made models, in specification-based approaches, or (ii) user time to manually label training sets, in supervised learning-based approaches. Unsupervised methods represent a small percentage of the total amount of academic works.
Secondly, model update is often not taken into account by the proposed approaches. Human habits, ways of performing activities and preferences generally change over time. In the vast majority of cases, the proposed solutions do not provide an explicit method to update models, thus requiring modeling to start from scratch, with the drawbacks identified in the first point.
As a third point, support for multiple users performing actions separately, and for multiple users collaborating in a single activity, is often neglected by the proposed approaches. The lack of support for multiple users poses serious limitations to the applicability of available solutions. This problem is even harder to solve when human guests must be taken into account.
As a final point, the validity of the declared results is often difficult to confirm on datasets different from the ones used for tests, as in the vast majority of cases the source code is not made available. The lack of benchmark data, even though many datasets are made freely available by research groups, makes this situation even harder.

Figure 1. The ambient intelligence closed loop. Arrows denote information flow. Dashed lines denote optional information flows.

Figure 5 contains three different examples of ontologies employed to model smart spaces. Clearly, each approach adopts the formalism best suited to its application, and many aspects can be modeled at different granularities.

Figure 7. Ontologies used in CHEN-ONT [18] to model the aspects of smart spaces.

Figure 8. Hierarchical ontology structure adopted in NUG-EVFUS [22] to model activities in a smart space.
It evidences how the activities are composed of the time-correlated states between consecutive actions.
Figure 4. Model formalism used in [38] based on R-pTL formulas. The model is based on correlations between events.