Individual Behavior Modeling with Sensors Using Process Mining

Understanding human behavior can assist in the adoption of satisfactory health interventions and improved care. One of the main problems lies in defining human behaviors, as human activities depend on multiple variables and are dynamic in nature. Although smart homes have advanced in recent years and contributed to unobtrusive tracking of human behavior, artificial intelligence has not yet coped with the variability and dynamism of these behaviors. Process mining is an emerging discipline capable of adapting to highly variable data and extracting knowledge to define behavior patterns. In this study, we analyze data from 25 in-house residents, acquired with indoor location sensors, by means of process mining clustering techniques, which yield workflows of human behavior inside the house. Data are clustered by adjusting two variables: the similarity index and the Euclidean distance between workflows. Thereafter, two main models are created: (1) a workflow view to analyze the characteristics of the discovered clusters and the information they reveal about human behavior, and (2) a calendar view, in which common behaviors are rendered as a calendar, allowing relevant patterns to be detected depending on the day of the week and the season of the year. Three representative patients who exhibit three different behaviors according to the proposed approach (stable, unstable, and complex) are investigated. This approach provides details of human behavior in the form of a workflow model, discovering user paths, frequent transitions between rooms, and the time the user spent in each room. In addition, presenting the results in the calendar view increases the readability and visual appeal of human behaviors, allowing us to detect patterns occurring on special days.


Introduction
The advent of new communication networks and the Internet of Things (IoT) has fostered a new paradigm in which devices are embedded in the environment around us, providing researchers and analysts with large amounts of information [1]. Human actions are not usually detached from this environment, as multiple devices send data that reflect human behavior [2]. Initiatives such as Obama's 4P [3] (personalized, predictive, preventive, and participatory) are pioneers in […] a regular grammatical complexity [37]. The objective is to determine the extent to which methods can detect and describe a wide variety of human behaviors. To this end, data from an in-house real-time location sensor system are used, and new data visualization interfaces are created.

Table 1. A summary of the advantages and limitations of previous studies.

| Study | Advantages | Limitations |
| --- | --- | --- |
| [30] | A graphical insight into human activity on a daily basis | Only the most frequent activity sequences are examined |
| [31] | The relationship between workload and service time is investigated with regression analysis | The study needs to be more realistic, adequately modeling resources based on empirical data; simulation models, the method used in the study, are often based on incorrect assumptions |
| [20] | An overview of gender behaviors in different months with respect to similar followed paths | Data quality issues in the preprocessing stage; only the most frequently followed paths are examined |
| [23] | The algorithm allows for the inference of parallel activities and sequences | The study is limited by the number of cases available for observation; it needs to investigate data with more information about the user's daily actions |
| [32] | Supports the redesign and personalization of decision support systems | The study needs detailed navigation behavior of different target groups |

The rest of the paper is structured as follows. The materials and methods are described in Section 2, including the electronic setup of the indoor location system, the process mining techniques, and the calendar views. Section 3 presents the results of the applied methods for the three selected patients, showing stable, complex, and unstable behaviors. Section 4 discusses the results and their limitations. Section 5 presents the concluding remarks of our contribution.

Materials and Methods
The proposed solution includes three main components: (1) Indoor Location Systems, to track human activity in a house by means of sensors; (2) process mining tools and techniques, to discover and classify human behaviors; and (3) calendar views, to display human behavior patterns in the format of a schedule so that patterns can be identified. Figure 1 depicts the methodology developed for this study, and the following subsections describe the entire process in detail.

Indoor Location Systems
Indoor Location Systems (ILS) are defined as the technological infrastructures (hardware and software) that allow assets and persons to be tracked across indoor settings. The setup and type of technology used to track elements depend on the type of application, the elements to track, and the deployment characteristics. There is a wide range of technologies, such as RFID [38], Bluetooth [39], and ZigBee [40], among others. Such systems have previously been used to support human activity recognition in smart homes [41], AAL solutions [42], and the design of activity primitives [43]. In this study, an ILS based on Passive Infrared (PIR) sensors is used [44]. A PIR sensor is an electronic sensor that measures the infrared light radiated by objects in its field of view and is widely used in motion detectors. Figure 2 presents an example of a PIR system. Each PIR sensor requires a 5 V, 65 mA power supply and provides a digital output (0-3.3 V). The sensor is equipped with a Fresnel lens that extends the sensing range to 120 degrees and 7 m under normal temperature conditions. The sensor sensitivity and the output pulse width can be adjusted through two potentiometers. These sensors are cheap, reliable, unobtrusive, and easy to install. Sensors are installed and tagged in every room of each elderly participant's home, so that the location provided by a sensor corresponds to a single room and a single user. Houses with more than two users were not included in the study. The PIR sensors are connected to a central server, which stores positive events (motion detections) in a database together with the current timestamp and the room ID.
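The server-side storage step described above can be sketched as follows. This is a minimal illustration assuming a SQLite database; the table name `motion_events`, its columns, and the helpers `open_store` and `on_sensor_output` are hypothetical and not part of the original system. Only positive (high) sensor outputs are persisted, each with a timestamp, a house ID, and a room ID.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional

# Hypothetical schema: one row per positive PIR event (motion detected).
SCHEMA = """
CREATE TABLE IF NOT EXISTS motion_events (
    ts TEXT NOT NULL,        -- ISO 8601 timestamp of the detection
    house_id TEXT NOT NULL,  -- house identifier
    room_id TEXT NOT NULL    -- tag of the room whose sensor fired
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the event database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def on_sensor_output(conn: sqlite3.Connection, house_id: str, room_id: str,
                     output_high: bool, now: Optional[datetime] = None) -> bool:
    """Persist an event only when the PIR digital output is high (motion)."""
    if not output_high:
        return False  # negative readings are not stored
    ts = (now if now is not None else datetime.now(timezone.utc)).isoformat()
    conn.execute(
        "INSERT INTO motion_events (ts, house_id, room_id) VALUES (?, ?, ?)",
        (ts, house_id, room_id),
    )
    conn.commit()
    return True
```

Storing only positive events keeps the log compact; periods without detections are then implicitly attributed to the last room where motion was seen.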

Process Mining and Clustering
Process mining techniques are used to provide comprehensive models and information about human behaviors using data from the system. The data extracted from these devices are used to build an event log which contains the relevant information (activities completed, timestamps, case IDs and the resources that performed those activities), to be used in the available tools such as ProM [45], PALIA Suite [46] and DISCO [47]. These tools provide an output of different models that allow the detailed visualizations of the processes.
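One way to turn the stored detections into an event log can be sketched as follows. The sketch makes two assumptions that the text does not specify: each case is one house on one calendar day, and consecutive detections in the same room are merged into a single room visit that lasts until the next room is entered. `Visit` and `build_event_log` are illustrative names.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import groupby
from typing import Iterable, List, Tuple

# (timestamp, house_id, room) as stored by the central server
RawEvent = Tuple[datetime, str, str]


@dataclass
class Visit:
    case_id: str     # assumed case definition: house ID + calendar day
    room: str        # the activity is the room where motion was detected
    start: datetime
    end: datetime


def _house_day(e: RawEvent):
    return (e[1], e[0].date())


def build_event_log(events: Iterable[RawEvent]) -> List[Visit]:
    """Turn raw motion detections into room visits grouped into daily cases."""
    ordered = sorted(events, key=lambda e: (e[1], e[0]))  # by house, then time
    log: List[Visit] = []
    for (house, day), day_events in groupby(ordered, key=_house_day):
        visits: List[Visit] = []
        # Consecutive detections in the same room collapse into one visit.
        for room, run in groupby(day_events, key=lambda e: e[2]):
            run = list(run)
            visits.append(Visit(f"{house}-{day.isoformat()}", room, run[0][0], run[-1][0]))
        # A visit lasts until the next visit (in another room) begins.
        for current, nxt in zip(visits, visits[1:]):
            current.end = nxt.start
        log.extend(visits)
    return log
```

A day-level case matches the calendar view, where each day is assigned to one behavior group; other case definitions (for example, splitting at night) would work equally well with tools such as ProM, PALIA Suite, or DISCO.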
A process mining discovery algorithm is applied to identify the trajectories and the process followed in the indoor scenario [27]. The PALIA Suite implements a quality threshold clustering technique [46], which groups human behaviors according to a predefined similarity ratio, using a workflow distance based on an error-correcting method [48]. The quality threshold algorithm was chosen for two reasons. First, many clustering techniques require the number of clusters as an input [49], which is difficult to determine in human behavior modeling; quality threshold clustering does not need a predefined number of clusters. Second, it considers both the sequence of the nodes and the time spent in each node.
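The PALIA implementation is not reproduced here. As a rough illustration of how quality threshold (QT) clustering works, the sketch below accepts any distance function. `max_diameter` plays the role of the quality threshold (in this study, the group factor applied to a workflow distance), and `min_size` is an assumed cut-off below which leftover items are reported as outliers. The function name and parameters are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def qt_cluster(items: Sequence[T], distance: Callable[[T, T], float],
               max_diameter: float, min_size: int = 2) -> Tuple[List[List[int]], List[int]]:
    """Quality threshold clustering; returns (clusters, outliers) as index lists.

    No number of clusters is given: every cluster's diameter (largest pairwise
    distance) must stay within max_diameter. Items left when no candidate
    cluster reaches min_size are returned as outliers.
    """
    remaining = list(range(len(items)))
    clusters: List[List[int]] = []
    while remaining:
        best: List[int] = []
        # Build one candidate cluster around every remaining item.
        for seed in remaining:
            candidate = [seed]
            pool = [j for j in remaining if j != seed]
            while pool:
                # Greedily add the item that keeps the diameter smallest.
                d, j = min((max(distance(items[j], items[k]) for k in candidate), j)
                           for j in pool)
                if d > max_diameter:
                    break
                candidate.append(j)
                pool.remove(j)
            if len(candidate) > len(best):
                best = candidate
        if len(best) < min_size:
            break  # the rest cannot form a valid cluster: treat as outliers
        clusters.append(sorted(best))  # keep the largest candidate
        remaining = [i for i in remaining if i not in best]
    return clusters, remaining
```

For example, `qt_cluster([0.0, 0.1, 0.2, 5.0, 5.1, 9.0], lambda a, b: abs(a - b), 0.5)` yields the clusters `[[0, 1, 2], [3, 4]]` and the outlier `[5]`. Raising the threshold merges items into fewer, larger clusters, which mirrors the effect of increasing the group factor.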

Calendar Views
Calendar views are developed to visualize sensor data processed with process mining. PALIA Suite, a process mining tool specifically designed to work with indoor location system data in the health care domain, is used to create the calendar views [28]. The main features of the PALIA Suite are process model discovery, process conformance analysis, and enhancement of the view of the whole process by applying process mining algorithms to localization events. In addition, our purpose is to generate several calendar views that show the evolution and heterogeneity of patient behaviors in a human-understandable way.
We propose generating two different views: (1) A workflow model view of human behaviors, generated from sensor data (Process discovery and process clustering in Figure 1) obtained with the setup described in Section 2.1. The workflow model describes the movements of a patient between the different locations in the house (e.g., from the kitchen to the bedroom) and identifies where the patient usually spends time. (2) A calendar view based on the different workflows obtained for different patient behaviors. Each day of the calendar view is assigned a specific color according to the workflow model behavior of that specific patient on that day (Calendar views in Figure 1).
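The calendar view maps each day to the behavior group discovered for it. The following text-only sketch uses the standard `calendar` module and prints a group label (`G0`, `G1`, ...) where the actual view would draw a color. The function name `render_month` and the `.` marker for days without a group (for example, outliers or missing data) are illustrative assumptions.

```python
import calendar
from datetime import date
from typing import Dict, List


def render_month(year: int, month: int, day_group: Dict[date, int]) -> List[str]:
    """Text calendar in which each cell shows the behavior group of that day.

    '.' marks a day of the month with no assigned group; cells belonging to
    the neighbouring months are left blank. Labels assume groups 0-9.
    """
    rows = ["Mo Tu We Th Fr Sa Su"]
    for week in calendar.Calendar(firstweekday=0).monthdatescalendar(year, month):
        cells = []
        for d in week:
            if d.month != month:
                cells.append("  ")          # padding day from another month
            elif d in day_group:
                cells.append(f"G{day_group[d]}")
            else:
                cells.append(" .")
        rows.append(" ".join(cells))
    return rows
```

Laying days out by week makes weekday and weekend regularities (as described for Patient 18) visible at a glance, which is the main purpose of the calendar view.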

Patient Data Information
A total of 25 participants who signed the informed consent were tracked using the ILS for periods ranging from 68 to 332 days. Table 2 provides the average distance walked by each user inside the house and the total number of registered days. Sensor data were collected as an event log, in which the system sampled the location of the user by means of motion detection, assigning to each event a universal timestamp, the house identification (ID in Table 2), and a room label.

Figure 3 presents the number of clusters (y-axis) as a function of the grouping factor index (x-axis). Since the number of clusters is not known a priori, a quality threshold clustering method is applied to group patient trajectories into a data-driven number of clusters; however, this technique requires a predefined quality threshold. Therefore, different thresholds were tested for assigning behavior patterns to the same cluster. For example, for a group factor of 0.05, the technique groups behaviors in the same cluster if they are similar at a ratio of at least 5%. As the group factor increases, the number of clusters normally decreases and the average duration of each behavior normally increases. Conversely, if this relation is inverted, some unusual behaviors are present, which indicates complexity in the individual's behavior. Likewise, a higher number of clusters at the same grouping factor indicates a user with complex behavior.
On the one hand, if clustering yields three clusters, this indicates low behavioral complexity. On the other hand, behavior models spread across 30 clusters indicate higher complexity, because they include many different behaviors that cannot be grouped into fewer clusters. The average number of days per cluster also gives a clue about complex behavior: if a patient's workflows fall into more clusters with fewer days on average, the behavior is complex. Beyond the number of clusters found, it is also critical to understand how many elements are included in the clusters and how many are excluded (outliers). In Figure 3, the size of each point indicates the average number of items per cluster for each grouping factor index. Clusters computed with a group factor below 8% have fewer items than clusters with a group factor over 20%. Moreover, the fill color shows the number of elements not included in any cluster (outliers), which provides further meaningful information; for instance, the number of outliers decreases as the group factor increases.
These results show that grouping factors below 16% and above 20% are not well suited for inferring patient trajectories. First, many outliers are encountered when the group factor is below 16%. Second, when the group factor is above 20%, there are few outliers, but the clusters contain complex workflows that are hard to understand. As a compromise, the study selected a group factor of 20% and focused on three patients who featured three paradigmatic behaviors (Patient 18, Patient 10, and Patient 20) to illustrate meaningful results of the proposed methodology. Figure 4 shows the clustering results for the selected patients in detail. As expected, the number of clusters decreases as the group factor index increases, and the average number of days per cluster increases accordingly. For example, suppose a patient has a 30-day trajectory. If it is divided into three clusters, the average is ten days per cluster; if it is divided into ten clusters, the average is three days. Hence, as the number of clusters decreases, the average number of days per cluster increases. The selected patients show the expected results, with one exception: while the number of clusters does not increase for Patient 18 and Patient 20, for Patient 10 it rises from 11 to 15 when the group factor reaches 0.13. This indicates complicated behavior, which is discussed later.
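The quantities compared across grouping factors (number of clusters, outliers, and average days per cluster) can be computed from one clustering run as sketched below. The function `cluster_summary` is hypothetical, and so is its convention of excluding outlier days from the average. The averages reported for the patients may count the outlier group differently.

```python
from typing import Dict, Optional, Sequence


def cluster_summary(labels: Sequence[Optional[int]]) -> Dict[str, float]:
    """Summarize one clustering run over a patient's days.

    labels[i] is the cluster assigned to day i, or None if the day is an outlier.
    """
    clustered = [lab for lab in labels if lab is not None]
    n_clusters = len(set(clustered))
    return {
        "clusters": n_clusters,
        "outliers": len(labels) - len(clustered),
        # Assumed convention: outlier days do not count towards the average.
        "avg_days_per_cluster": len(clustered) / n_clusters if n_clusters else 0.0,
    }
```

With the 30-day example above, three clusters of ten days give an average of 10.0, and ten clusters give 3.0. Running this summary for each tested threshold produces the series plotted against the grouping factor.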
Moreover, as the number of clusters decreases, the distances between clusters are expected to increase, because the clusters should be more separable. For Patient 18 and Patient 10, the mean distances between clusters increase, whereas they vary irregularly for Patient 20.
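The mean distance between clusters can be illustrated with a simplified stand-in. The paper measures distances between workflows. This sketch assumes instead that each day is summarized as a numeric feature vector (for example, hours spent per room), represents each cluster by its centroid, and averages the Euclidean distances over all pairs of centroids. Both helper names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of a cluster's feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def mean_intercluster_distance(clusters: Dict[int, Sequence[Sequence[float]]]) -> float:
    """Average Euclidean distance between all pairs of cluster centroids."""
    cents = [centroid(v) for v in clusters.values()]
    pairs = list(combinations(cents, 2))
    if not pairs:
        return 0.0  # a single cluster has no inter-cluster distance
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)
```

Tracking this value across grouping factors gives the trend described above: it should grow as clusters merge and become more separable, and an irregular curve (as for Patient 20) signals less consistent behavior.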
Data were analyzed for each of these three patients, and the workflow view models were compared with each other according to their similarity. The comparisons yield several clusters representing the days with similar behavior during the year (as shown in the calendar views).

Patient Individual Behavior Models
In the workflow views created by process mining, nodes and arrows are colored from green to red. Red nodes denote the rooms with the longest visit durations, and red arrows represent the most frequent transitions between two nodes (rooms in the house). Figures 5-7 include a legend explaining the meaning of the colors.
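The two quantities encoded by these colors can be sketched as follows: total time per room (node weight) and room-to-room transition counts (arc weight). The input format is an assumption, namely one ordered list of (room, minutes) visits per day. A linear green-to-red scale is likewise an assumption, because the exact color mapping used by PALIA Suite is not specified here.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def workflow_stats(days: List[List[Tuple[str, float]]]) -> Tuple[Dict[str, float], Counter]:
    """days: one ordered list of (room, minutes) visits per day.

    Returns total minutes per room (node weight) and the number of
    transitions for each (from_room, to_room) pair (arc weight).
    """
    durations: Dict[str, float] = defaultdict(float)
    transitions: Counter = Counter()
    for visits in days:
        for room, minutes in visits:
            durations[room] += minutes
        for (a, _), (b, _) in zip(visits, visits[1:]):
            transitions[(a, b)] += 1
    return dict(durations), transitions


def green_to_red(value: float, lo: float, hi: float) -> str:
    """Map value in [lo, hi] to a hex color: green (low) to red (high)."""
    t = 0.0 if hi == lo else (value - lo) / (hi - lo)
    t = min(max(t, 0.0), 1.0)
    return f"#{round(255 * t):02x}{round(255 * (1 - t)):02x}00"
```

Coloring a node with `green_to_red(durations[room], min(durations.values()), max(durations.values()))` makes the longest-visited rooms red, and the same scale applied to the transition counts highlights the most frequent arrows.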

Patient 18
The patient's paths are clustered into 5 (4 + 1 outlier) groups at a 20% group factor. Patient 18 is classified as stable because the average distances between clusters are lower and similar for each group factor; a smaller distance means a higher similarity. Furthermore, the patient's behaviors are nearly the same on weekdays and weekends, and the colors in the calendar view are more stable. Figure 5 shows the patient's flows. In almost all paths, transitions between the Bedroom and the Living room are more frequent than any others. Also, the time spent in the Bedroom is normally higher than in other rooms due to the duration of sleep.
In Group 0, displayed in red, the patient stays mostly in the Bedroom and Living room. In Group 1, shown in blue, time in the Living room decreases and the patient goes for a walk, since the Hall (entry door) is the last location where the patient is detected. In Group 2 (yellow) and Group 3 (green), the patient does not leave the house and stays more at home. Figure 8 shows the individual behavioral groups corresponding to the groups in the workflow view of Figure 5. During weekdays, Patient 18 behaves similarly, as all weekdays fall into the same cluster, Group 0. On weekends, by contrast, the patient's behaviors differ; for example, yellow (Group 2) and green (Group 3) days indicate that the patient stayed more at home. In summary, Patient 18 has two regular behaviors: one on weekdays and one on weekends.

Patient 10
The patient's paths are clustered into 10 (9 + 1 outlier) groups at a 20% group factor, with an average of 20.5 days per group. The average distances between clusters fluctuate, and the patient's behavior differs considerably from day to day. Thus, Patient 10 has complex behavior. Figure 6 shows the discovered patient paths; each flow represents a different behavior type, corresponding to a different kind of day.
First of all, the number of clustered groups reflects the complexity of the patient's behavior. While Patient 18's stable behavior is supported by only four clustered groups, Patient 10's complicated behavior is reflected in nine clustered groups. In some groups, such as Group 1, Group 6, and Group 7, the patient's path starts with only one room, and in other groups with more than one room. A typical behavior of Patient 10, present in almost all groups, is the relatively high number of transitions between the Kitchen and the Living room. On the other hand, the time spent in the same rooms changes a lot. For example, the Hall is among the rooms with the highest duration in Group 1 and Group 8, whereas the patient did not spend much time in the Hall in the other groups. In general, the Living room is the most preferred room to spend time in, and the Bathroom is the least used room for Patient 10. When the Living room is red, the Bedroom is green or yellow (except in Group 1), and vice versa. It can therefore be concluded that the patient probably slept on the sofa, which increased the time spent in the Living room. Figure 9 shows the calendar view representing the groups of the workflow view of Figure 6 for a group factor of 0.2. The border colors represent the groups discovered by clustering and shown in the workflow view. There are many more differences among groups, as the colors are less stable. In one month (July), the behavior is very different: although the behaviors represented by Group 0 (red) are dominant in July, the behaviors represented by Group 4 (turquoise border), Group 2 (yellow border), Group 1 (blue), and Group 3 (green border) can also be seen.

Patient 20
Eight (7 + 1 outlier) groups are created at a 20% group factor, with an average of 31.5 days per group. The patient's behavior changes from day to day. Moreover, the average distances between groups are higher than those of the other patients' groups. Therefore, Patient 20 has more complex and unstable behavior. Figure 7 presents the patient's flows; each flow shows a different behavior type, corresponding to a different kind of day. As with the other patients, the Living room is where the most time is spent in almost all groups. Nevertheless, the transitions from the Living room vary across groups. For instance, in the most dominant group, Group 0, the patient moved from the Living room to the Bathroom and vice versa. In Group 1, the second most dominant group, transitions between the Living room and the Kitchen are the most frequent. Interestingly, Patient 20 was not seen in the Kitchen in Group 5. Possibly, the patient left the house early and had breakfast outside. The patient had a long shower and watched TV in the Living room after coming home. Figure 10 shows the calendar view representing the workflow view. There are many more differences among groups, as the colors are less stable. Red days (Group 0) represent the most dominant behavior. Interestingly, blue days (Group 1) are more frequent in summer and December, and there are no blue days in January. The behavior represented by Group 5 is mainly encountered in January.

Discussion
In this paper, a novel methodology has been presented and evaluated for extracting and understanding human behaviors based on ILS and advanced process mining techniques. A total of 25 patients were analyzed, with 68 to 332 days of data each. The selected patients, Patient 18, Patient 10, and Patient 20, have 285, 205, and 283 days of data, respectively. First, the quality threshold clustering method is applied to increase the readability of the process mining results. Different threshold values are tried to decide the best group factor. The number of clusters, the average distances between groups, and the number of elements included in the groups are considered to determine the group factor. The method for choosing a better group factor is explained in Figure 3. Then, three representative patients are selected to demonstrate stable, complex, and unstable behaviors.
Two view models are created for each patient's behavior. A process mining discovery algorithm creates the workflow view model, and the clustered groups are depicted as workflows to facilitate comparisons. The calendar view is created to reveal actual behavioral changes on a daily basis. Patient 18 has stable behavior because the average distances between clusters are lower. Moreover, the patient's behaviors are nearly the same on weekdays and weekends. The patient generally stays mostly in the Bedroom and Living room. Patient 18 has two main behaviors, one on weekdays and one on weekends. On weekdays, the patient spent almost all the time in the Living room and Bedroom, whereas on weekends the patient generally either goes out or stays at home. For example, Patient 18 went out every weekend in February.
Patient 10 has complex behavior. The average distances between clusters fluctuate, and the patient's behavior differs considerably from day to day. In the workflow view, transitions between the Kitchen and the Living room are typical for Patient 10; however, the time spent in the rooms varies a lot. In the calendar view, the behavior in August is very different: the patient shows eight different behaviors, indicated by different colors, over 22 days in August. Patient 20 has more unstable and complex behavior. The patient's behavior changes from day to day, and the average distances between groups are higher than those of the other patients' groups. In the workflow view, interestingly, Patient 20 was not seen in the Kitchen in Group 5. It is probable that the patient left the house early and had breakfast outside. The patient had a long shower and watched TV in the Living room after coming home. In the calendar view, the patient has similar behaviors in summer and December, and this behavior was never seen in January.
The main advantage of the proposed methodology is that it presents results that are readable and understandable by both experts and non-experts. Many previous studies have focused on human activity recognition, especially of patient behaviors. For example, Kim et al. [33] developed a hidden Markov model of simple human activities such as eating, sitting, standing, and turning left or right; the results are hard to interpret for anyone not versed in Markov models. Other researchers have used process mining to visualize human activities. Maarif [30] built a visual model of daily human activity patterns in a smart home environment using the Heuristic Miner and Fuzzy Miner in ProM. These algorithms largely ignore infrequent behavior in event logs. However, infrequent behavior can reveal critical information about the daily movements of patients [20,23]. Our methodology avoids ignoring infrequent behaviors by using the PALIA algorithm.
Although previous studies generated some graphical insight with process mining, such models are still complex and hard to understand. Here, process mining visualization is improved by using quality threshold clustering to create calendar views. Quality threshold clustering can discover several clustering patterns depending on two factors (similarity and distance), a valuable feature when the number of clusters is unknown. Since the number of clusters is unknown for human activities, this study demonstrates its suitability. Quality threshold clustering in process mining considers both the similarity of the followed paths and the time spent in each location when creating groups. It also ensures that the similarity within a cluster reaches at least a predefined quality level (the group factor in this study), which increases the reliability of the results. Further work may consider other clustering techniques that take other important factors into account. For instance, the Levenshtein distance can measure the difference between two sequences (i.e., the order in which the patient went from one room to another), and techniques such as Principal Component Analysis can reduce the dimensionality of the data and detect possibly correlated variables. Table 3 describes the advantages and limitations of the proposed solution with respect to the state of the art.

Table 3. Advantages and limitations of the proposed solution.

Advantages
- Readable and understandable results for both experts and non-experts
- Process mining applied as a novel solution for daily human behavior analysis
- Discovery of similar behaviors from human indoor paths by clustering analysis (workflow model)
- Visualization of human behaviors and activity patterns to understand behavioral changes (calendar view)
- Handling of infrequent behaviors, which are mostly ignored but may include critical details in healthcare

Limitations
- Difficulty in understanding human behaviors when people have mental health problems or syndromes
- Need for more experiments with the clustering models
- Need for deep data processing to remove errors and assess data quality

Developing IoT technology surrounds us with smart devices that can collect vast amounts of data [50,51]. Analyzing human actions and behaviors is one of the main areas in which the collected data can be used. Understanding daily patient behavior helps to strengthen adherence to treatments [52]. As interest in monitoring patient behaviors to manage illness has increased, smart devices were installed in the houses of the 25 patients, and the data collected by the sensors were analyzed to understand the patients' behaviors. Because many factors affect human behavior, creating general behavior models is complicated; for this reason, process mining techniques were chosen to discover individual behavior models. Understandable models that provide significant details of human behavior are created in the form of workflows. In addition to these human-readable views, actual behavioral changes over time are detected. Sensors seamlessly integrated into IoT devices enable fast, reliable, and smooth operation of the process, with automatic monitoring and configuration features. The devices are connected as part of a distributed system so that all related information is accessible for real-time processing [53].

The study has the following limitations: (1) Even though different behaviors can be observed when applying different clustering thresholds, the proposed methodology needs to be tested on subjects with a formal diagnosis of a mental health condition or syndrome to validate our approach. (2) The data log may need quality-assessment preprocessing to remove errors introduced by the ILS and unexpected events. (3) The data used in this study are limited to 25 residents; including larger datasets may be essential for process mining-based discovery and classification of human behavior. These limitations should be taken into account when considering the scalability of such systems and the generalization of the conclusions. The development of algorithms to correct ILS data and the use of other tracking technologies are therefore proposed as future work.
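As a pointer for the further work mentioned above, the Levenshtein distance between two room sequences can be sketched with the standard dynamic-programming recurrence. It counts the minimum number of room insertions, deletions, or substitutions needed to turn one daily path into another. The function name and the representation of a path as a list of room labels are illustrative assumptions.

```python
from typing import Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Minimum number of insertions, deletions, or substitutions of rooms
    needed to turn room sequence a into room sequence b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,             # delete x
                           cur[j - 1] + 1,          # insert y
                           prev[j - 1] + (x != y))) # substitute (free if equal)
        prev = cur
    return prev[-1]
```

For instance, the paths Bedroom, Kitchen, Living room and Bedroom, Living room differ by one deletion (distance 1). Unlike the workflow distance used in this study, this measure ignores the time spent in each room.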

Conclusions
The study fills gaps in the literature in two respects. First, it applies clustering analysis to find similar behaviors, presented in the workflow view. Quality threshold clustering is chosen because the number of clusters is not known a priori. Moreover, each clustered behavior has a similarity ratio, defined in our experiments as the group factor, and different group factors are tested to find clusters that better show the differences between patient behaviors. Second, it improves the understandability of the results through a calendar, called the calendar view: the discovered and clustered workflows are mapped onto the calendar to present the behavioral changes on each day. The study proposes a novel solution that moves away from traditional predictive algorithms, using process mining and quality threshold clustering analysis to discover similar behaviors from patients' indoor paths via the workflow model. Moreover, the calendar view improves the visual understanding of human behaviors and activity patterns.