The advent of new communication networks and the Internet of Things (IoT) has fostered a new paradigm in which devices are embedded in the environment around us, providing researchers and analysts with large amounts of information [1
]. Human actions are not usually detached from this environment, as multiple devices are sending data which reflects human behavior [2
]. Initiatives such as Obama’s 4P [3
], whose four Ps stand for personalized, predictive, preventive, and participatory, pioneer what he called a new model of patient-powered research
that promises to accelerate biomedical discoveries and provide clinicians with new tools, knowledge, and therapies to select which treatments will work best for which patients.
Understanding human behavior can support the selection of adequate treatments according to the characteristics of the patient [4
] and can even help detect illnesses at early stages, when behavioral changes are apparent symptoms of disease onset, such as erratic movements in dementia [5
], and strengthen treatment adherence [6
]. However, creating human behavior models is not trivial. On the one hand, human behavior is determined by biological, psychological, and sociocultural factors that depend on multiple variables, which complicates the creation of general behavior models [7
]. On the other hand, human behavior evolves over time in a non-homogeneous way [8
]. As a consequence, a model can become outdated after it is defined and thus invalid for further assessments and follow-up. This problem, known elsewhere as concept drift, has led authors to propose continuously updating the model [9
], which requires tools to infer individual behavioral models automatically.
The acceptance of new mobile personal technologies and wearable sensors has increased the quantity of data related to human behaviors [10
]. According to CISCO Visual Networking Index Prediction [12
], the number of connected things on the Internet will rise to 26.3 billion by 2020 [13
], which is an excellent opportunity for a new generation of applications and services at the population level [14
]. The large amount of available data poses a new challenge: creating models capable of extracting information, insights, and visualizations, and of supporting daily decisions to improve diagnostic accuracy [16
]. Accordingly, new algorithms and methods have appeared to analyze behavior through vision [17
] and human activity [18].
Pattern recognition and machine learning techniques make it possible to develop models that represent human behaviors [22
]. Although these techniques can create complex mathematical models that discover and classify the behavior of the user, these models are usually not understandable to humans [23
]. Understandable individualized models allow experts to analyze human behavior better and detect undesired behaviors [25
]. Process mining is a machine learning discipline that infers models from event logs and produces human-understandable models that capture significant details of human behavior, usually in the form of workflows [26
]. Workflows are a simple representation of processes that can support human behavior analysis not only by detecting behavioral changes but also by offering an understandable view of a person’s patterns and insights [27].
Several advances in human behavior recognition have been reported in the literature. The behavior recognition domain commonly covers topics such as behavior modeling, activity monitoring, data processing, and pattern recognition [29
]. Process mining is an emerging method in this area that provides extensive models and information about the executed processes using data from various sources. In several studies, process mining has been used to discover human activities by treating human paths as business processes. Maarif [30
] used process mining to present daily human activity patterns in a graphical representation. Nakatumba and van der Aalst [31
] investigated the impact of workload on service times. In a recent paper, process mining was applied to a data set of locations in a shopping mall to discover customer paths and classify customers by gender [20
]. It has also been applied to analyze the movements of nine people in operating rooms, using 25 weeks of data collected with RFID technology [23
]. Maruster et al. [32
] established a user behavior model for farmers’ behavioral patterns by linking insights with decision-making methods using process mining. Previous articles studied human activity recognition [33
] and visualization [20
]; however, none of them developed an understandable presentation at the individual level. Table 1
summarizes the advantages and limitations of the mentioned studies.
In this study, process mining techniques are applied to discover and classify human behavior patterns. The proposed approach is based on the grammatical inference pattern recognition framework [35
] and interprets workflows as timed parallel automata, as tested by Fernandez-Llatas et al. [36
]. Timed parallel automata are a formal framework for representing highly expressive workflows that retain regular grammatical complexity [37
]. The objective is to determine to what extent these methods can detect and describe a wide variety of human behaviors. To this end, data from an in-house real-time location sensor system are used, and new data visualization interfaces are created.
The rest of the paper is structured as follows. The materials and methods are described in Section 2
, including the electronic setup of the indoor location system, process mining techniques, and the calendar views. Section 3
presents the results of the applied methods for three selected patients who show stable, complex, and unstable behaviors. Section 4
discusses the results and their limitations. Section 5
presents the concluding remarks on our contribution.
In this paper, a novel methodology has been presented and evaluated for extracting and understanding human behaviors based on ILS and advanced process mining techniques. A total of 25 patients were analyzed, with between 68 and 332 days of data each. The selected patients, Patient 18, Patient 10, and Patient 20, have 285, 205, and 283 days of data, respectively. First, the quality threshold clustering method is applied to increase the readability of the process mining results. Different threshold values are tried to decide the best group factor. The number of clusters, the average distances between groups, and the number of elements included in the groups are considered to determine the group factor. The method for choosing a suitable group factor is illustrated in Figure 3
. Then, three representative patients are selected to demonstrate stable, complex, and unstable behaviors.
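The threshold search described above can be illustrated with a minimal sketch of quality threshold clustering. It is not the implementation used in this study: the daily profiles (hours per room), the Euclidean distance, and the names `qt_cluster` and `days` are assumptions made only for illustration, whereas the group factor in this work also accounts for the paths followed.

```python
import math

def qt_cluster(points, threshold, distance=math.dist):
    """Quality threshold clustering: no two members of a cluster are farther
    apart than `threshold`. Returns clusters as lists of indices into `points`."""
    remaining = list(range(len(points)))
    clusters = []
    while remaining:
        best = []
        for seed in remaining:
            candidate = [seed]
            others = [i for i in remaining if i != seed]
            while others:
                # Distance from each outside point to its farthest cluster member;
                # adding the point with the smallest value grows the diameter least.
                reach = {j: max(distance(points[j], points[k]) for k in candidate)
                         for j in others}
                nearest = min(others, key=reach.get)
                if reach[nearest] > threshold:
                    break
                candidate.append(nearest)
                others.remove(nearest)
            # Keep the largest candidate cluster found for any seed.
            if len(candidate) > len(best):
                best = candidate
        clusters.append(best)
        remaining = [i for i in remaining if i not in best]
    return clusters

# Made-up daily profiles (assumption): hours in (Bedroom, Living room, Kitchen).
days = [(8, 10, 1), (8, 9, 2), (7.5, 10, 1.5), (2, 3, 0.5), (2, 2.5, 1)]

# Try several thresholds and inspect the number and size of the clusters.
for threshold in (0.5, 2.0, 20.0):
    clusters = qt_cluster(days, threshold)
    print(threshold, len(clusters), [len(c) for c in clusters])
```

A very small threshold leaves every day in its own group, while a very large one merges all days into a single group; intermediate values are the candidates among which a group factor would be chosen.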
Two view models are created for each patient’s behavior. A process mining discovery algorithm creates the workflow view model, in which the clustered groups are depicted as workflows to facilitate comparison. The calendar view is created to reveal actual behavioral changes on a daily basis.
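To give an idea of what a workflow view contains, the sketch below derives a simple directly-follows graph with mean dwell times from a location log. It is a deliberately simplified stand-in and not the discovery algorithm used in this study; the log layout `(day, room, start_min, end_min)` and the function name `directly_follows` are hypothetical.

```python
from collections import Counter, defaultdict

def directly_follows(events):
    """events: iterable of (day, room, start_min, end_min), in any order.
    Returns (transition counts between rooms, mean minutes spent per room)."""
    by_day = defaultdict(list)
    for day, room, start, end in events:
        by_day[day].append((start, end, room))

    transitions = Counter()
    dwell = defaultdict(list)
    for visits in by_day.values():
        visits.sort()  # chronological order within the day
        for start, end, room in visits:
            dwell[room].append(end - start)
        # Count each pair of consecutive rooms as one arc of the workflow.
        for (_, _, src), (_, _, dst) in zip(visits, visits[1:]):
            transitions[(src, dst)] += 1

    mean_dwell = {room: sum(d) / len(d) for room, d in dwell.items()}
    return transitions, mean_dwell

# Made-up log for two days (assumption, minutes since the start of the day).
log = [
    ("d1", "Bedroom", 0, 420), ("d1", "Kitchen", 420, 450),
    ("d1", "Living room", 450, 900),
    ("d2", "Bedroom", 0, 480), ("d2", "Kitchen", 480, 500),
    ("d2", "Living room", 500, 800), ("d2", "Kitchen", 800, 830),
]
transitions, mean_dwell = directly_follows(log)
print(transitions)
print(mean_dwell)
```

The arcs and their counts correspond to the edges of a workflow, and the mean dwell times to the time annotations on its nodes.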
Patient 18 has a stable behavior because the average distances between clusters are low. Moreover, the patient’s behaviors are similar on weekdays and weekends. The patient generally spends most of the time in the Bedroom and Living room. Patient 18 shows two main behaviors across weekdays and weekends. On weekdays, the patient spends almost all the time in the Living room and Bedroom. On weekends, by contrast, the patient generally either goes out or stays at home. For example, Patient 18 went out every weekend in February.
Patient 10 has complex behaviors. The average distances between clusters fluctuate. Furthermore, the patient’s behaviors differ considerably from day to day. In the workflow view, transitions between the Kitchen and Living room are typical behavior for Patient 10. However, the time spent in the rooms varies considerably. In the calendar view, the behavior in August stands out: the patient shows eight different behaviors, each represented by a different color, over 22 days in August.
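Observations of this kind can be read off a calendar view by counting, per month, the days with data and the distinct behavior groups. The following sketch assumes a hypothetical mapping from dates to group labels; the dates, labels, and the name `monthly_summary` are invented for illustration and are not data from this study.

```python
from collections import defaultdict
from datetime import date

def monthly_summary(day_groups):
    """day_groups: mapping of datetime.date -> behavior group label.
    Returns {(year, month): (days with data, number of distinct groups)}."""
    days = defaultdict(int)
    groups = defaultdict(set)
    for day, group in day_groups.items():
        key = (day.year, day.month)
        days[key] += 1
        groups[key].add(group)
    return {key: (days[key], len(groups[key])) for key in sorted(days)}

# Made-up calendar (assumption): one group label per recorded day.
calendar = {
    date(2018, 8, 1): "G1", date(2018, 8, 2): "G3", date(2018, 8, 3): "G1",
    date(2018, 9, 1): "G1", date(2018, 9, 2): "G1",
}
summary = monthly_summary(calendar)
print(summary)
```

A month with many distinct groups relative to its number of recorded days points to the kind of variability described for Patient 10.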
Patient 20 has more unstable and complex behaviors. The patient’s behaviors change from day to day. In addition, the average distances between groups are higher than those of the other patients’ groups. In the workflow view, it is interesting that Patient 20 was not seen in the Kitchen in Group 5. The patient probably left the house earlier and had breakfast away from home. The patient had a long shower and watched TV in the Living room after coming home. In the calendar view, the patient has similar behaviors in summer and December. Also, this behavior was never seen in January.
The main advantage of the proposed methodology is that it presents results that are readable and understandable not only by experts but also by non-experts. Many previous studies focused on human activity recognition, especially patient behaviors. For example, Kim et al. [33
] developed a hidden Markov chain to model simple human activities such as eating, sitting, standing, and turning left–right. The results are hard to interpret for someone who is not an expert in Markov models. On the other hand, some researchers have used process mining to visualize human activities. Maarif [30
] built a visual model of daily human activity patterns in a smart home environment by using the Heuristics Miner and Fuzzy Miner in ProM. These algorithms largely ignore infrequent behavior in event logs. However, infrequent events can reflect critical behaviors regarding the daily movements of patients [20
]. Our methodology avoids discarding infrequent behaviors by using the PALIA algorithm.
Although previous studies generated some graphical insight with process mining, such models are still complex and hard to understand. In this study, process mining visualization is improved by means of quality threshold clustering to create calendar views. Quality threshold clustering allows discovering several clustering patterns depending on two factors (similarity and distance), a valuable feature when the number of clusters is unknown. Since the number of clusters is unknown for human activities, this study demonstrates its suitability. Quality threshold clustering in process mining considers the similarity of the paths followed and the time spent in each location when creating groups. It also ensures that the similarity within a cluster reaches at least a predefined quality level, which is called the group factor in this study and increases the reliability of the results. Further work will probably consider other clustering techniques that take other important factors into account. For instance, the Levenshtein distance can measure the difference between two sequences (i.e., the order in which the patient went from one room to another), and other techniques such as principal component analysis can reduce the dimensionality of the data and detect possibly correlated variables.
describes the advantages and limitations of the proposed solution with respect to the state of the art.
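As an illustration of the Levenshtein distance mentioned above as a possible direction for further work, the sketch below compares two room sequences with the standard dynamic programming formulation. The sequences and the names `levenshtein`, `day_a`, and `day_b` are made up for the example.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn sequence `a` into sequence `b`."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

# Made-up room sequences for two days (assumption).
day_a = ["Bedroom", "Kitchen", "Living room", "Bedroom"]
day_b = ["Bedroom", "Living room", "Kitchen", "Living room", "Bedroom"]

room_distance = levenshtein(day_a, day_b)
print(room_distance)
```

Because the function works on any sequences, rooms can be compared as whole labels rather than as characters; such a distance only captures the order of visits, not the time spent in each room.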
The development of IoT technology surrounds us with smart devices that can collect a vast amount of data [50
]. In many cases, the analysis of human actions and behaviors is one of the significant areas in which the collected data can be used. Understanding daily patient behavior helps strengthen adherence to treatments [52
]. Since interest in monitoring patient behaviors to manage their illness has increased, wearable smart devices were deployed in the houses of the 25 patients. The data collected by the sensors are analyzed to understand the patients’ behaviors. The many factors affecting human behavior make it complicated to create general behavior models. Because of this drawback, process mining techniques are chosen to discover individual behavior models. Understandable human behavior models that provide significant details of human behavior are created in the form of workflows. In addition to this human-understandable view, real behavioral changes over time are detected. Sensors that are seamlessly integrated into IoT devices enable fast, reliable, and smooth operation of the process with automatic monitoring and configuration features. The connection of devices is managed as part of a distributed system to make all related information accessible in real-time processing [53].
The study has the following limitations: (1) Even though different behaviors can be observed when applying different clustering thresholds, the proposed methodology needs to be tested on subjects who have a formal diagnosis of a mental health condition or syndrome to validate our approach. (2) The data log may need quality assessment preprocessing to remove errors introduced by the ILS and unexpected events. (3) The data used in this study are limited to 25 residents. Including larger datasets may be an essential factor for process mining-based human behavior discovery and classification. These limitations should be taken into account when considering the scalability of such systems and the generalization of the conclusions. Thus, developing algorithms to correct ILS data and using other tracking technologies are proposed as future work.