Leveraging Semi-Markov Models to Identify Anomalies of Activities of Daily Living in Smart Homes Processes

Shaikh, Eman; McClean, Sally; Tariq, Zeeshan; Scotney, Bryan; Mohammad, Nazeeruddin

doi:10.3390/a19020150



by Eman Shaikh ¹, Sally McClean ¹,*, Zeeshan Tariq ¹, Bryan Scotney ¹ and Nazeeruddin Mohammad ²

¹ School of Computing, Ulster University, Belfast BT15 1ED, UK

² School of Computer Science, Adelaide University, Adelaide, SA 5005, Australia

* Author to whom correspondence should be addressed.

Algorithms 2026, 19(2), 150; https://doi.org/10.3390/a19020150

Submission received: 12 January 2026 / Revised: 6 February 2026 / Accepted: 10 February 2026 / Published: 12 February 2026

(This article belongs to the Special Issue Mathematical Modelling in Engineering and Human Behaviour (4th Edition))


Abstract

Stochastic Process Mining, in particular Markov processes, is used to represent uncertainty and variability in Activities of Daily Living (ADLs). However, Markov models inherently assume that the time spent in each state follows an exponential distribution, which makes it difficult to model real-life complexities in ADLs. Therefore, this paper employs semi-Markov models on publicly available ADL event logs to model state durations, with the fitted models validated via goodness-of-fit tests (Kullback–Leibler, Kolmogorov–Smirnov, Cramér–von Mises). Synthetic durations are generated using the inverse transform sampling technique. To simulate dementia-based behaviours, the weights of the mixture model are altered to reflect prolonged durations of napping, toileting, and meal and drink preparation. These anomalies are then detected using log-likelihood ratio and chi-square tests. Experimental results demonstrate that the proposed approach can reliably identify abnormal ADL durations, offering a framework for the early detection of behavioural shifts. By identifying such anomalies, our work aims to detect deterioration in the smart home resident’s condition, focusing in particular on their ability to execute different ADLs.

Keywords:

semi-Markov; Markov process; process mining; process duration; smart homes; mixture models; daily activity; human behaviour; anomalies

1. Introduction

In recent decades, advances in medicine and technology have played a pivotal role in promoting the steady growth of the elderly population [1]. A recent report from the World Health Organization estimates that the number of elderly individuals will grow from 1.4 to 2.1 billion by 2050 [2]. While this growth may appear positive, it is, however, linked with several challenges associated with ageing, such as cognitive and physical decline, decline in sensory functions, and a heightened risk of diseases. Among these, dementia is one of the most significant challenges that impacts the elderly population [3].

In general, dementia refers to a syndrome that can profoundly influence the cognitive skills, memory, and thinking abilities of individuals [4]. It is a progressive and permanent condition that can cause continuous and irreversible decline in cognitive function. Recent reports suggest that about 75% of individuals who suffer from dementia remain undiagnosed [5]. Because traditional clinical assessments often fail to detect these cases, the condition can go unnoticed until its symptoms become severe enough to ultimately affect the way individuals perform their Activities of Daily Living (ADLs).

Fundamentally, ADLs are self-care tasks, such as eating, showering, dressing, and sleeping, that are essential to maintain physical and mental well-being. As dementia progresses, performing such simple activities becomes progressively more difficult. Individuals may take longer naps [6], or take longer to prepare meals [7] or to perform toileting activities [8]. As a result, there is a heightened need for assistance from dedicated caretakers or family members, which often involves substantial healthcare costs, creating a pressing need for affordable and assistive technologies.

Smart homes use Internet of Things (IoT), mobile communication, and embedded systems, along with sensors such as smart plugs, environmental, motion, and contact sensors, to enable non-invasive and uninterrupted monitoring of ADLs [9,10,11]. The role of these sensors is to enhance the overall well-being and security of the individual. However, the true effectiveness of a smart home relies on transforming the collected sensor logs into valuable observations. This can be achieved through a technique called Process Mining (PM), which utilises sensor logs consisting of time-stamped activities, activity type, and resident ID to develop process models. These models, in turn, examine activity sequences, identify individual routines, extract meaningful insights, and reveal underlying behavioural patterns. However, traditional PM techniques face the limitation of assuming that the individual routines follow a fixed and deterministic pattern [12]. This poses a significant challenge because, in real-world settings, individuals perform daily activities in a flexible and inconsistent manner.

Stochastic Process Mining can be utilised to effectively incorporate the inherent uncertainty and variability in the daily activities performed by individuals. The purpose of this approach is to allow for the characterisation of probabilistic patterns and the representation of deviations in the sequences of activities, offering a more comprehensive and realistic way to recognise irregular and complex behaviour patterns. Markov models such as the hierarchical Markov model [13], hidden Markov model [14], and higher-order Markov [15] are commonly used in stochastic Process Mining to model human behaviour [16]. However, these models are typically developed under the assumption that the time spent on each activity strictly follows an exponential distribution. This assumption often fails to effectively represent the temporal characteristics observed in individual behaviour. Such a strict assumption can limit the ability of the model to characterise daily activities, which often possess variable durations [17]. For instance, monitoring drink and meal preparation activities is medically and physiologically crucial for individuals living with dementia. As functional and cognitive abilities progressively decline in these individuals, the time required to execute these activities increases. This functional decline may contribute to secondary consequences, such as dehydration and malnutrition, which can ultimately result in higher hospitalisation rates [18]. Furthermore, measuring the time taken to perform these activities helps provide an early indicator of functional deterioration. Previous studies, including work done by Grammatikopoulou et al. [19], have demonstrated that the duration of these activities can effectively help to distinguish between healthy individuals and those experiencing dementia. 
Therefore, effectively representing such patterns is crucial, as early dementia issues often emerge gradually, initially presenting as a subtle increase in the time taken to execute the ADLs rather than as abrupt shifts. Despite this, current research on the modelling of ADLs still faces several challenges. In particular:

1.: Current studies for analysing human activity focus primarily on sequence-based recognition of ADLs, often neglecting the explicit modelling of ADL durations. As ADLs evolve, incorporating the duration of such activities is crucial for a comprehensive understanding.
2.: Existing semi-Markov studies model ADL durations, but they do not consider duration-based anomalies that progress over time.
3.: Several approaches in activity recognition require the availability of a large amount of data. However, in the real world, especially in the smart home and healthcare domain, collecting such data is costly and not always practical.
4.: Present studies on anomaly detection fail to consider anomalies in ADL durations, which limits their efficiency for long-term behavioural assessment.

To address the above challenges, our paper uses the Weibull distribution to represent ADL duration distributions. The model leverages both the shape and scale parameters to capture the range of ADL durations, from short to long, observed in the data [20]. Building on this approach, we employed a Weibull mixture model to further enhance flexibility, enabling the model to characterise the heterogeneous patterns in the ADL durations [17]. In our case, we used the Weibull mixture model to characterise and examine deviations in the ADL durations of smart home residents. To achieve this, our work used a publicly available smart home dataset [21] as the basis for generating synthetic ADL duration data. The synthetic data provided a baseline distribution of normal activity durations and offered a systematic way to detect the gradual deterioration of the duration distribution. The novelty of our work lies in the simulation and detection of duration-based ADL anomalies. This is achieved by altering the weights of the fitted Weibull mixture model, which enables us to replicate variable shifts in the ADL durations rather than depending upon arbitrary anomaly patterns. Unlike contemporary approaches that often focus on observed anomalies, our work introduces a progressive and controlled rise in abnormal ADL durations. The progression of such deviations is tracked with a rolling monthly window that captures how the duration patterns of the smart home residents change over successive intervals. By identifying such anomalies, our work aims to detect deterioration in human behaviour, particularly by assessing the ability to execute the different ADLs.

In summary, the major contributions are as follows:

1.: A novel framework was developed that models ADL duration distributions using a Weibull mixture model to effectively represent the characteristics and heterogeneity of the different ADLs executed by the smart home residents.
2.: The inverse transform sampling technique was implemented to generate synthetic ADL durations, while abnormal ADL durations were simulated by altering the mixing proportion of the fitted Weibull mixture model to represent gradual behavioural changes.
3.: Log-likelihood ratio and chi-square tests were employed to effectively detect anomalies in the ADL durations.
4.: Sequential monthly comparisons of baseline versus gradual assessment were conducted, allowing detection of both minor temporal variations and significant deviations in ADL durations.

The remainder of this paper is organised as follows. Section 2 discusses the related work. Section 3 presents the proposed methodology. Section 4 demonstrates the results. Finally, Section 5 provides the conclusion and discussion.

2. Related Work

This section provides a brief review of the different techniques, Process Mining approaches, and Stochastic Process Mining methods applied in smart homes.

2.1. Traditional Approaches for Smart Homes

Over the past decade, the amount of sensor data has increased significantly, driven by the widespread use of affordable IoT devices and sensors. This surge in sensor data has greatly advanced research in the smart home domain. Currently, various studies have focused on enhancing different aspects of smart homes, including resident security [22], comfort [23], and energy efficiency [24].

Activity recognition has emerged as a major research focus. At present, several research works have used activity recognition in smart homes [25]. For instance, Paul and George [26] examined instance-based learning to perform human activity recognition using smartphone techniques. Specifically, their work assessed the effectiveness of K-Nearest Neighbours (KNNs) and an improved clustered KNN model to optimise online activity classification. In [27], the authors utilised binary tree Support Vector Machines and a sliding sensor window to improve the accuracy of activity recognition over traditional SVM models. Xu et al. [28] used raw three-axis accelerometer data to develop a Convolutional Neural Network (CNN) that recognised daily activities. Mahmoodzadeh [29] employed a Deep Belief Network model for visual activity recognition by utilising a combination of integrated image features such as HOG, SIFT, and GIST.

On the other hand, various works have investigated different ways to simulate data to enhance the performance of human activity recognition. For instance, Bennett et al. [30] developed MotionSynthesis, an open-source toolset that uses sensor data extracted from video recordings and body-worn sensors to simulate human movements over time. In [31], the authors used video recordings and an image transformation network to simulate synthetic radar data. Romero et al. [32] simulated human actions in visual scenarios by utilising artificial 3D human avatars that performed different actions. In [33], synthetic wearable sensor data was simulated by employing a Generative Adversarial Network and a diffusion model to obtain realistic and high-quality time-series data.

Conversely, some studies have focused on anomaly detection within smart home environments. For example, Alaghbari et al. [34] utilised an over-complete deep autoencoder model to perform anomaly detection in ADLs. For this purpose, the reconstruction error was computed from normal behaviour, which was then used to flag anomalous activities. In [35], a Decision Tree (DT) model was implemented to perform activity recognition, after which the identified activities were used to differentiate between normal and abnormal behaviours. The study suggested that a DT model alone was incapable of performing effective anomaly detection. Besides traditional approaches that focus on human anomalies, Sarwar et al. [36] aimed to enhance smart home security by employing Machine Learning (ML) classifiers (DT, Artificial Neural Network (ANN), Random Forest, and AdaBoost) to identify anomalies in IoT devices deployed within smart homes. Meanwhile, Kanev et al. [37] trained an ANN model on network and device characteristics to recognise both hardware and signal-related anomalies in IoT devices.

Overall, as summarised in Table 1, existing data simulation and anomaly detection approaches face complementary drawbacks. Prior works in data simulation generally rely on labelled datasets that do not fully capture the variability in real-world behaviour. This limitation can hinder the model’s generalisation, which can limit the ability of the model to accurately detect subtle deviations in individual behaviour. In addition, many existing works do not account for progressive behavioural changes and temporal dependencies, making it challenging to detect anomalies in activities that evolve over time (e.g., extended toileting, napping, or prolonged meal preparation). In contrast, our work employs a Weibull mixture model to characterise ADL durations, followed by an inverse probability integral transform approach to synthesise the ADL durations that preserve the statistical features of each ADL. The generated data are then used to simulate progressive anomalies over subsequent months, enabling the modelling of behavioural deviations in individuals affected by dementia.

2.2. Process Mining in Smart Homes

While a smart home has several general applications, specific techniques like PM have been increasingly utilised to improve the recognition of human behaviour through the application of various approaches [38]. Carolis et al. [39] implemented a first-order logic approach in PM to autonomously understand, model, and respond to the daily routines of smart home occupants. By doing so, their work ensured that the system could forecast user requirements, recognise deviations, and incrementally update the models accordingly. Maarif [40] employed PM, rather than pattern recognition techniques involving advanced mathematical formulations, to infer human behaviour. The experiments conducted in the work demonstrated that PM provides an effective data-driven approach, generating a visual representation of the activity sequences and enabling the recognition of behavioural patterns. Theodoropoulou et al. [41] employed and compared various PM algorithms, such as heuristic miner, fuzzy miner, alpha miner, and inductive miner, to analyse the behaviour of elderly smart home residents. The primary objective of their study was to evaluate the effectiveness of PM through a range of standard performance metrics. Dogan et al. [42] used indoor location sensors to study the daily behaviour of elderly occupants of smart homes. In particular, their work concentrated on monitoring resident movement, length of stay in each room, and the frequency of visits per room to uncover notable patterns of daily activities.

Recently, a new subfield of PM, known as Stochastic Process Mining, has begun to draw increasing attention [43]. Unlike traditional PM, Stochastic Process Mining aims at improving the representation of daily behaviours by capturing the underlying variation and unpredictability in the daily activities [12]. At present, several research works have employed Markov models and their variations, all of which are well-known stochastic models that can be used to examine human behaviour [16]. Kang et al. [13] employed a Hierarchical Hidden Markov Model (HHMM) with shared representations to characterise and identify sophisticated in-home activities. Their primary objective was to estimate behavioural states and detect abnormal behaviour. Asghari et al. [44] implemented an online HHMM method to identify activities from real-time smart home sensor data, where the proposed approach partitions incoming data, monitors real-time activities, and enhances the activity labels by leveraging statistical features. Wu et al. [14] introduced an improved HMM to predict individualised behaviours of disabled smart home residents, where the proposed work integrated temporal context and employed a temporal state transition matrix to identify deviations in daily activities. Kalra et al. [45] employed a two-stage supervised Markov model to recognise ADLs from sensor data. Their work aimed to capture temporal relationships of single activities, as well as the transitions between these activities. Flett and Kelly [15] implemented a higher-order Markov chain to create accurate resident profiles to enhance energy demand modelling. Through the incorporation of multiple prior states, general residence duration ranges, and the engagement between the residents, their study aimed to improve its predictive performance. While the study considered typical occupancy duration ranges, it does not specifically model the exact length of individual states. 
This represents a major limitation of conventional Markov models, which typically focus solely on sequences rather than exact durations of each state.

Semi-Markov models have been increasingly employed in recent studies [46], specifically to enhance activity recognition in smart homes [47]. McClean et al. [48] modelled toileting activity using a gamma mixture model, where the activity was treated as a combination of multiple sub-activities. Their work emphasised how neglecting a particular sub-activity can impact the overall distribution of the activity. The goal of their work was to enhance the lifestyle of smart home residents by providing the means for supporting autonomous system functioning. McClean et al. [49] used a gamma mixture model to model the daily living activity durations and simulated synthetic data to capture deviations in these activities. However, the work focused exclusively on frequent toilet visits and short sleeping durations. On the other hand, Yang et al. [50] focused solely on detecting irregularities in activities of long duration.

Table 2 presents a comparison of Process Mining and Stochastic Process Mining studies present in the literature. While these works have leveraged semi-Markov models to incorporate the duration of activities, they generally do not address variations across multiple activity durations. In contrast, our work focuses on simulating multiple ADLs. The proposed approach allows us to detect monthly deviations in such ADLs, which can aid in identifying patterns associated with dementia.

3. Proposed Methodology

The proposed work presents a duration-based pipeline for modelling the different ADLs, simulating anomalous ADL durations, and detecting them over time, as shown in Figure 1. The approach employs a Weibull mixture model under a semi-Markov assumption to model the individual ADL durations, effectively capturing the statistical properties of typical behaviour patterns. The learned ADL duration models are subsequently employed to generate synthetic ADL durations through inverse transform sampling, offering a controllable baseline for simulated experiments. The ADL duration anomalies are then synthetically introduced by altering the mixture proportions to model deviations from typical duration behaviours. Lastly, anomaly detection is carried out by comparing the ADL duration distributions across monthly windows using statistical hypothesis testing. This research design is suitable as it enables robust evaluation of anomaly detection while preserving realistic temporal patterns in the individual behaviour.

In the following subsections, we provide detailed descriptions of the individual components of the proposed method:

3.1. Dataset

This subsection describes the van Kasteren dataset, which serves as input to the proposed framework. The dataset records the ADLs of a 26-year-old individual living in a three-room apartment. Fourteen sensors were deployed throughout the apartment in carefully chosen locations, such as cupboards, doors, the refrigerator, and the toilet flush, to capture the different ADLs performed by the individual. These sensors collected the data autonomously for 28 days and also recorded the start and end times of each activity. Furthermore, the individual used a Bluetooth headset to annotate the recorded activities. Further details on the data collection process are given in [21].

3.2. Semi-Markov Modelling of ADL Duration

A semi-Markov model is an extension of the traditional Markov model that consists of the standard set of states $S$, the transition probability $p (a, b)$, and an additional component called the sojourn distribution $s (a, t)$, which represents the probability distribution of the duration $t$ that an individual spends in activity $a$ before moving to a different activity [50]. Nevertheless, in real-life situations, the underlying duration distribution of the activities can vary considerably depending on the circumstances [51]. Generally, the distribution of ADL duration follows a multi-modal pattern that reflects the underlying behaviour in the data. For example, the duration of breakfast preparation varies based on the specific needs of each individual. That is, some individuals may prepare a quick breakfast on a weekday, taking only about 5 to 10 minutes, whereas on weekends, they might spend more time to have a more substantial breakfast. This variability demonstrates that a unimodal distribution is inadequate to capture such heterogeneity and would therefore risk oversimplifying the underlying patterns in the data.
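To make the three ingredients of a semi-Markov model concrete, the following minimal Python sketch simulates a short activity sequence. The activity names, transition probabilities, and Weibull parameters are hypothetical illustrations and are not taken from the dataset; a single Weibull sojourn distribution per activity is assumed here for brevity.

```python
import random

# Hypothetical transition probabilities p(a, b); no self-transitions.
TRANSITIONS = {
    "sleep": {"toilet": 0.6, "breakfast": 0.4},
    "toilet": {"sleep": 0.5, "breakfast": 0.5},
    "breakfast": {"sleep": 0.3, "toilet": 0.7},
}

# Hypothetical sojourn distributions s(a, t): Weibull (scale, shape), in minutes.
SOJOURN = {"sleep": (480.0, 4.0), "toilet": (4.0, 1.5), "breakfast": (10.0, 2.0)}


def simulate(start, n_steps, seed=0):
    """Return a list of (activity, start_time, duration) tuples."""
    rng = random.Random(seed)
    state, clock, path = start, 0.0, []
    for _ in range(n_steps):
        scale, shape = SOJOURN[state]
        # random.weibullvariate takes the scale first, then the shape.
        duration = rng.weibullvariate(scale, shape)
        path.append((state, clock, duration))
        clock += duration
        # Choose the next activity according to p(state, b).
        successors = list(TRANSITIONS[state])
        probs = [TRANSITIONS[state][b] for b in successors]
        state = rng.choices(successors, weights=probs, k=1)[0]
    return path


path = simulate("sleep", n_steps=6)
```

In a plain Markov chain the sojourn time would be fixed by the transition structure; here it is drawn from a separate, freely chosen distribution, which is the property the rest of this section relies on.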

In contrast, mixture models are capable of capturing the variability in ADL durations and are well suited to modelling such complex patterns. To model the ADL durations, we used the Weibull mixture model because it can handle skewed data, that is, it can model both long and short ADL durations, and its shape parameter can capture a decreasing or increasing rate of activities across time [17,52]. Overall, these distributions provide a comprehensive way to capture the different patterns of activity durations. Generally, a Weibull mixture model is a combination of multiple Weibull distributions, where each Weibull component is weighted by its mixture proportion. Let $Y_{a}$ represent the observed durations for daily activity $a$. Then, for a given time $t$, where $t > 0$, the Probability Density Function (PDF) of the $j$-th Weibull component of the mixture can be formulated as [53]:

$f (t \mid Θ_{a, j}) = \frac{k_{a, j}}{λ_{a, j}} \left(\frac{t}{λ_{a, j}}\right)^{k_{a, j} - 1} e^{- (t / λ_{a, j})^{k_{a, j}}}$

(1)

where $Θ_{a, j}$ consists of $k_{a, j} > 0$, the shape parameter, and $λ_{a, j} > 0$, the scale parameter, of the $j$-th Weibull component for activity $a$. The PDF of the Weibull mixture is then the weighted sum of the $J$ component densities, with mixture weights $π_{a, j}$. To estimate the parameters $Θ_{a} = (k_{a, j}, λ_{a, j})$ of the Weibull mixture, we use the likelihood of the observed durations $Y_{a}$, as shown below [54]:

$L (Θ_{a, j} \mid Y_{a}) = \prod_{t \in Y_{a}} \left(\sum_{j = 1}^{J} π_{a, j} f (t \mid k_{a, j}, λ_{a, j})\right)$

(2)

Based on this, the corresponding log-likelihood can be given by

$\log [L (Θ_{a, j} \mid Y_{a})] = \sum_{t \in Y_{a}} \log \left(\sum_{j = 1}^{J} π_{a, j} f (t \mid k_{a, j}, λ_{a, j})\right)$

(3)
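As an illustration of Equations (1) to (3), the sketch below evaluates the component density, the mixture density, and the log-likelihood for one activity. The two-component parameters and the duration values are hypothetical and chosen only to make the example runnable.

```python
import math


def weibull_pdf(t, k, lam):
    """Single Weibull component density, Equation (1); requires t > 0."""
    z = t / lam
    return (k / lam) * z ** (k - 1) * math.exp(-(z ** k))


def mixture_pdf(t, weights, shapes, scales):
    """Weighted sum of component densities (the inner sum in Equation (2))."""
    return sum(w * weibull_pdf(t, k, lam)
               for w, k, lam in zip(weights, shapes, scales))


def log_likelihood(durations, weights, shapes, scales):
    """Log-likelihood of Equation (3) for the durations of one activity."""
    return sum(math.log(mixture_pdf(t, weights, shapes, scales))
               for t in durations)


# Hypothetical two-component mixture (durations in minutes).
weights = [0.7, 0.3]
shapes = [2.0, 3.0]
scales = [8.0, 30.0]
durations = [5.0, 7.5, 9.0, 28.0, 33.0]

ll = log_likelihood(durations, weights, shapes, scales)
```

Working with the sum of logarithms rather than the product in Equation (2) avoids numerical underflow when many durations are observed.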

However, Equation (3) cannot be maximised directly. As a result, we employed a common technique called the Expectation–Maximisation approach. In the E-step of iteration $r$, the probability that duration $t$, observed for activity $a_{t}$, belongs to the $j$-th component is computed as [55]

$γ_{t, j}^{(r)} = \frac{π_{a_{t}, j}^{(r)} f (t \mid k_{a_{t}, j}^{(r)}, λ_{a_{t}, j}^{(r)})}{\sum_{l = 1}^{J} π_{a_{t}, l}^{(r)} f (t \mid k_{a_{t}, l}^{(r)}, λ_{a_{t}, l}^{(r)})}$

(4)

The M-step updates the model parameters by maximising the expected complete log-likelihood computed in the E-step. The mixture weights $π_{a, j}$ are updated as [55]

$π_{a, j}^{(r + 1)} = \frac{1}{|Y_{a}|} \sum_{t \in Y_{a}} γ_{t, j}^{(r)}$

(5)

where $|Y_{a}|$ denotes the number of observed durations for activity $a$.
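A single E-step followed by the weight update of the M-step can be sketched as follows. The starting parameters and durations are hypothetical, and the update of the shape and scale parameters (which the paper refines with Nelder–Mead) is deliberately left out of this sketch.

```python
import math


def weibull_pdf(t, k, lam):
    """Weibull component density for t > 0."""
    z = t / lam
    return (k / lam) * z ** (k - 1) * math.exp(-(z ** k))


def e_step(durations, weights, shapes, scales):
    """Responsibilities gamma[i][j] of Equation (4), one row per duration."""
    gamma = []
    for t in durations:
        joint = [w * weibull_pdf(t, k, lam)
                 for w, k, lam in zip(weights, shapes, scales)]
        total = sum(joint)
        gamma.append([g / total for g in joint])
    return gamma


def update_weights(gamma):
    """Equation (5): average responsibility of each component."""
    n_obs = len(gamma)
    n_comp = len(gamma[0])
    return [sum(row[j] for row in gamma) / n_obs for j in range(n_comp)]


# Hypothetical starting values (durations in minutes).
durations = [5.0, 7.5, 9.0, 28.0, 33.0]
weights = [0.5, 0.5]
shapes = [2.0, 3.0]
scales = [8.0, 30.0]

gamma = e_step(durations, weights, shapes, scales)
new_weights = update_weights(gamma)
```

Because each row of responsibilities sums to one, the updated weights sum to one automatically, so no explicit renormalisation is needed after this step.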

The initial shape and scale parameters of the Weibull mixture model were estimated using the method of moments approach. For the detailed formula used to compute the initial Weibull mixture parameters, interested readers can refer to [56]. Subsequently, the estimates were further refined by employing the Nelder–Mead approach [57]. Algorithm 1 demonstrates the overall process to fit the Weibull mixture model.

Algorithm 1: Weibull mixture model parameter estimation [17]

Input: $Y_{a}$: ADL durations, $J$: maximum number of components.
Initialise: $L = 0, d = \infty$
    K-means to partition data $Y_{a}$ into $J$ clusters.
    Apply moment matching to initialise parameters $Θ^{(r)}$ across each cluster.
for $r = 1$ to $r_{max}$ do:
    E-step: Compute $γ_{t, j}^{(r)}$ (Equation (4))
    M-step: Estimate $π_{a, j}^{(r + 1)}$ and $Θ_{m}^{(r + 1)}$ (Equation (5))
    Maximise Equation (3) using Nelder–Mead with established $Θ_{m}^{(r + 1)}$
    Compute: Log-likelihood $L^{(r + 1)}$ (Equation (3))
    if $|L^{(r + 1)} - L^{(r)}| < d$: break
    Update: $Θ^{(r)} = Θ^{(r + 1)}, r = r + 1, L_{r} = L_{r + 1}$.
end for
$Θ = Θ^{(r)}$
Output: Estimated parameter $Θ$

The number of mixing components significantly affects the performance of the mixture model. To determine the optimal number of components, we used the Bayesian Information Criterion (BIC), which consists of two major components: a penalty for model complexity and a log-likelihood term, reflecting a trade-off between model fit and model complexity. In the case of the Weibull mixture distribution, each component has a shape and a scale parameter. As a result, a Weibull mixture model with $n$ components includes $3 n - 1$ parameters: $n$ shape parameters, $n$ scale parameters, and $n - 1$ mixture weights.
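The selection step can be sketched with the standard BIC definition, $p \ln N - 2 \ln L$, where $p$ is the number of free parameters and $N$ the number of observations. The paper does not state the exact form it used, so the formula is an assumption, and the log-likelihood values below are hypothetical placeholders rather than results.

```python
import math


def n_free_parameters(n_components):
    """n shapes + n scales + (n - 1) free mixture weights = 3n - 1."""
    return 3 * n_components - 1


def bic(log_lik, n_components, n_obs):
    """Standard BIC: complexity penalty minus twice the log-likelihood."""
    return n_free_parameters(n_components) * math.log(n_obs) - 2.0 * log_lik


# Hypothetical maximised log-likelihoods for 1, 2 and 3 components.
log_liks = {1: -250.0, 2: -231.0, 3: -229.5}
n_obs = 100

scores = {n: bic(ll, n, n_obs) for n, ll in log_liks.items()}
best_n = min(scores, key=scores.get)  # lower BIC is preferred
```

In this made-up example the third component improves the log-likelihood only slightly, so its three extra parameters are not worth the penalty and the two-component model is retained.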

To determine how well the data correspond to a specified distribution, standard goodness-of-fit tests were applied. From the widely available tests, the following were selected:

1.: Kullback–Leibler (KL) divergence: Alternatively known as relative entropy. This test is used to compute the difference between the empirical PDF computed from the data $p (a)$ and the PDF of the Weibull mixture distribution $q (a)$, as shown in the following formula [58]:

$D_{KL} (p ‖ q) = \int_{- \infty}^{\infty} p (a) \log \frac{p (a)}{q (a)} \, d a$

(6)

If $D_{KL} = 0$, the data distribution is considered to exactly match the specified mixture model, whereas a higher $D_{KL}$ value suggests increasing discrepancies between them. This test was conducted to measure how effectively the model fits the given data in high-density regions, thereby helping to ensure that the model can correctly capture the most common part of the distribution.
2.: Kolmogorov–Smirnov (KS) test: This is a well-established technique that compares the Cumulative Distribution Functions (CDFs) using either the two-sample or one-sample approach. In this paper, we utilise the one-sample approach, in which the empirical CDF (ECDF) $F_{n} (a)$ is compared with the CDF of the mixture model $F (a)$, as described in the equation below [59]:

$D_{KS} = \max_{a} | F_{n} (a) - F (a) |$

(7)

The null hypothesis is not rejected if $p_{KS} \geq 0.05$, meaning that no significant difference is found between the two CDFs. This test was primarily used to obtain the largest difference between the data and the model.
3.: Cramér–von Mises (CvM) test: In contrast to the KS test, this method considers the squared difference between the ECDF of the data and the theoretical CDF $F (a_{i})$ of the mixture model. The test statistic is defined by the following formula [60]:

$ω^{2} = \frac{1}{12 n} + \sum_{i = 1}^{n} {(F (a_{i}) - \frac{2 i - 1}{2 n})}^{2}$

(8)

While this test uses the same null hypothesis and probability condition as the KS test, it focuses on the overall measure of how the model fits across the entire distribution. In this way, minor variations between the model and data can be detected.
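The KS and CvM statistics of Equations (7) and (8) can be computed directly from a sorted sample and any model CDF, as in the sketch below. Only the test statistics are computed; p-values are omitted. The single-Weibull CDF and the duration values are hypothetical stand-ins for a fitted mixture and real data.

```python
import math


def ks_statistic(sample, cdf):
    """One-sample KS statistic: largest gap between the ECDF and the model CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The ECDF jumps from (i - 1)/n to i/n at x, so check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def cvm_statistic(sample, cdf):
    """Cramér–von Mises statistic of Equation (8)."""
    xs = sorted(sample)
    n = len(xs)
    return 1.0 / (12 * n) + sum(
        (cdf(x) - (2 * i - 1) / (2 * n)) ** 2
        for i, x in enumerate(xs, start=1)
    )


def weibull_cdf(t, k=2.0, lam=8.0):
    """Hypothetical model CDF (single Weibull, for illustration only)."""
    return 1.0 - math.exp(-((t / lam) ** k))


durations = [3.1, 5.4, 6.8, 7.9, 9.5, 12.2]
d_ks = ks_statistic(durations, weibull_cdf)
w2 = cvm_statistic(durations, weibull_cdf)
```

The KS statistic reacts to the single worst discrepancy, whereas the CvM statistic accumulates squared discrepancies over all observations, which is why the two are reported side by side.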

3.3. Simulating ADL Durations

Using the estimated Weibull mixture models of ADL durations, artificial ADL durations are generated via a common method called the inverse probability integral transform (also known as inverse transform sampling) [61]. This is a fundamental yet powerful pseudo-random number sampling technique in statistics. It begins by drawing a uniform random number $U \in [0, 1]$, and then identifies the smallest $t \in R^{+}$ such that $F (a, t) \geq U$. Here, $F (a, t)$ represents the CDF of the chosen distribution. In our case, the CDF is that of the Weibull mixture, defined by [61]

$F (a, t) = \sum_{j = 1}^{J} π_{a, j} \left[1 - e^{- (t / λ_{a, j})^{k_{a, j}}}\right]$, where $\sum_{j = 1}^{J} π_{a, j} = 1$

(9)

Due to the complexity of the mixture Weibull CDF, its inverse does not have a closed-form expression. Nevertheless, the equation $F (a, t) = U$ can be solved numerically for $t$, which amounts to computing $t = F^{- 1} (U)$, where $F^{- 1}$ denotes the inverse of the CDF.
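One simple way to carry out the numerical inversion is bisection, since the mixture CDF of Equation (9) is increasing in $t$. The sketch below assumes bisection; the paper does not say which root-finding method was used. The mixture parameters are hypothetical.

```python
import math
import random


def mixture_cdf(t, weights, shapes, scales):
    """Weibull mixture CDF of Equation (9)."""
    return sum(w * (1.0 - math.exp(-((t / lam) ** k)))
               for w, k, lam in zip(weights, shapes, scales))


def inverse_cdf(u, weights, shapes, scales, n_iter=100):
    """Solve F(t) = u for t by bisection (F is increasing in t)."""
    lo, hi = 0.0, 1.0
    # Expand the bracket until F(hi) >= u; the cap guards against an endless loop.
    while mixture_cdf(hi, weights, shapes, scales) < u and hi < 1e12:
        hi *= 2.0
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, weights, shapes, scales) < u:
            lo = mid
        else:
            hi = mid
    return hi


def sample_durations(n, weights, shapes, scales, seed=0):
    """Draw n synthetic durations by inverse transform sampling."""
    rng = random.Random(seed)
    return [inverse_cdf(rng.random(), weights, shapes, scales) for _ in range(n)]


# Hypothetical fitted mixture (durations in minutes).
weights, shapes, scales = [0.7, 0.3], [2.0, 3.0], [8.0, 30.0]
median = inverse_cdf(0.5, weights, shapes, scales)
samples = sample_durations(500, weights, shapes, scales)
```

A fixed number of bisection steps is used instead of an absolute tolerance so that the loop always terminates, even when the bracket has shrunk to the floating-point resolution.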

3.4. Simulating Anomalous ADL Durations

As presented in the previous section, each ADL duration follows a sojourn distribution $s (a, t)$. To simulate an anomaly in the ADL duration, the component associated with the selected duration (low/high) is identified based on the mean of each Weibull component, and its weight is then gradually increased, while ensuring that the weights still sum to one. Next, each observed ADL duration is independently drawn from the updated mixture of the given month. This provides a flexible and consistent way to simulate anomalies in ADL durations over successive months. Anomalous ADL durations are generated using two different approaches.

In the first approach, we set the first month as a fixed baseline month representing the initial normal behaviour, such that the intended component weight is $π^{(1)} = π_{base}$. The anomaly weight for the successive months is then increased progressively, as shown below:

π_{anom}^{(m)} = π_{base} + ϵ_{m}, where ϵ_{1} < ϵ_{2} < \dots < ϵ_{M}

(10)

In the second approach, we compared a single baseline month ($π^{(1)} = π_{base}$) with a single anomalous second month. The weight of the anomalous component in the second month is increased in a controlled, monotonic manner across $j$ simulation runs, such that

π_{anom}^{(2)} = π_{base} + ϵ_{j}, where ϵ_{1} < ϵ_{2} < \dots < ϵ_{j}

(11)
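The weight schedules of Equations (10) and (11) can be sketched as below. The text only states that the weights must still sum to one; rescaling the remaining weights proportionally is an assumption made here, as are the function names and the two-component layout. The baseline of 16.6% and the monthly targets of 20%, 30%, 45%, 65%, and 95% are the nap proportions quoted in Section 4.3.

```python
def shift_weights(base_weights, target, epsilon):
    """Raise the weight of component `target` by epsilon and rescale the
    other weights proportionally so that the total remains one (assumed rule)."""
    new_target = base_weights[target] + epsilon
    if not 0.0 <= new_target <= 1.0:
        raise ValueError("shifted weight must stay within [0, 1]")
    rest = sum(w for i, w in enumerate(base_weights) if i != target)
    if rest <= 0.0:
        raise ValueError("no remaining weight to rescale")
    scale = (1.0 - new_target) / rest
    return [new_target if i == target else w * scale
            for i, w in enumerate(base_weights)]

def monthly_schedule(base_weights, target, epsilons):
    """Month 1 keeps the baseline; each later month applies the next epsilon."""
    return [list(base_weights)] + [
        shift_weights(base_weights, target, eps) for eps in epsilons
    ]

# Two-component illustration: component 1 is taken to be the nap component.
base = [0.834, 0.166]
epsilons = [0.034, 0.134, 0.284, 0.484, 0.784]
schedule = monthly_schedule(base, target=1, epsilons=epsilons)
```

For the second approach, the same `shift_weights` call would be applied to month 2 only, once per simulation run, with an increasing sequence of epsilons.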

Overall, applying a rolling monthly window captures the continuous drift in the ADL durations, highlighting their gradual change as time progresses, and the method is generally applicable to any ADL duration. In our case, these approaches were employed for the following activities:

1. To reproduce the sleeping behaviour of individuals affected by dementia, the Weibull mixture model was employed to simulate prolonged napping periods. Napping increases in individuals with dementia because impaired neural functioning disrupts the sleep–wake cycle, making it harder for them to remain awake [6].
2. To simulate the drink preparation behaviour of individuals experiencing dementia, we used the Weibull mixture model to simulate prolonged drink preparation durations. Drink preparation takes longer because such essential tasks become more error-prone and cognitively demanding, mainly owing to memory problems and reduced attention [62].
3. To mimic the toileting behaviour of individuals affected by dementia, the Weibull mixture model was employed to generate extended toileting periods. Individuals with dementia often take longer in the toilet due to issues such as cognitive disarray, limited movement, or difficulty finding the toilet [8].
4. To model the meal preparation behaviour of individuals affected by dementia, we adjusted the Weibull mixture model to produce longer meal preparation durations. As dementia progresses, individuals often take longer to prepare their meals, mainly because they struggle to organise, or even remember, the tasks required [7].

3.5. Detecting Anomalous ADL Durations

To detect a monthly distributional drift that represents anomalous behaviour under the Weibull mixture model, we use the likelihood ratio test. In this test, the daily activity durations for each month are modelled using the Weibull mixture model described in the previous section. To evaluate whether the activity has drifted from month $m$ to month $m + 1$, we compare the following two models:

Null model $(H_{0})$: the weights of the Weibull mixture model are the same in both months, so the previous month’s weights are used for both.
Alternative model $(H_{1})$: the weights of the Weibull mixture model differ from month to month.

Let $LL_{0}$ denote the log-likelihood of the null model and $LL_{1}$ the log-likelihood of the alternative model. The likelihood ratio statistic $LLR$ can then be expressed as follows [63]:

LLR = 2 \ln \left( \frac{\text{likelihood for alternative model}}{\text{likelihood for null model}} \right)

(12)

= 2 \times (LL_{1} - LL_{0})

(13)

Under the null hypothesis, the distribution of $LLR$ can be approximated by a chi-square distribution with the degrees of freedom $(df)$ set to 1, because only one of the weights in the Weibull mixture model is assumed to change. In this case, the p-value $p_{chi^{2}}$ can be formulated as [64]

p_{chi^{2}} = 1 - F_{\chi^{2}} (LLR; df = 1)

(14)

Here, $F_{\chi^{2}}$ denotes the cumulative distribution function of the chi-square distribution. If $p_{chi^{2}} < α$ (with $α = 0.05$), it is inferred that there is a significant month-to-month drift in the activity durations. On the other hand, if $p_{chi^{2}} \geq α$, no significant month-to-month drift is inferred.
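For one degree of freedom the chi-square tail probability in Equation (14) has the closed form $\mathrm{erfc} (\sqrt{LLR / 2})$, so the decision rule can be sketched with the Python standard library alone. The function names below are hypothetical; the check values 6.30 and 4.33 are the month 1 to month 2 $LLR$ values quoted in Section 4.3 for napping and toileting.

```python
import math

def chi2_pvalue_df1(llr):
    """p-value of Equation (14) for df = 1.

    A chi-square variable with one degree of freedom is a squared standard
    normal, so P(X > x) = erfc(sqrt(x / 2)).
    """
    if llr <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(llr / 2.0))

def drift_detected(ll_null, ll_alt, alpha=0.05):
    """Return (LLR, p-value, decision) from the two log-likelihoods."""
    llr = 2.0 * (ll_alt - ll_null)  # Equation (13)
    p = chi2_pvalue_df1(llr)
    return llr, p, p < alpha

p_nap = chi2_pvalue_df1(6.30)     # rounds to the reported 0.012
p_toilet = chi2_pvalue_df1(4.33)  # rounds to the reported 0.037
```

With more than one free weight the degrees of freedom would exceed 1 and this shortcut would no longer apply; a general chi-square CDF would be needed instead.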

Algorithm 2 demonstrates the overall procedure to simulate and detect anomalous ADL durations.

Algorithm 2 Simulating and Detecting Anomalous ADL Durations

Input: $N$: total number of observations, $M$: total number of months, $π^{(m)}$: fitted mixture proportion per month, $α$: significance level, $λ_{a, j}$: fitted scale of Weibull mixture model, $k_{a, j}$: fitted shape of Weibull mixture model.

Initialise: $α = 0.05$

Obtain the fitted Weibull parameters $(k_{a, j}, λ_{a, j})$.
Obtain the fitted mixture proportion $π^{(m)}$ for each month.
for $m \leftarrow [1, 2, 3, \dots, M]$ do
  for $i \leftarrow \{1, \dots, N\}$ do
    Generate $d_{i}$ using the inverse transform sampling
  end for
  Store the generated dataset $d^{(m)}$
end for
for $m = 1$ to $M - 1$ do
  Compute $D_{LLR}$ using $d^{(m)}$, $d^{(m + 1)}$ and the fitted parameters (Equation (13))
  Compute $p_{chi^{2}}$ (Equation (14))
  if ($p_{chi^{2}} < α$) then
    Duration drift is detected
  else
    No significant duration drift is detected.
  end if
end for

Output: Month-to-month drift detection.
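An end-to-end sketch of Algorithm 2 is given below under several assumptions that are not stated in the paper. Durations are drawn by first picking a component and then calling `random.weibullvariate`, which is statistically equivalent to inverting the mixture CDF but is a different technique from the numerical inversion of Section 3.3. The alternative model re-estimates only the mixture weights with a few EM updates while scales and shapes stay fixed, and the null model evaluates month $m + 1$ with the weights estimated for month $m$. All names and parameter values are illustrative.

```python
import math
import random

def weibull_pdf(t, lam, k):
    return (k / lam) * (t / lam) ** (k - 1.0) * math.exp(-((t / lam) ** k))

def log_likelihood(data, weights, scales, shapes):
    return sum(
        math.log(sum(w * weibull_pdf(t, lam, k)
                     for w, lam, k in zip(weights, scales, shapes)))
        for t in data
    )

def sample_month(n, weights, scales, shapes, rng):
    # Pick a component by its weight, then draw from that Weibull component.
    picks = rng.choices(range(len(weights)), weights=weights, k=n)
    return [rng.weibullvariate(scales[j], shapes[j]) for j in picks]

def refit_weights(data, start, scales, shapes, iters=100):
    # EM updates for the weights only; starting from `start` guarantees the
    # likelihood never drops below its value at `start`.
    w = list(start)
    for _ in range(iters):
        totals = [0.0] * len(w)
        for t in data:
            dens = [wj * weibull_pdf(t, lam, k)
                    for wj, lam, k in zip(w, scales, shapes)]
            s = sum(dens)
            for j, d in enumerate(dens):
                totals[j] += d / s
        w = [x / len(data) for x in totals]
    return w

def chi2_pvalue_df1(llr):
    return math.erfc(math.sqrt(max(llr, 0.0) / 2.0))

def detect_drift(monthly_weights, scales, shapes, n, alpha=0.05, seed=0):
    rng = random.Random(seed)
    months = [sample_month(n, w, scales, shapes, rng) for w in monthly_weights]
    results = []
    for m in range(len(months) - 1):
        w_prev = refit_weights(months[m], monthly_weights[m], scales, shapes)
        w_next = refit_weights(months[m + 1], w_prev, scales, shapes)
        ll0 = log_likelihood(months[m + 1], w_prev, scales, shapes)  # null
        ll1 = log_likelihood(months[m + 1], w_next, scales, shapes)  # alternative
        llr = 2.0 * (ll1 - ll0)
        p = chi2_pvalue_df1(llr)
        results.append((m + 1, m + 2, llr, p, p < alpha))
    return results

# Illustrative run: a large shift of weight towards the long-duration component.
results = detect_drift([[0.8, 0.2], [0.2, 0.8]], [5.0, 60.0], [1.5, 2.0], n=300, seed=1)
```

Each tuple in `results` holds the two month indices, the $LLR$, the p-value, and the drift decision. With two components there is a single free weight, which is consistent with the one degree of freedom used in Equation (14).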

4. Results

This section uses the Kasteren smart home dataset to model the durations of daily activities using the Weibull mixture model. The results are organised according to the techniques outlined in Section 3.

4.1. Fitting Weibull Mixture Model to ADL Durations

We first fit the Weibull mixture model to the durations of four daily activities: sleep, toilet, drink preparation, and meal preparation. The fit is then assessed using three widely recognised goodness-of-fit measures: KS, CvM, and KL. These measures were chosen for their complementary strengths [65]. The KS test reflects differences in the most frequent activity durations, whereas CvM captures deviations across the entire range of ADL durations. In contrast, KL divergence measures the overall probability difference between the observed and modelled distributions and emphasises where the model fails to capture rare ADL durations.

As shown in Table 3, both $p_{KS}$ and $p_{CvM}$ substantially exceed the significance threshold of 0.05 across all the ADLs. In particular, sleep yields high $p_{KS}$ and $p_{CvM}$ values (0.924 and 0.962), demonstrating that the model reliably captures common sleep duration patterns, while remaining largely insensitive to small variations in such durations. Toilet activity also demonstrates a high $p_{KS}$ value of 0.783 and a $p_{CvM}$ value of 0.860, although both show moderate sensitivity to the variation in toilet durations because some toilet visits are exceptionally short or long. The drink preparation activity yields a $p_{KS}$ value of 0.812 and a $p_{CvM}$ value of 0.850. Meal preparation reveals a relatively lower $p_{KS}$ value of 0.738 and a $p_{CvM}$ value of 0.833, owing to the high variability in meal preparation durations, which include rare and extremely long episodes.

The KL divergence offers a complementary viewpoint that concentrates on the extreme or rare ADL durations: observed durations that the model considers rare have a stronger impact on the overall $D_{KL}$. From Table 3, the sleep activity exhibits a low $D_{KL}$ value of 0.113, suggesting limited influence from irregular sleep durations. The toilet activity produces a moderate $D_{KL}$ value of 0.162: most of the toilet durations are short, but some are substantially longer, and these affect the $D_{KL}$ value. Drink preparation yields a higher $D_{KL}$ value of 0.630 because of the presence of unusually long drink preparation times. Meal preparation shows a $D_{KL}$ value of 0.373, which is strongly affected by infrequent long meal preparation times. Nevertheless, when examined in conjunction with the high $p_{KS}$ and $p_{CvM}$ values, it is apparent that the Weibull mixture model continues to provide a statistically adequate characterisation of these activities. In this case, a higher $D_{KL}$ more often reflects a slight deviation between the shapes of the empirical and model distributions than a poor overall model fit.

Overall, the $p_{KS}$, $p_{CvM}$ and $D_{KL}$ values in Table 3, along with Figure 2, Figure 3, Figure 4 and Figure 5, reveal that the Weibull mixture model provides a flexible and robust framework for representing the distribution of the ADL durations.

4.2. Simulating ADL Durations

To assess the similarity between the real and synthetic ADL data, we used the KS test and the KL divergence. From Table 4 and Figure 6, Figure 7, Figure 8 and Figure 9, we can infer that the KS test produces very high values ($p_{KS} \geq 0.808$) across all daily activities. In particular, the sleep and drink preparation activities yield notably high p-values of 0.994 and 0.983, respectively, indicating a near-exact alignment between the two data distributions. Likewise, the toilet activity exhibits a very high $p_{KS} = 0.941$. The meal preparation activity produces a comparatively lower $p_{KS}$ of 0.808. However, this value remains considerably above the significance threshold of 0.05, indicating that the synthetic data preserve the fundamental statistical characteristics of the real data.

The KL divergence results, presented in Table 4, further illustrate the effectiveness of the proposed approach in simulating real-life ADL duration distributions. As demonstrated in Figure 10, Figure 11, Figure 12 and Figure 13, toilet activity exhibits the lowest $D_{KL}$ value of 0.018, indicating that the synthetic toilet durations closely resemble real durations across both brief and extended toilet visits, and thus give a highly reliable representation of toilet usage behaviour. Similarly, drink preparation demonstrates a low $D_{KL}$ value of 0.038, confirming that the method can closely capture both short and occasionally extended drink preparation durations. Meal preparation exhibits a slightly higher $D_{KL}$ value of 0.059, owing to variations in its durations, which include some extremely long meal preparation episodes. This indicates that, although the proposed approach can replicate the general patterns of the meal preparation durations, it struggles to capture the uncommon extended ones accurately. The sleep activity produces the highest $D_{KL}$ value of 0.066, which reflects the high natural variability of sleep durations: they range from short naps to extended nightly sleep, making the distribution wider than for the other activities. Nevertheless, the synthetic sleep durations capture the majority of the distribution, indicating that the model reproduces the overall sleep routines.

The consistently low $D_{KL}$ values observed across the different ADLs therefore demonstrate that the proposed method can effectively capture the underlying statistical properties of the real-life ADL durations. Furthermore, the strong alignment between the real and synthetic ADL durations supports the proposed approach, indicating that it provides an accurate and dependable means of producing ADL durations that reflect real-life behaviour. Such validity is crucial in applications like dementia monitoring, because it allows abnormal behaviours to be introduced into the synthetic data in a controlled manner while the overall behaviour still aligns with real-life behaviour.

Overall, the high $p_{KS}$ and low $D_{KL}$ values are consistent with the synthetic ADL data sharing the same statistical properties as the real ADL data. Thus, the results indicate that the Weibull mixture model is effective not only for fitting the daily activities, but also for generating synthetic data.
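The real-versus-synthetic comparison can be sketched as follows. This is an illustration under assumptions rather than the procedure used to produce Table 4: the KS part computes only the two-sample statistic (the largest gap between the empirical CDFs), and the KL part uses a histogram estimate on shared equal-width bins with a small smoothing constant, since the binning used in the paper is not stated here. Function names, the bin count, and `eps` are hypothetical.

```python
import bisect
import math

def ks_two_sample(x, y):
    """Largest absolute gap between the empirical CDFs of x and y."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    # The supremum is attained at one of the observed values.
    for v in xs + ys:
        fx = bisect.bisect_right(xs, v) / len(xs)
        fy = bisect.bisect_right(ys, v) / len(ys)
        d = max(d, abs(fx - fy))
    return d

def kl_divergence(real, synthetic, bins=10, eps=1e-9):
    """Histogram estimate of D_KL(real || synthetic) on shared bins."""
    lo = min(min(real), min(synthetic))
    hi = max(max(real), max(synthetic))
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values coincide

    def hist(data):
        counts = [0] * bins
        for v in data:
            counts[min(int((v - lo) / width), bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [(c + eps) / (len(data) + eps * bins) for c in counts]

    p, q = hist(real), hist(synthetic)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

A histogram estimate depends on the number of bins, so values obtained this way are comparable only when the same binning is used for every activity.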

4.3. Detecting Anomalous ADL Durations

To investigate possible drifts in the underlying distributions of the ADLs over a six-month period, we employed the log-likelihood ratio and chi-square test.

In the first case, we analysed the sleeping activity by systematically increasing the weight of the second Weibull mixture component, which corresponds to nap durations. From Table 5, we can infer that from month 1 to month 2, the nap proportion was increased marginally from 16.6% to 20%, producing an $LLR$ of 6.30 and a $p_{chi^{2}}$ of 0.012, which suggests a statistically significant but relatively modest drift in the napping activity. Thereafter, the proportion of naps continued to rise, reaching 30% in the third month, 45% in the fourth month, 65% in the fifth month, and 95% in the sixth month, with the $LLR$ values increasing dramatically to 54.29, 55.24, 135.84, and 434.21, while all the $p_{chi^{2}}$ values remain close to zero. Such an increase in napping activity suggests that the individual might suffer from disrupted sleep and fatigue, which could result in increased daytime drowsiness.

Similarly, for toileting activity, we systematically increased the weight of the second Weibull mixture component, which corresponds to prolonged toileting. As shown in Table 5, the proportion of longer toilet visits rose from 41.5% during the first month to 43% in the second month. In this case, the $LLR$ was 4.33 and $p_{chi^{2}}$ was 0.037, demonstrating a statistically significant yet modest variation in the toileting distribution. Such variations represent minor changes in individual behaviour, such as routines or personal habits, but they do not influence the other ADLs. The proportion of prolonged toileting then continued to rise over the successive months, reaching 48% in the third month, 55% in the fourth month, 67% in the fifth month and 82% in the sixth month, with the $LLR$ values increasing to 10.92, 18.68, 55.57, and 80.43 and all the $p_{chi^{2}}$ values remaining close to zero. This indicates that the individual spent an increasingly large amount of time on toilet activities, which might occur due to cognitive decline, such as confusion or the need to repeatedly perform the same toileting activity.

For the meal preparation activity, the weight of the second component of the Weibull mixture model, which corresponds to extended meal preparation, is increased. Table 5 indicates that the proportion of longer meal preparation rose from 40% in month 1 to 43.5% in month 2, indicating subtle changes in daily behaviour. This may arise from factors such as difficulty in meal planning or organising. Later, from month 2 to month 4, a moderate shift is observed as the proportion of longer meal preparation rises from 43.5% to 52%, reflecting a more pronounced prolongation of the activity, which could be due to difficulty in coordinating the multiple subtasks involved in meal preparation. Finally, a high drift is indicated from month 4 to month 6 as the proportion of prolonged meal preparation increases from 52% to 67%. This could be due to confusion or forgetfulness during meal preparation, reduced attention, or impaired coordination.

For the drink preparation activity, we increased the weight of the second Weibull mixture component, which corresponds to prolonged drink preparation. From Table 5, it can be inferred that between months 1 and 2, the proportion of prolonged drink preparation increased from 10% to 13%, suggesting a low shift in the drink preparation activity. This might arise from the early development of dementia, which causes the individual either to repeat subtasks or to take more time performing the subtasks related to drink preparation. Later, from month 2 to month 4, we can observe a moderate drift as the proportion of prolonged drink preparation rises from 13% to 34%, possibly due to factors such as greater task difficulty, slower processing, or increased tiredness while performing the associated subtasks. Lastly, from month 4 to month 6, a high drift in the duration is demonstrated as the proportion rises from 34% to 68%, indicating more advanced dementia-related impairments such as a decreased ability to initiate the activity.

For the second case, we focused on simulating anomalous ADL durations in the second month relative to the initial baseline behaviour observed in the first month. Table 6 depicts the assessment of the distributional drifts between month 1 and month 2 across the four activity durations: nap durations, prolonged toileting, extended meal preparation, and longer drink preparation. Here, each Weibull mixture model is fitted to the corresponding raw durations of the respective ADL. The distribution of each activity in the first month reflects the baseline component weights, whereas month 2 represents how these component weights can be increased progressively over time, reflecting potential anomalies.

From Table 6, we can infer that distributional drift is evident across all the month 2 mixture proportions for each ADL. For sleep duration, as the proportion of the nap component increases from 25% to 45%, the $LLR$ value increases rapidly from 54.19 to 429.39, representing nearly an eightfold increase, with $p_{chi^{2}}$ values remaining near zero, indicating significant deviations. Meal preparation activity exhibits a steep increase in $LLR$ from 5.07 to 248.98 as the component weight increases from 43% to 65%. By comparison, the $LLR$ for toileting shows a more moderate rise, from 6.02 to 60.05, whereas that for drink preparation increases from 13.81 to 57.40 across similar weight ranges. These quantitative variations suggest that nap and meal preparation activities react strongly to minor fluctuations, while drink preparation and toilet activities require more substantial shifts to exhibit notable deviations.

Essentially, the $LLR$ test detects drifts and quantifies their magnitude, allowing sensitivity to be compared among different ADLs. The steady monotonic rise in $LLR$ values, together with very small $p_{chi^{2}}$ values, suggests that even a moderate increase in component weights can produce significant effects. Overall, these findings indicate that the observed drift is practically meaningful and backed by statistical evidence. Furthermore, these results reveal that certain activities are more responsive to variation than others. Nevertheless, the overall activity patterns in the second month demonstrate a clear divergence from the normal behaviour observed in the first month.

5. Conclusions and Discussion

This paper presents a semi-Markov method to model and detect anomalies in ADL durations. To achieve this, we employed a Weibull mixture model to represent the duration of each ADL and validated the fit of the model using the KL, KS, and CvM tests. Synthetic ADL durations were generated via the inverse probability integral method to replicate realistic ADL durations, allowing for comparison with real data to evaluate the performance of the model. To simulate anomalies, the weights of the mixture model were modified to represent behavioural drifts, including prolonged meal preparation, drink preparation, toileting, and napping. These anomalies were detected within a rolling monthly window using the log-likelihood ratio and chi-square test. Our results illustrate that the proposed approach can be used to reliably identify abnormal ADL durations, offering a framework for the early detection of behavioural shifts and showing the effectiveness of detecting duration-based anomalies in ADLs, a type of deviation that current studies have not dealt with. Despite the progress made in this work, certain limitations remain, along with promising opportunities for future work.

1. Limitations: This study examines only a restricted set of ADLs (sleep, toileting, drink preparation, and meal preparation), which may not represent the full range of daily behaviour. The duration distributions of the ADLs are modelled using a Weibull mixture model, which has been shown in previous studies to provide a good fit [17]. Nonetheless, other types of models might better capture additional characteristics and distinct features of these distributions. The synthetic data generation method uses the inverse probability integral and presumes that altering mixture weights sufficiently simulates anomalous behaviour. Furthermore, because the dataset primarily represents normal human behaviour, the resulting simulations are inherently hypothetical and may not fully capture the variety of real-world behavioural drifts among individuals with dementia.
2. Future Work and Recommendation: The challenges in modelling ADL durations point to multiple directions for future research. For instance, while the current study focuses on modelling durations using a Weibull mixture model, other mixture models could be investigated to capture subtle patterns in individual behaviour that the Weibull mixture model may miss. Moreover, incorporating the transitions between ADLs through a semi-Markov model may enhance the anomaly detection. Furthermore, the current monthly window analysis could be extended to real-time sequential analysis techniques, in which data collected in real time would be monitored continuously to enable early detection of behavioural anomalies. Multiple testing approaches, like the Benjamini–Hochberg (BH) approach, can be employed to limit the false discovery rate and recognise significant deviations more effectively [66]. Additionally, integrating temporal information, such as differences across the time of day or the day of the week, could enhance the realism of the synthetic anomaly simulation, making it more reflective of dementia-related behaviour patterns.

Author Contributions

Development of concepts, S.M., B.S. and E.S.; Methodology, B.S. and S.M.; Implementation, E.S.; Validation, B.S. and S.M.; Resources, B.S., S.M. and E.S.; Data preparation, E.S.; Preparing the initial draft, E.S.; Reviewing: B.S., S.M. and Z.T.; Supervision, B.S., S.M., Z.T. and N.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the VCRS (Vice-Chancellor’s Research Studentships), funded by Ulster University.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in the study are openly available at: http://ailab.wsu.edu/mavhome/research.html (accessed on 11 January 2026).

Conflicts of Interest

The authors have no conflict of interest.

References

Population. Available online: https://www.un.org/en/global-issues/population (accessed on 3 February 2026).
Ageing. Available online: https://www.who.int/health-topics/ageing (accessed on 3 February 2026).
Dementia. Available online: https://www.who.int/news-room/fact-sheets/detail/dementia (accessed on 3 February 2026).
What Is Dementia? Symptoms, Types, and Diagnosis. Available online: https://www.nia.nih.gov/health/alzheimers-and-dementia/what-dementia-symptoms-types-and-diagnosis (accessed on 3 February 2026).
Over 41 Million Cases of Dementia go Undiagnosed Across the Globe—World Alzheimer Report Reveals. Available online: https://www.alzint.org/news-events/news/over-41-million-cases-of-dementia-go-undiagnosed-across-the-globe-world-alzheimer-report-reveals/ (accessed on 3 February 2026).
Cross, N.; Terpening, Z.; Rogers, N.L.; Duffy, S.L.; Hickie, I.B.; Lewis, S.J.; Naismith, S.L. Napping in older people ‘at risk’ of dementia: Relationships with depression, cognition, medical burden and sleep quality. J. Sleep Res. 2015, 24, 494–502.
Papachristou, I.; Giatras, N.; Ussher, M. Impact of dementia progression on food-related processes: A qualitative study of caregivers’ perspectives. Am. J. Alzheimer’s Dis. Other Dementias® 2013, 28, 568–574.
Aldridge, Z.; Harrison Dening, K. Dementia and continence issues. J. Community Nurs. 2021, 35, 58.
Zhu, Y.; Luo, H.; Chen, R.; Zhao, F. Diamondnet: A neural-network-based heterogeneous sensor attentive fusion for human activity recognition. IEEE Trans. Neural Networks Learn. Syst. 2023, 35, 15321–15331.
Wang, J.; Zhu, T.; Gan, J.; Chen, L.L.; Ning, H.; Wan, Y. Sensor data augmentation by resampling in contrastive learning for human activity recognition. IEEE Sens. J. 2022, 22, 22994–23008.
Malhotra, P.; Singh, Y.; Anand, P.; Bangotra, D.K.; Singh, P.K.; Hong, W.C. Internet of things: Evolution, concerns and security challenges. Sensors 2021, 21, 1809.
Leemans, S.J.; van der Aalst, W.M.; Brockhoff, T.; Polyvyanyy, A. Stochastic process mining: Earth movers’ stochastic conformance. Inf. Syst. 2021, 102, 101724.
Kang, W.; Shin, D.; Shin, D. Detecting and predicting of abnormal behavior using hierarchical markov model in smart home network. In Proceedings of the 2010 IEEE 17th International Conference on Industrial Engineering and Engineering Management, Xiamen, China, 29–31 October 2010; IEEE: New York, NY, USA, 2010; pp. 410–414.
Wu, E.; Zhang, P.; Lu, T.; Gu, H.; Gu, N. Behavior prediction using an improved Hidden Markov Model to support people with disabilities in smart homes. In Proceedings of the 2016 IEEE 20th International Conference on Computer Supported Cooperative Work in Design (CSCWD), Nanchang, China, 4–6 May 2016; IEEE: New York, NY, USA, 2016; pp. 560–565.
Flett, G.; Kelly, N. An occupant-differentiated, higher-order Markov Chain method for prediction of domestic occupancy. Energy Build. 2016, 125, 219–230.
Moore, S.J.; Nugent, C.D.; Zhang, S.; Cleland, I.; Sani, S.; Healing, A. A markov model to detect sensor failure in IoT environments. In Proceedings of the 2020 IEEE World Congress on Services (SERVICES), Beijing, China, 18–23 October 2020; IEEE: New York, NY, USA, 2020; pp. 13–16.
Shaikh, E.; Scotney, B.; McClean, S.I.; Tariq, Z.; Mohammad, N. Using Mixture Models to Characterize the Process Durations of Daily Living. In Proceedings of the 24th UK Workshop in Computational Intelligence, Edinburgh, UK, 3–5 September 2025.
Fletcher-Lloyd, N.; Serban, A.I.; Kolanko, M.; Wingfield, D.; Wilson, D.; Nilforooshan, R.; Barnaghi, P.; Soreq, E. A Markov chain model for identifying changes in daily activity patterns of people living with dementia. IEEE Internet Things J. 2023, 11, 2244–2254.
Grammatikopoulou, M.; Lazarou, I.; Alepopoulos, V.; Mpaltadoros, L.; Oikonomou, V.P.; Stavropoulos, T.G.; Nikolopoulos, S.; Kompatsiaris, I.; Tsolaki, M. Assessing the cognitive decline of people in the spectrum of AD by monitoring their activities of daily living in an IoT-enabled smart home environment: A cross-sectional pilot study. Front. Aging Neurosci. 2024, 16, 1375131.
Gómez, Y.M.; Gallardo, D.I.; Marchant, C.; Sánchez, L.; Bourguignon, M. An in-depth review of the Weibull model with a focus on various parameterizations. Mathematics 2023, 12, 56.
Van Kasteren, T.; Noulas, A.; Englebienne, G.; Kröse, B. Accurate activity recognition in a home setting. In Proceedings of the 10th International Conference on Ubiquitous Computing, Seoul, Republic of Korea, 21–24 September 2008; pp. 1–9.
Jose, A.C.; Malekian, R.; Letswamotse, B.B. Improving smart home security; integrating behaviour prediction into smart home. Int. J. Sens. Netw. 2018, 28, 253–269.
Fabi, V.; Spigliantini, G.; Corgnati, S.P. Insights on smart home concept and occupants’ interaction with building controls. Energy Procedia 2017, 111, 759–769.
Jo, H.; Yoon, Y.I. Intelligent smart home energy efficiency model using artificial TensorFlow engine. Hum. Centric Comput. Inf. Sci. 2018, 8, 9.
Bouchabou, D.; Nguyen, S.M.; Lohr, C.; LeDuc, B.; Kanellos, I. A survey of human activity recognition in smart homes based on IoT sensors algorithms: Taxonomies, challenges, and opportunities with deep learning. Sensors 2021, 21, 6037.
Paul, P.; George, T. An effective approach for human activity recognition on smartphone. In Proceedings of the 2015 IEEE International Conference on Engineering and Technology (ICETECH), Coimbatore, India, 20 March 2015; IEEE: New York, NY, USA, 2015; pp. 1–3.
Guo, L.; Fang, H. Real-time human activity recognition in smart home with binary tree SVM. In Proceedings of the 2016 35th Chinese Control Conference (CCC), Chengdu, China, 27–29 July 2016; IEEE: New York, NY, USA, 2016; pp. 6977–6981.
Xu, W.; Pang, Y.; Yang, Y.; Liu, Y. Human activity recognition based on convolutional neural network. In Proceedings of the 2018 24th International Conference on Pattern Recognition (ICPR), Beijing, China, 20–24 August 2018; IEEE: New York, NY, USA, 2018; pp. 165–170.
Mahmoodzadeh, A. Human activity recognition based on deep belief network classifier and combination of local and global features. J. Inf. Syst. Telecommun. 2021, 9, 33.
Bennett, T.R.; Massey, H.C.; Wu, J.; Hasnain, S.A.; Jafari, R. MotionSynthesis Toolset (MoST): An open source tool and data set for human motion data synthesis and validation. IEEE Sens. J. 2016, 16, 5365–5375.
Hernangómez, R.; Visentin, T.; Servadei, L.; Khodabakhshandeh, H.; Stańczak, S. Improving radar human activity classification using synthetic data with image transformation. Sensors 2022, 22, 1519.
Romero, A.; Carvalho, P.; Côrte-Real, L.; Pereira, A. Synthesizing human activity for data generation. J. Imaging 2023, 9, 204.
de Souza, M.D.; Junior, C.R.S.; Quintino, J.; Santos, A.L.; da Silva, F.Q.; Zanchettin, C. Exploring the impact of synthetic data on human activity recognition tasks. Procedia Comput. Sci. 2023, 222, 656–665.
Alaghbari, K.A.; Saad, M.H.M.; Hussain, A.; Alam, M.R. Activities recognition, anomaly detection and next activity prediction based on neural networks in smart homes. IEEE Access 2022, 10, 28219–28232.
Sánchez, V.G.; Skeie, N.O. Decision Trees for Human Activity Recognition Modelling in Smart House Environments. Simul. Notes Eur. 2018, 28, 177–184.
Sarwar, N.; Bajwa, I.S.; Hussain, M.Z.; Ibrahim, M.; Saleem, K. IoT network anomaly detection in smart homes using machine learning. IEEE Access 2023, 11, 119462–119480.
Kanev, A.; Nasteka, A.; Bessonova, C.; Nevmerzhitsky, D.; Silaev, A.; Efremov, A.; Nikiforova, K. Anomaly detection in wireless sensor network of the “smart home” system. In Proceedings of the 2017 20th Conference of Open Innovations Association (FRUCT), St. Petersburg, Russia, 3–7 April 2017; IEEE: New York, NY, USA, 2017; pp. 118–124.
Bertrand, Y.; Van den Abbeele, B.; Veneruso, S.; Leotta, F.; Mecella, M.; Serral, E. A survey on the application of process mining to smart spaces data. In Proceedings of the International Conference on Process Mining, Bolzano, Italy, 23–28 October 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 57–70.
Carolis, B.D.; Ferilli, S.; Redavid, D. Incremental learning of daily routines as workflows in a smart home environment. ACM Trans. Interact. Intell. Syst. (TiiS) 2015, 4, 1–23.
Ma’arif, M.R. Revealing daily human activity pattern using process mining approach. In Proceedings of the 2017 4th International Conference on Electrical Engineering, Computer Science and Informatics (EECSI), Yogyakarta, Indonesia, 19–21 September 2017; IEEE: New York, NY, USA, 2017; pp. 1–5.
Theodoropoulou, G.; Bousdekis, A.; Miaoulis, G.; Voulodimos, A. Process mining for activities of daily living in smart homecare. In Proceedings of the 24th Pan-Hellenic Conference on Informatics, Athens, Greece, 20–22 November 2020; pp. 197–201.
Dogan, O.; Akkol, E.; Olucoglu, M. Understanding patient activity patterns in smart homes with process mining. In Proceedings of the Iberoamerican Knowledge Graphs and Semantic Web Conference, Madrid, Spain, 21–23 November 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 298–311.
Helske, J.; Helske, S.; Saqr, M.; López-Pernas, S.; Murphy, K. A modern approach to transition analysis and process mining with Markov models in education. In Learning Analytics Methods and Tutorials: A Practical Guide Using R; Springer: Cham, Switzerland, 2024; pp. 381–427.
Asghari, P.; Soleimani, E.; Nazerfard, E. Online human activity recognition employing hierarchical hidden Markov models. J. Ambient Intell. Humaniz. Comput. 2020, 11, 1141–1152. [Google Scholar]
Kalra, L.; Zhao, X.; Soto, A.J.; Milios, E. Detection of daily living activities using a two-stage Markov model. J. Ambient Intell. Smart Environ. 2013, 5, 273–285. [Google Scholar]
Kalenkova, A.; Mitchell, L.; Roughan, M. Performance analysis: Discovering semi-markov models from event logs. IEEE Access 2025, 13, 38035–38053. [Google Scholar] [CrossRef]
Van Kasteren, T.; Englebienne, G.; Kröse, B.J. Activity recognition using semi-Markov models on real world smart home datasets. J. Ambient Intell. Smart Environ. 2010, 2, 311–325. [Google Scholar]
McClean, S.; Yang, L. Semi-Markov models for process mining in smart homes. Mathematics 2023, 11, 5001. [Google Scholar] [CrossRef]
McClean, S.; Wang, D.; Yang, L.; McChesney, I.; Tariq, Z.; Prasad, S. Using semi-Markov models for generating, validating, and analyzing artificial smart home processes. In Proceedings of the International Conference on Ubiquitous Computing and Ambient Intelligence, Belfast, UK, 27–29 November 2024; Springer: Berlin/Heidelberg, Germany, 2024; pp. 300–312. [Google Scholar]
Yang, L.; Mc Clean, S.; Bashar, A.; Moore, S.; Tariq, Z. Using semi-Markov models to identify long holding times of activities of daily living in smart homes. In Proceedings of the 2023 IEEE Smart World Congress (SWC), Portsmouth, UK, 28–31 August 2023; IEEE: New York, NY, USA, 2023; pp. 748–753. [Google Scholar]
Chau, J.Y.; Gomersall, S.R.; Van Der Ploeg, H.P.; Milton, K. The evolution of time use approaches for understanding activities of daily living in a public health context. BMC Public Health 2019, 19, 451. [Google Scholar] [CrossRef]
Jiajin, X.; Zhentong, G. From Gaussian Distribution to Weibull Distribution. Glob. J. Res. Eng. I Numer. Methods 2023, 23, 1–6. [Google Scholar]
Weibull, W. A statistical distribution function of wide applicability. J. Appl. Mech. 1951, 18, 290–293. [Google Scholar] [CrossRef]
Papoulis, A. Random Variables and Stochastic Processes; McGraw Hill: New York, NY, USA, 1965. [Google Scholar]
Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodol.) 1977, 39, 1–22. [Google Scholar]
Cohen, A.C. Maximum likelihood estimation in the Weibull distribution based on complete and on censored samples. Technometrics 1965, 7, 579–588. [Google Scholar] [CrossRef]
Singer, S.; Nelder, J. Nelder-mead algorithm. Scholarpedia 2009, 4, 2928. [Google Scholar] [CrossRef]
Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86. [Google Scholar] [CrossRef]
Berger, V.W.; Zhou, Y. Kolmogorov–smirnov test: Overview. In Wiley Statsref: Statistics Reference Online; Wiley Online Library: Hoboken, NK, USA, 2014. [Google Scholar]
Anderson, T.W. On the distribution of the two-sample Cramer-von Mises criterion. Ann. Math. Stat. 1962, 33, 1148–1159. [Google Scholar] [CrossRef]
Henderson, S.G.; Nelson, B.L. Handbooks in Operations Research and Management Science: Simulation; Elsevier: Amsterdam, The Netherlands, 2006; Volume 13. [Google Scholar]
Ramsden, C.M.; Kinsella, G.J.; Ong, B.; Storey, E. Performance of everyday actions in mild Alzheimer’s disease. Neuropsychology 2008, 22, 17. [Google Scholar]
Koch, K.R. Parameter Estimation and Hypothesis Testing in Linear Models; Springer: Berlin/Heidelberg, Germany, 1999. [Google Scholar]
Wilks, S.S. The large-sample distribution of the likelihood ratio for testing composite hypotheses. Ann. Math. Stat. 1938, 9, 60–62. [Google Scholar] [CrossRef]
Nguyen, M. A Guide on Data Analysis: From Basics to Causal Inference; Bookdown: Boca Raton, FL, USA, 2020. [Google Scholar]
Benjamini, Y.; Hochberg, Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B (Methodol.) 1995, 57, 289–300. [Google Scholar]

Figure 1. Overview of proposed methodology.

Figure 2. Histogram of sleep durations with fitted Weibull mixture PDF.

Figure 3. Histogram of toilet durations with fitted Weibull mixture PDF.

Figure 4. Histogram of drink preparation durations with fitted Weibull mixture PDF.

Figure 5. Histogram of meal preparation durations with fitted Weibull mixture PDF.

Figure 6. KS statistic between real vs. synthetic sleep duration distributions.

Figure 7. KS statistic between real vs. synthetic toilet duration distributions.

Figure 8. KS statistic between real vs. synthetic drink preparation duration distributions.

Figure 9. KS statistic between real vs. synthetic meal preparation duration distributions.

Figure 10. KL divergence between real and synthetic sleep duration distributions.

Figure 11. KL divergence between real and synthetic toilet duration distributions.

Figure 12. KL divergence between real and synthetic drink preparation duration distributions.

Figure 13. KL divergence between real and synthetic meal preparation duration distributions.

Table 1. Summary of data simulation and anomaly detection methods in smart homes.

Paper	Data	Strengths	Limitations
[30]	Sensor signals	Provides open-source tool for data synthesis, validation and visualisation	Requires MATLAB access
[31]	Radar signals	Reduces dependency on limited datasets	Restricted radar modelling
[32]	2D avatar video	Supports generation of customisable and realistic data	Struggles with complex human actions
[33]	Time series data	Boosts ML performance on limited data	Relies heavily on GAN stability
[35]	Daily activities	Interpretable results	Restricted generalisability
[37]	Sensor data streams; network traffic	Accounts for both device-level metrics and network traffic	Restricted scalability and practical applicability
[34]	Daily activities	Provides activity recognition, prediction and anomaly detection	Depends on labelled data
[36]	IoT traffic dataset	Uses real dataset to examine malicious behaviour	Does not offer interpretability

Table 2. Summary of Process Mining and Stochastic Process Mining approaches to model human activities in smart homes.

Paper	Strengths	Limitations
[39]	Graphical representation of ADLs	Limited interpretability
[40]	Focused on simplifying the visualisation of process models	Only considered frequent sequential patterns
[41]	Compared performance of alpha, heuristic, fuzzy, and inductive miner algorithms	Focused mainly on comparing PM algorithms
[42]	Considered frequency, length of stay, and user pathways to examine human behaviour	Did not consider activity duration
[13]	Hierarchical behaviour modelling	Model requires optimisation
[44]	Online recognition of disrupted activities	Depends on labelled data
[14]	Reliable personalised prediction	Complex temporal model
[45]	Model complex activity sequences	Resource-intensive model
[15]	Accurate multi-resident prediction	Minimal improvement over traditional models
[48]	Considers activity duration	Does not consider progressive anomaly simulation and detection
[49]	Considers activity duration	Only focused on toilet and breakfast activity
[50]	Considers activity duration	Focused only on long holding time

Table 3. Goodness-of-fit analysis of ADL durations.

	Sleep	Toilet	Drink Preparation	Meal Preparation
$p_{KS}$	0.924	0.783	0.812	0.738
$p_{CvM}$	0.962	0.860	0.850	0.833
$D_{KL}$	0.113	0.162	0.630	0.373
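
As a rough illustration of the $D_{KL}$ row, the Kullback–Leibler divergence between two discrete distributions (for example, a normalised duration histogram and the fitted model evaluated on the same bins) can be sketched as below. The binning, the use of natural logarithms, and the example probabilities are assumptions made for illustration only; they are not taken from the paper.

```python
import math


def kl_divergence(p, q):
    """Discrete Kullback-Leibler divergence D_KL(P || Q) in nats.

    p and q are probability vectors over the same bins. Bins with p == 0
    contribute nothing; a bin with p > 0 and q == 0 gives infinity.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total


# Hypothetical two-bin example, not data from the paper.
observed = [0.5, 0.5]
model = [0.25, 0.75]
print(f"D_KL = {kl_divergence(observed, model):.4f}")
```

A value of zero means the two distributions coincide on every bin, so smaller values indicate a closer fit.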

Table 4. Statistical comparison between real and synthetic ADL durations.

	Sleep	Toilet	Drink Preparation	Meal Preparation
$p_{KS}$	0.994	0.941	0.983	0.808
$D_{KL}$	0.066	0.018	0.038	0.059
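
The comparison summarised in Table 4 can be sketched in outline as follows: durations are drawn from a Weibull mixture by inverse transform sampling and compared with a reference sample through the two-sample KS statistic. This is a minimal sketch under stated assumptions. The two-component mixture, its weights, shapes, and scales are hypothetical placeholders rather than the fitted values, and selecting a component by weight before inverting that component's CDF is one common way to sample a mixture, not necessarily the authors' exact procedure.

```python
import bisect
import math
import random


def sample_weibull_mixture(n, weights, shapes, scales, rng):
    """Draw n durations from a Weibull mixture by inverse transform sampling."""
    draws = []
    for _ in range(n):
        # Choose a mixture component with probability equal to its weight.
        k = rng.choices(range(len(weights)), weights=weights)[0]
        # Invert the Weibull CDF F(x) = 1 - exp(-(x / scale) ** shape).
        u = rng.random()
        draws.append(scales[k] * (-math.log(1.0 - u)) ** (1.0 / shapes[k]))
    return draws


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the ECDFs."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for x in a + b:
        fa = bisect.bisect_right(a, x) / len(a)
        fb = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(fa - fb))
    return gap


# Hypothetical parameters, not the fitted values from the paper.
WEIGHTS, SHAPES, SCALES = [0.6, 0.4], [1.5, 3.0], [10.0, 45.0]

rng = random.Random(0)
reference = sample_weibull_mixture(2000, WEIGHTS, SHAPES, SCALES, rng)
synthetic = sample_weibull_mixture(2000, WEIGHTS, SHAPES, SCALES, rng)
print(f"KS statistic: {ks_statistic(reference, synthetic):.3f}")
```

Because both samples here come from the same mixture, the statistic should be small. Shifting weight towards the longer-duration component in one sample would be expected to increase it.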

Table 5. Progressive monthly increase in nap, prolonged toileting, meal, and drink preparation durations.

Activity	Months	Previous Month	Next Month	$LLR$	$p_{\chi^{2}}$
Nap	1 vs. 2	16.6%	20%	6.30	1.21 × $10^{-2}$
	2 vs. 3	20%	30%	54.29	1.73 × $10^{-13}$
	3 vs. 4	30%	45%	55.24	1.07 × $10^{-13}$
	4 vs. 5	45%	65%	135.84	0.00
	5 vs. 6	65%	95%	434.21	0.00
Prolonged toileting	1 vs. 2	41.5%	43%	4.33	3.74 × $10^{-2}$
	2 vs. 3	43%	48%	10.92	9.50 × $10^{-4}$
	3 vs. 4	48%	55%	18.68	1.54 × $10^{-5}$
	4 vs. 5	55%	67%	55.57	8.97 × $10^{-14}$
	5 vs. 6	67%	82%	80.43	0.00
Extended meal preparation	1 vs. 2	40%	43.5%	8.87	2.90 × $10^{-3}$
	2 vs. 3	43.5%	48%	8.64	3.29 × $10^{-3}$
	3 vs. 4	48%	52%	9.07	2.60 × $10^{-3}$
	4 vs. 5	52%	58%	13.55	2.33 × $10^{-4}$
	5 vs. 6	58%	67%	35.22	2.95 × $10^{-9}$
Longer drink preparation	1 vs. 2	10%	13%	11.54	6.81 × $10^{-4}$
	2 vs. 3	13%	22%	33.41	7.47 × $10^{-9}$
	3 vs. 4	22%	34%	60.58	7.11 × $10^{-15}$
	4 vs. 5	34%	49%	89.92	0.00
	5 vs. 6	49%	68%	105.10	0.00
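
The relationship between the $LLR$ column and the chi-square p-value column can be checked with a short calculation. The sketch below assumes that the tabulated $LLR$ is already the statistic referred to a chi-square distribution with one degree of freedom; that assumption is inferred from the tabulated pairs rather than stated in this section. For one degree of freedom the upper-tail probability has the closed form $\operatorname{erfc}(\sqrt{x/2})$, so only the standard library is needed. The function name is illustrative.

```python
import math


def chi2_sf_1df(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    If Z is standard normal, Z**2 is chi-square with 1 df, so
    P(Z**2 > stat) = erfc(sqrt(stat / 2)).
    """
    return math.erfc(math.sqrt(stat / 2.0))


# LLR values taken from Table 5; the 1-df assumption is ours.
for llr in (6.30, 4.33, 35.22):
    print(f"LLR = {llr:6.2f} -> p = {chi2_sf_1df(llr):.2e}")
```

Under this assumption, larger $LLR$ values map to smaller p-values, which is the pattern seen as the month-on-month shift grows.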

Table 6. Progressive increase in month 2 nap, prolonged toileting, meal, and drink preparation durations from month 1.

Activity	Month 2 Proportion	$LLR$	$p_{\chi^{2}}$
Nap	25%	54.19	1.82 × $10^{-13}$
	30%	68.21	1.11 × $10^{-16}$
	35%	175.66	2.23 × $10^{-308}$
	40%	238.20	2.23 × $10^{-308}$
	45%	429.30	2.23 × $10^{-308}$
Prolonged toileting	45%	6.02	1.41 × $10^{-2}$
	48%	11.21	8.10 × $10^{-4}$
	51%	25.43	4.58 × $10^{-7}$
	54%	39.59	3.12 × $10^{-10}$
	57%	60.05	9.21 × $10^{-15}$
Extended meal preparation	43%	5.07	2.43 × $10^{-2}$
	47%	5.48	1.92 × $10^{-2}$
	50%	40.11	2.39 × $10^{-10}$
	54%	32.13	1.44 × $10^{-8}$
	65%	248.97	2.23 × $10^{-308}$
Longer drink preparation	12.2%	13.80	2.03 × $10^{-4}$
	13.8%	14.92	1.12 × $10^{-4}$
	15.1%	35.42	2.66 × $10^{-9}$
	16.7%	25.96	3.47 × $10^{-7}$
	18.2%	57.40	3.55 × $10^{-14}$
