Machine Learning Prediction of Agitation in Dementia Patients Using Sleep and Physiological Data

Ramesh, Keshav; Yakoub, Anna; Ghoneim, Youssef; Al Korabi, Rehab; Ramesh, Jayroop; Sagahyroon, Assim; Aloul, Fadi

doi:10.3390/app15189908

by Keshav Ramesh, Anna Yakoub, Youssef Ghoneim, Rehab Al Korabi, Jayroop Ramesh *, Assim Sagahyroon and Fadi Aloul

Department of Computer Science and Engineering, American University of Sharjah, Sharjah 26666, United Arab Emirates

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(18), 9908; https://doi.org/10.3390/app15189908

Submission received: 20 August 2025 / Revised: 4 September 2025 / Accepted: 8 September 2025 / Published: 10 September 2025

(This article belongs to the Special Issue AI Technologies for eHealth and mHealth, 2nd Edition)

Abstract

Dementia is a progressive condition that affects cognitive and functional abilities. Psycho-motor agitation represents a frequent and challenging manifestation in People Living with Dementia (PLwD). This behavior contributes to heightened distress and increased risk of harm for patients, while posing a significant burden for caregivers, who must navigate the complexities of managing unpredictable and potentially harmful agitation episodes. Accurately predicting and promptly responding to agitation events is thus critical for enhancing the safety and well-being of PLwD. Leveraging artificial intelligence, tools can be used to monitor behavioral patterns and alert healthcare providers about potential agitation to facilitate timely and effective interventions. Despite the link between poor sleep quality and the likelihood of agitation, there remains a gap in utilizing sleep parameters for predictive analytics in this domain. This study explores the potential of integrating sleep and associated physiological data to predict the risk of agitation in dementia patients the next day, leveraging the Technology Integrated Health Management (TIHM) dataset. Our analysis reveals that the LightGBM model, enhanced with combined feature sets, delivers superior performance, achieving a weighted F1 score of 93.6% compared to standard baseline models. The findings underscore the value of incorporating sleep data into automated models and advocate for continued efforts to develop long-term agitation prediction methods.

Keywords:

dementia; machine learning; PLwD; sleep quality; sleep tracking

1. Introduction

Dementia is a spectrum of neurodegenerative conditions characterized by cognitive and functional decline, predominantly impacting the elderly. It manifests through memory loss, confusion, impaired communication, and motor dysfunction. Among the challenges it presents, sleep disturbances are notably prevalent, affecting up to 25% of individuals with mild-to-moderate dementia and as many as 50% in severe cases [1]. These disturbances are characterized by frequent awakenings and a predominance of light sleep, exacerbating cognitive decline and leading to behavioral complications like restlessness and wandering. Specifically, we look at the characterization of such behavior as agitation, generally described in the behavioral context as increased, often undirected, motor activity, leading to restlessness, aggressiveness, and emotional distress [2]. Not only do these issues impose significant distress on caregivers, but they often lead to the institutionalization of affected individuals. The need to address these symptoms is underscored by their impact on the quality of dementia care.

In 2023, the economic impact of dementia, including Alzheimer’s disease, was substantial, costing the United States approximately USD 345 billion [3]. The current demographic trends indicate that over 6 million Americans are living with Alzheimer’s, with projections suggesting a doubling of this figure by 2050. The global scenario mirrors this growth, with over 55 million people affected worldwide, over 60% of whom reside in low- and middle-income countries [4]. Annually, approximately 10 million new cases are identified. These statistics highlight the urgent need for region-specific research and the development of effective interventions.

While there is growing recognition of the correlation between poor sleep quality and agitation in People Living with Dementia (PLwD), research utilizing machine learning to predict agitation based on this correlation remains limited. Current efforts focus on analyzing physiological and activity-based data to predict such behavioral disturbances. However, continuous monitoring of these data typically requires numerous sensors, complicating home-based caregiving due to the logistical challenges associated with sensor deployment. This study builds on the observed sleep–agitation correlation, which can be examined using the dataset developed in [5].

We explore the condition of agitation using sleep and physiology parameters that were gathered from the homes of PLwD. We achieve this by developing a streamlined system that employs machine learning to analyze sleep patterns and physiological information from minimal sensors to predict and mitigate sleep disturbances and agitation in PLwD. By focusing on variables that can be efficiently captured, our approach minimizes the complexity for caregivers while maximizing the actionable insights derived from the data.

Sleep disturbances and cognitive impairment have been identified as significant contributors to agitated behaviors among nursing home residents. For instance, nighttime toileting awakenings have been linked to increased aggressive behaviors [6]. Similarly, Gehrman et al. [7] found that sleep-disordered breathing in Alzheimer’s patients correlates with certain types of agitation, and its treatment has shown the potential to reduce agitation and alleviate the caregiving burden. Moreover, studies have indicated that sleep problems, including insomnia and excessive time spent in bed, are associated with a higher risk of cognitive disorders, suggesting that managing sleep could be crucial in dementia prevention [8]. Furthermore, specific sleep disturbances, such as insomnia and fragmented sleep, are recognized risk factors for dementia, including Alzheimer’s and vascular dementia [9]. Dementia itself is closely associated with agitation, a common and distressing symptom. Agitation in dementia patients can significantly impact their quality of life and that of their caregivers. Effective monitoring and management of agitation are therefore critical in dementia care.

Recent research has explored innovative approaches to detecting and predicting agitation in dementia patients using wearable technology and machine learning. One study employed multi-modal wearable sensors, such as Empatica E4 wristbands, to capture physiological signals and detect agitation events labeled by nurses. The Extra Trees model demonstrated robust performance, with AUCs of 0.941 and 0.959 for different participants, highlighting the potential of this approach in agitation detection [10]. Additionally, HekmatiAthar et al. [11] proposed a deep learning-based approach that uses environmental data from dementia patients’ homes to predict agitation episodes. This approach achieved promising results using a Long Short-Term Memory (LSTM) model, including an F1 score of 0.64, suggesting its potential to assist caregivers in managing agitation more effectively. Another study examined whether mild behavioral impairment (MBI), including agitation and impulse dyscontrol, alongside brain morphology, predicted diagnosis changes over 40 months in individuals with normal cognition or mild cognitive impairment. Data from 340 participants showed MBI’s predictive accuracy in both binary (84.4%) and three-class (58.8%) classification models, underscoring MBI’s relevance for early dementia detection [12].

In a study by Palermo et al. [13] assessing agitation risk, researchers used data from an in-home monitoring LSTM system to build a model to detect agitation. This model achieved a notable recall of 79.78% but an F1 score of 37.64%, indicating room for improvement. Comparatively, a prior study by TIHM developed a Markov model for anomaly detection in a dementia patient’s behavior to detect agitation, achieving 80% accuracy [14]. Despite these advancements, there remains a notable research gap in leveraging sleep metrics to predict agitation in dementia patients, which this paper targets.

The paper is structured as follows: We begin by reviewing the existing literature on the link between sleep quality and agitation and evaluating previous attempts to use machine learning to predict agitation in PLwD. We then detail the materials and methods employed in our study, including the data preprocessing and model selection, and apply Shapley values to decipher the contributors of agitation in our collected samples post hoc. Finally, we discuss our results and present our findings, offering recommendations for future research in this critical area.

2. Materials and Methods

The first pipeline in Figure 1 outlines the steps taken in developing and evaluating a machine learning model using the TIHM dataset, from data preprocessing to model training and testing. It details the use of stratified sampling to split the data, the selection of the best machine learning model through evaluation, and the application of cross-validation for model robustness. The process concludes with applying explainable AI techniques, specifically Shapley values, to interpret the model outcomes. The second pipeline in Figure 2 focuses on the preprocessing steps taken for sleep and physiological data from the TIHM dataset. It begins with data cleaning and temporal alignment of agitation labels with sleep data, followed by feature engineering to extract relevant sleep and physiological metrics. The data are then aggregated by patient on a daily basis with the appropriate agitation label. Finally, the data undergo a final preprocessing step where they are unified, instances are de-duplicated, and values are scaled to prepare for model training.

2.1. Dataset

The Technology Integrated Health Management (TIHM) project [5] employs an advanced dataset collection approach focused on remote healthcare monitoring of dementia patients. Utilizing off-the-shelf sensory devices, the project gathers continuous in-home data related to daily activities and physiological parameters. The sensors offer various sampling frequencies, ensuring comprehensive data capture over an average period of 50 days per participant, with the devices being inactive during charging times. This dataset, originally derived from 56 participants with an almost even gender split and a broad age range, was meticulously cleaned and de-identified before analysis. The labeling process in the TIHM study involves a monitoring team that verifies health-related events reported by participants or detected by sensors. This dual validation approach ensures the dataset’s high reliability, which was previously confirmed to be effective in the main study. Ethical considerations were rigorously followed, with all participants providing informed consent under standards akin to those in the Helsinki Declaration. Further details on the original demographics and sensor properties can be found in the source dataset [5].

However, for our study’s focus on sleep data, the TIHM project only collected sleep data for 17 patients. While the ‘Physiology’ and ‘Labels’ datasets include records for all 56 patients, we explored the role of sleep and its associated parameters, which are far fewer in number. Thus, we aligned the data to match the patients with their corresponding records. This alignment resulted in 850 complete records from the original 2803 records for our study. The data encompassed, in addition to sleep, physiological measurements and behavioral observations. These records were encoded appropriately to maintain participant anonymity while allowing for accurate machine learning analysis. The dataset is primarily heterogeneous, containing binary, categorical, and numerical data. The distribution of labels, particularly focusing on agitation, highlights that health events can significantly impact the management and quality of life of patients.

In our research, we utilized three out of the five available datasets from the TIHM project: First, we used the ‘Sleep’ dataset, which includes data on four sleep states—awake, light, deep, and REM—captured using sleep tracking mats. This dataset also records snoring, heart rate, and respiratory rate, collecting data every minute that the person with dementia is in bed. Second, we used the ‘Physiology’ dataset, which contains daily measurements of vital signs like body temperature, blood pressure, heart rate, and body composition. However, some data points are missing, as not all participants log their information daily. Third, we used the ‘Labels’ dataset, which comprises data on six types of health alerts verified by the TIHM monitoring team, including agitation, unusual blood pressure, body temperature fluctuations, dehydration, irregular heart rate, and weight changes. We focused on agitation as our primary label to improve quality of life and noted that seven participants had no confirmed alerts; therefore, we excluded them from this dataset. A data description of the original TIHM dataset is shown in Table 1.

2.2. Preprocessing and Feature Curation

This section elaborates on the comprehensive approach we adopted to preprocess, refine, and integrate sleep and physiological data. Our workflow was structured into three sequential phases, each building upon the previous outputs to develop a dataset for predictive modeling and advanced data analysis.

The first phase focused on cleaning, formatting, and annotating sleep data with agitation indicators and derived relevant sleep metrics for initial analyses. We began by importing sleep data and corresponding agitation labels, converting dates and times to a consistent date–time format to facilitate temporal operations. We initiated a new binary column in the ‘Sleep’ dataset to mark agitation instances, initially set to 0.

We mapped sleep data to agitation labels to analyze sleep patterns in patients exhibiting agitation behaviors. In the dataset, timestamps for agitation events were only recorded at two specific times: 12 PM and 6 PM. Sleep parameters were acquired via sleep tracking, with a frequency of 1 min, but agitation episodes were only manually recorded the following noon and evening. Similarly, physiological parameters such as body temperature, systolic/diastolic blood pressure, and heart rate were also reported either once per day or sporadically throughout the day. Therefore, for chronological consistency and to view the effect of a single night’s sleep on the following day’s agitation label, we looked at two windows within a day to average the physiological metrics.

As such, this led to an ‘afternoon’ or ‘night’ window based on the hour. Next, we calculated the start and end times for each window. The ‘night’ window spanned from 6 PM of the previous day to 12 PM of the current day, while the afternoon window spanned from 12 PM to 6 PM of the same day. If there was an agitation case in the dataset with the patient’s ID and the date and time fell within the window, agitation was marked as 1.
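The window logic described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the `(patient_id, timestamp)` event layout, and the choice of half-open intervals `(start, end]` (so that a label stamped exactly at 12 PM closes the 'night' window and one stamped at 6 PM closes the 'afternoon' window) are all assumptions.

```python
from datetime import date, datetime, timedelta


def window_bounds(day, window):
    """Return (start, end) datetimes of the 'night' or 'afternoon' window for a calendar day."""
    noon = datetime(day.year, day.month, day.day, 12)
    six_pm = noon + timedelta(hours=6)
    if window == "night":
        # 6 PM of the previous day up to 12 PM of the current day
        return six_pm - timedelta(days=1), noon
    if window == "afternoon":
        # 12 PM up to 6 PM of the same day
        return noon, six_pm
    raise ValueError(f"unknown window: {window!r}")


def label_window(patient_id, day, window, agitation_events):
    """Return 1 if the patient has an agitation event inside (start, end], else 0.

    agitation_events is an iterable of (patient_id, timestamp) pairs
    (a hypothetical layout chosen for this sketch).
    """
    start, end = window_bounds(day, window)
    return int(any(pid == patient_id and start < ts <= end
                   for pid, ts in agitation_events))
```

For example, an event recorded for a patient at 12 PM on a given day marks that day's 'night' window as 1 and leaves the 'afternoon' window at 0.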

After establishing the time windows, we calculated key sleep metrics essential for understanding sleep disturbances in dementia patients. For Wake After Sleep Onset (WASO), we computed the total time awake after sleep onset and before final awakening by summing periods of wakefulness and any gaps in the data, capturing all wakefulness events per patient. For Sleep Onset Latency (SOL), we measured the time taken to transition from wakefulness to the first sleep stage. If no sleep state was detected, SOL was set to a negative value (indicating ‘awake’). For Time in Bed (TiB), we calculated the total time spent in bed, defined as the duration between the first and last timestamps in each window. For Total Sleep Time (TST), we calculated the total sleep duration by subtracting wakefulness periods from TiB. For Sleep Efficiency (SE), we measured the proportion of Time in Bed spent sleeping, expressed as a percentage [15]. After defining these time windows, averages and standard deviations for heart rate and respiratory rate were calculated for each patient within each time window. The use of mean values is justified because physiological signals typically exhibit minimal variation in fairly healthy individuals, excluding agitation instances [16]. These metrics were aggregated by the patient ID and specific time windows, ensuring each entry retained the highest agitation level recorded within the corresponding timeframe to highlight potential triggers or symptoms of agitation.
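As an illustration of these definitions, the sketch below derives the five sleep metrics from the per-minute records of one window. It is an assumed implementation: the state labels (`AWAKE`, `LIGHT`, `DEEP`, `REM`), the one-minute gap threshold, and the use of -1 as the "no sleep detected" value for SOL are choices made here for illustration and are not taken from the TIHM files or the authors' code.

```python
from datetime import datetime, timedelta

SLEEP_STATES = {"LIGHT", "DEEP", "REM"}  # assumed state labels


def sleep_metrics(records, max_gap_min=1.0):
    """Compute SOL, WASO, TiB, TST (minutes) and SE (%) for one window.

    records: list of (timestamp, state) pairs sampled about once per minute.
    The interval after each record is attributed to that record's state;
    intervals longer than max_gap_min are treated as wakefulness (data gaps).
    """
    records = sorted(records)
    times = [t for t, _ in records]
    states = [s for _, s in records]
    tib = (times[-1] - times[0]).total_seconds() / 60.0

    asleep_idx = [i for i, s in enumerate(states) if s in SLEEP_STATES]
    if not asleep_idx:
        # No sleep state detected: negative SOL flags an 'awake' window
        return {"SOL": -1.0, "WASO": 0.0, "TiB": tib, "TST": 0.0, "SE": 0.0}

    onset, last_sleep = asleep_idx[0], asleep_idx[-1]
    sol = (times[onset] - times[0]).total_seconds() / 60.0

    tst = 0.0
    waso = 0.0
    for i in range(len(records) - 1):
        minutes = (times[i + 1] - times[i]).total_seconds() / 60.0
        if states[i] in SLEEP_STATES and minutes <= max_gap_min:
            tst += minutes
        elif onset <= i < last_sleep:
            # Awake records or gaps between sleep onset and final awakening
            waso += minutes

    se = 100.0 * tst / tib if tib > 0 else 0.0
    return {"SOL": sol, "WASO": waso, "TiB": tib, "TST": tst, "SE": se}
```

With this bookkeeping, TiB equals SOL + WASO + TST plus any wake time after the final awakening, which matches the description of TST as TiB minus wakefulness periods.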

In the second phase, our objective was to refine and structure physiological data to align with the sleep data by creating consistent time windows and linking agitation data. As mentioned previously, due to the randomness in the timings of the reported physiological values, we assigned them to an ‘afternoon’ or ‘night’ agitation label. The criteria were based on whether the timestamp was before or after 30 min past the hour, capturing overlapping physiological changes. For agitation labels at 6 PM, physiological data from 2 h before to up to 6 h after (4 PM to 10 PM) were analyzed. For agitation labels at 12 PM, data from 5 AM to 3 PM were considered, accommodating potential overnight physiological changes. Data entries meeting these criteria, where the patient ID matched and the date was within the specified timeframe, were flagged for agitation. Any other issues besides agitation itself, such as abnormally high or low blood pressure/temperature/heart rate, dehydration, and weight changes, were explicitly marked, which we accounted for if applicable.

Missing data were filled using Last Observation Carried Forward (LOCF), which is a type of forward filling that carries the value of the last observation (in our case, the previous time window for the patient), maintaining continuity. Remaining NaN values were then dropped.
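A minimal sketch of per-patient LOCF followed by dropping rows that remain incomplete is given below. The row layout (dictionaries with a `patient_id` key and `None` for missing values) is a hypothetical stand-in for the study's tables; rows are assumed to be in chronological order.

```python
def locf_per_patient(rows, columns):
    """Forward-fill missing values within each patient, then drop incomplete rows.

    rows: list of dicts in chronological order, each with a 'patient_id' key;
    missing measurements are represented as None (an assumption of this sketch).
    """
    last_seen = {}  # (patient_id, column) -> last observed value
    filled = []
    for row in rows:
        new_row = dict(row)
        for col in columns:
            key = (row["patient_id"], col)
            if new_row[col] is None:
                # Carry the patient's previous observation forward, if any
                new_row[col] = last_seen.get(key)
            else:
                last_seen[key] = new_row[col]
        filled.append(new_row)
    # Rows with no earlier observation to carry forward are still None: drop them
    return [r for r in filled if all(r[c] is not None for c in columns)]
```

In a pandas workflow, the same idea is usually expressed as a group-wise forward fill (`groupby(...).ffill()`) followed by `dropna()`. Filling within each patient matters because a value should never be carried over from a different patient.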

The third phase involved integrating sleep and physiological data to form a comprehensive and unified dataset that encapsulates sleep patterns and physiological responses relative to agitation events. To synchronize sleep and physiological datasets, we mapped ‘night’ sleep data to the next day’s physiological data and ‘afternoon’ sleep data to the same day’s physiological data. This mapping considers the natural delay in physiological responses due to sundowning [17], ensuring the relevant sleep data aligned with subsequent physiological readings and mirrored the circadian rhythm [18].

Following preprocessing, our final dataset consisted of 525 rows and 11 columns, with the target feature designated as “Agitation.” Notably, there was a class imbalance of nearly 70% for Agitation = 0 and 30% for Agitation = 1, which is considered in the following step. The variables of the final dataset and their description can be found in Table 2.

2.3. Training and Evaluation

To enhance the reliability of our model evaluations, we implemented two techniques: stratified sampling and stratified k-fold cross-validation. Stratified sampling involved dividing the population into subgroups (strata) based on specific characteristics relevant to our study, such as agitation levels, and sampling from these subgroups proportionally. This method ensures that the diversity within a population is accurately represented in the sample, leading to more generalizable results. Additionally, we employed stratified k-fold cross-validation, an adaptation of the traditional k-fold method. This technique ensures that each fold of a dataset maintains the same class distribution as the original dataset, countering the effects of class imbalance, such as the 30:70 ratio in our dataset. By preserving class proportions across each fold, stratified k-fold cross-validation yields a more reliable assessment of model performance in imbalanced scenarios.
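To make the idea of stratification concrete, the following sketch assigns sample indices to k folds so that each fold keeps roughly the original class ratio. It is a simplified, standard-library illustration (shuffle each class, then deal its indices round-robin), not the routine used in the study; the function name and seed are assumptions.

```python
import random
from collections import defaultdict


def stratified_kfold_indices(labels, k=5, seed=0):
    """Split indices into k folds that each preserve the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    folds = [[] for _ in range(k)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # Deal this class's indices round-robin so every fold gets its share
        for pos, idx in enumerate(idxs):
            folds[pos % k].append(idx)
    return folds


# Example: a 70:30 imbalance like the one described above
labels = [0] * 70 + [1] * 30
folds = stratified_kfold_indices(labels, k=5)
```

Each fold serves once as the test set while the remaining folds form the training set. This sketch ignores patient identity; where records from one patient must not appear on both sides of a split, a group-aware variant such as scikit-learn's `StratifiedGroupKFold` is the usual tool.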

Gradient-boosted decision trees (GBDTs) consistently outperform other machine learning methods for imbalanced data, and are a popular choice in AI-based healthcare applications. Ramesh et al. [19] explored various machine learning models, including GBDTs, to screen for Vitamin A deficiency in school children without blood tests. Additionally, ref. [20] utilized GBDTs to predict adverse events in an Intensive Care Unit (ICU). Transformers [21,22] have shown promise with tabular data as well; however, they rely on large amounts of data and have relatively higher computational requirements than GBDTs in the healthcare domain [23,24]. Therefore, for this study, which had fewer than a thousand records, we opted to exclusively pursue simpler machine learning models and chose five: Random Forest, LightGBM, Extra Trees, XGBoost, and Gradient Boosting.

Explainable artificial intelligence (XAI) refers to applying artificial intelligence technology methods and techniques such that human experts can understand the results of the solution. It contrasts with the concept of the “black box” in machine learning, where even the designers cannot explain why the AI arrives at a specific decision.

For XAI, the Shapley value offers a way to fairly distribute the predictive power contributed by each feature in a cooperative game, ensuring each player or feature receives its due based on its contribution. The Shapley value estimation for a feature x_{i} is formalized as shown in Equation (1):

\phi_{y}(x_{i}; x) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|! (n - 1 - |S|)!}{n!} [f_{y}(x_{S} \cup x_{i}) - f_{y}(x_{S})]

(1)

This formula calculates the Shapley value for a feature x_{i} in a dataset. Here, S ranges over all possible subsets of features excluding i, N is the set of all features, and n = |N| is the number of features. The characteristic function f_{y}(x_{S}) is applied to these subsets to estimate the contribution of adding i to each subset S. The value for x_{i} is then obtained by computing the difference in performance (like accuracy or loss reduction) when i is included versus when it is not. This difference is averaged over all possible subsets to derive the Shapley value, which reflects the average marginal contribution of the feature across all potential combinations. A feature x_{i} that significantly improves model performance when added to subsets will have a high Shapley value, while a feature that contributes little will have a low Shapley value.
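The weighted sum over subsets can be checked on a toy example by brute-force enumeration. The sketch below is illustrative only: it is exponential in the number of features, the paper does not state which implementation was used, and the toy value function (a small model evaluated with absent features set to a zero baseline) is an assumption made for this example.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n):
    """Exact Shapley values for n features given value_fn(frozenset_of_indices)."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight |S|! (n - 1 - |S|)! / n! for subsets S of this size
            weight = factorial(size) * factorial(n - 1 - size) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to S
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def toy_value(subset):
    """Toy model 2*x0 + 3*x1 + x0*x2 at x = (1, 1, 1), absent features set to 0."""
    x = [1.0 if j in subset else 0.0 for j in range(3)]
    return 2 * x[0] + 3 * x[1] + x[0] * x[2]


phi = shapley_values(toy_value, 3)
```

In this toy case the interaction term `x0*x2` is split equally between features 0 and 2, and the three values sum to the difference between the full-feature output and the empty-set output, which is the efficiency property that makes the attribution "fair".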

3. Results

Table 3 summarizes the evaluation metrics of the various machine learning models applied to our dataset. The weighted F1 score, precision, and recall serve as our primary evaluation metrics, and are especially effective for assessing models applied to imbalanced datasets and useful for evaluating a model’s efficacy in medical applications. Our models performed well, achieving weighted F1 scores between 93.4% and 95.3% in a single test case. To ensure a more reliable assessment of model performance, we adopted a stratified 5-fold cross-validation strategy. The results of the 5-fold stratified cross-validation are outlined in Table 4. It is worth mentioning that train–test patients did not overlap, avoiding data leakage across each fold.

Upon analysis, it is evident that LightGBM consistently outperformed other models, boasting a mean weighted F1 score of 93.54% across five trials. This robust performance positions LightGBM as the optimal choice for further analysis and deployment in our subsequent endeavors. When we generated the confusion matrix, the model correctly predicted “Not Agitated” for 70 cases while accurately identifying 29 cases of actual agitation. There were five false positives, where the model predicted agitation that did not occur (Type I error), and only one false negative, where an actual case of agitation was missed (Type II error). The small number of Type II errors is particularly important, as missing true agitation could lead to significant risks for patients and caregivers. On the other hand, while the five false positives (Type I errors) might lead to unnecessary alerts, they are less concerning in this context, as we prioritize identifying agitation early. The high sensitivity demonstrated by the model, with most true agitation cases being detected, supports the need to err on the side of caution to ensure timely intervention, even if it results in occasional false alarms.
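The four counts quoted above (70 true negatives, 5 false positives, 1 false negative, 29 true positives) are enough to recompute the usual metrics. The helper below is a small sketch written for this purpose; its name and output keys are illustrative, and the recomputed figures describe only this single confusion matrix, not the cross-validated results.

```python
def binary_report(tn, fp, fn, tp):
    """Recall, precision, accuracy and support-weighted F1 from a 2x2 confusion matrix."""
    def f1(tp_, fp_, fn_):
        return 2 * tp_ / (2 * tp_ + fp_ + fn_)

    support_neg, support_pos = tn + fp, tp + fn
    total = support_neg + support_pos
    f1_pos = f1(tp, fp, fn)  # "Agitated" as the positive class
    f1_neg = f1(tn, fn, fp)  # roles swap when "Not Agitated" is the class of interest
    return {
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / total,
        "weighted_f1": (support_neg * f1_neg + support_pos * f1_pos) / total,
    }


report = binary_report(tn=70, fp=5, fn=1, tp=29)
```

For these counts the recall on the agitation class is 29/30 (about 0.97) and the support-weighted F1 is about 0.94, consistent with the emphasis on keeping false negatives low.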

4. Discussion

Our evaluation of various machine learning models applied to the dataset revealed significant insights into their performance and the nature of the data. One of the primary objectives of this study was to build a system that integrates seamlessly into household patient care, which requires input from a minimal number of sensors. Our data engineering and preprocessing ensured that the remaining variables could be extracted from one or two devices, such as wearables and smartwatches, which are noninvasive and more manageable for the patient and the caretaker.

Ensemble methods, particularly LightGBM, demonstrated superior performance, achieving a mean weighted F1 score of 93.54% across five stratified cross-validation trials. This suggests that the data benefit from methods capable of capturing higher-dimensional interactions, which ensemble models handle effectively. Tree-based models like LightGBM excel in managing perturbations within datasets by leveraging feature selection thresholds, which can discard outliers or non-essential variations. This attribute is particularly useful in complex medical datasets where robustness against variability is crucial.

The confusion matrix above reveals that the model exhibited a notably low number of false negatives, indicating that nearly every instance of the target class “Agitation” was accurately predicted as such. This is advantageous because, in cases where the model misclassifies a non-agitated patient as agitated, caretakers can easily disregard the result. However, if the model were prone to missing instances where patients are genuinely agitated (resulting in false negatives), it would significantly impact their quality of life, rendering such an outcome undesirable.

Recent methodological reviews [25,26] show that small samples are particularly susceptible to overfitting and that cross-validation methods without independent test sets can overestimate performance. Thus, we interpreted the results through the lens of Shapley values with the intention of validating whether clinically viable features drove the performance or whether they exhibited spurious correlations. These values clarify how each feature affects a model’s output, granting us a transparent view of the internal decision-making process. This level of interpretability is critical, especially in high-stakes domains like healthcare, where understanding the reasoning behind predictions is as important as the predictions themselves.

The summary plot presented in Figure 3 offers a detailed analysis of the influence of various features on the predictive model output, explicitly targeting the prediction of “Agitation”. Our findings indicate that certain features such as ‘mean_HR_sleep’, ‘RR_var_sleep’, and ‘TIB_sleep’ significantly influenced the model’s agitation prediction, aligning with the medical literature investigating the role of circadian rhythm in this context [17,18].

‘Mean_HR_sleep’ significantly impacted model predictions, reflecting that higher heart rates during sleep are associated with increased agitation risks. This observation is supported by studies indicating that higher resting heart rates can be indicative of elevated neuro-psychiatric symptoms in dementia patients, thus reinforcing the clinical relevance of this feature [27]. Conversely, ‘RR_var_sleep’ (Variance of Respiratory Rate during sleep) displayed a wide distribution of Shapley values, mostly on the negative side, suggesting that higher variability in respiratory rate generally contributes to lowering the prediction of agitation. This aligns with the literature demonstrating that lower heart rate variability, a measure of autonomic nervous system flexibility, is associated with increased agitation risks in dementia patients, highlighting its potential as a clinical marker for monitoring and intervention [28]. Furthermore, ‘TIB_sleep’ (Time in Bed during sleep) showed a balanced distribution of impacts on the model’s predictions, indicating its nuanced role in agitation. As quantified by Time in Bed, disrupted sleep patterns have been shown to exacerbate agitation, aligning with studies emphasizing the critical nature of sleep management in dementia-related care [29].

Integrating cardiovascular and sleep metrics into our model enhanced its predictive accuracy and supports its applicability in real-time monitoring systems for dementia patients. By effectively linking AI insights with established medical knowledge, we enhance the technological and clinical aspects of dementia care, offering a promising avenue for future research and application. This approach has broader implications for predictive health analytics, patient monitoring, and developing responsive care systems in medical settings.

The growing prevalence of dementia globally and the substantial burden it places on individuals, families, and healthcare systems highlight the importance of our work. As the number of PLwD continues to rise, particularly in low–middle-income countries [4], there is an urgent need for scalable, cost-effective solutions that can be implemented in diverse settings. Our research contributes to fulfilling this need by providing a model that accurately predicts agitation and uses minimal sensor data, reducing the logistical and financial burdens of extensive sensor networks in home settings. As mentioned by a complementary study solely focusing on nocturnal disturbances [30] among dementia patients, approaches such as ours highlight the potential of using passive digital technologies for providing preemptive care to PLwD.

Moreover, the application of XAI techniques in our study enhanced the transparency and interpretability of our predictive model. This integration is crucial for building trust among healthcare providers and caregivers, who must understand the basis of the predictions to utilize them effectively in clinical practice, aligning with the growing emphasis on XAI in healthcare applications [31]. Our findings reveal that specific sleep-related features, such as mean heart rate during sleep and variability in respiratory rate, are significant predictors of agitation, aligning with the existing medical literature on the physiological underpinnings of agitation in dementia.

In terms of limitations, we note the limited generalizability, i.e., the possibility that other patients may not have similar demographics to the patients in this dataset. For instance, in the TIHM dataset, all patients have verified mild mental impairment if not a complete dementia diagnosis, the ethnicity is predominantly Caucasian, the lowest age is 70, and most patients live with at least one other person. Secondly, our approach to data fusion and aggregation is not ideal, as it represents a reasonable approximation based on the availability and variability of monitored data. In naturalistic settings with multiple moving parts where the sensory devices are installed in participants’ homes, there is likely to be missing data, heterogeneity between patient homes, confounding variables, and sparse annotations. Additionally, imputation strategies other than LOCF can be explored in future work, as this method may smooth out clinically meaningful variability.

In conjunction with other works leveraging TIHM that probe into other facets of these data modalities, we believe that our work offers an exploratory look into the role of sleep and its associated parameters in agitation prediction. We position our efforts as a preliminary research work tackling commonly used machine learning models and ensemble methods to improve prediction accuracy.

5. Conclusions

In this paper, we have explored the significant potential of combining machine learning with data acquired from in-home observational and measurement technologies to enhance sleep quality and predict agitation in PLwD. Our study bridges a crucial gap in the current research landscape by leveraging sleep and physiological data to anticipate agitation, a common and distressing symptom in dementia. Employing the TIHM dataset, we demonstrated that the LightGBM model outperforms other tested machine learning algorithms, achieving a weighted F1 score of 93.6%. This level of predictive performance underscores the value of our approach for enhancing patient care and supporting caregivers.

Looking forward, the methodology and findings presented in this paper lay the groundwork for further innovation in healthcare technology for dementia. There is a significant opportunity to refine data preprocessing techniques and to examine the impact of different demographic factors on model performance to enhance generalizability and robustness, aligning with the recent emphasis on these aspects in machine learning for healthcare applications [32]. Moreover, implementing these predictive models in real-world settings and measuring their impact on patient outcomes and caregiver efficiency would provide valuable insights into their practical benefits and areas for improvement, aligning with recent calls for increased focus on the real-world implementation of machine learning models in the continuum of degenerative disease treatment [33]. These steps will help advance the field and realize the potential of technology to transform care for dementia patients.

Author Contributions

Conceptualization, J.R.; Methodology, A.Y., Y.G. and R.A.K.; Software, K.R. and A.Y.; Formal analysis, K.R. and Y.G.; Investigation, K.R.; Data curation, R.A.K.; Writing (original draft), K.R. and A.Y.; Writing (review and editing), J.R.; Supervision, J.R., A.S. and F.A.; Project administration, A.S. and F.A.; Funding acquisition, F.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The TIHM dataset adopted in this research is openly available on Zenodo at https://zenodo.org/records/7622128 (accessed on 1 June 2024).

Acknowledgments

This paper represents the opinions of the authors and does not mean to represent the position or opinions of the American University of Sharjah.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

PLwD	People Living with Dementia
GBDTs	Gradient-Boosting Decision Trees

References

1. Koren, T.; Fisher, E.; Webster, L.; Livingston, G.; Rapaport, P. Prevalence of sleep disturbances in people with dementia living in the community: A systematic review and meta-analysis. Ageing Res. Rev. 2023, 83, 101782.
2. Feast, A.; Orrell, M.; Charlesworth, G.; Melunsky, N.; Poland, F.; Moniz-Cook, E. Behavioural and psychological symptoms in dementia and the challenges for family carers: Systematic review. Br. J. Psychiatry 2016, 208, 429–434.
3. Alzheimer’s Disease International. World Alzheimer Report 2022: Journey Through the Diagnosis of Dementia; Technical Report; Alzheimer’s Disease International (ADI): London, UK, 2022; Available online: https://www.alz.co.uk/research/world-report-2022 (accessed on 18 July 2024).
4. World Health Organization. Dementia Profile: Global Health Estimates; Technical Report; World Health Organization: Geneva, Switzerland, 2022.
5. Palermo, F.; Chen, Y.; Capstick, A.; Fletcher-Lloyd, N.; Walsh, C.; Kouchaki, S.; True, J.; Balazikova, O.; Soreq, E.; Scott, G.; et al. TIHM: An open dataset for remote healthcare monitoring in dementia. Sci. Data 2023, 10, 606.
6. Cohen-Mansfield, J.; Marx, M.S. The relationship between sleep disturbances and agitation in a nursing home. J. Aging Health 1990, 2, 42–57.
7. Gehrman, P.R.; Martin, J.L.; Shochat, T.; Nolan, S.; Corey-Bloom, J.; Ancoli-Israel, S. Sleep-disordered breathing and agitation in institutionalized adults with Alzheimer disease. Am. J. Geriatr. Psychiatry 2003, 11, 426–433.
8. Xu, W.; Tan, C.C.; Zou, J.J.; Cao, X.P.; Tan, L. Sleep problems and risk of all-cause cognitive decline or dementia: An updated systematic review and meta-analysis. J. Neurol. Neurosurg. Psychiatry 2020, 91, 236–244.
9. Shi, L.; Chen, S.J.; Ma, M.Y.; Bao, Y.P.; Han, Y.; Wang, Y.M.; Shi, J.; Vitiello, M.V.; Lu, L. Sleep disturbances increase the risk of dementia: A systematic review and meta-analysis. Sleep Med. Rev. 2018, 40, 4–16.
10. Badawi, A.; Elgazzar, K.; Ye, B.; Newman, K.; Mihailidis, A.; Iaboni, A.; Khan, S.S. Investigating Multimodal Sensor Features Importance to Detect Agitation in People with Dementia. In Proceedings of the 2023 IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), Regina, SK, Canada, 24–27 September 2023; pp. 77–82.
11. HekmatiAthar, S.; Goins, H.; Samuel, R.; Byfield, G.; Anwar, M. Data-driven forecasting of agitation for persons with dementia: A deep learning-based approach. SN Comput. Sci. 2021, 2, 326.
12. Gill, S.; Mouches, P.; Hu, S.; Rajashekar, D.; MacMaster, F.P.; Smith, E.E.; Forkert, N.D.; Ismail, Z. Using machine learning to predict dementia from neuropsychiatric symptom and neuroimaging data. J. Alzheimers Dis. 2020, 75, 277–288.
13. Palermo, F.; Li, H.; Capstick, A.; Fletcher-Lloyd, N.; Zhao, Y.; Kouchaki, S.; Nilforooshan, R.; Sharp, D.; Barnaghi, P. Designing A Clinically Applicable Deep Recurrent Model to Identify Neuropsychiatric Symptoms in People Living with Dementia Using In-Home Monitoring Data. arXiv 2021, arXiv:2110.09868.
14. Enshaeifar, S.; Zoha, A.; Markides, A.; Skillman, S.; Acton, S.T.; Elsaleh, T.; Hassanpour, M.; Ahrabian, A.; Kenny, M.; Klein, S.; et al. Health management and pattern analysis of daily living activities of people with dementia using in-home sensors and machine learning techniques. PLoS ONE 2018, 13, e0195605.
15. Reed, D.L.; Sacco, W.P. Measuring Sleep Efficiency: What Should the Denominator Be? J. Clin. Sleep Med. 2016, 12, 263–266.
16. Zheng, N.S.; Annis, J.; Master, H.; Han, L.; Gleichauf, K.; Ching, J.H.; Nasser, M.; Coleman, P.; Desine, S.; Ruderfer, D.M.; et al. Sleep patterns and risk of chronic disease as measured by long-term monitoring with commercial wearable devices in the All of Us Research Program. Nat. Med. 2024, 30, 2648–2656.
17. Carrarini, C.; Russo, M.; Dono, F.; Barbone, F.; Rispoli, M.G.; Ferri, L.; Di Pietro, M.; Digiovanni, A.; Ajdinaj, P.; Speranza, R.; et al. Agitation and dementia: Prevention and treatment strategies in acute and chronic conditions. Front. Neurol. 2021, 12, 644317.
18. Volicer, L.; Harper, D.G.; Manning, B.C.; Goldstein, R.; Satlin, A. Sundowning and circadian rhythms in Alzheimer’s disease. Am. J. Psychiatry 2001, 158, 704–711.
19. Ramesh, J.; Sankalpa, D.; Khamis, A.; Sagahyroon, A.; Aloul, F. Explainable Machine Learning for Vitamin A Deficiency Classification in Schoolchildren. In Proceedings of the 2022 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI), Ioannina, Greece, 27–30 September 2022; pp. 1–4.
20. Zhu, Y.; Venugopalan, J.; Zhang, Z.; Chanani, N.K.; Maher, K.O.; Wang, M.D. Domain Adaptation Using Convolutional Autoencoder and Gradient Boosting for Adverse Events Prediction in the Intensive Care Unit. Front. Artif. Intell. 2022, 5, 640926.
21. Hollmann, N.; Müller, S.; Eggensperger, K.; Hutter, F. TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second. arXiv 2023, arXiv:2207.01848.
22. Arik, S.Ö.; Pfister, T. TabNet: Attentive interpretable tabular learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 19–21 May 2021; Volume 35, pp. 6679–6687.
23. Yıldız, A.Y.; Kalayci, A. Gradient Boosting Decision Trees on Medical Diagnosis over Tabular Data. arXiv 2025, arXiv:2410.03705.
24. Ruan, Y.; Lan, X.; Ma, J.; Dong, Y.; He, K.; Feng, M. Language modeling on tabular data: A survey of foundations, techniques and evolution. arXiv 2024, arXiv:2408.10548.
25. Ghasemzadeh, H.; Hillman, R.E.; Mehta, D.D. Toward Generalizable Machine Learning Models in Speech, Language, and Hearing Sciences: Estimating Sample Size and Reducing Overfitting. J. Speech Lang. Hear. Res. JSLHR 2024, 67, 753–781.
26. Vabalas, A.; Gowen, E.; Poliakoff, E.; Casson, A.J. Machine learning algorithm validation with a limited sample size. PLoS ONE 2019, 14, e0224365.
27. Liu, K.Y.; Whitsel, E.A.; Heiss, G.; Palta, P.; Reeves, S.; Lin, F.V.; Mather, M.; Roiser, J.P.; Howard, R. Heart rate variability and risk of agitation in Alzheimer’s disease: The Atherosclerosis Risk in Communities Study. Brain Commun. 2023, 5, fcad269.
28. Deng, Y.T.; Kuo, K.; Wu, B.S.; Ou, Y.N.; Yang, L.; Zhang, Y.R.; Huang, S.Y.; Chen, S.D.; Guo, Y.; Zhang, R.Q.; et al. Associations of resting heart rate with incident dementia, cognition, and brain structure: A prospective cohort study of UK biobank. Alzheimer’s Res. Ther. 2022, 14, 147.
29. Kroll, L.; Böhning, N.; Müßigbrodt, H.; Stahl, M.; Halkin, P.; Liehr, B.; Grunow, C.; Kujumdshieva-Böhning, B.; Freise, C.; Hopfenmüller, W.; et al. Non-contact monitoring of agitation and use of a sheltering device in patients with dementia in emergency departments: A feasibility study. BMC Psychiatry 2020, 20, 165.
30. Rigny, L.; Fletcher-Lloyd, N.; Capstick, A.; Nilforooshan, R.; Barnaghi, P. Assessment of Sleep Patterns in Dementia and General Population Cohorts Using Passive In-Home Monitoring Technologies. Commun. Med. 2024, 4, 222.
31. Arrieta, A.B. Explainable artificial intelligence (XAI): A review of methods and applications. Nat. Mach. Intell. 2019, 1, 201–211.
32. Sun, B. Generalizability in Machine Learning for Healthcare Applications. Nat. Med. 2023, 1, 1–10.
33. Lee, J. Towards Real-World Implementation of Machine Learning for Dementia Care: A Survey. In Proceedings of the IEEE Engineering in Medicine and Biology Society Conference (EMBC), Glasgow, UK, 11–15 July 2022; pp. 1–5.

Figure 1. End-to-end pipeline.

Figure 2. Data engineering pipeline.

Figure 3. Shapley values summary plot for the LightGBM model.

Table 1. TIHM dataset variable descriptions.

Dataset	Variable Name	Description
Common Variables	patient_id	Unique patient ID for all participants
	date	Date and time when input was recorded
Sleep	state	Sleep stage of patient (awake, light, deep, REM)
	heart_rate	Heart rate of patient during sleep at time of recording
	respiratory_rate	Respiratory rate of patient during sleep at time of recording
	snoring	Whether patient is snoring at time of recording (true/false)
Physiology	device_type	Variable of the PLwD measured (e.g., diastolic blood pressure, heart rate, etc.)
	value	Value of the variable measured
	unit	Unit of measurement (e.g., temperature—Celsius)
Labels	type	Type of episode (the target ‘agitation’ is the type of episode which we used for our study)

Table 2. Variable descriptions of final dataset.

Variable Name	Description
patient_id	Unique patient ID for all participants
window_period	The specific timeframe during which the recorded data are aggregated (e.g., 26 June 2019 night–27 June 2019 day).
mean_HR_sleep	The patient’s average heart rate (beats per minute) during sleep the night before
HR_var_sleep	The variability of heart rate during sleep
mean_RR_sleep	The average respiratory rate aggregated during sleep (breaths per minute)
RR_var_sleep	The variability in respiratory rate during sleep
WASO_sleep	The total time spent awake after initially falling asleep during the sleep period
SOL_sleep	The amount of time it took to transition from full wakefulness to sleep
TIB_sleep	The total time spent in bed by the patient, regardless of whether they were asleep or awake
TST_sleep	The total time spent asleep during the night
SE_sleep	The percentage of time in bed that was spent asleep
Heart Rate	The number of heartbeats per minute while the patient was awake
Agitation	A binary indicator of whether the patient exhibited agitation, marked as 1 for ‘yes’ and 0 for ‘no’
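
As a worked illustration of how SE_sleep relates to TST_sleep and TIB_sleep in Table 2, the sketch below computes sleep efficiency as the percentage of time in bed spent asleep. The function name, the choice of minutes as the unit, and the example values are assumptions; the table does not specify units, and other choices of denominator for sleep efficiency have been discussed in the literature [15].

```python
def sleep_efficiency(tst_minutes: float, tib_minutes: float) -> float:
    """Return sleep efficiency as a percentage: 100 * TST / TIB."""
    if tib_minutes <= 0:
        raise ValueError("time in bed must be positive")
    if not 0 <= tst_minutes <= tib_minutes:
        raise ValueError("total sleep time must lie between 0 and time in bed")
    return 100.0 * tst_minutes / tib_minutes


# Hypothetical night: 480 min in bed, of which 384 min were spent asleep.
print(sleep_efficiency(384, 480))  # 80.0
```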

Table 3. Performance metrics of various models.

Model Name	Accuracy	Precision	Recall	Weighted F1 Score
Random Forest	0.933	0.871	0.900	0.934
Gradient Boosting	0.933	0.848	0.933	0.934
XGBoost	0.952	0.879	0.967	0.953
LightGBM	0.943	0.853	0.967	0.944
Extra Trees	0.933	0.897	0.867	0.933
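
For readers unfamiliar with the weighted F1 score reported in Table 3, the following sketch shows one standard way to compute it: the F1 score of each class is averaged with weights proportional to the number of true samples in that class. The function names and the toy labels are illustrative assumptions and do not reproduce the study's evaluation code or data.

```python
from collections import Counter
from typing import Hashable, Sequence


def f1_for_class(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], label: Hashable) -> float:
    """F1 score treating label as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Average the per-class F1 scores, weighted by each class's support in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(f1_for_class(y_true, y_pred, label) * n / total for label, n in support.items())


# Toy labels: 0 = no agitation, 1 = agitation (invented for illustration).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]
print(f"{weighted_f1(y_true, y_pred):.3f}")  # 0.800
```

In this toy case the majority class has an F1 of about 0.833 and the minority class 0.750, so the weighted score of 0.800 sits closer to the majority class. With imbalanced labels, a weighted F1 can therefore look stronger than the minority-class F1 alone.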

Table 4. Results after 5-fold cross-validation.

Model	Mean Weighted F1 Score After 5-Fold CV
Random Forest	0.907941
Gradient Boosting	0.911245
XGBoost	0.898525
LightGBM	0.935402
Extra Trees	0.928545

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
