Sufficient, good-quality sleep is crucial for physical and mental health. There is established evidence that poor sleep may increase an individual’s risk for cardiovascular disease [1], metabolic disorders [3], and mental health problems [5]. Just as health is a multidimensional concept, sleep health can be described along multiple dimensions, including duration, continuity or efficiency, timing, daytime alertness, and overall subjective assessment of quality [6]. For measuring sleep, polysomnography (PSG) is the gold-standard procedure: it simultaneously monitors the electroencephalogram (EEG), electro-oculogram (EOG), electromyogram (EMG), electrocardiogram (ECG), and pulse oximetry, as well as airflow and respiratory effort. Despite the comprehensive information it generates, PSG has remained limited to sleep clinics and laboratories because of its high cost, obtrusiveness, and low usability.
In recent years, the rapid expansion of the sleep-tracking product market has created a significant opportunity to promote sleep monitoring and sleep health in daily life. Popular consumer sleep-tracking wearables (e.g., Fitbit, Apple Watch, Oura Ring) and mobile apps (e.g., SleepAsAndroid, SleepCycle) have attracted increasing interest from the research community as well as from end users. A large body of research has studied the accuracy and validity of these technologies [7], as well as user-perceived properties including usability and credibility [11]. Some studies have also developed novel sleep staging algorithms that work with the processed data available from consumer sleep trackers [14], and others have devised new analytic methods for personalized sleep data analysis and knowledge discovery [17].
Despite their popularity, consumer sleep-tracking technologies face several barriers to improving sleep quality. Two major barriers are “not identifying reasons for sleep problems” and “not knowing how to act” [...]. Research effort has been directed at these barriers. For example, several systems have been developed to help users explore the relationships between sleep and a range of behavioral and environmental data [11]. However, these systems still leave users to interpret the statistical results on their own, without providing specific, readily actionable recommendations based on those results. Consequently, current sleep-tracking technologies have limited efficacy in improving users’ sleep quality.
Recommender systems (RSs) are a promising way to close this gap in ubiquitous sleep computing research. An RS is an application that suggests relevant items to users [24]. Depending on the application context, the items could be movies, products, restaurants, travel routes, etc. At a high level, an RS attempts to predict whether the user will find a given item relevant. Early RSs operated in a two-dimensional (2D) user × item space and have been criticized for not analyzing contextual information [25]. More recently, the RS research community has shifted its focus to context-aware recommender systems (CARS) [27], which aim to effectively and efficiently exploit a user’s dynamic context to offer suitable and relevant recommendations [30]. In CARS, the classical 2D paradigm is extended to a 3D paradigm of user × item × context. The concept of CARS has also been introduced to health research, and many health CARS have been developed [32].
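To make the step from the 2D to the 3D paradigm concrete, the sketch below stores ratings keyed by (user, item, context) triples, in the spirit of a rating function over user × item × context. The user, the two sleep tips, the "workday"/"weekend" context labels, and all rating values are hypothetical, and a plain dictionary stands in for whatever model a real CARS would learn.

```python
from typing import Dict, Optional, Tuple

# A 2D recommender indexes ratings by (user, item).
Ratings2D = Dict[Tuple[str, str], float]
# A context-aware recommender adds a third index: (user, item, context).
Ratings3D = Dict[Tuple[str, str, str], float]

# Hypothetical data: the same user rates the same tip differently by context.
ratings_3d: Ratings3D = {
    ("alice", "no_caffeine_after_2pm", "workday"): 4.0,
    ("alice", "no_caffeine_after_2pm", "weekend"): 2.0,
    ("alice", "wind_down_routine", "workday"): 3.0,
    ("alice", "wind_down_routine", "weekend"): 5.0,
}


def collapse_to_2d(ratings: Ratings3D) -> Ratings2D:
    """Average over contexts, recovering the classical user x item view."""
    sums: Dict[Tuple[str, str], float] = {}
    counts: Dict[Tuple[str, str], int] = {}
    for (user, item, _context), value in ratings.items():
        sums[(user, item)] = sums.get((user, item), 0.0) + value
        counts[(user, item)] = counts.get((user, item), 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


def best_item(ratings: Ratings3D, user: str, context: str) -> Optional[str]:
    """Return the highest-rated item for a user in one specific context."""
    candidates = {
        item: value
        for (u, item, c), value in ratings.items()
        if u == user and c == context
    }
    return max(candidates, key=candidates.get) if candidates else None
```

In this toy data, the context-free 2D view would always favor `wind_down_routine` (mean 4.0 versus 3.0), whereas the 3D view recommends `no_caffeine_after_2pm` on workdays and `wind_down_routine` on weekends.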
Recommending behavioral interventions as a treatment for sleep problems is not new in sleep science. One of the most active research areas is the development of digitally aided cognitive behavioral therapy for insomnia (CBT-I). CBT-I is a standardized, multi-component treatment for insomnia recommended in the American Academy of Sleep Medicine Practice Guidelines [34]. A CBT-I program comprises sleep restriction therapy, stimulus control, relaxation strategies, sleep hygiene education, and modification of maladaptive beliefs about sleep. Several web and mobile applications, such as iREST [35] and Sleep Bunny [36], have been developed to improve the accessibility and availability of CBT-I. These systems are often implemented as mobile health (mHealth) apps, aiming to deliver cost-effective treatment that can easily be accessed at home. Meta-analyses of digital CBT-I have demonstrated significant improvements in sleep quality [37] and long-term benefits compared with pharmacotherapy [39]. However, most existing digital CBT-I systems merely translate self-help manuals into a digital format and provide general sleep hygiene recommendations. Only recently have new digital CBT-I systems started to offer tailored recommendations, and even these are only shallowly tailored to a limited set of user features.
Sleep health RSs for non-clinical populations, on the other hand, are relatively new and have only very recently started to attract research interest [40]. Popular consumer sleep trackers such as Fitbit and Oura provide generic sleep hygiene tips to facilitate stimulus control and relaxation before bed. While these recommendations may be shallowly tailored to a user’s gender and age, they do not consider the user’s dynamic context, such as shifts in their sleep quality baseline, sleep goal, health state, daily schedule, and personal preferences.
Overall, the research and development of sleep health systems that provide recommendations fully personalized and adapted to users’ static and dynamic context is still in its infancy. Here, we coin the term context-aware sleep health recommender system (CASHRS) for an emerging multidisciplinary research field at the intersection of multiple research domains, including health RS, ubiquitous and mobile computing, context-aware computing, persuasive technology, human-computer interaction, consumer electronics, and health informatics.
This paper aims to assess the extent and nature of the peer-reviewed publications that serve as a foundation for CASHRS. Interestingly, most of the systems reviewed in this article do not formally identify themselves as RSs despite offering tailored, actionable recommendations for improving sleep. Prior reviews of sleep apps exist, but they are notably limited to the traditional scope of mHealth [44]. To our knowledge, the present study is the first to examine digital sleep health systems through the lens of CARS and to formally introduce the concept of RS to ubiquitous sleep computing research. We outline some illustrative and innovative CASHRS to give sleep specialists an overview of the recent landscape. In addition, the results identify technical research trends in the literature, reveal limitations of prior studies, and point to future research directions for computer scientists and engineers who are interested in building CASHRS.
This study links together a scattered assortment of articles from multiple research domains on fully automated digital sleep health systems that provide personalized/tailored sleep health recommendations. Systems that only offer general sleep tips or that require manual prescriptions by clinicians were not of interest and were thus excluded. The narrative nature of this study allows us to obtain a broad perspective on the topic rather than a formal, exhaustive, and unbiased systematic appraisal [51]. A search was conducted in electronic databases including Google Scholar, PubMed, Scopus, IEEE Xplore, and ACM Digital Library using the following keywords: “sleep recommender system”, “sleep hygiene recommender system”, “self-experimentation sleep recommendation”, and “mHealth sleep”. Snowballing was performed to find related references. Both journal and conference articles were included if the recommendations provided were personalized/tailored to some degree, regardless of whether the systems were intended for clinical interventions or general-purpose use. Exclusion criteria were as follows: (1) general clinical sleep recommendation guidelines, (2) systems that are not fully automated and require guidance or assistance from a healthcare provider, (3) proprietary systems with limited accessibility, and (4) algorithms for recommending mattresses or pillows.
We were interested in what was considered context and how context was coupled to the recommendation algorithms, how these systems were evaluated and whether they were effective in improving sleep, what theories or techniques were used to facilitate behavior change, and what challenges and barriers were identified in the literature. The primary research questions of the current study were as follows:
RQ1: What was considered context in CASHRS? How was it measured?
RQ2: How was context coupled to the recommendation algorithms?
RQ3: What approaches and algorithms were used to manage the life cycle of context (i.e., context acquisition, context modeling, context reasoning, and context dissemination)?
RQ4: What theories or techniques were applied to encourage compliance with the recommendations and to promote positive behavior change?
RQ5: Were CASHRS approaches effective in fostering good sleep hygiene and improving sleep quality? Were the systems evaluated along other dimensions?
RQ6: What challenges and barriers were identified in prior studies?
In answering these questions, this research seeks to understand the scope and level of maturity of context-aware sleep recommendation technology and to lay a foundation for future CASHRS research.
We identified 12 systems that met the characteristics of CASHRS (Table 1), of which 7 were developed within the scope of ubiquitous self-tracking tools and 5 were developed as digital CBT-I solutions for clinical use. These systems were implemented either as stand-alone mHealth apps or as comprehensive systems that integrate data from wearable and IoT sensors into a mobile app. As illustrated in Figure 1, a CASHRS typically consists of four main components: the input data, a database, the recommendation algorithm, and the behavior change techniques applied. The input data are obtained either explicitly or implicitly to initialize the recommendation process. The database stores information about the users and the item profiles (e.g., sleep hygiene tips). The recommendation algorithm uses the input data and the database to suggest a set of behavior interventions to target users. In addition, a CASHRS needs to bridge the gap between recommendation and action, facilitate the initiation of behavior change, and encourage sustained compliance with the recommendations. The behavior change technique component is what distinguishes CASHRS from traditional CARS. In this section, we report a qualitative summary of the findings to answer the research questions listed in the previous section. Note that RQ3 is not addressed, as this topic was not explored in any of the studies reviewed.
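The four components described above can be wired together as a small pipeline. The following is a minimal sketch, not the architecture of any reviewed system: the `SleepTip` and `UserInput` types, the `target` field, the rule thresholds (100 mg caffeine, 5,000 steps), and the reminder wording are all hypothetical, and the simple rule table stands in for a knowledge-based recommendation algorithm.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SleepTip:
    """Item profile stored in the database (e.g., one sleep hygiene tip)."""
    tip_id: str
    text: str
    target: str  # hypothetical field: the contextual factor the tip addresses


@dataclass
class UserInput:
    """Input data gathered explicitly (self-report) or implicitly (sensors)."""
    user_id: str
    context: Dict[str, float] = field(default_factory=dict)


# Hypothetical rules standing in for evidence-based domain knowledge;
# the thresholds are illustrative only, not clinical guidance.
Rule = Callable[[Dict[str, float]], bool]
RULES: Dict[str, Rule] = {
    "caffeine": lambda ctx: ctx.get("caffeine_mg_after_noon", 0.0) > 100.0,
    "activity": lambda ctx: ctx.get("steps", 10_000.0) < 5_000.0,
}


def recommend(user: UserInput, database: List[SleepTip]) -> List[SleepTip]:
    """Recommendation algorithm: keep the tips whose target rule fires."""
    selected = []
    for tip in database:
        rule = RULES.get(tip.target)
        if rule is not None and rule(user.context):
            selected.append(tip)
    return selected


def apply_bct(tips: List[SleepTip]) -> List[str]:
    """Behavior change component: here, simply phrase each tip as a reminder."""
    return [f"Reminder: {tip.text}" for tip in tips]


# Usage with made-up data.
database = [
    SleepTip("t1", "Avoid caffeine after noon.", "caffeine"),
    SleepTip("t2", "Take a 20-minute walk today.", "activity"),
    SleepTip("t3", "Keep the bedroom dark.", "light"),
]
user = UserInput("u1", {"caffeine_mg_after_noon": 150.0, "steps": 8_000.0})
messages = apply_bct(recommend(user, database))
```

Keeping the behavior change step as a separate function mirrors the point that this component sits on top of the recommendation output rather than inside the algorithm; a real system would replace the reminder wrapper with richer techniques.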
3.1. Context in CASHRS (RQ1)
A CASHRS differs from a traditional digital sleep hygiene education system in that its recommendations are, to some degree, adapted to the user’s context. In computer science, context is defined as “any information that can be used to characterize the situation of an entity. An entity is a person, place, or object that is considered relevant to the interaction between a user and an application, including the user and application themselves” [...]. Context-aware systems use “context to provide relevant information and/or services to the user, where relevancy depends on the user’s task” [...]. Accordingly, a user’s context in CASHRS refers to any factor(s) that may influence the quality and continuity of the user’s sleep at night.
A large body of sleep science studies has identified many factors that can influence nighttime sleep. Demographic characteristics such as age and gender are known to be associated with sleep quality and with varying risk of sleep disorders [63]. Daytime events such as physical activity, exercise, and diet can affect individual sleep stages and sleep efficiency [64]. Sleep hygiene [67], which concerns the regularity of the sleep schedule and the minimization of potential sleep-disturbing factors close to bedtime (e.g., avoiding exposure to blue light, optimizing the bedroom environment), also has a profound impact on sleep [68]. In ubiquitous sleep computing, it has been acknowledged that collecting information on the contextual factors of sleep is important for interpreting self-tracked sleep data [11]. Interventions targeting some of these factors may not only help improve the sleep quality of healthy individuals with no diagnosed sleep problems but also help patients reduce insomnia and improve sleep apnea symptoms [70]. As such, these factors constitute the context of interest in CASHRS. We found that existing CASHRS approaches monitored one or several of the factors summarized in Table 2. Note that we only counted contextual information that was actually used in a CASHRS recommendation algorithm; information that was captured by a system but not coupled to its recommendation algorithm was excluded. These contextual factors may fluctuate at different frequencies. For example, resting heart rate and ambient temperature normally do not change dramatically over a short period, while physical activity level and sleep quality can vary a great deal from day to day.
Interestingly, the sleep quality of previous nights was the context most widely used to personalize behavior intervention recommendations in existing CASHRS, followed by physical activity. In particular, clinical CASHRS often relied on a user’s average sleep metrics over the past week, such as time in bed (TIB), total sleep time (TST), and sleep efficiency (SE), together with the user’s preferences, to adjust the recommended sleep window (either TST or bedtime/wake time) in sleep restriction therapy [55]. By contrast, temporal context (e.g., time of day, weekday, meal time) was considered in only three CASHRS [22]. Environmental context describing the conditions under which sleep takes place, such as ambient temperature, light, and noise levels, was considered in two CASHRS [22]. In addition to monitoring the environmental context before bedtime and during sleep, the Lullaby system also placed strong emphasis on monitoring these factors during the daytime using its embedded sensors [22]. Psychosocial factors such as stress and anxiety were rarely considered. Many clinical CASHRS provided guided progressive relaxation before sleep but did not allow users to assess or record their stress/anxiety level. Taken together, the variety of contexts considered in CASHRS so far is limited, and the improvisational aspects of users’ behavioral context (which require intraday data collection at higher resolution) were out of scope in all systems.
3.2. Recommendation Algorithm and Context Filtering (RQ2)
There are several methods for generating recommendations. Recommendation algorithms can be collaborative [72], content-based [74], knowledge-based [75], or hybrid [77]. Collaborative algorithms attempt to estimate a user’s unknown preferences based on the ratings of similar users; this approach is popular because of its simplicity. Content-based algorithms recommend items that are similar to those a target user preferred in the past, where the similarity between items is computed from their own characteristics (e.g., features or attributes) rather than from other users’ ratings. Knowledge-based algorithms exploit structured domain knowledge as auxiliary information to improve the precision, diversity, and interpretability of the recommendations. Hybrid algorithms combine two or more types of algorithms to overcome the limitations of each individual type and to increase the quality of the recommendations.
In a related vein, there are three approaches to incorporating contextual information [80]: pre-filtering, post-filtering, and contextual modeling. Pre-filtering uses contextual information to filter the data before a traditional recommendation algorithm is applied, so that only items relevant to a given context are used to generate recommendations. Post-filtering considers contextual information only in the final step of the recommendation process; in other words, recommendations are generated using traditional methods and then contextualized for each user. These two methods treat context as an additional filtering constraint that can be applied to any traditional recommendation algorithm. Contextual modeling, in contrast, takes a fundamentally different approach by incorporating contextual information directly into the recommendation model and using it in the estimation of ratings: it first models the contextual data and then parameterizes the recommendation algorithm as a function of that contextual model. Studies comparing the different approaches have not reached conclusive findings on their relative performance [81]. A few studies have also proposed combining these approaches for better system performance [82].
The recommendation algorithms and context filtering approaches adopted in the reviewed CASHRS are summarized in a separate table. A knowledge-based algorithm was used in almost all CASHRS. This is plausible, as the generation of behavior interventions needs to be grounded in evidence-based sleep science domain knowledge, which has been obtained through large-sample, population-level sleep studies. A problem with relying on population-level knowledge is that, because of interpersonal differences, the recommendations may not produce the desired homogeneous response across individual users [83]. Only three systems implemented recommendation algorithms other than a knowledge-based one: CBSR [42] and PARIS [40] used a collaborative algorithm to overcome the cold-start problem, while PUM [41] combined knowledge-based with content-based algorithms to leverage a user’s self-knowledge extracted from self-tracking data. In a sense, the collaborative and content-based algorithms used in these three systems still relied on some kind of knowledge, but that knowledge was obtained through new approaches that deviate from the traditional large-sample experimental design. CBSR [42] and PARIS [40] relied on crowdsourcing, in which the autonomous aggregation of data from an unprecedented number of real-world users enabled the discovery of new knowledge that would have been difficult to obtain in a traditional experimental design. In comparison, the principles of PUM [41] and PARIS [40] centered on the discovery of users’ self-knowledge from their historical self-tracking data. The three systems present promising directions for devising novel recommendation algorithms in CASHRS.
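The difference between pre-filtering and post-filtering can be shown with one context-unaware base recommender wrapped in two ways. In this sketch, the observation log, the "morning"/"evening" contexts, the item names, and the `allowed` mapping of items to contexts are all hypothetical. Contextual modeling is not shown, because it would require changing the base model itself.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical log of (item, context, rating) observations for one user.
Observation = Tuple[str, str, float]


def base_recommender(observations: List[Observation], k: int) -> List[str]:
    """Context-unaware baseline: rank items by mean rating (ties by name)."""
    totals: Dict[str, List[float]] = {}
    for item, _context, rating in observations:
        totals.setdefault(item, []).append(rating)
    means = {item: sum(v) / len(v) for item, v in totals.items()}
    return sorted(means, key=lambda item: (-means[item], item))[:k]


def pre_filter(observations: List[Observation], context: str, k: int) -> List[str]:
    """Pre-filtering: keep only data from the target context, then recommend."""
    relevant = [o for o in observations if o[1] == context]
    return base_recommender(relevant, k)


def post_filter(observations: List[Observation], context: str, k: int,
                allowed: Dict[str, Set[str]]) -> List[str]:
    """Post-filtering: rank on all data, then drop items not allowed in context."""
    ranked = base_recommender(observations, k=len(observations))
    return [item for item in ranked if item in allowed[context]][:k]


obs: List[Observation] = [
    ("bright_light_walk", "morning", 5.0),
    ("bright_light_walk", "morning", 4.0),
    ("warm_shower", "evening", 5.0),
    ("warm_shower", "morning", 2.0),
    ("reading", "evening", 4.0),
]
allowed = {"evening": {"warm_shower", "reading"}, "morning": {"bright_light_walk"}}
```

With this data, pre-filtering for "evening" returns `warm_shower`, because its evening-only rating is 5.0, whereas post-filtering returns `reading`, because `warm_shower`'s overall mean is pulled down by a morning rating. The two strategies can therefore disagree on identical data.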
With respect to context filtering, all existing CASHRS relied on post-filtering, performing post hoc selection based on certain types of context. Clinical CASHRS that offer digital CBT-I solutions tend to use the average sleep duration over the past week (either TIB or TST), together with the user’s sleep goal and preferences, to adjust the recommended sleep window (either TIB or bedtime/wake time) in sleep restriction therapy. In contrast, general-purpose CASHRS focused more on other types of context, including physical activity, meal time, and bedroom environment, to tailor the behavior intervention recommendations.
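A sleep-window adjustment of this kind can be sketched as a rule applied to a week of diary data. The sketch computes SE as TST divided by TIB (×100). It then widens the window by 15 min when weekly SE is at least 90%, narrows it by 15 min when SE is below 85%, and never lets it fall below 5 h. These thresholds, the step size, and the floor are assumptions that resemble commonly described sleep restriction titration rules. They vary across CBT-I protocols and are not taken from any reviewed system.

```python
from statistics import mean
from typing import List

STEP_MIN = 15          # assumed adjustment step, in minutes
MIN_WINDOW_MIN = 300   # assumed floor: never prescribe less than 5 h in bed
UPPER_SE = 90.0        # assumed threshold (%) for widening the window
LOWER_SE = 85.0        # assumed threshold (%) for narrowing the window


def sleep_efficiency(tst_min: float, tib_min: float) -> float:
    """SE (%) = total sleep time / time in bed x 100."""
    return 100.0 * tst_min / tib_min


def next_sleep_window(tst_week: List[float], tib_week: List[float],
                      current_window_min: float) -> float:
    """Return the next prescribed time-in-bed window from a week of diary data."""
    # Weekly SE is computed from the weekly means (a ratio of means).
    weekly_se = sleep_efficiency(mean(tst_week), mean(tib_week))
    if weekly_se >= UPPER_SE:
        window = current_window_min + STEP_MIN
    elif weekly_se < LOWER_SE:
        window = current_window_min - STEP_MIN
    else:
        window = current_window_min
    return max(window, MIN_WINDOW_MIN)
```

For example, seven nights of 400 min asleep out of 420 min in bed (SE ≈ 95%) widen a 420-min window to 435 min. Because the rule runs after the week's data are summarized and only adjusts the prescribed window, it fits the post-filtering pattern described above.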
We found that personalization was not binary. As shown in Table 4, a large portion of the reviewed CASHRS provide a mixture of personalized and general recommendations. Clinical CASHRS have focused predominantly on tailoring sleep windows, while keeping the other recommendations consistent with the standard CBT-I content in the literature. General-purpose CASHRS personalize more aspects of their recommendations, probably because they can collect a wide range of self-tracking data with multimodal sensors. Notably, ShutEye [53] provides general sleep hygiene tips but adjusts them by time of day. For example, meal-time recommendations were presented to users from wake time until 3 h before bed, while relaxation tips were only presented from 1 h before bedtime until bedtime. Lullaby [22] also leverages temporal context (i.e., time of day, weekday) to recommend the optimal bedroom environment, such as temperature, light, and noise. SleepCoacher [43] and PUM [41] provide fully personalized recommendations based on statistical analysis and data mining of users’ self-tracking data.
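ShutEye's time-of-day behavior can be expressed as a simple temporal post-filter. The two windows in the following sketch come from the description above: meal-timing tips run from wake time until 3 h before bed, and relaxation tips run from 1 h before bed until bedtime. The function name, the category labels, and the choice of inclusive boundaries are hypothetical, and ShutEye's actual implementation is not described in the source. Full `datetime` values are used so that bedtimes after midnight are handled correctly.

```python
from datetime import datetime, timedelta
from typing import List


def visible_tip_categories(now: datetime, wake: datetime, bed: datetime) -> List[str]:
    """Return the tip categories to show at `now` (hypothetical labels).

    Windows (boundaries assumed inclusive):
    - meal timing: from wake time until 3 h before the planned bedtime
    - relaxation: from 1 h before the planned bedtime until bedtime
    """
    categories = []
    if wake <= now <= bed - timedelta(hours=3):
        categories.append("meal_timing")
    if bed - timedelta(hours=1) <= now <= bed:
        categories.append("relaxation")
    return categories
```

With a 07:00 wake time and a 23:00 bedtime, a request at 12:00 yields meal-timing tips, one at 20:30 yields no tips, and one at 22:30 yields relaxation tips.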
Two CASHRS approaches pioneered a new concept: self-experimentation. SleepCoacher [43] and SleepBandit [54] developed design probes that support users in investigating personal sleep factors through self-experimentation and reflection. Although it is a lesser-known concept in the ubiquitous computing community, self-experimentation is a promising method for generating fully personalized sleep health recommendations based on a user’s self-knowledge in addition to population-level knowledge. Self-experimentation is grounded in the principle of N-of-1 trials, or single-case designs, in personalized medicine [84], which involve repeated, prospective, and quantitative measurement of outcomes of interest in a single subject to identify the optimal treatment for that individual. N-of-1 trials are considered well suited to helping people find the best behavioral interventions for their health [84]. Building on these principles, self-experimentation helps address the heterogeneity of treatment effect (HTE) issue in existing health RS approaches. The traditional large-sample design widely used in clinical studies fails to account for the characteristics of individual subjects (within-subject variation) [86]. Recommendations generated from population-level knowledge thus may not generalize well to individuals, because individual treatment effects vary across people [42]. In contrast, CASHRS with a self-experimentation feature guide users to control confounding factors and to intentionally increase data variability, which makes it feasible to discover patterns and knowledge specific to the user from whom the data are collected.
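The following sketch shows a minimal self-experiment in the N-of-1 spirit. Days are randomly assigned to condition A or B (for example, with or without a hypothetical caffeine cut-off), a nightly outcome such as SE is logged, and the two conditions are compared with a permutation test on the difference in means. The conditions, the data, and the choice of a permutation test are all assumptions for illustration. This is not the analysis used by SleepCoacher or SleepBandit.

```python
import random
from statistics import mean
from typing import List, Tuple


def assign_conditions(n_days: int, seed: int) -> List[str]:
    """Randomly assign (as evenly as possible) days to conditions 'A' and 'B'."""
    days = ["A"] * (n_days // 2) + ["B"] * (n_days - n_days // 2)
    random.Random(seed).shuffle(days)
    return days


def permutation_test(a: List[float], b: List[float],
                     n_perm: int = 5000, seed: int = 0) -> Tuple[float, float]:
    """Two-sided permutation test on the difference in means (B minus A)."""
    observed = mean(b) - mean(a)
    pooled = a + b  # new list, so shuffling does not touch the inputs
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[len(a):]) - mean(pooled[:len(a)])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value


# Hypothetical nightly sleep efficiency (%) under each condition.
se_a = [70.0, 72.0, 68.0, 71.0, 69.0, 73.0, 70.0]
se_b = [85.0, 88.0, 84.0, 86.0, 87.0, 83.0, 85.0]
diff, p = permutation_test(se_a, se_b)
```

A real N-of-1 design would also need to consider carry-over effects between consecutive nights and autocorrelation in the outcome. Alternating or blocked schedules and washout periods are common ways to mitigate these.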
3.3. Behavior Change Techniques Incorporated in CASHRS (RQ4)
A main goal of CASHRS is to influence user behavior. However, behavioral changes that improve sleep quality can be hard to achieve and sustain. The raw outputs of recommendation algorithms may have little intervention effect on their own and need to be enhanced with behavior change techniques (BCTs). In clinical terms, both the recommendation content and the BCTs adopted are components of the behavior intervention. As such, theoretical models and techniques from health psychology and behavioral medicine have become an essential part of CASHRS.
We found that all 12 CASHRS incorporated one or more evidence-based BCTs, and some BCTs appeared noticeably more often than others. We coded the techniques according to the BCT Taxonomy v1 [88]. The taxonomy organizes behavior change techniques into 16 groupings and is widely considered the gold standard for designing and reporting behavior change research. Eleven of the 16 groupings in the BCT Taxonomy v1 were incorporated in the reviewed CASHRS. We further divided the adopted BCTs into three categories. Category-I BCTs (Table 5) target behavior before sleep. Five BCT groupings fall into this category: goals and planning, repetition and substitution, antecedents, shaping knowledge, and regulation. Goals and planning sits at the beginning of the behavior change trajectory and is a key feature of clinical CASHRS; setting realistic goals for sleep and behavior is usually the first step in a digital CBT-I program. Sleepio [55] has the richest features for supporting goals and planning, followed by Sleepcare [56] and SMSR [57]. Habit formation based on repetition and substitution was the most widely adopted BCT in category I (used in 8 of the 12 CASHRS). Allowing users to set reminders and providing guided relaxation were also popular BCTs (used in 5 of the 12 CASHRS). Notably, Lullaby [22] incorporated no category-I BCT.
Category-II BCTs (Table 6) focus on the future outcomes of sleep or the consequences of poor sleep. These BCTs include feedback and monitoring, reward and threat, and natural consequences. Self-tracking or daily logging of sleep was incorporated in all the reviewed CASHRS; indeed, the quantified-self practice of sleep tracking serves as a foundation for all sleep-related ubiquitous computing systems. Conversely, self-tracking or daily logging of behavior was incorporated in only five systems, indicating that less attention has been paid to monitoring behavior in current CASHRS research. Interestingly, social reward, a BCT commonly used in other health domains (e.g., physical activity), was incorporated only in Sleepio [55].
In addition, two BCTs targeting social support have been used in CASHRS, as shown in Table 7. Sleepio [55] extensively applied social cognitive theory to facilitate behavior change. Peer influence and peer support were implemented throughout the whole CBT-I program. When users first log in to Sleepio, they read about Sally’s personal story with insomnia and the many ways that Sleepio has helped her improve her sleep and her life in general. The online Sleepio community encourages users to connect with others facing similar issues or to seek personalized guidance and reassurance. The ability to communicate with other users in the Sleepio community motivated users to complete the program and promoted long-term engagement [55]. Sleepio users mentioned a “reduced sense of isolation”, the “community being supportive and nonjudgmental”, “positive comparison”, and “altruism” as some of the reasons they engaged in the Sleepio online community [89].
3.4. System Evaluation (RQ5)
While the evaluation of traditional RSs has emphasized the accuracy of recommendation algorithms in predicting a user’s preferences, the evaluation of CASHRS spans more dimensions. Given the interactive nature of CASHRS, system properties such as efficacy in improving sleep outcomes and users’ experience and satisfaction with the system are equally, if not more, important. Good sleep health recommendations should not only accurately reflect a user’s needs but also be achievable given the user’s physiological state and motivation, and implementable given the user’s daily schedule and living environment.
There are three methods for evaluating RSs: offline evaluation, user studies, and online experiments [90]. The offline method uses pre-collected datasets together with simulated user behavior to examine the accuracy of different recommendation algorithms. User studies and online experiments examine the overall system outcomes on target measures as well as users’ interaction with the system; the only difference between the two is whether the experiments are conducted in a controlled laboratory environment or in a naturalistic setting.
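For reference, offline evaluation typically scores each algorithm's ranked output against held-out data. The following sketch uses precision@k, which is one common accuracy metric but not one named in the text, and the users, items, and held-out sets are hypothetical.

```python
from typing import Dict, List, Set


def precision_at_k(recommended: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k recommended items found in the held-out relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    return sum(1 for item in top_k if item in relevant) / k


def mean_precision_at_k(recs: Dict[str, List[str]],
                        held_out: Dict[str, Set[str]], k: int) -> float:
    """Average precision@k over the users present in both dictionaries."""
    users = sorted(set(recs) & set(held_out))
    if not users:
        return 0.0
    return sum(precision_at_k(recs[u], held_out[u], k) for u in users) / len(users)


# Hypothetical ranked recommendations and held-out "helpful" tips per user.
recs = {"u1": ["a", "b", "c"], "u2": ["c", "d", "e"]}
held_out = {"u1": {"a", "c"}, "u2": {"x"}}
```

Here `mean_precision_at_k(recs, held_out, 2)` averages 0.5 for u1 and 0.0 for u2 to give 0.25. A metric of this kind measures agreement with past data only and says nothing about whether users' sleep improves, which is consistent with the observation that CASHRS have instead been evaluated through user studies.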
We found that none of the reviewed CASHRS approaches was evaluated offline; instead, all of them were evaluated via user studies, either in a controlled setting or in the wild. For CASHRS, algorithm accuracy appeared to be less of a concern than overall efficacy and users’ experience with the system. In what follows, we first assess system maturity and then report evaluations centered on the efficacy of the systems in improving sleep, followed by evaluations centered on how users interact with the systems.
3.4.1. System Maturity
Inspired by [58], we assessed the maturity levels of the reviewed CASHRS on a continuum from pre-prototype to prototype to matured to released. A pre-prototype refers to a CASHRS at the stage of conception or algorithm design without a functional app prototype. A prototype refers to a version of a CASHRS app with minimal working functionality for user testing. A matured version refers to a version of a CASHRS app that has been redesigned based on feedback from user testing. A released version refers to a version of a CASHRS app that is published in app stores for download. As shown in Table 8, three systems were pre-prototypes, four were prototypes, one was matured, and four were released.
3.4.2. Evaluation Centered on Efficacy
In clinical settings, the evaluation of CASHRS was often centered on the efficacy of the recommendations in helping users improve sleep quality. Human sleep can be measured along multiple dimensions, and sleep quality can be quantified using various metrics, so the metrics of interest varied across systems. Table 9 summarizes the sleep metrics considered in the evaluation of CASHRS, how the metrics were measured, and the direction of change before and after a clinical trial or field study (↓, ↑, and → indicate reduction, increase, and no change, respectively). Both objective and subjective measures of sleep were used. Clinical studies predominantly relied on retrospective questionnaires and sleep diaries to capture self-reported appraisals of sleep, and objective measures were noticeably underused. Most studies used simple statistical tests, while [56] used a more advanced multilevel analysis. We only included conclusive, statistically significant results. Some studies also considered additional metrics, including the Insomnia Severity Index (ISI), daytime sleepiness, and attitudes/beliefs about sleep [59]. Multiple studies consistently demonstrated increased SE, reduced ISI, and reduced daytime sleepiness. The efficacy on sleep onset latency (SOL), TST, wake after sleep onset (WASO), and subjective sleep ratings was mixed.
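When an objective sleep/wake record is available, several of these metrics can be derived from it directly. The following sketch assumes a 0/1 wake/sleep sequence of 30-s epochs, with 30-s epochs being the usual PSG scoring unit, that spans the whole time in bed. It uses simplified definitions: SOL is measured to the first sleep epoch, and WASO is counted from sleep onset to the final sleep epoch, although some definitions extend WASO to lights-on. All data are hypothetical.

```python
from typing import Dict, List

EPOCH_MIN = 0.5  # assumed 30-second scoring epochs


def sleep_metrics(hypnogram: List[int]) -> Dict[str, float]:
    """Summary metrics (minutes, SE in %) from a 0/1 sequence covering time in bed.

    1 = sleep epoch, 0 = wake epoch.
    """
    tib = len(hypnogram) * EPOCH_MIN
    if 1 not in hypnogram:
        return {"TIB": tib, "TST": 0.0, "SE": 0.0, "SOL": tib, "WASO": 0.0}
    onset = hypnogram.index(1)
    last_sleep = len(hypnogram) - 1 - hypnogram[::-1].index(1)
    tst = sum(hypnogram) * EPOCH_MIN
    # Wake epochs between sleep onset and the final sleep epoch.
    waso = hypnogram[onset:last_sleep + 1].count(0) * EPOCH_MIN
    return {
        "TIB": tib,
        "TST": tst,
        "SE": 100.0 * tst / tib,
        "SOL": onset * EPOCH_MIN,
        "WASO": waso,
    }


# Hypothetical night: 10 min awake, 300 min asleep, 5 min awake,
# 150 min asleep, then 3 min awake before getting up.
night = [0] * 20 + [1] * 600 + [0] * 10 + [1] * 300 + [0] * 6
metrics = sleep_metrics(night)
```

For this night, the sketch reports TIB = 468 min, TST = 450 min, SOL = 10 min, WASO = 5 min, and SE ≈ 96.2%. Diary-based versions of the same metrics would instead use self-reported times.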
We identified several issues with the existing evaluation paradigm. First, overall system-level evaluation makes it impossible to pinpoint which intervention modules of the CBT-I protocol contributed most or least to system efficacy. Second, adherence to the treatment plan and compliance with the behavior intervention recommendations, which are described in detail in the next subsection, could also confound the efficacy of CASHRS but were not discussed. Third, all studies relied on statistical analysis rather than clinical thresholds to indicate whether a target sleep outcome had improved. For example, the average Pittsburgh Sleep Quality Index (PSQI) score in [56] was reduced from 11.0 to 7.4, a statistically significant improvement. However, a PSQI score above 5 indicates that sleep problems are still present, albeit with reduced severity. Future studies are needed to address these issues.
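The gap between statistical and clinical improvement can be made explicit with a threshold check. The following sketch applies the conventional PSQI cutoff, under which a global score above 5 on the 0 to 21 scale indicates poor sleep quality, to the before and after values quoted above. Applying the cutoff to group means is a simplification, since in practice each participant would be classified individually. The function names are hypothetical.

```python
PSQI_CUTOFF = 5  # conventional threshold: global PSQI > 5 indicates poor sleep


def psqi_category(global_score: float) -> str:
    """Classify a PSQI global score (range 0-21) against the conventional cutoff."""
    if not 0 <= global_score <= 21:
        raise ValueError("PSQI global score must lie between 0 and 21")
    return "poor sleep" if global_score > PSQI_CUTOFF else "good sleep"


def crossed_clinical_threshold(before: float, after: float) -> bool:
    """True only if the score moved from the poor range into the good range."""
    return (psqi_category(before) == "poor sleep"
            and psqi_category(after) == "good sleep")
```

Under this check, the reported change from 11.0 to 7.4 does not count as crossing the threshold, because 7.4 is still classified as poor sleep, whereas a hypothetical change from 11.0 to 4.0 would.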
3.4.3. Evaluation Centered on Human-Computer Interaction
RS research has consistently emphasized the need to engage users and minimize their interaction effort, in addition to generating useful and trustworthy recommendations [96]. Because users are at the center of digital health technologies, it is crucial to understand how end users interact with CASHRS. As such, user-centered evaluation has been adopted alongside efficacy-centered evaluation to gain insight into users’ engagement with the system and compliance with the recommendations, as well as the perceived usability and usefulness of the system.
As shown in Table 10, the most widely used methods for user-centered evaluation were open-ended questions in surveys/questionnaires and semi-structured interviews. These methods allow researchers to collect qualitative data that are not directly observable. Such qualitative insights can answer a wide set of questions, such as whether users enjoyed the user interface, why users perceived the system as useful or useless in affecting their behavior, and how the system could be improved.
Usability is an important aspect of CASHRS and can be assessed either quantitatively or qualitatively. On the quantitative side, the user version of the Mobile Application Rating Scale (uMARS) [97] and the System Usability Scale (SUS) [98] are the most widely used. Some studies also devised original surveys to collect users’ perceptions of system usability [57]. On the qualitative side, researchers collected users’ feedback using open-ended survey questions or semi-structured interviews. These qualitative data were then analyzed using standard qualitative data analysis methods, such as thematic analysis. In a related vein, Refs. [42] investigated users’ perceived usefulness of the recommendations. Participants mentioned increased awareness of their current sleep habits and their impact on sleep, as well as how social comparison motivated behavior change.
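Because the SUS yields a single 0 to 100 score from ten Likert items, its standard scoring rule is easy to get wrong when implemented by hand. The sketch below follows the usual procedure. The response vector is a hypothetical example and is not drawn from any reviewed study.

```python
def sus_score(responses: list[int]) -> float:
    """Compute the 0-100 SUS score from ten responses on a 1-5 scale.

    Odd-numbered items (1, 3, 5, 7, 9) are positively worded and contribute
    (response - 1); even-numbered items are negatively worded and contribute
    (5 - response). The raw 0-40 sum is multiplied by 2.5.
    """
    if len(responses) != 10 or any(r not in range(1, 6) for r in responses):
        raise ValueError("SUS expects ten responses on a 1-5 scale")
    raw = sum(
        (r - 1) if i % 2 == 0 else (5 - r)  # list index 0 is item 1 (odd-numbered)
        for i, r in enumerate(responses)
    )
    return raw * 2.5


print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))  # 75.0
```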
Users’ adherence to a CASHRS is important because it is a precursor to subsequent behavior change. Existing CASHRS approaches examined users’ adherence based either on app usage patterns [59] or on the dropout rate during a trial [89]. In [59], the authors approximated app usage patterns by analyzing app events, which were logged with a time stamp each time the user tapped in the app. Each user’s overall app usage (including app-opening events and usage days) and use of each active component were assessed. Participants used the app on 50.2% of days during the treatment period and on 30.3% of days during the follow-up period. In a clinical trial of Sleepio, the dropout rate was less than 20%, and 75% of patients completed follow-up [89].
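A "percentage of days used" metric like the one reported in [59] can be derived from time-stamped app events. The sketch below assumes each logged event is simply a `datetime`. The log format and the function name are hypothetical and do not reproduce the authors’ pipeline. Treatment and follow-up rates would come from calling it once per period.

```python
from datetime import date, datetime


def usage_day_ratio(event_times: list[datetime], start: date, end: date) -> float:
    """Fraction of calendar days in [start, end] with at least one app event."""
    n_days = (end - start).days + 1
    if n_days <= 0:
        raise ValueError("end must not precede start")
    # Several taps on the same day count as a single usage day.
    active_days = {t.date() for t in event_times if start <= t.date() <= end}
    return len(active_days) / n_days


# Hypothetical log: taps on days 1 (twice), 2 and 4 of a 4-day period.
events = [
    datetime(2024, 3, 1, 22, 5),
    datetime(2024, 3, 1, 22, 40),
    datetime(2024, 3, 2, 7, 15),
    datetime(2024, 3, 4, 21, 30),
]
print(usage_day_ratio(events, date(2024, 3, 1), date(2024, 3, 4)))  # 0.75
```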
Another aspect pertaining to the efficacy of CASHRS is users’ compliance with the recommendations. Adherence and compliance measure different aspects of user behavior. Adherence only reflects whether a user has consistently engaged with the system and is often obtained by analyzing app usage. Compliance, in contrast, refers to whether a user follows a given recommendation; app usage alone does not indicate that the user followed it. Some studies therefore relied on alternative ways to collect information on users’ compliance with behavior intervention recommendations, either by seeking users’ direct input (e.g., self-report) [42] or by only sending recommendations that could be verified from passively collected data [42]. Overall, adherence and compliance varied considerably across CASHRS.
3.5. Challenges Identified in Prior Studies (RQ6)
Existing studies highlighted several challenges and barriers to developing CASHRS that generate relevant, actionable, credible, and personalized recommendations.
Non-compliance with the recommendations provided by CASHRS was highlighted as a challenge in several studies [42]. Compliance is a complicated issue in behavior intervention because many factors can influence a user’s decision on whether to follow a recommendation, and not all of these factors are likely to be within the user’s control. Non-compliance can therefore be either intentional or unintentional. On the one hand, people are less likely to change their behavior when the effort needed to adjust to a new sleep behavior is perceived to outweigh the potential benefits. In [53], some participants mentioned that they at times dismissed the recommendations because they preferred to hang out with friends at night. On the other hand, recommendations were not always actionable or achievable. Lifestyle, environment, and resource constraints can all hamper compliance. For instance, living near a bar may hinder a user’s ability to control ambient noise during sleep [53], and a hectic work schedule can make it difficult to extend sleep hours [42]. Often, a lack of knowledge of how to achieve a recommended behavior change plays a critical role in non-compliance: people may not know how to relax the body and mind, even when the system recommends relaxation before bedtime. It is therefore important to provide supplementary materials or a step-by-step guide on how to implement a behavior change plan.
A lack of perceived credibility is a challenge in itself and can also exacerbate non-compliance, especially if users feel better when they are not compliant [42]. Some participants in [54] found the recommendations unconvincing and lacking in novelty. Recommendations that conflict with users’ mental models of sleep or that are poorly phrased may both undermine trustworthiness [42]. Technology dictatorship and privacy concerns are two barriers related to credibility [53]. Some participants complained of feeling that the technology was dictating what they should and should not do [53], which may cause mental stress and rumination. From a technical perspective, predicting which sleep metrics interest a target user is challenging. Human sleep is multidimensional and can be measured using a number of metrics, and recommendations targeting sleep metrics that do not interest the user were found to potentially reduce perceived usefulness and trustworthiness [42]. One possible solution, suggested in [42], is to ask users directly which sleep dimension they intend to improve or would like to focus on.
The purpose of this review was to assess the current landscape of CASHRS research. Specifically, we aimed to examine what types of context were considered and how they were measured, what BCTs were incorporated to promote positive behavior change, and how the systems were evaluated.
The literature on CASHRS remains small and lacks a systematic framework. In CARS research, sleep has not been studied to the same extent as other health-related topics such as exercise and diet, probably owing to concerns that digital technologies may be ill suited to sleep because they may interfere with sleep itself. Nevertheless, most of the studies reviewed in this work established evidence for the efficacy of CASHRS in improving sleep. CASHRS approaches also have unique features not seen in other health CARS, because sleep comprises multidimensional constructs that are often not directly controllable. Meanwhile, we found that prior research has mostly centered on the clinical efficacy of the systems and on how users interact with them. No study has systematically examined technical aspects such as the recommendation algorithms and the context life cycle. In what follows, we discuss current research trends in CASHRS and opportunities for future research.
4.1. Research Trends in CASHRS
Our analysis results revealed three trends in current CASHRS research: algorithm development, BCT incorporation, and self-experimentation.
The recommendation algorithm is a core element of a CASHRS. It uses the system’s input data and database to suggest a set of behavior interventions to the target user. Eleven of the 12 CASHRS in our review relied on a knowledge-based recommendation algorithm, which is understandable given the importance of incorporating evidence-based sleep domain knowledge. However, there is no guarantee that population-level knowledge generalizes well to individuals. A few recent studies combined a knowledge-based algorithm with collaborative filtering (i.e., PARIS [41] and CBSR [42]) or content-based filtering (i.e., PUM [40]). Prior studies on computer-tailored digital health programs concluded that adding collaborative filtering (e.g., based on demographic information) as a second step after knowledge-based filtering could potentially enhance users’ experience with the RS [99]. These three systems are therefore likely to achieve better prediction accuracy, but algorithm-level evaluation is missing from the current literature.
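The two-step design described in [99], knowledge-based filtering followed by a collaborative step, can be sketched as follows. The rules, thresholds, context keys, and peer ratings are all hypothetical illustrations, not the rules used by PARIS, CBSR, or any other reviewed system. The "collaborative" step is deliberately simple: a mean-rating ranking over peers assumed to have been preselected for demographic similarity.

```python
from dataclasses import dataclass
from typing import Callable

Context = dict[str, float]


@dataclass(frozen=True)
class Rule:
    """A hypothetical evidence-based rule: suggest `advice` when `applies` holds."""
    advice: str
    applies: Callable[[Context], bool]


# Hypothetical knowledge base; thresholds are illustrative, not clinical guidance.
KNOWLEDGE_BASE = [
    Rule("Avoid caffeine after 14:00", lambda c: c["caffeine_after_14h"] > 0),
    Rule("Get 30 min of daytime activity", lambda c: c["active_minutes"] < 30),
    Rule("Keep the bedroom below 20 C", lambda c: c["bedroom_temp_c"] > 20),
    Rule("Keep a fixed wake-up time", lambda c: c["wake_time_sd_min"] > 45),
]


def knowledge_based_candidates(context: Context) -> list[str]:
    """Step 1 (knowledge-based filtering): keep advice whose rule fires for this user."""
    return [rule.advice for rule in KNOWLEDGE_BASE if rule.applies(context)]


def collaborative_rank(candidates: list[str],
                       peer_ratings: dict[str, dict[str, float]]) -> list[str]:
    """Step 2 (collaborative filtering): order candidates by the mean rating from
    peers (assumed preselected for demographic similarity). Candidates that no
    peer has rated get the lowest score, 0."""
    def mean_rating(advice: str) -> float:
        scores = [r[advice] for r in peer_ratings.values() if advice in r]
        return sum(scores) / len(scores) if scores else 0.0
    return sorted(candidates, key=mean_rating, reverse=True)


context = {"caffeine_after_14h": 1, "active_minutes": 12,
           "bedroom_temp_c": 19, "wake_time_sd_min": 60}
peer_ratings = {
    "peer_a": {"Keep a fixed wake-up time": 5, "Avoid caffeine after 14:00": 3},
    "peer_b": {"Keep a fixed wake-up time": 4, "Get 30 min of daytime activity": 4},
}
candidates = knowledge_based_candidates(context)
ranked = collaborative_rank(candidates, peer_ratings)
print(ranked)
# ['Keep a fixed wake-up time', 'Get 30 min of daytime activity', 'Avoid caffeine after 14:00']
```

Keeping the knowledge base as the first gate means the collaborative step can only reorder evidence-backed advice. It never introduces advice the rules would reject.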
Incorporating behavior change theories and techniques is another trend in CASHRS. It is widely recognized that the design of health RS needs to consider the details of real needs and real use. A major drawback of previous systems is that their design was not grounded in behavior change theories [100]. Simply giving advice or recommendations is rarely an effective trigger for behavior change, particularly when users experience ambivalence or resistance to change. A prior study on sleep apps found that social cognitive theory was the theory most aligned with the apps examined. Other potentially useful theories include reinforcement theory [101] and self-regulation theory [57]. In this study, we found that 8 of the 16 principal methods of behavior change defined in the BCT Taxonomy v1 have been applied to CASHRS. Some systems even combined multiple BCTs to maximize the effect. Self-monitoring of sleep was the most widely used BCT and was found in all of the reviewed CASHRS. Asking users to directly input their perception of sleep quality and their “gut feeling about sleep habits” could prompt them to reflect on their sleep hygiene [42], which in turn may help boost their motivation for positive behavior change [104]. The quantified insights from wearable and IoT sensors can provide complementary information on users’ sleep structure and daily activity, which may serve as visual cues for behavior change. A recent review of mHealth sleep apps revealed that category-I BCTs, which focus on changing aspects of behavior before sleep, were more appropriate for sleep intervention than category-II BCTs, which focus on future outcomes or consequences of poor sleep [44], because the latter may lead to anxiety and rumination that interfere with the initiation and continuation of sleep. Indeed, our analysis revealed that category-I BCTs were used more often in current CASHRS.
While self-monitoring has become increasingly common in recent years, self-experimentation is a relatively new concept in CASHRS. Two systems, SleepCoacher [43] and SleepBandits [54], provided tailored recommendations on how to investigate personal sleep factors through self-experimentation and reflection. The two systems generated a set of hypotheses based on sleep domain knowledge and recommended micro self-experimentation plans based on the self-knowledge discovered from a user’s self-tracking data and preferences. Users were able to identify causal relationships between the sleep factors that concerned them personally and their sleep outcomes. In this sense, the two systems expanded the scope and variety of recommendations based on self-knowledge newly discovered from each user’s own data. The new recommendations went beyond widely known sleep hygiene advice, e.g., how listening to an audiobook may influence SOL or how eating cheese for dinner may affect the deep sleep ratio. Self-experimentation is grounded in the principle of N-of-1 trials in personalized medicine [84], which arose in the mid-1980s in response to the limitations of conventional large-cohort trials [105]. The N-of-1 design has wide applicability in clinical care and behavioral science [107] and is considered well suited to helping people find the best behavior interventions for their health [84]. Inheriting the advantages of N-of-1 trials, self-experimentation holds promise for generating behavior intervention recommendations that are fully personalized to each user’s physiological, behavioral, and environmental context. Self-experimentation is also highly compatible with self-tracking practice, as the wearable and mobile technologies widely adopted for self-tracking can substantially reduce the burden of data collection and increase the feasibility of conducting self-experimentation [108].
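To make the N-of-1 logic concrete, the sketch below analyzes a hypothetical self-experiment in which nights are randomly assigned to an "audiobook" or "no audiobook" condition and SOL is recorded each night. It uses an exact permutation (randomization) test, which is one common analysis for N-of-1 data. It is not the analysis used by SleepCoacher or SleepBandits, and the SOL values are invented for illustration.

```python
from itertools import combinations
from statistics import mean


def permutation_test(on: list[float], off: list[float]) -> tuple[float, float]:
    """Exact two-sided permutation test for a two-condition N-of-1 experiment.

    Returns the observed mean difference (on - off) and the share of all
    relabelings of nights whose absolute difference is at least as large.
    """
    pooled = on + off
    observed = mean(on) - mean(off)
    n_extreme = n_total = 0
    for chosen in combinations(range(len(pooled)), len(on)):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        if abs(mean(group_a) - mean(group_b)) >= abs(observed) - 1e-9:
            n_extreme += 1
        n_total += 1
    return observed, n_extreme / n_total


# Hypothetical sleep-onset latency (minutes) on randomly assigned nights.
sol_with_audiobook = [12, 15, 10, 14]
sol_without_audiobook = [25, 22, 28, 24]
diff, p_value = permutation_test(sol_with_audiobook, sol_without_audiobook)
print(f"difference = {diff:.1f} min, p = {p_value:.3f}")
# difference = -12.0 min, p = 0.029
```

This simple test treats nights as exchangeable. A real self-experiment would also need to consider carryover effects and night-to-night autocorrelation, for example through washout periods or blocked randomization.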
4.2. Opportunities for Future CASHRS Research
The results of this narrative review indicate that research on CASHRS is still in its infancy. The context incorporated into existing CASHRS was restricted to a limited range, leaving dynamic context such as time, location, and social situation out of scope. This is especially the case for clinical CASHRS, whose content was structured solely on established therapeutic guidelines [34]. Complying with a medical standard undoubtedly improves the rigor of the content but comes at the cost of missing novel and potentially effective recommendations. From a technical perspective, little attention has been paid to the computing aspects of CASHRS. Most of the knowledge-based recommendation algorithms are preliminary and incapable of incorporating dynamic context. The context life cycle, an important topic in RS research, has not yet been covered in any of the studies reviewed. The evaluation of CASHRS was predominantly performed at a high level (e.g., efficacy in improving sleep measures, users’ perceived usefulness, adherence), leaving algorithm-level performance (e.g., accuracy, coverage, diversity [90]) unexplored. Future research should focus on re-framing CASHRS research with established methods, approaches, and techniques from CARS. The major design opportunities concern addressing users’ compliance with recommendations (O2, O3) as well as developing and validating context-aware recommendation algorithms for CASHRS (O1).
4.2.1. O1: Developing and Validating New Algorithms for Recommendation Generation, Context Filtering and Context Life Cycle Management
Prior studies acknowledged that offering fully personalized recommendations was challenging for digital CBT-I systems [110]. Addressing this issue requires increasing the variety of contextual information collected and designing new algorithms to incorporate such information.
A main gap in current CASHRS is the lack of integration of high-granularity, spontaneously arising dynamic context. In this study, we found that the context considered in existing CASHRS was restricted to predefined, low-granularity, daily aggregated information, such as the sleep quality of previous nights, physical activity level over a day, and resting heart rate. A natural consequence of this design was that the systems could not react dynamically to changes in a user’s context. Prior studies on RS argued that context can encompass multidimensional and dynamic information, including a user’s location, physical and emotional states, the time at which the user engages in an activity, social interaction with family and colleagues, and the user’s environmental situation [112]. From an interactional view, context is a relational and occasioned property rather than a stable, objective set of features [61]. Context arises from the situation in which the target user is currently engaged; thus, the scope of contextual features needs to be defined dynamically [61].
The major design opportunity concerns not the use of a predefined context but rather how a CASHRS can support the life cycle management of dynamic context, which comprises context acquisition, context modeling, context reasoning, and context dissemination. This matters because not all the factors listed in Table 2 are relevant to a user all the time. The widespread use of wearable devices (e.g., activity trackers) and mobile devices (e.g., smartphones, tablets) makes it possible to collect large amounts of high-resolution data from which to derive the actions and behaviors of target users, as well as the rich and ever-changing context in which they interact with the system. However, data collected automatically and passively with wearable and mobile sensors have not been incorporated into most of the CASHRS reviewed in this article. Future CASHRS research needs to fully embrace ubiquitous sensing and a data-driven approach. Developing new multi-functional integrated wearable electronic devices that capture numerous data streams simultaneously and stably is another promising way to encourage the sustained acquisition of contextual information while reducing tracking fatigue.
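The four life-cycle phases named above can be chained as in the minimal sketch below, which assumes one batch of sensor readings per evening. The stream names, thresholds, and functions are hypothetical. The reasoning step is a hard-coded threshold check only for illustration; a real system would more likely learn which factors matter to each user.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass(frozen=True)
class Sample:
    stream: str   # hypothetical sensor stream name
    value: float


# Hypothetical thresholds above which a factor is treated as sleep-relevant.
THRESHOLDS = {"bedroom_noise_db": 35.0, "bedroom_temp_c": 20.0, "evening_caffeine_mg": 0.0}


def acquire(raw: list[tuple[str, float]]) -> list[Sample]:
    """Context acquisition: wrap raw readings from wearable/IoT sensors."""
    return [Sample(stream, value) for stream, value in raw]


def model(samples: list[Sample]) -> dict[str, float]:
    """Context modeling: aggregate raw samples into one mean value per stream."""
    grouped: dict[str, list[float]] = {}
    for s in samples:
        grouped.setdefault(s.stream, []).append(s.value)
    return {stream: mean(values) for stream, values in grouped.items()}


def reason(context: dict[str, float]) -> list[str]:
    """Context reasoning: keep only factors that exceed their relevance threshold."""
    return sorted(k for k, v in context.items() if k in THRESHOLDS and v > THRESHOLDS[k])


def disseminate(factors: list[str], subscribers: list[Callable[[list[str]], None]]) -> None:
    """Context dissemination: push relevant factors to downstream components."""
    for notify in subscribers:
        notify(factors)


received: list[list[str]] = []  # stands in for a recommender listening for context
raw = [("bedroom_noise_db", 32.0), ("bedroom_noise_db", 44.0),
       ("bedroom_temp_c", 19.0), ("evening_caffeine_mg", 95.0), ("steps", 4200.0)]
disseminate(reason(model(acquire(raw))), [received.append])
print(received)  # [['bedroom_noise_db', 'evening_caffeine_mg']]
```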
Furthermore, the high dimensionality of contextual information calls for hybrid methods that incorporate context at various stages of the recommendation generation algorithm. In addition to the post-filtering approach adopted in all of the CASHRS, pre-filtering and context modeling based on machine learning and data mining are two other promising context-filtering approaches. Context modeling is also an important phase in the context life cycle [114] that precedes the context reasoning and context sharing phases [116]. CARS researchers have devised a great number of algorithms and techniques [117]. For instance, Ref. [119] proposed a hybrid multilevel context-filtering approach that comprised pre-filtering using demographic information, collaborative filtering combined with knowledge-based filtering, and post-filtering with dynamic context information. Sequential recommendations based on sequential pattern mining were proposed in [120]. These algorithms may serve as a foundation for developing novel recommendation algorithms and context-filtering techniques tailored to CASHRS. Moreover, algorithm-level evaluation (e.g., accuracy, coverage, diversity [90]) needs to be established as a complement to system-level evaluation (e.g., efficacy, usability).
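As a sketch of what algorithm-level evaluation could look like, the functions below compute three common RS metrics: precision at k (accuracy), catalog coverage, and intra-list diversity. The recommendation catalog, the category labels, and the notion of a "relevant" recommendation (here, one the user found helpful) are hypothetical assumptions for illustration only.

```python
from itertools import combinations


def precision_at_k(recommended: list[str], relevant: set[str], k: int) -> float:
    """Accuracy: share of the top-k recommendations that were relevant to the user."""
    return sum(item in relevant for item in recommended[:k]) / k


def catalog_coverage(rec_lists: list[list[str]], catalog: set[str]) -> float:
    """Coverage: fraction of the catalog that appears in at least one user's list."""
    shown = set().union(*rec_lists) if rec_lists else set()
    return len(shown & catalog) / len(catalog)


def intra_list_diversity(recommended: list[str], category: dict[str, str]) -> float:
    """Diversity: share of recommendation pairs drawn from different categories."""
    pairs = list(combinations(recommended, 2))
    if not pairs:
        return 0.0
    return sum(category[a] != category[b] for a, b in pairs) / len(pairs)


# Hypothetical catalog of recommendations and their categories.
category = {"caffeine_cutoff": "diet", "fixed_wake_time": "schedule",
            "cool_bedroom": "environment", "evening_walk": "activity",
            "wind_down_routine": "schedule"}
catalog = set(category)
rec_lists = [["caffeine_cutoff", "fixed_wake_time", "cool_bedroom"],
             ["fixed_wake_time", "wind_down_routine", "caffeine_cutoff"]]

print(catalog_coverage(rec_lists, catalog))                                  # 0.8
print(precision_at_k(rec_lists[0], {"fixed_wake_time", "cool_bedroom"}, 3))  # 2 of 3 relevant
print(intra_list_diversity(rec_lists[0], category))                          # 1.0
```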
4.2.2. O2: Enhancing the Credibility of CASHRS
Credibility describes the believability of a system and comprises two key components: trustworthiness and expertise [122]. Credibility matters when computing systems “act as knowledge sources”, “report measurements”, “instruct or tutor users”, or “act as decision aids”. Perceived credibility also affects the adoption and retention of health technology [123]. Credibility has previously been studied in general RS [124] and in quantified-self sleep-tracking technologies [12], and it was further reaffirmed as a crucial property for CASHRS [42].
One way to enhance the credibility of CASHRS is to support better communication between the system and its users in order to build empathy [103]. Classic psychotherapy research found that effective human coaching relies on the patient and therapist mutually agreeing on therapeutic goals, fulfilling therapeutic tasks, and establishing mutual trust [125]. Because a CASHRS often plays the role of a digital therapist or coach, it needs to be perceived as legitimate and to form a bond with users. Empathy building starts with goal setting, i.e., defining target sleep measures. Prior studies found that subjective sleep quality was the sleep dimension most targeted and most improved by mHealth sleep apps [44], while wearable device users target a wider spectrum of measures, ranging from SOL and TST to the ratios of deep sleep and REM sleep [12]. Existing CASHRS support users in setting goals for sleep hours and sleep schedule (i.e., bedtime and wake-up time) but seldom target other sleep metrics such as sleep efficiency and sleep stages. Future CASHRS may enable sleep goal setting in as many dimensions as users prefer. Identifying modifiable sleep factors may also help to build empathy between the system and its users. The determinants of poor sleep quality are multifactorial, and only some of them are amenable to intervention. CASHRS therefore need to focus on identifying modifiable factors for each user and to suggest what users can easily incorporate into their daily schedule rather than what they ought to do [61]. Conversely, recommending behavior changes that are difficult to achieve may stir up doubt and compromise the credibility of the system. For instance, the SleepBandits system did not include sleep duration or timing as target sleep metrics because they are predominantly determined by users’ schedules rather than by behavioral factors [54]. Beyond identifying modifiable factors, users may also want detailed scaffolding on how to implement the recommendations. For example, “why not try 10 min of jogging and 10 min of kickboxing after work” could be a more actionable recommendation than “20 fairly active minutes to hit the daily goal”. It is worth noting that behavioral modifications to some sleep metrics (e.g., deep sleep, REM sleep) may not be well supported by sleep science domain knowledge, but it may be possible to identify modifiable factors specific to a user through well-designed self-experimentation. One study found that CASHRS may create a sense of dictatorship [53]. It is hence important to allow users to negotiate the recommendations, or to provide the top-N recommendations and let users choose their favorites. Many of the reviewed clinical CASHRS allow patients to negotiate sleep duration in a sleep restriction plan [55], and the system in [53] allows users to modify the effective window of caffeine based on their tolerance level.
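The two negotiation mechanisms mentioned above, a user-adjustable caffeine window and a top-N list offered as choices, can be sketched as follows. The 6-hour default window, the multiplicative `tolerance_factor`, and the recommendation scores are hypothetical assumptions, not parameters reported by [53] or any other reviewed system.

```python
from datetime import datetime, timedelta

# Assumed default window; not a value reported by the reviewed systems.
DEFAULT_CAFFEINE_WINDOW_H = 6.0


def caffeine_cutoff(bedtime: datetime, tolerance_factor: float = 1.0) -> datetime:
    """Latest suggested caffeine time. A user-negotiated tolerance_factor below 1
    shortens the window; a factor above 1 lengthens it."""
    if tolerance_factor <= 0:
        raise ValueError("tolerance_factor must be positive")
    return bedtime - timedelta(hours=DEFAULT_CAFFEINE_WINDOW_H * tolerance_factor)


def top_n(scores: dict[str, float], n: int) -> list[str]:
    """The n highest-scoring recommendations, offered as choices rather than orders."""
    return sorted(scores, key=lambda item: scores[item], reverse=True)[:n]


bedtime = datetime(2024, 3, 1, 23, 0)
print(caffeine_cutoff(bedtime).strftime("%H:%M"))       # 17:00
print(caffeine_cutoff(bedtime, 0.5).strftime("%H:%M"))  # 20:00
scores = {"cool_bedroom": 0.4, "caffeine_cutoff": 0.9,
          "fixed_wake_time": 0.7, "evening_walk": 0.2}
print(top_n(scores, 3))  # ['caffeine_cutoff', 'fixed_wake_time', 'cool_bedroom']
```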
Another way to enhance the credibility of CASHRS is to improve technology transparency [12]. It may be helpful to provide an introductory technical explanation of how the recommendations were generated, which metrics were used and how they were computed, and how many days of data were needed to draw reliable conclusions. A prior study found that observing how additional data helped fine-tune the recommendations boosted users’ perceived trustworthiness and motivated their compliance with self-tracking and with the recommendations provided [54]. Exposing users to the technical details of the system may also help to resolve cognitive dissonance. Many users rely on their prior mental model of sleep health, which is usually based on general sleep hygiene, to judge the usefulness and trustworthiness of CASHRS [103]. When users receive recommendations that conflict with their mental models of sleep health, they may experience cognitive dissonance, which may then drive them to discard the recommendations, as has been shown for sleep tracking in general [12]. Redirecting users’ attention to how the recommendations are generated (e.g., that they are tailored to the user’s own data rather than to other people’s data) may help users understand the reason for the disparity as well as the potential limitations of the CASHRS.
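One transparent way to explain how many days of data are needed is the standard sample-size formula for estimating a mean, $n = \lceil (z \cdot \sigma / E)^2 \rceil$, where $\sigma$ is the night-to-night standard deviation of a sleep metric and $E$ is the acceptable margin of error. The sketch below assumes independent nights and a normal approximation. The SD and margin values are hypothetical.

```python
import math
from statistics import NormalDist


def nights_needed(sd: float, margin: float, confidence: float = 0.95) -> int:
    """Nights of data needed for the mean of a sleep metric to lie within
    +/- margin of its true value at the given confidence level, assuming
    independent nights and a normal approximation."""
    if sd <= 0 or margin <= 0 or not 0 < confidence < 1:
        raise ValueError("sd and margin must be positive; confidence in (0, 1)")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    return math.ceil((z * sd / margin) ** 2)


# Hypothetical: SOL varies by about 10 min between nights; target +/- 5 min.
print(nights_needed(sd=10.0, margin=5.0))  # 16
```

A message such as "about 16 more nights are needed before this estimate is reliable" could make the data requirement visible to users.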
4.2.3. O3: Supporting Better Decision Making and Sustained Behavior Change for Sleep Health
At its core, a CASHRS aims to help users improve sleep through behavioral modifications. However, behavioral modifications are hard to implement and sustain. The major research opportunities include addressing users’ compliance with recommendations and supporting behavior change. Theories of human decision making and BCTs can be systematically incorporated into CASHRS to achieve a balance between persuasion and empowerment [127].
First, future CASHRS should be designed to encourage decision making that favors compliance with the recommendations. “Dual process” theories of cognition posit that there are two systems of human decision making [128]. System one corresponds to intuitive decision making, which is fast, automatic, and effortless. It is often emotionally charged and hence difficult to control. System two corresponds to analytical and deliberative decision making, which is slower, serial, effortful, and deliberately controlled. Assuming that people make rational decisions about health, existing health RS have predominantly been designed to target the system-two decision process. However, nudge theory argues that people do not have unlimited cognitive abilities and complete self-control [129]. In reality, people often rely on heuristics rather than analysis when they make health decisions [130]. For example, the status quo bias describes people’s tendency to remain in their current state and avoid change out of loss aversion, even when the current state may not be objectively superior [131]. This explains why a participant in [53] may choose to hang out with friends late at night instead of following the recommended bedtime: going to bed comes at the cost of social life, on which the participant places more value. While some decision-making heuristics may lead to behaviors that go against the recommendations, others may be exploited to design system features that support compliance [132]. The availability heuristic refers to people’s tendency to judge the likelihood of an event by the ease with which relevant instances come to mind. Based on the availability heuristic, sharing success stories of other users who followed a recommendation is likely to motivate the target user to follow the same recommendation. One example is Sally (a virtual character) sharing her success story with new Sleepio users in [55]. Another relevant heuristic is the affect heuristic, a mental shortcut that helps people make decisions quickly by bringing emotional responses into play. If people have pleasant feelings about something, they judge its benefits as high and its risks as low, and vice versa. The affect heuristic thus serves as a first, fast response mechanism in decision making. One way to exploit the affect heuristic for enhanced compliance is to remind users of their past positive experiences when they followed a recommendation.
Second, the latest advances in health psychology and behavioral medicine can be exploited to empower users to achieve sustained behavior change. One promising direction is to target multiple health behaviors concurrently [72]. Prior studies found that targeting multiple health behaviors together could lead to greater health improvements than targeting one behavior alone. This is because of spillover effects, in which success with one health behavior aids success with other health behaviors [133]. This approach is promising for improving sleep health, as many health behaviors, such as exercise and a healthy diet, are known to have reciprocal relationships with sleep. In addition, BCTs such as feedback and monitoring and goal setting have been commonly implemented across mHealth app interventions targeting physical activity, diet, and sleep [44]. Future research is needed to identify the optimal combinations of co-targeted health behaviors and co-occurring BCTs that maximize the benefits while avoiding ego depletion [136]. Furthermore, different BCTs may be applied depending on each user’s progress through the intervention stages. For instance, a novice user may receive information about the health benefits of 7–8 h of sleep. As the user progresses to the maintenance stage, such information becomes unnecessary, and the BCT should shift to relapse prevention strategies. In a related vein, it has been argued that to maximize change in a multiple behavior intervention, each behavior must be targeted using behavior change techniques specific to that behavior [137], but too much and too complex information may cause cognitive overload and compromise the usefulness of the recommendations [69]. The recommendations must therefore be simple, clear, and easy to follow. Last but not least, group-based recommendations for sleep health (e.g., expanding the intervention target from the individual to the family) could also be an interesting idea to explore [139].
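Stage-dependent BCT selection with a cap on the amount of content shown could be sketched as below. The stage boundaries (14 and 180 days of sustained behavior) and the stage-to-BCT mapping are hypothetical assumptions. The labels loosely follow BCT Taxonomy v1 wording, and the 180-day boundary echoes the common convention of treating about six months of sustained change as maintenance.

```python
from enum import Enum


class Stage(Enum):
    NOVICE = "novice"
    ACTION = "action"
    MAINTENANCE = "maintenance"


# Hypothetical stage-to-BCT mapping; labels loosely follow BCT Taxonomy v1.
BCTS_BY_STAGE = {
    Stage.NOVICE: ["Information about health consequences", "Goal setting (behavior)"],
    Stage.ACTION: ["Self-monitoring of behavior", "Feedback on behavior"],
    Stage.MAINTENANCE: ["Problem solving (relapse prevention)"],
}


def stage_for(days_sustained: int) -> Stage:
    """Map days of sustained target behavior to a stage (hypothetical cutoffs)."""
    if days_sustained < 0:
        raise ValueError("days_sustained must be non-negative")
    if days_sustained < 14:
        return Stage.NOVICE
    if days_sustained < 180:
        return Stage.ACTION
    return Stage.MAINTENANCE


def select_bcts(days_sustained: int, max_items: int = 2) -> list[str]:
    """Pick stage-appropriate BCTs, capped to limit cognitive load."""
    return BCTS_BY_STAGE[stage_for(days_sustained)][:max_items]


print(select_bcts(3))    # novice: health information and goal setting
print(select_bcts(200))  # ['Problem solving (relapse prevention)']
```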
This study conducted a narrative appraisal of peer-reviewed publications on CASHRS. The review demonstrated that CASHRS research is still in its infancy, as the variety of contextual information, recommendation algorithms, context-filtering techniques, and system evaluation methods in the reviewed publications is limited. Almost all of the reviewed systems relied on knowledge-based recommendation algorithms and incorporated context information using post-filtering. The sleep quality of previous nights was the most widely used context, followed by physical activity and bedroom environment. Most of the reviewed CASHRS provided a mixture of personalized and general recommendations. Notably, clinical CASHRS focused predominantly on tailoring the recommended sleep window in sleep restriction therapy, while keeping the other recommendations consistent with the standard CBT-I content in the literature. General-purpose CASHRS provided personalized recommendations in more aspects, probably owing to their ability to collect a wide spectrum of self-tracking data using wearable and IoT sensors. No information was found on how the systems handled the context life cycle (especially the context modeling and context reasoning phases), which represents a major knowledge gap in the CASHRS literature. All systems incorporated one or more BCTs, among which goals and planning and self-tracking or daily logging of sleep were the most popular. Interestingly, social reward and social support, two BCTs widely used in other health domains (e.g., physical activity), were incorporated in only one CASHRS (i.e., Sleepio). The evaluation of the reviewed CASHRS covered both the overall system efficacy in improving multidimensional sleep outcomes and how users interacted with the systems (e.g., usability, perceived usefulness, adherence, compliance). Challenges identified in prior studies included users’ non-compliance with the recommendations and a lack of perceived credibility.
Taken together, CASHRS represent a promising direction for ubiquitous sleep computing research, but this subdomain requires formal re-framing using established methods and approaches from health CARS research. To achieve this, future CASHRS research may focus on addressing users’ compliance with recommendations as well as developing and validating new algorithms for recommendation generation, context filtering, and context life cycle management.