Chronic diseases represent a significant share of the burden of disease globally [1]. They are responsible for 86% of all deaths [2]. In Europe, chronic diseases affect over 80% of adults over 65 and incur 70% of the increasing healthcare costs [3]. The most common chronic diseases are cardiovascular, pancreatic, pulmonary, and neoplastic. Unhealthy lifestyles and behaviours, such as physical inactivity, insufficient sleep, poor nutrition, and tobacco intake, explain up to 50% of the risk of chronic disease [4]. We expect the importance of the long-term risk of disease to increase as the world population ages [5]. As age dramatically contributes to the risk of multiple diseases [1], healthy older adults are a population both inherently at risk and appropriate for primary disease prevention.
Currently, human health studies assess behaviours through a combination of self-reported outcomes [6], in particular patient-reported outcomes (PRO), and, more recently, patient-generated technology-reported outcomes (TechRO). Patient-reported outcomes include questionnaires with validated scales that assess individual outcomes momentarily or for a given recall period (e.g., “During the past month, how often have you had trouble sleeping?”). However, self-reports are known to be subject to biases related to the inherent shortcomings of participant reporting. The questionnaires are inconvenient, infrequent, memory-biased, socially conditioned, and qualitative. For example, seniors reporting physical activity tend to overestimate the amount undertaken [7], while subjective sleep is less reliable than objective sleep according to studies of sleep, ageing, and cognition [8
In an attempt to address the shortcomings of self-reports and based on technological advances, we propose the coQoL PRO-TechRO co-calibration method. Our research primarily focuses on assessing behaviours and outcomes by combining questionnaires with devices such as smartphones and wearables, assessing multiple outcomes (e.g., physical activity, sleep, and heart rate) momentarily, and, if collected for a long time, also longitudinally. Numerous studies used validated, expensive, and bulky lab-grade devices (e.g., ActiGraph), although for a limited time due to the user burden and discomfort of wearing them [11]. Conversely, consumer-friendly wearables measure TechROs continuously and objectively, and increasingly accurately as technology progresses [12]. Also, more individuals opt for consumer-friendly wearable devices; the market size for consumer wearables will likely double by 2022 [13]. More recent research showed that consumer wearables could assess multiple behaviours accurately [14], unobtrusively [15], and continuously [16] while worn by participants during the natural unfolding of their daily lives. Overall, consumer devices are accurate enough and widely enough used to be leveraged in human health studies.
Prior work has aimed at co-calibrating physical and psychological outcomes with technology-related ones, as discussed in this paper. We identified the previous work by following a semi-structured literature review detailed in Appendix A.1. Table 1 presents the PRO-TechRO co-calibration studies resulting from our literature review for the following outcomes: physical activity, social support, anxiety and depression, memory, sleep, and health-related Quality of Life. For each study, the table presents the PROs and TechROs used for co-calibration, the study design, the analysis methodology, and a summary of results. As for the PRO, the table presents the long names of the PRO instruments leveraged in the study, followed by the TechRO details, at least including the name and its form factor (consumer wearable or research-grade accelerometer, and position on the body). The study design details include its target population, sample size and age, and study duration. Past co-calibration methods range from simple descriptive statistics to inferential statistics via correlation methods, to machine learning, including regression and classification. The results column summarizes the PRO-TechRO co-calibration efforts, as presented in the respective paper.
To better emphasize the difference between the state of the art and our work, we recall that we focus on healthy seniors and that our method implies repeated sets of different PRO assessments in longitudinal daily life TechRO assessment settings, based on consumer wearables. All studies presented in Table 1 have at least one feature (marked in violet) that excludes them from co-calibrating PRO questionnaires with TechRO consumer wearables in healthy seniors in the wild over long periods (above the typical 7–14 days found in the literature).
Table 1 does not include studies on nutrition, since, to the best of our knowledge, the co-calibration of the diet with distant measures such as steps using questionnaire PROs and consumer wearables (or, at the very least, accelerometers) does not exist in the literature. However, there are numerous articles on energy expenditure estimates measured by consumer wearables that guide the energy intake (food types and qualities) for individuals following dietary recommendations [17
As can be seen from Table 1, most studies focus on specific PROs suitable for the study aim; some of the PROs are disease-specific, which also relates to the user groups in the study (e.g., students, patients with a given condition). As for the TechROs, we observe few research-grade wearables and many consumer-grade ones (Fitbit), mostly worn as bracelets. The study designs are characterized by diverse sample sizes (20–70, with very few examples of 500+ participants) and usually very short durations (7 days or less, very few beyond three weeks). We can call these co-calibration efforts momentary, as they are valid only for the specific periods in which the data was collected. The co-calibration methods themselves usually rely on descriptive statistics and correlations. The results of these co-calibrations rarely report values ≥0.5. In summary, little research has focused on assessing the relationships between sets of different outcomes assessed via PROs and consumer wearable TechROs in healthy seniors, in the wild, for extended periods (beyond the typical study duration of 7–14 days).
Our paper is the result of research conducted as part of the EU AAL Caregiver and ME (CoME, No. 14-7, 2017–2020) research project and software application. CoME aimed at self-management of health for individuals of old age at risk of mild cognitive impairments and their informal caregivers [20]. The project used numerous PROs to obtain a holistic view of the participants’ health and wellbeing, by covering constructs that are both reflective (physical activity, anxiety, depression, memory, sleep) and formative (nutrition and social support) for the individual’s Quality of Life (QoL). These constructs assess participants’ health state and correspond to behavioural risk factors of dementia, as guided by the goals of the project [22
Our study involved 42 seniors from Hungary and Spain. The seniors provided PROs on questionnaires chosen by the consortium of the CoME project partners along [22]. The measured outcomes included physical activity (using the International Physical Activity Questionnaire Long, or IPAQ), social support (Multidimensional Scale of Perceived Social Support, MSPSS), anxiety and depression (Goldberg Anxiety and Depression Scale, GADS), nutrition (Prevention with Mediterranean Diet, PREDIMED, and Self-Reported Mini Nutritional Assessment, SelfMNA), memory (Memory Failures of Everyday, MFE), sleep (Pittsburgh Sleep Quality Index, PSQI), and health-related Quality of Life (EuroQoL with five dimensions and three levels, EQ-5D-3L) (Appendix B.1.1 describes the questionnaires and their validated scales in depth). Participants also provided TechROs of physical activity, sleep, and heart rate (Fitbit Charge 2 consumer wearable, [35]) during the study, for up to two years.
Our paper has three objectives. First, we aim at demonstrating the feasibility of our co-calibration method, coQoL, by quantifying relationships between PROs and TechROs for our sample. Second, we aim at assessing the quality of the data collected while daily life unfolded for our participants. Third, we aim at informing the design of observational (and potentially interventional) personalized behavioural studies by leveraging the results from the first two objectives.
In this section we discuss our methodological approach (Section 4.1), the coQoL method in the perspective of past evidence (Section 4.2), observations on data quality (Section 4.3), and pathways towards personalized medicine (Section 4.4). We then review several limitations of our study (Section 4.5) and envision future work (Section 4.6).
4.1. Overall Methodological Approach in PROomics
The coQoL method explored patterns of correlations between PROs and TechROs towards their co-calibration. Consequently, we focused on identifying groups of strong correlations between PROs with a given recall period and TechROs, aggregating weeks to months of wearable data available before the administration day of the PRO. We considered correlations between similar latent constructs, e.g., PRO and TechRO physical activity or sleep, as high from 0.8 and above. However, for different latent constructs, such as PRO social support and TechRO sleep, where the probability of random correlation is low, correlations of even 0.5 are high. Hence, we presented here correlations of 0.5 and above as important.
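The aggregation and correlation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names (`window_mean`, `pearson`, `is_high`), the plain mean as the aggregate, the Pearson coefficient, and the use of the absolute value when applying the 0.8 and 0.5 thresholds are all choices made for this example.

```python
from datetime import date, timedelta
from math import sqrt


def window_mean(daily, admin_day, window_days):
    """Mean of the daily TechRO values in the `window_days` days before `admin_day`.

    `daily` maps a date to a value (e.g., steps). The administration day
    itself is excluded. Returns None when the window holds no data.
    """
    start = admin_day - timedelta(days=window_days)
    values = [v for d, v in daily.items() if start <= d < admin_day]
    return sum(values) / len(values) if values else None


def pearson(xs, ys):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def is_high(r, same_construct):
    """Thresholds from the text: 0.8 for similar constructs, 0.5 otherwise.

    Comparing the absolute value is an assumption of this sketch.
    """
    threshold = 0.8 if same_construct else 0.5
    return abs(r) >= threshold
```

In such a sketch, one aggregated TechRO value per participant would be paired with that participant's PRO score, and `pearson` would be computed across participants for each PRO-TechRO pair and window length.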
Due to the exploratory nature of our method, we deliberately omitted adjustments for multiple comparisons. The results of our method can guide future observational studies, as well as personalized, adaptive interventional studies, where the observational component will inform the intervention design as we go. Researchers can power such studies for enough confidence to exclude trivial effects.
4.2. coQoL in Perspective of Past Evidence
We recall that little prior research focused on assessing the relationships between sets of different outcomes assessed via PROs and consumer wearable TechROs in healthy seniors, in the wild, for extended periods (beyond the typical study duration of 7–14 days). On the one hand, past studies may have had similar or larger sample sizes, yet they have not yielded stronger statistical results; these co-calibrations rarely report values ≥ 0.5, whereas ours do. On the other hand, we report a more prolonged study duration (up to 2 years). A study duration of more than a few weeks is essential to overcome the “novelty” effect of the technology (TechRO) on the state and behaviour of the user. Namely, the user, motivated by the feedback provided by the device while the study is being conducted, may move more or sleep differently, which would then be erroneously co-calibrated with the self-reports (PROs). The coQoL method leads to more accurate, real-world PRO- and TechRO-based datasets representing the real states and behaviours of the users. We define the past evidence in the context of momentary co-calibration efforts, where the PRO-TechRO co-calibrations may have been valid only for the short interval of data collection. Our proposed method coQoL expands the state of the art.
4.3. Observations on Data Quality
The wearable monitored some TechROs for more days than others. For example, energy expenditure and steps appeared on most days. However, some days did not include durations of physical activity at increasing intensities, either because some seniors did not wear the wearable for enough hours for Fitbit to recognize the activity, or because they did not reach the increased-intensity physical activity on those days. Also, the TechROs that combine other TechROs, e.g., fair+vigorous, appeared on at most the minimum of the numbers of days on which their constituent TechROs appeared. We acknowledge errors of a few days in long-term monitoring stemming from conditions beyond our control in the project, such as errors at the device setup, errors at the recruitment site that took days to correct, or errors when running the automated data collectors from the seniors. These technological and human factors influenced the quality of the available data.
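The bound on the number of days for combined TechROs follows from requiring every constituent to be present on a given day. A minimal sketch, assuming that a combined TechRO such as fair+vigorous is the per-day sum of its constituents (the function name `combine_daily` and the sample values are hypothetical):

```python
def combine_daily(first, second):
    """Sum two daily TechRO series, keeping only days present in both.

    Each argument maps a day label to a duration in minutes.
    """
    shared_days = first.keys() & second.keys()
    return {day: first[day] + second[day] for day in sorted(shared_days)}


# Hypothetical minutes of fair and vigorous activity for one participant.
fair = {"2019-05-01": 20, "2019-05-02": 35, "2019-05-04": 10}
vigorous = {"2019-05-02": 5, "2019-05-03": 15, "2019-05-04": 8, "2019-05-05": 12}

fair_plus_vigorous = combine_daily(fair, vigorous)
```

Here the combined series covers two days, fewer than either constituent, because only the days shared by both series survive.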
The wearable monitoring period may depend on the measured outcome, frequency of answers, and human factors. While the recall period of many scales is short (e.g., one week), collecting wearable data only for that duration may prove too strict. If the design is too strict, numerous participants will be disqualified, and the results may be biased in favour of diligent or adherent responders, who may also exhibit positive behaviours, e.g., exercising more diligently as well. Although some results indicate that 14–28 days of data could be enough for significant co-calibrations, the observations used in the co-calibration depend on the PRO answers and the TechRO data alike. If the participants are adherent to data collection for four weeks, but do not answer the questionnaire, the quality of the data may be insufficient to derive correlations. For some questionnaires, coQoL may relax the alignment (leeway) to account for human factors that contributed to data loss. On the other hand, a monitoring window of 120 days (4 months) may prove too wide to collect data reflecting the same behaviour as the reported one (the recall period), also because of the potential influence of seasonal effects. These seasonal and other context dependencies are illustrated when applying coQoL to the MSPSS social support PRO. Our results indicate that having approximately one month of data before the administration of the MSPSS is sufficient to obtain significant correlations between the “family trying to help” social support outcome and fair activity, even within a small sample of 39 participants. We observe that the MSPSS is time context-specific. Overall, across all questionnaires, we argue for an intermediate TechRO aggregation interval not extending beyond 60–90 days.
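The trade-off between a strict and a permissive aggregation window can be illustrated with a small helper that picks the shortest window containing enough days of wearable data. This is only a sketch: the candidate windows, the `min_valid_days` threshold, and the function names are assumptions for the example, not parameters reported by the study.

```python
from datetime import date, timedelta


def valid_days_in_window(wear_days, admin_day, window_days):
    """Count days with wearable data in the `window_days` days before `admin_day`."""
    start = admin_day - timedelta(days=window_days)
    return sum(1 for d in wear_days if start <= d < admin_day)


def shortest_sufficient_window(wear_days, admin_day,
                               candidates=(14, 28, 60, 90), min_valid_days=14):
    """Return the shortest candidate window with at least `min_valid_days` days
    of data, or None if even the widest candidate falls short."""
    for window_days in sorted(candidates):
        if valid_days_in_window(wear_days, admin_day, window_days) >= min_valid_days:
            return window_days
    return None


admin_day = date(2019, 6, 1)
# Hypothetical participants: one wears the device daily, one every third day.
adherent = [admin_day - timedelta(days=k) for k in range(1, 31)]
less_adherent = [admin_day - timedelta(days=k) for k in range(1, 91, 3)]
```

With these hypothetical inputs, the adherent participant qualifies with the 14-day window, while the less adherent one only accumulates enough data within the 60-day window.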
4.4. Pathways towards Personalized Medicine
There is growing evidence within the medical domain that personal data paves a path towards personalized medicine, including genetics data and population-specific data, as well as, on a growing scale, data originating in the individuals’ daily life environments and representing their natural, objective behaviours unfolding in different contexts of daily life. Daily life datasets are, in turn, collected via consumer wearables and smartphones with sensing capabilities.
From our study, we learn that an ideal wearable in the context of a personalized medicine study would be comfortable to wear; should have a long battery life (at least a few days); should be accepted by individuals to use as their own, such that they forget they are in the study (implying minimal reactivity); and should provide relevant TechROs related to behavioural patterns (e.g., activity status, steps, as opposed to only heart rate, which would be hard to co-calibrate by itself).
Given our results, we also observe that for some PROs, different self-reported health statuses of the individuals yield different co-calibration results, even though our definition of disease refers only to mild self-reported cases. When the participants have a disease, other TechROs become correlated more strongly with other PROs than for the healthy ones. An observational study involving healthy individuals can leverage the coQoL method by monitoring a relevant subset of PRO/TechROs longitudinally, and occasionally co-calibrating the PROs with TechROs, assuming the coQoL method is sensitive to long-term, significant changes in TechROs. Based on the occasionally collected PRO answers, further in-depth examination of the individual’s state may seek to understand if the TechRO change signals coincide with a significant and relevant PRO change, potentially implying a real change of the individual’s health state. Once diagnosed, the individual’s health state may be followed up, assuming another set of PRO/TechRO outcomes co-calibrated in time, to assess the change in the state of the disease accurately.
For example, in the case of diseased Participant 169, we observed that improvements or deteriorations in the state (as self-reported via the PROs for physical activity, Mediterranean diet, memory, and Quality of Life) coincided with TechROs (of physical activity in the sedentary, and light-vigorous spectrum, as well as the total physically active duration). Such trends are likely to differ between persons. As observed with Participant 169, administering the PROs only three times in two years and monitoring the TechRO behaviours using the wearable (minimally obtrusively, continuously, during daily life) yielded numerous trends across not only pairs of PROs and TechROs, but also across different PROs and TechROs.
The coQoL can provide a frontline approach to further triage the individual state assessment, for the healthy or diseased, without burdening the individuals with self-assessments, and at the same time without excluding participants who develop diseases and need to be monitored for long periods. In the context of the latter, the coQoL may be very suitable to assess changes of behaviour and health state in chronically ill patients.
We envision the following coQoL use case. The coQoL results can inform the design of longitudinal observations for selected individual PRO/TechRO outcomes, leveraged in personalized medicine solutions. The procedure consists of the observation for several consecutive days (for more TechRO-adherent participants, four weeks; for the less adherent participants, up to 3 months, from which one can derive around four weeks of quality data) followed by the co-calibration of TechROs with PROs. While monitoring, a potential gradual change in a subset of TechROs of interest can lead to contacting the individual for further health outcome assessments, via PRO or even clinical examination.
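The monitoring step of this use case could, for instance, compare the most recent window of a TechRO with the preceding one and flag the participant for a follow-up PRO when the difference is large. The sketch below is purely illustrative: the 28-day window, the 20% threshold, and the function names are assumptions, and the source does not prescribe a specific change-detection rule.

```python
def relative_change(baseline, recent):
    """Relative change of the recent mean with respect to the baseline mean."""
    baseline_mean = sum(baseline) / len(baseline)
    recent_mean = sum(recent) / len(recent)
    if baseline_mean == 0:
        # Avoid division by zero: no change if both are zero, else unbounded.
        return 0.0 if recent_mean == 0 else float("inf")
    return (recent_mean - baseline_mean) / baseline_mean


def should_follow_up(daily_values, window_days=28, threshold=0.2):
    """Flag a participant when the latest window differs from the preceding one.

    `daily_values` is a chronological list of one TechRO (e.g., daily steps).
    Returns False when fewer than two full windows are available.
    """
    if len(daily_values) < 2 * window_days:
        return False
    baseline = daily_values[-2 * window_days:-window_days]
    recent = daily_values[-window_days:]
    return abs(relative_change(baseline, recent)) >= threshold
```

A flagged participant would then be contacted for a PRO or a clinical examination, as described above; a participant with too little data is simply not flagged in this sketch.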
In new study designs, we suggest a study participation period of at most 60–90 days and the use of behavioural techniques to promote participants’ wearable adherence, to maximize the validity of the results acquired. The study design may imply repeated measures longitudinally over the years, e.g., PRO/TechRO co-calibration efforts over 60–90 consecutive days, repeated every few months up to a year (assuming the same season every year).
4.5. Study Limitations
Several limitations characterize the preliminary coQoL study presented here. The first limitation is the small sample size, specific to an exploratory feasibility study. A second limitation is the resulting lack of power, which reduced the complexity of the analysis method (i.e., statistical hypothesis tests). A third limitation is the presence of multiple PRO answers per individual for the same wave, albeit with high variability. However, we only included one answer per participant-wave to reduce bias towards diligent responders. In case of multiple answers per participant-wave, we chose the latest answer in time, to account for any form submission issues in the CoME software application or the participant changing their mind after submitting the answers once. A fourth limitation is a significant decrease in the number of participants whose data was leveraged for the co-calibrations; we allowed for a leeway to permit PRO and TechRO alignments that are both (1) short-term, but accurate (e.g., 7–14 days, close to the recall period), and (2) longitudinal, but permissive (e.g., 60–120 days, sufficient for the long-term behaviours to unfold). The study highlights the challenge, shared by many health studies, of retaining individuals who can provide outcomes through both self-report and a wearable that must be worn daily, over long periods.
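The rule of keeping only the latest answer per participant-wave can be sketched as below. The record layout (`participant`, `wave`, `submitted`, `score`), the participant labels, and the function name are hypothetical and chosen only for illustration.

```python
from datetime import datetime


def latest_answer_per_wave(answers):
    """Keep only the most recently submitted answer per (participant, wave)."""
    latest = {}
    for answer in answers:
        key = (answer["participant"], answer["wave"])
        if key not in latest or answer["submitted"] > latest[key]["submitted"]:
            latest[key] = answer
    return list(latest.values())


# Hypothetical submissions: participant "P1" answered wave 1 twice.
answers = [
    {"participant": "P1", "wave": 1, "submitted": datetime(2018, 3, 1, 9, 0), "score": 7},
    {"participant": "P1", "wave": 1, "submitted": datetime(2018, 3, 1, 9, 5), "score": 8},
    {"participant": "P2", "wave": 1, "submitted": datetime(2018, 3, 2, 14, 0), "score": 5},
]
kept = latest_answer_per_wave(answers)
```

In this example, the second submission of "P1" replaces the first, so each participant contributes exactly one answer for the wave.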
4.6. Future Work
In ongoing and future work, we expect to involve more participants for shorter periods (60–90 days), repeated every few months to a year, and to focus on the PROs and TechROs delineated in this paper to deepen our knowledge about these specific co-calibration efforts and results. We plan to employ more advanced techniques and obtain more statistically significant results as we increase the sample size in further studies aimed at calibrating PROs and TechROs for health outcomes and longitudinal behaviours such as physical activity and sleep in seniors. We aim to derive individual co-calibration trajectory models, as well as population models, e.g., for similar groups of healthy or diseased individuals.