Co-Calibrating Physical and Psychological Outcomes and Consumer Wearable Activity Outcomes in Older Adults: An Evaluation of the coQoL Method

Inactivity, lack of sleep, and poor nutrition predispose individuals to health risks. Patient-Reported Outcomes (PROs) assess physical behaviours and psychological states but are subject to self-reporting biases. Conversely, wearables are an increasingly accurate source of behavioural Technology-Reported Outcomes (TechROs). However, the extent to which PROs and TechROs provide convergent information is unknown. We propose the coQoL PRO-TechRO co-calibration method and report its feasibility, reliability, and human factors influencing data quality. Thirty-nine seniors provided 7.4 ± 4.4 PROs for physical activity (IPAQ), social support (MSPSS), anxiety/depression (GADS), nutrition (PREDIMED, SelfMNA), memory (MFE), sleep (PSQI), Quality of Life (EQ-5D-3L), and 295 ± 238 days of TechROs (Fitbit Charge 2) over two years. We co-calibrated PROs and TechROs by Spearman rank correlation and reported human factors guiding coQoL use. We report high PRO-TechRO correlations (rS ≥ 0.8) for physical activity (moderate domestic activity vs. light+fair active duration), social support (family help vs. fair activity), anxiety/depression (numeric score vs. sleep duration), and sleep (duration to sleep vs. sleep duration) at various durations (7–120 days). coQoL feasibly co-calibrates constructs within physical behaviours and psychological states in seniors. Our results can inform designs of longitudinal observations and, whenever appropriate, personalized behavioural interventions.


Introduction
Chronic diseases represent a significant share of the burden of disease globally [1]. They are responsible for 86% of all deaths [2]. In Europe, chronic diseases affect over 80% of adults over 65 and incur 70% of the increasing healthcare costs [3]. The most common chronic diseases are cardiovascular, pancreatic, pulmonary, and neoplastic. Unhealthy lifestyle and behaviours, such as physical inactivity, insufficient sleep, poor nutrition, and tobacco intake, explain up to 50% of the risk of chronic disease [4]. We expect the importance of the long-term risk of disease to increase as the world population is ageing [5]. As age dramatically contributes to the risk of multiple diseases [1], healthy older adults are a population both inherently at risk and appropriate for primary disease prevention.
Currently, human health studies assess behaviours through a combination of self-reported outcomes [6], in particular patient-reported outcomes (PRO, [6]), and, more recently, patient-generated technology-reported outcomes (TechRO, [6]). Patient-reported outcomes include questionnaires with validated scales that assess individual outcomes momentarily or for a given recall period (e.g., "During the past month, how often have you had trouble sleeping?"). However, self-reports are known to be subject to biases related to the inherent shortcomings of participant reporting. The questionnaires are inconvenient, infrequent, memory-biased, socially conditioned, and qualitative. For example, seniors reporting physical activity tend to overestimate the amount undertaken [7], while subjective sleep is less reliable than objective sleep according to studies of sleep, ageing, and cognition [8,9].
In an attempt to address the shortcomings of self-reports and based on technological advances, we propose the coQoL PRO-TechRO co-calibration method. Our research primarily focuses on assessing behaviours and outcomes by combining questionnaires with devices such as smartphones and wearables, assessing multiple outcomes (e.g., physical activity, sleep, and heart rate) momentarily, and, if collected for a long time, also longitudinally [10]. Numerous studies used validated, expensive, and bulky lab-grade devices (e.g., ActiGraph), although for a limited time due to the user burden and discomfort of wearing them [11]. Conversely, consumer-friendly wearables measure continuously and objectively TechROs, increasingly more accurately, as technology progresses [12]. Also, more individuals opt for consumer-friendly wearable devices; the market size for consumer wearables will likely double by 2022 [13]. More recent research showed that consumer wearables could assess multiple behaviours accurately [14], unobtrusively [15], and continuously [16] while worn by participants during the natural unfolding of their daily lives. Overall, consumer devices are accurate and used enough to be leveraged in human health studies.
Prior work has aimed to co-calibrate physical and psychological outcomes with technology-reported ones. We identified this work by following a semi-structured literature review detailed in Appendix A.1. Table 1 presents the PRO-TechRO co-calibration studies resulting from our literature review for the following outcomes: physical activity, social support, anxiety and depression, memory, sleep, and health-related Quality of Life. For each study, the table presents the PROs and TechROs used for co-calibration, the study design, the analysis methodology, and a summary of results. For the PROs, the table presents the long names of the PRO instruments used in each study, followed by the TechRO details, including at least the device name and its form factor (consumer wearable or research-grade accelerometer, and position on the body). The study design details include the target population, sample size and age, and study duration. Past co-calibration methods range from simple descriptive statistics to inferential statistics via correlation methods, to machine learning, including regression and classification. The results column summarizes the PRO-TechRO co-calibration findings presented in each paper.
To better emphasize the difference between the state of the art and our work, we recall that we focus on healthy seniors and that our method implies repeated sets of different PRO assessments in longitudinal daily life TechRO assessment settings, based on consumer wearables. All studies presented in Table 1 have at least one feature (marked in violet) that excludes them from co-calibrating PRO questionnaires with TechRO consumer wearables in healthy seniors in the wild over long periods (above the typical 7-14 days found in the literature). Table 1 does not include studies on nutrition, since, to the best of our knowledge, the co-calibration of the diet with distant measures such as steps or sleep using questionnaire PROs and consumer wearables (or, at the very least, accelerometers) does not exist in the literature. However, there are numerous articles on energy expenditure estimates measured by consumer wearables that guide the energy intake (food types and qualities) for individuals following dietary recommendations [17][18][19].
Table 1 (fragment). Analysis: Spearman correlation. Result: PAAQ and IPAQ agreed for moderate and vigorous activity (rS = 0.44 and rS = 0.2, respectively). Study: Garriguet et al. (2015) [36]; outcome: Physical Activity; PRO: International Physical Activity Questionnaire (IPAQ).

Patient-Reported Outcomes (Profile)
At the first visit, in the profile, participants provided their age, gender, ethnicity, profession, education, cohabitants status, height, weight, blood pressure, cholesterol, smoking, alcohol, medication (hypertension), history of personal health issues (diabetes, apnea, insomnia, hyperglycemia, stroke, infarct, depression), and history of family health issues (hypertension, diabetes, stroke, heart attack, dementia).
We included in the analysis participants who self-reported mild disease. We assigned participants to three health groups: (1) all participants (denoted as the all health group), (2) only the healthy participants (healthy), and (3) only those with mild disease (diseased). We collected the behavioural wearable markers from the daily aggregates provided by the Fitbit daily activity summary application programming interface (API) [63]. Appendix B.2.1 motivates our choice for Fitbit as a personal wearable activity monitor in the context of our study.
We processed the wearable data by aggregating it over consecutive days in aggregate intervals spanning from 7 to 120 days. We included in the analysis only days with at least 21 hours of Fitbit measurement as valid days. Then we required each aggregate interval to have at least 70% valid days. This procedure corresponds to Step 1B in Figure 1. Appendix B.2.2 details the data processing.
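The valid-day and valid-interval rules can be illustrated with a minimal sketch. The data layout (a dictionary from date to a wear-time and value tuple), the function name, and the use of the mean as the aggregate are assumptions made for illustration; the thresholds (21 hours, 70%) come from the text.

```python
from datetime import date, timedelta

MIN_WEAR_HOURS = 21        # a day is valid with at least 21 h of measurement
MIN_VALID_FRACTION = 0.70  # an interval needs at least 70% valid days


def aggregate_interval(daily, end_day, length_days):
    """Mean of a daily TechRO over the `length_days` ending on `end_day`.

    `daily` maps a date to a (wear_hours, value) tuple; this layout is
    hypothetical, and averaging is an assumed aggregate. Returns None when
    too few days in the interval are valid.
    """
    days = [end_day - timedelta(days=offset) for offset in range(length_days)]
    valid_values = [
        daily[day][1]
        for day in days
        if day in daily and daily[day][0] >= MIN_WEAR_HOURS
    ]
    if len(valid_values) < MIN_VALID_FRACTION * length_days:
        return None
    return sum(valid_values) / len(valid_values)
```

With a 7-day interval, five valid days (71%) pass the check, while four valid days (57%) cause the interval to be discarded.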

Co-Calibration (PROs vs. TechROs)
We co-calibrated PROs with TechROs by alignment. Concretely, for a PRO variable to align to a TechRO variable, the administration date of the former must have been within a set duration (0-120 days) from the end date of the latter.
To account for small samples, we allowed a leeway (0-120 days) between the end of the TechRO monitoring interval and the PRO scale administration date.
For each participant, we included only the last alignment in a wave, to discard repeated answers within a few minutes and reduce bias towards overly diligent responders.
When we aligned PROs with TechROs of increasing durations, the number of paired observations decreased; we thus required a minimum of 10 observations to have a nontrivial size [69].
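The alignment rules above (leeway, last alignment per wave, minimum of 10 observations) might be sketched as follows. The tuple layouts, the function name, the choice of the interval ending closest to the administration date, and the requirement that the interval ends on or before that date are assumptions, not details confirmed by the text.

```python
from datetime import date

MIN_OBSERVATIONS = 10  # minimum number of paired observations


def align(pro_answers, techro_intervals, leeway_days):
    """Pair each PRO answer with a TechRO interval ending shortly before it.

    pro_answers: list of (participant, wave, administered_on, pro_value)
    techro_intervals: dict participant -> list of (interval_end, techro_value)
    Both layouts are hypothetical. Returns a list of (pro_value, techro_value)
    pairs, or an empty list when fewer than MIN_OBSERVATIONS pairs remain.
    """
    last_per_wave = {}
    for participant, wave, administered_on, pro_value in sorted(
        pro_answers, key=lambda answer: answer[2]
    ):
        candidates = [
            (interval_end, techro_value)
            for interval_end, techro_value in techro_intervals.get(participant, [])
            if 0 <= (administered_on - interval_end).days <= leeway_days
        ]
        if not candidates:
            continue
        # Assumption: use the interval that ends closest to the administration.
        _, techro_value = max(candidates, key=lambda candidate: candidate[0])
        # Answers are visited in date order, so a later answer in the same
        # wave overwrites an earlier one and only the last alignment is kept.
        last_per_wave[(participant, wave)] = (pro_value, techro_value)
    pairs = list(last_per_wave.values())
    return pairs if len(pairs) >= MIN_OBSERVATIONS else []
```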
For each PRO-TechRO pair, we reported the highest correlation among all aggregation intervals of TechRO (7-120 days) aligned to match the PRO administration date. We included only significant correlations, i.e., those correlation coefficients whose 95% confidence interval maintained sign. This procedure corresponds to Step 2 in Figure 1. Appendix B.3 elaborates on the details of the PRO-TechRO variable alignment.
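A minimal sketch of this selection step is shown below. The text does not state how the 95% confidence intervals were computed, so the Fisher z-transform with a standard error of 1/sqrt(n - 3) is an assumption used purely for illustration; interpreting "highest" as largest in magnitude, the dictionary layout, and the function names are likewise hypothetical.

```python
import math


def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a correlation of n pairs via the Fisher z-transform.

    Assumption: the source does not say how its intervals were computed.
    """
    if n <= 3:
        return (-1.0, 1.0)  # too few pairs to say anything
    if abs(r) >= 1.0:
        return (r, r)       # atanh is undefined at +/-1
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return (math.tanh(z - half_width), math.tanh(z + half_width))


def best_interval(correlations):
    """Pick the strongest significant correlation across interval lengths.

    correlations: dict interval_days -> (r, n); hypothetical layout.
    Returns (interval_days, r) or None when nothing is significant.
    """
    significant = {}
    for interval_days, (r, n) in correlations.items():
        low, high = fisher_ci(r, n)
        if low > 0 or high < 0:  # the interval keeps its sign
            significant[interval_days] = r
    if not significant:
        return None
    best = max(significant, key=lambda days: abs(significant[days]))
    return best, significant[best]
```

For example, with made-up inputs `{7: (0.3, 12), 30: (0.85, 12), 120: (-0.9, 10)}`, the 7-day coefficient is dropped because its interval crosses zero, and the 120-day coefficient is reported because it is the largest in magnitude.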

Data Analysis
We conducted descriptive and inferential analyses of the PROs and TechROs. We then analyzed patterns in the resulting correlations.

Descriptive Analysis (PROs and TechROs)
The descriptive analysis consisted of summary statistics (median, mean, and standard deviation, or SD) based on groups of participant-wave characteristics. In our study, we analyzed the participants by their self-reported health, country, and gender groups. For PROs, we observed the statistics across waves. Appendix B.1 elaborates on the analysis of the PRO variables. For TechROs, we observed the statistics across the entire study period and by counting valid days, described in depth in Appendix B.2. Appendix B.3.1 details the descriptive analysis procedure.

Inferential Analysis (PROs vs. TechROs)
We co-calibrated PRO variables with TechRO variables by computing the Spearman rank correlation [70] on each pair of PRO-TechRO variables resulting from the alignments. The Spearman correlation coefficient rS measures the direction and strength of the monotonic association between two variables. We used the SciPy library [71] to implement the Spearman correlations. Appendix B.3.2 elaborates on the motivation and assumptions for the inferential analysis. This procedure corresponds to Step 4 in Figure 1.
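To make the statistic concrete, the sketch below computes a Spearman coefficient in plain Python as the Pearson correlation of tie-averaged ranks. It is an illustration only and not the study's code; the function names are hypothetical. In SciPy, the equivalent library call is `scipy.stats.spearmanr`.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rank_x) / n, sum(rank_y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
    spread_x = sum((a - mean_x) ** 2 for a in rank_x)
    spread_y = sum((b - mean_y) ** 2 for b in rank_y)
    if spread_x == 0 or spread_y == 0:
        return float("nan")  # undefined when one variable is constant
    return covariance / (spread_x * spread_y) ** 0.5
```

Because only ranks enter the computation, any strictly increasing relationship (linear or not) yields a coefficient of 1.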

Pattern Analysis (PROs vs. TechROs)
We used the results from the inferential analysis to highlight informative PRO variables and pairs of PRO-TechRO. This procedure corresponds to Step 5 in Figure 1. We employed two metrics that focus on the number of correlations (a high number of significant correlations with TechRO variables indicates that the PRO variable is informative) and the quality of the correlations (where possible, a strong significant correlation with other significant correlations in its vicinity indicates that the PRO-TechRO correlation is informative).
The first metric, denoted total, counts all strong correlations (rS ≥ 0.5) for a given PRO variable and highlights those PRO variables that correlate with the most TechRO variables. We applied this metric to all PRO variables.
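The total metric can be sketched as a simple count per PRO variable. The dictionary layout and function name are hypothetical, and applying the threshold to the absolute value of rS (so that strong negative correlations also count) is an assumption.

```python
def total_metric(correlations, threshold=0.5):
    """Count strong significant correlations per PRO variable.

    correlations: dict (pro_variable, techro_variable) -> r, holding only
    significant coefficients; hypothetical layout. The threshold is applied
    to |r|, which is an assumption.
    """
    totals = {}
    for (pro_variable, _techro_variable), r in correlations.items():
        if abs(r) >= threshold:
            totals[pro_variable] = totals.get(pro_variable, 0) + 1
    # Most informative PRO variables first.
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))
```

PRO variables with no correlation above the threshold do not appear in the result at all.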
The second metric, denoted contour, applies only to variables that can be ordered by a criterion. For our study, we ordered TechRO physical activity variables by their intensities (from sedentary to vigorous). We applied this metric to strong and significant correlations (rS ≥ 0.8) between a PRO and a TechRO physical activity intensity variable. The metric counted the maximum number of adjacent significant correlations of the same PRO variable (at lower and, separately, higher intensities) such that they would form a contiguous sequence of significant correlations that maintained the sign. Appendix B.3.3 further explains and exemplifies this metric.
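One possible reading of the contour metric is sketched below: starting from a strong correlation, walk towards lower and, separately, higher intensities, counting significant same-sign neighbours until the run breaks. The four-level intensity list, the row layout, and the function name are simplifying assumptions; Appendix B.3.3 remains the authoritative definition.

```python
INTENSITIES = ["sedentary", "light", "fair", "vigorous"]  # ordered low to high


def contour(row, center, threshold=0.8):
    """Count same-sign significant neighbours around a strong correlation.

    row: list of coefficients for one PRO variable, ordered like INTENSITIES,
    with None where the correlation was not significant (hypothetical layout).
    Returns (lower, higher) counts, or None if row[center] is not strong.
    """
    r = row[center]
    if r is None or abs(r) < threshold:
        return None

    def run_length(indices):
        count = 0
        for index in indices:
            neighbour = row[index]
            if neighbour is None or (neighbour > 0) != (r > 0):
                break  # the contiguous same-sign run ends here
            count += 1
        return count

    lower = run_length(range(center - 1, -1, -1))
    higher = run_length(range(center + 1, len(row)))
    return lower, higher


# Illustrative values only: a strong "fair" correlation with one same-sign
# significant neighbour at the lower ("light") intensity and none above.
example = contour([-0.4, 0.6, 0.85, None], INTENSITIES.index("fair"))
```

Under this reading, a strong correlation with counts of (0, 0) has no contour, which matches the "isolated" correlations discussed in the results.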

Results
In this section, we report the results from the study participants (Section 3.1) and analyses (descriptive in Section 3.2, inferential in Section 3.3, and patterns in Section 3.4) as well as two use case examples for coQoL (Section 3.5).

Study Participants
Forty-two seniors (mean age 69.8 ± 7.4) signed up for the study. Of these, 39 participants (mean age 70.0 ± 7.2, 22 women, 26 from Spain and 13 from Hungary) provided at least one PRO; three participants were disqualified. Of the qualified participants, 28 reported no health condition (forming the healthy health group) and 11 reported a mild health condition (forming the diseased health group). Participant characteristics are available in Table 4. Three waves of PRO participation took place within the study period from January 2017 to December 2019: wave 1 (mid-2018), wave 2 (end-2018 and start-2019), and wave 3 (mid-2019). Table 5 illustrates the waves of participation for each participant and questionnaire.

Inferential Analysis (PROs vs. TechROs)
Appendix C.2 elaborates on the Spearman rank correlations resulting from the inferential analysis on each questionnaire and PRO-TechRO variable pair.

coQoL for Physical Activity (IPAQ vs. Fitbit)
We report the correlations of PRO physical activity variables (IPAQ) with TechRO variables (Fitbit) by using the total and contour metrics. Table 6 highlights the PROs that correlated with the most TechROs (rS ≥ 0.5) across all TechRO families by health group.
Table 6 (fragment). Outcome: Quality of Life; PRO: EQ-5D-3L; Health: All; item Q6 (health state today): 4, 1, 3, 8.
In the health group with all participants, when assessing totals of correlations, PRO moderate activity in the domestic, garden, and leisure domains correlated with the most TechROs (Table 6).
In the group with healthy participants, PRO moderate activity in the domestic and garden domains had the most correlations with TechROs as well. The domestic moderate and garden moderate activity were also the only two PROs highlighted by the total metric in the groups with all and healthy participants.
In the group with diseased participants, PRO vigorous activity in the garden and leisure domains correlated with the most TechROs, followed by the PRO moderate and vigorous activities in the work domain (Table 6).

Physical Activity Outcomes by Contours of Correlations
We report the strong correlations (rS ≥ 0.8) and their contours between PRO variables (IPAQ) and TechRO variables (Fitbit) in Table 7.
In the health group with all participants, when assessing strong correlations, the PRO domestic moderate activity had a small contour of correlations with the TechRO light+fair physical activity. Also, the PRO work vigorous activity may explain the TechRO active duration without a contour (Table 7, rows with Health: All).
In the group with healthy participants, only two strong correlations emerged without contours. PRO work moderate and total activity correlated with the TechRO fair activity duration (Table 7, rows with Health: Healthy).
In the group with diseased participants, we found numerous correlations with and without contours in the work domain. A positive relationship with a broad contour occurred between PRO work moderate activity and TechRO fair activity duration. Furthermore, PRO work moderate activity correlated negatively with TechRO sedentary duration. However, work activity at the two extreme intensities (walking and vigorous) also correlated negatively with relative light activity (Table 7, rows with Health: Diseased and PRO Domain: Work).
For the PRO garden domain, PRO vigorous activity correlated negatively with contours with TechRO relative sedentary and light activity, indicating that it may redistribute physical activity across the other intensities over the day (Table 7, rows with Health: Diseased and PRO Domain: Garden).
For the PRO leisure domain, walking activity correlated without contours with energy and steps. PRO leisure vigorous activity correlated positively with TechRO fair+vigorous activity durations and negatively with TechRO absolute sedentary and relative light durations. The PRO leisure total activity had a correlation with contour consistent with the previous correlation: negative relationship with TechRO sedentary+light activity (Table 7, rows with Health: Diseased and PRO Domain: Leisure).
The PRO vigorous activity in the work domain appeared in both the all and diseased groups. However, its correlations diverged: for all participants, work vigorous activity was associated with the total daily activity, while for the mildly diseased, it may replace light activity. The moderate activity at work had opposite relations with fair activity for diseased (positive) and healthy (negative) participants. However, for the diseased, the correlation had a broad contour, while for the healthy it had none. In this case, the latter relation may have been a false positive (Table 7, rows with PRO Domain: Work).
Across numerous PROs, the TechRO of sedentary activity correlated strongly only for diseased participants and mostly in relative families. PRO moderate to vigorous activity at work, in the garden, and for leisure all negatively correlated with TechRO daily sedentary duration. These results indicate that moderate activity may contribute to lower measured TechRO sedentary duration, but the redistributions of daily time to other TechRO intensities may vary between TechRO fair and vigorous intensities (Table 7, rows with Health: Diseased and TechRO Variable: Sedentary).

Physical Activity Outcomes Highlighted by Both Metrics
For the health group with all participants, the domestic moderate activity appeared with both metrics. This result is in concordance with the strong correlations in the PRO domestic domain mentioned above (Tables 6 and 7, rows with Health: All).
In the group with diseased participants, the total metric results confirmed those using the contour metric for the PRO work domain at moderate and vigorous intensities (Tables 6 and 7, rows with Health: Diseased).

Physical Activity Outcomes Interpretation
In the health group with all participants, we observed several "expected" correlations. The PRO domestic moderate activity was associated with the TechRO absolute light+fair activity duration. This effect is only visible for the total metric, indicating that PRO domestic and garden moderate activity may redistribute physical activity across numerous TechRO intensities.
In the group with diseased participants, PRO work moderate activity was associated with the TechRO absolute fair activity duration. For the same health group, leisure walking activity correlated with both energy and steps, while PRO vigorous activity correlated with both absolute fair+vigorous activity and relative vigorous activity (when including sleep).
In this group, we also found "expected" correlations between PROs and TechRO sedentary duration. PRO moderate activity at work, vigorous activity in the garden, and vigorous activity for leisure were negatively associated with TechRO sedentary duration. The TechRO sedentary+light duration was negatively associated with the PRO total active effort as well.
Other associations indicate potential activity replacements (within TechRO) for the same health group (diseased). Walking at work was negatively associated with the relative duration of activity at the light intensity, indicating that participants who walk at work tend to perform less light activity elsewhere. Also, the vigorous activity effort may replace light activity duration during the day, indicating that the participants tend to limit their physical activity to a narrow spectrum of intensities.
The distribution of results per families of TechROs indicates that for the groups with all participants and the healthy, the absolute families may provide most, if not all, strong correlations. However, for the diseased group, measuring the entire physical activity duration and including sleep uncovered associations weaker or non-significant otherwise. For this group, measuring only raw energy or steps TechROs may be indicative of their leisure walking efforts, potentially useful for more sedentary participants who do not work.
Both metrics highlighted all IPAQ domains except transport. The PRO transport physical activity was not indicative of TechRO physical activity measures, potentially due to the lower and fewer correlations with transport. However, the raw responses indicate that transport walking activity may associate with the numeric score of physical activity.

coQoL for Social Support (MSPSS vs. Fitbit)
We report the correlations of PRO social support variables (MSPSS) with TechRO variables (Fitbit) by using the total and contour metrics. Table 6, rows with Outcome: Social Support, enumerates the PROs that correlated with the most TechROs (rS ≥ 0.5) across all families by health group.

Social Support Outcomes by Total Numbers of Correlations
In the health group with all participants, PRO family items Q8 (talks about problems) and Q11 (willing to help make decisions) correlated with the most TechROs.
In the group with healthy participants, PRO friends items, Q6 (friends try to help), Q9 (friends share joys and sorrows), and Q12 (friends talk about problems), had relatively more correlations with TechROs than PRO significant other or family items. Furthermore, the PRO friends numeric score had many correlations with TechROs.
In the group with diseased participants, PRO family Q4 (family gives emotional help and support) correlated negatively with TechRO absolute sedentary duration and Q12 (friends talk about problems) positively with the TechRO steps (Table 8, rows with Health: Diseased).
Table legend. Color coding: from orange (weak correlation) to green (strong correlation). × depicts an absent significant correlation of the same sign next to the strong correlation.

Social Support Outcomes by Contours of Correlations
We report the strong correlations (rS ≥ 0.8) and their contours between PRO variables (MSPSS) and TechRO variables (Fitbit) in Table 8.
In the health group with all participants, several PRO items related to the significant other social support, Q2 (a special person shares joys and sorrows), Q5 (a special person is a real source of comfort), and Q10 (a special person cares about my feelings) correlated strongly and with a broad contour with TechRO relative vigorous activity durations when including sleep (Table 8, rows with Health: All and PRO Source: Significant other). Also, several PRO family items, Q3 (family tries to help) and Q8 (family talks about problems) as well as the family numeric sub-score correlated strongly and with a broad contour with TechRO relative fair and vigorous activity durations when including sleep. These two strong co-calibrations only appeared as highlighted in the CLR PA+S family (Table 8, rows with Health: All and PRO Source: Family).
In the group with healthy participants, we observed numerous strong negative correlations with broad contours between several PRO items and the TechRO fair physical activity duration. These items relate to the significant other source: Q1 (a special person is around when in need), Q2 (a special person shares joys and sorrows), Q5 (a special person is a real source of comfort), and Q10 (a special person cares about my feelings), as well as the significant other numeric sub-score. However, we also observed a strong, positive correlation with a similarly sized contour between PRO item Q5 (a special person is a real source of comfort) and TechRO fair activity duration in the relative CLR PA+S family. These results indicate that measuring daily sleep is necessary to co-calibrate this PRO source and TechRO physical activity intensity (Table 8, rows with Health: Healthy and PRO Source: Significant other).
Also, several PRO family items, Q3 (family tries to help), Q8 (family talks about problems), and Q11 (family is willing to help make decisions) correlated negatively with TechRO absolute fair activity, but positively with the relative duration at the same physical activity intensity (Table 8, rows with Health: Healthy and PRO Source: Family), yielding a similar interpretation.
A few PRO friends items, such as Q9 (friends share joys and sorrows) and Q12 (friends talk about problems), correlated with broad contours with the TechRO absolute light physical activity duration (Table 8, rows with Health: Healthy and PRO Source: Friends).
Also, the PRO categorical score strongly correlated without contour with the TechRO absolute daily duration of physical activity (active) and the relative CLR PA light activity. The PRO numeric score also correlated with the TechRO absolute light+fair activity and relative CLR PA+S fair activity, indicating a positive relationship between social support and light to fair activity (Table 8, rows with Health: Healthy and PRO Source: All).
In the group with diseased participants, we only observed two isolated strong correlations. PRO family item Q4 (gives emotional help and support) correlated negatively with TechRO sedentary duration. PRO friends item Q12 (talk about problems) correlated positively with daily steps (Table 8, rows with Health: Diseased).
PRO items Q2, Q3, Q5, Q8, Q10, and the numeric score appeared in both groups of all and healthy participants. However, only Q8 maintained the correlation with TechRO fair physical activity across health groups. Q12 had strong correlations in both groups of healthy and diseased participants. However, the relationship was expressed through separate outcomes: light activity and steps, respectively (Table 8).

Social Support Outcomes Highlighted by Both Metrics
In the health group with all participants, PRO friends Q9 (friends share joys and sorrows) and Q12 (friends talk about problems) were highlighted as strongly correlated by both contour and total metrics, and thus informative for co-calibration with TechROs (Tables 6 and 8, rows with Health: All).
In the group with healthy participants, for the significant other and family sources of social support, Q10 (a special person cares about my feelings) and Q3 (family tries to help) appeared as informative with both metrics (Tables 6 and 8, rows with Health: Healthy).

Social Support Outcomes Interpretation
In the health group with all participants, several PRO items related to significant other and family social support correlated with TechRO relative fair and vigorous activity: family items with fair activity, and significant other items with vigorous activity. All correlations resulted from relative TechROs including sleep. For this reason, the assessment of social support may benefit from the inclusion of sleep in the analysis.
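To illustrate why including sleep can change relative TechRO values, the sketch below assumes that "CLR" in the relative families denotes the centered log-ratio transform from compositional data analysis and that "PA+S" means physical activity plus sleep; the first expansion is not spelled out in this section, so it is an assumption. The daily durations (in minutes) are invented for the example.

```python
import math


def clr(durations):
    """Centered log-ratio of the strictly positive parts of a composition.

    durations: dict part name -> duration; every duration must be > 0.
    """
    logs = [math.log(value) for value in durations.values()]
    mean_log = sum(logs) / len(logs)
    return {name: math.log(value) - mean_log for name, value in durations.items()}


# Invented daily durations in minutes (assumption, for illustration only).
activity_only = {"sedentary": 600.0, "light": 240.0, "fair": 30.0, "vigorous": 30.0}
with_sleep = dict(activity_only, sleep=540.0)

relative_pa = clr(activity_only)   # relative values without sleep
relative_pa_s = clr(with_sleep)    # relative values including sleep
```

The same absolute fair activity duration maps to a different relative value once sleep enters the composition, which is one way a correlation can change between the absolute and relative families.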
In the group with healthy participants, the PRO social support from the significant other had negative correlations with TechRO fair activity in the absolute amount and positive correlations with fair activity in the relative amount (including sleep). This pattern was also pronounced for the items related to family social support. Sleep changed the ordering of durations throughout the day across the healthy participants. We argue for including sleep in the analysis of significant other and family social support for healthy seniors. Having friends who share joys and sorrows and, in general, talk about problems was associated with more light activity.
In the group with diseased participants, emotional help and support from the family was associated with less sedentary time throughout the day. Also, having friends who talk about problems was associated with more steps.
In general, the significant other being a real source of comfort appeared in most instances, followed by having someone who cares about feelings, then having someone who shares joys and sorrows, and then (at a distance) having a special person around when in need. Having a significant other who is a source of comfort may serve as a proxy item for more frequent assessments of the relationships between significant other social support and physical activity at the fair to vigorous intensities.
Having a family that tries to help, talks about problems, and wishes to help make decisions appeared in three groups across metrics. However, getting emotional help and support from the family only appeared once. Frequent administrations of the MSPSS may choose to assess the relationships between family social support and fair physical activity by using only the first three items.
Having friends with whom to talk about problems appeared in three groups across metrics. Having friends who try to help and share joys and sorrows appeared less often with strong correlations and contours but had numerous correlations in total. We argue that counting on friends when things go wrong is a less prominent item in assessing relationships between friends social support and physical activity.

coQoL for Anxiety and Depression (GADS vs. Fitbit)
We report the correlations of PRO anxiety and depression (GADS) with TechRO variables (Fitbit) by using the total and contour metrics. Table 6, rows with Outcome: Anxiety and depression, enumerates the PROs that correlated with the most TechROs (rS ≥ 0.5) across all families by health group.

Anxiety and Depression Outcomes by Total Numbers of Correlations
In the health group with all participants, PRO anxiety item Q8A (worried about own health), as well as PRO depression items Q1D (lacking energy) and Q6D (lost weight due to poor appetite), recorded the most correlations with TechROs (Table 6, row with Outcome: Anxiety and depression, Health: All).
In the group with healthy participants, PRO item Q2D (lost interest in things) had the most correlations (Table 6, row with Outcome: Anxiety and depression, Health: Healthy).
In the group with diseased participants, PRO item Q2A (worrying a lot) had the most correlations with TechROs (Table 6, row with Outcome: Anxiety and depression, Health: Diseased).

Anxiety and Depression Outcomes by Contours of Correlations
We report the strong correlations (rS ≥ 0.8) and their contours between PRO variables (GADS) and TechRO variables (Fitbit) in Table 9.
In the health group with all participants, PRO anxiety item Q5A (sleeping poorly) correlated strongly with a broad contour with TechRO relative CLR PA+S light physical activity. We found other isolated correlations for anxiety. PRO item Q3A (irritable) correlated with the TechRO relative vigorous activity. PRO item Q7A (trembling [...]) negatively correlated with the TechRO daily active duration. PRO depression items Q1D (lacking energy) and Q6D (lost weight due to poor appetite) had isolated correlations. The PRO numeric score had a strong correlation with the TechRO relative sleep duration (Table 9, rows with Health: All).
In the group with healthy participants, PRO anxiety item Q7A (trembling [...]) correlated positively with TechRO vigorous activity and negatively with TechRO light and light+fair activity durations (the last with a broad contour) in both absolute and relative families. PRO item Q7A correlated negatively with the total daily active duration. PRO item Q3A (irritable) correlated negatively with total daily active duration. PRO depression items Q2D (lost interest in things) and Q9D (worse in the morning) had isolated correlations, the first negative with TechRO relative CLR PA light activity duration, and the second with TechRO relative CLR PA+S sedentary duration. PRO item Q6D (lost weight due to poor appetite) recorded a positive correlation as well, with TechRO relative sleep duration (Table 9, rows with Health: Healthy).
In the group with diseased participants, we did not observe strong correlations (rS ≥ 0.8) by using the contour metric (Table 9, rows with Health: Diseased).
PRO items Q3A, Q7A, and Q6D appeared in both groups with all and healthy participants. However, only Q7A kept the same strong correlation against total daily active duration in the two groups (Table 9).

Anxiety and Depression Outcomes Highlighted by Both Metrics
In the health group with all participants, PRO items Q1D (lacking energy) and Q6D (lost weight due to poor appetite) were highlighted by both metrics (Tables 6 and 9, rows with Health: All).
For healthy participants, PRO item Q2D (lost interest in things) appeared in both metrics as well (Tables 6 and 9, rows with Health: Healthy).

Anxiety and Depression Outcomes Interpretation
In the health groups with all and healthy participants, irritability and trembling may expediently assess anxiety while having lost interest in things and losing weight due to poor appetite may assess depression. Follow-up investigations may establish whether the health state is momentary or deteriorating over time.
The PRO item on trembling, tingling, dizziness, sweating, diarrhoea, or passing urine (Q7A) yielded numerous correlations for healthy participants: negative correlations with TechRO light, light+fair, and total daily active durations, as well as a positive correlation with vigorous physical activity duration. When daily-life monitoring shows a gradual replacement of light and fair activity with vigorous activity (as reported by the wearable), it may be worth investigating whether an otherwise healthy participant is also becoming gradually more anxious (by administering the corresponding PRO items).
In the group with healthy participants, a decrease in light physical activity may indicate that the participants experience an increase in depression. Researchers can then assess this hypothesis by administering, e.g., the corresponding item in the EQ-5D-3L scale. A similar process could be employed for all seniors by longitudinally monitoring the sleep duration relative to the 24 h of the day, based on the corresponding strong correlations between the numeric score and the relative sleep duration. In the case of increasingly longer sleep, the participant may enter a state of anxiety or depression.
In general, depression and anxiety were positively associated with the sedentary duration, in both absolute and relative TechRO families, especially for participants who self-report disease. The two items in the scale referring to sleep may provide additional insights into not only the anxiety and depression status of the participant, but also sleep quality.

coQoL for Mediterranean Nutrition (PREDIMED vs. Fitbit)
We report the correlations of PRO Mediterranean nutrition variables (PREDIMED) with TechRO variables (Fitbit) by using the total and contour metrics. Table 6, rows with Outcome: Mediterranean nutrition, enumerates the PROs that correlated with the most TechROs (rS ≥ 0.5) across all families by health group.
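The counting behind the total metric can be illustrated with a minimal sketch. It assumes a hypothetical dictionary `correlations` that maps (PRO, TechRO) pairs to Spearman coefficients, and it assumes that negative correlations count by their absolute value; the item names and values below are invented for illustration and are not study results.

```python
from collections import Counter

def count_correlations(correlations, threshold=0.5):
    """Count, per PRO, the TechROs whose |rS| reaches the threshold.

    Assumption: negative correlations count by absolute value.
    """
    counts = Counter()
    for (pro, techro), r_s in correlations.items():
        if abs(r_s) >= threshold:
            counts[pro] += 1
    return counts

# Hypothetical coefficients for illustration only (not study results).
correlations = {
    ("Q12", "light"): 0.62,
    ("Q12", "fair"): -0.55,
    ("Q12", "steps"): 0.31,
    ("Q14", "light"): 0.58,
}
totals = count_correlations(correlations)
ranked = totals.most_common()  # PROs ordered by number of correlations
```

Ranking the PROs by these counts gives the kind of ordering that a table of "PROs with the most correlations" summarizes.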

Mediterranean Nutrition Outcomes by Total Numbers of Correlations
In the health group with all participants, the PRO categorical score, numeric score and items Q12 (nuts use) and Q14 (sofrito use) had the most correlations with TechROs (Table 6, rows with Outcome: Mediterranean nutrition, Health: All).
In the group with healthy participants, PRO item Q4 (fruit use) and the categorical score had the most correlations with TechROs (Table 6, rows with Outcome: Mediterranean nutrition, Health: Healthy).
In the group with diseased participants, we only observed PROs with few correlations with TechROs across families (Table 6, rows with Outcome: Mediterranean nutrition, Health: Diseased).
The categorical score is the only PRO that appeared with numerous correlations in the two groups with all and healthy participants (Table 6).

Mediterranean Nutrition Outcomes by Contours of Correlations
We report the strong correlations (rS ≥ 0.8) and their contours between PRO variables (PREDIMED) and TechRO variables (Fitbit) in Table 10. In Table 10, the color coding runs from orange (weak correlation) to green (strong correlation), and × depicts an absent significant correlation of the same sign next to the strong correlation.
In the health group with all participants, PRO item Q12 (nuts use) had an isolated negative correlation with the TechRO absolute fair activity, but a positive correlation (with a contour) with the TechRO relative CLR PA+S light activity. The PRO numeric score also registered two correlations with contours: negative with TechRO absolute vigorous activity duration and positive with TechRO relative CLR PA+S light activity duration (Table 10, rows with Health: All).
In the group with healthy participants, PRO item Q3 (vegetables use) correlated negatively with the TechRO relative fair activity in both CLR PA and CLR PA+S families (Table 10, rows with Health: Healthy). While the two correlations had no contour, their presence in both families highlights an effect.
In the group with diseased participants, PRO item Q5 (red meat, hamburger, or meat use) correlated positively with TechRO energy expenditure. For the same group, PRO item Q11 (commercial sweets or pastries use) correlated positively with TechRO heart rate (Table 10, rows with Health: Diseased).

Mediterranean Nutrition Outcomes Highlighted by Both Metrics
For all participants, PRO item Q12 (nuts use) and the numeric score were highlighted by both metrics (Tables 6 and 10, rows with Health: All).

Mediterranean Nutrition Outcomes Interpretation
In the health group with all participants, the nutrition numeric score was associated with the relative light activity duration, and using nuts had a similar correlation (both correlations with contours). Further studies may assess whether this item can be administered independently of the full scale (for the numeric score) to assess the relationship between (mal)nutrition and light physical activity in seniors.
With regard to poor nutrition choices and their potentially magnified effects on people with mild disease, the consumption of red meat and hamburgers by participants with mild disease correlated with higher energy expenditure. The consumption of commercial sweets or pastries was also associated with an increased heart rate.
The PRO numeric and categorical scores correlated with numerous TechROs, indicating a replacement of fair-to-vigorous activity with light activity.
Participants from Spain adhered more, on average, to the Mediterranean diet than those from Hungary (Appendix C.1.1), making the country of residence a potential confounder for the relationships above.

coQoL for Nutrition (SelfMNA vs. Fitbit)
We report the correlations of PRO nutrition variables (SelfMNA) with TechRO variables (Fitbit) by using the total and contour metrics. Table 6, rows with Outcome: Nutrition, enumerates the PROs that correlated with the most TechROs (rS ≥ 0.5) across all families by health group.

Nutrition Outcomes by Total Numbers of Correlations
For all health groups, we found PROs correlated with few TechROs when compared to other outcomes (Table 6, row with Outcome: Nutrition, Health: All).
In the groups with all participants and the healthy, the PRO categorical score had the most correlations (Table 6, row with Outcome: Nutrition, Health: Healthy).
In the group with diseased participants, PRO items Q1 (food intake declined) and Q2 (weight lost) recorded the most correlations with TechROs (Table 6, row with Outcome: Nutrition, Health: Diseased).
The categorical score is the only PRO that appeared in two health groups: the group with all participants and the group with healthy participants (Table 6).

Nutrition Outcomes by Contours of Correlations
We report the strong correlations (rS ≥ 0.8) and their contours between PRO variables (SelfMNA) and TechRO variables (Fitbit) in Table 11.
We only found strong correlations (rS ≥ 0.8) in the group with diseased participants. PRO items Q1 (food intake declined) and Q2 (weight lost) correlated negatively with the TechRO relative sleep duration. PRO item Q4 (stressed or severely ill) correlated negatively with the TechRO absolute sedentary duration (Table 11).

Nutrition Outcomes Highlighted by Both Metrics
In the group with diseased participants, PRO items Q1 (food intake declined) and Q2 (weight lost) were highlighted by both metrics (Tables 6 and 11, rows with Health: Diseased).

Nutrition Outcomes Interpretation
In the health group with all participants, the PRO categorical score correlated with numerous TechROs. In general, better nutrition coincided with less sedentary and light physical activity and more fair and vigorous physical activity. In the group with healthy participants, both numeric and categorical scores exhibited this pattern (Appendix C.2).
In the group with diseased participants, a long-term decrease in sleep duration may indicate a decline in food intake or a loss of weight, two outcomes that appeared in both metrics and may lead to malnutrition.

coQoL for Memory (MFE vs. Fitbit)
We report the correlations of PRO memory variables (MFE) with TechRO variables (Fitbit) by using the total and contour metrics. Table 6, rows with Outcome: Memory, enumerates the PROs that correlated with the most TechROs (rS ≥ 0.5) across all families by health group.

Memory Outcomes by Total Numbers of Correlations
In the health group with all participants, the PRO items that correlated with the most TechROs were Q12 (having difficulty picking up a new skill), Q14 (forgetting to do planned things), and Q6 (forgetting the time of events) (Table 6, rows with Outcome: Memory and Health: All).
In the group with healthy participants, PRO items Q6 (forgetting the time of events), Q15 (forgetting details of done things), Q12 (having difficulty picking up a new skill), and Q14 (forgetting to do planned things) correlated with the most TechROs (Table 6, rows with Outcome: Memory and Health: Healthy).
In the group with diseased participants, PRO items Q13 (having a word on the tip of the tongue) and Q25 (getting lost in often visited place) had the most correlations (Table 6, rows with Outcome: Memory and Health: Diseased).
PRO items Q12 (having difficulty picking up a new skill) and Q14 (forgetting to do planned things) were the only outcomes that had numerous correlations with TechROs across two groups: all and healthy (Table 6).

Memory Outcomes by Contours of Correlations
We report the strong correlations (rS ≥ 0.8) and their contours between PRO variables (MFE) and TechRO variables (Fitbit) in Table 12. In Table 12, the color coding runs from orange (weak correlation) to green (strong correlation), and × depicts an absent significant correlation of the same sign next to the strong correlation.
In the health group with all participants, there was only one strong correlation with a contour, between PRO item Q24 (forgetting where things are normally kept) and TechRO fair activity in the CLR PA family. The PRO numeric score had a negative correlation with the TechRO total daily active duration. PRO item Q7 (completely forgetting to take things) had a strong correlation with TechRO relative sleep duration. PRO items Q12 (having difficulty picking up a new skill) and Q13 (finding a word on the tip of the tongue) had negative and positive relations with TechRO relative light and fair CLR PA+S activity durations, respectively (Table 12, rows with Health: All).
In the group with healthy participants, PRO item Q14 (forgetting to do planned things) had a contour of two strong correlations with TechRO fair+vigorous and vigorous activity. PRO item Q16 (forgetting the topic of an ongoing conversation) had a strong correlation (with a contour) with TechRO absolute fair activity duration. PRO items Q10 (letting ramble about unimportant things) and Q24 (forgetting where things are normally kept) had isolated negative correlations with TechRO fair activity duration. PRO item Q7 (completely forgetting to take things) recurred in correlating strongly with sleep. The numeric score also correlated negatively with TechRO relative CLR PA fair activity duration (Table 12, rows with Health: Healthy).
In the group with diseased participants, PRO item Q18 (forgetting to tell somebody something important) had a broad contour with the TechRO fair, fair+vigorous, and vigorous physical activity duration. PRO item Q6 (forgetting the time of events) had a positive correlation with the TechRO heart rate, a positive correlation (having a contour) with the light activity, and a negative correlation with the sleep duration. PRO item Q1 (forgetting objects put) had a negative correlation (contour) with the TechRO relative vigorous activity in the PA+S family. Q13 (finding a word on the tip of the tongue) correlated negatively with TechRO daily active duration and positively with relative sedentary duration in the CLR PA+S family. Q8 (being reminded about things) had a positive correlation with the TechRO light+fair activity duration. The PRO numeric score correlated negatively with the TechRO total active duration (Table 12, rows with Health: Diseased).
PRO items Q7 (completely forgetting to take things) and Q24 (forgetting where things are normally kept), as well as the numeric score, appeared in both groups with all and healthy participants. Items Q7 and Q24 maintained the strong correlations between groups: positive with sleep duration and negative with relative fair activity. The numeric score expressed the inverse relation with physical activity in different ways depending on the health status. For all participants and the mildly diseased, it had a negative correlation with the total daily active duration. For the healthy participants, it had a negative correlation with the relative fair activity duration (Table 12).

Memory Outcomes Highlighted by Both Metrics
In the health group with all participants, Q12 (having difficulty picking up a new skill) was highlighted by both metrics as an informative PRO for memory (Tables 6 and 12, rows with Health: All).
In the group with healthy participants, PRO item Q14 (forgetting to do planned things) was informative in both metrics (Tables 6 and 12, rows with Health: Healthy).
In the group with diseased participants, PRO item Q13 (finding a word on the tip of the tongue) was informative through both metrics (Tables 6 and 12, rows with Health: Diseased).

Memory Outcomes Interpretation
In the health group with all participants, the memory numeric score was strongly associated with shorter durations of any physical activity during the day. A negative correlation with relative fair physical activity also reflected this pattern in the group with healthy participants. A decrease in active duration may provide an opportunity for a long-term monitoring system to assess whether an otherwise healthy senior is experiencing a gradual increase in memory failures.
In the groups with all participants and the healthy, forgetting where things are normally kept associated positively with fair physical activity; however, only when accounting for sleep as well.
In the group with diseased participants, forgetting to tell somebody something important was associated with numerous TechROs, suggesting a replacement of fair and vigorous activity durations with sedentary and light durations throughout the day. A study that observes this TechRO pattern longitudinally may administer this item to assess memory failures. Finding that a word is on the tip of the tongue is another PRO item that correlated positively with TechRO sedentary duration and negatively with daily active duration. Further research may investigate the reliability of an assessment that consists of the items above and is administered more frequently than the full MFE scale for seniors with mild disease.

coQoL for Sleep (PSQI vs. Fitbit)
We report the correlations of PRO sleep variables (PSQI) with TechRO variables (Fitbit) by using the total and contour metrics. Table 6, rows with Outcome: Sleep, enumerates the PROs that correlated with the most TechROs (rS ≥ 0.5) across all families by health group.

Sleep Outcomes by Total Numbers of Correlations
In the health group with all participants, PRO items Q7 (trouble staying awake driving, eating, socialising) and Q4 (duration of actual sleep), followed by the daily dysfunction numeric sub-score, had the most correlations with TechROs across families (Table 6, rows with Outcome: Sleep and Health: All).
In the group with healthy participants, PRO items Q4 (duration of actual sleep), Q5C (trouble sleeping due to using the bathroom), Q7 (trouble staying awake driving, eating, socialising) had the most correlations with TechROs, followed by the daily dysfunction numeric sub-score (Table 6, rows with Outcome: Sleep and Health: Healthy).
In the group with diseased participants, the PROs that correlated with the most TechROs had relatively fewer correlations. The daily dysfunction numeric sub-score and Q6 (duration of actual sleep) registered the most correlations (Table 6, rows with Outcome: Sleep and Health: Diseased).
The PRO daily dysfunction numeric sub-score had numerous correlations in all three health groups. The PRO item Q4 (duration of actual sleep) appeared in the groups with all participants and the healthy (Table 6).

Sleep Outcomes by Contours of Correlations
We report the strong correlations (rS ≥ 0.8) and their contours between PRO variables (PSQI) and TechRO variables (Fitbit) in Table 13. In Table 13, the color coding runs from orange (weak correlation) to green (strong correlation), and × depicts an absent significant correlation of the same sign next to the strong correlation.
In the health group with all participants, PRO sleep disturbance item Q5A (trouble sleeping due to not getting to sleep) correlated positively with TechRO relative sleep duration. PRO items Q5E (trouble sleeping due to coughing or snoring loudly) and Q5F (trouble sleeping due to feeling too cold) correlated with TechRO relative vigorous activity duration (negative, CLR PA family) and light activity duration (positive, CLR PA+S family), respectively. PRO item Q7 (trouble staying awake while driving, eating, socialising) correlated negatively with TechRO relative sleep duration and light activity durations. Two numeric sub-scores yielded correlations with relative sleep: latency (positive) and daily dysfunction (negative). The daily dysfunction numeric sub-score also correlated with TechRO vigorous activity (broad contour) and the relative light activity (contour). The efficiency numeric sub-score had an isolated correlation with TechRO fair activity (Table 13, rows with Health: All).
In the group with healthy participants, numerous PROs correlated with TechRO sleep: Q2 (duration to fall asleep), Q5A (trouble sleeping due to not getting to sleep), Q11 (duration stayed in bed), and the latency numeric sub-score. Among the sleep disturbance items, Q5C (trouble sleeping due to using the bathroom) had two contoured correlations: negative with light+fair and light activity (the latter with a broad contour) in absolute and relative CLR PA families, respectively. The PRO efficiency numeric sub-score correlated again with TechRO fair activity. The numeric score correlated positively (and having a contour) with fair+vigorous activity (Table 13, rows with Health: Healthy).
In the group with diseased participants, PRO item Q4 (duration of actual sleep) registered a broad contour of three strong correlations (including rS = 0.9) with fair, fair+vigorous, and vigorous TechRO absolute durations. PRO item Q1 (time gone to bed at night) correlated inversely with the TechRO absolute sleep duration. Sleep disturbance items Q5B (trouble sleeping due to waking up in the middle of the night) and Q5C (trouble sleeping due to using the bathroom) correlated negatively with energy expenditure (Table 13, rows with Health: Diseased).
PRO items Q5A (trouble sleeping due to not getting to sleep) and Q5E (trouble sleeping due to coughing or snoring loudly), and the latency and efficiency numeric sub-scores appeared for the groups with all participants and the healthy. Q5A and the latency numeric sub-score maintained a strong correlation with the TechRO sleep duration. The efficiency numeric sub-score maintained the strong correlation with the fair activity. Q5E had an inverse relation with TechRO physical activity across these two groups, but expressed through negative correlations with the relative vigorous duration and the relative light duration, respectively. Q5C (trouble sleeping due to using the bathroom) was highlighted in both healthy and diseased groups, but expressed an inverse relation with physical activity through different outcomes: light+fair activity duration and energy expenditure, respectively (Table 13).

Sleep Outcomes Highlighted by Both Metrics
In the health group with all participants, PRO item Q7 (trouble staying awake driving, eating, socialising) appeared as informative in both metrics (Tables 6 and 13, rows with Health: All).
In the group with healthy participants, Q5C (trouble sleeping due to using the bathroom) was an informative PRO item that appeared in both metrics (Tables 6 and 13, rows with Health: Healthy).

Sleep Outcomes Interpretation
Several PRO items strongly correlated with sleep-specific TechROs. In the health group with all participants, having trouble sleeping due to not being able to get to sleep and the sleep latency numeric sub-score correlated positively with relative sleep duration, whereas having trouble staying awake while driving, eating, or socialising and the daily dysfunction numeric sub-score correlated negatively with it. In the group with healthy participants, the duration to fall asleep, having trouble sleeping due to not getting to sleep, the duration stayed in bed, and the latency numeric sub-score correlated with longer relative sleep during the day. In the group with diseased participants, only the time gone to bed at night correlated negatively with absolute sleep duration. Studies assessing sleep in healthy adults may benefit from monitoring the entire day, not only the sleep duration, to find a larger number of significant outcomes.
In the health group with all participants, PRO decreased sleep quality correlated negatively with TechRO relative light and vigorous activity. In the group with healthy participants, the sleep efficiency numeric sub-score correlated with the relative fair activity, and using the bathroom correlated negatively with relative light physical activity (with a broad contour). In the group with diseased participants, the duration of actual sleep correlated with absolute fair, fair+vigorous, and vigorous durations. Having trouble sleeping due to waking up in the middle of the night may be an indicator of already low sleep quality in participants with mild disease.

coQoL for Health-Related Quality of Life (EQ-5D-3L vs. Fitbit)
We report the correlations of PRO health-related Quality of Life variables (EQ-5D-3L) with TechRO variables (Fitbit) by using the total and contour metrics. Table 6, rows with Outcome: Quality of Life, enumerates the PROs that correlated with the most TechROs (rS ≥ 0.5) across all families by health group.

Health-Related Quality of Life Outcomes by Total Numbers of Correlations
In the health group with all participants, the PRO items with the most correlations were the health score and Q4 (pain/discomfort). The items in this scale had relatively fewer correlations than those in other scales such as social support (MSPSS) or memory (MFE) (Table 6, rows with Outcome: Quality of Life and Health: All).
In the group with healthy participants, PRO item Q4 (pain/discomfort) had the most correlations with TechROs (Table 6, row with Outcome: Quality of Life and Health: Healthy).
In the group with diseased participants, PRO item Q5 (anxiety/depression) had the most correlations with TechROs (Table 6, row with Outcome: Quality of Life and Health: Diseased).
Q4 (pain/discomfort) was the only PRO item that appeared in two groups: the group with all participants and the group with the healthy (Table 6).

Health-Related Quality of Life Outcomes by Contours of Correlations
We report the strong correlations (rS ≥ 0.8) and their contours between PRO variables (EQ-5D-3L) and TechRO variables (Fitbit) in Table 14. We only found one strong correlation in the group of participants with mild disease, between the PRO depression and anxiety item (Q5) and the TechRO absolute sedentary duration (Table 14).

Health-Related Quality of Life Outcomes Highlighted by Both Metrics
In the group with diseased participants, Q5 (anxiety/depression) recurred in both metrics (Tables 6 and 14, rows with Health: Diseased).

Health-Related Quality of Life Outcomes Interpretation
The PRO health state today correlated with numerous TechROs, in particular with a replacement of vigorous physical activity duration with sleep, sedentary, and fair durations across all participants, with a replacement of fair and vigorous durations with light activity for the healthy, and with a decrease in fair and vigorous activity among the diseased (Appendix C.2).
Pain and discomfort also had numerous correlations with TechROs, but only for the groups with all participants and the healthy. In participants with mild disease, having anxiety/depression correlated with sedentary duration. An increase in sedentary duration for participants with already existing mild disease may indicate a decreased quality of life in the anxiety/depression domain; if so, it could be further assessed by administering specialized scales.

Use Case Examples for coQoL
The coQoL method allows for the in-depth analysis of the results both in terms of measured outcomes and individual participants. We provide two examples below, pertaining to longitudinal data (Section 3.5.1) and the story of a participant (Section 3.5.2).

Longitudinal Data Example
We exemplify a very strong correlation (rS = 0.9) between PROs and TechROs to report how the interval and leeway durations influenced the correlations. In healthy participants, the MSPSS item Q3 (family is trying to help; PRO) correlated the strongest with the Fitbit fair physical activity duration in the CLR PA+S family (TechRO) for the TechRO aggregation interval of 28 days, with a decreasing pattern as the leeway increases. Table 15 presents the resulting gradients of correlations for all combinations of TechRO aggregation interval durations (columns) and leeways (rows), in days, and the TechRO raw data that yielded the strongest correlation. Table 16 depicts the raw results. In this table, the relative fair column is a centred log-ratio that has both negative (for less relative fair activity) and positive quantities (for more relative fair activity).
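To illustrate why a centred log-ratio takes both negative and positive values, the following minimal sketch applies the standard CLR transform (the logarithm of each part minus the mean of the logarithms) to one day of durations. The component names and the minutes are hypothetical, and the sketch assumes strictly positive durations; how zero durations were handled in the study is not shown here.

```python
import math

def centred_log_ratio(parts):
    """Standard CLR: log of each part minus the mean of the logs.

    Assumes every part is strictly positive (zeros are not handled).
    """
    logs = {name: math.log(value) for name, value in parts.items()}
    mean_log = sum(logs.values()) / len(logs)
    return {name: log - mean_log for name, log in logs.items()}

# Hypothetical day in minutes for a PA+S composition
# (physical activity intensities plus sleep); not study data.
day = {"sedentary": 700, "light": 200, "fair": 30, "vigorous": 10, "sleep": 500}
clr = centred_log_ratio(day)
```

A component below the geometric mean of the day (here, fair activity) receives a negative value, a component above it (here, sedentary time) receives a positive value, and the values sum to zero.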

Participant Story Example
Participant 169 is a 69-year-old female from Hungary who self-reported mild disease. She has a university degree, lives with her partner (no children), does not smoke, and drinks alcohol daily. She is a diligent responder who answered in all three waves of our study and wore the Fitbit for 794 days, of which 141 were valid.
When aligning the numeric scores from the PRO scales and the TechROs (Table 17), Wave 1 (mid-2018) had the worst PRO depression and anxiety, (close to the worst) memory, and sleep as well as (close to) the worst TechRO sedentary duration, light activity duration, (close to the worst) fair activity duration, and vigorous activity duration. Wave 2 (end-2018 and start-2019) had the least adequate PRO physical activity, adherence to the Mediterranean diet, memory, sleep, and quality of life, reflected in the least adequate TechRO energy expenditure, steps, heart rate, sedentary duration, fair activity duration, and total active duration per day. In Wave 3 (mid-2019), Participant 169 registered better PROs for physical activity, depression and anxiety, memory, and sleep as well as more steps, a shorter sedentary duration, and longer light, fair, and vigorous durations. Social support was always high but never optimal. Nutrition and Quality of Life remained high, but not optimal, in Waves 1 and 3. During the winter, the sleep duration was longer than during the summer. This real user example illustrates and emphasizes the importance of longitudinal state and behaviour assessments; we observed the change of state in Participant 169 as a change in the TechRO variables that was indeed associated with worse PRO-based self-reported states. In Table 17, the color coding runs from orange (worse outcome) through yellow to green (better outcome).

Discussion
In this section we discuss our methodological approach (Section 4.1), the coQoL method in the perspective of past evidence (Section 4.2), observations on data quality (Section 4.3), and pathways towards personalized medicine (Section 4.4). We then review several limitations of our study (Section 4.5) and envision future work (Section 4.6).

Overall Methodological Approach in PROomics
The coQoL method explored patterns of correlations between PROs and TechROs towards their co-calibration. Consequently, we focused on identifying groups of strong correlations between PROs with a given recall period and TechROs, aggregating weeks to months of wearable data available before the administration day of the PRO. We considered correlations between similar latent constructs, e.g., PRO and TechRO physical activity or sleep, as high from 0.8 and above. However, for different latent constructs, such as PRO social support and TechRO sleep, where the probability of random correlation is low, correlations of even 0.5 are high. Hence, we presented here correlations of 0.5 and above as important.
Due to the exploratory nature of our method, we deliberately omitted adjustments for multiple comparisons. The results of our method can guide future observational studies, as well as personalized, adaptive interventional studies, where the observational component will inform the intervention design as the study progresses. Researchers can power such studies for enough confidence to exclude trivial effects.
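As a minimal sketch of the two thresholds described above, the following code computes a Spearman rank correlation in plain Python (as the Pearson correlation of average ranks) and classifies it as high at 0.8 for similar constructs or 0.5 for different constructs. The function names and the data are hypothetical, and applying the thresholds to the absolute value of the coefficient is an assumption made here so that negative correlations are treated symmetrically.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rS computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

def is_high(r_s, same_construct):
    """Threshold 0.8 for similar constructs, 0.5 for different ones.

    Assumption: the threshold applies to |rS|.
    """
    threshold = 0.8 if same_construct else 0.5
    return abs(r_s) >= threshold

# Hypothetical data: one PRO answer and one aggregated TechRO per participant.
pro_sleep_hours = [6, 7, 5, 8, 7.5]
techro_sleep_minutes = [350, 420, 330, 470, 400]
r_s = spearman(pro_sleep_hours, techro_sleep_minutes)
```

In practice a statistics library would normally be used for this computation; the plain-Python version is only meant to make the ranking step explicit.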

coQoL in Perspective of Past Evidence
We recall that little prior research has assessed the relationships between sets of different outcomes measured via PROs and consumer wearable TechROs in healthy seniors, in the wild, for extended periods (beyond the typical study duration of 7-14 days). On the one hand, past studies may have had similar or larger sample sizes, yet they have not yielded stronger statistical results; these co-calibrations rarely report values of rS ≥ 0.5, as we do. On the other hand, we report a longer study duration (up to 2 years). A study duration of more than a few weeks is essential to overcome the "novelty" effect of the technology (TechRO) on the state and behaviour of the user. Namely, the user, motivated by the feedback provided by the device while the study is being conducted, may move more or sleep differently, and this changed behaviour would then be erroneously co-calibrated with the self-reports (PROs). The coQoL method leads to more accurate, real-world PRO- and TechRO-based datasets representing the real states and behaviours of the users. We define the past evidence in the context of momentary co-calibration efforts, where the PRO-TechRO co-calibrations may have been valid only for the short interval of data collection. Our proposed coQoL method thereby extends the state of the art.

Observations on Data Quality
The wearable monitored some TechROs for more days than others. For example, energy expenditure and steps were available on most days. However, some days did not include durations of physical activity at higher intensities, either because some seniors did not wear the device for enough hours for Fitbit to recognize the activity or because they did not reach the higher intensities on those days. Also, the TechROs that combine other TechROs, e.g., fair+vigorous, were available on at most as many days as their least frequently available constituent TechRO. We acknowledge errors of a few days in long-term monitoring stemming from conditions beyond our control, such as errors in the device setup, errors at the recruitment site that took days to correct, or errors when running the automated data collectors. These technological and human factors influenced the quality of the available data.
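The availability of combined TechROs can be illustrated with a minimal sketch. It assumes that a combined outcome such as fair+vigorous is computed only on days when both constituents are present, which is one way to obtain the upper bound described above; the helper `combine` and the values are hypothetical.

```python
def combine(first, second):
    """Sum two per-day series, keeping only the days present in both.

    Assumption: a combined TechRO needs all of its constituents on a day.
    """
    shared_days = first.keys() & second.keys()
    return {day: first[day] + second[day] for day in sorted(shared_days)}

# Hypothetical minutes per day; a missing key means the TechRO was absent.
fair = {"2018-06-01": 20, "2018-06-02": 35, "2018-06-04": 10}
vigorous = {"2018-06-02": 5, "2018-06-03": 12, "2018-06-04": 8}
fair_vigorous = combine(fair, vigorous)
```

Because only shared days survive, the combined series can never cover more days than the shorter of its two constituents.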
The wearable monitoring period may depend on the measured outcome, frequency of answers, and human factors. While the recall period of many scales is short (e.g., one week), collecting wearable data only for that duration may prove too strict. If the design is too strict, numerous participants will disqualify, and the results may bias in favour of diligent or adherent responders, who may also exhibit positive behaviours, e.g., exercising more diligently as well. Although some results indicate that 14-28 days of data could be enough for significant co-calibrations, the observations used in the co-calibration depend on the PRO answers and the TechRO data alike. If the participants are adherent to data collection for four weeks, but do not answer the questionnaire, the quality of the data may be insufficient to derive correlations. For some questionnaires, coQoL may relax the alignment (leeway) to account for human factors that contributed to data loss. On the other hand, a monitoring window of 120 days (4 months) may prove too wide to collect data reflecting the same behaviour as the reported one (the recall period), also because of the potential influence of seasonal effects. These seasonal, as well as other context dependencies, are illustrated when applying the coQoL to the MSPSS social support PRO. Our results indicate that having approximately one month of data before the administration of the MSPSS is sufficient to obtain significant correlations between family trying to help social support and fair activity even within a small sample of 39 participants. We observe that the MSPSS is time context-specific. Overall, across all questionnaires, we argue for an intermediary period of aggregation interval for TechRO not extending beyond 60-90 days.

Pathways towards Personalized Medicine
There is growing evidence within the medical domain that personal data paves a path towards personalized medicine, including genetics data and population-specific data, as well as, on a growing scale, data originating in the individuals' daily life environments and representing their natural, objective behaviours unfolding in different contexts of daily life. Daily life datasets are, in turn, collected via consumer wearables and smartphones with sensing capabilities.
From our study, we learn that an ideal wearable in the context of personalized medicine study would be comfortable to wear; should have a long battery life (at least a few days); should be accepted by individuals to use as their own, such that they forget they are in the study (implying minimal reactivity); and should provide relevant TechRO related to behavioural patterns (e.g., activity status, steps, as opposed to only heart rate, which would be hard to co-calibrate by itself).
Given our results, we also observe that, for some PROs, different self-reported health statuses of the individuals yield different co-calibration results, even though our definition of disease refers only to mild self-reported cases. When the participants have a disease, other TechROs become correlated more strongly with other PROs than for the healthy ones. An observational study involving healthy individuals can leverage the coQoL method by monitoring a relevant subset of PROs/TechROs longitudinally and occasionally co-calibrating the PROs with the TechROs, assuming that the coQoL method is sensitive to long-term, significant changes in the TechROs. Based on the occasionally collected PRO answers, further in-depth examination of the individual's state may seek to understand whether the TechRO change signals coincide with a significant and relevant PRO change, potentially implying a real change in the individual's health state. Once diagnosed, the individual's health state may be followed up, assuming another set of PRO/TechRO outcomes co-calibrated in time, to assess the change in the state of the disease accurately.
For example, in the case of diseased Participant 169, we observed that improvements or deteriorations in the state (as self-reported via the PROs for physical activity, Mediterranean diet, memory, and Quality of Life) coincided with TechROs (of physical activity in the sedentary, and light-vigorous spectrum, as well as the total physically active duration). Such trends are likely to differ between persons. As observed with Participant 169, administering the PROs only three times in two years and monitoring the TechRO behaviours using the wearable (minimally obtrusively, continuously, during daily life) yielded numerous trends across not only pairs of PROs and TechROs, but also across different PROs and TechROs.
The coQoL can provide a frontline approach to further triage the individual state assessment, for the healthy or diseased, without burdening the individuals with self-assessments, and at the same time without excluding participants who develop diseases and need to be monitored for long periods. In the context of the latter, the coQoL may be very suitable to assess changes of behaviour and health state in chronically ill patients.
We envision the following coQoL use case. The coQoL results can inform the design of longitudinal observations for selected individual PRO/TechRO outcomes, leveraged in personalized medicine solutions. The procedure consists of the observation for several consecutive days (for more TechRO-adherent participants, four weeks; for the less adherent participants, up to 3 months, from which one can derive around four weeks of quality data) followed by the co-calibration of TechROs with PROs. While monitoring, a potential gradual change in a subset of TechROs of interest can lead to contacting the individual for further health outcome assessments, via PRO or even clinical examination.
In new study designs, we suggest a study participation period of 60-90 days at most, and we suggest leveraging behavioural techniques for participant wearable-adherence to maximize the validity of the results acquired. The study design may imply repeated measures longitudinally over the years, e.g., PRO/TechRO co-calibration efforts over 60-90 consecutive days, repeated every few months up to a year (assuming the same season every year).

Study Limitations
Several limitations characterize the preliminary coQoL study presented here. The first limitation is the small sample size, specific to an exploratory feasibility study. A second limitation is the resulting lack of power, which reduced the complexity of the analysis method (i.e., statistical hypothesis tests). A third limitation is the presence of multiple PRO answers per individual for the same wave, albeit with high variability. However, we only included one answer per participant-wave to reduce bias towards diligent responders. In case of multiple answers per participant-wave, we chose the latest answer in time, to account for any form submission issues in the CoME software application or the participant changing their mind after submitting the answers once. A fourth limitation is a significant decrease in the amount of participant data leveraged for the co-calibrations; we allowed for a leeway to permit PRO and TechRO alignments that are both (1) short-term, but accurate (e.g., 7-14 days, close to the recall period), and (2) longitudinal, but permissive (e.g., 60-120 days, sufficient for the long-term behaviours to unfold). The study highlights the challenge (shared by many health studies) of retaining individuals who can provide outcomes through both self-report and a wearable that must be worn daily, over long periods.

Future Work
In ongoing and future work, we expect to involve more participants for shorter periods (60-90 days), repeated every few months to a year, and to focus on the PROs and TechROs delineated in this paper to deepen our knowledge about these specific co-calibration efforts and results. We plan to employ more advanced techniques and obtain more statistically significant results as we increase the sample size in further studies aimed at calibrating PROs and TechROs for health outcomes and longitudinal behaviours such as physical activity and sleep in seniors. We aim to derive individual co-calibration trajectory models, as well as population models, e.g., for similar groups of healthy or diseased individuals.

Conclusions
In this study, we present the coQoL method for co-calibrating the relationships between PROs and TechROs for eight PRO outcomes and TechRO behavioural markers of physical activity, sleep, and heart rate in a cohort of 42 seniors contributing data for two years. We reported human factors and quality properties from the data collected while their daily life unfolded. Our results can inform the design of personalized observational studies that assess daily life behaviours continuously and longitudinally, and that enable interventional studies towards reducing the risk of chronic disease and improving health and Quality of Life in the long term.

Conflicts of Interest:
The authors declare no conflict of interest. The funder had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript:

We searched for previous work by following a semi-structured approach, to prune papers distant from our research area from a vast body of literature. We agreed upon a hierarchy with properties divided into positive, neutral, and negative by their relative relevance to our research area (Figure A1).

things [76], medical imaging such as computer tomography or magnetic resonance [77]), focus on recognizing activities of daily life [78], or report only results following interventions [79].

Answers Scoring
For the PRO questionnaires, we followed the scoring procedures set forth by the authors of the validated scales associated with each questionnaire. Only one questionnaire necessitated an additional assumption. For the physical activity questionnaire (IPAQ), we processed the individuals' physical activity answers by adhering to the data cleaning rules and the maximum values for excluding outliers described in the guide [89]. However, the guide does not provide a threshold for deciding when a reported duration is weekly (not daily) and must be converted into an average daily time. For example, if a senior reported seven hours of vigorous physical activity per day, the duration would likely reflect one hour per day. In this case, we allowed at most 7 h of physical activity per day at any intensity by dividing all excessive durations by 7 days.
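The weekly-to-daily assumption above can be illustrated with a minimal sketch. The function name and constants are hypothetical, and treating only durations strictly above 7 h as excessive is our reading of the rule, not a detail confirmed by the scoring guide.

```python
MAX_DAILY_HOURS = 7.0   # cap stated in the text
DAYS_PER_WEEK = 7


def to_daily_hours(reported_hours: float) -> float:
    """Return a plausible daily duration for one self-reported IPAQ duration.

    Assumption: a value above the cap was meant as a weekly total,
    so it is divided by 7 days; values at or below the cap are kept.
    """
    if reported_hours < 0:
        raise ValueError("a duration cannot be negative")
    if reported_hours > MAX_DAILY_HOURS:
        return reported_hours / DAYS_PER_WEEK
    return reported_hours


if __name__ == "__main__":
    for hours in (2.0, 7.0, 14.0):
        print(hours, "->", to_daily_hours(hours))
```

Whether a report of exactly 7 h should also be divided is ambiguous in the description; the sketch keeps it unchanged.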

Appendix B.1.4. Variables Derivation
We derived variables from individual items, sub-scores, and scores of the PRO scales. While analysing the scores exclusively would have been motivated by existing Rasch models providing calibrated positions of individual items and their sub-scores and scores [90], to our knowledge, there are no Rasch models for the PRO scales.

The space of consumer wearable manufacturers and devices is diverse, recording over 200 models [91], and the trend of adoption is increasing [13]. From all devices that provide physical activity and sleep TechROs, we chose Fitbit. Fitbit (1) monitors daily life behaviours accurately and continuously, (2) operationalizes the critical human factors for prolonged wear by senior end-users, and (3) facilitates reliable behavioural data collection.
First, Fitbit aims at motivating consumers to "reach health and fitness goals by tracking activity, exercise, sleep, weight, and more" [35]. It was selected for Digital Health software pre-certification by the US FDA [92]. Previous studies measured the accuracy of Fitbit consumer-friendly devices in reporting daily life behaviours of physical activity and sleep. For physical activity, Fitbit One and Zip had strong validity for step count and sleep duration, moderate validity for energy expenditure, and weaker validity for fair and vigorous activity [12]. Fitbit Flex and Zip had adequate reliability and validity in measuring step count [93]. Fitbit Charge HR, Charge, Flex, Surge, Zip, and Alta agree with the ActiGraph GT3X+ research-grade accelerometer in assessing active minutes [37]. For sleep, Fitbit Charge HR can measure total sleep time [94] and time spent in bed [95] reliably, as compared with a sleep diary in a free-living setting or a research-grade accelerometer. For senior populations, Fitbit Charge 2 had better results in step count, energy expenditure, and sleep duration than the Garmin Vivosmart HR+ accelerometer in free-living environments [96]. Also, Fitbit One and Flex measure steps accurately in seniors [97].
Second, the positive senior user experience with the wearable is an essential factor that prolongs monitoring durations. For Fitbit, human factors studies found that over 90% of seniors agree that Fitbit was "easy to use, useful, and acceptable" over 8 months of wear [15] and seniors also place Fitbit the highest in usability (using the System Usability Scale [98]) among numerous other wearables [99]. Furthermore, the presence of a data display on the wristband leads to higher operation ratings [99].
Third, Fitbit provides a well-documented and developer-friendly application programming interface (API), which exposes a rich set of behavioural markers [22] addressing the goals of the project.
For our study, we selected the Fitbit Charge 2 wearable, a small wrist-worn watch that monitors physical activity and sleep by using the same kinds of sensors as those used in the validations, displays steps, heart rate, and time, and was previously used in studies involving seniors (e.g., [96]).

Appendix B.2.2. Wearable Data Processing
To maintain high data quality, we considered as valid days for the analysis only those days where the total duration of Fitbit monitoring was at least 21 h. We allowed at most three hours of missing data for device battery charging and handling (15-20 min to 2 h). Our choice reduced the impact of missing measurements; it not only improved the measurement accuracy of TechRO behavioural markers in absolute daily durations but also enabled the assessment of TechRO behavioural markers relative to each other in the 24-h model of a day [64].
We constructed aggregate intervals with fixed durations of 7, 14, 21, 28, 60, 90, and 120 valid days to balance the number of included days in the analysis with the available intraday monitoring quality. The choice of 7 days for the lower bound was motivated by the need to acquire enough representative data for daily life, the 7 days as a common denominator of the PRO recall periods (where present), and the significant improvements in Fitbit accuracy for active minutes from 7 days onwards [37]. The choice of increasing intervals to the upper bound of 120 days reflected the duration of a wave, a large number of valid days per person (e.g., median 153 days for Spanish participants, Table A11), but also the high variance (a standard deviation of 113 days in Spain, Table A11).
We only included in the analysis intervals with at least 70% of their days valid, such that both weekdays and weekends were expected to be present in a week; the limit is compatible with previously reported consumer wearable use in seniors [100].
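The two data-quality rules above (a valid day has at least 21 h of monitoring; a usable interval has at least 70% valid days) can be sketched as follows. The function names and the input layout (one wear-time value in hours per calendar day of the interval) are illustrative assumptions, not the project's actual pipeline.

```python
MIN_WEAR_HOURS = 21.0        # minimum monitored hours for a valid day
MIN_VALID_FRACTION = 0.70    # minimum share of valid days in an interval
INTERVAL_LENGTHS = (7, 14, 21, 28, 60, 90, 120)  # interval durations in days


def is_valid_day(wear_hours: float) -> bool:
    """A day is valid when the device was monitoring for at least 21 h."""
    return wear_hours >= MIN_WEAR_HOURS


def interval_is_usable(daily_wear_hours: list) -> bool:
    """An interval is usable when at least 70% of its days are valid."""
    if not daily_wear_hours:
        return False
    valid_days = sum(is_valid_day(h) for h in daily_wear_hours)
    return valid_days / len(daily_wear_hours) >= MIN_VALID_FRACTION


if __name__ == "__main__":
    week = [24.0, 23.5, 22.0, 21.0, 24.0, 10.0, 8.0]  # 5 of 7 days valid
    print(interval_is_usable(week))
```

With a 7-day interval, the 70% limit requires at least five valid days, which is why both weekdays and weekend days are expected to be represented.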

Appendix B.2.3. Variables Derivation
We split the TechROs into two amounts: absolute (behaviours in isolation, expressed in absolute amounts) and relative (behaviours relative to each other, reflecting the interdependences between behaviours during the 24 h of the day [64], expressed in relative amounts by the centred log-ratios (CLR) of their compositions [65]).
In the absolute amount, we derived the variables in two families: raw and processed. We derived the raw daily energy expenditure (energy), step count (steps), and resting heart rate (heart rate), for a total of 3 raw TechROs. We then derived the processed sedentary duration (sedentary) and the durations at three intensities (light, fair, and vigorous) as processed by the Fitbit internal activity recognition algorithms. Since Fitbit had not published intensity thresholds, we also derived the cumulative durations in processed sedentary and light (sedentary+light), light and fair (light+fair), and fair and vigorous (fair+vigorous) intensities. We also calculated the total daily active duration (active) by cumulating the light, fair, and vigorous processed durations. For sleep, we included the entire sleep duration of the day as a processed TechRO, for a total of 9 processed TechROs. We derived a total of 12 TechROs in the absolute amount.
For each aggregate interval duration and absolute TechRO, we used as the aggregate variable in the analysis the median of the absolute daily amounts. The 84 resulting variables (12 TechROs for each of the 7 interval durations) are visible in the upper half of Table 3.
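As an illustration of the absolute variables, the sketch below derives the 12 daily TechROs from 8 daily inputs and aggregates them by the median over an interval. The dictionary keys and function names are hypothetical and do not reflect the Fitbit API field names.

```python
from statistics import median


def derive_absolute(day: dict) -> dict:
    """Add the cumulative durations to one day's raw and processed values."""
    out = dict(day)
    out["sedentary+light"] = day["sedentary"] + day["light"]
    out["light+fair"] = day["light"] + day["fair"]
    out["fair+vigorous"] = day["fair"] + day["vigorous"]
    out["active"] = day["light"] + day["fair"] + day["vigorous"]
    return out


def aggregate_median(days: list) -> dict:
    """Median of each absolute TechRO over the days of one interval."""
    derived = [derive_absolute(d) for d in days]
    return {name: median(d[name] for d in derived) for name in derived[0]}


if __name__ == "__main__":
    base = {"energy": 2000, "heart_rate": 60, "sedentary": 600,
            "light": 200, "fair": 30, "vigorous": 10, "sleep": 420}
    days = [dict(base, steps=s) for s in (8000, 4000, 6000)]
    summary = aggregate_median(days)
    print(len(summary), summary["steps"], summary["active"])
```

The cumulative durations are computed per day before aggregation, which is consistent with a combined TechRO being available only on days when all of its constituents are available.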
In the relative amount, we derived variables denoting compositional components of physical activity intensities and sleep throughout the day. We derived TechROs for each component of the centred log-ratio (CLR, [65]) transformation. The CLR is a symmetric transformation that does not require a reference component behaviour. We computed the CLRs of two families denoting distinct compositions: (1) from all physical activity durations (CLR PA) and (2) from all physical activity durations and the sleep duration (CLR PA+S), having 4 and 5 TechROs, respectively. We derived two relative families, as the CLRs of a composition do not translate to sub-compositions [65], but some studies may not be able to monitor sleep. We obtained a total of 9 TechROs in the relative amount.
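A minimal sketch of the centred log-ratio follows: each component is the logarithm of a part minus the mean of the logarithms of all parts. The example durations (minutes per day) are invented for illustration, and the sketch assumes strictly positive parts; how zero durations were handled is not stated here.

```python
import math


def clr(parts: dict) -> dict:
    """Centred log-ratio of a composition given as {component: amount}."""
    if any(value <= 0 for value in parts.values()):
        raise ValueError("the CLR requires strictly positive parts")
    logs = {name: math.log(value) for name, value in parts.items()}
    mean_log = sum(logs.values()) / len(logs)
    return {name: log_value - mean_log for name, log_value in logs.items()}


if __name__ == "__main__":
    # Hypothetical daily durations in minutes.
    pa = {"sedentary": 600.0, "light": 200.0, "fair": 30.0, "vigorous": 10.0}
    pa_sleep = dict(pa, sleep=420.0)
    print(clr(pa))        # 4 components (CLR PA)
    print(clr(pa_sleep))  # 5 components (CLR PA+S)
```

The CLR components sum to zero and are unchanged when all parts are rescaled by the same factor. Adding sleep changes the value of every physical activity component, which illustrates why the two compositions are kept as separate families.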
For each aggregate interval duration and relative TechRO, we used as the aggregate in the analysis the geometric mean of the relative daily amounts. The 63 resulting variables (9 TechROs for each of the 7 interval durations) are visible in the lower half of Table 3.
The 147 derived TechRO variables can be seen in Table 3 (TechRO).

Appendix B.3.1. Descriptive Analysis (PROs and TechROs)
We describe the PROs and TechROs from two perspectives. The first perspective refers to the values in the data. The second perspective refers to the amount of data.
Within the first perspective, we describe the PROs by observing three summary statistics (median, mean, and standard deviation) of the participants-waves when grouped by health status (healthy vs. (mildly) diseased), country (Spain vs. Hungary), and gender (male vs. female) (Tables A3-A10).
Within the same perspective, we describe the TechROs by observing medians across the entire monitoring period (Table A12).
Within the second perspective, we observe the counts of total and valid days (Table A11) within the same groups as for the first perspective.

Appendix B.3.2. Inferential Analysis (PROs vs. TechROs)
We set the leeway between the PRO administration date and the TechRO aggregate interval end date at (successively) 0, 7, 14, 21, 28, 60, 90, and 120 days, because exact matches were scarce. Pairs of variables with nearer dates took precedence. We then analyzed lists of these pairs by using Spearman rank correlations. We chose this statistic to represent co-calibration for the following reasons. First, the PRO and TechRO variables were not independent (as they referred to the same participant). Second, the Spearman test is a nonparametric test that does not require an underlying distribution for the variables (some variables were not normally distributed, with the Shapiro-Wilk normality test yielding p < 0.05, and some variables measured different metrics). Third, our aim was holistic in observing groups of significant correlations (and not individual correlations).
We only report the strongest correlation per TechRO interval duration. We consider correlations between distinct constructs (e.g., PRO social support and TechRO sleep duration) to be strong at rS ≥ 0.5 and associations between similar constructs (e.g., PRO and TechRO physical activity) to be strong at rS ≥ 0.8.
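The successive leeway alignment can be sketched as below. The function name is hypothetical, and measuring the gap as an absolute difference in days (interval ends before or after the PRO date) is an assumption; the study may have restricted the direction.

```python
from datetime import date

LEEWAYS_DAYS = (0, 7, 14, 21, 28, 60, 90, 120)


def match_interval(pro_date: date, interval_end_dates: list):
    """Pair a PRO answer with the nearest TechRO interval end date.

    Returns (end_date, leeway), where leeway is the smallest allowed
    value that covers the gap, or None when the gap exceeds 120 days.
    """
    if not interval_end_dates:
        return None
    nearest = min(interval_end_dates, key=lambda d: abs((d - pro_date).days))
    gap = abs((nearest - pro_date).days)
    for leeway in LEEWAYS_DAYS:
        if gap <= leeway:
            return nearest, leeway
    return None


if __name__ == "__main__":
    answered = date(2018, 3, 10)
    ends = [date(2018, 1, 1), date(2018, 3, 1)]
    print(match_interval(answered, ends))  # nearest end is 9 days away
```

Choosing the nearest end date first implements the precedence of pairs with nearer dates; the leeway then records how permissive the alignment had to be.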
We consider a correlation coefficient significant when the extremities of its 95% confidence interval have the same sign. We avoided effect omissions at the expense of potential effects due to chance by not using adjustments for multiple tests [101] as our focus is on observing groups of correlations rather than individual correlations.
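The significance rule can be illustrated with a standard-library sketch of the Spearman rank correlation and a 95% confidence interval. The interval below uses the Fisher z-transformation with standard error 1/sqrt(n − 3); this is one common approximation chosen for illustration, not necessarily the procedure used in the study, and it requires n > 3 and |rS| < 1.

```python
import math
from statistics import NormalDist, mean


def average_ranks(values):
    """Ranks starting at 1, with ties receiving their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rS: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rS is undefined for a constant variable")
    return cov / (sx * sy)


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval (assumed Fisher z method)."""
    if n <= 3 or abs(r) >= 1:
        raise ValueError("requires n > 3 and |r| < 1")
    half = NormalDist().inv_cdf(0.5 + level / 2) / math.sqrt(n - 3)
    z = math.atanh(r)
    return math.tanh(z - half), math.tanh(z + half)


def is_significant(r, n):
    """Significant when both interval extremities share the same sign."""
    low, high = fisher_ci(r, n)
    return low * high > 0
```

For instance, under this approximation a correlation of 0.8 over 20 pairs is significant, whereas a correlation of 0.3 over 10 pairs is not, because its interval spans zero.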

Appendix B.3.3. Pattern Analysis (PROs vs. TechROs)
For the pattern analysis, we use the contour metric. For a significant and strong target correlation at a physical activity intensity (rS of 0.8 or above), the metric separately counts the other significant correlations of the same sign at the adjacent lower intensities and at the adjacent higher intensities. In case the intensity of the target correlation is at the extremity, the metric is undefined. In case the target correlation is adjacent to a correlation that has the opposite sign or is non-significant, the count on that side is 0. In case the correlation is unrelated to a physical activity intensity, this metric is undefined.
For example, the fair physical activity correlation 0.8 and the sequence of correlations [sedentary: 0.4*, sedentary+light: 0.5, light: 0.6*, light+fair: 0.6*, fair: 0.8*, fair+vigorous: 0.3*, and vigorous: −0.1*], where * denotes significant correlations, has two correlations of lower intensities (0.6*, 0.6*) and one of higher intensity (0.3*). Figure A2 illustrates this case as Example (a). The figure contains three more examples.

Figure A2. Examples of contours of correlations interrupted by non-significant or opposite-sign correlations. rS marks the target correlation. × marks an interruption. Arrows mark the width of the contour. Only significant correlations are colored from red (weak) to green (strong). In example (a), the contour is interrupted by a non-significant correlation (at a lower intensity) and an opposite-sign correlation (at a higher intensity). Example (b) interrupts the entire right side of the contour by an opposite-sign correlation, represented with ×. Example (c) depicts a singleton contour, marked with × on both sides. Example (d) illustrates the rare case of a higher correlation than the target correlation, both in the same contour.
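The contour metric and Example (a) can be sketched as follows. The function name and the representation of each correlation as a (value, significant) pair ordered by intensity are assumptions made for illustration; applying the strength threshold to the absolute value of the target correlation is also an assumption.

```python
STRONG = 0.8  # threshold for a strong target correlation


def contour(correlations, target):
    """Count same-sign significant neighbours below and above the target.

    correlations: list of (rS, significant) ordered by increasing intensity.
    target: index of the target correlation in that list.
    Returns (lower_count, higher_count), or None when the metric is
    undefined (weak or non-significant target, or target at an extremity).
    """
    r_target, significant = correlations[target]
    if not significant or abs(r_target) < STRONG:
        return None
    if target in (0, len(correlations) - 1):
        return None

    def run(indices):
        count = 0
        for i in indices:
            r, sig = correlations[i]
            if not sig or r * r_target <= 0:
                break  # interruption: non-significant or opposite sign
            count += 1
        return count

    lower = run(range(target - 1, -1, -1))
    higher = run(range(target + 1, len(correlations)))
    return lower, higher


if __name__ == "__main__":
    # Example (a): sedentary ... vigorous, target is "fair" at index 4.
    example_a = [(0.4, True), (0.5, False), (0.6, True), (0.6, True),
                 (0.8, True), (0.3, True), (-0.1, True)]
    print(contour(example_a, 4))  # (2, 1)
```

In Example (a), the lower side stops at the non-significant sedentary+light correlation and the higher side stops at the opposite-sign vigorous correlation, giving two lower and one higher correlation.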

Appendix C. Results
This section includes results from our descriptive (Appendix C.1) and inferential (Appendix C.2) analyses.

Appendix C.1. Descriptive Analysis (PROs and TechROs)
This part includes results from our descriptive analysis from patient-reported outcomes (Appendix C.1.1) and technology-reported outcomes (Appendix C.1.2).

Appendix C.1.1. Patient-Reported Outcomes (Questionnaires)
The 39 participants provided 289 answers (7.4 ± 4.4) on the 8 scales along the 3 waves. Table A2 depicts the numeric scores across waves.  Color coding: from orange (worse score) to yellow to green (better).

Physical Activity (IPAQ)
We recorded 27 answers about physical activity on the IPAQ scale [26] that partitions physical activity into low, moderate, and high levels. The scale is described in depth in Appendix B.1.1. All participants recorded a median (mean ± SD) numeric score of 8038 (9535 ± 7106). There were 14 answers with a low categorical level of physical activity, one answer with a moderate level, and 12 answers with a high level. Table A3 enumerates the answers and Figure A3 depicts the sub-scores and scores by participant group.
Participant physical activity separated into two groups at the extremes of low and high physical activity. The levels only approximated the numeric scores, as the low categorical scores concentrated in the lower third of numeric scores and the high categorical scores concentrated in the upper third of numeric scores; the middle third included low and high levels of physical activity alike.
The participants from Hungary self-reported increased physical activity as compared to those from Spain, registering a median (mean ± SD) numeric score of 8478 (9738 ± 7370) compared to 6431 (9281 ± 6752) and a median categorical level of high physical activity compared to low physical activity.
Male participants reported increased levels of physical activity, registering a higher median numeric score of 8478 compared to 6820; however, owing to the five most active participants, males registered a lower mean (SD) numeric score of 7916 (4038) compared to 11037 (8806) for the females. Female participants registered higher variability in their self-reported physical activity than male participants.
Less than half (12/27) of the answers reported physical activity related to the work domain. Only a few (7/27) answers reported cycling as a means of transportation, and these were associated with the upper half of numeric scores. The participants from Hungary reported increased physical activity as compared to those from Spain. Male participants reported increased median physical activity, and female participants reported increased mean physical activity.

Social Support (MSPSS)
Participants provided 55 answers on the MSPSS scale [27]. Their levels of social support were on a numeric scale from 1.0 to 7.0 corresponding to the categorical low, moderate, or high levels of social support. We describe this scale in Appendix B.1.1. All participants had a median (mean ± SD) numeric score of 5.0 (5.4 ± 0.9). Most answers corresponded to high social support. The levels of social support from separate sources (significant other, family, and friends) were also generally high. No answers reported low social support. Health status, country, and gender did not appear to change the level of social support fundamentally, neither by source nor in general. Table A4 enumerates the answers and Figure A4 depicts the sub-scores and scores by participant group.
Healthy and diseased participants reported only slightly different levels of social support, as observed from the median (mean ± SD) of 5.0 (5.3 ± 0.9) for the healthy and 5.0 (5.5 ± 0.9) for the diseased. Participants with disease reported slightly higher social support from the significant other.
Men self-reported lower social support than women, as observed in the median (mean ± SD) numeric scores of 5.0 (5.2 ± 1.0) vs. 6.0 (5.5 ± 0.8), as well as in the median categorical score dropping from high to moderate. Men self-reported less social support from friends (means of 5.2 vs. 5.6), less social support from the significant other (means of 5.5 vs. 5.6), and similar social support from the family (mean of 5.5).

Anxiety and Depression (GADS)
We measured anxiety and depression through 34 answers on the GADS scale [28]. The scale assesses whether the anxiety and depression are categorized as absent, possible, mild, moderate, or severe through a numeric score from 0 to 90. It can be consulted in Appendix B.1.1. Participant mean ± SD numeric score was 20.8 ± 18.1. Participants self-reported absent anxiety and depression in 10 answers, possible anxiety and depression in 12 answers, mild in 6 answers, moderate in 4 answers, and severe in 2 answers. Table A5 enumerates the answers and Figure A5 illustrates the scores by participant group.
Most answers corresponding to moderate and severe anxiety and depression originated from participants who self-reported as diseased. Across the items and scores, the participants with disease reported more substantial anxiety and depression than the healthy participants, in particular for questions Q3A and Q7D. The median (mean ± SD) value for Q3A was 3.0 (2.0 ± 1.7) vs. 1.0 (0.9 ± 0.9). The median (mean ± SD) value for Q7D was 4.0 (2.8 ± 1.8) vs. 1.0 (1.3 ± 1.3), different by 2 and 3 levels, respectively. The median categorical scores were also different by one level, from possible to mild anxiety and depression. The answers from healthy participants had less variability than the answers from the participants with disease.
± 18.8) compared to 11.5 (13.7 ± 13.9). They reported anxiety and depression with higher variability as well.

Mediterranean Nutrition (PREDIMED)
Participants self-reported their adherence to the Mediterranean diet by answering the PREDIMED scale [29,30] 23 times. The scale provides categorical scores for absent, medium, and high adherence using a numeric scale from 0 to 14 points, as described in Appendix B.1.1. Participants registered a mean ± SD numeric score of 7.0 ± 2.4. One-third of the answers corresponded to absent adherence to the Mediterranean diet, and two-thirds corresponded to a medium adherence. Table A6 enumerates the answers. Figure A6 illustrates the scores by participant group.
A remarkable result is that none of the reported diets had high adherence to a Mediterranean diet. The scoring of the PREDIMED scale may explain this fact. It requires at least 13/14 items to be indicative of a Mediterranean diet to categorize the diet as highly adherent, while only 6/14 are necessary for medium adherence. The two most adherent participants only scored 11/14 and were thus categorized with medium adherence. One question that was associated with the numeric and categorical scores is Q1, referring to olive oil as the primary culinary fat. Conversely, questions Q7 on sweet beverage use and Q13 on the preference for small animal meat had only 1/23 and 2/23 answers in the affirmative.
Participants from the healthy and diseased groups reported similar adherence, with higher variability in the healthy group, as shown by means (SD) of 7.1 (2.7) and 6.9 (1.7), respectively.
The participant country of residence coincided strongly with the numeric score on the Mediterranean nutrition scale. All participants from Spain reported numeric scores of 7 or higher, corresponding to a medium adherence. Only one outlier person from Hungary had a numeric score of 9, and all other participants from Hungary had numeric scores of 7 or less. All participants categorized as having no adherence to the Mediterranean diet were from Hungary. Participants from Spain reported a median (mean ± SD) numeric score of 9.0 (8.8 ± 1.4) compared to 5.5 (5.3 ± 2.0) for Hungary. In general, the answers from the participants from Hungary had higher variance.
The answers from male participants indicated a higher adherence than those from female participants, as depicted by the medians (means ± SD) of 8.5 (7.4 ± 2.6) and 7.0 (6.8 ± 2.3) on the numeric score, but also higher variability. However, there were fewer answers from men than women for this scale.

Nutrition (SelfMNA)
We quantified participant nutrition through 24 self-reported answers on the SelfMNA scale [31]. The scale assesses a categorical nutrition status (normal, at risk of malnutrition, or malnourished) and a numeric score between 0 and 14, as detailed in Appendix B.1.1. Overall, participants were well-nourished, with a mean ± SD numeric score of 12.2 ± 1.7. More than two-thirds of the participants self-reported a healthy amount of nutrition, and the remaining answers reflected a risk of malnutrition. One third obtained the maximum possible numeric score. None of the answers categorized the participant as malnourished. Table A7 depicts the answers and Figure A7 illustrates the scores by participant group.
The groups of healthy and diseased participants were characterized by similar medians (12.0) and means (12.1 and 12.4), and only slight differences in the standard deviations (1.8 vs. 1.5). Healthy participants self-reported a decline in food intake for question Q1, while participants with disease reported being more stressed and severely ill in question Q4. Participants with disease reported less weight loss in Q2 as well as less variable answers across all items and scores except for Q4.
The participants from Spain and Hungary reported similar levels of nutrition, although the ranks alternated between questions: participants from Spain reported more decline in food intake in Q1, less weight loss in Q2, more mobility in Q3, and less stress, illness, dementia, or sadness in Q4 and Q5. Participants from Hungary had a more stable numeric score, with a standard deviation of 1.11 compared to 1.92 for Spain.
Women and men reported similar levels of nutrition, but men provided more stable answers within their group, e.g., a male standard deviation of 1.21 compared to a female standard deviation of 1.79 for the numeric score.


Memory (MFE)
Participants reported 36 answers on the MFE scale for memory [32]. The scale classifies memory failures as absent or potential through a numeric score from 0 to 56, as described in Appendix B.1.1. Participants had a mean ± SD numeric score of 8.7 ± 4.7. The median and mean numeric scores indicate absent memory failures. One-third of the answers indicate the possibility of memory failures, originating predominantly from female participants from Spain. Table A8 enumerates the answers. Figure A8 illustrates the scores by participant group.
One item whose answers may associate with the numeric score is Q15: Forgetting important details of done things.
The participants with self-reported disease reported a higher probability of memory failures, as seen in the median (mean ± SD) numeric score of 9 (9.41 ± 4.5) compared to 7 (8.45 ± 4.8) for healthy participants. The ranking of the medians and means for individual items alternates between the healthy and diseased participants. Examples of questions where the diseased fared worse include Q5 (checking whether something was done), Q6 (forgetting time of events), Q14 (forgetting to do planned things), and Q18 (forgetting to tell somebody something important), as seen from the medians, which differ by 1 out of the maximum two levels, as well as the slightly different means. Healthy and diseased participants had similar variability in the numeric scores and alternating ranks of variability within individual questions.
The participants from Hungary may have a slightly lower likelihood of memory failure, as observed from the medians (means) of 7.5 (7.7) and 8.5 (9.7), which differ by 1 (2) points. Furthermore, the numeric scores from the participants from Hungary are more stable. Questions Q5 (checking whether something was done) and Q6 (forgetting time of events) indicate potential memory decline within the subjects from Spain. Question Q8 (being reminded about things) indicates the opposite. Other questions that weigh towards an expected increase in memory failures for the participants from Spain are Q7 (being reminded about things), Q21 (telling someone a story or joke repeatedly), and Q24 (forgetting where things are normally kept).
Men self-reported better memory numeric scores than women, as seen from the medians (means) of 6 (6.54) and 8 (9.76), respectively. Questions Q6, Q8, and Q24 contribute to this difference, while Q5 works against it. Men also self-reported more stable memory scores, as seen from the SD of 3.86 compared to 4.76, respectively.

Sleep (PSQI)
The seniors self-reported their sleep quality through 32 answers on the PSQI scale [33]. PSQI assesses sleep quality as good or poor based on a numeric score from 0 to 21, as described in Appendix B.1.1. Participants recorded a median (mean ± SD) numeric score of 6.0 (6.3 ± 3.9). The median and mean sleep quality were thus situated at the better extremity of poor sleep quality. Two-fifths of the answers corresponded to poor sleep quality. Table A9 enumerates the answers. Figure A9 illustrates the sub-scores and scores by participant group.
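The good/poor classification and the share of poor answers can be illustrated with a short sketch. It assumes the commonly used PSQI cut-off, where a global score above 5 indicates poor sleep quality; this cut-off is an assumption here, because Appendix B.1.1 is not reproduced in this section. The function names and the example answers are hypothetical.

```python
def psqi_is_poor(global_score: int, cutoff: int = 5) -> bool:
    """Return True when a PSQI global score (0-21) indicates poor sleep.

    Assumption: scores strictly above the cutoff (default 5) count as poor.
    """
    if not 0 <= global_score <= 21:
        raise ValueError("PSQI global score must be between 0 and 21")
    return global_score > cutoff


def share_poor(scores, cutoff: int = 5) -> float:
    """Fraction of answers classified as poor sleep quality."""
    if not scores:
        raise ValueError("at least one score is required")
    return sum(psqi_is_poor(s, cutoff) for s in scores) / len(scores)


# Hypothetical answers, for illustration only; not the study data.
answers = [2, 4, 5, 6, 9]
print(share_poor(answers))  # 2 of the 5 answers exceed the cutoff
```

With this cut-off, a median of 6.0 falls just inside the poor range, which matches the description of the sample as sitting at the better extremity of poor sleep quality.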
The participants with disease self-reported less adequate sleep, as depicted by the median (mean ± SD) of 8.0 (8.6 ± 3.2) compared to 5.0 (5.3 ± 4.3). Participants with disease self-reported less adequate sleep through questions Q5B (trouble sleeping due to waking up in the middle of the night) with a difference between median (mean) answers of 1.5 (0.53) out of 3. Conversely, healthy participants self-reported decreased sleep quality due to using the bathroom in Q5C with a median (mean) difference of 1.0 (0.55) out of 3. The healthy participants provided more stable PROs with a standard deviation for the numeric score of 3.23 as compared to 4.34.
The participants from Hungary reported worse sleep quality with a median (mean ± SD) of 6.0 (7.5 ± 0.2) in Hungary compared to 5.0 (5.5 ± 0.1) in Spain. The difference between the sleep quality for participants in Hungary and Spain is visible in the numeric sub-scores, e.g., subjective sleep quality, latency, duration, efficiency, and disturbance, but not medication. However, the Spanish participants reported more stable PROs.
Women and men reported similar levels of sleep quality with equal medians and means (0.9 and 0.8). Question Q5A (trouble sleeping: cannot get to sleep) influenced the quality of sleep in women, as observed by a difference of over one unit out of a maximum of 3 between means. Males provided more stable results, with a standard deviation of 2.45 compared to 4.32 for the numeric score. At the extremity of inadequate sleep, the worst six levels of sleep quality correspond to women from both Spain and Hungary.

Health-Related Quality of Life (EQ-5D-3L)
Participants provided 30 answers about their quality of life on the EQ-5D-3L scale [34]. The scale provides three severity levels (no problems, some problems, and extreme problems) for five facets of life quality, as well as a 0-100 numeric score for the health status on the day of administration, as detailed in Appendix B.1.1. Half of the answers report a health score of 90 or above. Five answers reported a health score of 75 or below, and five answers reported a health score of 100. Table A10 shows the answers and Figure A10 illustrates the sub-scores and scores by participant group. The mean ± SD perceived health is 84.96 ± 13.8 across all participants. The means ± SD for the five domains are as follows: 1.2 ± 0.4 for mobility, 1.0 ± 0.0 for self-care, 1.1 ± 0.3 for usual activities, 1.5 ± 0.6 for pain/discomfort, and 1.2 ± 0.4 for depression/anxiety. None of the participants self-reported quality of life issues due to self-care impediments.
The healthy and diseased participants report similar quality of life in the mobility, self-care, and usual activities domains. However, the participants with disease report worse pain/discomfort and depression/anxiety. Furthermore, the participants with disease report a mean health score of only 77.27 compared to 89.42 for the healthy. The participants with disease also self-report less stable answers, e.g., an SD for the health score of 16.97 compared to 8.95 for the healthy.
Participants from Spain self-reported slightly better health than those from Hungary. The participants from Spain reported a median health score of 90 compared to 85 for those from Hungary. However, the mean health scores are similar: 86.84 and 83.52, respectively. The participants from Hungary provided more stable health scores, but more varied depression/anxiety responses than the participants from Spain.
Female participants report similar health to male participants, with a median health score of 85 compared to 90, but a mean of 85.42 compared to 83.88. Women self-report experiencing slightly less mobility, usual activities, and depression/anxiety.
Appendix C.1.2. Technology-Reported Outcomes (Fitbit)
We overview the TechROs by first assessing the data quality. Table A11 depicts the total compliance (as the number of days including TechROs) as well as the intraday compliance (as the number of valid days). Figure A11 depicts participant compliance in days (all monitored and valid) for each participant group. Figure A12 illustrates participant compliance by outcome. Figures A13-A15 show participant compliance by health, country, and gender groups, respectively.
Concerning total compliance, participants wore the Fitbit devices for a mean ± SD of 295 ± 238 days, and 50% of participants wore the devices for at least 224 days. Healthy participants wore the devices on average 58 days more than participants with disease. Hungarian participants were also significantly more compliant in wearing the devices, achieving a mean of 543 (446 more) days with monitored data. Of the ten most compliant participants, six were Hungarian. The most days were recorded by three Hungarians, and the most valid days were recorded by one Hungarian. Men wore the devices for only slightly longer periods than women.
Regarding intraday compliance, participants wore the devices for more than 23 h for a mean ± SD of 89 ± 89 days, while 50% of them wore the devices for at least 49 valid days of 21 h. One third had less than 30 valid days, half had less than 60 days, one person had 90 days, and one third had more than 120 days. The participants with disease were more compliant intraday than the healthy participants, keeping 37 valid days as compared to only 51 by the healthy participants, having a relative ratio to the total days of 4. Participants from Hungary were also more compliant intraday, achieving 140 valid days compared to 30 valid days and 13 ratio to total.
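The two compliance measures can be illustrated with a short sketch: total compliance counts the days with any recorded data, and intraday compliance counts the days whose wear time reaches a validity threshold. The text mentions both 21 h and 23 h of wear, so the threshold is kept as a parameter here. The wear log, the function names, and the default threshold are hypothetical and serve only as an illustration of the counting, not as the study's processing pipeline.

```python
def count_valid_days(wear_hours_by_day, min_hours=21.0):
    """Count days whose recorded wear time reaches the validity threshold."""
    return sum(1 for hours in wear_hours_by_day.values() if hours >= min_hours)


def compliance(wear_hours_by_day, min_hours=21.0):
    """Return (days with any data, valid days, valid-to-total ratio)."""
    total = sum(1 for hours in wear_hours_by_day.values() if hours > 0)
    valid = count_valid_days(wear_hours_by_day, min_hours)
    ratio = valid / total if total else 0.0
    return total, valid, ratio


# Hypothetical wear log (date -> hours worn), for illustration only.
wear_log = {
    "2018-03-01": 23.5,
    "2018-03-02": 12.0,
    "2018-03-03": 21.0,
    "2018-03-04": 0.0,
}

total_days, valid_days, valid_ratio = compliance(wear_log)
print(total_days, valid_days, round(valid_ratio, 2))
```

Raising `min_hours` from 21 to 23 in this example would leave a single valid day, which shows how sensitive the valid-day count is to the chosen threshold.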
We overview the dataset by depicting in Table A12 the medians of the TechRO variables obtained from the participants' days over the entire period of monitoring and summary statistics by participant group. The following paragraphs describe each TechRO in depth. Figures 8 and 9 depict the median values for each group across the entire monitoring period.

Energy Expenditure (Raw Family)
For the energy expenditure Fitbit behavioural marker, participants spent a mean ± SD energy of 2013 ± 487 kcal per day. 50% of participants spent 1896 kcal or more per day. Table A12 illustrates these results.
Participants with disease expended 100-200 kcal more than healthy participants per day, with medians (means) of 2000 and 1825 (2139 and 1951). We observed a similar difference between the participants from Hungary and Spain (difference of means 213 kcal). Men expended more energy than women, with respective medians (means) of 2516 and 1720 (2477 and 1686), but also with higher variation, with a male SD of 363 kcal vs. a female SD of 250 kcal.

Steps (Raw Family)
For the steps Fitbit behavioural marker, participants were active: they performed a median (mean ± SD) of 8690 (8084 ± 3205) measured steps per day. Table A12 illustrates these results.
Healthy participants performed on average 556 more steps than participants with disease, with a median difference of 932 steps. Healthy and diseased participants had comparable variability in the step counts. Participants from Spain performed on average 1217 more steps than participants from Hungary, and their step counts were more consistent. Men performed 1992 more steps on average than women. However, the median step counts are similar, partly because the higher male mean is driven by four males who performed more than 12,000 median steps per day.

Heart Rate (Raw Family)
For the heart rate behavioural marker measured by Fitbit, the median and (mean ± SD) were [...]. Both healthy and diseased participants had similar heart rate means and medians. Devices owned by participants with disease reported higher variability between daily measures than those of healthy participants, with 8.77 bpm and 5.81 bpm, respectively. Devices of Hungarian participants reported a lower median, at 56 bpm compared to 61 bpm. On average, men had a heart rate 3 bpm lower than women.

Sedentary Duration (Processed Family)
For the behavioural marker of sedentary duration, the participants recorded a mean ± SD of 801 ± 192 min per day. Table A12 illustrates these results.
Participants with disease report more sedentary time than healthy participants, with means of 781 and 739 min, respectively. Participants from Hungary report 88 min more sedentary duration on average, with 857 min compared to 769 min; however, they report similar medians. Men also report 242 min more sedentary time than women, with medians of 971 and 729 min, respectively.

Light Intensity Physical Activity Duration (Processed Family)
For the duration of physical activity at a light intensity as reported by Fitbit, all participants spend on average 213 ± 57 min per day. Table A12 illustrates these results.
Healthy participants report approximately 20 min more per day than participants with disease, with a median (mean) of 230 (219) compared to 199 (203). Participants from Spain also report approximately 30 min more, with a median of 229 min for Spain compared to 193 min for Hungary. Females are more active in the light intensity spectrum than males by 20 min.

Fair Intensity Physical Activity Duration (Processed Family)
For the duration of physical activity at a fair intensity as reported by Fitbit, all participants spend on average 21 ± 13 min per day. Table A12 illustrates these results.
Regardless of the grouping criterion (health status, country, or gender), participants consistently report means and medians in the 16-22 min range for the fair intensity physical activity.

Vigorous Intensity Physical Activity Duration (Processed Family)
For the duration of physical activity at a vigorous intensity as reported by Fitbit, all participants spend on average 26 ± 21 min per day. Table A12 illustrates these results.
Regardless of the grouping criterion of health status or country, participants consistently report means and medians in the 19-28 min range for the vigorous-intensity physical activity. Men may perform vigorous physical activity for 10-15 min more than women, as observed in their respective medians (means) of 27 (35) and 19 (20), but also with more variability, as their standard deviation is 28 compared to 11.

Sleep Duration (Processed Family)
For the sleep duration, participants sleep on average 7 ± 1.6 h and 50% of the participants sleep 7 h and 30 min. Table A12 illustrates these results.
The healthy participants sleep on average 18 min more than those with mild disease.