4.1. Single-Case Experimental Design on QuantifyMe
Typical quantified self experiments measure an outcome (dependent variable) associated with whatever the user’s behavior was at the time, which yields findings that are merely correlational. In contrast, single-case experimental design tries to mimic a randomized controlled trial within a person over time; that is, it relies on active (“randomized”) manipulation of an independent variable (e.g., sleep duration) over a period of time and careful observation of a dependent variable (e.g., productivity), all within the same person. This allows the individual user, and researchers overseeing a group of single-case self-experimenters, to identify possible causal relationships between the independent variable and dependent variable.
A traditional suggested design for single-case experiments is an A–B design, where the A phase corresponds to the baseline or control period, and the B phase corresponds to the intervention period. This design can be modified as a non-terminated sequential intervention, e.g., A–B1–B2–B3, to see the relationship between different magnitudes of the intervention B and their outcomes [46]. This is best suited to our platform, as we are looking to determine the optimum magnitude of the independent variable.
Therefore, we implemented a four-stage design (one baseline stage and three intervention stages) to help users determine optimal behaviors, with each stage including 4–7 days of data points as suggested by Barlow and Hersen [42]. In particular, we wanted to include increases and decreases in the behavior to see possible optimal behaviors.
We quantized behaviors into five zones for each experiment (see Table 2). These target behaviors were predetermined by examining common behaviors and correlations between daily behaviors and measures of wellbeing (e.g., how sleep duration affects happiness and stress) based on typical measures found in the SNAPSHOT study, which collected daily behavior, mood, and stress from over 200 young adults [28]. We also included a buffer around each target behavior to accommodate normal variation and to make it easier for users to achieve the target behavior. This buffer size was also informed by the SNAPSHOT study.
The “randomized” ordering of target goals was chosen as follows. Stage 1 was a baseline measure where users were instructed to maintain their normal behavior. Because a choice needed to be made, we settled on having the middle stage (O2) be the last stage for all experiments. We also decided to include at least one increase in the target behavior and one decrease in the behavior.
Table 3 lists the different stage order patterns currently used in the QuantifyMe platform. We note that there are other stage patterns that satisfy these conditions; the platform is flexible enough to implement any pattern of target behaviors. It might be advisable for researchers to choose the same set of interventions for each individual in a group of self-experimenters, so that the hierarchical techniques described by Van den Noortgate and Onghena can be used to generalize results to a larger population [47]. While users of the platform are notified that stages will last for 5–7 days and are notified of their target behavior each day, users in our evaluation study were not made aware of the specific behavioral goals for future stages.
As an example of intervention order, if a user’s average sleep duration during Stage 1 (baseline period) is 6.75 h (i.e., within O1), the user would be instructed to sleep 8.5 h, 6.5 h, and 7.5 h during Stages 2, 3, and 4, respectively. However, if the mean of the user’s sleep duration during Stage 1 was 8.75 h (i.e., within O3), the user would be instructed to sleep 6.5 h, 8.5 h, and 7.5 h during Stages 2, 3, and 4, respectively. This methodology of imposing behavioral targets adds more structure and validity to determining a causal relationship than simply correlating a user’s behavior with how they feel the next day.
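To make this ordering concrete, the mapping from a user’s baseline zone to the intervention order can be sketched as follows. This is an illustrative Python sketch, not the platform’s actual code: the three zone values and the classification thresholds are assumptions standing in for the five zones of Table 2, and `stage_targets` is a hypothetical helper name.

```python
# Illustrative zone targets for the sleep-duration experiment (hours).
# The real platform uses five zones (Table 2); three suffice here.
ZONES = {"O1": 6.5, "O2": 7.5, "O3": 8.5}  # low, middle, high targets


def stage_targets(baseline_hours):
    """Return the targets for Stages 2-4 given the Stage 1 baseline mean."""
    # Classify the baseline into a zone (thresholds are illustrative).
    if baseline_hours < 7.0:
        baseline_zone = "O1"
    elif baseline_hours < 8.0:
        baseline_zone = "O2"
    else:
        baseline_zone = "O3"
    # The middle zone (O2) is always last; the remaining order includes
    # at least one increase and one decrease relative to baseline.
    if baseline_zone == "O3":  # high baseline: go low first
        return [ZONES["O1"], ZONES["O3"], ZONES["O2"]]
    # low or middle baseline: go high first
    return [ZONES["O3"], ZONES["O1"], ZONES["O2"]]


print(stage_targets(6.75))  # [8.5, 6.5, 7.5], matching the example above
print(stage_targets(8.75))  # [6.5, 8.5, 7.5]
```

The two printed orders reproduce the worked example in the text; any other pattern satisfying the same constraints could be substituted.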
To complete a stage of the self-experiment, we aimed to have 5–7 days where the target behavior was within the target range and the outcome variable was stable, as suggested by Kratochwill [45] and Barlow et al. [42] (see also Section 2.2). However, we realized that changing and maintaining specific behaviors would be hard for novice self-experimenters. Therefore, we developed the following rules for advancing to the next stage or restarting a stage:
1. If the user has (i) at least five days within the appropriate target behavior range and (ii) a stable output (defined as a self-reported output within 3 points on the Likert scale or sleep efficiencies within 10%; see Section 4.2 for more information), then they are sent to the next stage.
2. If a stage lasts seven days and only four (instead of five) of the days were in the appropriate target behavior range, the user is sent to the next stage. If three or fewer days were in the target range, the stage is restarted. This was done because we need several days where the target behavior was achieved, but we do not want the user to be too frustrated with the system.
3. If a stage lasts seven days and the output is unstable, the user is sent to the next stage. This was implemented because we did not want users to become too frustrated with the system restarting when they had high adherence to checking in and following the target behavior.
4. If the user missed checking in for two days, the stage is restarted. This rule is particularly strict because, if a user did not check in (report outcomes and receive instructions for the day), they may not know the target behavior for that day.
Put simply, assuming that a user checked in each day, we only count a stage as complete if it lasted:
- five days, where all days were in the correct target behavior range and the output was stable;
- six days, where five of the six days were in the correct target behavior range and the output was stable; or
- seven days, where four or five of the days were in the correct target behavior range.
Rule 1, which reflects the best practices for single-case experimental design, requires two conditions to be met. Rules 2 and 3 are implemented as slight deviations to Conditions (i) and (ii) of Rule 1, respectively, so that we can still maintain the user’s interest throughout the self-experiment. Finally, Rule 4 is required so that users are aware of the target goals. Importantly, these requirements are easily adaptable when using the QuantifyMe platform for other self-experiments if one wants to adjust how strictly individuals must adhere to the best practices of single-case experimental design.
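The four rules can be condensed into a single decision function. The sketch below is a hypothetical Python rendering, not the platform’s implementation; the data representation (a count of in-range days and a list of daily outputs) is an assumption, while the thresholds come from the rules above.

```python
def is_stable(outputs, likert=True):
    """Output is 'stable' if its spread is within the stated threshold:
    3 points on the Likert scale, or 10% for sleep efficiencies."""
    spread = max(outputs) - min(outputs)
    return spread <= 3 if likert else spread <= 0.10


def stage_decision(days_in_range, total_days, outputs, missed_checkins):
    """Return 'advance', 'restart', or 'continue' for the current stage."""
    if missed_checkins >= 2:                       # Rule 4
        return "restart"
    if days_in_range >= 5 and is_stable(outputs):  # Rule 1
        return "advance"
    if total_days >= 7:                            # Rules 2 and 3
        # Four or more in-range days advance even if the output is
        # unstable; three or fewer restart the stage.
        return "advance" if days_in_range >= 4 else "restart"
    return "continue"                              # stage still in progress


print(stage_decision(5, 5, [4, 5, 6, 5, 4], 0))        # advance (Rule 1)
print(stage_decision(3, 7, [1, 7, 2, 6, 1, 7, 2], 0))  # restart (Rule 2)
```

Note that the seven-day branch covers both Rules 2 and 3: after seven days the only question left is whether at least four days hit the target range.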
This process is repeated for each of the stages of the experiment until all four stages are completed. After the experiment is complete, the results are analyzed using only the days where the target behavior was in the target range. An example self-experiment can be found in Section 5.4. Therefore, self-experiments take a minimum of 20 days to complete and use between 16 and 28 days of data for the analysis. For example, an experiment will use 16 days for analysis if the participant was only able to maintain the target behavior for four of the seven days in each stage, while it will use 28 days if, for each of the four stages, the output was unstable but the target behavior was met on all seven days. Similarly, an experiment will take only 20 days if the participant is able to maintain the target behavior and a stable output for five consecutive days in each of the four stages; however, the experiment may be longer if a participant’s behavior triggers repeated restarting of stages.
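The day counts above follow from simple arithmetic over the four stages, which a few lines of Python can verify:

```python
STAGES = 4  # one baseline stage plus three intervention stages

# Fastest path: five clean days per stage, all usable for analysis.
min_length = 5 * STAGES
# Fewest analysis days: 7-day stages with only 4 days in the target range.
min_analysis = 4 * STAGES
# Most analysis days: 7-day stages with every day in the target range.
max_analysis = 7 * STAGES

print(min_length, min_analysis, max_analysis)  # 20 16 28
```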
4.2. QuantifyMe Android App
The Android app was designed with the goal of setting up the self-experiment and letting the user easily “check-in” each day.
When the user first opens the app after installation, it prompts them for demographic data and to self-report their average happiness, stress levels, and subjective sleep quality with seven-point Likert scales (happiness: very unhappy–very happy; stress level: very low–very high; and sleep: terrible–great) and their average total active time per day (0.5, 1, 3, 8, and 12 h). The app also helps the user connect their Jawbone account to our system’s account. Users are then able to browse the self-experiments and select the one they are interested in.
Based on previous mHealth app research [33,48], we speculated that users’ engagement with this app and their outcomes from the self-experimentation might be related to their self-efficacy and their perceived efficacy of the app. Therefore, after the user selects a self-experiment, the user is asked to rate each of the three efficacy questions on a seven-point Likert scale (from 1, Poorly effective, to 7, Highly effective):
How effective do you think this app will be in helping you run this experiment? (App Efficacy)
How effective do you think this experiment will be in getting concrete results? (Experiment Efficacy)
How effective do you think you will be in carrying out the experiment? (Self Efficacy)
Once the efficacy questions are completed, the self-experiment has begun.
Every morning during the experiment, the user is reminded to check-in and fill out a short daily survey. This survey asks about how well they followed the experiment’s instructions (seven-point Likert scale: poor–good), the amount of leisure time they had in the past 24 h, along with happiness, stress, and productivity levels experienced in the past 24 h using seven-point Likert scales (not at all–extremely). Finally, the app reminds the user to sync their Jawbone wearable sensor to Jawbone’s Up App (syncing takes a few seconds). Upon a successful check-in, the QuantifyMe app sends the data to the backend for processing. The backend then responds with both the status of the user’s experiment as well as the daily goal for that day (see Section 4.3 for more information).
After the user has checked in for the day, the app presents the user with a screen that lets them view their daily goal and experiment progress during that stage (see Figure 2b,c). In particular, the user can see her recorded behavior for all of the days she has been in that stage. For example, in Figure 2c, we can see that the user has completed three days of Stage 3 and went to bed at 12:05 a.m., 1:00 a.m., and 12:07 a.m. We did not give the user any indication of what the target behavior would be on the next day, as we felt that it might bias the results. However, this is something that could be tested to determine the balance between biasing results and providing users with feedback on what to prepare for in the upcoming days.
If a user has failed a stage in the experiment (by not achieving the target behavior range for the targeted number of days, or not providing enough days of data), then they are shown a message prompting them to restart the stage (see Figure 2d). Once an experiment has been completed successfully, the user is shown a success screen with their end results, and the experiment’s results are added to their history, which they can view from the daily goals screen at any time. The user is then again able to select a self-experiment to start (including the one that was just completed).
4.3. Backend System
When a user creates an account in the QuantifyMe app (see Section 4.2), the account information is sent to the backend (implemented as a Django application) and used for user authentication in all subsequent app requests. The QuantifyMe app also sends an identifier that Jawbone uses to identify each user, which we associate with our system’s user account (see Figure 3a).
The backend server is set up so that Jawbone’s system automatically pushes all activity updates to our server as they happen. The server then takes the relevant information involving sleep, activity, and workouts; associates the data with the user via the appropriate Jawbone identifier; and saves it in the database. When the user completes a check-in each day, the data are saved directly into the database. Once the check-in data are saved, the system performs an analysis on the experiment data to determine the instructions for that day (see Figure 3b).
During the initial stage of the experiment, the data are simply gathered without a daily goal to determine the normal base state for the user. Specifically, as discussed in Section 4.1, users are instructed to maintain their normal behavior during Stage 1. When the initial stage is completed, the system determines the daily goals for each of the remaining stages of the experiment (see Section 4.1 for more details on how target zones were determined). In each of the later stages (i.e., Stages 2, 3, and 4), the daily progress is matched against the goal for that stage.
The final results of the experiment are calculated upon the final stage’s completion. The data for every completed stage of the experiment are queried and grouped into days. Then, the system finds the stage that maximizes or minimizes the average of the output measurement, depending on the experiment. The results are then presented to the user through the Android App.
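A minimal sketch of this analysis, assuming a simple in-memory representation of the per-stage data (the actual backend queries its database), could look as follows. The function name and data layout are hypothetical.

```python
def best_stage(stage_days, maximize=True):
    """Pick the stage whose mean output is best.

    stage_days maps a stage number to a list of (in_range, output)
    tuples, one per day; only days where the target behavior was in
    the target range contribute to the stage's mean.
    """
    stage_means = {}
    for stage, days in stage_days.items():
        outputs = [out for in_range, out in days if in_range]
        if outputs:  # skip stages with no usable days
            stage_means[stage] = sum(outputs) / len(outputs)
    # Maximize (e.g., productivity) or minimize (e.g., stress),
    # depending on the experiment.
    pick = max if maximize else min
    return pick(stage_means, key=stage_means.get)


# Illustrative data: three intervention stages of daily outputs.
data = {
    2: [(True, 4), (True, 5), (False, 7), (True, 4), (True, 5), (True, 4)],
    3: [(True, 6), (True, 7), (True, 6), (True, 5), (True, 6)],
    4: [(True, 5), (False, 2), (True, 4), (True, 5), (True, 5), (True, 5)],
}
print(best_stage(data))  # 3: the highest mean output, ignoring out-of-range days
```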
We note that a backend system is used to collect and analyze the data in the QuantifyMe platform so that a researcher helping individuals conduct self-experiments has a central repository for all of the data. This allows researchers to adapt the analysis, if necessary, in an agile way (i.e., without having all users install a new version of the app). Furthermore, we wanted to make sure that, if a user accidentally deleted the app or their data, we could still use the data for analysis and reset the individual to the appropriate place in the self-experiment.