Analysis of a Wake-Up Task-Based Mobile Alarm App

Abstract: The latest mobile alarm apps provide wake-up tasks (e.g., solving math problems) that must be completed to dismiss the alarm, and many users willingly accept this inconvenience in return for waking up on time. However, no studies have investigated how wake-up tasks are used and what effects they have from a human–computer interaction perspective. This study aims to deepen our understanding of how users engage with and utilize a task-based alarm app by (1) examining the characteristics of different wake-up tasks and (2) extracting usage factors of hard tasks, which involve physical or cognitive task loads over a certain level. We developed and deployed Alarmy, a task-based mobile alarm app with four wake-up task features: touching a button, taking a picture, shaking the device, and solving math problems. We collected 42.9 million in situ usage logs from 211,273 US users over five months. Their alarm app usage behaviors were measured in two categories: eight alarm-set variables and four alarm-dismiss variables. Our statistical test results reveal significant differences in alarm usage behaviors depending on the wake-up task, and the multiple regression analysis results show key usage patterns associated with frequent use of hard tasks: late alarm hours, many snoozes, and relatively more use on weekends. Our results provide theoretical implications for behavior change as well as practical implications for designing task-based mobile alarms.


Introduction
Many people have experienced forced awakening using mobile alarm applications. Kurniawan et al. [1] found that the alarm clock is one of the most crucial features of smartphones, and the most popular sleep-related apps were alarm apps [2]. Additionally, Park et al. [3] conducted surveys of mobile phone usage patterns, and the results showed that many participants used alarm applications very frequently.
The latest mobile alarm apps offer wake-up task alarms that do not allow the user to dismiss an alarm until they complete a specific task, which we call a wake-up task in this study. According to relevant studies [4,5] and the 14 most downloaded alarm apps in Google Play (using the search keyword "alarm"), including our app, Alarmy (deployed in 2013), most alarm apps provide alarms with wake-up tasks. The types of wake-up tasks are diverse; they include inputting text (e.g., Earlybird Alarm Clock), taking a picture (e.g., Alarmy and Mimicker Alarm), reading a given sentence aloud (e.g., EarlyBird Alarm Clock), solving a math problem (e.g., Alarmy), walking some steps [4], playing a game (e.g., Alarmmon), and haptic features such as following a particular pattern (e.g., FreeAlarm Clock). These tasks may aggravate users by demanding greater cognitive and physical effort than the general touch-to-dismiss type of alarm. Task-based alarm users need to go through these tasks to achieve their initial interaction goal (i.e., dismissing an alarm), which increases cognitive load and frustration [6]. Despite this inconvenience, the large number of downloads and amount of usage show that many users are willing to perform the task to get up at the right time.
The principle behind task-based alarm apps is linked to the relationship between neural activation and wakefulness [7-9]. From the perspective of the systemic-structural activity theory (SSAT) [10], prior studies revealed that task complexity could be related to the activation level of the brain [8,9]. In particular, Bloch [7] pointed out that the higher the level of activation is, the higher the level of wakefulness. Based on these studies, performing wake-up tasks in alarm apps can be seen as activating neurons that help users to wake.
While SSAT is helpful in understanding the general working mechanism of wake-up tasks in mobile alarm apps, it rarely considers users' task selection and specific usage contexts. In most task-based alarm apps, users can select one of the wake-up tasks according to their preferences and needs. Even though particular tasks can increase wakefulness, users may rarely choose them because of their very high difficulty. Furthermore, wake-up alarms have been relatively under-studied in human-computer interaction (HCI) research. Many HCI researchers have mainly focused on using mobile sensing technologies to support healthy sleep behaviors such as monitoring sleep states [11] and measuring sleep quality [12,13]. Notably, mobile alarm usage behavior goes through a self-regulative process: setting up the alarm to plan one's upcoming wake-up time and selecting a task to make oneself adhere to the mobile alarm's "instruction" to complete a task. This study was inspired by Fogg's behavior model [14], which explains how one's motivation, ability, and a prompt to a task initiate an action. Even when a person is motivated to wake up on time, their ability may not be sufficient at the time of waking up. Therefore, a certain prompt must exist that can actually lead to the action of waking up. Accordingly, we aimed to analyze the usage patterns of a particular group of people whom we call hard-taskers, who actively use task-based alarms that could be inconvenient and frustrating. Understanding the in situ moments of awakening with task-based alarms would be of great value in the future design of mobile alarms appropriate for users with different preferences and wake-up needs.
This paper aims to deepen our understanding of wake-up tasks, the feature that most distinguishes task-based alarms from the conventional alarm-dismiss method (i.e., touching), by analyzing four wake-up tasks in the Alarmy app: (1) touching a button (normal_task), (2) taking a picture (picture_task), (3) shaking the device (shake_task), and (4) solving math problems (math_task). Specifically, we addressed two research questions:
RQ 1. How does users' alarm usage differ depending on the wake-up task?
RQ 2. How do hard-taskers, who use a higher proportion of the hard tasks with task load over a certain level, use task alarms, and in what contexts?
Regarding the first research question, we anticipated that the level of task load would cause differences in alarm-set usage and alarm-dismiss usage. For example, we expected more difficult tasks to be used mostly in earlier hours.

Hypothesis 1a.
The frequency of the alarm-set will be different among the four wake-up tasks.

Hypothesis 1b.
The time of the alarm-set will be different among the four wake-up tasks.

Hypothesis 1c.
The consistency of the alarm-set will be different among the four wake-up tasks.

Hypothesis 2a.
The time duration of the alarm-dismiss will be different among the four wake-up tasks.

Hypothesis 2b.
The use of snoozing will be different among the four wake-up tasks.

Next, for the second research question, we expected that certain alarm app usage factors would predict the proportion of an individual's hard-task uses. The following hypothesis tests this expectation.

Hypothesis 3.
The particular alarm app usage will affect the proportion of hard-task uses.

The remainder of this paper is organized as follows. We first review the literature on human awakening and inconvenient interaction. We then explain the methodology used to answer the two research questions and present the analysis results. Finally, we discuss the implications and limitations of this study and present the conclusions.

Task Capability Immediately after Awakening
Prior studies have examined task capability shortly after awakening [15]. Jeanneret and Webb [16] evaluated grip strength by having subjects squeeze a dynamometer after awakening and compared it with the normal state. Wilkinson and Stretton [17] carried out performance experiments after awakening at different times in the night. They evaluated several performance metrics such as adding numbers, reaction to a specific sound tone, and coordinating a ball at a point. Åkerstedt and Gillberg [18] performed memory experiments in which subjects learned four playing cards and were tested for retention of the previous cards.
Such studies could be useful references for designing wake-up tasks by specifying the task load, but they were mainly conducted in limited experimental settings (i.e., laboratories). For our first research question, we analyzed task performance based on alarm usage logs collected from an extensive number of real users. Therefore, the quantitative results in this study can better clarify the characteristics of different wake-up tasks, which can lead to practical implications for designing a task-based mobile alarm app.

Uncomfortable and Inconvenient Interaction
Ease of use has been a general principle of interactive system design in HCI [19]. This principle aims to increase efficiency and convenience so that users do not have to struggle to achieve better results. However, Gilmore [20] questioned whether guidelines and principles of interface design should be applied uniformly to every information technology. Benford et al. [21] argued in favor of unconventional interaction design, which they called uncomfortable interaction. This approach is important for understanding cultural experiences related to entertainment, enlightenment, and sociality. They investigated and reported multiple cases based on uncomfortable systems. For example, roller coasters and sports such as bungee jumping can cause physical discomfort such as tiredness and tension. However, people are pleased to accept this stress to feel the thrill and entertainment.
Rekimoto and Tsujita [22] proposed inconvenient interaction, which requires the user's participation and effort to attain a long-term benefit at the cost of short-term inconvenience. For example, they showed a microwave that requires the user to perform a step-aerobics exercise during operation. Another example of an inconvenient system was a refrigerator that contains a smile-awareness sensor so that the user has to smile if they want to open it [23]. These examples were based on the premise that laughing can help people to maintain long-term health benefits. The Communication-Grill, as a further example, looks like an ordinary grill except that it cooks and keeps the meat warm only while people converse around it. This product encourages people to maintain a conversation with a positive effect.
Awakening from sleep is generally a difficult but indispensable behavior. In particular, people often need to get up at a challenging time such as the early morning hours, and that is why they use a traditional wake-up alarm. Needless to say, wake-up tasks impose inconvenience on users, who have to make an extra effort to perform them. In this regard, using wake-up tasks can be understood as an empirical case for the principles of inconvenient interaction.
In this study, we examined hard-taskers, who willingly use such inconvenience, to answer the second research question. The hard-tasker group is important and needs to be understood because the wake-up task, which causes task loads over a certain level, is the most distinguishing feature of the task-based alarm app. Based on this understanding, it would be possible to recommend a suitable wake-up task for a particular user and to manage the level of inconvenience.

Alarmy Usage Data
To answer the research questions, we analyzed Alarmy usage logs. Alarmy is a task-based mobile alarm app that we developed in 2013 and that currently provides four different wake-up tasks involving various levels of physical or mental effort: (1) touching a button (normal_task), (2) taking a picture (picture_task), (3) shaking the device (shake_task), and (4) solving math problems (math_task). The following sections introduce the Alarmy app and describe the dataset for the analysis.

Alarmy App
In this work, we analyzed Alarmy, a wake-up task-based mobile alarm app. We deployed Alarmy to the public via Google Play and the App Store. After a few years, the app became the most downloaded alarm clock app in 97 countries, and the number of its daily active users (DAU) is about three million.
Alarmy provides alarms with wake-up tasks, and the user is forced to perform a set task to turn off an alarm. This section explains how Alarmy users set and dismiss alarms with wake-up tasks. The whole alarm usage process in Alarmy is shown in Figure 1. Alarmy consists of four tab menus: (1) Alarm, (2) History, (3) Today's Panel, and (4) Settings. First, the Alarm tab shows a list of alarms by displaying the set time and the task type for each. The user can add a new alarm by touching the button at the bottom of the screen. When touching one of the alarms in the list, the user can see and modify detailed settings. Additionally, this tab enables the user to set a timer called Quick Alarm. Secondly, the History tab visualizes the user's previous alarm usage information such as alarm set time and the task type over time. Thirdly, the Today's Panel tab contains the day's weather, horoscope, and news. Finally, the Settings tab enables users to set details for the Alarmy app use, such as changing the app theme, language, and so on.
To set the alarm, the user must specify detailed information. First, they must set when the alarm goes off. Further, the alarms can recur once the user has selected days of the week for recurrence (e.g., weekday or weekend). Next, the user can choose ringtone volume, type (e.g., sound or vibration), and sound (e.g., noisy sound or music in the device). Finally, the user must select one of five tasks for the alarm: (1) touching a button (conventional method), (2) taking a picture of a particular place, (3) shaking the device, (4) solving math problems of addition and multiplication, and (5) scanning a QR code.
These tasks certainly aggravate users by demanding cognitive and physical effort. Even so, our usage logs show that many users still perform such tasks because they want to get up at the right time. In this paper, we refer to mobile alarm app tasks for waking people up as wake-up tasks. We labeled each wake-up task as follows: (1) touching a button (normal_task), (2) taking a picture (picture_task), (3) shaking the device (shake_task), and (4) solving math problems (math_task). In this study, we excluded the QR code scanning task because very few users select it and its procedure is quite similar to that of picture_task.
Touching a button is the conventional method for alarm dismissal, and its task load is relatively low. The other wake-up tasks, which we call hard tasks, require relatively high physical or cognitive effort, and users must specify their details in advance. For example, the user can specify the difficulty of the math problems or the number of shakes. For the tasks of taking a picture and scanning a QR code, the user must register an image as a reference when setting the alarm.
Once an alarm goes off, the user must decide whether to delay the alarm (snooze) or perform the wake-up task. If the user decides to snooze, the alarm temporarily turns off, but it re-rings after the time the user previously specified. Otherwise, the user must perform a specific task to dismiss the alarm.
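The ring, snooze, and dismiss flow described above can be sketched as a small loop. This is only an illustrative model and not Alarmy's implementation: `AlarmOutcome`, `run_alarm`, `decisions`, and `task_completed` are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class AlarmOutcome:
    snoozes: int = 0          # how many times the user delayed the alarm
    dismissed: bool = False   # True once the wake-up task was completed


def run_alarm(decisions, task_completed):
    """Replay one alarm (hypothetical model, not the Alarmy API).

    decisions: the user's choice at each ring, "snooze" or "task".
    task_completed: callable returning True when the wake-up task succeeds.
    """
    outcome = AlarmOutcome()
    for decision in decisions:
        if decision == "snooze":
            # The alarm goes silent and re-rings after the preset delay.
            outcome.snoozes += 1
        elif decision == "task" and task_completed():
            # Only a completed wake-up task dismisses the alarm.
            outcome.dismissed = True
            break
    return outcome


outcome = run_alarm(["snooze", "snooze", "task"], task_completed=lambda: True)
print(outcome)
```

In this model, the snooze count and the dismissal event are the two quantities that the alarm-dismiss variables later summarize per user.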

Dataset Overview
Table 1 shows the description of alarm usage variables and the statistics that we measured for each individual user. We collected Alarmy usage data from real users who voluntarily downloaded and installed the app in the US from January to May 2018 (151 days). The collected usage log includes the anonymized user ID, the time when the alarm was set, the details of the alarm-set (e.g., wake-up task type, set to recur), the time when the alarm was dismissed, and the number of snooze uses. The consent required for data collection from these users was exempted with approval from the Hanyang Institutional Review Board (IRB No.: HYU-2019-06-010-1). During the study period, a total of 581,712 users left 48,763,070 usage logs. To exclude trial users, we selected the 211,273 Alarmy users who had used the app for more than 30 days, together with their 42,909,263 app usage logs. They were 38% of the total users, but their usage logs cover 88% of the entire dataset. From the usage logs, we measured alarm usage behaviors in two categories: alarm-set and alarm-dismiss. The alarm-set usage contains eight variables in three subcategories: (1) set frequency, (2) set time, and (3) set consistency. Alarm-dismiss usage includes four variables in two subcategories: (1) dismiss time and (2) snooze.
On average, users set alarms on 74 days and usually set 1.68 alarms per day. The alarms mostly occurred in the early morning hours (7.43 h) on weekdays (84%). Further, most of the alarms were set to recur (77%). The most common task was normal_task (71%), followed by math_task (15%), shake_task (10%), and picture_task (4%). On average, it took about 19 s to dismiss an alarm (Normal: 14.1 s; Picture: 40.06 s; Shake: 26.62 s; Math: 40.81 s).
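As a rough illustration of how per-user variables of this kind could be derived from raw logs, the sketch below aggregates a handful of made-up log rows. The row layout, the helper `per_user_variables`, and the exact variable definitions are assumptions for illustration; the definitions actually used in the study are those given in Table 1.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical log rows: (user_id, ring time, task type, snooze count).
LOGS = [
    ("u1", datetime(2018, 1, 1, 7, 30), "normal_task", 0),   # Monday
    ("u1", datetime(2018, 1, 2, 7, 30), "math_task", 1),     # Tuesday
    ("u1", datetime(2018, 1, 6, 9, 0), "math_task", 2),      # Saturday
    ("u2", datetime(2018, 1, 1, 6, 0), "picture_task", 0),   # one-day user
]
HARD_TASKS = {"picture_task", "shake_task", "math_task"}


def per_user_variables(rows, min_use_days):
    """Aggregate logs per user, keeping users with more than min_use_days days."""
    by_user = defaultdict(list)
    for row in rows:
        by_user[row[0]].append(row)

    variables = {}
    for user, items in by_user.items():
        use_days = {ring.date() for _, ring, _, _ in items}
        if len(use_days) <= min_use_days:
            continue  # treated as a trial user and excluded
        n = len(items)
        variables[user] = {
            "freq_use_days": len(use_days),
            "avg_ring_hour": sum(r.hour + r.minute / 60 for _, r, _, _ in items) / n,
            "prop_weekday": sum(r.weekday() < 5 for _, r, _, _ in items) / n,
            "avg_freq_snooze": sum(s for _, _, _, s in items) / n,
            "prop_hard_task": sum(t in HARD_TASKS for _, _, t, _ in items) / n,
        }
    return variables


# The study kept users with more than 30 days of use; the toy data uses 1.
user_vars = per_user_variables(LOGS, min_use_days=1)
print(user_vars["u1"])
```

Ring hours are expressed as decimal hours here, so 7:30 a.m. becomes 7.5, which matches the way hours such as 7.43 h are reported in the text.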

Comparing Alarm Usage among Wake-Up Tasks (RQ1)
To compare the alarm usage among the four wake-up tasks, each user's usage variables were aggregated for each task. To check for statistically significant differences in alarm usage among the wake-up tasks, we conducted a Kruskal-Wallis H test, a non-parametric hypothesis test for multigroup comparison. Considering the large size of each group, we also calculated effect sizes to evaluate the magnitude of the differences [24]. We considered η², the effect size for the Kruskal-Wallis H test. A prior study [25] suggested that η² = 0.01 be considered a small effect size, 0.04 a moderate effect size, 0.16 a relatively strong effect size, and 0.46 a strong effect size.
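A minimal sketch of the test statistic and an accompanying effect size is given below, using only the standard library and toy data. The formula η² = (H - k + 1) / (n - k) is one common definition for the Kruskal-Wallis test and is an assumption here, since the text does not spell out the formula that was used; the p-value is omitted because it needs a chi-square distribution function.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1 to N, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Return (H, eta_squared) for a list of samples, with tie correction."""
    pooled = [v for g in groups for v in g]
    n, k = len(pooled), len(groups)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1 - ties / (n ** 3 - n)
    eta_squared = (h - k + 1) / (n - k)  # assumed effect size definition
    return h, eta_squared


# Toy per-user values of one usage variable for the four tasks.
samples = {
    "normal_task": [1, 2, 3],
    "picture_task": [10, 11, 12],
    "shake_task": [4, 5, 6],
    "math_task": [7, 8, 9],
}
h_stat, eta_sq = kruskal_wallis(list(samples.values()))
print(round(h_stat, 3), round(eta_sq, 3))
```

With four groups the statistic has 3 degrees of freedom, which corresponds to the H(3) notation used when results are reported.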
In cases where the comparison test showed a moderate or strong effect size under the prior studies' guideline [24], we conducted the Dunn test as a post-hoc test to identify the detailed relationship between each pair of tasks and calculated Cohen's d as the effect size, which represents the magnitude of the mean difference. Cohen [26] provided rules of thumb for interpreting d, suggesting that a d of |0.2| represents a small effect size, |0.5| a medium effect size, and |0.8| a large effect size. We used 0.001 as the level of significance for all the hypothesis tests and applied the Bonferroni correction to the post-hoc tests.
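The pairwise effect size and the corrected threshold can be illustrated as follows. This sketch assumes the pooled-standard-deviation form of Cohen's d and the simplest form of the Bonferroni correction (dividing the significance level by the number of pairwise comparisons); the Dunn test itself is not implemented, and the helper names are hypothetical.

```python
from itertools import combinations
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5


def interpret(d):
    """Label |d| with Cohen's rules of thumb (0.2, 0.5, 0.8)."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"


tasks = ["normal_task", "picture_task", "shake_task", "math_task"]
pairs = list(combinations(tasks, 2))      # six pairwise comparisons
alpha = 0.001                             # significance level used in the study
bonferroni_alpha = alpha / len(pairs)     # per-comparison threshold

d = cohens_d([2, 4, 6], [1, 3, 5])        # toy samples
print(len(pairs), bonferroni_alpha, d, interpret(d))
```

Because d is reported as a magnitude, its sign only reflects the order in which the two tasks are passed to the function.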

Modeling Hard-Taskers' Usage Patterns (RQ2)
For the second research question, we adopted a statistical modeling approach to explore relevant usage patterns of hard-taskers, who show a relatively high proportion of non-normal tasks with cognitive and physical task loads (i.e., math_task, shake_task, and picture_task). Specifically, we conducted a multiple linear regression analysis, which builds a linear model for predicting the proportion of hard-task uses (prop_hard_task) based on the alarm-set and alarm-dismiss usage variables of each user. We used the twelve usage variables from Table 1 as independent variables: eight alarm-set variables and four alarm-dismiss variables.
For this analysis, all the variables were measured for each of the 211,273 users. We found several independent variables that were correlated. For example, the two highest correlations were observed between avg_daily_alarm and sd_daily_alarm, and between avg_freq_snooze and sd_freq_snooze. We excluded avg_daily_alarm and sd_freq_snooze so that every pair of the remaining independent variables had a Pearson correlation below 0.7, and the variance inflation factor (VIF) for each independent variable was lower than the benchmark of 10 for multicollinearity [27]. The highest VIF value was 1.757. Therefore, our regression models were safe from the multicollinearity effect.
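The modeling steps above (pairwise correlation screening, VIF, and a least-squares fit of prop_hard_task) can be sketched with the standard library alone. The data are made up and noise-free, only two predictors are used, and the helpers `solve`, `ols`, `pearson`, and `vif` are hypothetical names; this is not the analysis code used in the study.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares fit with intercept; returns (coefficients, R squared)."""
    x = [[1.0] + list(r) for r in rows]
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in x]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return beta, 1 - ss_res / ss_tot


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def vif(rows, j):
    """VIF of predictor j: 1 / (1 - R^2) from regressing it on the others."""
    target = [r[j] for r in rows]
    others = [[v for k, v in enumerate(r) if k != j] for r in rows]
    _, r2 = ols(others, target)
    return 1 / (1 - r2)


# Toy per-user predictors: (avg_ring_hour, avg_freq_snooze).
X = [(6, 0), (7, 1), (8, 0), (9, 2), (10, 1)]
# Toy outcome built as 0.1 + 0.05 * hour + 0.2 * snooze, without noise.
prop_hard_task = [0.40, 0.65, 0.50, 0.95, 0.80]

r = pearson([row[0] for row in X], [row[1] for row in X])
coefficients, r_squared = ols(X, prop_hard_task)
print(round(r, 3), round(vif(X, 0), 3), [round(c, 3) for c in coefficients])
```

A predictor pair would be flagged by the screening rule when |r| reaches 0.7, and a predictor would be flagged when its VIF reaches 10; in this toy data neither threshold is met.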

Different Alarm App Usage among Wake-Up Tasks (RQ1)
Tables 2 and 3 show the task-wise comparison results of the Kruskal-Wallis H test, and Figures 2 and 3 represent the magnitude of the mean differences among the wake-up tasks.

Alarm-Set Usage
Set Frequency. There are three variables in the set frequency category: freq_use_days, prop_use_days, and avg_daily_alarm. First, the number of days of setting each wake-up task (freq_use_days) was compared via the Kruskal-Wallis H test. The results show significant and moderate differences among the wake-up tasks [H(3) = 10,884.67, p < 0.001, η² = 0.036]. Post-hoc comparisons using the Dunn test also found significant differences in most pairs of wake-up tasks, but the magnitude of the difference depended on the task pair. normal_task was used 54 days on average, which is much greater than the others (vs. picture_task: d = 0.491, vs. shake_task: d = 0.471, vs. math_task: d = 0.252). math_task tended to have a moderately lower number of days than picture_task (d = 0.251) and shake_task (d = 0.226). However, the difference between the frequencies of picture_task uses and shake_task uses was neither significant nor considerable (p = 0.278, d = 0.023).
Next, there is a significant and moderate difference in the proportion of setting each wake-up task (prop_use_days) [H(3) = 15,704.64, p < 0.001, η² = 0.052], and all the post-hoc test results are also statistically significant. The effect sizes of the post-hoc tests show the detailed relationship between the wake-up tasks. The proportion of days with normal_task is much greater than those of the others (vs. picture_task: d = 0.553, vs. shake_task: d = 0.585, vs. math_task: d = 0.337). shake_task and picture_task showed a similar number of uses (d = 0.032), and math_task was the least used (vs. shake_task: d = 0.254 and vs. picture_task: d = 0.222).
We also found a significant and moderate difference in the number of alarms set in a day (avg_daily_alarm, multiple alarm-sets in a day) [H(3) = 13,852.05, p < 0.001, η² = 0.046]. What all tasks had in common was that users tended to set a single alarm per day (normal_task: 1.55, picture_task: 1.16, shake_task: 1.34, math_task: 1.36). However, the effect sizes showed a large difference between particular task pairs. For example, the number of normal_task alarms tended to be greater than those of the other tasks, and the effect sizes showed that the magnitude of the differences was considerable (vs. picture_task: d = 0.624, vs. shake_task: d = 0.351, vs. math_task: d = 0.314). Similarly to prop_use_days, the effect size of the difference between shake_task and math_task was relatively small (d = 0.047), and both were set more often in a day than picture_task (vs. shake_task: d = 0.391 and vs. math_task: d = 0.425).
Set Time. The set time category includes avg_ring_hour and prop_weekday. First, the Kruskal-Wallis H test on the set-alarm hours (avg_ring_hour) reveals a significant difference among the tasks, but its effect size was small, indicating that most of the alarms were concentrated in the morning hours [H(3) = 2558.71, p < 0.001, η² = 0.008]. Even though the magnitude of the difference is not large, we could find a tendency of task preference in the set time. The average hour of normal_task (about 8 a.m.) tended to be slightly later than those of the other tasks (vs. picture_task: d = 0.281, vs. shake_task: d = 0.119, vs. math_task: d = 0.181). Interestingly, hard tasks (i.e., picture_task, shake_task, and math_task) tended to be more concentrated at the earlier hours than normal_task.
Similarly, we also found a small effect size in the comparison of the weekday use proportion (prop_weekday), indicating that most of the set alarms were concentrated on weekdays regardless of the wake-up task type [H(3) = 1828.93, p < 0.001, η² = 0.006]. Approximately 82-85% of the alarms went off on weekdays, and this tendency did not differ much by task type.
Set Consistency. We compared three variables, sd_daily_alarm, sd_ring_hour, and prop_recur, in terms of set consistency. First, we found a significant difference with a moderate effect size in the variability of the number of daily alarm-sets (sd_daily_alarm) [H(3) = 12,485.94, p < 0.001, η² = 0.041]. normal_task tended to be more inconsistent in the number of daily set alarms (vs. picture_task: d = 0.671, vs. shake_task: d = 0.354, vs. math_task: d = 0.302). picture_task showed strong consistency in the number of daily set alarms compared to the other tasks (vs. shake_task: d = 0.371 and vs. math_task: d = 0.428).
Finally, we compared the proportion of weekly-recurrent alarms over wake-up tasks. The results showed a small effect size, indicating that most of the alarms were set to recur weekly regardless of the task type [H(3) = 2736.596, p < 0.001, η² = 0.009]. Approximately 73-76% of the alarms were set to recur.

Alarm-Dismiss Usage
Task Completion Time. We considered avg_task_completion_time and sd_task_completion_time in the usage category of the task completion time. First, our Kruskal-Wallis H test results on avg_task_completion_time revealed significant and strong differences among wake-up tasks [H(3) = 144,454.19, p < 0.001, η² = 0.683]. The task completion time of normal_task was approximately 14 s on average, while those of picture_task (Mean = 40.05, SD = 15.63) and math_task (Mean = 40.81, SD = 14.67) were the longest. The completion time of shake_task was approximately 26 s.
There is also a significant and moderate difference in the variability of the task completion time among the tasks (sd_task_completion_time) [H(3) = 14,281.84, p < 0.001, η² = 0.067]. normal_task tended to be more consistent in the task completion time compared to the other tasks (vs. picture_task: d = 0.594, vs. shake_task: d = 0.163, vs. math_task: d = 0.461). In the variability of the completion time, the magnitude of the difference between math_task and picture_task was not great (d = 0.156).
Snooze. Regarding snooze use, avg_freq_snooze and sd_freq_snooze were compared among the wake-up tasks. We found a significant difference in the number of snoozes (avg_freq_snooze), but its magnitude was small [H(3) = 277.76, p < 0.001, η² = 0.001]. picture_task had the largest number of snooze uses per alarm (Mean = 0.332, SD = 0.507) among the wake-up tasks. However, the standard deviation of snooze use with picture_task was large relative to its mean, which is consistent with the negligible effect size. Next, we found a significant but weak difference in the consistency of snooze use (sd_freq_snooze) [H(3) = 16.18, p = 0.001, η² < 0.001]. The standard deviation of the snooze use per alarm was not considerably different among the wake-up tasks.

Alarm Usage Factors for Predicting the Proportion of the Hard-Task Use (RQ2)
Table 4 shows the results of the multiple regression analysis. The regression model was statistically significant (F(10, 22,110) = 22,110, p < 0.001) and explained the proportion of using hard tasks reasonably well (R² = 0.511).
The results reveal significant usage patterns for understanding hard-taskers. First, our regression analysis results reveal that hard-taskers relatively rarely experienced challenging situations such as awakening at an earlier time. The proportion of hard-task uses tended to be high when avg_ring_hour was late and the proportion of weekday use was low. Second, hard-taskers showed irregular usage behaviors, which possibly short-circuit habit formation. Their number of daily alarms varied greatly (multiple alarms in a day), and they consistently delayed waking up by using the snooze feature. Additionally, hard-taskers' alarm uses tended to be sparse over a longer period (positively correlated with freq_use_days and negatively correlated with prop_use_days). Finally, hard-taskers tended to show a longer task completion time. This is expected because they are the users who relatively often use hard tasks, which take longer to dismiss.

Characteristics of Wake-Up Tasks
normal_task is the most casual task type. normal_task is the easiest and causes the lowest task load. The task completion time of normal_task was the shortest (14 s), and its number of snooze uses was also the lowest (0.155 times) among the tasks. Such a low task load possibly leads users to use normal_task frequently and casually in diverse situations. Almost half of the total alarm-sets were normal_task (48%), and its average number of days was the greatest among the wake-up tasks (54 days). Further, normal_task tended to be used for alarms in the later morning hours (7.72 h, equivalent to 7:43 a.m.), which are likely less challenging. Finally, the consistency of using normal_task tended to be lower than that of the other tasks in terms of both set and dismiss usage, possibly because of its frequent uses in diverse contexts.
picture_task is the hardest task and is mainly used for challenging situations. Dismissing picture_task tends to take a longer time (40 s), and its number of snooze uses was the greatest, indicating that picture_task is the most challenging task to complete among the wake-up tasks.
The higher level of difficulty probably causes users to set picture_task mainly for challenging contexts. The average set-hour of picture_task was the earliest (6.88 h), and its standard deviation was the lowest. Moreover, the overall frequency of picture_task use was the lowest. picture_task is more valuable and useful when the user is desperate to awaken on time, especially in the earlier morning.
math_task is also a difficult type, but its psychological barrier seems lower than that of picture_task. Similar to picture_task, math_task takes a longer time to complete (40 s). However, the frequency of math_task use was almost five times that of picture_task use. Further, its number of snooze uses was much lower than that of picture_task. Such differences between math_task and picture_task possibly occur because these tasks cause different types of task load. picture_task causes a physical load, while math_task causes a cognitive load. According to the results, it seems that users feel the burden of physical tasks more than that of cognitive tasks, even though the actual completion times of the tasks are nearly the same.
shake_task shows moderate usage tendencies in both alarm-set and alarm-dismiss. In alarm-dismiss, shake_task showed a longer task completion time and a higher number of snooze uses than normal_task. However, its difficulty level did not reach that of the other two hard tasks, i.e., picture_task and math_task. On the other hand, the alarm-set variables of shake_task tended to be similar to those of math_task, except for the set frequency, even though shake_task causes a physical load, unlike math_task.

Physical Tasks vs. Cognitive Tasks
This study identified the characteristics of the wake-up tasks by analyzing dismissal behaviors. The task of touching a button tends to require the shortest time to complete, while the tasks of taking a picture and solving math problems showed relatively long task completion times. Next, users tended to snooze more often when the wake-up task was set as taking a picture. Solving math problems, which showed a similar task completion time, had a significantly lower number of snoozes than taking a picture. Due to the greater number of snoozes, taking a picture showed the longest time to dismiss from the first ring; this was followed by solving math problems, then shaking the device, and finally touching a button. Our results show that the cognitive and physical loads of wake-up tasks should be considered in their design.
Though taking a picture and solving math problems showed similar task completion times, we could find significant differences in the number of snoozes and the frequency of setting the task. The task of solving math problems mainly causes a cognitive load, while taking a picture requires physical effort. The results may indicate that users tend to experience more psychological pressure from physical tasks than from cognitive tasks. Requiring a physical load could lead to better performance in waking the user from sleep. However, this kind of task should be carefully designed because users may avoid it due to the pressure it causes. Therefore, recommending cognitive tasks can be relatively more effective for user retention, especially for those who are sensitive to psychological pressure.
As the design guideline from Rekimoto and Tsujita's inconvenient interaction [22] suggests, the difficulty of the inconvenience (i.e., physical or cognitive tasks) needs to be well balanced with the necessity (i.e., waking up on time), or else it may lead the user to abandon the app. Further, Kim et al.'s study [6] suggests that users engage in a cost-benefit analysis when given a mobile task (i.e., a wake-up task) with a certain level of difficulty and weigh whether the outcome of the task (i.e., waking up) is worth completing it rather than ignoring it (i.e., snoozing). Therefore, our work complements prior studies by characterizing mobile alarm tasks with physical and cognitive loads to support future wake-up app design.

Wake-Up Tasks and Fogg's Motivation-Ability-Prompt (MAP) Model
The study results are well linked with Fogg's MAP behavior model [14], which explains human behavior through three elements: (1) motivation, (2) ability, and (3) prompts, as shown in Figure 4. In this model, if at least one of the three elements is missing, the anticipated behavior is rarely triggered. This model can explain mobile alarm users' usage behavior. A user's motivation can vary depending on the next morning's schedule or their willingness to build the habit of waking up early. Furthermore, the ability in MAP corresponds to how confident the user is in waking up on time. According to the MAP theory, the activation threshold for awakening on time is basically determined by motivation and ability. Additionally, the wake-up tasks in the mobile alarm app can be seen as prompts. For example, when a user's motivation is low and their ability is high, the wake-up tasks do not need to be that hard. The study results show that most users frequently used the easiest task (normal_task) and mostly set it to recur. This probably indicates that motivation is usually lower than ability for most users. Even the easiest task can trigger the targeted behavior (awakening on time) because it may not be that important. However, the regression analysis results reveal that hard-taskers usually seem less pressured to wake up on time (e.g., late and irregular alarm hours, many snoozes, and relatively less use on weekdays), so they do not need to build the habit of waking up early (i.e., the ability is relatively low). They probably face occasional situations that require them to get up early (i.e., the motivation becomes higher), and the hard tasks become more valuable at such times.
Developing the ability requires time and effort, so diverse wake-up tasks in mobile alarm apps can be useful for dealing with different activation levels depending on contextual motivation and individual ability. One further idea would be a wake-up task recommendation based on the user's schedule. For example, a mobile alarm app can be connected with a calendar app and encourage the user to set a challenging wake-up task only if there is an important event early the next morning. Such flexible interaction can reduce the overall pressure or burden on the app user and convince them to continue using the app.
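The schedule-based recommendation idea could be sketched as follows. This is only an illustrative sketch, not a feature of Alarmy: the `CalendarEvent` type, the `recommend_task` function, the `important` flag, and the 9:00 cutoff for an "early" event are all hypothetical assumptions, and math_task is used as the suggested hard task purely as an example.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta


@dataclass
class CalendarEvent:
    """Hypothetical calendar entry; not an actual calendar API type."""
    title: str
    start: datetime
    important: bool


def recommend_task(events, now, early_cutoff=time(9, 0)):
    """Suggest a hard task only if an important event starts early tomorrow.

    The 09:00 cutoff and the choice of math_task are assumptions made
    for illustration only.
    """
    tomorrow = (now + timedelta(days=1)).date()
    for event in events:
        starts_early_tomorrow = (
            event.start.date() == tomorrow
            and event.start.time() <= early_cutoff
        )
        if event.important and starts_early_tomorrow:
            return "math_task"
    # Default to the easiest task to keep the overall burden low.
    return "normal_task"


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 22, 0)
    exam = CalendarEvent("Exam", datetime(2024, 1, 2, 8, 0), important=True)
    print(recommend_task([exam], now))
```

Defaulting to normal_task reflects the point made above that a hard task is worthwhile mainly when contextual motivation is high; a real design would need a more careful notion of event importance.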

Limitations
In the current study, we were not able to include user profiles for a detailed analysis (e.g., an individual's level of motivation for use or sensitivity to the alarm). Additionally, the detailed context of use was not available for analysis. For example, in the case of picture_task, the target picture may be just near the bed, or even outside of the house, which could greatly influence usage behavior. Despite such limitations, the large dataset allowed us to observe a generalized behavior that partly justifies the extreme outliers. For future work, it would be interesting to conduct a qualitative study with a smaller dataset to uncover the hidden motivations behind the various wake-up tasks.

Conclusions
This paper aimed to improve task-based mobile alarm app design by understanding the wake-up task uses in the Alarmy app. The results of this study reveal the prevalent use of the normal_task, the earlier and regular use of the picture_task, and the longest task completion time of the math_task. Further, our regression analysis results showed that hard-taskers experienced challenging situations, such as awakening at an earlier time, relatively less often and mostly showed irregular behaviors, such as a high number of snooze uses and an inconsistent number of alarms in a day, which possibly short-circuit habit formation.
In future work, it will be interesting to design wake-up tasks for mobile crowdsourcing [28,29]. Our study results showed the potential of wake-up tasks for crowdsourcing. For example, we found that there are many users who willingly and consistently conduct cognitive or physical tasks in order to get out of bed. Moreover, task completion tended to be longer (50-70 s) than that in prior work, such as utilizing smartphone unlock moments (1-3 s) [30,31].
Furthermore, we believe that our study can be a pioneering work in the HCI field of behavior change related to the state right after awakening and can encourage further studies. This study presents empirical cases under the principles of inconvenient interaction. HCI has mostly focused on increasing effectiveness in task performance, but, as with task-based alarms, users still decide to accept inconvenience in some domains. We believe that understanding such intentionally inconvenient experiences will expand the HCI research area.

Figure 1. Setting and dismissing an alarm in the Alarmy app.

Figure 2. The magnitude of task-wise mean differences on alarm set-usage: Cohen's d (col, row).

Table 2. Alarm set-usage comparison among the wake-up tasks (Kruskal-Wallis H test).

Table 3. Alarm dismiss-usage comparison among the wake-up tasks (Kruskal-Wallis H test).