1. Introduction
The relationship between humans and technology is best exemplified through wearable technology, which increases human potential through the provision of smart gadgets that can help to monitor and track body parameters. Wearable devices provide continuous monitoring of an individual’s health, activity, and fitness, which has significant potential for enhancing human activities and standard of living. There are sensors embedded in these devices that gather personal information and exchange this information through Wi-Fi, Bluetooth, cellular technology, etc. [
1]. Wearable technologies offer a wide range of functionalities for unobtrusive health monitoring, such as heart rate tracking, blood sugar tracking, blood pressure analysis, gait analysis, sleep score, as well as a variety of other health factors. These devices are the subject of extensive research [
2] and can measure physical activity, such as steps walked, calories burnt, or workout intensity, using a bracelet-like gadget worn on the wrist. Then, the data are transmitted to a mobile application, either wirelessly through Bluetooth synchronizing or through the connection of the device to a smartphone, where objectives, progress, and activity may be recorded [
3]. The most widely used wearable devices are wrist-worn smartwatches, which can receive and send notifications, messages, or calls. For example, the Apple Watch series can receive incoming calls and has a comparatively large screen, but is more expensive than other smartwatches. The mainstream wrist-worn devices include the Apple Watch, Samsung Gear S, Mi Band, Huawei Honor, Fitbit Surge, and Jawbone Up3, with many others available for purchase.
Currently, the major aspects of wearable devices are extensively researched, such as the dependability and accuracy of evaluation. Recently, researchers have raised concerns regarding the long-term usage of wearable devices, highlighting the importance of combining behavioral change approaches such as goal setting, feedback, and rewards with other evidence-based techniques [
4]. In addition, whilst wearable technology has a wider range of potential applications, users do face usability issues with these devices [
5]. Furthermore, studies have found that the acceptability of wearable items for regular consumers, as well as their comfort level, is critical to a product’s success [
6]. For these reasons, it is necessary to identify and address the usability issues of wearable devices because the success of these devices relies on users’ product experience. As user experience is one of the core aspects contributing to a product’s popularity, it is crucial that the device performs its functionality in an easy, comfortable, and intuitive way. Therefore, continuous usability testing is essential to observe users as they interact with a product or prototype and ensure a satisfactory user experience, contributing to the widespread and quick adoption of wearable devices.
Usability is a multifaceted concept that can be explored in various ways. The term “usability” has been used extensively for the past few decades, and various individuals interpret it differently. Some people associate usability with ease of use or convenience and examine it from the standpoint of an interactive interface; meanwhile, others refer to usability as a conceptual scale for the real-time evaluation of a product’s functionality obtained from feedback from potential users [
7]. Regardless of the testing paradigm, usability testing assesses a product’s ability to fulfill its intended functions. Food, consumer items, websites or web applications, computer interfaces, papers, and electronics are all examples of products that benefit from usability testing.
There are various factors in developing usability criteria. According to Nielsen, usability has five key characteristics: learnability, satisfaction, efficiency, a low error rate with quick error recovery, and memorability [8]. The International Organization for Standardization (ISO) describes the effectiveness and efficiency of, and satisfaction with, a product as usability parameters [9]. Shackel defined usability attributes as effectiveness, learnability, flexibility, and user attitude [10]. According to Hix and Hartson, the usability parameters are initial performance, long-term performance, learnability, retainability, advanced feature usage, first impressions, and long-term user satisfaction [
11]. Furtado defined the usability parameters as ease of use and learning [
12]. The parameters defined by ISO and Nielsen are commonly used for usability evaluation [
13]. There are several usability evaluation techniques. Any approach or technique used to evaluate or test usability in order to enhance the usability of an interactive system at any point of its development is known as a usability evaluation method (UEM). Several UEMs have been introduced, such as laboratory-based formative assessment with users, heuristic evaluation and other expert-based techniques, questionnaires, model-based analytic approaches, and the remote evaluation of interactive software after field deployment.
The factors that influence the adoption of smart wearable devices among potential customers are wearability, ease of use, compelling design, functionality, and price. Wearability covers the presence of pain, the degree of comfort, ease of putting the device on, whether support is needed to wear it, and willingness to wear it again. Ease of use means no interference with daily activities, comprehensibility, and learnability. Compelling design means that the device is visually appealing. Functionality refers to the core features. Although conventional usability evaluation methods such as thematic analysis, heuristic evaluation, interviews, think-aloud protocols, and cognitive walkthroughs can be used, the main shortcoming of these methods is that the findings in a laboratory can sometimes be difficult to illustrate [
14]. Alternatively, the Agency for Healthcare Research and Quality has suggested that questionnaires can be effectively used for usability evaluation [
15]. The most widely used questionnaires are the system usability scale (SUS), the unified theory of acceptance and use of technology (UTAUT), and the post-study system usability questionnaire (PSSUQ), which cover use, satisfaction, ease of use, etc.
The dependability and accuracy evaluation of wearable devices are currently the focus of extensive research. Concerns have been raised relating to the use of wearable devices, and researchers have indicated the importance of combining behavioral change approaches such as goal setting, feedback, and rewards alongside other evidence-based techniques [
15]. A mixed response to adopting wearable devices has also been observed, with the response of users being less positive than expected [
5]. In addition, whilst wearable technology represents a significant range of potential applications, usability issues still remain to be solved by research [
5]. Two key features have also been identified that are critical for a product’s success: the acceptability of the items to regular consumers and their comfort level [
6].
Therefore, the usability issues of wearable devices remain to be solved. Because the success of these devices depends on users’ experience, it is critical that a wearable device is easy and intuitive to use and comfortable to wear. As a result, continuous usability testing is essential for manufacturers to ensure the widespread and rapid adoption of their particular product. This testing observes users interacting with the product or prototype and is a key factor in ensuring a satisfactory user experience.
The key aim of this paper is to investigate the usability and other issues relating to existing commercially available smartwatches. Usability is analyzed using customized heuristic evaluations and the system usability scale (SUS). These evaluations provide a comprehensive, user-specific, and customized measure of usability, yielding quantitative and statistical data for benchmarking and further design improvements.
2. Literature Review
Smartwatches have received significant attention for their versatility, which satisfies a wide range of consumer interests, including fitness and health monitoring [
4]. Other than basic health and fitness features, smartwatches offer a wide range of features varying from person to person, such as real-time vital signs and overall health monitoring in senior people with Parkinson’s, heart disease, or other chronic conditions. In addition, they may capture both crucial and trivial data regarding patient location and behavior more quickly and precisely [
16]. According to recent IDC surveys on wristwatch usage, the industry will continue to expand rapidly, with 373 million units expected to be shipped in 2020, up from 100 million in 2016. In 2016, smartwatches made up 25% of all smart wearables. Although smartwatches have a wide range of functionalities, they still face technical and usability challenges, such as older people being less familiar with new technologies, and ease-of-use or comfort issues for patients [
17]. Usability testing needs to examine how consumers actually use their smartwatches rather than how the devices were intended to be used. The usability test aims to understand consumers’ requirements of these devices by examining usability concerns connected to the real tasks that users undertake with their smartwatches [
12].
Despite many studies exploring the usability, perceived value, and role of smartwatches in patient monitoring, only limited research has evaluated the relationship of usability and brand factors influencing usability using different usability evaluation methods. This section provides a summary of the fragmented research on wearable technology. Y. Wu et al. [
18] proposed a novel method for evaluating a smartwatch’s usability based on eye movement tracking. The eye tracker recorded the testers’ eye movements, and the eye movement data were added to the system for calculating the usability rating index. In the study, 10 participants were asked to perform specific tasks with Motorola 360 smartwatches and were interviewed afterwards. The results of the task test showed that eye movement data can accurately assess the icons on smartwatch interfaces and illustrate how users search for certain features. J. Chun et al. [
19] completed a study to identify challenges in smartwatch usability using in-depth interviews. Many users appeared to find mobile devices convenient for checking pushed alerts. However, smartwatches obtained a poor response when individuals tried to interact with visually rich material. According to the study, individuals like to use smartwatches to listen to music and check the weather. The study of N. Anggraini et al. [
20] observed how usability, brand, and price influenced customers’ impressions of smartwatches in Indonesia. In order to conduct the study, 116 Indonesian respondents were surveyed. The participants were less concerned with usability and instead focused on brands and pricing when making their purchases. Most of them did not consult evaluations and recommendations from others when they already had a brand in mind. However, one of the shortcomings was the limited number of participants involved, as with a greater number of participants, the results could be generalized to Indonesia’s smartwatch users.
M. Bang et al. [
21] designed a nurses’ watch app with the Motorola 360 smartwatch in order to automate patient monitoring and checklist systems. The usability of the IT systems, including complexity, training, and need for support for the user interface, was evaluated using the system usability scale (SUS). The mean SUS score was only average because the questions on comfort and security in the SUS questionnaire received very poor marks. Neuropsychiatric diseases are the primary cause of disability globally, yet current mental health monitoring methods rely on subjective DSM-5 specifications, and developments in EEG and video monitoring technology have not been extensively embraced because of inconvenience. Kamdar and Wu [
22] presented a novel platform, the Passive, Real-time Information for Sensing Mental Health (PRISM), which integrates heart rate, light, and motion data from a smartwatch application with text input from a web application. The SUS questionnaire was used to evaluate its usability, and a total of 13 healthy participants were asked to wear the Samsung Gear S smartwatch for usability evaluation. The SUS questionnaire demonstrated that participants had a positive attitude toward PRISM. The participants said that the system was simple to use, requiring little expertise or training; however, they lacked the motivation to use it regularly.
In the study of C.R. Laborde et al. [
23], the authors evaluated user satisfaction, usability, and compliance with the help of a real-time, online assessment and mobility monitoring (ROAMM) mobile app designed for smartwatches. In the study, 28 participants were asked to wear smartwatches and fill out a standardized questionnaire. The ROAMM wristwatch app received high marks from older people with knee osteoarthritis, indicating their satisfaction with the app. The condition of atrial fibrillation (AF) is difficult to diagnose since it often presents with mild symptoms. Smartwatches can be used for long-term, non-invasive monitoring, which could improve AF care. The main objective of the study of E.Y. Ding et al. [
24] was to evaluate the efficacy of arrhythmia discrimination using a wristwatch. A total of 40 participants were observed, and a questionnaire was completed evaluating several aspects of the device’s usability. A real-time algorithm was used to analyze the pulse recordings. The results showed that the majority of participants thought the smartwatch was very usable. The general level of comfort and the level of data privacy when wearing a wristwatch for rhythm monitoring were positively correlated with younger age and past cardioversion, respectively. The participants deemed the smartwatch to be extremely acceptable despite their age, lack of knowledge of smartwatches, and a significant load of comorbidities. Although smartwatches may be potential tools for atrial fibrillation identification, elderly stroke patients have not been given enough attention.
E. Dickson et al. [
25] introduced the pulse watch, a smartwatch-based AF detection system, and evaluated its precision, usability, and adherence in stroke patients. The participants filled out questionnaires to evaluate numerous psychosocial factors and health-related behaviors. The results demonstrated that older stroke patients found the scheme useful and would stick to the monitoring schedule. Smartwatches have also been demonstrated to accurately assess blood pressure in glaucoma patients; however, their usability evaluation has been neglected in previous research. For this purpose, S.B. Bhanvadia et al. [
26] conducted an experiment where adult participants received a wristwatch blood pressure monitor for indoor monitoring using the Omron BP monitor and an associated mobile app. Usability testing methodologies included the post-study system usability questionnaire (PSSUQ), which was used for assessing aspects of user satisfaction such as the overall system usefulness, information quality, and interface quality, and the system usability scale (SUS), which was used to assess the overall usability of smartwatches. Furthermore, usability on the basis of age, gender, and race was also evaluated. The usability evaluations demonstrated that the smartwatches had satisfactory usability ratings, although older age was linked to lower levels of perceived usability and user experience.
Table 1 shows the current state-of-the-art methods to evaluate the usability of smartwatches.
The interface of smartwatches is one of the main factors contributing to a satisfactory user experience. Everything visualized on the screen should be clear and self-explanatory so that users do not become confused. However, this is limited by its size. The real dilemma is to identify usability issues while improving the user experience for smartwatches. The buyers’ judgment of the level of satisfaction for a certain product might vary. Smartwatches are particularly adapted to record the variances in mood, exhaustion, sleep quality, etc., remotely and conveniently as experienced by users. These are the factors that influence the usability of smartwatches, and a single heuristic method alone may be insufficient when evaluating the usability of smart devices, which also require more context-specific heuristics. However, as
Table 1 shows, researchers have infrequently used heuristics for the usability evaluation. In addition, each study used only one evaluation method; conducting usability evaluations with more than one method may reveal more smartwatch usability issues. Therefore, in this study, we sought to evaluate the usability of four different smartwatches: the Samsung Galaxy Watch 5, Samsung Galaxy Watch 4, Fitbit Charge 5, and Fitbit Versa 2. Our main contributions are the following:
5. Results and Discussion
In this section, the results from the questionnaire and findings are discussed. The usability evaluations were performed on both wearable devices and apps, using both the heuristic evaluation and SUS. This section also provides a comparison between the results of this research with the current research. We conducted the heuristic evaluation using 10 evaluators who used the watches for 10 days and completed the questionnaires, adding, if necessary, brief comments or explanations about problems and their solutions. In the group of evaluators, six were females and the rest were males. Three of the users were Samsung watch users, and seven were Fitbit users. In our study, the Fitbit and Samsung Watch users were randomly selected as evaluators. Therefore, the distribution of Samsung Watch users and Fitbit users was coincidental. For the SUS survey, 20 users used the watches for 30 days and completed the SUS questionnaire. Of the users, 15 were male and 5 were female. None of them had any prior experience with the Samsung Galaxy Watch 4, Galaxy Watch 5, Fitbit Charge 5, or Fitbit Versa 2.
5.1. Evaluation Method 1: Heuristic Evaluation Results
The HCI experts were given the task of installing the application on mobile phones, connecting it to the watches, and analyzing them for usability issues. A total of 20 usability principles were evaluated by each of the 10 experts independently. We compiled the usability issues encountered by each usability expert to create this heuristic evaluation report and present it in
Table 5 and
Table 6. These tables show the number of criteria violated per severity rating for each heuristic listed. A total of 307 usability issues were identified across all four watches, of which 109 (35.5%) were discovered in the Fitbit Charge 5, with 22 minor, 19 major, and 15 disastrous problems. However, only 46 (14.9%) usability issues were reported for the Samsung Galaxy Watch 5. Out of 307, 66 issues (21.4%) were reported in the Galaxy Watch 4, and the remaining 86 (28%) were reported in the Fitbit Versa 2. Most of the problems were located in H8 “Ease of Use”, H9 “Flexibility and Efficiency of Use”, and H10 “Features”.
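The per-watch shares quoted above can be reproduced from the reported counts. The following is a minimal sketch, not part of the study's tooling; the function name `share_truncated` is hypothetical, and it assumes the percentages were truncated (not rounded) to one decimal place, since rounding would give 15.0% and 21.5% for the two Galaxy watches.

```python
import math

# Issue counts per watch as reported in the text.
issues = {
    "Fitbit Charge 5": 109,
    "Fitbit Versa 2": 86,
    "Galaxy Watch 4": 66,
    "Galaxy Watch 5": 46,
}

total = sum(issues.values())  # 307 issues across all four watches


def share_truncated(count, total):
    """Percentage share of `count` in `total`, truncated to one decimal place.

    Truncation is an assumption made here to match the figures in the text.
    """
    return math.floor(count / total * 1000) / 10


shares = {watch: share_truncated(n, total) for watch, n in issues.items()}

for watch, pct in shares.items():
    print(f"{watch}: {issues[watch]} issues ({pct}%)")
```

The four counts sum to the reported total of 307, which serves as a quick consistency check on the figures.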
The Galaxy Watch 4 demonstrates huge advances in its hardware and software due to its Exynos W920 architecture, Samsung’s first 5 nm wearable processor [
61]. There are two Cortex-A55 cores on the Exynos W920, along with a Mali-G68 MP2 GPU, designed to deliver 20 percent faster CPU performance and 10 times faster GPU performance than the Exynos 9110. In addition to higher power efficiency, the 5 nm processor also results in longer smartwatch battery life. A dedicated low-power processor on the Exynos W920 handles always-on displays and other tasks while consuming very little power [
62]. This architecture also manages heart rate and notifications in the background more easily. The heuristic evaluation found its UI design to be better than that of the other watches. The Galaxy Watch 4’s UI design combines the look and feel of Samsung’s Tizen platform and Wear OS [
63]. The watch has a virtual rotating bezel, two buttons on the right-hand side, and a colorful and intuitive interface that allows for easy navigation and customization. Watch 4 has a redesigned notification system that allows more actions and interactions, including a more consistent and seamless experience across Samsung devices, such as syncing settings, installing apps, and transferring data. The customizable watch face editor allows you to choose from various customization options, colors, and styles. The simplified settings menu is easier to navigate and adjust [
64], again allowing user customization, including gesture features that are so sensitive that even moving the wrist can swipe up or down the layout on the watch.
A few evaluators commented that its haptic bezel is not very smooth. One other design feature is that the excess strap is tucked underneath the wrist strap; therefore, there is no extra flap that can bother the user, dangle, or become lost. This is a good design approach that increases wearability. The Samsung Galaxy Watch 5 (GW5) has the same chipset, RAM, and storage as the GW4. The main difference is its battery size. The watch has a 284 milliampere-hour (mAh) battery, which offers 40 h of battery life. However, the evaluators noted that a fully charged battery lasted around 32 h. It has a touch-sensitive bezel for easy navigation, as in the GW4. Another difference is the Bluetooth version, which is upgraded to 5.2. The GW4 uses a C-type cable to charge and provides a 45% charge in 30 min. According to the evaluators, it also has sapphire glass upgraded from Gorilla Glass, which provides a more solid and expensive feel.
Fitbit Versa 2 weighs only 40 g, which makes it comfortable to wear; however, the Samsung watches weigh less. Versa 2 provides a voice input feature in the watch to set timers and reminders using Alexa. The evaluators observed that its voice recognition was more than 95% effective. They found that completing tasks with it was quite easy and fast, and that the system information was understandable, as shown in
Table 5. For example, the Fitbit obtained the highest score for the heuristic relating to flexibility and efficiency of use. On the other hand, the Fitbit Charge 5 is a discreet fitness tracker band that is simple to use and has a bright display, but at the expense of battery life. The Fitbit is also easily wearable in daily life as it is very light, has auto activity detection, works accurately, and uses an electrodermal activity (EDA) sensor instead of a button, making it more aesthetically pleasing.
Table 5 clearly shows that the fewest usability issues were found in the Galaxy Watch 5. A report on the heuristic evaluation was produced based on the usability issues encountered by each usability expert in
Table 5 and
Table 6. Out of 307 usability issues, 109 (35.5%) were found in the Fitbit Charge 5, with 22 minor, 19 major, and 15 disastrous problems. However, only 46 (14.9%) usability issues were reported for the Samsung Galaxy Watch 5, 66 (21.4%) were reported for the Galaxy Watch 4, and the remaining 86 (28%) were reported for the Fitbit Versa 2.
The evaluators mostly rated self-descriptiveness problems as cosmetic issues, indicating that these did not require immediate attention for these watches. Most problems were located in H8 “Ease of Use”, H9 “Flexibility and Efficiency of Use”, and H10 “Features”. The usability problems are explained above. Smartwatches from Samsung and Fitbit require user consent before storing any personal data. Samsung states that it seeks user consent before collecting personal data through its devices, including smartwatches. As part of the set-up process for Samsung or Fitbit smartwatches, the user is required to agree to the terms and conditions, which include information about data collection and usage. The brands collect data such as the user’s name, email, location, health metrics, payment information, etc. These brands also state that they ask for consent before collecting or sharing user data [65]. In addition, the evaluators reported that they could manage app permissions to control what data they wanted to provide. The evaluators also provided comments regarding usability and design issues and the better features of the smartwatches; we categorized those comments as positive or negative and present them in
Table 7 and
Table 8.
Table 7 presents the comments on Samsung watches and
Table 8 presents those for Fitbit watches.
Negative comments relate to the usability problems, and we observed that the Samsung watches need improvement in quick and efficient synchronization with the mobile applications.
Table 7 shows that the GW5 received more positive comments from the evaluators than the GW4, although some improvements are still required, namely in the accuracy of the activity tracking features in the GW5, because four of the evaluators were dissatisfied with the results. We also observed that the Charge 5 has design issues and lacks voice input, speakers, and a voice assistant; however, the Versa 2 has a better battery life and a voice assistant that makes setting reminders and alarms easier. From the results in
Table 6 and the comments from
Table 8, we concluded that Fitbit Versa 2 has fewer usability problems than Fitbit Charge 5 according to the evaluators. However, the Galaxy Watch 5 still has the lowest number of usability problems compared to the other three watches. We present these usability problems in
Table 9 with their severity rating value.
5.2. Evaluation Method 2: SUS Evaluation Results
An SUS score above 68 is considered above average, and a score below 68 is considered below average; thus, 68 is the average score, making it the 50th percentile [
55]. An SUS score greater than 80 is considered excellent, while a score below 51 is considered to indicate poor design in terms of efficiency, effectiveness, and overall ease of use. Once the 20 participants had filled out the SUS survey for each watch, we calculated the mean response to each item number (#) for each watch, as shown in
Table 10. GW5 obtained the highest SUS score of 87.375 among all four watches, and was considered as possessing excellent usability.
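For readers unfamiliar with SUS scoring, the sketch below illustrates the standard calculation: each of the ten items is answered on a 1 to 5 scale, odd-numbered (positively worded) items contribute the response minus 1, even-numbered (negatively worded) items contribute 5 minus the response, and the sum is multiplied by 2.5 to give a score from 0 to 100. The function names and the example responses are hypothetical and are not the study's data; the interpretation bands simply restate the thresholds quoted above (80, 68, and 51).

```python
def sus_score(responses):
    """Return the 0-100 SUS score for one participant's ten answers (1-5 each)."""
    if len(responses) != 10 or any(r < 1 or r > 5 for r in responses):
        raise ValueError("SUS needs ten responses on a 1-5 scale")
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items are negatively worded.
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5


def mean_sus(all_responses):
    """Average SUS score over several participants."""
    scores = [sus_score(r) for r in all_responses]
    return sum(scores) / len(scores)


def interpret(score):
    """Map a score to the bands quoted in the text (thresholds 80, 68, 51)."""
    if score > 80:
        return "excellent"
    if score > 68:
        return "above average"
    if score == 68:
        return "average"
    if score >= 51:
        return "below average"
    return "poor"


# Hypothetical answers from one participant, for illustration only.
example = [5, 1, 5, 2, 4, 1, 5, 1, 4, 2]
print(sus_score(example), interpret(sus_score(example)))
```

Under these bands, the reported GW5 score of 87.375 falls in the "excellent" range, consistent with the text.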
The battery life of these devices is one of the most significant obstacles to their acceptance and usability in consumer markets [
66]. With a battery life of up to 50 h, the GW5 is the most useful in daily life, even though it is almost identical to GW4. In addition to improving battery life, the GW5 has some features that make it more appealing, such as an aluminum metal frame with a sapphire crystal display. However, GW5 is much more expensive than GW4. Moreover, the SUS score in
Table 10 shows that the Fitbit Versa 2 has a greater usability score than the Fitbit Charge 5. In the Samsung Galaxy range, the Galaxy Watch 5 is better. From these results, we also found that the most recently launched watch, the GW5, had a higher usability value than the other Samsung watch (GW4). However, in the Fitbit series, the Fitbit Versa 2 was more efficient than the Fitbit Charge 5. Compared to the Fitbit watches, the Galaxy watches have higher values for the odd-numbered questions, which indicates that the users mostly agreed that the system is easy to use and efficient, and lower values for the even-numbered questions, which indicates that they did not find the system unnecessarily complex or inefficient.
From these evaluation results, we concluded that the customized heuristics and the SUS both favored the Galaxy Watch 5: the evaluators identified fewer usability issues for it, and it received more positive comments in terms of usability than the other three watches. Hence, the agreement between the two methods supports the validity of the customized heuristics for the usability evaluation of watches. In this work, we overcame the limitations of previous research with a combination of different evaluation methods and the customization of 11 heuristics for usability evaluation.
We performed a comparative analysis of our findings with previous research to determine whether our results generalize to other studies. The Galaxy Watch 4’s haptic bezel was identified as not being very smooth, which sometimes made navigation difficult. As a result, the user lost control and freedom, and the system became intolerable. The findings were similar to those found in other studies [
67], where users found it difficult to navigate between pages when applications lacked backward and forward navigation buttons. As a result, this aspect of the design made it difficult to navigate a smartwatch. Fitbit Versa 2 does not pair automatically after resetting, thus violating the heuristic of ease of use. In addition, Fitbit Charge 5 does not support voice assistants because it does not have a microphone and speaker, making alarm settings difficult and violating the heuristic of ease of use. In a previous study [
68], it was shown that ease of use improves the quality of the user’s experience with a system.
The results of the SUS survey for smartwatches showed that the users found them easy to learn and use, resulting in improved performance. This aligns with previous studies [
69], which also found the systems easy to learn. The Watch 5 was found to be particularly effective for everyday use due to its long battery life, customization options, and wearability, as also reported by the authors of [
70]. Overall, the usability results were positive for the effectiveness of the watch applications. However, issues were reported after updates, which confirmed the findings of earlier studies by the authors of [
69,
71] that the applications were efficient for users. Additionally, the participants reported that they would easily remember how to use the watches in the future, a finding supported by the authors of [
72,
73]. The users also encountered fewer errors while using the watch applications, a result consistent with earlier studies by [
71,
72,
74]. The users were also satisfied with the features, functionalities, design, information, and display quality of the Watch 5 among all four watches, as reported in studies by [
69,
74,
75]. The Galaxy watches were also found to be easy to use, leading to user satisfaction, as reported in studies by [
75,
76]. The new haptic bezel feature of the Galaxy watch was highly rated for its aesthetic appeal, as also observed in [
76]. The Fitbit Charge 5 provided advanced fitness tracking features, as reported in [
75], which found the application useful in achieving fitness-related goals.