Article

HEUXIVA: A Set of Heuristics for Evaluating User eXperience with Voice Assistants

1 Escuela de Ingeniería Informática, Pontificia Universidad Católica de Valparaíso, Valparaíso 2340000, Chile
2 Departamento de Electrotecnia e Informática, Universidad Técnica Federico Santa María, Viña del Mar 2520000, Chile
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(20), 11178; https://doi.org/10.3390/app152011178
Submission received: 15 September 2025 / Revised: 12 October 2025 / Accepted: 16 October 2025 / Published: 18 October 2025
(This article belongs to the Special Issue Emerging Technologies in Innovative Human–Computer Interactions)

Abstract

Voice assistants have become increasingly common in everyday devices such as smartphones and smart speakers. Improving their user experience (UX) is crucial to ensuring usability, acceptance, and long-term effectiveness. Heuristic evaluation is a widely used UX evaluation method because it detects problems quickly and at low cost. Nonetheless, existing usability/UX heuristics were not designed to address the specific challenges of voice-based interaction, which relies on spoken dialog and auditory feedback. To overcome this limitation, we developed HEUXIVA, a set of 13 heuristics specifically designed for evaluating UX with voice assistants. The set was created through a structured methodology and refined over two iterations. We validated HEUXIVA through heuristic evaluations, expert judgment, and user testing. The results offer preliminary but consistent evidence of HEUXIVA's effectiveness in identifying UX issues specific to the Google Nest Mini voice assistant. Experts described the heuristics as clear, practical, and easy to use, and highlighted their usefulness in evaluating interaction features and supporting the overall UX evaluation process. HEUXIVA therefore provides designers, researchers, and practitioners with a specialized tool to improve the quality of voice assistant interfaces and enhance user satisfaction.

1. Introduction

As voice assistants become more common in smartphones, smart speakers, and other devices, understanding and improving user experience (UX) can significantly impact their use and effectiveness. UX evaluation can detect usability issues, identify user needs and preferences, and guide the design of more intuitive and effective voice interfaces. Heuristic evaluation is a widely used method for assessing UX with interfaces. It involves a group of UX/usability experts examining a system and evaluating whether its interface adheres to a set of principles known as heuristics [1]. This method is useful and effective for evaluating UX with voice assistants, as it allows for the quick and economical identification of usability problems. Nevertheless, it is crucial to have a set of heuristics that can detect specific issues related to these types of intelligent assistants, such as issues related to voice interaction, clarity and naturalness of the assistant’s responses, ease of use of voice commands, and the assistant’s ability to understand and effectively respond to user requests.
However, traditional UX/usability heuristics may not directly apply to voice interfaces, which rely on spoken language and audio feedback. Developing specialized heuristics adapted to the specific features and limitations of voice interaction can therefore provide designers and researchers with a more accurate and comprehensive evaluation. For this reason, we propose HEUXIVA (Heuristics for Evaluating User eXperience with Voice Assistants), a set of 13 heuristics whose acronym derives from the key concepts that define its purpose: H for Heuristics, E for Evaluating, U for User, X for eXperience, V for Voice, and A for Assistants.
We used the methodology proposed by Quiñones et al. [2] to develop HEUXIVA. The proposed set was validated in two iterations through heuristic evaluation, user testing, and expert judgment using Google Nest Mini. This allowed us to refine the set and verify its effectiveness in detecting usability/UX problems related to voice assistants. Although preliminary, the results indicate that HEUXIVA is a useful and reliable instrument for evaluating the user experience of voice assistants and that the research is progressing in a promising direction.
The article is organized as follows: Section 2 presents the background; Section 3 presents the related work; Section 4 details the methodology applied to create HEUXIVA and explains the activities performed to develop, validate, and refine the set; Section 5 describes the validation process and its results; Section 6 presents HEUXIVA: a set of heuristics for evaluating user experience with voice assistants; Section 7 details the discussions; Section 8 presents the limitations of the study; and Section 9 discusses the conclusions and future work.

2. Background

The concepts of voice assistants, user experience, and user experience evaluation are presented below; related work is discussed in Section 3.

2.1. Voice Assistants

Virtual assistants, including voice assistants and chatbots, are distinguished by their modes of interaction: voice and text, respectively [3]. These devices leverage artificial intelligence to perform various tasks based on user commands, such as sending emails, making phone calls, providing personalized recommendations, and controlling appliances [4,5,6]. In the literature, virtual assistants are referred to by several names, including Intelligent Personal Assistant (IPA) [4], Conversational Agents [7], and Virtual Personal Assistants (VPA) [8]. For the purposes of this study, the term IPA is used to denote these devices [9].
The characteristics of voice assistants vary depending on the device. Based on the literature review [9,10,11,12,13,14,15,16,17,18], we identified the following main features that describe these intelligent assistants:
  • Effective Communication: The interaction between the user and the voice assistant is bidirectional, involving a continuous exchange of information and roles (sender and receiver) [13].
  • Effective: Requests and responses are not restricted to a single topic and are coherent with the user’s environment [13,14].
  • Activity Management: The voice assistant enables management actions, such as scheduling appointments, alarms, calls, sending messages, translations, among other tasks [14,18].
  • Customizable: The device can be adapted according to the user’s preferences or needs, whether it be in language, voice of the assistant, voice commands for switches or smart plugs, news preferences, routines, among others [15,16].
  • Multi-user: The device can recognize the voice of other people, making it usable by everyone present in the same location [17].
Additionally, we conducted a formal inspection to detect usability/UX issues with voice assistants (see Section 4.1). The results of this evaluation can be seen in Appendix D. As a result of this evaluation and considering the findings presented by [9,19,20,21], we consider the following features relevant when evaluating voice assistants:
  • Security and Privacy: The device has a privacy policy that specifies what data it collects, why it collects it, and how the user can update, manage, export, and delete it.
  • Multi-linkable: The device allows linking/integrating other devices and external services/apps and controlling their use, such as smart home devices (controlling lights, appliances, temperature, and air conditioning, etc.), music services, among others.
  • Culturizable/Adaptable: The device recognizes/generates expressions and sets of words that cannot be deduced from the meanings of the words forming them, all according to the geographical location of the user.
  • Voice Interface: The device provides the corresponding information through the voice interface.
  • Guidance and Assistance: The device guides and assists the user with problems related to its use and installation/configuration.

2.2. User Experience

According to ISO 9241-210, the user experience (UX) is defined as “a person’s perceptions and responses that result from the use and/or anticipated use of a product, system or service” [22]. Several authors have proposed various attributes or factors to describe UX, such as Park et al. [23], Lykke et al. [24], and Morville [25]. To develop HEUXIVA, we selected the factors proposed by Morville [25], as they are concise and easy to apply when evaluating these devices:
  • Useful: The product must be useful and satisfy a user’s need.
  • Usable: The system must be easy to use and quick to learn.
  • Desirable: The design elements should be attractive and interesting to the user to cause appreciation and emotion.
  • Findable/locatable: Information should be navigable and easy to find within and outside a system.
  • Credible: The image of the company or system must be trustworthy.
  • Valuable: The system must provide added value and contribute to satisfying the user’s needs.
  • Accessible: The system must be able to adapt to users with some type of disability.

2.3. User Experience Evaluation

UX evaluation methods provide information about a system interface and its users through several techniques, aiming to identify how users feel when interacting with software, devices, or applications. There are different evaluation methods, which can be classified into two main approaches: formative and summative [26,27]. Formative evaluations are conducted to improve the design of an interface, identify usability/UX problems, and detect elements that need improvement. Both qualitative and quantitative methods are used, such as usability inspections, usability tests, and heuristic evaluations [26,27]. On the other hand, summative evaluations are performed to measure whether a system meets expectations using quantitative and qualitative data, such as satisfaction surveys, performance metrics, and A/B tests [26,27]. To develop HEUXIVA, we used the following UX evaluation methods (throughout the stages of the applied methodology):
  • Heuristic evaluation: A method in which a group of 3 to 5 evaluators analyze an interface, identifying positive and negative aspects according to a set of rules called heuristics [1].
  • User testing (thinking aloud): A method involving representative users who navigate and interact with a system while performing predefined tasks, verbalizing their thoughts and actions aloud [28].
  • Expert judgment: A method where UX experts and/or specialists apply their knowledge to review a system’s interface. They navigate the platform and identify elements that could improve or negatively affect the user experience [29,30].
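To make the outputs of these methods concrete, the sketch below (in Python, with invented field names) shows one way a single heuristic evaluation finding could be recorded. A common convention, consistent with the severity and frequency criteria described later in Section 5.1, is to compute criticality as severity plus frequency; this is an illustrative assumption, not the paper's exact formula.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    """One problem recorded during a heuristic evaluation (illustrative)."""
    problem: str       # short description of the usability/UX problem
    heuristic_id: str  # heuristic the problem is associated with, e.g. "HVA1"
    severity: int      # 0 (not a problem) .. 4 (catastrophic)
    frequency: int     # 0 (rare) .. 4 (very frequent)

    @property
    def criticality(self) -> int:
        # Common convention: criticality = severity + frequency.
        return self.severity + self.frequency

# Example record based on a problem mentioned in the paper's inspection.
f = Finding("Device ignores user", "HVA1", severity=4, frequency=2)
print(f.criticality)  # 6
```

Evaluators working independently would each produce a list of such records, which are then consolidated and scored, as done in the validations reported in Section 5.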

3. Related Work

Langevin et al. [10] developed a set of usability heuristics to guide and evaluate the design of conversational agents, including both chatbots and voice assistants. They adapted Nielsen’s heuristics [31] and worked with experts to establish them. As a result, the authors proposed 11 heuristics, among which “Context Prevention” and “Reliability” stand out as domain specific. Although this set includes voice assistants, it does not focus particularly on them, as it is primarily oriented toward evaluating chatbots.
On the other hand, Sanchez-Adame et al. [11] created a set of 5 heuristics for chatbots. These devices, like voice assistants, use natural language processing to interact with users. The proposed heuristics are useful for evaluating chatbots but do not assess voice assistants. Moreover, they focus mainly on evaluating usability (efficiency, effectiveness, satisfaction) and not UX components.
Zwakman et al. [12] conducted research to verify whether the System Usability Scale (SUS) [32] remains suitable for assessing usability aspects and end-user experiences. SUS was created to evaluate systems with a graphical interface (GUI), and there are several differences between a voice-based environment and a GUI environment. The results indicated that SUS might not be the best measure to evaluate the usability of smart assistants, as voice systems are conversational and closer to human interaction. Additionally, SUS does not consider features unique to a voice environment, such as sound quality/clarity, ease of understanding, level of immersion, and speech monotony that may cause boredom. Based on the above, the authors adapted the 10 SUS questions to evaluate voice assistants (VUS, [12,33]).
On the other hand, Cowan et al. [9] identified various problems related to the experiences of occasional users of Intelligent Personal Assistants (IPAs). Their research highlighted the following six main themes: (1) Challenges in supporting hands-free interaction; (2) Performance issues related to user accent and speech recognition overall; (3) Difficulties in integrating with third-party applications, platforms, and systems; (4) Social embarrassment as a barrier to using these devices in public settings; (5) The anthropomorphic nature of IPAs; and (6) Concerns regarding trust, data privacy, transparency, and ownership.
While there are interesting studies and sets of heuristics for evaluating aspects of smart assistants, none of them are directly focused on evaluating the UX with voice assistants. This reaffirms the need to create unique and specific heuristics for this domain that can effectively detect usability/UX problems, as these issues negatively affect users’ perception and interaction with these devices. Voice assistants have made the use of artificial intelligence more accessible, as users have been able to interact more intuitively with these devices. Given that HEUXIVA is a specific set for evaluating voice assistants, it will allow for establishing guidelines for the development and implementation of these technologies, making the interaction increasingly user-friendly.

4. Material and Methods

4.1. Methodology Applied for Developing HEUXIVA

The methodology proposed by Quiñones et al. [2] was used to develop the proposed set, since it establishes a methodical, iterative, and effective work plan using qualitative and quantitative methods. HEUXIVA was established and validated in two iterations through heuristic evaluations, expert judgments, and user testing (see Figure 1. Iterations are marked as “It. N”). In iteration 1, the eight stages of the methodology were conducted, and the first set of heuristics (HVA) was developed and validated. In iteration 2, the last three stages of the methodology were carried out to specify the final set of heuristics (HEUXIVA). Details of each iteration, including inputs, outputs, and activities performed, can be reviewed in Appendix A (iteration 1) and Appendix B (iteration 2). Appendix C presents the set of heuristics for voice assistants created in each iteration. Each set is abbreviated differently for each iteration: HVA (first version, iteration 1), HEUXIVA (second and final version, iteration 2). The iterations conducted are explained below.

4.2. First Iteration: Development Process for HVA

In “Step 1: Exploratory Stage”, a literature review was conducted to obtain information on voice assistants and their features (see Section 2.1), user experience attributes (see Section 2.2), and existing heuristic sets and related elements (see Section 2.3). The literature review considered the digital databases Scopus, ScienceDirect, and Google Scholar. From the results obtained, the most relevant documents on topics related to the area were analyzed. Although there are several studies related to intelligent assistants, none establishes a specific set of heuristics for voice assistants. For this reason, we selected studies related to UX in the interaction with these devices, as well as studies based on heuristics [10,11] and/or questionnaires for virtual and/or voice assistants [12,33].
In “Step 2: Experimental Stage”, a formal inspection of a Google Assistant device was performed. The objective was to familiarize the research team with the device and detect problems that may negatively affect the user experience. Two researchers from the team used the device for 11 days: the first days served to get to know and adapt to the device, while in the following days the device was used fluently with the objective of finding possible usability/UX problems. A total of 13 problems were detected. Scores were assigned to the detected problems on a 4-level severity scale, as used in heuristic evaluations, where severity is understood as the level of seriousness of the identified problem [34] (for more details about the problems detected, see Appendix D). Most of the identified problems were rated 2 or lower; even so, two stand out as catastrophic problems (with a score of 4): “P1: Device ignores user” and “P2: Difficulty initializing device”. The first can undermine the user’s confidence in the correct functioning of the device and its value; the second causes frustration, and the desire or need to use the device is lost.
In “Step 3: Descriptive Stage”, the relevant information obtained from the previous stages was selected. For this, we prioritized the information collected on a 3-level scale (1: slightly relevant information, 3: very important information) [2] and then organized it into 5 categories: information about voice assistants, features of voice assistants, usability/UX attributes, sets of existing heuristics and other related elements, and usability problems of voice assistants (detected in the experimental stage through formal inspection). Finally, according to the prioritization, the following information was selected to develop the heuristics (details of the selected information can be seen in Appendix E):
  • 5 types of information of voice assistants: 3 definitions of voice assistant [4,7,8], need to create a UX evaluation method for voice assistants, taxonomy of voice assistants.
  • 10 voice assistants’ features: effective communication, effective, activity management, customizable, multi-user, security and privacy, multi-linkable, culturizable/adaptable [9], voice interface, guidance and assistance.
  • 3 usability attributes: effectiveness, efficiency, and satisfaction [35].
  • 7 UX attributes: useful, desirable, usable, findable/locatable, credible, valuable, and learning capacity [25,36].
  • 5 sets of existing heuristics or related elements: Nielsen [31], Langevin et al. [10], Sanchez-Adame et al. [11], Zwakman et al. [12,33] (VUS), Nowacki and Gordeeva [37].
  • Voice Assistant Usability Issues: formal inspection made in Stage 2, Infrequent Users’ Experiences of Intelligent Personal Assistants by Cowan [9].
While usability is conceptually part of UX, in this study we considered its attributes (effectiveness, efficiency, and satisfaction [35]) as a distinct group of evaluative dimensions. This decision was made to emphasize their relevance in the design and validation of the proposed heuristics. Separating these dimensions allowed us to ensure a balanced coverage of both functional quality (usability) and experiential quality (UX). This distinction also facilitated the mapping of heuristics to specific measurable criteria during the development process.
In “Step 4: Correlational Stage”, we correlated voice assistant features, usability/UX attributes, heuristic sets, associated usability issues (detected in the experimental stage), and items from the Voice Usability Seal (VUS) questionnaire. During this process, we found that no existing heuristic fully covers the usability/UX attributes related to the features; they are only partially covered. However, several heuristics and pieces of information allowed the new set to be generated (this correlation process can be seen in Appendix F).
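The correlational stage can be pictured as building a cross-reference table between features, attributes, and existing heuristics, where empty cells reveal coverage gaps. The sketch below uses invented entries purely for illustration (the actual matrix is in Appendix F).

```python
# Hypothetical fragment of the correlational matrix: each voice assistant
# feature is linked to the UX attributes it touches and to the existing
# heuristics that (partially) cover it. Entries are invented examples.
correlation = {
    "Voice Interface": {
        "attributes": {"usable", "accessible"},
        "existing_heuristics": {"Nielsen N1", "Langevin L2"},
    },
    "Security and Privacy": {
        "attributes": {"credible"},
        "existing_heuristics": set(),  # nothing covers it -> coverage gap
    },
}

# Features with no associated existing heuristic need a new heuristic.
gaps = [f for f, row in correlation.items() if not row["existing_heuristics"]]
print(gaps)  # ['Security and Privacy']
```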
In “Step 5: Selection Stage”, the existing heuristics were selected based on the information collected in Stage 4. For each heuristic, one of the following actions was determined: adapt, keep, or eliminate. One heuristic was kept and 39 were adapted; of these 40 heuristics, 12 were identified as useful, 11 as important, and 17 as critical. A total of 11 heuristics were eliminated because they were not related to voice assistants or were already covered by another selected heuristic. Even though no existing heuristics fully cover the characteristics of voice assistants, combining several proposals and adding more complete specifications helped to generate the new set (details on the selection process can be seen in Appendix G).
In “Step 6: Specification Stage”, the preliminary set of heuristics for voice assistants was proposed (see Appendix C). This version contains 12 heuristics, which were documented using the template proposed in the methodology [2], including the following information: ID, name, definition, covered voice assistant feature, covered usability/UX attribute, and related heuristics. In “Step 7: Validation Stage”, the first version of the heuristics (HVA) was validated through heuristic evaluation and expert judgment; details of these validations can be found in Section 5.1 and Section 5.2, respectively. In “Step 8: Refinement Stage”, the heuristics were refined based on the feedback obtained in Stage 7 as follows (for more details on the refinement, see Appendix H):
  • 12 heuristics need to be refined mainly by improving their checklists and definition.
  • A new heuristic needs to be added to cover the user response aspects.
  • It was decided to carry out a second iteration repeating the last three steps of the methodology.

4.3. Second Iteration: Development Process for HEUXIVA

In the second iteration, the last three stages of the methodology were performed again, starting from “Step 6: Specification Stage”. In this step, a second version of the set of heuristics that evaluate voice assistants (HEUXIVA) was proposed. The refinement made in the first iteration was considered along with the information matched in “Step 4: Correlational Stage” in the first iteration (see Appendix C).
In “Step 7: Validation Stage”, the second version of the set of heuristics was validated through expert judgment and user testing. Details of the validations performed can be reviewed in Section 5.3 and Section 5.4, respectively. Finally, in “Step 8: Refinement Stage”, 10 heuristics were modified based on the expert judgment and user testing performed in Stage 7. The final version of HEUXIVA is presented in Section 6.

5. Results

5.1. Results Obtained in the Iteration 1: Validation Through Heuristic Evaluation

We conducted a heuristic evaluation to validate the effectiveness of the first version of the proposed heuristics (HVA). For this purpose, we defined a control group, composed of 3 evaluators who used the existing set of heuristics for “conversational agents” (CAH) [10]; and an experimental group, composed of 3 evaluators who used the new set of proposed heuristics in its first version: HVA. Both groups conducted the evaluation using Google’s voice assistant Nest Mini [38]. Each group was composed of evaluators with a similar level of experience in interacting with voice assistants and performing heuristic evaluations. All were computer engineers with formal training in UX research and professional experience ranging from three to five years in the field of UX. Their ages ranged from 30 to 35 years, and the group consisted of four men and two women.
To evaluate the effectiveness of HVA, we used the criteria defined in the methodology applied [2] (the explanation of the formulas and the calculation of each criterion applied to evaluate effectiveness can be found in Appendix I). The results obtained by the control and experimental groups were compared in terms of:
  • Numbers of correct and incorrect associations of problems to heuristics
  • Number of usability/UX problems identified
  • Number of specific usability/UX problems identified
  • Number of identified usability/UX problems that qualify as more severe (how catastrophic the usability/UX problem detected is)
  • Number of identified usability/UX problems that qualify as more critical (how severe and frequent the problem detected is)
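The five criteria above can be tabulated mechanically from each group's list of findings. The sketch below is a simplified illustration with invented findings and thresholds; the actual formulas used in the comparison are given in Appendix I.

```python
def summarize(findings, severe_at=3, critical_at=6):
    """Tabulate the five comparison criteria for one group's findings.

    Thresholds are illustrative assumptions: a problem counts as severe
    when severity >= severe_at, and as critical when severity + frequency
    >= critical_at.
    """
    total = len(findings)
    correct = sum(1 for f in findings if f["correct_association"])
    return {
        "correct_assoc_pct": 100.0 * correct / total,
        "problems": total,
        "specific": sum(1 for f in findings if f["specific_to_voice"]),
        "severe": sum(1 for f in findings if f["severity"] >= severe_at),
        "critical": sum(1 for f in findings
                        if f["severity"] + f["frequency"] >= critical_at),
    }

# Invented findings for one group, purely for illustration.
hva = [
    {"correct_association": True, "specific_to_voice": True,
     "severity": 3, "frequency": 2},
    {"correct_association": False, "specific_to_voice": True,
     "severity": 2, "frequency": 1},
]
print(summarize(hva))
```

Running the same function over the control and experimental groups' findings yields directly comparable figures, which is how results like those in Table 1 can be contrasted.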
Table 1 shows the results obtained in the heuristic evaluations performed by the experimental and control groups. In addition, the effectiveness of HVA in terms of the five criteria is shown. As shown in Table 1, HVA performed better than CAH on two of the five criteria. HVA detected more usability/UX problems than CAH and detected more specific problems related to voice assistants (ESS1 > ESS2). However, CAH had a higher percentage of correct associations than HVA (CA1 < CA2), and CAH detected more severe (ESV1 < ESV2) and critical (ESC1 < ESC2) problems than HVA. The above indicates that HVA requires refinement in terms of its specification both to improve clarity and increase the number of correct associations of problems to heuristics, as well as to increase the number of severe and critical problems detected (related to voice assistants).

5.2. Results Obtained in Iteration 1: Validation Through Expert Judgment

In addition to the heuristic evaluation, in the first iteration we also conducted a survey with a group of three experts to evaluate HVA. These experts were the ones who participated as evaluators in the experimental group in the heuristic evaluation presented in Section 5.1. The survey was designed to obtain the evaluators’ perception of HVA along four dimensions: D1—Utility, D2—Clarity, D3—Ease of use, and D4—Need for additional elements. We used a five-point Likert-type scale (1 represents the worst rating and 5 the best for dimensions D1, D2, and D3; for dimension D4, a rating of 1 indicates a complete need for additional elements, while 5 signifies no need). Table 2 shows the average values obtained for each dimension per heuristic.
Regarding dimension “D1—Utility”, the evaluators perceived all heuristics as useful for evaluating voice assistants, except for “HVA6: Consistent Voice Interface” (rated 3.3), indicating a need to review its utility by either improving its specification or removing it from the set. All heuristics were perceived as clear (dimension “D2—Clarity”), with ratings of 4.0 or higher. However, in terms of dimension “D3—Ease of use”, 6 of the 12 heuristics were perceived as difficult to use in practice for detecting usability/UX issues in voice assistants (ratings of 3.6 or lower), particularly “HVA3: Brevity and Relevance of Information” and “HVA11: Reliability and Data Privacy” (both rated 2.6). This indicates a need to enhance the specification of these heuristics, either by adding more detail or improving their wording.
Finally, concerning dimension “D4—Need of additional elements”, the evaluators considered that 2 of the 12 heuristics should incorporate additional information to improve their specification: “HVA6: Consistent Voice Interface” and “HVA7: User Control and Freedom” (both rated 3.6). These heuristics were perceived as not useful and as difficult to use, respectively, so their specifications could be significantly improved by incorporating additional elements. Based on the results obtained from both the heuristic evaluation and expert judgment, the heuristics were refined in the second iteration (Step 6: Specification Stage).

5.3. Results Obtained in Iteration 2: Validation Through Expert Judgment

In the second iteration, we applied another survey to eight experts to validate the refined set of heuristics: HEUXIVA. We searched for experts through LinkedIn and contacted them via email. The experts had varying levels of experience using heuristic sets and conducting heuristic evaluations: three had high experience (more than 6 evaluations conducted), four had medium experience (4 to 5 evaluations), and one had low experience (3 evaluations conducted).
The survey had the same design as the one used in the first iteration (see Section 5.2), but this time it focused on evaluating the new proposal (HEUXIVA). The objective was to gather the evaluators’ perceptions of HEUXIVA along four dimensions: D1—Utility, D2—Clarity, D3—Ease of use, and D4—Need for additional elements. We once again used a five-point Likert-type scale (1 represents the worst rating and 5 the best for dimensions D1, D2, and D3; for dimension D4, a rating of 1 indicates a complete need for additional elements, while 5 signifies no need). Table 3 shows the average values obtained for each dimension per heuristic.
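The per-heuristic dimension averages reported in Tables 2 and 3 are simple means of the individual Likert ratings. The sketch below illustrates the computation for a single heuristic using made-up ratings.

```python
from statistics import mean

# Made-up five-point Likert ratings from three experts for one heuristic,
# per dimension (D1-Utility, D2-Clarity, D3-Ease of use, D4-additional
# elements). Values are invented for illustration only.
ratings = {
    "HEUXIVA5": {"D1": [5, 4, 5], "D2": [4, 5, 4],
                 "D3": [4, 4, 3], "D4": [5, 4, 5]},
}

# Average each dimension's ratings, rounded to one decimal as in the tables.
averages = {
    h: {d: round(mean(r), 1) for d, r in dims.items()}
    for h, dims in ratings.items()
}
print(averages["HEUXIVA5"]["D3"])  # 3.7
```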
As shown in Table 3, it is noticeable that most of the heuristics were well perceived by the experts in all four dimensions. Regarding dimension “D1—Utility”, 12 out of 13 heuristics received a rating above 4.3, indicating that they were perceived as very useful. Only the heuristic “HEUXIVA13: Guides and Documentation” received a “neutral” rating (3.8, “moderately useful”), suggesting that it could be refined to ensure it is considered useful for evaluating voice assistants. For the dimensions “D2—Clarity” and “D4—Need of additional elements”, all heuristics received ratings above 4.0, indicating that they were perceived as clear (easy to read) and do not require additional information for understanding and using them to detect usability/UX problems. Finally, concerning dimension “D3—Ease of Use”, 11 out of 13 heuristics were perceived as easy to use (with ratings above 4.0), except for the heuristics “HEUXIVA5: Information Accuracy” and “HEUXIVA7: Consistent Voice Interface”, which received a rating close to “neutral” (3.9), being perceived as “moderately easy to use”. This suggests that further improvements could be made to their specifications.
Compared to the results obtained in the expert judgment of the first iteration (see Section 5.2), we concluded that the specification of the heuristics has improved for all dimensions (see Table 4), particularly noting the positive enhancements in terms of “utility”, “ease of use”, and “need for additional elements” for the heuristics “HEUXIVA6: User Control and Freedom” and “HEUXIVA7: Consistent Voice Interface” (see Table 5).

5.4. Results Obtained in Iteration 2: Validation Through User Testing

We conducted a user test to: (1) verify whether the most severe and critical problems identified by evaluators in the heuristic evaluation conducted in iteration 1 (see Section 5.1) are perceived in the same way by users; and (2) identify usability/UX issues that arise during user interaction with a voice assistant and verify if these issues are covered by HEUXIVA (i.e., to check if it is possible to identify these problems detected during a user test using HEUXIVA). The user test was a thinking aloud type, moderated by the authors and synchronous.

5.4.1. User Test Design

The user test consisted of three parts (pre-test, test, and post-test). The first part (or pre-test) included an individual questionnaire with demographic questions to understand the participant’s profile and their experience using voice assistants, as well as a confidentiality agreement. The second part (test) involved a scenario with 8 tasks that participants had to perform individually (see Table 6). Additionally, during the test, participants were required to verbally express their opinions, experiences, emotions, and comments about the tasks and the use of the voice assistant. Finally, the third part (post-test) included a questionnaire to evaluate the participants’ perceptions and experiences using the voice assistant. For the tests, the Google Nest Mini voice assistant was used.

5.4.2. Participant Selection

Twelve users participated in the test, aged between 22 and 28 years. Four of the participants had never used a voice assistant before (inexperienced users), four had used a voice assistant at least once (medium-experienced users), and four used one daily (highly experienced users). We recruited these three user profiles (inexperienced or novice, medium experienced, and highly experienced) to obtain representative results and observe how users with different levels of experience interact with this type of device.

5.4.3. Results Obtained

Based on the verbal comments made by users during the execution of the tasks and their responses in the post-test, several usability/UX issues were identified and documented. Table 6 shows the tasks performed by the users in the thinking-aloud test and their results. Based on the users’ performance of the tasks, 20 usability/UX problems were identified. We reviewed whether HEUXIVA allows the detection of the identified problems (P1 to P20), concluding that HEUXIVA covers all problems detected in the test (see Table 6, last column). This indicates that the proposed set is effective in detecting usability/UX issues related to voice assistants.
As shown in Table 6, users completed all tasks with average times ranging from 78 to 168 s, reflecting a good performance for simple actions and greater effort for complex tasks. The most expressed emotions (neutral, happiness, confusion, and irritation) suggest that positive experiences were linked to successful and fluent interactions, while negative emotions appeared when the assistant failed to interpret or execute commands correctly.
On the other hand, it can be observed that the problems detected during the user test were related to the following 8 heuristics: HEUXIVA1, HEUXIVA3, HEUXIVA4, HEUXIVA5, HEUXIVA6, HEUXIVA7, HEUXIVA10, and HEUXIVA12 (see Table 7). Of these 8 heuristics, 5 were proposed to identify specific usability/UX problems directly related to voice assistants (HEUXIVA3, HEUXIVA4, HEUXIVA5, HEUXIVA7, and HEUXIVA12). Based on the results obtained in the user test, it is possible to highlight the utility of the HEUXIVA set, as several specific usability/UX problems were identified while the users were using the voice assistant and completing the tasks (12 specific problems detected, see Table 7).
These findings offer preliminary but consistent evidence supporting the effectiveness of HEUXIVA in identifying UX issues specific to voice assistants.
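The coverage check reported in Table 6 amounts to verifying that every observed problem maps to at least one heuristic. The sketch below illustrates this; the problem-to-heuristic mapping shown is invented for the example (the paper’s actual P1–P20 assignment is in Tables 6 and 7):

```python
# Sketch of the coverage check: each problem observed in the user test should
# be traceable to at least one HEUXIVA heuristic. The mapping below is
# illustrative only, not the paper's actual assignment.
problem_to_heuristics = {
    "P1": ["HEUXIVA1"],
    "P2": ["HEUXIVA3", "HEUXIVA5"],
    "P3": ["HEUXIVA10"],
    # ... P4-P20 would be filled in from the evaluators' records
}

def coverage_report(problems, mapping):
    """Split detected problems into covered and uncovered by the heuristic set."""
    covered = {p: mapping[p] for p in problems if mapping.get(p)}
    uncovered = [p for p in problems if not mapping.get(p)]
    return covered, uncovered

covered, uncovered = coverage_report(["P1", "P2", "P3"], problem_to_heuristics)
print(f"covered: {len(covered)}, uncovered: {uncovered}")
```

An empty `uncovered` list corresponds to the full-coverage result reported for HEUXIVA.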

6. HEUXIVA: Heuristics for Evaluating User eXperience with Voice Assistants

Based on the iterations and validation described in the previous sections, the HEUXIVA set was refined and improved. We proposed a total of 13 heuristics that can be used to evaluate the user experience of voice assistants. Of these, 7 are new and were defined to detect problems specific to voice assistants (HEUXIVA3, HEUXIVA4, HEUXIVA5, HEUXIVA7, HEUXIVA8, HEUXIVA11, and HEUXIVA12). These heuristics are presented in Table 8, including the ID, name, description, the voice assistant features evaluated with the heuristic, and the UX attributes evaluated with the heuristic.
In addition, Appendix J presents each heuristic in detail using the template specified in the applied methodology [2]. Each heuristic is presented in a table containing: ID, name, definition, explanation, priority (how important the heuristic is: critical, important, or useful), usability and UX attributes evaluated with the heuristic, voice assistant features evaluated with the heuristic, related heuristics, a checklist, and a compliance example and a non-compliance example.
A Supplementary Excel File (S1) is provided to support the practical application of HEUXIVA. This material includes five sheets: (1) a brief description of the 13 heuristics; (2) an extended description of each heuristic; (3) the corresponding checklist items for each one; (4) examples of usability/UX problems that can be identified using HEUXIVA and guidance on how to document them; and (5) a blank template for recording problems during heuristic evaluations. This resource aims to facilitate reproducibility and assist researchers and practitioners in applying HEUXIVA.

7. Discussion

7.1. About the Results Obtained in Validation Stage (First and Second Iteration)

We performed four experiments to validate HEUXIVA: a heuristic evaluation and an expert judgment in the first iteration, and another expert judgment and a user test in the second iteration. As shown in Table 1, the results from the first iteration demonstrate that the initial set of heuristics (HVA) achieved reasonable levels of effectiveness across the applied criteria. Although some variability was observed among evaluators, the findings show that the first version of the heuristics made it possible to identify relevant usability/UX problems related to voice assistants. These early outcomes provided the empirical foundation for refining the heuristics into the final HEUXIVA set.
On the other hand, as shown in Table 4, the comparison of expert judgment results between the first and second iterations highlights the gradual refinement of HEUXIVA. Experts reported improvements in clarity, relevance, and applicability, particularly for those heuristics addressing user control and freedom, and consistent voice interface. The observed increase in agreement among evaluators suggests that the iterative process was effective in reducing ambiguity and improving the understanding of each heuristic’s purpose. Finally, the findings presented in Table 6 integrate quantitative and qualitative data from the thinking-aloud user test conducted during the second iteration. The results support the practical usefulness of HEUXIVA by demonstrating that the heuristics cover the types of usability/UX problems encountered by real users when interacting with voice assistants.
Although the current validations are preliminary, the results demonstrate good progress in the refinement of HEUXIVA through multiple iterations and experiments. The integration of expert and user perspectives provides evidence that the proposed set is useful and applicable for evaluating UX in voice assistants.

7.2. Comparative Analysis with Existing Heuristics and Evaluation Methods

To contextualize the contribution of HEUXIVA, Table 9 compares the proposed set with existing studies related to voice assistants’ evaluation, including the heuristics proposed by Langevin et al. [10] and Sánchez-Adame et al. [11], the Voice Usability Scale (VUS) proposed by Zwakman et al. [12], ergonomics criteria for voice user interface proposed by Nowacki and Gordeeva [37], and usability problems related to intelligent personal assistants (IPAs) identified by Cowan et al. [9]. While these previous studies represent important advances, most of them primarily address usability aspects and do not fully integrate user experience (UX) dimensions.
Existing sets of heuristics are focused on chatbots and conversational agents rather than voice assistants specifically. The sets by Langevin et al. [10] and Sánchez-Adame et al. [11] adapt Nielsen’s general usability principles but overlook essential aspects of voice interaction. Cowan et al. [9] provide an empirical characterization of user challenges with IPAs, highlighting several usability/UX problems; however, their study focuses on describing user problems rather than translating them into heuristics. The ergonomic criteria by Nowacki and Gordeeva [37] provide valuable guidance for voice user interfaces (VUIs) but lack empirical validation. Finally, the Voice Usability Scale (VUS) by Zwakman et al. [12] offers a quantitative perspective on usability through a concise ten-item survey adapting SUS [32], but it does not provide heuristics or focus on user experience.
In contrast, HEUXIVA integrates insights from these works while addressing their main gaps. It includes usability and UX perspectives, incorporates features unique to voice assistants (such as effective communication, voice interface, guidance and assistance, and adaptation, among others), and introduces practical checklists to guide evaluations. Moreover, HEUXIVA was developed through a structured, iterative methodology that combines literature synthesis, heuristic evaluation, expert judgment, and user testing, resulting in a comprehensive set. Although the current validation of HEUXIVA is preliminary (limited to a single device and small participant samples), the comparative analysis indicates that HEUXIVA advances the evaluation of voice assistants by addressing specific interaction issues not captured by existing heuristic sets. This early validation nonetheless provides promising evidence of its potential to guide more comprehensive and context-aware UX evaluations in future studies.

7.3. Novel Contributions and Creation of New Heuristics

Of the HEUXIVA set, six heuristics were adapted from existing proposals (Nielsen [31], Langevin et al. [10], Sánchez-Adame et al. [11], Nowacki and Gordeeva [37]). This adaptation, however, was not a direct translation but part of a systematic integration process defined in the correlational and selection stages of the applied methodology. During these stages, each heuristic was evaluated for its relevance to voice-based interaction, its coverage of UX attributes, and its ability to identify usability/UX problems.
Through this process, we also defined new heuristics. Specifically, seven heuristics (HEUXIVA3, HEUXIVA4, HEUXIVA5, HEUXIVA7, HEUXIVA8, HEUXIVA11, and HEUXIVA12) are considered entirely novel, as the aspects they address were only partially covered by existing proposals. These heuristics go beyond traditional usability/UX principles by integrating elements that are unique to voice assistants, such as conversational fluidity, linguistic adaptability, information accuracy, personalization, privacy transparency, and reliability in autonomous interactions. HEUXIVA extends the conceptual and practical boundaries of prior heuristic sets, offering a more comprehensive instrument for evaluating user experience in voice assistants. Table 10 shows the origin of each HEUXIVA heuristic and its contribution to UX evaluation.
To ensure transparency of coverage and to avoid redundancy across heuristics, a matrix was developed linking each voice assistant feature with its corresponding HEUXIVA heuristic, checklist item, problem type, and representative example (see Appendix K). To prevent overlaps among heuristics, three rules were applied: (1) each voice assistant feature was mapped to a primary heuristic that best represents its evaluative focus; (2) checklist items covering similar UX aspects were grouped and assigned to the most specific heuristic; and (3) overlapping items were merged under the heuristic with the broader or more integrative scope.
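The three de-overlap rules above can be sketched programmatically. The feature names, heuristic assignments, and specificity ranking below are assumptions invented for the example, not the paper’s actual Appendix K matrix:

```python
# Illustrative sketch of the de-overlap rules. Feature names, heuristic
# assignments, and the specificity ranking are assumptions for this example.
from collections import defaultdict

# Rule 1: each voice assistant feature maps to a single primary heuristic.
primary_heuristic = {
    "Voice Interface": "HEUXIVA7",
    "Effective Communication": "HEUXIVA3",
    "Guidance and Assistance": "HEUXIVA2",
}
for feature, heuristic in primary_heuristic.items():
    print(f"{feature} -> {heuristic}")

# Checklist items tagged with every heuristic that could plausibly host them.
candidate_items = [
    ("Uses a consistent wake word and tone", ["HEUXIVA7"]),
    ("Maintains conversational context", ["HEUXIVA3", "HEUXIVA7"]),
    ("Offers help when the user hesitates", ["HEUXIVA2"]),
]

# Assumed specificity ranking: lower index = more specific scope.
specificity = ["HEUXIVA7", "HEUXIVA3", "HEUXIVA2"]

def assign_items(items, ranking):
    """Rule 2: each item goes to its most specific candidate heuristic, so
    overlapping items end up under exactly one heuristic. A full
    implementation would also apply Rule 3 (merging residual overlaps under
    the heuristic with the broader scope)."""
    checklist = defaultdict(list)
    for text, candidates in items:
        host = min(candidates, key=ranking.index)
        checklist[host].append(text)
    return dict(checklist)

print(assign_items(candidate_items, specificity))
```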

8. Limitations

This study has several limitations; however, we believe it represents a valuable contribution toward advancing the evaluation of UX with voice assistants. First, the validation scope was narrow, as all experiments were conducted using the Google Nest Mini. While this restricts generalization to other devices, it allowed us to maintain experimental control and consistency across iterations. Focusing on a single device enabled a more precise identification of domain-specific UX issues, which can later be contrasted with other platforms in future research.
Second, the sample sizes were relatively small and homogeneous. The heuristic evaluation involved two independent groups of three experts each (HVA vs. CAH), which may not fully eliminate group-composition effects. However, this setup provided valuable initial evidence about the effectiveness and clarity of the proposed heuristics. A larger, cross-over design is recommended for future studies to enhance statistical robustness, but the current results already show consistent tendencies that support the validity of HEUXIVA.
Third, in the first iteration, the same specialists who conducted the heuristic evaluation also participated in the expert judgment survey. This overlap may introduce a degree of confirmation bias. Nevertheless, it also ensured continuity and a deep understanding of the heuristics under review, resulting in meaningful, expert-informed feedback that guided the refinement of the set. Future iterations will address this by engaging independent evaluators for each stage.
Finally, the user testing in the second iteration was limited to 12 participants aged 22–28, using a single device and eight tasks. While this constitutes a limited sample, it provided an appropriate pilot exploration that successfully verified the applicability and coverage of HEUXIVA. Broader studies including diverse devices (e.g., Siri or Alexa), languages, and usage contexts are planned to expand the external validity of the results.
Overall, despite these limitations, the study presents a domain-specific heuristic set that fills a gap in the UX evaluation of voice assistants. The iterative process, combination of multiple validation methods, and findings demonstrate that HEUXIVA is both rigorous and promising, serving as a strong basis for future refinement and broader application.

9. Conclusions and Future Work

Voice assistants are designed to support users in their daily activities by performing tasks such as setting alarms, retrieving information, or managing smart devices through natural voice interaction. However, due to the diversity of existing platforms and their conversational limitations, they often present usability and UX issues that negatively affect user satisfaction and continued adoption. Establishing specific heuristics to evaluate the user experience with these devices is therefore essential to identify interaction problems and improve their overall quality.
In this study, we proposed HEUXIVA, a set of 13 heuristics specifically developed to evaluate the user experience of voice assistants. The heuristics were created through a structured, iterative methodology and validated through heuristic evaluation, expert judgment, and user testing. The results—although preliminary—suggest that HEUXIVA is a useful and reliable instrument for identifying usability and UX issues specific to voice assistants, indicating that this research is progressing in a promising direction.
As a limitation of this study, the experiments were conducted exclusively using the Google Nest Mini device, with small and homogeneous samples. These constraints provided control and consistency in early stages but also limit the generalizability of the findings. As future work, we plan to address these aspects through several actions: broaden the validation scope by including multiple platforms (e.g., Amazon Alexa and Apple Siri) and varied acoustic, linguistic, and environmental contexts; increase and diversify the participant samples, incorporating users from different age groups, linguistic backgrounds, and profiles to improve external validity; and separate evaluator roles by engaging independent expert groups for heuristic evaluation and for subsequent judgment or surveys, thereby reducing bias and improving result independence.
We expect that the proposed heuristic set will support researchers and industry practitioners in developing and refining new voice assistants, facilitating the detection of usability and UX problems and improving users’ overall interaction experience. By improving user experience, these systems can better ensure quality, satisfaction, and alignment with user expectations.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/app152011178/s1, Excel Sheet S1: (S1) HEUXIVA—Supplementary Material.

Author Contributions

Conceptualization, D.Q.; methodology, D.Q., S.C. and L.F.R.; validation, D.Q., J.R. and V.B.; formal analysis, D.Q., J.R. and V.B.; investigation, D.Q., J.R. and V.B.; resources, D.Q.; data curation, D.Q., C.S., J.R. and V.B.; writing—original draft preparation, D.Q., C.S., J.R., V.B. and L.F.R.; writing—review and editing, D.Q., S.C. and L.F.R.; visualization, D.Q., C.S. and L.F.R.; supervision, D.Q.; project administration, D.Q.; funding acquisition, D.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Agencia Nacional de Investigación y Desarrollo (ANID), Chile, FONDECYT INICIACIÓN, Project No. 11190759.

Institutional Review Board Statement

The study was conducted in accordance with the ethical standards defined in the regulations of the Pontificia Universidad Católica de Valparaíso, Chile (protocol code BIOEPUCV-H 319-2019, date of approval: 14 October 2019), the Declaration of Bioethics and Human Rights of 2005 by UNESCO, and the ANID regulations for studies involving humans.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The original contributions presented in this study are included in the article and Supplementary Material. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors would like to thank all the participants (experts, users, evaluators, and researchers) who were involved in the experiments for this study. During the preparation of this work the authors used ChatGPT 4.0 and 5.0 to translate the text of the article into English. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A. Inputs, Outputs, and Activities for Each Step Performed in Iteration 1

Table A1. Inputs, outputs, and activities for each step performed in iteration 1.
Step 1: Exploratory Stage
- Input: —
- Output: ① Information about voice assistant devices (three definitions, ten features, and the necessity and taxonomy of voice assistants); ② one proposal for usability attributes and one proposal for UX attributes; and ③ five sets of related heuristics
- Activities: Conduct a literature review about voice assistants (definitions and features), usability/UX attributes, existing sets of related usability/UX heuristics, and other relevant information.

Step 2: Experimental Stage
- Input: ① Information about voice assistant devices; ② one proposal for usability attributes and one proposal for UX attributes; and ③ five sets of related heuristics
- Output: ④ Voice assistant usability issues
- Activities: Conduct a formal inspection made by two researchers. Identify usability issues during the formal inspection of the device.

Step 3: Descriptive Stage
- Input: ① Information about voice assistant devices; ② one proposal for usability attributes and one proposal for UX attributes; ③ five sets of related heuristics; and ④ voice assistant usability issues
- Output: ⑤ Selected information about voice assistants; ⑥ ten features of voice assistants; ⑦ three UX attributes from one proposal; ⑧ usability issues found; and ⑨ five selected sets of heuristics
- Activities: Group all the information collected. Sort and prioritize the information using a three-level scale (3: highly important; 2: somewhat important; 1: not important). Select the relevant information to develop the set of heuristics.

Step 4: Correlational Stage
- Input: ⑤ Selected information about voice assistants; ⑥ ten features of voice assistants; ⑦ three UX attributes from one proposal; ⑧ usability issues found; and ⑨ five selected sets of heuristics
- Output: ⑩ All features, attributes, existing heuristics, and other related elements matched together
- Activities: Match the ten voice assistant features with the three UX attributes, the five sets of heuristics [10,11,31,33,37], and the usability issues.

Step 5: Selection Stage
- Input: ⑩ Matched features, attributes, existing heuristics, and other related elements
- Output: ⑪ Classified heuristics (1 heuristic to keep; 39 heuristics to adapt; and 11 heuristics to eliminate)
- Activities: Review Nielsen’s heuristics [31], conversational agents heuristics [10], heuristics for evaluating chatbots [11], and the ergonomic criteria for voice user interfaces [37]. Determine which heuristics to keep, adapt, and eliminate.

Step 6: Specification Stage
- Input: ⑩ Matched features, attributes, existing heuristics, and other related elements; ⑪ classified heuristics (1 heuristic to keep; 39 heuristics to adapt; and 11 heuristics to eliminate)
- Output: ⑫ Set of 12 voice assistant heuristics, HVA (first iteration)
- Activities: Specify 12 UX heuristics for voice assistants (HVA), including: ID, name, definition, explanation, voice assistant feature, examples, UX attribute, and related existing heuristics.

Step 7: Validation Stage
- Input: ⑫ Set of 12 voice assistant heuristics, HVA (first iteration)
- Output: ⑬ Heuristic evaluation results: effectiveness of HVA; ⑭ expert judgment results (survey)
- Activities: Perform a heuristic evaluation with six evaluators (three evaluators for the control group, and three evaluators for the experimental group). Perform a survey for experts to review the heuristics.

Step 8: Refinement Stage
- Input: ⑬ Heuristic evaluation results: effectiveness of HVA; ⑭ expert judgment results (survey)
- Output: ⑮ Refinement document: (1) 12 heuristics to refine, 1 heuristic to add; (2) repeat steps 5–8
- Activities: Document the improvements to be performed in the specification of HVA. It was decided to repeat steps 5–8.

Appendix B. Inputs, Outputs, and Activities for Each Step Performed in Iteration 2

Table A2. Inputs, outputs, and activities for each step performed in iteration 2.
Step 6: Specification Stage
- Input: ⑮ Refinement document: 12 heuristics to refine, 1 heuristic to add; ⑩ matched features, attributes, existing heuristics, and other related elements; ⑫ set of 12 voice assistant heuristics, HVA (first iteration)
- Output: ① Set of 12 voice assistant heuristics, HEUXIVA (second iteration)
- Activities: Refine the specification of the 13 UX heuristics for voice assistants (HVA), including: ID, name, definition, explanation, voice assistant feature, examples, UX attribute, and related existing heuristics.

Step 7: Validation Stage
- Input: ① Set of 12 voice assistant heuristics, HEUXIVA (second iteration)
- Output: ② Heuristic evaluation results: effectiveness of HEUXIVA; ③ expert judgment results (survey); ④ user test results
- Activities: Perform a heuristic evaluation with X evaluators (X evaluators for the control group, and X evaluators for the experimental group). Perform a survey for eight experts to review the heuristics. Perform a thinking-aloud test to evaluate a case study with twelve users.

Step 8: Refinement Stage
- Input: ① Set of 12 voice assistant heuristics, HEUXIVA (second iteration); ② heuristic evaluation results: effectiveness of HEUXIVA; ③ expert judgment results (survey); ④ user test results
- Output: ⑤ Set of 13 voice assistant heuristics, HEUXIVA (second iteration)
- Activities: Refine and improve the final specification of the 13 UX heuristics for voice assistants (HEUXIVA).

Appendix C. Set of Heuristics for Voice Assistants Developed at Each Iteration

Table A3. Set of heuristics for voice assistants developed at each iteration.
First Iteration (HVA) | Second Iteration (HEUXIVA)
HVA1: System Status Visibility | HEUXIVA1: System Status Visibility
HVA2: Feedback and Help Users Prevent Errors | HEUXIVA2: System Guidance and Capabilities
HVA3: Brevity and Relevance of Information | HEUXIVA3: Effective and Fluid Communication
HVA4: Natural Communication | HEUXIVA4: Environment Match Between Assistant and User Language
HVA5: Match Between the System and the Real World | HEUXIVA5: Information Accuracy
HVA6: Consistent Voice Interface | HEUXIVA6: User Control and Freedom
HVA7: User Control and Freedom | HEUXIVA7: Consistent Voice Interface
HVA8: Flexibility and Personalization | HEUXIVA8: Voice Shortcuts, Flexibility and Personalization
HVA9: Help Users Recognize, Diagnose, and Fix Errors | HEUXIVA9: Error Prevention
HVA10: System Guidance and Capabilities | HEUXIVA10: Help Users Recognize, Diagnose, and Fix Errors
HVA11: Reliability and Data Privacy | HEUXIVA11: Data Privacy
HVA12: Guides and Documentation | HEUXIVA12: Voice Assistant Reliability
— | HEUXIVA13: Guides and Documentation

Appendix D. First Iteration, Step 2: “Experimental Stage”

Table A4. First iteration, Step 2 “Experimental stage”: List of usability/UX problems of the voice assistant found in formal inspection.
P1. Device ignores user — Severity: 4 (catastrophic problem)
- Occurrence example: When making a request, action, and/or question, the device will sometimes “wake up” (perform the listening action) and ignore the user without providing feedback as to why it will not perform the requested action.
- Why it affects the user: When ignored, the user feels uncertain about what the problem is and why the device does not work.

P2. Difficulty initializing device — Severity: 4 (catastrophic problem)
- Occurrence example: When connecting the device for the first time, the pairing process becomes difficult for the user because the device does not provide feedback until it is fully configured. Also, when reconnecting it in the same location, the device presented connection errors and had to be manually reset.
- Why it affects the user: When the device presents difficulties in initializing (the user’s first impressions), it discourages the person from using it.

P3. Lack of manual or instructions — Severity: 3 (major problem)
- Occurrence example: There is no guide to reset the device.
- Why it affects the user: Without a complete user guide or manual, the person must manually search the Internet for external explanations.

P4. Device does not understand language and jargon — Severity: 3 (major problem)
- Occurrence example: When the user expresses themselves using local language and technical terms, the device ends the conversation early and/or says, “I’m sorry, I didn’t understand”.
- Why it affects the user: Since the device does not know the language of the place in which it is located, it causes the user to change the way they speak, in addition to not generating a fluid conversation.

P5. Device provides incoherent responses — Severity: 3 (major problem)
- Occurrence example: When asking the device about certain topics (e.g., the user’s mood), it provides incoherent answers and changes the context of the conversation.
- Why it affects the user: By providing incoherent and/or unrelated responses to the topic, the device generates uncertainty in the user about the device’s capabilities (the limits that the device has).

P6. The device does not recognize the user’s voice in a noisy environment — Severity: 3 (major problem)
- Occurrence example: When the device is in a noisy environment (e.g., television on), it does not distinguish the user’s voice despite having voice recognition.
- Why it affects the user: If the user cannot be detected by the device, the user must increase the volume of their voice, raise the pitch, and/or turn off the device that is producing noise nearby.

P7. Device does not provide useful information to the user — Severity: 3 (major problem)
- Occurrence example: When asked about the weather in the city of Punta Arenas, Chile, the device gives the weather in the city of Puntarenas, Costa Rica.
- Why it affects the user: It is annoying for the user to receive different results, since the device is supposed to know their location when connected to their home network and provide information accordingly.

P8. The device has limited memory — Severity: 2 (minor problem)
- Occurrence example: When asked about a topic that was discussed less than 30 s ago, the device does not remember what was discussed.
- Why it affects the user: If the device does not remember what the user told it in the previous request, it gives the impression that the user is not being listened to and/or paid attention to.

P9. The volume up and down buttons lack orientation cues — Severity: 2 (minor problem)
- Occurrence example: When trying to manually increase the volume of the device, the user becomes disoriented when trying to increase/decrease the volume.
- Why it affects the user: The user may be confused, as they must press the buttons at random to find the one they wanted.

P10. Device ends conversations prematurely — Severity: 2 (minor problem)
- Occurrence example: When interacting with the device, it stops talking after less than 1 s, causing the user to have to restart the conversation with the activation phrase.
- Why it affects the user: Since the device ends conversations at its discretion, it makes the user realize that they are talking to a machine/robot.

P11. Inconsistent language — Severity: 2 (minor problem)
- Occurrence example: When the device is playing music on Spotify and the user disconnects it using their phone, the device displays a message in English despite being set to Spanish.
- Why it affects the user: A message in another language causes confusion, because the user may not understand what the device is communicating.

P12. The device does not understand search requests — Severity: 2 (minor problem)
- Occurrence example: When the user asks the device to “Search Barso”, it responds “Sorry, I didn’t understand”, even though the device can perform Google searches.
- Why it affects the user: By not understanding search queries, the device makes the user uncertain about whether it works or can be useful.

P13. Device does not manage voice pairings with external devices — Severity: 2 (minor problem)
- Occurrence example: The process of linking the device to external devices must be done manually, using the mobile phone application (Google Home).
- Why it affects the user: Since link management cannot be performed by voice, the user is forced to do it manually using the device’s mobile application.

Appendix E. First Iteration, Step 3: “Descriptive Stage”

Table A5. First iteration, Step 3 “Descriptive stage”: Relevance for voice assistant features, UX attributes, sets of existing heuristics, and related relevant elements.
Topic: Voice assistant information
- Highly important (3): Name and definition of voice assistant [4]; name and definition of voice assistant [7]; name and definition of voice assistant [8]; need to create a UX evaluation method for voice assistants [12]
- Somewhat important (2): Taxonomy of voice assistants [40]
- Not important (1): —
- Explanation: The different definitions of voice assistants and the need to create a UX evaluation method for them were deemed highly relevant; their taxonomy was somewhat relevant.

Topic: Voice assistant features
- Highly important (3): Effective Communication; Effective; Activity Management; Customizable; Multi-user; Security and Privacy; Multi-linkable; Culturizable/adaptable; Voice Interface; Guidance and Assistance
- Somewhat important (2): —
- Not important (1): —
- Explanation: All features were considered highly relevant.

Topic: UX attributes
- Highly important (3): Useful; Usable; Desirable; Findable/locatable; Credible; Valuable; Learning Capacity; Effectiveness; Efficiency; Satisfaction
- Somewhat important (2): —
- Not important (1): Accessibility
- Explanation: Of the three proposals for UX attributes collected in Stage 1, only Accessibility was not considered, due to its complexity.

Topic: Sets of heuristics
- Highly important (3): 11 heuristics by R. Langevin [10]; 10 Nielsen heuristics [31]
- Somewhat important (2): 5 heuristics by L. M. Sánchez-Adame [11]; 8 heuristics by C. Nowacki and A. Gordeeva [37]
- Not important (1): —
- Explanation: Two sets of heuristics were deemed highly important, and 3 sets were considered somewhat relevant.

Topic: Usability/UX problems
- Highly important (3): Formal inspection by researchers (see Appendix D)
- Somewhat important (2): R. Cowan’s problems with the experience of people who use IPAs occasionally [9]
- Not important (1): —
- Explanation: Two sets of usability/UX problems were considered relevant enough.

Topic: Other related elements
- Highly important (3): —
- Somewhat important (2): Zwakman’s VUS questionnaire [33]
- Not important (1): —
- Explanation: One related element was selected.

Appendix F. First Iteration, Step 4: “Correlational Stage”

Table A6. First iteration, Step 4 “Correlational stage”: Match between the voice assistant features, usability/UX attributes, heuristics proposed by other authors, usability/UX problems detected, and related elements proposed by other authors.
Feature | Usability/UX Attribute | Related Heuristics | Usability/UX Problems (obtained from the formal inspection and R. Cowan's problems [9]) | VUS Items
Effective communication
Effectiveness; Efficiency; Useful
H2: Context (partially covered feature)
H3: Naturalness (partially covered feature)
C1: Visibility of system status (slightly covered feature)
C5: Error prevention (fully covered feature)
C8: Aesthetic, minimalist and engaging design (partially covered feature)
C9: Help users recognize, diagnose and recover from errors (fully covered feature)
C10: Context preservation (partially covered feature)
N1: Visibility of system status (slightly covered feature)
N5: Error prevention (partially covered feature)
N9: Help users recognize, diagnose, and recover from errors (slightly covered feature)
E1.2: Immediate feedback (partially covered feature)
E5: Error management (slightly covered feature)
E5.2: Quality of error messages (partially covered feature)
E7.1: Short and long-term memory (partially covered feature)
P1: Device ignores user
P5: Device provides incoherent responses
P10: Device ends conversations prematurely
P11: Inconsistent language
I1: I thought the response from the voice assistant was easy to understand
Effective
Effectiveness; Efficiency; Useful
H1: Complexity (slightly covered feature)
H2: Context (slightly covered feature)
C6: Help and guidance (partially covered feature)
E5: Error management (slightly covered feature)
P7: Device does not provide useful information to user
P8: The device has limited memory
P12: The device does not understand search requests
I2: I thought the information provided by the voice assistant was not relevant to what I asked
I10: I found the voice assistant difficult to use
Activity management
Useful; Credible; Valuable; Satisfaction; Learning capacity
C3: User control and freedom (slightly covered feature)
C7: Flexibility and efficiency of use (partially covered feature)
N3: User control and freedom (slightly covered feature)
N7: Flexibility and efficiency of use (slightly covered feature)
E2.1: Brevity (slightly covered feature)
E2.2: Information density (slightly covered feature)
E3.1: Explicit user action (partially covered feature)
E3.2: User control (slightly covered feature)
P2: Device ignores user
P3: Difficulty initializing device
PP: Trust issues when assigning activities to the device
I5: I felt the voice assistant enabled me to successfully complete my tasks when I required help
I7: The voice assistant had all the functions and capabilities that I expected it to have
Customizable
Satisfaction; Useful; Desirable
C7: Flexibility and efficiency of use (partially covered feature)
N3: User control and freedom (slightly covered feature)
N7: Flexibility and efficiency of use (slightly covered feature)
E4.1: Flexibility (partially covered feature)
E4.2: User’s experience level (partially covered feature)
E7.2: Environment (partially covered feature)
E8.2: Behavior (partially covered feature)
No associated problem found/detected
I6: I found it frustrating to use the voice assistant in a noisy and loud environment
I8: I found it difficult to customize the voice assistant according to my needs and preferences
Multi-user
Effectiveness; Useful
H2: Context (slightly covered feature)
C10: Context preservation (partially covered feature)
E4.3: Multi-user (partially covered feature)
P6: The device does not recognize the user's voice when in a noisy environment
An associated item was not found/detected
Security and privacy
Credible; Satisfaction; Findable/locatable
C11: Trustworthiness (partially covered feature)
E8.2: Behavior (slightly covered feature)
PP: Trust, data privacy, transparency and data ownership issues
An associated item was not found/detected
Multi-linkable
Useful; Valuable; Effectiveness
C9: Help users recognize, diagnose and recover from errors (slightly covered feature)
N9: Help users recognize, diagnose and recover from errors (slightly covered feature)
P13: Device does not manage voice pairings with external devices
PP: Problems with integration with apps, platforms and systems
An associated item was not found/detected
Culturizable/adaptable
Efficiency; Satisfaction; Desirable
H2: Context (partially covered feature)
H3: Naturalness (partially covered feature)
C2: Match between system and the real world (partially covered feature)
C4: Consistency and standards (partially covered feature)
N2: Match between system and the real world (slightly covered feature)
N4: Consistency and standards (partially covered feature)
N8: Aesthetic and minimalist design (slightly covered feature)
E4: Adaptability (slightly covered feature)
E4.1: Flexibility (partially covered feature)
E4.3: Multi-user (partially covered feature)
P4: Device does not understand idioms and jargon
An associated item was not found/detected
Voice interface
Effectiveness; Efficiency; Useful
H3: Naturalness (partially covered feature)
C1: Visibility of system status (slightly covered feature)
C6: Help and guidance (partially covered feature)
N1: Visibility of system status (partially covered feature)
E6: Consistency (slightly covered feature)
E6.2: External consistency
E8.1: Identity
PP: Hands-free interaction support issues
An associated item was not found/detected
Guidance and assistance
Effectiveness; Useful; Valuable; Satisfaction; Findable/locatable
N10: Help and documentation (partially covered feature)
No associated problem found/detected
An associated item was not found/detected
The prefix “N” identifies Nielsen’s heuristics [31], “C” R. Langevin’s heuristics [10], “H” L. M. Sanchez-Adame’s heuristics [11], “E” C. Nowacki and A. Gordeeva’s heuristics [37], “PP” the problems reported by R. Cowan for people who use IPAs occasionally [9], and “I” the items of Zwakman’s VUS questionnaire [33].
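To make the feature-to-heuristic traceability of Table A6 machine-checkable, the mapping can be encoded as a simple lookup table. The sketch below is illustrative only: the `correlation` dict is a hypothetical encoding of three of the rows above, not an artifact of the study.

```python
# Hypothetical encoding of three rows of the correlational table (Table A6):
# feature -> (UX attributes, related heuristic IDs, problem IDs, VUS item IDs).
correlation = {
    "Effective communication": (
        ["Effectiveness", "Efficiency", "Useful"],
        ["H2", "H3", "C1", "C5", "C8", "C9", "C10",
         "N1", "N5", "N9", "E1.2", "E5", "E5.2", "E7.1"],
        ["P1", "P5", "P10", "P11"],
        ["I1"],
    ),
    "Multi-user": (["Effectiveness", "Useful"], ["H2", "C10", "E4.3"], ["P6"], []),
    "Guidance and assistance": (
        ["Effectiveness", "Useful", "Valuable", "Satisfaction", "Findable/locatable"],
        ["N10"], [], [],
    ),
}

def uncovered_features(table):
    """Features with no associated usability/UX problem and no VUS item."""
    return [f for f, (_, _, problems, items) in table.items()
            if not problems and not items]

print(uncovered_features(correlation))  # ['Guidance and assistance']
```

A query like `uncovered_features` surfaces exactly the gaps the table records in prose (“No associated problem found/detected”, “An associated item was not found/detected”).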

Appendix G. First Iteration, Step 5: “Selection Stage”

Table A7. First iteration, Step 5 “Selection stage”: Heuristics and principles selection process.
ID | Name | Action | References | Voice Assistant Feature Covered | Applicability
H1 | Complexity | Adapt | [11] | Effectiveness | (1) Useful
H2 | Context | Adapt | [11] | Effective communication; Effectiveness; Multi-user; Culturizable/adaptable | (1) Useful
H3 | Naturalness | Adapt | [11] | Effective communication; Culturizable/adaptable; Voice interface | (2) Important
C1 | Visibility of system status | Adapt | [10] | Effective communication; Voice interface | (3) Critical
C2 | Match between system and the real world | Adapt | [10] | Culturizable/adaptable | (2) Important
C3 | User control and freedom | Adapt | [10] | Activity management | (3) Critical
C4 | Consistency and standards | Adapt | [10] | Culturizable/adaptable | (2) Important
C5 | Error prevention | Adapt | [10] | Effective communication | (3) Critical
C6 | Help and guidance | Adapt | [10] | Effectiveness; Voice interface | (3) Critical
C7 | Flexibility and efficiency of use | Adapt | [10] | Activity management; Customizable | (2) Important
C8 | Aesthetic, minimalist and engaging design | Adapt | [10] | Effective communication | (3) Critical
C9 | Help users recognize, diagnose and recover from errors | Adapt | [10] | Effective communication; Multi-linkable | (3) Critical
C10 | Context preservation | Adapt | [10] | Effective communication; Multi-user | (3) Critical
C11 | Trustworthiness | Adapt | [10] | Security and privacy | (3) Critical
N1 | Visibility of system status | Adapt | [31] | Effective communication; Voice interface | (2) Important
N3 | User control and freedom | Adapt | [31] | Activity management; Customizable | (3) Critical
N4 | Consistency and standards | Adapt | [31] | Culturizable/adaptable | (2) Important
N5 | Error prevention | Adapt | [31] | Effective communication | (2) Important
N6 | Recognition rather than recall | Adapt | [31] | Activity management | (1) Useful
N7 | Flexibility and efficiency of use | Adapt | [31] | Activity management; Customizable | (2) Important
N8 | Aesthetic and minimalist design | Adapt | [31] | Culturizable/adaptable | (1) Useful
N9 | Help users recognize, diagnose and recover from errors | Adapt | [31] | Effective communication; Multi-linkable | (2) Important
N10 | Help and documentation | Adapt | [31] | Guidance and assistance | (3) Critical
E1.2 | Immediate feedback | Adapt | [37] | Effective communication | (3) Critical
E2.1 | Brevity | Adapt | [37] | Activity management | (3) Critical
E2.2 | Information density | Adapt | [37] | Activity management | (3) Critical
E3.1 | Explicit user action | Adapt | [37] | Activity management | (3) Critical
E3.2 | User control | Adapt | [37] | Activity management | (3) Critical
E4 | Adaptability | Adapt | [37] | Culturizable/adaptable | (2) Important
E4.1 | Flexibility | Adapt | [37] | Customizable; Culturizable/adaptable | (1) Useful
E4.2 | User’s experience level | Adapt | [37] | Customizable | (1) Useful
E4.3 | Multi-user | Adapt | [37] | Multi-user; Culturizable/adaptable | (3) Critical
E5 | Error management | Adapt | [37] | Effective communication; Effectiveness | (1) Useful
E5.2 | Quality of error messages | Adapt | [37] | Effective communication | (1) Useful
E6 | Consistency | Adapt | [37] | Voice interface | (1) Useful
E6.2 | External consistency | Adapt | [37] | Voice interface | (1) Useful
E7.1 | Short and long-term memory | Adapt | [37] | Effective communication | (1) Useful
E7.2 | Environment | Adapt | [37] | Customizable | (2) Important
E8.1 | Identity | Adapt | [37] | Voice interface | (3) Critical
E8.2 | Behavior | Adapt | [37] | Customizable; Security and privacy | (1) Useful

Appendix H. First Iteration, Step 8: “Refinement Stage”

Table A8. First iteration, Step 8 “Refinement stage”: Refinement of the first set of heuristics HVA.
ID | Refinement Section | Description | Action | Source
HVA1 | Definition | Include “illumination aspects”. | Add | Heuristic evaluation
HVA1 | Definition | Reduce for better understanding. | Modify | Expert judgment
HVA1 | Checklist | Include the following elements: the microphone states (on/off) are known by its lighting state; the device keeps its lights off while inactive; the device provides feedback after each action; the device indicates that the request is suspended when the user stops speaking for a period of time; the system provides an activation sound when listening starts. | Add | Heuristic evaluation
HVA2 | Name, Definition, Explanation | Modify to make them more comprehensible and representative. | Modify | Expert judgment
HVA2 | Checklist | Include the following elements: the device provides constructive help with errors and/or problems; the device clearly indicates the possible causes of errors. | Add | Heuristic evaluation
HVA2 | Checklist | Remove the following item: the device warns of possible situations when carrying out a particular action. | Remove | Expert judgment
HVA3 | Definition, Explanation | Remove the description related to “short or minimal activation command”. | Remove | Expert judgment
HVA3 | Specification table | Include the concept of “coherence”. | Add | Expert judgment
HVA3 | Specification table | Review the ease of use of the heuristic. | Analyze | Expert judgment
HVA3 | Checklist | Include the following elements: the device’s response is consistent with the user request; the device provides accurate and/or truthful information. | Add | Expert judgment
HVA3 | Checklist | Remove the following elements: the responses have a duration of approximately 8 s; voice commands consist of a phrase of 2 words at most. | Remove | Expert judgment
HVA4 | Name, Definition | Modify to make them more comprehensible and representative. | Modify | Expert judgment
HVA4 | Specification table | Remove the concept of “coherence”. | Remove | Expert judgment
HVA4 | Checklist | Include the following elements: the device remains listening for a few seconds when the user stops/is thinking in the middle of a request; the device allows the user to extend the conversation. | Add | Heuristic evaluation
HVA5 | Name, Explanation | Specify for better understanding. | Modify | Expert judgment
HVA5 | Definition | Include the concept of “idiolect”. | Add | Heuristic evaluation
HVA5 | Checklist | Include the following element: the artifact recognizes the user’s particular way of speaking in requests. | Add | Heuristic evaluation
HVA5 | Specification table | Analyze why HVA5 obtained 50% of correct associations. | Analyze | Heuristic evaluation
HVA6 | Checklist | Include the following element: the device maintains its formal language even in error situations. | Add | Expert judgment
HVA6 | Checklist | Expand the checklist listing. | Analyze, Add | Expert judgment
HVA8 | Name, Definition | Incorporate the concepts of voice shortcut and customization/adaptation. | Add | Expert judgment
HVA8 | Checklist | Include the following elements: the device allows the customization of the voice assistant; the device can configure sounds that indicate a particular action. | Add | Expert judgment
HVA9 | Checklist | Include the following element: the device clearly indicates the possible causes of errors. | Add | Expert judgment
HVA10 | Checklist | Include the following elements: the listening limit of the device must be within 2 m; the device provides help to the user regardless of the activity being performed. | Add | Heuristic evaluation
HVA11 | Specification table | Review the ease of use of the heuristic. | Analyze | Expert judgment

Appendix I. Criteria Used to Evaluate the Effectiveness of a New Set of Usability/UX Heuristics (From [2,41])

Table A9. Five criteria used to evaluate the effectiveness of a new set of usability/UX heuristics [2].
Criterion Description | Formula
1. Numbers of correct and incorrect associations of problems to heuristics
CA = (∑_{n=1}^{T} CAH_n / TP) × 100    IA = (∑_{n=1}^{T} IAH_n / TP) × 100
where
CA: correct associations
IA: incorrect associations
T: total number of heuristics of the set
CAH_n: number of correct associations of the problems to the heuristic “n”
IAH_n: number of incorrect associations of the problems to the heuristic “n”
TP: total usability/UX problems identified
2. Number of usability/UX problems identified
P1 = problems that are identified by both groups of evaluators (common problems identified by both groups)
P2 = problems that are identified only by the group that used the new set of heuristics (without considering the common problems)
P3 = problems that are identified only by the group that used the control heuristics (without considering the common problems)
3. Number of specific usability/UX problems identified
ESS = (NSP / TP) × 100
where
ESS: effectiveness
NSP: number of specific usability/UX problems identified
TP: total usability/UX problems identified
4. Number of identified usability/UX problems that qualify as more severe (how catastrophic the detected usability/UX problem is)
ESV = (NPV / TP) × 100
where
ESV: effectiveness
NPV: number of identified usability/UX problems qualified with a severity greater than 2
TP: total usability/UX problems identified
5. Number of identified usability/UX problems that qualify as more critical (how severe and frequent the detected problem is)
ESC = (NPC / TP) × 100
where
ESC: effectiveness
NPC: number of identified usability/UX problems qualified with a criticality greater than 4
TP: total usability/UX problems identified
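The five criteria reduce to simple percentage computations over evaluation counts. The sketch below illustrates them in Python with hypothetical counts (the values of CAH_n, IAH_n, TP, NSP, NPV, and NPC are placeholders, not results from the study):

```python
# Sketch of the five effectiveness criteria (Table A9), with hypothetical counts.
def pct(part, total):
    """Percentage helper; guards against an empty problem set."""
    return 100.0 * part / total if total else 0.0

# Hypothetical evaluation data: per-heuristic correct/incorrect associations.
correct_per_heuristic = [3, 2, 0, 1]    # CAH_n for each heuristic n = 1..T
incorrect_per_heuristic = [1, 0, 1, 0]  # IAH_n
TP = 10   # total usability/UX problems identified
NSP = 6   # specific problems identified
NPV = 4   # problems with severity greater than 2
NPC = 3   # problems with criticality greater than 4

CA = pct(sum(correct_per_heuristic), TP)    # criterion 1: correct associations
IA = pct(sum(incorrect_per_heuristic), TP)  # criterion 1: incorrect associations
ESS = pct(NSP, TP)                          # criterion 3: specific problems
ESV = pct(NPV, TP)                          # criterion 4: more severe problems
ESC = pct(NPC, TP)                          # criterion 5: more critical problems

print(CA, IA, ESS, ESV, ESC)  # 60.0 20.0 60.0 40.0 30.0
```

Criterion 2 (P1, P2, P3) is a plain partition of the identified problems into common and group-exclusive sets, so it needs no formula beyond set differences.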

Appendix J. Full HEUXIVA Specification, Using the Template Proposed in the Methodology Applied

Table A10. HEUXIVA 1: “System status visibility”.
ID: HEUXIVA1
Name: System Status Visibility
Definition: The device must indicate to the user, via voice, sound and/or illumination, every action that is performed.
Explanation: The device must communicate in a way that is sufficiently intuitive for the user, using the assistant’s voice intonation and emphasis at the beginning and end of the conversation to invite the user to continue the dialog. Likewise, to convey the system status, the assistant must communicate every action that has been performed, will be performed, or is being performed within the same context/situation or request.
Priority: (3) Critical
UX/Usability attributes: Usability: Effectiveness, Efficiency, Satisfaction; UX: Useful, Valuable
Voice assistant features: Effective conversation, Voice interface, Activity management
Related heuristics:
C1: Visibility of system status [10]
N1: Visibility of system status [31]
Checklist:
The device communicates using voice.
The device provides lighting signals when it interacts with the user.
The microphone state (on/off) is indicated by its state of illumination.
The device keeps its lights off while idle.
The device provides feedback after each action.
The device indicates that the request is suspended when the user stops talking for a period of time.
The device provides a wake-up sound when it starts listening to the user.
The device keeps the user informed about the status of a request.
When the device lights up, that is, when it is listening to the user, it must always provide a response.
Examples:
Compliance:
User: Ok Google, play music
Assistant: Ok, playing *song name* on Spotify
*Music starts playing*
Non-compliance:
User: Ok Google, tell me about my reminders for today
*Device lights turn on*
Assistant: *Silence*
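The specification template shown above (ID, name, definition, explanation, priority, attributes, features, related heuristics, checklist, examples) repeats for each of the thirteen heuristics. For evaluators building tooling around HEUXIVA, the template can be mirrored by a small data structure; the sketch below is a hypothetical illustration (the `Heuristic` class and `compliance` helper are not part of the methodology) with an abridged HEUXIVA 1:

```python
from dataclasses import dataclass, field

@dataclass
class Heuristic:
    """Hypothetical container mirroring the HEUXIVA specification template."""
    hid: str
    name: str
    definition: str
    priority: str                  # e.g. "(3) Critical"
    ux_attributes: list[str]
    va_features: list[str]
    related: list[str]             # e.g. ["C1 [10]", "N1 [31]"]
    checklist: list[str] = field(default_factory=list)

    def compliance(self, results: dict[str, bool]) -> float:
        """Fraction of checklist items an evaluator marked as satisfied."""
        if not self.checklist:
            return 0.0
        return sum(results.get(item, False) for item in self.checklist) / len(self.checklist)

heuxiva1 = Heuristic(
    hid="HEUXIVA1",
    name="System Status Visibility",
    definition="Indicate every action via voice, sound and/or illumination.",
    priority="(3) Critical",
    ux_attributes=["Effectiveness", "Efficiency", "Satisfaction", "Useful", "Valuable"],
    va_features=["Effective conversation", "Voice interface", "Activity management"],
    related=["C1: Visibility of system status [10]", "N1: Visibility of system status [31]"],
    checklist=["The device communicates using voice.",
               "The device provides feedback after each action."],
)
print(heuxiva1.compliance({"The device communicates using voice.": True}))  # 0.5
```

Recording checklist outcomes per heuristic in this form makes the per-heuristic compliance scores directly computable during a heuristic evaluation session.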
Table A11. HEUXIVA 2: “System guidance and capabilities”.
ID: HEUXIVA2
Name: System Guidance and Capabilities
Definition: The device must guide the user through dialog and activities using words that the user recognizes (and that do not increase their cognitive load). It should also clarify its capabilities in a simple way.
Explanation: The device must be capable of establishing a conversation with the user, guiding and orienting the user throughout the dialog so that the device can function correctly and the user does not get lost in the process. In turn, if the device lacks a feature and/or cannot carry out a request, it must explain in simple, natural language why it cannot execute the action.
Priority: (3) Critical
UX/Usability attributes: Usability: Effectiveness, Satisfaction; UX: Useful, Desirable, Usable, Valuable
Voice assistant features: Culturizable/adaptable, Voice interface, Effective communication, Activity management
Related heuristics:
C6: Help and guidance [10]
N6: Recognition rather than recall [31]
Checklist:
The system guides the dialog through validation questions with the user.
The device knows its capabilities.
The device allows users to perform and manage functional tasks (such as scheduling appointments or setting alarms) through voice commands.
The device provides help to the user regardless of the activity the user is doing.
The device has a maximum listening limit of 2 m.
Examples:
Compliance:
User: Ok Google, read my email
Assistant: My version does not allow me to perform this action; however, update 1.2 allows it
Non-compliance:
User: Ok Google, read my email
Assistant: I’m sorry, I didn’t understand you
Table A12. HEUXIVA 3: “Effective and fluid communication”.
ID: HEUXIVA3
Name: Effective and Fluid Communication
Definition: The device must adapt to the context and situations that arise in the conversation, as well as remember previous requests and conversations with the user.
Explanation: The device must communicate as effectively as possible with the user, respecting the context of the conversation and being prepared for pauses, conversation fillers, interruptions, dialog failures, and detours. In turn, the device must be able to remember previous conversations and/or requests from the user.
Priority: (3) Critical
UX/Usability attributes: Usability: Effectiveness, Efficiency; UX: Useful, Usable
Voice assistant features: Effective communication, Effective, Multi-user
Related heuristics:
H2: Context [11]
H3: Naturalness [11]
E7.1: Short- and long-term system memory [37]
E8.1: Identity [37]
Checklist:
The device provides a continuous conversation option and maintains context between consecutive interactions.
The device maintains intonation according to the context.
The device remembers requests made.
The device remains listening for a few seconds when the user stops/thinks in the middle of a request.
The device allows the user to extend the conversation.
Examples:
Compliance:
*User whispers*
User: It’s time to sleep
*GA whispers*
Assistant: I will play music to sleep, may you rest
Non-compliance:
*GA playing music*
User: Ok Google, pause
Assistant: According to the RAE, “pause” means a brief interruption of an action or movement.
Table A13. HEUXIVA 4: “Environment match between assistant and user language”.
ID: HEUXIVA4
Name: Environment Match Between Assistant and User Language
Definition: The device must understand the user’s particular way of speaking, in addition to interacting in their language, with words, phrases, and concepts familiar to the user.
Explanation: The device must be verbally adapted to the geographical location in which it is situated, allowing conversations that use the language, concepts, and expressions the user employs daily.
Priority: (2) Important
UX/Usability attributes: Usability: Effectiveness; UX: Useful, Valuable, Desirable
Voice assistant features: Culturizable/adaptable, Multi-user, Voice interface
Related heuristics:
C2: Match between system and the real world [10]
Checklist:
The device allows the user to manage aspects of its voice tone by voice or text.
The device recognizes the user’s languages.
The device responds according to the user’s language.
The device recognizes the user’s particular way of speaking in requests.
The device recognizes and differentiates the voices of multiple users, allowing everyone in the same environment to interact with the assistant naturally.
The device recognizes established informal words in the user’s language.
Examples:
Compliance:
User: Ok Google. How are you?
Assistant: I feel very well
*2 s later, the user switches to Spanish*
User: Ok Google, reproduce música
Assistant: Reproduciendo *name of song* en Spotify.
Non-compliance:
*User speaking English*
User: Ok Google, what time is it?
*GA answers in Spanish*
Assistant: Son las 8 de la mañana
Table A14. HEUXIVA5: “Information Accuracy”.
ID: HEUXIVA5
Name: Information Accuracy
Definition: The responses delivered by the device must be relevant, brief, and in line with what the user requests. Likewise, the device must provide truthful information during interaction with the user.
Explanation: For actions/requests to be more efficient and effective, the device’s responses must be coherent and truthful, that is, the information provided must be logical, realistic, and true. In turn, to hold the user’s attention, responses must be brief and contain the most essential and/or important part of what is requested.
Priority: (3) Critical
UX/Usability attributes: Usability: Effectiveness, Efficiency, Satisfaction; UX: Useful, Valuable
Voice assistant features: Effective conversation, Effective, Voice interface
Related heuristics:
E2.1: Brevity [37]
E2.2: Information density [37]
N8: Aesthetic and minimalist design [31]
N6: Recognition rather than recall [31]
Checklist:
The device mostly provides indispensable information.
The response of the device is coherent and cohesive with the user’s request.
The device provides accurate and/or truthful information.
The device provides the response quickly or in a reasonable time.
The device provides consistent information according to the date and time of the query.
Examples:
Compliance:
User: Ok Google, when did World War II start?
Assistant: The Second World War began on 1 September 1939.
Non-compliance:
User: Ok Google, what is the temperature?
Assistant: The current temperature in Valparaíso is 11 °C; for tomorrow a temperature of 16° is expected, with a maximum of 15° and a minimum of 7° and a probability of rain of 20%.
Table A15. HEUXIVA6: “User control and freedom”.
ID: HEUXIVA6
Name: User Control and Freedom
Definition: The device allows the user to perform, redo, and undo actions or requests.
Explanation: The device performs actions requested by the user and at their request, and can redo and undo these requests whenever the user deems it necessary.
Priority: (3) Critical
UX/Usability attributes: Usability: Satisfaction; UX: Credible, Valuable, Learning capacity, Useful
Voice assistant features: Activity management, Effective communication, Customizable
Related heuristics:
C3: User control and freedom [10]
N3: User control and freedom [31]
E3.1: Explicit user actions [37]
E3.2: User control (including ethics and privacy) [37]
Checklist:
The device executes the user’s requests.
The device must allow deleting, adding, and/or modifying actions.
The device should not do or undo actions without the user requesting it.
Examples:
Compliance:
User: Ok Google, delete my 7 p.m. alarm
Assistant: Ok, alarm deleted
Non-compliance:
User: Ok Google, delete my 7 p.m. alarm
Assistant: I can’t delete the alarm
Table A16. HEUXIVA7: “Consistent voice interface”.
ID: HEUXIVA7
Name: Consistent Voice Interface
Definition: The device must be able to provide information through voice while remaining consistent in its personality.
Explanation: The device should provide information and/or answers ideally through the voice interface and, in its interaction with the user, maintain a consistent personality, that is, a consistent voice/tone, language style, and set of sounds, so as not to confuse the user.
Priority: (2) Important
UX/Usability attributes: Usability: Satisfaction; UX: Credible, Desirable, Useful, Valuable
Voice assistant features: Effective conversation, Voice interface, Culturizable/adaptable, Activity management
Related heuristics:
C4: Consistency and standards [10]
N4: Consistency and standards [31]
E6: Consistency [37]
E6.2: External consistency [37]
Checklist:
The device maintains the chosen voice throughout the conversation.
The device communicates using a voice interface.
The device maintains its formal language in all situations.
The device uses a consistent tone, vocabulary, and personality across interactions.
Examples:
Compliance:
*It is 1:00 p.m. on 28 July*
User: Ok Google, read my reminders
Assistant: Today you have 2 reminders, one at 2:30 p.m. “Take pills” and another at 6:00 p.m. “Walk”. Do you want me to mention the week’s reminders?
Non-compliance:
*GA is playing music on Spotify*
User: Ok Google, how are you?
*GA answers in a feminine voice*
Assistant: I feel great today.
*User unlinks the GA connection with Spotify*
*GA answers in a masculine voice*
Assistant: Error when playing Spotify
Table A17. HEUXIVA 8: “Voice shortcuts, flexibility and personalization”.
ID: HEUXIVA8
Name: Voice Shortcuts, Flexibility and Personalization
Definition: The device must respond according to the environment in which the user is located, providing shortcuts according to the context, allowing customization, and adapting to the needs of the user.
Explanation: The device must be flexible enough to adapt to users’ needs and capabilities, including the type of user (novice, expert), physical environments, and aspects of device customization, in addition to providing voice shortcuts to perform an action more quickly.
Priority: (2) Important
UX/Usability attributes: Usability: Effectiveness, Efficiency, Satisfaction; UX: Usable, Learning capacity
Voice assistant features: Customizable, Multi-user, Multi-linkable
Related heuristics:
C7: Flexibility and efficiency of use [10]
N7: Flexibility and efficiency of use [31]
E4: Adaptability [37]
E4.1: Flexibility [37]
E4.2: Level of user experience [37]
E4.3: Multi-user [37]
E7.2: Environment [37]
Checklist:
The device responds to the user’s shortcut requests.
The device understands the shortcut context of the requests.
The device allows the creation of shortcuts.
The device allows voice customization of the assistant.
The device allows configuring sounds that indicate a particular action.
The device allows voice command customization.
The device allows linking or integrating external services and smart devices (e.g., music apps, lighting, appliances, temperature control) and enables their management through voice commands.
The device provides the option to add a word and customize it for use.
The device grants the option to adjust/customize the default settings.
Examples:
Compliance:
User: Ok Google, music.
Assistant: Ok, playing *song name* on Spotify
(The user can say “music” instead of “play music”)
Non-compliance:
User: Ok Google, music.
Assistant: I’m sorry, I didn’t understand.
Table A18. HEUXIVA 9: “Error prevention”.
ID: HEUXIVA9
Name: Error Prevention
Definition: The device must provide the necessary information to warn the user when an error is about to occur.
Explanation: When the user requests an action that could change the context of the interaction and/or an error is about to be triggered, the system must warn the user, communicating the consequences of the action that is about to be performed.
Priority: (2) Important
UX/Usability attributes: Usability: Effectiveness, Efficiency; UX: Useful
Voice assistant features: Effective conversation, Voice interface
Related heuristics:
H1: Complexity [11]
H3: Naturalness [11]
N5: Error prevention [31]
E1.2: Immediate feedback [37]
E5: Error management [37]
Checklist:
The device asks the user for confirmation before performing an action that could have consequences on the interaction.
The device rephrases unclear input for confirmation.
The device prevents accidental activation or unintended actions.
Examples:
Compliance:
User: Ok Google, play music
Assistant: Ok, playing music
User: Ok Google, call mom
Assistant: When calling, the music will stop; do you still want to call?
Non-compliance:
User: Ok Google, read me today’s news
Assistant: Here you have today’s news
*Reads the news*
User: Ok Google, I want to watch a YouTube video
Assistant: Ok, playing recommended videos on YouTube
*Stops reading the news*
Table A19. HEUXIVA 10: “Help users recognize, diagnose, and fix errors”.
ID: HEUXIVA10
Name: Help Users Recognize, Diagnose, and Fix Errors
Definition: Error messages should be expressed in simple language (not codes), accurately indicate the problem, and constructively suggest a solution, relying mostly on voice commands or actions.
Explanation: When an error or problem occurs during interaction, the device must state the error in language the user can understand and provide an appropriate solution and help, all preferably through the voice interface.
Priority: (3) Critical
UX/Usability attributes: Usability: Effectiveness, Efficiency; UX: Valuable, Useful
Voice assistant features: Culturizable/adaptable, Voice interface, Multi-linkable
Related heuristics:
C9: Help users recognize, diagnose and recover from errors [10]
N9: Help users recognize, diagnose and recover from errors [31]
E5.2: Quality of the error message (action proposal) [37]
Checklist:
The device provides constructive help in the event of errors and/or problems.
The device clearly indicates the possible causes of errors.
The device suggests possible solutions or recovery options.
Examples:
Compliance:
User: Ok Google, call Fernanda O.
Assistant: I’m sorry, I can’t do that. To make a call you must first link the device with Google’s Duo app.
Non-compliance:
*There is an alarm programmed for the next day at 9 p.m.*
User: Ok Google, create a new alarm for tomorrow at 9 p.m.
Assistant: Sorry, I didn’t understand
Table A20. HEUXIVA 11: “Data privacy”.
ID: HEUXIVA11
Name: Data Privacy
Definition: The device must inform the user about the privacy and use of personal data. Likewise, it must grant the possibility of rejecting the collection and analysis of their data, thus being transparent and truthful with the user.
Explanation: The device must request the user’s permission to use the data that will be collected over time, and the user must be able to reject this option.
Priority: (3) Critical
UX/Usability attribute: Usability: Satisfaction; UX: Valuable, Credible
Voice Assistant Feature: Security and privacy, Activity management
Sets of heuristics related
C11: Integrity [10]
Checklist
The device requests authorization for the use of the data collected during the dialog.
The device provides a section to manage privacy and security.
Privacy settings and permissions are easily accessible to users.
Examples
Compliance:
*Initializing the device for the first time*
Applsci 15 11178 i002 Hello, our conversations help me improve, do you allow me to collect data?
Applsci 15 11178 i001 No thanks
Applsci 15 11178 i002 Okay, the data from our conversations will not be collected.
Non-compliance:
*Initializing the device for the first time*
Applsci 15 11178 i002 Hello, our conversations help me improve, do you allow me to collect data?
Applsci 15 11178 i003 No thanks
Applsci 15 11178 i002 If you do not accept, I will not be able to function properly
Table A21. HEUXIVA 12: “Voice assistant reliability”.
ID: HEUXIVA12
Name: Voice Assistant Reliability
Definition: The device must convey reliability through its behavior, both during interaction with the user and while the user is inactive.
Explanation: The device must communicate to the user how active listening works in order to build trust between the user and the device. In turn, listening should be triggered only by the activation command.
Priority: (3) Important
UX/Usability attribute: UX: Valuable, Credible
Voice Assistant Feature: Customizable, Security and privacy
Sets of heuristics related
C11: Integrity [10]
E8.2: Behavior [37]
Checklist
The device only activates and interacts when called.
The device provides accurate feedback when unable to execute a command.
The device performs tasks accurately even under varying conditions (e.g., background noise).
Examples
Compliance:
Applsci 15 11178 i001 Ok Google, call Daniela
Applsci 15 11178 i002 Ok, calling Daniela
Non-compliance:
Applsci 15 11178 i003 *Talking to another person in the environment*
*GA device activates*
Applsci 15 11178 i002 Calling Daniela
Table A22. HEUXIVA13: “Guides and documentation”.
ID: HEUXIVA13
Name: Guides and Documentation
Definition: The device must provide simple and comprehensive physical or electronic documentation of its internal and external workings, available either on request from the user or through external search.
Explanation: The device must come with a user manual/guide that makes first use and installation (or reinstallation in a new location) easy for novice and first-time users, preferably delivered through the voice assistant itself (preset installation instructions described before the device is connected to the Wi-Fi). The documentation should contain all the information and usage examples the user needs to interact with the device properly, covering internal and external aspects (the device’s buttons and their operation) as well as its configuration.
Priority: (2) Important
UX/Usability attribute: Usability: Effectiveness, Satisfaction; UX: Findable/locatable, Valuable, Usable
Voice Assistant Feature: Guidance and assistance
Sets of heuristics related
N10: Help and documentation [31]
Checklist
The device has a virtual/physical manual.
The device provides access to guides or help resources through voice.
The user manual has steps for installing the device.
The device offers context-sensitive help based on user actions.
Examples
Compliance:
The device has a physical instruction manual and an online one on its website.
Non-compliance:
The device has no information on basic functions.

Appendix K. Coverage Matrix Linking Voice Assistant Features, Heuristics, Checklist Items, and Problem Types

Table A23. Coverage matrix for HEUXIVA heuristics.
Voice Assistant Feature | HEUXIVA Heuristic | Checklist Item (Example) | Problem Type (UX Aspect) | Example (Compliance/Non-Compliance)
Effective communication | HEUXIVA1, HEUXIVA2, HEUXIVA3, HEUXIVA5, HEUXIVA6, HEUXIVA7, HEUXIVA9 | (HEUXIVA1) The device has lighting signals when it interacts with the user. | Lack of system feedback | ✅ The device lights up and says “I’m listening”. / ❌ No response after the “wake” word.
| | (HEUXIVA6) The artifact executes the user’s requests. | Lack of control | ✅ The “Stop music” command immediately halts playback. / ❌ Must wait for the assistant to finish speaking.
| | (HEUXIVA9) The device rephrases unclear input for confirmation. | Ambiguous input handling | ✅ “Did you mean alarm for 7 AM or 7 PM?” / ❌ Executes the wrong command without clarifying.
Effective | HEUXIVA3, HEUXIVA5 | (HEUXIVA3) The device provides a continuous conversation option and maintains context between consecutive interactions. | Context loss | ✅ Understands the follow-up question “And what about tomorrow?” / ❌ Requires repeating the full command each time.
Activity management | HEUXIVA1, HEUXIVA2, HEUXIVA6, HEUXIVA7, HEUXIVA11 | (HEUXIVA2) The device allows users to perform and manage functional tasks (such as scheduling appointments or setting alarms) through voice commands. | Task management and functionality coverage | ✅ The assistant successfully schedules a meeting or sends a message via voice command. / ❌ The assistant fails to complete management actions or requires manual confirmation on a secondary device.
Customizable | HEUXIVA6, HEUXIVA8, HEUXIVA12 | (HEUXIVA12) The device performs tasks accurately even under varying conditions (e.g., background noise). | Performance | ✅ Recognizes commands in noisy environments. / ❌ Fails to respond when music is playing.
Multi-user | HEUXIVA3, HEUXIVA4, HEUXIVA8 | (HEUXIVA4) The device recognizes and differentiates the voices of multiple users, allowing everyone in the same environment to interact with the assistant naturally. | Multi-user inclusiveness | ✅ The assistant identifies different household members and adapts responses (e.g., personalized calendar or music). / ❌ Only responds to the registered user’s voice, ignoring others in the same space.
Security and privacy | HEUXIVA11, HEUXIVA12 | (HEUXIVA11) The device requests authorization for the use of the data collected during the dialog. | Transparency issue | ✅ “Do you agree to save this recording?” / ❌ Stores voice data automatically.
Multi-linkable | HEUXIVA8, HEUXIVA10 | (HEUXIVA8) The device allows linking or integrating external services and smart devices (e.g., music apps, lighting, appliances, temperature control) and enables their management through voice commands. | Integration | ✅ The assistant connects to Spotify and smart lights, allowing full control by voice. / ❌ Integration with external apps or devices fails or requires manual configuration.
Culturizable/adaptable | HEUXIVA2, HEUXIVA4, HEUXIVA7, HEUXIVA10 | (HEUXIVA10) The device suggests possible solutions or recovery options. | Lack of recovery adaptation | ✅ “Try saying the command again”. / ❌ Offers no instruction to fix the issue.
Voice interface | HEUXIVA1, HEUXIVA2, HEUXIVA4, HEUXIVA5, HEUXIVA7, HEUXIVA9, HEUXIVA10 | (HEUXIVA5) The response of the device is coherent and cohesive with the user request. | Irrelevant or excessive information | ✅ Gives only relevant weather data. / ❌ Reads the entire Wikipedia page.
| | (HEUXIVA7) The device uses a consistent tone, vocabulary, and personality across interactions. | Inconsistent persona | ✅ Maintains a friendly tone and terminology. / ❌ Changes voice or phrasing randomly.
Guidance and assistance | HEUXIVA13 | (HEUXIVA13) The device provides access to guides or help resources through voice. | Lack of support resources | ✅ “You can say ‘Help’ to learn available commands”. / ❌ No help option available.
The ✅ symbol shows an example of compliance with each heuristic, while the ❌ symbol shows an example of non-compliance.
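Treated as data, the coverage matrix also allows a quick completeness audit of an evaluation plan. The following sketch (the dictionary name and numeric abbreviations of the heuristic IDs are ours, not part of HEUXIVA) verifies that each of the 13 heuristics is reachable from at least one voice assistant feature:

```python
# Feature -> heuristic numbers, transcribed from Table A23 (HEUXIVAn -> n).
coverage = {
    "Effective communication": [1, 2, 3, 5, 6, 7, 9],
    "Effective": [3, 5],
    "Activity management": [1, 2, 6, 7, 11],
    "Customizable": [6, 8, 12],
    "Multi-user": [3, 4, 8],
    "Security and privacy": [11, 12],
    "Multi-linkable": [8, 10],
    "Culturizable/adaptable": [2, 4, 7, 10],
    "Voice interface": [1, 2, 4, 5, 7, 9, 10],
    "Guidance and assistance": [13],
}

# Every heuristic mentioned by at least one feature.
covered = {h for hs in coverage.values() for h in hs}
# Heuristics with no feature pointing at them (empty for Table A23).
missing = set(range(1, 14)) - covered
print(sorted(covered), missing)
```

An empty `missing` set confirms that no HEUXIVA heuristic is orphaned in the matrix; the same check can flag gaps if the matrix is adapted to another device.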

References

  1. Nielsen, J.; Molich, R. Heuristic evaluation of user interfaces. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems Empowering People—CHI ’90, Seattle, WA, USA, 1–5 April 1990; pp. 249–256. [Google Scholar] [CrossRef]
  2. Quiñones, D.; Rusu, C.; Rusu, V. A methodology to develop usability/user experience heuristics. Comput. Stand. Interfaces 2018, 59, 109–129. [Google Scholar] [CrossRef]
  3. Rzepka, C.; Berger, B.; Hess, T. Voice assistant vs. Chatbot–examining the fit between conversational agents’ interaction modalities and information search tasks. Inf. Syst. Front. 2022, 24, 839–856. [Google Scholar] [CrossRef]
  4. Santos, J.; Rodrigues, J.J.P.C.; Casal, J.; Saleem, K.; Denisov, V. Intelligent personal assistants based on internet of things approaches. IEEE Syst. J. 2016, 12, 1793–1802. [Google Scholar] [CrossRef]
  5. Santos, J.; Rodrigues, J.J.P.C.; Silva, B.M.C.; Casal, J.; Saleem, K.; Denisov, V. An IoT-based mobile gateway for intelligent personal assistants on mobile health environments. J. Netw. Comput. Appl. 2016, 71, 194–204. [Google Scholar] [CrossRef]
  6. Han, S.; Yang, H. Understanding adoption of intelligent personal assistants: A parasocial relationship perspective. Ind. Manag. Data Syst. 2018, 118, 618–636. [Google Scholar] [CrossRef]
  7. Aymerich-Franch, L.; Ferrer, I. Investigating the use of speech-based conversational agents for life coaching. Int. J. Hum. Comput. Stud. 2022, 159, 102745. [Google Scholar] [CrossRef]
  8. Massai, L.; Nesi, P.; Pantaleo, G. PAVAL: A location-aware virtual personal assistant for retrieving geolocated points of interest and location-based services. Eng. Appl. Artif. Intell. 2019, 77, 70–85. [Google Scholar] [CrossRef]
  9. Cowan, B.R.; Pantidi, N.; Coyle, D.; Morrissey, K.; Clarke, P.; Al-Shehri, S.; Earley, D.; Bandeira, N. “What can i help you with?” infrequent users’ experiences of intelligent personal assistants. In Proceedings of the 19th International Conference on Human-Computer Interaction with Mobile Devices and Services, Vienna, Austria, 4–7 September 2017; pp. 1–12. [Google Scholar]
  10. Langevin, R.; Lordon, R.J.; Avrahami, T.; Cowan, B.R.; Hirsch, T.; Hsieh, G. Heuristic evaluation of conversational agents. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, Virtual, 8–13 May 2021; pp. 1–15. [Google Scholar]
  11. Sánchez-Adame, L.M.; Mendoza, S.; Urquiza, J.; Rodríguez, J.; Meneses-Viveros, A. Towards a set of heuristics for evaluating chatbots. IEEE Lat. Am. Trans. 2021, 19, 2037–2045. [Google Scholar] [CrossRef]
  12. Zwakman, D.S.; Pal, D.; Triyason, T.; Vanijja, V. Usability of voice-based intelligent personal assistants. In Proceedings of the 2020 International Conference on Information and Communication Technology Convergence (ICTC), Jeju Island, Republic of Korea, 21–23 October 2020; pp. 652–657. [Google Scholar]
  13. Google Actions on Google Glossary (Dialogflow). 2024. Available online: https://developers.google.com/assistant/df-asdk/glossary (accessed on 15 January 2025).
  14. Google Nest Explore What You Can Do with Google Nest and Home Devices. 2025. Available online: https://support.google.com/googlenest/answer/7130274 (accessed on 15 January 2025).
  15. Google Nest Customize Smart Plug or Smart Switch Voice Commands with Device Type. 2025. Available online: https://support.google.com/googlenest/answer/9921419 (accessed on 15 January 2025).
  16. Google Nest Customize Your News Experience. 2025. Available online: https://support.google.com/googlenest/answer/7551674 (accessed on 15 January 2025).
  17. Google Nest Guests and Your Google Connected Home Devices. 2025. Available online: https://support.google.com/googlenest/answer/7177221 (accessed on 15 January 2025).
  18. García, N.H.; Martínez, I.L.; Gutiérrez, M.S.; Veracruzana, X. Development of new commands for Google Assistant using Dialogflow, Firebase and NodeMCU (ESP8266) as an intermediary. Abstr. Appl. 2020, 29, 74–87. [Google Scholar]
  19. Google Nest FAQs on Privacy: Google Nest. 2025. Available online: https://support.google.com/googlenest/answer/9415830 (accessed on 15 January 2025).
  20. Google Assistant What It Can Do—Get Started. 2025. Available online: https://assistant.google.com/learn/ (accessed on 15 January 2025).
  21. Google Assistant Control Smart Home Devices with Google Assistant. 2025. Available online: https://support.google.com/assistant/answer/7314909? (accessed on 15 January 2025).
  22. ISO 9241-210:2019; Ergonomics of Human-System Interaction—Part 210: Human-Centred Design for Interactive Systems. ISO: Geneva, Switzerland, 2019.
  23. Park, J.; Han, S.H.; Kim, H.K.; Cho, Y.; Park, W. Developing elements of user experience for mobile phones and services: Survey, interview, and observation approaches. Hum. Factors Ergon. Manuf. Serv. Ind. 2013, 23, 279–293. [Google Scholar] [CrossRef]
  24. Lykke, M.; Jantzen, C. User experience dimensions: A systematic approach to experiential qualities for evaluating information interaction in museums. In Proceedings of the 2016 ACM on Conference on Human Information Interaction and Retrieval, Chapel Hill, NC, USA, 13–17 March 2016; pp. 81–90. [Google Scholar]
  25. Morville, P. User Experience Design. Semantic Studios. 2004. Available online: https://semanticstudios.com/user_experience_design/ (accessed on 15 January 2025).
  26. Lewis, J.R. Usability: Lessons Learned… and Yet to Be Learned. Int. J. Hum. Comput. Interact. 2014, 30, 663–684. [Google Scholar] [CrossRef]
  27. Kendrick, A. Formative vs. Summative Evaluations. Nielsen Norman Group. 2019. Available online: https://www.nngroup.com/articles/formative-vs-summative-evaluations/ (accessed on 15 January 2025).
  28. Nielsen, J. Thinking Aloud: The #1 Usability Tool. Nielsen Norman Group. 2012. Available online: https://www.nngroup.com/articles/thinking-aloud-the-1-usability-tool/ (accessed on 15 January 2025).
  29. Experience Research Society UX Expert Evaluation. 2024. Available online: https://experienceresearchsociety.org/ux-methods/ux-expert-evaluation/ (accessed on 15 January 2025).
  30. Harley, A. UX Expert Reviews. Nielsen Norman Group. 2018. Available online: https://www.nngroup.com/articles/ux-expert-reviews/ (accessed on 15 January 2025).
  31. Nielsen, J. 10 Usability Heuristics for User Interface Design. Nielsen Norman Group. 2024. Available online: https://www.nngroup.com/articles/ten-usability-heuristics/ (accessed on 15 January 2025).
  32. Brooke, J. SUS-A quick and dirty usability scale. Usability Eval. Ind. 1996, 189, 4–7. [Google Scholar]
  33. Zwakman, D.S.; Pal, D.; Triyason, T.; Arpnikanondt, C. Voice usability scale: Measuring the user experience with voice assistants. In Proceedings of the 2020 IEEE International Symposium on Smart Electronic Systems (iSES) (Formerly iNiS), Chennai, India, 14–16 December 2020; pp. 308–311. [Google Scholar]
  34. Nielsen, J. Severity Ratings for Usability Problems. Nielsen Norman Group. 1994. Available online: https://www.nngroup.com/articles/how-to-rate-the-severity-of-usability-problems/ (accessed on 15 January 2025).
  35. ISO 9241-11:2018; Ergonomics of Human-System Interaction—Part 11: Usability: Definitions and Concepts. ISO: Geneva, Switzerland, 2018. Available online: https://www.iso.org/standard/63500.html (accessed on 1 June 2022).
  36. Nielsen, J. Usability 101: Introduction to Usability. Nielsen Norman Group. 2012. Available online: https://www.nngroup.com/articles/usability-101-introduction-to-usability/ (accessed on 15 January 2025).
  37. Nowacki, C.; Gordeeva, A.; Lizé, A.-H. Improving the usability of voice user interfaces: A new set of ergonomic criteria. In Proceedings of the Design, User Experience, and Usability. Design for Contemporary Interactive Environments: 9th International Conference, DUXU 2020, Held as Part of the 22nd HCI International Conference, HCII 2020, Copenhagen, Denmark, 19–24 July 2020; Proceedings, Part I. Springer: Berlin/Heidelberg, Germany, 2020; pp. 117–133. [Google Scholar]
  38. Google Store Nest Mini—Overview. 2025. Available online: https://store.google.com/us/product/google_nest_mini?hl=en-US (accessed on 15 January 2025).
  39. Scapin, D.L.; Bastien, J.M.C. Ergonomic criteria for evaluating the ergonomic quality of interactive systems. Behav. Inf. Technol. 1997, 16, 220–231. [Google Scholar] [CrossRef]
  40. de Barcelos Silva, A.; Gomes, M.M.; da Costa, C.A.; da Rosa Righi, R.; Barbosa, J.L.V.; Pessin, G.; De Doncker, G.; Federizzi, G. Intelligent personal assistants: A systematic literature review. Expert Syst. Appl. 2020, 147, 113193. [Google Scholar] [CrossRef]
  41. Quiñones, D.; Ojeda, C.; Herrera, R.F.; Rojas, L.F. UXH-GEDAPP: A set of user experience heuristics for evaluating generative design applications. Inf. Softw. Technol. 2024, 168, 107408. [Google Scholar] [CrossRef]
Figure 1. Steps and iterations performed to develop HEUXIVA.
Figure 1. Steps and iterations performed to develop HEUXIVA.
Applsci 15 11178 g001
Table 1. Effectiveness of HVA (first iteration).
 | Experimental Group | Control Group | Observations
Number of evaluators | 3 | 3 | -
Set of heuristics used | Heuristics for evaluating voice assistants (HVA) | Conversational agents’ heuristics (CAH) [10] | -
Number of heuristics | 12 | 11 | -
Total problems identified | 31 | 26 | -
Total correct associations | 15 | 17 | -
Total incorrect associations | 16 | 9 | -
Percentage of correct associations (CA) | CA1 = 48.38% | CA2 = 65.38% | CA1 < CA2: the control set performs better than the proposed set, as it has a higher percentage of correct associations (HVA requires refinement).
Percentage of incorrect associations (IA) | IA1 = 51.62% | IA2 = 34.62% | IA1 > IA2: the control set performs better, since the proposed set has a higher percentage of incorrect associations (HVA requires refinement).
Problems identified by both groups (P1) | 7 | 7 | The experimental group (P2) identified more problems than the control group (P3); the proposed set therefore performs better than the control set.
Problems identified by the experimental group (P2) | 24 | - | -
Problems identified by the control group (P3) | - | 19 | -
Number of specific problems identified | 19 | 14 | -
Effectiveness in terms of number of specific problems identified (ESS) | ESS1 = 61.29% | ESS2 = 53.84% | ESS1 > ESS2: the proposed set identified more specific problems than the control set, so it performs better.
Number of problems identified and rated with a severity greater than 2 | 13 | 19 | -
Effectiveness in terms of number of problems identified and rated with a severity greater than 2 (ESV) | ESV1 = 41.93% | ESV2 = 73.07% | ESV1 < ESV2: the control set performs better, since it finds more problems rated as severe than the proposed set (HVA requires refinement).
Number of problems identified and rated with a criticality greater than 4 | 15 | 22 | -
Effectiveness in terms of number of problems identified and rated with a criticality greater than 4 (ESC) | ESC1 = 48.38% | ESC2 = 84.61% | ESC1 < ESC2: the control set finds more problems rated as critical than the proposed set (HVA requires refinement).
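Each effectiveness measure in Table 1 is a simple proportion over the total number of problems the group identified. The calculation can be sketched as follows (the helper name `pct` is ours, not part of the methodology; note that some figures in the paper are truncated rather than rounded):

```python
def pct(part, total):
    """Percentage of `part` over `total`, rounded to two decimals."""
    return round(100 * part / total, 2)

# Experimental group (HVA): 31 problems in total, of which
# 15 correct associations, 19 specific problems,
# 13 with severity > 2, and 15 with criticality > 4.
total = 31
ca = pct(15, total)   # correct associations
ess = pct(19, total)  # specific problems
esv = pct(13, total)  # severity > 2
esc = pct(15, total)  # criticality > 4
print(ca, ess, esv, esc)
```

Running the same calculation for the control group (over its 26 problems) reproduces the CA2, ESS2, ESV2, and ESC2 columns.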
Table 2. Average perception scores for HVA set in the four evaluated dimensions (first iteration).
Heuristic | D1—Utility | D2—Clarity | D3—Ease of Use | D4—Need of Additional Elements
HVA1: System Status Visibility | 5.0 | 5.0 | 4.0 | 4.3
HVA2: Feedback and Help Users Prevent Errors | 4.6 | 4.6 | 3.6 | 4.3
HVA3: Brevity and Relevance of Information | 4.0 | 4.6 | 2.6 | 4.6
HVA4: Natural Communication | 4.6 | 4.0 | 3.3 | 5.0
HVA5: Match Between the System and the Real World | 4.3 | 4.3 | 4.3 | 4.3
HVA6: Consistent Voice Interface | 3.3 | 4.6 | 4.0 | 3.6
HVA7: User Control and Freedom | 4.3 | 4.6 | 3.3 | 3.6
HVA8: Flexibility and Personalization | 4.3 | 4.0 | 4.0 | 4.6
HVA9: Help Users Recognize, Diagnose, and Fix Errors | 4.3 | 4.3 | 4.3 | 4.0
HVA10: System Guidance and Capabilities | 4.6 | 4.3 | 4.6 | 4.0
HVA11: Reliability and Data Privacy | 4.0 | 4.3 | 2.6 | 5.0
HVA12: Guides and Documentation | 4.0 | 4.3 | 3.3 | 4.6
Average per dimension | 4.3 | 4.4 | 3.7 | 4.4
Table 3. Average perception scores for HEUXIVA set in the four evaluated dimensions (second iteration).
Heuristic | D1—Utility | D2—Clarity | D3—Ease of Use | D4—Need of Additional Elements
HEUXIVA1: System Status Visibility | 5.0 | 4.9 | 4.8 | 5.0
HEUXIVA2: System Guidance and Capabilities | 4.4 | 4.1 | 4.1 | 4.5
HEUXIVA3: Effective and Fluid Communication | 4.6 | 4.3 | 4.3 | 4.5
HEUXIVA4: Environment Match Between Assistant and User Language | 4.5 | 4.8 | 4.1 | 4.8
HEUXIVA5: Information Accuracy | 4.4 | 4.1 | 3.9 | 4.9
HEUXIVA6: User Control and Freedom | 5.0 | 4.0 | 4.8 | 4.9
HEUXIVA7: Consistent Voice Interface | 4.3 | 4.3 | 3.9 | 4.8
HEUXIVA8: Voice Shortcuts, Flexibility and Personalization | 4.5 | 4.3 | 4.0 | 4.8
HEUXIVA9: Error Prevention | 4.6 | 4.8 | 4.4 | 4.6
HEUXIVA10: Help Users Recognize, Diagnose, and Fix Errors | 4.9 | 5.0 | 4.9 | 4.9
HEUXIVA11: Data Privacy | 4.4 | 4.9 | 4.3 | 5.0
HEUXIVA12: Voice Assistant Reliability | 4.4 | 4.3 | 4.0 | 4.5
HEUXIVA13: Guides and Documentation | 3.8 | 5.0 | 4.9 | 4.5
Average per dimension | 4.5 | 4.5 | 4.3 | 4.7
Table 4. Comparison of results obtained in the expert judgment in the first and second iteration.
 | D1—Utility | D2—Clarity | D3—Ease of Use | D4—Need of Additional Elements
Average first iteration | 4.3 | 4.4 | 3.7 | 4.4
Average second iteration | 4.5 | 4.5 | 4.3 | 4.7
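The per-dimension figures in Table 4 are arithmetic means of the heuristic scores in Tables 2 and 3, and the improvement between iterations follows directly from them. A small sketch (variable names are ours):

```python
# Per-dimension averages from Tables 2 and 3.
first = {"D1": 4.3, "D2": 4.4, "D3": 3.7, "D4": 4.4}   # first iteration (HVA)
second = {"D1": 4.5, "D2": 4.5, "D3": 4.3, "D4": 4.7}  # second iteration (HEUXIVA)

# Gain per dimension between the two expert judgments.
deltas = {d: round(second[d] - first[d], 1) for d in first}
print(deltas)  # D3 (ease of use) shows the largest gain
```

The largest gain appears in D3 (ease of use), consistent with the refinements made between iterations.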
Table 5. Improvements in the perception of HEUXIVA6 and HEUXIVA7 in the expert judgment of iteration 2.
Heuristic | ID | Iteration | D1—Utility | D2—Clarity | D3—Ease of Use | D4—Need of Additional Elements
User Control and Freedom | HVA7 | 1st it. | 4.3 | 4.6 | 3.3 | 3.6
User Control and Freedom | HEUXIVA6 | 2nd it. | 5.0 | 4.0 | 4.8 | 4.9
Consistent Voice Interface | HVA6 | 1st it. | 3.3 | 4.6 | 4.0 | 3.6
Consistent Voice Interface | HEUXIVA7 | 2nd it. | 4.3 | 4.3 | 3.9 | 4.8
Table 6. Thinking aloud user test quantitative and qualitative results (second iteration).
Task (T) | Percentage of Task Fulfillment | Average Time | Observations | Most Expressed Emotions | Usability/UX Problems Related (P) | Heuristic Related (HEUXIVA)
T1: Make a call | 100% | 79.6 s
All users performed the task correctly.
Users appreciated that their requests were carried out quickly and efficiently. It was evident that there is a need for the device to communicate to the user through voice, sound, and/or light that it is performing an action.
Neutral (41.7%)
Happiness (33.3%)
P1: The user forgets the activation word
P2: The device does not understand what the user says
P1 is covered by HEUXIVA6
P2 is covered by HEUXIVA4
T2: Check available flights | 100% | 131 s
All users performed the task correctly.
Users become confused when they realize that sometimes the device cannot understand their requests.
It is necessary to reconsider whether the activation command is too complex for users.
Irritation (41.7%) and Confusion (33.3%)
P3: The device does not effectively communicate the error
P4: The user forgets the activation word
P3 is covered by HEUXIVA10
P4 is covered by HEUXIVA6
T3: Speak colloquially with the device | 100% | 77.8 s
All users performed the task correctly. Users expected the device to provide a response that matches their request. If the device performs an activity or delivers a response different from what was requested, users indicated that they tend to doubt both themselves and the device.
Neutral (41.7%) and Happiness (33.3%)
P5: The device does not respond immediately after the request is completed
P6: The device provides incoherent and unrelated responses to what the user requested
P7: The device does not understand the user’s idiolect
P5 is covered by HEUXIVA3
P6 is covered by HEUXIVA5
P7 is covered by HEUXIVA4
T4: Make queries in the area/field of History | 100% | 122.3 s
All users performed the task correctly. User feedback highlighted the importance of the device communicating in the user’s language and providing information in an understandable manner.
Neutral (33.3%) and Happiness (33.3%)
P8: The device provides extensive and confusing information
P9: The device does not follow instructions
P8 is covered by HEUXIVA5
P9 is covered by HEUXIVA1
T5: Set an alarm | 100% | 118.8 s
All users performed the task correctly. Users appreciated that the voice assistant responded to their requests; however, they became confused when they made a request and the device did not carry out the specified action.
Happiness (41.7%)
P10: The device provides extensive information
P11: The device interrupts the user while they are giving instructions
P10 is covered by HEUXIVA5
P11 is covered by HEUXIVA3
T6: Delete or edit an alarm | 100% | 85.3 s
All users performed the task correctly. Some users were confused and annoyed when they noticed that the device did not properly carry out the request they had just made.
Confusion (41.7%) and Irritation (33.3%)
P12: The device does not allow editing of an instruction
P13: The device performs a function different from what was requested
P14: The device does not effectively communicate the error
P15: The device ignores the user
P16: The device does not distinguish commands from questions
P12 is covered by HEUXIVA6
P13 is covered by HEUXIVA12
P14 is covered by HEUXIVA10
P15 is covered by HEUXIVA1
P16 is covered by HEUXIVA4
T7: Customize assistant attributes | 100% | 79.1 s
All users performed the task correctly. Users expected the voice assistant to allow them to perform the same actions they do when interacting with their mobile phone. Some users were surprised when the device redirected them to the mobile interface.
Irritation (33%) and Neutral (33%)
P17: The device requests manual configurations to be made
P18: The device provides lengthy instructions
P17 is covered by HEUXIVA7
P18 is covered by HEUXIVA5
T8: Find device | 91.66% (11 of 12) | 168.2 s
Most users completed the task. Users showed annoyance and/or frustration when they noticed that the device was not following their instructions, was delivering incorrect responses, and was also ignoring them.
Confusion (41.7%) and Irritation (33.3%)
P19: The device ignores the user (does not perform or respond to requests)
P20: The device performs a function different from what the user requested
P19 is covered by HEUXIVA1
P20 is covered by HEUXIVA12
Table 7. Problems detected in user testing and the related heuristics.
Heuristic | Number of Problems | Problems Related
HEUXIVA5: Information accuracy | 4
P6: The device provides incoherent and unrelated responses to what the user requested
P8: The device provides extensive and confusing information
P10: The device provides extensive information
P18: The device provides lengthy instructions
HEUXIVA6: User control and freedom | 3
P1: The user forgets the activation word
P4: The user forgets the activation word
P12: The device does not allow editing of an instruction
HEUXIVA4: Environment match between assistant and user language | 3
P2: The device does not understand what the user says
P7: The device does not understand the user’s idiolect
P16: The device does not distinguish commands from questions
HEUXIVA1: System status visibility | 3
P9: The device does not follow instructions
P15: The device ignores the user
P19: The device ignores the user (does not perform or respond to requests)
HEUXIVA3: Effective and fluid communication | 2
P5: The device does not respond immediately after the request is completed
P11: The device interrupts the user while they are giving instructions
HEUXIVA10: Help users recognize, diagnose, and fix errors | 2
P3: The device does not effectively communicate the error
P14: The device does not effectively communicate the error
HEUXIVA12: Voice assistant reliability | 2
P13: The device performs a function different from what was requested
P20: The device performs a function different from what the user requested
HEUXIVA7: Consistent voice interface | 1
P17: The device requests manual configurations to be made
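The tallies in Table 7 follow mechanically from the problem-to-heuristic mapping reported in Table 6. The aggregation can be sketched as follows (the dictionary name `problem_heuristic` is ours):

```python
from collections import Counter

# Problem -> covering heuristic, transcribed from the user-test results (Table 6).
problem_heuristic = {
    "P1": "HEUXIVA6", "P2": "HEUXIVA4", "P3": "HEUXIVA10", "P4": "HEUXIVA6",
    "P5": "HEUXIVA3", "P6": "HEUXIVA5", "P7": "HEUXIVA4", "P8": "HEUXIVA5",
    "P9": "HEUXIVA1", "P10": "HEUXIVA5", "P11": "HEUXIVA3", "P12": "HEUXIVA6",
    "P13": "HEUXIVA12", "P14": "HEUXIVA10", "P15": "HEUXIVA1", "P16": "HEUXIVA4",
    "P17": "HEUXIVA7", "P18": "HEUXIVA5", "P19": "HEUXIVA1", "P20": "HEUXIVA12",
}

# Number of detected problems covered by each heuristic (the counts in Table 7).
counts = Counter(problem_heuristic.values())
print(counts.most_common())  # HEUXIVA5 covers the most problems (4)
```

The same structure makes it easy to check that all 20 detected problems are covered and to spot which heuristics concentrate the most UX issues.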
Table 8. HEUXIVA: a set of Heuristics for Evaluating the User eXperience with Voice Assistants.
ID | Name | Description | Voice Assistant Feature | Usability/UX Attribute
HEUXIVA1 | System status visibility | The device must indicate to the user via voice, sound or light every action that is being performed. | Effective communication, Voice interface, Activity management | Effectiveness, Efficiency, Useful, Valuable, Satisfaction
HEUXIVA2 | System guidance and capabilities | The device must guide the user through dialog and activities using words that the user recognizes (without increasing their cognitive load). It should also clarify its capabilities in a simple way. | Culturizable/adaptable, Voice interface, Effective communication, Activity management | Useful, Effectiveness, Usable, Satisfaction, Desirable, Valuable
HEUXIVA3 | Effective and fluid communication | The device must adapt to the context and situations that arise in the conversation, as well as remember previous requests and conversations with the user. | Effective communication, Effective, Multi-user | Efficiency, Effectiveness, Useful, Usable
HEUXIVA4 | Environment match between assistant and user language | The device must understand the user’s particular way of speaking, in addition to interacting in their language with words, phrases and concepts familiar to the user. | Culturizable/adaptable, Multi-user, Voice interface | Useful, Effectiveness, Valuable, Desirable
HEUXIVA5 | Information accuracy | The responses delivered by the device must be relevant, brief and according to what is requested by the user. Similarly, the device must provide truthful information during interaction with the user. | Effective communication, Effective, Voice interface | Effectiveness, Efficiency, Useful, Valuable, Satisfaction
HEUXIVA6 | User control and freedom | The device allows the user to perform, redo, and undo actions or requests. | Activity management, Effective communication, Customizable | Credible, Valuable, Satisfaction, Learning capacity, Useful
HEUXIVA7 | Consistent voice interface | The device must be able to provide information through voice while remaining consistent in its personality. | Effective communication, Voice interface, Culturizable/adaptable, Activity management | Satisfaction, Credible, Desirable, Useful, Valuable
HEUXIVA8 | Voice shortcuts, flexibility and personalization | The device should answer depending on the environment in which the user is located, providing shortcuts according to the context, allowing customization and adapting to the needs of the user. | Customizable, Multi-user, Multi-linkable | Effectiveness, Efficiency, Satisfaction, Usable, Learning capacity
HEUXIVA9 | Error prevention | The device must provide the necessary information to warn the user when an error is about to occur. | Effective communication, Voice interface | Effectiveness, Efficiency, Useful
HEUXIVA10 | Help users recognize, diagnose, and fix errors | Error messages should be expressed in simple language (not codes), accurately indicate the problem, and constructively suggest a solution that mostly uses voice commands or actions. | Culturizable/adaptable, Voice interface, Multi-linkable | Valuable, Useful, Effectiveness, Efficiency
HEUXIVA11 | Data privacy | The device must inform the user about the privacy and use of personal data. Likewise, it must grant the possibility of rejecting the collection and analysis of their data, thus being transparent and truthful with the user. | Security and privacy, Activity management | Valuable, Satisfaction, Credible
HEUXIVA12 | Voice assistant reliability | Reliability must be conveyed through the behavior of the device both during interaction with the user and when the user is inactive. | Customizable, Security and privacy | Valuable, Credible
HEUXIVA13 | Guides and documentation | The device must provide simple and comprehensive physical or electronic documentation of the internal and external workings of the device, either on request from the user or through external search. | Guidance and assistance | Findable/locatable, Valuable, Useful, Effectiveness, Satisfaction
Table 9. Comparison between studies related to voice assistants.

| Study | Domain | Description | Number of Elements | Validation | Limitations |
|---|---|---|---|---|---|
| Nielsen heuristics (1990) [1,31] | General desktop applications | Set of heuristics; focus on usability. | 10 heuristics | Expert review, heuristic evaluation. | Not specific to voice assistants; limited to usability. |
| Cowan et al. (2017) [9] | Intelligent personal assistants (IPAs) | Six main areas related to usability/UX problems; focus on user experience. | 6 key themes | Not reported | Only qualitative analysis; does not propose heuristics. |
| Langevin et al. (2021) [10] | Conversational agents | Set of heuristics, adapted from Nielsen [31]; focus on usability. | 11 heuristics | Expert review, heuristic evaluation. | Not specific to voice assistants; limited to usability. |
| Sánchez-Adame et al. (2021) [11] | Chatbots | Set of heuristics; focus on usability. | 5 heuristics | Expert review, heuristic evaluation. | Only for text-based devices; limited to usability. |
| Zwakman et al. (2020) [12] | Voice assistants | Survey (scale), adapted from SUS [32]; focus on usability. | 10 items | Quantitative validation (exploratory factor analysis). | Does not propose heuristics; limited to perceived usability. |
| Nowacki and Gordeeva [37] | Voice user interfaces (VUIs) | Ergonomic criteria, based on [31,39]; focus on usability and ergonomics. | 8 criteria and 20 sub-criteria | Preliminary user testing | Preliminary validation; does not propose heuristics; limited to ergonomics. |
| HEUXIVA | Voice assistants | Set of heuristics, based on [9,10,11,12,31,37]; focus on user experience. | 13 heuristics | Heuristic evaluation, expert judgment, user testing. | Preliminary validation scope (single device). |
Table 10. Origin of each HEUXIVA heuristic and its contribution to UX evaluation.

| ID | Name | Type | Origin | Novel Aspect Introduced for UX Evaluation |
|---|---|---|---|---|
| HEUXIVA1 | System status visibility | Adapted heuristic | Heuristics: Nielsen and Langevin et al. | Focus on feedback (voice, light, sound) |
| HEUXIVA2 | System guidance and capabilities | Adapted heuristic | Heuristics: Nielsen and Langevin et al. | Guidance through dialog and capability explanation |
| HEUXIVA3 | Effective and fluid communication | New heuristic | Heuristics: Sánchez et al.; ergonomics criteria: Nowacki and Gordeeva | Conversational fluidity, contextual continuity, memory |
| HEUXIVA4 | Environment match between assistant and user language | New heuristic | Heuristics: Langevin et al. | Adaptation to user language and linguistic environment |
| HEUXIVA5 | Information accuracy | New heuristic | Heuristics: Nielsen; ergonomics criteria: Nowacki and Gordeeva | Accuracy, brevity, and contextual relevance of responses |
| HEUXIVA6 | User control and freedom | Adapted heuristic | Heuristics: Nielsen and Langevin et al.; ergonomics criteria: Nowacki and Gordeeva | Undo/redo through conversational commands |
| HEUXIVA7 | Consistent voice interface | New heuristic | Heuristics: Nielsen and Langevin et al.; ergonomics criteria: Nowacki and Gordeeva | Voice consistency and coherence |
| HEUXIVA8 | Voice shortcuts, flexibility and personalization | New heuristic | Heuristics: Nielsen and Langevin et al.; ergonomics criteria: Nowacki and Gordeeva | Voice shortcuts, customization, and adaptability |
| HEUXIVA9 | Error prevention | Adapted heuristic | Heuristics: Nielsen and Sánchez et al.; ergonomics criteria: Nowacki and Gordeeva | Preemptive voice feedback before execution |
| HEUXIVA10 | Help users recognize, diagnose, and fix errors | Adapted heuristic | Heuristics: Nielsen and Langevin et al.; ergonomics criteria: Nowacki and Gordeeva | Constructive voice-based error communication |
| HEUXIVA11 | Data privacy | New heuristic | Heuristics: Langevin et al. | Data transparency, privacy management, user consent |
| HEUXIVA12 | Voice assistant reliability | New heuristic | Heuristics: Langevin et al.; ergonomics criteria: Nowacki and Gordeeva | Reliability and trust in autonomous voice behavior |
| HEUXIVA13 | Guides and documentation | Adapted heuristic | Heuristics: Nielsen | Simplified physical and digital documentation |

Share and Cite

MDPI and ACS Style

Quiñones, D.; Rojas, L.F.; Serrá, C.; Ramírez, J.; Barrientos, V.; Cano, S. HEUXIVA: A Set of Heuristics for Evaluating User eXperience with Voice Assistants. Appl. Sci. 2025, 15, 11178. https://doi.org/10.3390/app152011178
