On the basis of the previously presented results, we extended our research by investigating further related work, formulating a final pattern model and evaluating a software representation of this model. In this regard, the second review is not limited to representative work, but includes a comprehensive analysis of the current research field. Here, the consolidation, implementation and evaluation of design patterns are based on the methodology established in our initial review. Individual components of our second review are explained in the course of this section.
4.1. Final Design Pattern Model
In our finalization process, five additional papers comprising a total of 82 patterns were explored and evaluated. Here, we considered AR-specific works as well as research from general mobile software development. In the following, we first present our research results and then discuss how these were reflected in the refinement of our model.
As described in Section 2, only a few works deal with patterns for general AR UI design. One of these works was presented by Endsley et al. [14], who consider various design heuristics and derive a final list of nine AR design patterns, including the minimization of mental load or the alignment of the virtual and physical world. Although AR-specific research often focuses on a specific use-case or target group, these papers usually also consider generally applicable patterns that might be valuable for our model refinement. Here, Agati et al. [17] focus on the industrial sector by performing a profound aggregation of relevant work, but also include universal patterns spanning categories such as “Cognitive” or “Ergonomics.” In addition, Tuli and Mantri [16] focus on AR games for kindergarten children, but also consider general patterns such as “Learnability” or “Consistency.” For a broad exploration of the research domain, we not only explored AR-specific research, but also related work from the field of general mobile software development. Here, both Kumar and Goundar [24] and Dourado and Canedo [25] reviewed the Nielsen usability heuristics (see [26]) in the context of mobile applications. This basic set of patterns has been expanded to include heuristics such as “Content Organization” and “Visual Representation” [24] or “Privacy” and “Efficiency of Use” [25].
For our model refinement, we individually evaluated all patterns from the mentioned research and grouped them based on their core objective, excluding patterns that are strongly use-case dependent or only applicable in niche situations. As shown in Table 3, we consolidated heuristics based on their featured patterns while listing their source of origin. It is noticeable that some patterns are mentioned by almost all considered works, such as Aesthetic & Minimalist Design or World Consistency, whereas others are only mentioned by a small number of papers, such as Multimodality or Hierarchy & Navigation. Furthermore, it becomes apparent that many of the listed patterns seem to be universally applicable to mobile software. This is not surprising, since mobile AR applications tend to share similarities with conventional mobile applications. However, patterns that seem universal at first glance reveal specific challenges in the AR context, which we will highlight in the course of this section. It is further evident that the consolidated heuristics overlap with our initial research, although new aspects were discovered that had not been considered before, such as Reduce Cognitive Demand.
The consolidated heuristics were reflected in the refinement of our pattern model, which was derived from our initial research. As presented in Figure 2, we still distinguish between the categories Information, Interaction, Support and Cognition. Nevertheless, the initial category Usage was replaced by Design, since many Usage aspects are already considered in the Interaction category. We transferred these patterns to the relevant category and focused on actual design aspects within the Design category. Individual categories and their included patterns are explained in the further course of this section, with a special focus on modifications to the initial classification and AR-specific aspects.
Within our pattern model, the Design category encompasses UI aspects that deal with aesthetics as well as with the semantic correspondence of coexisting virtual and physical objects. As already mentioned, we deviate from our initial categorization based on Ko et al. [12] by focusing on design artifacts and transitioning interaction aspects to the relevant category. This categorization is also reported in related work. For instance, Kumar and Goundar [24] address artifacts of visual design by defining the category Visual Representation. Tuli and Mantri [16] further describe an Orientation category which includes aspects of appealing design and consistent worlds, which is in line with our categorization.
The number of patterns in the Design category has been reduced to serve simplicity. Removed patterns were either merged with existing patterns or reclassified. The initial pattern Control Mapping, which deals with the mapping of control elements to single actions, has been assigned to the Aesthetic & Minimalist Design pattern, which addresses simple and appealing design choices. Here, UI designers should limit UI elements to a minimum and avoid visual clutter that may obstruct the AR view. Furthermore, the former patterns Context-based and Seamful Design were assigned to the World Consistency pattern. This pattern aims at aligning the virtual and physical world, considering various application contexts, while preventing a disconnect between user and system, e.g., by addressing tracking limitations. When aligning the physical and virtual world, UI designers should ensure that physical and virtual entities follow common laws, so that, for example, a virtual representation of a cube follows the same laws of gravity as its physical counterpart. In addition, the initial Device Metaphor pattern, which deals with the perception of device familiarity, was extended and renamed to Semantic Correspondence, aiming at also attributing this familiarity to virtual artifacts. Here, virtual objects should be semantically aligned with their real-world functionality such as a lightning icon triggering the camera flash.
Within our Information category, we followed the initial definition by Ko et al. [12], addressing the accessibility and visibility of consistent information. In our final refinement, we further concentrated on the accessibility and provision of information and moved aesthetic aspects to the appropriate category. Kumar and Goundar [24] describe a similar Content Organization category, which comprises the localization and structuring of content.
Within our final model, the Accessibility pattern ensures the structure and organization of content, as well as the management of AR-specific information, suggesting that less important information be nested deeper in the application, while significant information is provided instantly. Here, the hierarchical information structure should follow conventional organization paradigms, such as prioritizing newly added information over older information. UI designers should further ensure that augmented data are accessible regardless of the current field of view. Thus, this pattern comprises the initial patterns Default, dealing with the initial configuration of the UI, and Hierarchy, Navigation & Availability, addressing content organization. The former Enjoyment pattern, which ensures an appealing UI presentation, was assigned to Aesthetic & Minimalist Design and thus moved to the Design category. The relevance of the Multimodality pattern was revealed by our initial study and confirmed by the analysis of further work and was thus adopted in our final model. Here, multiple interaction methods should be offered to enhance the freedom of use, which we found to be crucial in AR interactions to ensure a positive user experience. For example, sensors such as accelerometers could be used for motion tracking, and responsive sound or vibration could satisfy auditory and tactile senses and thus serve usability purposes. The Visibility pattern has been mentioned in several related works and has been adopted from our initial model as well. This pattern aims to ensure that the visibility of AR content is not hindered, e.g., by an overloaded UI. In addition, fixed virtual objects may be off-screen within the dynamic AR view. In this case, UI designers should provide visual cues that point to the off-screen objects’ positions. Within our initial research, we observed a highly significant relationship between Consistency and a positive usability. We adopted this pattern in our final model, aiming to ensure a homogeneous design that prevents confusion, e.g., by enabling users to attribute one function’s behavior to a visually similar function.
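The visual cue for off-screen objects described under the Visibility pattern can be illustrated with a minimal sketch. It assumes that the AR framework already provides the object's projected position in screen pixels (possibly outside the viewport); the function name `offscreen_cue`, its parameters and the margin value are hypothetical and not taken from the colAR implementation.

```python
import math

def offscreen_cue(obj_x, obj_y, width, height, margin=24.0):
    """Return None if the object is visible, else (cue_x, cue_y, angle_deg).

    obj_x, obj_y: projected object position in screen pixels (assumed input).
    The cue is clamped to the screen border; the angle points from the
    screen center towards the object (screen coordinates, y grows downward).
    """
    if 0 <= obj_x <= width and 0 <= obj_y <= height:
        return None  # object is on screen, no cue needed
    cue_x = min(max(obj_x, margin), width - margin)
    cue_y = min(max(obj_y, margin), height - margin)
    center_x, center_y = width / 2, height / 2
    angle = math.degrees(math.atan2(obj_y - center_y, obj_x - center_x))
    return cue_x, cue_y, angle

# Hypothetical usage: an object far to the right of a 1080 x 1920 portrait view
print(offscreen_cue(2000, 960, 1080, 1920))
```

Clamping the cue to the border keeps the camera view itself uncluttered, which is in line with the arrangement of UI elements towards the screen edge described for the Visibility pattern.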
The initial Interaction category based on Ko et al. [12] is also reflected in our final pattern model, while retaining the initial intention of bundling patterns that address interaction aspects. Here, we focused on user-controlled interactions that trigger feedback from the system and may cause physical effort. As described below, we acquired patterns from other initial categories, such as the initial Usage category.
Within the Interaction category, the Feedback pattern was adopted from our initial model, as it was reflected in several related works. It aims at constantly informing users about the current system state, e.g., through simple loading icons or progress bars communicating computing progression. The initial pattern Landmarks has not been referred to in other considered works and is thus assigned to the Feedback pattern, as it is intended to provide virtual feedback on spatial navigation. Reduce Physical Effort was adopted as well, reflecting the need for awareness of physical efforts in AR applications through minimizing physical fatigue. In our final model, this pattern also addresses physical limitations that need to be considered since they may restrict engaging with augmented data, formerly defined as the Body Constraints pattern. We attribute special relevance to this consolidated pattern in the AR context, as many applications tend to require movement in physical space. UI designers should be aware of this aspect, minimize the steps required to perform tasks and arrange frequently used elements to be easy to reach, e.g., at the bottom of the screen when a smartphone is held vertically. The User Control pattern was moved from the Support category, since it explicitly deals with the perceived control through interacting with UI artifacts. This pattern is similarly interpreted across most research papers as focused on providing sufficient interactions while preventing users from being stuck in an unwanted state. In AR applications, user control can be expressed, for example, by providing free specification of object attributes, such as scaling or positioning. Another more conventional example is the undo and redo functionality, which is applied throughout the software landscape to reverse or restore actions. Since the pattern Personal Presence showed no significant relationship to a positive usability in our initial research and was not referred to in the considered related work, we did not include this pattern in our final model. We assume that Personal Presence is more crucial in purely virtual and thus more immersive worlds, which could explain the seemingly minor significance for AR UIs.
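The undo and redo functionality mentioned as an example of User Control can be sketched with two stacks. The following is an illustrative sketch only; the class names `ObjectState` and `History` and the tracked attributes (scale and position of a virtual object) are assumptions and do not describe an existing implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ObjectState:
    """Hypothetical snapshot of a virtual object's user-adjustable attributes."""
    scale: float = 1.0
    x: float = 0.0
    y: float = 0.0

class History:
    """Two-stack undo/redo: every applied change can be reversed or restored."""

    def __init__(self, initial):
        self._undo = []
        self._redo = []
        self.current = initial

    def apply(self, new_state):
        self._undo.append(self.current)
        self._redo.clear()  # a new action invalidates the redo branch
        self.current = new_state

    def undo(self):
        if self._undo:
            self._redo.append(self.current)
            self.current = self._undo.pop()
        return self.current

    def redo(self):
        if self._redo:
            self._undo.append(self.current)
            self.current = self._redo.pop()
        return self.current

# Hypothetical usage: scale an object, then reverse and restore the change
history = History(ObjectState())
history.apply(ObjectState(scale=2.0))
print(history.undo())  # back to the initial state
print(history.redo())  # scaled state restored
```

Keeping states immutable means that a snapshot stored on either stack cannot be changed by later interactions, so users can always return to a known state.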
The Support category was adopted from the initial Ko et al. [12] definition, but interpreted differently in our final refinement. We adopted aspects of system adaptability as well as help instructions, but expanded this category with patterns concerning error management, which were similarly classified by Tuli and Mantri [16]. Within the Support category, the Help & Documentation pattern was considered by several research papers and thus adopted, aiming at dynamically providing instructions and helpful information, e.g., through introducing core functionality at first use or by providing instructions on a dedicated help page. The Personalization pattern was adopted as well, but renamed to Customization & Personalization to acknowledge the often-mentioned Customization more explicitly, which refers to user-driven aspects of Personalization. This pattern aims at enabling users to adjust UI artifacts to personal preferences, such as changing settings or creating personalized shortcuts to allow an individual application usage. As mentioned earlier, the interactive aspect of the former User Control & Responsiveness pattern has been moved to the Interaction category. However, the further aspect of this pattern, ensuring a stable user experience without interruptions, has been more explicitly included in the pattern Error Prevention & Management, aiming at avoiding unwanted system states through extensive early testing. Here, error handling ranges from textual prompting to assisting throughout a recovery process.
The Cognition category was initially adopted from Ko et al. [12] and retained, as this category is reflected in multiple related works. In particular, Agati et al. [17] address Cognition with respect to memory processes and Tuli and Mantri [16] refer to learnability and memory load as cognitive aspects.
Within the Cognition category, the Learnability, Recognition & Predictability pattern was adopted, but simplified to Learnability following related work, since it can be argued that Recognition and Predictability are merely facets of Learnability. Learnability aims at implementing UI artifacts that are easily recognizable and build upon each other. Thus, an application should be easy to use from the initial start-up and slowly introduce more complex functions. The Cognition category was further extended by the Reduce Cognitive Demand pattern, aiming at minimizing the amount of information to process and memorize during usage. Augmented data should not be overly complex to a point where users’ perception is compromised. Due to the coexistence of physical and virtual information, there is a constant risk of information overload in AR UIs, which leads to a high cognitive demand that should be prevented.
The final pattern model presented in this section was examined as part of our second evaluation. For this purpose, a software representation of this model was implemented, which will be discussed in detail in the following section.
4.2. colAR Application
Within our final review, the AR application colAR was implemented as a vehicle to verify the pattern model. This Android application allows the user to pick real-world colors from the physical environment and to apply them to virtual objects that are positioned on the desired color within the physical space (see Figure 3). This AR application represents 14 out of 15 patterns from our final pattern model. The Multimodality pattern was not included in our analysis, since it was already evaluated in our initial review and could not be usefully considered within the application’s use-case.
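To make the stated coverage of 14 out of 15 patterns easy to retrace, the final model can be written down as a simple mapping from categories to patterns. This is a sketch for illustration: the grouping follows the category descriptions in Section 4.1 as we read them, and the names `PATTERN_MODEL`, `NOT_IMPLEMENTED` and `coverage` are hypothetical.

```python
# Final pattern model, grouped by category as described in Section 4.1
PATTERN_MODEL = {
    "Design": [
        "Aesthetic & Minimalist Design",
        "World Consistency",
        "Semantic Correspondence",
    ],
    "Information": ["Accessibility", "Multimodality", "Visibility", "Consistency"],
    "Interaction": ["Feedback", "Reduce Physical Effort", "User Control"],
    "Support": [
        "Help & Documentation",
        "Customization & Personalization",
        "Error Prevention & Management",
    ],
    "Cognition": ["Learnability", "Reduce Cognitive Demand"],
}

# Pattern not represented in the colAR application
NOT_IMPLEMENTED = {"Multimodality"}

def coverage(model, excluded):
    """Return (number of represented patterns, total number of patterns)."""
    all_patterns = [p for patterns in model.values() for p in patterns]
    represented = [p for p in all_patterns if p not in excluded]
    return len(represented), len(all_patterns)

print(coverage(PATTERN_MODEL, NOT_IMPLEMENTED))
```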
As shown in Figure 3, the Aesthetic & Minimalist Design pattern was addressed by integrating simple icons with clear shapes and matching colors. Here, established design icons were applied to represent basic functionality such as accessing the settings view. These icons were explicitly chosen to represent the individual functionalities, ensuring Semantic Correspondence. For instance, the button to pick a color visually represents a pipette while the button to change an object dynamically changes based on the currently used object. World Consistency was addressed by the application’s use-case, the assignment of real-world colors to virtual objects, which serves the alignment of both worlds. A disconnect between system and user was further prevented by avoiding advanced terminology and illogical order. Accessibility was ensured through positioning all frequently needed functions at the bottom of the main view, which also addresses Reduce Physical Effort, since all main functions are easy to reach while holding the smartphone vertically. To increase Visibility, icons were arranged towards the edge of the screen to leave room for the camera view. Artifacts overlapping the view, such as the color information panel, are additionally designed to be compact and slightly transparent. Consistency is ensured by matching text fonts and colors across all UI artifacts throughout the runtime of the application. In order to address the Feedback pattern, users are visually notified when a system state changes. For instance, a newly applied color will instantly appear as the current color, while the former color is automatically listed as the previous one. User Control was implemented through the ability to freely move and scale virtual objects using a two-finger pinch. Within the settings menu, an instruction tab informs users about basic functionalities, addressing Help & Documentation. The Customization & Personalization pattern was considered by allowing virtual objects to be individually adjusted in size and color. Error Prevention & Management has not been translated into a specific implementation, but has been addressed through intensive testing. Learnability was ensured by slowly introducing more advanced functionality. Here, textual hints instruct new users on how to place virtual objects. Finally, the pattern Reduce Cognitive Demand was addressed by not requiring memorization of information between application states. As an example, the information about re-applicable current and previous colors is accessible without the need to recall them.
In sum, the colAR application serves as a representation of the final pattern model and thus as a vehicle for evaluation, which was conducted within a user-based setting. This empirical evaluation is explained in the following section.
4.3. Empirical Study
Our empirical investigation of the colAR application comprises various aspects of usability, addressed by different research questions. These research questions are explained in the current section, along with the experimental design, the results and the discussion of results.
4.3.1. Research Questions
Similar to our initial evaluation, the primary focus of our second evaluation is whether the consolidation of UI patterns in our refined final pattern model is related to a positive usability. This results in the following research question:
RQ3: Is the consideration of the final pattern model related to a positive usability?
In order to broadly investigate our model’s relevance, we further examine if an explicit non-consideration of this model is related to a more negative usability. Thus, the following research question was formulated:
RQ4: Is there a difference in usability rating between an application considering the pattern model and an equivalent application neglecting it?
Since AR is still a novel technology, it can be assumed that users have different levels of prior experience. Thus, it cannot be dismissed that the individual perception of usability might be connected to prior technological experience, which is addressed by the following research question:
RQ5: Does prior AR experience correlate with the usability rating?
In order to investigate the formulated research questions, an additional model-neglecting version of the colAR application was implemented, which will be explained prior to the experimental design.
Within our empirical study, two variants of the colAR application were implemented to investigate the aforementioned research questions. The version referred to as Variant A represents the original colAR application with consideration of the final pattern model (see Section 4.2). An additional Variant B was developed, representing an application variant that explicitly neglects our pattern model.
As shown in Figure 4, Variant B does not feature an Aesthetic & Minimalist Design, since approximately
of available screen space is covered by a black bar, which also weakens the Visibility. Differences in icon colors, shapes and depths are further supposed to contradict an aesthetic design. The non-uniformity in text fonts and icons, where each icon follows a different design philosophy, further contradicts Consistency. The Semantic Correspondence was disregarded by representing the color information button by a palette symbol, which usually represents a color selection instead of the mere presentation of information.
Furthermore, the gear icon for the settings view has been replaced by an icon that rather refers to a context menu. Instant visual Feedback was disregarded by moving the dynamic “Change Object” button to the settings view. The information panel listing current and previous colors was moved to the settings view as well, ensuring no direct visual feedback is provided. This is additionally intended to weaken the World Consistency by creating a disconnect between the user and the AR experience. Accessibility and Reduce Physical Effort are also affected, as users are forced to repeatedly open and close a dedicated view to use the application. The former direct object manipulation now requires five consecutive actions, which also weakens Reduce Cognitive Demand, since the click path to the settings as well as the configurations themselves need to be memorized. The mentioned textual hints to place virtual objects were removed to decrease the application’s Learnability, which was further impeded by a reduced Help & Documentation. Nevertheless, in order to ensure general functionality of both application variants, the User Control pattern was kept but impeded. Here, the former free positioning and scaling of virtual objects by a pinch gesture has been replaced with a simple slider.
4.3.3. Experimental Design
In our empirical study, participants were randomly assigned to either Variant A or Variant B within a between-subjects design. Participants were first informed about the general use-case of the application (“With colAR, users are able to detect colors in the environment and apply them to a virtual object. […] Once an area is properly scanned, users can place and move a virtual 3D object within the set boundaries. […]”). Participants then received a worksheet, which was available throughout the whole interaction and contained a total of eight tasks, defined to ensure that the application’s entire range of functions and thus all implemented patterns are perceived. These tasks range from simple actions such as scanning a surface or toggling objects to more complex tasks such as the colorization of virtual objects based on real-world colors or browsing through usage history (e.g., “Assign colors to set orange as the previous color and red as the current color.”). In addition, tasks required virtual objects to be scaled to the proportions of present physical objects.
In order to perform given tasks, a physical table was provided on which ten colored sheets were laid out to serve as templates for color picking and thus ensure comparability across interactions (see Figure 3 and Figure 4). These sheets were placed in a 2 × 5 arrangement, where every tile was assigned to a high-contrast neighbor. Colors meant to be scanned within the interaction (orange, blue, red) were placed in different sections of the table to force participants to engage in the positioning of virtual objects. Additionally, physical ornaments were placed on the bottom right corner of the table to assist in the scaling of a virtual sphere to a physical sphere’s proportions. During the interaction, each participant was restricted to the same position in front of the physical table to ensure comparable conditions.
After completing the given tasks, participants filled in several questionnaires addressing different aspects of the application’s usability. First, the System Usability Scale (SUS) [21] was applied to consider general usability, as in our initial evaluation (see Section 3.3). We further surveyed the Raw Task Load Index (RTLX) [27], a simplified version of the well-known NASA-TLX [28], to capture the perceived workload across different dimensions, such as the mental demand (e.g., “How mentally demanding was the task?”) or the frustration level (e.g., “How insecure, discouraged, irritated, stressed, […] did you feel during the task?”) on a 20-point Likert scale from 1 (“Very Low”) to 20 (“Very High”), resulting in an overall perceived workload (RTLX score) from 0 (low) to 100 (high). In addition, the time required to complete tasks was measured as an indicator for efficiency. The level of prior experience was also surveyed in order to investigate the relationship of prior experience and usability. Participants were asked to rate their prior experience on a 5-point Likert scale from 1 (“I have never dealt with AR applications before.”) to 5 (“I am well versed in AR applications.”). The study concluded with the collection of demographic data.
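For readers who want to retrace how the two questionnaire scores are typically derived, the following sketch shows the standard SUS scoring rule and an unweighted RTLX average. The SUS rule (odd items contribute the rating minus 1, even items 5 minus the rating, sum multiplied by 2.5) is the commonly used scoring scheme. The RTLX rescaling (each 1 to 20 rating multiplied by 5, then averaged over six subscales) is an assumption, since the exact rescaling to the 0 to 100 range is not spelled out here; function names and example ratings are hypothetical.

```python
def sus_score(responses):
    """SUS score (0-100) from ten ratings on a 1-5 scale, in item order."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly ten item ratings")
    total = 0
    for item, rating in enumerate(responses, start=1):
        # odd items are positively worded, even items negatively worded
        total += (rating - 1) if item % 2 == 1 else (5 - rating)
    return total * 2.5

def rtlx_score(ratings, points=20):
    """Unweighted mean of six subscale ratings, rescaled to a 100-point range.

    Assumption: each rating is multiplied by 100 / points before averaging.
    """
    if len(ratings) != 6:
        raise ValueError("RTLX requires exactly six subscale ratings")
    return sum(r * (100 / points) for r in ratings) / len(ratings)

# Hypothetical single-participant ratings (not study data)
print(sus_score([5, 1, 5, 2, 4, 1, 5, 1, 4, 2]))
print(rtlx_score([6, 3, 5, 4, 7, 2]))
```

Note that under this assumed rescaling, a scale from 1 to 20 yields values between 5 and 100; a score range starting at 0 would correspond to ratings that start at 0.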
The evaluation procedure described in our second study builds upon the methodology established in our initial study, enabling the comparison of results between both studies. However, the second study covers additional usability dimensions and adds A/B testing to broadly investigate the model’s relevance.
4.3.4. Sample and Results
Our empirical study was conducted with ten participants, of whom eight were male and two female. Participants were on average () years old. As mentioned before, participants were randomly assigned to one of our variants, resulting in two groups of five participants.
To investigate RQ3, the usability of Variant A was analyzed on multiple levels. The SUS score of Variant A results in a total score of
) with a minimum single score of
and a maximum of
. Variant A
achieved an RTLX score of
) with a minimum single score of
and a maximum of
. To investigate whether the perceived workload relates to the overall usability, these variables were examined in a correlation. As displayed in Figure 5, the SUS score and the RTLX score showed a significant negative correlation with a large effect size (see Cohen [22]; r = −0.913, p = 0.030) for Variant A. In addition, users of Variant A required an average of
) seconds to complete all given tasks.
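The relationship between a reported correlation coefficient and its p-value can be retraced with a short sketch. It converts r into a t statistic via t = r · sqrt((n − 2) / (1 − r²)) and obtains the two-sided p-value by numerically integrating the Student t density. The group size of n = 5 is an assumption derived from ten participants split into two groups; the helper names are hypothetical, and the integration is a simple Simpson rule rather than a library routine.

```python
import math

def t_from_r(r, n):
    """t statistic of a Pearson correlation r observed on n paired values."""
    return r * math.sqrt((n - 2) / (1 - r * r))

def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=10_000):
    """Two-sided p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * (h / 3) * total  # probability that |T| <= |t|
    return 1.0 - central_mass

n = 5  # assumed group size per variant
t = t_from_r(-0.913, n)
print(round(t, 2), round(two_sided_p(t, n - 2), 3))  # roughly -3.88 and 0.03
```

Under the assumed group size, the coefficient reported for Variant A leads to a p-value close to the reported 0.030, which illustrates why a correlation of this magnitude still reaches significance in a group of five.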
In RQ4, differences in usability variables between variants were investigated. The SUS score of Variant B results in a total score of
with a minimum single score of
and a maximum of
. Variant B
achieved an RTLX score of
with a minimum single score of
and a maximum of
. The SUS score and the RTLX score showed a significant negative correlation with a large effect size (r = −0.916, p = 0.029) for Variant B. To investigate differences in SUS score between variants, a t-test was performed, which revealed a significant difference
= 0.022), as shown in Table 4. A further t-test additionally identified a significant difference (
= 0.008) between both groups regarding the RTLX score. Considering the time needed to complete given tasks, participants using Variant B needed an average of
) seconds, which equals a significant difference (t = −3.61, p = 0.007) to Variant A, revealed through a t-test. In addition, a significant correlation (r = 0.932, p = 0.021) was observed between the RTLX score and the time required in Variant B.
To investigate RQ5, the relationship between prior AR experience and usability variables was examined through correlation analyses. Participants reported a prior experience of () on a 5-point Likert scale. As a result, a significant negative correlation could be identified between the reported prior experience and the RTLX score in Variant A (r = −0.895, p = 0.040). No other significant correlations between prior experience and usability variables were observed, neither in relation to the SUS score (r = −0.249, p = 0.488), nor in relation to the RTLX score (r = 0.187, p = 0.605), nor in connection with the time needed to complete given tasks (r = 0.186, p = 0.608).
In addition to the variant-specific analyses, significant correlations were found between aggregated values. Here, a highly significant negative correlation (r = −0.929, p = 0.001) between the total SUS score and the total RTLX score was observed. In addition, a significant negative correlation (r = −0.845, p = 0.002) between the time required and the total SUS score as well as a highly significant correlation (r = 0.963, p = 0.001) between the time required and the total RTLX score were revealed.
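The group comparisons between Variant A and Variant B rely on t-tests for two independent samples. The sketch below shows how such a test statistic can be computed, assuming a Student t-test with pooled variance; whether a pooled or a Welch variant was applied in the study is not stated, and the completion times in the usage example are invented placeholders, not study data.

```python
import math
from statistics import mean, variance

def pooled_t(sample_a, sample_b):
    """Student two-sample t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Assumes independent groups with
    similar variances; `variance` is the sample variance (n - 1 denominator).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error, df

# Hypothetical completion times in seconds (placeholders, not study data)
times_a = [310, 280, 295, 330, 305]
times_b = [420, 390, 455, 400, 430]
t_value, df = pooled_t(times_a, times_b)
print(round(t_value, 2), df)
```

With five participants per group, the statistic is compared against a t distribution with 8 degrees of freedom; a negative value indicates that the first group needed less time than the second.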
The presented results suggest interesting implications in the context of our research. In RQ3, we investigated whether a consideration of the final pattern model is related to a positive usability. We observed a total SUS score of
in Variant A, which corresponds to a “grade A+” usability (see [23]) and even exceeds the rating of our initial pattern model (see Section 3.3). The perceived workload resulted in an overall RTLX score of
for Variant A, which is at the lower limit of the “medium” workload classification (see [29]). The assumptions underlying RQ3 could thus be confirmed in the sense that the implementation of the final pattern model is related to a positive usability, which is characterized by a high SUS score and a perceived workload at the lower end of the scale.
In RQ4, we further considered the final pattern model by comparing the AR application to a similar version that explicitly neglects this model. Variant B received an SUS score of , corresponding to a “grade C” rating. The RTLX score of is classified as “somewhat high”. The difference to Variant A was confirmed as statistically significant for all measured usability variables. It can be assumed that the significantly poorer usability of Variant B is due to the overloaded UI and the additional steps needed to perform tasks. Relevant functionality such as the color information panel was moved to a nested menu, which may have increased the workload and thus reduced usability. Inconsistencies may have further caused confusion within the UI, which was reflected in verbal statements throughout the empirical study. Participants using Variant B repeatedly asked how to perform basic actions like placing, scaling and exchanging virtual objects, which could also be due to the poor visual representation of underlying functionality. These results thus confirm the assumptions underlying RQ4. Variant B was found to have a significantly poorer general usability, a significantly higher perceived workload and a significantly longer time required to solve given tasks.
We further investigated whether prior AR experience relates to the usability ratings within our study. While we did not detect any significant relation to the aggregated usability scores, we did observe a significant negative correlation between prior experience and the perceived workload in Variant A. It can be assumed that a higher expertise led to a lower perceived workload, which is not surprising, but would also be expected for Variant B. The absence of this relation in Variant B may be attributed to the generally high perceived workload in this variant, which may have offset the influence of prior experience. The fact that no other correlations with prior experience were found might be related to the young average age of 23 years within our study, since this age group is commonly credited with a high technical affinity and thus adaptability to novel technologies. However, this may also lead to the conclusion that prior experience is not a dominant influence factor for a system’s usability, although this would be worth investigating further. Nevertheless, the underlying assumptions of RQ5 could not be confirmed within our empirical study.
Some additional implications from our reported results remain. We observed a strong connection between all considered usability variables, in aggregated scores as well as in specific variants. Our results indicate that a positive general usability is strongly related to low perceived workload and vice versa, both of which result in less time needed to perform tasks. Although this finding is not surprising, it confirms the general assumption that the perceived workload is strongly connected to the general usability and the efficiency of task performance.
Nevertheless, some limitations of the empirical study need to be mentioned. Due to the nature of A/B testing, the number of participants testing each variant was halved. This rather small sample size may have affected our results. Within our research, we chose an incremental approach to reflect theoretical models in multiple observations in order to compose valid results. Nevertheless, even though our results were explored in two separate studies, larger studies should be conducted to further validate our findings. In addition, further evaluation possibilities such as including focus groups or expert interviews should be considered. Furthermore, in this specific study, we only examined the first-time use of a single application to which the results may be attributed. However, we consider this second evaluation to be an extension of our initial evaluation; the shared results were thus observed in two distinct AR applications.