Design Patterns for Mobile Augmented Reality User Interfaces—An Incremental Review

Abstract: The virtual enhancement of the physical world through Augmented Reality (AR) has an enormous potential in its application, but faces challenges in its development. The lack of standards and the increased complexity of interaction opportunities complicate the definition of suitable User Interfaces (UIs). Several principles and patterns have been formulated to simplify UI design for AR applications, but their joint contribution to a positive usability as well as the influence of individual patterns remain unclear. In this paper, AR design principles from selected research were reviewed and merged into a comprehensive pattern model within an incremental process. Based on an initial model, we developed ARScribble, a mobile AR application which imitates a physical spray can to virtually sketch within a real environment. In a user-based study, we evaluated the usability of ARScribble as well as the role of individual patterns for the overall usability. We found promising indications that the pattern model implementation is related to a positive usability. The individual pattern analysis showed that AR users particularly desire a consistent and structured UI. A consistent appealing design and multimodal interaction concepts were also found to positively correlate with the overall usability. Based on these results, we included additional related work to refine the initial model into a final pattern model. To evaluate this refinement, the colAR application was developed, which allows real-world colors to be assigned to virtual objects. As a result, we found the consideration of the final pattern model to be related to a positive usability, which was confirmed in an A/B test, in which an application neglecting the pattern model showed a significantly poorer usability.


Introduction
Augmented reality (AR) extends the physical world with virtual information and is already used in many application areas. AR has an enormous potential and enables novel interaction possibilities, but faces challenges in the development process, including the complex modeling of 3D content and interactions [1]. Furthermore, the relevance of the physical world requires AR user interfaces to comprise virtual and physical artifacts, resulting in a particularly complex development process [2]. Since no standards for AR UI engineering have yet been established [3,4], each UI concept has to be considered individually. Thus, Ashtari et al. [5] name the lack of concrete design guidelines as one of the key barriers to AR development.
As manifested in our initial approximation (see [6]), current AR usability research often focuses on the adaptation of established UI engineering methods to AR requirements. For example, sketching has been applied for the conception of AR UIs, especially regarding virtual objects [7,8] or as a foundation for interaction prototyping [9]. Prototyping itself is applied to iterate and evaluate interaction concepts in order to foster design decisions [10], even using related technologies such as virtual reality [11]. Only a few approaches have focused on the core objective, the formulation of usability guidelines for the creation of AR UIs. Here, most approaches formulate best practices by performing meta-analyses of AR applications. In a few cases, this results in usability principles [12], pre-patterns [13] or design heuristics [14], which are often focused on a specific domain, such as educational video games [15], kindergarten applications [16] or industry 4.0 [17]. However, it remains unclear whether a joint set of principles and pre-patterns is actually accompanied by a positive usability and, in addition, which of the defined patterns play a particularly crucial role for the usability. In this paper, we follow an incremental review approach. As a first step, we formulate an initial joint pattern model and evaluate the overall usability of this implemented model as well as the role of individual UI patterns to establish our research methodology. In a further step, we refine our pattern model through examining additional related work, which is further investigated within an A/B test. We thus contribute to AR research by reflecting the current state of research through extracting and consolidating generally applicable design guidelines from related meta-analyses. The exemplary implementation and evaluation of patterns is intended to provide assistance in the definition of suitable AR UIs.
In Section 2, related work from current AR usability research is highlighted. Our first review is presented in Section 3, including the initial pattern model (Section 3.1), the implemented ARScribble application (Section 3.2) as well as the empirical evaluation (Section 3.3). Our second review is presented in Section 4, where our final pattern model is presented (Section 4.1), represented by the colAR application (Section 4.2) and evaluated within a user-based study (Section 4.3). The key findings of our incremental review are discussed in the conclusion in Section 5.

Related Work
AR usability research is often concerned with adapting existing design heuristics to AR requirements. These design heuristics are usually a collection of best practices [18] that simplify recurring problems [19]. For instance, Dünser et al. [20] linked user-centered design principles to AR requirements and derived challenges to be considered by researchers, such as the reduction of cognitive overhead. Nevertheless, the design principles are limited to a small set and remain rather general. As Dünser et al. [20] state, their work should be seen more as a research encouragement and less as a holistic pattern model. Tuli and Mantri [16] conducted a promising meta-analysis of existing usability guidelines through research exploration, expert evaluation and the derivation of mobile AR design principles for kindergarten children. Agati et al. [17] investigate industrial AR applications by focusing on the area of manual assembly, resulting in various usability guidelines. These two papers are either strongly directed at a specific target group or focused on a single use-case. However, both papers also feature patterns that are generally applicable within AR UI engineering. These patterns will thus be explored in our second review (see Section 4) and evaluated individually within the refinement process.
Our initial research (see Section 3) especially builds on the work of Ko et al. [12] and Xu et al. [13], which were selected as representatives of the research field. Ko et al. [12] define AR usability principles by analyzing existing research on mobile applications, tangible UIs and heuristic evaluation methods, resulting in new guidelines to solve AR usability problems. Based on this research, 22 AR usability principles, such as Enjoyment or Learnability, were defined and classified. Xu et al. [13] analyzed academic and commercial AR games and generated best practices for AR UI engineering. Based on this, Xu et al. [13] formulate nine design pre-patterns, such as World Consistency or Landmarks.
Although research regarding the definition of AR design principles exists, approaches are often focused on individual design artifacts or application domains. The definition and evaluation of a joint pattern model is an essential first step towards applicable AR usability standards. To finalize our initial review, further related work is considered for the refinement of our pattern model. The specific works considered here are described in Section 4.1.

Initial Review
In our first review, we focus on design patterns from two representative sources, which we analyzed individually and merged into a shared design pattern model. The feasibility of this consolidation, the ability to represent this model in an AR application as well as the sufficiency of applied evaluation methods are the focus of our initial review. Thus, we developed the ARScribble application that implements this model and evaluated its usability within a user-based study. The representation of individual patterns in our AR application as well as its empirical evaluation are explained in the course of the section.

Initial Design Pattern Model
In order to initially investigate the role of existing design patterns for the usability of AR systems, design principles were identified and then transformed into a joint model. As shown in Table 1, design principle categories formulated by Ko et al. [12] were adopted. In line with Ko et al. [12], we adapted the category Usage, which addresses the actual use of the application and user (re-)actions that affect the application flow. These usage-oriented patterns comprise control elements, contextual situations and possible limitations that span physical and virtual artifacts. In the Information category, patterns are concerned with the consistent visual design and accessibility of information. Here, hierarchical structures, navigation concepts and visibility are crucial aspects, in addition to the appealing and enjoyable presentation. In line with Ko et al. [12], our Interaction category addresses interactions that impact the virtual world but may require physical effort within the real world. Assistance for navigation during the interaction as well as the communication of current processes are a further focus of this category. This assistance is treated even more explicitly in the Support category, dealing with providing useful information in the form of instructions and aids, while ensuring a stable customizable experience that is adaptable to individual preferences. Finally, we adopted the Cognition category from Ko et al. [12], which addresses the mental load during application use, covering aspects of learnability, memorization and predictability of UI artifacts that may affect the user's cognitive capacity.
Specific patterns were collated from Xu et al. [13] and Ko et al. [12], then reviewed and consolidated into shared patterns based on their core objectives. Consolidated patterns and their individual core objectives are listed in Table 1, where their origin from Ko et al. [12] (1) or Xu et al. [13] (2) is highlighted. Here, the patterns Learnability, Recognition and Predictability were merged, since they all essentially concern the explicitness of UI artifacts. The joint pattern Hierarchy, Navigation & Availability was defined, which determines the structuring of and navigation through information. The pattern Multimodality & Hidden Information was consolidated since the transmission of hidden information is usually tied to the consideration of multiple modalities. Furthermore, User Control and Responsiveness were merged, since they relate to the system's uninterrupted response to user input and the resulting sense of control for the user. In addition, only those patterns from Xu et al. [13] were extracted which were evaluated as transferable from their AR game origin to general AR applications.

ARScribble Application
In order to evaluate the initial pattern model, the smartphone application ARScribble was developed, which imitates a physical spray can, allowing users to virtually paint within the real environment (see Figure 1). Since the feasibility of design patterns depends on the application use-case, not all, but as many patterns as possible were implemented (15 out of 21). Since we only consider a narrow use-case in our evaluation, the pattern Context-based was not considered. Tracking of physical artifacts was not required and therefore did not need to be addressed in our use-case (Seamful Design). In addition, spatial navigation points (Landmarks), the immersion of the virtual world (Personal Presence) as well as the interference of multiple users' actions (Body Constraints) were evaluated as not applicable in our use-case.
As shown in Figure 1, a consistent appealing UI design was implemented by following Apple's corporate design, allowing the reuse of empirically evaluated artifacts throughout the application (Consistency; Enjoyment). Furthermore, a centered marker constantly visualizes the currently configured line width and color (Feedback). By Default, the color and line width are set to appropriate values (a highly visible yellow tone in a medium line width), but can be configured by the user (Personalization). Although touch input serves as the primary interaction form, color changing can be performed using a voice command as well (Multimodality & Hidden Information). Besides the mentioned functionality, only two additional UI buttons were implemented (speech recognition, help menu), ensuring a well-organized and non-overloading design (Hierarchy, Navigation & Availability; Visibility). As shown in Figure 1, the help menu offers further functionality explanations (Help & Documentation). To simulate the use of a real spray can, the virtual fill level decreases while painting and users are prompted to shake the smartphone to refill the can (Device Metaphors; Control Mapping). Prompts are visually displayed as well as haptically transmitted by vibration and disappear as soon as the paint is refilled (Feedback; Multimodality & Hidden Information). Since this frequently occurs while painting, users develop a feeling for when to refill the can (Learnability, Recognition & Predictability). Besides haptic signals, acoustic output has been integrated, which imitates the sound of painting with a real spray can (Device Metaphors; Multimodality & Hidden Information). As soon as the painting button is released, the acoustic signal stops (Device Metaphors). Furthermore, painted virtual artifacts are illuminated through adopting ambient light from the real world (World Consistency).
The Low Physical Effort pattern was ensured by simplifying the accessibility of UI artifacts at the bottom of the view. The entire software was tested extensively in several iterations, resulting in a stable application (User Control & Responsiveness).
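The fill-level behavior of the virtual spray can (paint drains while spraying, a refill prompt appears when the can is empty, shaking refills it) can be illustrated with a minimal sketch. This is not the ARScribble implementation: the class name `SprayCan`, the capacity and the drain rate are hypothetical and chosen only for illustration.

```python
class SprayCan:
    """Hypothetical model of a virtual spray can with a draining fill level."""

    def __init__(self, capacity=100.0, drain_per_second=10.0):
        # Both values are illustrative assumptions, not taken from ARScribble.
        self.capacity = capacity
        self.drain_per_second = drain_per_second
        self.level = capacity

    @property
    def needs_refill(self):
        """True once the can is empty, i.e., when a shake prompt would be shown."""
        return self.level <= 0.0

    def paint(self, seconds):
        """Drain paint for up to `seconds` of spraying; return the seconds painted."""
        if seconds < 0:
            raise ValueError("seconds must be non-negative")
        possible = self.level / self.drain_per_second
        painted = min(seconds, possible)
        self.level = max(0.0, self.level - painted * self.drain_per_second)
        return painted

    def shake(self):
        """Refill the can completely; the prompt condition clears as a result."""
        self.level = self.capacity
```

In such a model, the visual and haptic prompt would be bound to `needs_refill`, so that it disappears as soon as `shake()` has been called.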
The described implementation of the ARScribble application serves the empirical evaluation of the initial pattern model, which is explained in the following section.

Empirical Study
The application presented in the previous section was analyzed within a user-based study, evaluating the overall usability as well as the role of individual patterns. In this section, we first introduce our research questions, experimental design and sample, followed by the presentation and discussion of results.

Research Questions, Experimental Design and Sample
As stated in Section 2, current AR usability research often focuses on specific domains or individual UI artifacts. In our study, we strove to evaluate whether an implementation of bundled design patterns is related to a positive usability. Thus, the following research question was formulated:

RQ1: Does the consideration of joint AR design patterns correlate with a positive usability?
In order to identify particularly influencing patterns and thus derive explicit recommendations for AR UI engineering, the role of individual patterns for the overall usability needs to be considered. This results in the following second research question:

RQ2: Which of the design patterns play a particularly crucial role for the overall usability?
Within our study, several tasks were designed to guide participants through the full feature range of ARScribble and thus ensure that all implemented patterns were noticed. First, participants were asked to paint their initials into the real environment with free choice of color and line width. Next, participants outlined their painted initials with a thick red and a thin blue line. These tasks were performed under controlled conditions, within the same physical room under identical lighting conditions using the same device. Afterwards, the System Usability Scale (SUS, see [21]) was surveyed to evaluate the overall usability of the application, rating statements such as "I think that I would like to use this system frequently." on a 5-point Likert scale ranging from 1 ("Strongly disagree") to 5 ("Strongly agree"), resulting in a cumulative SUS score on a scale of 0 to 100. To evaluate the role of individual patterns for the SUS, the individual pattern implementation was evaluated (e.g., "I think the pattern 'Default' in ARScribble is well implemented (initial color and line width)") on a 5-point Likert scale ranging from 1 ("Strongly disagree") to 5 ("Strongly agree"). Finally, the prior AR experience was surveyed ("How much experience do you have in operating AR applications?") on a 5-point Likert scale ranging from 1 ("None") to 5 ("Very much") in order to consider an influence of previous AR experience on the usability evaluation.
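For readers unfamiliar with how ten Likert ratings become a score between 0 and 100, the following sketch shows the standard SUS scoring rule. The text does not spell this rule out, so the sketch assumes the usual procedure: odd-numbered (positively worded) items contribute the rating minus 1, even-numbered (negatively worded) items contribute 5 minus the rating, and the sum is multiplied by 2.5. The function name `sus_score` is illustrative.

```python
def sus_score(responses):
    """Return the SUS score (0 to 100) for ten Likert ratings from 1 to 5.

    `responses` lists the ratings in questionnaire order (item 1 first).
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly ten responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    total = 0
    for item, rating in enumerate(responses, start=1):
        if item % 2 == 1:
            total += rating - 1   # positively worded item
        else:
            total += 5 - rating   # negatively worded item
    return total * 2.5


# A participant who fully agrees with every positive item and fully
# disagrees with every negative item reaches the maximum score.
print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # 100.0
```

Because each item contributes an integer between 0 and 4, individual scores are always multiples of 2.5.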
The empirical study was conducted with N = 18 participants, of whom 13 were male and 5 female. Participants were on average M = 29 (SD = 7.27) years old.

Results
The overall usability was calculated and resulted in a total SUS score of M = 80.56 (SD = 11.26), with a minimum single score of 47.5 and a maximum of 97.5. The relationship between individual patterns and the overall SUS score was evaluated through correlation analyses, based on the individual implementation ratings. The results of these analyses are shown in Table 2.
In order to assess a possible influence of prior AR experience on the usability evaluation, the AR experience was correlated with the SUS score. Participants reported a prior AR experience of M = 2.54 (SD = 1.25) and the correlation of the SUS score with the prior AR experience showed no significant result (r = −0.199, p = 0.429).
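The correlation analyses can be sketched as follows. The text reports r values but does not name the coefficient, so this sketch assumes a Pearson product-moment correlation; a rank-based coefficient may equally have been used for Likert data. The function names are illustrative, and no study data are reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equally long sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two sequences of equal length (at least 3)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Reported correlation between prior AR experience and SUS score (N = 18).
t = t_statistic(-0.199, 18)  # roughly -0.81 with 16 degrees of freedom
```

A t value this close to zero is consistent with the non-significant result reported for prior AR experience.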

Discussion
The usability evaluation of ARScribble resulted in an SUS score of 80.56. Since meta-studies found that an SUS score between 78.9 and 80.7 corresponds to a "grade A-" usability (based on the American school grading system, see [23]), the usability of ARScribble was rated as particularly good. Thus, underlying assumptions of RQ1 were confirmed, since the consideration of a joint model of current design patterns was related to a positive usability. Analyses regarding individual patterns showed a strong significant correlation between Consistency and the overall usability. It is conceivable that users seek consistency in order to master an application, especially when familiarizing themselves with novel technologies. This assumption is supported by the significant correlation coefficients for the patterns Hierarchy, Navigation & Availability, Control Mapping and User Control & Responsiveness, since their objectives also promote clear structures, well-organized interfaces and uninterrupted use. The significant effects for the pattern Enjoyment strengthen this assumption, since an appealing design thrives on clear structures and well-organized information. Additionally, the results imply that AR applications should integrate multiple modalities, since the pattern Multimodality & Hidden Information was significantly positively related to the SUS score, likely facilitated by the increased freedom of use. Thus, the underlying assumptions of RQ2 were confirmed by revealing particularly crucial patterns.
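The mapping from an SUS score to a letter grade can be sketched as a simple lookup. Only the A- band (78.9 to 80.7) is stated in the text; the remaining thresholds are an assumption based on the commonly cited curved grading scale for the SUS and should be checked against [23] before reuse. The names `SUS_GRADE_FLOORS` and `sus_grade` are illustrative.

```python
# Lower bounds of each grade band, highest first. Only the A- band is
# confirmed by the text; the other bounds are assumed (see lead-in).
SUS_GRADE_FLOORS = [
    (84.1, "A+"), (80.8, "A"), (78.9, "A-"),
    (77.2, "B+"), (74.1, "B"), (72.6, "B-"),
    (71.1, "C+"), (65.0, "C"), (62.7, "C-"),
    (51.7, "D"),
]


def sus_grade(score):
    """Return the letter grade for an SUS score between 0 and 100."""
    if not 0 <= score <= 100:
        raise ValueError("SUS scores range from 0 to 100")
    for floor, grade in SUS_GRADE_FLOORS:
        if score >= floor:
            return grade
    return "F"


print(sus_grade(80.56))  # A-
```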
Nevertheless, some limitations of the empirical study need to be mentioned. Although it can be assumed that the implementation of individual patterns had a positive influence on the SUS, the causality could also be reversed. Future research should investigate causality effects by means of long-term studies or more complex experimental designs. Additionally, the design pattern model mainly focused on two main sources and was evaluated through a single smartphone application. In order to address this limitation, we extended our initial research by a second review, which is described in detail in the next section.

Final Review
On the basis of the previously presented results, we extended our research by investigating further related work, formulating a final pattern model and evaluating a software representation of this model. In this regard, the second review is not limited to representative work, but includes a comprehensive analysis of the current research field. Here, the consolidation, implementation and evaluation of design patterns are based on the methodology established in our initial review. Individual components of our second review are explained in the course of this section.

Final Design Pattern Model
In our finalization process, five additional papers comprising a total of 82 patterns were explored and evaluated. Here, we considered AR-specific works as well as research from general mobile software development. In the following, we first present our research results and then discuss how these were reflected in the refinement of our model.
As described in Section 2, only a few works deal with patterns for general AR UI design. One of these works was presented by Endsley et al. [14], who consider various design heuristics and derive a final list of nine AR design patterns, including the minimization of mental load or the alignment of the virtual and physical world. Although AR-specific research often focuses on a specific use-case or target group, these papers usually also consider generally applicable patterns that might be valuable for our model refinement.
Here, Agati et al. [17] focus on the industrial sector by performing a profound aggregation of relevant work, but also include universal patterns spanning categories such as "Cognitive" or "Ergonomics." In addition, Tuli and Mantri [16] focus on AR games for kindergarten children, but also consider general patterns such as "Learnability" or "Consistency." For a broad exploration of the research domain, we not only explored AR-specific research, but also related work from the field of general mobile software development. Here, both Kumar and Goundar [24] and Dourado and Canedo [25] reviewed the Nielsen usability heuristics (see [26]) in the context of mobile applications. This basic set of patterns has been expanded to include heuristics such as "Content Organization" and "Visual Representation" [24] or "Privacy" and "Efficiency of Use" [25].
For our model refinement, we individually evaluated all patterns from the mentioned research and grouped them based on their core objective, excluding patterns that are strongly use-case dependent or only applicable in niche situations. As shown in Table 3, we consolidated heuristics based on their featured patterns while listing their source of origin. It is noticeable that some patterns are mentioned by almost all considered works, such as Aesthetic & Minimalist Design, Consistency or World Consistency, but some patterns are only mentioned by a small number of papers, such as Multimodality or Hierarchy & Navigation. Furthermore, it becomes apparent that many of the listed patterns seem to be universally applicable to mobile software. This is not surprising, since mobile AR applications tend to share similarities with conventional mobile applications. However, patterns that seem universal at first glance reveal specific challenges in the AR context, which we will highlight in the course of this section. It is further evident that there is an overlap of the consolidated heuristics with our initial research, albeit new aspects were discovered that have not been considered before, such as Reduce Cognitive Demand.
The consolidated heuristics were reflected in the refinement of our pattern model, which was derived from our initial research. As presented in Figure 2, we still distinguish between the categories Information, Interaction, Support and Cognition. Nevertheless, the initial category Usage was replaced by Design, since many Usage aspects are already considered in the Interaction category. We transferred these patterns to the relevant category and focused on actual design aspects within the Design category. Individual categories and their included patterns are explained in the further course of this section, with a special focus on modifications to the initial classification and AR-specific aspects.

Design
Within our pattern model, the Design category encompasses UI aspects that deal with aesthetics as well as with the semantic correspondence of coexisting virtual and physical objects. As already mentioned, we differ from our initial categorization based on Ko et al. [12] by focusing on design artifacts and transitioning interaction aspects to the relevant category. This categorization is also reported in related work. For instance, Kumar and Goundar [24] address artifacts of visual design by defining the category Visual Representation. Tuli and Mantri [16] further describe an Orientation category which includes aspects of appealing design and consistent worlds, which is in line with our categorization.
The number of patterns in the Design category has been reduced to serve simplicity. Removed patterns were either merged with existing patterns or reclassified. The initial pattern Control Mapping, which deals with the mapping of control elements to single actions, has been assigned to the Aesthetic & Minimalist Design pattern, which addresses simple and appealing design choices. Here, UI designers should limit UI elements to a minimum and avoid visual clutter that may obstruct the AR view. Furthermore, the former patterns Context-based and Seamful Design were assigned to the World Consistency pattern. This pattern aims at aligning the virtual and physical world, considering various application contexts, while preventing a disconnect between user and system, e.g., by addressing tracking limitations. When aligning the physical and virtual world, UI designers should ensure that physical and virtual entities follow common laws, so that, for example, a virtual representation of a cube follows the same laws of gravity as its physical counterpart. In addition, the initial Device Metaphors pattern, which deals with the perception of device familiarity, was extended and renamed to Semantic Correspondence, aiming at also attributing this familiarity to virtual artifacts. Here, virtual objects should be semantically aligned with their real-world functionality, such as a lightning icon triggering the camera flash.

Information
Within our Information category, we followed the initial definition by Ko et al. [12], addressing the accessibility and visibility of consistent information. In our final refinement, we further concentrated on the accessibility and provision of information and moved aesthetic aspects to the appropriate category. Kumar and Goundar [24] describe a similar Content Organization category, which comprises the localization and structuring of content. Within our final model, the Accessibility pattern ensures the structure and organization of content, as well as the management of AR-specific information, suggesting less important information to be nested deeper in the application, while instantly providing significant information. Here, the hierarchical information structure should follow conventional organization paradigms, such as prioritizing newly added information more than older information. UI designers should further ensure that augmented data are accessible regardless of the current field of view. Thus, this pattern comprises the initial patterns Default, dealing with the initial configuration of the UI, and Hierarchy, Navigation & Availability, addressing content organization. The former Enjoyment pattern, which ensures an appealing UI presentation, was assigned to Aesthetic & Minimalist Design and thus moved to the Design category. The relevance of the Multimodality pattern was revealed by our initial study and confirmed by the analysis of further work and was thus adopted in our final model. Here, multiple interaction methods should be offered to enhance the freedom of use, which we found to be crucial in AR interactions to ensure a positive user experience. For example, sensors such as accelerometers could be used for motion tracking, and responsive sound or vibration could satisfy auditory and tactile senses and thus serve usability purposes. The Visibility pattern has been mentioned in several related works and has been adopted from our initial model as well.
This pattern aims to ensure that the visibility of AR content is not hindered, e.g., by an overloaded UI, as fixed virtual objects may be off-screen within the dynamic AR view. In this case, UI designers should provide visual cues that point to the off-screen objects' positions. Within our initial research, we observed a highly significant relationship between Consistency and a positive usability. We adopted this pattern in our final model, aiming to ensure a homogenous design that prevents confusion, e.g., by enabling users to attribute one function's behavior to a visually similar function.
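The visual cue for off-screen objects mentioned for the Visibility pattern can be sketched as a small geometric helper. This is an illustrative assumption, not part of the reviewed works: it presumes that an AR framework has already projected the object into normalized screen coordinates, where the visible area spans 0 to 1 on both axes, and it ignores objects located behind the camera. The function name `offscreen_cue` is hypothetical.

```python
import math


def offscreen_cue(x, y):
    """Return where to draw an edge indicator for an off-screen object.

    `x` and `y` are normalized screen coordinates of the projected object;
    the visible area is the unit square from (0, 0) to (1, 1).
    Returns None if the object is visible, otherwise a tuple
    (edge_x, edge_y, angle_degrees): the clamped indicator position on the
    screen border and the direction from the screen center to the object,
    measured in the same coordinate frame.
    """
    if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0:
        return None  # object is on screen, no cue required
    edge_x = min(max(x, 0.0), 1.0)
    edge_y = min(max(y, 0.0), 1.0)
    angle = math.degrees(math.atan2(y - 0.5, x - 0.5))
    return edge_x, edge_y, angle
```

An object projected to (1.5, 0.5), for instance, yields an indicator at the middle of the right screen edge, so the cue itself stays compact and does not add clutter to the AR view.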

Interaction
The initial Interaction category based on Ko et al. [12] is also reflected in our final pattern model, while retaining the initial intention of bundling patterns that address interaction aspects. Here, we focused on user-controlled interactions that trigger feedback from the system and may cause physical effort. As described below, we acquired patterns from other initial categories, such as the initial Usage category.
Within the Interaction category, the Feedback pattern was adopted from our initial model, as it was reflected in several related works, aiming at constantly informing users about the current system state, such as simple loading icons or progress bars communicating computing progression. The initial pattern Landmarks has not been referred to in other considered works and is thus assigned to the Feedback pattern as it is intended to provide virtual feedback on spatial navigation. Reduce Physical Effort was adopted as well, reflecting the need for awareness of physical efforts in AR applications through minimizing physical fatigue. In our final model, this pattern also addresses physical limitations that need to be considered since they may restrict engaging with augmented data, formerly defined as the Body Constraints pattern. We attribute special relevance to this consolidated pattern in the AR context, as many applications tend to require movement in physical space. UI designers should be aware of this aspect and minimize the steps required to perform tasks as well as arrange frequently used elements to be easy to reach, e.g., at the bottom of the screen when a smartphone is held vertically. The User Control pattern was moved from the Support category, since it explicitly deals with the perceived control through interacting with UI artifacts. This pattern is similarly interpreted across most research papers as focused on providing sufficient interactions while preventing users from being stuck in an unwanted state. In AR applications, user control can be expressed, for example, by providing free specification of object attributes, such as scaling or positioning. Another more conventional example is the undo and redo functionality, which is applied throughout the software landscape to reverse or restore actions.
Since the pattern Personal Presence showed no significant relationship to a positive usability in our initial research and was not referred to in the considered related work, we did not include this pattern in our final model. We assume that Personal Presence is more crucial in purely virtual and thus more immersive worlds, which could explain the seemingly minor significance for AR UIs.

Support
Our Support category was adopted from the initial Ko et al. [12] definition, but interpreted differently in our final refinement. We adopted aspects of system adaptability as well as help instructions, but expanded this category with patterns concerning error management, which were similarly classified by Tuli and Mantri [16]. Within the Support category, the Help & Documentation pattern was considered by several research papers and thus adopted, aiming at dynamically providing instructions and helpful information, e.g., through introducing core functionality at first use or by providing instructions on a dedicated help page. The Personalization pattern was adopted as well, but renamed to Customization & Personalization to acknowledge the often-mentioned Customization more explicitly, which refers to user-driven aspects of Personalization. This pattern aims at enabling users to adjust UI artifacts to personal preferences, such as changing settings or creating personalized shortcuts to allow an individual application usage. As mentioned earlier, the interactive aspect of the former User Control & Responsiveness pattern has been moved to the Interaction category. However, the further aspect of this pattern, the ensuring of a stable user experience without interruptions, has been more explicitly included in the pattern Error Prevention & Management, aiming at avoiding unwanted system states through extensive early testing. Here, error handling spans from textual prompting to assisting throughout a recovery process.

Cognition
The Cognition category was initially adopted from Ko et al. [12] and retained, as this category is reflected in multiple related works. In particular, Agati et al. [17] address Cognition with respect to memory processes and Tuli and Mantri [16] refer to learnability and memory load as cognitive aspects.
Within the Cognition category, the Learnability, Recognition & Predictability pattern was adopted, but simplified to Learnability following related work, since it can be argued that Recognition and Predictability are merely facets of Learnability. Learnability aims at implementing UI artifacts that are easily recognizable and build upon each other. Thus, an application should be easy to use from the initial start-up and slowly introduce more complex functions. The Cognition category was further extended by the Reduce Cognitive Demand pattern, aiming at minimizing the amount of information to process and memorize during usage. Augmented data should not be overly complex to a point where users' perception is compromised. Due to the coexistence of physical and virtual information, there is a constant risk of information overload in AR UIs, which leads to a high cognitive demand that should be prevented.
The final pattern model presented in this section was examined as part of our second evaluation. For this purpose, a software representation of this model was implemented, which will be discussed in detail in the following section.

colAR Application
Within our final review, the AR application colAR was implemented as a vehicle to verify the pattern model. This Android application allows the user to pick real-world colors from the physical environment and to apply them to virtual objects that are positioned on the desired color within the physical space (see Figure 3). This AR application represents 14 out of 15 patterns from our final pattern model. The Multimodality pattern was not included in our analysis, since it was already evaluated in our initial review and could not be usefully considered within the application's use-case. As shown in Figure 3, the Aesthetic & Minimalist Design pattern was addressed by integrating simple icons with clear shapes and matching colors. Here, established design icons were applied to represent basic functionality such as accessing the settings view. These icons were explicitly chosen to represent the individual functionalities, ensuring Semantic Correspondence. For instance, the button to pick a color visually represents a pipette while the button to change an object dynamically changes based on the currently used object. World Consistency was addressed by the application's use-case, the assignment of real-world colors to virtual objects, which serves the alignment of both worlds. A disconnect between system and user was further prevented by avoiding advanced terminology and illogical order. The Accessibility was ensured through positioning all frequently needed functions at the bottom of the main view, which also addresses Reduce Physical Effort, since all main functions are easy to reach while holding the smartphone vertically. To increase Visibility, icons were arranged towards the edge of the screen to leave room for the camera view. Artifacts overlapping the view, such as the color information panel, are additionally designed to be compact and slightly transparent. 
Consistency is ensured by matching text fonts and colors across all UI artifacts throughout the runtime of the application. In order to address the Feedback pattern, users are visually notified when a system state changes. For instance, a newly applied color will instantly appear as the current color, while the former color is automatically listed as the previous one. User Control was implemented through the ability to freely move and scale virtual objects using a two-finger pinch. Within the settings menu, an instruction tab informs users about basic functionalities, addressing Help & Documentation. The Customization & Personalization pattern was considered by allowing virtual objects to be individually adjusted in size and color. Error Prevention & Management has not been translated into a specific implementation, but has been addressed through intensive testing. Learnability was ensured by slowly introducing more advanced functionality. Here, textual hints instruct new users on how to place virtual objects. Finally, the pattern Reduce Cognitive Demand was addressed by not requiring memorization of information between application states. As an example, the information about re-applicable current and previous colors is accessible without the need to recall them.
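The described handling of current and previous colors can be sketched as a small state holder. This is a simplified illustration and not the colAR source code; the class name ColorHistory, its methods and the RGB values are assumptions made for the example.

```python
from typing import Optional, Tuple

RGB = Tuple[int, int, int]


class ColorHistory:
    """Tracks the current and the previous color (hypothetical helper)."""

    def __init__(self) -> None:
        self.current: Optional[RGB] = None
        self.previous: Optional[RGB] = None

    def apply(self, color: RGB) -> None:
        # The former current color is automatically listed as the previous one,
        # so the user never has to memorize it.
        self.previous = self.current
        self.current = color

    def reapply_previous(self) -> bool:
        # Swap both entries so that the previous color becomes current again.
        if self.previous is None:
            return False
        self.current, self.previous = self.previous, self.current
        return True


# Illustrative color values, not measured ones.
ORANGE: RGB = (255, 165, 0)
RED: RGB = (255, 0, 0)

history = ColorHistory()
history.apply(ORANGE)
history.apply(RED)  # orange is now the previous color, red the current one
```

Because both entries are always available from the same object, a UI panel bound to it can display them after every change, which covers Feedback and Reduce Cognitive Demand at once.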
In sum, the colAR application serves as a representation of the final pattern model and thus as a vehicle for evaluation, which was conducted within a user-based setting. This empirical evaluation is explained in the following section.

Empirical Study
Our empirical investigation of the colAR application comprises various aspects of usability, addressed by different research questions. These research questions are explained in the current section, along with the experimental design, the results and the discussion of results.

Research Questions
Similar to our initial evaluation, the primary focus of our second evaluation is whether the consolidation of UI patterns in our refined final pattern model is related to a positive usability. This results in the following research question:

RQ3:
Is the consideration of the final pattern model related to a positive usability?
In order to broadly investigate our model's relevance, we further examine if an explicit non-consideration of this model is related to a more negative usability. Thus, the following research question was formulated:

RQ4:
Is there a difference in usability rating between an application considering the pattern model and an equivalent application neglecting it?
Since AR is still a novel technology, it can be assumed that users have different levels of prior experience. Thus, it cannot be dismissed that the individual perception of usability might be connected to prior technological experience, which is addressed by the following research question:

RQ5:
Does prior AR experience correlate with the usability rating?
In order to investigate the formulated research questions, an additional model-neglecting version of the colAR application was implemented, which will be explained prior to the experimental design.

Materials
Within our empirical study, two variants of the colAR application were implemented to investigate the aforementioned research questions. The version referred to as Variant A represents the original colAR application with consideration of the final pattern model (see Section 4.2). An additional Variant B was developed, representing an application variant that explicitly neglects our pattern model.
As shown in Figure 4, Variant B does not feature an Aesthetic & Minimalist Design, since approximately 20% of available screen space is covered by a black bar, which also weakens the Visibility. Differences in icon colors, shapes and depths are further supposed to contradict an aesthetic design. The non-uniformity in text fonts and icons, where each icon follows a different design philosophy, further contradicts Consistency. The Semantic Correspondence was disregarded by representing the color information button by a palette symbol, which usually represents a color selection instead of the mere presentation of information.
Furthermore, the gear icon for the settings view has been replaced by an icon that rather refers to a context menu. Instant visual Feedback was disregarded by moving the dynamic "Change Object" button to the settings view. The information panel listing current and previous colors was moved to the settings view as well, ensuring no direct visual feedback is provided. This is additionally intended to weaken the World Consistency by creating a disconnect between the user and the AR experience. Accessibility and Reduce Physical Effort are also affected, as users are forced to repeatedly open and close a dedicated view to use the application. The former direct object manipulation now requires five consecutive actions, which also weakens Reduce Cognitive Demand, since the click path to the settings as well as the configurations themselves need to be memorized. The mentioned textual hints to place virtual objects were removed to decrease the application's Learnability, which was further impeded by a reduced Help & Documentation. Nevertheless, in order to ensure general functionality of both application variants, the User Control pattern was kept but impeded. Here, the former free positioning and scaling of virtual objects by a pinch gesture has been replaced with a simple slider.

Experimental Design
In our empirical study, participants were randomly assigned to either Variant A or Variant B within a between-subjects design. Participants were first informed about the general use-case of the application ("With colAR, users are able to detect colors in the environment and apply them to a virtual object. [...] Once an area is properly scanned, users can place and move a virtual 3D object within the set boundaries. [...]"). Participants then received a worksheet, which was available throughout the whole interaction and contained a total of eight tasks, defined to ensure that the entire application's function range and thus all implemented patterns are perceived. These tasks range from simple actions, such as scanning a surface or toggling objects, to more complex tasks, such as the colorization of virtual objects based on real-world colors or browsing through the usage history (e.g., "Assign colors to set orange as the previous color and red as the current color."). In addition, tasks required virtual objects to be scaled to the proportions of present physical objects.
In order to perform given tasks, a physical table was provided on which ten colored sheets were laid out to serve as templates for color picking and thus ensure comparability across interactions (see Figures 3 and 4). These sheets were placed in a 2 × 5 arrangement, where every tile was assigned to a high-contrast neighbor. Colors meant to be scanned within the interaction (orange, blue, red) were placed in different sections of the table to force participants to engage in the positioning of virtual objects. Additionally, physical ornaments were placed on the bottom right corner of the table to assist in the scaling of a virtual sphere to a physical sphere's proportions. During the interaction, each participant was restricted to the same position in front of the physical table to ensure comparable conditions.
After completing the given tasks, participants filled in several questionnaires addressing different aspects of the application's usability. On the one hand, the System Usability Scale [21] was applied to consider general usability, as in our initial evaluation (see Section 3.3). On the other hand, we administered the Raw Task Load Index (RTLX) [27], a simplified version of the well-known NASA-TLX [28], to capture the perceived workload spanning over different levels, such as the mental demand (e.g., "How mentally demanding was the task?") or the frustration level (e.g., "How insecure, discouraged, irritated, stressed, [...] did you feel during the task?") on a 20-point Likert scale from 1 ("Very Low") to 20 ("Very High"), resulting in an overall perceived workload (RTLX score) from 0 (low) to 100 (high). Within our empirical study, the time required to complete tasks was measured as an indicator for efficiency. Additionally, the level of prior experience was surveyed in order to investigate the relationship of prior experience and usability.
Participants were asked to rate their prior experience on a 5-point Likert scale from 1 ("I have never dealt with AR applications before.") to 5 ("I am well versed in AR applications."). The study concluded with the collection of demographic data.
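For readers less familiar with the two questionnaires, the following sketch shows how a single participant's scores can be computed. The SUS part follows the standard scoring rule (ten items rated 1 to 5, odd items positively worded, even items negatively worded). The RTLX part is an assumption: it takes the unweighted mean of six subscale ratings and rescales it linearly so that the highest rating maps to 100; the exact rescaling used in the study is not specified above. The function names are hypothetical.

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Standard SUS scoring: ten ratings from 1 to 5, result between 0 and 100."""
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("SUS expects ten ratings between 1 and 5")
    total = 0
    for index, rating in enumerate(responses):
        if index % 2 == 0:
            total += rating - 1   # items 1, 3, 5, 7, 9 (positively worded)
        else:
            total += 5 - rating   # items 2, 4, 6, 8, 10 (negatively worded)
    return total * 2.5


def rtlx_score(ratings: Sequence[int], scale_max: int = 20) -> float:
    """Raw TLX: unweighted mean of six subscales, rescaled (assumed linear)."""
    if len(ratings) != 6 or any(not 1 <= r <= scale_max for r in ratings):
        raise ValueError("RTLX expects six ratings between 1 and scale_max")
    mean_rating = sum(ratings) / len(ratings)
    return mean_rating * (100 / scale_max)
```

Unlike the original NASA-TLX, the raw variant skips the pairwise weighting of subscales, which is why a plain mean suffices here.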
The evaluation procedure of our second study builds upon the methodology established in our initial study, enabling the comparison of results between both studies. However, the second study expands the considered usability dimensions and adds an A/B test to broadly investigate the model's relevance.

Sample and Results
Our empirical study was conducted with N = 10 participants, of whom eight were male and two female. Participants were on average M = 23 (SD = 3.58) years old. As mentioned before, participants were randomly assigned to one of our variants, resulting in two groups of N = 5 participants.
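One common way to obtain two equally sized groups in a between-subjects design is to shuffle the participant list and deal it out in turn. The sketch below illustrates this idea only; it is an assumption and not a description of the exact randomization procedure used in our study. The function name and the participant identifiers are hypothetical.

```python
import random
from typing import Dict, List, Optional, Sequence


def assign_variants(
    participant_ids: Sequence[str],
    variants: Sequence[str] = ("A", "B"),
    seed: Optional[int] = None,
) -> Dict[str, List[str]]:
    """Shuffle participants, then deal them out to the variants in turn."""
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    groups: Dict[str, List[str]] = {variant: [] for variant in variants}
    for position, participant in enumerate(shuffled):
        groups[variants[position % len(variants)]].append(participant)
    return groups


# Ten hypothetical participant identifiers, yielding two groups of five.
groups = assign_variants([f"P{i}" for i in range(1, 11)], seed=1)
```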
To investigate RQ3, the usability of Variant A was analyzed on multiple levels. The SUS score of Variant A results in a total score of M = 87.00 (SD = 4.47) with a minimum single score of 85.00 and a maximum of 95.00. Variant A achieved an RTLX score of M = 11.00 (SD = 1.22) with a minimum single score of 10.00 and a maximum of 13.00. To investigate whether the perceived workload relates to the overall usability, these variables were examined in a correlation. As displayed in Figure 5, the SUS score and the RTLX score showed a significant negative correlation with a large effect size (see Cohen [22]) (r = −0.913, p = 0.030) for Variant A. In addition, users of Variant A required an average of M = 113.20 (SD = 28.50) seconds to complete all given tasks.
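To make the significance of such a correlation in a group of five participants more tangible, the sketch below converts a Pearson coefficient into its t statistic with n − 2 degrees of freedom and compares it against the two-tailed critical value for α = 0.05 and three degrees of freedom (3.182). It also labels the effect size using Cohen's conventional thresholds of 0.1, 0.3 and 0.5. The function names are hypothetical, and the snippet is a plausibility check rather than the analysis script used in the study.

```python
import math


def t_from_r(r: float, n: int) -> float:
    """t statistic of a Pearson correlation r observed in n paired values."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def effect_size_label(r: float) -> str:
    """Cohen's conventions for correlation coefficients."""
    magnitude = abs(r)
    if magnitude >= 0.5:
        return "large"
    if magnitude >= 0.3:
        return "medium"
    if magnitude >= 0.1:
        return "small"
    return "negligible"


T_CRIT_DF3 = 3.182  # two-tailed critical value, alpha = 0.05, df = 3

t_variant_a = t_from_r(-0.913, 5)            # roughly -3.88
is_significant = abs(t_variant_a) > T_CRIT_DF3
```

With only three degrees of freedom, a correlation has to be very strong to pass this threshold, which should be kept in mind when interpreting results from such small groups.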
In RQ4, differences in usability variables between variants were investigated. The SUS score of Variant B results in a total score of M = 67.50 (SD = 14.68) with a minimum single score of 42.50 and a maximum of 80.00. Variant B achieved an RTLX score of M = 34.60 (SD = 14.96) with a minimum single score of 24.00 and a maximum of 61.00. The SUS score and the RTLX score showed a significant negative correlation with a large effect size (r = −0.916, p = 0.029) for Variant B. To investigate differences in SUS score between variants, a t-test was performed, which revealed a significant difference (t(8) = 2.84, p = 0.022), as shown in Table 4. A further t-test additionally identified a significant difference (t(8) = −3.51, p = 0.008) between both groups regarding the RTLX score. Considering the time needed to complete given tasks, participants using Variant B needed an average of M = 300.40 (SD = 99.55) seconds, which a t-test revealed to be a significant difference (t(8) = −3.61, p = 0.007) to Variant A. In addition, a significant correlation (r = 0.932, p = 0.021) was observed between the RTLX score in Variant B and the time required in Variant B.
To investigate RQ5, the relationship between prior AR experience and usability variables was examined through correlation analyses. Participants reported a prior experience of M = 3.00 (SD = 1.15) on a 5-point Likert scale. As a result, a significant negative correlation could be identified between the reported prior experience and the RTLX score in Variant A (r = −0.895, p = 0.040). No other significant correlations between prior experience and usability variables were observed, neither in relation to the SUS score (r = −0.249, p = 0.488), nor in relation to the RTLX score (r = 0.187, p = 0.605), nor in connection with the time needed to complete given tasks (r = 0.186, p = 0.608).
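The reported group comparison for the SUS and RTLX scores can be retraced from the summary statistics alone. The sketch below assumes a pooled-variance (Student's) independent-samples t-test, which is consistent with the reported eight degrees of freedom for two groups of five; the exact test configuration is not stated above, and the function name is hypothetical.

```python
import math
from typing import Tuple


def pooled_t(
    mean1: float, sd1: float, n1: int,
    mean2: float, sd2: float, n2: int,
) -> Tuple[float, int]:
    """Independent-samples t statistic with pooled variance, plus its df."""
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Variant A versus Variant B, using the reported means and standard deviations.
t_sus, df_sus = pooled_t(87.00, 4.47, 5, 67.50, 14.68, 5)
t_rtlx, df_rtlx = pooled_t(11.00, 1.22, 5, 34.60, 14.96, 5)
```

Small deviations in the second decimal place are to be expected, since the inputs are themselves rounded to two decimals.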
In addition to the variant-specific analyses, significant correlations were found between aggregated values. Here, a highly significant negative correlation (r = −0.929, p = 0.001) between the total SUS score and the total RTLX score was observed. In addition, a significant negative correlation (r = −0.845, p = 0.002) between the time required and the total SUS score as well as a highly significant correlation (r = 0.963, p = 0.001) between the time required and the total RTLX score were revealed.

Discussion
The presented results suggest interesting implications in the context of our research. In RQ3, we investigated whether a consideration of the final pattern model is related to a positive usability. We observed a total SUS score of 87.00 for colAR in Variant A, which corresponds to a "grade A+" usability (see [23]) and even exceeds the rating of our initial pattern model (see Section 3.3). The perceived workload resulted in an overall RTLX score of 11.00 for Variant A, which is at the lower limit of the "medium" workload classification (see [29]). The assumptions underlying RQ3 could thus be confirmed in the sense that the implementation of the final pattern model is related to a positive usability, which is characterized by a high SUS score and a perceived workload at the lower end of the scale.
In RQ4, we further considered the final pattern model by comparing the AR application to a similar version that explicitly neglects this model. Variant B received an SUS score of 67.50, corresponding to a "grade C" rating. The RTLX score of 34.60 is classified as "somewhat high". The difference to Variant A was confirmed as statistically significant for all measured usability variables. It can be assumed that the significantly poorer usability of Variant B is due to the overloaded UI and the additional steps needed to perform tasks. Relevant functionality such as the color information panel was moved to a nested menu, which may have increased the workload and thus reduced usability. Inconsistencies may have further caused confusion within the UI, which was reflected in verbal statements throughout the empirical study. Participants using Variant B repeatedly asked how to perform basic actions like placing, scaling and exchanging virtual objects, which could also be due to the poor visual representation of underlying functionality. These results thus confirm the assumptions underlying RQ4. Variant B was found to have a significantly poorer general usability, a significantly higher perceived workload and a significantly longer time required to solve given tasks.
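The letter grades mentioned for both variants can be looked up programmatically. The sketch below assumes that the grading follows the widely used curved grading scale for SUS scores by Sauro and Lewis; whether the thresholds match the source cited as [23] in every detail is an assumption, and the function name is hypothetical.

```python
from bisect import bisect_right

# Lower bounds of each grade band on the assumed curved grading scale.
LOWER_BOUNDS = [0.0, 51.7, 62.7, 65.0, 71.1, 72.6, 74.1, 77.2, 78.9, 80.8, 84.1]
GRADES = ["F", "D", "C-", "C", "C+", "B-", "B", "B+", "A-", "A", "A+"]


def sus_grade(score: float) -> str:
    """Map a SUS score (0 to 100) to a letter grade."""
    if not 0 <= score <= 100:
        raise ValueError("SUS scores range from 0 to 100")
    # bisect_right counts the lower bounds that do not exceed the score.
    return GRADES[bisect_right(LOWER_BOUNDS, score) - 1]


grade_variant_a = sus_grade(87.00)
grade_variant_b = sus_grade(67.50)
```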
We further investigated whether prior AR experience relates to the usability ratings within our study. While we did not detect any significant relation to the aggregated usability scores, we did observe a significant negative correlation between prior experience and the perceived workload in Variant A. It can be assumed that a higher expertise led to a lower perceived workload, which is not surprising, but would also be expected for Variant B. The absence of this relation in Variant B may be attributed to the generally high perceived workload in this variant, which may have offset the influence of prior experience. The fact that no other correlations with prior experience were found might be related to the young average age of 23 years within our study, since this age group is attributed with a high technical affinity and thus adaptability to novel technologies. However, this may also lead to the conclusion that prior experience is not a dominant influence factor for a system's usability, although this would be worth investigating further. Nevertheless, the underlying assumptions of RQ5 could not be confirmed within our empirical study.
Some additional implications from our reported results remain. We observed a strong connection between all considered usability variables, in aggregated scores as well as in specific variants. Our results indicate that a positive general usability is strongly related to low perceived workload and vice versa, both of which result in less time needed to perform tasks. Although this finding is not surprising, it confirms the general assumption that the perceived workload is strongly connected to the general usability and the efficiency of task performance.
Nevertheless, some limitations of the empirical study need to be mentioned. Due to the nature of A/B testing, the number of participants testing each variant was halved. This rather small sample size may have affected our results. Within our research, we chose an incremental approach to reflect theoretical models in multiple observations in order to compose valid results. Nevertheless, even though our results were explored in two separate studies, larger studies should be conducted to further validate our findings. In addition, further evaluation possibilities such as including focus groups or expert interviews should be considered. Furthermore, in this specific study, we only examined the first-time use of a single application, to which the results may be attributed. However, we consider this second evaluation to be an extension of our initial evaluation, so the shared results were observed in two distinct AR applications.

Conclusions
We explored design patterns for mobile AR user interfaces in an incremental review. In our initial approach, we merged existing patterns from two main sources ([12,13]) into a joint model and evaluated the usability of the ARScribble application representing this model. This initial evaluation revealed that the implementation of this model is related to a positive usability. The examination of individual patterns further revealed that patterns concerning the consistency, structure and organization of UI artifacts were particularly influential, emphasizing users' need for clear structures and interactions in new technologies such as AR. Our initial results further indicate that users seek freedom in their choice of interaction methods. In a second iteration, we extended and refined our initial pattern model. For this purpose, we explored additional research ([14,16,17,24,25]) from AR-specific and related fields and individually reviewed the design patterns described therein, resulting in a final pattern model of 15 consolidated patterns spanning five distinct categories. The colAR application representing this model was implemented and examined within an empirical study regarding general usability, perceived workload and time needed to perform tasks. As a result, the AR application showed a highly positive usability, which differed significantly from a similar application with explicit disregard of our final pattern model. Our second evaluation reaffirmed the importance of simple and clear structures, as well as a consistent appealing UI design, confirming results of our first evaluation that were even surpassed in terms of usability ratings. Our work contributes to AR research by reflecting the current state of research regarding generally applicable design guidelines and providing assistance in AR UI design through defining a consolidated pattern model. Here, special attention should be paid to patterns aiming at consistency, organization and accessibility of information as well as user-controlled multimodal interaction.
For future work, several interesting research options remain. We plan to investigate our final pattern model in further studies, especially regarding adaptations to use-cases, application domains and target groups. Additional evaluation possibilities, such as including focus groups or expert interviews, are to be taken into account in order to enable a further evaluation of current findings. Furthermore, it remains interesting to explore whether our results are applicable to other device groups, such as head-mounted displays, which we plan on investigating in future work.