Facilitating Workers’ Task Proficiency with Subtle Decay of Contextual AR-Based Assistance Derived from Unconscious Memory Structures

Abstract: Contemporary assistance systems support a broad variety of tasks. When they provide information or instruction, the way they do it has an implicit and often not directly graspable impact on the user. System design often forces static roles onto the user, which can have negative side effects when system errors occur or unique and previously unknown situations need to be tackled. We propose an adjustable augmented reality-based assistance infrastructure that adapts to the user’s individual cognitive task proficiency and dynamically reduces its active intervention in a subtle, not consciously noticeable way over time to spare attentional resources and facilitate independent task execution. We also introduce multi-modal mechanisms to provide context-sensitive assistance and argue why system architectures that provide explainability of concealed automated processes can improve user trust and acceptance.


Introduction
Currently available assistance systems, be it prototypes or commercially available products, can be characterized as ranging from advisory systems that ease information access for decision-making to artificial instructors that in their most distinct shape do not require any decision by the user during operation. Additionally, systems can be categorized by their contextual awareness: systems may require user input to synchronize their system state with the current task state, or they may be able (to a certain extent) to detect activities and update their model correspondingly. For instance, devices acting as personal assistants have been introduced as advisory systems. They react and provide information when specifically queried, e.g., when the user has requested prompts by setting an alarm or a custom workflow. Amazon leads the US market for intelligent personal assistants embedded into smart speakers with its solution called Alexa. Especially in the US, smart speakers can be considered widely adopted, as one out of four US adults now owns such a device [1]. Amazon now targets the market of worn assistance systems with its recently introduced smart glasses solution Echo Frames [2]. Apple is also expected to enter the market of smart glasses in the foreseeable future [3]. While personal assistance systems are currently rather reactive and advisory, this might change soon. As Eric Schmidt (former Executive Chairman of Alphabet Inc, CA, USA) noted in an interview considering the future of search engines and in particular Google: "They [the users] want Google to tell them what they should be doing next" [4]. For domain-specific tasks, systems that instruct users about what to do next also exist. For instance, the Vorwerk Thermomix TM5 is an instructing assistance system for preparing food [5]. The cooking process is configured by selecting a recipe, which is then serialized into step-by-step instructions for the user.
Users must follow the order the TM5 provides. KogniChef [6] is a prototypical system that extends such cooking assistance with greater versatility and flexibility in step execution.
While the vast majority of assistance systems are designed to provide feedback that is intended to be perceived consciously and/or guide the user's attention, the framing of active support or guidance contains several occasions where unconscious information processing impacts the user experience and behaviour. In the following, we discuss several aspects of unconscious information processing phenomena that potentially influence the user experience and user acceptance and how we tackle these issues with the currently developed cognitive assistance system AVIKOM.
First, the application of assistance systems may influence the way in which users conduct tasks in the long run. Especially when the system design considers users as mere 'instruction receivers', they may increasingly depend on the system and lose the ability to react to unforeseen situations swiftly and accurately. A system that adapts to the user's current capabilities and encourages independent decision-making can foster user commitment and prevent errors caused by the system's limitations. AVIKOM integrates an assessment of users' task proficiency on a cognitive level to facilitate the provisioning of a customized assistance experience. The process requires the identification of underlying memory structures, which cannot be consciously verbalized by users. We illustrate how AVIKOM, starting from initial assessment results, gradually transitions from being an instructing and proactive system towards a reactive and advisory system over time when tasks are repeatedly conducted (as depicted in Figure 1).
Second, rather subtle and potentially unconscious feedback may be used in peripheral displays to communicate secondary features and hereby improve multi-modal prompts or present task states relevant for process monitoring. Especially auditory displays offer as yet mostly untapped potential for effortless peripheral interaction.
Third, trust is a potentially unconscious factor of system acceptance, which has often gone unaddressed when previously unknown tools were deployed. We argue that exposing decision-making processes and embracing explainability as a system feature can ease system introduction and adaptation.

Figure 1. Assistance systems may act as advisors, instructors, or a combination of both. Systems may also require a varying amount of user feedback to assess the task's context. The currently developed AVIKOM system initially assists users with proactive instructions and subtly withdraws into a reactive advisory role over time.

AVIKOM: A Cognitive Assistance System
The AVIKOM system is a multi-modal augmented reality cognitive assistance system for industrial applications that recognizes its current user and adapts the provided assistance to individual needs [7]. Assisted tasks range from assembly instructions in manufacturing to commissioning and order processing guidance in intralogistics processes. This includes contextual suggestions about best-practice measures, real-time status updates, and locations of required assets and tools. The system is developed in a joint endeavour of three research groups together with four small and medium-sized enterprises (SMEs), as well as network partners and a diaconal institution. Within the project, the award-winning ADAMAAS system [8] and the smart hearing protection HEA²R [9] shall be combined. A cognitive skill assessment method, which has previously been deployed in ADAMAAS, will now be extended with a dynamic adaptation process. In the following, we introduce how this shall enable AVIKOM to adjust assistance over time in a subtle manner, which is conceptually comparable to the "subtle hiding" approaches described by [10].

Skill Assessment and Preemptive Adjustments
The system has been designed to transfer and enrich users' process knowledge rather than replace it with technology. To do so, the system initially assesses the user's cognitive task proficiency with a semi-automated survey method for structural-dimensional analysis of mental representations (SDA-M) [11]. The theoretical foundations of SDA-M relate to the hierarchically organized cognitive action architecture (see Table 1), which assumes that conscious mental functions of action control emerged evolutionarily from more elementary functions [12]. Furthermore, this model differentiates between action representation and different (goal-oriented or automated) action control levels. Processes on the lowest level of sensorimotor control are mostly unconscious unless they are explicitly attended to. The top level of (volitional) mental control codes intended effects into action goals. These are associated with the level of mental representations via basic action concepts (BACs), which integrate (goal-related) functional and sensory-perceptual features and serve as a cognitive benchmark for voluntary movement regulation. Through chunking and comparable means, BACs are connected with sensorimotor representations that store perceptual effects of actions. Therefore, both conscious and unconscious (automatized) processes of action organization are functionally based on mental representation structures [13]. Whereas users can understand representations of BACs in terms of pictorial and textual descriptions and associate them with corresponding long-term memory contents, the underlying associative structure of these mental representations is not assumed to be exhaustively accessible on a conscious, declarative level.
SDA-M is a method for analyzing these mental representation structures related to specific tasks for a specific user. To this end, workers who want to use the AVIKOM system are first subjected to the SDA-M split procedure using specialized software such as the QSplit tool shown in Figure 2. This software displays pairs of task-related action representations (BACs) and asks the users to judge whether they believe the actions are directly associated during task execution or not. Currently, QSplit runs on desktop and mobile devices with mouse or touch interaction and transfers its results to AVIKOM's wearable hardware. An integration of the SDA-M split procedure into AVIKOM's augmented reality user interface is currently under evaluation. The split procedure yields data about the individual strengths of associations between task-related actions in the user's long-term memory. AVIKOM then uses dedicated algorithms for automated analysis of this data based on computational cognitive architecture models [14]. This provides estimated likelihoods that the user will need assistance in a specific situation within the defined action sequence, as well as overall predictions of that user's cognitive task proficiency. These results are converted into initial task step support recommendations, which are adjusted during the system's runtime. By eliciting the unconscious structures of task-related long-term memory contents and using them to estimate initial assistance requirements, AVIKOM aims to reduce clutter and increase the salience of required instructions. Furthermore, potential errors can be targeted and prevented before they actually happen. Expected benefits of this approach include the avoidance of common error-learning pitfalls and accelerated skill acquisition. The overarching strategy of AVIKOM is then to gradually, subtly, and inconspicuously decrease the system's proactive assistance provision to individual users over time to facilitate their self-dependent task execution.
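The pipeline from split-procedure judgments to initial support recommendations can be sketched as follows. This is a deliberately simplified illustration: the function names and the per-BAC scoring rule are hypothetical stand-ins for the actual SDA-M analysis algorithms [14], which operate on full hierarchical cluster structures rather than simple fractions.

```python
# Hypothetical sketch of converting SDA-M split data into initial
# assistance levels; not AVIKOM's actual analysis algorithms [14].

def association_strengths(n_bacs, judgments):
    """Aggregate pairwise split-procedure judgments into a per-BAC score.

    judgments: list of (bac_a, bac_b, associated) tuples, where `associated`
    is the user's yes/no answer for that pair of basic action concepts.
    Returns, for each BAC, the fraction of its pairs judged as associated.
    """
    yes = [0] * n_bacs
    total = [0] * n_bacs
    for a, b, associated in judgments:
        for bac in (a, b):
            total[bac] += 1
            yes[bac] += int(associated)
    return [y / t if t else 0.0 for y, t in zip(yes, total)]

def assistance_levels(strengths, threshold=0.5):
    """Map weakly integrated BACs to a higher initial assistance need."""
    return ["proactive" if s < threshold else "advisory" for s in strengths]

# Example: 3 BACs; the user associates the pair (0, 1) but not (0, 2) or (1, 2).
scores = association_strengths(3, [(0, 1, True), (0, 2, False), (1, 2, False)])
levels = assistance_levels(scores)
```

A BAC whose associations are weakly anchored in long-term memory (here, BAC 2) would thus start out with proactive instruction, while well-integrated steps receive only advisory support.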
The avoidance of unneeded assistance may prevent habituation effects and increase the likelihood of vital remarks being recognized and acted upon. This approach treats the user's attention as what it is: a scarce resource. Especially permanently operated assistance systems based on technologies such as AR need to follow strategies that reduce cognitive load and prevent mistakes or even injuries caused by distraction through inadequately placed assistance. Again, we argue that encouraging and relying on user involvement and expertise development in task-solving strategies can prevent issues that technology alone cannot solve today.
Assistance levels will be gradually and subtly reduced as long as the frequency of action errors decreases or levels off, until no further assistance is provided at all. This dynamic adjustment shall happen in a fuzzy, not consciously noticeable manner to prevent users from predicting and consciously adjusting to these changes. Such an adaptation violates traditional usability paradigms that emphasize predictability and consistency of user interfaces (e.g., Nielsen's usability heuristics [15]). However, by providing and withdrawing assistance in a way that eludes users' conscious expectation building, AVIKOM embraces the postulate that long-term memory structures are by no means static but adapt and improve with appropriate training and experience (see, e.g., [16][17][18]). By keeping assistance augmentations versatile, we expect users to transition toward independent task solving, as they are not trained to wait for proactive instructions before tackling an imminent task step. As mentioned before, keeping assistance to the lowest necessary amount can mitigate the risk of system-induced errors caused by misalignment of system states and real-world context. A lack of information available to the system, in combination with unfavourable visual guidance, may create error categories that could be prevented by reducing the users' dependency on the system. For instance, as seen in Figure 3, an outdated guiding system might suggest impassable routes and, even worse, inadvertently cover relevant real-world cues such as warning signs. These types of errors are not specifically novel, as they can be observed on a daily basis, for instance, when GPS systems operating with outdated maps cause disruption for drivers, as happened in Colorado where about 100 drivers were stuck on a dead-end road [19]. The artist Simon Weckert exploited a similar effect when he used a cart full of smartphones to simulate a traffic jam, causing GPS systems to reroute traffic.
This allowed him to walk his cart on a completely empty street [20]. In addition to applications for user-adaptive assistance, preliminary practical experiences in project AVIKOM suggest that task analyses aimed at identifying and formalizing a set of basic actions for SDA-M may also help enterprises with making established implicit structures of work processes explicit, thus enabling the assessment and standardization of best practices.
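The runtime withdrawal described above can be sketched as a simple update rule. The thresholds, step sizes, and jitter distribution below are hypothetical illustrations, not AVIKOM's deployed adaptation policy; the point is that the decrement is randomized so users cannot consciously anticipate the withdrawal schedule.

```python
import random

def update_assistance(level, error_rate, prev_error_rate, rng=random):
    """Decay the assistance level (1.0 = fully proactive, 0.0 = none)
    while the action error rate decreases or levels off; jitter the step
    size so the withdrawal does not follow a consciously predictable
    schedule. Hypothetical rule for illustration only."""
    if error_rate <= prev_error_rate:      # improving or levelling off
        step = rng.uniform(0.02, 0.08)     # fuzzy, small decrement
        return max(0.0, level - step)
    # Errors are rising again: restore a fixed amount of support.
    return min(1.0, level + 0.1)
```

Applied after each task repetition, this rule drives the level toward zero for a steadily improving user while remaining able to re-escalate support when errors resurface.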

Context-Sensitive Prompting
While AVIKOM certainly focuses on audiovisual (in-situ) assistance, it will also feature peripheral interaction capabilities. Peripheral interaction can be defined as "activities and perceptions ... [that] take place in the periphery of attention" ([21], p. 2), which can be both subconscious and intentional, since peripheral information can easily and intentionally be shifted to the focus of attention. Sound is very effective in drawing our attention towards events outside our field of view, e.g., to become aware of somebody approaching from behind (e.g., by their footstep sounds) or of an alarm clock or mobile phone beeping on the table [22]. It is also not constrained to a single location, conveys positional information, and can be perceived independently of the listener's own orientation [23]. Technology such as 3-D audio (filtering sound to make it appear as if the sound is actually situated at a certain location outside the listener's head instead of coming from the headphones or speakers) can amplify the degree to which synthesized signals integrate with the listener's natural surroundings [24,25]. Being able to modulate sound to blend with the surroundings of the listener and the listener's ability to capture sound in a 360° fashion make the auditory channel especially suited for providing peripheral information.
Speech, however, might not be well suited for providing peripheral information, as it immediately grabs the user's attention and requires a substantial amount of cognitive resources on the part of the listener to be processed compared to non-speech sounds ([26], p. 78). Furthermore, speech has been shown to be especially distracting when the user is engaged in another conversation [27]. Nevertheless, there is some evidence that distraction by speech also partly depends on the listener's degree of participation [28]. Being actively engaged in a conversation binds more attentional resources, whereas passively listening to instructions resulted in less or no distraction [28]. Other evidence suggests that the choice of voice and its personality can influence the attentional resources required to process speech, with extroverted voices being more predictable and reducing cognitive load on the part of the listener ([29], p. 42). These insights can be used to make speech slightly more appropriate for presentation in the periphery of the user's attention when using non-speech sounds is not possible, e.g., if the information to be presented is too complex.
Non-speech sounds can easily be designed to fit an appropriate salience and demand less cognitive resources than speech ([26], p. 78). On some occasions, sounds have also been shown to be less annoying than speech [28], which helps to keep sonic feedback in the periphery of attention. There are many aspects that can increase the degree to which the listener's attention is drawn towards an auditory stimulus. Rapid onset and irregular harmonics [30] or physical properties can modulate the amount of engaged attentional resources. Another aspect is the ease with which sonic information can be decoded [31]. An everyday example of communicating information through sound is functional sounds, which are associated with a specific meaning (e.g., the notification sound of a messenger application indicating that you have received a message). While most of the relations between non-speech sounds and their meaning need to be learned (just like a language), studies have shown that carefully designing functional sounds and considering already existing metaphors between sound and concepts outside of the sonic domain can greatly increase the rate at which these relations are learned ([32], p. 49). Ideally, the learning process quickly converges to a level at which information transported through functional sounds or sonification methods (translating continuous data into sound) can be processed intuitively and subconsciously, reducing workload and distraction.
Taking the aforementioned considerations into account, AVIKOM will combine speech, functional sounds, and sonification methods to ideally support the user in solving different tasks. Complex instructions, for example, will be delivered through generated speech, whereas feedback for interactions with AVIKOM (e.g., hand gestures) and event-based messages, such as new incoming orders, will be implemented using functional sounds. When referring to events that have a specific representation in space (e.g., indicating a malfunction of a specific machine that has a fixed, known position), AVIKOM will employ 3-D audio technology, encoding the positional information directly into the warning signal. Furthermore, as sonification-based monitoring systems can enable users to efficiently work on one task while monitoring another [33], AVIKOM will synthesize appropriate sounds and, as conceptually illustrated in Figure 4, modify existing soundscapes to guide the user's attention towards relevant signals (right) or attenuate potentially distracting noise (left). This is especially useful in settings in which users have to wear hearing protection, which usually isolates users from both noise and signals.
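This division of labour between speech, functional sounds, 3-D audio, and sonification can be summarized in a small dispatch sketch. The event fields (`position`, `complex`, `continuous`) are illustrative assumptions, not AVIKOM's actual message schema:

```python
def choose_modality(event):
    """Pick an auditory output channel for an event, following the
    division of labour described above. `event` is a plain dict; its
    keys are hypothetical, chosen only for this illustration."""
    if event.get("position") is not None:
        # Spatial events: a functional sound rendered via 3-D audio, so
        # the perceived direction itself carries the positional information.
        return ("functional_sound_3d", event["position"])
    if event.get("complex"):
        # Complex instructions fall back to generated speech.
        return ("speech", None)
    if event.get("continuous"):
        # Continuous task states are sonified for peripheral monitoring.
        return ("sonification", None)
    # Default: discrete events get a plain functional sound.
    return ("functional_sound", None)
```

For example, a machine malfunction with a known position would be routed to the 3-D audio channel, while a multi-step assembly instruction would be spoken.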
When monitoring transitions into action, users intuitively evaluate their peripersonal space, the area in which they can interact with the environment [34]. The approximation of this space is based upon an integrated neural representation of their body (body schema). The peripersonal space is egocentric, dynamic, and constantly reevaluated without effort. Humans can also project peripersonal space estimations, which is relevant for collaboration. They approximate their interlocutor's peripersonal space and adjust their actions accordingly. Strategies to make use of the interaction space, i.e., the intersection of multiple individuals' peripersonal spaces, in interaction with humanoid robots have been outlined in [35]. A similarly volatile egocentric space concept has been proposed with the notion of Action Space [36]. Action spaces consider more factors than physical distance, for instance affordances of objects and the capability of users to perceive emitted feedback, and to what extent. AVIKOM offers context-sensitive prompting, which distinguishes between directly accessible targets and targets out of the user's reach. For the latter, AVIKOM will provide multi-modal guidance based on the aforementioned sonification models, considering distance, occlusion, and relative direction from the user's current head orientation.
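A minimal sketch of such a reach check follows, under simplifying assumptions that are not part of the AVIKOM design: flat-floor 2-D geometry, a fixed hypothetical reach radius instead of a dynamically estimated peripersonal space, and no occlusion handling.

```python
import math

def prompt_for_target(user_pos, head_yaw_deg, target_pos, reach=0.7):
    """Classify a target as within the user's reach or as requiring
    guidance, and compute its bearing relative to the current head
    orientation (the direction a 3-D audio cue would be rendered from).
    Positions are (x, y) floor coordinates in metres; the 0.7 m reach
    radius is an illustrative placeholder, not a measured value."""
    dx = target_pos[0] - user_pos[0]
    dy = target_pos[1] - user_pos[1]
    distance = math.hypot(dx, dy)
    if distance <= reach:
        # Directly accessible: in-situ prompting suffices.
        return {"mode": "in_situ", "distance": distance}
    # Bearing in degrees relative to where the user is currently looking.
    bearing = (math.degrees(math.atan2(dy, dx)) - head_yaw_deg) % 360.0
    return {"mode": "guidance", "distance": distance, "bearing_deg": bearing}
```

A real implementation would additionally weigh occlusion and object affordances, as the Action Space notion [36] suggests.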

System Transparency and Explainability
As part of our stakeholder-centered agile development approach [37], we have conducted focus group workshops and interdisciplinary analyses to elicit potential ethical issues and related non-functional requirements of an AR assistance system such as AVIKOM. Participants from a broad range of disciplines (psychology, sociology, economics, engineering, logistics, IT, and cognitive science) took part in these analyses. Considering user acceptance, the need for a system to be able to explain its capabilities and justify its decision-making has been brought up. The severity of this issue was assigned to level two out of four of the model for the ethical evaluation of socio-technical arrangements (MEESTAR) [38], i.e., it was considered "ethically sensitive, but this can in practice be compensated for". Presumably, this requirement arose to some degree from the fact that spatial computing devices such as the HoloLens 2 are fairly new and not yet known to the majority of potential users. While the vast majority of the user base can, for instance, intuitively grasp the features and functionality of simple, well-known tools such as a hammer, there are no clear affordances for AR goggles. Systems that do not provide introspection features might cause distrust and also create opportunities for abuse, e.g., when tech-savvy colleagues, supervisors, or hackers actively alter their output to discredit or harm the user. Even worse, system design may be steered towards so-called dark patterns. This term loosely describes a set of measures applied to software to 'trick' users into excessive usage (e.g., variable ratio schedules [39]) or make it harder to stop using it (e.g., infinite scroll [40]).
A constantly worn augmented reality system may abuse its central position in the user's interaction loop by applying similar misleading gratification patterns.
We strive to position AVIKOM as a tool for the user rather than an instruction system. This role must be communicated to the user and put into action by constantly reassuring the user that they are in charge of the system. Features that cannot be deactivated, or only with the permission of a supervisor, should be avoided unless absolutely necessary, e.g., to ensure workers' safety. If a system is considered a 'snitch' rather than a tool for the user, acceptance and consistent use cannot be achieved. For instance, the introduction of Google Glass to consumers is widely considered to have been unsuccessful. While some blame the lack of features and technological limitations, others claim that the vision of users acting as the eyes and ears of Google in public was unacceptable and represented a massive privacy intrusion even for a usually rather unconcerned audience [41].
The system's concept as a helper must also be reflected in the way in which the system addresses the user. Relevant questions include: Is the overall tone and presentation of prompts friendly and appreciative? Does the system use domain-specific terminology and apply it correctly?
Systems that suggest or even require a certain approach to a given task often do this by design. They are formalized for a certain way of task solving and require the user to follow this line of thinking, disregarding the conceptual solution space. This may prime or at least influence the way in which users assess a future task. Such an influence is not inherently bad, since it provides a way to communicate and teach best practices. While the reasoning for and background of these problem-solving strategies are not communicated, and they are thus unconsciously (and to a certain extent unwillingly) 'forced' onto the user, the resulting skill may nevertheless be a desirable effect.
However, issues arise when the system's problem-solving space is unintentionally narrowed down and collides with established solutions. This may require a significant behavior shift from users or lead to failed adoption altogether.
Nudging users toward unwanted or sub-optimal behavior should be considered even more precarious if unconscious manipulation techniques were used. In recent years, several approaches to subliminal manipulation of users through technical systems have been proposed and investigated [42], e.g., subliminal cueing of item selection using masks [43], exploitation of change blindness effects or saccadic suppression [10], visual crowding and continuous flash suppression [44].
To detect and analyze undesired assumptions about problem-solving strategies, as well as potentially unwanted manipulations, a system's decision process must be explainable. While some sub-steps may lack explainability due to their sub-symbolic nature, the overarching reasoning should be made transparent to users on demand. Introspection is essential in this endeavor. AVIKOM deliberately decouples process logic, content, and presentation to offer a transparent, flexible, and interchangeable module structure. With gRPC as a middleware [45], AVIKOM realizes a separation of concerns as shown in Figure 5, which allowed us to establish a diverse development toolkit. Components of this toolkit have been chosen to consider user roles and features of the concern in question. For process logic, we have chosen the Business Process Model and Notation (BPMN) [46], which is executed by a BPM engine. BPMN is particularly suited to visualize, investigate, and track processes. It is a common tool for enterprise controllers to track manufacturing or service states on a macro level. However, we claim that it is also suitable to formalize tasks and communicate expected workflows. AVIKOM brings this level of abstraction closer to the user in an attempt to illustrate decision-making processes and build trust. For instance, the AVIKOM headset component HEA²R may act as hearing protection but occasionally enables a pass-through of a (filtered) outside noise signal. As mentioned earlier, this is intended to provide environmental cues that are commonly not accessible when wearing ear protection. However, users may not initially understand such an event, even if extensive care has been taken to design and present it in a comprehensible way. With introspection, users can request recent event and activity logs to get a better grasp of the system's behaviour and use that knowledge to make informed decisions about whether they want to opt in or out of provided assistance features.
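The introspection idea can be sketched as a component that records each automated decision together with a human-readable reason, which users can query afterwards. The class and its event fields are hypothetical illustrations; in the actual system, such records would travel over the gRPC middleware rather than live in a single in-process object.

```python
import time
from collections import deque

class IntrospectionLog:
    """Minimal sketch of introspection: every automated decision is
    recorded with a human-readable reason so users can later ask
    "why did that happen?". Field names are illustrative only."""

    def __init__(self, maxlen=1000):
        # Bounded buffer: old events are discarded once maxlen is reached.
        self._events = deque(maxlen=maxlen)

    def record(self, component, action, reason):
        self._events.append({
            "time": time.time(),
            "component": component,
            "action": action,
            "reason": reason,
        })

    def recent(self, n=10):
        """Return the n most recent events, newest first."""
        return list(self._events)[-n:][::-1]

log = IntrospectionLog()
log.record("HEA2R", "noise_passthrough_enabled",
           "forklift warning signal detected behind user")
```

A user surprised by a sudden pass-through event could thus retrieve the logged reason and decide whether to keep that assistance feature enabled.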
Introspection also offers an interesting channel for self-reflection and monitoring, as it allows, to a certain degree, conclusions to be drawn about the user's behaviour from observed system events. Users may wish to be informed when their performance degrades slowly over time or when they do not reach a self-assigned goal (e.g., stretching every 15 min to prevent back pain). As this kind of information also offers massive potential for abuse and potentially illegal staff performance assessment, AVIKOM decouples task fulfilment from the user data. This allows separate and personalized encryption to be applied to secure this particularly sensitive data.
AVIKOM's presentation layer is by design abstract and redundant. A user may operate multiple input and output devices at the same time. We call these devices peers. A peer can be assigned to a user at runtime and operates statelessly with respect to the task at hand. An orchestration layer controls how information is presented and how feedback is retrieved from the user by filtering content based on the current state of the logic layer and providing cues (e.g., urgency, difficulty, instruction type) to a peer. This layer also acts as an interaction manager [36] and determines which peer is used based on user preferences and environmental factors such as brightness or noise. Modal capabilities of peers are not modelled as binary, though, as a user may use multiple devices at the same time with intersecting or redundant modal capabilities. Thus, a dedicated headset can override the integrated speakers of a spatial computing unit by claiming a higher priority for its auditory display capability.
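The non-binary capability model can be sketched as a small arbitration routine: each peer declares a priority per modality, and the orchestration layer routes output to the highest-priority peer that supports the requested modality. Peer records and field names here are illustrative assumptions, not AVIKOM's actual schema.

```python
def route_output(peers, modality):
    """Pick the peer that should render a given modality: among all
    peers declaring that capability, the one with the highest declared
    priority wins, so a dedicated headset can override a spatial
    computing unit's built-in speakers. Hypothetical peer schema."""
    candidates = [p for p in peers if modality in p["capabilities"]]
    if not candidates:
        return None  # no connected peer can render this modality
    return max(candidates, key=lambda p: p["capabilities"][modality])

peers = [
    {"name": "spatial_unit", "capabilities": {"visual": 1, "audio": 1}},
    {"name": "headset", "capabilities": {"audio": 2}},  # higher audio priority
]
```

With this setup, audio output is routed to the headset while visual output stays on the spatial computing unit, matching the redundancy behaviour described above.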

Conclusions
When designing modern assistance systems, the system's and the user's roles have to be carefully considered. By ignoring the effects of static assumptions about the user in a commonly dynamic process, assistance systems may cause a subtle yet continuous alignment of users towards the user model of the system. This leaves relevant cognitive resources of users untapped and may lead to a form of system dependency that hinders appropriate decision-making in previously unknown situations. We have proposed a dynamic process for subtle and not consciously noticeable modulation of assistance allocation based on analyses of (unconscious) task-related memory structures, in combination with peripheral displays, to minimize unnecessary cognitive load, provide a seamless and pleasant assistance experience, and facilitate learning processes for self-dependent task execution. The AVIKOM system is designed to provide guidance for a variety of work-related tasks in small and medium-sized enterprises. In interdisciplinary workshops, a need for traceable and transparent system behavior has been identified. We argue that explainability as a system feature will be of central importance for future systems to gain user trust and acceptance and to ease the introduction of previously unknown complex tools such as augmented reality-based cognitive assistance systems. We have implemented a software architecture for AVIKOM that allows us to separate content authoring from interface design and process logic. The system uses state-of-the-art approaches for content recognition and world registration. Existing SDA-M tools can be used to elicit long-term memory structures and convert them into explicit task proficiency estimates to adapt AVIKOM's assistance verbosity. We strive to attune the presented system architecture and individually proven concepts to our project partners' requirements in the next steps. This may also reveal insights into the interplay of components and strategies for integrating assessment tools into AR.
Whether usability tests in actual working environments can be conducted or whether we have to fall back on remote testing approaches depends on the currently rather dynamic development of pandemic countermeasures.

Funding: This research has been funded by the German Federal Ministry of Education and Research (BMBF) and the European Social Fund (ESF) in the frame of the project "AVIKOM" for the BMBF call on "The future of work", to strengthen innovation potential in the labour organization of SMEs during the digital transformation process.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.