Multimodal Assistance in Rehabilitation: User Experience of Embodied and Non-Embodied Agents for Collecting Patient-Reported Outcome Measures

Ashrafi, Navid; Graf, Philipp; Marquardt, Manuela; Harnisch, Philipp; Hillmann, Stefan; Ploner, Nico; Compagna, Diego; Cirit, Eren; Papst, Lilia; Voigt-Antons, Jan-Niklas

doi:10.3390/virtualworlds5010015

Open Access Article

by Navid Ashrafi ¹,*, Philipp Graf ², Manuela Marquardt ³, Philipp Harnisch ⁴, Stefan Hillmann ⁴, Nico Ploner ⁵, Diego Compagna ², Eren Cirit ⁶, Lilia Papst ³ and Jan-Niklas Voigt-Antons ¹,*

¹ Immersive Reality Lab, Hamm-Lippstadt University of Applied Sciences, Dr.-Arnold-Hueck-Straße 3, 59557 Lippstadt, Germany

² Faculty 11, Munich University of Applied Sciences, Am Stadtpark 20, 81243 Munich, Germany

³ Institut für Medizinische Soziologie und Rehabilitationswissenschaft, Charité–Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany

⁴ Quality and Usability Lab, Technische Universität Berlin, Marchstr 23, 10587 Berlin, Germany

⁵ Acalta GmbH, Wetterkreuz 15, 91058 Erlangen, Germany

⁶ Dexter Health GmbH, Bonner Wall 126, 50677 Köln, Germany

* Authors to whom correspondence should be addressed.

Virtual Worlds 2026, 5(1), 15; https://doi.org/10.3390/virtualworlds5010015

Submission received: 25 December 2025 / Revised: 9 February 2026 / Accepted: 25 February 2026 / Published: 19 March 2026

(This article belongs to the Topic AI-Based Interactive and Immersive Systems)


Abstract

The collection of patient-reported outcome measures (PROMs) is a key element of patient-centred care. At the same time, completing these measures poses obstacles for many patients, leading to the underrepresentation of these groups in the data. We therefore developed a multimodal, AI-driven assistance system to support patients in completing PROMs. The system’s interface comprised a digital tablet presenting the PROM questionnaire items and an assistant in one of three forms of embodiment: a virtual avatar, a physical robot, and a voice-only agent. To evaluate users’ experience and ratings of the system, two separate studies were conducted in two rehabilitation centers with a total of 195 patients. A mixed within–between RCT was conducted at an outpatient clinic (Clinic A), where patients completed PROMs both with and without an assistant, and a between-subjects study was conducted at an inpatient clinic (Clinic B), comparing routine PC-based care with avatar- and robot-assisted PROM administration. Our results suggest a preference for the unassisted, tablet-only condition in Clinic A, whereas in Clinic B both agent conditions were preferred over routine care. We further analyzed trust and social presence to gain a more thorough understanding of the users’ experience. Our analysis shows a higher trust rating for the voice-only assistant, while the robot, virtual avatar, and voice-only conditions were all perceived as more socially present. The impact of demographic factors and affinity for technology on user ratings was also examined. Our findings shed light on the role of agent embodiment in PROM assistance and contribute to the future design and evaluation of effective, engaging, and trustworthy systems for data collection in healthcare settings.

Keywords:

assistance; medical setting; embodied conversational agents

1. Introduction

PROMs are now recognized as essential components of contemporary healthcare, providing patient-centric insights that complement traditional clinical metrics and facilitate better treatment evaluation and quality control at micro- (clinical), meso- (organizational), and macro- (systemic) levels [1,2]. Historically, PROMs have enabled direct reporting of symptoms, well-being, and functional status, with both generic instruments (e.g., EQ-5D, WHO-5 PROMs) and disease-specific variants in use across a broad array of clinical settings [1]. The recent literature provides strong evidence for PROMs’ benefits in improving patient–provider communication, decision-making, and engagement [1]; however, real-world uptake is hindered by technical, motivational, and accessibility barriers [2,3]. Digital transformation has partially responded to these challenges, with electronic PROMs (ePROMs), web portals, and mobile apps providing improved access, feedback, and data integration [4]. Nevertheless, limitations remain regarding inclusivity for patients with sensory, motor, cognitive, or language barriers [1,3].

PROMs have become increasingly important for diagnosis, health status monitoring, and evaluating the effectiveness of clinical treatments, fostering a patient-centered approach to health care [5]. However, traditional paper-based questionnaires are often perceived as lengthy, monotonous, and exhausting, especially by individuals with mental and cognitive impairments, which can result in inaccurate input, skipped questions, or drop-outs. This ultimately hinders comprehensive analysis and effective treatment recommendations [6,7]. These challenges, including limited patient engagement, comprehension barriers, and the need for resource-intensive clinical support, are particularly pronounced among groups with additional accessibility needs, such as older adults and people with psychological or neurological disorders [6]. In the worst cases, questionnaire fatigue leads to increased non-response rates and structural bias in the data from already disadvantaged groups. As a result, the patients with the greatest health burden, who would benefit most from PROMs, are systematically underrepresented in the data. While standardization, digitalization, simplified language, and dividing questionnaires across sessions have been proposed to address these barriers, ongoing research continues to seek more effective ways to improve patient data collection and its accessibility in routine healthcare [7,8].

Recent advances in artificial intelligence and multimodal interfaces offer promising solutions to improve the usability and inclusivity of PROM collection systems [1]. Rather than relying solely on static digital forms, intelligent agents can act as interactive intermediaries that explain questions, structure the interaction, and adapt to patients’ abilities and preferences. Virtual agents rendered on screens can use gaze, facial expressions, and synchronized speech to create a more engaging and supportive interaction context, which has been shown to positively influence perceived rapport and willingness to disclose health-related information [9,10]. Socially assistive robots extend these capabilities into the physical space, where co-present, embodied interaction can motivate patients to complete demanding or repetitive tasks, including structured self-report assessments in rehabilitation settings [11,12]. Voice-only agents and speech assistants, embedded in tablets, smart speakers, or mobile devices, offer yet another modality by enabling hands-free and eyes-free completion of questionnaires, which is particularly valuable for patients with visual, motor, or literacy-related barriers [13,14]. In combination with contemporary AI techniques such as automatic speech recognition, natural language understanding, and adaptive dialogue management, these agents can rephrase items, provide examples on demand, and adjust pace or verbosity in real time while preserving the psychometric structure of PROM instruments [15,16]. Multimodal systems that orchestrate visual, auditory, and interactive feedback can further reduce cognitive load, for example, by highlighting key terms on screen while verbally summarizing prior answers or indicating overall progress through the questionnaire [17]. 
In the context of rehabilitation, such AI-augmented agent systems have the potential to transform PROM administration from a burdensome documentation task into a guided, supportive process that better accommodates fluctuating energy levels, cognitive limitations, and emotional states, thereby improving completion rates and data quality for patient groups that are often underrepresented in current PROM datasets [18]. In this work, we do not directly measure completion rates or missing data; instead, we focus on user experience aspects that serve as necessary preconditions for the longer-term goals of improving PROM completion and data quality.

However, it remains an open question which form of agent embodiment is appropriate and functional in this context, and rigorous user experience (UX) evaluations are still needed to guide the effective design and deployment of such systems in medical settings. The present exploratory study examines, in two clinics, the impact of agent embodiment and interface modality primarily on patients’ pragmatic and hedonic quality of experience and on perceived trust and social presence. We additionally investigated the roles of demographic factors and affinity for technology in shaping user experiences. We focus especially on differences caused by an agent’s embodiment, thus contributing to the question of which embodiment adds practical value to interactions with a speech assistant system. By comparing virtual, physical, voice-only, and no-agent PROM interfaces in a large sample of patients, we aim to inform the future design of effective, engaging, and trustworthy multimodal assistant systems in healthcare. In the context of our work, trust refers to patients’ willingness to disclose personal health data to our agents. Eliciting this willingness requires visual and vocal characteristics that make patients feel safe and engaged in the conversation. We examined these characteristics in detail and implemented them in our agents to elicit high levels of trust and social presence.

2. Literature Review

Modern advancements in artificial intelligence and multimodal user interface (UI) design, combining touch, voice, and visual modalities, have shown promise for enhancing usability, accessibility, and personalization [19]. Systems leveraging generative AI models, speech recognition, and natural language processing enable context-adaptive support, personalized conversation, and emotional attunement, reducing users’ cognitive load and improving engagement across diverse populations [3,20]. Beyond linguistic adaptation, the embodiment of an assistance system, whether as a virtual avatar or a physical robot, can contribute significantly to its effectiveness by providing a realistic and potentially intuitive interface for speech support, while simultaneously fostering social presence and relational cues that promote engagement and persistence with PROM completion [21,22]. For instance, multimodal avatars, conversational robotic agents, and intelligent assistants can guide PROM completion, offer real-time explanations, and adapt dynamically to patient needs by combining verbal prompts with gaze, gesture, and facial expressions that help users interpret and contextualize questionnaire items [3,21]. Our system builds on this body of research: it is an AI-powered, multimodal speech assistant specifically designed to support PROM collection and enhance the patient experience in two rehabilitation centers [23]. Key features include embodied conversational agents (avatars and robots), voice-enabled interactive questionnaires, and graphical interfaces that accommodate varying literacy, language, and physical abilities [21]. Similar projects, such as “ReThiCare” for elderly populations, and studies deploying humanoid robots for clinical engagement emphasize the importance of trust, adaptability, and participatory design for achieving sustained acceptance and meaningful use in vulnerable patient groups [1,22].
Interoperability, data privacy, and compliance with standards such as Fast Healthcare Interoperability Resources (FHIR) are increasingly prioritized, facilitating integration into broader health information infrastructures and enabling PROM-based insights to inform routine clinical workflows [2,3]. Participatory and co-creative development and research approaches, which involve patients in decisions about the design and evaluation of these technologies, are critical not only from an ethical perspective but also for shaping both the pragmatic qualities (e.g., efficiency, clarity, reliability) and the hedonic qualities (e.g., enjoyment, perceived support, social presence) that determine overall user experience [2,23]. This study explicitly investigates these dimensions.

Virtual and physically embodied agents have been increasingly investigated as mediators of healthcare interactions, including PROM administration [24]. Recent work on agent appearance in a PROM application shows that perceived competence, rather than human-likeness, is the dominant factor in agent selection and willingness to share health data, and that clothing style systematically shapes perceptions of warmth and competence in virtual healthcare assistants [25]. Emotion-adaptive virtual health assistants built with photorealistic MetaHuman avatars further demonstrate that synchronizing speech and facial expressions to users’ emotional states can enhance perceived naturalness and overall user experience in health-related conversations [26]. Complementary studies on agent size and spatial representation in a conversational PROM system highlight that medium-sized agents can optimize usability, perspicuity, and stimulation, and that social presence is strongly linked to attractiveness and engagement, with some gender-specific preferences for human-scale versus smaller agents [27]. Interface layout and device configuration also matter: comparing PROM interactions on a single tablet versus a dual-tablet setup indicates that a single device can yield higher usability and pragmatic quality, although some users attribute a stronger sense of presence to the agent displayed on a second tablet [28]. Beyond PROMs, ongoing work on highly photorealistic interviewer agents in virtual and immersive environments illustrates how embodiment and medium affect anxiety, social presence, and motivation in demanding interaction scenarios such as job interviews or clinical consultations, underscoring the broader relevance of virtual agent design and modality for sensitive, assessment-oriented use cases [29]. 
In parallel, healthcare-focused voice assistants and conversational agents deployed on smart speakers and mobile devices are being explored for chronic disease management and patient education, showing promising results for feasibility and engagement but also revealing concerns around privacy, reliability, and equitable accessibility that must be considered when designing voice-based PROM support systems [14,30].

In summary, the literature highlights the transformative potential of combining multimodal AI and digital assistance tools [31], such as speech assistants and physical and virtual agents, to overcome patient-side barriers in PROM collection. Building on this synthesis, this research aims to provide equitable and engaging access for diverse patient groups and to foster evidence-informed healthcare practice [2,3,4,19,20,21,22]. Our study addresses the aforementioned gaps by developing and evaluating an AI-driven, multimodal assistance system designed to enhance PROM collection in two rehabilitation clinics. The specific forms of agent embodiment (virtual avatar, physical robot, voice-only) were established as part of the overall project setup, including the participatory design process that guided system development. The conditions were designed to be comparable in functionality while differing in embodiment. This study explores whether these different embodiments lead to measurable differences in patients’ pragmatic and hedonic user experience, as well as in perceived trust and social presence. Given the exploratory design, the aim is to identify patterns in user experience that can inform the design of future multimodal PROM assistance systems.

3. Materials and Methods

3.1. Participatory Design Process

We recruited a five-member patient advisory board, selected to match the target group in age and rehabilitation background, to inform the project and make key design decisions. Members were recruited via the [anonymized] study mailing list; their mean age of 61 years reflected the demographics of rehabilitation clinics. Inclusion required a connection to the rehabilitation system, ideally through personal experience as a patient (one member with psychosomatic and three with somatic rehabilitation experience); three members also had a background in a healthcare profession. Written informed consent was obtained from all members.

In order to combine participatory and co-design elements, we agreed with the patient advisory board at the beginning of the process on a set of decisions for which the board would hold final decision-making authority. This included agent identity, character, physical robot platform, and agent appearance. In the case of the virtual avatar appearance and the robotic platform, we presented them with a set of possibilities for evaluation. The board reviewed a range of humanoid and non-humanoid agents, including both realistic and stylized (cartoon-like or abstract) designs, and provided qualitative feedback on their preferences.

The six 3 h workshops were structured as follows: (1) introductions and project overview; (2–4) collaborative design; (5) final prototype testing and fine-tuning; (6) process reflection. Each workshop ended with a project-funded dinner as compensation for participation.

3.2. System Design

The current system was conceived as a modular, multimodal assistance platform for digital PROM collection. Its architecture is built around a tablet application (Android/iOS) that is connected to and synchronized with the multimodal agents via a local Message Queuing Telemetry Transport (MQTT) publish/subscribe server: a physical robot (Furhat [32]), a virtual agent, and a voice-only agent (see Figure 1). A central dialogue management system integrated components such as Automatic Speech Recognition (ASR), Natural Language Understanding (NLU), and Text-to-Speech (TTS), enabling real-time navigation of questionnaires and personalized support. Responses were stored in a FHIR-compliant format to support clinical interoperability. Particular attention was paid to inclusivity: the design addressed potential motor, sensory, and cognitive barriers through flexible input/output modalities. Speech support, easy-read language, on-demand explanations, and context-adaptive prompts contributed to broad usability across diverse user groups, including people with little digital experience and those with physical or mental impairments.
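The FHIR-compliant storage mentioned above can be illustrated with a minimal sketch that serializes answered items as a FHIR R4 `QuestionnaireResponse` resource, built as a plain Python dictionary. The Questionnaire URL, the item `linkId`s, the integer-only answer type, and the helper name `build_questionnaire_response` are hypothetical; the platform’s actual resource layout is not specified here.

```python
import json
from datetime import datetime, timezone


def build_questionnaire_response(questionnaire_url, answers):
    """Build a minimal FHIR R4 QuestionnaireResponse as a plain dict.

    `answers` maps a (hypothetical) item linkId to an integer score,
    e.g., a Likert rating given via touch or speech.
    """
    return {
        "resourceType": "QuestionnaireResponse",
        "questionnaire": questionnaire_url,  # canonical URL of the Questionnaire
        "status": "completed",
        "authored": datetime.now(timezone.utc).isoformat(),  # dateTime with UTC offset
        "item": [
            {"linkId": link_id, "answer": [{"valueInteger": value}]}
            for link_id, value in answers.items()
        ],
    }


example = build_questionnaire_response(
    "https://example.org/fhir/Questionnaire/demo-prom",  # hypothetical URL
    {"q1": 3, "q2": 1},
)
print(json.dumps(example, indent=2))
```

Plain dictionaries keep the sketch dependency-free; a production system would typically validate such resources against the FHIR specification before exporting them.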

3.2.1. Technical Components

Questionnaire Tablet

Serving both as the non-agent-assisted benchmark and as the means of questionnaire administration, the tablet application constituted the primary human–system interface for self-administered patient-reported outcomes (PROs) within the platform. Interaction elements for common questionnaires could be developed rapidly without preliminary studies, enabling a user-centered, iterative development process for questionnaire selection and agent development. Each iteration incorporated further adaptation features, such as color schemes, dynamic button sizing, and context-sensitive help boxes, and custom question types were introduced as needed. The software was developed as a platform-independent modular app prioritizing accessibility and configurability. User-facing features included a light/dark mode toggle, dynamic font and button size adaptation, and context-sensitive help boxes. Touch targets were intentionally enlarged and responsive, and both touch and speech input were supported. The software implemented a diverse range of question formats, including standardized multiple-choice questions, scale sliders, and voice-only input items where appropriate. Figure 2 shows the questionnaire tablet together with the Furhat robot and the virtual agent application.

The tablet application’s architecture was based on real-time MQTT messaging for seamless integration with other components, such as the embodied agents or temporary experimental units. We developed two questionnaire sets, one for each clinic. The menu of the questionnaire application allowed patients to configure settings such as the sound volume, the speed of the agent’s speech, the agent’s verbosity (i.e., how much information the agent gives about each question), and the background color. The questionnaire content and session data adhered to FHIR interoperability standards, enabling export and integration into existing clinical workflows or research databases. Results were stored locally and could be exported as files (e.g., Excel or PDF), with support for manual and automated transfer to hospital information systems where permitted.
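The menu settings described above could be propagated from the tablet to the agents as a small JSON message. The following sketch assumes a hypothetical `AgentSettings` schema, a hypothetical topic name, three verbosity levels, and clamping ranges chosen purely for illustration; none of these are taken from the actual system.

```python
import json
from dataclasses import asdict, dataclass

VERBOSITY_LEVELS = ("short", "normal", "detailed")  # hypothetical levels


@dataclass
class AgentSettings:
    """Per-session settings a patient can change in the tablet menu (hypothetical schema)."""
    volume: float = 0.8        # 0.0 (muted) to 1.0 (maximum)
    speech_rate: float = 1.0   # multiplier applied to the default TTS rate
    verbosity: str = "normal"  # how much explanation the agent gives per item
    dark_mode: bool = False    # background mode shared by tablet and virtual agent

    def validated(self):
        """Return a copy with numeric values clamped to illustrative safe ranges."""
        if self.verbosity not in VERBOSITY_LEVELS:
            raise ValueError(f"unknown verbosity: {self.verbosity!r}")
        return AgentSettings(
            volume=min(max(self.volume, 0.0), 1.0),
            speech_rate=min(max(self.speech_rate, 0.5), 2.0),
            verbosity=self.verbosity,
            dark_mode=self.dark_mode,
        )


def settings_message(settings):
    """Serialize settings into a (topic, payload) pair for the message broker."""
    return "prom/session/settings", json.dumps(asdict(settings.validated()))


topic, payload = settings_message(AgentSettings(volume=1.4, verbosity="detailed"))
print(topic, payload)
```

Validating on the sending side means every subscribing agent receives values it can apply directly, whichever embodiment is active.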

Virtual Avatar

To create a virtual agent for our system, biographic information about the target patient population, including age, background, and diagnosis, was first collected from both clinics. A UX study using the same set of avatars as in the patient advisory board discussion was conducted to quantitatively assess perceived suitability for the PROM context. Synthesizing insights from the UX study, the advisory board, and related work on avatar appearance (e.g., clothing, facial characteristics, hairstyle, ethnicity, age, and gender) [33,34,35], the final design choice was a photorealistic female character with a warm, friendly, and competent appearance in professional attire, paired with a matching warm and calming voice. This agent was further examined in several additional UX studies and repeatedly rated by the patient advisory board throughout the project to ensure continuous alignment with a user-centered design approach.

The virtual agent application was developed using the Unity game engine (version 2021.3.16f1) [36] and displayed on a 13-inch iPad Pro. To design the virtual agent, we used Daz 3D (version 4.22) [37], a 3D modeling tool specialized in providing rigged 3D human models. The Genesis 8.1 base female character, with built-in face morph and lip-sync capabilities, was used to sculpt and customize our character. The avatar had an idle body animation and a neutral face capable of performing facial animations and lip sync. The plain white background adapted to the dark/light mode of the questionnaire tablet. The character wore a white coat with a name tag and a logo, similar to those worn by the clinic’s staff. The uniform color (white or light blue) and logo also adapted to each clinic based on the questionnaire set selected on the questionnaire tablet (Figure 2). The avatars were not created based on specific models, but rather inspired by the earlier. The avatar’s voice was generated by a local response generator using a warm and friendly German female voice, and the voice snippets were synchronized with the agent using the SALSA LipSync (v2) [38] package for Unity, which provides automatic lip and emotion synchronization for digital characters and objects.

Furhat

The Furhat robot functioned as the physical embodied agent in the system. Selected in consultation with the patient advisory board, the Furhat offered human-like expressiveness, articulated facial animations, and social cues such as eye gaze and idle head movements, aiming to foster intuitive interaction. To align its appearance with the clinical environment, two custom shirts (white and blue) were designed for the robot, one for each clinic, featuring a name tag and the respective clinic logo and matching both the local staff uniforms and the clothing of the virtual agent (Figure 2). This supported a fairer comparison between modalities and helped patients relate to the agents as part of the clinical team. A warm, calming female voice and a friendly, approachable facial appearance were chosen for the robot to mirror the virtual agent’s persona, ensuring a consistent tone and perceived demeanor across conditions. The device interfaced with the MQTT backend and shared the communication infrastructure with the tablet app and virtual avatar, ensuring consistent protocol handling, synchronized prompts and responses, and unified user profiles. The Furhat delivered audio prompts, interpreted spoken responses, and provided both visual (LED, facial expression, gaze) and auditory feedback. A blue light at the base of the robot indicated the status of the microphone, making Furhat’s listening state transparent to users.

3.2.2. Component Integration

The technical components described in Section 3.2.1 are connected through the MQTT messaging protocol, which implements a publish/subscribe pattern, allowing for loose coupling and the interchange of individual components (e.g., questionnaire tablet ↔ Furhat). Accordingly, the setup comprises an MQTT server instance running on a local computer, with all components connected to the same network. The benefit of this approach was that we did not need to define the exact data schema for communication between components at the outset, which allowed for a more iterative development method. At the same time, communication remained structured, independent of which component was publishing or subscribing, on which network, or on which device it was running.
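The loose coupling described above rests on MQTT topic filters, in which `+` matches exactly one topic level and `#` matches zero or more remaining levels. The sketch below is a pure-Python, in-process stand-in for a broker, written only to illustrate topic matching and the decoupling of publishers from subscribers; a real deployment uses an MQTT broker and a client library such as Eclipse Paho. The class name `LocalBus` and all topic names are hypothetical.

```python
from collections import defaultdict


def topic_matches(topic_filter, topic):
    """Return True if `topic` matches an MQTT-style filter using '+' and '#'."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            return True  # multi-level wildcard: matches zero or more remaining levels
        if i >= len(topic_levels):
            return False  # filter is longer than the topic
        if level != "+" and level != topic_levels[i]:
            return False  # '+' matches any single level; otherwise require exact match
    return len(filter_levels) == len(topic_levels)


class LocalBus:
    """In-process stand-in for an MQTT broker (illustration only, no networking)."""

    def __init__(self):
        self._subscriptions = defaultdict(list)  # topic filter -> callbacks

    def subscribe(self, topic_filter, callback):
        self._subscriptions[topic_filter].append(callback)

    def publish(self, topic, payload):
        # Publishers only name a topic; they never reference a subscriber directly.
        for topic_filter, callbacks in list(self._subscriptions.items()):
            if topic_matches(topic_filter, topic):
                for callback in callbacks:
                    callback(topic, payload)


bus = LocalBus()
received = []
bus.subscribe("prom/+/answer", lambda topic, payload: received.append((topic, payload)))
bus.publish("prom/tablet/answer", {"linkId": "q1", "value": 3})
bus.publish("prom/furhat/speech/start", {"utterance": "u1"})
print(received)
```

In this sketch, replacing one agent with another only changes which component subscribes to a topic; the publishing component stays unchanged, which mirrors the interchangeability argued for above.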

A major integration challenge was the potential delay of messages sent via the MQTT protocol. For example, when the system reads out a synthetic speech response, a message is sent to inform the speech recording component. If this message is not delivered immediately, the system may record the beginning of its own synthetic speech and thus create a self-talk loop. We solved this problem by having speaking components emit start and stop signals: the recording component waits until every received start signal has been followed by its matching stop signal before it takes recorded input into account again. A dialogue manager orchestrated the questionnaire navigation and conversational flow and routed messages and commands between components. This allowed the separate modules to function as a single, synchronized system with low latency. While integrating the virtual agent into the system was relatively straightforward, integrating the Furhat robot raised challenges, such as SDK limitations (e.g., the stability of custom speech synthesis or lip sync) and the need for robust fallback mechanisms in the event of network or recognition errors.
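The start/stop mechanism can be sketched as a small gate that tracks outstanding utterances. This sketch assumes that each start and stop signal carries a shared utterance ID (how signals were matched is not stated), and it additionally tolerates a stop signal arriving before its start, which goes beyond the mechanism described above. The class and method names are hypothetical.

```python
import threading


class SpeechGate:
    """Suppress speech recognition while any synthetic speech may still be playing.

    Recognition input is accepted again only when every start signal seen so
    far has been matched by its stop signal.
    """

    def __init__(self):
        self._lock = threading.Lock()  # signals may arrive on different client threads
        self._active = set()           # utterances started but not yet stopped
        self._early_stops = set()      # stops that arrived before their start

    def on_start(self, utterance_id):
        with self._lock:
            if utterance_id in self._early_stops:
                self._early_stops.discard(utterance_id)  # already finished playing
            else:
                self._active.add(utterance_id)

    def on_stop(self, utterance_id):
        with self._lock:
            if utterance_id in self._active:
                self._active.discard(utterance_id)
            else:
                self._early_stops.add(utterance_id)

    def may_record(self):
        """True only if no agent speech is currently active."""
        with self._lock:
            return not self._active

    def accept_transcript(self, text):
        """Drop recognition results that arrive while agent speech is active."""
        return text if self.may_record() else None


gate = SpeechGate()
gate.on_start("u1")
gate.on_start("u2")  # e.g., two speaking components overlap
gate.on_stop("u1")
print(gate.may_record())  # u2 is still active, so recording stays blocked
gate.on_stop("u2")
print(gate.may_record())
```

Counting outstanding utterances rather than using a single on/off flag prevents an early stop from one component from re-enabling recording while another component is still speaking.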

3.3. Experimental Design

The empirical evaluation was divided into study site I, conducted at an outpatient rehabilitation clinic with psychosomatic and neurological patients (Clinic A, Section 3.3.1), and study site II, conducted at an inpatient rehabilitation clinic with psychosomatic patients (Clinic B, Section 3.3.2). Different study designs were used at the two study sites, as described below.

3.3.1. Study Site I: Outpatient Clinic

Study Design: This study employed a 3 × 2 × 2 mixed-design RCT, with embodiment form of the assistive agent as a between-subjects factor (three groups) and assistance (assisted vs. unassisted) and order of administration (assisted first vs. unassisted first) as within-subjects factors.

Randomization: Participants were block-randomized to one of three embodiment groups (robot, avatar, voice-only). Randomization was stratified by age cohort (18–34, 35–49, 50–64, 65–99) and medical condition (psychological vs. neurological). Within each stratum, blocks of 30 participants were used, ensuring balanced allocation across embodiment conditions. Randomization lists were generated in advance using Python 3.
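The stratified block randomization described above can be sketched in Python as follows: for each age-cohort × condition stratum, permuted blocks of 30 containing 10 assignments per embodiment are concatenated into a pre-generated list. The seed, the number of blocks per stratum, and the use of `random.Random.shuffle` are assumptions for illustration; the authors’ actual script is not reproduced here.

```python
import itertools
import random

ARMS = ("robot", "avatar", "voice-only")
AGE_COHORTS = ("18-34", "35-49", "50-64", "65-99")
CONDITIONS = ("psychological", "neurological")
BLOCK_SIZE = 30  # per the study description; must be a multiple of len(ARMS)


def make_block(rng):
    """One permuted block with an equal number of each arm (10 x 3 here)."""
    block = [arm for arm in ARMS for _ in range(BLOCK_SIZE // len(ARMS))]
    rng.shuffle(block)
    return block


def make_randomization_lists(n_blocks_per_stratum, seed):
    """Pre-generate an allocation sequence for every age x condition stratum."""
    if BLOCK_SIZE % len(ARMS):
        raise ValueError("block size must be a multiple of the number of arms")
    rng = random.Random(seed)  # fixed seed makes the lists reproducible
    lists = {}
    for stratum in itertools.product(AGE_COHORTS, CONDITIONS):
        sequence = []
        for _ in range(n_blocks_per_stratum):
            sequence.extend(make_block(rng))
        lists[stratum] = sequence
    return lists


lists = make_randomization_lists(n_blocks_per_stratum=1, seed=2024)
print(len(lists), lists[("50-64", "neurological")][:6])
```

Because every complete block is balanced, allocation within a stratum can deviate from equal group sizes only within the block that is still being filled.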

Allocation followed the pre-generated randomization lists and was implemented sequentially according to participants’ scheduled study appointments within each stratum. Thus, all participants scheduled on a given day were assigned based on the predefined allocation sequence corresponding to their age cohort and medical condition. Allocation concealment and blinding were not feasible due to the nature of the intervention, which required visible preparation and setup of the assigned assistance system prior to each session.
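Sequential allocation from pre-generated lists might look like the following sketch, which maps a participant’s age to the stratification cohort and hands out the next unused assignment in that stratum. The names `age_cohort` and `SequentialAllocator` are hypothetical, and the lists are assumed to be keyed by (age cohort, condition).

```python
def age_cohort(age):
    """Map an age in years to the stratification cohort used for randomization."""
    for lower, upper in ((18, 34), (35, 49), (50, 64), (65, 99)):
        if lower <= age <= upper:
            return f"{lower}-{upper}"
    raise ValueError(f"age {age} outside inclusion range 18-99")


class SequentialAllocator:
    """Hand out the next pre-generated assignment within each stratum, in appointment order."""

    def __init__(self, randomization_lists):
        # Copy so that the pre-generated lists themselves are never modified.
        self._remaining = {k: list(v) for k, v in randomization_lists.items()}
        self.log = []  # (participant_id, stratum, arm) tuples for auditing

    def allocate(self, participant_id, age, condition):
        stratum = (age_cohort(age), condition)
        queue = self._remaining[stratum]
        if not queue:
            raise RuntimeError(f"randomization list exhausted for stratum {stratum}")
        arm = queue.pop(0)
        self.log.append((participant_id, stratum, arm))
        return arm


demo_lists = {("50-64", "neurological"): ["avatar", "robot", "voice-only"]}
allocator = SequentialAllocator(demo_lists)
first = allocator.allocate("P001", age=57, condition="neurological")
print(first, allocator.log)
```

Keeping an allocation log makes it possible to check afterwards that the order of appointments, not the researchers’ choice, determined each assignment.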

Recruitment: Patients were recruited during the medical admission interview, and their participation was scheduled within the rehabilitation therapy plan. The inclusion criteria were an age between 18 and 99 years and ongoing treatment at the clinic. The only exclusion criterion was a lack of proficiency in German, although a basic knowledge of the language was considered sufficient.

Procedure: After providing informed consent, participants were briefed on the crossover study setup. Each participant completed the health questionnaire in both assisted and unassisted modes, in randomized order. Technical or procedural questions during the study could be signaled with a radio bell to request support from the researchers if necessary. In the assisted conditions, the assistant greeted the patient, explained the procedure for filling in the questionnaire, and familiarized the patient with the menu options and system functionalities, e.g., the ability to provide answers via voice or touchscreen. The assistant then accompanied the patient throughout the survey, reading out questions and answer options, providing additional information, and offering words of encouragement. The questionnaire tablet and the agent dialogue were in German. A short evaluation was conducted after each phase. All sessions included individual instructions and a clear separation of questionnaire and evaluation. Final qualitative feedback was collected in a semi-structured interview. The interviews followed an interview guide focusing on the interaction experience, the visual and general impression of the embodiment and assistance services, and any obstacles. Participation was voluntary; participants could withdraw or skip questions at any point.

3.3.2. Study Site II: Inpatient Clinic

Study Design: This study employed a mixed design combining a randomized assignment of two intervention groups with a non-randomized control group (CG) drawn from routine care. This pragmatic design allowed comparison of assisted assessments with standard PC-based diagnostics under real-world clinical constraints.

Randomization: Participants in the intervention arm were block-randomized to one of two embodiment conditions (robot or avatar). Blinding of participants and researchers was not feasible due to the visible and interactive nature of the assistance systems.

Control Group: The control group consisted of patients undergoing standard PC-based diagnostics as part of routine care during the same recruitment period. No active recruitment or self-selection occurred, as participation fell under the clinic’s general consent for routine data use. Standard diagnostics were conducted in small groups according to the regular therapy schedule, and evaluation questionnaires were distributed by clinical staff following routine assessment. The control group served as a pragmatic reference condition rather than a randomized comparator.

Recruitment: On the day of admission to inpatient rehabilitation, patients attended a scheduled group session where this study was introduced, informed consent was obtained, and their participation was scheduled within the rehabilitation therapy plan. The inclusion criteria were an age between 18 and 70 years and ongoing treatment at the clinic. Exclusion criteria were lack of proficiency in German, restrictions that would hinder or preclude participation in regular diagnostics, and the current presence of active psychotic or manic symptoms.

Procedure: Participants were randomized to one of two assistance conditions (robot or avatar) and assigned a two-hour individual assessment appointment. Data collection took place in the participants’ rooms under real-world conditions, as an alternative to the standard PC-based diagnostic procedure. Two researchers prepared the assigned system, demonstrated its use, and explained the process. Participants completed approximately 250 PROM items; any questions or technical issues were signaled by ringing a bell and documented as off-script events. Participation was voluntary; item skipping was not permitted, and withdrawal led to reassignment to the standard diagnostic routine. After completing the assessment, participants provided brief qualitative feedback and completed a quantitative evaluation questionnaire. PROM data from this study were subsequently merged with the clinic’s diagnostic system data. As at study site I, no blinding was used because of the practical constraints of the intervention. For the CG, no active recruitment was conducted, as participation fell under the clinic’s general consent for routine data use. Standard PC-based diagnostics were carried out according to the therapy schedule in small groups of up to six patients in the diagnostics room, and the clinical diagnostician additionally distributed short evaluation questionnaires following the routine assessment.

Sample Size Considerations: The study was designed as an exploratory evaluation under real-world clinical conditions. For study site I, the target sample size of approximately 100 participants was defined based on feasibility within the planned recruitment period. This sample size was considered sufficient to detect small effects in crossover designs and to support mixed-model analyses. No formal a priori power calculation was conducted for study site II due to its pragmatic design and constraints related to routine-care recruitment.
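To make "sufficient to detect small effects" concrete for roughly 100 participants, the sketch below computes the smallest standardized paired difference (Cohen’s $d_z$) that a two-sided test at $\alpha = 0.05$ detects with 80% power. This is not the authors’ calculation (no a priori power analysis was reported); the power level is an assumption, and the normal approximation $d_z \approx (z_{1-\alpha/2} + z_{1-\beta})/\sqrt{n}$ slightly understates the exact t-based value.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_dz(n_pairs: int, alpha: float = 0.05, power: float = 0.80) -> float:
    """Smallest Cohen's d_z detectable in a two-sided paired test.

    Normal approximation: d_z ~= (z_{1 - alpha/2} + z_{power}) / sqrt(n).
    The exact noncentral-t answer is slightly larger for small n.
    """
    std_normal = NormalDist()  # mean 0, SD 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    return (z_alpha + z_power) / sqrt(n_pairs)


if __name__ == "__main__":
    for n in (80, 100, 120):
        print(f"n = {n}: minimum detectable d_z ~= {min_detectable_dz(n):.2f}")
```

Under these assumptions, about 100 complete pairs detect $d_z \approx 0.28$, i.e., effects somewhat larger than Cohen’s conventional "small" benchmark of 0.2.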

3.4. Data Analysis

All statistical analyses were conducted in Python 3 using the pandas, scipy, and statsmodels libraries. The analysis strategy was designed to account for the within-subject structure of the data at Clinic A and the between-subject comparisons at both clinics, and to compare user experience, trust, and social presence across embodied and non-embodied conditions. The interviews were transcribed using speech-to-text software, validated, and then coded using qualitative content analysis according to [39]. We developed a codebook with deductive and inductive codes and pilot-tested it on 10% of the material. Three coders applied it independently; discrepancies (five interviews) were resolved by discussing the coding rules. After two iterations, we achieved conceptual saturation (intercoder agreement > 85%).
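The intercoder agreement figure (originally "Ü", German for Übereinstimmung) appears to be a simple percentage agreement. The coding unit and the way three coders were aggregated are not stated, so the following sketch assumes agreement is counted per coded segment and averaged over all coder pairs; the code labels are hypothetical.

```python
from itertools import combinations


def percent_agreement(codes_a: list[str], codes_b: list[str]) -> float:
    """Share of coding units to which two coders assigned the same code."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must code the same, non-empty list of units")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


def mean_pairwise_agreement(coders: dict[str, list[str]]) -> float:
    """Average percent agreement over all pairs of coders."""
    pairs = list(combinations(coders.values(), 2))
    return sum(percent_agreement(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical codes assigned by three coders to the same four segments.
    coders = {
        "coder_1": ["barrier", "trust", "trust", "voice"],
        "coder_2": ["barrier", "trust", "interface", "voice"],
        "coder_3": ["barrier", "trust", "trust", "voice"],
    }
    print(f"{mean_pairwise_agreement(coders):.1%}")
```

Percentage agreement does not correct for chance; a chance-corrected coefficient would be the stricter choice if the aim were to compare reliability across studies.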

3.4.1. Questionnaires

User Experience Questionnaire (UEQ-Short): The UEQ-Short [40] is an 8-item instrument rated on 7-point semantic differential scales that assesses user experience along two higher-order dimensions. The pragmatic quality subscale consists of four items reflecting goal-directed interaction qualities, while the hedonic quality subscale consists of four items capturing non-goal-directed experiential aspects. For each participant and condition, mean scores for pragmatic (prag_mean) and hedonic (hedo_mean) quality were computed by averaging the respective item ratings. An overall UEQ score was calculated as the mean of the pragmatic and hedonic subscale scores, providing a global index of user experience.
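A minimal scoring sketch follows. It assumes responses are stored on a 1 to 7 scale with all items already aligned so that higher is better (consistent with the means reported in the Results), and that items 1 to 4 form the pragmatic and items 5 to 8 the hedonic subscale, as in the published UEQ-S item order. The column names ueq1 to ueq8 are hypothetical; prag_mean and hedo_mean follow the variable names above.

```python
from statistics import mean

# Hypothetical column names; items 1-4 = pragmatic, 5-8 = hedonic (assumed order).
PRAGMATIC_ITEMS = ("ueq1", "ueq2", "ueq3", "ueq4")
HEDONIC_ITEMS = ("ueq5", "ueq6", "ueq7", "ueq8")


def score_ueq_short(responses: dict[str, float]) -> dict[str, float]:
    """Subscale means and overall UEQ score for one participant and condition.

    Raises KeyError if an item is missing, mirroring a complete-case analysis.
    """
    prag_mean = mean(responses[item] for item in PRAGMATIC_ITEMS)
    hedo_mean = mean(responses[item] for item in HEDONIC_ITEMS)
    return {
        "prag_mean": prag_mean,
        "hedo_mean": hedo_mean,
        # Overall UEQ = mean of the two subscale scores (equal weighting).
        "ueq_overall": (prag_mean + hedo_mean) / 2,
    }


if __name__ == "__main__":
    answers = {"ueq1": 6, "ueq2": 7, "ueq3": 6, "ueq4": 5,
               "ueq5": 4, "ueq6": 5, "ueq7": 5, "ueq8": 4}
    print(score_ueq_short(answers))
```

Because both subscales have four items, averaging the two subscale means gives the same value as averaging all eight items when no item is missing.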

Trust and Social Presence Scales: Trust was measured using the four-item trust scale of the UEQ+, a modular extension of the UEQ questionnaire [41], rated on 7-point Likert scales. A composite trust score (trust_mean) was computed as the mean of these four items for each participant and condition. Social presence was assessed using seven items taken from the Lee et al. questionnaire [42] that measure the perceived “being there” and the social richness of the interaction, rated on 10-point Likert scales. A composite social presence score (a_presence_mean) was calculated as the mean of these items for each participant and condition. Note that social presence was only assessed for assisted conditions in which an agent (virtual, Furhat, voice-only) was present.

To obtain a concise overall preference index per condition, we additionally computed a standardized composite score that is distinct from the “overall UEQ” aggregate. For each clinic separately, condition means on overall UEQ, trust, and social presence (where available) were first converted to z-scores across the conditions of that clinic, and the composite for each condition was then computed as the mean of its available z-scores. This approach weights each construct equally and yields a unit-free index suitable for comparing relative preferences across conditions; to avoid confusion with the UEQ aggregate, we refer to this composite as the “overall preference rating”.
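The standardization step can be sketched as follows, under stated assumptions: z-scores use the population SD across the conditions of one clinic (the text does not say whether the population or the sample SD was used), and a measure that a condition lacks, such as social presence for tablet-only, is left out of that condition’s average. Condition labels and numbers are hypothetical.

```python
from statistics import mean, pstdev


def overall_preference(condition_means: dict[str, dict[str, float]]) -> dict[str, float]:
    """Mean z-score per condition across the measures available for it.

    condition_means maps condition -> {measure: condition mean}. Each measure
    is standardized across the conditions (within one clinic) that report it.
    """
    z_scores: dict[str, list[float]] = {cond: [] for cond in condition_means}
    measures = sorted({m for scores in condition_means.values() for m in scores})
    for measure in measures:
        conds = [c for c, scores in condition_means.items() if measure in scores]
        values = [condition_means[c][measure] for c in conds]
        mu, sd = mean(values), pstdev(values)  # population SD (assumption)
        if sd == 0:  # identical means carry no information about preference
            continue
        for c in conds:
            z_scores[c].append((condition_means[c][measure] - mu) / sd)
    return {c: mean(zs) for c, zs in z_scores.items() if zs}


if __name__ == "__main__":
    clinic = {
        "condition_A": {"ueq": 5.0, "trust": 5.0},
        "condition_B": {"ueq": 4.0, "trust": 6.0, "presence": 5.0},
        "condition_C": {"ueq": 6.0, "trust": 4.0, "presence": 4.0},
    }
    print(overall_preference(clinic))
```

Because each measure is standardized only over the conditions that report it, a measure available for just two conditions contributes z-scores of exactly +1 and -1, which is worth keeping in mind when interpreting the composite.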

Additionally, patients at Clinic A filled in a demographics questionnaire and the Affinity for Technology Interaction (ATI) questionnaire [43] as part of their evaluation questionnaire. The ATI questionnaire was omitted at Clinic B because of the large number of questionnaire items already required and the limited time available; instead, clinical staff provided basic sociodemographic information.

3.4.2. Independent Variables

The primary independent variable was embodiment condition. For Study site I (Clinic A), this factor had four levels: tablet-only (non-assisted interaction), virtual avatar (virtual agent), Furhat robot, and voice-only assistant. Each participant in Clinic A was assigned to one assisted condition (between-subjects) and also completed the tablet-only condition (within-subjects), resulting in a mixed experimental design. For Study site II (Clinic B), the embodiment condition had three levels: CG (routine care, non-randomized), virtual avatar intervention group, and Furhat robot intervention group. Participants in Clinic B were assigned between-subjects to one of these three conditions, with randomized allocation to the two intervention groups.

Secondary independent variables included:

Demographics
– Diagnosis: neurological disorder (Clinic A only) vs. psychosomatic disorder (neuropsych).
– Gender: male vs. female (sex).
– Age: analyzed both as a continuous variable and as a categorical grouping (younger <45 years vs. older ≥45 years).
Affinity for Technology
– ATI (Clinic A only): mean score on the ATI scale (ati_mean), analyzed both continuously and categorically (low, medium, and high tertiles).
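The two groupings above can be sketched as follows. The 45-year cut-off comes from the text; how the tertile boundaries were computed and how ties at a boundary were assigned are not stated, so the sketch assumes the default ("exclusive") method of Python’s statistics.quantiles and assigns boundary values to the lower group.

```python
from statistics import quantiles


def age_group(age: float) -> str:
    """Younger (<45 years) vs. older (>=45 years), as defined in the text."""
    return "younger" if age < 45 else "older"


def ati_tertile_labels(ati_means: list[float]) -> list[str]:
    """Label each ATI mean score as low, medium, or high (sample tertiles).

    Assumption: cut points from statistics.quantiles(n=3) with its default
    'exclusive' method; a score equal to a cut point joins the lower group.
    """
    low_cut, high_cut = quantiles(ati_means, n=3)
    labels = []
    for score in ati_means:
        if score <= low_cut:
            labels.append("low")
        elif score <= high_cut:
            labels.append("medium")
        else:
            labels.append("high")
    return labels


if __name__ == "__main__":
    print([age_group(a) for a in (38, 45, 61)])
    print(ati_tertile_labels([3.1, 4.6, 2.2, 5.0, 3.8, 4.1]))  # hypothetical scores
```

Sample tertiles make the groups roughly equal in size but data-dependent, so "high ATI" at one site need not mean the same absolute score as at another.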

3.4.3. Statistical Procedures

Within- and Between-subject Comparisons: To compare the tablet-only and assisted conditions within participants at Clinic A, paired-samples t-tests were conducted for each outcome measure assessed in both conditions (overall UEQ and trust; social presence was not assessed in the tablet-only condition). This analysis collapsed across the three forms of embodiment to test the general effect of the presence of assistance. To compare the three assisted embodiments (virtual avatar, Furhat, voice-only) between subjects, one-way analyses of variance (ANOVAs) were conducted for each outcome measure using only the assisted-condition data. When the omnibus ANOVA indicated significant differences, post-hoc pairwise comparisons were conducted using Tukey’s HSD test to control for multiple comparisons among embodiments in the primary outcomes.
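The two test statistics can be computed directly, which makes the arithmetic behind the reported $t$ and $F$ values explicit. This is an illustrative standard-library sketch with made-up numbers, not the analysis code; in practice scipy.stats.ttest_rel and scipy.stats.f_oneway return the same statistics together with p-values.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x: list[float], y: list[float]) -> tuple[float, int]:
    """Paired-samples t statistic and degrees of freedom (n - 1)."""
    if len(x) != len(y):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


def one_way_anova(*groups: list[float]) -> tuple[float, tuple[int, int]]:
    """F statistic and (df_between, df_within) for independent groups."""
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between, df_within = len(groups) - 1, len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), (df_between, df_within)


if __name__ == "__main__":
    tablet = [5.0, 6.0, 7.0, 8.0]    # hypothetical overall UEQ, tablet-only
    assisted = [4.0, 6.0, 5.0, 7.0]  # same participants, assisted condition
    print(paired_t(tablet, assisted))
    print(one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```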

Subgroup Analyses and Effect Sizes: To examine the influence of demographic factors and technology affinity, we conducted one-way ANOVAs with sex, age group, and ATI group as independent variables and overall UEQ (for assisted conditions) as the dependent variable. Additional exploratory analyses examined interactions between embodiment and these demographic factors using two-way ANOVAs where sample sizes permitted. Statistical significance was set at $\alpha = 0.05$ for all tests, and no additional multiplicity correction (e.g., Bonferroni) was applied to these exploratory subgroup analyses; the corresponding results should therefore be interpreted with caution. Effect sizes are reported as Cohen’s $d$ for t-tests and partial eta-squared ($\eta_p^2$) for ANOVAs where applicable, and the main figures additionally provide 95% confidence intervals to complement p-values.
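Which variant of Cohen’s $d$ was used is not specified. The sketch below assumes $d_z$ (mean difference divided by the SD of the differences) for paired comparisons and the pooled-SD version for independent groups. For paired data $t = d_z\sqrt{n}$, so, for example, the reported $t(101) = 2.74$ with 102 pairs corresponds to $d_z \approx 0.27$.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_dz(x: list[float], y: list[float]) -> float:
    """Paired effect size: mean of the differences divided by their SD."""
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / stdev(diffs)


def dz_from_t(t: float, n_pairs: int) -> float:
    """Recover d_z from a paired t statistic, since t = d_z * sqrt(n)."""
    return t / sqrt(n_pairs)


def cohens_d_pooled(a: list[float], b: list[float]) -> float:
    """Independent-groups effect size using the pooled SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


if __name__ == "__main__":
    print(round(dz_from_t(2.74, 102), 2))
    print(cohens_dz([5.0, 6.0, 7.0, 8.0], [4.0, 6.0, 5.0, 7.0]))  # hypothetical data
    print(cohens_d_pooled([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))     # hypothetical data
```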

4. Results

The following section presents our evaluation results. This study included a total of 195 patients with psychosomatic and neurological diagnoses recruited from the two rehabilitation clinics. Across both sites, 34 patients were diagnosed with neurological disorders and 161 with psychosomatic disorders. At Clinic A, the sample comprised 106 patients (46 male, 60 female), with a mean age of 49 years (range: 22–72). At Clinic B, 89 patients participated (29 male, 56 female, 4 not specified or nonbinary), with a mean age of 50.17 years (range: 21–64). Participants with missing or incomplete data were excluded from the analysis, resulting in $n = 102$ patients for Clinic A and $n = 87$ for Clinic B. Minor variations in effective $n$ for specific scales are reflected in the ANOVA degrees of freedom reported in the results (overall UEQ: $n = 102$, pragmatic UEQ: $n = 98$, hedonic UEQ: $n = 100$, trust: $n = 101$). In the following subsections, we present the results regarding user experience, trust, social presence, and the impact of demographic factors and affinity for technology on patients’ interactions with the different system conditions. Tables and figures summarize the key results.

4.1. User Experience Results

Clinic A: At this clinic, an overall UEQ score was computed as the mean of the hedonic and pragmatic subscales for each condition. Because each patient experienced both a tablet-only interaction and exactly one assisted condition (Virtual Avatar, Furhat, or VoiceOnly), we first contrasted tablet-only and assisted use within subjects and then compared the assisted types between subjects. A paired-samples t-test showed that the tablet-only condition yielded significantly higher overall UEQ scores than the assisted condition, $t(101) = 2.74$, $p = 0.007$ (see Table 1). Tablet-only interaction showed higher pragmatic quality ($M = 5.98$, $SD = 0.94$) but lower hedonic quality ($M = 4.53$, $SD = 1.37$) than the assisted conditions (see Figure 3), reflecting a combination of efficiency and reduced stimulation. Among the assisted conditions, the Virtual Avatar ($M_{prag} = 4.90$, $SD = 1.65$; $M_{hedo} = 5.36$, $SD = 1.31$), VoiceOnly ($M_{prag} = 5.03$, $SD = 1.76$; $M_{hedo} = 4.98$, $SD = 1.15$), and Furhat robot ($M_{prag} = 4.39$, $SD = 1.60$; $M_{hedo} = 4.91$, $SD = 1.24$) yielded similar overall UEQ scores but slightly different profiles of pragmatic and hedonic quality. A one-way ANOVA on overall UEQ for the assisted conditions revealed no significant differences between Virtual Avatar, Furhat, and VoiceOnly, $F(2, 99) = 0.40$, $p = 0.67$, suggesting that the presence of an assistant affected user experience more than the specific embodiment used.

To ensure that the analysis matched the mixed-design structure at Clinic A, we additionally conducted 2 (assistance: tablet-only vs. assisted) × 3 (embodiment: virtual avatar, Furhat robot, voice-only) mixed ANOVAs on overall, pragmatic, and hedonic UEQ scores (see Table 2 for full statistics). For overall UEQ, there was a significant main effect of assistance, $F(1, 100) = 7.38$, $p = 0.008$, $\eta_p^2 = 0.07$, indicating higher overall UEQ ratings for the tablet-only condition than for assisted use, but no significant main effect of embodiment, $F(2, 100) = 0.34$, $p = 0.71$, $\eta_p^2 = 0.01$, and no assistance × embodiment interaction, $F(2, 100) = 0.28$, $p = 0.76$, $\eta_p^2 = 0.01$. A similar pattern emerged for the pragmatic subscale, with a large main effect of assistance, $F(1, 96) = 46.90$, $p < 0.001$, $\eta_p^2 = 0.33$, but no significant main effect of embodiment, $F(2, 96) = 1.01$, $p = 0.37$, $\eta_p^2 = 0.02$, and no interaction, $F(2, 96) = 0.37$, $p = 0.69$, $\eta_p^2 = 0.01$. For hedonic UEQ, assistance again showed a significant main effect, $F(1, 98) = 7.72$, $p = 0.007$, $\eta_p^2 = 0.07$, while embodiment and the assistance × embodiment interaction remained non-significant (both $p > 0.61$). In contrast to pragmatic quality, however, the direction of this effect was reversed for hedonic UEQ, with the assisted conditions showing higher hedonic mean scores than the tablet-only control condition. These results indicate that the assistance-related differences do not vary systematically across embodied agent forms; rather, assistance shows opposite effects on pragmatic and hedonic user experience (lower pragmatic but higher hedonic ratings) while remaining largely independent of the specific embodiment.
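Because the within-subject factor has only two levels, a 2 × g mixed ANOVA reduces to simple computations on each participant’s mean score and difference score, which the sketch below makes explicit. It is not the authors’ code (the text names only pandas, scipy, and statsmodels); group labels and data are hypothetical, and the within-subject effects use sequential sums of squares, which match Type III results only when the embodiment groups are equal in size.

```python
from statistics import mean


def mixed_anova_2xg(groups: dict[str, list[tuple[float, float]]]) -> dict[str, dict]:
    """2 (within) x g (between) mixed ANOVA from paired scores per group.

    groups maps a between-subject level (e.g. an embodiment) to a list of
    (level_1, level_2) scores, one pair per participant. Within-subject
    effects use sequential (Type I) sums of squares, within factor first.
    """
    n_total = sum(len(pairs) for pairs in groups.values())
    n_groups = len(groups)
    df_error = n_total - n_groups

    # Between-subject stratum: equivalent to a one-way ANOVA on subject means.
    subj = {g: [(a + b) / 2 for a, b in pairs] for g, pairs in groups.items()}
    subj_grand = mean(m for ms in subj.values() for m in ms)
    subj_gmeans = {g: mean(ms) for g, ms in subj.items()}
    ss_between = 2 * sum(len(ms) * (subj_gmeans[g] - subj_grand) ** 2 for g, ms in subj.items())
    ss_subjects = 2 * sum((m - subj_gmeans[g]) ** 2 for g, ms in subj.items() for m in ms)

    # Within-subject stratum: only the difference scores d = level_1 - level_2 matter.
    diffs = {g: [a - b for a, b in pairs] for g, pairs in groups.items()}
    d_grand = mean(d for ds in diffs.values() for d in ds)
    d_gmeans = {g: mean(ds) for g, ds in diffs.items()}
    ss_within = n_total * d_grand ** 2 / 2
    ss_interaction = sum(len(ds) * (d_gmeans[g] - d_grand) ** 2 for g, ds in diffs.items()) / 2
    ss_residual = sum((d - d_gmeans[g]) ** 2 for g, ds in diffs.items() for d in ds) / 2

    def effect(ss: float, df: int, ss_err: float) -> dict:
        f_ratio = (ss / df) / (ss_err / df_error)
        return {"F": f_ratio, "df": (df, df_error), "eta_p2": ss / (ss + ss_err)}

    return {
        "within (assistance)": effect(ss_within, 1, ss_residual),
        "between (embodiment)": effect(ss_between, n_groups - 1, ss_subjects),
        "interaction": effect(ss_interaction, n_groups - 1, ss_residual),
    }


if __name__ == "__main__":
    # Hypothetical (tablet-only, assisted) overall UEQ pairs per embodiment.
    data = {
        "virtual": [(6, 5), (7, 5)],
        "furhat": [(5, 5), (6, 4)],
        "voice": [(4, 4), (5, 3)],
    }
    for name, res in mixed_anova_2xg(data).items():
        print(name, res)
```

The decomposition also shows why the interaction test is equivalent to a one-way ANOVA on the tablet-minus-assisted difference scores across embodiment groups.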

Clinic B: In this clinic, a between-subjects design was employed, with participants in the intervention arm randomly assigned to one of two assistance conditions (Virtual Avatar or Furhat robot) and compared with a non-randomized control group receiving routine PC-based diagnostics (CG; effective sample sizes per outcome are reported in Table 1 and Table A1). The Furhat robot condition showed the highest pragmatic quality ($M = 6.17$, $SD = 0.7$), followed by the Virtual Avatar ($M = 5.98$, $SD = 1.17$) and the CG control condition ($M = 5.26$, $SD = 1.25$), whereas both agent-based conditions yielded higher hedonic quality ($M_{hedo,\,Avatar} = 5.73$, $SD = 1.13$; $M_{hedo,\,Furhat} = 5.65$, $SD = 1.11$) than CG ($M_{hedo,\,CG} = 4.68$, $SD = 1.38$) (see Figure 4). A one-way ANOVA on overall UEQ scores revealed a significant main effect of condition, $F(2, 72) = 8.57$, $p = 0.0005$, $\eta^2 = 0.19$. Post-hoc tests (Tukey HSD) indicated that both the Virtual Avatar ($M \approx 5.85$, $SD \approx 1.02$) and Furhat robot ($M \approx 5.91$, $SD \approx 0.93$) conditions were rated significantly higher than CG ($M \approx 4.97$, $SD \approx 1.13$; both $p < 0.01$), whereas the difference between the two agents was not significant. Analyses of the UEQ subscales corroborated this pattern. For hedonic UEQ, a one-way ANOVA was significant, $F(2, 82) = 6.73$, $p = 0.002$, $\eta^2 = 0.14$, with both Virtual Avatar and Furhat showing higher hedonic ratings than CG (both agent vs. CG contrasts $p < 0.05$). Pragmatic UEQ scores also differed by condition, $F(2, 72) = 4.46$, $p = 0.015$, but the only significant contrast was between Furhat and CG (Tukey HSD $p = 0.037$), while Virtual Avatar and CG did not differ significantly ($p = 0.071$). A summary of all UEQ ratings for both clinics is shown in Table 1; detailed Tukey HSD contrasts for Clinic B are provided in Table A2.
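As a reader’s consistency check on the reported effect sizes (not part of the original analysis), partial eta squared can be recovered from an F ratio and its degrees of freedom, $\eta_p^2 = F\,df_1/(F\,df_1 + df_2)$; for a one-way ANOVA this equals $\eta^2$. The sketch applies the identity to UEQ values reported above.

```python
def partial_eta_squared(f_ratio: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared implied by an F ratio: F*df1 / (F*df1 + df2)."""
    return f_ratio * df_effect / (f_ratio * df_effect + df_error)


# (F, df_effect, df_error, reported effect size), copied from the Results text.
REPORTED = {
    "Clinic A, overall UEQ, assistance": (7.38, 1, 100, 0.07),
    "Clinic A, pragmatic UEQ, assistance": (46.90, 1, 96, 0.33),
    "Clinic A, hedonic UEQ, assistance": (7.72, 1, 98, 0.07),
    "Clinic B, overall UEQ, condition": (8.57, 2, 72, 0.19),
    "Clinic B, hedonic UEQ, condition": (6.73, 2, 82, 0.14),
}

if __name__ == "__main__":
    for label, (f_ratio, df1, df2, reported) in REPORTED.items():
        recovered = partial_eta_squared(f_ratio, df1, df2)
        print(f"{label}: recovered {recovered:.3f}, reported {reported:.2f}")
```

All five recovered values round to the reported effect sizes, which supports the internal consistency of these statistics.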

4.2. Trust and Data Disclosure

Clinic A: A paired comparison between tablet-only and assisted use showed that tablet-only interaction was rated significantly more trustworthy than the assisted condition overall, $t(100) = 3.65$, $p = 0.001$, with a mean of $M = 5.62$ ($SD = 1.35$) for tablet-only and lower scores for the assisted conditions (see Figure 5 and Table 1). A one-way ANOVA on trust scores for the assisted conditions indicated a significant effect of embodiment, $F(2, 99) = 3.18$, $p = 0.046$ (see Table A3), with VoiceOnly ($M = 5.67$, $SD = 1.19$) and Virtual Avatar ($M = 5.14$, $SD = 1.57$) generally trusted more than Furhat ($M = 4.67$, $SD = 1.65$). These results show that, at Clinic A, patients trusted the tablet-only interface more than the assisted conditions as a whole, and among the assistants they tended to trust the voice-only interface most. A 2 (assistance: tablet-only vs. assisted) × 3 (embodiment: virtual avatar, Furhat robot, voice-only) mixed ANOVA on trust scores showed a significant main effect of assistance, $F(1, 99) = 13.76$, $p < 0.001$, $\eta_p^2 = 0.12$, indicating that tablet-only interaction was trusted more than the assisted conditions overall. The main effect of embodiment did not reach significance, $F(2, 99) = 1.92$, $p = 0.15$, $\eta_p^2 = 0.04$, and the assistance × embodiment interaction was only marginal, $F(2, 99) = 2.58$, $p = 0.08$, $\eta_p^2 = 0.05$, consistent with the pattern that trust is generally high for the tablet-only, virtual, and voice-only interfaces but reduced for the Furhat robot.

Clinic B: Trust ratings (mean across the trust items) were uniformly high across all three conditions (CG: $M = 5.35$, $SD = 1.52$; Virtual Avatar: $M = 5.54$, $SD = 1.20$; Furhat Robot: $M = 5.61$, $SD = 1.63$; see Table 1), and a one-way ANOVA revealed no significant differences in trust between conditions, $F(2, 83) = 0.25$, $p = 0.779$, $\eta^2 = 0.01$, indicating that the presence or type of agent did not detectably affect trust ratings in this sample.

4.3. Social Presence Results

Clinic A: Social presence ratings were highest for VoiceOnly ($M = 5.32$), followed by Virtual Avatar ($M = 4.67$) and Furhat ($M = 4.39$). A one-way ANOVA across the assisted embodiments did not reveal significant differences in social presence between Virtual Avatar, Furhat, and VoiceOnly, $F(2, 99) = 1.65$, $p = 0.20$, indicating that the descriptive ordering (VoiceOnly > Virtual Avatar > Furhat) was not statistically robust at the chosen significance level.

Clinic B: Social presence items were intended only for the agent conditions (virtual avatar and robot) at this clinic, as the CG condition lacked agent features. Descriptively, the Furhat robot received slightly higher social presence ratings ($M = 5.16$, $SD = 1.63$) than the virtual avatar ($M = 4.91$, $SD = 1.73$). A one-way ANOVA that also included the small subset of CG participants who nonetheless answered the social presence items yielded no significant differences across conditions, $F(2, 37) = 0.53$, $p = 0.594$, and Tukey HSD contrasts likewise indicated no reliable pairwise effects (all $p > 0.57$; see Table A2). The social presence scores for both clinics are shown in Table 1.

4.4. Overall Preference Ratings

For both clinics, overall ratings were calculated using a standardized scoring procedure. For each clinic separately, condition means on each measure (overall UEQ, trust, social presence) were first converted to z-scores across the available conditions for that clinic. The overall rating for each condition was then computed as the mean of its z-scores on the included measures: overall UEQ and trust for the tablet-only condition in Clinic A and the CG condition in Clinic B, and overall UEQ, trust, and social presence for all other conditions. In Clinic A, the tablet-only condition showed the highest standardized overall rating ($z = 1.16$), followed by VoiceOnly ($z = 0.66$), whereas the virtual avatar ($z = -0.16$) and especially the Furhat robot ($z = -1.27$) scored lower on this composite. In Clinic B, the Furhat robot obtained the highest overall preference rating ($z = 0.92$), the avatar condition was close to the clinic mean ($z \approx 0.00$), and the routine-care control condition scored lowest ($z = -1.39$). We also explored alternative constructions of this overall preference score, e.g., based only on overall UEQ or on raw scores averaged before standardization, and always obtained the same rank order of conditions within each clinic. This suggests that our conclusions about overall preference do not depend on the exact formula used but reflect a stable pattern across overall UEQ, trust, and social presence.

4.5. Impacts of Demographic Factors and Affinity for Technology

Clinic A: To examine the influence of demographic factors and technology affinity on user experience at Clinic A, we analyzed overall UEQ scores for the assisted conditions by sex, age group, and ATI group. One-way ANOVAs revealed no significant main effects of sex ($F \approx 0.38$, $p \approx 0.54$), age group (<45 vs. ≥45 years; $F \approx 0.06$, $p \approx 0.81$), or ATI tertile ($F \approx 1.13$, $p \approx 0.33$) on overall UEQ. Thus, in this dataset, user experience ratings for the assisted conditions did not differ detectably across demographic or attitudinal subgroups.

Clinic B: At Clinic B, we examined the influence of sex and age on overall UEQ ratings for the agent conditions (virtual avatar and robot combined). Independent-samples t-tests revealed no significant differences by sex (male $M = 5.76$, female $M = 5.72$; $t(53) = 0.13$, $p = 0.900$) or age group (<45 years $M = 5.91$, ≥45 years $M = 5.67$; $t(55) = 0.78$, $p = 0.437$). Thus, consistent with Clinic A, demographic factors did not appear to shape overall UEQ scores at Clinic B, suggesting that responses to agent-based PROM collection are broadly consistent across demographic subgroups in this sample.
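The reported degrees of freedom, $t(53)$ and $t(55)$, match those of the pooled-variance Student t-test ($n_1 + n_2 - 2$), which is consistent with equal variances having been assumed; the text does not state this explicitly. A minimal sketch of that test with hypothetical data follows; in practice scipy.stats.ttest_ind(a, b, equal_var=True) gives the same statistic plus a p-value.

```python
from math import sqrt
from statistics import mean, variance


def student_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Independent-samples t statistic with pooled variance, and its df."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


if __name__ == "__main__":
    younger = [6.2, 5.8, 6.5, 5.1]     # hypothetical overall UEQ scores
    older = [5.9, 5.4, 6.0, 5.3, 5.7]
    print(student_t(younger, older))
```

If group variances differ markedly, Welch’s test (non-integer df) is the usual alternative.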

4.6. Qualitative Data

We additionally interviewed each patient after their interaction with the system regarding their experience. Although the focus of this paper remains on the aforementioned quantitative data, we briefly report selected user insights. Voice input and speech-based agents were valued by some participants, especially those with visual impairment, but not all users found these modalities helpful. Frustration levels were occasionally higher in assisted conditions, pointing to the need for ongoing system refinement and better contextual adaptation. Qualitative feedback underscored the multidimensional nature of acceptance and usability: users appreciated the motivational and emotional qualities of embodied agents, finding speech output and friendly prompts helpful for sustaining engagement, yet some felt observed, overwhelmed, or confused by new functionalities, and expressed a desire for more control, improved graphics, and multilingual support.

5. Discussion

5.1. Context-Dependent Effects of Agent Embodiment

The user experience data reveal context-dependent patterns that underscore the complexity of deploying conversational agents in healthcare settings. At Clinic B, the agent-based conditions (both virtual avatar and Furhat robot) significantly outperformed the CG condition on overall UEQ ratings ($p = 0.008$), with this advantage driven primarily by hedonic qualities such as stimulation, novelty, and engagement ($p < 0.001$). The Furhat robot, in particular, was rated significantly higher than CG on both overall UEQ ($p = 0.006$) and the hedonic dimension ($p < 0.001$), while the virtual avatar showed a similar pattern ($p = 0.047$ for hedonic quality). Differences in pragmatic quality (ease of use, clarity, efficiency) at Clinic B were smaller, with only the Furhat robot differing significantly from CG, suggesting that agents enhance the experiential and emotional aspects of PROM collection without compromising usability. In contrast, at Clinic A, the tablet-only condition was rated significantly higher than the assisted conditions on both overall UEQ ($p = 0.007$) and trust ($p < 0.001$). Among the assisted modalities, no significant differences emerged in overall UEQ, though trust ratings varied: voice-only assistance received higher trust ratings than the Furhat robot (omnibus ANOVA, $p = 0.046$), suggesting that a disembodied voice may be perceived as less intrusive or more neutral in certain clinical contexts. These contrasting patterns between sites highlight the importance of contextual factors such as patient population characteristics, clinical workflows, and environmental constraints in shaping user responses to embodied agents.

5.2. Trust and Embodiment in Healthcare Contexts

Trust ratings were uniformly high across all conditions at Clinic B, with no significant differences between CG, virtual avatar, and robot ($p = 0.779$). This suggests that, in this sample, the introduction of conversational agents did not undermine patients’ confidence in the data collection process. However, at Clinic A, tablet-only interaction was rated as significantly more trustworthy than the assisted conditions overall, and among the assistants, voice-only was trusted more than the robot embodiment. This discrepancy could have several explanations, such as differences in patient expectations, prior exposure to technology, or the perceived legitimacy of different interaction modalities. These factors were not directly measured in our research, however, and therefore remain hypotheses for future work. The finding that voice-only assistance was trusted more than the embodied agents at Clinic A is particularly noteworthy. One possible interpretation, consistent with prior work on trust in anthropomorphic systems, is that anthropomorphic presence may sometimes increase perceived risk or uncertainty, especially when patients are unfamiliar with such technologies or when the stakes of accurate health reporting feel high.

5.3. Social Presence and User Engagement

Social presence ratings showed modest variation across conditions. At Clinic A, voice-only interaction elicited the highest social presence ratings, followed by the virtual avatar and then the Furhat robot, although these differences were not statistically significant ($p = 0.20$). At Clinic B, the robot was rated descriptively higher on social presence than the virtual avatar, but again this difference was not significant ($F(2, 37) = 0.53$, $p = 0.594$). These patterns suggest that social presence is multifaceted and may depend on factors beyond embodiment alone, such as vocal quality, conversational responsiveness, and the alignment of agent behavior with user expectations. These mechanisms were not directly assessed in this work but are suggested by prior literature [10,44] as well as by our qualitative interview data; further research is therefore required to draw firm conclusions. The finding that disembodied voice-only assistance can elicit social presence at least comparable to that of embodied agents in this context suggests that physical or visual embodiment is not, by itself, sufficient to guarantee engaging human–agent interaction. Instead, social presence appears to depend more strongly on paralinguistic cues, conversational dynamics, and the alignment between the agent’s role, behavior, and user expectations, especially on whether the embodiment conveys additional, task-relevant meaning beyond basic movements and facial expressions.

5.4. Demographic and Individual Differences

Importantly, demographic factors (sex, age) at both clinics and affinity for technology (ATI, assessed at Clinic A only) were not significantly associated with overall user experience ratings. At Clinic A, no main effects of sex ($p \approx 0.54$), age ($p \approx 0.81$), or ATI tertile ($p \approx 0.33$) were observed on overall UEQ scores for the assisted conditions. Similarly, at Clinic B, sex ($p = 0.900$) and age group ($p = 0.437$) did not predict differential responses to agent-based assistance. This is consistent with the benefits and challenges of conversational agents for PROM collection being broadly similar across patient subgroups within these clinics, which supports their accessibility.

5.5. Qualitative Feedback

The qualitative data from the patients’ feedback indicate that individual preferences and concerns, such as discomfort with being “watched,” desire for multilingual support, or preference for paper-and-pencil formats, remain important considerations for system design. While assistance enabled participation for patients with sensory or motivational difficulties and compensated for barriers related to reading and focus, it sometimes introduced additional effort due to unfamiliarity or system complexity. The balance of technical innovation with clarity and personal relevance remains a core design challenge.

5.6. Implications for Design and Deployment

The divergent patterns between Clinic A and Clinic B underscore the need for adaptable, configurable systems that can be tailored to specific clinical contexts and patient populations. In settings where patients prioritize efficiency, familiarity, and perceived neutrality (as suggested by the Clinic A results), traditional tablet-based or minimally assisted approaches may be preferable. In contrast, environments where engagement, emotional support, and experiential quality are valued (as at Clinic B) may benefit from more richly embodied agents. This divergence could also stem from differences in local workflows. At Clinic A, the multimodal tablet and assistance system was newly introduced alongside existing outpatient rehabilitation appointments, so the assisted conditions may have been experienced as more complex and time-consuming than the relatively straightforward tablet-only interaction. At Clinic B, by contrast, the agents, used in individual in-room assessments, replaced a longer PC-based diagnostic session. This context may help explain why the embodied conditions were relatively more attractive there. These observations suggest that the generalizability of our findings depends not only on patient characteristics but also on how assistant systems are embedded into existing clinical routines. At Clinic B, the agent advantage was most consistent for hedonic quality, where both agents exceeded CG, whereas only the Furhat robot differed from CG on pragmatic quality; this suggests that embodied agents serve primarily to enhance the affective and motivational aspects of PROM collection rather than to improve functional usability. This distinction matters for designers and implementers, as it implies that agents should be positioned as engagement tools rather than as replacements for well-designed interfaces.

The results of this study provide insights into how multimodal AI assistance shapes user experience, trust, and social presence in digital health data collection; effects on the integrity or validity of the patient-reported outcome measures themselves were not directly assessed. The use of embodied agents did not substantially improve the patients’ overall quality of experience. While some participants appreciated the motivational and emotional qualities of the embodied agents, the primary drivers of their experience were pragmatic factors such as clarity of instructions, robustness of speech recognition, and fit with their clinical routines. This pattern is consistent with recent work on conversational agents and social robots in healthcare, which finds that embodied interfaces can enhance engagement and perceived presence but do not reliably translate into large gains in effectiveness, trust, or adherence across all user groups [45,46].

5.7. Limitations and Future Directions

Several limitations should be acknowledged. First, the two study sites employed different experimental designs, with a within-subjects crossover design in Clinic A and a between-subjects design with a pragmatic control group in Clinic B. These design differences limit direct quantitative comparability across sites and preclude strong cross-centre causal inferences. Second, the study was conducted in German-speaking rehabilitation clinics, and results may not generalize to other languages, cultures, or clinical settings. Third, while PROMIS score equivalence was maintained, longer-term effects on data completeness, clinical utility, and patient adherence were not assessed. Future research should explore longitudinal impacts, examine how agent characteristics (e.g., voice quality, visual realism, conversational style) interact with patient needs, and investigate the role of system customization and user control in shaping acceptance and trust.

Several methodological constraints regarding the differences between centres need to be taken into consideration. In Clinic B, the comparison with routine care relied on a non-randomized control group drawn from standard clinical practice and should therefore be interpreted as a pragmatic reference rather than a fully randomized comparator. Although the control group was recruited during the same period and did not involve self-selection, unmeasured differences in case mix, motivation, or contextual factors may have biased the comparisons with routine care, even though allocation within the agent arms was randomized. Moreover, the randomization procedures and experimental designs differed between study sites, which limits the strength of causal claims across sites. In addition, allocation concealment and blinding were not feasible due to the visible and interactive nature of the assistance systems; randomization lists were generated in advance and implemented sequentially. Finally, all inferential analyses were based on complete cases. If patients with missing evaluation items systematically differed from those with complete data, our estimates of user experience, trust, and social presence may be affected by selection bias and should therefore be interpreted with caution.

6. Conclusions

This study examined user experience, trust, and social presence for different embodiments of an AI-driven, multimodal assistance system for collecting PROMs in two rehabilitation clinics. Across 195 patients, we compared non-assisted conditions (tablet-only at Clinic A and a PC-based routine-care control group at Clinic B) with assisted conditions (a virtual avatar, a physical robot, and a voice-only assistant). We did not directly assess data completeness, measurement validity, or clinical decision-making; our conclusions therefore concern user experience rather than demonstrable gains in data integrity or clinical utility. The results show a mixed pattern across the two clinics. At Clinic A, the tablet-only condition achieved significantly higher user experience ratings than the assisted embodiments and higher trust ratings than the robot and avatar conditions, suggesting that simplicity and familiarity can outweigh the added value of conversational agents in this context. The voice-only agent, however, received slightly higher trust ratings than the tablet-only condition, which may indicate that a voice assistant can support users' trust. Among the assisted conditions, overall UEQ scores did not differ significantly, although the voice-only interface was trusted more than the robot embodiment. At Clinic B, by contrast, both the Furhat robot and the virtual avatar scored significantly higher than the CG condition on hedonic user experience, indicating that agent-based interactions can enhance engagement when introduced in a receptive clinical setting. At Clinic A, the voice-only and virtual embodiments were descriptively more socially present than the robot; at Clinic B, robot and avatar social presence were similar, and the differences were not significant. Demographic variables such as age and gender, as well as affinity for technology, had only minor effects on user experience, suggesting that well-designed conversational agents can be accessible and engaging for a broad patient population.

These results have context-dependent implications for the design and deployment of AI-driven healthcare interfaces. First, the value of embodied agents depends on the setting: where efficiency and trust are paramount, simpler interfaces may be preferred, whereas in contexts that emphasize patient engagement and experience, conversational agents can offer measurable benefits. Second, the lack of strong embodiment effects among the assisted conditions suggests that interaction quality (conversational naturalness, responsiveness, and perceived empathy) may matter more than the specific physical or virtual form of the agent. Third, the robustness of user experience across demographic subgroups is encouraging for deployment in similarly structured rehabilitation populations. Initial qualitative feedback from the UX studies and the patient advisory board identified challenges related to navigation intuitiveness, indicating a need for ongoing user-centered refinement and rapid prototyping to balance configurability and simplicity. Finally, testing the system with real patients in the clinics highlighted the importance of speech output and motivational system dialogue for user engagement and comfort.

Future work should explore the longitudinal effects of repeated interactions with these systems, investigate the role of conversational content and emotional expression in shaping trust and engagement, and examine how individual patient characteristics, such as diagnosis, health literacy, and technology anxiety, moderate responses to different embodiments. Additionally, controlled studies manipulating specific design features, such as avatar realism, voice quality, and conversational style, will help isolate the mechanisms underlying the observed patterns. As healthcare increasingly integrates AI-driven interactive systems, understanding how to match embodiment and interaction modalities to clinical goals and patient needs will be essential for maximizing both acceptance and perceived usefulness. This study provides empirical evidence on the complex interplay between embodiment, user experience, and trust in healthcare AI systems, offering practical guidance for designers and clinicians seeking to implement patient-centered digital health solutions.

Author Contributions

Conceptualization, N.A., P.G., M.M., S.H., and J.-N.V.-A.; methodology, N.A., P.G., M.M., S.H., P.H., N.P., and E.C.; software, N.A., P.H., N.P., and E.C.; validation, N.A., P.G., M.M., S.H., P.H., and N.P.; formal analysis, N.A., M.M., L.P., and P.H.; investigation, N.A., P.G., M.M., L.P., and P.H.; resources, P.G., M.M., L.P., S.H., N.P., D.C., and J.-N.V.-A.; data curation, N.A., M.M., L.P., and P.H.; writing—original draft preparation, N.A.; writing—review and editing, N.A., P.G., M.M., S.H., P.H., N.P., and J.-N.V.-A.; visualization, N.A., M.M., and P.H.; supervision, S.H. and J.-N.V.-A.; project administration, S.H., J.-N.V.-A., D.C., and L.P.; funding acquisition, S.H., J.-N.V.-A., and D.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the German Federal Ministry of Education and Research (BMBF) under the program “AI-based Assistance Systems for Process-Accompanying Health Applications (Künstliche Intelligenz in der Arbeitswelt von morgen – KIAS)” as part of the MIA-PROM project (Multimodal Interactive Assistant for Patient-Reported Outcome Measures), grant number 16SV8732. The APC was funded by the German Federal Ministry of Education and Research (BMBF).

Institutional Review Board Statement

These studies were conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board of the Charité – Universitätsmedizin Berlin, Virchowweg 22, 10117 Berlin (protocol code: EA1/194/24, date: 30 September 2024) and the ethics committee of Landesärztekammer Brandenburg, Geschäftsstelle Cottbus, Postfach 101445, 03014 Cottbus (protocol code: 2024-33-BO-ff, date: 24 April 2024).

Informed Consent Statement

Informed consent was obtained from all subjects involved in this study.

Data Availability Statement

The datasets presented in this article are not readily available because they contain confidential patient information. Requests to access the datasets should be directed to navid.ashrafi@hshl.de and manuela.marquardt@charite.de.

Acknowledgments

The authors of this paper used PerplexityAI (Version 145.0.7632.76; Official Build; arm64) and Grammarly for grammar and spell-checking. After using these tools, the authors reviewed and edited the content as needed and take full responsibility for the publication’s content.

Conflicts of Interest

The authors declare that they have no financial or personal relationships that could inappropriately influence or bias the work reported in this manuscript. Two co-authors are employed by Acalta GmbH and Dexter Health GmbH; they were not involved in the study design, data collection, statistical analysis, interpretation of the findings, or the decision to submit the article for publication, and company personnel did not have access to identifiable participant data. All authors take responsibility for the integrity and accuracy of the analyses.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial Intelligence
API	Application Programming Interface
AR	Augmented Reality
ASR	Automatic Speech Recognition
ATI	Affinity for Technology Interaction
ECA	Embodied Conversational Agent
EHR	Electronic Health Record
ePROM	Electronic Patient-Reported Outcome Measure
FHIR	Fast Healthcare Interoperability Resources
HCI	Human–Computer Interaction
IoT	Internet of Things
IVA	Intelligent Virtual Agent
LLM	Large Language Model
MQTT	Message Queuing Telemetry Transport
NLU	Natural Language Understanding
PROM	Patient-Reported Outcome Measure
PROMIS	Patient-Reported Outcomes Measurement Information System
PRO	Patient-Reported Outcome
TTS	Text-to-Speech
UI	User Interface
UX	User Experience
VR	Virtual Reality

Appendix A. Supplementary Tables

Table A1. Descriptive statistics (means and standard deviations) for all outcome measures by clinic and condition.

Clinic	Outcome	Condition	n	M	SD

Clinic A	Overall UEQ	Tablet-only	102	5.22	1.06
Clinic A	Overall UEQ	Virtual Avatar	102	4.94	1.52
Clinic A	Overall UEQ	VoiceOnly	102	4.83	1.45
Clinic A	Overall UEQ	Furhat Robot	102	4.63	1.28
Clinic A	Hedonic UEQ	Tablet-only	93	4.53	1.37
Clinic A	Hedonic UEQ	Virtual Avatar	31	5.36	1.31
Clinic A	Hedonic UEQ	VoiceOnly	32	4.98	1.15
Clinic A	Hedonic UEQ	Furhat Robot	29	4.91	1.24
Clinic A	Pragmatic UEQ	Tablet-only	91	5.98	0.94
Clinic A	Pragmatic UEQ	Virtual Avatar	32	4.90	1.65
Clinic A	Pragmatic UEQ	VoiceOnly	32	5.03	1.76
Clinic A	Pragmatic UEQ	Furhat Robot	29	4.39	1.60
Clinic A	Trust	Tablet-only	92	5.62	1.35
Clinic A	Trust	Virtual Avatar	31	5.14	1.57
Clinic A	Trust	VoiceOnly	32	5.67	1.19
Clinic A	Trust	Furhat Robot	29	4.67	1.65
Clinic A	Social Presence	Virtual Avatar	32	4.67	2.38
Clinic A	Social Presence	VoiceOnly	32	5.32	2.04
Clinic A	Social Presence	Furhat Robot	29	4.39	2.17
Clinic B	Pragmatic UEQ	CG	42	5.26	1.25
Clinic B	Pragmatic UEQ	Virtual Avatar	19	5.98	1.17
Clinic B	Pragmatic UEQ	Furhat Robot	14	6.17	0.79
Clinic B	Hedonic UEQ	CG	46	4.68	1.38
Clinic B	Hedonic UEQ	Virtual Avatar	21	5.73	1.13
Clinic B	Hedonic UEQ	Furhat Robot	18	5.65	1.11
Clinic B	Trust	CG	46	5.35	1.52
Clinic B	Trust	Virtual Avatar	21	5.54	1.20
Clinic B	Trust	Furhat Robot	19	5.61	1.63
Clinic B	Social Presence	Virtual Avatar	18	4.91	1.73
Clinic B	Social Presence	Furhat Robot	17	5.16	1.63

Note. Sample sizes differ per outcome because some patients did not complete all evaluation items. Overall UEQ values in the main text and Table 1 are derived as the mean of pragmatic and hedonic subscales.
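The note above combines two rules: each outcome is summarized over the patients who answered it, and overall UEQ is the mean of the pragmatic and hedonic subscales. A minimal sketch of that bookkeeping follows, assuming per-patient averaging and per-outcome complete-case filtering; the record layout, the ratings, and the `overall_ueq` and `describe` helpers are hypothetical.

```python
from statistics import mean, stdev


def overall_ueq(record):
    """Overall UEQ for one patient: the mean of the pragmatic and hedonic subscales.

    Returns None when either subscale is missing, so the patient drops out of
    the overall score without affecting the other outcomes.
    """
    pragmatic, hedonic = record.get("pragmatic"), record.get("hedonic")
    if pragmatic is None or hedonic is None:
        return None
    return (pragmatic + hedonic) / 2


def describe(records, outcome):
    """Complete-case n, M, and SD for a single outcome (missing values are skipped per outcome)."""
    values = [r[outcome] for r in records if r.get(outcome) is not None]
    return len(values), mean(values), stdev(values)


# Hypothetical ratings for four patients; None marks an unanswered evaluation item.
records = [
    {"pragmatic": 6.0, "hedonic": 5.0, "trust": 6.0},
    {"pragmatic": 5.5, "hedonic": None, "trust": 5.0},
    {"pragmatic": 4.0, "hedonic": 4.5, "trust": None},
    {"pragmatic": 5.0, "hedonic": 6.0, "trust": 4.0},
]
for r in records:
    r["overall"] = overall_ueq(r)

for outcome in ("pragmatic", "hedonic", "overall", "trust"):
    n, m, sd = describe(records, outcome)
    print(f"{outcome:<10} n={n}  M={m:.2f}  SD={sd:.2f}")
```

Here pragmatic UEQ keeps all four patients while the other outcomes keep three, which is how n can differ per outcome. Averaging the two published subscale means instead reproduces the Clinic B overall values in Table 1, e.g. (5.26 + 4.68)/2 = 4.97 for CG and (6.17 + 5.65)/2 = 5.91 for the Furhat robot.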

Table A2. Tukey HSD post-hoc comparisons between conditions in Clinic B (CG, Virtual Avatar, Furhat Robot).

Outcome	Contrast	$\Delta M$	95% CI	$p_{\text{Tukey}}$
Overall UEQ	CG vs. Virtual Avatar	0.93	[0.24, 1.62]	0.005**
	CG vs. Furhat Robot	1.10	[0.32, 1.87]	0.003**
	Virtual Avatar vs. Furhat Robot	−0.17	[−1.05, 0.71]	0.89
Pragmatic UEQ	CG vs. Virtual Avatar	0.72	[−0.05, 1.49]	0.071
	CG vs. Furhat Robot	0.90	[0.05, 1.76]	0.037*
	Virtual Avatar vs. Furhat Robot	−0.18	[−1.16, 0.80]	0.89
Hedonic UEQ	CG vs. Virtual Avatar	1.05	[0.25, 1.85]	0.007**
	CG vs. Furhat Robot	0.97	[0.13, 1.82]	0.020*
	Virtual Avatar vs. Furhat Robot	0.07	[−0.90, 1.05]	0.98
Trust	CG vs. Virtual Avatar	0.19	[−0.74, 1.12]	0.88
	CG vs. Furhat Robot	0.26	[−0.70, 1.22]	0.80
	Virtual Avatar vs. Furhat Robot	−0.07	[−1.18, 1.05]	0.99
Social Presence	CG vs. Virtual Avatar	0.66	[−1.47, 2.78]	0.73
	CG vs. Furhat Robot	0.90	[−1.24, 3.04]	0.57
	Virtual Avatar vs. Furhat Robot	−0.24	[−1.66, 1.18]	0.91

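Note. For contrasts against CG, $\Delta M = M_{\text{agent}} - M_{\text{CG}}$, so positive values indicate higher scores for the agent condition; for Virtual Avatar vs. Furhat Robot, $\Delta M = M_{\text{first}} - M_{\text{second}}$. Values are based on Tukey HSD tests following the Clinic B ANOVAs. * p < 0.05, ** p < 0.01.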
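Table A2 reports Tukey HSD intervals for groups of unequal size. The text does not name the software used; assuming it applied the standard Tukey–Kramer adjustment for unequal n, the simultaneous interval for conditions $i$ and $j$ would take the form below, where $q_{\alpha;\,k,\,N-k}$ is the upper-$\alpha$ quantile of the studentized range distribution for $k$ groups and $N-k$ error degrees of freedom, and $MS_W$ is the pooled within-group mean square.

$$
(\bar{y}_i - \bar{y}_j) \pm \frac{q_{\alpha;\,k,\,N-k}}{\sqrt{2}} \sqrt{MS_W \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}
$$

As a rough check, assuming the hedonic UEQ ANOVA used the same complete cases as Table A1 (CG: n = 46, SD = 1.38; Virtual Avatar: n = 21, SD = 1.13; Furhat Robot: n = 18, SD = 1.11), $MS_W = [45(1.38)^2 + 20(1.13)^2 + 17(1.11)^2]/82 \approx 1.61$ and $q_{0.05;\,3,\,82} \approx 3.38$. This gives a half-width of about $(3.38/\sqrt{2})\sqrt{1.61\,(1/46 + 1/21)} \approx 0.80$, in line with the reported CG vs. Virtual Avatar interval [0.25, 1.85].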

Table A3. Post-hoc comparisons between assisted embodiments at Clinic A (Virtual Avatar, VoiceOnly, Furhat).

Contrast	Measure	$\Delta M$ (95% CI)	$p_{\text{Tukey}}$
Virtual Avatar vs. VoiceOnly	Overall UEQ	0.11 (−0.52, 0.74)	0.90
Virtual Avatar vs. Furhat	Overall UEQ	0.31 (−0.32, 0.94)	0.51
VoiceOnly vs. Furhat	Overall UEQ	0.20 (−0.44, 0.84)	0.78
Virtual Avatar vs. VoiceOnly	Pragmatic UEQ	−0.13 (−0.87, 0.61)	0.92
Virtual Avatar vs. Furhat	Pragmatic UEQ	0.51 (−0.23, 1.25)	0.24
VoiceOnly vs. Furhat	Pragmatic UEQ	0.64 (−0.10, 1.38)	0.11
Virtual Avatar vs. VoiceOnly	Hedonic UEQ	0.38 (−0.35, 1.11)	0.47
Virtual Avatar vs. Furhat	Hedonic UEQ	0.45 (−0.28, 1.18)	0.35
VoiceOnly vs. Furhat	Hedonic UEQ	0.07 (−0.66, 0.80)	0.99
Virtual Avatar vs. VoiceOnly	Trust	−0.53 (−1.37, 0.31)	0.29
Virtual Avatar vs. Furhat	Trust	0.38 (−0.46, 1.22)	0.54
VoiceOnly vs. Furhat	Trust	0.91 (0.07, 1.75)	0.046*

Note. $\Delta M = M_{\text{first}} - M_{\text{second}}$; positive values indicate higher ratings for the first condition in the contrast. $p_{\text{Tukey}}$ refers to Tukey's HSD post-hoc tests following the one-way ANOVAs on the assisted conditions. Only the VoiceOnly vs. Furhat contrast on trust reached statistical significance, consistent with the main text. * p < 0.05.
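Because $\Delta M$ is defined as $M_{\text{first}} - M_{\text{second}}$, the point differences for the three UEQ measures in Table A3 can be reproduced from the Clinic A means in Table A1. The short check below does only that; the Tukey intervals and p-values need the raw data and are not recomputed. The `MEANS` dictionary and `pairwise_deltas` helper are illustrative names.

```python
from itertools import combinations

# Clinic A means for the assisted conditions, transcribed from Table A1.
MEANS = {
    "Overall UEQ": {"Virtual Avatar": 4.94, "VoiceOnly": 4.83, "Furhat": 4.63},
    "Pragmatic UEQ": {"Virtual Avatar": 4.90, "VoiceOnly": 5.03, "Furhat": 4.39},
    "Hedonic UEQ": {"Virtual Avatar": 5.36, "VoiceOnly": 4.98, "Furhat": 4.91},
}


def pairwise_deltas(means):
    """Return {(first, second): M_first - M_second} for each pair, rounded to two decimals."""
    # combinations() follows dict insertion order, giving the Table A3 order:
    # (Avatar, VoiceOnly), (Avatar, Furhat), (VoiceOnly, Furhat)
    return {(a, b): round(means[a] - means[b], 2) for a, b in combinations(means, 2)}


for measure, means in MEANS.items():
    for (first, second), delta in pairwise_deltas(means).items():
        print(f"{first} vs. {second}\t{measure}\t{delta:+.2f}")
```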

References

Bonsel, J.; Itiola, A.; Huberts, A.; Bonsel, G.; Penton, H. The use of patient-reported outcome measures to improve patient-related outcomes: A systematic review. Health Qual. Life Outcomes 2024, 22, 97.
Bertelsmann Stiftung. Patient-Reported Outcome Measures (PROMs): An International Comparison; Technical report, Bertelsmann Stiftung Reports; Bertelsmann Stiftung: Gütersloh, Germany, 2021.
Hillmann, N.; Schröder, C.; Kulik, T.; Völkel, S.; Huang, X.; Caprini, F.; Beck, J.; Staubitz, T.; Scherer, P. Multimodal Interactive Assistance for the Digital Collection of PROMs in Rehabilitation. MIA-PROM Project Report, 2024. Available online: https://mia-prom.de (accessed on 1 February 2026).
Rusconi, D.; Basile, I.; Rampichini, F.; Colombo, S.; Arba, L.; Pancheri, M.L.; Consolo, L.; Lusignani, M. Electronic Patient Reported Outcomes Measures (e-PROMs) in Pediatric Palliative Oncology Care: A Scoping Review. J. Palliat. Care 2024, 39, 298–315.
Wittich, L.; Tsatsaronis, C.; Kuklinski, D.; Schöner, L.; Steinbeck, V.; Busse, R.; Rombey, T. Patient-Reported outcome measures as an intervention: A comprehensive overview of systematic reviews on the effects of feedback. Value Health 2024, 27, 1436–1453.
Long, C.; Beres, L.K.; Wu, A.W.; Giladi, A.M. Patient-level barriers and facilitators to completion of patient-reported outcomes measures. Qual. Life Res. 2022, 31, 1711–1718.
Aiyegbusi, O.L.; Roydhouse, J.; Rivera, S.C.; Kamudoni, P.; Schache, P.; Wilson, R.; Stephens, R.; Calvert, M. Key considerations to reduce or address respondent burden in patient-reported outcome (PRO) data collection. Nat. Commun. 2022, 13, 6026.
Tran, V.T.; Riveros, C.; Ravaud, P. Reimagining patient-reported outcomes in the age of generative AI. npj Digit. Med. 2025, 8, 624.
Bickmore, T.W.; Pfeifer, L.M.; Jack, B.W. Taking the time to care: Empowering low health literacy hospital patients with virtual nurse agents. In CHI ’10: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Atlanta, GA, USA, 10–15 April 2010; Association for Computing Machinery: New York, NY, USA, 2010; pp. 1413–1422.
Demiris, G.; Choi, Y. Embodied conversational agents in healthcare: A scoping review. J. Am. Med. Inform. Assoc. 2020, 27, 1799–1808.
Mordoch, E.; Osterreicher, A.; Guse, L.; Roger, K.; Thompson, G. Use of social commitment robots in the care of elderly people with dementia: A literature review. Maturitas 2013, 74, 14–20.
Shibata, T.; Wada, K. Therapeutic seal robot as biofeedback medical device: Qualitative and quantitative evaluations of robot therapy in dementia care. Proc. IEEE 2012, 100, 2527–2538.
Palani, S.; Luger, E. Voice interfaces in healthcare: Patient perspectives on smart speakers. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, Glasgow, UK, 4–9 May 2019; pp. 1–13.
Saripalle, R.; Patel, R. From Command to Care: A Scoping Review on Utilization of Smart Speakers by Patients and Providers. Mayo Clin. Proc. Digit. Health 2024, 2, 207–220.
Schumann, A.; Tschäpe, J.; Wieser, M. Intelligent conversational agents in healthcare: A review of natural language processing applications. J. Biomed. Inform. 2019, 94, 103182.
Schachner, T.; Leitner, P.; Martinez, M. A systematic review of health dialog systems. Methods Inf. Med. 2019, 58, e1–e18.
Holzinger, A.; Biemann, C.; Pattichis, C.S.; Kell, D.B. What do we need to build explainable AI systems for the medical domain? Rev. Interdiscip. Res. 2019, 8, 1–28.
Magdon-Ismail, Z.; Castelli, G. Human-centred design of patient-facing digital health tools. In Digital Health: Scaling Healthcare to the World; Bates, D.W., Wright, A., Eds.; Springer: Berlin/Heidelberg, Germany, 2020; pp. 45–68.
Bieniek, J.; Rahouti, M.; Verma, D.C. Generative AI in Multimodal User Interfaces: Trends, Challenges, and Cross-Platform Adaptability. arXiv 2024, arXiv:2411.10234.
Jacob, A.; Fish, J.; Breslin, D. A Review on Virtual Personal AI Assistants; Zenodo: Geneva, Switzerland, 2024; Available online: https://zenodo.org/records/13447274 (accessed on 1 February 2026).
Staubitz, T.; Völkel, S.; Hillmann, N.; Huang, X.; Caprini, F.; Beck, J.; Scherer, P. Local vs. Avatar Robot: Performance and Perceived Presence. Front. Robot. AI 2021, 8, 778753.
Karami, V.; Yaffe, M.J.; Gore, G.C.; Moon, A.; Abbasgholizadeh, S. Socially Assistive Robots for Patients with Alzheimer’s Disease: A Scoping Review. Arch. Gerontol. Geriatr. 2024, 123, 105409.
MIA-PROM Project. AI-Assisted System for the Collection of PROMs. 2025. Available online: https://mia-prom.de/?page_id=365&lang=en (accessed on 1 February 2026).
Jiang, Z.; Huang, X.; Wang, Z.; Liu, Y.; Huang, L.; Luo, X. Embodied Conversational Agents for Chronic Diseases: Scoping Review. J. Med. Internet Res. 2024, 26, e47134.
Ashrafi, N.; Graf, P.; Marquardt, M.; Vona, F.; Schorlemmer, J.; Voigt-Antons, J.N. Avatar Appearance Beyond Pixels – User Ratings and Avatar Preferences Within Health Applications. In Proceedings of the International Conference on Human-Computer Interaction, Yokohama, Japan, 26 April–1 May 2025; Springer Nature: Cham, Switzerland, 2025; pp. 147–162.
Ashrafi, N.; Schorlemmer, J.; Vona, F.; Kojic, T.; Hillmann, S.; Braytee, A.; Wang, Y.K.; Kocaballi, B.; Moreira, C.; Möller, S.; et al. Emotionally adaptive virtual health assistants for enhanced social presence and trust. In Adjunct Proceedings of the 25th ACM International Conference on Intelligent Virtual Agents, Berlin, Germany, 16–19 September 2025; Association for Computing Machinery: New York, NY, USA, 2025; pp. 1–7.
Ashrafi, N.; Vona, F.; Hinzmann, S.; Henning, J.; Vergari, M.; Warsinke, M.; Moreira, C.P.; Voigt-Antons, J.N. Size Matters: The Impact of Avatar Size on User Experience in Healthcare Applications. arXiv 2025, arXiv:2512.07357.
Ashrafi, N.; Vona, F.; Hinzmann, S.; Harnisch, P.; Graf, P.; Voigt-Antons, J.N. Single Vs Dual: Influence of the Number of Displays on User Experience within Virtually Embodied Conversational Systems. In Proceedings of the 30th ACM Symposium on Virtual Reality Software and Technology, Trier, Germany, 9–11 October 2024; Association for Computing Machinery: New York, NY, USA, 2024; pp. 1–2.
Ashrafi, N.; Vona, F.; Ringsdorf, C.; Hertel, C.; Toni, L.; Kailer, S.; Bartels, A.; Kojic, T.; Voigt-Antons, J.N. Enhancing Job Interview Preparation Through Immersive Experiences Using Photorealistic, AI-powered Metahuman Avatars. In Proceedings of the 2024 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct), Bellevue, WA, USA, 21–25 October 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 345–346.
Cevasco, K.E.; Morrison Brown, R.E.; Woldeselassie, R.; Kaplan, S. Patient Engagement with Conversational Agents in Health Applications 2016–2022: A Systematic Review and Meta-Analysis. J. Med. Syst. 2024, 48, 40.
Clemensen, J.; Larsen, S.B.; Kyng, M.; Kirkevold, M. Participatory design in health sciences: Using cooperative experimental methods in developing health services and computer technology. Qual. Health Res. 2007, 17, 122–130.
Moubayed, S.A.; Beskow, J.; Skantze, G.; Granström, B. Furhat: A Back-Projected Human-Like Robot Head for Multiparty Human-Machine Interaction. In Proceedings of the Workshops on Multimodal Corpora and eNTERFACE, Santa Monica, CA, USA, 22–26 October 2012; Springer: Berlin/Heidelberg, Germany, 2012.
Lupetti, M.L.; Hagens, E.; Van Der Maden, W.; Steegers-Theunissen, R.; Rousian, M. Trustworthy Embodied Conversational Agents for Healthcare: A Design Exploration of Embodied Conversational Agents for the periconception period at Erasmus MC. In CUI ’23: Proceedings of the 5th International Conference on Conversational User Interfaces, New York, NY, USA, 19–21 July 2023; Association for Computing Machinery: New York, NY, USA, 2023.
Ter Stal, S.; Kramer, L.L.; Tabak, M.; op den Akker, H.; Hermens, H. Design Features of Embodied Conversational Agents in eHealth: A Literature Review. Int. J. Hum.-Comput. Stud. 2020, 138, 102409.
Thaler, M.; Schlögl, S.; Groth, A. Agent vs. Avatar: Comparing Embodied Conversational Agents Concerning Characteristics of the Uncanny Valley. In Proceedings of the 2020 IEEE International Conference on Human-Machine Systems (ICHMS), Rome, Italy, 7–9 September 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–6.
Unity Technologies. Unity, Version 2021.3.16f1; cross-platform game engine; Unity Technologies ApS: San Francisco, CA, USA, 2021.
DAZ 3D. DAZ Studio, Version 4.21; 3D figure customization and animation software; DAZ 3D, Inc.: Salt Lake City, UT, USA, 2022.
Crazy Minnow Studio. SALSA LipSync Suite, Version 2; real-time lip-sync and facial animation toolkit; Crazy Minnow Studio, LLC: Minneapolis, MN, USA, 2022.
Mayring, P. Qualitative Inhaltsanalyse: Grundlagen und Techniken; Beltz Juventa: Weinheim, Germany, 2015.
Schrepp, M.; Hinderks, A.; Thomaschewski, J. Design and evaluation of a short version of the user experience questionnaire (UEQ-S). Int. J. Interact. Multimed. Artif. Intell. 2017, 4, 103–108.
Schrepp, M.; Thomaschewski, J. Handbook for the Modular Extension of the User Experience Questionnaire; Technical report; UEQ Research: Lübeck, Germany, 2023; Available online: https://ueqplus.ueq-research.org/Material/UEQ+_Handbook_V6.pdf (accessed on 1 February 2026).
Lee, K.; Peng, W.; Jin, S.A.; Yan, C. Can Robots Manifest Personality?: An Empirical Test of Personality Recognition, Social Responses, and Social Presence in Human–Robot Interaction. J. Commun. 2006, 56, 754–772.
Franke, T.; Attig, C.; Wessel, D. A Personal Resource for Technology Interaction: Development and Validation of the Affinity for Technology Interaction (ATI) Scale. Int. J. Hum.–Comput. Interact. 2019, 35, 456–467.
Ter Stal, S.; Tabak, M.; op den Akker, H.; Beinema, T.; Hermens, H. Who Do You Prefer? The Effect of Age, Gender and Role on Users’ First Impressions of Embodied Conversational Agents in eHealth. Int. J. Hum.-Comput. Interact. 2020, 36, 881–892.
Salem, M.; Ziadee, M.; Sakr, M. The Influence of a Robot’s Embodiment on Trust. In Proceedings of the 2015 ACM/IEEE International Conference on Human-Robot Interaction, Portland, OR, USA, 3–5 March 2015; Association for Computing Machinery: New York, NY, USA, 2015; pp. 157–164.
MacNeill, A.L.; Luke, A.; Doucet, S. Individual Differences in Views Toward Healthcare Conversational Agents. J. Med. Internet Res. 2025. Early view.

Figure 1. MQTT-based technical architecture of the system, including the Questionnaire Tablet, Virtual Avatar, Furhat Robot, and the backend components.

Figure 2. Front-end components visible to the patients with both embodied agents: Furhat (left side), Virtual Avatar (right side), and the questionnaire tablet (middle). Note that patients would either use the questionnaire tablet solely in the tablet-only condition or a combination of the questionnaire tablet with one of the agents in the assisted conditions.

Figure 3. Pragmatic UEQ ratings at Clinic A for tablet-only, virtual avatar, voice-only, and Furhat robot conditions (effective n per outcome varies; see Table A1). Error bars indicate 95% confidence intervals; ** denotes the significant difference between tablet-only and the combined assisted condition (p = 0.007).

Figure 4. Hedonic UEQ ratings at Clinic B for CG, virtual avatar, and Furhat robot conditions (effective n per outcome varies; see Table A1). Error bars indicate 95% confidence intervals; */** mark significant differences between both agent conditions and CG (all p < 0.05, Tukey HSD).

Figure 5. Trust ratings at Clinic A for tablet-only, virtual avatar, voice-only, and Furhat robot conditions (trust data complete for 29–92 participants per condition; see Table A1). Error bars show 95% confidence intervals; *** marks the significant difference between tablet-only and the combined assisted condition (p < 0.001), and * the difference between VoiceOnly and Furhat (p = 0.046, Tukey HSD).

Table 1. Summary of patients’ ratings across conditions for Clinics A and B. Values are means (standard deviations). Significance indicators: * p < 0.05, ** p < 0.01, *** p < 0.001.

Measure	Tablet-Only/CG	Virtual Avatar	VoiceOnly	Furhat Robot
Clinic A (n = 102)
Overall UEQ	5.22 (1.06)**	4.94 (1.52)	4.83 (1.45)	4.63 (1.28)
Hedonic UEQ	4.53 (1.37)	5.36 (1.31)	4.98 (1.15)	4.91 (1.24)
Pragmatic UEQ	5.98 (0.94)	4.90 (1.65)	5.03 (1.76)	4.39 (1.60)
Trust	5.62 (1.35)***	5.14 (1.57)	5.67 (1.19)*	4.67 (1.65)
Social Presence	–	4.67 (2.38)	5.32 (2.04)	4.39 (2.17)
Clinic B (n = 88; descriptives per outcome vary)
Overall UEQ	4.97 (1.13)	5.85 (1.02)*	–	5.91 (0.93)**
Hedonic UEQ	4.68 (1.38)	5.73 (1.13)*	–	5.65 (1.11)*
Pragmatic UEQ	5.26 (1.25)	5.98 (1.17)	–	6.17 (0.79)*
Trust	5.35 (1.52)	5.54 (1.20)	–	5.61 (1.63)
Social Presence	–	4.91 (1.73)	–	5.16 (1.63)

Tablet-only (Clinic A, overall UEQ) significantly higher than all assisted conditions (p < 0.01). Furhat robot (Clinic B, overall and hedonic UEQ) significantly higher than CG (p < 0.05); virtual avatar (Clinic B, overall and hedonic UEQ) significantly higher than CG (p < 0.01). VoiceOnly (Clinic A, trust) significantly higher than Furhat (p < 0.05). Social presence in Clinic B was not measured for the CG condition. Clinic B did not include a VoiceOnly condition. Note. Stars in Table 1 highlight only the primary contrasts discussed in the text. Other significant pairwise differences and full ANOVA statistics are reported in Table A1, Table A2, and Table A3.
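The significance markers follow fixed thresholds, so they can be assigned mechanically from exact p-values. A minimal sketch applying the thresholds from the Table 1 caption to the Clinic B Tukey p-values in Table A2 is shown below; the `significance_marker` function and the `CLINIC_B_P` mapping are illustrative names, not part of the study's analysis code.

```python
def significance_marker(p):
    """Map a p-value to the markers defined for Table 1: *** p < 0.001, ** p < 0.01, * p < 0.05."""
    for threshold, marker in ((0.001, "***"), (0.01, "**"), (0.05, "*")):
        if p < threshold:
            return marker
    return ""  # not significant at the 0.05 level


# Exact Tukey p-values for Clinic B, transcribed from Table A2.
CLINIC_B_P = {
    ("Overall UEQ", "CG vs. Virtual Avatar"): 0.005,
    ("Overall UEQ", "CG vs. Furhat Robot"): 0.003,
    ("Pragmatic UEQ", "CG vs. Virtual Avatar"): 0.071,
    ("Pragmatic UEQ", "CG vs. Furhat Robot"): 0.037,
    ("Hedonic UEQ", "CG vs. Virtual Avatar"): 0.007,
    ("Hedonic UEQ", "CG vs. Furhat Robot"): 0.020,
}

for (outcome, contrast), p in CLINIC_B_P.items():
    print(f"{outcome:<14}{contrast:<24}p = {p:.3f}  {significance_marker(p) or 'n.s.'}")
```

The comparisons are strict, so a p-value exactly at a threshold (e.g. p = 0.05) does not earn the corresponding marker.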

Table 2. Mixed ANOVA results for user experience and trust at Clinic A (2 × 3 design with assistance as within-subject factor and embodiment as between-subject factor). Degrees of freedom reflect the effective sample for each outcome (overall UEQ: n = 102; pragmatic UEQ: n = 98; hedonic UEQ: n = 100; trust: n = 101).

Measure	Source	SS	df₁	df₂	MS	F	p	$\eta_p^2$
Overall UEQ	E	1.36	2	100	0.68	0.34	0.714	0.007
Overall UEQ	A	8.82	1	100	8.82	7.38	0.008	0.069
Overall UEQ	A × E	0.66	2	100	0.33	0.28	0.759	0.006
Pragmatic UEQ	E	4.91	2	96	2.45	1.01	0.367	0.021
Pragmatic UEQ	A	82.32	1	96	82.32	46.90	<0.001	0.328
Pragmatic UEQ	A × E	1.32	2	96	0.66	0.37	0.689	0.008
Hedonic UEQ	E	2.28	2	98	1.14	0.49	0.616	0.010
Hedonic UEQ	A	10.10	1	98	10.10	7.72	0.007	0.073
Hedonic UEQ	A × E	0.88	2	98	0.44	0.34	0.714	0.007
Trust	E	12.19	2	99	6.09	1.92	0.152	0.037
Trust	A	11.65	1	99	11.65	13.76	<0.001	0.122
Trust	A × E	4.37	2	99	2.19	2.58	0.081	0.050

E stands for Embodiment and A stands for Assistance.
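The effect sizes in Table 2 can be cross-checked from the reported F statistics and degrees of freedom, since partial eta squared, $SS_{\text{effect}}/(SS_{\text{effect}} + SS_{\text{error}})$, equals $F \cdot df_1 / (F \cdot df_1 + df_2)$. The sketch below recomputes the $\eta_p^2$ column from the F, df₁, and df₂ columns; each recomputed value agrees with the reported one to three decimals. The `ROWS` list and `partial_eta_squared` helper are illustrative names.

```python
# (measure, source, F, df1, df2, reported partial eta squared), transcribed from Table 2
ROWS = [
    ("Overall UEQ", "E", 0.34, 2, 100, 0.007),
    ("Overall UEQ", "A", 7.38, 1, 100, 0.069),
    ("Overall UEQ", "A × E", 0.28, 2, 100, 0.006),
    ("Pragmatic UEQ", "E", 1.01, 2, 96, 0.021),
    ("Pragmatic UEQ", "A", 46.90, 1, 96, 0.328),
    ("Pragmatic UEQ", "A × E", 0.37, 2, 96, 0.008),
    ("Hedonic UEQ", "E", 0.49, 2, 98, 0.010),
    ("Hedonic UEQ", "A", 7.72, 1, 98, 0.073),
    ("Hedonic UEQ", "A × E", 0.34, 2, 98, 0.007),
    ("Trust", "E", 1.92, 2, 99, 0.037),
    ("Trust", "A", 13.76, 1, 99, 0.122),
    ("Trust", "A × E", 2.58, 2, 99, 0.050),
]


def partial_eta_squared(f, df1, df2):
    """Partial eta squared from an F test: F*df1 / (F*df1 + df2)."""
    effect = f * df1
    return effect / (effect + df2)


for measure, source, f, df1, df2, reported in ROWS:
    recomputed = partial_eta_squared(f, df1, df2)
    print(f"{measure:<14}{source:<6} recomputed={recomputed:.3f}  reported={reported:.3f}")
```

Because F and the effect sizes are rounded in the table, agreement is expected only up to the last reported digit.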

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Ashrafi, N.; Graf, P.; Marquardt, M.; Harnisch, P.; Hillmann, S.; Ploner, N.; Compagna, D.; Cirit, E.; Papst, L.; Voigt-Antons, J.-N. Multimodal Assistance in Rehabilitation: User Experience of Embodied and Non-Embodied Agents for Collecting Patient-Reported Outcome Measures. Virtual Worlds 2026, 5, 15. https://doi.org/10.3390/virtualworlds5010015
