1. Introduction
The integration of mobile robots into indoor environments marks a significant paradigm shift in robotics—from isolated industrial applications to direct collaboration with humans in shared spaces [
1]. Indoor mobile robots are increasingly deployed across diverse environments, including hospitals [
2], laboratories [
3], offices, and residential settings [
4]. This shift demands sophisticated Human–Robot Interaction (HRI) capabilities that enable natural, safe, and efficient collaboration between humans and robotic systems.
Several converging factors motivate the development of effective HRI in indoor mobile robotics.
Aging population: An aging global population creates a critical need for assistance. According to the World Health Organization, the world’s population aged 60 years and older will double to 2.1 billion by 2050, and 1 in 6 people worldwide will be over 65 by then, a significant increase from 1 in 11 in 2019 [
5]. This aging population and growing healthcare needs require the deployment of robots in hospitals, care facilities, and home care. Robots can relieve caregivers by performing routine tasks, thereby increasing the efficiency of patient care [
6]. The use of mobile robots enhances and enables independent living for the elderly and those with disabilities. The use of smart wheelchairs with Human–Robot Interaction via artificial intelligence increases usability, learnability, efficiency, satisfaction, and a sense of independence and dignity among elderly and mobility-impaired individuals [
7,
8]. Matthias and Markus proposed an intelligent wheelchair that can navigate in indoor environments and accompany any person. In addition, it allows social interaction while walking to relieve relatives or nursing staff, who otherwise need to push the wheelchair [
9]. Shannon Vallor in her paper “Carebots and Caregivers” introduced the concept of “carebots” and argued that the ethical evaluation of these systems must extend beyond the impact on patients to consider the moral value of caregiving practices for the caregivers themselves, examining whether robots sustain or deprive them of the internal goods of caring [
10]. The TORNADO cloud-integrated robotics platform, featuring people-aware navigation and dexterous manipulation, includes a validation scenario specifically focused on patient support in a hospital palliative ward [
11]. This aging population requires robots that can assist, provide care, and interact socially; achieving that goal demands the human-centric design of Human–Robot Interaction.
Labor shortage: Another major motivation for developing robots with intelligent and user-accepted Human–Robot Interaction is the labor shortage. To cooperate and coordinate with humans, a robot needs to understand human norms and proxemics. Cao and Tam proposed a search and fetch operation for mobile manipulation robots with multimodal Human–Robot Interaction via gesture, voice, and face recognition. This can address the persistent labor shortage in roles that require complex, repetitive tasks in indoor environments like manufacturing factories [
12]. Beyond simple repetition, the economic viability of automation is increasingly driven by the modernization of existing “brownfield” facilities. Unlike “greenfield” projects built from scratch, brownfield environments require Autonomous Mobile Robots (AMRs) that can navigate without fixed infrastructure like rails, allowing companies to automate gradually without expensive shutdowns or remodeling [
13]. In labor-intensive industries like textile composite production, where full automation is cost-prohibitive, collaborative robots (cobots) offer a middle ground. They reduce production time and costs while preserving the tacit knowledge of skilled human experts who are becoming scarcer due to aging workforces [
14]. This trend extends to agriculture, where robotic planters are being designed to remedy future farmer shortages by optimizing energy usage and draft force [
15]. What these users have in common is a lack of technical skills and training to operate and interact with robots; hence, the human-centric design of interaction becomes paramount. In the service and healthcare sectors, the integration of Generative AI (GenAI), which includes large language models (LLMs), large behavioral models (LBMs), and agentic AI, is transforming the economic landscape by enabling “citizen developers”—frontline employees who can train and fine-tune robots without coding skills, leading to cost-effective service excellence [
16]. Real-world applications demonstrate significant returns on investment; for instance, the deployment of Moxi robots has saved clinical staff over 575,000 h and 1.5 billion steps, directly addressing clinician burnout and operational inefficiency [
17]. Furthermore, humanoid robots are being explored as a solution to hospital labor shortages, capable of performing teleoperated clinical tasks with human-like dexterity [
18]. The deployment of mobile robots in healthcare settings has demonstrated promising results in improving operational efficiency and reducing the workload on medical staff [
19,
20]. Service robots have evolved from simple delivery systems to sophisticated platforms capable of complex Human–Robot Interaction [
21,
22].
Pandemic Response and Biosecurity: The COVID-19 pandemic acted as a catalyst for robotic adoption, highlighting the advantage of systems with intrinsic immunity to pathogens [
23]. Robots became essential for maintaining “social distancing” in medical settings. For example, the “Dr. Spot” quadruped robot was developed to measure vital signs (skin temperature, heart rate, SpO2) without direct contact, preserving Personal Protective Equipment (PPE) and protecting healthcare workers [
24]. Mobile robots are assessed for deployment in isolation-room hospital settings to execute tasks like remote supply delivery and medication distribution, aiming to minimize the risk of cross-contamination and reduce staff workload, particularly during infectious disease outbreaks like the COVID-19 pandemic [
25,
26]. In extreme cases, such as in Wuhan, China, entire wards were temporarily run by robots to deliver food and medicine to quarantine patients, minimizing human exposure [
27]. Telepresence robots also gained traction, allowing isolated patients to interact with families and doctors while avoiding contagion risks [
28]. Beyond direct care, robots have been utilized for disinfection, logistics, and waste handling, effectively breaking the chain of virus transmission [
26]. This protection extends to surgical oncology, where robotic systems allow for precise interventions while adhering to strict safety protocols to prevent viral aerosolization [
29]. Therefore, whenever a human is involved, which is very likely in the indoor environments of hospitals and laboratories, the robot must have a Human–Robot Interaction model covering basics such as human-aware navigation, intuitive interaction, and a likable presence.
Occupational and Psychological Safety: Beyond biological hazards, HRI addresses physical and mental safety in industry. In the construction sector, which faces high accident rates, machine learning models are being used to predict human trust in robots based on physiological data (such as skin temperature), ensuring that human–robot collaboration (HRC) remains safe and productive [
30]. However, safety is also psychological. The isolation caused by pandemics or varying abilities affects mental health. Social robots with “hybrid face” designs have been deployed to support the mental health of older adults and those with dementia, providing companionship and “human-like” conversation to mitigate isolation [
31]. Research in aged care suggests that while robots can relieve the physical workload, their acceptance depends heavily on their reliability and the emotional reactions of the staff, emphasizing that psychological safety is a prerequisite for successful implementation [
32]. Ultimately, robots are tasked with the “dull, dirty, and dangerous” jobs—from welding to patient lifting—enhancing the overall safety profile of modern work environments [
33].
The challenge for Human–Robot Interaction in indoor environments is twofold. One is the technical challenge, and the other is the human challenge (
Figure 1).
Complexity of environments: Indoor environments such as hospitals, laboratories, and offices are dynamic, tightly structured, and populated with many moving obstacles. Robots must not only navigate but also consider human activities and interact safely. Effective HRI supports adaptive, context-sensitive, flexible collaboration [
34].
Extending robot functionality: Natural interaction methods such as speech, gestures, touch, or gaze enable robots to perform more complex tasks that would be difficult to manage without HRI. These capabilities make robots more intuitive and “human-like” to operate, lower the learning barrier for users, and improve task efficiency.
User acceptance and trust: The success of robotic systems depends heavily on user trust and acceptance. Robots must behave in a comprehensible, reliable, and socially appropriate manner to ensure that users can rely on them and use them effectively in daily life [
35]. Trust is a prerequisite for user acceptance [
35,
36,
37]. It depends heavily on crucial human-centric factors, such as whether users are willing to accept the technology and trust it to perform its tasks reliably [
38]. Only then can we achieve the long-term adoption of mobile robotics in our personal spaces.
Safety and ethical considerations: In human-centered environments, safety is paramount. HRI contributes to predictable, transparent, and ethically sound interactions, for example, through clear communication, privacy protection, and the respectful handling of human autonomy.
HRI can be classified along multiple axes; Goodrich and Schultz survey HRI in terms of robot autonomy, interaction roles, and team organization [
1]. Parasuraman et al. formalize levels of automation for information acquisition, analysis, decision, and action—a framework widely adopted in HRI design and evaluation [
39]. Steinfeld and Fong proposed common metrics for Human–Robot Interaction [
40]. Bartneck et al. developed measurement instruments for the anthropomorphism, animacy, likeability, perceived intelligence, and perceived safety of robots [
41].
Figure 2 represents the different dimensions of HRI classification.
This review provides a comprehensive and integrative synthesis of Human–Robot Interaction (HRI) in indoor mobile robotics. Unlike prior surveys that focus on isolated subdomains, we systematically combine three perspectives that are often treated separately: (i) the technical aspects of HRI, including modalities such as speech, gesture, touch, visual, and emerging LLM-enabled interfaces; (ii) the human aspects of HRI, encompassing usability, trust formation, social acceptance, and long-term user experience; and (iii) practical and regulatory considerations, including safety engineering, privacy, deployment constraints, and compliance frameworks.
By bridging fragmented literature across robotics, human–computer interaction, healthcare technology, and safety engineering, this review addresses a critical gap: the lack of a unified 2025 perspective that connects technical implementation with socio-technical integration in real deployment contexts. We analyze representative case studies from operational systems to ground theoretical insights in practice and to derive transferable design principles.
In addition to synthesizing the state of the art, we identify key open research challenges, including the need for longitudinal studies on trust dynamics, the cross-cultural validation of HRI models, scalable economic deployment frameworks, and standardized interoperability architectures for heterogeneous robot fleets. Addressing these challenges is essential for advancing indoor mobile robotics toward natural, safe, economically sustainable, and socially accepted human–robot collaboration.
Finally, we examine current approaches alongside representative case studies and highlight future research directions aimed at advancing the field toward more natural, safe, and effective human–robot collaboration in indoor environments.
3. Applications of Indoor Mobile Robots: Cross-Domains and Domain-Specific Challenges
Indoor mobile robots are autonomous or semi-autonomous robotic systems that operate across diverse indoor environments, such as hospitals, laboratories, offices, industrial facilities, and private homes. These environments differ in structure, predictability, and user characteristics. As summarized in
Table 1, indoor operational contexts can be broadly classified as structured and predictable or unstructured and dynamic. This distinction provides a useful framework for understanding both cross-domain and domain-specific requirements.
Indoor mobile robots perform tasks ranging from transportation and assistance to monitoring and social interaction while interacting safely and effectively with humans and dynamic surroundings. Compared to outdoor and industrial robots, they operate in fundamentally different contexts, which demand specialized capabilities and design considerations [
51]. These environments present unique constraints that directly influence interaction design and user experience. Indoor mobile robots operate in constrained, dynamic environments where they must navigate around humans, devices, furniture, and other obstacles while maintaining social appropriateness [
52]. The interaction design must accommodate diverse user groups with varying technical expertise, from healthcare professionals to elderly residents [
53] as depicted in
Table 2.
3.1. Cross-Domain HRI Challenges
Regardless of the environment, several challenges are shared across domains:
Safe Navigation in Human-Populated Spaces: Robots must operate in constrained, dynamic environments with humans, furniture, and moving obstacles. They need precise collision avoidance while maintaining efficiency and must adapt to unpredictable human behavior [
34,
54]. Structured environments, like laboratories, emphasize exact path following, whereas homes or healthcare settings require dynamic adaptation to moving people and clutter [
55,
56].
Social Appropriateness and Intent Communication: Across domains, robots must behave in socially intelligible ways. Mechanisms include visual signaling, auditory cues, gesture-based communication, and projected indicators of movement or intent [
57,
58]. Such communication increases the predictability and legibility of robot motion, thereby enhancing human–robot trust, whether in office corridors, hospital hallways, or shared home spaces [
59,
60].
Heterogeneous User Groups: Indoor robots interact with highly trained professionals, administrative staff, elderly residents, children, or casual visitors [
6,
61]. Each group has different expectations, technical literacy, and cognitive loads [
62]. The interaction design must accommodate these differences, providing clear, context-appropriate feedback without overloading users, accounting for varying levels of psychosocial functioning [
63].
Infrastructure Integration: Robots must interface with existing buildings and workflow infrastructure, including elevators, automatic doors, IT networks, and laboratory management systems [
64,
65]. Reliable integration ensures smooth operation and minimal disruption, often requiring standardized “plug and play” frameworks for seamless deployment [
19,
66].
Trust, Privacy, and Ethical Considerations: Especially in healthcare and residential settings, robots often process sensitive data [
67]. The cross-domain HRI design must safeguard privacy, implement consent mechanisms, and maintain transparency to ensure user trust and avoid the ethical pitfalls related to surveillance or autonomy [
35,
68,
69].
3.2. Domain-Specific Requirements
While indoor mobile robots share certain cross-domain challenges such as safe navigation, clear communication of intent, infrastructure integration, and accommodation of heterogeneous user groups, the relative importance and technical implementation of these challenges vary significantly by domain. Each environment imposes unique operational constraints, interaction priorities, and user expectations, which shape both the design of the robot and the nature of Human–Robot Interaction.
3.2.1. Healthcare and Elderly Care
Education and healthcare are particularly fertile application areas for domain-specific Human–Robot Interaction [
61,
70]. Healthcare facilities represent one of the most demanding indoor environments due to sterility requirements, strict patient privacy constraints, time-critical workflows, and the emotional sensitivity of users. Robots such as Moxi (Diligent Robotics) assist nursing staff with routine tasks, including delivering supplies or transporting medications [
71]. These assistance tasks involve more direct support to users and often require an understanding of individual user preferences, routines, and intentions. Such applications rely on advanced interaction capabilities, including speech recognition, gesture interpretation, contextual understanding, and long-term user modeling to personalize the robot’s behavior and responses over time.
Common applications in healthcare settings include medication reminders for elderly patients, health monitoring and data collection, and educational activities in classrooms and laboratories [
72]. Robots in these environments must navigate precisely around medical equipment, patient beds, and healthcare staff, while strictly maintaining sterility and safety protocols [
73]. Platforms such as “Marvin”, an omni-directional assistant for domestic environments, are designed to support elderly monitoring and remote presence [
74]. Additionally, robots must recognize and respond appropriately to human presence, avoid disrupting workflows, and always ensure patient privacy.
In residential care, robots increasingly provide emotional support and assistance with activities of daily living for elderly individuals [
22]. The deployment of heterogeneous mobile robot fleets in hospitals, such as Tartu University Hospital, demonstrates the practical impact of these systems: robots perform time-critical object transportation tasks, moving samples from intensive care units (ICUs) to hospital laboratories through crowded and narrow hallways [
75]. The development of usable autonomous mobile robots requires careful consideration of user needs, environmental constraints, and task-specific requirements [
21]. Social robots in hospitals play roles ranging from patient companionship to healthcare delivery. They generally yield high user satisfaction when equipped with multimodal communication and personalized behaviors that mitigate occasional user fear or frustration [
76,
77].
3.2.2. Laboratory Environments
Laboratory environments introduce considerable complexity due to the presence of expensive and sensitive equipment, hazardous chemicals, and strict procedural requirements [
3]. Robots operating in laboratories must handle delicate instruments accurately, maintain contamination control, navigate safely around staff and equipment, and adapt to dynamic experimental setups [
64]. Integration with laboratory management systems, adherence to precise spatial paths, and responsiveness to dynamic workspace changes are essential to ensure both safety and workflow efficiency.
Multi-floor labware transportation systems such as the MOLAR Automated Guided Vehicles (AGVs) developed at the Center for Life Science Automation (CELISCA) execute complex workflows, moving labware and materials between workbenches located on different floors [
78,
79]. These robots are integrated with laboratory infrastructure, including automatic doors and elevator operation [
64,
78,
79]. Research also addresses methods to correct unexpected localization errors to maintain operational safety [
80]. Similarly, the H2O robot at CELISCA uses StarGazer sensors to navigate via ceiling landmarks and employs hybrid elevator-controlling strategies that combine robot arm manipulation and wireless control [
65,
79]. For both systems, social navigation and interaction with humans are critical HRI components.
The mobile robot Kevin was specifically designed to handle the transportation of labware, relieving highly trained laboratory staff from logistical duties [
81]. With a non-intimidating height (100–160 cm) and organic shapes, Kevin uses a multimodal interface comprising lights, speakers, and a tablet to communicate its status and movement intentions to non-technical personnel. User studies indicate that a “medium” communication level that provides targeted, concise feedback is preferred over continuous signaling to avoid information overload [
81].
Clinical specimen delivery systems such as the Proxie robot, a mobile collaborative robot (cobot) piloted at Mayo Clinic Laboratories, autonomously move existing carts containing laboratory specimens between pathology stations. This significantly reduces staff effort while maintaining trust through a stable architecture and subtle visual cues, such as expressive “eyes,” which help users feel confident working alongside it. Its “Scout Sense” captures the environment from a human-like eye level to ensure situational awareness, while “Glide 360” mobility allows it to move naturally and intuitively around people. The robot prioritizes safe interaction and uses adaptive AI to learn and harmonize with human workflows, aiming to assist rather than replace staff [
82].
Frameworks such as LAPP (Laboratory Automation Plug & Play) further extend flexibility in pharmaceutical and general laboratory automation by enabling mobile manipulators with vision systems to “learn” device poses. This approach aims to simplify the integration of devices from different vendors into end-to-end automation systems [
66].
In addition, sophisticated HRI and safety mechanisms are implemented in shared spaces. Mobile systems employ multi-layer smart collision avoidance using Kinect sensors for the real-time recognition of dynamic human face orientation (classified using LVQ neural networks) or specific hand gesture commands (classified using SVM) to receive direct navigational guidance from personnel in narrow zones [
65]. Methods to communicate robot intent include projected visual signals, LEDs, speakers, and wearable haptic devices, which improve both predictability and user trust [
59].
Research Platforms like the Pioneer 3-DX and TIAGo mobile robots are widely used to study human safety perception and trust [
38]. TIAGo, with a semi-humanoid form, an adjustable height, movable head, and multimodal interaction capabilities, supports natural communication via speech, gesture recognition, touch, and emotion detection, allowing safe collaboration in research and healthcare [
83].
Overall, indoor mobile robots must balance operational precision, procedural compliance, infrastructure integration, and safe social interaction. Each application, from labware transport to specimen delivery, imposes unique technical and social requirements. Successful deployment depends on integrating perception, socially intelligent navigation, and natural Human–Robot Interaction to meet both safety and user acceptance standards.
3.2.3. Office Environments
Office environments are socially structured but less safety-critical than healthcare or laboratory settings. Robots in offices must navigate shared workspaces while respecting professional etiquette, meeting norms, hierarchical structures, and team dynamics. Typical tasks include document delivery, providing information services, visitor guidance, and telepresence. The interaction design must minimize disruption and maintain low-friction communication, allowing users to focus on work rather than managing the robot [
84].
To assist individuals in office spaces, several research efforts have demonstrated effective HRI solutions. Iida and Abdulali proposed a telepresence robot implementing the DEtect, TRAck and FOllow (DETRAFO) algorithm [
28] to enable the intuitive tracking and following of users. Ngo and Nguyen developed a cost-effective, on-device natural language command navigation system for mobile robots in indoor environments such as offices, enabling users to communicate goals efficiently and effectively via natural commands [
85]. Additionally, Balcı and Poncelet introduced movable robotic furniture in shared office spaces to modulate human–human interaction, avoid distraction, and make spontaneous interaction more meaningful, thereby improving overall workflow efficiency [
86,
87].
3.2.4. Industrial Settings
Industrial indoor environments focus on productivity enhancement, repetitive task execution, and safe human–robot collaboration. Cao and Tam proposed a search and fetch operation for mobile manipulation robots that leverages multimodal Human–Robot Interaction via gesture, voice, and face recognition. This can address the persistent labor shortage in roles that require complex, repetitive tasks in indoor environments such as manufacturing factories [
12]. Colceriu and Theis examined human-centric GUI designs for mobile cobots to increase their potential for assembly work in industry settings [
88]. Huy and Vietcheslav further developed a novel interface framework for Human–Robot Interaction in industry, employing a laser-writer in combination with a see-through head-mounted display using augmented reality and spatial augmented reality to securely exchange information. They also introduced a novel handheld device enabling multiple input modalities, allowing users to interact with mobile robots efficiently [
89].
These approaches aim to enhance productivity and safety in harsh or hazardous environments where robots can take over risky jobs [
90]. HRI thus plays a central role in improving operational efficiency while maintaining safety in complex indoor construction sites. In such settings, robots work alongside human teams, supporting physically demanding operations, facilitating task communication, and adhering to strict safety protocols [
91]. Compared to healthcare or laboratory environments, emotional intelligence and social or companion functions are largely unnecessary. The focus is instead on task performance, reliability, and predictable interaction.
3.2.5. Residential Homes
Residential environments introduce substantial social and behavioral challenges. Robots must navigate shared spaces occupied by multiple individuals with varying routines, preferences, and technical skills [
92]. They are required to interpret informal social cues, avoid interfering with family interactions, and respect privacy, including restricted access to certain rooms and protection of personal data. In addition to assistance tasks, residential robots frequently perform companion roles, providing emotional support, engagement through conversation or play, and help with daily routines, particularly for elderly or disabled residents [
93].
Monitoring and surveillance tasks are also common in these settings, including autonomous patrolling, security enforcement, and environmental monitoring [
94]. While these applications are socially interactive and assistive [
57,
95,
96], they raise critical privacy and ethical considerations, as robots often collect sensitive data about people and spaces. Designers must carefully implement robust data protection, consent mechanisms, and transparent reporting to ensure that the robot’s presence does not infringe on individual rights or create distrust. Social and companion roles are increasingly relevant in residential care, home environments, and facilities supporting vulnerable populations [93]. These applications demand the most sophisticated HRI capabilities, including emotion recognition, adaptive personalization, and the ability to build long-term relationships with users. Robots must interpret subtle social cues, respond appropriately to changing moods, and foster trust over repeated interactions, creating a sense of presence and companionship that extends beyond functional task performance. Examples of residential robots include the Astro platform from Amazon, which supports convenience, security monitoring, and remote care for relatives at home [97]. Other robots, such as ZERITH H1 and Loki (Loki Robotics), are used for housekeeping and toilet cleaning in homes, hotels, and offices [98,99]. Unlike laboratory or industrial robots, residential robots prioritize social acceptance, privacy preservation, adaptability to unstructured environments, and long-term relational interaction.
3.3. Synthesis
Although indoor mobile robots share foundational capabilities such as navigation, perception, and infrastructure integration, the nature and complexity of Human–Robot Interaction (HRI) differ substantially across domains. These differences arise from variations in environmental structure, user expectations, task criticality, and social context.
In industrial settings, HRI is primarily task-oriented and performance-driven. Interaction focuses on clear command input, predictable system responses, and compliance with safety protocols. Multimodal interfaces (e.g., gesture, voice, GUI, augmented reality) are designed to increase efficiency and reduce cognitive load during repetitive or hazardous operations. Social expressiveness and emotional intelligence are largely unnecessary; instead, transparency, reliability, and unambiguous intent communication are central. The human operator remains goal-directed, and interaction serves operational optimization.
Similarly, laboratory environments require highly structured and controlled interaction. Here, HRI must support precision, traceability, and procedural compliance. Communication is typically concise and purpose-specific, avoiding distraction in cognitively demanding research settings. Robots must signal movement intentions clearly to ensure safety in confined spaces, but excessive social signaling can reduce usability. Trust is established through accuracy, predictability, and seamless integration into laboratory workflows rather than through expressive or companion-like behavior. Thus, HRI in laboratories is functional, minimally intrusive, and tightly coupled to workflow reliability.
In healthcare environments, HRI becomes significantly more complex. Robots interact not only with trained professionals but also with patients, elderly individuals, and visitors. Consequently, interaction must combine clinical precision with social sensitivity. Speech recognition, gesture interpretation, contextual awareness, and user modeling are required to personalize assistance and accommodate varying cognitive and physical abilities. Emotional sensitivity, privacy protection, and trust-building are critical. Unlike in laboratories or industry, interaction failures may directly affect well-being or patient confidence. HRI must therefore be adaptive, transparent, and ethically grounded.
Office environments occupy an intermediate position. While less safety-critical, they are socially structured and norm-sensitive. HRI must be low-friction, socially compliant, and minimally disruptive. Robots should respect proxemics, meeting etiquette, and hierarchical dynamics. Interaction is often informational (e.g., navigation guidance, telepresence, task delivery) and must integrate seamlessly into daily routines. Compared to healthcare, emotional engagement is less central; compared to laboratories, social appropriateness carries greater weight than strict procedural precision.
The most demanding HRI requirements emerge in residential environments. Homes are socially dynamic, informal, and privacy-sensitive spaces with heterogeneous users, including children, elderly individuals, and guests. Robots must interpret subtle social cues, adapt to changing routines, and avoid interfering with family interactions. In companion or assistive roles, HRI must support emotion recognition, adaptive personalization, and long-term relationship building. Trust formation, consent management, and data transparency are not peripheral concerns but central design constraints. Unlike industrial or laboratory systems, residential robots are evaluated as much on relational quality and perceived presence as on task performance.
Across domains, several cross-cutting HRI dimensions can be identified:
Intent Communication: Required everywhere, but ranging from purely functional signaling (industry, labs) to socially expressive behavior (homes, healthcare).
Adaptivity: Minimal in highly structured environments; essential in healthcare and residential settings.
User Modeling: Optional in industrial contexts; critical in long-term residential or elderly care scenarios.
Emotional Intelligence: Marginal in productivity-driven domains; central in companion-oriented applications.
Trust Formation: Performance-based trust dominates in laboratories and industry, while relational and privacy-based trust becomes decisive in homes and healthcare.
In summary, indoor mobile robotics does not present a uniform HRI problem. Instead, each domain shifts the balance between efficiency, safety, social intelligence, emotional responsiveness, and ethical safeguards. Successful HRI design therefore requires domain-specific prioritization layered upon a shared technical foundation (
Table 3 and
Table 4).
4. The Technical Aspect of HRI
Mobile robots in indoor environments typically perform tasks through three major steps: perception, navigation, and interaction. Among these, the Human–Robot Interaction (HRI) component is central when a robot executes tasks on behalf of or in collaboration with humans. Effective HRI ensures safety, comfort, and efficient task execution, forming the core of human-centered robot behavior (
Figure 4).
Before explicit interaction begins, implicit cues during navigation already constitute a form of interaction. For instance, human-aware navigation communicates the robot’s intent, enhancing coordination in shared spaces and providing humans with a sense of safety [
34,
54,
60,
100]. Similarly, interaction modalities such as speech, gesture, touch, and visual cues shape how humans understand and guide the robot’s actions.
While many studies have catalogued individual methods and interaction techniques, a structured, analytical synthesis of these approaches is still lacking. Current research often focuses on isolated methods without systematically comparing their robustness, computational requirements, user cognitive load, or adaptability to real-world indoor environments. This gap motivates the present chapter, which aims not only to summarize existing work but also to critically evaluate methods, highlight emerging trends, and identify open challenges and research gaps.
In the following sections, the chapter is organized to support this analytical perspective:
Human-aware navigation (
Section 4.1)—methods for safe, legible, and socially compliant navigation, including a comparison of classical and learning-based approaches.
Interaction modalities (
Section 4.2)—detailed treatment of speech, gesture, touch, visual/gaze, and multimodal systems, with structured comparison tables, evaluation of performance trade-offs, and integration of recent advances such as LLMs and VLMs.
Cross-cutting synthesis (
Section 4.3)—an overview of trends, methodological patterns, and critical limitations across modalities.
By explicitly combining descriptive and analytical perspectives, this chapter aims to move beyond a simple catalog of HRI techniques and provide a critical, structured review of the current state of the art in indoor mobile robotics.
4.1. Human-Aware Navigation
Before explicit interaction begins, implicit interaction already occurs through the navigational behavior of the robot. Human-aware navigation aims to give humans a perception of safety, communicate the robot’s intent, and facilitate coordination in shared workspaces [
34,
53,
60,
100]. Design principles include expressive movement, legibility, comfort, and adherence to social norms in human-populated environments [
34].
Table 5 represents four approaches to social navigation.
The human-centric nature of perception and navigation in indoor mobile robots can be seen in
Figure 5.
While all four approaches enable human-aware navigation, they differ in robustness, adaptability, computational requirements, and real-world applicability. Reactive approaches are simple and fast but limited in dynamic environments, whereas predictive and model-based approaches improve coordination and social compliance at the cost of higher computational effort. Learning-based methods provide adaptive and socially compliant behaviors but require extensive training and sensor integration. Understanding these trade-offs is crucial for selecting navigation strategies tailored to specific indoor environments and user populations.
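To make the model-based category concrete, the sketch below shows how a planner can penalize grid cells near detected people with an asymmetric Gaussian personal-space cost, a common proxemics-inspired formulation; all parameter values and data structures here are illustrative assumptions rather than settings from any cited system.
```python
import numpy as np

def social_cost(px, py, person, sigma_front=1.2, sigma_side=0.6, sigma_back=0.8):
    """Asymmetric Gaussian proxemic cost around a detected person.
    `person` holds a position (x, y) and heading theta in radians; the cost
    decays more slowly in front of the person, discouraging the robot from
    cutting across their path. All sigmas (metres) are illustrative."""
    dx, dy = px - person["x"], py - person["y"]
    c, s = np.cos(person["theta"]), np.sin(person["theta"])
    fx = c * dx + s * dy           # longitudinal offset in the person's frame
    fy = -s * dx + c * dy          # lateral offset
    sigma_x = sigma_front if fx >= 0.0 else sigma_back
    return float(np.exp(-(fx**2 / (2 * sigma_x**2) + fy**2 / (2 * sigma_side**2))))

def cell_cost(px, py, people, base_cost=0.0, weight=100.0):
    """Total planner cost for one grid cell: static cost plus social penalties."""
    return base_cost + weight * sum(social_cost(px, py, p) for p in people)
```
A planner that minimizes this cost naturally passes behind people rather than in front of them, which is one way legibility and comfort constraints enter the navigation stack.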
Lasota et al. survey safety strategies for close-proximity collaboration [
54], and these principles are integrated into collision avoidance and navigation systems. One of the primary limitations is space: indoor environments often feature narrow corridors, cluttered rooms, and dynamic obstacles, requiring compact, agile robots with highly precise navigation that preserve the safety and trust of their human counterparts without becoming obstacles in their path [
55]. Robots must maneuver around furniture, equipment, and humans safely and without disruption. Power management represents another critical challenge [
101]. Many robots, especially in healthcare or laboratory settings, must operate continuously for extended periods, often 8–12 h, without frequent charging [
102]. This necessitates efficient energy consumption, optimized motion planning, and, in some cases, the ability to autonomously dock and recharge. Advanced power systems, lightweight materials, and energy-efficient components are therefore essential design considerations. Human tracking and following is another key capability of assistive robots. Multisensor-based human detection and tracking systems combine data from cameras, depth sensors, and other modalities to achieve robust performance in crowded environments [
103]. Intelligent mobility-assistance robots leverage multimodal sensory processing to provide safe and effective support for users with mobility impairments [
104]. RGB-D sensor-based real-time people detection and mapping systems enable mobile robots to maintain awareness of human positions and movements [
105], helping overcome the limitations of individual modalities and providing redundancy in case of sensor failures. A typical human tracking and following system is presented in
Figure 6.
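As a complement to Figure 6, the following minimal controller sketch shows the last stage of such a pipeline: once the tracker supplies the person’s distance and bearing, simple proportional terms with saturation keep the robot at a comfortable following distance. Gains, limits, and the safety radius are illustrative assumptions, not values from a cited system.
```python
def follow_command(distance_m, bearing_rad, target_dist=1.2,
                   k_lin=0.6, k_ang=1.5, v_max=0.7, w_max=1.0):
    """Proportional person-following: close the range error while turning to
    keep the person centred. Returns (linear m/s, angular rad/s) commands."""
    v = max(-v_max, min(v_max, k_lin * (distance_m - target_dist)))
    w = max(-w_max, min(w_max, k_ang * bearing_rad))
    if distance_m < 0.6:       # never advance inside the personal-space radius
        v = min(v, 0.0)
    return v, w
```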
Simultaneous Localization and Mapping (SLAM) is essential for indoor navigation. Robots must accurately map unknown or changing environments while simultaneously tracking their position in real-time. SLAM systems must remain robust in dynamic spaces with moving obstacles, variable lighting, and crowded conditions [
73,
106]. High-precision mapping supports both task execution and safe interaction with humans. Modern indoor robots employ a variety of sensors, including LiDAR, RGB-D cameras, ultrasonic sensors, and inertial measurement units (IMUs), which must operate reliably across diverse indoor conditions [
107]. Collaborative mapping approaches enable multiple robots to work together in constructing environmental representations, improving coverage and efficiency [
Advanced scan matching techniques exploit dominant directional features to improve localization accuracy [
109], while integrating Ultra-Wideband (UWB) technology with traditional SLAM systems further enhances indoor positioning accuracy and human avoidance [
110].
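The idea behind UWB-aided localization can be illustrated with a deliberately simplified blend of drifting odometry and absolute UWB fixes; production systems would use an EKF or pose-graph optimization instead, and the blending weight below is an arbitrary assumption.
```python
def fuse_position(odom_xy, uwb_xy, alpha=0.9):
    """Complementary blend: trust smooth odometry short-term while letting
    absolute UWB fixes slowly cancel accumulated drift."""
    return tuple(alpha * o + (1.0 - alpha) * u for o, u in zip(odom_xy, uwb_xy))

# e.g. fuse_position((4.02, 1.37), (4.20, 1.30)) -> (4.038, 1.363)
```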
The evolution from reactive to learning-based navigation reflects a broader trend toward adaptive, socially aware, and context-sensitive navigation. Multisensor fusion, predictive path planning, and collaborative mapping are increasingly standard. However, open challenges remain:
Robust operation in extremely dense or highly dynamic indoor spaces.
Trade-offs among computational cost, real-time performance, and social compliance.
Long-duration operation under constrained energy resources.
Seamless integration with multimodal interaction modalities to ensure cohesive HRI.
Adaptability to diverse user populations with varying mobility and cognitive abilities.
Addressing these gaps is critical for the next generation of indoor mobile robots capable of safe, efficient, and socially compliant operation in real-world human environments.
4.2. Interaction Modalities
The effectiveness of indoor mobile robots depends critically on the design and implementation of appropriate interaction modalities, as these determine how intuitively, efficiently, and safely humans can communicate with and control robotic systems. Unlike traditional human–computer interfaces, which often rely on static screens or input devices, HRI in mobile robotics must account for the dynamic and content-rich nature of human-centered environments. This includes managing spatial relationships, supporting natural and multimodal communication, adapting to human mobility and movement patterns, and responding to real-time environmental changes that affect both robot behavior and human expectations [
111].
Figure 7 depicts the explicit and implicit channels of interaction typically used in indoor mobile robotics. These channels span speech, gesture, touch, visual/gaze, and multimodal approaches, forming the basis for safe, effective, and socially aware interactions.
Domain-specific studies provide valuable insights into user expectations and interaction requirements. Reviews of social robots in classrooms highlight evidence for effective engagement and learning outcomes [
61], while research in autism-related therapy demonstrates the potential of robots to support specialized interventions [
70]. Similarly, studies of service robots in home environments reveal how everyday adaptations, user preferences, and environmental constraints shape interaction design [
4]. These findings emphasize that no single interaction modality suffices across all contexts. Instead, interaction systems must be selected and adapted based on the task, user population, and environmental characteristics. The following
Section 4.2.1,
Section 4.2.2,
Section 4.2.3,
Section 4.2.4 and
Section 4.2.5 provide a detailed examination of each major modality—speech, gesture, touch, visual/gaze, and multimodal systems—highlighting their technical characteristics, comparative strengths, current trends, and remaining limitations. By systematically evaluating each modality, we aim to provide a critical and structured synthesis rather than a simple catalog of existing methods.
4.2.1. Speech-Based Interaction
Speech-based interaction represents one of the most natural communication modalities for humans and has been extensively studied in indoor mobile robotics [
112]. Voice interfaces enable hands-free operation, which is particularly valuable in healthcare and laboratory environments where users’ hands may be occupied with other tasks [
113].
Automatic Speech Recognition (ASR) technology has matured to a level suitable for practical deployment, with modern systems achieving high accuracy even in noisy environments [
114]. Cloud-based ASR systems can further improve performance but raise privacy concerns in sensitive settings, leading to increased interest in on-device processing [
115].
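As an illustration of such on-device processing, the sketch below streams microphone audio into an offline recognizer (here the open-source Vosk library) so that no audio leaves the robot; the model path is a placeholder for whichever local model is installed.
```python
import json
import queue
import sounddevice as sd                   # microphone capture
from vosk import Model, KaldiRecognizer    # offline ASR: audio stays on-device

audio_q: "queue.Queue[bytes]" = queue.Queue()

def _capture(indata, frames, time_info, status):
    audio_q.put(bytes(indata))             # raw 16 kHz, 16-bit PCM chunks

model = Model("vosk-model-small-en-us")    # placeholder path to a local model
recognizer = KaldiRecognizer(model, 16000)

with sd.RawInputStream(samplerate=16000, blocksize=4000,
                       dtype="int16", channels=1, callback=_capture):
    while True:
        if recognizer.AcceptWaveform(audio_q.get()):   # utterance finished
            text = json.loads(recognizer.Result()).get("text", "")
            if text:
                print("heard:", text)      # hand off to the NLP/intent stage
```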
Figure 8 illustrates the process of a typical speech-based interaction with distinct components such as ASR, NLP, and TTS.
Natural Language Processing (NLP) allows robots to understand complex commands and engage in contextual conversations [
116]. Large Language Models (LLMs) are increasingly integrated into robotic systems, enabling sophisticated dialogue management and intent understanding [
117]. Cognitive instruction interfaces leverage natural language understanding to provide intuitive robot navigation commands [
118], while recent advances in grounding implicit goal descriptions allow robots to interpret ambiguous spatial references through recursive belief updates [
119]. Overall, LLM integration is emerging as a powerful approach to enhance the comprehension of complex verbal instructions [
120].
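A minimal sketch of LLM-based intent grounding is given below: the utterance is embedded in a constrained prompt, and the reply is validated against a whitelist of actions before anything is executed. The `llm_complete` callable is a placeholder for any local or cloud completion endpoint, not a specific vendor API.
```python
import json

INTENT_PROMPT = """You are the command interpreter of an indoor mobile robot.
Map the user's utterance to JSON of the form {"action": ..., "target": ...}.
Allowed actions: goto, follow, fetch, stop. Respond with JSON only.
Utterance: "{utterance}" """

ALLOWED = {"goto", "follow", "fetch", "stop"}

def parse_command(utterance: str, llm_complete) -> dict:
    """`llm_complete` is any text-completion callable (placeholder)."""
    raw = llm_complete(INTENT_PROMPT.replace("{utterance}", utterance))
    try:
        intent = json.loads(raw)
    except json.JSONDecodeError:
        return {"action": "clarify"}       # unparseable reply: ask the user
    if intent.get("action") not in ALLOWED:
        return {"action": "clarify"}       # never execute unvalidated actions
    return intent
```
Constraining the output format and validating it before execution is one simple way to keep a probabilistic language model from issuing commands the platform cannot safely perform.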
The third component of speech interaction is text-to-speech (TTS). Recent developments in speech synthesis allow robots to provide natural-sounding feedback and convey emotional states [
121]. Paralinguistic features such as tone, pace, and volume can communicate robot intentions and emotions, enhancing the overall user experience [
122]. For secure interactions, voice biometric authentication ensures that only recognized personnel can control the robot, even when multiple users are present (
Figure 9).
Despite these advances, speech-based interaction in indoor environments faces several technical and practical challenges. Ambient noise, multiple speakers, and acoustic reverberation can degrade recognition performance [
123]. Cultural and linguistic diversity requires multilingual support and accent adaptation [
124]. Privacy concerns are particularly critical when voice data is processed or stored in healthcare settings [
125].
Table 6 summarizes the main approaches to speech-based interaction, including classical keyword/grammar-based methods and their key performance characteristics, evaluation environments, metrics, and identified limitations. This structured comparison highlights the trade-offs and gaps in existing methods, motivating the integration of LLM and VLM approaches discussed below.
Recent advances in Large Language Models (LLMs) and vision-language models (VLMs) are fundamentally transforming Human–Robot Interaction. Unlike traditional command-based speech recognition, LLMs enable context-aware reasoning, natural language understanding, and adaptive dialogue, allowing robots to interpret complex user intents and interact in more flexible, human-like ways. Compared to classical interaction modalities such as command-based speech, gesture, or gaze, LLM-driven approaches provide enhanced adaptability, a richer semantic understanding, and the ability to integrate multimodal input streams. This paradigm shift allows indoor mobile robots to operate more autonomously in dynamic environments while improving the naturalness of human–robot collaboration.
Despite these capabilities, deploying LLMs on mobile robots introduces several practical challenges. Onboard inference can be limited by hardware constraints, whereas cloud-based inference introduces network latency and potential reliability concerns. Safety-critical tasks require a careful evaluation of decision timing, error propagation, and fallback mechanisms to ensure robust interaction in real-world environments.
To systematically analyze these methods, we evaluate LLM- and VLM-based approaches using the same criteria applied to classical modalities, including computational cost, robustness to environmental noise, user cognitive load, and hardware requirements. This comparative perspective highlights the trade-offs among performance, deployability, and safety and informs the selection of appropriate speech-based interaction strategies in indoor mobile robotics.
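One way to realize the fallback mechanisms discussed above is sketched below: safety words are handled by an on-device grammar before any network call, the cloud model is given a fixed latency budget, and errors or timeouts degrade to local parsing. `cloud_llm` and `local_grammar` are placeholder interfaces, not components of any cited system.
```python
import concurrent.futures as cf

_pool = cf.ThreadPoolExecutor(max_workers=1)

def interpret(utterance, cloud_llm, local_grammar, timeout_s=1.5):
    """Cloud-first intent parsing with an on-device fallback."""
    if local_grammar.matches_safety_word(utterance):   # e.g. "stop", "halt"
        return {"action": "stop"}                      # never wait on the network
    future = _pool.submit(cloud_llm, utterance)
    try:
        return future.result(timeout=timeout_s)        # stay within latency budget
    except Exception:                                  # timeout, network, API error
        future.cancel()
        return local_grammar.parse(utterance)          # degraded but reliable
```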
Table 7 summarizes the main LLM- and VLM-based approaches to speech HRI, highlighting their evaluation settings, metrics, strengths, and limitations. Compared to classical methods (
Table 6), these approaches offer a richer context understanding, multimodal integration, and greater flexibility, but require the careful consideration of computational cost, latency, and safety-critical constraints.
For a quick comparative overview,
Table 8 summarizes key differences across computational costs, robustness, user load, hardware requirements, evaluation metrics, strengths, and limitations, highlighting the evolution from traditional to modern methods.
This comparative overview highlights the trade-offs between classical and modern LLM/VLM-based speech interaction methods, emphasizing where traditional approaches fall short and motivating the integration of context-aware, multimodal models to improve robustness, flexibility, and user experience in indoor mobile HRI.
4.2.2. Gesture-Based Interaction
Gesture-based interaction provides an intuitive and natural communication channel that can complement or substitute for speech in various scenarios [
133]. Hand and arm gestures can convey spatial information, directional commands, and social signals that are difficult to express verbally [
134].
Vision-based gesture recognition systems primarily rely on RGB cameras, depth sensors, or combinations thereof to capture and interpret human movements [
135]. These systems must operate in real-time while remaining robust to variations in lighting, occlusions, user differences, and environmental clutter [
136]. A typical gesture-recognition pipeline combines spatial understanding (e.g., MediaPipe Holistic [
137]) with temporal modeling via recurrent networks such as LSTMs to interpret dynamic gestures (
Figure 10).
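A minimal version of this pipeline is sketched below: MediaPipe Holistic extracts hand landmarks per frame, and a small LSTM classifies a sliding window of frames into gesture classes. The feature layout, window length, and class count are illustrative assumptions, and the network would need to be trained on a labeled gesture dataset.
```python
import numpy as np
import torch
import torch.nn as nn
import mediapipe as mp

mp_holistic = mp.solutions.holistic.Holistic(min_detection_confidence=0.5)

def frame_features(rgb_frame):
    """Flatten the 21 right-hand landmarks (x, y, z) from MediaPipe Holistic;
    returns zeros when no hand is visible in the frame."""
    result = mp_holistic.process(rgb_frame)       # expects an RGB numpy image
    if result.right_hand_landmarks is None:
        return np.zeros(63, dtype=np.float32)
    pts = result.right_hand_landmarks.landmark
    return np.array([[p.x, p.y, p.z] for p in pts], dtype=np.float32).flatten()

class GestureLSTM(nn.Module):
    """Temporal classifier over a sliding window of landmark frames."""
    def __init__(self, n_gestures=5, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=63, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_gestures)

    def forward(self, x):                  # x: (batch, frames, 63)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])       # gesture logits from the last step

# At runtime: stack e.g. 30 consecutive frame_features into a (1, 30, 63)
# tensor and take the argmax of GestureLSTM()(window) after training.
```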
Simple gesture vocabularies have proven effective for basic robot control, including pointing gestures for navigation, hand signals for stop/start commands, and waving to attract attention [
138]. More complex gestures can convey emotional states, social intentions, and task-specific instructions [
139].
Spatial gesture recognition allows users to indicate locations, directions, and objects within the environment [
140]. Over time, gesture-based HRI methods have evolved from template- or rule-based systems toward machine learning and deep learning approaches. Machine learning-based systems enable robots to recognize subtle gestural cues and adapt to user-specific behaviors, supporting tasks such as mobility assistance [
141]. Deep learning methods, including LSTM or Transformer models, further enhance recognition accuracy and temporal understanding, enabling real-time gesture control in dynamic environments [
142]. Collaborative object handling between robots and human operators also benefits from sophisticated gesture recognition and motion prediction [
143], which is particularly valuable for mobile robots navigating to specific locations or interacting with environmental objects. Overall, the field shows a clear trend from fixed vocabularies to adaptive, learning-based systems, capable of interpreting complex gestures, social cues, and emotional states.
Table 9 provides a summary of gesture-based HRI methods, including suitable evaluation metrics and limitations.
Despite these advances, gesture-based interaction faces several challenges. Gesture ambiguity and cultural differences can lead to misinterpretation [
144], and environmental factors, such as lighting, background clutter, and camera positioning, strongly influence recognition reliability. Many systems require user training or adaptation to function robustly. Furthermore, computational cost and latency for real-time deep learning models can constrain deployment on mobile robots. Finally, integrating gesture recognition with other interaction modalities, such as speech or gaze, remains an open area of research to improve robustness, flexibility, and overall user experience.
4.2.3. Touch and Physical Interaction
Touch and physical interaction provide direct, immediate feedback channels that are particularly valuable for users with visual or auditory impairments [
145]. In contrast to speech or vision-based modalities, tactile interaction enables unambiguous, proximal control and often reduces the cognitive load, especially in safety-critical scenarios. Tactile interfaces on mobile robots include touchscreens, physical buttons, force-sensitive surfaces, and haptic feedback mechanisms.
Touchscreen interfaces offer familiar interaction paradigms from smartphones and tablets, enabling complex command input and information display [
146]. However, robot mobility introduces challenges. Screen visibility can be compromised during movement, accessibility may vary depending on the robot’s height and orientation, and hardware must withstand diverse indoor environments. Thus, while touchscreens support rich interaction, their usability is dependent on context.
The physical manipulation of robot components, such as guiding arm movement or adjusting robot positioning, enables direct spatial instruction [
147]. This modality is particularly useful for demonstrating desired behaviors, correcting robot actions, or guiding motion in shared workspaces. Compared to indirect modalities such as speech, physical interaction offers high precision and immediate feedback but requires close proximity and careful safety control.
Haptic feedback through vibration, force, or texture variation can provide the confirmation of user inputs and convey robot status information [
145]. Such feedback is especially important for users with visual impairments or in environments where visual attention must be directed elsewhere. In this sense, haptic feedback not only supports accessibility but also enhances situational awareness.
Safety considerations are paramount in the design of physical interactions. Force limitation, emergency stop mechanisms, and collision avoidance strategies must be implemented to prevent injury and ensure safe operation [
148]. The robot’s physical design must carefully balance interaction capabilities with user safety requirements. Authentication mechanisms, such as fingerprint identification, ensure that only authorized personnel can access certain touch-based functionalities.
Figure 11 represents a typical fingerprint-identification process.
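The force-limitation and emergency-stop logic mentioned above can be reduced to a watchdog pattern, sketched below with placeholder callables for the platform’s force sensor, e-stop switch, and motion interface; the threshold and loop rate are illustrative, not values taken from a safety standard.
```python
import time

FORCE_LIMIT_N = 25.0   # illustrative contact-force threshold, not a certified value

def safety_guard(read_force_n, estop_pressed, stop_motion, period_s=0.01):
    """Watchdog loop: command an immediate stop whenever the measured contact
    force exceeds the limit or the hardware e-stop is pressed. The three
    callables are placeholders for the platform's sensor and motion interfaces."""
    while True:
        if estop_pressed() or read_force_n() > FORCE_LIMIT_N:
            stop_motion()              # zero velocity / engage brakes
        time.sleep(period_s)           # ~100 Hz check rate, illustrative
```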
A clear trend can be observed from simple mechanical interfaces toward sensor-rich, force-aware systems that enable compliant and safe physical interaction. Modern systems increasingly integrate force-torque sensors, tactile skins, and adaptive control strategies to support shared control and collaborative behaviors. However, this evolution introduces trade-offs among the hardware complexity, safety certification requirements, and deployment cost.
Despite their intuitive nature, touch and physical interaction modalities face several limitations:
Requirement of physical proximity, limiting scalability and remote use.
Safety risks in case of control failure or excessive force.
Hardware wear and durability issues in high-frequency use scenarios.
Hygiene and contamination concerns in healthcare environments.
Limited expressiveness compared to speech or multimodal interaction.
Furthermore, the integration of physical interaction with other modalities (speech, gaze, gesture) remains an important research direction to enable seamless transitions between direct and indirect control paradigms.
Table 10 provides a summary of touch and physical interaction HRI methods, including evaluation metrics and limitations.
4.2.4. Visual and Gaze-Based Interaction
Visual interaction modalities leverage human visual perception and attention mechanisms to establish effective communication channels between humans and robots [
149]. Eye gaze patterns, facial expressions, and body language provide rich information about user intentions, emotional states, and situational context. These modalities complement speech and touch, enabling more nuanced and socially aware interaction.
Gaze-based interaction allows users to direct robot attention, indicate objects of interest, and provide spatial references [
150]. Eye-tracking technology integrated into mobile robots can detect gaze direction and duration, enabling attention-aware behaviors and improving task coordination in shared spaces.
Figure 12 depicts a typical gaze-based interaction pipeline.
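As one concrete step of such a pipeline, the sketch below resolves which known object a user is looking at by comparing the estimated gaze ray against object directions; the angular threshold is an illustrative assumption, and a deployed system would additionally require a dwell time before acting on the selection.
```python
import numpy as np

def gazed_object(gaze_origin, gaze_dir, objects, max_angle_deg=10.0):
    """Return the known object whose direction best matches the gaze ray,
    or None if nothing lies within the angular threshold. `objects` maps
    names to 3-D positions in the same frame as the gaze estimate."""
    g = np.asarray(gaze_dir, dtype=float)
    g /= np.linalg.norm(g)
    best, best_angle = None, max_angle_deg
    for name, pos in objects.items():
        v = np.asarray(pos, dtype=float) - np.asarray(gaze_origin, dtype=float)
        cos_a = np.clip(np.dot(g, v / np.linalg.norm(v)), -1.0, 1.0)
        angle = np.degrees(np.arccos(cos_a))
        if angle < best_angle:
            best, best_angle = name, angle
    return best

# e.g. gazed_object((0, 0, 1.6), (0.0, 1.0, -0.2),
#                   {"cup": (0.1, 2.0, 1.2), "door": (3.0, 4.0, 1.0)})
```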
Facial expression recognition enables robots to assess user emotional states and adapt their behavior accordingly [
151], which is particularly valuable in healthcare, social robotics, and assistive environments. Face recognition additionally provides a secure mechanism for user identification, as illustrated in
Figure 13.
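The front end of both expression recognition and face identification typically begins with face detection. The sketch below uses OpenCV's bundled Haar-cascade detector for this step; the expression classifier is deliberately left as a placeholder for whatever trained model a given system employs.

```python
import cv2

# OpenCV ships pre-trained Haar cascades; this one detects frontal faces.
face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def classify_expression(face_crop):
    """Placeholder: a real system would run a trained classifier here and
    return a label such as 'neutral', 'happy', or 'distressed'."""
    return "neutral"

def detect_expressions(frame_bgr):
    """Return [(bounding_box, expression_label), ...] for each detected face."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    results = []
    for (x, y, w, h) in faces:
        label = classify_expression(frame_bgr[y:y + h, x:x + w])
        results.append(((x, y, w, h), label))
    return results

# Usage, e.g.: detect_expressions(cv2.imread("frame.png"))
```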
Visual displays on robots, including LED patterns, screens, and expressive body language, can convey information, intentions, and emotional states to users [
58]. Moreover, robot positioning and orientation communicate social intentions and respect personal space, helping to maintain appropriate distances according to proxemic norms [
152]. Effective visual interaction thus integrates perception, expressive signaling, and socially aware navigation.
Despite these advantages, visual and gaze-based modalities face several challenges. Variations in lighting, occlusion, and user mobility can degrade recognition performance. Cultural differences in gestural or gaze meaning require adaptive interpretation. Privacy concerns are also significant when capturing or processing visual data, particularly facial information used for recognition.
Visual and gaze-based HRI methods can be grouped into several categories:
Gaze Tracking Systems—Eye trackers and camera-based systems for attention-aware behavior and spatial referencing [
150].
Facial Expression Recognition—Classifiers (traditional or deep learning-based) to infer user affective states and guide robot responses [
151].
Face Recognition and Biometric Access—Security and user identification using facial data [
58].
Visual Feedback Displays—LEDs, expressive screens, or robot motion to communicate information and robot intent [
58,
152].
A clear trend emerges toward deep learning-based and multimodal visual interpretation, combining gaze, facial expressions, and robot positioning to support context-aware behavior. Integrating these systems with other modalities (speech, gesture, touch) enhances robustness, adaptivity, and user experience.
Table 11 provides a comprehensive summary of visual and gaze-based HRI methods including their approaches, metrics, and limitations.
4.2.5. Multimodal and Adaptive Systems
Multimodal interaction systems combine multiple input and output modalities to create more robust and flexible communication channels [
153]. These systems can adapt to user preferences, environmental conditions, and task requirements by dynamically selecting appropriate interaction modalities. Multimodal human–robot interfaces that combine speech with other modalities have proven effective for remote robot operation [
154]. Sensor fusion techniques combine data from speech, vision, touch, and other sensors to improve recognition accuracy, robustness, and overall system reliability [
155].
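A simple and widely used instance of such fusion is confidence-weighted late fusion, in which each modality's recognizer votes for an intent and the votes are combined with per-modality reliability weights. The sketch below illustrates the idea; the weights and example hypotheses are assumptions, and in practice the weights are tuned or learned from validation data.

```python
from collections import defaultdict

# Static reliability priors per modality (illustrative values).
MODALITY_WEIGHTS = {"speech": 0.5, "gesture": 0.3, "gaze": 0.2}

def fuse_intents(hypotheses):
    """hypotheses: list of (modality, intent_label, confidence in [0, 1]).
    Returns the intent with the highest weighted confidence mass."""
    scores = defaultdict(float)
    for modality, intent, conf in hypotheses:
        scores[intent] += MODALITY_WEIGHTS.get(modality, 0.0) * conf
    return max(scores, key=scores.get) if scores else None

# Speech is noisy here, but gesture and gaze agree -> "fetch_cup" wins.
print(fuse_intents([
    ("speech", "stop", 0.40),
    ("gesture", "fetch_cup", 0.85),
    ("gaze", "fetch_cup", 0.90),
]))
```

Here, agreement between two moderately reliable modalities outweighs a single noisy speech hypothesis, which is exactly the robustness benefit fusion is intended to provide.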
Adaptive interaction systems learn user preferences and adjust their behavior over time [
87]. Machine learning algorithms optimize interaction strategies based on user feedback, task success rates, and environmental conditions. Context-aware systems consider environmental factors, user states, and task requirements when selecting interaction modalities [
156]. For example, a robot may switch from speech to visual display in noisy environments or use gesture recognition when users’ hands are free.
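Such context-aware switching can be expressed, in its simplest form, as a rule-based policy over a small context state. The sketch below is a minimal illustration of that pattern; the noise threshold and context fields are assumptions, and learned selection policies are an increasingly common alternative.

```python
from dataclasses import dataclass

@dataclass
class InteractionContext:
    ambient_noise_db: float
    user_hands_free: bool
    user_in_view: bool

def select_output_modality(ctx: InteractionContext) -> str:
    """Pick how the robot communicates, given the current context."""
    if ctx.ambient_noise_db > 70.0 and ctx.user_in_view:
        return "visual_display"      # speech would be drowned out
    return "speech"

def select_input_modality(ctx: InteractionContext) -> str:
    """Pick how the robot listens for commands."""
    if ctx.ambient_noise_db > 70.0 and ctx.user_hands_free:
        return "gesture"             # ASR unreliable, but the user's hands are available
    return "speech"

ctx = InteractionContext(ambient_noise_db=78.0, user_hands_free=True, user_in_view=True)
print(select_output_modality(ctx), select_input_modality(ctx))  # visual_display gesture
```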
The integration of multiple modalities requires sophisticated fusion algorithms and decision-making frameworks [
157]. Temporal synchronization, conflict resolution, and priority management are critical considerations in designing robust multimodal HRI systems. Despite these advances, challenges remain in latency, computational cost, sensor calibration, and safety-critical decision making, especially in real-world, dynamic environments.
Table 12 provides a comprehensive comparison of interaction modalities across key performance dimensions, highlighting the complementary nature of different approaches and the importance of multimodal integration for optimal user experience.
Recent advances in vision-language models (VLMs) have enabled multimodal systems to integrate visual perception with natural language understanding, allowing robots to interpret user commands and environmental cues simultaneously. For example, a robot can identify objects in its environment using VLM-based perception such as Contrastive Language-Image Pretraining (CLIP) [
132] while understanding instructions provided in natural language through LLM reasoning.
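As a concrete illustration of CLIP-style open-vocabulary perception, the sketch below ranks candidate object labels against an image crop using the publicly available CLIP checkpoint via the Hugging Face transformers library. The label list and prompt template are illustrative choices rather than part of any cited system.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def identify_object(image: Image.Image, candidate_labels):
    """Zero-shot recognition: rank free-form text labels against an image crop."""
    prompts = [f"a photo of a {label}" for label in candidate_labels]
    inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image   # shape: (1, num_labels)
    probs = logits.softmax(dim=-1)[0]
    return candidate_labels[int(probs.argmax())], float(probs.max())

# Usage, e.g.: identify_object(Image.open("crop.png"),
#                              ["coffee mug", "water bottle", "remote control"])
```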
This integration represents a major step toward “embodied AI,” shifting mobile service robots from simple command executors to autonomous, high-level agents capable of intuitive communication and context-aware assistance [
158]. By leveraging LLMs and VLMs, mobile robots can perform language-conditioned control, allowing them to navigate via complex semantic instructions (e.g., systems like NavGPT [
159]) and execute adaptive mobile manipulation tasks (e.g., BUMBLE) [
76]. To ensure continuous and seamless Human–Robot Interaction (HRI), novel frameworks utilizing Perception-Action Loops (PALoop) have been introduced, enabling robots to combine logical reasoning with pre-trained databases for long-horizon planning [
160].
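Although the cited systems differ considerably, they share a common language-conditioned control pattern: an LLM translates a free-form instruction into an action drawn from a constrained, robot-known vocabulary, and the result is validated before execution. The sketch below illustrates only that pattern; llm_complete is a placeholder for any text-completion client, and the action and waypoint vocabularies are assumptions.

```python
import json

ALLOWED_ACTIONS = {"go_to", "follow_person", "wait"}
KNOWN_WAYPOINTS = {"kitchen", "charging_dock", "meeting_room"}   # from the robot's map

PROMPT = """You control an indoor mobile robot. Map the user's instruction to JSON:
{{"action": one of {actions}, "target": one of {targets} or null}}
Instruction: "{instruction}"
JSON:"""

SAFE_FALLBACK = {"action": "wait", "target": None}

def plan_from_instruction(instruction, llm_complete):
    """llm_complete is a placeholder for any text-completion client."""
    raw = llm_complete(PROMPT.format(actions=sorted(ALLOWED_ACTIONS),
                                     targets=sorted(KNOWN_WAYPOINTS),
                                     instruction=instruction))
    try:
        plan = json.loads(raw)
    except (ValueError, TypeError):
        return SAFE_FALLBACK
    # Never execute an out-of-vocabulary (possibly hallucinated) plan.
    if plan.get("action") not in ALLOWED_ACTIONS:
        return SAFE_FALLBACK
    if plan.get("target") is not None and plan["target"] not in KNOWN_WAYPOINTS:
        return SAFE_FALLBACK
    return plan
```

The explicit validation step foreshadows the hallucination problem discussed next: constraining outputs to a known vocabulary is one common safeguard against acting on ungrounded model output.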
A critical challenge in applying these foundation models to physical environments is bridging the gap between abstract text and the 3D world. To address this, frameworks like the TAsk Planning Agent (TaPA) align LLMs with open-vocabulary visual object detectors to generate grounded, executable plans for complex indoor household tasks [
161]. Similarly, 3D vision-language-action (VLA) models, such as LEO, allow mobile agents to perceive, reason, and act directly within 3D spaces, responding accurately to multi-round user instructions [
162]. Because “hallucination” in LLMs poses a safety risk in physical HRI, recent prompting mechanisms like Contextual Set-of-Mark (ConSoM) have been developed to significantly improve visual grounding and precision in indoor robotic scenarios [
163].
Furthermore, modern VLM integration is expanding beyond simple RGB vision to true multisensory perception. Advanced models like MultiPLY integrate 3D visual data with tactile, auditory, and thermal inputs [
164]. This multisensory approach is pivotal for robust embodied interaction; for instance, incorporating tactile sensing enables a mobile manipulator to gather vital safety feedback to avoid applying excessive force, while auditory processing allows it to isolate voice commands [
165]. Finally, to ensure safety and precision during task execution, researchers are implementing closed-loop feedback systems. By integrating Small Language Models (SLMs) alongside VLMs, indoor mobile robots can iteratively evaluate scene changes and refine their control commands in real-time, adapting instantly to dynamic human environments [
166].
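Stripped of model specifics, this closed-loop pattern reduces to an iterative perceive-evaluate-refine cycle. The sketch below shows the control skeleton only; every callable is a placeholder for the corresponding perception, language, or actuation component, and the step limit is an illustrative safety bound.

```python
def closed_loop_execute(goal, perceive, evaluate, refine_command, act, max_steps=10):
    """Generic perceive-evaluate-refine loop (all callables are placeholders):
      perceive()            -> current scene description
      evaluate(goal, scene) -> (done: bool, feedback: str), e.g. from a small LM
      refine_command(...)   -> next low-level command given the feedback
      act(command)          -> execute the command on the robot
    """
    command = None
    for _ in range(max_steps):
        scene = perceive()
        done, feedback = evaluate(goal, scene)
        if done:
            return True
        command = refine_command(goal, scene, feedback, command)
        act(command)
    return False  # stop safely after max_steps rather than acting on stale plans
```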
Table 13 provides a comparative overview of multimodal and VLM-based interaction methods. Each approach is evaluated across key dimensions including computational cost, robustness to noise, user cognitive load, and hardware requirements. This table highlights the advantages of combining multiple modalities with LLM/VLM reasoning, such as increased adaptability, improved semantic understanding, and enhanced task performance. At the same time, it draws attention to limitations such as latency, onboard resource demands, and safety-critical constraints, guiding design decisions for real-world deployments.
Integrating LLMs and VLMs into multimodal interfaces represents a paradigm shift in HRI, moving from fixed-rule fusion strategies toward adaptive, context-aware reasoning systems. This enables more natural, flexible, and socially compliant human–robot collaboration. Design decisions for selecting modalities or combinations thereof should be guided by the task context, user population, and environmental constraints, with explicit consideration of computational limitations and safety-critical requirements.
4.3. Cross-Cutting Synthesis
Human–Robot Interaction in indoor mobile robotics encompasses multiple modalities, each offering distinct advantages, constraints, and suitability depending on the context. While previous sections have discussed speech, gesture, touch, visual/gaze, and multimodal approaches separately, synthesizing these insights highlights overarching trends, comparative strengths, and persistent challenges.
Table 14 provides a comprehensive overview of interaction modalities, highlighting advantages, limitations, computational demands, and typical applications. This allows for a side-by-side comparison of classical and learning-based approaches, including the role of LLM and VLM reasoning in modern adaptive systems.
A key development in HRI is the clear shift from classical, rule-based systems toward learning-based, adaptive approaches. Early systems for speech and gesture often relied on predefined templates and command sets. Modern deep learning architectures—particularly LSTM and Transformer models—allow the recognition of complex, context-dependent commands. When combined with LLMs and VLMs, robots can now interpret multimodal inputs and respond to subtle, situational user intentions.
Another significant trend is context- and environment-aware adaptation. Multimodal systems integrating speech, gestures, visual, and tactile signals can dynamically select the most appropriate interaction modality. For instance, robots may rely on visual displays or gestures in noisy environments or leverage gesture recognition when users’ hands are free, ensuring both efficiency and clarity of interaction.
Human-centered design is increasingly emphasized. Effective systems feature legible motion, socially compliant behavior, and adaptive responses. Multimodal integration enhances not only efficiency but also accessibility: limitations of individual modalities, such as reduced gesture recognizability in poor lighting or unreliable speech recognition in noisy environments, can be mitigated through sensor fusion.
Finally, cross-cutting challenges span the entire field. These include dependence on environmental conditions, variability of human users (e.g., cognitive load, cultural differences), and high computational and hardware demands for LLM/VLM-based systems. Moreover, the lack of standardized datasets, evaluation metrics, and protocols complicates the comparative assessment and reproducibility of approaches.
In summary, Human–Robot Interaction in indoor mobile robotics is shaped by evolving methodologies, context-specific applications, and enduring challenges. Understanding these cross-cutting factors is critical for developing adaptive, robust, and user-centered systems and provides a foundation for advancing future HRI research and deployment.
8. Conclusions
This comprehensive review of Human–Robot Interaction in indoor mobile robotics reveals a rapidly evolving field with significant potential for transforming how humans and robots collaborate in shared environments. The analysis of interaction modalities demonstrates that effective HRI requires multimodal approaches that combine speech, gesture, touch, and visual communication channels. Each modality offers unique advantages while facing distinct challenges related to environmental conditions, user diversity, and technical limitations.
The examination of user experience and acceptance factors highlights the critical importance of human-centered design in achieving successful robot deployment. Trust, reliability, and social integration emerge as fundamental requirements that must be addressed through careful attention to robot behavior, transparency, and cultural sensitivity.
The case studies of Moxi, Temi, and Astro illustrate different approaches to addressing these challenges, with varying degrees of success depending on the application domain and user requirements.
Safety and privacy considerations present ongoing challenges that require continued attention from researchers, manufacturers, and policymakers. The development of appropriate regulatory frameworks, technical standards, and ethical guidelines will be essential for the widespread adoption of indoor mobile robots. Privacy protection mechanisms and transparent data practices must be integrated into system design rather than treated as afterthoughts.
The technical challenges identified in this review, including navigation reliability, interaction robustness, and computational constraints, represent active areas of research with promising solutions emerging from advances in AI, sensor technology, and computing infrastructure. The integration of Large Language Models and foundation models offers promise for enhancing interaction capabilities while raising new questions about safety and reliability.
Future research should focus on developing more robust and adaptable interaction systems that can operate effectively across diverse indoor environments and user populations. Long-term studies of human–robot relationships, social integration effects, and trust development will be crucial for understanding the full implications of robot deployment in human-centered environments.
The success of indoor mobile robotics will ultimately depend on achieving the right balance among technical capability, user acceptance, safety, and economic viability. This requires continued collaboration among technologists, social scientists, ethicists, and end users to ensure that robotic systems truly serve human needs and values. As the field matures, the focus must shift from demonstrating technical feasibility to creating sustainable, beneficial, and trustworthy partnerships between humans and robots in our most personal and important spaces.