
Observation of Human–Robot Interactions at a Science Museum: A Dual-Level Analytical Approach

1 Human-Robot Interaction Research Center, Korea Institute of Robotics and Technology Convergence, Pohang 37553, Republic of Korea
2 Graduate School of Artificial Intelligence, Ulsan National Institute of Science and Technology, Ulsan 44919, Republic of Korea
3 Institute of Library, Information and Media Science, University of Tsukuba, Tsukuba 305-8550, Japan
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Electronics 2025, 14(12), 2368; https://doi.org/10.3390/electronics14122368
Submission received: 26 April 2025 / Revised: 30 May 2025 / Accepted: 5 June 2025 / Published: 10 June 2025

Abstract

This study proposes a dual-level analytical approach to observing human–robot interactions in a real-world public setting, specifically a science museum. Observation plays a crucial role in human–robot interaction research by enabling the capture of nuanced and context-sensitive behaviors that are often missed by post-interaction surveys or controlled laboratory experiments. Public environments such as museums pose particular challenges due to their dynamic and open-ended nature, requiring methodological approaches that balance ecological validity with analytical rigor. To address these challenges, we introduce a dual-level approach for behavioral observation, integrating statistical analysis across demographic groups with time-series modeling of individual engagement dynamics. At the group level, we analyzed engagement patterns based on age and gender, revealing significantly higher interaction levels among children and adolescents compared to adults. At the individual level, we employed temporal behavioral analysis using a Hidden Markov Model to identify sequential engagement states—low, moderate, and high—derived from time-series behavioral patterns. This approach offers both broad and detailed insights into visitor engagement, providing actionable implications for designing adaptive and socially engaging robot behaviors in complex public environments. Furthermore, it can facilitate the analysis of social robot interactions in everyday contexts and contribute to building a practical foundation for their implementation in real-world settings.


1. Introduction

Service robots are increasingly being deployed in everyday environments such as museums, restaurants, and public facilities, moving beyond the confines of laboratory testing [1,2,3,4]. As these robots become integrated into various aspects of daily life, it has become critically important to understand how users interact with them from a human–robot interaction (HRI) perspective. This is particularly relevant in the service sector, where optimizing user–robot interactions is essential for delivering meaningful and satisfying user experiences [5].
Traditionally, HRI research has relied heavily on survey-based methods to assess user perceptions and preferences [6]. However, surveys often fall short in capturing the subtle, dynamic, and context-dependent behaviors that occur in real-world settings [7]. Designing robots that can effectively respond to the diverse needs of users in complex public environments requires moving beyond self-reported data and adopting more naturalistic investigative methods [8].
Observational methods offer a distinct advantage in HRI research by capturing user behavior directly rather than through self-report. Among various observational techniques, naturalistic observation is particularly valuable for identifying authentic behavioral patterns as they occur in real-life contexts [7,9]. By studying behavior in everyday environments without experimental interference, researchers can uncover subtle interaction dynamics that are often difficult to detect through surveys, controlled experiments, or post hoc interviews [8,10]. Despite these advantages, prior HRI studies over the past decade have frequently lacked structured analytical frameworks for systematically interpreting behavioral data, limiting the generalizability and practical application of their findings [11,12,13].
To address this limitation, the present study introduces a dual-level analytical approach to behavioral observation, which integrates group-level statistical analysis based on demographic factors with individual-level temporal modeling of user engagement. This approach is intended to support a more comprehensive understanding of HRI in real-world public settings and to inform the practical deployment of socially engaging and adaptive service robots.
To introduce the dual-level analytical approach, the remainder of this paper is organized as follows: Section 2 reviews related works on observational methods in HRI, highlighting the need for structured analytical frameworks. Section 3 details the development of our behavior coding scheme, including the initial identification and subsequent refinement of visitor behaviors. Section 4 presents the observation results, encompassing both group-level statistical analyses based on demographic factors (age and gender) and individual-level temporal modeling of engagement using a Hidden Markov Model (HMM). Section 5 discusses the implications of our findings for behavior-driven, user-centered design, adaptive interaction strategies, and the utility of time-based engagement modeling, along with study limitations. Finally, Section 6 concludes this paper and suggests future research directions.

2. Related Works

Service robots are increasingly being integrated into daily life, accompanied by a rise in large-scale field studies. This trend has led to growing interest in observational research methods suited to real-world settings. As a result, recent efforts in the HRI field have focused on systematizing observational research methodologies, and a growing number of studies have examined human behavior through observational techniques.
In HRI research, field studies have frequently employed observational methods to explore diverse user behaviors and perceptions, especially in deployment contexts outside the laboratory. One of the key strengths of observational research is its ability to capture behaviors in natural environments where direct intervention may be impractical or disruptive [8]. This is especially valuable in everyday social settings and public-facing environments, where artificial manipulation could unintentionally influence the behaviors being observed. By capturing user behavior in its natural context, observational methods allow researchers to identify genuine interaction patterns and uncover insights that are often inaccessible through surveys or controlled experiments [14].
In this way, observational data complement self-reported measures by providing a more accurate reflection of actual user behavior and mitigating biases inherent in retrospective reporting [15]. These strengths are particularly evident in analytical observational studies, which allow researchers to investigate behavioral outcomes in real-world settings without experimental manipulation, while still enabling causal inference through advanced techniques such as matching and stratification [16].
Recent studies have demonstrated the value of observational approaches in public and semi-public environments. For instance, Babel et al. analyzed pedestrian interactions with cleaning robots in train stations, identifying points of conflict and proposing design strategies to enhance robot acceptance [17]. Daczo et al. examined how greeting behaviors used by a museum guide robot influenced visitor engagement, highlighting the role of spatial and cultural context in shaping HRI [18,19]. Rettinger et al. applied the Technology Acceptance Model [20] to study elderly users’ interactions with a socially assistive robot in nursing homes, reporting generally positive emotional responses and improved social communication [21].
Matsumoto et al. proposed a conceptual approach to proximate human–robot teaming, which considers various task contexts, platforms, and sensors. Their approach supports the exploration of complex HRI components through iterative and time-series data collection and analysis [22]. However, the framework lacks specific methodological guidance, instead offering a high-level overview of potential components such as research questions, task environments, and analysis methods ranging from causal inference to quantitative assessment.
Diehl and Ramirez-Amaro proposed a causal inference framework that allows robots to understand, explain, predict, and prevent task execution failures by employing Causal Bayesian Networks [23,24]. Their method models cause–effect relationships using conditional probabilities, enabling contrastive explanations and failure prediction with preventative strategies. While offering a valuable methodological foundation for analyzing robotic task execution, their work primarily addresses controlled manipulation scenarios, limiting its applicability to broader social HRI or unstructured environments.
Boos et al. proposed the Compliance–Reactance Framework, which uses conditional probabilities to evaluate human responses to robot cues in both experimental and field studies [25]. This framework supports the analysis of compliance, cooperation, and resistance behaviors and offers recommendations for improving robot interaction design. However, it primarily focuses on the binary presence or absence of compliance, overlooking the nuanced progression of user engagement.
Kim et al. conducted an observational study using Bayesian networks to model subtle engagement behaviors in interactions between individual children with autism spectrum disorders and a social robot [26]. Their research showed the feasibility of probabilistic inference in understanding complex interaction dynamics in therapeutic contexts. However, the study focused on a highly specific setting—prosocial skill development in autism therapy—and involved a small sample size, limiting the generalizability of its findings.
While prior studies demonstrated the value of observational approaches in HRI, many have lacked structured and reproducible analytical frameworks, often relying on qualitative categorization or descriptive reporting without standardized coding schemes or formal behavior modeling [7]. To address these limitations, the present study introduces a structured dual-level observational approach that combines group-level statistical analysis based on demographic trends with individual-level temporal modeling of engagement using an HMM [27]. This method is supported by a flexible behavior coding scheme grounded in a formalized behavioral grammar. The primary contributions of this study include the following: (a) formalizing a grammar to define human behavioral patterns; (b) developing a low-specificity, adaptable coding scheme applicable across diverse real-world contexts; (c) implementing an iterative refinement process for improving coding reliability; (d) providing guidelines for consistent video tagging; (e) applying a probabilistic sequential model to analyze the link between time-series behavioral data and user engagement; and (f) validating the proposed approach in a real-world public environment, specifically a science museum.

3. Behavior Coding Scheme

The development of the behavior coding scheme was divided into two stages: an initial identification stage, where behavior types were classified and predefined, and a refinement stage, where the classification and labeling of these predefined behaviors were revised. This approach was adopted to ensure that behaviors observed during video tagging were interpreted objectively and consistently. Since open-ended or ambiguous definitions could lead to subjective interpretations and compromise coding reliability, separate groups of coders were assigned to each stage to maintain objectivity and consistency across both phases.

3.1. Initial Identification of Visitor Behaviors

In this stage, two coders reviewed the collected videos and formalized the specific situations and conditions in which behavior types occurred in order to define the coding scheme. The coding scheme was organized around two primary behavior factors: physical proximity [28] and interaction attempts [29,30].
Andrés et al. classified the characteristics of human–robot interactions based on interaction type [31], categorizing them into interaction level, role in interaction, physical proximity, spatiotemporal context, and level of intervention. The interaction characteristic most clearly demonstrated by visitors at the RoboLife Museum was physical proximity. Hence, physical proximity was applied as one of the behavior factors in the coding scheme. Physical proximity indicates the distance at which humans feel a sense of intimacy with others [28]. The act of a human reducing their distance to a robot can thus be interpreted as a sign of intimacy. In the collected videos, visitors exhibited behavior types related to physical proximity, such as approaching, avoiding, and passing by the museum guide robot.
The second behavior factor, interaction attempts, encompasses a variety of behaviors. Visitors who approached the robot closely attempted multiple interactions with the museum guide robot. Their behaviors were categorized into three main types: attempting social interactions such as greeting and gesturing, attempting to obtain information from the screen by touching the robot, and visually observing the robot without attempting interaction.
Table 1 shows representative behavior cases and selected video clips used during the initial identification phase. These examples include both individual and multi-person interactions. While some of the behaviors involve multiple people (e.g., two persons touching the screen together), each individual’s actions were later coded separately for consistency in analysis. The selected behavior types were confirmed to be classifiable based on the attributes of physical proximity and interaction attempts, which served as the foundational categories for the behavior coding scheme.
To provide a descriptive definition of visitor behaviors based on selected behavioral types, behavioral grammar was developed as shown in Table 2. The grammar was designed to express behaviors starting from the point at which a visitor intends to initiate an interaction with a robot. In addition, behavioral definitions were structured to be described based on stimuli and responses, where stimuli refer to specific robot actions (stationary or moving) and responses refer to characteristic behaviors, such as physical proximity and interaction attempts, categorized as primary behavioral factors. Grammar 1 applies to behaviors before an interaction attempt while Grammar 2 applies to behaviors after an interaction attempt. For instance, using Grammar 1, we can descriptively define behavior as ‘approaches while looking (a response related to recognizing the robot’s presence) and passes (a response related to distance) when the robot is stationary (stimulus)’. The action of recognizing the robot’s presence, such as gazing or head orientation, was considered synchronous behavior with the distance-related response. Grammar 2 allows us to define behaviors that occur after approaching the robot, that is, after reaching a distance where interaction with the robot is possible. For example, descriptive definitions such as ‘greets (a response related to social interaction)’ and ‘touches the screen (a response related to information seeking)’ are possible.
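To illustrate the structure of the grammar, the following minimal sketch represents a behavior description as a stimulus plus an ordered list of responses. This is our own Python illustration under those assumptions; the class and field names are not part of the study’s tooling.

```python
# A minimal sketch of how the behavioral grammar could be represented in
# software; the class and field names are our own illustration, not part of
# the study's tooling.
from dataclasses import dataclass

@dataclass
class BehaviorDescription:
    grammar: int          # 1 = before an interaction attempt, 2 = after
    stimulus: str         # robot action: "stationary" or "moving"
    responses: list[str]  # characteristic responses, in order

# Grammar 1: 'approaches while looking and passes when the robot is stationary'
before = BehaviorDescription(1, "stationary", ["approaches while looking", "passes"])

# Grammar 2: behaviors once the visitor is within interaction distance
after = BehaviorDescription(2, "stationary", ["greets", "touches the screen"])
```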
The initial draft of the coding scheme included four behaviors related to physical proximity (approach, pass, avoid, and follow) and three behaviors related to interaction attempts (touch, gesture, and none), for a total of seven behaviors. In describing the seven behaviors according to Grammars 1 and 2, wording that invited varying subjective interpretations was revised. For example, ‘approaches’ in approach was changed to ‘stops and stands’, and ‘avoids’ in avoid was changed to ‘steps aside’. ‘Approaches’ implies closing the distance with the robot, leading into an interaction attempt. However, ‘approach’ alone does not convey that the distance-related behavior has ended so that the next interaction can be attempted, nor does it clearly distinguish the visitor’s behavior shown in the video data from ‘avoid’, ‘pass’, and ‘follow’. Therefore, it was modified to the more specific expression ‘stops and stands’, which conveys that the movement has ended.
Although the behaviors of visitors avoiding or passing the robot can be distinguished by the labels ‘avoid’ and ‘pass’, they appeared as similar behaviors in the video data. Hence, expressions that clearly distinguish ‘avoid’ from ‘pass’ were needed. To this end, we distinguished between the cases where the robot is stationary or moving as stimuli, added temporal qualifiers such as ‘passes immediately’, and added directional qualifiers such as ‘steps aside from the path along which the robot is moving’.

3.2. Refinement of Behavior Coding Scheme

To ensure intercoder reliability, a refinement process was conducted for the coding scheme. Two additional coders were involved in this phase and were tasked with tagging videos using the established coding scheme. Before initiating the video-tagging process, they reviewed and revised the descriptive definitions of the behaviors. This revision addressed definitions that the coders had overlooked during the initial identification of visitor behaviors, that were subject to varied interpretations, or that were incongruent with the natural context observed during representative behavior selection and initial analysis.
Modifications were primarily made to the definitions of interaction initiation. For instance, there were instances where visitors often touched not only the screen but also the robot’s body. Given the robot’s mobility within the museum lobby, determining whether the screen or the body was touched could be challenging due to the video’s angle. Consequently, the definition of touch was expanded to encompass interactions with the robot’s screen, its body, or both.
Furthermore, observations revealed that children exhibited a wide range of gestures toward the robot. There were various social interaction attempts, such as waving their hands in greeting, shaking their heads as they approached, and raising their arms and bringing them toward the robot’s face. As a result, gesture was refined to include all forms of interaction attempts, excluding touch.
Lastly, it was observed that some visitors would approach the robot, halt, and remain stationary, but then engage in no further action. Since interaction attempts are considered a continuous series of behaviors following physical proximity, the behavior none, in which no action is taken, was added during the refinement process to represent a pause. The revised coding scheme is detailed in Table 3. This includes the seven behaviors of physical proximity and interaction attempts, along with their respective descriptive definitions.

3.3. Video Tagging

The two coders involved in the video tagging completed the refinement process and proceeded to tag the videos based on the refined coding scheme. The tagging focused on the behaviors exhibited by individuals toward the robot, based on the assumption that behaviors are meaningful only when the visitor has clearly recognized the robot’s presence. All behaviors were coded on an individual basis, even in scenes where multiple people appeared in the same video frame. Consequently, five key guidelines were established for the tagging process:
  • The tagging process is based on the subject’s behavior. All actions become significant once the subject acknowledges the presence of the robot. Therefore, observations begin when the subject’s face is oriented toward the robot.
  • The behavior code “Pass” was used when a visitor noticed the robot but continued to move past it without halting, determined by observing the direction of the visitor’s head.
  • If the visitor followed the robot and eventually stopped while looking at the robot, “F-AP” was tagged sequentially. Conversely, if the visitor started to follow the robot but then diverged onto a different path, “F” was tagged.
  • The behavior code “None” was specifically tagged only for the behavior after either an approach or follow action. It was used when no gesture or touch occurred after the visitor approached the robot. “None” was also used to denote the absence of interaction or the interval between different interactions.
  • Continuous occurrences of the same interaction, even if separated by intervals, were considered a single action and tagged as such.
Before starting the video tagging, to accurately identify the behavior of specific individuals across different footage, we documented the external characteristics (age, gender, and clothing) of all visitors shown in the videos, along with the filename of the video in which they appeared. Subsequently, we trimmed the original footage to only include the segments featuring the targeted individual for easier reference in Dartfish Software (Pro S), a video-tagging program.
Through this data preprocessing stage, a total of 290 samples were extracted. To maintain the consistency of the tagging data, the two coders collaboratively tagged the same set of 290 people rather than dividing the workload between them. The videos were analyzed to display both the physical proximity and the interaction attempts of all visitors, with the timeline unit set to seconds. Each value was presented in Interlinear Text format, separated by tabs for clarity. The inter-rater reliability analysis yielded a Cohen’s kappa of 0.8, indicating substantial agreement between the two coders, with 84.2% observed agreement, as shown in Table 4.
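For reference, inter-coder agreement of this kind can be computed as in the following minimal sketch, which assumes the two coders’ tags have been aligned per time unit; the tag sequences shown are hypothetical, and scikit-learn’s cohen_kappa_score is one possible implementation.

```python
# A minimal sketch of the inter-rater reliability computation, assuming the two
# coders' tags have been aligned per time unit; the sequences are hypothetical.
from sklearn.metrics import cohen_kappa_score

coder_a = ["AP", "T", "N", "P", "AP", "G"]
coder_b = ["AP", "T", "N", "P", "AP", "T"]

kappa = cohen_kappa_score(coder_a, coder_b)
agreement = sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)
print(f"Cohen's kappa = {kappa:.2f}, observed agreement = {agreement:.1%}")
```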

4. Observation Results

4.1. Environment

This observational study was conducted at the RoboLife Museum in Pohang, South Korea. Video data on the interactions between visitors and the museum guide robot were collected under Institutional Review Board approval (approval number: KIRO-2023-IRB-01). Upon entering the museum, visitors confirmed their reservation status at the information desk, where they were informed about the purpose of data collection, its use, and security measures. Written consent was obtained from all participants; for children, written consent was provided by their guardians, ensuring full ethical compliance.
The museum guide robot is equipped with a service that allows it to patrol and provide guidance at both the entrance and central lobby. To record these interactions, CCTV cameras capable of monitoring the entire entrance and lobby from all angles were installed. Video data were recorded for approximately 4 h each day within the museum’s opening hours (10 A.M. to 6 P.M.), aligned with the four scheduled daily exhibition tours, resulting in a total of 24 h of collected footage. Data were collected through video recordings of 290 visitors. The coding of the video data followed the behavior coding scheme described in Table 3.
Given that the observations were conducted on an unspecified population of visitors to the RoboLife Museum, a detailed demographic survey was not feasible. Nonetheless, comparisons were made between visually discernible gender groups and between the adult group and the children and adolescent group.

4.2. Group-Level Behavioral Observation

Group-level comparisons were conducted by categorizing participants based on gender and age. Mann–Whitney U tests were then used to determine whether there were significant differences in the frequency and duration of physical proximity and interaction attempts between these demographic groups.
A total of 290 participants were included in the analysis. For group-level comparisons, participants were categorized by gender (male = 162, female = 128) and by age group (children and adolescents = 160, adults = 130).
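As a minimal sketch of this comparison, the following snippet runs a Mann–Whitney U test with SciPy and derives the effect size r from the normal approximation (r = |Z|/√N), a common convention that we assume here rather than the authors’ exact computation; the data are synthetic and for illustration only.

```python
# A minimal sketch of the group-level comparison on per-participant metrics
# (here, durations of interaction attempts); synthetic data for illustration.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
children = rng.exponential(scale=30.0, size=160)  # children and adolescents
adults = rng.exponential(scale=17.0, size=130)

u_stat, p_value = mannwhitneyu(children, adults, alternative="two-sided")

# Effect size r via the normal approximation of the U statistic.
n1, n2 = len(children), len(adults)
mean_u = n1 * n2 / 2.0
sd_u = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
z = (u_stat - mean_u) / sd_u
r = abs(z) / np.sqrt(n1 + n2)
print(f"U = {u_stat:.1f}, p = {p_value:.4f}, r = {r:.3f}")
```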

4.2.1. Gender Difference

No significant difference was observed in the duration of maintaining physical proximity between genders (male: M = 20.7, S.D. = 16.6; female: M = 22.9, S.D. = 17.9; p-value = 0.26, r = 0.066), nor for the duration of interaction attempts (male: M = 26.6, S.D. = 43.3; female: M = 21.5, S.D. = 43.7; p-value = 0.16, r = 0.083). There were no significant differences found in the frequency of physical proximity (male: M = 0.097, S.D. = 0.118; female: M = 0.071, S.D. = 0.073; p-value = 0.708, r = 0.022) and interaction attempts (male: M = 0.039, S.D. = 0.053; female: M = 0.029, S.D. = 0.055; p-value = 0.067, r = 0.107) between genders.

4.2.2. Age Difference

A significant difference was found in the duration of interaction attempts between age groups (U-value = 8426.5, p-value = 0.0027, r = 0.176), with children and adolescents (M = 30.2, S.D. = 48.8) exhibiting a longer duration than adults (M = 17.2, S.D. = 34.7). The frequency of interaction attempts also differed significantly between age groups (U-value = 8882, p-value = 0.021, r = 0.135), with children and adolescents (M = 0.039, S.D. = 0.057) engaging more frequently than adults (M = 0.029, S.D. = 0.049). However, the duration of physical proximity did not vary significantly (adults: M = 21.3, S.D. = 14.8; children and adolescents: M = 21.9, S.D. = 19; p-value = 0.353, r = 0.054). Likewise, no significant difference was found in the frequency of physical proximity between age groups (adults: M = 0.07, S.D. = 0.08; children and adolescents: M = 0.09, S.D. = 0.11; p-value = 0.666, r = 0.025).

4.3. Individual-Level Behavioral Observation

In the individual-level behavioral observation, we modeled visitors’ engagement levels using a probabilistic model. The analysis focused on distinguishing the time-series behavioral patterns of museum visitors and categorizing the depth of engagement in their interactions with the museum guide robot, using the video data capturing those interactions.

4.3.1. Model Selection and Data Preprocessing

The HMM is a probabilistic model for inferring hidden states that are not directly observable [32]. An HMM can be employed to discover hidden states from observable variables in time-series data [33]. Prior research [34,35,36,37] has successfully applied HMMs to the analysis of human behavior patterns. The choice of HMM variant is typically determined by the input format: standard HMMs are used for discrete data, Gaussian Hidden Markov Models (GHMMs) for continuous data, and Gaussian Mixture Model HMMs (GMM-HMMs) for cases that involve both discrete and continuous data. As input variables, we used the series of observed behavior codes, together with their durations and frequencies, from the 290 samples. Accordingly, a Gaussian Mixture Hidden Markov Model (GMM-HMM) was developed to infer the engagement levels of individual visitors with the museum guide robot.
The estimation of the hidden state was conducted using three data sources. The variable duration was defined as the length in seconds during which a specific individual behavior, such as approach, follow, or avoid, was observed. The variable frequency was defined as the number of occurrences of each behavior divided by the entire observation period for each visitor. The variable code was assigned an integer value ranging from 0 to 5 (approach = 0, avoid = 1, follow = 2, gesture = 3, pass = 4, and touch = 5), serving as a unique identifier for each of the six distinct behaviors in Table 3. Although the behavior code none can represent a pause in the interaction process, it was excluded to minimize ambiguity, simplify transition dynamics, and reduce noise in the engagement analysis. In the group-level observation, none provides a comprehensive view of the behavior of different visitor groups, including both active interaction and pauses, which is important for understanding differences in interaction styles and designing for diverse visitor needs. The HMM, by contrast, focuses purely on active states, allowing for a more straightforward interpretation of visitor behavior without the additional complexity of pauses, which can be difficult to interpret consistently. This simplification helps in achieving actionable insights into how visitors engaged with the robot and what improvements could be made to sustain or increase engagement levels.
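The following minimal sketch illustrates how one visitor’s tagged behaviors could be encoded into the [code, duration, frequency] observation rows described above; the event list is hypothetical, and treating the summed event durations as the observation period is a simplifying assumption.

```python
# A minimal sketch of encoding one visitor's tagged behaviors as
# [code, duration, frequency] observation rows.
import numpy as np

CODE = {"approach": 0, "avoid": 1, "follow": 2, "gesture": 3, "pass": 4, "touch": 5}

# (behavior, duration in seconds) for one visitor, in temporal order;
# 'none' events are dropped, per the preprocessing described above.
events = [("approach", 6), ("touch", 20), ("gesture", 4), ("pass", 3)]

total_time = sum(d for _, d in events)
counts = {}
for name, _ in events:
    counts[name] = counts.get(name, 0) + 1

sequence = np.array(
    [[CODE[name], dur, counts[name] / total_time] for name, dur in events],
    dtype=float,
)
print(sequence)  # one row per observed behavior
```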

4.3.2. Model Training

To estimate the parameters of the GMM-HMM, we employed K-fold cross-validation (K = 5) to determine the optimal number of hidden states. The cross-validation procedure identified the optimal model by selecting the one with the highest average log-likelihood across folds, thereby ensuring that the model could generalize effectively to unseen data. As depicted in Figure 1, the optimal number of hidden states was selected to be five. This approach allowed the model to effectively capture the variability inherent in user behaviors. Emissions from each hidden state were modeled as a mixture of several Gaussian distributions, providing a flexible representation capable of accounting for the diverse interaction patterns observed.
The model parameters were refined iteratively using the Expectation Maximization (EM) algorithm. These parameters included the initial state probabilities, transition probabilities between hidden states, and GMM-HMM parameters for emissions. The dataset used for training consisted of individual observation sequences. Each sequence corresponded to a continuous stream of behaviors from a single participant. For instance, a sequence such as approach, touch, avoid, and pass—coded from a participant approaching the robot, interacting briefly, and then leaving—was treated as one unit of analysis. The segmentation relied entirely on observed behavioral continuity and not on fixed time intervals. The training process was set to n_iteration = 100, and the dataset was split into an 8:2 ratio for the train and test sets. The model training converged after several iterations as the log-likelihood values stabilized, indicating that the model had effectively learned the underlying patterns in the data. The final model produced a transition matrix with probabilities that reflected the likelihood of users transitioning between different hidden states, as shown in Figure 2. The optimal number of hidden states was determined based on the highest average log-likelihood achieved during cross-validation, and the final model used these hidden states to classify user behaviors. The observation distributions and state transition dynamics were then further analyzed to understand user interaction patterns.
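A minimal sketch of this model-selection and training procedure is shown below, using the hmmlearn library as one possible implementation. The helper name select_n_states, the candidate range, and hyperparameters such as n_mix and covariance_type are our assumptions; the paper specifies only n_iteration = 100, an 8:2 train/test split, and selection by the highest average log-likelihood across folds.

```python
# A minimal sketch of selecting the number of hidden states by K-fold
# cross-validation and training a GMM-HMM with EM (hmmlearn, one possible
# implementation; hyperparameters are assumptions).
import numpy as np
from hmmlearn.hmm import GMMHMM
from sklearn.model_selection import KFold

def select_n_states(X, lengths, candidates=range(2, 9), n_folds=5, seed=0):
    """Pick the number of hidden states with the best mean held-out log-likelihood."""
    lengths = np.asarray(lengths)
    starts = np.concatenate([[0], np.cumsum(lengths)[:-1]])

    def gather(idx):
        # Stack the observation rows of the selected per-visitor sequences.
        rows = np.concatenate([X[s:s + n] for s, n in zip(starts[idx], lengths[idx])])
        return rows, lengths[idx]

    best_n, best_ll = None, -np.inf
    for n in candidates:
        fold_lls = []
        kf = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
        for train_idx, test_idx in kf.split(lengths):
            X_tr, len_tr = gather(train_idx)
            X_te, len_te = gather(test_idx)
            model = GMMHMM(n_components=n, n_mix=2, covariance_type="diag",
                           n_iter=100, random_state=seed)
            model.fit(X_tr, len_tr)                     # EM training
            fold_lls.append(model.score(X_te, len_te))  # held-out log-likelihood
        if np.mean(fold_lls) > best_ll:
            best_n, best_ll = n, np.mean(fold_lls)
    return best_n  # five, for the data reported in this study
```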

4.3.3. Model Interpretation

The results provide valuable insights into the different levels of visitors’ engagement during interactions with the robot. The transition diagram in Figure 3 allows us to better understand the behaviors associated with each of the five hidden states. Each state represents a distinct level of engagement, ranging from initial exploration to more focused interaction. The state transitions illustrate how visitors move between these phases, highlighting typical paths in their engagement.
Figure 4 illustrates the distribution of observed behaviors across five inferred states. The x-axis in each plot represents the feature value intervals without physical units, as both duration and frequency denote normalized or scaled feature values rather than raw time or frequency.
The five states derived from the HMM, namely LE_entry, LE_observe, ME_trial, ME_focus, and HE_engaged, are defined based on distinct behavioral profiles involving differences in action type, duration, and frequency, as illustrated in Figure 4. These states represent various levels and styles of engagement, ranging from minimal physical interaction to sustained immersive behavior. The transitions between these states are governed not by arbitrary shifts but by measurable behavioral thresholds that reflect changes in user attention, proximity, and interaction strategy.
LE_entry is primarily characterized by a high frequency of touch behavior, while gesture appears only sporadically. The duration of behavior is mostly short, typically around 5, and behavior frequency generally falls within the low range, around 0.025, as demonstrated in the first row of Figure 4. This pattern indicates that the user engages in minimal, short-term physical contact with the robot without progressing toward cognitively or socially meaningful interaction.
In contrast, LE_observe is defined by a more diverse behavior set, with pass being the most frequently observed, followed by touch, gesture, and follow. These behaviors typically occur with moderate durations ranging from 5 to 35 and low-to-moderate frequencies between 0.025 and 0.125, as indicated in the second row of Figure 4. This state reflects a scenario in which the user lingers near the robot, expressing ambient interest but without initiating sustained or focused engagement.
As user engagement intensifies, the user transitions into the ME_trial state, which is characterized by broad behavioral diversity. In this phase, pass and approach behaviors are especially prominent, with behavior durations generally short, around 5, and frequency values ranging widely from 0.025 to 0.275, as illustrated in the third row of Figure 4. These behavioral patterns reflect an exploratory interaction mode, in which the user performs a variety of discrete actions including touch, gesture, pass, and approach without yet establishing any repeated or consistent engagement pattern. Functionally, ME_trial serves as a transitional zone, allowing the user to either escalate into more focused interactions or regress to passive states, depending on the perceived responsiveness of the robot.
The transition into ME_focus occurs when exploratory behaviors in ME_trial begin to consolidate into repeated, directed interactions. ME_focus is almost exclusively dominated by repetitive approach actions, with durations typically moderate, spanning 5 to 35, and frequency values clustered in the moderate range between 0.025 and 0.125, as indicated in the fourth row of Figure 4. This state reflects a form of attentional lock-in, where the user remains near the robot and demonstrates consistent behavior without yet initiating overt social interaction. Touch and gesture are not observed in ME_focus.
Finally, the HE_engaged state is defined by sustained touch interactions accompanied by a substantial amount of gesture behavior. Other behaviors such as follow, avoid, and pass appear less frequently but are still present. Durations vary widely, from 5 to over 65, while frequency values are generally low and concentrated around 0.025, as demonstrated in the fifth row of Figure 4. Together, these metrics indicate a sustained, content-focused interaction session, wherein the user exhibits clear attentional commitment and task engagement.
For example, the transition from LE_entry to ME_focus is associated with a shift in behavioral profile from short, isolated touch interactions to longer sequences dominated by repeated approach behaviors. This change reflects an increase in spatial attention and physical proximity, as indicated by the transition structure shown in Figure 3 and the distinct behavior distributions in Figure 4. This transition is supported by a probability of 0.33, as shown in Figure 3. Alternatively, when the touch duration exceeds a certain threshold, the user may transition directly into HE_engaged, with a transition probability of 0.16. In the absence of such behavioral escalation, however, the user typically remains in LE_entry through self-transition, which occurs with a probability of 0.44.
The pathway from LE_observe to ME_focus is initiated when repeated approach behaviors are observed, suggesting proximity-based interest. If the user’s actions become increasingly varied rather than converging toward a consistent behavioral pattern, attempting more types of interactions without repetition, the user is more likely to transition into ME_trial rather than ME_focus. These two pathways, LE_observe to ME_focus and LE_observe to ME_trial, are supported by transition probabilities of 0.54 and 0.13, respectively, and are structurally permitted as shown in Figure 3.
In the case of ME_trial, if users begin to repeat specific behaviors or if behavior durations exceed a certain threshold, the user tends to transition into ME_focus, with a probability of 0.39. Conversely, if the user does not perceive meaningful feedback, they often return to LE_observe, as reflected in the transition probability of 0.35.
While touch and gesture are not observed in ME_focus according to Figure 4, their emergence at this stage is interpreted as a trigger for transition into HE_engaged, supported by a transition probability of 0.25, as shown in Figure 3. However, if no further interaction occurs within ME_focus, the user may regress to LE_entry with a probability of 0.32 or to LE_observe with a probability of 0.21. Once in the HE_engaged state, users may maintain their engagement via self-transition with a probability of 0.25 or shift back to earlier states depending on content depletion or fatigue. Specifically, transitions back to ME_focus or to LE_entry, each with a probability of 0.15, may occur if the user’s attention diminishes.
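As a minimal sketch of how such an interpretation could be reproduced, the snippet below decodes one visitor’s observation sequence with a trained model, reusing the model and sequence names from the sketches above. The index-to-label mapping is hypothetical: in practice it must be assigned by inspecting each state’s learned emission distribution, as in Figure 4.

```python
# A minimal sketch of decoding one visitor's engagement trajectory; `model` and
# `sequence` come from the earlier sketches, and the label mapping is hypothetical.
STATE_LABELS = {0: "LE_entry", 1: "LE_observe", 2: "ME_trial",
                3: "ME_focus", 4: "HE_engaged"}

states = model.predict(sequence)  # Viterbi-decoded hidden-state indices
print([STATE_LABELS[s] for s in states])

# The learned transition matrix corresponds to the probabilities in Figure 3.
print(model.transmat_.round(2))
```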

4.4. A Guide to Observation Studies Using Our Approach

Based on the procedure used in our observational study, we propose a generalized guideline for conducting observation studies in HRI, as illustrated in Figure 5.

4.4.1. Development of Behavior Coding Scheme

The development of the coding scheme follows a two-stage process. Initially, researchers define a preliminary set of behavior categories informed by observing collected data. Afterward, new interaction patterns may be identified, or the definitions of existing behaviors may require modification, prompting revisions to the coding categories. This iterative refinement process ensures that the coding scheme evolves to remain relevant and accurately reflects the observed behaviors. A detailed account of the process for developing the behavior coding scheme is provided in Figure 5.

4.4.2. Group-Level Observation for Broad Behavior Patterns

Group-level observation is focused on capturing broad behavioral patterns across large groups or populations. It emphasizes the frequency, duration, and proximity of interactions with robots or other stimuli. This approach is beneficial when studying group-level trends or analyzing differences across demographic variables.
To implement group-level observation effectively, researchers begin by employing video surveillance using wide-angle cameras or CCTV systems to record interactions across large environments. This setup facilitates the observation of broad behavioral trends such as how visitors approach or avoid robots. By capturing a comprehensive view of the environment, researchers can gather valuable data on interaction patterns within a diverse population.
Next, researchers apply the behavior coding scheme to categorize general actions, including approaching, avoiding, or interacting with the robot, and then conduct statistical analyses such as t-tests or an ANOVA to identify significant differences in behaviors across demographic groups, including factors like age, gender, or other relevant categories. These statistical methods are well-suited for quantifying broad interaction trends and comparing behavioral responses across different population segments, thereby providing a more comprehensive understanding of group-level engagement dynamics.
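To make this step concrete, the following minimal sketch applies a t-test and a one-way ANOVA to hypothetical per-participant interaction counts using SciPy; the group names and synthetic data are assumptions for illustration.

```python
# A minimal sketch of the group-comparison step in this guideline, assuming
# per-participant interaction counts grouped by a demographic variable.
import numpy as np
from scipy.stats import f_oneway, ttest_ind

rng = np.random.default_rng(1)
children = rng.poisson(4, size=50)   # synthetic interaction counts per visitor
adults = rng.poisson(2, size=50)
seniors = rng.poisson(2, size=50)

t_stat, t_p = ttest_ind(children, adults)          # two-group comparison
f_stat, f_p = f_oneway(children, adults, seniors)  # three or more groups
print(f"t-test p = {t_p:.4f}, ANOVA p = {f_p:.4f}")
```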

4.4.3. Individual-Level Observation for Detailed Interaction Dynamics

Individual-level observation offers a detailed lens for capturing the moment-by-moment dynamics of HRI. This approach focuses on subtle behaviors such as gestures, gaze shifts, and specific physical or verbal responses directed toward the robot. By analyzing these nuanced aspects of interaction, researchers can gain deeper insights into comprehensive interaction processes.
A key strength of this method lies in its use of time-series data to represent behavioral sequences. The process begins with the precise coding of observed behaviors, which are then converted into temporal data streams. This enables the systematic tracking of interaction dynamics over time, capturing shifts in engagement levels and transitions between distinct interaction states.
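A minimal sketch of this conversion step, assuming hypothetical timestamps, is as follows: coded behavior events are expanded into a per-second stream suitable for sequence modeling.

```python
# A minimal sketch of converting coded behavior events into a per-second
# temporal stream; the (code, start, end) tuples are hypothetical.
events = [("approach", 0, 6), ("none", 6, 9), ("touch", 9, 29)]

stream = []
for code, start, end in events:
    stream.extend([code] * (end - start))  # one label per second

print(stream[:10])
```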
In this study, we employed an HMM to analyze temporal engagement trajectories. The HMM was particularly well-suited to our context, as it captures latent engagement states and models the probabilistic transitions between them over time. This allowed us to identify meaningful patterns in user behavior that would have been difficult to detect using static or frequency-based methods.
However, it is important to note that the choice of temporal model should be guided by the nature of the data and the research goals. While the HMM proved advantageous for modeling sequential engagement states in our study, alternative models—such as Conditional Random Fields, Dynamic Bayesian Networks or deep-learning-based temporal models—may be more appropriate in other contexts.
Ultimately, the use of temporal modeling in individual-level observation enhances our ability to understand how engagement evolves, identify key turning points in interaction, and inform the design of adaptive systems. This approach contributes to more responsive and personalized robot behaviors, fostering richer and more effective HRIs in real-world environments.

5. Discussion

This study offers meaningful insights into HRI in a real-world setting, focusing on spontaneous interactions between visitors and a museum guide robot. The findings demonstrate the effectiveness of a dual-level observational approach—combining group-level and individual-level analyses—for capturing both overall behavioral trends and dynamic engagement processes.
The results have several implications for the design and implementation of service robots in public spaces. Key themes include the importance of behavior-driven, user-centered design; the value of adaptive interaction strategies; and the utility of time-based engagement modeling for supporting long-term interaction.

5.1. Behavior-Driven, User-Centered Design

A group-level analysis revealed that children and adolescents engaged more frequently and intensely with the robot compared to adults. This aligns with prior research showing higher responsiveness among younger users to interactive technologies [38,39,40,41]. While there is some evidence of gender-related differences in attitudes or perceptions toward robots, research on the influence of gender on actual interaction frequency or duration in real-world HRI settings remains limited. Most existing studies have focused on survey-based assessments rather than direct observation of interaction behaviors, and findings have been somewhat inconsistent [42,43].
These findings underscore the importance of user-centered design principles grounded in actual behavioral patterns. For younger users, playful and exploratory interaction elements may enhance engagement [44,45], while adults may prefer more functional and task-oriented interfaces [46]. This observation aligns with the theory of selective optimization with compensation, which suggests that older users prioritize technologies that enhance everyday functionality.
To increase inclusivity and engagement, robot behaviors should be tailored to demographic and situational characteristics. Flexible interfaces and modular interaction modes can help ensure that service robots remain accessible and engaging for a wide range of users in public environments.

5.2. Adaptive, Dynamic Interaction Strategies

Through the use of an HMM, this study identified five hidden states grouped into three distinct engagement levels (low, moderate, and high), capturing the dynamic nature of user interaction. Visitors categorized as HE showed more sustained and complex engagement behaviors, consistent with research linking prolonged interaction to increased curiosity and attention.
These findings emphasize the need for robots to adopt real-time adaptive strategies. For users in LE states, simple prompts such as greetings or visual stimuli could initiate engagement. In contrast, HE users may benefit from content-rich experiences like storytelling, games, or interactive tasks [47]. Tailoring interactions in this way can optimize user experience and maintain engagement over time.

5.3. Utility of Time-Based Engagement Modeling

This study highlights the value of time-based engagement modeling for understanding and supporting user interaction in real-world HRI contexts. By converting behavioral codes into time-series data, we were able to analyze not only the frequency of behaviors but also the transitions between engagement states and the duration of each state. This temporal perspective allows engagement to be treated as a dynamic, context-sensitive process rather than a fixed or binary outcome.
Such an approach provides a meaningful foundation for designing robots that can adapt to the ebb and flow of user behavior in real time. Integrating temporal modeling with intelligent sensing technologies could lead to the development of socially responsive robots capable of recognizing patterns, predicting disengagement, and modifying their behavior to sustain meaningful interaction.
Importantly, this dynamic modeling also offers practical implications for long-term engagement. In public environments such as museums—where users may interact with the same robot across multiple visits—robots equipped with memory and adaptive mechanisms can foster familiarity and build user trust over time [48,49]. Our findings, particularly the transitions from LE to HE, suggest that repeated positive experiences can deepen user–robot relationships. Gradually increasing interaction complexity and personalization can help sustain interest and promote continued engagement [50].
These insights extend beyond the museum context and are relevant to other domains such as healthcare, education, and customer service—settings where long-term engagement and trust are vital for effective and meaningful HRI.

5.4. Limitations

While this study provides valuable insights into HRI within a real-world public setting, several limitations should be acknowledged.
First, although the proposed approach effectively categorized visitor engagement into three levels, it did not investigate how these engagement states could be leveraged to inform real-time robot behavior. As a result, the findings remain limited to post hoc analysis and do not yet support adaptive interactions during live deployments. Future research should explore mechanisms for detecting engagement states in real time and dynamically adjusting robot behaviors to enhance user experience.
Second, the present study focused exclusively on short-term, single-session interactions, without examining how engagement may evolve over repeated encounters. Understanding long-term engagement trajectories is essential for designing robots that can foster sustained interest, trust, and acceptance—particularly in public environments where recurring user contact is common.
Third, the analysis was limited to age and gender as demographic variables. While these factors are relevant, a more inclusive approach would incorporate additional characteristics such as cultural background, educational attainment, prior experience with technology, and cognitive or physical diversity. Expanding the demographic scope would improve the generalizability of the findings and support broader applications in other domains, including healthcare, retail, and public services.

6. Conclusions

In this study, we proposed an observational approach to analyze HRI in real-world public environments, with a specific focus on a science museum setting. By combining group-level and individual-level behavioral observation methods, our approach captured both broad behavioral trends and subtle interactional nuances in visitor interactions with a museum guide robot. The group-level analysis revealed demographic differences, showing that children and adolescents engaged more actively than adults. At the individual level, we applied a GMM-HMM to classify visitor engagement into three distinct levels, enabling a dynamic view of how engagement evolves over time. This dual-level approach provided valuable insights into interaction dynamics and informed practical design guidelines for optimizing HRI in complex public environments.
Building on these findings, several promising directions for future research emerge. First, long-term engagement patterns—particularly for repeat or returning users—should be explored to understand how sustained interaction can influence user experience and robot effectiveness in public settings. Second, incorporating affective and physiological data using non-intrusive biometric sensing—such as gaze tracking, facial expression analysis, posture estimation, or audio-based crowd response detection—could deepen our understanding of both individual and group-level engagement. These multimodal signals can reveal internal and collective emotional states that are difficult to observe through behavioral cues alone, enabling more adaptive and context-aware HRI in public environments. Finally, given the substantial time and labor required for manual observational studies, future work should explore the use of artificial intelligence to assist in automating behavioral annotation and pattern recognition, thereby enhancing the efficiency and real-world applicability of observational research.

Author Contributions

All authors worked on conceptualizing the observation research framework for human–robot interaction. H.Y. and G.S. were responsible for the methodology, investigation, visualization, project administration, and the original draft. H.L. was involved in the software implementation and testing. M.-G.K. and S.K. served as the principal investigators for the study and revised the original manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korean Government through MSIT (Development of AI Technology for Early Screening of Infant/Child Autism Spectrum Disorders Based on Cognition of the Psychological Behavior and Response) under grant RS-2019-II190330. This work was also equally supported by the Korea Evaluation Institute of Industrial Technology (KEIT) grant funded by the Korean Government through MOTIE (Development of Behavior-Oriented HRI AI Technology for Long-Term Interaction between Service Robots and Users) under grant 20023495.

Institutional Review Board Statement

This study was conducted in accordance with the Declaration of Helsinki and was approved by the Institutional Review Board of the Korea Institute of Robotics and Technology Convergence (KIRO-2023-IRB-01) on 29 March 2023.

Informed Consent Statement

Informed consent was obtained from all the participants involved in this study.

Data Availability Statement

All data are available upon request.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

  1. Gehle, R.; Pitsch, K.; Dankert, T.; Wrede, S. How to open an interaction between robot and museum visitor? Strategies to establish a focused encounter in HRI. In Proceedings of the 12th ACM/IEEE International Conference on Human-Robot Interaction (HRI 2017), Vienna, Austria, 6–9 March 2017; pp. 187–195. [Google Scholar]
  2. Gasteiger, N.; Hellou, M.; Ahn, H.S. Deploying social robots in museum settings: A quasi-systematic review exploring purpose and acceptability. Int. J. Adv. Robot. Syst. 2021, 18, 1–13. [Google Scholar] [CrossRef]
  3. Eksiri, A.; Kimura, T. Restaurant service robots development in Thailand and their real environment evaluation. J. Robot. Mechatronics 2015, 27, 91–102. [Google Scholar] [CrossRef]
  4. Mintrom, M.; Sumartojo, S.; Kulić, D.; Tian, L.; Carreno-Medrano, P.; Allen, A. Robots in public spaces: Implications for policy design. Policy Des. Pract. 2021, 5, 123–139. [Google Scholar] [CrossRef]
  5. Borghi, M.; Mariani, M.M. Asymmetrical influences of service robots’ perceived performance on overall customer satisfaction: An empirical investigation leveraging online reviews. J. Travel Res. 2023, 63, 1086–1111. [Google Scholar] [CrossRef]
  6. Zimmerman, M.; Bagchi, S.; Marvel, J.; Nguyen, V. An analysis of metrics and methods in research from human-robot interaction conferences, 2015–2021. In Proceedings of the 17th ACM/IEEE International Conference on Human-Robot Interaction (HRI 2022), Sapporo, Japan, 7–10 March 2022; pp. 644–648. [Google Scholar]
  7. Leichtmann, B.; Nitsch, V.; Mara, M. Crisis ahead? Why human-robot interaction user studies may have replicability problems and directions for improvement. Front. Robot. AI 2022, 9, 838116. [Google Scholar] [CrossRef] [PubMed]
  8. Hoffman, G.; Zhao, X. A primer for conducting experiments in human–robot interaction. ACM Trans. Hum.-Robot Interact. 2020, 10, 1–31. [Google Scholar] [CrossRef]
  9. Hauser, E.; Chan, Y.-C.; Modak, S.; Biswas, J.; Hart, J. Vid2Real HRI: Align video-based HRI study designs with real-world settings. In Proceedings of the 33rd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN 2024), Pasadena, CA, USA, 26–30 August 2024; pp. 542–548. [Google Scholar]
  10. Bartneck, C.; Belpaeme, T.; Eyssel, F.; Kanda, T.; Keijsers, M.; Šabanović, S. Research methods. In Human-Robot Interaction: An Introduction; Cambridge University Press: Cambridge, UK, 2020; pp. 126–160. ISBN 978-1-108-47310-4. [Google Scholar]
  11. Bu, F.; Fischer, K.; Ju, W. Making sense of robots in public spaces: A study of trash barrel robots. J. Hum.-Robot Interact. 2025, 1–20. [Google Scholar] [CrossRef]
  12. Koike, A.; Okafuji, Y.; Hoshimure, K.; Baba, J. What drives you to interact?: The role of user motivation for a robot in the wild. In Proceedings of the 20th ACM/IEEE International Conference on Human-Robot Interaction (HRI 2025), Boulder, CO, USA, 10–13 March 2025; pp. 183–192. [Google Scholar]
  13. Chang, W.-L.; Šabanovic, S.; Huber, L. Observational study of naturalistic interactions with the socially assistive robot PARO in a nursing home. In Proceedings of the 23rd IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN 2014), Edinburgh, UK, 25–29 August 2014; pp. 294–299. [Google Scholar]
  14. Zawieska, K.; Hannibal, G. Towards a conceptualisation and critique of everyday life in HRI. Front. Robot. AI 2023, 10, 1212034. [Google Scholar] [CrossRef]
  15. Schrum, M.; Ghuy, M.; Hedlund-Botti, E.; Natarajan, M.; Johnson, M.; Gombolay, M. Concerning trends in likert scale usage in human-robot interaction: Towards improving best practices. J. Hum.-Robot Interact. 2023, 12, 1–32. [Google Scholar] [CrossRef]
  16. Rosenbaum, P.R. Observation and Experiment: An Introduction to Causal Inference; Harvard University Press: Cambridge, MA, USA, 2017; ISBN 978-0674975576. [Google Scholar]
  17. Babel, F.; Kraus, J.; Baumann, M. Findings from a qualitative field study with an autonomous robot in public: Exploration of user reactions and conflicts. Int. J. Soc. Robot. 2022, 14, 1625–1655. [Google Scholar] [CrossRef]
  18. Daczo, L.-D.; Kalova, L.; Bonita, K.L.F.; Lopez, M.D.; Rehm, M. Interaction initiation with a museum guide robot—From the lab into the field. In Proceedings of the 18th IFIP TC13 International Conference on Human-Computer Interaction (INTERACT), Bari, Italy, 30 August–3 September 2021; pp. 438–447. [Google Scholar]
  19. Shiomi, M.; Kanda, T.; Ishiguro, H.; Hagita, N. A larger audience, please!—Encouraging people to listen to a guide robot. In Proceedings of the 5th ACM/IEEE International Conference on Human-Robot Interaction (HRI 2010), Osaka, Japan, 2–5 March 2010; pp. 31–38. [Google Scholar]
  20. Heerink, M.; Kröse, B.; Evers, V.; Wielinga, B. Assessing acceptance of assistive social agent technology by older adults: The Almere model. Int. J. Soc. Robot. 2010, 2, 361–375. [Google Scholar] [CrossRef]
  21. Rettinger, L.; Fürst, A.; Kupka-Klepsch, E.; Babel, F.; Baumann, M. Observing the interaction between a socially-assistive robot and residents in a nursing home. Int. J. Soc. Robot. 2024, 16, 403–413. [Google Scholar] [CrossRef]
  22. Matsumoto, S.; Washburn, A.; Riek, L.D. A framework to explore proximate human-robot coordination. ACM Trans. Hum.-Robot Interact. 2022, 11, 1–34. [Google Scholar] [CrossRef]
  23. Diehl, M.; Ramirez-Amaro, K. Why did I fail? A causal-based method to find explanations for robot failures. IEEE Robot. Autom. Lett. 2022, 7, 8925–8932. [Google Scholar] [CrossRef]
  24. Diehl, M.; Ramirez-Amaro, K. A causal-based approach to explain, predict and prevent failures in robotic tasks. Robot. Auton. Syst. 2023, 162, 104383. [Google Scholar] [CrossRef]
  25. Boos, A.; Herzog, O.; Reinhardt, J.; Bengler, K.; Zimmermann, M. A compliance–reactance framework for evaluating human-robot interaction. Front. Robot. AI 2022, 9, 733504. [Google Scholar] [CrossRef]
  26. Kim, S.; Hirokawa, M.; Matsuda, S.; Funahashi, A.; Suzuki, K. Smiles as a signal of prosocial behaviors toward the robot in the therapeutic setting for children with autism spectrum disorder. Front. Robot. AI 2021, 8, 599755. [Google Scholar] [CrossRef]
  27. Oliveira, R.; Arriaga, P.; Paiva, A. Human-robot interaction in groups: Methodological and research practices. Multimodal Technol. Interact. 2021, 5, 59. [Google Scholar] [CrossRef]
  28. Hall, E.T. A system for the notation of proxemic behavior. Am. Anthropol. 1963, 65, 1003–1026. [Google Scholar] [CrossRef]
  29. Hayes, A.T.; Hughes, C.E.; Bailenson, J. Identifying and coding behavioral indicators of social presence with a social presence behavioral coding system. Front. Virtual Real. 2022, 3, 773448. [Google Scholar] [CrossRef]
  30. Fischer, K.; Yang, S.; Mok, B.; Maheshwari, R.; Sirkin, D.; Ju, W. Initiating interactions and negotiating approach: A robotic trash can in the field. In Proceedings of the Association for the Advancement of Artificial Intelligence Spring Symposium (AAAI 2015), Palo Alto, CA, USA, 23–25 March 2015; pp. 10–16. [Google Scholar]
  31. Andrés, A.; Pardo, D.E.; Díaz, M.; Angulo, C. New instrumentation for human robot interaction assessment based on observational methods. J. Ambient Intell. Smart Environ. 2015, 7, 397–413. [Google Scholar] [CrossRef]
32. Rabiner, L.; Juang, B. An introduction to hidden Markov models. IEEE ASSP Mag. 1986, 3, 4–16. [Google Scholar] [CrossRef]
33. Panuccio, A.; Bicego, M.; Murino, V. A hidden Markov model-based approach to sequential data clustering. In Proceedings of the Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition (SSPR/SPR 2002), Windsor, ON, Canada, 6–9 August 2002; pp. 734–742. [Google Scholar]
  34. Sánchez, V.G.; Lysaker, O.M.; Skeie, N.-O. Human behaviour modelling for welfare technology using hidden Markov models. Pattern Recognit. Lett. 2020, 137, 71–79. [Google Scholar] [CrossRef]
  35. Le, H.; Hoch, J.E.; Ossmy, O.; Adolph, K.E.; Fern, X.; Fern, A. Modeling infant free play using hidden Markov models. In Proceedings of the IEEE International Conference on Development and Learning (ICDL 2021), Beijing, China, 23–26 August 2021; pp. 1–6. [Google Scholar]
  36. Gupta, D.; Gupta, M.; Bhatt, S.; Tosun, A.S. Detecting anomalous user behavior in remote patient monitoring. In Proceedings of the 22nd IEEE International Conference on Information Reuse and Integration for Data Science (IRI 2021), Las Vegas, NV, USA, 6–8 August 2021; pp. 33–40. [Google Scholar]
  37. Cheng, X.; Huang, B. CSI-based human continuous activity recognition using GMM–HMM. IEEE Sens. J. 2022, 22, 18709–18717. [Google Scholar] [CrossRef]
  38. Flanagan, T.; Wong, G.; Kushnir, T. The minds of machines: Children’s beliefs about the experiences, thoughts, and morals of familiar interactive technologies. Dev. Psychol. 2023, 59, 1017–1031. [Google Scholar] [CrossRef]
  39. Lewis, K.L.; Kervin, L.K.; Verenikina, I.; Howard, S.J. Young children’s at-home digital experiences and interactions: An ethnographic study. Front. Educ. 2024, 9, 1392379. [Google Scholar] [CrossRef]
  40. Hemminghaus, J.; Kopp, S. Towards adaptive social behavior generation for assistive robots using reinforcement learning. In Proceedings of the 12th ACM/IEEE International Conference on Human-Robot Interaction (HRI 2017), Vienna, Austria, 6–9 March 2017; pp. 332–340. [Google Scholar]
  41. Wróbel, A.; Źróbek, K.; Schaper, M.-M.; Zguda, P.; Indurkhya, B. Age-appropriate robot design: In-the-wild child–robot interaction studies of perseverance styles and robot’s unexpected behavior. In Proceedings of the 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN 2023), Busan, Republic of Korea, 28–31 August 2023; pp. 1451–1458. [Google Scholar]
  42. Abel, M.; Kuz, S.; Patel, H.J.; Petruck, H.; Schlick, C.M.; Pellicano, A.; Binkofski, F.C. Gender effects in observation of robotic and humanoid actions. Front. Psychol. 2020, 11, 797. [Google Scholar] [CrossRef]
  43. Zand, M.; Kodur, K.; Banerjee, S.; Banerjee, N.; Kyrarini, M. Examining diverse gender dynamics in human-robot interaction: Trust, privacy and safety perceptions. In Proceedings of the 17th International Conference on Pervasive Technologies Related to Assistive Environments (PETRA 2024), Crete, Greece, 26–28 June 2024; pp. 74–79. [Google Scholar]
  44. Yin, S.; Kasraian, D.; Wang, G.; Evers, S.; van Wesemael, P. Co-designing an ideal nature-related digital tool with children: An exploratory study from the Netherlands. Environ. Behav. 2025, 56, 739–775. [Google Scholar] [CrossRef]
  45. Neerincx, A.; Veldhuis, D.; Masthoff, J.M.F.; de Graaf, M.M.A. Co-designing a social robot for child health care. Int. J. Child-Comput. Interact. 2023, 38, 100615. [Google Scholar] [CrossRef]
  46. Chu, L.; Chen, H.W.; Cheng, P.Y.; Ho, P.; Weng, I.T.; Yang, P.L.; Chien, S.E.; Tu, Y.C.; Yang, C.C.; Wang, T.M.; et al. Identifying features that enhance older adults’ acceptance of robots: A mixed methods study. Gerontology 2019, 65, 441–450. [Google Scholar] [CrossRef]
  47. Finkel, M.; Krämer, N.C. The robot that adapts too much? An experimental study on users’ perceptions of social robots’ behavioral and persona changes between interactions with different users. Comput. Hum. Behav. Artif. Hum. 2023, 1, 100018. [Google Scholar] [CrossRef]
  48. Matcovich, B.; Gena, C.; Vernero, F. How the personality and memory of a robot can influence user modeling in human-robot interaction. In Proceedings of the 32nd ACM Conference on User Modeling, Adaptation and Personalization (UMAP 2024), Cagliari, Italy, 1–4 July 2024; pp. 136–141. [Google Scholar]
  49. Maroto-Gómez, M.; Villarroya, S.M.; Malfaz, M.; Castro-González, Á.; Castillo, J.C.; Salichs, M.Á. A preference learning system for the autonomous selection and personalization of entertainment activities during human-robot interaction. In Proceedings of the IEEE International Conference on Development and Learning (ICDL 2022), London, UK, 12–15 September 2022; pp. 343–348. [Google Scholar]
  50. Joshi, S.; Malavalli, A.; Rao, S. From multimodal features to behavioural inferences: A pipeline to model engagement in human-robot interactions. PLoS ONE 2023, 18, e0285749. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Results of the K-fold cross-validation likelihood used to determine the optimal number of hidden states.
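The model-selection step summarized in Figure 1 can be reproduced with a short cross-validation loop. The following is a minimal sketch, not the authors' code: it assumes the per-visitor behavior sequences have already been encoded as numeric feature arrays, and the library choice (hmmlearn) and hyperparameters (e.g., n_mix=2) are illustrative assumptions.

```python
# Minimal sketch: choose the HMM state count by K-fold cross-validated
# log-likelihood. Assumes `sequences` is a list of (T_i, n_features)
# NumPy arrays, one per visitor; hmmlearn and the hyperparameters below
# are illustrative assumptions, not the authors' exact setup.
import numpy as np
from sklearn.model_selection import KFold
from hmmlearn.hmm import GMMHMM

def cv_loglik(sequences, n_states, n_folds=5, seed=0):
    """Mean held-out log-likelihood for a GMM-HMM with n_states states."""
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in kf.split(np.arange(len(sequences))):
        train = [sequences[i] for i in train_idx]
        test = [sequences[i] for i in test_idx]
        model = GMMHMM(n_components=n_states, n_mix=2,
                       covariance_type="diag", n_iter=100,
                       random_state=seed)
        model.fit(np.vstack(train), lengths=[len(s) for s in train])
        scores.append(model.score(np.vstack(test),
                                  lengths=[len(s) for s in test]))
    return float(np.mean(scores))

# Pick the state count with the highest held-out likelihood, e.g.:
# best_k = max(range(2, 9), key=lambda k: cv_loglik(seqs, k))
```

Higher held-out likelihood favors the state count that generalizes best to unseen visitors; per Figure 3, the estimated model here has five states.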
Figure 2. A heatmap of the transition matrix.
Figure 3. Transition diagram for the estimated five-state GMM-HMM.
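The heatmap in Figure 2 and the diagram in Figure 3 are two views of the same estimated transition matrix. Below is a minimal plotting sketch, assuming a fitted hmmlearn GMMHMM bound to the name model (an assumption carried over from the previous sketch):

```python
# Sketch: visualize the learned transition matrix as a heatmap.
# Assumes `model` is a fitted hmmlearn GMMHMM (see previous sketch).
import matplotlib.pyplot as plt

A = model.transmat_  # (n_states, n_states); each row sums to 1
fig, ax = plt.subplots()
im = ax.imshow(A, cmap="viridis", vmin=0.0, vmax=1.0)
ax.set_xlabel("To state")
ax.set_ylabel("From state")
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        ax.text(j, i, f"{A[i, j]:.2f}", ha="center", va="center",
                color="white")
fig.colorbar(im, ax=ax, label="Transition probability")
plt.show()
```

Large diagonal entries indicate that visitors tend to remain in the same engagement state between time steps; the off-diagonal entries correspond to the arrows drawn in Figure 3.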
Figure 4. Observation distribution, where each row represents a state and each column indicates a feature associated with that state.
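The per-state observation distributions in Figure 4 can be summarized from the fitted emission parameters. A short sketch, again under the assumption that model is the fitted GMM-HMM, collapses each state's Gaussian mixture into one weighted mean per feature:

```python
# Sketch: summarize each state's emission distribution by the
# mixture-weighted mean of every feature. Assumes a fitted GMMHMM
# `model` with weights_ of shape (n_states, n_mix) and means_ of
# shape (n_states, n_mix, n_features).
import numpy as np

state_means = np.einsum("sm,smf->sf", model.weights_, model.means_)
for s, row in enumerate(state_means):
    print(f"State {s}: " + ", ".join(f"{v:.2f}" for v in row))
```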
Figure 5. Group- and individual-level observational approach for exploring HRI.
Table 1. Types of visitor behavior toward robots observed during the initial identification phase. Each behavior type is paired with a CCTV snapshot in the original table.
- A person avoiding the robot after recognizing where it is.
- A person passing by the robot without knowing where it is.
- A person greeting the robot by waving their hands.
- A person touching the screen after following the robot as it moves.
- A person touching the screen after approaching the robot.
- Two people touching the screen after approaching the robot.
- A person pointing to the screen so that another person can touch it with them.
Table 2. Behavioral grammar.
Grammar 1: When the robot performs (action), the visitor (gazes/directs head) while (maintaining distance).
Grammar 2: (Interaction attempt) is made.
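The grammar in Table 2 is effectively a slot-filling template for describing each observed episode. A hypothetical instantiation is sketched below; the slot names and example fillers are ours, for illustration only:

```python
# Hypothetical instantiation of the Table 2 grammar. The slot names
# (action, gaze, distance, attempt) are illustrative assumptions.
GRAMMAR_1 = ("When the robot performs {action}, the visitor {gaze} "
             "while {distance}.")
GRAMMAR_2 = "{attempt} is made."

print(GRAMMAR_1.format(action="a greeting motion",
                       gaze="gazes at it",
                       distance="maintaining distance"))
print(GRAMMAR_2.format(attempt="A touch on the screen"))
```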
Table 3. Behavior coding scheme.
Physical proximity:
- AP (approach): Look at the robot's location, approach it, and stop in front of it.
- P (pass): When the robot is stationary, look at it and immediately walk past it.
- AV (avoid): When the robot is moving, step aside in the direction it is heading.
- F (follow): Follow the robot as it moves in the same direction.
Interaction attempts:
- T (touch): Touch the robot's screen or body.
- G (gesture): Make gestures toward the robot (e.g., waving, nodding, or raising the arms).
- N (none): Remain still and do nothing to interact with the robot.
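Each coded time window therefore carries one physical-proximity code and one interaction-attempt code. The paper does not specify the numeric encoding fed to the HMM, so the following mapping is purely an assumption, sketched to show how the Table 3 codes could become observation vectors:

```python
# Hedged sketch: map Table 3 codes to numeric observations. The
# integer encoding below is an assumption for illustration, not the
# authors' actual feature scheme.
PROXIMITY = {"AP": 0, "P": 1, "AV": 2, "F": 3}
ATTEMPT = {"T": 0, "G": 1, "N": 2}

def encode(proximity_code: str, attempt_code: str) -> list:
    """One observation vector per coded time window."""
    return [PROXIMITY[proximity_code], ATTEMPT[attempt_code]]

# e.g., a visitor who approaches the robot and touches its screen:
# encode("AP", "T") -> [0, 0]
```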
Table 4. Intercoder reliability test for two coders.
- Percent agreement: 84.192%
- Scott's Pi: 0.7998
- Cohen's Kappa: 0.7998
- Krippendorff's Alpha (nominal): 0.7998
- Number of agreements: 980
- Number of disagreements: 184
- Number of cases: 1164
- Number of decisions: 2328
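The Table 4 figures are internally consistent: 980 agreements over 1164 cases gives 980/1164 × 100 ≈ 84.19% agreement, and two coders making one decision per case yields 2 × 1164 = 2328 decisions. Below is a sketch of the chance-corrected statistics, assuming c1 and c2 are the two coders' equal-length code lists (cohen_kappa_score is scikit-learn's; Scott's Pi is computed by hand):

```python
# Sketch: two-coder reliability statistics. Assumes c1 and c2 are
# equal-length sequences of nominal behavior codes.
import numpy as np
from sklearn.metrics import cohen_kappa_score

def percent_agreement(c1, c2):
    c1, c2 = np.asarray(c1), np.asarray(c2)
    return 100.0 * float(np.mean(c1 == c2))

def scotts_pi(c1, c2):
    c1, c2 = np.asarray(c1), np.asarray(c2)
    p_o = float(np.mean(c1 == c2))                       # observed agreement
    _, counts = np.unique(np.concatenate([c1, c2]), return_counts=True)
    p = counts / counts.sum()                            # pooled marginals
    p_e = float(np.sum(p ** 2))                          # chance agreement
    return (p_o - p_e) / (1.0 - p_e)

# kappa = cohen_kappa_score(c1, c2)  # ≈ 0.80 for the Table 4 data
```

That Scott's Pi, Cohen's Kappa, and Krippendorff's Alpha all equal 0.7998 here is expected rather than coincidental: the three statistics converge (up to a small-sample correction for Alpha) when the two coders' marginal code distributions are nearly identical.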