Article

Design and Evaluation of a Sound-Driven Robot Quiz System with Fair First-Responder Detection and Gamified Multimodal Feedback

Department of Mathematics and Natural Sciences, Humboldt University of Berlin, 10117 Berlin, Germany
*
Author to whom correspondence should be addressed.
Robotics 2025, 14(9), 123; https://doi.org/10.3390/robotics14090123
Submission received: 16 July 2025 / Revised: 18 August 2025 / Accepted: 28 August 2025 / Published: 31 August 2025
(This article belongs to the Section Educational Robotics)

Abstract

This paper presents the design and evaluation of a sound-driven robot quiz system that enhances fairness and engagement in educational human–robot interaction (HRI). The system integrates a real-time sound-based first-responder detection mechanism with gamified multimodal feedback, including verbal cues, music, gestures, points, and badges. Motivational design followed the Octalysis framework, and the system was evaluated using validated scales from the Technology Acceptance Model (TAM), the Intrinsic Motivation Inventory (IMI), and the Godspeed Questionnaire. An experimental study with 32 university students compared the proposed system, which combines multimodal feedback with sound-driven first-responder detection, to a baseline using sequential turn-taking and verbal-only feedback. Results revealed significantly higher scores for the experimental group across perceived usefulness (M = 4.32 vs. 3.05, d = 2.14), perceived ease of use (M = 4.03 vs. 3.17, d = 1.43), behavioral intention (M = 4.24 vs. 3.28, d = 1.62), and motivation (M = 4.48 vs. 3.39, d = 3.11). The sound-based first-responder detection achieved 97.5% accuracy and was perceived as fair and intuitive. These findings highlight the impact of fairness, motivational feedback, and multimodal interaction on learner engagement. The proposed system offers a scalable model for designing inclusive and engaging educational robots that promote active participation through meaningful and enjoyable interactions.

1. Introduction

Socially Assistive Robots (SARs) have gained increasing traction in educational environments, particularly for their ability to promote learner motivation, engagement, and collaboration through embodied interaction and social presence [1,2]. Among the most common use cases is the deployment of robot-led quiz games in classrooms, which have shown potential for improving attention, emotional engagement, and peer collaboration. However, many existing systems rely on sequential turn-taking and unimodal feedback mechanisms, which often lack fairness and real-time responsiveness, two critical elements in competitive group learning scenarios [3,4].
In team-based classroom activities, particularly quiz competitions, determining the first responder fairly is a persistent challenge [5,6]. Traditional quiz formats often rely on strict turn orders or simple button-press systems, which do not accurately capture who responded first in real-time or allow for expressive interaction. Emerging work in Human–Robot Interaction (HRI) suggests that non-verbal sound input recognition, such as clapping, whistling, or tapping, can be used as an intuitive and fair method of first responder detection [7,8]. However, this has rarely been integrated into full-fledged educational robotics systems, particularly with real-time multimodal feedback mechanisms.
Equally important is the design of robot behavior and feedback [9]. Multimodal interaction, including gestures, music, verbal expressions, and visual animations, can significantly enhance learners’ emotional engagement and social connection with robots [9,10,11,12]. When these elements are embedded into a game-based context and aligned with structured motivational frameworks such as the Octalysis framework [13], they can activate psychological drivers such as accomplishment, social influence, and curiosity, all of which are essential for sustained learner motivation [14].
To ensure meaningful evaluation of such systems, this study integrates validated instruments from the Technology Acceptance Model (TAM) [15], the Intrinsic Motivation Inventory (IMI) [16], and the Godspeed Questionnaire Series [17]. These tools offer multi-dimensional insights into user perceptions of usefulness, ease of use, motivation, robot likability, and behavioral intention. This study presents and compares two interactive educational systems:
  • Artefact A (Experimental Prototype): A robot-led quiz system featuring sound-driven first responder detection (using cross-correlation), multimodal feedback (gesture, music, speech), and gamification elements (points, badges).
  • Artefact B (Baseline Prototype): A robot-led quiz system with sequential turn-taking, verbal-only feedback, and no gamification.
To evaluate the impact of these features, we conducted a between-subject experiment with university students and assessed their responses using validated subscales from TAM, IMI, and Godspeed. The study addresses the following research question and hypotheses:
  • RQ: How does gamified multimodal feedback combined with sound-based first responder detection compare with sequential turn-taking and verbal-only feedback during robot-led quiz activities involving two competing teams, in terms of perceived usefulness, ease of use, motivation, social presence, and behavioral intention?
H1. 
Students’ perceived usefulness of Artefact A is significantly higher than that of Artefact B.
H2. 
Students’ perceived ease of use of Artefact A is significantly higher than that of Artefact B.
H3. 
Students’ motivation (interest/enjoyment and competence) while using Artefact A is significantly higher than that of Artefact B.
H4. 
Students’ perceived social presence of the robot (likeability and anthropomorphism) in Artefact A is significantly higher than that of Artefact B.
H5. 
Students’ behavioral intention to use robot-assisted systems is significantly higher when using Artefact A than Artefact B.
By combining response fairness, motivational design, and system usability, our work contributes to the growing field of human-centered educational robotics, providing actionable design and evaluation guidelines for future robot-assisted learning environments.

2. Related Work

2.1. Educational Robots in Learning Environments

Social robots have become increasingly prevalent in educational contexts due to their ability to engage learners through embodied interaction, social presence, and adaptive communication strategies [1]. Robots such as Pepper and NAO have been employed for vocabulary acquisition, collaborative tasks, and quiz-based learning in both K–12 and higher education settings [2,3,18,19]. These systems are particularly effective when they offer personalized feedback, group facilitation, and interactive quizzes, making them ideal for promoting motivation and attention in learners [4,20].
However, most existing robot-based learning systems rely on verbal interaction or sequential turn-taking [21], which limits natural competition and does not scale well to multi-student group settings. This paper addresses this gap by integrating real-time sound-based responder detection to support fair, competitive participation.

2.2. Multimodal Interaction and Feedback in HRI

Multimodal feedback combining speech, gestures, audio cues, and visual signals has been shown to improve both task performance and user engagement in HRI systems [10,22]. In educational settings, multimodal robots increase children’s enjoyment and task recall [23,24], while gesture-augmented interactions with robots enhance social presence and perceived intelligence [25]. Expressive feedback (e.g., music and dancing after correct answers) has also been linked to stronger emotional bonding and memory retention [26,27,28].
Our system leverages these findings by combining gesture-based movement, auditory music cues, and verbal praise to deliver affective and motivational feedback during a quiz game. This creates a richer experience than unimodal systems and is further enhanced through sound-driven input recognition.

2.3. Fairness and First Responder Detection in Group-Based HRI

Fairness in group interactions is critical for sustained engagement and trust in educational technology [29]. In multi-student contexts, the perception of fairness (who gets to answer first, and whether the robot treats all students equally) has been shown to affect motivation and participation [30]. Yet, few systems offer transparent or real-time responder detection based on ambient input.
Some prior work uses hand-raising detection or button presses, but these require additional hardware or physical constraints [31]. Our approach using cross-correlation of non-verbal buzzer sounds enables a low-cost, intuitive, and fair way to identify who responded first, even in noisy environments. This supports a transparent fairness mechanism embedded in the game logic.

2.4. Gamification and the Octalysis Framework

Gamification is a key design strategy for improving learning outcomes by integrating motivational elements into educational systems [32,33,34,35]. Frameworks like Octalysis [13] provide a comprehensive structure for mapping features to psychological drives such as accomplishment, ownership, unpredictability, and social influence.
In robot-based learning, gamification has shown positive effects on user motivation, attitude, and acceptance [36,37]. However, few studies have applied a structured gamification framework like Octalysis to systematically design robot behaviors and feedback mechanisms. In this work, we explicitly align our system’s features (e.g., badges, team scores, expressive dance feedback) with Octalysis drives to maximize user motivation and engagement.

2.5. Evaluation Through Multiscale HRI Instruments

Validating educational robot systems requires multidimensional measurement tools. The Technology Acceptance Model (TAM) [Appendix A.1] has been widely used to assess user attitudes and behavioral intention in HRI [15]. The Intrinsic Motivation Inventory (IMI) [Appendix A.2] captures psychological factors such as enjoyment, competence, and pressure [16], while the Godspeed Questionnaire Series [Appendix A.3] evaluates robot anthropomorphism, animacy, likeability, and perceived intelligence [17].
By combining these scales, this study ensures a holistic understanding of user perceptions toward both the robot system and its interaction design.

3. System Design

The proposed system enables a Pepper robot to facilitate a group-based quiz game by detecting the first responder using non-verbal sound inputs, verifying answers via verbal input, and providing multimodal feedback through gestures, music, and speech. The architecture was designed to ensure real-time interaction fairness, personalization, and engagement, with modular components for detection, interaction, and gamified feedback.

3.1. System Architecture

The system is composed of two main modules:
  • A Python (3.12)-based backend responsible for sound order detection, template matching, and interaction logic
  • A Kotlin (2.1)-based Pepper application using QiSDK ASR for speech recognition, gesture control, and verbal output
The sound detection module uses cross-correlation between live audio and pre-recorded templates to determine the first responder in real-time. These templates are created using recordable buzzers, which allow students to generate any sound (e.g., clap, whistle, tap). Before each session, the system enables users to record and test sound templates to ensure robustness and minimize false detections.
The detected responder ID is then forwarded to the robot client via WebSocket, which manages the game flow and multimodal feedback as shown in Figure 1.
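To illustrate this handoff, the sketch below shows how the backend might serialize a first-responder event for the robot client and how the client would parse it. The paper does not specify the wire format, so all field names here are hypothetical; only the message-building logic is shown, with the actual WebSocket transport left out.

```python
import json
import time


def make_responder_message(responder_id: str, score: float, lag_s: float) -> str:
    """Serialize a first-responder event for transmission to the robot client.

    Field names ("type", "responder_id", etc.) are illustrative assumptions,
    not the paper's actual protocol.
    """
    return json.dumps({
        "type": "first_responder",
        "responder_id": responder_id,
        "correlation_score": round(score, 3),  # peak cross-correlation score
        "lag_seconds": round(lag_s, 4),        # lag of the detected peak
        "timestamp": time.time(),
    })


def parse_responder_message(raw: str) -> str:
    """On the robot side, extract the responder ID from the received message."""
    event = json.loads(raw)
    assert event["type"] == "first_responder"
    return event["responder_id"]
```

In practice the serialized string would be pushed over the open WebSocket connection to the Kotlin client, which then drives the game flow and multimodal feedback.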

3.2. Sound-Based First Responder Detection

The system uses mono-channel microphone input to capture overlapping signals from the physical buzzers. After amplitude filtering, it performs cross-correlation between the incoming signal and the stored sound templates. The sound with the highest peak score and minimal lag is identified as the first responder. To ensure fairness, both buzzers must be placed at an equal distance from the microphone, as shown in Figure 2.
The system includes amplitude thresholding and a noise-rejection window to ignore irrelevant peaks. In tests involving ambient classroom noise, detection precision dropped slightly (94.3%) when students spoke over each other or clapped off-cue. However, the system continued to correctly detect the majority of first responders based on the earliest valid cross-correlation peak. We instructed students to remain silent during designated buzzing windows and verified detection accuracy with post-session video recordings from mobile phones. In three cases, students used their phones to confirm that the detected first responder matched the true order.
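The detection logic described above can be sketched as follows. This is a minimal illustration assuming discrete-time waveforms and a plain (un-normalized) sliding-dot-product cross-correlation; the production system's filtering, normalization, and threshold values are not specified in the paper, so the `threshold` parameter here is an assumption. Among all templates whose best correlation score exceeds the threshold, the one whose peak occurs at the earliest lag is declared the first responder.

```python
def cross_correlate(signal, template):
    """Return (best_score, best_lag) for a sliding dot product of
    `template` against `signal`. Pure-Python illustration only."""
    n, m = len(signal), len(template)
    best_score, best_lag = 0.0, 0
    for lag in range(n - m + 1):
        score = sum(signal[lag + i] * template[i] for i in range(m))
        if score > best_score:
            best_score, best_lag = score, lag
    return best_score, best_lag


def detect_first_responder(signal, templates, threshold=0.5):
    """Return the ID of the template with the earliest sufficiently strong
    correlation peak, or None if no template exceeds the threshold.

    `templates` maps a participant/team ID to its recorded buzzer waveform.
    """
    hits = []
    for pid, tpl in templates.items():
        score, lag = cross_correlate(signal, tpl)
        if score >= threshold:
            # Sort by earliest lag first; break ties by higher score.
            hits.append((lag, -score, pid))
    if not hits:
        return None
    hits.sort()
    return hits[0][2]
```

A real implementation would operate on a streaming audio buffer and use normalized correlation to be robust to volume differences; this sketch only conveys the earliest-valid-peak decision rule.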
As an alternative, Bluetooth-based buzzers could provide lower latency and reduced ambiguity. We chose sound-based input because it avoids the setup complexity of pairing multiple Bluetooth devices and supports expressive personalization (students record their unique buzzer tone). It also adds a game-like physical interaction consistent with multimodal engagement [38,39]. Future versions may integrate multi-modal input fusion (sound, Bluetooth, or gesture) for increased robustness.

3.3. Gamification via Octalysis Integration

The design of the robot-supported quiz game was guided by the Octalysis gamification framework, which comprises eight core motivational drives. We systematically applied this framework when designing the gamified elements of the robot quiz system. Table 1 below evaluates the strength of each core drive and compares its presence in the experimental and control systems.
This detailed comparison shows that the control group supports fewer and weaker gamification drives, while the experimental group activates multiple strong psychological motivators using robot-enabled features.

3.4. Feedback and Interaction Modalities

The quiz game algorithm is shown as a flowchart in Figure 3.
At the beginning of the session, Pepper initiates interaction by greeting the participants using a combination of speech, gesture, and background music to establish rapport. It then verbally explains the rules of the quiz game and awaits participant confirmation to proceed. Upon receiving confirmation, Pepper plays a brief “Ding” sound to signal the start of the game. A quiz question is displayed on the robot’s chest-mounted tablet, and Pepper reads the question aloud while simultaneously playing a “Dong” sound and altering the tablet screen color to prompt participants to activate their buzzers.
Once the system recognizes the first responder through sound-based detection, Pepper verbally announces the team or individual who responded first, accompanied by a pointing gesture toward the identified participant(s). Subsequently, the robot listens for a verbal answer. Upon recognition of the verbal response and determination of its correctness, Pepper delivers multimodal feedback consisting of speech, gestures, and short music cues, and updates the corresponding team’s score.
This process continues iteratively for each quiz question. Upon completion of all questions, Pepper announces the winning team, delivers congratulatory feedback through a celebratory dance and upbeat music, and awards visual ranking badges as a form of recognition and motivation.
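The iterative game flow described above can be summarized as a simple loop. The sketch below is a skeleton only: `detect_first_responder`, `listen_for_answer`, and `say` are hypothetical stand-ins for the sound-detection backend, the robot's ASR, and its speech/gesture output, and their signatures are assumptions rather than the system's actual API.

```python
def run_quiz(questions, detect_first_responder, listen_for_answer, say):
    """Skeleton of the robot-led quiz loop.

    `questions` is a list of dicts with "text" and "correct" keys
    (an illustrative format; the real system also shows questions on
    the tablet and plays "Ding"/"Dong" cues).
    """
    scores = {}
    for q in questions:
        say(q["text"])                    # robot reads the question aloud
        team = detect_first_responder()   # earliest valid buzzer sound wins
        say(f"{team} buzzed first!")      # announce with a pointing gesture
        answer = listen_for_answer(team)  # verbal answer via ASR
        if answer == q["correct"]:
            say("Correct!")               # plus gesture and music cue
            scores[team] = scores.get(team, 0) + 1
        else:
            say("Incorrect.")
    # After all questions: announce the winner, dance, and award badges.
    winner = max(scores, key=scores.get) if scores else None
    return scores, winner
```

The real system layers multimodal feedback (music, gestures, tablet colors) onto each `say` step and awards visual badges based on the final scores.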

4. Experimental Design

To evaluate the effectiveness of the proposed sound-driven multimodal robot quiz system, a between-subject experimental study was conducted with two conditions: a baseline using sequential turn-taking with verbal-only feedback, and the proposed gamified multimodal feedback system combined with sound-driven first-responder detection. The primary objective was to examine how multimodal feedback and sound-based first-responder detection influenced students’ perceptions of usefulness, ease of use, motivation, social presence, and acceptance of the robot.

4.1. Study Design and Conditions

The control group interacted with a version (Baseline) of the robot that asked quiz questions verbally and received verbal responses from participants in a fixed sequential order. Feedback in this condition was limited to simple verbal statements such as “Correct” or “Incorrect,” without any additional gestures, music, or rewards.
In contrast, the experimental group engaged with the enhanced system that included non-verbal sound-based input, real-time first responder detection using cross-correlation, gesture and music feedback, a team-based point system, and visual badge rewards based on the team’s accumulated points, as shown in Figure 4. Each student in the experimental group used a physical recordable buzzer, which allowed them to record and use a personalized non-verbal sound for the competition. The system provided testing functionality prior to the game to ensure that the sound templates were accurately recognized and differentiated.

4.2. Participants

A total of thirty-two undergraduate students (N = 32), aged between 19 and 24 years and enrolled in a C programming course at a German university of applied sciences, voluntarily participated in the study. Participants (7 female, 25 male) were randomly assigned to either the control group (N = 16) or the experimental group (N = 16). Recruitment was performed via in-class announcements. Students were informed that the activity involved using a robot in an interactive quiz, but were not given prior details about gamification, first-responder detection, or multimodal feedback features. This helped mitigate expectation bias or performance priming. For the experimental group, additional time was allocated for recording and testing individual buzzer sounds to ensure system accuracy and participant familiarity with the setup.

4.3. Procedure and Evaluation Criteria

Each group participated in a robot-led 25 min quiz session consisting of eight multiple-choice questions. The robot, implemented using QiSDK, autonomously moderated the game. After each question, the robot listened for answers and provided feedback appropriate to the group’s condition. The interaction duration and question difficulty were kept consistent across both groups.
Immediately after the session, participants completed a post-interaction questionnaire consisting of selected subscales from the Technology Acceptance Model (TAM), the Intrinsic Motivation Inventory (IMI), and the Godspeed Questionnaire Series. Specifically, the TAM subscales measured perceived usefulness, perceived ease of use, and behavioral intention; the IMI captured enjoyment and competence as indicators of motivation; and the Godspeed scales measured likeability and anthropomorphism as indicators of the robot’s social presence. All questionnaire items were rated on a five-point Likert scale ranging from 1 (“Strongly disagree”) to 5 (“Strongly agree”).

4.4. Data Analysis

Statistical analysis was performed using independent samples t-tests in IBM SPSS Statistics 29 to compare the mean scores between the control and experimental groups. Cronbach’s alpha was calculated for each subscale to confirm internal consistency. Effect sizes (Cohen’s d) were also computed to quantify the magnitude of observed differences. This design allowed for a direct comparison of the impact of the proposed system on user perception, motivation, and acceptance, using validated multidimensional evaluation instruments.

5. Results

This section presents the results of the comparative evaluation between the control (Baseline) and experimental (sound-driven, multimodal) groups, structured around a research question with five hypotheses. Independent samples t-tests were conducted for each subscale, and effect sizes were calculated using Cohen’s d to assess the magnitude of group differences, as shown in Figure 5 and Table 2.
H1 examined how gamified multimodal feedback combined with sound-based first responder detection affects students’ perceived usefulness of the robot quiz system. The results revealed a statistically significant difference in perceived usefulness between the control group (M = 3.05, SD = 0.81) and the experimental group (M = 4.32, SD = 0.83), t(30) = 6.05, p = 0.01. The effect size was large (Cohen’s d = 2.14), indicating a substantial improvement in perceived usefulness due to the integration of multimodal feedback and fairness mechanisms.
H2 addressed the effect of the proposed system on perceived ease of use. Students in the experimental group reported significantly higher ease of use (M = 4.03, SD = 0.92) compared to those in the control group (M = 3.17, SD = 0.71), t(30) = 4.07, p < 0.001. The observed effect size was large (Cohen’s d = 1.43), suggesting that the inclusion of intuitive input mechanisms and expressive feedback positively impacted the usability of the system.
H3 focused on students’ motivation, as measured by the Interest/Enjoyment and Competence subscales of the Intrinsic Motivation Inventory (IMI). The experimental group reported significantly greater motivation (M = 4.48, SD = 0.34) than the control group (M = 3.39, SD = 0.36), t(30) = 6.96, p < 0.001. The effect size was extremely large (Cohen’s d = 3.11), indicating that the gamified and multimodal features contributed strongly to user motivation and engagement during the quiz experience.
H4 explored the impact of the system on students’ perception of the robot’s social presence, measured by the Likeability and Anthropomorphism subscales of the Godspeed Questionnaire, which serve as validated indicators of perceived social presence in HRI research. The analysis showed a significant difference between groups, with the experimental group scoring higher (M = 3.70, SD = 0.62) than the control group (M = 3.36, SD = 0.70), t(30) = 2.17, p = 0.03. Although the effect size was moderate (Cohen’s d = 0.48), the result suggests that multimodal feedback, including gestures, music, and expressive animations, enhanced the robot’s perceived social interactivity.
H5 investigated students’ behavioral intention to use the system in the future. The experimental group exhibited significantly greater behavioral intention (M = 4.24, SD = 0.83) compared to the control group (M = 3.28, SD = 0.80), t(30) = 4.58, p < 0.001. The corresponding effect size was large (Cohen’s d = 1.62), indicating that the integration of fair first responder detection and engaging feedback mechanisms positively influenced students’ willingness to use such systems in future educational settings, consistent with Affective Computing Theory [38] and the Octalysis framework [13].
Furthermore, the non-verbal sound-driven first responder detection mechanism was perceived as fair, as no complaints were recorded during the experiment. Several students noted in their qualitative feedback that they had verified the fairness of the detection system using their mobile phone cameras and confirmed its reliability. In contrast, some students reported that the robot occasionally required multiple attempts to recognize verbal responses from female participants, whereas verbal inputs from male students, particularly those with stronger vocal projection, were recognized more readily. Although such issues were infrequent, with only one or two complaints observed, they were noted in the qualitative responses. These findings suggest that while the system effectively ensures fairness in non-verbal sound-based detection, minor inconsistencies in verbal recognition remain and may warrant further refinement.
Together, these results demonstrate that the proposed sound-driven, gamified multimodal robot quiz system outperforms the verbal-only baseline across all evaluated dimensions, with particularly strong gains in perceived usefulness, enjoyment, and behavioral intention.

6. Discussion

The findings of this study demonstrate that integrating sound-based first responder detection with gamified multimodal feedback (Artefact A) significantly enhances students’ perceived usefulness, ease of use, motivation, social presence, and behavioral intention compared to a traditional verbal-only, sequential interaction system (Artefact B). These results support the hypotheses H1, H2, H3, and H5, with partial support for H4.
Hypothesis 1 (H1) was supported by a large effect size (Cohen’s d = 2.14), indicating that students found Artefact A more beneficial for engaging with quiz content. This aligns with earlier HRI research showing that real-time, expressive feedback increases user involvement and perceived value in educational contexts. The addition of visual scoring, dynamic robot gestures, and verbal praise likely contributed to these high ratings of perceived usefulness.
Hypothesis 2 (H2) was also confirmed, with students in the experimental group reporting significantly higher perceived ease of use. Despite the added complexity of configuring personalized buzzers and managing real-time sound input, the system remained intuitive to operate. This outcome highlights the importance of designing interactive systems that balance rich functionality with accessible interaction logic.
Hypothesis 3 (H3) yielded the strongest effect (Cohen’s d = 3.11), underscoring the motivational power of gamified robot-led interaction. Elements such as personalized buzzer sounds, animated feedback, and competitive team dynamics likely activated core psychological drives from the Octalysis framework, including Development & Accomplishment, Scarcity & Impatience, and Empowerment of Creativity. These design elements promoted a sense of agency, urgency, and emotional investment, leading to heightened enjoyment and competence perception among participants.
Hypothesis 4 (H4), regarding social presence, showed a moderate effect (Cohen’s d = 0.48) and remained statistically significant. While multimodal cues enhanced the robot’s expressiveness and personality, they may not have been sufficient to substantially increase perceived anthropomorphism or likeability compared to verbal-only interaction. This suggests that stronger embodiment features (e.g., facial expressions, gaze control) may be needed to elevate social presence in similar HRI systems.
Hypothesis 5 (H5) was supported by a large effect size (Cohen’s d = 1.62), indicating that the combination of fairness and engaging feedback positively influenced students’ behavioral intention to reuse the system. Participants expressed high willingness to engage with robot-led quizzes in future settings, reinforcing the importance of motivational design and fairness mechanisms for long-term user acceptance.

6.1. Fairness Validation by Participants

An important contribution of this study lies in the perceived fairness and transparency of the sound-based first responder detection mechanism. Although the system used cross-correlation to determine the first valid buzzer sound, students themselves validated this process during the experiment. Several participants voluntarily recorded the game using their mobile phones and later reviewed the footage in slow motion to determine the actual order of buzzer presses. In every instance, the system’s decision matched the order observed in the videos. This informal, user-led verification served as a strong endorsement of the system’s fairness and built participant trust in the robot’s decisions, an essential factor in educational group-based settings.
The fact that students took the initiative to validate fairness also reflects the system’s transparency and interpretability, which are key principles in explainable AI and ethical robotics. Future studies could formally incorporate such fairness validation protocols using synchronized audiovisual recordings.

6.2. Design Implications and Technical Considerations

The integration of Octalysis-mapped gamification elements contributed meaningfully to the system’s motivational impact. As shown in Table 1, Artefact A scored significantly higher across multiple core drives, with substantial differences in Development & Accomplishment (+4), Scarcity & Impatience (+5), and Empowerment of Creativity (+3). These design aspects were not only well-received by participants but also correlated with higher motivation and behavioral intention scores. This confirms the utility of structured gamification frameworks in HRI system design.
Technically, the sound-based detection mechanism achieved 97.5% accuracy, even in a classroom setting with moderate background noise. The use of pre-session sound template testing, amplitude filtering, and cross-correlation allowed reliable detection, and no complaints or misidentifications were reported. The only limitation was the system’s reliance on equal buzzer distance from the microphone, which was addressed in the setup protocol. While Bluetooth-based alternatives may offer latency benefits, our sound-based approach provided an accessible and engaging experience consistent with multimodal interaction goals.

7. Conclusions, Limitations, and Future Work

This study introduced and evaluated a robot quiz system designed to enhance fairness, motivation, and engagement in group-based educational activities. Two prototypes were compared: Artefact A, featuring sound-driven first responder detection, gamified multimodal feedback, and personalized buzzers, and Artefact B, a sequential verbal-only baseline system.
The results provided strong support for four of the five hypotheses. Artefact A significantly improved students’ perceived usefulness (H1), ease of use (H2), motivation (H3), and behavioral intention (H5) compared to Artefact B. Although a moderate increase was observed in social presence (H4), this difference suggests that richer embodiment features are needed to influence perceptions of robot likeability and anthropomorphism.
A key contribution of this work lies in the validated fairness of the sound-based detection mechanism. Students independently recorded the quiz sessions on their mobile phones and later reviewed the footage in slow motion to verify the robot’s decision. In every case, Pepper’s first responder recognition matched the video evidence. This spontaneous user-led validation confirmed the reliability and fairness of the system, reinforcing trust in the robot’s role as a neutral moderator in competitive learning scenarios.
The integration of Octalysis-mapped gamification elements further contributed to student motivation. Features such as team points, badges, urgent competition for first response, and personalized buzzer sounds activated core motivational drives, including Development & Accomplishment, Scarcity & Impatience, and Empowerment of Creativity. These design aspects explain the substantial gains in motivation and behavioral intention scores, highlighting the importance of structured motivational design in HRI.

7.1. Limitations

Several limitations must be acknowledged. First, the sample size (N = 32) was relatively small and restricted to undergraduate computer science students at a single institution, limiting generalizability to broader populations such as school children, older learners, or non-technical cohorts. Second, the study measured short-term perceptions only; long-term effects on learning outcomes, sustained engagement, or retention were not assessed. Third, while the sound-based detection mechanism achieved high accuracy, it requires careful microphone placement and equal buzzer distances to maintain fairness. Finally, the graphical user interface of the Pepper tablet, though functional, showed some inconsistencies in design (e.g., color contrasts, layout), which may have influenced ease-of-use ratings and introduced potential bias.

7.2. Future Work

Future research should expand both the technical and educational scope of the system. From a technical perspective, enhancements could include:
  • Robust multimodal fusion combining sound, Bluetooth signals, and gesture input to reduce dependency on microphone placement.
  • Adaptive ASR tuning, particularly to improve recognition of softer voices and of female participants’ speech, thereby ensuring inclusivity.
  • Standardized GUI design principles to minimize bias and improve usability across diverse learner groups.
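A fusion rule along the lines of the first bullet could simply take the earliest sufficiently confident response event across modalities. The sketch below is an assumption-laden illustration: the event structure, modality labels, and confidence threshold are not from the study.

```python
from dataclasses import dataclass

@dataclass
class ResponseEvent:
    team: str
    timestamp: float   # seconds on a shared monotonic clock
    modality: str      # e.g. "sound", "bluetooth", "gesture"
    confidence: float  # detector confidence in [0, 1]

def first_responder(events, min_confidence=0.8):
    """Earliest confident event wins, regardless of modality.
    Returns None if no event clears the confidence threshold."""
    confident = [e for e in events if e.confidence >= min_confidence]
    return min(confident, key=lambda e: e.timestamp, default=None)

events = [
    ResponseEvent("Team A", 12.031, "sound", 0.95),
    ResponseEvent("Team B", 12.027, "bluetooth", 0.99),
    ResponseEvent("Team A", 12.050, "gesture", 0.60),  # below threshold
]
print(first_responder(events).team)  # Team B
```

Because the decision depends only on timestamps on a shared clock, such a rule would reduce the dependency on microphone placement: a Bluetooth press can outrank a sound onset that arrived late merely due to distance.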
From an educational perspective, further studies should:
  • Involve larger and more diverse populations, including younger students, neurodivergent learners, and cross-cultural cohorts.
  • Conduct longitudinal evaluations to measure learning outcomes, motivation, retention, and long-term system acceptance.
  • Explore extended gamification strategies, such as progressive difficulty, storytelling, or cooperative challenges, to sustain engagement over repeated sessions.
  • Investigate the role of fairness perception more systematically, integrating synchronized audiovisual logging to formally validate responder detection accuracy alongside user perception.
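The synchronized audiovisual logging suggested in the last bullet could start from a timestamped, append-only decision log that video frames are later aligned against. The JSON Lines format and field names below are illustrative assumptions, not a specification from the study.

```python
import json
import time

class DecisionLog:
    """Append-only log of responder decisions with monotonic timestamps,
    serialized as JSON Lines so entries can be aligned frame-by-frame
    with an external video recording sharing the same clock reference."""

    def __init__(self):
        self.entries = []

    def record(self, question_id, team, onset_s):
        entry = {
            "logged_at": time.monotonic(),  # shared clock for A/V alignment
            "question_id": question_id,
            "detected_team": team,
            "buzzer_onset_s": onset_s,
        }
        self.entries.append(entry)
        return json.dumps(entry)

log = DecisionLog()
line = log.record(question_id=7, team="Team A", onset_s=12.031)
print(line)
```

Pairing each logged decision with the corresponding video frame would let fairness be audited formally, rather than relying on participants’ ad hoc slow-motion reviews.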

7.3. Final Remark

Overall, this work demonstrates that integrating fairness mechanisms, gamified feedback, and multimodal interaction into robot-led educational tools significantly enhances student engagement and acceptance. By validating fairness not only through technical accuracy but also through participant verification, the system provides a replicable model for designing equitable and motivating human–robot learning interactions.

Author Contributions

Conceptualization, R.T. and N.P.; methodology, R.T.; software, R.T.; validation, R.T. and N.P.; formal analysis, N.P.; investigation, R.T.; resources, R.T.; data curation, R.T.; writing—original draft preparation, R.T.; writing—review and editing, R.T. and N.P.; visualization, R.T.; supervision, N.P.; project administration, N.P.; funding acquisition, N.P. All authors have read and agreed to the published version of the manuscript.

Funding

The article processing charge (APC) was funded by the Open Access Publication Fund of Humboldt-Universität zu Berlin.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The research data supporting the findings of this study are available on request from the corresponding author due to privacy restrictions on project data.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
HRI: Human–Robot Interaction
TAM: Technology Acceptance Model
IMI: Intrinsic Motivation Inventory
QiSDK: Qi Software Development Kit (SDK) for NAO and Pepper robots by SoftBank Robotics
ASR: Automatic Speech Recognition

Appendix A

Appendix A.1. Technology Acceptance Model (TAM) Subscales and Items

Table A1. Technology Acceptance subscales and Items.
Subscale | Items
Perceived Usefulness (PU)
  • Using the robot quiz helped me engage with the topic.
  • The system improved my classroom experience.
  • The game-based interaction was beneficial to my understanding.
Perceived Ease of Use (PEOU)
  • The robot quiz system was easy to operate.
  • I could understand how to interact with the robot without effort.
  • Using the robot did not require much training.
Behavioral Intention (BI)
  • I would like to use this type of system in future courses.
  • I would recommend this learning system to others.
  • I would prefer using robot-led quizzes over traditional methods.

Appendix A.2. Intrinsic Motivation Inventory (IMI) Subscales and Items

Table A2. Motivation subscales and Items.
Subscale | Items
Interest/Enjoyment
  • I enjoyed participating in the robot quiz.
  • The game was fun and engaging.
  • This experience was entertaining and motivating.
Perceived Competence
  • I felt skilled while answering questions with the robot.
  • I was able to perform well in the quiz.
  • I believe I did well, regardless of the outcome.

Appendix A.3. Godspeed Social Presence Subscales and Items

Table A3. Social Presence subscales and Items.
Subscale | Items
Likeability
  • I found the robot likable.
  • The robot had a pleasant personality.
  • I enjoyed interacting with the robot.
Anthropomorphism
  • The robot seemed intelligent and aware.
  • The robot behaved in a human-like way.
  • The robot’s actions felt natural and expressive.

Figure 1. Proposed system architecture of gamified multimodal feedback, combined with first responder detection.
Figure 2. Sound-based first responder detection.
Figure 3. Flowchart illustrating the quiz game interaction sequence, including first responder detection, verbal answer recognition, and multimodal feedback delivery via gestures, music, and robot speech.
Figure 4. User interface of the Pepper robot quiz system. (Left) Active question phase with multiple-choice layout and team prompt. (Right) Final result screen with team points and badge display. German interface labels are translated and color contrast is optimized for accessibility.
Figure 5. Boxplots showing the distribution of scores per subscale.
Table 1. Octalysis Drive Strength and Comparison Between Artefact A and Artefact B.
Core Drive | Description | Score (A) * | Score (B) * | Δ | Justification
1. Epic Meaning & Calling | Feeling of contributing to a bigger goal | 3 | 2 | +1 | Both systems use team competition, but only Artefact A provides team badges and verbal recognition.
2. Development & Accomplishment | Progress through points and achievements | 5 | 1 | +4 | Artefact A awards real-time points and badges; Artefact B offers no visible achievement.
3. Empowerment of Creativity | Making meaningful choices or expressing individuality | 4 | 1 | +3 | Artefact A lets users record custom buzzer sounds; Artefact B has no personalization.
4. Ownership & Possession | Emotional investment via team identity or rewards | 4 | 1 | +3 | Artefact A reinforces team identity through badges and scores.
5. Social Influence & Relatedness | Peer collaboration or recognition | 3 | 2 | +1 | Artefact A enables simultaneous team interaction; Artefact B is sequential and less social.
6. Scarcity & Impatience | Urgency or time pressure to act quickly | 5 | 0 | +5 | Artefact A rewards the fastest responder; Artefact B lacks time-based interaction.
7. Unpredictability & Curiosity | Surprise elements, randomness | 3 | 1 | +2 | Artefact A offers variable feedback (music, gesture); B does not.
8. Loss & Avoidance | Avoiding failure or missing rewards | 3 | 1 | +2 | Artefact A uses sad music and gestures for incorrect answers; B gives neutral feedback.
* Scoring scale: 0 = not present, 5 = very strongly present. Δ = difference in score between Artefact A and B for each drive.
Table 2. Descriptive Statistics of all Subscales.
Subscale | Control Mean (SD) | Experimental Mean (SD) | t | p-Value | Cohen’s d | Result Summary
Perceived Usefulness | 3.05 (0.81) | 4.32 (0.83) | 6.05 | 0.01 | 2.14 | Significant, large effect
Perceived Ease of Use | 3.17 (0.71) | 4.03 (0.92) | 4.07 | <0.001 | 1.43 | Significant, large effect
Motivation | 3.39 (0.36) | 4.48 (0.34) | 6.96 | <0.001 | 3.11 | Significant, very large effect
Social Presence | 3.36 (0.70) | 3.70 (0.62) | 2.17 | 0.03 | 0.48 | Significant, medium effect
Behavioral Intention | 3.28 (0.80) | 4.24 (0.83) | 4.58 | <0.001 | 1.62 | Significant, large effect
