Improving the Efficiency of Collaboration Between Humans and Embodied AI Agents in 3D Virtual Environments
Abstract
1. Introduction
1.1. Research Background
1.2. Problem Definition and Research Objectives
1.3. Research Objectives and Main Contributions
- 1. A Flexible Planning Graph Framework Using AND/OR Branching: We present a plan representation that integrates AND/OR branching nodes, which can handle failures and explore alternatives, into a Directed Acyclic Graph (DAG). This hybrid structure ensures the logical coherence necessary for complex missions while enabling fluid decision-making regarding execution paths in dynamic environments.
- 2. An Adaptive Modification Mechanism via Natural Language Instructions: We developed a dynamic update mechanism that interprets human natural language input to assess its relevance to the active planning graph. This allows the system to instantaneously modify, append, or prune specific nodes without discarding the global plan, thereby maximizing the agent’s compliance and responsiveness.
- 3. Empirical Validation of Collaborative Efficiency: Through a user study involving 30 participants in a Minecraft-based environment, we demonstrate that our framework outperforms a reactive baseline agent. Specifically, the proposed agent reduced the mission completion time by an average of 97 s and reduced the workload perceived by users.
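The satisfaction rule behind AND/OR branching can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Node` class, the `satisfiable` and `first_viable_path` helpers, and the bed recipe are all hypothetical. An AND node is taken to succeed only if every child succeeds, an OR node if any child does, and a leaf unless its action is in the set of failed actions; when a preferred OR branch fails, the path falls back to the next alternative instead of abandoning the plan.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical plan node; kind is 'leaf', 'and', or 'or'."""
    name: str
    kind: str = "leaf"
    children: list[Node] = field(default_factory=list)


def satisfiable(node: Node, failed: set[str]) -> bool:
    """True if the goal is still achievable given the set of failed leaf actions.

    A leaf succeeds unless it failed; an AND node needs every child,
    an OR node needs at least one child (an alternative path).
    """
    if node.kind == "leaf":
        return node.name not in failed
    results = (satisfiable(child, failed) for child in node.children)
    return all(results) if node.kind == "and" else any(results)


def first_viable_path(node: Node, failed: set[str]) -> list[str]:
    """Leaf actions along one viable path, trying OR alternatives in order."""
    if not satisfiable(node, failed):
        raise ValueError(f"no viable branch under {node.name}")
    if node.kind == "leaf":
        return [node.name]
    if node.kind == "and":
        return [a for child in node.children for a in first_viable_path(child, failed)]
    # OR node: take the first alternative that can still succeed.
    viable = next(c for c in node.children if satisfiable(c, failed))
    return first_viable_path(viable, failed)


# Hypothetical bed recipe: wool can come from shearing or from hitting a sheep.
wool = Node("get_wool", "or", [Node("shear_sheep"), Node("hit_sheep")])
bed = Node("make_bed", "and", [wool, Node("craft_planks"), Node("place_crafting_table")])
print(first_viable_path(bed, failed={"shear_sheep"}))
```

Here the failed shearing action only prunes one OR branch, so the plan for the bed as a whole survives.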
1.4. Structure of the Paper
2. Related Work
2.1. Minecraft-Based Embodied Agents
2.2. Hierarchical Planning for Embodied Agents
2.3. Graph-Based Planning and Reasoning
2.4. Multi-Agent Cooperative Systems
2.5. Human–Agent Interaction and Collaboration
3. Materials and Methods
3.1. Structure of the Graph-Based Planning Framework
3.1.1. Mathematical Definition and Structural Characteristics of the Planning Graph
3.1.2. Execution Flexibility via AND/OR Branching
3.2. Plan Execution and Automated Recovery Mechanism
3.2.1. Initial Request Transmission and Complexity Analysis
3.2.2. LLM-Based Plan Execution and Node Selection
3.2.3. Automated Recovery Mechanism via Backtracking
3.3. Dynamic Plan Modification Based on Human Intervention
- Chat: Casual utterances such as “The weather is nice” receive a brief reply, and the current task continues.
- Stop: For an urgent stop request such as “Stop,” all actions and plans are terminated immediately, and the agent switches to an idle state.
- New Task: An instruction to begin a new task unrelated to the current context, such as “Stop what you’re doing and mine wood.” This was the most frequent type in pilot tests; the system discards the current graph and restarts the complexity analysis and graph generation process (Section 3.3.1).
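Because the intervention handler branches on the classifier's label, it helps to normalise the LLM reply before acting on it. The sketch below assumes a hypothetical JSON reply of the form `{"intent": "..."}`, accepts the six labels of Algorithm A3 case-insensitively (the pseudocode mixes ‘Chat’ and ‘switch’), and treats anything unparseable as Chat. `parse_intent` and `discards_current_graph` are illustrative names, not the authors' code.

```python
import json

# The six intent labels handled in Algorithm A3, normalised to lowercase.
INTENTS = {"switch", "delete", "add", "stop", "new_task", "chat"}


def parse_intent(reply: str) -> str:
    """Map a raw LLM reply to one intent label; assumes a JSON {"intent": ...} object."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return "chat"  # unparseable output: treat as small talk and keep the plan
    if not isinstance(data, dict):
        return "chat"
    label = str(data.get("intent", "")).strip().lower().replace(" ", "_")
    return label if label in INTENTS else "chat"


def discards_current_graph(intent: str) -> bool:
    """Stop and New Task abandon the active plan graph; other intents keep or edit it."""
    return intent in {"stop", "new_task"}


print(parse_intent('{"intent": "New Task"}'))  # label normalised to new_task
print(parse_intent("sure, I'll do that"))       # not JSON, so it falls back to chat
```

Falling back to Chat is a conservative choice in this sketch: an unreadable reply leaves the plan intact rather than discarding the graph.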
3.3.1. New Task Request (New Task)
3.3.2. Switch Active Node
3.3.3. Delete Node
3.3.4. Add Node
3.4. Memory Summarization Strategy for Context Management
3.4.1. Limitations of Existing Reactive Memory
3.4.2. Task-Centric Summarization Based on Plan Graph Completion
4. Results
4.1. Experimental Design
4.1.1. Experimental Environment and System Configuration
4.1.2. Experimental Procedure
4.1.3. System Implementation and Models
4.1.4. Experimental Scenario
4.2. Hypotheses and Metrics
4.2.1. Research Questions and Hypotheses
- H1: The proposed graph-based agent (Experimental Group) will result in shorter mission completion times compared to the control agent.
- H2: Through the robustness of the plan and the automated recovery mechanism (Section 3.2.3), the proposed agent will significantly lower the subjective workload perceived by users (NASA-TLX), specifically “Frustration” and “Mental Demand,” compared to the control group.
- H3: Users will evaluate collaboration with the proposed agent as more useful and satisfying than with the control agent.
4.2.2. Metrics
- Mission Completion Time: The total time taken to place all 9 items on the bingo board.
- Communication Efficiency: The number of utterances exchanged between the agent and the human, compared across conditions as a quantitative measure of the communication cost and information density required to achieve the same goal.
- NASA-TLX (Task Load Index) [26]: A standard scale measuring the subjective workload perceived by participants across six dimensions: Mental Demand (MD), Physical Demand (PD), Temporal Demand (TD), Effort, Performance (Perf), and Frustration (Frus).
- PSSUQ (Post-Study System Usability Questionnaire) [27]: A standard usability scale reporting overall satisfaction together with system usefulness, information quality, and interface quality.
- Collaboration Satisfaction (Custom Scale): An 8-item scale developed for this study to evaluate “Collaboration Quality,” the core objective of this study (e.g., “Did it understand the intent well?”, “Was it a competent partner?”).
- Post-Interview: In-depth qualitative feedback on the collaboration experience with each agent.
4.3. Quantitative Analysis Results
4.3.1. Mission Completion Time
4.3.2. Communication Efficiency
4.4. Qualitative Analysis Results
4.4.1. NASA-TLX Analysis
4.4.2. PSSUQ Analysis
4.4.3. Collaboration Satisfaction Analysis (Custom Scale)
4.4.4. Post-Interview
4.4.5. Case Study
- Case 1: Reactive Planning and User Confusion
- Case 2: Blind Loops and State Awareness
- Case 3: Context Inertia and Hallucination Frequency
4.5. Technical Validation
5. Discussion
- Verification of Efficiency and Stability (H1): Agent B not only reduced the average completion time by 97 s but also performed consistently: its mean completion times were similar regardless of execution order (933.6 s vs. 955.6 s), whereas the reactive Agent A varied widely (1254.3 s vs. 830.3 s) and was heavily influenced by user skill.
- Substantial Reduction in Workload (H2): NASA-TLX analysis showed that the proposed method significantly reduced the Effort required to complete the mission (p = 0.039). Frustration also decreased by about 20% (from 2.93 to 2.37), although this trend did not reach significance (p = 0.084); we interpret this as the automated recovery mechanism and plan-based execution reducing unnecessary user intervention and mental effort.
- Improvement in Flexibility and Communication Satisfaction (H3): In the PSSUQ and Collaboration Satisfaction analyses, the proposed model scored better than the control agent on every item, with significant differences on all but PSSUQ Interface Quality (p = 0.296). The high score for instruction comprehension and compliance (C6) suggests that the dynamic modification mechanism in Section 3.3 accurately reflected human intent, thereby building trust.
6. Conclusions
6.1. Summary and Conclusions
6.2. Limitations
6.3. Future Research Directions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. Algorithms
| Algorithm A1 Dynamically Modifiable Graph-based Planning and Execution Loop |
|---|
| 1: Input: User Request U, Agent State S (Inventory, History, ) |
| 2: Output: Execution Result |
| 3: function HandleRequest() |
| 4: if is True then |
| 5: HandleIntervention() ▹ See Algorithm A3 |
| 6: else |
| 7: Complexity ← AnalyzeComplexity() |
| 8: if is Simple then |
| 9: ExecuteSingleAction(U) |
| 10: else ▹ Complex Task |
| 11: G(V, E) ← GenerateKnowledgeGraph() ▹ Generate Graph Plan |
| 12: |
| 13: |
| 14: TraverseAndExecute() |
| 15: end if |
| 16: end if |
| 17: end function |
| 18: function GenerateKnowledgeGraph() |
| 19: Prompt ← “Given Request U and State S, generate a DAG plan G satisfying constraints (No Cycles, Inventory First).” |
| 20: JSON ← QueryLLM(Prompt, S) |
| 21: G ← ParseAndValidate(JSON); return G |
| 22: end function |
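
Algorithm A1 leaves `ParseAndValidate` unspecified. A minimal sketch, assuming the LLM returns a hypothetical JSON object `{"nodes": [...], "edges": [[prerequisite, dependent], ...]}`, rejects duplicate names and edges to undeclared nodes and enforces the No Cycles constraint with Kahn's topological sort, returning an order in which each task follows its prerequisites. The schema, the edge direction, and the simplified pickaxe recipe are assumptions for illustration.

```python
import json
from collections import deque


def parse_and_validate(raw: str) -> list[str]:
    """Parse a JSON plan and return a topological order, rejecting invalid graphs."""
    plan = json.loads(raw)
    nodes = list(plan["nodes"])
    if len(set(nodes)) != len(nodes):
        raise ValueError("duplicate node names")
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for src, dst in plan["edges"]:
        if src not in indegree or dst not in indegree:
            raise ValueError(f"edge references unknown node: {src} -> {dst}")
        successors[src].append(dst)
        indegree[dst] += 1
    # Kahn's algorithm: repeatedly emit nodes whose prerequisites are all emitted.
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    if len(order) != len(nodes):
        raise ValueError("plan contains a cycle")  # some nodes never reached indegree 0
    return order


# Simplified, hypothetical recipe (sticks omitted).
raw = ('{"nodes": ["oak_log", "oak_planks", "crafting_table", "wooden_pickaxe"], '
       '"edges": [["oak_log", "oak_planks"], ["oak_planks", "crafting_table"], '
       '["oak_planks", "wooden_pickaxe"], ["crafting_table", "wooden_pickaxe"]]}')
print(parse_and_validate(raw))
```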

| Algorithm A2 LLM-based Context-Aware Node Prioritization |
|---|
| 1: function PrioritizeTasks() |
| 2: SubGraph ← GetSubGraph(CurrentNode, Depth = k) |
| 3: Prompt ← “Given State S and Goal, prioritize tasks in Requirements.” |
| 4: PrioritizedList ← QueryLLM(Prompt, S) |
| 5: ValidateNodes() |
| 6: return |
| 7: end function |
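
The two helpers in Algorithm A2 can be sketched as follows, assuming (hypothetically) that the plan is stored as a mapping from each task to the tasks it requires. `get_subgraph` performs a depth-limited breadth-first search so that the prompt only sees nodes within `k` edges of the current node, and `validate_nodes` keeps the LLM's priority order while discarding duplicates and names outside that subgraph, so a hallucinated task cannot be scheduled. The function bodies are illustrative, not taken from the paper.

```python
from collections import deque


def get_subgraph(graph: dict[str, list[str]], start: str, depth: int) -> set[str]:
    """Nodes reachable from `start` within `depth` edges (depth-limited BFS)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == depth:
            continue  # do not expand beyond the depth limit
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                frontier.append((child, dist + 1))
    return seen


def validate_nodes(prioritized: list[str], subgraph: set[str]) -> list[str]:
    """Keep the LLM's order but drop unknown or repeated node names."""
    result = []
    for name in prioritized:
        if name in subgraph and name not in result:
            result.append(name)
    return result


# Hypothetical requirement graph for a bed.
graph = {"bed": ["wool", "planks", "crafting_table"], "planks": ["logs"], "crafting_table": ["planks"]}
sub = get_subgraph(graph, "bed", 1)
print(validate_nodes(["planks", "diamond", "wool", "planks"], sub))
```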

| Algorithm A3 Integrated Dynamic Planning and Intervention Handling |
|---|
| 1: Input: User Request U, Agent State S (Inventory, History, G, ) |
| 2: Output: Execution Result |
| 3: function HandleRequest() |
| 4: if is True then |
| 5: InterventionHandled ← HandleIntervention() |
| 6: if is True then |
| 7: Return True ▹ Intervention processed (Switch/Add/Delete/Stop) |
| 8: end if |
| 9: end if |
| 10: Complexity ← AnalyzeComplexity() |
| 11: |
| 12: execute Standard Planning Process ▹ See Algorithm A1 |
| 13: end function |
| 14: function HandleIntervention() |
| 15: Prompt ← “Classify user intent U relative to plan into {Switch, Delete, Add, Stop, New Task, Chat}.” |
| 16: JSON ← QueryLLM(Prompt, S) |
| 17: Intent ← ParseJSON(JSON) |
| 18: switch |
| 19: case ‘switch’ |
| 20: StopActions(S) |
| 21: SwitchToNode() |
| 22: Return True |
| 23: end case |
| 24: case ‘delete’ |
| 25: StopActions(S) |
| 26: DeleteNode() |
| 27: if is Root then |
| 28: StopExecution(S) |
| 29: else |
| 30: SwitchToNode() |
| 31: end if |
| 32: Return True |
| 33: end case |
| 34: case ‘add’ |
| 35: StopActions(S) |
| 36: SubGraph ← GeneratePlan(Intent.Node) |
| 37: AddNode() |
| 38: SwitchToNode() |
| 39: Return True |
| 40: end case |
| 41: case ‘stop’ |
| 42: StopExecution(S) |
| 43: Return True ▹ Plan stopped, Agent becomes Idle |
| 44: end case |
| 45: case ‘new_task’ |
| 46: StopExecution(S) |
| 47: Return False ▹ Signal to HandleRequest to start new plan |
| 48: end case |
| 49: case ‘chat’ |
| 50: Return False ▹ Just chatting, keep plan running |
| 51: end case |
| 52: end switch |
| 53: Return False |
| 54: end function |
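
The graph edits behind the Switch, Delete, and Add branches of `HandleIntervention` can be mirrored on a toy state object. In this sketch the hypothetical `Plan` class stores each node's parent; `add` attaches the new node under the active node instead of calling an LLM-based `GeneratePlan`; and because the switch target after a non-root deletion is not given in the pseudocode, the sketch assumes the agent resumes at the deleted node's parent. Return values follow Algorithm A3: True when the intervention is fully handled, False when the caller should continue with Chat or start a new plan.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Plan:
    """Hypothetical plan state: each node maps to its parent (the root maps to None)."""
    root: str
    parent: dict[str, str | None] = field(default_factory=dict)
    active: str | None = None
    idle: bool = False

    def __post_init__(self) -> None:
        self.parent.setdefault(self.root, None)
        self.active = self.root


def _clear(plan: Plan) -> None:
    """Drop the whole graph and leave the agent idle (StopExecution)."""
    plan.parent.clear()
    plan.active, plan.idle = None, True


def handle_intervention(plan: Plan, intent: str, node: str | None = None) -> bool:
    """Apply one classified intervention; True means it was fully handled here."""
    if intent == "switch" and node in plan.parent:
        plan.active = node
        return True
    if intent == "delete" and node in plan.parent:
        if node == plan.root:
            _clear(plan)  # deleting the root ends the plan
            return True
        # Collect the node and all of its descendants.
        doomed = {node}
        while True:
            extra = {n for n, p in plan.parent.items() if p in doomed} - doomed
            if not extra:
                break
            doomed |= extra
        resume = plan.parent[node]
        for n in doomed:
            del plan.parent[n]
        if plan.active in doomed:
            plan.active = resume  # assumption: resume at the deleted node's parent
        return True
    if intent == "add" and node is not None:
        plan.parent[node] = plan.active  # assumption: attach under the active node
        plan.active = node
        return True
    if intent in ("stop", "new_task"):
        _clear(plan)
        return intent == "stop"  # new_task returns False so the caller re-plans
    return False  # chat, or an intent naming an unknown node: keep the plan running


demo = Plan("bed", {"wool": "bed", "planks": "bed", "logs": "planks"})
handle_intervention(demo, "switch", "logs")
handle_intervention(demo, "delete", "planks")  # removes planks and its child logs
print(demo.active, sorted(demo.parent))
```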
Appendix B. Questionnaires and Interview Items
Appendix B.1. PSSUQ (Post-Study System Usability Questionnaire)
- Overall, I am satisfied with how easy it is to use this system.
- It was simple to use this system.
- I could effectively complete the tasks and scenarios using this system.
- I was able to complete the tasks and scenarios quickly using this system.
- I felt comfortable using this system.
- It was easy to learn to use this system.
- I believe I could become productive quickly using this system.
- The system gave error messages that clearly told me how to fix problems.
- Whenever I made a mistake using the system, I could recover easily and quickly.
- The information (such as online help, on-screen messages, and other documentation) provided with this system was clear.
- It was easy to find the information I needed.
- The information was effective in helping me complete the tasks and scenarios.
- The organization of information on the system screens was clear.
- The interface of this system was pleasant.
- I liked using the interface of this system.
- This system has all the functions and capabilities I expect it to have.
Appendix B.2. NASA-TLX (Task Load Index)
- Mental Demand: How much mental and perceptual activity was required (e.g., thinking, deciding, calculating, remembering, looking, searching, etc.)? Was the task easy or demanding, simple or complex, exacting or forgiving?
- Physical Demand: How much physical activity was required (e.g., pushing, pulling, turning, controlling, activating, etc.)? Was the task easy or demanding, slow or brisk, slack or strenuous, restful or laborious?
- Temporal Demand: How much time pressure did you feel due to the rate or pace at which the tasks or task elements occurred? Was the pace slow and leisurely or rapid and frantic?
- Effort: How hard did you have to work (mentally and physically) to accomplish your level of performance?
- Performance: How successful do you think you were in accomplishing the goals of the task set by the experimenter (or yourself)? How satisfied were you with your performance in accomplishing these goals?
- Frustration Level: How insecure, discouraged, irritated, stressed, and annoyed versus secure, gratified, content, relaxed, and complacent did you feel during the task?
Appendix B.3. Collaboration Satisfaction (Custom Scale)
- B1. The AI agent collected materials on my behalf, allowing me to focus more on other tasks (e.g., exploring for other items).
- B2. I believe the AI agent reduced my overall workload.
- B3. Collaborating with the AI agent was more efficient than working alone.
- B4. Overall, I was satisfied with the collaboration with the AI agent.
- B5. Communication with the AI agent was clear and effective.
- B6. The AI agent understood and followed my instructions well.
- B7. I would like to work with this AI agent again in Minecraft in the future.
- B8. I believe the AI agent complemented my role well.
Appendix B.4. Post-Experiment Interview Questions
- You collaborated with two different AI agents today. How did the overall feeling of working with the [First Agent] differ from the [Second Agent]?
- Which of the two agents felt more like a “competent collaboration partner (colleague)”? Conversely, which felt more like a “simple command execution tool”? Why?
- Which agent caused more frustration or stress during the task? Specifically, at what moments did you feel that way?
- Were there moments when the AI made mistakes or things didn’t go according to plan? If so, how did each agent cope, and how did you feel about their approach?
- If you had to perform a complex task in Minecraft again, which of the two agents would you choose to collaborate with? What is the decisive reason for your choice?
Appendix C. Detailed Interaction Log Analysis
| Baseline (Agent A): Reactive approach causing latency | |
|---|---|
| User | Make a bed and bring it to me. |
| Agent | I think I need to make a bed first. |
| Agent | I need to make planks from birch logs first! |
| Agent | I need to gather birch logs first! |
| Agent | Ah, I will make planks with the existing logs I have! |
| Agent | Wait, I need to convert more logs to planks! |

| Proposed (Agent B): Proactive planning | |
|---|---|
| User | Make a bed. |
| Agent | To make a bed, I need wool, planks, and a crafting table. First, I need to get white wool. |
| Agent | I will hit a sheep to get wool. |
| Agent | I am collecting oak logs now. |
| Agent | I am crafting planks now. |

| Baseline (Agent A): Blind loop despite possessing the item | |
|---|---|
| User | Give me the stone pickaxe in your inventory. |
| Agent | Here it is! (Action executes but fails to transfer) |
| Agent | Looks like I have trouble giving it. I will make another one quickly. |
| Agent | I don’t have planks. |
| Agent | I need to make a wooden pickaxe first. |

| Proposed (Agent B): Verified state avoiding redundancy | |
|---|---|
| User | Make an Oak Door. |
| Agent | I already have sufficient oak planks. Using the crafting table nearby. |
| Agent | Craft Oak Door successfully completed. |
| Agent | Gift Oak Door to Ryan successfully completed. |

| Baseline (Agent A): Hallucination due to context inertia | |
|---|---|
| User | Make a furnace. |
| Agent | I need more coal for torches. |
| User | Stop making torches. |
| Agent | Okay, I’ll stop making torches. I’ll mine coal ore instead. |
| User | Stop working and come here. |
| Agent | Okay! But first I need to make planks! |

| Proposed (Agent B): Successful context switching | |
|---|---|
| User | Stop mining. Build a chest now. |
| Agent | Understood. |
| Agent | Using planks to craft a chest. |
| Agent | Craft Chest successfully completed. |
References
- Wang, G.; Xie, Y.; Jiang, Y.; Mandlekar, A.; Xiao, C.; Zhu, Y.; Fan, L.; Anandkumar, A. Voyager: An Open-Ended Embodied Agent with Large Language Models. In Proceedings of the Intrinsically-Motivated and Open-Ended Learning Workshop @ NeurIPS 2023, Paris, France, 13–15 September 2023.
- Yu, X.; Fu, J.; Deng, R.; Han, W. MineLand: Simulating large-scale multi-agent interactions with limited multimodal senses and physical needs. arXiv 2024, arXiv:2403.19267.
- Song, C.H.; Wu, J.; Washington, C.; Sadler, B.M.; Chao, W.L.; Su, Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2–6 October 2023; pp. 2998–3009.
- Wang, Z.; Cai, S.; Chen, G.; Liu, A.; Ma, X.; Liang, Y.; CraftJarvis, T. Describe, explain, plan and select: Interactive planning with large language models enables open-world multi-task agents. In Proceedings of the 37th International Conference on Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023; pp. 34153–34189.
- Dong, Y.; Zhu, X.; Pan, Z.; Zhu, L.; Yang, Y. VillagerAgent: A Graph-Based Multi-Agent Framework for Coordinating Complex Task Dependencies in Minecraft. In Findings of the Association for Computational Linguistics: ACL 2024; Association for Computational Linguistics: Stroudsburg, PA, USA, 2024; pp. 16290–16314.
- Guss, W.H.; Houghton, B.; Topin, N.; Wang, P.; Codel, C.; Veloso, M.; Salakhutdinov, R. MineRL: A large-scale dataset of Minecraft demonstrations. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, Macao, China, 10–16 August 2019; pp. 2442–2448.
- Fan, L.; Wang, G.; Jiang, Y.; Mandlekar, A.; Yang, Y.; Zhu, H.; Tang, A.; Huang, D.-A.; Zhu, Y.; Anandkumar, A. MineDojo: Building open-ended embodied agents with internet-scale knowledge. Adv. Neural Inf. Process. Syst. 2022, 35, 18343–18362.
- Sacerdoti, E.D. Planning in a hierarchy of abstraction spaces. Artif. Intell. 1974, 5, 115–135.
- Tate, A. Generating project networks. In Proceedings of the 5th International Joint Conference on Artificial Intelligence-Volume 2, Cambridge, MA, USA, 22–25 August 1977; pp. 888–893.
- Erol, K.; Hendler, J.; Nau, D.S. HTN planning: Complexity and expressivity. In Proceedings of the Twelfth AAAI National Conference on Artificial Intelligence, Seattle, WA, USA, 31 July–4 August 1994; pp. 1123–1128.
- Nau, D.S.; Au, T.C.; Ilghami, O.; Kuter, U.; Murdock, J.W.; Wu, D.; Yaman, F. SHOP2: An HTN planning system. J. Artif. Intell. Res. 2003, 20, 379–404.
- Ahn, M.; Brohan, A.; Brown, N.; Chebotar, Y.; Cortes, O.; David, B.; Finn, C.; Fu, C.; Gopalakrishnan, K.; Hausman, K.; et al. Do as I can, not as I say: Grounding language in robotic affordances. In Proceedings of the 6th Conference on Robot Learning (CoRL 2022), Auckland, New Zealand, 14–18 December 2022; pp. 287–318.
- Huang, W.; Xia, F.; Xiao, T.; Chan, H.; Liang, J.; Florence, P.; Zeng, A.; Tompson, J.; Mordatch, I.; Chebotar, Y.; et al. Inner Monologue: Embodied Reasoning through Planning with Language Models. In Proceedings of the 6th Conference on Robot Learning (CoRL 2022), Auckland, New Zealand, 14–18 December 2022; pp. 1769–1782.
- Huang, W.; Abbeel, P.; Pathak, D.; Mordatch, I. Language models as zero-shot planners: Extracting actionable knowledge for embodied agents. In Proceedings of the International Conference on Machine Learning, PMLR, Baltimore, MD, USA, 17–23 July 2022; pp. 9118–9147.
- Zitkovich, B.; Yu, T.; Xu, S.; Xu, P.; Xiao, T.; Xia, F.; Wu, J.; Wohlhart, P.; Welker, S.; Wahid, A.; et al. RT-2: Vision-language-action models transfer web knowledge to robotic control. In Proceedings of the Conference on Robot Learning, PMLR, Atlanta, GA, USA, 6–9 November 2023; pp. 2165–2183.
- Yao, S.; Yu, D.; Zhao, J.; Shafran, I.; Griffiths, T.; Cao, Y.; Narasimhan, K. Tree of thoughts: Deliberate problem solving with large language models. Adv. Neural Inf. Process. Syst. 2023, 36, 11809–11822.
- Besta, M.; Blach, N.; Kubicek, A.; Gerstenberger, R.; Podstawski, M.; Gianinazzi, L.; Gajda, J.; Lehmann, T.; Niewiadomski, H.; Nyczyk, P.; et al. Graph of thoughts: Solving elaborate problems with large language models. Proc. AAAI Conf. Artif. Intell. 2024, 38, 17682–17690.
- Park, J.S.; O’Brien, J.; Cai, C.J.; Morris, M.R.; Liang, P.; Bernstein, M.S. Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, San Francisco, CA, USA, 29 October–1 November 2023; pp. 1–22.
- Zhang, C.; Yang, K.; Hu, S.; Wang, Z.; Li, G.; Sun, Y.; Zhang, C.; Zhang, Z.; Liu, A.; Zhu, S.-C.; et al. ProAgent: Building proactive cooperative AI with large language models. arXiv 2023, arXiv:2308.11339.
- Chen, W.; Su, Y.; Zuo, J.; Yang, C.; Yuan, C.; Chan, C.M.; Yu, H.; Lu, Y.; Hung, Y.-H.; Qian, C.; et al. AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors. In Proceedings of the ICLR, Vienna, Austria, 7–11 May 2024.
- White, I.; Nottingham, K.; Maniar, A.; Robinson, M.; Lillemark, H.; Maheshwari, M.; Qin, L.; Ammanabrolu, P. Collaborating action by action: A multi-agent LLM framework for embodied reasoning. arXiv 2025, arXiv:2504.17950.
- Shinn, N.; Cassano, F.; Gopinath, A.; Narasimhan, K.; Yao, S. Reflexion: Language agents with verbal reinforcement learning. Adv. Neural Inf. Process. Syst. 2023, 36, 8634–8652.
- Taioli, F.; Zorzi, E.; Franchi, G.; Castellini, A.; Farinelli, A.; Cristani, M.; Wang, Y. Collaborative Instance Object Navigation: Leveraging Uncertainty-Awareness to Minimize Human-Agent Dialogues. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Honolulu, HI, USA, 19–23 October 2025; pp. 18781–18792.
- Liu, J.; Zhou, P.; Du, Y.; Tan, A.H.; Snoek, C.G.; Sonke, J.J.; Gavves, E. CaPo: Cooperative Plan Optimization for Efficient Embodied Multi-Agent Cooperation. In Proceedings of the Thirteenth International Conference on Learning Representations, Singapore, 24–28 April 2025.
- Asuzu, K.; Singh, H.; Idrissi, M. Human–robot interaction through joint robot planning with large language models. Intell. Serv. Robot. 2025, 18, 261–277.
- Hart, S.G.; Staveland, L.E. Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. Adv. Psychol. 1988, 52, 139–183.
- Lewis, J.R. Psychometric evaluation of the post-study system usability questionnaire: The PSSUQ. In Proceedings of the Human Factors Society Annual Meeting; Sage Publications: Los Angeles, CA, USA, 1992; Volume 36, pp. 1259–1260.
- Dunn, T.J.; Baguley, T.; Brunsden, V. From alpha to omega: A practical solution to the pervasive problem of internal consistency estimation. Br. J. Psychol. 2014, 105, 399–412.







| Group | Mean (A) (s) | SD (A) (s) | Mean (B) (s) | SD (B) (s) | % (B < A) |
|---|---|---|---|---|---|
| Overall | 1042.267 | 433.711 | 944.6 | 347.417 | 60 |
| A First Group | 1254.267 | 456.120 | 933.6 | 368.511 | 80 |
| B First Group | 830.267 | 292.049 | 955.6 | 337.579 | 40 |
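
The overall row of the completion-time table is the unweighted mean of the two order groups, which is consistent with 30 participants split into two counterbalanced groups of 15 (an inference from the numbers, not a stated fact). Reading % (B < A) as the share of participants who finished faster with Agent B, the short check below reproduces the overall means, the average reduction of about 97 s, and the 60% figure under that equal-split assumption.

```python
# Group means (s) and % (B < A) from the completion-time table.
groups = {
    "A first": {"mean_a": 1254.267, "mean_b": 933.6, "pct_b_faster": 80},
    "B first": {"mean_a": 830.267, "mean_b": 955.6, "pct_b_faster": 40},
}
n_per_group = 15  # assumption: 30 participants split evenly between the two orders

# With equal group sizes, the overall mean is the plain average of the group means.
mean_a = sum(g["mean_a"] for g in groups.values()) / len(groups)
mean_b = sum(g["mean_b"] for g in groups.values()) / len(groups)
faster = sum(round(g["pct_b_faster"] / 100 * n_per_group) for g in groups.values())

print(f"overall A = {mean_a:.3f} s, overall B = {mean_b:.3f} s")
print(f"mean reduction = {mean_a - mean_b:.1f} s")
print(f"B faster for {faster}/{2 * n_per_group} participants ({100 * faster / (2 * n_per_group):.0f}%)")
```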
| Measure | Mean (A) | SD (A) | Mean (B) | SD (B) | t-Value | df | p-Value | Cohen’s d |
|---|---|---|---|---|---|---|---|---|
| AI Message Count | 93.03 | 60.14 | 71.93 | 40.32 | 1.784 | 28 | 0.085 | 0.412 |
| Human Message Count | 23.41 | 14.59 | 22.38 | 14.64 | 0.314 | 28 | 0.756 | 0.071 |
| Scale | Mean (A) | SD (A) | Mean (B) | SD (B) | t-Value | df | p-Value | Cohen’s d |
|---|---|---|---|---|---|---|---|---|
| Mental Demand (MD) | 2.93 | 1.14 | 2.53 | 1.25 | 1.755 | 29 | 0.090 | 0.334 |
| Physical Demand (PD) | 2.53 | 1.20 | 2.30 | 1.15 | 1.424 | 29 | 0.165 | 0.199 |
| Temporal Demand (TD) | 2.37 | 1.13 | 2.43 | 1.04 | −0.263 | 29 | 0.794 | 0.061 |
| Effort | 2.83 | 1.26 | 2.47 | 1.17 | 2.164 | 29 | 0.039 | 0.302 |
| Performance (Perf) | 3.17 | 1.21 | 3.30 | 1.24 | −0.548 | 29 | 0.588 | 0.109 |
| Frustration (Frus) | 2.93 | 1.39 | 2.37 | 1.38 | 1.788 | 29 | 0.084 | 0.410 |
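
The reported Cohen's d values are consistent with dividing the difference of condition means by the root mean square of the two standard deviations, d = (M_A − M_B) / √((SD_A² + SD_B²) / 2). This formula is inferred from the tables rather than stated in the text; the sketch below recomputes three rows taken from different tables. It differs from the d_z = t/√n sometimes reported for paired designs, which for Effort would give about 2.164/√30 ≈ 0.40 instead of the reported 0.302.

```python
import math


def cohens_d(mean_a: float, sd_a: float, mean_b: float, sd_b: float) -> float:
    """Mean difference (A - B) divided by the root mean square of the two SDs."""
    return (mean_a - mean_b) / math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)


# (mean A, SD A, mean B, SD B, reported d) taken from the result tables.
rows = {
    "Mental Demand": (2.93, 1.14, 2.53, 1.25, 0.334),
    "AI Message Count": (93.03, 60.14, 71.93, 40.32, 0.412),
    "C6. Instruction Compliance": (3.50, 1.83, 4.57, 1.87, 0.576),
}
for name, (ma, sa, mb, sb, reported) in rows.items():
    d = abs(cohens_d(ma, sa, mb, sb))  # the tables report the magnitude of d
    print(f"{name}: recomputed {d:.3f}, reported {reported}")
```

Small residual gaps (for example 0.578 vs. 0.576 for C6) are expected, because the tables round the means and SDs to two decimals.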
| Scale | Mean (A) | SD (A) | Mean (B) | SD (B) | t-Value | df | p-Value | Cohen’s d |
|---|---|---|---|---|---|---|---|---|
| Overall Satisfaction | 3.43 | 1.53 | 2.94 | 1.62 | 2.209 | 29 | 0.035 | 0.306 |
| System Usefulness (SYSUSE) | 3.40 | 1.57 | 2.98 | 1.76 | 2.066 | 29 | 0.048 | 0.250 |
| Information Quality (INFOQUAL) | 3.47 | 1.60 | 2.89 | 1.48 | 2.304 | 29 | 0.029 | 0.379 |
| Interface Quality (INTERQUAL) | 3.23 | 1.70 | 2.93 | 1.92 | 1.064 | 29 | 0.296 | 0.165 |
| Item Description | Mean (A) | SD (A) | Mean (B) | SD (B) | t-Value | df | p-Value | Cohen’s d |
|---|---|---|---|---|---|---|---|---|
| C1. Work Focus | 4.23 | 1.92 | 4.97 | 1.97 | −2.451 | 29 | 0.021 | 0.376 |
| C2. Reduced Burden | 4.27 | 1.78 | 5.17 | 1.78 | −3.031 | 29 | 0.005 | 0.505 |
| C3. Efficiency | 3.97 | 2.17 | 4.80 | 2.02 | −2.154 | 29 | 0.040 | 0.397 |
| C4. Overall Satisfaction | 4.03 | 1.85 | 4.97 | 1.85 | −3.043 | 29 | 0.005 | 0.505 |
| C5. Communication | 3.80 | 1.85 | 4.53 | 2.11 | −2.083 | 29 | 0.046 | 0.370 |
| C6. Instruction Compliance | 3.50 | 1.83 | 4.57 | 1.87 | −3.087 | 29 | 0.004 | 0.576 |
| C7. Intention to Reuse | 4.10 | 1.99 | 5.03 | 2.01 | −2.603 | 29 | 0.014 | 0.467 |
| C8. Role Complementation | 4.37 | 2.03 | 5.20 | 1.81 | −2.533 | 29 | 0.017 | 0.434 |
| Scenario Name | Node Count (N) | Accuracy (%) | Avg. Latency (ms) |
|---|---|---|---|
| Wooden Pickaxe | 12 | 100.0 | 880 |
| Bed Plan | 44 | 100.0 | 726 |
| Nether Portal | 60 | 100.0 | 699 |
| Battle Prep. | 120 | 100.0 | 781 |
| Master Plan | 215 | 80.0 | 782 |
| Average | - | 96.0 | 773.6 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Han, S.; Lee, K.H. Improving the Efficiency of Collaboration Between Humans and Embodied AI Agents in 3D Virtual Environments. Appl. Sci. 2026, 16, 1135. https://doi.org/10.3390/app16021135
Han S, Lee KH. Improving the Efficiency of Collaboration Between Humans and Embodied AI Agents in 3D Virtual Environments. Applied Sciences. 2026; 16(2):1135. https://doi.org/10.3390/app16021135
Chicago/Turabian Style: Han, Seowon, and Kang Hoon Lee. 2026. "Improving the Efficiency of Collaboration Between Humans and Embodied AI Agents in 3D Virtual Environments" Applied Sciences 16, no. 2: 1135. https://doi.org/10.3390/app16021135
APA Style: Han, S., & Lee, K. H. (2026). Improving the Efficiency of Collaboration Between Humans and Embodied AI Agents in 3D Virtual Environments. Applied Sciences, 16(2), 1135. https://doi.org/10.3390/app16021135

