Sensing User Intent: An LLM-Powered Agent for On-the-Fly Personalized Virtual Space Construction from UAV Sensor Data
Abstract
1. Introduction
- SGAA Framework: A dual-path decision system for low-latency, reliable interaction, using FSM and Command Cache for high-frequency commands, and an LLM-driven IDC Funnel for complex queries;
- IDC Funnel: A multi-stage LLM-powered pipeline that converts user intent into structured queries and generates dynamic, personalized virtual experiences;
- Empirical Validation: The framework outperforms traditional agents in response time (1512 ms vs. 4301 ms), task success (95% vs. 57%), and query precision (85.5% vs. 52.7%). A user study confirmed improved user engagement.
2. Methodology: The Personalized Virtual Space Construction Framework
2.1. Overall Framework Overview
2.2. Grounding the Agent: Knowledge Base Preparation
- AI-Powered Structured Metadata Generation: For each identified asset (e.g., a bird species), we automated the enrichment of sparse raw data into comprehensive JSON entries. This was achieved by prompting the Qwen2.5-72B-Instruct LLM [22] with precise system instructions to synthesize attributes such as scientific names, detailed descriptions, habitat, and conservation status from raw textual identifiers.
- Vector Embedding with Pre-trained Models: To enable the semantic retrieval crucial for the RAG component, we consolidated the textual attributes of each entry into a coherent string and transformed them into high-dimensional vector embeddings. We utilized BAAI/bge-large-zh-v1.5 [23], a specialized pre-trained model optimized for semantic similarity, to ensure high-fidelity representation of the domain-specific biodiversity data.
- Efficient Similarity Search via FAISS: The generated embeddings were indexed using the Facebook AI Similarity Search (FAISS) library [24]. FAISS organizes the embedding collection into an optimized data structure for vector similarity search, enabling highly efficient nearest-neighbor queries that dynamically match user intent with relevant knowledge base entries at millisecond-level latency. Critically, this standardized pipeline—spanning data ingestion, embedding generation (BAAI/bge), and vector indexing (FAISS)—serves as the foundational retrieval mechanism for both our proposed IDC-RAG framework and the "Standard RAG" baseline used in our comparative experiments (see Section 3).
Algorithm 1. Knowledge Base Construction Pipeline.
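The three preparation stages above can be sketched end to end. In the sketch below, the hash-based `embed()` is only a stand-in for BAAI/bge-large-zh-v1.5, and the linear cosine search stands in for a FAISS index, so the flow runs without external dependencies; all helper names and the toy entries are illustrative, not the paper's implementation.

```python
import math

def embed(text: str, dim: int = 8) -> list[float]:
    # Placeholder: deterministic pseudo-embedding from character codes.
    # The real pipeline would call the BAAI/bge-large-zh-v1.5 model here.
    vec = [0.0] * dim
    for i, ch in enumerate(text):
        vec[i % dim] += ord(ch)
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def build_index(entries: dict) -> list[tuple[str, list[float]]]:
    # Consolidate each entry's textual attributes into one coherent string,
    # then embed it. A FAISS index would replace this plain list in practice.
    index = []
    for bird_id, meta in entries.items():
        text = " ".join(str(v) for v in meta.values())
        index.append((bird_id, embed(text)))
    return index

def search(index, query: str, k: int = 2) -> list[str]:
    # Linear cosine (inner-product) scan standing in for a FAISS
    # nearest-neighbor query over normalized vectors.
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, vec)), bird_id)
              for bird_id, vec in index]
    scored.sort(reverse=True)
    return [bird_id for _, bird_id in scored[:k]]

entries = {
    "bird_001": {"name": "Black-necked Crane", "habitat": "Wetland",
                 "status": "Near Threatened"},
    "bird_002": {"name": "Bar-headed Goose", "habitat": "Wetland",
                 "status": "Least Concern"},
}
index = build_index(entries)
print(search(index, "crane wetland"))
```

In the deployed system, the same three stages—metadata synthesis, embedding, indexing—run once offline, and only the nearest-neighbor query executes at interaction time.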
2.3. The Agent’s Core Intelligence: SGAA and IDC Funnel
2.3.1. The State-Gated Agent Architecture: A Framework for Robust, Real-Time Interaction
Listing 1. Example JSON Schema for LLM Intent Formalization. Note: This schema details the primary fields (intents, filters, count, layout) that the LLM is instructed to output. The action_type (FSM trigger) is derived by the backend from this structured intent and subsequent data retrieval results, not directly from the LLM output.
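For concreteness, a hypothetical instance of such a formalized intent, restricted to the fields named in the note above, might look as follows. The `confidence_score` field (used later to gate the RAG-only fallback) and the validation helper are assumptions, not the paper's exact schema.

```python
# Illustrative formalized-intent object; field values are invented.
intent = {
    "intents": ["display_exhibit"],
    "filters": {"conservation_status": "Endangered", "feather_color": "Blue"},
    "count": 5,
    "layout": "circle",
    "confidence_score": 0.92,  # assumed field; gates the RAG-only fallback
}

REQUIRED_FIELDS = {"intents", "filters", "count", "layout"}

def validate_intent(obj: dict) -> bool:
    # The backend derives the FSM trigger (action_type) only after the
    # structured intent passes validation and data retrieval succeeds.
    return REQUIRED_FIELDS <= obj.keys() and isinstance(obj["filters"], dict)

print(validate_intent(intent))
```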
- Invalid or Unfulfillable Intents: If the LLM’s output cannot be successfully parsed, or if the subsequent data retrieval (combining IDC and RAG paths) fails to identify any relevant exhibits based on the LLM’s formalized intent, the backend system does not generate an invalid FSM trigger. Instead, it explicitly constructs a SceneAction with action_type: “GENERAL_RESPONSE” and a user-facing explanatory message (e.g., “Sorry, no bird information found matching your description.”). This provides immediate feedback to the user regarding the unfulfilled request.
- System Errors: In cases of unexpected backend processing errors (e.g., LLM API failure, data processing exceptions), a SceneAction with action_type: “ERROR” and a generic apology is sent to the user. All such incidents are also thoroughly logged to the backend console for diagnostic purposes.
- FSM State Maintenance: Upon such a “rejection” (i.e., when a valid, actionable SceneAction.action_type corresponding to the user’s positive intent cannot be generated), the FSM maintains its current valid state. It does not attempt to transition to an undefined or error state. The frontend receives the GENERAL_RESPONSE or ERROR action, displays the feedback, and awaits further user input, ensuring the system’s stability, predictability, and a graceful user experience.
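The rejection path described in these three bullets can be sketched as a single backend function. Helper names and the positive action type `DISPLAY_EXHIBITS` are assumptions for illustration; only `GENERAL_RESPONSE` and the fallback message are taken from the text.

```python
import json
from typing import Optional

def formalize(raw_llm_output: str) -> Optional[dict]:
    # Parse the LLM's JSON output; None signals an unparseable response.
    try:
        return json.loads(raw_llm_output)
    except json.JSONDecodeError:
        return None

def build_scene_action(raw_llm_output: str, retrieved_ids: list) -> dict:
    intent = formalize(raw_llm_output)
    if intent is None or not retrieved_ids:
        # Invalid or unfulfillable intent: no FSM trigger is generated;
        # the FSM keeps its current state and the user gets feedback.
        return {"action_type": "GENERAL_RESPONSE",
                "message": "Sorry, no bird information found matching "
                           "your description."}
    # Hypothetical positive action type for a fulfilled request.
    return {"action_type": "DISPLAY_EXHIBITS", "exhibit_ids": retrieved_ids}
```

A `try/except` around the whole function would analogously emit the `ERROR` action with a generic apology and log the exception.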
Algorithm 2. State-Gated Intent Processing and Curation.
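The dual-path dispatch at the heart of the SGAA—a Command Cache fast path for high-frequency commands and the LLM-driven IDC Funnel otherwise—can be sketched as follows. The cached commands and function names are illustrative, not the system's actual command set.

```python
# Minimal dual-path dispatch sketch: cache hit -> immediate FSM trigger,
# cache miss -> hand off to the (stubbed) LLM-driven IDC Funnel.
COMMAND_CACHE = {
    "next": {"action_type": "NAVIGATE_NEXT"},   # assumed trigger names
    "back": {"action_type": "NAVIGATE_BACK"},
}

def slow_path(query: str) -> dict:
    # Stand-in for the LLM-driven IDC Funnel (intent formalization,
    # retrieval, curation); here it just echoes the query.
    return {"action_type": "GENERAL_RESPONSE",
            "message": f"Processing: {query}"}

def dispatch(user_input: str) -> dict:
    key = user_input.strip().lower()
    if key in COMMAND_CACHE:      # fast path: FSM + cache, no LLM call
        return COMMAND_CACHE[key]
    return slow_path(user_input)  # slow path: LLM-driven funnel
```

The fast path is what keeps median latency in the millisecond range; only novel or complex queries pay the LLM round-trip cost.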
2.3.2. The Hybrid IDC-RAG Funnel: A Multi-Stage Process for Robust Curation
- Precise Filtering (IDC Path): If structured filters were successfully extracted in Stage 1, these filters are applied directly to the knowledge base to identify a set of precisely matching bird IDs. This leverages IDC’s logical precision.
- Semantic Search (RAG Path): Concurrently, a vector-based semantic search is performed on the original user query. This retrieves a set of semantically related bird IDs based on their semantic similarity to the query, providing a broader set of potentially relevant entities.
- Conditional Re-filtering: Ensuring Logical Adherence with Semantic Relevance. If structured filters were present (i.e., extracted_filters is not empty), the semantically related bird IDs obtained from the RAG path are subjected to a secondary re-filtering process using the exact same precise IDC criteria. This step is critical for overcoming RAG's probabilistic limitations and ensuring logical accuracy. To illustrate its value, consider the query: "Show me endangered blue birds."
1. In Stage 1, the LLM formalizes this into the filters {"conservation_status": "Endangered", "feather_color": "Blue"}.
2. In Stage 2, the RAG path might retrieve a broad set of semantically related birds. Due to vector proximity, this set could inadvertently include a "Blue-feathered" bird whose conservation_status is actually "Least Concern" (e.g., Blue Eared Pheasant), or an "Endangered" bird whose feather_color is "Green". These are typical RAG "false positives"—semantically relevant but logically incorrect.
3. The Conditional Re-filtering stage strictly applies both extracted filters to all RAG-retrieved IDs. Any bird failing either condition is precisely eliminated.
This process ensures that any semantically relevant entities from RAG also strictly adhere to the user's explicit logical constraints, yielding a set of re-filtered semantic bird IDs. This mechanism dramatically improves the precision of the final results by eliminating logically incongruent entities that a pure RAG system might otherwise present.
- Result Merging: The final set of bird IDs for the narrative is then formed by taking the deduplicated union of the precisely matching bird IDs (from the initial IDC filtering) and the re-filtered semantic bird IDs (from the re-filtered RAG path). This strategy ensures that all deterministically matching entities are included, augmented by semantically relevant entities that also satisfy the precise filters.
- RAG-only Fallback: Differentiating Broad Queries from LLM Failures. If no structured filters were extracted in Stage 1 (extracted_filters is empty), the system differentiates between two scenarios based on the confidence_score from the LLM’s intent formalization:
1. Broad Query (High Confidence): If extracted_filters is empty and the confidence_score is high (e.g., >0.7), it indicates that the user's query was intentionally broad or general (e.g., "Show me some birds"), and no specific filters were needed. In this case, the system relies solely on the semantically related bird IDs from the RAG path as the final set of bird IDs for the narrative.
2. LLM Failure (Low Confidence): If extracted_filters is empty and the confidence_score is low (e.g., <0.7), or if the LLM's output was malformed, it signals that the LLM failed to understand or formalize the user's intent. In this scenario, the system does not proceed with the RAG fallback. Instead, it returns a SceneAction with action_type: "GENERAL_RESPONSE" containing an explicit error message (e.g., "Sorry, I could not understand your request. Please try rephrasing."). Detailed error logs are generated, and the FSM maintains its current state, ensuring system stability.
2.4. Agent-Environment Communication: The SceneAction Protocol
Algorithm 3. Structure of the SceneAction Communication Protocol.
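A minimal sketch of the SceneAction message as a serializable record follows. Only `action_type` and the user-facing message are grounded in the text; the remaining field names are assumptions about what the protocol would carry.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class SceneAction:
    # action_type is the FSM trigger, e.g. "GENERAL_RESPONSE" or "ERROR".
    action_type: str
    message: str = ""                          # user-facing text
    exhibit_ids: list = field(default_factory=list)  # assumed field
    layout: str = ""                           # assumed spatial-arrangement hint

def to_wire(action: SceneAction) -> str:
    # Serialize for the agent -> environment (frontend) channel.
    return json.dumps(asdict(action))

wire = to_wire(SceneAction(action_type="GENERAL_RESPONSE",
                           message="No match."))
print(wire)
```

A JSON wire format keeps the frontend engine decoupled from the backend agent: the frontend only needs to switch on `action_type`.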
3. Experiments
3.1. Experimental Setup
3.2. Component-Level Technical Benchmarking
3.2.1. Responsiveness & Cost-Effectiveness (Fast Path Validation)
3.2.2. Reliability & Task Success Rate (FSM Guardrail Validation)
3.2.3. Precision on Complex Queries (Hybrid IDC-RAG Validation)
- Standard RAG Failure (False Positive): The baseline RAG agent incorrectly retrieved the Bar-headed Goose (Conservation Status: Least Concern). This occurred because its textual description contained the sentence “This species is currently not considered endangered.” The vector embedding mechanism captured the high semantic similarity with the keyword “endangered” but failed to parse the negation logic (“not”), resulting in the retrieval of a non-endangered species.
- IDC Success (Logical Elimination): In contrast, the CurationAgent’s IDC Funnel correctly extracted the structured filter {“conservation_status”: “Endangered”}. During the Conditional Re-filtering stage, the system checked the metadata of the RAG-retrieved Bar-headed Goose. Since its status (Least Concern) did not match the strict filter (Endangered), the IDC logic explicitly eliminated this false positive result. This demonstrates that structured intent extraction is essential for ensuring logical correctness where probabilistic embedding fails.
3.3. Holistic User Study
3.3.1. Experiment 1: Paradigm Validation
3.3.2. Case Study: Architectural Validation of Complex Intent and Robustness
“I am looking for a bird found in (North America OR Europe), which must be a (small songbird), AND its conservation status is (NOT Endangered) AND (NOT Vulnerable).”
Listing 2. Structured JSON Query Parsed by the LLM.
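A hypothetical structured parse of this compound query, with its OR-disjunction and negated constraints made explicit, might look as follows. The key names (`any_of`, `none_of`, etc.) are illustrative, not the paper's exact Listing 2 schema.

```python
# Assumed structured form of: (North America OR Europe) AND small songbird
# AND NOT Endangered AND NOT Vulnerable.
parsed = {
    "filters": {
        "region": {"any_of": ["North America", "Europe"]},
        "category": "small songbird",
        "conservation_status": {"none_of": ["Endangered", "Vulnerable"]},
    },
}

def satisfies(bird: dict, f: dict) -> bool:
    # Evaluate the disjunction, the equality, and both negations.
    return (bird["region"] in f["region"]["any_of"]
            and bird["category"] == f["category"]
            and bird["conservation_status"]
                not in f["conservation_status"]["none_of"])

robin = {"region": "Europe", "category": "small songbird",
         "conservation_status": "Least Concern"}
print(satisfies(robin, parsed["filters"]))  # True
```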
“First, find all birds that live in the forest. THEN, exclude all (finches and crows) from that list. FINALLY, from the remaining results, show me only those that are (endangered) OR (have colorful, i.e., non-monochrome, feathers).”
- Step 1 (Retrieve): Get_birds(Habitat = “Forest”) -> Result_Set_1;
- Step 2 (Filter/Exclude): Filter_birds(input = Result_Set_1, exclude_families = [“Fringillidae”, “Corvidae”]) -> Result_Set_2;
- Step 3 (Final Filter): Filter_birds(input = Result_Set_2, condition = (status == "Endangered") OR (color == "colorful")) -> Final_Result.
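The three-step plan above can be executed as a chain of list filters. The toy records below are illustrative; the point is that Step 3 applies a disjunction, not a conjunction.

```python
# Toy dataset; family/status/color values chosen to exercise each step.
BIRDS = [
    {"id": "b1", "habitat": "Forest",  "family": "Corvidae",
     "status": "Least Concern", "color": "monochrome"},
    {"id": "b2", "habitat": "Forest",  "family": "Paridae",
     "status": "Endangered",    "color": "monochrome"},
    {"id": "b3", "habitat": "Forest",  "family": "Fringillidae",
     "status": "Endangered",    "color": "colorful"},
    {"id": "b4", "habitat": "Wetland", "family": "Gruidae",
     "status": "Endangered",    "color": "colorful"},
]

def get_birds(habitat):  # Step 1: retrieve by habitat
    return [b for b in BIRDS if b["habitat"] == habitat]

def exclude_families(birds, families):  # Step 2: filter/exclude
    return [b for b in birds if b["family"] not in families]

def final_filter(birds):  # Step 3: endangered OR colorful (disjunction)
    return [b for b in birds
            if b["status"] == "Endangered" or b["color"] == "colorful"]

result = final_filter(
    exclude_families(get_birds("Forest"), ["Fringillidae", "Corvidae"]))
print([b["id"] for b in result])  # ["b2"]
```

Here the forest crow (b1) and finch (b3) are excluded by Step 2, the wetland crane (b4) never enters the chain, and only b2 survives Step 3.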
“Tell me a joke,” or “What is the weather in Beijing?”
“I’m sorry, I am an assistant focused on the Lalu Wetland virtual exhibition and do not provide jokes or weather information. Would you like to know about the local flora or fauna?”
3.4. Summary of Experimental Findings
4. Discussion
4.1. Theoretical Implications: Validating Dual Process Theory
4.2. Comparative Analysis: Logic over Probability
4.3. Practical Deployment: Latency and Resource Trade-Offs
4.4. Limitations and Future Directions
5. Conclusions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Styliani, S.; Fotis, L.; Kostas, K.; Petros, P. Virtual museums, a survey and some issues for consideration. J. Cult. Herit. 2009, 10, 520–528. [Google Scholar] [CrossRef]
- Carrozzino, M.; Bergamasco, M. Beyond virtual museums: Experiencing immersive virtual reality in real museums. J. Cult. Herit. 2010, 11, 452–458. [Google Scholar] [CrossRef]
- Brusilovsky, P. Adaptive Hypermedia. User Model. User Adapt. Interact. 2001, 11, 87–110. [Google Scholar] [CrossRef]
- Stock, O.; Zancanaro, M.; Busetta, P.; Callaway, C.B.; Krüger, A.; Kruppa, M.; Kuflik, T.; Not, E.; Rocchi, C. Adaptive, intelligent presentation of information for the museum visitor in PEACH. User Model. User Adapt. Interact. 2007, 17, 257–304. [Google Scholar] [CrossRef]
- Thompson, J.R.; Liu, Z.; Stasko, J.T. Data Animator: Authoring Expressive Animated Data Graphics. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, Yokohama, Japan, 8–13 May 2021; ACM: New York, NY, USA, 2021; pp. 15:1–15:18. [Google Scholar] [CrossRef]
- Manfreda, S.; McCabe, M.F.; Miller, P.E.; Lucas, R.M.; Madrigal, V.P.; Mallinis, G.; Ben-Dor, E.; Helman, D.; Estes, L.D.; Ciraolo, G.; et al. On the Use of Unmanned Aerial Systems for Environmental Monitoring. Remote Sens. 2018, 10, 641. [Google Scholar] [CrossRef]
- Shi, L.; Chen, X.; Zhou, Y. Barycentric Coordinate-Based Distributed Localization for Wireless Sensor Networks Under False-Data-Injection Attacks. IEEE Trans. Cybern. 2025, 55, 1568–1579. [Google Scholar] [CrossRef] [PubMed]
- Poorvi, J.; Kalita, A.; Gurusamy, M. Reliable and Efficient Data Collection in UAV-based IoT Networks. arXiv 2023, arXiv:2311.05303. [Google Scholar] [CrossRef]
- Keim, D.A.; Andrienko, G.L.; Fekete, J.; Görg, C.; Kohlhammer, J.; Melançon, G. Visual Analytics: Definition, Process, and Challenges. In Information Visualization-Human-Centered Issues and Perspectives; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2008; Volume 4950, pp. 154–175. [Google Scholar] [CrossRef]
- Sarwar, B.M.; Karypis, G.; Konstan, J.A.; Riedl, J. Item-based collaborative filtering recommendation algorithms. In Proceedings of the Tenth International World Wide Web Conference, Hong Kong, China, 1–5 May 2001; ACM: New York, NY, USA, 2001; pp. 285–295. [Google Scholar] [CrossRef]
- Shaker, N.; Togelius, J.; Nelson, M.J. Procedural Content Generation in Games; Computational Synthesis and Creative Systems; Springer: Berlin/Heidelberg, Germany, 2016. [Google Scholar] [CrossRef]
- Summerville, A.; Snodgrass, S.; Guzdial, M.; Holmgård, C.; Hoover, A.K.; Isaksen, A.; Nealen, A.; Togelius, J. Procedural Content Generation via Machine Learning (PCGML). IEEE Trans. Games 2018, 10, 257–270. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is All you Need. In Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008. Available online: https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html (accessed on 8 December 2025).
- Yao, S.; Zhao, J.; Yu, D.; Du, N.; Shafran, I.; Narasimhan, K.R.; Cao, Y. ReAct: Synergizing Reasoning and Acting in Language Models. In Proceedings of the Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, 1–5 May 2023; OpenReview.net. 2023. Available online: https://openreview.net/forum?id=WE_vluYUL-X (accessed on 8 December 2025).
- Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Ichter, B.; Xia, F.; Chi, E.H.; Le, Q.V.; Zhou, D. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. In Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, 28 November–9 December 2022; Available online: http://papers.nips.cc/paper_files/paper/2022/hash/9d5609613524ecf4f15af0f7b31abca4-Abstract-Conference.html (accessed on 8 December 2025).
- Yao, S.; Yu, D.; Zhao, J.; Shafran, I.; Griffiths, T.; Cao, Y.; Narasimhan, K. Tree of Thoughts: Deliberate Problem Solving with Large Language Models. In Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, 10–16 December 2023; Available online: http://papers.nips.cc/paper_files/paper/2023/hash/271db9922b8d1f4dd7aaef84ed5ac703-Abstract-Conference.html (accessed on 8 December 2025).
- Skarbez, R.; Brooks, F.P., Jr.; Whitton, M.C. A Survey of Presence and Related Concepts. ACM Comput. Surv. 2018, 50, 96:1–96:39. [Google Scholar] [CrossRef]
- Schick, T.; Dwivedi-Yu, J.; Dessì, R.; Raileanu, R.; Lomeli, M.; Hambro, E.; Zettlemoyer, L.; Cancedda, N.; Scialom, T. Toolformer: Language Models Can Teach Themselves to Use Tools. In Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, 10–16 December 2023; Available online: http://papers.nips.cc/paper_files/paper/2023/hash/d842425e4bf79ba039352da0f658a906-Abstract-Conference.html (accessed on 8 December 2025).
- Kahneman, D. Thinking, Fast and Slow; Farrar, Straus and Giroux: New York, NY, USA, 2011. [Google Scholar]
- Liu, J.; Yu, C.; Gao, J.; Xie, Y.; Liao, Q.; Wu, Y.; Wang, Y. LLM-Powered Hierarchical Language Agent for Real-time Human-AI Coordination. In Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, Auckland, New Zealand, 6–10 May 2024; International Foundation for Autonomous Agents and Multiagent Systems: Taipei, Taiwan, China, 2024; pp. 1219–1228. [Google Scholar] [CrossRef]
- Baek, J.; Chandrasekaran, N.; Cucerzan, S.; Herring, A.; Jauhar, S.K. Knowledge-Augmented Large Language Models for Personalized Contextual Query Suggestion. In Proceedings of the ACM on Web Conference 2024, Singapore, 13–17 May 2024; ACM: New York, NY, USA, 2024; pp. 3355–3366. [Google Scholar] [CrossRef]
- Yang, A.; Yang, B.; Zhang, B.; Hui, B.; Zheng, B.; Yu, B.; Li, C.; Liu, D.; Huang, F. Qwen2.5 Technical Report. arXiv 2024, arXiv:2412.15115. [Google Scholar] [CrossRef]
- Xiao, S.; Liu, Z.; Zhang, P.; Muennighoff, N.; Lian, D.; Nie, J. C-Pack: Packed Resources For General Chinese Embeddings. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, Washington, DC, USA, 14–18 July 2024; ACM: New York, NY, USA, 2024; pp. 641–649. [Google Scholar] [CrossRef]
- Johnson, J.; Douze, M.; Jégou, H. Billion-Scale Similarity Search with GPUs. IEEE Trans. Big Data 2021, 7, 535–547. [Google Scholar] [CrossRef]
- Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.; Rocktäschel, T.; et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Adv. Neural Inf. Process. Syst. 2020, 33, 9459–9474. Available online: https://proceedings.neurips.cc/paper/2020/hash/6b493230205f780e1bc26945df7481e5-Abstract.html (accessed on 8 December 2025).
- Piosenka, G. Birds 525 Species (Image Classification). 2022. Available online: https://www.kaggle.com/code/sunilthite/birds-525-species-image-classification (accessed on 8 December 2025).
- Kelley, J.F. An Iterative Design Methodology for User-Friendly Natural Language Office Information Applications. ACM Trans. Inf. Syst. 1984, 2, 26–41. [Google Scholar] [CrossRef]
- Hart, S.; Staveland, L. Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. Hum. Ment. Workload. 1988, 52, 139–183. [Google Scholar] [CrossRef]
- Brooke, J. SUS-A quick and dirty usability scale. Usability Eval. Ind. 1996, 189, 4–7. [Google Scholar] [CrossRef]




| Metric, mean (SD) | CurationAgent | Wizard-of-Oz (WoZ) | Static Exhibition |
|---|---|---|---|
| Task Completion (s) | 2.91 (0.95) | 15.23 (2.31) | 169.04 (72.72) |
| Cognitive Load (TLX) | 17.78 (3.08) | 25.35 (6.94) | 65.56 (5.96) |
| System Usability (SUS) | 74.67 (5.57) | 68.89 (5.21) | 59.33 (5.20) |
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Luo, S. Sensing User Intent: An LLM-Powered Agent for On-the-Fly Personalized Virtual Space Construction from UAV Sensor Data. Sensors 2025, 25, 7610. https://doi.org/10.3390/s25247610


