AI-based educational chatbots can expand access to learning, but many remain limited to text-only interfaces and fixed infrastructures, while purely generative responses raise concerns of reliability and consistency. In this context, we present ROboMC, a portable and multimodal system that combines a validated
[...] Read more.
AI-based educational chatbots can expand access to learning, but many remain limited to text-only interfaces and fixed infrastructures, while purely generative responses raise concerns of reliability and consistency. In this context, we present ROboMC, a portable and multimodal system that combines a validated knowledge base with generative responses (OpenAI) and voice–text interaction, designed to enable both text and voice interaction, ensuring reliability and flexibility in diverse educational scenarios. The system, developed in Django, integrates two response pipelines: local search using normalized keywords and fuzzy matching in the LocalQuestion database, and fallback to the generative model GPT-3.5-Turbo (OpenAI, San Francisco, CA, USA) with a prompt adapted exclusively for Romanian and an explicit disclaimer. All interactions are logged in AutomaticQuestion for later analysis, supported by a semantic encoder (SentenceTransformer—paraphrase-multilingual-MiniLM-L12-v2’, Hugging Face Inc., New York, NY, USA) that ensures search tolerance to variations in phrasing. Voice interaction is managed through gTTS (Google LLC, Mountain View, CA, USA) with integrated audio playback, while portability is achieved through deployment on a Raspberry Pi 4B (Raspberry Pi Foundation, Cambridge, UK) with microphone, speaker, and battery power. Voice input is enabled through a cloud-based speech-to-text component (Google Web Speech API accessed via the Python SpeechRecognition library, (Anthony Zhang, open-source project, USA) using the Google Web Speech API (Google LLC, Mountain View, CA, USA; language = “ro-RO”)), allowing users to interact by speaking. Preliminary tests showed average latencies of 120–180 ms for validated responses on laptop and 250–350 ms on Raspberry Pi, respectively, 2.5–3.5 s on laptop and 4–6 s on Raspberry Pi for generative responses, timings considered acceptable for real educational scenarios. A small-scale usability study (N ≈ 35) indicated good acceptability (SUS ~80/100), with participants valuing the balance between validated and generative responses, the voice integration, and the hardware portability. Although system validation was carried out in the eHealth context, its architecture allows extension to any educational field: depending on the content introduced into the validated database, ROboMC can be adapted to medicine, engineering, social sciences, or other disciplines, relying on ChatGPT only when no clear match is found in the local base, making it a scalable and interdisciplinary solution.
Full article