1. Introduction
Large language model (LLM)-based conversational systems are now widely used across multiple domains, including education [1,2,3]. Artificial intelligence has consequently moved from an experimental technology to an influential sociotechnical infrastructure with a growing presence in formal and informal education [2,3,4,5,6,7]. Importantly, this transition has shifted attention away from model capabilities alone toward system-level questions concerning how reasoning processes are structured, controlled, and sustained during human–AI interaction [2]. In domains requiring transparent, incremental reasoning, the lack of explicit workflow and reasoning-control mechanisms remains a key limitation of general-purpose LLM applications [8,9].
In educational settings, the question is no longer whether LLMs will be used, but how interactive LLM-based systems should be designed to support meaningful reasoning. Recent research has examined a range of interaction paradigms in which LLMs function as tutors, assistants, or feedback providers, with a focus on their influence on learning processes [5,6,10,11]. From a systems perspective, the challenge is not only to generate fluent responses but also to regulate how outputs are produced and integrated into user workflows. The cross-domain applicability and low configuration requirements of LLM-based systems [6,12,13,14] have accelerated their adoption, yet these systems also expose interaction-design limitations, particularly in controlling reasoning depth, sequencing, and transparency.
Within this broader context of system-level adoption, several benefits of LLM-based conversational systems have been reported, particularly for text-centric tasks such as drafting, revision, and summarization [5]. More recent studies also indicate potential value in instructional settings, including mathematics learning across multiple educational levels [15,16,17]. Despite these benefits, substantial concerns have been raised regarding the pedagogical implications of unrestricted LLM usage [2,6,11,18,19]. A recurring issue is the tendency of general-purpose chatbots to provide final answers or complete solutions without engaging learners in intermediate problem-solving steps [13,19,20,21]. This can lead to superficial task completion by bypassing essential processes such as understanding the task, planning a strategy, and executing it stepwise [22,23]. From a pedagogical perspective, skipping these steps undermines the development of conceptual understanding and problem-solving skills, even when the final answer is correct [24,25]. Another challenge is learners’ increasing reliance on LLM-based chatbots, particularly when they are unable to solve a task independently [19,20,26,27]. When users lack sufficient domain knowledge, they are less capable of identifying errors, inconsistencies, or hallucinated responses produced by the model [2,28]. This increases the risk of uncritical acceptance of generated solutions and further amplifies the negative effects of step skipping and overreliance on automated assistance [20,23,26,28,29].
Motivated by these challenges, this paper presents SPARK_AI (Socratic–Polya Adaptive Reasoning Kit), an adaptable, prompt-orchestrated system architecture that prioritizes process-oriented reasoning over direct answer generation. SPARK_AI enforces explicit interaction control through stepwise phase progression and a bounded hint policy, while maintaining stateful, per-problem sessions with persistent dialog traces and structured logs that support reproducible interaction trajectories and fine-grained analysis. We instantiate the approach in SPARK_AI_MATH, a domain module that operationalizes Pólya-inspired problem-solving phases and Socratic dialog with optional tool-mediated scaffolding (e.g., GeoGebra). We also report a preliminary user-acceptance evaluation with students, combining descriptive statistics, qualitative analysis, and partial least squares structural equation modeling (PLS-SEM). The technical solution is implemented as a web-based system that separates a persistence layer from an LLM orchestration service. It remains provider-agnostic through an adapter-based provider abstraction, so the underlying LLM backend can be swapped without altering the core reasoning protocol, and the same control and logging structure supports efficient domain adaptation through localized prompt strategies.
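To make the control layer described above more concrete, the following minimal Python sketch shows one way stepwise phase progression, a bounded hint policy, per-problem session state, structured logs, and a swappable provider adapter can fit together. It is not the SPARK_AI implementation. The phase names, the `LLMProvider` interface with a single `complete` method, the offline `EchoProvider` stub, the `PHASE_PROMPTS` texts, the `ProblemSession` class, the default budget of two hints per phase, and the per-phase reset of that budget are all hypothetical choices made for illustration. An in-memory list also stands in for the persistence layer.

```python
import json
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Protocol


class Phase(Enum):
    # Pólya-inspired phases; names are illustrative assumptions.
    UNDERSTAND = 1
    PLAN = 2
    EXECUTE = 3
    LOOK_BACK = 4


class LLMProvider(Protocol):
    # Hypothetical adapter interface: any backend that exposes complete().
    def complete(self, system_prompt: str, user_message: str) -> str: ...


class EchoProvider:
    # Offline stub standing in for a real LLM backend.
    def complete(self, system_prompt: str, user_message: str) -> str:
        return f"[{system_prompt}] {user_message}"


# Hypothetical domain-localized prompt strategy, one instruction per phase.
PHASE_PROMPTS = {
    Phase.UNDERSTAND: "Ask what is given and what is sought; do not solve.",
    Phase.PLAN: "Ask which strategy might work; do not solve.",
    Phase.EXECUTE: "Guide one step at a time; do not reveal the final answer.",
    Phase.LOOK_BACK: "Ask the student to check and reflect on the result.",
}


@dataclass
class ProblemSession:
    problem_id: str
    provider: LLMProvider
    max_hints_per_phase: int = 2  # bounded hint policy; value assumed
    phase: Phase = Phase.UNDERSTAND
    hints_used: int = 0
    log: list = field(default_factory=list)  # in-memory stand-in for storage

    def _record(self, event: str, text: str) -> None:
        # Append one structured entry to the per-problem trace.
        self.log.append({"problem": self.problem_id, "phase": self.phase.name,
                         "event": event, "text": text})

    def respond(self, user_message: str) -> str:
        # The phase-specific prompt constrains what the backend is asked to do.
        self._record("user", user_message)
        reply = self.provider.complete(PHASE_PROMPTS[self.phase], user_message)
        self._record("assistant", reply)
        return reply

    def request_hint(self) -> Optional[str]:
        # Refuse further hints once the per-phase budget is spent.
        if self.hints_used >= self.max_hints_per_phase:
            self._record("hint_denied", "")
            return None
        self.hints_used += 1
        hint = self.provider.complete(
            PHASE_PROMPTS[self.phase] + " Give one short hint.", "hint request")
        self._record("hint", hint)
        return hint

    def advance(self) -> bool:
        # Phases advance strictly in order; none can be skipped.
        if self.phase is Phase.LOOK_BACK:
            return False
        self.phase = Phase(self.phase.value + 1)
        self.hints_used = 0  # the hint budget resets for each new phase
        self._record("advance", self.phase.name)
        return True

    def export_jsonl(self) -> str:
        # One JSON object per line, suitable for replay or later analysis.
        return "\n".join(json.dumps(e, ensure_ascii=False) for e in self.log)
```

In this sketch, swapping backends only means passing a different object that has a `complete` method, for example `ProblemSession("task-1", EchoProvider())`. The phase order, hint budget, and log format stay unchanged, which mirrors the separation between the reasoning protocol and the provider adapter described in the text.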
5. Discussion
The descriptive analysis provides evidence that SPARK_AI_MATH was positively perceived by first-year mechanical engineering students when used for mathematical problem solving. Mean scores across all TAM constructs were consistently high, with particularly strong ratings for the SPARK-specific learning support items (Table 2). These results suggest that students valued the system’s instructional scaffolding, including visualization support, perceived improvement in understanding upon task completion, and structured step-by-step guidance instead of an immediate solution. Among the TAM constructs, perceived ease of use (PEOU) was also one of the most strongly rated dimensions, indicating that students adapted quickly to the chatbot-based tutoring interaction without notable usability barriers. Overall, all questionnaire items received relatively positive evaluations, with mode values predominantly at 4 or 5 on the Likert scale (Appendix A, Table A1). This distribution indicates that most participants expressed favorable attitudes toward the system not only in terms of learning support and perceived ease of use, but also with respect to intrinsic motivation, perceived usefulness, and behavioral intention to use such a tool in the future. From a system design perspective, these descriptive patterns suggest that users respond positively to interaction paradigms that explicitly structure reasoning processes and make intermediate steps visible, rather than treating the language model as a black-box answer generator.
The structural model results indicate that intrinsic motivation functions as a primary upstream driver of technology-related perceptions. Intrinsic motivation exhibits strong positive effects on both perceived ease of use and perceived usefulness, suggesting that more intrinsically motivated participants are more likely to perceive the system as easy to use and, more importantly, as useful. This pattern is consistent with prior studies of students’ adoption of generative AI tools, including research on the use of ChatGPT in higher education [56]. It suggests that motivational predispositions shape how users engage with the system’s reasoning workflow, reinforcing the role of adaptive interaction design in supporting positive perceptions of usability and usefulness.
In turn, perceived ease of use contributes more modestly to perceived usefulness, supporting the notion that usability primarily facilitates the formation of usefulness beliefs rather than directly driving adoption. Similar relationships have been reported in studies investigating student acceptance of educational platforms and digital learning technologies [67,68].
Perceived usefulness emerges as a central mechanism translating these perceptions into learning-related outcomes and usage intentions. Perceived usefulness strongly predicts learning support and exerts a direct effect on behavioral intention, while learning support further contributes to behavioral intention. This pattern is consistent with findings reported in prior studies on the adoption of AI-based and educational technologies [56,67,68]. Taken together, these results suggest that users’ intentions to continue using the system are driven primarily by perceived instructional value and the learning support provided by the system, rather than by usability considerations alone. In the context of SPARK_AI_MATH, perceived usefulness appears to be closely tied to the system’s ability to regulate reasoning flow and scaffold intermediate steps, rather than to surface-level usability features.
Notably, the direct effect of perceived ease of use on behavioral intention was not statistically significant and was slightly negative, a finding consistent with prior research on student adoption of generative AI tools [56]. This pattern suggests that ease of use alone may be insufficient to sustain engagement in reasoning-centered systems, underscoring the importance of explicitly communicating and delivering cognitive value through structured interaction.
In addition, the direct path from intrinsic motivation to behavioral intention was also non-significant, diverging from the results reported by Lai (2023) [56]. However, intrinsic motivation exhibited a significant overall effect on behavioral intention through indirect pathways, indicating that motivation influences intention primarily via perceived usefulness and learning support. The absence of a direct relationship should therefore not be interpreted as evidence that the link between intrinsic motivation and behavioral intention is unimportant. Rather, it may suggest that, in this context, intrinsic motivation toward the tool was expressed less as a direct driver of intention and more through students’ beliefs about the system’s usefulness and its capacity to support learning. One possible interpretation is that first-year mechanical engineering students approached SPARK_AI_MATH primarily as a practical learning support tool; accordingly, their intention to continue using it depended more on its perceived instructional value than on whether the system itself was intrinsically motivating to use. This interpretation remains tentative, however, because the present quantitative design does not permit firm conclusions about the underlying reasons for this pattern.
Qualitative feedback complemented the quantitative results by clarifying how students experienced SPARK_AI_MATH. Participants highlighted clear explanations, step-by-step guidance, and, in some cases, visualization as key strengths, which is consistent with prior research showing that students value educational chatbots and intelligent tutoring systems that provide explanatory feedback, structured guidance, and learning support [69,70,71]. In particular, the emphasis on step-by-step problem decomposition aligns with the broader literature on intelligent tutoring systems, in which guided support is regarded as a central mechanism for promoting understanding and problem solving [70]. At the same time, students reported practical limitations affecting interaction efficiency, including response latency, limited multimodal input, and the inability to edit previous turns. Such issues are highly relevant to the overall user experience and align with prior findings emphasizing the importance of response time in conversational systems [72] and the value of flexible input modalities, including handwriting-based mathematical input, in tutoring environments [73]. Overall, these findings suggest that students located the system’s primary value in its reasoning-control mechanisms rather than in raw answer generation, and they motivate concrete requirements: latency-aware response handling and multimodal input support for diagram- or handwriting-based work.
From a system design perspective, these findings highlight the importance of explicit reasoning control, guided problem solving, and sustained engagement in LLM-based conversational systems. The strong perceptions of learning support align with the design choice to enforce stepwise reasoning and integrate visualization into the tutoring workflow. However, these results are preliminary and reflect short-term, first-use perceptions. Future work should complement acceptance evidence with objective task and learning measures, such as solution accuracy, quality of intermediate problem-solving steps, time to solution, and retention or transfer. Larger samples would also enable more reliable PLS-SEM analyses of indirect effects, group differences, and changes over sustained use. Overall, perceived value appears to depend more on an interaction architecture that supports structured and transparent reasoning than on generative fluency alone.
SPARK_AI can be situated in relation to several recent LLM-based tutoring architectures. Like Khanmigo, it is intended to support learning through guided interaction rather than direct answer delivery. However, whereas Khanmigo is embedded in Khan Academy’s broader multi-subject learning ecosystem and is presented as an AI tutor that prompts students to think critically without simply providing answers, SPARK_AI places stronger emphasis on explicit control of reasoning progression in mathematics-focused dialog (Khan Academy, 2023) [74].
In contrast to systems embedded within predefined content structures, SPARK_AI supports more flexible, user-initiated problem exploration. SPARK_AI also shares important pedagogical ground with TALPer, which combines Socratic dialog and Pólya’s problem-solving strategy within the Taiwan Adaptive Learning Platform, but differs in targeting older learners and in framing support through a more explicit reasoning-control architecture with a distinct hint policy and optional tool-mediated visualization support [15]. A further point of comparison is SocraticLM, which advances a Socratic, thought-provoking teaching paradigm through multi-round tutoring dialogs; compared with such approaches, SPARK_AI is less centered on model-level teaching-style generation and more on reproducible interaction design that regulates how support unfolds across the phases of problem solving [75]. Finally, unlike recent LLM tutoring architectures that emphasize personalization, learned pedagogical response generation, or model-level optimization of student outcomes, SPARK_AI foregrounds reproducible reasoning control at the interaction-design level through pedagogically constrained orchestration rather than through personalization or model fine-tuning alone [76]. This distinction is particularly relevant in mathematical problem solving, where the structure and progression of reasoning play a central role in learning. From this perspective, the main contribution of SPARK_AI lies not in any claim of general superiority over other AI tutors, but in offering a transparent and controllable architecture for process-oriented mathematical reasoning support.