1. Introduction
In recent years, the mental health crisis among postsecondary students has emerged as a critical concern within the educational and healthcare communities [1,2,3,4]. A growing body of evidence highlights a significant uptick in mental health symptoms, including anxiety, depression, stress-related and sleep disorders, and even suicidal ideation, thereby impacting students’ academic performance, quality of life, and overall well-being [5,6].
Data from a United Kingdom (UK) survey show a notable increase in students reporting a serious psychological issue from 2018 to 2020 [7]. These rates have been compounded by the recent pandemic, which has negatively impacted student mental health, with increases in depression and alcohol-use disorder [8]. The transition to higher education introduces a unique set of stressors. As educators and healthcare professionals grapple with these complexities, the potential of Artificial Intelligence (AI) to offer innovative support mechanisms has come to the fore.
AI chatbots, designed to simulate conversational interactions, present a promising avenue for providing immediate, accessible, and non-judgmental mental health support [9,10]. With the COVID-19 pandemic accelerating the digitalization of healthcare services, the integration of AI technologies such as chatbots into postsecondary educational settings is posited as a transformative approach to supplement traditional mental health services, potentially bridging gaps in access and reducing the stigma associated with seeking help [11]. However, using AI in sensitive areas such as mental health raises substantial ethical, privacy, and safety concerns, particularly regarding the predictability and reliability of AI responses in handling the wide range of complex human emotions [12,13].
A critical hurdle to the adoption of AI chatbots for mental health support in educational settings is the Institutional Review Board (IRB) approval process. The uncertainty surrounding AI chatbots’ responses and the potential for unforeseen harm pose significant challenges, prompting IRBs to exercise caution [14].
This study introduces Luna, a modular PHP-based web application with dedicated components for configuration, user authentication, chat processing, and API integration with the GPT-4 AI model. Details on Luna’s technical implementation—including novel prompt engineering techniques and robust safety guardrails—are provided. Through examining the development, pilot testing, and subsequent modifications of Luna, our research team addresses the critical concerns raised by the IRB committee. Importantly, the current work identifies unique policy challenges in ethically integrating AI chatbots into educational settings.
The first section of the paper reviews the literature exploring mental health challenges in higher education and strategies for adopting AI in these settings. This is followed by the Materials and Methods section, which first describes the development and pre-testing of the AI chatbot, Luna, and then the research design, encompassing the multi-method approach used to pilot-test Luna. The Results section highlights the findings from the pilot study. Before the conclusion, we share insights into the study’s implications for health policy, its limitations, areas for future research, and a discussion of the complexities of navigating the ethical approval process.
2. Literature Review
Mental health concerns among postsecondary students have risen significantly over the past decade, with data indicating a 50% increase in reported issues since 2013 [3,15]. American College Health Association (ACHA) surveys have consistently reported stress, anxiety, depression, and suicidal ideation as significant concerns in the higher education population, with one study demonstrating an association between these mental health concerns and unhealthy lifestyle behaviors such as poor diet and substance use [1].
Specific mental health concerns intensified by the pandemic include depression, substance use, eating disorders, and suicidal ideation [16,17,18]. Recent 2023 data from the Center for Collegiate Mental Health (CCMH) at Penn State University also found that anxiety is the most common initial concern with which students present, and that social anxiety and family distress concerns have continued to increase since their initial rise during the pandemic [19]. Aside from barriers to accessing mental healthcare, other major factors include academic and financial stress [20]. Counseling provided through higher education institutions is generally effective and helpful, with evidence of improvements in student academic achievement and retention [21]. However, the demand for resources far exceeds what is available to help college students [22]. Notably, universities have difficulty developing policies and methods to identify the mental health challenges that college students may face and finding solutions to address student mental health issues within a large population [23]. Postsecondary students must maintain their academic performance while balancing the need for financial support, which can create overwhelming anxiety and worry; these worries can be exacerbated by a background of childhood poverty. The financial burden, alongside chronic stress, can lead to hopelessness and helplessness [2]. Overall well-being and mental wellness can be significantly affected by financial instability or fear of the unknown. Additionally, postsecondary students face peer and societal pressure that further fosters conditions for mental health challenges [24]. Consequently, feelings of low self-worth and self-doubt add to the psychological burden, intensifying the overall pressure students experience. This is further complicated by the financial pressures of rising tuition fees, serviced by ballooning student loans and/or unsustainable scholarships. For an increasing number of students, financial stress can be a significant trigger for mental health disorders [25].
When students are in an academic setting and feel pressured to succeed, fear can quickly contribute to poor mental health. The fear of failing can worsen pre-existing mental health conditions and lead students to develop challenges that did not previously exist. Finally, the transition from high school to postsecondary education can also contribute to mental health challenges that may not have existed previously or exacerbate ones that already exist [26].
Several existing mental health chatbots have been deployed in educational or therapeutic contexts, including Woebot, Wysa, Tess, and Pi. Woebot and Wysa use scripted responses based on cognitive behavioral therapy (CBT) principles and have shown promise in delivering short-term anxiety and mood support. Tess, developed for mental health coaching, is deployed in clinical and organizational settings and integrates emotional AI to tailor responses. Pi is a newer conversational AI known for its supportive and empathetic tone [27]. These tools rely on predefined frameworks or static conversation models. Luna distinguishes itself through its use of a GPT-4 foundation, dynamic prompt engineering, and university-contextualized safety guardrails.
In the absence of accessible pre-existing support systems, students may face an escalating burden from these adverse conditions, particularly as their coping capacities diminish over time. Symptoms of depression, anxiety, or burnout can lead to a decline in student well-being and academic performance. Stigma and barriers to accessing or seeking care are common issues that college students encounter, increasing the risk of developing mental health disorders or worsening existing ones [28]. Adequate support and effective coping mechanisms can lessen the intensity of these challenges and enhance mental well-being and the overall college experience. Many students experience feelings of loneliness, homesickness, and a sense of disorientation as they adapt to their new environment [29].
Several studies have examined AI and mental health among students. One study found that students interacting with an AI chatbot valued its empathetic language and viewed it as a supportive peer for practicing emotional expression [30]. Similarly, a large-scale survey [31] reported that nearly half of university students had engaged with mental health chatbots, citing benefits such as anonymity and accessibility, though concerns about limited personalization persisted. Complementing these findings, a review [32] emphasized both the potential and the limitations of conversational agents in mental healthcare, particularly regarding crisis management, intervention fidelity, and ethical oversight. Together, these studies reinforce both the promise and the challenges of deploying AI-driven mental health support systems like Luna within higher education.
3. Materials and Methods
3.1. Design Science Framework
The development and evaluation of Luna was guided by the Design Science Research (DSR) methodology, incorporating both the Three-Cycle View and the Framework for Evaluation in Design Science (FEDS). This approach facilitated the iterative construction of the chatbot as an artifact responsive to both user needs and institutional requirements. The Relevance Cycle connected Luna to the higher education context—in particular, student mental health challenges and IRB review processes—while the Design Cycle encompassed the build–evaluate–refine loop informed by expert feedback and user pilot data. The Rigor Cycle drew upon theoretical foundations in AI ethics, digital mental health, and counseling practices, particularly in the creation of ethical guardrails and the validation of evidence-based responses.
Figure 1a presents an adapted DSR model that contextualizes Luna’s developmental process within these three interacting cycles.
Figure 1b illustrates the four core phases of the Luna chatbot’s development: Prompt Design, Prototype, Pilot Testing, and Refinement. These phases align with traditional system development stages—Planning, System Design, Implementation, and Evaluation—mapped beneath each corresponding project-specific milestone.
3.2. Study Design and Rationale
This pilot study was conducted to evaluate “Luna” from both design-science and empirical research perspectives. The design-science aspect involved the iterative development and documentation of Luna’s modular architecture, with the aim of enabling reproducibility in different contexts. Empirically, the pilot investigated perceptions of Luna’s usability, safety, and usefulness among a small group of participants, without a formal control group. Three core themes commonly affect college students during periods of high stress: Time Management, Anxiety, and Stress. Each theme has subthemes, giving students the opportunity to utilize evidence-based resources, such as mindfulness activities, time management techniques, and strategies for coping with school-related stress, through Luna. The themes and subthemes were used to develop prompts that generated content to assist students in developing effective coping strategies and self-regulation techniques and to encourage them to prioritize their mental health.
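The following PHP sketch illustrates, under stated assumptions, one way this theme-to-prompt mapping could be represented in code; the array keys, subtheme names, and directive wording are hypothetical and do not reflect Luna’s production configuration.

```php
<?php
// Illustrative sketch only: hypothetical mapping of core themes to subthemes
// and prompt directives. Names and wording are assumptions, not Luna's
// production configuration.
$themePrompts = [
    'time_management' => [
        'subthemes' => ['procrastination', 'balancing work and study'],
        'directive' => 'Offer evidence-based time management techniques such as prioritization and scheduling.',
    ],
    'anxiety' => [
        'subthemes' => ['test anxiety', 'social anxiety'],
        'directive' => 'Respond with calming, supportive language and suggest mindfulness activities.',
    ],
    'stress' => [
        'subthemes' => ['academic pressure', 'financial stress'],
        'directive' => 'Suggest coping strategies for school-related stress and encourage self-care.',
    ],
];

// Example: retrieve the directive used to shape a prompt for an anxiety-related request.
echo $themePrompts['anxiety']['directive'];
```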
3.3. IRB Approval and Ethical Procedures
The study was reviewed by the IRB at the University of Detroit Mercy (UDM)—(Protocol #23-24-38), with deferred full approval and a requested pilot study to evaluate the safety of Luna. Participants provided written informed consent, and the study adhered to ethical safeguards that aligned with the Common Rule (45 CFR 46), including data anonymization, access-restricted storage, and crisis response protocols for at-risk users. Although the study was not registered under CONSORT-AI or CARE guidelines due to its pilot nature, key checklist components—such as version disclosure (GPT-4), consistent prompt delivery, participant safety measures, and data governance—were addressed. Future research will seek full IRB approval and compliance with formal reporting standards for AI-based mental health interventions to strengthen reproducibility and ethical transparency.
3.4. Computational Implementation and System Architecture
Luna was designed as a modular PHP web application with distinct components for user authentication, chat processing, prompt engineering (via GPT-4), and integrated safety guardrails. The chatbot’s iterative refinement was guided by early feedback from AI ethics experts and mental health professionals, resulting in enhancements to response accuracy, style, and crisis escalation protocols. The system was composed of several distinct components.
- Configuration: This file established the environment settings for Luna, including database connection details, API keys, and other critical parameters (an illustrative example follows this list).
- Authentication: These scripts secured user login and logout processes, ensuring that only authorized users could access Luna.
- Main interface: Serving as the entry point and dashboard, these files provided the main interface through which users interacted with Luna.
- Chat processing: This component handled the processing of user inputs, including the detection of sensitive content and the activation of safety protocols.
- API endpoint: Acting as the intermediary, this file packaged user queries and communicated with the GPT-4 AI backend to retrieve and deliver generated responses.
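As a minimal sketch of the configuration component described above, the example below shows how such environment settings might be declared in a PHP configuration file; the file name, constant names, and values are assumptions for illustration only.

```php
<?php
// config.php - illustrative sketch only; constant names and values are hypothetical.
// Centralizes environment settings used by the other modules.

// Database connection details
define('DB_HOST', 'localhost');
define('DB_NAME', 'luna');
define('DB_USER', 'luna_app');
define('DB_PASS', getenv('LUNA_DB_PASSWORD'));   // secrets kept out of source control

// GPT-4 API integration
define('OPENAI_API_KEY', getenv('OPENAI_API_KEY'));
define('OPENAI_MODEL', 'gpt-4');

// Safety and session parameters
define('CRISIS_HOTLINE', '988');                 // US suicide and crisis lifeline
define('SESSION_TIMEOUT_MINUTES', 30);
```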
When a user logs in via the authentication module, they access the main dashboard, where a chat session is initiated. User inputs are processed by the chat module, which applies safety guardrails, such as checking for sensitive content, and then forwards sanitized input to the GPT-4 backend through the API endpoint. The AI-generated response is returned to the user, completing the interaction. A simplified pseudocode example of this logic is shown in Figure 2, a flowchart that outlines the full interaction pipeline for the Luna mental health chatbot. After receiving user input, the system performs content filtering to detect sensitive or emergency topics. If the input is flagged, the system escalates to safety protocols or provides crisis referral resources. If deemed safe, the input is sanitized and forwarded to the GPT-4 API using structured prompt formatting. The AI response is post-processed with guardrails and presented to the user, followed by optional user feedback collection. All interactions are securely logged for audit and improvement.
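The sketch below mirrors, in simplified PHP, the pipeline summarized above and in Figure 2; the function names, keyword list, and request format are assumptions rather than Luna’s actual source code.

```php
<?php
// chat_pipeline.php - simplified sketch of the interaction flow described above.
// Function names, keyword lists, and API details are assumptions for illustration.
require_once 'config.php';

function containsCrisisLanguage(string $input): bool {
    // Hypothetical keyword screen for content requiring escalation.
    $keywords = ['suicide', 'kill myself', 'self-harm'];
    foreach ($keywords as $kw) {
        if (stripos($input, $kw) !== false) {
            return true;
        }
    }
    return false;
}

function handleUserMessage(string $userInput): string {
    // 1. Content filtering and crisis escalation.
    if (containsCrisisLanguage($userInput)) {
        return 'If you are in crisis, please call or text ' . CRISIS_HOTLINE .
               ' or contact university counseling services immediately.';
    }

    // 2. Sanitize input before it is embedded in the prompt.
    $sanitized = htmlspecialchars(trim($userInput), ENT_QUOTES, 'UTF-8');

    // 3. Build the layered prompt: persistent persona plus the user's message.
    $messages = [
        ['role' => 'system', 'content' => 'You are Luna, a supportive mental health assistant for university students.'],
        ['role' => 'user',   'content' => $sanitized],
    ];

    // 4. Forward the request to the GPT-4 backend (chat completions endpoint).
    $payload = json_encode(['model' => OPENAI_MODEL, 'messages' => $messages]);
    $ch = curl_init('https://api.openai.com/v1/chat/completions');
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST           => true,
        CURLOPT_HTTPHEADER     => [
            'Content-Type: application/json',
            'Authorization: Bearer ' . OPENAI_API_KEY,
        ],
        CURLOPT_POSTFIELDS     => $payload,
    ]);
    $raw = curl_exec($ch);
    curl_close($ch);
    $response = is_string($raw) ? json_decode($raw, true) : null;

    // 5. Post-process and return the AI-generated reply.
    return $response['choices'][0]['message']['content'] ?? 'Sorry, something went wrong. Please try again.';
}
```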
3.5. Luna Development Stages
The development process for Luna entailed several key stages.
Luna’s foundational model was built using GPT-4, which had been fine-tuned with a specific focus on mental health-related conversations.
To ensure that Luna understood and responded appropriately to a wide range of student concerns, its customization involved prompt engineering techniques designed to train the AI on mental health scenarios, including discussions on anxiety, depression, and stress management.
Figure 3 illustrates the user interface, designed for clarity and ease of use, presenting categorized support options for time management, anxiety, and stress through intuitive, clickable prompts.
Luna’s prompt engineering strategy was specifically tailored for mental health conversations, incorporating a multi-layered approach to guide GPT-4’s responses. The system used a persistent system-level prompt that defined Luna’s persona, therapeutic tone, and safety constraints. This was dynamically paired with real-time user input and contextual metadata—such as emotional keywords (e.g., “stressed,” “overwhelmed”)—to inject additional guidance into the prompt. For example, if the input contained a test-related concern, a calming or cognitive reframing directive was applied. Unlike standard prompt engineering, which typically involves static, one-shot instruction templates, Luna’s approach used adaptive prompting, integrating intent detection, content filters, and escalation keywords. This ensured that generated responses remained contextually relevant, emotionally appropriate, and ethically constrained. This layered prompt architecture was key to supporting Luna’s therapeutic use-case, setting it apart from traditional chatbot prompting techniques. A key benefit of a prompt engineering approach was the implementation of safety guardrails. These guardrails were designed to recognize conversations around sensitive topics surrounding anxiety, depression, suicidality and other mental-health related issues and respond accordingly.
Anxiety and Depression: When conversations indicated symptoms of anxiety or depression, Luna was programmed to recommend university counseling services. This was achieved through pre-defined triggers that, upon detection, guided the conversation towards professional resources.
Suicidality: For users mentioning suicidality, Luna was designed to immediately provide emergency contact information and national suicide helpline resources. This critical safety measure ensured that users would receive prompt and appropriate guidance in urgent situations.
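To make the layered prompting and guardrail triggers described above more concrete, the following sketch shows one way such logic could be expressed in PHP; the keyword patterns and directive text are illustrative assumptions, not Luna’s actual rules.

```php
<?php
// prompt_layers.php - illustrative sketch of adaptive prompt construction.
// Keyword patterns and directive wording are assumptions, not Luna's actual rules.

function buildSystemPrompt(string $userInput): string {
    // Persistent system-level persona, tone, and safety constraints.
    $prompt = 'You are Luna, an empathetic mental health support chatbot for university students. '
            . 'Use a warm, therapeutic tone and never provide medical diagnoses. ';

    // Contextual directive injected from emotional keywords in the user input.
    if (preg_match('/\b(stressed|overwhelmed|exam|test)\b/i', $userInput)) {
        $prompt .= 'The student appears stressed about academics; offer calming, cognitive reframing strategies. ';
    }

    // Guardrail: anxiety or depression cues trigger a referral directive.
    if (preg_match('/\b(anxious|anxiety|depress(ed|ion))\b/i', $userInput)) {
        $prompt .= 'Recommend the university counseling services as a professional resource. ';
    }

    // Guardrail: suicidality cues trigger immediate crisis resources.
    if (preg_match('/suicid|end my life|kill myself/i', $userInput)) {
        $prompt .= 'Provide emergency contact information and the national suicide helpline immediately. ';
    }

    return $prompt;
}
```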
Stage 3. AI Illustrative Applications
To enhance Luna’s therapeutic capabilities, illustrative applications (Figure 3) were integrated into the system via prompt modification. These applications included therapy sessions, mindfulness exercises, and cognitive-behavioral techniques, which Luna could recommend based on the user’s needs.
Each application was carefully selected and tested to align with best practices in virtual mental health support.
Figure 4 is an example of a mindfulness request.
Prior to deployment, Luna was extensively pre-tested to ensure the soundness of its AI responses. A diverse set of simulated user interactions was created to test Luna’s responses across various scenarios. These simulations helped to identify potential weaknesses in the AI’s understanding and responsiveness.
3.6. Research Design
The pilot employed a mixed-methods approach to assess the effectiveness and safety of Luna. This approach was adopted to leverage the strengths of both quantitative and qualitative methods, allowing for a comprehensive evaluation of Luna’s performance from multiple perspectives. Participants were recruited voluntarily from a diverse group of college students and healthcare professionals to reflect a broad spectrum of users. Detailed information about the study’s objectives, the function of the AI chatbot, and ethical considerations, including data privacy, was provided to the participants, from whom informed consent was individually obtained prior to study participation.
3.7. Participants and Recruitment
A total of 52 individuals participated in the pilot, including 34 students drawn from various universities and academic programs (e.g., nursing, psychology, and behavioral analysis) and 18 healthcare professionals or faculty members (e.g., nurses, nurse practitioners, and professors). Recruitment was conducted through emailed invitations and campus flyers. Volunteers were directed to a secure login portal for interacting with Luna, after which they submitted anonymized transcripts of their conversations and completed an online survey.
3.8. Data Collection and Instruments
Participants engaged in one or more chat sessions with Luna and were subsequently asked to provide transcript excerpts, enabling the research team to assess conversation quality and adherence to safety measures. In addition, participants completed a web-based survey designed to capture the following information:
Perceived Usefulness: This was assessed on five-point Likert scales, with items such as “How helpful was Luna’s advice in managing stress or anxiety?” and “Did interacting with Luna motivate you to seek additional mental health resources?”
Safety and Appropriateness: Participants indicated agreement with statements such as “I felt comfortable sharing personal concerns with Luna,” using five-point scales.
Topic-Specific Queries: The survey included items related to time management, relationships, academic pressure, and other non-suicidal concerns to ensure a broad capture of issues beyond crisis-level interactions.
Open-Ended Feedback: Free-response fields allowed the participants to describe any particularly useful or unhelpful chatbot responses and to suggest improvements or additional features.
3.8.1. Quantitative Methods
Quantitative data were collected using a structured, custom-developed online survey designed to assess the perceived usefulness and safety of Luna. Survey items were informed by the literature on digital mental health interventions and refined through pilot testing with students to ensure clarity. Participants completed the survey following their interaction with Luna, and informed consent was obtained in accordance with IRB-approved ethical procedures.
Quantitative data were analyzed using Microsoft Excel. Descriptive statistics were used to summarize the data, while inferential statistics, such as t-tests and effect sizes, were employed to identify significant differences in perceptions. We used independent-samples t-tests to compare mean usefulness and safety ratings between student and non-student participants. While Likert-scale responses are ordinal, prior research supports treating them as approximately interval-level data when five-point scales are used and the distribution is relatively symmetric. Given the sample sizes (n = 34 for students, n = 18 for non-students), t-tests were considered robust enough to tolerate minor violations of normality. Although we did not perform formal normality testing, the use of t-tests in pilot studies of this size and type is common in exploratory research. We also calculated Cohen’s d to assess the practical significance of group differences, using conventional interpretation thresholds of 0.2 (small), 0.5 (medium), and 0.8 (large).
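For transparency about the calculations involved, the sketch below shows in PHP (the language of the Luna codebase) how a Welch-style t statistic and Cohen’s d can be computed from two groups of ratings; the sample arrays are hypothetical and do not reproduce the study data.

```php
<?php
// stats_sketch.php - illustrative computation of Welch's t statistic and Cohen's d
// for two groups of Likert ratings. The sample arrays below are hypothetical.

function mean(array $x): float {
    return array_sum($x) / count($x);
}

function sampleVariance(array $x): float {
    $m = mean($x);
    $ss = 0.0;
    foreach ($x as $v) {
        $ss += ($v - $m) ** 2;
    }
    return $ss / (count($x) - 1);
}

function welchT(array $a, array $b): float {
    // Welch's t statistic: does not assume equal variances.
    $se = sqrt(sampleVariance($a) / count($a) + sampleVariance($b) / count($b));
    return (mean($a) - mean($b)) / $se;
}

function cohensD(array $a, array $b): float {
    // Cohen's d using the pooled standard deviation with (n - 1) weights.
    $pooled = sqrt((
        (count($a) - 1) * sampleVariance($a) +
        (count($b) - 1) * sampleVariance($b)
    ) / (count($a) + count($b) - 2));
    return (mean($a) - mean($b)) / $pooled;
}

// Hypothetical usefulness ratings (1-5 Likert) for students and non-students.
$students    = [5, 4, 5, 3, 4, 5, 4];
$nonStudents = [3, 4, 3, 4, 2, 4];

printf("Welch t = %.3f, Cohen's d = %.3f\n",
       welchT($students, $nonStudents),
       cohensD($students, $nonStudents));
```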
3.8.2. Qualitative Methods
Qualitative feedback was gathered via open-text fields embedded within the structured surveys. This approach allowed participants to provide detailed comments on their experiences with Luna, focusing on aspects such as the appropriateness of responses, perceived empathy, and any concerns regarding privacy and safety.
- A. Prompt Development: Open-ended questions were designed to elicit rich, descriptive feedback. For example, participants were asked, “Please describe any specific instances where you felt Luna’s response was particularly helpful or unhelpful”.
- B. Pre-Testing: The open-text fields were tested with a small group to ensure they effectively captured the desired feedback.
Qualitative data were collected alongside the quantitative surveys. Participants were encouraged to provide as much detail as possible in their responses. Informed consent included a clause in which participants agreed to provide qualitative feedback as part of their participation.
Qualitative data were analyzed using thematic analysis. Responses were coded to identify recurring themes and patterns. Two independent coders reviewed the data to enhance reliability, and any discrepancies were resolved through discussion. Key themes related to the appropriateness of Luna’s responses and user concerns were highlighted and documented.
4. Results
4.1. Participant Demographics
The evaluation involved 52 voluntary participants, reflecting a diverse group with substantial representation from both academia and healthcare.
Professors and healthcare professionals constituted 33.4% of the participants, reflecting interest in and the relevance of Luna across professional domains. A significant majority, 66.6%, were college students from over 20 institutions, encompassing healthcare professionals such as nurses and nurse practitioners (45%) and students (55%) in fields such as Health Services Administration, Nursing, Psychology, and Behavioral Analysis.
This mix offered a broad array of perspectives on the study’s subject matter.
4.2. Safety and Themes of Interactions
A paramount concern in the implementation of AI for mental health support is the safety and appropriateness of interactions.
In our study, an overwhelming 96% of respondents considered their interactions with Luna to be safe (Figure 5). This finding indicated a high degree of trust in the chatbot’s capacity to navigate sensitive topics and offer support without posing risks to users, a testament to the effectiveness of the implemented safety guardrails. These results are consistent with a meta-analysis which found that participant interactions with similar chatbots were broadly safe, with no worsening of symptoms, distress, or adverse events reported [33]. Another survey [11] demonstrated that users of chatbots integrated into existing mental health apps generally report high satisfaction and positive feelings regarding their interactions. Data were also gathered regarding the common themes students explored in their interactions with Luna (Figure 6), with anxiety being the most common topic students wanted to discuss. Again, these results are consistent with the common reasons people use chatbots, which frequently include themes of anxiety, stress, depression, and self-care [27].
4.3. Perceived Usefulness of Luna
The utility of Luna as a mental health support tool was affirmed by most respondents (Figure 7), with 90.39% providing a usefulness rating between 3 (“useful”) and 5 (“very useful”). This finding demonstrated strong confidence in the chatbot’s potential to positively influence student well-being and serve as a supportive resource. These results are consistent with a recent meta-analysis which showed that AI-based chatbots were initially effective in treating anxiety and depression [34]. Another meta-analysis reported high usefulness ratings among participants across multiple studies, with benefits such as privately practicing conversations, preparing for conversations with mental health professionals, and creating a sense of self-accountability [12].
4.4. Comparison of Students vs. Non-Students
An analysis was performed comparing the responses of students versus non-students regarding themes of interaction, perceived usefulness, and perceived safety.
Table 1 lists the percentage of responses mentioning different themes from students compared with non-students.
An exploratory comparison was conducted to assess thematic differences between students and non-students in their interactions with Luna, focusing on issues such as anxiety, depression, and time management. Although two-tailed t-tests were used to evaluate differences in theme frequency, no statistically significant results were found (see Table 2).
A detailed analysis of perceived usefulness and safety is presented in Table 3. Participants rated Luna’s usefulness on a 5-point Likert scale (1 = very little use; 5 = very useful). Students reported a significantly higher mean usefulness rating than non-students (p = 0.0359; p < 0.05) using a two-tailed t-test assuming unequal variances. For safety, participants responded ‘Yes’ or ‘No’ regarding whether Luna provided safe and appropriate responses. All students (100%) responded ‘Yes’, compared with 86.67% of non-students; however, this difference was not statistically significant (p = 0.164; p > 0.05).
4.5. Areas for Improvement
Despite the overall positive reception, 9.62% of participants gave a low usefulness rating of 1 (“not useful at all”) or 2 (“somewhat useful”). These ratings point to areas for improvement and the need to understand the specific shortcomings perceived by these users.
Detailed analysis of this feedback was used to refine Luna’s response mechanisms to ensure that it met the diverse needs of all student demographics. Prior research into the drawbacks of AI chatbots in mental health has indicated one or more of the following: perceptions of limited personalization, users becoming caught in conversational loops, and seemingly inappropriate use of referrals to crisis support services, either presenting such resources when not needed or failing to do so when they were [27].
Similar themes were found in the present study, as discussed in the next section.
4.6. Qualitative Analysis of Common Data
Table 4 categorizes specific pieces of feedback (coded data extracts) from survey responses into broader themes. Each entry in the “Data Extract” column represents a comment from the user feedback field, aligned with a specific “Code” that identifies a key aspect or concern mentioned by the respondent. These are then linked to the “Generated Theme”, which groups similar codes into meaningful categories that represent overarching insights or issues highlighted by the respondents.
5. Discussion
While Luna’s initial pilot outcomes are encouraging, caution must be exercised when interpreting these results. The small sample size, absence of a control group, and reliance on self-reported measures limit the strength of the conclusions. Moreover, the chatbot’s limitations in addressing high-risk disclosures and tailoring interactions to individual needs highlight the critical importance of continuous system refinement and ethical oversight in future deployments.
The thematic analysis of user feedback on Luna reveals critical insights into the chatbot’s functionality and user satisfaction. As separately detailed below, five main themes emerged from analyzing the coded survey responses: (1) Privacy and Data Security, (2) Response Quality and Utility, (3) User Experience Customization, (4) Accessibility of Supplementary Support, and (5) Communication Style. Together, these derived themes highlight the prevalent concerns and suggestions for improvement.
5.1. Privacy and Data Security
Concerns about privacy and data security were prominent among users. Participants expressed apprehension regarding the confidentiality of their interactions with Luna, questioning how data privacy is maintained. For instance, one user asked whether “questions asked by students will remain private and not become public”.
This theme underscores the necessity for stringent data protection measures and transparent communication about how user data are handled, both of which are crucial for maintaining trust in AI systems. Concerns regarding data privacy and security are common in discussions of AI chatbots. Recommendations from researchers in this area include proper disclosure regarding the use of patient data, whether data are simply stored or used to further train AI models, adherence to data privacy legislation where appropriate, and disclosure regarding data storage and security [35].
5.2. Response Quality and Utility
Feedback on the quality and utility of Luna’s responses was mixed. While many users found the chatbot’s suggestions helpful, there was notable feedback concerning the adequacy of responses, especially regarding severe mental health issues. For example, one participant highlighted the need for more than just a number to call when discussing active suicidal thoughts, suggesting that the guardrails implemented to deliberately deflect such conversations limited Luna’s ability to handle crises effectively.
Essentially, this theme indicates the importance of a policy regarding more robust support in critical situations. Other research has also identified this limitation regarding the use of AI chatbots in mental health support [12,27].
5.3. User Experience Customization
The desire for personalized interactions emerged as a significant theme. Users expressed preferences for responses that are tailored to individual needs and circumstances, with suggestions for Luna to “remember previous conversations” to enhance personalization.
More specifically, this theme reflects the growing expectation for AI services to adapt to individual user profiles, enhancing the relevance and impact of their support. These participant experiences have been echoed in the literature [27,35], with concerns about the generic nature of chatbot responses, as well as responses that are not congruent with the conversation being conducted.
5.4. Accessibility of Supplementary Support
Under this theme, users generally recommended that Luna integrate more actionable support resources, for example, local mental health services and/or immediate helplines. The suggestion to “include local support resources in the chat” clearly points to a need for Luna to offer more than generic advice; in other words, there is a need to provide users with practical, location-specific options for assistance.
5.5. Communication Style
Preferences regarding the communication style of Luna were also noted. Users favored concise and precise responses, with feedback indicating that some of Luna’s replies were overly lengthy and complex. As one user put it, the answers were “very lengthy… most college students would prefer shorter, more bite-sized answers”.
This theme stresses the need for AI communication to be easily digestible and adjusted to fit the typical user’s attention span and information processing preferences.
Figure 8 provides an example of a useful but verbose response.
These user characterizations of Luna are consistent with common user comments regarding the communication style of chatbots. A meta-analysis of user perceptions of chatbots [12] identified confusing responses, shallow responses, and responses containing an overwhelming amount of information as common issues.
5.6. Limitations and Areas for Future Research
This pilot did not include a control or comparison group, limiting the strength of causal inferences. While the pilot’s results are promising, they also reveal limitations that warrant further investigation. For example, the lower usefulness ratings by some participants suggest that future research should explore the customization of mental health chatbot interventions to better address individual preferences and needs. This limitation has been echoed by previous researchers [12] and remains an area for further development in the field.
This study did not include formal normality testing for the t-tests conducted to compare student and non-student groups. While small sample sizes and Likert-type data can present challenges for parametric assumptions, we relied on the robustness of the t-test under these conditions as supported by previous literature. Nevertheless, future studies should use larger samples, formally assess distributional assumptions, and consider non-parametric alternatives such as the Mann–Whitney U test to validate group comparisons more rigorously.
Overall, the present study’s pilot nature and small sample size limit the internal validity of our findings. Without a control group or randomization, it is not possible to attribute observed positive outcomes solely to the Luna intervention. Furthermore, self-selection bias may have influenced participants’ responses. Future research will address these limitations by employing a Randomized Controlled Trial (RCT) design or a quasi-experimental framework, comparing Luna to standard mental health resources to strengthen causal inference.
6. Conclusions
Our study demonstrates that Luna is both effective and safe as a digital mental health intervention. Beyond its practical impact on student well-being, the detailed account of its computational implementation—including a modular PHP architecture and integrated safety protocols—provides a reproducible framework for future research. The availability of the complete source code further underscores our commitment to transparency and innovation in AI-driven mental health support.
Finally, our pilot analysis not only informs key areas for improving Luna’s construction and development but also contributes to broader discussions on the integration of AI in mental health services, with implications for policy, practice, and user engagement. The integration of AI tools like Luna into educational settings reveals significant potential to support student mental health, as mental health is a priority in ensuring students’ academic success.