1. Introduction
In recent years, the mental health crisis among postsecondary students has emerged as a critical concern within the educational and healthcare communities [1,2,3,4]. A growing body of evidence highlights a significant uptick in mental health symptoms, including anxiety, depression, stress-related and sleep disorders, and even suicidal ideation, thereby impacting students’ academic performance, quality of life, and overall well-being [5,6].
Data from a United Kingdom (UK) survey show a notable increase in students reporting a serious psychological issue from 2018 to 2020 [7]. These rates have been compounded by the recent pandemic, which has negatively impacted student mental health, with increases in depression and alcohol-use disorder [8]. The transition to higher education introduces a unique set of stressors. As educators and healthcare professionals grapple with these complexities, the potential of Artificial Intelligence (AI) to offer innovative support mechanisms has come to the fore.
AI chatbots, designed to simulate conversational interactions, present a promising avenue for providing immediate, accessible, and non-judgmental mental health support [9,10]. With the COVID-19 pandemic accelerating the digitalization of healthcare services, the integration of AI technologies such as chatbots into postsecondary educational settings is posited as a transformative approach to supplement traditional mental health services, potentially bridging gaps in access and reducing the stigma associated with seeking help [11]. However, using AI in sensitive areas such as mental health raises substantial ethical, privacy, and safety concerns, particularly regarding the predictability and reliability of AI responses in handling the wide range of complex human emotions [12,13].
A critical hurdle to the adoption of AI chatbots for mental health support in educational settings is the Institutional Review Board (IRB) approval process. The uncertainty surrounding AI chatbots’ responses and the potential for unforeseen harm pose significant challenges, prompting IRBs to exercise caution [14].
This study introduces Luna, a modular PHP-based web application with dedicated components for configuration, user authentication, chat processing, and API integration with the GPT-4 AI model. Details on Luna’s technical implementation—including novel prompt engineering techniques and robust safety guardrails—are provided. Through examining the development, pilot testing, and subsequent modifications of Luna, our research team addresses the critical concerns raised by the IRB committee. Importantly, the current work identifies unique policy challenges in ethically integrating AI chatbots into educational settings.
The first section of the paper reviews the literature exploring mental health challenges in higher education and strategies for adopting AI in these settings. This is followed by the Materials and Methods section, which first describes the development and pre-testing of the AI chatbot, Luna, and then the research design, encompassing the multi-method approach used to pilot-test Luna. The Results section highlights the findings from the pilot study. Before the conclusion, we share insights into the study’s implications for health policy, its limitations, areas for future research, and a discussion of the complexities of navigating the ethical approval process.
2. Literature Review
Mental health concerns among postsecondary students have risen significantly over the past decade, with data indicating a 50% increase in reported issues since 2013 [3,15]. American College Health Association (ACHA) surveys have consistently reported stress, anxiety, depression, and suicidal ideation as significant concerns in the higher education population, with one study demonstrating an association between these mental health concerns and unhealthy lifestyle behaviors such as poor diet and substance use [1].
Specific mental health concerns intensified by the pandemic include depression, substance use, eating disorders, and suicidal ideation [16,17,18]. Recent 2023 data from the Center for Collegiate Mental Health (CCMH) at Penn State University also found that anxiety is the most common initial concern with which students present, and that social anxiety and family distress concerns have continued to increase since their initial rise during the pandemic [19]. Aside from barriers to accessing mental healthcare, other major factors include academic and financial stress [20]. Counseling provided through higher education institutions is generally effective and helpful, with evidence of improvements in student academic achievement and retention [21]. However, the demand for resources far exceeds what is available to help college students [22]. Notably, universities have difficulty developing policies and methods to identify the mental health challenges that college students may face and finding solutions to address student mental health issues within a large population [23]. Postsecondary students must maintain their academic performance while balancing the need for financial support, which can create overwhelming anxiety and worry; these worries can be exacerbated by a background of childhood poverty. The financial burden, alongside chronic stress, can lead to hopelessness and helplessness [2]. Overall well-being and mental wellness can be significantly affected by financial instability or fear of the unknown. Additionally, postsecondary students face peer and societal pressure that further fosters conditions for mental health challenges [24]. Consequently, feelings of low self-worth and self-doubt add to the psychological burden, intensifying the overall pressure students experience. This is further complicated by the financial pressures of rising tuition fees, serviced by ballooning student loans and/or unsustainable scholarships. For an increasing number of students, financial stress can be a significant trigger for mental health disorders [25].
When students are in an academic setting and feel pressured to succeed, fear can quickly contribute to poor mental health. The fear of failing can worsen pre-existing mental health conditions and lead students to develop challenges that did not previously exist. Finally, the transition from high school to postsecondary education can also contribute to mental health challenges that may not have existed previously or exacerbate ones that already exist [26].
Several existing mental health chatbots have been deployed in educational or therapeutic contexts, including Woebot, Wysa, Tess, and Pi. Woebot and Wysa use scripted responses based on cognitive behavioral therapy (CBT) principles and have shown promise in delivering short-term anxiety and mood support. Tess, developed for mental health coaching, is deployed in clinical and organizational settings and integrates emotional AI to tailor responses. Pi is a newer conversational AI known for its supportive and empathetic tone [27]. These tools rely on predefined frameworks or static conversation models. Luna distinguishes itself through its use of a GPT-4 foundation, dynamic prompt engineering, and university-contextualized safety guardrails.
In the absence of accessible pre-existing support systems, students may face an escalating burden from these adverse conditions, particularly as their coping capacities diminish over time. Symptoms of depression, anxiety, or burnout can lead to a decline in student well-being and academic performance. Stigma and barriers to accessing or seeking care are common issues that college students encounter, increasing the risk of developing mental health disorders or worsening existing ones [28]. Adequate support and effective coping mechanisms can lessen the intensity of these challenges and enhance mental well-being and the overall college experience. Many students experience feelings of loneliness, homesickness, and a sense of disorientation as they adapt to their new environment [29].
Several studies have examined AI and mental health among students. One study found that students interacting with an AI chatbot valued its empathetic language and viewed it as a supportive peer for practicing emotional expression [30]. Similarly, a large-scale survey [31] reported that nearly half of university students had engaged with mental health chatbots, citing benefits such as anonymity and accessibility, though concerns about limited personalization persisted. Complementing these findings, a review [32] emphasized both the potential and the limitations of conversational agents in mental healthcare, particularly regarding crisis management, intervention fidelity, and ethical oversight. Together, these studies reinforce both the promise and the challenges of deploying AI-driven mental health support systems like Luna within higher education.
3. Materials and Methods
3.1. Design Science Framework
The development and evaluation of Luna was guided by the Design Science Research (DSR) methodology, incorporating both the Three-Cycle View and the Framework for Evaluation in Design Science (FEDS). This approach facilitated the iterative construction of the chatbot as an artifact responsive to both user needs and institutional requirements. The Relevance Cycle connected Luna to the higher education context—in particular, student mental health challenges and IRB review processes—while the Design Cycle encompassed the build–evaluate–refine loop informed by expert feedback and user pilot data. The Rigor Cycle drew upon theoretical foundations in AI ethics, digital mental health, and counseling practices, particularly in the creation of ethical guardrails and the validation of evidence-based responses.
Figure 1a presents an adapted DSR model that contextualizes Luna’s developmental process within these three interacting cycles.
Figure 1b illustrates the four core phases of the Luna chatbot’s development: Prompt Design, Prototype, Pilot Testing, and Refinement. These phases align with traditional system development stages—Planning, System Design, Implementation, and Evaluation—mapped beneath each corresponding project-specific milestone.
3.2. Study Design and Rationale
This pilot study was conducted to evaluate “Luna” from both design-science and empirical research perspectives. The design-science aspect involved the iterative development and documentation of Luna’s modular architecture, with the aim of enabling reproducibility in different contexts. Empirically, the pilot investigated perceptions of Luna’s usability, safety, and usefulness among a small group of participants, without a formal control group. Three core themes commonly affect college students during periods of high stress: Time Management, Anxiety, and Stress. Each theme has subthemes, giving students the opportunity to utilize evidence-based resources, such as mindfulness activities, time management techniques, and strategies for coping with school-related stress, through Luna. The themes and subthemes were used to develop prompts that generated content to assist students in developing effective coping strategies and self-regulation techniques and to encourage them to prioritize their mental health.
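The following PHP sketch illustrates, under stated assumptions, one way this theme-to-prompt mapping could be represented in code; the array keys, subtheme names, and directive wording are hypothetical and do not reflect Luna’s production configuration.

```php
<?php
// Illustrative sketch only: hypothetical mapping of core themes to subthemes
// and prompt directives. Names and wording are assumptions, not Luna's
// production configuration.
$themePrompts = [
    'time_management' => [
        'subthemes' => ['procrastination', 'balancing work and study'],
        'directive' => 'Offer evidence-based time management techniques such as prioritization and scheduling.',
    ],
    'anxiety' => [
        'subthemes' => ['test anxiety', 'social anxiety'],
        'directive' => 'Respond with calming, supportive language and suggest mindfulness activities.',
    ],
    'stress' => [
        'subthemes' => ['academic pressure', 'financial stress'],
        'directive' => 'Suggest coping strategies for school-related stress and encourage self-care.',
    ],
];

// Example: retrieve the directive used to shape a prompt for an anxiety-related request.
echo $themePrompts['anxiety']['directive'];
```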
3.3. IRB Approval and Ethical Procedures
The study was reviewed by the IRB at the University of Detroit Mercy (UDM)—(Protocol #23-24-38), with deferred full approval and a requested pilot study to evaluate the safety of Luna. Participants provided written informed consent, and the study adhered to ethical safeguards that aligned with the Common Rule (45 CFR 46), including data anonymization, access-restricted storage, and crisis response protocols for at-risk users. Although the study was not registered under CONSORT-AI or CARE guidelines due to its pilot nature, key checklist components—such as version disclosure (GPT-4), consistent prompt delivery, participant safety measures, and data governance—were addressed. Future research will seek full IRB approval and compliance with formal reporting standards for AI-based mental health interventions to strengthen reproducibility and ethical transparency.
3.4. Computational Implementation and System Architecture
Luna was designed as a modular PHP web application with distinct components for user authentication, chat processing, prompt engineering (via GPT-4), and integrated safety guardrails. The chatbot’s iterative refinement was guided by early feedback from AI ethics experts and mental health professionals, resulting in enhancements to response accuracy, style, and crisis escalation protocols. The system was composed of several distinct components.
- Configuration: This file established the environment settings for Luna, including database connection details, API keys, and other critical parameters (an illustrative example follows this list).
- Authentication: These scripts secured user login and logout processes, ensuring that only authorized users could access Luna.
- Main interface: Serving as the entry point and dashboard, these files provided the main interface through which users interacted with Luna.
- Chat processing: This component handled the processing of user inputs, including the detection of sensitive content and the activation of safety protocols.
- API endpoint: Acting as the intermediary, this file packaged user queries and communicated with the GPT-4 AI backend to retrieve and deliver generated responses.
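As a minimal sketch of the configuration component described above, the example below shows how such environment settings might be declared in a PHP configuration file; the file name, constant names, and values are assumptions for illustration only.

```php
<?php
// config.php - illustrative sketch only; constant names and values are hypothetical.
// Centralizes environment settings used by the other modules.

// Database connection details
define('DB_HOST', 'localhost');
define('DB_NAME', 'luna');
define('DB_USER', 'luna_app');
define('DB_PASS', getenv('LUNA_DB_PASSWORD'));   // secrets kept out of source control

// GPT-4 API integration
define('OPENAI_API_KEY', getenv('OPENAI_API_KEY'));
define('OPENAI_MODEL', 'gpt-4');

// Safety and session parameters
define('CRISIS_HOTLINE', '988');                 // US suicide and crisis lifeline
define('SESSION_TIMEOUT_MINUTES', 30);
```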
When a user logs in via the authentication module, they access the main dashboard, where a chat session is initiated. User inputs are processed by the chat module, which applies safety guardrails, such as checking for sensitive content, and then forwards sanitized input to the GPT-4 backend through the API endpoint. The AI-generated response is returned to the user, completing the interaction. A simplified pseudocode example of this logic is shown in Figure 2, a flowchart that outlines the full interaction pipeline for the Luna mental health chatbot. After receiving user input, the system performs content filtering to detect sensitive or emergency topics. If the input is flagged, the system escalates to safety protocols or provides crisis referral resources. If deemed safe, the input is sanitized and forwarded to the GPT-4 API using structured prompt formatting. The AI response is post-processed with guardrails and presented to the user, followed by optional user feedback collection. All interactions are securely logged for audit and improvement.
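The sketch below mirrors, in simplified PHP, the pipeline summarized above and in Figure 2; the function names, keyword list, and request format are assumptions rather than Luna’s actual source code.

```php
<?php
// chat_pipeline.php - simplified sketch of the interaction flow described above.
// Function names, keyword lists, and API details are assumptions for illustration.
require_once 'config.php';

function containsCrisisLanguage(string $input): bool {
    // Hypothetical keyword screen for content requiring escalation.
    $keywords = ['suicide', 'kill myself', 'self-harm'];
    foreach ($keywords as $kw) {
        if (stripos($input, $kw) !== false) {
            return true;
        }
    }
    return false;
}

function handleUserMessage(string $userInput): string {
    // 1. Content filtering and crisis escalation.
    if (containsCrisisLanguage($userInput)) {
        return 'If you are in crisis, please call or text ' . CRISIS_HOTLINE .
               ' or contact university counseling services immediately.';
    }

    // 2. Sanitize input before it is embedded in the prompt.
    $sanitized = htmlspecialchars(trim($userInput), ENT_QUOTES, 'UTF-8');

    // 3. Build the layered prompt: persistent persona plus the user's message.
    $messages = [
        ['role' => 'system', 'content' => 'You are Luna, a supportive mental health assistant for university students.'],
        ['role' => 'user',   'content' => $sanitized],
    ];

    // 4. Forward the request to the GPT-4 backend (chat completions endpoint).
    $payload = json_encode(['model' => OPENAI_MODEL, 'messages' => $messages]);
    $ch = curl_init('https://api.openai.com/v1/chat/completions');
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST           => true,
        CURLOPT_HTTPHEADER     => [
            'Content-Type: application/json',
            'Authorization: Bearer ' . OPENAI_API_KEY,
        ],
        CURLOPT_POSTFIELDS     => $payload,
    ]);
    $raw = curl_exec($ch);
    curl_close($ch);
    $response = is_string($raw) ? json_decode($raw, true) : null;

    // 5. Post-process and return the AI-generated reply.
    return $response['choices'][0]['message']['content'] ?? 'Sorry, something went wrong. Please try again.';
}
```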
3.5. Luna Development Stages
The development process for Luna entailed several key stages.
Luna’s foundational model was built using GPT-4, which had been fine-tuned with a specific focus on mental health-related conversations.
To ensure that Luna understood and responded appropriately to a wide range of student concerns, its customization involved prompt engineering techniques designed to train the AI on mental health scenarios, including discussions on anxiety, depression, and stress management.
Figure 3 illustrates the user interface, designed for clarity and ease of use, presenting categorized support options for time management, anxiety, and stress through intuitive, clickable prompts.
Luna’s prompt engineering strategy was specifically tailored for mental health conversations, incorporating a multi-layered approach to guide GPT-4’s responses. The system used a persistent system-level prompt that defined Luna’s persona, therapeutic tone, and safety constraints. This was dynamically paired with real-time user input and contextual metadata—such as emotional keywords (e.g., “stressed,” “overwhelmed”)—to inject additional guidance into the prompt. For example, if the input contained a test-related concern, a calming or cognitive reframing directive was applied. Unlike standard prompt engineering, which typically involves static, one-shot instruction templates, Luna’s approach used adaptive prompting, integrating intent detection, content filters, and escalation keywords. This ensured that generated responses remained contextually relevant, emotionally appropriate, and ethically constrained. This layered prompt architecture was key to supporting Luna’s therapeutic use-case, setting it apart from traditional chatbot prompting techniques. A key benefit of a prompt engineering approach was the implementation of safety guardrails. These guardrails were designed to recognize conversations around sensitive topics surrounding anxiety, depression, suicidality and other mental-health related issues and respond accordingly.
Anxiety and Depression: When conversations indicated symptoms of anxiety or depression, Luna was programmed to recommend university counseling services. This was achieved through pre-defined triggers that, upon detection, guided the conversation towards professional resources.
Suicidality: For users mentioning suicidality, Luna was designed to immediately provide emergency contact information and national suicide helpline resources. This critical safety measure ensured that users would receive prompt and appropriate guidance in urgent situations.
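To make the layered prompting and guardrail triggers described above more concrete, the following sketch shows one way such logic could be expressed in PHP; the keyword patterns and directive text are illustrative assumptions, not Luna’s actual rules.

```php
<?php
// prompt_layers.php - illustrative sketch of adaptive prompt construction.
// Keyword patterns and directive wording are assumptions, not Luna's actual rules.

function buildSystemPrompt(string $userInput): string {
    // Persistent system-level persona, tone, and safety constraints.
    $prompt = 'You are Luna, an empathetic mental health support chatbot for university students. '
            . 'Use a warm, therapeutic tone and never provide medical diagnoses. ';

    // Contextual directive injected from emotional keywords in the user input.
    if (preg_match('/\b(stressed|overwhelmed|exam|test)\b/i', $userInput)) {
        $prompt .= 'The student appears stressed about academics; offer calming, cognitive reframing strategies. ';
    }

    // Guardrail: anxiety or depression cues trigger a referral directive.
    if (preg_match('/\b(anxious|anxiety|depress(ed|ion))\b/i', $userInput)) {
        $prompt .= 'Recommend the university counseling services as a professional resource. ';
    }

    // Guardrail: suicidality cues trigger immediate crisis resources.
    if (preg_match('/suicid|end my life|kill myself/i', $userInput)) {
        $prompt .= 'Provide emergency contact information and the national suicide helpline immediately. ';
    }

    return $prompt;
}
```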
Stage 3. AI Illustrative Applications
To enhance Luna’s therapeutic capabilities, illustrative applications (Figure 3) were integrated into the system via prompt modification. These applications included therapy sessions, mindfulness exercises, and cognitive-behavioral techniques, which Luna could recommend based on the user’s needs.
Each application was carefully selected and tested to align with best practices in virtual mental health support.
Figure 4 is an example of a mindfulness request.
Prior to deployment, Luna was extensively pre-tested to ensure the soundness of its AI responses. A diverse set of simulated user interactions was created to test Luna’s responses across various scenarios. These simulations helped to identify potential weaknesses in the AI’s understanding and responsiveness.
3.6. Research Design
The pilot employed a mixed-methods approach to assess the effectiveness and safety of Luna. This approach was adopted to leverage the strengths of both quantitative and qualitative methods, allowing for a comprehensive evaluation of Luna’s performance from multiple perspectives. Participants were recruited voluntarily from a diverse group of college students and healthcare professionals to reflect a broad spectrum of users. Detailed information about the study’s objectives, the function of the AI chatbot, and ethical considerations, including data privacy, was provided to the participants, from whom informed consent was individually obtained prior to study participation.
3.7. Participants and Recruitment
A total of 52 individuals participated in the pilot, including 34 students drawn from various universities and academic programs (e.g., nursing, psychology, and behavioral analysis) and 18 healthcare professionals or faculty members (e.g., nurses, nurse practitioners, and professors). Recruitment was conducted through emailed invitations and campus flyers. Volunteers were directed to a secure login portal for interacting with Luna, after which they submitted anonymized transcripts of their conversations and completed an online survey.
3.8. Data Collection and Instruments
Participants engaged in one or more chat sessions with Luna and were subsequently asked to provide transcript excerpts, enabling the research team to assess conversation quality and adherence to safety measures. In addition, participants completed a web-based survey designed to capture the following information:
Perceived Usefulness: This was assessed on five-point Likert scales, with items such as “How helpful was Luna’s advice in managing stress or anxiety?” and “Did interacting with Luna motivate you to seek additional mental health resources?”
Safety and Appropriateness: Participants indicated agreement with statements such as “I felt comfortable sharing personal concerns with Luna,” using five-point scales.
Topic-Specific Queries: The survey included items related to time management, relationships, academic pressure, and other non-suicidal concerns to ensure a broad capture of issues beyond crisis-level interactions.
Open-Ended Feedback: Free-response fields allowed the participants to describe any particularly useful or unhelpful chatbot responses and to suggest improvements or additional features.
3.8.1. Quantitative Methods
Quantitative data were collected using a structured, custom-developed online survey designed to assess the perceived usefulness and safety of Luna. Survey items were informed by the literature on digital mental health interventions and refined through pilot testing with students to ensure clarity. Participants completed the survey following their interaction with Luna, and informed consent was obtained in accordance with IRB-approved ethical procedures.
Quantitative data were analyzed using Microsoft Excel. Descriptive statistics were used to summarize the data, while inferential statistics, such as t-tests and effect sizes, were employed to identify significant differences in perceptions. We used independent-samples t-tests to compare mean usefulness and safety ratings between student and non-student participants. While Likert-scale responses are ordinal, prior research supports treating them as approximately interval-level data when five-point scales are used and the distribution is relatively symmetric. Given the sample sizes (n = 34 for students, n = 18 for non-students), t-tests were considered robust enough to tolerate minor violations of normality. Although we did not perform formal normality testing, the use of t-tests in pilot studies of this size and type is common in exploratory research. We also calculated Cohen’s d to assess the practical significance of group differences, using conventional interpretation thresholds of 0.2 (small), 0.5 (medium), and 0.8 (large).
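For transparency about the calculations involved, the sketch below shows in PHP (the language of the Luna codebase) how a Welch-style t statistic and Cohen’s d can be computed from two groups of ratings; the sample arrays are hypothetical and do not reproduce the study data.

```php
<?php
// stats_sketch.php - illustrative computation of Welch's t statistic and Cohen's d
// for two groups of Likert ratings. The sample arrays below are hypothetical.

function mean(array $x): float {
    return array_sum($x) / count($x);
}

function sampleVariance(array $x): float {
    $m = mean($x);
    $ss = 0.0;
    foreach ($x as $v) {
        $ss += ($v - $m) ** 2;
    }
    return $ss / (count($x) - 1);
}

function welchT(array $a, array $b): float {
    // Welch's t statistic: does not assume equal variances.
    $se = sqrt(sampleVariance($a) / count($a) + sampleVariance($b) / count($b));
    return (mean($a) - mean($b)) / $se;
}

function cohensD(array $a, array $b): float {
    // Cohen's d using the pooled standard deviation with (n - 1) weights.
    $pooled = sqrt((
        (count($a) - 1) * sampleVariance($a) +
        (count($b) - 1) * sampleVariance($b)
    ) / (count($a) + count($b) - 2));
    return (mean($a) - mean($b)) / $pooled;
}

// Hypothetical usefulness ratings (1-5 Likert) for students and non-students.
$students    = [5, 4, 5, 3, 4, 5, 4];
$nonStudents = [3, 4, 3, 4, 2, 4];

printf("Welch t = %.3f, Cohen's d = %.3f\n",
       welchT($students, $nonStudents),
       cohensD($students, $nonStudents));
```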
3.8.2. Qualitative Methods
Qualitative feedback was gathered via open-text fields embedded within the structured surveys. This approach allowed participants to provide detailed comments on their experiences with Luna, focusing on aspects such as the appropriateness of responses, perceived empathy, and any concerns regarding privacy and safety.
- A. Prompt Development: Open-ended questions were designed to elicit rich, descriptive feedback. For example, participants were asked, “Please describe any specific instances where you felt Luna’s response was particularly helpful or unhelpful”.
- B. Pre-Testing: The open-text fields were tested with a small group to ensure they effectively captured the desired feedback.
Qualitative data were collected alongside the quantitative surveys. Participants were encouraged to provide as much detail as possible in their responses. Informed consent included a clause in which participants agreed to provide qualitative feedback as part of their participation.
Qualitative data were analyzed using thematic analysis. Responses were coded to identify recurring themes and patterns. Two independent coders reviewed the data to enhance reliability, and any discrepancies were resolved through discussion. Key themes related to the appropriateness of Luna’s responses and user concerns were highlighted and documented.
4. Results
4.1. Participant Demographics
The evaluation involved 52 voluntary participants, reflecting a diverse group with substantial representation from both academia and healthcare.
Professors and healthcare professionals constituted 33.4% of the participants, reflecting interest in and the relevance of Luna across professional domains. A significant majority, 66.6%, were college students from over 20 institutions, encompassing healthcare professionals such as nurses and nurse practitioners (45%) and students (55%) in fields such as Health Services Administration, Nursing, Psychology, and Behavioral Analysis.
This mix offered a broad array of perspectives on the study’s subject matter.
4.2. Safety and Themes of Interactions
A paramount concern in the implementation of AI for mental health support is the safety and appropriateness of interactions.
In our study, an overwhelming 96% of respondents considered their interactions with Luna to be safe (Figure 5). This finding indicated a high degree of trust in the chatbot’s capacity to navigate sensitive topics and offer support without posing risks to users, a testament to the effectiveness of the implemented safety guardrails. These results are consistent with a meta-analysis which found that participant interactions with similar chatbots were broadly safe, with no worsening of symptoms, distress, or adverse events reported [33]. Another survey [11] demonstrated that users of chatbots integrated into existing mental health apps generally report high satisfaction and positive feelings regarding their interactions. Data were also gathered regarding the common themes students explored in their interactions with Luna (Figure 6), with anxiety being the most common topic students wanted to discuss. Again, these results are consistent with the common reasons people use chatbots, which frequently include themes of anxiety, stress, depression, and self-care [27].
4.3. Perceived Usefulness of Luna
The utility of Luna as a mental health support tool was affirmed by most respondents (Figure 7), with 90.39% providing a usefulness rating between 3 (“useful”) and 5 (“very useful”). This finding demonstrated strong confidence in the chatbot’s potential to positively influence student well-being and serve as a supportive resource. These results are consistent with a recent meta-analysis which showed that AI-based chatbots were initially effective in treating anxiety and depression [34]. Another meta-analysis reported high usefulness ratings among participants across multiple studies, with benefits such as privately practicing conversations, preparing for conversations with mental health professionals, and creating a sense of self-accountability [12].
4.4. Comparison of Students vs. Non-Students
An analysis was performed comparing the responses of students versus non-students regarding themes of interaction, perceived usefulness, and perceived safety.
Table 1 lists the percentage of responses mentioning different themes from students compared with non-students.
An exploratory comparison was conducted to assess thematic differences between students and non-students in their interactions with Luna, focusing on issues such as anxiety, depression, and time management. Although two-tailed t-tests were used to evaluate differences in theme frequency, no statistically significant results were found (see Table 2).
A detailed analysis of perceived usefulness and safety is presented in Table 3. Participants rated Luna’s usefulness on a 5-point Likert scale (1 = very little use; 5 = very useful). Students reported a significantly higher mean usefulness rating than non-students (p = 0.0359; p < 0.05) using a two-tailed t-test assuming unequal variances. For safety, participants responded ‘Yes’ or ‘No’ regarding whether Luna provided safe and appropriate responses. All students (100%) responded ‘Yes’, compared with 86.67% of non-students; however, this difference was not statistically significant (p = 0.164; p > 0.05).
4.5. Areas for Improvement
Despite the overall positive reception, 9.62% of participants gave a low usefulness rating of 1 (“not useful at all”) or 2 (“somewhat useful”). These ratings point to areas for improvement and the need to understand the specific shortcomings perceived by these users.
Detailed analysis of this feedback was used to refine Luna’s response mechanisms to ensure that it met the diverse needs of all student demographics. Prior research into the drawbacks of AI chatbots in mental health has indicated one or more of the following: perceptions of limited personalization, users becoming caught in conversational loops, and seemingly inappropriate use of referrals to crisis support services, either presenting such resources when not needed or failing to do so when they were [27].
Similar themes were found in the present study, as discussed in the next section.
4.6. Qualitative Analysis of Common Data
Table 4 categorizes specific pieces of feedback (coded data extracts) from survey responses into broader themes. Each entry in the “Data Extract” column represents a comment from the user feedback field, aligned with a specific “Code” that identifies a key aspect or concern mentioned by the respondent. These are then linked to the “Generated Theme”, which groups similar codes into meaningful categories that represent overarching insights or issues highlighted by the respondents.
5. Discussion
While Luna’s initial pilot outcomes are encouraging, caution must be exercised when interpreting these results. The small sample size, absence of a control group, and reliance on self-reported measures limit the strength of the conclusions. Moreover, the chatbot’s limitations in addressing high-risk disclosures and tailoring interactions to individual needs highlight the critical importance of continuous system refinement and ethical oversight in future deployments.
The thematic analysis of user feedback on Luna reveals critical insights into the chatbot’s functionality and user satisfaction. As separately detailed below, five main themes emerged from analyzing the coded survey responses: (1) Privacy and Data Security, (2) Response Quality and Utility, (3) User Experience Customization, (4) Accessibility of Supplementary Support, and (5) Communication Style. Together, these derived themes highlight the prevalent concerns and suggestions for improvement.
5.1. Privacy and Data Security
Concerns about privacy and data security were prominent among users. Participants expressed apprehension regarding the confidentiality of their interactions with Luna, questioning how data privacy is maintained. For instance, one user asked whether “questions asked by students will remain private and not become public”.
This theme underscores the necessity for stringent data protection measures and transparent communication about how user data are handled, both of which are crucial for maintaining trust in AI systems. Concerns regarding data privacy and security are common in discussions of AI chatbots. Recommendations from researchers in this area include proper disclosure regarding the use of patient data, whether data are simply stored or used to further train AI models, adherence to data privacy legislation where appropriate, and disclosure regarding data storage and security [35].
5.2. Response Quality and Utility
Feedback on the quality and utility of Luna’s responses was mixed. While many users found the chatbot’s suggestions helpful, there was notable feedback concerning the adequacy of responses, especially regarding severe mental health issues. For example, one participant highlighted the need for more than just a number to call when discussing active suicidal thoughts, suggesting that the guardrails implemented to deliberately deflect such conversations limited Luna’s ability to handle crises effectively.
Essentially, this theme indicates the importance of a policy regarding more robust support in critical situations. Other research has also identified this limitation regarding the use of AI chatbots in mental health support [12,27].
5.3. User Experience Customization
The desire for personalized interactions emerged as a significant theme. Users expressed preferences for responses that are tailored to individual needs and circumstances, with suggestions for Luna to “remember previous conversations” to enhance personalization.
More specifically, this theme reflects the growing expectation for AI services to adapt to individual user profiles, enhancing the relevance and impact of their support. These participant experiences have been echoed in the literature [27,35], with concerns about the generic nature of chatbot responses, as well as responses that are not congruent with the conversation being conducted.
5.4. Accessibility of Supplementary Support
Under this theme, users generally recommended that Luna integrate more actionable support resources, for example, local mental health services and/or immediate helplines. The suggestion to “include local support resources in the chat” clearly points to a need for Luna to offer more than generic advice; in other words, there is a need to provide users with practical, location-specific options for assistance.
5.5. Communication Style
Preferences regarding the communication style of Luna were also noted. Users favored concise and precise responses, with feedback indicating that some of Luna’s replies were overly lengthy and complex. As one user put it, the answers were “very lengthy… most college students would prefer shorter, more bite-sized answers”.
This theme stresses the need for AI communication to be easily digestible and adjusted to fit the typical user’s attention span and information processing preferences.
Figure 8 provides an example of a useful but verbose response.
These user characterizations of Luna are consistent with common user comments regarding the communication style of chatbots. A meta-analysis of user perceptions of chatbots [12] identified confusing responses, shallow responses, and responses containing an overwhelming amount of information as common issues.
5.6. Limitations and Areas for Future Research
This pilot did not include a control or comparison group, limiting the strength of causal inferences. While the pilot’s results are promising, they also reveal limitations that warrant further investigation. For example, the lower usefulness ratings by some participants suggest that future research should explore the customization of mental health chatbot interventions to better address individual preferences and needs. This limitation has been echoed by previous researchers [12] and remains an area for further development in the field.
This study did not include formal normality testing for the t-tests conducted to compare student and non-student groups. While small sample sizes and Likert-type data can present challenges for parametric assumptions, we relied on the robustness of the t-test under these conditions as supported by previous literature. Nevertheless, future studies should use larger samples, formally assess distributional assumptions, and consider non-parametric alternatives such as the Mann–Whitney U test to validate group comparisons more rigorously.
Overall, the present study’s pilot nature and small sample size limit the internal validity of our findings. Without a control group or randomization, it is not possible to attribute observed positive outcomes solely to the Luna intervention. Furthermore, self-selection bias may have influenced participants’ responses. Future research will address these limitations by employing a Randomized Controlled Trial (RCT) design or a quasi-experimental framework, comparing Luna to standard mental health resources to strengthen causal inference.
6. Conclusions
Our study demonstrates that Luna is both effective and safe as a digital mental health intervention. Beyond its practical impact on student well-being, the detailed account of its computational implementation—including a modular PHP architecture and integrated safety protocols—provides a reproducible framework for future research. The availability of the complete source code further underscores our commitment to transparency and innovation in AI-driven mental health support.
Finally, our pilot analysis not only informs key areas for improving Luna’s construction and development but also contributes to broader discussions on the integration of AI in mental health services, with implications for policy, practice, and user engagement. The integration of AI tools like Luna into educational settings reveals significant potential to support student mental health, as mental health is a priority in ensuring students’ academic success.