Article

A Pilot Study of an AI Chatbot for the Screening of Substance Use Disorder in a Healthcare Setting

by Tara Wright 1, Adam Salyers 2, Kevin Howell 1, Jessica Harrison 1, Joshva Silvasstar 2 and Sheana Bull 2,*

1 Department of Psychiatry & Behavioral Sciences, University of Texas Health San Antonio, San Antonio, TX 78229, USA
2 Clinic Chat, LLC, Denver, CO 80216, USA
* Author to whom correspondence should be addressed.
AI 2025, 6(6), 113; https://doi.org/10.3390/ai6060113
Submission received: 22 April 2025 / Revised: 15 May 2025 / Accepted: 28 May 2025 / Published: 31 May 2025

Abstract

Background: Screening for substance use disorder (SUD) is a critical step in addressing the ongoing opioid crisis in the U.S., but fewer than 10% of people at risk are screened. Technology may play a role in substantially increasing screening by making it accessible through artificially intelligent (AI) chatbots. Methods: This was a single-arm, mixed-methods pilot study to establish the system usability of an AI chatbot delivering information about substances, substance use disorder, and treatment options, and implementing self-screening for anxiety, depression, and substance use disorder. Participants were asked to engage with the AI chatbot for seven days and could self-select to screen. Results: Of the 92 participants enrolled, 91 engaged with the system at least once, and 29 (32%) completed at least one screener. Those who screened were given a referral if they exhibited moderate or severe anxiety, depression, and/or SUD. Over three-quarters (83%) of those screened received a referral for treatment, and 50% of those referred made an appointment for care. Users indicated that they found the system helpful and informative, and they felt comfortable screening. Conclusions: While other AI systems that share information about mental health and substance use exist, we know of no other AI chatbot deployed specifically to facilitate SUD screening and referral. The system we describe here shows potential to support self-screening, and users generally find it acceptable to use. AI technology may allow for improved access to SUD screening and treatment referrals, a critical step in responding to the opioid crisis.

1. Introduction

Opioid and stimulant misuse and overdose are public health crises in the U.S. The opioid epidemic is “one of the most severe public health crises in U.S. history” [1], and in 2022, there were over 100,000 drug overdose deaths, with 76% attributable to opioids [2]. Nationally, opioid deaths are highest among males (67%) and non-Hispanic Whites (76%) [2]. Deaths from methamphetamine and cocaine underscore an evolution of illicit drug use that now includes stimulants as well as opioids [3,4]. Nationally, deaths have quadrupled in the past decade among African American, Hispanic, and non-Hispanic White populations [5].
Screening for SUD is a critical first step to care [2], but it is a substantially underutilized tool. Screening can be successful in diverse settings and can lead to referrals for services ranging from outpatient to medically managed intensive in-patient services [6]. However, estimates indicate that fewer than 10% of people at risk for SUD are screened. When those with SUD go unscreened and untreated, the U.S. spends more than USD 271 billion annually to address subsequent health issues [7]. Conversely, every dollar spent in SUD treatment saves USD 4 in healthcare costs [8].
Myriad barriers exist to SUD screening. A perceived lack of need for treatment among people at risk for SUD is a barrier to initiating SUD treatment [9]. Even when people become aware of their need for care, they then face challenges in accessing care, which undermines the credibility of screening [6]. Co-occurring conditions, in particular mental health conditions including anxiety and depression, can exacerbate SUD [2]. Multiple organizations, including the National Institute on Drug Abuse (NIDA), underscore the critical need to reduce internalized stigma among individuals with SUD, as well as provider stigma [10].
Diverse approaches to increase screening for SUD are needed. Screening, brief intervention, and referral to treatment (SBIRT) interventions have demonstrated efficacy in addressing problematic alcohol consumption [11]. These programs have been criticized for failing to consistently move people from screening and brief intervention to successful referral for treatment, particularly for opioid use [12,13]. Some suggest this is due to variability in the persons delivering the intervention [11], while others have demonstrated through single studies and meta-analyses that the type of referral matters and can improve outcomes, e.g., referring opioid users immediately for medication-assisted therapy (MAT) with buprenorphine, naltrexone, or methadone, concurrently with a referral for behavioral therapy rather than for behavioral therapy alone [14].
Technology solutions have proven efficacy in supporting SUD prevention. Cellphone ownership is nearly universal in the U.S. [15], and the use of text messaging to communicate is common, with 81% of cellphone owners using their phones to send text messages [16]. Meta-analyses and systematic reviews comparing text messaging to in-person educational interventions for mental health diagnoses including SUD show improvements in medication adherence and psychiatric functioning for adults [17] and reductions in substance use for adolescents [18]. However, this evidence is not specific to opioid and stimulant use and does not specifically address the stigma associated with care seeking for opioid and stimulant use or alternate delivery mechanisms beyond text messaging [19].
AI-enabled conversational chatbots are emerging as the next generation of technology-based health behavior interventions [20,21], with applications in primary care to assist in diagnosis and prognosis [22,23,24], including prognosis for mental health outcomes [25]. There have also been applications that use AI to offer education related to mental health and substance use [26], but to date they have not been applied specifically to facilitate screening for SUD and referrals for SUD treatment. These systems advance automated communication beyond fixed-state, a priori text message libraries that push content to users, such as a pre-determined set of questions from which users must pick an option (e.g., text “1” if you need information on buprenorphine). However, using generative AI such as ChatGPT is problematic. These AI-enabled chatbots are trained on language models inclusive of all content available on the Internet. Recent studies have demonstrated that generative chatbots can offer misinformation when asked about health-related topics [9], suggesting a need to curate and maintain quality control over user-facing health content. Additionally, these systems generate content that is “one size fits all,” devoid of the cultural, linguistic, and literacy considerations that can make messages resonate more fully; research on AI applications focused on education for mental health and substance use offers evidence of the superiority of expert-created content over AI-generated content [26]. They do not have the capacity to tailor content, nor can they reliably generate responses that represent best practices in health communication to support behavior change.
In this paper, we report on the pilot test of an AI chatbot developed for substance use education and screening. The AI chatbot, designed and deployed by Clinic Chat, LLC, was implemented through Be Well Texas, a statewide SUD prevention, screening, and treatment program at The University of Texas Health Science Center at San Antonio (UT Health San Antonio). Our primary goal was to establish the usability of the system in preparation for a larger trial of efficacy. We first describe the methods we used to pilot test the system, including detail on the theoretical framework that guided our investigation, measures used for assessment, and analyses employed. We then present the results of our pilot test. This is followed by a discussion that offers a summary of our findings, consideration of the limitations of this work, and a discussion of how this investigation compares to current technology initiatives deployed to support SUD screening.

2. Materials and Methods

2.1. Study Setting

Be Well Texas consists of a hybrid treatment clinic with statewide telemedicine capacity and a provider network of over 160 contracted community providers of SUD treatment, recovery and peer services, recovery housing, and hospital-based screening and initiation of treatment. Be Well Texas offers a website with information on its services and an opportunity to self-screen for SUD, anxiety, and/or depression. In 2023, 214,628 people visited the website. Of these, less than half of one percent (N = 758) sought information on SUD screening. Of the 758 who did seek information, 63 (8%) took no action to self-screen. Over a third (298, 39%) began self-screening for SUD, anxiety, and/or depression but did not complete any screener. A smaller proportion, 32% (241), completed at least one screening, and 65% (156) of those who completed a screening went on to seek a referral for care at a Be Well Texas clinic [27]. Be Well Texas has an interest in adapting this web-based informational and screening tool for delivery via an AI chatbot to determine if offering diverse delivery modalities will result in greater uptake of SUD screening.
Clinic Chat, LLC, designed and built the AI chatbot system explored here, adapting systems built for other projects focused on prevention and management of COVID-19 [28] and access to sexual and reproductive healthcare [29]. Clinic Chat’s AI chatbot systems automate the delivery of text messages via SMS to facilitate user-initiated queries on diverse health-related topics. Initial system engagement is promoted by “pushing” text messages to people who have agreed to receive them. Users who receive messages are invited to engage in a dialogue on topics of their choosing. This initial push message also included language letting the user know they were in communication with an AI chatbot instead of a live person, and that if they were experiencing a medical or psychiatric emergency, they should contact 911 immediately.
The system was built to use AI in several ways. First, we generated queries that we anticipated system users would pose related to substance use, substance use disorder, anxiety and depression, and screening. We then generated multiple variations of ways to pose each question (e.g., one person may ask “what is Narcan?” while another might say “tell me what Narcan is”) using large language models (LLMs) [30] with text data available online, such as articles, blogs, posts, and web pages, to develop a repository of language used in queries on these topics. We took these data and fine-tuned an LLM (Llama) to better handle our specific content. Before using our fine-tuned LLM for classification, we employed a variety of natural language processing (NLP) techniques to pre-process the data. This included using smaller pre-trained machine learning algorithms and Python libraries to redact PHI from inputs, regular expressions to pull numbers from inputs, and fuzzy matching to catch questions that were nearly identical to our intents. We then used NLP to determine the meaning, or intent, behind each user query, employing probabilistic modeling to assess the likelihood that a query matched a known intent while accounting for misspellings and grammatical errors. The LLM further helped the system by storing diverse variations in spelling, grammar mistakes, and slang, which, combined with the variations in phrasing described above, allowed the system to recognize the intent underlying each question during the NLP process. From there, we passed the redacted text back to our LLM. When prompting the LLM, we provided a list of all the intents along with their “definitions”, which were designed to distinguish intents from one another. This is because, with hundreds of intents, some may seem very similar but have important distinctions; e.g., “What if I don’t want to quit using?” may seem very similar to “What if I don’t think I can quit using?”, but the two call for distinct answers. When the system cannot match the user input to a question intent with confidence (a prediction score below the 60–75% threshold range), it replies with a fixed-choice (also called a “pick list”) set of responses, for example, “I think you are asking about one of these topics” [lists options for users to choose]. “Please type the letter corresponding to the topic you wish to explore or try your question again.” The chatbot prioritizes the options it offers based on which have the highest prediction score and are most likely to match the user’s question. The system then employs machine learning (ML), a third AI strategy, to improve the matching of responses to user intents for subsequent queries. Specifically, whenever the system was unable to match a response to a user query, programmers manually reclassified the content so that, the next time a user posed a similar query, the system could choose an appropriate response from the system library and share it with the user. Finally, we employed another important part of the machine learning process: re-classification and post-processing.
When confidence was low or absent for many of the intents, we manually re-classified those queries to new intents, which involved (a) identifying and creating a new intent with a new answer, or (b) editing the definition and further fine-tuning our LLM implementation to handle the new intent.
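To make this flow concrete, the following is a minimal Python sketch of the pipeline described above: PHI redaction, a cheap fuzzy match against known intents, a stand-in for the fine-tuned LLM classifier, and the pick-list fallback when confidence is low. The tiny intent library, the token-overlap scoring heuristic, and the 0.6 confidence floor are our illustrative assumptions, not the production Clinic Chat code.

```python
import re
import difflib

# Illustrative closed library: intent -> (definition, curated answer).
# The production system holds hundreds of curated, expert-reviewed intents.
INTENTS = {
    "what_is_narcan": (
        "what is narcan (naloxone)",
        "Narcan (naloxone) is a medication that can rapidly reverse an opioid overdose.",
    ),
    "dont_want_to_quit": (
        "what if I don't want to quit using",
        "It's OK not to feel ready. I can share what treatment looks like whenever you want.",
    ),
    "cant_quit": (
        "what if I don't think I can quit using",
        "Many people feel that way; treatment and medications can make quitting achievable.",
    ),
}

PHONE_RE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")

def preprocess(text):
    """Redact likely PHI (here, just phone numbers) before any model call."""
    return PHONE_RE.sub("[REDACTED]", text)

def fuzzy_match(query, threshold=0.9):
    """Catch queries that are near-exact copies of a known intent."""
    for name, (definition, _) in INTENTS.items():
        if difflib.SequenceMatcher(None, query.lower(), definition.lower()).ratio() >= threshold:
            return name
    return None

def llm_classify(query):
    """Stand-in for the fine-tuned LLM prompted with all intent definitions:
    score each definition by token overlap and return (intent, confidence)."""
    def score(definition):
        q, d = set(query.lower().split()), set(definition.lower().split())
        return len(q & d) / max(len(q | d), 1)
    best = max(INTENTS, key=lambda name: score(INTENTS[name][0]))
    return best, score(INTENTS[best][0])

def respond(raw_query, confidence_floor=0.6):
    """Redact -> fuzzy match -> LLM classification -> pick-list fallback."""
    query = preprocess(raw_query)
    intent = fuzzy_match(query)
    if intent is None:
        intent, confidence = llm_classify(query)
        if confidence < confidence_floor:
            # Low confidence: offer a fixed-choice "pick list" instead.
            topics = "; ".join(INTENTS)
            return (f"I think you are asking about one of these topics: {topics}. "
                    "Please pick one or try your question again.")
    return INTENTS[intent][1]  # curated answer from the closed library

print(respond("what is narcan"))
```

Note how every path out of respond() ends either in a curated library answer or the pick list; nothing is generated freely, which is the design property the next paragraph contrasts with generative chatbots.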
The system sought responses from within a closed library matched to this intent. This reliance on a closed library diverges from generative AI chatbots, which derive responses from diverse sources online and cannot assess the quality of the information they use when matching a response to the intent behind a user query. While generative systems have access to and utilize enormous databases of content, their inability to discern whether content is correct means they may provide misinformation or inaccurate responses that could be medically harmful [9]. Furthermore, generative AI responses have been shown to hallucinate, meaning they can deviate from the intent of a user query and instead provide information that is off-topic and may be nonsensical [31].
In the system described here, we only allowed the system to retrieve responses to user queries from a closed library, where every answer in the library had been curated and reviewed to ensure medical accuracy, a lack of stigma or bias, and optimal messaging that deployed theory-based health communication strategies with demonstrated efficacy to support behavior change [32]. The library of content for the system we describe here was co-developed by Be Well and Clinic Chat with input from subject matter experts and our intended audience of people facing risks of SUD. Participants in this process christened the AI chatbot discussed here with the name “Be Well Buddy” [33].
The library for Be Well Buddy included content addressing barriers to treatment, with 150 intents covering perceptions about substance use disorder risk, access to care, treatment options, and information about anxiety and depression. The library also contained 250 intents with information on different substances, medication-assisted therapy, Narcan, buprenorphine, and methadone. Table 1 offers examples of intents from these domains and system responses. Finally, the library incorporated three tools that users could self-administer for screening: the PHQ-2, a two-item assessment for depression [34]; the GAD-2, a two-item Generalized Anxiety Disorder assessment for anxiety [35]; and the 10-item Drug Abuse Screening Test (DAST-10) [36]. Both the PHQ-2 and the GAD-2 assign values of 0–3 to each question and generate scores ranging from 0 to 6, where a score of 3 or more triggers a referral. The DAST-10 assigns a single point to each of its 10 questions, for a range of 0–10, where a score of 3 or greater also triggers a referral. The system also provided built-in feedback for those who used any of these screening tools, indicating whether they had minimal or no risk, moderate risk, or severe risk for depression, anxiety, and/or SUD. Everyone at moderate or severe risk was encouraged to seek care and given a referral with a link and a telephone number for Be Well Texas. People at severe risk were encouraged to call 911 or 988 if they were experiencing an emergency. The system used secure encryption to report back to Be Well Texas the first name and telephone number of everyone who received a referral and whether they were at moderate or severe risk for anxiety, depression, or SUD.
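A minimal sketch of this scoring and referral logic follows; the function and message wording are ours for illustration, and severity cut-points beyond the stated referral threshold are not shown because the text does not specify them.

```python
# Referral thresholds as described above: PHQ-2 and GAD-2 items score 0-3
# (totals 0-6); DAST-10 items score one point each (totals 0-10); all three
# tools trigger a referral at a total score of 3 or more.
THRESHOLD = 3

def score_screener(tool, item_scores):
    """Sum item scores and flag whether the total triggers a referral."""
    total = sum(item_scores)
    return {"tool": tool, "score": total, "refer": total >= THRESHOLD}

result = score_screener("GAD-2", [2, 2])  # two items, each scored 0-3
if result["refer"]:
    # Mirrors the system feedback: encourage care and refer to Be Well Texas.
    print(f"{result['tool']} score {result['score']}: we encourage you to "
          "seek care. Here is a link and phone number for Be Well Texas.")
```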
Because the system includes information related to illicit substance use and information on treatment that can be highly stigmatized, and because it was designed to make referrals for care, attention to security, privacy, and data encryption, as well as compliance with the Health Insurance Portability and Accountability Act (HIPAA), was also paramount in the system design. The system specifications for security and privacy are detailed elsewhere [37].
Be Well Texas and Clinic Chat collaborated to design a schedule of messages from the closed library that we would “push” to users enrolled in the pilot study. The schedule included a plan to “push” three text messages to users each day for seven days, including messages that shared information about substances, information about recovery and treatment services, and information on how to screen for SUD, anxiety, and depression. Each message included an invitation for the user to engage further by asking related questions, to begin a SUD screening process, or to be referred to a live person at Be Well Texas for further information. The system flow is illustrated in Figure 1.
To establish system functionality and acceptability, we conducted a mixed methods pilot study with persons who were seeking care or information about or were existing clients of Be Well Texas. We had two main questions to address in the pilot study: (a) Does the system function as intended to facilitate screening for SUD, anxiety, and depression? (b) Do system users find the system acceptable to use?
The pilot study was implemented by UT Health San Antonio research staff under the supervision of the lead author (Dr. Wright). All study procedures were reviewed and approved on 29 September 2023, by the UT Health San Antonio Institutional Review Board, protocol number 20230662H.

2.2. Recruitment

Research staff from UT Health San Antonio recruited participants for the pilot study by posting notifications on the Be Well Texas website, through the Be Well Texas network of community-based providers, and on flyers in the San Antonio Be Well Texas clinic, inviting people to call the study team to learn more and possibly enroll. Participants were eligible if they were at least 18 years old and had responded to the web- or clinic-based invitation; could speak and read English; had a mobile phone; and agreed to receive text messages. Participants were asked their age only to determine eligibility for the study; no other demographic data were collected. Given our efforts to determine if people would use a chatbot to self-report SUD-related behaviors, and because these behaviors carry a high risk of stigma, we felt that allowing for anonymity could help increase trust in and willingness to use the system. People were excluded if they were currently enrolled in SUD treatment (by self-report).
The UT Health San Antonio research team used a snowball sampling technique, inviting participants who contacted them to refer friends and family to enroll as well. All participants were offered informed consent prior to enrollment and were invited to engage with the AI chatbot system for seven days. No baseline data were collected from users; we only collected data from use of the system and in-depth interviews. We offered users a USD 40 gift card for engaging with the chatbot. Once enrolled and opted in to receive text messages, participants could interact with Be Well Buddy as much or as little as desired, and they could also self-screen for depression, anxiety, and/or SUD at any time during the seven-day trial period. Users were informed during enrollment that all data transmissions were end-to-end encrypted and stored behind a secure firewall at UT Health San Antonio. After participants completed seven days of interactions, regardless of their level of interaction, research assistants invited them to complete an in-depth interview, offering a USD 25 gift card for doing so. Research assistants continued enrolling participants willing to complete an in-depth interview until interview data from new enrollees revealed no new information.

2.3. Measures

The People At the Centre of Mobile Application Development (PACMAD) model [38] is a theoretical framework that offers specific guidelines on how to assess usability for consumer-facing technologies such as apps and text messaging. PACMAD considers a tool usable if it both functions as intended and is accepted by users.
Table 2 illustrates the specific ways that we adhered to the PACMAD model. PACMAD identifies seven attributes of usability that demonstrate both functionality and acceptability: memorability, efficiency, errors, efficacy, effectiveness, satisfaction, and cognitive load. Memorability refers to the ability of users to use and reuse the system easily without having to re-learn how to do so with each subsequent use. Efficiency is a measure of how quickly users can access and engage with the system. We assessed memorability and efficiency by logging every text sent to users, every response text they sent back, and the time of day and day of the week communications occurred. We reviewed user logs to see if users had queries related to how to use the system (memorability) or if it took longer for users to respond to subsequent questions following their initial response (efficiency). Errors refer to the capacity of the system to deliver content without breaking down or crashing; we documented errors and system breakdowns throughout the pilot. A critical aspect of assessing errors with an AI chatbot is precision, meaning the chatbot returns an answer that correctly matches the intent of the user query. We measured this by documenting the number of times the system returned the fallback response, “I did not understand your question. I think you may have been asking about (a, b, or c). Please pick one of these topics or rephrase your question”, and calculating overall precision as the number of responses that did not require this fallback (numerator) divided by the total number of responses sent (denominator).
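As a worked instance using the counts reported in Section 3.1.2 below, precision works out as:

```latex
\mathrm{precision}
  = \frac{\text{responses sent} - \text{fallback responses}}{\text{responses sent}}
  = \frac{2204}{2755} \approx 80\%
```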
Effectiveness is a measure of how well a system can be expanded for delivery across multiple environments. Because this was a pilot carried out with a small sample in a single site, we could not assess effectiveness. We did assess potential for efficacy, i.e., whether the system has potential to impact access to care, by documenting the number of screenings conducted and referrals made; without a randomized trial with a comparison group, however, we cannot assert system efficacy from the data we present here. We assessed system satisfaction and cognitive load (a measure of how easy the system is to use without confusing users) through our in-depth interviews with users, asking them what they liked and disliked about the system, whether they would use it again, and whether they found any difficulty in using it.
To ascertain whether there was a relationship between the frequency of engagement with the system and screening behaviors, we conducted a Mann–Whitney U test (Wilcoxon rank-sum test) with R software version 4.4.0 [39], anticipating that engagement would not be normally distributed, with some people not engaging at all and others engaging frequently.
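For readers who want to reproduce this kind of comparison, a minimal sketch follows. The study itself ran the test in R 4.4.0; the scipy call and the per-participant query counts below are illustrative stand-ins, not study data.

```python
from scipy.stats import mannwhitneyu

# Hypothetical per-participant query counts, split by whether the participant
# completed at least one screener. These numbers are made up for illustration.
queries_screened = [12, 27, 41, 55, 18, 33, 64, 29]
queries_not_screened = [2, 5, 9, 14, 3, 7, 0, 11]

# Two-sided rank-sum comparison of the two (non-normal) distributions.
u_stat, p_value = mannwhitneyu(queries_screened, queries_not_screened,
                               alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.4f}")
```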
The UT Health San Antonio research team conducted and recorded one-on-one in-depth interviews via online video conference calls and completed the analysis. The interviews were semi-structured and followed a question guide; interviewers spent between 45 and 60 min per interview, with an average of 50 min. Recordings were transcribed, and a thematic analysis was performed using a hybrid coding approach in Dedoose version 10.0, a qualitative coding software package [40]. This involved a priori codes derived from user reactions to chatbots anticipated from the literature, specifically questions anticipating antipathy toward chatbots and concerns that they would not be accurate or helpful [41,42]. The team then documented other ideas raised by participants that did not fall within the a priori codes. Once interviewers were hearing similar themes in response to questions, they determined that saturation had been reached, and no more interviews were conducted. Analysis included a quasi-inductive approach, using both a priori and emergent codes to categorize in-depth interview data. Thematic content analysis involved grouping similar codes into cohesive, overarching themes across all in-depth interviews. Two coders analyzed the transcripts, each responsible for the initial coding of half. We randomly selected segments representing 15% of the content of each transcript, asked the coder who had not coded that content to code it, and documented coding agreement for 80% of the codes.

3. Results

UT Health San Antonio screened 150 people. Of these, 110 were self-referred and 40 were referred by enrollees. Of the 150 people screened, 58 (39%) did not agree to receive text messages from Clinic Chat. There were 92 participants enrolled; of these, 1 participant never responded to or engaged with any text messages, leaving 91 participants who responded to at least one text message from the system. Participants were enrolled to receive text messages and invited to engage with the system over a seven-day period. The pilot took place over 90 days between June and September of 2024, and 13 of those enrolled completed an in-depth interview. As noted above, we did not collect participant demographics other than age, to protect anonymity. Because being between the ages of 18 and 89 was an eligibility criterion, we documented the age of everyone enrolled; persons over 89 were ineligible per IRB regulations, given that the small number of people aged 90 and older would make it possible to identify them. The mean age of participants was 42, with a range of 18 to 56.

3.1. Usability

3.1.1. Memorability and Efficiency

Be Well Buddy delivered 4173 messages during the pilot, 1418 (34%) of which were push notifications and 2755 (66%) of which were responses to subsequent user queries. Push notifications included brief instructions on how to use the system (“text me questions you have about substance use, substance use disorder, and screening for substance use”). Of all the participant-initiated queries, 2339 (87% of the total) were not directly connected to screening for anxiety, depression, or SUD. This represents an average of 32 non-screening messages per participant among those who engaged, with a range of 1–142 messages sent. None of the queries included requests for clarification on how to use the system.
Figure 2 shows the distribution of messages to the system by the time of day, showing that the largest number of messages was initiated by users between 11:30 p.m. and 1 a.m. Table 3 documents the frequency of engagement and use of the system by participants. We did not document any decrease in the average time to respond to messages.
The most popular topics for users are depicted in Figure 3. For those who screened for anxiety, depression, and/or SUD, the most popular queries were related to concerns about quitting (e.g., “what if I don’t want to quit using?”) substances and the necessity of treatment. By comparison, those who engaged with the system but did not screen asked questions about relapse, medications, and the cost and location of services.

3.1.2. Errors

The system correctly responded to 2204 of the 2755 user-initiated queries (80%).
There were three errors in system delivery over the course of the trial. In the first three days of message delivery, phone carriers blocked 10% of messages sent from the system to persons enrolled; these messages all included the word cocaine.
On 19 July 2024, there was a global information technology outage for more than eight million Microsoft users, which was traced to a defective system update for CrowdStrike, a U.S.-based cybersecurity company [43]. We noted that between July 19 and 23, our system was not sending messages, and we attributed this error to the global IT outage.
Finally, between August 26 and 28, Clinic Chat experienced a communication breakdown between servers, resulting in a failure to deliver messages.
With 10% of messages blocked in the first three days and delivery failures on five subsequent days, we documented errors on 8 of the 90 (9%) message delivery days during the pilot. This impacted 246 messages in total, or 6% of all messages sent.
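The reported rates follow directly from these counts:

```latex
\frac{8\ \text{error days}}{90\ \text{delivery days}} \approx 9\%,
\qquad
\frac{246\ \text{affected messages}}{4173\ \text{messages sent}} \approx 6\%
```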

3.1.3. Potential for Efficacy in Increased Screening for SUD

One third of the sample (33%, or 30 people) opted to screen for one or more conditions; of these, 29 (97% of those screening) completed at least one screener. There were 29 people who completed the screener for anxiety (39%), 27 for depression (36%), and 25 for SUD (34%). Of those screening, 83% were referred and encouraged to seek help for anxiety (24 people), 41% for depression (12 people), and 52% for SUD (15 people).
Participants who ultimately screened for anxiety, depression, and/or SUD received an average of 7.8 invitations to do so before they initiated a screening.
Of the 24 people who were referred after completing a screening, 12 of them (50%) made an appointment with Be Well for care. Privacy regulations do not allow us to determine if appointments were for anxiety, depression, or SUD care.
We conducted a Mann–Whitney U test to determine if more queries to the system were associated with greater screening or higher scores across the different measures (GAD, PHQ, DAST). Scores were continuous in the model, with a U value of 8, and ranged from 0 to 2 for the GAD, 0 to 2 for the PHQ, and 0 to 10 for the DAST; they are presented as ranges in Figure 3 for simplified display. The test showed no significant evidence that more queries were associated with more screening or with higher GAD, PHQ, or DAST scores. However, the PHQ and DAST scores had slightly lower p-values (p = 0.11), suggesting a possible trend worth further exploration with a larger dataset. The effect sizes were all less than 0.3.
Figure 4 illustrates the relationship between the frequency of user-initiated queries to Be Well Buddy, screening, and screening scores for the 29 who screened at least once. People who initiated fewer than 11 queries to the system did not screen for anxiety, depression, or SUD. Those initiating between 11 and 25 queries to the system had the lowest levels of screening and the lowest frequency of moderate-to-severe scores. Those initiating 26–50 and 51+ queries to the system showed similar levels of screening and similar screening scores except for DAST scores, which were lower when users queried the system more than 50 times.

3.1.4. Satisfaction and Cognitive Load

The 13 in-depth interviews focused on participants’ experiences using the Be Well Buddy chatbot.
Relevant for cognitive load, most (11) participants indicated the system was unobtrusive, convenient, and easy to use.
“[I liked] that it checked in with me throughout the week, or throughout the day, so I might have a different mood one day, and you know, just to have something talk to or someone to talk to is really good. It was pretty easy, [to use] if I’m just playing on my phone or anything. I just go back and look through my messages and go back and text it, that was kind of cool. That was really great.”
Related to satisfaction, the Be Well Buddy system offers users prompts to ask other questions with suggestions of what they might want to learn more about (“Ask me about something else! Type the number of any of these topics here or type in your own question: 1. What is buprenorphine? 2. How much does treatment cost? 3. What is involved in screening for substance use?”), a feature that many agreed was helpful: “So that was pretty cool. I think that was probably my favorite part,” said one participant, and “Because without [the prompts] I don’t know if I would…engage much,” said another.
Many participants (9 of 13) found the chatbot informative, providing detailed responses to their queries. They appreciated its ability to offer educational content about various aspects of substance use and treatment: “for the most part, like, it’s very, very educational,” and “Ha! It helped with, you know, accessing care through Be Well, you know. That’s pretty cool.”
Most participants in the in-depth interviews (9 of 13) did use the self-screening tool within the chatbot. Those who did use the tool indicated they felt comfortable with the system by saying things like “It made me feel it was ok [to screen]” and “I was curious to know my risk.” Those who did not screen indicated either that the system was not something they wanted to use (“I’ll only screen with someone face to face, I’m too worried to trust this”) or that they did not see the benefit of screening in general (“Why bother if there isn’t treatment available where I live?”).
Several (four) participants highlighted the importance of reassurance around confidentiality, particularly for those worried about how seeking help might impact their families, such as interactions with child protective services: “I don’t know, me being a single parent, and from my past substance abuse… the confidentiality with CPS [is critical] if I’m seeking help. Worry about losing my kid is a big issue.”
Many (10) valued the supportive tone (“But I mean I think it was very person-centered language, so I didn’t feel any kind of like stigmatized or judgment”, said one participant) and human-like interaction (“It’s not robotic like, it’s more, it’s more like ‘humany’”).
Others (three) noted that the chatbot could improve its empathetic qualities, as it sometimes felt too automated: “I said it was very professional to me the way they answered the questions that you needed, or the services that you needed” but “it is hard to show empathy unless you use an emoji or a gif”.
Most (11) participants indicated they would recommend the chatbot to others, especially for its role in providing initial information and support in the recovery process. “I’d recommend it because I know a lot of people who are afraid to ask… they don’t want the judgment, or you know they’re embarrassed”, said one participant. Almost all (12) indicated they would continue to use it if offered the opportunity: “Yes, I like being able to get some questions answered when I think of them”, indicated one participant. A few participants (four) said they would like to use it if they could schedule their own appointments for follow-up after screening for SUD.

3.2. Areas for Improvement

Several (seven) participants wished the chatbot could connect them directly with live support, such as peer recovery specialists or medical professionals, especially when it could not answer specific questions. Some mentioned the need for more location-specific information, such as local treatment centers or resources available in rural areas: “And there was a lot of things on here about Narcan and stuff like that, maybe providing some links of where they can access free Narcan, and wherever they’re at”, said one participant, and “Places to go to seek help, like when like a ‘locations around me’”, said another, both illustrating interest in access to localized resources.
Enhancements like including information on state funding options for treatment and harm reduction strategies were suggested: “Not everybody has insurance, or has even money on the sliding scale fee, maybe to help people to find that state funding”.
Others (three) noted the need for some technical improvements, indicating frustration if they did not receive an immediate response (“A couple of times. I got a delay”) or received an incorrect response (“The response I got was not [right] not close”).

4. Discussion

4.1. Principal Results

This paper describes the usability of Be Well Buddy, an AI chatbot to support sharing information about SUD and facilitating screening and treatment referrals for SUD. We offer evidence that Be Well Buddy meets all but one element of the PACMAD criteria for usability, including memorability, efficiency, errors, potential for efficacy, satisfaction, and cognitive load. Our study design did not allow an assessment of system effectiveness, and our assessment of efficacy is limited to considering only the potential for efficacy given a lack of a comparison group in this pilot. It is critical to note that establishing usability is an important preliminary step for assessment of any digital health tool.
We observed that the system is memorable; i.e., people were able to use the system without multiple reminders of how to do so. The system appears efficient, with users able to pose queries and obtain responses quickly, without any variability in the time it took to do so. Users made most of their queries to the system late at night, and those who screened made more queries than people who did not screen, although these differences were not significant. The system is precise, offering the correct response to user queries 80% of the time. We did note errors in delivery on 9% of the days when messages were delivered, which impacted 6% of all messages delivered. Although delivery failures between July 19 and 23 related to the global IT outage were beyond our control, we consider the errors on the remaining dates to be ones we can avoid in future iterations of Be Well Buddy.
With 32% of those enrolled completing at least one screener, between 40% and 83% obtaining a referral for care, and 50% of those referred making treatment appointments, we submit that the system clearly has potential for efficacy, particularly considering estimates that fewer than 10% of people at risk for SUD will screen. While we assert that this suggests a promising approach to increase screening, there remains room for improvement in increasing screening among system users. It is possible that the seven-day observation period is too short and that, given more time, users may feel more comfortable with screening. As one participant suggested, we can offer more information on what to expect from screening and reassure users of locally available resources they can access, to motivate screening. Other digital health research has shown that users find systems credible when they are directly linked to a trusted healthcare provider [12], so including the name of healthcare agencies or even individual providers may help engender the trust needed to initiate screening. Subsequent research with a larger sample that utilizes a randomized controlled design can further elucidate system efficacy.
Users expressed satisfaction with the system and considered it appealing, appropriate, and easy to use. Among those participating in in-depth interviews, those who screened said they felt safe and were curious to know their risk, and many indicated that they would likely return to use Be Well Buddy again and refer others to engage with it. Importantly, users indicated that they felt the system offered information in a non-judgmental, supportive manner, avoiding stigma and helping people feel safe in accessing sensitive topics. Data from in-depth interviews reinforce the importance of maintaining an empathetic tone in messaging, linking people to resources for SUD including appointment scheduling for SUD treatment, and reiterating the steps the system uses to maintain user confidentiality. Nevertheless, willingness to screen is not universal, and many did not screen, suggesting we have more to learn about whether we can consistently ensure people feel safe and supported using an AI chatbot for SUD screening.

4.2. Limitations

The very small number of participants in this work, along with the manner of recruitment (a compensated convenience sample with self-selection and snowball sampling for referrals), does not allow us to generalize our findings beyond the pilot study sample. Furthermore, because we explicitly asked people who were enrolled to engage with the system, we cannot infer that people would voluntarily do so outside a research study. We also cannot assert that those who chose to screen would not have screened via some other modality, such as through an SBIRT intervention. Only 60% of those invited to participate did so, raising the possibility that this tool will not be universally accepted for seeking information and self-screening for SUD. However, we assert that no single tool or modality will be universally accepted, and having this option did appeal to a majority of the people invited to consider it.
Because the primary objective of this work was to establish usability, these limitations are ones that will appropriately be addressed in subsequent research. Future evaluation of the system should include a recruitment strategy that will allow for greater generalizability. This could involve widespread advertising of the system in primary care clinics and emergency room settings. To avoid a self-selection bias, it would be optimal to make an invitation to use the system part of a standard protocol for primary care and emergency room visits.
A critical next step in evaluating this system would include a randomized controlled trial with a comparison group(s). This will allow an evaluation of whether system benefits are attributable to engagement with Be Well Buddy or if they are occurring at random. Comparison groups could include a comparison of users of the Be Well website and could also include comparisons to other screening programs such as SBIRT. Another approach would be to compare Be Well Buddy to a generative AI that does not rely on a closed library, which would allow us to understand more about the potential value of using a closed library, which remains an empirical question. Robust recruitment of participants in emergency departments and primary care settings could improve the generalizability of the study findings. This would allow us to determine if this system is superior or inferior to other approaches in increasing SUD awareness and facilitate screening and referral, and it would allow an assessment of whether the system offers benefits across diverse audiences.
Finally, any digital health intervention runs a risk of limited reach due to disparities in access to technology. As mentioned previously, cell phone use is nearly universal, and 81% of people with cell phones use them to send and receive text messages [16], making this among the most accessible digital health solutions available. However, there remain geographic areas where cell phone signals will not permit text messaging, which would limit the reach of this intervention. Nevertheless, the text message medium for Be Well Buddy is one that allows for scalability and adaptability to a large and diverse audience. It also allows for less technological complexity, which can potentially facilitate greater sustainability than other digital health interventions that rely on apps or websites to deliver content.

4.3. Comparisons with Other Work

With estimates that fewer than 10% of people at risk are screened for SUD, we urgently need solutions that can help increase that number. Although we cannot compare our efforts to unprompted screening in the general population, it is worth noting that Be Well Buddy was able to screen 32% of those enrolled.
While we also cannot compare the performance of Be Well Buddy directly to the Be Well Texas website, we note important opportunities that the Be Well Buddy chatbot may offer to complement screening on a website. Be Well Texas data show that 32% of people who query the website about screening complete a screening; the proportion was the same for Be Well Buddy, with 32% completing a screener. On the Be Well Texas website, more than a third (39%) abandoned a screener after starting it; that number was only 3% for Be Well Buddy. Once screened, 65% of Be Well Texas website users sought a referral; for Be Well Buddy, between 44% and 60% were referred for care.
One might reasonably compare these findings to other research on AI chatbots that address mental health and substance use and to other digital health technology applications related to SUD. Elyoseph et al. [25] assessed an AI system that facilitated prognosis of mental health outcomes and found that AI systems consistently underestimate the risk of a poor mental health prognosis compared with clinician assessments. Be Well Buddy does not focus on prognosis, and results from self-screening include referrals for direct communication with providers, which may be a needed element to optimize applications of AI in behavioral healthcare.
Spallek et al. [26] reviewed content in AI educational applications focused on mental health and substance use. Their conclusion that expert-generated content in AI educational efforts is superior to content produced with generative AI offers additional support for our system, which relies on a closed library of curated content to avoid misinformation and hallucination.
Screening, brief intervention, and referral to treatment (SBIRT) programs have been delivered electronically—although not via AI chatbots—and offer a possible comparator to Be Well Buddy, although meta-analyses and commentaries related to SBIRT emphasize a failure to consistently move people from the screening and brief intervention to successful referral for treatment, particularly for opioid use [13,44,45,46].
Researchers consider the inconsistency in evidence supporting the efficacy of SBIRT to be likely related to challenges in translating its impacts from alcohol use to other substances; it may also be related to variability among the persons delivering the screening, brief intervention, and referrals to treatment.
Be Well Buddy is designed to use technology instead of people to offer standardized and consistent information, non-judgmental support, and self-efficacy for screening, where users direct the conversation on topics of their own choosing and only screen of their own volition. Because the system is automated, it may offer an advantage over variability in intervention fidelity when delivered by different people. SBIRT also offers important lessons to consider for a subsequent trial of Be Well Buddy—it may be important to include referrals not only for treatment but perhaps simply to connect with a health professional to further assess needs and readiness for treatment without the expectation that scheduling treatment will be immediate. The most popular topics explored with Be Well Buddy among those who screened included queries related to medication options (medication-assisted therapy (MAT)). Other researchers attempting to address SBIRT limitations have shown the benefit of engaging users more quickly and consistently in MAT [46]. A similar focus for Be Well Buddy to facilitate initial access to MAT prior to the utilization of a broader range of SUD services may be impactful.
The PACMAD framework we relied on to guide the usability assessment for this work is not explicit about what constitutes an acceptable error rate for mobile applications, and reviews of health-related text messaging interventions that use message delivery technology similar to Be Well Buddy offer no consensus on a reasonable level of error [47]. A general industry standard for mobile applications, one that includes health-related applications but does not exclude other fields (e.g., banking), suggests that error rates up to 10% may be acceptable [48]. By this standard, the errors we documented for Be Well Buddy are within range; if we address those related to message content and build in redundancies in anticipation of server outages, we can reduce or even eliminate these errors, substantially contributing to the longer-term feasibility of system implementation.

5. Conclusions

Be Well Buddy is the first AI chatbot we are aware of with demonstrated usability in sharing information about SUD that also helps people who are interested in self-screening, obtaining referrals for care, and accessing care. With continued SUD crises in the U.S., including deaths from overdoses, using a supportive automated system that has the potential for scalability and opportunity to link users to diverse resources, including treatment as desired, represents a promising new modality to consider in addressing SUD. The critical next steps to advance this work are to demonstrate efficacy through a randomized controlled trial, and then to scale the system for use in diverse settings to reach a broad audience of people at risk for SUD. We aim to optimize the system to include information on geographically proximal SUD treatment.

Author Contributions

Conceptualization, T.W. and S.B.; methodology, T.W., J.H. and S.B.; software, A.S., K.H. and J.S.; formal analysis, T.W. and S.B.; investigation, T.W.; resources, T.W.; data curation, A.S. and K.H.; writing—original draft preparation, T.W. and S.B.; writing—review and editing, T.W., A.S., K.H., J.H., J.S. and S.B.; visualization, J.S.; supervision, T.W. and S.B.; project administration, T.W. and S.B.; funding acquisition, T.W. and S.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Institute on Drug Abuse (NIDA) at the National Institutes of Health, under grant number 1R41DA059275-01A1.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki. All study procedures were reviewed and approved on 29 September 2023, by the UT Health San Antonio Institutional Review Board; protocol number: 20230662H.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data presented here can be made available for review upon written request to the corresponding author due to the sensitive nature of substance use.

Acknowledgments

We gratefully acknowledge Jennifer Potter from the University of Texas Health San Antonio for her support of this work.

Conflicts of Interest

Three of the authors (Bull, Salyers, and Silvasstar) report affiliations with the organization Clinic Chat, LLC, that developed and deployed the Be Well Buddy system described here.

Abbreviations

The following abbreviations are used in this manuscript:
SUD: Substance use disorder
AI: Artificial intelligence
NIDA: National Institute on Drug Abuse
SBIRT: Screening, brief intervention, and referral to treatment
MAT: Medication-assisted therapy

References

  1. Salazar, C.I.; Huang, Y. The burden of opioid-related mortality in Texas, 1999 to 2019. Ann. Epidemiol. 2022, 65, 72–77. [Google Scholar] [CrossRef] [PubMed]
  2. National Institute of Drug Abuse. Overdose Death Rates. Available online: https://nida.nih.gov/research-topics/trends-statistics/overdose-death-rates (accessed on 14 December 2022).
  3. Additional Texas Overdose Death Data. Available online: https://nasadad.org/wp-content/uploads/2024/09/Texas-SOR-Brief-Draft-2024_Final.pdf (accessed on 2 December 2022).
  4. Centers for Disease Control and Prevention. Polysubstance Use in the United States. Available online: https://www.cdc.gov/drugoverdose/deaths/other-drugs.html (accessed on 2 December 2022).
  5. Kulesza, M.; Matsuda, M.; Ramirez, J.J.; Werntz, A.J.; Teachman, B.A.; Lindgren, K.P. Towards greater understanding of addiction stigma: Intersectionality with race/ethnicity and gender. Drug Alcohol. Depend. 2016, 169, 85–91. [Google Scholar] [CrossRef] [PubMed]
  6. Medicaid Innovation Accelerator Program Reducing Substance Use Disorders. High Intensity Learning Collaborative Fact Sheet. Available online: https://www.medicaid.gov/state-resource-center/innovation-accelerator-program/iap-downloads/learn-hilciap.pdf (accessed on 7 November 2022).
  7. National Institute on Drug Abuse. Trends & Statistics: Costs of Substance Abuse. National Institutes of Health, June. 2020. Available online: https://nida.nih.gov (accessed on 2 December 2022).
  8. Substance Abuse and Mental Health Services Administration (US); Office of the Surgeon General (US). Facing Addiction in America: The Surgeon General’s Report on Alcohol, Drugs, and Health; Chapter 7: Vision for the Future: A Public Health Approach; Department of Health and Human Services: Washington, DC, USA, November 2016. Available online: https://www.ncbi.nlm.nih.gov/books/NBK424861/ (accessed on 2 December 2022).
  9. Sarraju, A.; Bruemmer, D.; Van Iterson, E.; Cho, L.; Rodriguez, F.; Laffin, L. Appropriateness of Cardiovascular Disease Prevention Recommendations Obtained From a Popular Online Chat-Based Artificial Intelligence Model. JAMA 2023, 329, 842–844. [Google Scholar] [CrossRef]
  10. National Institute on Drug Abuse. Addressing the Stigma that Surrounds Addiction. 22 April 2020. Available online: https://nida.nih.gov/about-nida/noras-blog/2020/04/addressing-stigma-surrounds-addiction (accessed on 7 November 2022).
  11. Bernstein, S.L.; D’Onofrio, G. Screening, treatment initiation, and referral for substance use disorders. Addict. Sci. Clin. Pract. 2017, 12, 18. [Google Scholar] [CrossRef]
  12. Saitz, R. ‘SBIRT’ is the answer? Probably not. Addiction 2015, 110, 1416–1417. [Google Scholar] [CrossRef]
  13. Glass, J.E.; Hamilton, A.M.; Powell, B.J.; Perron, B.E.; Brown, R.T.; Ilgen, M.A. Specialty substance use disorder services following brief alcohol intervention: A meta-analysis of randomized controlled trials. Addiction 2015, 110, 1404–1415. [Google Scholar] [CrossRef]
  14. Ray, L.A.; Meredith, L.R.; Kiluk, B.D.; Walthers, J.; Carroll, K.M.; Magill, M. Combined Pharmacotherapy and Cognitive Behavioral Therapy for Adults with Alcohol or Substance Use Disorders: A Systematic Review and Meta-analysis. JAMA Netw. Open 2020, 3, e208279. [Google Scholar] [CrossRef]
  15. Pew Research Center: Internet, Science & Tech. Demographics of Mobile Device Ownership and Adoption in the United States. Available online: https://www.pewresearch.org/internet/fact-sheet/mobile/ (accessed on 2 May 2021).
  16. Rhoades, H.; Wenzel, S.L.; Rice, E.; Winetrobe, H.; Henwood, B. No digital divide? Technology use among homeless adults. J. Soc. Distress Homeless. 2017, 26, 73–77. [Google Scholar] [CrossRef] [PubMed]
  17. Watson, T.; Simpson, S.; Hughes, C. Text messaging interventions for individuals with mental health disorders including substance use: A systematic review. Psychiatry Res. 2016, 243, 255–262. [Google Scholar] [CrossRef]
  18. Mason, M.; Ola, B.; Zaharakis, N.; Zhang, J. Text messaging interventions for adolescent and young adult substance use: A meta-analysis. Prev. Sci. 2015, 16, 181–188. [Google Scholar] [CrossRef]
  19. Snyder, L.B.; Hamilton, M.A.; Mitchell, E.W.; Kiwanuka-Tondo, J.; Fleming-Milici, F.; Proctor, D. A Meta-Analysis of the Effect of Mediated Health Communication Campaigns on Behavior Change in the United States. J. Health Commun. 2004, 9 (Suppl. 1), 71–96. [Google Scholar] [CrossRef] [PubMed]
20. Avila-Tomas, J.F.; Olano-Espinosa, E.; Minué-Lorenzo, C.; Martinez-Suberbiola, F.J.; Matilla-Pardo, B.; Serrano-Serrano, M.E.; Escortell-Mayor, E. Effectiveness of a chat-bot for the adult population to quit smoking: Protocol of a pragmatic clinical trial in primary care (Dejal@). BMC Med. Inform. Decis. Mak. 2019, 19, 249. [Google Scholar] [CrossRef] [PubMed]
  21. Oh, K.J.; Lee, D.; Ko, B.; Hyeon, J.; Choi, H.J. Empathy Bot: Conversational Service for Psychiatric Counseling with Chat Assistant. Stud. Health Technol. Inform. 2017, 245, 1235. [Google Scholar] [PubMed]
  22. Alam, R.; Cheraghi-Sohi, S.; Campbell, S.; Esmail, A.; Bower, P. The Effectiveness of Electronic Differential Diagnoses (DDX) Generators: A systematic review and meta-analysis. PLoS ONE 2016, 11, e0148991. [Google Scholar] [CrossRef]
  23. Jones, O.T.; Calanzani, N.; Saji, S.; Duffy, S.W.; Emery, J.; Hamilton, W.; Singh, H.; de Wit, N.J.; Walter, F.M. Artificial Intelligence Techniques That May Be Applied to Primary Care Data to Facilitate Earlier Diagnosis of Cancer: Systematic Review. J. Med. Internet Res. 2021, 23, e23483. [Google Scholar] [CrossRef]
  24. Millenson, M.; Baldwin, J.; Zipperer, L.; Singh, H. Beyond Dr. Google: The evidence on consumer-facing digital tools for diagnosis. Diagnosis 2018, 5, 95–105. [Google Scholar] [CrossRef]
  25. Elyoseph, Z.; Levkovich, I.; Shinan-Altman, S. Assessing prognosis in depression: Comparing perspectives of AI models, mental health professionals and the general public. Fam. Med. Community Health 2024, 12 (Suppl. 1), e002583. [Google Scholar] [CrossRef]
  26. Spallek, S.; Birrell, L.; Kershaw, S.; Devine, E.K.; Thornton, L. Can we use chatGPT for mental health and substance use education? Examining its quality and potential harms. JMIR Med. Educ. 2023, 9, e51243. [Google Scholar] [CrossRef]
27. Howell, K. (University of Texas Health San Antonio, San Antonio, TX, USA). Personal communication, 2024.
  28. Zhou, S.; Silvasstar, J.; Clark, C.; Salyers, A.; Chavez, C.; Bull, S. An Artificially Intelligent, Natural Language Processing Chatbot Designed to Promote COVID-19 Vaccination: A Proof-of-concept Pilot Study. Digital Health 2023, 9, 20552076231155679. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
  29. Bull, S.; Hood, S.; Mumby, S.; Hendrickson, A.; Silvasstar, J.; Salyers, A. Feasibility of using an artificially intelligent chatbot to increase access to information and sexual and reproductive health services. Digital Health 2024, 10, 20552076241308994. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
30. Kaplan, J.; McCandlish, S.; Henighan, T.; Brown, T.; Chess, B.; Child, R.; Gray, S.; Radford, A.; Wu, J.; Amodei, D. Scaling laws for neural language models. arXiv 2020, arXiv:2001.08361. [Google Scholar] [CrossRef]
  31. Hatem, R.; Simmons, B.; Thornton, J.E. A Call to Address AI “Hallucinations” and How Healthcare Professionals Can Mitigate Their Risks. Cureus 2023, 15, e44720. [Google Scholar] [CrossRef]
  32. Nancy, S.; Dongre, A.R. Behavior Change Communication: Past, Present, and Future. Indian J. Community Med. 2021, 46, 186. [Google Scholar] [CrossRef]
  33. Mumby, S.; Wright, T.; Salyers, A.; Howell, K.; Bull, S. Development of an AI Chatbot to facilitate access to information, screening, and treatment referrals for substance use disorder. Front. Digit. Health 2025, in press. [Google Scholar]
34. Kroenke, K.; Spitzer, R.L.; Williams, J.B. The Patient Health Questionnaire-2: Validity of a two-item depression screener. Med. Care 2003, 41, 1284–1292. [Google Scholar] [CrossRef]
  35. Plummer, F.; Manea, L.; Trepel, D.; McMillan, D. Screening for anxiety disorders with the GAD-7 and GAD-2: A systematic review and diagnostic meta-analysis. Gen. Hosp. Psychiatry 2016, 39, 24–31. [Google Scholar] [CrossRef]
  36. Shirinbayan, P.; Salavati, M.; Soleimani, F.; Saeedi, A.; Asghari-Jafarabadi, M.; Hemmati-Garakani, S.; Vameghi, R. The Psychometric Properties of the Drug Abuse Screening Test. Addict. Health 2020, 12, 25–33. [Google Scholar] [CrossRef]
37. Salyers, A.; Bull, S.; Silvasstar, J.; Howell, K.; Wright, T.; Banaei-Kashani, F. Building and Beta-Testing Be Well Buddy Chatbot, a Secure, Credible and Trustworthy AI Chatbot That Will Not Misinform, Hallucinate or Stigmatize Substance Use Disorder: Development and Usability Study. JMIR Hum. Factors 2025, 12, e69144. [Google Scholar] [CrossRef]
  38. Harrison, R.; Flood, D.; Duce, D. Usability of mobile applications: Literature review and rationale for a new usability model. J. Interact. Sci. 2013, 1, 1. [Google Scholar] [CrossRef]
  39. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2024; Available online: https://www.R-project.org/ (accessed on 15 September 2024).
  40. SocioCultural Research Consultants, LLC. Dedoose; Version 10.0; SocioCultural Research Consultants, LLC: Los Angeles, CA, USA, 2025; Available online: https://www.dedoose.com/ (accessed on 3 December 2024).
  41. Wilson, L.; Marasoiu, M. The Development and Use of Chatbots in Public Health: Scoping Review. JMIR Hum. Factors 2022, 9, e35882. [Google Scholar] [CrossRef]
  42. Waughtal, J.; Glorioso, T.; Sandy, L.; Peterson, P.; Chavez, C.; Bull, S.; Ho, M.; Allen, L.; Thomas, J. Patient engagement with prescription refill text reminders across time and major societal events. Cardiovasc. Digit. Health J. 2023, 4, 133–136. [Google Scholar] [CrossRef] [PubMed]
  43. Premier Continuum. The IT Outage Explained. Available online: https://www.premiercontinuum.com/resources/microsoft-outbreak-july-2024 (accessed on 25 September 2024).
  44. Olmstead, T.A.; Yonkers, K.A.; Ondersma, S.J.; Forray, A.; Gilstad-Hayden, K.; Martino, S. Cost-effectiveness of electronic- and clinician-delivered screening, brief intervention and referral to treatment for women in reproductive health centers. Addiction 2019, 114, 1659–1669. [Google Scholar] [CrossRef] [PubMed]
  45. Bruguera, P.; Barrio, P.; Manthey, J.; Oliveras, C.; López-Pelayo, H.; Nuño, L.; Miquel, L.; López-Lazcano, A.; Blithikioti, C.; Caballeria, E.; et al. Mid and long-term effects of a SBIRT program for at-risk drinkers attending to an emergency department. Follow-up results from a randomized controlled trial. Eur. J. Emerg. Med. 2021, 28, 373–379. [Google Scholar] [CrossRef]
  46. Pickard, J.G.; van den Berk-Clark, C.; Johnson, S.D.; Taylor, M. Evaluation of an online SBIRT training program. Am. J. Drug Alcohol Abuse 2024, 50, 173–180. [Google Scholar] [CrossRef] [PubMed]
  47. Hall, A.K.; Cole-Lewis, H.; Bernhardt, J.M. Mobile text messaging for health: A systematic review of reviews. Annu. Rev. Public Health 2015, 36, 393–415. [Google Scholar] [CrossRef]
  48. App Signal. What Are Good and Acceptable Error Rates? Available online: https://www.appsignal.com/learning-center/what-are-good-and-acceptable-error-rates (accessed on 24 September 2024).
Figure 1. Process for engagement, screening and referrals for Be Well Buddy.
Figure 2. Message distribution by time of day.
Figure 3. Topics with the most engagement.
Figure 4. Screening and scores by frequency of user queries to Be Well Buddy.
Table 1. Example intents and messages from the Be Well Buddy closed library.
Domain: Barriers to Treatment

Intent: How does this chatbot protect my privacy?
Message: Any questions that you ask Be Well Buddy are anonymous and confidential. That means that the Be Well Buddy system doesn’t record your name. The system does collect data based on how well the responses match up with your questions so that it can continue to improve. But Be Well Buddy doesn’t share this information with anyone else.

Intent: Why do I need help with going through withdrawal?
Message: Going through withdrawal (or detoxing) can be incredibly uncomfortable. You may feel things like intense nausea and vomiting, muscle pains, sweating, and depression or anxiety. One of the biggest reasons to get help going through withdrawal is that these symptoms can become overwhelming, and you may return to drugs for relief. Having someone there to help you through it can mean the difference between successfully getting off drugs and returning to drug use. Staff at Be Well Texas are trained in withdrawal treatment and are here to help.

Intent: Will people treat me with respect if I seek treatment?
Message: Ben Affleck, the Hollywood actor, has been open about his struggle with addiction. He says, “If you have a problem, getting help is a sign of courage, not weakness or failure.” And that’s what everyone at Be Well Texas thinks too! Seeking help is an act of courage and you will be treated with respect by all of your care team.

Intent: What if I don’t want to quit using?
Message: If you don’t want to quit using drugs and/or alcohol, maybe take some time to reflect on your reasons for using and to consider the impact that drug use has on your life. Using drugs and alcohol can impact your health, relationships, finances, future, and overall well-being. Even if it feels challenging, seeking support from friends, family, or professionals can provide insights into your choices. Remember, support is available when you’re ready to explore healthier alternatives or address underlying issues contributing to your substance use.

Domain: Substances

Intent: Am I at risk for an opioid overdose?
Message: Anyone who uses opioids—whether they have been prescribed by a medical professional or not—can overdose. If you or someone you know uses opioids, be sure to learn everything you can about how to stay safe and avoid overdose. You can talk with a Be Well provider who will help you, without judgment, to reduce any risks you might have for overdose. https://BeWelltexasclinic.org/get-started/ (accessed on 21 April 2025)

Intent: How does Narcan work?
Message: Opioids attach to parts of your nerves in the brain, and when this happens, your brain releases dopamine, a chemical that helps to block pain. Sometimes a person using opioids may stop breathing. Narcan works by attaching to these same nerves and helps the person breathe. Narcan will only work if someone has opiates in their system. Did you know? Narcan works to reverse an overdose in 7 to 10 out of 10 cases! It is one of the most effective tools we have to reduce death from opioid overdose.

Intent: Why do people use buprenorphine?
Message: People choose buprenorphine because it works well to help reduce withdrawal symptoms that can happen when you want to quit opioids, & it is safe—you will not overdose when taking it. Be sure to talk with a medical provider about whether this is an option for you. Be Well can help you get started when you are ready.
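Table 1 illustrates the closed-library design: each recognized user intent maps to a pre-vetted message, so the chatbot only ever returns text that has been reviewed in advance rather than generating novel replies [37]. As a rough sketch only, assuming a simple token-overlap matcher (the intents, threshold, fallback behavior, and function names below are hypothetical, not the Be Well Buddy implementation):

```python
import re

# Hypothetical sketch of a closed-library chatbot lookup. NOT the
# Be Well Buddy implementation; intents, messages, threshold, and
# fallback text below are illustrative only.

CLOSED_LIBRARY = {
    "how does this chatbot protect my privacy": (
        "Any questions that you ask Be Well Buddy are anonymous and confidential."
    ),
    "how does narcan work": (
        "Narcan works by attaching to the same nerves as opioids and "
        "helps the person breathe."
    ),
}

FALLBACK = "I'm not sure I understood that. Could you rephrase your question?"


def _tokens(text: str) -> set[str]:
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def reply(query: str, threshold: float = 0.5) -> str:
    """Return the vetted message for the closest library intent, or a
    fallback when nothing matches well enough. Because replies can only
    come from the vetted library, the bot never emits generated text."""
    query_tokens = _tokens(query)
    best_intent, best_score = None, 0.0
    for intent in CLOSED_LIBRARY:
        intent_tokens = _tokens(intent)
        # Jaccard similarity between the query and the stored intent
        score = len(query_tokens & intent_tokens) / len(query_tokens | intent_tokens)
        if score > best_score:
            best_intent, best_score = intent, score
    if best_intent is not None and best_score >= threshold:
        return CLOSED_LIBRARY[best_intent]
    return FALLBACK


print(reply("How does Narcan work?"))            # vetted message
print(reply("Tell me about quantum computing"))  # fallback
```

A production system would use a trained intent classifier rather than token overlap, but the key property is the same: any query the matcher cannot confidently place is answered with a safe fallback instead of improvised text.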
Table 2. Key areas to explore Be Well Buddy’s functionality and acceptability.
Usability | Relevant Area of Exploration: To What Extent Is the System… | Outcomes Measured
Memorability and Efficiency | …able to be used? | Timing and frequency of engagement/system use
Errors | …successfully delivered without errors? | Delivered accurately with minimal errors or breakdowns; level of precision in responses
Potential Efficacy | …successful in increasing access to care? | Number of screenings conducted and referrals made
Effectiveness | …potentially adaptable for delivery across multiple environments? | Able to be delivered in diverse settings +
Satisfaction and Cognitive Load | …judged as suitable, satisfying, or attractive to program deliverers? To program recipients? | Satisfaction *; Intent to use/use *; Easily understood, not confusing *

* Items assessed through qualitative interviews; + not assessed given delivery in single setting.
Table 3. Engagement with the Be Well Buddy AI chatbot for SUD.
Measure | Total Sample (N = 91) | Participants Who Screened One or More Times (N = 29) | Participants Who Initiated Queries But Did Not Screen (N = 44)
Number of participant-initiated queries, mean (S.D.) | 25.42 (23.42) | 55.10 (27.61) | 16.71 (16.17)
Range of queries initiated | 0–142 | 16–142 | 1–70
Completed at least one screener | 29 (32%) | 29 (100%) | N/A
Completed GAD | 29 (32%) | 29 (100%) |
Referred for GAD | 24 (26%) | 24 (83% of those screened) |
Completed PHQ | 27 (30%) | 27 (93%) |
Referred for PHQ | 12 (13%) | 12 (44% of those screened) |
Completed DAST | 25 (27%) | 25 (86%) |
Referred for DAST | 15 (16%) | 15 (60% of those screened) |
Made an appointment with Be Well following referral | | 12 of 24, or 50% of those referred |
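Read as a funnel, Table 3 yields the percentages reported in the Abstract. A minimal arithmetic check (counts are taken directly from the table; the variable names are ours):

```python
# Arithmetic check of the screening-and-referral cascade in Table 3.
engaged = 91        # participants who engaged with the system at least once
screened = 29       # completed at least one screener
referred = 24       # largest referral count (GAD) among those screened
appointments = 12   # made an appointment with Be Well after a referral

print(f"Screened at least once: {screened / engaged:.0%}")           # 32%
print(f"Referred among screened: {referred / screened:.0%}")         # 83%
print(f"Appointments among referred: {appointments / referred:.0%}") # 50%
```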