1. Introduction
Artificial Intelligence (AI) has emerged as a transformative driver in modern science and engineering, encompassing a diverse spectrum of subfields ranging from general cognitive processes, such as learning and perception, to highly specialized applications, including disease detection and the verification of complex mathematical theorems. AI is increasingly integrated into nearly every facet of intellectual life, acting as both a hub of knowledge and an active agent in problem-solving [1]. The utility of AI in healthcare decision-making became prominent during the COVID-19 pandemic, when it played a vital role in disease monitoring and diagnostic screening, demonstrating its value in enhancing clinical and public health outcomes [2].
The search for medical advice has moved beyond the “Dr. Google” era of static search results into an era of “conversational health-seeking.” Although this shift makes health information more accessible, it is accompanied by significant concerns regarding clinical accuracy and patient safety. Empirical evaluations have indicated that global digital assistants, such as Siri, Alexa, and Google Assistant, may provide incomplete or potentially misleading responses to critical health inquiries, often failing to distinguish acute medical emergencies from less urgent concerns [3]. Recent evidence from Alharbi et al. (2025) suggests that while advanced models like ChatGPT perform encouragingly, aligning with clinical guidelines in specialized areas such as ophthalmic chemical injuries, rigorous clinical oversight remains critical to ensure diagnostic reliability [4].
Modern AI applications for the general public have transitioned from simple digital archives to active decision-support systems, which are broadly categorized into three domains: symptom checkers and triaging, wearable biometrics, and generative AI. Applications such as Ada, WebMD, and K Health utilize probabilistic graphical models to compare user-input symptoms against vast medical databases, often performing at accuracy rates comparable to general practitioners in non-emergency scenarios. Simultaneously, wearable devices like the Apple Watch, Fitbit, and Oura ring have shifted from passive step-counting to active “preventative” monitoring, utilizing AI to detect arrhythmias (e.g., atrial fibrillation) or sleep apnea. Finally, the emergence of large language models (LLMs) such as ChatGPT and Gemini has introduced a layer of synthesized, personalized advice. In the context of the general population, the use of AI for health-related decision making is defined as the autonomous utilization of technologies, specifically LLMs, symptom checkers, and wearable biometrics, to interpret symptoms and determine health-seeking behaviors without immediate professional intervention [5].
Reliance on AI in health-related decisions refers to the psychological and behavioral commitment to act upon AI-generated results. This transition from familiarity to dependence is heavily influenced by a spectrum of factors ranging from supplementary support to delegated decision-making. While many users utilize AI for initial triaging or convenience (e.g., appointment reminders), a notable group maintains consistent use for chronic disease self-management. Reliance is not dictated by accuracy alone; it is shaped by task complexity and cognitive load. Users may follow AI advice despite limited understanding of system limitations if the interface provides a high degree of perceived “humanness” or empathy [
6]. However, reliance is often condition-specific, with higher dependence observed in routine self-management compared to life-saving health decisions [
7]. Populations in Saudi Arabia are more likely to act on AI outputs when the system is framed as clinically validated or embedded within trusted institutional portals, such as national e-health platforms [
8].
Trust serves as a multidimensional psychological state that bridges the gap between AI output and user action. In the Saudi context, it is categorized into three dimensions: Competence Trust, Integrity Trust, and Benevolence Trust. Competence Trust refers to the belief in the AI’s ability to provide accurate clinical advice [
9]; Integrity Trust involves the expectation that the system operates honestly and discloses limitations—explainability is a critical factor here, with users prioritizing transparency over raw performance; and Benevolence Trust is the perception that the system acts in the user’s best interest, often mediated by data security and privacy concerns [
10].
In KSA, 61.1% of users believe AI assists professionals, yet only 12.5% believe it can replace them, indicating that Competence Trust is conditional [11].
Patients now bypass formal settings to use AI in evaluating health risks and selecting treatment options independently, as shown by several studies [
12]. Nevertheless, even when AI aligns with clinical guidelines, the absence of human oversight can lead to inappropriate self-treatment and delays in seeking professional care.
Regionally, systematic reviews across Arab nations demonstrate a robust integration of Artificial Intelligence (AI) into clinical healthcare routines. In Saudi Arabia specifically, the exigencies of the COVID-19 pandemic served as a critical catalyst, accelerating adoption rates to an estimated 82% [
13]. Despite the widespread assumption that “digital natives” are more receptive to new tech, demographic patterns reveal a surprising divergence; while younger generations possess higher general technical literacy, research by Cinalioglu et al. identified a paradoxical trend: older adults reported significantly greater comfort in utilizing AI for health-related decision making compared to their younger counterparts [
14].
Despite this high penetration, public engagement in Saudi Arabia is characterized by a “trust–competence paradox.” Alshutayli found that 52.3% of respondents express comfort in using “AI doctors” as partial alternatives to human physicians, whereas 63.7% remain concerned about their ability to accurately communicate symptoms to an algorithm [15]. Furthermore, while a significant portion of the population believes AI will enhance professional performance, Syed et al. showed that only 12.5% believe it can fully replace a physician [16].
1.1. Public Health Significance
This study addresses the urgent need to map the circumstances under which the Saudi public utilizes AI, distinguishing between helpful integration and hazardous substitution of professional expertise. By identifying the drivers of this trust–reliance gap, this research provides the evidence base necessary to align Vision 2030’s technological ambitions with robust patient safety protocols and optimized resource allocation.
1.2. Aim
This study aims to investigate the determinants of AI reliance in healthcare decision-making. By analyzing user trust and demographic variables, the research seeks to understand how AI facilitates personalized diagnosis and treatment planning among individuals in Saudi Arabia.
2. Materials and Methods
2.1. Study Design, Setting, and Period
A quantitative, descriptive cross-sectional study with inferential analysis was conducted in the Kingdom of Saudi Arabia (KSA), and the data were collected between January and March 2026. The formal study and full implementation strictly adhered to the protocol following IRB approval, leading to the finalization of the report in May 2026. This design was chosen to provide a representative snapshot of the prevalence, perceptions, and levels of reliance on Artificial Intelligence (AI) in health-related decision making during the nation’s current phase of rapid digital transformation. KSA serves as an ideal setting due to its diverse population of approximately 35 million and its strategic focus on AI under Vision 2030 [
17].
2.2. Participants and Eligibility Criteria
The target population comprised adults (≥18 years) residing across all five geographic regions of KSA (Northern, Southern, Eastern, Western, and Central). Inclusion criteria: residents of KSA (citizens and expatriates) who regularly use internet-enabled digital devices and provided electronic informed consent. Exclusion criteria: individuals with cognitive or psychiatric impairments, healthcare professionals and medical students (to eliminate professional knowledge bias), and participants involved in the pilot phase.
2.3. Sampling and Sample Size
2.3.1. Sampling
Non-probability convenience sampling was utilized, and the survey was hosted on Google Forms and disseminated via major social media platforms (WhatsApp, X, and LinkedIn) to ensure a broad geographical reach across the Kingdom.
2.3.2. Sample Size Determination
The minimum sample size was calculated using the single population proportion formula [18]:
$$n = \frac{Z^{2}\,p\,(1-p)}{d^{2}}$$
where p = 0.523 is the expected proportion, taken from a previous study in which 52.3% of respondents reported comfort in using AI as a physician alternative [8]. With a 95% confidence level (Z = 1.96) and a 5% margin of error (d = 0.05), the initial requirement was 384 participants. To account for the design effect (DEFF) of web-based non-probability sampling, a factor of 1.5 was applied (384 × 1.5 = 576). The target was rounded up to 580, and a final sample of 627 was obtained (approximately 8% above the target, providing a buffer for non-response).
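The sample size arithmetic above can be checked with a short script. This is a minimal sketch, assuming the standard single-proportion formula n = Z²p(1 − p)/d²; the function name `single_proportion_n` is illustrative and not part of the study's materials.

```python
import math

def single_proportion_n(p, z=1.96, d=0.05):
    """Minimum n for estimating one proportion: z^2 * p * (1 - p) / d^2, rounded up."""
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)

# Expected proportion (52.3%) and design-effect factor (1.5) as reported in the text.
base_n = single_proportion_n(0.523)
adjusted_n = math.ceil(base_n * 1.5)

print(base_n, adjusted_n)  # 384 576
```

Rounding up at each step keeps the estimate conservative, which matches the reported figures of 384 and 576.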
2.4. Instrumentation and Adaptation
2.4.1. Data Collection Instrument
A structured, self-administered Arabic questionnaire was developed, comprising four sections:
Sociodemographic: age, gender, education, income, occupation, region, and marital status.
AI Usage Patterns: frequency of use, tool types (e.g., ChatGPT, chatbots), and motivations (e.g., cost-saving, convenience).
Perceptions and Self-Efficacy: adapted from Zhang et al. [19], assessing the trustworthiness of AI advice.
Reliance and Dependence: AI reliance was measured using the framework by Cao and Huang [20], while emotional/behavioral dependence was assessed via the AI dependence scale by Morales-García et al. [21].
Scoring: items used a 5-point Likert scale (1: Strongly Disagree to 5: Strongly Agree). Reliance and Dependence: Total scores (range 4–20) were categorized into low (4–9.33), moderate (9.34–14.66), and high (14.67–20). Treatment Trust: Total scores (range 4–8) were categorized into low (4–5.33), moderate (5.34–6.66), and high (6.67–8).
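The reported cut-offs are consistent with splitting each score range into three equal-width bands. The following is a minimal sketch of that categorization, assuming equal-width thirds (an inference from the published thresholds, not a rule stated by the authors); the function names are illustrative.

```python
def band_cutoffs(lo, hi):
    """Return the two thresholds that split [lo, hi] into three equal-width bands."""
    width = (hi - lo) / 3
    return lo + width, lo + 2 * width

def categorize(score, lo, hi):
    """Label a total score as 'low', 'moderate', or 'high' within its range."""
    if not lo <= score <= hi:
        raise ValueError(f"score {score} outside range {lo}-{hi}")
    first, second = band_cutoffs(lo, hi)
    if score <= first:
        return "low"
    if score <= second:
        return "moderate"
    return "high"

# Reliance/Dependence totals (range 4-20) and Treatment Trust totals (range 4-8).
print(categorize(9, 4, 20), categorize(10, 4, 20), categorize(15, 4, 20))  # low moderate high
print(categorize(5, 4, 8), categorize(6, 4, 8), categorize(7, 4, 8))       # low moderate high
```

Under this assumption the thresholds round to 9.33 and 14.67 for the 4–20 range and to 5.33 and 6.67 for the 4–8 range, in line with the bands listed above.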
2.4.2. Instrument Translation and Cultural Adaptation
To ensure linguistic and conceptual equivalence, the original English scales were translated into Arabic following a rigorous forward-and-backward translation protocol. First, two independent bilingual researchers native to Saudi Arabia translated the questionnaire into Arabic (forward translation), and then a third independent bilingual researcher translated this Arabic draft back into English (back-translation) without having access to the original English version. The back-translated version was compared against the original English text by the research team to resolve any semantic discrepancies.
Cultural adaptation was subsequently performed by an expert panel in healthcare informatics and public health to ensure the phrasing aligned with local terminology and cultural contexts in Saudi Arabia. Finally, the tool was pilot-tested on a small sample (n = 23) to verify clarity and readability before formal data collection commenced. The full questionnaire, “Survey on the Use of AI-Based Health Tools and Health Decision-Making Among Adults,” is provided in the Supplementary Materials.
A multi-step validation process was employed to ensure the instrument’s psychometric integrity. Because the measurement items were adapted directly from established, previously validated frameworks, specifically Zhang et al. [19] for trustworthiness, Cao and Huang [20] for AI reliance, and Morales-García et al. [21] for AI dependence, construct validity was taken as established by the original authors. Therefore, a de novo Exploratory Factor Analysis (EFA) or Confirmatory Factor Analysis (CFA) was not performed. After the forward-and-backward translation into Arabic, content and face validity were evaluated by a panel of experts in healthcare informatics and public health to ensure cultural relevance, clarity, and alignment of the items with the underlying constructs. The pilot study (n = 23) confirmed linguistic clarity; pilot data were excluded from the final analysis. Internal consistency for the full sample (n = 627) was good to excellent: the total 8-item scale demonstrated a Cronbach’s alpha of 0.907, while the AI reliance and AI dependence subscales yielded alpha values of 0.830 and 0.880, respectively.
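For readers unfamiliar with the internal consistency statistic reported above, the sketch below computes Cronbach's alpha from item-level responses using the usual formula α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The response matrix is synthetic toy data, not the study's data, and the function name `cronbach_alpha` is illustrative.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Synthetic toy responses (NOT study data): 5 respondents x 4 Likert items.
toy = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [5, 5, 5, 4],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
]
print(round(cronbach_alpha(toy), 3))
```

Alpha approaches 1 when items move together across respondents, which is why it is read as a measure of how consistently a set of items captures one construct.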
2.5. Data Analysis
Data were analyzed using JMP software (SAS JMP, Cary, NC, USA). Descriptive statistics (frequencies, percentages, means, and standard deviations) summarized the demographic profile and AI usage. Inferential analysis included chi-square tests for categorical variables, and ANOVA and independent t-tests for comparing mean scores. Statistical significance was set at p < 0.05.
A multiple linear regression analysis was conducted to identify independent predictors of AI reliance. The dependent variable was operationalized as the total continuous score from the AI reliance subscale, while the independent variables, entered into the model simultaneously, included baseline demographic factors (age, gender, education, and income) alongside continuous scores from the Treatment Trust Scale.
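To illustrate what entering predictors simultaneously means in practice, the sketch below fits an ordinary least squares model by solving the normal equations. It is not the authors' JMP workflow: the data are synthetic, only two predictors (a 0/1 gender indicator and a trust score) are included for brevity, and the function name `ols` is hypothetical.

```python
def ols(X, y):
    """Ordinary least squares: solve (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

# Synthetic toy data (NOT study data). Columns: intercept, gender (0/1), trust score.
gender = [0, 1, 0, 1, 1, 0, 0]
trust = [4, 5, 6, 7, 8, 5, 7]
X = [[1.0, g, t] for g, t in zip(gender, trust)]
# Reliance is generated from trust only, so gender should receive a zero coefficient.
y = [2.0 + 1.5 * t for t in trust]

intercept, b_gender, b_trust = ols(X, y)
print(round(intercept, 3), round(b_gender, 3), round(b_trust, 3))
```

Because all predictors are estimated together, each coefficient reflects the association with the outcome after adjusting for the other predictors in the model.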
2.6. Ethical Considerations
This study was conducted in accordance with the Declaration of Helsinki, and ethical approval was obtained from Princess Nourah bint Abdulrahman University (25-0689). Participants provided informed consent electronically before accessing the survey.
2.7. Declaration of Generative AI Use
Generative AI was used in this study solely for the purpose of refining the manuscript’s language, grammar, and formatting to ensure clarity and professional standards. It was not used for study design, data collection, or the interpretation of statistical results.
4. Discussion
The integration of Artificial Intelligence (AI) into the Saudi Arabian healthcare landscape is no longer a marginal trend but a central component of patient behavior. This study reveals a significant shift in which AI, specifically Large Language Models (LLMs), acts as a common first stop before formal medical consultation. The following discussion synthesizes the findings across behavioral patterns, the “reliance–trust gap”, and the demographic drivers of AI adoption in the Kingdom.
4.1. The Paradigm Shift: AI as the New Clinical Front Door
The results indicate a profound shift in the patient’s informational journey. With 92.2% of participants favoring LLMs and chatbots over specialized digital tools, AI has effectively replaced traditional search engines and, for a substantial portion of the cohort, represents the initial step in addressing health inquiries (Table 1). This high frequency of usage (mean = 3.75) aligns with Al-Somali’s findings regarding the role of AI-powered chatbots in advancing health management in Saudi Arabia [8].
The dominance of LLMs over specialized symptom checkers (3%) suggests that the “natural language” interface is a major catalyst for adoption. Patients utilize a conversational interface that mimics human interaction, a trend that underscores the transformative potential of digital health in disease detection and management [
2]. Because 71.8% of users report consulting AI as a preliminary step before attending professional appointments, these tools function as an influential “pre-consultation” phase in the modern patient workflow. While this data does not imply that AI replaces institutional healthcare channels or formal clinical triage—metrics that fall outside the comparative scope of this study—it highlights AI’s role as an immediate, self-directed source of health information prior to face-to-face medical encounters.
4.2. The “Reliance–Trust Gap”: A Psychological Paradox
A striking finding is the discrepancy between high AI reliance (mean = 15.025) and moderate Treatment Trust (mean = 6.489) (Table 4). Behavioral Habit vs. Clinical Faith: Users have developed a sustained behavioral habit (r = 0.7308), yet they remain cognitively guarded regarding the clinical validity of the output (Table 7). Strategic Hedging: This “reliance–trust gap” indicates that while AI is viewed as a sophisticated assistant, it has not achieved the status of a definitive clinical authority. This reflects a level of skepticism regarding final treatment outcomes without professional oversight, echoing concerns about safety risks when using conversational assistants for medical information [3]. Saudi users demonstrate a pattern of strategic utilization: they use AI for rapid screening while reserving final clinical authority for human practitioners [10].
It is important to note, however, that this “reliance–trust gap” was not quantified using a standalone, specialized psychometric instrument. Instead, this construct represents an analytical inference derived from the statistical divergence between two independent, previously validated scales within our framework: the AI reliance subscale [
20] and the Treatment Trust Scale [
19]. While this conceptual synthesis explains the behavioral tension between high daily utility and guarded clinical faith, future psychometric research should focus on developing unified metrics designed to directly measure this cognitive dissonance.
4.3. Economic and Clinical Drivers
The dual motivation of clinical convenience (76.9%) and cost-avoidance (46.7%) creates a distinct adoption profile (
Table 2). Efficiency: The high demand for initial treatment suggestions and medication risk comparisons (69.2%) indicates a public emphasis on immediate health literacy. Economic Impact: Nearly half of the respondents utilize AI as a practical cost-avoidance measure, which suggests that AI acts as a technology for patient empowerment, allowing users to navigate health concerns while bypassing professional service fees [
12]. However, this economic driver remains secondary to the clinical convenience of rapid information gathering [
13].
4.4. Demographic Factors
Contrary to many digital divide theories, the analysis shows that age, education, and income do not significantly influence AI-related behaviors in this context (Table 9). The Saudi Context: Adoption in the Kingdom appears uniform across generations, diverging from findings in other regions where younger populations show significantly different perceptions [14]. This uniformity may be attributed to high smartphone penetration and the digital-first infrastructure promoted by the General Authority for Statistics [17]. Gender Significance: In bivariate analyses, gender was significantly associated with both AI reliance (χ² = 11.196, p = 0.0037) and AI dependence (χ² = 8.399, p = 0.0150), with female participants demonstrating a higher prevalence in the high-frequency categories (Table 6 and Table 9). In line with the primary research objectives of this study, this variation is treated as a baseline demographic predictor; exploring intersectional subgroups or multi-layered interactions within this gender variance fell outside the scope of the current framework. However, the association between gender and AI reliance became non-significant once gender was entered into the multiple linear regression model (Table 8), indicating that the bivariate association reflected variance shared with other predictors rather than an independent effect. This divergence is not attributable to multicollinearity; all Variance Inflation Factors (VIFs) were well below 2.0. Instead, the pattern is consistent with Trust accounting for the variance that gender appeared to explain in the bivariate test, although no formal mediation analysis was performed. In practical terms, gender differences in AI reliance within this sample are better understood as downstream reflections of differing trust levels rather than as standalone demographic predictors.
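The reported chi-square p-values can be sanity-checked by hand. The sketch below assumes each test has 2 degrees of freedom, as would arise from a 2 (gender) × 3 (low, moderate, high) contingency table; that table shape is an assumption, since the tables themselves are not reproduced here. For 2 degrees of freedom the upper-tail probability has the closed form exp(−χ²/2). The function names and the toy table are illustrative.

```python
import math

def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def p_value_df2(stat):
    """Upper-tail p-value for a chi-square statistic with exactly 2 degrees of freedom."""
    return math.exp(-stat / 2)

# Statistics reported in the text; df = 2 is an assumption (see lead-in).
print(round(p_value_df2(11.196), 4))  # 0.0037
print(round(p_value_df2(8.399), 4))   # 0.015

# Toy table with identical row proportions (NOT study data): statistic is 0.
print(chi_square_statistic([[10, 20, 30], [20, 40, 60]]))  # 0.0
```

Both reported p-values are reproduced under the 2-degrees-of-freedom assumption, which supports reading the gender comparisons as 2 × 3 tables.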
4.5. Predictors of Behavior: Trust as the Ultimate Catalyst
The regression analysis confirms that trust levels are the only significant predictors of AI engagement (p < 0.0001), while demographic variables fail to reach significance (Table 8). The Trust Threshold: High trust levels have a strong positive impact on reliance (β = 2.149), while low trust levels show a strong inhibitory effect (β = −2.338). Clinical Implications: These results imply that psychological factors and the perceived reliability of the system are far more influential than baseline demographics [20]. To bridge the “reliance–trust gap”, stakeholders should focus on improving source attribution and citation in AI-generated advice, which has been shown to bolster user trust [19].
4.6. Methodological Scope and Limitations
The cross-sectional nature of this study limits the ability to infer any causal relationships. In addition, the model excluded several important determinants of technology adoption, such as user-experience design, system latency, privacy concerns, perceived risk, prior negative encounters, and the transparency of AI algorithms.
The reliance on online convenience sampling produced a strong demographic skew toward younger individuals and students, which restricts the generalizability of the findings to the wider Saudi population.
The data are also subject to self-report and recall biases. Moreover, the study did not evaluate participants’ digital health or AI literacy, their capacity to detect misinformation, or whether they sought confirmation of AI-generated advice from healthcare professionals.
Because the study lacked direct comparisons with clinic visitation patterns or hospital triage records, the results reflect digital information-seeking behaviors rather than evidence of a systemic shift in how people enter the healthcare system.
Finally, the study did not include any clinical auditing of the AI outputs; the accuracy and medical safety of the recommendations were not assessed. Future work should integrate behavioral data with clinical evaluations to identify potential safety risks.
4.7. Recommendations
This study recommends complementing cross-sectional surveys with longitudinal tracking to clarify the temporal ordering of user trust and behavioral reliance and to strengthen causal inference about their relationship over time.
It also suggests combining self-reported behavioral tracking with clinical auditing of AI outputs to measure the objective medical accuracy of advice and assess user exposure to misinformation.