Search Results (71)

Search Parameters:
Keywords = Google Gemini

13 pages, 281 KB  
Article
Is It a Case of Safe Haven? Analyzing Stablecoin Returns Considering Cryptocurrency Dynamics
by Vitor Fonseca Machado Beling Dias and Rodrigo Fernandes Malaquias
J. Risk Financial Manag. 2026, 19(1), 81; https://doi.org/10.3390/jrfm19010081 - 20 Jan 2026
Viewed by 152
Abstract
In this study, we evaluated the returns and return volatility of a Brazilian stablecoin linked to fertilizers during the periods preceding its discontinuation. In light of the safe haven literature, we also tested the correlation between this stablecoin and a traditional cryptocurrency, Bitcoin, and modeled its behavior during periods of Bitcoin’s extreme returns. Methodologically, we employed GARCH-family models (including DCC-GARCH) to analyze daily data from 1 December 2022 to 16 January 2025. We also used Large Language Models (LLMs) to evaluate the stablecoin’s time series around the period of its discontinuation. The results indicated that as the discontinuation date approached, the stablecoin exhibited statistically significantly lower returns and higher volatility. While the DCC-GARCH indicated no correlation between the assets, we found that the stablecoin’s returns exhibited a negative relationship with Bitcoin’s extreme returns, challenging its potential efficacy as a safe haven. This article offers practical contributions for digital asset investors, indicating that even physically backed stablecoins, designed for stability, are subject to significant volatility, idiosyncratic risks, and potential discontinuation. Full article
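For readers unfamiliar with the method this abstract names, here is a minimal GARCH(1,1) sketch using Python's `arch` package; the returns are synthetic, and the authors' exact specification (including the DCC-GARCH step) is not reproduced.

```python
# Minimal GARCH(1,1) sketch with the `arch` package (pip install arch).
# Synthetic daily returns stand in for the stablecoin series; the paper's
# exact model family, sample, and DCC-GARCH step are not reproduced here.
import numpy as np
from arch import arch_model

rng = np.random.default_rng(42)
returns = rng.normal(0.0, 1.0, 550)  # ~2 years of daily % returns (synthetic)

model = arch_model(returns, mean="Constant", vol="GARCH", p=1, q=1)
result = model.fit(disp="off")
print(result.summary())

# Conditional volatility rises where the fitted model sees clustering;
# the paper reports such a rise as the discontinuation date approached.
print(result.conditional_volatility[-5:])
```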
25 pages, 3538 KB  
Article
Pushing the Limits of Large Language Models in Quantum Operations
by Dayton C. Closser and Zbigniew J. Kabala
Quantum Rep. 2026, 8(1), 7; https://doi.org/10.3390/quantum8010007 - 19 Jan 2026
Viewed by 109
Abstract
What is the fastest Artificial Intelligence Large Language Model (AI LLM) for generating quantum operations? To answer this, we present the first benchmarking study comparing popular and publicly available AI models tasked with creating quantum gate designs. The Wolfram Mathematica framework was used to interface with six AI LLMs: Google Gemini 2.0 Flash, Anthropic Claude 3 Haiku, WolframLLM Notebook Assistant For Mathematica V14.3.0.0, OpenAI ChatGPT Omni 4 Mini, Google Gemma 3 4b 1t, and DeepSeek Chat V3. Our study found the following: (1) Gemini 2.0 Flash was overall the fastest of the models tested at producing average quantum gate designs, at 2.66101 s, factoring in “thinking” execution time and ServiceConnect network latencies. (2) On average, roughly four of the ten quantum operations each LLM produced compiled in Python version 3.13.5 (a 40.8% success rate). (3) Quantum operations averaged approximately 21–45 lines of code (omitting nonsensical outliers). (4) DeepSeek Chat V3 produced the shortest code, averaging 21.6 lines. This comparison evaluates the time taken by each AI LLM platform to generate quantum operations (including ServiceConnect networking times). These findings highlight a promising horizon in which publicly available Large Language Models can become fast collaborators with quantum computers, enabling rapid quantum gate synthesis and paving the way for greater interoperability between two remarkable and cutting-edge technologies. Full article
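A rough idea of the latency benchmarking described above, as a Python timing harness; the study itself timed calls from Wolfram Mathematica, and `generate_quantum_gate` is a hypothetical stand-in for a real model API.

```python
# Illustrative wall-clock benchmarking harness. The study timed LLM calls
# from Wolfram Mathematica (including ServiceConnect network latency);
# `generate_quantum_gate` is a hypothetical placeholder for any
# model-specific API call under test.
import time
import statistics

def generate_quantum_gate(model_name: str, prompt: str) -> str:
    # Placeholder: replace with a real API call for each model under test.
    return "def hadamard(): ..."

def benchmark(model_name: str, prompt: str, runs: int = 10) -> float:
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        generate_quantum_gate(model_name, prompt)
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)  # mean end-to-end seconds per request

print(benchmark("gemini-2.0-flash", "Write Python for a Hadamard gate."))
```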
13 pages, 2459 KB  
Article
Visual Large Language Models in Radiology: A Systematic Multimodel Evaluation of Diagnostic Accuracy and Hallucinations
by Marc Sebastian von der Stück, Roman Vuskov, Simon Westfechtel, Robert Siepmann, Christiane Kuhl, Daniel Truhn and Sven Nebelung
Life 2026, 16(1), 66; https://doi.org/10.3390/life16010066 - 1 Jan 2026
Viewed by 482
Abstract
Visual large language models (VLLMs) are discussed as potential tools for assisting radiologists in image interpretation, yet their clinical value remains unclear. This study provides a systematic and comprehensive comparison of general-purpose and biomedical VLLMs in radiology. We evaluated 180 representative clinical images with validated reference diagnoses (radiography, CT, MRI; 60 each) using seven VLLMs (ChatGPT-4o, Gemini 2.0, Claude Sonnet 3.7, Perplexity AI, Google Vision AI, LLaVA-1.6, LLaVA-Med-v1.5). Each model interpreted the images both without and with clinical context. Mixed-effects logistic regression models assessed the influence of model, modality, and context on diagnostic performance and hallucinations (fabricated findings or misidentifications). Diagnostic accuracy varied significantly across all dimensions (p ≤ 0.001), ranging from 8.1% to 29.2% across models, with Gemini 2.0 performing best and LLaVA performing weakest. CT achieved the best overall accuracy (20.7%), followed by radiography (17.3%) and MRI (13.9%). Clinical context improved accuracy from 10.6% to 24.0% (p < 0.001) but shifted the models toward relying more on textual information. Hallucinations were frequent (74.4% overall) and model-dependent (51.7–82.8% across models; p ≤ 0.004). Current VLLMs remain diagnostically unreliable, heavily context-biased, and prone to generating false findings, which limits their clinical suitability. Domain-specific training and rigorous validation are required before clinical integration can be considered. Full article
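The mixed-effects logistic regression named in this abstract can be approximated, in simplified fixed-effects form, with statsmodels; the sketch below runs on synthetic data and omits the paper's random effects.

```python
# Simplified fixed-effects logistic regression with statsmodels; the paper
# uses mixed-effects models (e.g., with image-level random effects), which
# this sketch deliberately omits. Data below are synthetic.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "model": rng.choice(["gemini", "gpt4o", "llava"], n),
    "modality": rng.choice(["CT", "MRI", "radiography"], n),
    "context": rng.choice([0, 1], n),
})
# Synthetic outcome: context helps, as the paper reports.
p = 0.10 + 0.14 * df["context"]
df["correct"] = (rng.random(n) < p).astype(int)

fit = smf.logit("correct ~ C(model) + C(modality) + context", data=df).fit(disp=0)
print(fit.summary())
```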
15 pages, 626 KB  
Article
Evaluating the Performance of AI Large Language Models in Detecting Pediatric Medication Errors Across Languages: A Comparative Study
by Rana K. Abu-Farha, Haneen Abuzaid, Jena Alalawneh, Muna Sharaf, Redab Al-Ghawanmeh and Eyad A. Qunaibi
J. Clin. Med. 2026, 15(1), 162; https://doi.org/10.3390/jcm15010162 - 25 Dec 2025
Viewed by 1130
Abstract
Objectives: This study aimed to evaluate the performance of four AI models (GPT-5, GPT-4, Microsoft Copilot, and Google Gemini) in detecting medication errors through pediatric case scenarios. Methods: A total of 60 pediatric cases were analyzed for the presence of medication errors, of which only half contained errors. The cases covered four therapeutic systems (respiratory, endocrine, neurology, and infectious). The four models were exposed to the cases in both English and Arabic using a unified prompt. The responses of each model were used to calculate performance metrics covering accuracy, sensitivity, specificity, and reproducibility. Analysis was carried out using SPSS version 22. Results: Microsoft Copilot demonstrated relatively higher accuracy (86.7% in English, 85.0% in Arabic) than the other models in this dataset, followed by GPT-5 (81.7% in English, 75.0% in Arabic). GPT-4 and Google Gemini were less accurate, with Gemini having the lowest accuracy in both languages (76.7% in English, 73.3% in Arabic). Microsoft Copilot showed comparatively higher sensitivity and specificity, particularly in cases of respiratory and infectious diseases. Accuracy in Arabic was lower than in English for the majority of models. Microsoft Copilot exhibited relatively higher reproducibility and inter-run agreement (Cohen’s Kappa = 0.836 English, 0.815 Arabic, p < 0.001 for both), while Gemini showed the lowest reproducibility. For inter-language agreement, Copilot showed the highest Cohen’s Kappa, 0.701 between English and Arabic (p < 0.001). Conclusions: In our evaluation, Microsoft Copilot demonstrated relatively higher performance in pediatric drug error detection than the other AI models. The decreased performance in Arabic points toward the need for improved multilingual training to support equal AI assistance across languages. This study highlights the importance of human oversight and domain-based training for AI tools in pediatric pharmacotherapy. Full article
(This article belongs to the Section Pharmacology)
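A sketch of how the reported metrics (accuracy, sensitivity, specificity, and Cohen's Kappa) follow from binary error/no-error labels, using scikit-learn on made-up labels rather than the study's cases.

```python
# Deriving the study's performance metrics from binary labels with
# scikit-learn; the labels below are invented for illustration.
from sklearn.metrics import confusion_matrix, cohen_kappa_score

truth = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]  # 1 = case contains an error
run_1 = [1, 1, 0, 0, 0, 0, 1, 1, 1, 0]  # model's verdicts, run 1
run_2 = [1, 1, 0, 0, 1, 0, 1, 1, 1, 0]  # model's verdicts, run 2

tn, fp, fn, tp = confusion_matrix(truth, run_1).ravel()
print("accuracy:   ", (tp + tn) / (tp + tn + fp + fn))
print("sensitivity:", tp / (tp + fn))
print("specificity:", tn / (tn + fp))

# Reproducibility between repeated runs, as in the paper's inter-run kappa.
print("kappa:      ", cohen_kappa_score(run_1, run_2))
```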
38 pages, 3484 KB  
Article
From Prompts to Paths: Large Language Models for Zero-Shot Planning in Unmanned Ground Vehicle Simulation
by Kelvin Olaiya, Giovanni Delnevo, Chan-Tong Lam, Giovanni Pau and Paola Salomoni
Drones 2025, 9(12), 875; https://doi.org/10.3390/drones9120875 - 18 Dec 2025
Viewed by 1066
Abstract
This paper explores the capability of Large Language Models (LLMs) to perform zero-shot planning through multimodal reasoning, with a particular emphasis on applications to Unmanned Ground Vehicles (UGVs) and unmanned platforms in general. We present a modular system architecture that integrates a general-purpose LLM with visual and spatial inputs for adaptive planning to iteratively guide UGV behavior. Although the framework is demonstrated in a ground-based setting, it directly extends to other unmanned systems, where semantic reasoning and adaptive planning are increasingly critical for autonomous mission execution. To assess performance, we employ a continuous evaluation metric that jointly considers distance and orientation, offering a more informative and fine-grained alternative to binary success measures. We evaluate a foundational LLM (i.e., Gemini 2.0 Flash, Google DeepMind) on a suite of zero-shot navigation and exploration tasks in simulated environments. Unlike prior LLM-robot systems that rely on fine-tuning or learned waypoint policies, we evaluate a purely zero-shot, stepwise LLM planner that receives no task demonstrations and reasons only from the sensed data. Our findings show that LLMs exhibit encouraging signs of goal-directed spatial planning and partial task completion, even in a zero-shot setting. However, inconsistencies in plan generation across models highlight the need for task-specific adaptation or fine-tuning. These findings highlight the potential of LLM-based multimodal reasoning to enhance autonomy in UGV and drone navigation, bridging high-level semantic understanding with robust spatial planning. Full article
(This article belongs to the Special Issue Advances in Guidance, Navigation, and Control)
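The abstract describes a continuous score that jointly considers distance and orientation but does not give its formula; the weighted combination below is an assumed illustration, not the authors' metric.

```python
# Illustrative continuous navigation score combining distance-to-goal and
# heading error. The paper's exact formula is not given in the abstract;
# this weighting scheme is an assumption.
import math

def navigation_score(dist_to_goal: float, heading_err_rad: float,
                     d_scale: float = 5.0, w_dist: float = 0.7) -> float:
    """Score in [0, 1]: 1 = at the goal, facing it; decays with both errors."""
    dist_term = math.exp(-dist_to_goal / d_scale)     # 1 at goal, -> 0 far away
    head_term = 1.0 - abs(heading_err_rad) / math.pi  # 1 aligned, 0 reversed
    return w_dist * dist_term + (1.0 - w_dist) * head_term

print(navigation_score(0.0, 0.0))    # 1.0: perfect
print(navigation_score(10.0, 1.57))  # partial credit, unlike a binary metric
```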
2 pages, 119 KB  
Abstract
The Hidden Environmental and Climate Footprint of Frontier AI: Evidence from Google’s Gemini Training and Regional Emissions
by Jun Ho Choi
Proceedings 2025, 131(1), 94; https://doi.org/10.3390/proceedings2025131094 - 17 Dec 2025
Viewed by 221
Abstract
Recent advances in artificial intelligence (AI) have generated widespread interest and investment across industries, yet the environmental and public health costs of large-scale model training remain poorly understood [...] Full article
(This article belongs to the Proceedings of The 11th World Sustainability Forum (WSF11))
13 pages, 777 KB  
Article
AI-Powered Learning: Revolutionizing Education and Automated Code Evaluation
by Andrija Bernik, Danijel Radošević and Andrej Čep
Information 2025, 16(11), 1015; https://doi.org/10.3390/info16111015 - 20 Nov 2025
Viewed by 1890
Abstract
The paper presents a case study on using artificial intelligence (AI) for preliminary grading of student programming assignments. By integrating our previously introduced programming learning interface, Verificator, with the Gemini 2.5 large language model via Google AI Studio, C++ student submissions were evaluated automatically and compared with teacher-assigned grades. The results showed moderate to high correlation, although the AI graded more strictly. The study demonstrates that AI tools can improve grading speed and consistency while highlighting the need for human oversight due to limitations in interpreting non-standard solutions. It also emphasizes ethical considerations such as transparency, bias, and data privacy in educational AI use. A hybrid grading model combining AI efficiency and human judgment is recommended. Full article
(This article belongs to the Section Information and Communications Technology)
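A minimal sketch of LLM-assisted grading with the `google-generativeai` SDK; the rubric prompt and model identifier are assumptions, since the paper accessed Gemini 2.5 through Google AI Studio rather than through this exact client code.

```python
# Sketch of automated preliminary grading with the `google-generativeai`
# SDK (pip install google-generativeai). The rubric prompt and model name
# are assumptions, not the study's actual configuration.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-2.5-flash")  # assumed model id

submission = "int main() { int a; std::cin >> a; std::cout << a*a; }"
prompt = (
    "You are grading a C++ student submission on a 1-5 scale.\n"
    "Return the grade and a one-sentence justification.\n\n"
    f"Submission:\n{submission}"
)
response = model.generate_content(prompt)
print(response.text)  # a human teacher should review this preliminary grade
```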
29 pages, 9355 KB  
Article
AI-Delphi: Emulating Personas Toward Machine–Machine Collaboration
by Lucas Nóbrega, Luiz Felipe Martinez, Luísa Marschhausen, Yuri Lima, Marcos Antonio de Almeida, Alan Lyra, Carlos Eduardo Barbosa and Jano Moreira de Souza
AI 2025, 6(11), 294; https://doi.org/10.3390/ai6110294 - 14 Nov 2025
Viewed by 1194
Abstract
Recent technological advancements have made Large Language Models (LLMs) easily accessible through apps such as ChatGPT, Claude.ai, Google Gemini, and HuggingChat, allowing text generation on diverse topics with a simple prompt. Considering this scenario, we propose three machine–machine collaboration models to streamline and accelerate Delphi execution time by leveraging the extensive knowledge of LLMs. We then applied one of these models—the Iconic Minds Delphi—to run Delphi questionnaires focused on the future of work and higher education in Brazil. Therefore, we prompted ChatGPT to assume the role of well-known public figures from various knowledge areas. To validate the effectiveness of this approach, we asked one of the emulated experts to evaluate his responses. Although this individual validation was not sufficient to generalize the approach’s effectiveness, it revealed an 85% agreement rate, suggesting a promising alignment between the emulated persona and the real expert’s opinions. Our work contributes to leveraging Artificial Intelligence (AI) in Futures Research, emphasizing LLMs’ potential as collaborators in shaping future visions while discussing their limitations. In conclusion, our research demonstrates the synergy between Delphi and LLMs, providing a glimpse into a new method for exploring central themes, such as the future of work and higher education. Full article
(This article belongs to the Topic Generative AI and Interdisciplinary Applications)
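Persona emulation of the kind the Iconic Minds Delphi uses can be sketched with the OpenAI Python SDK; the persona, question, and model below are placeholders, not the paper's prompts.

```python
# Persona-emulation sketch with the OpenAI Python SDK (pip install openai).
# The persona, question, and model are placeholders; the paper emulated
# specific well-known public figures in its Delphi rounds.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

persona = "a well-known economist specializing in the future of work"
question = "How will higher education in Brazil change by 2040?"

reply = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system",
         "content": f"Answer in character as {persona}. Stay consistent "
                    "with that expert's published positions."},
        {"role": "user", "content": question},
    ],
)
print(reply.choices[0].message.content)
```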
33 pages, 2190 KB  
Article
Benchmarking ChatGPT and Other Large Language Models for Personalized Stage-Specific Dietary Recommendations in Chronic Kidney Disease
by Makpal Kairat, Gulnoza Adilmetova, Ilvira Ibraimova, Abduzhappar Gaipov, Huseyin Atakan Varol and Mei-Yen Chan
J. Clin. Med. 2025, 14(22), 8033; https://doi.org/10.3390/jcm14228033 - 12 Nov 2025
Viewed by 930
Abstract
Background: Chronic kidney disease (CKD) requires strict dietary management tailored to disease stage and individual needs. Recent advances in artificial intelligence (AI) have introduced chatbot-based tools capable of generating dietary recommendations. However, their accuracy, personalization, and practical applicability in clinical nutrition remain largely unvalidated, particularly in non-Western settings. Methods: Simulated patient profiles representing each CKD stage were developed and used to prompt GPT-4 (OpenAI), Gemini (Google), and Copilot (Microsoft) with the same request for meal planning. AI-generated diets were evaluated by three physicians using a 5-point Likert scale across three criteria: personalization, consistency with guidelines, and practicality and availability. Descriptive statistics, Kruskal–Wallis tests, and Dunn’s post hoc tests were performed to compare model performance. Nutritional analysis of four meal plans (Initial, GPT-4, Gemini, and Copilot) was conducted using both GPT-4 estimates and manual calculations validated against clinical dietary sources. Results: Scores for personalization and consistency were significantly higher for Gemini and GPT-4 compared with Copilot, with no significant differences between Gemini and GPT-4 (p = 0.0001 and p = 0.0002, respectively). Practicality showed marginal significance, with GPT-4 slightly outperforming Gemini (p = 0.0476). Nutritional component analysis revealed discrepancies between GPT-4’s internal estimations and manual values, with occasional deviations from clinical guidelines, most notably for sodium and potassium, and moderate overestimation for phosphorus. Conclusions: While AI chatbots show promise in delivering dietary guidance for CKD patients, with Gemini demonstrating the strongest performance, further development, clinical validation, and testing with real patient data are needed before AI-driven tools can be fully integrated into patient-centered CKD nutritional care. Full article
(This article belongs to the Section Clinical Nutrition & Dietetics)
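The Kruskal–Wallis and Dunn's post hoc tests named in the methods, sketched with SciPy and scikit-posthocs on invented Likert scores.

```python
# Kruskal-Wallis omnibus test plus Dunn's post hoc comparisons, as in the
# methods; uses SciPy and scikit-posthocs (pip install scikit-posthocs) on
# made-up Likert ratings, not the study's data.
from scipy.stats import kruskal
import pandas as pd
import scikit_posthocs as sp

scores = pd.DataFrame({
    "score": [5, 4, 5, 4, 4, 5, 4, 5, 4, 5, 2, 3, 2, 3, 2],
    "model": ["gpt4"] * 5 + ["gemini"] * 5 + ["copilot"] * 5,
})

groups = [g["score"].values for _, g in scores.groupby("model")]
print(kruskal(*groups))  # omnibus test across the three chatbots

# Pairwise comparisons with Bonferroni correction.
print(sp.posthoc_dunn(scores, val_col="score", group_col="model",
                      p_adjust="bonferroni"))
```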
21 pages, 2761 KB  
Article
The Development and Evaluation of a Retrieval-Augmented Generation Large Language Model Virtual Assistant for Postoperative Instructions
by Syed Ali Haider, Srinivasagam Prabha, Cesar Abraham Gomez Cabello, Ariana Genovese, Bernardo Collaco, Nadia Wood, James London, Sanjay Bagaria, Cui Tao and Antonio Jorge Forte
Bioengineering 2025, 12(11), 1219; https://doi.org/10.3390/bioengineering12111219 - 7 Nov 2025
Viewed by 1823
Abstract
Background: During postoperative recovery, patients and their caregivers often lack crucial information, leading to numerous repetitive inquiries that burden healthcare providers. Traditional discharge materials, including paper handouts and patient portals, are often static, overwhelming, or underutilized, contributing to unnecessary ER visits and overall healthcare overutilization. Conversational chatbots offer a solution, but Natural Language Processing (NLP) systems are often inflexible and limited in understanding, while powerful Large Language Models (LLMs) are prone to generating “hallucinations”. Objective: To combine the deterministic framework of traditional NLP with the probabilistic capabilities of LLMs, we developed the AI Virtual Assistant (AIVA) Platform. This system utilizes a retrieval-augmented generation (RAG) architecture, integrating Gemini 2.0 Flash with a medically verified knowledge base via Google Vertex AI, to safely deliver dynamic, patient-facing postoperative guidance grounded in validated clinical content. Methods: The AIVA Platform was evaluated through 750 simulated patient interactions derived from 250 unique postoperative queries across 20 high-frequency recovery domains. Three blinded physician reviewers assessed formal system performance, evaluating classification metrics (accuracy, precision, recall, F1-score), relevance (SSI Index), completeness, and consistency (5-point Likert scale). Safety guardrails were tested with 120 out-of-scope queries and 30 emergency escalation scenarios. Additionally, groundedness, fluency, and readability were assessed using automated LLM metrics. Results: The system achieved 98.4% classification accuracy (precision 1.0, recall 0.98, F1-score 0.9899). Physician reviews showed high completeness (4.83/5), consistency (4.49/5), and relevance (SSI Index 2.68/3). Safety guardrails successfully identified 100% of out-of-scope and escalation scenarios. Groundedness evaluations demonstrated strong context precision (0.951), recall (0.910), and faithfulness (0.956), with 95.6% verification agreement. While fluency and semantic alignment were high (BERTScore F1 0.9013, ROUGE-1 0.8377), readability was at an 11th-grade level (Flesch–Kincaid 46.34). Conclusion: Testing demonstrated strong technical accuracy, safety, and clinical relevance in simulated postoperative care. The architecture effectively balances flexibility and safety, addressing key limitations of standalone NLP and LLMs. While readability remains a challenge, these findings establish a solid foundation, demonstrating readiness for clinical trials and real-world testing within surgical care pathways. Full article
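A self-contained illustration of the retrieval-augmented generation pattern this platform is built on; the toy bag-of-words retriever and sample knowledge base are stand-ins for the paper's Vertex AI / Gemini 2.0 Flash stack.

```python
# Minimal RAG loop: embed, retrieve, then ground the prompt in retrieved
# text. A toy bag-of-words retriever replaces the paper's cloud stack so
# the sketch runs without any external services.
import numpy as np

knowledge_base = [
    "Keep the incision dry for 48 hours after surgery.",
    "Call the clinic for fever above 101F or spreading redness.",
    "Light walking is encouraged from the first postoperative day.",
]

def embed(text: str, vocab: list[str]) -> np.ndarray:
    words = text.lower().split()
    return np.array([words.count(w) for w in vocab], dtype=float)

vocab = sorted({w for doc in knowledge_base for w in doc.lower().split()})
doc_vecs = np.array([embed(d, vocab) for d in knowledge_base])

query = "When can I start walking after surgery?"
q = embed(query, vocab)
sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * (np.linalg.norm(q) + 1e-9))
best = knowledge_base[int(np.argmax(sims))]

# The retrieved passage grounds the LLM prompt, limiting hallucination.
prompt = f"Answer ONLY from this verified instruction:\n{best}\n\nQ: {query}"
print(prompt)
```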
19 pages, 374 KB  
Article
Large Language Models to Support Socially Responsible Solar Energy Siting in Utah
by Uliana Moshina, Izabelle P. Chick, Juliet E. Carlisle and Daniel P. Ames
Solar 2025, 5(4), 52; https://doi.org/10.3390/solar5040052 - 6 Nov 2025
Viewed by 779
Abstract
This study investigates the efficacy of large language models (LLMs) in supporting responsible and optimized geographic site selection for large-scale solar energy farms. Using Microsoft Bing (predecessor to Copilot), Google Bard (predecessor to Gemini), and ChatGPT, we evaluated their capability to address complex technical and social considerations fundamental to solar farm development. Employing a series of guided queries, we explored the LLMs’ “understanding” of social impact, geographic suitability, and other critical factors. We tested varied prompts, incorporating context from existing research, to assess the models’ ability to use external knowledge sources. Our findings demonstrate that LLMs, when meticulously guided through increasingly detailed and contextualized inquiries, can yield valuable insights. We discovered that (1) structured questioning is key; (2) characterization outperforms suggestion; and (3) harnessing expert knowledge requires specific effort. However, limitations remain. We encountered dead ends due to prompt restrictions and limited access to research for some models. Additionally, none could independently suggest the “best” site. Overall, this study reveals the potential of LLMs for geographic solar farm site selection, and our results can inform future adaptation of geospatial AI queries for similarly complex geographic problems. Full article
8 pages, 504 KB  
Article
Evaluating the Readability and Quality of Bladder Cancer Information from AI Chatbots: A Comparative Study Between ChatGPT, Google Gemini, Grok, Claude and DeepSeek
by Kunjan Patel and Robert Radcliffe
J. Clin. Med. 2025, 14(21), 7804; https://doi.org/10.3390/jcm14217804 - 3 Nov 2025
Viewed by 1499
Abstract
Background/Objectives: Artificial Intelligence (AI)-based chatbots such as ChatGPT are easily available and are quickly becoming a source of information for patients as opposed to traditional Google searches. We assessed the quality of information on bladder cancer provided by various AI chatbots: ChatGPT 4o, Google Gemini 2.0 Flash, Grok 3, Claude Sonnet 3.7, and DeepSeek R1. Their responses were analysed in terms of readability indices, and two consultant urologists rated the quality of information provided using the validated DISCERN tool. Methods: The top 10 most frequently asked questions about bladder cancer were identified using Google Trends. These questions were then provided to five different AI chatbots, and their responses were collected. No prompts were used, reflecting the natural language queries that patients would use. The responses were analysed in terms of their readability using five validated indices: Flesch Reading Ease (FRE), the Flesch–Kincaid Reading Grade Level (FKRGL), the Gunning Fog Index, the Coleman–Liau Index and the SMOG index. Two consultant urologists then independently assessed the responses of the AI chatbots using the DISCERN tool, which rates the quality of health information on a five-point Likert scale. Inter-rater agreement was calculated using Cohen’s Kappa and the intraclass correlation coefficient (ICC). Results: ChatGPT 4o led the readability scores, with the highest Flesch Reading Ease score (59.4) and the lowest average reading grade level (7.0) required to understand the material. Grok 3 was a close second (FRE 58.3, grade level 8.7). Claude Sonnet 3.7 used the most complex language in its answers and therefore scored the lowest FRE score of 44.9, with the highest grade level (9.5) and also the highest complexity on other indices. In the DISCERN analysis, Grok 3 received the highest average score (52.0), followed closely by ChatGPT 4o (50.5). The inter-rater agreement was highest for ChatGPT 4o (ICC: 0.791; Kappa: 0.437), while it was lowest for Grok 3 (ICC: 0.339, Kappa 0.0, Weighted Kappa 0.335). Conclusions: All AI chatbots can provide generally good-quality answers to questions about bladder cancer with zero hallucinations. ChatGPT 4o was the overall winner, with the best readability metrics, strong DISCERN ratings and the highest inter-rater agreement. Full article
(This article belongs to the Special Issue Advances in Diagnosis and Treatment of Urological Cancers)
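The five readability indices used in this study are all available in the `textstat` package; the sample text below is invented, not a chatbot response from the study.

```python
# The five readability indices named in the methods, computed with the
# `textstat` package (pip install textstat); the sample text is invented.
import textstat

answer = ("Bladder cancer starts in the lining of the bladder. "
          "Blood in the urine is the most common early sign.")

print("Flesch Reading Ease: ", textstat.flesch_reading_ease(answer))
print("Flesch-Kincaid Grade:", textstat.flesch_kincaid_grade(answer))
print("Gunning Fog:         ", textstat.gunning_fog(answer))
print("Coleman-Liau:        ", textstat.coleman_liau_index(answer))
print("SMOG:                ", textstat.smog_index(answer))
```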
7 pages, 218 KB  
Brief Report
Can AI Models like ChatGPT and Gemini Dispel Myths About Children’s and Adolescents’ Mental Health? A Comparative Brief Report
by Filipe Prazeres
Psychiatry Int. 2025, 6(4), 135; https://doi.org/10.3390/psychiatryint6040135 - 3 Nov 2025
Viewed by 1043
Abstract
Background: Dispelling myths is crucial for policy and health communication because misinformation can directly influence public behavior, undermine trust in institutions, and lead to harmful outcomes. This study aims to assess the effectiveness and differences between OpenAI’s ChatGPT and Google Gemini in dispelling myths about children’s and adolescents’ mental health. Methods: Using seven myths about mental health from the UNICEF & WHO Teacher’s Guide, ChatGPT-4o and Gemini were asked to “classify each sentence as a myth or a fact”. Each LLM’s responses were analyzed for word count, understandability, readability, and accuracy. Results: Both ChatGPT and Gemini correctly identified all 7 statements as myths. The average word count of ChatGPT’s responses was 60 ± 11 words, while Gemini’s responses averaged 60 ± 29 words, a statistically non-significant difference between the LLMs. The Flesch–Kincaid Grade Level averaged 11.7 ± 2.2 for ChatGPT and 10.2 ± 1.3 for Gemini, also a statistically non-significant difference. In terms of readability, both ChatGPT’s and Gemini’s answers were considered difficult to read, with all grades exceeding the 7th-grade level. The findings should nonetheless be interpreted with caution due to the limited dataset. Conclusions: The study adds valuable insights into the strengths of ChatGPT and Gemini as helpful resources for people seeking medical information about children’s and adolescents’ mental health, although the content may not be easily accessible to those below a college reading level. Full article
17 pages, 2127 KB  
Article
Leveraging Large Language Models for Real-Time UAV Control
by Kheireddine Choutri, Samiha Fadloun, Ayoub Khettabi, Mohand Lagha, Souham Meshoul and Raouf Fareh
Electronics 2025, 14(21), 4312; https://doi.org/10.3390/electronics14214312 - 2 Nov 2025
Viewed by 2221
Abstract
As drones become increasingly integrated into civilian and industrial domains, the demand for natural and accessible control interfaces continues to grow. Conventional manual controllers require technical expertise and impose cognitive overhead, limiting their usability in dynamic and time-critical scenarios. To address these limitations, this paper presents a multilingual voice-driven control framework for quadrotor drones, enabling real-time operation in both English and Arabic. The proposed architecture combines offline Speech-to-Text (STT) processing with large language models (LLMs) to interpret spoken commands and translate them into executable control code. Specifically, Vosk is employed for bilingual STT, while Google Gemini provides semantic disambiguation, contextual inference, and code generation. The system is designed for continuous, low-latency operation within an edge–cloud hybrid configuration, offering an intuitive and robust human–drone interface. While speech recognition and safety validation are processed entirely offline, high-level reasoning and code generation currently rely on cloud-based LLM inference. Experimental evaluation demonstrates an average speech recognition accuracy of 95% and end-to-end command execution latency between 300 and 500 ms, validating the feasibility of reliable, multilingual, voice-based UAV control. This research advances multimodal human–robot interaction by showcasing the integration of offline speech recognition and LLMs for adaptive, safe, and scalable aerial autonomy. Full article
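Offline transcription with Vosk, the STT engine the paper names, can be sketched as follows; the model path and audio file are placeholders, and the recognized text would then go to the LLM for command interpretation.

```python
# Offline speech-to-text with Vosk (pip install vosk), the STT engine the
# paper uses. The model folder and WAV file are placeholders; the output
# text would be forwarded to the LLM stage for code generation.
import json
import wave
from vosk import Model, KaldiRecognizer

wav = wave.open("command.wav", "rb")  # 16 kHz mono PCM expected
model = Model("model-small-en-us")    # path to a downloaded Vosk model
recognizer = KaldiRecognizer(model, wav.getframerate())

while True:
    data = wav.readframes(4000)
    if not data:
        break
    recognizer.AcceptWaveform(data)

text = json.loads(recognizer.FinalResult())["text"]
print(text)  # e.g. "take off and hover at two meters"
```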
14 pages, 826 KB  
Article
Balancing Accuracy and Readability: Comparative Evaluation of AI Chatbots for Patient Education on Rotator Cuff Tears
by Ali Can Koluman, Mehmet Utku Çiftçi, Ebru Aloğlu Çiftçi, Başar Burak Çakmur and Nezih Ziroğlu
Healthcare 2025, 13(21), 2670; https://doi.org/10.3390/healthcare13212670 - 23 Oct 2025
Cited by 2 | Viewed by 837
Abstract
Background/Objectives: Rotator cuff (RC) tears are a leading cause of shoulder pain and disability. Artificial intelligence (AI)-based chatbots are increasingly applied in healthcare for diagnostic support and patient education, but the reliability, quality, and readability of their outputs remain uncertain. International guidelines (AMA, NIH, European health communication frameworks) recommend that patient materials be written at a 6th–8th grade reading level, yet most online and AI-generated content exceeds this threshold. Methods: We compared responses from three AI chatbots—ChatGPT-4o (OpenAI), Gemini 1.5 Flash (Google), and DeepSeek-V3 (Deepseek AI)—to 20 frequently asked patient questions about RC tears. Four orthopedic surgeons independently rated reliability and usefulness (7-point Likert) and overall quality (5-point Global Quality Scale). Readability was assessed using six validated indices. Statistical analysis included Kruskal–Wallis and ANOVA with Bonferroni correction; inter-rater agreement was measured using intraclass correlation coefficients (ICCs). Results: Inter-rater reliability was good to excellent (ICC 0.726–0.900). Gemini 1.5 Flash achieved the highest reliability and quality, ChatGPT-4o performed comparably but slightly lower in diagnostic content, and DeepSeek-V3 consistently scored lowest in reliability and quality but produced the most readable text (FKGL ≈ 6.5, within the 6th–8th grade target). None of the models reached a Flesch Reading Ease (FRE) score above 60, indicating that even the most readable outputs remained more complex than plain-language standards. Conclusions: Gemini 1.5 Flash and ChatGPT-4o generated more accurate and higher-quality responses, whereas DeepSeek-V3 provided more accessible content. No single model fully balanced accuracy and readability. Clinical Implications: Hybrid use of AI platforms—leveraging high-accuracy models alongside more readable outputs, with clinician oversight—may optimize patient education by ensuring both accuracy and accessibility. Future work should assess real-world comprehension and address the legal, ethical, and generalizability challenges of AI-driven patient education. Full article
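The intraclass correlation coefficients reported here can be computed with the `pingouin` package; the ratings below are made up to show the long data format, not the surgeons' scores.

```python
# Intraclass correlation for inter-rater agreement, as in the results;
# a sketch with `pingouin` (pip install pingouin) on invented ratings
# from four raters across five questions.
import pandas as pd
import pingouin as pg

ratings = pd.DataFrame({
    "question": list(range(1, 6)) * 4,
    "rater":    ["A"] * 5 + ["B"] * 5 + ["C"] * 5 + ["D"] * 5,
    "score":    [6, 5, 7, 4, 6,  6, 5, 6, 4, 7,
                 5, 5, 7, 4, 6,  6, 4, 7, 5, 6],
})

icc = pg.intraclass_corr(data=ratings, targets="question",
                         raters="rater", ratings="score")
print(icc[["Type", "ICC", "CI95%"]])
```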