Search Results (479)

Search Parameters:
Keywords = Gemini

26 pages, 595 KB  
Article
Natural Language Processing as a Scalable Method for Evaluating Educational Text Personalization by LLMs
by Linh Huynh and Danielle S. McNamara
Appl. Sci. 2025, 15(22), 12128; https://doi.org/10.3390/app152212128 (registering DOI) - 15 Nov 2025
Abstract
Four versions of science and history texts, tailored to diverse hypothetical reader profiles (high and low reading skill and domain knowledge), were generated by four Large Language Models (Claude, Llama, ChatGPT, and Gemini). Natural Language Processing (NLP) techniques were applied to examine variations in the text personalization capabilities of these Large Language Models (LLMs). NLP was used to extract and quantify linguistic features of the texts, capturing linguistic variation as a function of LLM, text genre, and reader profile. An approach based on NLP analyses provides an automated and scalable solution for evaluating the alignment between LLM-generated personalized texts and readers’ needs. Findings indicate that NLP offers a valid and generalizable means of tracking linguistic variation in personalized educational texts, supporting its use as an evaluation framework for text personalization.

29 pages, 9353 KB  
Article
AI-Delphi: Emulating Personas Toward Machine–Machine Collaboration
by Lucas Nóbrega, Luiz Felipe Martinez, Luísa Marschhausen, Yuri Lima, Marcos Antonio de Almeida, Alan Lyra, Carlos Eduardo Barbosa and Jano Moreira de Souza
AI 2025, 6(11), 294; https://doi.org/10.3390/ai6110294 - 14 Nov 2025
Abstract
Recent technological advancements have made Large Language Models (LLMs) easily accessible through apps such as ChatGPT, Claude.ai, Google Gemini, and HuggingChat, allowing text generation on diverse topics with a simple prompt. In this context, we propose three machine–machine collaboration models to streamline Delphi studies and shorten their execution time by leveraging the extensive knowledge of LLMs. We then applied one of these models, the Iconic Minds Delphi, to run Delphi questionnaires on the future of work and higher education in Brazil. To do so, we prompted ChatGPT to assume the roles of well-known public figures from various areas of knowledge. To validate the approach, we asked one of the emulated experts to evaluate the responses generated in his name. Although this single validation is not sufficient to generalize the approach’s effectiveness, it revealed an 85% agreement rate, suggesting promising alignment between the emulated persona and the real expert’s opinions. Our work contributes to the use of Artificial Intelligence (AI) in Futures Research, highlighting LLMs’ potential as collaborators in shaping visions of the future while discussing their limitations. In conclusion, our research demonstrates the synergy between Delphi and LLMs, offering a new method for exploring central themes such as the future of work and higher education.
(This article belongs to the Topic Generative AI and Interdisciplinary Applications)
15 pages, 904 KB  
Article
Bridging LLMs, Education, and Sustainability: Guiding Students in Local Community Initiatives
by Nebojša Jurišević, Novak Nikolić, Artur Nemś, Dušan Gordić, Nikola Rakić, Davor Končalović and Dénes Kocsis
Sustainability 2025, 17(22), 10148; https://doi.org/10.3390/su172210148 - 13 Nov 2025
Abstract
The introduction of large language models (LLMs) has significantly influenced learning and learning assessment, dividing the academic community with arguments for and against their use. This study investigates how LLMs can be effectively incorporated into student assignments on sustainable development in local communities. To that end, the study pairs traditional, community-oriented tasks with emerging frameworks for structured LLM use, emphasizing that output quality depends on prompt quality. Several prompting frameworks were outlined, and the suitability of ChatGPT and Gemini for specific assignment tasks was assessed. The approach was evaluated with a survey of two student groups: one using supervised LLM support (23 students) and one using LLMs independently (17 students). Compared with the unsupervised group, the supervised group reported that the frameworks enhanced project preparedness, fostered critical thinking, and reduced reliance on mentors. The supervising mentor noted a slightly lower workload than in earlier projects, while the mentor of the unsupervised group reported greater effort in guiding and refining outcomes. Overall, the findings suggest that guided LLM integration, compared with LLM use without structured guidance, can improve learning, deepen critical engagement, foster independence, and reduce mentor workload.

33 pages, 2190 KB  
Article
Benchmarking ChatGPT and Other Large Language Models for Personalized Stage-Specific Dietary Recommendations in Chronic Kidney Disease
by Makpal Kairat, Gulnoza Adilmetova, Ilvira Ibraimova, Abduzhappar Gaipov, Huseyin Atakan Varol and Mei-Yen Chan
J. Clin. Med. 2025, 14(22), 8033; https://doi.org/10.3390/jcm14228033 - 12 Nov 2025
Viewed by 121
Abstract
Background: Chronic kidney disease (CKD) requires strict dietary management tailored to disease stage and individual needs. Recent advances in artificial intelligence (AI) have introduced chatbot-based tools capable of generating dietary recommendations. However, their accuracy, personalization, and practical applicability in clinical nutrition remain largely unvalidated, particularly in non-Western settings. Methods: Simulated patient profiles representing each CKD stage were developed and used to prompt GPT-4 (OpenAI), Gemini (Google), and Copilot (Microsoft) with the same meal-planning request. Three physicians evaluated the AI-generated diets on a 5-point Likert scale across three criteria: personalization, consistency with guidelines, and practicality and availability. Descriptive statistics, Kruskal–Wallis tests, and Dunn’s post hoc tests were used to compare model performance. Nutritional analysis of four meal plans (Initial, GPT-4, Gemini, and Copilot) was conducted using both GPT-4 estimates and manual calculations validated against clinical dietary sources. Results: Scores for personalization and consistency were significantly higher for Gemini and GPT-4 than for Copilot (p = 0.0001 and p = 0.0002, respectively), with no significant difference between Gemini and GPT-4. Practicality showed marginal significance, with GPT-4 slightly outperforming Gemini (p = 0.0476). Nutritional component analysis revealed discrepancies between GPT-4’s internal estimates and manually calculated values, with occasional deviations from clinical guidelines, most notably for sodium and potassium, and moderate overestimation of phosphorus. Conclusions: While AI chatbots show promise in delivering dietary guidance for CKD patients, with Gemini demonstrating the strongest performance, further development, clinical validation, and testing with real patient data are needed before AI-driven tools can be fully integrated into patient-centered CKD nutritional care.
(This article belongs to the Section Clinical Nutrition & Dietetics)
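The Kruskal–Wallis test mentioned in this abstract compares ratings across three or more groups using ranks rather than raw scores, which suits ordinal Likert data. The following is a minimal pure-Python sketch of the H statistic with the usual correction for tied ranks; it is not the authors' analysis code, the ratings in the usage line are hypothetical, and it stops short of the p-value, which is conventionally obtained by comparing H with a chi-square distribution with k − 1 degrees of freedom (as `scipy.stats.kruskal` does).

```python
from collections import Counter
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_term = 0.0
    start = 0
    for group in groups:
        # Sum the pooled ranks that belong to this group.
        rank_sum = sum(ranks[start:start + len(group)])
        rank_term += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    tie_sum = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - tie_sum / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


# Hypothetical 1-5 Likert ratings for three models (illustrative only).
print(kruskal_wallis_h([5, 4, 5, 4], [4, 4, 5, 3], [2, 3, 2, 3]))
```

Larger H values indicate larger differences in rank distributions between groups; a post hoc procedure such as Dunn's test is then needed to say which pairs differ.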

4 pages, 667 KB  
Interesting Images
Umbilical Hernia Probe-Induced Cocco Sign: Color Doppler During Pressure and Release in Standing Position
by Corrado Tagliati, Marco Fogante, Claudio Ventura, Stefania Lamja, Roberto Esposito, Marco Di Serafino, Antonio Corvino, Giulio Argalia, Ernesto Di Cesare, Andrea Delli Pizzi and Giulio Cocco
Diagnostics 2025, 15(22), 2863; https://doi.org/10.3390/diagnostics15222863 - 12 Nov 2025
Viewed by 74
Abstract
We describe a case of an umbilical hernia detected on ultrasound in a 56-year-old woman who underwent a skin ultrasound examination. With the patient standing, the probe was positioned at the level of the umbilicus. Pressure was applied with the probe and then released, revealing the probe-induced Cocco sign.
(This article belongs to the Section Medical Imaging and Theranostics)

27 pages, 3096 KB  
Article
EnergAI: A Large Language Model-Driven Generative Design Method for Early-Stage Building Energy Optimization
by Jing Zhong, Peilin Li, Ran Luo, Jun Yin, Yizhen Ding, Junjie Bai, Chuxiang Hong, Xiang Deng, Xintong Ma and Shuai Lu
Energies 2025, 18(22), 5921; https://doi.org/10.3390/en18225921 - 10 Nov 2025
Viewed by 414
Abstract
The early stage of architectural design plays a decisive role in determining building energy performance, yet conventional evaluation is typically deferred to later phases, limiting timely, data-informed feedback. This paper proposes EnergAI, a generative design framework that incorporates energy optimization objectives directly into the scheme generation process through large language models (e.g., GPT-4o, DeepSeek-V3.1-Think, Qwen-Max, and Gemini-2.5 Pro). A dedicated dataset, LowEnergy-FormNet, comprising 2160 cases with site parameters, massing descriptors, and simulation outputs, was constructed to model the relationships among site, form, and energy. The framework encodes building massing as a parametric vector representation and employs hierarchical prompt strategies to establish closed-loop compatibility with ClimateStudio. Experimental evaluations show that geometry-oriented and fuzzy-goal prompts achieve average annual reductions of approximately 16–17% in energy use intensity (EUI) and 3–4% in energy cost compared with human designs, while performance-oriented structured prompts deliver the most reliable improvements, eliminating high-energy outliers and yielding an average EUI-saving rate above 50%. In cross-model comparisons under an identical toolchain, GPT-4o delivered the strongest and most stable optimization, achieving 63.3% mean EUI savings, nearly 13% higher than the DeepSeek-V3.1-Think, Qwen-Max, and Gemini-2.5 baselines. These results demonstrate the feasibility, and indicate the potential robustness, of embedding performance constraints at the generation stage, supporting proactive, data-informed early design.
(This article belongs to the Special Issue Challenges and Research Trends of Integrated Zero-Carbon Power Plant)

16 pages, 2243 KB  
Article
Evaluating Large Language Models in Interpreting MRI Reports and Recommending Treatment for Vestibular Schwannoma
by Arthur H. A. Sales, Christine Julia Gizaw, Jürgen Beck and Jürgen Grauvogel
Diagnostics 2025, 15(22), 2841; https://doi.org/10.3390/diagnostics15222841 - 10 Nov 2025
Viewed by 403
Abstract
Background/Objectives: The use of large language models (LLMs) by patients seeking information about their diagnosis and treatment is rapidly increasing. While their application in healthcare is still under scientific investigation, demand for these models is expected to grow significantly in the coming years. This study evaluates and compares the diagnostic accuracy and treatment recommendations of three publicly available AI tools (GPT-4, Gemini, and Bing) in interpreting MRI reports of patients with vestibular schwannoma (VS), addressing the growing use of these tools by patients seeking medical information. Methods: This retrospective study included 35 consecutive patients with VS treated at a university-based neurosurgery department. Anonymized MRI reports in German were translated into English, and the AI tools were given five standardized verbal prompts requesting diagnoses and treatment recommendations. Diagnostic accuracy, differential diagnoses, and treatment recommendations were assessed and compared. Results: Thirty-five patients (mean age, 57 ± 13 years; 18 men) were included. GPT-4 achieved the highest diagnostic accuracy for VS at 97.14% (34/35), followed by Gemini at 88.57% (31/35) and Bing at 85.71% (30/35). GPT-4 provided the most accurate treatment recommendations (57.1%, 20/35), compared with Gemini (45.7%, 16/35) and Bing (31.4%, 11/35). GPT-4 correctly recommended surgery in 60% of cases (21/35), compared with 51.4% for Bing (18/35) and 45.7% for Gemini (16/35). The difference between GPT-4 and Bing was statistically significant (p = 0.02). Conclusions: GPT-4 outperformed Gemini and Bing in interpreting MRI reports and providing treatment recommendations for VS. Although the AI tools demonstrated good diagnostic accuracy, their treatment recommendations were less precise than those made by an interdisciplinary tumor board. This study highlights the growing role of AI tools in patient-driven healthcare inquiries.

35 pages, 61373 KB  
Article
Mapping Manual Laboratory Tasks to Robot Movements in Digital Pathology Workflow
by Marianna Dimitrova Kucarov, Mátyás Takács, Bence Géza Czakó, Béla Molnár and Miklos Kozlovszky
Sensors 2025, 25(22), 6830; https://doi.org/10.3390/s25226830 - 8 Nov 2025
Viewed by 391
Abstract
This study evaluated and integrated automated pathology equipment and a collaborative robot to create a fully autonomous workflow. We selected the Gemini AS Automated Slide Stainer, ClearVue Coverslipper, and Pannoramic 1000 digital slide scanner, operated by a UR5e robotic arm. We determined that, to perform essential clinical laboratory tasks, the robotic arm combined with a custom manipulator requires 9 degrees of freedom: 5 from the robot and 4 from the manufactured manipulator. The patented manipulator is equipped with a camera, LED lighting, and three specialized grippers for object detection and precise handling of equipment doors, magazines, and slides. It is designed to mount onto a standardized robot flange interface (ISO 9409-1-50-4-M6), making it mechanically compatible with various robot arms. A minimum of 24 distinct laboratory tasks were defined for training the robotic arm. This autonomous workflow mitigates labor shortages and accelerates diagnostic processes by offloading repetitive tasks, thereby improving efficiency in pathology laboratories.

21 pages, 2761 KB  
Article
The Development and Evaluation of a Retrieval-Augmented Generation Large Language Model Virtual Assistant for Postoperative Instructions
by Syed Ali Haider, Srinivasagam Prabha, Cesar Abraham Gomez Cabello, Ariana Genovese, Bernardo Collaco, Nadia Wood, James London, Sanjay Bagaria, Cui Tao and Antonio Jorge Forte
Bioengineering 2025, 12(11), 1219; https://doi.org/10.3390/bioengineering12111219 - 7 Nov 2025
Viewed by 458
Abstract
Background: During postoperative recovery, patients and their caregivers often lack crucial information, leading to numerous repetitive inquiries that burden healthcare providers. Traditional discharge materials, including paper handouts and patient portals, are often static, overwhelming, or underutilized, contributing to unnecessary ER visits and overall healthcare overutilization. Conversational chatbots offer a solution, but traditional Natural Language Processing (NLP) systems are often inflexible and limited in understanding, while powerful Large Language Models (LLMs) are prone to generating “hallucinations”. Objective: To combine the deterministic framework of traditional NLP with the probabilistic capabilities of LLMs, we developed the AI Virtual Assistant (AIVA) Platform. This system uses a retrieval-augmented generation (RAG) architecture, integrating Gemini 2.0 Flash with a medically verified knowledge base via Google Vertex AI, to safely deliver dynamic, patient-facing postoperative guidance grounded in validated clinical content. Methods: The AIVA Platform was evaluated through 750 simulated patient interactions derived from 250 unique postoperative queries across 20 high-frequency recovery domains. Three blinded physician reviewers formally assessed system performance, evaluating classification metrics (accuracy, precision, recall, F1-score), relevance (SSI Index), completeness, and consistency (5-point Likert scale). Safety guardrails were tested with 120 out-of-scope queries and 30 emergency escalation scenarios. Additionally, groundedness, fluency, and readability were assessed using automated LLM metrics. Results: The system achieved 98.4% classification accuracy (precision 1.0, recall 0.98, F1-score 0.9899). Physician reviews showed high completeness (4.83/5), consistency (4.49/5), and relevance (SSI Index 2.68/3). Safety guardrails successfully identified 100% of out-of-scope and escalation scenarios. Groundedness evaluations demonstrated strong context precision (0.951), recall (0.910), and faithfulness (0.956), with 95.6% verification agreement. While fluency and semantic alignment were high (BERTScore F1 0.9013, ROUGE-1 0.8377), readability was at an 11th-grade level (Flesch–Kincaid 46.34). Conclusion: Simulated testing demonstrated the platform’s strong technical accuracy, safety, and clinical relevance for postoperative care. Its architecture effectively balances flexibility and safety, addressing key limitations of standalone NLP and LLMs. While readability remains a challenge, these findings establish a solid foundation and indicate readiness for clinical trials and real-world testing within surgical care pathways.
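The classification figures in this abstract are linked: F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R), so precision 1.0 and recall 0.98 give about 0.9899, consistent with the reported F1-score. Below is a minimal sketch, assuming a binary framing and hypothetical confusion-matrix counts chosen only to reproduce those two rates (they are not the study's data, and they do not reproduce the separately reported 98.4% accuracy).

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from binary confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 98 true positives, no false positives, 2 false negatives.
p, r, f1 = precision_recall_f1(tp=98, fp=0, fn=2)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.4f}")
# precision=1.00 recall=0.98 F1=0.9899
```

Because F1 is a harmonic mean, it stays close to the weaker of the two rates, so a perfect precision cannot mask a lower recall.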

12 pages, 1085 KB  
Article
Genetic Insights into Familial Hypospadias Identifying Rare Variants and Their Potential Role in Urethral Development
by Kholoud N. Al-Shafai, Seem Arar, Asma Jamil, Amina Azzah, Maraeh Mancha, Luis R. Saraiva and Tariq Abbas
Genes 2025, 16(11), 1340; https://doi.org/10.3390/genes16111340 - 6 Nov 2025
Viewed by 371
Abstract
Background: Hypospadias is a common congenital condition in male infants, characterised by incomplete development of the underside of the penile shaft. Genetic factors play a major role in its development; studying genetic contributions, especially in familial cases, can therefore enhance our understanding of its causes and guide targeted interventions. Materials and Methods: Through a structured biobank for hypospadias, we collected blood samples from individuals with familial hypospadias and their relatives. Whole-genome sequencing (WGS) was performed on 27 individuals across seven families to identify potential genetic causes. Bioinformatics analysis, including the GEMINI tool, was used to assess the inheritance patterns of single-nucleotide variants (SNVs) within families and to identify potentially causative SNVs. Results: In three index patients, we identified three likely pathogenic variants in EIF2B5, INO80, and ACADVL, genes not previously associated with hypospadias. These variants co-segregated with the condition within the families. Additionally, we detected variants of uncertain significance in hypospadias-related gene families (DNAH12 and LHFP) and in other genes, such as COL6A3, that may cause the phenotype. No potentially causative variants were found in two of the seven families studied, indicating the need for further analysis, including assessment of copy number variants (CNVs). Functional studies will be crucial to establish the role of the identified variants in the development of hypospadias. Conclusions: This study underscores the importance of disease biobanking and genetic analysis in identifying potential underlying causes of congenital conditions such as hypospadias. The identified variants provide new opportunities for functional research and may enhance our understanding of hypospadias pathophysiology. These findings broaden the genetic landscape of hypospadias and lay the groundwork for functional validation, improved risk assessment, and personalised medicine strategies.
(This article belongs to the Section Human Genomics and Genetic Diseases)

19 pages, 374 KB  
Article
Large Language Models to Support Socially Responsible Solar Energy Siting in Utah
by Uliana Moshina, Izabelle P. Chick, Juliet E. Carlisle and Daniel P. Ames
Solar 2025, 5(4), 52; https://doi.org/10.3390/solar5040052 - 6 Nov 2025
Viewed by 193
Abstract
This study investigates the efficacy of large language models (LLMs) in supporting responsible and optimized geographic site selection for large-scale solar energy farms. Using Microsoft Bing (predecessor to Copilot), Google Bard (predecessor to Gemini), and ChatGPT, we evaluated the models’ capability to address complex technical and social considerations fundamental to solar farm development. Employing a series of guided queries, we explored the LLMs’ “understanding” of social impact, geographic suitability, and other critical factors. We tested varied prompts, incorporating context from existing research, to assess the models’ ability to use external knowledge sources. Our findings demonstrate that LLMs, when meticulously guided through increasingly detailed and contextualized inquiries, can yield valuable insights. We discovered that (1) structured questioning is key; (2) characterization outperforms suggestion; and (3) harnessing expert knowledge requires specific effort. However, limitations remain. We encountered dead ends due to prompt restrictions and, for some models, limited access to research. Additionally, none of the models could independently suggest the “best” site. Overall, this study reveals the potential of LLMs for geographic solar farm site selection, and our results can inform future adaptation of geospatial AI queries for similarly complex geographic problems.

8 pages, 504 KB  
Article
Evaluating the Readability and Quality of Bladder Cancer Information from AI Chatbots: A Comparative Study Between ChatGPT, Google Gemini, Grok, Claude and DeepSeek
by Kunjan Patel and Robert Radcliffe
J. Clin. Med. 2025, 14(21), 7804; https://doi.org/10.3390/jcm14217804 - 3 Nov 2025
Viewed by 416
Abstract
Background/Objectives: Artificial Intelligence (AI)-based chatbots such as ChatGPT are readily available and are quickly replacing traditional Google searches as a source of information for patients. We assessed the quality of information on bladder cancer provided by five AI chatbots: ChatGPT 4o, Google Gemini 2.0 Flash, Grok 3, Claude 3.7 Sonnet, and DeepSeek R1. Their responses were analysed using readability indices, and two consultant urologists rated the quality of the information using the validated DISCERN tool. Methods: The 10 most frequently asked questions about bladder cancer were identified using Google Trends. These questions were submitted to the five AI chatbots, and their responses were collected. No additional prompting was used, reflecting the natural-language queries patients would enter. The responses were analysed for readability using five validated indices: Flesch Reading Ease (FRE), the Flesch–Kincaid Reading Grade Level (FKRGL), the Gunning Fog Index, the Coleman–Liau Index, and the SMOG index. Two consultant urologists then independently assessed the chatbots’ responses using the DISCERN tool, which rates the quality of health information on a five-point Likert scale. Inter-rater agreement was calculated using Cohen’s Kappa and the intraclass correlation coefficient (ICC). Results: ChatGPT 4o performed best on readability, with the highest Flesch Reading Ease score (59.4) and the lowest average reading grade level (7.0) required to understand the material. Grok 3 was a close second (FRE 58.3, grade level 8.7). Claude 3.7 Sonnet used the most complex language in its answers and therefore had the lowest FRE score (44.9), the highest grade level (9.5), and the highest complexity on the other indices. In the DISCERN analysis, Grok 3 received the highest average score (52.0), followed closely by ChatGPT 4o (50.5). Inter-rater agreement was highest for ChatGPT 4o (ICC: 0.791; Kappa: 0.437) and lowest for Grok 3 (ICC: 0.339; Kappa: 0.0; Weighted Kappa: 0.335). Conclusions: All the AI chatbots provided generally good-quality answers to questions about bladder cancer, with no hallucinations. ChatGPT 4o was the overall winner, with the best readability metrics, strong DISCERN ratings, and the highest inter-rater agreement.
(This article belongs to the Special Issue Advances in Diagnosis and Treatment of Urological Cancers)
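Cohen’s Kappa, used in this study for inter-rater agreement, compares the observed agreement p_o with the agreement expected by chance from each rater’s label frequencies, p_e, as κ = (p_o − p_e) / (1 − p_e). The sketch below is the unweighted two-rater form only, for illustration; the weighted Kappa and ICC also reported in the abstract are not covered, and the example ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies
    # (Counter returns 0 for labels one rater never used).
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical 1-5 ratings from two reviewers on six answers.
print(cohens_kappa([4, 4, 5, 3, 4, 5], [4, 3, 5, 3, 4, 4]))
```

Because κ subtracts chance agreement, two raters who agree on many items can still obtain a modest κ when both use only a few scale points.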

7 pages, 218 KB  
Brief Report
Can AI Models like ChatGPT and Gemini Dispel Myths About Children’s and Adolescents’ Mental Health? A Comparative Brief Report
by Filipe Prazeres
Psychiatry Int. 2025, 6(4), 135; https://doi.org/10.3390/psychiatryint6040135 - 3 Nov 2025
Viewed by 312
Abstract
Background: Dispelling myths is crucial for policy and health communication because misinformation can directly influence public behavior, undermine trust in institutions, and lead to harmful outcomes. This study aims to assess the effectiveness of OpenAI’s ChatGPT and Google Gemini in dispelling myths about children’s and adolescents’ mental health, and the differences between them. Methods: Using seven myths about mental health from the UNICEF & WHO Teacher’s Guide, ChatGPT-4o and Gemini were asked to “classify each sentence as a myth or a fact”. Each LLM’s responses were analyzed for word count, understandability, readability, and accuracy. Results: Both ChatGPT and Gemini correctly identified all 7 statements as myths. ChatGPT’s responses averaged 60 ± 11 words and Gemini’s 60 ± 29 words, a statistically non-significant difference. The Flesch–Kincaid Grade Level averaged 11.7 ± 2.2 for ChatGPT and 10.2 ± 1.3 for Gemini, also a statistically non-significant difference. In terms of readability, answers from both ChatGPT and Gemini were considered difficult to read, with all grade levels exceeding the 7th grade. The findings should nonetheless be interpreted with caution due to the limited dataset. Conclusions: The study offers valuable insight into the strengths of ChatGPT and Gemini as resources for people seeking medical information about children’s and adolescents’ mental health, although the content may not be easily accessible to readers below a college reading level.
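The Flesch–Kincaid Grade Level cited in this abstract, and the Flesch Reading Ease score used in the bladder cancer study, are fixed formulas over average sentence length and average syllables per word. A minimal sketch, assuming the word, sentence, and syllable counts are already available; syllable counting itself is heuristic and varies between tools, so scores from different calculators may not match exactly. The passage counts in the usage lines are hypothetical.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level: approximate US school grade required."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical passage: 100 words, 5 sentences, 150 syllables.
print(round(flesch_reading_ease(100, 5, 150), 1))   # about 59.6
print(round(flesch_kincaid_grade(100, 5, 150), 1))  # about 9.9
```

Both formulas use the same two ratios with opposite signs, which is why shorter sentences and fewer syllables per word raise Reading Ease and lower the grade level together.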
17 pages, 2127 KB  
Article
Leveraging Large Language Models for Real-Time UAV Control
by Kheireddine Choutri, Samiha Fadloun, Ayoub Khettabi, Mohand Lagha, Souham Meshoul and Raouf Fareh
Electronics 2025, 14(21), 4312; https://doi.org/10.3390/electronics14214312 - 2 Nov 2025
Viewed by 842
Abstract
As drones become increasingly integrated into civilian and industrial domains, the demand for natural and accessible control interfaces continues to grow. Conventional manual controllers require technical expertise and impose cognitive overhead, limiting their usability in dynamic and time-critical scenarios. To address these limitations, this paper presents a multilingual voice-driven control framework for quadrotor drones, enabling real-time operation in both English and Arabic. The proposed architecture combines offline Speech-to-Text (STT) processing with large language models (LLMs) to interpret spoken commands and translate them into executable control code. Specifically, Vosk is employed for bilingual STT, while Google Gemini provides semantic disambiguation, contextual inference, and code generation. The system is designed for continuous, low-latency operation within an edge–cloud hybrid configuration, offering an intuitive and robust human–drone interface. While speech recognition and safety validation are processed entirely offline, high-level reasoning and code generation currently rely on cloud-based LLM inference. Experimental evaluation demonstrates an average speech recognition accuracy of 95% and end-to-end command execution latency between 300 and 500 ms, validating the feasibility of reliable, multilingual, voice-based UAV control. This research advances multimodal human–robot interaction by showcasing the integration of offline speech recognition and LLMs for adaptive, safe, and scalable aerial autonomy.
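The abstract describes an offline safety-validation stage between LLM output and drone execution but does not specify how it works. The sketch below is a hypothetical illustration of one common pattern, not the authors' design: instead of running free-form generated code, it assumes the model is prompted to emit a small JSON command object, which is checked against a whitelist of actions and numeric limits before anything reaches the flight controller. The action names, limits, and the `validate` and `Command` names are all assumptions.

```python
import json
import math
from dataclasses import dataclass

# Hypothetical command vocabulary and limits; the paper's actual
# command set and safety thresholds are not given in the abstract.
ALLOWED_ACTIONS = {"takeoff", "land", "move", "rotate", "hover"}
MAX_ALTITUDE_M = 10.0
MAX_STEP_M = 5.0
MAX_YAW_DEG = 180.0

@dataclass
class Command:
    action: str
    value: float = 0.0

def validate(raw_llm_output: str) -> Command:
    """Parse LLM output as JSON; reject anything outside the whitelist or bounds."""
    try:
        data = json.loads(raw_llm_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    action = data.get("action")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError(f"action not allowed: {action!r}")
    value = data.get("value", 0.0)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError("value must be a number")
    value = float(value)
    if not math.isfinite(value):
        raise ValueError("value must be finite")
    if action == "takeoff" and not 0.0 < value <= MAX_ALTITUDE_M:
        raise ValueError("takeoff altitude out of range")
    if action == "move" and abs(value) > MAX_STEP_M:
        raise ValueError("move distance out of range")
    if action == "rotate" and abs(value) > MAX_YAW_DEG:
        raise ValueError("yaw angle out of range")
    return Command(action, value)
```

Because the check runs locally and fails closed, a malformed or out-of-range cloud response is rejected rather than executed, which matches the abstract's split between cloud reasoning and offline safety validation.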

22 pages, 1320 KB  
Article
Comparative Evaluation of Advanced Chunking for Retrieval-Augmented Generation in Large Language Models for Clinical Decision Support
by Cesar Abraham Gomez-Cabello, Srinivasagam Prabha, Syed Ali Haider, Ariana Genovese, Bernardo G. Collaco, Nadia G. Wood, Sanjay Bagaria and Antonio Jorge Forte
Bioengineering 2025, 12(11), 1194; https://doi.org/10.3390/bioengineering12111194 - 1 Nov 2025
Viewed by 872
Abstract
Retrieval-augmented generation (RAG) quality depends on how source documents are segmented before indexing; fixed-length chunks can split concepts or add noise, reducing precision. We evaluated whether proposition, semantic, and adaptive chunking improve accuracy and relevance for safer clinical decision support. Using a curated domain knowledge base with Gemini 1.0 Pro, we built four otherwise identical RAG pipelines that differed only in the chunking strategy: adaptive length, proposition, semantic, and a fixed token-dependent baseline. Thirty common postoperative rhinoplasty questions were submitted to each pipeline. Outcomes included medical accuracy and clinical relevance (3-point Likert scale) and retrieval precision, recall, and F1; group differences were tested with ANOVA and Tukey post hoc analyses. Adaptive chunking achieved the highest accuracy, 87% (Likert 2.37 ± 0.72) versus 50% for the baseline (1.63 ± 0.72; p = 0.001), and the highest relevance (93%, 2.90 ± 0.40). Retrieval metrics were strongest with adaptive chunking (precision 0.50, recall 0.88, F1 0.64) versus the baseline (0.17, 0.40, 0.24). Proposition and semantic strategies improved all metrics relative to the baseline, though less than adaptive chunking. Aligning chunks to logical topic boundaries yielded more accurate, relevant answers without modifying the language model, offering a model-agnostic, data-source-neutral lever to enhance the safety and utility of LLM-based clinical decision support.
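The contrast between a fixed-length baseline and boundary-aware chunking can be illustrated with a short Python sketch. This is a simplified stand-in, not the authors' adaptive, proposition, or semantic chunkers: whitespace-separated words stand in for model tokens, and the hypothetical `boundary_chunks` simply packs whole sentences up to a token budget. The `f1` helper shows that the reported F1 values are consistent with F1 = 2PR / (P + R) applied to the reported precision and recall (0.50 and 0.88 give 0.64; 0.17 and 0.40 give 0.24).

```python
import re

def fixed_chunks(text: str, size: int) -> list[str]:
    """Baseline: fixed windows of `size` whitespace tokens, ignoring sentence boundaries."""
    tokens = text.split()
    return [" ".join(tokens[i:i + size]) for i in range(0, len(tokens), size)]

def boundary_chunks(text: str, max_tokens: int) -> list[str]:
    """Pack whole sentences into chunks of at most max_tokens tokens.

    A single sentence longer than max_tokens becomes its own chunk.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current, length = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and length + n > max_tokens:
            chunks.append(" ".join(current))
            current, length = [], 0
        current.append(sentence)
        length += n
    if current:
        chunks.append(" ".join(current))
    return chunks

def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With `text = "Alpha beta gamma. Delta epsilon. Zeta eta theta iota."`, `fixed_chunks(text, 4)` cuts the second sentence across two chunks, while `boundary_chunks(text, 5)` keeps every sentence intact, which is the failure mode of fixed-length chunking that the abstract describes.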