Search Results (6)

Search Parameters:
Keywords = Anthropic’s Claude

41 pages, 1212 KiB  
Article
Detection of Malicious Office Open Documents (OOXML) Using Large Language Models: A Static Analysis Approach
by Jonas Heß  and Kalman Graffi
J. Cybersecur. Priv. 2025, 5(2), 32; https://doi.org/10.3390/jcp5020032 - 11 Jun 2025
Viewed by 892
Abstract
The increasing prevalence of malicious Microsoft Office documents poses a significant threat to cybersecurity. Conventional methods of detecting these malicious documents often rely on prior knowledge of the document or the exploitation method employed, thus enabling the use of signature-based or rule-based approaches. Given the accelerated pace of change in the threat landscape, these methods are unable to adapt effectively to the evolving environment. Existing machine learning approaches are capable of identifying sophisticated features that enable the prediction of a file’s nature, achieving sufficient results on existing samples. However, they are seldom adequately prepared for the detection of new, advanced malware techniques. This paper proposes a novel approach to detecting malicious Microsoft Office documents by leveraging the power of large language models (LLMs). The method involves extracting textual content from Office documents and utilising advanced natural language processing techniques provided by LLMs to analyse the documents for potentially malicious indicators. As a supplementary tool to contemporary antivirus software, it is currently able to assist in the analysis of malicious Microsoft Office documents by identifying and summarising potentially malicious indicators with a foundation in evidence, which may prove to be more effective with advancing technology and soon to surpass tailored machine learning algorithms, even without the utilisation of signatures and detection rules. As such, it is not limited to Office Open XML documents, but can be applied to any maliciously exploitable file format. The extensive knowledge base and rapid analytical abilities of a large language model enable not only the assessment of extracted evidence but also the contextualisation and referencing of information to support the final decision. 
We demonstrate that Claude 3.5 Sonnet by Anthropic, provided with a substantial quantity of raw data, equivalent to several hundred pages, can identify individual malicious indicators within an average of five to nine seconds and generate a comprehensive static analysis report, with an average cost of USD 0.19 per request and an F1-score of 0.929.
(This article belongs to the Section Security Engineering & Applications)

15 pages, 2483 KiB  
Article
Thyro-GenAI: A Chatbot Using Retrieval-Augmented Generative Models for Personalized Thyroid Disease Management
by Minjeong Shin, Junho Song, Myung-Gwan Kim, Hyeong Won Yu, Eun Kyung Choe and Young Jun Chai
J. Clin. Med. 2025, 14(7), 2450; https://doi.org/10.3390/jcm14072450 - 3 Apr 2025
Cited by 1 | Viewed by 855
Abstract
Background: Large language models (LLMs) have the potential to enhance information processing and clinical reasoning in the healthcare industry but are hindered by inaccuracies and hallucinations. The retrieval-augmented generation (RAG) technique may address these problems by integrating external knowledge sources. Methods: We developed a RAG-based chatbot called Thyro-GenAI by integrating a database of textbooks and guidelines with LLM. Thyro-GenAI and three service LLMs: OpenAI’s ChatGPT-4o, Perplexity AI’s ChatGPT-4o, and Anthropic’s Claude 3.5 Sonnet, were asked personalized clinical questions about thyroid disease. Three thyroid specialists assessed the quality of the generated responses and references without being blinded, which allowed them to interact with different chatbot interfaces. Results: Thyro-GenAI achieved the highest inverse-weighted mean rank for overall response quality. The overall inverse-weighted mean rankings for Thyro-GenAI, ChatGPT, Perplexity, and Claude were 3.0, 2.3, 2.8, and 1.9, respectively. Thyro-GenAI also achieved the second-highest inverse-weighted mean rank for overall reference quality. The overall inverse-weighted mean rankings for Thyro-GenAI, ChatGPT, Perplexity, and Claude were 3.1, 2.3, 3.2, and 1.8, respectively. Conclusions: Thyro-GenAI produced patient-specific clinical reasoning output based on a vector database, with fewer hallucinations and more reliability, compared to service LLMs. This emphasis on evidence-based responses ensures its safety and validity, addressing a critical limitation of existing LLMs. By integrating RAG with LLMs, it has the potential to support frontline clinical decision-making, especially helping first-line physicians by offering reliable decision support while managing thyroid disease patients.
(This article belongs to the Section General Surgery)

18 pages, 2055 KiB  
Article
Think Before You Classify: The Rise of Reasoning Large Language Models for Consumer Complaint Detection and Classification
by Konstantinos I. Roumeliotis, Nikolaos D. Tselikas and Dimitrios K. Nasiopoulos
Electronics 2025, 14(6), 1070; https://doi.org/10.3390/electronics14061070 - 7 Mar 2025
Cited by 1 | Viewed by 2012
Abstract
Large language models (LLMs) have demonstrated remarkable capabilities in various natural language processing (NLP) tasks, but their effectiveness in real-world consumer complaint classification without fine-tuning remains uncertain. Zero-shot classification offers a promising solution by enabling models to categorize consumer complaints without prior exposure to labeled training data, making it valuable for handling emerging issues and dynamic complaint categories in finance. However, this task is particularly challenging, as financial complaint categories often overlap, requiring a deep understanding of nuanced language. In this study, we evaluate the zero-shot classification performance of leading LLMs and reasoning models, totaling 14 models. Specifically, we assess DeepSeek-V3, Gemini-2.0-Flash, Gemini-1.5-Pro, Anthropic’s Claude 3.5 and 3.7 Sonnet, Claude 3.5 Haiku, and OpenAI’s GPT-4o, GPT-4.5, and GPT-4o Mini, alongside reasoning models such as DeepSeek-R1, o1, and o3. Unlike traditional LLMs, reasoning models are specifically trained with reinforcement learning to exhibit advanced inferential capabilities, structured decision-making, and complex reasoning, making their application to text classification a groundbreaking advancement. The models were tasked with classifying consumer complaints submitted to the Consumer Financial Protection Bureau (CFPB) into five predefined financial classes based solely on complaint text. Performance was measured using accuracy, precision, recall, F1-score, and heatmaps to identify classification patterns. The findings highlight the strengths and limitations of both standard LLMs and reasoning models in financial text processing, providing valuable insights into their practical applications. By integrating reasoning models into classification workflows, organizations may enhance complaint resolution automation and improve customer service efficiency, marking a significant step forward in AI-driven financial text analysis. 

21 pages, 4836 KiB  
Article
Chef Dalle: Transforming Cooking with Multi-Model Multimodal AI
by Brendan Hannon, Yulia Kumar, J. Jenny Li and Patricia Morreale
Computers 2024, 13(7), 156; https://doi.org/10.3390/computers13070156 - 21 Jun 2024
Cited by 6 | Viewed by 6388
Abstract
In an era where dietary habits significantly impact health, technological interventions can offer personalized and accessible food choices. This paper introduces Chef Dalle, a recipe recommendation system that leverages multi-model and multimodal human-computer interaction (HCI) techniques to provide personalized cooking guidance. The application integrates voice-to-text conversion via Whisper and ingredient image recognition through GPT-Vision. It employs an advanced recipe filtering system that utilizes user-provided ingredients to fetch recipes, which are then evaluated through multi-model AI through integrations of OpenAI, Google Gemini, Claude, and/or Anthropic APIs to deliver highly personalized recommendations. These methods enable users to interact with the system using voice, text, or images, accommodating various dietary restrictions and preferences. Furthermore, the utilization of DALL-E 3 for generating recipe images enhances user engagement. User feedback mechanisms allow for the refinement of future recommendations, demonstrating the system’s adaptability. Chef Dalle showcases potential applications ranging from home kitchens to grocery stores and restaurant menu customization, addressing accessibility and promoting healthier eating habits. This paper underscores the significance of multimodal HCI in enhancing culinary experiences, setting a precedent for future developments in the field.

22 pages, 3038 KiB  
Article
Benchmarking Large Language Model (LLM) Performance for Game Playing via Tic-Tac-Toe
by Oguzhan Topsakal and Jackson B. Harper
Electronics 2024, 13(8), 1532; https://doi.org/10.3390/electronics13081532 - 17 Apr 2024
Cited by 2 | Viewed by 5639
Abstract
This study investigates the strategic decision-making abilities of large language models (LLMs) via the game of Tic-Tac-Toe, renowned for its straightforward rules and definitive outcomes. We developed a mobile application coupled with web services, facilitating gameplay among leading LLMs, including Jurassic-2 Ultra by AI21, Claude 2.1 by Anthropic, Gemini-Pro by Google, GPT-3.5-Turbo and GPT-4 by OpenAI, Llama2-70B by Meta, and Mistral Large by Mistral, to assess their rule comprehension and strategic thinking. Using a consistent prompt structure in 10 sessions for each LLM pair, we systematically collected data on wins, draws, and invalid moves across 980 games, employing two distinct prompt types to vary the presentation of the game’s status. Our findings reveal significant performance variations among the LLMs. Notably, GPT-4, GPT-3.5-Turbo, and Llama2 secured the most wins with the list prompt, while GPT-4, Gemini-Pro, and Mistral Large excelled using the illustration prompt. GPT-4 emerged as the top performer, achieving victory with the minimum number of moves and the fewest errors for both prompt types. This research introduces a novel methodology for assessing LLM capabilities using a game that can illuminate their strategic thinking abilities. Beyond enhancing our comprehension of LLM performance, this study lays the groundwork for future exploration into their utility in complex decision-making scenarios, offering directions for further inquiry and the exploration of LLM limits within game-based frameworks.

12 pages, 1603 KiB  
Article
Effectiveness of ChatGPT in Coding: A Comparative Analysis of Popular Large Language Models
by Carlos Eduardo Andino Coello, Mohammed Nazeh Alimam and Rand Kouatly
Digital 2024, 4(1), 114-125; https://doi.org/10.3390/digital4010005 - 8 Jan 2024
Cited by 41 | Viewed by 18328
Abstract
This study explores the effectiveness and efficiency of the popular OpenAI model ChatGPT, powered by GPT-3.5 and GPT-4, in programming tasks to understand its impact on programming and potentially software development. To measure the performance of these models, a quantitative approach was employed using the Mostly Basic Python Problems (MBPP) dataset. In addition to the direct assessment of GPT-3.5 and GPT-4, a comparative analysis involving other popular large language models in the AI landscape, notably Google’s Bard and Anthropic’s Claude, was conducted to measure and compare their proficiency in the same tasks. The results highlight the strengths of ChatGPT models in programming tasks, offering valuable insights for the AI community, specifically for developers and researchers. As the popularity of artificial intelligence increases, this study serves as an early look into the field of AI-assisted programming.