Search Results (46)

Search Parameters:
Keywords = Chain of Thought (CoT)

29 pages, 5294 KB  
Article
Building a Regional Platform for Monitoring Air Quality
by Stanimir Nedyalkov Stoyanov, Boyan Lyubomirov Belichev, Veneta Veselinova Tabakova-Komsalova, Yordan Georgiev Todorov, Angel Atanasov Golev, Georgi Kostadinov Maglizhanov, Ivan Stanimirov Stoyanov and Asya Georgieva Stoyanova-Doycheva
Future Internet 2026, 18(2), 78; https://doi.org/10.3390/fi18020078 (registering DOI) - 2 Feb 2026
Abstract
This paper presents PLAM (Plovdiv Air Monitoring)—a regional multi-agent platform for air quality monitoring, semantic reasoning, and forecasting. The platform uses a hybrid architecture that combines two types of intelligent agents: classic BDI (Belief-Desire-Intention) agents for complex, goal-oriented behavior and planning, and ReAct agents based on large language models (LLM) for quick response, analysis, and interaction with users. The system integrates data from heterogeneous sources, including local IoT sensor networks and public external services, enriching it with a specialized OWL ontology of environmental norms. Based on this data, the platform performs comparative analysis, detection of anomalies and inconsistencies between measurements, as well as predictions using machine learning models. The results are visualized and presented to users via a web interface and mobile application, including personalized alerts and recommendations. The architecture demonstrates essential properties of an intelligent agent such as autonomy, proactivity, reactivity, and social capabilities. The implementation and testing in the city of Plovdiv demonstrate the system’s ability to provide a more objective and comprehensive assessment of air quality, revealing significant differences between measurements from different institutions. The platform offers a modular and adaptive design, making it applicable to other regions, and outlines future development directions, such as creating a specialized small language model and expanding sensor capabilities. Full article
(This article belongs to the Special Issue Intelligent Agents and Their Application)
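For illustration, a minimal Python sketch of the observe-reason-act loop behind a ReAct-style agent like those described above; the sensor reading, the LLM call, and the PM10 threshold are hypothetical stubs, not part of the PLAM platform.

```python
# Minimal ReAct-style agent step for an air-quality query. The sensor reading
# and LLM call are stubs; this is only a sketch of the hybrid architecture.

def get_sensor_reading(station: str) -> dict:
    # Stub standing in for the IoT sensor network / external services.
    return {"station": station, "pm10": 58.0, "pm2_5": 31.0}

def llm_complete(prompt: str) -> str:
    # Stub standing in for the LLM behind the ReAct agent.
    return "Thought: PM10 is above the 50 ug/m3 daily norm.\nAction: alert_user"

EU_DAILY_PM10_LIMIT = 50.0  # ug/m3, example threshold from an ontology of norms

def react_step(question: str, station: str) -> str:
    observation = get_sensor_reading(station)
    prompt = (
        f"Question: {question}\n"
        f"Observation: {observation}\n"
        "Think step by step, then choose an action (alert_user / no_action)."
    )
    reasoning = llm_complete(prompt)
    if "alert_user" in reasoning or observation["pm10"] > EU_DAILY_PM10_LIMIT:
        return f"ALERT for {station}: {reasoning.splitlines()[0]}"
    return f"{station}: air quality within norms"

print(react_step("Is the air safe to run outdoors?", "Plovdiv-Center"))
```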

19 pages, 554 KB  
Article
Multimodal Sample Correction Method Based on Large-Model Instruction Enhancement and Knowledge Guidance
by Zhenyu Chen, Huaguang Yan, Jianguang Du, Meng Xue and Shuai Zhao
Electronics 2026, 15(3), 631; https://doi.org/10.3390/electronics15030631 - 2 Feb 2026
Abstract
With the continuous improvement of power system intelligence, multimodal data generated during distribution network maintenance have grown exponentially. However, existing power multimodal datasets commonly suffer from issues such as low sample quality, frequent factual errors, and inconsistent instruction expressions caused by regional differences. Traditional sample correction methods mainly rely on manual screening or single-feature matching, which suffer from low efficiency and limited adaptability. This paper proposes a multimodal sample correction framework based on large-model instruction enhancement and knowledge guidance, focusing on two critical modalities: temporal data and text documentation. Multimodal sample correction refers to the task of identifying and rectifying errors, inconsistencies, or quality issues in datasets containing multiple data types (temporal sequences and text), with the objective of producing corrected samples that maintain factual accuracy, temporal consistency, and domain-specific compliance. Our proposed framework employs a three-stage processing approach: first, temporal Bidirectional Encoder Representations from Transformers (BERT) models and text BERT models are used to extract and fuse device temporal features and text features, respectively; second, a knowledge-injected assessment mechanism integrated with power knowledge graphs and DeepSeek’s long-chain-of-thought (CoT) capabilities is designed to achieve precise assessment of sample credibility; third, beam search algorithms are employed to generate high-quality corrected text, significantly improving the quality and reliability of multimodal samples in power professional scenarios. Experimental results demonstrate that our method significantly outperforms baseline models across all evaluation metrics (BLEU: 0.361, ROUGE: 0.521, METEOR: 0.443, F1-Score: 0.796), achieving improvements ranging from 21.1% to 73.0% over state-of-the-art methods: specifically, a 21.1% improvement over GECToR in BLEU, 26.5% over GECToR in ROUGE, 30.3% over Deep Edit in METEOR, and 11.8% over Deep Edit in F1-Score, with a reduction of approximately 35% in hallucination rates compared to existing approaches. These improvements provide important technical support for intelligent operation and maintenance of power systems, with implications for improving data quality management, enhancing model reliability in safety-critical applications, and enabling scalable knowledge-guided correction frameworks transferable to other industrial domains requiring high data integrity. Full article
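To make the third stage concrete, a minimal beam-search sketch over candidate correction tokens; the scoring function is a hypothetical stand-in for the knowledge-injected credibility assessment and is not the paper's implementation.

```python
# Beam search over candidate corrections: keep the top-k partial sequences at
# each step according to a scoring function.
from heapq import nlargest

def score(sequence: list[str]) -> float:
    # Stub: a real system would combine LM likelihood with knowledge-graph checks.
    return sum(len(t) for t in sequence) * 0.001 - len(set(sequence)) * 0.01

def beam_search(vocab: list[str], steps: int, beam_width: int = 3) -> list[str]:
    beams = [([], 0.0)]
    for _ in range(steps):
        candidates = []
        for seq, _ in beams:
            for token in vocab:
                new_seq = seq + [token]
                candidates.append((new_seq, score(new_seq)))
        beams = nlargest(beam_width, candidates, key=lambda c: c[1])
    return beams[0][0]

print(beam_search(["transformer", "breaker", "overload", "normal"], steps=4))
```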

23 pages, 12806 KB  
Article
Modality-Bridging for Automated Chain-of-Thought Construction in Meteorological Reasoning: A Study on WeatherQA
by Hang Cui, Jiqing Gu, Jing Peng, Tiejun Wang and Xi Wu
Information 2026, 17(2), 116; https://doi.org/10.3390/info17020116 - 26 Jan 2026
Viewed by 83
Abstract
This study applies a modality-bridging framework to automatically construct Chain-of-Thought (CoT) reasoning from meteorological images, reducing the need for expert annotation. The proposed pipeline integrates semantic extraction, Pseudo-CoT generation, and logical fusion to produce structured reasoning chains. Using the WeatherQA benchmark, we build datasets under single-image, 3-image, and 20-image settings—with automated and Expert-Guided variants—and evaluate performance on Areas Affected and Conditional Concern tasks. The results show near-expert spatial reasoning and more compact, well-aligned CoTs with reduced-image inputs. Multi-image settings reveal challenges in integrating dense visual cues, while semantic classification remains difficult due to label ambiguity. Overall, modality-bridging offers a scalable, interpretable, and low-cost approach for multimodal meteorological reasoning. Full article
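A skeleton of the three-stage pipeline (semantic extraction, Pseudo-CoT generation, logical fusion), with all three stages as hypothetical stubs rather than the authors' code:

```python
# Skeleton of a modality-bridging pipeline: image -> semantic descriptors ->
# candidate reasoning steps -> fused chain of thought.

def extract_semantics(image_path: str) -> dict:
    # Stub: a vision model would return structured descriptors of the image.
    return {"region": "Midwest", "feature": "strong low-level moisture convergence"}

def generate_pseudo_cot(semantics: dict) -> list[str]:
    # Stub: an LLM would expand the descriptors into candidate reasoning steps.
    return [
        f"The image shows {semantics['feature']} over the {semantics['region']}.",
        "Moisture convergence favors convective initiation.",
    ]

def fuse_logic(steps: list[str], semantics: dict) -> str:
    # Stub: logical fusion keeps steps consistent with each other and the query.
    return " ".join(steps) + f" Therefore, the area affected is the {semantics['region']}."

semantics = extract_semantics("mesoscale_discussion_0421.png")
print(fuse_logic(generate_pseudo_cot(semantics), semantics))
```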

34 pages, 6023 KB  
Article
Multi-Dimensional Evaluation of Auto-Generated Chain-of-Thought Traces in Reasoning Models
by Luis F. Becerra-Monsalve, German Sanchez-Torres and John W. Branch-Bedoya
AI 2026, 7(1), 35; https://doi.org/10.3390/ai7010035 - 21 Jan 2026
Viewed by 243
Abstract
Automatically generated chains-of-thought (gCoTs) have become common as large language models adopt deliberative behaviors. Prior work emphasizes fidelity to internal processes, leaving explanatory properties underexplored. Our central hypothesis is that these traces, produced by highly capable reasoning models, are not arbitrary by-products of decoding but exhibit stable and practically valuable textual properties beyond answer fidelity. We apply a multidimensional text-evaluation framework that quantifies four axes—structural coherence, logical–factual consistency, linguistic clarity, and coverage/informativeness—that are standard dimensions for assessing textual quality, and use it to evaluate five reasoning models on the GSM8K arithmetic word-problem benchmark (~1.3 k–1.4 k items) with reproducible, normalized metrics. Logical verification shows near-ceiling self-consistency, measured by the Aggregate Consistency Score (ACS ≈ 0.95–1.00), and high final-answer entailment, measured by Final Answer Soundness (FAS0 ≈ 0.85–1.00); when sound, justifications are compact, with Justification Set Size (JSS ≈ 0.51–0.57) and moderate redundancy, measured by the Redundant Constraint Ratio (RCR ≈ 0.62–0.70). Results also show consistent coherence and clarity; from gCoT to answer implication is stricter than from question to gCoT support, indicating chains anchored to the prompt. We find no systematic trade-off between clarity and informativeness (within-model slopes ≈ 0). In addition to these automatic and logic-based metrics, we include an exploratory expert rating of a subset (four raters; 50 items × five models) to contextualize model differences; these human judgments are not intended to support dataset-wide generalization. Overall, gCoTs display explanatory value beyond fidelity, primarily supported by the automated and logic-based analyses, motivating hybrid evaluation (automatic + exploratory human) to map convergence/divergence zones for user-facing applications. Full article
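As a rough illustration of multi-axis trace scoring, a sketch that normalizes per-axis values and averages them; the axis scorers here are placeholders and do not reproduce the paper's ACS, FAS, JSS, or RCR definitions.

```python
# Aggregate per-axis scores for a reasoning trace into a normalized profile.
import statistics

def normalize(value: float, lo: float, hi: float) -> float:
    return max(0.0, min(1.0, (value - lo) / (hi - lo)))

def evaluate_trace(trace: list[str], answer: str) -> dict:
    scores = {
        "structural_coherence": normalize(len(trace), 1, 10),        # placeholder
        "logical_consistency": 1.0 if answer in trace[-1] else 0.0,  # placeholder
        "linguistic_clarity": normalize(statistics.mean(len(s.split()) for s in trace), 5, 25),
        "coverage": normalize(len(set(" ".join(trace).split())), 10, 120),
    }
    scores["aggregate"] = statistics.mean(scores.values())
    return scores

trace = ["48 / 2 = 24 apples per crate.", "24 * 3 = 72 apples in total, so the answer is 72."]
print(evaluate_trace(trace, "72"))
```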

34 pages, 4044 KB  
Article
Modular Chain-of-Thought (CoT) for LLM-Based Conceptual Construction Cost Estimation
by Prashnna Ghimire, Kyungki Kim, Terry Stentz and Tirthankar Roy
Buildings 2026, 16(2), 396; https://doi.org/10.3390/buildings16020396 - 18 Jan 2026
Viewed by 333
Abstract
The traditional cost estimation process in construction involves extracting information from diverse data sources and relying on human intuition and judgment, making it time-intensive and error-prone. While recent advancements in large language models offer opportunities to automate these processes, their effectiveness in cost estimation tasks remains underexplored. Prior studies have investigated LLM applications in construction, but few have systematically evaluated their performance in cost estimation or proposed a framework for such evaluation and for enhancing their accuracy and reliability through prompt engineering. This study evaluates the performance of pre-trained LLMs (GPT-4o, LLaMA 3.2, Gemini 2.0, and Claude 3.5 Sonnet) for conceptual cost estimation, comparing zero-shot prompting with a modular chain-of-thought framework. The results indicate that zero-shot prompting produced incomplete responses with an average confidence score of 1.91 (64%), whereas the CoT framework improved accuracy to 2.52 (84%) and achieved significant gains across BLEU, ROUGE-L, METEOR, content overlap, and semantic similarity metrics. The proposed modular CoT framework enhances structured reasoning, contextual alignment, and reliability in estimation workflows. This study contributes by developing a conceptual cost estimation framework for LLMs, benchmarking baseline model performance, and demonstrating how structured prompting improves estimation accuracy. This offers a scalable foundation for integrating AI into construction cost estimation workflows. Full article
(This article belongs to the Special Issue Knowledge Management in the Building and Construction Industry)
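A sketch of how a modular CoT prompt might be assembled from reusable step blocks; the module texts and project fields are invented for illustration and are not the paper's prompts.

```python
# Assemble a modular chain-of-thought prompt from named reasoning blocks.

MODULES = {
    "scope": "Step 1: Summarize the project scope, structural system, and gross floor area.",
    "quantities": "Step 2: Derive approximate quantities for major cost drivers.",
    "unit_costs": "Step 3: Apply regional unit costs and note the cost-data year.",
    "adjustments": "Step 4: Adjust for location, escalation, and contingency.",
    "estimate": "Step 5: Report a conceptual cost range with stated assumptions.",
}

def build_cot_prompt(project: dict, module_order: list[str]) -> str:
    header = (
        "You are a conceptual cost estimator. Reason through the steps below "
        f"for this project: {project}."
    )
    return "\n".join([header] + [MODULES[m] for m in module_order])

project = {"type": "mid-rise office", "floors": 6, "gross_area_m2": 12000, "location": "Omaha, NE"}
print(build_cot_prompt(project, ["scope", "quantities", "unit_costs", "adjustments", "estimate"]))
```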

33 pages, 2758 KB  
Article
LLM-Driven Predictive–Adaptive Guidance for Autonomous Surface Vessels Under Environmental Disturbances
by Seunghun Lee, Yoonmo Jeon and Woongsup Kim
J. Mar. Sci. Eng. 2026, 14(2), 147; https://doi.org/10.3390/jmse14020147 - 9 Jan 2026
Viewed by 317
Abstract
Advances in AI are accelerating intelligent ship autonomy, yet robust trajectory tracking remains challenging under nonlinear dynamics and persistent environmental disturbances. Traditional model-based guidance becomes tuning-sensitive and loses robustness under strong disturbances, while data-driven approaches like reinforcement learning often suffer from poor generalization to unseen dynamics and brittleness in out-of-distribution conditions. To address these limitations, we propose a guidance architecture embedding a Large Language Model (LLM) directly within the closed-loop control system. Using in-context prompting with a structured Chain-of-Thought (CoT) template, the LLM generates adaptive k-step heading reference sequences conditioned on recent navigation history, without model parameter updates. A latency-aware temporal inference mechanism synchronizes the asynchronous LLM predictions with a downstream Model Predictive Control (MPC) module, ensuring dynamic feasibility and strict actuation constraints. In MMG-based simulations of the KVLCC2, our framework consistently outperforms conventional model-based baselines. Specifically, it demonstrates superior path-keeping accuracy, higher corridor compliance, and faster disturbance recovery, achieving these performance gains while maintaining comparable or reduced rudder usage. These results validate the feasibility of integrating LLMs as predictive components within physical control loops, establishing a foundation for knowledge-driven, context-aware maritime autonomy. Full article
(This article belongs to the Section Ocean Engineering)
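A sketch of turning an LLM's k-step heading proposal into a rate-limited reference sequence for a downstream controller; the response format, yaw-rate limit, and update interval are assumed, and the MPC coupling is not shown.

```python
# Parse a heading sequence from an LLM reply and clip it to a feasible yaw rate.

MAX_HEADING_RATE_DEG_S = 1.5  # assumed yaw-rate limit for feasibility clipping
STEP_S = 2.0                  # assumed reference update interval

def parse_llm_headings(llm_text: str) -> list[float]:
    # Expects a line like "HEADINGS: 10.0, 12.5, 14.0" in the LLM response.
    for line in llm_text.splitlines():
        if line.startswith("HEADINGS:"):
            return [float(x) for x in line.split(":", 1)[1].split(",")]
    raise ValueError("no heading sequence found")

def clip_to_feasible(current_heading: float, proposal: list[float]) -> list[float]:
    feasible, prev = [], current_heading
    max_step = MAX_HEADING_RATE_DEG_S * STEP_S
    for target in proposal:
        step = max(-max_step, min(max_step, target - prev))
        prev = prev + step
        feasible.append(prev)
    return feasible

llm_reply = "Thought: wind from port, pre-compensate.\nHEADINGS: 10.0, 14.0, 20.0, 22.0"
print(clip_to_feasible(8.0, parse_llm_headings(llm_reply)))
```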

31 pages, 7858 KB  
Article
Domain-Adapted MLLMs for Interpretable Road Traffic Accident Analysis Using Remote Sensing Imagery
by Bing He, Wei He, Qing Chang, Wen Luo and Lingli Xiao
ISPRS Int. J. Geo-Inf. 2026, 15(1), 8; https://doi.org/10.3390/ijgi15010008 - 21 Dec 2025
Cited by 1 | Viewed by 412
Abstract
Traditional road traffic accident analysis has long relied on structured data, making it difficult to integrate high-dimensional heterogeneous information such as remote sensing imagery and leading to an incomplete understanding of accident scene environments. This study proposes a road traffic accident analysis framework based on Multimodal Large Language Models. The approach integrates high-resolution remote sensing imagery with structured accident data through a three-stage progressive training pipeline. Specifically, we fine-tune three open-source vision–language models using Low-Rank Adaptation (LoRA) to sequentially optimize the model’s capabilities in visual environmental description, multi-task accident classification, and Chain-of-Thought (CoT) driven causal reasoning. A multimodal dataset was constructed containing remote sensing image descriptions, accident classification labels, and interpretable reasoning chains. Experimental results show that the fine-tuned model achieved a maximum improvement in the CIDEr score for image description tasks. In the joint classification task of accident severity and duration, the model achieved an accuracy of 71.61% and an F1-score of 0.8473. In the CoT reasoning task, both METEOR and CIDEr scores improved significantly. These results validate the effectiveness of structured reasoning mechanisms in multimodal fusion for transportation applications, providing a feasible path toward interpretable and intelligent analysis for real-world traffic management. Full article
(This article belongs to the Topic Artificial Intelligence Models, Tools and Applications)
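For readers unfamiliar with LoRA, a minimal numpy illustration of the low-rank update it applies to a frozen weight; shapes and values are arbitrary and this is not the paper's training code.

```python
# LoRA in one picture: the frozen weight W is augmented by a low-rank product
# B @ A scaled by alpha / r; only A and B would receive gradients.
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 64, 8, 16

W = rng.normal(size=(d_out, d_in))          # frozen pretrained weight
A = rng.normal(scale=0.01, size=(r, d_in))  # trainable low-rank factor
B = np.zeros((d_out, r))                    # zero-initialized so W is unchanged at start

def lora_forward(x: np.ndarray) -> np.ndarray:
    # Effective weight is W + (alpha / r) * B @ A.
    return x @ (W + (alpha / r) * (B @ A)).T

x = rng.normal(size=(2, d_in))
print(lora_forward(x).shape)  # (2, 64); identical to x @ W.T until B is trained
```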

21 pages, 1991 KB  
Article
Zero-Shot Resume–Job Matching with LLMs via Structured Prompting and Semantic Embeddings
by Panagiotis Skondras, Panagiotis Zervas and Giannis Tzimas
Electronics 2025, 14(24), 4960; https://doi.org/10.3390/electronics14244960 - 17 Dec 2025
Viewed by 935
Abstract
In this article, we present a tool for matching resumes to job posts and vice versa (job post to resumes). With minor modifications, it may also be adapted to other domains where text matching is necessary. This tool may help organizations save time during the hiring process, as well as assist applicants by allowing them to match their resumes to job posts they have selected. To achieve text matching without any model training (zero-shot matching), we constructed dynamic structured prompts that consisted of unstructured and semi-structured job posts and resumes based on specific criteria, and we utilized the Chain of Thought (CoT) technique on the Mistral model (open-mistral-7b). In response, the model generated structured (segmented) job posts and resumes. Then, the job posts and resumes were cleaned and preprocessed. We utilized state-of-the-art sentence similarity models hosted on Hugging Face (nomic-embed-text-v1-5 and google-embedding-gemma-300m) through inference endpoints to create sentence embeddings for each resume and job post segment. We used the cosine similarity metric to determine the optimal matching, and the matching operation was applied to eleven different occupations. The results we achieved reached up to 87% accuracy for some of the occupations and underscore the potential of zero-shot techniques in text matching utilizing LLMs. The dataset we used was from indeed.com, and the Spring AI framework was used for the implementation of the tool. Full article
(This article belongs to the Special Issue Advances in Text Mining and Analytics)
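A sketch of the segment-level matching step, ranking resume and job-post segments by cosine similarity of their embeddings; the embedding function is a deterministic stub standing in for the hosted sentence-embedding models.

```python
# Segment-level matching by cosine similarity of sentence embeddings.
import numpy as np

def embed(text: str) -> np.ndarray:
    # Stub: deterministic pseudo-embedding so the example runs without a model.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=128)
    return v / np.linalg.norm(v)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

job_segments = {"skills": "Java, Spring, REST APIs", "experience": "5+ years backend development"}
resume_segments = {"skills": "Java, Spring Boot, microservices", "experience": "6 years as backend engineer"}

scores = {k: cosine(embed(job_segments[k]), embed(resume_segments[k])) for k in job_segments}
overall = sum(scores.values()) / len(scores)
print(scores, round(overall, 3))
```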

28 pages, 3811 KB  
Article
Diagnosing and Mitigating LLM Failures in Recognizing Culturally Specific Korean Names: An Error-Driven Prompting Framework
by Xiaonan Wang, Gyuri Choi, Subin An, Joeun Kang, Seoyoon Park, Hyeji Choi, Jongkyu Lee and Hansaem Kim
Appl. Sci. 2025, 15(24), 12977; https://doi.org/10.3390/app152412977 - 9 Dec 2025
Viewed by 777
Abstract
As large language models (LLMs) improve in understanding and reasoning, they are increasingly used in privacy protection tasks such as de-identification, privacy-sensitive text generation, and entity obfuscation. However, these applications depend on an essential requirement: the accurate identification of personally identifiable information (PII). Compared with template-based PII that follows clear structural patterns, name-related PII depends much more on cultural and pragmatic context, which makes it harder for models to detect and raises higher privacy risks. Although recent studies begin to address this issue, existing work remains limited in language coverage, evaluation granularity, and the depth of error analysis. To address these gaps, this study proposes an error-driven framework that integrates diagnosis and intervention. Specifically, the framework introduces a method called Error-Driven Prompt (EDP), which transforms common failure patterns into executable prompting strategies. It further explores the integration of EDP with general advanced prompting techniques such as Chain-of-Thought (CoT), few-shot learning, and role-playing. In addition, the study constructed K-NameDiag, the first fine-grained evaluation benchmark for Korean name-related PII, which includes twelve culturally sensitive subtypes designed to examine model weaknesses in real-world contexts. The experimental results showed that EDP improved F1-scores in the range of 6 to 9 points across three widely used commercial LLMs, namely Claude Sonnet 4.5, GPT-5, and Gemini 2.5 Pro, while the Combined Enhanced Prompt (CEP), which integrates EDP with advanced prompting strategies, resulted in different shifts in precision and recall rather than consistent improvements. Further subtype-level analysis suggests that subtypes reliant on implicit cultural context remain resistant to correction, which shows the limitations of prompt engineering in addressing a model’s lack of internalized cultural knowledge. Full article
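A sketch of how observed failure patterns could be compiled into an error-driven prompt; the error catalogue and wording are invented examples, not the paper's EDP or K-NameDiag benchmark.

```python
# Turn diagnosed failure patterns into explicit instructions appended to the task prompt.

ERROR_PATTERNS = {
    "nickname_miss": "Treat affectionate nicknames derived from given names as name PII.",
    "title_fusion": "Detect names fused with kinship or job titles.",
    "romanization": "Handle romanized Korean names with variant spellings.",
}

def build_edp(base_task: str, observed_errors: list[str]) -> str:
    rules = [f"- {ERROR_PATTERNS[e]}" for e in observed_errors if e in ERROR_PATTERNS]
    return "\n".join([base_task, "Known failure modes to avoid:"] + rules)

prompt = build_edp(
    "Identify all name-related personally identifiable information in the text.",
    ["nickname_miss", "title_fusion"],
)
print(prompt)
```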

25 pages, 1219 KB  
Article
Chain-of-Thought Prompt Optimization via Adversarial Learning
by Guang Yang, Xiantao Cai, Shaohe Wang and Juhua Liu
Information 2025, 16(12), 1092; https://doi.org/10.3390/info16121092 - 9 Dec 2025
Viewed by 1195
Abstract
Chain-of-Thought (CoT) prompting has demonstrated strong effectiveness in improving the reasoning capabilities of Large Language Models (LLMs). However, existing CoT optimization approaches still lack systematic mechanisms for evaluating and refining prompts. To address this gap, we propose Adversarial Chain-of-Thought (adv-CoT), a framework that introduces adversarial learning into prompt optimization. Adv-CoT iteratively refines an initial prompt through generator–discriminator interactions and integrates both feedback and verification mechanisms. This process enables more targeted and interpretable improvements to CoT instructions and demonstrations. We evaluate adv-CoT on twelve datasets across commonsense, factual, symbolic, and arithmetic reasoning. Across 12 reasoning datasets, adv-CoT yields an average improvement of 4.44% on GPT-3.5-turbo and 1.08% on GPT-4o-mini, with both gains being statistically significant (paired t-test, p < 0.05). The experimental results show that the framework yields consistent but task-dependent gains, particularly on numerical and factual reasoning tasks, and maintains competitive performance on symbolic and commonsense benchmarks. Paired significance tests further indicate that improvements are statistically reliable on high-capacity proprietary models, while results on smaller open-source models exhibit greater variance. Although these findings demonstrate the promise of adversarial refinement for CoT prompting, the conclusions remain preliminary. The effectiveness of adv-CoT depends on the base model’s reasoning capability, and the current evaluation is limited to four major categories of reasoning tasks. We will release the full implementation and prompts to support further investigation into broader applications and more generalizable prompt optimization strategies. Full article
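A sketch of the generator-discriminator loop such adversarial refinement implies, with both components as stubs; this is not the adv-CoT implementation.

```python
# Iteratively refine a CoT prompt: the generator proposes a revision from the
# discriminator's critique, and a candidate is kept only if its score improves.
import random

random.seed(0)

def generator(prompt: str, feedback: str) -> str:
    # Stub: an LLM would rewrite the prompt according to the critique.
    return prompt + f" [revised: {feedback}]"

def discriminator(prompt: str, val_set: list[tuple[str, str]]) -> tuple[float, str]:
    # Stub: an LLM judge would answer val_set items with the prompt and return
    # accuracy plus a critique; here the score is simulated.
    return random.random(), "number the intermediate steps explicitly"

def optimize(prompt: str, val_set: list[tuple[str, str]], rounds: int = 5) -> str:
    best_prompt = prompt
    best_score, feedback = discriminator(best_prompt, val_set)
    for _ in range(rounds):
        candidate = generator(best_prompt, feedback)
        score, feedback = discriminator(candidate, val_set)
        if score > best_score:
            best_prompt, best_score = candidate, score
    return best_prompt

print(optimize("Let's think step by step.", [("If 3x = 12, what is x?", "4")]))
```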

19 pages, 2747 KB  
Article
Evaluating a Multi-Modal Large Language Model for Ophthalmology Triage
by Caius Goh, Jabez Ng, Wei Yung Au, Clarence See, Alva Lim, Jun Wen Zheng, Xiuyi Fan and Kelvin Li
J. Clin. Transl. Ophthalmol. 2025, 3(4), 25; https://doi.org/10.3390/jcto3040025 - 30 Nov 2025
Viewed by 863
Abstract
Background/Purpose: Ophthalmic triage is challenging for non-specialists due to limited training and rising global eye disease burden. This study evaluates a multimodal framework integrating clinical text and ophthalmic imaging with large language models (LLMs). Textual consistency filtering and chain-of-thought (CoT) reasoning were incorporated to improve diagnostic accuracy. Methods: A dataset of 56 ophthalmology cases from a Singapore restructured hospital was pre-processed with acronym expansion, sentence reconstruction, and textual consistency filtering. To address dataset size limitations, 100 synthetic cases were generated via one-shot GPT-4 prompting, validated by semantic checks and ophthalmologist review. Three diagnostic approaches were tested: Text-Only, Image-Assisted, and Image with CoT. Diagnostic performance was quantified using a novel SNOMED-CT-based dissimilarity score, defined as the shortest path distance between predicted and reference diagnoses in the ontology, which was used to quantify semantic alignment. Results: The synthetic dataset included anterior segment (n = 40), posterior segment (n = 35), and extraocular (n = 25) cases. The text-only approach yielded a mean dissimilarity of 6.353 (95% CI: 4.668, 8.038). Incorporation of image assistance reduced this to 5.234 (95% CI: 3.930, 6.540), while CoT prompting provided further gains when imaging cues were ambiguous. Conclusions: The multimodal pipeline showed potential in improving diagnostic alignment in ophthalmology triage. Image inputs enhanced accuracy, and CoT reasoning reduced errors from ambiguous features, supporting its feasibility as a pilot framework for ophthalmology triage. Full article
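A sketch of the shortest-path dissimilarity idea on a toy concept graph standing in for SNOMED-CT; the graph and labels are invented.

```python
# Dissimilarity between predicted and reference diagnoses as the shortest-path
# distance (breadth-first search) between their nodes in a concept graph.
from collections import deque

GRAPH = {  # undirected edges between concept labels (illustrative only)
    "disorder of eye": ["anterior segment disorder", "posterior segment disorder"],
    "anterior segment disorder": ["disorder of eye", "keratitis", "acute angle-closure glaucoma"],
    "posterior segment disorder": ["disorder of eye", "retinal detachment"],
    "keratitis": ["anterior segment disorder"],
    "acute angle-closure glaucoma": ["anterior segment disorder"],
    "retinal detachment": ["posterior segment disorder"],
}

def dissimilarity(predicted: str, reference: str) -> int:
    seen, queue = {predicted}, deque([(predicted, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == reference:
            return dist
        for neighbor in GRAPH.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return -1  # not connected

print(dissimilarity("keratitis", "retinal detachment"))  # 4 hops in this toy graph
```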

24 pages, 4967 KB  
Article
Phish-Master: Leveraging Large Language Models for Advanced Phishing Email Generation and Detection
by Weihong Han, Junyi Zhu, Chenhui Zhang, Zhiqiang Zhang, Yangyang Mei and Le Wang
Appl. Sci. 2025, 15(22), 12203; https://doi.org/10.3390/app152212203 - 17 Nov 2025
Viewed by 1449
Abstract
Phishing emails present a significant and persistent cybersecurity threat to individuals and organizations globally due to the difficulty in detecting these malicious messages. Large Language Models (LLMs) have inadvertently intensified this challenge by facilitating the automated creation of high-quality, covert phishing emails that can evade traditional rule-based detection systems. In this study, we examine the offensive capabilities of LLMs in generating phishing emails and introduce Phish-Master, a novel algorithm that integrates Chain-of-Thought (COT) reasoning, MetaPrompt techniques, and domain-specific insights to produce phishing emails designed to bypass enterprise-level filters. Our experiment, involving 100 malicious emails, validates Phish-Master’s real-world effectiveness, achieving a 99% evasion rate within authentic campus networks, successfully bypassing filters and targeting recipients, a testament to its capability in navigating complex network environments. To counteract the threat posed by Phish-Master and similar LLM-generated phishing emails, we have developed a multi-machine learning model integration framework trained on Kaggle’s phishing email dataset. This framework achieved an impressive detection rate of 99.87% on a rigorous test set of LLM-generated phishing emails, highlighting the critical role of our specialized dataset in enabling the detection tool to effectively recognize sophisticated patterns in LLM-crafted phishing emails. This study highlights the evolving threat of LLM-generated phishing emails and introduces an effective detection algorithm to mitigate this risk, emphasizing the importance of continued research in this domain. Full article
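On the defensive side, a sketch of a small ensemble detector over TF-IDF features, assuming scikit-learn is available; the toy training data and model choices are illustrative, not the paper's framework or dataset.

```python
# Combine two classical classifiers over TF-IDF features to flag phishing emails.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import VotingClassifier
from sklearn.pipeline import make_pipeline

emails = [
    "Your account is locked, verify your password at this link immediately",
    "Urgent: confirm your payroll details to avoid suspension",
    "Meeting moved to 3 pm, agenda attached",
    "Here are the lecture notes from today's class",
]
labels = [1, 1, 0, 0]  # 1 = phishing, 0 = legitimate (toy data)

ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression()), ("nb", MultinomialNB())],
    voting="soft",
)
model = make_pipeline(TfidfVectorizer(), ensemble)
model.fit(emails, labels)

print(model.predict(["Please verify your password now or lose access"]))
```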

19 pages, 1039 KB  
Article
Adaptive Chain-of-Thought Distillation Based on LLM Performance on Original Problems
by Jianan Shen, Xiaolong Cui, Zhiqiang Gao and Xuanzhu Sheng
Mathematics 2025, 13(22), 3646; https://doi.org/10.3390/math13223646 - 14 Nov 2025
Viewed by 2727
Abstract
The chain-of-thought (CoT) approach in large language models (LLMs) has markedly enhanced their performance on complex tasks; however, effectively distilling this capability into LLMs with smaller parameter scales remains a challenge. Studies have found that small LLMs do not always benefit from CoT distillation. Inspired by the concept of teaching students in accordance with their aptitude, we propose an adaptive chain-of-thought distillation (ACoTD) framework. The core idea is to dynamically and adaptively customize distillation data and supervision signals for student models based on their performance on the original problems. Specifically, ACoTD initially evaluates and categorizes the original problems according to the capabilities of the student model. Subsequently, for Easy- and Medium-level problems, a short CoT distillation is employed as a brief lecture to reinforce knowledge and enhance training efficiency; for high-difficulty problems where the student model underperforms, a detailed long CoT distillation is utilized for in-depth explanation to infuse richer reasoning logic. This differentiated distillation strategy ensures that student models achieve a firmer grasp of the material. We conducted experiments on multiple benchmark datasets. The results indicate that, compared to the baseline, our method can significantly improve the inference performance of small LLMs. Our method provides a new student-centered paradigm for knowledge distillation, demonstrating that adaptive adjustment of teaching strategies based on student feedback is an effective way to enhance small LLMs’ reasoning ability. Full article
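A sketch of the difficulty-aware routing idea: probe the student on each original problem, then pair it with a short or long teacher CoT accordingly; both the probing and the teacher calls are stubs, not the ACoTD implementation.

```python
# Route each problem to short or long CoT supervision based on student performance.

def student_solves(problem: str) -> bool:
    # Stub: run the small student model and check its answer against the reference.
    return len(problem) < 40  # placeholder proxy for "easy for the student"

def teacher_cot(problem: str, detailed: bool) -> str:
    # Stub: query the teacher model for a short or long chain of thought.
    return f"{'Detailed' if detailed else 'Brief'} reasoning for: {problem}"

def build_distillation_set(problems: list[str]) -> list[dict]:
    dataset = []
    for p in problems:
        easy = student_solves(p)
        dataset.append({"problem": p, "cot": teacher_cot(p, detailed=not easy)})
    return dataset

problems = [
    "What is 12 * 7?",
    "A train leaves at 9:15 and travels 240 km at 80 km/h; when does it arrive?",
]
for row in build_distillation_set(problems):
    print(row)
```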

19 pages, 4107 KB  
Article
Structured Prompting and Collaborative Multi-Agent Knowledge Distillation for Traffic Video Interpretation and Risk Inference
by Yunxiang Yang, Ningning Xu and Jidong J. Yang
Computers 2025, 14(11), 490; https://doi.org/10.3390/computers14110490 - 9 Nov 2025
Cited by 1 | Viewed by 1290
Abstract
Comprehensive highway scene understanding and robust traffic risk inference are vital for advancing Intelligent Transportation Systems (ITS) and autonomous driving. Traditional approaches often struggle with scalability and generalization, particularly under the complex and dynamic conditions of real-world environments. To address these challenges, we introduce a novel structured prompting and multi-agent collaborative knowledge distillation framework that enables automatic generation of high-quality traffic scene annotations and contextual risk assessments. Our framework orchestrates two large vision–language models (VLMs): GPT-4o and o3-mini, using a structured Chain-of-Thought (CoT) strategy to produce rich, multiperspective outputs. These outputs serve as knowledge-enriched pseudo-annotations for supervised fine-tuning of a much smaller student VLM. The resulting compact 3B-scale model, named VISTA (Vision for Intelligent Scene and Traffic Analysis), is capable of understanding low-resolution traffic videos and generating semantically faithful, risk-aware captions. Despite its significantly reduced parameter count, VISTA achieves strong performance across established captioning metrics (BLEU-4, METEOR, ROUGE-L, and CIDEr) when benchmarked against its teacher models. This demonstrates that effective knowledge distillation and structured role-aware supervision can empower lightweight VLMs to capture complex reasoning capabilities. The compact architecture of VISTA facilitates efficient deployment on edge devices, enabling real-time risk monitoring without requiring extensive infrastructure upgrades. Full article
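A sketch of the pseudo-annotation step such teacher-student distillation implies: teacher outputs are merged into supervision records for the student model; both teacher calls are hypothetical stubs, not the VISTA pipeline.

```python
# Build pseudo-annotated training records from two complementary teacher outputs.

def teacher_scene_description(clip_id: str) -> str:
    # Stub for the vision-language teacher producing a scene caption.
    return "Dense traffic on a wet three-lane highway, a truck braking in lane 2."

def teacher_risk_assessment(description: str) -> str:
    # Stub for the reasoning teacher producing a structured risk judgement.
    return "Risk: elevated rear-end collision risk behind the braking truck."

def build_record(clip_id: str) -> dict:
    description = teacher_scene_description(clip_id)
    risk = teacher_risk_assessment(description)
    return {
        "clip": clip_id,
        "instruction": "Describe the scene and assess traffic risk.",
        "target": f"{description} {risk}",
    }

training_set = [build_record(f"clip_{i:04d}") for i in range(3)]
print(training_set[0])
```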

15 pages, 1245 KB  
Article
Multimodal Behavioral Sensors for Lie Detection: Integrating Visual, Auditory, and Generative Reasoning Cues
by Daniel Grabowski, Kamila Łuczaj and Khalid Saeed
Sensors 2025, 25(19), 6086; https://doi.org/10.3390/s25196086 - 2 Oct 2025
Viewed by 1356
Abstract
Advances in multimodal artificial intelligence enable new sensor-inspired approaches to lie detection by combining behavioral perception with generative reasoning. This study presents a deception detection framework that integrates deep video and audio processing with large language models guided by chain-of-thought (CoT) prompting. We interpret neural architectures such as ViViT (for video) and HuBERT (for speech) as digital behavioral sensors that extract implicit emotional and cognitive cues, including micro-expressions, vocal stress, and timing irregularities. We further incorporate a GPT-5-based prompt-level fusion approach for video–language–emotion alignment and zero-shot inference. This method jointly processes visual frames, textual transcripts, and emotion recognition outputs, enabling the system to generate interpretable deception hypotheses without any task-specific fine-tuning. Facial expressions are treated as high-resolution affective signals captured via visual sensors, while audio encodes prosodic markers of stress. Our experimental setup is based on the DOLOS dataset, which provides high-quality multimodal recordings of deceptive and truthful behavior. We also evaluate a continual learning setup that transfers emotional understanding to deception classification. Results indicate that multimodal fusion and CoT-based reasoning increase classification accuracy and interpretability. The proposed system bridges the gap between raw behavioral data and semantic inference, laying a foundation for AI-driven lie detection with interpretable sensor analogues. Full article
(This article belongs to the Special Issue Sensor-Based Behavioral Biometrics)
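A sketch of prompt-level fusion, serializing the transcript, visual cues, and vocal cues into one reasoning prompt; the cue extractors and the cues themselves are invented stand-ins for the ViViT/HuBERT components, not the paper's pipeline.

```python
# Serialize multimodal behavioral cues into a single zero-shot reasoning prompt.

def visual_cues(video_path: str) -> list[str]:
    # Stub standing in for video-based analysis of facial behavior.
    return ["brief gaze aversion at 00:12", "lip compression while answering"]

def audio_cues(audio_path: str) -> list[str]:
    # Stub standing in for speech-based prosody analysis.
    return ["pitch rise on the key question", "0.8 s response latency"]

def build_fusion_prompt(transcript: str, video_path: str, audio_path: str) -> str:
    return "\n".join([
        "Assess whether the statement is more consistent with truth or deception.",
        f"Transcript: {transcript}",
        "Visual cues: " + "; ".join(visual_cues(video_path)),
        "Vocal cues: " + "; ".join(audio_cues(audio_path)),
        "Reason step by step before giving a tentative label with confidence.",
    ])

print(build_fusion_prompt("I was at home all evening.", "clip.mp4", "clip.wav"))
```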
