MDPI - Publisher of Open Access Journals

18 pages, 5327 KiB

Open Access Article

Few-Shot Supervised Learning for Multivariate Knowledge Extraction from Dietary Reviews: Addressing Low-Resource Challenges with Optimized Datasets and Schema Layers

by Yuanhao Zhang, Wanxia Yang, Beiei Zhou, Xiang Zhao and Xin Li

Electronics 2025, 14(15), 3116; https://doi.org/10.3390/electronics14153116 - 5 Aug 2025

Abstract

Dietary reviews contain rich emotional and objective information; however, existing knowledge extraction methods struggle with low-resource scenarios due to sparse and imbalanced label distributions. To address these challenges, this paper proposes a few-shot supervised learning approach. First, we develop a professional dietary–emotional schema by integrating domain knowledge with real-time data to ensure the coverage of diverse emotional expressions. Next, we introduce a dataset optimization method based on dual constraints (label frequency and quantity) to mitigate label imbalance and improve model performance. Utilizing the optimized dataset and a tailored prompt template, we fine-tune the DRE-UIE model for multivariate knowledge extraction. The experimental results demonstrate that the DRE-UIE model achieves a 20% higher F1 score than BERT-BiLSTM-CRF and outperforms TENER by 1.1%. Notably, on a 20-shot subset, the model scores 0.841 on the Chinese dataset and attains a 15.16% F1 score improvement over unoptimized data, validating the effectiveness of our few-shot learning framework. The approach also performs robustly across Chinese and English corpora, underscoring its generalization capability. This work offers a practical solution for low-resource dietary–emotional knowledge extraction by leveraging schema design, dataset optimization, and model fine-tuning to achieve high accuracy with minimal annotated data.

25 pages, 394 KiB

Open Access Article

SMART DShot: Secure Machine-Learning-Based Adaptive Real-Time Timing Correction

by Hyunmin Kim, Zahid Basha Shaik Kadu and Kyusuk Han

Appl. Sci. 2025, 15(15), 8619; https://doi.org/10.3390/app15158619 (registering DOI) - 4 Aug 2025

Viewed by 27

Abstract

The exponential growth of autonomous systems demands robust security mechanisms that can operate within the extreme constraints of real-time embedded environments. This paper introduces SMART DShot, a groundbreaking machine learning-enhanced framework that transforms the security landscape of unmanned aerial vehicle motor control systems through seamless integration of adaptive timing correction and real-time anomaly detection within Digital Shot (DShot) communication protocols. Our approach addresses critical vulnerabilities in Electronic Speed Controller (ESC) interfaces by deploying four synergistic algorithms—Kalman Filter Timing Correction (KFTC), Recursive Least Squares Timing Correction (RLSTC), Fuzzy Logic Timing Correction (FLTC), and Hybrid Adaptive Timing Correction (HATC)—each optimized for specific error characteristics and attack scenarios. Through comprehensive evaluation encompassing 32,000 Monte Carlo test iterations (500 per scenario × 16 scenarios × 4 algorithms) across 16 distinct operational scenarios and PolarFire SoC Field-Programmable Gate Array (FPGA) implementation, we demonstrate exceptional performance with 88.3% attack detection rate, only 2.3% false positive incidence, and substantial vulnerability mitigation reducing Common Vulnerability Scoring System (CVSS) severity from High (7.3) to Low (3.1). Hardware validation on PolarFire SoC confirms practical viability with minimal resource overhead (2.16% Look-Up Table utilization, 16.57 mW per channel) and deterministic sub-10 microsecond execution latency. The Hybrid Adaptive Timing Correction algorithm achieves 31.01% success rate (95% CI: [30.2%, 31.8%]), representing a 26.5% improvement over baseline approaches through intelligent meta-learning-based algorithm selection. 
Statistical validation using Analysis of Variance confirms significant performance differences (F(3,1996) = 30.30, p < 0.001) with large effect sizes (Cohen’s d up to 4.57), where 64.6% of algorithm comparisons showed large practical significance. SMART DShot establishes a paradigmatic shift from reactive to proactive embedded security, demonstrating that sophisticated artificial intelligence can operate effectively within microsecond-scale real-time constraints while providing comprehensive protection against timing manipulation, de-synchronization, burst interference, replay attacks, coordinated multi-channel attacks, and firmware-level compromises. This work provides essential foundations for trustworthy autonomous systems across critical domains including aerospace, automotive, industrial automation, and cyber–physical infrastructure. These results conclusively demonstrate that ML-enhanced motor control systems can achieve both superior security (88.3% attack detection rate with 2.3% false positives) and operational performance (31.01% timing correction success rate, 26.5% improvement over baseline) simultaneously, establishing SMART DShot as a practical, deployable solution for next-generation autonomous systems.
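
To make the idea of Kalman-filter-based timing correction concrete, the following is a minimal sketch of a generic one-dimensional Kalman filter that tracks a timing offset from noisy measurements. It is not the paper's KFTC algorithm: the class name, the random-walk model, and the noise values are all assumptions chosen for illustration. The helper at the end simply reproduces the iteration count stated in the abstract (500 per scenario × 16 scenarios × 4 algorithms).

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """Generic 1-D Kalman filter for a slowly drifting timing offset (illustrative only)."""
    estimate: float = 0.0      # current offset estimate, e.g. in microseconds
    variance: float = 1.0      # uncertainty of the current estimate
    process_var: float = 1e-4  # assumed drift variance per step (hypothetical value)
    meas_var: float = 0.25     # assumed measurement noise variance (hypothetical value)

    def update(self, measured_offset: float) -> float:
        # Predict: the offset is modelled as a random walk, so only the variance grows.
        self.variance += self.process_var
        # Correct: blend prediction and measurement according to the Kalman gain.
        gain = self.variance / (self.variance + self.meas_var)
        self.estimate += gain * (measured_offset - self.estimate)
        self.variance *= 1.0 - gain
        return self.estimate


def total_iterations(per_scenario: int, scenarios: int, algorithms: int) -> int:
    """Monte Carlo iteration count as the product of the three factors."""
    return per_scenario * scenarios * algorithms
```

Feeding the filter a steady stream of measurements pulls the estimate toward the measured offset while the gain shrinks, which is the behaviour a timing corrector would rely on to smooth jitter.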

(This article belongs to the Special Issue Artificial Intelligence and Cybersecurity: Challenges and Opportunities)

16 pages, 3373 KiB

Open Access Article

Knowledge-Augmented Zero-Shot Method for Power Equipment Defect Grading with Chain-of-Thought LLMs

by Jianguang Du, Bo Li, Zhenyu Chen, Lian Shen, Pufan Liu and Zhongyang Ran

Electronics 2025, 14(15), 3101; https://doi.org/10.3390/electronics14153101 - 4 Aug 2025

Viewed by 33

Abstract

As large language models (LLMs) increasingly enter specialized domains, inference without external resources often leads to knowledge gaps, opaque reasoning, and hallucinations. To address these challenges in power equipment defect grading, we propose a zero-shot question-answering framework that requires no task-specific examples. Our system performs two-stage retrieval: it first uses a Sentence-BERT model fine-tuned on power equipment maintenance texts for coarse filtering, then combines TF-IDF and semantic re-ranking for fine-grained selection of the most relevant knowledge snippets. We embed both the user query and the retrieved evidence into a Chain-of-Thought (CoT) prompt, guiding the pre-trained LLM through multi-step reasoning with self-validation and without any model fine-tuning. Experimental results show that on a held-out test set of 218 inspection records, our method achieves a grading accuracy of 54.2%, which is 6.0 percentage points higher than the fine-tuned BERT baseline at 48.2%; an Explanation Coherence Score (ECS) of 4.2 compared to 3.1 for the baseline; a mean retrieval latency of 28.3 ms; and an average LLM inference time of 5.46 s. Ablation and sensitivity analyses demonstrate that a fine-stage retrieval pool size of k = 30 offers the optimal trade-off between accuracy and latency; human expert evaluation by six senior engineers yields average Usefulness and Trustworthiness scores of 4.1 and 4.3, respectively. Case studies across representative defect scenarios further highlight the system’s robust zero-shot performance.
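
The two-stage retrieval described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' pipeline: a bag-of-words cosine similarity stands in for the fine-tuned Sentence-BERT encoder, and the function names, the mixing weight `alpha`, and the smoothed IDF formula are hypothetical choices.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def coarse_filter(query: str, docs: list[str], k: int) -> list[str]:
    # Stage 1 stand-in: bag-of-words cosine replaces the Sentence-BERT similarity.
    q = Counter(tokenize(query))
    return sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)[:k]


def rerank(query: str, pool: list[str], top_n: int, alpha: float = 0.5) -> list[str]:
    # Stage 2: mix a TF-IDF score with the similarity score (alpha is an assumed weight).
    q_tokens = tokenize(query)
    q_counts = Counter(q_tokens)
    tokenized = [tokenize(d) for d in pool]
    n = len(pool)
    doc_freq = Counter(t for toks in tokenized for t in set(toks))

    def score(toks: list[str]) -> float:
        if not toks:
            return 0.0
        tf = Counter(toks)
        tfidf = sum(
            (tf[t] / len(toks)) * math.log((1 + n) / (1 + doc_freq[t]))
            for t in set(q_tokens) if t in tf
        )
        return alpha * tfidf + (1 - alpha) * cosine(q_counts, tf)

    ranked = sorted(zip(pool, tokenized), key=lambda pair: score(pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_n]]
```

In use, `coarse_filter` would be called with the pool size (the abstract reports k = 30 as the best trade-off for the fine stage), and `rerank` would then pick the handful of snippets placed into the prompt.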

(This article belongs to the Special Issue Recent Progress in Visual AI: Architectures, Learning, and Applications)

17 pages, 462 KiB

Open Access Article

Knowledge-Guided Cyber Threat Intelligence Summarization via Term-Oriented Input Construction

by Junmei Ding and Yueming Lu

Electronics 2025, 14(15), 3096; https://doi.org/10.3390/electronics14153096 - 3 Aug 2025

Viewed by 171

Abstract

Cyber threat intelligence summarization plays a critical role in enhancing threat awareness and operational response in cybersecurity. However, existing summarization models often fail to capture essential threat elements due to the unstructured nature of cyber threat intelligence documents and the lack of domain-specific knowledge. This paper presents a knowledge-guided cyber threat intelligence summarization framework via term-oriented input construction, designed to improve summary fidelity, semantic relevance, and model robustness. The proposed approach consists of two key components: a hybrid term construction pipeline that combines unsupervised keyword extraction and supervised term generation with rule-based refinement, and a knowledge-injected input construction paradigm that explicitly incorporates structured terms into the model input. This strategy enhances the model’s understanding of critical threat semantics without altering its architecture. Extensive experiments conducted on cyber threat intelligence summarization benchmarks under both zero-shot and supervised settings demonstrate that the proposed method consistently improves summarization performance across different models, offering strong generalization and deployment flexibility.

23 pages, 4379 KiB

Open Access Article

Large Vision Language Model: Enhanced-RSCLIP with Exemplar-Image Prompting for Uncommon Object Detection in Satellite Imagery

by Taiwo Efunogbon, Abimbola Efunogbon, Enjie Liu, Dayou Li and Renxi Qiu

Electronics 2025, 14(15), 3071; https://doi.org/10.3390/electronics14153071 - 31 Jul 2025

Viewed by 158

Abstract

Large Vision Language Models (LVLMs) have shown promise in remote sensing applications, yet struggle with “uncommon” objects that lack sufficient public labeled data. This paper presents Enhanced-RSCLIP, a novel dual-prompt architecture that combines text prompting with exemplar-image processing for cattle herd detection in satellite imagery. Its key innovation is an exemplar-image preprocessing module that uses crop-based or attention-based algorithms to extract focused object features, which are fed as a dual stream to a contrastive learning framework that fuses textual descriptions with visual exemplar embeddings. We evaluated our method on a custom dataset of 260 satellite images across UK and Nigerian regions. Enhanced-RSCLIP with crop-based exemplar processing achieved 72% accuracy in cattle detection and 56.2% overall accuracy on cross-domain transfer tasks, significantly outperforming text-only CLIP (31% overall accuracy). The dual-prompt architecture enables effective few-shot learning and cross-regional transfer from data-rich (UK) to data-sparse (Nigeria) environments, demonstrating a 41% improvement over baseline approaches for uncommon object detection in satellite imagery.

(This article belongs to the Topic Next-Generation IoT and Smart Systems for Communication and Sensing)

25 pages, 3625 KiB

Open Access Article

Automated Classification of Public Transport Complaints via Text Mining Using LLMs and Embeddings

by Daniyar Rakhimzhanov, Saule Belginova and Didar Yedilkhan

Information 2025, 16(8), 644; https://doi.org/10.3390/info16080644 - 29 Jul 2025

Viewed by 242

Abstract

The proliferation of digital public service platforms and the expansion of e-government initiatives have significantly increased the volume and diversity of citizen-generated feedback. This trend emphasizes the need for classification systems that are not only tailored to specific administrative domains but also robust to the linguistic, contextual, and structural variability inherent in user-submitted content. This study investigates the comparative effectiveness of large language models (LLMs) alongside instruction-tuned embedding models in the task of categorizing public transportation complaints. LLMs were tested using few-shot inference, where classification is guided by a small set of in-context examples. Embedding models were assessed under three paradigms: label-only zero-shot classification, instruction-based classification, and supervised fine-tuning. Results indicate that fine-tuned embeddings can achieve or exceed the accuracy of LLMs, reaching up to 90 percent, while offering significant reductions in inference latency and computational overhead. E5 embeddings showed consistent generalization across unseen categories and input shifts, whereas BGE-M3 demonstrated measurable gains when adapted to task-specific distributions. Instruction-based classification produced lower accuracy for both models, highlighting the limitations of prompt conditioning in isolation. These findings position multilingual embedding models as a viable alternative to LLMs for classification at scale in data-intensive public sector environments.
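
Few-shot inference, as described above, means placing a handful of labelled examples in the prompt ahead of the item to classify. The sketch below shows one plausible way to assemble such a prompt; the function name, the prompt wording, and the category names are hypothetical and are not taken from the study.

```python
def build_few_shot_prompt(
    labels: list[str],
    examples: list[tuple[str, str]],
    complaint: str,
) -> str:
    """Assemble a classification prompt from labelled in-context examples."""
    lines = ["Classify the complaint into one of: " + ", ".join(labels) + ".", ""]
    for text, label in examples:
        # Each in-context example shows the model the expected input/output format.
        lines += [f"Complaint: {text}", f"Category: {label}", ""]
    # The final item is left open so the model completes the category.
    lines += [f"Complaint: {complaint}", "Category:"]
    return "\n".join(lines)
```

The returned string would be sent to whichever LLM is under test; the embedding-based paradigms in the abstract skip this step and compare vector representations instead.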

(This article belongs to the Special Issue Text Mining: Challenges, Algorithms, Tools and Applications)

24 pages, 2508 KiB

Open Access Article

Class-Discrepancy Dynamic Weighting for Cross-Domain Few-Shot Hyperspectral Image Classification

by Chen Ding, Jiahao Yue, Sirui Zheng, Yizhuo Dong, Wenqiang Hua, Xueling Chen, Yu Xie, Song Yan, Wei Wei and Lei Zhang

Remote Sens. 2025, 17(15), 2605; https://doi.org/10.3390/rs17152605 - 27 Jul 2025

Viewed by 340

Abstract

In recent years, cross-domain few-shot learning (CDFSL) has demonstrated remarkable performance in hyperspectral image classification (HSIC), partially alleviating the distribution shift problem. However, most domain adaptation methods rely on similarity metrics to establish cross-domain class matching, making it difficult to simultaneously account for intra-class sample size variations and inherent inter-class differences. To address this problem, existing studies have introduced a class weighting mechanism within the prototype network framework, determining class weights by calculating inter-sample similarity through distance metrics. However, this method suffers from a dual limitation: susceptibility to noise interference and insufficient capacity to capture global class variations, which may distort weight allocation and consequently introduce alignment bias. To solve these issues, we propose a novel class-discrepancy dynamic weighting-based cross-domain FSL (CDDW-CFSL) framework. It integrates three key components: (1) the class-weighted domain adaptation (CWDA) method, which dynamically measures cross-domain distribution shifts using global class mean discrepancies and employs discrepancy-sensitive weighting to strengthen the alignment of critical categories, enabling accurate domain adaptation while maintaining feature topology; (2) the class mean refinement (CMR) method, which incorporates class covariance distance to compute distribution discrepancies between support set samples and class prototypes, enabling the precise capture of cross-domain feature internal structures; and (3) a novel multi-dimensional feature extractor that captures both local spatial details and continuous spectral characteristics simultaneously, facilitating deep cross-dimensional feature fusion. Results on three publicly available HSIC datasets demonstrate the effectiveness of CDDW-CFSL.
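
For readers unfamiliar with the prototype network framework mentioned above, the following minimal sketch shows its core inference step: each class prototype is the mean of that class's support features, and a query is assigned to the nearest prototype. It illustrates only this standard baseline, not the CWDA or CMR components; the function names and the use of squared Euclidean distance are assumptions for the example.

```python
def class_prototypes(
    features: list[list[float]], labels: list[str]
) -> dict[str, list[float]]:
    """Average the support features of each class to get one prototype per class."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(query: list[float], prototypes: dict[str, list[float]]) -> str:
    """Assign the query to the class whose prototype is closest (squared Euclidean)."""
    def sq_dist(p: list[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(query, p))
    return min(prototypes, key=lambda y: sq_dist(prototypes[y]))
```

Class weighting schemes such as the one the abstract proposes adjust how much each class contributes during alignment; the nearest-prototype decision itself stays as shown.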

23 pages, 1604 KiB

Open Access Article

Fine-Tuning Large Language Models for Kazakh Text Simplification

by Alymzhan Toleu, Gulmira Tolegen and Irina Ualiyeva

Appl. Sci. 2025, 15(15), 8344; https://doi.org/10.3390/app15158344 - 26 Jul 2025

Viewed by 359

Abstract

This paper addresses the text simplification task for Kazakh, a morphologically rich, low-resource language, by introducing KazSim, an instruction-tuned model built on multilingual large language models (LLMs). First, we develop a heuristic pipeline to identify complex Kazakh sentences, manually validating its performance on 400 examples and comparing it against a purely LLM-based selection method; we then use this pipeline to assemble a parallel corpus of 8709 complex–simple pairs via LLM augmentation. For the simplification task, we benchmark KazSim against standard Seq2Seq systems, domain-adapted Kazakh LLMs, and zero-shot instruction-following models. On an automatically constructed test set, KazSim (Llama-3.3-70B) achieves BLEU 33.50, SARI 56.38, and F1 87.56 with a length ratio of 0.98, outperforming all baselines. We also explore prompt language (English vs. Kazakh) and conduct human evaluation with three native speakers: KazSim scores 4.08 for fluency, 4.09 for meaning preservation, and 4.42 for simplicity, significantly above GPT-4o-mini. Error analysis shows that remaining failures cluster into tone change, tense change, and semantic drift, reflecting Kazakh’s agglutinative morphology and flexible syntax.

(This article belongs to the Special Issue Natural Language Processing and Text Mining)

18 pages, 1687 KiB

Open Access Article

Beyond Classical AI: Detecting Fake News with Hybrid Quantum Neural Networks

by Volkan Altıntaş

Appl. Sci. 2025, 15(15), 8300; https://doi.org/10.3390/app15158300 - 25 Jul 2025

Viewed by 224

Abstract

The advent of quantum computing has introduced new opportunities for enhancing classical machine learning architectures. In this study, we propose a novel hybrid model, the HQDNN (Hybrid Quantum–Deep Neural Network), designed for the automatic detection of fake news. The model integrates classical fully connected neural layers with a parameterized quantum circuit, enabling the processing of textual data within both classical and quantum computational domains. To assess its effectiveness, we conducted experiments on the widely used LIAR dataset utilizing Term Frequency–Inverse Document Frequency (TF-IDF) features, as well as transformer-based DistilBERT embeddings. The experimental results demonstrate that the HQDNN achieves superior recall (92.58% with TF-IDF and 94.40% with DistilBERT), surpassing traditional machine learning models such as Logistic Regression, Linear SVM, and Multilayer Perceptron. Additionally, we compare the HQDNN with SetFit, a recent CPU-efficient few-shot transformer model, and show that while SetFit achieves higher precision, the HQDNN significantly outperforms it in recall. Furthermore, an ablation experiment confirms the critical contribution of the quantum component, revealing a substantial drop in performance when the quantum layer is removed. These findings highlight the potential of hybrid quantum–classical models as effective and compact alternatives for high-sensitivity classification tasks, particularly in domains such as fake news detection.

18 pages, 516 KiB

Open Access Article

A Nested Named Entity Recognition Model Robust in Few-Shot Learning Environments Using Label Description Information

by Hyunsun Hwang, Youngjun Jung, Changki Lee and Wooyoung Go

Appl. Sci. 2025, 15(15), 8255; https://doi.org/10.3390/app15158255 - 24 Jul 2025

Viewed by 227

Abstract

Nested named entity recognition (NER) is a task that identifies hierarchically structured entities, where one entity can contain other entities within its span. This study introduces a nested NER model for few-shot learning environments, addressing the difficulty of building extensive datasets for general named entities. We enhance the Biaffine nested NER model by modifying its output layer to incorporate label semantic information through a novel label description embedding (LDE) approach, improving performance with limited training data. Our method replaces the traditional biaffine classifier with a label attention mechanism that leverages comprehensive natural language descriptions of entity types, encoded using BERT to capture rich semantic relationships between labels and input spans. We conducted comprehensive experiments on four benchmark datasets: GENIA (nested NER), ACE 2004 (nested NER), ACE 2005 (nested NER), and CoNLL 2003 English (flat NER). Performance was evaluated across multiple few-shot scenarios (1-shot, 5-shot, 10-shot, and 20-shot) using F1-measure as the primary metric, with five different random seeds to ensure robust evaluation. We compared our approach against strong baselines including BERT-LSTM-CRF with nested tags, the original Biaffine model, and recent few-shot NER methods (FewNER, FIT, LPNER, SpanNER). Results demonstrate significant improvements across all few-shot scenarios. On GENIA, our LDE model achieves 45.07% F1 in five-shot learning compared to 30.74% for the baseline Biaffine model (46.4% relative improvement). On ACE 2005, we obtain 44.24% vs. 32.38% F1 in five-shot scenarios (36.6% relative improvement). The model shows consistent gains in 10-shot (57.19% vs. 49.50% on ACE 2005) and 20-shot settings (64.50% vs. 58.21% on ACE 2005). Ablation studies confirm that semantic information from label descriptions is the key factor enabling robust few-shot performance. 
Transfer learning experiments demonstrate the model’s ability to leverage knowledge from related domains. Our findings suggest that incorporating label semantic information can substantially enhance NER models in low-resource settings, opening new possibilities for applying NER in specialized domains or languages with limited annotated data.
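
The relative improvements quoted in this abstract follow the usual formula, (new − baseline) / baseline, and can be re-derived from the rounded F1 scores as a sanity check. The helper below is a plain arithmetic sketch; its name is an assumption for the example. From the rounded values, the ACE 2005 five-shot figure reproduces the stated 36.6%, while the GENIA five-shot figure comes out near 46.6%, slightly different from the stated 46.4%; the abstract does not say which unrounded scores were used.

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`, expressed in percent."""
    return (new - baseline) / baseline * 100.0


# F1 scores (percent) as quoted in the abstract: (LDE model, Biaffine baseline).
reported = {
    "GENIA 5-shot": (45.07, 30.74),
    "ACE 2005 5-shot": (44.24, 32.38),
    "ACE 2005 10-shot": (57.19, 49.50),
    "ACE 2005 20-shot": (64.50, 58.21),
}

gains = {name: round(relative_improvement(new, base), 1) for name, (new, base) in reported.items()}
```

The shrinking gain from 5-shot to 20-shot matches the abstract's framing: label descriptions help most when training examples are scarcest.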

(This article belongs to the Special Issue Applications of Natural Language Processing to Data Science)

23 pages, 10648 KiB

Open Access Article

Meta-Learning-Integrated Neural Architecture Search for Few-Shot Hyperspectral Image Classification

by Aili Wang, Kang Zhang, Haibin Wu, Haisong Chen and Minhui Wang

Electronics 2025, 14(15), 2952; https://doi.org/10.3390/electronics14152952 - 24 Jul 2025

Viewed by 218

Abstract

To address the scarcity of labeled samples in practical classification scenarios, along with the overfitting and insufficient generalization that arise in few-shot learning (FSL) for hyperspectral image classification (HSIC), this paper designs and implements a neural architecture search (NAS) method for few-shot HSI classification that incorporates meta-learning. First, a multi-source domain learning framework is constructed to integrate heterogeneous natural images and homogeneous remote sensing images. This broadens the information available to few-shot learning and lets the final network improve its generalization under limited labeled samples by learning the similarity between different data sources. Second, constructing precise and robust search spaces and deploying different units at different locations improves the classification accuracy and model transfer robustness of the final network. The method uses the spatial texture information and rich category information of multi-source data and transfers the learned meta-knowledge to the optimal architecture for HSIC through this search space design, accomplishing HSIC with limited samples. Experimental results show that the proposed method achieves an overall accuracy (OA) of 98.57%, 78.39%, and 98.74% on the Pavia Center, Indian Pine, and WHU-Hi-LongKou datasets, respectively. These results indicate that combining multi-source spatial texture and category information with a precise and robust search space design transfers the learned meta-knowledge to the optimal HSIC architecture and supports classification with few-shot samples.

(This article belongs to the Special Issue Deep Learning in Image Processing and Pattern Recognition, 2nd Edition)

15 pages, 3893 KiB

Open Access Article

Exploration of 3D Few-Shot Learning Techniques for Classification of Knee Joint Injuries on MR Images

by Vinh Hiep Dang, Minh Tri Nguyen, Ngoc Hoang Le, Thuan Phat Nguyen, Quoc-Viet Tran, Tan Ha Mai, Vu Pham Thao Vy, Truong Nguyen Khanh Hung, Ching-Yu Lee, Ching-Li Tseng, Nguyen Quoc Khanh Le and Phung-Anh Nguyen

Diagnostics 2025, 15(14), 1808; https://doi.org/10.3390/diagnostics15141808 - 18 Jul 2025

Viewed by 455

Abstract

Background/Objectives: Accurate diagnosis of knee joint injuries from magnetic resonance (MR) images is critical for patient care. While deep learning has advanced 3D MR image analysis, its reliance on extensive labeled datasets is a major hurdle for diverse knee pathologies. Few-shot learning (FSL) addresses this by enabling models to classify new conditions from minimal annotated examples, often leveraging knowledge from related tasks. However, creating robust 3D FSL frameworks for varied knee injuries remains challenging. Methods: We introduce MedNet-FS, a 3D FSL framework that effectively classifies knee injuries by utilizing domain-specific pre-trained weights and generalized end-to-end (GE2E) loss for discriminative embeddings. Results: MedNet-FS, with knee-MRI-specific pre-training, significantly outperformed models using generic or other medical pre-trained weights and approached supervised learning performance on internal datasets with limited samples (e.g., achieving an area under the curve (AUC) of 0.76 for ACL tear classification with k = 40 support samples on the MRNet dataset). External validation on the KneeMRI dataset revealed challenges in classifying partially torn ACL (AUC up to 0.58) but demonstrated promising performance for distinguishing intact versus fully ruptured ACLs (AUC 0.62 with k = 40). Conclusions: These findings demonstrate that tailored FSL strategies can substantially reduce data dependency in developing specialized medical imaging tools. This approach fosters rapid AI tool development for knee injuries and offers a scalable solution for data scarcity in other medical imaging domains, potentially democratizing AI-assisted diagnostics, particularly for rare conditions or in resource-limited settings.

(This article belongs to the Special Issue New Technologies and Tools Used for Risk Assessment of Diseases)

16 pages, 2355 KiB

Open Access Article

Generalising Stock Detection in Retail Cabinets with Minimal Data Using a DenseNet and Vision Transformer Ensemble

by Babak Rahi, Deniz Sagmanli, Felix Oppong, Direnc Pekaslan and Isaac Triguero

Mach. Learn. Knowl. Extr. 2025, 7(3), 66; https://doi.org/10.3390/make7030066 - 16 Jul 2025

Viewed by 307

Abstract

Generalising deep-learning models to perform well on unseen data domains with minimal retraining remains a significant challenge in computer vision. Even when the target task (such as quantifying the number of elements in an image) stays the same, data quality, shape, or form variations can deviate from the training conditions, often necessitating manual intervention. As a real-world industry problem, we aim to automate stock level estimation in retail cabinets. As technology advances, new cabinet models with varying shapes emerge alongside new camera types. This evolving scenario poses a substantial obstacle to deploying long-term, scalable solutions. To generalise to new cabinet models and cameras from minimal sample images, this paper proposes a novel ensemble model that combines DenseNet-201 and Vision Transformer (ViT-B/8) architectures for stock-level classification. The novelty of our solution lies in combining a transformer with a DenseNet model to capture both the local, hierarchical details and the long-range dependencies within the images, improving generalisation accuracy with less data. Key contributions include (i) a novel DenseNet-201 + ViT-B/8 feature-level fusion, (ii) an adaptation workflow that needs only two images per class, (iii) a balanced layer-unfreezing schedule, (iv) a publicly described domain-shift benchmark, and (v) a 47 pp accuracy gain over four standard few-shot baselines. Our approach leverages fine-tuning techniques to adapt two pre-trained models to the new retail cabinets (i.e., standing or horizontal) and camera types using only two images per class. Experimental results demonstrate that our method achieves high accuracy rates of 91% on new cabinets with the same camera and 89% on new cabinets with different cameras, significantly outperforming standard few-shot learning methods.

(This article belongs to the Section Data)

21 pages, 1118 KiB

Open AccessReview

Integrating Large Language Models into Robotic Autonomy: A Review of Motion, Voice, and Training Pipelines

by Yutong Liu, Qingquan Sun and Dhruvi Rajeshkumar Kapadia

AI 2025, 6(7), 158; https://doi.org/10.3390/ai6070158 - 15 Jul 2025

Viewed by 1486

Abstract

This survey provides a comprehensive review of the integration of large language models (LLMs) into autonomous robotic systems, organized around four key pillars: locomotion, navigation, manipulation, and voice-based interaction. We examine how LLMs enhance robotic autonomy by translating high-level natural language commands into low-level control signals, supporting semantic planning and enabling adaptive execution. Systems like SayTap improve gait stability through LLM-generated contact patterns, while TrustNavGPT achieves a 5.7% word error rate (WER) under noisy voice-guided conditions by modeling user uncertainty. Frameworks such as MapGPT, LLM-Planner, and 3D-LOTUS++ integrate multi-modal data (including vision, speech, and proprioception) for robust planning and real-time recovery. We also highlight the use of physics-informed neural networks (PINNs) to model object deformation and support precision in contact-rich manipulation tasks. To bridge the gap between simulation and real-world deployment, we synthesize best practices from benchmark datasets (e.g., RH20T, Open X-Embodiment) and training pipelines designed for one-shot imitation learning and cross-embodiment generalization. Additionally, we analyze deployment trade-offs across cloud, edge, and hybrid architectures, emphasizing latency, scalability, and privacy. The survey concludes with a multi-dimensional taxonomy and cross-domain synthesis, offering design insights and future directions for building intelligent, human-aligned robotic systems powered by LLMs.
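
For readers unfamiliar with the word error rate quoted above, the sketch below computes it in the standard way: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a recognised one, divided by the number of reference words. The function name and the example sentences are illustrative only and are not taken from TrustNavGPT or the survey.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of words in the reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

# One dropped word out of four reference words.
wer = word_error_rate("go to the kitchen", "go to kitchen")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.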

22 pages, 3279 KiB

Open AccessArticle

HA-CP-Net: A Cross-Domain Few-Shot SAR Oil Spill Detection Network Based on Hybrid Attention and Category Perception

by Dongmei Song, Shuzhen Wang, Bin Wang, Weimin Chen and Lei Chen

J. Mar. Sci. Eng. 2025, 13(7), 1340; https://doi.org/10.3390/jmse13071340 - 13 Jul 2025

Viewed by 313

Abstract

Deep learning models have clear advantages in detecting oil spills, but their training depends heavily on a large number of high-quality samples. However, because oil spill incidents are accidental, unpredictable, and urgent, it is difficult to obtain many labeled samples in real oil spill monitoring scenarios. Few-shot learning, by contrast, can achieve excellent classification performance with only a small number of labeled samples. In this context, this paper proposes a new cross-domain few-shot SAR oil spill detection network. The network embeds a hybrid attention feature extraction block, which consists of a coordinate attention module that perceives channel and spatial location information, a global self-attention transformer module that captures global dependencies, and a multi-scale self-attention module that depicts local detailed features, thereby achieving deep mining and accurate characterization of image features. In addition, because the small feature difference between suspected oil film in seawater and real oil film makes the two hard to distinguish with few samples, this paper proposes a double loss function category determination block, which consists of two parts: a category-perception loss function and a traditional cross-entropy loss function. The category-perception loss function optimizes the spatial distribution of sample features by shortening the distance between similar samples while expanding the distance between different samples. Combining the category-perception loss function with the cross-entropy loss function thus maximizes the network's performance in discriminating between real and suspected oil films. The experimental results demonstrate that this study provides an effective solution for high-precision oil spill detection under few-shot conditions, which is conducive to the rapid identification of oil spill accidents.
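
The idea of pulling same-class features together and pushing different-class features apart, then adding a cross-entropy term, can be sketched as below. The abstract does not give the form of the category-perception loss, so this uses a generic margin-based pairwise contrastive loss as a stand-in; the function names, the `margin` and `weight` parameters, and the toy features are assumptions, not the paper's definitions.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pairwise_contrastive_loss(features, labels, margin=1.0):
    """Stand-in for a category-aware loss: same-class pairs are penalised by
    squared distance, different-class pairs only if closer than `margin`."""
    total, pairs = 0.0, 0
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            d = euclidean(features[i], features[j])
            if labels[i] == labels[j]:
                total += d ** 2                     # pull similar samples together
            else:
                total += max(0.0, margin - d) ** 2  # push different samples apart
            pairs += 1
    return total / pairs if pairs else 0.0

def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the true class."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

def combined_loss(features, probs, labels, weight=0.5, margin=1.0):
    """Cross-entropy plus a weighted distance term (the weighting is an assumption)."""
    return cross_entropy(probs, labels) + weight * pairwise_contrastive_loss(
        features, labels, margin)

# Labels: 0 = real oil film, 1 = suspected (look-alike) film; features are toy 2-D points.
labels = [0, 0, 1]
well_separated = pairwise_contrastive_loss([[0, 0], [0, 0], [3, 4]], labels)
poorly_separated = pairwise_contrastive_loss([[0, 0], [3, 4], [0, 0]], labels)
```

In this toy case the well-separated embedding incurs no distance penalty, while the embedding that places a look-alike sample on top of a real oil film sample is penalised, which is the behaviour the abstract describes qualitatively.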

(This article belongs to the Section Marine Environmental Science)
