Search Results (239)

Search Parameters:
Keywords = teacher–student distillation

24 pages, 1353 KB  
Article
SLTP: A Symbolic Travel-Planning Agent Framework with Decoupled Translation and Heuristic Tree Search
by Debin Tang, Qian Jiang, Jingpu Yang, Jingyu Zhao, Xiaofei Du, Miao Fang and Xiaofei Zhang
Electronics 2026, 15(2), 422; https://doi.org/10.3390/electronics15020422 - 18 Jan 2026
Abstract
Large language models (LLMs) demonstrate outstanding capability in understanding natural language and show great potential in open-domain travel planning. However, when confronted with multi-constraint itineraries, personalized recommendations, and scenarios requiring rigorous external information validation, pure LLM-based approaches lack rigorous planning ability and fine-grained personalization. To address these gaps, we propose the Symbolic LoRA Travel Planner (SLTP) framework: an agent architecture that combines a two-stage symbol-rule LoRA fine-tuning pipeline with a multi-option heuristic tree search (MHTS) planner. SLTP decomposes the process of transforming natural language into executable code into two specialized, sequential LoRA experts: the first maps natural-language queries to symbolic constraints with high fidelity; the second compiles symbolic constraints into executable Python planning code. After reflective verification, the generated code serves as constraints and heuristic rules for an MHTS planner that preserves diversified top-K candidate itineraries and uses pruning plus heuristic strategies to maintain search-time performance. To overcome the scarcity of high-quality intermediate symbolic data, we adopt a teacher–student distillation approach: a strong teacher model generates high-fidelity symbolic constraints and executable code, which we use as hard targets to distill knowledge into the Qwen3-8B student model via two-stage LoRA. On the ChinaTravel benchmark, SLTP with an 8B student achieves performance comparable to or surpassing that of other methods built on DeepSeek-V3 or GPT-4o as a backbone. Full article
(This article belongs to the Special Issue AI-Powered Natural Language Processing Applications)
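
As a concrete illustration of the top-K search idea, here is a minimal Python sketch of a beam-style heuristic tree search that keeps the K best partial itineraries at each depth and prunes the rest. The interfaces (expand, heuristic, is_goal) and the toy city/utility data are assumptions for illustration, not the authors' actual planner.

```python
# A minimal sketch of a multi-option heuristic tree search (MHTS) in the spirit
# of the abstract: keep the top-K partial plans per step, prune by heuristic.
import heapq
from typing import Callable

def mhts(start: tuple, expand: Callable, heuristic: Callable,
         is_goal: Callable, k: int = 5, max_depth: int = 10) -> list:
    """Return up to k complete plans, best heuristic score first."""
    frontier = [start]          # current beam of partial plans
    complete = []               # finished plans found so far
    for _ in range(max_depth):
        candidates = []
        for plan in frontier:
            for child in expand(plan):            # enumerate feasible extensions
                if is_goal(child):
                    complete.append(child)
                else:
                    candidates.append(child)
        if not candidates:
            break
        # prune: keep only the k highest-scoring partial plans (diversified beam)
        frontier = heapq.nlargest(k, candidates, key=heuristic)
    return heapq.nlargest(k, complete, key=heuristic)

# Toy usage: plans are tuples of city indices; a plan is complete at length 3.
cities, utility = range(4), [0.9, 0.5, 0.7, 0.3]
plans = mhts(
    start=(0,),
    expand=lambda p: [p + (c,) for c in cities if c not in p],
    heuristic=lambda p: sum(utility[c] for c in p),
    is_goal=lambda p: len(p) == 3,
    k=3,
)
print(plans)
```
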
21 pages, 11029 KB  
Article
Scale Calibration and Pressure-Driven Knowledge Distillation for Image Classification
by Jing Xie, Penghui Guan, Han Li, Chunhua Tang, Li Wang and Yingcheng Lin
Symmetry 2026, 18(1), 177; https://doi.org/10.3390/sym18010177 - 18 Jan 2026
Abstract
Knowledge distillation achieves model compression by training a lightweight student network to mimic the output distribution of a larger teacher network. However, when the teacher becomes overconfident, its sharply peaked logits break the scale symmetry of supervision and induce high-variance gradients, leading to unstable optimization. Meanwhile, research that focuses only on final-logit alignment often fails to utilize intermediate semantic structure effectively, which weakens the discriminative power of student representations, especially under class imbalance. To address these issues, we propose Scale Calibration and Pressure-Driven Knowledge Distillation (SPKD): a one-stage framework comprising two lightweight, complementary mechanisms. First, a dynamic scale calibration module normalizes the teacher’s logits to a consistent magnitude, reducing gradient variance. Second, an adaptive pressure-driven mechanism refines student learning by preventing feature collapse and promoting intra-class compactness and inter-class separability. Extensive experiments on CIFAR-100 and ImageNet demonstrate that SPKD outperforms distillation baselines across various teacher–student combinations. For example, SPKD reaches 74.84% accuracy on CIFAR-100 for the homogeneous VGG13-VGG8 pair. Additional evidence from logit-norm and gradient-variance statistics, as well as representation analyses, shows that SPKD stabilizes optimization while learning more discriminative and well-structured features. Full article
(This article belongs to the Section Computer)
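
The two mechanisms can be illustrated with a short PyTorch sketch: teacher logits are projected to a fixed L2 norm before the usual temperature-scaled KL distillation term is computed. The target norm, temperature, and the exact calibration rule below are assumptions; the paper's calibration module is dynamic.

```python
# A minimal sketch, assuming a fixed-norm calibration rule: (1) rescale each
# teacher logit vector to a constant L2 norm so overconfident, sharply peaked
# logits cannot dominate, then (2) match the student with a temperature KL.
import torch
import torch.nn.functional as F

def calibrated_kd_loss(student_logits, teacher_logits, tau=4.0, target_norm=4.0):
    # Scale calibration: remove per-sample magnitude (confidence) variation.
    t = teacher_logits / teacher_logits.norm(dim=-1, keepdim=True) * target_norm
    # Standard distillation: KL between temperature-softened distributions.
    p_t = F.softmax(t / tau, dim=-1)
    log_p_s = F.log_softmax(student_logits / tau, dim=-1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * tau**2

s = torch.randn(8, 100)          # student logits: batch of 8, 100 classes
t = torch.randn(8, 100) * 10     # overconfident teacher: large-norm logits
print(calibrated_kd_loss(s, t).item())
```
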
29 pages, 9144 KB  
Article
PhysGraphIR: Adaptive Physics-Informed Graph Learning for Infrared Thermal Field Prediction in Meter Boxes with Residual Sampling and Knowledge Distillation
by Hao Li, Siwei Li, Xiuli Yu and Xinze He
Electronics 2026, 15(2), 410; https://doi.org/10.3390/electronics15020410 - 16 Jan 2026
Abstract
Infrared thermal field (ITF) prediction for meter boxes is crucial for the early warning of power system faults, yet the task faces three major challenges: data sparsity, complex geometry, and resource constraints in edge computing. Existing physics-informed neural network-graph neural network (PINN-GNN) approaches suffer from redundant physics residual calculations (over 70% of flat regions contain little information) and poor model generalization (requiring retraining for new box types), making them inefficient for deployment on edge devices. This paper proposes the PhysGraphIR framework, which employs an Adaptive Residual Sampling (ARS) mechanism to dynamically identify hotspot-region nodes through a physics-aware gating network, calculating physics residuals only at critical nodes to reduce computational overhead by over 80%. In this study, a "hotspot region" is explicitly defined as a localized area exhibiting significant temperature elevation relative to the background, typically concentrated around electrical connection terminals or wire entrances, which is critical for identifying potential thermal faults under sparse data conditions. Additionally, the framework utilizes a Physics Knowledge Distillation Graph Neural Network (Physics-KD GNN) to decouple physics learning from geometric learning, transferring universal heat conduction knowledge to specific meter box geometries through a teacher–student architecture. Experimental results demonstrate that on both synthetic and real-world meter box datasets, PhysGraphIR achieves a hotspot-region mean absolute error (MAE) of 11.8 °C with 60% of the infrared data missing, representing a 22% improvement over traditional PINN-GNN. Training is accelerated by 3.1 times, and only five infrared samples are required to adapt to new box types. The experiments show that this method significantly enhances prediction accuracy and computational efficiency under sparse infrared data while maintaining physical consistency, providing a feasible solution for edge intelligence in power systems. Full article
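
A rough sketch of the adaptive residual sampling idea, under stated assumptions: a gate scores graph nodes, and a toy steady-state heat-conduction residual (the graph Laplacian applied to predicted temperatures) is penalized only at the top-scoring fraction. The random gate values and the simple physics term are stand-ins, not the paper's learned components.

```python
# Sketch: evaluate the (expensive) physics residual only at gated hotspot nodes.
import torch

def sampled_physics_loss(temps, lap, gate_scores, keep_frac=0.2):
    """temps: (N,) predicted node temperatures; lap: (N, N) graph Laplacian."""
    k = max(1, int(keep_frac * temps.numel()))
    hot = gate_scores.topk(k).indices       # nodes the gate flags as informative
    residual = lap @ temps                  # ~0 where steady-state heat eq. holds
    return residual[hot].pow(2).mean()      # penalize physics violation only there

N = 100
A = (torch.rand(N, N) < 0.05).float()
A = ((A + A.T) > 0).float()
A.fill_diagonal_(0)                         # random undirected graph, no self-loops
lap = torch.diag(A.sum(1)) - A              # combinatorial Laplacian
temps = torch.rand(N, requires_grad=True)
gate = torch.rand(N)                        # stand-in for the learned gating net
loss = sampled_physics_loss(temps, lap, gate)
loss.backward()
print(loss.item())
```
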
27 pages, 6058 KB  
Article
Hierarchical Self-Distillation with Attention for Class-Imbalanced Acoustic Event Classification in Elevators
by Shengying Yang, Lingyan Chou, He Li, Zhenyu Xu, Boyang Feng and Jingsheng Lei
Sensors 2026, 26(2), 589; https://doi.org/10.3390/s26020589 - 15 Jan 2026
Abstract
Acoustic-based anomaly detection in elevators is crucial for predictive maintenance and operational safety, yet it faces significant challenges in real-world settings, including pervasive multi-source acoustic interference within confined spaces and severe class imbalance in collected data, which critically degrades the detection performance for minority yet critical acoustic events. To address these issues, this study proposes a novel hierarchical self-distillation framework. The method embeds auxiliary classifiers into the intermediate layers of a backbone network, creating a deep teacher–shallow student knowledge transfer paradigm optimized jointly via Kullback–Leibler divergence and feature alignment losses. A self-attentive temporal pooling layer is introduced to adaptively weigh discriminative time-frequency features, thereby mitigating temporal overlap interference, while a focal loss function is employed specifically in the teacher model to recalibrate the learning focus towards hard-to-classify minority samples. Extensive evaluations on the public UrbanSound8K dataset and a proprietary industrial elevator audio dataset demonstrate that the proposed model achieves superior performance, exceeding 90% in both accuracy and F1-score. Notably, it yields substantial improvements in recognizing rare events, validating its robustness for elevator acoustic monitoring. Full article
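
The deep-teacher/shallow-student pattern can be sketched in a few lines of PyTorch: auxiliary heads on intermediate features are trained against hard labels plus the KL divergence from the (detached) deepest head, with an L2 feature-alignment term. Layer sizes, loss weights, and the plain-MLP backbone are illustrative assumptions; the paper additionally uses attentive pooling and focal loss.

```python
# A compact sketch of hierarchical self-distillation with auxiliary classifiers.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfDistillNet(nn.Module):
    def __init__(self, dim=64, n_classes=10):
        super().__init__()
        self.blocks = nn.ModuleList([nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
                                     for _ in range(3)])
        self.heads = nn.ModuleList([nn.Linear(dim, n_classes) for _ in range(3)])

    def forward(self, x):
        feats, logits = [], []
        for block, head in zip(self.blocks, self.heads):
            x = block(x)
            feats.append(x)
            logits.append(head(x))
        return feats, logits

def self_distill_loss(feats, logits, labels, tau=3.0, alpha=0.5, beta=0.1):
    teacher_feat, teacher_logit = feats[-1].detach(), logits[-1].detach()
    loss = F.cross_entropy(logits[-1], labels)       # deepest head: hard labels
    for f, z in zip(feats[:-1], logits[:-1]):        # shallow heads: distilled
        loss += F.cross_entropy(z, labels)
        loss += alpha * tau**2 * F.kl_div(F.log_softmax(z / tau, -1),
                                          F.softmax(teacher_logit / tau, -1),
                                          reduction="batchmean")
        loss += beta * F.mse_loss(f, teacher_feat)   # feature alignment
    return loss

net = SelfDistillNet()
x, y = torch.randn(16, 64), torch.randint(0, 10, (16,))
feats, logits = net(x)
print(self_distill_loss(feats, logits, y).item())
```
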
17 pages, 710 KB  
Article
KD-SecBERT: A Knowledge-Distilled Bidirectional Encoder Optimized for Open-Source Software Supply Chain Security in Smart Grid Applications
by Qinman Li, Xixiang Zhang, Weiming Liao, Tao Dai, Hongliang Zheng, Beiya Yang and Pengfei Wang
Electronics 2026, 15(2), 345; https://doi.org/10.3390/electronics15020345 - 13 Jan 2026
Abstract
With the acceleration of digital transformation, open-source software has become a fundamental component of modern smart grids and other critical infrastructures. However, the complex dependency structures of open-source ecosystems and the continuous emergence of vulnerabilities pose substantial challenges to software supply chain security. In power information networks and cyber–physical control systems, vulnerabilities in open-source components integrated into Supervisory Control and Data Acquisition (SCADA), Energy Management System (EMS), and Distribution Management System (DMS) platforms and distributed energy controllers may propagate along the supply chain, threatening system security and operational stability. In such application scenarios, large language models (LLMs) often suffer from limited semantic accuracy when handling domain-specific security terminology, as well as deployment inefficiencies that hinder their practical adoption in critical infrastructure environments. To address these issues, this paper proposes KD-SecBERT, a domain-specific semantic bidirectional encoder optimized through multi-level knowledge distillation for open-source software supply chain security in smart grid applications. The proposed framework constructs a hierarchical multi-teacher ensemble that integrates general language understanding, cybersecurity-domain knowledge, and code semantic analysis, together with a lightweight student architecture based on depthwise separable convolutions and multi-head self-attention. In addition, a dynamic, multi-dimensional distillation strategy is introduced to jointly perform layer-wise representation alignment, ensemble knowledge fusion, and task-oriented optimization under a progressive curriculum learning scheme. Extensive experiments conducted on a multi-source dataset comprising National Vulnerability Database (NVD) and Common Vulnerabilities and Exposures (CVE) entries, security-related GitHub code, and Open Web Application Security Project (OWASP) test cases show that KD-SecBERT achieves an accuracy of 91.3%, a recall of 90.6%, and an F1-score of 89.2% on vulnerability classification tasks, indicating strong robustness in recognizing both common and low-frequency security semantics. These results demonstrate that KD-SecBERT provides an effective and practical solution for semantic analysis and software supply chain risk assessment in smart grids and other critical-infrastructure environments. Full article
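
The multi-teacher ensemble can be illustrated with a minimal sketch in which several teachers' temperature-softened distributions are fused by a fixed convex combination and matched by the student via KL divergence. The fixed weights are a simplification; the paper's fusion strategy is dynamic and multi-level.

```python
# A minimal sketch of multi-teacher knowledge distillation with fixed weights.
import torch
import torch.nn.functional as F

def multi_teacher_kd(student_logits, teacher_logits_list, weights, tau=2.0):
    assert abs(sum(weights) - 1.0) < 1e-6
    fused = sum(w * F.softmax(t / tau, dim=-1)       # fuse teacher distributions
                for w, t in zip(weights, teacher_logits_list))
    log_p_s = F.log_softmax(student_logits / tau, dim=-1)
    return F.kl_div(log_p_s, fused, reduction="batchmean") * tau**2

s = torch.randn(4, 5)                                # student logits
teachers = [torch.randn(4, 5) for _ in range(3)]     # e.g. language/security/code
print(multi_teacher_kd(s, teachers, weights=[0.5, 0.3, 0.2]).item())
```
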
15 pages, 1363 KB  
Article
Hierarchical Knowledge Distillation for Efficient Model Compression and Transfer: A Multi-Level Aggregation Approach
by Titinunt Kitrungrotsakul and Preeyanuch Srichola
Information 2026, 17(1), 70; https://doi.org/10.3390/info17010070 - 12 Jan 2026
Abstract
The success of large-scale deep learning models in remote sensing tasks has been transformative, enabling significant advances in image classification, object detection, and image–text retrieval. However, their computational and memory demands pose challenges for deployment in resource-constrained environments. Knowledge distillation (KD) alleviates these issues by transferring knowledge from a strong teacher to a student model, which can be compact for efficient deployment or architecturally matched to improve accuracy under the same inference budget. In this paper, we introduce Hierarchical Multi-Segment Knowledge Distillation (HIMS_KD), a multi-stage framework that sequentially distills knowledge from a teacher into multiple assistant models specialized in low-, mid-, and high-level representations, and then aggregates their knowledge into the final student. We integrate feature-level alignment, auxiliary similarity-logit alignment, and supervised loss during distillation. Experiments on benchmark remote sensing datasets (RSITMD and RSICD) show that HIMS_KD improves retrieval performance and enhances zero-shot classification; and when a compact student is used, it reduces deployment cost while retaining strong accuracy. Full article
(This article belongs to the Special Issue AI-Based Image Processing and Computer Vision)
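
A minimal sketch of the aggregation step as described: each assistant aligns with one depth of the teacher via feature regression, and the final student matches the mean of the assistants' soft predictions. The shapes and plain averaging rule are assumptions; HIMS_KD also uses similarity-logit alignment and a supervised loss.

```python
# Sketch: level-specialized assistants, then aggregation into the student.
import torch
import torch.nn.functional as F

def assistant_stage_loss(assistant_feat, teacher_feat):
    # Each assistant specializes in one representation level of the teacher.
    return F.mse_loss(assistant_feat, teacher_feat.detach())

def aggregate_distill_loss(student_logits, assistant_logits_list, tau=2.0):
    # Final stage: the student matches the mean of assistant soft targets.
    fused = torch.stack([F.softmax(a / tau, -1)
                         for a in assistant_logits_list]).mean(0)
    return F.kl_div(F.log_softmax(student_logits / tau, -1), fused,
                    reduction="batchmean") * tau**2

a_logits = [torch.randn(8, 10) for _ in range(3)]  # low-/mid-/high-level assistants
s_logits = torch.randn(8, 10)
print(aggregate_distill_loss(s_logits, a_logits).item())
```
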
42 pages, 3251 KB  
Article
Efficient and Accurate Epilepsy Seizure Prediction and Detection Based on Multi-Teacher Knowledge Distillation RGF-Model
by Wei Cao, Qi Li, Anyuan Zhang and Tianze Wang
Brain Sci. 2026, 16(1), 83; https://doi.org/10.3390/brainsci16010083 - 9 Jan 2026
Abstract
Background: Epileptic seizures are unpredictable, and while existing deep learning models achieve high accuracy, their deployment on wearable devices is constrained by high computational costs and latency. To address this, this work proposes the RGF-Model, a lightweight network that unifies seizure prediction and detection within a single causal framework. Methods: By integrating Feature-wise Linear Modulation (FiLM) with a Ring-Buffer Gated Recurrent Unit (Ring-GRU), the model achieves adaptive task-specific feature conditioning while strictly enforcing causal consistency for real-time inference. A multi-teacher knowledge distillation strategy is employed to transfer complementary knowledge from complex teacher ensembles to the lightweight student, significantly reducing complexity without sacrificing accuracy. Results: Evaluations on the CHB-MIT and Siena datasets demonstrate that the RGF-Model outperforms state-of-the-art teacher models in terms of efficiency while maintaining comparable accuracy. Specifically, on CHB-MIT, it achieves 99.54% Area Under the Curve (AUC) and 0.01 False Prediction Rate per hour (FPR/h) for prediction, and 98.78% Accuracy (Acc) for detection, with only 0.082 million parameters. Statistical significance was assessed using a random predictor baseline (p < 0.05). Conclusions: The results indicate that the RGF-Model provides a highly efficient solution for real-time wearable epilepsy monitoring. Full article
(This article belongs to the Section Neurotechnology and Neuroimaging)
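
FiLM itself is a standard, compact mechanism, and a minimal PyTorch version shows how a task embedding conditions shared features with a per-channel scale and shift, letting one backbone serve both prediction and detection. The dimensions and one-hot task encoding below are illustrative.

```python
# A minimal FiLM (feature-wise linear modulation) layer.
import torch
import torch.nn as nn

class FiLM(nn.Module):
    def __init__(self, task_dim, feat_dim):
        super().__init__()
        self.to_gamma_beta = nn.Linear(task_dim, 2 * feat_dim)

    def forward(self, feats, task_emb):
        gamma, beta = self.to_gamma_beta(task_emb).chunk(2, dim=-1)
        return (1 + gamma) * feats + beta   # identity when gamma = beta = 0

film = FiLM(task_dim=8, feat_dim=32)
feats = torch.randn(4, 100, 32)             # (batch, time, channels) EEG features
task = torch.zeros(4, 8); task[:, 0] = 1    # one-hot: "prediction" vs "detection"
out = film(feats, task.unsqueeze(1))        # broadcast over the time axis
print(out.shape)
```
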
16 pages, 1443 KB  
Article
DCRDF-Net: A Dual-Channel Reverse-Distillation Fusion Network for 3D Industrial Anomaly Detection
by Chunshui Wang, Jianbo Chen and Heng Zhang
Sensors 2026, 26(2), 412; https://doi.org/10.3390/s26020412 - 8 Jan 2026
Abstract
Industrial surface defect detection is essential for ensuring product quality, but real-world production lines often provide only a limited number of defective samples, making supervised training difficult. Multimodal anomaly detection with aligned RGB and depth data is a promising solution, yet existing fusion schemes tend to overlook modality-specific characteristics and cross-modal inconsistencies, so that defects visible in only one modality may be suppressed or diluted. In this work, we propose DCRDF-Net, a dual-channel reverse-distillation fusion network for unsupervised RGB–depth industrial anomaly detection. The framework learns modality-specific normal manifolds from nominal RGB and depth data and detects defects as deviations from these learned manifolds. It consists of three collaborative components: a Perlin-guided pseudo-anomaly generator that injects appearance–geometry-consistent perturbations into both modalities to enrich training signals; a dual-channel reverse-distillation architecture with guided feature refinement that denoises teacher features and constrains RGB and depth students towards clean, defect-free representations; and a cross-modal squeeze–excitation gated fusion module that adaptively combines RGB and depth anomaly evidence based on their reliability and agreement. Extensive experiments on the MVTec 3D-AD dataset show that DCRDF-Net achieves 97.1% image-level I-AUROC and 98.8% pixel-level PRO, surpassing current state-of-the-art multimodal methods on this benchmark. Full article
(This article belongs to the Section Sensor Networks)
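
A sketch of the squeeze-excitation-style gated fusion under stated assumptions: channel statistics from both modalities drive a small gate that outputs per-modality reliability weights, which then blend the RGB and depth anomaly maps. The module shapes are illustrative, not the authors' exact design.

```python
# Sketch: reliability-weighted fusion of RGB and depth anomaly evidence.
import torch
import torch.nn as nn

class GatedAnomalyFusion(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # Squeeze-excitation style gate over the two concatenated modalities.
        self.gate = nn.Sequential(nn.Linear(2 * channels, channels), nn.ReLU(),
                                  nn.Linear(channels, 2), nn.Softmax(dim=-1))

    def forward(self, rgb_feat, depth_feat, rgb_map, depth_map):
        # Global average pool ("squeeze") each modality's feature map.
        stats = torch.cat([rgb_feat.mean((2, 3)), depth_feat.mean((2, 3))], dim=1)
        w = self.gate(stats)                         # (B, 2) reliability weights
        return (w[:, 0, None, None] * rgb_map +      # weighted anomaly-map blend
                w[:, 1, None, None] * depth_map)

fuse = GatedAnomalyFusion()
rgb_f, dep_f = torch.randn(2, 64, 56, 56), torch.randn(2, 64, 56, 56)
rgb_m, dep_m = torch.rand(2, 56, 56), torch.rand(2, 56, 56)
print(fuse(rgb_f, dep_f, rgb_m, dep_m).shape)        # (2, 56, 56)
```
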
20 pages, 2549 KB  
Article
RD-RE: Reverse Distillation with Feature Reconstruction Enhancement for Industrial Anomaly Detection
by Youjia Fu and Antao Lin
Computers 2026, 15(1), 21; https://doi.org/10.3390/computers15010021 - 4 Jan 2026
Abstract
Industrial anomaly detection methods based on reverse distillation (RD) have shown significant potential. However, existing RD approaches struggle to achieve an effective balance between constraining the feature consistency of the teacher–student networks and maintaining differentiated representation capability, which is crucial for precise anomaly detection. To address this challenge, we propose Reverse Distillation with Feature Reconstruction Enhancement (RD-RE) for industrial anomaly detection. Firstly, we design a cross-stage feature fusion student network to integrate spatial detail information from the encoder with rich semantic information from the decoder. Secondly, we introduce a Locally Aware Dynamic Attention (LDA) module to enhance local detail feature response, thereby improving the model’s robustness in capturing anomalous regions. Finally, a Context-Aware Adaptive Multi-Scale Feature Fusion (CFFMS-FF) module is designed to constrain the consistency of local feature reconstruction. Experiments on the MVTec AD benchmark demonstrate the effectiveness of RD-RE, which achieves 99.0% pixel-level AUROC, 95.8% PRO, 78.3% AP, and 99.7% image-level AUROC, outperforming existing RD-based approaches. These results indicate that the integration of cross-stage fusion and local attention effectively mitigates the representation-consistency trade-off, providing a more robust solution for industrial anomaly localization. Full article
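
RD-RE builds on the standard reverse-distillation scoring rule, which is easy to state in code: the anomaly map is one minus the teacher-student cosine similarity, accumulated across feature scales. The feature shapes below are random stand-ins for illustration.

```python
# Sketch of reverse-distillation anomaly scoring from multi-scale features.
import torch
import torch.nn.functional as F

def rd_anomaly_map(teacher_feats, student_feats, out_size=(224, 224)):
    amap = torch.zeros(teacher_feats[0].size(0), 1, *out_size)
    for t, s in zip(teacher_feats, student_feats):
        sim = F.cosine_similarity(t, s, dim=1, eps=1e-6)   # (B, H, W)
        a = (1 - sim).unsqueeze(1)                         # high = anomalous
        amap += F.interpolate(a, size=out_size, mode="bilinear",
                              align_corners=False)
    return amap / len(teacher_feats)

t_feats = [torch.randn(2, c, r, r) for c, r in [(64, 56), (128, 28), (256, 14)]]
s_feats = [f + 0.1 * torch.randn_like(f) for f in t_feats]  # near-normal student
print(rd_anomaly_map(t_feats, s_feats).shape)               # (2, 1, 224, 224)
```
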
34 pages, 7143 KB  
Review
Knowledge Distillation in Object Detection: A Survey from CNN to Transformer
by Tahira Shehzadi, Rabya Noor, Ifza Ifza, Marcus Liwicki, Didier Stricker and Muhammad Zeshan Afzal
Sensors 2026, 26(1), 292; https://doi.org/10.3390/s26010292 - 2 Jan 2026
Abstract
Deep learning models, especially for object detection, have gained immense popularity in computer vision. These models have demonstrated remarkable accuracy and performance, driving advancements across various applications. However, the high computational complexity and large storage requirements of state-of-the-art object detection models pose significant challenges for deployment on resource-constrained devices like mobile phones and embedded systems. Knowledge Distillation (KD) has emerged as a prominent solution to these challenges, effectively compressing large, complex teacher models into smaller, efficient student models. This technique maintains good accuracy while significantly reducing model size and computational demands, making object detection models more practical for real-world applications. This survey provides a comprehensive review of KD-based object detection models developed in recent years. It offers an in-depth analysis of existing techniques, highlighting their novelty and limitations, and explores future research directions. The survey covers the different distillation algorithms used in object detection. It also examines extended applications of knowledge distillation in object detection, such as improvements for lightweight models, addressing catastrophic forgetting in incremental learning, and enhancing small object detection. Furthermore, the survey delves into the application of knowledge distillation in other domains such as image classification, semantic segmentation, 3D reconstruction, and document analysis. Full article
(This article belongs to the Section Sensing and Imaging)
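
For orientation, the canonical objective that most surveyed methods extend is Hinton et al.'s distillation loss: a convex combination of hard-label cross-entropy and a temperature-scaled KL term against the teacher, as in this short sketch.

```python
# The classic knowledge distillation loss (Hinton et al., 2015).
import torch
import torch.nn.functional as F

def hinton_kd_loss(student_logits, teacher_logits, labels, tau=4.0, alpha=0.7):
    soft = F.kl_div(F.log_softmax(student_logits / tau, dim=-1),
                    F.softmax(teacher_logits / tau, dim=-1),
                    reduction="batchmean") * tau**2   # tau^2 keeps gradient scale
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

s, t = torch.randn(8, 20), torch.randn(8, 20)
y = torch.randint(0, 20, (8,))
print(hinton_kd_loss(s, t, y).item())
```
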
19 pages, 1920 KB  
Article
Knowledge Distillation Meets Reinforcement Learning: A Cluster-Driven Approach to Image Processing
by Titinunt Kitrungrotsakul, Yingying Xu and Preeyanuch Srichola
Sensors 2026, 26(1), 209; https://doi.org/10.3390/s26010209 - 28 Dec 2025
Abstract
Knowledge distillation (KD) enables the training of lightweight yet effective models, particularly in the visual domain. Meanwhile, reinforcement learning (RL) facilitates adaptive learning through environment-driven interactions, addressing the limitations of KD in handling dynamic and complex tasks. We propose a novel two-stage framework integrating Knowledge Distillation with Reinforcement Learning (KDRL) to enhance model adaptability to complex data distributions, such as remote sensing and medical imaging. In the first stage, supervised fine-tuning guides the student model using logit- and feature-based distillation. The second stage refines the model via RL, leveraging confidence-based and cluster alignment rewards while dynamically reducing reliance on task loss. By combining the strengths of supervised knowledge distillation and reinforcement learning, KDRL provides a comprehensive approach to address the dual challenges of model efficiency and domain heterogeneity. A key innovation is the introduction of auxiliary layers within the student encoder to evaluate and reward the alignment of student features with the teacher’s cluster centers, promoting robust feature learning. Our framework demonstrates superior performance and computational efficiency across diverse tasks, establishing a scalable design for efficient model training. Across remote sensing benchmarks, KDRL boosts the lightweight CLIP/ViT-B-32 student to 69.51% zero-shot accuracy on AID and 80.08% on RESISC45; achieves state-of-the-art cross-modal retrieval on RSITMD with 67.44% (I→T) and 74.76% (T→I) at R@10; and improves DIOR-RSVG visual-grounding precision to 64.21% at Pr@0.9. These gains matter in real deployments by reducing missed targets and speeding analyst search on resource-constrained platforms. Full article
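
The cluster-alignment reward can be sketched under stated assumptions: the student earns reward proportional to the cosine similarity between its features and the nearest teacher cluster center. The reward shape and the random stand-in centers are illustrative, not the paper's exact formulation.

```python
# Sketch: reward student features that land near teacher cluster centers.
import torch
import torch.nn.functional as F

def cluster_alignment_reward(student_feats, teacher_centers):
    """Reward in [0, 1]: max cosine similarity to any teacher cluster center."""
    s = F.normalize(student_feats, dim=-1)       # (B, D)
    c = F.normalize(teacher_centers, dim=-1)     # (K, D)
    sims = s @ c.T                               # (B, K) cosine similarities
    return sims.max(dim=-1).values.clamp(min=0)  # best-matching center per sample

feats = torch.randn(16, 128)                     # student encoder outputs
centers = torch.randn(10, 128)                   # teacher centers (e.g. from k-means)
print(cluster_alignment_reward(feats, centers).mean().item())
```
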
23 pages, 9916 KB  
Article
Online Prototype Angular Balanced Self-Distillation for Non-Ideal Annotation in Remote Sensing Image Segmentation
by Hailun Liang, Haowen Zheng, Jing Huang, Hui Ma and Yanyan Liang
Remote Sens. 2026, 18(1), 22; https://doi.org/10.3390/rs18010022 - 22 Dec 2025
Abstract
This paper proposes an Online Prototype Angular Balanced Self-Distillation (OPAB) framework to address the challenges posed by non-ideal annotation in remote sensing image semantic segmentation. “Non-ideal annotation” typically refers to scenarios where long-tailed class distributions and label noise coexist in both training and testing sets. Existing methods often tackle these two issues separately, overlooking the conflict between noisy samples and minority classes as well as the unreliable early stopping caused by non-clean validation sets, which exacerbates the model’s tendency to memorize noisy samples. OPAB mitigates the imbalance problem by employing an improved bilateral-branch network (BBN) that integrates max-min angular regularization (MMA) and category-level inverse weighting to achieve balanced hyperspherical representations. The balanced hyperspherical representations further facilitate noise-clean sample separation and early stopping estimation based on large category-wise Local Intrinsic Dimensionality (LID). Moreover, OPAB introduces a bootstrap teacher label refinement strategy coupled with a student full-parameter retraining mechanism to avoid memorizing noisy samples. Experimental results on ISPRS datasets demonstrate that OPAB achieves a 2.0% mIoU improvement under non-ideal annotation conditions and achieves 89% mIoU after cross-set correction, showcasing strong robustness across different backbones and effective iterative calibration capability. Full article
(This article belongs to the Section AI Remote Sensing)
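
Max-min angular regularization admits a compact sketch: normalize the class prototypes onto the hypersphere and penalize, for each class, its largest cosine similarity to any other prototype, which pushes the minimum pairwise angle up. OPAB's category-level inverse weighting and the rest of the framework are omitted here.

```python
# Sketch of a max-min angular (MMA) regularizer on class prototypes.
import torch
import torch.nn.functional as F

def mma_regularizer(prototypes):
    """prototypes: (C, D) class weights; lower loss = more uniform angles."""
    w = F.normalize(prototypes, dim=-1)
    cos = w @ w.T - 2.0 * torch.eye(w.size(0))   # mask the diagonal (self-similarity)
    return cos.max(dim=-1).values.mean()         # penalize nearest neighbor only

protos = torch.randn(6, 32, requires_grad=True)
loss = mma_regularizer(protos)
loss.backward()                                  # gradient spreads the prototypes
print(loss.item())
```
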
19 pages, 2690 KB  
Article
Pattern Learning and Knowledge Distillation for Single-Cell Data Annotation
by Ming Zhang, Boran Ren and Xuedong Li
Biology 2026, 15(1), 2; https://doi.org/10.3390/biology15010002 - 19 Dec 2025
Abstract
Transferring cell type annotations from a reference dataset to a query dataset is a fundamental problem in AI-based single-cell data analysis. However, single-cell measurement techniques introduce domain gaps between batches or datasets. Existing deep learning methods give little consideration to batch integration when learning reference annotations, which is a challenge for cell type annotation across multiple query batches. For cell representation, batch integration can not only eliminate the gaps between batches or datasets but also improve the heterogeneity of cell clusters. In this study, we propose PLKD, a cell type annotation method based on pattern learning and knowledge distillation. PLKD consists of a Teacher (Transformer) and a Student (MLP). The Teacher groups all input genes (features) into different gene sets (patterns), each representing a specific biological function. This design enables the model to focus on interactions among biologically relevant functions rather than on gene-level expression, which is susceptible to batch gaps. In addition, knowledge distillation makes the lightweight Student resistant to noise, allowing it to infer quickly and robustly. Furthermore, PLKD supports multi-modal cell type annotation, multi-modal integration, and other tasks. Benchmark experiments demonstrate that PLKD achieves accurate and robust cell type annotation. Full article
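
A toy sketch of the pattern step as described: member-gene expression is pooled into one token per gene set, so downstream attention operates over biological functions rather than individual genes. The random gene-set assignment below is purely illustrative; real gene sets would come from curated pathway databases.

```python
# Sketch: pool gene expression into gene-set ("pattern") tokens.
import torch

def genes_to_patterns(expr, gene_sets):
    """expr: (cells, genes); gene_sets: list of index tensors, one per pattern."""
    # Mean-pool member-gene expression into one token per biological pattern.
    return torch.stack([expr[:, idx].mean(dim=1) for idx in gene_sets], dim=1)

expr = torch.rand(32, 2000)                              # 32 cells x 2000 genes
gene_sets = [torch.randint(0, 2000, (50,)) for _ in range(64)]  # 64 fake patterns
tokens = genes_to_patterns(expr, gene_sets)              # (32, 64) pattern tokens
print(tokens.shape)
```
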
22 pages, 3733 KB  
Article
LightEdu-Net: Noise-Resilient Multimodal Edge Intelligence for Student-State Monitoring in Resource-Limited Environments
by Chenjia Huang, Yanli Chen, Bocheng Zhou, Xiuqi Cai, Ziying Zhai, Jiarui Zhang and Yan Zhan
Sensors 2025, 25(24), 7529; https://doi.org/10.3390/s25247529 - 11 Dec 2025
Abstract
Multimodal perception for student-state monitoring is difficult to deploy in rural classrooms because sensors are noisy and computing resources are highly constrained. This work targets these challenges by enabling noise-resilient, multimodal, real-time student-state recognition on low-cost edge devices. We propose LightEdu-Net, a sensor-noise-adaptive Transformer-based multimodal network that integrates visual, physiological, and environmental signals in a unified lightweight architecture. The model incorporates three key components: a sensor noise adaptive module (SNAM) to suppress degraded sensor inputs, a cross-modal attention fusion module (CMAF) to capture complementary temporal dependencies across modalities, and an edge-aware knowledge distillation module (EAKD) to transfer knowledge from high-capacity teachers to an embedded-friendly student network. We construct a multimodal behavioral dataset from several rural schools and formulate student-state recognition as a multimodal classification task with explicit evaluation of noise robustness and edge deployability. Experiments show that LightEdu-Net achieves 92.4% accuracy with an F1-score of 91.4%, outperforming representative lightweight CNN and Transformer baselines. Under a noise level of 0.3, accuracy drops by only 1.1%, indicating strong robustness to sensor degradation. Deployment experiments further show that the model operates in real time on Jetson Nano with a latency of 42.8 ms (23.4 FPS) and maintains stable high accuracy on Raspberry Pi 4B and Intel NUC platforms. Beyond technical performance, the proposed system provides a low-cost and quantifiable mechanism for capturing fine-grained learning process indicators, offering new data support for educational economics studies on instructional efficiency and resource allocation in underdeveloped regions. Full article
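
The noise-suppression idea behind SNAM can be sketched as a per-modality quality gate: a small scorer downweights degraded streams before fusion. The sigmoid scorer and the modality dimensions are assumptions for illustration, not the paper's module.

```python
# Sketch: gate each modality by an estimated quality score before fusion.
import torch
import torch.nn as nn

class NoiseAdaptiveGate(nn.Module):
    def __init__(self, dims):                  # dims: feature size per modality
        super().__init__()
        self.scorers = nn.ModuleList(
            [nn.Sequential(nn.Linear(d, 1), nn.Sigmoid()) for d in dims])

    def forward(self, feats):                  # feats: list of (B, d_i) tensors
        gated = [s(f) * f for s, f in zip(self.scorers, feats)]  # suppress noise
        return torch.cat(gated, dim=-1)        # fused multimodal representation

gate = NoiseAdaptiveGate(dims=[128, 32, 16])   # visual, physiological, environmental
vis, phys, env = torch.randn(4, 128), torch.randn(4, 32), torch.randn(4, 16)
print(gate([vis, phys, env]).shape)            # (4, 176)
```
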
41 pages, 2260 KB  
Article
Development of a Knowledge-Distillation-Based Breast Cancer Classifier for LMICs: Comparison with Pruning and Quantization
by Falmata Modu, Rajesh Prasad and Farouq Aliyu
Electronics 2025, 14(24), 4842; https://doi.org/10.3390/electronics14244842 - 9 Dec 2025
Abstract
Breast cancer (BC) mortality rates remain high in Low- and Middle-Income Countries (LMICs) due to limited awareness, poverty, and inadequate medical facilities that hinder early detection. Although deep learning models have achieved high accuracy in BC detection (BCD), they require substantial computational resources, making them unsuitable for deployment in remote or rural areas. This study proposes a lightweight convolutional neural network (CNN) using Knowledge Distillation (KD) for BCD, where a large Teacher Model (TM) transfers learned representations to a smaller Student Model (SM), which is better suited for deployment on low-power devices. We compare it with two prominent model compression techniques: pruning and quantization. Experimental results indicate that the TensorFlow Lite (TFLite)-optimized Student Model (SM_TFLite) achieved 97.67% accuracy, representing a 2.33% relative loss to its teacher, a result comparable to other compression techniques. Its mean accuracy is 73.97% with a 95% Confidence Interval of [65.04%, 82.90%] in a cross-dataset experiment. However, SM_TFLite was the most compact (5.21 kB) and fastest (3.3 ms latency), outperforming both pruned (2924.31 kB, 13.68 ms) and quantized models (746–751 kB, 4–5 ms). Evaluation on a Raspberry Pi 4 Model B demonstrated that all models exhibited similar CPU and memory usage, with SM_TFLite causing only a minor increase in device temperature. These results demonstrate that KD combined with TFLite conversion offers the best trade-off between accuracy, compactness, and speed. Full article
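
The deployment path compared in the paper can be sketched with TensorFlow's TFLite converter API. The tiny untrained Keras student below is illustrative, and the distillation training loop that would supply its weights is omitted for brevity.

```python
# Sketch: build a small Keras student and convert it with TensorFlow Lite.
import tensorflow as tf

student = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 1)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation="softmax"),   # benign vs. malignant
])

converter = tf.lite.TFLiteConverter.from_keras_model(student)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # default quantization
tflite_bytes = converter.convert()
print(f"TFLite model size: {len(tflite_bytes) / 1024:.1f} kB")
```
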