Search Results (4,156)

Search Parameters:
Keywords = pretrained models

21 pages, 1284 KB  
Article
Probabilistic Indoor 3D Object Detection from RGB-D via Gaussian Distribution Estimation
by Hyeong-Geun Kim
Mathematics 2026, 14(3), 421; https://doi.org/10.3390/math14030421 - 26 Jan 2026
Abstract
Conventional object detectors represent each object by a deterministic bounding box, regressing its center and size from RGB images. However, such discrete parameterization ignores the inherent uncertainty in object appearance and geometric projection, which can be more naturally modeled as a probabilistic density field. Recent works have introduced Gaussian-based formulations that treat objects as distributions rather than boxes, yet they remain limited to 2D images or require late fusion between image and depth modalities. In this paper, we propose a unified Gaussian-based framework for direct 3D object detection from RGB-D inputs. Our method is built upon a vision transformer backbone to effectively capture global context. Instead of separately embedding RGB and depth features or refining depth within region proposals, our method takes a full four-channel RGB-D tensor and predicts the mean and covariance of a 3D Gaussian distribution for each object in a single forward pass. We extend a pretrained vision transformer to accept four-channel inputs by augmenting the patch embedding layer while preserving ImageNet-learned representations. This formulation allows the detector to represent both object location and geometric uncertainty in 3D space. By optimizing divergence metrics such as the Kullback–Leibler or Bhattacharyya distances between predicted and target distributions, the network learns a physically consistent probabilistic representation of objects. Experimental results on the SUN RGB-D benchmark demonstrate that our approach achieves competitive performance compared to state-of-the-art point-cloud-based methods while offering uncertainty-aware and geometrically interpretable 3D detections. Full article
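For illustration, the four-channel patch-embedding extension described in the abstract might look like the following PyTorch sketch. It assumes a timm-style ViT (patch embedding at model.patch_embed.proj); the paper's exact backbone and depth-channel initialization are not specified, so seeding the fourth channel from the mean of the RGB filters is an assumption.

```python
import torch
import torch.nn as nn
import timm

# Extend a pretrained ViT patch embedding from 3 to 4 input channels while
# preserving the ImageNet-learned RGB filters (a minimal sketch).
model = timm.create_model("vit_base_patch16_224", pretrained=True)

old = model.patch_embed.proj                       # Conv2d(3, D, 16, 16)
new = nn.Conv2d(4, old.out_channels,
                kernel_size=old.kernel_size, stride=old.stride)
with torch.no_grad():
    new.weight[:, :3] = old.weight                 # keep pretrained RGB filters
    new.weight[:, 3:] = old.weight.mean(1, keepdim=True)  # assumed depth init
    new.bias.copy_(old.bias)
model.patch_embed.proj = new

rgbd = torch.randn(1, 4, 224, 224)                 # four-channel RGB-D tensor
tokens = model.forward_features(rgbd)              # single forward pass
```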

26 pages, 2618 KB  
Article
A Cascaded Batch Bayesian Yield Optimization Method for Analog Circuits via Deep Transfer Learning
by Ziqi Wang, Kaisheng Sun and Xiao Shi
Electronics 2026, 15(3), 516; https://doi.org/10.3390/electronics15030516 - 25 Jan 2026
Abstract
In nanometer integrated-circuit (IC) manufacturing, advanced technology scaling has intensified the effects of process variations on circuit reliability and performance. Random fluctuations in parameters such as threshold voltage, channel length, and oxide thickness further degrade design margins and increase the likelihood of functional failures. These variations often lead to rare circuit failure events, underscoring the importance of accurate yield estimation and robust design methodologies. Conventional Monte Carlo yield estimation is computationally infeasible as millions of simulations are required to capture failure events with extremely low probability. This paper presents a novel reliability-based circuit design optimization framework that leverages deep transfer learning to improve the efficiency of repeated yield analysis in optimization iterations. Based on pre-trained neural network models from prior design knowledge, we utilize model fine-tuning to accelerate importance sampling (IS) for yield estimation. To improve estimation accuracy, adversarial perturbations are introduced to calibrate uncertainty near the model decision boundary. Moreover, we propose a cascaded batch Bayesian optimization (CBBO) framework that incorporates a smart initialization strategy and a localized penalty mechanism, guiding the search process toward high-yield regions while satisfying nominal performance constraints. Experimental validation on SRAM circuits and amplifiers reveals that CBBO achieves a computational speedup of 2.02×–4.63× over state-of-the-art (SOTA) methods, without compromising accuracy or robustness. Full article
(This article belongs to the Topic Advanced Integrated Circuit Design and Application)
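For reference, the importance-sampling step the abstract accelerates can be sketched in a few lines of NumPy on a toy failure region; the paper's NN-guided proposal and fine-tuning are not shown, and the shifted-Gaussian proposal and failure indicator below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def fails(x):                          # stand-in for a SPICE failure check
    return x.sum(axis=1) > 8.0         # rare event under the nominal N(0, I)

d, n = 4, 20000
shift = np.full(d, 2.0)                # proposal mean pushed toward failures
x = rng.standard_normal((n, d)) + shift
# log likelihood ratio log p(x) - log q(x) for N(0, I) vs. N(shift, I)
log_w = -0.5 * (x ** 2).sum(1) + 0.5 * ((x - shift) ** 2).sum(1)
p_fail = np.mean(np.exp(log_w) * fails(x))
print(f"estimated failure probability: {p_fail:.2e}")   # yield = 1 - p_fail
```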

18 pages, 321 KB  
Article
Instruction-Tuned Decoder-Only Large Language Models for Efficient Extreme Summarization on Consumer-Grade GPUs
by Attia Fathalla Elatiky, Ahmed M. Hamad, Heba Khaled and Mahmoud Fayez
Algorithms 2026, 19(2), 96; https://doi.org/10.3390/a19020096 - 25 Jan 2026
Abstract
Extreme summarization generates very short summaries, typically a single sentence, answering the question “What is the document about?”. Although large language models perform well in text generation, fine-tuning them for summarization often requires substantial computational resources that are unavailable to many researchers. In this study, we present an effective method for instruction-tuning open decoder-only large language models under limited GPU resources. The proposed approach combines parameter-efficient fine-tuning techniques, such as Low-Rank Adaptation (LoRA), with quantization to reduce memory requirements, enabling training on a single consumer-grade GPU. We fine-tuned a pre-trained decoder-only model on the XSum dataset using an instruction-following format. Experimental results demonstrate that the proposed decoder-only approach achieves competitive performance on the XSum dataset under strict GPU memory constraints. On the full test set, the proposed 2G–1R pipeline attains ROUGE-1/2/L F1 scores of 46.0/22.0/37.0 and a BERTScore F1 of 0.917, outperforming the individual generator models in lexical overlap and semantic similarity. Evaluation was conducted using traditional overlap-based metrics (ROUGE) and semantic metrics, including BERTScore and G-Eval. While remaining competitive in ROUGE compared to strong encoder–decoder baselines, the pipeline consistently produces summaries with higher semantic quality. These findings demonstrate that large decoder-only language models can be efficiently fine-tuned for extreme summarization on limited consumer-grade hardware without sacrificing output quality. Full article
(This article belongs to the Topic Applications of NLP, AI, and ML in Software Engineering)
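The LoRA-plus-quantization recipe the abstract describes can be sketched with the Hugging Face peft and bitsandbytes integrations. The checkpoint name, rank, and target modules below are illustrative assumptions, not the paper's reported configuration.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load a decoder-only model in 4-bit and attach low-rank adapters.
bnb = BitsAndBytesConfig(load_in_4bit=True,
                         bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    "your-decoder-only-checkpoint",        # placeholder model id
    quantization_config=bnb, device_map="auto")
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()         # only the low-rank adapters train
```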

26 pages, 712 KB  
Article
Comparing Multi-Scale and Pipeline Models for Speaker Change Detection
by Alymzhan Toleu, Gulmira Tolegen and Bagashar Zhumazhanov
Acoustics 2026, 8(1), 5; https://doi.org/10.3390/acoustics8010005 - 25 Jan 2026
Abstract
Speaker change detection (SCD) in long, multi-party meetings is essential for diarization, automatic speech recognition (ASR), and summarization, and is now often performed in the space of pre-trained speech embeddings. However, unsupervised approaches remain dominant when timely labeled audio is scarce, and their behavior under a unified modeling setup is still not well understood. In this paper, we systematically compare two representative unsupervised approaches on a multi-talker audio meeting corpus: (i) a clustering-based pipeline that segments and clusters embeddings/features and scores boundaries via cluster changes and jump magnitude, and (ii) a multi-scale jump-based detector that measures embedding discontinuities at several window lengths and fuses them via temporal clustering and voting. Using a shared front-end and protocol, we vary the underlying features (ECAPA, WavLM, wav2vec 2.0, MFCC, and log-Mel) and test the models' robustness under additive noise. The results show that embedding choice is crucial and that the two methods offer complementary trade-offs: the pipeline yields low false alarm rates but higher misses, while the multi-scale detector achieves relatively high recall at the cost of many false alarms. Full article
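The multi-scale jump idea can be sketched minimally in NumPy: cosine distance between the mean embeddings of adjacent windows, thresholded per scale and fused by majority vote. Window lengths and threshold are illustrative assumptions; the paper's temporal-clustering fusion is more elaborate.

```python
import numpy as np

def jump_scores(emb, win):
    """Cosine distance between mean embeddings of adjacent windows."""
    scores = np.zeros(len(emb))
    for t in range(win, len(emb) - win):
        a = emb[t - win:t].mean(axis=0)
        b = emb[t:t + win].mean(axis=0)
        scores[t] = 1.0 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    return scores

emb = np.random.randn(1000, 192)               # frame-level speaker embeddings
votes = sum(jump_scores(emb, w) > 0.5 for w in (25, 50, 100))
changes = np.where(votes >= 2)[0]              # boundaries agreed by >= 2 scales
```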

23 pages, 2628 KB  
Article
Scattering-Based Self-Supervised Learning for Label-Efficient Cardiac Image Segmentation
by Serdar Alasu and Muhammed Fatih Talu
Electronics 2026, 15(3), 506; https://doi.org/10.3390/electronics15030506 - 24 Jan 2026
Abstract
Deep learning models based on supervised learning rely heavily on large annotated datasets; in medical image segmentation in particular, the requirement for pixel-level annotations makes the labeling process labor-intensive, time-consuming, and expensive. To overcome these limitations, self-supervised learning (SSL) has emerged as a promising alternative that learns generalizable representations from unlabeled data; however, existing SSL frameworks often employ highly parameterized encoders that are computationally expensive and may lack robustness in label-scarce settings. In this work, we propose a scattering-based SSL framework that integrates Wavelet Scattering Networks (WSNs) and Parametric Scattering Networks (PSNs) into a Bootstrap Your Own Latent (BYOL) pretraining pipeline. By replacing the initial stages of the BYOL encoder with fixed or learnable scattering-based front-ends, the proposed method reduces the number of learnable parameters while embedding translation-invariant representations, stable to small deformations, into the SSL pipeline. The pretrained encoders are transferred to a U-Net and fine-tuned for cardiac image segmentation on two datasets with different imaging modalities, namely, cardiac cine MRI (ACDC) and cardiac CT (CHD), under varying amounts of labeled data. Experimental results show that scattering-based SSL pretraining consistently improves segmentation performance over random initialization and ImageNet pretraining in low-label regimes, with particularly pronounced gains when only a few labeled patients are available. Notably, the PSN variant achieves improvements of 4.66% and 2.11% in average Dice score over standard BYOL with only 5 and 10 labeled patients, respectively, on the ACDC dataset. These results demonstrate that integrating mathematically grounded scattering representations into SSL pipelines provides a robust and data-efficient initialization strategy for cardiac image segmentation, particularly under limited annotation and domain shift. Full article
(This article belongs to the Section Artificial Intelligence)
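For reference, the BYOL objective and exponential-moving-average target update at the heart of the pretraining pipeline look as follows in PyTorch; this is a generic sketch of standard BYOL with placeholder networks, not the paper's scattering-augmented encoder.

```python
import torch
import torch.nn.functional as F

def byol_loss(p_online, z_target):
    """Normalized MSE between online predictions and target projections."""
    p = F.normalize(p_online, dim=-1)
    z = F.normalize(z_target, dim=-1)
    return (2 - 2 * (p * z).sum(dim=-1)).mean()

@torch.no_grad()
def ema_update(target, online, tau=0.996):
    """Slow-moving target: EMA of the online network's parameters."""
    for pt, po in zip(target.parameters(), online.parameters()):
        pt.mul_(tau).add_(po, alpha=1 - tau)

online = torch.nn.Linear(128, 64)              # stand-ins for the two networks
target = torch.nn.Linear(128, 64)
target.load_state_dict(online.state_dict())
loss = byol_loss(online(torch.randn(8, 128)), target(torch.randn(8, 128)))
ema_update(target, online)
```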

23 pages, 2066 KB  
Article
Intelligent Attention-Driven Deep Learning for Hip Disease Diagnosis: Fusing Multimodal Imaging and Clinical Text for Enhanced Precision and Early Detection
by Jinming Zhang, He Gong, Pengling Ren, Shuyu Liu, Zhengbin Jia, Lizhen Wang and Yubo Fan
Medicina 2026, 62(2), 250; https://doi.org/10.3390/medicina62020250 - 24 Jan 2026
Abstract
Background: Hip joint disorders exhibit diverse and overlapping radiological features, complicating early diagnosis and limiting the diagnostic value of single-modality imaging. Isolated imaging or clinical data may therefore inadequately represent disease-specific pathological characteristics. Methods: This retrospective study included 605 hip joints from Center A (2018–2024), comprising normal hips, osteoarthritis, osteonecrosis of the femoral head (ONFH), and femoroacetabular impingement (FAI). An independent cohort of 24 hips from Center B (2024–2025) was used for external validation. A multimodal deep learning framework was developed to jointly analyze radiographs, CT volumes, and clinical texts. Features were extracted using ResNet50, 3D-ResNet50, and a pretrained BERT model, followed by attention-based fusion for four-class classification. Results: The combined Clinical+X-ray+CT model achieved an AUC of 0.949 on the internal test set, outperforming all single-modality models. Improvements were consistently observed in accuracy, sensitivity, specificity, and decision curve analysis. Grad-CAM visualizations confirmed that the model attended to clinically relevant anatomical regions. Conclusions: Attention-based multimodal feature fusion substantially improves diagnostic performance for hip joint diseases, providing an interpretable and clinically applicable framework for early detection and precise classification in orthopedic imaging. Full article
(This article belongs to the Special Issue Artificial Intelligence in Medicine: Shaping the Future of Healthcare)
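One common way to realize attention-based fusion of the three modality features is a learned query attending over per-modality tokens; the sketch below is a generic PyTorch illustration with assumed dimensions, not the paper's exact fusion module.

```python
import torch
import torch.nn as nn

# Fuse X-ray, CT, and clinical-text features with a single learned query.
d, batch = 256, 2
attn = nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True)
query = nn.Parameter(torch.zeros(1, 1, d))

xray = torch.randn(batch, 1, d)     # e.g. ResNet50 feature
ct = torch.randn(batch, 1, d)       # e.g. 3D-ResNet50 feature
text = torch.randn(batch, 1, d)     # e.g. BERT [CLS] feature
tokens = torch.cat([xray, ct, text], dim=1)

fused, weights = attn(query.expand(batch, -1, -1), tokens, tokens)
logits = nn.Linear(d, 4)(fused.squeeze(1))   # four-class classification head
```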

16 pages, 1428 KB  
Article
StrDiSeg: Adapter-Enhanced DINOv3 for Automated Ischemic Stroke Lesion Segmentation
by Qiong Chen, Donghao Zhang, Yimin Chen, Siyuan Zhang, Yue Sun, Fabiano Reis, Li M. Li, Li Yuan, Huijuan Jin and Wu Qiu
Bioengineering 2026, 13(2), 133; https://doi.org/10.3390/bioengineering13020133 - 23 Jan 2026
Abstract
Deep vision foundation models such as DINOv3 offer strong visual representation capacity, but their direct deployment in medical image segmentation remains difficult due to the limited availability of annotated clinical data and the computational cost of full fine-tuning. This study proposes an adaptation framework called StrDiSeg that integrates lightweight bottleneck adapters between selected transformer layers of DINOv3, enabling task-specific learning while preserving pretrained knowledge. An attention-enhanced U-Net decoder with multi-scale feature fusion further refines the representations. Experiments were performed on two publicly available ischemic stroke lesion segmentation datasets—AISD (Non Contrast CT) and ISLES22 (DWI). The proposed method achieved Dice scores of 0.516 on AISD and 0.824 on ISLES22, outperforming baseline models and demonstrating strong robustness across different clinical imaging modalities. These results indicate that adapter-based fine-tuning provides a practical and computationally efficient strategy for leveraging large pretrained vision models in medical image segmentation. Full article
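The bottleneck-adapter pattern the paper builds on is compact enough to show in full. Below is a generic PyTorch sketch; the hidden width and zero-initialized up-projection (so the adapted block starts as an identity) are common conventions assumed here, not taken from the paper.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Down-project, non-linearity, up-project, residual connection."""
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()
        nn.init.zeros_(self.up.weight)    # identity at step 0: pretrained
        nn.init.zeros_(self.up.bias)      # behavior is preserved initially

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))

x = torch.randn(2, 196, 768)              # e.g. frozen transformer tokens
print(BottleneckAdapter(768)(x).shape)    # torch.Size([2, 196, 768])
```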

22 pages, 1462 KB  
Article
Effects of Window and Batch Size on Autoencoder-LSTM Models for Remaining Useful Life Prediction
by Eugene Jeon, Donghwan Jin and Yeonhee Kim
Machines 2026, 14(2), 135; https://doi.org/10.3390/machines14020135 - 23 Jan 2026
Abstract
Remaining useful life (RUL) prediction is central to predictive maintenance, but acquiring sufficient run-to-failure data remains challenging. To better exploit limited labeled data, this study investigates a pipeline combining an unsupervised autoencoder (AE) and supervised LSTM regression on the NASA C-MAPSS dataset. Building on an AE-LSTM baseline, we analyze how window size and batch size affect accuracy and training efficiency. Using the FD001 and FD004 subsets with training-capped RUL labels, we perform multi-seed experiments over a wide grid of window lengths and batch sizes. The AE is pre-trained on normalized sensor streams and reused as a feature extractor, while the LSTM head is trained with early stopping. Performance was assessed using RMSE, C-MAPSS score, and training time, reporting 95% confidence intervals. Results show that fine-tuning the encoder with a batch size of 128 yielded the best mean RMSE of 13.99 (FD001) and 28.67 (FD004). We obtained stable optimal window ranges (40–70 for FD001; 60–80 for FD004) and found that batch sizes of 64–256 offer the best accuracy–efficiency trade-off. These optimal ranges were further validated using Particle Swarm Optimization (PSO). These findings offer practical recommendations for tuning AE-LSTM-based RUL prediction models and demonstrate that performance remains stable within specific hyperparameter ranges. Full article
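A minimal sketch of the windowing and label-capping step on a single engine unit follows; the cap value of 125 is the common C-MAPSS convention and an assumption here, as are the toy dimensions.

```python
import numpy as np

def make_windows(signal, rul, win, cap=125):
    """Slice a (T, F) sensor stream into (N, win, F) windows with capped RUL labels."""
    X, y = [], []
    for end in range(win, len(signal) + 1):
        X.append(signal[end - win:end])        # (win, n_sensors) slice
        y.append(min(rul[end - 1], cap))       # piecewise-linear RUL label
    return np.stack(X), np.array(y)

T, F = 200, 14                                 # 200 cycles, 14 sensors
sig = np.random.randn(T, F)
rul = np.arange(T)[::-1]                       # counts down to failure
X, y = make_windows(sig, rul, win=50)
print(X.shape, y.shape)                        # (151, 50, 14) (151,)
```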
17 pages, 3892 KB  
Article
Transformer-Driven Semi-Supervised Learning for Prostate Cancer Histopathology: A DINOv2–TransUNet Framework
by Rubina Akter Rabeya, Jeong-Wook Seo, Nam Hoon Cho, Hee-Cheol Kim and Heung-Kook Choi
Mach. Learn. Knowl. Extr. 2026, 8(2), 26; https://doi.org/10.3390/make8020026 - 23 Jan 2026
Abstract
Prostate cancer is diagnosed through a comprehensive study of histopathology slides, which takes time and requires professional interpretation. To reduce this burden, we developed a semi-supervised learning technique that combines transformer-based representation learning and a custom TransUNet classifier. To capture a wide range of morphological structures without manual annotation, our method pretrains DINOv2 on 10,000 unlabeled prostate tissue patches. After receiving the transformer-derived features, a bespoke CNN-based decoder uses residual upsampling and carefully constructed skip connections to merge information from many spatial scales. Expert pathologists annotated only 20% of the patches in the whole dataset; the remaining unlabeled samples were exploited through a consistency-driven learning method that promotes reliable predictions across varied augmentations. The model achieved precision and recall scores of 91.81% and 89.02%, respectively, and an accuracy of 93.78% on an additional test set. These results exceed the performance of a conventional U-Net and a baseline encoder–decoder network. Overall, the combination of localized CNN (Convolutional Neural Network) decoding and global transformer attention provides a reliable method for prostate cancer classification in settings with little annotated data. Full article
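The consistency-driven use of unlabeled patches can be sketched as follows. This is one standard instantiation (matching softened predictions across two augmented views), assumed for illustration rather than taken from the paper.

```python
import torch
import torch.nn.functional as F

def consistency_loss(model, view_a, view_b):
    """Encourage stable predictions across two augmentations of one patch."""
    with torch.no_grad():
        target = model(view_a).softmax(dim=-1)   # stop-gradient view
    pred = model(view_b).softmax(dim=-1)
    return F.mse_loss(pred, target)

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 64 * 64, 2))
x = torch.randn(8, 3, 64, 64)                    # unlabeled patches
loss = consistency_loss(model, x + 0.1 * torch.randn_like(x), x)
```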

18 pages, 2210 KB  
Article
SPINET-KSP: A Multi-Modal LLM-Graph Foundation Model for Contextual Prediction of Kinase-Substrate-Phosphatase Triads
by Michael Olaolu Arowolo, Marian Emmanuel Okon, Davis Austria, Muhammad Azam and Sulaiman Olaniyi Abdulsalam
Kinases Phosphatases 2026, 4(1), 3; https://doi.org/10.3390/kinasesphosphatases4010003 - 22 Jan 2026
Abstract
Reversible protein phosphorylation is an important regulatory mechanism in cellular signalling and disease, governed by the opposing actions of kinases and phosphatases. Current computational methods predict kinase–substrate or phosphatase–substrate interactions in isolation and lack specificity for biological conditions, neglecting triadic regulation. We present SPINET-KSP, a multi-modal LLM–Graph foundation model engineered for the prediction of kinase–substrate–phosphatase (KSP) triads with contextual awareness. SPINET-KSP integrates high-confidence interactomes (SIGNOR, BioGRID, STRING), structural contacts obtained from AlphaFold3, ESM-3 sequence embeddings, and a 512-dimensional cell-state manifold with 1612 quantitative phosphoproteomic conditions. A heterogeneous KSP graph is analysed utilising a cross-attention Graphormer with Reversible Triad Attention to mimic kinase–phosphatase antagonism. SPINET-KSP, pre-trained on 3.41 million validated phospho-sites utilising masked phosphorylation modelling and contrastive cell-state learning, achieves an AUROC of 0.852 for kinase-family classification (sensitivity 0.821, specificity 0.834, MCC 0.655) and a Pearson correlation coefficient of 0.712 for phospho-occupancy prediction. On distinct 2025 mass spectrometry datasets, it identifies 72% of acknowledged cancer-resistance triads within the top 10 rankings and uncovers 247 supplementary triads validated using orthogonal proteomics. SPINET-KSP is the first foundational model for simulating context-dependent reversible phosphorylation, enabling the targeting of dysregulated kinase–phosphatase pathways in diseases. Full article

17 pages, 5486 KB  
Article
Enhancing Parameter-Efficient Code Representations with Retrieval and Structural Priors
by Shihao Zheng, Yong Li and Xiang Ma
Appl. Sci. 2026, 16(2), 1106; https://doi.org/10.3390/app16021106 - 21 Jan 2026
Abstract
High-quality code representations are fundamental to code intelligence. Achieving such representations with parameter-efficient fine-tuning (PEFT) remains a key challenge. While code pre-trained models (CodePTMs) offer a robust foundation for general-purpose embeddings, current PEFT approaches face two main obstacles when adapting them: (i) they fail to adequately capture the deep structural characteristics of programs, and (ii) they are limited by the model's finite internal parameters, restricting their ability to overcome inherent knowledge bottlenecks. To address these challenges, we introduce RS-Rep, a parameter-efficient code representation learning framework that combines retrieval augmentation with structure-aware priors. Our framework features three complementary, lightweight modules: first, a structure–semantic dual-channel retrieval mechanism that infuses high-quality external code knowledge as non-parametric memory to alleviate the knowledge bottleneck; second, a graph relative bias module that strengthens the attention mechanism's capacity to model structural relationships within programs; and third, a span-discriminative contrastive objective that sharpens the distinctiveness and boundary clarity of span-level representations. Extensive experiments on three benchmarks spanning six programming languages show that our method consistently outperforms state-of-the-art parameter-efficient baselines. Notably, on structure-sensitive tasks using the PLBART backbone, RS-Rep surpasses full fine-tuning, delivering a 22.1% improvement in Exact Match for code generation and a 4.4% increase in BLEU scores for code refinement, all while utilizing only about 5% of the trainable parameters. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
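The graph relative bias idea, injecting a structural prior directly into the attention logits, can be sketched generically; the shortest-path bucketing and dimensions below are illustrative assumptions, not the paper's parameterization.

```python
import torch

# A learned bias, indexed by a structural relation between tokens (e.g.
# bucketed AST shortest-path distance), is added to the attention scores.
B, H, N, dh = 2, 8, 64, 32
q, k, v = (torch.randn(B, H, N, dh) for _ in range(3))

dist = torch.randint(0, 8, (N, N))            # structural relation buckets
bias_table = torch.nn.Embedding(8, H)
bias = bias_table(dist).permute(2, 0, 1)      # (H, N, N), broadcast over batch

scores = q @ k.transpose(-2, -1) / dh ** 0.5 + bias   # structural prior added
out = scores.softmax(-1) @ v
```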

41 pages, 2850 KB  
Article
Automated Classification of Humpback Whale Calls Using Deep Learning: A Comparative Study of Neural Architectures and Acoustic Feature Representations
by Jack C. Johnson and Yue Rong
Sensors 2026, 26(2), 715; https://doi.org/10.3390/s26020715 - 21 Jan 2026
Abstract
Passive acoustic monitoring (PAM) using hydrophones enables acoustic data to be collected in large and diverse quantities, necessitating a reliable automated classification system. This paper presents a data-processing pipeline and a set of neural networks designed for a humpback-whale-detection system. A collection of audio segments is compiled from publicly available audio repositories and extensively curated by hand, with thorough examination, editing, and clipping to produce a dataset that minimizes bias and categorization errors. An array of standard data-augmentation techniques is applied to the collected audio, diversifying and expanding the original dataset. Multiple neural networks are designed and trained using the TensorFlow 2.20.0 and Keras 3.13.1 frameworks, resulting in a custom architecture layout refined through research and iterative improvement. The pre-trained model MobileNetV2 is also included for further analysis. Model performance demonstrates a strong dependence on both feature representation and network architecture. Mel spectrogram inputs consistently outperformed MFCC (Mel-Frequency Cepstral Coefficients) features across all model types. The highest performance was achieved by the pretrained MobileNetV2 using mel spectrograms without augmentation, reaching a test accuracy of 99.01% with balanced precision and recall of 99% and a Matthews correlation coefficient of 0.98. The custom CNN with mel spectrograms also achieved strong performance, with 98.92% accuracy and a false negative rate of only 0.75%. In contrast, models trained with MFCC representations exhibited consistently lower robustness and higher false negative rates. These results highlight the comparative strengths of the evaluated feature representations and network architectures for humpback whale detection. Full article
(This article belongs to the Section Sensor Networks)
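The mel-spectrogram front-end that the results favor takes only a few lines with librosa; the sampling rate, FFT size, hop length, and mel-band count below are illustrative assumptions, not the paper's settings.

```python
import numpy as np
import librosa

sr = 16000
y = np.random.randn(sr * 5).astype(np.float32)   # stand-in for a 5 s clip
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                     hop_length=256, n_mels=64)
log_mel = librosa.power_to_db(mel, ref=np.max)   # log-mel image fed to the CNN
print(log_mel.shape)                             # (64, n_frames)
```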

28 pages, 1241 KB  
Article
Joint Learning for Metaphor Detection and Interpretation Based on Gloss Interpretation
by Yanan Liu, Hai Wan and Jinxia Lin
Electronics 2026, 15(2), 456; https://doi.org/10.3390/electronics15020456 - 21 Jan 2026
Abstract
Metaphor is ubiquitous in daily communication and makes language expression more vivid. Identifying metaphorical words, known as metaphor detection, is crucial for capturing the real meaning of a sentence. As an important step in metaphorical understanding, the correct interpretation of metaphorical words directly affects metaphor detection. This article investigates how to use metaphor interpretation to enhance metaphor detection. Since previous approaches to metaphor interpretation are coarse-grained or constrained by the ambiguous meanings of substitute words, we propose a different interpretation mechanism that explains metaphorical words by means of gloss-based interpretations. To comprehensively explore the optimal joint strategy, we go beyond previous work by designing diverse model architectures. We investigate both classification and sequence labeling paradigms, incorporating distinct component designs based on MIP and SPV theories. Furthermore, we integrate Part-of-Speech tags and external knowledge to further refine the feature representation. All methods utilize pre-trained language models to encode text and capture its semantic information. Because this mechanism involves both metaphor detection and metaphor interpretation, and datasets annotated for both tasks are lacking, we have enhanced three datasets with glosses for metaphor detection: one Chinese dataset (PSUCMC) and two English datasets (TroFi and VUA). Experimental results demonstrate that the proposed joint methods are superior to or at least comparable to state-of-the-art methods on the three enhanced datasets. The results confirm that joint learning of metaphor detection and gloss-based interpretation makes metaphor detection more accurate. Full article
(This article belongs to the Section Artificial Intelligence)

21 pages, 8669 KB  
Article
LLM4FB: A One-Sided CSI Feedback and Prediction Framework for Lightweight UEs via Large Language Models
by Xinxin Xie, Xinyu Ning, Yitong Liu, Hanning Wang, Jing Jin and Hongwen Yang
Sensors 2026, 26(2), 691; https://doi.org/10.3390/s26020691 - 20 Jan 2026
Abstract
Massive MIMO systems can substantially enhance spectral efficiency, but such gains rely on the availability of accurate channel state information (CSI). However, the increase in the number of antennas leads to a significant growth in feedback overhead, while conventional deep-learning-based CSI feedback methods also impose a substantial computational burden on the user equipment (UE). To address these challenges, this paper proposes LLM4FB, a one-sided CSI feedback framework that leverages a pre-trained large language model (LLM). In this framework, the UE performs only low-complexity linear projections to compress CSI, while the base station (BS) leverages a pre-trained LLM to accurately reconstruct and predict it. By utilizing the powerful modeling capabilities of the pre-trained LLM, only a small portion of the parameters needs to be fine-tuned to improve CSI recovery accuracy at low training cost. Furthermore, a multiobjective loss function is designed to simultaneously optimize normalized mean square error (NMSE) and spectral efficiency (SE). Simulation results show that LLM4FB outperforms existing methods across various compression ratios and mobility levels, achieving high-precision CSI feedback with minimal computational demand on terminal devices. Therefore, LLM4FB presents a highly promising solution for next-generation wireless sensor networks and industrial IoT applications, where terminal devices are often strictly constrained by energy and hardware resources. Full article
(This article belongs to the Section Communications)
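The UE-side step is simple enough to sketch on its own: CSI is compressed by a fixed low-complexity linear projection, while the LLM-based reconstruction at the BS is not shown. All dimensions and the 1/16 compression ratio below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.standard_normal((32, 256)) + 1j * rng.standard_normal((32, 256))
h = H.reshape(-1)                     # vectorized CSI (antennas x subcarriers)
M = len(h) // 16                      # assumed 1/16 compression ratio
A = rng.standard_normal((M, len(h))) / np.sqrt(len(h))   # fixed projection
y = A @ h                             # low-complexity feedback vector to the BS
print(y.shape)                        # (512,)
```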

19 pages, 10545 KB  
Article
Comparative Analysis of Deep Learning Architectures for Automatic Tooth Segmentation in Panoramic Dental Radiographs: Balancing Accuracy and Computational Efficiency
by Alperen Yalım, Emre Aytugar, Fahrettin Kalabalık and İsmail Akdağ
Diagnostics 2026, 16(2), 336; https://doi.org/10.3390/diagnostics16020336 - 20 Jan 2026
Abstract
Background/Objectives: This study provides a systematic benchmark of U-Net–based deep learning models for automatic tooth segmentation in panoramic dental radiographs, with a specific focus on how segmentation accuracy changes as computational cost increases across different encoder backbones. Methods: U-Net models with ResNet, EfficientNet, DenseNet, and MobileNetV3-Small encoder families pretrained on ImageNet were evaluated on the publicly available Tufts Dental Database (1000 panoramic radiographs) using a five-fold cross-validation strategy. Segmentation performance was quantified using the Dice coefficient and Intersection over Union (IoU), while computational efficiency was characterized by parameter count and floating-point operations reported as GFLOPs per image. Statistical comparisons were conducted using the Friedman test followed by Nemenyi-corrected post hoc analyses (p<0.05). Results: The overall segmentation quality was consistently high, clustering within a narrow range (Dice: 0.9168–0.9259), suggesting diminishing returns as backbone complexity increases. EfficientNet-B7 achieved the highest nominal accuracy (Dice: 0.9259 ± 0.0007; IoU: 0.8621 ± 0.0013); however, the differences in Dice score between EfficientNet-B0, B4, and B7 were not statistically significant (p>0.05). Computational demands, in contrast, varied substantially (2.9–67.2 million parameters; 4.93–40.8 GFLOPs). EfficientNet-B0 provided an accurate and efficient operating point (Dice: 0.9244 ± 0.0011) at low computational cost (5.98 GFLOPs), whereas MobileNetV3-Small offered the lowest computational cost (4.93 GFLOPs; 2.9 million parameters) but also the lowest Dice score (0.9168 ± 0.0031). Compared with heavier ResNet and DenseNet variants, EfficientNet-B0 achieved competitive accuracy with a markedly lower computational footprint. Conclusions: The findings show that larger models do not always perform better and that increased model capacity does not necessarily yield meaningful accuracy gains. These findings are limited to the task of tooth segmentation; different tasks may yield different results. Among the models evaluated, EfficientNet-B0 stands out as the most practical option, maintaining near-saturated accuracy while keeping model size and computational cost low. Full article
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)
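For reference, the Dice coefficient used as the benchmark's accuracy metric is computed as follows; IoU follows the same pattern with intersection over union.

```python
import numpy as np

def dice(pred, target, eps=1e-7):
    """Dice coefficient for binary segmentation masks."""
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

pred = np.zeros((8, 8), dtype=bool); pred[2:6, 2:6] = True
gt = np.zeros((8, 8), dtype=bool);  gt[3:7, 3:7] = True
print(round(dice(pred, gt), 4))   # 0.5625 for this toy overlap
```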
