Search Results (664)

Search Parameters:
Keywords = Generative Pre-trained Transformer

21 pages, 7109 KB  
Article
Stereo Radargrammetry Using Deep Learning-Based Image Matching with Fine-Tuned Model on Synthetic Aperture Radar Images
by Koichi Ito, Tatsuya Sasayama, Shintaro Ito, Haruki Iwasa, Takafumi Aoki and Jyunpei Uemoto
Remote Sens. 2026, 18(10), 1662; https://doi.org/10.3390/rs18101662 - 21 May 2026
Viewed by 134
Abstract
Stereo radargrammetry using Synthetic Aperture Radar (SAR) images is a powerful technique for all-weather 3D topographic measurements. However, conventional methods based on local template matching often struggle to establish accurate correspondences in mountainous or vegetated areas due to severe SAR-specific geometric modulations. In this paper, we propose a novel high-accuracy stereo radargrammetry framework by introducing RoMa, a robust Transformer-based deep learning model, for dense SAR image matching. Optical pre-trained deep learning models often suffer from a domain gap. To overcome this limitation, we develop an automated pipeline to construct a patch-based SAR image dataset using a reference Digital Surface Model (DSM) and an SAR projection model. By fine-tuning RoMa on this dataset, the model effectively adapts to the complex non-linear deformations of SAR images. Furthermore, unlike conventional methods, our approach establishes correspondences directly on the original slant-range images without requiring ground-range projection, thereby avoiding image quality degradation caused by pixel interpolation. Experimental results using airborne Pi-SAR2 images demonstrate that the fine-tuned RoMa significantly outperforms conventional methods, achieving an 82.86% matching accuracy at a 10-pixel threshold. In the 3D measurement evaluation, the proposed method achieves the lowest elevation mean error (1.24 m) and the highest inlier ratio (74.1%), proving its effectiveness in generating accurate, dense, and wide-area 3D point clouds even in challenging terrains. Full article
(This article belongs to the Special Issue SAR Images Processing and Analysis (3rd Edition))

30 pages, 26440 KB  
Article
HybridHiT-UNet: Multi-Scale Temporal U-Net with Hierarchical Shot-Aware Transformers for Video Summarization
by Saadman Sakib, Tanjim Mahmud, Karl Andersson and Kaushik Deb
Mach. Learn. Knowl. Extr. 2026, 8(5), 135; https://doi.org/10.3390/make8050135 - 20 May 2026
Viewed by 159
Abstract
Video summarization aims to produce a short yet informative summary of a long video while reducing redundancy. Most transformer-based methods operate at a single temporal scale or ignore shot-level structure, limiting temporal coherence and cross-dataset generalization. To fill these gaps, we present HybridHiT-UNet, a supervised framework that combines three complementary parts: a pretrained Vision Transformer encoder to provide spatially rich frame representations, a multi-scale 1D Temporal U-Net backbone to provide hierarchical temporal modeling of frame representations, and a shot-aware hierarchical transformer scoring head to provide inter-shot context for importance prediction. Frame-level scores are summed into shot-level utilities and optimized with a knapsack selection on a fixed-length budget, and a weighted focal loss is used to address extreme class imbalance. Extensive experiments using four benchmarks (SumMe, TVSum, OVP, and YouTube) under canonical, augmented, and transfer protocols demonstrate that HybridHiT-UNet achieves F1-scores of 65.8% on SumMe and 79.92% on TVSum, which is higher than existing methods, while still achieving diversity scores of 64.98% and 48.68%, respectively. A systematic study further demonstrates that a 20% summary budget yields a consistently better coverage-diversity trade-off than the traditional 15% budget, which provides useful evidence-based advice on the selection of summary length. Full article
(This article belongs to the Section Learning)
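
The selection step described in this abstract (frame-level scores summed into shot-level utilities, then a knapsack selection under a fixed-length budget) can be illustrated with a minimal sketch. This is a generic 0/1 knapsack, not the authors' implementation; the function names `shot_utilities` and `select_shots`, the half-open shot intervals, and the frame-count budget are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def shot_utilities(frame_scores: Sequence[float],
                   shots: Sequence[Tuple[int, int]]) -> List[float]:
    """Sum frame-level scores inside each half-open shot interval [start, end)."""
    return [sum(frame_scores[s:e]) for s, e in shots]


def select_shots(utilities: Sequence[float],
                 lengths: Sequence[int],
                 budget: int) -> List[int]:
    """0/1 knapsack: pick shots maximising total utility with total length <= budget.

    Hypothetical helper; `budget` is measured in frames (e.g. 15% or 20% of the video).
    """
    n = len(utilities)
    # best[i][cap] = best utility using the first i shots within capacity cap
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        u, w = utilities[i - 1], lengths[i - 1]
        for cap in range(budget + 1):
            best[i][cap] = best[i - 1][cap]          # skip shot i-1
            if w <= cap and best[i - 1][cap - w] + u > best[i][cap]:
                best[i][cap] = best[i - 1][cap - w] + u  # take shot i-1
    # Backtrack to recover which shots were taken
    chosen, cap = [], budget
    for i in range(n, 0, -1):
        if best[i][cap] != best[i - 1][cap]:
            chosen.append(i - 1)
            cap -= lengths[i - 1]
    return sorted(chosen)
```

For example, with four shots of lengths `[3, 2, 3, 2]` and utilities `[18, 3, 18, 2]`, a budget of 6 frames selects shots 0 and 2, while a budget of 2 frames selects only shot 1.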

21 pages, 269 KB  
Article
Exploring Data Augmentation in a Low-Resource Language Context: A Case Study on Text Generation for Reading Comprehension in Turkish
by Seyma N. Yildirim-Erbasli and Okan Bulut
Algorithms 2026, 19(5), 413; https://doi.org/10.3390/a19050413 - 20 May 2026
Viewed by 162
Abstract
This study presents a controlled empirical and comparative analysis of existing data augmentation techniques for text generation in Turkish, a morphologically rich, low-resource language. A collection of 265 Turkish reading passages for Grades 4 and 5 was augmented using four techniques: paraphrasing with GPT-3.5-turbo (Generative Pre-trained Transformer 3.5 Turbo), back translation (Turkish–English–Turkish and Turkish–French–Turkish) via Google Translate, synonym replacement via GPT-3.5-turbo, and random insertion via GPT-3.5-turbo. Human evaluators assessed the fluency, coherence, grammaticality, logical flow, and naturalness of the augmented datasets. Each augmented dataset, along with the original, was then used to fine-tune a Turkish GPT-2-medium model, which was evaluated using automatic metrics such as BLEU (Bilingual Evaluation Understudy), ROUGE (Recall-Oriented Understudy for Gisting Evaluation), METEOR (Metric for Evaluation of Translation with Explicit ORdering), chrF (CHaRacter-level F-score), BERTScore (Bidirectional Encoder Representations from Transformers Score), and cosine similarity. According to the human evaluation of the original and augmented datasets, the original texts received the highest ratings, followed by those generated through random insertion, paraphrasing, synonym replacement, and back translation variants, with cosine similarity results between original and augmented texts showing a comparable trend; however, the differences between methods were generally small. The results from text generation indicate that models trained on the original dataset generally achieved slightly higher performance across evaluation metrics compared to those trained on augmented datasets. Among the augmented methods, synonym replacement showed marginally better performance, followed by back translation, random insertion, and paraphrasing; however, the differences between methods were small and not statistically significant. Full article
26 pages, 94235 KB  
Article
CLIP-HBD: Hierarchical Boundary-Constrained Decoding for Open-Vocabulary Semantic Segmentation
by Jing Wang, Quan Zhou, Anyi Yang and Junyu Lin
Computers 2026, 15(5), 318; https://doi.org/10.3390/computers15050318 - 15 May 2026
Viewed by 263
Abstract
Open-vocabulary semantic segmentation (OVSS) aims to achieve pixel-level object segmentation guided by arbitrary natural language descriptions. Although pre-trained vision–language models (VLMs) have significantly advanced the development of OVSS, their reliance on the Vision Transformer (ViT) architecture imposes a fundamental constraint on dense prediction. Specifically, the absence of hierarchical downsampling in ViT-based VLM results in single-scale representations that trade spatial localization for global semantics. To address these issues, this paper proposes a hierarchical boundary-constrained decoding network for OVSS, called CLIP-HBD. Our approach leverages VLM semantic priors to reconstruct multi-scale features and introduces a boundary-constrained decoding strategy to refine edge details. Specifically, CLIP-HBD leverages a ConvNeXt-based backbone alongside a hierarchical adaptation mechanism to fuse multi-layer VLM features, generating a comprehensive multi-scale representation. To address the issue of boundary inaccuracy, we perform explicit boundary prediction based on multi-scale representations, where the resulting boundary maps are subsequently transformed into structural constraints to steer the decoder’s focus toward boundary regions. By integrating structural constraints with hierarchical features, the decoding process effectively maintains semantic consistency and restores precise object boundaries. Extensive experiments demonstrate that CLIP-HBD achieves superior performance in both segmentation precision and boundary quality across multiple benchmarks. Full article
(This article belongs to the Special Issue Advanced Image Processing and Computer Vision (3rd Edition))

14 pages, 3021 KB  
Article
Validation of Synthetic Megavoltage Computed Tomography (MVCT) for Dose Calculation in Radiotherapy Treatment Planning
by Aurora Corso, Niki Martinel, Mubashara Rehman, Joseph Stancanello, Christian Micheloni, Cristian Deana, Cristina Cappelletto, Paola Chiovati, Riccardo Spizzo, Giuseppe Fanetti, Andrea Dassie and Michele Avanzo
Cancers 2026, 18(10), 1603; https://doi.org/10.3390/cancers18101603 - 14 May 2026
Viewed by 215
Abstract
Background/Objectives: Dental metallic implants cause severe streaking artifacts in kilovoltage CT (kVCT), compromising dose calculation in radiotherapy (RT) treatment planning. The purpose of this study is to assess the dosimetric agreement of synthetic MVCT (sMVCT) images generated from artifact-affected kVCT using a deep learning network with respect to true MVCT (tMVCT) acquired at the treatment machine. Methods: Nineteen head and neck cancer patients with dental metallic implants treated with RT were included. Planning kVCT images were converted to sMVCT using Metal Artifact Reduction through Domain Transformation Network (MAR-DTN), a UNet-inspired deep learning network. The sMVCT images were rigidly registered to true MVCT (tMVCT) acquired on the Hi-Art II Tomotherapy system. Mean Hounsfield Unit (HU) values were compared across seven structures (thyroid, bilateral parotids, brainstem, spinal cord, GTV, PTV70) using pairwise Wilcoxon tests and Two One-Sided Tests (TOST) for statistical equivalence within a pre-specified margin of ±20 HU (corresponding to a 2% deviation in physical density). Dose distributions were recalculated on sMVCT using the AAA algorithm and compared to reference tMVCT-based plans via dose–volume histogram (DVH) metrics, evaluated for equivalence by TOST within a margin of ±2% of the prescribed dose (±142 cGy of 70.95 Gy), and via 3D gamma index, evaluated by one-sided non-inferiority test against the clinically accepted thresholds of 90% (2 mm/2%) and 95% (3 mm/3%). A pre-specified sensitivity analysis was performed by repeating all comparisons on the strictly independent sub-cohort (n = 16) excluding three patients drawn from the MAR-DTN training set. Results: All seven anatomical structures showed statistical equivalence between sMVCT and tMVCT under the ±20 HU margin (TOST p < 0.05; mean HU differences in the range −1.1 to +8.4 HU; all Wilcoxon p > 0.05). 
All nine DVH metrics achieved formal dosimetric equivalence within ±2% of the prescribed dose (TOST p < 0.05). Mean 3D gamma pass rates were 94.3% (95% CI: 89.3–97.1) for the 2 mm/2% criterion and 97.6% (95% CI: 94.8–99.0) for the 3 mm/3% criterion, both formally non-inferior to the respective clinical thresholds (p < 0.0001). Residual gamma failures were concentrated at the patient surface, consistent with inter-session repositioning uncertainty rather than errors in synthetic image generation. Sensitivity analysis on the n = 16 sub-cohort confirmed all conclusions, with mean HU and DVH differences smaller than in the full cohort for the structures showing the largest mean differences, and comparable for the remaining structures, with all TOST equivalence and gamma non-inferiority tests confirmed in both cohorts. Conclusions: sMVCT images generated via MAR-DTN show dosimetric agreement with physically acquired tMVCT in head and neck patients with dental implants, formally demonstrated by TOST equivalence within ±2% of prescribed dose for all DVH metrics. The combined HU and gamma index framework presented here represents a promising quality assurance approach for AI-based synthetic imaging tools in radiotherapy, pending validation in larger prospective multicentre cohorts. Full article

18 pages, 8033 KB  
Article
Parameter-Efficient Domain Adaptation and Lightweight Decoding for Agricultural Monocular Depth Estimation
by Yanliang Mao, Wenhao Zhao and Liping Chen
Agronomy 2026, 16(10), 972; https://doi.org/10.3390/agronomy16100972 (registering DOI) - 13 May 2026
Viewed by 89
Abstract
Reliable monocular depth estimation (MDE) is essential for agricultural robots and unmanned platforms, where low-cost visual perception is required for safe navigation and scene understanding in complex field environments. However, general-purpose depth foundation models remain limited by substantial domain gaps in agriculture, while full fine-tuning of large backbones is computationally expensive and less suitable for deployment on resource-constrained platforms. In this paper, an efficient agricultural MDE framework, termed AgriLoRA-DA, is proposed based on Depth-Anything-V2. Specifically, the pretrained DINOv2 encoder is kept frozen and adapted using LoRA in selected attention projections, while the original Dense Prediction Transformer (DPT) decoder is replaced with a lightweight Lite-FPNHead to reduce decoding overhead and improve deployment efficiency. Experiments conducted on the WE3DS dataset indicate that, although Depth-Anything-V3 provides the strongest zero-shot generalization among the evaluated baselines, target-domain adaptation is still necessary for WE3DS agricultural scenes. After adaptation, AgriLoRA-DA achieves the best overall performance with AbsRel = 0.0133, SqRel = 3.518, RMSE = 132.264, log10 = 0.0057, and delta1 = 0.9990, while requiring only 0.19 M (0.87%) trainable parameters. These results suggest that parameter-efficient adaptation and lightweight decoding provide a practical direction for deployable depth estimation in crop-row scenes similar to WE3DS, while broader cross-dataset validation remains an important direction for future work. Full article

29 pages, 5091 KB  
Article
RNAFoldDiff-Based Sequence-Aware Graph Diffusion for Accurate RNA 3D Structure Prediction
by Abdullah Al-Refai, Mohammad F. Al-Hammouri, Bandi Vamsi and Ali Al Bataineh
Algorithms 2026, 19(5), 381; https://doi.org/10.3390/a19050381 - 11 May 2026
Viewed by 342
Abstract
Accurate prediction of RNA tertiary structure remains a core challenge in computational biology. Existing models frequently struggle with diverse topologies and intricate long-range interactions. We introduce RNAFoldDiff, a generative framework that integrates a sequence-aware graph transformer with a geometric diffusion process for end-to-end RNA 3D structure prediction. RNA sequences and secondary structures are converted into graph representations that capture backbone connectivity and base pair topology. The transformer models local motifs and global dependencies, while the diffusion module iteratively denoises coordinates into physically consistent conformations. The model was pretrained on more than 15,000 structural motifs from the RNA 3D Hub and fine-tuned on complete RNAs from the RNA-Puzzles dataset. In benchmarking tests, RNAFoldDiff achieved an average root mean square deviation (RMSD) of 2.64 Å, a Global Distance Test (GDT) score of 68.7%, and a base pair accuracy of 89.5%, reducing RMSD by nearly 30% and improving GDT by 9 points compared to RoseTTAFoldNA. The framework also outperformed FARFAR2, SimRNA, and RNAformer. Ablation experiments confirmed the contributions of diffusion refinement, edge-aware graph encoding, and motif-level pretraining, while qualitative analyses showed biologically plausible folds including helices, junctions, and multiloops. By combining topology-aware graph learning with generative diffusion, RNAFoldDiff advances RNA tertiary structure modeling and provides a practical tool for RNA design, ribozyme analysis, and structure-guided drug discovery. Full article

10 pages, 354 KB  
Article
Responsible AI for Personalized Patient Education and Engagement Across Medical Conditions: Leveraging Multi-Agent LLMs, Ambient Technology, and NotebookLM—A Case Study in Diabetes Education and Limb Preservation
by Shayan Mashatian, Shu-Fen Wung, Aaron Ritter, Jessica Fishman, Jeffrey Robbins, Shereen Aziz, Michelle Huo and David G. Armstrong
J. Am. Podiatr. Med. Assoc. 2026, 116(3), 30; https://doi.org/10.3390/japma116030030 - 8 May 2026
Cited by 1 | Viewed by 350
Abstract
Background: Effective communication with patients is vital for improving health outcomes in chronic disease management. In this study, we investigated WoundScribeAI’s Scribe AI, also known as Ambient Technology, and its patient education and engagement app, Pingoo.AI. It employed a multi-agent AI model that leveraged Large Language Models (LLMs) and NotebookLM to enhance patient communication in clinical settings. Methods: The system comprised specialized agents that transcribed healthcare provider–patient conversations through ambient dictation. This transcription generated medical notes that followed the Subjective, Objective, Assessment, and Plan (SOAP) format—a structured document used by healthcare providers to record and communicate information about patient encounters. Simultaneously, comprehensive visit summaries were also created. In the next step, these visit summaries were used to produce conversational and educational content by leveraging NotebookLM, an AI model introduced by Google that can generate podcast-style conversations from provided information. Integrating these agents allows clinicians to deliver engaging, empathetic, and actionable information to patients. Medical experts conducted a two-phase evaluation of the system’s performance based on multiple criteria, with a particular focus on diabetes education and diabetic foot care. The first phase used pre-recorded training videos, while the second phase involved simulated consultations by clinicians using the system. To validate the AI-generated educational content, we used several established frameworks in health communication that closely align with our enhancement goals. Results: The results showed that the AI model generated accurate clinical documentation and met the criteria for accurate SOAP Notes, visit summaries, and engaging educational content for patients. 
Given that hallucination is a significant concern related to large language models, especially in critical fields like healthcare, we meticulously analyzed the generated outputs to identify any signs of hallucinated information. Three outcomes successfully passed the validation criteria, including accuracy, completeness, comprehensiveness, absence of potential harm, and no hallucination. Additionally, the Conversational Education content was confirmed against established patient education frameworks and met criteria such as the use of metaphors, empathetic tone, and appropriate language, providing additional detail to help manage the condition. Conclusions: By providing specific instructions and prompts to NotebookLM to transform visit summaries into educational conversations, we significantly enhanced the comprehensiveness and engagement of the content for patients. In contrast to a traditional summary of the clinical visit, the podcast-style conversation enriched the content with background information, encouraging language, an empathetic tone, and helpful metaphors. Our analysis confirmed that the system did not exhibit any hallucinations, highlighting the effectiveness of our approach in mitigating this risk. These findings support the use of multi-agent AI models, combined with ambient dictation and tools like NotebookLM, to improve patient communication that surpasses traditional paper-based brochures, which are often impersonal, minimal, and do not always adhere to recommended factors for health literacy. Full article

21 pages, 2357 KB  
Article
Integrating Thesaurus-Based Knowledge into Transformer Models for Semantic Understanding of Domain-Specific Texts
by Bayangali Abdygalym, Saule Tazhibayeva, Madina Sambetbayeva, Aigerim Yerimbetova, Roman Taberkhan, Manzura Abjalova, Aidos Sabdenov and Elmira Daiyrbayeva
Computers 2026, 15(5), 297; https://doi.org/10.3390/computers15050297 - 7 May 2026
Viewed by 237
Abstract
Integrating structured linguistic resources into deep learning architectures represents a key challenge in domain-oriented NLP. This study proposes a framework for incorporating knowledge from a military thesaurus of the Ground Forces, structured according to the XML Zthes standard, into pre-trained transformer language models, including KazBERT, multilingual BERT, and XLM-RoBERTa. The approach addresses two interrelated tasks in specialized terminology processing: concept linking and semantic search. Unlike existing knowledge-injection methods designed primarily for general-domain applications, this framework formalizes the mapping of Zthes elements, such as Term, Broader term, Narrower term, Related term, ScopeNote, Language, and Source, into structured textual representations that can be directly processed by transformer architectures. Fine-tuning is conducted on a dataset of 18,400 training instances automatically generated from the thesaurus, including synonym pairs, hierarchical relations (hyperonymy and hyponymy), associative links, and definitional descriptions. Experimental evaluation demonstrated that thesaurus-enriched models outperform baseline architectures across all major metrics. The XLM-RoBERTa model achieves F1 = 0.84 and Top-5 accuracy = 0.94 in the concept linking task, representing a five-point improvement over the baseline. The model reaches Macro-F1 = 0.84 across four relation types. Results obtained on a specialized test set derived from terminology databases of Kazakhstan’s Armed Forces confirm robust cross-lingual generalization across Kazakh, Russian, and English military discourse. Full article

22 pages, 421 KB  
Article
Frame-Level Audio Forgery Localization Using Handcrafted and Neural Features
by Mostafa Moallim, Taqwa A. Alhaj, Fatin A. Elhaj, Inshirah Idris and Tasneem Darwish
Signals 2026, 7(3), 42; https://doi.org/10.3390/signals7030042 - 7 May 2026
Viewed by 388
Abstract
Audio forgery has emerged as a significant security and forensic challenge, driven by rapid advances in generative artificial intelligence and the widespread availability of audio editing tools, which enable the creation of highly realistic manipulated speech with minimal technical expertise. Existing approaches predominantly operate at the file level, providing only coarse binary decisions without identifying when or where manipulation occurs. This study addresses fine-grained temporal localization through a unified frame-level localization framework. We introduce a controlled forgery generation framework derived from the TIMIT speech corpus, applying atomic, localized manipulations under strict temporal constraints and producing precise frame-level annotations across diverse manipulation types. Building on this dataset, we then propose a transform-agnostic localization-driven detection approach using temporal inconsistency modeling, enabling unified analysis across heterogeneous manipulations at frame-level resolution. To analyze forensic evidence, we present an evidence-stratified modeling paradigm comparing three complementary strategies: a handcrafted anomaly-based method, a deep localization model leveraging pretrained wav2vec 2.0 representations, and a hybrid approach combining both through confidence-aware fusion and temporal consistency reinforcement. A systematic experimental analysis evaluates the effects of representation adaptation, hybrid fusion, and manipulation type on detection and localization performance. Results show that handcrafted features are insufficient for reliable frame-level localization, while task-adapted wav2vec 2.0 achieves strong and consistent performance. The hybrid approach does not consistently improve frame-level accuracy but yields substantial gains in segment-level localization by enforcing temporal coherence. 
Per-transform analysis confirms robust performance across most manipulations, with deletion-based operations remaining the most challenging. Full article

17 pages, 3173 KB  
Article
RaTDet: A Marine Radar Transformer Network for End-to-End Target Detection
by Huaxing Kuang, Haocheng Yang and Luxi Yang
Electronics 2026, 15(9), 1933; https://doi.org/10.3390/electronics15091933 - 2 May 2026
Viewed by 449
Abstract
Recent advancements in deep learning have shown considerable potential to enhance radar target detection, particularly in improving detection probability under complex environmental conditions. However, existing deep learning approaches largely operate in the real number domain, neglecting the complex-valued nature of radar data, and often inherit vision-oriented architectures that fail to address radar-specific challenges—such as sparse target echoes, the necessity for phase preservation, and constraints imposed by scanning radar systems. Meanwhile, conventional radar signal processing methods, including CA-CFAR, are limited by their dependence on idealized statistical models and often underperform in dynamic and cluttered electromagnetic environments. To overcome these issues, this paper proposes Radar Transformer for Detection (RaTDet), an end-to-end detection network that integrates complex-valued convolutional neural networks (CNNs) and Transformers. RaTDet fully leverages complex-valued data to preserve critical phase and amplitude information, enabling automated feature learning directly from raw radar signals. The model operates effectively with very few pulses, making it suitable for resource-constrained scenarios, and can serve as a pre-trained foundation model for various radar downstream tasks. Experimental results demonstrate that RaTDet achieves excellent detection performance, characterized by high detection probability (Pd) and low false alarm rate (Pfa), outperforming both traditional signal processing and conventional deep learning methods. This work bridges the gap between deep learning and radar signal processing, offering a flexible and powerful network for next-generation radar systems. Full article

19 pages, 940 KB  
Article
Hydraulic Seal Wear Classification by Fine-Tuning a Transformer-Based Audio Model Using Acoustic Emission
by Lisa Maria Svendsen, Vignesh V. Shanbhag and Rune Schlanbusch
Sensors 2026, 26(9), 2856; https://doi.org/10.3390/s26092856 - 2 May 2026
Viewed by 1516
Abstract
Accurate classification of seal wear is essential for condition-based and predictive maintenance of hydraulic cylinders, where seal degradation can cause fluid leakage and impair normal system operation. This study investigates the adaptation of a Transformer-based audio model for classifying seal wear conditions using acoustic emission (AE) signals. Specifically, we adapt the Audio Spectrogram Transformer (AST), a convolution-free, purely attention-based model that operates directly on audio spectrograms. The Transformer architecture enables the modeling of long-range dependencies, while the model learns discriminative representations directly from AE data without relying on manually engineered features. A selective fine-tuning strategy was implemented by adding layer-freezing functionality to the AST training pipeline, enabling different freezing configurations during fine-tuning. This allowed earlier pretrained representations to be preserved while adapting the later layers to the target AE signals, thereby reducing the risk of overfitting in the small-data setting. In addition, validation-driven early stopping was implemented to further improve generalization during fine-tuning. The model was initialized with ImageNet and AudioSet pretrained weights to exploit general-purpose representations learned from large-scale datasets. The AE data were acquired under varying pressure conditions on a hydraulic test rig designed to simulate hydraulic cylinder leakage. The datasets were partitioned into fine-tuning, validation, and evaluation subsets and labeled into three wear states: unworn, semi-worn, and worn. In addition, data augmentation techniques were applied to the fine-tuning data to increase diversity and mitigate class imbalance. The adapted model achieved 97.92% classification accuracy across all wear conditions and pressure settings, demonstrating its ability to learn discriminative wear-related patterns directly from AE data. 
The framework's versatility was further assessed on a bearing strip dataset acquired from the same hydraulic test rig. Using the same fine-tuning configuration, the model achieved 95.65% accuracy and 100% recall for the worn state. These findings highlight the potential of Transformer-based architectures for data-efficient, end-to-end AE-based diagnostics across hydraulic system components. Full article
(This article belongs to the Special Issue Acoustic Sensing for Condition Monitoring)
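The validation-driven early stopping mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `early_stopping_epoch`, the `patience` and `min_delta` parameters, and the loss values are all assumptions chosen for illustration.

```python
from typing import Sequence, Tuple


def early_stopping_epoch(
    val_losses: Sequence[float], patience: int = 3, min_delta: float = 0.0
) -> Tuple[int, float, int]:
    """Return (best_epoch, best_loss, last_epoch_run) for a run of validation losses.

    Hypothetical helper: training stops once the validation loss has failed to
    improve by more than `min_delta` for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    last_epoch = -1
    stale = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses):
        last_epoch = epoch
        if loss < best_loss - min_delta:
            best_loss, best_epoch, stale = loss, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                break  # stop; the weights saved at best_epoch would be restored
    return best_epoch, best_loss, last_epoch


# Made-up validation losses: improvement stalls after epoch 2.
result = early_stopping_epoch([0.9, 0.7, 0.6, 0.65, 0.62, 0.61, 0.5], patience=3)
```

With these made-up losses, training halts at epoch 5 and the checkpoint from epoch 2 is kept, so the later drop to 0.5 is never reached. That is the usual trade-off of a small patience value.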

44 pages, 856 KB  
Article
A GPT-Based Assessment of Alignment Between Privacy Legal Frameworks & ISO/IEC 27701:2025: A Latin American Case Study
by David Cevallos-Salas, José Estrada-Jiménez and Danny S. Guamán
Technologies 2026, 14(5), 273; https://doi.org/10.3390/technologies14050273 - 30 Apr 2026
Viewed by 414
Abstract
The 2025 update of the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) 27701 standard offers a major advantage by enabling organizations to implement a Privacy Information Management System (PIMS) autonomously while maintaining alignment with the General Data Protection Regulation (GDPR). However, it remains unclear to what extent privacy legal frameworks in developing jurisdictions, particularly in Latin American countries, align with this new standard. At the same time, the traditional method for assessing the alignment between privacy legal frameworks and ISO/IEC 27701 continues to rely on manual mapping between the standard’s subclauses and privacy regulatory articles, a process that is time-consuming, costly, and error-prone. More critically, no method exists to quantitatively assess the reliability of such mappings, leaving alignment assessments largely subjective. To address these limitations, this paper proposes a novel method based on an OpenAI Generative Pre-trained Transformer (GPT) combined with a Chain-of-Thought (CoT) reasoning strategy to quantitatively assess the alignment between privacy legal frameworks and ISO/IEC 27701:2025. By leveraging GPT’s logarithmic probabilities (logprobs) and the standard’s subclause definitions as classification categories, the method enables confidence-based evaluation of legal–standard alignment. The proposed method is then applied to analyze the privacy legal frameworks of Paraguay, Chile, Ecuador, México, Colombia, and Perú, examining how effectively they promote the standard’s guidelines. A suitable confidence threshold is then selected by assessing the GDPR and comparing the results with the reference mappings reported in Annex D of the standard. 
Finally, the method identifies the number of compliant subclauses per clause, the regulatory articles influencing the resulting logprobs, and the underlying privacy gaps for reduced alignment across the analyzed privacy legal frameworks. Overall, our results indicate that while Latin American privacy legal frameworks mandate protective measures by promoting a suitable operation and continuous improvement of a PIMS, they do not explicitly demand adequate risk management and sufficient preventive safeguards for citizens’ Personally Identifiable Information (PII) in dynamic contexts. Full article
(This article belongs to the Section Information and Communication Technologies)
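The confidence-based evaluation described in this abstract rests on turning a log probability into a probability and comparing it with a threshold. The sketch below shows only that arithmetic. The subclause labels, logprob values, and the 0.7 threshold are hypothetical and are not taken from the paper or from ISO/IEC 27701:2025.

```python
import math
from typing import Dict, List


def confidence_from_logprob(logprob: float) -> float:
    """Convert a natural-log probability into a probability in [0, 1]."""
    return math.exp(logprob)


def aligned_subclauses(logprobs: Dict[str, float], threshold: float) -> List[str]:
    """Return the labels whose confidence meets or exceeds the threshold."""
    return sorted(
        label
        for label, lp in logprobs.items()
        if confidence_from_logprob(lp) >= threshold
    )


# Hypothetical logprobs for three placeholder subclause labels.
example = {"subclause_a": -0.05, "subclause_b": -1.20, "subclause_c": -0.30}
aligned = aligned_subclauses(example, threshold=0.7)
```

Here exp(-0.05) ≈ 0.95 and exp(-0.30) ≈ 0.74 pass the assumed threshold, while exp(-1.20) ≈ 0.30 does not.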

19 pages, 870 KB  
Article
Integrating Unsupervised Learning for the Factual Consistency of Generative Models
by Sindhu Nair and Y. S. Rao
Future Internet 2026, 18(5), 235; https://doi.org/10.3390/fi18050235 - 27 Apr 2026
Viewed by 302
Abstract
Text summarization involves analyzing large amounts of text, selecting the salient text features, and arranging them coherently. The graph-based TextRank and statistical topic modeling are unsupervised approaches for generating an extractive synopsis. Deep learning models are supervised, data-driven, and pre-trained on a huge corpus of data, making a significant contribution to automatic text summarization systems. Despite grammatical correctness and coherence, deep learning-based summarization systems are prone to factual inconsistency. This has hindered the applicability of transformer-based summarizers, particularly in critical domains where misleading summaries can have severe consequences and significant social impact. This work proposes a hybrid hierarchical approach that combines unsupervised approaches, such as the TextRank algorithm and Latent Dirichlet Allocation (LDA)-based summaries, with contemporary transformer-based language models. When validated on three benchmark summarization datasets, empirical results indicate that our hybrid hierarchical transformer-based approach mitigates the factual inconsistency problem inherent in abstractive summarization. The summary consistency score of the abstractive summaries generated with our multilevel hybrid approach improves on that of the fine-tuned baseline transformer-based language models, which increases trust in transformer-based summarizers. Full article
(This article belongs to the Special Issue Artificial Intelligence (AI) and Natural Language Processing (NLP))
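The graph-based TextRank step named in this abstract can be sketched as follows. This is a simplified illustration, not the paper's pipeline: it covers only extractive sentence ranking, using a word-overlap similarity normalized by log sentence lengths in the spirit of TextRank. The function names, damping factor, iteration count, and example sentences are assumptions.

```python
import math
import re
from typing import List, Set


def _tokens(sentence: str) -> Set[str]:
    """Lowercased set of alphabetic words in a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def _similarity(a: Set[str], b: Set[str]) -> float:
    """Shared-word count normalized by log sentence lengths."""
    if len(a) < 2 or len(b) < 2:
        return 0.0  # avoids log(1) = 0 in the denominator
    return len(a & b) / (math.log(len(a)) + math.log(len(b)))


def textrank(sentences: List[str], damping: float = 0.85, iterations: int = 50) -> List[float]:
    """Score sentences by power iteration over a weighted similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    toks = [_tokens(s) for s in sentences]
    w = [[0.0 if i == j else _similarity(toks[i], toks[j]) for j in range(n)] for i in range(n)]
    out = [sum(row) for row in w]  # total outgoing edge weight per sentence
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(w[j][i] / out[j] * scores[j] for j in range(n) if out[j] > 0)
            for i in range(n)
        ]
    return scores


def extractive_summary(sentences: List[str], k: int = 1) -> List[str]:
    """Return the k highest-scoring sentences in their original order."""
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


# Made-up example: the third sentence shares no words with the other two.
sentences = [
    "Transformers generate fluent summaries.",
    "Fluent summaries from transformers can still contain factual errors.",
    "The weather was pleasant yesterday.",
]
scores = textrank(sentences)
summary = extractive_summary(sentences, k=2)
```

The off-topic sentence has no edges in the graph, so it receives only the baseline score and is left out of the two-sentence summary.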

20 pages, 1844 KB  
Article
AI-Enhanced Prognostic Model for Predicting Polyp Recurrence and Guiding Post-Polypectomy Surveillance Intervals Using the ERCPMP-V5 Dataset
by Sri Harsha Boppana, Sachin Sravan Kumar Komati, Ritwik Raj, Gautam Maddineni, Raja Chandra Chakinala, Pradeep Yarra, Venkata C. K. Sunkesula and Cyrus David Mintz
J. Clin. Med. 2026, 15(9), 3303; https://doi.org/10.3390/jcm15093303 - 26 Apr 2026
Viewed by 465
Abstract
Introduction: Colorectal cancer remains a leading cause of cancer-related morbidity and mortality, with adenomatous polyps representing a common precursor. Post-polypectomy polyp recurrence represents a significant risk of colorectal cancer, driving periodic colonoscopy surveillance and polypectomy as needed. In this study, we explore a multimodal machine learning approach that integrates endoscopic imaging with clinical and pathology data to improve recurrence risk prediction and support individualized surveillance planning. Methods: We developed and evaluated a multimodal artificial intelligence (AI) model to predict post-polypectomy colorectal polyp recurrence using the ERCPMP-v5 dataset. The cohort included 217 patients with 796 high-resolution endoscopic RGB images and 21 endoscopic videos; video data were converted to still frames at 2 frames per second. Images and frames were resized to 224 × 224 pixels and normalized. Patient-level demographic, morphological (Paris, Kudo Pit, JNET), anatomical, and pathological variables were encoded using standard scaling for continuous features and one-hot encoding for categorical features. Visual representations were extracted using a pretrained Vision Transformer backbone (ViT-Base-Patch16-224) with frozen weights. Structured metadata (79 variables) was encoded using a multilayer perceptron. A late fusion framework used image and metadata representations to generate a recurrence probability via a sigmoid classifier; probabilities were thresholded at 0.5 for binary prediction. Model performance was evaluated on a held-out test set using accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC). We additionally compared fusion performance with image-only and metadata-only baselines. Predicted probabilities were translated to surveillance recommendations using risk tiers: low risk (0.00 ≤ p < 0.20), moderate risk (0.20 ≤ p < 0.50), and high risk (p ≥ 0.50). 
Results: On the test set, the multimodal fusion model achieved 90.4% accuracy, 86.7% precision, 83.1% recall, 84.9% F1-score, and an AUC of 0.920. The image-only model achieved 84.6% accuracy (AUC 0.880), and the metadata-only model achieved 81.9% accuracy (AUC 0.850), indicating improved performance with multimodal fusion. Risk stratification enabled surveillance recommendations of 1–3 years for low risk, 6–12 months for moderate risk, and 3–6 months for high risk. Conclusions: A late-fusion multimodal model integrating endoscopic imaging with structured clinical and pathology variables demonstrated excellent performance for predicting post-polypectomy recurrence and generated actionable risk-based surveillance intervals. This approach may support individualized follow-up planning and more efficient allocation of surveillance resources, while prioritizing timely evaluation for patients at higher predicted risk. Full article
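The risk tiers and surveillance intervals stated in this abstract can be written as a small lookup, sketched below. The thresholds and intervals are taken from the abstract. The function names are hypothetical, and the F1 helper is included only to check that the reported precision and recall are consistent with the reported F1-score.

```python
from typing import Tuple


def surveillance_recommendation(p: float) -> Tuple[str, str]:
    """Map a predicted recurrence probability to (risk tier, follow-up interval)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if p < 0.20:
        return ("low", "1-3 years")
    if p < 0.50:
        return ("moderate", "6-12 months")
    return ("high", "3-6 months")


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


tier = surveillance_recommendation(0.35)
reported_f1 = f1_score(0.867, 0.831)  # precision and recall from the abstract
```

A probability of 0.35 falls in the moderate tier, and the harmonic mean of 86.7% precision and 83.1% recall comes to about 0.849, which matches the reported 84.9% F1-score.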
