Search Results (3)

Search Parameters:
Keywords = shifted patch tokenization

26 pages, 1586 KB  
Article
Adaptive Vision–Language Transformer for Multimodal CNS Tumor Diagnosis
by Inzamam Mashood Nasir, Hend Alshaya, Sara Tehsin and Wided Bouchelligua
Biomedicines 2025, 13(12), 2864; https://doi.org/10.3390/biomedicines13122864 - 24 Nov 2025
Viewed by 605
Abstract
Objectives: Correctly identifying Central Nervous System (CNS) tumors from MRI is complicated by divergent acquisition protocols, heterogeneous tumor morphology, and the difficulty of systematically combining imaging with clinical information. This study presents the Adaptive Vision–Language Transformer (AVLT), a multimodal diagnostic framework designed to integrate multi-sequence MRI with clinical descriptions while improving robustness to domain shifts and interpretability. Methods: AVLT jointly processes MRI sequences (T1, T1c, T2, FLAIR) and clinical note text, using normalized cross-attention to associate visual patch embeddings with clinical token representations. An Adaptive Normalization Module (ANM) mitigates distribution shift across datasets by adapting domain-specific feature statistics, and auxiliary semantic and alignment losses are incorporated to stabilize multimodal fusion. Results: On all datasets, AVLT achieved higher classification accuracy than CNN-, transformer-, radiogenomic-, and multimodal fusion-based models: 84.6% on BraTS (OS), 92.4% on TCGA-GBM/LGG, 89.5% on REMBRANDT, and 90.8% on GLASS, with AUC values above 0.90 in all domains. Conclusions: AVLT provides a reliable, generalizable, and clinically interpretable method for accurate diagnosis of CNS tumors.
(This article belongs to the Special Issue Diagnosis, Pathogenesis and Treatment of CNS Tumors (2nd Edition))
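
The cross-attention fusion described in the abstract can be pictured with a short sketch. This is not the authors' AVLT implementation; the module name `CrossModalFusion`, the embedding width, the head count, and the use of LayerNorm as the "normalized" step are assumptions made purely to illustrate cross-attention between MRI patch embeddings and clinical-text token embeddings.

```python
# Illustrative sketch only: image patch tokens attend to clinical text tokens
# after per-modality normalization, with a residual connection for fusion.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.norm_img = nn.LayerNorm(dim)   # normalize visual patch embeddings
        self.norm_txt = nn.LayerNorm(dim)   # normalize clinical token embeddings
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, img_tokens, txt_tokens):
        # img_tokens: (B, N_patches, dim) from the MRI encoder (T1, T1c, T2, FLAIR)
        # txt_tokens: (B, N_words, dim) from a clinical-note text encoder
        q = self.norm_img(img_tokens)
        kv = self.norm_txt(txt_tokens)
        fused, _ = self.attn(q, kv, kv)     # image patches attend to text tokens
        return img_tokens + fused           # residual multimodal fusion

# Toy usage with random tensors standing in for encoder outputs.
fusion = CrossModalFusion()
out = fusion(torch.randn(2, 196, 256), torch.randn(2, 64, 256))  # (2, 196, 256)
```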

16 pages, 3338 KB  
Article
Enhancing Cervical Pre-Cancerous Classification Using Advanced Vision Transformer
by Manal Darwish, Mohamad Ziad Altabel and Rahib H. Abiyev
Diagnostics 2023, 13(18), 2884; https://doi.org/10.3390/diagnostics13182884 - 8 Sep 2023
Cited by 13 | Viewed by 3763
Abstract
Cervical cancer is one of the most common types of cancer in women. Incidence and fatality rates are steadily rising, particularly in developing nations, due to a lack of screening facilities, experienced specialists, and public awareness. Cervical cancer is screened for using visual inspection with acetic acid (VIA), histopathology, the Papanicolaou (Pap) test, and the human papillomavirus (HPV) test. The goal of this research is to employ a vision transformer (ViT) enhanced with shifted patch tokenization (SPT) to create an integrated and robust system for automatic cervix-type identification. The SPT-enhanced vision transformer learns the features that distinguish the three cervical pre-cancerous types. The model was trained and tested on 8215 colposcopy images of the three types, obtained from the publicly available mobile-ODT dataset. Tested on 30% of the dataset, the model achieved 91% accuracy, demonstrating good generalization capability, and a comparison with the state of the art showed that our model outperforms existing approaches. The experimental results show that the suggested system can be employed as a decision support tool for detecting the cervical pre-cancer transformation zone, particularly in low-resource settings with limited experience and resources.
(This article belongs to the Special Issue Deep Learning in Medical Image Segmentation and Diagnosis)
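
For readers searching on the keyword, the following is a minimal sketch of shifted patch tokenization in general: the input image is shifted diagonally by half a patch in four directions, the shifted copies are concatenated with the original along the channel axis, and the result is split into patches, layer-normalized, and linearly projected into tokens. This is an illustrative reimplementation of the technique, not the authors' code; the patch size and embedding width are arbitrary.

```python
# Illustrative sketch of shifted patch tokenization (SPT).
import torch
import torch.nn as nn
import torch.nn.functional as F

def diag_shift(x, dy, dx):
    """Shift an image tensor by (dy, dx) pixels, filling exposed borders with zeros."""
    s = max(abs(dy), abs(dx))
    padded = F.pad(x, (s, s, s, s))                # zero-pad H and W by s on each side
    H, W = x.shape[-2:]
    return padded[..., s - dy : s - dy + H, s - dx : s - dx + W]

class ShiftedPatchTokenization(nn.Module):
    def __init__(self, in_ch=3, patch=16, dim=192):
        super().__init__()
        self.patch = patch
        patch_dim = 5 * in_ch * patch * patch      # original image + 4 shifted copies
        self.norm = nn.LayerNorm(patch_dim)
        self.proj = nn.Linear(patch_dim, dim)

    def forward(self, x):                          # x: (B, C, H, W)
        s = self.patch // 2                        # shift by half a patch diagonally
        shifted = [diag_shift(x, dy, dx) for dy in (-s, s) for dx in (-s, s)]
        x = torch.cat([x, *shifted], dim=1)        # (B, 5C, H, W)
        patches = F.unfold(x, kernel_size=self.patch, stride=self.patch)
        patches = patches.transpose(1, 2)          # (B, num_patches, 5C * patch * patch)
        return self.proj(self.norm(patches))       # (B, num_patches, dim)

# Toy usage: a 224x224 RGB image yields 14 x 14 = 196 tokens of width 192.
tokens = ShiftedPatchTokenization()(torch.randn(1, 3, 224, 224))
```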

18 pages, 1614 KB  
Article
SL-Swin: A Transformer-Based Deep Learning Approach for Macro- and Micro-Expression Spotting on Small-Size Expression Datasets
by Erheng He, Qianru Chen and Qinghua Zhong
Electronics 2023, 12(12), 2656; https://doi.org/10.3390/electronics12122656 - 13 Jun 2023
Cited by 16 | Viewed by 4671
Abstract
In recent years, the analysis of macro- and micro-expressions has drawn the attention of researchers. These expressions provide visual cues to an individual’s emotions, which can be used in a broad range of potential applications such as lie detection and policing. In this paper, we address the challenge of spotting facial macro- and micro-expressions in videos and present compelling results by using a deep learning approach to analyze optical flow features. Unlike other deep learning approaches that are mainly based on Convolutional Neural Networks (CNNs), we propose a Transformer-based approach that predicts a score indicating the probability of a frame lying within an expression interval. In contrast to other Transformer-based models that achieve high performance by being pre-trained on large datasets, our model, called SL-Swin, which incorporates Shifted Patch Tokenization and Locality Self-Attention into the backbone Swin Transformer network, effectively spots macro- and micro-expressions when trained from scratch on small-size expression datasets. Our evaluation outcomes surpass the MEGC 2022 spotting baseline, obtaining an overall F1-score of 0.1366. Additionally, our approach performs well on the MEGC 2021 spotting task, with overall F1-scores of 0.1824 and 0.1357 on CAS(ME)2 and SAMM Long Videos, respectively. The code is publicly available on GitHub.
(This article belongs to the Section Computer Science & Engineering)
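
Alongside shifted patch tokenization (sketched under the previous result), the abstract names Locality Self-Attention. Below is a minimal sketch of that general idea: multi-head self-attention with a learnable temperature in place of the fixed 1/sqrt(d) scaling and with the diagonal of the attention matrix masked so tokens cannot attend to themselves. The dimensions and head count are arbitrary, and this is not the authors' SL-Swin code.

```python
# Illustrative sketch of Locality Self-Attention (LSA).
import math
import torch
import torch.nn as nn

class LocalitySelfAttention(nn.Module):
    def __init__(self, dim=96, heads=3):
        super().__init__()
        self.heads = heads
        self.head_dim = dim // heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.out = nn.Linear(dim, dim)
        # Learnable temperature, initialized at the usual sqrt(d_head).
        self.temperature = nn.Parameter(torch.tensor(math.sqrt(self.head_dim)))

    def forward(self, x):                                   # x: (B, N, dim)
        B, N, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(B, N, self.heads, self.head_dim).transpose(1, 2)
                   for t in (q, k, v))
        attn = (q @ k.transpose(-2, -1)) / self.temperature     # (B, heads, N, N)
        mask = torch.eye(N, dtype=torch.bool, device=x.device)
        attn = attn.masked_fill(mask, float('-inf'))            # block self-attention
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, -1)
        return self.out(out)

# Toy usage on a batch of 49 window tokens.
y = LocalitySelfAttention()(torch.randn(2, 49, 96))   # (2, 49, 96)
```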
