Search Results (10)

Search Parameters:
Keywords = ConvMixer blocks

20 pages, 90560 KiB  
Article
A Hybrid MIL Approach Leveraging Convolution and State-Space Model for Whole-Slide Image Cancer Subtyping
by Dehui Bi and Yuqi Zhang
Mathematics 2025, 13(13), 2178; https://doi.org/10.3390/math13132178 - 3 Jul 2025
Viewed by 217
Abstract
Precise identification of cancer subtypes from whole slide images (WSIs) is pivotal in tailoring patient-specific therapies. Under the weakly supervised multiple instance learning (MIL) paradigm, existing techniques frequently fall short in simultaneously capturing local tissue textures and long-range contextual relationships. To address these challenges, we introduce ConvMixerSSM, a hybrid model that integrates a ConvMixer block for local spatial representation, a state space model (SSM) block for capturing long-range dependencies, and a feature-gated block to enhance informative feature selection. The model was evaluated on the TCGA-NSCLC dataset and the CAMELYON16 dataset for cancer subtyping tasks. Extensive experiments, including comparisons with state-of-the-art MIL methods and ablation studies, were conducted to assess the contribution of each component. ConvMixerSSM achieved an AUC of 97.83%, an ACC of 91.82%, and an F1 score of 91.18%, outperforming existing MIL baselines on the TCGA-NSCLC dataset. The ablation study revealed that each block contributed positively to performance, with the full model showing the most balanced and superior results. Moreover, our visualization results further confirm that ConvMixerSSM can effectively identify tumor regions within WSIs, providing model interpretability and clinical relevance. These findings suggest that ConvMixerSSM has strong potential for advancing computational pathology applications in clinical decision-making.
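
The internals of ConvMixerSSM beyond this summary are not shown here, but the local-spatial component it names is the standard ConvMixer block from "Patches Are All You Need?": a depthwise convolution with a residual connection for spatial mixing, followed by a pointwise convolution for channel mixing. A minimal PyTorch sketch, with illustrative dimensions:

```python
import torch
import torch.nn as nn

class Residual(nn.Module):
    """Adds the input back to the output of the wrapped module."""
    def __init__(self, fn):
        super().__init__()
        self.fn = fn

    def forward(self, x):
        return self.fn(x) + x

def conv_mixer_block(dim, kernel_size=9):
    """One ConvMixer block: depthwise conv (spatial mixing) under a
    residual connection, then a pointwise conv (channel mixing)."""
    return nn.Sequential(
        Residual(nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size, groups=dim, padding="same"),
            nn.GELU(),
            nn.BatchNorm2d(dim),
        )),
        nn.Conv2d(dim, dim, kernel_size=1),
        nn.GELU(),
        nn.BatchNorm2d(dim),
    )

x = torch.randn(1, 256, 14, 14)       # (batch, channels, H, W) patch grid
y = conv_mixer_block(256)(x)          # shape preserved: (1, 256, 14, 14)
```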

25 pages, 5304 KiB  
Article
Automatic Detection of Occluded Main Coronary Arteries of NSTEMI Patients with MI-MS ConvMixer + WSSE Without CAG
by Mehmet Cagri Goktekin, Evrim Gul, Tolga Çakmak, Fatih Demir, Mehmet Ali Kobat, Yaman Akbulut, Ömer Işık, Zehra Kadiroğlu, Kürşat Demir and Abdulkadir Şengür
Diagnostics 2025, 15(3), 347; https://doi.org/10.3390/diagnostics15030347 - 2 Feb 2025
Cited by 1 | Viewed by 975
Abstract
Background/Objectives: Heart attacks are the leading cause of death in the world. There are two important classes of heart attack: ST-segment Elevation Myocardial Infarction (STEMI) and Non-ST-segment Elevation Myocardial Infarction (NSTEMI) patient groups. While the STEMI group has a higher mortality rate in the short term, the NSTEMI group is considered more dangerous and insidious in the long term. Blocked coronary arteries can be predicted from ECG signals in STEMI patients but not in NSTEMI patients. Therefore, coronary angiography (CAG) is inevitable for these patients. However, in the elderly and some patients with chronic diseases, if there is a single blockage, the CAG procedure poses a risk, so medication may be preferred. In this study, a novel deep learning-based approach is used to automatically detect the occluded main coronary artery or arteries in NSTEMI patients. For this purpose, a new seven-class dataset was created with expert cardiologists. Methods: A new Multi Input-Multi Scale (MI-MS) ConvMixer model was developed for automatic detection. The MI-MS ConvMixer model allows simultaneous training of 12-channel ECG data and highlights different regions of the data at different scales. In addition, the ConvMixer structure provides high classification performance without increasing the complexity of the model. Moreover, to maximise classifier performance, the WSSE algorithm was developed to adjust the classification prediction value according to the feature importance weights, improving SVM classifier performance. Results: The features extracted from this model were classified with the WSSE algorithm, and an accuracy of 88.72% was achieved. Conclusions: This study demonstrates the potential of the MI-MS ConvMixer model in advancing ECG signal classification for diagnosing coronary artery diseases, offering a promising tool for real-time, automated analysis in clinical settings. The findings highlight the model's ability to achieve high sensitivity, specificity, and precision.
(This article belongs to the Special Issue Artificial Intelligence in Biomedical Diagnostics and Analysis 2024)
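
The abstract does not spell out the WSSE algorithm, so the following is only one plausible reading, sketched under stated assumptions: estimate per-feature importance weights (here via a random forest, my choice, not necessarily the paper's) and rescale the deep features by those weights before the SVM classifies them.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Toy stand-ins for deep features extracted from the MI-MS ConvMixer;
# seven classes to mirror the seven-class dataset described above.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(200, 64)), rng.integers(0, 7, 200)
X_test = rng.normal(size=(50, 64))

# Estimate per-feature importance, then let the weights rescale the
# feature space that the SVM sees.
weights = RandomForestClassifier(n_estimators=100, random_state=0).fit(
    X_train, y_train).feature_importances_
svm = SVC(kernel="rbf").fit(X_train * weights, y_train)
pred = svm.predict(X_test * weights)
```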

26 pages, 5539 KiB  
Article
A Comprehensive CNN Model for Age-Related Macular Degeneration Classification Using OCT: Integrating Inception Modules, SE Blocks, and ConvMixer
by Elif Yusufoğlu, Hüseyin Fırat, Hüseyin Üzen, Salih Taha Alperen Özçelik, İpek Balıkçı Çiçek, Abdulkadir Şengür, Orhan Atila and Numan Halit Guldemir
Diagnostics 2024, 14(24), 2836; https://doi.org/10.3390/diagnostics14242836 - 17 Dec 2024
Cited by 1 | Viewed by 1045
Abstract
Background/Objectives: Age-related macular degeneration (AMD) is a significant cause of vision loss in older adults, often progressing without early noticeable symptoms. Deep learning (DL) models, particularly convolutional neural networks (CNNs), demonstrate potential in accurately diagnosing and classifying AMD using medical imaging technologies like optical coherence tomography (OCT) scans. This study introduces a novel CNN-based DL method for AMD diagnosis, aiming to enhance computational efficiency and classification accuracy. Methods: The proposed method (PM) combines modified Inception modules, Depthwise Squeeze-and-Excitation Blocks, and ConvMixer architecture. Its effectiveness was evaluated on two datasets: a private dataset with 2316 images and the public Noor dataset. Key performance metrics, including accuracy, precision, recall, and F1 score, were calculated to assess the method's diagnostic performance. Results: On the private dataset, the PM achieved outstanding performance: 97.98% accuracy, 97.95% precision, 97.77% recall, and 97.86% F1 score. When tested on the public Noor dataset, the method reached 100% across all evaluation metrics, outperforming existing DL approaches. Conclusions: These results highlight the promising role of AI-based systems in AMD diagnosis, offering advanced feature extraction capabilities that can potentially enable early detection and intervention, ultimately improving patient care and outcomes. While the proposed model demonstrates promising performance on the datasets tested, the study is limited by the size and diversity of the datasets. Future work will focus on external clinical validation to address these limitations.
(This article belongs to the Special Issue Artificial Intelligence in Biomedical Diagnostics and Analysis 2024)
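
The paper's "Depthwise Squeeze-and-Excitation Blocks" are a variant whose exact form isn't given in this summary; the standard SE block they derive from is well established, though. A minimal sketch: squeeze spatial dimensions to per-channel statistics, then excite (rescale) channels through a small bottleneck MLP.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Standard Squeeze-and-Excitation block: global average pooling,
    a reduction/expansion MLP, and sigmoid channel gating."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        scale = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * scale

x = torch.randn(2, 64, 32, 32)
print(SEBlock(64)(x).shape)  # torch.Size([2, 64, 32, 32])
```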

14 pages, 1898 KiB  
Article
Privacy-Preserving ConvMixer Without Any Accuracy Degradation Using Compressible Encrypted Images
by Haiwei Lin, Shoko Imaizumi and Hitoshi Kiya
Information 2024, 15(11), 723; https://doi.org/10.3390/info15110723 - 11 Nov 2024
Cited by 1 | Viewed by 1216
Abstract
We propose an enhanced privacy-preserving method for image classification using ConvMixer, which is an extremely simple model that is similar in spirit to the Vision Transformer (ViT). Most privacy-preserving methods using encrypted images cause the performance of models to degrade due to the influence of encryption, but a state-of-the-art method was demonstrated to have the same classification accuracy as that of models without any encryption under the use of ViT. However, the method, in which a common secret key is assigned to each patch, is not robust enough against ciphertext-only attacks (COAs) including jigsaw puzzle solver attacks if compressible encrypted images are used. In addition, ConvMixer is less robust than ViT because there is no position embedding. To overcome this issue, we propose a novel block-wise encryption method that allows us to assign an independent key to each patch to enhance robustness against attacks. In experiments, the effectiveness of the method is verified in terms of image classification accuracy and robustness, and it is compared with conventional privacy-preserving methods using image encryption.
(This article belongs to the Special Issue Deep Learning for Image, Video and Signal Processing)
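
The paper's actual transform must also keep the encrypted images compressible, which the toy sketch below ignores; it only illustrates the core idea of giving every patch an independent key, so that solving one block (e.g., with a jigsaw puzzle solver) reveals nothing about the others.

```python
import numpy as np

def blockwise_encrypt(img, patch=16, master_seed=42):
    """Scramble pixel positions independently inside every patch,
    deriving a different seed (key) per patch from a master seed."""
    h, w, c = img.shape
    out = img.copy()
    n_blocks = (h // patch) * (w // patch)
    seeds = np.random.default_rng(master_seed).integers(0, 2**32, size=n_blocks)
    k = 0
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            block = out[i:i + patch, j:j + patch].reshape(-1, c)
            perm = np.random.default_rng(seeds[k]).permutation(len(block))
            out[i:i + patch, j:j + patch] = block[perm].reshape(patch, patch, c)
            k += 1
    return out

encrypted = blockwise_encrypt(np.zeros((224, 224, 3), dtype=np.uint8))
```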

18 pages, 1646 KiB  
Article
Electroencephalogram-Based ConvMixer Architecture for Recognizing Attention Deficit Hyperactivity Disorder in Children
by Min Feng and Juncai Xu
Brain Sci. 2024, 14(5), 469; https://doi.org/10.3390/brainsci14050469 - 7 May 2024
Cited by 5 | Viewed by 2095
Abstract
Attention deficit hyperactivity disorder (ADHD) is a neurodevelopmental disorder that affects approximately 5–10% of school-aged children worldwide. Early diagnosis and intervention are essential to improve the quality of life of patients and their families. In this study, we propose ConvMixer-ECA, a novel deep learning architecture that combines ConvMixer with efficient channel attention (ECA) blocks for the accurate diagnosis of ADHD using electroencephalogram (EEG) signals. The model was trained and evaluated using EEG recordings from 60 healthy children and 61 children with ADHD. A series of experiments were conducted to evaluate the performance of ConvMixer-ECA. The results showed that ConvMixer-ECA performed well in ADHD recognition with 94.52% accuracy. The incorporation of attention mechanisms, in particular ECA, improved the performance of ConvMixer, outperforming other attention-based variants. In addition, ConvMixer-ECA outperformed state-of-the-art deep learning models including EEGNet, CNN, RNN, LSTM, and GRU. t-SNE visualization of the model's layer outputs validated the effectiveness of ConvMixer-ECA in capturing, through hierarchical feature learning, the underlying patterns that separate children with ADHD from typically developing ones. These outcomes demonstrate the potential of ConvMixer-ECA as a valuable tool to assist clinicians in the early diagnosis and intervention of ADHD in children.
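
ECA replaces the SE block's fully connected layers with a single k-sized 1D convolution over the pooled channel descriptor, adding almost no parameters. A minimal sketch of the standard ECA module (the surrounding layers of the ADHD model are not shown here):

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: a k-sized 1D convolution across the
    pooled channel descriptor replaces SE's fully connected layers."""
    def __init__(self, kernel_size=3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size,
                              padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # (B, C, H, W) -> (B, 1, C): treat channels as a 1D sequence.
        y = self.pool(x).squeeze(-1).transpose(-1, -2)
        y = self.sigmoid(self.conv(y)).transpose(-1, -2).unsqueeze(-1)
        return x * y

x = torch.randn(4, 64, 8, 8)   # e.g. feature maps from an EEG front-end
print(ECA()(x).shape)          # torch.Size([4, 64, 8, 8])
```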

22 pages, 4219 KiB  
Article
MixerNet-SAGA: A Novel Deep Learning Architecture for Superior Road Extraction in High-Resolution Remote Sensing Imagery
by Wei Wu, Chao Ren, Anchao Yin and Xudong Zhang
Appl. Sci. 2023, 13(18), 10067; https://doi.org/10.3390/app131810067 - 6 Sep 2023
Cited by 5 | Viewed by 1625
Abstract
In this study, we address the limitations of current deep learning models in road extraction tasks from remote sensing imagery. We introduce MixerNet-SAGA, a novel deep learning model that incorporates the strengths of U-Net, integrates a ConvMixer block for enhanced feature extraction, and includes a Scaled Attention Gate (SAG) for augmented spatial attention. Experimental validation on the Massachusetts road dataset and the DeepGlobe road dataset demonstrates that MixerNet-SAGA achieves a 10% improvement in precision, 8% in recall, and 12% in IoU compared to leading models such as U-Net, ResNet, and SDUNet. Furthermore, our model excels in computational efficiency, running 20% faster with a smaller model size. Notably, MixerNet-SAGA shows exceptional robustness against challenges such as same-spectrum–different-object and different-spectrum–same-object phenomena. Ablation studies further reveal the critical roles of the ConvMixer block and the SAG. Despite its strengths, the model's scalability to extremely large datasets remains an area for future investigation. Collectively, MixerNet-SAGA offers an efficient and accurate solution for road extraction in remote sensing imagery and presents significant potential for broader applications.
(This article belongs to the Special Issue AI-Based Image Processing: 2nd Edition)
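
The exact form of the Scaled Attention Gate isn't given in this summary; the sketch below shows the additive attention gate it presumably extends, in the spirit of Attention U-Net, where a decoder gating signal suppresses irrelevant skip-connection features (same-resolution inputs are assumed for brevity):

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Additive attention gate: the decoder gating signal g decides which
    encoder skip-connection features x are allowed through."""
    def __init__(self, x_ch, g_ch, inter_ch):
        super().__init__()
        self.wx = nn.Conv2d(x_ch, inter_ch, 1)
        self.wg = nn.Conv2d(g_ch, inter_ch, 1)
        self.psi = nn.Sequential(nn.Conv2d(inter_ch, 1, 1), nn.Sigmoid())
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x, g):
        attn = self.psi(self.relu(self.wx(x) + self.wg(g)))  # (B, 1, H, W)
        return x * attn

skip = torch.randn(1, 64, 56, 56)   # encoder skip features
gate = torch.randn(1, 64, 56, 56)   # upsampled decoder features
print(AttentionGate(64, 64, 32)(skip, gate).shape)
```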

13 pages, 1087 KiB  
Article
Privacy-Preserving Image Classification Using ConvMixer with Adaptative Permutation Matrix and Block-Wise Scrambled Image Encryption
by Zheng Qi, AprilPyone MaungMaung and Hitoshi Kiya
J. Imaging 2023, 9(4), 85; https://doi.org/10.3390/jimaging9040085 - 18 Apr 2023
Cited by 8 | Viewed by 2379
Abstract
In this paper, we propose a privacy-preserving image classification method using block-wise scrambled images and a modified ConvMixer. Conventional block-wise scrambled encryption methods usually need the combined use of an adaptation network and a classifier to reduce the influence of image encryption. However, we point out that it is problematic to utilize large-size images with conventional methods using an adaptation network because of the significant increase in computation cost. Thus, we propose a novel privacy-preserving method that allows us not only to apply block-wise scrambled images to ConvMixer for both training and testing without an adaptation network, but also to provide high classification accuracy and strong robustness against attack methods. Furthermore, we also evaluate the computation cost of state-of-the-art privacy-preserving DNNs to confirm that our proposed method requires fewer computational resources. In an experiment, we evaluated the classification performance of the proposed method on CIFAR-10 and ImageNet against other methods, as well as its robustness to various ciphertext-only attacks.
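
One way to picture how a classifier can absorb block-wise scrambling without an adaptation network: a fixed per-patch pixel permutation is an orthogonal matrix, so a single linear layer placed before the patch embedding can represent its transpose and invert it exactly. A toy sketch; the shared key and the explicit matrix construction are my simplifications, not the paper's adaptive scheme:

```python
import torch
import torch.nn as nn

patch, c = 4, 3
d = patch * patch * c                  # flattened pixels per patch

# Permutation used to scramble every patch in this toy setup.
perm = torch.randperm(d)
P = torch.eye(d)[perm]                 # permutation matrix

# A linear layer holding P^T undoes the scrambling in-network
# (P is orthogonal, so P^T = P^-1).
unscramble = nn.Linear(d, d, bias=False)
with torch.no_grad():
    unscramble.weight.copy_(P.t())

x = torch.randn(8, d)                  # 8 flattened plaintext patches
x_scrambled = x @ P.t()                # what the network actually receives
assert torch.allclose(unscramble(x_scrambled), x, atol=1e-5)
```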

14 pages, 5476 KiB  
Article
Deep Learning Architectures for Diagnosis of Diabetic Retinopathy
by Alberto Solano, Kevin N. Dietrich, Marcelino Martínez-Sober, Regino Barranquero-Cardeñosa, Jorge Vila-Tomás and Pablo Hernández-Cámara
Appl. Sci. 2023, 13(7), 4445; https://doi.org/10.3390/app13074445 - 31 Mar 2023
Cited by 12 | Viewed by 3443
Abstract
For many years, convolutional neural networks dominated the field of computer vision, not least in the medical field, where problems such as image segmentation were addressed by networks like the U-Net. The arrival of self-attention-based networks to the field of computer vision through ViTs seems to have changed the trend of using standard convolutions. Throughout this work, we apply different architectures, such as U-Net, ViTs, and ConvMixer, to compare their performance on a medical semantic segmentation problem. All the models have been trained from scratch on the DRIVE dataset and evaluated on their private counterparts to assess which of the models performed better in the segmentation problem. Our major contribution is showing that the best-performing model (ConvMixer) is the one that shares the approach of the ViT (processing images as patches) while maintaining the foundational blocks (convolutions) of the U-Net. This mixture not only produces better results (DICE = 0.83) than both ViTs (0.80/0.077 for UNETR/SWIN-Unet) and the U-Net (0.82) on their own but also considerably reduces the number of parameters (2.97M against 104M/27M and 31M, respectively), showing that there is no need to systematically use large models for solving image problems where smaller architectures with the optimal pieces can get better results.
(This article belongs to the Special Issue Machine/Deep Learning: Applications, Technologies and Algorithms)
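
The parameter argument is easy to check in code: a ConvMixer-style trunk is dominated by cheap depthwise and pointwise convolutions. The toy configuration below is illustrative, not the paper's actual 2.97M model, and lands at roughly 0.24M parameters:

```python
import torch.nn as nn

def count_params(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# A tiny ConvMixer-style trunk: patch embedding plus a stack of
# depthwise + pointwise blocks.
dim, depth, kernel, patch = 128, 8, 9, 7
trunk = nn.Sequential(
    nn.Conv2d(3, dim, patch, stride=patch),   # patch embedding
    *[nn.Sequential(
        nn.Conv2d(dim, dim, kernel, groups=dim, padding="same"),
        nn.GELU(), nn.BatchNorm2d(dim),
        nn.Conv2d(dim, dim, 1), nn.GELU(), nn.BatchNorm2d(dim),
    ) for _ in range(depth)],
)
print(f"{count_params(trunk) / 1e6:.2f}M parameters")   # ~0.24M
```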

14 pages, 1017 KiB  
Article
EmbedFormer: Embedded Depth-Wise Convolution Layer for Token Mixing
by Zeji Wang, Xiaowei He, Yi Li and Qinliang Chuai
Sensors 2022, 22(24), 9854; https://doi.org/10.3390/s22249854 - 15 Dec 2022
Cited by 8 | Viewed by 3884
Abstract
Visual Transformers (ViTs) have shown impressive performance due to their powerful ability to encode spatial and channel information. MetaFormer gives us a general architecture of transformers consisting of a token mixer and a channel mixer, through which we can generally understand how transformers work. It has been shown that this general architecture contributes more to the models' performance than the self-attention mechanism itself, and depth-wise convolution (DwConv) layers are now widely accepted as replacements for local self-attention in transformers. In this work, a pure convolutional "transformer" is designed. We rethink the difference between the operation of self-attention and DwConv: the self-attention layer, together with an embedding layer, unavoidably affects channel information, while DwConv only mixes token information per channel. To address this difference, we implement DwConv preceded by an embedding layer as the token mixer to instantiate a MetaFormer block, and introduce a model named EmbedFormer. Meanwhile, SEBlock is applied in the channel mixer part to improve performance. On the ImageNet-1K classification task, EmbedFormer achieves a top-1 accuracy of 81.7% without additional training images, surpassing the Swin transformer by +0.4% at similar complexity. In addition, EmbedFormer was evaluated on downstream tasks, where its results are entirely above those of PoolFormer, ResNet, and DeiT. Compared with PoolFormer-S24, another instance of MetaFormer, our EmbedFormer improves the score by +3.0% box AP/+2.3% mask AP on the COCO dataset and +1.3% mIoU on ADE20K.
(This article belongs to the Section Sensing and Imaging)
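
A sketch of the idea, under my assumptions about unstated details such as the normalization: a pointwise "embedding" convolution lets channels interact, mirroring what self-attention's projections do, before a depthwise convolution mixes tokens per channel inside a MetaFormer skeleton (the SEBlock in the channel mixer is omitted for brevity):

```python
import torch
import torch.nn as nn

class EmbedDwConvMixer(nn.Module):
    """Token mixer in the spirit of EmbedFormer: a pointwise 'embedding'
    conv mixes channels, then a depthwise conv mixes tokens per channel."""
    def __init__(self, dim, kernel_size=7):
        super().__init__()
        self.embed = nn.Conv2d(dim, dim, 1)                  # channel interaction
        self.dwconv = nn.Conv2d(dim, dim, kernel_size,
                                padding=kernel_size // 2,
                                groups=dim)                  # token mixing

    def forward(self, x):
        return self.dwconv(self.embed(x))

class MetaFormerBlock(nn.Module):
    """MetaFormer skeleton: norm -> token mixer -> residual,
    then norm -> channel MLP -> residual."""
    def __init__(self, dim):
        super().__init__()
        self.norm1, self.norm2 = nn.BatchNorm2d(dim), nn.BatchNorm2d(dim)
        self.mixer = EmbedDwConvMixer(dim)
        self.mlp = nn.Sequential(nn.Conv2d(dim, 4 * dim, 1), nn.GELU(),
                                 nn.Conv2d(4 * dim, dim, 1))

    def forward(self, x):
        x = x + self.mixer(self.norm1(x))
        return x + self.mlp(self.norm2(x))

print(MetaFormerBlock(64)(torch.randn(1, 64, 14, 14)).shape)
```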

16 pages, 8388 KiB  
Article
Mixer U-Net: An Improved Automatic Road Extraction from UAV Imagery
by Furkat Sultonov, Jun-Hyun Park, Sangseok Yun, Dong-Woo Lim and Jae-Mo Kang
Appl. Sci. 2022, 12(4), 1953; https://doi.org/10.3390/app12041953 - 13 Feb 2022
Cited by 24 | Viewed by 3950
Abstract
Automatic road extraction from unmanned aerial vehicle (UAV) imagery has been one of the major research topics in the area of remote sensing analysis due to its importance in a wide range of applications such as urban planning, road monitoring, intelligent transportation systems, and automatic road navigation. Thanks to recent advances in Deep Learning (DL), the tedious manual segmentation of roads can be automated. However, the majority of these models are computationally heavy and, thus, not suitable for UAV remote-sensing tasks with limited resources. To alleviate this bottleneck, we propose two lightweight models based on depthwise separable convolutions and a ConvMixer inception block. Both models take advantage of the computational efficiency of depthwise separable convolutions and the multi-scale processing of the inception module, and combine them in the encoder–decoder architecture of U-Net. Specifically, we replace the standard convolution layers used in U-Net with ConvMixer layers. Furthermore, in order to learn image features at different scales, we apply the ConvMixer layer within the Inception module. Finally, we incorporate pathway networks along the skip connections to minimize the semantic gap between the encoder and decoder. To validate the performance and effectiveness of the models, we adopt the Massachusetts roads dataset. One incarnation of our models beats U-Net's performance with 10× fewer parameters and DeepLabV3's performance with 12× fewer parameters in terms of the mean intersection over union (mIoU) metric. For further validation, we compared our models against four baselines in total, using additional metrics such as precision (P), recall (R), and F1 score.
(This article belongs to the Special Issue Perception, Navigation, and Control for Unmanned Aerial Vehicles)
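
The efficiency claim rests largely on depthwise separable convolutions, which factor a standard convolution into a per-channel spatial filter plus a 1×1 pointwise mix. A minimal sketch, with the parameter arithmetic in a comment:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution: per-channel spatial filtering
    followed by a 1x1 pointwise conv, cutting parameters and FLOPs by
    roughly the kernel area versus a standard convolution."""
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# Standard 3x3 conv 64->128: 64*128*9 = 73,728 weights;
# depthwise separable:       64*9 + 64*128 = 8,768 weights.
print(DepthwiseSeparableConv(64, 128)(torch.randn(1, 64, 32, 32)).shape)
```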
