MDPI - Publisher of Open Access Journals

19 pages, 4569 KB

Open AccessArticle

NeuroNet-AD: A Multimodal Deep Learning Framework for Multiclass Alzheimer’s Disease Diagnosis

by Saeka Rahman, Md Motiur Rahman, Smriti Bhatt, Raji Sundararajan and Miad Faezipour

Bioengineering 2025, 12(10), 1107; https://doi.org/10.3390/bioengineering12101107 - 15 Oct 2025

Viewed by 464

Alzheimer’s disease (AD) is the most prevalent form of dementia. This disease significantly impacts cognitive functions and daily activities. Early and accurate diagnosis of AD, including the preliminary stage of mild cognitive impairment (MCI), is critical for effective patient care and treatment development. Although advancements in deep learning (DL) and machine learning (ML) models improve diagnostic precision, the lack of large datasets limits further enhancements, necessitating the use of complementary data. Existing convolutional neural networks (CNNs) effectively process visual features but struggle to fuse multimodal data for AD diagnosis. To address these challenges, we propose NeuroNet-AD, a novel multimodal CNN framework designed to enhance AD classification accuracy. NeuroNet-AD integrates Magnetic Resonance Imaging (MRI) images with clinical text-based metadata, including psychological test scores, demographic information, and genetic biomarkers. In NeuroNet-AD, we incorporate Convolutional Block Attention Modules (CBAMs) within the ResNet-18 backbone, enabling the model to focus on the most informative spatial and channel-wise features. We introduce an attention computation and multimodal fusion module, named Meta Guided Cross Attention (MGCA), which facilitates effective cross-modal alignment between images and meta-features through a multi-head attention mechanism. Additionally, we employ an ensemble-based feature selection strategy to identify the most discriminative features from the textual data, improving model generalization and performance. We evaluate NeuroNet-AD on the Alzheimer’s Disease Neuroimaging Initiative (ADNI1) dataset using subject-level 5-fold cross-validation and a held-out test set to ensure robustness. NeuroNet-AD achieved 98.68% accuracy in multiclass classification of normal control (NC), MCI, and AD and 99.13% accuracy in the binary setting (NC vs. AD) on the ADNI dataset, outperforming state-of-the-art models. 
External validation on the OASIS-3 dataset further confirmed the model’s generalization ability, achieving 94.10% accuracy in the multiclass setting and 98.67% accuracy in the binary setting, despite variations in demographics and acquisition protocols. Further extensive evaluation studies demonstrate the effectiveness of each component of NeuroNet-AD in improving the performance. Full article

(This article belongs to the Special Issue Next-Generation Diagnostic and Therapy Systems for Neurodegenerative Diseases)

24 pages, 2596 KB

Open AccessArticle

Improving Segmentation Accuracy for Asphalt Pavement Cracks via Integrated Probability Maps

by Roman Trach, Volodymyr Tyvoniuk and Yuliia Trach

Appl. Sci. 2025, 15(18), 9865; https://doi.org/10.3390/app15189865 - 9 Sep 2025

Viewed by 589

Abstract

Asphalt crack segmentation is essential for preventive maintenance but is sensitive to noise, viewpoint, and illumination. This study evaluates a minimally invasive strategy that augments standard RGB input with an auxiliary fourth channel—a crack-probability map generated by a multi-scale ensemble of classifiers—and injects it into segmentation backbones. Field imagery from unmanned aerial vehicles and action cameras was used to train and compare U-Net, ENet, HRNet, and DeepLabV3+ under unified settings; the probability map was produced by an ensemble of lightweight convolutional neural networks (CNNs). Across models, the four-channel configuration improved performance over three-channel baselines; for DeepLabV3+, the Intersection over Union (IoU) increased by 6.41%. Transformer-based classifiers, despite strong accuracy, proved less effective and slower than lightweight CNNs for probability-map generation; the final ensemble processed images in approximately 0.63 s each. Integrating ensemble-derived probability maps yielded consistent gains, with the best four-channel CNNs surpassing YOLO11x-seg and Transformer baselines while remaining practical. This study presents a systematic evaluation showing that probability maps from classifier ensembles can serve as an auxiliary channel to improve segmentation of asphalt pavement cracks, providing a novel modular complement or alternative to attention mechanisms. The findings demonstrate a practical and effective strategy for enhancing automated pavement monitoring. Full article
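
The four-channel input described in this abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' code: the function name `stack_probability_channel`, the nested-list image layout, and the scaling of all values to [0, 1] are hypothetical choices made for the example.

```python
def stack_probability_channel(rgb, prob_map):
    """Append a crack-probability map to an RGB image as a fourth channel.

    rgb:      H x W x 3 nested lists, values assumed scaled to [0, 1]
    prob_map: H x W nested lists of crack probabilities in [0, 1]
    Returns an H x W x 4 nested list (R, G, B, probability).
    """
    if len(rgb) != len(prob_map):
        raise ValueError("height mismatch between image and probability map")
    stacked = []
    for rgb_row, prob_row in zip(rgb, prob_map):
        if len(rgb_row) != len(prob_row):
            raise ValueError("width mismatch between image and probability map")
        # Each pixel keeps its three colour values and gains one extra value.
        stacked.append([list(pixel) + [p] for pixel, p in zip(rgb_row, prob_row)])
    return stacked


# Tiny 2 x 2 example with made-up values (not data from the study).
image = [[[0.2, 0.2, 0.2], [0.8, 0.8, 0.8]],
         [[0.1, 0.1, 0.1], [0.9, 0.9, 0.9]]]
probs = [[0.05, 0.90],
         [0.10, 0.95]]
four_channel = stack_probability_channel(image, probs)
```

A segmentation backbone fed with such input would need its first convolution to accept four channels instead of three; how the study adapted each backbone is not stated in the abstract.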

(This article belongs to the Special Issue Technology and Organization Applied to Civil Engineering)

25 pages, 3596 KB

Open AccessArticle

Enhancing Deep Learning Sustainability by Synchronized Multi Augmentation with Rotations and Multi-Backbone Architectures

by Nikita Gordienko, Yuri Gordienko and Sergii Stirenko

Big Data Cogn. Comput. 2025, 9(5), 115; https://doi.org/10.3390/bdcc9050115 - 27 Apr 2025

Viewed by 670

Abstract

Deep learning applications for Edge Intelligence (EI) face challenges in achieving high model performance while maintaining computational efficiency, particularly under varying image orientations and perspectives. This study investigates the synergy of multi-backbone (MB) configurations and Synchronized Multi Augmentation (SMA) to address these challenges by leveraging diverse input representations and spatial transformations. SMA employs synchronously augmented input data across MBs during training, thereby improving feature extraction across diverse representations. The outputs provided by these MBs are merged through different fusion strategies: Averaging Fusion with aggregation of predictions and Dense Fusion with integration of features via a fully connected neural network. It aims to increase model accuracy on previously unseen input data and to reduce computational requirements by minimizing neural network size, particularly advantageous for EI systems characterized by the limited computing resources. This study employed MBs with the MobileNetV3 architecture and the CIFAR-10 dataset to investigate the impact of SMA techniques and different fusion strategies on model robustness and performance. SMA techniques were applied to simulate diverse image orientations, and MB architectures were tested with Averaging and Dense fusion strategies to assess their ability to learn diverse feature representations and improve robustness. The experiments revealed that models augmented with SMA outperformed the baseline MobileNetV3 on modified datasets, achieving higher robustness to orientation variations. Models with Averaging fusion exhibited the most stable performance across datasets, while Dense fusion achieved the highest metrics under specific conditions. Results indicate that SMAs incorporating image transformation adjustments, such as rotation, significantly enhance generalization across varying orientation conditions. 
This approach enables the production of more stable results using the same pretrained weights in real-world applications by configuring Image Signal Processing (ISP) to effectively use SMA. The findings encourage further exploration of SMA techniques in conjunction with diverse camera sensor configurations and ISP settings to optimize real-world deployments. Full article

(This article belongs to the Special Issue Machine Learning and AI Technology for Sustainable Development)

20 pages, 10708 KB

Open AccessArticle

Synchronized Multi-Augmentation with Multi-Backbone Ensembling for Enhancing Deep Learning Performance

by Nikita Gordienko, Yuri Gordienko and Sergii Stirenko

Appl. Syst. Innov. 2025, 8(1), 18; https://doi.org/10.3390/asi8010018 - 21 Jan 2025

Cited by 1 | Viewed by 1405

Abstract

This study introduces a novel technique called Synchronized Multi-Augmentation (SMA) combined with multi-backbone (MB) ensembling to enhance model performance and generalization in deep learning (DL) tasks in real-world scenarios. SMA utilizes synchronously augmented input data for training across multiple backbones, improving the overall feature extraction process. The outputs from these backbones are fused using two distinct strategies: the averaging fusion method, which averages predictions, and the dense fusion method, which averages features through a fully connected network. These methods aim to boost accuracy and reduce computational costs, particularly in Edge Intelligence (EI) systems with limited resources. The proposed SMA technique was evaluated on the CIFAR-10 dataset, highlighting its potential to enhance classification tasks in DL workflows. This study provides a comprehensive analysis of various backbones, their ensemble methods, and the impact of different SMAs on model performance. The results demonstrate that SMAs involving color adjustments, such as contrast and equalization, significantly improve generalization under varied lighting conditions that simulated real-world low-illumination conditions, outperforming traditional spatial augmentations. This approach is particularly beneficial for EI hardware, such as microcontrollers and IoT devices, which operate under strict constraints like limited processing power and memory and real-time processing requirements. This study’s findings suggest that employing SMA and MB ensembling can offer significant improvements in accuracy, generalization, and efficiency, making it a viable solution for deploying DL models on edge devices with constrained resources under real-world practical conditions. Full article
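
One plausible reading of the averaging fusion method is sketched below: each backbone receives its own augmented view of the same sample, and the resulting class probabilities are averaged. Everything here is a hypothetical stand-in. The toy backbones, the `boost_contrast` augmentation, and the pairing of one augmentation per backbone are assumptions for illustration; the abstract does not specify these details, and the study applies SMA during training.

```python
def averaging_fusion(sample, backbones, augmentations):
    """Average class probabilities over backbones, each fed its own view of one sample."""
    if len(backbones) != len(augmentations):
        raise ValueError("need exactly one augmentation per backbone")
    outputs = [backbone(augment(sample))
               for backbone, augment in zip(backbones, augmentations)]
    n_outputs = len(outputs)
    n_classes = len(outputs[0])
    return [sum(out[c] for out in outputs) / n_outputs for c in range(n_classes)]


def toy_backbone(bias):
    """Hypothetical two-class 'model': probability of class 1 grows with mean brightness."""
    def predict(pixels):
        mean = sum(pixels) / len(pixels)
        p_bright = min(1.0, max(0.0, mean + bias))
        return [1.0 - p_bright, p_bright]
    return predict


def identity(pixels):
    return list(pixels)


def boost_contrast(pixels, gain=1.5):
    """Stretch values around mid-grey and clip to [0, 1]."""
    return [min(1.0, max(0.0, (v - 0.5) * gain + 0.5)) for v in pixels]


sample = [0.6, 0.7, 0.8]  # made-up pixel intensities
fused = averaging_fusion(sample,
                         backbones=[toy_backbone(0.0), toy_backbone(0.0)],
                         augmentations=[identity, boost_contrast])
```

Dense fusion would replace the plain average with a small trained fully connected network over the backbones' features, which is why it needs extra parameters that averaging does not.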

(This article belongs to the Special Issue Advancements in Deep Learning and Its Applications)

24 pages, 3332 KB

Open AccessArticle

U-Net Ensemble for Enhanced Semantic Segmentation in Remote Sensing Imagery

by Ivica Dimitrovski, Vlatko Spasev, Suzana Loshkovska and Ivan Kitanovski

Remote Sens. 2024, 16(12), 2077; https://doi.org/10.3390/rs16122077 - 8 Jun 2024

Cited by 28 | Viewed by 9512

Abstract

Semantic segmentation of remote sensing imagery stands as a fundamental task within the domains of both remote sensing and computer vision. Its objective is to generate a comprehensive pixel-wise segmentation map of an image, assigning a specific label to each pixel. This facilitates in-depth analysis and comprehension of the Earth’s surface. In this paper, we propose an approach for enhancing semantic segmentation performance by employing an ensemble of U-Net models with three different backbone networks: Multi-Axis Vision Transformer, ConvFormer, and EfficientNet. The final segmentation maps are generated through a geometric mean ensemble method, leveraging the diverse representations learned by each backbone network. The effectiveness of the base U-Net models and the proposed ensemble is evaluated on multiple datasets commonly used for semantic segmentation tasks in remote sensing imagery, including LandCover.ai, LoveDA, INRIA, UAVid, and ISPRS Potsdam datasets. Our experimental results demonstrate that the proposed approach achieves state-of-the-art performance, showcasing its effectiveness and robustness in accurately capturing the semantic information embedded within remote sensing images. Full article
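
A geometric mean ensemble of segmentation outputs can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the nested-list layout (height x width x classes), the per-pixel renormalisation, and the `eps` guard are assumptions made for the example.

```python
import math


def geometric_mean_ensemble(prob_maps, eps=1e-12):
    """Fuse per-model class probabilities with a pixel-wise geometric mean.

    prob_maps: list of M maps, each an H x W x C nested list of probabilities.
    Returns (fused, labels): per-pixel probabilities renormalised to sum to 1,
    and the index of the most likely class at each pixel.
    """
    n_models = len(prob_maps)
    height, width = len(prob_maps[0]), len(prob_maps[0][0])
    n_classes = len(prob_maps[0][0][0])
    fused, labels = [], []
    for y in range(height):
        fused_row, label_row = [], []
        for x in range(width):
            # Mean of logs equals the log of the geometric mean; eps avoids log(0).
            geo = [math.exp(sum(math.log(m[y][x][c] + eps) for m in prob_maps) / n_models)
                   for c in range(n_classes)]
            total = sum(geo)
            probs = [g / total for g in geo]
            fused_row.append(probs)
            label_row.append(max(range(n_classes), key=probs.__getitem__))
        fused.append(fused_row)
        labels.append(label_row)
    return fused, labels


# Three hypothetical models, a 1 x 2 image, two classes (made-up numbers).
model_a = [[[0.9, 0.1], [0.4, 0.6]]]
model_b = [[[0.8, 0.2], [0.3, 0.7]]]
model_c = [[[0.6, 0.4], [0.6, 0.4]]]
fused, labels = geometric_mean_ensemble([model_a, model_b, model_c])
```

Compared with an arithmetic mean, the geometric mean penalises a class more strongly when any single model assigns it a low probability.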

(This article belongs to the Special Issue GeoAI and EO Big Data Driven Advances in Earth Environmental Science)

16 pages, 1869 KB

Open AccessArticle

Multi-Stage Classification of Retinal OCT Using Multi-Scale Ensemble Deep Architecture

by Oluwatunmise Akinniyi, Md Mahmudur Rahman, Harpal Singh Sandhu, Ayman El-Baz and Fahmi Khalifa

Bioengineering 2023, 10(7), 823; https://doi.org/10.3390/bioengineering10070823 - 10 Jul 2023

Cited by 19 | Viewed by 3583

Abstract

Accurate noninvasive diagnosis of retinal disorders is required for appropriate treatment or precision medicine. This work proposes a multi-stage classification network built on a multi-scale (pyramidal) feature ensemble architecture for retinal image classification using optical coherence tomography (OCT) images. First, a scale-adaptive neural network is developed to produce multi-scale inputs for feature extraction and ensemble learning. The larger input sizes yield more global information, while the smaller input sizes focus on local details. Then, a feature-rich pyramidal architecture is designed to extract multi-scale features as inputs using DenseNet as the backbone. The advantage of the hierarchical structure is that it allows the system to extract multi-scale, information-rich features for the accurate classification of retinal disorders. Evaluation on two public OCT datasets containing normal and abnormal retinas (e.g., diabetic macular edema (DME), choroidal neovascularization (CNV), age-related macular degeneration (AMD), and Drusen) and comparison against recent networks demonstrate the advantages of the proposed architecture, which produces feature-rich classification with average accuracies of 97.78%, 96.83%, and 94.26% for the first (binary) stage, second (three-class) stage, and all-at-once (four-class) classification, respectively, in cross-validation experiments on the first dataset. On the second dataset, our system showed an overall accuracy, sensitivity, and specificity of 99.69%, 99.71%, and 99.87%, respectively. Overall, the tangible advantages of the proposed network for enhanced feature learning might be used in various medical image classification tasks where scale-invariant features are crucial for precise diagnosis. Full article

(This article belongs to the Special Issue Artificial Intelligence in Medical Image Processing and Segmentation)

25 pages, 7981 KB

Open AccessArticle

A Multimodal Data Fusion and Deep Learning Framework for Large-Scale Wildfire Surface Fuel Mapping

by Mohamad Alipour, Inga La Puma, Joshua Picotte, Kasra Shamsaei, Eric Rowell, Adam Watts, Branko Kosovic, Hamed Ebrahimian and Ertugrul Taciroglu

Fire 2023, 6(2), 36; https://doi.org/10.3390/fire6020036 - 17 Jan 2023

Cited by 31 | Viewed by 7485

Abstract

Accurate estimation of fuels is essential for wildland fire simulations as well as decision-making related to land management. Numerous research efforts have leveraged remote sensing and machine learning for classifying land cover and mapping forest vegetation species. In most cases that focused on surface fuel mapping, the spatial scale of interest was smaller than a few hundred square kilometers; thus, many small-scale site-specific models had to be created to cover the landscape at the national scale. The present work aims to develop a large-scale surface fuel identification model using a custom deep learning framework that can ingest multimodal data. Specifically, we use deep learning to extract information from multispectral signatures, high-resolution imagery, and biophysical climate and terrain data in a way that facilitates their end-to-end training on labeled data. A multi-layer neural network is used with spectral and biophysical data, and a convolutional neural network backbone is used to extract the visual features from high-resolution imagery. A Monte Carlo dropout mechanism was also devised to create a stochastic ensemble of models that can capture classification uncertainties while boosting the prediction performance. To train the system as a proof-of-concept, fuel pseudo-labels were created by a random geospatial sampling of existing fuel maps across California. Application results on independent test sets showed promising fuel identification performance with an overall accuracy ranging from 55% to 75%, depending on the level of granularity of the included fuel types. As expected, including the rare—and possibly less consequential—fuel types reduced the accuracy. On the other hand, the addition of high-resolution imagery improved classification performance at all levels. Full article
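
The idea of a Monte Carlo dropout ensemble can be sketched in a few lines: dropout stays active at prediction time, many stochastic forward passes are averaged, and the spread of the averaged distribution serves as an uncertainty estimate. The sketch below is a toy under stated assumptions. It uses a single hypothetical linear layer with dropout on its inputs, made-up weights, and predictive entropy as the uncertainty measure; none of these details come from the study.

```python
import math
import random


def softmax(logits):
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def stochastic_forward(features, weights, drop_rate, rng):
    """One forward pass with dropout left ON (inverted-dropout scaling)."""
    keep = 1.0 - drop_rate
    dropped = [f / keep if rng.random() < keep else 0.0 for f in features]
    logits = [sum(w * f for w, f in zip(row, dropped)) for row in weights]
    return softmax(logits)


def mc_dropout_predict(features, weights, drop_rate=0.3, passes=200, seed=0):
    """Average class probabilities over many stochastic passes.

    Returns (mean_probs, entropy); higher entropy means a less certain prediction.
    """
    rng = random.Random(seed)
    mean_probs = [0.0] * len(weights)
    for _ in range(passes):
        probs = stochastic_forward(features, weights, drop_rate, rng)
        mean_probs = [m + p / passes for m, p in zip(mean_probs, probs)]
    entropy = -sum(p * math.log(p) for p in mean_probs if p > 0)
    return mean_probs, entropy


# Hypothetical 3-class classifier over 4 input features (made-up numbers).
weights = [[2.0, 0.0, 0.5, 0.0],
           [0.0, 1.5, 0.0, 0.5],
           [0.2, 0.2, 0.2, 0.2]]
features = [1.0, 0.2, 0.4, 0.1]
mean_probs, uncertainty = mc_dropout_predict(features, weights)
```

Each pass behaves like a different thinned network, so averaging the passes acts as an inexpensive ensemble drawn from a single trained model.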

(This article belongs to the Special Issue Advances in the Measurement of Fuels and Fuel Properties)

30 pages, 3615 KB

Open AccessArticle

ETECADx: Ensemble Self-Attention Transformer Encoder for Breast Cancer Diagnosis Using Full-Field Digital X-ray Breast Images

by Aymen M. Al-Hejri, Riyadh M. Al-Tam, Muneer Fazea, Archana Harsing Sable, Soojeong Lee and Mugahed A. Al-antari

Diagnostics 2023, 13(1), 89; https://doi.org/10.3390/diagnostics13010089 - 28 Dec 2022

Cited by 42 | Viewed by 6160

Abstract

Early detection of breast cancer is essential to reduce the mortality rate among women. In this paper, a new AI-based computer-aided diagnosis (CAD) framework called ETECADx is proposed by fusing the benefits of both ensemble transfer learning of convolutional neural networks and the self-attention mechanism of the vision transformer (ViT) encoder. Accurate and valuable high-level deep features are generated via the backbone ensemble network, while the transformer encoder is used to diagnose the breast cancer probabilities in two approaches: Approach A (i.e., binary classification) and Approach B (i.e., multi-classification). To build the proposed CAD system, the benchmark public multi-class INbreast dataset is used. Meanwhile, private real breast cancer images are collected and annotated by expert radiologists to validate the prediction performance of the proposed ETECADx framework. Promising evaluation results are achieved using the INbreast mammograms, with overall accuracies of 98.58% and 97.87% for the binary and multi-class approaches, respectively. Compared with the individual backbone networks, the proposed ensemble learning model improves the breast cancer prediction performance by 6.6% for binary and 4.6% for multi-class approaches. The proposed hybrid ETECADx shows a further prediction improvement of 8.1% and 6.2% for binary and multi-class diagnosis, respectively, when the ViT-based ensemble backbone network is used. For validation purposes using the real breast images, the proposed CAD system provides encouraging prediction accuracies of 97.16% for binary and 89.40% for multi-class approaches. ETECADx can predict the breast lesions for a single mammogram in 0.048 s on average. Such performance could help practical CAD applications provide a supporting second opinion when distinguishing various breast cancer malignancies. Full article

(This article belongs to the Special Issue Medical Diagnostic Systems Based on Advancing Artificial Intelligence Concepts)

17 pages, 7848 KB

Open AccessArticle

A Patient-Specific Algorithm for Lung Segmentation in Chest Radiographs

by Manawaduge Supun De Silva, Barath Narayanan Narayanan and Russell C. Hardie

AI 2022, 3(4), 931-947; https://doi.org/10.3390/ai3040055 - 18 Nov 2022

Cited by 7 | Viewed by 4491

Abstract

Lung segmentation plays an important role in computer-aided detection and diagnosis using chest radiographs (CRs). Currently, the U-Net and DeepLabv3+ convolutional neural network architectures are widely used to perform CR lung segmentation. To boost performance, ensemble methods are often used, whereby probability map outputs from several networks operating on the same input image are averaged. However, not all networks perform adequately for any specific patient image, even if the average network performance is good. To address this, we present a novel multi-network ensemble method that employs a selector network. The selector network evaluates the segmentation outputs from several networks; on a case-by-case basis, it selects which outputs are fused to form the final segmentation for that patient. Our candidate lung segmentation networks include U-Net, with five different encoder depths, and DeepLabv3+, with two different backbone networks (ResNet50 and ResNet18). Our selector network is a ResNet18 image classifier. We perform all training using the publicly available Shenzhen CR dataset. Performance testing is carried out with two independent publicly available CR datasets, namely, Montgomery County (MC) and Japanese Society of Radiological Technology (JSRT). Intersection-over-Union scores for the proposed approach are 13% higher than the standard averaging ensemble method on MC and 5% better on JSRT. Full article
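
The contrast between a standard averaging ensemble and a selective one can be sketched as below. This is a simplified illustration only. In the study a ResNet18 classifier plays the selector role; here the per-network `selector_scores`, the `accept` cut-off, and the fallback to the single best-scored map are hypothetical placeholders, and the maps are nested lists of made-up lung probabilities.

```python
def average_maps(prob_maps):
    """Pixel-wise mean of equally sized H x W probability maps."""
    count = len(prob_maps)
    height, width = len(prob_maps[0]), len(prob_maps[0][0])
    return [[sum(m[y][x] for m in prob_maps) / count for x in range(width)]
            for y in range(height)]


def selective_fusion(prob_maps, selector_scores, accept=0.5, threshold=0.5):
    """Average only the maps the selector accepts, then binarise.

    selector_scores: one score per map, standing in for a selector network's output.
    Falls back to the single best-scored map if none reaches `accept`.
    """
    chosen = [m for m, s in zip(prob_maps, selector_scores) if s >= accept]
    if not chosen:
        best = max(range(len(prob_maps)), key=selector_scores.__getitem__)
        chosen = [prob_maps[best]]
    fused = average_maps(chosen)
    mask = [[1 if p >= threshold else 0 for p in row] for row in fused]
    return fused, mask


# Three hypothetical network outputs for a 1 x 3 image; the third one is poor.
good_a = [[0.9, 0.8, 0.1]]
good_b = [[0.7, 0.9, 0.2]]
poor = [[0.1, 0.1, 0.9]]
scores = [0.9, 0.8, 0.2]  # made-up selector scores

plain_average = average_maps([good_a, good_b, poor])
fused, mask = selective_fusion([good_a, good_b, poor], scores)
```

In this toy case the plain average is pulled toward the poor map at every pixel, while the selective version ignores it for this particular image.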

(This article belongs to the Special Issue Feature Papers for AI)

17 pages, 9092 KB

Open AccessArticle

Segmentation for Multi-Rock Types on Digital Outcrop Photographs Using Deep Learning Techniques

by Owais A. Malik, Idrus Puasa and Daphne Teck Ching Lai

Sensors 2022, 22(21), 8086; https://doi.org/10.3390/s22218086 - 22 Oct 2022

Cited by 15 | Viewed by 3026

Abstract

The basic identification and classification of sedimentary rocks into sandstone and mudstone are important in the study of sedimentology and they are executed by a sedimentologist. However, such manual activity involves countless hours of observation and data collection prior to any interpretation. When such activity is conducted in the field as part of an outcrop study, the sedimentologist is likely to be exposed to challenging conditions such as the weather and their accessibility to the outcrops. This study uses high-resolution photographs which are acquired from a sedimentological study to test an alternative basic multi-rock identification through machine learning. While existing studies have effectively applied deep learning techniques to classify the rock types in field rock images, their approaches only handle a single rock-type classification per image. One study applied deep learning techniques to classify multi-rock types in each image; however, the test was performed on artificially overlaid images of different rock types in a test sample and not of naturally occurring rock surfaces of multiple rock types. To the best of our knowledge, no study has applied semantic segmentation to solve the multi-rock classification problem using digital photographs of multiple rock types. This paper presents the application of two state-of-the-art segmentation models, namely U-Net and LinkNet, to identify multiple rock types in digital photographs by segmenting the sandstone, mudstone, and background classes in a self-collected dataset of 102 images from a field in Brunei Darussalam. Four pre-trained networks, including Resnet34, Inceptionv3, VGG16, and Efficientnetb7 were used as a backbone for both models, and the performances of the individual models and their ensembles were compared. We also investigated the impact of image enhancement and different color representations on the performances of these segmentation models. 
The experimental results of this study show that, among the individual models, LinkNet with Efficientnetb7 as a backbone had the best performance, with a mean intersection over union (MIoU) value of 0.8135 across all of the classes. The ensemble of U-Net models (with all four backbones) performed slightly better than LinkNet with Efficientnetb7, with an MIoU of 0.8201. When different color representations and image enhancements were explored, the best performance (MIoU = 0.8178) was observed for the L*a*b* color representation with Efficientnetb7 using U-Net segmentation. For the individual classes of interest (sandstone and mudstone), U-Net with Efficientnetb7 was found to be the best model for the segmentation. Thus, this study presents the potential of semantic segmentation in automating the reservoir characterization process, whereby patches of interest can be extracted from the rocks for much deeper study and modeling. Full article
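
For readers unfamiliar with the metric, mean intersection over union averages the per-class ratio of overlapping pixels to pixels covered by either the prediction or the ground truth. The sketch below is a generic illustration, not the authors' evaluation code; the class indices (0 = background, 1 = sandstone, 2 = mudstone), the tiny label maps, and the choice to skip classes absent from both maps are assumptions.

```python
def mean_iou(pred, target, n_classes):
    """Mean intersection over union across classes for two H x W label maps."""
    intersections = [0] * n_classes
    unions = [0] * n_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel agrees: it counts once in both intersection and union of that class.
                intersections[p] += 1
                unions[p] += 1
            else:
                # Pixel disagrees: it enlarges the union of both classes involved.
                unions[p] += 1
                unions[t] += 1
    # Classes absent from both maps are skipped rather than scored.
    ious = [i / u for i, u in zip(intersections, unions) if u > 0]
    return sum(ious) / len(ious)


# Made-up 2 x 3 label maps with one misclassified pixel.
pred = [[0, 1, 1],
        [2, 2, 1]]
target = [[0, 1, 2],
          [2, 2, 1]]
score = mean_iou(pred, target, n_classes=3)  # IoUs are 1, 2/3 and 2/3, so the mean is 7/9
```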

(This article belongs to the Special Issue Computational Intelligence in Image Analysis)

15 pages, 3721 KB

Open AccessArticle

High Performance DeepFake Video Detection on CNN-Based with Attention Target-Specific Regions and Manual Distillation Extraction

by Van-Nhan Tran, Suk-Hwan Lee, Hoanh-Su Le and Ki-Ryong Kwon

Appl. Sci. 2021, 11(16), 7678; https://doi.org/10.3390/app11167678 - 20 Aug 2021

Cited by 33 | Viewed by 8177

Abstract

The rapid development of deep learning models has made it possible to produce and synthesize hyper-realistic videos, known as DeepFakes. Moreover, the growth of forgery data has prompted concerns about malicious use. Detecting forged videos is a crucial subject in the field of digital media. Nowadays, most models are based on deep neural networks and vision transformers; the SOTA model uses an EfficientNetB7 backbone. However, due to the usage of excessively large backbones, these models have the intrinsic drawback of being too heavy. In our research, a high-performance DeepFake detection model for manipulated video is proposed, maintaining accuracy while keeping the model appropriately lightweight. We build on previous research on distillation methodology, but our proposal takes a different approach that combines manual distillation extraction, target-specific region extraction, data augmentation, frame and multi-region ensembling, a CNN-based model, and flexible classification with a dynamic threshold. Our proposal can reduce overfitting, a common and particularly important problem affecting the quality of many models. To analyze the quality of our model, we performed tests on two datasets. On the DeepFake Detection Challenge (DFDC) dataset, our model obtains an AUC of 0.958 and an F1-score of 0.9243, compared with the SOTA model, which obtains an AUC of 0.972 and an F1-score of 0.906; on the smaller Celeb-DF v2 dataset, our model obtains an AUC of 0.978 and an F1-score of 0.9628. Full article

(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

25 pages, 8157 KB

Open AccessArticle

An Ensemble of Global and Local-Attention Based Convolutional Neural Networks for COVID-19 Diagnosis on Chest X-ray Images

by Ahmed Afifi, Noor E Hafsa, Mona A. S. Ali, Abdulaziz Alhumam and Safa Alsalman

Symmetry 2021, 13(1), 113; https://doi.org/10.3390/sym13010113 - 11 Jan 2021

Cited by 40 | Viewed by 5012

Abstract

The recent Coronavirus Disease 2019 (COVID-19) pandemic has put a tremendous burden on global health systems. Medical practitioners are under great pressure to reliably screen suspected cases, employing diagnostic tools that are adjunct to standard point-of-care testing methodology. Chest X-rays (CXRs) are emerging as a prospective diagnostic tool that is easy to acquire, low-cost, and carries less cross-contamination risk. Artificial intelligence (AI)-assisted CXR evaluation has shown great potential for distinguishing COVID-19-induced pneumonia from other associated clinical instances. However, one of the associated challenges with diagnostic imaging-based modeling is incorrect feature attribution, which leads the model to learn misleading disease patterns, causing wrong predictions. Here, we demonstrate an effective deep learning-based methodology to mitigate the problem, thereby allowing the classification algorithm to learn from relevant features. The proposed deep-learning framework consists of an ensemble of convolutional neural network (CNN) models focusing on both global and local pathological features from CXR lung images, where the latter are extracted using a multi-instance learning scheme and a local attention mechanism. An inspection of a series of backbone CNN models using global and local features, and an ensemble of both features, trained from high-quality CXR images of 1311 patients, further augmented for achieving the symmetry in class distribution, to localize lung pathological features followed by the classification of COVID-19 and other related pneumonia, shows that a DenseNet161 architecture outperforms all other models, as evaluated on an independent test set of 159 patients with confirmed cases. 
Specifically, an ensemble of DenseNet161 models with global and local attention-based features achieves an average balanced accuracy of 91.2%, average precision of 92.4%, and F1-score of 91.9% in a multi-label classification framework comprising COVID-19, pneumonia, and control classes. The DenseNet161 ensembles were also found to differ significantly from all other models in a comprehensive statistical analysis. The current study demonstrated that the proposed deep learning-based algorithm can accurately identify COVID-19-related pneumonia in CXR images, along with differentiating non-COVID-19-associated pneumonia with high specificity, by effectively alleviating the incorrect feature attribution problem and exploiting an enhanced feature descriptor. Full article

(This article belongs to the Section Computer)

Search Results (12)
