1. Introduction
Lung diseases such as pneumonia, COVID-19, pulmonary fibrosis, and interstitial lung disease (ILD) remain among the leading causes of global morbidity and mortality [1,2]. These conditions place significant strain on healthcare systems, particularly in low- and middle-income regions where timely diagnosis is often hindered by limited access to trained radiologists and advanced diagnostic infrastructure [3,4,5].
In recent years, artificial intelligence (AI) has emerged as a transformative force in medical imaging, offering powerful tools for automated diagnosis, decision support, and workflow optimization. AI-driven techniques, particularly deep learning, have demonstrated substantial potential in enhancing diagnostic accuracy and speed across a range of modalities, including chest X-rays [6,7]. For instance, computer-assisted systems based on morphological analysis have shown promise in detecting asymmetries in mammographic images [8], illustrating how AI can support clinical decisions even in complex or subtle cases.
Chest X-rays (CXRs) are widely available and cost-effective for lung disease screening, yet their interpretation remains a skill-intensive task, leading to diagnostic bottlenecks in many clinical environments [9,10]. Building on this momentum, recent advances in deep learning have shown promise in automating medical image analysis [11,12]. Convolutional Neural Networks (CNNs) have been extensively adopted for classification tasks, achieving expert-level accuracy in identifying disease patterns in CXR images [13,14]. In parallel, object detection models such as the You Only Look Once (YOLO) series [15,16] have enabled the localized identification of multiple abnormalities, offering enhanced interpretability and spatial precision in diagnosis.
In this context, classification refers to assigning an entire chest X-ray image a single label (e.g., COVID-19 or Normal), while object detection goes further by identifying and localizing multiple abnormalities within the image using bounding boxes (e.g., detecting both Pleural Effusion and Cardiomegaly in different regions). These two approaches offer complementary insights: classification supports rapid triage, while detection enhances interpretability through spatial reasoning.
However, the deployment of such models in real-world, resource-constrained environments demands more than just accuracy. Models must also be lightweight, efficient, and capable of running on embedded systems with limited computational capacity—what is broadly referred to as edge AI [17,18]. In this context, we adopt the general term “edge computing” to encompass all forms of near-data inference, ranging from the so-called Extreme Edge (or End Computing)—which includes wearable and embedded systems like the Raspberry Pi—to more capable edge servers. While distinctions exist between Extreme Edge, Edge, and Cloud layers, this study focuses on the edge spectrum as a whole, without explicitly differentiating among them. These Embedded AI systems must deliver reliable classification and detection performance while adhering to the constraints of portable or point-of-care hardware platforms, such as the Raspberry Pi or wearable devices.
1.1. Motivation and Research Gap
While both classification and detection models have individually demonstrated success in CXR interpretation, few studies have explored their performance under deployment constraints on edge hardware. Most existing research emphasizes classification or detection in isolation, often evaluated using high-end GPU infrastructure without consideration for embedded deployment [19]. Additionally, few works compare both paradigms—classification and detection—in terms of their performance trade-offs (accuracy vs. efficiency), interpretability, and real-time feasibility on edge devices.
This paper addresses this gap by conducting a comprehensive performance comparison of Embedded AI solutions for lung disease diagnosis, focusing on both classification and detection. We evaluate six widely used CNN models for five-class disease classification and a YOLO-based architecture for multi-label abnormality detection. Beyond accuracy, we benchmark each model’s inference latency on a Raspberry Pi 4 device, including detailed measurements of image preprocessing time, inference time, and overall end-to-end latency using a high-resolution input. Notably, the Raspberry Pi 4 is one of the most affordable and widely available single-board computers (SBCs), making it highly attractive for embedded deployment in clinical settings. Compared to wearable platforms such as the RealWear Navigator 500 and other edge AI-capable wearable devices, which are significantly more expensive, the Raspberry Pi offers a cost-effective solution. This economic advantage, combined with its flexibility and community support, was a key factor in selecting it as the reference platform for benchmarking real-time decision support systems—particularly in surgical environments or resource-constrained healthcare settings. The broader vision is that just as modern wearable devices provide web browsing or messaging capabilities on the wrist, future surgical wearables will embed real-time AI-driven decision support systems, enabling clinicians to receive diagnostic cues directly in the operating room.
Building on the identified research gap, this study explores the feasibility of a modular Embedded AI pipeline for lung disease diagnosis tailored to real-world deployment. The proposed system consists of two complementary components: (i) a multi-class classification model to support rapid triage and disease categorization, and (ii) a multi-label object detection model to provide spatial interpretability of thoracic abnormalities. While both components are designed for embedded environments, our current work focuses on evaluating the classification branch. Specifically, we deploy post-training quantized models on a Raspberry Pi 4 and assess real-time inference performance using high-resolution chest X-ray images. In parallel, we train a lightweight YOLOv8n model optimized for embedded detection, establishing the foundation for future integration. This modular approach offers a practical and scalable pathway toward real-time, interpretable decision support systems for mobile clinics, rural health centers, and surgical environments.
1.2. Objective
The primary objective of this study is to assess the performance and deployability of deep learning-based classification and detection models for lung disease diagnosis in embedded settings. Specifically, we aim to:
Evaluate and compare six CNN-based classifiers (ResNet101, DenseNet201, MobileNetV3-Large, InceptionResNetV2, Xception, and EfficientNetV2-B0) on five lung disease categories using base and augmented datasets;
Implement and assess a lightweight object detection model (YOLOv8n) for localizing 14 distinct thoracic abnormalities in chest X-rays;
Analyze model trade-offs in terms of accuracy, interpretability, and edge-device performance, emphasizing their suitability for embedded deployment on devices such as Raspberry Pi.
1.3. Contributions
The primary contributions of this study are as follows:
A modular, edge-oriented AI pipeline is proposed for lung disease diagnosis, comprising classification and detection components. The classification component was experimentally validated on a Raspberry Pi 4 using post-training INT8-quantized models, while the detection module (YOLOv8n) was trained and optimized for embedded deployment, but not yet deployed.
A comparative performance evaluation of six CNN-based classifiers (ResNet101, DenseNet201, MobileNetV3-Large, InceptionResNetV2, Xception, and EfficientNetV2-B0) was performed using base and augmented versions of a curated five-class lung disease dataset.
A lightweight YOLOv8n detector was trained to identify 14 thoracic abnormalities in chest X-rays, and its detection performance was analyzed using confidence-based metrics and mean Average Precision (mAP).
A detailed runtime analysis was performed on the Raspberry Pi 4, capturing model loading, image preprocessing, inference time, and end-to-end latency on high-resolution inputs. These metrics were used to evaluate classification performance under real-time embedded constraints.
The trade-offs between classification and detection were discussed from an Embedded AI deployment perspective, with emphasis on diagnostic complementarity, inference efficiency, and interpretability.
The remainder of the paper is structured as follows: Section 2 reviews the literature on deep learning for lung disease analysis and embedded AI. Section 3 describes the datasets, model architectures, and experimental setup. Section 4 presents the classification and detection results along with edge deployment performance. Section 5 explores diagnostic implications and limitations. Section 6 concludes the paper with future directions.
2. Related Work
The application of deep learning to medical imaging has driven significant progress in the diagnosis of lung diseases from chest X-ray (CXR) images. While many studies address classification or detection separately, few have explored their integration within a resource-efficient, Embedded AI framework suitable for real-world deployment. This section reviews the related literature across four themes: classification models, detection models, edge deployment strategies, and multi-stage diagnostic pipelines.
2.1. Deep Learning for Lung Disease Classification
CNN-based models have achieved strong results in lung disease classification from CXR images. Architectures such as ResNet [20], DenseNet [21], Inception [22], and EfficientNet [23] have demonstrated high accuracy across multi-class tasks involving conditions like COVID-19, pneumonia, and lung opacity [24,25]. These models learn spatial features hierarchically and often rival radiologists in performance.
However, their real-time deployment is often constrained by computational demands. To address this, lightweight models like MobileNetV2/V3 [26,27], SqueezeNet [28], and ShuffleNet [29] have been adopted for edge AI scenarios, sacrificing some predictive power for faster inference. Despite their efficiency, classification models typically lack interpretability and cannot localize abnormalities—limiting their clinical transparency.
2.2. Object Detection for Thoracic Abnormalities
To provide localization and visual justification, object detection models such as Faster R-CNN [30], RetinaNet [31], and YOLO [32] have been used to identify disease-specific regions within CXRs. The YOLO family, particularly YOLOv5 and YOLOv8, offers an effective balance of speed and accuracy for real-time medical applications [33]. Studies show YOLO-based models can detect thoracic conditions such as pleural effusion, lung nodules, and COVID-19-associated opacities [34,35]. Lightweight variants (YOLOv5n, YOLOv8n) have been developed to facilitate edge deployment.
Despite their value, detection models require detailed bounding box annotations, which are costly to obtain. Moreover, they are seldom evaluated on low-power devices, making their real-world feasibility uncertain—especially in embedded systems such as mobile or wearable diagnostic platforms.
2.3. Edge-AI Deployment for Medical Diagnosis
As Embedded AI systems gain traction in healthcare, the focus has shifted to making AI deployable on low-resource devices. Solutions such as Raspberry Pi, Jetson Nano, and RealWear headsets enable diagnostic inference in rural and mobile clinics [36,37,38,39,40]. Techniques like model quantization, pruning, and conversion to TFLite [41] or ONNX are commonly used to reduce model size and inference latency.
Several works have reported successful deployment of convolutional neural networks on Raspberry Pi for COVID-19 and pneumonia screening [42]. For object detection, models such as YOLOv5n have been demonstrated on Jetson devices with reasonable accuracy and speed [43,44]. However, side-by-side evaluations of classification and detection in the same embedded system remain rare, leaving a gap in comparative performance understanding.
2.4. Toward Unified and Interpretable Pipelines
Integrating classification and detection tasks offers the potential to build hybrid diagnostic pipelines, where a classification model rapidly screens an image for abnormality and a detection model provides spatial localization for detailed interpretation. This dual-stage approach mirrors real-world clinical workflows, where an initial diagnosis is often followed by targeted examination of suspected regions.
Some prior studies have explored combining classification with interpretability tools, such as class activation maps (CAMs) or attention-based mechanisms, to highlight regions associated with disease patterns. These methods provide visual cues alongside classification predictions, enhancing clinical transparency. One recent approach demonstrated the integration of Grad-CAM (Gradient-weighted Class Activation Mapping) with lightweight convolutional networks for multi-class pulmonary disease classification, effectively balancing model efficiency with interpretability [45]. However, most existing approaches stop short of full object detection and instead rely on weak localization without bounding box precision.
Moreover, the full integration of classification and object detection models into a single, resource-efficient, edge-deployable system remains largely unexplored. While cloud-based or high-end deployments exist, systematic evaluations of such hybrid pipelines under embedded computing constraints are rare. This gap motivates the present work, which benchmarks classification and detection models independently and analyzes their synergistic potential for real-time, interpretable, and lightweight lung disease diagnosis in Embedded AI environments.
3. Methodology
This study adopts a dual deep learning methodology to assess the performance and deployment feasibility of AI-based lung disease diagnosis using chest X-ray (CXR) images, particularly in resource-constrained environments. The workflow comprises two complementary tasks: image-level classification and object-level detection.
For classification, six state-of-the-art convolutional neural network (CNN) models—ResNet101, DenseNet201, MobileNetV3-Large, InceptionResNetV2, Xception, and EfficientNetV2-B0—are trained to categorize CXR images into five diagnostic classes: COVID-19, Bacterial Pneumonia, Viral Pneumonia, Lung Opacity, and Normal. Training is performed on a GPU workstation, and post-training inference is evaluated on a Raspberry Pi 4 Model B to analyze suitability for edge deployment.
In parallel, a YOLOv8n object detection model is employed to localize 14 thoracic abnormalities, including Consolidation, Pleural Effusion, and Pulmonary Fibrosis, using a separate, bounding box-annotated dataset. The model’s performance is evaluated using standard detection metrics such as mean Average Precision (mAP) and Intersection over Union (IoU).
Both tasks are independently executed with dedicated datasets and training pipelines. A correlation analysis is later performed to examine alignment between classification outcomes and localized findings. This dual-track approach facilitates the design of a hybrid diagnostic framework that combines speed, interpretability, and deployment feasibility for real-world edge-AI clinical applications.
3.1. Datasets and Models
3.1.1. Classification Dataset
The classification dataset used in this study follows the five-class configuration proposed by Vantaggiato et al. [46], covering COVID-19, Normal, Bacterial Pneumonia, Viral Pneumonia, and Lung Opacity (non-pneumonia diseases). This composite dataset was built from multiple public medical imaging repositories, including the IEEE8023 COVID-19 Chest X-ray dataset, the RSNA Pneumonia Detection Challenge, CheXpert, and the Kaggle Pneumonia dataset. The labels in these sources were derived from structured hospital records, institutional diagnoses, or radiologist-generated reports, ensuring clinical credibility. For example, CheXpert includes labels extracted from radiology reports using a rule-based NLP system, while the Kaggle datasets were annotated based on physician input or radiology summaries. In addition, 207 COVID-19 chest X-rays and an equal number of test samples from other classes were obtained directly from the Hospital of Tolga, Algeria, with diagnoses verified by clinical experts. These clinically vetted sources help ensure that the classification labels reflect real-world diagnostic decisions, although inter-observer variability remains a known limitation in medical imaging.
The curated classification dataset contains 2020 chest X-ray images, evenly distributed across the five diagnostic categories. For training and evaluation, the dataset was split using a 70%/15%/15% ratio into training, validation, and test subsets, respectively. To improve model generalization, an augmented version of the dataset was created, expanding the total to 9875 images. This corresponds to approximately four augmentation variants per original image. Augmentation operations included random horizontal flipping, rotation, brightness adjustments, and zoom transformations. A small portion of the dataset was retained for testing to evaluate model performance on unseen data.
Each image was labeled with a single class, without bounding boxes, making the dataset strictly classification-specific. Preprocessing involved resizing the images to each architecture’s required input dimensions and normalizing pixel values to a common range. The dataset was used to evaluate six CNN architectures in terms of diagnostic accuracy, robustness to augmentation, and suitability for embedded deployment.
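A minimal sketch of this preprocessing and augmentation pipeline in TensorFlow/Keras is shown below; the target size, directory layout, and augmentation parameter values are illustrative assumptions rather than the exact study settings.

import tensorflow as tf

# Sketch of the resize/normalize preprocessing and the augmentation operations
# described above (flipping, rotation, brightness, zoom). All parameter values,
# the target size, and the directory layout are illustrative assumptions.
IMG_SIZE = (224, 224)  # placeholder; the actual size depends on the architecture

train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1.0 / 255,              # pixel normalization
    horizontal_flip=True,           # random horizontal flipping
    rotation_range=15,              # random rotation (degrees)
    brightness_range=(0.8, 1.2),    # brightness adjustment
    zoom_range=0.1,                 # zoom transformation
)
val_datagen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1.0 / 255)

train_gen = train_datagen.flow_from_directory(
    "dataset/train", target_size=IMG_SIZE, batch_size=32, class_mode="categorical")
val_gen = val_datagen.flow_from_directory(
    "dataset/val", target_size=IMG_SIZE, batch_size=32, class_mode="categorical")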
3.1.2. Detection Dataset
For the object detection task, we used a curated subset of the VinBigData Chest X-ray Abnormalities Detection dataset [47], publicly available in COCO format (vinbigdata_coco_chest_xray__wbf_yolo_his) via Kaggle (https://www.kaggle.com/datasets/mmmmmmmmmma/vinbigdata-coco-chest-xray-wbf-yolo-his) (accessed on 13 August 2025). This dataset originates from a large-scale initiative involving over 100,000 chest X-ray scans retrospectively collected from two major hospitals in Vietnam. From this raw collection, a subset of 18,000 images was manually annotated by 17 experienced radiologists with 22 local (bounding box) and 6 global (image-level) abnormality labels. The training set of 15,000 scans was triple-labeled by independent radiologists, while the 3000-scan test set was labeled by consensus from five experts, ensuring high-quality clinical annotation.
In our study, we used a refined version of this dataset comprising 3296 images in YOLO-ready format, where bounding boxes for 14 thoracic abnormalities were consolidated using Weighted Box Fusion (WBF). This format ensures greater consistency in overlapping predictions and minimizes annotation noise. Such clinically vetted and spatially localized labels are critical for training reliable object detection models like YOLOv8n.
3.1.3. Classification Models
We evaluated six convolutional neural networks (CNNs), selected for their architectural diversity and suitability for both high-accuracy and edge-friendly deployment: ResNet101 (deep residual connections), DenseNet201 (dense layer connectivity), MobileNetV3-Large (optimized for low-power devices), InceptionResNetV2 (inception modules with residual links), Xception (depthwise separable convolutions), and EfficientNetV2-B0 (compound model scaling). This selection balances classical and state-of-the-art CNN architectures to enable a comprehensive assessment across different model families. In particular, MobileNetV3-Large and EfficientNetV2-B0 are chosen for their proven efficiency on embedded platforms, while deeper models like DenseNet201 and ResNet101 serve as performance references. Although Xception is moderately heavy compared to MobileNet or EfficientNet, it was selected for its efficient use of depthwise separable convolutions and its ability to balance accuracy with computational cost. InceptionResNetV2, while more complex, was included for completeness due to its exceptional feature extraction capabilities and to provide insights into the performance limits of high-capacity models on constrained devices.
All models were trained on the five-class classification dataset using categorical cross-entropy loss and evaluated on both base and augmented datasets. Early stopping was used to prevent overfitting. Performance was assessed using accuracy, precision, recall, F1-score, and confusion matrices across the COVID-19, Bacterial Pneumonia, Viral Pneumonia, Lung Opacity, and Normal categories.
3.1.4. Detection Model
We employed YOLOv8n, a lightweight, anchor-free variant of the YOLOv8 family, optimized for efficient inference in edge environments. It features a CSPDarknet-inspired backbone and decoupled head for classification and regression, with support for ONNX and TensorRT exports. YOLOv8n was trained for 50 epochs using stochastic gradient descent (SGD) with a cosine learning rate schedule on a multi-label chest X-ray dataset comprising 14 thoracic abnormalities, including Consolidation, Pulmonary Fibrosis, and Pleural Effusion. The Ultralytics PyTorch [48,49] pipeline (Ultralytics v8.3.30, PyTorch v2.5.1) was used with real-time augmentations. Performance was evaluated using mAP@0.5, mAP@0.5:0.95, and class-wise precision, recall, and IoU, highlighting its potential for spatially aware, edge-deployable diagnosis.
3.2. Training Setup
All experiments were conducted on a high-performance setup with an NVIDIA Tesla GPU, 13th Gen Intel Core i7-1365U CPU (12 cores), 32 GB RAM, and Ubuntu 22.04. Classification models were implemented using TensorFlow [50] and Keras [51], while YOLOv8n detection employed the Ultralytics PyTorch pipeline.
For classification, all CNNs were initialized with ImageNet pretrained weights and trained with the Adam optimizer (learning rate: 0.0001) using categorical cross-entropy loss. Models were trained for up to 100 epochs (batch size of 32, with an architecture-dependent input size) with early stopping and learning rate reduction on plateau. Early stopping was applied consistently across all CNN models using a patience value of 10 (i.e., training stopped if validation loss did not improve for 10 consecutive epochs), with a minimum delta threshold of 1 × 10⁻⁴. This configuration was selected after empirical testing showed improved convergence and final accuracy compared to a lower patience value. Data augmentation (flipping, rotation, zooming, brightness adjustment) was introduced in a secondary phase to improve generalization. Model checkpoints were based on best validation accuracy. For transparency, convergence plots (training and validation loss/accuracy) for all models trained on the augmented dataset are included in Appendix A. These reflect the final training phase used for model evaluation. Plots from base training are omitted to avoid redundancy, as augmentation was applied consistently across all models to improve generalization.
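A minimal tf.keras sketch of this training configuration is given below (ImageNet initialization, Adam at 1 × 10⁻⁴, early stopping with patience 10 and min_delta 1 × 10⁻⁴, plateau-based learning-rate reduction, and checkpointing on best validation accuracy); the ReduceLROnPlateau settings, file names, and the data generators train_gen/val_gen are illustrative assumptions.

import tensorflow as tf

# Sketch of the classification training loop described above. The base model shown
# is one of the six evaluated architectures; the ReduceLROnPlateau factor/patience
# and file names are assumptions, and train_gen/val_gen are assumed to be defined.
base = tf.keras.applications.MobileNetV3Large(
    weights="imagenet", include_top=False, pooling="avg", input_shape=(224, 224, 3))
outputs = tf.keras.layers.Dense(5, activation="softmax")(base.output)
model = tf.keras.Model(base.input, outputs)

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss="categorical_crossentropy",
    metrics=["accuracy"])

callbacks = [
    tf.keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=10, min_delta=1e-4, restore_best_weights=True),
    tf.keras.callbacks.ReduceLROnPlateau(
        monitor="val_loss", factor=0.5, patience=5, min_lr=1e-6),
    tf.keras.callbacks.ModelCheckpoint(
        "best_model.keras", monitor="val_accuracy", save_best_only=True),
]

model.fit(train_gen, validation_data=val_gen, epochs=100, callbacks=callbacks)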
YOLOv8n was fine-tuned for 50 epochs with COCO-pretrained weights, using SGD (initial LR: 0.01 with cosine decay, momentum: 0.937, weight decay: 0.0005). Training used a fixed input resolution with a batch size of 16. Augmentations included mosaic, mixup, random scaling, and flipping. Model selection was based on peak validation mAP at IoU thresholds 0.5 and 0.5:0.95.
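A minimal sketch of this fine-tuning setup with the Ultralytics API is shown below; the dataset YAML path and the augmentation magnitudes are illustrative assumptions, and the input resolution is left at the library default since the exact value is not restated here.

from ultralytics import YOLO

# Sketch of the YOLOv8n fine-tuning configuration described above. The dataset
# YAML path and augmentation magnitudes are illustrative assumptions.
model = YOLO("yolov8n.pt")            # COCO-pretrained nano weights
results = model.train(
    data="vinbigdata_wbf.yaml",       # hypothetical 14-class dataset config
    epochs=50,
    batch=16,
    optimizer="SGD",
    lr0=0.01,                         # initial learning rate
    cos_lr=True,                      # cosine learning rate decay
    momentum=0.937,
    weight_decay=0.0005,
    mosaic=1.0,                       # mosaic augmentation
    mixup=0.1,                        # mixup augmentation (illustrative strength)
    scale=0.5,                        # random scaling (illustrative range)
    fliplr=0.5,                       # horizontal flip probability
)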
3.3. Edge Deployment
To assess feasibility in resource-constrained environments, all classification models were deployed on a Raspberry Pi 4 Model B (4 GB RAM, ARM Cortex-A72, Ubuntu 22.04, Python 3.9.2). Each model was converted from standard TensorFlow to TensorFlow Lite (TFLite), a lightweight framework optimized for mobile and embedded devices. TFLite models use a reduced runtime and smaller binaries, enabling fast and efficient inference on ARM-based systems. We applied post-training quantization, typically reducing weights and activations to 8-bit integer (int8) or 16-bit floating point (float16) precision, depending on model compatibility and target hardware. Full integer quantization was preferred for maximum speed and compression, while float16 was used where preserving numerical fidelity was important. This process significantly reduced memory footprint and improved inference speed while preserving classification performance and easing portability across devices. Inference was evaluated using 10 high-resolution test images per model, recording top-1 classification accuracy, model load time, image preprocessing time, inference time, and overall end-to-end latency. This setup was designed to closely simulate real-world deployment conditions on the Raspberry Pi 4 Model B using PNG inputs.
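A condensed sketch of this TFLite conversion and post-training quantization workflow is shown below; the representative calibration dataset, the Keras model variable, and the output file names are illustrative assumptions.

import tensorflow as tf

# Sketch of the post-training quantization described above. The calibration
# dataset (calib_dataset) and the trained Keras model are assumed to be defined;
# file names are illustrative assumptions.
def representative_data_gen():
    for image, _ in calib_dataset.take(100):   # small calibration subset
        yield [tf.cast(image, tf.float32)]

# Full integer (int8) quantization, preferred for maximum speed and compression
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
open("model_int8.tflite", "wb").write(converter.convert())

# Alternative: float16 quantization where numerical fidelity matters more
converter_fp16 = tf.lite.TFLiteConverter.from_keras_model(model)
converter_fp16.optimizations = [tf.lite.Optimize.DEFAULT]
converter_fp16.target_spec.supported_types = [tf.float16]
open("model_fp16.tflite", "wb").write(converter_fp16.convert())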
YOLOv8n was not yet deployed in this study due to its computational demands and the ongoing nature of our benchmarking on embedded accelerators. However, the model was exported to ONNX format to ensure compatibility with a wide range of hardware platforms, such as the NVIDIA Jetson Orin Nano or Raspberry Pi with TPU accelerator. This exportability supports a modular hybrid deployment strategy, wherein lightweight classification is performed locally on-device for rapid triage, while detection can be offloaded to nearby accelerators via Multi-access Edge Computing (MEC) for spatial interpretation. This forward-looking design aims to balance responsiveness, interpretability, and computational efficiency, paving the way for scalable deployment in real-world healthcare settings.
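For reference, the ONNX export step can be performed with the Ultralytics API as sketched below; the weights path is an illustrative assumption.

from ultralytics import YOLO

# Sketch of exporting the trained YOLOv8n detector to ONNX for accelerator runtimes.
# The weights path is an illustrative assumption.
model = YOLO("runs/detect/train/weights/best.pt")
model.export(format="onnx")   # writes an .onnx file alongside the weights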
3.4. Evaluation Metrics
To evaluate model performance and deployment feasibility, we employed metrics tailored to classification, detection, and edge inference tasks.
3.4.1. Classification Metrics
For classification, we computed accuracy, precision, recall, and F1-score, along with confusion matrices across the five categories: COVID-19, Bacterial Pneumonia, Viral Pneumonia, Lung Opacity, and Normal. These metrics were calculated on both the base and augmented datasets to evaluate generalization capability. To assess edge deployment feasibility, we also recorded the top-1 accuracy on Raspberry Pi using high-resolution test images under quantized inference settings.
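For reference, these metrics follow their standard definitions in terms of the per-class confusion-matrix counts of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN): Accuracy = (TP + TN) / (TP + TN + FP + FN), Precision = TP / (TP + FP), Recall = TP / (TP + FN), and F1-score = 2 × Precision × Recall / (Precision + Recall).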
3.4.2. Detection Metrics
Detection performance was assessed using mean Average Precision (mAP) at IoU thresholds of 0.5 (mAP@0.5) and 0.5:0.95 (mAP@0.5:0.95). Additional metrics included class-wise precision, recall, and IoU to evaluate the spatial accuracy of localized predictions across 14 thoracic abnormalities.
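For reference, IoU between a predicted box P and a ground-truth box G is defined as IoU(P, G) = area(P ∩ G) / area(P ∪ G). A prediction counts as a true positive when its IoU with a same-class ground-truth box exceeds the chosen threshold; Average Precision (AP) is the area under the resulting precision–recall curve for each class, mAP@0.5 averages AP over the 14 classes at an IoU threshold of 0.5, and mAP@0.5:0.95 further averages over IoU thresholds from 0.5 to 0.95 in steps of 0.05.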
3.4.3. Edge Inference Metrics
To assess deployment feasibility on Raspberry Pi, we evaluated the end-to-end latency of each quantized model using a high-resolution chest X-ray image (2566 × 2566, PNG format). The reported latency includes sample preprocessing and inference time, excluding model loading as it occurs only once during deployment. In addition, we recorded the TensorFlow Lite model size to quantify the trade-off between resource efficiency and diagnostic performance in embedded healthcare environments.
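A minimal sketch of this per-image measurement with the TensorFlow Lite interpreter is shown below; the file names and normalization are illustrative assumptions, and model loading is timed separately from the per-image latency.

import time
import numpy as np
import tensorflow as tf
from PIL import Image

# Sketch of the per-image latency measurement described above. File names and the
# normalization scheme are illustrative assumptions.
interpreter = tf.lite.Interpreter(model_path="model_int8.tflite")
interpreter.allocate_tensors()          # one-time model loading, excluded from latency
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

t0 = time.perf_counter()
img = Image.open("cxr_2566x2566.png").convert("RGB")   # hypothetical high-resolution input
h, w = inp["shape"][1], inp["shape"][2]
x = np.asarray(img.resize((w, h)), dtype=np.float32) / 255.0
x = np.expand_dims(x, 0)
if inp["dtype"] == np.int8:             # quantize the input if the model expects int8
    scale, zero = inp["quantization"]
    x = (x / scale + zero).astype(np.int8)
t1 = time.perf_counter()                # preprocessing time = t1 - t0

interpreter.set_tensor(inp["index"], x)
interpreter.invoke()
probs = interpreter.get_tensor(out["index"])
t2 = time.perf_counter()                # inference time = t2 - t1; end-to-end = t2 - t0
print(f"preprocess {1000*(t1-t0):.1f} ms, inference {1000*(t2-t1):.1f} ms")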
3.5. Correlation Analysis
To assess the diagnostic complementarity between classification and detection models, we conducted a correlation analysis linking the five classification categories—COVID-19, Bacterial Pneumonia, Viral Pneumonia, Lung Opacity, and Normal—with the 14 thoracic abnormalities identified by the YOLOv8n detection model. Although trained on separate datasets, both models reflect overlapping radiographic features observed in clinical practice.
Notable associations include Consolidation and Pleural Effusion with bacterial pneumonia; Infiltration, Pulmonary Fibrosis, and ILD with COVID-19 and lung opacity; and Atelectasis or Pleural Thickening with viral or post-infectious presentations. The Normal class generally aligned with the absence of detection outputs, reinforcing the consistency between tasks.
Table 1 summarizes these relationships, highlighting how spatially localized detections can contextualize and validate image-level classifications. This diagnostic synergy supports a hybrid framework wherein classification enables rapid triage and detection provides visual interpretability—especially valuable for real-time deployment on edge-AI systems in low-resource settings.
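As an illustration only, the associations noted above can be encoded as a simple lookup; this is a partial, hypothetical rendering of the relationships, not the full Table 1.

# Partial, illustrative mapping between classification categories and detected
# abnormalities, based on the associations noted above (not the full Table 1).
CLASS_TO_FINDINGS = {
    "Bacterial Pneumonia": ["Consolidation", "Pleural Effusion"],
    "COVID-19": ["Infiltration", "Pulmonary Fibrosis", "ILD"],
    "Lung Opacity": ["Infiltration", "Pulmonary Fibrosis", "ILD"],
    "Viral Pneumonia": ["Atelectasis", "Pleural Thickening"],
    "Normal": [],  # expected absence of detection outputs
}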
4. Results
This section presents the outcomes of our dual deep learning framework for lung disease diagnosis using chest X-ray images. We first evaluate the performance of six CNN-based classification models trained on both base and augmented datasets, followed by the detection performance of a YOLOv8n model on a separate multi-label dataset. We then assess edge deployment feasibility by benchmarking inference on a Raspberry Pi, and conclude with a combined analysis that explores the diagnostic complementarity of classification and detection.
4.1. Classification Results
We evaluated six CNN architectures—ResNet101, DenseNet201, MobileNetV3-Large, InceptionResNetV2, Xception, and EfficientNetV2-B0—for multi-class classification of chest X-rays across five disease categories.
Table 2 and Table 3 summarize their performance on the base and augmented datasets, respectively.
On the base dataset, MobileNetV3-Large achieved the highest validation accuracy (62.0%), followed closely by ResNet101 (61.4%) and EfficientNetV2-B0 (60.4%). DenseNet201 delivered moderate performance (57.8%), while Xception (51.6%) and InceptionResNetV2 (20.0%) underperformed significantly. Notably, InceptionResNetV2 failed to learn effectively, with its validation accuracy plateauing at random guessing levels.
Data augmentation led to slight but consistent improvements across most models. MobileNetV3-Large improved to 63.4%, and ResNet101 to 62.8%, while DenseNet201 also saw a minor gain to 58.4%. These improvements suggest better generalization when exposed to increased data variability. However, some models—such as EfficientNetV2-B0—exhibited signs of overfitting, achieving very high training accuracy (96.05%) without corresponding improvements in validation performance.
To further assess model behavior under real-world conditions, we performed inference on a representative test image labeled as Bacterial Pneumonia. Among the six quantized models deployed on the Raspberry Pi 4, only the Xception model correctly identified the class, albeit with moderate confidence (41.4%). All other models misclassified the sample, with DenseNet201, ResNet101, and InceptionResNetV2 predicting COVID-19, EfficientNetV2-B0 predicting Viral Pneumonia, and MobileNetV3-Large labeling it as Normal. Most incorrect predictions were made with high confidence, indicating overconfidence in misclassification and revealing overlapping feature representations among pneumonia-related classes. This real-world inference underscores the challenge of distinguishing bacterial and viral pathologies, highlighting the importance of improved model calibration and interpretability in clinical AI deployments.
Overall, these results emphasize the trade-offs between model capacity and generalization. Lightweight models such as MobileNetV3-Large achieved strong validation performance with relatively low overfitting, making them suitable candidates for deployment in resource-constrained environments, as discussed in Section 4.3.
4.2. Detection Results
We evaluated YOLOv8n for multi-label detection of 14 thoracic conditions from chest X-rays. The model achieved a mAP@0.5 of 27.6% and mAP@0.5:0.95 of 14.7%, reflecting moderate performance given the class imbalance, small lesion sizes, and visual overlap between conditions.
Figure 1 shows the normalized confusion matrix, highlighting high true positive rates for well-defined classes such as Pulmonary Fibrosis, Pleural Effusion, and Aortic Enlargement. Lower recall was observed for rare and visually ambiguous findings like Calcification and Pneumothorax.
Loss convergence (Figure 2) indicates stable training across all components. Qualitative examples in Figure 3 confirm YOLOv8n’s ability to accurately localize multiple conditions, including co-occurrences like Cardiomegaly and Pleural Effusion. However, smaller pathologies (e.g., Nodule/Mass) were frequently missed, revealing limitations in spatial sensitivity.
Detection behavior is summarized in Figure 4, which includes precision–recall, confidence-based, and F1 performance curves. Precision reached 1.00 at high confidence (0.872), while F1-score peaked at 0.31 (threshold = 0.091), suggesting the model favors high-precision predictions at the expense of recall. We selected the nano version of YOLOv8 due to its compact architecture and significantly reduced computational footprint, making it more suitable for resource-constrained environments and compatible with potential deployment on edge accelerators like Jetson Nano or Coral TPU.
Figure 4e,f highlight the dataset’s skewed label distribution and central clustering of bounding boxes, which may bias detection outcomes. Most annotations overlap in the thoracic region, complicating fine-grained localization in multi-disease contexts.
In summary, YOLOv8n demonstrates reliable detection for dominant classes but struggles with rare or subtle features. Future improvements may include class re-weighting, advanced augmentation, and multi-scale feature enhancement. Nonetheless, YOLOv8n’s efficiency, ONNX support, and edge compatibility make it a strong candidate for hybrid diagnostic pipelines alongside classification models.
4.3. Edge Inference Results
To evaluate real-world feasibility, we measured the end-to-end latency of each quantized classification model on a Raspberry Pi 4, using a high-resolution chest X-ray image (2566 × 2566 pixels, PNG format) to better reflect deployment scenarios in clinical settings. The reported latency includes image preprocessing and inference time, while model loading is excluded, as it occurs only once during deployment and is not part of repeated inference cycles. Among the tested models, MobileNetV3-Large achieved the fastest performance with an end-to-end latency of 429.6 ms, followed by EfficientNetV2-B0 at 623.0 ms, confirming their suitability for real-time embedded deployment. In contrast, InceptionResNetV2 and ResNet101 exhibited significantly higher latencies of 2300.7 ms and 1908.2 ms, respectively, due to their deeper architectures and larger parameter counts. Notably, even with a full-resolution image, all models completed end-to-end inference in under 2.4 s, demonstrating the feasibility of deploying AI-based decision support systems on embedded hardware, provided that quantization and architectural choices are carefully optimized for edge constraints. A detailed breakdown of model load time, preprocessing time, inference time, and total latency is presented in Table 4.
It is important to clarify that model loading time was excluded from our reported total latency, as it is a one-time operation performed during system initialization. In realistic deployment scenarios, the model remains loaded in memory for repeated use and therefore does not contribute to the per-image inference delay. The latency values presented reflect only the runtime operations (preprocessing and inference) that occur for each input image.
The YOLOv8n detection model was not deployed on the Raspberry Pi due to its computational demands. Instead, it was exported to ONNX format for execution on edge accelerators such as the NVIDIA Jetson Nano or Coral TPU. In realistic setups, a hybrid diagnostic pipeline is envisioned: lightweight CNNs perform local classification on-device, while detection is offloaded to a Multi-access Edge Computing (MEC) server over 5G for spatial interpretation. This design optimizes diagnostic responsiveness and interpretability in mobile and underserved clinical environments.
4.4. Impact of Quantization on Prediction Confidence
To evaluate the effect of post-training quantization on model prediction behavior, we compared the classification outcomes and softmax confidence values of all six models before and after 8-bit quantization, using the same high-resolution chest X-ray image (2566 × 2566 pixels, PNG format). As shown in Appendix B (Table A1), quantization preserved the predicted class in five out of six models, with only DenseNet201 exhibiting a shift in predicted label from Normal to COVID-19.
Despite minor shifts in probability magnitudes, the overall confidence distributions remained stable across models, with the dominant class maintaining its relative position. Quantized models displayed slightly sharpened or dampened softmax values, a typical artifact of reduced floating-point precision. Crucially, no unsafe class inversions—such as a misclassification between Normal and COVID-19—were observed, preserving the integrity of clinical safety margins.
These results suggest that 8-bit quantization introduces minimal risk to classification reliability when applied to well-optimized models, reinforcing its suitability for resource-constrained, real-time medical inference on embedded edge hardware.
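A minimal sketch of this pre-/post-quantization comparison is given below; the Keras model variable, the preprocessed input x (a float32 batch of shape (1, H, W, 3)), and the file name are assumptions.

import numpy as np
import tensorflow as tf

# Sketch of comparing the float32 Keras model with its quantized TFLite counterpart
# on the same preprocessed input x. Model, x, and file names are assumptions.
float_probs = model.predict(x)[0]

interpreter = tf.lite.Interpreter(model_path="model_int8.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

xq = x
if inp["dtype"] == np.int8:                      # quantize the input if required
    scale, zero = inp["quantization"]
    xq = (x / scale + zero).astype(np.int8)
interpreter.set_tensor(inp["index"], xq)
interpreter.invoke()

quant_probs = interpreter.get_tensor(out["index"])[0].astype(np.float32)
if out["dtype"] == np.int8:                      # dequantize the output if required
    scale, zero = out["quantization"]
    quant_probs = (quant_probs - zero) * scale

print("predicted class preserved:", np.argmax(float_probs) == np.argmax(quant_probs))
print("largest softmax shift:", float(np.abs(float_probs - quant_probs).max()))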
4.5. Combined Analysis and Insights
To assess the diagnostic synergy of classification and detection, we analyzed how their outputs aligned and complemented each other. Classification models, such as DenseNet201 and EfficientNetV2-B0, offered accurate image-level predictions but lacked spatial interpretability. In contrast, the YOLOv8n detection model localized 14 thoracic abnormalities, many of which semantically mapped to the classification categories.
For example, Consolidation and Infiltration often co-occurred in cases predicted as Bacterial or Viral Pneumonia, while findings such as Pulmonary Fibrosis, ILD, and Lung Opacity were common in COVID-19 diagnoses. Pleural Effusion frequently accompanied Bacterial Pneumonia, supporting cross-validation of model outputs.
In uncertain classification cases, YOLOv8n’s localized outputs (e.g., bounding boxes for Effusion) enhanced interpretability and offered decision support. While visual overlays were not included in this study, the observed semantic consistency between detection and classification underscores the diagnostic value of combining both approaches.
This supports a hybrid Embedded AI pipeline: classification provides rapid triage on edge devices like Raspberry Pi, and detection—offloaded to accelerators or MEC infrastructure—adds spatial context. Such an architecture enhances transparency, reliability, and responsiveness in low-resource deployments.
4.6. Summary of Key Findings
Classification Performance: DenseNet201 and EfficientNetV2-B0 achieved top accuracy; MobileNetV3-Large offered the best speed–accuracy trade-off for edge deployment.
Data Augmentation Benefits: Augmented datasets improved generalization, especially for compact models, and reduced class confusion.
Detection Effectiveness: YOLOv8n achieved mAP@0.5 of 27.6% and mAP@0.5:0.95 of 14.7%. High precision was observed for visually distinct classes, with lower recall for rare or subtle findings.
Edge Feasibility: MobileNetV3-Large and EfficientNetV2-B0 demonstrated efficient inference on Raspberry Pi. YOLOv8n is better suited for hardware accelerators or MEC offloading.
Hybrid Potential: Correlation between classification and detection outputs validates a combined framework, enhancing diagnostic confidence, transparency, and edge deployability.
These results underscore the value of combining classification and detection in Embedded AI solutions for lung disease diagnosis and support their integration into real-world clinical workflows.
5. Discussion
This study explored a dual deep learning strategy—image-level classification and object-level detection—for AI-assisted lung disease diagnosis from chest X-rays. The integration of both approaches offers a more comprehensive and interpretable diagnostic workflow, particularly valuable in low-resource settings.
5.1. Interpretability and Practical Deployment
One of the primary challenges in deploying classification models in clinical practice is their limited transparency. The integration of object detection via YOLOv8n improves interpretability by providing spatial localization of abnormalities (e.g., Consolidation, Pleural Effusion, Pulmonary Fibrosis). These localized outputs align with classification labels, enhancing clinical trust and supporting more informed decision-making.
From a deployment standpoint, our experiments on Raspberry Pi 4 confirmed that lightweight classification models such as MobileNetV3-Large and EfficientNetV2-B0 achieve a favorable trade-off between accuracy and latency, enabling real-time inference in embedded contexts. While the YOLOv8n detection model was not deployed as part of this work, it is technically feasible to run it on Raspberry Pi 4 when optimized (e.g., via ONNX or TFLite) and supported by lightweight runtime environments. For enhanced performance or throughput, integration with hardware accelerators like the Coral TPU or NVIDIA Jetson Orin Nano is also possible. In a practical deployment scenario, classification could be used for on-device triage, while detection would provide spatial explainability, either on the same device or via a lightweight co-processor, enabling a flexible and interpretable edge-AI diagnostic workflow.
5.2. Complementary Value of Classification and Detection
Although this study evaluates classification and detection models independently, their integration presents practical diagnostic advantages aligned with real-world clinical workflows. The classification component serves as a rapid, low-complexity screening tool suitable for embedded platforms like Raspberry Pi, while the detection module contributes interpretability through spatial localization of abnormalities.
While a quantitative fusion of outputs was not conducted, qualitative observations indicate that the detected regions of interest often align well with the classification categories, particularly for conditions such as bacterial pneumonia, COVID-19, and pleural effusion. This complementary behavior supports the envisioned modular pipeline, where classification facilitates triage and detection augments interpretability.
Notably, the full integration of both modules into a unified edge-deployable pipeline remains a future goal. However, the modular architecture and deployment-aware design already offer insights into how such hybrid AI systems can be structured. Future work will investigate adaptive strategies, where detection is selectively triggered based on classification uncertainty, thereby enhancing diagnostic precision while managing computational overhead in resource-constrained environments.
5.3. Impact of Data Augmentation and Future Dataset Expansion
The application of standard data augmentation techniques yielded only marginal improvements for certain lightweight models and did not substantially enhance overall classifier performance. This outcome suggests that, within our current experimental framework, augmentation alone is insufficient to overcome the limitations imposed by the relatively small dataset size. Moreover, conventional augmentation techniques may not always introduce meaningful diversity to chest X-ray images, as common transformations can result in redundant or noninformative alterations to lung-specific features. This highlights the need for more context-aware or advanced augmentation strategies—tailored specifically for medical imaging—that preserve anatomical realism while enriching training variability.
Future research should focus on expanding the dataset in a more systematic and scalable manner, ideally incorporating real-world clinical diversity. Federated learning frameworks can offer a viable path by enabling collaborative model training across distributed institutions while preserving patient privacy and adhering to regulatory constraints.
In parallel, emphasis should remain on training low-complexity, quantization-friendly models that retain high diagnostic value while being optimized for real-time inference on resource-constrained edge devices.
5.4. Limitations and Future Work
A key limitation of this study lies in the use of separate datasets for classification and detection tasks, preventing joint optimization or end-to-end learning of a unified model. Future research could explore semi-supervised learning or multi-task learning approaches to bridge this gap and exploit partially labeled data.
Additionally, although the feasibility of classification model inference was thoroughly validated on Raspberry Pi 4 using quantized models, the hybrid pipeline integrating both classification and detection was not experimentally implemented or profiled end-to-end. Nevertheless, the proposed modular architecture reflects a novel and practical strategy, where lightweight classification is performed on-device for rapid triage, and detection is selectively activated via remote accelerators such as Jetson Orin Nano or edge servers. This design anticipates real-time decision support systems deployable in surgery rooms or mobile clinical setups. Future work will focus on fully integrating this pipeline and evaluating its end-to-end latency, diagnostic synergy, and clinical viability.
Another important limitation concerns the relatively small dataset and class imbalance, which particularly affects rare abnormalities and their detection accuracy. Enlarging the dataset with clinically validated annotations would improve the model’s generalization capacity. More rigorous cross-dataset validation and clinical trials involving real-world deployment are necessary to ensure clinical robustness.
Furthermore, some models required different input resolutions, which might have affected performance consistency. Standardizing preprocessing pipelines or using resolution-invariant architectures should be explored in future work.
Moreover, while this study focused on six well-established CNN architectures for their diversity and relevance to embedded deployment, future work will expand the model pool to include additional lightweight families such as ShuffleNet and SqueezeNet. This will enable a more comprehensive exploration of the trade-offs between latency, accuracy, and deployability across a broader spectrum of edge-oriented models.
Finally, while the datasets used were labeled by radiologists or derived from structured hospital sources, no direct collaboration with clinical experts was involved in evaluating the model’s interpretability or usability. Future efforts should prioritize expert-in-the-loop validation to align AI outputs with clinical workflows.
Additionally, the datasets used in this study lack complete demographic annotations such as age, sex, or ethnicity, limiting our ability to assess or control for demographic bias. This represents an important constraint on the generalizability of the results, particularly for real-world deployment across diverse populations. Future work should consider demographic balancing and fairness-aware modeling to ensure equitable diagnostic performance across groups.
5.5. Novelty and Positioning
This work introduces a modular, edge-oriented AI pipeline for lung disease diagnosis, integrating both multi-class classification and multi-label object detection to reflect real-world diagnostic workflows. While most prior studies emphasize accuracy on high-end computing platforms, few consider the deployability of AI models on resource-constrained edge hardware. To the best of our knowledge, no existing work systematically evaluates the feasibility and complementary value of combining classification and detection models in embedded settings.
The novelty of this study lies in (i) benchmarking six quantized classification models on Raspberry Pi 4 for real-time inference, using high-resolution chest X-ray inputs to simulate deployment conditions, and (ii) training a YOLOv8n detection model that is compatible with embedded accelerators, though not deployed in this version. Together, these components demonstrate a theoretically viable hybrid diagnostic pipeline, with classification enabling rapid triage and detection offering spatial interpretability.
While a full end-to-end prototype was not implemented, this modular approach forms the foundation for future hybrid pipelines deployable in mobile clinics, wearable medical devices, or surgical assistance systems. The study prioritizes a balance between speed, interpretability, and resource efficiency—three essential pillars for embedded AI solutions in real-time, distributed healthcare environments. This forward-looking perspective distinguishes the work from the existing literature focused solely on algorithmic performance or isolated evaluation.
6. Conclusions
This study conducted a comprehensive performance comparison of Embedded AI solutions for lung disease diagnosis from chest X-ray images, evaluating both classification and detection models within the context of edge-deployable healthcare. By integrating image-level classification with object-level detection, we addressed key challenges related to diagnostic accuracy, interpretability, and real-world feasibility.
Six CNN-based classifiers were benchmarked across base and augmented datasets for five-class classification. MobileNetV3-Large and EfficientNetV2-B0 demonstrated the most favorable trade-off between classification accuracy and latency, achieving real-time performance on Raspberry Pi using quantized models. For detection, a lightweight YOLOv8n model was trained to localize 14 thoracic abnormalities with competitive mAP scores, particularly excelling in the detection of high-frequency and spatially distinct conditions such as Cardiomegaly, Pleural Effusion, and Pulmonary Fibrosis. Although YOLOv8n was not deployed on embedded hardware in this study, its compact architecture makes it a promising candidate for future edge deployment.
The observed alignment between classification predictions and detected abnormalities underscores the diagnostic synergy of combining both approaches. This supports the feasibility of a hybrid diagnostic pipeline, wherein classification provides efficient triage and detection enhances clinical interpretability through spatial verification—especially valuable in low-resource or mobile healthcare settings.
In summary, this work contributes a modular, deployable AI framework that leverages the complementary strengths of classification and detection. Through experimental validation of classification inference on Raspberry Pi and the training of an edge-compatible detection model, it addresses real-world constraints often overlooked in the existing literature. Future work will focus on fully integrating and evaluating this hybrid pipeline under real-time and system-level conditions, along with clinical validation and fairness-aware modeling to ensure equitable deployment.