Exploratory Image-Level Classification of a Public Chest Radiograph Dataset Using a Lightweight SqueezeNet-Based Pipeline

Ramalhete, Luis; Oliveira, Vitor; Quintas, Rui; Araújo, Rúben

doi:10.3390/aimed1020015


by Luis Ramalhete^{1,2,3,4,*,†}, Vitor Oliveira^{4,5,6}, Rui Quintas^{4,5,7} and Rúben Araújo^{2,*,†}

¹ Blood and Transplantation Center of Lisbon, Instituto Português do Sangue e da Transplantação, Alameda das Linhas de Torres, No. 117, 1769-001 Lisbon, Portugal

² NOVA Medical School, Universidade NOVA de Lisboa, 1169-056 Lisbon, Portugal

³ iNOVA4Health—Advancing Precision Medicine, Núcleo de Investigação em Doenças Renais, NOVA Medical School, Faculdade de Ciências Médicas, Universidade NOVA de Lisboa, 1169-056 Lisbon, Portugal

⁴ LBS—Lisbon Business & Government School, Rua de São Bernardo, 34-A, 1200-825 Lisbon, Portugal

⁵ Radiology, São José Hospital, Local Health Unit of São José, Rua José António Serrano, 1150-199 Lisbon, Portugal

⁶ ISCSP-UL—Instituto Superior de Ciências Sociais e Políticas da Universidade de Lisboa, Rua Almerindo Lessa, 1300-633 Lisbon, Portugal

⁷ NOVA IMS—NOVA Information Management School, Universidade NOVA de Lisboa, Campus de Campolide, 1070-312 Lisbon, Portugal

* Authors to whom correspondence should be addressed.

† These authors contributed equally to this work.

AI Med. 2026, 1(2), 15; https://doi.org/10.3390/aimed1020015

Submission received: 16 March 2026 / Revised: 25 May 2026 / Accepted: 27 May 2026 / Published: 2 June 2026


Abstract

Background: Chest radiography is widely used in clinical workflows; however, exploratory image-level classification across multiple public-dataset categories remains less studied than single-disease classification tasks. We aimed to develop and internally evaluate a compact SqueezeNet-based pipeline for nine-class chest radiograph classification within a public dataset. Low-computational-footprint approaches may be relevant for future research prototypes in resource-constrained settings, particularly when offline operation is desirable; however, no real-world clinical deployment or triage validation was assessed in the present study. Methods: Using a public dataset of 6743 frontal radiographs spanning normal anatomy and eight pathology categories, we extracted 512-dimensional embeddings from a pre-trained SqueezeNet-1.0 (features module with global average pooling) and trained a scikit-learn MLP with a single hidden layer. Performance was assessed with stratified 5-fold cross-validation using accuracy and class-wise precision, recall, and F1; interpretability was examined via confusion matrices and dimensionality reduction techniques (t-SNE and MDS). Results: The model achieved a mean accuracy of 98.83% across folds, with per-class precision, recall, and F1 generally ≥0.96 and a weighted F1 of 0.99; confusion matrices showed minimal off-diagonal errors, and embedding visualizations revealed well-separated, class-consistent clusters. Conclusions: Compact CNN features coupled with a simple MLP demonstrated strong internal performance for multi-class CXR classification within the evaluated dataset. However, the absence of external validation, the use of synthetically augmented data, and the lack of patient-level provenance metadata substantially limit conclusions regarding generalizability and clinical applicability.

Keywords: chest radiography; image-level classification; SqueezeNet; convolutional neural networks; public dataset; exploratory machine learning

1. Introduction

Chest radiography (chest X-ray, CXR) is one of the most commonly performed imaging examinations in medicine, serving as a first-line diagnostic tool for a wide range of thoracic diseases [1,2]. It is a fast, cost-effective, and non-invasive modality that can reveal pathologies of the lungs, heart, and surrounding structures. Prompt and accurate interpretation of chest X-rays is critical for patient care [1,3,4]. For example, early detection of pulmonary infections, malignancies, or chronic lung conditions significantly improves clinical outcomes. Conversely, missed diagnoses on CXR can lead to delayed treatment of life-threatening illnesses. Indeed, diseases such as pneumonia, tuberculosis (TB), and lung cancer remain major causes of mortality worldwide if not identified and treated in time [5,6,7,8,9]. In 2017, pneumonia alone claimed over 800,000 lives among young children, while TB and lung cancer each cause over a million deaths annually [9,10,11]. These statistics underscore the vital role of chest X-ray screening in global health. Timely interpretation is hardest to secure in low- and middle-income countries (LMICs), where limited radiology capacity and delayed image interpretation can critically affect outcomes, underscoring the need for lightweight AI-assisted decision support approaches.

However, interpreting CXR images is a challenging task prone to observer variability and error. Radiologists must scrutinize subtle patterns on relatively low-contrast images, often under significant time pressure. In busy clinical settings, a single radiologist may need to read dozens or hundreds of X-rays per day, which can lead to fatigue and reduced diagnostic accuracy [12,13,14,15]. Furthermore, many regions of the world face a shortage of experienced radiologists, forcing general physicians to interpret X-rays despite limited specialized training [16,17,18]. Human interpretation is inherently fallible: studies have reported that approximately 30% of abnormal findings on chest radiographs can be missed by human readers [1,12,19,20]. Errors arise from various factors, including overlapping anatomical structures, subtle abnormalities that are difficult to discern, and differences in individual expertise or attention [1,21,22]. Such missed or misinterpreted findings may lead to delayed diagnoses, suboptimal treatment, and preventable complications [22,23,24,25,26]. This variability highlights the need for tools that can assist radiologists and clinicians in achieving more consistent and accurate CXR interpretations. However, many existing AI systems depend on cloud connectivity or high-end GPUs, limiting real-world adoption in resource-constrained hospitals and field clinics.

Computer-aided diagnosis (CAD) systems based on artificial intelligence (AI) have emerged as a promising solution to augment radiologists’ capabilities [27,28,29,30,31]. In particular, advances in machine learning (ML) and deep learning have enabled the development of algorithms that automatically analyze medical images and detect pathologies. Convolutional neural networks (CNNs), a class of deep learning models proven successful in image recognition tasks, have been at the forefront of this progress. Over the past decade, numerous studies have demonstrated that CNN-based models can achieve high diagnostic performance under controlled experimental conditions in identifying abnormalities on chest X-rays [32,33,34,35,36]. For instance, Rajpurkar et al. developed a 121-layer CNN that detects pneumonia from CXRs with performance reported to rival that of board-certified radiologists, achieving an area under the curve (AUC) of 76.80% [37]. Similarly, Kundu et al. introduced a model for pneumonia detection using a transfer learning-based ensemble of ResNet-18, GoogleNet, and DenseNet-121. By applying a weighted averaging method and five-fold cross-validation, their model was trained and tested on Pediatric-CXR and Radiological Society of North America-Pneumonia-CXR datasets. The system demonstrated strong performance across the evaluated datasets, achieving 98.81% accuracy on Pediatric-CXR and 86.86% on Radiological Society of North America-Pneumonia-CXR, highlighting the robustness of their ensemble approach across different datasets [38]. Likewise, deep learning models have shown high accuracy in detecting TB, nodules, pleural effusions, and other thoracic diseases on CXR images [39,40,41,42,43]. These successes have been fueled by the availability of large public CXR databases and modern GPU computing, enabling training of complex neural networks on hundreds of thousands of images [44]. Notably, the release of datasets such as the NIH ChestX-ray14 (with over 100,000 images) and Stanford CheXpert has spurred the development of multi-disease detection algorithms, moving beyond single-diagnosis models. As a result, AI systems now exist that can screen chest radiographs for dozens of different abnormalities within seconds, often with strong performance reported in specific experimental tasks [1].

Despite these advancements, there remain important gaps and challenges in the application of AI to chest radiograph interpretation. Many early deep learning studies addressed only binary classification problems (e.g., pneumonia vs. normal) or a limited set of pathologies, rather than the full diversity of lung diseases seen in practice [45,46,47]. Focusing on one finding at a time simplifies the problem but does not reflect real-world diagnostics, where an X-ray could reveal any of several abnormalities. In fact, a recent review noted that most CXR classification models have targeted a small subset of findings, and comparatively few frameworks attempt comprehensive, multi-class diagnosis [5,48,49].

When models do attempt to classify multiple conditions simultaneously, performance can degrade as the task grows more complex. For example, Cococi et al. reported that a CNN distinguishing pneumonia from other diseases achieved ~94% accuracy in a binary setting, but the accuracy dropped to ~84% when a third class was added [50,51,52]. This illustrates how multi-class classification is inherently more challenging, as the model must learn more fine-grained differences between categories. Indeed, many researchers have restricted their experiments to at most 5–6 classes even if a dataset contains a broader array of diagnoses [53,54]. From a methodological perspective, broader image-level classification across multiple radiographic categories remains an important research problem. However, robust multi-class or multi-label CXR classification remains challenging because of inter-class similarity, class imbalance, heterogeneous labeling practices, and the need for large annotated datasets with reliable provenance information [55,56]. An effective future system would ideally support broader radiological review across multiple abnormalities, identifying various possible pathologies (or confirming normal findings) in one pass.

In this context, we leverage a recently published dataset that enables investigation of automated multi-class classification of lung diseases on CXR. The dataset, “X-ray Lung Diseases Images (9 classes)”, was created by Feltrin et al. and made publicly available via Kaggle. It contains a total of 6743 frontal chest X-ray images categorized into nine distinct classes: eight categories correspond to different lung pathologies and the ninth is normal (healthy lungs). The disease classes span a broad spectrum of radiographic abnormalities, including obstructive pulmonary diseases (e.g., COPD), degenerative infectious diseases (e.g., tubercular patterns), higher density and lower density changes in lung fields (which may correspond to infiltrates or emphysema-related lucency), encapsulated lesions (such as contained abscesses or cysts), mediastinal changes (e.g., widened mediastinum or lymphadenopathy), and other chest changes [57]. By encompassing several dataset-defined pulmonary categories, this public dataset provides a useful exploratory setting for image-level multi-class classification, while its provenance and labeling limitations restrict conclusions regarding clinical generalizability. Another important feature is that the dataset is relatively balanced across the nine classes, achieved through synthetic augmentation of images to bolster under-represented categories. All images were resized to a uniform resolution (450 pixels height) and augmented, resulting in a more even distribution of cases per class (only lightly imbalanced) [57].

Using data augmentation to address class imbalance is a common strategy in medical imaging AI [5], and here it helps ensure that the CNN does not become biased toward any one diagnosis. Compared to large-scale CXR datasets like ChestX-ray14 which have severe label imbalance (for instance, common conditions like atelectasis have tens of thousands of examples while rare findings have only a few hundred), the present Kaggle dataset offers a well-curated and balanced sample for each lung disease category. This mitigates one obstacle in training a multi-class model and allows us to focus on the model’s ability to differentiate between pathologies. Additionally, the dataset’s images have been preprocessed and standardized to some extent (all in 8-bit grayscale and similar dimensions), simplifying the preprocessing pipeline needed before model training.

Building on this dataset, our study evaluates an exploratory image-level classification pipeline for nine dataset-defined chest radiograph categories. We use a lightweight SqueezeNet-based feature extractor and an MLP classifier to assess whether compact CNN-derived representations can separate the available public-dataset labels under internal cross-validation. The trained model’s performance is assessed using class-wise metrics to avoid relying only on overall accuracy. We also include dimensionality reduction analyses and Grad-CAM visualizations as exploratory tools for inspecting feature-space organization and model-salient regions, without implying clinically validated explainability or diagnostic reasoning.

In summary, this work presents an exploratory image-level classification pipeline combining SqueezeNet-derived embeddings, an MLP classifier, dimensionality reduction analyses, and Grad-CAM visualization within a public nine-class chest radiograph dataset. The study evaluates whether compact CNN-derived representations can separate dataset-defined radiographic categories under internal cross-validation. The work should therefore be interpreted as a methodological proof-of-concept rather than as evidence of clinical diagnostic performance, triage capability, or deployment readiness.

2. Materials and Methods

2.1. Dataset and Ethical Considerations

All experiments were conducted using the publicly available dataset titled “X-ray Lung Diseases Images (9 classes)”, created and made available by Fernando Feltrin through the Kaggle platform. The dataset comprises chest X-ray images organized into nine categories, each corresponding to a different lung condition or to healthy, normal anatomy, as presented in Table 1. The images are free of patient-identifying information, and no interventional studies on humans or animals were performed. Consequently, no institutional review board approval was required. However, detailed information regarding original institutional sources, annotation procedures, licensing terms, and reuse permissions was not comprehensively documented in the dataset description. All data have been deposited online at Kaggle in accordance with the creator’s license. Any future updates or accession numbers will be provided prior to publication, if necessary. The labels were used as provided by the dataset authors, and no independent radiological verification was performed in the present study.

2.2. Data Preprocessing and Feature Extraction

Prior to model training, each chest X-ray was loaded and converted to a three-channel RGB format. Images were then resized to 224 × 224 pixels to ensure consistency with the chosen feature extractor. Following resizing, a standard normalization procedure was applied, subtracting channel-wise means and dividing by the respective standard deviations. These preprocessing steps standardize the model input and help stabilize feature extraction in CNNs.
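As an illustration of the normalization step, the following minimal sketch converts an 8-bit grayscale array into a three-channel, channel-normalized input. The ImageNet channel statistics and the helper name `to_model_input` are assumptions; the text specifies only channel-wise mean subtraction and division by the standard deviation. Resizing to 224 × 224 is omitted to keep the sketch dependency-light.

```python
import numpy as np

# Assumed channel statistics (the usual ImageNet values); the text does not list them.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def to_model_input(gray_u8: np.ndarray) -> np.ndarray:
    """Convert an 8-bit grayscale image (H, W) to a normalized float32 array (3, H, W)."""
    # Replicate the single channel three times and scale to [0, 1].
    rgb = np.repeat(gray_u8[..., None], 3, axis=-1).astype(np.float32) / 255.0
    # Channel-wise standardization: subtract the mean, divide by the standard deviation.
    normalized = (rgb - IMAGENET_MEAN) / IMAGENET_STD
    # Move channels first, the layout PyTorch models expect.
    return normalized.transpose(2, 0, 1)


# Hypothetical stand-in for an already resized 224 x 224 radiograph.
gray = np.full((224, 224), 128, dtype=np.uint8)
x = to_model_input(gray)
print(x.shape)
```

Because the radiographs are grayscale, the three channels differ only through the per-channel statistics applied during normalization.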

Feature extraction was performed using the features module of a SqueezeNet 1.0 architecture, pre-trained on ImageNet. Specifically, we treated the convolutional portion of SqueezeNet as a fixed feature extractor, generating a 512-dimensional global average pooled vector for each image. This vector captures high-level features indicative of image content (e.g., consolidated opacities, hyperlucent regions, nodular patterns). SqueezeNet was used in “evaluation” mode to disable any training-related randomness (like dropout) during feature extraction. The resulting feature vectors served as input to the subsequent classification algorithm.
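The pooling step that yields the 512-dimensional vector can be sketched as follows. The 13 × 13 spatial grid is what the TorchVision SqueezeNet 1.0 features module is expected to produce for a 224 × 224 input and is an assumption here, as are the random stand-in activations and the helper name `global_average_pool`.

```python
import numpy as np


def global_average_pool(feature_map: np.ndarray) -> np.ndarray:
    """Average each channel over its spatial grid: (C, H, W) -> (C,)."""
    return feature_map.mean(axis=(1, 2))


rng = np.random.default_rng(0)

# Random stand-ins for the final feature maps of four images (not real activations).
feature_maps = [rng.random((512, 13, 13), dtype=np.float32) for _ in range(4)]

# One 512-dimensional embedding per image, stacked into a feature matrix.
embeddings = np.stack([global_average_pool(fm) for fm in feature_maps])
print(embeddings.shape)
```

Pooling discards spatial position, so each image is summarized by one value per channel regardless of where in the radiograph a pattern occurs.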

2.3. Classification Model and Training

To classify the extracted 512-dimensional feature vectors, we employed a Multi-Layer Perceptron (MLP) provided by the scikit-learn library (Python version 3.12). The MLP was specified with a single hidden layer of 200 neurons, rectified linear unit (ReLU) activation, and the Adam optimizer. A small regularization parameter (α = 0.01) was introduced to mitigate overfitting. Training was conducted with a maximum of 200 epochs or until convergence. All hyperparameters were determined empirically through pilot experiments aimed at balancing model complexity and generalizability.
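A configuration consistent with this description might look as follows. This is a sketch rather than the authors' code: `random_state`, the synthetic stand-in features, and every parameter not named in the text (left at scikit-learn defaults) are assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

clf = MLPClassifier(
    hidden_layer_sizes=(200,),  # single hidden layer of 200 neurons
    activation="relu",
    solver="adam",
    alpha=0.01,                 # L2 regularization strength
    max_iter=200,               # for Adam, scikit-learn counts this in epochs
    random_state=0,             # assumption: fixed only for reproducibility
)

# Synthetic 512-dimensional stand-ins for the embeddings (not real data).
rng = np.random.default_rng(0)
n_classes, per_class, n_features = 9, 20, 512
y = np.repeat(np.arange(n_classes), per_class)
centers = rng.normal(size=(n_classes, n_features))
X = centers[y] + 0.1 * rng.normal(size=(y.size, n_features))

clf.fit(X, y)
proba = clf.predict_proba(X[:1])  # one row of nine class probabilities
print(proba.shape)
```

Training stops before `max_iter` when the loss stops improving by more than scikit-learn's tolerance, which matches the "200 epochs or until convergence" wording.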

2.4. Cross-Validation and Evaluation Metrics

Because of the modest size of the dataset and the multi-class nature of the classification task, performance was assessed using stratified 5-fold cross-validation. This procedure divides the data into five disjoint subsets while preserving class distribution in each subset. The cross-validation procedure was performed at the image level using the dataset structure as publicly provided. Because the dataset contains synthetically augmented images and does not provide patient-level identifiers or complete provenance metadata, strict patient-level or source-level separation across folds could not be guaranteed. In particular, augmented derivatives of the same original image may have been distributed across folds, so similarity between training and validation samples may have inflated the performance estimates. The reported results should therefore be interpreted as internal validation results within the constraints of the available dataset and may overestimate true generalization performance.

2.5. Dimensionality Reduction and Visualization

To gain further insight into how the model distinguished between classes in a 512-dimensional feature space, we employed both t-distributed Stochastic Neighbor Embedding (t-SNE) and Multidimensional Scaling (MDS). t-SNE is a nonlinear dimensionality reduction method that projects high-dimensional data into a low-dimensional space (often 2D) while preserving local structures. Each feature vector was projected into two dimensions, and color-coding was used to indicate class labels. Clusters in this 2D space provide a visual approximation of how separable or overlapping the categories are.

In parallel, MDS aims to preserve pairwise distances among data points by finding a low-dimensional representation that reflects the original distances or dissimilarities. This classical approach offers an alternative view of the inherent structure of the data, highlighting global patterns or relationships that might be less apparent with t-SNE. The combined use of t-SNE and MDS, though not strictly necessary for model deployment, provides an exploratory view of how the feature space is organized and may guide refinements of the classification pipeline.
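Both projections can be sketched with scikit-learn as follows. The synthetic embeddings, the perplexity value, and the random seeds are assumptions; note also that scikit-learn's `MDS` is an iterative (SMACOF-based) metric MDS, and the text does not state which MDS implementation was used.

```python
import numpy as np
from sklearn.manifold import MDS, TSNE

# Synthetic 512-dimensional stand-ins for the embeddings (not real data).
rng = np.random.default_rng(0)
n_classes, per_class, n_features = 3, 20, 512
labels = np.repeat(np.arange(n_classes), per_class)
centers = rng.normal(size=(n_classes, n_features))
embeddings = centers[labels] + 0.1 * rng.normal(size=(labels.size, n_features))

# t-SNE emphasizes local neighborhoods; perplexity must be below the sample count.
tsne_2d = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(embeddings)

# MDS tries to preserve pairwise Euclidean distances in the 2D layout.
mds_2d = MDS(n_components=2, random_state=0).fit_transform(embeddings)

print(tsne_2d.shape, mds_2d.shape)
```

Each row of the two outputs is a 2D coordinate that can be scatter-plotted and colored by `labels`.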

Once the final model was trained, a separate inference script was employed to classify new chest X-ray images. This script reuses the same feature extraction pipeline (SqueezeNet feature extraction and global average pooling). The extracted feature vectors are passed to the trained MLP model for probability estimation across all nine categories. The script then reports both the predicted class with the highest probability and the probabilities for all other classes, providing a more transparent view of the model’s confidence distribution. By examining these confidence scores, one can detect cases in which multiple pathologies might manifest similarly on chest radiographs.
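The reporting step can be sketched in plain Python. The class names follow the dataset categories described in this article; the probability vector and the helper `summarize` are hypothetical and stand in for the output of the trained MLP.

```python
CLASS_NAMES = [
    "00 Normal Anatomy",
    "01 Pulmonary Inflammatory Processes",
    "02 Higher Density",
    "03 Lower Density",
    "04 Obstructive Pulmonary Diseases",
    "05 Degenerative Infectious Diseases",
    "06 Encapsulated Lesions",
    "07 Mediastinal Changes",
    "08 Chest Changes",
]


def summarize(probabilities):
    """Return the top (class, probability) pair and the remaining pairs, ranked."""
    if len(probabilities) != len(CLASS_NAMES):
        raise ValueError("expected one probability per class")
    ranked = sorted(
        zip(CLASS_NAMES, probabilities), key=lambda pair: pair[1], reverse=True
    )
    return ranked[0], ranked[1:]


# Hypothetical output for one image: two categories compete for the top rank.
probs = [0.02, 0.55, 0.30, 0.03, 0.02, 0.03, 0.02, 0.02, 0.01]
top, others = summarize(probs)

print(f"Predicted: {top[0]} ({top[1]:.2f})")
for name, p in others:
    print(f"  {name}: {p:.2f}")
```

A close runner-up, as in this made-up example, is the kind of case the confidence distribution is meant to expose.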

2.6. Inference and Reporting

A standalone desktop application (Python/Tkinter) was implemented to support fully offline inference in resource-constrained workflows. The application loads the final trained classifier from a serialized file, reapplies the training-time preprocessing (RGB conversion, 224 × 224 resize, and standard normalization), and outputs per-class probabilities for the nine categories. Probabilities are displayed both as text and a bar chart, with an option to export a CSV summary to facilitate audit trails and downstream analysis. The tool runs on commodity hardware (CPU-only is sufficient) and depends solely on open-source libraries (PyTorch, TorchVision, TorchCAM, and Matplotlib, running under Python version 3.12), eliminating the need for internet connectivity and minimizing operational cost in resource-limited environments.

For interpretability, Grad-CAM is computed using the final convolutional layer of the SqueezeNet backbone (target layer features.12). A user-controlled threshold restricts overlays to classes whose predicted probability exceeds the threshold (Class 00, Normal Anatomy, is excluded), and salient regions are annotated with color-coded markers that map consistently to class labels. This design surfaces model-salient image regions that support the predicted labels and enables rapid visual assessment when clinical workload is high or expert availability is limited.
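For reference, in its standard formulation Grad-CAM weights each feature map of the target layer by the spatially averaged gradient of the class score and keeps only the positive contributions. The symbols below are notation introduced here rather than taken from the text: $A^{k}$ is the $k$-th feature map of the target layer, $y^{c}$ is the score for class $c$, and $Z$ is the number of spatial positions. How the class score of the scikit-learn MLP is connected to the PyTorch feature maps is not described in this section.

```latex
\alpha_k^{c} = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^{c}}{\partial A^{k}_{ij}},
\qquad
L^{c}_{\text{Grad-CAM}} = \operatorname{ReLU}\!\left( \sum_{k} \alpha_k^{c} \, A^{k} \right)
```

The resulting map has the spatial resolution of the target layer and is upsampled to the input size before being overlaid on the radiograph.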

The application was engineered for low computational footprint, offline operation, simple operator workflow, and linguistic localization. These characteristics may support future evaluation of research-oriented inference workflows in resource-constrained settings, including Portuguese-speaking low-resource environments. The tool should be interpreted as an exploratory prototype and not as a clinically validated triage or diagnostic system.

3. Results

This section presents the classification outcomes and visualizations of feature distributions. The dataset used for training and evaluation, summarized in Table 1, comprises nine categories of normal and pathological chest radiographs and totals 6743 images. We report cross-validation performance, analyze the confusion matrix, and examine additional dimensionality reduction techniques (t-SNE, manifold learning, and MDS) that offer deeper insights into how feature representations cluster.

3.1. Distribution of Images by Class

As noted previously in Table 1, the 6743 images were distributed among the nine classes with Normal Anatomy (Class 00) comprising the largest subset, at approximately 19.9% of the total (1340 images). Pulmonary Inflammatory Processes (Class 01) followed at around 15.7% (1060 images). Higher Density pathologies (Class 02) represented 10.1%, while Lower Density changes (Class 03) accounted for roughly 9.3%. Obstructive Pulmonary Diseases (Class 04) made up 9.6%, with Degenerative Infectious Diseases (Class 05) close behind at 8.8%. Encapsulated Lesions (Class 06) had nearly 9.8%, Mediastinal Changes (Class 07) around 8.8%, and Chest Changes (Class 08) about 8.1% of the data.

This relatively balanced distribution may have facilitated model training, as no single category overwhelmingly dominated the dataset.

3.2. Classification Performance

A total of 6743 images were processed, resulting in a feature matrix

X \in \mathbb{R}^{6743 \times 512} \qquad (1)

A 5-fold stratified cross-validation protocol was employed to assess the accuracy of the final MLP classifier. The per-fold accuracies, presented in Table 2, yielded a mean accuracy of approximately 0.9883 (98.83%). This high cross-validation score suggests that the model was able to effectively distinguish between the nine dataset categories under the present experimental conditions, even under different training/validation splits.

To further examine the classification behavior, a confusion matrix (Table 3) and a detailed classification report were computed based on cross-validation predictions. Notably, the confusion matrix demonstrates very few off-diagonal entries. Among the nine classes, Higher Density (Class 02), Lower Density (Class 03), Degenerative Infectious Diseases (Class 05), Encapsulated Lesions (Class 06), Mediastinal Changes (Class 07), and Chest Changes (Class 08) are identified with high consistency, with minimal misclassifications.

These findings are also confirmed by the precision, recall, and F1-scores in the classification report, presented in Table 4, all of which exceed 0.96 across categories. The most frequent confusion occurs in the Pulmonary Inflammatory Processes (Pneumonia) class (Class 01), which sometimes overlaps with certain other pathologies. Overall, the weighted average F1-score of 0.99 indicates that the model maintains both high precision and high recall.
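The relationship between a confusion matrix and the reported class-wise metrics can be sketched as follows. The 3 × 3 matrix is a made-up example, not the study's Table 3, and the helper `per_class_report` is hypothetical; rows are assumed to be true classes and columns predicted classes, with every class predicted at least once.

```python
import numpy as np


def per_class_report(confusion):
    """Derive precision, recall, F1, weighted F1 and accuracy from a confusion matrix."""
    cm = np.asarray(confusion, dtype=float)
    true_positives = np.diag(cm)
    precision = true_positives / cm.sum(axis=0)  # column sums: predicted counts
    recall = true_positives / cm.sum(axis=1)     # row sums: true counts (support)
    f1 = 2 * precision * recall / (precision + recall)
    support = cm.sum(axis=1)
    weighted_f1 = float((f1 * support).sum() / support.sum())
    accuracy = float(true_positives.sum() / cm.sum())
    return precision, recall, f1, weighted_f1, accuracy


# Hypothetical confusion matrix for three classes with 50 images each.
cm = [
    [50, 0, 0],
    [2, 45, 3],
    [0, 1, 49],
]
precision, recall, f1, weighted_f1, accuracy = per_class_report(cm)
print(np.round(precision, 4), np.round(recall, 4), round(weighted_f1, 4), accuracy)
```

The weighted F1 averages the per-class F1 scores in proportion to class support, so larger classes contribute more to the summary figure.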

Having evaluated the cross-validated predictions, the final MLP model was retrained on the entire dataset of 6743 images, achieving comparable internal performance. This trained model was then saved for inference purposes, enabling efficient classification of new chest X-ray scans by leveraging the SqueezeNet-based feature extraction pipeline.

3.3. Dimensionality Reduction and Visualization

To explore how the 512-dimensional feature embeddings separate across the nine classes, we applied three dimensionality reduction techniques: t-SNE, manifold learning, and MDS. Each method projects the high-dimensional vectors onto a 2D plane, enabling visual inspection of the chest X-ray feature embeddings.

t-SNE: Figure 1 depicts the t-SNE scatterplot, with each point representing one chest X-ray and its color denoting the assigned class. Despite the inherent stochasticity of t-SNE, it clearly shows distinct clusters. Notably, the large blue region at the lower-left corresponds primarily to the Normal Anatomy class (00), while the red cluster near it represents Pulmonary Inflammatory Processes (Pneumonia) (01). Classes such as Encapsulated Lesions (06) or Chest Changes (08) appear as smaller, denser groups, indicating that the SqueezeNet features capture characteristic radiographic signatures. Minor overlaps can be seen where pathologies share similar radiographic findings, but overall, well-defined boundaries affirm that the learned embeddings effectively separate the various disease categories.

Figure 2 shows the data mapped into a 2D space using an alternative manifold-learning technique (e.g., Isomap or Locally Linear Embedding). This approach tends to preserve global structure more effectively than t-SNE, and similar class clusters re-emerge in roughly consistent groupings. The sizable collection of “Normal Anatomy” points remains clearly separated from “Pulmonary Inflammatory Processes,” whereas intermediate or confusable classes still maintain coherent groups without significant overlap. Notably, classes with strong morphological hallmarks, like “Encapsulated Lesions” or “Chest Changes,” form smaller but concentrated clusters. This agreement across multiple projection methods supports the stability of the SqueezeNet-derived features.

In classical MDS, as Figure 3 shows, distances in the 512-dimensional embedding space are translated directly into a 2D layout. While MDS may sacrifice some local details compared to t-SNE, it provides a more holistic view of overall inter-class separations. Again, “Normal Anatomy” remains clustered away from pathological classes, and conditions such as “Higher Density” or “Lower Density” are consistently placed near each other, reflecting partial similarity in radiographic characteristics (e.g., opacities vs. air-containing lesions).

Across all three visualization methods, the embeddings display a meaningful class-based structure. Consistent clustering of the same disease categories, even under different projection techniques, underscores the meaningful class separability of the extracted features. These visualizations are consistent with the observed classification performance and suggest that the extracted feature representations contain discriminative information relevant to the dataset categories. However, dimensionality reduction techniques remain exploratory visualization tools and should not be interpreted as direct evidence of clinically meaningful reasoning.

Collectively, these dimensionality reduction approaches confirm that the nine categories occupy distinct regions in the learned feature space. This finding aligns with the strong classification metrics, suggesting that SqueezeNet’s 512-dimensional embeddings effectively represent complex radiographic patterns in a way that the MLP can readily distinguish.

To conclude the Results, Figure 4 presents representative outputs from the offline desktop inference application: the original chest radiograph, a Grad-CAM overlay restricted to classes whose predicted probability exceeds a user-set threshold (Class 00—Normal Anatomy is not annotated), per-class probability outputs for the nine dataset categories, and a fixed legend. These qualitative displays complement the quantitative metrics by illustrating how exploratory image-level inference and visual explanation can be implemented in an offline research prototype. The interface was designed for execution on commodity hardware and for future evaluation in resource-constrained environments, but it should not be interpreted as evidence of clinical deployment readiness.

4. Discussion

Together with the quantitative results, the offline explainability interface illustrates a lightweight research-oriented workflow for exploratory image-level inference under constrained computational conditions. The prototype was designed for offline execution on commodity hardware and for potential linguistic localization, including Portuguese-language use. However, it should be interpreted strictly as a research prototype and not as a clinically validated triage, diagnostic, or deployment-ready system.

4.1. Internal Classification Performance Within the Evaluated Dataset

The SqueezeNet + MLP model achieved a cross-validated accuracy of 98.83% in classifying chest X-rays into nine categories, indicating strong internal discriminative performance within the evaluated dataset across a broad spectrum of lung pathologies. This high performance suggests the model captured feature representations associated with the dataset categories that distinguish even subtle differences among disease types. Such differentiation is noteworthy given that many thoracic diseases can present with overlapping radiographic patterns, often leading to human diagnostic errors [58,59,60]. In our results, each disease category, spanning obstructive lung diseases, infectious processes, density abnormalities, encapsulated lesions, mediastinal and chest wall changes, and normal lungs, was identified with high consistency within the evaluated dataset, suggesting that the extracted representations encode category-associated imaging patterns [50]. Although encouraging, these findings should be interpreted as internal experimental results and do not establish prospective clinical performance or real-world diagnostic reliability. The model may support future methodological research on image-level classification workflows, but external validation, patient-level separation, label verification, and prospective evaluation would be required before any clinical interpretation or workflow use could be considered. Clinically relevant contexts such as donor lung assessment or emergency imaging illustrate why robust chest radiograph interpretation is important. However, the present dataset and validation strategy do not support claims regarding performance in these specific settings. Such applications would require dedicated external validation, expert label verification, patient-level separation, prospective assessment, and workflow-specific evaluation [1,61,62,63].

4.2. Comparison with Prior Work

Our findings align with and extend the growing body of literature on AI-driven chest X-ray interpretation. In contrast to server- or cloud-hosted pipelines, our offline inference approach may offer practical advantages in local processing, reduced infrastructure requirements, cost efficiency, and resilience, all of which matter for low-infrastructure health systems. Early foundational studies focused on single-disease detection and achieved impressive results; most famously, the CheXNet model (a 121-layer DenseNet) attained radiologist-level pneumonia detection on the large ChestX-ray14 dataset [37]. CheXNet provided early evidence that deep learning models could achieve performance comparable to radiologist interpretation for specific chest radiograph classification tasks under controlled experimental conditions [37]. Subsequent research has continued to validate deep learning’s potential: for instance, Lakhani et al. showed a near-perfect AUC (~0.99) in tuberculosis vs. normal classification using an ensemble of CNNs [64], and various groups have reported high accuracy on COVID-19 and pneumonia detection using CXR images [65]. However, many of these earlier works targeted binary or limited-class problems. Multi-class or multi-condition classification is inherently more challenging; in one mobile CNN study, adding just a third category reduced accuracy from ~94% to ~84% [51]. In this context, achieving ~98.8% across nine classes under the present experimental conditions is encouraging. It suggests that the transfer learning pipeline captured feature representations capable of separating the dataset categories despite the task complexity. A recent study by Maiti et al. (2024) employed a similar strategy of using SqueezeNet as a feature extractor coupled with classical classifiers for four lung conditions, reaching ~97.3% accuracy [65].
Our work builds on this lightweight paradigm and suggests that compact architectures may achieve competitive performance even in more complex multi-class settings. No direct comparison with alternative lightweight architectures such as MobileNet or EfficientNet-Lite was performed in the present study. SqueezeNet was selected primarily because of its compact architecture, low computational footprint, and suitability for offline deployment in resource-constrained settings. Notably, SqueezeNet’s efficiency did not appear to compromise accuracy within the evaluated dataset, an observation echoed by Maiti et al., who achieved comparable precision with a 4.6-million-parameter SqueezeNet-based model versus a ~47-million-parameter conventional model [65]. This efficiency is an advantage over heavier architectures (e.g., DenseNet or Inception) commonly reported in the literature, and it aligns with recent findings that carefully optimized lightweight networks (or fine-tuned models like EfficientNet) can deliver state-of-the-art accuracy on CXR tasks at far lower computational cost [43,66,67,68,69,70,71,72]. Moreover, our multi-class approach addresses a methodological gap noted in the field: many earlier AI models targeted only specific findings or diseases [1], whereas broader multi-abnormality classification remains an important research challenge [73,74,75,76]. Within the constraints of the evaluated public dataset, our model represents an exploratory step toward broader image-level chest radiograph classification. Our results are consistent with the growing literature suggesting that AI-based methods can support concurrent analysis of multiple radiographic categories under controlled experimental conditions, but they do not establish clinical applicability.

4.3. Strengths of the Approach

High Internal Classification Performance: The model’s cross-validated accuracy of 98.83% is at the upper end of what has been reported for chest X-ray classification. This performance, achieved on a dataset of 6743 images spanning eight disease categories plus normal, indicates robust pattern recognition within the evaluated dataset [77,78,79,80]. Such high accuracy across multiple classes is uncommon, as multi-class CXR studies often report greater difficulty as the number of classes increases [50].
Broad Disease Coverage: Unlike many studies focusing on a single pathology, our model distinguishes nine categories (eight pathology groups plus normal). This breadth increases the methodological relevance of the dataset-level classification task, while not establishing clinical utility. Covering a spectrum from chronic obstructive changes to acute infections and neoplasms (encapsulated lesions), the tool aligns with the need for AI systems to detect “multiple abnormalities” for comprehensive decision support [81,82,83].
Feature Separability and Interpretability: Dimensionality reduction analyses (t-SNE, MDS) of the learned feature vectors revealed well-separated clusters corresponding to the different disease categories. This indicates that the network’s latent representations form distinct groupings for each condition, providing a qualitative visualization of how feature representations are spatially organized. The relative separation of the “normal” cluster from the disease-associated clusters, for example, suggests that the model captures radiographic differences between healthy lungs and pathological findings. Even disease classes with superficially similar radiographic appearances formed mostly discrete clusters, suggesting that the extracted features encode subtle radiological distinctions. Such visualization offers a degree of interpretability, giving clinicians insight into how the model groups images [84].
Computational Efficiency: A key strength of using SqueezeNet as the CNN backbone is its low computational and memory footprint [83,85]. SqueezeNet is a highly compact architecture, and our pipeline leverages this efficiency in a way that may be relevant for future research implementations in resource-constrained computational environments [86,87]. Prior work has noted that SqueezeNet-based models can achieve accuracy comparable to much larger networks while using fewer parameters [65]. This low computational footprint supports offline experimentation on commodity hardware, but the present study does not evaluate real-world point-of-care deployment.

4.4. Limitations and Considerations

Dataset and Generalization: The study used a modestly sized dataset (on the order of 6–7k images), which, although augmented synthetically, is small compared to large public CXR databases (e.g., 100k+ images in NIH ChestX-ray14) [88]. The inclusion of synthetically augmented images raises the concern that the model may partially learn artificial features or repetitive patterns not present in real-world data. This could inflate cross-validation performance while limiting true generalization. As with many medical imaging AI studies based on public datasets, shortcut learning and dataset-specific bias cannot be excluded. No independent external test was performed, so it remains unverified how the model would perform on completely unseen data from different hospitals or patient populations. Robust generalization is crucial, as models can otherwise falter when confronted with shifts in imaging protocols, disease prevalence, or patient demographics [89,90,91]. A major limitation of the present study is therefore the absence of independent external validation using datasets acquired under different institutional and technical conditions, and the reported performance should not be interpreted as evidence of broad clinical generalizability or deployment readiness. Calibration analysis was not performed, and predicted probabilities should therefore not be interpreted as calibrated estimates of disease likelihood. Future work should prioritize multicenter validation using heterogeneous datasets such as NIH ChestX-ray14 or CheXpert.
Class Definitions and Overlap: The nine classes in our dataset are broad categories (e.g., “obstructive pulmonary diseases” or “mediastinal changes”) that encompass various specific diagnoses. In practice, patients often have overlapping conditions, for instance, an obstructive disease (like COPD) with a superimposed infection could show features of both categories. Our model, being trained in a multi-class single-label setup based on the available dataset structure, assigns each image to one category and may not handle multi-label scenarios where multiple pathologies coexist. This constraint limits the realism of the present dataset-level classification task, since real chest X-rays frequently exhibit more than one abnormality [1]. A related point is that some categories have similar radiographic manifestations (for example, an “encapsulated lesion” vs. a certain type of density change), which could confuse the model in borderline cases. Indeed, the dimensionality reduction plots hinted at a degree of proximity between certain clusters, likely reflecting these overlaps. Although overall separability was high, a few misclassifications may have occurred between conceptually related diseases. Addressing multi-label classification and refining class definitions (perhaps aligning them with standard radiologic categories) would improve the model’s realism.
Explainability Scope: While the pipeline provides Grad-CAM overlays at the final convolutional layer, these saliency maps remain post hoc and qualitative. They do not guarantee causal faithfulness and can be sensitive to preprocessing and class priors. Without validated explanations, the model still behaves largely as a “black box”: a clinician sees the output (e.g., “encapsulated lesion”) without a verified rationale for the evidence behind it [92]. Such opacity can hinder trust and adoption, as clinicians may be reluctant to act on AI output without understanding the basis for the decision. As noted in the recent literature, a lack of transparency in deep learning models makes it difficult to predict failures or to anticipate how a model will generalize to different imaging hardware or patient populations [93,94]. Explainability methods such as saliency maps and Grad-CAM should therefore be evaluated further so that the model can reliably point out, for instance, the region of a nodule or opacity that led to a particular classification [95]. Prospective reader-study evaluation and failure-mode analysis are needed to verify that highlighted regions consistently correspond to clinically meaningful findings across scanners and sites; exploring concept-based and counterfactual explanations would further strengthen trust and auditability.
Potential Bias and Spectrum of Disease: The dataset’s composition and synthetic augmentation could introduce bias. If certain classes (say, “normal” or common diseases) dominate, the model might be overly attuned to those at the expense of rarer conditions [96]. Moreover, synthetic augmentation (e.g., rotations and flips) may not fully capture the diversity of how diseases appear; important variations (such as different patient positions or image noise patterns) might still be underrepresented. These factors could limit the model’s performance on atypical presentations. Ongoing evaluation against a wide range of cases, including edge cases and subtle presentations, is necessary. Additionally, the model currently focuses only on detection/classification of the given categories; it does not quantify disease extent or severity, which could be important for clinical decision-making (for example, mild vs. severe cases of a disease are treated differently). Adding this granularity is another possible extension for future studies.

Taken together, these limitations mean that prospective multicenter validation, including in low- and middle-income country (LMIC) settings, is essential to assess calibration, usability, and sustainability under real-world constraints. Accordingly, the present work should be interpreted primarily as a methodological proof-of-concept rather than a clinically validated diagnostic system.
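As an illustration of the calibration analysis noted above as missing, one common summary is the top-label expected calibration error (ECE). The sketch below is not part of the study; the function name, the equal-width binning, and the choice of 10 bins are assumptions made for the example.

```python
import numpy as np


def expected_calibration_error(probs, labels, n_bins=10):
    """Top-label ECE: weighted mean |accuracy - confidence| over equal-width bins."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    confidence = probs.max(axis=1)                    # probability of the predicted class
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidence > lo) & (confidence <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidence[in_bin].mean())
            ece += in_bin.mean() * gap                # weight by the fraction of samples in the bin
    return float(ece)


# Two fully confident predictions, one of them wrong: accuracy 0.5 vs. confidence 1.0.
print(expected_calibration_error([[1.0, 0.0], [1.0, 0.0]], [0, 1]))
```

In practice the probabilities would come from held-out folds (for example, the `predict_proba` output of the MLP), never from the data used for fitting.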

4.5. Feature-Space Visualization and Separability

Dimensionality reduction techniques (t-SNE, MDS) were applied to the model’s high-dimensional feature embeddings to assess how well the learned features separate the different lung conditions. These visualizations provided qualitative evidence of feature separability. In the t-SNE plot, for instance, images clustered into distinct groups that corresponded closely with their labeled disease categories, indicating that the model’s internal representations of, say, an obstructive lung disease vs. an infectious disease are visibly separated in the projected feature space. The normal chest X-rays formed a tight cluster that was well isolated from all clusters of pathological cases, a reassuring result that suggests the model has learned a clear boundary between healthy and diseased lungs within this dataset. Clusters for diseases that share certain radiographic characteristics were found in relative proximity, yet remained largely distinguishable. For example, the radiographs labeled as obstructive inflammatory processes appeared near the obstructive pulmonary disease cluster, reflecting their related pathological nature, but the two groups did not significantly intermingle; the model appears capable of distinguishing certain dataset-specific feature patterns associated with each category. Similarly, degenerative infectious disease cases clustered apart from acute encapsulated lesions, despite both possibly presenting as localized opacities, suggesting that features such as border definition or surrounding tissue reaction may have been captured by the network. The use of two complementary projection methods (t-SNE, which emphasizes local neighborhoods, and MDS, which preserves global distances) yielded congruent insights, strengthening our confidence that the class separability is not an artifact of any one visualization technique.
These plots are consistent with the model’s high classification performance and also serve as an interpretive bridge: they show that images of the same class congregate in feature space, which aligns with radiological intuition (similar pathology appears similar) and provides a qualitative sanity check for the model’s learning. In a qualitative sense, the 2D embeddings provide an exploratory visualization of how the extracted feature representations are organized, showing partial consistency between learned feature groupings and dataset-defined radiographic categories. Such visualization could be extended in the future as a tool for error analysis: any outlier images or overlaps identified in these plots might hint at either mislabeled data or truly challenging cases where even clinicians might disagree on the diagnosis.
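The two projections discussed above can be produced with scikit-learn as in the following minimal sketch. The feature matrix here is a synthetic nine-cluster stand-in for the 512-dimensional embeddings, and the perplexity, initialization, and seed are illustrative assumptions rather than the settings used in the study.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.manifold import MDS, TSNE


def project_embeddings(X, seed=0):
    """Return 2D t-SNE and metric-MDS projections of the same feature matrix."""
    tsne_xy = TSNE(n_components=2, perplexity=15, init="pca",
                   random_state=seed).fit_transform(X)            # emphasizes local neighborhoods
    mds_xy = MDS(n_components=2, random_state=seed).fit_transform(X)  # preserves pairwise distances
    return tsne_xy, mds_xy


# Synthetic stand-in: 180 samples in 512 dimensions drawn around nine centers.
X_demo, y_demo = make_blobs(n_samples=180, n_features=512, centers=9, random_state=0)
tsne_xy, mds_xy = project_embeddings(X_demo)
print(tsne_xy.shape, mds_xy.shape)
```

Coloring each scatter plot by `y_demo` (or by the dataset labels in a real run) is what makes cluster separation or overlap visible. Distances between t-SNE clusters are not quantitatively meaningful, which is one reason to read the MDS plot alongside it.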

4.6. Implications for Future Methodological Development

The present results suggest that lightweight SqueezeNet-based feature extraction combined with a simple MLP classifier may be a useful methodological framework for exploratory image-level classification of public chest radiograph datasets. However, the current study does not support claims of clinical deployment, diagnostic use, or triage capability. Before any workflow-oriented application could be considered, future studies would need to include externally validated datasets, patient-level separation, independent radiological label verification, calibration analysis, prospective reader studies, and evaluation across heterogeneous imaging devices and clinical populations [1].

For any eventual clinical adoption, further validation and development would be required. First, rigorous external validation on data from other hospitals and patient populations is needed to establish whether the model generalizes and maintains its performance. This might involve collaborating with institutions to test the model on thousands of new images, including edge cases. Prospective clinical trials could then measure the model’s impact on diagnostic workflow, comparing radiologist performance with and without AI assistance (as was done in some recent investigations) [97,98]. Second, regulatory approval processes (such as FDA clearance in the US) require demonstrating the model’s safety and effectiveness. Key to this process will be showing consistent performance across diverse settings and the absence of harmful biases. It should be verified, for example, that the model performs equally well on different demographic groups and does not systematically over- or under-diagnose certain populations, an area of active concern in medical AI [99]. Third, enhancing the explainability of the model is an important research direction. Extending and validating saliency-based methods that highlight the image regions supporting a given classification would move the tool from a black box toward a more transparent assistant. Radiologists are more likely to trust and adopt an AI that can not only output “this X-ray is classified as mediastinal abnormality” but also indicate where and why, for instance, by outlining an enlarged mediastinum on the X-ray suggesting lymphadenopathy [94,100]. This kind of visual explanation aligns the AI’s reasoning with clinical reasoning and could also help in cases where the AI might be incorrect, allowing the user to recognize if the highlighted features are irrelevant. Additionally, future work should explore multi-label classification so that the model can handle co-existing conditions (e.g., an X-ray with both a lesion and an effusion could trigger two labels) [101].
Multi-label outputs would mirror radiologists’ reporting, where multiple findings can be noted on one exam. Techniques like anomaly detection could also be incorporated for identifying cases that do not fit any of the learned categories (for example, an unforeseen pathology not in the training set), prompting a fallback to human review. Finally, expanding the training dataset with more images (potentially by pooling other public datasets or using data from different sources such as the NIH ChestX-ray14 or CheXpert databases) would likely further improve robustness. With more data, the model could learn a wider variety of presentations and avoid overfitting to the quirks of the current dataset. Data augmentation strategies could be made more sophisticated (e.g., using generative adversarial networks to create realistic variations in rare disease appearances) to enrich the training diversity without needing thousands of real new cases. Collaboration with clinicians to iteratively refine the model, for example, having radiologists review false positives/negatives to understand failure modes, will be valuable. This iterative loop can guide adjustments to the model or the decision thresholds to optimize clinical usefulness (perhaps favoring higher sensitivity in critical screening contexts, for instance).
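The fallback to human review mentioned above can be illustrated with a simple reject option: a prediction is returned only when the top class probability clears a threshold. This is a deliberately simplified stand-in for a real anomaly detector, and the function name, the threshold of 0.80, and the class names are hypothetical.

```python
import numpy as np

DEFER = "defer_to_human_review"  # hypothetical sentinel label


def predict_or_defer(probs, class_names, min_confidence=0.80):
    """Return the top class per image, or DEFER when the top probability is too low."""
    probs = np.asarray(probs, dtype=float)
    top = probs.argmax(axis=1)
    confidence = probs.max(axis=1)
    return [class_names[i] if c >= min_confidence else DEFER
            for i, c in zip(top, confidence)]


demo_probs = [[0.95, 0.05], [0.55, 0.45]]
print(predict_or_defer(demo_probs, ["normal", "abnormal"]))
# → ['normal', 'defer_to_human_review']
```

Because uncalibrated networks can be confidently wrong on unfamiliar inputs, a threshold of this kind is only meaningful once the probabilities have been checked for calibration.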

Combined with offline operation and Grad-CAM-based visual explanations, the lightweight inference tool illustrates a low-computational-footprint research prototype for exploratory image-level inference. Its potential relevance to resource-constrained settings should be assessed only in future externally validated and prospectively designed studies.
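For readers unfamiliar with Grad-CAM, the core computation is small once the final-layer activations and the gradients of the target class score with respect to them are available. The sketch below shows only that step in NumPy, under the assumption that both arrays have already been extracted from the network; it does not reproduce the prototype's implementation.

```python
import numpy as np


def grad_cam_map(activations, gradients):
    """Grad-CAM heatmap from arrays of shape (channels, height, width) for one image."""
    activations = np.asarray(activations, dtype=float)
    gradients = np.asarray(gradients, dtype=float)
    weights = gradients.mean(axis=(1, 2))              # per-channel weight: spatially averaged gradient
    cam = np.tensordot(weights, activations, axes=1)   # weighted sum over channels -> (height, width)
    cam = np.maximum(cam, 0.0)                         # ReLU keeps only positive evidence
    peak = cam.max()
    return cam / peak if peak > 0 else cam             # scale to [0, 1] for overlaying


# Toy input: channel 0 supports the class at the top-left, channel 1 opposes it.
acts = np.zeros((2, 2, 2))
acts[0, 0, 0] = 1.0
acts[1, 1, 1] = 1.0
grads = np.stack([np.ones((2, 2)), -np.ones((2, 2))])
print(grad_cam_map(acts, grads))
```

The resulting map has the spatial resolution of the final convolutional layer and is normally upsampled to the input size before being overlaid on the radiograph.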

5. Conclusions

In conclusion, this study presents an exploratory image-level classification pipeline based on SqueezeNet-derived embeddings and an MLP classifier for a public nine-class chest radiograph dataset. The model showed strong internal cross-validated performance within the evaluated dataset, and the accompanying offline prototype illustrates how low-computational-footprint inference and visual explanation could be implemented in a research-oriented desktop environment. However, the use of synthetically augmented data, limited dataset provenance, absence of patient-level identifiers, lack of independent label verification, and absence of external validation substantially limit interpretation. These findings should therefore be considered methodological and hypothesis-generating, and they do not establish clinical generalizability, diagnostic reliability, triage capability, or deployment readiness.

Author Contributions

Conceptualization, L.R. and R.A.; methodology, L.R. and R.A.; software, L.R. and R.A.; validation, L.R. and R.A.; formal analysis, L.R. and R.A.; investigation, L.R. and R.A.; resources, L.R. and R.A.; data curation, L.R., V.O., R.Q. and R.A.; writing—original draft preparation, L.R. and R.A.; writing—review and editing, L.R., V.O., R.Q. and R.A.; visualization, L.R. and R.A.; supervision, L.R. and R.A.; project administration, L.R. and R.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

The images are free of patient-identifying information, and no interventional studies on humans or animals were performed. Consequently, no institutional review board approval was required.

Data Availability Statement

Data available at https://www.kaggle.com/datasets/fernando2rad/x-ray-lung-diseases-images-9-classes, accessed on 10 January 2026.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

CXR	Chest X-ray (chest radiography)
CAD	Computer-aided diagnosis
TB	Tuberculosis
AI	Artificial intelligence
ML	Machine learning
CNN	Convolutional neural network
AUC	Area under the curve
COPD	Chronic obstructive pulmonary disease
MLP	Multi-layer perceptron
ReLU	Rectified linear unit
t-SNE	t-distributed stochastic neighbor embedding
MDS	Multidimensional scaling

References

Anderson, P.G.; Tarder-Stoll, H.; Alpaslan, M.; Keathley, N.; Levin, D.L.; Venkatesh, S.; Bartel, E.; Sicular, S.; Howell, S.; Lindsey, R.V.; et al. Deep Learning Improves Physician Accuracy in the Comprehensive Detection of Abnormalities on Chest X-Rays. Sci. Rep. 2024, 14, 25151.
Jones, C.M.; Buchlak, Q.D.; Oakden-Rayner, L.; Milne, M.; Seah, J.; Esmaili, N.; Hachey, B. Chest Radiographs and Machine Learning—Past, Present and Future. J. Med. Imaging Radiat. Oncol. 2021, 65, 538–544.
Tárnoki, D.L.; Karlinger, K.; Ridge, C.A.; Kiss, F.J.; Györke, T.; Grabczak, E.M.; Tárnoki, Á.D. Lung Imaging Methods: Indications, Strengths and Limitations. Breathe 2024, 20, 230127.
Kumar, S.; Kumar, H.; Kumar, G.; Singh, S.P.; Bijalwan, A.; Diwakar, M. A Methodical Exploration of Imaging Modalities from Dataset to Detection through Machine Learning Paradigms in Prominent Lung Disease Diagnosis: A Review. BMC Med. Imaging 2024, 24, 30.
Ait Nasser, A.; Akhloufi, M.A. A Review of Recent Advances in Deep Learning Models for Chest Disease Detection Using Radiography. Diagnostics 2023, 13, 159.
WHO. Pneumonia in Children. Available online: https://www.who.int/news-room/fact-sheets/detail/pneumonia (accessed on 11 August 2025).
Alapat, D.J.; Menon, M.V.; Ashok, S. A Review on Detection of Pneumonia in Chest X-Ray Images Using Neural Networks. J. Biomed. Phys. Eng. 2022, 12, 551–558.
Qin, Z.Z.; Ahmed, S.; Sarker, M.S.; Paul, K.; Adel, A.S.S.; Naheyan, T.; Barrett, R.; Banu, S.; Creswell, J. Tuberculosis Detection from Chest X-Rays for Triaging in a High Tuberculosis-Burden Setting: An Evaluation of Five Artificial Intelligence Algorithms. Lancet Digit. Health 2021, 3, e543–e554.
Sathitratanacheewin, S.; Sunanta, P.; Pongpirul, K. Deep Learning for Automated Classification of Tuberculosis-Related Chest X-Ray: Dataset Distribution Shift Limits Diagnostic Performance Generalizability. Heliyon 2020, 6, e04614.
Malvezzi, M.; Carioli, G.; Bertuccio, P.; Boffetta, P.; Levi, F.; La Vecchia, C.; Negri, E. European Cancer Mortality Predictions for the Year 2017, with Focus on Lung Cancer. Ann. Oncol. 2017, 28, 1117–1123.
Simba, J.M.; Irungu, A.; Otido, S.; Tumwa, D.; Mugane, S.; Musigula, R.; Andai, D.; Atieno, F.; Nyambura, M.; Mburugu, P. Preventable Deaths from Respiratory Diseases in Children in Low- and Middle-Income Countries. In Inequalities in Respiratory Health; Simba, J.M., Irungu, A., Otido, S., Tumwa, D., Mugane, S., Musigula, R., Andai, D., Atieno, F., Nyambura, M., Mburugu, P., Eds.; European Respiratory Society: Lausanne, Switzerland, 2023.
Gefter, W.B.; Post, B.A.; Hatabu, H. Commonly Missed Findings on Chest Radiographs. Chest 2023, 163, 650–661.
Irmici, G.; Cè, M.; Caloro, E.; Khenkina, N.; Della Pepa, G.; Ascenti, V.; Martinenghi, C.; Papa, S.; Oliva, G.; Cellina, M. Chest X-Ray in Emergency Radiology: What Artificial Intelligence Applications Are Available? Diagnostics 2023, 13, 216.
Pesapane, F.; Gnocchi, G.; Quarrella, C.; Sorce, A.; Nicosia, L.; Mariano, L.; Bozzini, A.C.; Marinucci, I.; Priolo, F.; Abbate, F.; et al. Errors in Radiology: A Standard Review. J. Clin. Med. 2024, 13, 4306.
Krupinski, E.A.; Berbaum, K.S.; Caldwell, R.T.; Schartz, K.M.; Kim, J. Long Radiology Workdays Reduce Detection and Accommodation Accuracy. J. Am. Coll. Radiol. 2010, 7, 698–704.
Siewert, B.; Bruno, M.A.; Bourland, J.D.; Slanetz, P.J.; Guillerman, P.; Schwartz, E.S.; Paltiel, H.J.; Hublall, R.; Brook, O.R.; Scanlon, M.H.; et al. Seven Challenges in Radiology Practice: From Declining Reimbursement to Inadequate Labor Force: Summary of the 2023 ACR Intersociety Meeting. J. Am. Coll. Radiol. 2025, 22, 129–138.
Do, K.-H.; Beck, K.S.; Lee, J.M. The Growing Problem of Radiologist Shortages: Korean Perspective. Korean J. Radiol. 2023, 24, 1173.
Hinrichs-Krapels, S.; Tombo, L.; Boulding, H.; Majonga, E.D.; Cummins, C.; Manaseki-Holland, S. Barriers and Facilitators for the Provision of Radiology Services in Zimbabwe: A Qualitative Study Based on Staff Experiences and Observations. PLoS Glob. Public Health 2023, 3, e0001796.
Kaviani, P.; Digumarthy, S.R.; Bizzo, B.C.; Reddy, B.; Tadepalli, M.; Putha, P.; Jagirdar, A.; Ebrahimian, S.; Kalra, M.K.; Dreyer, K.J. Performance of a Chest Radiography AI Algorithm for Detection of Missed or Mislabeled Findings: A Multicenter Study. Diagnostics 2022, 12, 2086.
Gefter, W.B.; Hatabu, H. Reducing Errors Resulting From Commonly Missed Chest Radiography Findings. Chest 2023, 163, 634–649.
Neeraja, R.; Anbarasi, L.J. A Critical Review of Artificial Intelligence Based Techniques for Automatic Prediction of Cephalometric Landmarks. Artif. Intell. Rev. 2025, 58, 148.
Zhang, L.; Wen, X.; Li, J.-W.; Jiang, X.; Yang, X.-F.; Li, M. Diagnostic Error and Bias in the Department of Radiology: A Pictorial Essay. Insights Imaging 2023, 14, 163.
Turkington, P.M.; Kennan, N.; Greenstone, M.A. Misinterpretation of the Chest x Ray as a Factor in the Delayed Diagnosis of Lung Cancer. Postgrad. Med. J. 2002, 78, 158–160.
Raffel, K.E.; Kantor, M.A.; Barish, P.; Esmaili, A.; Lim, H.; Xue, F.; Ranji, S.R. Prevalence and Characterisation of Diagnostic Error among 7-Day All-Cause Hospital Medicine Readmissions: A Retrospective Cohort Study. BMJ Qual. Saf. 2020, 29, 971–979.
Zwaan, L.; Singh, H. Diagnostic Error in Hospitals: Finding Forests Not Just the Big Trees. BMJ Qual. Saf. 2020, 29, 961–964.
Hames, K.; Patlas, M.N.; Mellnick, V.M.; Katz, D.S. Errors in Emergency and Trauma Radiology: General Principles. In Errors in Emergency and Trauma Radiology; Patlas, M.N., Katz, D.S., Scaglione, M., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 1–16.
Avni, U.; Greenspan, H.; Konen, E.; Sharon, M.; Goldberger, J. X-Ray Categorization and Retrieval on the Organ and Pathology Level, Using Patch-Based Visual Words. IEEE Trans. Med. Imaging 2011, 30, 733–746.
Jaeger, S.; Karargyris, A.; Candemir, S.; Folio, L.; Siegelman, J.; Callaghan, F.; Xue, Z.; Palaniappan, K.; Singh, R.K.; Antani, S.; et al. Automatic Tuberculosis Screening Using Chest Radiographs. IEEE Trans. Med. Imaging 2014, 33, 233–245.
Sethy, P.K.; Behera, S.K.; Anitha, K.; Pandey, C.; Khan, M.R. Computer Aid Screening of COVID-19 Using X-Ray and CT Scan Images: An Inner Comparison. J. X-Ray Sci. Technol. 2021, 29, 197–210.
Parthasarathy, V.; Saravanan, S. Computer Aided Diagnosis Using Harris Hawks Optimizer with Deep Learning for Pneumonia Detection on Chest X-Ray Images. Int. J. Inf. Technol. 2024, 16, 1677–1683.
Crowder, R.; Thangakunam, B.; Andama, A.; Christopher, D.J.; Dalay, V.; Nwamba, W.; Kik, S.V.; Van Nguyen, D.; Nguyen, N.V.; Phillips, P.P.J.; et al. Diagnostic Accuracy of Tuberculosis Screening Tests in a Prospective Multinational Cohort: Chest Radiography With Computer-Aided Detection, Xpert Tuberculosis Host Response, and C-Reactive Protein. Clin. Infect. Dis. 2024, 82, e239–e247.
Guesmi, T. Detecting Pneumonia with a Deep Learning Model and Random Data Augmentation Techniques. Int. J. Adv. Comput. Sci. Appl. 2023, 14, 1187–1196.
Moses, D.A. Deep Learning Applied to Automatic Disease Detection Using Chest X-rays. J. Med. Imaging Radiat. Oncol. 2021, 65, 498–517.
Urooj, S.; Suchitra, S.; Krishnasamy, L.; Sharma, N.; Pathak, N. Stochastic Learning-Based Artificial Neural Network Model for an Automatic Tuberculosis Detection System Using Chest X-Ray Images. IEEE Access 2022, 10, 103632–103643.
Sasikaladevi, N.; Revathi, A. Hypergraph Convolutional Neural Network for Fast and Accurate Diagnosis (FAT) of COVID From X-Ray Images. Int. J. Pattern Recognit. Artif. Intell. 2022, 36, 2257005.
Albahli, S.; Yar, G.N.A.H. Fast and Accurate Detection of COVID-19 Along With 14 Other Chest Pathologies Using a Multi-Level Classification: Algorithm Development and Validation Study. J. Med. Internet Res. 2021, 23, e23693.
Rajpurkar, P.; Irvin, J.; Zhu, K.; Yang, B.; Mehta, H.; Duan, T.; Ding, D.; Bagul, A.; Langlotz, C.; Shpanskaya, K.; et al. CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning. arXiv 2017, arXiv:1711.05225.
Kundu, R.; Das, R.; Geem, Z.W.; Han, G.-T.; Sarkar, R. Pneumonia Detection in Chest X-Ray Images Using an Ensemble of Deep Learning Models. PLoS ONE 2021, 16, e0256630.
Cha, M.J.; Chung, M.J.; Lee, J.H.; Lee, K.S. Performance of Deep Learning Model in Detecting Operable Lung Cancer With Chest Radiographs. J. Thorac. Imaging 2019, 34, 86–91.
Thamilarasi, V.; Roselin, R. Automatic Classification and Accuracy by Deep Learning Using CNN Methods in Lung Chest X-Ray Images. IOP Conf. Ser. Mater. Sci. Eng. 2021, 1055, 012099.
Pesce, E.; Joseph Withey, S.; Ypsilantis, P.-P.; Bakewell, R.; Goh, V.; Montana, G. Learning to Detect Chest Radiographs Containing Pulmonary Lesions Using Visual Attention Networks. Med. Image Anal. 2019, 53, 26–38.
Iqbal, A.; Usman, M.; Ahmed, Z. An Efficient Deep Learning-Based Framework for Tuberculosis Detection Using Chest X-Ray Images. Tuberculosis 2022, 136, 102234.
Cicero, M.; Bilbily, A.; Colak, E.; Dowdell, T.; Gray, B.; Perampaladas, K.; Barfett, J. Training and Validating a Deep Convolutional Neural Network for Computer-Aided Detection and Classification of Abnormalities on Frontal Chest Radiographs. Investig. Radiol. 2017, 52, 281–287.
Rajpurkar, P.; Irvin, J.; Ball, R.L.; Zhu, K.; Yang, B.; Mehta, H.; Duan, T.; Ding, D.; Bagul, A.; Langlotz, C.P.; et al. Deep Learning for Chest Radiograph Diagnosis: A Retrospective Comparison of the CheXNeXt Algorithm to Practicing Radiologists. PLoS Med. 2018, 15, e1002686.
Song, L.; Sun, H.; Xiao, H.; Lam, S.K.; Zhan, Y.; Ren, G.; Cai, J. Artificial Intelligence for Chest X-Ray Image Enhancement. Radiat. Med. Prot. 2025, 6, 61–68.
Geric, C.; Qin, Z.Z.; Denkinger, C.M.; Kik, S.V.; Marais, B.; Anjos, A.; David, P.-M.; Ahmad Khan, F.; Trajman, A. The Rise of Artificial Intelligence Reading of Chest X-Rays for Enhanced TB Diagnosis and Elimination. Int. J. Tuberc. Lung Dis. 2023, 27, 367–372.
Najjar, R. Redefining Radiology: A Review of Artificial Intelligence Integration in Medical Imaging. Diagnostics 2023, 13, 2760.
Islam, M.T.; Aowal, M.A.; Minhaz, A.T.; Ashraf, K. Abnormality Detection and Localization in Chest X-Rays Using Deep Convolutional Neural Networks. arXiv 2017, arXiv:1705.09850.
Kvak, D.; Chromcová, A.; Biroš, M.; Hrubý, R.; Kvaková, K.; Pajdaković, M.; Ovesná, P. Chest X-Ray Abnormality Detection by Using Artificial Intelligence: A Single-Site Retrospective Study of Deep Learning Model Performance. BioMedInformatics 2023, 3, 82–101.
Aldamani, R.; Abuhani, D.A.; Shanableh, T. LungVision: X-Ray Imagery Classification for On-Edge Diagnosis Applications. Algorithms 2024, 17, 280.
Lusiana; Kurniasari, A.A.; Kusumawati, I.F. Enhancing Accuracy in the Detection of Pneumonia in Adult Patients: An Approach by Using Convolutional Neural Networks. In 2020 International Conference on E-Health and Bioengineering (EHB); IEEE: Piscataway, NJ, USA, 2024; pp. 593–609.
Cococi, A.; Felea, I.; Armanda, D.; Dogaru, R. Pneumonia Detection on Chest X-Ray Images Using Convolutional Neural Networks Designed for Resource Constrained Environments. In Proceedings of the 2020 International Conference on E-Health and Bioengineering (EHB), Iasi, Romania, 29–30 October 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–4.
Guan, Q.; Huang, Y. Multi-Label Chest X-Ray Image Classification via Category-Wise Residual Attention Learning. Pattern Recognit. Lett. 2020, 130, 259–266.
Chen, Y.; Wan, Y.; Pan, F. Enhancing Multi-Disease Diagnosis of Chest X-Rays with Advanced Deep-Learning Networks in Real-World Data. J. Digit. Imaging 2023, 36, 1332–1347. [Google Scholar] [CrossRef]
Ahmad, H.K.; Milne, M.R.; Buchlak, Q.D.; Ektas, N.; Sanderson, G.; Chamtie, H.; Karunasena, S.; Chiang, J.; Holt, X.; Tang, C.H.M.; et al. Machine Learning Augmented Interpretation of Chest X-Rays: A Systematic Review. Diagnostics 2023, 13, 743. [Google Scholar] [CrossRef]
Ajmera, P.; Onkar, P.; Desai, S.; Pant, R.; Seth, J.; Gupte, T.; Kulkarni, V.; Kharat, A.; Passi, N.; Khaladkar, S.; et al. Validation of a Deep Learning Model for Detecting Chest Pathologies from Digital Chest Radiographs. Diagnostics 2023, 13, 557. [Google Scholar] [CrossRef]
Feltrin, F. X-Ray Lung Diseases Images (9 Classes). Available online: https://www.kaggle.com/datasets/fernando2rad/x-ray-lung-diseases-images-9-classes (accessed on 11 August 2025).
Kumar, R.; Pan, C.-T.; Lin, Y.-M.; Yow-Ling, S.; Chung, T.-S.; Janesha, U.G.S. Enhanced Multi-Model Deep Learning for Rapid and Precise Diagnosis of Pulmonary Diseases Using Chest X-Ray Imaging. Diagnostics 2025, 15, 248. [Google Scholar] [CrossRef]
Sanida, M.V.; Sanida, T.; Sideris, A.; Dasygenis, M. An Advanced Deep Learning Framework for Multi-Class Diagnosis from Chest X-Ray Images. J 2024, 7, 48–71. [Google Scholar] [CrossRef]
Sufian, M.A.; Hamzi, W.; Sharifi, T.; Zaman, S.; Alsadder, L.; Lee, E.; Hakim, A.; Hamzi, B. AI-Driven Thoracic X-Ray Diagnostics: Transformative Transfer Learning for Clinical Validation in Pulmonary Radiography. J. Pers. Med. 2024, 14, 856. [Google Scholar] [CrossRef]
Garrity, E.R.; Boettcher, H.; Gabbay, E. Donor Infection: An Opinion on Lung Donor Utilization. J. Heart Lung Transplant. 2005, 24, 791–797. [Google Scholar] [CrossRef]
Ram, S.; Verleden, S.E.; Kumar, M.; Bell, A.J.; Pal, R.; Ordies, S.; Vanstapel, A.; Dubbeldam, A.; Vos, R.; Galban, S.; et al. CT-Based Machine Learning for Donor Lung Screening Prior to Transplantation. medRxiv 2023. [Google Scholar] [CrossRef]
Karalis, V.D. The Integration of Artificial Intelligence into Clinical Practice. Appl. Biosci. 2024, 3, 14–44. [Google Scholar] [CrossRef]
Lakhani, P.; Sundaram, B. Deep Learning at Chest Radiography: Automated Classification of Pulmonary Tuberculosis by Using Convolutional Neural Networks. Radiology 2017, 284, 574–582. [Google Scholar] [CrossRef]
Maiti, A.; Abarda, A.; Hanini, M.; Oussous, A. An Optimal Model Combining SqueezeNet and Machine Learning Methods for Lung Disease Diagnosis. Curr. Med. Imaging 2024, 20, e15734056258742. [Google Scholar] [CrossRef]
Baltruschat, I.M.; Nickisch, H.; Grass, M.; Knopp, T.; Saalbach, A. Comparison of Deep Learning Approaches for Multi-Label Chest X-Ray Classification. Sci. Rep. 2019, 9, 6381. [Google Scholar] [CrossRef]
Chouhan, V.; Singh, S.K.; Khamparia, A.; Gupta, D.; Tiwari, P.; Moreira, C.; Damaševičius, R.; de Albuquerque, V.H.C. A Novel Transfer Learning Based Approach for Pneumonia Detection in Chest X-Ray Images. Appl. Sci. 2020, 10, 559. [Google Scholar] [CrossRef]
Kim, C.; Yang, Z.; Park, S.H.; Hwang, S.H.; Oh, Y.-W.; Kang, E.-Y.; Yong, H.S. Multicentre External Validation of a Commercial Artificial Intelligence Software to Analyse Chest Radiographs in Health Screening Environments with Low Disease Prevalence. Eur. Radiol. 2023, 33, 3501–3509. [Google Scholar] [CrossRef]
Nam, J.G.; Park, S.; Hwang, E.J.; Lee, J.H.; Jin, K.-N.; Lim, K.Y.; Vu, T.H.; Sohn, J.H.; Hwang, S.; Goo, J.M.; et al. Development and Validation of Deep Learning–Based Automatic Detection Algorithm for Malignant Pulmonary Nodules on Chest Radiographs. Radiology 2019, 290, 218–228. [Google Scholar] [CrossRef]
Hwang, E.J.; Nam, J.G.; Lim, W.H.; Park, S.J.; Jeong, Y.S.; Kang, J.H.; Hong, E.K.; Kim, T.M.; Goo, J.M.; Park, S.; et al. Deep Learning for Chest Radiograph Diagnosis in the Emergency Department. Radiology 2019, 293, 573–580. [Google Scholar] [CrossRef]
Homayounieh, F.; Digumarthy, S.; Ebrahimian, S.; Rueckel, J.; Hoppe, B.F.; Sabel, B.O.; Conjeti, S.; Ridder, K.; Sistermanns, M.; Wang, L.; et al. An Artificial Intelligence–Based Chest X-Ray Model on Human Nodule Detection Accuracy From a Multicenter Study. JAMA Netw. Open 2021, 4, e2141096. [Google Scholar] [CrossRef]
Seah, J.C.Y.; Tang, C.H.M.; Buchlak, Q.D.; Holt, X.G.; Wardman, J.B.; Aimoldin, A.; Esmaili, N.; Ahmad, H.; Pham, H.; Lambert, J.F.; et al. Effect of a Comprehensive Deep-Learning Model on the Accuracy of Chest x-Ray Interpretation by Radiologists: A Retrospective, Multireader Multicase Study. Lancet Digit. Health 2021, 3, e496–e506. [Google Scholar] [CrossRef]
Mahamud, E.; Fahad, N.; Assaduzzaman, M.; Zain, S.M.; Goh, K.O.M.; Morol, M.K. An Explainable Artificial Intelligence Model for Multiple Lung Diseases Classification from Chest X-Ray Images Using Fine-Tuned Transfer Learning. Decis. Anal. J. 2024, 12, 100499. [Google Scholar] [CrossRef]
Javed, H.; El-Sappagh, S.; Abuhmed, T. Robustness in Deep Learning Models for Medical Diagnostics: Security and Adversarial Challenges towards Robust AI Applications. Artif. Intell. Rev. 2024, 58, 12. [Google Scholar] [CrossRef]
Koçak, B.; Ponsiglione, A.; Stanzione, A.; Bluethgen, C.; Santinha, J.; Ugga, L.; Huisman, M.; Klontzas, M.E.; Cannella, R.; Cuocolo, R. Bias in Artificial Intelligence for Medical Imaging: Fundamentals, Detection, Avoidance, Mitigation, Challenges, Ethics, and Prospects. Diagn. Interv. Radiol. 2024, 31, 75–88. [Google Scholar] [CrossRef]
Sadeghi, Z.; Alizadehsani, R.; Cifci, M.A.; Kausar, S.; Rehman, R.; Mahanta, P.; Bora, P.K.; Almasri, A.; Alkhawaldeh, R.S.; Hussain, S.; et al. A Review of Explainable Artificial Intelligence in Healthcare. Comput. Electr. Eng. 2024, 118, 109370. [Google Scholar] [CrossRef]
Erickson, B.J.; Korfiatis, P.; Akkus, Z.; Kline, T.L. Machine Learning for Medical Imaging. RadioGraphics 2017, 37, 505–515. [Google Scholar] [CrossRef]
Çallı, E.; Sogancioglu, E.; van Ginneken, B.; van Leeuwen, K.G.; Murphy, K. Deep Learning for Chest X-Ray Analysis: A Survey. Med. Image Anal. 2021, 72, 102125. [Google Scholar] [CrossRef]
Brady, A.P. Error and Discrepancy in Radiology: Inevitable or Avoidable? Insights Imaging 2017, 8, 171–182. [Google Scholar] [CrossRef]
McLauchlan, C.A.J.; Jones, K.; Guly, H.R. Interpretation of Trauma Radiographs by Junior Doctors in Accident and Emergency Departments: A Cause for Concern? Emerg. Med. J. 1997, 14, 295–298. [Google Scholar] [CrossRef]
Jiang, Y.; Ebrahimpour, L.; Després, P.; Manem, V.S.K. A Benchmark of Deep Learning Approaches to Predict Lung Cancer Risk Using National Lung Screening Trial Cohort. Sci. Rep. 2025, 15, 1736. [Google Scholar] [CrossRef]
Ward, B.; Koziar Vašáková, M.; Robalo Cordeiro, C.; Yorgancioğlu, A.; Chorostowska-Wynimko, J.; Blum, T.G.; Kauczor, H.-U.; Samarzija, M.; Henschke, C.; Wheelock, C.; et al. Important Steps towards a Big Change for Lung Health: A Joint Approach by the European Respiratory Society, the European Society of Radiology and Their Partners to Facilitate Implementation of the European Union’s New Recommendations on Lung Cancer Screen. ERJ Open Res. 2023, 9, 00026–02023. [Google Scholar] [CrossRef]
Yang, X. Application and Prospects of Artificial Intelligence Technology in Early Screening of Chronic Obstructive Pulmonary Disease at Primary Healthcare Institutions in China. Int. J. Chronic Obstr. Pulm. Dis. 2024, 19, 1061–1067. [Google Scholar] [CrossRef]
Oliveira, F.H.M.; Machado, A.R.P.; Andrade, A.O. On the Use of t-Distributed Stochastic Neighbor Embedding for Data Visualization and Classification of Individuals with Parkinson’s Disease. Comput. Math. Methods Med. 2018, 2018, 8019232. [Google Scholar] [CrossRef]
Rodriguez-Conde, I.; Campos, C.; Fdez-Riverola, F. Optimized Convolutional Neural Network Architectures for Efficient On-Device Vision-Based Object Detection. Neural Comput. Appl. 2022, 34, 10469–10501. [Google Scholar] [CrossRef]
Beheshti, N.; Johnsson, L. Squeeze U-Net: A Memory and Energy Efficient Image Segmentation Network. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1495–1504. [Google Scholar]
Kordnoori, S.; Sabeti, M.; Mostafaei, H.; Seyed Agha Banihashemi, S. Advances in Medical Image Analysis: A Comprehensive Survey of Lung Infection Detection. IET Image Process. 2024, 18, 3750–3800. [Google Scholar] [CrossRef]
National Institutes of Health Chest X-Ray Dataset. Available online: https://www.kaggle.com/datasets/nih-chest-xrays/data (accessed on 11 August 2025).
Hengstler, M.; Enkel, E.; Duelli, S. Applied Artificial Intelligence and Trust—The Case of Autonomous Vehicles and Medical Assistance Devices. Technol. Forecast. Soc. Change 2016, 105, 105–120. [Google Scholar] [CrossRef]
Nundy, S.; Montgomery, T.; Wachter, R.M. Promoting Trust Between Patients and Physicians in the Era of Artificial Intelligence. JAMA 2019, 322, 497. [Google Scholar] [CrossRef]
Hosny, A.; Parmar, C.; Quackenbush, J.; Schwartz, L.H.; Aerts, H.J.W.L. Artificial Intelligence in Radiology. Nat. Rev. Cancer 2018, 18, 500–510. [Google Scholar] [CrossRef]
Kamakshi, V.; Krishnan, N.C. Explainable Image Classification: The Journey So Far and the Road Ahead. AI 2023, 4, 620–651. [Google Scholar] [CrossRef]
Dhar, T.; Dey, N.; Borra, S.; Sherratt, R.S. Challenges of Deep Learning in Medical Image Analysis—Improving Explainability and Trust. IEEE Trans. Technol. Soc. 2023, 4, 68–75. [Google Scholar] [CrossRef]
Marey, A.; Arjmand, P.; Alerab, A.D.S.; Eslami, M.J.; Saad, A.M.; Sanchez, N.; Umair, M. Explainability, Transparency and Black Box Challenges of AI in Radiology: Impact on Patient Care in Cardiovascular Radiology. Egypt. J. Radiol. Nucl. Med. 2024, 55, 183. [Google Scholar] [CrossRef]
Muhammad, D.; Bendechache, M. Unveiling the Black Box: A Systematic Review of Explainable Artificial Intelligence in Medical Image Analysis. Comput. Struct. Biotechnol. J. 2024, 24, 542–560. [Google Scholar] [CrossRef]
Pelosi, D.; Cacciagrano, D.; Piangerelli, M. Explainability and Interpretability in Concept and Data Drift: A Systematic Literature Review. Algorithms 2025, 18, 443. [Google Scholar] [CrossRef]
Pennestrì, F.; Cabitza, F.; Picerno, N.; Banfi, G. Sharing Reliable Information Worldwide: Healthcare Strategies Based on Artificial Intelligence Need External Validation. Position Paper. BMC Med. Inform. Decis. Mak. 2025, 25, 56. [Google Scholar] [CrossRef]
Sperrin, M.; Riley, R.D.; Collins, G.S.; Martin, G.P. Targeted Validation: Validating Clinical Prediction Models in Their Intended Population and Setting. Diagn. Progn. Res. 2022, 6, 24. [Google Scholar] [CrossRef]
Muralidharan, V.; Adewale, B.A.; Huang, C.J.; Nta, M.T.; Ademiju, P.O.; Pathmarajah, P.; Hang, M.K.; Adesanya, O.; Abdullateef, R.O.; Babatunde, A.O.; et al. A Scoping Review of Reporting Gaps in FDA-Approved AI Medical Devices. NPJ Digit. Med. 2024, 7, 273. [Google Scholar] [CrossRef]
Borys, K.; Schmitt, Y.A.; Nauta, M.; Seifert, C.; Krämer, N.; Friedrich, C.M.; Nensa, F. Explainable AI in Medical Imaging: An Overview for Clinical Practitioners—Beyond Saliency-Based XAI Approaches. Eur. J. Radiol. 2023, 162, 110786. [Google Scholar] [CrossRef]
Saarela, M.; Podgorelec, V. Recent Applications of Explainable AI (XAI): A Systematic Literature Review. Appl. Sci. 2024, 14, 8884. [Google Scholar] [CrossRef]

Figure 1. t-SNE plot of 512-dimensional feature embeddings.

Figure 2. Manifold-learning plot of 512-dimensional feature embeddings.

Figure 3. MDS plot of 512-dimensional feature embeddings.

Figure 4. Offline exploratory inference and explainability prototype. The desktop interface displays: (I) the original chest radiograph; (II) Grad-CAM visualizations computed at the SqueezeNet features.12 layer, with color-coded markers shown only for classes exceeding a user-defined probability threshold (the yellow line is the color-coded marker for the predicted abnormal class; it highlights the image region the model considered most relevant for that classification and should not be interpreted as a clinically validated lesion boundary); (III) per-class probability outputs for the nine dataset categories; and (IV) a fixed color legend and a CSV export function for auditability. The application runs fully offline on commodity hardware and is presented here as an exploratory research prototype, not as a clinically validated triage or diagnostic tool.
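
The threshold filter in panel (II) and the CSV export in panel (IV) can be illustrated with a minimal sketch. This is not the prototype's code: the function names `classes_above_threshold` and `to_csv`, the 0.5 threshold, the image identifier, and the example probabilities are all hypothetical; only the nine class names are taken from Table 1.

```python
import csv
import io

# Short class names as listed in Table 1 (classes 00 to 08).
CLASS_NAMES = [
    "Normal Anatomy",
    "Pulmonary Inflammatory Processes",
    "Higher Density",
    "Lower Density",
    "Obstructive Pulmonary Diseases",
    "Degenerative Infectious Diseases",
    "Encapsulated Lesions",
    "Mediastinal Changes",
    "Chest Changes",
]


def classes_above_threshold(probs, threshold):
    """Return (class_index, probability) pairs strictly above the threshold."""
    return [(i, p) for i, p in enumerate(probs) if p > threshold]


def to_csv(image_id, probs):
    """Serialize one image's per-class probabilities as CSV text (header + row)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["image_id"] + CLASS_NAMES)
    writer.writerow([image_id] + [f"{p:.4f}" for p in probs])
    return buffer.getvalue()


# Hypothetical probability vector for one image (illustrative values only).
example_probs = [0.02, 0.85, 0.05, 0.01, 0.03, 0.01, 0.01, 0.01, 0.01]
flagged = classes_above_threshold(example_probs, threshold=0.5)
csv_text = to_csv("example_001", example_probs)
print(flagged)
print(csv_text)
```

With a user-defined threshold, only the classes returned by the filter would receive a marker, while the CSV row keeps all nine probabilities for later audit.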

Table 1. Distribution of classes and corresponding number of cases.

Class Number	Class Name	Number of Cases
00	Normal Anatomy	1340
01	Pulmonary Inflammatory Processes (Pneumonia)	1060
02	Higher Density (Pleural Effusion, Atelectatic Consolidation, Hydrothorax, Empyema)	678
03	Lower Density (Pneumothorax, Pneumomediastinum, Pneumoperitoneum)	629
04	Obstructive Pulmonary Diseases (Emphysema, Bronchopneumonia, Bronchiectasis, Embolism)	644
05	Degenerative Infectious Diseases (Tuberculosis, Sarcoidosis, Proteinosis, Fibrosis)	594
06	Encapsulated Lesions (Abscesses, Nodules, Cysts, Tumor Masses, Metastases)	658
07	Mediastinal Changes (Pericarditis, Arteriovenous Malformations, Lymph Node Enlargement)	596
08	Chest Changes (Atelectasis, Malformations, Agenesis, Hypoplasia)	544
	Total	6743
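
The class balance implied by Table 1 can be checked with a few lines of arithmetic. The sketch below uses only the counts from the table; the variable names are illustrative.

```python
# Number of cases per class, copied from Table 1 (class 00 to class 08).
class_counts = [1340, 1060, 678, 629, 644, 594, 658, 596, 544]

total = sum(class_counts)
shares = [n / total for n in class_counts]
# Ratio between the largest and the smallest class as a simple imbalance indicator.
imbalance_ratio = max(class_counts) / min(class_counts)

for idx, (n, share) in enumerate(zip(class_counts, shares)):
    print(f"class {idx:02d}: {n:5d} cases ({share:.1%})")
print(f"total: {total}, largest/smallest class ratio: {imbalance_ratio:.2f}")
```

The counts sum to the reported total of 6743, and the largest class (Normal Anatomy) is roughly 2.5 times the size of the smallest (Chest Changes), which is one reason class-wise metrics are reported alongside accuracy.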

Table 2. Summary of the accuracy obtained on each fold, as well as the overall mean accuracy.

Fold	Accuracy
1	0.98813936
2	0.99110452
3	0.98888065
4	0.98293769
5	0.99035608
Mean	0.98828366
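
The mean in Table 2 is the unweighted average of the five fold accuracies. A short sketch, using only the values in the table, reproduces it and adds the between-fold sample standard deviation, which the table does not report and is computed here purely for illustration.

```python
from statistics import mean, stdev

# Fold accuracies copied from Table 2.
fold_accuracy = [0.98813936, 0.99110452, 0.98888065, 0.98293769, 0.99035608]

mean_accuracy = mean(fold_accuracy)
# Sample standard deviation across folds (not reported in the table).
fold_sd = stdev(fold_accuracy)

print(f"mean accuracy: {mean_accuracy:.8f}")
print(f"between-fold SD: {fold_sd:.4f}")
```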

Table 3. Confusion matrix (rows: true class; columns: predicted class), showing limited misclassification across classes within the evaluated dataset.

Class	00	01	02	03	04	05	06	07	08	Total	% Misclassified
00	1316	17	0	3	0	0	4	0	0	1340	1.791045
01	10	1015	4	4	11	12	4	0	0	1060	4.245283
02	0	0	678	0	0	0	0	0	0	678	0
03	0	2	0	626	1	0	0	0	0	629	0.476948
04	0	4	1	0	639	0	0	0	0	644	0.776398
05	0	1	0	0	0	593	0	0	0	594	0.16835
06	0	0	0	1	0	0	657	0	0	658	0.151976
07	0	0	0	0	0	0	0	596	0	596	0
08	0	0	0	0	0	0	0	0	544	544	0
Total	1326	1039	683	634	651	605	665	596	544	6743

Class 00, Normal Anatomy; Class 01, Pulmonary Inflammatory Processes (Pneumonia); Class 02, Higher Density (Pleural Effusion, Atelectatic Consolidation, Hydrothorax, Empyema); Class 03, Lower Density (Pneumothorax, Pneumomediastinum, Pneumoperitoneum); Class 04, Obstructive Pulmonary Diseases (Emphysema, Bronchopneumonia, Bronchiectasis, Embolism); Class 05, Degenerative Infectious Diseases (Tuberculosis, Sarcoidosis, Proteinosis, Fibrosis); Class 06, Encapsulated Lesions (Abscesses, Nodules, Cysts, Tumor Masses, Metastases); Class 07, Mediastinal Changes (Pericarditis, Arteriovenous Malformations, Lymph Node Enlargement); Class 08, Chest Changes (Atelectasis, Malformations, Agenesis, Hypoplasia).
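
The totals and the "% Misclassified" column of Table 3 follow directly from the matrix entries. The sketch below recomputes them, assuming that rows hold the true class and columns the predicted class (consistent with the row totals matching the class counts in Table 1) and that the matrix pools the held-out predictions of all folds.

```python
# Confusion matrix copied from Table 3 (rows: true class 00-08, columns: predicted).
confusion = [
    [1316, 17, 0, 3, 0, 0, 4, 0, 0],
    [10, 1015, 4, 4, 11, 12, 4, 0, 0],
    [0, 0, 678, 0, 0, 0, 0, 0, 0],
    [0, 2, 0, 626, 1, 0, 0, 0, 0],
    [0, 4, 1, 0, 639, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 593, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 657, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 596, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 544],
]

n_classes = len(confusion)
row_totals = [sum(row) for row in confusion]        # images per true class
col_totals = [sum(col) for col in zip(*confusion)]  # images per predicted class
correct = sum(confusion[i][i] for i in range(n_classes))

# Share of each true class assigned to any other class, in percent.
pct_misclassified = [
    100 * (row_totals[i] - confusion[i][i]) / row_totals[i]
    for i in range(n_classes)
]
pooled_accuracy = correct / sum(row_totals)

for i in range(n_classes):
    print(f"class {i:02d}: {pct_misclassified[i]:.6f}% misclassified")
print(f"pooled accuracy: {pooled_accuracy:.4f}")
```

The pooled accuracy obtained this way (6664 of 6743 images, about 0.9883) agrees with the mean fold accuracy in Table 2 to four decimal places.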

Table 4. Classification report (precision, recall, F1-score, and support) for each class.

Class Number	Class Name	Precision	Recall	F1-Score	Support
00	Normal Anatomy	0.99	0.98	0.99	1340
01	Pulmonary Inflammatory Processes	0.98	0.96	0.97	1060
02	Higher Density	0.99	1.00	1.00	678
03	Lower Density	0.99	1.00	0.99	629
04	Obstructive Pulmonary Diseases	0.98	0.99	0.99	644
05	Degenerative Infectious Diseases	0.98	1.00	0.99	594
06	Encapsulated Lesions	0.99	1.00	0.99	658
07	Mediastinal Changes	1.00	1.00	1.00	596
08	Chest Changes	1.00	1.00	1.00	544
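
The per-class values in Table 4 can be derived from the confusion matrix in Table 3. The following sketch applies the standard definitions (precision = TP / predicted count, recall = TP / true count, F1 = harmonic mean of the two) and also computes a support-weighted F1; it assumes rows are true classes and columns are predicted classes, and the variable names are illustrative.

```python
# Confusion matrix copied from Table 3 (rows: true class 00-08, columns: predicted).
confusion = [
    [1316, 17, 0, 3, 0, 0, 4, 0, 0],
    [10, 1015, 4, 4, 11, 12, 4, 0, 0],
    [0, 0, 678, 0, 0, 0, 0, 0, 0],
    [0, 2, 0, 626, 1, 0, 0, 0, 0],
    [0, 4, 1, 0, 639, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 593, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 657, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 596, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 544],
]

n_classes = len(confusion)
support = [sum(row) for row in confusion]          # true count per class
predicted = [sum(col) for col in zip(*confusion)]  # predicted count per class

precision, recall, f1 = [], [], []
for i in range(n_classes):
    tp = confusion[i][i]
    p = tp / predicted[i]
    r = tp / support[i]
    precision.append(p)
    recall.append(r)
    f1.append(2 * p * r / (p + r))

# F1 averaged over classes, weighted by the number of true cases in each class.
weighted_f1 = sum(f * s for f, s in zip(f1, support)) / sum(support)

for i in range(n_classes):
    print(f"class {i:02d}: P={precision[i]:.2f} R={recall[i]:.2f} "
          f"F1={f1[i]:.2f} support={support[i]}")
print(f"weighted F1: {weighted_f1:.2f}")
```

Rounded to two decimals, these values match the precision, recall, and F1 columns of Table 4, and the weighted F1 rounds to the 0.99 reported in the abstract.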

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
