PELM: A Deep Learning Model for Early Detection of Pneumonia in Chest Radiography

Yanar, Erdem; Hardalaç, Fırat; Ayturan, Kubilay

doi:10.3390/app15126487

by Erdem Yanar ¹,²,*, Fırat Hardalaç ² and Kubilay Ayturan ²

¹ Department of Healthcare Systems System Engineering, ASELSAN, 06200 Ankara, Turkey
² Department of Electrical and Electronics Engineering, Gazi University, 06570 Ankara, Turkey
* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(12), 6487; https://doi.org/10.3390/app15126487

Submission received: 5 May 2025 / Revised: 31 May 2025 / Accepted: 4 June 2025 / Published: 9 June 2025

Abstract

Pneumonia remains a leading cause of respiratory morbidity and mortality, underscoring the need for rapid and accurate diagnosis to enable timely treatment and prevent complications. This study introduces PELM (Pneumonia Ensemble Learning Model), a novel deep learning framework for automated pneumonia detection using chest X-ray (CXR) images. The model integrates four high-performing architectures—InceptionV3, VGG16, ResNet50, and Vision Transformer (ViT)—via feature-level concatenation to exploit complementary feature representations. A curated, large-scale dataset comprising 50,000 PA-view CXR images was assembled from NIH ChestX-ray14, CheXpert, PadChest, and Kaggle CXR Pneumonia datasets, including both pneumonia and non-pneumonia cases. To ensure fair benchmarking, all models were trained and evaluated under identical preprocessing and hyperparameter settings. PELM achieved outstanding performance, with 96% accuracy, 99% precision, 91% recall, 95% F1-score, 91% specificity, and an AUC of 0.91—surpassing individual model baselines and previously published methods. Additionally, comparative experiments were conducted using tabular clinical data from over 10,000 patients, enabling a direct evaluation of image-based and structured-data-based classification pipelines. These results demonstrate that ensemble learning with hybrid architectures significantly enhances diagnostic accuracy and generalization. The proposed approach is computationally efficient, clinically scalable, and particularly well-suited for deployment in low-resource healthcare settings, where radiologist access may be limited. PELM represents a promising advancement toward reliable, interpretable, and accessible AI-assisted pneumonia screening in global clinical practice.

Keywords:

pneumonia; deep learning; early diagnosis; clinical decision support; medical imaging; automated diagnosis; ensemble learning

1. Introduction

1.1. Background

Pneumonia is a serious lower respiratory infection marked by inflammation of the lung parenchyma and alveoli. It remains a leading cause of mortality, particularly among children under five and older adults, with 740,180 deaths reported globally in children under five in 2019 alone [1,2,3]. Timely and accurate diagnosis is essential for improving outcomes, especially in under-resourced settings [4,5]. While diagnostic tools include clinical exams, lab tests, and imaging, chest radiography (CXR) is the most widely used modality due to its accessibility and non-invasive nature [6].

Despite its ubiquity, interpreting CXR images presents challenges: overlapping anatomical features, subtle opacities, and radiographic similarity to other thoracic conditions can hinder accurate assessment—especially in regions with limited access to experienced radiologists or advanced imaging tools [7,8,9]. Recent developments in deep learning (DL), particularly convolutional neural networks (CNNs), have demonstrated strong potential in automating image-based diagnostics, including pneumonia detection [10,11,12,13,14]. DL-based systems can extract features directly from raw images, reducing subjectivity and potentially enhancing diagnostic accuracy and throughput.

This study investigates and compares the performance of a diverse set of DL architectures—including CNNs, Transformer-based models, and state space (Mamba) models—for pneumonia detection in CXR images. By leveraging a large-scale, curated dataset and evaluating all models under identical conditions, we aim to provide a comprehensive analysis of their diagnostic potential, generalizability, and suitability for deployment in clinical workflows.

1.2. Motivation

Traditional diagnostic approaches rely heavily on manual interpretation, which is prone to delay and variability, particularly in high-volume or low-resource settings. Although CXR remains a crucial tool for pneumonia diagnosis—where the disease typically appears as localized opacities—its effectiveness depends on radiologist experience and image clarity. Deep learning offers an opportunity to mitigate these limitations by learning visual patterns from labeled data, thereby standardizing and accelerating the diagnostic process.

This study is motivated by the need to enhance the speed, accuracy, and consistency of pneumonia detection. We explore whether recent DL architectures, including hybrid and attention-based models, offer measurable performance gains over prior methods and assess their ability to generalize across a diverse clinical dataset.

1.3. Objectives

This work aims to achieve the following:

Compare the diagnostic performance of multiple CNN and Transformer-based architectures for pneumonia detection from CXR images.
Evaluate whether ensemble learning and attention mechanisms yield tangible improvements over traditional approaches.
Provide an empirical basis for selecting models for AI-assisted clinical decision support.

The remainder of this paper is structured as follows: Section 2 reviews related work in pneumonia detection using deep learning. Section 3 details the datasets, model architectures, and training procedures. Section 4 presents the experimental results, followed by Section 5 (discussion) and Section 6 (future work). Section 7 and Section 8 provide the overall conclusions and highlight the novel contributions of this study.

2. Literature Reviews

Over the past decade, deep learning (DL) has shown considerable success in automating the detection of pneumonia from chest radiographs (CXR), addressing challenges of radiologist workload and diagnostic variability. This section analyzes prior studies with an emphasis on their modeling strategies, dataset use, and clinical relevance, forming the foundation for the proposed PELM ensemble framework. Several key studies in the field have laid important groundwork by exploring different architectures, learning paradigms, and evaluation strategies in CXR-based pneumonia classification:

Asnaoui et al. [15] conducted a performance comparison of several fine-tuned models, notably InceptionResNet_V2, ResNet50, and MobileNet_V2, and found that an ensemble of these achieved the highest F1-score (94.84%) on a COVID-era pneumonia dataset. While effective, their work focused on model-level stacking without optimized weight tuning, limiting adaptability across datasets.
Rahman and Muhammad [16] used transfer learning with four CNNs—AlexNet, ResNet18, DenseNet201, and SqueezeNet—to distinguish between bacterial and viral pneumonia. Their comprehensive three-class evaluation (normal, bacterial, viral) demonstrated DenseNet201’s superiority, yet lacked ensemble integration or spatial attention mechanisms, which are vital for nuanced feature focus.
Hossain et al. [17] incorporated LSTM layers with MobileNet and ResNet backbones to model temporal dependencies in X-ray sequences. Although this hybrid CNN-RNN approach achieved 90.2% accuracy, its application to static CXR data is limited and requires sequential imaging, which is not always clinically available.
Ahmad et al. [18] proposed a temporal model using CheXNet and GRUs to track COVID-19 progression across serial CXRs. Their zone-based approach yielded a strong AUC of 0.98, showing the value of spatial granularity, though the model’s reliance on longitudinal data limits its generalizability to single-instance diagnoses.
Irvin et al. [19], through the CheXpert dataset, demonstrated DenseNet121’s capability to achieve radiologist-level performance (AUC: 0.98). However, the study was confined to single-model evaluations and did not explore ensemble robustness or recall optimization, which are critical in minimizing missed diagnoses.
An et al. [20] introduced multi-head attention and dynamic pooling within a DenseNet121-EfficientNetB0 hybrid, achieving 95.19% accuracy and significantly reducing false positives. Despite strong precision (98.38%), the method’s computational complexity and architectural specificity may hinder scalability in resource-constrained settings.
Pacal et al. [21] benchmarked over 60 CNN and ViT models on cervical imaging, illustrating that ViT-B16 combined with ensemble strategies (e.g., max-voting) improved low-class performance. While not focused on pneumonia, this study underlines the value of Transformer-based architectures and voting techniques in medical imaging.
Darici et al. [22] compared lightweight custom CNNs with ensemble strategies on pediatric CXRs. Their use of SMOTE for class balancing and separable convolutions yielded 95% accuracy in binary classification. However, their ensemble model underperformed compared to their standalone CNN, indicating potential redundancy without adaptive weighting.
Sharma and Guleria [5] combined VGG16 feature extraction with multiple classifiers (e.g., NN, SVM, RF), achieving up to 95.4% accuracy and AUCs of 0.988. While showcasing the benefit of hybrid pipelines, the approach lacked architectural fusion, which can consolidate spatial-semantic feature richness.
Varshni et al. [23] demonstrated that combining DenseNet169 features with classical classifiers like SVM outperformed logistic regression and basic CNNs (AUC: 0.80). However, limited dataset size (n = 2862) and absence of attention mechanisms constrained generalizability.

Existing works validate the effectiveness of CNN-based and ensemble methods in pneumonia detection, yet several gaps persist: lack of optimal ensemble weighting, underutilization of hybrid Transformer-CNN models, limited interpretability integration, and absence of real-time clinical deployment considerations.

To address these limitations, our proposed PELM model introduces the following:

Validation across a large, multi-source dataset with balanced class distributions.
Integration of CNNs and Transformers through feature-level fusion to enhance diagnostic accuracy.
Use of transfer learning, batch normalization, and dropout to mitigate overfitting.
A design tailored for real-time deployment in resource-constrained environments such as ICUs and emergency settings.
PELM is designed to deliver a robust, interpretable, and clinically viable deep learning framework for automated pneumonia detection from chest radiographs.

3. Materials and Methods

3.1. Data Collection and Preprocessing

3.1.1. Dataset

In this study, we formulated pneumonia detection as a binary classification task, distinguishing pneumonia-positive cases from non-pneumonia cases using carefully curated samples from four large-scale, publicly available datasets: NIH ChestX-ray14, CheXpert, PadChest, and Kaggle CXR Pneumonia. For the positive class (label = 1), we included all available posteroanterior (PA) chest X-ray images labeled with pneumonia, irrespective of co-occurring thoracic findings such as pleural effusion, consolidation, or pulmonary edema. This inclusive strategy was designed to capture the clinical heterogeneity typically encountered in real-world diagnostic scenarios.

For the negative class (label = 0), we included both normal chest X-rays and those labeled with other non-pneumonic conditions, such as fibrosis, nodules, and cardiomegaly. This deliberate inclusion ensures that the model is trained not only to differentiate pneumonia from healthy lungs but also to distinguish it from other thoracic abnormalities with overlapping visual features—enhancing its clinical robustness.

To prevent patient-level data leakage and to support generalization, only a single PA-view image per patient was retained, specifically the earliest available radiograph. This decision mirrors routine clinical workflows, where initial diagnosis is often made based on a single radiographic snapshot. The resulting dataset reflects the diagnostic complexity and class variability seen in contemporary radiology departments, thereby providing a realistic foundation for the development of a clinically reliable and generalizable AI model for pneumonia detection. The class distribution across the contributing datasets is summarized in Table 1.
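The patient-level selection rule described above can be illustrated with a minimal sketch. The record fields (`patient_id`, `view`, `study_date`, `path`) are hypothetical names chosen for this example and do not correspond to the metadata schema of any of the four source datasets.

```python
from datetime import date


def earliest_pa_per_patient(records):
    """Keep one record per patient: the earliest PA-view radiograph."""
    kept = {}
    for rec in records:
        if rec["view"] != "PA":
            continue  # non-PA views are discarded
        pid = rec["patient_id"]
        if pid not in kept or rec["study_date"] < kept[pid]["study_date"]:
            kept[pid] = rec
    return list(kept.values())


# Toy metadata (hypothetical field names and values).
records = [
    {"patient_id": "p1", "view": "PA", "study_date": date(2015, 3, 1), "path": "a.png"},
    {"patient_id": "p1", "view": "PA", "study_date": date(2014, 7, 9), "path": "b.png"},
    {"patient_id": "p1", "view": "AP", "study_date": date(2013, 1, 1), "path": "c.png"},
    {"patient_id": "p2", "view": "PA", "study_date": date(2016, 5, 5), "path": "d.png"},
]
selected = earliest_pa_per_patient(records)
```

Because each patient contributes at most one image, no patient can appear in more than one of the training, validation, and test subsets.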

NIH Chest X-ray Dataset: The NIH (National Institutes of Health) Chest X-ray Dataset consists of 112,120 chest radiographs from 30,805 unique patients, each annotated with disease labels extracted via Natural Language Processing (NLP) techniques applied to the corresponding radiology reports. Although the original reports are not publicly available, the automated labeling process has been estimated to achieve over 90% accuracy, making the dataset particularly suitable for weakly supervised learning tasks [24]. This dataset was first introduced by Wang et al. in their open-access publication “ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases.” Its main limitations are the potential for mislabeling introduced by NLP-driven annotation and the lack of access to the original radiology reports. Representative samples from the dataset are shown in Figure 1.

CheXpert Dataset: The CheXpert dataset contains 224,316 chest radiographs collected from 65,240 patients, each annotated with labels corresponding to 14 common chest radiographic findings [19]. To improve the reliability of label extraction from free-text radiology reports, the dataset incorporates an uncertainty-aware labeling strategy, so that models trained on it can capture not only confirmed observations but also uncertain diagnostic information. This approach enhances the dataset’s applicability in training models under both supervised and weakly supervised settings. Representative samples from the dataset are shown in Figure 2.

PadChest Dataset: The PadChest dataset is a comprehensive and clinically rich resource for medical imaging research, consisting of over 160,000 chest X-ray (CXR) images acquired from approximately 67,000 patients at San Juan Hospital in Alicante, Spain, between 2009 and 2017 [25]. These radiographs encompass a wide array of clinical presentations and imaging conditions, captured in six different positional views, thus offering substantial diversity for training and evaluating computer-aided diagnostic (CAD) systems. The dataset is further enhanced by extensive metadata, including details on image acquisition settings and patient demographics, supporting both clinical interpretation and algorithmic development.

Each CXR image is paired with a corresponding radiology report authored by expert radiologists. These reports have been meticulously annotated with 174 distinct radiographic findings, 19 differential diagnoses, and 104 anatomical locations. Annotations are organized using a hierarchical structure and mapped to standardized medical ontologies, including the Unified Medical Language System (UMLS), ensuring interoperability and consistency with other clinical datasets.

The annotation pipeline employed a hybrid strategy to balance accuracy and scalability. Approximately 27% of the reports were manually annotated by trained physicians to establish high-quality ground truth labels. The remaining reports were annotated automatically using a supervised deep learning pipeline that incorporates a recurrent neural network with attention mechanisms. This dual approach—combining expert insight with algorithmic efficiency—produced a robust, high-utility dataset that is particularly well-suited for the development and validation of deep learning models in thoracic imaging. Representative samples from the PadChest dataset are presented in Figure 3.

Kaggle CXR Pneumonia Dataset: The Kaggle Chest X-ray Pneumonia dataset consists of 5863 anterior-posterior chest radiographs in JPEG format, organized into three distinct subsets: training, testing, and validation, each containing subfolders for the two image classes—Pneumonia and Normal [26]. The images were retrospectively collected from pediatric patients aged one to five years at Guangzhou Women and Children’s Medical Center in China, with all imaging conducted as part of routine clinical care. To ensure data quality, all scans underwent initial screening to exclude unreadable or low-quality images. Diagnostic labels were assigned by two experienced physicians, with an independent review conducted by a third expert to resolve any discrepancies and confirm final annotations. The dataset includes both bacterial and viral pneumonia cases, which present distinct radiographic patterns—bacterial pneumonia typically appears as focal lobar consolidation, whereas viral pneumonia presents a diffuse interstitial pattern affecting both lungs. Representative examples are shown in Figure 4, illustrating clear lungs in a normal scan, localized consolidation in bacterial pneumonia, and diffuse opacities in viral pneumonia.

The NIH ChestX-ray14, CheXpert, PadChest, and Kaggle CXR Pneumonia datasets were combined. These are independent collections, and the authors declare that no patient overlaps exist across them, so all samples are unique. Regarding dataset selection criteria, chest X-rays in both posteroanterior (PA) and anteroposterior (AP) views were considered to ensure coverage of pneumonia cases commonly encountered in clinical settings, including ICU patients. Poor-quality or unreadable images (e.g., excessive noise, motion blur) were excluded through an automated filtering and review process. The distribution of the datasets used in this study is presented in detail in Table 2.

To build a robust and generalizable model for pneumonia detection, this study employed data augmentation and transfer learning as foundational techniques. These strategies were critical for improving model performance, mitigating overfitting, and addressing class imbalance and dataset heterogeneity.

An initial pool of 57,656 chest X-ray (CXR) images was compiled from four large-scale public datasets: PadChest, CXR NIH, Kaggle CXR Pneumonia, and CheXpert. Following a rigorous quality control process—which excluded images with low resolution, poor exposure, or anatomical truncation—and a class-balancing step, a refined dataset of 50,000 images was constructed, containing equal numbers of pneumonia-positive and normal/other findings cases (25,000 each).

To ensure fair and consistent evaluation, the curated dataset was partitioned into 40,000 training, 5000 validation, and 5000 test images. The distribution and origin of the images used are detailed in Table 3.

While the curated dataset used in this study supports more consistent and balanced model training, it does not fully encapsulate the variability observed in real-world clinical practice. Technically suboptimal images—such as those affected by low contrast, motion artifacts, or non-standard patient positioning—were excluded during quality filtering. Although this improves training stability, it may limit the model’s generalizability, as such imperfections are frequently encountered in everyday radiology workflows.

Furthermore, while the negative (non-pneumonia) class included a mix of normal and pathological findings (e.g., fibrosis, cardiomegaly, and nodules), and the pneumonia class encompassed cases with overlapping thoracic abnormalities (e.g., pleural effusion, consolidation, and infiltration), both classes were curated to a degree. As a result, the dataset may not fully reflect the diagnostic complexity and clinical ambiguity often present in actual practice. Future studies should incorporate a broader range of image quality, acquisition variability, and comorbid conditions to better align with real-world deployment scenarios.

To mitigate dataset limitations and improve classification performance, we employed transfer learning using a diverse set of high-performing convolutional and transformer-based models, including VGG16, ResNet50, InceptionV3, DenseNet121, DenseNet201, MobileNetV2, and Vision Transformer (ViT). Pretrained on the ImageNet dataset, these architectures provide robust and transferable hierarchical representations, which were fine-tuned for pneumonia detection. Their adaptation facilitated faster convergence, improved feature extraction, and enhanced sensitivity to subtle pathological patterns in chest radiographs.

Additionally, clinically informed data augmentation was applied to emulate common variations in image acquisition. Augmentation strategies included horizontal flipping, constrained rotation (±5°), limited translation (≤5%), and zoom adjustments (95–105%), all tailored to simulate realistic radiographic variability without compromising anatomical fidelity. Contrast Limited Adaptive Histogram Equalization (CLAHE) was further used to enhance local contrast and improve feature visibility in low-density regions. Together, these preprocessing and augmentation techniques enhanced the model’s generalization ability, minimized overfitting, and yielded more robust diagnostic performance across diverse imaging conditions.

3.1.2. Preprocessing

Prior to model training, all chest X-ray (CXR) images were resized to a resolution of 224 × 224 pixels to ensure compatibility with standard convolutional neural network (CNN) architectures while balancing computational efficiency. Pixel intensity values were normalized to a range of [0, 1] to facilitate faster convergence during training. To mitigate noise—particularly speckle and multiplicative noise commonly found in radiographic imaging—image smoothing filters were applied during preprocessing. This approach preserved critical anatomical features, such as heart borders and lung field clarity, while minimizing distortions that could negatively impact model performance.
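A minimal NumPy sketch of these preprocessing steps is given below. The text does not specify the interpolation method or the smoothing filter, so nearest-neighbour resizing and a 3 × 3 mean filter are used here purely as illustrative assumptions; the function names are likewise hypothetical.

```python
import numpy as np

TARGET_SIZE = 224  # resolution stated in the text


def resize_nearest(img, size=TARGET_SIZE):
    """Nearest-neighbour resize of a 2-D grayscale array (assumed method)."""
    h, w = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows[:, None], cols[None, :]]


def mean_filter3(img):
    """3 x 3 mean filter with edge replication (assumed smoothing filter)."""
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros_like(img, dtype=np.float32)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / 9.0


def preprocess(img_uint8):
    """Resize to 224 x 224, scale intensities to [0, 1], then smooth."""
    resized = resize_nearest(img_uint8)
    scaled = resized.astype(np.float32) / 255.0
    return mean_filter3(scaled)


# Synthetic 8-bit image standing in for a radiograph.
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(1024, 900), dtype=np.uint8)
x = preprocess(raw)
```

A mean filter is the simplest possible choice; an edge-preserving filter may be closer to the stated goal of keeping heart borders and lung fields sharp.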

3.2. Clinically Aware Data Augmentation

Data augmentation strategies were carefully designed to increase dataset variability while preserving thoracic anatomical structures critical for accurate pneumonia assessment. Horizontal flipping was applied with a 50% probability, maintaining diagnostic validity due to the inherent left–right symmetry in frontal chest radiographs.

Rotation augmentation was constrained within a narrow range of ±5 degrees to avoid distorting essential structures such as the heart silhouette and mediastinum. Zoom augmentation was limited to a range of 95% to 105%, simulating realistic variations in radiographic magnification without disproportionately enlarging the heart or lung regions.

Translation was restricted to shifts of no more than 5% of the image width or height (approximately ±10 to 12 pixels) along both x- and y-axes, modeling minor variations in patient positioning during imaging without compromising anatomical integrity. Additionally, selective combinations of flipping and translation were employed to further enhance spatial diversity while maintaining clinical interpretability.

These carefully constrained preprocessing and augmentation techniques ensured the preservation of diagnostically relevant features, ultimately supporting improved model training, generalization, and performance.
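The constraints above can be summarized as a parameter sampler. The sketch below only draws the transformation parameters within the stated bounds; applying them to an image is left to whichever imaging library is used. Uniform sampling within each range is an assumption, since the text gives only the limits, and all names are illustrative.

```python
import random

IMAGE_SIZE = 224            # pixels, per the preprocessing step
FLIP_PROBABILITY = 0.5      # horizontal flip
MAX_ROTATION_DEG = 5.0      # rotation within +/- 5 degrees
ZOOM_RANGE = (0.95, 1.05)   # 95% to 105%
MAX_SHIFT_FRACTION = 0.05   # at most 5% of width or height
MAX_SHIFT_PX = MAX_SHIFT_FRACTION * IMAGE_SIZE  # about 11.2 pixels


def sample_augmentation(rng):
    """Draw one set of augmentation parameters (uniform sampling assumed)."""
    return {
        "flip": rng.random() < FLIP_PROBABILITY,
        "angle_deg": rng.uniform(-MAX_ROTATION_DEG, MAX_ROTATION_DEG),
        "zoom": rng.uniform(*ZOOM_RANGE),
        "shift_x_px": rng.uniform(-MAX_SHIFT_PX, MAX_SHIFT_PX),
        "shift_y_px": rng.uniform(-MAX_SHIFT_PX, MAX_SHIFT_PX),
    }


rng = random.Random(42)
params = [sample_augmentation(rng) for _ in range(1000)]
```

The 5% translation limit on a 224-pixel image works out to roughly 11 pixels, which agrees with the ±10 to 12 pixel range quoted in the text.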

3.3. Contrast Enhancement with CLAHE

To enhance the visibility of thoracic anatomical structures in low-contrast chest X-ray (CXR) images, Contrast Limited Adaptive Histogram Equalization (CLAHE) was employed as a key component of the preprocessing pipeline. The enhancement was applied specifically to the luminance (L) channel after converting the original RGB images to the LAB color space, ensuring that contrast adjustments did not introduce distortions in chromatic information.

CLAHE was configured with a clip limit of 2.0 and a tile grid size of 8 × 8, hyperparameters selected to improve local contrast while minimizing the risk of noise over-amplification. This approach significantly improved the delineation of diagnostically relevant features such as cardiac borders, lung fields, and mediastinal structures, thereby facilitating more accurate feature extraction by deep learning models.
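To make the role of the clip limit concrete, the following simplified sketch equalizes a single tile with a clipped histogram. It is not a full CLAHE implementation: the bilinear blending between neighbouring tiles is omitted, and interpreting the clip limit as a multiple of the mean histogram bin height is an assumption about the convention. In practice a library implementation would normally be used.

```python
import numpy as np


def clipped_equalize_tile(tile, clip_limit=2.0, n_bins=256):
    """Equalize one uint8 tile using a clipped histogram (core CLAHE step)."""
    hist = np.bincount(tile.ravel(), minlength=n_bins).astype(np.float64)
    # Assumed convention: the ceiling is clip_limit times the mean bin height.
    ceiling = clip_limit * tile.size / n_bins
    excess = np.clip(hist - ceiling, 0, None).sum()
    # Clip tall bins and spread the removed counts uniformly over all bins.
    hist = np.minimum(hist, ceiling) + excess / n_bins
    cdf = np.cumsum(hist) / hist.sum()
    lut = np.round(cdf * (n_bins - 1)).astype(np.uint8)
    return lut[tile]


rng = np.random.default_rng(0)
# A 224-pixel image split into an 8 x 8 grid gives 28 x 28 tiles.
# Low-contrast synthetic tile: intensities confined to 100..139.
tile = rng.integers(100, 140, size=(28, 28), dtype=np.uint8)
enhanced = clipped_equalize_tile(tile)
```

Clipping limits how steep the intensity mapping can become, which is what prevents noise in nearly uniform regions from being amplified as strongly as in plain histogram equalization.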

To standardize image appearance across diverse datasets, a binary K-Means clustering algorithm was utilized to automatically identify white-background images, which were subsequently inverted to match the conventional dark-background format used in CXR interpretation. This consistency in preprocessing enhanced cross-dataset compatibility and improved the model’s ability to generalize across varying acquisition formats. A summary of the applied augmentation techniques and their clinical motivations is provided in Table 4.
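One possible form of this polarity check is sketched below. The text states only that binary K-Means was used to identify white-background images; the specific decision rule here (an image is treated as white-background when most border pixels fall in the brighter cluster) is an assumption made for illustration, as are the function names.

```python
import numpy as np


def two_means_threshold(values, n_iter=20):
    """1-D k-means with k = 2; returns the midpoint between the centroids."""
    lo, hi = float(values.min()), float(values.max())
    for _ in range(n_iter):
        thr = (lo + hi) / 2.0
        dark, bright = values[values <= thr], values[values > thr]
        if dark.size == 0 or bright.size == 0:
            break  # degenerate case, e.g. a constant image
        lo, hi = float(dark.mean()), float(bright.mean())
    return (lo + hi) / 2.0


def has_white_background(img):
    """Assumed rule: most border pixels belong to the brighter cluster."""
    thr = two_means_threshold(img.astype(np.float64).ravel())
    border = np.concatenate([img[0], img[-1], img[:, 0], img[:, -1]])
    return bool((border > thr).mean() > 0.5)


def standardize_polarity(img_uint8):
    """Invert white-background images to the dark-background convention."""
    return 255 - img_uint8 if has_white_background(img_uint8) else img_uint8


# Synthetic example: bright square on a dark background, and its inverse.
normal = np.zeros((64, 64), dtype=np.uint8)
normal[16:48, 16:48] = 200
inverted = 255 - normal
```

On real radiographs the border may contain collimation edges or burned-in labels, so a rule of this kind would need to be validated on each source dataset.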

These augmentations were applied during training to increase data diversity and reduce the risk of overfitting.

While an ablation study isolating the individual contributions of each augmentation technique was not conducted in this work, prior literature has consistently demonstrated their utility in radiographic imaging tasks. As such, our augmentation pipeline was designed based on clinically grounded heuristics and best practices in medical image preprocessing. In future work, we intend to conduct a systematic ablation analysis to quantify the incremental performance gains attributable to each transformation. Examples of the applied data augmentation techniques, including CLAHE, are presented in Figure 5.

As a result of the full data augmentation pipeline—including random rotation, controlled zooming, horizontal flipping, and CLAHE-based contrast enhancement—the dataset was expanded and standardized to contain 50,000 chest X-ray (CXR) images. The dataset was evenly balanced between pneumonia and non-pneumonia cases, with the latter encompassing both normal radiographs and other thoracic pathologies such as fibrosis, nodules, and cardiomegaly. Images were sourced from four publicly available datasets: NIH ChestX-ray14, CheXpert, PadChest, and Kaggle CXR Pneumonia.

To support robust model development and evaluation, the dataset was partitioned into training (40,000 images), validation (5000 images), and testing (5000 images) subsets. Class balance was preserved across all splits to ensure fair performance comparison and mitigate sampling bias. The detailed breakdown of the augmented dataset is presented in Table 5.
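A class-balanced partition of this kind can be sketched in a few lines. The function below is illustrative only: the shuffling seed and the way labels are stored are assumptions, and the study's actual splitting procedure is not described beyond the subset sizes.

```python
import random


def stratified_split(labels, sizes, seed=0):
    """Split sample indices into subsets that keep the class proportions.

    labels: list of class labels (0 or 1), one per image.
    sizes:  dict mapping subset name to its number of images.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    for idxs in by_class.values():
        rng.shuffle(idxs)

    total = len(labels)
    splits = {name: [] for name in sizes}
    for idxs in by_class.values():
        start = 0
        for name, n in sizes.items():
            take = n * len(idxs) // total  # this class's share of the subset
            splits[name].extend(idxs[start:start + take])
            start += take
    return splits


# 25,000 pneumonia (1) and 25,000 non-pneumonia (0) images, as in the text.
labels = [1] * 25_000 + [0] * 25_000
splits = stratified_split(labels, {"train": 40_000, "val": 5_000, "test": 5_000})
```

With these sizes each subset receives exactly half of its images from each class, for example 20,000 pneumonia and 20,000 non-pneumonia images in the training set.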

3.4. Model Architectures

We explored several CNN and Transformer-based architectures to determine the most effective model for diagnosing pneumonia from chest X-rays. The models included were as follows:

VGG16 is a 16-layer deep CNN known for its uniform and straightforward architecture. It consists of stacked 3 × 3 convolutional layers followed by max-pooling layers, concluding with three fully connected layers. Despite having around 138 million parameters, VGG16 remains a popular backbone for transfer learning due to its strong performance on large-scale datasets like ImageNet. An overview of the adapted model used in our experiments is shown in Figure 6.

ResNet50 is a 50-layer network that introduces residual connections to enable efficient training of deep networks. It uses identity and convolutional blocks with bottleneck layers for dimensionality reduction. With approximately 25.6 million parameters, ResNet50 is widely adopted for its generalization and ease of optimization. An overview of the adapted model used in this work is illustrated in Figure 7.

Figure 7. The structure of the adapted ResNet50-based model.

InceptionV3 employs factorized and asymmetric convolutions to reduce computational cost while preserving expressiveness. It also integrates auxiliary classifiers to aid in training deeper networks. With ~23.5 million parameters, it supports larger input sizes and excels in multi-scale feature extraction. An overview of the adapted version used in this work is illustrated in Figure 8.

Figure 8. The structure of the adapted InceptionV3-based model.

The Vision Transformer (ViT) applies transformer architecture to image patches, enabling global attention across the spatial domain. It divides the image into non-overlapping patches and processes them using multi-head self-attention mechanisms. ViT excels at capturing long-range dependencies and has demonstrated competitive performance on visual recognition tasks.

In this study, ViT operates on 112 × 112 images, partitioned into 49 non-overlapping patches (a 7 × 7 grid) of size 16 × 16 × 3. Each patch is embedded into a 768-dimensional vector, forming a 50 × 768 sequence after adding the class token. The transformer encoder processes this sequence through 12 layers, leveraging attention heads to capture inter-patch dependencies. Within each multi-head attention (MHA) block, the token sequence is linearly projected into query (Q), key (K), and value (V) matrices, and self-attention is computed using the following equation, where d_K is the key dimension:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_K}}\right) V \tag{1}$$
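The patch arithmetic and a single attention head can be checked with a short NumPy sketch of standard scaled dot-product attention, in which the softmax weights are applied to V. The head dimension of 64 and the random projection matrices are assumptions for illustration; the text does not state the number of heads.

```python
import numpy as np

IMAGE_SIZE, PATCH_SIZE, EMBED_DIM = 112, 16, 768
patches_per_side = IMAGE_SIZE // PATCH_SIZE   # 7
num_patches = patches_per_side ** 2           # 49
seq_len = num_patches + 1                     # 50 with the class token


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(Q, K, V):
    """Return softmax(Q K^T / sqrt(d_K)) V and the attention weights."""
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    return weights @ V, weights


rng = np.random.default_rng(0)
tokens = rng.standard_normal((seq_len, EMBED_DIM))  # stand-in for embeddings
HEAD_DIM = 64  # assumed head dimension; not given in the text
W_q, W_k, W_v = (rng.standard_normal((EMBED_DIM, HEAD_DIM)) * 0.02
                 for _ in range(3))
out, weights = scaled_dot_product_attention(tokens @ W_q,
                                            tokens @ W_k,
                                            tokens @ W_v)
```

Each row of `weights` is a probability distribution over the 50 tokens, so every output token is a weighted average of the value vectors of all patches, which is how global context is captured in a single layer.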

Compared to CNNs, which extract spatial features hierarchically through local filters, ViT offers a global attention mechanism that effectively models long-range relationships. This makes it particularly well-suited for medical imaging tasks where contextual awareness is crucial. An illustration of the ViT architecture is shown in Figure 9.

The objective of the proposed PELM architecture is to effectively extract and combine rich feature representations from chest X-ray images for the binary classification of pneumonia and normal cases. As illustrated in Figure 10, the model leverages four high-performing pre-trained networks: InceptionV3, VGG16, ResNet50, and the Vision Transformer (ViT). Before feature extraction, all input chest X-ray images are standardized to a resolution of 224 × 224 pixels to maintain compatibility with the pre-trained models. These backbone architectures were initially trained on the ImageNet dataset [27], enabling the transfer of generalized visual features to the medical imaging domain.

Each base model in the PELM ensemble independently processes the input chest X-ray image of size 224 × 224 × 3 and extracts deep visual features through its respective convolutional and transformer layers. Rather than producing direct class probabilities, the models are modified to output intermediate feature representations optimized for pneumonia-related information.

Let:

F₁: Feature vector from InceptionV3, reduced to 80 dimensions;
F₂: Feature vector from VGG16, reduced to 2048 dimensions;
F₃: Feature vector from ResNet50, reduced to 512 dimensions;
F₄: Feature vector from Vision Transformer (ViT), reduced to 768 dimensions.

Each of these outputs passes through a Last Identity Compensation Layer, aligning their feature shapes and scaling for concatenation. The resulting unified representation is as follows:

$$F = [F_1 \,\|\, F_2 \,\|\, F_3 \,\|\, F_4] \in \mathbb{R}^{1 \times 3408} \tag{2}$$
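The fusion step amounts to a fixed-order concatenation, and the total dimension follows directly from the four stated sizes (80 + 2048 + 512 + 768 = 3408). The sketch below uses random vectors in place of real backbone outputs; the function and variable names are illustrative.

```python
import numpy as np

# Per-backbone feature dimensions as stated in the text.
FEATURE_DIMS = {"InceptionV3": 80, "VGG16": 2048, "ResNet50": 512, "ViT": 768}


def fuse_features(features):
    """Concatenate per-backbone feature vectors in a fixed order."""
    parts = []
    for name, dim in FEATURE_DIMS.items():
        vec = np.asarray(features[name], dtype=np.float32).reshape(1, -1)
        if vec.shape[1] != dim:
            raise ValueError(f"{name}: expected {dim} features, got {vec.shape[1]}")
        parts.append(vec)
    return np.concatenate(parts, axis=1)


# Random stand-ins for the four backbone outputs of one image.
rng = np.random.default_rng(0)
features = {name: rng.standard_normal(dim) for name, dim in FEATURE_DIMS.items()}
F = fuse_features(features)
```

Keeping the concatenation order fixed matters because the classification head learns weights tied to specific positions in the fused vector.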

This fused vector aggregates both local and global features, combining convolutional and transformer-based descriptors into a comprehensive feature profile. As illustrated in Figure 10, the concatenated feature vector F is passed to a fully connected classification head, composed of the following:

Batch Normalization layer: stabilizes feature distributions;
Dense Layer 1;
Dropout (rate = 0.4);
Dense Layer 2;
Dropout (rate = 0.3);
Dense Layer 3;
Output Layer: A single neuron with sigmoid activation.

The output of the final layer is a scalar probability y_predicted∈[0, 1] indicating the predicted likelihood of pneumonia. Binary classification is performed using a threshold of 0.5:

y^ = σ(WF + b) ⇒ Prediction = 1 if y_predicted ≥ 0.5 (Pneumonia),
otherwise (Normal/Other Findings)

(3)
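The decision rule in Equation (3) can be sketched in a few lines of Python. The function names and the example logits are hypothetical; the logit stands for the pre-activation value WF + b of the output neuron.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function mapping a logit to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify(logit, threshold=0.5):
    """Return (probability, label); 1 = pneumonia, 0 = normal/other findings."""
    probability = sigmoid(logit)
    label = 1 if probability >= threshold else 0
    return probability, label


for z in (-2.0, 0.0, 2.0):  # hypothetical logits
    print(z, classify(z))
```

A logit of exactly zero gives a probability of 0.5 and is therefore assigned to the pneumonia class, since the rule uses "greater than or equal to".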

Each architecture processes the input image in parallel, extracting hierarchical and semantic features relevant to pneumonia diagnosis. To ensure compatibility and reduce dimensional complexity, a Last Identity Compensation Layer is applied to each model’s output, followed by dimensional adjustments: 80 neurons for InceptionV3, 2048 for VGG16, 512 for ResNet50, and 768 for ViT. These representations are then concatenated into a unified feature vector of size 3408, combining both local and global feature descriptors from each model.

To expedite training and mitigate overfitting, the weights of the four backbone models were frozen during the fine-tuning phase. Only the classification head—composed of batch normalization, multiple dense layers, and dropout layers (with dropout rates of 0.4 and 0.3)—was retrained using pneumonia-specific data. The training was conducted using both a large dataset containing five different thoracic diseases and a curated pneumonia subset, allowing the model to adapt effectively to domain-specific patterns.
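To give a sense of how small the retrained part of the network is, the sketch below counts the trainable parameters of the classification head. It assumes the example dense-layer sizes reported in Section 3.6 (1024, 512, 128), a 3408-dimensional input, and a standard batch normalization layer with one learnable scale and one shift per feature; the exact layer sizes used by the authors may differ.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def head_trainable_params(input_dim, hidden_units):
    """Trainable parameters of: BatchNorm -> Dense layers -> 1 sigmoid neuron."""
    # Batch normalization: learnable scale and shift per input feature.
    total = 2 * input_dim
    n_in = input_dim
    for n_out in hidden_units:
        total += dense_params(n_in, n_out)
        n_in = n_out
    # Output layer: a single neuron. Dropout layers add no parameters.
    total += dense_params(n_in, 1)
    return total


print(head_trainable_params(3408, (1024, 512, 128)))  # 4088225
```

Under these assumptions the head has roughly 4.1 million trainable parameters, most of them in the first dense layer that follows the concatenation.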

The final output layer produces binary predictions indicating the presence or absence of pneumonia. By integrating both convolutional and transformer-based feature representations and applying a structured dimensional alignment and fusion strategy, PELM offers a powerful, scalable, and clinically applicable framework for pneumonia diagnosis from chest X-ray images.

3.5. Training and Validation

The proposed PELM model was trained and validated using a large and diverse dataset compiled from multiple open-access chest X-ray repositories: Kaggle CXR Pneumonia, NIH ChestX-ray14, PadChest, and CheXpert. To ensure class balance and clinical relevance, a custom dataset of 50,000 images was prepared and split into 40,000 images for training, 5000 for validation, and 5000 for testing. The training set includes 20,000 pneumonia, 18,000 normal, and 2000 other findings images, exposing the model to real-world variability and non-pneumonia conditions. Other findings images show thoracic pathologies other than pneumonia (e.g., pulmonary oedema, pleurisy, tuberculosis); they were added to improve the model’s ability to distinguish pneumonia from other lung abnormalities. The other findings category follows the definition in Wang et al. [28], from which clinically significant and common thoracic diseases were selected. This diverse composition improves generalization and reduces class bias.
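The split and class proportions implied by these counts can be verified with a short calculation. The dictionary and function names below are illustrative only.

```python
# Image counts as stated in the text.
splits = {"training": 40_000, "validation": 5_000, "testing": 5_000}
train_classes = {"pneumonia": 20_000, "normal": 18_000, "other findings": 2_000}


def proportions(counts):
    """Convert absolute counts into fractions of the total."""
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


split_share = proportions(splits)
class_share = proportions(train_classes)

for name, share in split_share.items():
    print(f"{name}: {share:.0%}")
for name, share in class_share.items():
    print(f"{name}: {share:.0%}")
```

The counts correspond to an 80/10/10 split, and the training set contains equal numbers of pneumonia and non-pneumonia images (20,000 each, the latter made up of normal and other findings cases).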

Model training was performed using the Adam optimizer with a learning rate of 0.001, and binary cross-entropy was used as the loss function to address the binary classification objective (pneumonia vs. normal). To prevent overfitting, early stopping was applied based on validation loss, and dropout layers with rates of 0.4 and 0.3 were incorporated into the fully connected layers of the ensemble head. Additionally, a 5-fold cross-validation strategy was implemented to ensure robust performance evaluation and minimize model variance across different data splits.
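Early stopping on validation loss can be sketched as follows. The patience value and the loss sequence are assumptions chosen for illustration, since the text does not report them.

```python
def early_stopping(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) as 0-based indices.

    Training stops once the validation loss has failed to improve on the
    best value by more than `min_delta` for `patience` consecutive epochs.
    If that never happens, stop_epoch is the last epoch.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses: improvement stalls after epoch 2.
losses = [0.60, 0.45, 0.40, 0.41, 0.42, 0.43, 0.44]
print(early_stopping(losses, patience=3))  # (5, 2)
```

In a typical setup the weights from `best_epoch` would be restored after stopping, so the retained model is the one with the lowest validation loss.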

During training, metrics such as accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC-ROC) were calculated. The validation curves demonstrated strong convergence, and ROC analysis confirmed the model’s excellent discriminative ability. Overall, this training and evaluation pipeline ensured that the PELM model achieved high accuracy while maintaining generalization to unseen data, validating its potential for clinical deployment.

3.6. Hyperparameter Tuning

To maximize the performance of the proposed PELM ensemble model, hyperparameter tuning was performed using a grid search strategy. The goal was to identify the optimal combination of training parameters that would enhance convergence speed, reduce overfitting, and improve generalization on unseen data. The following parameters were tuned:

Learning Rate: Tested within the range of 0.0001 to 0.01, with 0.001 yielding the most stable and accurate results when using the Adam optimizer.
Batch Size: Values of 16, 32, and 64 were evaluated. A batch size of 32 offered the best trade-off between training stability and computational efficiency.
Dropout Rate: Dropout values between 0.2 and 0.5 were explored in the dense layers of the classification head. The final architecture included dropout rates of 0.4 and 0.3 in successive layers, effectively reducing overfitting.
Optimization: The Adam optimizer was used, and its momentum parameters were kept constant ($\beta_1 = 0.9$, $\beta_2 = 0.999$).
Number of Dense Units: Various configurations of dense layer sizes were tested after feature concatenation. The best performance was achieved using three fully connected layers with gradually decreasing units (e.g., 1024, 512, 128).
Activation Functions: ReLU activation was used for intermediate layers, while a sigmoid activation function was applied to the final output layer for binary classification.

This structured tuning process played a critical role in improving classification performance, helping the model achieve robust accuracy, high F1-score, and reliable generalization across diverse validation and test samples.
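A grid search over the parameters listed above can be sketched with the standard library. The specific grid points for the learning rate and dropout rates are assumptions within the stated ranges. `toy_score` is a placeholder that simply rewards proximity to the configuration reported in the text so that the example runs; in practice it would train the classification head and return a validation metric.

```python
import math
from itertools import product

SEARCH_SPACE = {
    "learning_rate": [0.0001, 0.001, 0.01],  # assumed points in the stated range
    "batch_size": [16, 32, 64],              # values stated in the text
    "dropout_1": [0.2, 0.3, 0.4, 0.5],       # assumed points between 0.2 and 0.5
    "dropout_2": [0.2, 0.3, 0.4, 0.5],
}


def iter_grid(space):
    """Yield every combination of hyperparameter values as a dict."""
    keys = list(space)
    for values in product(*(space[key] for key in keys)):
        yield dict(zip(keys, values))


def grid_search(space, score_fn):
    """Return the configuration with the highest score."""
    return max(iter_grid(space), key=score_fn)


def toy_score(config):
    """Placeholder score, NOT a real validation result."""
    return -(
        abs(math.log10(config["learning_rate"]) - math.log10(0.001))
        + abs(config["batch_size"] - 32) / 32
        + abs(config["dropout_1"] - 0.4)
        + abs(config["dropout_2"] - 0.3)
    )


best = grid_search(SEARCH_SPACE, toy_score)
print(len(list(iter_grid(SEARCH_SPACE))), best)
```

With this assumed grid there are 144 combinations. Because each one requires a full training run, the frozen backbones described earlier help keep an exhaustive search affordable.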

3.7. Performance Metrics

The performance of the proposed and comparative deep learning models was evaluated using a comprehensive set of metrics, including accuracy, precision, recall, F1-score, specificity, and the area under the receiver operating characteristic curve (AUC-ROC). These metrics offer a multidimensional perspective on model behavior, particularly under conditions of class imbalance and diagnostic uncertainty that are common in clinical settings. Additionally, training and validation curves for accuracy and loss across epochs were analyzed to assess each model’s learning dynamics and generalization capabilities.

In medical AI applications, selecting appropriate evaluation metrics is not merely a methodological choice but a clinical imperative. Metrics such as recall and specificity are critical for minimizing false negatives and false positives, respectively—both of which can have significant implications in diagnosis and treatment pathways. As demonstrated by Marengo et al., high AUC and F1 scores are indispensable in cardiovascular disease prediction tasks, as they reflect the model’s ability to balance sensitivity and specificity, a trade-off essential in screening contexts where misclassification carries high risk [29]. Similarly, Santamato et al. underscored the importance of using balanced accuracy and F1-score when evaluating deep learning models for COVID-19 and pneumonia, especially in real-world, variable conditions where class distributions and imaging quality may fluctuate [30].

These findings collectively affirm the necessity of multi-metric evaluation frameworks to ensure that AI models deployed in healthcare are not only accurate but also safe, interpretable, and equitable across diverse patient populations.

Accuracy: The most intuitive performance metric is the ratio of correctly predicted samples in relation to the total number of samples.

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}

(4)

Precision: The proportion of samples predicted as positive (true positives and false positives) that are actually positive.

Precision = \frac{TP}{TP + FP}

(5)

Recall: The proportion of actual positive instances (true positives and false negatives) that were correctly predicted as positive.

Recall = \frac{TP}{TP + FN}

(6)

F1-Score: The harmonic mean of precision and recall.

F1\text{-score} = 2 \times \frac{Precision \times Recall}{Precision + Recall}

(7)

Specificity: The ability of the model to accurately identify negative cases (true negatives) is a crucial metric in evaluating the performance of a model.

Specificity = \frac{TN}{TN + FP}

(8)

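AUC: This metric indicates the model’s overall capacity to differentiate between classes. In this study, the area under the curve (AUC) is calculated as the mean of sensitivity and specificity, which corresponds to the area under the ROC curve of a classifier evaluated at a single decision threshold.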
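The definitions in Equations (4) to (8) translate directly into code. The sketch below computes them from confusion-matrix counts and follows the text in taking the AUC as the mean of sensitivity and specificity. The counts in the example are hypothetical and are not results from this study.

```python
def _ratio(numerator, denominator):
    """Safe division that returns 0.0 for an empty denominator."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, tn, fp, fn):
    """Compute the evaluation metrics from confusion-matrix counts."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)          # also called sensitivity
    specificity = _ratio(tn, tn + fp)
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": _ratio(2 * precision * recall, precision + recall),
        # Mean of sensitivity and specificity (single-threshold AUC).
        "auc_single_threshold": (recall + specificity) / 2,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, tn=95, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A threshold-free AUC would instead be obtained by sweeping the decision threshold over the predicted probabilities and integrating the resulting ROC curve.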

4. Results

Model Performance

The experimental system was configured with high-performance hardware to support computationally demanding tasks. It featured 64 GB of RAM, ensuring smooth multitasking and efficient management of large-scale datasets while preventing performance bottlenecks during data-intensive operations. For storage, a 1 TB HDD (hard disk drive) was integrated, ensuring adequate storage capacity for housing extensive datasets, intermediate computational outputs, and experimental results.

Processing capabilities were driven by an 11th-generation Intel Core i7-11850H processor (Intel, Ankara, Türkiye) (base clock speed: 2.50 GHz), delivering the computational power necessary for complex simulations and real-time analytical workloads. To further enhance performance in parallelizable tasks, particularly machine learning workflows, the system incorporated an NVIDIA GeForce RTX 3070 GPU (NVIDIA, Ankara, Türkiye). This graphics processing unit accelerated matrix operations and neural network training through its parallel computing architecture and optimized CUDA cores, significantly reducing processing times for model iteration and data-driven inference tasks.

Several well-known pretrained architectures for identifying and diagnosing pneumonia were thoroughly tested on X-ray images using established evaluation standards. We employed VGG16, ResNet50, InceptionV3, DenseNet121, DenseNet201, MobileNetV2, and ViT as pretrained feature extractors, each combined with a sigmoid classifier trained with the Adam optimizer. The validation and testing performance metrics for recognizing pneumonia, including accuracy, precision, recall, and F1-score, are presented in Table 6.

Throughout all experiments, image enhancement techniques were applied to improve low-intensity contrast, and noise filtering was utilized, resulting in higher classification accuracy compared with the initial dataset. This study focused on the binary classification of pneumonia and healthy individuals using various pre-trained deep learning models, alongside the proposed ensemble model named PELM. Developed by combining InceptionV3, VGG16, ResNet50, and Vision Transformer (ViT), PELM leverages the complementary strengths of both convolutional and transformer-based architectures. Through feature-level concatenation, the ensemble balances high precision, sensitivity, and contextual feature extraction. As a result, PELM achieved 96% accuracy, 99% precision, 91% recall, and a 95% F1-score, outperforming all individual models. As shown in Table 7, these results suggest that PELM surpasses prior studies across most key metrics, reflecting its potential for real-world clinical implementation. The model’s strong performance, enhanced by preprocessing and architecture fusion, not only improves diagnostic accuracy but also reduces computational overhead, making it highly suitable for deployment in resource-constrained or time-sensitive clinical environments. Ultimately, PELM represents a reliable, efficient, and cost-effective solution for pneumonia detection from chest X-rays, contributing to faster diagnoses and improved patient outcomes.

Compared to previous studies, the proposed ensemble model PELM exhibits superior and more balanced performance in pneumonia detection using chest X-ray images. Earlier works, such as Thakur et al. [31] and Chhikara et al. [33], reported high precision scores (0.98 and 0.90, respectively) and recall values reaching 0.95; however, these studies were limited by relatively small datasets and incomplete metric reporting, notably omitting specificity and AUC. Liang et al. [34] achieved a strong recall of 0.96 but reported lower accuracy (0.90) and F1-score (0.92), which may impact clinical reliability. Similarly, Zech et al. [35] obtained 95% accuracy using DenseNet121 and ResNet50 on a large-scale dataset, but with a notably low specificity of 0.71—reducing its effectiveness in real-world screening scenarios.

More recent models proposed by Sharma et al. [5], Mabrouk et al. [37], and Wang et al. [28] demonstrated improved balance across metrics, with F1-scores ranging from 0.93 to 0.94. In contrast, PELM outperformed all these approaches by achieving 96% accuracy, 99% precision, 91% recall, 95% F1-score, 91% specificity, and an AUC of 0.91—validated on a substantially larger and more diverse dataset comprising 44,836 CXR images.

These results underscore the robustness and generalizability of the PELM architecture, as well as its clinical potential for scalable, accurate, and real-time pneumonia detection in diverse healthcare environments.

5. Discussion

The findings of this study demonstrate that deep learning models, particularly ensemble-based architectures, can effectively and accurately diagnose pneumonia from chest X-ray (CXR) images. The strong performance metrics observed—especially those associated with ResNet50, DenseNet121, and the proposed PELM ensemble—highlight the capacity of deep neural networks to learn and extract complex spatial, textural, and semantic features from medical imaging data. These results support the growing body of evidence that deep learning holds significant promise as a diagnostic aid in radiological workflows, particularly in high-burden and resource-constrained environments.

5.1. Strengths of Deep Learning Models

One of the most prominent strengths demonstrated in this study is the high diagnostic accuracy achieved by the PELM ensemble model, which attained 96% accuracy, 99% precision, 91% recall, 95% F1-score, and 91% specificity on a large-scale curated dataset. These results surpass those reported by previous studies using single-model approaches or limited data, underscoring the value of combining multiple architectures to harness their complementary strengths. The fusion of convolutional and transformer-based features allowed the model to capture both local texture patterns and global contextual cues—critical for detecting pneumonia, which can vary significantly in appearance and distribution.

The use of transfer learning, model fusion, and carefully designed preprocessing—including CLAHE-based contrast enhancement and clinically aware data augmentation—contributed significantly to performance and generalizability. These components helped mitigate the effects of dataset imbalance, image acquisition variability, and feature sparsity. Moreover, automation of pneumonia detection reduces dependency on subjective radiological interpretation, minimizes inter-observer variability, and lowers the potential for diagnostic errors in fast-paced clinical settings. Once deployed, these models are scalable, require minimal computational resources for inference, and can be integrated into existing picture archiving and communication systems (PACS) or electronic health record (EHR) platforms, facilitating broad clinical utility.

5.2. Limitations and Challenges

Despite the promising outcomes, this study has several limitations that warrant discussion. Although the dataset used is considerably larger and more diverse than in many prior works, it does not fully encompass the variability encountered in real-world clinical populations. For instance, the curated dataset may underrepresent certain demographic groups, age brackets, and comorbid conditions, which can influence radiographic presentation. Furthermore, only frontal PA-view images were used, excluding lateral views or follow-up studies that might offer additional diagnostic value. Future research should focus on evaluating the model’s performance across more heterogeneous, multi-institutional, and longitudinal datasets.

Another critical challenge lies in the interpretability of deep learning models. While ensemble methods improve classification accuracy, they often increase architectural complexity and obscure the decision-making process. In clinical practice, understanding the rationale behind a model’s prediction is essential for building trust among healthcare providers. To this end, integrating explainable AI (XAI) tools—such as saliency maps, Grad-CAM, or layer-wise relevance propagation—will be vital for providing transparent and interpretable outputs that can be validated by clinicians.

In addition, artificially balanced class distributions (1:1 pneumonia vs. non-pneumonia) and the exclusion of low-quality radiographs—such as those with motion artifacts, low contrast, or suboptimal positioning—may introduce bias and limit the model’s robustness in real-world clinical scenarios. While improving training stability, these preprocessing steps reduce the representation of actual clinical variability where pneumonia prevalence is often lower, and image quality may be compromised.
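The effect of prevalence mentioned above can be made concrete with Bayes' rule: for fixed sensitivity and specificity, precision (positive predictive value) falls as the disease becomes rarer. The sensitivity, specificity, and prevalence values below are hypothetical round numbers chosen for illustration and are not results from this study.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Precision expected at a given disease prevalence (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0.0:
        return 0.0
    return true_pos / (true_pos + false_pos)


# Hypothetical classifier: 90% sensitivity and 90% specificity.
for prevalence in (0.5, 0.1):
    ppv = positive_predictive_value(0.90, 0.90, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.2f}")
```

In this example the positive predictive value drops from about 0.90 on a balanced set to about 0.50 at 10% prevalence, which is why precision measured on a 1:1 dataset may overstate performance in screening populations.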

Finally, although PELM demonstrates strong performance under experimental conditions, further research is needed to evaluate its real-time deployment feasibility, integration into clinical workflows, and validation through prospective studies in actual healthcare environments.

6. Future Work

While the present study demonstrates the effectiveness of deep learning models—particularly the proposed PELM architecture—in detecting pneumonia from chest X-rays, several directions remain for future exploration to strengthen clinical utility and translational impact.

Integration with Clinical Workflows: A key area for future development is the seamless integration of AI-driven diagnostic tools into real-world hospital environments. This includes compatibility with hospital information systems (HIS), picture archiving and communication systems, and electronic health records. Embedding AI outputs directly into these systems will enable radiologists and clinicians to access model predictions during routine workflows, thereby supporting faster, more informed clinical decision-making with minimal operational disruption.

Real-World Validation: To ensure robust generalizability and clinical relevance, future studies should focus on large-scale, prospective validation across diverse institutions and patient populations. Evaluating model performance under real-world conditions—including variations in imaging equipment, acquisition settings, and patient comorbidities—will be essential for establishing trust and determining deployment readiness. Such validation will also help identify edge cases and ensure equitable model performance across demographic groups.

Interpretability and Explainability: Despite high predictive accuracy, deep learning models are often criticized for their lack of transparency. Enhancing model interpretability remains a crucial challenge, especially in high-stakes domains like medical imaging. Future work should incorporate explainable AI techniques, such as Layer-wise Relevance Propagation or counterfactual reasoning, to provide intuitive visual and textual explanations for each prediction. These tools can improve clinician trust, support regulatory approval, and facilitate collaborative human–AI decision-making.

Toward Clinical Translation: Addressing the above areas will be instrumental in closing the gap between research and clinical deployment. Another important direction involves conducting systematic ablation studies to quantify the specific contribution of each data augmentation technique to overall model performance. Such experiments will support evidence-based refinement of preprocessing strategies, helping optimize model generalization across diverse imaging conditions. The successful transfer of AI to radiology applications depends on algorithmic accuracy as well as reliability, usability, and system-level integration. By focusing on workflow alignment, real-world validation, and interpretability, future efforts can enable the adoption of scalable, reliable, and clinically meaningful AI tools for pneumonia detection and beyond.

7. Conclusions

This study demonstrates the substantial potential of deep learning techniques in enhancing pneumonia detection from chest X-ray (CXR) images. Through a comprehensive comparative analysis of several state-of-the-art architectures—including VGG16, ResNet50, DenseNet121, InceptionV3, MobileNetV2, and Vision Transformer (ViT)—as well as a novel ensemble framework (PELM), we established that deep learning models can significantly outperform traditional diagnostic methods in both accuracy and operational efficiency. The proposed PELM ensemble achieved 96% accuracy, 99% precision, 91% recall, and a 95% F1-score, setting a new benchmark for AI-driven pneumonia diagnosis.

Our findings emphasize that model architecture and feature representation strategies are critical to achieving robust performance. While individual models such as ResNet50 and ViT demonstrated strong generalization capabilities, the ensemble approach effectively combined their complementary strengths to improve reliability and diagnostic precision. Visual analyses—including training curves and ROC visualizations—further confirmed stable convergence and balanced sensitivity–specificity tradeoffs.

Despite these promising outcomes, certain limitations persist. The curated dataset, though large and diverse relative to previous studies, may not fully represent the heterogeneity of real-world clinical populations. Future studies should therefore prioritize validation on multi-institutional, demographically varied datasets. Moreover, enhancing model interpretability remains an essential challenge for clinical adoption. Transparent prediction rationale is critical for building clinician trust and ensuring responsible deployment.

Future research will integrate AI models into clinical workflows via real-time PACS/EHR compatibility, conduct prospective real-world studies, and advance explainable AI (XAI) for model interpretability. The methodologies also provide a foundation for expanding automated diagnostics to other pulmonary conditions like tuberculosis or chronic obstructive pulmonary disease (COPD).

In summary, this work underscores the transformative role of deep learning in radiology and highlights its potential to augment clinical decision-making, streamline diagnostic workflows, and ultimately improve patient outcomes through faster and more accurate pneumonia detection.

8. Innovative Contribution

This study presents several key innovations in the field of AI-assisted medical diagnostics, particularly in the automated detection of pneumonia from chest X-ray images. Foremost among these is the introduction of PELM, a novel ensemble model that integrates the complementary strengths of four state-of-the-art deep learning architectures—InceptionV3, VGG16, ResNet50, and Vision Transformer. By leveraging the distinct representational capabilities of both convolutional and transformer-based models, PELM achieves outstanding performance across all major evaluation metrics, including 96% accuracy, 99% precision, 91% recall, 95% F1-score, 91% specificity, and an AUC of 0.91.

A major differentiator of this work is its comprehensive evaluation on a large and diverse dataset comprising 44,836 PA-view CXR images—far exceeding the scale of most prior studies. This significantly improves the model’s generalizability and enhances its clinical relevance. The study also introduces an optimized preprocessing pipeline that incorporates noise reduction and contrast enhancement using CLAHE, facilitating superior feature extraction and improving robustness under varied image quality conditions.

PELM consistently maintains a balanced trade-off between sensitivity and specificity, which is critical in clinical settings where both false negatives and false positives carry significant risks. The ensemble architecture not only boosts classification performance but also ensures adaptability to heterogeneous imaging patterns commonly found in real-world datasets.

Beyond technical contributions, this study addresses critical deployment considerations such as integration with clinical information systems, potential for prospective real-world validation, and scalability to additional thoracic diseases. As such, it establishes a forward-looking framework that positions PELM as a practical, modular, and scalable solution for future integration into AI-powered radiology workflows.

Author Contributions

E.Y.: methodology, software, data curation, validation, formal analysis, investigation, visualization, writing—original draft preparation, writing—review and editing; F.H.: conceptualization, validation, writing—review and editing, supervision, project administration; K.A.: validation, writing—original draft preparation, writing—review and editing, formal analysis. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by ASELSAN Inc.

Institutional Review Board Statement

The study did not require ethical approval.

Informed Consent Statement

Not applicable.

Data Availability Statement

The authors used an open access dataset that is available from Kaggle CXR Pneumonia: https://www.kaggle.com/datasets/paultimothymooney/chest-xray-pneumonia (accessed on 24 May 2025); PadChest: https://bimcv.cipf.es/bimcv-projects/padchest (accessed on 24 May 2025); CXR-NIH: https://www.kaggle.com/datasets/nih-chest-xrays/data (accessed on 24 May 2025); CheXpert: https://aimi.stanford.edu/datasets/chexpert-chest-x-rays (accessed on 24 May 2025).

Acknowledgments

We thank the referee for his comprehensive review and appreciate his comments, corrections, and suggestions, which contributed significantly to improving the quality of the publication.

Conflicts of Interest

The authors declare that this study received funding from ASELSAN Inc. The funder was not involved in the study design, collection, analysis, interpretation of data, the writing of this article or the decision to submit it for publication.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial Intelligence
AUC	Area Under The Curve
CLAHE	Contrast-Limited Adaptive Histogram Equalization
CNN	Convolutional Neural Networks
CT	Computed Tomography
CTR	Cardiothoracic Ratio
DL	Deep Learning
LSTM	Long Short-Term Memory
MAC	Multiply-Add Operations
MRI	Magnetic Resonance Imaging
NIH	National Institutes Of Health
PELM	Pneumonia Ensemble Learning Model
ResNet50	Residual Network 50
ViT	Vision Transformer

References

Maurer, J.R. Long-term exposure to ambient air pollution and risk of hospitalization with community-acquired pneumonia in older adults. Yearb. Pulm. Dis. 2011, 2011, 150–152.
Sajed, S.; Sanati, A.; Garcia, J.E.; Rostami, H.; Keshavarz, A.; Teixeira, A. The effectiveness of deep learning vs. traditional methods for lung disease diagnosis using chest X-ray images: A systematic review. Appl. Soft Comput. 2023, 147, 110817.
WHO. Pneumonia “Pneumonia in Children”. 2021. Available online: https://www.who.int/news-room/fact-sheets/detail/pneumonia (accessed on 24 May 2025).
Aljawarneh, S.A.; Al-Quraan, R. Pneumonia detection using enhanced convolutional neural network model on chest X-ray images. Big Data 2025, 13, 16–29.
Sharma, S.; Guleria, K. A deep learning based model for the detection of pneumonia from chest X-ray images using VGG-16 and neural networks. Procedia Comput. Sci. 2023, 218, 357–366.
Campbell, H.; el Arifeen, S.; Hazir, T.; O’Kelly, J.; Bryce, J.; Rudan, I.; Qazi, S.A. Measuring coverage in MNCH: Challenges in monitoring the proportion of young children with pneumonia who receive antibiotic treatment. PLoS Med. 2013, 10, e1001421.
Kanwal, K.; Asif, M.; Khalid, S.G.; Liu, H.; Qurashi, A.G.; Abdullah, S. Current diagnostic techniques for pneumonia: A scoping review. Sensors 2024, 24, 4291.
Kareem, A.; Liu, H.; Sant, P. Review on pneumonia image detection: A machine learning approach. Hum.-Centric Intell. Syst. 2022, 2, 31–43.
Htun, T.P.; Sun, Y.; Chua, H.L.; Pang, J. Clinical features for diagnosis of pneumonia among adults in primary care setting: A systematic and meta-review. Sci. Rep. 2019, 9, 7600.
Ahmadova, A.; Huseynov, I.; Ibrahimov, Y. Improving pneumonia diagnosis with RadImageNet: A deep transfer learning approach. Authorea 2023, 8, 25.
Prakash, J.A.; Asswin, C.; Ravi, V.; Sowmya, V.; Soman, K. Pediatric pneumonia diagnosis using stacked ensemble learning on multi-model Deep CNN Architectures. Multimed. Tools Appl. 2022, 82, 21311–21351.
Shirwaikar, R. A machine learning application for medical image analysis using deep convolutional neural networks (cnns) and transfer learning models for pneumonia detection. J. Electr. Syst. 2024, 20, 2316–2324.
Rashed, B.M.; Popescu, N. Performance investigation for Medical Image Evaluation and diagnosis using machine-learning and deep-learning techniques. Computation 2023, 11, 63.
Van Ginneken, B.; Romeny, B.M.T.H.; Viergever, M.A. Computer-aided diagnosis in chest radiography: A survey. IEEE Trans. Med. Imaging 2001, 20, 1228–1241.
El Asnaoui, K. Design ensemble deep learning model for pneumonia disease classification. Int. J. Multimed. Inf. Retr. 2021, 10, 55–68.
Rahman, T.; Chowdhury, M.E.H.; Khandakar, A.; Islam, K.R.; Islam, K.F.; Mahbub, Z.B.; Kadir, M.A.; Kashem, S. Transfer Learning with Deep Convolutional Neural Network (CNN) for Pneumonia Detection Using Chest X-ray. Appl. Sci. 2020, 10, 3233.
Hossain, S.; Rahman, R.; Ahmed, M.S.; Islam, M.S. Pneumonia detection by analyzing XRAY images using mobilenet, resnet architecture and long short term memory. In Proceedings of the 2020 30th International Conference on Computer Theory and Applications (ICCTA), Alexandria, Egypt, 12–14 December 2020; pp. 60–64.
Ahmad, J.; Saudagar, A.K.; Malik, K.M.; Ahmad, W.; Khan, M.B.; Hasanat, M.H.; AlTameem, A.; AlKhathami, M.; Sajjad, M. Disease progression detection via deep sequence learning of successive radiographic scans. Int. J. Environ. Res. Public Health 2022, 19, 480.
Irvin, J.; Rajpurkar, P.; Ko, M.; Yu, Y.; Ciurea-Ilcus, S.; Chute, C.; Marklund, H.; Haghgoo, B.; Ball, R.; Shpanskaya, K.; et al. Chexpert: A large chest radiograph dataset with uncertainty labels and expert comparison. Proc. AAAI Conf. Artif. Intell. 2019, 33, 590–597.
An, Q.; Chen, W.; Shao, W. A deep convolutional neural network for pneumonia detection in X-ray images with attention ensemble. Diagnostics 2024, 14, 390.
Pacal, I.; Kılıcarslan, S. Deep learning-based approaches for robust classification of cervical cancer. Neural Comput. Appl. 2023, 35, 18813–18828.
Darici, M.B.; Dokur, Z.; Olmez, T. Pneumonia detection and classification using deep learning on chest X-ray images. Int. J. Intell. Syst. Appl. Eng. 2020, 8, 177–183.
Varshni, D.; Thakral, K.; Agarwal, L.; Nijhawan, R.; Mittal, A. Pneumonia detection using CNN based feature extraction. In Proceedings of the 2019 IEEE International Conference on Electrical, Computer and Communication Technologies (ICECCT), Coimbatore, India, 20–22 February 2019; pp. 1–7.
Wang, X.; Peng, Y.; Lu, L.; Lu, Z.; Bagheri, M.; Summers, R.M. Chestx-Ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 3462–3471.
CXR-PadChest. Available online: https://bimcv.cipf.es/bimcv-projects/padchest (accessed on 24 May 2025).
Kermany, D.S.; Goldbaum, M.; Cai, W.; Valentim, C.C.S.; Liang, H.; Baxter, S.L.; McKeown, A.; Yang, G.; Wu, X.; Yan, F.; et al. Identifying medical diagnoses and treatable diseases by image-based Deep Learning. Cell 2018, 172, 1122–1131.e9. [Google Scholar] [CrossRef] [PubMed]
Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Li, F.-F. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar] [CrossRef]
Wang, K.; Jiang, P.; Meng, J.; Jiang, X. Attention-based DenseNet for pneumonia classification. Irbm 2022, 43, 479–485. [Google Scholar] [CrossRef]
Marengo, A.; Pagano, A.; Santamato, V. An efficient cardiovascular disease prediction model through AI-driven IOT Technology. Comput. Biol. Med. 2024, 183, 109330. [Google Scholar] [CrossRef]
Santamato, V.; Tricase, C.; Faccilongo, N.; Iacoviello, M.; Pange, J.; Marengo, A. Machine learning for evaluating hospital mobility: An Italian case study. Appl. Sci. 2024, 14, 6016. [Google Scholar] [CrossRef]
Thakur, S.; Goplani, Y.; Arora, S.; Upadhyay, R.; Sharma, G. Chest X-ray images based automated detection of pneumonia using transfer learning and CNN. In Proceedings of International Conference on Artificial Intelligence and Applications, New Delhi, India, 6–7 February 2020; Advances in Intelligent Systems and Computing. Springer: Berlin/Heidelberg, Germany, 2020; pp. 329–335. [Google Scholar] [CrossRef]
Jain, R.; Nagrath, P.; Kataria, G.; Sirish Kaushik, V.; Jude Hemanth, D. Pneumonia detection in chest x-ray images using convolutional neural networks and transfer learning. Measurement 2020, 165, 108046. [Google Scholar] [CrossRef]
Chhikara, P.; Singh, P.; Gupta, P.; Bhatia, T. Deep convolutional neural network with transfer learning for detecting pneumonia on chest X-rays. In Advances in Bioinformatics, Multimedia, and Electronics Circuits and Signals; Advances in Intelligent Systems and Computing; Springer: Berlin/Heidelberg, Germany, 2019; pp. 155–168. [Google Scholar] [CrossRef]
Liang, G.; Zheng, L. A transfer learning method with deep residual network for pediatric pneumonia diagnosis. Comput. Programs Biomed. 2020, 187, 104964. [Google Scholar] [CrossRef] [PubMed]
Zech, J.R.; Badgeley, M.A.; Liu, M.; Costa, A.B.; Titano, J.J.; Oermann, E.K. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study. PLOS Med. 2018, 15, e1002683. [Google Scholar] [CrossRef]
Bhatt, H.; Shah, M. A convolutional neural network ensemble model for pneumonia detection using chest X-ray images. Healthc. Anal. 2023, 3, 100176. [Google Scholar] [CrossRef]
Mabrouk, A.; Díaz Redondo, R.P.; Dahou, A.; Abd Elaziz, M.; Kayed, M. Pneumonia detection on chest X-ray images using ensemble of deep convolutional Neural Networks. Appl. Sci. 2022, 12, 6448. [Google Scholar] [CrossRef]

Figure 1. CXR-NIH dataset samples: (a) “Infiltration and Pneumonia”, (b) “No Finding”, (c) “Emphysema and Fibrosis”, (d) “Mass” [24].

Figure 2. CheXpert dataset samples: (a) “Pneumonia”, (b) “Normal”, (c) “Enlarged Cardiomediastinum”, (d) “Lung Opacity” [19].

Figure 3. PadChest dataset samples: (a) “Increased density, Apical Pleural Thickening, Fibrotic Band, Pneumonia, Volume Loss, Bullas”, (b) “Normal”, (c) “COPD Signs, Pulmonary Mass, Soft Tissue Mass”, (d) “Fibrotic Band” [25].

Figure 4. Kaggle CXR Pneumonia dataset samples: (a) “Normal”, (b) “Bacterial Pneumonia”, (c) “Viral Pneumonia” [26].

Figure 5. Data augmentation samples.

Figure 6. The structure of the adapted VGG16-based model.

Figure 9. Details of ViT architecture.

Figure 10. Details of the proposed ensemble learning model PELM.

Table 1. Overview of the source datasets.

	CXR NIH	CheXpert	PadChest	Kaggle CXR Pneumonia
Source	NIH Clinical Center, USA	Stanford University, Stanford ML Group	San Juan Hospital, Alicante, Spain	Guangzhou Women and Children’s Medical Center, China
Images	112,120 frontal-view X-ray images from 30,805 patients	224,316 chest radiographs from 65,240 patients	160,000+ CXR images from 67,000 patients	5863 PA-view pediatric chest X-rays (JPEG format)
Label Method	14 thoracic pathologies mined using NLP from radiology reports	14 observations (e.g., pneumonia, edema) with uncertainty-aware label extraction from reports	174 radiological findings, 104 anatomical locations, 19 differential diagnoses	Physician consensus with third-expert adjudication; bacterial and viral pneumonia classes included

Table 2. Class distribution of the source datasets.

	Kaggle CXR Pneumonia	CXR NIH	PadChest	CheXpert
Pneumonia	4273	1431	8174	6047
Normal	1583	60,361	50,684	22,419
Other Findings	-	50,328	102,003	195,182
Total	5856	112,120	160,861	223,648

Table 3. Final image distribution after filtering and balancing.

Dataset	Pneumonia	Normal/Other Findings	Total Used	Quality Filtered	Notes
PadChest	3000	3000	6000	~2000	Downsampled for balance
CXR NIH	1400	1400	2800	~0	Nearly all pneumonia cases used
Kaggle CXR Pneumonia	400	400	800	~56	Small dataset, mostly retained
CheXpert	20,200	20,200	40,400	~5600	Majority source filtered and balanced
Total	25,000	25,000	50,000	~7656	After filtering + 1:1 balancing
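
The 1:1 balancing summarized in Table 3 can be illustrated with a minimal sketch. It assumes plain random downsampling of the larger class, which matches the "Downsampled for balance" note but may not be the exact procedure used in the study. The function name `balance_one_to_one`, the image identifiers, and the size of the negative pool are hypothetical.

```python
import random


def balance_one_to_one(pneumonia_ids, other_ids, seed=0):
    """Randomly downsample the larger class so both classes have equal size."""
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    n = min(len(pneumonia_ids), len(other_ids))
    return rng.sample(pneumonia_ids, n), rng.sample(other_ids, n)


# Hypothetical identifiers: 3000 pneumonia images against a larger negative pool,
# mirroring the PadChest row of Table 3 (the pool size of 10,000 is invented).
pneumonia = [f"pneu_{i}" for i in range(3000)]
negatives = [f"neg_{i}" for i in range(10000)]

kept_pos, kept_neg = balance_one_to_one(pneumonia, negatives)
print(len(kept_pos), len(kept_neg))  # prints: 3000 3000
```

Sampling without replacement keeps every retained image unique, so no negative case is counted twice in the balanced subset.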

Table 4. Augmentation settings and clinical purpose.

Augmentation	Range/Setting	Clinical Purpose
CLAHE	clipLimit = 2.0, grid = 8 × 8	Enhances local contrast in low-density lung regions, improving feature visibility.
Horizontal Flip	50% chance	Simulates left/right orientation variability without compromising diagnostic integrity.
Translation	Max ±12 pixels (approximately 5%)	Models minor patient positioning variability during image acquisition.
Rotation	Max ±5°	Preserves heart silhouette stability with slight orientation tolerance.
Zoom	95–105%	Simulates small magnification changes without introducing anatomical distortion.
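
The settings in Table 4 can be expressed as a small configuration object. The sketch below only samples per-image augmentation parameters within the stated ranges; it does not apply them to pixels, and the names `AugmentationConfig` and `sample_augmentation` are hypothetical rather than taken from the authors' code. The CLAHE values are carried as constants because CLAHE is a deterministic enhancement, not a random transform.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class AugmentationConfig:
    """Ranges copied from Table 4; field names are illustrative."""
    clahe_clip_limit: float = 2.0
    clahe_grid: tuple = (8, 8)
    flip_probability: float = 0.5
    max_shift_px: int = 12
    max_rotation_deg: float = 5.0
    zoom_range: tuple = (0.95, 1.05)


def sample_augmentation(cfg, rng):
    """Draw one set of augmentation parameters inside the configured limits."""
    return {
        "flip": rng.random() < cfg.flip_probability,
        "shift_x": rng.randint(-cfg.max_shift_px, cfg.max_shift_px),
        "shift_y": rng.randint(-cfg.max_shift_px, cfg.max_shift_px),
        "rotation_deg": rng.uniform(-cfg.max_rotation_deg, cfg.max_rotation_deg),
        "zoom": rng.uniform(*cfg.zoom_range),
    }


cfg = AugmentationConfig()
params = sample_augmentation(cfg, random.Random(42))
print(params)
```

The sampled dictionary would then be passed to whatever image library performs the actual flip, shift, rotation, and zoom; keeping sampling separate from application makes the limits easy to audit against the table.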

Table 5. Distribution of the augmented dataset.

	Training	Validation	Test
Pneumonia	20,000	2500	2500
Normal	18,000	2250	2250
Other Findings	2000	250	250
Total	40,000	5000	5000
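
The counts in Table 5 are consistent with an 80/10/10 split that preserves class proportions in every partition. The following sketch shows one way such a stratified split could be produced; whether the study used this exact procedure is an assumption, and `stratified_split` is a hypothetical helper.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split sample indices into train/val/test while keeping class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    splits = {"train": [], "val": [], "test": []}
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        splits["train"] += indices[:n_train]
        splits["val"] += indices[n_train:n_train + n_val]
        splits["test"] += indices[n_train + n_val:]  # remainder goes to test
    return splits


# Class totals obtained by summing each row of Table 5.
labels = ["pneumonia"] * 25000 + ["normal"] * 22500 + ["other"] * 2500
splits = stratified_split(labels)

for name, indices in splits.items():
    print(name, dict(Counter(labels[i] for i in indices)))
```

With these class totals the per-split counts reproduce the rows of Table 5 (for example, 20,000 pneumonia images in training and 2500 each in validation and test).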

Table 6. Performance comparison of the proposed PELM model and the individual baseline architectures in pneumonia detection.

Model	Accuracy	Precision	Recall	F1-Score	Specificity	AUC
PELM (Proposed Model)	0.96	0.99	0.91	0.95	0.91	0.91
ResNet50	0.95	0.95	0.85	0.90	0.77	0.81
DenseNet201	0.93	0.92	0.81	0.86	0.73	0.77
MobileNetV2	0.93	0.94	0.83	0.88	0.76	0.88
VGG16	0.93	0.95	0.85	0.90	0.77	0.81
InceptionV3	0.92	0.97	0.87	0.92	0.83	0.85
DenseNet121	0.92	0.97	0.88	0.92	0.82	0.85
ViT	0.92	0.97	0.88	0.92	0.82	0.85
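
For reference, the threshold-based metrics in Table 6 follow the standard confusion-matrix definitions. The sketch below uses hypothetical counts (not the study's confusion matrix) and also checks that the reported F1-score of PELM is consistent with its reported precision and recall. AUC is omitted because it requires per-image scores rather than counts.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics; assumes every denominator is non-zero."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": f1_score(precision, recall),
    }


# Hypothetical counts chosen only to illustrate the formulas.
metrics = classification_metrics(tp=90, fp=20, tn=80, fn=10)
print({name: round(value, 3) for name, value in metrics.items()})

# Consistency check against the PELM row: precision 0.99, recall 0.91.
print(round(f1_score(0.99, 0.91), 2))  # prints: 0.95
```

The same check applied to the ResNet50 row (precision 0.95, recall 0.85) gives 0.90, matching the table.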

Table 7. A comparison of previously published methods for pneumonia detection.

Authors	Classes	Methods	Image Number	Accuracy	Precision	Recall	F1-Score	Specificity	AUC
(Thakur et al., 2020) [31]	Normal, Pneumonia	Pretrained VGG16	5856	0.90	0.98	-	0.93	-	-
(Jain et al., 2020) [32]	Normal, Pneumonia	VGG19, ResNet50, VGG16, two customized models, and InceptionV3	5856	0.80	-	-	-	-	-
(Chhikara et al., 2019) [33]	Normal, Pneumonia	Modified InceptionV3, Pre-processing	5856	0.90	0.90	0.95	0.93	-	0.91
(Liang et al., 2020) [34]	Normal, Pneumonia	CNN with residual DL architecture	5856	0.90	0.89	0.96	0.92	-	0.95
(Zech et al., 2018) [35]	Normal, Pneumonia	DenseNet121, ResNet50	158,323	0.95	0.93	-	-	0.71	0.93
(Sharma et al., 2023) [5]	Normal, Pneumonia	VGG16 with CNN	5856	0.92	0.94	0.93	0.93	-	0.97
(Bhatt et al., 2023) [36]	Normal, Pneumonia	CNN, Ensemble	5863	0.84	0.80	0.99	0.88	-	0.93
(Mabrouk et al., 2022) [37]	Normal, Pneumonia	DenseNet169, MobileNetV2, and Vision Transformer	5856	0.94	0.94	0.93	0.93	-	-
(Wang et al., 2022) [28]	Normal, Pneumonia	Modified DenseNet	5857	0.93	0.92	0.96	0.94	-	-
PELM (Proposed Model)	Normal, Pneumonia	CNN, InceptionV3, VGG16, ResNet50	44,836	0.96	0.99	0.91	0.95	0.91	0.91

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Yanar, E.; Hardalaç, F.; Ayturan, K. PELM: A Deep Learning Model for Early Detection of Pneumonia in Chest Radiography. Appl. Sci. 2025, 15, 6487. https://doi.org/10.3390/app15126487

