Interpretable Deep Learning for Pediatric Pneumonia Diagnosis Through Multi-Phase Feature Learning and Activation Patterns

Radočaj, Petra; Martinović, Goran

doi:10.3390/electronics14091899

by Petra Radočaj ¹,* and Goran Martinović ²

¹ Layer d.o.o., Vukovarska Cesta 31, 31000 Osijek, Croatia
² Faculty of Electrical Engineering, Computer Science and Information Technology, Josip Juraj Strossmayer University of Osijek, Kneza Trpimira 2B, 31000 Osijek, Croatia
* Author to whom correspondence should be addressed.

Electronics 2025, 14(9), 1899; https://doi.org/10.3390/electronics14091899

Submission received: 29 March 2025 / Revised: 30 April 2025 / Accepted: 6 May 2025 / Published: 7 May 2025

(This article belongs to the Section Computer Science & Engineering)

Abstract

Pediatric pneumonia remains a critical global health challenge requiring accurate and interpretable diagnostic solutions. Although deep learning has shown potential for pneumonia recognition on chest X-ray images, gaps persist in understanding model interpretability and feature learning during training. We evaluated four convolutional neural network (CNN) architectures, i.e., InceptionV3, InceptionResNetV2, DenseNet201, and MobileNetV2, using three approaches—standard convolution, multi-scale convolution, and strided convolution—all incorporating the Mish activation function. Among the tested models, InceptionResNetV2, with strided convolutions, demonstrated the best performance, achieving an accuracy of 0.9718. InceptionV3 also performed well using the same approach, with an accuracy of 0.9684. For DenseNet201 and MobileNetV2, the multi-scale convolution approach was more effective, with accuracies of 0.9676 and 0.9437, respectively. Gradient-weighted class activation mapping (Grad-CAM) visualizations provided critical insights, e.g., multi-scale convolutions identified diffuse viral pneumonia patterns across wider lung regions, while strided convolutions precisely highlighted localized bacterial consolidations, aligning with radiologists’ diagnostic priorities. These findings establish the following architectural guidelines: strided convolutions are suited to deep hierarchical CNNs, while multi-scale approaches optimize compact models. This research significantly advances the development of interpretable, high-performance diagnostic systems for pediatric pneumonia using chest X-rays, bridging the gap between computational innovation and clinical application.

Keywords:

pediatric pneumonia; convolutional neural networks; Mish activation function; multi-scale convolution; strided convolution; model interpretability; feature extraction

1. Introduction

Pediatric pneumonia remains a leading cause of morbidity and mortality worldwide among children under five years old [1], with the highest burden observed in low- and middle-income countries (LMICs) [2]. According to the World Health Organization, pneumonia causes 14% of deaths in children aged five and under, resulting in an estimated 740,000 fatalities annually [3]. Globally, the disease contributes to 10–20 million hospitalizations each year, with an incidence of 150–156 million cases, of which over 80% occur in LMICs [1,4]. Regions such as South Asia and Sub-Saharan Africa are disproportionately affected, with pneumonia being a leading cause of childhood mortality and a significant contributor to healthcare system strain [4]. Viral pathogens remain the main cause of pneumonia, whereas bacterial pneumonia cases have declined significantly since the introduction of vaccines against Streptococcus pneumoniae and Haemophilus influenzae [5,6,7]. Nevertheless, S. pneumoniae and Mycoplasma pneumoniae remain the leading bacterial pneumonia pathogens affecting vaccinated pediatric patients beyond the neonatal stage [8,9]. Clinically, pediatric pneumonia presents with cough, fever, rapid breathing, and signs of respiratory difficulty, often accompanied by systemic manifestations such as fatigue, vomiting, and decreased appetite [10,11]. Severe cases can lead to complications such as pleural effusion, hypoxemia, and respiratory failure, necessitating prompt diagnosis and intervention [12,13]. Chest X-rays are crucial for diagnosing pediatric pneumonia by visualizing key lung pathologies such as consolidation, interstitial infiltrates, and pleural effusion. However, interpreting chest X-rays remains a challenging task for radiologists, especially when assessing early-stage infections or when clinical symptoms overlap with those of other respiratory conditions [14,15].
Moreover, access to radiologists and advanced imaging technologies varies, along with the ability to perform feature extraction [16]. CNNs excel at identifying complex patterns in imaging data, making them particularly well-suited for tasks such as pneumonia recognition in chest X-rays. However, the “black box” nature of these models has historically limited their clinical adoption, as healthcare providers require interpretable and transparent decision-making processes to trust and effectively utilize AI-driven tools [17,18,19]. This challenge has spurred the development of visualization techniques, such as Grad-CAM, which provide insights into the regions of an image that influence a model’s predictions [20]. By generating heatmaps that highlight areas of interest, Grad-CAM enables clinicians to understand how a CNN arrives at its conclusions, thereby bridging the gap between AI and clinical practice [21]. The integration of Grad-CAM into deep learning frameworks for medical image analysis offers several advantages. First, it improves model transparency by visually delineating the most influential regions in an image, such as areas of consolidation or interstitial infiltrates in pneumonia cases. This interpretability fosters clinician trust and enables the identification of potential biases or errors, allowing for iterative model refinement [22,23]. Second, Grad-CAM supports the validation of model predictions by ensuring alignment between highlighted regions and established clinical features, reinforcing the model’s focus on biologically relevant areas [24,25]. Lastly, these visualization techniques serve as valuable educational tools, aiding less experienced clinicians in recognizing the subtle radiographic manifestations of pneumonia that might otherwise be overlooked [26].

Despite significant advancements in deep learning for medical image analysis, the application of deep learning visualization techniques to pneumonia classification, particularly in pediatric cases, remains underexplored. A comparative analysis of research trends from 2015 to 2024 indicates a substantial increase in studies focused on deep learning visualization in medicine, reaching a peak of approximately 450 publications in 2021. While the Web of Science Core Collection [27] indexes a substantial volume of scientific publications, research specifically focused on pneumonia represents a comparatively minor subset, peaking at 22 publications in 2021 and subsequently exhibiting a decline, as quantitatively depicted in Figure 1. This trend underscores a critical research gap in the application of deep learning visualization for pneumonia classification, particularly in pediatric populations. While established visualization methodologies such as Grad-CAM have been integrated into emerging research to enhance pneumonia recognition using CNNs, the interpretability of CNN models in this domain remains insufficiently investigated. Key areas requiring further exploration include feature activation patterns and the optimization of training efficiency.

Several research gaps persist in the application of deep learning for pediatric pneumonia classification. First, there is limited understanding of how CNNs process and interpret images during various phases of training [22,26]. Identifying the stages of training that most significantly contribute to learning and analyzing the evolution of feature activation over time could provide valuable insights for optimizing model performance and interpretability. This is particularly relevant for pediatric pneumonia, where subtle radiographic patterns necessitate precise feature extraction and analysis. Second, while advanced activation functions such as Mish have demonstrated superior results regarding smoothness, as well as improved gradient flow capabilities compared to those of traditional functions like ReLU [28], their potential in pediatric pneumonia diagnosis remains largely untapped. The ability of Mish to recognize complex features within chest radiographs has not been fully leveraged, revealing opportunities for improvements in diagnostic accuracy and computational efficiency. Third, although visualization techniques like Grad-CAM have been applied to medical imaging, their use in analyzing feature activation patterns across diverse CNN architectures—such as standard CNNs, multi-scale CNNs, and strided CNNs—has not been systematically explored. Understanding how these architectures differ in their capacity to highlight clinically relevant features in pediatric pneumonia cases is essential for developing more interpretable and reliable models. Finally, there is a pressing need for research that balances diagnostic accuracy with computational efficiency, particularly for deployment in resource-constrained settings where pediatric pneumonia is most prevalent. Many existing models are computationally intensive, limiting their applicability in low-resource environments [29]. 
Addressing these gaps is critical for advancing the field of pediatric pneumonia diagnosis and ensuring the development of effective and accessible AI-driven tools. This research provides valuable additions to pediatric pneumonia diagnostic methods and deep learning in the following ways:

It provides a detailed analysis of how CNNs perceive and process images during different training phases, identifying critical learning stages that enhance model performance and interpretability in pediatric pneumonia classification.
By evaluating three distinct CNN architectures—standard, multi-scale, and strided—this research offers a comprehensive comparison of their strengths and limitations, guiding the selection of optimal models for pediatric pneumonia diagnosis.
Leveraging the Mish activation function and Grad-CAM visualization, this work enhances model transparency and diagnostic accuracy, enabling clinicians to better understand and trust AI-driven tools for pediatric pneumonia recognition.

Following this introduction, the paper proceeds as follows: Section 2 reviews the relevant literature. Section 3 provides a detailed description of the models and methodologies used. Section 4 presents and analyzes the key results. Finally, Section 5 concludes the paper with summary remarks and suggestions for future work.

2. Related Works

Advances in deep learning, particularly CNNs, have substantially enhanced the accuracy and interpretability of medical image analysis, particularly in the context of pediatric pneumonia diagnosis.

Luján-García et al. [30] utilized deep learning for pneumonia diagnosis in children under five, employing transfer learning with the pre-trained Xception network to classify 3883 pneumonia and 1349 normal chest X-ray images. The model achieved a precision of 0.84, a recall of 0.99, an F1-score of 0.91, and an AUC of 0.97, demonstrating competitive performance. They applied Grad-CAM to generate heatmaps, enabling the localization of pneumonia-related abnormalities. The study underscores the efficacy of transfer learning and visualization techniques in enhancing deep learning-based medical image analysis. Panwar et al. [31] developed a deep transfer learning model to enhance the accuracy and interpretability of COVID-19 detection using chest X-ray and CT-scan images. They integrated Grad-CAM for infection localization and implemented an early stopping mechanism to mitigate overfitting. The model achieved accuracies of 0.9655 in COVID-19 detection, 0.9404 in distinguishing COVID-19 from non-COVID cases, and 0.8947 in differentiating COVID-19 from normal cases, demonstrating the efficacy of deep learning in reliable and interpretable COVID-19 diagnosis. Zebin and Rezvy [32] developed a transfer learning pipeline for automated COVID-19 classification using chest X-ray images from publicly available datasets. They employed multiple pre-trained convolutional backbones as feature extractors, achieving classification accuracies of 0.90 with VGG16, 0.94 with ResNet50, and 0.97 with EfficientNetB0. To address data imbalance, they trained a CycleGAN for the synthetic augmentation of COVID-19 cases. Additionally, they implemented Grad-CAM to enhance model interpretability, enabling the visualization of affected lung regions for diagnosis and monitoring disease progression. Rahman et al. [33] proposed a VGG-16-based deep learning framework for explainable COVID-19 and pneumonia classification using chest X-ray images.
They incorporated image enhancement, ROI segmentation, and data augmentation to improve accuracy. Additionally, they introduced a multi-layer gradient-weighted class activation mapping (ML-Grad-CAM) algorithm to generate class-specific saliency maps and a severity assessment index (SAI) to quantify infection severity. Their model achieved an accuracy of 0.9644 in a three-class classification task, demonstrating the potential of saliency maps for both diagnostic interpretation and severity assessment. Mohagheghi et al. [34] proposed two methods for COVID-19 diagnosis and differentiation from viral pneumonia using X-ray images. They employed deep neural networks for classification and an image retrieval approach for discrimination, both trained on healthy, pneumonia, and COVID-19 cases. Transfer learning and hashing functions enhanced the performance, achieving an accuracy of 0.97 for CNN-based classification and an overall precision of 0.87 for retrieval. Additionally, they introduced a decision support system integrating image retrieval and visualization techniques, including CT involvement score calculation, to provide physicians with interpretable diagnostic insights. Owais et al. [35] developed a lightweight deep learning ensemble for COVID-19 diagnosis using CT-scan and X-ray images, incorporating MobileNet, ShuffleNet, and FCNet to reduce the trainable parameters to 3.16 million. They introduced a multilevel class activation mapping (ML-CAM) layer to enhance lesion visualization, facilitating radiologist-assisted validation. A novel hierarchical training procedure dynamically adjusted epochs based on validation performance, optimizing model convergence. The proposed model achieved F1-scores of 0.9460 (CT) and 0.9594 (X-ray), with AUCs of 0.9750 and 0.9799, respectively, demonstrating outstanding diagnostic accuracy and computational efficiency.

While these studies point to the significant capabilities of deep learning in medical image analysis, there remains a need to investigate how CNNs perceive images during different training phases, particularly in the context of pediatric pneumonia classification. Understanding which training stages contribute the most to learning and how feature activation evolves over time can provide valuable insights into optimizing model performance and interpretability. This study aims to address these gaps by analyzing feature activation maps using Grad-CAM and examining loss convergence across various CNN architectures, including a standard CNN, a CNN with multi-scale convolution, and a CNN with strided convolution. In each of these approaches, we employ the Mish activation function, which has demonstrated superior performance in terms of smoothness and gradient flow compared to that of traditional activation functions like ReLU [28,36,37]. By leveraging Mish, we aim to enhance the model’s ability to capture complex patterns in chest X-ray images, particularly in the context of pediatric pneumonia. Furthermore, by identifying critical learning phases and optimizing training efficiency, this paper aims to improve the interpretability of deep learning models while increasing their diagnostic accuracy when classifying pediatric pneumonia.

3. Materials and Methods

In this study, we introduce an interpretable deep learning framework for pediatric pneumonia diagnosis, integrating multi-phase feature learning and activation pattern analysis. The methodology involves three stages, as presented in Figure 2: (1) data preparation and image preprocessing, including categorization into healthy and pneumonia classes, using data augmentation techniques; (2) implementation and evaluation of InceptionV3, InceptionResNetV2, DenseNet201, and MobileNetV2 architectures, where three convolutional approaches were investigated: Approach 1, standard convolutions; Approach 2, multi-scale convolutions; and Approach 3, strided convolutions, each combined with Mish activation; and (3) performance assessment using standard metrics, with interpretability analysis via Grad-CAM.

3.1. Data Preprocessing and Experimental Setup

We employed the publicly available Chest X-Ray Images (Pneumonia) dataset [38], which comprises 5856 pediatric chest X-ray scans collected from patients aged one to five years at Guangzhou Women and Children’s Medical Center. The dataset consists of 4273 pneumonia cases and 1583 healthy controls [38]. To ensure data quality, an initial screening of all chest radiographs was conducted to eliminate low-quality or unreadable scans before analysis. Diagnosis assessment was conducted by two expert physicians who graded the images. A third expert validated the evaluation dataset to ensure accuracy and improve reliability. We partitioned the dataset using a stratified random split method at an 80:20 ratio, preserving the inherent class distribution. Although the dataset is imbalanced, we adopted multiple strategies to mitigate the impact of this characteristic. First, we applied extensive data augmentation—including rotation, translation, scaling, and horizontal flipping—equally across both classes to artificially expand the training set and reduce overfitting, specifically benefiting the minority (healthy) class. Second, we focused on evaluation metrics sensitive to class imbalance, such as precision, recall, and F1-score, rather than relying solely on accuracy. These metrics provided a more reliable assessment of model performance across both pneumonia and healthy classes. Third, stratified splitting ensured that the imbalance was consistently represented in both the training and validation sets, preventing bias toward the majority class during evaluation. We opted for a single split over k-fold cross-validation due to computational limitations, optimizing processing efficiency while maintaining comparable model performance. To standardize input dimensions, we resized all images to 224 × 224 pixels.
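The stratified 80:20 partition described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `stratified_split`, the fixed seed, and the rounding of each per-class cut point are assumptions; only the class counts and the 80:20 ratio come from the text.

```python
import random


def stratified_split(labels, train_fraction=0.8, seed=0):
    """Return (train_idx, val_idx), splitting each class at train_fraction."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train_idx, val_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label][:]
        rng.shuffle(indices)
        # Rounding the cut point is an assumption; the source gives only the ratio.
        cut = round(len(indices) * train_fraction)
        train_idx.extend(indices[:cut])
        val_idx.extend(indices[cut:])
    return train_idx, val_idx


# Class counts reported for the dataset: 4273 pneumonia, 1583 healthy.
labels = ["pneumonia"] * 4273 + ["healthy"] * 1583
train_idx, val_idx = stratified_split(labels)
```

Because each class is split separately, the pneumonia-to-healthy ratio in the validation subset stays close to the ratio in the full dataset, which is the property the stratification is meant to guarantee.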

We built and trained the deep learning models in a Python 3.10 (Python Software Foundation, Wilmington, DE, USA) environment in Google Colab, leveraging the Keras-GPU [39] and TensorFlow-GPU [40] frameworks. We expedited training using NVIDIA Tesla K80 GPUs (Nvidia Corporation, Santa Clara, CA, USA) with 12 GB of memory. For each model, we set the training regimen to 20 epochs with a batch size of 32 and used the Adam optimizer for dynamic parameter updates. To prevent overfitting and improve convergence, we implemented an adaptive learning rate adjustment mechanism, reducing the rate upon validation performance stagnation, with a lower bound of 0.5 × 10⁻⁶.
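The adaptive learning rate rule can be illustrated with a small, framework-free sketch. Only the lower bound of 0.5 × 10⁻⁶ is taken from the text; the initial rate, the reduction factor, and the patience are assumed values chosen for illustration, and `reduce_on_plateau` is a hypothetical helper. Keras provides a `ReduceLROnPlateau` callback with `factor`, `patience`, and `min_lr` arguments that implements this kind of rule, although the source does not state which mechanism or settings were used.

```python
def reduce_on_plateau(val_losses, initial_lr=1e-3, factor=0.5,
                      patience=2, min_lr=0.5e-6):
    """Return the learning rate in effect at each epoch.

    The rate is multiplied by `factor` once the validation loss has failed
    to improve for `patience` consecutive epochs, and never drops below
    `min_lr`. All defaults except `min_lr` are assumptions.
    """
    lr = initial_lr
    best = float("inf")
    stale = 0
    schedule = []
    for loss in val_losses:
        schedule.append(lr)
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                lr = max(lr * factor, min_lr)  # clamp at the lower bound
                stale = 0
    return schedule


# Hypothetical validation losses that stop improving after the second epoch.
schedule = reduce_on_plateau([0.50, 0.40, 0.41, 0.42, 0.43, 0.44])
```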

3.2. Interpretability and Convolutional Methods in Regards to Pneumonia Recognition

In this study, we employed four state-of-the-art CNN architectures for pneumonia classification from pediatric chest X-ray images: InceptionV3, InceptionResNetV2, DenseNet201, and MobileNetV2. Each architecture presents unique characteristics that influence feature extraction, model complexity, and interpretability, particularly in the medical imaging domain, as demonstrated in Table 1.

InceptionV3, for instance, utilizes factorized convolutions and multi-scale processing via its Inception modules, enabling efficient, parallel feature extraction. It is particularly beneficial for capturing diverse patterns indicative of pulmonary infections and balances network depth against computational cost, a relevant consideration for pediatric datasets potentially exhibiting subtle signs [41,42,43]. InceptionResNetV2 enhances this ability by merging Inception’s multi-scale approach with residual connections, improving gradient flow to stabilize the training of very deep networks. This architecture is well-suited for learning complex, hierarchical features from detailed medical images, such as pediatric radiographs, while maintaining training robustness [44]. DenseNet201 adopts a dense connectivity pattern in which each layer accesses feature maps from all the preceding layers, promoting extensive feature reuse and efficient information propagation. The integration of multi-level features and its inherent parameter efficiency make it advantageous for tasks like pneumonia recognition, especially with limited datasets [45,46]. Finally, MobileNetV2 prioritizes computational efficiency through depth-wise separable convolutions and an inverted residual structure, with linear bottlenecks. This design substantially reduces parameters and floating-point operations, rendering it highly suitable for resource-constrained environments, although its streamlined architecture may offer less sensitivity to extremely subtle features compared to that provided by more complex models [43,47].

Grad-CAM is a visualization technique used to interpret the predictions of CNNs by highlighting important regions in an input image [20,31]. This method utilizes the gradient information flowing into the final convolutional layer to produce a coarse localization map of the salient regions. By visualizing these areas, Grad-CAM aids in the interpretability of deep learning models, which is crucial in medical applications such as pneumonia classification, where understanding why a model makes a particular decision can improve trust and reliability [48]. The class-discriminative localization map $L^{c}$ for a target class $c$ is computed as defined in Equation (1):

$$L^{c} = \mathrm{ReLU}\left( \sum_{k} \alpha_{k}^{c} A^{k} \right) \tag{1}$$

where $A^{k}$ represents the activation maps of the last convolutional layer, and $\alpha_{k}^{c}$ represents importance weights computed as defined in Equation (2):

$$\alpha_{k}^{c} = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^{c}}{\partial A_{ij}^{k}} \tag{2}$$

Here, $Z$ is the number of spatial locations, and $y^{c}$ represents the class score for class $c$. The ReLU function ensures that only positive influences are considered, emphasizing relevant regions in the image that contribute to the model’s decision [20,49]. This is particularly beneficial in pneumonia classification, where highlighting infected lung regions in chest X-ray images can assist radiologists in making more accurate diagnoses. Applying Grad-CAM in this study not only enhances model transparency but also allows for correlation between the architectural depth or complexity and the quality of the generated heatmaps. Models like DenseNet201 and InceptionResNetV2, due to their richer feature hierarchies, typically produce more distinct and clinically meaningful visual explanations compared to lightweight models like MobileNetV2.
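Equations (1) and (2) can be sketched in a few lines of NumPy, assuming the activations of the last convolutional layer and the gradients of the class score with respect to them have already been extracted from the network (that extraction step is framework-specific and omitted here). The function names `grad_cam` and `normalize` are illustrative, and the peak normalization is a display convenience that is not part of the equations.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Grad-CAM map from arrays of shape (H, W, K).

    activations: feature maps A^k of the last convolutional layer.
    gradients:   partial derivatives of the class score y^c w.r.t. A^k_ij.
    """
    # Equation (2): average the gradients over the Z = H * W locations.
    alpha = gradients.mean(axis=(0, 1))  # shape (K,)
    # Equation (1): weight each map by alpha, sum over k, then apply ReLU.
    weighted_sum = np.tensordot(activations, alpha, axes=([2], [0]))
    return np.maximum(weighted_sum, 0.0)


def normalize(cam):
    """Scale a map to [0, 1] for display (an assumption, not in the source)."""
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Toy input: two 2 x 2 feature maps with constant gradients per channel.
activations = np.stack(
    [np.array([[1.0, 2.0], [3.0, 4.0]]), np.ones((2, 2))], axis=-1
)
gradients = np.stack([np.ones((2, 2)), -2.0 * np.ones((2, 2))], axis=-1)
heatmap = normalize(grad_cam(activations, gradients))
```

In practice the resulting low-resolution map is upsampled to the input image size and overlaid as a heatmap.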

In addition to interpretability techniques, activation functions play a crucial role in deep learning models. The Mish activation function, a non-monotonic function, has been shown to improve gradient flow and smoothness during training, leading to better feature extraction and generalization [36,37]. It is defined mathematically using Equation (3):

$$\mathrm{Mish}(x) = x \cdot \tanh(\mathrm{softplus}(x)) = x \cdot \tanh\left(\ln\left(1 + e^{x}\right)\right) \tag{3}$$

In contrast to ReLU, which sets the negative values to zero, Mish preserves small negative values and provides a smooth transition, which is particularly beneficial in medical image analysis, as demonstrated in recent studies. The smooth non-linearity helps in preserving finer details in pneumonia classification tasks, allowing for improved model robustness. Mish has also been observed to provide better feature representation compared to that of traditional activation functions, leading to more stable training and enhanced classification accuracy in deep learning architectures [36].
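The difference between Mish and ReLU for negative inputs can be checked directly from Equation (3). The sketch below uses only the Python standard library; the numerically stable form of softplus is an implementation choice, not something specified in the source.

```python
import math


def softplus(x):
    """ln(1 + e^x), written so that large |x| does not overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x):
    """Mish activation from Equation (3): x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def relu(x):
    return max(x, 0.0)


# Compare the two activations on a few sample inputs.
for value in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"x={value:+.1f}  relu={relu(value):.4f}  mish={mish(value):.4f}")
```

For negative inputs ReLU returns exactly zero, whereas Mish returns a small negative value, so some gradient signal is retained in that region; for large positive inputs the two functions nearly coincide.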

Deep learning models for pneumonia classification leverage different types of convolutional operations to extract meaningful features from chest X-ray images. Standard convolutions are fundamental for capturing spatial features by applying fixed-size kernels to detect the texture and structural patterns associated with pneumonia-infected lungs [50]. Multi-scale convolutions employ filters of varying sizes to detect abnormalities at different resolutions, ensuring that subtle and large-scale pneumonia-related features are equally recognized [41,51]. This enhances the model’s ability to generalize across diverse manifestations of the disease, such as varying opacity and lesion size in infected lungs. Additionally, multi-scale convolutional layers help capture both local fine-grained details and broader anatomical structures, which are essential for distinguishing pneumonia from other lung conditions [52]. Strided convolutions serve a dual purpose of providing both feature extraction and dimensionality reduction, reducing computational complexity while preserving critical spatial information [53,54]. Unlike max-pooling, strided convolutions ensure a more structured downsampling process, which can be advantageous in medical image processing where fine details are crucial for accurate diagnosis. By reducing the spatial resolution while maintaining relevant features, strided convolutions facilitate deeper architectures without excessive computational costs [55]. Furthermore, when combined with residual connections, they mitigate information loss, ensuring that vital pneumonia-specific features are retained through the network layers.
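To make the three operations concrete, the following NumPy sketch applies them to a toy single-channel image. It is an illustration under stated assumptions rather than the layer configuration used in the study: the averaging kernels, the kernel sizes 1, 3, and 5, the stride of 2, and the helper names `conv2d` and `multi_scale` are all hypothetical.

```python
import numpy as np


def conv2d(image, kernel, stride=1):
    """Valid (unpadded) 2-D cross-correlation of a single-channel image."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            patch = image[top:top + kh, left:left + kw]
            out[i, j] = np.sum(patch * kernel)
    return out


def multi_scale(image, kernel_sizes=(1, 3, 5)):
    """Run averaging kernels of several sizes in parallel.

    Zero padding of k // 2 keeps every branch at the input's spatial size,
    so the branch outputs can be stacked as channels.
    """
    branches = []
    for k in kernel_sizes:
        padded = np.pad(image, k // 2)
        kernel = np.full((k, k), 1.0 / (k * k))
        branches.append(conv2d(padded, kernel))
    return np.stack(branches, axis=-1)


image = np.arange(64, dtype=float).reshape(8, 8)
kernel = np.ones((3, 3)) / 9.0
standard = conv2d(image, kernel, stride=1)  # shape (6, 6)
strided = conv2d(image, kernel, stride=2)   # shape (3, 3)
features = multi_scale(image)               # shape (8, 8, 3)
```

With the same kernel, the strided output equals the standard output sampled at every second position, which shows how striding combines feature extraction with downsampling in a single step.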

By integrating Grad-CAM for model interpretability, Mish activation for enhanced feature extraction, and a combination of standard, multi-scale, and strided convolutions, deep learning-based pneumonia classification systems can achieve higher accuracy and robustness. These methodologies contribute to improved diagnostic performance, making CNN-based models more reliable for medical applications. The ability to highlight affected lung regions, capture multi-scale features, and efficiently process medical images enhances the potential for the use of AI-assisted diagnostic tools in clinical settings, leading to better patient outcomes and more informed medical decision making.

3.3. Pediatric Pneumonia Accuracy Assessment

We evaluated the performance of deep learning models for pneumonia classification in chest radiographs. We analyzed training and validation accuracy, loss, and key classification metrics, deriving them from the confusion matrix. These key metrics—precision, recall, specificity, F1-score, and accuracy—as defined in Equations (4)–(8), allowed us to perform a comprehensive evaluation of the models’ diagnostic capability, as follows:

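$$\mathrm{Precision} = \frac{TP}{TP + FP}, \tag{4}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}, \tag{5}$$

$$\mathrm{Specificity} = \frac{TN}{TN + FP}, \tag{6}$$

$$\mathrm{F1\text{-}score} = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \tag{7}$$

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{8}$$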

We classified the model’s outcomes as follows: true positives (TP), where it correctly identified pneumonia cases; true negatives (TN), where it correctly identified non-pneumonia cases; false positives (FP), where it incorrectly classified non-pneumonia cases as pneumonia; and false negatives (FN), where it incorrectly classified pneumonia cases as non-pneumonia. We calculated specificity, which reflects the model’s ability to correctly identify non-pneumonia cases, and recall (sensitivity), which reflects its ability to identify pneumonia cases. Radiographic features indicative of pneumonia, including increased pulmonary opacity, consolidation, and pleural effusion, served as primary discriminative criteria for the model’s classification algorithm. However, the potential for overlapping radiographic manifestations with alternative pathologies, coupled with the susceptibility to imaging artifacts, introduced sources of diagnostic error, resulting in classification inaccuracies.
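Equations (4)–(8) translate directly into code. In the sketch below, the function name `classification_metrics` and the example counts are hypothetical and are not results from this study; the function also assumes that every denominator is non-zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute Equations (4)-(8) from confusion-matrix counts.

    Pneumonia is treated as the positive class. Denominators are assumed
    to be non-zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical counts chosen only to exercise the formulas.
metrics = classification_metrics(tp=90, tn=80, fp=20, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

On an imbalanced dataset such as this one, reporting specificity and recall alongside accuracy shows whether a high accuracy simply reflects the majority class.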

4. Results and Discussion

In this study, we evaluated the efficacy of four deep learning architectures (InceptionV3, InceptionResNetV2, DenseNet201, and MobileNetV2) for pediatric pneumonia classification. We implemented three distinct convolutional approaches: Approach 1, base model with Mish activation function and standard convolutions; Approach 2, base model with Mish activation function and multi-scale convolutions; and Approach 3, base model with Mish activation function and strided convolutions. We assessed the performance of each architecture and approach using established metrics, including accuracy, precision, recall, F1-score, and specificity, providing a comprehensive evaluation of their classification capabilities. For InceptionResNetV2, Approach 3 yielded the best results, achieving the highest accuracy of 0.9718 and F1-score of 0.9634. This approach demonstrated its ability to efficiently capture complex features while maintaining a high precision of 0.9767 and a recall of 0.9519. The combination of strided convolutions and the Mish activation function likely enhanced feature extraction and generalization [56], making it the top-performing model overall. Similarly, for InceptionV3, Approach 3 achieved its best performance with an accuracy of 0.9684 and an F1-score of 0.9595. This approach likely improved its hierarchical feature extraction process and increased the specificity to 0.9211. The high precision and recall indicate that the model effectively minimized false positives and false negatives, respectively [57]. In contrast, DenseNet201 performed optimally with Approach 2, achieving an accuracy of 0.9676 and an F1-score of 0.9582. The multi-scale approach complemented the dense connectivity of DenseNet201 by capturing features at multiple resolutions, leading to a robust balance between sensitivity, with a recall of 0.9510, and specificity, with a score of 0.9148. MobileNetV2 also benefited from Approach 2, achieving an accuracy of 0.9437 and an F1-score of 0.9254.
Despite its smaller size, this approach allowed it to efficiently extract diverse features [47], although its specificity of 0.8297 was lower compared to that of other models. Key observations highlight that Approach 3 was particularly effective for InceptionV3 and InceptionResNetV2, aligning well with their hierarchical feature extraction processes and improving their computational efficiency. Approach 2 worked best for DenseNet201 and MobileNetV2, enhancing their ability to capture features at multiple resolutions while maintaining competitive performance. The precision and F1-score metrics revealed that all models effectively addressed class imbalance, with InceptionResNetV2 and DenseNet201 achieving the best balance between sensitivity and specificity. These findings underscore the importance of tailoring architectural modifications to the strengths of each model, as demonstrated in Table 2.

The confusion matrix provides a detailed breakdown of model performance across different classification approaches, as presented in Figure 3. For InceptionV3, Approach 3 yielded the best results, achieving a high true positive rate of 0.9860, with only 12 false negative cases. In contrast, Approach 2 struggled to distinguish healthy cases, leading to an increase in false positives. InceptionResNetV2 demonstrated consistent performance across all approaches, with Approach 3 producing the lowest false negative count, just four cases, and a recall of 0.9953, confirming its superior sensitivity in classifying pneumonia. MobileNetV2, however, exhibited higher false positive rates using Approach 3, indicating a tendency to misclassify healthy cases as pneumonia. The highest recall of 0.9860 was observed with Approach 2, demonstrating strong pneumonia detection capabilities but at the cost of increased false positives. DenseNet201 performed optimally with Approach 2, achieving a high specificity of 0.9148 and a recall of 0.9871 for pneumonia detection. However, Approach 3 resulted in a higher number of false positives, suggesting a trade-off in specificity. Overall, Approach 3 proved the most effective for deeper models like InceptionV3 and InceptionResNetV2, while multi-scale convolutions in Approach 2 were more beneficial for MobileNetV2 and DenseNet201 [37,46,58]. This pattern aligns with the observed accuracy and F1-score trends, reinforcing the idea that different architectural enhancements impact classification trade-offs in unique ways.

The results, evaluated in terms of training accuracy, validation accuracy, and validation loss at 10 and 20 epochs, provide key insights into the learning dynamics and generalization capabilities of each model, as demonstrated in Table 3. Notably, Approach 3 demonstrated superior efficacy for InceptionV3 and InceptionResNetV2. InceptionV3 achieved a validation accuracy of 0.9583 at 10 epochs and 0.9705 at 20 epochs, with the lowest validation loss values of 0.1155 and 0.0744, respectively. Similarly, InceptionResNetV2 exhibited substantial improvement in validation accuracy from 0.9392 to 0.9670, while validation loss markedly decreased from 0.4888 to 0.0912. Conversely, Approach 2 yielded the most favorable results for DenseNet201 and MobileNetV2. DenseNet201 attained a validation accuracy of 0.9592 at 10 epochs and 0.9566 at 20 epochs, accompanied by validation loss values of 0.1171 and 0.1058, respectively. MobileNetV2 demonstrated the fastest convergence, achieving a validation accuracy of 0.9549 by 10 epochs, with a minimal change to 0.9523 at 20 epochs. Despite slightly higher validation loss values of 0.1289 at 10 epochs and 0.1383 at 20 epochs, MobileNetV2 exhibited stable training accuracy, recording 0.9688 at 10 epochs and 0.9520 at 20 epochs.

From a generalization perspective, considering the best-performing approach for each model, InceptionV3 with Approach 3 achieved the lowest validation loss of 0.0744 at 20 epochs, followed by InceptionResNetV2 with Approach 3 at 0.0912 and DenseNet201 with Approach 2 at 0.1058; the lowest value in Table 3 overall, 0.0723, was recorded for InceptionResNetV2 with Approach 1. Overall, the results underscore the importance of selecting architectural modifications based on model depth and computational constraints. Approach 3 proved beneficial for deeper architectures such as InceptionV3 and InceptionResNetV2, whereas Approach 2 performed best in more compact models like DenseNet201 and MobileNetV2.
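The validation loss comparison can be reproduced directly from Table 3. The short sketch below copies the 20-epoch validation loss values from the table and ranks the configurations; the variable and function names are assumptions introduced for this example.

```python
# Validation loss after 20 epochs, copied from Table 3 (keys are approach numbers).
VAL_LOSS_20_EPOCHS = {
    "InceptionV3": {1: 0.1034, 2: 0.1645, 3: 0.0744},
    "InceptionResNetV2": {1: 0.0723, 2: 0.1476, 3: 0.0912},
    "MobileNetV2": {1: 0.2688, 2: 0.1383, 3: 0.2727},
    "DenseNet201": {1: 0.1372, 2: 0.1058, 3: 0.1254},
}


def rank_by_validation_loss(table):
    """Return (model, approach, loss) tuples sorted from lowest to highest loss."""
    rows = [
        (model, approach, loss)
        for model, by_approach in table.items()
        for approach, loss in by_approach.items()
    ]
    return sorted(rows, key=lambda row: row[2])


ranking = rank_by_validation_loss(VAL_LOSS_20_EPOCHS)
best_per_model = {
    model: min(by_approach, key=by_approach.get)
    for model, by_approach in VAL_LOSS_20_EPOCHS.items()
}
for model, approach in best_per_model.items():
    loss = VAL_LOSS_20_EPOCHS[model][approach]
    print(f"{model}: Approach {approach} ({loss:.4f})")
```

Validation loss is only one selection criterion. Table 2 ranks the configurations by test metrics, and the two rankings need not agree for every model.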

Moreover, we analyzed Grad-CAM visualizations of the four deep learning models (InceptionV3, InceptionResNetV2, MobileNetV2, and DenseNet201) when classifying healthy lungs, viral pneumonia, and bacterial pneumonia from chest X-ray images, comparing the three approaches to highlight key decision-making areas, as presented in Figure 4, Figure 5, and Figure 6, respectively. In healthy cases, models like InceptionV3 and DenseNet201 show minimal activation, primarily along the ribcage or lung periphery. Viral pneumonia is characterized by diffuse, bilateral activation across both lungs, often including the heart region [59,60], as seen in MobileNetV2 and DenseNet201, which effectively capture interstitial changes. Bacterial pneumonia, in contrast, is identified by sharp, localized activation, typically within one lung lobe [61]; DenseNet201 excels in its detection owing to its highly focused heatmaps, and InceptionResNetV2 also clearly delineates the sharply defined activations in specific lobes.

Among the approaches, Approach 3 provides the clearest visualizations, revealing sharp, focused bacterial pneumonia regions and broader, generalized viral patterns, while Approach 1 highlights initial lung field activations and Approach 2 refines the distinction between pneumonia types. MobileNetV2 performs best for viral pneumonia owing to its strong central lung and heart activation, DenseNet201 is the most accurate for bacterial pneumonia, with its distinct lobar focus, and InceptionResNetV2 offers balanced performance for both. The key takeaways are minimal activation in healthy cases, diffuse patterns in viral pneumonia, and localized, sharp activations in bacterial pneumonia, with DenseNet201 emerging as the most precise model for distinguishing between the two pneumonia types [45,62]. Overall, Grad-CAM effectively illustrates how these models interpret pneumonia patterns, supporting DenseNet201’s advantage for bacterial pneumonia and MobileNetV2’s strength in viral pneumonia detection.
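For readers unfamiliar with how such heatmaps are obtained, the core Grad-CAM computation [20] can be sketched in a few lines: the gradients of the class score with respect to a convolutional layer are averaged per channel, the feature maps are combined with these weights, and negative values are discarded. The sketch below uses plain Python lists and a toy two-channel input. The function name `grad_cam` and the example values are assumptions, and in practice the activations and gradients would come from the trained network through a deep learning framework.

```python
def grad_cam(feature_maps, gradients):
    """Grad-CAM heatmap for one convolutional layer.

    feature_maps, gradients: nested lists shaped [channels][height][width],
    holding the activations A^k and the gradients d(score)/dA^k.
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    # Channel weights: global average of the gradients over all positions.
    weights = [
        sum(sum(row) for row in grad) / (height * width) for grad in gradients
    ]
    # Weighted sum of the feature maps, followed by ReLU.
    heatmap = [
        [
            max(0.0, sum(w * fmap[y][x] for w, fmap in zip(weights, feature_maps)))
            for x in range(width)
        ]
        for y in range(height)
    ]
    # Scale to [0, 1] for display; an all-zero map is left unchanged.
    peak = max(max(row) for row in heatmap)
    if peak > 0:
        heatmap = [[value / peak for value in row] for row in heatmap]
    return heatmap


# Toy example: channel 0 supports the class, channel 1 argues against it.
activations = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 2.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(activations, grads)
print(heatmap)
```

Only regions whose features increase the class score remain in the heatmap, which is why a focused map indicates that the prediction rests on a small, well-defined image area.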

The consistent performance of MobileNetV2 with Approach 2 across training and validation metrics, coupled with its rapid convergence and stability, made it a suitable choice for detailed interpretability analysis using Grad-CAM, as presented in Figure 7. Its ability to generalize well, with minimal overfitting, underscores the importance of selecting models that balance performance, efficiency, and interpretability, particularly in medical image classification tasks, where false positives and false negatives can have significant clinical implications [63,64].

The integration of multi-scale convolutions with Mish activation in MobileNetV2 significantly improves heatmap-based lung disease detection by refining feature extraction across multiple spatial resolutions. This architectural enhancement enables the model to simultaneously capture fine-grained abnormalities, such as the diffuse opacities characteristic of viral pneumonia, and the larger consolidations typical of bacterial infections. The multi-scale approach effectively suppresses false activations in non-pathological regions, maintaining precise focus on diagnostically relevant areas throughout the training process [41,52]. During the initial training phases, the multi-scale architecture mitigates scattered attention patterns by promoting more meaningful feature extraction. As training progresses to intermediate stages, it enhances consistency in activation patterns across diverse lung pathologies. In the final training stages, this approach produces sharply defined heatmaps, while preserving sensitivity to subtle pathological indicators that might otherwise be overlooked.

The Mish activation function offers superior gradient flow and feature diversity compared to the conventional ReLU activation [28,36]. Its smooth, non-monotonic characteristics address several limitations of traditional activation functions. Specifically, Mish mitigates vanishing gradients during backpropagation while maintaining richer feature representations. This property proves particularly valuable in medical image analysis, where subtle pathological patterns require precise detection [37]. In early training iterations, Mish activation helps avoid suboptimal initialization traps that can hinder model convergence. During the intermediate training phases, it strengthens the mid-level feature extraction capabilities. In the final training stages, Mish activation yields more confident and precise spatial activations in the generated heatmaps.

The combined implementation of multi-scale convolutions and Mish activation provides multiple synergistic benefits. First, it produces heatmaps with superior pathological localization. Second, it enhances classification accuracy for distinguishing between viral and bacterial pneumonia manifestations. Third, it significantly reduces activation noise in healthy tissue regions. Fourth, it accelerates model convergence, potentially enabling early stopping strategies, without compromising diagnostic performance. This optimized architecture is particularly valuable for clinical decision support systems, where reliable detection of pulmonary abnormalities must be maintained across diverse imaging conditions and acquisition protocols. The improved interpretability of the resulting heatmaps provides clinicians with more trustworthy visual explanations of the model’s diagnostic reasoning process.
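The smooth, non-monotonic behavior of Mish can be seen from its definition, mish(x) = x · tanh(softplus(x)), as given in [36]. The following sketch evaluates Mish and ReLU on a few inputs using only the Python standard library; the chosen sample points are arbitrary.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x):
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def relu(x):
    """ReLU activation, shown for comparison."""
    return max(0.0, x)


for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"x={x:+.1f}  mish={mish(x):+.4f}  relu={relu(x):+.4f}")
```

For negative inputs, ReLU returns exactly zero, whereas Mish returns small negative values that first decrease and then return toward zero as the input becomes more negative. This keeps a nonzero gradient for moderately negative pre-activations, which is the property the discussion above relies on.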

This study has some limitations, primarily due to its reliance on public datasets, which may introduce biases in image quality and patient demographics, affecting generalizability. Future research should validate the model across diverse clinical settings to enhance robustness. Implementing a hybrid cloud-edge infrastructure could enable scalable deployment, balancing centralized model updates with local inference for improved data privacy. Additionally, long-term clinical integration requires continuous learning mechanisms. Incorporating radiologist feedback loops would allow the model to adapt to real-world cases, ensuring sustained diagnostic accuracy and practical clinical impact. Advanced deep learning approaches, particularly variational autoencoders (VAEs), have shown considerable value, especially in studies analyzing COVID-19 chest X-rays. These methods have proven effective in addressing challenges like class imbalance and enhancing feature learning [65,66]. Future work could involve integrating VAE-based techniques for data balancing or feature extraction to further improve model robustness. Additionally, both studies [65,66] highlighted sophisticated uses of Grad-CAM for model interpretability, suggesting that incorporating more advanced or modified Grad-CAM approaches could enhance explainability and clinical trust in pneumonia classification models.

5. Conclusions and Future Work

This study demonstrates that deep learning models can achieve excellent diagnostic performance for pediatric pneumonia when properly optimized. The InceptionResNetV2 model, with strided convolutions and Mish activation, achieved the highest accuracy of 0.9718, closely followed by InceptionV3 at 0.9684. The DenseNet201 architecture performed exceptionally well with multi-scale convolutions, reaching 0.9676 accuracy, while MobileNetV2 achieved 0.9437 accuracy using the same approach. These results highlight how different convolutional strategies can be matched to specific network architectures for optimal performance. The integration of Grad-CAM provided valuable interpretability, clearly visualizing diagnostic features in chest X-rays. For viral pneumonia, the models detected diffuse patterns across lung fields, while for bacterial cases, the models identified the precise localization of consolidations. This capability to explain decisions builds crucial trust for clinical adoption.

Moving forward, we aim to translate these research findings into clinical practice by developing an end-to-end AI diagnostic system capable of real-time pneumonia detection. This will require expanded validation across diverse patient populations and imaging protocols to ensure robustness. We also plan to optimize these models for practical deployment through the use of edge computing solutions that preserve diagnostic accuracy while meeting clinical latency requirements. Close collaboration with radiologists will be essential to refine model predictions and integrate AI assistance seamlessly into existing workflows. By bridging the gap between technical innovation and clinical needs, this work paves the way for more accurate, interpretable, and deployable AI tools for use in pediatric respiratory care.

Author Contributions

Conceptualization, P.R.; methodology, P.R.; software, P.R.; validation, G.M.; formal analysis, P.R.; investigation, P.R.; resources, P.R.; data curation, P.R.; writing—original draft preparation, P.R.; writing—review and editing, P.R. and G.M.; visualization, P.R.; supervision, G.M.; project administration, G.M.; funding acquisition, P.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The code developed in this study is available on request from the corresponding author. The open access Chest X-Ray Images (Pneumonia) repository, which contains images collected from pediatric patients and is divided into two categories (Pneumonia/Healthy), is available at https://www.kaggle.com/datasets/paultimothymooney/chest-xray-pneumonia (accessed on 15 March 2025).

Conflicts of Interest

Author Petra Radočaj was employed by the company Layer d.o.o. The remaining author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.

References

1. Rudan, I.; O’Brien, K.L.; Nair, H.; Liu, L.; Theodoratou, E.; Qazi, S.; Lukšić, I.; Walker, C.L.F.; Black, R.E.; Campbell, H. Epidemiology and Etiology of Childhood Pneumonia in 2010: Estimates of Incidence, Severe Morbidity, Mortality, Underlying Risk Factors and Causative Pathogens for 192 Countries. J. Glob. Health 2013, 3, 010401.
2. Marangu, D.; Zar, H.J. Childhood Pneumonia in Low-and-Middle-Income Countries: An Update. Paediatr. Respir. Rev. 2019, 32, 3–9.
3. Pneumonia in Children. Available online: https://www.who.int/news-room/fact-sheets/detail/pneumonia (accessed on 22 December 2024).
4. Genie, Y.D.; Sayih, A.; Dessalegn, N.; Adugnaw, E.; Hiwot, A.Y.; Tesfa, T.B.; Kindie, K.; Gutema, L.; Ayalew, E.; Kebede, B.F. Time to Recovery from Severe Pneumonia and Its Predictors among Pediatric Patients Admitted in South West Region Governmental Hospitals, South West Ethiopia: Prospective Follow-up Study. Glob. Pediatr. 2024, 9, 100227.
5. Scotta, M.C.; Marostica, P.J.C.; Stein, R.T. 25–Pneumonia in Children. In Kendig’s Disorders of the Respiratory Tract in Children, 9th ed.; Wilmott, R.W., Deterding, R., Li, A., Ratjen, F., Sly, P., Zar, H.J., Bush, A., Eds.; Elsevier: Philadelphia, PA, USA, 2019; pp. 427–438.e4. ISBN 978-0-323-44887-1.
6. Onwuchekwa, C.; Edem, B.; Williams, V.; Oga, E. Estimating the Impact of Pneumococcal Conjugate Vaccines on Childhood Pneumonia in Sub-Saharan Africa: A Systematic Review. F1000Research 2020, 9, 765.
7. Pavia, M.; Bianco, A.; Nobile, C.G.A.; Marinelli, P.; Angelillo, I.F. Efficacy of Pneumococcal Vaccination in Children Younger than 24 Months: A Meta-Analysis. Pediatrics 2009, 123, e1103–e1110.
8. de Groot, R.C.A.; Estevão, S.C.; Meyer Sauteur, P.M.; Perkasa, A.; Hoogenboezem, T.; Spuesens, E.B.M.; Verhagen, L.M.; van Rossum, A.M.C.; Unger, W.W.J. Mycoplasma Pneumoniae Carriage Evades Induction of Protective Mucosal Antibodies. Eur. Respir. J. 2022, 59, 2100129.
9. Berg, A.S.; Inchley, C.S.; Aase, A.; Fjaerli, H.O.; Bull, R.; Aaberge, I.; Leegaard, T.M.; Nakstad, B. Etiology of Pneumonia in a Pediatric Population with High Pneumococcal Vaccine Coverage: A Prospective Study. Pediatr. Infect. Dis. J. 2016, 35, e69–e75.
10. Søndergaard, M.J.; Friis, M.B.; Hansen, D.S.; Jørgensen, I.M. Clinical Manifestations in Infants and Children with Mycoplasma Pneumoniae Infection. PLoS ONE 2018, 13, e0195288.
11. Biagi, C.; Cavallo, A.; Rocca, A.; Pierantoni, L.; Antonazzo, D.; Dondi, A.; Gabrielli, L.; Lazzarotto, T.; Lanari, M. Pulmonary and Extrapulmonary Manifestations in Hospitalized Children with Mycoplasma Pneumoniae Infection. Microorganisms 2021, 9, 2553.
12. de Benedictis, F.M.; Kerem, E.; Chang, A.B.; Colin, A.A.; Zar, H.J.; Bush, A. Complicated Pneumonia in Children. Lancet 2020, 396, 786–798.
13. Zhang, J.; Zhu, Y.; Zhou, Y.; Gao, F.; Qiu, X.; Li, J.; Yuan, H.; Jin, W.; Lin, W. Pediatric Adenovirus Pneumonia: Clinical Practice and Current Treatment. Front. Med. 2023, 10, 1207568.
14. Fancourt, N.; Deloria Knoll, M.; Barger-Kamate, B.; de Campo, J.; de Campo, M.; Diallo, M.; Ebruke, B.E.; Feikin, D.R.; Gleeson, F.; Gong, W.; et al. Standardized Interpretation of Chest Radiographs in Cases of Pediatric Pneumonia From the PERCH Study. Clin. Infect. Dis. 2017, 64, S253–S261.
15. Salehi, M.; Mohammadi, R.; Ghaffari, H.; Sadighi, N.; Reiazi, R. Automated Detection of Pneumonia Cases Using Deep Transfer Learning with Paediatric Chest X-Ray Images. Br. J. Radiol. 2021, 94, 20201263.
16. Chan, H.-P.; Samala, R.K.; Hadjiiski, L.M.; Zhou, C. Deep Learning in Medical Image Analysis. Adv. Exp. Med. Biol. 2020, 1213, 3–21.
17. Hakkoum, H.; Abnane, I.; Idri, A. Interpretability in the Medical Field: A Systematic Mapping and Review Study. Appl. Soft Comput. 2022, 117, 108391.
18. Nguyen, H.; Huynh, H.; Tran, T.; Huynh, H. Explanation of the Convolutional Neural Network Classifying Chest X-Ray Images Supporting Pneumonia Diagnosis. EAI Endorsed Trans. Context-Aware Syst. Appl. 2020, 7, e3.
19. Lo, S.-H.; Yin, Y. A Novel Interaction-Based Methodology towards Explainable AI with Better Understanding of Pneumonia Chest X-Ray Images. Discov. Artif. Intell. 2021, 1, 16.
20. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 618–626.
21. Philbrick, K.A.; Yoshida, K.; Inoue, D.; Akkus, Z.; Kline, T.L.; Weston, A.D.; Korfiatis, P.; Takahashi, N.; Erickson, B.J. What Does Deep Learning See? Insights From a Classifier Trained to Predict Contrast Enhancement Phase From CT Images. AJR Am. J. Roentgenol. 2018, 211, 1184–1193.
22. Shen, Y.; Huang, X. A Comparative Visualization Analysis of Neural Network Models Using Grad-CAM. Sci. Technol. Eng. Chem. Environ. Prot. 2024, 1, 10.
23. Xiao, M.; Zhang, L.; Shi, W.; Liu, J.; He, W.; Jiang, Z. A Visualization Method Based on the Grad-CAM for Medical Image Segmentation Model. In Proceedings of the 2021 International Conference on Electronic Information Engineering and Computer Science (EIECS), Changchun, China, 23–26 September 2021; pp. 242–247.
24. Ali, A.A. Interpretable Deep Learning Framework for COVID-19 Detection: Grad-CAM Integration with Pre-Trained CNN Models on Chest X-Ray Images. Int. J. Sci. Res. Sci. Eng. Technol. 2025, 12, 153–163.
25. Saporta, A.; Gui, X.; Agrawal, A.; Pareek, A.; Truong, S.Q.H.; Nguyen, C.D.T.; Ngo, V.-D.; Seekins, J.; Blankenberg, F.G.; Ng, A.Y.; et al. Benchmarking Saliency Methods for Chest X-Ray Interpretation. Nat. Mach. Intell. 2022, 4, 867–878.
26. Suara, S.; Jha, A.; Sinha, P.; Sekh, A.A. Is Grad-CAM Explainable in Medical Images? In Proceedings of the Computer Vision and Image Processing; Kaur, H., Jakhetiya, V., Goyal, P., Khanna, P., Raman, B., Kumar, S., Eds.; Springer Nature: Cham, Switzerland, 2024; pp. 124–135.
27. Web of Science. Available online: https://www.webofscience.com (accessed on 18 April 2025).
28. Radočaj, P.; Radočaj, D.; Martinović, G. Optimizing Convolutional Neural Network Architectures with Optimal Activation Functions for Pediatric Pneumonia Diagnosis Using Chest X-Rays. Big Data Cogn. Comput. 2025, 9, 25.
29. Naydenova, E.; Tsanas, A.; Casals-Pascual, C.; De Vos, M. Smart Diagnostic Algorithms for Automated Detection of Childhood Pneumonia in Resource-Constrained Settings. In Proceedings of the 2015 IEEE Global Humanitarian Technology Conference (GHTC), Seattle, WA, USA, 8–11 October 2015; pp. 377–384.
30. Luján-García, J.E.; Yáñez-Márquez, C.; Villuendas-Rey, Y.; Camacho-Nieto, O. A Transfer Learning Method for Pneumonia Classification and Visualization. Appl. Sci. 2020, 10, 2908.
31. Panwar, H.; Gupta, P.K.; Siddiqui, M.K.; Morales-Menendez, R.; Bhardwaj, P.; Singh, V. A Deep Learning and Grad-CAM Based Color Visualization Approach for Fast Detection of COVID-19 Cases Using Chest X-Ray and CT-Scan Images. Chaos Solitons Fractals 2020, 140, 110190.
32. Zebin, T.; Rezvy, S. COVID-19 Detection and Disease Progression Visualization: Deep Learning on Chest X-Rays for Classification and Coarse Localization. Appl. Intell. Dordr. Neth. 2021, 51, 1010–1021.
33. Rahman, M.F.; Tseng, T.-L.; Pokojovy, M.; McCaffrey, P.; Walser, E.; Moen, S.; Vo, A.; Ho, J.C. Machine-Learning-Enabled Diagnostics with Improved Visualization of Disease Lesions in Chest X-Ray Images. Diagnostics 2024, 14, 1699.
34. Mohagheghi, S.; Alizadeh, M.; Safavi, S.M.; Foruzan, A.H.; Chen, Y.-W. Integration of CNN, CBMIR, and Visualization Techniques for Diagnosis and Quantification of Covid-19 Disease. IEEE J. Biomed. Health Inform. 2021, 25, 1873–1880.
35. Owais, M.; Yoon, H.S.; Mahmood, T.; Haider, A.; Sultan, H.; Park, K.R. Light-Weighted Ensemble Network with Multilevel Activation Visualization for Robust Diagnosis of COVID19 Pneumonia from Large-Scale Chest Radiographic Database. Appl. Soft Comput. 2021, 108, 107490.
36. Misra, D. Mish: A Self Regularized Non-Monotonic Activation Function. arXiv 2019, arXiv:1908.08681.
37. Radočaj, P.; Radočaj, D.; Martinović, G. Pediatric Pneumonia Recognition Using an Improved DenseNet201 Model with Multi-Scale Convolutions and Mish Activation Function. Algorithms 2025, 18, 98.
38. Kermany, D.S.; Goldbaum, M.; Cai, W.; Valentim, C.C.S.; Liang, H.; Baxter, S.L.; McKeown, A.; Yang, G.; Wu, X.; Yan, F.; et al. Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning. Cell 2018, 172, 1122–1131.e9.
39. Keras 3 API Documentation. Available online: https://keras.io/api/ (accessed on 24 September 2024).
40. Module: Tf|TensorFlow v2.16.1. Available online: https://www.tensorflow.org/api_docs/python/tf (accessed on 4 May 2024).
41. Liu, J.; Zhang, L.; Guo, A.; Gao, Y.; Zheng, Y. Multi-Scale Feature Fusion Convolutional Neural Network for Multi-Modal Medical Image Fusion. In Proceedings of the 2023 4th International Conference on Computing, Networks and Internet of Things, Xiamen, China, 26–28 May 2023; Association for Computing Machinery: New York, NY, USA, 2023; pp. 913–917.
42. Yi, R.; Tang, L.; Tian, Y.; Liu, J.; Wu, Z. Identification and Classification of Pneumonia Disease Using a Deep Learning-Based Intelligent Computational Framework. Neural Comput. Appl. 2023, 35, 14473–14486.
43. Charan, K.S.; Krishna, O.V.; Sai, P.V.; Ilavarasi, A.K. Transfer Learning Based Multi-Class Lung Disease Prediction Using Textural Features Derived From Fusion Data. IEEE Access 2024, 12, 108248–108262.
44. Bhatt, D.; Patel, C.; Talsania, H.; Patel, J.; Vaghela, R.; Pandya, S.; Modi, K.; Ghayvat, H. CNN Variants for Computer Vision: History, Architecture, Application, Challenges and Future Scope. Electronics 2021, 10, 2470.
45. Sanghvi, H.A.; Patel, R.H.; Agarwal, A.; Gupta, S.; Sawhney, V.; Pandya, A.S. A Deep Learning Approach for Classification of COVID and Pneumonia Using DenseNet-201. Int. J. Imaging Syst. Technol. 2023, 33, 18–38.
46. Roy, P.; Efat, A.H.; Hasan, S.M.; Srizon, A.Y.; Hossain, M.R.; Faruk, M.F.; Al Mamun, M. Multi-Scale Feature Fusion Framework Based on Attention Integrated Customized DenseNet201 Architecture for Multi-Class Skin Lesion Detection. In Proceedings of the 2024 IEEE International Conference on Power, Electrical, Electronics and Industrial Applications (PEEIACON), Rajshahi, Bangladesh, 12–13 September 2024; pp. 496–501.
47. Yuan, H.; Cheng, J.; Wu, Y.; Zeng, Z. Low-Res MobileNet: An Efficient Lightweight Network for Low-Resolution Image Classification in Resource-Constrained Scenarios. Multimed. Tools Appl. 2022, 81, 38513–38530.
48. Yang, Y.; Mei, G.; Piccialli, F. A Deep Learning Approach Considering Image Background for Pneumonia Identification Using Explainable AI (XAI). IEEE/ACM Trans. Comput. Biol. Bioinform. 2024, 21, 857–868.
49. Desai, S.; Ramaswamy, H.G. Ablation-CAM: Visual Explanations for Deep Convolutional Network via Gradient-Free Localization. In Proceedings of the 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), Snowmass Village, CO, USA, 1–5 March 2020; pp. 983–991.
50. Namatēvs, I. Deep Convolutional Neural Networks: Structure, Feature Extraction and Training. Inf. Technol. Manag. Sci. 2017, 20, 40–47.
51. Yan, T.; Wong, P.K.; Ren, H.; Wang, H.; Wang, J.; Li, Y. Automatic Distinction between COVID-19 and Common Pneumonia Using Multi-Scale Convolutional Neural Network on Chest CT Scans. Chaos Solitons Fractals 2020, 140, 110153.
52. Sarkar, O.; Islam, M.R.; Syfullah, M.K.; Islam, M.T.; Ahamed, M.F.; Ahsan, M.; Haider, J. Multi-Scale CNN: An Explainable AI-Integrated Unique Deep Learning Framework for Lung-Affected Disease Classification. Technologies 2023, 11, 134.
53. Yang, C.; Wang, Y.; Wang, X.; Geng, L. A Stride-Based Convolution Decomposition Method to Stretch CNN Acceleration Algorithms for Efficient and Flexible Hardware Implementation. IEEE Trans. Circuits Syst. Regul. Pap. 2020, 67, 3007–3020.
54. Younesi, A.; Ansari, M.; Fazli, M.; Ejlali, A.; Shafique, M.; Henkel, J. A Comprehensive Survey of Convolutions in Deep Learning: Applications, Challenges, and Future Trends. IEEE Access 2024, 12, 41180–41218.
55. Ayachi, R.; Afif, M.; Said, Y.; Atri, M. Strided Convolution Instead of Max Pooling for Memory Efficiency of Convolutional Neural Networks. In Proceedings of the 8th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT’18); Bouhlel, M.S., Rovetta, S., Eds.; Springer International Publishing: Cham, Switzerland, 2020; Volume 1, pp. 234–243.
56. Zaniolo, L.; Marques, O. On the Use of Variable Stride in Convolutional Neural Networks. Multimed. Tools Appl. 2020, 79, 13581–13598.
57. Fränti, P.; Mariescu-Istodor, R. Soft Precision and Recall. Pattern Recognit. Lett. 2023, 167, 115–121.
58. Khan, S.I.; Shahrior, A.; Karim, R.; Hasan, M.; Rahman, A. MultiNet: A Deep Neural Network Approach for Detecting Breast Cancer through Multi-Scale Feature Fusion. J. King Saud Univ.–Comput. Inf. Sci. 2022, 34, 6217–6228.
59. Franquet, T. Imaging of Pulmonary Viral Pneumonia. Radiology 2011, 260, 18–39.
60. Garg, M.; Prabhakar, N.; Kiruthika, P.; Agarwal, R.; Aggarwal, A.; Gulati, A.; Khandelwal, N. Imaging of Pneumonia: An Overview. Curr. Radiol. Rep. 2017, 5, 16.
61. Franquet, T. Imaging of Community-Acquired Pneumonia. J. Thorac. Imaging 2018, 33, 282.
62. Nillmani; Jain, P.K.; Sharma, N.; Kalra, M.K.; Viskovic, K.; Saba, L.; Suri, J.S. Four Types of Multiclass Frameworks for Pneumonia Classification and Its Validation in X-Ray Scans Using Seven Types of Deep Learning Artificial Intelligence Models. Diagnostics 2022, 12, 652.
63. Li, Y.; Zhang, Z.; Dai, C.; Dong, Q.; Badrigilan, S. Accuracy of Deep Learning for Automated Detection of Pneumonia Using Chest X-Ray Images: A Systematic Review and Meta-Analysis. Comput. Biol. Med. 2020, 123, 103898.
64. Chapman, W.W.; Fiszman, M.; Chapman, B.E.; Haug, P.J. A Comparison of Classification Algorithms to Automatically Identify Chest X-Ray Reports That Support Pneumonia. J. Biomed. Inform. 2001, 34, 4–14.
65. Wu, Y.; Rocha, B.M.; Kaimakamis, E.; Cheimariotis, G.-A.; Petmezas, G.; Chatzis, E.; Kilintzis, V.; Stefanopoulos, L.; Pessoa, D.; Marques, A.; et al. A Deep Learning Method for Predicting the COVID-19 ICU Patient Outcome Fusing X-Rays, Respiratory Sounds, and ICU Parameters. Expert Syst. Appl. 2024, 235, 121089.
66. Wehbe, R.M.; Sheng, J.; Dutta, S.; Chai, S.; Dravid, A.; Barutcu, S.; Wu, Y.; Cantrell, D.R.; Xiao, N.; Allen, B.D.; et al. DeepCOVID-XR: An Artificial Intelligence Algorithm to Detect COVID-19 on Chest Radiographs Trained and Tested on a Large U.S. Clinical Data Set. Radiology 2021, 299, E167–E176.

Figure 1. Comparative distribution of papers indexed in the Web of Science Core Collection showing machine learning visualization applications in pneumonia and medicine in general from 2015 to 2024.

Figure 2. Interpretable pediatric pneumonia diagnosis methodology: (1) data preparation and augmentation; (2) evaluation of four CNNs, InceptionV3, InceptionResNetV2, DenseNet201, MobileNetV2, with three convolutional approaches—standard, multi-scale, and strided convolutions—with Mish activation function; (3) performance and Grad-CAM interpretability.

Figure 3. The confusion matrices for the proposed approaches. Within each cell, the count of classified instances is presented as the top number, and the corresponding percentage for that actual class is displayed as the bottom number.

Figure 4. Grad-CAM visualizations for healthy lungs.

Figure 5. Grad-CAM visualizations for viral pneumonia.

Figure 6. Grad-CAM visualizations for bacterial pneumonia.

Figure 7. Grad-CAM visualizations of MobileNetV2 with Approach 2 across training stages for the classification of healthy lungs, viral pneumonia, and bacterial pneumonia.

Table 1. Comparative characteristics of CNN models employed for medical imaging.

Architecture	Strengths	Limitations
InceptionV3	Efficient multi-scale feature extraction through Inception modules [41]; effective at capturing fine-grained patterns, with moderate computational cost [42].	May struggle with very subtle features in pediatric lungs; not as lightweight as MobileNetV2 [43].
InceptionResNetV2	Combines Inception modules with residual connections; deeper and more accurate; improved gradient flow [44].	Higher computational requirements; risk of overfitting, if not carefully regularized [44].
DenseNet201	Dense connectivity promotes feature reuse and mitigates vanishing gradients [45]; strong performance on small datasets.	Higher memory usage; feature maps can become redundant, slightly increasing inference time [37,46].
MobileNetV2	Lightweight with depth-wise separable convolutions; ideal for real-time applications and devices with limited resources [43].	May underperform on very complex patterns when compared to the results for heavier models; limited representational capacity [47].

Table 2. Pneumonia classification accuracy assessment of proposed approaches.

Transfer Deep Learning Model	Classification Approach	Accuracy	F1-Score	Precision	Recall	Specificity
InceptionV3	Approach 1	0.9573	0.9462	0.9444	0.9479	0.9274
	Approach 2	0.9394	0.9202	0.9394	0.9049	0.8297
	Approach 3	0.9684	0.9595	0.9659	0.9536	0.9211
InceptionResNetV2	Approach 1	0.9684	0.9604	0.9564	0.9645	0.9558
	Approach 2	0.9539	0.9407	0.9483	0.9337	0.8896
	Approach 3	0.9718	0.9634	0.9767	0.9519	0.9085
MobileNetV2	Approach 1	0.9206	0.9060	0.8871	0.9367	0.9716
	Approach 2	0.9437	0.9254	0.9481	0.9078	0.8297
	Approach 3	0.9104	0.8945	0.8754	0.9277	0.9653
DenseNet201	Approach 1	0.9650	0.9542	0.9709	0.9403	0.8864
	Approach 2	0.9676	0.9582	0.9662	0.9510	0.9148
	Approach 3	0.9573	0.9440	0.9622	0.9291	0.8675

Approach 1, base model with Mish activation function and standard convolution; Approach 2, base model with Mish activation function and multi-scale convolutions; Approach 3, base model with Mish activation function and strided convolutions. The highest assessment metrics are bolded.

Table 3. Accuracy and validation loss during 10 and 20 epochs of training for the proposed approaches.

Transfer Deep Learning Model	Classification Approach	TA (10 Epochs)	VA (10 Epochs)	VL (10 Epochs)	TA (20 Epochs)	VA (20 Epochs)	VL (20 Epochs)
InceptionV3	Approach 1	0.9379	0.9149	0.2379	0.9662	0.9714	0.1034
	Approach 2	0.9364	0.9253	0.1844	0.9688	0.9384	0.1645
	Approach 3	0.9638	0.9583	0.1155	0.9658	0.9705	0.0744
InceptionResNetV2	Approach 1	0.9688	0.9453	0.7087	0.9375	0.9714	0.0723
	Approach 2	0.9302	0.9392	0.1664	0.9688	0.9497	0.1476
	Approach 3	0.9512	0.9392	0.4888	0.9769	0.9670	0.0912
MobileNetV2	Approach 1	0.9375	0.9201	0.2278	0.9705	0.9193	0.2688
	Approach 2	0.9688	0.9549	0.1289	0.9520	0.9523	0.1383
	Approach 3	0.9664	0.8811	0.5385	0.9681	0.9314	0.2727
DenseNet201	Approach 1	0.9062	0.9314	0.2257	0.9688	0.9635	0.1372
	Approach 2	0.9688	0.9592	0.1171	0.9375	0.9566	0.1058
	Approach 3	0.9523	0.9583	0.1191	0.9062	0.9566	0.1254

Approach 1, base model with Mish activation function and standard convolution; Approach 2, base model with Mish activation function and multi-scale convolutions; Approach 3, base model with Mish activation function and strided convolutions; (TA) training accuracy; (VA) validation accuracy; (VL) validation loss. The highest assessment metrics are bolded.
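One simple way to read Table 3 is to compare training and validation accuracy at the same epoch count. The sketch below computes that difference (TA minus VA) for the 20-epoch columns, using the values exactly as printed in the table. It is an illustrative reading aid, not an analysis performed by the authors; the variable names are hypothetical, and a single-epoch gap is only a rough indicator of overfitting.

```python
# (model, approach) -> (training accuracy, validation accuracy) at 20 epochs,
# copied from Table 3.
rows_20_epochs = {
    ("InceptionV3", "Approach 1"): (0.9662, 0.9714),
    ("InceptionV3", "Approach 2"): (0.9688, 0.9384),
    ("InceptionV3", "Approach 3"): (0.9658, 0.9705),
    ("InceptionResNetV2", "Approach 1"): (0.9375, 0.9714),
    ("InceptionResNetV2", "Approach 2"): (0.9688, 0.9497),
    ("InceptionResNetV2", "Approach 3"): (0.9769, 0.9670),
    ("MobileNetV2", "Approach 1"): (0.9705, 0.9193),
    ("MobileNetV2", "Approach 2"): (0.9520, 0.9523),
    ("MobileNetV2", "Approach 3"): (0.9681, 0.9314),
    ("DenseNet201", "Approach 1"): (0.9688, 0.9635),
    ("DenseNet201", "Approach 2"): (0.9375, 0.9566),
    ("DenseNet201", "Approach 3"): (0.9062, 0.9566),
}

# Gap = TA - VA; a large positive gap means training accuracy exceeds
# validation accuracy by a wide margin.
gaps = {key: round(ta - va, 4) for key, (ta, va) in rows_20_epochs.items()}

# Configuration with the largest positive gap at 20 epochs.
largest = max(gaps, key=gaps.get)

for (model, approach), gap in sorted(gaps.items(), key=lambda kv: -kv[1]):
    print(f"{model:18s} {approach}: {gap:+.4f}")
print("Largest gap:", largest, gaps[largest])
```

The same dictionary can be rebuilt from the 10-epoch columns to see how the gap changes with longer training.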

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Radočaj, P.; Martinović, G. Interpretable Deep Learning for Pediatric Pneumonia Diagnosis Through Multi-Phase Feature Learning and Activation Patterns. Electronics 2025, 14, 1899. https://doi.org/10.3390/electronics14091899
