Combining Deep Learning Architectures with Fuzzy Logic for Robust Pneumonia Detection in Chest X-Rays

Mjahad, Azeddine; Rosado-Muñoz, Alfredo

doi:10.3390/app151910321


by Azeddine Mjahad * and Alfredo Rosado-Muñoz *

GDDP, Department Electronic Engineering, School of Engineering, University of Valencia, 46100 Burjassot, Valencia, Spain

* Authors to whom correspondence should be addressed.

Appl. Sci. 2025, 15(19), 10321; https://doi.org/10.3390/app151910321

Submission received: 3 August 2025 / Revised: 5 September 2025 / Accepted: 11 September 2025 / Published: 23 September 2025

(This article belongs to the Special Issue Machine Learning-Based Feature Extraction and Selection: 2nd Edition)

Abstract

Early and accurate detection of pneumonia from chest X-ray images is essential for improving treatment and clinical outcomes. Medical imaging datasets often exhibit class imbalance and uncertainty in feature extraction, which complicates conventional classification methods and motivates the use of advanced approaches combining deep learning and fuzzy logic. This study proposes a hybrid approach that combines deep learning architectures (VGG16, EfficientNetV2, MobileNetV2, ResNet50) for feature extraction with fuzzy logic-based classifiers, including Fuzzy C-Means, Fuzzy Decision Tree, Fuzzy KNN, Fuzzy SVM, and ANFIS (Adaptive Neuro-Fuzzy Inference System). Feature selection techniques were also applied to enhance the discriminative power of the extracted features. The best-performing model, ANFIS with MobileNetV2 features and Gaussian membership functions, achieved an overall accuracy of 98.52%, with Normal class precision of 97.07%, recall of 97.48%, and F1-score of 97.27%, and Pneumonia class precision of 99.06%, recall of 98.91%, and F1-score of 98.99%. Among the fuzzy classifiers, Fuzzy SVM and Fuzzy KNN also showed strong performance with accuracy above 96%, while Fuzzy Decision Tree and Fuzzy C-Means achieved moderate results. These findings demonstrate that integrating deep feature extraction with neuro-fuzzy reasoning significantly improves diagnostic accuracy and robustness, providing a reliable tool for clinical decision support. Future research will focus on optimizing model efficiency, interpretability, and real-time applicability.

Keywords:

pneumonia detection; chest X-ray; deep learning; feature selection; VGG16; EfficientNetV2; MobileNetV2; ResNet50; fuzzy classifiers; ANFIS; medical imaging; hybrid model; clinical decision support

1. Introduction

Pneumonia is a respiratory disease characterized by lung inflammation and remains one of the leading causes of morbidity and mortality worldwide, particularly among vulnerable populations such as children, the elderly, and immunocompromised patients [1]. Early and accurate detection of this condition is crucial for initiating appropriate treatment and improving patient prognosis [2].

Chest X-rays (CXR) are the most widely used imaging modality for pneumonia diagnosis due to their low cost, rapid acquisition, and widespread availability [3]. However, interpreting chest radiographs is often complex and subjective, even for experienced radiologists, because of overlapping anatomical structures, variable image quality, and visual similarities between pneumonia and other pulmonary conditions [4]. Furthermore, in many regions, limited resources and a shortage of specialists hinder timely and high-quality diagnosis [5].

To address these challenges, computer-aided diagnosis (CAD) systems have gained increasing importance in medical imaging [6]. Artificial intelligence (AI)-based approaches have demonstrated significant potential to improve diagnostic accuracy, reduce the workload of healthcare professionals, and streamline clinical workflows [7]. Deep learning models in particular have achieved remarkable results in medical image analysis because they automatically extract discriminative features directly from raw data, without manual feature engineering [8]. In parallel, fuzzy logic-based methods have been widely applied in biomedical research, given their capacity to handle the uncertainty, imprecision, and non-linear relationships inherent in clinical data [9]. Fuzzy systems provide flexible and interpretable frameworks for decision-making, making them particularly suitable for medical diagnosis, where data ambiguity is common [10]. Consequently, integrating fuzzy reasoning with AI-based feature extraction can enhance the robustness, transparency, and reliability of CAD systems for pneumonia detection.

Several studies in recent years have leveraged large-scale public CXR datasets to develop automated pneumonia detection models. Despite encouraging results, persistent challenges remain, including class imbalance, inter-patient variability, and the uncertainty of visual patterns, which may compromise model generalizability in real-world clinical settings.

Therefore, this study proposes a hybrid framework for pneumonia detection that combines deep learning architectures for feature extraction with fuzzy logic-based classifiers. The approach aims to exploit the high representational capacity of deep neural networks together with the interpretability and uncertainty management of fuzzy systems. By addressing the limitations of conventional methods, the proposed system seeks to improve diagnostic accuracy, robustness, and clinical applicability, ultimately contributing to timely and effective patient care.

Proposed Work:

This study proposes a hybrid framework for the automated detection of pneumonia in chest X-ray (CXR) images, integrating feature extraction using CNNs with fuzzy logic-based classifiers, including ANFIS. Unlike previous works where fuzzy systems were applied first for feature extraction and CNNs were subsequently used for classification, our approach reverses this order: CNNs are employed to extract deep features and optimize relevant parameters, while fuzzy classifiers, including ANFIS, are applied for the final classification. This strategy enables more effective training, enhances model robustness and interpretability, and maximizes accuracy in pneumonia detection.

The methodology is structured into three main stages:

Stage 1: Deep Feature Extraction via CNNs
Pretrained CNNs (VGG16, EfficientNetV2, MobileNetV2, and ResNet50) are used to extract deep features and optimize parameters relevant for classification.
Stage 2: Fuzzy Classification
Various fuzzy classifiers (Fuzzy SVM, Fuzzy DT, Fuzzy KNN, FCM, and ANFIS) are trained using the features extracted from the CNNs. For ANFIS, both Gaussian and triangular membership functions are used, with parameters automatically adjusted through ANFIS’s hybrid learning algorithm (combining gradient descent and least squares).
Stage 3: Hybrid Integration and Optimization
A genetic algorithm is applied to select the best parameters, and all CNN–fuzzy combinations are compared to identify the most effective configuration for pneumonia classification.

This reversal of the processing flow represents the main novelty of our approach compared to previous studies, allowing more efficient use of extracted features and superior performance in pneumonia classification.

The remainder of this paper is organized as follows. Section 2 introduces the classification strategies explored in this work, with an emphasis on deep learning techniques (CNNs) and fuzzy logic-based approaches. Section 3 describes the fuzzy logic-based classification models employed in the study. Section 4 details the data sources and outlines the methodological framework, including the training scheme, feature extraction process, and evaluation protocol. Section 5 presents the experimental results, which are further analyzed and discussed in Section 6. Finally, Section 8 summarizes the main contributions of this work and outlines directions for future research.

2. Fundamental Concepts of CNNs

This section provides an overview of CNNs, focusing on the particular architecture adopted for this research.

CNNs have gained widespread attention due to their superior accuracy in tasks involving image recognition and classification. Fundamentally, they can be viewed as specialized feedforward artificial neural networks (ANNs) [11], as depicted in Figure 1.

These networks consist of multiple layers, each containing trainable filters or kernels (also called neurons) with weights and biases that are optimized during training. The process involves convolving these filters over the input data, often followed by nonlinear activation functions to introduce complexity [12]. A standard CNN architecture is composed of the following elements:

Convolutional layers (CONV): These layers apply convolutional operations to extract features from the input data.
Pooling layers (POOL): They reduce the spatial dimensions of feature maps, commonly through subsampling, to decrease computational load and capture dominant features.
Fully connected layers (FCL): These layers connect every neuron from the previous layer to each neuron in the current layer, similar to a traditional perceptron, to interpret extracted features.
Classification layer (Softmax): This final layer computes probabilities for each class, assigning the input to the most likely category.

2.1. Optimization of Hyperparameters

Hyperparameters in CNNs are parameters that are not learned during training but must be predefined. They significantly affect model performance and training efficiency.

Important hyperparameters include:

Number of layers: Typically a combination of convolutional, activation (e.g., ReLU), pooling, and fully connected layers [13].
Filter size (kernel size): Examples include $3 \times 3$ , $5 \times 5$ , or $7 \times 7$ , which influence feature extraction granularity [14].
Number of filters: Determines the depth and expressiveness of feature maps, balancing computational cost and accuracy [15].
Stride: Controls the step size of filters over the input; common values are 1 or 2, affecting output size [16].
Padding: Preserves spatial dimensions, typically set as ‘same’ or ‘valid’ [16].
Activation functions: Such as ReLU, leaky ReLU, or Sigmoid; ReLU is favored for simplicity and effectiveness [17].
Pooling layers: Max pooling or average pooling with typical pool sizes of $2 \times 2$ reduce spatial dimensions and computational load [18].
Fully connected layers: Size varies depending on task complexity; output layer size matches the number of classes [19].
Dropout: A regularization technique to prevent overfitting by randomly deactivating neurons during training, usually with rates between 0.2 and 0.5 [20].
Batch size: Number of samples processed per training iteration, impacting convergence speed and computational requirements [21].
Number of epochs: Defines how many times the entire dataset is passed through the network during training [22].
Learning rate: Controls the optimization step size; too small slows convergence, too large risks instability [23].
Optimizers: Algorithms such as Stochastic Gradient Descent (SGD), Adam, and RMSprop influence training efficiency and model accuracy [24].

Selecting optimal hyperparameters depends on the dataset, problem domain, and available computational resources, often requiring iterative tuning and experimentation.
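As a small worked illustration of how kernel size, stride, and padding determine the size of a feature map, the sketch below applies the standard output-size rule along one spatial dimension. The function name `conv_output_size` and the 224-pixel input are assumptions chosen for illustration; they are not settings reported in this study.

```python
import math


def conv_output_size(n_in: int, kernel: int, stride: int = 1,
                     padding: str = "valid") -> int:
    """Output length of a convolution or pooling layer along one dimension.

    'valid' uses no padding; 'same' pads so that the output length is
    ceil(n_in / stride), which is the convention of common frameworks.
    """
    if padding == "same":
        return math.ceil(n_in / stride)
    if padding == "valid":
        return (n_in - kernel) // stride + 1
    raise ValueError("padding must be 'same' or 'valid'")


# Hypothetical 224-pixel input passed through typical layer settings.
print(conv_output_size(224, 3, stride=1, padding="same"))   # 3x3 conv, same
print(conv_output_size(224, 3, stride=1, padding="valid"))  # 3x3 conv, valid
print(conv_output_size(224, 2, stride=2, padding="valid"))  # 2x2 pooling
```

With 'same' padding and stride 1 the spatial size is preserved, whereas a $2 \times 2$ pooling layer with stride 2 halves it.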

2.2. CNN Approaches Implemented

In this study, four CNN architectures with distinct characteristics and purposes were employed for image classification and analysis: VGG16, EfficientNetV2, MobileNetV2, and ResNet50. VGG16 is recognized for its simplicity and depth, stacking exclusively small $3 \times 3$ convolutional filters across multiple layers to increase representational capacity. It also uses pooling layers to reduce spatial dimensions and fully connected layers for classification. EfficientNetV2 optimizes both accuracy and efficiency by applying compound scaling of depth, width, and resolution, along with improved training techniques. This architecture achieves strong performance with fewer parameters and faster training times compared to previous models. MobileNetV2 is designed specifically for efficiency on mobile and embedded devices, utilizing depthwise separable convolutions to significantly reduce the number of parameters and computational cost. It introduces bottleneck layers and residual connections to further enhance performance. ResNet50, a deep residual network, addresses the degradation problem in very deep networks by using skip connections that facilitate gradient flow during training. This architecture enables training of very deep models while maintaining high accuracy. Each of these architectures offers unique advantages that complement one another to enhance visual data analysis in this study.
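To make the parameter saving of depthwise separable convolutions concrete, the following sketch counts the weights (biases omitted) of a standard convolution and of its depthwise separable counterpart. The layer sizes used in the example (a $3 \times 3$ kernel, 32 input channels, 64 output channels) are hypothetical and do not correspond to a specific MobileNetV2 layer.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a standard k x k convolution (bias terms omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a depthwise k x k convolution followed by a 1 x 1
    pointwise convolution (bias terms omitted)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 32 input channels, 64 output channels.
std = standard_conv_params(3, 32, 64)
sep = depthwise_separable_params(3, 32, 64)
print(std, sep, round(std / sep, 2))
```

For this hypothetical layer the separable form needs roughly one eighth of the weights, which is the kind of reduction the architecture relies on.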

3. Fuzzy Logic-Based Classification Models

Fuzzy logic, introduced by Zadeh in 1965, allows handling imprecision and uncertainty through membership values in the range $[0, 1]$ [25]. Unlike classical binary logic, it enables the representation of degrees of membership and the management of uncertain information, which is particularly useful in medical image classification.

Fuzzy sets are represented by membership functions, typically triangular or Gaussian. In this work, we use three membership functions per feature (two Gaussian and one triangular) to capture variability and uncertainty while preserving interpretability. Among the most widely used fuzzy algorithms, two groups can be distinguished:

ANFIS (Adaptive Neuro-Fuzzy Inference System) [26]: uses explicit membership functions and trainable fuzzy rules. During training, ANFIS adjusts both the parameters of the membership functions and the rules that combine the inputs to generate the output, through a hybrid learning approach (gradient + least squares). This enables adaptive modeling of uncertainty and nonlinear relationships.
Other fuzzy classifiers:
– Fuzzy SVM [27]: incorporates fuzzy membership values for each sample to weight its importance during training, but does not adjust membership functions or explicit rules.
– Fuzzy KNN [28]: assigns soft memberships to a query point based on its distance to neighbors; it does not employ trainable membership functions or rules.
– Fuzzy DT [29]: operates with derived membership values at each node, without training membership functions or rules; it enables handling of uncertainty in decision-making.
– FCM [30,31]: performs fuzzy clustering based on adaptive distances and cluster memberships; it does not train membership functions or rules.
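The distance-based soft assignment used by Fuzzy KNN can be sketched as follows. This is a minimal illustration that assumes crisp training labels and the commonly used inverse-distance weighting with fuzzifier `m`; the function name, the value of `k`, and the toy data are assumptions and are not taken from this study.

```python
import numpy as np


def fuzzy_knn_memberships(X_train, y_train, x_query, k=3, m=2.0, n_classes=2):
    """Class memberships of one query point from its k nearest neighbours.

    Each neighbour votes for its own (crisp) class with weight
    1 / d^(2 / (m - 1)); the votes are normalised to sum to one.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=int)
    d = np.linalg.norm(X_train - np.asarray(x_query, dtype=float), axis=1)
    nearest = np.argsort(d)[:k]
    # Guard against a zero distance when the query equals a training point.
    weights = 1.0 / np.maximum(d[nearest], 1e-12) ** (2.0 / (m - 1.0))
    memberships = np.zeros(n_classes)
    for idx, w in zip(nearest, weights):
        memberships[y_train[idx]] += w
    return memberships / weights.sum()


# Toy one-dimensional example: class 0 near 0, class 1 near 10.
X = np.array([[0.0], [1.0], [10.0], [11.0]])
y = np.array([0, 0, 1, 1])
print(fuzzy_knn_memberships(X, y, [0.5], k=3))
```

The query at 0.5 receives a membership close to one for class 0, while the single distant class-1 neighbour contributes only a small residual membership.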

To illustrate the operation of ANFIS and its trainable membership functions and rules, different configurations of membership functions were analyzed. Figure 2, Figure 3 and Figure 4 present the ANFIS training schemes using Gaussian, trapezoidal, and triangular membership functions, respectively.
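For reference, the three membership function shapes mentioned above can be written compactly as follows. This is a minimal sketch; the parameter names (`c`, `sigma`, and the breakpoints `a`, `b`, `c`, `d`) are illustrative and do not correspond to the trained values used in this study.

```python
import numpy as np


def gaussian_mf(x, c, sigma):
    """Gaussian membership with centre c and width sigma."""
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * ((x - c) / sigma) ** 2)


def triangular_mf(x, a, b, c):
    """Triangular membership with feet a and c and peak b (a < b < c)."""
    x = np.asarray(x, dtype=float)
    left = (x - a) / (b - a)
    right = (c - x) / (c - b)
    return np.clip(np.minimum(left, right), 0.0, 1.0)


def trapezoidal_mf(x, a, b, c, d):
    """Trapezoidal membership with feet a and d and plateau [b, c]
    (a < b <= c < d)."""
    x = np.asarray(x, dtype=float)
    left = (x - a) / (b - a)
    right = (d - x) / (d - c)
    return np.clip(np.minimum(np.minimum(left, 1.0), right), 0.0, 1.0)


xs = np.linspace(0.0, 3.0, 7)
print(gaussian_mf(xs, c=1.5, sigma=0.5))
print(triangular_mf(xs, 0.0, 1.5, 3.0))
print(trapezoidal_mf(xs, 0.0, 1.0, 2.0, 3.0))
```

All three functions return values in $[0, 1]$; the Gaussian is smooth everywhere, whereas the triangular and trapezoidal shapes are piecewise linear.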

4. Materials and Methods

This study proposes a hybrid methodology for the automatic detection of pneumonia in CXR images, combining deep learning for feature extraction with fuzzy logic-based classification. The general pipeline includes dataset preparation, feature extraction using CNN architectures, feature selection via GAs, and classification using fuzzy systems. Finally, the results are evaluated through performance metrics. Figure 5 illustrates the overall workflow.

Dataset Preparation:

The dataset consists of labeled CXR images divided into pneumonia-positive and normal classes.
The data was split into training and testing sets, maintaining class balance and representative sampling.
Preprocessing involved normalization and resizing images to fit the input requirements of the CNN models.

CNN Architectures for Deep Feature Extraction:

Pre-trained CNN architectures such as VGG16, EfficientNetV2, MobileNetV2, and ResNet50 were employed.
The models were fine-tuned on the training dataset to provide baseline classification performance.
Deep features were extracted from the penultimate layers to be used in subsequent fuzzy classification.

Feature Selection Using Genetic Algorithms:

The extracted CNN features are typically high-dimensional and may contain redundant or irrelevant components.
For each CNN, a genetic algorithm (GA) was applied to select the most informative parameters within its feature group, yielding an optimized version of each CNN.
The optimized feature groups from different CNNs were then compared using clustering-based metrics, namely the Silhouette coefficient and the Calinski-Harabasz index, in order to determine the best CNN feature group for classification.

Fuzzy Classification Models and Evaluation:

Several fuzzy logic-based classifiers were trained using the optimized CNN features.
The classifiers considered were Fuzzy SVM, Fuzzy DT, Fuzzy K-Nearest Neighbors (Fuzzy KNN), FCM, and ANFIS.
Performance was evaluated using accuracy, precision, recall, F1-score, and ROC curves.
All CNN–fuzzy combinations were tested to identify the most effective hybrid configuration.

Testing and Validation:

Test images were preprocessed and passed through the trained CNN feature extractors.
Optimized feature subsets were used as inputs to the trained fuzzy classifiers.
Model performance was assessed on unseen data to verify generalization and robustness.

4.1. Dataset Preparation

The dataset used in this study is the publicly available CXR Images (Pneumonia) dataset, originally introduced as part of a clinical study published in the journal Cell [32]. It contains 5863 anterior-posterior (AP) chest X-ray images of pediatric patients aged 1–5 years. The data were collected at the Guangzhou Women and Children’s Medical Center in China as part of routine clinical care. All radiographs underwent an initial quality screening, and low-quality or unreadable scans were excluded. Diagnostic labels, classified as either Normal or Pneumonia (bacterial or viral), were assigned based on expert evaluation. Each image was independently reviewed by two certified radiologists, and to ensure diagnostic accuracy, a third specialist evaluated a subset of images to reduce annotation errors.

The dataset is organized into three predefined folders: training, validation, and testing. These subsets are balanced across the two classes, facilitating supervised learning and consistent performance evaluation. The images are in JPEG format and exhibit sufficient resolution for feature extraction via CNNs. The dataset is available under a Creative Commons (CC BY 4.0) license and can be accessed through Mendeley Data [33]. Figure 6 provides example CXR scans illustrating the visual differences between normal lungs and those affected by bacterial and viral pneumonia.

4.2. CNN Architectures for Deep Feature Extraction

To extract rich and discriminative feature representations from CXR images, several well-known CNN architectures were employed in a transfer learning setting. The models used include VGG16, EfficientNetV2, MobileNetV2, and ResNet50. Each of these networks was initialized with weights pretrained on the ImageNet dataset. Their original classification layers were removed, and the convolutional base was kept frozen during training in order to retain the generic visual features learned from large-scale natural image datasets.

On top of each frozen base CNN, a custom classification head was added to adapt the model to the binary task (normal vs. pneumonia). This head consists of a Global Average Pooling (GAP) layer, followed by two fully connected (dense) layers with ReLU activation, and a final dense layer with a sigmoid activation function for binary classification. During training, only these top layers were optimized. The output of the penultimate dense layer was extracted as a deep feature vector for each image.
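The data flow through this classification head can be sketched in plain NumPy as shown below. The weights are random and the layer widths (8 and 4 units) and feature-map shape are hypothetical placeholders, so the sketch only illustrates the order of operations and which tensor is taken as the deep feature vector; it is not the trained model.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_head(channels, units1, units2, seed=0):
    """Random weights for a GAP -> dense -> dense -> sigmoid head
    (placeholder values, not trained parameters)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(scale=0.1, size=(channels, units1)), "b1": np.zeros(units1),
        "W2": rng.normal(scale=0.1, size=(units1, units2)), "b2": np.zeros(units2),
        "W3": rng.normal(scale=0.1, size=(units2, 1)), "b3": np.zeros(1),
    }


def head_forward(feature_maps, params):
    """feature_maps: (batch, height, width, channels) from the frozen base."""
    gap = feature_maps.mean(axis=(1, 2))                  # global average pooling
    h1 = relu(gap @ params["W1"] + params["b1"])          # first dense layer
    h2 = relu(h1 @ params["W2"] + params["b2"])           # penultimate dense layer
    prob = sigmoid(h2 @ params["W3"] + params["b3"]).ravel()  # P(pneumonia)
    return prob, h2                                       # h2 = deep feature vector


rng = np.random.default_rng(1)
maps = rng.random((4, 7, 7, 16))          # hypothetical base-network output
probabilities, deep_features = head_forward(maps, init_head(16, 8, 4))
print(probabilities.shape, deep_features.shape)
```

In this arrangement the fuzzy classifiers would consume `deep_features`, while `probabilities` corresponds to the output of the CNN-only baseline.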

Table 1 summarizes the CNN architectures used in this work and the common classification head added to all of them. These deep features were then used as inputs to fuzzy logic-based classifiers, combining the automatic and powerful feature extraction capability of CNNs with the flexibility and uncertainty management provided by fuzzy models. This hybrid approach aims to improve diagnostic accuracy and robustness by leveraging the strengths of both methodologies.

In addition to the pretrained backbones, we also designed a custom CNN architecture (Table 2), which serves as a baseline for validating the performance of our hybrid approach against both standard and customized CNN structures.

4.3. Feature Selection Using GAs

A two-stage feature selection strategy based on Genetic Algorithms (GA) was employed:

GA—selects the most relevant features within each group.
Quality Indices—compare the complete feature groups generated by GA to select the most representative set.

4.3.1. Method 1: GA-Based Feature Selection Using Quality Metric Q

To maximize class separability, a quality metric Q is computed as the average Euclidean distance between samples from different classes [34]:

$$Q = \frac{1}{|C_{0}|\,|C_{1}|} \sum_{x \in C_{0}} \sum_{y \in C_{1}} \| x - y \| \tag{1}$$

where $C_{0}$ and $C_{1}$ are the sets of samples from classes 0 and 1, respectively, each represented by its selected features, and $\| x - y \|$ is the Euclidean distance between samples $x$ and $y$.
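A direct implementation of Equation (1) is sketched below, assuming a feature matrix `X` with one row per image, binary labels `y`, and an optional Boolean `mask` that marks the selected features. These names are illustrative and are not taken from the authors' code.

```python
import numpy as np


def quality_q(X, y, mask=None):
    """Mean Euclidean distance between every class-0 / class-1 sample pair."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    if mask is not None:
        X = X[:, np.asarray(mask, dtype=bool)]   # keep only selected features
    c0, c1 = X[y == 0], X[y == 1]
    # Pairwise differences, shape (|C0|, |C1|, n_features).
    diffs = c0[:, None, :] - c1[None, :, :]
    return np.linalg.norm(diffs, axis=2).mean()


# Toy example with one sample per class.
X = np.array([[0.0, 0.0], [3.0, 4.0]])
y = np.array([0, 1])
print(quality_q(X, y))                        # uses both features
print(quality_q(X, y, mask=[True, False]))    # uses the first feature only
```

The pairwise difference array grows with $|C_{0}| \cdot |C_{1}|$, so for large datasets the distances would need to be accumulated in batches.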

4.3.2. Method 2: GA-Based Feature Selection Using the Silhouette Index

To evaluate clustering quality of feature subsets, the average Silhouette index is computed [35]:

$$\text{fitness} = \frac{1}{n} \sum_{i=1}^{n} \frac{b(i) - a(i)}{\max\{a(i), b(i)\}} \tag{2}$$

where $a(i)$ is the average intra-cluster distance and $b(i)$ is the minimum average inter-cluster distance of sample $i$.
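Equation (2) can be evaluated with the short NumPy sketch below, which treats the class labels as cluster assignments. The function name and the toy data are assumptions made for illustration.

```python
import numpy as np


def mean_silhouette(X, labels):
    """Average silhouette value over all samples (Euclidean distance)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = int(same.sum())
        if n_same == 1:
            continue                      # singleton cluster: score stays 0
        # a(i): mean distance to the other members of the same cluster.
        a = dist[i, same].sum() / (n_same - 1)
        # b(i): smallest mean distance to the members of any other cluster.
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return scores.mean()


# Two compact, well-separated groups give a value close to 1.
X = np.array([[0.0], [1.0], [10.0], [11.0]])
labels = np.array([0, 0, 1, 1])
print(mean_silhouette(X, labels))
```

Values near 1 indicate compact, well-separated classes, values near 0 indicate overlapping classes, and negative values indicate samples that lie closer to the other class.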

4.3.3. Additional Cluster Validity Indices

To further assess clustering performance, the Calinski-Harabasz (CH) and Davies-Bouldin (DB) indices are computed:

$$CH = \frac{\operatorname{tr}(B_{k}) / (k - 1)}{\operatorname{tr}(W_{k}) / (n - k)}, \qquad DB = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} \frac{S_{i} + S_{j}}{M_{ij}} \tag{3}$$

where $S_{i}$ is the intra-cluster distance of cluster $i$ and $M_{ij}$ is the distance between centroids of clusters $i$ and $j$.
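The two indices in Equation (3) can be computed as sketched below. The sketch assumes the usual definitions, in which $\operatorname{tr}(B_{k})$ and $\operatorname{tr}(W_{k})$ are the between-cluster and within-cluster sums of squared distances, $k$ is the number of clusters, $n$ is the number of samples, and $S_{i}$ is the mean distance of the members of cluster $i$ to its centroid. The function name is illustrative.

```python
import numpy as np


def ch_db_indices(X, labels):
    """Calinski-Harabasz and Davies-Bouldin indices for labelled data."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    n, k = len(X), len(clusters)
    overall = X.mean(axis=0)
    members = [X[labels == c] for c in clusters]
    centroids = np.array([m.mean(axis=0) for m in members])
    sizes = np.array([len(m) for m in members])

    # Between- and within-cluster dispersion (traces of B_k and W_k).
    tr_b = (sizes * ((centroids - overall) ** 2).sum(axis=1)).sum()
    tr_w = sum(((m - c) ** 2).sum() for m, c in zip(members, centroids))
    ch = (tr_b / (k - 1)) / (tr_w / (n - k))

    # S_i: mean distance to own centroid; M_ij: distance between centroids.
    s = np.array([np.linalg.norm(m - c, axis=1).mean()
                  for m, c in zip(members, centroids)])
    m_ij = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
    db = sum(max((s[i] + s[j]) / m_ij[i, j] for j in range(k) if j != i)
             for i in range(k)) / k
    return ch, db


X = np.array([[0.0], [1.0], [10.0], [11.0]])
labels = np.array([0, 0, 1, 1])
print(ch_db_indices(X, labels))
```

Higher CH values and lower DB values both indicate better-separated clusters, so the two indices should be read in opposite directions when comparing feature groups.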

The configuration of the GA used for feature selection is summarized in Table 3.
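The general shape of a GA for binary feature selection is sketched below. It is a generic illustration under stated assumptions, not the authors' configuration: the population size, number of generations, mutation rate, tournament selection, uniform crossover, and elitism are all placeholder choices, and the `fitness` callable stands in for a criterion such as Equation (1) or Equation (2) evaluated on the masked features.

```python
import numpy as np


def ga_select(n_features, fitness, pop_size=20, generations=30,
              mutation_rate=0.05, seed=0):
    """Evolve Boolean feature masks; return the best mask and the
    best-so-far fitness recorded at each generation."""
    rng = np.random.default_rng(seed)
    pop = rng.random((pop_size, n_features)) < 0.5
    pop[~pop.any(axis=1), 0] = True            # avoid empty feature sets
    best_mask, best_fit, history = None, -np.inf, []

    def tournament(fits):
        a, b = rng.integers(pop_size, size=2)
        return pop[a] if fits[a] >= fits[b] else pop[b]

    for _ in range(generations):
        fits = np.array([fitness(ind) for ind in pop])
        i_best = int(fits.argmax())
        if fits[i_best] > best_fit:
            best_fit, best_mask = fits[i_best], pop[i_best].copy()
        history.append(best_fit)

        children = [best_mask.copy()]          # elitism: keep the best mask
        while len(children) < pop_size:
            p1, p2 = tournament(fits), tournament(fits)
            cross = rng.random(n_features) < 0.5
            child = np.where(cross, p1, p2)    # uniform crossover
            child = child ^ (rng.random(n_features) < mutation_rate)  # bit flips
            if not child.any():
                child[rng.integers(n_features)] = True
            children.append(child)
        pop = np.array(children)
    return best_mask, history


# Toy fitness: reward agreement with a known (hypothetical) target mask.
target = np.array([True, False, True, False, False, True, False, False])
mask, history = ga_select(8, lambda m: -int((m != target).sum()), seed=1)
print(mask.astype(int), history[-1])
```

Because the best mask is carried over unchanged, the best-so-far fitness cannot decrease from one generation to the next, which is the behavior a fitness-evolution plot would show.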

4.4. Fuzzy Classification Models and Evaluation

The optimal feature subset obtained through the GAs described in the previous section was used as input for several fuzzy classifiers. These models leverage fuzzy logic rules and membership functions to handle the uncertainty and imprecision often present in medical data. The classifiers evaluated include: Fuzzy KNN, Fuzzy DT, Fuzzy SVM, FCM, and ANFIS.

Each model was tuned using different types of membership functions (triangular, trapezoidal, and Gaussian) and decision thresholds to optimize classification performance. In the case of ANFIS, the hybrid learning algorithm (combining gradient descent and least squares) was used to adapt the membership function parameters and fuzzy rules.
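To illustrate the least-squares half of the hybrid learning scheme, the sketch below implements the forward pass of a first-order Sugeno-type ANFIS with Gaussian membership functions and fits the rule consequents by linear least squares while the membership parameters are held fixed. The gradient-descent update of the membership parameters is omitted. The grid-partition rule base, the function names, and the toy two-input problem are assumptions for illustration; a grid partition would not scale to high-dimensional deep features.

```python
import itertools
import numpy as np


def firing_strengths(X, centers, sigmas):
    """Normalised rule firing strengths.

    X: (n, d) inputs; centers, sigmas: (d, M) Gaussian parameters,
    i.e. M membership functions per input and M**d rules.
    """
    mu = np.exp(-0.5 * ((X[:, :, None] - centers[None]) / sigmas[None]) ** 2)
    d, n_mf = centers.shape
    rules = itertools.product(range(n_mf), repeat=d)
    # Product t-norm over the membership degrees selected by each rule.
    w = np.stack([np.prod([mu[:, j, r[j]] for j in range(d)], axis=0)
                  for r in rules], axis=1)
    return w / w.sum(axis=1, keepdims=True)


def fit_consequents(X, y, centers, sigmas):
    """Least-squares estimate of the linear consequent of every rule."""
    wn = firing_strengths(X, centers, sigmas)                  # (n, R)
    X1 = np.hstack([X, np.ones((len(X), 1))])                  # add bias term
    design = (wn[:, :, None] * X1[:, None, :]).reshape(len(X), -1)
    theta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return theta.reshape(wn.shape[1], -1)                      # (R, d + 1)


def predict(X, centers, sigmas, theta):
    """Weighted sum of the rule outputs."""
    wn = firing_strengths(X, centers, sigmas)
    X1 = np.hstack([X, np.ones((len(X), 1))])
    return np.einsum("nr,rk,nk->n", wn, theta, X1)


rng = np.random.default_rng(0)
X = rng.random((50, 2))
y = 2.0 * X[:, 0] - X[:, 1] + 0.5          # toy target
centers = np.array([[0.0, 1.0], [0.0, 1.0]])
sigmas = np.full((2, 2), 0.5)
theta = fit_consequents(X, y, centers, sigmas)
print(np.abs(predict(X, centers, sigmas, theta) - y).max())
```

For binary classification, the continuous output of such a model would still have to be thresholded; the threshold is a further design choice that is not shown here.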

To ensure robustness, all fuzzy classifiers were trained and validated using 5-fold cross-validation on the training data. The dataset division was the same as applied to the CNN architectures: 75% for training, 15% for testing, and 10% for validation. The training subset was used both for feature selection (via GA) and classifier optimization, while the validation set was employed for hyperparameter tuning. The independent test set (15%) was used exclusively for the final evaluation to provide unbiased performance assessment.
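The partitioning described above can be sketched with index arrays as follows. The sketch uses a plain random permutation and does not stratify by class; whether the original split was stratified is not stated here, so the function names, the seed, and the sample count are assumptions for illustration.

```python
import numpy as np


def split_indices(n, train=0.75, test=0.15, seed=0):
    """Random 75/15/10 split of n sample indices into train/test/validation."""
    idx = np.random.default_rng(seed).permutation(n)
    n_train = int(round(train * n))
    n_test = int(round(test * n))
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]


def kfold_indices(indices, k=5):
    """Yield (train, validation) index pairs for k-fold cross-validation."""
    folds = np.array_split(indices, k)
    for i in range(k):
        held_out = folds[i]
        rest = np.concatenate([folds[j] for j in range(k) if j != i])
        yield rest, held_out


train_idx, test_idx, val_idx = split_indices(1000)
print(len(train_idx), len(test_idx), len(val_idx))
for fold_train, fold_val in kfold_indices(train_idx, k=5):
    print(len(fold_train), len(fold_val))
```

The cross-validation folds are drawn only from the training indices, so the test indices remain untouched until the final evaluation.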

The evaluation metrics used in this study—accuracy, precision, recall (sensitivity), specificity, F1-score, and AUC—are standard in binary classification problems. These metrics provide complementary views of the model’s performance and are widely adopted in medical diagnosis tasks.
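For reference, the threshold-based metrics can be derived from the four confusion-matrix counts as in the sketch below (AUC is omitted because it requires ranked scores rather than counts). The counts in the example are invented for illustration and are not results from this study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, specificity and F1 for the positive class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Invented counts for a balanced example.
print(binary_metrics(tp=90, fp=10, tn=80, fn=20))
# A classifier that labels every sample as positive.
print(binary_metrics(tp=50, fp=30, tn=0, fn=0))
```

The second call shows why recall should be read together with specificity: a classifier that labels every sample as positive reaches a recall of 1 and a specificity of 0.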

5. Results

Before presenting the experimental results in detail, the overall workflow of the proposed approach is shown in Figure 7. The methodology begins with preprocessing and enhancement of medical images, followed by three main analysis paths:

Path 1: Fuzzy-only approach—Direct use of the original images as input to the fuzzy classification system, without applying any feature extraction or parameter optimization.
Path 2: CNN-only approach—The original images are directly fed into the CNN family (VGG16, ResNet50, MobileNetV2, EfficientNetV2), which performs both feature extraction and classification. No additional feature selection or optimization is applied in this path.
Path 3: Hybrid CNN-Fuzzy approach—Combines CNN-based feature extraction with fuzzy classification. Features extracted by the CNN family are first optimized using two strategies:
– Method 1: Intra-architecture optimization (GA-based). Internal tuning of genetic algorithm parameters (population size, generations, mutation rate) applied independently to each CNN architecture, producing an optimized subset of features.
– Method 2: Inter-method comparison (clustering metrics). Comparison among optimized feature groups using clustering-based evaluation metrics, namely the Silhouette Index, Calinski-Harabasz Index, and Davies-Bouldin Index. These metrics guide the selection of the best-performing CNN feature group.
Finally, the selected features are passed to the fuzzy classifiers, which produce the final diagnosis.

As the block diagram in Figure 7 shows, the experimental setup follows a comparative structure in which each path provides complementary evidence about the contribution of CNN-based deep features and fuzzy classifiers. The following subsections present the quantitative results for each analysis path, followed by a comparative discussion that highlights the advantages of the proposed hybrid CNN-Fuzzy system over single-method approaches.

5.1. Direct Image-Based Classification—CNN

Table 4 summarizes the performance of different CNN architectures for CXR classification into Normal and Pneumonia classes. Overall, VGG16 and MobileNetV2 demonstrate the highest performance across most metrics, with accuracy values exceeding 96% for both classes. In contrast, EfficientNetV2 shows significantly lower accuracy for the Normal class, with metrics such as Recall and Precision dropping to 0%, indicating difficulties in correctly classifying this category.

For the Normal class, as illustrated in Figure 8, VGG16 achieves the highest recall (88.66%) and precision (97.78%), suggesting a strong ability to correctly identify normal cases while minimizing false positives. MobileNetV2 also performs well, particularly in recall (97.06%) and specificity (97.06%), indicating reliable discrimination between normal and pathological samples.

Regarding the Pneumonia class, Figure 9 shows that VGG16 maintains the best overall balance across all metrics, with an F1-score of 97.62%, while the Baseline CNN and MobileNetV2 also yield high accuracy and recall. EfficientNetV2 exhibits extremely high recall (100%) but very low specificity (0%), reflecting that it classifies most samples as Pneumonia, which inflates sensitivity but severely compromises its ability to recognize normal cases.

In summary, the analysis of Table 4, Figure 8, and Figure 9 highlights that VGG16 and MobileNetV2 are the most robust architectures for direct image-based classification, achieving consistently high performance for both Normal and Pneumonia classes. These results provide a solid baseline for subsequent stages involving feature selection and hybrid CNN-Fuzzy approaches.

5.2. Performance Evaluation of Fuzzy Models Using Direct Image Input

In this study, various fuzzy models were evaluated for the classification of CXR images into two classes: Normal and Pneumonia, using the images directly as input without feature extraction. The methods evaluated include Fuzzy SVM, Fuzzy KNN, Fuzzy DT, FCM, and ANFIS with three membership function variants: Gaussian, Triangular, and Trapezoidal. The performance metrics for each model and class are presented in Table 5.

The analysis of Table 5 indicates that Fuzzy SVM achieved the best overall performance, with an accuracy of 92.95%. It shows a good balance between sensitivity for detecting Pneumonia (95.48%) and specificity for Normal (95.48%), as well as high F1-scores for both classes, demonstrating robust and balanced classification performance.

The second-best model is Fuzzy KNN, with an accuracy of 91.01%. It excels in detecting Pneumonia cases (recall 96.88%) but is less sensitive for Normal cases (recall 75.21%), indicating a tendency to misclassify healthy patients. Nonetheless, it is competitive when disease detection is prioritized.

Fuzzy DT shows intermediate performance (accuracy 85.10%), with lower sensitivity and precision for Normal cases compared to SVM and KNN, but its interpretable structure can be advantageous in clinical applications.

FCM demonstrates poor performance (accuracy 54.72%), barely above chance, making it unsuitable for direct image classification in this setting.

ANFIS models, regardless of membership function (Gaussian, Triangular, Trapezoidal), show similar behavior (accuracy 72.92%), with zero sensitivity for Normal and perfect sensitivity for Pneumonia. This indicates a strong bias toward Pneumonia classification, which produces a high false-positive rate and limits clinical applicability.

In summary, when using images directly as input to the fuzzy family without feature extraction, the most reliable models are Fuzzy SVM and Fuzzy KNN. SVM provides a balanced performance across both classes, whereas KNN favors Pneumonia detection. Fuzzy DT may be considered when interpretability is critical, and FCM and ANFIS are less recommended due to poor or biased performance.

5.3. Feature Extraction and Selection from CNN Models

This section presents the results of extracting deep features from the CNN family and applying feature selection using a Genetic Algorithm (GA), followed by a comparative evaluation among the CNN models using clustering-based metrics, as summarized in Figure 10.

5.3.1. Feature Selection Using GA and Quality Metric Q

A GA was applied to select the most informative features from each CNN model: MobileNetV2, ResNet50, VGG16, EfficientNetV2, and a custom CNN. The fitness function in this analysis was the quality metric Q, which evaluates class separability in the feature space. The goal was to obtain subsets of features that maximize discriminability for Normal and Pneumonia classes.

Visualization of Selected Features

Figure 11 shows PCA-based projections of the selected features for each CNN model. These plots illustrate the class separation in a 2D space, providing a visual indication of the discriminative power of the selected features.
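A two-dimensional PCA projection of this kind can be obtained with a singular value decomposition, as in the minimal sketch below. The random matrix stands in for a matrix of selected deep features (one row per image) and is an assumption for illustration only.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto the leading principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                     # centre each feature
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


rng = np.random.default_rng(0)
features = rng.normal(size=(50, 10))            # placeholder feature matrix
projected = pca_project(features)
print(projected.shape, projected.var(axis=0))
```

Plotting the two columns of `projected`, colored by class label, gives a scatter plot of the same type as the one described here.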

Table 6 summarizes the maximum clustering quality Q achieved and the number of features selected for each CNN model. As can be seen in Figure 12, the bar chart visually represents the same information, highlighting that VGG16 achieved the highest Q, indicating the strongest discriminability. The figure also shows the number of selected features above each bar, with the best model clearly indicated.

5.3.2. Feature Selection Using Clustering Metrics

To further validate the selected features, clustering metrics (Silhouette Score, Calinski–Harabasz Index, Davies–Bouldin Index) were used to compare the feature subsets from all CNN models. Table 7 presents the exact numeric values, while Figure 13 provides a visual comparison.

PCA Analysis of Cluster Structure

Figure 14 shows PCA projections of the selected feature subsets using Silhouette-based selection. VGG16 and Custom CNN show well-separated clusters, while ResNet50 has poorly separated clusters.

GA Fitness Evolution Across Generations

Figure 15 shows the evolution of the Silhouette-based GA fitness across generations for all CNN models. VGG16 quickly reaches optimal fitness and remains stable, MobileNetV2 converges gradually, ResNet50 shows limited improvement, while EfficientNetV2 and Custom CNN exhibit smooth and consistent fitness growth.

5.3.3. FCM Classification Using Selected Features

The FCM classifier was applied using the optimal feature subsets extracted from the different CNN models. Table 8 shows the performance metrics (Accuracy, Recall, Specificity, Precision, and F1-score) for both Normal and Pneumonia classes across all evaluated feature extraction methods.

The results in Table 8 show that VGG16 features yield the best overall performance for FCM classification, achieving an accuracy of 94.43% and high F1-scores for both classes. CNN and EfficientNetV2 features follow closely, but MobileNetV2 has lower overall accuracy due to reduced specificity for Normal cases. ResNet50 achieves balanced but slightly lower performance.
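The per-class metrics in Table 8 are linked through a single binary confusion matrix: the specificity of one class equals the recall of the other. The sketch below illustrates this with counts that are assumptions inferred to be consistent with the VGG16 row (238 Normal and 641 Pneumonia test images). The source does not report the confusion matrix itself.

```python
def class_metrics(tp, fn, fp, tn):
    """Metrics for one class treated as 'positive' in a binary confusion matrix."""
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return {"accuracy": accuracy, "recall": recall, "specificity": specificity,
            "precision": precision, "f1": f1}

# Assumed counts (not reported in the source), chosen to match the VGG16 row of Table 8.
normal_as_normal, normal_as_pneumonia = 236, 2
pneumonia_as_pneumonia, pneumonia_as_normal = 594, 47

normal = class_metrics(tp=normal_as_normal, fn=normal_as_pneumonia,
                       fp=pneumonia_as_normal, tn=pneumonia_as_pneumonia)
pneumonia = class_metrics(tp=pneumonia_as_pneumonia, fn=pneumonia_as_normal,
                          fp=normal_as_pneumonia, tn=normal_as_normal)

for name, m in (("Normal", normal), ("Pneumonia", pneumonia)):
    print(name, {k: round(100 * v, 2) for k, v in m.items()})
```

With these counts, the Normal precision of about 83.4% is driven entirely by the 47 pneumonia images assigned to the Normal class, while only 2 Normal images are missed.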

Figure 16 and Figure 17 illustrate the distribution of the performance metrics, highlighting that FCM is particularly sensitive to the quality of the selected feature set. While recall for Normal cases is high across all models (92.44–99.58%), precision for the Normal class varies considerably (72.04–83.39%), indicating that a share of pneumonia cases is assigned to the Normal cluster.

In conclusion, FCM performs best when paired with highly discriminative features such as those extracted from VGG16, and its performance drops when feature quality or separability is lower.
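The dependence on separability follows from how FCM works: it is an unsupervised clustering method, so class labels enter only when clusters are mapped to classes afterwards. The following is a minimal sketch of the standard FCM updates with fuzzifier m = 2, followed by a majority-vote cluster-to-class mapping. The mapping rule, the toy 2D points, and all parameter values are assumptions for illustration, not the authors' configuration.

```python
import math
import random

def fcm(points, c=2, m=2.0, iters=100, seed=0):
    rng = random.Random(seed)
    # Random initial membership matrix, each row normalized to sum to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-6 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    dim = len(points[0])
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by membership^m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(len(points))]
            centers.append([sum(wi * p[d] for wi, p in zip(w, points)) / sum(w)
                            for d in range(dim)])
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1)).
        for i, p in enumerate(points):
            d = [max(math.dist(p, ck), 1e-12) for ck in centers]
            u[i] = [1.0 / sum((d[k] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c))
                    for k in range(c)]
    return centers, u

def cluster_to_class(u, labels):
    """Assumed mapping rule: each cluster takes the majority label of its members."""
    votes = {}
    for row, y in zip(u, labels):
        k = max(range(len(row)), key=row.__getitem__)
        votes.setdefault(k, {})
        votes[k][y] = votes[k].get(y, 0) + 1
    return {k: max(v, key=v.get) for k, v in votes.items()}

# Toy 2D features standing in for selected CNN features.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels = ["Normal"] * 3 + ["Pneumonia"] * 3
centers, u = fcm(points)
mapping = cluster_to_class(u, labels)
predicted = [mapping[max(range(len(row)), key=row.__getitem__)] for row in u]
print(predicted)
```

When the two classes overlap in feature space, the cluster boundary no longer coincides with the class boundary, which is one plausible reading of the lower FCM scores for less separable feature sets.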

5.3.4. Fuzzy DT Classification Using Selected Features

The Fuzzy DT classifier was evaluated using the feature subsets extracted from different CNN models. Table 9 presents the classification performance metrics (Accuracy, Recall, Specificity, Precision, and F1-score) for both Normal and Pneumonia classes.

The results in Table 9 indicate that CNN and MobileNetV2 features provide the best performance for Fuzzy DT classification, achieving accuracy around 97–98% and strong F1-scores for both classes. EfficientNetV2 and VGG16 features show slightly lower performance, especially in recall for Normal cases. ResNet50 has the lowest overall metrics, mainly due to reduced sensitivity for Normal samples. Figure 18 and Figure 19 demonstrate how Fuzzy DT performance depends on feature discriminability: highly separable features improve precision and recall significantly.

5.3.5. Fuzzy KNN Classification Using Selected Features

The Fuzzy KNN classifier was applied to the same feature subsets. Table 10 presents the performance metrics for both Normal and Pneumonia classes.

Fuzzy KNN achieves high overall performance, with CNN and MobileNetV2 features reaching accuracy above 98% (98.18% and 98.07%, respectively) and high F1-scores for both classes. EfficientNetV2 and VGG16 also provide strong results, while ResNet50 underperforms due to lower sensitivity for Normal cases.

Figure 20 and Figure 21 highlight that Fuzzy KNN benefits from highly discriminative features, showing consistently high recall and precision. Overall, Fuzzy KNN demonstrates robust classification performance, slightly outperforming Fuzzy DT in most scenarios.
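To illustrate how a fuzzy KNN decision differs from a crisp vote, the sketch below implements the commonly used distance-weighted membership rule (in the style of Keller et al.) with crisp training labels. The value of k, the fuzzifier m, and the toy points are assumptions, since the configuration used in this study is not given in this section.

```python
import math

def fuzzy_knn(train_x, train_y, query, k=3, m=2.0):
    """Return class memberships for `query`; the memberships sum to 1."""
    neighbors = sorted(zip(train_x, train_y), key=lambda t: math.dist(t[0], query))[:k]
    # Closer neighbors get larger weights: 1 / d^(2 / (m - 1)).
    weights = [1.0 / max(math.dist(x, query), 1e-12) ** (2.0 / (m - 1.0))
               for x, _ in neighbors]
    total = sum(weights)
    memberships = {}
    for w, (_, y) in zip(weights, neighbors):
        memberships[y] = memberships.get(y, 0.0) + w / total
    return memberships

# Toy 2D features standing in for selected CNN features.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["Normal"] * 3 + ["Pneumonia"] * 3
result = fuzzy_knn(train_x, train_y, query=(2, 2), k=4)
print(result)
```

The output is a graded membership per class rather than a single label, so a borderline sample can be flagged by how close its memberships are.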

5.3.6. Fuzzy SVM Classification Using Selected Features

The Fuzzy SVM classifier was evaluated using the feature subsets extracted from different CNN models. Table 11 presents the classification performance metrics (Accuracy, Recall, Specificity, Precision, and F1-score) for both Normal and Pneumonia classes.

Fuzzy SVM achieves high overall classification performance. CNN and MobileNetV2 features reach accuracy above 98% with strong F1-scores for both Normal and Pneumonia classes. EfficientNetV2 and VGG16 also show competitive results, while ResNet50 underperforms due to lower sensitivity in Normal cases.

Figure 22 and Figure 23 illustrate how Fuzzy SVM performance benefits from highly discriminative features, providing consistently high precision and recall across both classes.

5.4. ANFIS-CNN Classification Results

The ANFIS classifier demonstrated strong performance using features extracted from various CNN architectures.

The results in Table 12 indicate that ANFIS effectively integrates CNN feature extraction with neuro-fuzzy reasoning for early pneumonia detection. CNN Generic and MobileNetV2 features combined with Gaussian membership functions achieved the highest overall accuracy and F1-scores, exceeding 98.5% and 97% respectively for both Normal and Pneumonia classes. EfficientNetV2 and VGG16 features also achieved high performance (around 97%), while ResNet50 features performed slightly lower (92–93%).

Gaussian membership functions provided the most consistent results across all CNN architectures. Trapezoidal and Triangular functions performed well in most cases but showed lower F1-scores for certain combinations, such as MobileNetV2 with Trapezoidal membership for Normal class. Overall, ANFIS with CNN features provides robust, accurate, and balanced classification, making it a strong approach for early pneumonia detection.
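The three membership function shapes compared here can be written compactly. The sketch below gives their standard definitions and a single-input, first-order Sugeno forward pass of the kind ANFIS trains. The rule parameters are hypothetical and the function names are chosen for this illustration only.

```python
import math

def gaussian_mf(x, c, sigma):
    return math.exp(-((x - c) ** 2) / (2 * sigma ** 2))

def triangular_mf(x, a, b, c):
    """Triangle with feet at a and c and peak at b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def trapezoidal_mf(x, a, b, c, d):
    """Trapezoid with feet at a and d and plateau on [b, c] (a < b <= c < d)."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

def anfis_forward(x, rules):
    """rules: list of (center, sigma, p, r); each rule outputs p * x + r."""
    firing = [gaussian_mf(x, c, s) for c, s, _, _ in rules]
    total = sum(firing)
    # Normalized firing strengths weight the linear rule outputs.
    return sum(w / total * (p * x + r) for w, (_, _, p, r) in zip(firing, rules))

# Two hypothetical rules: "feature is low -> 0" and "feature is high -> 1".
rules = [(0.0, 1.0, 0.0, 0.0), (4.0, 1.0, 0.0, 1.0)]
for x in (0.0, 2.0, 4.0):
    print(x, round(anfis_forward(x, rules), 4))
```

The Gaussian function is smooth and non-zero everywhere, so every rule contributes a gradient during training, whereas triangular and trapezoidal functions are exactly zero outside their support. This may be one reason for the more consistent Gaussian results, although the source does not analyze the cause.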

The results of ANFIS fuzzy classification using features extracted from both MobileNetV2 and a generic CNN, as illustrated in Figure 24 and Figure 25, show consistent trends across membership functions. The Triangular and Gaussian membership functions achieved the highest ROC AUC and PRC values, exceeding 98% in both feature sets, indicating strong discriminatory capability between the Normal and Pneumonia classes. Corresponding confusion matrices show minimal misclassifications, confirming that the majority of cases were correctly classified. Precision, recall, and F1-score metrics are consistently high, reflecting reliable and balanced detection.

In contrast, the Trapezoidal membership function exhibits lower ROC AUC and PRC values across both CNN feature sets, indicating reduced classification performance. Confusion matrices reveal increased errors, particularly in the Normal class, accompanied by lower precision and F1-score metrics, highlighting less robust predictions.

Overall, across both MobileNetV2 and generic CNN feature sets, Triangular and Gaussian membership functions consistently outperform Trapezoidal, achieving near-optimal class separation in ROC curves and stable predictions in PRC analyses. These observations confirm that the choice of membership function is critical to maximize ANFIS performance regardless of the CNN used for feature extraction.
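For readers less familiar with the ROC AUC values quoted above, the metric equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. A minimal pairwise sketch follows. The scores and labels are illustrative values, not outputs of the models in this study.

```python
def roc_auc(scores, labels):
    """Pairwise (Mann-Whitney) form of ROC AUC; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative classifier scores for the Pneumonia class (1) versus Normal (0).
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(roc_auc(scores, labels))  # 0.75
```

Three of the four positive-negative pairs are ranked correctly here, giving 0.75. A value above 0.98, as reported for the Triangular and Gaussian functions, means almost every such pair is ordered correctly.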

5.5. Comparative Analysis of Fuzzy and ANFIS Classification Methods

Table 13 presents the best classification results for each method by class. This allows a clear comparison of the performance, stability, and robustness of fuzzy classifiers versus ANFIS.

Among the traditional fuzzy methods, Fuzzy KNN and Fuzzy SVM consistently achieve the highest accuracies, above 98%, with strong F1-scores for both Normal and Pneumonia classes. These methods demonstrate a well-balanced performance in precision and recall. Fuzzy SVM slightly outperforms Fuzzy KNN in Pneumonia detection, with an F1-score of 99.06% versus 98.75%.

Fuzzy DT offers good overall performance, around 97.7% accuracy, but is slightly inferior to Fuzzy KNN and SVM. Its main advantage is interpretability, which is valuable in clinical settings.

FCM shows the lowest performance among fuzzy methods, around 94.4% accuracy, with a notable gap between recall (99.16%) and precision (83.39%) for the Normal class, indicating that some pneumonia cases are labeled as Normal.

ANFIS achieves the highest overall accuracy. Using MobileNetV2 features with Gaussian membership functions, ANFIS reaches 98.52% accuracy and F1-scores of 97.27% (Normal) and 98.99% (Pneumonia). ANFIS minimizes the gap between precision and recall for both classes, ensuring a balanced and clinically reliable classification.

The table clearly shows the comparative performance across methods.

5.6. Comparison with Previous Studies

Table 14 summarizes previous studies on pneumonia detection using chest X-ray images. Recent works reported high accuracy with different deep learning approaches. For example, Rahman et al. [36] achieved 98.1% accuracy with deep CNN and transfer learning, while Elshennawy and Ibrahim [37] obtained between 92% and 96.4% using MobileNetV2 combined with LSTM-CNN. More recently, a concatenated CNN with fuzzy logic-based image enhancement reached 98.9% accuracy [38]. These studies illustrate the effectiveness of deep learning, although most approaches involve high computational cost and limited interpretability.

Our hybrid approach—integrating GA-based feature selection with fuzzy and ANFIS classifiers—achieved an accuracy of 98.52% using MobileNetV2 features and Gaussian membership functions. These results demonstrate that state-of-the-art performance can be obtained with fewer parameters and higher interpretability. In addition, our method showed reduced execution time compared to conventional deep learning models, highlighting its efficiency for real-time or resource-constrained applications.

6. Discussion

The results of this study provide a comprehensive comparison of traditional fuzzy classifiers and ANFIS for early pneumonia detection using CNN-extracted features and GA-based feature selection.

Among the fuzzy methods, Fuzzy KNN and Fuzzy SVM achieved the highest accuracy and F1-scores for both Normal and Pneumonia classes, demonstrating a balanced performance between precision and recall. Fuzzy Decision Tree also showed good performance, with the additional advantage of interpretability, which can be important in clinical settings. In contrast, Fuzzy C-Means showed lower reliability, particularly in precision for the Normal class, due to sensitivity to feature quality and separability.

ANFIS consistently outperformed all traditional fuzzy classifiers. Using MobileNetV2 features with Gaussian membership functions, ANFIS achieved 98.52% accuracy and F1-scores of 97.27% (Normal) and 98.99% (Pneumonia), highlighting its ability to integrate CNN feature extraction with neuro-fuzzy reasoning for robust and stable classification. Moreover, ANFIS minimized the gap between precision and recall, a critical factor in clinical applications to reduce false negatives and ensure reliable diagnosis.

The comparison with previous studies, including works from 2019 to the most recent 2024 publications, shows that the proposed hybrid ANFIS-based approach achieves comparable or superior results. Notably, while some 2024 studies report high accuracy (95–98%) using complex CNN ensembles, our method achieves similar or better performance with fewer parameters and reduced computational cost, thanks to the effective feature selection and fuzzy-based reasoning. This highlights the practical advantages of the proposed approach in terms of efficiency, interpretability, and clinical applicability.

In summary, the discussion reveals that:

Fuzzy KNN and Fuzzy SVM are competitive when features are of high quality, but their performance can vary depending on feature discriminability.
Fuzzy Decision Tree provides interpretability with moderate accuracy, making it suitable for explainable clinical decisions.
ANFIS offers the most reliable and consistent classification, with robust performance across both classes and superior handling of feature uncertainty.
Compared with state-of-the-art studies, ANFIS achieves high accuracy while maintaining efficiency and a lower number of parameters, which is beneficial for practical deployment in clinical environments.

Overall, integrating CNN-based feature extraction with neuro-fuzzy reasoning and feature selection results in a powerful framework for early pneumonia detection, balancing accuracy, robustness, and interpretability.

7. Application in a Real Clinical Setting

In contemporary clinical practice, artificial intelligence, particularly deep learning techniques such as CNNs, is increasingly adopted as a valuable tool to support diagnostic decision-making. This is especially relevant for conditions like pneumonia, where timely and accurate diagnosis is critical to preventing complications and reducing mortality.

Although CXR imaging is widely available and cost-effective, it poses significant interpretation challenges due to overlapping anatomical structures and the subtlety of pathological signs. Advanced CNN architectures such as VGG16, ResNet50, and InceptionV3 have demonstrated strong capabilities in addressing these challenges by extracting discriminative features and identifying complex visual patterns that may not be readily apparent to human observers.

Building on this capability, our study integrates fuzzy logic-based classifiers to manage the inherent uncertainty and imprecision of medical data. This hybrid approach not only achieves high classification accuracy but also enhances model robustness and interpretability—critical factors for real-world clinical deployment, where transparency is often as important as predictive performance.

In practical scenarios such as emergency departments or resource-constrained environments, AI-assisted systems can assist in prioritizing patients, streamlining radiology workflows, and reducing diagnostic delays. However, successful integration into clinical settings requires thorough model validation, regulatory compliance, and the establishment of clinician trust through explainable outputs.

The proposed framework contributes to bridging the gap between research and clinical application by providing a reliable, interpretable, and efficient tool for pneumonia detection across both routine and acute care settings.

8. Conclusions

This study presented a hybrid framework for pneumonia detection in CXR images by integrating deep CNNs for feature extraction with fuzzy logic-based classifiers to enhance interpretability and handle uncertainty. The combination of pre-trained models such as VGG16, EfficientNetV2, MobileNetV2, and ResNet50 with GA-based feature selection demonstrated strong classification performance across key metrics, including sensitivity, specificity, precision, F1-score, and overall accuracy.

The incorporation of fuzzy logic contributed not only to improved diagnostic accuracy, with overall accuracy ranging from approximately 94% for FCM to above 98% for Fuzzy KNN and Fuzzy SVM, but also to greater transparency in decision-making, an essential requirement in clinical applications. Among the traditional fuzzy classifiers, Fuzzy KNN and Fuzzy SVM yielded the most balanced and reliable results, achieving F1-scores above 98% for the Pneumonia class and above 96% for the Normal class.

Notably, ANFIS combined with MobileNetV2 features and Gaussian membership functions outperformed all traditional fuzzy classifiers, achieving an overall accuracy of 98.52% and F1-scores of 97.27% (Normal) and 98.99% (Pneumonia). Compared with recent state-of-the-art studies, including 2024 publications reporting accuracies of 95–98%, the proposed ANFIS-based approach achieves comparable or superior performance while maintaining a reduced number of parameters, lower computational cost, and enhanced interpretability.

Overall, the results confirm that integrating CNN feature extraction with fuzzy and neuro-fuzzy reasoning provides a powerful, robust, and clinically applicable framework for early pneumonia detection, balancing accuracy, reliability, and transparency in decision-making.

Author Contributions

Conceptualization, A.M. and A.R.-M.; methodology, A.M. and A.R.-M.; software, A.M.; validation, A.M. and A.R.-M.; formal analysis, A.M. and A.R.-M.; investigation, A.M. and A.R.-M.; resources, data curation, A.M. and A.R.-M.; writing—original draft preparation, A.M., and A.R.-M.; writing—review and editing, A.M. and A.R.-M.; supervision, A.R.-M.; project administration, A.R.-M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Prasanna, H.; Inderjeeth, C.A.; Nossent, J.C.; Almutairi, K.B. The global prevalence of interstitial lung disease in patients with rheumatoid arthritis: A systematic review and meta-analysis. Rheumatol. Int. 2025, 45, 34.
2. Kayembe, J.M.N.; Kayembe, H.C.N. Challenges of Early Diagnosis and Effective Treatment of Pneumonia in Resource-Limited Settings. In Viral Infectious Diseases; IntechOpen: London, UK, 2025.
3. Nurbaiti, N.; Azarah, N.F. Revolutionizing Pneumonia Diagnostics: A Systematic Review of Deep Learning Applications in Chest Radiography. Jurnal Sains 2025, 4, 1–10.
4. Al Nufaiei, Z.F.; Alshamrani, K.M. Comparing Ultrasound, Chest X-Ray, and CT Scan for Pneumonia Detection. Med. Devices Evid. Res. 2025, 18, 149–159.
5. Adekola, H.A.; Bamidele, T.; Chukwu, E.; Fowora, M.; Ajibaye, S.; Salako, A.; Musa, Z.; Ezechi, O. Challenges in the phenotypic detection of streptococcus Pneumoniae from clinical samples in low resource settings. Clin. Microbiol. Newsl. 2025, 51, 37–41.
6. Xin, H.; Wang, W.; Wang, X.; Huang, J.; Di, Y.; Du, J.; Cao, X.; Feng, B.; Shen, L.; He, Y.; et al. The performance of computer-aided detection for chest radiography in tuberculosis screening: A population-based retrospective cohort study. Emerg. Microbes Infect. 2025, 14, 2470998.
7. Lastrucci, A.; Iosca, N.; Wandael, Y.; Barra, A.; Lepri, G.; Forini, N.; Ricci, R.; Miele, V.; Giansanti, D. AI and Interventional Radiology: A Narrative Review of Reviews on Opportunities, Challenges, and Future Directions. Diagnostics 2025, 15, 893.
8. Munir, K.; Usama Tanveer, M.; Alyamani, H.J.; Bermak, A.; Ur Rehman, A. PneuX-Net: An Enhanced Feature Extraction and Transformation Approach for Pneumonia Detection in X-Ray Images. IEEE Access 2025, 13, 84024–84037.
9. Choi, S.H.; Yoon, H.; Baek, H.J.; Long, X. Biomedical Signal Processing and Health Monitoring Based on Sensors. Sensors 2025, 25, 641.
10. Mallidi, S.K.R. Enhancing Pneumonia Diagnosis and Severity Assessment through Deep Learning: A Comprehensive Approach Integrating CNN Classification and Infection Segmentation. arXiv 2025, arXiv:2502.06735.
11. da Costa, M.F.P.; Araújo, R.d.S.; Silva, A.R.; Pereira, L.; Silva, G.M.M. Predictive Artificial Neural Networks as Applied Tools in the Remediation of Dyes by Adsorption—A Review. Appl. Sci. 2025, 15, 2310.
12. Budak, Ü.; Cömert, Z.; Çıbuk, M.; Şengür, A. DCCMED-Net: Densely connected and concatenated multi Encoder-Decoder CNNs for retinal vessel extraction from fundus images. Med. Hypotheses 2020, 134, 109426.
13. Victoria, A.H.; Maragatham, G. Automatic Tuning of Hyperparameters Using Bayesian Optimization. Evol. Syst. 2021, 12, 217–223.
14. Young, S.R.; Rose, D.C.; Karnowski, T.P.; Lim, S.H.; Patton, R.M. Optimizing Deep Learning Hyper-Parameters through an Evolutionary Algorithm. In Proceedings of the MLHPC ’15: Proceedings of the Workshop on Machine Learning in High-Performance Computing Environments, Austin, TX, USA, 15 November 2015; pp. 1–5.
15. Cui, H.; Bai, J. A new hyperparameters optimization method for convolutional neural networks. Pattern Recognit. Lett. 2019, 125, 828–834.
16. Lee, W.Y.; Park, S.M.; Sim, K.B. Optimal hyperparameter tuning of convolutional neural networks based on the parameter-setting-free harmony search algorithm. Optik 2018, 172, 359–367.
17. Kiliçarslan, S.; Celik, M. RSigELU: A nonlinear activation function for deep neural networks. Expert Syst. Appl. 2021, 174, 114805.
18. Zou, X.; Wang, Z.; Li, Q.; Sheng, W. Integration of residual network and convolutional neural network along with various activation functions and global pooling for time series classification. Neurocomputing 2019, 367, 39–45.
19. Basha, S.S.; Vinakota, S.K.; Dubey, S.R.; Pulabaigari, V.; Mukherjee, S. AutoFCL: Automatically tuning fully connected layers for handling small dataset. Neural Comput. Appl. 2021, 33, 8055–8065.
20. Dahl, G.E.; Sainath, T.N.; Hinton, G.E. Improving deep neural networks for LVCSR using rectified linear units and dropout. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2013), Vancouver, BC, Canada, 26–31 May 2013; pp. 8609–8613.
21. Radiuk, P.M. Impact of training set batch size on the performance of convolutional neural networks for diverse datasets. Inf. Technol. Manag. Sci. 2017, 20, 20–24.
22. Utama, A.B.P.; Wibawa, A.P.; Muladi, M.; Nafalski, A. PSO based Hyperparameter tuning of CNN Multivariate Time-Series Analysis. J. Online Inform. 2022, 7, 193–202.
23. Zeiler, M.D. Adadelta: An adaptive learning rate method. arXiv 2012, arXiv:1212.5701.
24. Gulcehre, C.; Moczulski, M.; Bengio, Y. Adasecant: Robust adaptive secant method for stochastic gradient. arXiv 2014, arXiv:1412.7419.
25. Zadeh, L.A. Fuzzy logic—A personal perspective. Fuzzy Sets Syst. 2015, 281, 4–20.
26. Mjahad, A.; Rosado-Muñoz, A. Hybrid CNN-Fuzzy Approach for Automatic Identification of Ventricular Fibrillation and Tachycardia. Appl. Sci. 2025, 15, 9289.
27. Chen, Y.; Xiao, C.; Yang, S.; Yang, Y.; Wang, W. Research on long term power load grey combination forecasting based on fuzzy support vector machine. Comput. Electr. Eng. 2024, 116, 109205.
28. Lohrmann, C.; Lohrmann, A.; Kumbure, M.M. On the benefit of feature selection and ensemble feature selection for fuzzy k-nearest neighbor classification. Appl. Soft Comput. 2025, 171, 112784.
29. Zhu, X.; Hu, X.; Yang, L.; Pedrycz, W.; Li, Z. A Development of Fuzzy-Rule-Based Regression Models Through Using Decision Trees. IEEE Trans. Fuzzy Syst. 2024, 32, 2976–2986.
30. Sakrwar, M.S.; Ranadive, A.; Pamucar, D. A novel Gustafson–Kessel based clustering algorithm using n-Pythagorean fuzzy sets. Syst. Soft Comput. 2025, 7, 200345.
31. Abdullah, A.A.; Ahmed, A.M.; Rashid, T.; Veisi, H.; Rassul, Y.H.; Hassan, B.; Fattah, P.; Ali, S.A.; Shamsaldin, A.S. Advanced Clustering Techniques for Speech Signal Enhancement: A Review and Metanalysis of Fuzzy C-Means, K-Means, and Kernel Fuzzy C-Means Methods. arXiv 2024, arXiv:2409.19448.
32. Kermany, D.S.; Zhang, K.; Goldbaum, M. Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning. Cell 2018, 172, 1122–1131.
33. Kermany, D.; Zhang, K.; Goldbaum, M. Labeled Optical Coherence Tomography (OCT) and Chest X-Ray Images for Classification. Mendeley Data, Version 2, CC BY 4.0. 2018. Available online: https://data.mendeley.com/datasets/rscbjbr9sj/2 (accessed on 9 September 2025).
34. Alsuwat, H.; Alsuwat, E. Energy-aware and efficient cluster head selection and routing in wireless sensor networks using improved artificial bee Colony algorithm. Peer-to-Peer Netw. Appl. 2025, 18, 65.
35. Lai, H.; Huang, T.; Lu, B.; Zhang, S.; Xiaog, R. Silhouette coefficient-based weighting k-means algorithm. Neural Comput. Appl. 2025, 37, 3061–3075.
36. Rahman, T.; Chowdhury, M.E.H.; Khandakar, A.; Islam, K.R.; Islam, K.F.; Mahbub, Z.B.; Kadir, M.A.; Kashem, S. Transfer Learning with Deep Convolutional Neural Network (CNN) for Pneumonia Detection using Chest X-ray. Appl. Sci. 2020, 10, 3233.
37. Elshennawy, N.M.; Ibrahim, D.M. Deep-Pneumonia Framework Using Deep Learning Models Based on Chest X-Ray Images. Diagnostics 2020, 10, 649.
38. Buriboev, A.S.; Muhamediyeva, D.; Primova, H.; Sultanov, D.; Tashev, K.; Jeon, H.S. Concatenated CNN-Based Pneumonia Detection Using a Fuzzy-Enhanced Dataset. Sensors 2024, 24, 6750.
39. Talaat, M.; Si, X.; Xi, J. Multi-Level Training and Testing of CNN Models in Diagnosing Multi-Center COVID-19 and Pneumonia X-ray Images. Appl. Sci. 2023, 13, 10270.
40. Ismail, A.; Rahmat, T.; Aliman, S. Chest X-ray Image Classification Using Faster R-CNN. Malays. J. Comput. 2019, 4, 225–236.
41. Liang, G.; Zheng, L. A transfer learning method with deep residual network for pediatric pneumonia diagnosis. Comput. Methods Programs Biomed. 2020, 187, 104964.
42. Varshni, D.; Thakral, K.; Agarwal, L.; Nijhawan, R.; Mittal, A. Pneumonia Detection Using CNN based Feature Extraction. In Proceedings of the 2019 IEEE International Conference on Electrical, Computer and Communication Technologies (ICECCT), Coimbatore, India, 20–22 February 2019; pp. 1–7.
43. Ayan, E.; Ünver, H.M. Diagnosis of Pneumonia from Chest X-Ray Images Using Deep Learning. In Proceedings of the 2019 Scientific Meeting on Electrical-Electronics & Biomedical Engineering and Computer Science (EBBT), Istanbul, Turkey, 24–26 April 2019; pp. 1–5.
44. Sirazitdinov, I.; Kholiavchenko, M.; Mustafaev, T.; Yixuan, Y.; Kuleev, R.; Ibragimov, B. Deep neural network ensemble for pneumonia localization from a large-scale chest X-ray database. Comput. Electr. Eng. 2019, 78, 388–399.

Figure 1. Illustration of a typical CNN architecture used for image classification.

Figure 2. ANFIS training scheme using Gaussian membership functions.

Figure 3. ANFIS training scheme using trapezoidal membership functions.

Figure 4. ANFIS training scheme using triangular membership functions.

Figure 5. Training and evaluation pipeline for the proposed hybrid model.

Figure 6. Examples of pediatric CXR images: (left) normal lung, (middle) bacterial pneumonia with localized consolidation, and (right) viral pneumonia with bilateral interstitial patterns. The letter “R” on the radiographs indicates the right side of the patient, following standard radiological convention.

Figure 7. Schematic representation of the proposed fuzzy-based diagnostic system, illustrating the three analysis paths (1: Fuzzy-only, 2: CNN-only, and 3: Hybrid CNN-Fuzzy) and the corresponding feature selection strategies.

Figure 8. Performance metrics (Accuracy, Recall, Specificity, Precision, F1-score) for the Normal class using different CNN architectures.

Figure 9. Performance metrics (Accuracy, Recall, Specificity, Precision, F1-score) for the Pneumonia class using different CNN architectures.

Figure 10. Comparison of fuzzy classification methods based on the generated images.

Figure 11. PCA projections of the feature subsets selected via GA using quality metric Q for different CNN architectures.

Figure 12. Comparison of quality metric Q and selected features across CNN models.

Figure 13. Comparison of clustering metrics across CNN-based feature subsets.

Figure 14. PCA visualizations of GA-selected feature subsets based on Silhouette Index for different CNN architectures.

Figure 15. GA fitness evolution (Silhouette Score) over generations for each CNN model.

Figure 16. Performance metrics for Normal class using Fuzzy C-Means with selected CNN features.

Figure 17. Performance metrics for Pneumonia class using Fuzzy C-Means with selected CNN features.

Figure 18. Performance metrics for the Normal class using Fuzzy DT with selected CNN features.

Figure 19. Performance metrics for the Pneumonia class using Fuzzy DT with selected CNN features.

Figure 20. Performance metrics for the Normal class using Fuzzy KNN with selected CNN features.

Figure 21. Performance metrics for the Pneumonia class using Fuzzy KNN with selected CNN features.

Figure 22. Performance metrics for the Normal class using Fuzzy SVM with selected CNN features.

Figure 23. Performance metrics for the Pneumonia class using Fuzzy SVM with selected CNN features.

Figure 24. Results of ANFIS fuzzy classification using MobileNetV2-extracted features with different membership functions.

Figure 25. Results of ANFIS fuzzy classification using CNN-extracted features with different membership functions.

Table 1. Summary of the deep CNN architectures used for feature extraction.

Component	Description	Output Shape
Common Structure (added to all CNN bases)
Input	RGB image	$224 \times 224 \times 3$
Global Average Pooling	Aggregates spatial features	Depends on base model
Dense Layer	128 units, ReLU activation	128
Dense Layer	64 units, ReLU activation	64
Output Layer	1 unit, Sigmoid activation (binary classification)	1
Frozen CNN Backbones
VGG16	ImageNet pretrained base (without classification head)	$7 \times 7 \times 512$
EfficientNetV2B0	ImageNet pretrained base (without classification head)	$7 \times 7 \times 1280$
MobileNetV2	ImageNet pretrained base (without classification head)	$7 \times 7 \times 1280$
ResNet50	ImageNet pretrained base (without classification head)	$7 \times 7 \times 2048$

Table 2. Summary of the proposed CNN architecture.

Model	Proposed CNN
Layer	Kernel Size	Filter Number	Parameters
Conv1	3 × 3	32	320
Max Pooling1	4 × 4	-	0
Conv2	3 × 3	64	18,496
Max Pooling2	4 × 4	-	0
FC1	128	-	991,360
FC2	256	-	33,024
Output (Softmax)	2	-	514

Table 3. Genetic algorithm configuration for feature selection.

Parameter	Value
Initial population	20 individuals
Number of generations	30
Crossover	Two-point, probability 0.7
Mutation	Bit-flip, probability 0.05
Fitness function	Accuracy of a base classifier trained on selected features

Table 4. Performance comparison of CNN and transfer learning models for CXR classification (values in %).

Model	Class	Accuracy	Recall	Specificity	Precision	F1-Score
Baseline CNN	Normal	94.65	85.71	98.12	94.44	89.86
	Pneumonia	94.65	98.12	85.71	94.89	96.43
VGG16	Normal	96.59	88.66	99.38	97.78	93.06
	Pneumonia	96.59	99.38	88.66	95.91	97.62
ResNet50	Normal	88.28	83.19	90.48	76.75	79.88
	Pneumonia	88.28	90.48	83.19	93.64	91.95
EfficientNetV2	Normal	72.93	0.00	100.00	0.00	0.00
	Pneumonia	72.93	100.00	0.00	72.91	84.38
MobileNetV2	Normal	94.79	97.06	93.92	85.56	91.98
	Pneumonia	94.79	93.92	97.06	98.85	96.97

Table 5. Classification results using images as direct input, without feature extraction.

Model	Class	Accuracy (%)	Recall (%)	Specificity (%)	Precision (%)	F1-Score (%)
Fuzzy SVM	Normal	92.95	86.13	95.48	87.61	86.86
Fuzzy SVM	Pneumonia	92.95	95.48	86.13	94.88	95.18
Fuzzy KNN	Normal	91.01	75.21	96.88	89.95	81.92
Fuzzy KNN	Pneumonia	91.01	96.88	75.21	91.32	94.02
Fuzzy DT	Normal	85.10	74.79	88.92	71.49	73.10
Fuzzy DT	Pneumonia	85.10	88.92	74.79	90.48	89.69
Fuzzy C-Means	Normal	54.72	55.88	54.29	31.22	40.06
Fuzzy C-Means	Pneumonia	54.72	54.29	55.88	76.82	63.62
ANFIS (Gaussian)	Normal	72.92	0.00	100.00	0.00	0.00
ANFIS (Gaussian)	Pneumonia	72.92	100.00	0.00	72.92	84.34
ANFIS (Triangular)	Normal	72.92	0.00	100.00	0.00	0.00
ANFIS (Triangular)	Pneumonia	72.92	100.00	0.00	72.92	84.34
ANFIS (Trapezoidal)	Normal	72.92	0.00	100.00	0.00	0.00
ANFIS (Trapezoidal)	Pneumonia	72.92	100.00	0.00	72.92	84.34

Table 6. Maximum clustering quality Q and number of selected features per CNN.

Model	Maximum Quality Q	Selected Features
MobileNetV2	10.10	74
ResNet50	5.11	74
VGG16	12.89	95
EfficientNetV2	6.84	69
Custom CNN	7.22	44

Table 7. Clustering validation metrics for selected CNN features.

Model	Silhouette Score	Calinski–Harabasz Score	Davies–Bouldin Index	Selected Features
MobileNetV2	0.7229	5182.68	0.5834	44
ResNet50	0.5678	7561.61	0.5251	17
VGG16	0.7343	8233.05	0.4292	16
EfficientNetV2	0.6495	5476.06	0.4790	56
Custom CNN	0.6541	9038.60	0.4254	30

Table 8. Classification results using FCM with selected features from different CNN models.

Algorithm/Class	Accuracy (%)	Recall/Sensitivity (%)	Specificity (%)	Precision (%)	F1-Score (%)
CNN-Normal	93.40	99.16	91.26	80.82	89.06
CNN-Pneumonia	93.40	91.26	99.16	99.66	95.28
EfficientNetV2-Normal	91.47	98.74	88.77	76.55	86.24
EfficientNetV2-Pneumonia	91.47	88.77	98.74	99.48	93.82
MobileNetV2-Normal	89.42	99.58	85.65	72.04	83.60
MobileNetV2-Pneumonia	89.42	85.65	99.58	99.82	92.19
ResNet50-Normal	91.13	92.44	90.64	78.57	84.94
ResNet50-Pneumonia	91.13	90.64	92.44	96.99	93.71
VGG16-Normal	94.43	99.16	92.67	83.39	90.60
VGG16-Pneumonia	94.43	92.67	99.16	99.66	96.04

Table 9. Classification results using Fuzzy DT with different feature extraction algorithms.

Algorithm/Class	Accuracy (%)	Recall/Sensitivity (%)	Specificity (%)	Precision (%)	F1-Score (%)
CNN-Normal	97.72	96.22	98.28	95.42	95.82
CNN-Pneumonia	97.72	98.28	96.22	98.59	98.44
EfficientNetV2-Normal	94.31	90.76	95.63	88.52	89.63
EfficientNetV2-Pneumonia	94.31	95.63	90.76	96.54	96.08
MobileNetV2-Normal	97.61	95.38	98.44	95.78	95.58
MobileNetV2-Pneumonia	97.61	98.44	95.38	98.29	98.36
ResNet50-Normal	92.26	86.55	94.38	85.12	85.83
ResNet50-Pneumonia	92.26	94.38	86.55	94.98	94.68
VGG16-Normal	96.25	92.44	97.66	93.62	93.02
VGG16-Pneumonia	96.25	97.66	92.44	97.20	97.43

Table 10. Fuzzy KNN classification results using different feature extraction algorithms.

Algorithm/Class	Accuracy (%)	Recall/Sensitivity (%)	Specificity (%)	Precision (%)	F1-Score (%)
CNN-Normal	98.18	97.06	98.60	96.25	96.65
CNN-Pneumonia	98.18	98.60	97.06	98.90	98.75
EfficientNetV2-Normal	96.81	95.38	97.35	93.03	94.19
EfficientNetV2-Pneumonia	96.81	97.35	95.38	98.27	97.81
MobileNetV2-Normal	98.07	96.22	98.75	96.62	96.42
MobileNetV2-Pneumonia	98.07	98.75	96.22	98.60	98.67
ResNet50-Normal	92.61	83.19	96.10	88.79	85.90
ResNet50-Pneumonia	92.61	96.10	83.19	93.90	94.99
VGG16-Normal	96.70	92.86	98.13	94.85	93.84
VGG16-Pneumonia	96.70	98.13	92.86	97.37	97.75

Table 11. Fuzzy SVM classification results using different feature extraction algorithms.

Algorithm/Class	Accuracy (%)	Recall/Sensitivity (%)	Specificity (%)	Precision (%)	F1-Score (%)
CNN-Normal	98.29	97.06	98.75	96.65	96.86
CNN-Pneumonia	98.29	98.75	97.06	98.91	98.83
EfficientNetV2-Normal	96.70	94.54	97.50	93.36	93.95
EfficientNetV2-Pneumonia	96.70	97.50	94.54	97.96	97.73
MobileNetV2-Normal	98.63	98.32	98.75	96.69	97.50
MobileNetV2-Pneumonia	98.63	98.75	98.32	99.37	99.06
ResNet50-Normal	93.06	87.39	95.16	87.03	87.21
ResNet50-Pneumonia	93.06	95.16	87.39	95.31	95.24
VGG16-Normal	97.38	94.96	98.28	95.36	95.16
VGG16-Pneumonia	97.38	98.28	94.96	98.13	98.21

Table 12. ANFIS results using features extracted from different CNN architectures. All values are in %.

CNN	Membership Function	Class	Accuracy (%)	Recall (%)	Specificity (%)	Precision (%)	F1-Score (%)
Custom CNN	Triangular	Normal	98.41	97.06	98.91	97.06	97.06
	Triangular	Pneumonia	98.41	98.91	97.06	98.91	98.91
	Gaussian	Normal	98.41	97.90	98.60	96.28	97.08
	Gaussian	Pneumonia	98.41	98.60	97.90	99.22	98.90
	Trapezoidal	Normal	95.90	88.66	98.60	95.91	92.14
	Trapezoidal	Pneumonia	95.90	98.60	88.66	95.90	97.23
EfficientNetV2	Triangular	Normal	97.16	94.96	97.97	94.56	94.76
	Triangular	Pneumonia	97.16	97.97	94.96	98.12	98.05
	Gaussian	Normal	97.27	95.38	97.97	94.58	94.98
	Gaussian	Pneumonia	97.27	97.97	95.38	98.28	98.12
	Trapezoidal	Normal	88.51	95.38	85.96	71.61	81.80
	Trapezoidal	Pneumonia	88.51	85.96	95.38	98.04	91.60
MobileNetV2	Triangular	Normal	98.18	98.32	98.13	95.12	96.69
	Triangular	Pneumonia	98.18	98.13	98.32	99.37	98.74
	Gaussian	Normal	98.52	97.48	98.91	97.07	97.27
	Gaussian	Pneumonia	98.52	98.91	97.48	99.06	98.99
	Trapezoidal	Normal	81.11	30.67	99.84	98.65	46.79
	Trapezoidal	Pneumonia	81.11	99.84	30.67	79.50	88.52
ResNet50	Triangular	Normal	92.72	83.19	96.26	89.19	86.09
	Triangular	Pneumonia	92.72	96.26	83.19	93.91	95.07
	Gaussian	Normal	92.49	83.61	95.79	88.05	85.78
	Gaussian	Pneumonia	92.49	95.79	83.61	94.03	94.90
	Trapezoidal	Normal	92.49	83.61	95.79	88.05	85.78
	Trapezoidal	Pneumonia	92.49	95.79	83.61	94.03	94.90
VGG16	Triangular	Normal	72.92	0.00	100.00	0.00	0.00
	Triangular	Pneumonia	72.92	100.00	0.00	72.92	84.34
	Gaussian	Normal	97.50	94.12	98.75	96.55	95.32
	Gaussian	Pneumonia	97.50	98.75	94.12	97.84	98.29
	Trapezoidal	Normal	96.93	92.44	98.60	96.07	94.22
	Trapezoidal	Pneumonia	96.93	98.60	92.44	97.23	97.91
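
Table 12 compares three membership-function shapes. The paper's own definitions are not shown in this section, so the sketch below uses the common textbook forms as an assumption. All parameter values are hypothetical and apply to a single feature normalized to [0, 1].

```python
from math import exp


def gaussian_mf(x, center, sigma):
    """Gaussian membership: 1 at the center, decaying smoothly on both sides."""
    return exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def triangular_mf(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b (a < b < c)."""
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)


def trapezoidal_mf(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    return max(min((x - a) / (b - a), 1.0, (d - x) / (d - c)), 0.0)


# Hypothetical parameters for one normalized feature in [0, 1].
for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(
        f"x={x:.2f}",
        f"gauss={gaussian_mf(x, center=0.5, sigma=0.15):.3f}",
        f"tri={triangular_mf(x, a=0.0, b=0.5, c=1.0):.3f}",
        f"trap={trapezoidal_mf(x, a=0.0, b=0.3, c=0.7, d=1.0):.3f}",
    )
```

The Gaussian form is nonzero for every input, whereas the triangular and trapezoidal forms are exactly zero outside their support and the trapezoid is flat on its plateau. Whether these properties account for the differences between rows in Table 12 is not established in this section.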

Table 13. Best classification results for each method by class (values in %).

Method	Class	Accuracy (%)	Recall (%)	Precision (%)	F1-Score (%)
FCM	Normal	94.43	99.16	83.39	90.60
FCM	Pneumonia	94.43	92.67	99.66	96.04
Fuzzy DT	Normal	97.72	96.22	95.42	95.82
Fuzzy DT	Pneumonia	97.72	98.28	98.59	98.44
Fuzzy KNN	Normal	98.18	97.06	96.25	96.65
Fuzzy KNN	Pneumonia	98.18	98.60	98.90	98.75
Fuzzy SVM	Normal	98.63	98.32	96.69	97.50
Fuzzy SVM	Pneumonia	98.63	98.75	99.37	99.06
ANFIS	Normal	98.52	97.48	97.07	97.27
ANFIS	Pneumonia	98.52	98.91	99.06	98.99

Table 14. Comparison with previous studies on pneumonia detection.

Reference	Study Focus	Methodology	Reported Accuracy
[36]	Pneumonia detection with transfer learning	Deep CNN with transfer learning	98.1%
[37]	Deep learning framework for pneumonia detection	MobileNetV2, CNN, LSTM-CNN	92–96.4%
[38]	Pneumonia detection with image enhancement	Concatenated CNN (CCNN) with fuzzy logic-based enhancement	98.9%
[39]	Multi-level CNN evaluation for COVID-19 and pneumonia	AlexNet, ResNet-50, MobileNet, VGG19	82–87%
[40]	CXR classification using Faster R-CNN	Fully connected RCNN	62%
[41]	Pediatric pneumonia diagnosis via transfer learning	VGG16 and CNN	74.2%
[42]	Pneumonia detection using CNN-based features	DenseNet-169 + SVM	80.02%
[43]	Pneumonia diagnosis using deep learning	VGG16 and Xception	87%
[44]	Pneumonia localization on large-scale dataset	Ensemble RetinaNet + Mask R-CNN	75.8%
This work	Hybrid ANFIS classification with feature selection	ANFIS + MobileNetV2 (Gaussian MF)	98.52%

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Mjahad, A.; Rosado-Muñoz, A. Combining Deep Learning Architectures with Fuzzy Logic for Robust Pneumonia Detection in Chest X-Rays. Appl. Sci. 2025, 15, 10321. https://doi.org/10.3390/app151910321
