1. Introduction
Pneumonia is a respiratory disease characterized by lung inflammation and remains one of the leading causes of morbidity and mortality worldwide, particularly among vulnerable populations such as children, the elderly, and immunocompromised patients [1]. Early and accurate detection of this condition is crucial for initiating appropriate treatment and improving patient prognosis [2].
Chest X-rays (CXR) are the most widely used imaging modality for pneumonia diagnosis due to their low cost, rapid acquisition, and widespread availability [3]. However, interpreting chest radiographs is often complex and subjective, even for experienced radiologists, because of overlapping anatomical structures, variable image quality, and visual similarities between pneumonia and other pulmonary conditions [4]. Furthermore, in many regions, limited resources and a shortage of specialists hinder timely and high-quality diagnosis [5].
To address these challenges, computer-aided diagnosis (CAD) systems have gained increasing importance in medical imaging [6]. Artificial intelligence (AI)-based approaches have demonstrated significant potential to improve diagnostic accuracy, reduce the workload of healthcare professionals, and streamline clinical workflows [7]. Deep learning models, in particular, have achieved remarkable results in medical image analysis due to their ability to automatically extract discriminative features directly from raw data, without requiring manual feature engineering [8]. In parallel, fuzzy logic-based methods have been widely applied in biomedical research, given their capacity to handle uncertainty, imprecision, and non-linear relationships inherent in clinical data [9]. Fuzzy systems provide flexible and interpretable frameworks for decision-making, making them particularly suitable for medical diagnosis where data ambiguity is common [10]. Consequently, integrating fuzzy reasoning with AI-based feature extraction can enhance the robustness, transparency, and reliability of CAD systems for pneumonia detection.
Several studies in recent years have leveraged large-scale public CXR datasets to develop automated pneumonia detection models. Despite encouraging results, persistent challenges remain, including class imbalance, inter-patient variability, and the uncertainty of visual patterns, which may compromise model generalizability in real-world clinical settings.
Therefore, this study proposes a hybrid framework for pneumonia detection that combines deep learning architectures for feature extraction with fuzzy logic-based classifiers. The approach aims to exploit the high representational capacity of deep neural networks together with the interpretability and uncertainty management of fuzzy systems. By addressing the limitations of conventional methods, the proposed system seeks to improve diagnostic accuracy, robustness, and clinical applicability, ultimately contributing to timely and effective patient care.
Proposed Work:
This study proposes a hybrid framework for the automated detection of pneumonia in chest X-ray (CXR) images, integrating feature extraction using CNNs with fuzzy logic-based classifiers, including ANFIS. Unlike previous works where fuzzy systems were applied first for feature extraction and CNNs were subsequently used for classification, our approach reverses this order: CNNs are employed to extract deep features and optimize relevant parameters, while fuzzy classifiers, including ANFIS, are applied for the final classification. This strategy enables more effective training, enhances model robustness and interpretability, and maximizes accuracy in pneumonia detection.
The methodology is structured into three main stages:
Stage 1: Deep Feature Extraction via CNNs
Pretrained CNNs (VGG16, EfficientNetV2, MobileNetV2, and ResNet50) are used to extract deep features and optimize parameters relevant for classification.
Stage 2: Fuzzy Classification
Various fuzzy classifiers (Fuzzy SVM, Fuzzy DT, Fuzzy KNN, FCM, and ANFIS) are trained using the features extracted from the CNNs. For ANFIS, both Gaussian and triangular membership functions are used, with parameters automatically adjusted through ANFIS’s hybrid learning algorithm (combining gradient descent and least squares).
Stage 3: Hybrid Integration and Optimization
A genetic algorithm is applied to select the best parameters, and all CNN–fuzzy combinations are compared to identify the most effective configuration for pneumonia classification.
This reversal of the processing flow represents the main novelty of our approach compared to previous studies, allowing more efficient use of extracted features and superior performance in pneumonia classification.
The structure of this paper is organized as follows.
Section 2 introduces the classification strategies explored in this work, with an emphasis on deep learning techniques (CNNs) and fuzzy logic-based approaches.
Section 3 describes the fuzzy logic-based classification models employed in the study.
Section 4 details the data sources and outlines the methodological framework, including the training scheme, feature extraction process, and evaluation protocol.
Section 5 presents the experimental results, which are further analyzed and discussed in Section 6. Finally, Section 8 summarizes the main contributions of this work and outlines directions for future research.
2. Fundamental Concepts of CNNs
This section provides an overview of CNNs, focusing on the particular architecture adopted for this research.
CNNs have gained widespread attention due to their superior accuracy in tasks involving image recognition and classification. Fundamentally, they can be viewed as specialized feedforward artificial neural networks (ANNs) [11], as depicted in Figure 1.
These networks consist of multiple layers, each containing trainable filters or kernels (also called neurons) with weights and biases that are optimized during training. The process involves convolving these filters over the input data, often followed by nonlinear activation functions to introduce complexity [12]. A standard CNN architecture is composed of the following elements:
Convolutional layers (CONV): These layers apply convolutional operations to extract features from the input data.
Pooling layers (POOL): They reduce the spatial dimensions of feature maps, commonly through subsampling, to decrease computational load and capture dominant features.
Fully connected layers (FCL): These layers connect every neuron from the previous layer to each neuron in the current layer, similar to a traditional perceptron, to interpret extracted features.
Classification layer (Softmax): This final layer computes probabilities for each class, assigning the input to the most likely category.
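The four building blocks listed above can be sketched in a few lines of plain Python. This is an illustrative toy, not the networks used in this study: the function names (`conv2d_valid`, `relu`, `max_pool`, `dense`, `softmax`) and the tiny inputs are assumptions made for the example, and the convolution is the unflipped sliding-window form that deep learning frameworks typically use.

```python
import math

def conv2d_valid(image, kernel):
    """CONV: slide `kernel` over `image` (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [sum(image[r + i][c + j] * kernel[i][j]
             for i in range(kh) for j in range(kw))
         for c in range(out_w)]
        for r in range(out_h)
    ]

def relu(fmap):
    """Nonlinear activation applied element-wise to a feature map."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """POOL: non-overlapping max pooling; incomplete windows are dropped."""
    return [
        [max(fmap[r + i][c + j] for i in range(size) for j in range(size))
         for c in range(0, len(fmap[0]) - size + 1, size)]
        for r in range(0, len(fmap) - size + 1, size)
    ]

def dense(inputs, weights, biases):
    """FCL: every input is connected to every output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def softmax(scores):
    """Classification layer: turn raw scores into class probabilities."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy forward pass on a 3x3 "image" with one 2x2 filter and two classes.
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[-1, 0], [0, 1]]
pooled = max_pool(relu(conv2d_valid(image, kernel)))
flat = [v for row in pooled for v in row]
probs = softmax(dense(flat, [[0.5], [-0.5]], [0.0, 0.0]))
```

In a real CNN the kernel values and dense weights are learned during training rather than fixed by hand as they are here.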
2.1. Optimization of Hyperparameters
Hyperparameters in CNNs are parameters that are not learned during training but must be predefined. They significantly affect model performance and training efficiency.
Important hyperparameters include:
Number of layers: Typically a combination of convolutional, activation (e.g., ReLU), pooling, and fully connected layers [13].
Filter size (kernel size): The spatial extent of each kernel, which influences feature extraction granularity [14].
Number of filters: Determines the depth and expressiveness of feature maps, balancing computational cost and accuracy [15].
Stride: Controls the step size of filters over the input; common values are 1 or 2, affecting output size [16].
Padding: Preserves spatial dimensions, typically set as ‘same’ or ‘valid’ [16].
Activation functions: Such as ReLU, leaky ReLU, or Sigmoid; ReLU is favored for simplicity and effectiveness [17].
Pooling layers: Max pooling or average pooling reduces spatial dimensions and computational load [18].
Fully connected layers: Size varies depending on task complexity; output layer size matches the number of classes [19].
Dropout: A regularization technique to prevent overfitting by randomly deactivating neurons during training, usually with rates between 0.2 and 0.5 [20].
Batch size: Number of samples processed per training iteration, impacting convergence speed and computational requirements [21].
Number of epochs: Defines how many times the entire dataset is passed through the network during training [22].
Learning rate: Controls the optimization step size; too small slows convergence, too large risks instability [23].
Optimizers: Algorithms such as Stochastic Gradient Descent (SGD), Adam, and RMSprop influence training efficiency and model accuracy [24].
Selecting optimal hyperparameters depends on the dataset, problem domain, and available computational resources, often requiring iterative tuning and experimentation.
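As a small worked example of how kernel size, stride, and padding interact, the spatial size of a convolution or pooling output along one dimension follows the standard relation floor((n + 2p - k) / s) + 1. The helper below is a sketch; the function name and the 224-pixel input are illustrative assumptions, not settings reported in this study.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Output length along one spatial dimension.

    n       -- input length (pixels)
    kernel  -- filter or pooling window length
    stride  -- step between successive windows
    padding -- zeros added on each side of the input
    """
    return (n + 2 * padding - kernel) // stride + 1

# Illustrative values only.
same_size = conv_output_size(224, kernel=3, stride=1, padding=1)   # size preserved
downsampled = conv_output_size(224, kernel=3, stride=2, padding=0)
after_pool = conv_output_size(224, kernel=2, stride=2, padding=0)  # pooling halves the size
```

With stride 1, a padding of (k - 1) / 2 for an odd kernel size k keeps the output the same size as the input, which is what the ‘same’ setting achieves; ‘valid’ corresponds to a padding of zero.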
2.2. CNN Approaches Implemented
In this study, four CNN architectures with distinct characteristics and purposes were employed for image classification and analysis: VGG16, EfficientNetV2, MobileNetV2, and ResNet50. VGG16 is recognized for its simplicity and depth, stacking exclusively small convolutional filters across multiple layers to increase representational capacity. It also uses pooling layers to reduce spatial dimensions and fully connected layers for classification. EfficientNetV2 optimizes both accuracy and efficiency by applying compound scaling of depth, width, and resolution, along with improved training techniques. This architecture achieves strong performance with fewer parameters and faster training times compared to previous models. MobileNetV2 is designed specifically for efficiency on mobile and embedded devices, utilizing depthwise separable convolutions to significantly reduce the number of parameters and computational cost. It introduces bottleneck layers and residual connections to further enhance performance. ResNet50, a deep residual network, addresses the degradation problem in very deep networks by using skip connections that facilitate gradient flow during training. This architecture enables training of very deep models while maintaining high accuracy. Each of these architectures offers unique advantages that complement one another to enhance visual data analysis in this study.
3. Fuzzy Logic-Based Classification Models
Fuzzy logic, introduced by Zadeh in 1965, allows handling imprecision and uncertainty through membership values in the range $[0, 1]$ [25]. Unlike classical binary logic, it enables the representation of degrees of membership and the management of uncertain information, which is particularly useful in medical diagnosis.
Fuzzy sets are represented by membership functions, typically triangular or Gaussian. In this work, we use three membership functions per feature (two Gaussian and one triangular) to capture variability and uncertainty while preserving interpretability. Among the most widely used fuzzy algorithms, two groups can be distinguished:
ANFIS (Adaptive Neuro-Fuzzy Inference System) [26]: uses explicit membership functions and trainable fuzzy rules. During training, ANFIS adjusts both the parameters of the membership functions and the rules that combine the inputs to generate the output, through a hybrid learning approach (gradient descent combined with least squares). This enables adaptive modeling of uncertainty and nonlinear relationships.
Other fuzzy classifiers:
- Fuzzy SVM [27]: incorporates fuzzy membership values for each sample to weight its importance during training, but does not adjust membership functions or explicit rules.
- Fuzzy KNN [28]: assigns soft memberships to a query point based on its distance to neighbors; it does not employ trainable membership functions or rules.
- Fuzzy DT [29]: operates with derived membership values at each node, without training membership functions or rules; it enables handling of uncertainty in decision-making.
- FCM [30, 31]: performs fuzzy clustering based on adaptive distances and cluster memberships; it does not train membership functions or rules.
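To make the idea of distance-based soft memberships concrete, the sketch below implements one common formulation of Fuzzy KNN, in which each of the k nearest neighbors votes with weight 1 / d^(2/(m-1)). It assumes crisp training labels and Euclidean distance; the function name, the fuzzifier default m = 2, and the toy data are assumptions for illustration and are not taken from the experiments in this paper.

```python
import math

def fuzzy_knn_memberships(query, samples, labels, k=3, m=2.0, eps=1e-9):
    """Return a dict mapping each class to its membership degree for `query`."""
    # Keep the k training samples closest to the query.
    ranked = sorted(zip(samples, labels),
                    key=lambda pair: math.dist(query, pair[0]))[:k]
    exponent = 2.0 / (m - 1.0)
    # Inverse-distance weights; eps avoids division by zero for exact matches.
    weights = [(1.0 / (math.dist(query, s) ** exponent + eps), lab)
               for s, lab in ranked]
    total = sum(w for w, _ in weights)
    return {c: sum(w for w, lab in weights if lab == c) / total
            for c in sorted(set(labels))}

# Toy 2-D feature vectors (hypothetical values).
samples = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
labels = ["normal", "normal", "pneumonia", "pneumonia"]
memberships = fuzzy_knn_memberships((0.0, 0.5), samples, labels, k=3)
```

The memberships sum to one, so a query near the class boundary receives intermediate values for both classes instead of a hard label.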
To illustrate the operation of ANFIS and its trainable membership functions and rules, different configurations of membership functions were analyzed.
Figure 2, Figure 3, and Figure 4 present the ANFIS training schemes using Gaussian, trapezoidal, and triangular membership functions, respectively.
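For reference, the three membership function shapes mentioned in this section can be written as follows. This is a minimal sketch using the usual textbook parameterizations (center and width for the Gaussian, breakpoints for the triangular and trapezoidal shapes); the parameter names are assumptions and the paper's fitted values are not reproduced here.

```python
import math

def gaussian_mf(x, c, sigma):
    """Gaussian membership centered at c with width sigma."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def triangular_mf(x, a, b, c):
    """Triangle rising from a to the peak at b, then falling to c (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def trapezoidal_mf(x, a, b, c, d):
    """Trapezoid rising on [a, b], flat at 1 on [b, c], falling on [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)
```

All three return values in [0, 1]. The Gaussian is smooth everywhere, which makes gradient-based tuning of its parameters straightforward, while the piecewise-linear shapes are exactly zero outside their support.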
4. Materials and Methods
This study proposes a hybrid methodology for the automatic detection of pneumonia in CXR images, combining deep learning for feature extraction with fuzzy logic-based classification. The general pipeline includes dataset preparation, feature extraction using CNN architectures, feature selection via GAs, and classification using fuzzy systems. Finally, the results are evaluated through performance metrics.
Figure 5 illustrates the overall workflow.
Dataset Preparation:
The dataset consists of labeled CXR images divided into pneumonia-positive and normal classes.
The data was split into training and testing sets, maintaining class balance and representative sampling.
Preprocessing involved normalization and resizing images to fit the input requirements of the CNN models.
CNN Architectures for Deep Feature Extraction:
Pre-trained CNN architectures such as VGG16, EfficientNetV2, MobileNetV2, and ResNet50 were employed.
The models were fine-tuned on the training dataset to provide baseline classification performance.
Deep features were extracted from the penultimate layers to be used in subsequent fuzzy classification.
Feature Selection Using Genetic Algorithms:
The extracted CNN features are typically high-dimensional and may contain redundant or irrelevant components.
For each CNN, a genetic algorithm (GA) was applied to select the most informative parameters within its feature group, yielding an optimized version of each CNN.
The optimized feature groups from different CNNs were then compared using clustering-based metrics, namely the Silhouette coefficient and the Calinski-Harabasz index, in order to determine the best CNN feature group for classification.
Fuzzy Classification Models and Evaluation:
Several fuzzy logic-based classifiers were trained using the optimized CNN features.
The classifiers considered were Fuzzy SVM, Fuzzy DT, Fuzzy K-Nearest Neighbors (Fuzzy KNN), FCM, and ANFIS.
Performance was evaluated using accuracy, precision, recall, F1-score, and ROC curves.
All CNN–fuzzy combinations were tested to identify the most effective hybrid configuration.
Testing and Validation:
Test images were preprocessed and passed through the trained CNN feature extractors.
Optimized feature subsets were used as inputs to the trained fuzzy classifiers.
Model performance was assessed on unseen data to verify generalization and robustness.
4.1. Dataset Preparation
The dataset used in this study is the publicly available CXR Images (Pneumonia) dataset, originally introduced as part of a clinical study published in the journal Cell [32]. It contains 5863 anterior-posterior (AP) chest X-ray images of pediatric patients aged 1–5 years. The data were collected at the Guangzhou Women and Children’s Medical Center in China as part of routine clinical care. All radiographs underwent an initial quality screening, and low-quality or unreadable scans were excluded. Diagnostic labels, classified as either Normal or Pneumonia (bacterial or viral), were assigned based on expert evaluation. Each image was independently reviewed by two certified radiologists, and to ensure diagnostic accuracy, a third specialist evaluated a subset of images to reduce annotation errors.
The dataset is organized into three predefined folders: training, validation, and testing. These subsets are balanced across the two classes, facilitating supervised learning and consistent performance evaluation. The images are in JPEG format and exhibit sufficient resolution for feature extraction via CNNs. The dataset is available under a Creative Commons (CC BY 4.0) license and can be accessed through Mendeley Data [33].
Figure 6 provides example CXR scans illustrating the visual differences between normal lungs and those affected by bacterial and viral pneumonia.
4.2. CNN Architectures for Deep Feature Extraction
To extract rich and discriminative feature representations from CXR images, several well-known CNN architectures were employed in a transfer learning setting. The models used include VGG16, EfficientNetV2, MobileNetV2, and ResNet50. Each of these networks was initialized with weights pretrained on the ImageNet dataset. Their original classification layers were removed, and the convolutional base was kept frozen during training in order to retain the generic visual features learned from large-scale natural image datasets.
On top of each frozen base CNN, a custom classification head was added to adapt the model to the binary task (normal vs. pneumonia). This head consists of a Global Average Pooling (GAP) layer, followed by two fully connected (dense) layers with ReLU activation, and a final dense layer with a sigmoid activation function for binary classification. During training, only these top layers were optimized. The output of the penultimate dense layer was extracted as a deep feature vector for each image.
Table 1 summarizes the CNN architectures used in this work and the common classification head added to all of them. These deep features were then used as inputs to fuzzy logic-based classifiers, combining the automatic and powerful feature extraction capability of CNNs with the flexibility and uncertainty management provided by fuzzy models. This hybrid approach aims to improve diagnostic accuracy and robustness by leveraging the strengths of both methodologies.
In addition to the pretrained backbones, we also designed a custom CNN architecture (Table 2), which serves as a baseline for validating the performance of our hybrid approach against both standard and customized CNN structures.
4.3. Feature Selection Using GAs
A two-stage feature selection strategy based on Genetic Algorithms (GA) was employed:
4.3.1. Method 1: GA-Based Feature Selection Using Quality Metric Q
To maximize class separability, a quality metric $Q$ is computed as the average Euclidean distance between samples from different classes [34]:
$$Q = \frac{1}{|C_0|\,|C_1|} \sum_{x \in C_0} \sum_{y \in C_1} d(x, y)$$
where $C_0$ and $C_1$ are the sets of selected features for classes 0 and 1, respectively, and $d(x, y)$ is the Euclidean distance between samples $x$ and $y$.
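A minimal sketch of how the quality metric Q could be evaluated for one candidate feature subset is given below. It assumes each GA individual is a binary mask over the CNN feature vector and that Q is the plain mean of all cross-class Euclidean distances; the helper names and the toy data are hypothetical.

```python
import math

def apply_mask(sample, mask):
    """Keep only the features whose mask bit is 1."""
    return [v for v, keep in zip(sample, mask) if keep]

def quality_q(class0, class1, mask):
    """Average Euclidean distance between every class-0 / class-1 pair."""
    a = [apply_mask(x, mask) for x in class0]
    b = [apply_mask(y, mask) for y in class1]
    total = sum(math.dist(x, y) for x in a for y in b)
    return total / (len(a) * len(b))

# Toy data: feature 0 separates the classes, feature 1 is constant.
class0 = [(0.0, 5.0), (1.0, 5.0)]
class1 = [(4.0, 5.0), (5.0, 5.0)]
q_informative = quality_q(class0, class1, [1, 0])
q_constant = quality_q(class0, class1, [0, 1])
```

Here the mask that keeps the separating feature scores 4.0, while the mask that keeps only the constant feature scores 0.0. One caveat of an unnormalized distance is that it tends to grow as more features are kept, so in practice the features would normally be scaled comparably before such a criterion is applied.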
4.3.2. Method 2: GA-Based Feature Selection Using the Silhouette Index
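To evaluate clustering quality of feature subsets, the average Silhouette index is computed [35]:
$$S = \frac{1}{N} \sum_{i=1}^{N} \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$$
where $a(i)$ is the average intra-cluster distance and $b(i)$ is the minimum average inter-cluster distance of sample $i$, and $N$ is the number of samples.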
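The average Silhouette index can be sketched directly from its definition. The version below is a plain-Python illustration with Euclidean distance, treating the class labels as cluster assignments; the function name is hypothetical, and scoring singleton clusters as 0 is a common convention adopted here as an assumption.

```python
import math

def silhouette_mean(samples, labels):
    """Mean silhouette over all samples; `labels` is a list of cluster ids."""
    scores = []
    for i, (x, lab) in enumerate(zip(samples, labels)):
        # a: mean distance to the other members of the same cluster.
        same = [math.dist(x, y)
                for j, (y, l) in enumerate(zip(samples, labels))
                if l == lab and j != i]
        if not same:
            scores.append(0.0)  # singleton cluster
            continue
        a = sum(same) / len(same)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(x, y) for y, l in zip(samples, labels) if l == other)
            / labels.count(other)
            for other in set(labels) if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Two well-separated 1-D clusters (toy values).
samples = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels = [0, 0, 1, 1]
score = silhouette_mean(samples, labels)
```

Values close to 1 indicate compact, well-separated groups, values near 0 indicate overlapping groups, and negative values suggest that samples sit closer to another cluster than to their own.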
4.3.3. Additional Cluster Validity Indices
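To further assess clustering performance, the Calinski-Harabasz (CH) and Davies-Bouldin (DB) indices are computed:
$$CH = \frac{\operatorname{tr}(B_k)/(k-1)}{\operatorname{tr}(W_k)/(N-k)}, \qquad DB = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} \frac{s_i + s_j}{d_{ij}}$$
where $s_i$ is the intra-cluster distance of cluster $i$ and $d_{ij}$ is the distance between centroids of clusters $i$ and $j$. In the standard definition of CH, $B_k$ and $W_k$ are the between-cluster and within-cluster dispersion matrices, $k$ is the number of clusters, and $N$ is the number of samples.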
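Both indices can be computed from cluster centroids alone. The following sketch uses the standard definitions (mean distance to centroid as the intra-cluster scatter for DB; between-cluster and within-cluster sums of squares for CH); the function names are hypothetical and the implementation used by the authors may differ.

```python
import math

def _centroid(points):
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def _group(samples, labels):
    clusters = {}
    for x, lab in zip(samples, labels):
        clusters.setdefault(lab, []).append(x)
    return clusters

def davies_bouldin(samples, labels):
    """Lower is better: average worst-case ratio of scatter to centroid separation."""
    clusters = _group(samples, labels)
    cents = {k: _centroid(v) for k, v in clusters.items()}
    scatter = {k: sum(math.dist(x, cents[k]) for x in v) / len(v)
               for k, v in clusters.items()}
    ratios = [max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                  for j in clusters if j != i)
              for i in clusters]
    return sum(ratios) / len(ratios)

def calinski_harabasz(samples, labels):
    """Higher is better: between-cluster over within-cluster dispersion."""
    clusters = _group(samples, labels)
    n, k = len(samples), len(clusters)
    overall = _centroid(samples)
    between = sum(len(v) * math.dist(_centroid(v), overall) ** 2
                  for v in clusters.values())
    within = sum(math.dist(x, _centroid(v)) ** 2
                 for v in clusters.values() for x in v)
    return (between / (k - 1)) / (within / (n - k))

# Toy 1-D example with two clusters.
samples = [(0.0,), (2.0,), (10.0,), (12.0,)]
labels = [0, 0, 1, 1]
db = davies_bouldin(samples, labels)
ch = calinski_harabasz(samples, labels)
```

For this toy data the centroids are 1 and 11 and each cluster has scatter 1, so DB = (1 + 1) / 10 = 0.2 and CH = (100 / 1) / (4 / 2) = 50. The two indices point in opposite directions, which matters when they are compared side by side.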
The configuration of the GA used for feature selection is summarized in Table 3.
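The overall GA loop for feature selection can be outlined as follows. This is a generic sketch under stated assumptions, not the configuration reported in Table 3: individuals are binary masks over the feature vector, and the operators (tournament selection of size 3, one-point crossover, bit-flip mutation, single-individual elitism) as well as all default rates are illustrative choices. The `fitness` callable stands in for any of the criteria described above, such as Q or the Silhouette index.

```python
import random

def ga_select(n_features, fitness, pop_size=20, generations=30,
              crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Return (best_mask, best_score) after a simple generational GA."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]

    def score(mask):
        # An empty feature subset is never acceptable.
        return fitness(mask) if any(mask) else float("-inf")

    best = max(pop, key=score)
    for _ in range(generations):
        nxt = [best[:]]  # elitism: carry the best individual forward
        while len(nxt) < pop_size:
            p1 = max(rng.sample(pop, 3), key=score)  # tournament selection
            p2 = max(rng.sample(pop, 3), key=score)
            child = p1[:]
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            # Bit-flip mutation applied independently to each gene.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            nxt.append(child)
        pop = nxt
        best = max(pop, key=score)
    return best, score(best)

# Toy fitness: reward masks that match a hypothetical target subset.
target = [1, 1, 0, 0, 0, 0]
toy_fitness = lambda mask: -sum(abs(a - b) for a, b in zip(mask, target))
best_mask, best_score = ga_select(len(target), toy_fitness)
```

Because of elitism the best score never decreases from one generation to the next, which is consistent with monotone fitness curves of the kind plotted per architecture.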
4.4. Fuzzy Classification Models and Evaluation
The optimal feature subset obtained through the GAs described in the previous section was used as input for several fuzzy classifiers. These models leverage fuzzy logic rules and membership functions to handle the uncertainty and imprecision often present in medical data. The classifiers evaluated include: Fuzzy KNN, Fuzzy DT, Fuzzy SVM, FCM, and ANFIS.
Each model was tuned using different types of membership functions (triangular, trapezoidal, and Gaussian) and decision thresholds to optimize classification performance. In the case of ANFIS, the hybrid learning algorithm (combining gradient descent and least squares) was used to adapt the membership function parameters and fuzzy rules.
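To show what the hybrid algorithm actually adapts, the sketch below implements the forward pass of a first-order Sugeno-type ANFIS with Gaussian membership functions. It is an illustration under assumptions: one rule per combination of membership functions (grid partition), product as the AND operator, and hypothetical parameter values. In the usual hybrid scheme, the linear consequent coefficients are fitted by least squares in the forward pass while the membership centers and widths are updated by gradient descent in the backward pass; that training loop is omitted here.

```python
import math
from itertools import product

def gaussian_mf(x, c, sigma):
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def anfis_forward(x, mf_params, consequents):
    """x: input vector; mf_params[i]: list of (center, sigma) for input i;
    consequents: one coefficient tuple (p_1, ..., p_n, bias) per rule."""
    # Layer 1: membership degree of every input in each of its fuzzy sets.
    degrees = [[gaussian_mf(xi, c, s) for c, s in mfs]
               for xi, mfs in zip(x, mf_params)]
    # Layer 2: rule firing strengths (product over one set per input).
    strengths = [math.prod(combo) for combo in product(*degrees)]
    # Layer 3: normalize firing strengths so they sum to one.
    total = sum(strengths)
    norm = [w / total for w in strengths]
    # Layer 4: linear consequent of each rule.
    outputs = [sum(p * xi for p, xi in zip(coef[:-1], x)) + coef[-1]
               for coef in consequents]
    # Layer 5: weighted sum gives the crisp output.
    return sum(w * f for w, f in zip(norm, outputs))

# Two inputs with two Gaussian sets each give four rules (hypothetical values).
mf_params = [[(-1.0, 1.0), (1.0, 1.0)], [(-1.0, 1.0), (1.0, 1.0)]]
consequents = [(0.0, 0.0, 1.0), (0.0, 0.0, 2.0),
               (0.0, 0.0, 3.0), (0.0, 0.0, 4.0)]
y = anfis_forward((0.0, 0.0), mf_params, consequents)
```

At the symmetric point (0, 0) all four rules fire equally, so the output is the mean of the four constant consequents, 2.5. For binary classification the crisp output would then be compared against a decision threshold. The number of rules grows as the product of the membership function counts, which is one reason a compact, GA-selected feature subset is helpful before ANFIS.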
To ensure robustness, all fuzzy classifiers were trained and validated using 5-fold cross-validation on the training data. The dataset division was the same as applied to the CNN architectures: 75% for training, 15% for testing, and 10% for validation. The training subset was used both for feature selection (via GA) and classifier optimization, while the validation set was employed for hyperparameter tuning. The independent test set (15%) was used exclusively for the final evaluation to provide unbiased performance assessment.
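The partitioning described above can be sketched with index lists. The code below is a simplified illustration: it shuffles indices with a fixed seed, cuts them into 75% / 15% / 10% parts, and then generates 5 folds inside the training part. It does not stratify by class, which a real pipeline on an imbalanced medical dataset would normally do; the function names and the seed are assumptions.

```python
import random

def split_indices(n, train=0.75, test=0.15, seed=0):
    """Return (train, test, validation) index lists; validation gets the remainder."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(n * train)
    n_test = round(n * test)
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]

def k_fold(indices, k=5):
    """Yield (train_part, held_out_part) pairs for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        train_part = [j for f_i, fold in enumerate(folds) if f_i != i
                      for j in fold]
        yield train_part, held_out

train_idx, test_idx, val_idx = split_indices(100)
folds = list(k_fold(train_idx, k=5))
```

Keeping the test indices out of both the GA fitness evaluation and the cross-validation folds is what allows the final test score to serve as an unbiased estimate.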
The evaluation metrics used in this study—accuracy, precision, recall (sensitivity), specificity, F1-score, and AUC—are standard in binary classification problems. These metrics provide complementary views of the model’s performance and are widely adopted in medical diagnosis tasks.
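All of these metrics except AUC follow directly from the four confusion-matrix counts. The helper below is a small sketch in which the positive class is assumed to be Pneumonia; the counts in the example are hypothetical and chosen to show how a classifier that labels every image as Pneumonia can still reach a seemingly reasonable accuracy.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts."""
    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # sensitivity
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }

# Degenerate case: 73 pneumonia and 27 normal images, all predicted Pneumonia.
always_positive = binary_metrics(tp=73, fp=27, tn=0, fn=0)
```

In this degenerate case recall is 1.0 and specificity is 0.0, while accuracy simply equals the share of Pneumonia images (0.73). This is why accuracy should be read together with the per-class metrics, especially when the classes are not equally represented.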
5. Results
Before presenting the experimental results in detail, the overall workflow of the proposed approach is shown in Figure 7. The methodology begins with preprocessing and enhancement of medical images, followed by three main analysis paths:
Path 1: Fuzzy-only approach—Direct use of the original images as input to the fuzzy classification system, without applying any feature extraction or parameter optimization.
Path 2: CNN-only approach—The original images are directly fed into the CNN family (VGG16, ResNet50, MobileNetV2, EfficientNetV2), which performs both feature extraction and classification. No additional feature selection or optimization is applied in this path.
Path 3: Hybrid CNN-Fuzzy approach—Combines CNN-based feature extraction with fuzzy classification. Features extracted by the CNN family are first optimized using two strategies:
- Method 1, intra-architecture optimization (GA-based): internal tuning of genetic algorithm parameters (population size, generations, mutation rate) applied independently to each CNN architecture, producing an optimized subset of features.
- Method 2, inter-method comparison (clustering metrics): comparison among optimized feature groups using clustering-based evaluation metrics, namely the Silhouette Index, Calinski-Harabasz Index, and Davies-Bouldin Index. These metrics guide the selection of the best-performing CNN feature group.
Finally, the selected features are passed to the fuzzy classifiers, which produce the final diagnosis.
Figure 7 illustrates the overall workflow of the proposed approach as a block diagram. The experimental setup follows a comparative structure in which each path provides complementary evidence about the contribution of CNN-based deep features and fuzzy classifiers. The following subsections present the quantitative results for each analysis path, followed by a comparative discussion that highlights the advantages of the proposed hybrid CNN-Fuzzy system over single-method approaches.
5.1. Direct Image-Based Classification—CNN
Table 4 summarizes the performance of different CNN architectures for CXR classification into Normal and Pneumonia classes. Overall, VGG16 and MobileNetV2 demonstrate the highest performance across most metrics, with accuracy values exceeding 96% for both classes. In contrast, EfficientNetV2 shows significantly lower accuracy for the Normal class, with metrics such as Recall and Precision dropping to 0%, indicating difficulties in correctly classifying this category.
For the Normal class, as illustrated in Figure 8, VGG16 achieves the highest recall (88.66%) and precision (97.78%), suggesting a strong ability to correctly identify normal cases while minimizing false positives. MobileNetV2 also performs well, particularly in recall (97.06%) and specificity (97.06%), indicating reliable discrimination between normal and pathological samples.
Regarding the Pneumonia class, Figure 9 shows that VGG16 maintains the best overall balance across all metrics, with an F1-score of 97.62%, while the baseline CNN and MobileNetV2 also yield high accuracy and recall. EfficientNetV2 exhibits a recall of 100% but a specificity of 0%, reflecting that it classifies essentially every sample as Pneumonia, which inflates sensitivity but severely compromises its ability to correctly recognize normal cases.
In summary, the analysis of Table 4, Figure 8, and Figure 9 highlights that VGG16 and MobileNetV2 are the most robust architectures for direct image-based classification, achieving consistently high performance for both Normal and Pneumonia classes. These results provide a solid baseline for subsequent stages involving feature selection and hybrid CNN-Fuzzy approaches.
5.2. Performance Evaluation of Fuzzy Models Using Direct Image Input
In this study, various fuzzy models were evaluated for the classification of CXR images into two classes: Normal and Pneumonia, using the images directly as input without feature extraction. The methods evaluated include Fuzzy SVM, Fuzzy KNN, Fuzzy DT, FCM, and ANFIS with three membership function variants: Gaussian, Triangular, and Trapezoidal. The performance metrics for each model and class are presented in Table 5.
The analysis of Table 5 indicates that Fuzzy SVM achieved the best overall performance, with an accuracy of 92.95%. It shows a good balance between sensitivity for detecting Pneumonia (95.48%) and specificity for Normal (95.48%), as well as high F1-scores for both classes, demonstrating robust and balanced classification performance.
The second-best model is Fuzzy KNN, with an accuracy of 91.01%. It excels in detecting Pneumonia cases (recall 96.88%) but is less sensitive for Normal cases (recall 75.21%), indicating a tendency to misclassify healthy patients. Nonetheless, it is competitive when prioritizing disease detection.
Fuzzy DT shows intermediate performance (accuracy 85.10%), with lower sensitivity and precision for Normal cases compared to SVM and KNN, but its interpretable structure can be advantageous in clinical applications.
FCM demonstrates poor performance (accuracy 54.72%), barely above chance, making it unsuitable for direct image classification in this setting.
ANFIS models, regardless of membership function (Gaussian, Triangular, Trapezoidal), show similar behavior (accuracy 72.92%), with zero sensitivity for Normal and perfect sensitivity for Pneumonia. This indicates a strong bias toward Pneumonia classification, producing high false positives and limiting clinical applicability.
In summary, when using images directly as input to the fuzzy family without feature extraction, the most reliable models are Fuzzy SVM and Fuzzy KNN. SVM provides a balanced performance across both classes, whereas KNN favors Pneumonia detection. Fuzzy DT may be considered when interpretability is critical, and FCM and ANFIS are less recommended due to poor or biased performance.
5.3. Feature Extraction and Selection from CNN Models
This section presents the results of extracting deep features from the CNN family and applying feature selection using a Genetic Algorithm (GA), followed by a comparative evaluation among the CNN models using clustering-based metrics, as summarized in Figure 10.
5.3.1. Feature Selection Using GA and Quality Metric Q
A GA was applied to select the most informative features from each CNN model: MobileNetV2, ResNet50, VGG16, EfficientNetV2, and a custom CNN. The fitness function in this analysis was the quality metric Q, which evaluates class separability in the feature space. The goal was to obtain subsets of features that maximize discriminability for Normal and Pneumonia classes.
Visualization of Selected Features
Figure 11 shows PCA-based projections of the selected features for each CNN model. These plots illustrate the class separation in a 2D space, providing a visual indication of the discriminative power of the selected features.
Table 6 summarizes the maximum clustering quality Q achieved and the number of features selected for each CNN model. As can be seen in Figure 12, the bar chart visually represents the same information, highlighting that VGG16 achieved the highest Q, indicating the strongest discriminability. The figure also shows the number of selected features above each bar, with the best model clearly indicated.
5.3.2. Feature Selection Using Clustering Metrics
To further validate the selected features, clustering metrics (Silhouette Score, Calinski–Harabasz Index, Davies–Bouldin Index) were used to compare the feature subsets from all CNN models.
Table 7 presents the exact numeric values, while Figure 13 provides a visual comparison.
PCA Analysis of Cluster Structure
Figure 14 shows PCA projections of the selected feature subsets using Silhouette-based selection. VGG16 and Custom CNN show well-separated clusters, while ResNet50 has poorly separated clusters.
GA Fitness Evolution Across Generations
Figure 15 shows the evolution of the Silhouette-based GA fitness across generations for all CNN models. VGG16 quickly reaches optimal fitness and remains stable, MobileNetV2 converges gradually, ResNet50 shows limited improvement, while EfficientNetV2 and Custom CNN exhibit smooth and consistent fitness growth.
5.3.3. FCM Classification Using Selected Features
The FCM classifier was applied using the optimal feature subsets extracted from the different CNN models.
Table 8 shows the performance metrics (Accuracy, Recall, Specificity, Precision, and F1-score) for both Normal and Pneumonia classes across all evaluated feature extraction methods.
The results in Table 8 show that VGG16 features yield the best overall performance for FCM classification, achieving an accuracy of 94.43% and high F1-scores for both classes. Custom CNN and EfficientNetV2 features follow closely, but MobileNetV2 has lower overall accuracy due to reduced specificity for Normal cases. ResNet50 achieves balanced but slightly lower performance.
Figure 16 and Figure 17 illustrate the distribution of the performance metrics, highlighting that FCM is particularly sensitive to the quality of the selected feature set. While recall for Normal cases is generally high across most models, the precision varies significantly, indicating some misclassification of healthy patients.
In conclusion, FCM performs best when paired with highly discriminative features such as those extracted from VGG16, and its performance drops when feature quality or separability is lower.
5.3.4. Fuzzy DT Classification Using Selected Features
The Fuzzy DT classifier was evaluated using the feature subsets extracted from different CNN models.
Table 9 presents the classification performance metrics (Accuracy, Recall, Specificity, Precision, and F1-score) for both Normal and Pneumonia classes.
The results in Table 9 indicate that custom CNN and MobileNetV2 features provide the best performance for Fuzzy DT classification, achieving accuracy around 97–98% and strong F1-scores for both classes. EfficientNetV2 and VGG16 features show slightly lower performance, especially in recall for Normal cases. ResNet50 has the lowest overall metrics, mainly due to reduced sensitivity for Normal samples.
Figure 18 and Figure 19 demonstrate how Fuzzy DT performance depends on feature discriminability: highly separable features improve precision and recall significantly.
5.3.5. Fuzzy KNN Classification Using Selected Features
The Fuzzy KNN classifier was applied to the same feature subsets.
Table 10 presents the performance metrics for both Normal and Pneumonia classes.
Fuzzy KNN achieves the highest overall performance, with custom CNN and MobileNetV2 features reaching accuracy above 98% and high F1-scores for both classes. EfficientNetV2 and VGG16 also provide strong results, while ResNet50 underperforms due to lower sensitivity for Normal cases.
Figure 20 and Figure 21 highlight that Fuzzy KNN benefits from highly discriminative features, showing consistently high recall and precision. Overall, Fuzzy KNN demonstrates robust classification performance, slightly outperforming Fuzzy DT in most scenarios.
5.3.6. Fuzzy SVM Classification Using Selected Features
The Fuzzy SVM classifier was evaluated using the feature subsets extracted from different CNN models.
Table 11 presents the classification performance metrics (Accuracy, Recall, Specificity, Precision, and F1-score) for both Normal and Pneumonia classes.
Fuzzy SVM achieves high overall classification performance. Custom CNN and MobileNetV2 features reach accuracy above 98% with strong F1-scores for both Normal and Pneumonia classes. EfficientNetV2 and VGG16 also show competitive results, while ResNet50 underperforms due to lower sensitivity in Normal cases.
Figure 22 and Figure 23 illustrate how Fuzzy SVM performance benefits from highly discriminative features, providing consistently high precision and recall across both classes.
5.4. ANFIS-CNN Classification Results
The ANFIS classifier demonstrated strong performance using features extracted from various CNN architectures.
The results in Table 12 indicate that ANFIS effectively integrates CNN feature extraction with neuro-fuzzy reasoning for early pneumonia detection. Generic CNN and MobileNetV2 features combined with Gaussian membership functions achieved the highest overall accuracy and F1-scores, exceeding 98.5% and 97%, respectively, for both Normal and Pneumonia classes. EfficientNetV2 and VGG16 features also achieved high performance (around 97%), while ResNet50 features performed slightly lower (92–93%).
Gaussian membership functions provided the most consistent results across all CNN architectures. Trapezoidal and Triangular functions performed well in most cases but showed lower F1-scores for certain combinations, such as MobileNetV2 with Trapezoidal membership for Normal class. Overall, ANFIS with CNN features provides robust, accurate, and balanced classification, making it a strong approach for early pneumonia detection.
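For reference, the three membership function shapes compared above can be written as follows. This is a minimal sketch using the textbook definitions; the parameter names (`center`, `sigma`, `a`, `b`, `c`, `d`) and the example values are assumptions, since the parameters learned by ANFIS in this study are not listed here.

```python
import math


def gaussian_mf(x, center, sigma):
    """Smooth bell-shaped membership; never exactly zero."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def triangular_mf(x, a, b, c):
    """Piecewise-linear membership with feet at a and c, peak at b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def trapezoidal_mf(x, a, b, c, d):
    """Membership with feet at a and d and a flat top on [b, c] (a < b <= c < d)."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


# Evaluate each shape on a normalized feature value (illustrative values only).
x = 0.25
print(gaussian_mf(x, center=0.5, sigma=0.1))
print(triangular_mf(x, a=0.0, b=0.5, c=1.0))
print(trapezoidal_mf(x, a=0.0, b=0.25, c=0.75, d=1.0))
```

The Gaussian function is differentiable everywhere, whereas the triangular and trapezoidal functions have kinks and regions of exactly zero membership. This may be one reason gradient-based ANFIS training behaves more consistently with Gaussian functions, although the study does not test this explanation directly.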
The results of ANFIS fuzzy classification using features extracted from both MobileNetV2 and a generic CNN, as illustrated in Figure 24 and Figure 25, show consistent trends across membership functions. The Triangular and Gaussian membership functions achieved the highest ROC AUC and PRC values, exceeding 98% in both feature sets, indicating strong discriminatory capability between the Normal and Pneumonia classes. Corresponding confusion matrices show minimal misclassifications, confirming that the majority of cases were correctly classified. Precision, recall, and F1-score metrics are consistently high, reflecting reliable and balanced detection.
In contrast, the Trapezoidal membership function exhibits lower ROC AUC and PRC values across both CNN feature sets, indicating reduced classification performance. Confusion matrices reveal increased errors, particularly in the Normal class, accompanied by lower precision and F1-score metrics, highlighting less robust predictions.
Overall, across both MobileNetV2 and generic CNN feature sets, Triangular and Gaussian membership functions consistently outperform Trapezoidal, achieving near-optimal class separation in ROC curves and stable predictions in PRC analyses. These observations confirm that the choice of membership function is critical to maximize ANFIS performance regardless of the CNN used for feature extraction.
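The ROC AUC values discussed in this subsection can be interpreted as the probability that a randomly chosen Pneumonia case receives a higher score than a randomly chosen Normal case. The sketch below computes AUC directly from that pairwise definition. The function name and the labels and scores are hypothetical, and this is not the evaluation code used in the study.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise ranking; tied scores count as half a win.

    `labels` holds 1 for the positive class (e.g. Pneumonia) and 0 otherwise.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier outputs for five cases (illustration only).
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 4))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch. A rank-based or library implementation would be preferable for large test sets.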
5.5. Comparative Analysis of Fuzzy and ANFIS Classification Methods
Table 13 presents the best classification results for each method by class. This allows a clear comparison of the performance, stability, and robustness of fuzzy classifiers versus ANFIS.
Among the traditional fuzzy methods, Fuzzy KNN and Fuzzy SVM consistently achieve the highest accuracies, above 98%, with strong F1-scores for both Normal and Pneumonia classes. These methods demonstrate a well-balanced performance in precision and recall. Fuzzy SVM slightly outperforms Fuzzy KNN in Pneumonia detection, with an F1-score of 99.06% versus 98.75%.
Fuzzy DT offers good overall performance, around 97.7% accuracy, but is slightly inferior to Fuzzy KNN and SVM. Its main advantage is interpretability, which is valuable in clinical settings.
FCM shows the lowest performance among fuzzy methods, around 94.4% accuracy, with notable discrepancies between recall and precision for the Normal class, indicating difficulties in correctly identifying healthy patients. ANFIS demonstrates superior performance across both classes. Using MobileNetV2 features with Gaussian membership functions, ANFIS reaches 98.52% accuracy and F1-scores of 97.27% (Normal) and 98.99% (Pneumonia). ANFIS minimizes the gap between precision and recall for both classes, ensuring a balanced and clinically reliable classification.
Table 13 lists these best-case results side by side, making the differences between methods directly comparable.
5.6. Comparison with Previous Studies
Table 14 summarizes previous studies on pneumonia detection using chest X-ray images. Recent works reported high accuracy with different deep learning approaches. For example, Rahman et al. [36] achieved 98.1% accuracy with deep CNN and transfer learning, while Elshennawy and Ibrahim [37] obtained between 92% and 96.4% using MobileNetV2 combined with LSTM-CNN. More recently, a concatenated CNN with fuzzy logic-based image enhancement reached 98.9% accuracy [38]. These studies illustrate the effectiveness of deep learning, although most approaches involve high computational cost and limited interpretability.
Our hybrid approach—integrating GA-based feature selection with fuzzy and ANFIS classifiers—achieved an accuracy of 98.52% using MobileNetV2 features and Gaussian membership functions. These results demonstrate that state-of-the-art performance can be obtained with fewer parameters and higher interpretability. In addition, our method showed reduced execution time compared to conventional deep learning models, highlighting its efficiency for real-time or resource-constrained applications.
6. Discussion
The results of this study provide a comprehensive comparison of traditional fuzzy classifiers and ANFIS for early pneumonia detection using CNN-extracted features and GA-based feature selection.
Among the fuzzy methods, Fuzzy KNN and Fuzzy SVM achieved the highest accuracy and F1-scores for both Normal and Pneumonia classes, demonstrating a balanced performance between precision and recall. Fuzzy Decision Tree also showed good performance, with the additional advantage of interpretability, which can be important in clinical settings. In contrast, Fuzzy C-Means showed lower reliability, particularly in correctly identifying healthy patients, due to sensitivity to feature quality and separability.
ANFIS consistently outperformed all traditional fuzzy classifiers. Using MobileNetV2 features with Gaussian membership functions, ANFIS achieved 98.52% accuracy and F1-scores of 97.27% (Normal) and 98.99% (Pneumonia), highlighting its ability to integrate CNN feature extraction with neuro-fuzzy reasoning for robust and stable classification. Moreover, ANFIS minimized the gap between precision and recall, a critical factor in clinical applications to reduce false negatives and ensure reliable diagnosis.
The comparison with previous studies, including works from 2019 to the most recent 2024 publications, shows that the proposed hybrid ANFIS-based approach achieves comparable or superior results. Notably, while some 2024 studies report high accuracy (95–98%) using complex CNN ensembles, our method achieves similar or better performance with fewer parameters and reduced computational cost, thanks to the effective feature selection and fuzzy-based reasoning. This highlights the practical advantages of the proposed approach in terms of efficiency, interpretability, and clinical applicability.
In summary, the discussion reveals that:
Fuzzy KNN and Fuzzy SVM are competitive when features are of high quality, but their performance can vary depending on feature discriminability.
Fuzzy Decision Tree provides interpretability with moderate accuracy, making it suitable for explainable clinical decisions.
ANFIS offers the most reliable and consistent classification, with robust performance across both classes and superior handling of feature uncertainty.
Compared with state-of-the-art studies, ANFIS achieves high accuracy while maintaining efficiency and a lower number of parameters, which is beneficial for practical deployment in clinical environments.
Overall, integrating CNN-based feature extraction with neuro-fuzzy reasoning and feature selection results in a powerful framework for early pneumonia detection, balancing accuracy, robustness, and interpretability.
7. Application in a Real Clinical Setting
In contemporary clinical practice, artificial intelligence, particularly deep learning techniques such as CNNs, is increasingly adopted as a valuable tool to support diagnostic decision-making. This is especially relevant for conditions like pneumonia, where timely and accurate diagnosis is critical to preventing complications and reducing mortality.
Although CXR imaging is widely available and cost-effective, it poses significant interpretation challenges due to overlapping anatomical structures and the subtlety of pathological signs. Advanced CNN architectures such as VGG16, ResNet50, and InceptionV3 have demonstrated strong capabilities in addressing these challenges by extracting discriminative features and identifying complex visual patterns that may not be readily apparent to human observers.
Building on this capability, our study integrates fuzzy logic-based classifiers to manage the inherent uncertainty and imprecision of medical data. This hybrid approach not only achieves high classification accuracy but also enhances model robustness and interpretability—critical factors for real-world clinical deployment, where transparency is often as important as predictive performance.
In practical scenarios such as emergency departments or resource-constrained environments, AI-assisted systems can assist in prioritizing patients, streamlining radiology workflows, and reducing diagnostic delays. However, successful integration into clinical settings requires thorough model validation, regulatory compliance, and the establishment of clinician trust through explainable outputs.
The proposed framework contributes to bridging the gap between research and clinical application by providing a reliable, interpretable, and efficient tool for pneumonia detection across both routine and acute care settings.
8. Conclusions
This study presented a hybrid framework for pneumonia detection in CXR images by integrating deep CNNs for feature extraction with fuzzy logic-based classifiers to enhance interpretability and handle uncertainty. The combination of pre-trained models such as VGG16, EfficientNetV2, MobileNetV2, and ResNet50 with GA-based feature selection demonstrated strong classification performance across key metrics, including sensitivity, specificity, precision, F1-score, and overall accuracy.
The incorporation of fuzzy logic contributed not only to improved diagnostic accuracy, with overall accuracies for the traditional fuzzy classifiers ranging from approximately 94.4% (FCM) to above 98% (Fuzzy KNN and Fuzzy SVM), but also to greater transparency in decision-making, an essential requirement in clinical applications. Among the evaluated configurations, fuzzy classifiers employing triangular and trapezoidal membership functions, especially in KNN and Decision Tree algorithms, yielded the most balanced and reliable results, achieving F1-scores above 95% for the Pneumonia class and balanced performance for the Normal class.
Notably, ANFIS combined with MobileNetV2 features and Gaussian membership functions outperformed all traditional fuzzy classifiers, achieving an overall accuracy of 98.52% and F1-scores of 97.27% (Normal) and 98.99% (Pneumonia). Compared with recent state-of-the-art studies, including 2024 publications reporting accuracies of 95–98%, the proposed ANFIS-based approach achieves comparable or superior performance while maintaining a reduced number of parameters, lower computational cost, and enhanced interpretability.
Overall, the results confirm that integrating CNN feature extraction with fuzzy and neuro-fuzzy reasoning provides a powerful, robust, and clinically applicable framework for early pneumonia detection, balancing accuracy, reliability, and transparency in decision-making.