Artificial Neural Network, Attention Mechanism and Fuzzy Logic-Based Approaches for Medical Diagnostic Support: A Systematic Review

Zacarias-Morales, Noel; Pancardo, Pablo; Hernández-Nolasco, José Adán; Garcia-Constantino, Matias

doi:10.3390/ai6110281

by Noel Zacarias-Morales ¹, Pablo Pancardo ¹,*, José Adán Hernández-Nolasco ¹ and Matias Garcia-Constantino ²

¹ Academic Division of Sciences and Information Technology, Juarez Autonomous University of Tabasco, Cunduacán 86690, Mexico
² School of Computing, Ulster University, Jordanstown BT37 0QB, UK
* Author to whom correspondence should be addressed.

AI 2025, 6(11), 281; https://doi.org/10.3390/ai6110281

Submission received: 29 August 2025 / Revised: 22 October 2025 / Accepted: 24 October 2025 / Published: 1 November 2025

(This article belongs to the Section Medical & Healthcare AI)

Abstract

Accurate medical diagnosis is essential for informed decision making and the delivery of effective treatment. Traditionally, this process relies on clinical judgment, integrating data and medical expertise to inform decision making. In recent years, artificial neural networks (ANNs) have proven to be valuable tools for diagnostic support. Attention mechanisms have enhanced ANN performance, while fuzzy logic has contributed to managing uncertainty inherent in clinical data. This systematic review analyzes how the integration of these three approaches enhances computational models for medical diagnostic support. Following PRISMA 2020 guidelines, a comprehensive search was conducted across five scientific databases (IEEE Xplore, ScienceDirect, Web of Science, SpringerLink, and ACM Digital Library) for studies published between 2020 and 2025 that implemented the combined use of ANNs, attention mechanisms, and fuzzy logic for medical diagnostic support. Inclusion and exclusion criteria were applied, along with a quality assessment. Data extraction and synthesis were conducted independently by two reviewers and verified by a third. Out of 269 initially identified articles, 32 met the inclusion criteria. The findings consistently indicate that the integration of ANNs, attention mechanisms, and fuzzy logic significantly improves the performance of diagnostic models. ANNs effectively capture complex data patterns, attention mechanisms prioritize the most relevant features, and fuzzy logic provides robust handling of ambiguity and imprecise information through continuous degrees of membership. This integration leads to more accurate and interpretable diagnostic models. Future research should focus on leveraging multimodal data, enhancing model generalization, reducing computational complexity, and exploring novel fuzzy logic techniques and training paradigms to improve adaptability in real-world clinical settings.

Keywords:

artificial neural networks; attention mechanisms; fuzzy logic; medical diagnosis; systematic review

1. Introduction

A correct medical diagnosis is essential for making sound decisions that lead to timely treatment for people with a medical condition. Over the years, growing knowledge of diseases and their characteristics has enabled more effective diagnostic judgments. However, the assessment of the symptoms that patients present still depends on the subjectivity and degree of experience of health professionals. Health professionals play a crucial role in the diagnostic process, often using their intuition to reach a conclusive medical diagnosis [1]. This situation illustrates that diseases, in addition to their quantitative aspects, also possess qualitative aspects.

Medical diagnosis can rely more than ever on computational technology to be more accurate, and advances in artificial intelligence make it a crucial tool to support medical diagnosis. That is, machine learning algorithms are capable of recognizing the set of characteristics or values that constitute the appropriate pattern to determine that a disease is present. In particular, in recent years, algorithms known as artificial neural networks (ANNs) have demonstrated excellent efficiency as a medical support tool for diagnosing whether a person is suffering from a specific disease based on vital signs and symptoms [2]. When fed with input values, ANNs mimic the behavior of the human brain, allowing a series of interconnected neurons to produce a final result that indicates whether a person is suffering from a specific disease [3]. There is a great variety of algorithms that implement the ANN architecture, and the advances to increase its efficiency are diverse.

One important development in the artificial neural network (ANN) domain is the concept of attention, which is implemented in ANNs through attention mechanisms. An attention mechanism allows ANN models to prioritize specific regions in the input, such as key areas in medical images or critical features in temporal sequences [4]. Its primary function is to improve diagnostic accuracy and efficiency while minimizing information redundancy.

However, as mentioned, medical diagnosis has the particularity that it must consider qualitative values that require the evaluation of the healthcare professional. A technology that allows the management of the uncertainty and ambiguity present in the context of medical diagnosis support is fuzzy logic [5]. Fuzzy logic enables the representation and management of uncertainty and ambiguity inherent in clinical data. In contrast to traditional Boolean logic, it represents qualitative and imprecise values by continuous degrees of membership [6], and this is crucial given that symptoms and diseases rarely match rigid thresholds.

Fuzzy logic techniques enable categorization and improvement of medical decision making, enhance interpretability by emulating human reasoning, and facilitate the processing of fuzzy data. Fuzzy logic improves the interpretability of artificial neural network models by allowing knowledge to be represented in the form of rules that are understandable to humans. Unlike traditional neurons in artificial neural networks, where weights and activations are difficult to interpret on their own, fuzzy logic models relationships between variables using fuzzy terms such as “high”, “low”, or “moderate”. This integration between ANNs and fuzzy logic facilitates a deeper understanding of how the neural network model arrives at a conclusion, enabling experts to validate or adjust the system’s behavior based on their experience and expertise.

Despite the robustness that fuzzy logic offers neural networks with attention mechanisms in support of medical diagnosis, to our knowledge, no systematic review covering the period 2020–2025 has analyzed proposals that integrate all three technologies to support medical diagnosis. Our review aims to systematically analyze the integration of artificial neural networks, attention mechanisms, and fuzzy logic, with the goal of understanding their impact on the performance of computational models for medical diagnostic support and envisioning possible future routes to combine these technologies. To achieve this objective, we established five research questions:

Which algorithms or techniques of artificial neural networks, attention mechanisms, and fuzzy logic were selected to be integrated?
How was the integration between algorithms or techniques of artificial neural networks, attention mechanisms, and fuzzy logic performed?
What impact does the integration of algorithms or techniques of artificial neural networks, attention mechanisms, and fuzzy logic have on the outcome of the proposal?
What are the characteristics of the input data and of the data to be predicted, classified or inferred?
What methods or metrics were used to assess results?

Given the importance and relevance of artificial neural networks, attention mechanisms, and fuzzy logic, this paper presents a systematic review of these algorithms for medical diagnostic support. The main contributions of our systematic review are the following:

We identify and summarize the techniques and/or algorithms of artificial neural networks, attention mechanisms, and fuzzy logic that have been integrated and applied to medical diagnostic support.
We classify and describe the integration of artificial neural networks, attention mechanisms, and fuzzy logic algorithms or techniques.
We analyze and condense the information on the impact of integrating algorithms or techniques from artificial neural networks, attention mechanisms, and fuzzy logic into the models proposed in the publications.
We identify future research lines of approaches using artificial neural networks, attention mechanisms, and fuzzy logic algorithms or techniques.

The remainder of the paper is organized as follows: Section 2 describes the fundamental concepts of neural networks, attention mechanisms, and fuzzy logic. Section 3 describes the methodology used following the PRISMA 2020 systematic review guidelines [7]. Section 4 describes the publications included in the review and answers to the research questions. Section 5 analyzes and discusses the main findings of the review, providing future research directions. Finally, Section 6 concludes the paper.

2. Background and Related Work

The development of medical diagnostic support systems has significantly benefited from advances in artificial intelligence, aiming to objectively process the vital signs and symptoms of patients, as well as the subjectivity inherent in healthcare professionals’ judgments. In this context, artificial neural networks (ANNs) have established themselves as a fundamental tool capable of recognizing complex patterns from vital signs and patient symptoms. To enhance their efficiency and address the variety of algorithms that implement the ANN architecture, key concepts such as attention mechanisms have emerged, which allow ANNs to prioritize the most relevant information in the input data. Additionally, fuzzy logic has emerged as a vital technology for handling the uncertainty and ambiguity inherent in the qualitative characteristics of disease and medical data.

The synergistic integration of these three approaches (ANN for pattern learning, attention mechanisms for key feature targeting, and fuzzy logic for imprecision management) is critical for developing more accurate, robust, and interpretable medical diagnostic models. In this section, we explore the main approaches that have been applied and integrated to address clinical diagnostic challenges.

2.1. Artificial Neural Networks (ANNs)

Artificial intelligence has provided crucial tools to support medical diagnosis through algorithms capable of recognizing patterns of characteristics and values that determine the presence of a disease [2]. Within this field, artificial neural networks (ANNs) have emerged as a fundamental component [8]. ANNs mimic the behavior of the human brain through a series of interconnected neurons that, when fed with input values, provide a final result on the possible presence of a disease [3]. The most common types of artificial neural networks used in medical diagnostic support are convolutional neural networks (CNNs), multilayer perceptrons (MLPs), Transformers, gated recurrent units (GRUs), and graph neural networks (GNNs).

Convolutional neural networks (CNNs) are a type of artificial neural networks widely used in medical diagnostics. Their primary strength lies in their ability to extract spatial features from images efficiently [4]. Their relevance in medical diagnostics is manifested in critical tasks such as the accurate segmentation of images of organs and lesions, which is fundamental for treatment planning. For example, they have been used to delineate organ areas in images [9,10] and for disease classification from images [11,12]. CNNs enable artificial intelligence systems to identify and analyze complex visual details in medical data with an accuracy that complements human capability.

Multilayer perceptrons (MLPs) stand out for their simplicity and versatility. They often act as the final layer in a neural network architecture, performing the task of classification [4]. Their importance in medical diagnosis lies in their ability to make a final decision based on the characteristics processed by other, more specialized neural networks. For example, they have been used in the classification of histopathological images for different tissues [13] or in the diagnosis of cardiovascular diseases based on sound signals [14]. MLPs provide the decision stage that translates complex patterns into a specific classification.

Neural network architectures based on Transformers are valued for their ability to model complex relationships and global dependencies between input features [4]. In medical diagnosis, they are important for their ability to capture context in long data sequences, which is vital in scenarios where global spatial relationships are crucial. They are applied in tasks such as segmenting complex structures [9,15] and classifying diseases that require the fusion of information from multiple sources [16].

Gated recurrent units (GRUs) are exceptionally efficient in processing medical time series. Their importance in the medical field lies in their ability to analyze data where sequence and temporal dependencies are critical [4]. A clear example is their application in the classification of electroencephalogram (EEG) signals [17]. GRUs enable models to comprehend the evolution of symptoms and patterns over time, which is crucial for diagnosing dynamic conditions that change over time.

Graph neural networks (GNNs) are a class of neural networks that are particularly well suited for processing data with an inherent graph structure [3], such as the relationships between genes or biological samples. Their importance for medical diagnosis comes from their ability to predict connections between diseases or make complex diagnoses by using information from substructures. An example of their application is in the diagnosis of brain tumors, where they benefit from information on the relationships between substructures [18].

In addition to the main types of artificial neural networks mentioned above, other artificial neural network architectures have been used in medical diagnosis support tasks. Long short-term memory (LSTM) neural networks are beneficial for medical image reconstruction tasks that require modeling long-term dependencies in data sequences, often integrated within attention modules [3]. Generative adversarial networks (GANs) have been primarily used for data augmentation in situations where the amount of training data is limited [19], a crucial application for enhancing the robustness of models when data collection is challenging [4].

In recent years, ANNs have proven to be highly efficient as a medical support tool for diagnosing whether a person has a specific disease based on their vital signs and symptoms [8]. Their effectiveness extends to the analysis of large volumes of medical data, such as magnetic resonance images, electrocardiograms, and electronic health records, allowing them to identify complex patterns that may not be apparent to the human eye. However, despite their great capacity, the generalization ability of ANNs can be limited if the most relevant features within the information they process are not emphasized [20]. This particularity highlights the need to consider additional strategies that optimize their performance in complex clinical environments.

2.2. Attention Mechanisms

Attention mechanisms represent a significant advance in the field of artificial neural networks (ANNs). Their primary function is to optimize data interpretation by allowing ANN models to prioritize specific parts of the input, such as key regions in medical images or critical features in time series of vital signs. This ability translates directly into a notable improvement in the accuracy and efficiency of model inference while also helping to reduce information redundancy in processing. By mimicking the human brain’s ability to focus on what is most relevant, these mechanisms enable neural networks to dynamically concentrate on different parts of the input, thereby achieving a more comprehensive and relevant representation of the data for diagnosis [4]. The most common types of attention mechanisms adapted to different needs and types of data in the medical field are Self-Attention, Multihead Attention, Channel-Spatial Attention, and Fuzzy Attention modules.

Self-Attention and Multihead Attention are very popular attention mechanisms used in the Transformer artificial neural network architecture. They are ideal for modeling complex relationships and global dependencies between input features [4,21]. Their ability to dynamically focus on different parts of the input and combine information from multiple “heads” of attention results in a more comprehensive representation.
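To make the weighting idea concrete, the following is a minimal sketch of scaled dot-product self-attention in plain Python. It is an illustration under simplifying assumptions, not the implementation used in any reviewed study: the learned query, key, and value projections of a real Transformer are replaced by the identity, a single head is used, and the input vectors are hypothetical toy values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention for small lists of vectors.

    Returns (outputs, weights): one output vector and one row of
    attention weights per query.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        row = softmax(scores)
        # Each output is the attention-weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(row, values)) for j in range(len(values[0]))]
        outputs.append(out)
        weights.append(row)
    return outputs, weights


# Toy input: three 2-dimensional feature vectors (hypothetical values).
# Assumption: identity projections, so queries = keys = values = x.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(x, x, x)
```

Each row of `weights` sums to one and indicates how strongly one input position attends to every other position. Multihead attention would run several such computations with different learned projections and concatenate the results.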

Channel-Spatial Attention is a widely used attention mechanism for image-processing tasks where it is essential to prioritize key regions of the human body or specific visual features. This type of attention is very effective in models based on convolutional neural networks (CNNs). It is applied in object localization and detection [22], medical image classification [12,23], and accurate segmentation of organs or lesions [24,25]. It enables the dynamic capture of spatial and channel-level information, highlighting the most informative spatial regions and channels [26].

Fuzzy Attention modules are explicitly used in scenarios where uncertainty, ambiguity, and imprecise boundaries are inherent in the data. They integrate fuzzy logic directly into the attention process, which is particularly useful in image segmentation tasks with difficult-to-define boundaries [27,28] or in the classification of diseases with gradual transitions between classes [29]. These modules generate attention maps or redistribute weights using fuzzy membership functions, allowing the network to handle the uncertainty and complexity of regions of interest more effectively.
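As a rough illustration of how a membership function can produce attention weights, the sketch below rescales each feature by its Gaussian membership degree. This is only one possible form, assumed here for illustration; the reviewed studies use varied designs, and the function names, `center`, `sigma`, and the sample intensities are all hypothetical.

```python
import math


def gaussian_membership(x, center, sigma):
    """Degree in [0, 1] to which x belongs to a fuzzy set centred at `center`."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def fuzzy_attention(features, center, sigma):
    """Reweight each feature by its fuzzy membership degree.

    Returns (attended, weights), where weights acts as a soft attention map.
    """
    weights = [gaussian_membership(f, center, sigma) for f in features]
    attended = [w * f for w, f in zip(weights, features)]
    return attended, weights


# Hypothetical normalized pixel intensities near an uncertain boundary.
pixels = [0.1, 0.45, 0.5, 0.9]
attended, weights = fuzzy_attention(pixels, center=0.5, sigma=0.2)
```

Values close to `center` receive weights near one and distant values are softly suppressed rather than cut off at a hard threshold. In a trained model, `center` and `sigma` would typically be learned parameters rather than fixed constants.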

Attention mechanisms have become fundamental in artificial neural network models used for medical diagnosis support as they refine features and spatial/channel information, contributing to better data representation and, consequently, superior performance in supporting the diagnosis of different diseases.

2.3. Fuzzy Logic

Fuzzy logic is a fundamental computational technology in medical diagnosis designed to handle the uncertainty and ambiguity inherent in clinical data [5]. Unlike traditional Boolean logic (true/false), fuzzy logic allows for the representation of qualitative values and imprecise or “fuzzy” concepts, such as “mild pain”, “high blood pressure”, or “intermediate risk of disease”, through degrees of membership on a continuous interval. This feature is crucial because diseases and their symptoms often do not fit rigid thresholds but instead exhibit continuous variations.

The importance of fuzzy logic for medical diagnosis support lies in its ability to provide a flexible framework for categorizing risk levels and enhancing medical decision making, thereby increasing the interpretability of models by guiding their behavior and functionality through a set of rules determined by medical experts. Additionally, it facilitates the identification of fuzzy patterns and enhances the segmentation or grouping of features in medical images, making it particularly useful in diagnostic cases with poorly defined boundaries. The most relevant key concepts of fuzzy logic [6] used in applications for medical diagnosis support include fuzzy sets, membership functions, fuzzy rules, fuzzy inference, defuzzification, and linguistic variables.

A fuzzy set extends the concept of a classical set by allowing its elements to have a degree of membership, represented by a continuous value between 0 and 1, rather than the strict membership of 0 or 1. This mathematical framework is crucial for modeling uncertain, approximate, or imprecise data, allowing for a more flexible representation of reality in hybrid systems. Fuzzy sets are classified into different types based on their complexity. Type 1 sets are the most common, while Type 2 sets incorporate a greater range of uncertainty by defining an interval for degrees of membership, which is helpful in highly ambiguous environments.
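The difference between the two types can be sketched in a few lines. The example below assumes one common construction of an interval Type 2 set, a Gaussian with a fixed mean and an uncertain standard deviation; the function names and numeric values are hypothetical.

```python
import math


def gaussian(x, mean, sigma):
    """Type 1 membership: a single degree in [0, 1]."""
    return math.exp(-((x - mean) ** 2) / (2.0 * sigma ** 2))


def interval_type2_membership(x, mean, sigma_low, sigma_high):
    """Interval Type 2 membership: a (lower, upper) pair of degrees.

    Assumes sigma_low <= sigma_high, so the narrower Gaussian gives the
    lower bound and the wider one gives the upper bound.
    """
    lower = gaussian(x, mean, sigma_low)
    upper = gaussian(x, mean, sigma_high)
    return lower, upper


# Hypothetical example: membership of a reading of 38.0 in a set centred at 37.0.
type1_degree = gaussian(38.0, 37.0, 0.75)
lower, upper = interval_type2_membership(38.0, 37.0, 0.5, 1.0)
```

A Type 1 set returns one number, whereas the interval Type 2 set returns a band whose width expresses how uncertain the membership degree itself is.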

Membership functions are mathematical definitions that assign a membership value to each element in a fuzzy set. These functions, such as triangular, trapezoidal, or Gaussian, play a key role in modeling smooth transitions between states and are fundamental in the configuration of the fuzzy system.
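The three shapes named above can be written down directly. The following sketch uses their standard textbook definitions; the parameter names are illustrative choices.

```python
import math


def triangular(x, a, b, c):
    """Rises linearly from a to a peak of 1 at b, then falls to 0 at c (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def trapezoidal(x, a, b, c, d):
    """Rises from a to b, stays at 1 between b and c, falls to 0 at d (a < b <= c < d)."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def gaussian(x, mean, sigma):
    """Smooth bell-shaped membership that equals 1 at the mean."""
    return math.exp(-((x - mean) ** 2) / (2.0 * sigma ** 2))
```

For instance, `triangular(2.5, 0, 5, 10)` evaluates to 0.5 and `trapezoidal(3, 0, 2, 4, 6)` evaluates to 1.0. Triangular and trapezoidal functions are cheap and easy to read, while the Gaussian is smooth everywhere, which is convenient when the parameters are tuned by gradient-based training.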

The fuzzy rules, usually expressed in the form “If–Then”, structure the behavior of the fuzzy system using qualitative logic. These rules enable the translation of heuristic knowledge or human experience into a formal structure, facilitating its integration with numerical models, such as neural networks.

Fuzzy inference is a process that involves evaluating fuzzy rules and combining their results to infer conclusions. Methods such as the Mamdani or the Sugeno schemes are commonly used in fuzzy inference, providing a bridge between fuzzy inputs and controlled outputs.
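A minimal sketch of zero-order Sugeno inference is shown below. The two input variables, the breakpoints, the rules, and the consequent values are all hypothetical and chosen only to illustrate the mechanics; they are not clinical guidance and are not taken from any reviewed study.

```python
def rising(x, low, high):
    """Membership that ramps linearly from 0 at `low` to 1 at `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def sugeno_risk(temperature_c, heart_rate_bpm):
    """Zero-order Sugeno inference with four If-Then rules."""
    # Fuzzification: degree to which each input is "high" or "normal".
    temp_high = rising(temperature_c, 37.0, 39.0)
    hr_high = rising(heart_rate_bpm, 80.0, 120.0)
    temp_normal = 1.0 - temp_high
    hr_normal = 1.0 - hr_high

    # Each rule is (firing strength, constant consequent); AND is modelled with min.
    rules = [
        (min(temp_high, hr_high), 0.9),      # IF temp high AND rate high THEN risk 0.9
        (min(temp_high, hr_normal), 0.6),    # IF temp high AND rate normal THEN risk 0.6
        (min(temp_normal, hr_high), 0.4),    # IF temp normal AND rate high THEN risk 0.4
        (min(temp_normal, hr_normal), 0.1),  # IF temp normal AND rate normal THEN risk 0.1
    ]
    # Sugeno output: firing-strength-weighted average of the rule consequents.
    total = sum(strength for strength, _ in rules)
    return sum(strength * value for strength, value in rules) / total
```

Because the "high" and "normal" degrees are complements, at least one rule always fires with strength 0.5 or more, so the denominator is never zero. A Mamdani system would instead produce a fuzzy output set per rule and require a separate defuzzification step.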

Defuzzification is the process of converting fuzzy results into precise values, enabling them to be interpreted in practical applications. Techniques such as the centroid of area or the weighted average are used to ensure the consistency and usefulness of outputs in hybrid systems.
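The centroid technique can be sketched for a sampled output universe as follows. The sampling grid and the aggregated membership values are hypothetical; the formula itself is the usual discrete centroid.

```python
def centroid(xs, memberships):
    """Discrete centroid: membership-weighted mean of the sampled output values."""
    total = sum(memberships)
    if total == 0:
        raise ValueError("aggregated membership is zero everywhere")
    return sum(x * m for x, m in zip(xs, memberships)) / total


# Output universe sampled at 0.0, 0.1, ..., 1.0 (hypothetical risk scale).
xs = [i / 10 for i in range(11)]
# Hypothetical aggregated fuzzy output: a symmetric triangle peaked at 0.5.
memberships = [max(0.0, 1.0 - abs(x - 0.5) / 0.5) for x in xs]
crisp_value = centroid(xs, memberships)
```

For this symmetric set the crisp value lies at the peak, 0.5; an asymmetric set would pull the result toward the side that carries more membership mass.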

Linguistic variables allow human knowledge to be represented using linguistic terms such as “low”, “high”, or “moderate”. Each linguistic term is associated with a membership function, facilitating the translation of qualitative concepts into quantitative representations suitable for computational processing.

The versatility of fuzzy logic enables its integration into different phases of a neural network model, ranging from data pre-processing and direct embedding in attention mechanisms to modifying the architecture of neural networks or refining results in post-processing stages. This ability to model the complexity and imprecision inherent in medical data makes it a powerful tool for strengthening and improving the accuracy of medical diagnostic support systems.

3. Methodology

We conducted this systematic review using the PRISMA 2020 methodology [7]. The PRISMA methodology is a guide designed to improve transparency, quality, and reproducibility in the preparation of systematic reviews. It aims to ensure that the publication selection, data extraction, and summarization processes are well documented, enabling other researchers to evaluate the results critically.

For our systematic review, we developed a protocol that was validated and approved by all authors before initiating the systematic review. The protocol was also registered with the university research ethics committee of the Universidad Juárez Autónoma de Tabasco (protocol code UJAT-CIEI-2025-062). The systematic review protocol follows the PRISMA-P 2015 guidelines [30].

Furthermore, the systematic review protocol was prospectively registered in the International Prospective Register of Systematic Reviews (PROSPERO) to ensure transparency and avoid duplication of research efforts. The registration includes a detailed description of the objectives, eligibility criteria, and methodological approach of the study. The corresponding registration number is CRD420251140977, which has been included for reference in accordance with best practices for systematic reviews.

3.1. Eligibility Criteria

In this systematic review we established the eligibility criteria in consideration of the study’s objectives. We included publications from 2020 to 2025, written in English and corresponding to scientific papers published in academic journals, conference proceedings, or technical reports from research institutes. We considered only those publications that proposed the integration of the following computational approaches: artificial neural networks, attention mechanisms, and fuzzy logic, provided they were applied to support the medical diagnosis of human diseases.

In contrast, we excluded publications that, although they addressed the integration of the previously mentioned computational approaches, were not oriented towards medical diagnostic support, as well as those that focused on medical diagnostic support but did not integrate the three computational approaches mentioned. We also excluded studies that did not present methods, evaluation metrics, or quantitative results, as well as publications with incomplete or unverifiable information (e.g., no year of publication or no clear source). Likewise, we excluded literature review studies; however, we considered those systematic or scoping reviews that we identified during the search process as contextual references and comparison material without integrating them into the primary analysis of this systematic review.

3.2. Sources of Information

For our systematic review, we searched for publications in five recognized scientific electronic databases: IEEE Xplore, ScienceDirect, Web of Science, SpringerLink, and ACM Digital Library. We selected these sources for their relevance, subject coverage, and editorial quality in the fields of engineering, computer science, artificial intelligence, and its applications in medical diagnostics. These databases compile high-impact publications from scientific journals, international conferences, and peer-reviewed technical reports, providing access to up-to-date, rigorous, and relevant research. In addition, their broad multidisciplinary scope allowed us to identify studies that integrate advanced computational approaches (such as artificial neural networks, attention mechanisms, and fuzzy logic) applied in clinical contexts. In this way, we ensured an exhaustive and representative search of the relevant literature for the objectives of our systematic review. The process of searching and retrieving records from the five scientific electronic databases was automated using each publisher’s advanced search engines.

3.3. Search Strategy

Our search strategy employed keywords and terms related to artificial neural networks, attention mechanisms, and fuzzy logic, as well as the medical diagnostic process. We identified 6 terms and 16 synonyms, which were used to generate the search strings, using the Boolean operator “OR” to associate synonymous words and “AND” to connect the main parts. Two researchers prepared the search strings, and a third researcher validated them. We searched the scientific electronic databases on 28 March 2025. The search strings are shown in Table 1. Some of the digital libraries allow using the asterisk (*) as a wildcard to search for words that have spelling variations or contain a specified pattern of characters. We used the asterisk (*) to find terms with the same beginning but different endings.
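The OR/AND structure described above can be sketched as a small helper. The terms below are hypothetical placeholders chosen for illustration and are not the strings reported in Table 1; support for wildcards inside quoted phrases also varies between databases, so the output would need adapting per platform.

```python
def build_search_string(concept_groups):
    """Join synonyms with OR inside each group, then join the groups with AND."""
    clauses = []
    for synonyms in concept_groups:
        quoted = ['"{}"'.format(term) for term in synonyms]
        clauses.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(clauses)


# Hypothetical concept groups; each inner list holds one term and its synonyms.
groups = [
    ["neural network*", "deep learning"],
    ["attention mechanism*"],
    ["fuzzy logic", "fuzzy system*"],
    ["medical diagnos*"],
]
query = build_search_string(groups)
```

With these placeholder groups, `query` is `("neural network*" OR "deep learning") AND ("attention mechanism*") AND ("fuzzy logic" OR "fuzzy system*") AND ("medical diagnos*")`.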

3.4. Selection Process

For the management and organization of information during the selection process, we used office software tools and a bibliographic reference manager. Specifically, we used spreadsheet software (Microsoft Excel) to develop and apply the forms based on the inclusion and exclusion criteria, as well as the quality assessment forms and the template for the final data collection of the selected publications. This tool allowed us to structure, classify, and filter the records systematically. We also used a bibliographic reference manager (Zotero) to import, organize, and purge the publications retrieved from the databases, facilitating both the elimination of duplicates and the integration of the abstracts for initial review. For drafting the systematic review manuscript and coordinating collaborative tasks among authors, we used a word processor (Microsoft Word) that enabled us to document the methodological process, consolidate results, and track individual contributions at each phase of the systematic review. The combination of these tools contributed to maintaining the traceability, transparency, and methodological rigor of the selection process.

Once we identified the total number of publications from each of the selected databases, we carried out a systematic and structured process of study inclusion and exclusion to ensure the quality, relevance, and coherence of the works included in this systematic review. This process was conducted in six stages:

Search by source: For each database, we applied the previously defined search string, adapting it according to the particularities of each platform’s search engine. Some inclusion criteria, such as publication period (2020–2025) and language (English), were incorporated directly into the search strings to refine the results from the beginning.
Title and abstract retrieval: We used a bibliographic reference manager to retrieve the list of publications obtained from each source, including abstracts, when available. This information enabled us to perform an efficient initial filtering of the publications.
Removal of duplicates: With the help of the bibliographic reference manager, we identified and removed duplicate publications, ensuring that each publication considered in the following stages was unique.
Review of titles and abstracts: We subsequently reviewed and analyzed the titles and abstracts of each publication to assess their correspondence with the inclusion and exclusion criteria defined in the protocol of our systematic review. At this stage, we discarded all those publications that did not achieve the objectives of our systematic review.
Advanced review of the full text: In cases where the title and abstract did not provide sufficient information to determine the publication’s relevance, we accessed the full text to further evaluate its potential inclusion.
Quality assessment: To minimize the risk of bias resulting from the methodological quality of the included publications, we conducted a quality assessment for each preselected publication. This analysis allowed us to ensure that the final publications had the level of scientific rigor necessary to answer our research questions.

This entire process was carried out independently by two of the authors to minimize errors and biases. In cases where disagreements arose about the inclusion or exclusion of a publication, a third author, acting as an external reviewer, was consulted to make the final decision. In addition, a detailed record was kept of the reasons for excluding each discarded publication, which is reported in the form of a PRISMA flowchart as part of this review.
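The duplicate-removal stage can be approximated programmatically. The sketch below is an assumption about how such matching might work and does not reproduce the reference manager's actual algorithm: records are hypothetical dictionaries with optional `doi` and `title` fields, and two records count as duplicates when their normalized DOI matches or, when no DOI is present, their normalized title matches.

```python
import re


def record_key(record):
    """Build a comparison key: the lowercased DOI if present, else a normalized title."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return "doi:" + doi
    # Lowercase the title and collapse punctuation and whitespace to single spaces.
    title = re.sub(r"[^a-z0-9]+", " ", record.get("title", "").lower()).strip()
    return "title:" + title


def remove_duplicates(records):
    """Keep the first record seen for each key, preserving the original order."""
    seen = set()
    unique = []
    for record in records:
        key = record_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

A limitation of this simple keying is that the same publication exported once with a DOI and once without it would not be matched, so a manual check of the remaining records is still advisable.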

3.5. Risk of Bias Assessment (Quality Assessment)

Our systematic review had to assess the quality of the publications in order to identify those that best answered the research questions. For this reason, we applied a risk of bias assessment (other authors refer to this process as “quality assessment”). For this process, we defined 11 questions to assess the publications; each question could receive one of three possible answers, each with its own score, according to the following criteria:

The question is answered completely = 1.
The question is answered in a general way = 0.5.
The question is not answered = 0.

The sum of the answer scores ranged from 1 to 11, and only those publications that scored seven or more were selected for the next phase of the systematic review. This assessment was conducted by two of the authors separately and reviewed by a third researcher. The questions were:

Q-01: Is the information on the source of the publication clear?
Q-02: Does the publication have the basic sections of a scientific report?
Q-03: Do they define the problem they address?
Q-04: Do they describe what the input (source) data are?
Q-05: Do they describe what the output data (prediction or classification) is?
Q-06: Are the artificial neural networks, attention mechanisms, and fuzzy logic algorithms mentioned?
Q-07: Is the integration between artificial neural network algorithms, attention mechanisms, and fuzzy logic clearly described?
Q-08: Do they use metrics to evaluate the results?
Q-09: Is there any mention of the impact of the integration of the algorithms on the results?
Q-10: Do they clearly present results?
Q-11: Does the discussion section consider the implications of the proposal and compare their results with similar ones?

We developed the quality assessment based on the criteria used by the Center for Reviews and Dissemination at the University of York, published in [30].
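The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the tooling used in the review: the names `quality_score` and `passes_quality` are hypothetical, and the only values taken from the text are the three answer scores (1, 0.5, 0), the 11 questions, and the cut-off of seven.

```python
from typing import Sequence

ALLOWED_SCORES = {0.0, 0.5, 1.0}  # not answered, answered in general, answered completely
NUM_QUESTIONS = 11                # Q-01 to Q-11
THRESHOLD = 7.0                   # minimum total score to proceed to the next phase


def quality_score(answers: Sequence[float]) -> float:
    """Sum the per-question scores of one publication after validating them."""
    if len(answers) != NUM_QUESTIONS:
        raise ValueError(f"expected {NUM_QUESTIONS} answers, got {len(answers)}")
    if any(a not in ALLOWED_SCORES for a in answers):
        raise ValueError("each answer must be scored 0, 0.5, or 1")
    return float(sum(answers))


def passes_quality(answers: Sequence[float]) -> bool:
    """Return True when the publication reaches the cut-off score."""
    return quality_score(answers) >= THRESHOLD


# Hypothetical publication: six complete, two general, three unanswered.
example = [1, 1, 1, 1, 1, 1, 0.5, 0.5, 0, 0, 0]
example_total = quality_score(example)
```

With these illustrative answers the total is exactly seven, so the publication would sit on the inclusion boundary.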

3.6. Data Extraction and Synthesis of Results

Once we obtained the definitive list of publications to be included in the systematic review, we proceeded to collate the full texts of each study and begin the data extraction process. To achieve this, we designed a specific form comprising 30 items, which we had previously defined based on the objectives of the systematic review and adhering to good methodological practices. This process was carried out independently and in parallel by two authors to minimize the risk of bias and reduce possible errors during data collection. We organized the data extracted from each publication into five general categories; this approach allowed us to conduct a structured and comparative analysis of the selected publications. The categories are as follows:

General information about the publication.
Information about the source of the publication, including publication type, journal name, and date.
Details about the datasets used, including general characteristics, pre-processing techniques, and descriptions of the target data.
Technical information on the algorithms and approaches applied, specifically artificial neural networks, attention mechanisms, and fuzzy logic.
Details on the integration between the approaches, including how they were combined, the impact of such integration on the results, the evaluation metrics used, and possible areas of opportunity reported by the authors.

To present the findings of our systematic review, we conducted a narrative synthesis, which is presented in text, graphs, and tables to summarize and explain the main characteristics and most relevant results of the included publications. This synthesis enabled us to qualitatively explore the relationships between the approaches proposed in the analyzed publications, with a special focus on how artificial neural networks, attention mechanisms, and fuzzy logic were integrated, as well as the impact of integrating these three approaches on supporting medical disease diagnosis.

We structured the narrative synthesis into sections aligned with each of the research questions formulated for our systematic review. In this way, readers will be able to clearly and directly identify the answers provided by the current literature concerning each question, thereby facilitating an understanding of the contributions, gaps, and opportunities for research in this field.

3.7. Information Bias

In our systematic review, we did not conduct a meta-analysis; therefore, we employed narrative synthesis to mitigate the risk of bias associated with missing results in the publications. During the extraction and analysis of information from each publication, we paid particular attention to the absence of relevant outcomes, such as experimental comparisons, ablation studies, or performance analyses. We explicitly documented within appendix tables the cases in which publications did not report any of these key elements, pointing out the specific reference of the corresponding publication. The appendix tables allow readers to identify publications with incomplete results and to assess the degree of reliability or limitations associated with those publications. This strategy enabled us to maintain transparency and methodological rigor despite not including an additional evaluation of the results that were missing from the publications.

4. Results

In this section, we describe the results obtained and the answers to the research questions of this systematic review.

4.1. Study Selection

We identified a total of 269 publications as a result of our search in the five selected electronic scientific databases (IEEE Xplore, ScienceDirect, Web of Science, SpringerLink, and ACM Digital Library). In the first stage, we eliminated 28 duplicated publications, resulting in 241 records for the initial review. Subsequently, we reviewed these 241 publications based on their titles and abstracts to apply the previously defined eligibility criteria. During this process, we excluded 209 publications that did not meet the established criteria, primarily due to a lack of integration between the required computational approaches, failure to apply the methods to medical diagnosis, or insufficient presentation of clear methodological information. Finally, 32 publications met all the inclusion criteria and were selected to form the set of studies included in our systematic review. This process was applied separately by two of the authors and reviewed by a third author. Figure 1 illustrates the methodological approach according to the PRISMA guidelines. The search identified publications related to neural networks, attention mechanisms, and fuzzy logic for medical diagnostic support in scientific digital databases.
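The record counts reported above can be checked with simple arithmetic. The sketch below is only a consistency check on the numbers given in the text; the function name `prisma_flow` is hypothetical.

```python
def prisma_flow(identified: int, duplicates: int, excluded: int) -> dict:
    """Return the record counts at each stage of a simple selection flow."""
    screened = identified - duplicates  # records left after removing duplicates
    included = screened - excluded      # records left after applying the eligibility criteria
    if screened < 0 or included < 0:
        raise ValueError("counts are inconsistent")
    return {"identified": identified, "screened": screened, "included": included}


# Counts as reported in the text.
flow = prisma_flow(identified=269, duplicates=28, excluded=209)
```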

4.2. Study Characteristics

In Appendix A, we list the publications and include the most important data related to the research questions, which are also considered significant for this systematic review.

4.3. Risk of Bias Within Studies

Appendix B contains the results of the risk of bias assessment (quality assessment) for the publications.

4.4. Synthesis of Results

Searching several databases enabled us to analyze the general concepts related to neural networks, attention mechanisms, and fuzzy logic. Figure 2 illustrates the bibliographic information network of keywords in the titles and abstracts of the analyzed articles, presented by total count.

The bibliographic information network map allowed us to identify the thematic structure of the literature on artificial neural networks, attention mechanisms, and fuzzy logic in medical diagnosis support. The map generated highlights the centrality of specific concepts and their interrelation in three main areas: clinical application, methodological development, and integrative approaches.

First, the size of the nodes reflects the frequency of occurrence of the terms, with model, image, feature, and network standing out as recurring and cross-cutting concepts in the publications. In turn, more specific terms such as pneumonia, airway segmentation, or heart sound signal occur less frequently but represent specific areas of application in particular clinical contexts.

Color grouping enables the observation of different thematic clusters. The red cluster focuses on terms related to the clinical field and the direct application of these techniques, including image, segmentation, accuracy, treatment, and fuzzy logic, highlighting the relevance of imaging and fuzzy logic in the diagnostic process. The green cluster, meanwhile, brings together terms associated with technical and methodological fundamentals, such as convolutional neural network, deep learning, feature, and interpretability, reflecting the emphasis on architecture construction and feature extraction. The blue cluster is related to performance and reliability assessment, including terms such as performance, uncertainty, robustness, and diagnosis. Finally, the yellow cluster highlights the relevance of attention mechanisms, which are connected to model, dataset, and feature, playing an articulating role between the clinical and technical dimensions.

The pattern of connections between nodes reveals that model is the most cross-cutting concept, as it is linked to all clusters. Likewise, attention mechanisms show strong connectivity with both methodological and applied terms, confirming their role as an integrative strategy that simultaneously enhances the predictive capacity and clinical applicability of models. Complementarily, fuzzy logic is mainly associated with diagnosis, accuracy, and treatment, reflecting its contribution to managing the uncertainty inherent in medical diagnosis and strengthening clinical decision making.

Among the main tasks addressed in the publications of our systematic review, medical image segmentation and medical image classification are by far the most researched areas of application, accounting for almost 85% of all publications, as shown in Figure 3.

Visual data are the primary focus of publications. Medical images (MRI, CT, X-rays, ultrasounds) constitute the predominant type of data identified in our systematic review, as shown in Figure 4.

Table 2 lists the publications relevant to each type of disease or medical condition. This table shows the frequency with which each disease or condition is targeted by the models described in the publications included in our systematic review. The data was organized into disease categories for ease of understanding.

Below are the answers to the five research questions.

4.4.1. RQ1: What Algorithms or Techniques of Artificial Neural Networks, Attention Mechanisms, and Fuzzy Logic Were Selected to Be Integrated?

We analyzed and identified a wide variety of artificial neural network (ANN) algorithms in the 32 publications included in our systematic review. It is worth mentioning that publications commonly employ more than one type of neural network. The most frequently integrated were convolutional neural networks (CNNs), which are commonly used in publications due to their ability to extract spatial features in medical images, as well as multilayer perceptrons (MLPs) due to their simplicity, which makes them suitable for combination with other types of neural networks. In addition, several publications incorporated Transformers, generally as part of encoder–decoder architectures or Multihead Attention mechanisms, taking advantage of their effectiveness in modeling global dependencies. Regarding fuzzy logic, publications have implemented fuzzy modules in various parts of the model, utilizing techniques such as fuzzy layers and integrated fuzzy networks to convert feature maps into fuzzy representations.

Table 3 presents the artificial neural networks (ANNs) reported in publications that integrate various attention techniques and fuzzy logic. Figure 5 shows the types, combinations, and number of neural networks identified.

The attention mechanisms were varied and often tailored to the architecture and type of data used in each publication. Table 4 presents the attention mechanisms reported in publications that were integrated with various neural networks and fuzzy logic. Figure 6 shows the types and number of attention mechanisms identified.

Fuzzy logic was implemented in a wide variety of ways, from specific modules to more complex architectures. We identified different approaches, which are summarized in Table 5.

4.4.2. RQ2: How Was the Integration Between Algorithms or Techniques of Artificial Neural Networks, Attention Mechanisms, and Fuzzy Logic Performed?

The integration of algorithms or techniques from artificial neural networks, attention mechanisms, and fuzzy logic is carried out in various ways, encompassing everything from data pre-processing to result post-processing, including modifications to the network architecture and loss function.

(A): Integration of fuzzy logic as pre-processing or initial data transformation

A common method of integration is to use fuzzy logic in the initial stages of the workflow, to transform or prepare the data before neural networks and attention mechanisms process it.

In several cases, fuzzy logic is employed for feature discretization or initial image segmentation, which helps to eliminate redundant information and focus on the area of interest. For example, Ref. [41] applies fuzzy clustering to images and combines fuzzy sets and rough sets for feature discretization as a pre-processing step. Similarly, Ref. [25] performs feature discretization based on Rough Fuzzy Sets and uses Fuzzy C-Means Clustering to calculate pixel membership before the images are input into the attention network. Ref. [31] also uses Fuzzy C-Means (Dictionary Learning-based Fuzzy C-Means) Clustering in the pre-processing stage for the segmentation of the region of interest (ROI) in skin lesion images.

Fuzzy logic is also used to improve data quality or representation. Ref. [39] utilizes a Multioutput Takagi–Sugeno–Kang Fuzzy System to handle uncertainty and noise, thereby improving the data before applying a Transformer-based model and an MLP. Ref. [44] uses a Fuzzy Color Method for X-ray enhancement and noise reduction in the pre-processing stage. Ref. [33] utilizes a Type-2 Fuzzy Subsystem to transform raw data into fuzzy samples, thereby providing a more accurate representation of data in its neural network model with attention. Additionally, Ref. [17] employs a Fuzzy Entropy Algorithm in the pre-processing stage to quantify signal uncertainty and obtain more information for the classification of Parkinson’s disease.
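To make the idea of pixel membership concrete, the following sketch computes the standard Fuzzy C-Means membership update for one-dimensional intensities and fixed cluster centers. It is an illustration under stated assumptions, not the method of any cited publication: the function name `fcm_memberships`, the fuzzifier value m = 2, and the sample intensities are all hypothetical, and the variants used in [25,31,41] are not reproduced here.

```python
from typing import List, Sequence


def fcm_memberships(values: Sequence[float], centers: Sequence[float],
                    m: float = 2.0, eps: float = 1e-12) -> List[List[float]]:
    """Return, for each value, its membership degree in every cluster.

    Uses the standard update u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1)),
    where d_j is the distance from the value to center j.
    """
    if m <= 1.0:
        raise ValueError("the fuzzifier m must be greater than 1")
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in values:
        dists = [abs(x - c) for c in centers]
        if any(d < eps for d in dists):
            # The value coincides with one or more centers: split full membership among them.
            hits = [1.0 if d < eps else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        memberships.append(
            [1.0 / sum((d_j / d_k) ** exponent for d_k in dists) for d_j in dists]
        )
    return memberships


# Hypothetical normalized pixel intensities and two cluster centers (dark, bright).
u = fcm_memberships([0.0, 0.25, 0.5, 1.0], centers=[0.0, 1.0])
```

Each row of `u` sums to one, so a pixel halfway between the two centers belongs equally to both clusters instead of being forced into one.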

(B): Integration of Fuzzy Logic directly into Attention Mechanisms

One of the most sophisticated integrations is when fuzzy logic is incorporated into the design and operation of attention mechanisms, allowing them to handle uncertainty and focus information more effectively.

In several publications, fuzzy logic guides or improves the generation of attention maps or feature weighting. Ref. [27] uses a Fuzzy Attention-based Module that employs trainable Gaussian membership functions to focus the network on relevant regions of the encoder and decoder outputs, generating an attention map with fuzzy features. Similarly, in [28], a Fuzzy Attention Layer is applied to each residual connection in the encoder–decoder architecture, taking the feature representations and feeding them to a Fuzzy Attention Gate to generate an attention map using Gaussian membership functions to learn the importance of the representations. Ref. [37] also incorporates Fuzzy Attention Modules (Fuzzy Channel Attention and Fuzzy Spatial Attention) in the encoder and decoder to control the importance of pixel values using fuzzy membership degrees.

Fuzzy logic is also used to fuse features in the context of attention. Ref. [13] introduces Fuzzy-Guided Cross-Attention for the fusion of fuzzy features and different features extracted by a CNN. Ref. [29] develops Fuzzy-Enhanced Holistic Attention, which applies fuzzy membership functions to redistribute feature weights in the channel and spatial domains, thereby mitigating uncertainty in visual similarity and ambiguous boundaries.

In addition, there are integrations where neuro-fuzzy systems influence attention. Ref. [36] uses an Adaptive Neuro-Fuzzy Inference System (ANFIS) directly within the attention mechanism to model concepts of uncertainty in the data and generate fuzzy values. Ref. [34] incorporates a Feature Selective Enhancement (FSE) module that integrates fuzzy logic techniques for Channel-Wise Feature Recalibration, improving the model’s focus on relevant morphological patterns. Ref. [11] utilizes a fuzzy layer within a Fuzzy Joint Attention Module to convert feature maps into fuzzy maps, employing membership functions to filter and assign a fuzzy degree to each feature.
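The role of Gaussian membership functions in fuzzy attention can be illustrated with a small sketch. It assumes one membership function per channel, with center `mu` and width `sigma` standing in for trainable parameters, and it uses the membership degree directly as the attention weight. These choices, and the names `gaussian_membership` and `fuzzy_attention`, are assumptions made for illustration; the modules in the cited publications are more elaborate.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_membership(x: float, mu: float, sigma: float) -> float:
    """Degree in (0, 1] to which x belongs to a fuzzy set centered at mu."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def fuzzy_attention(channels: Sequence[Sequence[float]],
                    mus: Sequence[float],
                    sigmas: Sequence[float]) -> Tuple[List[List[float]], List[List[float]]]:
    """Re-weight every feature by its membership degree.

    channels: one list of feature values per channel.
    mus, sigmas: one (assumed learnable) parameter pair per channel.
    Returns the attention map and the re-weighted features.
    """
    attention, weighted = [], []
    for values, mu, sigma in zip(channels, mus, sigmas):
        degrees = [gaussian_membership(v, mu, sigma) for v in values]
        attention.append(degrees)
        weighted.append([v * d for v, d in zip(values, degrees)])
    return attention, weighted


# Hypothetical single-channel feature vector.
att, out = fuzzy_attention([[1.0, 3.0]], mus=[1.0], sigmas=[1.0])
```

Features close to the center of the membership function keep most of their magnitude, while distant ones are attenuated smoothly rather than cut off.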

(C): Integration of Fuzzy Logic into the Architecture of Artificial Neural Networks

In this type of integration, fuzzy logic is directly added to the components of the neural network, such as layers, connections, or activation functions, thereby modifying the model’s behavior.

Fuzzy logic can influence the flow of information between the components of the neural network model. Ref. [9] placed a Wide-Wise Fuzziness Module between the encoder and decoder of a Convolution-Transformer network, using a Mask Matrix with Fuzziness Elements for the estimation of fuzzy areas. Ref. [42] placed a Fuzzy Learning Module between the encoder and decoder of a convolutional neural network (CNN) architecture, converting the feature map extracted by the encoder into a fuzzy value using Gaussian membership functions.

The internal connections of artificial neural network models can also be “fuzzy”. In [10], fuzzy skip connections in a CNN-based encoder–decoder architecture use a fuzzy operation to formulate features, suppressing features in irrelevant background regions and enhancing Target Features. In [35], features extracted from two different CNN-based neural network models are combined to make diagnostic inferences; integration is performed by combining features from different branches (one with CNN and Fuzzy C-Means and another with CNN and Spatial-Channel Attention) for final classification.

Fuzzy logic can be an integral part of the layers or activation functions. Ref. [24] incorporates fuzzy convolutional modules (Spatial Fuzzy Convolution and Channel Fuzzy Convolution) directly into the encoder and decoder layers of a CNN-based network, allowing the network to learn fuzzy information from the spatial and channel dimensions. Ref. [43] employs a fuzzy logic approach to the activation function in each residual connection of a CNN-based encoder–decoder architecture, enabling smoother and more continuous outputs and increasing sensitivity to small changes in image data.

Hybrid neuro-fuzzy blocks can also be created. Ref. [40] uses a Fuzzy Neural Information Processing Block that merges a CNN and a fuzzy network to calculate the uncertainty of the input pixels. Ref. [22] employs a CNN based on fuzzy pooling for feature extraction. Fuzzy pooling uses membership functions to perform a weighted average of the data, preserving more positional information.
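The weighted-average idea behind fuzzy pooling can be sketched as follows. This is a deliberately simplified version: it assumes non-overlapping 2 × 2 windows and a single Gaussian membership function centered at the window mean, whereas the membership functions actually used in [22] are not specified here. The names `fuzzy_pool_window` and `fuzzy_pool_2x2` are hypothetical.

```python
import math
from typing import List, Sequence


def fuzzy_pool_window(window: Sequence[float], sigma: float = 1.0) -> float:
    """Membership-weighted average of the values in one pooling window."""
    center = sum(window) / len(window)
    weights = [math.exp(-((v - center) ** 2) / (2.0 * sigma ** 2)) for v in window]
    total = sum(weights)
    if total == 0.0:
        # All weights underflowed to zero: fall back to the plain average.
        return center
    return sum(w * v for w, v in zip(weights, window)) / total


def fuzzy_pool_2x2(image: Sequence[Sequence[float]], sigma: float = 1.0) -> List[List[float]]:
    """Apply the window rule over non-overlapping 2 x 2 blocks of a 2-D feature map."""
    rows, cols = len(image), len(image[0])
    pooled = []
    for r in range(0, rows - 1, 2):
        pooled.append([
            fuzzy_pool_window(
                [image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1]],
                sigma,
            )
            for c in range(0, cols - 1, 2)
        ])
    return pooled


# Hypothetical 2 x 2 feature map with one outlying activation.
pooled_outlier = fuzzy_pool_2x2([[0.0, 0.0], [0.0, 4.0]])
```

Unlike max pooling, which would return the outlier, or average pooling, which would treat all four values equally, this rule lets every value contribute in proportion to its membership degree.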

(D): Integration of Fuzzy Logic in Post-processing Stages or Result Fusion

Fuzzy logic can be applied after neural networks have generated their inferences to refine the results or combine them intelligently. Ref. [15] proposes a Fuzzy Selector to process the results of the segmentation process further, leveraging local spatial information to mitigate noise interference.

Regarding the fusion of outputs from multiple models, Ref. [16] merges the malignancy scores of three CNN-based models and one Transformer-based model using the Choquet integral (CI), which considers the relevance of and interactions between the scores. Similarly, Ref. [19] employs a Fuzzy Rank-Based Ensemble Approach to reduce the dispersion of individual predictions and improve classification accuracy, where the decision scores of three CNN models are mapped into nonlinear functions to construct fuzzy ranks.
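As an illustration of score fusion with the Choquet integral, the sketch below implements the discrete form: scores are sorted in descending order, and each drop between consecutive scores is weighted by the fuzzy measure of the models that reach at least that score. The model names and measure values are hypothetical and are not taken from [16]; the fuzzy measure is supplied as a dictionary and is assumed to be defined for every coalition that arises.

```python
from typing import Dict, FrozenSet


def choquet_integral(scores: Dict[str, float],
                     measure: Dict[FrozenSet[str], float]) -> float:
    """Discrete Choquet integral of non-negative scores with respect to a fuzzy measure."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    coalition = set()
    total = 0.0
    for i, (name, value) in enumerate(ordered):
        coalition.add(name)  # models whose score is at least `value`
        next_value = ordered[i + 1][1] if i + 1 < len(ordered) else 0.0
        total += (value - next_value) * measure[frozenset(coalition)]
    return total


# Hypothetical malignancy scores from two models.
scores = {"cnn_a": 0.9, "cnn_b": 0.6}

# Additive measure: the integral reduces to a weighted mean.
additive = {
    frozenset({"cnn_a"}): 0.5,
    frozenset({"cnn_b"}): 0.5,
    frozenset({"cnn_a", "cnn_b"}): 1.0,
}

# Non-additive measure: each model alone is trusted more than half of the pair.
redundant = {
    frozenset({"cnn_a"}): 0.7,
    frozenset({"cnn_b"}): 0.7,
    frozenset({"cnn_a", "cnn_b"}): 1.0,
}

fused_additive = choquet_integral(scores, additive)
fused_redundant = choquet_integral(scores, redundant)
```

The non-additive case is what distinguishes this fusion from a weighted average: the measure can encode that two models carry overlapping or complementary information.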

Fuzzy logic can also act as a final classifier. In [14], after CNNs and attention mechanisms process the features, a Self-Organizing Fuzzy Inference System is used as the final classifier.

Additionally, fuzzy logic can also be used for feature selection and post-attention channel selection. In [12], after a Channel Attention Module (CAM) and two Spatial Attention Modules (SAMs), a Fuzzy Channel Selection (FCS) Module is used to select the most relevant channels, assigning weights to each channel through a sigmoid function.

In [18], Adaptively Regularized Kernel-based Fuzzy C-Means (ARKFCM) is employed for an intermediate segmentation (clustering) stage, bridging the gap between feature extraction and final classification, thereby grouping data with complex structures.

(E): Integration of Fuzzy Logic in the loss function or for optimization

Fuzzy logic can influence the model’s learning process by defining loss functions or utilizing optimization algorithms. Ref. [38] utilizes a Fuzzy Supervised Contrastive Loss (LFSC) to “fuzzy smooth” the label information, thereby reducing the rigidity of the labels and enhancing the network’s generalization ability during training. Ref. [23] integrates fuzzy logic into a Fuzzy-Enhanced Firefly Algorithm (FEFA), an optimization algorithm that enhances the adaptability of the neural network model optimization process.

(F): Hybrid Neuro-Fuzzy Systems as key components

Some publications present inherently hybrid components, where the principles of artificial neural networks and fuzzy logic come together to form a new type of “layer” or “module”. Ref. [32] presents a fuzzy-ER net layer, which is a fuzzy classification approach that integrates fuzzy concepts (Fuzzy Partition, Rule Generation, and Categorization of new patterns) into cancer image classification, acting as a central classification component.

The integration between artificial neural networks, attention mechanisms, and fuzzy logic manifests itself through a spectrum of strategies, ranging from intelligent data preparation utilizing fuzzy logic to its direct embedding in neural network architectures and attention mechanisms for uncertainty management, focus improvement, and application in prediction combination or learning process optimization. This variety demonstrates the versatility of the three approaches in complementing and enhancing the capabilities of deep learning models to support medical diagnosis.

4.4.3. RQ3: What Impact Does the Integration of Algorithms or Techniques of Artificial Neural Networks, Attention Mechanisms, and Fuzzy Logic Have on the Outcome of the Proposal?

To answer this research question, we conducted an analysis using information from ablation studies included in 19 of the 32 publications. We only included 19 publications because 13 of the 32 publications did not provide clear and concise information on the individual or combined impact of the following techniques: artificial neural networks, attention mechanisms, and fuzzy logic. It is essential to note that the metric values are presented as originally reported in the publications, some in percentages (%) and others in decimal values.

Our analysis of the extracted data consistently demonstrates that integrating attention mechanisms, fuzzy logic techniques, or a combination of both improves the performance of neural network models in several tasks that support medical disease diagnosis. To aid understanding of the results, the metrics found in the publications are listed and classified in the answer to research question five.

(A): Impact of Attention Mechanisms

Attention mechanisms demonstrate a remarkable ability to improve the accuracy and efficiency of models by allowing them to focus on the most relevant features of the data. For example, in left atrial image segmentation [9], the use of the Multihead Self-Attention mechanism improved the Dice similarity coefficient (DSC) by 2.8% and the Jaccard similarity coefficient (JSC) by 3.77%. For glaucoma stage classification [11], Channel Attention increased accuracy (ACC) by 1.9% and the F1-score (F1) by 2.2%; Spatial Attention achieved similar improvements. In Alzheimer’s diagnosis [38], Hybrid Attention improved accuracy (ACC) by 4.28% and sensitivity (SEN) by 4.85%.

Other examples include improvements in the Dice similarity coefficient (DSC) and the 95th-percentile Hausdorff distance (HD95) for 3D image segmentation [42], in accuracy (ACC), precision (Pre), recall (Rec), and F1-score (F1) for Parkinson’s diagnosis [17], and in the Dice similarity coefficient (DSC) and Hausdorff distance (HD) for cardiac MRI segmentation [24]. In [41], the use of the Dual Attention Mechanism improved segmentation performance compared to a model without attention, with an increase of 0.0376 in the Dice similarity coefficient (DSC), a reduction of 0.0794 in the 95th-percentile Hausdorff distance (HD95), and a reduction of 0.0799 in the average surface distance (ASD). The implementation of Spatial-Channel Attention in [35] improved classification performance, with increases of 0.022 in precision (Pre) and 0.022 in recall (Rec) compared to the same model without the attention technique. These results demonstrate how attention mechanisms, by refining features and spatial and channel information, contribute to better representation and, consequently, superior performance.
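For reference, the two overlap metrics cited most often above can be computed from a pair of binary masks as in the following sketch. The function name `dice_and_jaccard` and the toy masks are hypothetical, and returning 1.0 when both masks are empty is a convention assumed here.

```python
from typing import Sequence, Tuple


def dice_and_jaccard(pred: Sequence[int], truth: Sequence[int]) -> Tuple[float, float]:
    """Return (DSC, JSC) for two flattened binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_size = sum(1 for p in pred if p)
    truth_size = sum(1 for t in truth if t)
    union = pred_size + truth_size - intersection
    if union == 0:
        # Both masks are empty: treated as perfect agreement (assumed convention).
        return 1.0, 1.0
    dice = 2.0 * intersection / (pred_size + truth_size)
    jaccard = intersection / union
    return dice, jaccard


# Hypothetical flattened masks: predicted segmentation versus ground truth.
dsc, jsc = dice_and_jaccard([1, 1, 1, 0], [0, 1, 1, 1])
```

The two coefficients are monotonically related (JSC = DSC / (2 − DSC)), which is why improvements in one are usually accompanied by improvements in the other.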

(B): Impact of Fuzzy Logic Techniques

The integration of fuzzy logic proved to be a powerful strategy for managing uncertainty and data granularity, often resulting in substantial improvements. In the classification of histopathological images [13], the use of fuzzy logic (Universal Fuzzy Feature) improved accuracy (ACC) by 6%, true-positive rate (TPR) by 5.4%, and positive predictive value (PPV) by 6.3%. One of the most notable impacts is seen in the diagnosis of autism [39], where the Multioutput Takagi–Sugeno–Kang Fuzzy System increased accuracy (ACC) by 24.6% and specificity (SPE) by 59.9% compared to a model without this technique. Similarly, in the prediction of Alzheimer’s disease [38], Fuzzy Supervised Contrastive Loss improved accuracy (ACC) by 10.74% and sensitivity (SEN) by 10.05%.

Other techniques, such as Fuzzy Channel Selection for pneumonia detection [12], also showed considerable improvements in accuracy (ACC), precision (Pre), recall (Rec), F1-score (F1), and area under the curve (AUC). Fuzzy logic has also proven helpful in pre-processing, such as image enhancement with a Fuzzy Color Method [44], which improved accuracy (ACC) by 0.0262, or the Fuzzy Entropy Algorithm for Parkinson’s diagnosis [17], which improved accuracy (ACC) by 5.44%. In OCT fundus image segmentation [25], Fuzzy C-Means Clustering improved the Dice similarity coefficient (DSC) by 0.02.

The application of the Fuzzy-Enhanced Firefly Algorithm [23], which integrates fuzzy rules for algorithm optimization, improved classification metrics compared to an approach without fuzzy logic: 1.7% in accuracy (ACC), 1.8% in F1-score (F1), 1.8% in precision (Pre), and 1.9% in recall (Rec). Additionally, the Fuzzy C-Means Clustering technique, which uses membership degrees instead of one-hot labels, had a positive impact on classification metrics, with improvements of 0.032 in precision (Pre), 0.008 in recall (Rec), and 0.013 in F1-score (F1) when compared to the alternative without fuzzy logic [35]. These results demonstrate how fuzzy logic allows models to better handle the complexity inherent in medical data by modeling ambiguity and gradual membership of data.
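The classification metrics reported in this subsection all derive from the confusion matrix of a binary classifier. The sketch below shows the usual definitions; the function name `classification_metrics` and the counts in the example are hypothetical and do not come from any of the reviewed publications.

```python
from typing import Dict


def _ratio(numerator: float, denominator: float) -> float:
    """Safe division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute common binary classification metrics from confusion-matrix counts."""
    return {
        "ACC": _ratio(tp + tn, tp + fp + fn + tn),  # accuracy
        "SEN": _ratio(tp, tp + fn),                 # sensitivity = recall = TPR
        "SPE": _ratio(tn, tn + fp),                 # specificity
        "PPV": _ratio(tp, tp + fp),                 # precision = positive predictive value
        "F1": _ratio(2 * tp, 2 * tp + fp + fn),     # harmonic mean of precision and recall
    }


# Hypothetical counts for a 100-case test set.
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
```

Because sensitivity and specificity depend on different rows of the confusion matrix, a technique can raise one far more than the other, as in the autism example above.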

(C): Impact of Combining Attention and Fuzzy Logic

The fusion of attention mechanisms and fuzzy logic often leads to even more robust performance, leveraging the best of both approaches. In lung organ segmentation [27], Fuzzy Attention improved the intersection over union (IoU) by 0.83, in addition to improvements in other relevant metrics. For refined segmentation of the pancreas [10], the combination of Target Attention and fuzzy skip connections resulted in a 7.1% improvement in the Dice similarity coefficient (DSC) and a 10.24% reduction in the volumetric overlap error (VOE). In the segmentation of 3D medical images [42], Attention Fusion with fuzzy learning improved the Dice similarity coefficient (DSC) by 1.24%. Furthermore, in diabetic retinopathy classification [29], Fuzzy-Enhanced Holistic Attention increased accuracy (ACC) by an average of 2.92% and quadratic weighted kappa (kappa) by 3.73%.

In [28], the use of the Fuzzy Attention Layer improved performance in airway segmentation by integrating encoder and decoder features using Gaussian membership functions to generate attention maps. Improvements were observed in precision (Pre) of 0.0015, in the detected length ratio (DLR) of 1.39%, in the detected branch ratio (DBR) of 2.18%, a reduction of 0.34% in the airway missing ratio (AMR), and an improvement in continuity and completeness F-score (CCF) of 0.0051, compared to a base model without these techniques. In [33], the combination of attention and fuzzy logic in a Fuzzy Graph Structure Attention Network resulted in improvements of 0.0158 in accuracy (ACC) and 0.0133 in F1-score (F1) compared to a model without these integrations.

These results demonstrate that the joint integration of these techniques can achieve an additive or even synergistic improvement by allowing the model to focus on the correct regions and more effectively handle the uncertainty and complexity of those regions.

4.4.4. RQ4: What Are the Characteristics of the Input Data and of the Data to Be Predicted, Classified, or Inferred?

Below, we present an analysis of the characteristics of the input data and the data to be predicted, classified, or inferred based on information extracted from the 32 publications.

(A): Characteristics of Input Data

The input data in the publications included in our systematic review are predominantly medical images, demonstrating a strong focus on diagnosis and segmentation in the healthcare field. These images come from a variety of modalities:

Computed tomography (CT): Chest CT scans are used for lung and airway segmentation, as in [27,28]. Abdominal CT scans are also used for organ segmentation, such as the pancreas in [10]. Additionally, CT images are utilized for image reconstruction, as seen in [40], and for lung cancer classification in [32].
Magnetic resonance imaging (MRI): Brain MRIs are commonly used for the diagnosis of brain tumors in [15,18], as well as for predicting Alzheimer’s disease using PET in [38]. Cardiac MRIs are used for segmentation in [24,37], and breast MRIs are used for cancer classification in [33]. Refs. [9,39,42] also use MRI as input data.
Fundus images and OCT: Fundus images are essential for glaucoma classification in [11] and diabetic retinopathy grading in [29]. Retinal OCT images are used for sub-retinal fluid segmentation in [25,41].
Histopathological images: Ref. [13] uses histopathological images of different tissues for classification.
X-rays: Chest X-rays are used for the diagnosis of pulmonary disorders in [23] and the detection of pneumonia in [12]. Panoramic dental X-rays are used in [44] for pediatric oral health screening.
Ultrasound images: Ref. [35] uses ultrasound images of cervical lymph nodes, and [16] uses them to predict the malignancy of breast tumors.
Other imaging modalities: Ref. [22] uses gastric endoscopy images to locate polyps. Ref. [34] uses peripheral blood smear images for the diagnosis of acute lymphoblastic leukemia. Refs. [19,31] use dermatoscopy images for the detection and classification of skin cancer. Ref. [43] uses images of diabetic foot ulcers for segmentation.

Besides the images, the input data also includes the following:

Audio signals: Ref. [14] uses heart sound signals to diagnose cardiovascular diseases.
Electroencephalogram (EEG) signals: Ref. [17] uses EEG signals to classify Parkinson’s disease.
Data matrices: Ref. [36] processes mRNA expression data matrices for predicting prostate cancer metastasis.

Pre-processing is diverse and essential for adapting data to neural networks:

Resizing and cropping: Images are commonly resized to uniform sizes, such as 224 × 224, 256 × 256, or 512 × 512 pixels. Cropping is also common, for example to 128 × 96 pixels in [27], 192 × 160 pixels in [10], and 135 × 224 × 224 pixels in [15].
Normalization: This is a critical step, such as normalizing to an interval from 0 to 1 in [23], using Z-score normalization in [15,36,37], or normalizing by subtracting the mean and dividing by the standard deviation in [16,35].
Data augmentation: Widely applied to improve the robustness and generalization of models; techniques include rotations (horizontal, vertical, and random), scaling or zooming, adding noise, or using generative adversarial networks (GANs) for data augmentation as in [19].
Image enhancement: Ref. [44] uses a fuzzy color method to enhance X-rays and reduce noise. Ref. [31] uses improved mean adjustment (IMA) for image enhancement.
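
The two normalization schemes named in the list above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the pipeline of any reviewed publication; the function names and the sample intensities are assumptions introduced here.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant input: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score_normalize(values):
    """Subtract the mean and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:  # constant input: avoid division by zero
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical 8-bit pixel intensities, used only for illustration.
pixels = [0, 64, 128, 255]
scaled = min_max_normalize(pixels)
standardized = z_score_normalize(pixels)
```

After Z-score normalization the values have zero mean and unit standard deviation, whereas min–max scaling only fixes the range.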

Specific pre-processing by modality:

Audio signals: Ref. [14] applies resampling at 1000 Hz, filtering (bandpass 25–400 Hz), denoising, and extraction of features from the time domain (mean, variance), frequency domain (FFT), and time-frequency domain (MFCCs).
EEG signals: Ref. [17] applies bandpass filtering, notch filter for power frequency interference, wavelet denoising, frequency band decomposition, and data segmentation.
mRNA expression data: Ref. [36] uses Z-score and min–max normalization and feature selection with ANOVA and Relief algorithm.
OCT images: Ref. [41] applies fuzzy clustering and feature discretization based on fuzzy rough sets. Ref. [25] uses feature discretization based on fuzzy rough sets and Fuzzy C-Means clustering.
MRI: The Multioutput Takagi–Sugeno–Kang Fuzzy System in [39] addresses interference and intrinsic noise in rs-fMRI data.

(B): Characteristics of data to be predicted, classified, or inferred

The output data reflects the diverse tasks addressed by the models, which can be grouped primarily into classification and segmentation, with some reconstruction and coordinate inference tasks also included.

Classification is the most frequent task, with a variable number of classes depending on the condition or disease being classified.

Two classes: For the diagnosis of autism in [39], Parkinson’s disease in [17], breast cancer in [16,33], prostate cancer metastasis in [36], pediatric oral health screening in [44], acute lymphoblastic leukemia in [34], skin cancer in [31], lymph node metastasis in [35], and cardiovascular disease in [14].
Three classes: For the classification of glaucoma stages (advanced, early, normal) in [11], diagnosis of pulmonary disorders in [23], detection of pneumonia in [12], prediction of Alzheimer’s disease in [38], and diagnosis of brain tumors in [18].
Five classes: For the classification of histopathological images in [13] and the grading of diabetic retinopathy in [29].
Seven classes: For the classification of dermatoscopy lesions in [19].

In the segmentation task, the expected result is a segmentation mask or probability maps of segmentation regions.

Segmentation masks: Ref. [9] produces image segmentation using contour lines. Ref. [27] generates an edge mask for lung organ segmentation. Ref. [10] produces a mask for pancreatic segmentation, as well as a 3D reconstruction of the organ. Ref. [43] generates segmentation masks for diabetic foot ulcer images. Ref. [28] produces a mask and 3D reconstruction for airway segmentation. Refs. [24,37] focus on segmenting cardiac MRI images to identify regions such as the right ventricle (RV), left ventricle (LV), and left ventricular myocardium (MYO).
Probability maps: Ref. [25] generates probability maps of segmentation regions for fundus OCT images.
Specified segment: Ref. [15] performs segmentation of a specified area in brain MRI images.

For image reconstruction tasks, such as in [40], the goal is to generate a reconstructed image from a lower-quality or noisy image. In applications of coordinate inference, such as the localization of gastric polyps in [22], the system predicts the specific coordinates (square area) of the lesions.

The publications analyzed in our systematic review demonstrate a wide diversity in the types of input data, with a predominance of medical images with sophisticated pre-processing. The expected outputs vary between multiclass classification for specific diagnoses, precise segmentation of anatomical structures (often with 3D reconstructions), and other specialized tasks such as super-resolution or coordinate inference, demonstrating the versatility of deep learning, attention, and fuzzy logic techniques in supporting medical diagnosis.

4.4.5. RQ5: What Methods or Metrics Were Used to Assess Results?

To interpret the results reported in the 32 publications included in our systematic review, it is necessary to understand the metrics they use. The most commonly reported metrics are detailed below:

(A): Classification and Overall Performance Metrics

Accuracy (ACC): Represents the percentage of correct predictions made by the model out of the total number of predictions.
Precision (Pre): Measures the proportion of true positives among all instances classified as positive by the model, indicating the reliability of positive predictions.
Recall (Rec)/Sensitivity (SEN)/true-positive rate (TPR): Indicates the proportion of actual positive instances that the model correctly identified.
F1-score (F1)/F-Measure: The harmonic mean of precision and recall, providing a balance between the two metrics, which is especially useful in imbalanced datasets.
Area under the curve (AUC): Quantifies the ability of a classifier to distinguish between classes, computed as the area under the receiver operating characteristic (ROC) curve.
Specificity (SPE)/true-negative rate (TNR): Indicates the proportion of actual negative instances that the model correctly identified.
Positive predictive value (PPV): Represents the probability that a positive result is truly positive; it is equivalent to precision.
False-positive rate (FPR): Indicates the proportion of false positives out of the total number of actual negative cases.
Quadratic weighted kappa (kappa): Measures the degree of agreement between two evaluators (or between the model and the actual values), penalizing more strongly the most distant classification errors. It is helpful in ordinal classification problems.
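
The definitions above can be made concrete with a short sketch that derives the binary metrics from confusion-matrix counts and computes the quadratic weighted kappa from a square confusion matrix. The function names and example counts are hypothetical, the kappa uses the standard quadratic weights, and the code assumes non-empty denominators and at least two classes.

```python
def binary_metrics(tp, fp, tn, fn):
    """Binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)    # identical to PPV
    recall = tp / (tp + fn)       # sensitivity, TPR
    specificity = tn / (tn + fp)  # TNR
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "fpr": fp / (fp + tn),
        "f1": 2 * precision * recall / (precision + recall),
    }


def quadratic_weighted_kappa(confusion):
    """confusion[i][j] = samples with true grade i and predicted grade j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    observed = 0.0
    expected = 0.0
    for i in range(n):
        for j in range(n):
            # Errors between distant grades are penalized quadratically.
            weight = (i - j) ** 2 / (n - 1) ** 2
            observed += weight * confusion[i][j]
            expected += weight * row_sums[i] * col_sums[j] / total
    return 1.0 - observed / expected


# Hypothetical counts, used only for illustration.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
kappa = quadratic_weighted_kappa([[4, 1, 0], [1, 3, 1], [0, 1, 4]])
```

Note that FPR equals 1 minus specificity, so reporting both adds no independent information.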

(B): Segmentation and Region Similarity Metrics

Dice similarity coefficient (DSC/Dice coefficient): A widely used metric for measuring the overlap between the predicted segmentation and the true target area, with higher values indicating greater similarity.
Jaccard similarity coefficient (JSC/Jaccard coefficient/intersection over union (IoU)): Similar to the Dice coefficient, it measures the overlap between two sets, being the ratio of the intersection area to the union area.
Volumetric overlap error (VOE): Quantifies the mismatch between the predicted segmentation and the true target volume, expressed as the percentage of non-overlap (the complement of the Jaccard coefficient).
Relative absolute volume difference (RAVD): Measures the relative volume difference between the predicted and true segmentation, normalized by the volume of the true segmentation.
Pixel accuracy (PA): The ratio of correctly classified pixels to the total number of pixels in the image.
Class pixel accuracy (CPA): The pixel accuracy computed for each class individually.
Mean pixel accuracy (MPA): The class pixel accuracy averaged across all classes.
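
As a hedged illustration of how the overlap metrics above relate to one another, the following sketch computes them for two binary masks stored as flat lists. The function name and the toy masks are assumptions introduced here, VOE is taken as 1 minus the Jaccard coefficient, and both masks are assumed to be non-empty.

```python
def overlap_metrics(pred, true):
    """Overlap metrics for two equally sized flat binary masks (0/1 lists)."""
    inter = sum(1 for p, t in zip(pred, true) if p and t)
    pred_size = sum(1 for p in pred if p)
    true_size = sum(1 for t in true if t)
    union = pred_size + true_size - inter
    iou = inter / union
    return {
        "dice": 2 * inter / (pred_size + true_size),
        "iou": iou,
        "voe": 1 - iou,
        # Volume difference normalized by the true volume.
        "ravd": abs(pred_size - true_size) / true_size,
        "pixel_accuracy": sum(1 for p, t in zip(pred, true) if p == t) / len(true),
    }


# Toy masks, used only for illustration.
pred_mask = [1, 1, 1, 0, 0, 0, 0, 0]
true_mask = [0, 1, 1, 1, 0, 0, 0, 0]
scores = overlap_metrics(pred_mask, true_mask)
```

In this toy case the two masks have the same volume, so RAVD is zero even though the overlap is far from perfect, which shows why volume-based and overlap-based metrics are usually reported together.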

(C): Distance and Surface Metrics

Hausdorff distance (HD/95HD/HD95): Measures the largest distance from a point in one set to the nearest point in the other set, and is often used to evaluate the difference between segmentation contours. The 95th-percentile variant (95HD/HD95) reduces sensitivity to outliers.
Average surface distance (ASD/ASSD): Calculates the average distance between the surfaces of the predicted segmentation and the true target area. Sometimes, ASSD is used to emphasize that the average considers both surfaces, not just one direction. Often, ASD and ASSD are synonymous.
Root mean square symmetric surface distance (RMSD): The square root of the mean of the squared distances between the surfaces of the predicted segmentation and the true target area.
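
The distance metrics above can be sketched on small point sets that stand in for segmentation contours. This is an illustrative brute-force version using only the standard library; the function names and coordinates are assumptions. HD95 conventions differ between toolkits, and here it is taken as the larger of the two directed 95th percentiles computed with linear interpolation.

```python
import math


def directed_distances(a, b):
    """For every point in a, the distance to its nearest point in b."""
    return [min(math.dist(p, q) for q in b) for p in a]


def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lower, upper = math.floor(pos), math.ceil(pos)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def hausdorff(a, b):
    return max(max(directed_distances(a, b)), max(directed_distances(b, a)))


def hd95(a, b):
    return max(percentile(directed_distances(a, b), 95),
               percentile(directed_distances(b, a), 95))


def assd(a, b):
    """Average symmetric surface distance over both directions."""
    d = directed_distances(a, b) + directed_distances(b, a)
    return sum(d) / len(d)


def rms_surface_distance(a, b):
    d = directed_distances(a, b) + directed_distances(b, a)
    return math.sqrt(sum(x * x for x in d) / len(d))


# Hypothetical contour points, used only for illustration.
contour_pred = [(0, 0), (1, 0), (2, 0)]
contour_true = [(0, 0), (1, 0), (2, 3)]
hd_value = hausdorff(contour_pred, contour_true)
hd95_value = hd95(contour_pred, contour_true)
assd_value = assd(contour_pred, contour_true)
rms_value = rms_surface_distance(contour_pred, contour_true)
```

The single outlying point dominates the Hausdorff distance, while the percentile and average variants are less affected by it.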

(D): Specific Image Quality and Reconstruction Metrics

Peak signal-to-noise ratio (PSNR): A measure of image reconstruction quality where higher values indicate greater fidelity to the original image.
Structural similarity index (SSIM): Evaluates the similarity of two images in terms of luminance, contrast, and structure, providing a metric that is more closely aligned with human perception.
Mean squared error (MSE): Measures the average of the squares of the errors between the predicted values and the actual values, commonly used in regression and reconstruction.
Mean absolute error (MAE): Calculates the average of the absolute differences between the predicted values and the actual values, providing a measure of the average magnitude of the errors.
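
A minimal sketch of the error-based reconstruction metrics above is given below (SSIM is omitted because it requires local windowed statistics). Images are represented as flat lists of intensities, and the function names, the sample values, and the default peak value of 255 for 8-bit images are assumptions introduced for illustration.

```python
import math


def mse(reference, reconstructed):
    """Mean of the squared pixel-wise errors."""
    return sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)


def mae(reference, reconstructed):
    """Mean of the absolute pixel-wise errors."""
    return sum(abs(a - b) for a, b in zip(reference, reconstructed)) / len(reference)


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in decibels; infinite for identical images."""
    err = mse(reference, reconstructed)
    if err == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / err)


# Hypothetical intensities, used only for illustration.
reference = [0, 100, 200, 255]
reconstructed = [0, 110, 190, 255]
mse_value = mse(reference, reconstructed)
mae_value = mae(reference, reconstructed)
psnr_value = psnr(reference, reconstructed)
```

Because PSNR is a logarithmic function of MSE, the two always rank reconstructions of the same image in the same order.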

(E): Specific Airway Segmentation Metrics

Detected length ratio (DLR): Proportion of the total length of the airways that has been correctly detected.
Detected branch ratio (DBR): Proportion of the branches of the airways that have been correctly detected.
Continuity and completeness F-score (CCF): Combines two key aspects of airway segmentation: continuity (how well the branches are connected) and completeness (how much of the actual tree was segmented), using an F-score formula (harmonic mean).

5. Discussion

5.1. Artificial Neural Network (ANN) Algorithms

Among the integrated algorithms and techniques, we identified a wide variety of artificial neural network (ANN) algorithms, and proposals commonly used more than one type of neural network. The most frequently integrated ANNs were convolutional neural networks (CNNs), present in 28 publications due to their ability to extract spatial features in medical images for classification or segmentation. Multilayer perceptrons (MLPs) were used in 18 publications, often as a final classification layer or in combination with other types of neural networks. Transformers appeared in 7 publications, valued for their ability to model complex and temporal relationships, and were often integrated with fuzzy modules to guide attention. In addition, gated recurrent units (GRUs) and graph neural networks (GNNs) were found, although to a lesser extent.

We identified that selecting the most appropriate type of artificial neural network (ANN) for medical diagnosis support tasks depends on the type of input data and the specific task to be performed (classification, segmentation, reconstruction, or coordinate inference). The publications we analyzed often integrate more than one type of neural network to leverage their complementary strengths. Below, we outline the recommended scenarios for each type of ANN based on the information extracted from the publications.

Convolutional neural networks (CNNs) are the most frequently used type of neural network and are the primary choice for tasks involving medical image processing. Their strength lies in their ability to extract spatial features efficiently. In segmentation, they are well suited to identifying and delineating organs, such as the left cardiac atrium in [9], the lungs in [27], or the pancreas in [10], as well as tissues or lesions, including brain tumors in [18], subretinal fluid in [41], airways in [28], or diabetic foot ulcers in [43]. In these tasks, CNNs are the fundamental building blocks of encoder–decoder architectures.

In medical image classification, CNNs are used for the diagnosis or grading of conditions based on histopathological images, as in [13]; fundus images, such as glaucoma in [11] and diabetic retinopathy in [29]; X-rays, such as pulmonary disorders in [23] and pneumonia in [12]; ultrasound, such as lymph node metastasis in [35]; and dermatoscopy, such as skin cancer in [31]. CNNs are also suitable for image reconstruction and enhancement, as in [40], and for object localization, as in the identification of gastric polyps in endoscopy images in [22]. CNNs are often integrated with spatial attention mechanisms and fuzzy logic, such as fuzzy pooling (as in [22]) or fuzzy convolutional modules (as in [24]), to enhance the interpretation of relevant regions and handle uncertainty in the data.

Multilayer perceptrons (MLPs) are mainly used for their simplicity and versatility, which makes them ideal for combining with other types of neural networks, attention modules, or fuzzy modules. Their most common use is as the final layer of the network for classification tasks when a model is needed to process feature vectors already extracted by other networks (such as CNNs or Transformers) to make final decisions, whether in image classification as in [11,12,13,19,23,29,34,35,38]; signal classification as in [14,17]; or structured data classification as in [36].

Transformers are well suited to modeling complex relationships and global dependencies between input features. They are used in tasks that require global contextual understanding, as their Self-Attention capability makes them suitable for capturing long-range relationships in images, such as in [9,15], or multimodal data, such as in [13]. They are also used in complex diagnostic systems, integrated into hybrid architectures such as Hybrid-Transformer in [15], or as part of models that fuse information from multiple sources for classification in [16]. In image segmentation and classification tasks, they are used for segmenting cardiac atria, as in [9], or brain tumors, as in [15], and for classifying glaucoma stages, as in [11], lung disorders, as in [23], and pediatric oral health, as in [44]. Transformers are often combined with CNNs in encoder–decoder architectures and can also be integrated with fuzzy modules to guide attention.

Gated recurrent units (GRUs) are advantageous for processing medical time series, for example, in the analysis of physiological signals such as the classification of electroencephalogram (EEG) signals for the diagnosis of Parkinson’s disease in [17], where sequence and temporal dependencies are crucial. They can also be combined with CNNs and attention mechanisms for tasks such as skin cancer detection, where both spatial and temporal features are processed, as in [31].

Graph neural networks (GNNs) are most commonly used to predict associations between diseases and are well suited to processing data with an inherent graph structure, such as relationships between genes or samples, as seen in [36]. Additionally, GNNs can enhance the diagnosis of brain tumors by utilizing substructure information, as demonstrated in [18]. GNNs can also be combined with fuzzy systems to model structural uncertainty and extract relevant information, as seen in [33].

Other less common approaches include long short-term memory (LSTM) neural networks, which are utilized for medical image reconstruction tasks that require long-range temporal dependencies, often within attention modules, as seen in [40]. Generative adversarial networks (GANs) are mainly used for data augmentation when the amount of training data is limited, improving the robustness of classification models, as in [19]. Random-coupled neural networks (RCNNs) offer robustness and flexibility by introducing an element of unpredictability into the connections, as in [18].

5.2. Attention Mechanism Algorithms

The attention mechanisms were diverse and highly customized. The most common include Self-Attention/Multihead Attention (Transformer style), present in at least 10 publications. Channel-Spatial Attention was employed in 12 publications, including versions such as Dual Attention and the Hybrid Attention Mechanism, which proved highly effective in image processing for prioritizing key regions. We also identified Fuzzy Attention Modules, which directly integrate fuzzy logic into the attention process, such as the Fuzzy Attention Gate or Fuzzy-Enhanced Holistic Attention.

We identified that the integration of attention mechanisms into neural network models (often combined with fuzzy logic) was carried out in different scenarios to improve the model’s ability to focus on the most relevant information and handle uncertainty. Below, we detail the recommended scenarios for each type of attention mechanism.

Self-Attention and Multihead Attention (Transformer-style) are ideal for scenarios where complex relationships and global dependencies between input features need to be modeled, capturing long-range context. This type of attention mechanism is used in the task of segmenting complex medical images, such as the left cardiac atrium in [9] or brain tumors in [15], where global spatial relationships are crucial for accurate segmentation.

In image classification, attention based on Self-Attention and Multihead Attention is used when the global context is important for diagnosis, such as classifying histopathological images as in [13] or grading glaucoma stages as in [11]. It is also used in complex diagnostic systems that require merging information from multiple sources or modeling relationships between multimodal data, such as in [16,39]. Its main benefit is the ability to dynamically focus on different parts of the input and combine information from multiple “heads” of attention for a more complete representation.
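
The Transformer-style attention discussed above is built on scaled dot-product self-attention, which can be sketched for a single head in plain Python. This is a generic textbook formulation rather than the implementation of any reviewed model; the helper names, the toy token matrix, and the identity projection matrices are assumptions chosen to keep the example small.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x has one row per token; w_q, w_k, w_v are projection matrices.
    Returns the attended features and the attention weights.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(k[0]))
    scores = matmul(q, transpose(k))
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


# Toy input: three tokens with two features each (illustrative values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # stands in for learned projections
attended, attn_weights = self_attention(tokens, identity, identity, identity)
```

Multihead attention runs several such heads with separate projection matrices and concatenates their outputs, which is how information from multiple heads is combined.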

The Channel-Spatial Attention-based mechanism is the preferred option for image processing tasks where it is essential to prioritize key regions of the human body or specific visual features. It is applied in models based on convolutional neural networks (CNNs). It is used for the localization and detection of objects in images, such as gastric polyps in endoscopic images, as seen in [22]. It is also used in the task of classifying medical images for diagnostic purposes, such as diagnosing lung disorders in [23], detecting pneumonia in chest X-rays in [12], diagnosing Alzheimer’s disease from PET scans in [38], or classifying lymph node metastases, as seen in [35].

This type of attention is applied in the segmentation of images of organs or lesions, where the precise identification of boundaries and regions is essential, such as in the segmentation of cardiac magnetic resonance images in [24] or OCT images of the fundus of the eye in [25]. Channel-Spatial Attention enables the dynamic capture of information at both the channel and spatial levels, emphasizing the most informative channels and spatial regions, which results in significant improvements in classification and segmentation metrics.

Fuzzy attention modules are used specifically in scenarios where uncertainty, ambiguity, and imprecise boundaries are inherent characteristics of medical data, as they integrate fuzzy logic directly into the attention process. They are instrumental in image segmentation tasks with difficult-to-define boundaries, such as lung segmentation in [27], airway segmentation in [28], or cardiac MRI segmentation in [37]. Fuzzy logic enables the attention module to focus on relevant regions by utilizing membership functions, thereby addressing imprecision.

Fuzzy attention is also used in the task of disease classification, where the transition between classes is gradual or uncertain, such as the grading of diabetic retinopathy in [29] or the diagnosis of acute lymphoblastic leukemia in [34]. The integration of fuzzy logic mitigates uncertainty and improves the interpretation of features. These modules generate attention maps or redistribute weights using fuzzy membership functions, allowing the network to handle the uncertainty and complexity of regions of interest more effectively.

Other types of attention, such as Residual Connection Attention, Hybrid Attention, or Temporal Attention, are used in more specific situations, adapting to the particular needs of the task:

Residual Connection Attention is utilized for refined segmentation of organs, such as the pancreas in [10], as well as for segmentation of 3D medical images in [42]. Its goal is to improve feature representation and suppress irrelevant information or perform iterative fusion of information at different depths of the network.
Hybrid Attention is employed in medical image reconstruction in [40], where information from different layers or modules is combined to enhance the quality of the reconstructed image.
Temporal and Spatial Attention is essential for processing medical time series or data that combine temporal and spatial aspects, such as skin cancer detection through a hybrid network with GRU and CNN proposed by [31], where sequence and temporal dependencies are critical.

5.3. Fuzzy Logic Algorithms

Fuzzy logic is implemented in multiple ways. Fuzzy networks and fuzzy layers combine the nonlinear representation of ANNs with the approximate reasoning of fuzzy logic, allowing for the handling of uncertainty and ambiguity in medical data. Fuzzy clustering techniques (Fuzzy C-Means and Type-2 Fuzzy) were employed to identify fuzzy patterns and enhance feature segmentation or grouping, which is particularly useful in diagnosis with poorly defined boundaries. Hybrid fuzzy techniques such as the Fuzzy-Enhanced Firefly Algorithm and Fuzzy Entropy were also observed, indicating an integration of fuzzy logic beyond the architecture. Other techniques include fuzzy pooling, fuzzy activation functions, and fuzzy skip connections.
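
To illustrate the fuzzy clustering idea mentioned above, the following is a minimal one-dimensional Fuzzy C-Means sketch using the standard membership and center update rules. It is not the variant used in any particular reviewed publication; the function names, the fuzzifier m = 2, the fixed iteration count, and the sample data are assumptions.

```python
def memberships_for(data, centers, m, eps=1e-9):
    """Membership of every sample in every cluster; each row sums to 1."""
    power = 2.0 / (m - 1.0)
    table = []
    for x in data:
        # eps avoids division by zero when a sample coincides with a center.
        dists = [abs(x - c) + eps for c in centers]
        table.append(
            [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]
        )
    return table


def fuzzy_c_means(data, initial_centers, m=2.0, iterations=50):
    """Alternate membership and center updates for a fixed number of steps."""
    centers = list(initial_centers)
    for _ in range(iterations):
        u = memberships_for(data, centers, m)
        centers = [
            sum((row[i] ** m) * x for row, x in zip(u, data))
            / sum(row[i] ** m for row in u)
            for i in range(len(centers))
        ]
    return centers, memberships_for(data, centers, m)


# Hypothetical 1-D feature values forming two loose groups.
samples = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
fcm_centers, fcm_memberships = fuzzy_c_means(samples, initial_centers=[0.0, 6.0])
```

Unlike hard clustering, every sample keeps a degree of membership in each cluster, which is what makes the method attractive for regions with poorly defined boundaries.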

In the pre-processing and initial data transformation phases, fuzzy logic is utilized for feature discretization or initial image segmentation, thereby reducing redundant information and focusing on the area of interest. Techniques such as fuzzy clustering or the combination of Fuzzy Sets and Rough Sets are applied in this case in [25,41]. They are also used to improve data quality or representation, such as the Multioutput Takagi–Sugeno–Kang Fuzzy System to handle uncertainty and noise in [39], the Fuzzy Color Method to improve X-rays and reduce noise in [44], the Type-2 Fuzzy Subsystem to transform raw data into fuzzy samples in [33], or the Fuzzy Entropy Algorithm to quantify signal uncertainty in Parkinson’s diagnosis in [17].

For direct integration into attention mechanisms, fuzzy techniques guide or improve attention map generation or feature weighting. For example, the Fuzzy Attention-based Module and Fuzzy Attention Layer generate attention maps using trainable Gaussian membership functions to focus the neural network on relevant regions in [27,28,37]. In addition, Fuzzy-Guided Cross-Attention is introduced for feature fusion in [13], and Fuzzy-Enhanced Holistic Attention redistributes feature weights to mitigate uncertainty at visual boundaries in [29]. Neuro-fuzzy systems, such as the Adaptive Neuro-Fuzzy Inference System (ANFIS), also influence attention to model uncertainty in the data in [36]. The Feature Selective Enhancement (FSE) module integrates fuzzy logic for feature recalibration in [34].
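
The idea of generating attention weights from Gaussian membership functions can be sketched as follows. This is a simplified illustration and not the module of [27,28,37]: the centers and widths are fixed hypothetical values here, whereas in a trained network they would be learnable parameters, and aggregating several membership functions with a maximum (fuzzy OR) is an assumption.

```python
import math


def gaussian_membership(x, center, sigma):
    """Degree of membership of x in a Gaussian fuzzy set, in (0, 1]."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def fuzzy_attention(features, centers, sigmas):
    """Weight each feature by its strongest Gaussian membership.

    Returns the re-weighted features and the attention weights.
    """
    attention = [
        max(gaussian_membership(x, c, s) for c, s in zip(centers, sigmas))
        for x in features
    ]
    attended = [x * a for x, a in zip(features, attention)]
    return attended, attention


# Hypothetical feature values and membership parameters.
feature_values = [0.1, 0.5, 0.9]
attended_values, attention_weights = fuzzy_attention(
    feature_values, centers=[0.5], sigmas=[0.1]
)
```

Features close to a membership center keep most of their magnitude, while features far from every center are smoothly suppressed rather than cut off by a hard threshold.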

The integration of fuzzy logic into the architecture of artificial neural networks involves adding it directly to their components. For example, the Anchor-Wise Fuzziness Module or the Fuzzy Learning Module is placed between the encoder and the decoder to estimate fuzzy areas or convert feature maps into fuzzy values in [9,42]. Fuzzy skip connections modify the internal connections of the networks to suppress irrelevant information and improve the target features in [10]. Fuzzy convolutional modules are incorporated into the layers to learn fuzzy information from the spatial and channel dimensions in [24]. A fuzzy activation function can be applied to generate smoother outputs that are more sensitive to changes in image data, as in [43].

In the post-processing or result fusion stages, fuzzy logic refines the network’s inferences. A Fuzzy Selector can process segmentation results to resist noise interference in [15]. For the fusion of outputs from multiple models, a Learnable Fuzzy Measure with Choquet Integral is employed in [16], and a Fuzzy Rank-Based Ensemble Approach is utilized in [19] to reduce the dispersion of predictions and enhance accuracy. A Self-Organizing Fuzzy Inference System can act as the final classifier in [14]. Additionally, the Fuzzy Channel Selection (FCS) Module is utilized to select relevant channels following the attention modules in [12], and the Adaptively Regularized Kernel-based Fuzzy C-Means (ARKFCM) algorithm is employed in an intermediate segmentation stage to cluster data with complex structures in [18].
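
As an illustration of fuzzy output fusion, the discrete Choquet integral with respect to a fuzzy measure can be sketched as below. In [16] the fuzzy measure is described as learnable; here it is a fixed, hand-written dictionary, and the model names, scores, and measure values are hypothetical.

```python
def choquet_integral(scores, measure):
    """Discrete Choquet integral of non-negative scores.

    scores:  dict mapping each source name to its score.
    measure: dict mapping frozensets of source names to a value in [0, 1],
             with the full set mapped to 1.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    total = 0.0
    for i, source in enumerate(ordered):
        next_score = scores[ordered[i + 1]] if i + 1 < len(ordered) else 0.0
        coalition = frozenset(ordered[: i + 1])  # the i + 1 highest-scoring sources
        total += (scores[source] - next_score) * measure[coalition]
    return total


# Hypothetical class probabilities from two models for one sample.
model_scores = {"cnn": 0.9, "transformer": 0.6}

# An additive measure reduces the integral to a weighted mean.
additive = {
    frozenset({"cnn"}): 0.5,
    frozenset({"transformer"}): 0.5,
    frozenset({"cnn", "transformer"}): 1.0,
}
# A non-additive measure that trusts the CNN more when it acts alone.
non_additive = {
    frozenset({"cnn"}): 0.7,
    frozenset({"transformer"}): 0.4,
    frozenset({"cnn", "transformer"}): 1.0,
}
fused_additive = choquet_integral(model_scores, additive)
fused_non_additive = choquet_integral(model_scores, non_additive)
```

The non-additive case shows the appeal of the approach: the measure can express interactions between models that a fixed weighted average cannot.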

Fuzzy logic can influence the learning process through the use of loss functions or optimization algorithms. Fuzzy Supervised Contrastive Loss (LFSC) is used to “smooth” label information, improving the generalization of the network in [38]. In the field of optimization, the Fuzzy-Enhanced Firefly Algorithm (FEFA) integrates fuzzy logic to enhance the adaptability of the model optimization process, as seen in [23].

5.4. Integration of Artificial Neural Network Algorithms, Attention Mechanisms, and Fuzzy Logic

We identified that the integration of techniques is carried out in different ways. In pre-processing or initial data transformation, fuzzy logic is utilized for feature discretization, initial image segmentation, data quality improvement (including noise reduction), and the transformation of raw data into fuzzy samples. Directly in attention mechanisms, fuzzy logic guides the generation of attention maps, weights features, or merges information.

In artificial neural network (ANN) architectures, fuzzy logic influences the information flow through internal connections (fuzzy skip connections) or forms an integral part of layers or activation functions (fuzzy convolutional modules and fuzzy pooling). In post-processing or result fusion stages, fuzzy logic refines results (Fuzzy Selector), merges outputs from multiple models (Fuzzy Rank-Based Ensemble), or acts as a final classifier (Self-Organizing Fuzzy Inference System).

In the loss function or for optimization, fuzzy logic integration is used to “smooth” label information (Fuzzy Supervised Contrastive Loss) or as part of optimization algorithms (Fuzzy-Enhanced Firefly Algorithm). In hybrid neuro-fuzzy systems, some models include intrinsically hybrid components, such as the Fuzzy-ER Net layer.

5.5. Criteria for Integrating Fuzzy Logic with Artificial Neural Network Models

After analyzing the 32 publications in our systematic review, we identified the stages at which fuzzy logic can be integrated or utilized. Fuzzy logic integration can be applied to all phases of a neural network model’s workflow, whether or not the model includes an attention mechanism. The key decision depends on the nature of the data, the specific problem to be solved, and the objective pursued by incorporating fuzzy logic.

Using fuzzy logic in pre-processing or initial data transformation is ideal if the input data is inherently noisy, uncertain, or incomplete or needs a “smoother” or more interpretable representation before being processed by the neural network (see Figure 7A). Fuzzy logic can help reduce redundant information, focus on areas of interest, or convert raw data into fuzzy samples, thereby improving data quality and preparation. For example, it is helpful for feature discretization, initial image segmentation, or handling uncertainty and noise in the input data.

Direct integration of fuzzy logic into attention mechanisms is ideal if the neural network model needs to intelligently focus on relevant regions or features under conditions of uncertainty or ambiguous boundaries. Using fuzzy logic within attention mechanisms enables guiding or improving the generation of attention maps, weighting features more effectively, or merging features considering imprecision, which is crucial when traditional attention mechanisms are insufficient.

Integrating fuzzy logic into the architecture of artificial neural networks is recommended if we want it to be an intrinsic part of how the network processes information (see Figure 7B). This kind of integration involves modifying the data flow between layers, connections, or even activation functions to handle imprecision directly in the model’s calculations or processing. The goal of this type of integration is to introduce the ability to reason with fuzzy information in feature processing, suppress irrelevant information, or generate smoother and more adaptive outputs.

Using fuzzy logic in post-processing or result fusion is appropriate if the network’s raw inferences need refinement, consolidation, or a “softer” decision-making process that considers the uncertainty of predictions (see Figure 7C). It is beneficial when aiming to improve the robustness of final decisions or intelligently combine the results of multiple models, resisting noise in the results and smoothing decisions.

In the case of loss functions or optimization algorithms, integrating fuzzy logic directly influences the way the model learns (see Figure 7B). Fuzzy logic can soften error penalties or adapt the process of searching for optimal parameters. This integration can enhance the model’s generalization by “softening” label information or making the optimization algorithm more adaptable to fuzzy data.

Using hybrid neuro-fuzzy systems as key components can be considered if the design of the proposal involves creating a new type of “layer” or “module” that inherently merges the principles of neural networks and fuzzy logic. These components are designed for interpretable reasoning and can act as classifiers or central processors that directly integrate fuzzy concepts to handle uncertainty and ambiguity.

To decide whether to integrate fuzzy logic techniques into a neural network model with or without an attention mechanism, we recommend analyzing the inherent characteristics of the problem and the data. For example:

If the image data has ambiguous boundaries or significant noise, fuzzy data pre-processing or fuzzy attention could be very advantageous.
If the model needs to reason explicitly with data uncertainty or provide smoother responses, integration at the architecture level or as a hybrid system may be more appropriate.
If it is necessary to improve the robustness of final decisions or combine multiple predictions, post-processing techniques will be suitable.
If the learning process itself is affected by label ambiguity or optimization, a fuzzy loss function or fuzzy optimization algorithm could be the key.

The great versatility of fuzzy logic allows for deep customization, so an experimentation phase is necessary to find the most effective integration for a specific proposal.

5.6. Data and Metrics

In terms of data characteristics, the input data was predominantly medical images from different modalities (computed tomography, magnetic resonance imaging, fundus and OCT, histopathology, radiography, ultrasound, gastric endoscopy, peripheral blood smears, dermatoscopy, and diabetic foot ulcers). We also identified audio signals (heart sounds), electroencephalogram (EEG) signals, and mRNA expression data matrices.

Data pre-processing is diverse and includes resizing, cropping, normalization (Z-score, min–max), data augmentation (rotations, flips, zoom, noise, GANs), and image enhancements (fuzzy color methods). There is also modality-specific pre-processing, such as filtering and feature extraction for signals.

In the data to be predicted, classified, or inferred, classification is the most frequent task, with 2, 3, 5, or 7 classes to predict, depending on the disease or condition being studied. In the segmentation task, the results are segmentation masks or probability maps of regions.

The main evaluation metrics used in publications can be classified as follows:

Classification and overall performance: Accuracy (ACC), precision (Pre), recall (Rec)/sensitivity (SEN)/true-positive rate (TPR), F1-score (F1), area under the curve (AUC), specificity (SPE)/true-negative rate (TNR), positive predictive value (PPV), false-positive rate (FPR), and quadratic weighted kappa (kappa).
Segmentation and region similarity: Dice similarity coefficient (DSC), Jaccard similarity coefficient (JSC)/intersection over union (IoU), volumetric overlap error (VOE), relative absolute volume difference (RAVD), pixel accuracy (PA), class pixel accuracy (CPA), and mean pixel accuracy (MPA).
Distance and surface: Hausdorff distance (HD/95HD), average surface distance (ASD/ASSD), and root mean square symmetric surface distance (RMSD).
Image/reconstruction quality: Peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), mean squared error (MSE), and mean absolute error (MAE).
Airway segmentation: Detected length ratio (DLR), detected branch ratio (DBR), and continuity and completeness F-score (CCF).
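
As a reading aid for the metric families listed above, the following minimal sketch computes the most common classification metrics from confusion-matrix counts and the two main overlap metrics from binary masks. The function names and the toy inputs are illustrative assumptions and are not taken from any of the reviewed publications; the formulas are the standard definitions.

```python
from typing import Dict, Sequence


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Binary classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (hypothetical toy setting).
    """
    precision = tp / (tp + fp)    # Pre, also reported as PPV
    recall = tp / (tp + fn)       # Rec, also reported as SEN or TPR
    specificity = tn / (tn + fp)  # SPE, also reported as TNR
    return {
        "ACC": (tp + tn) / (tp + fp + tn + fn),
        "Pre": precision,
        "Rec": recall,
        "SPE": specificity,
        "FPR": fp / (fp + tn),    # equals 1 - SPE
        "F1": 2 * precision * recall / (precision + recall),
    }


def overlap_metrics(pred: Sequence[int], truth: Sequence[int]) -> Dict[str, float]:
    """DSC and JSC/IoU for two flattened binary masks of equal length.

    Assumes at least one foreground pixel in either mask.
    """
    inter = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return {
        "DSC": 2 * inter / total,
        "IoU": inter / (total - inter),
    }


# Toy example: two 6-pixel masks that share 2 of their 3 foreground pixels.
overlap = overlap_metrics([1, 1, 1, 0, 0, 0], [0, 1, 1, 1, 0, 0])
print(round(overlap["DSC"], 3), overlap["IoU"])  # 0.667 0.5
```

For any pair of masks the two overlap metrics are related by DSC = 2·IoU / (1 + IoU), so a gain reported in one of them is not numerically comparable to a gain reported in the other.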

5.7. Final Remarks on the Findings

The findings of our systematic review consistently reveal that integrating artificial neural networks, attention mechanisms, and fuzzy logic significantly improves the performance of computational models in medical diagnostic support tasks. Convolutional neural networks (CNNs) and multilayer perceptrons (MLPs) are the most commonly used neural network architectures due to their effectiveness in feature extraction and adaptability, respectively. This hybridization extends from pre-processing to the loss function, demonstrating the versatility of these approaches in enhancing the capabilities of the models.

Attention mechanisms, such as Self-Attention and Channel-Spatial Attention, are fundamental to improving the accuracy and efficiency of models by enabling them to focus on the most relevant features. For example, in cardiac image segmentation, Multihead Attention improved the Dice similarity coefficient (DSC) by 2.8% in [9], and in Alzheimer’s diagnosis, Hybrid Attention increased accuracy (ACC) by 4.28% in [38]. These results underscore the ability of attention mechanisms to refine information and optimize data representation.

Meanwhile, fuzzy logic stands out for its ability to handle the uncertainty and granularity inherent in medical data, translating into substantial improvements. Its impact is particularly notable in the diagnosis of autism in [39], where a Multioutput Takagi–Sugeno–Kang Fuzzy System increased accuracy (ACC) by an impressive 24.6% and specificity (SPE) by 59.9%. In addition, techniques such as Fuzzy Supervised Contrastive Loss in [38] or Fuzzy Channel Selection in [12] demonstrate considerable improvements by smoothing labels or selecting relevant features, showing their value in both architecture and data processing.

The fusion of attention mechanisms and fuzzy logic results in more robust performance by combining the ability to focus with the management of uncertainty. Cases such as Fuzzy Attention for lung segmentation in [27], which improved the intersection over union (IoU) by 0.83, or Fuzzy-Enhanced Holistic Attention for diabetic retinopathy in [29], which increased accuracy (ACC) by an average of 2.92%, illustrate an additive or synergistic impact. The versatility of these models is demonstrated in their application to various imaging modalities (CT, MRI, fundus, histopathological, radiographs) and other data such as audio signals, EEG, and mRNA expression matrices.

Although the findings are promising, our analysis of ablation studies could extract results on the effect of integrating the approaches from only 19 of the 32 publications, which limits the generalizability of this evidence. Additionally, heterogeneity in the way metrics are reported (in percentages or decimal values) makes direct comparison and accurate quantitative aggregation of results somewhat challenging. While statistical improvements are observed, future research must contextualize and evaluate the clinical significance of these variations, as a slight percentage improvement may have a limited clinical impact.

5.8. Comparison with Other Reviews

To our knowledge, this is the first systematic literature review to explore the use of three approaches (artificial neural networks, attention mechanisms, and fuzzy logic) in tasks to support medical diagnosis. Other systematic and literature reviews have explored only the combination of artificial neural network techniques or algorithms with fuzzy logic, such as [45], which conducts a systematic review with a neuro-fuzzy focus for predicting neurological disorders. This study critically evaluated neuro-fuzzy systems as an effective technique applied to medical fields, specifically in neurological applications.

We can also find literature reviews, such as [46], which presents a comprehensive review of the main techniques and applications of fuzzy neural networks and neuro-fuzzy networks. The review focuses on the primary methods and training structures of hybrid models, as well as describing how these models have been applied to solve problems of different natures. Meanwhile, ref. [47] presents a systematic review that, although not directly applied to medical fields, explores the task of classification in social media based on fuzzy neural networks, as its motivation was to find the most effective neuro-fuzzy hybrid method for classification.

Finally, the systematic review in [48] proposes a comprehensive framework for fuzzy machine learning, which interested researchers can use to identify fuzzy techniques that improve the performance of machine learning methods, especially in situations involving complex or uncertain factors. However, this systematic review does not consider the importance of attention mechanisms.

5.9. Strengths and Limitations

Our findings reveal a set of significant strengths in the application of artificial intelligence for medical diagnosis. Primarily, the integration of neural networks, attention mechanisms, and fuzzy logic consistently improves performance across a wide range of tasks. We observe great versatility in neural network architectures, with CNNs dominating due to their ability to extract features from images, complemented by MLPs for final classification tasks and Transformers for modeling global dependencies. This adaptability extends to various medical data modalities, including images (CT, MRI, fundus, histopathological, X-rays, ultrasound, blood smears) and nonimage data (audio signals, EEG, mRNA expression matrices), demonstrating the robustness of hybrid approaches.

A key strength lies in the improved ability to handle uncertainty and data relevance. Attention mechanisms (such as Self-Attention and Channel-Spatial Attention) are crucial for models to focus on the most relevant features, improving model accuracy. Complementarily, fuzzy logic is distinguished by its ability to manage the ambiguity and granularity inherent in medical data, offering greater interpretability. The ability to manage ambiguity in medical data is achieved through various techniques, including fuzzy layers, fuzzy networks, fuzzy clustering, and hybrid fuzzy techniques.

The integration of fuzzy logic is sophisticated and multifaceted, ranging from data pre-processing (for feature discretization or image quality improvement and noise reduction) to its direct incorporation into attention mechanisms, modification of neural network architecture, post-processing of results (refinement or fusion of outputs from multiple models), and influence on the loss function or optimization algorithms. Notable examples include the use of Fuzzy Attention-based Modules to generate attention maps, fuzzy skip connections to suppress irrelevant information, and Fuzzy Supervised Contrastive Loss to smooth labels and improve generalization.
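
To make the idea of incorporating fuzzy logic directly into an attention mechanism more concrete, the sketch below re-weights feature channels by a fuzzy membership degree. It is a toy illustration only and does not reproduce the method of any reviewed publication: the Gaussian membership function, the use of the channel mean as descriptor, and the names `gaussian_membership` and `fuzzy_channel_reweight` are all assumptions, and in a trained model the parameters `center` and `sigma` would typically be learned rather than fixed.

```python
import math
from typing import List, Tuple


def gaussian_membership(x: float, center: float, sigma: float) -> float:
    """Degree (in [0, 1]) to which x belongs to a fuzzy set centred at `center`."""
    return math.exp(-((x - center) ** 2) / (2 * sigma ** 2))


def fuzzy_channel_reweight(
    feature_maps: List[List[float]], center: float, sigma: float
) -> Tuple[List[List[float]], List[float]]:
    """Scale each channel by the membership degree of its mean activation.

    feature_maps: one flat list of activations per channel (hypothetical layout).
    Returns the re-weighted channels and the attention weights that were used.
    """
    weights = []
    reweighted = []
    for channel in feature_maps:
        descriptor = sum(channel) / len(channel)  # global average pooling
        w = gaussian_membership(descriptor, center, sigma)
        weights.append(w)
        reweighted.append([w * v for v in channel])
    return reweighted, weights


# Toy example: the first channel matches the fuzzy set exactly and is kept,
# the second lies far from the centre and is strongly suppressed.
maps, attn = fuzzy_channel_reweight([[1.0, 1.0], [3.0, 5.0]], center=1.0, sigma=1.0)
print(attn[0])  # 1.0
```

Unlike a hard threshold, the membership degree varies continuously between 0 and 1, so channels near the boundary are attenuated rather than discarded.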

Despite these strengths, several important limitations must be considered. The most notable is that only 19 of the 32 publications included clear and concise information in the ablation studies to directly evaluate the impact of integrating the three approaches on the outcome. This lack of ablation studies (evaluating the impact of each technique or algorithm) in a significant number of papers limits the generalizability and ability to attribute improvements to attention or fuzzy logic components conclusively. Without these ablation studies, it is difficult to accurately determine the individual contribution of each technique to the overall performance of the model.

Another crucial limitation is the heterogeneity in the way performance metrics are reported. Sometimes, the values of the same metrics are presented as percentages or decimal values, which makes direct comparison and accurate quantitative aggregation of results between different publications somewhat difficult. Furthermore, although statistical improvements are reported, the publications do not thoroughly explore the clinical significance of these variations. A slight percentage improvement, although statistically significant, may have a limited clinical impact in real-world scenarios.
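
The reporting heterogeneity described above can be handled explicitly when aggregating results. The following minimal sketch converts a reported gain to percentage points given the unit stated in the source publication. The helper name `to_percentage_points` and the unit labels are assumptions made for illustration; the two example values are taken from Table A1, and treating the value from [33] as a fraction on a 0 to 1 scale is an assumption.

```python
def to_percentage_points(value: float, unit: str) -> float:
    """Express a reported metric gain in percentage points.

    unit: "percent" if the publication reports, e.g., 4.28 (%), or
    "fraction" if it reports, e.g., 0.0158 on a 0-1 scale. The unit has to be
    read from the publication; it cannot be inferred reliably from the
    number alone.
    """
    if unit == "percent":
        return value
    if unit == "fraction":
        return value * 100.0
    raise ValueError(f"unknown unit: {unit!r}")


# ACC gain from Hybrid Attention in [38], reported as a percentage.
print(to_percentage_points(4.28, "percent"))
# ACC gain from the Fuzzy Attention layer in [33], assumed to be a fraction.
print(round(to_percentage_points(0.0158, "fraction"), 2))
```

Even after unit conversion, gains remain comparable only when they refer to the same metric, dataset, and baseline.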

Finally, the use of in-house or private hospital datasets in some publications, although not explicitly stated as a limitation by the synthesis, raises questions of generalizability if these data are not representative or sufficiently diverse compared to public and standardized datasets. Some publications do not even detail the datasets or hyperparameters used in training, which makes replication and in-depth analysis difficult.

Additionally, it is essential to recognize that, due to the specific objectives of our systematic review (focused on the analysis of the integration of computational techniques), the discussion of several crucial topics was intentionally considered outside the scope of the study:

Absence of meta-analysis: Our systematic review was based on a narrative synthesis of the evidence found. This methodological choice means that the review does not include meta-analyses. It is crucial to note that the PRISMA methodology does not require conducting a meta-analysis; therefore, the absence of a quantitative synthesis typical of a meta-analysis should not be interpreted as a methodological weakness or a failure to comply with the PRISMA methodology.
Clinical, regulatory, and ethical applicability: The study’s focus was strictly limited to the analysis of the integration of computational techniques. Consequently, the analysis of regulatory barriers (such as FDA/CE certification), real-world deployment challenges, or ethical gaps related to algorithmic bias was outside the scope and focus of this systematic review.
Computational resource considerations: Similarly, detailed evaluation of computational costs, resource consumption (such as GPU memory or inference latency), and optimization of model parameter counts were not part of the data inclusion or extraction criteria. While these aspects are crucial for real-time applicability, their comprehensive analysis is considered a future line of research.

5.10. Implications for Medical Practice and Research

The results of the publications included in our systematic review have significant implications for both medical practice and artificial intelligence research, as well as opening up several lines of future research.

5.10.1. Implications for Medical Practice

Improved diagnosis and segmentation: The integration of artificial neural networks (ANNs), attention mechanisms, and fuzzy logic has consistently demonstrated improved performance in different medical diagnosis and segmentation tasks. The diversity of the tasks suggests considerable potential for assisting healthcare professionals in decision making, offering greater accuracy in identifying pathologies such as cancer, heart disease, or neurological disorders.
Versatility in data modalities: Hybrid models are adaptable to a wide range of medical data, including various images (CT, MRI, X-rays, and ultrasound, among others), as well as nonimage data such as sound signals, EEG, and mRNA expression matrices. This versatility enables the techniques to be applied across multiple medical specialties, including cardiology, neurology, oncology, and dermatology, thereby expanding their potential impact.
Uncertainty and relevance management: Attention mechanisms enable models to focus on the most relevant characteristics, thereby improving diagnostic accuracy and efficiency. Complementarily, fuzzy logic is crucial for managing the ambiguity and granularity inherent in medical data. This handling of ambiguity is particularly valuable in clinical contexts where the boundaries between normal and pathological can be blurred or where data are noisy or incomplete, and it could lead to more robust and potentially more interpretable diagnoses for medical professionals.

5.10.2. Implications for Research

Need for rigorous ablation studies: Future research should prioritize the inclusion of ablation studies to directly evaluate the impact of integrating each technique. Our observation that only 19 of the 32 publications included clear and concise information on the specific impact of attention mechanisms or fuzzy logic in their ablation studies is a significant limitation to conclusively and definitively attributing performance improvements to the aforementioned techniques. Evaluating each technique is crucial for the internal validity and replicability of the findings.
Standardization of metrics and clinical evaluation: Heterogeneity in the reporting of performance metrics (percentages or decimal values) makes direct comparisons between publications difficult. Research should move toward standardization in reporting results. In addition, it is essential that, beyond statistical significance, the clinical significance of the observed improvements be thoroughly explored to ensure that computational advances have a real and tangible impact on patient care.
Generalization of models: The use of in-house or private hospital datasets in some publications raises questions about the generalization of models to more diverse populations or different clinical settings. Future research should consider the use of larger and more diverse public datasets or techniques that ensure better generalizability to data not seen during training.
Optimization of computational complexity: The integration of multiple networks and techniques, while improving performance, can increase computational complexity in terms of computational resource consumption (CPU and memory). Future research should optimize models by reducing the number of parameters and minimizing processing and memory resource requirements while maintaining accuracy.

5.11. Suggestions for Future Lines of Research

We identified the following lines of research based on the areas of opportunity and considerations identified in the publications included in our systematic review.

(A): Improving generalization and robustness

Explore adversarial learning and transfer learning to improve the application of large models on medical datasets and build models with stronger generalization capabilities, as seen in [13].
Utilize large-scale samples to evaluate proposed methods and address the high specificity observed among subjects, as seen in [17].
Apply models to other complex medical datasets and conditions to test and improve model adaptability and effectiveness, as seen in [25,34,41].
Consider the generalization of the model to the global population, as seen in [38].

(B): Multimodal data integration

Expand the methodology to multimodal datasets (e.g., combining MRI with CT or images with clinical and textual data) for more comprehensive and accurate diagnoses, including:

Include multimodal data to build models with stronger generalization capabilities, as seen in [13].
Collate datasets containing both images and related textual information and explore the use of CLIP models to improve generalization and performance, as seen in [11].
Expand the proposed methodologies to multimodal image datasets, as seen in [23].
Merge multiple data sources (heart sound signals, images, clinical data, etc.), as seen in [14].

(C): Real-time applications

Develop new versions of models for video data, enabling real-time identification of pathological conditions, as seen in [22].
Expand the methodology to real-time diagnostic applications, as seen in [18,23].

(D): Exploring generative models and contrastive learning

Explore generative models for synthesizing images and utilize contrastive learning approaches to address the data imbalance problem, as observed in [11].

(E): Expansion to new tasks and data types

Extend the methods to more types of medical imaging tasks, such as tumor segmentation, specific organ segmentation, and lesion detection, as seen in [42].
Apply the model to the automated classification of skin lesions from digital photographs (particularly facial skin diseases such as acne, rosacea, etc.), as seen in [19].
Apply the proposed model to 3D medical image processing, as seen in [37].

(F): Hybrid model optimization

Optimize individual models to reduce the number of parameters while maintaining prediction accuracy, addressing computational complexity, as observed in [16].
Explore the use of swarm intelligence optimization algorithms, as seen in [39].

(G): New fuzzy logic techniques or types of training

Introduce fuzzy cost functions for improved optimization or incorporate other membership functions, such as generalized bell and sigmoid functions, as observed in [28].
Consider semi-supervised and unsupervised learning as possible directions for future research, as seen in [42].
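
For reference, the two membership functions named in line (G) can be sketched as follows, using their standard textbook definitions. The parameter names (`a` for width or slope, `b` for shape, `c` for centre or crossover point) follow common convention and are assumptions, not notation taken from [28].

```python
import math


def generalized_bell(x: float, a: float, b: float, c: float) -> float:
    """Generalized bell membership: 1 / (1 + |(x - c) / a|^(2b)).

    c is the centre, a the half-width at membership 0.5, b the steepness.
    """
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))


def sigmoid_membership(x: float, a: float, c: float) -> float:
    """Sigmoid membership: 1 / (1 + exp(-a * (x - c))).

    c is the crossover point (membership 0.5), a the slope.
    """
    return 1.0 / (1.0 + math.exp(-a * (x - c)))


# Membership is 1 at the bell centre and 0.5 one half-width away from it.
print(generalized_bell(5.0, a=2.0, b=3.0, c=5.0))  # 1.0
print(generalized_bell(7.0, a=2.0, b=3.0, c=5.0))  # 0.5
print(sigmoid_membership(5.0, a=1.5, c=5.0))       # 0.5
```

The bell function is symmetric around its centre and suits concepts such as "within the normal range", whereas the sigmoid is monotonic and suits open-ended concepts such as "high intensity".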

6. Conclusions

The integration of artificial neural networks (ANNs), attention mechanisms, and fuzzy logic has consistently demonstrated a significant improvement in the performance of computational models for medical diagnosis support. This fusion is crucial for addressing the challenges inherent in clinical data, thereby driving the development of more robust, accurate, and interpretable diagnostic systems. ANNs (such as CNNs, MLPs, Transformers, GRUs, and GNNs) offer the ability to extract complex patterns and handle various types of medical data, including images, signals, and structured data, while adapting to specific classification, segmentation, or reconstruction tasks. Attention mechanisms enhance the accuracy and efficiency of models by allowing them to focus on the most relevant features of the data, improving the representation and discernment of critical information. Fuzzy logic is crucial for managing the uncertainty, ambiguity, and granularity inherent in medical data; its versatility allows it to be integrated into any phase of the process, from pre-processing to model architecture, loss function, and post-processing of results.

The implications for medical practice are considerable, as this integration has the potential to assist healthcare professionals in decision making, offering greater accuracy in the identification of pathologies and the segmentation of anatomical structures. The versatility of these hybrid models across diverse data modalities (CT images, MRI, X-rays, and ultrasound, as well as audio signals, EEG, and mRNA expression matrices) extends their potential impact to multiple medical specialties. Improved management of uncertainty and data relevance leads to more robust and potentially more interpretable diagnoses for medical professionals, especially in contexts where boundaries are fuzzy or data is noisy.

Despite these promising findings, research in this field presents key areas for improvement and limitations that need to be addressed in future research. One notable limitation is that only 19 of the 32 publications analyzed included ablation studies with clear and concise information on the specific components of attention mechanisms or fuzzy logic, restricting the ability to attribute performance improvements to individual components conclusively.

It is essential that future research not only seek the significance of its statistical results or percentage metrics but also explore the clinical significance of the improvements observed to ensure that computational advances have a tangible impact on supporting medical diagnosis. The generalization of models is also a challenge, given the use of in-house datasets in some publications. Considering the above, we identify the following lines of future research:

Enhance generalization and robustness by exploring adversarial and transfer learning, utilizing larger and more diverse public datasets, and assessing the applicability of models to global populations.
Expand multimodal data integration, combining diverse sources such as images, clinical data, and text for more comprehensive diagnoses.
Develop real-time applications, such as identifying pathological conditions in video data.
Explore generative models and contrastive learning to synthesize images and handle data imbalance.
Extend methods to new tasks and new types of medical data.
Optimize hybrid models to minimize computational resource consumption and reduce the number of model parameters without compromising accuracy.
Investigate new fuzzy logic techniques or training types, such as introducing fuzzy cost functions or semi-supervised and unsupervised learning.

The great versatility of fuzzy logic enables deep customization in integration with artificial neural networks and attention mechanisms, making an experimentation phase necessary to determine the most effective integration for a specific proposal. This deep customization will contribute to the creation of more reliable diagnostic systems applicable in real clinical environments.

Author Contributions

Conceptualization, N.Z.-M., J.A.H.-N., and P.P.; methodology, N.Z.-M., J.A.H.-N., and P.P.; investigation, N.Z.-M., J.A.H.-N., and P.P.; writing—original draft preparation, N.Z.-M., J.A.H.-N., P.P., and M.G.-C.; writing—review and editing, N.Z.-M., P.P., J.A.H.-N., and M.G.-C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially funded by a postdoctoral fellowship from the Secretariat of Science, Humanities, Technology, and Innovation (SECIHTI) (grant number CVU: 826483) in Mexico.

Institutional Review Board Statement

The study followed the PRISMA-P protocol titled: “Artificial Neural Network, Attention Mechanism and Fuzzy Logic-Based Approaches for Medical Diagnostic Support: A Systematic Review.” It was approved by the Institutional Research Ethics Committee of Juarez Autonomous University of Tabasco (UJAT) (protocol code UJAT-CIEI-2025-062, approval date 24 July 2025).

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

We express our gratitude to the Secretariat of Science, Humanities, Technology, and Innovation (SECIHTI) and the Juarez Autonomous University of Tabasco (UJAT) for providing us with the necessary academic resources for this research.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Most Important Key Data Extracted to Answer the Research Questions of Our Systematic Review

See Table A1.

Table A1. Important key data extracted.

Ref.	Application Area	Input Data	Output Data	ANN Model	Attention Mechanism	Fuzzy Logic Technique	Integration Method	Impact on Results
[38]	Classification (Alzheimer’s disease)	Image	3 classes	CNN, MLP	Hybrid Attention Mechanism (HAM)	Fuzzy Color Method and Fuzzy Stacking	They utilize two convolutional modules, combined with a Hybrid Attention module (comprising a Channel Attention module and a Spatial Attention module based on CNN and MLP) and a Fuzzy Supervised Contrastive Loss to alleviate label stiffness and enhance the network’s generalization ability.	The use of the attention technique (Hybrid Attention) improved 3 metrics compared to when the attention technique was not used: ACC: 4.28%, SEN: 4.85%, SPE: 4.99%. The use of the fuzzy logic technique (Fuzzy Supervised Contrastive Loss) improved 3 metrics compared to when the fuzzy logic technique was not used: ACC: 10.74%, SEN: 10.05%, SPE: 8.45%.
[39]	Classification (Autism)	Image	2 classes	Transformer, MLP	Multihead Self-Attention (Transformer)	Fuzzy Rough CNN	First, the model utilizes a Multioutput Takagi–Sugeno–Kang Fuzzy System to address the interference caused by inherent human body noise and equipment factors during the rs-fMRI data collection process, thereby providing a highly interpretable feature for subsequent work. Second, they employ an encoder–decoder architecture (MLP + Transformer). The encoder maps input data to a low-dimensional hidden representation, while the decoder maps the hidden representation back to the original data space. Finally, they use an MLP for classification.	The use of the fuzzy logic technique (Multioutput Takagi–Sugeno–Kang Fuzzy System) improved 2 metrics compared to when the fuzzy logic technique was not used: ACC: 24.6%, SPE: 59.9%.
[18]	Classification (Brain tumor)	Image	3 classes	CNN, Random-Coupled Neural Network, Graph Neural Network	Contextual Attention Network and Substructure Aware Graph Neural Network Attention	Multioutput Takagi–Sugeno–Kang Fuzzy System	The workflow of the proposal is: (A) pre-processing and feature extraction using Contextual Attention Network with Convolutional Auto Encoder (CAN-CAE); (B) segmentation (clustering) using Adaptively Regularized Kernel-based Fuzzy C-Means (ARKFCM); (C) classification using Random-Coupled Neural Network with Substructure Aware Graph Neural Network Attention (RCNN-SAGNNA).	No data
[16]	Classification (Breast tumor malignancy)	Image	2 classes	CNN, Transformer	Multihead Attention	Feature Selective Enhancement Module	They use four models, one CNN-based and 3 Transformer-based; the four models generate four individual malignancy scores. Finally, they fused malignancy scores utilizing learnable fuzzy measures.	No data
[33]	Classification (Cancer detection)	Image	2 classes	MLP	Self-Attention Mechanism	Fuzzy Attention Layer	The architecture has three cascaded components. The Type-2 Fuzzy Subsystem (component 1) takes raw data as input and outputs the corresponding fuzzy samples. In this way, the fuzzy samples are transferred to a better representation through the graph-based Generalized Correntropy Auto-encoder (component 2). The Fuzzy Graph Structure Attention Network (component 3) further models the structural uncertainty or fuzziness and extracts the structural information.	The use of attention and fuzzy logic techniques (Fuzzy Attention layer) improved 2 metrics compared to when the attention and fuzzy logic techniques were not used: ACC: 0.0158, F1: 0.0133
[32]	Classification (Cancer)	Image	NA	CNN, MLP	Channel Attention Module (CAM) and Position Attention Module (PAM)	Adaptive Neuro-Fuzzy Inference System (ANFIS)	The model uses three CNN-based networks (DAV-Net, EfficientNet, and DRN). The first CNN-based network (DAV-Net) used a position attention module and a Channel Attention module for segmentation. For classification, they use a fuzzy-ER net layer based on fuzzy partition, rule generation, and categorization of new patterns.	No data
[14]	Classification (Cardiovascular disease)	Sound	2 classes	1D-CNN, 2D-CNN, MLP	Attention Mechanism based on Self-Attention	Type-2 Fuzzy Subsystem	They utilize a network comprising 1D-CNN and 2D-CNN with an attention mechanism (transformer-based) and MLP. Then, they use a fuzzy inference system as a classifier.	No data
[29]	Classification (Diabetic Retinopathy Grading)	Image	5 classes	CNN, MLP	Fuzzy-Enhanced Holistic Attention Mechanism	Fuzzy-Enhanced Holistic Attention (FEHA)	First, a multiscale feature encoder based on CNN is applied to encode a set of feature maps from different scales. Then, for each scale feature map, fuzzy-enhanced holistic attention is used to generate channel-spatial attention weights by modeling the interaction of channel information and spatial location (fuzzy membership functions are applied in channel-spatial attention). Then, a fuzzy learning-based cross-scale fusion module is employed to integrate feature representations. Finally, the classifier is implemented using a fully connected layer with a softmax function to predict the grading results.	The use of attention and fuzzy logic techniques (Fuzzy-Enhanced Holistic Attention) improved 4 metrics compared to when the attention and fuzzy logic techniques were not used (averaged): ACC: 2.92%, kappa: 3.73%, Pre: 2.93%, F1-score: 4.08%.
[11]	Classification (Glaucoma Stages)	Image	3 classes	CNN, Transformer, MLP	Self-Attention Mechanism	Fuzzy Layer	They utilize CNN for feature extraction, then employ a Fuzzy Joint Attention Module, comprising two attention blocks: a Local–Global Channel Attention block and a Local–Global Spatial Attention block (CNN with Self-Attention), along with a fuzzy layer to convert feature maps into fuzzy maps. Finally, they use an MLP as a classifier.	The use of the fuzzy logic technique (fuzzy layer) improved 3 metrics compared to when the fuzzy logic technique was not used: ACC: 1.9%, F1: 2.2%, AUC: 0.01%. The use of the attention technique (Channel Attention) improved 3 metrics compared to when the attention technique was not used: ACC: 1.9%, F1: 2.2%, AUC: 0.01%. The use of the attention technique (Spatial Attention) improved 3 metrics compared to when the attention technique was not used: ACC: 1.7%, F1: 2%, AUC: 0.01%.
[13]	Classification (Cell structures)	Image	5 to 9 classes	CNN, Transformer	Self-Attention Mechanism	Fuzzy-Guided Cross-Attention	They use CNN for multigranular feature extraction and also extract three fuzzy features from input images by applying different membership functions. Then, they use fuzzy-guided cross-attention (a Self-Attention Mechanism from the Transformer) for feature fusion and processing. Finally, they use an MLP for classification.	The use of the attention technique (Fuzzy-Guided Cross-Attention) improved 3 metrics compared to when the attention technique was not used: ACC: 2.7%, TPR: 2.4%, PPV: 3.15%. The use of the fuzzy logic technique (Universal Fuzzy Feature) improved 3 metrics compared to when the fuzzy logic technique was not used: ACC: 6%, TPR: 5.4%, PPV: 6.3%.
[44]	Classification (Oral health)	Image	2 classes	MLP	Multihead Self-Attention	Fuzzy Activation Function	First, they use image enhancement with a fuzzy color method as a pre-processing step. Then, they use a Swin transformer to process data and make the classification.	The use of the fuzzy logic technique (Fuzzy color method) improved 4 metrics compared to when the fuzzy logic technique was not used: ACC: 0.0262, Pre: 0.0238, Rec: 0.0329, F1: 0.0161
[23]	Classification (Lung Disorder)	Image	3 classes	CNN, Transformer, MLP	Hybrid Attention Mechanism (Channel Attention and Spatial Attention)	Fuzzy-Rough Set-Based Fitness Function	The architecture utilizes EfficientNet and Vision Transformer for feature extraction (CNN-based models), then incorporates a Hybrid Attention Mechanism, and finally employs an MLP for class predictions. The Fuzzy-Enhanced Firefly Algorithm (Fuzzy Logic) improves the algorithm’s adaptability during the optimization procedure.	The use of the fuzzy logic technique (Fuzzy-Enhanced Firefly Algorithm) improved 4 metrics compared to when the fuzzy logic technique was not used: ACC: 1.7%, F1: 1.8%, Pre: 1.8%, Rec: 1.9%.
[35]	Classification (Lymph node metastasis)	Image	2 classes	CNN, MLP	Spatial-Channel Attention	Fuzzy Supervised Contrastive Loss	The model consists of three key parts: (1) CNN for feature extraction with fuzzy c-means clustering. (2) The Spatial-Channel Attention block consists of a spatial branch and a channel branch, which is used for the feature map. (3) The classification network takes the representations from the previous two parts as input.	The use of the attention technique (Spatial-Channel Attention) improved 3 metrics compared to when the attention technique was not used: Pre: 0.022, Rec: 0.022, F1: 0.002. The use of the fuzzy logic technique (Fuzzy C-Means Clustering) improved 3 metrics compared to when the fuzzy logic technique was not used: Pre: 0.032, Rec: 0.008, F1: 0.013.
[34]	Classification (Lymphoblastic leukemia)	Image	2 classes	CNN, MLP	Fuzzy Attention-based Feature Extraction	Fuzzy C-Means Clustering	Their model combines a CNN and an MLP in two pathways, and the features extracted from both pathways are then merged. They add a Feature Selective Enhancement module, which focuses the model’s attention on relevant morphological patterns. The Feature Selective Enhancement module is based on a CNN with a membership function layer and a fuzzy rule layer.	No data
[17]	Classification (Parkinson’s disease)	Signal	2 classes	CNN, GRU, MLP	Dual Attention Network (DANet)	Learnable Fuzzy Measure	First, they use a fuzzy entropy algorithm in the pre-processing stage. They employ a 3-module architecture (temporal separable convolution, dual attention network, and GRU network) and subsequently calculate the classification using an MLP.	The use of the attention technique (Dual Attention Network) improved 4 metrics compared to when the attention technique was not used: ACC: 0.836, Pre: 0.853, Rec: 0.80, F1: 0.83. The use of the fuzzy logic technique (Fuzzy Entropy Algorithm) improved 4 metrics compared to when the fuzzy logic technique was not used: ACC: 5.44%, Pre: 5.46%, Rec: 5.24%, F1: 5.44%.
[12]	Classification (Pneumonia)	Image	3 classes	CNN, MLP	One Channel Attention Module (CAM) and two Spatial Attention Modules (SAMavg and SAMmax)	Adaptively Regularized Kernel-Based Fuzzy C-Means	The architecture first uses a CNN for feature extraction. Then, the feature vector passes through three branches, which are treated by one Channel Attention Module (CAM) and two Spatial Attention Modules (SAMavg and SAMmax). The new feature map serves as input for the Fuzzy Channel Selection (FCS) module, which generates a feature map with the top channels. The classification layer then processes this flattened feature to predict the image’s output.	The use of the fuzzy logic technique (Fuzzy Channel Selection) improved 5 metrics compared to when the fuzzy logic technique was not used: ACC: 2.55%, Pre: 2.43%, Rec: 1.93%, F1: 3.1%, AUC: 2.18%.
[36]	Classification (Prostate cancer)	Data array	2 classes	MLP, Graph Neural Network	Multihead Attention	Self-Organizing Fuzzy Inference System	They applied a classifier model to assess the discrimination ability of the selected features. This classifier model consists of three layers: a Graph Generator layer, a Graph Fuzzy Attention Network (GFAT) layer, and a Fully Connected (FC) layer.	No data
[31]	Classification (Skin cancer)	Image	2 classes	CNN, GRU	Temporal and Spatial Attention Module	Fuzzy Entropy Algorithm	In the pre-processing stage, the region of interest (ROI) is segmented using a Fuzzy C-Means (DicL-FCM) clustering technique. Then, the classification is performed by the Convolutional Gated Recurrent Network (TA_CGRNet), which is designed by hybridizing the Gated Recurrent Unit (GRU) and CNN with a temporal and spatial attention module.	No data
[19]	Classification (Skin lesion)	Image	7 classes	GANs, CNN, MLP	Efficient Channel Integrating Spatial Attention Module	Fuzzy C-Means (DicL-FCM) Clustering	Three CNN-based models with channel and spatial attention mechanisms (MLFF-InceptionV3, MLFF-Xception, and MLFF-DenseNet121) are used to compute decision scores. Then, a Fuzzy Rank-Based Ensemble approach is employed to reduce the dispersion of individual-based predictions and enhance classification accuracy.	No data
[28]	Organ segmentation (Airway)	Image	Mask and 3D reconstruction of organ	CNN	Fuzzy Attention Layer	Fuzzy Channel Selection Module	Encoder–decoder architecture based on CNN with residual connections between encoder and decoder. Each residual connection contains a fuzzy attention layer. The fuzzy attention layer takes both feature representations from the encoder and decoder layers and applies a learnable Gaussian membership function to them.	The use of attention and fuzzy logic techniques (Fuzzy Attention layer) improved 5 metrics compared to when the attention and fuzzy logic techniques were not used in a base model: Pre: 0.0015, DLR: 1.39%, DBR: 2.18%, AMR: 0.34%, CCF: 0.0051.
[15]	Organ segmentation (Brain tumor)	Image	Area segment	CNN, Transformer, MLP	Multihead Self-Attention (Transformer), Refine Module	Fuzzy-Enhanced Firefly Algorithm (FEFA)	They utilize a hybrid joint that combines Transformer and convolution (Hybrid-Transformer) and subsequently propose a refinement module to preserve and refine the downsampled features, which incorporates an attention mechanism based on average and maximum pooling. Finally, a fuzzy selector is proposed to process the segmentation results further.	No data
[9]	Organ segmentation (Cardiac left atrium)	Image	Image with green and red lines	CNN, Transformer	Multihead Self-Attention Mechanism	Anchor-Wise Fuzziness Module	Encoder–decoder architecture, convolution for contraction, and the Transformer for expansion, with a mask matrix in the middle that incorporates fuzziness elements.	The use of the attention technique (Self-Attention Mechanism) improved 4 metrics compared to when the attention technique was not used: DICE: 2.8%, JSC: 3.77%, HD: 2.33, ASD: 1.54.
[24]	Organ segmentation (Cardiac)	Image	Image with 3 areas and black background	CNN	Higher-Performance dyConv Structure as an Attention Mechanism	Fuzzy-ER Net Layer	Encoder–decoder architecture based on CNN with residual connections between encoder and decoder. They include a spatial fuzzy convolutional layer (SFConv) and a channel fuzzy convolutional layer (CFConv) in the encoder and decoder. They add a dyConv structure (CNN-based) that calculates the attention of the feature map in the channel or spatial dimension.	The use of the attention technique (Channel and Spatial Attention) improved 2 metrics compared to when the attention technique was not used in a base model: DSC: 1.63%, HD: 1.12 mm. The use of the fuzzy logic technique (fuzzy convolutional module) improved 2 metrics compared to when the fuzzy logic technique was not used in a base model: DSC: 3.05%, HD: 1.72 mm.
[37]	Organ segmentation (Cardiac)	Image	Image with segmentation area	CNN	Fuzzy Attention Module (Fuzzy Channel Attention and Fuzzy Spatial Attention)	Fuzzy C-Means Clustering (FCM).	Encoder–decoder architecture based on CNN with residual connections between encoder and decoder. They add a Fuzzy Channel Attention Module on the encoder and a Fuzzy Spatial Attention Module on the decoder. Based on the attention mechanism, fuzzy membership is used to recalibrate the importance of the pixel value of each local area.	No data
[43]	Organ segmentation (Diabetic foot ulcers)	Image	Segmentation masks	CNN	Attention Gate Module	Fuzzy Selector	They employ an encoder–decoder architecture, utilizing a fuzzy logic approach to the activation function and attention gates.	No data
[25]	Organ segmentation (Fundus/eyes)	Image	Probability maps of segmentation regions	CNN	Dual Attention Mechanism (Spatial-Channel), Attention Refinement Module	Fuzzy Convolutional Module	First, feature discretization based on the rough fuzzy set is performed. Then, the data serves as input to the encoder–decoder architecture with skip connections, which introduces a dual attention mechanism that combines spatial regions and feature channels. Next, the attention refinement module captures multiscale contextual information as feature maps. Finally, two CNN layers generate probability maps of segmentation regions.	The use of the fuzzy logic technique (Fuzzy C-Means Clustering) improved 5 metrics compared to when the fuzzy logic technique was not used: DSC: 0.02, HD95: 0.02, ASD: 0.02, SEN: 0.02, SPE: 0.05.
[27]	Organ segmentation (Lung)	Image	Border mask	CNN	Fuzzy Attention-based Module	Fuzzy Attention Gate (FAG)	They propose a fuzzy attention-based, Transformer-like encoder–decoder architecture. The encoder and decoder are based on CNN, and they are connected with residual connections. In each residual connection, they add a "fuzzy attention module" to focus on pertinent regions of the encoder and decoder outputs, utilizing four Gaussian membership functions. Finally, they use a feature fusion module to focus on image borders.	The use of attention and fuzzy logic techniques (Fuzzy Attention) improved 6 metrics compared to when the attention and fuzzy logic techniques were not used: IoU: 0.83, Pre: 0.39%, DLR: 4.03, DBR: 4.12, AMR: 0.47, CCF: 2.24.
[42]	Organ segmentation (Multiorgan)	Image	Reconstructed image	CNN	Iterative Attention Fusion Module	Fuzzy Learning Module	Encoder–decoder architecture based on CNN with residual connections between encoder and decoder. Each residual connection utilizes a fuzzy operation to suppress irrelevant information. The decoder utilizes a target attention mechanism based on a CNN to further enhance the feature representation. The output is a mask. Finally, they use an ensemble of multiview masks to generate a 3D representation of the segmentation.	The use of the attention technique (Attention Fusion) improved 2 metrics compared to when the attention technique was not used: DSC: 0.59%; HD95: 0.7 mm. The use of attention and fuzzy logic techniques (Attention Fusion + Fuzzy Learning) improved 2 metrics compared to when the attention and fuzzy logic techniques were not used: DSC: 1.24%; HD95: 1.52 mm.
[10]	Organ segmentation (Pancreas)	Image	Mask and 3D reconstruction of organ	CNN	Target Attention Mechanism Based on CNN	Fuzzy Skip Connections	Encoder–decoder architecture based on CNN with residual connections between encoder and decoder. Each residual connection utilizes a fuzzy operation to suppress irrelevant information. The decoder uses a target attention mechanism based on CNN to further improve the feature representation. The output is a mask. Finally, they use an ensemble of multiview masks to generate a 3D representation of the segmentation.	The use of attention and fuzzy logic techniques (Target attention + fuzzy skip connections) improved 4 metrics compared to when the attention and fuzzy logic techniques were not used: DSC: 7.1%, VOE: 10.24%, ASSD: 0.55 mm, RMSD: 0.17 mm.
[41]	Organ segmentation (Subretinal Fluid Lesions)	Image	NA	CNN	Dual Attention Mechanism (Spatial and Channel), and Attention Refinement Module	Fuzzy Clustering	First, they applied fuzzy clustering to OCT fundus images. Then, the fuzzy set was combined with the rough set in a rough approximation space to construct a fuzzy rough model for feature discretization in SD-OCT fundus images. Finally, the images after feature discretization are used as input data, and the deep attention modules are employed to capture multiscale information within the fully convolutional neural network architecture.	The use of the attention technique (Dual Attention Mechanism) improved 3 metrics compared to when the attention technique was not used: DSC: 0.0376, HD95: 0.0794, ASD: 0.0799.
[40]	Reconstruction (COVID-19 CT reconstruction)	Image	Reconstructed image	CNN, LSTM, MLP	Attention Mechanism based on CNN and others based on Conv-LSTM	Fuzzy Network	They utilized a fuzzy neural information processing block, which combines a CNN and a fuzzy network (to match the output of the fuzzy neurons in the membership function layer according to specific rules) and then performed fuzzy logic operations to calculate the uncertainty of the input pixels. Then, they utilize feature extraction with a CNN that incorporates residual connections (MGLDB module) and two attention modules, based on CNN and Conv-LSTM, to reconstruct the image.	No data
[22]	Tissue location (Polyp positions)	Image	Area coordinates (square)	CNN, MLP	Hybrid Attention Mechanism	Fuzzy Pooling	The architecture utilizes a Channel Attention block and a Spatial Attention block, combined with a CNN, to compute the coordinates of gastric polyps. A CNN based on fuzzy pooling then performs feature extraction on the image. Finally, the model integrates both outputs and feeds them into the FCN to obtain the coordinates of gastric polyps.	No data
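
Several rows of the table above (for example [27,28]) describe a fuzzy attention layer that applies learnable Gaussian membership functions to encoder and decoder features. The following is a minimal sketch of that general idea, not the implementation from either publication: the function names, the averaging over membership functions, and the fixed centers and widths (which would be learned during training) are assumptions made for illustration.

```python
import math

def gaussian_membership(x, center, sigma):
    """Degree in [0, 1] to which x belongs to a fuzzy set centered at `center`."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))

def fuzzy_attention(features, centers, sigmas):
    """Reweight each feature by its mean membership across several fuzzy sets.

    Hypothetical gate: in a trained network `centers` and `sigmas` would be
    learnable parameters; here they are fixed constants.
    """
    gated = []
    for x in features:
        memberships = [gaussian_membership(x, c, s) for c, s in zip(centers, sigmas)]
        weight = sum(memberships) / len(memberships)  # attention weight in [0, 1]
        gated.append(weight * x)
    return gated

features = [0.0, 0.5, 1.0, 3.0]
gated = fuzzy_attention(features, centers=[0.5, 1.0], sigmas=[0.5, 0.5])
```

Features close to a membership center pass through almost unchanged, while values far from every center (such as 3.0 here) are strongly suppressed, which is the gating behavior the descriptions attribute to fuzzy attention.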

Appendix B. Results of the Risk of Bias Assessment of the Included Publications in Our Systematic Review

See Table A2.

Table A2. Results of the risk of bias assessment.

Ref.	Q-01	Q-02	Q-03	Q-04	Q-05	Q-06	Q-07	Q-08	Q-09	Q-10	Q-11	Score	Status
[9]	1	0.5	1	1	1	1	1	1	1	1	0	9.5	Included
[13]	1	1	1	1	1	1	1	1	1	1	1	11	Included
[40]	1	0.5	1	1	1	1	1	1	0	1	0	9	Included
[11]	1	0.5	1	1	1	1	1	1	1	1	0	9.5	Included
[27]	1	0.5	1	1	1	1	1	1	1	1	0	9.5	Included
[10]	1	0.5	1	1	1	1	1	1	0.5	1	0	9	Included
[42]	1	1	1	1	1	1	1	1	1	1	0	10	Included
[22]	1	0.5	1	1	1	1	1	1	0	1	0	9	Included
[29]	1	0.5	1	1	1	1	1	1	1	1	0	9.5	Included
[41]	1	1	1	1	1	1	1	1	1	1	1	11	Included
[23]	1	1	1	1	1	1	1	1	1	1	0.5	10.5	Included
[39]	1	1	1	1	1	1	1	1	1	1	0.5	10.5	Included
[15]	1	0.5	1	1	1	1	1	1	0	0.5	0	8.5	Included
[18]	1	1	1	1	1	1	1	1	0	1	0.5	9.5	Included
[43]	1	0.5	1	1	1	1	1	1	0	0.5	0	8	Included
[12]	1	0.5	1	1	1	1	1	1	1	1	0	9.5	Included
[44]	1	1	1	1	1	0.5	0.5	1	0.5	1	1	9.5	Included
[28]	1	1	1	1	1	1	1	1	1	1	1	11	Included
[38]	1	1	1	1	0.5	1	1	1	0.5	1	0	9	Included
[33]	0.5	0.5	1	1	1	1	1	0.5	0.5	0.5	0	7.5	Included
[35]	1	1	1	1	1	1	1	1	0.5	1	0.5	10	Included
[14]	1	1	1	1	0.5	0.5	0.5	1	0	1	0.5	8	Included
[34]	1	0.5	1	1	0.5	0.5	0.5	1	0	1	1	8	Included
[36]	1	1	0.5	1	1	1	1	1	0	1	1	9.5	Included
[16]	1	0.5	1	1	1	1	1	1	0	1	0	8.5	Included
[32]	1	1	1	1	1	1	1	1	0	1	0.5	10	Included
[17]	1	1	1	1	1	1	1	1	0.5	1	1	10.5	Included
[24]	1	1	1	1	1	1	1	1	0.5	1	1	10.5	Included
[31]	1	1	1	1	1	1	1	1	0	1	1	10	Included
[25]	1	1	1	1	1	1	1	1	0.5	1	0.5	10	Included
[19]	1	0.5	1	1	1	1	1	1	0	1	0	9	Included
[37]	1	0.5	1	1	1	1	1	1	0	1	0	9	Included
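
The Score column appears to be the sum of the eleven item ratings (Q-01 to Q-11), each of which takes the value 0, 0.5, or 1. This reading is an assumption that matches rows such as [9] and [13]; it is not a rule stated in this appendix. The sketch below recomputes the score for one row and is intended only as a convenience for checking transcriptions; the function name is hypothetical.

```python
def quality_score(ratings):
    """Sum the per-question ratings (assumed to be 0, 0.5, or 1 each)."""
    if any(r not in (0, 0.5, 1) for r in ratings):
        raise ValueError("each rating is expected to be 0, 0.5, or 1")
    return sum(ratings)

# Ratings for reference [9], copied from Table A2 (Q-01 ... Q-11).
ref_9 = [1, 0.5, 1, 1, 1, 1, 1, 1, 1, 1, 0]
score_9 = quality_score(ref_9)  # 9.5
```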

References

1. Vanstone, M.; Monteiro, S.; Colvin, E.; Norman, G.; Sherbino, J.; Sibbald, M.; Dore, K.; Peters, A. Experienced physician descriptions of intuition in clinical reasoning: A typology. Diagnosis 2019, 6, 259–268.
2. Wen, X. Deep learning framework for enhanced MRI analysis in healthcare diagnosis. Expert Syst. Appl. 2025, 292, 128487.
3. Aggarwal, C.C. Neural Networks and Deep Learning: A Textbook; Springer International Publishing: Cham, Switzerland, 2023.
4. Kufel, J.; Bargiel-Laczek, K.; Kocot, S.; Kozlik, M.; Bartnikowska, W.; Janik, M.; Czogalik, L.; Dudek, P.; Magiera, M.; Lis, A.; et al. What Is Machine Learning, Artificial Neural Networks and Deep Learning?—Examples of Practical Applications in Medicine. Diagnostics 2023, 13, 2582.
5. Phuong, N.H.; Kreinovich, V. Fuzzy logic and its applications in medicine. Int. J. Med. Inform. 2001, 62, 165–173.
6. Bojadziev, G.; Bojadziev, M. Fuzzy Sets, Fuzzy Logic, Applications; Advances in Fuzzy Systems—Applications and Theory; World Scientific: Singapore, 1996; Volume 5.
7. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71.
8. Fang, M.; Wang, Z.; Pan, S.; Feng, X.; Zhao, Y.; Hou, D.; Wu, L.; Xie, X.; Zhang, X.Y.; Tian, J.; et al. Large models in medical imaging: Advances and prospects. Chin. Med. J. 2025, 138, 1647–1664.
9. Zhang, T.; Wang, X. Anchorwise Fuzziness Modeling in Convolution–Transformer Neural Network for Left Atrium Image Segmentation. IEEE Trans. Fuzzy Syst. 2024, 32, 398–408.
10. Chen, Y.; Xu, C.; Ding, W.; Sun, S.; Yue, X.; Fujita, H. Target-aware U-Net with fuzzy skip connections for refined pancreas segmentation. Appl. Soft Comput. 2022, 131, 109818.
11. Das, D.; Nayak, D.R. FJA-Net: A Fuzzy Joint Attention Guided Network for Classification of Glaucoma Stages. IEEE Trans. Fuzzy Syst. 2024, 32, 5438–5448.
12. Roy, A.; Bhattacharjee, A.; Oliva, D.; Ramos-Soto, O.; Alvarez-Padilla, F.J.; Sarkar, R. FA-Net: A Fuzzy Attention-aided Deep Neural Network for Pneumonia Detection in Chest X-Rays. In Proceedings of the 2024 IEEE 37th International Symposium on Computer-Based Medical Systems (CBMS), Guadalajara, Mexico, 26–28 June 2024; pp. 338–343.
13. Ding, W.; Zhou, T.; Huang, J.; Jiang, S.; Hou, T.; Lin, C.T. FMDNN: A Fuzzy-Guided Multigranular Deep Neural Network for Histopathological Image Classification. IEEE Trans. Fuzzy Syst. 2024, 32, 4709–4723.
14. Xiao, F.; Liu, H.; Lu, J. A new approach based on a 1D + 2D convolutional neural network and evolving fuzzy system for the diagnosis of cardiovascular disease from heart sound signals. Appl. Acoust. 2024, 216, 109723.
15. Wu, D.; Nie, L.; Mumtaz, R.A.; Agarwal, K. A LLM-Based Hybrid-Transformer Diagnosis System in Healthcare. IEEE J. Biomed. Health Inform. 2024, 29, 6428–6439.
16. Singh, V.K.; Mohamed, E.M.; Abdel-Nasser, M. Aggregating efficient transformer and CNN networks using learnable fuzzy measure for breast tumor malignancy prediction in ultrasound images. Neural Comput. Appl. 2024, 36, 5889–5905.
17. Li, J.; Li, X.; Mao, Y.; Yao, J.; Gao, J.; Liu, X. Classification of Parkinson’s disease EEG signals using 2D-MDAGTS model and multi-scale fuzzy entropy. Biomed. Signal Process. Control 2024, 91, 105872.
18. Srinivasan, P.S.; Regan, M. Enhancing Brain Tumor Diagnosis with Substructure Aware Graph Neural Networks and Fuzzy Linguistic Segmentation. In Proceedings of the 2024 Second International Conference on Intelligent Cyber Physical Systems and Internet of Things (ICoICI), Coimbatore, India, 28–30 August 2024; pp. 1613–1618.
19. Li, H.; Li, W.; Chang, J.; Zhou, L.; Luo, J.; Guo, Y. Dermoscopy lesion classification based on GANs and a fuzzy rank-based ensemble of CNN models. Phys. Med. Biol. 2022, 67, 185005.
20. Amato, F.; López, A.; Peña-Méndez, E.M.; Vaňhara, P.; Hampl, A.; Havel, J. Artificial neural networks in medical diagnosis. J. Appl. Biomed. 2013, 11, 47–58.
21. Jones, A. Comprehensive Machine Learning Techniques: A Guide for the Experienced Analyst; Walzone Press: Pittsfield, MA, USA, 2025; ISBN 979-8-230-43387-3.
22. Ma, X.; Wang, H.; Ren, X.; Ma, Y. A Hybrid Attention-Based Fuzzy Pooling Network Model for Locating Polyp Positions in Gastroscopic Image in Internet of Medical Things. IEEE Internet Things J. 2025, 1.
23. Pearly, A.A.; Karthik, B. HybridNet-X: A Hybrid Deep Learning Network with Fuzzy-Enhanced Firefly Algorithm for Lung Disorder Diagnosis Using X-Ray Images. In Proceedings of the 2024 International Conference on Innovative Computing, Intelligent Communication and Smart Electrical Systems (ICSES), Chennai, India, 12–13 December 2024; pp. 1–7.
24. Luo, Y.; Fang, Y.; Zeng, G.; Lu, Y.; Du, L.; Nie, L.; Wu, P.Y.; Zhang, D.; Fan, L. DAFNet: A dual attention-guided fuzzy network for cardiac MRI segmentation. AIMS Math. 2024, 9, 8814–8833.
25. Chen, Q.; Zeng, L.; Lin, C. A deep network embedded with rough fuzzy discretization for OCT fundus image segmentation. Sci. Rep. 2023, 13, 328.
26. Chen, L.; Zhang, H.; Xiao, J.; Nie, L.; Shao, J.; Liu, W.; Chua, T.S. SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6298–6306.
27. Zhang, S.; Fang, Y.; Nan, Y.; Wang, S.; Ding, W.; Ong, Y.S.; Frangi, A.F.; Pedrycz, W.; Walsh, S.; Yang, G. Fuzzy Attention-Based Border Rendering Orthogonal Network for Lung Organ Segmentation. IEEE Trans. Fuzzy Syst. 2024, 32, 5462–5476.
28. Nan, Y.; Ser, J.D.; Tang, Z.; Tang, P.; Xing, X.; Fang, Y.; Herrera, F.; Pedrycz, W.; Walsh, S.; Yang, G. Fuzzy Attention Neural Network to Tackle Discontinuity in Airway Segmentation. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 7391–7404.
29. Lin, Z.; He, Z.; Wang, X.; Su, W.; Tan, J.; Deng, Y.; Xie, S. Cross-Scale Fuzzy Holistic Attention Network for Diabetic Retinopathy Grading From Fundus Images. IEEE Trans. Emerg. Top. Comput. Intell. 2025, 9, 2164–2178.
30. PRISMA-P Group; Moher, D.; Shamseer, L.; Clarke, M.; Ghersi, D.; Liberati, A.; Petticrew, M.; Shekelle, P.; Stewart, L.A. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015 statement. Syst. Rev. 2015, 4, 1.
31. Rai, A.K.; Agarwal, S.; Gupta, S.; Agarwal, G. An effective fuzzy based segmentation and twin attention based convolutional gated recurrent network for skin cancer detection. Multimed. Tools Appl. 2023, 83, 52113–52140.
32. Murthy, N.N.; Thippeswamy, K. Fuzzy-ER Net: Fuzzy-based Efficient Residual Network-based lung cancer classification. Comput. Electr. Eng. 2025, 121, 109891.
33. Quan, T.; Yuan, Y.; Song, Y.; Zhou, T.; Qin, J. Fuzzy Structural Broad Learning for Breast Cancer Classification. In Proceedings of the 2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI), Kolkata, India, 28–31 March 2022; pp. 1–4.
34. Zhang, T.; Xue, G. Fuzzy attention-based deep neural networks for acute lymphoblastic leukemia diagnosis. Appl. Soft Comput. 2025, 171, 112810.
35. Luo, Y.; Xin, J.; Liu, S.; Feng, J.; Ruan, L.; Cui, W.; Zheng, N. Lymph Node Metastasis Classification Based on Semi-Supervised Multi-View Network. In Proceedings of the 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Seoul, Republic of Korea, 16–19 December 2020; pp. 675–680.
36. Emdadi, M.; Pedram, M.M.; Eshghi, F.; Mirzarezaee, M. Graph Fuzzy Attention Network Model for Metastasis Prediction of Prostate Cancer Based on mRNA Expression Data. Int. J. Fuzzy Syst. 2024, 27, 1702–1711.
37. Yang, R.; Yu, J.; Yin, J.; Liu, K.; Xu, S. An FA-SegNet Image Segmentation Model Based on Fuzzy Attention and Its Application in Cardiac MRI Segmentation. Int. J. Comput. Intell. Syst. 2022, 15, 24.
38. Chen, Y.; Wang, H.; Zhang, G.; Liu, X.; Huang, W.; Han, X.; Li, X.; Martin, M.; Tao, L. Contrastive Learning for Prediction of Alzheimer’s Disease Using Brain 18F-FDG PET. IEEE J. Biomed. Health Inform. 2023, 27, 1735–1746.
39. Zhang, S.; Xiao, L.; Huang, H.; Hu, Z. Attention-stacking adaptive fuzzy neural networks for autism diagnosis. In Proceedings of the 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Lisbon, Portugal, 3–6 December 2024; pp. 5654–5661.
40. Wang, C.; Lv, X.; Shao, M.; Qian, Y.; Zhang, Y. A novel fuzzy hierarchical fusion attention convolution neural network for medical image super-resolution reconstruction. Inf. Sci. 2023, 622, 424–436.
41. Chen, Q.; Zeng, L.; Ding, W. FRCNN: A Combination of Fuzzy-Rough-Set-Based Feature Discretization and Convolutional Neural Network for Segmenting Subretinal Fluid Lesions. IEEE Trans. Fuzzy Syst. 2025, 33, 350–364.
42. Ding, W.; Geng, S.; Wang, H.; Huang, J.; Zhou, T. FDiff-Fusion: Denoising diffusion fusion network based on fuzzy learning for 3D medical image segmentation. Inf. Fusion 2024, 112, 102540.
43. Purwono, P.; Nataliani, Y.; Purnomo, H.D.; Timotius, I.K. Fuzzy Logic and Attention Gate for Improved U-Net with Genetic Algorithm for DFU Image Segmentation. In Proceedings of the 2024 International Conference on Information Technology Research and Innovation (ICITRI), Jakarta, Indonesia, 5–6 September 2024; pp. 135–140.
44. Bhat, S.; Birajdar, G.K.; Patil, M.D. Pediatric oral health detection using Swin transformer. In Proceedings of the 2024 Third International Conference on Distributed Computing and Electrical Circuits and Electronics (ICDCECE), Ballari, India, 26–27 April 2024; pp. 1–7.
45. Bali, B.; Garba, E.J. Neuro-fuzzy Approach for Prediction of Neurological Disorders: A Systematic Review. SN Comput. Sci. 2021, 2, 307.
46. De Campos Souza, P.V. Fuzzy neural networks and neuro-fuzzy networks: A review the main techniques and applications used in the literature. Appl. Soft Comput. 2020, 92, 106275.
47. Ravichandran, B.D.; Keikhosrokiani, P. Classification of Covid-19 misinformation on social media based on neuro-fuzzy and neural network: A systematic review. Neural Comput. Appl. 2023, 35, 699–717.
48. Lu, J.; Ma, G.; Zhang, G. Fuzzy Machine Learning: A Comprehensive Framework and Systematic Review. IEEE Trans. Fuzzy Syst. 2024, 32, 3861–3878.

Figure 1. Flowchart of the search and selection process (according to the PRISMA 2020 guidelines). The letter “n” indicates the number of publications in the different stages.

Figure 2. Bibliographic network map of the included publications.

Figure 3. Main tasks addressed in the publications of our systematic review.

Figure 4. Predominant types of data identified in our systematic review.

Figure 5. ANN combinations identified in the 32 publications.

Figure 6. Types of attention mechanisms identified in the 32 publications.

Figure 7. Methods for integrating fuzzy logic techniques with a neural network model. (A) Integration of fuzzy logic into the pre-processing of input data. (B) Integration of fuzzy logic into the artificial neural network model or within the training of the artificial neural network model. (C) Integration of fuzzy logic for output data processing.

Table 1. Search strings.

Scientific Electronic Databases	Search Strings
IEEE Xplore	(“Abstract”:neur* OR “Abstract”:net* OR “Abstract”:model) AND (“Abstract”:atten*) AND (“Abstract”:fuzz*) AND (“Abstract”:medic* OR “Abstract”:diagn*) Filters Applied: 2020–2025
ScienceDirect	Find articles with these terms: (“neural network” OR model) AND attention AND fuzzy AND (medic OR diagnostic) Year(s): 2020–2025 Title, abstract or author-specified keywords: “neural network” AND attention AND fuzzy AND diagnostic
SpringerLink	Keywords: (“neural network” OR model) AND attention AND fuzzy AND (medic OR diagnostic) Title: fuzz* AND (neur* OR net* OR model*) Start year: 2020 End year: 2025
Web of Science	AB = (neur* OR net* OR model) AND AB = atten* AND AB = fuzz* AND AB = (medic* OR diagn*) TI = fuzz* AND TI = (neur* OR net* OR model*) Start year: 2020 End year: 2025
ACM Digital Library	Search items from: The ACM Guide to Computing Literature Title: fuzz* AND (neur* OR net* OR model) Abstract: (“neur* net*” OR model) AND atten* AND fuzz* AND (medic* OR diagn*) Publication Date: Jan 2020–Dec 2025
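
The truncation operator (`*`) and Boolean connectives used in Table 1 can be emulated locally, for example to re-screen exported abstracts. The sketch below is an illustration only: the helper names are hypothetical, the term groups are a simplified reading of the abstract-field queries rather than any exact search string, and real database engines apply their own tokenization and stemming rules that this does not reproduce.

```python
import re

def term_to_regex(term):
    """Turn a search term such as 'fuzz*' into a word-anchored regex."""
    if term.endswith("*"):
        # Truncation: the stem followed by any further word characters.
        return re.compile(r"\b" + re.escape(term[:-1]) + r"\w*", re.IGNORECASE)
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)

def matches_query(text, groups):
    """True if every group (AND) has at least one matching term (OR)."""
    return all(any(term_to_regex(t).search(text) for t in group) for group in groups)

# Simplified reading of the abstract-field query (assumption, not the exact string).
QUERY = [
    ["neur*", "net*", "model"],
    ["atten*"],
    ["fuzz*"],
    ["medic*", "diagn*"],
]

abstract = "A fuzzy attention neural network for medical image segmentation."
is_candidate = matches_query(abstract, QUERY)
```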

Table 2. Medical conditions identified in our systematic review.

Type of Disease	Condition	References
Cancer and Tumors	Skin Cancer and Skin Lesions	[19,31]
	Lung Cancer	[13,32]
	Breast Cancer	[16,33]
	Brain Tumor	[15,18]
	Colon Cancer	[13]
	Lymphoblastic Leukemia	[34]
	Lymph Node Metastasis	[35]
	Prostate Cancer	[36]
Heart Conditions	Heart (Segmentation)	[9,24,37]
Heart Conditions	Heart Abnormalities	[14]
Neurological or Brain Diseases	Alzheimer’s Disease	[38]
	Autism	[39]
	Parkinson’s Disease	[17]
Pulmonary Conditions	COVID-19	[28,40]
	Pneumonia	[12]
	Lung Disorders	[23]
Ocular Conditions	Glaucoma	[11]
	Subretinal Fluid Lesions	[41]
	Fundus Segmentation	[25]
Gastrointestinal and Abdominal Conditions	Gastric Polyps	[22]
	Pancreas	[10]
	Abdominal Multiorgan (Segmentation)	[42]
Conditions Related to Diabetes	Diabetic Foot Ulcers	[43]
Conditions Related to Diabetes	Diabetic Retinopathy	[29]
Oral Health	Healthy Teeth	[44]

Table 3. ANNs reported in the publications of the systematic review.

Artificial Neural Network	Details of Artificial Neural Networks
Convolutional neural networks (CNNs)	CNNs were used in 28 publications, making them the most widely used type of neural network. They are mainly applied to tasks involving the classification or segmentation of medical images, such as MRIs, CT scans, and X-rays. In some cases, they were integrated with spatial attention and fuzzy logic mechanisms to improve the interpretation of relevant regions.
Multilayer perceptrons (MLPs)	MLPs were utilized in 18 publications primarily due to their simplicity, which makes them suitable for combination with other types of neural networks, attention modules, or fuzzy modules. We also identified their use as a final layer for classification tasks.
Transformers	Transformers appear in seven publications and stand out for their ability to model complex and temporal relationships between input features. They were often integrated with fuzzy modules to guide attention with fuzzy logic.
Gated recurrent units (GRUs)	GRUs were used in two publications, mainly applied to medical time series (e.g., physiological signal monitoring).
Graph neural networks (GNNs)	GNNs were used in two publications, as they have become a state-of-the-art method for predicting associations between diseases.
Other less common approaches	Long short-term memory networks (LSTM). Generative adversarial networks (GANs). Random-coupled neural network (RCNN).

Table 4. Attention mechanisms reported in the publications of the systematic review.

Attention Mechanisms	Details of the Attention Mechanisms
Self-Attention/Multihead Attention (Transformer style)	This type of attention appears in 10 publications, generally within a Transformer structure, to focus on contextual or spatial features [9,11,13,14,15,16,33,36,39,44].
Channel-Spatial Attention	This type of attention is used in 12 publications, with versions such as Dual Attention (channel + spatial) [12,19,23,25,35,41], the Hybrid Attention Mechanism (HAM) [22,31,38], and Iterative Attention Fusion Modules [42]. These mechanisms are effective in image processing-based models, enabling the prioritization of key regions of the human body for segmentation tasks. Generally, this type of attention is based on convolutional neural networks.
Fuzzy Attention modules	Some mechanisms, such as Fuzzy Attention Gate [28] or Fuzzy-Enhanced Holistic Attention [29], integrate fuzzy logic directly into the attention process. We identified this type of attention in five publications [27,28,29,34,37].
Additional attention mechanisms	We also identified other concepts, such as the Attention Gate Module [43] and specialized networks, including the Contextual Attention Network [18].
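
For readers less familiar with the Transformer-style attention listed in the first row of Table 4, the sketch below computes single-head scaled dot-product attention, softmax(QKᵀ/√d)V, in plain Python. It is a generic textbook formulation with toy inputs, not code from any reviewed publication; multihead variants run several such heads in parallel on learned projections of the input.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights) for single-head attention.

    queries, keys: lists of d-dimensional vectors; values: one vector per key.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [sum(wj * v[i] for wj, v in zip(w, values)) for i in range(len(values[0]))]
        weights.append(w)
        outputs.append(out)
    return outputs, weights

# Toy self-attention: the same three 2-D tokens serve as queries, keys, and values.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = scaled_dot_product_attention(tokens, tokens, tokens)
```

Each row of `weights` sums to 1, so every output vector is a convex combination of the value vectors, with larger weights on the tokens most similar to the query.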

Table 5. Fuzzy logic techniques reported in the publications of our systematic review.

Fuzzy Logic Techniques	Details of Fuzzy Logic Techniques
Fuzzy networks and fuzzy layers	Fuzzy networks and fuzzy layers, as used in the reviewed publications, combine the nonlinear representation capacity of neural networks with the approximate reasoning characteristic of fuzzy logic. These techniques made it possible to handle the uncertainty, ambiguity, and imprecision inherent in much medical data, and they offer greater interpretability by making models more understandable from a human logic or human language perspective.
Fuzzy clustering	We observed that fuzzy clustering techniques were used to identify fuzzy patterns between classes and improve the segmentation or grouping of features. Their flexibility in the presence of ambiguity makes them useful in diagnoses with poorly defined boundaries in images.
Hybrid fuzzy techniques	We identified hybrid fuzzy techniques, including the Fuzzy-Enhanced Firefly Algorithm, Fuzzy Entropy, Fuzzy Rank-Based Ensemble, and Fuzzy Stacking, indicating that fuzzy logic was integrated into both the architecture and feature processing.
Other fuzzy techniques	Fuzzy Attention. Fuzzy Pooling. Fuzzy Activation Functions. Fuzzy Contrastive Loss. Fuzzy Skip Connections.
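
Fuzzy clustering, listed in Table 5, assigns each sample a degree of membership in every cluster rather than a single label. The sketch below implements the standard fuzzy c-means update rules for one-dimensional data. The variant-specific details of the reviewed papers (for example, kernel-based or regularized versions) are not reproduced, and the function name, initial centers, iteration count, and fuzzifier m = 2 are illustrative assumptions.

```python
def fuzzy_c_means_1d(data, centers, m=2.0, iterations=50, eps=1e-9):
    """Standard fuzzy c-means on scalars.

    Returns (centers, memberships), where memberships[j][i] is the degree to
    which data[j] belongs to cluster i; each row sums to 1.
    """
    centers = list(centers)
    memberships = []
    for _ in range(iterations):
        # Membership update: u_ji = 1 / sum_k (d_ji / d_jk)^(2 / (m - 1))
        memberships = []
        for x in data:
            dists = [abs(x - c) + eps for c in centers]  # eps avoids division by zero
            row = [
                1.0 / sum((d_i / d_k) ** (2.0 / (m - 1.0)) for d_k in dists)
                for d_i in dists
            ]
            memberships.append(row)
        # Center update: mean of the data weighted by u^m
        for i in range(len(centers)):
            weights = [row[i] ** m for row in memberships]
            centers[i] = sum(w * x for w, x in zip(weights, data)) / sum(weights)
    return centers, memberships

data = [0.0, 0.1, 0.2, 0.9, 1.0, 1.1]
centers, memberships = fuzzy_c_means_1d(data, centers=[0.0, 1.0])
```

Samples near a center receive a membership close to 1 for that cluster, while samples between clusters receive intermediate memberships, which is why the technique suits images with poorly defined boundaries.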

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
