A Logic Tensor Network-Based Neurosymbolic Framework for Explainable Diabetes Prediction

Mondal, Semanto; Ferraro, Antonino; Pecorelli, Fabiano; De Pietro, Giuseppe

doi:10.3390/app152111806

Open AccessArticle

A Logic Tensor Network-Based Neurosymbolic Framework for Explainable Diabetes Prediction

¹

Department of Information Science and Technology, Pegaso University, Piazza Triste e Trento 48, 80132 Naples, Italy

²

Department of Electrical Engineering and Information Technology, University of Naples Federico II, 80125 Naples, Italy

^*

Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(21), 11806; https://doi.org/10.3390/app152111806

Submission received: 10 October 2025 / Revised: 2 November 2025 / Accepted: 4 November 2025 / Published: 5 November 2025

(This article belongs to the Special Issue Artificial Intelligence Innovations for Smart and Sustainable Healthcare)

Download

Browse Figures

Versions Notes

Abstract

Neurosymbolic AI is an emerging paradigm that combines neural network learning capabilities with the structured reasoning capacity of symbolic systems. Although machine learning has achieved cutting-edge outcomes in diverse fields, including healthcare, agriculture, and environmental science, it has potential limitations. Machine learning and neural models excel at identifying intricate data patterns, yet they often lack transparency, depend on large labelled datasets, and face challenges with logical reasoning and tasks that require explainability. These challenges reduce their reliability in high-stakes applications such as healthcare. To address these limitations, we propose a hybrid framework that integrates symbolic knowledge expressed in First-Order Logic into neural learning via a Logic Tensor Network (LTN). In this framework, expert-defined medical rules are embedded as logical axioms with learnable thresholds. As a result, the model gains predictive power, interpretability, and explainability through reasoning over the logical rules. We have utilized this neurosymbolic method for predicting diabetes by employing the Pima Indians Diabetes Dataset. Our experimental setup evaluates the LTN-based model against several conventional methods, including Support Vector Machines (SVM), Logistic Regression (LR), K-Nearest Neighbors (K-NN), Random Forest Classifiers (RF), Naive Bayes (NB), and a Standalone Neural Network (NN). The findings demonstrate that the neurosymbolic framework not only surpasses traditional models in predictive accuracy but also offers improved explainability and robustness. Notably, the LTN-based neurosymbolic framework achieves an excellent balance between recall and precision, along with a higher AUC-ROC score. These results underscore its potential for trustworthy medical diagnostics. This work highlights how integrating symbolic reasoning with data-driven models can bridge the gap between explainability, interpretability, and performance, offering a promising direction for AI systems in domains where both accuracy and explainability are critical.

Keywords:

neurosymbolic AI; logic tensor network (LTN); explainability; symbolic reasoning; first-order logic

1. Introduction

Accurate and timely diagnosis prediction is paramount in healthcare, as it directly influences patient outcomes and treatment efficacy. Over the years, computational methods have evolved from symbolic systems to advanced machine learning and deep learning models. Although these methods have shown significant success, they often struggle with a trade-off between performance and interpretability, which is a vital issue in healthcare environments where comprehending the rationale behind a diagnosis is crucial.

Early computational diagnostic tools predominantly relied on symbolic reasoning, utilising predefined logic rules to infer conclusions from medical data [1,2]. These symbolic frameworks are commonly referred to as Good Old-Fashioned AI (GOFAI) [3,4]. These systems offered high transparency but were constrained by the strictness of human-crafted rules, limiting their adaptability to the complexities of real-world clinical scenarios. Several conventional machine learning models, such as LR and SVM [5], improved generalisation by learning patterns from data. However, these models frequently demanded considerable feature engineering and, in some cases, sacrificed interpretability for improved performance. CNN and RNN are popular deep learning architectures [6,7,8] that have significantly improved predictive performance, particularly with unstructured data. However, the “black-box” characteristics of these systems have raised concerns about transparency and trustworthiness in healthcare applications [9,10].

Recent advancements have resulted in the integration of LLMs in the healthcare industry, leveraging their ability to understand and generate text that mimics human language [11]. LLMs can examine vast volumes of unstructured data, including medical records, to identify subtle and complex patterns. However, they inherit the interpretability challenges of deep learning models and often demand significant computational power, restricting their practical application in numerous healthcare settings.

The lack of interpretability and reasoning capability in conventional AI systems remains a key barrier to clinical adoption. Clinicians and healthcare providers require not only accurate predictions but also justifiable insights that align with established medical reasoning. There is thus a strong motivation to design AI models that integrate domain knowledge and logical constraints into the learning process, bridging the gap between statistical learning and symbolic reasoning. Neuro-symbolic AI (NeSy) represents an emerging paradigm that unites the structured reasoning of symbolic systems with the adaptive learning strengths of neural networks. This hybrid approach aims to bridge the gap between performance and explainability, making it appropriate for healthcare applications where both are paramount. Logical Neural Networks (LNNs), a subset of NeSy, have demonstrated the potential to integrate domain-specific knowledge through logical rules, thereby enhancing both accuracy and transparency in diagnosis prediction tasks [12,13,14].

Building upon this motivation, the study aims to develop an explainable and interpretable AI framework for diabetes prediction that unites medical domain knowledge with data-driven learning. Specifically, we introduce a Logic Tensor Network (LTN)-based neuro-symbolic framework that embeds first-order logic (FOL) medical axioms into the neural network through trainable thresholds incorporated directly into the loss function. This approach enables the model to learn from both empirical data and domain knowledge simultaneously, ensuring medical reasoning consistency during training and inference.

To make the study more comprehensive and to gain a better overview of neurosymbolic integration for explainable and interpretable systems, we have defined the following research questions:

RQ1: How does the performance of LTN with trainable thresholds compare with the traditional machine learning models in the diabetes prediction task?
RQ2: How does an LTN-based neurosymbolic framework introduce interpretability, explainability and reasoning in the diabetes prediction task?
RQ3: What are the effects of different symbolic components on the overall performance of the LTN-based neurosymbolic framework?

The key contribution of this work is the design of an LTN-based neuro-symbolic model with trainable medical thresholds that integrates clinical axioms directly into the learning process through differentiated loss functions. This design enables interpretable, logic-consistent, and high-performing diabetes prediction by allowing the model to learn from both empirical data and structured medical knowledge. By embedding first-order logic constraints during training, rather than relying on post-hoc interpretability, the proposed framework ensures that predictions remain clinically meaningful and explainable. This study thereby introduces an LTN with trainable medical thresholds combined with clinical axioms and differentiated losses. The research is guided by three key questions (RQ1–RQ3) that evaluate performance, interpretability, and the influence of symbolic knowledge on prediction quality. We have also prepared a Streamlit-based web interface to demonstrate the proof of concept, which can be utilized for real-time diabetes prediction.

An overview of similar studies related to diabetic predictions is provided in Section 2. The dataset description and data preprocessing steps are described in Section 3. Details of the custom loss function and the overall methodology covering medical predicate definitions, logical formulation, training, and inference, can be found in Section 4 and Section 5. An overview of the evaluation metrics, results, and ablation studies is included in Section 7, Section 8 and Section 9, whereas based on these results, the proposed research questions are addressed in Section 10. Finally, the study concludes with a proof of concept, demonstrating the practical implementation and conclusions in Section 11 and Section 12.

2. Related Work

Diabetes prediction has been an active area of research in which both conventional machine learning (ML) and modern deep learning (DL) techniques have been applied extensively. Recent advancements also explore explainable AI (XAI) and neuro-symbolic reasoning to enhance model interpretability and trustworthiness in clinical contexts. This section reviews the main research trends, categorized into four subdomains.

2.1. Traditional Machine Learning Approaches for Diabetes Prediction

Classical ML models such as Logistic Regression (LR), Support Vector Machines (SVM), Random Forest (RF), and Decision Trees (DT) have been widely used for diabetes prediction due to their efficiency and interpretability. The National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) developed the well-known Pima Indians Diabetes Dataset, which remains a standard benchmark for comparative analysis. Numerous studies have utilized this dataset to compare traditional and advanced algorithms.

For instance, Sisodia (2023) [15] applied LR, SVM, and DT classifiers, achieving an accuracy of 78.69% with SVM. Similarly, Elluri et al. (2025) [16] employed ensemble techniques and feature selection methods, improving predictive performance with a maximum accuracy of 81.2%. Chang et al. (2022) [17] evaluated DT, RF, and NB, obtaining accuracies of 94.4%, 94%, and 91%, respectively, even with minimal preprocessing. Ahmed et al. (2024) [18] reported random forest achieving 80% accuracy and an AUC of 0.83, and with Naive Bayes reported an AUC of 0.81 while also employing cross-validation.

2.2. Deep Learning Approaches

Deep learning models have demonstrated improved performance in capturing nonlinear relationships within clinical data. Huma et al. (2020) [19] employed a Multilayer Perceptron (MLP), achieving 98.07% accuracy on the Pima dataset through optimized hyperparameters. Zhang et al. (2024) [20] proposed DiabetesNet, a Back-Propagation Neural Network (BPNN)-based framework, reaching 89.81% accuracy. Similarly, Dutt et al. (2018) [21] used a Multi-Layer Feedforward Neural Network (MLFNN) and achieved 82.5% accuracy. These works underscore the potential of DL in feature abstraction and high predictive accuracy, though often at the expense of interpretability.

2.3. Explainable and Interpretable AI in Diabetes Risk Prediction

With the increasing deployment of AI in medical diagnosis, model interpretability has become essential. Explainable AI (XAI) methods aim to provide insights into model decisions using tools such as SHAP, LIME, and attention visualization. For example, Li et al. (2025) [22] proposed an interpretable framework combining Gradient Boosting with SHAP values for diabetes risk explanation. Similarly, Kutlu et al. [23] introduced interpretability in the XGBoost prediction of diabetes risk using the SHAP method, which provides valuable information on the prediction of the model. These approaches, while improving interpretability, often rely on post-hoc explanations that do not inherently encode reasoning.

2.4. Neuro-Symbolic and Logic-Based Methods

To address the gap between learning and reasoning, neuro-symbolic frameworks such as Logic Tensor Networks (LTNs) and Logical Neural Networks (LNNs) have been developed. LTNs, proposed by Badreddine et al. (2022) [24], unify deep learning with First-Order Logic (FOL), allowing symbolic constraints to guide learning. In healthcare, Lu et al. (2025) [13] applied LNNs to diabetes diagnosis, achieving 80.52% accuracy and an AUC of 0.8457, outperforming LR and RF while providing reasoning capability. Similarly, Chen et al. (2025) [25] discussed several studies and methodologies in which LTN-based reasoning is applied to clinical risk prediction, showing improved explainability without compromising performance.

2.5. Summary and Research Gap

While ML and DL approaches achieve high predictive accuracy, most lack intrinsic reasoning and transparent decision making. Post-hoc XAI methods improve interpretability but do not enable logical consistency. Neuro-symbolic frameworks, such as LTNs, provide a promising direction by integrating reasoning with learning. Drawing inspiration from the study conducted by Lu et al. (2025) [13], we present a neuro-symbolic approach using the logic tensor network (LTN) framework. This approach not only maintains acceptable accuracy but also incorporates reasoning capabilities, offering a more interpretable and explainable solution for medical diagnosis tasks.

2.6. Protocol Comparability

It is important to note that reported performance figures on the Pima Indian Diabetes dataset vary widely across studies, often due to differences in preprocessing, data splitting, and evaluation protocols. There are some studies where authors have reported extremely high accuracies such as 98.07% as mentioned by Huma et al. (2020) [19] in their study. They have utilized specific data handling strategy to handle invalid zero values, shuffled sampling and model validation. On the other head, in most of the studies the result is not reported using cross-validation. Such inconsistencies limit the direct comparability of reported results across studies.

3. Dataset Description

The Pima Indians Diabetes Dataset serves as a well-known standard dataset for tasks related to predicting diabetes. It consists of medical diagnostic measurements for female patients of Pima Indian heritage, aged 21 years and above. The dataset consists of 768 instances with eight numerical input features and a binary target label, which indicates the presence or absence of diabetes. The eight input features are as follows:

Number of pregnancies;
Plasma glucose concentration (mg/dL);
Diastolic blood pressure (mm Hg);
Triceps skinfold thickness (mm);
2-Hour serum insulin (mu U/mL);
Body mass index (BMI, weight in kg/(height in m²));
Diabetes pedigree function (a genetic risk score);
Age (years).

The target variable is a binary label, with 0 denoting the absence of diabetes and 1 denoting its presence. The dataset has an unequal distribution of classes, and no techniques for imputation or balancing are utilized to assess the model’s performance based on the original, imbalanced class distribution.

In this study, the data set is normalized using the standard scaler (Z-score normalization) and missing values are handled using median imputation before training the Logic Tensor Network (LTN) model for the prediction of diabetes.

Table 1 shows the summary of the variables with invalid zeros, the percentage of these values, and the distribution before and after computation.

4. Loss Function

We integrate domain knowledge into the training process through a hybrid loss function that unites binary cross-entropy loss (BCE) (

L_{bce}

) with fuzzy logic-based axiom satisfaction loss (

L fuzzy

). The total loss is defined as follows:

L_{total} = α \cdot L_{bce} + (1 - α) \cdot L_{fuzzy}

(1)

Here, the hyperparameter

α \in [0, 1]

balances the contribution of the BCE loss and the axiom satisfaction component.

S_{total}

is the aggregated fuzzy satisfaction score and (

L_{fuzzy}

) is the total fuzzy loss defined as follows:

S_{total} = SatAgg (S_{\sup}, λ \cdot S_{axioms})

(2)

L_{fuzzy} = 1 - S_{total}

(3)

where:

$S_{\sup}$ is the supervision satisfaction degree from the labeled data;
$S_{axioms} = SatAgg (\forall x_{1} ϕ_{1}, \dots, \forall x_{n} ϕ_{n})$ is the aggregated satisfaction of medical axioms;
$λ$ is a weight controlling the influence of domain knowledge axioms;
SatAgg denotes the fuzzy aggregation function used in Logic Tensor Networks.

This formulation encourages the model not only to minimize prediction error but also to adhere to predefined medical constraints, improving interpretability and robustness.

5. Methodology

The implementation strategy of the LTN-based Neuro-symbolic framework is discussed in this section. We briefly discuss the logic tensor network, along with the implementation of the medical predicates aligned with clinical standards. The formulation of medical axioms and their integration into the learning process using LTNtorch framework is also discussed.

5.1. Overview of Logic Tensor Network

LTNs combine a sub-symbolic component with first-order logic (FOL) by embedding logical axioms into the training process. FOL can represent facts and rules, where facts are the known truths and rules are the logical statements that connect facts and allow inference. They enable the incorporation of domain knowledge, expressed as logical axioms, directly into the learning process. This integration allows models to perform both data-driven learning and logical reasoning [24]. In LTNs, the predicates are represented by the neurons, which can be feedforward neural networks, and different logical connectives such as AND, OR, NOT, FORALL, and EXISTS are implemented using fuzzy logic operators. LTNs use neural networks to learn and predict properties and relationships from raw data and first-order logic rules to encode knowledge. The neural predicates provide fuzzy truth values, and the logic layer uses these fuzzy truth values to evaluate logical formulas and provide a consistency measure for reasoning. LTNs provide interpretable decisions with the help of domain knowledge as logic rules, and due to the integration of a neural component, they can also handle both structured and unstructured data. In this study, we have used the version v1.0.2 of the LTNtorch framework, a PyTorch-based implementation of LTNs [26] using PyTorch version of 2.6.0. The LTNtorch framework grounds the medical axiom in a tensor to make these differentiable and trainable.

5.2. Medical Predicates and Feature Representation

In order to integrate domain-specific knowledge into the learning process, we define a set of logical predicates that are based on the well-established risk indicators of the WHO and the ADA for diabetes. These predicates capture threshold-based conditions related to the features, including glucose level, body mass index (BMI), age, and diabetes pedigree function (DPF). Table 2 gives an overview of the predicates with their clinical definition and medical basis.

5.3. Logical Formulation

The predicates defined in Table 2 are encoded as First-Order Logic (FOL) axioms, also known as medical knowledge axioms. These axioms describe both risk-enhancing and risk-reducing associations with diabetes. These axioms are integrated into the training process using the LTNtorch framework. In general, these axioms enforce clinically consistent reasoning.

First-order logic enables the representation of general rules using quantified variables, logical predicates, and connectives such as implication (→), conjunction (∧), and negation (¬). The universal quantifier (∀) is used to denote that the logical statement applies to all individuals in the domain (i.e., all patients x in our case). We define three categories of axioms as follows:

Risk-enhancing axioms:

These rules encode conditions that increase the likelihood of diabetes, but do not guarantee a diagnosis and influence the overall fuzzy logic formulation.

\begin{matrix} \forall x & : GlucoseDiabetic (x) \to Diabetic (x) \\ \forall x & : BMIObese (x) \to Diabetic (x) \\ \forall x & : BMIOver (x) \to Diabetic (x) \\ \forall x & : AgeRisk (x) \to Diabetic (x) \\ \forall x & : DPFHigh (x) \to Diabetic (x) \\ \forall x & : (BMIObese (x) \land AgeRisk (x)) \to Diabetic (x) \\ \forall x & : (DPFHigh (x) \land (BMIOver (x) \lor AgeRisk (x)) \to Diabetic (x) \end{matrix}

Risk-reducing axioms: These rules reduce the likelihood of diabetes without guaranteeing a negative diagnosis but influence the overall decision-making process of the model.

\begin{matrix} \forall x & : GlucoseNormal (x) \to \neg Diabetic (x) \\ \forall x & : BMINormal (x) \to \neg Diabetic (x) \\ \forall x & : BMIUnder (x) \to \neg Diabetic (x) \\ \forall x & : AgeYoung (x) \to \neg Diabetic (x) \\ \forall x & : DPFLow (x) \to \neg Diabetic (x) \\ \forall x & : (BMINormal (x) \land AgeYoung (x) \land DPFLow (x)) \to \neg Diabetic (x) \end{matrix}

Consistency constraints: These mutually exclusive constraints are implemented to prevent logical contradictions in the learned model, ensuring that predictions adhere to classical logic properties.

\begin{matrix} \forall x & : GlucoseNormal (x) \to \neg GlucoseDiabetic (x) \\ \forall x & : \neg GlucoseDiabetic (x) \to GlucoseNormal (x) \end{matrix}

5.4. Implementation of Predicates Using Trainable Thresholds

The candidate thresholds are initialized on the basis of the medical standards of the ADA and WHO. The thresholds are trainable, which allows the model to adapt to the dataset’s distribution while being guided by clinical knowledge. Each predicate is implemented as a differentiable function, producing a truth value in [0, 1] to represent fuzzy membership. Formally, each predicate except Diabetic is modeled as a trainable threshold function:

P (x) = σ (s \cdot (x_{feature} - τ))

(4)

where:

$x_{feature}$ is the patient feature corresponding to the predicate;
$τ$ is the trainable threshold;
s is the slope (typically fixed at 10); and
$σ$ denotes the sigmoid function.

The Diabetic predicate is implemented as a feed-forward neural network, as shown in Figure 1. It outputs a fuzzy truth value in the range

[0, 1]

, representing the likelihood of diabetes:

Diabetic (x) = σ (W_{3} \cdot Dropout (ELU ({BN}_{2} (W_{2} \cdot Dropout (ELU ({BN}_{1} (W_{1} x)))))) + b_{3})

(5)

where:

x is the patient feature vector;
$W_{i}$ and $b_{i}$ are weight matrices and biases for layer i;
${BN}_{i}$ denotes batch normalization at layer i;
$ELU (\cdot)$ is the exponential linear unit activation;
Dropout is applied to prevent overfitting; and
$σ$ is the sigmoid function mapping the output to $[0, 1]$ .

5.5. Integration of Medical Axioms in Training

To embed domain-specific medical knowledge into model training, we jointly optimize the Diabetic neural network and the trainable threshold predicates under the guidance of first-order logic (FOL) axioms. The axioms, categorized as risk-enhancing and risk-reducing, encode both positive and negative associations between patient features and diabetes risk. Logical connectives are implemented as differentiable operators using the LTNtorch framework, which allows the evaluation of each axiom as a continuous satisfaction score. During training, the model jointly optimizes the empirical loss, which is the cross-entropy between predicted and true diabetic labels and logic loss, which is the average degree of violation across all axioms. The workflow of the proposed framework is illustrated in Figure 2.

Algorithm 1 summarizes the complete workflow for training the LTN with trainable predicates and FOL axioms and the inference. At each epoch, the model computes predicate truth values, evaluates logical satisfaction scores, calculates the total loss, and updates both neural network parameters and predicate thresholds via gradient descent.

Algorithm 1: Diabetes Prediction with LTN using Trainable Thresholds

Input:: Patient feature vector x, trainable predicates $P$ , axioms $A$ , Diabetic neural network, hyperparameters $α \in [0, 1]$ , $λ \in [0, 1]$ .
Output:: Fuzzy truth value $Diabetic (x)$ and explanation of prediction.

Training Phase: Applsci 15 11806 i001

Inference Phase:

1.: Evaluate predicates $p_{i} (x)$ for input x.
2.: Compute $Diabetic (x)$ using the neural network.
3.: Compute axiom satisfaction scores and combine with prediction.
4.: Output final fuzzy truth value $Diabetic (x)$ and explanation.

6. Experimental Settings

The experiments were carried out on a local machine equipped with an NVIDIA RTX 4060 GPU (NVIDIA, Santa Clara, CA, USA) with 8 GB of VRAM and 16 GB of system RAM, utilising CUDA version 11.8 to leverage GPU acceleration for training processes. All models were implemented in PyTorch and trained on a standard computing environment. No additional data augmentation or synthetic oversampling techniques were applied.

To ensure fair and consistent evaluation, all models in this study were trained and tested using a 5-fold stratified cross-validation strategy. The dataset was split into 5 folds, ensuring that each fold maintained the original class distribution. A fixed random seed of 42 was used to guarantee reproducibility.

For neural network-based models, the same architecture was adopted across all experiments. For each fold, the models were trained over 500 epochs using the AdamW optimizer with a learning rate of 0.001.

There are two hyperparameters in the loss function, including

α

and

λ

. We performed a grid search-based sensitivity analysis on

α \in {0.25, 0.50, 0.75}

and

λ \in {0.1, 0.3, 0.4, 0.5}

to analyze their impact on model performance and logical satisfaction. Based on the grid search result in Table 3, the configuration

α = 0.5

and

λ = 0.4

was selected because it provides the best trade-off between predictive performance and logical satisfaction.

Table 4 shows the summary of different parameters set during training. For the classical machine learning models, some parameters were set to the default value.

7. Evaluation Metrics

For a comprehensive assessment of the proposed models, we have employed the following standard metrics pertinent to binary classification tasks.

Accuracy: Represents the percentage of instances that are correctly predicted (both true positives and true negatives) among all instances. It is defined as follows:

$Accuracy = \frac{T P + T N}{T P + T N + F P + F N}$

(6)

Accuracy provides a general performance measure but can be misleading in the presence of class imbalance [27].
Precision: Evaluates the percentage of positive identifications that were genuinely accurate. It is determined by calculating

$Precision = \frac{T P}{T P + F P}$

(7)

Precision is vital in scenarios where the cost of false positives is high, such as in medical diagnoses [27].
Recall (Sensitivity): Reflects the proportion of actual positives that were correctly identified. It is given by

$Recall = \frac{T P}{T P + F N}$

(8)

Recall is particularly important when the cost of false negatives is significant [27].
F1 Score: The harmonic mean of precision and recall, providing a balance between the two. It is computed as

$F 1 Score = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}$

(9)

The F1 Score is especially useful in situations with class imbalance, as it considers both false positives and false negatives [27].
Area Under the ROC Curve (AUC-ROC): Demonstrates the model’s capacity to differentiate between classes at different threshold levels. The ROC curve displays the true positive rate in relation to the false positive rate, and the AUC measures the overall performance. An AUC of 1.0 indicates flawless classification, whereas an AUC of 0.5 implies a lack of discriminative power [28].
Area Under the Precision-Recall Curve (PR-ROC): PR-ROC evaluates the trade-off between precision and recall across different thresholds.

Instead of relying solely on accuracy, considering multiple metrics provides a more reliable assessment. Accuracy offers a general measure of correct predictions, but it can be misleading when class imbalance exists, and it also does not reflect the consequences of false positives and false negatives. In this study, we deal with an imbalanced dataset, so recall and precision are also important. Recall reflects the capability of the model to correctly recognize true cases of diabetes. High recall is considered in medical applications because false negatives may be harmful and delay the treatment process. On the other hand, precision determines how many of the patients who are predicted as diabetic are truly diabetic. False positives may lead to unnecessary follow-up, but these are generally less harmful than missed diagnoses. The F1-score gives a balanced metric by considering both precision and recall, which helps to understand the trade-off between false positives and false negatives. AUC-ROC helps to assess the model’s discriminative ability between different classes, which is important for medical settings because it evaluates the discriminative power of the model across thresholds. The problem with AUC-ROC is that it might give misleadingly high performance estimates in imbalance datasets since it is less sensitive to the minority class. To handle this situation, we have also considered PR-ROC which can handle imbalance situations and gives clear picture when the positive class is smaller than the negative class.

8. Results

Table 5 presents a comparative summary of the proposed LTN model’s performance alongside several empirical machine learning models, including LR, SVM, RF, K-NN, NB, and a standalone NN. A 5-fold stratified cross-validation strategy is employed for evaluation, ensuring that each fold maintains the original class distribution. All models are evaluated on the Pima Indian Diabetes dataset using the same experimental setup and consistent hyperparameters. The architecture of the standalone neural network is identical to the one used as the Diabetic predicate in the LTN model (Figure 1). To convert probabilistic outputs into binary predictions, the optimal threshold is determined from the ROC curve computed on the validation set within each cross-validation fold and then applied to the corresponding test subset, selecting the point that maximizes the difference between the true positive rate and false positive rate, instead of using a 0.5 cut-off-based approach, which is sometimes misleading in healthcare applications.

The LTN model with trainable thresholds outperforms all other baseline models across all performance metrics. The model achieves a recall of 0.83, which is particularly important in medical diagnostics, as it reduces the number of false negatives. The F1 score of 0.80, AUC-ROC of 0.92 and PR-AUC of 0.86 further demonstrate that the model shows a balanced performance and robust discriminative capability. Figure 3 provides a visual comparison of key metrics, highlighting the consistent advantage of the LTN framework over classical models.

We have performed the Wilcoxon signed-rank test [29] to evaluate whether the LTN model statistically outperforms other machine learning models in terms of F1 score. All models were trained and evaluated using identical stratified 5-fold splits, ensuring that the Wilcoxon test was applied on paired results from the same folds. The null hypothesis (

H_{0}

) assumes no difference between the models, while the alternative hypothesis (

H_{1}

) assumes that the LTN achieves higher performance. Since multiple hypotheses were tested, the Holm–Bonferroni correction was applied to control the family-wise error rate at

α = 0.05

. The results, summarized in Table 6, indicate that the LTN significantly outperforms all baseline models.

In this study, we made the thresholds of the predicates trainable to introduce interpretability. During training, initial candidate values were set based on ADA/WHO guidelines (e.g., OGTT 2h: 140/200 mg/dL for glucose predicates). To compute the final trained value of the predicates threshold, we ran the experiments five times with different random seeds and recorded the results. The mean value of each predicate threshold, along with the lower and upper bounds of the 95% confidence interval, is reported in Table 7. Importantly, the training process respects all hard axioms, ensuring that the final thresholds remain clinically consistent while learning from the data.

The quantitative findings suggest that the LTN model consistently surpasses the baseline models in various evaluation metrics, especially in maintaining a balance between precision and recall.

9. Ablation Study

To assess the role of different components of the proposed Logic Tensor Network (LTN) model, we have conducted an ablation study on the Pima Indian Diabetes dataset. Table 8 summarizes the performance under four setups: (i) LTN without axioms or threshold supervision (only supervision satisfaction loss), (ii) LTN with fixed thresholds, (iii) LTN with trainable thresholds, and (iv) LTN with automatic rule generation using a decision tree.

The ablation study reveals the contribution of each of the components, such as the medical axioms, and fixed and trainable thresholds. In the first case, LTN without any domain guidance, with supervision satisfaction and binary cross-entropy loss, achieves comparatively lower AUC-ROC (0.90), F1 score (0.79) and Stotal (0.65). Introducing fixed thresholds along with medical axioms improves all metrics except recall, particularly F1 (0.80), highlighting the benefit of incorporating domain knowledge as constraints, which also improves clinical interpretability. Trainable thresholds further enhance performance by increasing precision (0.78), AUC-ROC (0.92) and Stotal (0.72), indicating that adapting thresholds to dataset-specific distributions provides additional flexibility while improving model interpretability. Finally, integrating automatic rule generation using a decision tree with candidate thresholds increases recall (0.88) while slightly reducing precision (0.73), suggesting that data-driven rule refinement improves detection of positive cases at the cost of false positives. Overall, these results confirm that the integration of medical axioms with trainable thresholds improves model performance while introducing interpretability. The predictions are accompanied by clinically aligned explanations derived from the medical axioms, which introduce explainability and reasoning into the decision-making process.

10. Discussion

The Logic Tensor Network (LTN)-based neurosymbolic model with trainable thresholds achieved an accuracy of 86%, precision of 78%, recall of 83%, F1-score of 80%, an AUC-ROC of 0.92 and PR-AUC of 0.86, outperforming all classical machine learning approaches including Logistic Regression (LR), Standalone Neural Network (NN), Support Vector Machine (SVM), K-Nearest Neighbors (K-NN), Random Forest (RF), and Naive Bayes (NB), as shown in Table 5 which answers RQ1. This demonstrates that integrating medical axioms with data-driven learning can improve both predictive performance and interpretability.

The medical axioms with the candidate thresholds are defined by following clinical standards. These thresholds are also trainable and are updated by gradient descent during the backpropagation process. This makes the framework clinically interpretable. Model training is jointly guided by data and axioms, enabling reasoning over predefined medical rules. During inference, the model provides a final prediction along with an explanation of the prediction in terms of predicate evaluation and axiom satisfaction. These ensure explainability and reasoning of the framework, and hence answer RQ2.

To address RQ3, we refer to Table 8 in the ablation study. The results show that the LTN without medical knowledge does not achieve optimal performance, while incorporating medical axioms with fixed thresholds enhances the model’s overall performance. Furthermore, employing a trainable threshold improves both the overall F1 score and the interpretability of the model.

Despite these promising results, there are some limitations which should be acknowledged. The Pima Indian dataset contains entries of adult women only from the Pima heritage; therefore, the thresholds derived may not generalize to other populations without recalibration. Additionally, training over 500 epochs on a relatively small dataset poses a potential risk of overfitting. This risk was mitigated using the AdamW optimizer, which provides effective weight decay, and a dropout rate of 0.2 in the neural components of the model. Future work may explore early stopping, and testing on larger and more diverse cohorts to further enhance generalizability.

Other future directions include extending the LTN framework to other chronic diseases, incorporating multi-modal data sources, and investigating automated axiom refinement strategies to enhance both performance and interpretability in clinical decision support applications.

11. Proof of Concept

To demonstrate the practical applicability of the proposed model, we have developed an interactive user interface using Streamlit. This user interface (UI) enables users to input patient features and receive real-time diabetes risk predictions, along with explanations based on the neurosymbolic model, highlighting the model’s potential for clinical decision support. The practical implementation of a diabetes detection system using the LTN-based neurosymbolic AI is shown in Figure 4. For the model prediction case, it shows the confidence that the patient is diabetic. Therefore, for the positive case, the value is higher, whereas in the negative case, the value is lower than 50%.

12. Conclusions

This work highlights the effectiveness of neurosymbolic AI in enhancing predictive model performance by integrating domain knowledge through logical constraints, specifically using first-order logic (FOL) as medical axioms. The proposed approach not only demonstrates promising accuracy but also enables guided and interpretable training, ensuring that the model’s predictions are consistent with expert medical knowledge.

While this study focuses on diabetes prediction within the healthcare context, the underlying methodology is broadly applicable across domains where incorporating structured domain knowledge can improve model reliability and transparency. Although FOL are used here as the knowledge representation, other forms, such as knowledge graphs, can also be integrated as domain knowledge bases, paving the way for richer, more interpretable, and robust AI models.

In the current implementation, during inference, a prediction is provided along with an explanation based on the satisfaction of the medical axioms and the confidence score of the predicates. The proposed LTN-based neurosymbolic model thus presents a sustainable direction for interpretable AI, achieving promising performance even on a relatively small dataset. However, external validation on larger and more diverse clinical datasets will be essential before translating these findings into real-world medical practice. Future integration with large language models (LLMs) can further enhance the explainability by providing more structured explanations, though at the cost of increased model complexity and power consumption.

Author Contributions

Conceptualization, S.M., A.F., F.P. and G.D.P.; methodology, S.M., A.F., F.P. and G.D.P.; software, S.M., A.F., F.P. and G.D.P.; validation, S.M., A.F., F.P. and G.D.P.; formal analysis, S.M., A.F., F.P. and G.D.P.; investigation, S.M., A.F., F.P. and G.D.P.; resources, S.M., A.F., F.P. and G.D.P.; data curation, S.M., A.F., F.P. and G.D.P.; writing—original draft preparation, S.M., A.F., F.P. and G.D.P.; writing—review and editing, S.M., A.F., F.P. and G.D.P.; visualization, S.M., A.F., F.P. and G.D.P.; supervision, A.F., F.P. and G.D.P.; project administration, A.F., F.P. and G.D.P.; funding acquisition, A.F., F.P. and G.D.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset used in this study, the Pima Indians Diabetes Database, is publicly available at https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database (accessed on 28 May 2025). The source code for implementing the proposed framework is openly accessible at https://github.com/Pegaso-University-DINF/Explainable-Diabetes-Prediction (accessed on 28 May 2025).

Acknowledgments

The authors would like to thank the project “Models for Explainable Reasoning and Learning through Integration (MERLIN)” (CUP/ID: PRA2025001), PRA 2025 – Progetti di Ricerca di Ateneo dell’ Università Pegaso (Decreto N. 231/2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

LTN	Logic Tensor Network
FOL	First Order Logic
LR	Logistic Regression
SVM	Support vector Machine
RF	Random Forest
K-NN	K-Nearest Neighbors
NB	Naive Bayes
NN	Neural Network
DT	Decision Tree
AUC-ROC	Area Under the Curve (AUC)-Receiver-Operating Characteristic Curve (ROC)
CNN	Convolutional Neural Network
RNN	Recurrent Neural Network
LLMs	Large Language Models
NeSy	Neuro-Symbolic
ML	Machine Learning
DL	Deep Learning
NIDDK	National Institute of Diabetes and Digestive and Kidney Diseases
MLP	Multi Layer Perceptron
MLFNN	Multi-Layer Feed Forward Neural Networks
BPNN	Back-Propagation Neural Network
BMI	Body Mass Index
DPF	Diabetes Pedigree Function
ADA	American Diabetes Association
WHO	World Health Organization
OGTT	Oral Glucose Tolerance Test
AdamW	Adaptive Moment Estimation with Weight Decay
TP	True Positive
TN	True Negative
BN	Batch Normalization
ELU	Exponential Linear Unit
UI	User Interface
BCE	Binary Cross-Entropy
AI	Artificial Intelligence
SHAP	SHapley Addictive exPlanations
LIME	Local Interpretable Model-agnostic Explanations
lbfgs	Limited-memory Broyden-Fletcher-Goldfarb-Shanno

References

Eyasu, K.; Jimma, W.; Tadesse, T. Developing a Prototype Knowledge-Based System for Diagnosis and Treatment of Diabetes Using Data Mining Techniques. Ethiop. J. Health Sci. 2020, 30, 115–124. [Google Scholar] [CrossRef]
Dhandhania, V.; Choubey, D.; Paul, S. Rule based diagnosis system for diabetes. Biomed. Res. 2017, 28, 5196–5209. [Google Scholar]
Emma, L. The Evolution of Artificial Intelligence: From Symbolic AI to Deep Learning. March 2025. Available online: https://www.researchgate.net/profile/Lawrence-Emma/publication/390544723_The_Evolution_of_Artificial_Intelligence_From_Symbolic_AI_to_Deep_Learning/links/67f32bd095231d5ba5b99670/The-Evolution-of-Artificial-Intelligence-From-Symbolic-AI-to-Deep-Learning.pdf (accessed on 14 September 2025).
Ilkou, E.; Koutraki, M. Symbolic Vs Sub-symbolic AI Methods: Friends or Enemies? CIKM 2020, 2699. [Google Scholar] [CrossRef]
Hassannataj Joloudari, J.; Saadatfar, H.; Dehzangi, I.; Shamshirband, S. Computer aided decision-making for predicting liver disease using PSO-based optimized SVM with feature selection. Inform. Med. Unlocked 2019, 17, 100255. [Google Scholar] [CrossRef]
Barolli, L.; Ferraro, A. A Prediction Approach in Health Domain Combining Encoding Strategies and Neural Networks. In Advances on P2P, Parallel, Grid, Cloud and Internet Computing; Barolli, L., Ed.; Springer International Publishing: Cham, Switzerland, 2023; pp. 129–136. [Google Scholar] [CrossRef]
Junejo, A.; Kaabar, M.; Ullah, I.; Khan, R.; Ma, Y.-K. Deep Learning in Cancer Diagnosis and Prognosis Prediction: A Minireview on Challenges, Recent Trends, and Future Directions. Comput. Math. Methods Med. 2021, 2021, 28. [Google Scholar] [CrossRef]
Chen, R.; Stewart, W.; Sun, J.; Ng, K.; Yan, S. Recurrent Neural Networks for Early Detection of Heart Failure from Longitudinal Electronic Health Record Data. Circ. Cardiovasc. Qual. Outcomes 2019, 12, e005114. [Google Scholar] [CrossRef]
Rasheed, K.; Qayyum, A.; Ghaly, M.; Al-Fuqaha, A.; Razi, A.; Qadir, J. Explainable, trustworthy, and ethical machine learning for healthcare: A survey. Comput. Biol. Med. 2022, 149, 106043. [Google Scholar] [CrossRef]
Esposito, C.; Janneh, M.; Spaziani, S.; Calcagno, V.; Bernardi, M.L.; Iammarino, M.; Verdone, C.; Tagliamonte, M.; Buonaguro, L.; Pisco, M.; et al. Assessment of Primary Human Liver Cancer Cells by Artificial Intelligence-Assisted Raman Spectroscopy. Cells 2023, 12, 2645. [Google Scholar] [CrossRef]
Zhou, S.; Xu, Z.; Zhang, M.; Xu, C.; Guo, Y.; Zhan, Z.; Ding, S.; Wang, J.; Xu, K.; Fang, Y.; et al. Large Language Models for Disease Diagnosis: A Scoping Review. arXiv 2024, arXiv:2409.00097. [Google Scholar] [CrossRef]
Ferraro, A.; Galli, A.; La Gatta, V.; Minocchi, M.; Moscato, V.; Postiglione, M. Few Shot NER on Augmented Unstructured Text from Cardiology Records. In Advances in Internet, Data & Web Technologies; Barolli, L., Ed.; Springer Nature: Cham, Switzerland, 2024; pp. 1–12. [Google Scholar] [CrossRef]
Lu, Q.; Li, R.; Sagheb, E.; Wen, A.; Wang, J.; Wang, L.; Fan, J.W.; Liu, H. Explainable Diagnosis Prediction through Neuro-Symbolic Integration. arXiv 2025, arXiv:2410.01855. [Google Scholar]
Aversano, L.; Bernardi, M.L.; Cimitile, M.; Iammarino, M.; Verdone, C. An Enhanced UNet Variant for Effective Lung Cancer Detection. In Proceedings of the 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy, 18–23 July 2022; IEEE: Padua, Italy, 2022; pp. 1–8. [Google Scholar] [CrossRef]
Sisodia, D.; Sisodia, D. Prediction of Diabetes using Classification Algorithms. Procedia Comput. Sci. 2023, 132, 1578–1585. [Google Scholar] [CrossRef]
Elluri, S. Enhanced Diabetes Prediction Using a Hybrid Machine Learning Framework with Feature Selection and Weighted Ensemble Classification. Healthcare 2025, 35, 16157–16173. [Google Scholar]
Chang, V.; Bailey, J.; Xu, Q.; Sun, Z. Pima Indians diabetes mellitus classification based on machine learning (ML) algorithms. Neural Comput. Appl. 2022, 35, 16157–16173. [Google Scholar] [CrossRef] [PubMed]
Ahmed, A.; Khan, J.; Arsalan, M.; Jalal, K.; Shahat, A.; Alhalmi, A.; Naaz, S. Machine Learning Algorithm-Based Prediction of Diabetes Among Female Population Using PIMA Dataset. Healthcare 2024, 13, 37. [Google Scholar] [CrossRef]
Huma, S.; Ahuja, S. Deep learning approach for diabetes prediction using PIMA Indian dataset. J. Diabetes Metab. Disord. 2020, 19, 391–403. [Google Scholar] [CrossRef]
Zhang, Z.; Ahmed, K.A.; Hasan, M.R.; Gedeon, T.; Hossain, M.Z. DiabetesNet: A Deep Learning Approach to Diabetes Diagnosis. arXiv 2024, arXiv:2403.07483. [Google Scholar]
Dutt, M.; Nunavath, V.; Goodwin, M. A Multi-layer Feed Forward Neural Network Approach for Diagnosing Diabetes. In Proceedings of the 2018 11th International Conference on Developments in eSystems Engineering (DeSE), Cambridge, UK, 2–5 September 2018; pp. 300–305. [Google Scholar] [CrossRef]
Li, X.; Ding, F.; Zhang, L.; Zhao, S.; Hu, Z.; Ma, Z.; Li, F.; Zhang, Y.; Zhao, Y.; Zhao, Y. Interpretable machine learning method to predict the risk of pre-diabetes using a national-wide cross-sectional data: Evidence from CHNS. BMC Public Health 2025, 25, 1145. [Google Scholar] [CrossRef]
Kutlu, M.; Dönmez, T.B.; Freeman, C. Machine Learning Interpretability in Diabetes Risk Assessment: A SHAP Analysis. Comput. Electron. Med. 2024, 1, 34–44. [Google Scholar] [CrossRef]
Badreddine, S.; d’Avila Garcez, A.; Serafini, L.; Spranger, M. Logic Tensor Networks. Artif. Intell. 2022, 303, 103649. [Google Scholar] [CrossRef]
Hossain, D.; Chen, J.Y. A Study on Neuro-Symbolic Artificial Intelligence: Healthcare Perspectives. arXiv 2025, arXiv:2503.18213. [Google Scholar] [CrossRef]
Carraro, T.; Serafini, L.; Aiolli, F. LTNtorch: PyTorch Implementation of Logic Tensor Networks. arXiv 2024, arXiv:2409.16045. [Google Scholar] [CrossRef]
Powers, D.M.W. Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. arXiv 2020, arXiv:2010.16061. [Google Scholar] [CrossRef]
Fawcett, T. Introduction to ROC analysis. Pattern Recognit. Lett. 2006, 27, 861–874. [Google Scholar] [CrossRef]
Rey, D.; Neuhauser, M. Wilcoxon-Signed-Rank Test. In International Encyclopedia of Statistical Science; Lovric, M., Ed.; Springer: Berlin/Heidelberg, Germany, 2011; pp. 1658–1659. [Google Scholar] [CrossRef]

Figure 1. Schematic representation of the Diabetic predicate model, implemented as a feed-forward neural network.

Figure 2. High-level workflow diagram of LTN-based NeSy framework for diabetes prediction.

Figure 3. Visual representation of the model’s performance comparison.

Figure 4. Proof of concept UI screenshots showing user inputs and model outputs for diabetic and non-diabetic cases. Panels are: (a) Case 1: Patient input and model output with explanation in case of a diabetic patient, (b) Case 2: Patient input and model output with explanation in case of a non-diabetic patient.

Table 1. Summary of missing value imputation in the Pima Indians Diabetes Dataset.

Variable	Missing (%)	Imputation Method	Median (Pre→Post)
Glucose	0.65	Median	117.0 → 117.0
Blood Pressure	4.56	Median	72.0 → 72.0
Skin Thickness	29.56	Median	29.0 → 29.0
Insulin	48.70	Median	125.0 → 125.0
BMI	1.43	Median	32.3 → 32.3

Note: Zeros in clinical variables (Glucose, Blood Pressure, Skin Thickness, Insulin, BMI) were replaced with missing values and imputed using the median of each column. The median remains unchanged after imputation, since imputed values were equal to the pre-imputation medians.

Table 2. Threshold-based definition of the medical predicates and their medical basis.

Predicate	Definition	Medical Basis
GlucoseNormal (x)	Glucose OGTT (2 h) < 140 mg/dL	ADA: Normal glucose
GlucosePre (x)	Glucose OGTT (2 h) 140–199 mg/dL	ADA: Prediabetes
GlucoseDiabetic (x)	Glucose OGTT (2 h) ≥ 200 mg/dL	ADA: Diabetes diagnostic threshold
BMIUnder (x)	BMI < 18.5 kg/m²	WHO: Underweight
BMINormal (x)	BMI 18.5–24.9 kg/m²	WHO: Normal range
BMIOver (x)	BMI 25.0–29.9 kg/m²	WHO: Overweight
BMIObese (x)	BMI ≥ 30.0 kg/m²	WHO: Obese
AgeYoung (x)	Age < 35 years	Heuristic
AgeRisk (x)	Age ≥ 45 years	ADA: Age-related risk
DPFLow (x)	Diabetes Pedigree Function < 0.2	Heuristic
DPFHigh (x)	Diabetes Pedigree Function ≥ 0.5	Heuristic
Diabetic (x)	Patient is classified as diabetic or non-diabetic	–

ADA: American Diabetes Association; WHO: World Health Organization.

Table 3. Sensitivity analysis of the LTN model for different values of

α

and

λ

.

Table 3. Sensitivity analysis of the LTN model for different values of

α

and

λ

.

$α$	$λ$	F1 Score	AUC-ROC	Satisfaction Level
0.25	0.1	0.690	0.814	0.639
0.25	0.3	0.688	0.825	0.656
0.25	0.4	0.696	0.814	0.703
0.25	0.5	0.683	0.807	0.705
0.50	0.1	0.688	0.821	0.740
0.50	0.3	0.695	0.818	0.765
0.50	0.4	0.723	0.819	0.784
0.50	0.5	0.703	0.834	0.768
0.75	0.1	0.686	0.814	0.802
0.75	0.3	0.681	0.795	0.765
0.75	0.4	0.661	0.802	0.773
0.75	0.5	0.685	0.818	0.819

Table 4. A summary of parameter settings of all models.

Model	Optimizer/Solver	LR	K-Fold	Random State	Epoch/Iteration	Loss Function	Other Parameters
LTN	AdamW	0.001	5	42	500	Hybrid ( $α$ = 0.5, $λ$ = 0.4)	Activation = ELU; Dropout = 0.2
NN	AdamW	0.001	5	42	500	BCELoss	Activation = ELU; Dropout = 0.2
LR	lbfgs	–	5	42	500	Default	–
SVM	–	–	5	42	–	Default	Kernel = RBF; Probability = True
RF	–	–	5	42	–	Default	n_estimators = 100
KNN	–	–	5	42	–	Default	n_neighbors = 5
NB	–	–	5	42	–	Default	–

All models were trained and evaluated using 5-fold stratified cross-validation with a fixed random seed of 42 for reproducibility. The symbol “–” indicates parameters that are not applicable or correspond to default configuration values in the implementation.

Table 5. Performance comparison of different models on the Pima Indian Diabetes dataset.

Model	Accuracy	Precision	Recall	F1 Score	AUC-ROC	PR-AUC
LTN	0.86 [0.83, 0.88]	0.78 [0.70, 0.86]	0.83 [0.72, 0.95]	0.80 [0.77, 0.83]	0.92 [0.90, 0.93]	0.86 [0.85, 0.87]
NN	0.83 [0.79, 0.87]	0.71 [0.64, 0.78]	0.88 [0.87, 0.89]	0.78 [0.74, 0.82]	0.91 [0.89, 0.92]	0.85 [0.84, 0.86]
LR	0.77 [0.75, 0.80]	0.73 [0.65, 0.81]	0.57 [0.51, 0.64]	0.64 [0.61, 0.67]	0.84 [0.81, 0.87]	0.72 [0.68, 0.77]
SVM	0.75 [0.72, 0.78]	0.69 [0.61, 0.77]	0.55 [0.50, 0.59]	0.61 [0.57, 0.65]	0.82 [0.81, 0.84]	0.71 [0.68, 0.74]
RF	0.76 [0.73, 0.79]	0.68 [0.62, 0.75]	0.59 [0.53, 0.64]	0.63 [0.58, 0.68]	0.82 [0.80, 0.85]	0.69 [0.67, 0.72]
KNN	0.73 [0.69, 0.78]	0.63 [0.55, 0.71]	0.57 [0.53, 0.62]	0.60 [0.54, 0.66]	0.78 [0.75, 0.81]	0.61 [0.56, 0.65]
NB	0.75 [0.68, 0.81]	0.66 [0.52, 0.80]	0.59 [0.51, 0.68]	0.62 [0.54, 0.71]	0.81 [0.76, 0.85]	0.67 [0.56, 0.78]

All metrics are reported as mean values with 95% confidence intervals from 5-fold cross-validation. The bold is to show the best model.

Table 6. Wilcoxon Signed-Rank Test results comparing LTN with baseline models after Holm–Bonferroni correction (

α = 0.05

).

Table 6. Wilcoxon Signed-Rank Test results comparing LTN with baseline models after Holm–Bonferroni correction (

α = 0.05

).

Alternate Hypothesis $H_{1}$	Adjusted p-Value	Significant ( $p < 0.05$ )?
LTN performs better than NN	0.04883	Yes
LTN performs better than LR	0.01172	Yes
LTN performs better than SVM	0.01172	Yes
LTN performs better than RF	0.01172	Yes
LTN performs better than KNN	0.01172	Yes
LTN performs better than NB	0.01172	Yes

All tests are paired across identical folds and adjusted using the Holm–Bonferroni correction to control the family-wise error rate at

α = 0.05

.

Table 7. Learned predicate thresholds (mean ± 95% CI) compared to candidate thresholds from ADA/WHO guidelines.

Predicate	Candidate	Mean	CI Lower	CI Upper
GlucoseNormal	140	147.38	147.22	147.54
GlucosePre	199	201.63	201.35	201.90
GlucoseDiabetic	200	203.47	202.91	204.04
BMIUnder	18.5	19.22	19.10	19.35
BMINormal	24.9	26.31	26.26	26.36
BMIOver	25.0	26.96	26.91	27.01
BMIObese	30.0	31.99	31.94	32.03
AgeYoung	35	37.88	37.83	37.93
AgeRisk	45	47.47	47.41	47.53
DPFLow	0.2	0.278	0.275	0.280
DPFHigh	0.5	0.586	0.584	0.587

Candidate thresholds are the initial values from ADA/WHO guidelines. Mean and 95% confidence intervals are computed over 5 random seeds.

Table 8. Ablation study of LTN model components on the Pima Indian Diabetes dataset.

Setup	Accuracy	Precision	Recall	F1 Score	AUC-ROC	Stotal
LTN (No Axioms/No Thresholds)	0.84 ± 0.02	0.73 ± 0.04	0.86 ± 0.07	0.79 ± 0.03	0.90 ± 0.02	0.65 ± 0.03
LTN + Fixed Thresholds	0.85 ± 0.02	0.75 ± 0.05	0.85 ± 0.07	0.80 ± 0.02	0.91 ± 0.01	0.68 ± 0.02
LTN + Trainable Thresholds	0.86 ± 0.02	0.78 ± 0.02	0.83 ± 0.03	0.80 ± 0.02	0.92 ± 0.01	0.72 ± 0.06
LTN + Automatic Rule Generation	0.86 ± 0.03	0.73 ± 0.05	0.88 ± 0.06	0.80 ± 0.03	0.91 ± 0.01	0.66 ± 0.02

All values are reported as mean ± standard deviation. Stotal represents the average axiom satisfaction.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Mondal, S.; Ferraro, A.; Pecorelli, F.; De Pietro, G. A Logic Tensor Network-Based Neurosymbolic Framework for Explainable Diabetes Prediction. Appl. Sci. 2025, 15, 11806. https://doi.org/10.3390/app152111806

AMA Style

Mondal S, Ferraro A, Pecorelli F, De Pietro G. A Logic Tensor Network-Based Neurosymbolic Framework for Explainable Diabetes Prediction. Applied Sciences. 2025; 15(21):11806. https://doi.org/10.3390/app152111806

Chicago/Turabian Style

Mondal, Semanto, Antonino Ferraro, Fabiano Pecorelli, and Giuseppe De Pietro. 2025. "A Logic Tensor Network-Based Neurosymbolic Framework for Explainable Diabetes Prediction" Applied Sciences 15, no. 21: 11806. https://doi.org/10.3390/app152111806

APA Style

Mondal, S., Ferraro, A., Pecorelli, F., & De Pietro, G. (2025). A Logic Tensor Network-Based Neurosymbolic Framework for Explainable Diabetes Prediction. Applied Sciences, 15(21), 11806. https://doi.org/10.3390/app152111806

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

A Logic Tensor Network-Based Neurosymbolic Framework for Explainable Diabetes Prediction

Abstract

1. Introduction

2. Related Work

2.1. Traditional Machine Learning Approaches for Diabetes Prediction

2.2. Deep Learning Approaches

2.3. Explainable and Interpretable AI in Diabetes Risk Prediction

2.4. Neuro-Symbolic and Logic-Based Methods

2.5. Summary and Research Gap

2.6. Protocol Comparability

3. Dataset Description

4. Loss Function

5. Methodology

5.1. Overview of Logic Tensor Network

5.2. Medical Predicates and Feature Representation

5.3. Logical Formulation

5.4. Implementation of Predicates Using Trainable Thresholds

5.5. Integration of Medical Axioms in Training

6. Experimental Settings

7. Evaluation Metrics

8. Results

9. Ablation Study

10. Discussion

11. Proof of Concept

12. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

Abbreviations

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI