Article

A Novel Integration of Data-Driven Rule Generation and Computational Argumentation for Enhanced Explainable AI

by Lucas Rizzo 1, Damiano Verda 2, Serena Berretta 2 and Luca Longo 1,*

1 Artificial Intelligence and Cognitive Load Research Lab, School of Computer Science, Technological University Dublin, D07 H6K8 Dublin, Ireland
2 Rulex Innovation Labs, Via Felice Romani 9, 16122 Genova, Italy
* Author to whom correspondence should be addressed.
Mach. Learn. Knowl. Extr. 2024, 6(3), 2049-2073; https://doi.org/10.3390/make6030101
Submission received: 5 August 2024 / Revised: 2 September 2024 / Accepted: 6 September 2024 / Published: 12 September 2024
(This article belongs to the Section Data)

Abstract:
Explainable Artificial Intelligence (XAI) is a research area that clarifies AI decision-making processes to build user trust and promote responsible AI. Hence, a key scientific challenge in XAI is the development of methods that generate transparent and interpretable explanations while maintaining scalability and effectiveness in complex scenarios. Rule-based methods in XAI generate rules that can potentially explain AI inferences, yet they can also become convoluted in large scenarios, hindering their readability and scalability. Moreover, they often lack contrastive explanations, leaving users uncertain why specific predictions are preferred. To address this scientific problem, we explore the integration of computational argumentation—a sub-field of AI that models reasoning processes through defeasibility—into rule-based XAI systems. Computational argumentation enables arguments modelled from rules to be retracted based on new evidence. This makes it a promising approach to enhancing rule-based methods for creating more explainable AI systems. Nonetheless, research on their integration remains limited despite the appealing properties of rule-based systems and computational argumentation. Therefore, this study also addresses the applied challenge of implementing such an integration within practical AI tools. The study employs the Logic Learning Machine (LLM), a specific rule-extraction technique, and presents a modular design that integrates input rules into a structured argumentation framework using state-of-the-art computational argumentation methods. Experiments conducted on binary classification problems using various datasets from the UCI Machine Learning Repository demonstrate the effectiveness of this integration. The LLM technique excelled in producing a manageable number of if-then rules with a small number of premises while maintaining high inferential capacity for all datasets. In turn, argument-based models achieved comparable results to those derived directly from if-then rules, leveraging a concise set of rules and excelling in explainability. In summary, this paper introduces a novel approach for efficiently and automatically generating arguments and their interactions from data, addressing both scientific and applied challenges in advancing the application and deployment of argumentation systems in XAI.

1. Introduction

Explainable Artificial Intelligence (XAI) is an active research field that aims to understand how AI-based models arrive at inferences. One of its main focuses is designing and developing methods to construct more interpretable and explainable models from data. Such models should be highly accurate, transparent, and comprehensible to stakeholders. These properties can enhance human trust and effectively support the realization of trustworthy and responsible AI [1,2]. While many categories of XAI-based methods exist, one is based on rules. The rationale for using rules is that they are believed to be comprehensible, readable and understandable to humans, and they can facilitate the explainability and support the understanding of the inferential mechanisms of AI models [3]. For example, decision trees can produce if-then rules [4], while a fuzzy logic-based system can produce fuzzy rules [5]. However, rules can become too convoluted in large scenarios, presenting challenges regarding readability and scalability. For example, decision trees can be very accurate but have many deep branches, leading to very long rules that humans cannot easily consume. Also, such methods often cannot provide contrastive rules [3], which might help users, for instance, to explain and understand why one model's prediction was favored over another [6]. Hence, a key scientific challenge in XAI is the development of methods that generate transparent and interpretable explanations while maintaining scalability and effectiveness in complex scenarios.
One branch of artificial intelligence is automated reasoning, which models human reasoning via computational methods. A class of these methods is computational argumentation, a sub-field of AI specializing in implementing models of defeasible reasoning [7,8]. Technically, defeasible reasoning is a type of non-monotonic reasoning [9,10] which considers reasons, often in the form of rules, called arguments, organized in a dialogical structure [11]. This structure allows for the retraction of the conclusions or claims supported by some argument in light of new information or evidence supported by other arguments. Consequently, computational argumentation is an ideal candidate for creating efficient XAI methods [12]. This is because of the language used, namely rules and arguments, and the specific reasoning that can be modeled via defeasibility, a property of human reasoning under uncertainty and with contradicting and contrastive knowledge. Therefore, computational argumentation excels in explainability [13], and it is an ideal candidate for building explainable AI systems [14]. Despite the appealing properties of rule-based systems and computational argumentation, limited work exists at their intersection [15]. Therefore, this study addresses the applied challenge of implementing such an integration within practical AI tools. In particular, it contributes to this challenge by leveraging the capability of a particular rule extractor, named the Logic Learning Machine (LLM) [16], to produce if-then rules, and their transition into a structured argumentation framework [7] produced with state-of-the-art computational argumentation methods. A preliminary comparative analysis is proposed for classification tasks using datasets of different sizes. If-then rules are produced by the LLM technique for each dataset, targeting one of the available output classes. When a record matches multiple rules, conflicts are resolved using two distinct approaches. The first is the Standard Applied Procedure, a heuristic method in which rules are assigned weights based on their generality (covering) and precision (error rate); the prediction for each input instance is the output value for which the sum of the weights of the verified rules is highest. The second approach is implemented via computational argumentation.
The remainder of this manuscript is organized as follows. Section 2 presents work related to the research aim. Section 3 introduces the formalisms behind the LLM and structured argumentation. Section 4 proposes a comparative experiment aimed at systematically demonstrating how the integration of the selected rule extractor and computational argumentation methods can be realized. Results and discussion relative to this experiment are presented in Section 5. Lastly, Section 6 concludes the study and indicates potential directions for future work.

2. Related Work

2.1. Rule-Based Explainable Artificial Intelligence

The issue that limits the broader adoption of most AI-based systems is their lack of interpretability, the well-known black-box paradigm. This refers to the difficulty humans, whether experts or lay people, have in understanding the inferential mechanisms of such systems and how they arrive at final outputs. Explainable Artificial Intelligence (XAI) is a discipline that is gaining momentum and is focused on making AI-based black boxes transparent, interpretable, and understandable, among many other properties [15]. Many methods have been designed and deployed, and many others are being developed. For example, some authors [17] have attempted to create grey-box models aimed at introducing interpretability into otherwise opaque systems. While fully converting a black box into a white box may not always be feasible or desirable, XAI provides a middle ground by making black-box models more transparent and understandable through various explanation methods. Another class of such methods is rule-based ones [15]. These are particularly appealing because rules are often used in human reasoning to justify claims and are believed to have a good degree of comprehensibility and explainability. In particular, within rule-based methods, those generating if-then rules, such as Decision Trees (DTs) [18] and the Logic Learning Machine (LLM) [19], offer one approach to creating predictive models whose inferential mechanisms can be interpreted and comprehended with ease and, in turn, used for justification and explanation. That is mainly due to their ability to control the maximum number of premises and error thresholds per rule, thereby reducing the overall number of rules while maintaining strong inferential capacity [20]. This approach has been applied in various biomedical contexts [21], for the analysis of gene expression for cancer diagnosis [22], and in the task of disease risk prediction [23].
Although rule-based methods offer appealing properties for building explainable models from data, they handle contrastive rules and conflictuality poorly. In other words, they often lack strategies for dealing with conflicting rules in their premise or claim spaces and do not help shape exhaustive explanations for users. A survey describes several strategies to solve conflicts in rule-based applications [24]. The CN2 algorithm [25] resolves conflicts among rules by computing, for each rule r and each output class o, the number of examples of o that are covered by r; the class with the higher value is then assigned as the prediction. Another way to solve rule conflicts is the voting mechanism [26,27]. Each rule casts a vote for its predicted class using a weight equal to the probability of the given class; the votes are summed up, and the class with the highest probability is chosen as the final prediction. Other approaches to solving conflicts among rules are described in [28,29]. In the former work, the idea is to use the information of training examples at the intersection of the conflicting rules to assign a predictive class. Otherwise, if there are no training examples in the intersection, the conflicting rules should be partitioned into as few non-empty partitions as possible and these used to find the most likely output class. Double Induction is introduced in the latter work, aiming to induce new rules based on the examples involved in the rule conflict. These new rules are then used to classify the unlabeled example. This approach has been extended into a new approach called Recursive Induction [24]. Another approach to solving conflicts among rules is the Standard Applied Procedure, the particular one used in this research study and described in Section 3.1.5. Here, a rule's weight depends on its generality and precision on the training set and how these metrics compare to other rules predicting the same output class.

2.2. Argument-Based XAI and Defeasible Reasoning

Computational argumentation naturally addresses the aforementioned challenges and is closely linked to Explainable Artificial Intelligence [13]. This link arises because XAI approaches benefit from clarifying and defending AI systems' inferential outputs, while computational argumentation is focused on developing methods for linking these outputs to the evidence supporting them. In particular, systems based on computational argumentation can show, step by step, how an AI system arrives at a specific output, providing explicative justifications to human users [12]. This is an advantage when compared to other interpretable systems, for example, those following a Bayesian network approach. The argument-based approach is usually qualitative, focusing on the structure and content of logical arguments, whereas the Bayesian network approach [30] is quantitative, focusing on probabilistic reasoning and the systematic handling of uncertainty; its reasoning is more mathematical and relies on the formalism of probability theory. The choice between these methods depends on the nature of the problem at hand and the requirements for interpretability and reasoning style. In summary, while both argument-based and Bayesian network approaches can be considered white-box systems due to their interpretability, they differ significantly in their reasoning methodologies.
The scientific interaction between methods from XAI and computational argumentation is widely discussed in [31]. Similarly, arguments can be used to represent the reasons that lead an agent to make a decision, and argumentation semantics can be used to determine acceptable arguments, as in a scenario involving rescue robots [32]. A domain in which computational argumentation can significantly support the interpretability and explainability of AI-based systems is medical informatics [33]. Argumentative models have features and properties that resemble activities routinely carried out by humans to reason, justify, and explain under uncertainty, such as in medical informatics. In particular, thanks to their rich dialectical status, they can achieve various reasoning abstractions tightly connected to the notions of explainability and justification [31]. Such dialectical status can be achieved by applying argumentation semantics [34,35,36,37], as demonstrated in [38]. Such semantics, in essence, are strategies for resolving conflicts among arguments, often expressed as rules [39]. Prior work has attempted to employ argumentation methods for constructing predictive models, enabling decisions to be mapped to an argumentation framework with predefined attack properties between arguments (logical rules) [14,18]. These attack relations between arguments can yield justifiable decisions, clearly showing the steps taken to reach them. Ideally, this approach would bring the advantages of symbolic, non-monotonic reasoning into data-driven inferential models. However, it often faces limitations in inferential capacity across diverse applications. Recent research [13,40] has sought to integrate machine learning (ML) with argumentation, aiming to combine the former's inferential strength with the latter's explanatory power. These considerations indicate that the proposed integration could lead to more explainable models while tentatively maintaining robust accuracy.

2.3. Argument-Based XAI

Explainable Artificial Intelligence and argumentation are closely related, as in recent years argumentation has been used for providing explainability to AI [13]. The pairing arises from the fact that XAI approaches benefit from clarifying and defending their decisions, while argumentation provides a tangible, computational class of methods for linking any decision to the evidence supporting it [12]. Argumentation can show step by step how an AI system reaches a decision, explaining predictions to human users [12]. One example of this interaction is represented in [31], which provides an overview of XAI approaches built using methods from the field of computational argumentation, along with an argumentation-based approach for engineering explainable Belief-Desire-Intention (BDI) agents. Similarly, in a multi-agent scenario involving rescue robots, arguments were used to represent the reasons an agent could employ to make a decision, and argumentation semantics were employed to determine which of these could be accepted [32]. Computational argumentation was also used for dialectically explainable predictions [41]. In particular, a procedure was devised to extract argumentation debates from data so that their dialectical outcomes amount to predictions, such as classifications, that can be explained dialectically. In this context, debates consist of data-driven pieces of information that might not be linguistic but can be seen as arguments because they are dialectically related, for example, by disagreeing on data labels. The dialectical interaction with argument-based systems can enhance human understanding and support trustworthy AI. Analogously to debates, dialogue games based on arguments and their dialectical status can address users' explanation needs [42]. The use of computational argumentation has also been reviewed in the context of medical informatics, given the significant interest in developing explainable AI to support decision-making and diagnosis and to consider risks and responsibilities for clinicians [33]. Connected to this, argumentative models, thanks to their dialectical status, have features and properties that resemble activities routinely carried out by humans for reasoning and explaining, especially in clinical domains.

3. Integration of the Logic Learning Machine and Structured Argumentation

This section describes the formalisms behind the selected rule generation method (the Logic Learning Machine) in Section 3.1 and the selected structured argumentation approach used to integrate the extracted rules towards the production of a final, rational inference in Section 3.2.

3.1. Rule Generator: The Logic Learning Machine

The Logic Learning Machine (LLM) is a rule-based method. The LLM transforms the data into a Boolean domain where some Boolean functions (one for each output value) are reconstructed starting from a portion of their truth table with a method described in [16]. The method creates a set of intelligible rules through Boolean function synthesis following four steps: (a) Discretization; (b) Latticization or Binarization; (c) Positive Boolean function; (d) Rule generation.

3.1.1. Discretization

In this step, each continuous variable domain is converted into a discrete domain by a mapping $\psi_j : X_j \to I_{M_j}$, where $X_j$ is the domain of the $j$-th variable and $I_{M_j} = \{1, \ldots, M_j\}$ is the set of positive integers up to $M_j$, the number of discrete values for that variable. The mapping must preserve the ordering of the data: if $x_{ij} \leq x_{kj}$ then $\psi_j(x_i) \leq \psi_j(x_k)$, for $j = 1, \ldots, d$. One way to describe $\psi_j$ is that it consists of a vector of cutoffs $\gamma_j = (\gamma_{j1}, \ldots, \gamma_{j,M_j-1})$ so that:
$$\psi_j(x_i) = \begin{cases} 1, & \text{if } x_{ij} \leq \gamma_{j1} \\ m, & \text{if } \gamma_{j,m-1} < x_{ij} \leq \gamma_{jm} \\ M_j, & \text{if } x_{ij} > \gamma_{j,M_j-1} \end{cases}$$
The problem of determining cutoffs for discretization can be approached as an optimization problem. The goal is to minimize the total number of cutoffs while considering the constraint introduced in the preceding equation. In order to formalize the problem, the quantities $\tau_{jl}$ and $X_{jik}$, for $j \in \{1, \ldots, d\}$, $i, k \in \{1, \ldots, N\}$ and $l \in \{1, \ldots, \alpha_j\}$, must be introduced. $X_{jik}$ is the set of indexes $l$ such that $x_{ij} < \rho_{jl} < x_{kj}$, and the $\tau_{jl}$ are Boolean values that assert whether $\rho_{jl}$ is a cutoff for the $j$-th variable:
$$\tau_{jl} = \begin{cases} 1 & \text{if } \rho_{jl} \in \gamma_j \\ 0 & \text{otherwise} \end{cases}$$
Then, the optimization problem is given by:
$$\min_{\tau} \sum_{j=1}^{d} \sum_{l=1}^{\alpha_j} \tau_{jl} \quad \text{subject to} \quad \sum_{j=1}^{d} \sum_{l \in X_{jik}} \tau_{jl} \geq 1 \quad \forall\, i, k \ \text{s.t.}\ i \neq k$$
On the one hand, if the training set contains ambiguities, discretization must minimize both them and the number of cutoffs. The solution is to relax the constraints by introducing a set of auxiliary variables that control the violation of the starting constraints. On the other hand, constraints may be made stricter by imposing that examples of different classes are separated by at least $q > 1$ different variables. In any case, the abovementioned problem involves many variables and could require excessive computational cost. To find a near-optimal solution, a practical option is to use a greedy method, such as Attribute Driven Incremental Discretization (ADID) [43,44]. ADID uses the concept of effective distance $d_{\text{eff}}$, that is, the number of inputs that separate two examples:
$$d_{\text{eff}}(x_i, x_k) = \sum_{j=1}^{d} \theta\big(\,|\psi_j(x_i) - \psi_j(x_k)|\,\big), \quad \text{where } \theta(u) = \begin{cases} 1 & \text{if } u > 0 \\ 0 & \text{otherwise} \end{cases}$$
Given the effective distance, two examples $x_i, x_k$ are considered separated in a determinant way if $d_{\text{eff}}(x_i, x_k) \geq q$. For each candidate cutoff, it is possible to define the following:
  • The sets $S_{jl}^{(l)}, S_{jl}^{(r)}$ including the examples separated by the cutoff $\rho_{jl}$:
    $S_{jl}^{(l)} = \{\, x_i \mid (x_i, y_i) \in S,\ x_{ij} \leq \rho_{jl} \,\}$,  $S_{jl}^{(r)} = \{\, x_i \mid (x_i, y_i) \in S,\ x_{ij} > \rho_{jl} \,\}$
  • The counter $v_{jl}$ that counts the number of example pairs separated in a determinant way by $\rho_{jl}$:
    $v_{jl} = \sum_{x_i \in S_{jl}^{(l)}} \sum_{x_k \in S_{jl}^{(r)}} |y_i - y_k|\, \theta\big(q - d_{\text{eff}}(x_i, x_k)\big)$
  • The value $w_{jl}$ that measures the total effective distance between examples belonging to different classes that are separated in a determinant way by $\rho_{jl}$:
    $w_{jl} = \sum_{x_i \in S_{jl}^{(l)}} \sum_{x_k \in S_{jl}^{(r)}} |y_i - y_k|\, \theta\big(q - d_{\text{eff}}(x_i, x_k)\big)\, d_{\text{eff}}(x_i, x_k)$
Using these definitions, ADID starts from an empty cutoff set. At each step, it adds the cutoff that maximizes $v_{jl}$ (and, in a suborder, minimizes $w_{jl}$); $v_{jl}$ and $w_{jl}$ are then recalculated according to the added cutoff, and the process continues until $v_{jl} = 0$ for every $\rho_{jl}$. Once the cutoffs have been selected from $\rho_j$, their positions can be refined by taking the midpoint between the two boundary examples, as shown in the following equation:
$$\gamma_{jl} = \frac{\rho_{jl} + \rho_{j,l+1}}{2}$$
Before applying binarization, it is necessary to convert nominal attributes. In this case, the transformation is simpler than for continuous attributes, since it is sufficient to assign an integer to each possible nominal value.
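To make the discretization step concrete, the following is a minimal sketch in Python (not the Rulex implementation; the function names, the use of NumPy, and the toy data are illustrative) of an ordering-preserving mapping for a single continuous feature given a set of cutoffs, together with the effective distance used by ADID:

```python
import numpy as np

def discretize(column, cutoffs):
    """Ordering-preserving discretization psi_j for one continuous feature.

    Values up to the first cutoff map to 1, values above the last cutoff
    map to M_j = len(cutoffs) + 1; cutoffs must be sorted increasingly.
    """
    # searchsorted with side="left" counts cutoffs strictly below each value,
    # so values equal to a cutoff fall into the lower interval.
    return np.searchsorted(cutoffs, column, side="left") + 1

def effective_distance(a, b):
    """Number of discretized features on which two examples differ (d_eff)."""
    return int(np.sum(np.asarray(a) != np.asarray(b)))

# Toy usage: one feature with two cutoffs -> three discrete values.
col = np.array([0.2, 1.5, 3.7, 2.9])
print(discretize(col, cutoffs=[1.0, 3.0]))       # [1 2 3 2]
print(effective_distance([1, 2, 3], [1, 3, 3]))  # 1
```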

3.1.2. Binarization

In this step, each discretized domain is transformed into a binary domain through a mapping $\phi_j : I_{M_j} \to \{0,1\}^{M_j}$, where $I_{M_j}$ is the (discretized) domain of the $j$-th variable and $\{0,1\}^{M_j}$ is the set of strings having one bit for each possible value in $I_{M_j}$. The mapping must maintain the ordering of the data: $u < v$ if, and only if, $\phi_j(u) < \phi_j(v)$, where the ordering between $z, w \in \{0,1\}^{M_j}$ is defined as follows:
$z < w$ if, and only if, $\exists j$ such that $z_j < w_j$ and $z_l \leq w_l$ for every $l \neq j$;
$z \leq w$ if, and only if, $z_j \leq w_j$ for every $j = 1, \ldots, M_j$.
If the latter relation holds, then it is said that $w$ covers $z$. A suitable choice for $\phi_j$ is the inverse only-one coding, which for each $k \in I_{M_j}$ creates a string $h \in \{0,1\}^{M_j}$ having all bits equal to 1 except the $k$-th bit, which is set to 0. For example, let $x_{ij} = 3$ with domain $I_5$; then $\phi_j(x_i) = 11011$. In this way $\phi(x_i) = z_i$, where $z_i$ is obtained by concatenating $\phi_j(x_i)$ for $j = 1, \ldots, d$. As a result, the new training set is $S' = \{(z_i, y_i)\}_{i=1}^{N}$, with $z_i \in \{0,1\}^B$, where $B = \sum_{j=1}^{d} M_j$.
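A minimal sketch of the inverse only-one coding and of the concatenation of per-feature codings into the Boolean string, assuming the discretized values and domain sizes are already available (function names are illustrative):

```python
def inverse_only_one(value, M):
    """Inverse only-one coding: an M-bit string of 1s with a single 0 at position `value`."""
    bits = ["1"] * M
    bits[value - 1] = "0"   # `value` is a discretized level in {1, ..., M}
    return "".join(bits)

def binarize(example, domain_sizes):
    """Concatenate the per-feature codings into the Boolean string z_i."""
    return "".join(inverse_only_one(v, M) for v, M in zip(example, domain_sizes))

# The example from the text: value 3 in a domain of size 5 -> '11011'.
print(inverse_only_one(3, 5))    # 11011
print(binarize([3, 1], [5, 2]))  # '11011' + '01' = '1101101'
```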

3.1.3. Synthesis of the Boolean Function

The training set $S'$, obtained after binarization, can be divided into different subsets according to the output class: $T$ is the set containing the pairs $(z_i, y_i)$ with $y_i = 1$, whereas $F$ is the set containing the examples for which $y_i = 0$. $T$ and $F$ can be viewed as portions of the truth table of the Boolean function $f$ that must be reconstructed. In line with the definitions provided in [45], some notations are described below:
  • Each Boolean function can be written with operators AND, OR, and NOT that constitute the Boolean algebra; if NOT is not considered, then a simpler structure, called a Boolean lattice, is obtained. From now on, only the Boolean lattice is considered.
  • The sum (OR) and product (AND) of $m$ terms can be denoted as follows:
    $\sum_{j=1}^{m} z_j = z_1 + z_2 + \cdots + z_m = z_1 \text{ OR } z_2 \text{ OR } \cdots \text{ OR } z_m$
    $\prod_{j=1}^{m} z_j = z_1 \cdot z_2 \cdots z_m = z_1 \text{ AND } z_2 \text{ AND } \cdots \text{ AND } z_m$
  • A logical product is called an implicant of a function $f$ if the following relation holds: $\prod_{j=1}^{m} z_j \leq f$, where each element $z_j$ is called a literal.
The algorithm the LLM employs to produce implicants is called Shadow Clustering [46]. It generates implicants for $f$ by analysing the Boolean lattice $\{0,1\}^B$. The algorithm selects a node in the lattice diagram and generates bottom points of $(T, F)$ by descending the diagram; moving down from one node to another is equivalent to changing a component from 1 to 0, and a bottom point is added to the set $A$ when any further move down leads to a node belonging to the lower shadow of some $w \in F$. In particular, the starting node is chosen among the $z \in T \subset \{0,1\}^B$ that do not cover any point $a \in A$ (i.e., with $a \leq z$); in other words, the algorithm ends when each element in $T$ covers at least one element in $A$. Once $A$ has been found, it may contain redundant elements, and consequently, it must be simplified to find $A^* \subseteq A$, from which the Positive Disjunctive Normal Form (PDNF) of the positive Boolean function is derived. Different versions of Shadow Clustering exist depending on the choice of the element to be switched from 1 to 0 at each step of the diagram descent. For example, Exhaustive Shadow Clustering (ESC) retrieves all the bottom points deriving from the starting node. Since ESC has a high computational cost, it is usually better to use greedy approaches that, at each step, change the $i$-th element of the examined node so as to minimize the complexity of the final function, measured in terms of the number of implicants and the number of literals in each implicant. For example, Maximum-covering Shadow Clustering (MSC) at each step changes the element that maximizes the potential covering associated with the element index, where the potential covering associated with switching index $i$ from 1 to 0 is defined as the number of elements $z \in T$ for which $z_i = 0$. As for the selection of $A^* \subseteq A$, a possible choice is to iteratively add to $A^*$ the element of $A$ that covers the highest number of points in $T$ not yet covered by any other element of $A^*$ (see the sketch below). By leveraging the synthesized Boolean functions, it is possible to construct rules that are applied to the data and provide a prediction after handling any arising conflicts. Rule generation is detailed in the next subsection.
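As a sketch of the simplification step just described, the following greedy selection of $A^*$ from $A$ assumes Boolean vectors represented as tuples of 0/1 and the covering relation $z \geq a$ component-wise; it is an illustration, not the Rulex implementation:

```python
def covers(z, a):
    """z covers a when z has a 1 wherever a has a 1 (z >= a component-wise)."""
    return all(zi >= ai for zi, ai in zip(z, a))

def select_prime_implicants(A, T):
    """Greedy simplification of the bottom-point set A into A*.

    At each step, pick the element of A covering the largest number of points
    of T not yet covered by A*; stop when T is covered or no candidate helps.
    """
    uncovered = list(T)
    a_star = []
    while uncovered:
        best = max(A, key=lambda a: sum(covers(z, a) for z in uncovered))
        gained = [z for z in uncovered if covers(z, best)]
        if not gained:
            break  # remaining points of T cannot be covered by any element of A
        a_star.append(best)
        uncovered = [z for z in uncovered if not covers(z, best)]
    return a_star

# Toy usage with 4-bit Boolean vectors.
T = [(1, 1, 0, 1), (1, 0, 1, 1), (0, 1, 1, 1)]
A = [(1, 0, 0, 1), (0, 0, 1, 1), (0, 1, 0, 1)]
print(select_prime_implicants(A, T))  # [(1, 0, 0, 1), (0, 0, 1, 1)] covers all of T
```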

3.1.4. Rule Generation

Let $r_i$ denote the $i$-th rule. Each rule $r_i$ can be represented in a general form as:
$$r_i: \text{if } \bigwedge_{j=1}^{m_i} F_{ij}(v_{ij}) \text{ then } y_i$$
where $\bigwedge$ denotes logical AND, $F_{ij}(v_{ij})$ represents the $j$-th feature with a corresponding value (or range of values) $v_{ij}$ in rule $r_i$, $y_i$ denotes the output class associated with rule $r_i$, and $m_i$ represents the number of premises (features) in rule $r_i$. Rules are generated using the Boolean functions mentioned before. An intelligible rule is the result of transforming each implicant of the positive Boolean function $f$. A function is generated for each output value, and the consequent rules only depend on $f$. The transformation considers the coding applied during binarization. In particular, $z$ was obtained by concatenating the results of the mappings $\phi_j(x)$ for $j = 1, \ldots, d$, and consequently, it can be split into one substring $h_j$ for each attribute, where each element $z_i \in h_j$ corresponds to a nominal value if $X_j$ is nominal, or to an interval if $X_j$ is ordered. For each implicant, a rule in if-then form is generated by adding a condition for each attribute $X_j$ as follows (a code sketch is given after the list):
  • If $z_i = 0$ for each $z_i \in h_j$, then no condition relative to $X_j$ is added to the rule.
  • If $X_j$ is nominal, then a condition $X_j \in V$ is added to the rule, where $V$ is the set of values associated with each bit $z_i = 0$ in $h_j$.
  • If $X_j$ is ordered, then a condition $X_j \in V$ is added to the rule, where $V$ is the union of the intervals associated with each bit $z_i = 0$ in $h_j$.
Notice that conventionally, if finite, the upper bound is always included in the condition, whereas the lower bound is excluded.
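The following sketch illustrates the translation of a single implicant into an if-then rule according to the conditions above. The textual rendering of conditions and the attribute metadata structure are assumptions for illustration only:

```python
def implicant_to_rule(implicant, attributes, output_class):
    """Translate one implicant of the positive Boolean function into an if-then rule.

    `implicant` is the concatenated bit string; `attributes` is a list of
    (name, kind, values) where `values` are the nominal labels or the ordered
    interval labels behind each bit of that attribute's substring h_j.
    """
    conditions, pos = [], 0
    for name, kind, values in attributes:
        bits = implicant[pos:pos + len(values)]
        pos += len(values)
        selected = [v for b, v in zip(bits, values) if b == "0"]
        if len(selected) == len(values):   # all bits are 0: attribute unconstrained
            continue
        if kind == "nominal":
            conditions.append(f"{name} in {{{', '.join(selected)}}}")
        else:                              # ordered: union of the selected intervals
            conditions.append(f"{name} in {' U '.join(selected)}")
    return f"IF {' AND '.join(conditions)} THEN class = {output_class}"

# Toy usage: one ordered attribute with 3 intervals and one nominal attribute.
attrs = [("age", "ordered", ["(-inf,30]", "(30,50]", "(50,inf)"]),
         ("sex", "nominal", ["male", "female"])]
print(implicant_to_rule("011" + "01", attrs, "yes"))
# IF age in (-inf,30] AND sex in {male} THEN class = yes
```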
Several options can be used to control the generated rules, for example:
  • the number of rules for each class; it affects the total number of generated rules;
  • the maximum number of conditions in a rule; it forces the number of premises of each rule not to exceed a certain threshold;
  • the maximum error (in %) that a rule can score.
By acting on these parameters, it is possible to control the properties of the generated rule set. Rules for the other class are generated by swapping the labels, i.e., relabeling $C_0$ with 1 and $C_1$ with 0. In the case of a multiclass problem, it is sufficient to decompose the problem into several bi-class problems. According to the One-vs-Rest (OVR) strategy described so far, for each of the sub-problems the target class is labeled with 1 and the remainder with 0.

3.1.5. Rules Aggregator Logic: Standard Applied Procedure

When applying a Logic Learning Machine model, a sample may verify rules predicting different output classes. In this case, the predicted output class is established following the so-called Standard Applied Procedure based on the rule relevance concept. To present relevance, the following quantities relative to a rule r in the IF <premise> THEN <consequence> form are introduced:
  • $\nu_1^r$ is the number of training set examples that satisfy both the premise and the consequence of the rule $r$;
  • $\nu_2^r$ is the number of training set examples that satisfy the premise but do not satisfy the consequence of the rule $r$;
  • $\nu_3^r$ is the number of training set examples that satisfy neither the premise nor the consequence of the rule $r$;
  • $\nu_4^r$ is the number of training set examples that do not satisfy the premise but satisfy the consequence of the rule $r$.
Notice that an example $x_i$ satisfies the premise of the rule $r$ if it satisfies all of its premise conditions, whereas $x_i$ does not satisfy the premise of the rule $r$ if it fails at least one of its premise conditions. Combining these quantities, it is possible to compute two quality measures for a rule $r$:
  • Covering: $C_r = \dfrac{\nu_1^r}{\nu_1^r + \nu_4^r}$,
  • Error: $E_r = \dfrac{\nu_2^r}{\nu_2^r + \nu_3^r}$.
On the one hand, it is evident that the greater the covering, the more relevant the rule is. On the other hand, the greater the error, the less relevant the rule is. The relevance of a rule $r$ is then obtained by combining $C_r$ and $E_r$:
$$R_r = C_r \times (1 - E_r).$$
Once the relevance of a rule is defined, it is possible to use it to compute, for each class $c$, a score $S(x_i, c)$ that measures how likely it is that $y_i = c$:
$$S(x_i, c) = \sum_{r \in R_c^i} R_r$$
with $R_c^i = \{\, r \in R \mid x_i \text{ satisfies the premise of } r \text{ and } O(r) = c \,\}$, where $R$ is the complete rule set and $O(r) = c$ denotes that the consequence of $r$ predicts class $c$. Then, $R_c^i$ is the set of rules satisfied by $x_i$ that predict class $c$. From the scores of each output class, it is possible to define the probability that $y_i = c$:
$$P(c \mid x_i) = \frac{S(x_i, c)}{\sum_{k \in C} S(x_i, k)}$$
Then, the selected output is the one that maximizes this probability: $y_i^* = \arg\max_{c} P(c \mid x_i)$.
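A minimal sketch of the Standard Applied Procedure described above, computing covering, error, relevance, and the per-class scores and probabilities for a new instance; the rule representation and the toy data are illustrative, not the Rulex implementation:

```python
from collections import defaultdict

def relevance(rule, X_train, y_train):
    """Relevance R_r = C_r * (1 - E_r), from covering and error on the training set."""
    n1 = n2 = n3 = n4 = 0
    for x, y in zip(X_train, y_train):
        premise_ok = rule["premise"](x)
        consequence_ok = (y == rule["output"])
        if premise_ok and consequence_ok:
            n1 += 1                      # nu_1: premise and consequence satisfied
        elif premise_ok:
            n2 += 1                      # nu_2: premise satisfied, consequence not
        elif not consequence_ok:
            n3 += 1                      # nu_3: neither satisfied
        else:
            n4 += 1                      # nu_4: consequence satisfied, premise not
    covering = n1 / (n1 + n4) if (n1 + n4) else 0.0
    error = n2 / (n2 + n3) if (n2 + n3) else 0.0
    return covering * (1.0 - error)

def predict(rules, x, X_train, y_train):
    """Standard Applied Procedure: sum the relevance of the verified rules per class."""
    scores = defaultdict(float)
    for r in rules:
        if r["premise"](x):
            scores[r["output"]] += relevance(r, X_train, y_train)
    total = sum(scores.values())
    if total == 0:
        return None, {}
    probs = {c: s / total for c, s in scores.items()}
    return max(probs, key=probs.get), probs

# Toy usage: two conflicting rules over a single numeric feature.
rules = [{"premise": lambda x: x[0] > 5, "output": 1},
         {"premise": lambda x: x[0] < 8, "output": 0}]
X_train, y_train = [[2], [6], [7], [9]], [0, 1, 1, 1]
print(predict(rules, [6.5], X_train, y_train))  # class 1 preferred, probability ~0.75
```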

3.2. Rules Aggregator Logic: Computational Argumentation

Contrary to expert systems, which usually apply a monotonic logic for the aggregation of rules, defeasible reasoning, as described in Section 2.2, applies a non-monotonic logic [14]. Models built with such logic are usually structured over different layers [40], with a sequential structure: (a) Creation of the structure of arguments; (b) Definition of the conflicts of arguments; (c) Evaluation of conflicts among arguments; (d) Definition of the dialectical status of arguments; and (e) Accrual of acceptable arguments and final inference (Figure 1, right). The following sections describe the formalism used to implement each of these phases.

3.2.1. Creation of the Structure of Arguments

This research study uses if-then rules directly as defeasible arguments. Such rules are those formed with the procedure described in Section 3.1. This implies that the arguments modeled from rules are not considered strict and can be retracted (Section 2.2) in light of the information present in other arguments.

3.2.2. Conflicts of Arguments

The interaction of arguments must be precisely formalized to properly implement a non-monotonic, defeasible logic. Since the rules produced by the LLM rule generator have a binary inferential mechanism (Section 3.1.2), meaning each rule supports one of two possible outcomes (claims), a specific exclusion mechanism is necessary. In abstract argumentation, this can be modeled with the notion of 'attack', a specific way of tackling conflicts: an argument that infers a specific claim attacks another with a different claim. Technically, this is known in the literature as a rebuttal attack, a type of conflict that occurs when an argument negates the conclusion of another argument [40,47]. This mechanism generates an initial multipartite argumentation graph, in which all the arguments that support one claim attack all other arguments that support a different one. Figure 2 depicts an abstract example with two output classes $y_1$ and $y_2$.

3.2.3. Evaluation of Conflicts

The argumentation graph built with if-then rules can be elicited with data, following the notion of instantiation-based argumentation [48]. Arguments are activated or discarded based on whether their premises are true or false. Consequently, a new, instantiated abstract argumentation framework emerges as a subset of the initial multipartite argumentation graph. Similarly, only a subset of the attacks defined in the multipartite graph might be activated: attacks are binary relations, and a successful (activated) attack occurs whenever both its source (the attacking argument) and its target (the attacked argument) are activated. Since, by design, a multipartite graph for a binary classification task has two sets of arguments, each containing arguments supporting the same claim, the effect of their interaction is intuitive: two distinct possible outcomes. Consequently, the notion of an inconsistency budget is introduced in this study to improve the evaluation of conflicts [49,50,51,52,53]. This is a specific threshold that indicates how much inconsistency can be tolerated. Given an inconsistency budget $\beta$, only a set of attacks whose sum of weights is less than or equal to $\beta$ can be considered. The relevance score of an if-then rule (Equation (2)) is used to weigh arguments. In detail, the weight of a single attack is calculated as the sum of the weights of its source and target arguments (the attacker and the attacked). The total sum of the attacks' weights in an instantiated argumentation framework represents the maximum $\beta$ allowed in an argumentation graph. When $\beta$ is reduced, attacks must be removed until the new sum of weights fits within the limit, following these three priorities (from most to least important):
  • P1: Rebuttal attacks.
  • P2: Attack with smaller weight.
  • P3: In cases of equal weights, the attack whose source argument has the smallest weight.
Since the rules generated by the LLM technique always lead to multipartite graphs, the first priority P1 is defined to reduce the inconsistencies caused by the numerous rebuttals consequently created. In other words, removing an attack from a rebuttal is preferable to removing a uni-directional attack between two arguments. Eliminating all attacks would also lead to undecided cases, where multiple arguments supporting different claims could be accepted simultaneously. After reducing the number of rebuttals, P2 and P3 aim to prioritize accepting more significant arguments (based on the weight calculated via rules coverage and error). Once the valid attacks under a specific budget are defined, their dialectical status can be computed as described in the following section.
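The following sketch illustrates one plausible reading of this pruning procedure: attacks are removed in the order given by priorities P1–P3 until the total attack weight fits the budget. The data structures and the toy example are illustrative, and details (e.g., exactly how a rebuttal pair is handled) may differ from the ArgFrame implementation:

```python
def prune_attacks(attacks, arg_weight, beta_fraction):
    """Remove attacks until the total attack weight fits the inconsistency budget.

    `attacks` is a list of (source, target) pairs; `arg_weight` maps argument
    ids to their relevance-based weights; `beta_fraction` in (0, 1] scales the
    maximum budget, i.e., the total attack weight of the instantiated graph.
    """
    attack_set = set(attacks)
    weight = lambda a: arg_weight[a[0]] + arg_weight[a[1]]
    budget = beta_fraction * sum(weight(a) for a in attack_set)

    def removal_priority(a):
        is_rebuttal = (a[1], a[0]) in attack_set        # P1: rebuttals removed first
        return (not is_rebuttal, weight(a), arg_weight[a[0]])  # P2: weight, P3: source weight

    kept = set(attack_set)
    for a in sorted(attack_set, key=removal_priority):
        if sum(weight(x) for x in kept) <= budget:
            break
        kept.discard(a)
    return kept

# Toy usage: arguments a and b rebut each other; c additionally attacks b.
arg_w = {"a": 0.9, "b": 0.4, "c": 0.6}
attacks = [("a", "b"), ("b", "a"), ("c", "b")]
print(prune_attacks(attacks, arg_w, beta_fraction=0.5))  # {('c', 'b')}
```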

3.2.4. Dialectical Status of Arguments

The dialectical status of arguments can be defined according to argumentation semantics designed for Dung-style abstract argumentation frameworks [54]. Hence, the argumentation framework is considered a directed graph at this stage of the reasoning process. In this graph, the arguments (nodes) are connected by attack relations (edges), temporarily disregarding any previously defined internal structure of arguments and any information associated with attacks. In this study, two classes of semantics are employed to define the dialectical status of arguments in such a graph: extension-based semantics [39,54] and ranking-based semantics [55]. Their formal definitions are presented below.

Extension-Based Semantics

The notions of reinstatement and conflict-freeness are pillars of extension-based semantics, including the popular Dung-based semantics [54], such as the grounded and preferred. The formal definitions described below are those provided in [56,57].
Definition 1.
Let $\langle Ar, R \rangle$ be an argumentation framework (AF), $a, b \in Ar$ and $args \subseteq Ar$. The following shorthand notations are employed:
$a^+ = \{\, b \mid (a, b) \in R \,\}$;
$args^+ = \{\, b \mid (a, b) \in R \text{ for some } a \in args \,\}$;
$a^- = \{\, b \mid (b, a) \in R \,\}$;
$args^- = \{\, b \mid (b, a) \in R \text{ for some } a \in args \,\}$.
$a^+$ indicates the arguments attacked by $a$, while $a^-$ indicates the arguments attacking $a$. $args^+$ indicates the set of arguments attacked by $args$, while $args^-$ indicates the set of arguments attacking $args$.
Definition 2.
Let $\langle Ar, R \rangle$ be an AF and $args \subseteq Ar$. $args$ is conflict-free iff $args \cap args^+ = \emptyset$.
Definition 3.
Let $\langle Ar, R \rangle$ be an AF, $A \in Ar$ and $args \subseteq Ar$. $args$ defends an argument $A$ iff $A^- \subseteq args^+$.
Definition 4.
Let $\langle Ar, R \rangle$ be an AF and $Lab: Ar \to \{in, out, undec\}$ be a labeling function. $Lab$ is a reinstatement labeling iff it satisfies:
  • $\forall A \in Ar: \big(Lab(A) = out \iff \exists B \in Ar: (B \text{ attacks } A \wedge Lab(B) = in)\big)$ and
  • $\forall A \in Ar: \big(Lab(A) = in \iff \forall B \in Ar: (B \text{ attacks } A \rightarrow Lab(B) = out)\big)$
Definition 5.
Let $Args$ be a conflict-free set of arguments, $F: 2^{Args} \to 2^{Args}$ a function such that $F(Args) = \{\, A \mid A \text{ is defended by } Args \,\}$, and $Lab: Args \to \{in, out, undec\}$ a reinstatement labeling function. Also consider $in(Lab)$ short for $\{A \in Args \mid Lab(A) = in\}$, $out(Lab)$ short for $\{A \in Args \mid Lab(A) = out\}$ and $undec(Lab)$ short for $\{A \in Args \mid Lab(A) = undec\}$.
$Args$ is admissible if $Args \subseteq F(Args)$.
$Args$ is a complete extension if $Args = F(Args)$.
$in(Lab)$ is a grounded extension if $undec(Lab)$ is maximal (equivalently, if $in(Lab)$ is minimal, or if $out(Lab)$ is minimal).
$in(Lab)$ is a preferred extension if $in(Lab)$ is maximal (equivalently, if $out(Lab)$ is maximal).
Argumentation semantics, such as grounded and preferred, provide a mechanism for defining the dialectical status of an argument, a sort of justification status. According to such semantics, an argument is justified if it can withstand its attackers. Thus, extension-based semantics provides a perspective one can take when deciding on accepted, rejected, and undecided arguments. The extension (set of acceptable arguments) can defend itself and remain internally coherent, even if someone disagrees with its viewpoint [36]. The grounded semantics produces an extension considered more sceptical, as it takes fewer committed choices and always offers a single extension. In contrast, the preferred semantics can be seen as a more credulous approach to being more audacious when accepting arguments since it can produce different extensions.
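As an illustration of how a single, sceptical extension can be computed, the following sketch implements the standard iterative algorithm for the grounded labeling of an abstract argumentation framework (argument identifiers and the toy graph are illustrative):

```python
def grounded_labeling(args, attacks):
    """Grounded labeling (in / out / undec) of an abstract argumentation framework.

    Standard fixpoint: label IN every argument whose attackers are all OUT
    (initially, the unattacked ones), label OUT every argument with an IN
    attacker, and repeat until nothing changes; the rest stays UNDEC.
    """
    attackers = {a: {s for s, t in attacks if t == a} for a in args}
    label = {a: "undec" for a in args}
    changed = True
    while changed:
        changed = False
        for a in args:
            if label[a] != "undec":
                continue
            if all(label[b] == "out" for b in attackers[a]):
                label[a] = "in"
                changed = True
            elif any(label[b] == "in" for b in attackers[a]):
                label[a] = "out"
                changed = True
    return label

# Toy usage: c attacks b and b attacks a, so c reinstates a.
print(grounded_labeling(["a", "b", "c"], [("c", "b"), ("b", "a")]))
# {'a': 'in', 'b': 'out', 'c': 'in'}
```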

Ranking-Based Semantics

Another well-known class of semantics is the ranking-based [55]. Here, the goal is not to compute an extension of arguments but to rank them. It is a more flexible class of semantics because arguments are not strictly rejected or accepted. Instead, a graded assessment is executed based on the topology of an argumentation framework. This approach addresses potential drawbacks in extension-based semantics. These include the problem of multiple attacks on an argument having the same effect as one, the acceptance level of arguments always being the same, and arguments having the same status without direct comparison. The following definitions are those presented in [55].
Definition 6.
A ranking-based semantics $\sigma$ associates with any argumentation framework $AF = \langle Args, att \rangle$ a ranking $\succeq^{\sigma}_{AF}$ on $Args$, where $\succeq^{\sigma}_{AF}$ is a binary relation that is total ($\forall a, b \in Args$, $a \succeq^{\sigma}_{AF} b$ or $b \succeq^{\sigma}_{AF} a$) and transitive ($\forall a, b, c \in Args$, if $a \succeq^{\sigma}_{AF} b$ and $b \succeq^{\sigma}_{AF} c$, then $a \succeq^{\sigma}_{AF} c$). $a \succeq^{\sigma}_{AF} b$ means that $a$ is at least as acceptable as $b$, and $a \succ^{\sigma}_{AF} b$ means that $a$ is strictly more acceptable than $b$.
A ranking-based semantics assigns a value to each argument based on its number of attackers, as proposed in [58]. For this purpose, a categorizer function is defined as follows:
Definition 7.
Let $\langle Args, att \rangle$ be an argumentation framework. Then, $Cat: Args \to (0, 1]$ is the categorizer function defined as:
$$Cat(a) = \begin{cases} 1 & \text{if } a^- = \emptyset \\ \dfrac{1}{1 + \sum_{c \in a^-} Cat(c)} & \text{otherwise} \end{cases}$$
Definition 8.
Given an argumentation framework $\langle Args, att \rangle$ and a categorizer function $Cat: Args \to (0, 1]$, a ranking-based categorizer semantics associates a ranking $\succeq^{Cat}_{AF}$ on $Args$ such that $\forall a, b \in Args$, $a \succeq^{Cat}_{AF} b$ iff $Cat(a) \geq Cat(b)$.
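A minimal sketch of the categorizer semantics: the recursive definition above can be approximated by fixed-point iteration, which also handles cyclic attack graphs (the iteration count and the toy example are illustrative choices):

```python
def categorizer_ranking(args, attacks, iterations=100):
    """Approximate the categorizer values Cat(a) = 1 / (1 + sum of attackers' Cat).

    Unattacked arguments converge to 1; fixed-point iteration also handles
    cyclic attack graphs such as the rebuttals produced by the LLM rules.
    """
    attackers = {a: [s for s, t in attacks if t == a] for a in args}
    cat = {a: 1.0 for a in args}
    for _ in range(iterations):
        cat = {a: 1.0 / (1.0 + sum(cat[b] for b in attackers[a])) for a in args}
    ranking = sorted(args, key=lambda a: cat[a], reverse=True)
    return ranking, cat

# Toy usage: a and b rebut each other, c additionally attacks b -> c > a > b.
ranking, values = categorizer_ranking(["a", "b", "c"], [("a", "b"), ("b", "a"), ("c", "b")])
print(ranking, {k: round(v, 3) for k, v in values.items()})
# ['c', 'a', 'b'] {'a': 0.732, 'b': 0.366, 'c': 1.0}
```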

3.2.5. Accrual of Acceptable Arguments and Final Inference

A definitive inference must be made in the final step of this defeasible reasoning process. Two straightforward cases are as follows: (i) an extension-based semantics produces a single extension where all arguments support the same claim, and (ii) the top argument(s) ranked by a ranking-based semantics support the same claim. In both cases, the final inference is the claim supported by the accepted/top-ranked argument(s). However, if neither case applies, the following ad hoc strategies are employed based on the type of semantics used:
  • Extension-based semantics:
    - An extension containing arguments that support different claims is not employed for the final decision, as it does not provide a single, justifiable point of view. This suggests that the inconsistency budget was set too small. Therefore, the final inference is left undecided, since no extension can be accepted.
    - When multiple acceptable extensions are produced, their credibility is determined by the number of accepted arguments in each extension, disregarding argument weights since they have already been used to define the attack weights. This is a limited, simplistic approach to reduce the number of undecided situations after applying semantics, like the preferred one, that could yield multiple extensions. Eventually, the defeasible reasoning process ends undecided if all extensions have the same number of arguments.
  • Ranking-based semantics:
    - The inference is left undecided if the ranking-based semantics produces a tie between top-ranked arguments supporting different claims.
Figure 3 shows an illustrative example of argument elicitation under various inconsistency budgets and the dialectical status of arguments under different semantics. Only graph 3 leads to non-empty extensions in which arguments support the same claim. Using the categorizer semantics, only graphs 1 and 3 lead to top-ranked arguments supporting the same claim.
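The two straightforward cases and the tie-breaking rules above can be summarized in a short sketch; the claim mapping and toy inputs are illustrative, and the multi-extension case (counting accepted arguments) is omitted for brevity:

```python
def infer_from_extension(extension, claim_of):
    """Final inference from a single extension: the shared claim, else undecided."""
    claims = {claim_of[a] for a in extension}
    return claims.pop() if len(claims) == 1 else "undecided"

def infer_from_ranking(ranking, values, claim_of):
    """Final inference from a ranking: claim of the top-ranked argument(s), else undecided."""
    top_value = values[ranking[0]]
    top_claims = {claim_of[a] for a in ranking if values[a] == top_value}
    return top_claims.pop() if len(top_claims) == 1 else "undecided"

# Toy usage with outputs shaped like those of the previous sketches.
claim_of = {"a": "y1", "b": "y2", "c": "y1"}
print(infer_from_extension({"a", "c"}, claim_of))                                      # y1
print(infer_from_ranking(["c", "a", "b"], {"c": 1.0, "a": 0.73, "b": 0.37}, claim_of))  # y1
```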

4. Design and Methods

This research study aims to achieve a good level of automation for building argument-based models from data without resorting to human intervention. In particular, this automation can be obtained by leveraging the specific Logic Learning Machine (LLM) rule extractor [16] (as described in Section 3.1) to automatically produce if-then rules. Subsequently, state-of-the-art computational argumentation methods are used for semi-automatically aggregating these rules into a structured argumentation framework [7], a multipartite graph believed to be more explainable and interpretable than a mere list of rules. This hypothesis is tested via a comparative analysis of the predictive capacity of the LLM procedure and that of the computational argumentation pipeline (described in Section 3.2). Such predictive capacity, and its generalizability across different binary classification tasks, is tested by employing four open-access datasets of different sizes and dimensionality. The detailed formal hypothesis is:
Hypothesis 1.
IF defeasible argument-based models are built with if-then rules generated by the LLM technique AND their conflictuality is handled with argument-based semantics (preferred, grounded, categorizer), then these models will achieve similar predictive capacity (recall, precision, accuracy, Youden's index, % of undecidedness) to those built with the Standard Applied Procedure on the same rule sets for four distinct binary classification tasks.
Figure 4 summarizes the designed experiment to test this hypothesis with the evaluation phases incorporated into the flow.
Four datasets from the UCI Machine Learning Repository were selected for this research study (all accessed on 6 June 2024): CARS (archive.ics.uci.edu/ml/datasets/Car+Evaluation), CENSUS (archive.ics.uci.edu/ml/datasets/Adult), BANK (archive.ics.uci.edu/ml/datasets/Bank+Marketing), and MYOCARDIAL (archive.ics.uci.edu/ml/datasets/Myocardial+infarction+complications). These datasets are often used as benchmarks within the machine learning community. The rationale for such a selection is their availability as open access, their varying numbers of features and records (Table 1), and the imbalance in their target class distribution, a common characteristic of real-world settings. A simple pre-processing step is necessary to make these datasets uniform. For the CARS dataset, the target class options acc, good, and vgood are grouped into acc to transition to a binary classification task. For the CENSUS dataset, all the records with at least one missing value across features are dropped. For the BANK dataset, no pre-processing is performed. For the MYOCARDIAL dataset, the ZSN (chronic heart failure) feature is chosen as the target, features with more than 5% missing data are dropped, and no interpolation is performed. Table 1 summarizes the characteristics of the final datasets to be used in the subsequent experimental phases.
The Logic Learning Machine pipeline, as described in Section 3.1, is implemented through the proprietary software Rulex Platform (www.rulex.ai/, accessed on 6 June 2024). The computational argumentation pipeline, as described in Section 3.2, is implemented through the open-source ArgFrame framework [38] (Figure A3). The if-then rules extracted by the former pipeline (LLM), in textual format, are entered into a Python parser and translated to a JSON format required by the latter pipeline (ArgFrame).
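For illustration, a parser in the spirit of the one described could look as follows. The textual rule format, the regular expression, and the JSON fields are hypothetical, since the exact formats used by the Rulex Platform and ArgFrame are not reported here:

```python
import json
import re

# Hypothetical textual format, e.g.:
#   "IF age > 30 AND income <= 50000 THEN class = 1 (covering=0.42, error=0.05)"
RULE_RE = re.compile(r"IF (?P<premises>.+) THEN class = (?P<cls>\S+) "
                     r"\(covering=(?P<cov>[\d.]+), error=(?P<err>[\d.]+)\)")

def parse_rule(line):
    """Parse one textual if-then rule into a dictionary with premises, claim and weight."""
    m = RULE_RE.match(line.strip())
    cov, err = float(m.group("cov")), float(m.group("err"))
    return {
        "premises": [p.strip() for p in m.group("premises").split(" AND ")],
        "claim": m.group("cls"),
        "weight": cov * (1.0 - err),   # relevance, used to weigh the argument
    }

rules_txt = ["IF age > 30 AND income <= 50000 THEN class = 1 (covering=0.42, error=0.05)",
             "IF age <= 30 THEN class = 0 (covering=0.35, error=0.10)"]
print(json.dumps([parse_rule(r) for r in rules_txt], indent=2))
```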
The comparison of the predictive models produced by the LLM technique (Table 2) and the argument-based models (Table 3) is performed based on standard classification metrics. The accuracy, precision, recall, Youden's index, and percentage of undecided cases are computed, providing a balanced assessment of each model's performance. Two important aspects for each dataset are (i) the sizes of the resulting models and (ii) their inferential capacity. The former is relevant as a proxy and quantitative measure of explainability, since compact information is believed to be more interpretable and explainable than large amounts of information. The latter indicates the models' predictive power and performance.
Some parameters of the two methods tested (LLM and computational argumentation) are systematically varied to ensure an extensive assessment of model performance and to gauge the generalizability of results. When employing the LLM technique, the maximum number of premises per rule was limited to 4, aiming to keep them more comprehensible for end-users. Additionally, the error threshold per rule was set at two levels: 10% and 25%. Introducing an error threshold helps prevent overfitting and reduces the number of rules, thereby supporting the overall explainability of the resulting models. Furthermore, these error thresholds enable argument-based models to possibly resolve conflicts and simultaneously achieve robust inferential capacity with fewer rules.
As for the configuration of argument-based models, three semantics are utilized: (1) preferred, a credulous semantics commonly used in defeasible argumentation; (2) grounded, a more sceptical but also widely used semantics; and (3) categorizer, to incorporate a ranking-based semantics in the evaluation. Various thresholds are also set for the maximum inconsistency budget, since the rules generated by the LLM technique are likely to lead to numerous rebuttals. Therefore, establishing an inconsistency budget is crucial to avoid many cases without a conclusive inference. Three thresholds were explored: 25%, representing a more aggressive approach focusing on critical attacks; 50%, a moderate approach; and 90%, aimed at resolving undecided cases by removing only a small number of attacks. Table 2 and Table 3 list the models and their respective parameters.

5. Results and Discussion

In this section, results are grouped by the four selected datasets. An analysis is performed for each of them based on predictive capacity (recall, precision, accuracy, Youden's index, % of undecidedness) and model size (number of rules/arguments and attacks). Lastly, an overall discussion is presented with general findings and a focus on model explainability.

5.1. CARS Dataset

The numbers of rules and attacks formed using the CARS dataset are listed in Table 4, grouped by the two error thresholds employed. The resulting models are relatively small in terms of the number of formed rules, even for a high $\beta$. Figure 5 depicts the results associated with the selected classification metrics. Notably, the maximum error threshold did not seem to impact results (cases a–d versus e–h of Figure 5), demonstrating that the rules produced by the LLM technique were robust even at the highest value.
As for the argument-based models, the inconsistency budget $\beta$ set to 50% seems to have solved all cases of undecided inferences (cases b and f of Figure 5). Nonetheless, all other $\beta$ values led to inferences with a maximum of 28% of undecided cases while achieving similar or better model performance (precision, recall, accuracy, Youden's index) when compared to the counterpart models ($H_1$ or $H_2$). This suggests that the argument-based models can fairly handle the defeasibility of the if-then rules, regardless of the argumentation semantics employed, given the low undecidedness rate. For positive values of undecided cases, an increase in model performance can also be observed (case b versus {a,c,d}, and case f versus {e,g,h} of Figure 5).

5.2. CENSUS Dataset

As per Table 5, the number of rules and attacks generated by the LLM technique for the CENSUS dataset was considerably higher than for the CARS dataset. This could be intuitively expected, given this dataset's higher number of features. Figure A1 and Figure A2 show examples of argumentation graphs generated from the if-then rules, along with the resulting sub-graphs activated under the preferred semantics. In addition, the inconsistency budget $\beta$ was able to substantially reduce the number of attacks (Table 5), making some argument-based models more compact than others.
As for the inferential capacity, Figure 6 depicts the results for the CENSUS dataset. Similarly to the CARS dataset, the maximum error threshold per rule did not demonstrate a significant impact on the models' performance (cases a–d versus e–h of Figure 6). Moreover, it is important to observe the robust inferences produced by the argument-based models employing the grounded semantics and a high $\beta$, but at the cost of a higher number of undecided cases (cases c, d, g and h of Figure 6). This demonstrates that identifying undecided cases also worked with this dataset; however, certain values (between 60% and 93%) are too high to deem their models functional. Such a high share of undecided cases indicates a higher number of rules elicited per record in this dataset, requiring a better strategy for conflict resolution. Lower $\beta$ values were able to reduce these cases while keeping the other metrics (except for precision) close to the counterpart models ($H_1$ or $H_2$). The issue with precision is likely due to the class imbalance in the data, which shows that argument-based models were less efficient in terms of positive predictions. Nonetheless, these models achieved reasonable results, suggesting that a grid search across various inconsistency budgets is a valid mechanism for evaluating the resolution of large and more conflicting scenarios.

5.3. BANK Dataset

Results associated with the BANK dataset are listed in Table 6 and Figure 7. Some of the patterns identified in the results of the previous two datasets can be observed again: a higher number of rules and attacks due to the higher number of features in the data, a reduced number of attacks for lower $\beta$ values, and a higher number of undecided cases for higher $\beta$ values. The main difference is for argument-based models with the inconsistency budget set to 50%. In this scenario, all undecided cases were reduced to 0, with performance values very close to those of the counterpart models ($H_1$ or $H_2$). This demonstrates that the argumentation semantics successfully solved the conflicting scenarios without prejudice to inferential capacity when employing $\beta$ = 50%. This reinforces the applicability of the extracted if-then rules within the designed argument-based models.

5.4. MYOCARDIAL Dataset

Finally, despite the MYOCARDIAL dataset having the largest set of features, the number of rules and attacks (see Table 7) did not increase significantly compared to the previous datasets. This suggests that the LLM technique did not utilize all available features, resulting in a similar number of rules and premises, as seen earlier. Figure 8 depicts the models' inferential capacity, showing patterns consistent with those observed in the other datasets: the maximum error threshold allowed per rule not impacting results; robust inferences produced by the argument-based models employing the grounded semantics and a high $\beta$; and argumentation semantics able to successfully solve the conflicting scenarios without prejudice to inferential capacity when employing $\beta$ = 50%. The use of lower $\beta$ values to minimize undecided cases while maintaining robust results for all other metrics is particularly notable. The recurrence of these patterns across various datasets indicates generalizability in the findings, which are discussed in the next section.

5.5. Summary and Discussion

The LLM technique for extracting if-then rules proved highly efficient across all the models built with data from the four selected datasets in terms of rule quantity, average number of premises, and model performance. The construction of argument-based models from these rules slightly improved the predictive power of the resulting models on the CARS dataset, which contained a small number of features. However, further fine-tuning was required for the other selected datasets with more features. This requirement stemmed mainly from the multipartite nature of the resulting argumentation frameworks (graphs) composed of the extracted if-then rules, leading to many conflicts when multiple rules were concurrently activated. These graphs can be found in Appendix A. The primary mechanism for reducing the number of activated conflicts was the inconsistency budget parameter, $\beta$. This parameter prioritized attacks based on the relevance of arguments, as determined by the LLM technique using the rules' coverage and error rates. The results showed that adjusting $\beta$ effectively eliminated weaker attacks, reducing undecided inferences and improving model accuracy. This underscores the crucial role of the inconsistency budget when integrated with the LLM technique.
Overall, in terms of predictive power, the argument-based models constructed using if-then rules from the Logic Learning Machine (Section 3.1) and aggregated through the argumentation pipeline (Section 3.2) were highly comparable to models built using the same rule set but aggregated with the Standard Applied Procedure (Section 3.1.5). Achieving this equivalence was the main prerequisite for supporting the initial view that argument-based models can offer similar inferential capacity while being inherently more explainable than simple lists of rules. This increased explainability stems from the dialectical status assigned to arguments (accepted, rejected, or undecided), along with the more interpretable mechanisms used to produce final inferences. These mechanisms align more closely with how humans reason under uncertainty and conflictuality. Additionally, argumentation graphs are presumably more compact and visually accessible than a mere list of rules, potentially enhancing the explainability and interpretability of the inferential mechanisms. Admittedly, explainability and interpretability are two properties that were not quantitatively measured here but only assumed to be desirable. However, user studies could be conducted to investigate these properties more quantitatively, for instance using psychometrics, as performed in [59]. Another important property of argumentation graphs is that they satisfy the GLOCAL property of AI-based systems brought forward within the eXplainable Artificial Intelligence community [4,60]. This means they allow for a global view and understanding of the inferential mechanisms, while also supporting local explanations by examining the single activated arguments from input data and their connections with other arguments (Appendix A). Additionally, argumentation graphs are inferential tools for understanding contrastive explanations [6]. By utilizing the concept of attacks among arguments (the rules), it is possible to implement contrastive explanations in practice. This means that contrastive cases can be presented as complementary explanations. In other words, this approach helps explain why a particular final inference is considered the most rational choice over another: it clarifies the reasons presented by the accepted arguments and distinguishes them from the reasons discarded with the rejected arguments. Analogously, in the case of accepted arguments supporting different, thus contrastive, claims, an argumentation graph can help understand why a final inference is not possible, eventually ending in a situation of undecidedness. Moreover, this could be mitigated via human intervention and domain-knowledge expertise. For example, by examining the accepted arguments and their attacks, a human reasoner can explore alternative scenarios by manually removing, modifying, or adding arguments and observing the effects on the final inference. This is possible due to the lack of restrictions on the topology of argumentation graphs, allowing knowledge to be more easily incorporated.

6. Conclusions

This work presented the possibility of coupling the Logic Learning Machine (LLM), a specific rule generator, with argument-based models, a paradigm for modeling inconsistencies among rules. Hence, it contributes to the scientific challenge in XAI of developing data-driven methods of inference that generate transparent and interpretable results while maintaining scalability and effectiveness in complex scenarios. In detail, the applied results stem from an empirical study that compared the Standard Applied Procedure, namely the default way of addressing conflicts used by Logic Learning Machines, against different argument-based models built upon rules formed by the LLM, using data from a selection of popular benchmarking datasets. The main rationale was to employ a robust, general-purpose procedure to generate rules from data and to solve their conflicts with computational argumentation, a paradigm for implementing defeasible reasoning and handling conflictuality rationally. Additionally, the dialectical status assigned to each argument within the argumentation frameworks facilitated the discussion of the global and local interpretability of such argument-based models and demonstrated how contrastive explanations can be constructed. The comparison showed that the LLM is a stable, robust mechanism for generating rules that can be adapted into arguments, even when some parameters associated with the argument-based models were manipulated. It also showed that argument-based models can achieve predictive power similar to the LLM approach while offering more explainable mechanisms for inference that are better aligned with human reasoning under uncertainty. This work is applicable to any binary classification problem. Although the datasets used in this study come from the finance, healthcare, and automotive domains, the findings are not restricted to these fields and can be applied more broadly. Future work involves extending the experiments to more challenging, non-binary problems and improving the integration of the Logic Learning Machine and computational argumentation. In the broader context of XAI, it would also be valuable to position the proposed integration of the LLM and computational argumentation more clearly in relation to other transparent and interpretable systems, including those beyond the realm of knowledge-based systems.

Author Contributions

Conceptualization, L.R., D.V., S.B. and L.L.; methodology, L.R., D.V., S.B. and L.L.; software, L.R., D.V. and S.B.; validation, L.R., D.V. and S.B.; formal analysis, L.R., D.V. and S.B.; investigation, L.R., D.V., S.B. and L.L.; resources, L.R., D.V., S.B. and L.L.; data curation, L.R., D.V. and S.B.; writing—original draft preparation, L.R., D.V., S.B. and L.L.; writing—review and editing, L.R., D.V., S.B. and L.L.; visualization, L.R., D.V., S.B. and L.L.; supervision, L.R., D.V., S.B. and L.L.; project administration, L.L.; funding acquisition, L.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest with this manuscript. The funders had no role in the study’s design; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A

Figure A1. Example of argumentation graph generated from the if-then rules extracted for the CENSUS dataset using the LLM technique with 10% error threshold per rule. (a) All arguments and attacks with no input data, (b,c) two examples of accepted (green) and rejected (red) arguments from some input data using the preferred semantics.
Figure A2. Example of argumentation graph generated from the if-then rules extracted for the CENSUS dataset using the LLM technique with 10% error threshold per rule. (a) All arguments and attacks with no input data, (b,c) two examples of accepted (green) and rejected (red) arguments from some input data using the preferred semantics.
Figure A3. Examples of the open-source ArgFrame framework [38] instantiated with argumentation graphs generated for the CENSUS dataset. It is possible to hover over nodes to analyze their internal structure. Data can also be imported, allowing the visualization of case-by-case inferences. Its use is recommended for a better understanding of the available functionalities.

References

  1. Ali, S.; Abuhmed, T.; El-Sappagh, S.; Muhammad, K.; Alonso-Moral, J.M.; Confalonieri, R.; Guidotti, R.; Del Ser, J.; Díaz-Rodríguez, N.; Herrera, F. Explainable Artificial Intelligence (XAI): What we know and what is left to attain Trustworthy Artificial Intelligence. Inf. Fusion 2023, 99, 101805. [Google Scholar] [CrossRef]
  2. Markus, A.F.; Kors, J.A.; Rijnbeek, P.R. The role of explainability in creating trustworthy artificial intelligence for health care: A comprehensive survey of the terminology, design choices, and evaluation strategies. J. Biomed. Inform. 2021, 113, 103655. [Google Scholar] [CrossRef] [PubMed]
  3. van der Waa, J.; Nieuwburg, E.; Cremers, A.; Neerincx, M. Evaluating XAI: A comparison of rule-based and example-based explanations. Artif. Intell. 2021, 291, 103404. [Google Scholar] [CrossRef]
  4. Lundberg, S.M.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J.M.; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.; Lee, S.I. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2020, 2, 56–67. [Google Scholar] [CrossRef]
  5. Cao, J.; Zhou, T.; Zhi, S.; Lam, S.; Ren, G.; Zhang, Y.; Wang, Y.; Dong, Y.; Cai, J. Fuzzy inference system with interpretable fuzzy rules: Advancing explainable artificial intelligence for disease diagnosis—A comprehensive review. Inf. Sci. 2024, 662, 120212. [Google Scholar] [CrossRef]
  6. Miller, T. Contrastive explanation: A structural-model approach. Knowl. Eng. Rev. 2021, 36, e14. [Google Scholar] [CrossRef]
  7. Besnard, P.; Garcia, A.; Hunter, A.; Modgil, S.; Prakken, H.; Simari, G.; Toni, F. Introduction to structured argumentation. Argum. Comput. 2014, 5, 1–4. [Google Scholar] [CrossRef]
  8. Atkinson, K.; Baroni, P.; Giacomin, M.; Hunter, A.; Prakken, H.; Reed, C.; Simari, G.; Thimm, M.; Villata, S. Towards artificial argumentation. AI Mag. 2017, 38, 25–36. [Google Scholar] [CrossRef]
  9. Tompits, H. A survey of non-monotonic reasoning. Open Syst. Inf. Dyn. 1995, 3, 369–395. [Google Scholar] [CrossRef]
  10. Lindström, S. A semantic approach to nonmonotonic reasoning: Inference operations and choice. Theoria 2022, 88, 494–528. [Google Scholar] [CrossRef]
  11. Brewka, G. Nonmonotonic Reasoning: Logical Foundations of Commonsense; Cambridge University Press: Cambridge, UK, 1991; Volume 12. [Google Scholar]
  12. Sklar, E.I.; Azhar, M.Q. Explanation through Argumentation. In Proceedings of the 6th International Conference on Human-Agent Interaction, New York, NY, USA, 15–18 December 2018; HAI ’18. [Google Scholar] [CrossRef]
  13. Vassiliades, A.; Bassiliades, N.; Patkos, T. Argumentation and explainable artificial intelligence: A survey. Knowl. Eng. Rev. 2021, 36, e5. [Google Scholar] [CrossRef]
  14. Rizzo, L.; Longo, L. Comparing and extending the use of defeasible argumentation with quantitative data in real-world contexts. Inf. Fusion 2023, 89, 537–566. [Google Scholar] [CrossRef]
  15. Vilone, G.; Longo, L. A Quantitative Evaluation of Global, Rule-Based Explanations of Post-Hoc, Model Agnostic Methods. Front. Artif. Intell. 2021, 4, 160. [Google Scholar] [CrossRef] [PubMed]
  16. Muselli, M.; Ferrari, E. Coupling logical analysis of data and shadow clustering for partially defined positive Boolean function reconstruction. IEEE Trans. Knowl. Data Eng. 2009, 23, 37–50. [Google Scholar] [CrossRef]
  17. Bennetot, A.; Franchi, G.; Ser, J.D.; Chatila, R.; Díaz-Rodríguez, N. Greybox XAI: A Neural-Symbolic learning framework to produce interpretable predictions for image classification. Knowl.-Based Syst. 2022, 258, 109947. [Google Scholar] [CrossRef]
  18. Rizzo, L. A Novel Structured Argumentation Framework for Improved Explainability of Classification Tasks. In Proceedings of the Explainable Artificial Intelligence; Longo, L., Ed.; Springer Nature: Cham, Switzerland, 2023; pp. 399–414. [Google Scholar]
  19. Ferrari, E.; Verda, D.; Pinna, N.; Muselli, M. A Novel Rule-Based Modeling and Control Approach for the Optimization of Complex Water Distribution Networks. In Proceedings of the Advances in System-Integrated Intelligence; Valle, M., Lehmhus, D., Gianoglio, C., Ragusa, E., Seminara, L., Bosse, S., Ibrahim, A., Thoben, K.D., Eds.; Springer International Publishing: Cham, Switzerland, 2023; pp. 33–42. [Google Scholar]
  20. Nicoletta, M.; Zilich, R.; Masi, D.; Baccetti, F.; Nreu, B.; Giorda, C.B.; Guaita, G.; Morviducci, L.; Muselli, M.; Ozzello, A.; et al. Overcoming Therapeutic Inertia in Type 2 Diabetes: Exploring Machine Learning-Based Scenario Simulation for Improving Short-Term Glycemic Control. Mach. Learn. Knowl. Extr. 2024, 6, 420–434. [Google Scholar] [CrossRef]
  21. Parodi, S.; Manneschi, C.; Verda, D.; Ferrari, E.; Muselli, M. Logic Learning Machine and standard supervised methods for Hodgkin’s lymphoma prognosis using gene expression data and clinical variables. Health Inform. J. 2018, 24, 54–65. [Google Scholar] [CrossRef]
  22. Verda, D.; Parodi, S.; Ferrari, E.; Muselli, M. Analyzing gene expression data for pediatric and adult cancer diagnosis using logic learning machine and standard supervised methods. BMC Bioinform. 2019, 20, 390. [Google Scholar] [CrossRef]
  23. Gerussi, A.; Verda, D.; Cappadona, C.; Cristoferi, L.; Bernasconi, D.P.; Bottaro, S.; Carbone, M.; Muselli, M.; Invernizzi, P.; Asselta, R.; et al. LLM-PBC: Logic Learning Machine-Based Explainable Rules Accurately Stratify the Genetic Risk of Primary Biliary Cholangitis. J. Pers. Med. 2022, 12, 1587. [Google Scholar] [CrossRef]
  24. Lindgren, T. Methods for rule conflict resolution. In Proceedings of the European Conference on Machine Learning; Springer: Berlin/Heidelberg, Germany, 2004; pp. 262–273. [Google Scholar]
  25. Clark, P.; Boswell, R. Rule induction with CN2: Some recent improvements. In Proceedings of the Machine Learning—EWSL-91: European Working Session on Learning, Porto, Portugal, 6–8 March 1991; Proceedings 5. Springer: Berlin/Heidelberg, Germany, 1991; pp. 151–163. [Google Scholar]
  26. Doe, J.; Smith, J. A Survey of the Role of Voting Mechanisms in Explainable Artificial Intelligence (XAI). J. Artif. Intell. Res. 2022, 59, 123–145. [Google Scholar]
  27. Nössig, A.; Hell, T.; Moser, G. A Voting Approach for Explainable Classification with Rule Learning. In Proceedings of the IFIP International Conference on Artificial Intelligence Applications and Innovations; Springer: Berlin/Heidelberg, Germany, 2024; pp. 155–169. [Google Scholar]
  28. Lindgren, T.; Boström, H. Classification with intersecting rules. In Proceedings of the Algorithmic Learning Theory: 13th International Conference, ALT 2002, Lübeck, Germany, 24–26 November 2002; Proceedings 13. Springer: Berlin/Heidelberg, Germany, 2002; pp. 395–402. [Google Scholar]
  29. Lindgren, T.; Boström, H. Resolving rule conflicts with double induction. Intell. Data Anal. 2004, 8, 457–468. [Google Scholar] [CrossRef]
  30. Kitson, N.K.; Constantinou, A.C.; Guo, Z.; Liu, Y.; Chobtham, K. A survey of Bayesian Network structure learning. Artif. Intell. Rev. 2023, 56, 8721–8814. [Google Scholar] [CrossRef]
  31. Čyras, K.; Rago, A.; Albini, E.; Baroni, P.; Toni, F. Argumentative XAI: A survey. arXiv 2021, arXiv:2105.11266. [Google Scholar]
  32. Espinoza, M.M.; Possebom, A.T.; Tacla, C.A. Argumentation-based agents that explain their decisions. In Proceedings of the 2019 8th IEEE Brazilian Conference on Intelligent Systems (BRACIS), Salvador, Brazil, 15–18 October 2019; pp. 467–472. [Google Scholar]
  33. Caroprese, L.; Vocaturo, E.; Zumpano, E. Argumentation approaches for explanaible AI in medical informatics. Intell. Syst. Appl. 2022, 16, 200109. [Google Scholar] [CrossRef]
  34. Governatori, G.; Maher, M.J.; Antoniou, G.; Billington, D. Argumentation semantics for defeasible logic. J. Log. Comput. 2004, 14, 675–702. [Google Scholar] [CrossRef]
  35. Baroni, P.; Giacomin, M. On principle-based evaluation of extension-based argumentation semantics. Artif. Intell. 2007, 171, 675–700. [Google Scholar] [CrossRef]
  36. Wu, Y.; Caminada, M.; Podlaszewski, M. A labelling-based justification status of arguments. Stud. Log. 2010, 3, 12–29. [Google Scholar]
  37. Caminada, M. Argumentation semantics as formal discussion. Handb. Form. Argum. 2017, 1, 487–518. [Google Scholar]
  38. Rizzo, L. ArgFrame: A multi-layer, web, argument-based framework for quantitative reasoning. Softw. Impacts 2023, 17, 100547. [Google Scholar] [CrossRef]
  39. Baroni, P.; Caminada, M.; Giacomin, M. An introduction to argumentation semantics. Knowl. Eng. Rev. 2011, 26, 365–410. [Google Scholar] [CrossRef]
  40. Longo, L. Defeasible reasoning and argument-based systems in medical fields: An informal overview. In Proceedings of the 2014 IEEE 27th International Symposium on Computer-Based Medical Systems, New York, NY, USA, 27–29 May 2014; pp. 376–381. [Google Scholar]
  41. Cocarascu, O.; Stylianou, A.; Čyras, K.; Toni, F. Data-empowered argumentation for dialectically explainable predictions. In ECAI 2020; IOS Press: Amsterdam, The Netherlands, 2020; pp. 2449–2456. [Google Scholar]
  42. Castagna, F.; McBurney, P.; Parsons, S. Explanation–Question–Response dialogue: An argumentative tool for explainable AI. Argum. Comput. 2024, 1–23, preprint. [Google Scholar] [CrossRef]
  43. Ferrari, E.; Muselli, M. Maximizing pattern separation in discretizing continuous features for classification purposes. In Proceedings of the The 2010 IEEE International Joint Conference on Neural Networks (IJCNN), Barcelona, Spain, 18–23 July 2010. [Google Scholar]
  44. Cangelosi, D.; Muselli, M.; Parodi, S.; Blengio, F.; Becherini, P.; Versteeg, R.; Conte, M.; Varesio, L. Use of Attribute Driven Incremental Discretization and Logic Learning Machine to build a prognostic classifier for neuroblastoma patients. BMC Bioinform. 2014, 15, 1–15. [Google Scholar] [CrossRef] [PubMed]
  45. Ferrari, E.; Verda, D.; Pinna, N.; Muselli, M. Optimizing Water Distribution through Explainable AI and Rule-Based Control. Computers 2023, 12, 123. [Google Scholar] [CrossRef]
  46. Muselli, M.; Quarati, A. Reconstructing positive Boolean functions with shadow clustering. In Proceedings of the 2005 IEEE European Conference on Circuit Theory and Design, Cork, Ireland, 2 September 2005; Volume 3. [Google Scholar]
  47. Walton, D.; Reed, C.; Macagno, F. Attack, Rebuttal, and Refutation. In Argumentation Schemes; Cambridge University Press: Cambridge, UK, 2008; pp. 220–274. [Google Scholar]
  48. Baumann, R.; Spanring, C. A Study of Unrestricted Abstract Argumentation Frameworks. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17), Melbourne, Australia, 19–25 August 2017; Volume 17, pp. 807–813. [Google Scholar]
  49. Dunne, P.E.; Hunter, A.; McBurney, P.; Parsons, S.; Wooldridge, M.J. Inconsistency tolerance in weighted argument systems. In Proceedings of the AAMAS (2), Budapest, Hungary, 10–15 May 2009; pp. 851–858. [Google Scholar]
  50. Dunne, P.E.; Hunter, A.; McBurney, P.; Parsons, S.; Wooldridge, M. Weighted argument systems: Basic definitions, algorithms, and complexity results. Artif. Intell. 2011, 175, 457–486. [Google Scholar] [CrossRef]
  51. Bistarelli, S.; Santini, F. Conarg: A constraint-based computational framework for argumentation systems. In Proceedings of the 2011 IEEE 23rd International Conference on Tools with Artificial Intelligence, Boca Raton, FL, USA, 7–9 November 2011; pp. 605–612. [Google Scholar]
  52. Pazienza, A.; Ferilli, S.; Esposito, F.; Bistarelli, S.; Giacomin, M. Constructing and Evaluating Bipolar Weighted Argumentation Frameworks for Online Debating Systems. In Proceedings of the AI3@ AI* IA, Bari, Italy, 14–17 November 2017; pp. 111–125. [Google Scholar]
  53. Vilone, G.; Longo, L. An Examination of the Effect of the Inconsistency Budget in Weighted Argumentation Frameworks and their Impact on the Interpretation of Deep Neural Networks. In Proceedings of the Joint Proceedings of the xAI-2023 Late-breaking Work, Demos and Doctoral Consortium Co-Located with the 1st World Conference on eXplainable Artificial Intelligence (xAI-2023), Lisbon, Portugal, 26–28 July 2023; Longo, L., Ed.; CEUR Workshop Proceedings: Tenerife, Spain, 2023; Volume 3554, pp. 53–58. Available online: https://CEUR-WS.org (accessed on 6 June 2024).
  54. Dung, P.M. On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games. Artif. Intell. 1995, 77, 321–357. [Google Scholar] [CrossRef]
  55. Bonzon, E.; Delobelle, J.; Konieczny, S.; Maudet, N. A comparative study of ranking-based semantics for abstract argumentation. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; Volume 30. [Google Scholar]
  56. Caminada, M.W.; Gabbay, D.M. A logical account of formal argumentation. Stud. Log. 2009, 93, 109. [Google Scholar] [CrossRef]
  57. Caminada, M. On the issue of reinstatement in argumentation. In Proceedings of the European Workshop on Logics in Artificial Intelligence; Springer: Berlin/Heidelberg, Germany, 2006; pp. 111–123. [Google Scholar]
  58. Besnard, P.; Hunter, A. A logic-based theory of deductive arguments. Artif. Intell. 2001, 128, 203–235. [Google Scholar] [CrossRef]
  59. Vilone, G.; Longo, L. Development of a Human-Centred Psychometric Test for the Evaluation of Explanations Produced by XAI Methods. In Proceedings of the Explainable Artificial Intelligence; Longo, L., Ed.; Springer Nature: Cham, Switzerland, 2023; pp. 205–232. [Google Scholar]
  60. Lv, G.; Chen, L.; Cao, C.C. On glocal explainability of graph neural networks. In Proceedings of the International Conference on Database Systems for Advanced Applications; Springer: Cham, Switzerland, 2022; pp. 648–664. [Google Scholar]
Figure 1. Illustration of the integration of a data-driven rule-generator (Logic Learning Machine) and a rule-aggregator with non-monotonic logic (structured argumentation).
Figure 2. An illustration of a multipartite argumentation graph. Each argument is represented by a node with an if-then internal structure following Equation (1) (premises are omitted for the sake of simplicity). Arguments a–c share a common output class, whereas arguments d–f share a different one. Each argument in one partite set attacks all the arguments in the other partite set.
Figure 3. An illustrative example of the elicitation of arguments and the definition of their dialectical status. Node labels contain the argument label and its weight. The premise of argument a does not hold true with the input data, so it is discarded along with its incoming/outgoing attacks. For graphs 1, 2, 3, and 4, respectively: the attacks {}, {d→c}, {d→c, c→b}, and {d→c, c→d, c→b} are removed to respect the defined inconsistency budget; the grounded extensions are {}, {}, {c}, {c, d}; the preferred extensions are {{c}, {b, d}}, {{c}, {b, d}}, {c}, {c, d}; and the top-ranked arguments for the categoriser are {b, d}, {b, c, d}, {c}, {c, d}.
Figure 4. Design of a comparative experiment with four main steps: (a) Selection and pre-processing of four datasets for binary classification tasks; (b) Automatic formation of if-then rules from the selected dataset using the Logic Learning Machine (LLM) technique; (c) Generation of final inferences using two aggregator logics: the Standard Applied Procedure and computational argumentation; (d) Comparative analysis via standard binary classification metrics and percentage of undecided cases (NAs: when a model cannot lead to a final inference).
Figure 5. Overall results for inferences produced using the CARS dataset grouped by the error threshold per rule parameter (10% for the first four blocks (a–d), 25% for the second four blocks (e–h)) and divided by inconsistency budget variation (25%, 50%, 90%, 100%).
Figure 6. Overall results for inferences produced using the CENSUS dataset grouped by the error threshold per rule parameter (10% for the first four blocks (a–d), 25% for the second four blocks (e–h)) and divided by inconsistency budget variation (25%, 50%, 90%, 100%).
Figure 7. Overall results for inferences produced using the BANK dataset grouped by the error threshold per rule parameter (10% for the first four blocks (a–d), 25% for the second four blocks (e–h)) and divided by inconsistency budget variation (25%, 50%, 90%, 100%).
Figure 8. Overall results for inferences produced using the MYOCARDIAL dataset grouped by the error threshold per rule parameter (10% for the first four blocks (a–d), 25% for the second four blocks (e–h)) and divided by inconsistency budget variation (25%, 50%, 90%, 100%).
Table 1. Characteristics of final datasets formed for subsequent rule-formation and aggregation.

                                        | CARS                   | CENSUS                 | BANK                   | MYOCARDIAL
Features                                | 6                      | 14                     | 16                     | 59
Records                                 | 1728                   | 30162                  | 45211                  | 1436
Class distribution (positive–negative)  | 30–70%                 | 25–75%                 | 29–71%                 | 24–76%
Feature types                           | Numerical, categorical | Numerical, categorical | Numerical, categorical | Numerical, categorical
Target feature                          | Class                  | Income                 | y                      | ZSN
Table 2. List of models built with the LLM technique.

Model | Error Threshold per Rule | Max. Premises per Rule | Conflict Resolution
H1    | 10%                      | 4                      | Standard Applied Procedure
H2    | 25%                      | 4                      | Standard Applied Procedure
Table 3. List of models built with argumentation and if-then rules from the LLM technique. The inconsistency budget is defined as a percentage of the sum of all attack weights.

Model       | Input Rules | Semantics                      | Inc. Budget
AG1/AP1/AC1 | Same as H1  | Grounded/Preferred/Categoriser | 25%
AG2/AP2/AC2 | Same as H1  | Grounded/Preferred/Categoriser | 50%
AG3/AP3/AC3 | Same as H1  | Grounded/Preferred/Categoriser | 90%
AG4/AP4/AC4 | Same as H1  | Grounded/Preferred/Categoriser | 100%
AG5/AP5/AC5 | Same as H2  | Grounded/Preferred/Categoriser | 25%
AG6/AP6/AC6 | Same as H2  | Grounded/Preferred/Categoriser | 50%
AG7/AP7/AC7 | Same as H2  | Grounded/Preferred/Categoriser | 90%
AG8/AP8/AC8 | Same as H2  | Grounded/Preferred/Categoriser | 100%
Table 4. Summary of rules count and attacks count based on error threshold per rule and the inconsistency budget β using the data from the CARS dataset.

Error Threshold per Rule | # Rules | Average # Premises | # Attacks (β 25%) | # Attacks (β 50%) | # Attacks (β 90%) | # Attacks (β 100%)
10%                      | 9       | 2                  | 6                 | 17                | 29                | 36
25%                      | 6       | 1.5                | 2                 | 4                 | 8                 | 10
Table 5. Summary of rules count and attacks count based on error threshold per rule and the inconsistency budget β using the CENSUS dataset.

Error Threshold per Rule | # Rules | Average # Premises | # Attacks (β 25%) | # Attacks (β 50%) | # Attacks (β 90%) | # Attacks (β 100%)
10%                      | 31      | 2.74               | 54                | 240               | 357               | 480
25%                      | 14      | 2.14               | 12                | 48                | 74                | 96
Table 6. Summary of rules count and attacks count based on error threshold per rule and the inconsistency budget β using the BANK dataset.

Error Threshold per Rule | # Rules | Average # Premises | # Attacks (β 25%) | # Attacks (β 50%) | # Attacks (β 90%) | # Attacks (β 100%)
10%                      | 31      | 2.74               | 61                | 234               | 367               | 468
25%                      | 15      | 2.85               | 13                | 56                | 86                | 112
Table 7. Summary of rules count and attacks count based on error threshold per rule and the inconsistency budget β using the MYOCARDIAL dataset.

Error Threshold per Rule | # Rules | Average # Premises | # Attacks (β 25%) | # Attacks (β 50%) | # Attacks (β 90%) | # Attacks (β 100%)
10%                      | 33      | 2.57               | 102               | 270               | 458               | 540
25%                      | 14      | 1.71               | 16                | 49                | 81                | 98
