Symbolic AI for XAI: Evaluating LFIT Inductive Programming for Explaining Biases in Machine Learning

Abstract: Machine learning methods are growing in relevance for biometrics and personal information processing in domains such as forensics, e-health, recruitment, and e-learning. In these domains, white-box (human-readable) explanations of systems built on machine learning methods become crucial. Inductive logic programming (ILP) is a subfield of symbolic AI that aims to automatically learn declarative theories about the processing of data. Learning from interpretation transition (LFIT) is an ILP technique that can learn a propositional logic theory equivalent to a given black-box system (under certain conditions). The present work takes a first step towards a general methodology for incorporating accurate declarative explanations into classic machine learning by checking the viability of LFIT in a specific AI application scenario: fair recruitment based on an automatic tool generated with machine learning methods for ranking Curricula Vitae that incorporates soft biometric information (gender and ethnicity). We show the expressiveness of LFIT for this specific problem and propose a scheme that can be applicable to other domains. In order to check the ability to cope with other domains regardless of the machine learning paradigm used, we have done a preliminary test of the expressiveness of LFIT, feeding it with a real dataset about adult incomes taken from the US census, in which we consider the income level as a function of the rest of the attributes to verify whether LFIT can provide a logical theory to support and explain to what extent higher incomes are biased by gender and ethnicity.


Introduction
Statistical and optimisation-based machine learning algorithms are supported by well-known and solid numerical and statistical methods. These techniques have achieved great success in various applications such as speech recognition [1], image classification [2], machine translation [3], and other problems in very different domains.
These approaches include, among others, classic neural architectures, rule generating systems based on entropy, support vector machines, and especially deep neural networks that have shown the most remarkable success, especially in speech and image recognition.
Although deep learning methods usually have good generalisation ability on similarly distributed new data, they have some weaknesses, including their tendency to hide the reasons for their behaviour [4,5]. Having a clear explanation of the machine behaviour can be crucial in many practical applications.
Although in some machine learning scenarios explanations are merely an extra, in others they are mandatory, e.g., forensic identification [6,7], automatic recruitment systems [8], and financial risk consulting (https://www.bbc.com/news/business-50365609, accessed on 3 November 2021). Explanations are also required in some specific domains in which ethical behaviour is a priority, such as those in which unacceptable biases (by gender or ethnicity) are detected [9][10][11]. Two of these application areas are experimentally addressed in the present paper: automatic recruitment tools and income level prediction based on demographic information.
There is a classical classification of machine learning systems from their capability to generate explanations about their processes [12]: models are weak if they are only able to improve their predictive performance with increasing amounts of data without giving any reason understandable by human beings; they are strong if they additionally provide their hypotheses in symbolic (declarative) form; they are ultra strong if they have the ability to generate new knowledge that could improve the performance of human beings after learning it.
Most of the current AI systems based on common machine learning paradigms (including deep learning) are weak. Inductive logic programming (ILP) systems are, however, ultra-strong by design [13,14]. Our goal is to incorporate ILP capabilities into already existing machine learning frameworks to turn these usually weak systems into ultra-strong ones, in a kind of explainable AI (XAI) [15].
Logic programming is based on first-order logic, which is a standard model to represent human knowledge. Inductive logic programming (ILP) has been developed for inductively learning logic programs from examples and already known theories [16]. The basic idea that supports ILP is to take as input a collection of positive and negative examples and an already known theory about the domain under consideration (background knowledge). ILP systems learn declarative (symbolic) programs [17,18], which can even be noise-tolerant [19,20], that entail all of the positive examples but none of the negative examples.
It is important to pay attention to the model that supports the learning engine in these approaches. The output of ILP systems is very similar to that of, for example, (classic) rule-based learners, fuzzy rule-based learners, decision trees, and similar systems. But the model of the learners is different.
ILP is based on theoretical results from formal logic that guarantees the properties of the learned theory. The most relevant properties for us are equivalence to the observed data and simplicity (minimality) of the induced formulas.
The other numerical/statistical approaches usually learn a version that is a good enough approximation of the original data. These approaches are usually driven by the gradient of some kind of differentiable loss (or gain) function.
This circumstance has important consequences for the XAI researcher because the properties and features of the results are radically different.
In the declarative realm, equivalence, for example, is a property that holds or does not hold at all. There is not a measure of the degree of equivalence among different objects; hence, it does not make sense to define measures for the degree of equivalence.
From the numerical/statistical viewpoint, most of the models learned can be more or less approximated to the original data, so it is absolutely natural and advisable to define accuracy metrics to compare different approaches.
Therefore, in general, it seems of low relevance to try to compare, from the accuracy viewpoint, machine learning approaches that guarantee to induce versions equivalent to the observed data (such as ILP) and approaches that learn approximations more or less accurate (such as most of the numerical/statistical models).
There is another important consideration: expressiveness. From the declarative viewpoint, it is not exactly about the expressive power of the models that support the learning engine but about readability and comfort for the user. For example, a difference between first-order logic and propositional logic is the use of variables, which is allowed in the former but forbidden in the latter. Roughly speaking, a first-order expression like score(X) seems more readable than the propositional one: score(1), score(2), score(3), score(4), score(5), score(6), score(7), if it is clear that the variable X can only take these seven values, although both expressions represent the same facts.
For our purposes, learning from interpretation transition (LFIT) [21] is one of the most promising approaches of ILP. LFIT induces a logical representation of dynamical complex systems by observing their behaviour as a black box under some circumstances. This logic version can be considered as a white-box digital twin of the system under consideration. The most general of LFIT algorithms is GULA (general usage LFIT algorithm). PRIDE is an approximation to GULA with polynomial performance. These approaches will be introduced in depth in the following sections.
Our research is interested in declarative machine learning models that guarantee the equivalence of the learned theory and the observed data.
As we will discuss in further sections, LFIT belongs to this kind of approach. LFIT additionally guarantees that the set of conditions of each propositional clause (rule) is minimal. These two guarantees informally mean that the complexity of the theory learned by LFIT depends exclusively on the complexity of the observed data.
LFIT is an inductive propositional logic programming model. It is a well-known fact (and was previously mentioned) that propositional logic theories end up being less readable than others, such as those of first-order logic.
In this paper, we are testing LFIT in different scenarios. However, it is out of our scope to compare LFIT with other numerical/statistical rules-based learners, and also to try to increase the readability of LFIT results in comparison with, for example, first-order logic. We plan to try to face these pending questions in future experiments. Figure 1 shows the architecture of our proposed approach for generating white-box explanations using PRIDE of a given black-box classifier.

Figure 1. Architecture of the proposed approach for generating an explanation of a given black-box classifier (1) using PRIDE (2) with a toy example (3). In the toy example, the input features are v1, v2, and v3 (v1 binary, v2 and v3 ∈ ℕ), the output classes are {0, 1}, and the explanation reveals that the output only depends on v1. Note that the resulting explanations generated by PRIDE are in propositional logic.
The main contributions of this work are:
• We have proposed a method to provide declarative explanations and descriptions using PRIDE of typical machine learning scenarios. Our approach guarantees a logically equivalent version to explain how the outputs are related to the inputs in a general machine learning setup.
• We have applied our proposal to two different domains to check the generality of our approach.
- A multimodal machine learning test-bed around automatic recruitment including different biases (by gender and ethnicity) on synthetic datasets.
- A real dataset about adult incomes taken from the US census, whose possible biases to get higher earnings are found and shown.
A preliminary version of this article was published in [22]. This new work significantly improves [22] in the following ways:
• We have updated the state of the art methods applicable to XAI.
• We have enriched the introduction to LFIT with examples for a more general audience.
• We have checked the expressiveness of our approach (based on LFIT) extending it to a dataset about adult income level from the 1994 US census. In this domain, we have not used any deep learning algorithms to compare, showing that the proposed approach is also applicable under this circumstance.
The rest of the paper is structured as follows: Section 2 summarises the related relevant literature. Section 3 describes our methodology including LFIT, GULA, and PRIDE. Section 4 presents the experimental framework, including the datasets and experiments conducted. Section 5 presents our results. Finally Sections 6 and 7 respectively discuss our work and describe our conclusions and further research lines.

Explainable AI (XAI): Declarative Approaches
Among the methods suitable to generate explanations because they could be considered as strong or ultra-strong, we find the state of the art evolutionary approaches (from initial genetic programming [23] to grammatical-based methods [24][25][26] and other algebraic ways to express algorithms as straight-line programs [27]); declarative-numeric hybrid (in some way) approaches such as δ ILP [28] (that mix neural and logic domains) or DeepProbLog [29] (that follows a probabilistic and logic approach); and finally declarative approaches, like the one developed in the present paper.
The state of the art shows that most of the reviews on XAI identify different methods to explain or interpret black-box machine learning algorithms without considering declarative approaches. Especially noteworthy are the exhaustive treatment of rule extracting systems and the exclusion of formal logic-based methods [30][31][32][33][34], like the one presented here.
In general, the most exhaustive reviews mainly focus on numeric approaches to generate explanations, both for specific domains such as graph neural networks (see [35]) and in general [15,36,37].
We attempt to explain in more depth the relationship between LFIT and machine learning algorithms focused on rule sets (including fuzzy logic ones) and decision trees, because their outputs look syntactically similar.
In [15], we find a detailed taxonomy that includes these systems. It is clear that they are mostly used in a post-hoc way (explaining the result of a black-box algorithm after being generated) in approaches named explanations by simplification and, in some cases, local explanations.
In the case of local explanations, a set of explanations is generated from the different smaller subsystems that are identified in the global system. It is clear that both (explanation by simplification and local explanations) need to simplify the model that they explain, avoiding the complexity of explaining the complete model as a whole. As we have explained before, our research is not interested in approaches that need to simplify the model to generate explanations.
In addition, these approaches are usually supported by numerical engines. Most are driven by the gradient of a loss or gain function. This is not the case with LFIT and other declarative approaches. As we have previously explained, the current research is not interested in these kinds of numerical/statistical approaches.
New possible classifications arise when the declarative dimension is taken into account. If rule sets (generated as outputs) are considered declarative, approaches such as rule-based learners (including fuzzy) and decision trees could be considered hybrids. If each approach is classified by taking into account only the nature of its learning engine, these systems should be considered as numerical/statistical. The authors of the current contribution have previously published an internal report that extends taxonomies like [15] from this viewpoint.
Numerical (statistical) and declarative approaches such as those mentioned in this paper are classical alternatives for facing machine learning. As we have previously explained, they differ in several important features that we can summarise in the following way:
• Statistical approaches need huge amounts of data to extract valid knowledge, while declarative ones are usually able to minimise the set of examples and counterexamples to get the same.
• Statistical approaches are usually compatible with noisy and poorly labelled data, while for declarative ones, this is a circumstance difficult to overcome.
• Statistical approaches do not offer, in the general case, clear explanations about the decisions they make (usually considered as weak machine learning algorithms), while declarative approaches (due to the declarative nature of the formal models that support them) are designed to be at least strong.
• Declarative approaches are supported by formal models like functional programming or formal logic. The theoretical properties of these models make it possible that the learned knowledge exhibits some characteristics (such as logical equivalence, minimisation, etc.).
• Hybrid approaches try to take advantage of both possibilities. Hybridisation can mix a declarative learning engine with numerical components or the opposite. The characteristics of the learned model depend on the type of hybridisation: equivalent noise-tolerant versions of the observed data can be learned by logical engines with numerical input components, and quasi-equivalent logical theories can be approximately induced by numerical/statistical machine learning algorithms that implement differentiable versions of logical operators and inference rules.

Inductive Programming for XAI
Some meta-heuristic approaches (the aforementioned evolutionary methods) have been used to automatically generate programs. Genetic programming (GP) was introduced by Koza [38] for automatically generating LISP expressions for given tasks expressed as pairs (input/output). This is, in fact, a typical machine learning scenario. GP was extended by the use of formal grammar to generate programs in any arbitrary language, keeping not only syntactic correctness [24] but also semantic properties [26]. Algorithms expressed in any language are declarative versions of the concepts learnt, which makes evolutionary automatic programming algorithms machine learners with good explainability.
Of particular interest for us within declarative paradigms is logic programming, and in particular, first-order logic programming, which is based on Robinson's resolution inference rule that automates the reasoning process of deducing new clauses from a first-order theory [44]. By introducing examples and counterexamples and combining this scheme with the ability to extend the initial theory with new clauses, it is possible to automatically induce a new theory that (logically) entails all of the positive examples but none of the negative examples. The underlying theory from which the new one emerges is considered background knowledge. This is the hypothesis of inductive logic programming (ILP [18,45]), which has received a great research effort in the last two decades. Recently, these approaches have been extended to make them noise-tolerant (in order to overcome one of the main drawbacks of ILP vs. statistical/numerical approaches when facing badly labelled or noisy examples [20]).
Other declarative paradigms are also compatible with ILP, e.g., MagicHaskeller [46], with the functional programming language Haskell; and ILASP [47] for inductively learning answer set programs.
It has been previously mentioned that ILP implies some kind of search in spaces that can become huge. This search can be eased by hybridising with other techniques, e.g., [48] introduces GA-Progol, which applies evolutionary techniques.
Within ILP methods, we have identified LFIT as especially relevant for explainable AI (XAI). Although LFIT learns propositional logic theories instead of first-order logic, the aforementioned ideas about ILP are still valid. In the next section, we will describe the fundamentals of LFIT and its PRIDE implementation, which will be tested experimentally for XAI in the experiments that will follow.

Learning From Interpretation Transition (LFIT)
Learning from interpretation transition (LFIT) [49] has been proposed to automatically construct a model of the dynamics of a system from the observation of its state transitions. Given some raw data, like time-series data of gene expression, a discretisation of those data in the form of state transitions is assumed. From those state transitions, according to the semantics of the system dynamics, several inference algorithms modelling the system as a logic program have been proposed. The semantics of a system's dynamics can indeed differ with regard to the synchronism of its variables, the determinism of its evolution and the influence of its history.
The LFIT framework proposes several modelling and learning algorithms to tackle those different semantics. To date, the following systems have been tackled: memory-less deterministic systems [49], systems with memory [50], probabilistic systems [51], and their multi-valued extensions [52,53]. The work in [54] proposes a method that deals with continuous time series data, the abstraction itself being learned by the algorithm.
In [55,56], LFIT was extended to learn system dynamics independently of its update semantics. That extension relies on a modelling of discrete memory-less multi-valued systems as logic programs in which each rule represents a variable that takes some value at the next state, extending the formalism introduced in [49,57]. The representation in [55,56] is based on annotated logics [58,59]. Here, each variable corresponds to a domain of discrete values. In a rule, a literal is an atom annotated with one of these values. It represents annotated atoms simply as classical atoms, and thus remains at a propositional level. This modelling characterises optimal programs independently of the update semantics. It allows modelling the dynamics of a wide range of discrete systems, including our domain of interest in this paper. LFIT can be used to learn an equivalent propositional logic program that provides explanations for each given observation.
Figure 2 graphically describes our proposed approach to generate explanations using LFIT of a given black-box classifier. We can see that our purpose is to get declarative explanations in parallel (in a kind of white-box digital twin) to a given neural network classifier. In the present work, for our first set of experiments, we used the same neural network and datasets described in [8], excluding the face images as explained in the following sections. In our second set of experiments (income prediction) we did not consider any machine learning algorithms to compare with. Therefore, the black box of Figure 2 is not considered in that case, although the rest of the figure is still applicable. In that set of experiments, we explore declarative explanations of the input/output relation of the training/testing datasets.
Figure 2. Experimental framework: PRIDE is fed with all the data available (train + test) for increasing the accuracy of the equivalence. In our experiments we consider the classifier (see [8] for details) as a black box to perform regression from input resume attributes (atts.) to output labels (recruitment scores labelled by human resources experts). LFIT gets a digital twin to the neural network providing explainability (as human-readable white-box rules) to the neural network classifier.

PRIDE Implementation of LFIT
GULA [55,56] and PRIDE [60] are particular implementations of the LFIT framework [49]. In the present section we introduce and describe, first informally and then formally, the notation and the fundamentals of both methods. Table 1 summarises the dataset about incomes of adults in the USA. It shows the names, meaning, type of data, and codification used in our experiments. In the following points we will explain the relevant LFIT concepts by means of examples over that table.
Table 1 shows multi-valued attributes instead of binary ones. LFIT translates them into propositional (binary) ones, creating as many propositional (binary) variables as possible values for each attribute. Although we keep the typical functional notation var(value), each combination is in fact the propositional variable $var_{value}$.
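The translation from multi-valued attributes to propositional variables can be illustrated with a minimal sketch. The attribute names and domains below are hypothetical placeholders (the real ones are those of Table 1), and the helper functions are illustrative only; they are not part of GULA or PRIDE.

```python
from typing import Dict, List

# Hypothetical attribute domains, used only for illustration.
DOMAINS: Dict[str, List[str]] = {
    "education": ["primary", "secondary", "higher"],
    "gender": ["female", "male"],
}

def propositional_variables(domains: Dict[str, List[str]]) -> List[str]:
    """Create one binary variable per (attribute, value) pair, written var(value)."""
    return [f"{var}({val})" for var, vals in domains.items() for val in vals]

def encode(example: Dict[str, str], domains: Dict[str, List[str]]) -> Dict[str, bool]:
    """Truth value of every propositional variable for a single example."""
    return {
        f"{var}({val})": example[var] == val
        for var, vals in domains.items()
        for val in vals
    }

example = {"education": "higher", "gender": "female"}
atoms = propositional_variables(DOMAINS)
assignment = encode(example, DOMAINS)
```

Under this encoding, exactly one propositional variable per attribute is true for any given example, which is why the functional notation var(value) loses no information.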

Rules
LFIT expresses the theory it learns as a set of propositional Horn clauses with exactly one positive literal, that is, as logical implications whose body is a conjunction of propositional atoms and whose head is a single atom. The Prolog form shown in Listing 1 is usually preferred.
Listing 1. Prolog notation for LFIT rules.
In the domain of adult incomes we could find rules like those shown in Listing 2.

Rule Domination
In the LFIT learning process, rule domination is an important concept. Roughly speaking, when two rules have the same head, one dominates the other if its body is contained in the other's body.
In Listing 3 you can see how $R_1$ dominates $R_2$.
Dominant rules can be considered more general and are the goal of LFIT.
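The informal notion of domination above reduces to a head comparison plus a subset test on bodies. The following sketch assumes a simple representation of rules (a head atom and a set of body atoms) chosen for illustration; the two rules are hypothetical and are not taken from any learned program.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Atom = Tuple[str, str]  # (variable, value), e.g. ("gender", "male")

@dataclass(frozen=True)
class Rule:
    head: Atom
    body: FrozenSet[Atom]

def dominates(r1: Rule, r2: Rule) -> bool:
    """r1 dominates r2 when both share the head and body(r1) is contained in body(r2)."""
    return r1.head == r2.head and r1.body <= r2.body

# Hypothetical rules for illustration only.
r1 = Rule(("income", "high"), frozenset({("education", "higher")}))
r2 = Rule(
    ("income", "high"),
    frozenset({("education", "higher"), ("gender", "male")}),
)
```

Here `dominates(r1, r2)` holds because `r1` reaches the same conclusion with fewer conditions, so it is the more general of the two.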

States and Rule-State Matching
Rule generation in LFIT starts from the design of a body that fits as many examples as possible. This is done by rule-state matching. Informally, a state is a conjunction of atoms (positive literals, that is, associations between attributes and specific values) that could describe one or more examples.
A rule and a state match if the body of the rule is included in the state. Listing 4 shows an example of rule-state matching, in which rule $R_1$ matches state $s_1$.
In the following, we denote by $\mathbb{N} := \{0, 1, 2, \dots\}$ the set of natural numbers, and for all $k, n \in \mathbb{N}$, $\llbracket k; n \rrbracket := \{i \in \mathbb{N} \mid k \le i \le n\}$ is the set of natural numbers between $k$ and $n$ inclusive. For any set $S$, the cardinality of $S$ is denoted $|S|$ and the power set of $S$ is denoted $\wp(S)$.
Let $V = \{v_1, \dots, v_n\}$ be a finite set of $n \in \mathbb{N}$ variables, $Val$ the set in which variables take their values and $dom : V \to \wp(Val)$ a function associating a domain to each variable. The atoms of MVL (multi-valued logic) are of the form $v^{val}$ where $v \in V$ and $val \in dom(v)$. The set of such atoms is denoted by $A^{V}_{dom} = \{v^{val} \in V \times Val \mid val \in dom(v)\}$ for a given set of variables $V$ and a given domain function $dom$. In the following, we work on specific $V$ and $dom$ that we omit to mention when the context makes no ambiguity, thus simply writing $A$ for $A^{V}_{dom}$.
An MVL rule $R$ is defined by:
$$R = v_0^{val_0} \leftarrow v_1^{val_1} \wedge \dots \wedge v_m^{val_m}$$
where $\forall i \in \llbracket 0; m \rrbracket$, $v_i^{val_i} \in A$ are atoms in MVL so that every variable is mentioned at most once in the right-hand part: $\forall j, k \in \llbracket 1; m \rrbracket$, $j \neq k \Rightarrow v_j \neq v_k$. Intuitively, the rule $R$ has the following meaning: the variable $v_0$ can take the value $val_0$ in the next dynamical step if for each $i \in \llbracket 1; m \rrbracket$, variable $v_i$ has value $val_i$ in the current dynamical step.
The atom on the left-hand side of the arrow is called the head of $R$ and is denoted $h(R) := v_0^{val_0}$. The notation $var(h(R)) := v_0$ denotes the variable that occurs in $h(R)$. The conjunction on the right-hand side of the arrow is called the body of $R$, written $b(R)$, and can be assimilated to the set $\{v_1^{val_1}, \dots, v_m^{val_m}\}$; we thus use set operations such as $\in$ and $\cap$ on it. The notation $var(b(R)) := \{v_1, \dots, v_m\}$ denotes the set of variables that occur in $b(R)$. More generally, for any set of atoms $X \subseteq A$, we denote by $var(X) := \{v \in V \mid \exists val \in dom(v), v^{val} \in X\}$ the set of variables appearing in the atoms of $X$. A multi-valued logic program (MVLP) is a set of MVL rules.
Definition 1 introduces a domination relation between rules that defines a partial anti-symmetric ordering. Rules with the most general bodies dominate the other rules. In practice, these are the rules we are interested in since they cover the most general cases.
In [56], the set of variables is divided into two disjoint subsets: T (for targets) and F (for features). This allows us to define a dynamic MVLP, which captures the dynamics of the problems we tackle in this paper.
The dynamical system we want to learn the rules of is represented by a succession of states as formally given by Definition 3. We also define the "compatibility" of a rule with a state in Definition 4.

Definition 3 (Discrete state).
A discrete state $s$ on $T$ (resp. $F$) of a DMVLP is a function from $T$ (resp. $F$) to $\mathbb{N}$, i.e., it associates an integer value to each variable in $T$ (resp. $F$). It can be equivalently represented by the set of atoms $\{v^{s(v)} \mid v \in T \text{ (resp. } F)\}$ and thus we can use classical set operations on it. We write $S^{T}$ (resp. $S^{F}$) to denote the set of all discrete states of $T$ (resp. $F$), and a couple of states $(s, s') \in S^{F} \times S^{T}$ is called a transition.
GULA [55,56] and PRIDE [60] can produce such programs. Formally, given a set of observations $T$ (in the remainder of this paragraph, $T$ denotes a set of transitions rather than the set of target variables), GULA [55,56] and PRIDE [60] will learn a set of rules $P$ with the following properties, where $R \sqcap s$ denotes that rule $R$ matches state $s$:
• All observations are explained: $\forall (s, s') \in T, \forall v^{val} \in s', \exists R \in P, R \sqcap s, h(R) = v^{val}$.
• All rules of $P$ are correct w.r.t. $T$: $\forall R \in P, \forall (s_1, s_2) \in T, R \sqcap s_1 \Rightarrow \exists (s_1, s_3) \in T, h(R) \in s_3$ (if $T$ is deterministic, $s_2 = s_3$).
• All rules are minimal w.r.t. $F$: $\forall R \in P, \forall R' \in MVLP$, $R'$ correct w.r.t. $T$, it holds that $R \le R' \Rightarrow R' = R$.
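The first two properties of a learned program (every observation is explained, and every rule is correct) can be checked mechanically. The sketch below is not the GULA or PRIDE learning algorithm: it only verifies a candidate program against a set of transitions, using an assumed representation of rules as (head, body) pairs and a toy system in the spirit of Figure 1, where the output depends only on v1.

```python
from typing import FrozenSet, List, Tuple

Atom = Tuple[str, int]                 # (variable, value)
State = FrozenSet[Atom]                # a discrete state as a set of atoms
Rule = Tuple[Atom, FrozenSet[Atom]]    # (head, body)
Transition = Tuple[State, State]       # (feature state, target state)

def state(**values: int) -> State:
    """Build a state from keyword arguments, e.g. state(v1=0, v2=1)."""
    return frozenset(values.items())

def matches(rule: Rule, s: State) -> bool:
    """A rule matches a state when its body is included in the state."""
    _, body = rule
    return body <= s

def explains_all(program: List[Rule], transitions: List[Transition]) -> bool:
    """Every target atom of every transition is the head of some matching rule."""
    return all(
        any(matches(rule, s) and rule[0] == atom for rule in program)
        for s, s_next in transitions
        for atom in s_next
    )

def is_correct(rule: Rule, transitions: List[Transition]) -> bool:
    """Whenever the rule matches a feature state, its head is observed after that state."""
    head, _ = rule
    return all(
        any(s == s1 and head in s3 for s, s3 in transitions)
        for s1, _ in transitions
        if matches(rule, s1)
    )

# Toy black box (hypothetical): the output copies v1 and ignores v2.
transitions = [(state(v1=a, v2=b), state(out=a)) for a in (0, 1) for b in (0, 1)]
program = [(("out", a), state(v1=a)) for a in (0, 1)]
spurious = (("out", 1), state(v2=1))   # wrongly blames v2 for the output
```

The two-rule `program` explains all four transitions and both of its rules are correct, whereas `spurious` is rejected because it matches a state whose observed output is 0.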
The possible explanations of an observation are the rules that match the feature state of this observation. The body of a rule gives a minimal condition over feature variables to obtain its conclusion over a target variable. Multiple rules can match the same feature state, thus multiple explanations can be possible. Rules can be weighted by the number of observations they match to assert their level of confidence. Output programs of GULA and PRIDE can also be used in order to predict and explain from unseen feature states by learning additional rules that encode when a target variable value is not possible, as shown in the experiments of [56].
The current contribution shows a possible application of a declarative method (such as LFIT) in some scenarios with numerical aspects: in the FairCVdb case we are generating white-box explanations of a deep-learning black box; in the US census case we are explaining a dataset that could typically be tackled by numeric (statistical) approaches.
In these situations there is an interesting question regarding qualitative vs. quantitative considerations.
From the declarative viewpoint of LFIT, the focus is on the qualitative guarantee of learning a logical version equivalent to the observed system. Regarding equivalence, the version is equivalent or it is not. If the model fails in 1% of the examples, equivalence is lost in the same way as if it had failed in 20% or 60% of the examples.
From the viewpoint of the statistical approaches it is very important to take into account the amounts. For example, the output of deep-learning classifiers is based on a quantitative criterion such as to choose the label with the highest probability.
It could seem that quantitative information does not matter for LFIT, but this is not exactly true.
LFIT can easily collect quantitative information, such as how many states (input examples) match each rule. This numerical information can be used as weights, both to better explain and understand the process and to incorporate predictive capabilities into the declarative version. This option has been explained and explored in [61,62].
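As an illustration of how match counts can act as weights, the following sketch counts the observed states matched by each rule and uses the totals to choose a value for a target variable. The voting scheme is an assumption made for this example and is not necessarily the one explored in [61,62]; the rules and states are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Optional, Tuple

Atom = Tuple[str, str]
State = FrozenSet[Atom]
Rule = Tuple[Atom, FrozenSet[Atom]]  # (head, body)

def rule_weights(program: List[Rule], states: List[State]) -> Dict[Rule, int]:
    """Weight of a rule = number of observed feature states its body matches."""
    return {rule: sum(1 for s in states if rule[1] <= s) for rule in program}

def predict(program: List[Rule], weights: Dict[Rule, int],
            s: State, target: str) -> Optional[str]:
    """Assumed scheme: pick the target value backed by the largest total weight."""
    scores: Counter = Counter()
    for rule in program:
        (var, val), body = rule
        if var == target and body <= s:
            scores[val] += weights[rule]
    return scores.most_common(1)[0][0] if scores else None

# Hypothetical observations and rules, for illustration only.
states = [
    frozenset({("gender", "male"), ("education", "high")}),
    frozenset({("gender", "male"), ("education", "low")}),
    frozenset({("gender", "female"), ("education", "high")}),
]
high_rule = (("income", "high"), frozenset({("education", "high")}))
low_rule = (("income", "low"), frozenset({("gender", "female")}))
program = [high_rule, low_rule]

weights = rule_weights(program, states)
guess = predict(program, weights, states[2], "income")
```

For the third state both rules match, and the rule supported by more observations decides the prediction, so `guess` is `"high"`.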

Experimental Framework
For testing the capability of PRIDE to explain machine learning domains we have designed several experiments using the FairCVdb dataset [8] and the data about adult incomes from the 1994 US census [63].
Although the goals and methods are similar, there are big differences between the tasks. The detailed process is described separately.

FairCVdb Dataset
FairCVdb comprises 24,000 synthetic resume profiles. Table 2 summarises the structure of these data. Each resume includes 12 features ($v_i$) related to the candidate merits, 2 demographic attributes (gender and three ethnicity groups), and a face photograph. In our experiments we discarded the face image for simplicity (unstructured image data will be explored in future work). Each of the profiles includes three target scores ($T$) generated as a linear combination of the 12 features:
$$T = \beta + \sum_{i=1}^{12} \alpha_i v_i$$
where $\alpha_i$ is a weighting factor for each of the merits (see [8] for details) and $\beta$ is the bias term: (i) unbiased score ($\beta = 0$); (ii) gender-biased scores ($\beta = 0.2$ for male and $\beta = 0$ for female candidates); and (iii) ethnicity-biased scores ($\beta = 0.0$, $0.15$ and $0.3$ for candidates from ethnic groups 1, 2, and 3, respectively). Thus, we intentionally introduce bias into the candidate scores.
From this point on, we will simplify the names of the attributes, using g for gender, e for ethnic group, and i1 to i12 for the rest of the input attributes. In addition to the bias previously introduced, some other random bias was introduced relating attributes and gender to simulate real social dynamics. The attributes concerned were i3 and i7. Note that merits were generated without bias, assuming an ideal scenario where candidate competencies do not depend on their gender or ethnic group. For the current work we have used only discrete values for each attribute, discretising one attribute (experience, to take values from 0 to 5, the higher the better) and the scores (from 0 to 3), which were real valued in [8].
We have experimented with PRIDE on the FairCVdb dataset described in the previous section. Figure 3 names and explains the scenarios considered in our experiments. In [8], researchers demonstrate that an automatic recruitment algorithm based on multimodal machine learning reproduces existing biases in the target functions even if demographic information was not available as input (see [8] for details).
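To make the effect of the bias term concrete, the sketch below computes a score as an offset β plus a weighted sum of the 12 merits and then discretises it into the levels 0 to 3. The uniform weights, the example merit values, and the equal-width binning are assumptions made for illustration; the actual weights and generation procedure are those described in [8], and only the β values are taken from the text above.

```python
from typing import Sequence

# Bias terms stated in the text for the gender- and ethnicity-biased scores.
GENDER_BETA = {"male": 0.2, "female": 0.0}
ETHNICITY_BETA = {1: 0.0, 2: 0.15, 3: 0.3}

def score(features: Sequence[float], alphas: Sequence[float], beta: float = 0.0) -> float:
    """Offset beta plus the weighted sum of the merit features."""
    return beta + sum(a * v for a, v in zip(alphas, features))

def discretise(value: float, levels: int = 4, low: float = 0.0, high: float = 1.0) -> int:
    """Equal-width binning into 0..levels-1 (an assumed discretisation)."""
    clipped = min(max(value, low), high)
    return min(int((clipped - low) / (high - low) * levels), levels - 1)

# Hypothetical candidate: 12 merits in [0, 1] and uniform weights.
merits = [0.6] * 12
alphas = [1 / 12] * 12

unbiased = score(merits, alphas)
male_biased = score(merits, alphas, GENDER_BETA["male"])
female_biased = score(merits, alphas, GENDER_BETA["female"])
```

With these assumed values, two candidates with identical merits end up in different discrete score levels once the gender offset is applied, which is the kind of dependency a declarative explanation is expected to expose.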
Our purpose in the experiments was to obtain a declarative explanation capable of revealing those biases.

Figure 3. Scenarios considered in the experiments, which study gender (g) and ethnicity (e) bias separately. Apart from gender and ethnicity, there are 12 other input attributes (named i1 to i12). For each of gender and ethnicity there is a pair of datasets (biased and unbiased). We studied the input attributes in order of increasing complexity, starting with i1 and i2 and adding one attribute at a time. Thus, for each pair we considered 11 different scenarios (named s1 to s11). The scenarios are nested: $s_i$ is included in every $s_j$ for which i < j.
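The nested structure of the scenarios can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_scenarios` and the list-based representation are hypothetical, and the protected attribute (g or e) is assumed to be part of every scenario, as in the description of Scenario 1 later in the text.

```python
def build_scenarios(protected="g", n_merits=12):
    """Return {scenario name: list of input attribute names}.

    s1 uses the protected attribute plus i1 and i2; every later scenario
    adds the next merit attribute, so s11 uses i1..i12.
    (Hypothetical helper, not part of PRIDE.)
    """
    scenarios = {}
    for k in range(1, n_merits):  # k = 1..11
        merits = [f"i{j}" for j in range(1, k + 2)]
        scenarios[f"s{k}"] = [protected] + merits
    return scenarios


gender_scenarios = build_scenarios(protected="g")
ethnicity_scenarios = build_scenarios(protected="e")
```

Because each scenario extends the previous one, any attribute used in a smaller scenario is also available in every larger one.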

Adult Income Level Dataset
In a second set of experiments, we considered a dataset about adult incomes extracted from the 1994 US census [63]. It contains a total of 48,842 entries with 14 attributes that describe the group of individuals represented by each entry. One of these attributes is the income level discretised to only highlight if it is high (>50k USD) or low (≤50k USD). Table 1 summarises the structure of the dataset.
The dataset is usually split into training and testing subsets. As in the first analysis on the FairCVdb dataset, we fed PRIDE with the complete dataset.
Unlike the first set of experiments on FairCVdb, with the adult income dataset there was no guarantee that incomes are biased by attributes like gender or ethnicity. Another difference is that datasets taken from the US census are not synthetic; they collect information about real people. In addition, an unbiased version of the income level is not available. On the other hand, there is a common belief that the level of income is skewed by gender and ethnicity: males are thought to be more likely to have higher incomes than females, and people of white ethnicity more likely to have higher incomes than people of other ethnicities. The goal of this second set of experiments on the income dataset was to check the expressiveness of PRIDE when trying to find a data-driven explanation for this common belief.

Experiments Design
We followed these steps:

1. Prepare the dataset for PRIDE by preprocessing:
   • Removing the entries with any unknown attribute. Only 45,222 entries remain after this step.
   • Discretising continuous attributes (those marked as continuous in Table 1).

2. Obtain a logical version equivalent to the data to analyse the effect of the attribute sex, considering the income level as a function of the other attributes.

3. Obtain a logical version equivalent to the data to analyse the effect of the attribute ethnicity, considering the income level as a function of the other attributes.
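The preprocessing in step 1 can be sketched as follows. This is a minimal illustration, not the pipeline actually used: the `"?"` marker for unknown values, the attribute names in the sample rows, and the bin edges for `age` are all assumptions made for the example.

```python
from bisect import bisect_left

UNKNOWN = "?"  # assumed marker for unknown values


def drop_unknown(rows, unknown=UNKNOWN):
    """Keep only the entries in which no attribute is unknown."""
    return [row for row in rows if unknown not in row.values()]


def discretise(value, edges):
    """Return the index of the first edge that is >= value.

    Values above the last edge fall into the final bin, len(edges).
    """
    return bisect_left(edges, value)


def prepare(rows, edges_by_attribute):
    """Apply both preprocessing steps and return new, fully discrete rows."""
    prepared = []
    for row in drop_unknown(rows):
        new_row = dict(row)
        for attribute, edges in edges_by_attribute.items():
            new_row[attribute] = discretise(row[attribute], edges)
        prepared.append(new_row)
    return prepared


# Toy entries (hypothetical values, for illustration only).
rows = [
    {"age": 39, "workclass": "State-gov", "sex": "Male", "class": "<=50K"},
    {"age": 52, "workclass": "?", "sex": "Female", "class": ">50K"},
    {"age": 70, "workclass": "Private", "sex": "Female", "class": ">50K"},
]
prepared = prepare(rows, {"age": [25, 45, 65]})  # hypothetical bin edges
```

The choice of bin edges determines how many propositions each continuous attribute contributes, so it directly affects the size of the learned theory.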

Results
It is important to pay attention to the properties that the formal model under LFIT guarantees: the learned propositional logic theory is equivalent to the observed data, and the conditions of each clause (rule) are minimal. These properties allow the complexity of the observed data to be estimated from the complexity of the learned theory: the simpler the dataset, the simpler the theory. In the future, we would like to explore the possibility of defining a complexity measure for datasets based on the complexity of the theories learned by LFIT. It could be similar to compression-based approximations of Kolmogorov complexity [64].
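As a rough illustration of what a compression-based complexity estimate could look like, the sketch below compares the compressed size of two toy theories. Everything here is an assumption made for the example: the use of `zlib`, the textual rule syntax, and the randomly generated rules. It is not a measure proposed or evaluated in this work.

```python
import random
import zlib


def compressed_size(text):
    """Size in bytes of the zlib-compressed text (a crude complexity proxy)."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def random_theory(seed, n_rules, attributes, n_values):
    """Build a toy theory as text, one rule per line (hypothetical syntax)."""
    rng = random.Random(seed)
    rules = []
    for _ in range(n_rules):
        body = rng.sample(attributes, rng.randint(1, len(attributes)))
        conditions = ", ".join(f"{a}({rng.randrange(n_values)})" for a in body)
        rules.append(f"score({rng.randrange(4)}) :- {conditions}.")
    return "\n".join(rules)


# Same number of rules, but drawn from a small or a large space of conditions.
small_theory = random_theory(0, 50, ["i1", "i2"], 2)
large_theory = random_theory(0, 50, ["i1", "i2", "i3", "i4", "i5", "i6"], 6)

small_size = compressed_size(small_theory)
large_size = compressed_size(large_theory)
```

A compressor only gives an upper bound on the information content, so such a number would be useful for comparing theories learned under the same conditions rather than as an absolute measure.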
Another important question to take into account when quantitatively analysing these results is the expressiveness of the LFIT models. Propositional logic excludes the use of variables. Although functional notation has been used (for example, in sex(0)), each pair consisting of a function and one specific value of its argument represents a single proposition ($sex_0$ in our example). The use of variables in other declarative models, such as first-order logic, allows a more compact notation by grouping different values of the same attribute by means of a well-defined variable. However, there is no trivial translation from one model to the other. This is an inherent characteristic of propositional logic that cannot be overcome within the propositional realm. More compact notations could be more readable and hence offer more easily understandable explanations. However, increasing the readability of LFIT results by translating them into another model is outside the scope of the current contribution.

Example of Declarative Explanation
Listing 5 shows a fragment generated with the proposed methods for scenario s1 for gender-biased scores. We have chosen a fragment that fully explains how a CV is scored with the value three for Scenario 1. Scenario 1 takes into account the input attributes gender, education, and experience. The first clause (rule), for example, says that if the value of a CV for the attribute gender is 1 (female), for education is 5 (the highest), and for experience is 3, then this CV receives the highest score (3).
The resulting explanation is a propositional logic fragment equivalent to the classifier for the data. It can also be understood as a set of rules with the same behavior. From the viewpoint of explainable AI, this resulting fragment can be understood by an expert in the domain and used to generate new knowledge about the scoring of CVs.
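To make the reading of such a clause concrete, the sketch below encodes only the first clause described above and checks it against a CV. The dictionary representation and the function names are hypothetical; this is not PRIDE's internal format.

```python
# The first clause described in the text: gender = 1, education = 5 and
# experience = 3 imply score = 3. (Hypothetical encoding.)
FIRST_CLAUSE = {
    "head": ("score", 3),
    "body": {"gender": 1, "education": 5, "experience": 3},
}


def rule_fires(rule, cv):
    """True when every condition in the rule body holds for the CV."""
    return all(cv.get(attr) == value for attr, value in rule["body"].items())


def explain(rules, cv):
    """Return the heads of all rules whose body is satisfied by the CV."""
    return [rule["head"] for rule in rules if rule_fires(rule, cv)]


cv = {"gender": 1, "education": 5, "experience": 3}
conclusions = explain([FIRST_CLAUSE], cv)
```

A CV that differs in any of the three conditions does not trigger this clause and would have to be covered by another clause of the learned program.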

Quantitative Summary of the Results
This section gives a quantitative summary of the results: the total number of rules and the frequency of each attribute in them. Tables 3 and 4 show the number of rules and the absolute frequency of each attribute in the rules when comparing ethnicity-biased and unbiased datasets. Tables 5 and 6 show the same quantities when comparing gender-biased and unbiased datasets. To compare the influence of each attribute, their frequencies normalised with respect to the number of rules are shown in Figure 4 (when studying ethnicity biases) and in Figure 5 (for gender biases).
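The counts reported in these tables and figures can be obtained with a simple pass over the rules. The sketch below assumes a hypothetical rule representation (a dictionary with a `body` mapping attributes to values) and uses invented toy rules; it only illustrates the counting and the normalisation by the number of rules.

```python
from collections import Counter


def attribute_frequencies(rules):
    """Count in how many rule bodies each attribute appears."""
    counts = Counter()
    for rule in rules:
        counts.update(rule["body"].keys())
    return counts


def normalise_by_rules(counts, n_rules):
    """Frequency of each attribute relative to the total number of rules."""
    return {attribute: n / n_rules for attribute, n in counts.items()}


# Toy rules (invented for illustration).
toy_rules = [
    {"head": 3, "body": {"gender": 1, "education": 5, "experience": 3}},
    {"head": 2, "body": {"education": 4, "experience": 2}},
    {"head": 1, "body": {"education": 1}},
    {"head": 0, "body": {"gender": 0, "experience": 0}},
]
frequencies = attribute_frequencies(toy_rules)
normalised = normalise_by_rules(frequencies, len(toy_rules))
```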

Quantitative Identification of Biased Attributes in Rules
Our quantitative results are divided into two parts. The first part is based on the fact that, in the biased experiments, if gender(0) appears more frequently than gender(1) in the rules, then that would lead to higher scores for gender(0). In the second part we show the influence of bias on the distribution of attributes.
We first define the Partial Weight (PW) as follows. For any program P and two atoms $v_0^{val_0^i}$ and $v_1^{val_1^j}$, where $val_0^i \in val_0$ and $val_1^j \in val_1$, define: Then we have: PW is a weighted addition of all the values of the output, and the weight, in our case, is the value of the scores.
This analysis was performed only on scenario s11, comparing unbiased scores with gender- and ethnicity-biased scores. We have observed a similar behavior of both parameters, partial and global weights. In unbiased scenarios, the distributions of the occurrences of each value can be considered statistically the same (between gender(0) and gender(1), and among ethnicity(0), ethnicity(1) and ethnicity(2)). In biased datasets, however, the occurrences of gender(0) and ethnic(0) for higher scores are significantly higher, reaching in the most extreme case more than three times the occurrence of the other values.
For the global weights, for example, the share of occurrences for higher scores (as a percentage, without and with bias respectively) increases from 48.8% to 78.1% for gender(0), while for gender(1) it decreases from 51.2% to 21.9%. In the case of ethnicity, it increases from 33.4% to 65.9% for ethnic(0), but decreases from 33.7% to 19.4% for ethnic(1) and from 32.9% to 14.7% for ethnic(2).
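The following sketch shows one possible reading of the partial and global weights, and it should be taken as an assumption rather than as the definition used here: the partial weight is taken to be the number of rules whose head is a given score and whose body contains a given attribute value, and the global weight is taken to be the sum of those counts weighted by the score. The rule representation and the toy rules are hypothetical.

```python
def partial_weight(rules, attribute, value, score):
    """Assumed reading: rules with this score in the head and
    attribute = value in the body."""
    return sum(
        1
        for rule in rules
        if rule["head"] == score and rule["body"].get(attribute) == value
    )


def global_weight(rules, attribute, value, scores):
    """Assumed reading: partial weights added up, weighted by the score."""
    return sum(s * partial_weight(rules, attribute, value, s) for s in scores)


def global_shares(rules, attribute, values, scores):
    """Share of the total global weight held by each value of the attribute."""
    weights = {v: global_weight(rules, attribute, v, scores) for v in values}
    total = sum(weights.values())
    return {v: (w / total if total else 0.0) for v, w in weights.items()}


# Toy rules (invented): heads are scores 0..3, bodies map attributes to values.
toy_rules = [
    {"head": 3, "body": {"gender": 0, "education": 5}},
    {"head": 3, "body": {"gender": 0, "experience": 4}},
    {"head": 2, "body": {"gender": 1, "education": 5}},
    {"head": 1, "body": {"education": 2}},
]
shares = global_shares(toy_rules, "gender", values=[0, 1], scores=[0, 1, 2, 3])
```

Under this reading, rules with score 0 contribute nothing to the global weight, so the shares emphasise the values that appear in rules for higher scores.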

Quantitative Identification of the Distribution of Biased Attributes
We now define $freq_{P_1}(a)$ as the frequency of attribute $a$ in program $P_1$. The normalised percentage for input $a$ is $NP_{P_1}(a) = freq_{P_1}(a) / \sum_{x \in input} freq_{P_1}(x)$, and the percentage of the absolute increment for each input from unbiased experiments to the corresponding biased ones is defined as $AIP_{P_1,P_2}(a) = (freq_{P_1}(a) - freq_{P_2}(a)) / freq_{P_2}(a)$.
In this approach we have taken into account all scenarios (from s1 to s11) for both gender and ethnicity.
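Both quantities are straightforward to compute from per-attribute frequency tables. The sketch below follows the two formulas literally, keeping the same argument order for $P_1$ and $P_2$; the function names and the toy frequencies are invented for illustration.

```python
def normalised_percentage(freq):
    """NP: frequency of each attribute divided by the sum over all inputs."""
    total = sum(freq.values())
    return {attribute: f / total for attribute, f in freq.items()}


def absolute_increment(freq_p1, freq_p2):
    """AIP: (freq_p1 - freq_p2) / freq_p2 for each attribute.

    Attributes with zero frequency in freq_p2 are skipped to avoid
    dividing by zero.
    """
    return {
        attribute: (freq_p1[attribute] - freq_p2[attribute]) / freq_p2[attribute]
        for attribute in freq_p1
        if freq_p2.get(attribute)
    }


# Toy frequency tables (invented values).
freq_p1 = {"gender": 30, "i1": 50, "i2": 20}
freq_p2 = {"gender": 10, "i1": 50, "i2": 40}

np_p1 = normalised_percentage(freq_p1)
aip = absolute_increment(freq_p1, freq_p2)
```

In this toy case only `gender` has a large positive increment, which is the kind of pattern the figures in this section look for.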
We have observed that for both parameters the only attributes that consistently increase their values are gender and ethnicity when comparing unbiased and gender/ethnicity-biased scores. Figures 6 and 7 show $AIP_{\text{us1-11},\text{ebs1-11}}$ for each attribute, that is, their values comparing unbiased and ethnicity-biased scores for all scenarios from s1 to s11. The highest values clearly correspond to the attribute ethnicity.

Figure 6. Percentage of the absolute increment (comparing scores with and without bias for ethnicity) of each attribute for scenarios s1, s2, s3, s4, s5 and s6 ($AIP_{\text{us1-6},\text{ebs1-6}}$). The graphs link the points corresponding to all the input attributes considered in each scenario. Axes: Attributes versus Percentage of Absolute Increment (AIP). Annotation: AIP(ethnicity) is between 50% and 480% higher than the rest.

Plot of Percentage of Absolute Increment (AIP) against Attributes. Annotation: AIP(ethnicity) is 40% higher than the rest.

Something similar happens for gender. Figures 8 and 9 show $AIP_{\text{us1-11},\text{gbs1-11}}$ for each attribute when studying gender-biased scores. It is worth mentioning that some differences exist in scenarios s9, s10, and s11 regarding attributes i3 and i7. These apparent anomalies are explained by the random bias introduced in the datasets in order to relate these attributes to gender when the score is biased. Figure 10 shows $NP_{s11}$ for all attributes, which makes clear the small relevance of attributes i3 and i7 in the final biased score. As highlighted elsewhere, this capability of PRIDE to identify random indirect perturbations of other attributes in the bias is a relevant achievement of our proposal.

Plot of Percentage of Absolute Increment (AIP) against Attributes. Annotation: AIP(gender) is between 60% and 270% higher than the rest.

Adult Income Level Dataset
A quantitative summary of the results on this dataset follows. Tables 7 and 8 show the frequency of each attribute when studying ethnicity biases. Tables 9 and 10 show the frequency of each attribute when studying gender biases. Figure 11 shows the normalised frequency of these attributes. LFIT captures the structure of the dataset, because there are no significant differences when excluding gender or ethnicity to study their biases. It is also worth noting that these attributes do not contribute the most to income level. Owing to the circumstances described in previous sections, the goals and analysis on this dataset are simpler than on FairCVdb.
In this case, it was enough to study the clauses of the learned program, compute the normalised frequency of the different values of the attributes ethnicity and sex with respect to the total number of entries, and compare the proportions of class(0) and class(1). The results are shown in Figures 12 and 13.
In both figures, blue is used for class(0) and red for class(1). Their frequencies, normalised with respect to the total number of entries, are plotted together for comparison.
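A computation of this kind can be sketched as follows. The clause representation, the function name, and the toy clauses are hypothetical, and the normalising total is passed in explicitly because the sketch does not assume how the entries are counted.

```python
from collections import Counter


def value_class_frequency(clauses, attribute, total):
    """Normalised frequency of each (attribute value, class) pair.

    Only clauses whose body mentions the attribute are counted; the
    result is divided by `total`, supplied by the caller.
    """
    counts = Counter()
    for clause in clauses:
        if attribute in clause["body"]:
            counts[(clause["body"][attribute], clause["head"])] += 1
    return {pair: n / total for pair, n in counts.items()}


# Toy clauses (invented): head is the income class, body maps attributes to values.
toy_clauses = [
    {"head": 1, "body": {"sex": 1, "education": 3}},
    {"head": 1, "body": {"sex": 1, "age": 2}},
    {"head": 0, "body": {"sex": 0, "education": 1}},
    {"head": 0, "body": {"education": 0}},
]
sex_frequencies = value_class_frequency(toy_clauses, "sex", total=len(toy_clauses))
```

Comparing the entries for class 0 and class 1 of the same attribute value gives the kind of side-by-side proportions described above.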
This simple initial experiment shows that the propositional logic theory learnt by PRIDE supports and explains the common belief about the relationship between sex (and likewise ethnicity) and higher income level:
• Ethnicity: Figure 12 shows that the logical theory contains clauses explaining that people of white ethnicity, ethnicity(0), get higher incomes than other ethnicities.
• Sex: Figure 13 shows that the logical theory contains clauses explaining that males, sex(1), get higher incomes than females.
Table 11 shows the frequency of the values of ethnicity and their effect on income, from which the same conclusions can be drawn. The same holds for gender, as Table 12 shows.

Discussion
After running the experiments described in the previous sections we can extract the following conclusions.
• PRIDE can explain algorithms learnt by neural networks. The theorems that support the characteristics of PRIDE guarantee a set of propositional clauses logically equivalent to the observed system for the input data provided. In addition, each clause has a minimal set of conditions. Thus, in the FairCVdb case, once the scorer is learnt, PRIDE translates it into a logically equivalent program. This program is a list of clauses like the one shown in Listing 5. Logic programs are declarative theories that explain the knowledge on a domain.
• PRIDE can explain what happens in a specific domain. Our experimental results reveal these characteristics of the domain:
  - Insights into the structure of the FairCVdb dataset. We have seen (and further confirmed with the authors of the datasets) characteristics of the datasets, e.g.:
    (1) All attributes are needed for the score. We learnt the logical version of the system starting from only two input attributes and including one additional attribute at a time, and only reached an accuracy of 100% when taking all of them into account. This is because removing some attributes generates indistinguishable CVs (all the remaining attributes have the same value) with different scores (which correspond to different values in some of the removed attributes).
    (2) Gender and ethnicity are not the most relevant attributes for scoring. The number of occurrences of these attributes in the conditions of the clauses of the learnt logic program is much smaller than that of other attributes.
    (3) While trying to capture the biases, we discovered that some attributes seem to increase their relevance when the score is biased. For example, the competence in some specific languages (attribute i7) seems to be more relevant when the score has gender bias. The authors of the datasets confirmed a random perturbation of these languages in the biases, which explains our observations.
  - Biases in the training FairCVdb datasets were detected. We have analysed the relationship between the scores and the specific values of the attributes used to generate the biased data. We have proposed a simple mathematical model, based on the effective weights of the attributes, which concludes that higher values of the scores correspond to the same specific values of gender (for gender bias) and ethnic group (for ethnicity bias). We have also performed an exhaustive series of experiments to analyse the increase in the presence of gender and ethnicity in the conditions of the clauses of the learnt logic program (comparing the unbiased and biased versions).
  - Insights into the structure of the dataset about adult income from the US census. In this case, unlike the FairCVdb dataset, there is no unbiased version to compare with. In addition, there is no machine learning model to treat as the black box to be explained. Nevertheless, there exists a common belief about the presence of biases (gender and ethnicity) in the income level. PRIDE has been used considering the dataset itself as a black box, understanding the income level as a function of the other attributes. We have obtained a logic theory that supports this common belief.
Our overall conclusion is that, in scenarios in which opaque (black-box) machine learning techniques have been used, LFIT, and in particular PRIDE, is able to offer explanations of the algorithm learnt in the domain under consideration. The resulting explanation is also expressive enough to capture training biases in the models learnt with neural networks.
In those cases in which there is no machine learner to compare with, PRIDE is still able to explain the structure of the datasets, treating the datasets themselves as the black box that has to be explained.

Further Research Lines
• Increasing understandability. Two possibilities could be considered in the future: (1) to post-process the learned program ad hoc in order to translate it into a more abstract form, or (2) to increase the expressive power of the formal model that supports the learning engine using, for example, ILP based on first-order logic.
• Adding predictive capability. PRIDE is not designed to predict but to explain (declaratively) by means of a digital twin of the observed systems. Nevertheless, extending PRIDE to predict is not particularly complicated. It would be necessary to change the way in which the result is interpreted as a logic program, mainly by adding mechanisms to choose the most promising rule when more than one is applicable. Our plan is to test an extended-to-predict version of PRIDE on this same domain and compare the result with the classifier generated by deep learning algorithms.
• Handling numerical inputs. The work in [8] included as input the images of the faces of the owners of the CVs. Although some variants of PRIDE are able to cope with numerical signals, the huge amount of information associated with images implies performance problems. Images are a typical input format in real deep learning domains. We would like to add some automatic pre-processing steps for extracting discrete information (such as semantic labels) from input images. We are motivated by the success of systems with similar approaches but different structure, like [65].
• Generating and combining multiple explanations. The present work has explored a way to provide a single human-readable explanation of the behavior of an AI model. An extension we have in mind is generating multiple explanations with different complementary methods and parameters of those methods, and then generating a combined explanation [66,67].
• Explaining AI vulnerabilities. Another extension of the presented work is towards explaining unexpected behaviors and vulnerabilities of given AI systems, e.g., against potential attacks [68] like manipulated input data [69].
• Measuring the accuracy and performance of the explanations. As far as the authors know, there is no standard procedure to evaluate and compare different explainability approaches. We will incorporate some formal metric in future versions.
• Analysing other significant problems where non-explainable AI is now common practice and good explanations are needed. The scenarios studied here (automatic tools for screening in recruitment and estimating the income level based on demographic information) are only two of the many application areas where explanations of the actions of AI systems are really needed. Other areas that will significantly benefit from this kind of approach are e-learning [70], e-health [71,72], and other human-computer interaction applications [73,74].
• Proposing metrics for the complexity of the datasets. Due to the formal properties that the general LFIT model gives to the learned theories, the complexity of the original data could be estimated from the complexity of the equivalent propositional logic theory. This approach is inspired by some implementations of Kolmogorov complexity by means of file compressors [64].

101248-B-I00 MINECO/FEDER), RTI2018-095232-B-C22 MINECO, PLeNTaS project PID2019-111430RB-I00 MINECO; and also by Pays de la Loire Region through RFI Atlanstic 2020.

Conflicts of Interest:
The authors declare no conflict of interest.