Article

Object Identity Reloaded—A Comprehensive Reference for an Efficient and Effective Framework for Logic-Based Machine Learning

Department of Computer Science, University of Bari, Via E. Orabona 4, 70125 Bari, Italy
Electronics 2025, 14(8), 1523; https://doi.org/10.3390/electronics14081523
Submission received: 3 February 2025 / Revised: 3 April 2025 / Accepted: 7 April 2025 / Published: 9 April 2025
(This article belongs to the Section Computer Science & Engineering)

Abstract
Sub-symbolic Machine Learning (ML) techniques, and specifically Neural Network-based ones, recently took over the research landscape, thanks to their efficiency and impressive effectiveness. On the other hand, the recent debate on ethics and AI and the first regulations on AI are progressively calling for anthropocentricity, which in turn requires explicit, human-understandable, and explainable approaches and representations that allow humans to be active parts in the loop. In these cases, logic-based approaches are more suitable. The Inductive Logic Programming (ILP) branch of research in ML provides an answer to this need and a uniform and unifying framework for three relevant industrial and research concerns: management of databases, implementation of software systems, and modeling of human-like reasoning strategies. A particular ILP framework based on the Object Identity (OI) assumption was proposed in the 1990s, for which desirable theoretical and practical properties were demonstrated and working tools and systems that successfully approached real-world and classical problems in AI were developed. In an age when mainstream research and media seem to reduce AI and ML to just deep learning, this paper celebrates the 30th anniversary of OI by providing for the first time a comprehensive overview of the framework to be used as a reference for researchers still interested in investigating the ILP approach to ML.

1. Introduction

Research in Artificial Intelligence (AI) has traditionally investigated two kinds of approaches. The sub-symbolic one, based on numeric/probabilistic techniques, reproduces the brain’s physical structure and behavior, and it is most suited to replicate perception. The symbolic one, exploiting formal logics, aims at reproducing high-level mental functions and representations in humans, and it is most appropriate to reproduce mind. The former is extremely efficient, but unsuitable for environments where complex objects with many interacting relationships must be expressed and described. The latter allows humans to trace, understand, and check the AI outcomes, thanks to the possibility of associating logic formulas with a ‘semantics’, i.e., an intuitive meaning that links them to our world and thoughts, and the chance of reproducing the inference strategies that humans use in conscious reasoning. This comes at the cost of increased computational complexity, which pushed the research towards weaker frameworks showing a good tradeoff between costs and benefits. Logic Programming (LP) [1,2] is the most famous. It deserves attention in the current Computer Science landscape since it provides a uniform and unifying framework for three major industrial and research concerns: implementation of software systems, management of information stored in databases (DBs), and modeling of human-like reasoning strategies to be applied in less formalized and understood environments and situations, where algorithms for carrying out the requested tasks are lacking.
Research in AI quickly experienced the so-called ‘knowledge acquisition bottleneck’ [3], i.e., the enormous difficulty in filling up the systems’ knowledge base with respect to using it for reasoning. This is due to the difficulty of domain experts in eliciting and properly formalizing their competence and sometimes their (more or less unconscious) refusal to provide their own “secrets” to other people (let alone a computer system). This gave rise to another research branch, Machine Learning (ML), aimed at automatically acquiring the knowledge directly from observations. Specifically, incremental ML can be important (for improving efficiency, as a closer simulation of human behavior, and for supporting the continuous changes and adaptation in the knowledge). Again, both sub-symbolic and symbolic approaches were attempted. The former, and specifically Neural Network-based ones, recently took over the research landscape thanks to their efficiency and impressive effectiveness. On the other hand, the recent hot debate on ethics and AI and the first regulations on AI [4] are progressively calling for anthropocentricity, which in turn requires explicit, transparent, human-understandable, and explainable approaches and representations that allow humans to be active parts in the loop. When these are non-negligible requirements (e.g., in medical and other critical applications, such as those concerning bioinformatics and chemistry; see the works by Muggleton, Srinivasan, and Ross, including [5] on carcinogenesis and [6,7] on mutagenesis), the latter is the appropriate choice [8]. The symbolic approach to ML is investigated by Inductive Logic Programming (ILP) [9], based on LP. A particular ILP framework based on the Object Identity (OI) assumption was proposed in the 1990s, aimed at supporting all of the above.
We stress here the fact that this paper refers to the notion of Object Identity in ILP, which has nothing to do with the notion with the same name used in the Database and Object Oriented community [10]. For this framework, desirable theoretical and practical properties were demonstrated, especially in the incremental ML setting, and working tools and systems that successfully approached real-world and classical problems in AI were developed. In an age when mainstream research and media seem to reduce AI and ML to just deep learning, this paper celebrates the 30th anniversary of OI by providing for the first time a comprehensive overview of the framework to be used as a reference for researchers still interested in investigating the ILP approach to ML. Note that no new theoretical result is proven in this paper; all the proposed proofs were already published in papers or introduced in theses. Rather, we organize here the whole corpus of results in a more coordinated and balanced way.
Concerning the methodology for carrying out the review, given the objective to stress the need to re-discover symbolic, especially FOL-based, approaches as alternative and complementary to sub-symbolic ones, a first decision was to exclude all works concerning attempts to combine neural and symbolic learning (see [11] for an overview of NeSy). Of course, we totally endorse the possibility of mixing the two kinds of approaches in balanced solutions that can take the best of both and let each make up for the limitations of the other; put simply, this is not in the scope of this paper. Then, we conducted a systematic review of all the papers that introduced, expanded, or used the OI framework. Since this work is specifically interested in the theoretical results, we then identified the subset of papers that included theoretical contributions and determined the minimum set that could cover all the relevant ones, preferring journal papers over conference ones. Since the results and proofs collected in this way sometimes used different notation, we rewrote them using the same style and formalism, also removing some redundancy. Finally, we turned to the subset of papers reporting on practical applications of the OI framework (e.g., describing OI-based systems or experiments using these systems). Since this is a secondary objective of this work, we only selected a few of the most prominent papers that could showcase, without any ambition of exhaustiveness, a sufficiently varied range of algorithms, systems, and applications demonstrating the practical use and advantages brought by the OI framework.
The paper is organized as follows. Section 2 introduces the basics of logics, LP and ILP. Then, Section 3, Section 4 and Section 5 provide an overview of logics under OI, of how it restructures the space of formulas, and on how it can be used in incremental ML, respectively. Section 6 and Section 7 discuss the framework and its uses, before concluding with Section 8.

2. Basics

First-Order Logic (FOL for short) has gained predominance in symbolic AI research as a means for formally representing knowledge and implementing automated reasoning, since it reaches a good tradeoff between expressive power and computational tractability. Indeed, at this level the (informal) notion of semantic consequence between formulas (⊧) is equivalent to syntactic derivation (⊢), which is formal and algorithmic. In FOL:
  • a term, representing an object, is a variable, a constant, or an n-ary function symbol applied to n terms as arguments;
  • an n-ary predicate p, denoted as p/n, is a kind of claim involving n objects;
  • an atom is an n-ary predicate applied to n terms as arguments, representing a claim that can be true or false; a formula is an atom or a suitable composition of atoms using logic functions (such as negation ¬ and disjunction ∨); a theory is a set of formulas.
vars(E), consts(E), and terms(E) denote the sets of variables, constants, and terms occurring in a formula E, respectively. A formula or a term is ground if it does not contain variables.
The clausal fragment of FOL is based on the following additional definitions:
  • A literal is an atom (positive literal) or its negation (negative literal); l̄ denotes the opposite of a literal l.
  • A clause is a universal formula consisting of a finite disjunction of literals; the length of a clause C, denoted by |C|, is the number of its literals; the empty clause □, representing a contradiction, does not contain any literal.
  • Two clauses are standardized apart if they do not share variables; they are variants if they differ only in the names of variables (renaming variables in a formula does not change its meaning); in the following, clauses are always standardized apart.
A clause is usually denoted as a set of literals or in Prolog style:
{A_1, …, A_m, ¬B_1, …, ¬B_n}    or, in Prolog style,    A_1, …, A_m :- B_1, …, B_n
where the A_i and B_j are atoms. It corresponds to the implication “If all B_j’s are true, then at least one A_i is true”, where {A_i} are the conclusions and {B_j} are the premises. The clausal language is the set of all clauses expressed in a given alphabet of constant, variable, function, and predicate symbols. It has the same expressive power as general FOL, since each FOL formula can be expressed as a set of clauses by casting it in conjunctive normal form and eliminating existential quantifiers by transforming them into functions of the universally quantified ones (skolemization).
A substitution is a finite mapping from variables to terms, represented as a set of bindings θ = {t_1/x_1, …, t_n/x_n}, where all variables x_i are distinct and ∀i = 1, …, n: x_i ≠ t_i. The empty substitution ε = {} contains no bindings. A substitution is ground if all t_i are ground; it is a renaming if all terms t_i are distinct variables. Applying substitution θ to a formula E (denoted by Eθ) consists in replacing in E each occurrence of x_i by t_i. The resulting formula is an instance of E. If θ is a renaming, Eθ is called a variant of E. A generality ordering can be defined upon substitutions: a substitution θ is more general than another η if ∃γ substitution such that η = θγ (where θγ means applying γ to the result obtained by applying θ). Two formulas E, F are unifiable if ∃θ substitution such that Eθ = Fθ. Such a θ is a unifier of E, F, and Eθ is their unification. θ is a most general unifier (MGU) of E, F, denoted mgu(E, F), if, for any other unifier σ of E, F, θ is more general than σ. Unification is decidable in FOL (not in higher-order logics), where the MGU is also unique (up to renamings).
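As an illustration of the definitions above, the following sketch computes an MGU by syntactic unification. The representation is an assumption of this example, not taken from the paper: variables are strings starting with an uppercase letter, constants are lowercase strings, and a compound term f(t_1, …, t_n) is the tuple ('f', t_1, …, t_n).

```python
# Illustrative sketch of syntactic unification computing a most general
# unifier (MGU). Assumed representation: variables = uppercase strings,
# constants = lowercase strings, compound terms = (functor, arg1, ..., argn).

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, subst):
    # Follow variable bindings until a non-variable or an unbound variable.
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):
    # Occur check: does variable v occur inside term t (under subst)?
    t = walk(t, subst)
    if t == v:
        return True
    if isinstance(t, tuple):
        return any(occurs(v, a, subst) for a in t[1:])
    return False

def unify(s, t, subst=None):
    # Returns an MGU (a dict of bindings) extending subst, or None on failure.
    if subst is None:
        subst = {}
    s, t = walk(s, subst), walk(t, subst)
    if s == t:
        return subst
    if is_var(s):
        if occurs(s, t, subst):
            return None              # e.g., X and f(X) do not unify
        return {**subst, s: t}
    if is_var(t):
        return unify(t, s, subst)
    if isinstance(s, tuple) and isinstance(t, tuple) \
            and s[0] == t[0] and len(s) == len(t):
        for a, b in zip(s[1:], t[1:]):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None                      # clash of functors or constants
```

For instance, unify(('p', 'X', 'a'), ('p', 'b', 'Y')) yields the MGU {'X': 'b', 'Y': 'a'}, while unifying X with f(X) fails because of the occur check.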
Logic formulas can be associated to a semantics, i.e., a meaning that is intuitive and that can mirror human experience. The classical declarative semantics used in logics is based on the following notions. Given a theory T,
  • the Herbrand Universe of T, U_T, is the set of all ground terms that can be expressed in the language;
  • the Herbrand Base of T, B_T, is the set of all ground atoms that can be expressed in the language;
  • a Herbrand Interpretation for T associates each atom in the Herbrand Base to a truth value (true or false); it can be seen as defining the set of true atoms in B_T.
A Herbrand interpretation is a model for T if it makes all formulas in T true. If a theory has at least one model, it is satisfiable, otherwise it is unsatisfiable. The intersection of Herbrand models is a Herbrand model, and the intersection of all Herbrand models of T is the least Herbrand model, denoted M T .
Some ordering relationship ≤ can be applied to clauses to define their generality, where C ≤ D means “C is more general than D” (or “D is more specific than C”); C < D means that C ≤ D and not D ≤ C (C is strictly more general than D, D is strictly more specific than C); and C ∼ D means that C ≤ D and D ≤ C (C and D are equivalent).
The classical generality ordering used in logic is implication, according to which a formula is more general than another if, whenever the former is true, the latter is, too:
Definition 1 
(Implication). Given C and D formulas, C implies D (C ⇒ D, or D ≤_I C) iff any model for C is a model for D (C ⊨ D). Equivalence is denoted by ⇔.
Implication is a quasi-ordering; not being anti-symmetric, two equivalent clauses under implication might not be variants. Since it is undecidable (given C and D clauses, there exists no procedure to tell whether C ⇒ D) [12], various attempts to obtain weaker but computable orderings took place, the most successful of which is θ-subsumption.
Definition 2 
(θ-subsumption). Given C and D clauses, C θ-subsumes D (C ≤_θ D) iff ∃θ substitution such that Cθ ⊆ D. Equivalence is denoted by ∼_θ.
θ-subsumption is a quasi-ordering upon clauses, strictly weaker than implication (if C ≤_θ D then C ≤_I D, but not vice versa). Since it is not anti-symmetric, two clauses can be equivalent without being variants. θ-subsumption is decidable, but NP-complete [13].
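Since θ-subsumption reduces to searching for a substitution θ with Cθ ⊆ D, it can be tested by backtracking. The sketch below is a simplification assuming function-free (Datalog-style) clauses; the representation is an assumption of this example: a literal is a tuple (predicate, arg_1, …, arg_n), variables are uppercase strings, constants lowercase strings.

```python
# Illustrative θ-subsumption test by backtracking: C θ-subsumes D iff there
# is a substitution θ with Cθ ⊆ D. Function-free literals assumed, encoded
# as tuples (predicate, arg1, ..., argn); variables = uppercase strings.

def match(lit_c, lit_d, theta):
    # Try to extend theta so that lit_c instantiated by theta equals lit_d.
    if lit_c[0] != lit_d[0] or len(lit_c) != len(lit_d):
        return None
    theta = dict(theta)
    for a, b in zip(lit_c[1:], lit_d[1:]):
        if a[:1].isupper():                  # a is a variable
            if theta.get(a, b) != b:
                return None                  # already bound to a different term
            theta[a] = b
        elif a != b:
            return None                      # constant clash
    return theta

def theta_subsumes(C, D):
    # Depth-first search mapping every literal of C onto some literal of D
    # (literals of D may be reused, as in Cθ ⊆ D).
    def search(rest, theta):
        if not rest:
            return True
        for lit_d in D:
            t = match(rest[0], lit_d, theta)
            if t is not None and search(rest[1:], t):
                return True
        return False
    return search(list(C), {})
```

The backtracking search has exponential worst-case cost, consistent with the NP-completeness of θ-subsumption mentioned above.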
A calculus allows reasoning on knowledge represented in logics using operations called inference rules, each being a tautology involving a finite number of premises and a consequence: if all premises are true, the consequence is true. A derivation, denoted by ⊢, is a sequence of applications of inference rules. An inference rule is sound if it allows deriving only logical consequences of the premises; it is complete if it allows deriving all logical consequences of a set of formulas. A very simple, but very powerful, inference rule is resolution:
Definition 3 
(Resolution). Let C = C′ ∨ l_C and D = D′ ∨ l̄_D be two clauses, with l_C, l_D literals such that θ = mgu(l_C, l_D). A resolvent of C and D, where the parents C and D were resolved upon l_C and l_D, is the clause R = R(C, D) = (C′ ∨ D′)θ.
R is sound and is a closed operation for clauses (the resolvent of two clauses is again a clause). Together with a finite set of clauses (called axioms) T, it defines a clausal theory. The resolution closure of a set of clauses T, denoted by R*(T), is defined inductively:
  • R^0(T) = T;
  • R^n(T) = R^{n−1}(T) ∪ {R = R(C, D) | C, D ∈ R^{n−1}(T)}  (n > 0);
  • R*(T) = R^0(T) ∪ R^1(T) ∪ … ∪ R^n(T) ∪ …
In a clausal theory, for any clause C, T ⊢ C iff C ∈ R*(T) [14,15]. R is incomplete by deduction because it cannot instantiate variables or add literals (which subsumption does). There exists a deduction of C from T (T ⊢_d C) if C is a tautology, or ∃D such that T ⊢ D and Dθ ⊆ C. Derivation of the empty clause □ from T, called a refutation, happens iff T is unsatisfiable (i.e., contradictory). This is the basis of the ‘proof by refutation’ strategy: if the negation of the thesis, along with the axioms, derives the empty clause, then the theorem is true.
Theorem 1 
(Subsumption theorem [16,17]). Let T be a set of clauses and C a clause. If T ⊨ C, then T ⊢_d C.
Corollary 1 
(Refutation completeness [16]). If T is an unsatisfiable set of clauses, then T ⊢ □.
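As a small illustration of proof by refutation, the following sketch saturates a clause set under the resolution rule in the propositional case (where unification reduces to equality of literals and the closure is finite) and checks whether the empty clause □ is derived. The encoding is an assumption of this example: a clause is a frozenset of literal strings, with the prefix '~' for negation.

```python
# Propositional illustration of proof by refutation: saturate a clause set
# under resolution and test whether the empty clause is derived.
# Assumed encoding: clause = frozenset of literals, '~p' = negation of 'p'.

def negate(l):
    return l[1:] if l.startswith('~') else '~' + l

def resolvents(C, D):
    # All resolvents of C and D, resolving upon each complementary pair.
    return {frozenset((C - {l}) | (D - {negate(l)}))
            for l in C if negate(l) in D}

def refutes(clauses):
    # True iff the empty clause belongs to the resolution closure.
    W = set(clauses)
    while True:
        new = {r for C in W for D in W for r in resolvents(C, D)} - W
        if frozenset() in new | W:
            return True
        if not new:
            return False             # closure reached without deriving □
        W |= new
```

For instance, to prove p from the axioms q and p :- q, the clause set {q}, {~q, p} together with the negated thesis {~p} derives the empty clause.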
We focus on linear resolution (L), where a new resolution step in the derivation must involve the result of the latest resolution step. Albeit a restriction of R, a subsumption theorem holds for it as a consequence of its completeness with respect to refutation [9].

2.1. Logic Programming

A particular restriction of clausal theories is based on Horn clauses that include at most one positive literal. In a Horn clause, the premises (if any) are the body; the conclusion (if any) is the head. A definite program clause has both head and body; a goal clause has no head; a unit clause has no body. Horn clauses are interesting since, once the premises are verified, the true consequence is univocally determined. A clause is
  • domain restricted iff all variables occurring in its body also occur in its head;
  • range restricted iff all variables occurring in its head also occur in its body;
  • linked if all of its literals are; a literal is linked if at least one of its arguments is; an argument of a literal is linked if the literal is the head of the clause or if another argument of the same literal is linked [18].
Thus, a domain-restricted clause is also linked.
Example 1. 
C = {P(x), ¬Q(x,y), ¬Q(y,z)} is linked; D = C ∖ {¬Q(x,y)} and F = C ∪ {¬R(v,w)} are not linked.
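The linkedness condition can be checked by propagation from the head: the sketch below (using an illustrative tuple encoding for literals, with uppercase strings as variables, which is an assumption of this example) repeatedly marks body literals that share a variable with an already linked literal, until a fixpoint is reached.

```python
# Illustrative linkedness test: starting from the variables in the head,
# mark body literals sharing a variable with the linked set until no change.
# Assumed encoding: literal = (predicate, arg1, ...); variables uppercase.

def variables(lit):
    return {a for a in lit[1:] if a[:1].isupper()}

def is_linked(head, body):
    linked_vars = variables(head)
    remaining = list(body)
    changed = True
    while changed and remaining:
        changed = False
        for lit in list(remaining):
            if variables(lit) & linked_vars:   # shares a linked variable
                linked_vars |= variables(lit)
                remaining.remove(lit)
                changed = True
    return not remaining                        # every body literal is linked
```

On Example 1, the clause C = {P(x), ¬Q(x,y), ¬Q(y,z)} is recognized as linked, while removing ¬Q(x,y) or adding ¬R(v,w) breaks linkedness.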
Thanks to the procedural interpretation of Horn clauses [19], they can be used as a programming language or as a DB manipulation language based on the correspondences in Table 1. A logic program is a set of definite Horn clauses, interpreted as their conjunction. The clauses in a logic program can be divided into facts (ground unit clauses representing true claims) and rules (definite clauses allowing to deduce true claims from other true claims).
General logic programming [1] admits negated literals in the body (it borrows the terminology from LP, adding the adjective ‘general’).
Execution of logic programs works by refutation using a special case of resolution, SLD resolution (Selection rule-driven Linear resolution for Definite clauses): given a logic program P and a goal clause N = :- l_1, …, l_n, representing the negation ¬(l_1 ∧ … ∧ l_n) of the conjunction to be proved, the refutation starts the sequential execution of procedures l_1, …, l_n. If θ_0, …, θ_m is the sequence of substitutions performed during the refutation, the refutation can be seen as a proof of the formula (l_1 ∧ … ∧ l_n)θ_0⋯θ_m, and the bindings associated to the variables of N in the composed substitution θ = θ_0⋯θ_m, called the computed answer substitution for P ∪ {N}, are the output of the program.
The classical declarative semantics obviously holds also for logic programs. Interestingly, the least Herbrand model M_P of a definite program P is exactly the set of all its logical consequences, so one may always work on M_P without loss of generality:
M_P = {a ∈ B_P | a is a logical consequence of P}
Other approaches to semantics are specific to logic programs. Denotational semantics [1] relies on the fact that the set 2^{B_P} of all Herbrand interpretations of a definite program P is a complete lattice with respect to the partial ordering defined by set inclusion ⊆. The immediate consequence operator T_P : 2^{B_P} → 2^{B_P} such that, given a Herbrand interpretation I,
T_P(I) = {a ∈ B_P | a :- a_1, …, a_n is a ground instance of a clause in P and {a_1, …, a_n} ⊆ I}
is clearly a monotonic mapping, with least fixpoint T_P↑ω. Procedural semantics focuses on the success set SS_P of a definite program P, i.e., the set of all the ground atoms a ∈ B_P such that there exists an SLD-refutation for P ∪ {:- a}. All these approaches are equivalent:
M_P = T_P↑ω = SS_P

2.2. Datalog

Datalog (acronym for Database Logic) [20,21] is an approach to relational DBs based on LP. Syntactically, it is the function-free fragment of General LP [1]. Allowing only variables or constants as terms, it ensures that all elements in n-tuples are atomic (and hence the resulting DB is in First Normal Form). To prevent the derivation of infinitely many facts from a Datalog program P, two safety conditions must be imposed: all facts in P must be ground, and all rules in P must be range restricted. While in General LP all the knowledge is expressed in the program, Datalog splits the knowledge into two sets: the Extensional DB (EDB) contains all facts, stored in a relational DB; the Intensional DB (IDB) contains the rules, i.e., the very program P that defines the views. Accordingly, predicates are also distinguished into extensional ones, if they are in the EDB, and intensional ones, if they are defined by the heads of rules in P. A Datalog program can be viewed as a query on the EDB.
Some extensions of Datalog may use built-in predicates, expressed by special symbols (such as <, >, ≤, ≥, =, ≠), that have a pre-defined meaning. Although they can formally be regarded as EDB predicates, they are implemented as procedures to be evaluated at runtime. Since they often correspond to infinite relations, further safety conditions are required: any variable occurring as an argument of a built-in predicate in the head of a rule must occur also as an argument of an ordinary predicate in the body of the same rule, or be bound by a (chain of) equality predicate(s) to a variable occurring in an ordinary predicate or to a constant. Adding the inequality built-in ≠ yields the Datalog≠ extension.
Interestingly, Datalog has the same expressive power as unconstrained clausal logic, thanks to a representational change called flattening [22] that transforms each n-ary function symbol f into a fresh (n+1)-ary predicate f_p whose first n arguments are the function’s arguments and whose last argument is the function result, defined as
f_p(t_1, …, t_n, X) :- X = f(t_1, …, t_n)
A clause is flattened by replacing in it each occurrence of a term f(t_1, …, t_n) with a fresh variable X, and adding to its body the literal f_p(t_1, …, t_n, X). All occurrences of the same term are replaced by the same variable. Flattening must be applied recursively to terms including nested functions, up to the complete elimination of function symbols. An inverse procedure, called unflattening, goes back from a flattened program to the original one. Given a clause (or logic program) C, flat(C) and unflat(C) denote its flattened and unflattened versions, respectively. Flat_Defs(P) denotes the set of all definitions of flattening predicates in a program P.
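A possible implementation of flattening is sketched below, under an illustrative representation that is an assumption of this example (compound terms as tuples (functor, arg_1, …, arg_n), variables as uppercase strings): each compound term is replaced innermost-first by a fresh variable, adding the corresponding f_p literal to the body; the fresh-variable names V1, V2, … are also an assumption of this sketch.

```python
# Illustrative sketch of flattening: every occurrence of a compound term
# f(t1, ..., tn) becomes a fresh variable X plus a body literal
# f_p(t1, ..., tn, X); the same term is always mapped to the same variable,
# and nested functions are eliminated innermost-first.

def flatten_term(t, extra_body, cache, counter):
    # Returns a function-free term; flattening literals accumulate in extra_body.
    if not isinstance(t, tuple):
        return t                                   # variable or constant
    args = tuple(flatten_term(a, extra_body, cache, counter) for a in t[1:])
    key = (t[0], args)
    if key not in cache:                           # same term -> same variable
        counter[0] += 1
        fresh = f"V{counter[0]}"
        cache[key] = fresh
        extra_body.append((t[0] + "_p",) + args + (fresh,))
    return cache[key]

def flatten_clause(head, body):
    extra, cache, counter = [], {}, [0]
    new_head = (head[0],) + tuple(
        flatten_term(a, extra, cache, counter) for a in head[1:])
    new_body = [(l[0],) + tuple(
        flatten_term(a, extra, cache, counter) for a in l[1:]) for l in body]
    return new_head, new_body + extra
```

For instance, flattening the unit clause member(X, cons(X, T)) yields member(X, V1) :- cons_p(X, T, V1).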
Theorem 2. 
P ⊨ A iff flat(P) ∪ Flat_Defs(P) ⊨ A.
Corollary 2. 
Given two definite clauses C and C′,
1. if C is a clause with function symbols, unflat(flat(C)) = C;
2. if C is a flat clause, flat(unflat(C)) = C;
3. C ≤_θ C′ iff flat(C) ≤_θ flat(C′).
Datalog performs inference using unification. The elementary production principle (EPP) states that, given a rule R = l_0 :- l_1, …, l_n and a set of ground facts f_1, …, f_n, if ∃θ substitution such that ∀i = 1, …, n: l_iθ = f_i, then from R and f_1, …, f_n it is possible to infer in one step the fact l_0θ. Then, a ground fact f can be deduced from a set of clauses S (in symbols S ⊢ f) if f ∈ S or if it can be deduced by EPP from a rule R ∈ S and from ground facts previously deduced. Such an inference is called a proof of f from S.
In regard to semantics, deduction turns out to be sound and complete:
Theorem 3. 
If S is a set of Datalog clauses and f is a ground fact, S ⊢ f ⟺ S ⊨ f.
The set cons ( S ) of all facts that are logic consequences of a finite set of clauses S can be computed using Algorithm 1. Since S is finite, it contains a finite set of predicates and constants. EPP does not introduce new ones. The arity of facts derived is bounded by the greatest arity of heads of clauses in S. Thus, the algorithm terminates and the computed set cons ( S ) is finite.
Algorithm 1: Algorithm for computing the set of logical consequences of a Datalog program.
function Infer (S: finite set of Datalog clauses) : cons(S);
begin
  W := S;
  while EPP can be applied to some rule and facts of W and some fact F ∉ W is produced do
    W := W ∪ {F};
  return (W ∩ HB)  /∗ all facts in W, not rules ∗/
end.
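Algorithm 1 can be made concrete as a naive bottom-up evaluation, where EPP is implemented by matching a rule body against the current fact set in all possible ways. The representation is an assumption of this sketch: a fact is a ground tuple (predicate, arg_1, …, arg_n), a rule is a pair (head, body), and variables are uppercase strings.

```python
# Runnable sketch of Algorithm 1 (naive bottom-up Datalog evaluation):
# repeatedly apply the elementary production principle until fixpoint.
# Assumed encoding: fact = ground tuple (predicate, args...), rule = (head,
# body) with body a list of atoms; variables = uppercase strings.

def match_atom(atom, fact, theta):
    # Extend theta so that atom instantiated by theta equals fact, or None.
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    theta = dict(theta)
    for a, b in zip(atom[1:], fact[1:]):
        if a[:1].isupper():
            if theta.get(a, b) != b:
                return None
            theta[a] = b
        elif a != b:
            return None
    return theta

def matches(body, facts, theta):
    # All substitutions grounding the whole body against the fact set.
    if not body:
        yield theta
        return
    for f in facts:
        t = match_atom(body[0], f, theta)
        if t is not None:
            yield from matches(body[1:], facts, t)

def infer(facts, rules):
    W = set(facts)
    while True:
        new = set()
        for head, body in rules:
            for theta in matches(body, W, {}):
                fact = (head[0],) + tuple(theta.get(a, a) for a in head[1:])
                if fact not in W:
                    new.add(fact)       # one EPP step
        if not new:
            return W                    # fixpoint reached: W = cons(S)
        W |= new
```

For example, with parent facts and the usual two ancestor rules, the fixpoint contains all transitive ancestor facts; termination is guaranteed for safe programs, since no new constants or predicates are ever introduced.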
    Applying model theory, cons(S) coincides with the least Herbrand model of S, and hence with the least fixpoint of the immediate consequence operator T_S of S:
cons(S) = {f ∈ HB | S ⊨ f} = ∩{I | I is a Herbrand model of S} = M_S = T_S↑ω
In pure Datalog, negation cannot be used. Negative facts can be deduced using the Closed World Assumption (CWA): “If a fact does not logically follow from a set of Datalog clauses, then assume its negation to be true”. Datalog¬, allowing negated literals in clause body, can infer additional information from these facts at the cost of a further safety condition: each variable occurring in a negated literal must also occur in a positive literal of the body.
In regard to semantics, we introduce the following definitions, where f̄ denotes the positive counterpart of a negative fact f:
Definition 4. 
Let I be a Herbrand interpretation, f a ground fact, and R = l_0 :- l_1, …, l_n a rule.
f is satisfied in I iff f is a positive fact and f ∈ I, or f is a negative fact and f̄ ∉ I.
R is satisfied in I iff, for all ground substitutions θ for R, we have that if ∀i = 1, …, n: l_iθ is satisfied in I, then l_0θ is satisfied in I (i.e., l_0θ ∈ I, since l_0 is positive).
Declarative semantics can still be applied, but the existence of one least model is not guaranteed: e.g., the clause h :- ¬b (equivalent to h ∨ ¬(¬b), i.e., h ∨ b) has two incomparable minimal models, {h} and {b}. Different strategies are used to select one reference Herbrand model when evaluating a program P on an EDB E, considering that the evaluation of a literal in a rule may cause the evaluation of other rules that in turn might contain negated literals, and so on. Stratified Datalog¬ partitions P into n sets P_i (called layers) such that
  • all the rules that define the same IDB predicate in P are in the same layer;
  • P_1 contains only clauses without negated literals or whose negated literals correspond to EDB predicates;
  • each P_i contains only clauses whose negated literals are completely defined in lower layers (i.e., layers P_j with j < i).
Such a partition is called a stratification, and P is stratified. A stratified program is evaluated by growing layers, applying to each one the CWA locally to the EDB made up of the original EDB and of all predicates obtained by the evaluation of the previous layers. The evaluation of a stratified program produces a minimal Herbrand model, called the perfect model, that can be characterized in a purely semantic way. In the example, first b is evaluated, obtaining an empty answer set (since no fact of the form b is derivable); then, by CWA, we obtain ¬b, hence (by the rule) h, which yields the perfect Herbrand model. While the stratification for a program is not unique, all stratifications are equivalent in regard to the evaluation result. Moreover, not all programs are stratified. A Datalog¬ program P is stratified iff its extended dependence graph EDG(P) contains no cycles involving edges marked with ¬, where EDG(P) is a directed graph whose nodes represent the IDB predicates of P, there is an edge ⟨p, q⟩ if q occurs in the body of a rule defining p, and the edge is labeled with ¬ if there exists at least one rule with p in the head and in whose body q occurs negated.
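The stratification test can be sketched as follows: build the edge set of the extended dependence graph and, for every edge labeled ¬, check whether its source is reachable from its target, which would close a cycle through a negated edge. The rule encoding (a head predicate plus a list of (body predicate, negated?) pairs) is an assumption of this example.

```python
# Illustrative stratification test via the extended dependence graph:
# P is stratified iff no cycle traverses a ¬-labeled edge. Assumed rule
# encoding: (head_pred, [(body_pred, negated?), ...]).

def edg(rules):
    # Edge set of EDG(P): (p, q, neg) if q occurs (negated iff neg) in the
    # body of a rule defining p.
    edges = set()
    for head, body in rules:
        for pred, neg in body:
            edges.add((head, pred, neg))
    return edges

def reachable(edges, start, goal):
    # Depth-first reachability over the (unlabeled) edges.
    seen, stack = set(), [start]
    while stack:
        n = stack.pop()
        if n == goal:
            return True
        if n in seen:
            continue
        seen.add(n)
        stack.extend(q for p, q, _ in edges if p == n)
    return False

def is_stratified(rules):
    # A cycle through an edge p -¬-> q exists iff p is reachable from q.
    edges = edg(rules)
    return not any(neg and reachable(edges, q, p)
                   for p, q, neg in edges)
```

For instance, the classical program win(X) :- move(X,Y), ¬win(Y) is rejected (negative cycle on win), while h :- ¬b with b an EDB predicate is accepted.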
Another approach, applicable to all Datalog¬ programs, is the inflationary evaluation. It is applied iteratively: each step processes in parallel all rules in P, applying them to both the EDB and the facts computed so far, and then “temporarily” applies the CWA to evaluate the body of the rules (assuming that the negation of all facts not yet derived is true). It corresponds to the computation of a least fixpoint and returns a (generally non-minimal) Herbrand model. It also allows specifying goals. Inflationary Datalog¬ is strictly more expressive than stratified Datalog¬, which in turn is strictly more expressive than pure Datalog.

2.3. Inductive Logic Programming

A theory can be viewed as a set of conditions that are necessary and sufficient to account for some observations (knowledge that is perceived from the world) in a given environment, useful to explain past events or to predict future situations in that environment. Inductive learning aims at extracting a model of a concept, given a set of examples (tags concerning the concepts to be learned associated to observations) for that concept. Examples can be positive, representing instances of the concept, or negative, representing instances that do not belong to the concept. ILP joins the objectives of Inductive Learning and LP: it aims at learning (inducing) concept descriptions as logic programs or integrity constraints valid in a given database, starting from a set of literals expressing (positive and negative) examples of the concepts to be learned. We use direct relevance [23] in the form of a Horn clause with the example in the head and the relevant observation in the body (e.g., uncle(sam,bob) :- parent(tim,bob), brother(sam,tim) is a positive example for concept uncle/2; a negative example for uncle/2 is ¬uncle(mary,bob) :- parent(tim,bob), female(mary)).
In indirect relevance [24], the observation that is relevant to an example must be obtained from the available knowledge. The examples above would be represented as just uncle(sam,bob) and ¬uncle(mary,bob).
Definition 5 
(Inductive Learning paradigm). A theory T is a set of hypotheses. A hypothesis H is a set of program clauses with the same predicate in their head, i.e., defining the same concept. An example for a hypothesis H has the same predicate in the head as H. It is negative for H if negated, positive otherwise.
Given:
  • a set of examples E = E⁺ ∪ E⁻, where E⁺ are the positive ones and E⁻ are the negative ones;
  • a (possibly empty) background knowledge (or BK) B.
Find a theory (logic program, model) T such that (knowing whether a solution exists is the Finite Axiomatizability problem (FA-problem): finding a set of axioms that explains a potentially infinite set of observations):
  • T ∪ B ⊨ E⁺                                                     (completeness or sufficiency)
  • T ∪ B ⊭ E⁻                               ((prior or strong) consistency) (If the calculus were complete, we could write T ∪ B ⊬ E⁻.)
Moreover, the following properties are fulfilled:
  • B ⊭ E⁺                                                         ((prior) necessity)
  • B ⊭ E⁻                                                      (prior consistency)
  • T ∪ B ⊭ □                                                        (weak consistency).
A theory that explains a negative example is inconsistent (or too strong, i.e., too general); it makes a commission error and needs to be specialized. A theory that does not explain a positive example is incomplete (or too weak, i.e., too specific); it makes an omission error and needs to be generalized. In both cases, the theory is incorrect.
Since the examples are the only source of knowledge available for learning, the learning process has to be progressive. Concept definitions inferred starting from limited observations cannot be guaranteed to be universally correct: new observations can reveal that the current formulation of a concept is wrong. Most research in ILP focuses on batch approaches that take all the available examples and return a theory that explains them. When the theory fails on new evidence, the whole process must be restarted from scratch, taking no advantage of the previous theory. On the other hand, incremental approaches can adjust the existing theory without completely rejecting it, but taking the previously generated hypotheses as the starting point of a search process whose goal is a new theory that explains both old and new observations. Incremental synthesis of theories is necessary in several cases, like changing world, sloppy modeling, and selection bias [25]. The two approaches have complementary advantages and drawbacks: the former usually yields more compact and elegant theories, but wastes computational resources for recomputing the theory by starting from scratch; in the latter, the system can just revise an existing hypothesis on the ground of new evidence, but the resulting theory might not be very elegant. So, a cooperation between the two is desirable.
The incremental approach is more complex, since it involves two steps: an abstract diagnosis to detect the source of incorrectness in the theory, and a debugging of the theory in order to fix it and restore correctness. Note that, for commission errors, the diagnosis can detect a single inconsistent clause, while for omission errors it can only detect the hypothesis that does not account for the example.
A generality ordering ≤ on formulas can be interpreted from the induction perspective:
  • C ≤ D: C is a generalization of D, D is a specialization of C;
  • C < D: C is a proper generalization of D, D is a proper specialization of C;
or, from a theory refinement viewpoint:
  • C ≤ D: C is an upward refinement of D, D is a downward refinement of C;
  • C < D: C is a proper upward refinement of D, D is a proper downward refinement of C.
As to the debugging process, commission errors can be solved by a downward refinement operator; dually, upward refinement operators can cope with omission errors:
Definition 6 
(Refinement operators). Let (S, ≤) be a quasi-ordered set.
A downward refinement operator is a mapping ρ such that ∀C ∈ S: ρ(C) ⊆ {D ∈ S | C ≤ D}, i.e., it computes a subset of all the specializations of C.
An upward refinement operator is a mapping δ such that ∀C ∈ S: δ(C) ⊆ {D ∈ S | D ≤ C}, i.e., it computes a subset of all the generalizations of C.
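As an illustration of Definition 6, a downward refinement operator for function-free clauses can be sketched in Python. This is a minimal sketch under assumptions not made in the text: clauses are flat lists of predicate tuples, variables are capitalized strings, and the three kinds of specialization (grounding a variable, unifying two variables, adding a literal) are the classical ones from the ILP literature.

```python
from itertools import combinations

def is_var(t):
    """Convention (an assumption of this sketch): variables are capitalized strings."""
    return isinstance(t, str) and t[:1].isupper()

def clause_vars(clause):
    return sorted({a for lit in clause for a in lit[1:] if is_var(a)})

def substitute(clause, s):
    return [tuple([lit[0]] + [s.get(a, a) for a in lit[1:]]) for lit in clause]

def rho(clause, constants, candidate_literals):
    """Downward refinement operator: every clause returned is a
    specialization of (is subsumed by) the input clause."""
    out = []
    vs = clause_vars(clause)
    for v in vs:                          # (i) bind a variable to a constant
        for c in constants:
            out.append(substitute(clause, {v: c}))
    for v, w in combinations(vs, 2):      # (ii) unify two distinct variables
        out.append(substitute(clause, {w: v}))
    for lit in candidate_literals:        # (iii) add a literal to the clause
        if lit not in clause:
            out.append(clause + [lit])
    return out

C = [('p', 'X'), ('q', 'X', 'Y')]         # p(X) :- q(X,Y), as a flat literal list
refinements = rho(C, ['a'], [('r', 'Y')])
```

Note that, under the OI assumption introduced in Section 3, the variable-unification case (ii) disappears, since distinct variables must denote distinct objects.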
The debugging process aims at finding a minimal refinement of a theory [26]. In the deductive process, the fundamental operation is unification, which finds a common instance of two clauses (if any exists). Its dual, which plays an analogous role in induction, is generalization, which finds (if it exists) a clause of which two given clauses are both instances. So, just as one looks for the most general unifier in order not to miss deduction steps, for the same reason one looks for the least general generalization (lgg) in induction as the only correct generalization (if the lgg covers a negative example, then no more specific generalization can avoid covering it).
Definition 7 
(Least general generalization under θ-subsumption).
A clause C is a least general generalization under θ-subsumption (lgg) of two clauses C₁ and C₂ iff it is more specific than (or equivalent to) any other generalization of C₁ and C₂. Formally,
lgg(C₁, C₂) = {C | C ≤_θ Cᵢ, i = 1, 2, and ∀D s.t. D ≤_θ Cᵢ, i = 1, 2: D ≤_θ C}
This definition can be extended to any finite set of clauses S: the space of clauses ordered by θ-subsumption being a lattice [27], lgg(S) exists and is unique. The empty clause, being a subset of any other clause, subsumes them all, but (obviously) is not equivalent to any of them. The algorithm for computing the lgg of any two clauses (not necessarily Horn ones) was given by Plotkin [27]. An lgg of a set S of n clauses, whose maximum length is m, has length at most mⁿ.
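Plotkin's lgg computation can be sketched for the function-free (Datalog) case; the key point is the table that maps each pair of mismatching terms to one shared new variable. This is a simplified sketch (nested function symbols, which Plotkin's full algorithm also handles, are omitted; the clause representation is the hypothetical one used throughout these examples):

```python
from itertools import count

def lgg_terms(t1, t2, table, fresh):
    """Anti-unify two terms: equal terms stay; each distinct pair (t1, t2)
    is replaced by the same new variable wherever it occurs."""
    if t1 == t2:
        return t1
    if (t1, t2) not in table:
        table[(t1, t2)] = 'V%d' % next(fresh)
    return table[(t1, t2)]

def lgg_atoms(a1, a2, table, fresh):
    if a1[0] != a2[0] or len(a1) != len(a2):
        return None        # different predicate or arity: no common generalization
    return tuple([a1[0]] + [lgg_terms(t, s, table, fresh)
                            for t, s in zip(a1[1:], a2[1:])])

def lgg_clauses(c1, c2):
    """Plotkin's lgg of two clauses: anti-unify every compatible pair of
    literals, sharing one term table across the whole clause."""
    table, fresh, gen = {}, count(), []
    for a1 in c1:
        for a2 in c2:
            g = lgg_atoms(a1, a2, table, fresh)
            if g is not None and g not in gen:
                gen.append(g)
    return gen

C1 = [('p', 'a'), ('q', 'a', 'b')]
C2 = [('p', 'c'), ('q', 'c', 'd')]
print(lgg_clauses(C1, C2))   # [('p', 'V0'), ('q', 'V0', 'V1')]
```

The double loop over literal pairs also shows where the mⁿ bound comes from: the lgg of two clauses has at most |C₁| · |C₂| literals, so generalizing n clauses multiplies the sizes.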
Definition 8 
(Ideality). Let (S, ≤) be a quasi-ordered set, ρ a downward refinement operator, and δ an upward refinement operator.
  • ρ (respectively, δ) is locally finite iff ∀C ∈ S: ρ(C) (respectively, δ(C)) is finite and computable.
  • ρ is proper iff ∀C ∈ S: ρ(C) ⊆ {D ∈ S | C < D};
    δ is proper iff ∀C ∈ S: δ(C) ⊆ {D ∈ S | D < C}.
  • ρ is complete iff ∀C, D ∈ S, C < D: ∃E such that E ∈ ρ*(C) and E ∼ D;
    δ is complete iff ∀C, D ∈ S, D < C: ∃E such that E ∈ δ*(C) and E ∼ D.
  • ρ (respectively, δ) is ideal iff it fulfills all three properties above.
Ideality plays a key role for efficiency and effectiveness. Unfortunately, there exist no ideal refinement operators when using full Horn clause logic as the representation language and either θ-subsumption or implication as the generalization model, because of the existence of infinite unbounded strictly ascending/descending chains [28,29,30].

3. Object Identity

Object identity (OI) was first defined in [31,32] for a FOL that is both function-free and constant-free, and later generalized to a full FOL [33,34]. It is based on the following assumption:
Within a clause, terms denoted by different symbols must be distinct.
In Datalog, the adoption of the OI assumption can be viewed as a method for building an equational theory into both the ordering and the inference rules of the calculus [35] by adding one rewrite rule to the axioms of Clark’s Equality Theory (CET) [1]:
t ≠ s ∈ body(C),   for every clause C in L and every pair (t, s) of distinct terms in terms(C)   (OI)
where L denotes the language of all the Datalog clauses built from a finite number of predicates. It can be viewed as an extension of both Reiter’s unique-names assumption [36] and Axioms (7), (8), (9) of CET to the variables of the language. The resulting language, DatalogOI (a sublanguage of Datalog), is an instance of Constraint LP (which extends LP by allowing constraints in the body of clauses) [37]. Under OI, any Datalog clause C = l₀ :- l₁, …, lₙ generates a DatalogOI clause C_OI = l₀ :- l₁, …, lₙ | I consisting of two components:
  • core(C_OI) = C;
  • constraints(C_OI) = I = {t ≠ s | t, s ∈ terms(C), t and s distinct}.
Example 2. 
The Datalog clause C = p(X) :- q(X, X), q(Y, a), where X, Y are variables and a is a constant, corresponds to the DatalogOI clause C_OI = p(X) :- q(X, X), q(Y, a) | [X ≠ Y], [X ≠ a], [Y ≠ a].
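The construction of constraints(C_OI) is purely mechanical. A small Python sketch (same hypothetical clause representation: predicate tuples, capitalized variables) reproduces Example 2:

```python
from itertools import combinations

def terms_of(clause):
    """Distinct terms of a clause, in order of first occurrence."""
    seen = []
    for lit in clause:
        for t in lit[1:]:
            if t not in seen:
                seen.append(t)
    return seen

def oi_constraints(clause):
    """constraints(C_OI): one inequation t != s per pair of distinct terms of C."""
    return [(t, s) for t, s in combinations(terms_of(clause), 2)]

C = [('p', 'X'), ('q', 'X', 'X'), ('q', 'Y', 'a')]   # p(X) :- q(X,X), q(Y,a)
print(oi_constraints(C))   # [('X', 'Y'), ('X', 'a'), ('Y', 'a')]
```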
Nevertheless, DatalogOI has the same expressive power as Datalog, since any Datalog clause/program admits a set of DatalogOI clauses equivalent to it:
Proposition 1 
([38]). ∀C ∈ Datalog, ∃C′ ⊆ DatalogOI : T_C^ω = T_C′^ω
Proof. 
See Algorithm 2.    □
Algorithm 2: Algorithm to compute the OI equivalent of a Datalog clause
function GenerateDatalogOIClauses (C: DatalogClause): set of DatalogOIClauses
begin
C′ := {};
for k = 0, 1, …, |vars(C)| do
   foreach combination of k variables out of vars(C) do
      begin
      Define some ordering between the k variables;
      foreach disposition with replacement of k constants out of consts(C) do
         for h = 1, 2, …, |vars(C)| − k do
            begin
            C′_h := {};
            foreach partition (V_i), i = 1, 2, …, h, of the remaining variables such that
            ∀i, j = 1, 2, …, h, i < j: |V_i| = r_i, r_i ≤ r_j do
               begin
               Build a clause D by replacing the l-th (l = 1, 2, …, k) variable
               of the combination with the l-th constant of the disposition
               and, for each i = 1, 2, …, h, all the r_i variables belonging to V_i with
               one new variable;
               if ∀i, j = 1, 2, …, h, i ≠ j: r_i ≠ r_j then Insert D in C′_h
               elsif there exists no renaming of D in C′_h then Insert D in C′_h
               end;
            C′ := C′ ∪ C′_h
            end;
      end;
return C′
end;
Corollary 3 
([38]). ∀P ∈ Datalog, ∃P′ ∈ DatalogOI : T_P^ω = T_P′^ω.
Proof. 
P′ is the program obtained by translating each clause C ∈ P via Algorithm 2.    □
Lemma 1 
([38]). Let C be a Datalog clause with |vars(C)| = n and |consts(C)| = m. The number of DatalogOI clauses generated by Algorithm 2 (renamings included) is
\[ \sum_{k=0}^{n} \binom{n}{k} m^k \sum_{h=1}^{n-k} \sum_{\substack{r_1 \le \dots \le r_h \\ r_1 + \dots + r_h = n-k}} \binom{n-k}{r_1, \dots, r_h} \]
The number of distinct clauses is obtained by dividing each multinomial coefficient by n₁! ⋯ n_l!, where nᵢ is the number of rⱼ’s in that coefficient that are equal to i and l is the greatest value of the rⱼ’s in that coefficient.
Proof. 
Each step of the outer loop in Algorithm 2 fixes a different k (0 ≤ k ≤ n). Then, all possible combinations of k variables out of n are generated (samples of k objects out of n, without replacement and without taking the ordering into account, counted by a binomial coefficient), and each of them is associated with one of the possible dispositions with replacement of that many constants of C (samples of k objects out of m, with replacement and taking the ordering into account, counted by m^k).
The remaining n − k variables are handled by the inner loop as follows. Each step fixes a different h (1 ≤ h ≤ n − k) and generates all possible partitions of the remaining variables made up of h elements, sorted by increasing cardinality (samples of dimension n − k, without replacement, taking their ordering into account, apart from the equivalent ones, i.e., those belonging to the same partition element, counted by a multinomial coefficient). By the addition principle, this yields the inner summation of the formula.
Applying the multiplication principle to this sequence of operations yields the formula to be proved. The division that yields the number of distinct clauses comes from the fact that partitions are counted as distinct even when they are obtained by permuting elements with the same cardinality. Thus, for each group of nᵢ elements having the same cardinality i, it suffices to divide by the number of their permutations, which is nᵢ!    □
Lemma 1 guarantees termination of Algorithm 2 since there is a finite number of DatalogOI clauses corresponding to a given Datalog one.
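The count in Lemma 1 can be checked mechanically. The following Python sketch implements the formula for the number of distinct clauses (summing the corrected multinomial coefficients over the integer partitions of the n − k remaining variables); as a sanity check, for m = 0 the result collapses to the Bell numbers, which count the partitions of a set of n variables.

```python
from math import comb, factorial
from collections import Counter

def int_partitions(n, largest=None):
    """Integer partitions of n, as tuples in non-increasing order."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for k in range(min(n, largest), 0, -1):
        for rest in int_partitions(n - k, k):
            yield (k,) + rest

def distinct_oi_clauses(n, m):
    """Number of distinct DatalogOI clauses for a Datalog clause with
    n variables and m constants (Lemma 1, renamings divided out)."""
    total = 0
    for k in range(n + 1):          # k variables replaced by constants
        rest = n - k
        if rest == 0:
            inner = 1               # no variables left: a single ground variant
        else:
            inner = 0
            for parts in int_partitions(rest):
                coef = factorial(rest)
                for r in parts:                      # multinomial coefficient
                    coef //= factorial(r)
                for cnt in Counter(parts).values():  # divide by n_i! per group of equal-size blocks
                    coef //= factorial(cnt)
                inner += coef
        total += comb(n, k) * (m ** k) * inner
    return total
```

For instance, distinct_oi_clauses(3, 0) yields 5 (the Bell number B₃), and distinct_oi_clauses(1, 2) yields 3: for a clause with one variable X and two constants, the variants keep X or replace it with either constant.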
The cases in which several arguments of a predicate must take the same value are handled under OI by introducing different variants of the predicate, associated with the possible cases of unification among arguments. E.g., given predicate p/3, if the first and third arguments can take the same value, we must introduce a new predicate p_{3,1,3}/2 representing these cases, defined as p_{3,1,3}(X, Y) ← p(X, Y, X). Of course, considering all possible cases of unification among arguments requires a combinatorial number of new predicates. This technique also applies to the flattening predicates in order to deal with functions whose different arguments can take the same value.
These are only theoretical results that guarantee no loss in expressive power from Datalog to DatalogOI. The significant growth in the number of clauses and predicates according to the above procedures is not a limitation in practice, since only the variations that are actually needed must be included in the theory. E.g., in ILP, a bound on the number of clauses in a theory that must explain a number of examples is the very number of examples. On the other hand, the OI assumption brings several advantages from the theoretical, practical, and intuitive viewpoints (see Section 5 and Section 6).
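The predicate-variants construction can be made concrete by enumerating one variant per set partition of the argument positions, Bell(arity) of them. A hedged Python sketch (the generic encoding of a partition below is an assumption; the text only fixes the name p_{3,1,3}/2 for the single case shown):

```python
def set_partitions(items):
    """All partitions of a list into nonempty blocks (Bell(n) of them)."""
    if not items:
        yield []
        return
    head, *rest = items
    for part in set_partitions(rest):
        for i in range(len(part)):        # put head into an existing block
            yield part[:i] + [[head] + part[i]] + part[i + 1:]
        yield [[head]] + part             # or into a block of its own

def predicate_variants(name, arity):
    """One variant predicate per way of unifying argument positions:
    the partition [[1, 3], [2]] of p/3 corresponds to the p_{3,1,3}/2 of the text."""
    out = []
    for part in set_partitions(list(range(1, arity + 1))):
        blocks = sorted(sorted(b) for b in part)
        out.append((name, blocks, len(blocks)))   # the variant's new arity = number of blocks
    return out

variants = predicate_variants('p', 3)
```

For p/3 this yields 5 variants, from the identity partition (no unified arguments) down to the unary variant where all three positions coincide.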
All the FOL concepts introduced in Section 2 can be applied to the OI setting. In the following, they are redefined only when under OI they have a different behavior or specification. From now on, the OI subscript denotes the application of OI to the standard FOL concepts.

4. Generalization Models

The OI assumption changes the classical ordering relations on the representation language, restructuring the corresponding search spaces.

4.1. θ_OI-Subsumption

Definition 9 
(θ_OI-subsumption [33,34]). Let C, D be Datalog clauses. C θ-subsumes D under OI (C θ_OI-subsumes D, in symbols C ≤_OI D) iff there exists a substitution σ such that C_OI.σ ⊆ D_OI.
Example 3. 
 
  • C: p(Y) :- q(Y, Z).       D: p(X) :- q(X, a).       D′: p(X) :- q(X, a), r(b).
    C <_OI D: ∃σ = {X/Y, a/Z} s.t. C_OI.σ = D_OI ∧ ¬∃θ s.t. D_OI.θ ⊆ C_OI;
    C <_OI D′: ∃σ = {X/Y, a/Z} s.t. C_OI.σ ⊆ D′_OI ∧ ¬∃θ s.t. D′_OI.θ ⊆ C_OI.
  • C: p(X) :- q(Y).       D: p(W) :- q(Z).       C ∼_OI D:
    ∃θ = {W/X, Z/Y}: C_OI.θ ⊆ D_OI, ∃σ = {X/W, Y/Z}: D_OI.σ ⊆ C_OI.
  • C: p(X) :- q(X, a).       D: p(Y) :- q(Y, Z), r(a).       C and D are not comparable:
    ¬∃σ s.t. C_OI.σ ⊆ D_OI, ¬∃θ s.t. D_OI.θ ⊆ C_OI.
Like θ-subsumption, θ_OI-subsumption induces a quasi-ordering upon the space of Datalog clauses, as stated by the following result.
Proposition 2. 
Let C, D, E be Datalog clauses. Then
1. 
C ≤_OI C;                                                            (reflexivity)
2. 
C ≤_OI D and D ≤_OI E ⇒ C ≤_OI E.                                                  (transitivity)
Proof. 
1.  C ⊆ C ⇒ C_OI.{} ⊆ C_OI ⇒ C ≤_OI C.
2.  If C ≤_OI D and D ≤_OI E, then there exist substitutions σ and θ such that C_OI.σ ⊆ D_OI ∧ D_OI.θ ⊆ E_OI. Thus, C_OI.σθ ⊆ D_OI.θ ⊆ E_OI. This proves C ≤_OI E.
   □
A computational characterization of θ_OI-subsumption is the following:
Proposition 3. 
Let C, D be Datalog clauses,
C ≤_OI D ⇔ ∃σ s.t. core(C_OI)σ ⊆ core(D_OI) ∧ constraints(C_OI)σ ⊆ constraints(D_OI).
Proof. (⇐) Trivial.
(⇒)
From Definition 9, C ≤_OI D ⇒ ∃σ substitution such that C_OI.σ ⊆ D_OI ⇒
(core(C_OI) ∪ constraints(C_OI))σ ⊆ core(D_OI) ∪ constraints(D_OI) ⇒
core(C_OI)σ ∪ constraints(C_OI)σ ⊆ core(D_OI) ∪ constraints(D_OI).
Since inequations cannot occur in a Datalog clause, core(·) and constraints(·) are always disjoint. So, core(C_OI)σ ⊆ core(D_OI) ∧ constraints(C_OI)σ ⊆ constraints(D_OI).
   □
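Proposition 3 directly suggests a decision procedure for θ_OI-subsumption: search for a substitution that maps the literals of C onto distinct literals of D while staying injective on terms(C). A Python sketch under the same assumed clause representation as before (literal signs, if needed, are taken to be encoded in the predicate symbol):

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def oi_subsumes(c, d):
    """Test C <=_OI D by backtracking: map each literal of c onto a distinct
    literal of d under a substitution injective on terms(c)."""
    consts_c = {t for lit in c for t in lit[1:] if not is_var(t)}

    def match(lits, sigma, used):
        if not lits:
            return True
        first, rest = lits[0], lits[1:]
        for j, ld in enumerate(d):
            if j in used or ld[0] != first[0] or len(ld) != len(first):
                continue
            s2, ok = dict(sigma), True
            for t, s in zip(first[1:], ld[1:]):
                if not is_var(t):
                    ok = (t == s)               # constants map to themselves
                elif t in s2:
                    ok = (s2[t] == s)
                elif s in s2.values() or s in consts_c:
                    ok = False                  # would break injectivity on terms(c)
                else:
                    s2[t] = s
                if not ok:
                    break
            if ok and match(rest, s2, used | {j}):
                return True
        return False

    return match(list(c), {}, frozenset())

# Example 3: C <=_OI D via Y -> X, Z -> a, while C3 and D2 are not comparable
C  = [('p', 'Y'), ('q', 'Y', 'Z')]
D  = [('p', 'X'), ('q', 'X', 'a')]
C3 = [('p', 'X'), ('q', 'X', 'a')]
D2 = [('p', 'Y'), ('q', 'Y', 'Z'), ('r', 'a')]
```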
θ_OI-subsumption is weaker than θ-subsumption:
Proposition 4. 
Let C, D be Datalog clauses. C ≤_OI D ⇒ C ≤_θ D.
Proof. 
In Proposition 3, one further condition (constraints(C_OI)σ ⊆ constraints(D_OI)) is required in addition to θ-subsumption (core(C_OI)σ ⊆ core(D_OI)).    □
Proposition 5. 
Let C, D be Datalog clauses.
1. 
C ≤_OI D ⇒ ∀σ such that C_OI.σ ⊆ D_OI:
        σ is injective and σ: vars(C) → vars(D) ∪ (consts(D) ∖ consts(C)).
2. 
C ≤_OI D ⇒ |C| ≤ |D|.
3. 
C ∼_OI D iff C and D are variants.
Proof. 
(sketch) The constraints require substitutions to be injective and to map the variables of C onto terms of D, which in Datalog are just variables (of D) or constants (not already occurring in C). If consts(D) = ∅, it suffices for the substitution to be injective; if there are constants, injectivity alone would not fulfill transitivity of ≤_OI, so it would not be an ordering.
Requiring that terms be distinct means that literals cannot unify with each other; hence, θ_OI-subsumption maps each literal of the subsuming clause onto a single, different literal of the subsumed one. So, clauses that are equivalent under ∼_OI must have the same number of literals; hence, the only way for them to be equivalent is to be variants (this does not hold under implication and θ-subsumption).    □
Thus, in a search space ordered by θ_OI-subsumption, no subset of a clause is equivalent to the clause itself under OI. This yields smaller equivalence classes than those in a space ordered by θ-subsumption.
The sets core(·) and constraints(·) (and hence θ_OI-subsumption) can be straightforwardly extended from DatalogOI to generic clauses, leveraging the fact that substitutions are mappings [39], and thus can be required to be injective:
Definition 10 
(OI-substitution). Let T be a set of terms and σ a substitution. σ is an OI-substitution with respect to T iff ∀t₁, t₂ ∈ T: t₁ ≠ t₂ ⇒ t₁σ ≠ t₂σ.
(Non-injective substitutions yield contradictions when applied to constraints: e.g., [x ≠ y].{x/a, y/a} = [a ≠ a].) This yields another characterization of θ_OI-subsumption.
Proposition 6. 
Let C, D be clauses. C ≤_OI D iff ∃σ OI-substitution with respect to terms(C) such that C.σ ⊆ D.
Proof. (⇐) Trivial.
(⇒)
If C ≤_OI D, then core(C_OI)σ ⊆ core(D_OI) ∧ constraints(C_OI)σ ⊆ constraints(D_OI)
(by Proposition 3 extended to generic clauses). By contradiction, let t₁, t₂ ∈ terms(C_OI), t₁ ≠ t₂, be such that t₁σ = t₂σ = t. By definition of C_OI,
t₁, t₂ ∈ terms(C_OI) ∧ t₁ ≠ t₂ ⇒ [t₁ ≠ t₂] ∈ constraints(C_OI) ⇒
[t₁ ≠ t₂]σ = [t₁σ ≠ t₂σ] = [t ≠ t] ∈ constraints(D_OI),
which is a contradiction by definition of constraints(·).
   □
Example 4. 
(θ_OI-subsumption in an unrestricted space).
C: p(f(X)) :- q(f(X), Y).         D: p(f²(a)) :- q(f²(a), f²(a)).
C ≤_θ D using the substitution θ = {f(a)/X, f²(a)/Y}, but the distinct terms f(X) and Y (f(X) ≠ Y) are both mapped onto f²(a) by θ; thus, C ≰_OI D.
As a consequence, Properties 2 and 3 in Proposition 5 also hold for unrestricted clauses, since injectivity of the substitutions involved is the only property of the DatalogOI language that is required: under OI, clauses are non-redundant and cannot be further reduced. Hence, both definitions of resolution are, by themselves, complete by refutation.
Note that, since θ-subsumption is decidable, θ_OI-subsumption, as a restriction allowing only OI-substitutions, must also be decidable.
Definition 11 
(lgg_OI). A least general generalization under θ_OI-subsumption of two clauses is a generalization which is not more general than (i.e., it is either more specific than, or not comparable to) any other such generalization. Formally, given two Datalog clauses C₁ and C₂,
lgg_OI(C₁, C₂) = {C | C ≤_OI Cᵢ, i = 1, 2, and ∀D such that D ≤_OI Cᵢ, i = 1, 2: C ≮_OI D}.
The lgg_OI is not unique, since the space of Datalog clauses ordered by ≤_OI is not a lattice [32,40]. It can be computed using Algorithm 3, a straightforward extension to DatalogOI of Plotkin’s algorithm [27], which takes the (OI) rule into account in order to determine a partition of the literals in core(G) that allows finding the set of all the possible lggs under OI, rather than the unique lgg under θ-subsumption. An equivalence test is needed, after the algorithm, to eliminate duplicate lggs. The reduced lggs of two clauses are in finite number; specifically,
|lgg_OI(C₁, C₂)| < 2ⁿ,   n = min(|C₁|, |C₂|),
since, as a consequence of the OI constraint, the literals in the generalization must be mapped one-to-one onto literals of the generalized clauses.
Algorithm 3: Algorithm for computing the lgg_OI
Let us preliminarily define a selection under OI of two DatalogOI clauses C₁, C₂ as a pair of literals (cᵢ ∈ core(C₁), dⱼ ∈ core(C₂)) that have the same predicate symbol, sign, and arity.
Given two DatalogOI clauses C₁ and C₂, lgg_OI(C₁, C₂) is a set of DatalogOI clauses such that, for each clause G_OI:
  • core(G_OI) = {g | g = lgg(cᵢ, dⱼ), (cᵢ, dⱼ) selection under OI of C₁, C₂, and φ_k injective, k = 1, 2},
    where lgg(cᵢ, dⱼ) is computed using Plotkin’s algorithm and φ_k, k = 1, 2, are the
    substitutions such that core(G_OI)φ_k ⊆ core(C_k), k = 1, 2. More precisely, given the mapping
    Φ: terms(C₁) × terms(C₂) → N ∪ (consts(C₁) ∩ consts(C₂)) s.t.
    Φ(tᵢ, sᵢ) = tᵢ if tᵢ = sᵢ, X ∈ N otherwise,
    where (P(t₁, t₂, …, tₙ), P(s₁, s₂, …, sₙ)) is a selection under OI of C₁ and C₂, N denotes
    a set of new variables, and φ₁, φ₂ are the projections of the inverse function of Φ onto
    terms(C₁) and terms(C₂), respectively;
  • constraints(G_OI) = {x ≠ y | φ_k(x) ≠ φ_k(y), k = 1, 2}.
Example 5. 
 
C: bicycle(X) :- wheel(X, b), wheel(X, X), red(c);
D: bicycle(Y) :- wheel(a, Y), stripes(d).
The selections, along with the unification functions, are
bicycle(X), bicycle(Y):   Φ(X, Y) = W₁,  φ₁(W₁) = X,  φ₂(W₁) = Y
wheel(X, b), wheel(a, Y):   Φ₁(X, a) = W₂,  φ₁(W₂) = X,  φ₂(W₂) = a;   Φ₁(b, Y) = W₃,  φ₁(W₃) = b,  φ₂(W₃) = Y
wheel(X, X), wheel(a, Y):   Φ₁(X, a) = W₂,  φ₁(W₂) = X,  φ₂(W₂) = a;   Φ₂(X, Y) = W₄,  φ₁(W₄) = X,  φ₂(W₄) = Y
Since φ₁ and φ₂ are not injective (φ₁ maps W₁, W₂, and W₄ to X; φ₂ maps W₁, W₃, and W₄ to Y), a partition among the variables is determined (and hence among the literals containing them) such that they cannot occur in the same generalization (W₁ is incompatible with W₂, W₃ is incompatible with W₄). So lgg_OI(C, D) =
{ bicycle(W₁) :- ,   :- wheel(W₂, W₃) | [W₂ ≠ W₃],   :- wheel(W₂, W₄) | [W₂ ≠ W₄] }
The last two clauses are equivalent (i.e., variants), so one of them must be removed.
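The selections and the Φ, φ₁, φ₂ tables of Example 5 can be replayed with a short Python sketch. One assumption is needed, since the sharing rule for new variables is not fully spelled out here: the table is keyed on the predicate symbol, the argument position, and the pair of terms, which reproduces the reuse of W₂ (and the freshness of W₄) in the example.

```python
from itertools import count

def phi_tables(c1, c2):
    """Anti-unify every selection under OI (pair of same-predicate literals)
    and record the projections phi1, phi2 of the inverse of the mapping Phi."""
    table, ctr = {}, count(1)
    phi1, phi2, gens = {}, {}, []
    for a1 in c1:
        for a2 in c2:
            if a1[0] != a2[0] or len(a1) != len(a2):
                continue            # not a selection: different predicate or arity
            args = []
            for pos, (t, s) in enumerate(zip(a1[1:], a2[1:]), start=1):
                if t == s:
                    args.append(t)                  # Phi(t, t) = t
                    continue
                key = (a1[0], pos, t, s)            # assumption: per-position sharing
                if key not in table:
                    table[key] = 'W%d' % next(ctr)
                w = table[key]
                phi1[w], phi2[w] = t, s
                args.append(w)
            gens.append(tuple([a1[0]] + args))
    return gens, phi1, phi2

C = [('bicycle', 'X'), ('wheel', 'X', 'b'), ('wheel', 'X', 'X'), ('red', 'c')]
D = [('bicycle', 'Y'), ('wheel', 'a', 'Y'), ('stripes', 'd')]
gens, phi1, phi2 = phi_tables(C, D)
```

The non-injectivity of φ₁ on {W₁, W₂, W₄} and of φ₂ on {W₁, W₃, W₄} is then what induces the partition into the alternative generalizations of the example.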

4.2. OI Implication

An OI-compliant form of implication can be defined.

4.2.1. Model-Theoretic Definition

OI interpretations are always normal interpretations, while of course the converse is not true. Accordingly, an OI model may not be a model. Under OI, variable bindings are required to be one-to-one mappings:
Definition 12 
(Variable binding under OI). Let I be an interpretation of L with domain D. A variable binding under OI with respect to L is a one-to-one mapping V: vars(L) → D.
Example 6.
(OI model). The OI interpretation I = {p(a, b)} on the domain D = {a, b} is an OI model for C = {p(a, X) ∨ q(a)}. I is not a model for C in general. Indeed, since q(a) is false, the truth of the clause depends on p(a, X), where the variable X is universally quantified. Under OI, it is sufficient that p(a, b) be true for ∀X p(a, X) to hold, since X cannot be bound to the constant a already occurring in the clause, while in general both p(a, a) and p(a, b) should be true.
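The two semantics compared in Example 6 can be checked by brute force over the finite domain: under OI, variable bindings range over injective assignments only and, by the OI constraints of the clause, a variable cannot take the value of a constant occurring in the clause. A Python sketch, assuming a Herbrand-like reading in which each constant denotes itself and literals are (sign, atom) pairs:

```python
from itertools import permutations, product

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def _holds(interp, clause, binding):
    """A clause is true under one binding iff some literal is satisfied."""
    ground = [(sg, tuple([a[0]] + [binding.get(t, t) for t in a[1:]]))
              for sg, a in clause]
    return any((atom in interp) == sg for sg, atom in ground)

def is_model(interp, clause, domain):
    """Standard semantics: the clause must hold for every assignment."""
    vs = sorted({t for _, a in clause for t in a[1:] if is_var(t)})
    return all(_holds(interp, clause, dict(zip(vs, img)))
               for img in product(domain, repeat=len(vs)))

def is_oi_model(interp, clause, domain):
    """OI semantics: only injective assignments, avoiding the clause's constants."""
    vs = sorted({t for _, a in clause for t in a[1:] if is_var(t)})
    consts = {t for _, a in clause for t in a[1:] if not is_var(t)}
    allowed = [d for d in domain if d not in consts]
    return all(_holds(interp, clause, dict(zip(vs, img)))
               for img in permutations(allowed, len(vs)))

# Example 6: I = {p(a,b)} is an OI model of p(a,X) v q(a) on {a,b}, but not a model
clause = [(True, ('p', 'a', 'X')), (True, ('q', 'a'))]
I = {('p', 'a', 'b')}
```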
Definition 13 
(OI implication). A formula ϕ is a logical consequence under OI of a set of formulas Σ (Σ ⊨_OI ϕ) iff all OI models of Σ are also OI models of ϕ.
Given two formulas C and D, C OI-implies D (C ⊨_OI D) iff any OI model for C is also an OI model for D. Equivalence under OI implication is denoted by ∼_OI.
Useful results also hold under this semantics:
Theorem 4 
(Deduction Theorem). Let Σ be a set of formulas and ϕ and ψ be formulas. Σ ∪ {ϕ} ⊨_OI ψ iff Σ ⊨_OI (ϕ → ψ).
Proof. 
Σ ∪ {ϕ} ⊨_OI ψ iff all OI models of Σ ∪ {ϕ} are OI models of ψ iff
all OI models of Σ are OI models of ¬ϕ or of ψ iff
all OI models of Σ are OI models of ¬ϕ ∨ ψ iff Σ ⊨_OI (ϕ → ψ).    □
As a consequence, it is also possible to prove the following result:
Theorem 5. 
Given a set of formulas Σ and a formula ϕ, Σ ⊨_OI ϕ iff Σ ∪ {¬ϕ} is unsatisfiable under OI.
Proof. 
Σ ⊨_OI ϕ iff (by Theorem 4) Σ ∪ {¬ϕ} ⊨_OI □ iff
all OI models of Σ ∪ {¬ϕ} are OI models of □ iff Σ ∪ {¬ϕ} is unsatisfiable under OI.    □
Herbrand (OI) models only can be considered, as in normal logic.
Proposition 7. 
Let Σ be a set of clauses in a first-order language L. Σ has an OI model iff Σ has a Herbrand OI model.
Proof. (⇐)Trivial.
(⇒)
Given an OI model I for Σ, consider the Herbrand OI interpretation I̅ having the Herbrand universe U_Σ as its domain. For every n-ary predicate P in Σ, I̅ makes P(t₁, …, tₙ) true if P(t₁, …, tₙ) is true in I, and false otherwise.
   □

4.2.2. Proof-Theoretic Definition

Applying the OI assumption to the subsumption theorem for implication yields a constructive definition of OI implication. OI resolution R_OI is defined just like resolution R, except that only mgu_OI’s are allowed. Moreover, equivalent clauses of a resolvent in R_OI*({C}) are not considered, since under OI they are merely variants. Like resolution, OI resolution is sound and complete by refutation.
Lemma 2. 
Let C₁ and C₂ be clauses. If R ∈ R_OI(C₁, C₂), then {C₁, C₂} ⊨_OI R.
Proof. 
Suppose R is an OI resolvent of C₁ and C₂ upon the literals a ∈ C₁ and b ∈ C₂ by θ = mgu_OI(a, b̄), and I is an OI model (with domain D) of {C₁, C₂}. Let {X₁, …, Xₙ} = vars(C₁) ∪ vars(C₂) and V be a variable binding under OI. Then, if aθ is false under I and V(X₁/d₁)⋯(Xₙ/dₙ), at least one of the other literals in C₁ is true under I and V(X₁/d₁)⋯(Xₙ/dₙ), dᵢ ∈ D. The case of b̄ is similar.
R is made up of all the literals in C₁θ and C₂θ except for aθ and b̄θ. Since either aθ or b̄θ is false under I and V(X₁/d₁)⋯(Xₙ/dₙ), at least one of the literals in R is true under I and V(X₁/d₁)⋯(Xₙ/dₙ); hence, I is an OI model of R. Therefore, {C₁, C₂} ⊨_OI R.    □
Extending this result to OI derivations, soundness of OI resolution is proven:
Theorem 6 
(OI Derivation Soundness). Let Σ be a set of clauses and C a clause. If Σ ⊢_OI C, then Σ ⊨_OI C.
Proof. 
Σ ⊢_OI C ⇒ ∃R₁, …, R_k = C OI derivation from Σ. By induction on k:
(k = 1)
R₁ = C ∈ Σ; thus, obviously, Σ ⊨_OI C.
(k > 1)
Suppose the thesis holds for k ≤ m. Let R₁, …, R_{m+1} = C be an OI derivation of C from Σ. If R_{m+1} ∈ Σ, then the theorem is obvious. Otherwise, R_{m+1} is an OI resolvent of some Rᵢ and Rⱼ (i, j ≤ m). By the induction hypothesis, Σ ⊨_OI Rᵢ and Σ ⊨_OI Rⱼ. From Lemma 2, it follows that {Rᵢ, Rⱼ} ⊨_OI R_{m+1} = C; hence, Σ ⊨_OI C.
   □
It is now possible to define
Definition 14 
(OI implication). Let C, D be clauses. C implies D under OI (or C OI-implies D, C ⇒_OI D) iff D is a tautology or ∃E ∈ R_OI*({C}) such that E ≤_OI D.
Example 7. 
C: p(a) :- q(a, b), p(b).      D: p(c) :- q(c, d), q(d, e), p(e).
E: p(X) :- q(X, Y), q(Z, W), p(W).       F: p(U) :- q(U, V), p(V).
E ≤_θ C and E ≤_θ D. F ⊨ C and F ⊨ D. E <_θ F.
Under OI, E ≰_OI C (since E has more literals than C) and E ≰_OI D (since the terms Y and Z cannot be mapped onto the same term d); F ≤_OI C (trivial) and F ⇒_OI D since, by self-resolving F, we have R_OI({F}) = {p(X) :- q(X, Y), q(Y, Z), p(Z)} ≤_OI D.
Concerning the ordering relationships, θ_OI-subsumption is weaker than OI implication:
Proposition 8. 
Let C, D be Datalog clauses.
If C ≤_OI D, then C ⇒_OI D;
if C ⇒_OI D, then C ⊨ D.
Figure 1 summarizes the relationships between the various generalization models.
The soundness of OI implication bridges the gap between its model-theoretic and proof-theoretic definitions. A completeness result bridges the opposite gap.

4.2.3. Subsumption Theorem

The simplest case occurs when both Σ and C are ground:
Lemma 3. 
Let Σ be a ground logic program and C a ground clause. If Σ ⊨_OI C, then ∃D such that Σ ⊢_OI D ∧ D ⊆ C.
Proof. 
As for implication [16], since, the clauses being ground, no variable assignment is made.    □
In order to prove refutation completeness when only C is ground, we need to prove two theorems that are valid for the standard notions of unsatisfiability (but also hold under OI), and to lift an OI resolution step from the case of ground parent clauses to that of unrestricted ones.
Theorem 7 
(Herbrand). A set of clauses Σ is unsatisfiable (under OI) iff there exists a finite unsatisfiable set Γ of ground instances of clauses from Σ.
Proof. 
Like in [9], but using Proposition 7.    □
Theorem 8 
(Deduction Theorem). Let Σ be a set of clauses and C a ground clause. Then Σ ⊨_OI C iff there exists a finite set Γ of ground instances of clauses from Σ such that Γ ⊨_OI C.
Proof. 
Like in [9], but exploiting Theorems 4 and 7.    □
Lemma 4 
(Resolution Lifting). Let C₁, C₂ be clauses and C₁′, C₂′ two instances thereof, respectively. If R′ is an OI resolvent of C₁′ and C₂′, then there exists an OI resolvent R of C₁ and C₂ such that R′ is an instance of R.
Proof. 
By hypothesis:
  • ∃σ₁, σ₂ OI substitutions such that C₁′ = C₁σ₁ and C₂′ = C₂σ₂;
  • ∃n₁′ ∈ C₁′, n₂′ ∈ C₂′, μ = mgu_OI(n₁′, n̄₂′) such that R′ = ((C₁′ ∖ {n₁′}) ∪ (C₂′ ∖ {n₂′}))μ.
Let D₁′ = C₁′ ∖ {n₁′} and D₂′ = C₂′ ∖ {n₂′}: ∃{n₁}, D₁ ⊆ C₁ and ∃{n₂}, D₂ ⊆ C₂ such that
n₁′ = n₁σ₁, n₂′ = n₂σ₂ and D₁σ₁ = D₁′, D₂σ₂ = D₂′. We have that
R′ = (D₁′ ∪ D₂′)μ = (D₁σ₁ ∪ D₂σ₂)μ = (C₁, C₂, C₁′, C₂′ being standardized apart) = (D₁ ∪ D₂)σ₁σ₂μ.
Now, σ₁σ₂μ is an OI unifier of n₁ and n̄₂ ⇒ ∃θ = mgu_OI(n₁, n̄₂) ⇒
∃δ OI substitution such that σ₁σ₂μ = θδ.
Thus, R = ((C₁ ∖ {n₁}) ∪ (C₂ ∖ {n₂}))θ = (D₁ ∪ D₂)θ ⇒
Rδ = (D₁ ∪ D₂)θδ = (D₁ ∪ D₂)σ₁σ₂μ = R′.    □
The lifting technique can be generalized to many OI-resolution steps.
Lemma 5 
(Derivation Lifting). Let Σ be a (finite) set of clauses and Σ′ a set of instances of clauses in Σ. If R₁′, …, R_k′ is an OI derivation of the clause R_k′ from Σ′, then there exists an OI derivation R₁, …, R_k of a clause R_k from Σ such that ∀i: Rᵢ′ is an instance of Rᵢ.
Proof. 
By induction on the length k of the derivation.
(k = 1)
R₁′ ∈ Σ′ ⇒ ∃R₁ ∈ Σ such that R₁′ is an instance of R₁.
(k > 1)
Let R₁′, …, R_k′ be an OI derivation of R_k′ from Σ′; thus, R_k′ is the resolvent of two clauses in Σ′ ∪ {R₁′, …, R_{k−1}′}. By the induction hypothesis, there exists an OI derivation R₁, …, R_{k−1} such that ∀i = 1, …, k − 1: Rᵢ′ is an instance of Rᵢ. Hence, by Lemma 4, ∃R_k such that R_k′ is an instance of R_k.
   □
It is now possible to prove the following results:
Lemma 6. 
Let Σ be a logic program and C a ground clause. If Σ ⊨_OI C, then ∃D such that Σ ⊢_OI D ∧ D ≤_OI C.
Proof. 
Assume C is not a tautology. By Theorem 8, if Σ ⊨_OI C, then there exists a finite set Γ of ground instances of clauses in Σ such that Γ ⊨_OI C. Then, by Lemma 3, there exists a clause D′ such that Γ ⊢_OI D′ ∧ D′ ⊆ C. Given an OI derivation R₁′, …, R_k′ = D′ of D′ from Γ, it can be lifted by Lemma 5 to an OI derivation R₁, …, R_k of R_k from Σ, where D′ = R_k′ is an instance of R_k. Taking D = R_k: Σ ⊢_OI D ∧ D ≤_OI C (since D′ ⊆ C and D′ is an instance of D).    □
Theorem 9 
(Subsumption Theorem). Let Σ be a logic program and C a clause. Σ ⊨_OI C iff ∃D such that Σ ⊢_OI D ∧ D ≤_OI C.
Proof. (⇐) By the soundness of OI derivations (and of θ_OI-subsumption).
(⇒)
Assume C is not a tautology. Given an OI substitution θ that maps the variables x₁, …, xₙ of C to new constants a₁, …, aₙ that do not occur in Σ ∪ {C}, Cθ is a non-tautological ground clause and Σ ⊨_OI Cθ. Thus, by Lemma 6, ∃D such that Σ ⊢_OI D ∧ D ≤_OI Cθ. θ maps the variables of C to constants that are not in D because, being derived from Σ, D cannot contain the constants introduced by θ. Let δ be a substitution such that Dδ ⊆ Cθ, and let σ be the substitution obtained from δ by replacing, in its bindings, each constant aᵢ with the corresponding variable xᵢ. Then, δ = σθ. Since θ only replaces the variables xᵢ by the aᵢ (1 ≤ i ≤ n), it follows that Dσ ⊆ C, i.e., D ≤_OI C.
   □
So, OI resolution plus θ_OI-subsumption can derive the same conclusions as model-theoretic OI implication. As a consequence, some results given in [41] for the unrestricted case can be proved. First, let us show when OI implication and θ_OI-subsumption coincide:
Proposition 9. 
Let C, D be clauses. If C is not self-resolvent and D is not tautological, then C ⇒_OI D iff C ≤_OI D.
Proof. (⇐) Trivial.
(⇒)
If C ⇒_OI D, by Theorem 9, ∃E ∈ R_OI*({C}) such that E ≤_OI D. Since C is not self-resolvent, R_OI*({C}) = {C}. Hence the thesis.
   □
Proposition 10. 
Let C and D be clauses. If C ⇒_OI D, then C⁺ ≤_OI D⁺ and C⁻ ≤_OI D⁻,
where C⁺ and C⁻ denote the positive and negative literals of a clause C, respectively.
Proof. 
C⁺ ⊆ C, so C⁺ ≤_OI C and hence C⁺ ⇒_OI C. By hypothesis, C ⇒_OI D; thus, C⁺ ⇒_OI D. C⁺ and D⁺ contain only positive literals, so they cannot be self-OI-resolved, and D⁺ is not a tautology. By Proposition 9, it must hold that C⁺ ≤_OI D. But C⁺ is made up of positive literals only, so C⁺ ≤_OI D⁺. Analogously, C⁻ ≤_OI D⁻.    □
Finally, we obtain a sufficient condition for the equivalence between OI implication and θ_OI-subsumption:
Proposition 11. 
Let C, D be clauses. If D is not ambivalent, then C ⇒_OI D iff C ≤_OI D.
Proof. 
Assume C ⇒_OI D. As D is not ambivalent, it cannot be a tautology. By Proposition 10, ∃θ, δ OI substitutions such that C⁺θ ⊆ D⁺ and C⁻δ ⊆ D⁻. Now, if C were self-resolving, then there would be literals lσ₁ ∈ C⁺ and ¬lσ₂ ∈ C⁻, with σ₁ and σ₂ OI unifiers. But then lσ₁θ ∈ D⁺ and ¬lσ₂δ ∈ D⁻, which does not hold because of the non-ambivalence of D. Thus, C is not self-resolving and D is not a tautology. By Proposition 9, C ≤_OI D.    □
Example 8. 
Consider the domain D = { a , b } and the following cases.
1. 
C = p(X) ∨ q(Y), not self-resolvent (nor ambivalent); D = p(a) ∨ q(b), not tautological. {p(a), p(b)}, {p(a), q(a)}, {p(b), q(b)}, {q(a), q(b)} and their supersets are the OI models for C, all of which are also OI models for D, so C ⊨_OI D; also, C ≤_OI D because ∃θ = {a/X, b/Y}: Cθ = D. Note that, under OI, p(a) ∨ q(a) and p(b) ∨ q(b) are not to be verified by the interpretations, since they would bind both X and Y onto the same constant (a or b).
2. 
C = p(X) ∨ ¬q(X), not self-resolvent (nor ambivalent); D = p(a) ∨ ¬q(a) ∨ ¬r(a), not tautological. The interpretations {}, {p(b)}, {p(b), q(a)}, {p(a)}, {p(a), q(b)}, {p(a), p(b)}, {p(a), p(b), q(b)}, {p(a), p(b), q(a)}, {p(a), p(b), q(a), q(b)} are the OI models for C, and they are also OI models for D, so C ⊨_OI D; also, C ≤_OI D because ∃θ = {a/X}: Cθ ⊆ D. Moreover, for θ = {a/X}: C⁺ = p(X) ≤_OI p(a) = D⁺
and C⁻ = ¬q(X) ≤_OI ¬q(a) ∨ ¬r(a) = D⁻.
3. 
C = p(X) ∨ ¬q(Y) is not self-resolvent (nor ambivalent), and D = p(a) ∨ ¬q(b) ∨ ¬r(a) is not tautological. {}, {p(b)}, {p(b), q(b)}, {p(a)}, {p(a), q(a)}, {p(a), p(b)}, {p(a), p(b), q(b)}, {p(a), p(b), q(a)}, {p(a), p(b), q(a), q(b)} are the OI models for C, and they are also OI models for D, so C ⊨_OI D; also, C ≤_OI D because, for θ = {a/X, b/Y}: Cθ ⊆ D. Note that the interpretations {p(a), p(b), q(a)}, {p(a), p(b), q(b)}, {q(a), q(b), p(a)}, {q(a), q(b), p(b)}, {p(a), p(b), q(a), q(b)} are not OI interpretations, since they would bind both X and Y onto the same constant (a or b).
Moreover, C⁺ = p(X) ≤_OI p(a) = D⁺ for θ = {a/X},
and C⁻ = ¬q(Y) ≤_OI ¬q(b) ∨ ¬r(a) = D⁻ for δ = {b/Y} (a subsuming clause may be shorter than the subsumed one: in the limit, the empty clause subsumes everything).

4.2.4. Refutation Completeness and Compactness

Like normal resolution, OI resolution alone is not deduction complete, yet it is refutation complete: it derives the empty clause from any unsatisfiable set of clauses.
Theorem 10 
(Refutation Completeness). A finite set of clauses Σ is unsatisfiable under OI iff Σ ⊢_OI □.
Proof. (⇐) Follows from the soundness of the OI derivation (Theorem 6).
(⇒)
If Σ is unsatisfiable under OI, then Σ ⊨_OI □ and, by Theorem 9, there exists D such that Σ ⊢_OI D and D ≤_OI □. So, D = □, and thus Σ ⊢_OI □.
   □
All the results in this section can be extended to the case of an infinite Σ, owing to a Compactness Theorem (see [42] for the unrestricted case). A proof for our notion of satisfiability is proposed here by lifting a demonstration given for the propositional calculus. Two preliminary results on binary trees must be recalled:
Lemma 7 
(König’s Lemma). A binary tree with arbitrarily long branches has an infinite branch.
Lemma 8. 
Let Σ be a set of clauses. Then, for every Σ′ ⊆ Σ: if Σ has an OI model, then Σ′ has an OI model.
Now, the Compactness Theorem is proven in two forms:
Theorem 11 
(Compactness—First Form). Given an infinite set of clauses Σ: Σ has an OI model iff every finite Σ′ ⊆ Σ has an OI model.
Proof. (⇒) Trivial, using Lemma 8.
(⇐)
Assume that every finite Σ₀ ⊆ Σ has an OI model. Then, a Herbrand OI model can be built based on Proposition 7. The Herbrand base H_Σ is in general infinite but countable: denote its elements with the sequence A₁, …, Aₙ, …
Now, let Σₙ be a finite set of clauses whose truth depends on A₁, …, Aₙ. Consider a binary tree built by taking a node r as the root and such that the successive nodes stand for the truth values of A₁, …, Aₙ, …, respectively: r has two edges towards two nodes standing for the possible truth values of A₁; then, from each node, two edges depart towards the truth values of A₂, and so on. Clearly, a Herbrand interpretation is given by assigning the truth values encountered traversing a path in such a tree.
Let T_Σ be the subtree obtained by taking all finite paths that represent an OI model for every C ∈ Σₙ. Since every Σₙ has an OI model, T_Σ must have arbitrarily long branches; hence, by Lemma 7, it has an infinite branch, representing an OI model for Σ.
   □
Theorem 12 
(Compactness—Second Form). Let Σ be an infinite set of formulas having no OI model. Then, there is a finite Σ′ ⊆ Σ that has no OI model.
Proof. 
By contradiction: if all the finite subsets Σ′ ⊆ Σ had an OI model, then Σ would have an OI model by Theorem 11.    □

4.3. Decidability

Let us start by providing some definitions and preliminary results. Let Σ = {C₁, …, Cₘ} be a set of clauses, and C a clause with vars(C) = {x₁, …, xₙ}.
  • Given a₁, …, aₙ distinct constants not occurring in Σ or C, σ = {a₁/x₁, …, aₙ/xₙ} is a Skolem substitution for C with respect to Σ.
    The term set of Σ by σ is the set of all terms occurring in Σσ.
  • Given a set of terms T, the instance set of C with respect to T is
    I(C, T) = { Cθ | θ = {t₁/x₁, …, tₙ/xₙ} an OI substitution, tᵢ ∈ T for i = 1, …, n }
    The instance set of Σ with respect to T is I(Σ, T) = I(C₁, T) ∪ … ∪ I(Cₘ, T).
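As a concrete illustration, the instance set under OI can be enumerated mechanically. The following sketch encodes literals as (predicate, args) tuples with uppercase-initial symbols as variables (a representational assumption, not the paper's notation), and builds I(C, T) by ranging over injective mappings of the variables onto the terms of T:

```python
from itertools import permutations

def vars_of(clause):
    """Collect the variables (uppercase-initial symbols) occurring in a clause."""
    return sorted({t for _, args in clause for t in args if t[0].isupper()})

def apply(clause, theta):
    """Apply a substitution (dict variable -> term) to every literal."""
    return frozenset((p, tuple(theta.get(a, a) for a in args)) for p, args in clause)

def instance_set(clause, terms):
    """I(C, T): all instances of C obtained by OI substitutions, i.e.,
    substitutions mapping distinct variables to *distinct* terms of T."""
    xs = vars_of(clause)
    return {apply(clause, dict(zip(xs, ts))) for ts in permutations(terms, len(xs))}

C = frozenset({("p", ("X",)), ("q", ("X", "Y"))})
T = ["a", "b", "c"]
print(len(instance_set(C, T)))  # 3*2 = 6 injective substitutions
```

Since OI substitutions must be injective, |I(C, T)| is bounded by the falling factorial |T|·(|T|−1)·…, here 3·2 = 6, whereas unrestricted substitutions would yield 3² = 9 instances.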
Lemma 9. 
Let Σ be a set of clauses, C a clause, and σ a Skolem OI substitution for C with respect to Σ. Then, Σ ⊨_OI C iff Σ ⊨_OI Cσ.
Proof. (⇒) Trivial.
(⇐)
Suppose C is not a tautology and σ = {aᵢ/xᵢ | xᵢ ∈ vars(C)}. If Σ ⊨_OI Cσ then, by Theorem 9, there exists a clause D such that Σ ⊢_OI D and D ≤_OI Cσ. Hence, since all constants in D must occur also in clauses in Σ, σ can be regarded as a Skolem substitution for C with respect to D. Then, by Lemma 6, D ≤_OI C. So, we can conclude that Σ ⊨_OI C.
   □
In the following, some ideas introduced in [43] for proving the decidability of T-implication are borrowed (T-implication [43] was an attempt to weaken implication so as to obtain a decidable relationship but, not being transitive, it does not induce a quasi-ordering, and hence it is not useful for defining search spaces and refinement operators on them). Like implication, OI implication is easily proven decidable when only ground clauses are involved.
Lemma 10. 
Let Σ be a set of ground clauses and C a ground clause. Then, Σ ⊨_OI C is decidable.
Proof. 
Let S be the (finite) set of all ground atoms occurring in Σ and C. Then,
Σ ⊨_OI C iff (by Theorem 5) Σ ∪ {¬C} has no OI model, iff (by Proposition 7)
Σ ∪ {¬C} has no Herbrand OI model, iff no subset of S is a Herbrand OI model of Σ ∪ {¬C}.
Since S is finite, this relationship is decidable.    □
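The decision procedure implicit in this proof can be sketched directly: enumerate all subsets of S as candidate Herbrand interpretations and check whether some model of Σ fails C. The encoding below (ground clauses as pairs of positive/negative atom sets, atoms as plain strings) is an illustrative assumption, and it glosses over the OI restriction on interpretations, which is immaterial for these ground examples:

```python
from itertools import combinations

def satisfies(I, clause):
    """A ground clause (pos, neg) is true in I iff some positive atom is in I
    or some negated atom is missing from I."""
    pos, neg = clause
    return bool(pos & I) or not neg <= I

def implies(sigma, c):
    """Decide Sigma |= C for ground clauses a la Lemma 10: no subset of the
    atom set S may be a Herbrand model of Sigma without being a model of C."""
    atoms = sorted(set().union(*(p | n for p, n in sigma), c[0] | c[1]))
    for r in range(len(atoms) + 1):
        for I in map(set, combinations(atoms, r)):
            if all(satisfies(I, d) for d in sigma) and not satisfies(I, c):
                return False
    return True

sigma = [({"p(a)"}, set())]                       # the fact p(a)
print(implies(sigma, ({"p(a)", "q(b)"}, set())))  # True: a weaker clause
print(implies(sigma, ({"q(b)"}, set())))          # False: I = {p(a)} refutes it
```

The procedure is exponential in |S|, but |S| is finite, which is all the decidability argument needs.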
The following result is based on the observation that if a term does not appear in the conclusion of an OI derivation between ground clauses, then it must have been resolved during an OI resolution step. Hence, by replacing this term with a new term in the premises, a new derivation of the same conclusion can be obtained.
Lemma 11. 
Let Σ be a set of clauses, C a clause, σ a Skolem OI substitution for C with respect to Σ, and T the term set of C by σ. Then, Σ ⊨_OI C iff I(Σ, T) ⊨_OI Cσ.
Proof. 
(⇒)
Trivial when C is a tautology. If C is not a tautology, then neither is Cσ. Since Σ ⊨_OI Cσ by Lemma 9, from Theorem 8 it follows that there is a finite set Γ of ground instances of clauses in Σ such that Γ ⊨_OI Cσ. By Theorem 9, there exists a derivation from Γ of a clause E such that E ≤_OI Cσ. Since Γ is made up of ground clauses, E must be ground, too; hence, E ⊆ Cσ. Therefore, E contains only terms from Cσ.
Now, consider the ground OI substitutions γᵢ, i = 1, …, n, that yield the clauses in Γ from those in Σ. For each i = 1, …, n, define a new OI substitution γ′ᵢ such that
tⱼ/xⱼ ∈ γ′ᵢ if tⱼ/xⱼ ∈ γᵢ and tⱼ ∈ T;  sⱼ/xⱼ ∈ γ′ᵢ if tⱼ/xⱼ ∈ γᵢ and tⱼ ∉ T,
where sⱼ ∈ T is a choice of a new term from T. In this manner, every term in {t₁, …, tₖ} occurring in clauses in Γ which is not in T is replaced with a distinct term in {s₁, …, sₖ} ⊆ T. Since Γ contains ground instances of clauses in Σ, this replacement yields a new set of clauses Γ′ where each clause is a ground instance of clauses in Σ. The term replacement involves the term set T by σ; then, the existence of a derivation of E from Γ yields a derivation of E from Γ′. Indeed, each OI resolution step in the derivation from Γ can be transposed into an OI resolution step from Γ′, since the same terms in Γ are replaced by the same terms in Γ′. In addition, the terms in Γ that are not in T, replaced by {s₁, …, sₖ} ⊆ T, do not appear in the conclusion E of the derivation.
Then, a derivation of E from Γ′ exists, so we can write Γ′ ⊢_OI E and hence Γ′ ⊨_OI Cσ. Now, Γ′ is a set of ground instances of clauses in Σ and all terms in Γ′ are also in T; then, Γ′ ⊆ I(Σ, T). Thus, I(Σ, T) ⊨_OI Cσ.
(⇐)
Σ ⊨_OI I(Σ, T) holds, the instance set being made up of instances of clauses in Σ. By hypothesis, I(Σ, T) ⊨_OI Cσ, so Σ ⊨_OI Cσ and, by Lemma 9, Σ ⊨_OI C.
   □
The same does not hold for unconstrained implication, since in that case the replaced terms cannot be traced back. For OI implication, since different terms cannot be mapped onto the same term, a ground instance of a clause C in Σ can be transformed into another instance of C by means of an OI anti-substitution composed with an OI substitution.
Theorem 13 
(Decidability—General Case). Let Σ be a set of clauses and C a clause. The problem of whether Σ ⊨_OI C holds is decidable.
Proof. 
By Lemma 11, I(Σ, T) ⊨_OI Cσ is equivalent to Σ ⊨_OI C. But I(Σ, T) and Cσ are both ground, and OI implication between ground clauses is decidable (by Lemma 10).    □

5. Incremental Inductive Synthesis

Using the OI framework for the incremental inductive synthesis of theories brings several advantages. The process of abstract diagnosis and debugging of an incorrect theory involves a search in a space whose algebraic structure makes it easy to define algorithms that meet several desirable properties from the viewpoint of theoretical computer science, and specifically of the mathematical theory of computation (e.g., termination) [32], as well as of the theory of computational complexity [33,34]. Such algorithms embody two ideal refinement operators, one for generalizing incomplete clauses and the other for specializing inconsistent clauses.
In ILP, a standard practice to restrict the search space is to impose biases on it [44]. In the following, we are concerned exclusively with logic theories expressed as hierarchical programs, that is, as (non-recursive) programs made up of Datalog linked clauses. Under OI, we can give the following definitions, where E⁻ and E⁺ denote the sets of all the negative and positive examples (respectively).
Definition 15 
(Inconsistency).
  • A theory T is inconsistent iff ∃H ∈ T, ∃N ∈ E⁻: H is inconsistent with respect to N.
  • A hypothesis H is inconsistent with respect to N iff ∃C ∈ H: C is inconsistent with respect to N.
  • A clause C is inconsistent with respect to N iff ∃σ such that:
    1. body(C)σ ⊆ body(N);
    2. ¬head(C)σ = head(N);
    3. constraints(C_OI)σ ⊆ constraints(N_OI);
    where body(φ) and head(φ) denote, respectively, the body and the head of a clause φ.
Note that the ≤_OI notation cannot be used, since the same σ must satisfy Conditions 1, 2, and 3 simultaneously. If at least one of these conditions is not met, C is consistent with respect to N.
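To make the definition concrete, the following sketch searches for such a σ by brute force over injective groundings. It covers Conditions 1 and 2 only (Condition 3 on the OI constraints is omitted for brevity), and the brothers/parent predicates are invented for the example:

```python
from itertools import permutations

def subst(atom, s):
    p, args = atom
    return (p, tuple(s.get(a, a) for a in args))

def inconsistent(head_c, body_c, head_n, body_n, consts_n):
    """Definition 15, Conditions 1-2 only: C is inconsistent w.r.t. the
    negative example N iff some injective (OI) grounding sigma maps head(C)
    onto head(N) and body(C) into body(N)."""
    atoms = [head_c, *body_c]
    xs = sorted({a for _, args in atoms for a in args if a[0].isupper()})
    for image in permutations(consts_n, len(xs)):  # injective = OI substitution
        s = dict(zip(xs, image))
        if subst(head_c, s) == head_n and \
           {subst(b, s) for b in body_c} <= set(body_n):
            return True
    return False

# C: brothers(X,Y) :- parent(Z,X), parent(Z,Y).
h = ("brothers", ("X", "Y"))
b = [("parent", ("Z", "X")), ("parent", ("Z", "Y"))]
# Under OI, no sigma can map both X and Y onto 'tom', so C stays consistent:
print(inconsistent(h, b, ("brothers", ("tom", "tom")),
                   [("parent", ("ann", "tom"))], ["ann", "tom"]))      # False
print(inconsistent(h, b, ("brothers", ("tom", "bob")),
                   [("parent", ("ann", "tom")), ("parent", ("ann", "bob"))],
                   ["ann", "bob", "tom"]))                             # True
```

The first call also illustrates the intuition of Section 6.1: under OI, "brothers" can never be satisfied by a single individual.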
Definition 16 
(Incompleteness).
  • A theory T is incomplete iff ∃H ∈ T, ∃P ∈ E⁺: H is incomplete with respect to P.
  • A hypothesis H is incomplete with respect to P iff ∃C ∈ H: ¬(P ⊨_OI C).
Otherwise, it is complete with respect to P.
When the aim is to incrementally develop a logic program that should be correct with respect to its intended model at the end of the development process, it becomes relevant to define operators that allow an incremental refinement of too-weak or too-strong programs [45].

5.1. Refinement Operators

In ILP, the definition of ideal refinement operators [34] is crucial. Such operators do exist in the space of Datalog clauses ordered by θ_OI subsumption, where infinite unbounded strictly ascending/descending chains do not exist, since equivalence among clauses coincides with variance [33,34].
Definition 17 
(Refinement operators under OI). Let C be a Datalog clause.
  • D ∈ ρ_OI(C) when exactly one of the following conditions holds:
    1. D = Cθ, where θ = {a/x}, a ∉ consts(C), x ∈ vars(C);
    2. D = C ∪ {¬l}, where l is an atom such that ¬l ∉ C.
  • D ∈ δ_OI(C) when exactly one of the following conditions holds:
    1. D = Cγ, where γ = {x/a}, a ∈ consts(C), x ∉ vars(C);
    2. D = C \ {¬l}, where l is an atom such that ¬l ∈ C.
In other words, a specialization of a clause can be obtained by substituting a variable with a (new) constant or by adding a literal to its body; a generalization of a clause can be obtained by substituting a constant with a (new) variable or by removing a literal from its body.
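A minimal executable sketch of the two operators may help. The clause encoding (sign-tagged literal tuples, uppercase-initial variables) and the explicit pools of candidate constants, body atoms, and variable names are assumptions of the sketch; a real system would draw them from the language bias:

```python
def vars_of(C):
    return {t for _, _, args in C for t in args if t[0].isupper()}

def consts_of(C):
    return {t for _, _, args in C for t in args if not t[0].isupper()}

def replace_term(C, old, new):
    return frozenset((s, p, tuple(new if t == old else t for t in args))
                     for s, p, args in C)

def rho_oi(C, const_pool, atom_pool):
    """Downward refinements (Definition 17): bind a variable to a constant
    not yet in the clause (item 1), or add one body literal (item 2)."""
    out = set()
    for x in vars_of(C):
        for a in const_pool:
            if a not in consts_of(C):
                out.add(replace_term(C, x, a))          # theta = {a/x}
    for p, args in atom_pool:
        if ("-", p, args) not in C:
            out.add(C | {("-", p, args)})               # add a negative literal
    return out

def delta_oi(C, var_pool):
    """Upward refinements: turn a constant into a fresh variable (item 1),
    or drop one body literal (item 2)."""
    out = set()
    for a in consts_of(C):
        for x in var_pool:
            if x not in vars_of(C):
                out.add(replace_term(C, a, x))          # gamma = {x/a}
    for lit in C:
        if lit[0] == "-":
            out.add(C - {lit})
    return out

# C: p(X) :- q(X,Y), encoded as sign-tagged literal tuples
C = frozenset({("+", "p", ("X",)), ("-", "q", ("X", "Y"))})
print(len(rho_oi(C, ["a"], [("r", ("X",))])))  # 3: bind X->a, bind Y->a, add not-r(X)
```

Note how injectivity shows up implicitly: each step binds a single variable to a *new* constant, so two variables can never collapse onto the same term.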
By Condition 2 in the definition of δ_OI, all possible generalizations of a clause must be (non-empty) subsets of its literals, which are 2^|C| − 1. Applying Condition 1 to each of them, a clause containing n constants can yield Σ_{k=0}^{n} (n choose k) = 2^n generalizations of each subset. Excluding the clause itself, the total number of proper generalizations |PGEN_OI(C)| can be bounded as follows:
|PGEN_OI(C)| ≤ (2^|C| − 1) · Σ_{k=0}^{|consts(C)|} (|consts(C)| choose k) = 2^(|C| + |consts(C)|) − 2^|consts(C)|
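The bound can be checked by exhaustive enumeration on a small clause. The sketch below, where generalizations are generated by keeping a non-empty subset of literals and turning a subset of the constants into distinct fresh variables, mirrors the counting argument rather than the δ_OI operator itself:

```python
from itertools import combinations

def generalizations(clause, consts):
    """All generalizations under OI: a non-empty subset of the literals, with
    a subset of the constants variabilized into *distinct* fresh variables."""
    seen = set()
    lits = sorted(clause)
    for k in range(1, len(lits) + 1):
        for keep in combinations(lits, k):
            for j in range(len(consts) + 1):
                for varied in combinations(sorted(consts), j):
                    ren = {c: f"V{i}" for i, c in enumerate(varied)}
                    seen.add(frozenset((p, tuple(ren.get(t, t) for t in args))
                                       for p, args in keep))
    return seen

C = {("p", ("a",)), ("q", ("a", "b"))}
gens = generalizations(C, {"a", "b"})
bound = 2 ** (len(C) + 2) - 2 ** 2          # 2^(|C|+|consts|) - 2^|consts| = 12
print(len(gens) - 1, "<=", bound)           # 9 <= 12 (duplicates trimmed)
```

The enumeration yields fewer items than the bound because some variabilizations coincide (e.g., variabilizing b in a literal where b does not occur changes nothing), which is exactly why the formula is an upper bound.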
Proposition 12 
([34]). The refinement operators in Definition 17 are ideal for Datalog clauses ordered by θ O I subsumption.
It is possible to define upper bounds on the number of refinement steps required to reach a clause D starting from a clause C:
  • D ∈ ρ_OI^(n+t)(C), where n = |consts(D)| − |consts(C)| ≥ 0 and t = |D| − |C|;
  • D ∈ δ_OI^(n+t)(C), where n = |consts(C)| − |consts(D)| ≥ 0 and t = |C| − |D|.
Some redundancy is present in the defined refinement operators since, by Item 2, they can add (or remove) literals containing new constants (or constants not occurring elsewhere in the clause), a task already covered by Item 1. This can be avoided, obtaining less sparse search graphs at the cost of an additional condition in the refinement items, by modifying Item 2 as follows, obtaining the modified operators ρ′_OI and δ′_OI, respectively:
ρ′_OI: 2′.
D = C ∪ {¬l}, where l is an atom such that ¬l ∉ C and consts(l) ⊆ consts(C)
δ′_OI: 2′.
D = C \ {¬l}, where l is an atom such that ¬l ∈ C and consts(l) ⊆ consts(C \ {¬l})
Inspired by a previous concept first given by Reynolds and extended to clauses by Shapiro, we have a measure for the complexity of a clause:
Definition 18 
(size_OI). The size of a clause C under OI is size_OI(C) = |C| + |consts(C)|,
i.e., the number of literals in C plus the number of distinct constants occurring in C. It allows one to predict the exact number of steps required to perform a refinement, based only on the syntactic structure of the clauses involved (which could be known or bounded a priori). Given two clauses C and D, if C <_OI D, then
C ∈ δ_OI^k(D), D ∈ ρ_OI^k(C), with k = size_OI(D) − size_OI(C)
Example 9. 
C: p(X) :− q(X,Y).   D: p(X) :− q(X,Y), q(a,b).
Since size_OI(C) = 2 and size_OI(D) = 3 + 2 = 5, the number of steps required to refine C into D (or vice versa) is size_OI(D) − size_OI(C) = 3, i.e., D ∈ ρ_OI³(C) (respectively, C ∈ δ_OI³(D)):
C = p(X) :− q(X,Y).  →(2)  p(X) :− q(X,Y), q(W,Z).  →(1)  p(X) :− q(X,Y), q(a,Z).  →(1)  p(X) :− q(X,Y), q(a,b). = D
(the label on each step indicates which item of the operator definition is applied).
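The size computation in this example can be scripted directly (the literal-tuple encoding, with lowercase-initial terms as constants, is an assumption of the sketch):

```python
def size_oi(clause):
    """size_OI(C) = |C| + |consts(C)| (Definition 18); clauses are sets of
    (predicate, args) literal tuples, lowercase-initial terms are constants."""
    consts = {t for _, args in clause for t in args if not t[0].isupper()}
    return len(clause) + len(consts)

# Example 9: C: p(X) :- q(X,Y).   D: p(X) :- q(X,Y), q(a,b).
C = {("p", ("X",)), ("q", ("X", "Y"))}
D = C | {("q", ("a", "b"))}
print(size_oi(C), size_oi(D), size_oi(D) - size_oi(C))  # 2 5 3
```

The difference, 3, is exactly the number of refinement steps traced in the chain above: one literal addition plus two constant bindings.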

5.2. Refinement in Unrestricted Search Spaces

Through OI substitutions, the previous refinement operators can be extended to the case of unrestricted search spaces ordered by θ_OI subsumption (with functions).
Definition 19 
(Refinement operators in unrestricted spaces). Let C, D be clauses.
D ∈ ρ_OI(C) when exactly one of the conditions in Definition 17 or the following one holds:
3. D = Cθ, where θ = {f(Y₁, …, Yₙ)/x}, f a function symbol (n > 0) and x ∈ vars(C);
D ∈ δ_OI(C) when exactly one of the conditions in Definition 17 or the following one holds:
3. D = Cγ, where γ = {x/f(Y₁, …, Yₙ)}, f a function symbol (n > 0) such that f(Y₁, …, Yₙ) ∈ terms(C), and x ∉ vars(C).
Again, these operators are ideal for this search space. The proof requires two lemmas:
Lemma 12. 
∀C, D clauses such that C <_OI D: if C_OIθ = D_OI for some OI substitution θ, then there exists a ρ_OI-chain from C to D.
Proof. 
Let C_OI = C₀, …, Cₙ = D_OI be a chain of clauses such that Cᵢ = Cᵢ₋₁θᵢ, i = 1, …, n, where every substitution θᵢ is defined as in Items 1 and 3 in the definition of ρ_OI (here, renaming substitutions are left out for brevity). Thus, Cᵢ ∈ ρ_OI(Cᵢ₋₁), i = 1, …, n.    □
Lemma 13. 
∀C, D clauses such that C ⊆ D: there exists a ρ_OI-chain from C to D.
Proof. 
By induction on the number n of literals in D \ C.
(base)
If n = 0, then C = D; thus, the empty chain fulfills the lemma.
(step)
Assume that for some k, 0 ≤ k < n, there is a ρ_OI-chain from C to a clause C_k, where C_k is obtained from C by adding k literals contained in D. Now, let l ∈ D \ C_k and C_{k+1} = C_k ∪ {l}. Since l ∈ D \ C_k, then l ∉ C_k; then, it can be used to refine C_k according to Item 2 in the definition of ρ_OI. Hence, we obtain C_{k+1} ∈ ρ_OI(C_k). By the inductive hypothesis, there is also a chain from C to C_k. Thus, the lemma is satisfied for k + 1.
   □
Theorem 14 
(Ideality of ρ_OI, δ_OI). In an unrestricted clausal search space, ρ_OI and δ_OI are ideal refinement operators.
Proof. 
(local finiteness)
Trivial since, by definition, a term dominated by an n-ary function (n ≥ 0) is turned into a variable (or vice versa), or a single literal is added (or removed).
(properness)
If D ∈ ρ_OI(C), by definition of downward refinement operator, C ≤_OI D. If also D ≤_OI C held, then C ~_OI D. Hence, by Item 3 in Property 5, in the unrestricted space the clauses would be variants, which is false by construction of the operator (either D contains a term that C does not, or D is longer than C). The case is analogous for the other operator.
(completeness)
Let C and D be two clauses such that C ≤_OI D. We have to prove the existence of a chain from C to D (clauses equivalent to D are omitted here, since they are variants). Hence, ∃θ: Cθ ⊆ D. Let E = Cθ. By Lemma 12, there is a ρ_OI-chain from C to E. By definition of E, it also holds that E ⊆ D and, by Lemma 13, there is a ρ_OI-chain from E to D. Thus, we have a chain from C to D, which proves that ρ_OI is complete. An analogous proof holds for the completeness of δ_OI (lemmas similar to Lemmas 12 and 13 are needed).
   □

5.3. Refinement Operators for OI Implication

When dealing with non-ambivalent clauses, generalization and specialization under implication (or OI implication) correspond to the cases considered for θ subsumption (or θ_OI subsumption). Thus, an operator computing the resolution and inverse resolution steps for ambivalent clauses is needed. Powers and roots of a clause [46] (operations in which a clause is resolved with itself) can be regarded as theoretical refinement operators.
Definition 20 
(Clause Power/Root). Let C, D be clauses. D is an n-th power of C (or C is an n-th root of D) iff D is a variant of a clause in L_OI^{n−1}({C}), n ≥ 1.
Example 10. 
Given C: p(X) :− p(f(X)), q(X,Y),
  • p(X) :− p(f²(X)), q(X,Y), q(f(X),Z) is a second power of C;
  • p(X) :− p(f³(X)), q(X,Y), q(f(X),Z), q(f²(X),W) is a third power of C.
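The shape of the n-th powers in Example 10 follows a simple pattern, which the sketch below generates symbolically rather than by running an actual OI resolution engine (the variable naming Y0, Y1, … is an assumption; the paper's example uses Y, Z, W):

```python
def fpow(n, t):
    """The nested term f^n(t), written out as a string."""
    return t if n == 0 else f"f({fpow(n - 1, t)})"

def power(n):
    """n-th power of C: p(X) :- p(f(X)), q(X,Y): the head of a fresh copy of C
    is resolved (by an OI unifier) against the deepest p-literal of the
    partial resolvent, n - 1 times."""
    body = [f"p({fpow(n, 'X')})"] + [f"q({fpow(k, 'X')},Y{k})" for k in range(n)]
    return "p(X) :- " + ", ".join(body)

print(power(2))  # p(X) :- p(f(f(X))), q(X,Y0), q(f(X),Y1)
print(power(3))  # p(X) :- p(f(f(f(X)))), q(X,Y0), q(f(X),Y1), q(f(f(X)),Y2)
```

Each self-resolution step deepens the recursive literal by one application of f and accumulates one more q-literal, which is why the n at which to stop cannot be bounded a priori.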
By the subsumption theorem, downward refinements of a clause C in a search space ordered by OI implication can be obtained by self-resolving C n times (n-th power) or by applying a downward refinement operator for θ_OI subsumption. Conversely, upward refinements require computing an n-th root of C or exploiting an upward refinement operator for θ_OI subsumption. Neither case is practically useful, since the n at which to stop the process of self-resolution is not known a priori. Moreover, while it is clear how to compute n-th powers by using linear resolution, the dual is a more complex task, since it involves the inversion of resolution steps. We now tackle this latter problem.

5.3.1. Inverting OI Resolution

We adapt to OI implication the “or-introduction” technique developed in [43] to build parent clauses given the resolvent, and generalize it to build OI ancestors. The resolvent of two clauses C and D is R = ((C \ {l_C}) ∪ (D \ {l_D}))θ, where θ = mgu_OI(l_C, l̄_D). The most specific parent clauses are those that inherit all literals from the OI resolvent. So, θ must be an mgu_OI also for {C \ {l_C}, D \ {l_D}}:
(C \ {l_C})θ = (D \ {l_D})θ = R
Thus, the two ambivalent parent clauses are obtained by introducing a new literal as
C = R ∪ {l} and D = R ∪ {l̄}, where l = l_C = l̄_D
Proposition 13. 
Let R be a clause and l a literal. Then, {R} ~_OI {R ∪ {l}, R ∪ {l̄}}.
Proof. 
See ([43], page 39). The “⊨_OI” direction is based on the empty substitution, which is injective. The opposite direction holds since OI resolution is sound, being a special case of resolution (cf. Theorem 6).    □
Another useful result can be transposed straightforwardly from the corresponding one for implication.
Proposition 14. 
Let C, D be clauses and R an OI resolvent of C and D. Then, there exists a literal l such that C ≤_OI R ∪ {l} and D ≤_OI R ∪ {l̄}.
Proof. 
Similar to ([43], page 39), but considering OI substitutions: by definition of OI resolution, we have R = ((C \ {a}) ∪ (D \ {b}))θ, where θ = mgu_OI(a, b̄). Now, let l = aθ = b̄θ (with θ injective, so it can be eliminated); then,
R ∪ {l} = (C ∪ (D \ {b}))θ ⊇ Cθ and R ∪ {l̄} = ((C \ {a}) ∪ D)θ ⊇ Dθ.    □
This technique can be iterated to compute clauses from which R follows in many steps, progressively introducing new literals: given {R ∪ {l}, R ∪ {l̄}}, either of the two parent clauses can be inverted; applying or-introduction to the former, we have {R ∪ {l} ∪ {m}, R ∪ {l} ∪ {m̄}, R ∪ {l̄}}. Generalizing, we obtain the following.
Definition 21 
(Or-introduced Clause [43]). Let C be a clause and Ω a sequence of literals. Then, a set of clauses S is or-introduced from C by Ω iff
(a) 
S = {C} and Ω = ⟨⟩, or
(b) 
S = (S′ \ {D}) ∪ {D ∪ {l}, D ∪ {l̄}} and Ω = ⟨l₁, …, lₙ, l⟩, where S′ is a set of clauses or-introduced from C by ⟨l₁, …, lₙ⟩ and D ∈ S′.
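One or-introduction step is easy to mechanize; in this sketch clauses are frozensets of string literals and "~" marks negation (a representational assumption):

```python
def or_introduce(S, D, l):
    """One or-introduction step (Definition 21, case b): replace clause D in S
    with the pair D + {l}, D + {~l}; by Theorem 15 the result is logically
    equivalent to S under OI."""
    neg = l[1:] if l.startswith("~") else "~" + l
    return (S - {D}) | {D | {l}, D | {neg}}

C = frozenset({"p(a)"})                       # case (a): start from S = {C}
S1 = or_introduce({C}, C, "q(b)")             # two parents of a resolution step
S2 = or_introduce(S1, C | {"q(b)"}, "r(c)")   # invert one branch further
print(len(S1), len(S2))                       # 2 3
```

Each step grows the set by one clause, mirroring the binary tree of leaves used in the proof of Theorem 17 below.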
Logical equivalence still holds through this step of inversion.
Theorem 15. 
Let S be a set of clauses or-introduced from a clause C. Then, S ~_OI {C}.
Proof. 
See ([43], page 41). It follows by induction on the number of or-introduced literals, using Proposition 13.    □
A sequence of resolution steps can be inverted by applying or-introduction to a sequence of literals.
Lemma 14. 
Let C, D, E be clauses and {D₁, …, Dₙ} a set of clauses or-introduced from D such that D ≤_OI E and ∀i = 1, …, n: C ≤_OI Dᵢ. Then, there exists a set of clauses {E₁, …, Eₙ} or-introduced from E such that ∀i = 1, …, n: C ≤_OI Eᵢ.
Proof. 
Similar to ([43], page 41). Since the Dᵢs are or-introduced from D, Dᵢ = D ∪ Λᵢ. Now, ∀i ∈ {1, …, n}: C ≤_OI Dᵢ means that there is an OI substitution θᵢ such that Cθᵢ ⊆ Dᵢ = D ∪ Λᵢ, and D ≤_OI E means that there is an OI substitution σ such that Dσ ⊆ E. Hence, Cθᵢσ ⊆ (D ∪ Λᵢ)σ ⊆ (E ∪ Λᵢσ). Let Eᵢ = E ∪ Λᵢσ; then, ∀i ∈ {1, …, n}: C ≤_OI Eᵢ.    □
Theorem 16 
(Inversion of OI Resolution). Let T be a set of clauses and D ∈ L_OI^n(T). Then, there exists a set S of clauses or-introduced from D such that ∀E ∈ S, ∃C ∈ T: C ≤_OI E.
Proof. 
See ([43], pages 41–42). Let Sₙ be or-introduced from Dₙ by Ωₙ = ⟨l₁, …, lₙ₋₁⟩. By induction on n:
(base)
For n = 1, S₁ = {D₁} is or-introduced from D₁ by Ω₁ = ⟨⟩. Then, of course, D₁ ≤_OI D₁, with D₁ ∈ T;
(step)
By the inductive hypothesis, there exist a sequence of literals Ω_k = ⟨l₁, …, l_{k−1}⟩ and a set S_k of clauses or-introduced from D_k by Ω_k such that ∀E_k ∈ S_k, ∃C ∈ T: C ≤_OI E_k.
By definition of linear OI resolution, D_{k+1} is an OI resolvent of D_k and some C ∈ T. Then, by Proposition 14, there exists a literal l such that C ≤_OI D_{k+1} ∪ {l} and D_k ≤_OI D_{k+1} ∪ {l̄}.
By the inductive hypothesis and by Lemma 14, there exists a set S′_k of clauses or-introduced from D_{k+1} ∪ {l̄} by Ω_k = ⟨l₁, …, l_{k−1}⟩ such that ∀E′_k ∈ S′_k, ∃C ∈ T: C ≤_OI E′_k.
Thus, S_{k+1} = {D_{k+1} ∪ {l}} ∪ S′_k is a set of clauses or-introduced from D_{k+1} by Ω_{k+1} = ⟨l, l₁, …, l_{k−1}⟩ such that ∀E_{k+1} ∈ S_{k+1}, ∃C ∈ T: C ≤_OI E_{k+1}.
   □
Restricting to the case of a single clause:
Corollary 4. 
Let C be a clause and D ∈ L_OI^n({C}). Then, there exists a set S of clauses or-introduced from D such that S ~_OI {D} and ∀E ∈ S: C ≤_OI E.

5.3.2. Expansions

Section 5.3.1 makes an effort to reduce resolution to subsumption mechanisms. Thus, we come to the actual computation of upward refinements of clauses by using the following notion that can be regarded as the inverse reduction with respect to logical implication.
Definition 22 
(Expansion under OI). Let C be a clause and Ω a sequence of literals. A clause E is an expansion under OI (or OI expansion) of C by Ω iff E is in the lgg_OI of a set of clauses or-introduced from C by Ω.
This expansion technique is practically infeasible for unrestricted clausal spaces, since it leads to an exponential growth of the computed expansion [43]: if n is the number of literals or-introduced to compute an expansion E of a clause C such that |C| = m, then the maximal cardinality of E is (m + n)^{n+1}. This cannot happen under OI:
Theorem 17 
(Maximal Cardinality of OI Expansions). Let C be a clause with |C| = m, S a set of clauses or-introduced from C by ⟨l₁, …, lₙ⟩, and E in the lgg_OI of S. Then, the maximum length of E is m + log₂(n).
Proof. 
Any lgg_OI of a set of clauses, including E, cannot be longer than the shortest clause in the set. Each or-introduction replaces a clause of length l with two clauses of length l + 1, thus yielding a binary tree. In particular, the clauses in a set Sᵢ are the leaves of the tree obtained after i or-introductions, and such clauses have length equal to m plus their depth in the tree. As a consequence, the longest lgg_OI is possible when the or-introductions coming from the given sequence are performed in the most uniform way possible, i.e., by or-introducing each time one of the shortest clauses obtained so far. In fact, the order according to which or-introductions are performed depends on the goal of the search process. If this leads to a non-balanced tree, some clause has a depth smaller than log₂(n), and that would be the new bound for the lgg_OI.
Taking this to the extreme, if each time the longest clause in the set is chosen for or-introduction, the final set S will contain a clause of length m + n and another one of length m + 1 (resulting from the first or-introduction), and hence the resulting lgg_OI will have length at most m + 1. Conversely, in the balanced case, after the last (the n-th) or-introduction, each leaf (i.e., each clause in S) will have depth at most log₂(n), and hence E, that is, the lgg_OI of all the clauses in S, will have length at most m + log₂(n).    □
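The two extreme strategies in this proof can be simulated to watch the bound emerge; the list-of-leaf-lengths model below is an illustrative abstraction of the or-introduction tree:

```python
def shortest_leaf(m, n, balanced=True):
    """Length of the shortest clause after n or-introductions on a clause of
    length m; the lgg_OI of the resulting set can be no longer than this.
    Splitting the shortest leaf each time (balanced tree) realizes the worst
    case of Theorem 17; always splitting the longest leaf keeps a clause of
    length m + 1 around."""
    lengths = [m]                                  # lengths of the current leaves
    for _ in range(n):
        l = min(lengths) if balanced else max(lengths)
        lengths.remove(l)
        lengths += [l + 1, l + 1]                  # D becomes D ∪ {l}, D ∪ {¬l}
    return min(lengths)

print(shortest_leaf(3, 7, balanced=True))   # 6: m + depth 3 of a balanced 8-leaf tree
print(shortest_leaf(3, 7, balanced=False))  # 4: m + 1
```

With m = 3 and n = 7 the balanced strategy yields 8 leaves of depth 3, matching the m + log₂(n) bound, while the greedy strategy caps the lgg_OI length at m + 1.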
Notably, the expansions of a clause are logically equivalent to it.
Theorem 18 
(Equivalence Preservation). Let C be a clause and E its OI expansion by some sequence Ω. Then, C ~_OI E.
Proof. 
Similar to ([43], page 44). By definition of OI expansion, there is a set S of clauses or-introduced from C by Ω such that E ∈ lgg_OI(S).
(E ⊨_OI C)
∀F ∈ S: E ≤_OI F and, by Proposition 8, E ⊨_OI F. So, {E} ⊨_OI S. By Theorem 15, S ~_OI {C}; hence, E ⊨_OI C.
(C ⊨_OI E)
By Theorem 15, {C} ⊨_OI S, i.e., ∀F ∈ S: C ⊨_OI F, and so there exists G such that C ⊢_OI G and G ≤_OI F. By the properties of lgg_OI, G ≤_OI E. Summing up, C ⊢_OI G and G ≤_OI E; by Proposition 8, C ⊨_OI G and G ⊨_OI E, so C ⊨_OI E.
   □
Theorem 19 
(OI Implication Reduced to θ_OI Subsumption). Let C be a clause and D a non-tautological clause such that C ⊨_OI D. Then, there exists an expansion E of D such that C ≤_OI E.
Proof. 
By definition of OI implication, there is a clause Dₙ ∈ L_OI^n({C}) such that Dₙ ≤_OI D. So, by Theorem 16, there is a set Sₙ of clauses or-introduced from Dₙ such that ∀Fₙ ∈ Sₙ: C ≤_OI Fₙ. Then, by Lemma 14, there is a set S of clauses or-introduced from D such that ∀F ∈ S: C ≤_OI F. By definition of lgg_OI, ∃E ∈ lgg_OI(S): C ≤_OI E.    □
This theoretical apparatus leads to the definition of the following operators [47].
Definition 23 
(ρ_OI, δ_OI). Given a clause C,
D ∈ δ_OI(C) iff there exists an expansion E of C such that D ∈ δ_OI(E);
D ∈ ρ_OI(C) iff there exists E ∈ L_OI^n({C}) for some n, such that D ∈ ρ_OI(E),
where the operators applied to E are those for θ_OI subsumption.
ρ_OI is proper and complete, owing to the properness and completeness of the underlying operator for θ_OI subsumption, but it is not locally finite because, as said, computing the n-th powers of a clause in the definition of ρ_OI is a merely theoretical device, since it is not known a priori at which n to stop.
As regards δ_OI, Theorem 17 proves that there is a limit on the length of the computed expansions. Moreover, with the modified refinement operators for θ_OI subsumption, δ′_OI and ρ′_OI, the number of steps necessary for the refinement can be computed from the syntax of the clauses involved. Redefining the operators under OI implication to exploit δ′_OI and ρ′_OI gives more control over the complexity of the refinement process.
Theorem 20 
(Ideality of δ O I and ρ O I ). In a clausal space ordered by OI implication, δ O I and ρ O I are ideal refinement operators.
Proof. 
δ_OI and ρ_OI are defined in terms of the corresponding operators for θ_OI subsumption, respectively.
(δ_OI local finiteness)
Holds by definition of δ_OI and by Theorem 19, which ensures the existence of an expansion as an lgg_OI of a set of or-introduced clauses (a singleton, in this case);
(δ_OI properness)
Follows from the properness of δ_OI for θ_OI subsumption (cf. Theorem 14);
(δ_OI completeness)
As for local finiteness, it comes from Theorem 19, ensuring the existence of an expansion, and from Proposition 12.
(ρ_OI properness)
As for δ_OI, using again Theorem 14;
(ρ_OI local finiteness and completeness)
The level-saturation procedure [9] that computes the linear self-resolution steps is needed. Call D the specialization to be computed.
For n = 1: L_OI^1({C}) = {C}. Only a θ_OI subsumption step is needed, and the ideal operator ρ_OI for θ_OI subsumption can be used.
For n > 1, suppose that D has not been computed up to step n − 1 and consider the case for n. A further linear OI resolution step can be computed as follows:
L_OI^n({C}) = { R_OI(E, F) | E ∈ L_OI^{n−1}({C}), F ∈ L_OI^k({C}), k < n }
and the ideal operator ρ_OI for θ_OI subsumption can be used to compute specializations of clauses in L_OI^n({C}). The computation is non-deterministic in the choice of F.
By construction, this procedure certainly finds the specialization under OI implication when it exists. The computation terminates using as a halting condition ∀R ∈ L_OI^n({C}): |R| > |D|. In fact, the cardinality of the clauses increases monotonically, both by resolution (|R_OI(E₁, E₂)| > |Eᵢ|, i = 1, 2, except when |E₁| ≤ 2 or |E₂| ≤ 2, in which case the OI resolvent can be reached with a simple θ_OI subsumption step) and by θ_OI subsumption (if C₁ ≤_OI C₂, then |C₁| ≤ |C₂|).
   □

6. Discussion

While this paper focused on providing a comprehensive account of the OI framework from the foundational and theoretical viewpoint, it is also important to discuss its practical consequences. This section provides motivations for investigating and using the OI framework from the viewpoints of intuition and efficiency.

6.1. Intuition

Many clues support the belief that the OI assumption is built into the human way of thinking. For instance, saying that “X and Y are brothers if they share a common parent”, one implicitly means (and everybody understands) that X and Y must be two different persons. And, defining a bicycle as “an object made up of, among other components, a wheel X and a wheel Y”, nobody would think of a monocycle, even if X and Y could be bound to the same wheel. In this respect, logic is counterintuitive. For example, in the space of FOL clauses, p(X,X) ∨ p(X,Y) ∨ p(Y,Z) clearly involves more objects than p(X,X); nevertheless, they are equivalent under θ subsumption, since the latter is clearly a subset of the former, and the former may become a subset of (in fact, equal to) the latter through the substitution binding both Y and Z to X.
Let us show a more practical example. The two structures in a block world shown in Figure 2 can be represented as follows:
E 1 :
blocks(obj1) :− part_of(obj1,p1), part_of(obj1,p2), on(p1,p2), cube(p1), cube(p2), small(p1), big(p2), black(p1), stripes(p2).
E 2 :
blocks(obj2) :− part_of(obj2,p3), part_of(obj2,p4), on(p3,p4), cube(p3), cube(p4), small(p3), big(p4), black(p4), stripes(p3).
Now, the lgg under θ subsumption between E 1 and E 2 is [27]
G:
blocks(X) :− part_of(X,X1), part_of(X,X2), part_of(X,X3), part_of(X,X4), cube(X4), on(X1,X2), cube(X1), cube(X2), cube(X3), small(X1), big(X2), black(X3), stripes(X4).
i.e., the most specific generalization of two structures, each involving two objects, involves four objects, which is quite counterintuitive! A person asked to tell what the two structures have in common would give two alternative answers: “a small cube on a big cube” or “a black cube and a striped cube”, each capturing only the portion of correspondences that are mutually consistent. This is exactly what the lgg_OI [40] yields:
G 1 :
blocks(X) :− part_of(X,X1), part_of(X,X2), on(X1,X2), cube(X1), cube(X2), small(X1), big(X2).
G 2 :
blocks(X) :− part_of(X,X1), part_of(X,X2), cube(X1), cube(X2), black(X1), stripes(X2).
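The counterintuitive classical result can be reproduced with a short sketch of Plotkin's lgg construction (a minimal illustration, not the implementation used in the cited systems): every pair of same-predicate literals is anti-unified, with a single shared table so that the same pair of terms always yields the same variable. Only the clause bodies are considered here.

```python
def lgg_clause(c, d):
    """Plotkin's lgg under θ subsumption: anti-unify every pair of
    same-predicate literals, sharing one table so that the same pair
    of terms is always replaced by the same variable."""
    table = {}
    def var_for(s, t):
        if s == t:
            return s
        return table.setdefault((s, t), f"V{len(table)}")
    out = set()
    for (p, *args1) in c:
        for (q, *args2) in d:
            if p == q and len(args1) == len(args2):
                out.add((p, *(var_for(a, b) for a, b in zip(args1, args2))))
    return out

e1 = [("part_of", "obj1", "p1"), ("part_of", "obj1", "p2"), ("on", "p1", "p2"),
      ("cube", "p1"), ("cube", "p2"), ("small", "p1"), ("big", "p2"),
      ("black", "p1"), ("stripes", "p2")]
e2 = [("part_of", "obj2", "p3"), ("part_of", "obj2", "p4"), ("on", "p3", "p4"),
      ("cube", "p3"), ("cube", "p4"), ("small", "p3"), ("big", "p4"),
      ("black", "p4"), ("stripes", "p3")]

g = lgg_clause(e1, e2)
print(len(g))  # 13 literals over 5 variables, as in G above
```

Running it on the bodies of E1 and E2 yields exactly the 13 literals of G, with one variable for the object and four for the parts: the four-object blow-up discussed above.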

6.2. Notes on Computational Complexity and Efficiency

Due to lack of space, a detailed account of the computational complexity of the OI-related algorithms proposed in this paper cannot be provided here. Generally speaking, the complexity of processing each example is related to the degree of indeterminacy in the relational descriptions and depends on the ILP system (e.g., see [48] for InTheLEx). Still, we may elaborate on several aspects that show why the OI paradigm promises more efficiency than the standard approach. Crucial in this direction is the fact that equivalent formulas must be variants under θOI subsumption, as explained in the following.
A first consequence has to do with the aforementioned existence of infinite strictly ascending/descending chains in the classical space of clauses ordered by θ subsumption. This is a much more relevant problem than computational complexity, since these chains may prevent termination of the refinement operators used in ML systems, and thus the very computability of generalizations/specializations [32]. On the contrary, the fact that equivalent formulas must be variants under θOI subsumption means that any proper generalization of a clause must be a proper subset of the clause (or, from the opposite perspective, that any proper superset of a clause is a proper specialization thereof). So, the possible generalizations of a clause are finite in number, and the number of literals in the examples represents a bound on the size of the generalizations to be searched by the ML systems. This guarantees termination, because whenever we add or remove a literal we actually move to a different semantic item in the space, and sooner or later we must stop (in generalization because we reach an incomplete clause or, ultimately, the empty set; in specialization because we reach an inconsistent clause).
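The finiteness argument can be made concrete: under OI, the proper generalizations of a clause of n literals are exactly its proper subsets (up to variable renaming), so their number is bounded by 2^n. A toy sketch, under the illustrative assumption that clauses are sets of literal tuples:

```python
from itertools import combinations

def generalizations_oi(clause):
    """Under OI, every proper generalization of a clause is a proper
    subset of its literals: the space below a clause is finite."""
    lits = list(clause)
    return [set(s) for r in range(len(lits))
            for s in combinations(lits, r)]

c = {("part_of", "X", "X1"), ("on", "X1", "X2"), ("cube", "X1")}
print(len(generalizations_oi(c)))  # 2^3 - 1 = 7 proper generalizations
```

Every refinement step removes (or, for specialization, adds) at least one literal, so any chain of refinements starting from a clause must terminate.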
A second consequence is that, while under θ subsumption (possibly infinitely many) syntactically different clauses might be equivalent, under OI every equivalence class includes only one clause (modulo variable renaming). In other words, the space of clauses under OI is more scattered than under θ subsumption; i.e., representing it as a graph, the number of nodes (clauses) and arcs (immediate generalization/specialization relationships) under OI is much larger. However, the meaning of every node and arc is much sharper than under θ subsumption, helping the refinement operators to better focus their search for a correct hypothesis. Regarding arcs specifically, this is good news for the purposes of ML, because learning is guided by the examples, and so only the generalizations/specializations that are compatible with the examples are to be explored, with the guarantee that every refinement step will lead us closer to the goal (the correct target concept). Regarding nodes, Lemma 1 states how many nodes are necessary under OI to represent one node in the unconstrained case. While this theoretical number may be large (due to the factorial), again one must consider that for ML purposes no explicit expansion or translation is required: only the clauses under OI that are compliant with the examples are generated, under the guidance of the examples, whose number is limited by the same considerations made above.

7. Operability: Systems and Applications

The OI framework is not just a theoretical matter. It also has an impact on the effectiveness and efficiency of ML systems, resulting in practical value. A study on the learning systems FOIL and FOCL found that, because they do not use the OI-compliant space, they cannot solve some learning tasks [31]. More generally, the traditional ILP systems proposed in the literature implicitly use the OI assumption when learning, but not when using the learned theory, resulting in inconsistent behavior. In recent years, work on OI has focused more on the development of systems and on their application to several problems.

7.1. OI-Based Systems

Algorithms implementing different versions of the OI-based refinement operators, and experiments showing their performance, can be found in [48,49,50,51]. The OI assumption was adopted in developing a series of ML systems, starting with the pair INDUBI/H [52] + INCR/H [53]: the former learned a first version of the theory from scratch, adopting a batch approach; the latter subsequently refined it, adopting an incremental approach, as new examples became available. Subsequently, it was embedded in the incremental ML system InTheLEx [54], which combines different inference strategies (deductive, inductive, abductive, and similarity-based) and implements the ideal operators for θOI subsumption in the space of Datalog clauses described in this paper. InTheLEx is the only fully and inherently incremental FOL ML system in the literature. An OI-compliant similarity measure for clauses, to be used for case-based reasoning, k-NN classification, clustering, and for guiding the search of the refinement operators, was presented in [48] and evaluated in several experiments, showing the effectiveness of the proposed operators and their efficiency (30% to 70% runtime savings). More recently, the OI framework was adopted in the state-of-the-art Process Mining and Management system WoMan [55,56] and in the automated MultiStrategy Reasoning engine GEAR [57]. WoMan is a declarative system covering the whole range of tasks related to process management: discovery, supervision, analysis, simulation, and prediction. It is the only fully incremental process mining system in the literature, and it was successfully applied to several real-world problems where other state-of-the-art systems failed. It embeds InTheLEx to learn pre- and post-conditions for process components, and it could learn successful models for several tough application domains, including ambient intelligence, education, and chess.
GEAR aims at bringing to cooperation in a single inference engine several strategies that are typically investigated in isolation in the literature: induction, deduction, abduction, abstraction, ontological reasoning, similarity-based reasoning, uncertain/probabilistic reasoning, argumentation, and analogy.
The operability of the OI framework is also proven by its successful application to various real-world tasks and domains, of which we discuss some representative samples in the next subsections. In all these cases, the OI-based systems obtain state-of-the-art or better performance, often in less time than their competitors.

7.2. Mutagenesis

Prediction of mutagenicity of chemical compounds is an interesting testbed for the application of ML techniques, because it is a relevant problem for biologists but laboratory tests are costly. The problem consists in distinguishing molecules into “active” and “non-active” ones with respect to mutagenicity. The research in [6] found that for some molecules classical regression-based techniques are unable to learn useful theories for mutagenicity, and domain experts were asked to define a (now classical) benchmark dataset of molecules suffering from this problem. The 188 molecules in the dataset are described with a total of 25,917 atoms, for an average of about 138 atoms per observation. We applied InTheLEx and the OI-based k-NN classifier to this dataset, running a 10-fold cross-validation. InTheLEx endowed with the basic operators obtained 86% accuracy, while using the similarity-guided version of the lggOI operator it reached slightly better predictive accuracy (87%) while saving 70% runtime, indicating that it could quickly identify the correct correspondence of sub-parts of the compound descriptions. The k-NN approach using the same OI-based similarity measure with k = 13 (the square root of 188) obtained an average predictive accuracy of 87.22%.
We also compared the OI-based approach to that of other systems in the literature. In the original paper on mutagenesis [6], Progol obtained 88% accuracy, comparable to our results. Concerning systems based on k-NN, the performance of RIBL was just above 70% in the original paper [58] (compared to about 62% for FOIL 6.2). In a subsequent paper [59], RIBL was endowed with different kernels and compared to other systems. RIBL endowed with k-NN reached 77%, while the best performing algorithm in that comparison was SMD (Sum of Minimum Distances) with about 84%. On an extended dataset of 205 molecules, the k-RNN system [60] was reported to have different performance for 32 different parameter settings (k = 1–20 and length of the saturated clause l = 2–5). In only three cases was its accuracy slightly better than ours (89.31% for l = 4 and k = 3 or 4, and 88.25% for l = 3 and k = 2), and in all these cases k was very low with respect to the classical square-root setting; for k = 15, the best performance reached by k-RNN was 85.72%.
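The k-NN scheme used in these experiments can be sketched as follows (a simplified illustration: a generic similarity function stands in for the OI-based clause similarity, and the default k follows the square-root rule mentioned above):

```python
import math
from collections import Counter

def knn_classify(query, examples, similarity, k=None):
    """Classify `query` by majority vote among its k most similar
    labeled examples; by default k = floor(sqrt(#examples)), the
    classical square-root rule (13 for the 188 mutagenesis molecules)."""
    if k is None:
        k = max(1, math.isqrt(len(examples)))
    ranked = sorted(examples, key=lambda ex: similarity(query, ex[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy usage with Jaccard similarity over feature sets.
jaccard = lambda a, b: len(a & b) / len(a | b)
data = [({"a", "b"}, "active"), ({"a"}, "active"),
        ({"c"}, "inactive"), ({"c", "d"}, "inactive")]
print(knn_classify({"a"}, data, jaccard))  # "active"
```

Note that math.isqrt(188) = 13, matching the k used for the mutagenesis dataset.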

7.3. Document Image Processing

The problem of Document Image Classification consists in identifying the type of a document based only on its layout, i.e., without reading its content. It is important for libraries, archives, administrations, and other kinds of document repositories, because documents can be selected, organized, and forwarded to the appropriate processing steps while saving computational resources. It is challenging because the document layout can be highly variable and its descriptions are characterized by significant indeterminacy (the problem, in FOL-based ML, of identifying the correct mapping between objects in different descriptions). InTheLEx was applied to several document image classification datasets. We report here on two of them, in two very different but challenging settings: contemporary documents (namely, scientific papers) and historical documents.

7.3.1. Scientific Papers

Experiments on scientific papers were run on a real-world dataset (available at http://lacam.di.uniba.it/systems/inthelex/index.htm#datasets, consulted on 28 March 2025) containing 353 descriptions of scientific papers belonging to four different classes, some of which had very similar layout styles: 52 from Elsevier journals, 75 from the Springer–Verlag Lecture Notes series (SVLN), 95 from the Journal of Machine Learning Research (JMLR), and 131 from the Machine Learning Journal (MLJ) [48]. Only the first page of each paper was described, as the most relevant to identify the paper’s type. The 353 documents were described with a total of 67,920 literals, for an average of more than 192 literals per description (some included more than 400 literals). Effectiveness was evaluated using a stratified 10-fold cross-validation (i.e., we ensured that the proportion of examples from the different classes in each fold was the same as in the overall dataset).
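The stratified split can be sketched as follows (an illustrative helper, not the actual experimental code; a round-robin assignment within each class keeps the per-fold class proportions close to the global ones):

```python
from collections import defaultdict

def stratified_folds(labeled, k=10):
    """Split (item, label) pairs into k folds, preserving class
    proportions: items of each class are dealt out round-robin."""
    by_class = defaultdict(list)
    for item, label in labeled:
        by_class[label].append((item, label))
    folds = [[] for _ in range(k)]
    for items in by_class.values():
        for i, ex in enumerate(items):
            folds[i % k].append(ex)
    return folds

# Toy check: 20 'a' and 30 'b' items split into 5 folds of 4 + 6 each.
data = [(i, "a") for i in range(20)] + [(i, "b") for i in range(30)]
folds = stratified_folds(data, k=5)
print([sum(1 for _, y in f if y == "a") for f in folds])  # [4, 4, 4, 4, 4]
```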
First, we ran experiments on model-based concept learning, using InTheLEx under two different settings: with the basic generalization operator (B) and with the similarity-guided generalization operator (S). In addition to Predictive Accuracy, we also used the F-measure (with β = 1, so as to equally weight Precision and Recall) as a performance measure, to check effectiveness on both positive and negative examples (indeed, the dataset is not balanced: since each positive example for a class is a negative example for the other classes, the negative examples for a class are about three times the positive ones). Table 2 summarizes the outcomes, averaged over the 10 folds, reporting in Column Cl the number of clauses in the learned theories. The similarity-driven version outperformed the basic one on both measures; overall, in the 40 runs it saved 1.15 h, resulting in a 98% average accuracy (+1% with respect to the basic version) and 96% average F-measure (+2% with respect to the basic version).
Then, we turned to the k-NN approach. k was set, as recommended by the literature, to the square root of the number of learning instances, i.e., 17. Note that the classification was multi-class, so ties are possible. Rows labeled k in Table 2 report the average accuracy. The overall accuracy of 94% shows that the OI-based similarity approach is very effective. A deeper analysis revealed that, in the correct cases, very often almost all of the nearest neighbors were from the same class. Classes Elsevier and MLJ had 100% accuracy (in every single fold as well); high accuracy rates were also reached on SVLN and JMLR. An analysis of the errors revealed that MLJ is quite distinct from the other classes, while Elsevier, although well recognizable in itself, lies somewhere between JMLR and SVLN, which are also close to each other.
We also ran experiments on Conceptual Clustering, based on the classical K-means algorithm, using medoids as cluster representatives (centroids cannot be used because FOL formulas do not induce a Euclidean space), asking for four clusters and stopping the procedure when a new iteration returned a partition already seen in previous iterations. Computing each similarity took 2.27 s on average, a reasonable time considering the complexity of the descriptions. Since the correct class of each document in the dataset is known, we associated each cluster with the class with maximum overlap. Results are reported in Table 3: for each dataset size, it reports the number of instances in each cluster (column “Size”) and the corresponding class. Precision (P) and Recall (R) values are all above 80%, showing the great effectiveness of the approach. A qualitative evaluation of the clustering outcomes revealed that errors were made on very ambiguous documents. Compound statistics are reported in the last row, for an overall purity (the clustering counterpart of accuracy) of 92.35%, comparable to the results of supervised learning. Preliminary comparisons with other classical measures (Jaccard’s, Tversky’s, and Dice’s) report an improvement of up to +5.48% for Precision, up to +8.05% for Recall, and up to +2.83% for Purity.
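Two ingredients of this clustering setup are easily sketched (illustrative code, with a generic distance in place of the OI-based measure): the medoid as cluster representative, needed because FOL descriptions do not induce a Euclidean space, and purity as the clustering counterpart of accuracy.

```python
from collections import Counter

def medoid(cluster, dist):
    """Representative for non-Euclidean spaces: the element with
    minimum total distance to the rest of the cluster."""
    return min(cluster, key=lambda x: sum(dist(x, y) for y in cluster))

def purity(clusters, label_of):
    """Fraction of items matching the majority class of their cluster."""
    total = sum(len(c) for c in clusters)
    hits = sum(Counter(label_of[x] for x in c).most_common(1)[0][1]
               for c in clusters)
    return hits / total

# Toy usage with a 1-D distance and two clusters.
d = lambda x, y: abs(x - y)
print(medoid([1, 2, 10], d))  # 2
print(purity([[1, 2, 3], [4, 5]],
             {1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}))  # 0.8
```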

7.3.2. Historical Documents

Experiments on historical documents were run on a dataset coming from the EU IST-1999-20882 project COLLATE (“Collaboratory for Annotation, Indexing and Retrieval of Digitised Historical Archive Materials”, 2000–2002), consisting of 114 digitized historical documents related to film censorship and produced between the two World Wars. The aim was to train models for recognizing two document classes: registration cards from the Filmarchiv Austria (FAA) and censorship decision reports from the Deutsches Filminstitut (DIF). Specifically, the dataset included 34 FAA registration cards, 19 DIF censorship decisions, and 61 reject documents, obtained from newspaper articles and DIF registration cards. Samples of first and last pages of these documents are reported in Figure 3, where the complexity of the documents can be appreciated (they carry stamps, signatures, typed text, pencil signs, etc.): FAA registration cards are on the top-left, DIF censorship decisions on the right, and reject documents on the bottom-left. Note that some reject documents (DIF registration cards) are very similar to FAA registration cards, making the task even more challenging. When expressed in FOL, the description length ranged from 40 to 379 atoms per observation/example [61].
The results of a 10-fold cross-validation evaluation are shown in Table 4. Columns “Clauses” and “Length” report, averaged over the 10 folds, the number of clauses in the generated theories and the number of literals in the clauses, respectively. We note that the characterizing and discriminating features of the complex layout models for these documents could essentially be captured in just one clause for FAA registration cards and about one clause for DIF censorship decisions. The model length ranged between one and two dozen literals, which ensures human readability and understandability of the models. The average runtime for learning the models, reported in the third column (in seconds), shows that efficiency is sufficient for practical real-world applications. Finally, the last three columns report the average accuracy of the learned models (overall and for positive and negative examples only), which is perfect for FAA documents and nearly perfect for DIF ones. Also, note that only a few dozen documents per class were needed to learn such high-quality models.

7.4. Process Mining and Management

Process Management is relevant because one can automatically check the correctness of process executions based on models of the intended process. Process Mining is relevant because such models can be learned automatically from observations of process executions, saving money with respect to manual development and obtaining models that directly reflect actual practice. These models can also be useful for analyzing current practice in order to identify problems and improve it, provided that the models are human-understandable. For this purpose, declarative approaches, adopting logic-like representations, are more suitable. A source of complexity in the process mining task is that only positive examples are available to learn the models, so the safeguard role played by negative examples in ML is missing.
While process management is typically applied to industrial or administrative processes that are quite stable and strictly defined, real life is pervaded by processes with much higher complexity and variability. Most systems available in the literature are unable to represent or learn complex models in real-life situations, which motivated us to develop the WoMan framework and system, which is declarative and based on OI. WoMan’s models consist of two main kinds of components: “tasks”, specifying the allowed types of activities, and “transitions”, expressing the permitted task combinations. In the following subsections, we report a few experiments in different domains.

7.4.1. Complex Artificial Process Models

To test WoMan under tough conditions, we ran controlled experiments that allowed us to tune the amount and kind of complexity in the models. A total of 11 very complex artificial workflow models were built and, following [62], for each model 1000 training cases were generated at random and used as training examples to learn the models from scratch. Very interestingly, in all training problems just 50 training cases were sufficient for WoMan to learn a complete and correct model, confirming the usefulness of the incremental approach and its ability to quickly learn accurate models from very few examples.
We compared WoMan to two state-of-the-art systems with very different characteristics: a more traditional one, Little Thumb [62], and one based on genetic algorithms [63]. Table 5 summarizes the learning outcomes of the two competing approaches (where y = correct model; n = incorrect model; e = formally wrong model). Reference [62] was unable to learn 5 of the 11 models, and [63] performed even worse: it learned the correct model in only 2 cases, and in 5 of the remaining cases it learned a formally wrong one. Its runtimes, reported in the last column, confirm its very low efficiency: it never took less than 21 s, and in one case it took more than half an hour. On the other hand, WoMan never took more than 10 s on any model.

7.4.2. Daily Routines

Learning models of people’s behavior in daily routines is interesting for personalized domotics and ambient intelligence applications. In this real-world context, we ran several experiments aimed at assessing the practical applicability of WoMan.
One concerned the “Aruba” dataset taken from the CASAS benchmark repository (available at https://casas.wsu.edu/datasets/, accessed on 29 March 2025). This dataset includes continuous recordings of home activities, involving different combinations of 10 tasks along 220 days, yielding 220 cases of the home inhabitant’s daily behavior. Overall, it consists of 13,788 events, referring to 6674 activities, for an average of 62.67 events per case. Since the “correct” model for Aruba is not available, the learned model’s accuracy was evaluated by an “inverse” 10-fold cross-validation, in which each fold used just 10% of the cases for learning and the remaining 90% for testing. While [62,63] returned formally wrong models, WoMan processed the whole dataset in 0.1 s (including translation from the case logs to the FOL representation, which proves that the approach is suitable for real-world applications) and always returned syntactically valid models, yielding an average accuracy of 92% (consider that each missed case costs nearly 5% accuracy and that in real life there are always deviations from the routine that prevent reaching 100% accuracy). In the positive-examples-only setting, precision is always 100% for both tasks and transitions. The Recall and F1-measure results reported in Table 6 indicate a success, all the more so considering that just 22 training cases were used.
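The “inverse” cross-validation scheme can be sketched as follows (an illustrative helper, not the experimental code; with Aruba’s 220 cases and k = 10, each fold trains on 22 cases and tests on the remaining 198):

```python
def inverse_kfold(cases, k=10):
    """'Inverse' k-fold CV: each fold trains on one k-th of the data
    (about 10% for k = 10) and tests on the remaining folds,
    stressing the ability to learn from few examples."""
    size = len(cases) // k
    for i in range(k):
        train = cases[i * size:(i + 1) * size]
        test = cases[:i * size] + cases[(i + 1) * size:]
        yield train, test

splits = list(inverse_kfold(list(range(220))))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 22 198
```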
The Aruba log also reports the status of several sensors installed in the home, from which 5976 training examples (27.16 per case on average) for task pre- and post-conditions were obtained and processed by InTheLEx. The number of atoms in the example descriptions ranged from 740 to 2570 for preconditions and from 12,882 to 175,168 for postconditions, which are huge numbers by ILP standards. Still, InTheLEx took on average just 0.12 s to process each example, which again ensures on-line applicability to real-world tasks. Notwithstanding the positive-examples-only setting, the lggOI operator in InTheLEx returned a theory consisting of 12 clauses that preserve all relevant information. The following sample,
sleeping(A):-true.
eating(A):-next(_,A).
meal_preparation(A):-next(_,A).
bed_to_toilet(A):-next(_,A), sensor_m004(A,B), status_on(B),
demonstrates the human readability of the model, which can be interpreted as follows: “sleeping” has no precondition (i.e., it may happen at any time); “eating”, “meal_preparation”, and “bed_to_toilet” are always preceded by another activity; “bed_to_toilet” takes place only when the movement sensor on the bathroom door (m004) is on. A classical 10-fold cross-validation with random sampling obtained 99.86% accuracy for pre-conditions and 99.75% accuracy for post-conditions. Even using the “inverse” 10-fold cross-validation (where each fold only included training examples from 22 consecutive days), WoMan still obtained 98.27% accuracy for pre-conditions and 98.42% accuracy for post-conditions.

7.4.3. Process-Related Prediction

WoMan can also predict the next activity in a process execution or the kind of process being executed among a number of candidate models, returning a ranking of candidates and abstaining when not sufficiently certain. This is important in real-world applications, where there is much more variability and subjectivity in behavior. To evaluate these features, we used the following datasets (for which Table 7 reports some statistics):
Aruba 
described in the previous section.
GPItaly 
reports the movements of an elderly person in the rooms of her home [64] along 253 days. Each day was a case of the process representing the movement routine.
Chess 
consists of 400 reports of top-level matches downloaded from https://www.federscacchi.com/fsi/index.php/punteggi/archivio-partite (consulted on 29 March 2025). A case was a chess match.
The chess dataset is characterized by very high parallelism: each game starts with 32 concurrent activities (well beyond the reach of many current process mining systems).
A k-fold cross-validation procedure was used (see column Folds in Table 8 for the values of k). After learning the models, each event in the test cases updated the process status and called for a prediction of the next expected activity. Table 8 reports performance averages for the different processes: in how many cases a prediction was returned (“Pred”); how many of these predictions include the correct activity (“Recall”); how high it ranks (“Rank”: 1.0 means it is at the top, 0.0 that it is last); and the average length of the ranking (“Tasks”). Most of the time WoMan makes a prediction; in 97–98% of cases the correct activity is in the ranking, always in the top 10% of positions (and always at the very top for the chess processes). WoMan is more cautious with predictions in the chess domain but still covers more than half of the match, and correctness is at the same level as, or better than, state-of-the-art Deep Learning and Neural Network-based approaches.
The process prediction task was evaluated on the chess dataset because it provides three different kinds of processes based on the same domain, corresponding to the possible match outcomes: white wins, black wins, or draw. The aim was to predict the match outcome as match events happened. We used the same folds and models as for activity prediction, but called the process prediction function (which, in this case, always returns a prediction). Table 9 summarizes the performance: Column “Pos” reports the average position of the correct prediction in the ranking (normalized to [0, 1], where 1 represents the top of the ranking and 0 its bottom). The last columns report, on average, for what percentage of the case duration the prediction was correct (C: the correct process was alone at the top of the ranking), approximately correct (A: the correct process shared the top of the ranking with some, but not all, of the other processes), undefined (U: all processes were ranked equal), or wrong (W: the correct process was not at the top of the ranking). In the middle phase of the match, which is in some respects the core part, the percentage of correct predictions is much higher than reported in Table 9.

8. Conclusions and Future Work

FOL can be the optimal paradigm to bring together systems programming, DB management, and knowledge-based applications. The LP fragment, specifically, is a general-purpose technique for building systems that are suitable for handling relational DBs and transparent and explainable by design. This latter feature is especially important in the current technological landscape, where the use of AI in critical applications requires trustworthiness and understandability by humans. In fact, “ILP systems have discovered new knowledge that has been refereed and published in journals of the relevant subject area” [6]. In this field, the OI framework brings several advantages to ML systems, especially incremental ones, in terms of closer resemblance to human knowledge representation, efficiency, and effectiveness. This paper collected and presented in a uniform way, for the first time, the whole theoretical corpus underlying OI. It can serve as a reference for researchers interested in developing systems based on this framework, as a starting point to further expand the framework, and as a reminder that symbolic AI approaches should still be investigated, in spite of the current hype for subsymbolic (especially neural network-based) ones, in order to ensure the trustworthiness, explainability, and anthropocentricity of AI solutions (as also required by recent EU norms). To show the practical usefulness of the OI framework beyond pure theory, the paper also discussed the theoretical bases for its efficiency and provided examples of real OI-based systems that were successfully applied to real-world tasks.
There are several directions for future work on the OI framework. On the theoretical side, a better assessment of computational complexity could be carried out, and the consequences of applying OI to functions could be investigated in more depth. On the implementation side, to the best of our knowledge, no algorithm or system has been developed to date for unconstrained clausal logic or for OI implication. On the application side, more tasks and domains could be explored, and experiments could be run to investigate the applicability, effectiveness, and efficiency of OI-based ML under different conditions. Last but not least, the relationships and possible contributions of the OI framework to the current debate on ethics in AI should be investigated. The framework was born before the pervasive spread of AI in everyday life, which happened only a few years ago and posed the problem of ensuring that the uses of AI are ethical and human-compatible, especially in critical applications where the privacy, security, and well-being of people are at stake. Logic-based approaches, such as the OI framework, are understandable and explainable by design, providing real causal explanations rather than post hoc ones or merely interpretable outcomes; as shown, compared to standard FOL-based approaches, OI bears additional resemblance to the human way of representing and expressing knowledge. Also, incremental approaches to ML are naturally suited to interactive behavior, fostering anthropocentricity. Finally, logic-based approaches are more amenable to learning from very few, selected examples, which relates to the sustainability problem, as they do not require the huge computational resources that impact the environment.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All datasets used in the reported experiments are publicly available. No new data were generated during the work described in this paper.

Acknowledgments

The author thanks all the collaborators, colleagues, and students that worked with him on these topics since 1995 for the useful and insightful discussions.

Conflicts of Interest

The author declares no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
DB: Database
FOL: First-Order Logic
iff: If and Only If
ILP: Inductive Logic Programming
LP: Logic Programming
ML: Machine Learning
OI: Object Identity

References

  1. Lloyd, J.W. Foundations of Logic Programming, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 1987. [Google Scholar]
  2. Apt, K.R. Introduction to Logic Programming. In Handbook of Theoretical Computer Science; Elsevier: Amsterdam, The Netherlands, 1990; pp. 492–574. [Google Scholar]
  3. Barr, A.; Feigenbaum, E. The Handbook of Artificial Intelligence; William Kaufmann: Los Altos, CA, USA, 1982. [Google Scholar]
  4. European Parliament and of the Council. Regulation (EU) 2024/1689 Laying Down Harmonised Rules on Artificial Intelligence and Amending Regulations (EC) No 300/2008, (EU) No 167/2013, (EU) No 168/2013, (EU) 2018/858, (EU) 2018/1139 and (EU) 2019/2144 and Directives 2014/90/EU, (EU) 2016/797 and (EU) 2020/1828 (Artificial Intelligence Act) (Text with EEA Relevance). Available online: http://data.europa.eu/eli/reg/2024/1689/oj (accessed on 13 June 2024).
  5. Srinivasan, A.; King, R.D.; Muggleton, S.H.; Sternberg, M.J.E. Carcinogenesis Predictions Using Inductive Logic Programming. In Intelligent Data Analysis in Medicine and Pharmacology; Lavrač, N., Keravnou, E.T., Zupan, B., Eds.; Springer: Boston, MA, USA, 1997; pp. 243–260. [Google Scholar]
  6. Srinivasan, A.; Muggleton, S.; King, R.; Sternberg, M. Mutagenesis: ILP experiments in a non-determinate biological domain. In Proceedings of the 4th International Workshop on Inductive Logic Programming, Bad Honnef/Bonn, Germany, 12–14 September 1994; GMD-Studien Nr. 237. pp. 217–232. [Google Scholar]
  7. Srinivasan, A.; Muggleton, S.; Sternberg, M.; King, R. Theories for mutagenicity: A study in first-order and feature-based induction. Artif. Intell. 1996, 85, 277–299. [Google Scholar] [CrossRef]
  8. Genesereth, M.; Nilsson, N. Logic Foundations of Artificial Intelligence; Morgan Kaufmann: Burlington, MA, USA, 1987. [Google Scholar]
  9. Nienhuys-Cheng, S.H.; de Wolf, R. Foundations of Inductive Logic Programming; Lecture Notes in Artificial Intelligence; Springer: Berlin/Heidelberg, Germany, 1997; Volume 1228. [Google Scholar]
  10. Khoshafian, S.N.; Copeland, G.P. Object identity. ACM SIGPLAN Not. 1986, 21, 406–416. [Google Scholar] [CrossRef]
  11. Besold, T.R.; d’Avila Garcez, A.; Bader, S.; Bowman, H.; Domingos, P.; Hitzler, P.; Kuehnberger, K.U.; Lamb, L.C.; Lima, P.M.V.; de Penning, L.; et al. Neural-symbolic learning and reasoning: A survey and interpretation. In Neuro-Symbolic Artificial Intelligence: The State of the Art; Frontiers in Artificial Intelligence and Applications; IOS Press: Amsterdam, The Netherlands, 2021; Volume 342, pp. 1–51. [Google Scholar]
  12. Schmidt-Schauss, M. Implication of Clauses is Undecidable. Theor. Comput. Sci. 1988, 59, 287–296. [Google Scholar] [CrossRef]
  13. Garey, M.; Johnson, D. Computers and Intractability—A Guide to the Theory of NP-Completeness; Freeman: San Francisco, CA, USA, 1979. [Google Scholar]
  14. Robinson, J.A. A Machine-Oriented Logic Based on the Resolution Principle. J. ACM 1965, 12, 23–41. [Google Scholar] [CrossRef]
  15. Chang, C.; Lee, R. Symbolic Logic and Mechanical Theorem Proving; Academic Press: San Diego, CA, USA, 1973. [Google Scholar]
  16. Nienhuys-Cheng, S.H.; de Wolf, R. The Subsumption Theorem in Inductive Logic Programming: Facts and Fallacies. In Advances in Inductive Logic Programming; De Raedt, L., Ed.; Frontiers in Artificial Intelligence and Applications; IOS Press: Amsterdam, The Netherlands, 1996; Volume 32, pp. 265–276. [Google Scholar]
  17. Nienhuys-Cheng, S.H.; de Wolf, R. The Subsumption Theorem Revisited: Restricted to SLD Resolution. In Proceedings of the Computing Science in the Netherlands, Utrecht, The Netherlands, 20–22 September 1995. [Google Scholar]
  18. Helft, N. Inductive Generalization: A Logical Framework. In Proceedings of the 2nd European Conference on European Working Session on Learning (EWSL’87), Bled, Yugoslavia, 1 May 1987; pp. 149–157. [Google Scholar]
  19. van Emden, M.; Kowalski, R. The Semantics of Predicate Logic as a Programming Language. J. ACM 1976, 23, 733–742. [Google Scholar] [CrossRef]
  20. Ceri, S.; Gottlob, G.; Tanca, L. Logic Programming and Databases; Springer: Berlin/Heidelberg, Germany, 1990. [Google Scholar]
  21. Kanellakis, P.C. Elements of Relational Database Theory. In Handbook of Theoretical Computer Science; Elsevier: Amsterdam, The Netherlands, 1990; Volume B–Formal Models and Semantics, pp. 1073–1156. [Google Scholar]
  22. Rouveirol, C. Extensions of Inversion of Resolution Applied to Theory Completion. In Inductive Logic Programming; Academic Press: Cambridge, MA, USA, 1992; pp. 64–90. [Google Scholar]
  23. Michalski, R.S. A Theory and Methodology of Inductive Learning. In Machine Learning: An Artificial Intelligence Approach; Morgan Kaufmann: Burlington, MA, USA, 1983; Volume I. [Google Scholar]
  24. De Raedt, L. Interactive Theory Revision—An Inductive Logic Programming Approach; Academic Press: Cambridge, MA, USA, 1992. [Google Scholar]
  25. Wrobel, S. Concept Formation and Knowledge Revision; Kluwer Academic: Dordrecht, The Netherlands, 1994. [Google Scholar]
  26. Wrobel, S. On the proper definition of minimality in specialization and theory revision. In Machine Learning, Proceedings of the ECML-93: European Conference on Machine Learning, Vienna, Austria, 5–7 April 1993; Number 667 in Lecture Notes in Artificial Intelligence; Springer: Berlin/Heidelberg, Germany, 1993; pp. 65–82. [Google Scholar]
  27. Plotkin, G.D. A Note on Inductive Generalization. Mach. Intell. 1970, 5, 153–163. [Google Scholar]
  28. Van der Laag, P.R.J.; Nienhuys-Cheng, S.H. A Note on Ideal Refinement Operators in Inductive Logic Programming. In Proceedings of the 4th International Workshop on Inductive Logic Programming, Bad Honnef/Bonn, Germany, 12–14 September 1994; GMD-Studien Nr. 237. pp. 247–260. [Google Scholar]
  29. Van der Laag, P.R.J.; Nienhuys-Cheng, S.H. Existence and Nonexistence of Complete Refinement Operators. In Machine Learning, Proceedings of the ECML-94: European Conference on Machine Learning, Catania, Italy, 6–8 April 1994; Number 784 in Lecture Notes in Artificial Intelligence; Springer: Berlin/Heidelberg, Germany, 1994; pp. 307–322. [Google Scholar]
  30. Van der Laag, P.R.J. An Analysis of Refinement Operators in Inductive Logic Programming. Ph.D. Thesis, Erasmus University, Rotterdam, The Netherlands, 1995. [Google Scholar]
  31. Esposito, F.; Malerba, D.; Semeraro, G.; Brunk, C.; Pazzani, M. Traps and Pitfalls when Learning Logical Definitions from Relations. In Methodologies for Intelligent Systems, Proceedings of the 8th International Symposium, ISMIS’94, Charlotte, NC, USA, 16–19 October 1994; Number 869 in Lecture Notes in Artificial Intelligence; Springer: Berlin/Heidelberg, Germany, 1994; pp. 376–385. [Google Scholar]
  32. Semeraro, G.; Esposito, F.; Malerba, D.; Brunk, C.; Pazzani, M. Avoiding Non-Termination when Learning Logic Programs: A Case Study with FOIL and FOCL. In Logic Program Synthesis and Transformation—Meta-Programming in Logic, Proceedings of the 4th International Workshops, LOPSTR’94 and META’94, Pisa, Italy, 20–21 June 1994; Number 883 in Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 1994; pp. 183–198. [Google Scholar]
  33. Esposito, F.; Laterza, A.; Malerba, D.; Semeraro, G. Locally Finite, Proper and Complete Operators for Refining Datalog Programs. In Foundations of Intelligent Systems, Proceedings of the 9th International Symposium, ISMIS’96, Zakopane, Poland, 9–13 June 1996; Number 1079 in Lecture Notes in Artificial Intelligence; Springer: Berlin/Heidelberg, Germany, 1996; pp. 468–478. [Google Scholar]
  34. Semeraro, G.; Esposito, F.; Malerba, D. Ideal Refinement of Datalog Programs. In Logic Program Synthesis and Transformation, Proceedings of the 5th International Workshop, LOPSTR’95, Utrecht, The Netherlands, 20–22 September 1995; Number 1048 in Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 1995; pp. 120–136. [Google Scholar]
  35. Plotkin, G.D. Building-in Equational Theories. In Machine Intelligence; Edinburgh University Press: Edinburgh, UK, 1972; Volume 7, pp. 73–90. [Google Scholar]
  36. Reiter, R. Equality and domain closure in first order databases. J. ACM 1980, 27, 235–249. [Google Scholar] [CrossRef]
  37. Jaffar, J.; Maher, M.J. Constraint Logic Programming: A Survey. J. Log. Program. 1994, 19, 503–581. [Google Scholar] [CrossRef]
  38. Ferilli, S. Programmazione Logica Induttiva nella Revisione di Teorie. Laurea Degree Thesis, Dipartimento di Informatica, Università di Bari, Bari, Italy, 1996. [Google Scholar]
  39. Siekmann, J.H. An introduction to Unification Theory. In Formal Techniques in Artificial Intelligence—A Sourcebook; Banerji, R.B., Ed.; Elsevier Science: Amsterdam, The Netherlands, 1990; pp. 460–464. [Google Scholar]
  40. Semeraro, G.; Esposito, F.; Malerba, D.; Fanizzi, N.; Ferilli, S. A Logic Framework for the Incremental Inductive Synthesis of Datalog Theories. In Logic Program Synthesis and Transformation, Proceedings of the 7th International Workshop, LOPSTR’97, Leuven, Belgium, 10–12 July 1997; Number 1463 in Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 1997; pp. 300–321. [Google Scholar]
  41. Gottlob, G. Subsumption and Implication. Inf. Process. Lett. 1987, 24, 109–111. [Google Scholar] [CrossRef]
  42. Boolos, G.; Jeffrey, R. Computability and Logic, 3rd ed.; Cambridge University Press: Cambridge, UK, 1989. [Google Scholar]
  43. Idestam-Almquist, P. Generalization of Clauses. Ph.D. Thesis, Stockholm University and Royal Institute of Technology, Kista, Sweden, 1993. [Google Scholar]
  44. Nédellec, C.; Rouveirol, C.; Adé, H.; Bergadano, F.; Tausend, B. Declarative Bias in ILP. In Advances in Inductive Logic Programming; De Raedt, L., Ed.; IOS Press: Amsterdam, The Netherlands, 1996; Volume 32, pp. 82–103. [Google Scholar]
  45. Komorowski, H.J.; Trcek, S. Towards Refinement of Definite Logic Programs. In Proceedings of the Methodologies for Intelligent Systems; Raś, Z.W., Zemankova, M., Eds.; Number 869 in Lecture Notes in Artificial Intelligence; Springer: Berlin/Heidelberg, Germany, 1994; pp. 315–325. [Google Scholar]
  46. Muggleton, S.H. Inverting Implication. In Proceedings of the 2nd International Workshop on Inductive Logic Programming, ICOT Technical Memorandum TM-1182, Tokyo, Japan, 15–20 July 1992. [Google Scholar]
  47. Esposito, F.; Fanizzi, N.; Ferilli, S.; Semeraro, G. A generalization model based on OI implication for ideal theory refinement. Fundam. Inform. 2001, 47, 15–33. [Google Scholar]
  48. Ferilli, S.; Basile, T.M.A.; Biba, M.; Mauro, N.D.; Esposito, F. A General Similarity Framework for Horn Clause Logic. Fundam. Inform. 2009, 90, 43–66. [Google Scholar] [CrossRef]
  49. Esposito, F.; Malerba, D.; Semeraro, G. Negation as a specializing operator. In Advances in Artificial Intelligence, Proceedings of the Third Congress of the Italian Association for Artificial Intelligence, AI* IA’93, Torino, Italy, 26–28 October 1993; Torasso, P., Ed.; Springer: Berlin/Heidelberg, Germany, 1993; pp. 166–177. [Google Scholar]
  50. Esposito, F.; Fanizzi, N.; Malerba, D.; Semeraro, G. Downward Refinement of Hierarchical Datalog Theories. In Proceedings of the 1995 Joint Conference on Declarative Programming, GULP-PRODE 95, Marina di Vietri, Italy, 11–14 September 1995; pp. 148–159. [Google Scholar]
  51. Ferilli, S. Predicate invention-based specialization in Inductive Logic Programming. J. Intell. Inf. Syst. 2016, 47, 33–55. [Google Scholar] [CrossRef]
  52. Esposito, F.; Malerba, D.; Semeraro, G. Multistrategy Learning for Document Recognition. Appl. Artif. Intell. Int. J. 1994, 8, 33–84. [Google Scholar] [CrossRef]
  53. Semeraro, G.; Esposito, F.; Fanizzi, N.; Malerba, D. Revision of logical theories. In Topics in Artificial Intelligence; Number 992 in Lecture Notes in Artificial Intelligence; Springer: Berlin/Heidelberg, Germany, 1995; pp. 365–376. [Google Scholar]
  54. Esposito, F.; Semeraro, G.; Fanizzi, N.; Ferilli, S. Multistrategy Theory Revision: Induction and Abduction in INTHELEX. Mach. Learn. J. 2000, 38, 133–156. [Google Scholar] [CrossRef]
  55. Ferilli, S. WoMan: Logic-based Workflow Learning and Management. IEEE Trans. Syst. Man Cybern. Syst. 2014, 44, 744–756. [Google Scholar] [CrossRef]
  56. Ferilli, S.; Angelastro, S. Activity Prediction in Process Mining using the WoMan Framework. J. Intell. Inf. Syst. 2019, 53, 93–112. [Google Scholar] [CrossRef]
  57. Ferilli, S. GEAR: A General Inference Engine for Automated MultiStrategy Reasoning. Electronics 2023, 12, 256. [Google Scholar] [CrossRef]
  58. Emde, W.; Wettschereck, D. Relational instance based learning. In Proceedings of the ICML-96; Saitta, L., Ed.; Morgan Kaufmann: Burlington, MA, USA, 1996; pp. 122–130. [Google Scholar]
  59. Woznica, A.; Kalousis, A.; Hilario, M. Distances and (Indefinite) Kernels for Sets of Objects. In Proceedings of the Sixth International Conference on Data Mining (ICDM’06), Hong Kong, China, 18–22 December 2006; pp. 1151–1156. [Google Scholar] [CrossRef]
  60. Fonseca, N.; Costa, V.S.; Rocha, R.; Camacho, R. k-RNN: K-Relational Nearest Neighbour Algorithm. In Proceedings of the SAC ’08: 2008 ACM Symposium on Applied Computing, Fortaleza, Brazil, 16–20 March 2008; ACM: New York, NY, USA, 2008; pp. 944–948. [Google Scholar] [CrossRef]
  61. Esposito, F.; Ferilli, S.; Fanizzi, N.; Basile, T.M.; Di Mauro, N. Incremental multistrategy learning for document processing. Appl. Artif. Intell. 2003, 17, 859–883. [Google Scholar] [CrossRef]
  62. Weijters, A.; van der Aalst, W. Rediscovering Workflow Models from Event-Based Data. In Proceedings of the 11th Dutch-Belgian Conference on Machine Learning (Benelearn 2001), The Hague, The Netherlands, 2001; Hoste, V., Pauw, G.D., Eds.; ACM: New York, NY, USA, 2001; pp. 93–100. [Google Scholar]
  63. de Medeiros, A.; Weijters, A.; van der Aalst, W. Genetic process mining: An experimental evaluation. Data Min. Knowl. Discov. 2007, 14, 245–304. [Google Scholar] [CrossRef]
  64. Coradeschi, S.; Cesta, A.; Cortellessa, G.; Coraci, L.; Gonzalez, J.; Karlsson, L.; Furfari, F.; Loutfi, A.; Orlandini, A.; Palumbo, F.; et al. Giraffplus: Combining social interaction and long term monitoring for promoting independent living. In Proceedings of the 2013 6th International Conference on Human System Interactions (HSI), Sopot, Poland, 6–8 June 2013; pp. 578–585. [Google Scholar]
Figure 1. Taxonomy of the generalization models.
Figure 2. Two structures in a block world.
Figure 3. Samples of COLLATE documents: FAA registration cards and DIF censorship decision reports.
Table 1. Procedural interpretation of clauses.
Logics               | Programming           | Databases
predicate            | procedure name        | relation (table)
term                 | data structure        | (atomic) value
clause               | procedure declaration |
clause head          | procedure heading     |
clause body          | procedure body        |
atom in clause body  | statement             |
definite clause      | sub-program           | view
goal                 | execution             | query
fact                 |                       | n-tuple
Table 2. Classification results for Document Image Classification on scientific papers.
         |   | Cl    | Accuracy | F1-Measure
JMLR     | S | 1.9   | 0.98     | 0.97
         | B | 1.9   | 0.98     | 0.97
         | k | 0.90
Elsevier | S | 1     | 1.00     | 1.00
         | B | 2.1   | 0.99     | 0.97
         | k | 1.00
MLJ      | S | 4.7   | 0.96     | 0.94
         | B | 5.2   | 0.93     | 0.91
         | k | 1.00
SVLN     | S | 2.6   | 0.98     | 0.94
         | B | 3.3   | 0.97     | 0.93
         | k | 0.90
Average  | S | 2.55  | 0.98     | 0.96
         | B | 3.125 | 0.97     | 0.94
         | k | 0.94
Table 3. Experimental results for Document Clustering on scientific papers.
Cluster | Size | Class    | P (%) | R (%)
1       | 65   | Elsevier | 80    | 100
2       | 65   | SVLN     | 98.46 | 85.33
3       | 105  | JMLR     | 90.48 | 100
4       | 118  | MLJ      | 97.46 | 87.79
average |      |          | 91.60 | 93.28
Table 4. Classification Results for COLLATE historical documents.
                        | Clauses | Length | Runtime | Accuracy | Pos   | Neg
faa-registration-card   | 1       | 12.59  | 0.13    | 1.0      | 1.0   | 1.0
dif-censorship-decision | 1.1     | 25.85  | 761.8   | 0.963    | 0.789 | 1.0
Table 5. Experimental outcomes on artificial workflow models.
Id | Little Thumb | Genetic | Time (Genetic)
1  | y            | n       | 1
2  | y            | e       | >3
3  | y            | e       | >1
4  | n            | y       | >1
5  | n            | n       | >25
6  | n            | n       | >4
7  | n            | e       | >30
8  | n            | e       | 37
9  | y            | n       | 31
10 | y            | y       | 21
11 | y            | e       | 30
Table 6. Learning performance on the Aruba dataset workflow models.
           | Tasks  | Transitions
Recall     | 91.36% | 55.49%
F1-measure | 95.48% | 71.37%
Table 7. Process prediction dataset statistics.
        | Cases | Events (Overall) | Events (Avg) | Tasks (Overall)
Aruba   | 220   | 13,788           | 62.67        | 10
GPItaly | 253   | 185,844          | 369.47       | 8
White   | 158   | 36,768           | 232.71       | 681
Black   | 87    | 21,142           | 243.01       | 663
Draw    | 155   | 32,422           | 209.17       | 658
Table 8. Activity prediction performance.
        | Folds | Pred | Recall | Rank | (Tasks)
Aruba   | 3     | 0.85 | 0.97   | 0.92 | 6.06
GPItaly | 3     | 0.99 | 0.97   | 0.96 | 8.02
chess   | 5     | 0.54 | 0.98   | 1.0  | 11.34
Table 9. Process prediction performance.
      | Folds | Pos (%) | C    | A    | U    | W
black | 5     | 0.47    | 0.20 | 0.00 | 0.15 | 0.66
white | 5     | 0.70    | 0.44 | 0.00 | 0.15 | 0.40
draw  | 5     | 0.61    | 0.29 | 0.01 | 0.18 | 0.52
