Article

An Inductive Logical Model with Exceptional Information for Error Detection and Correction in Large Knowledge Bases

by Yan Wu 1,*,†, Xiao Lin 2,†, Haojie Lian 3,† and Zili Zhang 4,†
1 Department of Foundational Courses, Dujiangyan Campus, Sichuan Agricultural University, Chengdu 611830, China
2 The York Management School, University of York, York YO10 5DD, UK
3 Key Laboratory of In-Situ Property-Improving Mining of Ministry of Education, Taiyuan University of Technology, Taiyuan 030024, China
4 College of Computer and Information Science, Southwest University, Chongqing 400715, China
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Mathematics 2025, 13(11), 1877; https://doi.org/10.3390/math13111877
Submission received: 22 March 2025 / Revised: 19 May 2025 / Accepted: 27 May 2025 / Published: 4 June 2025

Abstract

Some knowledge bases (KBs) extracted from Wikipedia articles achieve very high average precision (over 95% in DBpedia). However, subtle mistakes, including inconsistencies, outliers, and erroneous relations, are usually ignored when KBs are constructed by extraction rules. Automatically detecting and correcting these subtle errors is important for improving the quality of KBs. In this paper, an inductive logic programming algorithm with exceptional information (EILP) is proposed to automatically detect errors in large KBs. EILP leverages exceptional information that is ignored in conventional rule-learning algorithms such as inductive logic programming (ILP). Furthermore, an inductive logical correction method with exceptional features (EILC) is proposed to automatically correct these mistakes by learning a set of correction rules with exceptional features, in which dedicated metrics are provided to validate the revised triples. The experimental results demonstrate the effectiveness of EILP and EILC in detecting and repairing errors in large knowledge bases, respectively.

1. Introduction

The application of domain-oriented knowledge bases (KBs) is a prevalent research topic in Artificial Intelligence. KBs serve various downstream tasks, such as hierarchical reasoning [1], question answering [2], and information retrieval [3]. Many semantic KBs, for example, YAGO [4], DBpedia [5], Wikidata [6], and the Google Knowledge Graph [7], have been constructed from knowledge on the Web and have wide applications. In the real world, KBs are multirelational, heterogeneous, semistructured, and full of uncertainty and incompleteness [8,9], which leads to inconsistencies, outliers, and erroneous relations. Hence, many algorithms have been proposed to detect errors in KBs [10,11,12]. Among them, rule learning algorithms are commonly used due to their strong interpretability [13,14,15,16]. A popular algorithm in this category is inductive logic programming (ILP) [17,18], which learns a hypothesis with background knowledge to cover all positive examples [19]. A deficiency of ILP is that it neglects two important factors in error detection: (a) negative statements and (b) abnormal information. To utilize negative statements to improve the accuracy of error detection in KBs, Wu et al. [20] proposed guided inductive logic programming (GILP), which detects systematic extraction errors through positive and negative feedback, taking maximum account of the characteristics of misinformation. In terms of utilizing abnormal information, Mohamed et al. [21] introduced the exception-enriched rule learning model and improved the quality of rule sets by repairing Horn rules, adding negated atoms to their bodies. In the present paper, we propose a new error detection algorithm, the inductive logic programming algorithm with exceptional information (EILP), which considers both negative statements and abnormal information in error detection, combining the advantages of the two methods mentioned above.
After detecting errors in KBs, a correction algorithm is needed to improve the quality of the knowledge bases [22]. Many efforts have been made in this area. For example, Martin et al. [23] employed nonembedding methods to resolve temporal conflicts in inconsistent Resource Description Framework (RDF) knowledge bases. Yakout et al. [24] proposed the Scalable Automatic Repairing (SCARE) system to correct data errors according to statistical likelihood. Theodoros et al. [25] introduced the HoloClean framework for holistic data repairs driven by probabilistic inference. Lertvittayakumjorn [26] proposed a keyword method to correct range violation errors in DBpedia. Ortona designed RuDiK, a system for rule discovery in knowledge bases [27], which finds declarative rules over KBs, including inequality rules that expand association rules with measures. Mahdavi [28] proposed the Baran system to fix data errors with respect to their value, vicinity, and domain contexts. Furthermore, Abedini [29] proposed a new Correction Tower to analyze popular embedding methods. The authors of the present paper proposed the guided inductive logic learning model (GILLearn) in previous work [30,31], which corrects the range errors of triples by building correction rules after cleaning the KB. Nevertheless, all of the above-mentioned correction algorithms heavily rely on pre-configured paradigms and a complete set of rules, leading to high computational cost and a limited range of applications. To overcome this difficulty, as an extension of our previous work [30,31], we propose an inductive logic correction algorithm with exceptional information, coined "EILC", to correct negative knowledge based on exception features in large KBs, in which a new search space and a new rule refining algorithm are applied to mine rules and repair knowledge.
In addition, cross-semantic similarity metrics based on semantic similarity and source-related content are applied to annotate and refine triples in KBs. However, several quality issues persist in KBs, such as anomalous triples, type inconsistencies, and abnormal relation patterns within RDF knowledge bases. Our model is specifically designed to detect and correct these error patterns.
Hence, the novelty of our work contains the following two aspects:
  • The inductive logical programming algorithm with exceptional information (EILP) is proposed to detect errors in KBs by considering both negative statements and abnormal information;
  • The inductive logical correction method with exceptional features (EILC) is proposed to correct errors, in which a new rule refining algorithm is applied to revise correction rules.
The remainder of this paper is organized as follows. In Section 2, some preliminary concepts are introduced. Section 3 highlights the EILP and EILC algorithms. Section 4 reports our main experimental results and the analysis of our model. The conclusion is presented in Section 5.

2. Preliminaries

To aid understanding of our proposed methods, some basic notation is explained in this section. Three types of erroneous content are described in the problem statement.

2.1. Problem Statement

Knowledge bases often exhibit quality issues, stemming from honest mistakes or errors in crowd-sourced content [32]. Zaveri et al. [33] found that 12% of DBpedia triples have some type of problem. Our problem statement covers three types of erroneous content.
Inconsistencies of Type. A major error class is inconsistency of knowledge: an erroneous tail of a triple is common. Correcting incorrect tails requires the range constraint of the given relation. For example, triples <e_i, nationality, e_j> may have wrong range types, such as EthnicGroup, Language, and Agent. Most work has used traditional rule extraction metrics [34] to achieve high confidence on inconsistency problems, where triples breaking constraints, in the sense of being "unusual", are viewed as incorrect by the data cleaning system. The common assumption behind these learning methods is that "unusual means erroneous". However, the "unusual equals incorrect" heuristic cannot be applied uniformly across subdomains of inconsistencies in a KB, such as functional dependencies [15], denial constraints, and Horn clauses [35]. For these problems, common rule learning approaches perform only data cleaning; they are rarely used in knowledge base correction algorithms.
Triple outliers in associated KBs. The process of knowledge creation introduces errors during information extraction, and subtle discrepancies exist among knowledge sources sharing the same core, such as Wikidata, DBpedia, and YAGO. In this study, triples are corrected for similar entities by investigating structures with cooccurrence facts. The property (owl:sameAs, owl: http://www.w3.org/2002/07/owl, accessed on 1 May 2022) is leveraged to enhance the precision of the original KBs. The interrelations of KBs have domain or range restrictions, and candidates violating them are not considered by the system. Here, exceptional triples are those facts in a KB that deviate from the entities' cooccurrence in associated KBs. In this condition, there are common connections between the entities involved, sharing the same background information. The quality of extracted information reaches a new stage when integration into KBs is combined with filtering or correction in further work.
Vicinity of erroneous relations. There are two main reasons for erroneous relations in KBs. First, extracted resources, such as Wikipedia articles, may contain falsified relations; extracting knowledge from unstructured or semistructured sources is error-prone given the vast amount of linked data available. Second, some erroneous relations are added to KBs after knowledge construction. The neighborhood of an erroneous relation carries implicit information useful for data correction; implicit knowledge has been recovered from the internal correlations of triples [26].
For example, the logic relations France + Berlin − Germany = Paris and France + Germans − Germany = French in a KB have the same confidence scores. In nonembedding approaches, direct logic relations can be utilized in process refinement and enrichment, ignoring the condition of samples. For the false triple <?a, dbr:nationality, Germans>, the right result is probed by the second equation. Traditional algorithms do not replace the erroneous value Germans with an entity of type Country, since the item itself satisfies the constraint on its own type, EthnicGroup. When checking its neighboring triple attributes with the same type, Germany, of type Country, is regarded as the correct tail of the triple. Finally, mistaken triples can be revised using their vicinity in the KB. For brevity, the system corrects ABox property assertions of triples with mistaken types. Here, entity assertions with erroneous tails are mined by rules and presented with simple canonicalization in our model. For instance, the mistaken assertion <dbr:Rodrigo_Kirchner, dbo:nationality, dbr:German> (dbr: http://dbpedia.org/resource/, accessed on 1 May 2025; dbo: http://dbpedia.org/ontology/, accessed on 1 May 2025) should be corrected to <dbr:Rodrigo_Kirchner, dbo:nationality, dbr:Germany>. The DBpedia knowledge base is open source and can be queried online at https://dbpedia.org/sparql (accessed on 28 May 2025).
Triple: RDF is chosen as the data model to represent web facts [36]. Knowledge bases are made up of a series of RDF statements called triples. For example, the sentence "Beijing is the capital of China" is translated as <Beijing, isCapitalOf, China>. A triple belongs to { <sub, pre, obj> | sub, obj ∈ ξ, pre ∈ K }, where sub and obj represent the subject and object entities, respectively, and pre is a predicate name in the knowledge base K.
First-order logic rules [37]: We focus on first-order rules with a single head atom and two body predicates. The number of variables contained in a closed rule is even, and the variables of the head appear in the body of the rule. The ideal rule satisfies the form pre1(?a, ?b) ∧ pre2(?b, ?c) → pre3(?a, ?c).
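To make the notation concrete, the following Python sketch (ours, not the paper's implementation; the data and function names are invented for illustration) stores triples as (sub, pre, obj) tuples and applies a closed rule pre1(?a, ?b) ∧ pre2(?b, ?c) → pre3(?a, ?c) by joining on the shared variable ?b:

```python
# Illustrative only: triples as (sub, pre, obj) tuples over a toy KB.
KB = {
    ("Alice", "bornIn", "Paris"),
    ("Paris", "cityOf", "France"),
}

def apply_closed_rule(kb, pre1, pre2, pre3):
    """Derive pre3(?a, ?c) from pre1(?a, ?b) and pre2(?b, ?c) by joining on ?b."""
    derived = set()
    for a, p, b in kb:
        if p != pre1:
            continue
        for b2, q, c in kb:
            if q == pre2 and b2 == b:
                derived.add((a, pre3, c))
    return derived

new_facts = apply_closed_rule(KB, "bornIn", "cityOf", "nationality")
# derives the single fact ("Alice", "nationality", "France")
```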

2.2. Search Space of Feedback

The background schema of the KB (the TBox) is incorporated into the constraint taxonomy, where a set of tuples is regarded as a knowledge base K. A triple is denoted by R(t̄) ∈ K. Some training examples are selected to form feedback F with positive/negative labels based on expert knowledge or external features. Feedback F is either positive or negative for each triple R(t̄). Therefore, the two groups F+ and F− ⊆ K contain all triples receiving correct or incorrect feedback, respectively. The other two sets, T+ and T− ⊆ K, consist of all facts that are deduced to have positive or negative features by the consistency constraints expressed in rules. The main target of extracting F is to learn Horn clauses with exceptions from small samples of labels F+ or F−, while deriving T+ or T− over a swarm of triples in K. Four conditions hold in the KBs: F+ ∩ F− = Ø, F+ ∪ F− = F, T+ ∩ T− = Ø, and T+ ∪ T− = T.
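The four set conditions above can be checked mechanically. A toy Python sketch (the sets, triples, and function name are our own illustrative inventions):

```python
# Toy feedback sets; in practice these would come from expert labels
# or external features over a real KB.
F_pos = {("e1", "nationality", "France")}
F_neg = {("e2", "nationality", "Germans")}
F = F_pos | F_neg

T_pos = {("e3", "nationality", "Italy")}
T_neg = {("e4", "nationality", "French_people")}
T = T_pos | T_neg

def valid_partition(pos, neg, whole):
    """The positive and negative sets must be disjoint and together cover the whole set."""
    return pos.isdisjoint(neg) and (pos | neg) == whole

ok = valid_partition(F_pos, F_neg, F) and valid_partition(T_pos, T_neg, T)
```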
A new search space is proposed to profile first-order rules in the following steps. K, F+, F−, T+, and T− are expressed as set predicates, and T+(t̄) ∧ R(t̄) is written as T+(R, t̄). Other predicates follow a similar structure. Inspired by Markov logic networks [38], each triple expands all linked information along six kinds of paths. A complete closed clique is constructed, and the weights are defined by the numbers of logic formulations. For the relationships between x, y, and z, different colors and dotted and solid arrows represent the six logical relationships between them: arrows of the same color represent one logical relationship, solid lines represent the outer-circle logical relationships between x, y, and z, and dotted lines represent the inner-circle ones. The feedback is similar to that of the original atomic formula, as shown in Figure 1. The new search space deletes redundant information and concentrates on fixed predicate names in the knowledge base.

2.3. Quality Measures

Cross-similarity measures are used to validate repairs of erroneous triples in KBs. After our model's operations, some mistaken assertions are matched with multiple candidate values in the repair process. Here, a new cross-similarity measure is proposed to analyze the final revised assertions of triples in KBs, aiming to discover common features between the original entities and the repairs after correction. In Equation (2), the Jaro–Winkler distance [39] is suitable for calculating the similarity between short strings such as names, where d_j is the Jaro–Winkler similarity between the strings e_0 and e_i, m is the number of matched characters, and t is the number of transpositions. Then, sim_external(·,·) gives the external similarity probability, matching cooccurring Wikipages via the wikiPageWikiLink property, where (e_0, e_i) is a pair of compared objects. The cross-function f_cross in Equation (3) is the harmonic mean of the string distance and the external similarity, designed to cover all correlations between assertions and candidate repairs.
sim_external(e_0, e_i) = |P_{e_0} ∩ P_{e_i}| / |P_{e_0}|
d_j = (1/3) ( m/|e_0| + m/|e_i| + (m − t)/m )
f_cross = ( 2 · d_j · sim_external(e_0, e_i) ) / ( d_j + sim_external(e_0, e_i) )
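For illustration, a minimal Python sketch of these measures (our own implementation of the plain Jaro similarity, without the Winkler prefix boost; the page sets are invented):

```python
def jaro(s1, s2):
    """Plain Jaro similarity: d_j = (m/|s1| + m/|s2| + (m - t)/m) / 3."""
    if s1 == s2:
        return 1.0
    window = max(len(s1), len(s2)) // 2 - 1
    match1, match2 = [False] * len(s1), [False] * len(s2)
    m = 0
    for i, c in enumerate(s1):  # count matches within the sliding window
        lo, hi = max(0, i - window), min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not match2[j] and s2[j] == c:
                match1[i] = match2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    t, k = 0, 0
    for i in range(len(s1)):  # count transpositions among matched characters
        if match1[i]:
            while not match2[k]:
                k += 1
            if s1[i] != s2[k]:
                t += 1
            k += 1
    t //= 2
    return (m / len(s1) + m / len(s2) + (m - t) / m) / 3

def sim_external(pages0, pages_i):
    """|P_e0 ∩ P_ei| / |P_e0| over sets of linked wiki pages."""
    return len(pages0 & pages_i) / len(pages0)

def f_cross(d_j, ext):
    """Harmonic mean of string similarity and external similarity."""
    return 2 * d_j * ext / (d_j + ext) if d_j + ext else 0.0
```

A candidate repair would then be ranked by f_cross(jaro(original, candidate), sim_external(P_original, P_candidate)).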

3. Proposed Methods

In this section, an inductive logical model with exceptional information is proposed to detect errors in large knowledge bases. For large KBs, our approach extracts rules to detect these problems. Here, a new search space is applied to increase the speed of rule mining, as described in Section 2.2. After finding erroneous knowledge, an inductive logical correction method is leveraged to correct these abnormal triples and refine KBs. Finally, quality metrics are applied to validate the revised triples, as illustrated in Section 2.3. On the whole, the core of the inductive logical model is the EILP and EILC algorithms. The main focus of our proposed method is the use of exceptional features of feedback and negative statements.

3.1. Overview of Inductive Logical Model

Figure 2 shows the architecture of the inductive logical model with exceptions, including the EILP detection and EILC algorithms. The core of the framework consists of the error detection and knowledge correction modules. The guidance of exceptional features is used to improve the quality of the rules in our model. The EILP algorithm acquires higher-quality rules in order to detect more mistakes in the knowledge base. Subsequently, the EILC correction algorithm is implemented using correction rules to refine the large knowledge base, pulling corrected facts for KB cleaning. The top k candidate assertions are filtered from EILC to repair the wrong features in triples. Repairs are generated from EILC, and new measures filter the best corrected triples to clean the knowledge base.
Repair similarity can be validated through various approaches [40], and multiple algorithms are available. Here, a new cross-similarity measure over candidates containing similar words is used to determine the optimal correction result, as described in Section 2.3. The incorrect range assertions of triples can be discovered automatically by EILP. A set of triples is viewed as negative according to the range type in the TBox property. For a triple <sub, pre, obj> in the community, the proposed correction module selects an atom from K as an object substitution. The revised entity e is semantically correlated with obj, and the triple is repaired as <sub, pre, e>. A new candidate entity assertion then replaces the erroneous object in open KBs. For example, the fact nationality(Einstein, German) is an incorrect triple, and the erroneous German is replaced with the revised atom Germany. Here, nationality serves as a predicate name in the DBpedia training sets.

3.2. EILP

The detection part includes nonmonotonic inductive logic programming (NILP) and inductive logic programming with exceptional information (EILP), as described in the following. We wish to learn high-quality rules that explain consistency constraints in KBs. Constraints are mined by positive and negative feedback from relational first-order literals or rules. In the initial step, feedback extracted for a given predicate name is utilized to mine seed rules; then, to increase the confidence of the rules, rules with exceptions are proposed to cover more feedback in the process of knowledge extraction.
Nonmonotonic Inductive Logic Programming. We consider traditional logic programs as defined in [41] under knowledge semantics. NILP uses a set of rules of the form
∀x_1 ⋯ ∀x_k : φ(X̄) ∧ not_E → ψ(X̄),
where ψ(X̄) signifies a first-order atom over X̄, acting as head(r); φ(X̄) is a conjunction φ(x̄_1), …, φ(x̄_k) of positive atoms, called body+(r); and not_E denotes a conjunction not_φ(x̄_{k+1}), …, not_φ(x̄_n), where not_ symbolizes negation as failure (NAF) or default negation [42]. Consequently, body−(r) denotes the negative part of the body. When body−(r) = Ø, the rule r, having no negative atoms, is positive.
EILP is inspired by GILP [20] and NILP. With the guidance of exceptions, GILP is extended and applied to the learning of consistency constraints. A relational atom is Q(P, X̄), where Q is one of the set predicates K, F+, F−, T+, or T−, and X̄ is a vector of constants or variables. P is a given relation in the KB. Thus, T+(P, X̄) is an atom meaning that all constants in Var(X̄) are deduced to be correct by a rule, while T−(P, X̄) expresses the collection of vandalized facts.
An EILP rule, denoted by ϕ, is a Horn clause of the form
∀x_1 ⋯ ∀x_k : φ(X̄) ∧ not_E → ψ(X̄), i.e., ∀x_1 ⋯ ∀x_k : ¬E ∧ φ(X̄) → ψ(X̄),
where ψ(X̄) is a single atom as in the definition of first-order logic rules in Section 2.1; φ(X̄) is a conjunction of atoms; x_1, …, x_k are the variables in Var(X̄); and E is the exception of the body.
In addition, by covering the correlational predicates among the five set predicates, rule atoms are applied to discriminate facts. Addressed facts and feedback from positive or negative facts in a KB can be clearly separated, and deduced facts are deemed correct or incorrect by the relational predicates. Because F is either correct or incorrect in our framework, there are two kinds of EILP rules.
Exclusive exception rules have the formula
∀x_1 ⋯ ∀x_k : φ−(X̄) ∧ E− → T−(R, X̄),  or  ∀x_1 ⋯ ∀x_k : φ+(X̄) ∧ not_E+ → T−(R, X̄).
Inclusive exception rules have the formula
∀x_1 ⋯ ∀x_k : φ+(X̄) ∧ E+ → T+(R, X̄),  or  ∀x_1 ⋯ ∀x_k : φ−(X̄) ∧ not_E− → T+(R, X̄),
where x_1, …, x_k are the variables in Var(X̄); E+ and E− are exceptions with positive and negative features, respectively; and not_ indicates that the rule does not contain these properties.
Example 1.
The following exclusive exception rule states that "The family names of all humans with USA nationality are correct":
  • K(hasFamilyName, x_1, y_1) ∧ K(hasNationality, x_1, USA) → T+(hasFamilyName, x_1, y_1)
  • K(hasFamilyName, x_1, y_1) ∧ K(hasNationality, x_1, y_1) ∧ not_USA(y_1) → T−(hasFamilyName, x_1, y_1)
Conversely, the inclusive exception rule below states that "The family names of all humans with Asian nationality (like China) are incorrect":
  • K(hasFamilyName, x_1, y_1) ∧ K(hasNationality, x_1, y_1) ∧ not_China(y_1) → T+(hasFamilyName, x_1, y_1)
  • K(hasFamilyName, x_1, y_1) ∧ K(hasNationality, x_1, China) → T−(hasFamilyName, x_1, y_1)
Example 1 shows that the two rules contain positive information in the feedback of samples. For the whole knowledge base, some outliers, or exceptions, exist for any two rules. A single first-order Horn rule cannot cover all triples that meet the conditions of its body and head. The rule learning framework extracts sparse exception information from feedback. In the GILP model [20], positive or negative rules filter the same triples repeatedly across different rules, leading to redundant knowledge. Horn rules offer better explanations for human understanding and great potential for exploring interesting negative statements in KBs. Compared to cooccurrence entities in other knowledge bases, some labeled properties are searched to distinguish negative statements in feedback. To automate rule mining, labels attached to samples are utilized to replace feedback in the process of pruning the search space.
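An exception rule such as the exclusive one in Example 1 can be sketched as follows in Python (our own encoding with invented facts; the not_USA exception is evaluated as negation as failure over the ground facts):

```python
# Toy ground facts as (predicate, subject, object) tuples.
KB = {
    ("hasFamilyName", "p1", "Smith"),
    ("hasNationality", "p1", "USA"),
    ("hasFamilyName", "p2", "Li"),
    ("hasNationality", "p2", "China"),
}

def family_names_flagged_incorrect(kb):
    """T-(hasFamilyName, x, y): family-name facts whose subject fails the
    USA-nationality test (the not_USA exception, as negation as failure)."""
    nationality = {x: y for p, x, y in kb if p == "hasNationality"}
    flagged = set()
    for p, x, y in kb:
        if p == "hasFamilyName" and nationality.get(x) != "USA":
            flagged.add(("hasFamilyName", x, y))
    return flagged

neg = family_names_flagged_incorrect(KB)
# flags only ("hasFamilyName", "p2", "Li")
```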
Algorithm 1 describes EILP. Rule augmentation updates rule generation by adding negative atoms (exceptions) to rule bodies. A set of seedRules Φ_0 is mined from the initial knowledge feedback, and F_0 is generated. The facts T are predicted by the accepted rules Φ_acc, and T is initially empty. Similarly, the iteratively merged global rule set Φ and feedback F are extended from Φ_0 and F_0, respectively.
Algorithm 1 EILP
Require: Φ_acc := Ø; T := Ø; Φ := Φ_0; F := F_0
Ensure: Φ_acc^E
1: Φ_0 := generateSeedRules(F_0)
2: while Φ ≠ Ø or T changed do
3:    Φ_i := all rules in Φ which are accepted based on Property at the schema level or a cooccurring similar property in a similar knowledge base
4:    Φ_acc^{E+} := top-k exceptional rules from Φ_i by Exception
5:    Φ_rej^{E−} := all rules in Φ which can be rejected by Exception
6:    T+/T− := positive/negative facts predicted by Φ_acc or Φ_rej
7:    if each ϕ ∈ Φ then
8:        Φ := Φ ∪ EILP(F, K)
9:        F := F ∪ pullPrediction(T)
10:   end if
11: end while
At each rule learning iteration, all rules in Φ are accepted on the basis of a TBox property at the schema level or a cooccurring similar property in a similar KB. Next, Φ_acc is filtered from the current Φ by exceptions in the TBox property, based on the top 10 exceptional rules. Here, the confidence ranking is leveraged to discover exceptional rules with higher confidence. Subsequently, the mean number of accepted rules between iterations is utilized to settle the selection of k; referring to embedding methods, k equals 10. Similarly, the predicted facts T are developed by Φ_acc in the current loop. The EILP model learns from two mutually exclusive sets of feedback facts, F and K. pullPrediction(T) fetches the next round of knowledge feedback as a randomly chosen subset of the currently predicted facts T. The rule learning loop terminates either when no Φ can be created or when T remains unchanged between two rounds.
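The control flow of Algorithm 1 can be sketched as a Python skeleton (a schematic only; the helper callables stand in for the paper's components and are supplied by the caller, so nothing here is the actual EILP implementation):

```python
def eilp(seed_feedback, *, generate_seed_rules, accepted_by_schema,
         confidence, predict_facts, pull_prediction,
         refine_with_exceptions, k=10, max_iters=20):
    """Schematic EILP loop: mine seed rules, keep the top-k by confidence,
    predict facts, pull new feedback, and refine rules with exceptions.
    Stops when no rules remain or the predicted set T stops changing."""
    rules = generate_seed_rules(seed_feedback)
    accepted, predicted = [], set()
    feedback = set(seed_feedback)
    for _ in range(max_iters):
        if not rules:
            break
        candidates = [r for r in rules if accepted_by_schema(r)]
        top_k = sorted(candidates, key=confidence, reverse=True)[:k]
        accepted.extend(top_k)
        new_preds = predict_facts(top_k)
        if new_preds <= predicted:   # T unchanged between rounds
            break
        predicted |= new_preds
        feedback |= pull_prediction(new_preds)
        rules = refine_with_exceptions(rules, feedback)
    return accepted, predicted
```

With toy callables (e.g., a fixed confidence table and k = 1), the loop keeps only the highest-confidence rule and its predictions.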
The TBox and ABox properties are combined to generate feedback automatically in EILP. The EILP model is then expanded to iteratively learn consistency constraints from feedback. Here, exceptional information is added to the rule extraction process, so Φ with higher precision is mined for rule augmentation with exceptions. The linked property is leveraged to replace the work that experts would otherwise perform for the feedback. Rules with exceptions are constructed from automated feedback; as such, the module enhances the effectiveness of rule mining. In the correction module, the previous operation mines as much negative feedback as possible and improves a large number of correct features in positive rules. A highlight of EILP is the automatic selection of exceptional features from feedback. Rough features and fuzzy rules are discarded after the application of exceptions or negative features.

3.3. EILC

To devise a comprehensive correction model, all triples are divided into negative and positive feedback. To repair a group of erroneous triples covered by a negative rule, the violated range constraint of the triples is applied. EILC is shown as Algorithm 2.
Algorithm 2 EILC
Require: (ϕ^{E+}(x, y), ϕ^{E−}(x, y)) := EILP(F_0, K)
Ensure: Corr_n
1: Corr_0 := Ø
2: T− := incorrect facts predicted by ϕ^n
3: [γ(x)] := ϕ^{E−}(x, y) → ξ(x, y, z)
4: while ξ(x, y, z) ≠ Ø do
5:    ϕ_Refine := (ϕ^{E+}(x, y), ϕ^{E−}(x, y)) → ϕ_refine
6:    T := all incorrect or correct predictions in ϕ_refine which are accepted by Property in the TBox
7:    Corr_i := pullCorrections(filterRepairs(T))
8:    ξ(x, y, z) removes T
9: end while
Corr_0 is initialized to the empty set, Ø. Negative facts are extracted from ϕ^n. Erroneous triples are generated by ϕ−(x, y) and kept in γ(x). For errors in large KBs, the new rules (ϕ^{E+}(x, y), ϕ^{E−}(x, y)) are revised into ϕ_refine to process negative knowledge. While the set of negative triples is nonempty, all predictions in ϕ_refine are accepted by the Property in the TBox. The similarity measures are leveraged in the filterRepairs function (Section 2.3). pullCorrections calculates the final repair similarity to obtain the final corrections. Iterations terminate when no triples in ξ(x, y, z) can be generated.
By rewriting the consistency constraints in Algorithm 3 as first-order rules over set predicates, all of the above classes of constraints are unified in a single rule. The generated ϕ_refine rules are utilized in EILC. The Refine Rules algorithm leverages their cooccurring variables to revise negative rules, after adding the positive features of positive rules, to produce a set of possible repairs for erroneous knowledge in KBs. In Algorithm 3, mapVariable is leveraged to form a union of conjunctive rewriting queries as the correction rules ϕ_refine.
Algorithm 3 Refine Rules
Require: ϕ+(x, z), ϕ−(x, y) (positive/negative rules); x is a variable; y, z are variables or constants
Ensure: ϕ_refine(x, y, z): union of conjunctive rewriting queries
1: ϕ_refine := Ø
2: for each rule r− ∈ ϕ−(x, y) do
3:    r+ → head+(subject+, object+)
4:    for each atom p(sub, obj) ∈ body of r+ do
5:        if sub == subject+ then
6:            mapVariable(p(sub, obj), subject+, subject−) = p(subject−, obj)
7:        else
8:            mapVariable(p(sub, obj), subject+, subject−) = p(sub, obj)
9:        end if
10:       if obj == subject+ then
11:           mapVariable(p(sub, obj), subject+, subject−) = p(sub, subject−)
12:       else
13:           mapVariable(p(sub, obj), subject+, subject−) = p(sub, obj)
14:       end if
15:       r_refine = r_refine ∪ mapVariable(p(sub, obj), subject+, subject−)
16:    end for
17:    ϕ_refine := ϕ_refine ∪ r_refine
18: end for
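The variable mapping at the heart of Algorithm 3 can be sketched in Python as follows (our own encoding: atoms are (predicate, subject, object) tuples and variables are "?x"-style strings; this is an illustration, not the paper's implementation):

```python
def map_variable(atom, subject_pos, subject_neg):
    """Rewrite one body atom of the positive rule, replacing occurrences of
    the positive rule's subject variable with the negative rule's subject."""
    pred, sub, obj = atom
    if sub == subject_pos:
        sub = subject_neg
    if obj == subject_pos:
        obj = subject_neg
    return (pred, sub, obj)

def refine_rule(pos_body, subject_pos, subject_neg):
    """Map every atom of the positive rule's body onto the negative rule's
    subject variable, yielding the rewritten body r_refine."""
    return [map_variable(a, subject_pos, subject_neg) for a in pos_body]

body = [("birthPlace", "?a", "?b"), ("type", "?b", "Country")]
refined = refine_rule(body, "?a", "?x")
# refined == [("birthPlace", "?x", "?b"), ("type", "?b", "Country")]
```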
mapVariable reconciles incorrect and correct knowledge in KBs. In a rewriting step, one input rule (a negative first-order clause) is reformed into a conjunctive clause (adding a relational positive query) from the sample database. Inspired by Horn description logics (DL) [43], an example query ϕ* is illustrated below, where q_1 is a revised clause with respect to the nationality property. The query is evaluated over the KB, and the corrected item (Germany) replaces the erroneous entity (German_people) in the nationality predicate:
ϕ*: stateOfOrigin(y, German_people) ∧ relative(x, y) ∧ nationality(x, German_people) ∧ birthPlace(x, z) ∧ country(z, f) → nationality*(x, f).
For low-quality noise, consider, for example, the triple <dbr:Max_Amann, dbo:nationality, dbr:Germans> in DBpedia. As this triple has a wrong range type for its relation, such facts with low-quality noise are detected, and the correction rules are then leveraged to correct these errors. For high-quality noise, an example is <Yao_Ming, hasGivenName, Yao> in YAGO, where the rules guarantee correct constraints on types and schema relations.
Approaches for logic-based link prediction are close to our research. Previous ILP work on relational association rule mining, e.g., AMIE+ [35], mines only first-order rules, whereas NILP rules are mined with the type property in KBs; the rules differ in their bodies and heads. Inspired by rules with exceptions, we focus on more detailed positive rules in order to reform revised rules from negative statements:
ϕ_1^+: birthPlace(a, b) ∧ type(b, Country) → nationality^+(a, b)
ϕ_1^−: birthPlace(a, b) ∧ not_Country(b) → nationality^−(a, b)
ϕ_1^*: birthPlace(a, b) ∧ not_Country(b) ∧ P_1(b, f) ∧ type(f, Country) → nationality^*(a, f).
From Equation (9), one predicate is added to ϕ_1^− to create a revised rule. Thus, a new query is created to find P_1: SELECT ?p WHERE { ?a birthPlace ?b . ?b ?p ?f . ?f a Country . FILTER NOT EXISTS { ?b a Country . } }. If some entities without nationality relations satisfy the query results, these new triples are acquired to complete the KB. The most appropriate rewriting forms are detected to illustrate the semantics of exception rules by revising negative rules with positive features. Finally, the revised triples are extracted from the correction rules to complete the knowledge bases.
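Such a P_1-discovery query can be assembled programmatically. A Python sketch (our own helper; prefixes and endpoint handling are omitted, and the parameter names are illustrative):

```python
def build_p1_query(body_pred="birthPlace", range_type="Country"):
    """Assemble the SPARQL query that searches for a connecting predicate ?p
    between an out-of-range tail ?b and an entity ?f of the required range type."""
    return (
        "SELECT ?p WHERE { "
        f"?a {body_pred} ?b . "
        "?b ?p ?f . "
        f"?f a {range_type} . "
        f"FILTER NOT EXISTS {{ ?b a {range_type} . }} "
        "}"
    )

query = build_p1_query()
```

The resulting string would then be submitted to a SPARQL engine such as the public DBpedia endpoint.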

3.4. Complexity Analysis

EILP was implemented in Java 11 with the SPARQL Protocol and RDF Query Language (SPARQL) (https://www.w3.org/TR/sparql11-query/, accessed on 1 May 2022). Based on small samples, the model mines positive/negative rules with high precision. A website query engine and a local RDF3X engine were utilized to validate the results. These architectural choices were motivated by the availability of large knowledge bases, and the model is compliant with SPARQL. The computational complexity of EILP, excluding constraints, is linear with respect to the properties linked in the knowledge bases. SPARQL query evaluation is PSPACE-complete in general, so the time complexity depends on the SPARQL engine that computes the queries. The number of graph patterns in a query grows linearly with the size of the rule body. Because our queries contain only AND and FILTER operators, they can be evaluated with time complexity O(B · N), where B denotes the body size and N is the total number of triples. EILP and EILC provide pruning optimizations in large knowledge bases. As queries for adjacencies might be repeated, a new validation measure is introduced to avoid unnecessary queries and time consumption for large knowledge bases. Moreover, since the rules are sorted by confidence value in descending order, the algorithm can easily prune search spaces with the new measures. The model implements a rule-learning algorithm that executes each type (negative/positive rules) with high optimization.
The computational complexity of EILP can be analyzed in detail as follows, incorporating both theoretical bounds and practical optimizations:
The model of EILP complexity is linear with respect to the number of linked exceptional types in KBs. This implies a baseline complexity of O ( L ) , where L is the number of linked types. Rule generation involves iterating over graph patterns, which grow linearly with the size of the rule body (denoted as B ). However, since EILP restricts queries to AND and FILTER operators (excluding UNION or OPTIONAL), the complexity reduces to O ( B · N ) , where B is the size of the rule body (number of graph patterns), and N is the total number of RDF triples in the KB.
This simplification avoids exponential blowup, as AND/FILTER operations are tractable via join-ordering optimizations in SPARQL engines such as RDF3X. Repeated adjacency queries are avoided through a novel validation measure, reducing redundant computations. Rules are sorted by confidence (descending), enabling early termination of the search when lower-confidence rules are encountered; this reduces the effective search space in practice. In the best case, pruning eliminates most redundant queries and high-confidence rules are found early, so the complexity approaches O(L + B · N) with a low constant factor. In the worst case, no pruning occurs (e.g., no rules meet the confidence thresholds) and all graph patterns are evaluated; the complexity remains O(L + B · N) but with higher constants due to exhaustive SPARQL evaluations, as shown in Table 1.
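The confidence-descending pruning described here can be sketched as follows (assumed interface: rules as (name, confidence) pairs and a single acceptance threshold; the real algorithm evaluates SPARQL queries per rule rather than a flat list):

```python
# Sketch of confidence-sorted evaluation with early termination: once one
# rule falls below the threshold, every remaining rule is guaranteed to be
# lower-confidence as well, so the rest of the search space is pruned.
def evaluate_rules(rules, threshold=0.9):
    accepted = []
    for name, conf in sorted(rules, key=lambda r: r[1], reverse=True):
        if conf < threshold:
            break  # all later rules have lower confidence: prune
        accepted.append(name)
    return accepted

rules = [("phi1", 0.97), ("phi3", 0.42), ("phi2", 0.93), ("phi4", 0.88)]
print(evaluate_rules(rules))  # ['phi1', 'phi2']
```

The best case (high-confidence rules found early) stops after a few iterations; the worst case (nothing above the threshold) still touches each rule only once.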
SPARQL engines such as RDF3X use indexing and compression, limiting memory overhead. Intermediate results for AND/FILTER operations scale with the number of matched triples, bounded by O(N). Storing positive/negative rules requires space proportional to the number of rules (R) and their body sizes O(B), leading to O(R · B). Here, pruning reduces the number of rules retained in memory, lowering effective space usage.
The use of RDF3X Engine leverages efficient indexing, reducing the hidden constants in O ( B · N ) . For large KBs (e.g., Wikidata), pruning and sorted rule evaluation are critical to maintain tractability. The reported linear complexity excludes constraints; incorporating them could reintroduce PSPACE dependencies shown in Table 1. By explicitly addressing these dimensions, the analysis clarifies EILP’s efficiency trade-offs and underscores its suitability for large-scale knowledge bases with sparse constraints.
In our model, we prune search spaces with these measures to shorten the time complexity. Here, retrieving more triples related to a rule ϕ is performed to obtain a better value of CP(ϕ). For example, suppose that we require the confidence to be 95% and the lower bound of CP(ϕ) to be higher than 0.9; then users need to comment on at least 34 triples (assuming here that all users give correct responses) according to Equation (10).
Finally, we need an extra requirement with respect to user feedback, called the correctness probability (CP for short). Statistically, the more tuples covered by a rule ϕ, the greater the confidence we have in the correctness of ϕ. Ideally, we would obtain comments for all tuples inside K covered by ϕ; in practice, we can only obtain a limited amount of feedback. Suppose we obtain a set of feedback F. According to the Wilson interval, the probability that ϕ is correct, denoted by CP(ϕ), is located within the following range with quantile z:
CP(ϕ) ∈_α  (1 / (1 + z²/n)) · ( p̂ + z²/(2n) ± z·√( p̂(1 − p̂)/n + z²/(4n²) ) )
Here, p̂ is the observed average, i.e., the fraction of consistent instantiations of ϕ with respect to F; n is the number of instances covered by ϕ; and z is the (1 + α)/2 quantile of a standard normal distribution (that is, z = 1.96 when α = 0.95). Generally speaking, we prefer rules that are probably correct (i.e., CP(ϕ) is larger than θ) with high confidence (i.e., z is large enough).
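A minimal implementation of this bound, for readers who want to reproduce the feedback-size estimate: with all-positive feedback (p̂ = 1) the lower bound reduces to 1/(1 + z²/n), which crosses 0.9 at roughly n ≈ 9z² ≈ 35 triples, close to the figure of 34 quoted above.

```python
import math

# Wilson score interval for the correctness probability CP(phi).
# p_hat: fraction of consistent instantiations; n: feedback count;
# z: normal quantile (1.96 for 95% confidence).
def wilson_bounds(p_hat, n, z=1.96):
    center = (p_hat + z * z / (2 * n)) / (1 + z * z / n)
    half = (z / (1 + z * z / n)) * math.sqrt(
        p_hat * (1 - p_hat) / n + z * z / (4 * n * n)
    )
    return center - half, center + half

# With all-correct feedback (p_hat = 1) the lower bound is 1 / (1 + z^2/n):
lo30, _ = wilson_bounds(1.0, 30)
lo35, _ = wilson_bounds(1.0, 35)
print(round(lo30, 3), round(lo35, 3))  # 0.886 0.901
```

Around 35 all-positive responses are thus enough to push the lower bound of CP(ϕ) past 0.9 at the 95% level.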

4. Experiments

4.1. General Setup

In the experiments, Wikidata (https://query.wikidata.org, accessed on 1 May 2023) and two versions of DBpedia (2016, 2020) (https://wiki.dbpedia.org/develop/datasets, accessed on 1 May 2020) were selected as training sets. Two open and available query editors, DBpedia SPARQL (https://dbpedia.org/sparql, accessed on 1 May 2024) and Wikidata Query (https://query.wikidata.org, accessed on 1 May 2023), were used in our models. We chose DBpedia as the baseline and Wikidata as a control group with cooccurring similar triples. The EILLearn model was implemented in Java 1.8.0 on a notebook computer with an Intel i7 CPU at 1.80 GHz and 16 GB of memory.

4.2. Error Detection

Our error detection architecture depends on several tuning parameters. The quantity of feedback was 100 in the initial step, and the maximum size of a rule was 3. The other rule-learning parameters can be seen in [44], and the confidence threshold for the selection of rules was kept at 0.01. In each iteration of EILP, the confidence under the partial completeness assumption (PCA) was considered a measure of the quality of the rules. EILP was leveraged to complete the error detection task.
In the GILP model, negative feedback is generated by experts. Here, the TBox property was leveraged to provide this feedback in place of experts. For the predicate name nationality, some example rules learned by GILP are shown in Table 2. Figure 3 depicts the number of positive and negative GILP rules learned over multiple iterations in different PCA intervals. Among the 10 confidence intervals of the positive rules, the quantity of rules changes fastest in the interval [0.9, 1.0], while there are minor differences in the other intervals, and their numbers are less than 100. DBpedia can achieve very high average precision values (over 95%; the estimated precision of those statements is 95% in Data Set 3.9 of DBpedia), so the amount of negative information mined is very small, and there are also very small gaps in the intervals other than [0.9, 1.0]. It can be seen that the quantity of mined positive/negative rules is concentrated in the interval [0.9, 1.0], and rules in this interval can be preserved to generate predictions. Finally, a candidate rule is accepted if its accuracy exceeds the threshold of 0.9.
Figure 4 describes the number of positive and negative facts predicted over multiple iterations in the GILP model. After about 20 iterations, the experimental results of this cycle reached a stable state, and the number of positive predictions in the PCA confidence interval exceeded the number of negative predictions. The results suggest that negative rules yield fewer predictions than positive rules. If predictions of positive or negative rules are considered only in the interval [0.9, 1.0], some implicit predictions are missed in the interval [0, 0.2]. In each loop, the numbers of positive predictions are close in the two intervals. For predicted negative facts, the number in [0, 0.2] is much higher than in [0.9, 1.0]. It can be seen that some implicit information is worth investigating in the interval [0, 0.2]. If all final rules are considered, there is a large number of invalid rules in [0, 0.2]. Here, the top 10 rules are selected as the effective results; in the process of rule extraction, they are filtered by the number of predictions in the final iteration.
Different from expert feedback, our model mines positive/negative rules based on the TBox property of KBs. Here, the largest difference can be observed in the numbers and predictions of rules in the intervals [0, 0.2] and [0.9, 1.0], while the numbers of predictions between the two PCA confidence intervals are close, as shown in Figure 5. It can be seen that rules in the interval [0, 0.2] hide a large number of new predictions, which are worthy of exploration. Thus, EILP is proposed to explore features that improve the quality of rules.
To obtain implicit information in the confidence interval [0, 0.2], we mine abnormal rules from the perspective of abnormal information. Here, some example rules learned by NILP are shown in Table 3. Positive and negative rules have different exceptions in the training data. The disadvantage of NILP is that it requires an ideal knowledge base containing all the information needed to find outliers for the training set. In ϕ5+, the abnormal information is citizenship, which has a property similar to nationality. In the training sets, the confidence of ϕ5+ is over 0.9. However, some predictions following ϕ5+ can be regarded as negative statements. Inspired by GILP and NILP, exceptional information and negative statements are both considered in EILP.
Some rules learned by EILP are shown in Table 4. Here, some exceptional information is extracted from feedback or from the type-of-property knowledge. In the examples, nationality selects the range type property wikidata:Q6256 (https://www.wikidata.org/wiki/Q6256, accessed on 1 May 2023) (dbo:Country) (https://dbpedia.org/ontology/, accessed on 1 May 2023) to filter positive feedback. Then, negative feedback is acquired from the property wikidata:Q41710 (dbo:EthnicGroup) or other exceptional information, e.g., dbo:language, dbo:organization, SportsTeam.
We compare GILP and EILP rules to find exception intervals over 20 loops, as shown in Figure 6. Here, a reverse-thinking algorithm is leveraged to obtain the number of rules and reduce time complexity. When positive/negative rules with exceptions satisfy x = 0, the original rules have 100% precision with positive/negative feedback. Through reverse validation, exception intervals are found when the experimental results remain stable after 20 iterations. Coverage of rules still guides our assessment of rule quality: good positive rules should cover many positive/negative example pairs in the KB, while negative rules should cover as few pairs of outliers as possible in our model. From Figure 6, the outliers covered by positive rules stabilize after five iterations, while negative rules balance after almost 18 iterations. In each iteration, rules cover fewer than 10 exceptional facts, as shown in the interval [0, 10]. These can be defined as rough rules, and there is much effective information worthy of exploration. A negative rule contains less negative-example information, and the maximum number is 20. A positive rule contains a large amount of valid information, and more than 500 rules have 100% precision. Therefore, positive and negative feedback can be analyzed, and more effective results can be mined for subsequent corrections.
In our experiments, the rule-learning algorithms are amended to automatically filter rules using information external to the knowledge base. The search space is pruned with a Markov logic network, and the depth of expansion feedback is 3. Combining positive and negative rules, logical queries are devised to correct negative feedback. A single negative rule is acquired randomly to filter the negative knowledge in batches. Four rule-learning algorithms, namely association rule mining under incomplete evidence (AMIE), GILP, NILP, and EILP, are used to mine rules. Associated rules are considered in our framework, and the language bias follows the measures in the GILP model. In Figure 7, GILP mines the most positive rules, and EILP performs better in terms of the quantity of negative feedback. In response to this phenomenon, GILP has stricter constraints for errors in KBs. Negative feedback provides little associated information to enact rules. To further prune rule learning, an exception is added to the original rules when selecting the best rules. The goal is to mine rules that cover positive examples without covering any negative-example information.
In five loops, GILP mines the largest number of positive rules, and AMIE mines the most negative rules. The number of rules does not determine their accuracy. Because GILP and AMIE do not consider abnormal information, there are many fuzzy rules containing both positive and negative information, and the correctness of their predictions cannot be determined. Therefore, exception information is added to GILP and AMIE, and the new models EILP and NILP compare regular and exception rules. It can be seen that the performance of EILP is close to that of GILP and better than that of AMIE on positive rules. In terms of negative feedback, EILP obtains the fewest rules, removes redundant rules, and achieves the best effects. Comparing the two cases, it can be found that EILP only contains rules with 100% accuracy, and invalid rules are excluded. Our goal is to obtain the smallest number of rules and the most accurate rules possible; in the feedback of positive and negative examples, EILP achieves this.
We analyze three types of rules (GILP, NILP, EILP) and show those with confidence greater than 0.9 in Table 2, Table 3 and Table 4. The final confidence intervals of the rules are explained in Figure 8. Because GILP considers no outliers, EILP and NILP extend the GILP model, and their overall effects are better than those of GILP. Therefore, we analyze only the difference between EILP and NILP, using the results of the last set of loops. For small samples, the two exception algorithms are compared, and the results show that EILP has better effects, with a confidence of 1.0. For positive feedback, the rules obtained by EILP achieve a correction precision of 100%, and the number of rules is more than 300. The number of rules obtained by NILP is less than 100, and their confidence satisfies precision > 0.9. Due to the small amount of negative information, the number of rules obtained is small, so it is easy to fall into local optima. In the case of small samples of negative feedback, the rules obtained by EILP have a significant effect when precision is greater than 0.9; almost no rules satisfy these conditions in NILP. From this, it can be seen that EILP has a better effect among rule-learning algorithms with exceptions. Here, all errors can be detected by negative rules, as shown in Table 2, Table 3 and Table 4. Finally, all negative predictions generated by EILP are leveraged in the task of knowledge correction.

4.3. Knowledge Correction

In the error detection part, EILP generates erroneous triples as training data for the task of knowledge correction. We select nationality as the predicate name to generate erroneous triples in two versions of DBpedia (2016, 2020). Depending on different subjects and objects, the training data are divided into four groups: subject (2016), subject (2020), object (2016), object (2020).
In the EILC model, errors are generated by the negative rules of the EILP model. To build correction rules in EILC, the best positive rule is chosen to rebuild the logical queries in EILP. For example, rule ϕ3− is picked as the negative sample in Table 2. Analyzing the search space of ϕ3−, the correction model finds a positive rule ϕ3+ to build a logical query for correction. Then, a new refined query (ϕ_Revised) is generated: birthPlace(a, f) ∧ populationPlace(b, f) ∧ nationality−(a, b) ∧ type(d, Country) → nationality*(a, f). The constants bound to f serve as repairs from the refined clause ϕ_Revised. Typically, the corrected targets of nationality are multivalued. As such, the top 10 repairs are taken, and the final corrections are filtered by similarity measures.
EILC focuses on generating correction rules with exceptions, which yield higher-quality predictions. Here, positive rules ϕi+ revise false statements from negative rules based on the features of subjects. EILC corrects incorrect information in triples with different subjects using Algorithm 3.
Given an erroneous fact <Adélard_Turgeon, nationality, Canadians>, we extract features in ϕ4+ of the GILP model: <Adélard_Turgeon, birthPlace, Canada>, <Canada, type, Country> → <Adélard_Turgeon, nationality, Canadians>. In the EILP model, the erroneous triple can also be extracted by ϕ1+. Following Equation (9), the exception correction rule is generated to obtain the repair <Canadians, Canada> for the subject <Adélard_Turgeon>. Then, the revised fact is <Adélard_Turgeon, nationality, Canada>. In the EILC algorithm, a single rule only corrects a limited number of erroneous triples; the model corrects the erroneous information of a target predicate name with multiple correction rules so as to achieve the correction of a large knowledge base. Some EILC correction rules are shown in Table 5. The symbol ⋈ is utilized to connect negative and positive rules with common attributes. As in Example 1, the atoms (birthPlace(a, c) ∧ country(c, b)) exist in both the negative and positive rules, so the final atom nationality+(a, d) is applied to acquire repairs for negative rules in batches.
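The batch repair via the ⋈ join can be sketched on toy data (illustrative subject/object maps standing in for the rule heads, not the actual rule engine): subjects matched by both the negative head and the positive head receive the positive head's object as the repair.

```python
# Toy sketch of the join between a negative rule head and a positive rule
# head that share body atoms: for each shared subject, the erroneous object
# from the negative head is paired with the repair from the positive head.
def join_repairs(neg_heads, pos_heads):
    # neg_heads: subject -> erroneous object; pos_heads: subject -> repair
    return {
        s: (neg_heads[s], pos_heads[s])
        for s in neg_heads
        if s in pos_heads and neg_heads[s] != pos_heads[s]
    }

neg = {"Adelard_Turgeon": "Canadians"}   # from nationality-(a, b)
pos = {"Adelard_Turgeon": "Canada"}      # from nationality+(a, d)
print(join_repairs(neg, pos))
# {'Adelard_Turgeon': ('Canadians', 'Canada')}
```

In the full system, both maps come from evaluating the rule bodies over the KB, so one join repairs all matching subjects of the target predicate at once.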
The corresponding subject or object in Wikidata is found via the property owl:sameAs in DBpedia and utilized as the repair of erroneous triples in triple correction assessment (TCA). TCA can only verify triples with the attribute owl:sameAs; if Wikidata lacks a fact that exists in DBpedia, TCA cannot perform verification. Therefore, the verification result of TCA is used to measure the correction effect of EILC. The more erroneous triples exist in KBs, the more time it takes to perform corrections in TCA. We match the erroneous triples in the external knowledge base (Wikidata) and find the target entity to replace the old one as the correction. For small numbers of errors, the method is easy and quick. For large KBs, many errors and rules including features of errors are presented in Table 2. A single rule proposes two correlated predicates for the targets. Here, the top k rules are leveraged to perform the refining queries, and the confidence of the top k refined logic rules is counted and saved in the group of correction rules. Then, for the vandalized fact nationality(Donald_Heins, Canadians), the erroneous Canadians is repaired by Canada in the TCA model.
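A minimal sketch of the TCA lookup idea, with a hypothetical owl:sameAs mapping and external KB represented as dictionaries (real TCA queries Wikidata via SPARQL):

```python
# Minimal sketch: follow owl:sameAs into the external KB and take the
# external value as the repair, if one exists; entities without a sameAs
# link cannot be verified by TCA.
def tca_repair(entity, predicate, same_as, external_kb):
    wd = same_as.get(entity)
    if wd is None:
        return None  # no owl:sameAs link: TCA cannot verify this entity
    return external_kb.get((wd, predicate))

same_as = {"Donald_Heins": "wd:Q_example"}            # hypothetical mapping
external_kb = {("wd:Q_example", "nationality"): "Canada"}
print(tca_repair("Donald_Heins", "nationality", same_as, external_kb))
# Canada
```

Returning `None` for unlinked entities mirrors the limitation noted above: TCA depends on the external KB containing information about the target fact, whereas EILC does not.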
Unlike EILC, TCA aims to construct a model that enables a machine to query a single fact, and its repair has a high probability of semantic correlation. The tail German_people is false for the relation nationality, and the correcting entity is Germany. For most triples <?, nationality, German_people>, the repair triple is <?, nationality, Germany>. The strength of TCA depends on facts in associated KBs that contain information about the target facts; if entities have no additional information, TCA cannot correct them. EILC avoids this problem. Figure 9 shows that more than 70% of the facts have a corresponding repair in Wikidata. Here, fewer than 15% of triples cannot obtain correction values by EILC in subject (2016). Compared to TCA, EILC can revise incomplete knowledge under the open-world assumption. The two methods receive repair rates based on all entities after the final iterations finish, covering more than 80% of the entities of the given predicate. Analyzing the final results, EILC is advisable for large KB correction. In addition, TCA only applies to the correction of Wikidata-related databases, and the corrected precision has little effect on the subject and object of incorrect knowledge. The two methods are leveraged to correct the negative statements in the two versions of DBpedia, since they contain a detailed TBox level to filter the wrong triples.
Compared to TCA and the training data, our experiments illustrate the effectiveness of EILC and the improvements in precision for KB correction, as shown in Figure 9. Base stands for the precision of the training data. There are obvious improvements over TCA, and our model gains further in performance. The two models reach nearly 90% precision after refining the whole KB. The results show that EILC has a more significant effect, and the directed logic rules with exceptions that we propose also serve as explanations. All results of empty and correction rates are shown in Figure 10. All DBpedia training data sets have correction rates greater than 0.8. In addition, EILC has an empty rate below 15%. The corrected rates for all facts are obtained from the final exception rules, which contain more than 80% of the atoms in the given predicates from DBpedia. Accordingly, EILC is favorable for correcting large knowledge bases. In addition, TCA can be utilized to validate triples and supplement EILC.
After acquiring the EILC repairs, 14 similarity measures are leveraged to compare these repairs to TCA, as shown in Figure 11. Mistaken entities with single values take that value as the final correction. For multiple candidate repairs, cross-similarity is proposed to discover the final corrections. Distance similarity measures are leveraged to validate repairs, such as the longest common subsequence (LCS), optimal string alignment (OSA), and normalized Levenshtein distance (NLD). Compared to DBpedia, the similarity of repairs in Wikidata is based on word similarity. For a single erroneous triple, the Jaro–Winkler similarity is utilized to validate repairs, and the revised correction lies in an interval with high precision. In the experiment, 2000 negative entities are randomly selected for verification in the TCA model. Cross-similarity achieves the best performance, as shown in Figure 11 and Equation (3), so cross-similarity is leveraged to filter the final repairs in the EILC model. The final pairs of errors and corrections exhibit unique characteristics, with a high degree of word similarity; for multiple repairs, some examples exhibit over 90% Jaro–Winkler similarity.
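As one of the named measures, normalized Levenshtein similarity can be sketched as follows; this is a generic implementation for illustration, not the exact measure configuration used in the experiments:

```python
# Normalized Levenshtein similarity: 1 - edit_distance / max_length,
# used to check that a candidate repair stays lexically close to the
# erroneous value (e.g., "Germans" vs. "Germany").
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,        # deletion
                           cur[j - 1] + 1,     # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def nld_similarity(a, b):
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

print(round(nld_similarity("Germans", "Germany"), 3))  # 0.857
```

Error/repair pairs like (Germans, Germany) score high under such measures, which is why lexical similarity is an effective filter for the final corrections.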
In practice, the precisions of most rules are polarized, i.e., close to one or close to zero. The following discussion is based on this observation. The left graph of Figure 12 illustrates the lower and upper bounds of the Wilson intervals of a positive rule (correctness probability higher than 0.9) with different numbers of predictions. We can ensure that the rule's CP is higher than 0.8 with high confidence. Figure 12 also shows the result for negative rules from negative feedback; it is higher than 0.8 with high confidence when over 400 predictions are pulled. Figure 12 provides a detailed analysis of the Wilson intervals (at γ = 0.95). An estimation of the precision (CP) of the learned rules is obtained in T+ (positive predictions) and T− (negative predictions) in each iteration (|F0| = 40). The intervals become tighter not only due to the increased amount of evaluation but also due to the increased amount of feedback collected at each iteration. They show that the accuracy of the predictions in subsequent iterations is greatly improved. After three to eight iterations, the mean of the accuracy interval of positive predictions converges to about 0.91, and even converges to 0.94 for negative facts, which confirms that our iterative guided rule-learning model is particularly good at pruning negative facts.

5. Conclusions

In this paper, we studied the problem of detecting and correcting errors in large KBs. The core of the detection and correction framework was learning rules with exceptions, which helped reveal inconsistencies in constraints that occurred during the construction of a knowledge base for a targeted predicate name. We highlighted the challenges of this problem, especially the exceptional information and the negative statements added to the rules. An iterative EILP algorithm was proposed to detect these mistakes; the EILC algorithm was proposed to correct these errors in the large KB, and appropriate metrics were proposed to assess the quality of the revised facts. The robustness and effectiveness of the proposed framework were verified by experiments on the DBpedia knowledge base.
In the future, we will study other kinds of nonmonotonic rules and update the ILP algorithm. Our next research will look into the extraction of conflicting feedback for or against exceptions from literals and the pruning of the search space with clustering algorithms to refine the KB. In addition, embedding methods can be incorporated in rule learning to improve the interpretability of knowledge.

Author Contributions

Conceptualization, Z.Z.; Methodology, Z.Z.; Software, X.L. and Z.Z.; Validation, X.L.; Data curation, Y.W.; Writing—original draft, Y.W.; Writing—review & editing, H.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data openly available in a public repository. The data that support the findings of this study are openly available in DBPedia at https://downloads.dbpedia.org/wiki-archive/ (accessed on 1 May 2025).

Acknowledgments

The authors thank Haojie Lian of the Key Laboratory of In situ Property-improving Mining of Ministry of Education in Taiyuan University of Technology for helpful discussions on topics related to this work.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Bai, Y.; Ying, Z.; Ren, H.; Leskovec, J. Modeling Heterogeneous Hierarchies with Relation-specific Hyperbolic Cones. Adv. Neural Inf. Process. Syst. 2021, 34, 12316–12327. [Google Scholar]
  2. Hao, Y.; Zhang, Y.; Liu, K.; He, S.; Liu, Z.; Wu, H.; Zhao, J. An end-to-end model for question answering over knowledge base with cross-attention combining global knowledge. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, BC, Canada, 30 July–4 August 2017; Volume 1: Long Papers. pp. 221–231. [Google Scholar]
  3. Xiong, C.; Power, R.; Callan, J. Explicit semantic ranking for academic search via knowledge graph embedding. In Proceedings of the 26th International Conference on World Wide Web, Perth, Australia, 3–7 April 2017; pp. 1271–1279. [Google Scholar]
  4. Suchanek, F.M.; Kasneci, G.; Weikum, G. Yago: A core of semantic knowledge. In Proceedings of the 16th International Conference on World Wide Web, Banff, AB, Canada, 8–12 May 2007; pp. 697–706. [Google Scholar]
  5. Bizer, C.; Lehmann, J.; Kobilarov, G.; Auer, S.; Becker, C.; Cyganiak, R.; Hellmann, S. Dbpedia-a crystallization point for the web of data. J. Web Semant. 2009, 7, 154–165. [Google Scholar] [CrossRef]
  6. Vrandečić, D.; Krötzsch, M. Wikidata: A free collaborative knowledgebase. Commun. Acm 2014, 57, 78–85. [Google Scholar] [CrossRef]
  7. Fensel, D.; Şimşek, U.; Angele, K.; Huaman, E.; Kärle, E.; Panasiuk, O.; Toma, I.; Umbrich, J.; Wahler, A. Introduction: What is a knowledge graph? In Knowledge Graphs: Methodology, Tools and Selected Use Cases; Springer: Berlin/Heidelberg, Germany, 2020; pp. 1–10. [Google Scholar] [CrossRef]
  8. Bakshi, G.; Shukla, R.; Yadav, V.; Dahiya, A.; Anand, R.; Sindhwani, N.; Singh, H. An optimized approach for feature extraction in multi-relational statistical learning. J. Sci. Ind. Res. 2021, 80, 537–542. [Google Scholar]
  9. Tekli, G. A survey on semi-structured web data manipulations by non-expert users. Comput. Sci. Rev. 2021, 40, 100367. [Google Scholar] [CrossRef]
  10. Legast, M.; Legay, A. Rule-Based Expert System for Energy Optimization: Detection and Identification of Relationships Between Rules in Knowledge Base. Master’s Thesis, UCLouvain, Louvain-la-Neuve, Belgium, 2021. [Google Scholar]
  11. Hangloo, S.; Arora, B. Fake News Detection Tools and Methods—A Review. arXiv 2021, arXiv:2112.11185. [Google Scholar]
  12. Ahmed, M.; Ansar, K.; Muckley, C.B.; Khan, A.; Anjum, A.; Talha, M. A semantic rule based digital fraud detection. Peerj Comput. Sci. 2021, 7, e649. [Google Scholar] [CrossRef]
  13. Ko, H.; Witherell, P.; Lu, Y.; Kim, S.; Rosen, D.W. Machine learning and knowledge graph based design rule construction for additive manufacturing. Addit. Manuf. 2021, 37, 101620. [Google Scholar] [CrossRef]
  14. Tiddi, I.; Schlobach, S. Knowledge graphs as tools for explainable machine learning: A survey. Artif. Intell. 2022, 302, 103627. [Google Scholar] [CrossRef]
  15. Fan, W.; Geerts, F. Foundations of Data Quality Management; Synthesis Lectures on Data Management; Springer: Berlin/Heidelberg, Germany, 2012; Volume 4, pp. 1–217. [Google Scholar]
  16. Diligenti, M.; Giannini, F.; Gori, M.; Maggini, M.; Marra, G. A Constraint-Based Approach to Learning and Reasoning. Neuro-Symb. Artif. Intell. State Art 2022, 342, 192. [Google Scholar]
  17. Cropper, A.; Dumančić, S. Inductive logic programming at 30: A new introduction. arXiv 2020, arXiv:2008.07912. [Google Scholar] [CrossRef]
  18. Picado, J.; Termehchy, A.; Fern, A.; Pathak, S.; Ilango, P.; Davis, J. Scalable and usable relational learning with automatic language bias. In Proceedings of the 2021 International Conference on Management of Data, Xi’an, China, 20–25 June 2021; pp. 1440–1451. [Google Scholar]
  19. Srinivasan, A.; Faruquie, T.A.; Joshi, S. Data and task parallelism in ILP using MapReduce. Mach. Learn. 2012, 86, 141–168. [Google Scholar] [CrossRef]
  20. Wu, Y.; Chen, J.; Haxhidauti, P.; Venugopal, V.E.; Theobald, M. Guided Inductive Logic Programming: Cleaning Knowledge Bases with Iterative User Feedback. Epic. Ser. Comput. 2020, 72, 92–106. [Google Scholar]
  21. Gad-Elrab, M.H.; Stepanova, D.; Urbani, J.; Weikum, G. Exception-enriched rule learning from knowledge graphs. In Proceedings of the International Semantic Web Conference, Kobe, Japan, 17–21 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 234–251. [Google Scholar]
  22. Xue, B.; Zou, L. Knowledge Graph Quality Management: A Comprehensive Survey. IEEE Trans. Knowl. Data Eng. 2022, 35, 4969–4988. [Google Scholar] [CrossRef]
  23. Dylla, M.; Sozio, M.; Theobald, M. Resolving temporal conflicts in inconsistent RDF knowledge bases. In Datenbanksysteme für Business, Technologie und Web (BTW); Gesellschaft für Informatik e.V.: Bonn, Germany, 2011. [Google Scholar]
  24. Yakout, M.; Berti-Équille, L.; Elmagarmid, A.K. Don’t be scared: Use scalable automatic repairing with maximal likelihood and bounded changes. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, New York, NY, USA, 22–27 June 2013; pp. 553–564. [Google Scholar]
  25. Rekatsinas, T.; Chu, X.; Ilyas, I.F.; Ré, C. Holoclean: Holistic data repairs with probabilistic inference. arXiv 2017, arXiv:1702.00820. [Google Scholar] [CrossRef]
  26. Lertvittayakumjorn, P.; Kertkeidkachorn, N.; Ichise, R. Correcting Range Violation Errors in DBpedia. In Proceedings of the International Semantic Web Conference (Posters, Demos & Industry Tracks), Vienna, Austria, 23–25 October 2017. [Google Scholar]
  27. Ortona, S.; Meduri, V.V.; Papotti, P. Robust discovery of positive and negative rules in knowledge bases. In Proceedings of the 2018 IEEE 34th International Conference on Data Engineering (ICDE), Paris, France, 16–19 April 2018; pp. 1168–1179. [Google Scholar]
  28. Mahdavi, M.; Abedjan, Z. Baran: Effective error correction via a unified context representation and transfer learning. Proc. Vldb Endow. 2020, 13, 1948–1961. [Google Scholar] [CrossRef]
  29. Abedini, F.; Keyvanpour, M.R.; Menhaj, M.B. Correction Tower: A general embedding method of the error recognition for the knowledge graph correction. Int. J. Pattern Recognit. Artif. Intell. 2020, 34, 2059034. [Google Scholar] [CrossRef]
  30. Wu, Y.; Zhang, Z.; Wang, G. Correcting large knowledge bases using guided inductive logic learning rules. In Proceedings of the PRICAI 2021: Trends in Artificial Intelligence: 18th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2021, Hanoi, Vietnam, 8–12 November 2021; Proceedings, Part I 18. Springer: Berlin/Heidelberg, Germany, 2021; pp. 556–571. [Google Scholar]
  31. Wu, Y.; Zhang, Z. Refining large knowledge bases using co-occurring information in associated KBs. Front. Phys. 2023, 11, 1140733. [Google Scholar] [CrossRef]
  32. Pellissier-Tanon, T. Knowledge Base Curation Using Constraints. Ph.D. Thesis, Institut Polytechnique de Paris, Palaiseau, France, 2020. [Google Scholar]
  33. Zaveri, A.; Kontokostas, D.; Sherif, M.A.; Bühmann, L.; Morsey, M.; Auer, S.; Lehmann, J. User-driven quality evaluation of dbpedia. In Proceedings of the 9th International Conference on Semantic Systems, Graz, Austria, 4–6 September 2013; pp. 97–104. [Google Scholar]
  34. Zhang, L.; Wang, W.; Zhang, Y. Privacy preserving association rule mining: Taxonomy, techniques, and metrics. IEEE Access 2019, 7, 45032–45047. [Google Scholar] [CrossRef]
  35. Galárraga, L.; Teflioudi, C.; Hose, K.; Suchanek, F.M. Fast rule mining in ontological knowledge bases with AMIE+. Vldb J. 2015, 24, 707–730. [Google Scholar] [CrossRef]
  36. Hogan, A.; Hogan, A. Resource description framework. In The Web of Data; Springer: Berlin/Heidelberg, Germany, 2020; pp. 59–109. [Google Scholar]
  37. Picado, J.; Davis, J.; Termehchy, A.; Lee, G.Y. Learning Over Dirty Data Without Cleaning. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, Portland, OR, USA, 14–19 June 2020; pp. 1301–1316. [Google Scholar]
  38. Richardson, M.; Domingos, P. Markov logic networks. Mach. Learn. 2006, 62, 107–136. [Google Scholar] [CrossRef]
  39. Wang, Y.; Qin, J.; Wang, W. Efficient approximate entity matching using jaro-winkler distance. In Proceedings of the International Conference on Web Information Systems Engineering, Puschino, Russia, 7–11 October 2017; Springer: Berlin/Heidelberg, Germany, 2017; pp. 231–239. [Google Scholar]
  40. Islam, A.; Inkpen, D. Semantic text similarity using corpus-based word similarity and string similarity. Acm Trans. Knowl. Discov. Data (TKDD) 2008, 2, 10. [Google Scholar] [CrossRef]
  41. Lifschitz, V. Foundations of logic programming. Princ. Knowl. Represent. 1996, 3, 69–127. [Google Scholar]
  42. Lisi, F.A.; Weikum, G. Towards Nonmonotonic Relational Learning from Knowledge Graphs. In Proceedings of the Inductive Logic Programming: 26th International Conference, ILP 2016, London, UK, 4–6 September 2016; Revised Selected Papers. Springer: Berlin/Heidelberg, Germany, 2017; Volume 10326, p. 94. [Google Scholar]
  43. Bienvenu, M.; Hansen, P.; Lutz, C.; Wolter, F. First order-rewritability and containment of conjunctive queries in Horn description logics. arXiv 2020, arXiv:2011.09836. [Google Scholar]
  44. Lajus, J.; Galárraga, L.; Suchanek, F. Fast and Exact Rule Mining with AMIE 3. In Proceedings of the European Semantic Web Conference, Crete, Greece, 31 May–4 June 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 36–52. [Google Scholar]
Figure 1. Search space of feedback.
Figure 2. Architecture of the inductive logical model with exceptions, comprising EILP and EILC.
Figure 3. Rules learned over multiple iterations in the PCA interval.
Figure 4. Facts predicted over multiple iterations in the PCA interval.
Figure 5. Difference between predictions and number of rules (PCA confidence).
Figure 6. Number of positive/negative rules in the exception interval.
Figure 7. Number of rules for the four algorithms across iterations.
Figure 8. Confidence intervals of the exception algorithms in the final iterations.
Figure 9. Correction rates of EILC and TCA.
Figure 10. Various rates of the EILC algorithm.
Figure 11. Correction rates and intervals based on different similarity measures.
Figure 12. Precision vs. predictions over multiple iterations of the guided rule-learning model.
Table 1. Comparative analysis of complexity.

Aspect | EILP | Generic SPARQL-based models
Time complexity | O(B · N) | PSPACE-complete (unrestricted)
Space complexity | O(R · B + N) | O(N²) (intermediate results)
Scalability | Linear in N (optimized) | Limited by PSPACE constraints
Table 2. Example rules learned by GILP.

Positive rules ϕ+, head: nationality
1. birthPlace(a, f) ∧ country(f, b) ⇒ head(a, b)
2. deathPlace(a, f) ∧ country(f, b) ⇒ head(a, b)
3. birthPlace(a, d) ∧ rdfType(d, dbo:Country) ⇒ head(a, d)
4. successor(a, f) ∧ leader(b, f) ⇒ head(a, b)
5. deathPlace(a, f) ∧ capital(b, f) ⇒ head(a, b)
6. deathPlace(a, f) ∧ largestCity(b, f) ⇒ head(a, b)

Negative rules ϕ−
1. populationPlace(b, f) ∧ deathPlace(a, f) ⇒ head(a, b)
2. deathPlace(a, f) ∧ ethnicGroup(f, b) ⇒ head(a, b)
3. birthPlace(a, f) ∧ populationPlace(b, f) ⇒ head(a, b)
4. knownFor(a, f) ∧ dbo:ethnicGroup(f, b) ⇒ head(a, b)
5. residence(a, f) ∧ populationPlace(b, f) ⇒ head(a, b)
6. ethnicity(a, f) ∧ related(b, f) ⇒ head(a, b)
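To make the rule semantics concrete, the following minimal sketch (toy triples and names are hypothetical, not from the paper's datasets) applies positive rule 1 of Table 2, birthPlace(a, f) ∧ country(f, b) ⇒ head(a, b), by joining the two body atoms on the shared variable f:

```python
# Toy KB as a set of (subject, predicate, object) triples -- illustrative only.
kb = {
    ("Alice", "birthPlace", "Lyon"),
    ("Lyon", "country", "France"),
    ("Bob", "birthPlace", "Graz"),
    ("Graz", "country", "Austria"),
}

def apply_rule(kb):
    """Join birthPlace(a, f) with country(f, b) on f and emit head facts nationality(a, b)."""
    predicted = set()
    for a, p1, f in kb:
        if p1 != "birthPlace":
            continue
        for f2, p2, b in kb:
            if p2 == "country" and f2 == f:
                predicted.add((a, "nationality", b))
    return predicted
```

Running `apply_rule(kb)` on this toy KB yields the two predicted facts nationality(Alice, France) and nationality(Bob, Austria); rule-mining systems such as GILP score such predictions against the KB to estimate confidence.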
Table 3. Example rules learned by NILP.

Positive rules ϕ+, head: nationality(a, b)
1. academicAdvisor(a, c) ∧ birthPlace(c, b) ∧ not_residence(a, b) ⇒ head(a, b)
2. associate(a, c) ∧ country(c, b) ∧ not_deathPlace(a, b) ⇒ head(a, b)
3. birthPlace(a, c) ∧ country(c, b) ∧ not_education(a, b) ⇒ head(a, b)
4. birthPlace(a, c) ∧ ethnicGroup(c, b) ∧ not_genre(a, b) ⇒ head(a, b)
5. child(a, c) ∧ nationality(c, b) ∧ not_citizenship(a, b) ⇒ head(a, b)
6. child(a, c) ∧ stateOfOrigin(c, b) ∧ not_citizenship(a, b) ⇒ head(a, b)

Negative rules ϕ−
1. birthPlace(a, c) ∧ ethnicGroup(c, b) ∧ not_genre(a, b) ⇒ head(a, b)
2. birthPlace(a, c) ∧ owl:differentFrom(c, b) ∧ not_citizenship(a, b) ⇒ head(a, b)
3. child(a, c) ∧ nationality(c, b) ∧ not_birthPlace(a, b) ⇒ head(a, b)
4. child(a, c) ∧ stateOfOrigin(c, b) ∧ not_citizenship(a, b) ⇒ head(a, b)
5. citizenship(a, c) ∧ ethnicGroup(c, b) ∧ not_ethnicity(a, b) ⇒ head(a, b)
6. country(a, c) ∧ ethnicGroup(c, b) ∧ not_citizenship(a, b) ⇒ head(a, b)
Table 4. Rules learned by EILP.

Positive rules ϕ+, head: nationality
1. populationPlace(b, f) ∧ deathPlace(a, f) ∧ type(b, wikidata:Q6256) ⇒ head(a, b)
2. deathPlace(a, f) ∧ ethnicGroup(f, b) ∧ type(b, wikidata:Q6256) ⇒ head(a, b)
3. birthPlace(a, f) ∧ populationPlace(b, f) ∧ type(b, wikidata:Q6256) ⇒ head(a, b)
4. birthPlace(a, b) ∧ residence(a, b) ∧ not_type(b, wikidata:Q41710) ⇒ head(a, b)
5. country(c, b) ∧ successor(c, a) ∧ not_type(b, wikidata:Q41710) ⇒ head(a, b)
6. birthPlace(a, b) ∧ spouse(c, a) ∧ not_type(b, wikidata:Q41710) ⇒ head(a, b)

Negative rules ϕ−
1. birthPlace(a, f) ∧ country(f, b) ∧ not_type(b, wikidata:Q6256) ⇒ head(a, b)
2. deathPlace(a, f) ∧ country(f, b) ∧ not_type(b, wikidata:Q6256) ⇒ head(a, b)
3. birthPlace(a, d) ∧ type(d, Country) ∧ not_type(d, wikidata:Q6256) ⇒ head(a, d)
4. birthPlace(a, c) ∧ populationPlace(b, c) ∧ type(b, wikidata:Q41710) ⇒ head(a, b)
5. birthPlace(a, c) ∧ type(b, dbo:EthicGroup) ⇒ head(a, b)
6. knownFor(a, b) ∧ type(b, dbo:language) ⇒ head(a, b)
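The exception atoms above can be read as negation-as-failure checks. The sketch below (toy data; entity names are hypothetical) applies negative rule 1 of Table 4, birthPlace(a, f) ∧ country(f, b) ∧ not_type(b, wikidata:Q6256) ⇒ head(a, b), to flag nationality triples whose object lacks the country type wikidata:Q6256:

```python
# Toy KB -- "Styria" is deliberately mistyped as the country of Graz to
# simulate the kind of subtle extraction error EILP targets.
kb = {
    ("Alice", "birthPlace", "Lyon"),
    ("Lyon", "country", "France"),
    ("France", "type", "wikidata:Q6256"),
    ("Alice", "nationality", "France"),
    ("Bob", "birthPlace", "Graz"),
    ("Graz", "country", "Styria"),          # erroneous object: a region, not a country
    ("Bob", "nationality", "Styria"),       # triple the negative rule should flag
}

def flag_suspicious(kb):
    """Match the rule body; not_type is checked as absence of the type triple."""
    flagged = set()
    for a, p1, f in kb:
        if p1 != "birthPlace":
            continue
        for f2, p2, b in kb:
            if p2 == "country" and f2 == f and (b, "type", "wikidata:Q6256") not in kb:
                if (a, "nationality", b) in kb:
                    flagged.add((a, "nationality", b))
    return flagged
```

On this toy KB only nationality(Bob, Styria) is flagged: the France triple satisfies the exception atom type(France, wikidata:Q6256) and is therefore left alone.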
Table 5. Correction rules learned by EILC.

1. birthPlace(a, c) ∧ country(c, b) ∧ not_type(b, wikidata:Q6256) ∧ nationality(a, b) ∧ P1(b, d) ∧ populationPlace(d, c) ∧ type(d, wikidata:Q6256) ⇒ nationality+(a, d)
2. deathPlace(a, c) ∧ country(c, b) ∧ not_type(b, wikidata:Q6256) ∧ isCitizenOf(a, b) ∧ P1(b, d) ∧ ethnicGroup(c, d) ∧ type(d, wikidata:Q6256) ⇒ isCitizenOf+(a, d)
3. birthPlace(a, b) ∧ type(b, Country) ∧ not_type(b, wikidata:Q6256) ∧ locationCountry(a, b) ∧ P1(b, d) ∧ residence(a, d) ∧ not_type(d, wikidata:Q41710) ⇒ locationCountry+(a, d)
4. birthPlace(a, c) ∧ populationPlace(b, c) ∧ type(b, wikidata:Q41710) ∧ nationality(a, b) ∧ P1(b, d) ∧ type(d, wikidata:Q6256) ⇒ nationality+(a, d)
5. birthPlace(a, c) ∧ type(b, dbo:EthicGroup) ∧ nationality(a, b) ∧ P1(b, d) ∧ type(d, wikidata:Q6256) ⇒ nationality+(a, d)
6. knownFor(a, b) ∧ type(b, dbo:language) ∧ nationality(a, b) ∧ P1(b, d) ∧ type(d, wikidata:Q6256) ⇒ nationality+(a, d)
7. birthPlace(a, c) ∧ ethnicGroup(c, b) ∧ not_genre(a, b) ∧ nationality(a, b) ∧ country(c, d) ∧ not_education(a, d) ⇒ nationality+(a, d)
8. birthPlace(a, c) ∧ owl:differentFrom(c, b) ∧ not_citizenship(a, b) ∧ nationality(a, b) ∧ ethnicGroup(c, d) ∧ not_genre(a, d) ⇒ nationality+(a, d)
9. child(a, c) ∧ nationality(c, b) ∧ not_birthPlace(a, b) ∧ nationality(a, b) ∧ stateOfOrigin(c, d) ∧ not_citizenship(a, d) ⇒ nationality+(a, d)
10. child(a, c) ∧ stateOfOrigin(c, b) ∧ not_citizenship(a, b) ∧ nationality(a, b) ∧ country(c, d) ∧ not_education(a, d) ⇒ nationality+(a, d)
11. academicAdvisor(a, c) ∧ ethnicGroup(c, b) ∧ not_ethnicity(a, b) ∧ nationality(a, b) ∧ birthPlace(c, d) ∧ not_residence(a, d) ⇒ nationality+(a, d)
12. associate(a, c) ∧ ethnicGroup(c, b) ∧ not_citizenship(a, b) ∧ nationality(a, b) ∧ country(c, d) ∧ not_deathPlace(a, d) ⇒ nationality+(a, d)
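A correction rule rewrites the object of a flagged triple rather than merely deleting it. The sketch below (toy data; P1 is treated abstractly as the relatedness predicate linking the wrong object b to a candidate d, in the spirit of rule 1 of Table 5) revises nationality(a, b) to nationality+(a, d) when d carries the country type wikidata:Q6256:

```python
# Toy KB -- the P1 link and entity names are hypothetical illustrations.
kb = {
    ("Bob", "nationality", "Styria"),        # erroneous triple detected earlier
    ("Styria", "P1", "Austria"),             # relatedness link b -> candidate d
    ("Austria", "type", "wikidata:Q6256"),
}

def correct(kb, erroneous):
    """Replace each flagged triple's object with a related, correctly typed entity."""
    revised = set(kb)
    for a, p, b in erroneous:
        for b2, p1, d in kb:
            if p1 == "P1" and b2 == b and (d, "type", "wikidata:Q6256") in kb:
                revised.discard((a, p, b))
                revised.add((a, p, d))
    return revised

fixed = correct(kb, {("Bob", "nationality", "Styria")})
```

After correction, the revised KB contains nationality(Bob, Austria) and no longer contains the flagged triple; in EILC, candidate replacements produced this way are additionally validated by the similarity-based metrics reported in the experiments.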