Search Results (33)

Search Parameters:
Keywords = DBpedia

25 pages, 1429 KiB  
Article
A Contrastive Semantic Watermarking Framework for Large Language Models
by Jianxin Wang, Xiangze Chang, Chaoen Xiao and Lei Zhang
Symmetry 2025, 17(7), 1124; https://doi.org/10.3390/sym17071124 - 14 Jul 2025
Viewed by 441
Abstract
The widespread deployment of large language models (LLMs) has raised urgent demands for verifiable content attribution and misuse mitigation. Existing text watermarking techniques often struggle in black-box or sampling-based scenarios due to limitations in robustness, imperceptibility, and detection generality. These challenges are particularly critical in open-access settings, where model internals and generation logits are unavailable for attribution. To address these limitations, we propose CWS (Contrastive Watermarking with Semantic Modeling)—a novel keyless watermarking framework that integrates contrastive semantic token selection and shared embedding space alignment. CWS enables context-aware, fluent watermark embedding while supporting robust detection via a dual-branch mechanism: a lightweight z-score statistical test for public verification and a GRU-based semantic decoder for black-box adversarial robustness. Experiments on GPT-2, OPT-1.3B, and LLaMA-7B over C4 and DBpedia datasets demonstrate that CWS achieves F1 scores up to 99.9% and maintains F1 ≥ 93% under semantic rewriting, token substitution, and lossy compression (ε ≤ 0.25, δ ≤ 0.2). The GRU-based detector offers a superior speed–accuracy trade-off (0.42 s/sample) over LSTM and Transformer baselines. These results highlight CWS as a lightweight, black-box-compatible, and semantically robust watermarking method suitable for practical content attribution across LLM architectures and decoding strategies. Furthermore, CWS maintains a symmetrical architecture between embedding and detection stages via shared semantic representations, ensuring structural consistency and robustness. This semantic symmetry helps preserve detection reliability across diverse decoding strategies and adversarial conditions. Full article
(This article belongs to the Section Computer)
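
The public-verification branch described above relies on a z-score statistical test over watermark-bearing tokens. As a rough, hedged illustration (not the authors' CWS procedure), the sketch below hashes each token into a notional "green list" covering a fraction gamma of the vocabulary and measures how far the observed green count deviates from its expectation; the hashing scheme, gamma, and threshold are illustrative assumptions only.

```python
import hashlib
import math

def green_z_score(tokens, vocab_size, gamma=0.5, seed=42):
    """Toy watermark test: count tokens whose seeded hash lands in a
    'green' fraction gamma of the vocabulary, then measure how far the
    count deviates from its expectation under the no-watermark null."""
    green = sum(
        1 for tok in tokens
        if int(hashlib.sha256(f"{seed}:{tok}".encode()).hexdigest(), 16) % vocab_size
        < gamma * vocab_size
    )
    n = len(tokens)
    return (green - gamma * n) / math.sqrt(n * gamma * (1 - gamma))

# Unwatermarked text should give |z| near 0; watermarked generations
# (biased toward green tokens during decoding) give a large positive z.
z = green_z_score("the quick brown fox jumps over the lazy dog".split(), vocab_size=50257)
print(f"z = {z:.2f}")
```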

23 pages, 2623 KiB  
Article
An Inductive Logical Model with Exceptional Information for Error Detection and Correction in Large Knowledge Bases
by Yan Wu, Xiao Lin, Haojie Lian and Zili Zhang
Mathematics 2025, 13(11), 1877; https://doi.org/10.3390/math13111877 - 4 Jun 2025
Viewed by 377
Abstract
Some knowledge bases (KBs) extracted from Wikipedia articles can achieve very high average precision values (over 95% in DBpedia). However, subtle mistakes, including inconsistencies, outliers, and erroneous relations, are usually ignored when KBs are constructed by extraction rules. Automatic detection and correction of these subtle errors is important for improving the quality of KBs. In this paper, an inductive logic programming method with exceptional information (EILP) is proposed to automatically detect errors in large KBs. EILP leverages exceptional information that is ignored by conventional rule-learning algorithms such as inductive logic programming (ILP). Furthermore, an inductive logical correction method with exceptional features (EILC) is proposed to automatically correct these mistakes by learning a set of correction rules with exceptional features, in which respective metrics are provided to validate the revised triples. The experimental results demonstrate the effectiveness of EILP and EILC in detecting and repairing large knowledge bases, respectively. Full article

25 pages, 985 KiB  
Article
Construction of Topic Hierarchy with Subtree Representation for Knowledge Graphs
by Yujia Zhang, Wenjie Xu, Zheng Yu and Marek Z. Reformat
Axioms 2025, 14(4), 300; https://doi.org/10.3390/axioms14040300 - 15 Apr 2025
Viewed by 540
Abstract
Hierarchy analysis of knowledge graphs aims to discover the latent structure inherent in knowledge base data. Drawing inspiration from topic modeling, which identifies latent themes and content patterns in text corpora, our research seeks to adapt these analytical frameworks to the hierarchical exploration of knowledge graphs. Specifically, we adapt a non-parametric probabilistic model, the nested hierarchical Dirichlet process, to the field of knowledge graphs. This model discovers latent subject-specific distributions along paths within the tree. Consequently, the global tree can be viewed as a collection of local subtrees for each subject, allowing us to represent subtrees for each subject and reveal cross-thematic topics. We assess the efficacy of this model in analyzing the topics and word distributions that form the hierarchical structure of complex knowledge graphs. We quantitatively evaluate our model using four common datasets: Freebase, Wikidata, DBpedia, and WebRED, demonstrating that it outperforms the latest neural hierarchical clustering techniques such as TraCo, SawETM, and HyperMiner. Additionally, we provide a qualitative assessment of the induced subtree for a single subject. Full article

16 pages, 715 KiB  
Article
Sentence Embeddings and Semantic Entity Extraction for Identification of Topics of Short Fact-Checked Claims
by Krzysztof Węcel, Marcin Sawiński, Włodzimierz Lewoniewski, Milena Stróżyna, Ewelina Księżniak and Witold Abramowicz
Information 2024, 15(10), 659; https://doi.org/10.3390/info15100659 - 21 Oct 2024
Viewed by 1961
Abstract
The objective of this research was to design a method to assign topics to claims debunked by fact-checking agencies. During the fact-checking process, access to more structured knowledge is necessary; therefore, we aim to describe topics with semantic vocabulary. Classification of topics should go beyond simple connotations like instance-class and rather reflect broader phenomena that are recognized by fact checkers. The assignment of semantic entities is also crucial for the automatic verification of facts using the underlying knowledge graphs. Our method is based on sentence embeddings, various clustering methods (HDBSCAN, UMAP, K-means), semantic entity matching, and terms importance assessment based on TF-IDF. We represent our topics in semantic space using Wikidata Q-ids, DBpedia, Wikipedia topics, YAGO, and other relevant ontologies. Such an approach based on semantic entities also supports hierarchical navigation within topics. For evaluation, we compare topic modeling results with claims already tagged by fact checkers. The work presented in this paper is useful for researchers and practitioners interested in semantic topic modeling of fake news narratives. Full article
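
As a hedged sketch of the embed-reduce-cluster-label pipeline described in the abstract (not the authors' exact configuration): sentence embeddings, UMAP reduction, HDBSCAN clustering, and TF-IDF terms to describe each cluster. The model name, parameters, and sample claims are illustrative assumptions.

```python
import hdbscan
import numpy as np
import umap
from sentence_transformers import SentenceTransformer
from sklearn.feature_extraction.text import TfidfVectorizer

claims = [  # placeholder fact-checked claims; a real corpus is much larger
    "The COVID-19 vaccine alters human DNA",
    "Vaccine X causes severe side effects in children",
    "5G towers spread the coronavirus",
    "Country Y's inflation rate tripled last year",
    "The new tax law doubles rates for small businesses",
    "Unemployment reached a record high in 2020",
]

# 1. Sentence embeddings (model choice is illustrative).
emb = SentenceTransformer("all-MiniLM-L6-v2").encode(claims)

# 2. Dimensionality reduction before density-based clustering.
reduced = umap.UMAP(n_neighbors=3, n_components=2, random_state=42).fit_transform(emb)

# 3. HDBSCAN clustering; label -1 marks noise points.
labels = hdbscan.HDBSCAN(min_cluster_size=2).fit_predict(reduced)

# 4. Describe each cluster by its highest-TF-IDF terms.
vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(claims)
terms = np.array(vec.get_feature_names_out())
for c in sorted(set(labels) - {-1}):
    scores = np.asarray(X[labels == c].mean(axis=0)).ravel()
    print(f"topic {c}:", ", ".join(terms[scores.argsort()[-3:][::-1]]))
```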

34 pages, 4271 KiB  
Article
Knowledge-Based Recommendation System for Plate Waste Reduction in Latvian Schools
by Sergejs Kodors, Jelena Lonska, Imants Zarembo, Anda Zvaigzne, Ilmars Apeinans and Juta Deksne
Sustainability 2024, 16(19), 8446; https://doi.org/10.3390/su16198446 - 27 Sep 2024
Cited by 1 | Viewed by 1348
Abstract
Food waste indicates ineffective and irresponsible consumption of resources, particularly during the food consumption stage. The aim of our research study is to optimize the catering management process at Latvian schools by reducing the amount of plate waste. The experts developed a set of recommendations aimed at improving the catering management process at schools. The recommendations developed were supported by measurable parameters, which must be monitored by school staff. The capability-driven development approach was applied to model the recommendation system. A plate waste predictive module and a large language model classifier were integrated into the system to support sustainable decision-making. The large language model classifier was trained to filter questions and recommendations. Three training methods were compared: training from scratch, fine-tuning on the DBpedia dataset, and fine-tuning on the News Category Dataset. As a result, we present a list of recommendations based on the literature review, and a prototype of the knowledge-based recommendation system was developed to audit the school catering management process and promote sustainable school management and decision-making. The recommendation system aims to reduce plate waste caused by deficiencies in the implementation of the catering process and to promote responsible food consumption at schools. Full article
(This article belongs to the Section Sustainable Food)
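
As a compressed, hedged sketch of the fine-tuning comparison mentioned above: fine-tuning a text classifier on the DBpedia ontology classification dataset with Hugging Face transformers. The base model, column names, subset sizes, and hyperparameters are assumptions, not the authors' setup.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# DBpedia ontology classification dataset (14 classes) from the Hugging Face hub.
ds = load_dataset("dbpedia_14")
tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tok(batch["content"], truncation=True, padding="max_length", max_length=128)

ds = ds.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=14)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="dbpedia-clf", num_train_epochs=1,
                           per_device_train_batch_size=16),
    # Small subsets keep the sketch quick; a real run uses the full splits.
    train_dataset=ds["train"].shuffle(seed=0).select(range(2000)),
    eval_dataset=ds["test"].select(range(500)),
)
trainer.train()
```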

16 pages, 2490 KiB  
Article
Constructing Semantic Summaries Using Embeddings
by Georgia Eirini Trouli, Nikos Papadakis and Haridimos Kondylakis
Information 2024, 15(4), 238; https://doi.org/10.3390/info15040238 - 20 Apr 2024
Cited by 1 | Viewed by 1945
Abstract
The increase in the size and complexity of large knowledge graphs now available online has resulted in the emergence of many approaches focusing on enabling the quick exploration of the content of those data sources. Structural non-quotient semantic summaries have been proposed in this direction that involve first selecting the most important nodes and then linking them, trying to extract the most useful subgraph out of the original graph. However, the current state of the art systems use costly centrality measures for identifying the most important nodes, whereas even costlier procedures have been devised for linking the selected nodes. In this paper, we address both those deficiencies by first exploiting embeddings for node selection, and then by meticulously selecting approximate algorithms for node linking. Experiments performed over two real-world big KGs demonstrate that the summaries constructed using our method enjoy better quality. Specifically, the coverage scores obtained were 0.8, 0.81, and 0.81 for DBpedia v3.9 and 0.94 for Wikidata dump 2018, across 20%, 25%, and 30% summary sizes, respectively. Additionally, our method can compute orders of magnitude faster than the state of the art. Full article
(This article belongs to the Special Issue Feature Papers in Information in 2024–2025)

23 pages, 1636 KiB  
Article
Enhancing Semantic Web Technologies Using Lexical Auditing Techniques for Quality Assurance of Biomedical Ontologies
by Rashmi Burse, Michela Bertolotto and Gavin McArdle
BioMedInformatics 2023, 3(4), 962-984; https://doi.org/10.3390/biomedinformatics3040059 - 1 Nov 2023
Cited by 1 | Viewed by 1374
Abstract
Semantic web technologies (SWT) represent data in a format that is easier for machines to understand. Validating the knowledge in data graphs created using SWT is critical to ensure that the axioms accurately represent the so-called “real” world. However, data graph validation is a significant challenge in the semantic web domain. The Shapes Constraint Language (SHACL) is the latest W3C standard developed with the goal of validating data-graphs. SHACL (pronounced as shackle) is a relatively new standard and hitherto has predominantly been employed to validate generic data graphs like WikiData and DBPedia. In generic data graphs, the name of a class does not affect the shape of a class, but this is not the case with biomedical ontology data graphs. The shapes of classes in biomedical ontology data graphs are highly influenced by the names of the classes, and the SHACL shape creation methods developed for generic data graphs fail to consider this characteristic difference. Thus, the existing SHACL shape creation methods do not perform well for domain-specific biomedical ontology data graphs. Maintaining the quality of biomedical ontology data graphs is crucial to ensure accurate analysis in safety-critical applications like Electronic Health Record (EHR) systems referencing such data graphs. Thus, in this work, we present a novel method to create enhanced SHACL shapes that consider the aforementioned characteristic difference to better validate biomedical ontology data graphs. We leverage the knowledge available from lexical auditing techniques for biomedical ontologies and incorporate this knowledge to create smart SHACL shapes. We also create SHACL shapes (baseline SHACL graph) without incorporating the lexical knowledge of the class names, as is performed by existing methods, and compare the performance of our enhanced SHACL shapes with the baseline SHACL shapes. The results demonstrate that the enhanced SHACL shapes augmented with lexical knowledge of the class names identified 176 violations which the baseline SHACL shapes, void of this lexical knowledge, failed to detect. Thus, the enhanced SHACL shapes presented in this work significantly improve the validation performance of biomedical ontology data graphs, thereby reducing the errors present in such data graphs and ensuring safe use in the life-critical applications referencing them. Full article
(This article belongs to the Special Issue Feature Papers in Applied Biomedical Data Science)
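
The enhanced shapes themselves depend on the lexical auditing knowledge described in the paper; the following is only a generic, hedged illustration of how a SHACL shape validates a small data graph with pySHACL. The prefixes, classes, and property names are invented for illustration.

```python
from pyshacl import validate

data_graph = """
@prefix ex: <http://example.org/> .
ex:Aspirin a ex:Drug ;
    ex:hasActiveIngredient "acetylsalicylic acid" .
ex:MysteryPill a ex:Drug .   # violates the shape: no active ingredient
"""

shapes_graph = """
@prefix ex: <http://example.org/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
ex:DrugShape a sh:NodeShape ;
    sh:targetClass ex:Drug ;
    sh:property [
        sh:path ex:hasActiveIngredient ;
        sh:minCount 1 ;
    ] .
"""

conforms, report_graph, report_text = validate(
    data_graph,
    shacl_graph=shapes_graph,
    data_graph_format="turtle",
    shacl_graph_format="turtle",
)
print(conforms)       # False: ex:MysteryPill has no ex:hasActiveIngredient
print(report_text)
```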

22 pages, 724 KiB  
Article
Semantic Interest Modeling and Content-Based Scientific Publication Recommendation Using Word Embeddings and Sentence Encoders
by Mouadh Guesmi, Mohamed Amine Chatti, Lamees Kadhim, Shoeb Joarder and Qurat Ul Ain
Multimodal Technol. Interact. 2023, 7(9), 91; https://doi.org/10.3390/mti7090091 - 15 Sep 2023
Cited by 2 | Viewed by 3654
Abstract
The fast growth of data in the academic field has contributed to making recommendation systems for scientific papers more popular. Content-based filtering (CBF), a pivotal technique in recommender systems (RS), holds particular significance in the realm of scientific publication recommendations. In a content-based scientific publication RS, recommendations are composed by observing the features of users and papers. Content-based recommendation encompasses three primary steps, namely, item representation, user modeling, and recommendation generation. A crucial part of generating recommendations is the user modeling process. Nevertheless, this step is often neglected in existing content-based scientific publication RS. Moreover, most existing approaches do not capture the semantics of user models and papers. To address these limitations, in this paper we present a transparent Recommendation and Interest Modeling Application (RIMA), a content-based scientific publication RS that implicitly derives user interest models from their authored papers. To address the semantic issues, RIMA combines word embedding-based keyphrase extraction techniques with knowledge bases to generate semantically-enriched user interest models, and additionally leverages pretrained transformer sentence encoders to represent user models and papers and compute their similarities. The effectiveness of our approach was assessed through an offline evaluation by conducting extensive experiments on various datasets along with user study (N = 22), demonstrating that (a) combining SIFRank and SqueezeBERT as an embedding-based keyphrase extraction method with DBpedia as a knowledge base improved the quality of the user interest modeling step, and (b) using the msmarco-distilbert-base-tas-b sentence transformer model achieved better results in the recommendation generation step. Full article
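
The abstract names the msmarco-distilbert-base-tas-b sentence transformer for representing user models and papers. A minimal, hedged sketch of that similarity scoring step with the sentence-transformers library (the interest text and candidate papers are placeholders; cosine similarity is used here, although the TAS-B model is also commonly scored with dot product):

```python
from sentence_transformers import SentenceTransformer, util

# Model named in the abstract; hosted on the Hugging Face hub.
model = SentenceTransformer("sentence-transformers/msmarco-distilbert-base-tas-b")

user_interests = "knowledge graphs; entity linking; semantic publication search"  # placeholder interest model
papers = [
    "A Purely Entity-Based Semantic Search Approach for Document Retrieval",
    "Attention-Survival Score: A Metric to Choose Better Keywords",
]

interest_vec = model.encode(user_interests, convert_to_tensor=True)
paper_vecs = model.encode(papers, convert_to_tensor=True)

# Rank candidate papers by similarity to the user interest model.
scores = util.cos_sim(interest_vec, paper_vecs)[0]
for paper, score in sorted(zip(papers, scores.tolist()), key=lambda t: -t[1]):
    print(f"{score:.3f}  {paper}")
```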

28 pages, 1705 KiB  
Article
A Purely Entity-Based Semantic Search Approach for Document Retrieval
by Mohamed Lemine Sidi and Serkan Gunal
Appl. Sci. 2023, 13(18), 10285; https://doi.org/10.3390/app131810285 - 14 Sep 2023
Cited by 3 | Viewed by 3271
Abstract
Over the past decade, knowledge bases (KB) have been increasingly utilized to complete and enrich the representation of queries and documents in order to improve the document retrieval task. Although many approaches have used KB for such purposes, the problem of how to effectively leverage entity-based representation still needs to be resolved. This paper proposes a Purely Entity-based Semantic Search Approach for Information Retrieval (PESS4IR) as a novel solution. The approach includes (i) its own entity linking method and (ii) an inverted indexing method, and for document retrieval and ranking, (iii) an appropriate ranking method is designed to take advantage of all the strengths of the approach. We report the findings on the performance of our approach, which is tested by queries annotated by two known entity linking tools, REL and DBpedia-Spotlight. The experiments are performed on the standard TREC 2004 Robust and MSMARCO collections. By using the REL method on the Robust collection, for the queries whose terms are all annotated and whose average annotation scores are greater than or equal to 0.75, our approach achieves the maximum nDCG@5 score (1.00). Also, it is shown that using PESS4IR alongside another document retrieval method would improve performance, unless that method alone achieves the maximum nDCG@5 score for those highly annotated queries. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
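
nDCG@5, the metric reported above, rewards placing highly relevant documents near the top of the ranking. A short self-contained computation (the graded relevances are invented for illustration):

```python
import math

def dcg_at_k(relevances, k=5):
    """Discounted cumulative gain over the top-k graded relevances (in ranked order)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=5):
    """DCG normalized by the ideal (descending) ordering; 1.0 means a perfect top-k."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Graded relevance of the documents returned for one query, in ranked order.
print(round(ndcg_at_k([3, 2, 3, 0, 1, 2]), 3))   # ~0.861: a grade-2 document is ranked below a grade-0 one
```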

24 pages, 5574 KiB  
Article
Attention–Survival Score: A Metric to Choose Better Keywords and Improve Visibility of Information
by Jorge Chamorro-Padial and Rosa Rodríguez-Sánchez
Algorithms 2023, 16(4), 196; https://doi.org/10.3390/a16040196 - 3 Apr 2023
Cited by 2 | Viewed by 2027
Abstract
In this paper, we propose a method to aid authors in choosing alternative keywords that help their papers gain visibility. These alternative keywords must have a certain level of popularity in the scientific community and, simultaneously, be keywords with fewer competitors. The competitors are derived from other papers containing the same keywords. Having fewer competitors would allow an author’s paper to have a higher consult frequency. In order to recommend keywords, we must first determine an attention–survival score. The attention score is obtained using the popularity of a keyword. The survival score is derived from the number of manuscripts using the same keyword. With these two scores, we created a new algorithm that finds alternative keywords with a high attention–survival score. We used ontologies to ensure that alternative keywords proposed by our method are semantically related to the original authors’ keywords that they wish to refine. The hierarchical structure in an ontology supports the relationship between the alternative and input keywords. To test the sensibility of the ontology, we used two sources: WordNet and the Computer Science Ontology (CSO). Finally, we launched a survey for the human validation of our algorithm using keywords from Web of Science papers and three ontologies: WordNet, CSO, and DBpedia. We obtained good results from all our tests. Full article
(This article belongs to the Section Algorithms for Multidisciplinary Applications)

17 pages, 924 KiB  
Article
PharmKE: Knowledge Extraction Platform for Pharmaceutical Texts Using Transfer Learning
by Nasi Jofche, Kostadin Mishev, Riste Stojanov, Milos Jovanovik, Eftim Zdravevski and Dimitar Trajanov
Computers 2023, 12(1), 17; https://doi.org/10.3390/computers12010017 - 9 Jan 2023
Cited by 9 | Viewed by 3873
Abstract
Even though named entity recognition (NER) has seen tremendous development in recent years, some domain-specific use-cases still require tagging of unique entities, which is not well handled by pre-trained models. Solutions based on enhancing pre-trained models or creating new ones are efficient, but creating reliably labeled training data for them to learn from is still challenging. In this paper, we introduce PharmKE, a text analysis platform tailored to the pharmaceutical industry that uses deep learning at several stages to perform an in-depth semantic analysis of relevant publications. The proposed methodology is used to produce reliably labeled datasets leveraging cutting-edge transfer learning, which are later used to train models for specific entity labeling tasks. By building models for the well-known text-processing libraries spaCy and AllenNLP, this technique is used to find Pharmaceutical Organizations and Drugs in texts from the pharmaceutical domain. The PharmKE platform also incorporates the NER findings to resolve co-references of entities and examine the semantic linkages in each phrase, creating a foundation for further text analysis tasks, such as fact extraction and question answering. Additionally, the knowledge graph created by DBpedia Spotlight for a specific pharmaceutical text is expanded using the identified entities. The proposed methodology achieves about a 96% F1-score on the NER tasks, which is up to 2% better than the scores of the fine-tuned BERT and BioBERT models developed using the same dataset. The ultimate benefit of the platform is that pharmaceutical domain specialists may more easily identify the knowledge extracted from the input texts thanks to the platform’s visualization of the model findings. Likewise, the proposed techniques can be integrated into mobile and pervasive systems to give patients more relevant and comprehensive information from scanned medication guides. Similarly, it can provide preliminary insights to patients and even medical personnel on whether a drug from a different vendor is compatible with the patient’s prescription medication. Full article
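
PharmKE trains custom spaCy and AllenNLP models on the weakly labeled corpus; the snippet below is only a generic, hedged illustration of tagging entities with an off-the-shelf spaCy pipeline. The stock en_core_web_sm model knows generic types (ORG, DATE, ...); the Drug and Pharmaceutical Organization labels described above would require the custom-trained models.

```python
import spacy

# Stock English pipeline (install with: python -m spacy download en_core_web_sm).
# PharmKE-style Drug / Pharmaceutical Organization labels would come from
# models trained on the weakly labeled pharmaceutical corpus instead.
nlp = spacy.load("en_core_web_sm")

text = ("Pfizer announced in 2021 that atorvastatin would be distributed "
        "in partnership with a regional manufacturer.")

for ent in nlp(text).ents:
    print(ent.text, "->", ent.label_)   # e.g. Pfizer -> ORG, 2021 -> DATE
```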

15 pages, 1114 KiB  
Article
Multi-Feature Extension via Semi-Autoencoder for Personalized Recommendation
by Yishuai Geng, Yi Zhu, Yun Li, Xiaobing Sun and Bin Li
Appl. Sci. 2022, 12(23), 12408; https://doi.org/10.3390/app122312408 - 4 Dec 2022
Cited by 6 | Viewed by 1709
Abstract
Over the past few years, personalized recommendation systems have aimed to address the problem of information overload, helping users obtain useful information and make quick decisions. Recently, due to the benefits of effective representation learning and the absence of labeled data requirements, autoencoder-based models have commonly been used in recommendation systems. Nonetheless, auxiliary information that can effectively enlarge the feature space is always scarce. Moreover, most existing methods ignore the hidden relations between extended features, which significantly affects the recommendation accuracy. To handle these problems, we propose a Multi-Feature extension method via a Semi-AutoEncoder for personalized recommendation (MFSAE). First, we extract auxiliary information from DBpedia as feature extensions of items. Second, we leverage the LSI model to learn hidden relations on top of item features and embed them into low-dimensional feature vectors. Finally, the resulting feature vectors, combined with the original rating matrix and side information, are fed into a semi-autoencoder for recommendation prediction. We ran comprehensive experiments on the MovieLens datasets. The results demonstrate the effectiveness of MFSAE compared to state-of-the-art methods. Full article
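
The LSI step mentioned above can be approximated with TF-IDF followed by a truncated SVD over the item-feature matrix built from the DBpedia extensions. A minimal, hedged sketch (the item features are invented, and the semi-autoencoder itself is omitted):

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Bag of auxiliary features per item, e.g. pulled from DBpedia properties.
item_features = [
    "director:Nolan genre:SciFi subject:Time_travel",
    "director:Nolan genre:Thriller subject:Dreams",
    "director:Villeneuve genre:SciFi subject:Linguistics",
]

# LSI: TF-IDF weighting followed by truncated SVD yields low-dimensional
# item vectors that capture latent relations between the extended features.
X = TfidfVectorizer(token_pattern=r"\S+").fit_transform(item_features)
item_vecs = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)
print(item_vecs.shape)   # (n_items, 2); these vectors would feed the semi-autoencoder
```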

18 pages, 1029 KiB  
Article
Multi-Task Learning and Improved TextRank for Knowledge Graph Completion
by Hao Tian, Xiaoxiong Zhang, Yuhan Wang and Daojian Zeng
Entropy 2022, 24(10), 1495; https://doi.org/10.3390/e24101495 - 20 Oct 2022
Cited by 6 | Viewed by 2856
Abstract
Knowledge graph completion is an important technology for supplementing knowledge graphs and improving data quality. However, the existing knowledge graph completion methods ignore the features of triple relations, and the introduced entity description texts are long and redundant. To address these problems, this study proposes a multi-task learning and improved TextRank for knowledge graph completion (MIT-KGC) model. The key contexts are first extracted from redundant entity descriptions using the improved TextRank algorithm. Then, a lite bidirectional encoder representations from transformers (ALBERT) is used as the text encoder to reduce the parameters of the model. Subsequently, the multi-task learning method is utilized to fine-tune the model by effectively integrating the entity and relation features. Based on the datasets of WN18RR, FB15k-237, and DBpedia50k, experiments were conducted with the proposed model and the results showed that, compared with traditional methods, the mean rank (MR), top 10 hit ratio (Hit@10), and top three hit ratio (Hit@3) were enhanced by 38, 1.3%, and 1.9%, respectively, on WN18RR. Additionally, the MR and Hit@10 were increased by 23 and 0.7%, respectively, on FB15k-237. The model also improved the Hit@3 and the top one hit ratio (Hit@1) by 3.1% and 1.5% on the dataset DBpedia50k, respectively, verifying the validity of the model. Full article
(This article belongs to the Section Information Theory, Probability and Statistics)
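
TextRank, used above to compress long entity descriptions, runs PageRank over a sentence-similarity graph and keeps the highest-scoring sentences. A plain (non-"improved") hedged sketch with networkx and TF-IDF cosine similarity:

```python
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def textrank_summary(sentences, top_k=2):
    """Plain TextRank: PageRank over a TF-IDF cosine-similarity sentence graph."""
    sim = cosine_similarity(TfidfVectorizer().fit_transform(sentences))
    scores = nx.pagerank(nx.from_numpy_array(sim))
    ranked = sorted(range(len(sentences)), key=lambda i: -scores[i])
    return [sentences[i] for i in sorted(ranked[:top_k])]  # keep original order

entity_description = [
    "Berlin is the capital and largest city of Germany.",
    "It is well known for its museums and its startup scene.",
    "The city hosts one of the largest New Year celebrations in Europe.",
]
print(textrank_summary(entity_description, top_k=1))
```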

22 pages, 2368 KiB  
Article
Deep Model-Based Security-Aware Entity Alignment Method for Edge-Specific Knowledge Graphs
by Jongmo Kim, Kunyoung Kim, Mye Sohn and Gyudong Park
Sustainability 2022, 14(14), 8877; https://doi.org/10.3390/su14148877 - 20 Jul 2022
Cited by 4 | Viewed by 2118
Abstract
This paper proposes a deep model-based entity alignment method for edge-specific knowledge graphs (KGs) to resolve the semantic heterogeneity between the edge systems’ data. To do so, this paper first analyzes the edge-specific KGs to find their unique characteristics, and the deep model-based entity alignment method is developed based on these characteristics. The proposed method performs the entity alignment using a graph which is not topological but data-centric, to reflect the characteristics of the edge-specific KGs, which are mainly composed of instance entities rather than conceptual entities. In addition, two deep models, namely BERT (bidirectional encoder representations from transformers) for the concept entities and GAN (generative adversarial networks) for the instance entities, are applied to model learning. By utilizing these deep models, neural network models that humans cannot interpret, it is possible to secure the data on the edge systems. The two learning models, trained separately, are integrated using a graph-based deep learning model, a GCN (graph convolution network). Finally, the integrated deep model is utilized to align the entities in the edge-specific KGs. To demonstrate the superiority of the proposed method, we perform experiments and an evaluation against state-of-the-art entity alignment methods on two experimental datasets built from DBpedia, YAGO, and Wikidata. On the evaluation metrics of Hits@k, mean rank (MR), and mean reciprocal rank (MRR), the proposed method shows the best predictive and generalization performance for KG entity alignment. Full article
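
The evaluation metrics mentioned above are all simple functions of the rank assigned to each gold-standard counterpart entity. A small self-contained computation (the ranks are invented for illustration):

```python
def ranking_metrics(ranks, k=10):
    """Hits@k, mean rank (MR) and mean reciprocal rank (MRR) from 1-based ranks."""
    n = len(ranks)
    hits_at_k = sum(r <= k for r in ranks) / n
    mr = sum(ranks) / n
    mrr = sum(1.0 / r for r in ranks) / n
    return hits_at_k, mr, mrr

# Rank of the correct counterpart entity for each test entity (1 = top of the candidate list).
ranks = [1, 3, 2, 15, 1, 7]
hits10, mr, mrr = ranking_metrics(ranks, k=10)
print(f"Hits@10 = {hits10:.2f}   MR = {mr:.1f}   MRR = {mrr:.3f}")
```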

25 pages, 1776 KiB  
Article
Linking Entities from Text to Hundreds of RDF Datasets for Enabling Large Scale Entity Enrichment
by Michalis Mountantonakis and Yannis Tzitzikas
Knowledge 2022, 2(1), 1-25; https://doi.org/10.3390/knowledge2010001 - 24 Dec 2021
Viewed by 4074
Abstract
There is a high increase in approaches that receive as input a text and perform named entity recognition (or extraction) for linking the recognized entities of the given text to RDF Knowledge Bases (or datasets). In this way, it is feasible to retrieve more information for these entities, which can be of primary importance for several tasks, e.g., for facilitating manual annotation, hyperlink creation, content enrichment, for improving data veracity and others. However, current approaches link the extracted entities to one or few knowledge bases, therefore, it is not feasible to retrieve the URIs and facts of each recognized entity from multiple datasets and to discover the most relevant datasets for one or more extracted entities. For enabling this functionality, we introduce a research prototype, called LODsyndesisIE, which exploits three widely used Named Entity Recognition and Disambiguation tools (i.e., DBpedia Spotlight, WAT and Stanford CoreNLP) for recognizing the entities of a given text. Afterwards, it links these entities to the LODsyndesis knowledge base, which offers data enrichment and discovery services for millions of entities over hundreds of RDF datasets. We introduce all the steps of LODsyndesisIE, and we provide information on how to exploit its services through its online application and its REST API. Concerning the evaluation, we use three evaluation collections of texts: (i) for comparing the effectiveness of combining different Named Entity Recognition tools, (ii) for measuring the gain in terms of enrichment by linking the extracted entities to LODsyndesis instead of using a single or a few RDF datasets and (iii) for evaluating the efficiency of LODsyndesisIE. Full article
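
DBpedia Spotlight, one of the three recognizers combined by LODsyndesisIE, exposes a public REST endpoint for annotation. A minimal, hedged sketch of calling it (endpoint and parameters as commonly documented; the public service may throttle or change):

```python
import requests

text = "Nikos Kazantzakis was born in Heraklion, Crete."
resp = requests.get(
    "https://api.dbpedia-spotlight.org/en/annotate",
    params={"text": text, "confidence": 0.5},
    headers={"Accept": "application/json"},
    timeout=30,
)
resp.raise_for_status()

# Each resource carries the DBpedia URI that the surface form was linked to.
for res in resp.json().get("Resources", []):
    print(res["@surfaceForm"], "->", res["@URI"])
```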
