Knowledge and Data Engineering

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 20 February 2025 | Viewed by 11071

Special Issue Editors


Dr. Irina Trubitsyna
Guest Editor
Department of Informatics, Modeling, Electronics and System Engineering, University of Calabria, 87036 Rende, Italy
Interests: knowledge representation; logic programming; argumentation; incomplete and inconsistent databases

Dr. Reza Shahbazian
Guest Editor
Department of Mechanical, Energy and Management Engineering, University of Calabria, 87036 Rende, Italy
Interests: big data processing; optimization; machine learning; data imputation

Special Issue Information

Dear Colleagues,

As advances in technology and data generation provide new opportunities and challenges, the field of knowledge and data engineering continues to evolve and expand. The incorporation of artificial intelligence (AI) and machine learning (ML) algorithms into knowledge and data engineering has resulted in some exciting new developments.

The goal of this Special Issue is to highlight recent research and developments in knowledge and data engineering, with a focus on the integration of AI and ML algorithms and on the practical and theoretical advances that help solve real-world problems in the field.

This Special Issue's scope includes, but is not limited to, knowledge representation and reasoning, knowledge graph construction and analysis, data and knowledge integration, knowledge discovery and data mining, data privacy and security, semantic web and linked data, natural language processing, recommendation systems, decision support systems, and big data processing and analysis.

This Special Issue welcomes theoretical as well as practical papers and will gather a diverse range of perspectives from academia and industry, in order to provide a platform for the exchange of ideas and insights on the future of knowledge and data engineering, and AI/ML.

Dr. Irina Trubitsyna
Dr. Reza Shahbazian
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • artificial intelligence
  • data integration
  • knowledge discovery
  • knowledge representation
  • knowledge graphs
  • decision support
  • data privacy
  • semantic web
  • natural language processing

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (9 papers)

Research

26 pages, 1199 KiB  
Article
Exploring the Effectiveness of Shallow and L2 Learner-Suitable Textual Features for Supervised and Unsupervised Sentence-Based Readability Assessment
by Dimitris Kostadimas, Katia Lida Kermanidis and Theodore Andronikos
Appl. Sci. 2024, 14(17), 7997; https://doi.org/10.3390/app14177997 - 7 Sep 2024
Viewed by 360
Abstract
Simplicity in information found online is in demand from diverse user groups seeking better text comprehension and consumption of information in an easy and timely manner. Readability assessment, particularly at the sentence level, plays a vital role in aiding specific demographics, such as language learners. In this paper, we research model evaluation metrics, strategies for model creation, and the predictive capacity of features and feature sets in assessing readability based on sentence complexity. Our primary objective is to classify sentences as either simple or complex, shifting the focus from entire paragraphs or texts to individual sentences. We approach this challenge as both a classification and clustering task. Additionally, we emphasize our tests on shallow features that, despite their simplistic nature and ease of use, seem to yield decent results. Leveraging the TextStat Python library and the WEKA toolkit, we employ a wide variety of shallow features and classifiers. By comparing the outcomes across different models, algorithms, and feature sets, we aim to offer valuable insights into optimizing the setup. We draw our data from sentences sourced from Wikipedia’s corpus, a widely accessed online encyclopedia catering to a broad audience. We strive to take a deeper look at what leads to greater readability classification in datasets that appeal to audiences such as Wikipedia’s, assisting in the development of improved models and new features for future applications with low feature extraction/processing times.
(This article belongs to the Special Issue Knowledge and Data Engineering)

21 pages, 3543 KiB  
Article
Developing the NLP-QFD Model to Discover Key Success Factors of Short Videos on Social Media
by Hsin-Cheng Wu, Wu-Der Jeng, Long-Sheng Chen and Cheng-Chin Ho
Appl. Sci. 2024, 14(11), 4870; https://doi.org/10.3390/app14114870 - 4 Jun 2024
Viewed by 739
Abstract
In the transition from television to mobile devices, short videos have emerged as the primary content format, possessing tremendous potential in various fields such as marketing, promotion, education, advertising, and so on. However, from the available literature, there is a lack of studies investigating the elements necessary for the success of short videos, specifically regarding what factors need to be considered during production to increase viewership. Therefore, this study proposed the NLP-QFD model, integrating Natural Language Processing (NLP), Latent Dirichlet Allocation (LDA), and Quality Function Deployment (QFD) methods. Real short videos from mainstream Western media (CNN) and regional media (Middle East Eye) will be employed as case studies. In addition to analyzing the content of short videos and audiences’ reviews, we will utilize the NLP-QFD model to identify the key success factors (KSFs) of short videos, providing guidance for future short video creators, especially for small-scale businesses, to produce successful short videos and expand their influence through social media. The results indicate that the success factors for short videos include the movie title, promotion, reviews, and social media. For large enterprises, endorsements by famous individuals are crucial, while music and shooting are key elements for the success of short videos for small businesses.

34 pages, 1274 KiB  
Article
Graph-Driven Exploration of Issue Handling Schemes in Software Projects
by Bartosz Dobrzyński and Janusz Sosnowski
Appl. Sci. 2024, 14(11), 4723; https://doi.org/10.3390/app14114723 - 30 May 2024
Viewed by 380
Abstract
The Issue Tracking System (ITS) repositories are rich sources of software development documentation that are useful in assessing the status and quality of software projects. An original model is proposed for tracing issue handling activities and their impact on project progress. As opposed to classical data mining of software repositories, we consider fine-grained features of issues which provide a better insight into project evolution. A thorough analysis of repository contents allows us to define useful metrics for characterizing issue handling schemes. These metrics are derived from the introduced graph model and developed original data mining algorithms targeting timing, issue flow progress and project actor activity aspects. This study is associated with issue processing states and their sequences (handling paths), leading to problem resolution. The introduced taxonomy of issue processing schemes facilitates the creation of a pertinent knowledge database and the identification of both bad (anomalies) and good practices. The proposed approach is illustrated with experimental results related to a representative set of ITS project repositories. These results enhance experts’ knowledge of the project and can be used for correct decision-making actions. They reveal weak points in project development and possible directions for improvement.

21 pages, 594 KiB  
Article
Analyzing Data Reduction Techniques: An Experimental Perspective
by Vítor Fernandes, Gonçalo Carvalho, Vasco Pereira and Jorge Bernardino
Appl. Sci. 2024, 14(8), 3436; https://doi.org/10.3390/app14083436 - 18 Apr 2024
Viewed by 1615
Abstract
The exponential growth in data generation has become a ubiquitous phenomenon in today’s rapidly growing digital technology. Technological advances and the number of connected devices are the main drivers of this expansion. However, the exponential growth of data presents challenges across different architectures, particularly in terms of inefficient energy consumption, suboptimal bandwidth utilization, and the rapid increase in data stored in cloud environments. Therefore, data reduction techniques are crucial to reduce the amount of data transferred and stored. This paper provides a comprehensive review of various data reduction techniques and introduces a taxonomy to classify these methods based on the type of data loss. The experiments conducted in this study include distinct data types, assessing the performance and applicability of these techniques across different datasets.
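
As a rough illustration of the distinction between reduction with and without data loss, the following Python sketch contrasts the two on a short list of sensor-style readings. It is not taken from the paper: the function names, the use of zlib for the lossless case, and the windowed mean for the lossy case are assumptions chosen only for illustration.

```python
import struct
import zlib


def lossless_reduce(readings: list[float]) -> bytes:
    """Pack the readings as little-endian doubles and compress them with zlib."""
    raw = struct.pack(f"<{len(readings)}d", *readings)
    return zlib.compress(raw)


def lossless_restore(blob: bytes) -> list[float]:
    """Invert lossless_reduce; every original value is recovered exactly."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))


def lossy_reduce(readings: list[float], window: int) -> list[float]:
    """Replace each consecutive window of readings by its mean.

    The individual readings cannot be recovered from the result.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    chunks = (readings[i:i + window] for i in range(0, len(readings), window))
    return [sum(chunk) / len(chunk) for chunk in chunks]


if __name__ == "__main__":
    data = [20.0, 20.5, 21.0, 21.5, 22.0]  # hypothetical readings
    print(lossless_restore(lossless_reduce(data)) == data)
    print(lossy_reduce(data, 2))
```

The lossless path trades CPU time for size while preserving every value, whereas the lossy path shrinks the series by a fixed factor at the cost of detail; which trade-off is acceptable depends on the data type and the application.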

24 pages, 1032 KiB  
Article
Recommendation Algorithm Based on Survival Action Rules
by Marek Hermansa, Marek Sikora, Beata Sikora and Łukasz Wróbel
Appl. Sci. 2024, 14(7), 2939; https://doi.org/10.3390/app14072939 - 30 Mar 2024
Viewed by 762
Abstract
Survival analysis is widely used in fields such as medical research and reliability engineering to analyze data where not all subjects experience the event of interest by the end of the study. It requires dedicated methods capable of handling censored cases. This paper extends the collection of techniques applicable to censored data by introducing a novel algorithm for interpretable recommendations based on a set of survival action rules. Each action rule contains recommendations for changing the values of attributes describing examples. As a result of applying the action rules, an example is moved from a group characterized by a survival curve to another group with a significantly different survival rate. In practice, an example can be covered by several induced rules. To decide which attribute values should be changed, we propose a recommendation algorithm that analyzes all actions suggested by the rules covering the example. The efficiency of the algorithm has been evaluated on several benchmark datasets. We also present a qualitative analysis of the generated recommendations through a case study. The results indicate that the proposed method produces high-quality recommendations and leads to a significant change in the estimated survival time.

27 pages, 6880 KiB  
Article
Using Level-Based Multiple Reasoning in a Web-Based Intelligent System for the Diagnosis of Farmed Fish Diseases
by Konstantinos Kovas, Ioannis Hatzilygeroudis, Konstantinos Dimitropoulos, Georgios Spiliopoulos, Konstantinos Poulos, Evi Abatzidou, Theofanis Aravanis, Aristeidis Ilias, Grigorios Kanlis and John A. Theodorou
Appl. Sci. 2023, 13(24), 13059; https://doi.org/10.3390/app132413059 - 7 Dec 2023
Cited by 1 | Viewed by 1322
Abstract
Farmed fish disease diagnosis is an important problem in the fish farming industry, affecting quality of production and financial losses. In this paper, we present a web-based intelligent system that tackles the problem of fish disease diagnosis. To this end, it uses multiple knowledge representation and reasoning methods: rule-based, case-based, weight-based, and voting. Knowledge, which concerns the diagnosis of sea bass diseases, was acquired from experts in the field and represented in the form of decision trees. The diagnostic process is performed in two stages: a general one and a specialized one. In the general stage, a level-based diagnosis is performed, where environmental parameters, external signs, and internal signs are successively examined, and the three most probable diseases are identified. In the specialized stage, which is optional, a specialized expert system is used for each of the resulting diseases, where additional parameters concerning laboratory tests (microbiological, microscopic, molecular, and chemical) are considered. The general stage is the most useful, given that it can be performed on-site in real-time, whereas the specialized one requires time-consuming lab tests. The system also provides explanations for its decisions. Evaluation of the general-stage diagnostic process showed a top-3 accuracy of 78.79% on expert test cases and 94% on an artificial dataset.
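
For readers unfamiliar with the top-3 accuracy figure quoted above, the following Python sketch shows how such a metric is commonly computed: a case counts as correct when the true label appears among the first k ranked predictions. It is a generic illustration, not the authors' evaluation code; the function name and the placeholder labels (d1 to d4) are hypothetical.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    truths: Sequence[str],
    k: int = 3,
) -> float:
    """Fraction of cases whose true label is among the first k ranked predictions."""
    if len(ranked_predictions) != len(truths):
        raise ValueError("each case needs exactly one true label")
    if not truths:
        raise ValueError("at least one case is required")
    hits = sum(
        truth in ranked[:k]
        for ranked, truth in zip(ranked_predictions, truths)
    )
    return hits / len(truths)


if __name__ == "__main__":
    # Hypothetical ranked diagnoses for three cases (most probable first).
    ranked = [
        ["d1", "d2", "d3", "d4"],
        ["d2", "d4", "d1", "d3"],
        ["d4", "d3", "d2", "d1"],
    ]
    truths = ["d3", "d3", "d4"]
    print(top_k_accuracy(ranked, truths, k=3))
```

With k = 1 the same function reduces to ordinary accuracy, which makes it easy to see how much the ranking beyond the first guess contributes.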

24 pages, 4084 KiB  
Article
SKATEBOARD: Semantic Knowledge Advanced Tool for Extraction, Browsing, Organisation, Annotation, Retrieval, and Discovery
by Eleonora Bernasconi, Davide Di Pierro, Domenico Redavid and Stefano Ferilli
Appl. Sci. 2023, 13(21), 11782; https://doi.org/10.3390/app132111782 - 27 Oct 2023
Cited by 2 | Viewed by 1379
Abstract
This paper introduces Semantic Knowledge Advanced Tool for Extraction Browsing Organisation Annotation Retrieval and Discovery (SKATEBOARD), a tool designed to facilitate knowledge exploration through the application of semantic technologies. The demand for advanced solutions that streamline Knowledge Extraction, management, and visualisation, characterised by abundant information, has grown substantially in the current era. Graph-based representations have emerged as a robust approach for uncovering intricate data relationships, complementing the capabilities offered by AI models. Acknowledging the transparency and user control challenges faced by AI-driven solutions, SKATEBOARD offers a comprehensive framework encompassing Knowledge Extraction, ontology development, management, and interactive exploration. By adhering to Linked Data principles and adopting graph-based exploration, SKATEBOARD provides users with a clear view of data relationships and dependencies. Furthermore, it integrates recommendation systems and reasoning capabilities to augment the knowledge discovery process, thus introducing a serendipity effect generated by the SKATEBOARD interface exploration. This paper elucidates SKATEBOARD’s functionalities while emphasising its user-centric design. After reviewing related research, we provide an overview of the SKATEBOARD pipeline, demonstrating its capacity to bridge RDF and LPG representations. Subsequent sections delve into Knowledge Extraction and exploration, culminating in the evaluation of the tool. SKATEBOARD empowers users to make informed decisions and uncover valuable insights within their data domains, with the added dimension of serendipitous discoveries facilitated by its interface exploration capabilities.

18 pages, 2572 KiB  
Article
A Domain-Oriented Entity Alignment Approach Based on Filtering Multi-Type Graph Neural Networks
by Yaoli Xu, Jinjun Zhong, Suzhi Zhang, Chenglin Li, Pu Li, Yanbu Guo, Yuhua Li, Hui Liang and Yazhou Zhang
Appl. Sci. 2023, 13(16), 9237; https://doi.org/10.3390/app13169237 - 14 Aug 2023
Cited by 1 | Viewed by 1301
Abstract
Owing to the heterogeneity and incomplete information present in various domain knowledge graphs, the alignment of distinct source entities that represent an identical real-world entity becomes imperative. Existing methods focus on cross-lingual knowledge graph alignment, and assume that the entities of knowledge graphs in the same language are unique. However, due to the ambiguity of language, heterogeneous knowledge graphs in the same language are often duplicated, and relationship triples are far less than those of cross-lingual knowledge graphs. Moreover, existing methods rarely exclude noisy entities in the process of alignment. These make it impossible for existing methods to deal effectively with the entity alignment of domain knowledge graphs. In order to address these issues, we propose a novel entity alignment approach based on domain-oriented embedded representation (DomainEA). Firstly, a filtering mechanism employs the language model to extract the semantic features of entities and to exclude noisy entities for each entity. Secondly, a Structural Aggregator (SA) incorporates multiple hidden layers to generate high-order neighborhood-aware embeddings of entities that have few relationship connections. An Attribute Aggregator (AA) introduces self-attention to dynamically calculate weights that represent the importance of the attribute values of the entities. Finally, the approach calculates a transformation matrix to map the embeddings of distinct domain knowledge graphs onto a unified space, and matches entities via the joint embeddings of the SA and AA. Compared to six state-of-the-art methods, our experimental results on multiple food datasets show the following: (i) Our approach achieves an average improvement of 6.9% on MRR. (ii) The size of the dataset has a subtle influence on our approach; there is a positive correlation between the expansion of the dataset size and an improvement in most of the metrics. (iii) We can achieve a significant improvement in the level of recall by employing a filtering mechanism that is limited to the top-100 nearest entities as the candidate pairs.
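
The MRR (mean reciprocal rank) metric mentioned above can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not the DomainEA evaluation code: the function name and the entity identifiers (e1, e2, ...) are hypothetical, and a true match that is missing from the candidate list is assumed to contribute a reciprocal rank of 0.

```python
from typing import Sequence


def mean_reciprocal_rank(
    ranked_candidates: Sequence[Sequence[str]],
    truths: Sequence[str],
) -> float:
    """Average of 1/rank of the true match; 0 when it is not among the candidates."""
    if len(ranked_candidates) != len(truths):
        raise ValueError("each query needs exactly one true match")
    if not truths:
        raise ValueError("at least one query is required")
    total = 0.0
    for ranked, truth in zip(ranked_candidates, truths):
        if truth in ranked:
            # list.index is 0-based, ranks are 1-based.
            total += 1.0 / (ranked.index(truth) + 1)
    return total / len(truths)


if __name__ == "__main__":
    # Hypothetical candidate lists for three source entities (best match first).
    ranked = [["e1", "e2", "e3"], ["e2", "e1", "e3"], ["e3", "e2"]]
    truths = ["e1", "e1", "e9"]
    print(mean_reciprocal_rank(ranked, truths))
```

Because the candidate lists here play the same role as a top-N filter, truncating them (for example to the nearest 100 entities) changes both recall and the ranks that enter the average.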

17 pages, 2481 KiB  
Article
Detection of Software Security Weaknesses Using Cross-Language Source Code Representation (CLaSCoRe)
by Sergiu Zaharia, Traian Rebedea and Stefan Trausan-Matu
Appl. Sci. 2023, 13(13), 7871; https://doi.org/10.3390/app13137871 - 4 Jul 2023
Cited by 1 | Viewed by 1743
Abstract
The research presented in the paper aims at increasing the capacity to identify security weaknesses in programming languages that are less supported by specialized security analysis tools, based on the knowledge gathered from securing the popular ones, for which security experts, scanners, and labeled datasets are, in general, available. This goal is vital in reducing the overall exposure of software applications. We propose a solution to expand the capabilities of security gaps detection to downstream languages, influenced by their more popular “ancestors” from the programming languages’ evolutionary tree, using language keyword tokenization and clustering based on word embedding techniques. We show that after training a machine learning algorithm on C, C++, and Java applications developed by a community of programmers with similar behavior of writing code, we can detect, with acceptable accuracy, similar vulnerabilities in C# source code written by the same community. To achieve this, we propose a core cross-language representation of source code, optimized for security weaknesses classifiers, named CLaSCoRe. Using this method, we can achieve zero-shot vulnerability detection—in our case, without using any training data with C# source code.