Knowledge and Data Engineering

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 20 August 2024 | Viewed by 5906

Special Issue Editors


Dr. Irina Trubitsyna
Guest Editor
Department of Informatics, Modeling, Electronics and System Engineering, University of Calabria, 87036 Rende, Italy
Interests: knowledge representation; logic programming; argumentation; incomplete and inconsistent databases

Dr. Reza Shahbazian
Guest Editor
Department of Mechanical, Energy and Management Engineering, University of Calabria, 87036 Rende, Italy
Interests: big data processing; optimization; machine learning; data imputation

Special Issue Information

Dear Colleagues,

As advances in technology and data generation provide new opportunities and challenges, the field of knowledge and data engineering continues to evolve and expand. The incorporation of artificial intelligence (AI) and machine learning (ML) algorithms into knowledge and data engineering has resulted in some exciting new developments.

The goal of this Special Issue is to highlight the most recent research and developments in knowledge and data engineering, with a focus on the integration of AI and ML algorithms, as well as their practical and theoretical advancements that can aid in the resolution of real-world problems in the field.

This Special Issue's scope includes, but is not limited to, knowledge representation and reasoning, knowledge graph construction and analysis, data and knowledge integration, knowledge discovery and data mining, data privacy and security, semantic web and linked data, natural language processing, recommendation systems, decision support systems, and big data processing and analysis.

This Special Issue welcomes both theoretical and practical papers and will gather a diverse range of perspectives from academia and industry, providing a platform for the exchange of ideas and insights on the future of knowledge and data engineering and of AI/ML.

Dr. Irina Trubitsyna
Dr. Reza Shahbazian
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • artificial intelligence
  • data integration
  • knowledge discovery
  • knowledge representation
  • knowledge graphs
  • decision support
  • data privacy
  • semantic web
  • natural language processing

Published Papers (6 papers)

Research

21 pages, 594 KiB  
Article
Analyzing Data Reduction Techniques: An Experimental Perspective
by Vítor Fernandes, Gonçalo Carvalho, Vasco Pereira and Jorge Bernardino
Appl. Sci. 2024, 14(8), 3436; https://doi.org/10.3390/app14083436 - 18 Apr 2024
Viewed by 245
Abstract
The exponential growth in data generation has become a ubiquitous phenomenon in today’s rapidly growing digital technology. Technological advances and the number of connected devices are the main drivers of this expansion. However, the exponential growth of data presents challenges across different architectures, particularly in terms of inefficient energy consumption, suboptimal bandwidth utilization, and the rapid increase in data stored in cloud environments. Therefore, data reduction techniques are crucial to reduce the amount of data transferred and stored. This paper provides a comprehensive review of various data reduction techniques and introduces a taxonomy to classify these methods based on the type of data loss. The experiments conducted in this study include distinct data types, assessing the performance and applicability of these techniques across different datasets. Full article
(This article belongs to the Special Issue Knowledge and Data Engineering)
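
As a side note for readers new to the topic, the minimal Python sketch below contrasts one lossless and one lossy reduction technique; the synthetic payload, zlib compression, and uniform sampling are illustrative assumptions, not the techniques or datasets evaluated in the paper.

    import zlib

    # Synthetic, highly redundant payload standing in for sensor data.
    data = b"sensor_reading=23.5;" * 500

    # Lossless reduction: compression; the original bytes are fully recoverable.
    compressed = zlib.compress(data, level=9)
    assert zlib.decompress(compressed) == data
    print(f"lossless compression: {len(data)} -> {len(compressed)} bytes")

    # Lossy reduction: keep every 10th byte; far smaller, but information is discarded.
    sampled = data[::10]
    print(f"lossy sampling (10%): {len(data)} -> {len(sampled)} bytes")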

24 pages, 1032 KiB  
Article
Recommendation Algorithm Based on Survival Action Rules
by Marek Hermansa, Marek Sikora, Beata Sikora and Łukasz Wróbel
Appl. Sci. 2024, 14(7), 2939; https://doi.org/10.3390/app14072939 - 30 Mar 2024
Viewed by 426
Abstract
Survival analysis is widely used in fields such as medical research and reliability engineering to analyze data where not all subjects experience the event of interest by the end of the study. It requires dedicated methods capable of handling censored cases. This paper extends the collection of techniques applicable to censored data by introducing a novel algorithm for interpretable recommendations based on a set of survival action rules. Each action rule contains recommendations for changing the values of attributes describing examples. As a result of applying the action rules, an example is moved from a group characterized by a survival curve to another group with a significantly different survival rate. In practice, an example can be covered by several induced rules. To decide which attribute values should be changed, we propose a recommendation algorithm that analyzes all actions suggested by the rules covering the example. The efficiency of the algorithm has been evaluated on several benchmark datasets. We also present a qualitative analysis of the generated recommendations through a case study. The results indicate that the proposed method produces high-quality recommendations and leads to a significant change in the estimated survival time. Full article
(This article belongs to the Special Issue Knowledge and Data Engineering)
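
For intuition only, the following Python sketch shows how an example covered by several action rules might have its recommended attribute changes aggregated; the rule format and the survival-gain scoring are simplifying assumptions, not the authors' algorithm.

    from dataclasses import dataclass

    @dataclass
    class ActionRule:
        premise: dict          # attribute values the rule requires in order to cover an example
        actions: dict          # attribute -> recommended new value
        survival_gain: float   # estimated improvement in expected survival time (assumed score)

        def covers(self, example: dict) -> bool:
            return all(example.get(attr) == value for attr, value in self.premise.items())

    def recommend(example: dict, rules: list) -> dict:
        """Aggregate the actions of all covering rules; keep the best-scoring change per attribute."""
        votes = {}
        for rule in rules:
            if rule.covers(example):
                for attr, new_value in rule.actions.items():
                    votes[(attr, new_value)] = votes.get((attr, new_value), 0.0) + rule.survival_gain
        best, score = {}, {}
        for (attr, value), s in votes.items():
            if s > score.get(attr, float("-inf")):
                best[attr], score[attr] = value, s
        return best

    rules = [
        ActionRule({"treatment": "A"}, {"treatment": "B"}, survival_gain=4.0),
        ActionRule({"smoker": "yes"}, {"smoker": "no"}, survival_gain=7.5),
    ]
    print(recommend({"treatment": "A", "smoker": "yes"}, rules))  # {'treatment': 'B', 'smoker': 'no'}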

27 pages, 6880 KiB  
Article
Using Level-Based Multiple Reasoning in a Web-Based Intelligent System for the Diagnosis of Farmed Fish Diseases
by Konstantinos Kovas, Ioannis Hatzilygeroudis, Konstantinos Dimitropoulos, Georgios Spiliopoulos, Konstantinos Poulos, Evi Abatzidou, Theofanis Aravanis, Aristeidis Ilias, Grigorios Kanlis and John A. Theodorou
Appl. Sci. 2023, 13(24), 13059; https://doi.org/10.3390/app132413059 - 07 Dec 2023
Viewed by 937
Abstract
Farmed fish disease diagnosis is an important problem in the fish farming industry, affecting the quality of production and causing financial losses. In this paper, we present a web-based intelligent system that tackles the problem of fish disease diagnosis. To this end, it uses multiple knowledge representation and reasoning methods: rule-based, case-based, weight-based, and voting. Knowledge, which concerns the diagnosis of sea bass diseases, was acquired from experts in the field and represented in the form of decision trees. The diagnostic process is performed in two stages: a general one and a specialized one. In the general stage, a level-based diagnosis is performed, where environmental parameters, external signs, and internal signs are successively examined, and the three most probable diseases are identified. In the specialized stage, which is optional, a specialized expert system is used for each of the resulting diseases, where additional parameters concerning laboratory tests (microbiological, microscopic, molecular, and chemical) are considered. The general stage is the most useful, given that it can be performed on-site in real-time, whereas the specialized one requires time-consuming lab tests. The system also provides explanations for its decisions. Evaluation of the general-stage diagnostic process showed a top-3 accuracy of 78.79% on expert test cases and 94% on an artificial dataset. Full article
(This article belongs to the Special Issue Knowledge and Data Engineering)
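
To illustrate the flavour of level-based, weight-based reasoning, the Python sketch below scores candidate diseases from weighted signs observed at the environmental, external, and internal levels; the diseases, signs, and weights are hypothetical and do not come from the system's knowledge base.

    # Hypothetical stand-in for expert knowledge: per-disease sign weights at each level.
    KNOWLEDGE = {
        "vibriosis": {
            "environment": {"high_temperature": 0.4},
            "external":    {"skin_ulcers": 0.8},
            "internal":    {"enlarged_spleen": 0.6},
        },
        "pasteurellosis": {
            "environment": {"high_temperature": 0.5},
            "external":    {"lethargy": 0.3},
            "internal":    {"white_nodules": 0.9},
        },
    }

    # Later levels (internal signs) carry more weight than earlier ones (environment).
    LEVEL_WEIGHTS = {"environment": 1.0, "external": 2.0, "internal": 3.0}

    def diagnose(observations: dict, top_k: int = 3) -> list:
        """Score every disease by its weighted matching signs and return the top-k candidates."""
        scores = {}
        for disease, levels in KNOWLEDGE.items():
            total = 0.0
            for level, signs in levels.items():
                for sign, weight in signs.items():
                    if sign in observations.get(level, set()):
                        total += LEVEL_WEIGHTS[level] * weight
            scores[disease] = total
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_k]

    observed = {"environment": {"high_temperature"}, "external": {"skin_ulcers"}}
    print(diagnose(observed))  # vibriosis should rank first on these observations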

24 pages, 4084 KiB  
Article
SKATEBOARD: Semantic Knowledge Advanced Tool for Extraction, Browsing, Organisation, Annotation, Retrieval, and Discovery
by Eleonora Bernasconi, Davide Di Pierro, Domenico Redavid and Stefano Ferilli
Appl. Sci. 2023, 13(21), 11782; https://doi.org/10.3390/app132111782 - 27 Oct 2023
Cited by 1 | Viewed by 1033
Abstract
This paper introduces Semantic Knowledge Advanced Tool for Extraction Browsing Organisation Annotation Retrieval and Discovery (SKATEBOARD), a tool designed to facilitate knowledge exploration through the application of semantic technologies. In the current era, characterised by abundant information, the demand for advanced solutions that streamline Knowledge Extraction, management, and visualisation has grown substantially. Graph-based representations have emerged as a robust approach for uncovering intricate data relationships, complementing the capabilities offered by AI models. Acknowledging the transparency and user control challenges faced by AI-driven solutions, SKATEBOARD offers a comprehensive framework encompassing Knowledge Extraction, ontology development, management, and interactive exploration. By adhering to Linked Data principles and adopting graph-based exploration, SKATEBOARD provides users with a clear view of data relationships and dependencies. Furthermore, it integrates recommendation systems and reasoning capabilities to augment the knowledge discovery process, thus introducing a serendipity effect generated by exploration of the SKATEBOARD interface. This paper elucidates SKATEBOARD’s functionalities while emphasising its user-centric design. After reviewing related research, we provide an overview of the SKATEBOARD pipeline, demonstrating its capacity to bridge RDF and LPG representations. Subsequent sections delve into Knowledge Extraction and exploration, culminating in the evaluation of the tool. SKATEBOARD empowers users to make informed decisions and uncover valuable insights within their data domains, with the added dimension of serendipitous discoveries facilitated by its interface exploration capabilities. Full article
(This article belongs to the Special Issue Knowledge and Data Engineering)
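
As a rough illustration of bridging RDF and LPG views of the same facts, the Python sketch below projects a few RDF triples onto a property-graph-style structure; rdflib and the example triples are assumptions made for illustration, not SKATEBOARD's actual stack.

    from rdflib import Graph, Literal, Namespace

    EX = Namespace("http://example.org/")
    rdf = Graph()
    rdf.add((EX.Dante, EX.wrote, EX.DivineComedy))
    rdf.add((EX.DivineComedy, EX.title, Literal("Divina Commedia")))

    # Project the triples onto a simple LPG view: nodes with properties plus typed edges.
    nodes, edges = {}, []
    for subject, predicate, obj in rdf:
        nodes.setdefault(str(subject), {})
        if isinstance(obj, Literal):
            nodes[str(subject)][str(predicate)] = str(obj)          # literals become node properties
        else:
            nodes.setdefault(str(obj), {})
            edges.append((str(subject), str(predicate), str(obj)))  # IRIs become relationships

    print(nodes)
    print(edges)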

18 pages, 2572 KiB  
Article
A Domain-Oriented Entity Alignment Approach Based on Filtering Multi-Type Graph Neural Networks
by Yaoli Xu, Jinjun Zhong, Suzhi Zhang, Chenglin Li, Pu Li, Yanbu Guo, Yuhua Li, Hui Liang and Yazhou Zhang
Appl. Sci. 2023, 13(16), 9237; https://doi.org/10.3390/app13169237 - 14 Aug 2023
Viewed by 936
Abstract
Owing to the heterogeneity and incomplete information present in various domain knowledge graphs, the alignment of distinct source entities that represent an identical real-world entity becomes imperative. Existing methods focus on cross-lingual knowledge graph alignment, and assume that the entities of knowledge graphs in the same language are unique. However, due to the ambiguity of language, heterogeneous knowledge graphs in the same language are often duplicated, and relationship triples are far fewer than in cross-lingual knowledge graphs. Moreover, existing methods rarely exclude noisy entities in the process of alignment. These issues make it impossible for existing methods to deal effectively with the entity alignment of domain knowledge graphs. In order to address these issues, we propose a novel entity alignment approach based on domain-oriented embedded representation (DomainEA). Firstly, a filtering mechanism employs the language model to extract the semantic features of entities and to exclude noisy entities for each entity. Secondly, a Structural Aggregator (SA) incorporates multiple hidden layers to generate high-order neighborhood-aware embeddings of entities that have few relationship connections. An Attribute Aggregator (AA) introduces self-attention to dynamically calculate weights that represent the importance of the attribute values of the entities. Finally, the approach calculates a transformation matrix to map the embeddings of distinct domain knowledge graphs onto a unified space, and matches entities via the joint embeddings of the SA and AA. Compared to six state-of-the-art methods, our experimental results on multiple food datasets show the following: (i) Our approach achieves an average improvement of 6.9% on MRR. (ii) The size of the dataset has a subtle influence on our approach; there is a positive correlation between the expansion of the dataset size and an improvement in most of the metrics. (iii) We can achieve a significant improvement in the level of recall by employing a filtering mechanism that is limited to the top-100 nearest entities as the candidate pairs. Full article
(This article belongs to the Special Issue Knowledge and Data Engineering)
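
The candidate-filtering step can be pictured with the short Python sketch below, which keeps the top-k nearest target entities for each source entity by cosine similarity over embeddings; the random vectors stand in for language-model features, and the code is an illustration rather than the DomainEA implementation.

    import numpy as np

    def top_k_candidates(src_emb: np.ndarray, tgt_emb: np.ndarray, k: int = 100) -> np.ndarray:
        """For each source entity, return the indices of its k most similar target entities."""
        src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
        tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
        similarity = src @ tgt.T                       # cosine similarity matrix
        k = min(k, tgt_emb.shape[0])
        return np.argsort(-similarity, axis=1)[:, :k]  # best-first candidate indices

    # Random vectors stand in for language-model embeddings of entity names and attributes.
    rng = np.random.default_rng(0)
    source, target = rng.normal(size=(5, 32)), rng.normal(size=(200, 32))
    print(top_k_candidates(source, target, k=3))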

17 pages, 2481 KiB  
Article
Detection of Software Security Weaknesses Using Cross-Language Source Code Representation (CLaSCoRe)
by Sergiu Zaharia, Traian Rebedea and Stefan Trausan-Matu
Appl. Sci. 2023, 13(13), 7871; https://doi.org/10.3390/app13137871 - 04 Jul 2023
Viewed by 1291
Abstract
The research presented in the paper aims at increasing the capacity to identify security weaknesses in programming languages that are less supported by specialized security analysis tools, based on the knowledge gathered from securing the popular ones, for which security experts, scanners, and labeled datasets are, in general, available. This goal is vital in reducing the overall exposure of software applications. We propose a solution to expand the capabilities of security gaps detection to downstream languages, influenced by their more popular “ancestors” from the programming languages’ evolutionary tree, using language keyword tokenization and clustering based on word embedding techniques. We show that after training a machine learning algorithm on C, C++, and Java applications developed by a community of programmers with similar behavior of writing code, we can detect, with acceptable accuracy, similar vulnerabilities in C# source code written by the same community. To achieve this, we propose a core cross-language representation of source code, optimized for security weaknesses classifiers, named CLaSCoRe. Using this method, we can achieve zero-shot vulnerability detection—in our case, without using any training data with C# source code. Full article
(This article belongs to the Special Issue Knowledge and Data Engineering)
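
To convey the cross-language idea, the Python sketch below maps language-specific keywords to shared clusters, represents snippets as cluster histograms, and classifies an unseen C# snippet from C/C++/Java examples; the clusters, snippets, and nearest-neighbour classifier are toy assumptions, not CLaSCoRe itself.

    import re
    from collections import Counter

    # Assumed keyword clusters shared across languages: a stand-in for the
    # word-embedding-based clustering described in the paper.
    CLUSTERS = {
        "alloc":  {"malloc", "calloc", "new"},
        "copy":   {"strcpy", "memcpy", "Copy", "arraycopy"},
        "loop":   {"for", "while", "foreach"},
        "branch": {"if", "else", "switch"},
    }

    def featurize(code: str) -> tuple:
        """Represent a snippet as a histogram over the shared keyword clusters."""
        tokens = re.findall(r"[A-Za-z_]\w*", code)
        counts = Counter()
        for token in tokens:
            for cluster, members in CLUSTERS.items():
                if token in members:
                    counts[cluster] += 1
        return tuple(counts[c] for c in CLUSTERS)

    # Toy training snippets in C/C++/Java (1 = weakness present, 0 = safe).
    train = [
        ("char d[8]; strcpy(d, src);", 1),
        ("for (int i = 0; i < n; i++) sum += a[i];", 0),
        ("int *p = new int[8]; memcpy(p, src, 64);", 1),
        ("if (x > 0) { y = x; }", 0),
    ]

    def predict(code: str) -> int:
        """1-nearest-neighbour over cluster histograms, standing in for the trained classifier."""
        features = featurize(code)
        distance = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
        _, label = min((distance(features, featurize(c)), lbl) for c, lbl in train)
        return label

    # Zero-shot use on C#: the shared cluster vocabulary also covers Array.Copy.
    print(predict("Array.Copy(src, dst, 64);"))  # expected: 1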
