Advances in Data Mining and Knowledge Discovery

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Networks".

Deadline for manuscript submissions: closed (15 January 2023) | Viewed by 18050

Special Issue Editors


Prof. Dr. JeongYon Shim
Guest Editor
Division of General Studies, Computer Science, Kangnam University, Yongin 16979, Republic of Korea
Interests: artificial intelligence; machine learning; knowledge-based systems; data mining; ICA; emotional and intelligent system modeling

Prof. Hyeoncheol Kim
Guest Editor
Department of Computer Science and Engineering, Korea University, Seoul 02841, Republic of Korea
Interests: big data; smart computing

Dr. Kyu-Tae Lee
Guest Editor
Division of Information and Communication Engineering, Kongju National University, Cheonan 331-717, Republic of Korea
Interests: machine learning; knowledge-based systems; data mining

Special Issue Information

Dear Colleagues,

Using data mining and knowledge discovery to extract useful information from a flood of data can be compared to digging for gold in a huge mountain. Data mining is an important field of AI and an essential part of future technologies, providing the heavy lifting for many of the most challenging problems in computer engineering. In the data mining process, it is critical to extract useful data, construct appropriate data structures, and manage data; this calls for sophisticated methods that can handle valuable data accurately and in a timely manner amid such a flood of data. The aim of this Special Issue is to present in-depth methods for data mining and knowledge discovery that can contribute to the development of advanced AI technology. We invite researchers to contribute original and unique papers. Topics include, but are not limited to, the following areas:

• Data mining
• Knowledge discovery
• Knowledge acquisition
• Knowledge representation
• Knowledge-intensive problem-solving techniques
• Knowledge networks and management
• Intelligent information systems
• Reasoning strategies
• Neural networks and variants, including deep learning
• Intelligent tutoring systems
• Brain or bio-modeling systems
• Hierarchical learning models
• Bayesian methods
• Independent component analysis
• Computational neuroscience
• Applications, including computer vision, signal processing, pattern recognition, robotics, medicine, finance, education, and emerging applications

Prof. Dr. JeongYon Shim
Prof. Hyeoncheol Kim
Dr. Kyu-Tae Lee
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Data mining and knowledge discovery
  • Knowledge-intensive problem-solving techniques
  • Knowledge networks and management
  • Neural networks and variants, including deep learning
  • Brain or bio-modeling systems
  • Hierarchical learning models
  • Bayesian methods
  • Independent component analysis

Published Papers (6 papers)


Research

17 pages, 821 KiB  
Article
SeAttE: An Embedding Model Based on Separating Attribute Space for Knowledge Graph Completion
by Zongwei Liang, Junan Yang, Hui Liu, Keju Huang, Lingzhi Qu, Lin Cui and Xiang Li
Electronics 2022, 11(7), 1058; https://doi.org/10.3390/electronics11071058 - 28 Mar 2022
Cited by 4 | Viewed by 1752
Abstract
Knowledge graphs are structured representations of real-world facts. However, they typically contain only a small subset of all possible facts. Link prediction is the task of inferring missing facts based on existing ones. Knowledge graph embedding, which represents the entities and relations of a knowledge graph with high-dimensional vectors, has made significant progress in link prediction. Tensor decomposition models are an embedding family with good performance in link prediction. However, previous tensor decomposition models do not consider the problem of attribute separation; they mainly explore particular regularizations to improve performance, and no matter how sophisticated their design, their performance is theoretically bounded by that of the basic tensor decomposition model. Moreover, the unnoticed task of attribute separation in traditional models is simply handed over to training; the number of parameters devoted to this task is tremendous, and the model is prone to overfitting. In this paper, we investigate designs that approach the theoretical performance of tensor decomposition models. It is well known that measuring the rationality of a specific triple means comparing the matching degree of the specific attributes associated with its relation; the comparison of actual triples therefore first requires separating the specific attribute dimensions, which existing models ignore. Inspired by this observation, we design a novel tensor decomposition model based on Separating Attribute space for knowledge graph completion (SeAttE). The major novelty of this paper is that SeAttE is the first model in the tensor decomposition family to consider the attribute space separation task. Furthermore, SeAttE transforms the learning of too many parameters for attribute space separation into the design of the model structure, which allows the model to focus on learning the semantic equivalence between relations and lets performance approach the theoretical limit. We also prove that RESCAL, DistMult and ComplEx are special cases of SeAttE, and we classify existing tensor decomposition models for subsequent researchers. Experiments on benchmark datasets show that SeAttE achieves state-of-the-art performance among tensor decomposition models.
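To make the attribute-space separation concrete, here is a minimal numpy sketch of block-diagonal bilinear scoring, which is how we read the abstract's subspace idea: each relation interacts with entities only inside separated attribute subspaces. The function name, block layout, and toy dimensions are our own illustration, not the authors' code.

```python
import numpy as np

def seatte_score(h, t, rel_blocks):
    """Bilinear score of a triple under a block-diagonal relation matrix.
    h, t       : entity embeddings of dimension d = len(rel_blocks) * k
    rel_blocks : list of (k, k) arrays, one per attribute subspace
    """
    k = rel_blocks[0].shape[0]
    score = 0.0
    for i, block in enumerate(rel_blocks):
        hb = h[i * k:(i + 1) * k]   # head-entity slice for attribute i
        tb = t[i * k:(i + 1) * k]   # tail-entity slice for attribute i
        score += hb @ block @ tb    # interaction confined to this subspace
    return float(score)

# Toy usage: d = 6 split into three 2-dimensional attribute subspaces.
rng = np.random.default_rng(0)
h, t = rng.normal(size=6), rng.normal(size=6)
blocks = [rng.normal(size=(2, 2)) for _ in range(3)]
print(seatte_score(h, t, blocks))
```

With 1 x 1 blocks the loop reduces to a DistMult-style score, and a single d x d block recovers RESCAL, matching the special cases the abstract mentions.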
(This article belongs to the Special Issue Advances in Data Mining and Knowledge Discovery)

13 pages, 476 KiB  
Article
HRER: A New Bottom-Up Rule Learning for Knowledge Graph Completion
by Zongwei Liang, Junan Yang, Hui Liu, Keju Huang, Lin Cui, Lingzhi Qu and Xiang Li
Electronics 2022, 11(6), 908; https://doi.org/10.3390/electronics11060908 - 15 Mar 2022
Cited by 3 | Viewed by 1980
Abstract
Knowledge graphs (KGs) are collections of structured facts that have recently attracted growing attention. Although there are billions of triples in KGs, they are still incomplete, and these incomplete knowledge bases limit practical applications. Predicting new facts from a given knowledge graph is therefore an increasingly important area. In this paper, we investigate models based on logic rules and propose HRER, a new bottom-up rule learning method for knowledge graph completion. First, inspired by the observation that the known information in KGs is incomplete and unbalanced, HRER modifies the screening indicators of existing relation rule mining methods: the new metric, HRR, is more effective than traditional confidence measures in filtering Horn rules. In addition, motivated by the differences between embedding-based methods and methods based on logic rules, HRER introduces entity rules, which make up for the limited expressiveness of Horn rules to some extent. HRER needs only a few parameters to control the number of rules and can provide explanations for its predictions. Experiments show that HRER achieves state-of-the-art performance across standard link prediction datasets.
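As background for the measure the abstract modifies, the sketch below computes the classical confidence of a two-atom Horn rule over a toy triple store; the relation names are hypothetical, and HRER's own HRR metric and entity rules are not reproduced here.

```python
from collections import defaultdict

def rule_confidence(triples, body=("born_in", "city_of"), head="nationality"):
    """Classical confidence of the Horn rule
       body[0](x, z) AND body[1](z, y)  =>  head(x, y),
    i.e. the fraction of body instantiations that also satisfy the head.
    HRER replaces this ratio with its HRR metric; only the baseline is shown."""
    by_rel = defaultdict(set)
    for h, r, t in triples:
        by_rel[r].add((h, t))
    support = body_count = 0
    for x, z in by_rel[body[0]]:
        for z2, y in by_rel[body[1]]:
            if z == z2:
                body_count += 1
                support += (x, y) in by_rel[head]
    return support / body_count if body_count else 0.0

# Toy knowledge graph with hypothetical relations.
kg = [("ann", "born_in", "lyon"),
      ("lyon", "city_of", "france"),
      ("ann", "nationality", "france")]
print(rule_confidence(kg))  # 1.0 on this toy graph
```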
(This article belongs to the Special Issue Advances in Data Mining and Knowledge Discovery)

15 pages, 2318 KiB  
Article
Sentence Augmentation for Language Translation Using GPT-2
by Ranto Sawai, Incheon Paik and Ayato Kuwana
Electronics 2021, 10(24), 3082; https://doi.org/10.3390/electronics10243082 - 10 Dec 2021
Cited by 6 | Viewed by 3450
Abstract
Data augmentation has recently become an important method for improving performance in deep learning. It is also a significant issue in machine translation, where various innovations such as back-translation and noising have been introduced. In particular, current state-of-the-art architectures such as BERT-fused models, as well as efficient data generation using the GPT model, provide good inspiration for improving translation performance. In this study, we propose generating additional data for neural machine translation (NMT) using a GPT-2-based sentence generator that produces sentences with characteristics similar to the original data. A BERT-fused architecture and back-translation are employed for the translation architecture. In our experiments, the model produced BLEU scores of 27.50 for tatoebaEn-Ja, 30.14 for WMT14En-De, and 24.12 for WMT18En-Ch.
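For readers who want a feel for the generation step, the snippet below samples stylistic variants of a seed sentence with Hugging Face's GPT-2; the checkpoint, prompt, and sampling settings are our guesses at a plausible setup, and the paper's fine-tuning of the generator on in-domain data is omitted.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

seed = "The weather in Tokyo"          # hypothetical source-side prompt
inputs = tokenizer(seed, return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,                    # sample rather than greedy-decode
    top_p=0.9,                         # nucleus sampling for diversity
    max_new_tokens=20,
    num_return_sequences=3,            # three augmentation candidates
    pad_token_id=tokenizer.eos_token_id,
)
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))
```

In the paper's pipeline, such generated sentences would then feed the BERT-fused translation model trained with back-translation.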
(This article belongs to the Special Issue Advances in Data Mining and Knowledge Discovery)

14 pages, 1531 KiB  
Article
Improving Text-to-Code Generation with Features of Code Graph on GPT-2
by Incheon Paik and Jun-Wei Wang
Electronics 2021, 10(21), 2706; https://doi.org/10.3390/electronics10212706 - 05 Nov 2021
Cited by 2 | Viewed by 5769
Abstract
Code generation, a rapidly growing application area of deep learning models for text, consists of two different fields: code-to-code and text-to-code. A recent approach, GraphCodeBERT, uses a code graph called data flow and showed a good performance improvement. Its base architecture is bidirectional encoder representations from transformers (BERT), which uses the encoder part of a transformer. On the other hand, the generative pre-trained transformer (GPT), another transformer architecture, uses the decoder part and shows great generative performance. In this study, we investigate improvements from adding several variants of code graphs to GPT-2, referring to the abstract semantic tree used to collect the features of variables in the code. We mainly focus on GPT-2 with additional code graph features that allow the model to learn the effect of the data stream. The experimental phase is divided into two parts: fine-tuning the existing GPT-2 model, and pre-training from scratch using code data. When a new model is pre-trained from scratch with enough data, using the code graph produces better results.
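To illustrate what a data-flow code graph captures, here is a toy extractor built on Python's standard ast module; real pipelines such as GraphCodeBERT track token positions and scopes, so treat this as a simplified sketch of the signal being added to GPT-2, not the paper's implementation.

```python
import ast

def data_flow_edges(source):
    """Collect naive 'value-comes-from' edges between variable names:
    for each assignment, link every name read on the right-hand side
    to every name written on the left-hand side."""
    tree = ast.parse(source)
    edges = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            writes = [t.id for t in node.targets if isinstance(t, ast.Name)]
            reads = [n.id for n in ast.walk(node.value)
                     if isinstance(n, ast.Name)]
            edges += [(src, dst) for dst in writes for src in reads]
    return edges

print(data_flow_edges("a = 1\nb = a + 2\nc = a * b"))
# [('a', 'b'), ('a', 'c'), ('b', 'c')]
```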
(This article belongs to the Special Issue Advances in Data Mining and Knowledge Discovery)

20 pages, 41669 KiB  
Article
Feature-Based Interpretation of the Deep Neural Network
by Eun-Hun Lee and Hyeoncheol Kim
Electronics 2021, 10(21), 2687; https://doi.org/10.3390/electronics10212687 - 03 Nov 2021
Cited by 1 | Viewed by 1425
Abstract
The significant advantage of deep neural networks is that, by stacking layers deeply, an upper layer can capture the high-level features of the data based on the information acquired from the lower layers. Since it is challenging to interpret what knowledge a neural network has learned, various studies on explaining neural networks have emerged to overcome this problem. However, these studies generate local explanations of a single instance rather than providing a generalized global interpretation of the neural network model itself. To overcome such drawbacks of previous approaches, we propose a global interpretation method for deep neural networks based on the features of the model. We first analyze the relationship between the input and hidden layers to represent the high-level features of the model, and then interpret the decision-making process of the neural network through these high-level features. In addition, we apply network pruning techniques to produce concise explanations and analyze the effect of layer complexity on interpretability. We present experiments on the proposed approach using three different datasets and show that our approach can generate global explanations of deep neural network models with high accuracy and fidelity.
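The weight-chaining intuition behind input-to-hidden "features" can be sketched in a few lines of numpy: linearize the network and read off which inputs drive each class through the hidden layer. This ignores nonlinearities and the paper's pruning step, so it is only a simplified illustration of the kind of global, model-level view the abstract argues for.

```python
import numpy as np

def global_feature_map(W1, W2, top_k=3):
    """Chain input->hidden weights (W1) with hidden->output weights (W2)
    to get a crude global attribution of inputs to classes."""
    influence = W2 @ W1                  # (classes, inputs) linearized effect
    for c, row in enumerate(influence):
        top = np.argsort(-np.abs(row))[:top_k]
        print(f"class {c}: most influential inputs {top.tolist()}")

rng = np.random.default_rng(1)
W1 = rng.normal(size=(8, 5))             # 5 inputs -> 8 hidden features
W2 = rng.normal(size=(3, 8))             # 8 features -> 3 classes
global_feature_map(W1, W2)
```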
(This article belongs to the Special Issue Advances in Data Mining and Knowledge Discovery)

18 pages, 2798 KiB  
Article
PSO Based Optimized Ensemble Learning and Feature Selection Approach for Efficient Energy Forecast
by Wafa Shafqat, Sehrish Malik, Kyu-Tae Lee and Do-Hyeun Kim
Electronics 2021, 10(18), 2188; https://doi.org/10.3390/electronics10182188 - 07 Sep 2021
Cited by 15 | Viewed by 2504
Abstract
Swarm intelligence techniques with incredible success rates are broadly used for various irregular and interdisciplinary topics, but their impact on ensemble models remains largely unexplored. This study proposes an optimized ensemble model for smart home energy consumption management based on ensemble learning and particle swarm optimization (PSO). The proposed model exploits PSO in two distinct ways: first, PSO-based feature selection is performed to select the essential features from the raw dataset; second, since manually tuning hyper-parameters in a trial-and-error manner becomes cumbersome for larger datasets and wide-ranging problems, PSO is used as an optimization technique to fine-tune the hyper-parameters of the selected ensemble model. A hybrid ensemble model is built using combinations of five different baseline models, and the hyper-parameters of each combination were optimized using PSO, followed by training on different random samples. We compared our proposed model with our previously proposed ANN-PSO model and a few other state-of-the-art models. The results show that the optimized ensemble learning models outperform individual models and the ANN-PSO model, reducing the RMSE from 9.63 to 6.05 and increasing the prediction accuracy to 95.6%. Moreover, our results show that random sampling can help improve prediction results compared to the ANN-PSO model, from 92.3% to around 96%.
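For orientation, the following is a minimal particle swarm optimizer of the kind the paper wraps around both feature selection and hyper-parameter tuning; the inertia and acceleration coefficients, bounds, and toy objective are generic defaults, not the authors' settings.

```python
import numpy as np

def pso(objective, dim, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5):
    """Minimize `objective` over R^dim with a basic PSO loop."""
    rng = np.random.default_rng(42)
    x = rng.uniform(-1.0, 1.0, (n_particles, dim))   # positions
    v = np.zeros_like(x)                             # velocities
    pbest = x.copy()                                 # per-particle best
    pbest_val = np.array([objective(p) for p in x])
    gbest = pbest[pbest_val.argmin()]                # swarm-wide best
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        vals = np.array([objective(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()]
    return gbest, pbest_val.min()

# Toy objective standing in for the ensemble's validation RMSE.
best, val = pso(lambda p: float(np.sum(p ** 2)), dim=4)
print(best, val)
```

In the paper's setting, the position vector would encode either a feature-subset mask or the ensemble's hyper-parameters, and the objective would be the validation error of the resulting model.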
(This article belongs to the Special Issue Advances in Data Mining and Knowledge Discovery)
