Data Mining Applied in Natural Language Processing

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: closed (15 September 2024) | Viewed by 11896

Special Issue Editor

Dr. Ruifan Li
School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China
Interests: artificial intelligence and applications; vision and language

Special Issue Information

Dear Colleagues,

The objective of this Special Issue is to invite diverse submissions, fostering a collaborative effort to comprehensively understand the emerging opportunities and challenges in the realm of data mining applied to natural language processing. We aim to identify key tasks, evaluate the current state of the art, showcase inventive methodologies and ideas, introduce substantial real-world systems or applications, propose new datasets, and discuss future directions. Through this coordinated effort, we aspire to advance our understanding of the intricate interplay between data mining and natural language processing, paving the way for further advancements in the field.

In this Special Issue, original research articles and reviews are welcome. Research areas may include (but are not limited to) the following (in alphabetical order):

  • Computational Social Science and Social Media;
  • Dialogue and Interactive Systems;
  • Discourse and Pragmatics;
  • Information Extraction;
  • Interpretability and Analysis of Models for NLP;
  • Linguistic Theories, Cognitive Modeling, and Psycholinguistics;
  • Machine Learning for NLP;
  • Machine Translation and Multilinguality;
  • Named Entity Recognition and Text Classification;
  • Phonology, Morphology, and Word Segmentation;
  • Semantics: Lexical, Sentence-Level, Textual Inference, and Other Areas;
  • Sentiment Analysis and Opinion Mining;
  • Summarization;
  • Syntax: Tagging, Chunking, and Parsing;
  • Text Mining and Information Retrieval.

We look forward to receiving your contributions.

Technical Program Committee Member:

Dr. Yu Zhao, Southwestern University of Finance and Economics

Dr. Ruifan Li
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • data mining
  • natural language processing
  • text mining
  • sentiment analysis
  • machine translation
  • deep learning
  • named entity recognition
  • text classification
  • cross-language NLP

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (8 papers)

Research

17 pages, 302 KiB  
Article
Comparative Analysis of Graph Neural Networks and Transformers for Robust Fake News Detection: A Verification and Reimplementation Study
by Soveatin Kuntur, Maciej Krzywda, Anna Wróblewska, Marcin Paprzycki and Maria Ganzha
Electronics 2024, 13(23), 4784; https://doi.org/10.3390/electronics13234784 - 4 Dec 2024
Cited by 1 | Viewed by 2201
Abstract
This study compares Transformer-based models and Graph Neural Networks (GNNs) for fake news detection across three datasets: FakeNewsNet, ISOT, and WELFake. Transformer models (BERT, RoBERTa, GPT-2) demonstrated superior performance, achieving mean accuracies above 85% on FakeNewsNet and exceeding 98% on ISOT and WELFake. Specifically, RoBERTa achieved 86.16% accuracy on FakeNewsNet and 99.99% on ISOT, while GPT-2 reached 99.72% on WELFake. In contrast, GNNs (GCN, GraphSAGE, GIN, GAT) exhibited lower performance. GCN achieved 71% accuracy on FakeNewsNet but dropped to 53.30% on ISOT and 50.28% on WELFake, with F1 scores reflecting similar trends. Other GNNs, like GraphSAGE, showed even lower results, particularly on ISOT and WELFake, where performance hovered around 50%. Our findings indicate that while Transformers provide exceptional accuracy and reliability, GNNs offer potential efficiency benefits for resource-constrained scenarios despite their lower predictive performance. This study informs model selection for fake news detection tasks and encourages the exploration of hybrid approaches to balance accuracy and computational efficiency.
(This article belongs to the Special Issue Data Mining Applied in Natural Language Processing)

17 pages, 2057 KiB  
Article
Fake Review Detection Model Based on Comment Content and Review Behavior
by Pengfei Sun, Weihong Bi, Yifan Zhang, Qiuyu Wang, Feifei Kou, Tongwei Lu and Jinpeng Chen
Electronics 2024, 13(21), 4322; https://doi.org/10.3390/electronics13214322 - 4 Nov 2024
Viewed by 1840
Abstract
With the development of the Internet, services such as catering, beauty, accommodation, and entertainment can be reserved or consumed online. Consumers therefore increasingly rely on online information to choose merchants, products, and services, with reviews becoming a crucial factor in their decision making. However, the authenticity of reviews is highly contested in Internet-based life-service consumption. In recent years, due to the rapid growth of these industries, the detection of fake reviews has gained increasing attention. Fake reviews seriously mislead customers and damage the authenticity of online reviews. Various fake review classifiers have been developed that take into account the content of the reviews and the behavior involved in the reviews, such as rating and time. However, no previous research has considered the credibility of reviewers and merchants when identifying fake reviews. To improve the accuracy of existing fake review classification and detection methods, this study uses a comment text processing module to model the content of reviews, uses a reviewer behavior processing module and a reviewed merchant behavior processing module to model the review behavior sequences that imply reviewer credibility and merchant credibility, respectively, and finally merges these features for fake review classification. The experimental results show that, compared to other models, the model proposed in this paper improves the classification performance by simultaneously modeling the content of reviews and the credibility of reviewers and merchants.
(This article belongs to the Special Issue Data Mining Applied in Natural Language Processing)
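
The three-branch fusion described in this abstract can be sketched in a few lines of plain Python. Everything below is a hypothetical illustration, not the authors' code: the hashed bag-of-words text encoder, the rating-statistics behavior encoder, and the linear scorer are stand-ins for the paper's learned modules.

```python
def encode_text(tokens, dim=8):
    """Toy hashed bag-of-words vector standing in for the comment text module."""
    vec = [0.0] * dim
    for tok in tokens:
        vec[hash(tok) % dim] += 1.0
    return vec

def encode_behavior(ratings):
    """Summary statistics over a rating sequence, a stand-in credibility feature."""
    n = len(ratings)
    mean = sum(ratings) / n
    var = sum((r - mean) ** 2 for r in ratings) / n
    return [mean, var, float(min(ratings)), float(max(ratings))]

def fuse_and_score(text_vec, reviewer_vec, merchant_vec, weights):
    """Concatenate the three feature branches and apply a linear scorer."""
    features = text_vec + reviewer_vec + merchant_vec
    return sum(w * f for w, f in zip(weights, features))
```

The key design point is the concatenation step: review content and the two credibility-bearing behavior sequences enter the classifier as one joint feature vector rather than as separate decisions.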

12 pages, 491 KiB  
Article
Quantum-Inspired Fusion for Open-Domain Question Answering
by Ruixue Duan, Xin Liu, Zhigang Ding and Yangsen Zhang
Electronics 2024, 13(20), 4135; https://doi.org/10.3390/electronics13204135 - 21 Oct 2024
Viewed by 831
Abstract
Open-domain question-answering systems need models capable of referencing multiple passages simultaneously to generate accurate answers. The Rational Fusion-in-Decoder (RFiD) model focuses on differentiating between causal relationships and spurious features by utilizing the encoders of the Fusion-in-Decoder model. However, RFiD's reliance on partial token information limits its ability to determine whether the corresponding passage is a rationale for the question, potentially leading to inappropriate answers. To address this issue, we propose a Quantum-Inspired Fusion-in-Decoder (QFiD) model. Our approach introduces a Quantum Fusion Module (QFM) that maps single-dimensional hidden states into multi-dimensional ones, enabling the model to capture more comprehensive token information. The classical mixture method from quantum information theory is then used to fuse all of this information. Based on the fused information, the model can accurately predict the relationship between the question and passage. Experimental results on two prominent ODQA datasets, Natural Questions and TriviaQA, demonstrate that QFiD outperforms strong baselines in automatic evaluations.
(This article belongs to the Special Issue Data Mining Applied in Natural Language Processing)
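
The "classical mixture" that this abstract borrows from quantum information theory has a compact form: each vector is lifted to a density matrix |v⟩⟨v| and the collection is fused as the probability-weighted sum ρ = Σᵢ pᵢ|vᵢ⟩⟨vᵢ|. The sketch below shows only this standard construction, under the assumption that the paper applies it to token hidden states; it is not the authors' implementation.

```python
import math

def density_matrix(vec):
    """Outer product |v><v| of a unit-normalized vector."""
    norm = math.sqrt(sum(x * x for x in vec))
    v = [x / norm for x in vec]
    return [[a * b for b in v] for a in v]

def classical_mixture(vectors, probs):
    """rho = sum_i p_i |v_i><v_i|; trace(rho) == 1 when probs sum to 1."""
    d = len(vectors[0])
    rho = [[0.0] * d for _ in range(d)]
    for vec, p in zip(vectors, probs):
        dm = density_matrix(vec)
        for i in range(d):
            for j in range(d):
                rho[i][j] += p * dm[i][j]
    return rho
```

Because each |vᵢ⟩⟨vᵢ| is rank one with unit trace, the mixture is a valid density matrix, so it aggregates all token directions instead of collapsing them to a single vector.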

13 pages, 933 KiB  
Article
Dynamic Assessment-Based Curriculum Learning Method for Chinese Grammatical Error Correction
by Ruixue Duan, Zhiyuan Ma, Yangsen Zhang, Zhigang Ding and Xiulei Liu
Electronics 2024, 13(20), 4079; https://doi.org/10.3390/electronics13204079 - 17 Oct 2024
Viewed by 1065
Abstract
Current mainstream Chinese grammatical error correction methods rely on deep neural network models, which require a large amount of high-quality data for training. However, existing Chinese grammatical error correction corpora have low annotation quality and high noise levels, leading to a low generalization ability of the models and difficulty in handling complex sentences. To address this issue, this paper proposes a dynamic assessment-based curriculum learning method for Chinese grammatical error correction. The proposed approach focuses on two key components: defining the difficulty of training samples and devising an effective training strategy. In the difficulty assessment phase, we enhance the accuracy of the curriculum sequence by dynamically updating the evaluation model. During the training strategy phase, a multi-stage dynamic progressive approach is employed to select training samples of varying difficulty levels, which helps prevent the model from prematurely converging to local optima and enhances the overall training effectiveness. Experimental results on the MuCGEC and NLPCC 2018 Chinese grammatical error correction datasets show that the proposed curriculum learning method significantly improves the model’s error correction performance, with F0.5 scores increasing by 0.9 and 1.05, respectively, validating the method’s effectiveness.
(This article belongs to the Special Issue Data Mining Applied in Natural Language Processing)
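
The "multi-stage progressive" training strategy this abstract describes follows a common curriculum-learning pattern, sketched below under the assumption that each stage admits a progressively larger, easier-first fraction of the data. The function and its schedule are hypothetical illustrations, not the paper's exact algorithm, which additionally re-scores difficulty with a dynamically updated evaluation model.

```python
import random

def curriculum_stages(samples, difficulty, n_stages=3, seed=0):
    """Sort samples by a difficulty score and reveal them progressively:
    stage k trains on the easiest k/n_stages fraction, shuffled in-stage."""
    rng = random.Random(seed)
    ordered = sorted(samples, key=difficulty)
    stages = []
    for k in range(1, n_stages + 1):
        pool = ordered[: max(1, len(ordered) * k // n_stages)]
        batch = list(pool)   # copy so the sorted order is preserved
        rng.shuffle(batch)
        stages.append(batch)
    return stages
```

Early stages expose the model only to clean, easy samples, which is what discourages premature convergence to local optima on noisy corpora.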

17 pages, 1076 KiB  
Article
Prompt-Based End-to-End Cross-Domain Dialogue State Tracking
by Hengtong Lu, Lucen Zhong, Huixing Jiang, Wei Chen, Caixia Yuan and Xiaojie Wang
Electronics 2024, 13(18), 3587; https://doi.org/10.3390/electronics13183587 - 10 Sep 2024
Viewed by 753
Abstract
Cross-domain dialogue state tracking (DST) focuses on using labeled data from source domains to train a DST model for target domains. It is of great significance for transferring a dialogue system into new domains. Most existing cross-domain DST models track each slot independently, which leads to poor performance because the correlation among different slots is not considered, as well as low efficiency of training and inference. This paper therefore proposes a prompt-based end-to-end cross-domain DST method for efficiently tracking all slots simultaneously. A dynamic prompt template shuffle method is proposed to alleviate the bias of the slot order, and a dynamic prompt template sampling method is proposed to alleviate the bias of the slot number. The experimental results on the MultiWOZ 2.0 and MultiWOZ 2.1 datasets show that our approach consistently outperforms the state-of-the-art baselines in all target domains and improves both training and inference efficiency by at least 5 times.
(This article belongs to the Special Issue Data Mining Applied in Natural Language Processing)

18 pages, 1584 KiB  
Article
Automatic Generation and Evaluation of French-Style Chinese Modern Poetry
by Li Zuo, Dengke Zhang, Yuhai Zhao and Guoren Wang
Electronics 2024, 13(13), 2659; https://doi.org/10.3390/electronics13132659 - 6 Jul 2024
Viewed by 1203
Abstract
Literature, including poetry, has a strong cultural imprint and regional color, and natural language itself is part of a poem's style. It is interesting to attempt to use one language to present poetry in another language's style. In this study, we therefore propose a method to fine-tune a pre-trained model in a targeted manner to automatically generate French-style modern Chinese poetry and conduct a multi-faceted evaluation of the generated results. On a five-point scale based on human evaluation, judges assigned scores between 3.29 and 3.93 in seven dimensions, which reached 80.8–93.6% of the scores of the Chinese versions of real French poetry in these dimensions. In terms of high-frequency poetic imagery, the consistency of the top 30–50 high-frequency poetic images between the poetry generated by the fine-tuned model and the French poetry reached 50–60%. In terms of syntactic features, compared with the poems generated by the baseline model, the distribution frequencies of three special types of words that appear relatively frequently in French poetry increased by 12.95%, 15.81%, and 284.44% per 1000 Chinese characters in the poetry generated by the fine-tuned model. The human evaluation, poetic image distribution, and syntactic feature statistics show that the targeted fine-tuned model helps transfer language style: it can successfully generate modern Chinese poetry in a French style.
(This article belongs to the Special Issue Data Mining Applied in Natural Language Processing)

20 pages, 728 KiB  
Article
Semantic Augmentation in Chinese Adversarial Corpus for Discourse Relation Recognition Based on Internal Semantic Elements
by Zheng Hua, Ruixia Yang, Yanbin Feng and Xiaojun Yin
Electronics 2024, 13(10), 1944; https://doi.org/10.3390/electronics13101944 - 15 May 2024
Viewed by 1305
Abstract
This paper proposes incorporating linguistic semantic information into discourse relation recognition and constructing a Semantic Augmented Chinese Discourse Corpus (SACA) comprising 9546 adversative complex sentences. For adversative complex sentences, we suggest a quadruple (P, Q, R, Qβ) representing the internal semantic elements, where the semantic opposition between Q and Qβ forms the basis of the adversative relationship, P denotes the premise, and R represents the adversative reason. The overall annotation approach of this corpus follows the Penn Discourse Treebank (PDTB), except for the classification of senses, for which we combined insights from the Chinese Discourse Treebank (CDTB) and obtained eight sense categories for Chinese adversative complex sentences. Based on this corpus, we explore the relationship between sense classification and internal semantic elements within our newly proposed Chinese Adversative Discourse Relation Recognition (CADRR) task. Leveraging deep learning techniques, we constructed various classification models, including one that utilizes internal semantic element features, demonstrating their effectiveness and the applicability of our SACA corpus. Compared with pre-trained models, our model incorporates internal semantic element information to achieve state-of-the-art performance.
(This article belongs to the Special Issue Data Mining Applied in Natural Language Processing)
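
The (P, Q, R, Qβ) annotation scheme amounts to a small record per sentence. The dataclass below is a hypothetical illustration of that structure, with an invented English example; the corpus itself annotates Chinese sentences.

```python
from dataclasses import dataclass

@dataclass
class AdversativeQuadruple:
    """One annotated adversative complex sentence, following the
    quadruple (P, Q, R, Q_beta) described above: the semantic opposition
    between Q and Q_beta carries the adversative relation."""
    premise: str       # P: the premise
    clause: str        # Q: the stated clause
    reason: str        # R: the adversative reason
    expectation: str   # Q_beta: the implicit expectation that Q opposes
```

For instance, in "He studied all night, but he failed the exam because the questions were unusually hard", P is the studying, Q is the failure, R is the hard questions, and Qβ is the implicit expectation that he would pass.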

20 pages, 2744 KiB  
Article
CogCol: Code Graph-Based Contrastive Learning Model for Code Summarization
by Yucen Shi, Ying Yin, Mingqian Yu and Liangyu Chu
Electronics 2024, 13(10), 1816; https://doi.org/10.3390/electronics13101816 - 8 May 2024
Viewed by 1346
Abstract
Summarizing source code in natural language aims to help developers better understand existing code, making software development more efficient. Since source code is highly structured, recent research uses code structure information such as the Abstract Syntax Tree (AST) to enhance structural understanding rather than treating summarization as a plain translation task. However, an AST can only represent the syntactic relationships within a code snippet; it cannot reflect high-level relationships such as the control and data dependencies in the program dependence graph. Moreover, prior work treats the AST as the unique structural representation of a code snippet corresponding to one summarization, so models are easily affected by simple perturbations because they lack an understanding of code with similar structure. To handle these problems, we build CogCol, a Code graph-based Contrastive learning model. CogCol is a Transformer-based model that converts code graphs into unique sequences to enhance the model's structure learning. In detail, CogCol uses supervised contrastive learning, building several kinds of code graphs as positive samples to enhance the structural representation of code snippets and the model's generalizability. Moreover, experiments on a widely used open-source dataset show that CogCol significantly improves over state-of-the-art code summarization models under METEOR, BLEU, and ROUGE.
(This article belongs to the Special Issue Data Mining Applied in Natural Language Processing)
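
The supervised contrastive objective this abstract relies on has a standard form: each anchor is pulled toward all same-label samples (here, different graph views of the same snippet) and pushed away from the rest. The pure-Python sketch below shows that standard loss, not CogCol's implementation; the temperature and the dot-product similarity are conventional choices, assumed rather than taken from the paper.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def supcon_loss(embeddings, labels, tau=0.5):
    """Supervised contrastive loss: for each anchor i, average
    -log( exp(z_i.z_p / tau) / sum_{a != i} exp(z_i.z_a / tau) )
    over its same-label positives p."""
    total, count = 0.0, 0
    n = len(embeddings)
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # anchors without positives contribute nothing
        denom = sum(math.exp(dot(embeddings[i], embeddings[a]) / tau)
                    for a in range(n) if a != i)
        loss_i = sum(-math.log(math.exp(dot(embeddings[i], embeddings[p]) / tau)
                               / denom)
                     for p in positives)
        total += loss_i / len(positives)
        count += 1
    return total / count
```

The loss is minimized when views of the same snippet cluster together, which is exactly the perturbation robustness the paper seeks from its multiple code graphs.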
