Article

A Framework Model of Mining Potential Public Opinion Events Pertaining to Suspected Research Integrity Issues with the Text Convolutional Neural Network Model and a Mixed Event Extractor

School of Management, Shanghai University, Shanghai 200444, China
* Author to whom correspondence should be addressed.
Information 2024, 15(6), 303; https://doi.org/10.3390/info15060303
Submission received: 5 May 2024 / Revised: 19 May 2024 / Accepted: 21 May 2024 / Published: 24 May 2024

Abstract

With the development of the Internet, oversight of research integrity has extended beyond the scientific community to society as a whole. If suspected integrity issues are not addressed promptly, they can significantly damage the research credibility of both institutions and scholars. This article proposes a SMOTE-based text convolutional neural network (TextCNN) that identifies, among common short texts, those describing potential public opinion events related to suspected scientific integrity issues. The SMOTE comprehensive sampling technique is employed to handle the imbalanced dataset. To mitigate the impact of short text length on the quality of text representation, the Doc2vec embedding model is used to represent each short text as a one-dimensional dense vector. Additionally, the dimensions of the TextCNN input layer and convolution kernel are adjusted. Subsequently, a short text event extraction model based on TF-IDF and TextRank is proposed to extract crucial information from events, such as names and research-related institutions, and to facilitate the identification of potential public opinion events related to suspected scientific integrity issues. Experimental results demonstrate that balancing the dataset with SMOTE improves the classification results of TextCNN classifiers. Compared to traditional classifiers, TextCNN is more robust on imbalanced datasets. However, low information content, non-standard writing, and polysemy in short texts may reduce the accuracy of event extraction. The framework can be further optimized to address these issues in the future.
Keywords: research integrity; TextCNN; SMOTE; event mining
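
The oversampling step mentioned in the abstract can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. This is not the authors' implementation. The function name `smote`, its parameters, and the toy two-dimensional vectors are assumptions made for illustration; in the framework described above, the inputs would be Doc2vec vectors, and a maintained library such as imbalanced-learn would normally be used in practice.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    minority: list of equal-length feature vectors (lists of floats).
    This is a hypothetical helper for illustration, not a library API.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Pick one of the k nearest minority neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy stand-ins for minority-class text vectors (assumed values).
minority_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority_vectors, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay within the region already covered by the minority class, which is why the technique balances the class counts without simply duplicating rows.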

Share and Cite

MDPI and ACS Style

Zou, Z.; Ji, X.; Li, Y. A Framework Model of Mining Potential Public Opinion Events Pertaining to Suspected Research Integrity Issues with the Text Convolutional Neural Network Model and a Mixed Event Extractor. Information 2024, 15, 303. https://doi.org/10.3390/info15060303

AMA Style

Zou Z, Ji X, Li Y. A Framework Model of Mining Potential Public Opinion Events Pertaining to Suspected Research Integrity Issues with the Text Convolutional Neural Network Model and a Mixed Event Extractor. Information. 2024; 15(6):303. https://doi.org/10.3390/info15060303

Chicago/Turabian Style

Zou, Zongfeng, Xiaochen Ji, and Yingying Li. 2024. "A Framework Model of Mining Potential Public Opinion Events Pertaining to Suspected Research Integrity Issues with the Text Convolutional Neural Network Model and a Mixed Event Extractor" Information 15, no. 6: 303. https://doi.org/10.3390/info15060303

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
