Advances in Text Mining Techniques and Applications for Knowledge Discovery

A special issue of Future Internet (ISSN 1999-5903). This special issue belongs to the section "Big Data and Augmented Intelligence".

Deadline for manuscript submissions: closed (31 March 2024) | Viewed by 15472

Special Issue Editors


Guest Editor
1. Department of Economics and International Relations–DERI, Faculty of Economics–FCE, Universidade Federal do Rio Grande do Sul–UFRGS, Porto Alegre 90040-000, Brazil
2. Interdisciplinary Center for Studies and Research in Agribusiness–CEPAN, Universidade Federal do Rio Grande do Sul–UFRGS, Porto Alegre 90040-060, Brazil
Interests: bioeconomics; bioeconomy; sustainability; agribusiness; agriculture; food systems; text mining

Guest Editor
1. Department of Economics and International Relations–DERI, Faculty of Economics–FCE, Universidade Federal do Rio Grande do Sul–UFRGS, Porto Alegre 90040-000, Brazil
2. Interdisciplinary Center for Studies and Research in Agribusiness–CEPAN, Universidade Federal do Rio Grande do Sul–UFRGS, Porto Alegre 90040-060, Brazil
Interests: agribusiness; sustainability; finance; decision making; entrepreneurship and innovation; blockchain; circular bioeconomy; systematic review; bibliometrics; scientometrics

Special Issue Information

Dear Colleagues,

Twenty years ago, it was estimated that 80% of information was transmitted in text format. Given the value of the information contained in text, text mining techniques were developed to process large volumes of writing and extract valuable knowledge for decision makers. With advances in information and communication technologies, the amount of information in textual form has likely increased considerably since then. In addition, social media and content platforms have become powerful channels for transmitting information in text, images, video, and audio, making digital knowledge vast and accessible but relatively diffuse.

Organizing digital information and extracting valuable knowledge requires appropriate techniques. Text mining techniques for knowledge discovery are therefore essential for processing and systematizing the enormous amount of information available in the literature, social media, image files, and video and audio records. By analyzing natural and technical language, it is possible to extract the context and meaning of information from a textual database about a particular phenomenon or situation. Text mining draws on data mining techniques and metrics, machine learning, neural networks, and computational linguistics, among others, combined with knowledge of ontology, semantics, and linguistics. It can be used either as a research method or as an object of study in itself.

This Special Issue seeks original, unpublished articles that address recent advances in text mining techniques as well as their applications. Authors are invited to submit manuscripts addressing the development of new text mining techniques, such as algorithms, software, computational routines, metrics, and others, that enable the processing of information in text, image, video, and audio formats. Applications of text mining techniques in different contexts, showing their potential and practical relevance for advancing science, knowledge discovery, and supporting decision making, are also within the scope of this Special Issue. Studies that show the historical evolution of techniques and applications, with an emphasis on the state of the art and future perspectives, are also welcome. Technical papers, reviews, surveys, and case studies are encouraged. Topics of interest include but are not limited to the following:

  • Development of software for text mining;
  • Development of independent or shared routines, algorithms, or programming resources (R, VOSviewer, Gephi, Pajek, Python, SAS, WordStat, SPSS, and others);
  • Applications of knowledge discovery in text in real-life cases (journalism, advertisement, merchandising, marketing, social media, policy and politics, sociology, environment, agricultural sciences, biology, medicine, psychology, information science, management, engineering, technology, etc.);
  • Text mining techniques and applications in knowledge discovery in image-to-text, video-to-text, and audio-to-text;
  • Natural language processing (NLP);
  • Content analysis automation;
  • Emotion and sentiment analysis;
  • Machine learning and learning algorithms in continuous text mining processes;
  • Artificial intelligence and computational linguistics;
  • Big data, data mining, and text mining;
  • Neural networks;
  • Future trends in the development of techniques and applications in text mining;
  • Chatbots and automatic question answering;
  • Information retrieval and extraction;
  • Ontologies and knowledge representation.

You may choose our Joint Special Issue in Data.

Dr. Edson Talamini
Dr. Letícia De Oliveira
Dr. Filipe Portela
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Future Internet is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • text mining
  • knowledge discovery in text
  • content analysis
  • text analysis
  • big data
  • artificial intelligence
  • information retrieval
  • audio-to-text mining
  • video-to-text mining
  • image-to-text mining
  • algorithms
  • software
  • applications

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (6 papers)

Research

15 pages, 3855 KiB  
Article
Advanced Techniques for Geospatial Referencing in Online Media Repositories
by Dominik Warch, Patrick Stellbauer and Pascal Neis
Future Internet 2024, 16(3), 87; https://doi.org/10.3390/fi16030087 - 1 Mar 2024
Cited by 1 | Viewed by 1704
Abstract
In the digital transformation era, video media libraries’ untapped potential is immense, restricted primarily by their non-machine-readable nature and basic search functionalities limited to standard metadata. This study presents a novel multimodal methodology that utilizes advances in artificial intelligence, including neural networks, computer vision, and natural language processing, to extract and geocode geospatial references from videos. Leveraging the geospatial information from videos enables semantic searches, enhances search relevance, and allows for targeted advertising, particularly on mobile platforms. The methodology involves a comprehensive process, including data acquisition from ARD Mediathek, image and text analysis using advanced machine learning models, and audio and subtitle processing with state-of-the-art linguistic models. Despite challenges like model interpretability and the complexity of geospatial data extraction, this study’s findings indicate significant potential for advancing the precision of spatial data analysis within video content, promising to enrich media libraries with more navigable, contextually rich content. This advancement has implications for user engagement, targeted services, and broader urban planning and cultural heritage applications.

36 pages, 1052 KiB  
Article
A Structured Narrative Prompt for Prompting Narratives from Large Language Models: Sentiment Assessment of ChatGPT-Generated Narratives and Real Tweets
by Christopher J. Lynch, Erik J. Jensen, Virginia Zamponi, Kevin O’Brien, Erika Frydenlund and Ross Gore
Future Internet 2023, 15(12), 375; https://doi.org/10.3390/fi15120375 - 23 Nov 2023
Cited by 7 | Viewed by 4157
Abstract
Large language models (LLMs) excel in providing natural language responses that sound authoritative, reflect knowledge of the context area, and can present from a range of varied perspectives. Agent-based models and simulations consist of simulated agents that interact within a simulated environment to explore societal, social, and ethical, among other, problems. Simulated agents generate large volumes of data and discerning useful and relevant content is an onerous task. LLMs can help in communicating agents’ perspectives on key life events by providing natural language narratives. However, these narratives should be factual, transparent, and reproducible. Therefore, we present a structured narrative prompt for sending queries to LLMs, we experiment with the narrative generation process using OpenAI’s ChatGPT, and we assess statistically significant differences across 11 Positive and Negative Affect Schedule (PANAS) sentiment levels between the generated narratives and real tweets using chi-squared tests and Fisher’s exact tests. The narrative prompt structure effectively yields narratives with the desired components from ChatGPT. In four out of forty-four categories, ChatGPT generated narratives which have sentiment scores that were not discernibly different, in terms of statistical significance (alpha level α=0.05), from the sentiment expressed in real tweets. Three outcomes are provided: (1) a list of benefits and challenges for LLMs in narrative generation; (2) a structured prompt for requesting narratives of an LLM chatbot based on simulated agents’ information; (3) an assessment of statistical significance in the sentiment prevalence of the generated narratives compared to real tweets. This indicates significant promise in the utilization of LLMs for helping to connect a simulated agent’s experiences with real people.
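
As a rough illustration of the kind of test named in this abstract, the sketch below runs a Pearson chi-squared test on a 2x2 table comparing how often one sentiment category appears in generated narratives and in real tweets. It is not the authors' code: the function name `chi_squared_2x2` and all counts are hypothetical, no continuity correction is applied, and the p-value formula is valid only for one degree of freedom.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for [[a, b], [c, d]].

    Rows are the two text sources; columns are "sentiment present" and
    "sentiment absent". Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, the chi-squared survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: 30 of 100 generated narratives and 45 of 100 real tweets
# express the sentiment of interest.
statistic, p_value = chi_squared_2x2(30, 70, 45, 55)
print(f"chi2 = {statistic:.3f}, p = {p_value:.4f}, differs at 0.05: {p_value < 0.05}")
```

For small expected counts, Fisher's exact test (also named in the abstract) is the usual alternative; it is not implemented here.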

19 pages, 786 KiB  
Article
Sentiment Analysis of Chinese Product Reviews Based on Fusion of DUAL-Channel BiLSTM and Self-Attention
by Ye Yuan, Wang Wang, Guangze Wen, Zikun Zheng and Zhemin Zhuang
Future Internet 2023, 15(11), 364; https://doi.org/10.3390/fi15110364 - 10 Nov 2023
Cited by 1 | Viewed by 2286
Abstract
Product reviews provide crucial information for both consumers and businesses, offering insights needed before purchasing a product or service. However, existing sentiment analysis methods, especially for the Chinese language, struggle to effectively capture contextual information due to the complex semantics, multiple sentiment polarities, and long-term dependencies between words. In this paper, we propose a sentiment classification method based on the BiLSTM algorithm to address these challenges in natural language processing. Self-Attention-CNN BiLSTM (SAC-BiLSTM) leverages dual channels to extract features from both character-level embeddings and word-level embeddings. It combines BiLSTM and Self-Attention mechanisms for feature extraction and weight allocation, aiming to overcome the limitations in mining contextual information. Experiments were conducted on the onlineshopping10cats dataset, which is a standard corpus of e-commerce shopping reviews available in the ChineseNlpCorpus 2018. The experimental results demonstrate the effectiveness of our proposed algorithm, with Recall, Precision, and F1 scores reaching 0.9409, 0.9369, and 0.9404, respectively.
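
Recall, Precision, and F1 are standard classification metrics. The minimal sketch below shows how they are computed for a single class from confusion counts. The function name and the counts are hypothetical and are not taken from the paper, whose scores may have been averaged over classes in a way the abstract does not specify.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Return (precision, recall, f1) for one class from confusion counts."""
    if true_pos + false_pos == 0 or true_pos + false_neg == 0:
        raise ValueError("precision or recall is undefined for these counts")
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for the "positive review" class.
precision, recall, f1 = precision_recall_f1(true_pos=90, false_pos=10, false_neg=30)
print(f"precision = {precision:.4f}, recall = {recall:.4f}, F1 = {f1:.4f}")
```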

46 pages, 17840 KiB  
Communication
A Comprehensive Analysis and Investigation of the Public Discourse on Twitter about Exoskeletons from 2017 to 2023
by Nirmalya Thakur, Kesha A. Patel, Audrey Poon, Rishika Shah, Nazif Azizi and Changhee Han
Future Internet 2023, 15(10), 346; https://doi.org/10.3390/fi15100346 - 22 Oct 2023
Cited by 1 | Viewed by 2833
Abstract
Exoskeletons have emerged as a vital technology in the last decade and a half, with diverse use cases in different domains. Even though several works related to the analysis of Tweets about emerging technologies exist, none of those works have focused on the analysis of Tweets about exoskeletons. The work of this paper aims to address this research gap by presenting multiple novel findings from a comprehensive analysis of about 150,000 Tweets about exoskeletons posted between May 2017 and May 2023. First, findings from temporal analysis of these Tweets reveal the specific months per year when a significantly higher volume of Tweets was posted and the time windows when the highest number of Tweets, the lowest number of Tweets, Tweets with the highest number of hashtags, and Tweets with the highest number of user mentions were posted. Second, the paper shows that there are statistically significant correlations between the number of Tweets posted per hour and the different characteristics of these Tweets. Third, the paper presents a multiple linear regression model to predict the number of Tweets posted per hour in terms of these characteristics of Tweets. The R² score of this model was observed to be 0.9540. Fourth, the paper reports that the 10 most popular hashtags were #exoskeleton, #robotics, #iot, #technology, #tech, #innovation, #ai, #sci, #construction and #news. Fifth, sentiment analysis of these Tweets was performed, and the results show that the percentages of positive, neutral, and negative Tweets were 46.8%, 33.1%, and 20.1%, respectively. To add to this, in the Tweets that did not express a neutral sentiment, the sentiment of surprise was the most common sentiment. It was followed by sentiments of joy, disgust, sadness, fear, and anger, respectively. Furthermore, hashtag-specific sentiment analysis revealed several novel insights. For instance, for almost all the months in 2022, the usage of #ai in Tweets about exoskeletons was mainly associated with a positive sentiment. Sixth, lexicon-based approaches were used to detect possibly sarcastic Tweets and Tweets that contained news, and the results are presented. Finally, a comparison of positive Tweets, negative Tweets, neutral Tweets, possibly sarcastic Tweets, and Tweets that contained news is presented in terms of the different characteristic properties of these Tweets. The findings reveal multiple novel insights related to the similarities, variations, and trends of character count, hashtag usage, and user mentions in such Tweets during this time range.
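
The regression fit reported in this abstract is summarized by the coefficient of determination. The sketch below shows how such a score is computed from observed and predicted values, using the standard definition of one minus the ratio of residual to total sum of squares. The function name and the hourly Tweet counts are hypothetical and do not come from the paper.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_observed = sum(observed) / len(observed)
    ss_tot = sum((y - mean_observed) ** 2 for y in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; the score is undefined")
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    return 1 - ss_res / ss_tot


# Hypothetical Tweets-per-hour counts and the values a fitted model predicted.
observed_counts = [10, 20, 30, 40]
predicted_counts = [12, 18, 33, 37]
print(f"R^2 = {r_squared(observed_counts, predicted_counts):.4f}")
```

A score close to 1 means the model explains most of the variance in the observed counts; the score alone does not show whether the model generalizes to unseen data.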

15 pages, 1932 KiB  
Article
MSEN: A Multi-Scale Evolutionary Network for Modeling the Evolution of Temporal Knowledge Graphs
by Yong Yu, Shudong Chen, Rong Du, Da Tong, Hao Xu and Shuai Chen
Future Internet 2023, 15(10), 327; https://doi.org/10.3390/fi15100327 - 30 Sep 2023
Cited by 1 | Viewed by 1671
Abstract
Temporal knowledge graphs play an increasingly prominent role in scenarios such as social networks, finance, and smart cities. As such, research on temporal knowledge graphs continues to deepen. In particular, research on temporal knowledge graph reasoning holds great significance, as it can provide abundant knowledge for downstream tasks such as question answering and recommendation systems. Current reasoning research focuses primarily on interpolation and extrapolation. Extrapolation research aims to predict the likelihood of events occurring in future timestamps. Historical events are crucial for predicting future events. However, existing models struggle to fully capture the evolutionary characteristics of historical knowledge graphs. This paper proposes a multi-scale evolutionary network (MSEN) model that leverages Hierarchical Transfer aware Graph Neural Network (HT-GNN) in a local memory encoder to aggregate rich structural semantics from each timestamp’s knowledge graph. It also utilizes Time Related Graph Neural Network (TR-GNN) in a global memory encoder to model temporal-semantic dependencies of entities across the global knowledge graph, mining global evolutionary patterns. The model integrates information from both encoders to generate entity embeddings for predicting future events. The proposed MSEN model demonstrates strong performance compared to several baselines on typical benchmark datasets. Results show MSEN achieves the highest prediction accuracy.

13 pages, 534 KiB  
Article
Temporal-Guided Knowledge Graph-Enhanced Graph Convolutional Network for Personalized Movie Recommendation Systems
by Chin-Yi Chen and Jih-Jeng Huang
Future Internet 2023, 15(10), 323; https://doi.org/10.3390/fi15100323 - 28 Sep 2023
Cited by 3 | Viewed by 1745
Abstract
Traditional movie recommendation systems are increasingly falling short in the contemporary landscape of abundant information and evolving user behaviors. This study introduced the temporal knowledge graph recommender system (TKGRS), a ground-breaking algorithm that addresses the limitations of existing models. TKGRS uniquely integrates graph convolutional networks (GCNs), matrix factorization, and temporal decay factors to offer a robust and dynamic recommendation mechanism. The algorithm’s architecture comprises an initial embedding layer for identifying the user and item, followed by a GCN layer for a nuanced understanding of the relationships and fully connected layers for prediction. A temporal decay factor is also used to give weightage to recent user–item interactions. Empirical validation using the MovieLens 100K, 1M, and Douban datasets showed that TKGRS outperformed the state-of-the-art models according to the evaluation metrics, i.e., RMSE and MAE. This innovative approach sets a new standard in movie recommendation systems and opens avenues for future research in advanced graph algorithms and machine learning techniques.
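
The abstract does not state the functional form of the temporal decay factor. The sketch below therefore assumes a simple exponential decay, purely for illustration, and pairs it with the two evaluation metrics named above (RMSE and MAE). All function names, the decay rate, and the ratings are hypothetical; this is not the TKGRS implementation.

```python
import math


def decay_weight(age_days, decay_rate=0.01):
    """Weight in (0, 1] that shrinks as an interaction ages (assumed exponential form)."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    return math.exp(-decay_rate * age_days)


def rmse(actual, predicted):
    """Root mean squared error between two equal-length rating sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error between two equal-length rating sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical interaction ages in days: recent ratings get more weight.
for age in (0, 30, 365):
    print(f"age {age:>3} days -> weight {decay_weight(age):.3f}")

# Hypothetical true and predicted ratings on a 1-5 scale.
true_ratings = [4.0, 3.0, 5.0, 2.0]
predicted_ratings = [3.5, 3.0, 4.0, 2.5]
print(f"RMSE = {rmse(true_ratings, predicted_ratings):.4f}")
print(f"MAE  = {mae(true_ratings, predicted_ratings):.4f}")
```

RMSE penalizes large errors more heavily than MAE, which is why recommender evaluations often report both.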
