Special Issue "Natural Language Processing: Emerging Neural Approaches and Applications"

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (31 July 2020).

Special Issue Editors

Dr. Massimo Esposito
E-Mail Website
Guest Editor
Institute for High Performance Computing and Networking–National Research Council of Italy (ICAR-CNR), Naples, Italy
Interests: artificial intelligence; natural language processing; decision support systems; cognitive systems; knowledge-based technologies
Special Issues and Collections in MDPI journals
Dr. Giovanni Luca Masala
E-Mail Website
Guest Editor
Department of Computing and Mathematics, Manchester Metropolitan University (MMU), Manchester, UK
Interests: artificial intelligence; natural language processing; cognitive systems; machine vision; cloud computing
Special Issues and Collections in MDPI journals
Dr. Aniello Minutolo
E-Mail Website
Guest Editor
Institute for High Performance Computing and Networking, National Research Council of Italy (ICAR-CNR), Naples, Italy
Interests: knowledge-based technologies; natural language processing; decision support systems; dialog systems and chatbots
Special Issues and Collections in MDPI journals
Dr. Marco Pota
E-Mail Website
Guest Editor
Institute for High Performance Computing and Networking, National Research Council of Italy (ICAR-CNR), Naples, Italy
Interests: fuzzy modeling; explainable AI; natural language processing; deep neural networks
Special Issues and Collections in MDPI journals

Special Issue Information

Dear Colleagues,

In recent years, Artificial Intelligence has led to impressive achievements on a variety of complex cognitive tasks, matching or even beating humans. In the field of natural language processing (NLP), the use of deep learning models in the last five years has allowed AI to surpass human levels on many important tasks, such as machine translation and machine reading comprehension, and reach considerable improvements in other real-world NLP applications, such as image captioning, visual question answering and conversational systems, search and information retrieval, sentiment analysis, and recommender systems.

Despite the remarkable success of deep learning in different NLP tasks, significant challenges yet remain that make natural language development and understanding among the least understood human capabilities from a cognitive perspective. Indeed, the current deep learning methods have been scaled up and improved, but, as a side effect, their complexity has grown, assuming the form of empirical engineering solutions. Moreover, they have assumed the characteristic of being extremely data-hungry and are not applicable to languages with scarce or zero datasets. Furthermore, they are not able to explain their outputs, which is relevant for using and improving natural language systems. Summarizing, current deep learning systems do not provide a human-like computational model of cognition that is able to acquire, comprehend, and generate natural language as well as ground it and perform common-sense reasoning on physical concepts, objects, and events of the external world.

This Special Issue is intended to provide an overview of the research being carried out in the area of natural language processing to face these open issues, with a particular focus on both emerging approaches for language learning, understanding, production, and grounding, interactively or autonomously from data, in cognitive and neural systems, as well as on their potential or real applications in different domains. To this end, the Special Issue aims to gather researchers with a broad expertise in various fields—natural language processing, cognitive science and psychology, Artificial Intelligence and neural networks, computational modeling and neuroscience—to discuss their cutting-edge work as well as perspectives on future directions in this exciting field. Original contributions are sought, covering the whole range of theoretical and practical aspects, technologies, and systems in this research area.

The topics of interest for this Special Issue include but are not limited to:

  • Natural language understanding, generation, and grounding;
  • Multilingual and cross-lingual distributional representations and universal language models;
  • Conversational systems/interfaces and question answering;
  • Sentiment analysis, emotion detection, and opinion mining;
  • Document analysis, information extraction, and text mining;
  • Machine translation;
  • Search and information retrieval;
  • Common-sense reasoning;
  • Computer/human interactive learning;
  • Neuroscience-inspired cognitive architectures;
  • Trustworthy and explainable artificial intelligence;
  • Cognitive and social robotics;
  • Applications in science, engineering, medicine, healthcare, finance, business, law, education, transportation, retailing, telecommunication, and multimedia.

Dr. Massimo Esposito
Dr. Giovanni Luca Masala
Dr. Aniello Minutolo
Dr. Marco Pota
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2000 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Natural language processing
  • Text analytics
  • Interactive and reinforcement learning
  • Machine/deep learning
  • Transfer learning
  • Cognitive systems

Published Papers (30 papers)

Order results
Result details
Select all
Export citation of selected articles as:

Research

Jump to: Review, Other

Article
Comparison of Deep Learning Models and Various Text Pre-Processing Techniques for the Toxic Comments Classification
Appl. Sci. 2020, 10(23), 8631; https://doi.org/10.3390/app10238631 - 02 Dec 2020
Cited by 2 | Viewed by 666
Abstract
The emergence of anti-social behaviour in online environments presents a serious issue in today’s society. Automatic detection and identification of such behaviour are becoming increasingly important. Modern machine learning and natural language processing methods can provide effective tools to detect different types of [...] Read more.
The emergence of anti-social behaviour in online environments presents a serious issue in today’s society. Automatic detection and identification of such behaviour are becoming increasingly important. Modern machine learning and natural language processing methods can provide effective tools to detect different types of anti-social behaviour from the pieces of text. In this work, we present a comparison of various deep learning models used to identify the toxic comments in the Internet discussions. Our main goal was to explore the effect of the data preparation on the model performance. As we worked with the assumption that the use of traditional pre-processing methods may lead to the loss of characteristic traits, specific for toxic content, we compared several popular deep learning and transformer language models. We aimed to analyze the influence of different pre-processing techniques and text representations including standard TF-IDF, pre-trained word embeddings and also explored currently popular transformer models. Experiments were performed on the dataset from the Kaggle Toxic Comment Classification competition, and the best performing model was compared with the similar approaches using standard metrics used in data analysis. Full article
Show Figures

Figure 1

Article
Delayed Combination of Feature Embedding in Bidirectional LSTM CRF for NER
Appl. Sci. 2020, 10(21), 7557; https://doi.org/10.3390/app10217557 - 27 Oct 2020
Viewed by 724
Abstract
Named Entity Recognition (NER) plays a vital role in natural language processing (NLP). Currently, deep neural network models have achieved significant success in NER. Recent advances in NER systems have introduced various feature selections to identify appropriate representations and handle Out-Of-the-Vocabulary (OOV) words. [...] Read more.
Named Entity Recognition (NER) plays a vital role in natural language processing (NLP). Currently, deep neural network models have achieved significant success in NER. Recent advances in NER systems have introduced various feature selections to identify appropriate representations and handle Out-Of-the-Vocabulary (OOV) words. After selecting the features, they are all concatenated at the embedding layer before being fed into a model to label the input sequences. However, when concatenating the features, information collisions may occur and this would cause the limitation or degradation of the performance. To overcome the information collisions, some works tried to directly connect some features to latter layers, which we call the delayed combination and show its effectiveness by comparing it to the early combination. As feature encodings for input, we selected the character-level Convolutional Neural Network (CNN) or Long Short-Term Memory (LSTM) word encoding, the pre-trained word embedding, and the contextual word embedding and additionally designed CNN-based sentence encoding using a dictionary. These feature encodings are combined at early or delayed position of the bidirectional LSTM Conditional Random Field (CRF) model according to each feature’s characteristics. We evaluated the performance of this model on the CoNLL 2003 and OntoNotes 5.0 datasets using the F1 score and compared the delayed combination model with our own implementation of the early combination as well as the previous works. This comparison convinces us that our delayed combination is more effective than the early one and also highly competitive. Full article
Show Figures

Figure 1

Article
A Novel Hybrid Model for Cantonese Rumor Detection on Twitter
Appl. Sci. 2020, 10(20), 7093; https://doi.org/10.3390/app10207093 - 12 Oct 2020
Viewed by 618
Abstract
The development of information technology and mobile Internet has spawned the prosperity of online social networks. As the world’s largest microblogging platform, Twitter is popular among people all over the world. However, as the number of users on Twitter increases, rumors have become [...] Read more.
The development of information technology and mobile Internet has spawned the prosperity of online social networks. As the world’s largest microblogging platform, Twitter is popular among people all over the world. However, as the number of users on Twitter increases, rumors have become a serious problem. Therefore, rumor detection is necessary since it can prevent unverified information from causing public panic and disrupting social order. Cantonese is a widely used language in China. However, to the best of our knowledge, little research has been done on Cantonese rumor detection. In this paper, we propose a novel hybrid model XGA (namely XLNet-based Bidirectional Gated Recurrent Unit (BiGRU) network with Attention mechanism) for Cantonese rumor detection on Twitter. Specifically, we take advantage of both semantic and sentiment features for detection. First of all, XLNet is employed to produce text-based and sentiment-based embeddings at the character level. Then we perform joint learning of character and word embeddings to obtain the words’ external contexts and internal structures. In addition, we leverage BiGRU and the attention mechanism to obtain important semantic features and use the Cantonese rumor dataset we constructed to train our proposed model. The experimental results show that the XGA model outperforms the other popular models in Cantonese rumor detection. The research in this paper provides methods and ideas for future work in Cantonese rumor detection on other social networking platforms. Full article
Show Figures

Figure 1

Article
Ontology Fixing by Using Software Engineering Technology
Appl. Sci. 2020, 10(18), 6328; https://doi.org/10.3390/app10186328 - 11 Sep 2020
Viewed by 636
Abstract
This paper presents OntologyFixer, a web-based tool that supports a methodology to build, assess, and improve the quality of ontology web language (OWL) ontologies. Using our software, knowledge engineers are able to fix low-quality OWL ontologies (such as those created from natural language [...] Read more.
This paper presents OntologyFixer, a web-based tool that supports a methodology to build, assess, and improve the quality of ontology web language (OWL) ontologies. Using our software, knowledge engineers are able to fix low-quality OWL ontologies (such as those created from natural language documents using ontology learning processes). The fixing process is guided by a set of metrics and fixing mechanisms provided by the tool, and executed primarily through automated changes (inspired by quick fix actions used in the software engineering domain). To evaluate the quality, the tool supports numerical and graphical quality assessments, focusing on ontology content and structure attributes. This tool follows principles, and provides features, typical of scientific software, including user parameter requests, logging, multithreading execution, and experiment repeatability, among others. OntologyFixer architecture takes advantage of model view controller (MVC), strategy, template, and factory design patterns; and decouples graphical user interfaces (GUI) from ontology quality metrics, ontology fixing, and REST (REpresentational State Transfer) API (Application Programming Interface) components (used for pitfall identification, and ontology evaluation). We also separate part of the OntologyFixer functionality into a new package called OntoMetrics, which focuses on the identification of symptoms and the evaluation of the quality of ontologies. Finally, OntologyFixer provides mechanisms to easily develop and integrate new quick fix methods. Full article
Show Figures

Figure 1

Article
A Domain-Independent Classification Model for Sentiment Analysis Using Neural Models
Appl. Sci. 2020, 10(18), 6221; https://doi.org/10.3390/app10186221 - 08 Sep 2020
Cited by 1 | Viewed by 580
Abstract
Most people nowadays depend on the Web as a primary source of information. Statistical studies show that young people obtain information mainly from Facebook, Twitter, and other social media platforms. By relying on these data, people may risk drawing the incorrect conclusions when [...] Read more.
Most people nowadays depend on the Web as a primary source of information. Statistical studies show that young people obtain information mainly from Facebook, Twitter, and other social media platforms. By relying on these data, people may risk drawing the incorrect conclusions when reading the news or planning to buy a product. Therefore, systems that can detect and classify sentiments and assist users in finding the correct information on the Web is highly needed in order to prevent Web surfers from being easily deceived. This paper proposes an intensive study regarding domain-independent classification models for sentiment analysis that should be trained only once. The study consists of two phases: the first phase is based on a deep learning model which is training a neural network model once after extracting robust features and saving the model and its parameters. The second phase is based on applying the trained model on a totally new dataset, aiming at correctly classifying reviews as positive or negative. The proposed model is trained on the IMDb dataset and then tested on three different datasets: IMDb dataset, Movie Reviews dataset, and our own dataset collected from Amazon reviews that rate users’ opinions regarding Apple products. The work shows high performance using different evaluation metrics compared to the stat-of-the-art results. Full article
Show Figures

Figure 1

Article
Incorporating Synonym for Lexical Sememe Prediction: An Attention-Based Model
Appl. Sci. 2020, 10(17), 5996; https://doi.org/10.3390/app10175996 - 29 Aug 2020
Cited by 2 | Viewed by 649
Abstract
Sememe is the smallest semantic unit for describing real-world concepts, which improves the interpretability and performance of Natural Language Processing (NLP). To maintain the accuracy of the sememe description, its knowledge base needs to be continuously updated, which is time-consuming and labor-intensive. Sememes [...] Read more.
Sememe is the smallest semantic unit for describing real-world concepts, which improves the interpretability and performance of Natural Language Processing (NLP). To maintain the accuracy of the sememe description, its knowledge base needs to be continuously updated, which is time-consuming and labor-intensive. Sememes predictions can assign sememes to unlabeled words and are valuable work for automatically building and/or updating sememeknowledge bases (KBs). Existing methods are overdependent on the quality of the word embedding vectors, it remains a challenge for accurate sememe prediction. To address this problem, this study proposes a novel model to improve the performance of sememe prediction by introducing synonyms. The model scores candidate sememes from synonyms by combining distances of words in embedding vector space and derives an attention-based strategy to dynamically balance two kinds of knowledge from synonymous word set and word embedding vector. A series of experiments are performed, and the results show that the proposed model has made a significant improvement in the sememe prediction accuracy. The model provides a methodological reference for commonsense KB updating and embedding of commonsense knowledge. Full article
Show Figures

Figure 1

Article
Zero-Shot Learning for Cross-Lingual News Sentiment Classification
Appl. Sci. 2020, 10(17), 5993; https://doi.org/10.3390/app10175993 - 29 Aug 2020
Cited by 3 | Viewed by 1179
Abstract
In this paper, we address the task of zero-shot cross-lingual news sentiment classification. Given the annotated dataset of positive, neutral, and negative news in Slovene, the aim is to develop a news classification system that assigns the sentiment category not only to Slovene [...] Read more.
In this paper, we address the task of zero-shot cross-lingual news sentiment classification. Given the annotated dataset of positive, neutral, and negative news in Slovene, the aim is to develop a news classification system that assigns the sentiment category not only to Slovene news, but to news in another language without any training data required. Our system is based on the multilingual BERTmodel, while we test different approaches for handling long documents and propose a novel technique for sentiment enrichment of the BERT model as an intermediate training step. With the proposed approach, we achieve state-of-the-art performance on the sentiment analysis task on Slovenian news. We evaluate the zero-shot cross-lingual capabilities of our system on a novel news sentiment test set in Croatian. The results show that the cross-lingual approach also largely outperforms the majority classifier, as well as all settings without sentiment enrichment in pre-training. Full article
Show Figures

Figure 1

Article
DAWE: A Double Attention-Based Word Embedding Model with Sememe Structure Information
Appl. Sci. 2020, 10(17), 5804; https://doi.org/10.3390/app10175804 - 21 Aug 2020
Viewed by 645
Abstract
Word embedding is an important reference for natural language processing tasks, which can generate distribution presentations of words based on many text data. Recent evidence demonstrates that introducing sememe knowledge is a promising strategy to improve the performance of word embedding. However, previous [...] Read more.
Word embedding is an important reference for natural language processing tasks, which can generate distribution presentations of words based on many text data. Recent evidence demonstrates that introducing sememe knowledge is a promising strategy to improve the performance of word embedding. However, previous works ignored the structure information of sememe knowledges. To fill the gap, this study implicitly synthesized the structural feature of sememes into word embedding models based on an attention mechanism. Specifically, we propose a novel double attention word-based embedding (DAWE) model that encodes the characteristics of sememes into words by a “double attention” strategy. DAWE is integrated with two specific word training models through context-aware semantic matching techniques. The experimental results show that, in word similarity task and word analogy reasoning task, the performance of word embedding can be effectively improved by synthesizing the structural information of sememe knowledge. The case study also verifies the power of DAWE model in word sense disambiguation task. Furthermore, the DAWE model is a general framework for encoding sememes into words, which can be integrated into other existing word embedding models to provide more options for various natural language processing downstream tasks. Full article
Show Figures

Figure 1

Article
Preliminary Results on Different Text Processing Tasks Using Encoder-Decoder Networks and the Causal Feature Extractor
Appl. Sci. 2020, 10(17), 5772; https://doi.org/10.3390/app10175772 - 20 Aug 2020
Viewed by 465
Abstract
Deep learning methods are gaining popularity in different application domains, and especially in natural language processing. It is commonly believed that using a large enough dataset and an adequate network architecture, almost any processing problem can be solved. A frequent and widely used [...] Read more.
Deep learning methods are gaining popularity in different application domains, and especially in natural language processing. It is commonly believed that using a large enough dataset and an adequate network architecture, almost any processing problem can be solved. A frequent and widely used typology is the encoder-decoder architecture, where the input data is transformed into an intermediate code by means of an encoder, and then a decoder takes this code to produce its output. Different types of networks can be used in the encoder and the decoder, depending on the problem of interest, such as convolutional neural networks (CNN) or long-short term memories (LSTM). This paper uses for the encoder a method recently proposed, called Causal Feature Extractor (CFE). It is based on causal convolutions (i.e., convolutions that depend only on one direction of the input), dilatation (i.e., increasing the aperture size of the convolutions) and bidirectionality (i.e., independent networks in both directions). Some preliminary results are presented on three different tasks and compared with state-of-the-art methods: bilingual translation, LaTeX decompilation and audio transcription. The proposed method achieves promising results, showing its ubiquity to work with text, audio and images. Moreover, it has a shorter training time, requiring less time per iteration, and a good use of the attention mechanisms based on attention matrices. Full article
Show Figures

Figure 1

Article
Can We Survive without Labelled Data in NLP? Transfer Learning for Open Information Extraction
Appl. Sci. 2020, 10(17), 5758; https://doi.org/10.3390/app10175758 - 20 Aug 2020
Viewed by 622
Abstract
Various tasks in natural language processing (NLP) suffer from lack of labelled training data, which deep neural networks are hungry for. In this paper, we relied upon features learned to generate relation triples from the open information extraction (OIE) task. First, we studied [...] Read more.
Various tasks in natural language processing (NLP) suffer from lack of labelled training data, which deep neural networks are hungry for. In this paper, we relied upon features learned to generate relation triples from the open information extraction (OIE) task. First, we studied how transferable these features are from one OIE domain to another, such as from a news domain to a bio-medical domain. Second, we analyzed their transferability to a semantically related NLP task, namely, relation extraction (RE). We thereby contribute to answering the question: can OIE help us achieve adequate NLP performance without labelled data? Our results showed comparable performance when using inductive transfer learning in both experiments by relying on a very small amount of the target data, wherein promising results were achieved. When transferring to the OIE bio-medical domain, we achieved an F-measure of 78.0%, only 1% lower when compared to traditional learning. Additionally, transferring to RE using an inductive approach scored an F-measure of 67.2%, which was 3.8% lower than training and testing on the same task. Hereby, our analysis shows that OIE can act as a reliable source task. Full article
Show Figures

Figure 1

Article
Best Practices of Convolutional Neural Networks for Question Classification
Appl. Sci. 2020, 10(14), 4710; https://doi.org/10.3390/app10144710 - 08 Jul 2020
Cited by 9 | Viewed by 984
Abstract
Question Classification (QC) is of primary importance in question answering systems, since it enables extraction of the correct answer type. State-of-the-art solutions for short text classification obtained remarkable results by Convolutional Neural Networks (CNNs). However, implementing such models requires choices, usually based on [...] Read more.
Question Classification (QC) is of primary importance in question answering systems, since it enables extraction of the correct answer type. State-of-the-art solutions for short text classification obtained remarkable results by Convolutional Neural Networks (CNNs). However, implementing such models requires choices, usually based on subjective experience, or on rare works comparing different settings for general text classification, while peculiar solutions should be individuated for QC task, depending on language and on dataset size. Therefore, this work aims at suggesting best practices for QC using CNNs. Different datasets were employed: (i) A multilingual set of labelled questions to evaluate the dependence of optimal settings on language; (ii) a large, widely used dataset for validation and comparison. Numerous experiments were executed, to perform a multivariate analysis, for evaluating statistical significance and influence on QC performance of all the factors (regarding text representation, architectural characteristics, and learning hyperparameters) and some of their interactions, and for finding the most appropriate strategies for QC. Results show the influence of CNN settings on performance. Optimal settings were found depending on language. Tests on different data validated the optimization performed, and confirmed the transferability of the best settings. Comparisons to configurations suggested by previous works highlight the best classification accuracy by those optimized here. These findings can suggest the best choices to configure a CNN for QC. Full article
Show Figures

Figure 1

Article
Text Normalization Using Encoder–Decoder Networks Based on the Causal Feature Extractor
Appl. Sci. 2020, 10(13), 4551; https://doi.org/10.3390/app10134551 - 30 Jun 2020
Cited by 2 | Viewed by 709
Abstract
The encoder–decoder architecture is a well-established, effective and widely used approach in many tasks of natural language processing (NLP), among other domains. It consists of two closely-collaborating components: An encoder that transforms the input into an intermediate form, and a decoder producing the [...] Read more.
The encoder–decoder architecture is a well-established, effective and widely used approach in many tasks of natural language processing (NLP), among other domains. It consists of two closely-collaborating components: An encoder that transforms the input into an intermediate form, and a decoder producing the output. This paper proposes a new method for the encoder, named Causal Feature Extractor (CFE), based on three main ideas: Causal convolutions, dilatations and bidirectionality. We apply this method to text normalization, which is a ubiquitous problem that appears as the first step of many text-to-speech (TTS) systems. Given a text with symbols, the problem consists in writing the text exactly as it should be read by the TTS system. We make use of an attention-based encoder–decoder architecture using a fine-grained character-level approach rather than the usual word-level one. The proposed CFE is compared to other common encoders, such as convolutional neural networks (CNN) and long-short term memories (LSTM). Experimental results show the feasibility of CFE, achieving better results in terms of accuracy, number of parameters, convergence time, and use of an attention mechanism based on attention matrices. The obtained accuracy ranges from 83.5% to 96.8% correctly normalized sentences, depending on the dataset. Moreover, the proposed method is generic and can be applied to different types of input such as text, audio and images. Full article
Show Figures

Figure 1

Article
A Polarity Capturing Sphere for Word to Vector Representation
Appl. Sci. 2020, 10(12), 4386; https://doi.org/10.3390/app10124386 - 26 Jun 2020
Cited by 1 | Viewed by 745
Abstract
Embedding words from a dictionary as vectors in a space has become an active research field, due to its many uses in several natural language processing applications. Distances between the vectors should reflect the relatedness between the corresponding words. The problem with existing [...] Read more.
Embedding words from a dictionary as vectors in a space has become an active research field, due to its many uses in several natural language processing applications. Distances between the vectors should reflect the relatedness between the corresponding words. The problem with existing word embedding methods is that they often fail to distinguish between synonymous, antonymous, and unrelated word pairs. Meanwhile, polarity detection is crucial for applications such as sentiment analysis. In this work we propose an embedding approach that is designed to capture the polarity issue. The approach is based on embedding the word vectors into a sphere, whereby the dot product between any vectors represents the similarity. Vectors corresponding to synonymous words would be close to each other on the sphere, while a word and its antonym would lie at opposite poles of the sphere. The approach used to design the vectors is a simple relaxation algorithm. The proposed word embedding is successful in distinguishing between synonyms, antonyms, and unrelated word pairs. It achieves results that are better than those of some of the state-of-the-art techniques and competes well with the others. Full article
Show Figures

Figure 1

Article
Improving Sentence Retrieval Using Sequence Similarity
Appl. Sci. 2020, 10(12), 4316; https://doi.org/10.3390/app10124316 - 23 Jun 2020
Cited by 1 | Viewed by 683
Abstract
Sentence retrieval is an information retrieval technique that aims to find sentences corresponding to an information need. It is used for tasks like question answering (QA) or novelty detection. Since it is similar to document retrieval but with a smaller unit of retrieval, [...] Read more.
Sentence retrieval is an information retrieval technique that aims to find sentences corresponding to an information need. It is used for tasks like question answering (QA) or novelty detection. Since it is similar to document retrieval but with a smaller unit of retrieval, methods for document retrieval are also used for sentence retrieval like term frequency—inverse document frequency (TF-IDF), BM 25 , and language modeling-based methods. The effect of partial matching of words to sentence retrieval is an issue that has not been analyzed. We think that there is a substantial potential for the improvement of sentence retrieval methods if we consider this approach. We adapted TF-ISF, BM 25 , and language modeling-based methods to test the partial matching of terms through combining sentence retrieval with sequence similarity, which allows matching of words that are similar but not identical. All tests were conducted using data from the novelty tracks of the Text Retrieval Conference (TREC). The scope of this paper was to find out if such approach is generally beneficial to sentence retrieval. However, we did not examine in depth how partial matching helps or hinders the finding of relevant sentences. Full article
Show Figures

Figure 1

Article
Paraphrase Identification with Lexical, Syntactic and Sentential Encodings
Appl. Sci. 2020, 10(12), 4144; https://doi.org/10.3390/app10124144 - 16 Jun 2020
Cited by 2 | Viewed by 844
Abstract
Paraphrase identification has been one of the major topics in Natural Language Processing (NLP). However, how to interpret a diversity of contexts such as lexical and semantic information within a sentence as relevant features is still an open problem. This paper addresses the [...] Read more.
Paraphrase identification has been one of the major topics in Natural Language Processing (NLP). However, how to interpret a diversity of contexts such as lexical and semantic information within a sentence as relevant features is still an open problem. This paper addresses the problem and presents an approach for leveraging contextual features with a neural-based learning model. Our Lexical, Syntactic, and Sentential Encodings (LSSE) learning model incorporates Relational Graph Convolutional Networks (R-GCNs) to make use of different features from local contexts, i.e., word encoding, position encoding, and full dependency structures. By utilizing the hidden states obtained by the R-GCNs as well as lexical and sentential encodings by Bidirectional Encoder Representations from Transformers (BERT), our model learns the contextual similarity between sentences effectively. The experimental results by using the two benchmark datasets, Microsoft Research Paraphrase Corpus (MRPC) and Quora Question Pairs (QQP) show that the improvement compared with the baseline, BERT sentential encodings model, was 1.7% F1-score on MRPC and 1.0% F1-score on QQP. Moreover, we verified that the combination of position encoding and syntactic features contributes to performance improvement. Full article
Show Figures

Figure 1

Article
A Rule-Based Approach to Embedding Techniques for Text Document Classification
Appl. Sci. 2020, 10(11), 4009; https://doi.org/10.3390/app10114009 - 10 Jun 2020
Cited by 2 | Viewed by 1071
Abstract
With the growth of online information and sudden expansion in the number of electronic documents provided on websites and in electronic libraries, there is difficulty in categorizing text documents. Therefore, a rule-based approach is a solution to this problem; the purpose of this [...] Read more.
With the growth of online information and sudden expansion in the number of electronic documents provided on websites and in electronic libraries, there is difficulty in categorizing text documents. Therefore, a rule-based approach is a solution to this problem; the purpose of this study is to classify documents by using a rule-based. This paper deals with the rule-based approach with the embedding technique for a document to vector (doc2vec) files. An experiment was performed on two data sets Reuters-21578 and the 20 Newsgroups to classify the top ten categories of these data sets by using a document to vector rule-based (D2vecRule). Finally, this method provided us a good classification result according to the F-measures and implementation time metrics. In conclusion, it was observed that our algorithm document to vector rule-based (D2vecRule) was good when compared with other algorithms such as JRip, One R, and ZeroR applied to the same Reuters-21578 dataset. Full article
Show Figures

Figure 1

Article
Dual Pointer Network for Fast Extraction of Multiple Relations in a Sentence
Appl. Sci. 2020, 10(11), 3851; https://doi.org/10.3390/app10113851 - 01 Jun 2020
Cited by 1 | Viewed by 682
Abstract
Relation extraction is a type of information extraction task that recognizes semantic relationships between entities in a sentence. Many previous studies have focused on extracting only one semantic relation between two entities in a single sentence. However, multiple entities in a sentence are [...] Read more.
Relation extraction is a type of information extraction task that recognizes semantic relationships between entities in a sentence. Many previous studies have focused on extracting only one semantic relation between two entities in a single sentence. However, multiple entities in a sentence are associated through various relations. To address this issue, we proposed a relation extraction model based on a dual pointer network with a multi-head attention mechanism. The proposed model finds n-to-1 subject–object relations using a forward object decoder. Then, it finds 1-to-n subject–object relations using a backward subject decoder. Our experiments confirmed that the proposed model outperformed previous models, with an F1-score of 80.8% for the ACE (automatic content extraction) 2005 corpus and an F1-score of 78.3% for the NYT (New York Times) corpus. Full article
Show Figures

Figure 1

Article
A Hybrid Adversarial Attack for Different Application Scenarios
Appl. Sci. 2020, 10(10), 3559; https://doi.org/10.3390/app10103559 - 21 May 2020
Cited by 1 | Viewed by 786
Abstract
Adversarial attack against natural language has been a hot topic in the field of artificial intelligence security in recent years. It is mainly to study the methods and implementation of generating adversarial examples. The purpose is to better deal with the vulnerability and [...] Read more.
Adversarial attack against natural language has been a hot topic in the field of artificial intelligence security in recent years. It is mainly to study the methods and implementation of generating adversarial examples. The purpose is to better deal with the vulnerability and security of deep learning systems. According to whether the attacker understands the deep learning model structure, the adversarial attack is divided into black-box attack and white-box attack. In this paper, we propose a hybrid adversarial attack for different application scenarios. Firstly, we propose a novel black-box attack method of generating adversarial examples to trick the word-level sentiment classifier, which is based on differential evolution (DE) algorithm to generate semantically and syntactically similar adversarial examples. Compared with existing genetic algorithm based adversarial attacks, our algorithm can achieve a higher attack success rate while maintaining a lower word replacement rate. At the 10% word substitution threshold, we have increased the attack success rate from 58.5% to 63%. Secondly, when we understand the model architecture and parameters, etc., we propose a white-box attack with gradient-based perturbation against the same sentiment classifier. In this attack, we use a Euclidean distance and cosine distance combined metric to find the most semantically and syntactically similar substitution, and we introduce the coefficient of variation (CV) factor to control the dispersion of the modified words in the adversarial examples. More dispersed modifications can increase human imperceptibility and text readability. Compared with the existing global attack, our attack can increase the attack success rate and make modification positions in generated examples more dispersed. We’ve increased the global search success rate from 75.8% to 85.8%. Finally, we can deal with different application scenarios by using these two attack methods, that is, whether we understand the internal structure and parameters of the model, we can all generate good adversarial examples. Full article
Show Figures

Graphical abstract

Article
Source Code Assessment and Classification Based on Estimated Error Probability Using Attentive LSTM Language Model and Its Application in Programming Education
Appl. Sci. 2020, 10(8), 2973; https://doi.org/10.3390/app10082973 - 24 Apr 2020
Cited by 6 | Viewed by 1675
Abstract
The rate of software development has increased dramatically. Conventional compilers cannot assess and detect all source code errors. Software may thus contain errors, negatively affecting end-users. It is also difficult to assess and detect source code logic errors using traditional compilers, resulting in [...] Read more.
The rate of software development has increased dramatically. Conventional compilers cannot assess and detect all source code errors. Software may thus contain errors, negatively affecting end-users. It is also difficult to assess and detect source code logic errors using traditional compilers, resulting in software that contains errors. A method that utilizes artificial intelligence for assessing and detecting errors and classifying source code as correct (error-free) or incorrect is thus required. Here, we propose a sequential language model that uses an attention-mechanism-based long short-term memory (LSTM) neural network to assess and classify source code based on the estimated error probability. The attentive mechanism enhances the accuracy of the proposed language model for error assessment and classification. We trained the proposed model using correct source code and then evaluated its performance. The experimental results show that the proposed model has logic and syntax error detection accuracies of 92.2% and 94.8%, respectively, outperforming state-of-the-art models. We also applied the proposed model to the classification of source code with logic and syntax errors. The average precision, recall, and F-measure values for such classification are much better than those of benchmark models. To strengthen the proposed model, we combined the attention mechanism with LSTM to enhance the results of error assessment and detection as well as source code classification. Finally, our proposed model can be effective in programming education and software engineering by improving code writing, debugging, error-correction, and reasoning. Full article
Show Figures

Figure 1

Article
Cooperative Multi-Agent Reinforcement Learning with Conversation Knowledge for Dialogue Management
Appl. Sci. 2020, 10(8), 2740; https://doi.org/10.3390/app10082740 - 15 Apr 2020
Viewed by 696
Abstract
Dialogue management plays a vital role in task-oriented dialogue systems, which has become an active area of research in recent years. Despite the promising results brought from deep reinforcement learning, most of the studies need to develop a manual user simulator additionally. To [...] Read more.
Dialogue management plays a vital role in task-oriented dialogue systems, which has become an active area of research in recent years. Despite the promising results brought from deep reinforcement learning, most of the studies need to develop a manual user simulator additionally. To address the time-consuming development of simulator policy, we propose a multi-agent dialogue model where an end-to-end dialogue manager and a user simulator are optimized simultaneously. Different from prior work, we optimize the two-agents from scratch and apply the reward shaping technology based on adjacency pairs constraints in conversational analysis to speed up learning and to avoid the derivation from normal human-human conversation. In addition, we generalize the one-to-one learning strategy to one-to-many learning strategy, where a dialogue manager can be concurrently optimized with various user simulators, to improve the performance of trained dialogue manager. The experimental results show that one-to-one agents trained with adjacency pairs constraints can converge faster and avoid derivation. In cross-model evaluation with human users involved, the dialogue manager trained in one-to-many strategy achieves the best performance. Full article
Show Figures

Figure 1

Article
A Hybrid Deep Learning Model for Protein–Protein Interactions Extraction from Biomedical Literature
Appl. Sci. 2020, 10(8), 2690; https://doi.org/10.3390/app10082690 - 13 Apr 2020
Cited by 1 | Viewed by 740
Abstract
The exponentially increasing size of biomedical literature and the limited ability of manual curators to discover protein–protein interactions (PPIs) in text has led to delays in keeping PPI databases updated with the current findings. The state-of-the-art text mining methods for PPI extraction are [...] Read more.
The exponentially increasing size of biomedical literature and the limited ability of manual curators to discover protein–protein interactions (PPIs) in text has led to delays in keeping PPI databases updated with the current findings. The state-of-the-art text mining methods for PPI extraction are primarily based on deep learning (DL) models, and the performance of a DL-based method is mainly affected by the architecture of DL models and the feature embedding methods. In this study, we compared different architectures of DL models, including convolutional neural networks (CNN), long short-term memory (LSTM), and hybrid models, and proposed a hybrid architecture of a bidirectional LSTM+CNN model for PPI extraction. Pretrained word embedding and shortest dependency path (SDP) embedding are fed into a two-embedding channel model, such that the model is able to model long-distance contextual information and can capture the local features and structure information effectively. The experimental results showed that the proposed model is superior to the non-hybrid DL models, and the hybrid CNN+Bidirectional LSTM model works well for PPI extraction. The visualization and comparison of the hidden features learned by different DL models further confirmed the effectiveness of the proposed model. Full article
Show Figures

Figure 1

Article
Medical Instructed Real-Time Assistant for Patient with Glaucoma and Diabetic Conditions
Appl. Sci. 2020, 10(7), 2216; https://doi.org/10.3390/app10072216 - 25 Mar 2020
Cited by 4 | Viewed by 872
Abstract
Virtual assistants are involved in the daily activities of humans such as managing calendars, making appointments, and providing wake-up calls. They provide a conversational service to customers around-the-clock and make their daily life manageable. With this emerging trend, many well-known companies launched their [...] Read more.
Virtual assistants are involved in the daily activities of humans such as managing calendars, making appointments, and providing wake-up calls. They provide a conversational service to customers around-the-clock and make their daily life manageable. With this emerging trend, many well-known companies launched their own virtual assistants that manage the daily routine activities of customers. In the healthcare sector, virtual medical assistants also provide a list of relevant diseases linked to a specific symptom. Due to low accuracy and uncertainty, these generated recommendations are untrusted and may lead to hypochondriasis. In this study, we proposed a Medical Instructed Real-time Assistant (MIRA) that listens to the user’s chief complaint and predicts a specific disease. Instead of informing about the medical condition, the user is referred to a nearby appropriate medical specialist. We designed an architecture for MIRA that considers the limitations of existing virtual medical assistants such as weak authentication, lack of understanding multiple intent statements about a specific medical condition, and uncertain diagnosis recommendations. To implement the designed architecture, we collected the chief complaints along with the dialogue corpora of real patients. Then, we manually validated these data under the supervision of medical specialists. We then used these data for natural language understanding, disease identification, and appropriate response generation. For the prototype version of MIRA, we considered the cases of glaucoma (eye disease) and diabetes (an autoimmune disease) only. The performance measure of MIRA was evaluated in terms of accuracy (89%), precision (90%), sensitivity (89.8%), specificity (94.9%), and F-measure (89.8%). The task completion was calculated using Cohen’s Kappa ( k = 0.848 ) that categorizes MIRA as ‘Almost Perfect’. Furthermore, the voice-based authentication identifies the user effectively and prevent against masquerading attack. Simultaneously, the user experience shows relatively good results in all aspects based on the User Experience Questionnaire (UEQ) benchmark data. The experimental results show that MIRA efficiently predicts a disease based on chief complaints and supports the user in decision making. Full article
Show Figures

Figure 1

Article
Assessment of Word-Level Neural Language Models for Sentence Completion
Appl. Sci. 2020, 10(4), 1340; https://doi.org/10.3390/app10041340 - 16 Feb 2020
Viewed by 804
Abstract
The task of sentence completion, which aims to infer the missing text of a given sentence, was carried out to assess the reading comprehension level of machines as well as humans. In this work, we conducted a comprehensive study of various approaches for [...] Read more.
The task of sentence completion, which aims to infer the missing text of a given sentence, was carried out to assess the reading comprehension level of machines as well as humans. In this work, we conducted a comprehensive study of various approaches for the sentence completion based on neural language models, which have been advanced in recent years. First, we revisited the recurrent neural network language model (RNN LM), achieving highly competitive results with an appropriate network structure and hyper-parameters. This paper presents a bidirectional version of RNN LM, which surpassed the previous best results on Microsoft Research (MSR) Sentence Completion Challenge and the Scholastic Aptitude Test (SAT) sentence completion questions. In parallel with directly applying RNN LM to sentence completion, we also employed a supervised learning framework that fine-tunes a large pre-trained transformer-based LM with a few sentence-completion examples. By fine-tuning a pre-trained BERT model, this work established state-of-the-art results on the MSR and SAT sets. Furthermore, we performed similar experimentation on newly collected cloze-style questions in the Korean language. The experimental results reveal that simply applying the multilingual BERT models for the Korean dataset was not satisfactory, which leaves room for further research. Full article
Show Figures

Figure 1

Article
Reliable Classification of FAQs with Spelling Errors Using an Encoder-Decoder Neural Network in Korean
Appl. Sci. 2019, 9(22), 4758; https://doi.org/10.3390/app9224758 - 07 Nov 2019
Cited by 2 | Viewed by 578
Abstract
To resolve lexical disagreement problems between queries and frequently asked questions (FAQs), we propose a reliable sentence classification model based on an encoder-decoder neural network. The proposed model uses three types of word embeddings; fixed word embeddings for representing domain-independent meanings of words, [...] Read more.
To resolve lexical disagreement problems between queries and frequently asked questions (FAQs), we propose a reliable sentence classification model based on an encoder-decoder neural network. The proposed model uses three types of word embeddings; fixed word embeddings for representing domain-independent meanings of words, fined-tuned word embeddings for representing domain-specific meanings of words, and character-level word embeddings for bridging lexical gaps caused by spelling errors. It also uses class embeddings to represent domain knowledge associated with each category. In the experiments with an FAQ dataset about online banking, the proposed embedding methods contributed to an improved performance of the sentence classification. In addition, the proposed model showed better performance (with an accuracy of 0.810 in the classification of 411 categories) than that of the comparison model. Full article
Show Figures

Figure 1

Article
A Text Abstraction Summary Model Based on BERT Word Embedding and Reinforcement Learning
Appl. Sci. 2019, 9(21), 4701; https://doi.org/10.3390/app9214701 - 04 Nov 2019
Cited by 11 | Viewed by 1435
Abstract
As a core task of natural language processing and information retrieval, automatic text summarization is widely applied in many fields. There are two existing methods for text summarization task at present: abstractive and extractive. On this basis we propose a novel hybrid model [...] Read more.
As a core task of natural language processing and information retrieval, automatic text summarization is widely applied in many fields. There are two existing methods for text summarization task at present: abstractive and extractive. On this basis we propose a novel hybrid model of extractive-abstractive to combine BERT (Bidirectional Encoder Representations from Transformers) word embedding with reinforcement learning. Firstly, we convert the human-written abstractive summaries to the ground truth labels. Secondly, we use BERT word embedding as text representation and pre-train two sub-models respectively. Finally, the extraction network and the abstraction network are bridged by reinforcement learning. To verify the performance of the model, we compare it with the current popular automatic text summary model on the CNN/Daily Mail dataset, and use the ROUGE (Recall-Oriented Understudy for Gisting Evaluation) metrics as the evaluation method. Extensive experimental results show that the accuracy of the model is improved obviously. Full article
Show Figures

Graphical abstract

Article
Multi-Turn Chatbot Based on Query-Context Attentions and Dual Wasserstein Generative Adversarial Networks
Appl. Sci. 2019, 9(18), 3908; https://doi.org/10.3390/app9183908 - 18 Sep 2019
Cited by 5 | Viewed by 1228
Abstract
To generate proper responses to user queries, multi-turn chatbot models should selectively consider dialogue histories. However, previous chatbot models have simply concatenated or averaged vector representations of all previous utterances without considering contextual importance. To mitigate this problem, we propose a multi-turn chatbot [...] Read more.
To generate proper responses to user queries, multi-turn chatbot models should selectively consider dialogue histories. However, previous chatbot models have simply concatenated or averaged vector representations of all previous utterances without considering contextual importance. To mitigate this problem, we propose a multi-turn chatbot model in which previous utterances participate in response generation using different weights. The proposed model calculates the contextual importance of previous utterances by using an attention mechanism. In addition, we propose a training method that uses two types of Wasserstein generative adversarial networks to improve the quality of responses. In experiments with the DailyDialog dataset, the proposed model outperformed the previous state-of-the-art models based on various performance measures. Full article
Show Figures

Figure 1

Article
A Text-Generated Method to Joint Extraction of Entities and Relations
Appl. Sci. 2019, 9(18), 3795; https://doi.org/10.3390/app9183795 - 10 Sep 2019
Cited by 3 | Viewed by 855
Abstract
Entity-relation extraction is a basic task in natural language processing, and recently, the use of deep-learning methods, especially the Long Short-Term Memory (LSTM) network, has achieved remarkable performance. However, most of the existing entity-relation extraction methods cannot solve the overlapped multi-relation extraction problem, [...] Read more.
Entity-relation extraction is a basic task in natural language processing, and recently, the use of deep-learning methods, especially the Long Short-Term Memory (LSTM) network, has achieved remarkable performance. However, most of the existing entity-relation extraction methods cannot solve the overlapped multi-relation extraction problem, which means one or two entities are shared among multiple relational triples contained in a sentence. In this paper, we propose a text-generated method to solve the overlapped problem of entity-relation extraction. Based on this, (1) the entities and their corresponding relations are jointly generated as target texts without any additional feature engineering; (2) the model directly generates the relational triples using a unified decoding process, and entities can be repeatedly presented in multiple triples to solve the overlapped-relation problem. We conduct experiments on two public datasets—NYT10 and NYT11. The experimental results show that our proposed method outperforms the existing work, and achieves the best results. Full article
Show Figures

Figure 1

Article
Information Extraction from Electronic Medical Records Using Multitask Recurrent Neural Network with Contextual Word Embedding
Appl. Sci. 2019, 9(18), 3658; https://doi.org/10.3390/app9183658 - 04 Sep 2019
Cited by 7 | Viewed by 882
Abstract
Clinical named entity recognition is an essential task for humans to analyze large-scale electronic medical records efficiently. Traditional rule-based solutions need considerable human effort to build rules and dictionaries; machine learning-based solutions need laborious feature engineering. For the moment, deep learning solutions like [...] Read more.
Clinical named entity recognition is an essential task for humans to analyze large-scale electronic medical records efficiently. Traditional rule-based solutions need considerable human effort to build rules and dictionaries; machine learning-based solutions need laborious feature engineering. For the moment, deep learning solutions like Long Short-term Memory with Conditional Random Field (LSTM–CRF) achieved considerable performance in many datasets. In this paper, we developed a multitask attention-based bidirectional LSTM–CRF (Att-biLSTM–CRF) model with pretrained Embeddings from Language Models (ELMo) in order to achieve better performance. In the multitask system, an additional task named entity discovery was designed to enhance the model’s perception of unknown entities. Experiments were conducted on the 2010 Informatics for Integrating Biology & the Bedside/Veterans Affairs (I2B2/VA) dataset. Experimental results show that our model outperforms the state-of-the-art solution both on the single model and ensemble model. Our work proposes an approach to improve the recall in the clinical named entity recognition task based on the multitask mechanism. Full article
Show Figures

Figure 1

Review

Jump to: Research, Other

Review
A Review of Text Corpus-Based Tourism Big Data Mining
Appl. Sci. 2019, 9(16), 3300; https://doi.org/10.3390/app9163300 - 12 Aug 2019
Cited by 17 | Viewed by 2455
Abstract
With the massive growth of the Internet, text data has become one of the main formats of tourism big data. As an effective expression means of tourists’ opinions, text mining of such data has big potential to inspire innovations for tourism practitioners. In [...] Read more.
With the massive growth of the Internet, text data has become one of the main formats of tourism big data. As an effective expression means of tourists’ opinions, text mining of such data has big potential to inspire innovations for tourism practitioners. In the past decade, a variety of text mining techniques have been proposed and applied to tourism analysis to develop tourism value analysis models, build tourism recommendation systems, create tourist profiles, and make policies for supervising tourism markets. The successes of these techniques have been further boosted by the progress of natural language processing (NLP), machine learning, and deep learning. With the understanding of the complexity due to this diverse set of techniques and tourism text data sources, this work attempts to provide a detailed and up-to-date review of text mining techniques that have been, or have the potential to be, applied to modern tourism big data analysis. We summarize and discuss different text representation strategies, text-based NLP techniques for topic extraction, text classification, sentiment analysis, and text clustering in the context of tourism text mining, and their applications in tourist profiling, destination image analysis, market demand, etc. Our work also provides guidelines for constructing new tourism big data applications and outlines promising research areas in this field for incoming years. Full article
Show Figures

Figure 1

Other

Jump to: Research, Review

Letter
Evolutionary Neural Architecture Search (NAS) Using Chromosome Non-Disjunction for Korean Grammaticality Tasks
Appl. Sci. 2020, 10(10), 3457; https://doi.org/10.3390/app10103457 - 17 May 2020
Viewed by 1021
Abstract
In this paper, we apply the neural architecture search (NAS) method to Korean grammaticality judgment tasks. Since the word order of a language is the final result of complex syntactic operations, a successful neural architecture search in linguistic data suggests that NAS can [...] Read more.
In this paper, we apply the neural architecture search (NAS) method to Korean grammaticality judgment tasks. Since the word order of a language is the final result of complex syntactic operations, a successful neural architecture search in linguistic data suggests that NAS can automate language model designing. Although NAS application to language has been suggested in the literature, we add a novel dataset that contains Korean-specific linguistic operations, which adds great complexity in the patterns. The result of the experiment suggests that NAS provides an architecture for the language. Interestingly, NAS has suggested an unprecedented structure that would not be designed manually. Research on the final topology of the architecture is the topic of our future research. Full article
Show Figures

Figure 1

Back to TopTop