Natural Language Processing Method: Deep Learning and Deep Semantics

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Electronic Multimedia".

Deadline for manuscript submissions: closed (15 September 2024) | Viewed by 5017

Special Issue Editors

School of Computing, National University of Singapore, Singapore 117417, Singapore
Interests: computer vision; video understanding; vision and language
School of EIE, The University of Sydney, Sydney, NSW 2006, Australia
Interests: computer vision; machine learning; vision and language

Special Issue Information

Dear Colleagues,

With the rapid development of deep learning technology, intelligent cross-modal systems have garnered a great deal of interest from academia and industry alike. Accordingly, we have witnessed the recent dramatic emergence of AI-based vision–language applications across a wide range of fields. This Special Issue invites original research addressing important, innovative, and timely challenges in the community. Potential topics include, but are not limited to:

  • visual captioning (image, video);
  • visual question answering (image, video);
  • visual text retrieval (image, video);
  • storytelling; dense visual captioning;
  • visual dialog (image, video);
  • visual grounding;
  • scene graph generation.

Dr. Wei Ji
Dr. Yiming Wu
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer vision
  • natural language processing
  • machine learning
  • artificial intelligence
  • visual understanding and recognition
  • deep learning

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (3 papers)


Research

16 pages, 7008 KiB  
Article
Improving Top-Down Attention Network in Speech Separation by Employing Hand-Crafted Filterbank and Parameter-Sharing Transformer
by Aye Nyein Aung and Jeih-weih Hung
Electronics 2024, 13(21), 4174; https://doi.org/10.3390/electronics13214174 - 24 Oct 2024
Viewed by 1015
Abstract
The “cocktail party problem”, the challenge of isolating individual speech signals from a noisy mixture, has traditionally been addressed using statistical methods. However, deep neural networks (DNNs), with their ability to learn complex patterns, have emerged as superior solutions. DNNs excel at capturing intricate relationships between mixed audio signals and their respective speech sources, enabling them to effectively separate overlapping speech signals in challenging acoustic environments. Recent advances in speech separation systems have drawn inspiration from the brain’s hierarchical sensory information processing, incorporating top-down attention mechanisms. The top-down attention network (TDANet) employs an encoder–decoder architecture with top-down attention to enhance feature modulation and separation performance. By leveraging attention signals from multi-scale input features, TDANet effectively modifies features across different scales using a global attention (GA) module in the encoder–decoder design. Local attention (LA) layers then convert these modulated signals into high-resolution auditory characteristics. In this study, we propose two key modifications to TDANet. First, we substitute the fully trainable convolutional encoder with a deterministic hand-crafted multi-phase gammatone filterbank (MP-GTF), which mimics human hearing. Experimental results demonstrated that this substitution yielded comparable or even slightly superior performance to the original TDANet with a trainable encoder. Second, we replace the single multi-head self-attention (MHSA) layer in the global attention module with a transformer encoder block consisting of multiple MHSA layers. To optimize GPU memory utilization, we introduce a parameter-sharing mechanism, dubbed “Reverse Cycle”, across layers in the transformer-based encoder. Our experimental findings indicated that these proposed modifications enabled TDANet to achieve competitive separation performance, rivaling state-of-the-art techniques, while maintaining superior computational efficiency.
(This article belongs to the Special Issue Natural Language Processing Method: Deep Learning and Deep Semantics)
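The abstract does not specify how the “Reverse Cycle” parameter sharing is wired, but the general idea of reusing one set of transformer-layer weights in reverse order can be sketched as follows. This is a minimal, hypothetical PyTorch illustration: the layer count, dimensions, and the exact sharing pattern (applying layers 0, 1, 2 and then 2, 1, 0) are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class ReverseCycleEncoder(nn.Module):
    """Transformer encoder that reuses n_unique layers in a reverse cycle,
    so 2 * n_unique layer applications share only n_unique parameter sets."""

    def __init__(self, d_model=256, n_heads=8, n_unique=3):
        super().__init__()
        # Only n_unique layers hold parameters; the rest are reused.
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_unique)
        )

    def forward(self, x):
        order = list(range(len(self.layers)))  # forward pass: 0, 1, 2
        for i in order + order[::-1]:          # then reversed:  2, 1, 0
            x = self.layers[i](x)
        return x

enc = ReverseCycleEncoder()
out = enc(torch.randn(4, 100, 256))  # (batch, time, feature)
print(out.shape)                     # torch.Size([4, 100, 256])
```

The memory saving comes from the weight reuse: six layer applications are backed by only three parameter sets, roughly halving the encoder's parameter footprint relative to six independent layers.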

12 pages, 297 KiB  
Article
Cross-Domain Document Summarization Model via Two-Stage Curriculum Learning
by Seungsoo Lee, Gyunyeop Kim and Sangwoo Kang
Electronics 2024, 13(17), 3425; https://doi.org/10.3390/electronics13173425 - 29 Aug 2024
Viewed by 843
Abstract
Generative document summarization is a natural language processing technique that generates short summary sentences while preserving the content of long texts. Various fine-tuned pre-trained document summarization models have been proposed, each using a specific single text-summarization dataset. However, each text-summarization dataset usually specializes in a particular downstream task, so it is difficult to handle cases involving multiple domains using a single dataset. Accordingly, when a generative document summarization model is fine-tuned on a specific dataset, it performs well on that dataset, whereas its performance degrades by up to 45% on datasets that were not used during training. In short, summarization models perform well on in-domain cases, where the dataset domain is the same during training and evaluation, but perform poorly on out-of-domain inputs. In this paper, we propose a new curriculum-learning method that uses mixed datasets while training a generative summarization model, making it more robust on out-of-domain datasets. Our method outperformed the XSum-trained baseline, achieving 10%, 20%, and 10% lower performance degradation on CNN/DM, one of the two test datasets used.
(This article belongs to the Special Issue Natural Language Processing Method: Deep Learning and Deep Semantics)
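As a rough illustration of staged training on mixed datasets, the sketch below assumes a single-domain warm-up stage followed by a second stage that gradually mixes in out-of-domain examples. The dataset names, the batching helper, and the linear mixing ramp are placeholders for illustration, not the paper's actual two-stage curriculum criterion.

```python
import random

def two_stage_batches(primary, mixed, steps_stage1, steps_stage2, batch_size=8):
    """Yield training batches: stage 1 uses only the primary dataset;
    stage 2 draws out-of-domain examples with a growing probability."""
    for _ in range(steps_stage1):
        yield random.sample(primary, batch_size)
    for step in range(steps_stage2):
        p_mix = step / steps_stage2  # mixing ratio ramps from 0 toward 1
        yield [
            random.choice(mixed) if random.random() < p_mix
            else random.choice(primary)
            for _ in range(batch_size)
        ]

# Usage with toy (document, summary) pairs standing in for real corpora:
xsum = [("doc%d" % i, "sum%d" % i) for i in range(100)]
cnndm = [("article%d" % i, "highlights%d" % i) for i in range(100)]
for batch in two_stage_batches(xsum, cnndm, steps_stage1=2, steps_stage2=3):
    pass  # feed each batch to the summarization model's training step
```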

13 pages, 1724 KiB  
Article
Context-Dependent Multimodal Sentiment Analysis Based on a Complex Attention Mechanism
by Lujuan Deng, Boyi Liu, Zuhe Li, Jiangtao Ma and Hanbing Li
Electronics 2023, 12(16), 3516; https://doi.org/10.3390/electronics12163516 - 20 Aug 2023
Cited by 3 | Viewed by 2535
Abstract
Multimodal sentiment analysis aims to understand people’s attitudes and opinions from different data forms. Traditional modality fusion methods for multimodal sentiment analysis concatenate or multiply various modalities without fully utilizing context information and the correlation between modalities. To solve this problem, this article proposes a multimodal sentiment analysis framework based on a recurrent neural network with a complex attention mechanism. First, after the raw data are preprocessed, numerical feature representations are obtained using feature extraction. Next, the numerical features are input into the recurrent neural network, and the outputs are multimodally fused using a complex attention mechanism layer. The objective of the complex attention mechanism is to leverage enhanced non-linearity to more effectively capture the inter-modal correlations, thereby improving the performance of multimodal sentiment analysis. Finally, the fused results are fed into the classification layer, which produces the sentiment output. This process effectively captures the semantic information and contextual relationships of the input sequence and fuses the different pieces of modal information. Our model was tested on the CMU-MOSEI dataset, achieving an accuracy of 82.04%.
(This article belongs to the Special Issue Natural Language Processing Method: Deep Learning and Deep Semantics)
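The pipeline described (per-modality recurrent encoders, attention-based fusion, then classification) can be sketched as below. Since the abstract does not detail the internals of the complex attention mechanism, a standard softmax attention over modality features stands in for it here; the feature dimensions are typical CMU-MOSEI sizes but are assumptions, as is the binary output.

```python
import torch
import torch.nn as nn

class AttentionFusionModel(nn.Module):
    """Per-modality GRU encoders, attention-weighted fusion, classifier."""

    def __init__(self, dims=None, hidden=128, n_classes=2):
        super().__init__()
        # Assumed per-modality input sizes (text/audio/video features).
        dims = dims or {"text": 300, "audio": 74, "video": 35}
        self.rnns = nn.ModuleDict(
            {m: nn.GRU(d, hidden, batch_first=True) for m, d in dims.items()}
        )
        self.score = nn.Linear(hidden, 1)  # attention score per modality
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, inputs):
        # Encode each modality; keep the final hidden state (batch, hidden).
        feats = torch.stack(
            [self.rnns[m](x)[1].squeeze(0) for m, x in inputs.items()], dim=1
        )                                          # (batch, n_modalities, hidden)
        weights = torch.softmax(self.score(feats), dim=1)
        fused = (weights * feats).sum(dim=1)       # weighted sum over modalities
        return self.classifier(fused)

model = AttentionFusionModel()
batch = {"text": torch.randn(4, 20, 300),
         "audio": torch.randn(4, 20, 74),
         "video": torch.randn(4, 20, 35)}
print(model(batch).shape)  # torch.Size([4, 2])
```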
