Recent Advances in Neural Networks and Applications

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Network Science".

Deadline for manuscript submissions: closed (31 October 2023) | Viewed by 7580

Special Issue Editors


Guest Editor
St. Petersburg Federal Research Center of the Russian Academy of Sciences (SPC RAS), 199178 St. Petersburg, Russia
Interests: signal and image processing; pattern recognition; computer vision; machine learning; artificial intelligence; multimodal interfaces

Guest Editor
St. Petersburg Federal Research Center of the Russian Academy of Sciences (SPC RAS), 199178 St. Petersburg, Russia
Interests: automatic speech recognition; audio-visual speech recognition; lip-reading; machine learning; multimodal interfaces

Special Issue Information

Dear Colleagues,

In recent years, many tasks related to the automation and digital transformation of key processes in society, science, business, and industry have emerged. Leading scientific centers and industrial companies worldwide are closely involved in the development and implementation of new mathematical methods, as well as software and technological solutions (applications), based on machine learning, neural networks, and artificial intelligence (neurocomputer technologies). This Special Issue is devoted to recent research advances in the field and focuses on assessing the quality of data preprocessing methods for different modalities, as well as neural network models, methods, and algorithms for solving problems of exponential complexity. We are looking for high-quality, original, unpublished, and completed research that is not currently under review by any other conference or journal.

Topics of interest include but are not limited to the following:

  • Processing of signals from sources of various types, including their use in machine-learning-based applications.
  • Processing of noisy signals (including the creation of systems for processing heavily noisy signals).
  • Recognition of sounds and speech in difficult conditions (noise, long distance, and so on), including conversation analysis.
  • Collection of datasets and training of neural network models.
  • Data annotation using artificial intelligence, including automated data preparation for applied tasks.
  • A combination of different types of algorithms within computer vision and machine learning systems.
  • Detection and identification of objects in a complex environment.
  • Autonomous semantic segmentation, classification, and identification of objects, including division into sub-objects and real-time operation.
  • Event analysis using video analytics systems.
  • Evaluation of the quality of neural network machine learning models without testing in a real environment, including systems evaluated without user participation.
  • Development of methods towards the creation of effective artificial intelligence.
  • Development of autonomous intelligent agents, including those based on reinforcement learning, as well as multi-agent systems with artificial intelligence.
  • Automation of training of neural networks (automated machine learning, including evolutionary algorithms).

Dr. Dmitry Ryumin
Dr. Denis Ivanko
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, use the submission form to submit your manuscript. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • signal processing
  • data augmentation
  • neural rendering
  • computer vision and image processing
  • artificial neural networks
  • deep machine learning
  • transfer learning
  • prediction analysis
  • mathematical modeling
  • scene recognition
  • expert systems based on artificial intelligence
  • explainable AI
  • artificial intelligence and mathematics
  • intelligent systems
  • multimodal interfaces

Published Papers (4 papers)


Research

Jump to: Review

17 pages, 1411 KiB  
Article
A Neural Network Architecture for Children’s Audio–Visual Emotion Recognition
by Anton Matveev, Yuri Matveev, Olga Frolova, Aleksandr Nikolaev and Elena Lyakso
Mathematics 2023, 11(22), 4573; https://doi.org/10.3390/math11224573 - 7 Nov 2023
Viewed by 912
Abstract
Detecting and understanding emotions are critical for our daily activities. As emotion recognition (ER) systems develop, we start looking at more difficult cases than just acted adult audio–visual speech. In this work, we investigate the automatic classification of the audio–visual emotional speech of children, which presents several challenges, including the lack of publicly available annotated datasets and the low performance of state-of-the-art audio–visual ER systems. In this paper, we present a new corpus of children's audio–visual emotional speech that we collected. Then, we propose a neural network solution that improves the utilization of the temporal relationships between audio and video modalities in the cross-modal fusion for children's audio–visual emotion recognition. We select a state-of-the-art neural network architecture as a baseline and present several modifications focused on a deeper learning of the cross-modal temporal relationships using attention. By conducting experiments with our proposed approach and the selected baseline model, we observe a relative improvement in performance of 2%. Finally, we conclude that focusing more on the cross-modal temporal relationships may be beneficial for building ER systems for child–machine communications and environments where qualified professionals work with children.
(This article belongs to the Special Issue Recent Advances in Neural Networks and Applications)

21 pages, 1620 KiB  
Article
Deep Models for Low-Resourced Speech Recognition: Livvi-Karelian Case
by Irina Kipyatkova and Ildar Kagirov
Mathematics 2023, 11(18), 3814; https://doi.org/10.3390/math11183814 - 5 Sep 2023
Viewed by 951
Abstract
Recently, there has been a growth in the number of studies addressing the automatic processing of low-resource languages. The lack of speech and text data significantly hinders the development of speech technologies for such languages. This paper introduces an automatic speech recognition system for Livvi-Karelian. Acoustic models based on artificial neural networks with time delays and hidden Markov models were trained using a limited speech dataset of 3.5 h. To augment the data, pitch and speech rate perturbation, SpecAugment, and their combinations were employed. Language models based on 3-grams and neural networks were trained using written texts and transcripts. The achieved word error rate of 22.80% is comparable to that of other low-resource languages. To the best of our knowledge, this is the first speech recognition system for Livvi-Karelian. The results obtained can be of significance for the development of automatic speech recognition systems not only for Livvi-Karelian but also for other low-resource languages, including the fields of speech recognition and machine translation. Future work includes experiments with Karelian data using techniques such as transfer learning and DNN language models.
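The SpecAugment technique mentioned in the abstract masks random frequency bands and time spans of a spectrogram so that a model trained on limited data sees many distorted variants of each utterance. The following is a minimal NumPy sketch of that idea; the function name, mask counts, and widths are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def spec_augment(spec, num_freq_masks=2, num_time_masks=2,
                 max_freq_width=8, max_time_width=20, rng=None):
    """SpecAugment-style masking of a log-mel spectrogram.

    spec: array of shape (num_mel_bins, num_frames).
    Masked regions are filled with the global mean of the spectrogram.
    """
    rng = rng or np.random.default_rng()
    spec = spec.copy()
    n_mels, n_frames = spec.shape
    fill = spec.mean()
    for _ in range(num_freq_masks):              # mask random frequency bands
        width = int(rng.integers(1, max_freq_width + 1))
        start = int(rng.integers(0, max(1, n_mels - width)))
        spec[start:start + width, :] = fill
    for _ in range(num_time_masks):              # mask random time spans
        width = int(rng.integers(1, max_time_width + 1))
        start = int(rng.integers(0, max(1, n_frames - width)))
        spec[:, start:start + width] = fill
    return spec
```

In a low-resource setting such as the 3.5 h Livvi-Karelian corpus, each training epoch would apply fresh random masks, effectively multiplying the amount of distinct training material the acoustic model observes.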

22 pages, 2739 KiB  
Article
Multi-Corpus Learning for Audio–Visual Emotions and Sentiment Recognition
by Elena Ryumina, Maxim Markitantov and Alexey Karpov
Mathematics 2023, 11(16), 3519; https://doi.org/10.3390/math11163519 - 15 Aug 2023
Cited by 1 | Viewed by 1195
Abstract
Recognition of emotions and sentiment (affective states) from human audio–visual information is widely used in healthcare, education, entertainment, and other fields; therefore, it has become a highly active research area. The large variety of corpora with heterogeneous data available for the development of single-corpus approaches for recognition of affective states may lead to approaches trained on one corpus being less effective on another. In this article, we propose a multi-corpus learned audio–visual approach for emotion and sentiment recognition. It is based on the extraction of mid-level features at the segment level using two multi-corpus temporal models (a pretrained transformer with GRU layers for the audio modality and a pretrained 3D CNN with BiLSTM-Former for the video modality) and on predicting affective states using two single-corpus cross-modal gated self-attention fusion (CMGSAF) models. The proposed approach was tested on the RAMAS and CMU-MOSEI corpora. To date, our approach has outperformed state-of-the-art audio–visual approaches for emotion recognition by 18.2% (78.1% vs. 59.9%) on the CMU-MOSEI corpus in terms of Weighted Accuracy and by 0.7% (82.8% vs. 82.1%) on the RAMAS corpus in terms of Unweighted Average Recall.
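The gated cross-modal attention idea underlying fusion models such as the CMGSAF described above can be sketched in a few lines: frames of one modality attend over frames of the other, and a learned sigmoid gate decides how much of the attended context to mix back in. This is a minimal, single-head NumPy illustration of the general mechanism under assumed shapes and projection names; it is not the paper's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_cross_modal_attention(audio, video, Wq, Wk, Wv, Wg):
    """One direction of gated cross-modal attention fusion.

    Audio frames (shape (Ta, d)) attend over video frames (shape (Tv, d));
    Wq, Wk, Wv are (d, d) projections and Wg is a (2d, d) gate projection.
    """
    q = audio @ Wq                      # queries from the audio stream
    k = video @ Wk                      # keys from the video stream
    v = video @ Wv                      # values from the video stream
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (Ta, Tv)
    attended = attn @ v                 # video context per audio frame
    gate = sigmoid(np.concatenate([audio, attended], axis=-1) @ Wg)
    return audio + gate * attended      # gated residual fusion
```

A symmetric video-attends-to-audio branch would typically run in parallel, with the two gated streams concatenated before the classification head.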

Review

Jump to: Research

30 pages, 543 KiB  
Review
A Review of Recent Advances on Deep Learning Methods for Audio-Visual Speech Recognition
by Denis Ivanko, Dmitry Ryumin and Alexey Karpov
Mathematics 2023, 11(12), 2665; https://doi.org/10.3390/math11122665 - 12 Jun 2023
Cited by 4 | Viewed by 3139
Abstract
This article provides a detailed review of recent advances in audio-visual speech recognition (AVSR) methods developed over the last decade (2013–2023). Despite the recent success of audio speech recognition systems, the problem of audio-visual (AV) speech decoding remains challenging. In comparison to previous surveys, we focus mainly on the important progress brought by the introduction of deep learning (DL) to the field and skip the description of long-known traditional "hand-crafted" methods. In addition, we discuss the recent application of DL toward AV speech fusion and recognition. We first discuss the main AV datasets used in the literature for AVSR experiments, since we consider it a data-driven machine learning (ML) task. We then consider the methodology used for visual speech recognition (VSR), followed by recent AV methodology advances. We then separately discuss the evolution of the core AVSR methods, pre-processing and augmentation techniques, and modality fusion strategies. We conclude the article with a discussion of the current state of AVSR and provide our vision for future research.
