
Search Results (8)

Search Parameters:
Keywords = Amharic corpus

17 pages, 402 KiB  
Article
Cross-Corpus Multilingual Speech Emotion Recognition: Amharic vs. Other Languages
by Ephrem Afele Retta, Richard Sutcliffe, Jabar Mahmood, Michael Abebe Berwo, Eiad Almekhlafi, Sajjad Ahmad Khan, Shehzad Ashraf Chaudhry, Mustafa Mhamed and Jun Feng
Appl. Sci. 2023, 13(23), 12587; https://doi.org/10.3390/app132312587 - 22 Nov 2023
Cited by 8 | Viewed by 2077
Abstract
In a conventional speech emotion recognition (SER) task, a classifier for a given language is trained on a pre-existing dataset for that same language. However, where training data for a language do not exist, data from other languages can be used instead. We experiment with cross-lingual and multilingual SER, working with Amharic, English, German, and Urdu. For Amharic, we use our own publicly available Amharic Speech Emotion Dataset (ASED). For English, German, and Urdu, we use the existing RAVDESS, EMO-DB, and URDU datasets. We followed previous research in mapping the labels of all the datasets to just two classes, positive and negative, which allows us to compare performance across languages directly and to combine languages for training and testing. In Experiment 1, monolingual SER trials were carried out using three classifiers: AlexNet, VGGE (a proposed variant of VGG), and ResNet50. The results, averaged over the three models, were very similar for ASED and RAVDESS, suggesting that Amharic and English SER are equally difficult; by the same measure, German SER is more difficult and Urdu SER is easier. In Experiment 2, we trained on one language and tested on another, in both directions for each of the pairs Amharic↔German, Amharic↔English, and Amharic↔Urdu. The results with Amharic as the target suggested that English or German is the best source language. In Experiment 3, we trained on several non-Amharic languages and then tested on Amharic. The best accuracy obtained was several percentage points higher than the best accuracy in Experiment 2, suggesting that training on two or three non-Amharic languages gives a better result than training on just one. Overall, the results suggest that cross-lingual and multilingual training can be an effective strategy for training an SER classifier when resources for a language are scarce.
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
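As a rough illustration of the protocol this abstract describes, the sketch below maps dataset-specific emotion labels to the two classes and runs one cross-corpus train/test round. It is a minimal Python sketch, not the authors' code: the label sets and the `build_model`/`evaluate` helpers are assumptions.

```python
# Hedged sketch of the two-class mapping and cross-corpus protocol.
# The label sets below are illustrative; consult each dataset's
# documentation (ASED, RAVDESS, EMO-DB, URDU) for the actual labels.

POSITIVE = {"happy", "calm", "neutral", "surprised"}
NEGATIVE = {"sad", "angry", "fearful", "disgust"}

def to_binary(label: str) -> str:
    """Map a dataset-specific emotion label to positive/negative."""
    if label.lower() in POSITIVE:
        return "positive"
    if label.lower() in NEGATIVE:
        return "negative"
    raise ValueError(f"unmapped label: {label}")

def cross_corpus_eval(train_sets, test_set, build_model, evaluate):
    """Train on one or more source-language corpora and test on a target
    corpus (e.g. train on RAVDESS+EMO-DB, test on ASED), as in
    Experiments 2 and 3. build_model/evaluate are hypothetical helpers
    wrapping a classifier such as AlexNet, VGGE, or ResNet50."""
    train_data = [(x, to_binary(y)) for ds in train_sets for x, y in ds]
    model = build_model(train_data)
    return evaluate(model, [(x, to_binary(y)) for x, y in test_set])
```

Collapsing every dataset onto the same two labels is what makes accuracies directly comparable across languages and lets corpora be pooled for multilingual training.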

23 pages, 32221 KiB  
Article
Learned Text Representation for Amharic Information Retrieval and Natural Language Processing
by Tilahun Yeshambel, Josiane Mothe and Yaregal Assabie
Information 2023, 14(3), 195; https://doi.org/10.3390/info14030195 - 20 Mar 2023
Cited by 12 | Viewed by 6090
Abstract
Over the past few years, word embeddings and bidirectional encoder representations from transformers (BERT) models have brought better solutions to learning text representations for natural language processing (NLP) and other tasks. Many NLP applications rely on pre-trained text representations, leading to the development of a number of neural network language models for various languages. However, this is not the case for Amharic, a morphologically complex and under-resourced language for which usable pre-trained models for automatic text processing are not available. This paper presents an investigation of learned text representations for information retrieval and NLP tasks using word embedding and BERT language models. We explored the most commonly used word embedding methods, namely word2vec, GloVe, and fastText, as well as the BERT model. We investigated the performance of query expansion using word embeddings, and we analyzed the use of a pre-trained Amharic BERT model for masked language modeling, next sentence prediction, and text classification tasks. Amharic ad hoc information retrieval test collections containing word-based, stem-based, and root-based text representations were used for evaluation. We conducted a detailed empirical analysis of the usability of word embeddings and BERT models on word-based, stem-based, and root-based corpora. Experimental results show that word-based query expansion and language modeling perform better than stem-based and root-based text representations, and that fastText outperforms the other word embeddings on the word-based corpus.
(This article belongs to the Special Issue Novel Methods and Applications in Natural Language Processing)
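The query-expansion experiment lends itself to a short sketch. The following is a minimal, hedged example using gensim's word2vec implementation; the corpus file name, hyperparameters, and expansion size are assumptions, not the paper's setup.

```python
# Minimal sketch of embedding-based query expansion with gensim (4.x API).
# "amharic_corpus.txt" and all hyperparameters are illustrative assumptions.
from gensim.models import Word2Vec

sentences = [line.split() for line in open("amharic_corpus.txt", encoding="utf-8")]
model = Word2Vec(sentences, vector_size=100, window=5, min_count=5, sg=1)

def expand_query(query_terms, topn=3):
    """Append the topn nearest neighbours of each query term to the query."""
    expanded = list(query_terms)
    for term in query_terms:
        if term in model.wv:  # skip out-of-vocabulary terms
            expanded += [w for w, _ in model.wv.most_similar(term, topn=topn)]
    return expanded
```

With fastText in place of word2vec, subword n-grams also yield vectors for unseen inflected forms, which matters for a morphologically rich language like Amharic.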

22 pages, 1141 KiB  
Article
Amharic Speech Search Using Text Word Query Based on Automatic Sentence-like Segmentation
by Getnet Mezgebu Brhanemeskel, Solomon Teferra Abate, Tewodros Alemu Ayall and Abegaz Mohammed Seid
Appl. Sci. 2022, 12(22), 11727; https://doi.org/10.3390/app122211727 - 18 Nov 2022
Cited by 3 | Viewed by 4276
Abstract
More than 7000 languages are spoken in the world today, and Amharic is one of the languages spoken in the East African country of Ethiopia. Large volumes of speech data are produced every day in different languages as machines become better at processing speech and storage capacity improves. However, searching for a particular word, with its time frame, inside a given audio file is a challenge. Since Amharic has distinguishing characteristics such as glottal, palatal, and labialized consonants, models developed for other languages cannot be used directly. A popular approach to searching for particular information in speech uses an automatic speech recognition (ASR) module that generates a text version of the speech, in which the word or phrase is then searched for using a text query. However, a long audio file cannot be transcribed without segmentation, which in turn affects the performance of the ASR module. In this paper, we report our investigation of the effects of manual and automatic speech segmentation of Amharic audio files in a spiritual domain. We used manual segmentation as the baseline for our investigation and found that sentence-like automatic segmentation resulted in a word error rate (WER) close to the WER achieved on the manually segmented test speech. Based on the experimental results, we propose Amharic speech search using text word query (ASSTWQ) based on automatic sentence-like segmentation. Since we achieved a lower WER using a previously developed speech corpus in the broadcast-news domain together with the in-domain speech corpus, we recommend using both in- and out-of-domain speech corpora to develop the Amharic ASR module. The proposed ASR module achieves a WER of 53%, which needs further improvement; combining two language models (LMs) trained on text from the two domains (spiritual and broadcast news) reduced the WER from 53% to 46%. We have therefore developed two ASSTWQ systems using the two ASR modules, with WERs of 53% and 46%.
(This article belongs to the Special Issue Natural Language Processing: Recent Development and Applications)
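The search side of such a system reduces to locating a query word in time-stamped ASR output. Below is a minimal sketch, assuming the ASR module emits (word, start, end) triples per automatically segmented chunk; the format is an assumption, not ASSTWQ's actual interface.

```python
# Sketch of text-word query over ASR output with word-level timestamps.
from typing import List, Tuple

Transcript = List[Tuple[str, float, float]]  # (word, start_sec, end_sec)

def search_word(transcript: Transcript, query: str) -> List[Tuple[float, float]]:
    """Return the time frames in which the query word was recognized."""
    return [(start, end) for word, start, end in transcript if word == query]

# Example with a toy transcript of one automatically segmented chunk:
demo: Transcript = [("ሰላም", 0.0, 0.4), ("ወንድሜ", 0.45, 1.0), ("ሰላም", 3.2, 3.6)]
print(search_word(demo, "ሰላም"))  # [(0.0, 0.4), (3.2, 3.6)]
```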

28 pages, 4341 KiB  
Article
Amharic Adhoc Information Retrieval System Based on Morphological Features
by Tilahun Yeshambel, Josiane Mothe and Yaregal Assabie
Appl. Sci. 2022, 12(3), 1294; https://doi.org/10.3390/app12031294 - 26 Jan 2022
Cited by 7 | Viewed by 4420
Abstract
Information retrieval (IR) is one of the most important research and development areas due to the explosion of digital data and the need to access relevant information in huge corpora. Although IR systems function well for technologically advanced languages such as English, this is not the case for morphologically complex, under-resourced, and less-studied languages such as Amharic. Amharic is a Semitic language characterized by a complex morphology in which thousands of words are generated from a single root form through inflection and derivation. This has made the development of Amharic natural language processing (NLP) tools a challenging task. Amharic ad hoc retrieval also faces challenges due to the scarcity of linguistic resources, tools, and standard evaluation corpora. In this research work, we investigate the impact of morphological features on the representation of Amharic documents and queries for ad hoc retrieval. We also analyze the effects of stem-based and root-based text representations and propose a new Amharic IR system architecture. Moreover, we present the resources and corpora we constructed for the evaluation of Amharic IR systems and other NLP tools. We conduct various experiments with a TREC-like approach on an Amharic IR test collection using a standard evaluation framework and measures. Our findings show that root-based text representation outperforms the conventional stem-based representation for Amharic IR.
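The core comparison can be pictured with a toy inverted index built under interchangeable morphological analyzers. In the sketch below, `analyze` stands in for a hypothetical stem- or root-based Amharic analyzer; the paper's actual tools are not reproduced here.

```python
# Illustrative sketch: index the same collection under alternative
# morphological representations and retrieve with a Boolean-OR query.
from collections import defaultdict

def build_index(docs, analyze):
    """Inverted index from analyzed term to ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[analyze(token)].add(doc_id)
    return index

def retrieve(index, query, analyze):
    """Documents matching any analyzed query term (Boolean OR)."""
    return set().union(*(index.get(analyze(t), set()) for t in query.split()))
```

Under a root-based analyzer, inflectional and derivational variants collapse to one index entry, which is the mechanism by which root-based representation can outperform stem-based representation on Amharic.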

21 pages, 569 KiB  
Article
Topic Modeling for Amharic User Generated Texts
by Girma Neshir, Andreas Rauber and Solomon Atnafu
Information 2021, 12(10), 401; https://doi.org/10.3390/info12100401 - 29 Sep 2021
Cited by 1 | Viewed by 3081
Abstract
Topic modeling is a statistical process that derives latent themes from extensive collections of text. Three approaches to topic modeling exist: unsupervised, semi-supervised, and supervised. In this work, we develop a supervised topic model for an Amharic corpus. We also investigate the effect of stemming on topic detection using Term Frequency Inverse Document Frequency (TF-IDF) features, Latent Dirichlet Allocation (LDA) features, and a combination of these two feature sets, with four supervised machine learning tools: Support Vector Machine (SVM), Naive Bayes (NB), Logistic Regression (LR), and Neural Nets (NN). We evaluate our approach on an Amharic corpus of 14,751 documents in ten topic categories. Both qualitative and quantitative analyses show that our proposed supervised topic detection performs best, with an accuracy of 88%, using SVM on TF-IDF word features combined with the Synthetic Minority Over-sampling Technique (SMOTE) and no stemming. The results also show that text features with stemming slightly improve the performance of the topic classifier over features without stemming.
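The best-reported configuration (SVM on TF-IDF word features with SMOTE and no stemming) can be sketched as a scikit-learn/imbalanced-learn pipeline; the hyperparameters below are library defaults, not the paper's settings.

```python
# Hedged sketch of TF-IDF word features + SMOTE + SVM topic classification.
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),        # word features, no stemming
    ("smote", SMOTE(random_state=42)),   # rebalance the ten topic classes
    ("svm", LinearSVC()),
])
# pipeline.fit(train_texts, train_topic_labels)
# accuracy = pipeline.score(test_texts, test_topic_labels)
```

The imblearn Pipeline applies SMOTE only at fit time, so oversampling never leaks into evaluation.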

20 pages, 960 KiB  
Article
Meta-Learner for Amharic Sentiment Classification
by Girma Neshir, Andreas Rauber and Solomon Atnafu
Appl. Sci. 2021, 11(18), 8489; https://doi.org/10.3390/app11188489 - 13 Sep 2021
Cited by 7 | Viewed by 3462
Abstract
The emergence of the World Wide Web facilitates the growth of user-generated texts in less-resourced languages. Sentiment analysis of these texts may serve as a key performance indicator of the quality of services delivered by companies and government institutions. The presence of user-generated texts is thus an opportunity for assisting managers and policy-makers, as such texts can be used to improve performance and increase the level of customer satisfaction. Because of this potential, sentiment analysis has been widely researched in the past few years, and a plethora of approaches and tools have been developed, albeit predominantly for well-resourced languages such as English. Resources for less-resourced languages such as Amharic, the subject of this paper, are much less developed, and the massive amounts of annotated training data that standard approaches require are unavailable, calling for different, cost-effective approaches. This research investigates the performance of a combination of heterogeneous machine learning algorithms (base learners such as SVM, RF, and NB), fused in our framework by a meta-learner (in this case, logistic regression), for Amharic sentiment classification. An annotated corpus is provided for the evaluation of the classification framework. The proposed stacked approach, applying SMOTE to TF-IDF character (1,7)-gram features, achieved an accuracy of 90%. Overall, the meta-learner (i.e., the stacked ensemble) showed a performance rise over the base learners with TF-IDF character n-grams.
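The stacked framework described here maps naturally onto scikit-learn's StackingClassifier. A hedged sketch follows, with SVM, RF, and NB base learners fused by a logistic-regression meta-learner over TF-IDF character (1,7)-grams; SMOTE is omitted for brevity, and the hyperparameters are defaults rather than the authors' settings.

```python
# Sketch of a stacked ensemble for sentiment classification.
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

stack = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(1, 7)),  # char (1,7)-grams
    StackingClassifier(
        estimators=[("svm", LinearSVC()),
                    ("rf", RandomForestClassifier()),
                    ("nb", MultinomialNB())],
        final_estimator=LogisticRegression(max_iter=1000),  # meta-learner
    ),
)
# stack.fit(train_texts, train_sentiment_labels)
```

StackingClassifier trains the meta-learner on cross-validated predictions of the base learners, which is the usual safeguard against the meta-learner simply memorizing base-learner training errors.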

9 pages, 1620 KiB  
Article
Combating Fake News in “Low-Resource” Languages: Amharic Fake News Detection Accompanied by Resource Crafting
by Fantahun Gereme, William Zhu, Tewodros Ayall and Dagmawi Alemu
Information 2021, 12(1), 20; https://doi.org/10.3390/info12010020 - 7 Jan 2021
Cited by 34 | Viewed by 8563
Abstract
The need to fight the growing negative impact of fake news is escalating, as is evident in the drive to conduct research and develop tools for the task. However, a lack of adequate datasets and good word embeddings has made it difficult to build sufficiently accurate detection methods, and these resources are entirely missing for "low-resource" African languages such as Amharic. Alleviating these critical problems should not be left for tomorrow. Deep learning methods and word embeddings have contributed greatly to devising automatic fake news detection mechanisms. We present several contributions, including an Amharic fake news detection model, a general-purpose Amharic corpus (GPAC), a novel Amharic fake news detection dataset (ETH_FAKE), and an Amharic fastText word embedding (AMFTWE). Our Amharic fake news detection model, evaluated on the ETH_FAKE dataset using the AMFTWE, performed very well.
(This article belongs to the Special Issue Natural Language Processing for Social Media)
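Training a fastText embedding on a general-purpose corpus, in the spirit of AMFTWE, can be sketched with gensim; the file name and hyperparameters below are assumptions, not the paper's recipe.

```python
# Hedged sketch: train fastText word vectors on a general-purpose corpus.
from gensim.models import FastText

corpus = [line.split() for line in open("gpac.txt", encoding="utf-8")]
embedding = FastText(corpus, vector_size=300, window=5, min_count=5,
                     min_n=3, max_n=6)   # subword n-grams of length 3..6
embedding.wv.save("amftwe.kv")  # reuse later, e.g. in a detector's embedding layer
```

The subword n-grams are the reason fastText suits Amharic: vectors can be composed even for inflected forms never seen during training.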

30 pages, 5703 KiB  
Article
Advanced Convolutional Neural Network-Based Hybrid Acoustic Models for Low-Resource Speech Recognition
by Tessfu Geteye Fantaye, Junqing Yu and Tulu Tilahun Hailu
Computers 2020, 9(2), 36; https://doi.org/10.3390/computers9020036 - 2 May 2020
Cited by 11 | Viewed by 5466
Abstract
Deep neural networks (DNNs) have achieved great success in acoustic modeling for speech recognition tasks. Among these networks, the convolutional neural network (CNN) is effective at representing the local properties of speech formants, but it is not suitable for modeling long-term context dependencies between speech signal frames. Recurrent neural networks (RNNs) have recently shown great ability to model such long-term context dependencies; however, RNNs perform poorly on low-resource speech recognition tasks, sometimes even worse than conventional feed-forward neural networks, and they often overfit severely on the training corpus. This paper presents our contributions combining CNNs and conventional RNNs with gate, highway, and residual networks to reduce these problems. The optimal neural network structures and training strategies for the proposed models are explored. Experiments were conducted on the Amharic and Chaha datasets, as well as on the limited language packages (10 h) of the benchmark datasets released under the Intelligence Advanced Research Projects Activity (IARPA) Babel Program. The proposed models achieve 0.1–42.79% relative performance improvements over their corresponding feed-forward DNN, CNN, bidirectional RNN (BRNN), or bidirectional gated recurrent unit (BGRU) baselines across six language collections. These approaches are promising candidates for developing better-performing acoustic models for low-resource speech recognition tasks.
(This article belongs to the Special Issue Artificial Neural Networks in Pattern Recognition)
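A toy Keras sketch of a CNN plus bidirectional-GRU hybrid over log-mel input frames illustrates the kind of architecture being combined; all layer sizes and the output target count are placeholders, not the paper's configuration.

```python
# Hedged sketch of a CNN + BGRU hybrid acoustic model over log-mel features.
import tensorflow as tf
from tensorflow.keras import layers

n_frames, n_mels, n_targets = 100, 40, 2000  # placeholder dimensions

inputs = tf.keras.Input(shape=(n_frames, n_mels, 1))
x = layers.Conv2D(32, (3, 3), padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D((1, 2))(x)         # pool along frequency only
x = layers.Reshape((n_frames, -1))(x)      # back to (time, features)
x = layers.Bidirectional(layers.GRU(256, return_sequences=True))(x)
outputs = layers.Dense(n_targets, activation="softmax")(x)  # per-frame states
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

The CNN front end captures local spectro-temporal patterns (the formant structure), while the bidirectional GRU supplies the long-term context the abstract identifies as CNNs' weakness.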
