Search Results (5)

Search Parameters:
Keywords = Mel energy spectrum

25 pages, 1822 KiB  
Article
Emotion Recognition from Speech in a Subject-Independent Approach
by Andrzej Majkowski and Marcin Kołodziej
Appl. Sci. 2025, 15(13), 6958; https://doi.org/10.3390/app15136958 - 20 Jun 2025
Cited by 1 | Viewed by 639
Abstract
The aim of this article is to critically and reliably assess the potential of current emotion recognition technologies for practical applications in human–computer interaction (HCI) systems. The study made use of two databases: one in English (RAVDESS) and another in Polish (EMO-BAJKA), both containing speech recordings expressing various emotions. The effectiveness of recognizing seven and eight different emotions was analyzed. A range of acoustic features, including energy features, mel-cepstral features, zero-crossing rate, fundamental frequency, and spectral features, was used to analyze the emotions in speech. Machine learning techniques such as convolutional neural networks (CNNs), long short-term memory (LSTM) networks, and support vector machines with a cubic kernel (cubic SVMs) were employed in the emotion classification task. The research findings indicated that the effective recognition of a broad spectrum of emotions in a subject-independent approach is limited. However, significantly better results were obtained in the classification of paired emotions, suggesting that emotion recognition technologies could be effectively used in specific applications where distinguishing between two particular emotional states is essential. To ensure a reliable and accurate assessment of the emotion recognition system, the dataset was divided so that the training and testing data contained recordings of completely different individuals. The highest classification accuracies for pairs of emotions in the RAVDESS database were achieved for Angry–Fearful (0.8), Angry–Happy (0.86), Angry–Neutral (1.0), Angry–Sad (1.0), Angry–Surprise (0.89), Disgust–Neutral (0.91), and Disgust–Sad (0.96). In the EMO-BAJKA database, the highest classification accuracies for pairs of emotions were for Joy–Neutral (0.91), Surprise–Neutral (0.80), Surprise–Fear (0.91), and Neutral–Fear (0.91).
(This article belongs to the Special Issue New Advances in Applied Machine Learning)
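As a loose illustration of the evaluation protocol this abstract describes, the sketch below builds a subject-independent split with scikit-learn's GroupShuffleSplit and a cubic-kernel SVM for one emotion pair. The feature vectors, labels, and speaker IDs are synthetic placeholders, not data from RAVDESS or EMO-BAJKA, and the feature dimensionality is an assumption.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit
from sklearn.svm import SVC

# Hypothetical demo data: 40 clips from 8 speakers, one emotion pair.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 13))            # stand-in acoustic feature vectors
y = np.repeat(["angry", "neutral"], 20)  # one of the paired-emotion tasks
speakers = np.tile(np.arange(8), 5)      # speaker ID for every clip

# Grouping by speaker keeps any one person's recordings entirely in either
# the training set or the test set: the subject-independent condition.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=speakers))

clf = SVC(kernel="poly", degree=3)       # a cubic-kernel SVM
clf.fit(X[train_idx], y[train_idx])
print("subject-independent accuracy:", clf.score(X[test_idx], y[test_idx]))
```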

21 pages, 7017 KiB  
Article
Multi-Scale Frequency-Adaptive-Network-Based Underwater Target Recognition
by Lixu Zhuang, Afeng Yang, Yanxin Ma and David Day-Uei Li
J. Mar. Sci. Eng. 2024, 12(10), 1766; https://doi.org/10.3390/jmse12101766 - 5 Oct 2024
Cited by 2 | Viewed by 1002
Abstract
Due to the complexity of underwater environments, underwater target recognition based on radiated noise has always been challenging. This paper proposes a multi-scale frequency-adaptive network for underwater target recognition. Based on the different distribution densities of Mel filters in the low-frequency band, a three-channel improved Mel energy spectrum feature is designed first. Second, by combining a frequency-adaptive module, an attention mechanism, and a multi-scale fusion module, a multi-scale frequency-adaptive network is proposed to enhance the model’s learning ability. Then, the model training is optimized by introducing a time–frequency mask, a data augmentation strategy involving data confounding, and a focal loss function. Finally, systematic experiments were conducted based on the ShipsEar dataset. The results showed that the recognition accuracy for five categories reached 98.4%, and the accuracy for nine categories in fine-grained recognition was 88.6%. Compared with existing methods, the proposed multi-scale frequency-adaptive network for underwater target recognition has achieved significant performance improvement.
(This article belongs to the Section Ocean Engineering)
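One plausible reading of the three-channel Mel energy spectrum feature is sketched below with librosa: the same signal is rendered at three Mel filter-bank densities and the maps are resized and stacked as image channels. The filter counts, output shape, and synthetic test signal are assumptions, not the paper's exact construction.

```python
import numpy as np
import librosa
from scipy.ndimage import zoom

def three_channel_mel(y, sr, n_mels_list=(32, 64, 128), out_shape=(128, 128)):
    """Stack log-Mel spectrograms of differing filter density as channels."""
    channels = []
    for n_mels in n_mels_list:
        m = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
        m_db = librosa.power_to_db(m, ref=np.max)
        # Bilinear-resize every map to a common shape so they stack cleanly.
        scale = (out_shape[0] / m_db.shape[0], out_shape[1] / m_db.shape[1])
        channels.append(zoom(m_db, scale, order=1))
    return np.stack(channels, axis=0)

sr = 16000
y = librosa.chirp(fmin=100, fmax=4000, sr=sr, duration=2.0)  # synthetic signal
print(three_channel_mel(y, sr).shape)  # (3, 128, 128)
```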

17 pages, 5633 KiB  
Article
Audio General Recognition of Partial Discharge and Mechanical Defects in Switchgear Using a Smartphone
by Dongyun Dai, Quanchang Liao, Zhongqing Sang, Yimin You, Rui Qiao and Huisheng Yuan
Appl. Sci. 2023, 13(18), 10153; https://doi.org/10.3390/app131810153 - 9 Sep 2023
Cited by 2 | Viewed by 1459
Abstract
Mechanical defects and partial discharge (PD) defects can appear in the indoor switchgear of substations or distribution stations, making the switchgear a safety hazard. However, traditional acoustic methods detect and identify these two types of defects separately, ignoring the general recognition of audio signals. In addition, the process of using testing equipment is complex and costly, which is not conducive to timely testing and widespread application. To assist technicians in making a quick preliminary diagnosis of defect types in switchgear, improve the efficiency of subsequent overhauls, and reduce the cost of detection, this paper proposes a general audio recognition method for identifying switchgear defects using a smartphone. With this method, audio and video files recorded with smartphones can be analyzed to distinguish background noise, mechanical vibration, and PD audio signals synchronously, with good applicability within a certain range. To test the feasibility of using smartphones to identify the three types of audio signal, 12 sets of live audio and video files provided by technicians were characterized; similarities and differences were found in features such as the autocorrelation, density, and steepness of the time-domain waveforms and the band energy and harmonic components of the frequency spectrum, and new combinations of features were proposed accordingly. To compare the recognition performance of time-domain features, frequency-band energy, Mel-frequency cepstral coefficients (MFCCs), and the proposed feature set, the feature vectors were input into a support vector machine (SVM) for a recognition test; the results showed that the proposed method had the highest recognition accuracy. Finally, mechanical defects and PD defects were set up on a switchgear for practical verification, which confirmed that the method is general and effective.
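A hedged sketch of the kind of combined feature vector the abstract describes (time-domain statistics plus band energies plus MFCCs) fed to an SVM. The specific statistics, band count, and the synthetic stand-in clips are assumptions, not the paper's feature set.

```python
import numpy as np
import librosa
from sklearn.svm import SVC

def combined_features(y, sr, n_bands=8, n_mfcc=13):
    feats = []
    # Time-domain statistics: rough proxies for waveform density/steepness.
    feats += [np.mean(np.abs(y)), np.std(y),
              np.mean(librosa.feature.zero_crossing_rate(y))]
    # Frequency-band energies summed over equal slices of the power spectrum.
    S = np.abs(librosa.stft(y)) ** 2
    feats += [float(b.sum()) for b in np.array_split(S, n_bands, axis=0)]
    # Mean-pooled MFCCs.
    feats += list(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1))
    return np.array(feats)

# Synthetic stand-ins for noise / vibration / discharge-like clips.
sr = 16000
rng = np.random.default_rng(1)
clips = [rng.normal(size=sr) * 0.1,                       # background noise
         np.sin(2 * np.pi * 50 * np.arange(sr) / sr),     # hum-like vibration
         (rng.random(sr) > 0.999) * rng.normal(size=sr)]  # impulsive bursts
X = np.array([combined_features(y, sr) for y in clips])
clf = SVC(kernel="rbf").fit(X, ["noise", "vibration", "pd"])
print(clf.predict(X))
```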

17 pages, 3284 KiB  
Article
Resource-Efficient Pet Dog Sound Events Classification Using LSTM-FCN Based on Time-Series Data
by Yunbin Kim, Jaewon Sa, Yongwha Chung, Daihee Park and Sungju Lee
Sensors 2018, 18(11), 4019; https://doi.org/10.3390/s18114019 - 18 Nov 2018
Cited by 21 | Viewed by 8148
Abstract
The use of IoT (Internet of Things) technology for the management of pet dogs left alone at home is increasing. This includes tasks such as automatic feeding, operation of play equipment, and location detection. Classifying the vocalizations of pet dogs using information from a sound sensor is an important way to analyze the behavior or emotions of dogs that are left alone. These sounds are acquired by attaching an IoT sound sensor to the dog, after which the sound events (e.g., barking, growling, howling, and whining) are classified. However, sound sensors tend to transmit large amounts of data and consume considerable amounts of power, which presents issues for resource-constrained IoT sensor devices. In this paper, we propose a way to classify pet dog sound events and improve resource efficiency without significant degradation of accuracy. To achieve this, we acquire only the intensity data of sounds by using a relatively resource-efficient noise sensor. This presents issues as well, since it is difficult to achieve sufficient classification accuracy using only intensity data due to the loss of information from the sound events. To address this problem and avoid significant degradation of classification accuracy, we apply a long short-term memory fully convolutional network (LSTM-FCN), a deep learning method for analyzing time-series data, and exploit bicubic interpolation. Based on experimental results, the proposed method based on noise sensors (i.e., Shapelet and LSTM-FCN for time series) was found to improve energy efficiency by 10 times without significant degradation of accuracy compared to typical methods based on sound sensors (i.e., mel-frequency cepstrum coefficients (MFCCs), spectrograms, and mel-spectra for feature extraction, and support vector machines (SVMs) and k-nearest neighbors (k-NN) for classification).
(This article belongs to the Section Internet of Things)
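A compact PyTorch sketch of the LSTM-FCN architecture named above, applied to 1-D intensity sequences: an LSTM branch and a convolutional branch with global average pooling, concatenated before the classifier. The layer sizes, sequence length, and the four class names are assumptions; the paper's exact configuration is not reproduced here.

```python
import torch
import torch.nn as nn

class LSTMFCN(nn.Module):
    def __init__(self, n_classes, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        # FCN branch: three Conv1d blocks followed by global average pooling.
        self.fcn = nn.Sequential(
            nn.Conv1d(1, 128, 8, padding="same"), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 256, 5, padding="same"), nn.BatchNorm1d(256), nn.ReLU(),
            nn.Conv1d(256, 128, 3, padding="same"), nn.BatchNorm1d(128), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Linear(hidden + 128, n_classes)

    def forward(self, x):                            # x: (batch, seq_len)
        lstm_out, _ = self.lstm(x.unsqueeze(-1))     # (batch, seq_len, hidden)
        lstm_feat = lstm_out[:, -1, :]               # last time step
        fcn_feat = self.fcn(x.unsqueeze(1)).squeeze(-1)  # (batch, 128)
        return self.head(torch.cat([lstm_feat, fcn_feat], dim=1))

model = LSTMFCN(n_classes=4)            # bark / growl / howl / whine (assumed)
logits = model(torch.randn(8, 256))     # 8 intensity sequences of length 256
print(logits.shape)                     # torch.Size([8, 4])
```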

31 pages, 14229 KiB  
Article
Optimal Representation of Anuran Call Spectrum in Environmental Monitoring Systems Using Wireless Sensor Networks
by Amalia Luque, Jesús Gómez-Bellido, Alejandro Carrasco and Julio Barbancho
Sensors 2018, 18(6), 1803; https://doi.org/10.3390/s18061803 - 3 Jun 2018
Cited by 19 | Viewed by 4287
Abstract
The analysis and classification of the sounds produced by certain animal species, notably anurans, have revealed these amphibians to be a potentially strong indicator of temperature fluctuations and therefore of the existence of climate change. Environmental monitoring systems using Wireless Sensor Networks are therefore of interest to obtain indicators of global warming. For the automatic classification of the sounds recorded on such systems, the proper representation of the sound spectrum is essential, since it contains the information required for cataloguing anuran calls. The present paper focuses on this process of feature extraction by exploring three alternatives: the standardized MPEG-7, the Filter Bank Energy (FBE), and the Mel Frequency Cepstral Coefficients (MFCC). Moreover, various values for every option in the extraction of spectrum features have been considered. Throughout the paper, it is shown that representing the frame spectrum with pure FBE offers slightly worse results than using the MPEG-7 features. This performance can easily be increased, however, by rescaling the FBE along two dimensions: vertically, by taking the logarithm of the energies; and horizontally, by applying mel scaling in the filter banks. Representing the spectrum in the cepstral domain, as with MFCC, yields additional marginal improvements in classification performance.
(This article belongs to the Special Issue Sensor Networks for Environmental Observations)
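A minimal sketch of the two rescalings this abstract describes, assuming a synthetic test signal and 24 filters: plain linear-band FBE, then mel scaling of the filter bank (horizontal) and a log of the energies (vertical), and finally a DCT into the cepstral domain to obtain MFCCs.

```python
import numpy as np
import scipy.fftpack
import librosa

sr = 16000
y = librosa.chirp(fmin=200, fmax=6000, sr=sr, duration=1.0)  # synthetic "call"
S = np.abs(librosa.stft(y, n_fft=512)) ** 2                  # power spectrum

# Plain FBE: energy summed over linearly spaced frequency bands.
fbe = np.stack([b.sum(axis=0) for b in np.array_split(S, 24, axis=0)])

# Horizontal rescaling: mel-spaced triangular filters replace linear bands.
mel_fb = librosa.filters.mel(sr=sr, n_fft=512, n_mels=24)
mel_energies = mel_fb @ S

# Vertical rescaling: take the logarithm of the energies.
log_mel = np.log(mel_energies + 1e-10)

# Cepstral domain: a DCT of the log-mel energies yields MFCCs.
mfcc = scipy.fftpack.dct(log_mel, axis=0, norm="ortho")[:13]
print(fbe.shape, log_mel.shape, mfcc.shape)  # (24, T) (24, T) (13, T)
```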
