Mathematics
  • Article
  • Open Access

11 December 2025

Sound Event Detection Employing Segmental Model

Department of Electronics, Keimyung University, Daegu 42601, Republic of Korea
This article belongs to the Special Issue Deep Learning Algorithms and Game Theory Models for Intelligent Information Processing and Decision-Making

Abstract

Segmental models recognize sequence data by computing likelihood scores over segments rather than individual frames. Motivated by promising results in speech recognition and natural language processing, we apply segmental models to sound event detection for the first time and verify their effectiveness against conventional frame-based approaches. The proposed model processes variable-length segments of sound signals by encoding feature vectors with deep learning techniques. These encoded vectors are then embedded to derive a representative value for each segment, and the segments are scored to identify the best match for each input sound signal. Owing to the inherent variation in the lengths and types of input sound signals, segmental models incur high computational and memory costs. To address this issue, our end-to-end model employs a simple segment-scoring function that is efficient in both computation and memory usage. We use marginal log loss as the cost function when training the segmental model, which eliminates the reliance on strong labels for sound events. Experiments on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 Challenge dataset show that the proposed method achieves a better F-score in sound event detection than conventional convolutional recurrent neural network-based models.
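
To make the idea of a marginal log loss concrete, the following is a minimal sketch of the formulation commonly used in the segmental-model literature, not necessarily the exact loss used in this article. It assumes a hypothetical callable `score(s, t, y)` that returns the score of a segment covering frames `s` to `t - 1` with label `y`, and a hypothetical maximum segment length `max_len`. The loss is the log of the summed exponentiated scores over all segmentations and labelings, minus the same quantity restricted to segmentations whose label sequence matches the target, so no segment boundaries are needed as supervision.

```python
import math

NEG_INF = float("-inf")


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a non-empty list of floats."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition(score, num_frames, num_labels, max_len):
    """Log-sum of exp-scores over every segmentation and labeling."""
    # alpha[t]: log-sum over all labeled segmentations of the first t frames.
    alpha = [NEG_INF] * (num_frames + 1)
    alpha[0] = 0.0
    for t in range(1, num_frames + 1):
        terms = [
            alpha[s] + score(s, t, y)
            for s in range(max(0, t - max_len), t)
            for y in range(num_labels)
        ]
        alpha[t] = logsumexp(terms)
    return alpha[num_frames]


def log_label_marginal(score, num_frames, labels, max_len):
    """Log-sum of exp-scores over segmentations whose labels equal `labels`."""
    k_max = len(labels)
    # alpha[k][t]: first t frames covered by exactly k segments labeled labels[:k].
    alpha = [[NEG_INF] * (num_frames + 1) for _ in range(k_max + 1)]
    alpha[0][0] = 0.0
    for k in range(1, k_max + 1):
        for t in range(1, num_frames + 1):
            terms = [
                alpha[k - 1][s] + score(s, t, labels[k - 1])
                for s in range(max(0, t - max_len), t)
            ]
            alpha[k][t] = logsumexp(terms)
    return alpha[k_max][num_frames]


def marginal_log_loss(score, num_frames, labels, num_labels, max_len):
    """Negative log-probability of the label sequence, boundaries marginalized."""
    log_z = log_partition(score, num_frames, num_labels, max_len)
    log_num = log_label_marginal(score, num_frames, labels, max_len)
    return log_z - log_num
```

In a trained model, `score` would be produced by the segment encoder and scoring function; here it is left as a plain Python callable so the dynamic program can be read in isolation. Both recursions visit each (start, end) pair at most once per label or label position, which is why bounding `max_len` matters for the computational and memory cost mentioned above.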
