Search Results (1)

Search Parameters:
Keywords = space–time memory (STM) network

16 pages, 1093 KiB  
Article
A Lightweight Framework for Audio-Visual Segmentation with an Audio-Guided Space–Time Memory Network
by Yunpeng Zuo and Yunwei Zhang
Appl. Sci. 2025, 15(12), 6585; https://doi.org/10.3390/app15126585 - 11 Jun 2025
Abstract
As a multimodal fusion task, audio-visual segmentation (AVS) aims to locate sounding objects at the pixel level within a given image. This capability holds significant importance and practical value in applications such as intelligent surveillance, multimedia content analysis, and human–robot interaction. However, existing AVS models typically feature complex architectures, require a large number of parameters, and are challenging to deploy on embedded platforms. Furthermore, these models often lack integration with object tracking mechanisms and fail to address the mis-segmentation of unvoiced objects caused by environmental noise in real-world scenarios. To address these challenges, this research proposes a lightweight audio-visual segmentation framework incorporating an audio-guided space–time memory network (AG-STMNet). First, a mask generator with a scoring mechanism was developed to identify sounding objects among the generated masks. This component integrates FastSAM, a lightweight, pre-trained, object-aware segmentation model, with WAV2CLIP, a parameter-efficient audio-visual alignment model. Subsequently, AG-STMNet, an audio-guided video object segmentation network, was introduced to track sounding objects using video object segmentation techniques while mitigating environmental noise. Finally, the mask generator and AG-STMNet were combined to form the complete framework. The experimental results demonstrate that the framework achieves a mean Intersection over Union (mIoU) score of 41.5, indicating its potential as a viable lightweight solution for practical applications.
(This article belongs to the Special Issue Artificial Intelligence and Its Application in Robotics)
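
To make the abstract's scoring mechanism concrete, below is a minimal sketch (not the authors' code) of how audio-guided mask selection can work: a class-agnostic segmenter such as FastSAM proposes candidate masks, each masked region is embedded into CLIP space, and a WAV2CLIP-style audio embedding in the same space ranks the candidates. The embeddings are assumed precomputed; score_masks and the 0.2 threshold are illustrative names and values, not taken from the paper.

```python
# Minimal sketch (assumption-laden, not the authors' implementation) of
# audio-guided mask scoring: rank candidate mask regions, each embedded into
# CLIP space, against a WAV2CLIP-style audio embedding in the same space.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity between one audio embedding and N region embeddings."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return b @ a

def score_masks(audio_emb: np.ndarray,
                region_embs: np.ndarray,
                threshold: float = 0.2):
    """Rank candidate masks by audio-visual similarity.

    Returns (best_index, scores), or (None, scores) when no candidate clears
    the threshold -- an illustrative guard against selecting any mask when
    only environmental noise is heard.
    """
    scores = cosine_similarity(audio_emb, region_embs)
    best = int(np.argmax(scores))
    return (best, scores) if scores[best] >= threshold else (None, scores)

# Toy usage with random stand-ins for 512-dim CLIP-space embeddings.
rng = np.random.default_rng(0)
audio_emb = rng.normal(size=512)           # WAV2CLIP-style audio embedding
region_embs = rng.normal(size=(5, 512))    # one embedding per candidate mask
best, scores = score_masks(audio_emb, region_embs)
print(f"selected mask: {best}, scores: {np.round(scores, 3)}")
```

The thresholded rejection is one simple way to realize the paper's stated goal of not segmenting unvoiced objects under noise; the full framework additionally tracks the selected object across frames with the audio-guided space–time memory network.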