Filter Before Mixing: Per-Modality Denoising for Multimodal RL with Application to Health Management

Okita, Tsuyoshi

doi:10.3390/electronics15112361

This is an early access version, the complete PDF, HTML, and XML versions will be available soon.


Filter Before Mixing: Per-Modality Denoising for Multimodal RL with Application to Health Management^†

by Tsuyoshi Okita

Department of Artificial Intelligence, Graduate School of Computer Science and Systems Engineering, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka-shi 820-8502, Fukuoka, Japan

^† This paper is an extended version of our paper published in the 8th International Conference on Activity and Behavior Computing, 9–12 May 2026, at Hokkaido, Japan.

Electronics 2026, 15(11), 2361; https://doi.org/10.3390/electronics15112361

Submission received: 5 May 2026 / Revised: 19 May 2026 / Accepted: 20 May 2026 / Published: 29 May 2026

(This article belongs to the Section Artificial Intelligence)


Abstract

Multimodal reinforcement learning agents must fuse signals with vastly different noise profiles, yet existing architectures, whether monolithic (π₀, DreamerV3) or modular (MSDP, VTDexManip), allow noise from unreliable modalities to contaminate reliable ones at the point of fusion. We propose filter before mixing: each modality's representation is independently refined by a per-modality Flow Matching module before spectral-domain fusion via a Fourier Neural Operator (FNO), with a residual gate ensuring that refinement is never harmful. The resulting architecture, FreamerV1 (Filter-before-mixing dreamer), has 93M parameters (0.4M trainable). On MiniGrid, FreamerV1 reaches 87.7 ± 8.2% success (three seeds) at 5000 episodes, while the encoder-only baseline degrades to 78% due to catastrophic forgetting. With OGM-GE (On-the-fly Gradient Modulation) for adaptive per-modality gate control, FreamerV1 achieves an 8.0% relative improvement in success rate over manual tuning, with halved seed-to-seed variance (three seeds). On Crafter (no language modality), it achieves an 11.7% relative improvement over DreamerV3 in the official Crafter score (geometric mean of 22 achievement success rates; 10 seeds). On PAMAP2 wearable sensors, where no pretrained encoder exists, the foundation encoder achieves 2.4× higher reward and 16× lower variance than a vanilla MLP, confirming that the filter-before-mixing advantage grows with encoder noise.
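To make the order of operations concrete, the following is a minimal NumPy sketch of the filter-before-mixing idea, not the FreamerV1 implementation. The abstract does not give the gate's functional form, the flow-matching integrator, or the FNO layer, so everything here is an assumption: a fixed-step Euler integrator stands in for the Flow Matching module, the gate is a convex blend in which a gate of 0 returns the raw features untouched, and a single per-mode complex weighting of the lowest Fourier modes stands in for the FNO. All function and parameter names are hypothetical.

```python
import numpy as np

def euler_flow_refine(z, velocity_fn, steps=4):
    """Integrate dz/dt = v(z, t) from t=0 to t=1 with fixed-step Euler.
    (Assumed stand-in for a learned per-modality Flow Matching module.)"""
    dt = 1.0 / steps
    for k in range(steps):
        z = z + dt * velocity_fn(z, k * dt)
    return z

def residual_gate(z_raw, z_refined, gate):
    """Blend raw and refined features. With gate=0 the raw features pass
    through unchanged, so refinement can be switched off per modality."""
    return z_raw + gate * (z_refined - z_raw)

def spectral_fuse(features, weights, n_modes):
    """Fuse modalities in the Fourier domain, keeping the lowest n_modes.
    features: (M, D) real array; weights: (M, >=n_modes) complex array.
    (Assumed simplification of one FNO spectral layer.)"""
    dim = features.shape[-1]
    spec = np.fft.rfft(features, axis=-1)            # shape (M, D//2 + 1)
    fused = np.zeros(spec.shape[-1], dtype=complex)
    fused[:n_modes] = (weights[:, :n_modes] * spec[:, :n_modes]).sum(axis=0)
    return np.fft.irfft(fused, n=dim)                # back to shape (D,)

def filter_before_mixing(raw, velocity_fns, gates, weights, n_modes):
    """Refine and gate each modality on its own, then fuse."""
    filtered = np.stack([
        residual_gate(z, euler_flow_refine(z, v), g)
        for z, v, g in zip(raw, velocity_fns, gates)
    ])
    return spectral_fuse(filtered, weights, n_modes)
```

The point of the sketch is the ordering: each modality passes through its own refinement and gate before any cross-modality operation, so a noisy modality can be damped, or left untouched by setting its gate to 0, without altering the others.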

Keywords: multimodal reinforcement learning; per-modality denoising; flow matching; Fourier Neural Operator; spectral fusion; slot attention; catastrophic forgetting; wearable health management; world model; episodic memory
