This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Article

Filter Before Mixing: Per-Modality Denoising for Multimodal RL with Application to Health Management †

Department of Artificial Intelligence, Graduate School of Computer Science and Systems Engineering, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka-shi 820-8502, Fukuoka, Japan
† This paper is an extended version of our paper published in the 8th International Conference on Activity and Behavior Computing, 9–12 May 2026, in Hokkaido, Japan.
Electronics 2026, 15(11), 2361; https://doi.org/10.3390/electronics15112361
Submission received: 5 May 2026 / Revised: 19 May 2026 / Accepted: 20 May 2026 / Published: 29 May 2026
(This article belongs to the Section Artificial Intelligence)

Abstract

Multimodal reinforcement learning agents must fuse signals with vastly different noise profiles, yet existing architectures, whether monolithic (π0, DreamerV3) or modular (MSDP, VTDexManip), allow noise from unreliable modalities to contaminate reliable ones at the point of fusion. We propose filter before mixing: each modality's representation is independently refined by a per-modality Flow Matching module before spectral-domain fusion via a Fourier Neural Operator (FNO), with a residual gate ensuring that refinement is never harmful. The resulting architecture, FreamerV1 (Filter-before-mixing Dreamer), has 93M parameters, of which 0.4M are trainable. On MiniGrid, FreamerV1 reaches a success rate of 87.7 ± 8.2% (3 seeds) at 5000 episodes, whereas the encoder-only baseline degrades to 78% due to catastrophic forgetting. With OGM-GE (On-the-fly Gradient Modulation with Generalization Enhancement) providing adaptive per-modality gate control, FreamerV1 achieves an 8.0% relative improvement in success rate over manual tuning and halves the seed-to-seed variance (3 seeds). On Crafter, which has no language modality, it achieves an 11.7% relative improvement over DreamerV3 in the official Crafter score (computed from a geometric mean over the success rates of its 22 achievements; 10 seeds). On PAMAP2 wearable sensor data, for which no pretrained encoder exists, the foundation encoder achieves 2.4× higher reward and 16× lower variance than a vanilla MLP, confirming that the filter-before-mixing advantage grows with encoder noise.
Keywords: multimodal reinforcement learning; per-modality denoising; flow matching; Fourier Neural Operator; spectral fusion; slot attention; catastrophic forgetting; wearable health management; world model; episodic memory
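The filter-before-mixing ordering described in the abstract can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the FreamerV1 implementation: the gate form `z + g * (refined - z)`, stacking modality tokens along one axis before the FFT, and the helper names `gated_refine` and `spectral_fuse` are all hypothetical. The spectral step follows the usual FNO recipe: transform to the frequency domain, mix channels linearly on a truncated set of low-frequency modes, and transform back.

```python
import numpy as np


def gated_refine(z, refined, gate):
    """Residual gate (assumed form): z + g * (refined - z), with g clipped to [0, 1].

    g = 0 returns the unrefined features unchanged, so a learned gate can always
    fall back to the encoder output if the refiner does not help.
    """
    g = float(np.clip(gate, 0.0, 1.0))
    return z + g * (refined - z)


def spectral_fuse(tokens, weights, modes):
    """FNO-style spectral layer applied along a token axis (assumed layout).

    tokens:  (L, C) real array of per-modality tokens stacked along L
    weights: (M, C, C) complex array, one channel-mixing matrix per kept mode
    modes:   number of lowest frequencies kept; all higher frequencies are zeroed
    """
    L, _ = tokens.shape
    spec = np.fft.rfft(tokens, axis=0)              # (L // 2 + 1, C), complex
    out_spec = np.zeros_like(spec)
    k = min(modes, spec.shape[0], weights.shape[0])
    # Per-mode channel mixing: out[m, d] = sum_c spec[m, c] * W[m, c, d]
    out_spec[:k] = np.einsum("mc,mcd->md", spec[:k], weights[:k])
    return np.fft.irfft(out_spec, n=L, axis=0)      # back to (L, C), real


# Usage with random stand-ins for two modalities (hypothetical shapes and values).
rng = np.random.default_rng(0)
C = 8
z_vision = rng.normal(size=(4, C))
z_lang = rng.normal(size=(4, C))
# Placeholder refiners; in FreamerV1 these would be per-modality Flow Matching modules.
r_vision, r_lang = 0.9 * z_vision, 0.9 * z_lang

# Filter: each modality is refined and gated on its own ...
tokens = np.concatenate(
    [gated_refine(z_vision, r_vision, 0.5), gated_refine(z_lang, r_lang, 0.0)], axis=0
)
# ... and only then mixed in the frequency domain, with a residual connection.
W = (rng.normal(size=(3, C, C)) + 1j * rng.normal(size=(3, C, C))) / C
fused = tokens + spectral_fuse(tokens, W, modes=3)
print(fused.shape)  # (8, 8)
```

The point the sketch illustrates is ordering: every modality passes through its own gate before any cross-modal operation, so a gate at 0 prevents an unhelpful refiner from altering that modality's features. A complete FNO layer would typically also include a pointwise linear path and a nonlinearity, which are omitted here.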
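The per-modality Flow Matching refinement can be illustrated with the standard conditional flow matching objective on a linear interpolation path, where the velocity field is regressed onto `x1 - x0`. This sketch assumes that refinement maps a noisy feature `x0` toward a cleaner feature `x1`; the abstract does not state what FreamerV1 uses as source and target, and `cfm_loss`, `euler_refine`, and the toy shift are hypothetical. Under filter before mixing, each modality would have its own velocity field.

```python
import numpy as np


def cfm_loss(velocity_fn, x0, x1, t):
    """Conditional flow matching loss on the linear path x_t = (1 - t) x0 + t x1.

    x0, x1: (B, D) source and target features; t: (B,) times in [0, 1].
    Along this path the target velocity is constant: u = x1 - x0.
    """
    xt = (1.0 - t)[:, None] * x0 + t[:, None] * x1
    target = x1 - x0
    pred = velocity_fn(xt, t)
    return float(np.mean(np.sum((pred - target) ** 2, axis=1)))


def euler_refine(velocity_fn, x0, steps=10):
    """Refine features by integrating dx/dt = v(x, t) from t = 0 to t = 1 (Euler)."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / steps
    for i in range(steps):
        t = np.full(x.shape[0], i * dt)
        x = x + dt * velocity_fn(x, t)
    return x


# Toy check: if clean features differ from noisy ones by a fixed shift, the exact
# velocity field is that shift, so the loss is ~0 and Euler integration recovers x1.
rng = np.random.default_rng(0)
x0 = rng.normal(size=(16, 4))                 # hypothetical noisy modality features
shift = np.array([1.0, -0.5, 0.0, 2.0])
x1 = x0 + shift                               # hypothetical clean targets


def oracle(x, t):
    """Exact velocity for the toy problem (ignores t)."""
    return np.broadcast_to(shift, x.shape)


t = rng.uniform(size=16)
print(cfm_loss(oracle, x0, x1, t))                 # close to 0.0
print(np.allclose(euler_refine(oracle, x0), x1))  # True
```

In practice `velocity_fn` would be a small trainable network conditioned on `t`; the toy oracle only confirms that the loss and the integrator are wired consistently.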
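OGM-GE, which the abstract uses for adaptive per-modality gate control, damps the gradient of whichever modality currently dominates. The coefficient below follows the two-modality rule from the original OGM-GE paper (Peng et al., CVPR 2022). How FreamerV1 computes its per-modality scores, extends the rule beyond two modalities, or routes it to the gates is not described in the abstract, so `modulate_gate_grads` and the value `alpha=0.5` are illustrative assumptions.

```python
import math
import random


def ogm_coefficients(score_a, score_b, alpha=0.5):
    """Two-modality OGM coefficients (after Peng et al., CVPR 2022).

    score_a, score_b: positive per-modality scores (in the original paper, the summed
    softmax probability each unimodal head assigns to the correct label on a batch).
    The dominant modality (ratio > 1) is damped by 1 - tanh(alpha * ratio); the
    weaker one keeps coefficient 1.
    """
    rho_a = score_a / score_b
    rho_b = score_b / score_a
    k_a = 1.0 - math.tanh(alpha * rho_a) if rho_a > 1.0 else 1.0
    k_b = 1.0 - math.tanh(alpha * rho_b) if rho_b > 1.0 else 1.0
    return k_a, k_b


def modulate_gate_grads(grads, coeff, noise_std=0.0, rng=None):
    """Hypothetical helper: scale one modality's gate gradients by its coefficient.

    noise_std > 0 adds isotropic Gaussian noise as a simplified stand-in for the
    generalization-enhancement (GE) noise term described in the OGM-GE paper.
    """
    rng = rng or random.Random(0)
    return [coeff * g + (rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0) for g in grads]


# Hypothetical scores: vision currently dominates language by a factor of 2.
k_vision, k_lang = ogm_coefficients(score_a=2.0, score_b=1.0)
print(k_vision, k_lang)  # k_vision < 1 (damped), k_lang == 1.0
print(modulate_gate_grads([1.0, -2.0], k_vision))
```

The asymmetry is the design choice: only the stronger modality is slowed down, which gives the weaker modality's parameters more room to catch up instead of boosting them directly.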
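The official Crafter score referenced in the abstract aggregates per-achievement success rates with a log-mean rather than an arithmetic mean: S = exp(mean_i ln(1 + s_i)) - 1, with each s_i in percent. The function below implements that definition. `relative_improvement` is a hypothetical helper that shows how a figure such as "11.7% relative improvement" is computed; no results from the paper are reproduced here.

```python
import math


def crafter_score(success_rates):
    """Crafter score: S = exp(mean_i ln(1 + s_i)) - 1, with each s_i in percent.

    Crafter defines 22 achievements. Because ln is concave, gains on rarely
    unlocked achievements raise the score more than equal gains on common ones.
    """
    if not success_rates:
        raise ValueError("need at least one achievement success rate")
    mean_log = sum(math.log(1.0 + s) for s in success_rates) / len(success_rates)
    return math.exp(mean_log) - 1.0


def relative_improvement(new, baseline):
    """Relative improvement (new - baseline) / baseline, e.g. 0.117 for 11.7%."""
    return (new - baseline) / baseline


# Two achievements at 0% and 99%: the arithmetic mean would be 49.5,
# but the Crafter score is exp((ln 1 + ln 100) / 2) - 1 = 10 - 1.
print(crafter_score([0.0, 99.0]))  # approximately 9.0
```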

Share and Cite

MDPI and ACS Style

Okita, T. Filter Before Mixing: Per-Modality Denoising for Multimodal RL with Application to Health Management. Electronics 2026, 15, 2361. https://doi.org/10.3390/electronics15112361

AMA Style

Okita T. Filter Before Mixing: Per-Modality Denoising for Multimodal RL with Application to Health Management. Electronics. 2026; 15(11):2361. https://doi.org/10.3390/electronics15112361

Chicago/Turabian Style

Okita, Tsuyoshi. 2026. "Filter Before Mixing: Per-Modality Denoising for Multimodal RL with Application to Health Management." Electronics 15, no. 11: 2361. https://doi.org/10.3390/electronics15112361

APA Style

Okita, T. (2026). Filter Before Mixing: Per-Modality Denoising for Multimodal RL with Application to Health Management. Electronics, 15(11), 2361. https://doi.org/10.3390/electronics15112361

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
