Abstract
Synthetic aperture radar (SAR) object detection offers significant advantages in remote sensing applications, particularly under adverse weather or low-light conditions. However, single-modal SAR object detection faces numerous challenges, including speckle noise, limited texture information, and interference from complex backgrounds. To address these issues, we present the Modality-Aware Adaptive Interaction Enhancement Network (MAIENet), a multimodal detection framework designed to extract complementary information from SAR and optical images, thereby enhancing object detection performance. MAIENet comprises three primary components: a batch-wise splitting and channel-wise concatenation (BSCC) module, a modality-aware adaptive interaction enhancement (MAIE) module, and a multi-directional focus (MF) module. The BSCC module extracts and reorganizes features from each modality to preserve their distinct characteristics. The MAIE module facilitates deeper cross-modal fusion through channel reweighting, deformable convolutions, atrous convolution, and attention mechanisms, enabling the network to emphasize critical modal information while suppressing interference. By integrating features from multiple spatial directions, the MF module expands the receptive field, allowing the model to adapt more effectively to complex scenes. MAIENet is end-to-end trainable and can be seamlessly integrated into existing detection networks with minimal modifications. Experimental results on the publicly available OGSOD-1.0 dataset demonstrate that MAIENet outperforms existing methods, achieving 90.8%.
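To make the BSCC idea concrete, the following is a minimal PyTorch sketch of batch-wise splitting followed by channel-wise concatenation, assuming SAR and optical samples are stacked along the batch dimension before the backbone. The class name `BSCCSketch` and the input layout are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class BSCCSketch(nn.Module):
    """Hypothetical sketch of batch-wise splitting and channel-wise
    concatenation: paired SAR and optical images stacked along the
    batch dimension are split apart and re-joined along channels, so
    each sample carries both modalities' features side by side."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (2B, C, H, W) -- first B samples SAR, last B samples optical (assumed layout)
        sar, opt = torch.chunk(x, 2, dim=0)   # two tensors of shape (B, C, H, W)
        return torch.cat([sar, opt], dim=1)   # fused tensor of shape (B, 2C, H, W)

# Usage: a stacked batch of 4 SAR + 4 optical 3-channel images
fused = BSCCSketch()(torch.randn(8, 3, 64, 64))
print(fused.shape)  # torch.Size([4, 6, 64, 64])
```

Under this sketch, downstream fusion modules (such as MAIE's channel reweighting) would operate on the concatenated 2C channels, which is one plausible way to let per-modality characteristics remain separable after reorganization.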