Abstract
We propose Res-FormerNet, a hybrid inversion network that integrates a lightweight Transformer encoder into a ResNet50 backbone for two-dimensional magnetotelluric (MT) inversion. The architecture jointly exploits residual convolutional blocks for local feature extraction and global attention for capturing long-range spatial dependencies in geoelectrical resistivity models. To evaluate the proposed architecture, more than 100,000 synthetic resistivity models are forward-modeled with a two-dimensional staggered-grid finite-difference solver to build training and validation datasets of TE- and TM-mode apparent resistivity responses, to which realistic noise levels are added to simulate field acquisition conditions. A smoothness-aware loss function is further introduced to improve inversion stability and structural continuity. Synthetic tests demonstrate that incorporating the Transformer encoder substantially improves the recovery of large-scale anomalies, structural boundaries, and resistivity contrasts relative to the original ResNet50. The method also generalizes well to real MT field data from southern Africa, producing inversion results highly consistent with those of the nonlinear conjugate gradient (NLCG) method. These findings confirm that Res-FormerNet provides an effective and robust framework for MT inversion and illustrate the potential of hybrid convolution–attention networks for advancing data-driven electromagnetic inversion.
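The abstract does not specify the exact form of the smoothness-aware loss. As an illustration only, the following is a minimal PyTorch sketch of one common formulation: a mean-squared data misfit combined with a first-difference roughness penalty on the predicted resistivity section. The weighting factor lam and the (batch, 1, H, W) tensor layout are assumptions for this sketch, not details taken from the paper.

    import torch

    def smoothness_aware_loss(pred: torch.Tensor,
                              target: torch.Tensor,
                              lam: float = 0.1) -> torch.Tensor:
        # Sketch of a smoothness-aware loss; lam and the tensor
        # layout (batch, 1, H, W) are assumed, not from the paper.
        # Mean-squared misfit between predicted and reference models.
        mse = torch.mean((pred - target) ** 2)
        # First differences along depth (H) and horizontal distance (W);
        # penalizing them encourages structurally continuous models.
        dz = pred[:, :, 1:, :] - pred[:, :, :-1, :]
        dx = pred[:, :, :, 1:] - pred[:, :, :, :-1]
        roughness = torch.mean(dz ** 2) + torch.mean(dx ** 2)
        return mse + lam * roughness

In such a formulation, lam trades data fit against structural continuity: larger values yield smoother inverted sections at the cost of sharper resistivity boundaries.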