GaPMeS: Gaussian Patch-Level Mixture-of-Experts Splatting for Computation-Limited Sparse-View Feed-Forward 3D Reconstruction
Abstract
1. Introduction
2. Related Work
2.1. Feed-Forward Gaussian Splatting
2.2. Mixture-of-Experts Model and Multi-Task Optimization
3. Methods
3.1. Multimodal Feature Alignment and Patch-Wise Processing
3.2. Task-Oriented Patch-Level Experts Selection
3.3. Multi-Task Decoupled Gaussian Parameter Prediction Module
3.4. Novel View Rendering and Loss Functions
4. Experiments and Discussion
4.1. Evaluation Metrics
4.2. Computational Constraint Experimental Setup
4.3. Effectiveness Comparison Experiments
- Depthsplat [20]: Employs a collaborative training architecture combining FF3DGS with the Depth Anything depth estimation model, enabling the simultaneous optimization of parameters in both models during training.
- Mvsplat [19]: Utilizes a cost-volume similarity computation method to determine candidate depths for each Gaussian ellipsoid, and adopts a cross-view Transformer architecture to extract multi-view features.
- Pixelsplat [18]: Applies an epipolar line pairing and sampling strategy to identify candidate depths for each Gaussian ellipsoid, also employing a cross-view Transformer for feature extraction.
- PixelNeRF [42]: Conditions a NeRF on pixel-aligned 2D image features from a convolutional encoder, enabling novel view synthesis from one or few input views.
- Transplat [43]: A Transformer-based method that exploits depth awareness to achieve high-quality, generalizable 3D reconstruction and novel view synthesis from sparse input images.
4.4. Ablation Studies and Subtask Independence Analysis
- All-Module: The complete model containing all components.
- w/o-Depth: Removes the depth branch from the multimodal input and its associated expert processing networks.
- w/o-Experts_MoE: Removes the feature-learning expert group and its corresponding gating network.
- w/o-Tasks_MoE: Removes the task-selection routing gating network, directly using the full feature map for Gaussian parameter prediction.
4.5. Generalization Experiments
5. Conclusions
- Gaussian Parameter Prediction via Multi-task Decoupling: We propose a task-decomposed framework that splits Gaussian parameter estimation into four structurally related sub-tasks (mean, covariance, color, and opacity). This strategy mitigates gradient conflicts caused by optimization imbalance in conventional methods, thereby improving geometric accuracy and appearance fidelity in complex scenes.
- Task-oriented Patch-level Mixture-of-Experts Selection: A dual-gating routing mechanism is introduced to adaptively select the most suitable input modalities and expert combinations for each sub-task at the image-patch level. This design enables fine-grained alignment across depth, texture, and semantic features, alleviating representation conflicts and information loss caused by coarse feature fusion.
- Efficient and Computation-Adaptive Architecture: The proposed method eliminates the need for end-to-end backbone optimization during training. Through sparse activation and path separation, GaPMeS maintains a lightweight parameter count (14.6 M) while surpassing all baselines under the same experimental settings. Specifically, it achieves an SSIM of 0.709 on the 4-view RealEstate10K dataset (+3.5%), a PSNR of 19.57 on the 2-view DL3DV dataset (+2.7%), and a 26.0% relative SSIM improvement on the custom industrial dataset.
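The task-oriented, patch-level top-k expert selection summarized above can be sketched in a few lines. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names and shapes are hypothetical, and a plain temperature-softmax gate stands in for the learned dual-gating router.

```python
import numpy as np

def route_patch(patch_feat, gate_w, top_k=2, tau=0.8):
    """Temperature-softmax gating over experts; keep only the top-k.

    patch_feat: (d,) feature vector for one image patch
    gate_w:     (num_experts, d) gating weights, one row per expert
    Returns (indices, weights) of the selected experts.
    """
    logits = gate_w @ patch_feat / tau            # (num_experts,)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                          # softmax over experts
    top = np.argsort(probs)[-top_k:][::-1]        # indices of the top-k experts
    w = probs[top] / probs[top].sum()             # renormalize selected weights
    return top, w

def moe_forward(patch_feat, gate_w, experts, top_k=2, tau=0.8):
    """Sparse MoE: evaluate only the selected experts and mix their outputs."""
    idx, w = route_patch(patch_feat, gate_w, top_k, tau)
    return sum(wi * experts[i](patch_feat) for wi, i in zip(w, idx))
```

Because only `top_k` of the expert networks run per patch, the activated parameter count stays small even as the expert pool grows, which is the sparse-activation property the efficiency claim above relies on.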
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Chen, T.; Yang, Q.; Chen, Y. Overview of NeRF Technology and Applications. J. Comput.-Aided Des. Comput. Graph. 2025, 37, 51–74. [Google Scholar] [CrossRef]
- Fei, B.; Xu, J.; Zhang, R.; Zhou, Q.; Yang, W.; He, Y. 3d gaussian splatting as new era: A survey. IEEE Trans. Vis. Comput. Graph. 2024, 31, 4429–4449. [Google Scholar] [CrossRef] [PubMed]
- Kerbl, B.; Kopanas, G.; Leimkühler, T.; Drettakis, G. 3D Gaussian splatting for real-time radiance field rendering. ACM Trans. Graph. 2023, 42, 139:1–139:14. [Google Scholar] [CrossRef]
- Wu, T.; Yuan, Y.; Zhang, L.; Yang, J.; Cao, Y.; Yan, L. Recent advances in 3d gaussian splatting. Comput. Vis. Media 2024, 10, 613–642. [Google Scholar] [CrossRef]
- Tan, Z.; Zhou, Z.; Ge, Y.; Wang, Z.; Chen, X.; Hu, D. Td-nerf: Novel truncated depth prior for joint camera pose and neural radiance field optimization. In Proceedings of the 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Abu Dhabi, United Arab Emirates, 14–18 October 2024; IEEE: New York, NY, USA, 2024; pp. 372–379. [Google Scholar]
- Wang, B.; Zhang, D.; Su, Y.; Zhang, H. Enhancing View Synthesis with Depth-Guided Neural Radiance Fields and Improved Depth Completion. Sensors 2024, 24, 1919. [Google Scholar] [CrossRef]
- Wang, J.; Xiao, J.; Zhang, X.; Xu, X.; Jin, T.; Jin, Z. Depth-based dynamic sampling of neural radiation fields. Electronics 2023, 12, 1053. [Google Scholar] [CrossRef]
- Deng, K.; Liu, A.; Zhu, J.-Y.; Ramanan, D. Depth-supervised nerf: Fewer views and faster training for free. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 12882–12891. [Google Scholar]
- Wang, J.; Shao, H.; Deng, X.; Jiang, Y. PIDSNeRF: Pose interpolation depth supervision neural radiance fields for view synthesis from challenging input. Multimed. Tools Appl. 2025, 84, 22539–22559. [Google Scholar] [CrossRef]
- Li, J.; Zhang, J.; Bai, X.; Zheng, J.; Ning, X.; Zhou, J.; Gu, L. Dngaussian: Optimizing sparse-view 3d gaussian radiance fields with global-local depth normalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 20775–20785. [Google Scholar]
- Kumar, R.; Vats, V. Few-shot novel view synthesis using depth aware 3D gaussian splatting. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024; Springer Nature: Cham, Switzerland, 2024; pp. 1–13. [Google Scholar]
- Xu, J.; Gao, S.; Shan, Y. Freesplatter: Pose-free gaussian splatting for sparse-view 3d reconstruction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Honolulu, HI, USA, 19–25 October 2025; pp. 25442–25452. [Google Scholar]
- Jiang, C.; Gao, R.; Shao, K.; Wang, Y.; Xiong, R.; Zhang, Y. Li-gs: Gaussian splatting with lidar incorporated for accurate large-scale reconstruction. IEEE Robot. Autom. Lett. 2024, 10, 1864–1871. [Google Scholar] [CrossRef]
- Zhou, C.; Fu, L.; Peng, S.; Yan, Y.; Zhang, Z.; Chen, Y.; Xia, J.; Zhou, X. LiDAR-RT: Gaussian-based ray tracing for dynamic lidar re-simulation. In Proceedings of the Computer Vision and Pattern Recognition Conference, Shanghai, China, 15–18 October 2025; pp. 1538–1548. [Google Scholar]
- Peng, R.; Xu, W.; Tang, L.; Wang, R.; Xu, W. Structure consistent gaussian splatting with matching prior for few-shot novel view synthesis. Adv. Neural Inf. Process. Syst. 2024, 37, 97328–97352. [Google Scholar]
- Han, L.; Zhou, J.; Liu, Y.S.; Zhou, J. Binocular-guided 3d gaussian splatting with view consistency for sparse view synthesis. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 10–15 December 2024; Volume 37, pp. 68595–68621. [Google Scholar]
- Guo, W.; Xu, X.; Yin, H.; Wang, Z.; Feng, J.; Zhou, J.; Lu, J. IGL-Nav: Incremental 3D Gaussian Localization for Image-goal Navigation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Honolulu, HI, USA, 19–25 October 2025; pp. 6808–6817. [Google Scholar]
- Charatan, D.; Li, S.; Tagliasacchi, A.; Sitzmann, V. pixelsplat: 3d gaussian splats from image pairs for scalable generalizable 3d reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 19457–19467. [Google Scholar]
- Chen, Y.; Xu, H.; Zheng, C.; Zhuang, B.; Pollefeys, M.; Geiger, A.; Cham, T.-J.; Cai, J. Mvsplat: Efficient 3d gaussian splatting from sparse multi-view images. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024; Springer Nature: Cham, Switzerland, 2024; pp. 370–386. [Google Scholar]
- Xu, H.; Peng, S.; Wang, F.; Blum, H.; Barath, D.; Geiger, A.; Pollefeys, M. Depthsplat: Connecting gaussian splatting and depth. In Proceedings of the Computer Vision and Pattern Recognition Conference, Shanghai, China, 15–18 October 2025; pp. 16453–16463. [Google Scholar]
- Shao, J.; Zhang, H.; Miao, J. Depthanything and SAM for UIE: Exploring large model information contributes to underwater image restoration. Mach. Vis. Appl. 2025, 36, 47. [Google Scholar] [CrossRef]
- Zheng, Y.; Jiang, Z.; He, S.; Sun, Y.; Dong, J.; Zhang, H.; Du, Y. NexusGS: Sparse View Synthesis with Epipolar Depth Priors in 3D Gaussian Splatting. In Proceedings of the Computer Vision and Pattern Recognition Conference, Shanghai, China, 15–18 October 2025; pp. 26800–26809. [Google Scholar]
- Tong, S.; Liu, Z.; Zhai, Y.; Ma, Y.; Lecun, Y.; Xie, S. Eyes wide shut? exploring the visual shortcomings of multimodal llms. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 9568–9578. [Google Scholar]
- Oquab, M.; Darcet, T.; Moutakanni, T.; Vo, H.; Szafraniec, M.; Khalidov, V.; Fernandez, P.; Haziza, D.; Massa, F.; El-Nouby, A.; et al. Dinov2: Learning robust visual features without supervision. arXiv 2023, arXiv:2304.07193. [Google Scholar]
- Huang, Y.; Zou, J.; Meng, L.; Yue, X.; Zhao, Q.; Li, J.; Song, C.; Jimenez, G.; Li, S.; Fu, G. Comparative analysis of imagenet pre-trained deep learning models and dinov2 in medical imaging classification. In Proceedings of the 2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC), Osaka, Japan, 2–4 July 2024; IEEE: New York, NY, USA, 2024; pp. 297–305. [Google Scholar]
- Gan, W.; Ning, Z.; Qi, Z.; Yu, P.S. Mixture of experts (moe): A big data perspective. Inf. Fusion 2026, 127, 103664. [Google Scholar] [CrossRef]
- Yu, J.; Liu, X.; Luo, C.; Huang, J.; Zhou, R.; Liu, Y.; Hu, J.; Chen, J.; Zhang, K.; Zhang, D.; et al. Multitask learning 1997–2024: Part I fundamentals. Harv. Data Sci. Rev. 2025, 7. [Google Scholar] [CrossRef]
- Chen, J.; Er, M.J. Mitigating gradient conflicts via expert squads in multi-task learning. Neurocomputing 2025, 614, 128832. [Google Scholar] [CrossRef]
- Ma, J.; Zhao, Z.; Yi, X.; Chen, J.; Hong, L.; Chi, E.H. Modeling task relationships in multi-task learning with multi-gate mixture-of-experts. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 1930–1939. [Google Scholar]
- Tang, A.; Shen, L.; Luo, Y.; Yin, N.; Zhang, L.; Tao, D. Merging Multi-Task Models via Weight-Ensembling Mixture of Experts. In Proceedings of the International Conference on Machine Learning, Vienna, Austria, 21–27 July 2024; pp. 47778–47799. [Google Scholar]
- Zhu, X.; Hu, Y.; Mo, F.; Wu, J.; Zhu, X. Uni-med: A unified medical generalist foundation model for multi-task learning via connector-MoE. Adv. Neural Inf. Process. Syst. 2024, 37, 81225–81256. [Google Scholar]
- Li, Y.; Li, X.; Li, Y.; Zhang, Y.; Dai, Y.; Hou, Q.; Cheng, M.-M.; Yang, J. Sm3det: A unified model for multi-modal remote sensing object detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 25 February–4 March 2025. [Google Scholar]
- Herrmann, C.; Bowen, R.S.; Zabih, R. Channel selection using gumbel softmax. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer International Publishing: Cham, Switzerland, 2020; pp. 241–257. [Google Scholar]
- Chen, L.; Xiang, Z.; Lei, K.; Zhang, X.-Y. Multi-Task Model Fusion via Adaptive Merging. In Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, India, 6–11 April 2025; IEEE: New York, NY, USA, 2025; pp. 1–5. [Google Scholar]
- Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 586–595. [Google Scholar]
- Shah, S.R.; Qadri, S.; Bibi, H.; Shah, S.M.W.; Sharif, M.I.; Marinello, F. Comparing inception V3, VGG 16, VGG 19, CNN, and ResNet 50: A case study on early detection of a rice disease. Agronomy 2023, 13, 1633. [Google Scholar] [CrossRef]
- Zhou, T.; Tucker, R.; Flynn, J.; Fyffe, G.; Snavely, N. Stereo magnification: Learning view synthesis using multiplane images. ACM Trans. Graph. (TOG) 2018, 37, 1–12. [Google Scholar] [CrossRef]
- Ling, L.; Sheng, Y.; Tu, Z.; Zhao, W.; Xin, C.; Wan, K.; Yu, L.; Guo, Q.; Yu, Z.; Lu, Y.; et al. Dl3dv-10k: A large-scale scene dataset for deep learning-based 3d vision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 22160–22169. [Google Scholar]
- Zhao, Q.; Li, T.; Du, M.; Jiang, Y.; Sun, Q.; Wang, Z.; Liu, H.; Xu, H. UniMatch: A Unified User-Item Matching Framework for the Multi-purpose Merchant Marketing. In Proceedings of the 2023 IEEE 39th International Conference on Data Engineering (ICDE), Anaheim, CA, USA, 3–7 April 2023; IEEE: New York, NY, USA, 2023; pp. 3309–3321. [Google Scholar]
- Loshchilov, I.; Hutter, F. Decoupled Weight Decay Regularization. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
- Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Chen, W. Lora: Low-rank adaptation of large language models. In Proceedings of the International Conference on Learning Representations, Virtual, 25–29 April 2022; Volume 1, p. 3. [Google Scholar]
- Yu, A.; Ye, V.; Tancik, M.; Kanazawa, A. pixelnerf: Neural radiance fields from one or few images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 4578–4587. [Google Scholar]
- Zhang, C.; Zou, Y.; Li, Z.; Yi, M.; Wang, H. Transplat: Generalizable 3d gaussian splatting from sparse multi-view images with transformers. Proc. AAAI Conf. Artif. Intell. 2025, 39, 9869–9877. [Google Scholar] [CrossRef]
- Jia, H.; Zhu, L.; Zhao, N. H3R: Hybrid Multi-view Correspondence for Generalizable 3D Reconstruction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Honolulu, HI, USA, 19–25 October 2025; pp. 7655–7665. [Google Scholar]
- Yu, T.; Kumar, S.; Gupta, A.; Levine, S.; Hausman, K.; Finn, C. Gradient surgery for multi-task learning. Adv. Neural Inf. Process. Syst. 2020, 33, 5824–5836. [Google Scholar]
- Wang, J.; Chen, M.; Karaev, N.; Vedaldi, A.; Rupprecht, C.; Novotny, D. Vggt: Visual geometry grounded transformer. In Proceedings of the Computer Vision and Pattern Recognition Conference, Shanghai, China, 15–18 October 2025; pp. 5294–5306. [Google Scholar]
- Mildenhall, B.; Srinivasan, P.P.; Tancik, M.; Barron, J.T.; Ramamoorthi, R.; Ng, R. Nerf: Representing scenes as neural radiance fields for view synthesis. Commun. ACM 2021, 65, 99–106. [Google Scholar] [CrossRef]
- Xu, Q.; Xu, Z.; Philip, J.; Bi, S.; Shu, Z.; Sunkavalli, K.; Neumann, U. Point-nerf: Point-based neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5438–5448. [Google Scholar]






| Parameter | Value | Description |
|---|---|---|
| gaussians_per_pixel | 1 | Number of Gaussians predicted per pixel |
| gaussian_adapter.gaussian_scale_min | 1 × 10⁻¹⁰ | Minimum Gaussian scale |
| gaussian_adapter.gaussian_scale_max | 3.0 | Maximum Gaussian scale |
| gaussian_adapter.sh_degree | 2 | Spherical harmonics degree |
| trainer.max_steps | 300,001 | Maximum training steps |
| MOE.num_moe_experts | 6 | Number of MoE experts |
| MOE.num_moe_tasks | 4 | Number of MoE sub-tasks |
| MOE.moe_top_k | 2 | Experts activated per routing step (top-k) |
| MOE.moe_hidden_dim | 256 | Channel width of each expert’s CNN |
| MOE.moe_routing_dim | 128 | Hidden dimension of the routing network |
| MOE.patch_size | 16 | Patch size (pixels) |
| MOE.τ | 0.8 | Routing softmax temperature |
| loss.lpips.weight | 0.25 | LPIPS loss weight |
| LR_rate | 0.001 | Optimizer learning rate |
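The settings above map directly onto a flat configuration dictionary. The sketch below mirrors the table and adds a few illustrative consistency checks; the keys follow the table (τ spelled `tau`), but `validate` is an assumption of ours, not part of any released code.

```python
# Configuration mirroring the settings table; values are the paper's defaults.
config = {
    "gaussians_per_pixel": 1,
    "gaussian_adapter.gaussian_scale_min": 1e-10,
    "gaussian_adapter.gaussian_scale_max": 3.0,
    "gaussian_adapter.sh_degree": 2,
    "trainer.max_steps": 300_001,
    "MOE.num_moe_experts": 6,
    "MOE.num_moe_tasks": 4,
    "MOE.moe_top_k": 2,
    "MOE.moe_hidden_dim": 256,
    "MOE.moe_routing_dim": 128,
    "MOE.patch_size": 16,
    "MOE.tau": 0.8,            # τ in the table above
    "loss.lpips.weight": 0.25,
    "LR_rate": 0.001,
}

def validate(cfg):
    """Basic consistency checks before training (illustrative only)."""
    assert cfg["MOE.moe_top_k"] <= cfg["MOE.num_moe_experts"]
    assert cfg["MOE.tau"] > 0
    assert (cfg["gaussian_adapter.gaussian_scale_min"]
            < cfg["gaussian_adapter.gaussian_scale_max"])
    return True
```

Note that with `sh_degree = 2`, each Gaussian carries (2 + 1)² = 9 spherical-harmonics coefficients per color channel.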
| Model Type | Params (M) | SSIM ↑ | PSNR ↑ | LPIPS ↓ | Inference FPS | Inference Memory |
|---|---|---|---|---|---|---|
| Ours | 14.6 | 0.709 | 20.20 | 0.258 | ~51 | 1998 MB |
| Transplat [43] | 15.6 | 0.681 | 19.62 | 0.281 | ~49 | 2348 MB |
| Depthsplat [20] | 117.0 | 0.685 | 19.80 | 0.272 | ~28 | 5224 MB |
| Mvsplat [19] | 16.0 | 0.653 | 19.15 | 0.291 | ~47 | 2906 MB |
| Pixelsplat [18] | 125.8 | 0.638 | 18.90 | 0.305 | ~32 | 5319 MB |
| PixelNeRF [42] | 28.2 | 0.589 | 16.78 | 0.324 | ~34 | 3478 MB |
| Model Type | Params (M) | SSIM ↑ | PSNR ↑ | LPIPS ↓ | Inference FPS | Inference Memory |
|---|---|---|---|---|---|---|
| Ours | 14.6 | 0.624 | 19.57 | 0.359 | ~44 | 1998 MB |
| Transplat [43] | 15.6 | 0.593 | 18.98 | 0.374 | ~37 | 2228 MB |
| Depthsplat [20] | 117.0 | 0.610 | 19.05 | 0.363 | ~25 | 5224 MB |
| Mvsplat [19] | 16.0 | 0.529 | 17.54 | 0.402 | ~34 | 2906 MB |
| Pixelsplat [18] | 125.8 | OOM | OOM | OOM | ~23 | 5319 MB |
| PixelNeRF [42] | 28.2 | 0.492 | 15.92 | 0.381 | ~33 | 3478 MB |
| Model Type | SSIM ↑ | PSNR ↑ | LPIPS ↓ |
|---|---|---|---|
| Ours | 0.644 | 22.33 | 0.312 |
| Depthsplat [20] | OOM | OOM | OOM |
| Mvsplat [19] | 0.502 | 16.66 | 0.422 |
| Pixelsplat [18] | OOM | OOM | OOM |
| PixelNeRF [42] | OOM | OOM | OOM |
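As a quick cross-check, the headline relative gains quoted in the conclusions can be reproduced from the comparison tables above. Taking Depthsplat as the strongest baseline on each benchmark (our reading of the tables, stated as an assumption):

```python
def rel_gain(ours, baseline):
    """Relative improvement of a higher-is-better metric over a baseline."""
    return (ours - baseline) / baseline

# RealEstate10K (4 views), SSIM: ours 0.709 vs. best baseline 0.685
print(f"{rel_gain(0.709, 0.685):+.1%}")   # → +3.5%
# DL3DV (2 views), PSNR: ours 19.57 vs. best baseline 19.05
print(f"{rel_gain(19.57, 19.05):+.1%}")   # → +2.7%
```

Both values match the figures reported in the contributions and conclusions.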
| Model Type | SSIM ↑ | PSNR ↑ | LPIPS ↓ |
|---|---|---|---|
| All-Module | 0.709 | 20.20 | 0.258 |
| w/o-Depth | 0.473 | 15.05 | 0.486 |
| w/o-Experts_MoE | 0.511 | 15.78 | 0.488 |
| w/o-Tasks_MoE | 0.466 | 14.99 | 0.479 |
| Parameter | Value | SSIM ↑ | PSNR ↑ | LPIPS ↓ |
|---|---|---|---|---|
| MOE.τ | 0.8 | 0.709 | 20.20 | 0.258 |
| 0.6 | 0.683 | 19.89 | 0.267 | |
| 0.4 | 0.682 | 19.91 | 0.271 | |
| MOE.num_moe_experts | 4 | 0.613 | 14.78 | 0.311 |
| 6 | 0.709 | 20.20 | 0.258 | |
| 8 | 0.687 | 17.48 | 0.299 | |
| MOE.moe_top_k | 1 | 0.691 | 19.64 | 0.272 |
| 2 | 0.709 | 20.20 | 0.258 | |
| 4 | 0.642 | 16.54 | 0.293 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Liu, J.; Liu, W.; Guo, R. GaPMeS: Gaussian Patch-Level Mixture-of-Experts Splatting for Computation-Limited Sparse-View Feed-Forward 3D Reconstruction. Appl. Sci. 2026, 16, 1108. https://doi.org/10.3390/app16021108
