Multi-Scale Gaussian Mixture Model-Gated Mixture of Experts for Fine-Grained Insect Pest Classification
Abstract
1. Introduction
- (1) GMM-gated MoE routing mechanism: To the best of our knowledge, this is the first application of analytic GMM posterior responsibilities as a spatial routing gate within a convolutional feature hierarchy. The posterior is computed in closed form, replacing deterministic attention weighting with a statistically grounded soft assignment of spatial locations to dedicated convolutional expert sub-networks.
- (2) Multi-scale routing architecture: Independent GMM-MoE modules are applied at three feature depths of a DenseNet-121 backbone, with multi-scale fusion performed via spatial alignment and channel projection.
- (3) Conditional prior mechanism π(x): An input-dependent formulation of the GMM mixing coefficients transforms the standard mixture model into a conditional mixture through a learnable blending coefficient α.
- (4) Precision-based variance parametrization: The component variances are parametrized through their precisions and combined with dimension-aware temperature scaling to ensure consistent routing behavior across layers of varying channel dimensionality.
- (5) Data-driven expert initialization: Farthest-point sampling of real feature distributions is combined with calibrated variance initialization to prevent expert collapse during training.
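As an illustration of contribution (5), the following is a minimal NumPy sketch of farthest-point sampling used to seed the component means from real features. It is not the authors' code: the function names, the random choice of the first centre, and the use of the per-dimension empirical variance as the "calibrated" starting variance are all assumptions made for this example.

```python
import numpy as np


def farthest_point_sampling(features, k, seed=0):
    """Return indices of k rows of `features` (N, D) that are mutually far apart."""
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    chosen = [int(rng.integers(n))]  # assumption: first centre picked at random
    # Distance from every point to its nearest already-chosen centre.
    min_dist = np.linalg.norm(features - features[chosen[0]], axis=1)
    for _ in range(1, k):
        nxt = int(np.argmax(min_dist))  # point farthest from all chosen centres
        chosen.append(nxt)
        dist = np.linalg.norm(features - features[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist)
    return np.array(chosen)


def init_experts(features, k, seed=0):
    """Hypothetical initializer: FPS means plus a shared diagonal log-precision."""
    idx = farthest_point_sampling(features, k, seed)
    means = features[idx]
    # Assumption: start every component at the empirical per-dimension variance.
    var = features.var(axis=0) + 1e-6
    log_precision = np.tile(-np.log(var), (k, 1))
    return means, log_precision
```

Because each new centre is the point farthest from those already chosen, the initial means are spread over the feature distribution instead of clustering in one region, which is the property the contribution relies on to avoid early expert collapse.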
2. Materials and Methods
2.1. Dataset (IP102)
2.2. Backbone: DenseNet-121
2.3. Theoretical Background
2.3.1. Gaussian Mixture Model
2.3.2. Mixture of Experts
2.3.3. Relation Between GMM Responsibilities and Expert Routing
2.4. Proposed Method
2.4.1. Overview
- (i) Feature Projection: the input feature tensor is first passed through a 1 × 1 convolution followed by Layer Normalization to obtain a lower-dimensional projection. This projection serves as the common input to both the GMM routing computation and the expert sub-networks.
- (ii) E-Step (GMM Routing): for every spatial location, the analytic posterior responsibilities of the mixture components are computed in closed form, with a conditional prior π(x) that adapts the mixing coefficients to the input image content and a dimension-aware temperature that stabilizes the softmax across feature scales of differing dimensionality.
- (iii) M-Step (Expert Processing): parallel Conv2D–BatchNorm–ReLU expert sub-networks transform the projected representation independently, and their outputs are aggregated through a responsibility-weighted sum.
- (iv) Output Processing: the aggregated expert representation is fused with the original backbone tensor through a residual connection, with an adaptive blending weight that is gradually annealed during training.
- (v) Auxiliary Losses: a load balance term, encouraging uniform expert utilization, and a negative log-likelihood term, driving the GMM to fit the spatial feature distribution faithfully, are combined into a single auxiliary loss and added to the classification loss during training.
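Steps (ii) to (v) can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: features are flattened to N spatial locations of width D, each expert is a per-location linear map with ReLU (standing in for a Conv2D–BatchNorm–ReLU block), the dimension-aware temperature is taken to divide the log-posterior by T·√D, the conditional prior is a convex blend of a global and an input-dependent prior, and the load balance term is K·Σu² − 1 on the mean usage u. All function and parameter names are hypothetical.

```python
import numpy as np


def log_softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    return a - np.log(np.exp(a).sum(axis=axis, keepdims=True))


def conditional_log_prior(z, global_logits, w_prior, alpha=0.5):
    """Blend a global prior with an input-dependent one (assumed form)."""
    pooled = z.mean(axis=0)                              # average over locations
    pi_global = np.exp(log_softmax(global_logits))       # (K,)
    pi_input = np.exp(log_softmax(pooled @ w_prior))     # (K,)
    return np.log((1.0 - alpha) * pi_global + alpha * pi_input)


def gmm_responsibilities(z, means, log_prec, log_prior, temperature=1.0):
    """E-step: z (N, D); means, log_prec (K, D); log_prior (K,)."""
    d = z.shape[1]
    diff = z[:, None, :] - means[None, :, :]             # (N, K, D)
    # Diagonal Gaussian log-density written in terms of precisions.
    log_lik = 0.5 * (log_prec.sum(axis=1)[None, :]
                     - (diff ** 2 * np.exp(log_prec)[None]).sum(axis=2)
                     - d * np.log(2.0 * np.pi))
    log_joint = log_prior[None, :] + log_lik             # (N, K)
    # Assumption: temperature scaled by sqrt(D) to keep logits comparable.
    gamma = np.exp(log_softmax(log_joint / (temperature * np.sqrt(d)), axis=1))
    return gamma, log_joint


def moe_forward(x, z, expert_weights, gamma, beta=0.5):
    """Responsibility-weighted expert sum plus residual (assumed x + beta * y)."""
    expert_out = np.maximum(np.einsum("nd,kde->nke", z, expert_weights), 0.0)
    mixed = (gamma[:, :, None] * expert_out).sum(axis=1)  # (N, E)
    return x + beta * mixed


def auxiliary_losses(gamma, log_joint):
    """Load balance (0 when usage is uniform) and mixture NLL."""
    k = gamma.shape[1]
    usage = gamma.mean(axis=0)
    load_balance = k * np.sum(usage ** 2) - 1.0
    m = log_joint.max(axis=1, keepdims=True)
    nll = -np.mean(m[:, 0] + np.log(np.exp(log_joint - m).sum(axis=1)))
    return load_balance, nll
```

With a temperature of 1 and no √D scaling, `gamma` would be the exact GMM posterior; the temperature only softens or sharpens that assignment. Setting `beta` to zero returns the backbone features unchanged, which is one way to read the role of the annealed blending weight in step (iv).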
2.4.2. GMM-MoE Module
2.4.3. Conditional Prior Mechanism
2.4.4. Multi-Scale Ensemble Architecture
2.4.5. Data-Driven Expert Initialization
2.5. Experimental Setup
3. Results and Discussion
3.1. Ablation Study: Effect of Expert Count K
3.2. Final Model Performance and Reproducibility
3.3. Interpretability: Multi-Scale Routing Maps and Expert Specialization
3.4. Comparison with State of the Art
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Oerke, E.-C. Crop losses to pests. J. Agric. Sci. 2006, 144, 31–43.
2. Savary, S.; Willocquet, L.; Pethybridge, S.J.; Esker, P.; McRoberts, N.; Nelson, A. The global burden of pathogens and pests on major food crops. Nat. Ecol. Evol. 2019, 3, 430–439.
3. Parsa, S.; Morse, S.; Bonifacio, A.; Chancellor, T.C.B.; Condori, B.; Crespo-Pérez, V.; Hobbs, S.L.A.; Kroschel, J.; Ba, M.N.; Rebaudo, F.; et al. Obstacles to integrated pest management adoption in developing countries. Proc. Natl. Acad. Sci. USA 2014, 111, 3889–3894.
4. Aktar, W.; Sengupta, D.; Chowdhury, A. Impact of pesticides use in agriculture: Their benefits and hazards. Interdiscip. Toxicol. 2009, 2, 1–12.
5. Barzman, M.; Bàrberi, P.; Birch, A.N.E.; Boonekamp, P.; Dachbrodt-Saaydeh, S.; Graf, B.; Hommel, B.; Jensen, J.E.; Kiss, J.; Kudsk, P.; et al. Eight principles of integrated pest management. Agron. Sustain. Dev. 2015, 35, 1199–1215.
6. Lima, M.C.F.; Leandro, M.E.D.A.; Valero, C.; Coronel, L.C.P.; Bazzo, C.O.G. Automatic detection and monitoring of insect pests—A review. Agriculture 2020, 10, 161.
7. Austen, G.E.; Bindemann, M.; Griffiths, R.A.; Roberts, D.L. Species identification by experts and non-experts: Comparing images from field guides. Sci. Rep. 2016, 6, 33634.
8. Xie, C.; Wang, R.; Zhang, J.; Chen, P.; Dong, W.; Li, R.; Chen, T.; Chen, H. Multi-level learning features for automatic classification of field crop pests. Comput. Electron. Agric. 2018, 152, 233–241.
9. Kamilaris, A.; Prenafeta-Boldú, F.X. A review of the use of convolutional neural networks in agriculture. J. Agric. Sci. 2018, 156, 312–322.
10. Kamilaris, A.; Prenafeta-Boldú, F.X. Deep learning in agriculture: A survey. Comput. Electron. Agric. 2018, 147, 70–90.
11. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269.
12. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
13. Tan, M.; Le, Q.V. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the 36th International Conference on Machine Learning (ICML), Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114.
14. Wu, X.; Zhan, C.; Lai, Y.-K.; Cheng, M.-M.; Yang, J. IP102: A large-scale benchmark dataset for insect pest recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 8779–8788.
15. Gomes, J.C.; Borges, D.L. Insect pest image recognition: A few-shot machine learning approach including maturity stages classification. Agronomy 2022, 12, 1733.
16. Nanni, L.; Manfè, A.; Maguolo, G.; Lumini, A.; Brahnam, S. High performing ensemble of convolutional neural networks for insect pest image detection. Ecol. Inform. 2022, 67, 101515.
17. Liu, H.; Zhan, Y.; Xia, H.; Mao, Q.; Tan, Y. Self-supervised transformer-based pre-training method using latent semantic masking auto-encoder for pest and disease classification. Comput. Electron. Agric. 2022, 203, 107448.
18. Xia, W.; Han, D.; Li, D.; Wu, Z.; Han, B.; Wang, J. An ensemble learning integration of multiple CNN with improved vision transformer models for pest classification. Ann. Appl. Biol. 2023, 182, 144–158.
19. Chen, Y.; Chen, M.; Guo, M.; Wang, J.; Zheng, N. Pest recognition based on multi-image feature localization and adaptive filtering fusion. Front. Plant Sci. 2023, 14, 1282212.
20. Qian, Y.; Xiao, Z.; Deng, Z. Fine-grained crop pest classification based on multi-scale feature fusion and mixed attention mechanisms. Front. Plant Sci. 2025, 16, 1500571.
21. An, J.; Du, Y.; Hong, P.; Zhang, L.; Weng, X. Insect recognition based on complementary features from multiple views. Sci. Rep. 2023, 13, 2966.
22. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
23. Jacobs, R.A.; Jordan, M.I.; Nowlan, S.J.; Hinton, G.E. Adaptive mixtures of local experts. Neural Comput. 1991, 3, 79–87.
24. Jordan, M.I.; Jacobs, R.A. Hierarchical mixtures of experts and the EM algorithm. In Proceedings of the 1993 International Joint Conference on Neural Networks (IJCNN-93), Nagoya, Japan, 25–29 October 1993; Volume 2, pp. 1339–1344.
25. Riquelme, C.; Puigcerver, J.; Mustafa, B.; Neumann, M.; Jenatton, R.; Susano Pinto, A.; Keysers, D.; Houlsby, N. Scaling vision with sparse mixture of experts. Adv. Neural Inf. Process. Syst. 2021, 34, 8583–8595.
26. Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 1977, 39, 1–22.
27. Variani, E.; McDermott, E.; Heigold, G. A Gaussian mixture model layer jointly optimized with discriminative features within a deep neural network architecture. In Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, QLD, Australia, 19–24 April 2015; pp. 4270–4274.
28. van den Oord, A.; Schrauwen, B. Factoring variations in natural images with deep Gaussian mixture models. Adv. Neural Inf. Process. Syst. 2014, 27, 3518–3526.
29. Blei, D.M.; Kucukelbir, A.; McAuliffe, J.D. Variational inference: A review for statisticians. J. Am. Stat. Assoc. 2017, 112, 859–877.
30. Jordan, M.I.; Ghahramani, Z.; Jaakkola, T.S.; Saul, L.K. An introduction to variational methods for graphical models. In Learning in Graphical Models; Jordan, M.I., Ed.; Springer: Dordrecht, The Netherlands, 1998; pp. 105–161.
31. Bishop, C.M. Pattern Recognition and Machine Learning; Springer: New York, NY, USA, 2006; ISBN 978-0-387-31073-2.
32. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9.
33. Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
34. Wang, Y.; Shi, Y.; Yang, T.; Wang, W.; Sun, Z.; Zhang, Y. Structural performance warning based on computer intelligent monitoring and fractional-order multi-rate Kalman fusion method. Fractal Fract. 2026, 10, 186.
35. Ayan, E.; Erbay, H.; Varçın, F. Crop pest classification with a genetic algorithm-based weighted ensemble of deep convolutional neural networks. Comput. Electron. Agric. 2020, 179, 105809.
36. Gan, Y.; Guo, Q.; Wang, C.; Liang, W.; Xiao, D.; Wu, H. Recognizing crop pests using an improved EfficientNet model. Trans. Chin. Soc. Agric. Eng. 2022, 38, 203–211.
37. Zheng, T.; Yang, X.; Lv, J.; Mi, L.; Wang, S.; Li, W. An efficient mobile model for insect image classification in the field pest management. Eng. Sci. Technol. Int. J. 2023, 39, 101335.









| Parameter | Conv3 | Conv4 | Conv5 |
|---|---|---|---|
| Number of experts | 10 | 10 | 10 |
| Projection dimension | 256 | 512 | 512 |
| Expert kernel size | 5 × 5 | 3 × 3 | 1 × 1 |
| Temperature | 1.3 | 1.6 | 1.4 |
| Load balance weight | 0.02 | 0.03 | 0.03 |
| Routing quality weight | 0.05 | 0.10 | 0.10 |
| Expert dropout rate | 0.1 | 0.1 | 0.1 |

| Configuration | K | Conv3 (%) | Conv4 (%) | Conv5 (%) |
|---|---|---|---|---|
| K = 1 (lower bound) | 1 | 65.66 | 72.37 | 72.47 |
| K = 2 | 2 | 66.40 | 72.79 | 72.72 |
| K = 4 | 4 | 67.45 | 73.00 | 73.10 |
| K = 6 | 6 | 67.92 | 72.94 | 73.11 |
| K = 8 | 8 | 67.95 | 72.91 | 73.40 |
| K = 10 | 10 | 68.39 | 73.28 | 73.64 |

| Run 1 (%) | Run 2 (%) | Run 3 (%) | Mean (%) | Std (%) | 95% CI (%) |
|---|---|---|---|---|---|
| 74.03 | 74.11 | 74.22 | 74.12 | 0.10 | 74.12 ± 0.25 |

| Method | Year | Backbone | Approach | Acc. (%) |
|---|---|---|---|---|
| Wu et al. [14] | 2019 | ResNet-50 | Standard fine-tuning | 49.40 |
| (†) DenseNet-121 [11] | 2017 | DenseNet-121 | Standard fine-tuning | 61.10 |
| Ayan et al. [35] | 2020 | Multi-CNN ensemble | GAEnsemble (VGG/ResNet/Inception/Xception/MobileNet) | 67.13 |
| Gan et al. [36] | 2022 | EfficientNet | Coordinate attention | 69.45 |
| Nanni et al. [16] | 2022 | CNN ensemble | 6-CNN + improved Adam optimizer | 74.11 |
| Zheng et al. [37] | 2023 | EfficientNetV2 | PCNet with coordinate attention | 73.70 |
| Xia et al. [18] | 2023 | DenseNet-201 + ViT | Multi-branch multi-scale ensemble | 74.20 |
| Liu et al. [17] | 2022 | ViT | Self-supervised pre-training (LSMAE) | 74.69 |
| Chen et al. [19] | 2023 | CNN (ResNet-based) | Multi-image feature localization and adaptive filtering fusion | 73.90 |
| GMM-MoE CNN (proposed) | 2026 | DenseNet-121 | Multi-scale probabilistic GMM-gated MoE routing | 74.12 |
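
The 74.12% entry for the proposed model is the mean of the three runs tabulated above. The short check below re-derives the mean, the sample standard deviation, and a 95% interval half-width from those three accuracies. It assumes a two-sided Student-t interval with 2 degrees of freedom (critical value about 4.303); the source does not state which interval formula was used, and the helper name `summarize_runs` is hypothetical.

```python
import math
import statistics


def summarize_runs(runs, t_crit=4.303):
    """Return (mean, sample std, t-interval half-width) for a list of accuracies."""
    mean = statistics.mean(runs)
    std = statistics.stdev(runs)  # sample standard deviation (n - 1 denominator)
    # Assumption: Student-t interval; 4.303 is the 97.5% quantile for 2 dof.
    half_width = t_crit * std / math.sqrt(len(runs))
    return mean, std, half_width


mean, std, half_width = summarize_runs([74.03, 74.11, 74.22])
print(f"mean={mean:.2f}  std={std:.2f}  half-width={half_width:.2f}")
```

Under this assumption the mean and standard deviation round to the tabulated 74.12 and 0.10. The half-width is about 0.24 when computed from the unrounded standard deviation and 0.25 when the rounded value 0.10 is used, which matches the reported ± 0.25.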
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Şahin, N.; Alpaslan, N.; Hanbay, D. Multi-Scale Gaussian Mixture Model-Gated Mixture of Experts for Fine-Grained Insect Pest Classification. Electronics 2026, 15, 2268. https://doi.org/10.3390/electronics15112268

