WMN: A Multi-Scale Nested Mixture-of-Experts-Based Method for High-Resolution Remote-Sensing Solid Waste Site Extraction and Monitoring
Abstract
1. Introduction
- (1) Solid Waste Site Detection Using Multi-Source Spatiotemporal Data
- (2) Solid Waste Site Detection in High-Resolution Remote-Sensing Imagery with Deep Learning Integration
  - ① Integration of multiple deep learning models
  - ② Targeted modifications to the internal architecture of deep learning models
- (1) We propose WMN, a novel network specifically designed for extracting solid waste sites from high-resolution remote-sensing imagery, which enables complex scene understanding through adaptive perception and expert collaboration. The design of WMN addresses a core challenge in this task: solid waste sites are represented as a single SIO at the semantic level, yet they are composed of multiple PIPs with significant variations in scale, morphology, and spectral characteristics. To address this issue, WMN introduces a MOE-based perception paradigm and incorporates two task-oriented modules: the Dynamic Adaptive Receptive-field Mixture of Experts (DARF-MOE) and the Nested Mixture of Experts (NST-MOE). Specifically, DARF-MOE dynamically adjusts receptive fields to accommodate scale and structural heterogeneity among PIPs within an SIO, whereas NST-MOE is designed to model higher-order semantic discrepancies between different SIOs under conditions where PIPs exhibit strong visual similarity. Together, these modules enable fine-grained perception and robust recognition of solid waste sites in complex backgrounds.
- (2) We introduce a KAN linear layer to enhance the efficiency of the model in engineering applications. By learning the optimal mask coefficients, the model achieves accurate solid waste site segmentation with fewer parameters. Moreover, by leveraging the interpretability of the KAN linear layer, explicit formulas can be established to explain which features determine the generation of these “mask coefficients”, thereby providing a quantitative description of the interpretable relationship between PIPs and SIOs.
- (3) We built a high-resolution remote-sensing dataset for solid waste site extraction, covering five categories: TP, CSS, LS, GDS, and ES. Annotations follow the SIO–PIP relationship: each waste site is an SIO, and its annotation boundary is defined by its core internal PIPs, which are selected based on composition and spatial organization. For each SIO category, we explicitly define the essential PIPs and exclude non-representative ones that cause semantic drift. This minimizes ambiguity from background clutter and locally similar PIPs.
- (4) A “GIS-based remote-sensing solid waste pollution risk prevention system” was developed to help practitioners supervise local solid waste sites more conveniently and efficiently. This system has been applied in practice by an environmental protection NGO in Changsha, China, to monitor solid waste sites across Hunan Province, and it has achieved promising results. In this study, a separate empirical experiment was conducted in Loudi City, Hunan Province. The empirical outcomes can be obtained upon request by contacting the corresponding author.
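The expert-collaboration idea in contribution (1) can be illustrated with a minimal top-k gated mixture of experts. This is only a sketch under stated assumptions, not the DARF-MOE or NST-MOE implementation: the experts here are plain linear maps standing in for convolutional branches with different receptive fields, and the names `top_k_gate` and `moe_forward` are hypothetical. The expert count (4) and the number of active experts (2) follow the training settings reported for the model.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; entries set to -inf receive weight 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def top_k_gate(logits, k=2):
    """Keep the k largest gate logits per sample and renormalise them.

    Returns an (n, num_experts) weight matrix that is zero for
    unselected experts and sums to 1 over the selected ones.
    """
    top_idx = np.argsort(logits, axis=1)[:, -k:]            # indices of the k largest logits
    top_val = np.take_along_axis(logits, top_idx, axis=1)
    masked = np.full_like(logits, -np.inf)
    np.put_along_axis(masked, top_idx, top_val, axis=1)
    return softmax(masked, axis=1)

def moe_forward(x, gate_w, expert_ws, k=2):
    """x: (n, d) features; gate_w: (d, E); expert_ws: list of E (d, d) matrices.

    Assumption: each expert is a linear map; in a receptive-field MoE
    each expert would instead be a convolution with its own kernel or dilation.
    """
    weights = top_k_gate(x @ gate_w, k)                       # (n, E)
    expert_out = np.stack([x @ w for w in expert_ws], axis=1)  # (n, E, d)
    fused = (weights[:, :, None] * expert_out).sum(axis=1)     # weighted sum over experts
    return fused, weights

# Hypothetical toy inputs: 5 feature vectors of dimension 8, 4 experts, 2 active.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
gate_w = rng.normal(size=(8, 4))
expert_ws = [rng.normal(size=(8, 8)) for _ in range(4)]
fused, weights = moe_forward(x, gate_w, expert_ws, k=2)
```

Each row of `weights` has exactly two non-zero entries, so only two of the four experts contribute to any given sample.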
2. Methods
2.1. DARF-MOE Module
2.2. NST-MOE Module
2.3. KAN Mask Coefficient Prediction Module
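As a rough illustration of how per-instance mask coefficients are typically used in prototype-based instance segmentation (the YOLACT-style scheme cited in the references), the sketch below combines shared prototype masks with one coefficient vector per instance. It assumes this prototype formulation applies here; the KAN layer itself is not reproduced, and the coefficients are random placeholders for whatever the prediction module would output. The function name `assemble_masks` is hypothetical.

```python
import numpy as np

def assemble_masks(prototypes, coeffs):
    """Combine prototype masks linearly, then squash with a sigmoid.

    prototypes: (k, h, w) shared prototype masks
    coeffs:     (n, k) one coefficient vector per detected instance
    returns:    (n, h, w) soft masks with values in (0, 1)
    """
    k, h, w = prototypes.shape
    logits = coeffs @ prototypes.reshape(k, h * w)   # (n, h*w)
    return (1.0 / (1.0 + np.exp(-logits))).reshape(-1, h, w)

# Hypothetical toy sizes: 8 prototypes on a 16 x 16 grid, 3 instances.
rng = np.random.default_rng(0)
prototypes = rng.normal(size=(8, 16, 16))
coeffs = rng.normal(size=(3, 8))        # placeholder for predicted coefficients
masks = assemble_masks(prototypes, coeffs)
binary_masks = masks > 0.5              # threshold to obtain instance masks
```

Because each mask is a linear combination of a small set of prototypes, the coefficient predictor only needs to output `k` numbers per instance, which is where a compact, interpretable layer can be substituted.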
2.4. BUCEA-SWS Dataset
3. Experimental Results and Analysis
3.1. Datasets
- (1) BUCEA-SWS Dataset: The dataset constructed in this study, covering five categories of solid waste sites, is described in detail in Section 2.4.
- (2) Global Dumpsite Test Data [22]: The dataset covers multiple major cities in Africa and Asia, primarily sourced from the Google Earth platform. Based on the origin and distribution patterns of the waste, all solid waste sites are divided into six categories: agriculture forestry, construction waste, disposed garbage, domestic garbage, industry waste, and mining waste. These six types of waste sites have been labeled and annotated.
- (3) Open Source Tailings Pond Dataset [45]: The dataset covers multiple cities in Anhui Province, China, comprising 352 positive samples and 430 negative samples of Google satellite imagery with a spatial resolution of 2.05 m. The original images were divided into 500 × 500-pixel patches, and the locations of tailings ponds were annotated based on the target positional information.
- (4) Open Pit Mine Object Detection Dataset [46]: The dataset consists of 4617 remote-sensing image patches of open-pit mining areas, each sized 1024 × 1024 pixels, along with their corresponding object detection bounding boxes. All bounding boxes were manually annotated using the Labelme tool.
- (5) Tailings Ponds in Henan Province [47]: This open-access dataset for tailings pond detection in Henan Province, China, was constructed from multi-year Chinese high-resolution optical remote-sensing satellite imagery through data processing, manual interpretation and annotation, and image tiling. It contains 1183 image tiles and 1728 object instances, with multi-temporal coverage across four years: 2016, 2018, 2020, and 2021.
3.2. Training Details
3.3. Comparison Methods and Evaluation Metrics
3.4. Comparison Results and Analysis
3.5. Effectiveness Analysis of the Proposed Model Architecture
4. Case Study of Solid Waste Site Detection in Hunan Province
4.1. Empirical Workflow
4.2. Empirical Results
4.3. GIS-Based Remote-Sensing Solid Waste Pollution Risk Prevention System
5. Discussion
- (1) The representativeness of the dataset and the comparability of results across different datasets remain limited. The BUCEA-SWS dataset constructed in this study was collected mainly from southern Chinese provinces. Although it covers multiple scene types, including urban areas, urban–rural fringe zones, industrial parks, and mountainous regions, it may not fully represent solid waste sites under different climatic, geological, and landscape conditions, such as arid regions, cold regions, northern mining areas, or areas with substantially different vegetation and soil backgrounds. In addition, the open-source datasets used for comparative experiments differ in spatial resolution, geographic background, annotation standards, object scale, and category composition. As a result, the performance values obtained on different datasets should not be directly compared as cross-dataset rankings. Instead, these experiments are intended to evaluate the relative performance of different models within each dataset under the same experimental settings, and to provide supplementary evidence for the adaptability of the proposed method under different data conditions. In future work, the dataset will be further expanded to broader geographic regions and more diverse environmental conditions, and more standardized cross-region evaluation protocols will be explored.
- (2) Figure 17 shows the remaining challenging cases for the proposed method in complex remote-sensing scenes. In panel (a), the model produces false extractions when encountering areas that are highly similar to GDS, as highlighted by the red circle in the Image row. In panel (b), when GDS are irregularly dumped along roadsides, as indicated by the red circle, the model fails to accurately extract every individual waste pile. In panel (c), false extraction occurs when yellowish piled materials exhibit both color and stacking patterns similar to those of CSS. Panel (d) shows failure cases for large-scale targets such as TP, LS, or ES, where incomplete recognition of internal PIPs leads to fragmented extraction results. Moreover, when the integrity of a target is disrupted at the image boundary, the model may fail to identify the object.
- (3) Although the proposed MOE-based structure improves the adaptability of feature representation, its training process is more complex than that of a conventional single-path network. In particular, the load-balancing strategy encourages different experts to participate in feature learning, which helps alleviate the risk of expert collapse. However, this regulation may also introduce certain fluctuations during training. When the model temporarily relies more strongly on a subset of experts, the load-balancing mechanism may adjust the routing tendency to promote the participation of other experts, causing the model to temporarily deviate from a locally stable routing pattern and leading to short-term performance oscillations.
- (4) All models were trained and evaluated using fixed dataset splits, unified training settings, and consistent evaluation criteria. Although this design ensured the fairness and consistency of the model comparisons, the study did not test whether the observed performance differences are statistically significant.
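The load-balancing effect discussed in item (3) can be made concrete with a small numerical sketch. The exact loss used by the model is not stated in this section, so the example assumes the auxiliary loss of Switch Transformers (listed in the references), E · Σ f_i · P_i, where f_i is the fraction of samples routed to expert i and P_i is the mean router probability for expert i. The helper names `load_balance_loss` and `peaked_probs` are hypothetical.

```python
import numpy as np

def load_balance_loss(router_probs):
    """Switch-Transformer-style auxiliary loss (assumed, not the paper's exact form).

    router_probs: (n, E) softmax outputs of the router.
    Evaluates to about 1 when routing is balanced and grows toward E under collapse.
    """
    n, num_experts = router_probs.shape
    assigned = router_probs.argmax(axis=1)                   # top-1 expert per sample
    f = np.bincount(assigned, minlength=num_experts) / n     # fraction routed to each expert
    p = router_probs.mean(axis=0)                            # mean router probability per expert
    return num_experts * float(np.sum(f * p))

def peaked_probs(preferred, num_experts=4, peak=0.7):
    # Toy router output: each sample puts `peak` on its preferred expert
    # and splits the remaining probability evenly over the others.
    probs = np.full((len(preferred), num_experts), (1 - peak) / (num_experts - 1))
    probs[np.arange(len(preferred)), preferred] = peak
    return probs

balanced = peaked_probs(np.arange(8) % 4)            # samples spread evenly over 4 experts
collapsed = peaked_probs(np.zeros(8, dtype=int))     # every sample prefers expert 0

loss_balanced = load_balance_loss(balanced)     # approximately 1.0
loss_collapsed = load_balance_loss(collapsed)   # approximately 2.8
```

A penalty of this kind pushes the router away from the collapsed pattern, which is also why it can temporarily disturb a routing configuration the model had settled into.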
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
| Model | Box mAP50 (ES) | Box mAP50 (TP) | Box mAP50 (LS) | Box mAP50 (All) | Seg mAP50 (ES) | Seg mAP50 (TP) | Seg mAP50 (LS) | Seg mAP50 (All) |
|---|---|---|---|---|---|---|---|---|
| FasterRCNN (2017) | 73.6 | 74.9 | 72.7 | 73.7 | --- | --- | --- | --- |
| RetinaNet (2017) | 73.5 | 68.2 | 71.1 | 70.9 | --- | --- | --- | --- |
| Transformer (2019) | 75.1 | 72.3 | 74.8 | 74.1 | --- | --- | --- | --- |
| CSL (2020) | 74.7 | 64.4 | 70.1 | 69.7 | --- | --- | --- | --- |
| R3Det (2021) | 74.3 | 69.5 | 73.0 | 72.3 | --- | --- | --- | --- |
| PKI (2024) | 76.8 | 76.8 | 76.4 | 76.7 | --- | --- | --- | --- |
| RT-DETR (2024) | 76.3 | 76.6 | 79.9 | 77.6 | --- | --- | --- | --- |
| YOLOv11 (2024) | 77.9 | 83.1 | 81.1 | 80.7 | 79.8 | 85.3 | 82.0 | 82.4 |
| YOLOv12 (2025) | 77.8 | 76.5 | 80.8 | 78.4 | 79.0 | 79.3 | 81.4 | 79.9 |
| LEGNet (2025) | 72.4 | 68.9 | 72.4 | 71.2 | --- | --- | --- | --- |
| FBRT (2025) | 77.8 | 78.5 | 81.7 | 79.3 | --- | --- | --- | --- |
| Propose+ (YOLOv11) | 80.9 | 85.4 | 83.1 | 83.1 | 83.0 | 86.7 | 84.1 | 84.6 |
| Propose+ (YOLOv12) | 77.9 | 84.6 | 83.1 | 81.9 | 80.1 | 85.5 | 83.4 | 83.0 |
| Model | Box mAP50 (GDS) | Box mAP50 (CSS) | Box mAP50 (All) | Seg mAP50 (GDS) | Seg mAP50 (CSS) | Seg mAP50 (All) |
|---|---|---|---|---|---|---|
| FasterRCNN (2017) | 62.4 | 54.7 | 58.6 | - | - | - |
| RetinaNet (2017) | 47.2 | 46.3 | 46.7 | - | - | - |
| Transformer (2019) | 59.4 | 51.2 | 55.3 | - | - | - |
| CSL (2020) | 58.8 | 53.9 | 56.3 | - | - | - |
| R3Det (2021) | 60.8 | 47.0 | 53.9 | - | - | - |
| PKI (2024) | 63.8 | 56.1 | 59.9 | - | - | - |
| RT-DETR (2024) | 70.5 | 59.4 | 64.9 | - | - | - |
| YOLOv11 (2024) | 71.3 | 60.2 | 65.8 | 74.6 | 57.7 | 66.1 |
| YOLOv12 (2025) | 74.1 | 59.2 | 66.6 | 78.6 | 61.6 | 70.1 |
| LEGNet (2025) | 68.8 | 61.0 | 64.9 | - | - | - |
| FBRT (2025) | 70.9 | 60.4 | 65.6 | - | - | - |
| Propose+ (YOLOv11) | 75.5 | 61.4 | 68.5 | 78.1 | 61.2 | 69.6 |
| Propose+ (YOLOv12) | 74.2 | 63.5 | 68.8 | 75.4 | 64.1 | 69.8 |
| Model | mAP50 | mAP50–95 | Precision | Recall | F1-Score |
|---|---|---|---|---|---|
| FasterRCNN (2017) | 34.8 | 14.1 | 41.6 | 44.5 | 43.0 |
| RetinaNet (2017) | 25.2 | 10.2 | 30.4 | 46.0 | 36.6 |
| Transformer (2019) | 43.0 | 16.1 | 50.9 | 51.5 | 51.2 |
| CSL (2020) | 34.1 | 15.2 | 39.2 | 49.2 | 43.6 |
| R3Det (2021) | 41.1 | 18.8 | 45.4 | 50.9 | 48.0 |
| PKI (2024) | 59.0 | 35.2 | 71.8 | 56.9 | 63.5 |
| RT-DETR (2024) | 48.5 | 27.3 | 49.6 | 50.3 | 49.9 |
| YOLOv11 (2024) | 55.6 | 35.3 | 50.8 | 53.2 | 52.0 |
| YOLOv12 (2025) | 52.6 | 32.5 | 50.2 | 52.6 | 51.4 |
| LEGNet (2025) | 56.7 | 29.7 | 60.1 | 60.4 | 60.2 |
| FBRT (2025) | 56.4 | 33.6 | 52.9 | 54.8 | 53.8 |
| Propose+ (YOLOv11) | 57.9 | 35.6 | 54.6 | 53.8 | 54.2 |
| Propose+ (YOLOv12) | 57.7 | 36.7 | 66.4 | 46.4 | 54.6 |
| Model | mAP50 | mAP50–95 | Precision | Recall | F1-Score |
|---|---|---|---|---|---|
| FasterRCNN (2017) | 51.7 | 27.0 | 60.0 | 51.8 | 55.6 |
| RetinaNet (2017) | 49.8 | 22.2 | 59.1 | 50.2 | 54.3 |
| Transformer (2019) | 50.2 | 22.3 | 56.2 | 52.8 | 54.4 |
| CSL (2020) | 50.3 | 24.8 | 58.7 | 50.0 | 54.0 |
| R3Det (2021) | 51.0 | 24.7 | 59.0 | 51.2 | 54.8 |
| PKI (2024) | 54.6 | 28.4 | 61.8 | 55.8 | 58.6 |
| RT-DETR (2024) | 54.1 | 31.4 | 52.6 | 55.3 | 53.9 |
| YOLOv11 (2024) | 55.7 | 33.7 | 57.0 | 52.6 | 54.7 |
| YOLOv12 (2025) | 55.7 | 33.9 | 56.6 | 54.2 | 55.4 |
| LEGNet (2025) | 50.2 | 28.7 | 58.7 | 54.2 | 56.4 |
| FBRT (2025) | 56.5 | 33.9 | 56.5 | 53.6 | 55.0 |
| Propose+ (YOLOv11) | 56.3 | 34.1 | 58.6 | 52.8 | 55.5 |
| Propose+ (YOLOv12) | 56.9 | 34.5 | 58.8 | 54.3 | 56.5 |
| Model | mAP50 | mAP50–95 | Precision | Recall | F1-Score |
|---|---|---|---|---|---|
| FasterRCNN (2017) | 36.8 | 14.2 | 53.7 | 42.3 | 47.3 |
| RetinaNet (2017) | 50.9 | 18.2 | 69.3 | 50.0 | 58.1 |
| Transformer (2019) | 50.1 | 20.5 | 57.0 | 54.8 | 55.9 |
| CSL (2020) | 38.8 | 14.7 | 58.8 | 38.5 | 46.5 |
| R3Det (2021) | 46.0 | 16.9 | 60.0 | 49.0 | 53.9 |
| PKI (2024) | 46.5 | 17.5 | 59.6 | 53.8 | 56.6 |
| RT-DETR (2024) | 35.2 | 18.7 | 41.1 | 34.6 | 37.6 |
| YOLOv11 (2024) | 49.5 | 29.4 | 59.7 | 35.6 | 44.6 |
| YOLOv12 (2025) | 43.9 | 25.2 | 55.0 | 31.7 | 40.2 |
| LEGNet (2025) | 51.1 | 20.5 | 49.2 | 58.7 | 53.5 |
| FBRT (2025) | 48.5 | 30.3 | 57.4 | 37.5 | 45.4 |
| Propose+ (YOLOv11) | 52.0 | 31.1 | 52.5 | 50.0 | 51.2 |
| Propose+ (YOLOv12) | 47.4 | 27.9 | 65.2 | 28.8 | 40.0 |
| Model | mAP50 | mAP50–95 | Precision | Recall | F1-Score |
|---|---|---|---|---|---|
| FasterRCNN (2017) | 44.4 | 9.8 | 61.7 | 55.7 | 58.5 |
| RetinaNet (2017) | 67.6 | 23.8 | 77.1 | 67.2 | 71.8 |
| Transformer (2019) | 64.5 | 21.0 | 70.8 | 73.9 | 72.3 |
| CSL (2020) | 59.5 | 21.2 | 69.3 | 63.2 | 66.1 |
| R3Det (2021) | 66.0 | 21.3 | 78.2 | 69.9 | 73.8 |
| PKI (2024) | 75.2 | 30.8 | 81.5 | 74.9 | 78.1 |
| RT-DETR (2024) | 76.7 | 55.1 | 84.1 | 68.4 | 75.4 |
| YOLOv11 (2024) | 80.6 | 47.2 | 80.7 | 74.1 | 77.3 |
| YOLOv12 (2025) | 82.1 | 53.3 | 82.0 | 72.7 | 77.1 |
| LEGNet (2025) | 76.2 | 32.1 | 80.2 | 80.1 | 80.1 |
| FBRT (2025) | 82.7 | 50.0 | 76.9 | 78.7 | 77.8 |
| Propose+ (YOLOv11) | 84.2 | 55.0 | 89.1 | 73.6 | 80.6 |
| Propose+ (YOLOv12) | 83.1 | 53.9 | 85.1 | 72.6 | 78.4 |
Appendix B
| | TP | CSS | LS | GDS | ES |
|---|---|---|---|---|---|
| High-resolution remote-sensing imagery | (image) | (image) | (image) | (image) | (image) |
| Geometric Features | Irregular pond-like in shape, with natural edges | Individual units form irregularly stacked pyramids. The overall layout exhibits a spatial arrangement pattern. | Built to follow the contours of the terrain, it has an irregular shape. | Irregular shape, chaotic outline. | Features an irregularly pitted structure. |
| Spectral Features | The pools display diverse colors, such as grayish white, blue-green, and earthy yellow. | The surface of the pile appears in shades of earthy yellow, brown, or gray. | Membranes are black, dark gray, or grayish white in color. | Appear grayish-white or bright white, with high reflectivity. | Yellowish-brown, grayish-white, or rust-colored, with high reflectivity. |
| Texture Features | The mining pool exhibits smooth and delicate characteristics, while the intercepting dam features a unique linear profile. | The surface of the construction debris is rough and uneven in color, exhibiting a mottled, blocky appearance. | The surface of the covering film is flat yet rough, exhibiting a distinct reticulated pattern. | Features strong local brightness contrast and coarse, disordered characteristics. | The surface is fractured and rough, often accompanied by distinct digging marks. |
| Spatial configuration | Comprising an intercepting dam, dam body, and tailings pond, it is often constructed on mountainous or forested terrain. | Structures, characterized by loose or dense accumulation, are commonly found near construction sites and exposed areas along suburban roadsides. | Enclosed by transport roads, with an internal covering membrane, it adheres to mountainous and forested terrain. | Unevenly distributed sites often appear scattered, predominantly located in urban–rural fringe areas or on open, bare land. | Pits formed by vertical cliff walls and construction structures are often surrounded by mountain forests or farmland. |
References
- Fraternali, P.; Morandini, L.; González, S.L.H. Solid Waste Detection, Monitoring and Mapping in Remote Sensing Images: A Survey. Waste Manag. 2024, 189, 88–102.
- Lv, Y. Research on the Identification and Risk Assessment of Solid Waste Landfill Sites Based on Multi-Source Information from Satellite, Aerial, and Ground Observations. Master’s Thesis, Zhejiang University, Hangzhou, China, 2023.
- Ministry of Ecology and Environment of the People’s Republic of China. The Ministry and Seven Departments Jointly Launched a Nationwide Special Campaign Against Illegal Dumping and Disposal of Solid Waste. Available online: https://www.mee.gov.cn/ywdt/xwfb/202506/t20250625_1121878.shtml (accessed on 10 June 2026).
- He, S.; Li, Y. Application of UAV Remote Sensing Images in Solid Waste Monitoring: A Case Study of Nanhu District, Jiaxing City. Cent. South Agric. Sci. Technol. 2024, 45, 82–85+99.
- Mao, P.; Yu, J.; Tian, Y. Application of Satellite Remote Sensing in Solid Waste Supervision. Renew. Resour. Circ. Econ. 2024, 17, 21–23.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Cham, Switzerland, 2015; pp. 234–241.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008.
- Du, S.; Xing, J.; Wang, S.; Wei, L.; Zhang, Y. STMNet: Scene Classification-Assisted and Texture Feature-Enhanced Multi-Scale Network for Large-Scale Urban Informal Settlement Extraction from Remote Sensing Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 13169–13187.
- Zhang, C.; Xing, J.; Li, J.; Du, S.; Qin, Q. A New Method for the Extraction of Tailing Ponds from Very High-Resolution Remotely Sensed Images: PSVED. Int. J. Digit. Earth 2023, 16, 2681–2703.
- Lavender, S. Detection of Waste Plastics in the Environment: Application of Copernicus Earth Observation Data. Remote Sens. 2022, 14, 4772.
- Kruse, C.; Boyda, E.; Chen, S.; Karra, K.; Bou-Nahra, T.; Hammer, D.; Mathis, J.; Maddalene, T.; Jambeck, J.; Laurier, F. Satellite Monitoring of Terrestrial Plastic Waste. PLoS ONE 2023, 18, e0278997.
- Yailymova, H.; Mikava, P.; Kussul, N.; Krasilnikova, T.; Shelestov, A.; Yailymov, B.; Titkov, D. Neural Network Model for Monitoring of Landfills Using Remote Sensing Data. In Proceedings of the 2022 IEEE 3rd International Conference on System Analysis & Intelligent Computing (SAIC), Kyiv, Ukraine, 4–7 October 2022; pp. 1–4.
- Zhang, S.; Ma, J. CascadeDumpNet: Enhancing Open Dumpsite Detection through Deep Learning and AutoML Integrated Dual-Stage Approach Using High-Resolution Satellite Imagery. Remote Sens. Environ. 2024, 313, 114349.
- Devesa, M.R.; Brust, A.V. Mapping Illegal Waste Dumping Sites with Neural-Network Classification of Satellite Imagery. arXiv 2021, arXiv:2110.08599.
- Rajkumar, A.; Kft, C.A.; Sziranyi, T.; Majdik, A. Detecting Landfills Using Multi-Spectral Satellite Images and Deep Learning Methods. In Proceedings of the 10th International Conference on Learning Representations (ICLR 2022), Online, 25–29 April 2022; pp. 1–9.
- Torres, R.N.; Fraternali, P. Learning to Identify Illegal Landfills through Scene Classification in Aerial Images. Remote Sens. 2021, 13, 4520.
- Yang, K.; Zhang, C.; Luo, T.; Hu, L. Automatic Identification Method of Construction and Demolition Waste Based on Deep Learning and GAOFEN-2 Data. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2022, 43, 1293–1299.
- Wang, P.; Zhao, H.; Yang, Z.; Jin, Q.; Wu, Y.; Xia, P.; Meng, L. Fast Tailings Pond Mapping Exploiting Large Scene Remote Sensing Images by Coupling Scene Classification and Sematic Segmentation Models. Remote Sens. 2023, 15, 327.
- Yu, J.; Mao, P.; Wu, W.; Wang, Q.; Shao, X.; Teng, J.; Wang, Y. TSNET: A Solid Waste Instance Segmentation Model in China Based on a Two-Step Detection Strategy and Satellite Remote Sensing Images. Int. J. Appl. Earth Obs. Geoinf. 2025, 136, 104366.
- Yong, Q.; Wu, H.; Wang, J.; Chen, R.; Yu, B.; Zuo, J.; Du, L. Automatic Identification of Illegal Construction and Demolition Waste Landfills: A Computer Vision Approach. Waste Manag. 2023, 172, 267–277.
- Sun, X.; Yin, D.; Qin, F.; Yu, H.; Lu, W.; Yao, F.; He, Q.; Huang, X.; Yan, Z.; Wang, P.; et al. Revealing Influencing Factors on Global Waste Distribution via Deep-Learning Based Dumpsite Detection from Satellite Imagery. Nat. Commun. 2023, 14, 1444.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149.
- Zhou, L.; Rao, X.; Li, Y.; Zuo, X.; Liu, Y.; Lin, Y.; Yang, Y. SWDet: Anchor-Based Object Detector for Solid Waste Detection in Aerial Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 16, 306–320.
- Li, H.; Hu, C.; Zhong, X.; Zeng, C.; Shen, H. Solid Waste Detection in Cities Using Remote Sensing Imagery Based on a Location-Guided Key Point Network with Multiple Enhancements. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 16, 191–201.
- Hussain, M. YOLOv1 to v8: Unveiling Each Variant–a Comprehensive Review of YOLO. IEEE Access 2024, 12, 42816–42833.
- Khanam, R.; Hussain, M. YOLOv11: An Overview of the Key Architectural Enhancements. arXiv 2024, arXiv:2410.17725.
- Liu, J.; Du, M.; Mao, Z. Scale Computation on High Spatial Resolution Remotely Sensed Imagery Multi-Scale Segmentation. Int. J. Remote Sens. 2017, 38, 5186–5214.
- Fedus, W.; Zoph, B.; Shazeer, N. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity. J. Mach. Learn. Res. 2022, 23, 1–39.
- Jacobs, R.A.; Jordan, M.I.; Nowlan, S.J.; Hinton, G.E. Adaptive Mixtures of Local Experts. Neural Comput. 1991, 3, 79–87.
- Pavlitskaya, S.; Hubschneider, C.; Weber, M.; Moritz, R.; Huger, F.; Schlicht, P.; Zollner, M. Using Mixture of Expert Models to Gain Insights into Semantic Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 342–343.
- Liu, Z.; Wang, Y.; Vaidya, S.; Ruehle, F.; Halverson, J.; Soljačić, M.; Hou, T.Y.; Tegmark, M. KAN: Kolmogorov-Arnold Networks. arXiv 2024, arXiv:2404.19756.
- Bolya, D.; Zhou, C.; Xiao, F.; Lee, Y.J. YOLACT: Real-Time Instance Segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9157–9166.
- Gao, S.-H.; Cheng, M.-M.; Zhao, K.; Zhang, X.-Y.; Yang, M.-H.; Torr, P. Res2Net: A New Multi-Scale Backbone Architecture. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 43, 652–662.
- Li, B.; Yan, H.; Wu, M.; Zhang, C. Multi-Scale Receptive Field Rectification of Remote Sensing Images. In Proceedings of the IGARSS 2024—2024 IEEE International Geoscience and Remote Sensing Symposium, Athens, Greece, 7–12 July 2024; pp. 9835–9839.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916.
- Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848.
- Chen, L.-C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking Atrous Convolution for Semantic Image Segmentation. arXiv 2017, arXiv:1706.05587.
- Kim, B.J.; Choi, H.; Jang, H.; Kim, S.W. Resolution-Aware Design of Atrous Rates for Semantic Segmentation Networks. arXiv 2023, arXiv:2307.14179.
- Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
- Chen, Y.; Jiang, W.; Wang, Y. FAMHE-Net: Multi-Scale Feature Augmentation and Mixture of Heterogeneous Experts for Oriented Object Detection. Remote Sens. 2025, 17, 205.
- Rossi, L.; Bernuzzi, V.; Fontanini, T.; Bertozzi, M.; Prati, A. Swin2-MoSE: A New Single Image Supersolution Model for Remote Sensing. IET Image Process. 2025, 19, e13303.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Chen, S. Official Weibo of Changsha Shuguang Environmental Protection Public Welfare Development Center. Available online: https://www.weibo.com/sghb201381 (accessed on 17 May 2026).
- Lyu, J.; Hu, Y.; Ren, S.; Yao, Y.; Ding, D.; Guan, Q.; Tao, L. Extracting the Tailings Ponds from High Spatial Resolution Remote Sensing Images by Integrating a Deep Learning-Based Model. Remote Sens. 2021, 13, 743.
- Lin, G. Open Pit Mine Object Detection Dataset. Figshare. 2024. Available online: https://figshare.com/articles/dataset/Open_Pit_Mine_Object_Detection_Dataset/27300960 (accessed on 17 May 2026).
- Li, J.; Li, M.; Sui, Z.; Su, W.; Lian, Y.; Chen, S.; Yuan, Z. Target Detection Dataset of Tailings Ponds in Henan Province, China (2016–2021). Science Data Bank. 2022. Available online: https://www.scidb.cn/en/detail?dataSetId=720626420933296128 (accessed on 17 May 2026).
- Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs Beat YOLOs on Real-Time Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 16965–16974.
- Xiao, Y.; Xu, T.; Xin, Y.; Li, J. FBRT-YOLO: Faster and Better for Real-Time Aerial Image Detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 25 February–4 March 2025; Volume 39, pp. 8673–8681.
- Cai, X.; Lai, Q.; Wang, Y.; Wang, W.; Sun, Z.; Yao, Y. Poly Kernel Inception Network for Remote Sensing Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 27706–27716.
- Lu, W.; Chen, S.B.; Li, H.D.; Shu, Q.L.; Ding, C.H.Q.; Tang, J.; Luo, B. LEGNet: Lightweight Edge-Gaussian Driven Network for Low-Quality Remote Sensing Image Object Detection. arXiv 2025, arXiv:2503.14012.
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Ding, J.; Xue, N.; Long, Y.; Xia, G.S.; Lu, Q. Learning RoI Transformer for Oriented Object Detection in Aerial Images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2849–2858.
- Yang, X.; Yan, J. Arbitrary-Oriented Object Detection with Circular Smooth Label. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 677–694.
- Yang, X.; Yan, J.; Feng, Z.; He, T. R3Det: Refined Single-Stage Detector with Feature Refinement for Rotating Object. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 2–9 February 2021; Volume 35, pp. 3163–3171.

| Categories | Annotation Sample Specifications Based on PIPs in Solid Waste Site SIOs |
|---|---|
| TP | A typical tailings storage SIO mainly consists of three types of PIPs: dam body, tailings pond, and interception dam. During annotation, all three PIP areas should be fully covered, and any buildings should be excluded from the annotated image region whenever possible. |
| CSS | A typical spoil heap SIO mainly consists of conical mounds or densely arranged small mounds as PIPs. Only areas with clearly identifiable spoil characteristics should be annotated, while scattered heaps lacking typical features are not annotated. |
| LS | A typical landfill SIO mainly consists of black or white mesh-covered PIPs. Since the white mesh can be easily confused with overexposed areas in the imagery, potentially affecting recognition accuracy, only areas covered by black mesh should be annotated. |
| GDS | A typical garbage pile SIO mainly consists of bright white or grayish-white mounded PIPs, with the main area usually exhibiting a noticeable white transitional zone relative to surrounding land features. Due to blurred boundaries, only the bright white or grayish-white mounded main area should be annotated. |
| ES | A typical excavation site SIO mainly consists of prominent cliff edges, construction areas, and three-dimensional shadow PIPs. During annotation, boundaries should follow the cliff edges, avoiding construction buildings whenever possible, while retaining pit-like features such as excavation pits or exposed slopes. |
| | TP | CSS | LS | GDS | ES |
|---|---|---|---|---|---|
| Quantity | 1364 | 2757 | 3137 | 3328 | 2460 |
| Proportion | 10.45% | 21.13% | 24.05% | 25.51% | 18.86% |
| Datasets | Location | Source | Resolution | Categories |
|---|---|---|---|---|
| Global Dumpsite Test Data | Cities in Africa and Asia | Google Earth | --- | CSS, GDS |
| Open Source Tailings Pond Dataset | Cities in Anhui Province, China | Google Earth | 2.05 m | TP |
| Open Pit Mine Object Detection Dataset | --- | --- | --- | ES |
| Tailings Ponds in Henan Province | Henan Province, China | --- | 2.00 m | TP |
| Parameter | Descriptions | Value |
|---|---|---|
| GPU_COUNT | Number of GPUs used | 1 |
| Batch Size | Number of images per training batch | 16 |
| Epochs | Number of training epochs | 300 |
| Image_size | Image size | 512 × 512 |
| Learning_Rate | Learning rate | 0.01 |
| Optimizer | Optimizer | SGD |
| Experts_num | Number of experts | 4 |
| Top-K_num | Number of active experts | 2 |
| Model | mAP50 | mAP50–95 | Precision | Recall | F1-Score |
|---|---|---|---|---|---|
| FasterRCNN (2017) | 73.7 | 28.2 | 80.5 | 74.1 | 77.2 |
| RetinaNet (2017) | 70.9 | 25.3 | 75.8 | 72.5 | 74.1 |
| Transformer (2019) | 74.1 | 28.3 | 77.3 | 75.2 | 76.2 |
| CSL (2020) | 69.7 | 26.5 | 75.5 | 72.0 | 73.7 |
| R3Det (2021) | 72.3 | 26.0 | 80.4 | 70.4 | 75.1 |
| PKI (2024) | 76.7 | 36.9 | 81.9 | 78.2 | 80.0 |
| RT-DETR (2024) | 77.6 | 57.8 | 78.7 | 72.9 | 75.7 |
| YOLOv11 (2024) | 80.7 | 61.9 | 80.3 | 72.6 | 76.3 |
| YOLOv12 (2025) | 78.4 | 60.4 | 77.6 | 72.5 | 75.0 |
| LEGNet (2025) | 71.2 | 32.5 | 76.4 | 72.1 | 74.2 |
| FBRT (2025) | 79.3 | 61.6 | 77.6 | 74.8 | 76.2 |
| Propose+ (YOLOv11) | 83.1 | 65.0 | 79.4 | 78.3 | 78.8 |
| Propose+ (YOLOv12) | 81.9 | 63.1 | 75.9 | 78.9 | 77.4 |
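
The F1-Score column in the table above can be cross-checked against the Precision and Recall columns. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall, which this section does not state explicitly. The helper name `f1_score` is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in percent) copied from three rows of the table above.
rows = {
    "FasterRCNN (2017)": (80.5, 74.1),
    "YOLOv11 (2024)": (80.3, 72.6),
    "Propose+ (YOLOv11)": (79.4, 78.3),
}

f1 = {name: round(f1_score(p, r), 1) for name, (p, r) in rows.items()}
```

Rounded to one decimal place, the three values come out as 77.2, 76.3, and 78.8, matching the reported F1-Scores for those rows.
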
| Model | mAP50 | mAP50–95 | Precision | Recall | F1-Score |
|---|---|---|---|---|---|
| FasterRCNN (2017) | 58.6 | 22.4 | 61.7 | 68.3 | 64.8 |
| RetinaNet (2017) | 46.7 | 12.5 | 55.2 | 56.1 | 55.6 |
| Transformer (2019) | 55.3 | 20.2 | 64.5 | 58.9 | 61.6 |
| CSL (2020) | 56.3 | 20.2 | 60.7 | 61.8 | 61.2 |
| R3Det (2021) | 53.9 | 16.3 | 67.0 | 54.7 | 60.2 |
| PKI (2024) | 59.9 | 24.6 | 70.2 | 60.7 | 65.1 |
| RT-DETR (2024) | 64.9 | 44.9 | 67.5 | 59.1 | 63.0 |
| YOLOv11 (2024) | 65.8 | 41.7 | 71.0 | 59.5 | 64.7 |
| YOLOv12 (2025) | 66.6 | 46.1 | 67.6 | 60.3 | 63.7 |
| LEGNet (2025) | 64.9 | 29.3 | 70.2 | 64.4 | 67.2 |
| FBRT (2025) | 65.6 | 45.0 | 60.9 | 67.8 | 64.2 |
| Propose+ (YOLOv11) | 68.5 | 46.0 | 67.0 | 65.6 | 66.3 |
| Propose+ (YOLOv12) | 68.8 | 47.2 | 63.5 | 66.9 | 65.2 |

| Model | mAP50 (GDTD) | mAP50 (OPMOD) | mAP50 (OSTPD) | mAP50 (TPHPD) |
|---|---|---|---|---|
| FasterRCNN (2017) | 34.8 | 51.7 | 36.8 | 44.4 |
| RetinaNet (2017) | 25.2 | 49.8 | 50.9 | 67.6 |
| Transformer (2019) | 43.0 | 50.2 | 50.1 | 64.5 |
| CSL (2020) | 34.1 | 50.3 | 38.8 | 59.5 |
| R3Det (2021) | 41.1 | 51.0 | 46.0 | 66.0 |
| PKI (2024) | 59.0 | 54.6 | 46.5 | 75.2 |
| RT-DETR (2024) | 48.5 | 54.1 | 35.2 | 76.7 |
| YOLOv11 (2024) | 55.6 | 55.7 | 49.5 | 80.6 |
| YOLOv12 (2025) | 52.6 | 55.7 | 43.9 | 82.1 |
| LEGNet (2025) | 56.7 | 50.2 | 51.1 | 76.2 |
| FBRT (2025) | 56.4 | 56.5 | 48.5 | 82.7 |
| Propose+ (YOLOv11) | 57.9 | 56.3 | 52.0 | 84.2 |
| Propose+ (YOLOv12) | 57.7 | 56.9 | 47.4 | 83.1 |

| Model | GFLOPs | FPS |
|---|---|---|
| FasterRCNN (2017) | 63.3 | 48.2 |
| RetinaNet (2017) | 52.5 | 50.8 |
| Transformer (2019) | 77.2 | 42.4 |
| CSL (2020) | 36.2 | 44.5 |
| R3Det (2021) | 82.3 | 25 |
| PKI (2024) | 45.4 | 6.5 |
| RT-DETR (2024) | 40.83 | 57 |
| YOLOv11 (2024) | 3.26 | 116 |
| YOLOv12 (2025) | 3.27 | 113 |
| LEGNet (2025) | 56.8 | 29.6 |
| FBRT (2025) | 7.31 | 114 |
| Propose+ (YOLOv11) | 7.14 | 88 |
| Propose+ (YOLOv12) | 7.1 | 84 |

| Variant | GFLOPs | FPS | DARF-MOE | NST-MOE | KAN-MCP | Linear | mAP50 (Detection) | mAP50 (Segmentation) |
|---|---|---|---|---|---|---|---|---|
| A | 3.26 | 116 | - | - | - | - | 80.5 | 81.4 |
| B | 7.18 | 85 | √ | √ | √ | 83.3 | 84.5 | |
| C | 7.26 | 92 | √ | √ | √ | 82.5 | 83.7 | |
| D | 7.24 | 94 | √ | √ | 81.9 | 83.4 | ||
| E | 5.35 | 98 | √ | √ | 82.1 | 82.7 | ||
| F | 5.17 | 92 | √ | √ | 81.9 | 83.1 | ||
| G | 5.25 | 107 | √ | 81.7 | 82.3 | |||
| H | 5.11 | 97 | √ | 81.5 | 82.4 | |||
| I | 3.19 | 105 | √ | 80.9 | 83.1 | |||

| Variant | GFLOPs | FPS | I | J | mAP50 (Box) | mAP50 (Seg) | Number of Experts |
|---|---|---|---|---|---|---|---|
| A | 6.39 | 91.2 | 1 | 16 | 81.8 | 82.6 | 16 |
| B | 7.13 | 89.1 | 4 | 3 | 82.1 | 82.9 | 12 |
| C | 7.14 | 88 | 4 | 4 | 83.1 | 84.6 | 16 |
| D | 7.16 | 86.3 | 4 | 6 | 82.1 | 83.1 | 24 |
| E | 7.11 | 84.1 | 3 | 4 | 82.6 | 83.3 | 12 |
| F | 7.19 | 83.4 | 6 | 4 | 82.3 | 82.8 | 24 |
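
In every row of the table above, the reported number of experts equals the product I × J. The short check below makes that relationship explicit. Reading I and J as the sizes of the two nesting levels is an assumption drawn from the column layout, and the helper name `total_experts` is illustrative.

```python
# (I, J, reported number of experts) for variants A-F in the table above.
variants = {
    "A": (1, 16, 16),
    "B": (4, 3, 12),
    "C": (4, 4, 16),
    "D": (4, 6, 24),
    "E": (3, 4, 12),
    "F": (6, 4, 24),
}


def total_experts(i, j):
    """Total expert count, assuming a two-level I x J arrangement."""
    return i * j


# Variants whose reported count differs from I * J (empty if all agree).
mismatches = [
    name for name, (i, j, reported) in variants.items()
    if total_experts(i, j) != reported
]
```

Variants A and C both total 16 experts but split them differently (1 × 16 versus 4 × 4), so the gap between their reported mAP50 values comes from the arrangement rather than the expert count.
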
| Image Level | Data Size (GB) | Processing Time (h) | Screening Time (h) | Total Time (h) |
|---|---|---|---|---|
| Level-18 Empirical Data | 6.35 | 0.88 | 0.46 | 1.34 |
| Level-16 Empirical Data | 12.40 | 2.2 | 1.01 | 3.21 |
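
The Total Time column in the table above is the sum of processing and screening time. The sketch below re-derives it and also computes data throughput in GB per hour. Throughput is not reported in the source and is included only as an illustrative derived quantity. The helper name `summarize` is hypothetical.

```python
# (processing hours, screening hours, data size in GB) from the table above.
runs = {
    "Level-18": (0.88, 0.46, 6.35),
    "Level-16": (2.2, 1.01, 12.40),
}


def summarize(processing_h, screening_h, size_gb):
    """Return (total hours rounded to 2 decimals, throughput in GB per hour)."""
    total_h = processing_h + screening_h
    return round(total_h, 2), size_gb / total_h


summary = {level: summarize(*values) for level, values in runs.items()}
```

The recomputed totals are 1.34 h and 3.21 h, in agreement with the table.
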
| Categories | GDS | CSS | ES | TP | LS |
|---|---|---|---|---|---|
| Precision | 54.8% | 29.8% | 65.3% | 6.6% | 9.7% |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Wang, K.; Liu, J.; Li, C.; Yu, B. WMN: A Multi-Scale Nested Mixture-of-Experts-Based Method for High-Resolution Remote-Sensing Solid Waste Site Extraction and Monitoring. Appl. Sci. 2026, 16, 6259. https://doi.org/10.3390/app16126259































