Multi-Scale Context Fusion Network for Urban Solid Waste Detection in Remote Sensing Images
Abstract
1. Introduction
- We propose an effective multi-scale context fusion network for detecting urban solid waste in remote sensing images, thereby improving the monitoring of illegal waste dumping. As an intelligent auxiliary tool, this solution can provide a basis for the rational siting and construction of landfill facilities.
- To explore features at different levels, we design an effective guidance fusion module. By using spatial attention mechanisms and large kernel convolutions, it not only helps guide low-level features to retain critical information but also extracts richer features under different receptive fields.
- To capture more representative context information, we introduce a novel context awareness module. By using heterogeneous convolutions and gating mechanisms, it not only captures anisotropic features but also improves feature representation.
- To fuse multi-scale features, we build an innovative multi-scale interaction module. By using cross guidance and coordinate perception, it not only enhances important features but also fuses low-level information with high-level information.
- To substantiate the reliability of our method, we conduct relevant assessments on two representative benchmark datasets. The empirical findings demonstrate that our method is superior to other deep learning models and can achieve consistent performance improvements on different object detectors.
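The guidance fusion idea in the second bullet (a spatial attention gate that helps low-level features retain critical information) can be illustrated with a minimal NumPy sketch. All names, shapes, and the fixed fusion weights here are illustrative assumptions, not the paper's implementation; a trained model would fuse the pooled maps with a learned large-kernel convolution.

```python
import numpy as np

def spatial_attention_gate(x):
    """Compute a CBAM-style spatial gate for a (C, H, W) feature map.

    A trained module would fuse the pooled maps with a large-kernel
    convolution; here we simply average them before the sigmoid.
    """
    avg_map = x.mean(axis=0, keepdims=True)   # (1, H, W) channel average
    max_map = x.max(axis=0, keepdims=True)    # (1, H, W) channel maximum
    fused = 0.5 * (avg_map + max_map)         # stand-in for learned fusion
    return 1.0 / (1.0 + np.exp(-fused))       # sigmoid -> values in (0, 1)

def guide_low_level(low_feat, gate):
    """Reweight low-level features so salient regions are retained."""
    return low_feat * gate                    # gate broadcasts over channels

rng = np.random.default_rng(0)
low = rng.standard_normal((8, 16, 16))        # toy (C, H, W) low-level map
gate = spatial_attention_gate(low)
guided = guide_low_level(low, gate)
```

The gate suppresses background positions and preserves salient ones without changing the tensor shape, which is why it can be dropped between any two feature levels.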
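The third bullet's context awareness module pairs heterogeneous (asymmetric) convolutions with a gating mechanism to capture anisotropic features. The sketch below, again a NumPy approximation with fixed averaging kernels standing in for learned weights, shows the core pattern: a 1×k pass and a k×1 pass capture row-wise and column-wise context separately before a sigmoid gate modulates the input.

```python
import numpy as np

def conv_along_axis(x, kernel, axis):
    """Same-padded 1-D convolution of a 2-D map along one axis."""
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), axis, x)

def anisotropic_context(x, k=3):
    """Capture direction-sensitive context with a 1xk and a kx1 pass,
    then gate the input with the fused response (illustrative only)."""
    kernel = np.ones(k) / k                      # stand-in for learned weights
    horiz = conv_along_axis(x, kernel, axis=1)   # 1xk: row-wise context
    vert = conv_along_axis(x, kernel, axis=0)    # kx1: column-wise context
    fused = horiz + vert
    gate = 1.0 / (1.0 + np.exp(-fused))          # gating mechanism in (0, 1)
    return x * gate

feat = np.arange(16.0).reshape(4, 4)             # toy single-channel map
out = anisotropic_context(feat)
```

Decomposing a k×k kernel into 1×k and k×1 passes is what lets the module respond differently to horizontal and vertical structure, which is useful for elongated waste piles, at a lower cost than a full square kernel.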
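The fourth bullet's multi-scale interaction module combines cross guidance with coordinate perception. The following NumPy sketch (a simplified stand-in; the `fuse_levels` fusion rule and all shapes are assumptions for illustration) shows the coordinate-attention-style idea of pooling along each spatial axis separately so positional information survives, then letting each level's gate reweight the other level before fusion.

```python
import numpy as np

def coordinate_gate(x):
    """Coordinate-attention-style gate for a (C, H, W) map: pool along
    each spatial axis separately to keep positional information."""
    h_desc = x.mean(axis=2, keepdims=True)    # (C, H, 1) row descriptor
    w_desc = x.mean(axis=1, keepdims=True)    # (C, 1, W) column descriptor
    sig = lambda t: 1.0 / (1.0 + np.exp(-t))
    return sig(h_desc) * sig(w_desc)          # broadcasts to (C, H, W)

def fuse_levels(low, high_upsampled):
    """Cross guidance: each level's coordinate gate reweights the
    other level before a simple additive fusion."""
    return (low * coordinate_gate(high_upsampled)
            + high_upsampled * coordinate_gate(low))

rng = np.random.default_rng(1)
low = rng.standard_normal((4, 8, 8))
high = rng.standard_normal((4, 8, 8))         # assumed already upsampled
fused = fuse_levels(low, high)
```

Because the two pooled descriptors keep one spatial axis each, the resulting gate can localize *where* along rows and columns the important responses lie, unlike global average pooling, which collapses all position information.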
2. Materials and Methods
2.1. Datasets
2.2. Model Architecture
2.3. Guidance Fusion Module
2.4. Context Awareness Module
2.5. Multi-Scale Interaction Module
3. Results
3.1. Implementation Details
3.2. Evaluation Metrics
3.3. Performance Comparison
3.4. Generalization Analysis
3.5. Visualization Analysis
3.6. Ablation Studies
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Wikurendra, E.A.; Csonka, A.; Nagy, I.; Nurika, G. Urbanization and benefit of integration circular economy into waste management in Indonesia: A review. Circ. Econ. Sustain. 2024, 4, 1219–1248. [Google Scholar] [CrossRef]
- Cheng, J.; Shi, F.; Yi, J.; Fu, H. Analysis of the factors that affect the production of municipal solid waste in China. J. Clean. Prod. 2020, 259, 120808. [Google Scholar] [CrossRef]
- Wu, W.; Zhang, M. Exploring the motivations and obstacles of the public’s garbage classification participation: Evidence from Sina Weibo. J. Mater. Cycl. Waste Manag. 2023, 25, 2049–2062. [Google Scholar] [CrossRef]
- Kuang, Y.; Lin, B. Public participation and city sustainability: Evidence from urban garbage classification in China. Sustain. Cities Soc. 2021, 67, 102741. [Google Scholar] [CrossRef]
- Maalouf, A.; Mavropoulos, A. Re-assessing global municipal solid waste generation. Waste Manag. Res. 2023, 41, 936–947. [Google Scholar] [CrossRef] [PubMed]
- Voukkali, I.; Papamichael, I.; Loizia, P.; Zorpas, A.A. Urbanization and solid waste production: Prospects and challenges. Environ. Sci. Pollut. Res. 2024, 31, 17678–17689. [Google Scholar] [CrossRef] [PubMed]
- Teshome, Y.; Habtu, N.; Molla, M.; Ulsido, M. Municipal solid wastes quantification and model forecasting. Glob. J. Environ. Sci. Manag. 2023, 9, 227–240. [Google Scholar]
- Li, Y.; Zhang, X. Intelligent X-ray waste detection and classification via X-ray characteristic enhancement and deep learning. J. Clean. Prod. 2024, 435, 140573. [Google Scholar] [CrossRef]
- Li, Y.; Zhang, X. Relation-aware graph convolutional network for waste battery inspection based on X-ray images. Sustain. Energy Technol. Assess. 2024, 63, 103651. [Google Scholar] [CrossRef]
- Zhang, C.; Zhang, Y.; Lin, H. Multi-scale feature interaction network for remote sensing change detection. Remote Sens. 2023, 15, 2880. [Google Scholar] [CrossRef]
- Cheng, Y.; Wang, W.; Zhang, W.; Yang, L.; Wang, J.; Ni, H.; Guan, T.; He, J.; Gu, Y.; Tran, N.N. A multi-feature fusion and attention network for multi-scale object detection in remote sensing images. Remote Sens. 2023, 15, 2096. [Google Scholar] [CrossRef]
- Li, Y.; Zhang, X. Multi-modal deep learning networks for RGB-D pavement waste detection and recognition. Waste Manag. 2024, 177, 125–134. [Google Scholar] [CrossRef] [PubMed]
- Shang, R.; Zhang, J.; Jiao, L.; Li, Y.; Marturi, N.; Stolkin, R. Multi-scale adaptive feature fusion network for semantic segmentation in remote sensing images. Remote Sens. 2020, 12, 872. [Google Scholar] [CrossRef]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149. [Google Scholar] [CrossRef]
- Zhang, H.; Chang, H.; Ma, B.; Wang, N.; Chen, X. Dynamic r-cnn: Towards high quality object detection via dynamic training. In Proceedings of the 16th European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 260–275. [Google Scholar]
- Zhu, C.; He, Y.; Savvides, M. Feature selective anchor-free module for single-shot object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 840–849. [Google Scholar]
- Yang, Z.; Liu, S.; Hu, H.; Wang, L.; Lin, S. Reppoints: Point set representation for object detection. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9657–9666. [Google Scholar]
- Kim, K.; Lee, H.S. Probabilistic anchor assignment with iou prediction for object detection. In Proceedings of the 16th European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 355–371. [Google Scholar]
- Chen, Q.; Wang, Y.; Yang, T.; Zhang, X.; Cheng, J.; Sun, J. You only look one-level feature. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 13039–13048. [Google Scholar]
- Ge, Z. Yolox: Exceeding yolo series in 2021. arXiv 2021, arXiv:2107.08430. [Google Scholar]
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- Cheng, Y.; Wang, W.; Ren, Z.; Zhao, Y.; Liao, Y.; Ge, Y.; Wang, J.; He, J.; Gu, Y.; Wang, Y.; et al. Multi-scale feature fusion and transformer network for urban green space segmentation from high-resolution remote sensing images. Int. J. Appl. Earth Obs. Geoinf. 2023, 124, 103514. [Google Scholar] [CrossRef]
- Wang, Z.; Xu, M.; Wang, Z.; Guo, Q.; Zhang, Q. ScribbleCDNet: Change detection on high-resolution remote sensing imagery with scribble interaction. Int. J. Appl. Earth Obs. Geoinf. 2024, 128, 103761. [Google Scholar] [CrossRef]
- Chang, J.; Dai, H.; Zheng, Y. Cag-fpn: Channel self-attention guided feature pyramid network for object detection. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Seoul, Republic of Korea, 14–19 April 2024; pp. 9616–9620. [Google Scholar]
- Dong, J.; Wang, Y.; Yang, Y.; Yang, M.; Chen, J. MCDNet: Multilevel cloud detection network for remote sensing images based on dual-perspective change-guided and multi-scale feature fusion. Int. J. Appl. Earth Obs. Geoinf. 2024, 129, 103820. [Google Scholar] [CrossRef]
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the 15th European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
- Zhang, Q.L.; Yang, Y.B. Sa-net: Shuffle attention for deep convolutional neural networks. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Toronto, ON, Canada, 6–11 June 2021; pp. 2235–2239. [Google Scholar]
- Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual attention network for scene segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 3146–3154. [Google Scholar]
- Wang, D.; Zhang, C.; Han, M. MLFC-Net: A multi-level feature combination attention model for remote sensing scene classification. Comput. Geosci. 2022, 160, 105042. [Google Scholar] [CrossRef]
- Chen, Y.; Wang, X.; Zhang, J.; Shang, X.; Hu, Y.; Zhang, S.; Wang, J. A new dual-branch embedded multivariate attention network for hyperspectral remote sensing classification. Remote Sens. 2024, 16, 2029. [Google Scholar] [CrossRef]
- Wu, F.; Hu, T.; Xia, Y.; Ma, B.; Sarwar, S.; Zhang, C. WDFA-YOLOX: A wavelet-driven and feature-enhanced attention YOLOX network for ship detection in SAR images. Remote Sens. 2024, 16, 1760. [Google Scholar] [CrossRef]
- Im, J.; Jensen, J.R.; Jensen, R.R.; Gladden, J.; Waugh, J.; Serrato, M. Vegetation cover analysis of hazardous waste sites in Utah and Arizona using hyperspectral remote sensing. Remote Sens. 2012, 4, 327–353. [Google Scholar] [CrossRef]
- Youme, O.; Bayet, T.; Dembele, J.M.; Cambier, C. Deep learning and remote sensing: Detection of dumping waste using UAV. Proced. Comput. Sci. 2021, 185, 361–369. [Google Scholar] [CrossRef]
- Maharjan, N.; Miyazaki, H.; Pati, B.M.; Dailey, M.N.; Shrestha, S.; Nakamura, T. Detection of river plastic using UAV sensor data and deep learning. Remote Sens. 2022, 14, 3049. [Google Scholar] [CrossRef]
- Liao, Y.H.; Juang, J.G. Real-time UAV trash monitoring system. Appl. Sci. 2022, 12, 1838. [Google Scholar] [CrossRef]
- Zhou, L.; Rao, X.; Li, Y.; Zuo, X.; Liu, Y.; Lin, Y.; Yang, Y. SWDet: Anchor-based object detector for solid waste detection in aerial images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 16, 306–320. [Google Scholar] [CrossRef]
- Sun, X.; Yin, D.; Qin, F.; Yu, H.; Lu, W.; Yao, F.; He, Q.; Huang, X.; Yan, Z.; Wang, P.; et al. Revealing influencing factors on global waste distribution via deep-learning based dumpsite detection from satellite imagery. Nat. Commun. 2023, 14, 1444. [Google Scholar] [CrossRef]
- Zhang, S.; Chi, C.; Yao, Y.; Lei, Z.; Li, S.Z. Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9759–9768. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully convolutional one-stage object detection. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9626–9635. [Google Scholar]
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 42, 318–327. [Google Scholar] [CrossRef]
- Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 658–666. [Google Scholar]
- Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 13713–13722. [Google Scholar]
- Chen, K.; Wang, J.; Pang, J.; Cao, Y.; Xiong, Y.; Li, X.; Sun, S.; Feng, W.; Liu, Z.; Xu, J.; et al. MMDetection: Open mmlab detection toolbox and benchmark. arXiv 2019, arXiv:1906.07155. [Google Scholar]
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar]
- Kong, T.; Sun, F.; Liu, H.; Jiang, Y.; Li, L.; Shi, J. Foveabox: Beyound anchor-based object detection. IEEE Trans. Image Process. 2020, 29, 7389–7398. [Google Scholar] [CrossRef]
- Chen, Z.; Yang, C.; Li, Q.; Zhao, F.; Zha, Z.J.; Wu, F. Disentangle your dense object detector. In Proceedings of the 29th ACM International Conference on Multimedia, Virtual, 20–24 October 2021; pp. 4939–4948. [Google Scholar]
- Feng, C.; Zhong, Y.; Gao, Y.; Scott, M.R.; Huang, W. Tood: Task-aligned one-stage object detection. In Proceedings of the IEEE International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 3490–3499. [Google Scholar]
- Zhang, H.; Wang, Y.; Dayoub, F.; Sunderhauf, N. Varifocalnet: An iou-aware dense object detector. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 8514–8523. [Google Scholar]
- Ying, Z.; Zhou, J.; Zhai, Y.; Quan, H.; Li, W.; Genovese, A.; Piuri, V.; Scotti, F. Large-scale high-altitude UAV-based vehicle detection via pyramid dual pooling attention path aggregation network. IEEE Trans. Intell. Transp. Syst. 2024. [Google Scholar] [CrossRef]
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar]
- Su, B.; Zhang, H.; Li, J.; Zhou, Z. Toward generalized few-shot open-set object detection. IEEE Trans. Image Process. 2024, 33, 1389–1402. [Google Scholar] [CrossRef] [PubMed]
Method | AP | AP50 | AP75 | APS | APM | APL | AR
---|---|---|---|---|---|---|---
Reppoints [17] | 43.5 | 75.4 | 44.3 | 26.7 | 37.7 | 46.5 | 56.5 |
FoveaBox [46] | 45.6 | 75.7 | 48.9 | 29.8 | 41.0 | 48.1 | 57.2 |
PAA [18] | 46.0 | 77.8 | 48.2 | 35.7 | 40.9 | 48.8 | 61.8 |
FSAF [16] | 46.9 | 76.9 | 49.0 | 28.1 | 42.6 | 49.5 | 58.8 |
DDOD [47] | 49.2 | 78.7 | 51.9 | 36.7 | 44.7 | 51.7 | 60.4 |
TOOD [48] | 50.0 | 78.3 | 55.2 | 44.1 | 46.3 | 52.5 | 61.7 |
VFNet [49] | 50.2 | 78.6 | 53.7 | 35.0 | 43.9 | 53.7 | 61.6 |
ATSS [38] | 50.6 | 78.8 | 54.5 | 35.4 | 44.3 | 53.9 | 61.7 |
YOLOF [19] | 31.3 | 60.2 | 28.4 | 19.0 | 24.8 | 34.8 | 51.6 |
YOLOX-S [20] | 55.3 | 70.6 | 58.1 | 32.8 | 51.7 | 57.3 | 59.4 |
SWDet [36] | - | 77.6 | 58.4 | - | - | - | - |
BCANet [37] | 48.0 | 79.9 | 50.0 | 27.6 | 42.9 | 50.9 | 59.2 |
PDPAPAN [50] | 44.0 | 76.3 | 45.6 | 23.4 | 39.5 | 46.5 | 54.0 |
CAGFPN [24] | 46.5 | 76.4 | 48.7 | 26.2 | 41.2 | 49.3 | 55.9 |
Ours | 58.6 | 81.8 | 65.7 | 40.5 | 54.0 | 60.9 | 66.6 |
Method | AP | AP50 | AP75 | APS | APM | APL | AR
---|---|---|---|---|---|---|---
Reppoints [17] | 36.1 | 63.4 | 37.1 | −1 | 33.8 | 33.7 | 55.0 |
FoveaBox [46] | 38.7 | 62.3 | 40.5 | −1 | 37.4 | 36.3 | 56.1 |
PAA [18] | 36.6 | 61.8 | 36.6 | −1 | 35.6 | 33.8 | 60.0 |
FSAF [16] | 37.9 | 61.9 | 38.5 | −1 | 38.0 | 33.3 | 52.6 |
DDOD [47] | 39.1 | 62.3 | 40.3 | −1 | 38.2 | 35.6 | 56.7 |
TOOD [48] | 37.5 | 60.2 | 38.3 | −1 | 37.4 | 35.2 | 55.8 |
VFNet [49] | 38.3 | 61.4 | 39.7 | −1 | 34.0 | 36.3 | 57.1 |
ATSS [38] | 38.6 | 60.7 | 39.6 | −1 | 39.3 | 35.8 | 56.8 |
YOLOF [19] | 27.3 | 49.3 | 28.1 | −1 | 25.4 | 26.2 | 49.1 |
YOLOX-S [20] | 28.4 | 36.9 | 28.6 | −1 | 34.1 | 21.6 | 34.1 |
BCANet [37] | 39.0 | 64.3 | 40.0 | −1 | 39.1 | 35.6 | 56.3 |
PDPAPAN [50] | 35.8 | 60.6 | 37.6 | −1 | 33.3 | 33.2 | 50.2 |
CAGFPN [24] | 36.9 | 57.9 | 40.0 | −1 | 35.5 | 33.5 | 48.1 |
Ours | 40.3 | 62.8 | 40.7 | −1 | 39.6 | 37.7 | 55.0 |
Method | AP | AP50 | AP75 | APS | APM | APL
---|---|---|---|---|---|---
ks = [1, 3] | 53.8 | 81.6 | 58.1 | 34.4 | 47.4 | 57.0 |
ks = [3, 1] | 54.2 | 79.7 | 59.9 | 38.6 | 47.7 | 57.4 |
ks = [3, 3] | 54.1 | 80.7 | 60.3 | 37.3 | 47.4 | 57.4 |
All | 54.9 | 81.2 | 59.8 | 33.4 | 48.5 | 58.1 |
Method | AP | AP50 | AP75 | APS | APM | APL
---|---|---|---|---|---|---
HLF+MLF | 54.1 | 80.3 | 58.9 | 39.6 | 46.8 | 57.7 |
LLF+MLF | 54.2 | 80.9 | 59.8 | 41.8 | 48.2 | 57.2 |
All | 54.5 | 81.8 | 59.5 | 41.9 | 46.1 | 58.5 |
Method | AP | AP50 | AP75 | APS | APM | APL
---|---|---|---|---|---|---
CAP | 56.2 | 80.3 | 62.8 | 40.3 | 50.8 | 58.9 |
CMP | 56.5 | 82.4 | 62.1 | 41.8 | 49.7 | 59.8 |
CCP | 56.7 | 80.9 | 63.7 | 38.0 | 51.3 | 59.3 |
CAP+CMP | 57.6 | 82.5 | 63.2 | 42.6 | 52.6 | 60.2 |
CMP+CCP | 57.3 | 82.4 | 62.7 | 39.9 | 52.8 | 59.7 |
CAP+CCP | 57.8 | 82.3 | 64.4 | 49.2 | 50.3 | 61.1 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Li, Y.; Zhang, X. Multi-Scale Context Fusion Network for Urban Solid Waste Detection in Remote Sensing Images. Remote Sens. 2024, 16, 3595. https://doi.org/10.3390/rs16193595