Article

Enhancing River Waste Detection with Deep Learning and Preprocessing: A Case Study in the Urban Canals of the Chao Phraya River

by Maiyatat Nunkhaw 1,*, Detchphol Chitwatkulsiri 2 and Hitoshi Miyamoto 1

1 Regional Environment Systems Course, Graduate School of Engineering and Science, Shibaura Institute of Technology, Tokyo 135-8548, Japan
2 Department of Water Resources Engineering, Faculty of Engineering, Kasetsart University, Bangkok 10900, Thailand
* Author to whom correspondence should be addressed.
Water 2025, 17(22), 3193; https://doi.org/10.3390/w17223193
Submission received: 23 September 2025 / Revised: 29 October 2025 / Accepted: 6 November 2025 / Published: 8 November 2025

Abstract

Plastic waste in river systems represents a major pathway of marine pollution, with rivers estimated to contribute up to 80% of the plastic entering the ocean. This study introduces a deep learning framework with preprocessing for automated detection and tracking of floating plastic waste (macroplastics) in the urban canals of the Chao Phraya River, Thailand. Unlike previous approaches that rely on site-specific retraining or model modification, our method employs a YOLO-based detection model integrated with DeepSORT (Deep Simple Online and Realtime Tracking). The model, initially trained on laboratory flume images, was adapted to real river conditions through a three-step preprocessing pipeline comprising skew correction, background removal, and object region extraction. Experiments on 2000 canal images demonstrated that preprocessing improved the mean Average Precision (mAP) from 0.74 to 0.85, with notable gains for categories such as foam and paper. Testing with a more advanced YOLO architecture further enhanced accuracy, indicating that preprocessing and model upgrades are complementary. These findings suggest that reliable detection and quantification of floating waste can be achieved without retraining. The proposed framework provides a scalable and cost-effective solution for monitoring in data-limited regions, contributing to efforts to mitigate riverine and marine plastic pollution. Future work will address the remaining limitations, as detection performance is still influenced by strong reflections, motion blur, and occlusion, occasionally resulting in missed detections.

1. Introduction

Plastic waste poses significant threats to marine ecosystems. Of particular concern is the widespread presence of microplastics, which disrupt ecological balance, threaten aquatic organisms, and may affect human health. The accumulation of plastic debris in oceans has been linked to ecosystem degradation and the loss of biodiversity and ecosystem services. Rivers are estimated to contribute up to 80% of plastics entering the oceans, making riverine systems a key pathway for marine pollution and a critical target for mitigation [1,2,3].
Urbanization and human activity in river basins have accelerated the transport of waste into rivers and, ultimately, the oceans [4]. Once in rivers, plastics undergo physical, chemical, and biological fragmentation, generating microplastics that are even more difficult to manage. Detecting and quantifying river plastic waste is therefore essential for understanding river conditions and their downstream impacts on marine ecosystems [5].
In Thailand, the problem is particularly severe. According to the World Bank [6], about 428,000 tons of plastic waste are mismanaged annually, much of which enters the ocean through rivers. In the Chao Phraya River, the estimated microplastic outflow from Samut Prakan Province reached about 183,000 particles per day in September 2021 and 160,000 particles per day in March 2022. Similarly, a study of the Nan River found an average of 23.67 microplastic particles per liter in surface water samples [6,7]. These findings underscore the severity of river plastic pollution in Thailand and the need for effective monitoring systems.
To address this challenge, methods that can identify and quantify different waste types are required. Accurate classification of plastics, metals, glass, and other anthropogenic debris supports targeted waste management strategies and enables automated monitoring systems that can reduce costs and labor [8,9,10].
Recent advances in computer vision have enabled automated, image-based monitoring systems to replace traditional manual observation. Deep learning techniques—particularly object detection and tracking—have significantly improved monitoring capabilities for dynamic environments. For instance, YOLO-based detection integrated with DeepSORT tracking has shown promising performance for continuous monitoring of floating waste under varying water and lighting conditions [11,12,13,14].
Numerous studies have investigated river waste modeling and monitoring to understand spatial distribution and accumulation dynamics. Lebreton et al. [1] identified major source rivers such as the Yangtze, Ganges, and Niger through a global emission model. Gasperi et al. [9] and González-Fernández et al. [10] applied image-based approaches along the Seine and Rhine, while Kataoka et al. [15] demonstrated image-based quantification under flood conditions in Japan. Recent studies in Malaysia and China achieved 85–90% accuracy using YOLO-based detection [16,17], and van Lieshout et al. [18] reported 68.7% precision in Jakarta’s rivers, with performance declining under local variability. A YOLO + DeepSORT system previously trained on 5711 flume images achieved over 81% mAP in laboratory settings but lower accuracy in natural environments due to reflections, lighting variation, and occlusion [19,20]. Overall, river waste monitoring has advanced globally, yet site-specific adaptation and environmental variability remain major challenges, emphasizing the need for transferable and efficient monitoring frameworks.
This study aims to address these limitations by developing an automated waste quantification framework for real river environments. The applicability of a deep learning-based system for floating waste in the urban canals of the Chao Phraya River is evaluated using a three-step preprocessing pipeline—skew correction, background removal, and object region extraction—to enhance detection accuracy and robustness. Unlike previous studies that focused on model retraining or architectural modification, this research adopts a data-centric approach emphasizing preprocessing to improve adaptability to diverse field conditions. The proposed framework enables a pretrained YOLO-based detection and tracking model to achieve high accuracy without additional retraining, representing a practical and scalable solution for river waste monitoring, particularly in data-limited or resource-constrained regions. The novelty of this study lies in demonstrating, through a case study of the Chao Phraya urban canals, that reliable and transferable detection performance can be achieved under real-world conditions without site-specific retraining.

2. Materials and Methods

2.1. Implementation of the Proposed Method

The proposed method consists of two main modules: a preprocessing module and an automated waste measurement module, as shown in Figure 1. The preprocessing module applies three techniques: skew correction, background removal, and object region extraction using SSD (Single Shot MultiBox Detector), implemented in Python 3.10 with TensorFlow 2.12 [21]. The automated measurement module integrates YOLOv5 for object detection [22] with DeepSORT for multi-object tracking [23], enabling efficient monitoring and quantification of riverine waste in video frames.
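A minimal sketch of how these two modules compose is shown below. The function names and bodies are illustrative stand-ins, not the released repository's actual API (see the Data Availability Statement for the actual code).

```python
import cv2
import numpy as np

# --- Preprocessing module: three stages applied in order (Section 2.3) ---
def correct_skew(frame: np.ndarray) -> np.ndarray:
    return frame  # stand-in: perspective-warp the tilted CCTV view to top-down

def remove_background(frame: np.ndarray) -> np.ndarray:
    return frame  # stand-in: suppress the water-surface background

def extract_regions(frame: np.ndarray) -> np.ndarray:
    return cv2.resize(frame, (300, 300))  # SSD input size used for region extraction

# --- Automated measurement module: detection + tracking (Section 2.4) ---
def detect_and_track(region: np.ndarray) -> None:
    pass  # stand-in: YOLOv5 detection followed by DeepSORT identity tracking

def run_pipeline(video_path: str) -> None:
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        region = extract_regions(remove_background(correct_skew(frame)))
        detect_and_track(region)
    cap.release()
```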

2.2. Study Area and Sampling Sites

This study was conducted in an urban canal connected to the Chao Phraya River in Bangkok, Thailand. The Chao Phraya River is the principal watercourse of central Thailand, encompassing a drainage basin of approximately 20,523 km2. It originates in Nakhon Sawan Province at the confluence of the Ping, Wang, Yom, and Nan Rivers, and flows about 379 km southward before discharging into the Gulf of Thailand. The river traverses the central plains, a region characterized by intensive agricultural, aquacultural, industrial, and residential development.
For this research, the target area was a canal section monitored by the Bangna CCTV station in Bangkok as shown in Figure 2. The monitoring system, installed along the Bangna Canal—a tributary of the Chao Phraya River—continuously recorded video at a resolution of 640 × 480 pixels and a frame rate of 30 frames per second (fps). The camera operated 24 h per day throughout the observation period.
To reduce the effects of low illumination, glare, and image noise, only daytime footage (approximately 08:00–17:00 local time) was extracted and analyzed. A total of 30 consecutive days of video recordings were archived, from which 2000 JPEG images were systematically sampled at regular intervals for experimental evaluation. Each extracted frame was visually inspected to ensure scene consistency and to exclude frames with significant occlusion, rainfall, or poor visibility.
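For illustration, the regular-interval sampling described above can be reproduced with OpenCV as sketched below; the one-minute interval and the output file naming are assumptions, not the exact settings used in this study.

```python
import cv2

def sample_frames(video_path: str, out_dir: str, every_n_sec: float = 60.0) -> int:
    """Systematically sample JPEG frames from daytime footage at a fixed interval."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0   # Bangna station records at 30 fps
    step = int(fps * every_n_sec)             # frames between consecutive samples
    saved, index = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            cv2.imwrite(f"{out_dir}/frame_{saved:05d}.jpg", frame)
            saved += 1
        index += 1
    cap.release()
    return saved
```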
The fixed camera installation provided a stable viewpoint geometry, minimizing positional variation and enabling a direct comparison between laboratory-trained and real-canal imagery.

2.3. Preprocessing Section

The Preprocessing Section aims to adjust actual canal images to resemble the laboratory images used for training the automated waste measurement method, as shown in Figure 3. Although various preprocessing methods are available, this study proposes three specific methods: skew correction, background removal, and object detection using SSD. Skew correction transforms the tilted images captured by CCTV into top-down images like those taken in a laboratory setting. Background removal addresses the difficulty of distinguishing objects from the background in real-environment images by eliminating the background (water surface), making objects easier to differentiate. In the SSD-based object detection step, the model uses an input size of 300 × 300 pixels, ensuring consistency with the configuration used during training.
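Since the text does not name specific library routines, the following is a minimal sketch of the first two steps using common OpenCV operations: a fixed perspective warp for skew correction (the corner coordinates below are illustrative values, calibrated once for a fixed camera) and MOG2 background subtraction as one plausible way to remove the water surface.

```python
import cv2
import numpy as np

# Assumed corner correspondence for the fixed CCTV view (calibrated once);
# these coordinates are illustrative, not the study's calibration values.
SRC = np.float32([[120, 80], [520, 80], [620, 460], [20, 460]])   # tilted view
DST = np.float32([[0, 0], [640, 0], [640, 480], [0, 480]])        # top-down view
WARP = cv2.getPerspectiveTransform(SRC, DST)

# One plausible background model for a fixed camera over water (assumption).
bg_model = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=False)

def preprocess_frame(frame: np.ndarray) -> np.ndarray:
    """Skew correction followed by background (water-surface) removal."""
    topdown = cv2.warpPerspective(frame, WARP, (640, 480))
    mask = bg_model.apply(topdown)                 # foreground = floating objects
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))
    return cv2.bitwise_and(topdown, topdown, mask=mask)
```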

2.4. Automated Waste Measurement

In this study, an automated waste measurement framework was established by integrating YOLO and DeepSORT, forming a unified pipeline for the real-time detection and tracking of floating waste, as shown in Figure 4. This integration enables continuous quantification of waste flow in dynamic river environments, thereby enhancing the accuracy and efficiency of environmental monitoring systems.
YOLO was selected as the core detection algorithm because of its strong balance between speed and accuracy, which is particularly effective under the variable conditions encountered in real-world monitoring. Its ability to detect multiple objects concurrently within a single inference allows for real-time responsiveness, while its high precision in identifying diverse waste types aligns well with the objectives of this research. YOLO predicts object classes and bounding boxes through a multi-term loss function that jointly optimizes localization, confidence, and classification performance.
For the tracking component, DeepSORT was adopted to extend the functionality of the original SORT algorithm through the incorporation of a deep appearance descriptor. This addition allows the system to maintain consistent object identities over time, even under conditions of occlusion or changes in object appearance. DeepSORT combines motion prediction based on Kalman filtering with object association using Mahalanobis distance, ensuring reliable tracking across frames.
To optimize the combined performance of YOLO and DeepSORT, several parameters were fine-tuned. The YOLO model was trained with an initial learning rate of 0.00872 (Table S1) while DeepSORT employed a cosine distance threshold of 0.2 for robust track association. These configurations provided a stable balance between detection precision and tracking consistency in real environments.
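A sketch of the combined detection-tracking loop is given below. It assumes the Ultralytics YOLOv5 hub model and the third-party deep_sort_realtime package as stand-ins for the study's implementation; the 0.2 cosine distance threshold matches the value reported above.

```python
import cv2
import torch
from deep_sort_realtime.deepsort_tracker import DeepSort  # assumed tracker package

# Pretrained YOLOv5 from the Ultralytics hub as a stand-in for the study's weights.
model = torch.hub.load("ultralytics/yolov5", "yolov5s")
tracker = DeepSort(max_age=30, max_cosine_distance=0.2)  # threshold from Section 2.4

def process(frame, counted_ids, counts):
    """Detect floating waste in one BGR frame, track it, and count each ID once."""
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    det = model(rgb).xyxy[0]  # rows: x1, y1, x2, y2, confidence, class
    raw = [([x1, y1, x2 - x1, y2 - y1], conf, int(cls))
           for x1, y1, x2, y2, conf, cls in det.tolist()]
    for track in tracker.update_tracks(raw, frame=frame):
        if track.is_confirmed() and track.track_id not in counted_ids:
            counted_ids.add(track.track_id)       # object newly confirmed in scene
            cls = track.get_det_class()
            counts[cls] = counts.get(cls, 0) + 1
```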

2.5. Experiments

2.5.1. Experiment I: Application of the Model to a Canal in Thailand

The practical applicability of the proposed waste detection and tracking system was evaluated by applying the integrated YOLOv5 and DeepSORT model to a real-world environment in Thailand.

2.5.2. Experiment II: Application of the Proposed Model with Preprocessing in a Canal in Thailand

The preprocessing component of the proposed automated waste measurement framework is designed to align real-world imagery with the characteristics of laboratory images used during model training. Given the significant variability and complexity of environmental conditions in actual canal settings, effective preprocessing is essential to ensure accurate and consistent performance of the detection and tracking models. Although various preprocessing techniques have been proposed, this study introduces three targeted methods: skew correction, background removal, and object detection using SSD.

2.5.3. Experiment III: Evaluation of the Updated YOLO Model for Automated Waste Measurement

To enhance the performance of the automated waste measurement system, the pipeline was updated to incorporate a YOLOv10-based [25] object detection model. This updated model integrates recent advancements in network architecture and training strategies, including refined anchor box selection, improved data augmentation, and enhanced backbone feature extraction. These improvements are intended to increase detection precision, particularly under the challenging conditions of real-world environments.
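As an illustration of the backbone swap, the snippet below loads a YOLOv10m checkpoint through the Ultralytics API; the package choice and weight file name are assumptions for illustration, since the study's exact integration is not specified here.

```python
from ultralytics import YOLO

# Load a medium-size YOLOv10 checkpoint (assumed filename per Ultralytics releases).
model = YOLO("yolov10m.pt")

# Run inference on a preprocessed canal frame; the detection output format matches
# earlier YOLO versions, so the downstream DeepSORT stage needs no changes.
results = model.predict("preprocessed_frame.jpg", conf=0.25)
for box in results[0].boxes:
    print(int(box.cls), float(box.conf), box.xyxy.tolist())
```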

2.6. Evaluation

Evaluating the performance of object detection and tracking models is essential for assessing their reliability in real applications. In this study, four standard metrics were used: Precision (P), Recall (R), F1 score [26], and mean Average Precision (mAP) [27].
Precision (P) represents the proportion of correctly detected objects among all detections, whereas Recall (R) indicates the proportion of actual objects that were successfully detected. The F1 score combines Precision and Recall into a single measure to evaluate the balance between accuracy and completeness and is particularly useful for assessing counting and tracking performance. Since the primary objective of this study is to quantify and track the number of floating waste objects, the use of the F1 score provides an appropriate and consistent metric for evaluating both detection and tracking reliability. The mAP summarizes detection accuracy across all waste categories as an overall indicator of model performance.
These metrics enable consistent evaluation and comparison of detection and tracking models. They were computed following Equations (1)–(4):

$$P = \frac{TP}{TP + FP} \tag{1}$$

$$R = \frac{TP}{TP + FN} \tag{2}$$

$$F1 = \frac{2 \, P \cdot R}{P + R} \tag{3}$$

$$\mathrm{mAP} = \frac{1}{N_{\mathrm{class}}} \sum_{k=1}^{N_{\mathrm{class}}} \sum_{i=1}^{n} \left( R_i - R_{i-1} \right) P_i(k) \tag{4}$$

where TP, FP, and FN denote true positives, false positives, and false negatives, respectively; $N_{\mathrm{class}}$ is the number of object categories; and $P_i(k)$ is the precision of class $k$ at the $i$-th of $n$ recall levels $R_i$.
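As a reference illustration of Equations (1)–(4), the sketch below computes the metrics from raw counts and score-ranked detections; it follows the definitions above and is not the study's evaluation code.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Equations (1)-(3): Precision, Recall, and F1 from raw counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def average_precision(scores, is_tp, n_gt):
    """Inner sum of Equation (4): AP as area under the ranked P-R curve."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    for i in order:                       # descending confidence
        tp, fp = (tp + 1, fp) if is_tp[i] else (tp, fp + 1)
        recall = tp / n_gt
        ap += (recall - prev_recall) * (tp / (tp + fp))  # (R_i - R_{i-1}) * P_i
        prev_recall = recall
    return ap

def mean_average_precision(per_class):
    """Equation (4): mean of per-class APs; per_class = [(scores, is_tp, n_gt), ...]."""
    return sum(average_precision(*c) for c in per_class) / len(per_class)
```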

3. Results

3.1. Experiment I: Application of the Model to a Canal in Thailand

In this experiment, the baseline performance of the automated waste detection system was evaluated by applying a YOLOv5 object detection model integrated with the DeepSORT tracking algorithm to video footage captured in a natural canal setting in Thailand, as shown in Figure 5. The YOLOv5 model had been trained in a controlled laboratory environment using annotated datasets, and no preprocessing or domain adaptation techniques were employed during deployment. The objective was to assess the model’s ability to generalize to real-world conditions without any further modification.
The results, as summarized in Figure 6, indicate an overall mAP of 0.74 ± 0.03. While the system was able to detect and track common object classes such as can, glass, and clear plastic bottle with relatively high AP values, the performance for other categories such as foam and paper was noticeably lower. This can be attributed to several environmental challenges present in the canal setting, including water surface reflections, object occlusions, floating debris, and fluctuating lighting conditions.
These findings suggest that models trained solely on laboratory data face significant limitations when applied directly to unstructured outdoor environments. The high variability in background and object appearance in natural canals severely degrades detection accuracy and consistency, demonstrating the need for preprocessing and domain adaptation in such contexts.

3.2. Experiment II: Application of the Proposed Model with Preprocessing in a Canal in Thailand

To address the domain gap between laboratory and real-world imagery observed in Experiment I, this experiment introduced a preprocessing framework designed to adapt input data before applying the detection model. The preprocessing pipeline consists of three key components: (1) skew correction to normalize image orientation, (2) background removal to reduce visual clutter and isolate floating waste, and (3) object region identification using the Single Shot Multibox Detector (SSD) to guide YOLOv5 in focusing on relevant areas.
After integrating these preprocessing steps, the YOLOv5 + DeepSORT system was re-applied to the same canal dataset. The performance notably improved, with the mAP increasing from 0.74 ± 0.03 to 0.82 ± 0.03. As illustrated in Figure 7, each waste category exhibited improved AP values, with foam and paper showing the most significant gains (+0.10 and +0.08, respectively). Table 1 (center) visually demonstrates enhanced detection robustness and fewer false positives due to the improved signal-to-noise ratio in the preprocessed imagery.
This experiment highlights the importance of task-specific preprocessing in enabling deep learning models to adapt to complex and variable outdoor conditions. By reducing irrelevant background features and standardizing input image characteristics, the system achieved more stable and accurate detection and tracking performance in the real-world environment.

3.3. Experiment III: Evaluation of the Updated YOLO Model for Automated Waste Measurement

In the final experiment, the system was further enhanced by upgrading the detection backbone from YOLOv5 to YOLOv10. The YOLOv10m variant was adopted for its optimized balance between inference speed and detection accuracy. YOLOv10 introduces several architectural advancements, including a more efficient backbone, refined anchor box selection, improved feature fusion, and robust data augmentation techniques. These enhancements are expected to provide superior detection accuracy, particularly under challenging outdoor conditions such as canal environments.
Initially, the YOLOv10 + DeepSORT system was evaluated without preprocessing, resulting in a mAP of 0.78 ± 0.04—already an improvement over the YOLOv5 baseline, as shown in Table 2. Subsequently, the same preprocessing pipeline from Experiment II was applied, which further elevated the mAP to 0.85 ± 0.03. This was the highest score among all experimental configurations. As shown in Figure 7, clear plastic bottle achieved an AP of 0.92, while plastic and glass also showed remarkable accuracy improvements. Table 1 presents a detection scene where the system precisely identified and tracked multiple floating waste items even under sunlight reflection and water motion.
These results confirm that combining advanced detection architectures with domain-aware preprocessing enables robust performance under real-world conditions. The YOLOv10-based system not only enhances detection precision but also ensures consistent tracking, making it well-suited for practical deployment in riverine waste monitoring applications.

3.4. Effectiveness of Preprocessing Steps by Waste Type

To evaluate the individual and combined contributions of the proposed preprocessing components, a stepwise performance analysis was conducted across seven waste types using YOLOv10. Table 3 summarizes the mAP improvements observed for each class after applying successive preprocessing stages: Skew Correction (SC), Background Removal (BG), and Object Extraction (OE).
Skew Correction, the first step, resulted in modest but consistent gains across all waste types. Notably, rigid and geometrically distinct objects such as plastic bottles (+0.07) and cans (+0.06) showed the highest sensitivity to skew normalization. This suggests that perspective correction facilitates improved object localization, particularly for cylindrical or elongated items whose orientation heavily influences their appearance.
Background Removal, when added to skew correction, yielded substantial additional improvements for most classes, especially for visually ambiguous or low-contrast materials. Foam exhibited a 0.10 increase (from 0.70 to 0.80), and plastic and paper also improved significantly (+0.08 and +0.06, respectively). These results highlight that eliminating dynamic or reflective water backgrounds helps the model suppress false positives and focus on relevant object features.
The final stage, Object Extraction, further enhanced performance in all categories, though its marginal gains were smaller than previous steps. The improvement was most pronounced for cans (from 0.89 to 0.91) and plastic bottles (0.92 to 0.94), indicating that spatial cropping and focusing on regions-of-interest helped the model refine its predictions by reducing irrelevant spatial context.
Overall, the preprocessing pipeline increased total mAP from 0.74 (raw input) to 0.85 (fully preprocessed), with the most significant individual-class gains observed in:
Foam: +0.13 (0.69 → 0.82);
Plastic bottle: +0.12 (0.82 → 0.94);
Plastic: +0.14 (0.71 → 0.85);
Can: +0.14 (0.77 → 0.91).
These improvements are particularly important for waste types that are prone to environmental degradation or visual confusion, such as foam and plastic. The results demonstrate that task-specific preprocessing not only improves overall accuracy but also selectively enhances detection performance for difficult waste categories.
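These class-level gains can be made explicit with a short computation over the baseline and final AP values listed above; this is a worked illustration using the four classes reported, showing how the relative gains in Tables 1 and 2 are derived.

```python
# Absolute and relative gains per class (baseline and final AP from the list above).
baseline_final = {
    "foam": (0.69, 0.82),
    "plastic bottle": (0.82, 0.94),
    "plastic": (0.71, 0.85),
    "can": (0.77, 0.91),
}
for cls, (base, final) in baseline_final.items():
    gain = final - base
    print(f"{cls}: +{gain:.2f} absolute ({gain / base * 100:.1f}% relative)")
```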

4. Discussion

In recent years, many studies have applied deep learning to the detection of floating waste in riverine environments. A consistent trend in this work has been reliance on dataset-specific retraining or architectural modifications to address environmental variability. For example, as summarized in Table 4, Zailan et al. (2022) [16] reported a mean Average Precision (mAP) of about 89% using an optimized YOLOv4 model fine-tuned with a custom river dataset from Malaysia. Yang et al. (2022) [28] achieved nearly 90% with a tuned YOLOv5, while Jiang et al. (2024) [29] introduced attention-based feature fusion into YOLOv7 and exceeded 91%. Similarly, Li et al. (2020) [30] applied a modified YOLOv3 to reach about 84% on river imagery. Although these approaches achieved high accuracy, they required large, annotated datasets, computationally demanding retraining, and location-specific adaptation.
In contrast, the present study achieved comparable results (mAP = 0.85 ± 0.03) using a different strategy. Instead of retraining or modifying the detection architecture, a three-stage preprocessing pipeline—skew correction, background removal, and object region extraction—was applied to a pretrained YOLO model originally trained with 5711 flume-based laboratory images. Importantly, no retraining or additional labeling was performed on the canal dataset, which comprised 2000 frames captured from urban canals of the Chao Phraya River in Thailand. These results highlight the effectiveness of data-centric adaptation (i.e., preprocessing-based adaptation), demonstrating that aligning input images prior to detection can achieve performance comparable to more complex and resource-intensive retraining approaches.
The implications of this data-centric adaptation are notable, especially when compared with the standard practices observed in previous studies. Most prior work has addressed real-world variability by collecting annotated imagery from deployment sites and retraining or fine-tuning detection models. While this approach can achieve high accuracy, it poses practical challenges for scalability, particularly in global applications where local datasets are scarce and retraining may not be feasible.
The proposed method overcomes these limitations by using preprocessing to reduce the domain gap between laboratory imagery and field-captured canal images. This enables the reuse of a robust pretrained model without additional annotation or training, making the approach well-suited for regions where data collection is logistically or economically constrained. Although the YOLO + DeepSORT model was trained solely on laboratory images, it achieved higher accuracy than several retrained alternatives, demonstrating that carefully designed preprocessing can improve both robustness and transferability.
In addition to detection accuracy, tracking reliability was also evaluated using the F1 score, which balances Precision and Recall of tracked objects over time. Although this study primarily focused on detection performance, the F1-based evaluation confirmed more consistent object identity maintenance after preprocessing, indicating improved temporal stability of tracking results. This suggests that the preprocessing pipeline not only enhances detection accuracy but also contributes to smoother object association in continuous video monitoring.
The ablation study shows how each component of the preprocessing pipeline [31] contributes to detection performance, particularly in addressing the environmental complexity of real-world settings. Skew correction [32] was especially effective for elongated or angular objects, such as bottles and cartons, by normalizing orientation and reducing distortion. This improved consistency with the training set and increased Average Precision (AP) by 0.07–0.09 for plastic bottles and cans. Background removal [33] reduced the visual complexity of river surfaces (e.g., waves, reflections, vegetation), improving the signal-to-noise ratio and the detection of low-contrast objects. Foam and paper benefited most, with mAP gains of 0.08–0.10. SSD-guided object extraction further improved precision by isolating relevant regions, though its impact was smaller than that of the previous steps.
Unlike end-to-end retraining approaches, where architectural changes obscure the contribution of individual design choices, our modular pipeline provides clarity and reproducibility. It demonstrates how systematic input conditioning can compensate for domain mismatch and environmental variability. Such transparency is particularly valuable in environmental monitoring, where deployment conditions are highly dynamic and stakeholders require confidence in how the system operates.
These findings should also be considered in the broader context of global riverine and marine plastic pollution, as terrestrial rivers are estimated to contribute about 80% of ocean-bound plastic waste, much of it from urban river systems in rapidly developing regions. Within this context, automated, scalable, and cost-effective monitoring systems are critical to support evidence-based mitigation strategies. A key strength of our proposed approach is its adaptability to diverse real-world conditions, achieved through preprocessing alone without the need for retraining, and with minimal dependence on manual annotation [34]. This allows rapid deployment across river systems with varying hydrological, lighting, and background characteristics. As the proposed approach eliminates the need for retraining and additional data preparation, it offers a computationally efficient and easily deployable solution for real-world monitoring applications.
Nonetheless, several challenges remain. The system occasionally produces false negatives under conditions of strong reflections, motion blur, or occlusion. Moreover, the current pipeline is limited to detecting floating debris at the water surface and cannot identify submerged or bottom-deposited waste. To overcome these limitations, future work will focus on integrating the vision-based pipeline with hydrodynamic models to estimate and forecast waste transport and accumulation, even when objects are not directly visible. Additionally, system scalability will be evaluated by applying the framework to a broader range of natural rivers with diverse environmental characteristics [35]. These efforts aim to advance the development of robust, field-ready tools for cleanup planning and policy applications.

5. Conclusions

This study proposed a deep learning framework with preprocessing for detecting and tracking floating plastic waste in the urban canals of the Chao Phraya River Basin. By integrating YOLO + DeepSORT with a three-step preprocessing pipeline—skew correction, background removal, and object extraction—the framework successfully adapted a laboratory-trained model to real canal imagery without retraining. Experiments using 2000 CCTV images improved the mean Average Precision (mAP) from 0.74 ± 0.03 to 0.82 ± 0.02, and further to 0.85 ± 0.03 with the upgraded YOLOv10m detector, with major gains for visually complex materials such as foam, plastic, and paper. These findings confirm that targeted preprocessing effectively enhances model robustness and stability under varying canal surface and lighting conditions.
The results demonstrate the potential of this preprocessing-based deep learning approach as a scalable and cost-efficient solution for automated canal waste monitoring in data-limited regions. By focusing on preprocessing rather than retraining, the framework reduces annotation requirements and enables rapid deployment for real-world environmental surveillance. Some challenges remain, including reduced accuracy under reflections, motion blur, and partial occlusion, and the current method cannot yet detect submerged debris. Future research will explore multispectral imaging, real-time edge implementation, and integration with cleanup and policy decision systems to advance practical, data-driven river waste management for sustainable environmental monitoring.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/w17223193/s1, Table S1: Hyperparameters.

Author Contributions

Conceptualization, M.N. and H.M.; methodology, M.N., D.C. and H.M.; software, M.N.; validation, M.N.; formal analysis, M.N.; investigation, M.N.; resources, M.N.; writing—original draft preparation, M.N.; writing—review and editing, H.M. and D.C.; visualization, M.N.; supervision, H.M.; project administration, H.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The code used in this study was developed in Python 3.10 and is publicly available at [https://github.com/MAIYATAT/waste_detection, accessed on 20 September 2025]. The image dataset analyzed during the current study is available from the corresponding author on reasonable request.

Conflicts of Interest

The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Lebreton, L.C.M.; van der Zwet, J.; Damsteeg, J.-W.; Slat, B.; Andrady, A.; Reisser, J. River plastic emissions to the world’s oceans. Nat. Commun. 2017, 8, 15611. [Google Scholar] [CrossRef]
  2. Wong, W.Y.; Al-Ani, A.K.I.; Hasikin, K.; Khairuddin, A.S.M.; Razak, S.A.; Hizaddin, H.F.; Mokhtar, M.I.; Azizan, M.M. Water, Soil and Air Pollutants’ Interaction on Mangrove Ecosystem and Corresponding Artificial Intelligence Techniques Used in Decision Support Systems—A Review. IEEE Access 2021, 9, 105532–105563. [Google Scholar] [CrossRef]
  3. Smith, M.; Love, D.C.; Rochman, C.M.; Neff, R.A. Microplastics in Seafood and the Implications for Human Health. Curr. Environ. Health Rep. 2018, 5, 375–386. [Google Scholar] [CrossRef] [PubMed]
  4. Jambeck, J.R.; Geyer, R.; Wilcox, C.; Siegler, T.R.; Perryman, M.; Andrady, A.; Narayan, R.; Law, K.L. Plastic waste inputs from land into the ocean. Science 2015, 347, 768–771. [Google Scholar] [CrossRef] [PubMed]
  5. Blettler, M.C.M.; Abrial, E.; Khan, F.R.; Sivri, N.; Espinola, L.A. Freshwater plastic pollution: Recognizing research biases and identifying knowledge gaps. Water Res. 2018, 143, 416–424. [Google Scholar] [CrossRef]
  6. Babel, S.; Ta, A.T.; Loan, N.T.P.; Sembiring, E.; Setiadi, T.; Sharp, A. Microplastics pollution in selected rivers from Southeast Asia. APN Sci. Bull. 2022, 12, 5–17. [Google Scholar] [CrossRef]
  7. Jendanklang, P.; Meksumpun, S.; Pokavanich, T.; Ruengsorn, C.; Kasamesiri, P. Distribution and flux assessment of microplastic debris in the middle and lower Chao Phraya River, Thailand. J. Water Health 2023, 21, 771–788. [Google Scholar] [CrossRef]
  8. Nihei, Y.; Shirakawa, A.; Suzuki, T.; Akamatsu, Y. Field Measurements of Floating-Litter Transport in a Large River under Flooding Conditions and its relation to DO Environments in an Inner Bay. J. Jpn. Soc. Civ. Eng. Ser B2 Coast. Eng. 2010, 66, 1171–1175. [Google Scholar] [CrossRef]
  9. Gasperi, J.; Dris, R.; Bonin, T.; Rocher, V.; Tassin, B. Assessment of floating plastic debris in surface water along the Seine River. Environ. Pollut. 2014, 195, 163–166. [Google Scholar] [CrossRef]
  10. González-Fernández, D.; Hanke, G. Toward a Harmonized Approach for Monitoring of Riverine Floating Macro Litter Inputs to the Marine Environment. Front. Mar. Sci. 2017, 4, 86. [Google Scholar] [CrossRef]
  11. Gatelli, L.; Gosmann, G.; Fitarelli, F.; Huth, G.; Schwertner, A.A.; de Azambuja, R.; Brusamarello, V.J. Counting, Classifying and Tracking Vehicles Routes at Road Intersections with YOLOv4 and DeepSORT. In Proceedings of the 2021 5th International Symposium on Instrumentation Systems, Circuits and Transducers (INSCIT), Campinas, Brazil, 23–27 August 2021; pp. 1–6. [Google Scholar] [CrossRef]
  12. Zhao, J.; Hao, S.; Dai, C.; Zhang, H.; Zhao, L.; Ji, Z.; Ganchev, I. Improved Vision-Based Vehicle Detection and Classification by Optimized YOLOv4. IEEE Access 2022, 10, 8590–8603. [Google Scholar] [CrossRef]
  13. Charran, S.; Dubey, R. Two-Wheeler Vehicle Traffic Violations Detection and Automated Ticketing for Indian Road Scenario. IEEE Trans. Intell. Transp. Syst. 2022, 23, 22002–22007. [Google Scholar] [CrossRef]
  14. Pszonka, J.; Godlewski, P.; Fheed, A.; Dwornik, M.; Schulz, B.; Wendorff, M. Identification and quantification of intergranular volume using SEM automated mineralogy. Mar. Pet. Geol. 2024, 162, 106708. [Google Scholar] [CrossRef]
  15. Kataoka, T.; Nihei, Y. Quantification of floating riverine macro-debris transport using an image processing approach. Sci. Rep. 2020, 10, 2198. [Google Scholar] [CrossRef]
  16. Zailan, N.A.; Azizan, M.M.; Hasikin, K.; Mohd Khairuddin, A.S.; Khairuddin, U. An automated solid waste detection using the optimized YOLO model for riverine management. Front. Public Health 2022, 10, 907280. [Google Scholar] [CrossRef] [PubMed]
  17. Zailan, N.A.; Mohd Khairuddin, A.S.; Hasikin, K.; Junos, M.H.; Khairuddin, U. An automatic garbage detection using optimized YOLO model. Signal Image Video Process. 2024, 18, 315–323. [Google Scholar] [CrossRef]
  18. van Lieshout, C.; van Oeveren, K.; van Emmerik, T.; Postma, E. Automated River Plastic Monitoring Using Deep Learning and Cameras. Earth Space Sci. 2020, 7, e2019EA000960. [Google Scholar] [CrossRef]
  19. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. arXiv 2016, arXiv:1506.02640. [Google Scholar] [CrossRef]
  20. Nunkhaw, M.; Miyamoto, H. An Image Analysis of River-Floating Waste Materials by Using Deep Learning Techniques. Water 2024, 16, 1373. [Google Scholar] [CrossRef]
  21. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Computer Vision—ECCV 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer International Publishing: Cham, Switzerland, 2016; pp. 21–37. [Google Scholar] [CrossRef]
  22. Chen, L.; Zhu, J. Water surface garbage detection based on lightweight YOLOv5. Sci. Rep. 2024, 14, 6133. [Google Scholar] [CrossRef]
  23. Wojke, N.; Bewley, A.; Paulus, D. Simple Online and Realtime Tracking with a Deep Association Metric. arXiv 2017, arXiv:1703.07402. [Google Scholar] [CrossRef]
  24. Thailand—Subnational Administrative Boundaries|Humanitarian Dataset|HDX. Available online: https://data.humdata.org/dataset/cod-ab-tha (accessed on 27 October 2025).
  25. Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-Time End-to-End Object Detection. arXiv 2024, arXiv:2405.14458. [Google Scholar] [CrossRef]
  26. Padilla, R.; Passos, W.L.; Dias, T.L.B.; Netto, S.L.; da Silva, E.A.B. A Comparative Analysis of Object Detection Metrics with a Companion Open-Source Toolkit. Electronics 2021, 10, 279. [Google Scholar] [CrossRef]
  27. Padilla, R.; Netto, S.L.; da Silva, E.A.B. A Survey on Performance Metrics for Object-Detection Algorithms. In Proceedings of the 2020 International Conference on Systems, Signals and Image Processing (IWSSIP), Niteroi, Brazil, 1–3 July 2020; pp. 237–242. [Google Scholar] [CrossRef]
  28. Yang, X.; Zhao, J.; Zhao, L.; Zhang, H.; Li, L.; Ji, Z.; Ganchev, I. Detection of River Floating Garbage Based on Improved YOLOv5. Mathematics 2022, 10, 4366. [Google Scholar] [CrossRef]
  29. Jiang, Z.; Wu, B.; Ma, L.; Zhang, H.; Lian, J. APM-YOLOv7 for Small-Target Water-Floating Garbage Detection Based on Multi-Scale Feature Adaptive Weighted Fusion. Sensors 2024, 24, 50. [Google Scholar] [CrossRef]
  30. Li, X.; Tian, M.; Kong, S.; Wu, L.; Yu, J. A modified YOLOv3 detection method for vision-based water surface garbage capture robot. Int. J. Adv. Robot. Syst. 2020, 17, 1729881420932715. [Google Scholar] [CrossRef]
  31. Erlander, A.; Persson, F. Investigating Object Detection and Semantic Segmentation Using Preprocessed Radar Data. Master’s Thesis, Lund University, Lund, Sweden, 2024. Available online: http://lup.lub.lu.se/student-papers/record/9161872 (accessed on 8 August 2025).
  32. Mahadevkar, S.; Patil, S.; Kotecha, K.; Abraham, A. A comparison of deep transfer learning backbone architecture techniques for printed text detection of different font styles from unstructured documents. PeerJ Comput. Sci. 2024, 10, e1769. [Google Scholar] [CrossRef] [PubMed]
  33. Wang, H.; Sui, C.; Jiang, F.; Li, S.; Liu, H.; Wang, A. Value-Guided Adaptive Data Augmentation for Imbalanced Small Object Detection. Electronics 2024, 13, 1849. [Google Scholar] [CrossRef]
  34. Qureshi, M.; Fox, S. Using Machine Learning to automate trash detection on a limited dataset with the YOLOv5 object detection network. J. High Sch. Sci. 2023, 7, 87544. [Google Scholar] [CrossRef]
  35. Zhang, J.; Xiang, A.; Cheng, Y.; Yang, Q.; Wang, L. Research on Detection of Floating Objects in River and Lake Based on AI Intelligent Image Recognition. arXiv 2024, arXiv:2404.06883. [Google Scholar] [CrossRef]
Figure 1. Graphical flowchart of the Proposed Method. Arrows indicate the processing flow from input images through preprocessing to the final waste detection output.
Figure 2. The locations of the Chao Phraya River basin, Bangna Canal, and the study site area. Source: modified from the data in Humanitarian Dataset [24].
Figure 3. Comparison among laboratory images (left), real canal images (center), and canal images after preprocessing (right). The laboratory images were taken under controlled conditions, whereas the canal images were captured in real-world environments. The preprocessing step (right) effectively corrects distortion and illumination differences, thereby reducing the domain gap between laboratory and field images.
Figure 4. Deep learning framework for automated waste detection and counting using YOLO and DeepSORT. YOLO performs object detection, while DeepSORT maintains object identities and counts through appearance-based tracking. Arrows indicate the processing flow from detection to tracking and counting.
Figure 5. Representative canal images captured by the Bangna CCTV monitoring station showing detected floating waste objects using the YOLO–DeepSORT framework. Each image has a resolution of 640 × 480 pixels with 8-bit color.
Figure 6. Waste detection results: laboratory vs. canal in Thailand.
Figure 7. Waste detection results of YOLOv5 and YOLOv10 with and without preprocessing.
Table 1. Comparison of YOLOv5 and YOLOv10 on waste detection.

Comparison | mAP Gain | % Gain (Relative)
YOLOv5 → YOLOv10 | +0.04 | +5.4%
YOLOv5 → YOLOv5 + Preprocessing | +0.08 | +10.8%
YOLOv10 → YOLOv10 + Preprocessing | +0.07 | +9.0%
Table 2. Per-class comparison of YOLOv10 without preprocessing and the proposed method (YOLOv10 with preprocessing).

Waste Type | Object Count | YOLOv10 AP | Proposed AP | Absolute Gain | Relative Gain (%)
Can | 540 | 0.83 ± 0.02 | 0.87 ± 0.02 | 0.04 | 4.82%
Carton | 590 | 0.78 ± 0.01 | 0.82 ± 0.01 | 0.04 | 5.13%
Clear plastic bottle | 790 | 0.84 ± 0.02 | 0.86 ± 0.02 | 0.02 | 2.38%
Foam | 500 | 0.76 ± 0.03 | 0.80 ± 0.02 | 0.04 | 5.26%
Glass | 360 | 0.79 ± 0.01 | 0.80 ± 0.01 | 0.01 | 1.27%
Paper | 580 | 0.74 ± 0.00 | 0.76 ± 0.00 | 0.02 | 2.70%
Plastic | 880 | 0.78 ± 0.04 | 0.82 ± 0.03 | 0.04 | 5.13%
Table 3. Effectiveness of preprocessing steps for each waste type.

Type | YOLOv10 Without Preprocessing | Skew Correction | Background Removal | Object Extraction
Can | 0.77 | 0.83 | 0.83 | 0.78
Carton | 0.73 | 0.75 | 0.80 | 0.74
Plastic bottle | 0.82 | 0.89 | 0.85 | 0.82
Foam | 0.69 | 0.70 | 0.79 | 0.70
Glass | 0.73 | 0.77 | 0.78 | 0.74
Paper | 0.72 | 0.74 | 0.79 | 0.73
Plastic | 0.71 | 0.75 | 0.79 | 0.76
Total | 0.74 | 0.77 | 0.80 | 0.76
Table 4. Model comparison of the proposed work with previous works on waste detection.

Study (Year) | Model | Preprocessing | Dataset Environment | Waste Types | mAP@0.5 | Training Images | Test Images | Advantages
This study | YOLOv10 + DeepSORT | Skew correction, BG, ROI extraction | Real river (Thailand) | 7 types | 85% | 5711 (Lab data) | 2000 (River data) | No retraining + preprocessing-based domain adaptation
Zailan et al. (2022) [16] | YOLOv4 (optimized) | Mosaic augmentation | Real river (Malaysia) | 5 types | 89% | 9554 (River data) | 2481 (River data) | Retraining + optimized architecture
Zailan et al. (2024) [17] | YOLOv4-Tiny (embedded) | None | Real river (Malaysia) | 5 types | 74.90% | 21,358 (River data) | 5845 (River data) | Retraining + optimized architecture
Yang et al. (2022) [28] | YOLOv5_CBS | None | Real river (China) | 5 types | 90.9–92.1% | 2400 (River data) | 2000 (River data) | Retraining + optimized architecture
Jiang et al. (2024) [29] | APM-YOLOv7 | None | Real river | 7 types | ~91.3% | 3083 (River data) | 742 (River data) | Retraining + optimized architecture
Li et al. (2020) [30] | YOLOv3-2SMA | Rotation-based | Real river | 3 types | 84.30% | 1204 (River data) | 301 (River data) | Retraining + optimized architecture