Abstract
The ubiquitous presence of microplastics across environmental compartments presents a formidable ecotoxicological and risk assessment challenge, fundamentally complicated by the link between microplastic morphology and differential toxicological outcomes. Current analytical methods face a significant measurement bottleneck, hindering the precise, high-throughput characterization needed for robust mechanistic and exposure studies. To address this, we introduce MNv4-Conv-M-fpn, a novel deep learning model specifically engineered for multi-class microplastic segmentation and morphological characterization from microscopic images. The model provides the toxicologically relevant granularity required for rigorous risk assessment, segmenting images into six classes: five distinct microplastic categories (fiber, fragment, sphere, foam, and film) and the background. By incorporating transfer learning, a Feature Pyramid Network, and a Feature Fusion Module, our approach achieves high accuracy, computational efficiency, and near real-time inference speed. Comprehensive validation on a diverse dataset demonstrates that MNv4-Conv-M-fpn outperforms existing segmentation methods while maintaining a low computational load, making it well-suited for high-throughput deployment in environmental laboratories and resource-constrained monitoring efforts. This approach offers a valuable tool for environmental monitoring, enabling more precise and scalable analysis of microplastic pollution across ecosystems.
1. Introduction
The global proliferation of plastics has resulted in the ubiquitous presence of plastic debris across all environmental compartments, from terrestrial systems to the deep sea []. A significant portion of this material fragments into microplastics (particles less than 5 mm in size), which are now recognized as a critical pollutant with profound implications for ecotoxicology and environmental safety. This pollution is characterized by extreme persistence, as most plastics resist biodegradation and break down only slowly via mechanical and photochemical processes [,].
The escalating scale of microplastic pollution urgently necessitates robust measurement and analysis tools to investigate the occurrence, transport, and long-term toxicological effects of these particles on biota and ecosystems. Time series data from [] reveal a significant increase in microplastic abundance in coastal waters. These particles are transported along complex, poorly understood pathways, from surface waters to the deep sea, via processes such as sediment gravity flows [,]. Risk assessments [,] confirm that many marine areas already face high ecological risks. Together, these findings underscore the need for tools that can resolve the sources, distribution, and long-term effects of microplastics on marine ecosystem structure and function.
The toxicological impacts of microplastic pollution are multifaceted, causing not only physical harm but also significant sublethal effects on biota across aquatic and terrestrial environments [,,]. Furthermore, microplastics act as vectors for other environmental contaminants, thereby amplifying ecological risks within the marine environment []. A critical prerequisite for assessing and managing these risks is the ability to accurately differentiate the particles' physical and chemical characteristics. In particular, morphological attributes (e.g., fibers, fragments, spheres) are crucial determinants of their environmental fate, transport dynamics, bioavailability, and ultimate toxicological effects []. Ref. [] shows that the shape of these microplastics influences their mobility, buoyancy, interaction with marine organisms, and broader impacts on health and biodiversity, and ref. [] established that the shape of plastic debris is an essential factor for comprehensive categorization and pollution assessment.
To effectively characterize and quantify these diverse microplastic morphologies, researchers primarily utilize optical and electron microscopy. However, manual characterization under optical microscopy is inherently subjective, time-intensive, and prone to inconsistencies, severely limiting the throughput required for large-scale, comprehensive marine pollution monitoring []. To overcome these limitations and enhance the accuracy of environmental assessments, the adoption of deep learning and artificial intelligence (AI) for identifying plastic morphology in microscopy images is essential [,]. This automated approach is crucial for introducing new forms of measurement and analysis that can rapidly and precisely assess marine environmental conditions, thereby enabling better management and supporting policy decisions. For instance, the application of deep learning surrogate methods has been successfully demonstrated for predicting particle transport in coastal environments [].
A key technique in this automation is image segmentation, which employs computer programs to divide microscopy images into distinct regions, labeling each pixel to identify and classify microplastics against the background. This allows researchers to rapidly process large volumes of environmental samples, providing critical data for management, such as size distributions and count statistics for each microplastic category. The work of [] on seafloor debris detection using deep learning driven image restoration further validates the effectiveness of AI-enhanced image analysis in challenging environmental conditions.
Previous studies have explored the use of deep neural network models for microplastic image segmentation; these works are summarized in Table 1. The majority focus on binary segmentation, where images are categorized simply as microplastics versus background. For instance, ref. [] employed a U-Net [] to separate microplastics from background elements. Similarly, ref. [] used MultiResU-Net [], demonstrating improved accuracy. Other works have utilized architectures such as ResNet [] combined with a feature pyramid network [], various U-Net derivatives [], and HRNet [,]. While some works have advanced toward multi-class segmentation, such as [], which segmented beach and marine debris into five morphological categories, most remain focused on binary segmentation.
Table 1.
Summary of deep learning works in microplastic image segmentation, detailing the scope of particle measurement and analysis.
The fundamental limitation of most existing work is the focus on binary segmentation, which fails to capture the full diversity of microplastic types, each associated with distinct origins, transport mechanisms, and toxicological risks in the marine environment. This failure to provide morphological granularity hinders accurate exposure modeling and the development of targeted pollution control strategies. Our work addresses this critical gap in marine pollution measurement and analysis by tackling the technically more difficult problem of segmenting microplastics into five distinct morphological categories (plus the background). This high-performance multi-class segmentation approach allows for a significantly more nuanced and detailed assessment of contamination in environmental samples. Our accurate identification and segmentation of multiple microplastic types in microscope images highlight the potential of using this AI-driven approach as a new paradigm for versatile microplastic image segmentation. This level of morphological granularity is essential for tracking the distribution of microplastics and directly informing targeted pollution control and management strategies in the future.
2. Methodology
2.1. Deep Learning Model
The high-level architecture of the deep learning model proposed in this work is depicted in Figure 1. The model is composed of a backbone network, a feature pyramid network (FPN), and a feature fusion module (FFM). In addition, we utilize a simple yet effective loss function to optimize the model. For brevity, we refer to our model as MNv4-Conv-M-fpn throughout the remainder of this paper. Our full implementation is available at https://github.com/xwd2019/MNv4-fpn/, accessed on 23 November 2025, at commit ID 3b1976c.
Figure 1.
MNv4-Conv-M-fpn’s overall architecture.
2.1.1. Backbone Network
In deep learning, a backbone network is a key part of the model architecture for tasks like semantic segmentation. It is typically a pre-trained convolutional neural network that extracts important features from input images, such as edges, textures, and shapes, which help the model recognize and differentiate between various types of objects in the images (in this work, microplastics). By using a backbone network, the model can efficiently perform complex image analysis with fewer resources.
The choice of a backbone network is crucial for balancing performance and computational cost. We employ MobileNetV4 [] as the backbone for its lightweight and efficient architecture, which is specifically designed for resource-constrained environments. It leverages pre-training on large datasets and incorporates advanced architectural innovations, such as efficient channel attention mechanisms and optimized bottleneck layers, to achieve high segmentation accuracy with significantly reduced computational complexity. This efficiency, demonstrated in Section 3, makes MobileNetV4 an ideal choice for high-throughput environmental monitoring applications where both accuracy and computational efficiency are critical.
The backbone network of our model is constructed from three submodules: CBR, IR, and UIB, described as follows (a minimal PyTorch sketch of the CBR and IR blocks follows the list):
- CBR: Stands for Conv2d + Batch Normalization + ReLU, a widely used combination in deep learning models introduced in [] that has proven highly effective in various image-related tasks.
- IR: Stands for Inverted Residual, introduced in []. The IR module applies the residual connection to high-dimensional feature maps. This inverted residual structure helps maintain the model’s performance even in relatively shallow network architectures.
- UIB: Stands for Universal Inverted Bottleneck, introduced in []. It builds on key components of MobileNetV4, specifically separable depthwise (DW) convolution and pointwise (PW) expansion and projection. UIB extends the Inverted Bottleneck (IB) block introduced in [] and has become a standard building block for efficient neural network architectures.
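To make these building blocks concrete, the following is a minimal PyTorch sketch of the CBR and IR modules as described above. The layer widths, kernel sizes, and expansion factor are illustrative assumptions, not the exact MobileNetV4 configuration (the UIB block, which generalizes this pattern, is omitted for brevity).

```python
import torch.nn as nn

class CBR(nn.Sequential):
    """Conv2d + Batch Normalization + ReLU."""
    def __init__(self, in_ch, out_ch, k=3, stride=1):
        super().__init__(
            nn.Conv2d(in_ch, out_ch, k, stride, padding=k // 2, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

class InvertedResidual(nn.Module):
    """IR block: pointwise expansion -> depthwise conv -> linear pointwise
    projection, with a residual connection when input and output shapes match."""
    def __init__(self, in_ch, out_ch, stride=1, expand=4):
        super().__init__()
        mid = in_ch * expand
        self.use_res = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid, 1, bias=False),                       # expand
            nn.BatchNorm2d(mid), nn.ReLU6(inplace=True),
            nn.Conv2d(mid, mid, 3, stride, 1, groups=mid, bias=False),  # depthwise
            nn.BatchNorm2d(mid), nn.ReLU6(inplace=True),
            nn.Conv2d(mid, out_ch, 1, bias=False),                      # project (linear)
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        y = self.block(x)
        return x + y if self.use_res else y
```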
2.1.2. Feature Pyramid Network
Feature pyramid networks (FPNs), introduced in [], are a series of powerful tools for handling objects of varying sizes, a critical challenge in microplastic image analysis. Unlike traditional methods that rely on a single scale of features, a feature pyramid network constructs a pyramid-like structure of several feature maps, each representing different levels of detail. By combining information from these multiple scales, an FPN effectively captures both fine-grained details necessary for locating small microplastics and broader context for understanding larger fragments. This multi-scale approach significantly enhances the ability to accurately detect and segment microplastics of diverse sizes within a single image, leading to improved overall performance.
A typical feature pyramid network is composed of the following components: Upsample, Concat, C2f, and CBS. These components can be described as follows (a sketch of the top-down fusion step follows the list):
- Upsample: Upsamples the input tensor using bilinear interpolation;
- Concat: Concatenates tensors along the channel dimension;
- C2f: Introduced in [], this module uses two convolutional layers and a fusion layer. It takes feature maps from different spatial scales as input, then applies a series of convolution and concatenation operations to fuse these multi-scale feature maps, allowing the model to capture and integrate information from different spatial resolutions;
- CBS: Stands for Conv2d + Batch Normalization + SiLU. It is similar to the standard CBR module, but replaces the ReLU activation function with SiLU for potentially improved performance.
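As a concrete illustration, the sketch below shows the basic top-down step used by such a pyramid: a coarse feature map is bilinearly upsampled to the size of a finer backbone map, concatenated with it along the channel dimension, and fused by a convolution. Using a plain CBS block for fusion (in place of the full C2f module) and the specific channel widths are simplifying assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CBS(nn.Sequential):
    """Conv2d + Batch Normalization + SiLU."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__(
            nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.SiLU(inplace=True),
        )

def topdown_fuse(coarse, fine, fuse):
    """One FPN step: Upsample the coarse map to the fine map's size,
    Concat along channels, then fuse with a convolution block."""
    up = F.interpolate(coarse, size=fine.shape[-2:],
                       mode="bilinear", align_corners=False)
    return fuse(torch.cat([up, fine], dim=1))

# Example: fuse a 1/32-scale map (256 ch) into a 1/16-scale map (128 ch).
fuse = CBS(256 + 128, 128)
p5 = torch.randn(1, 256, 15, 20)   # coarse backbone output
c4 = torch.randn(1, 128, 30, 40)   # finer backbone output
p4 = topdown_fuse(p5, c4, fuse)    # -> shape (1, 128, 30, 40)
```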
2.1.3. Feature Fusion Module
The feature pyramid network employed in this work generates three feature maps, each capturing semantic information at different scales. Following this, the feature fusion module (FFM) plays a critical role in merging these feature maps into a cohesive representation that effectively balances fine details and broader semantic information. The FFM also includes mechanisms to prioritize the most relevant features, ensuring the final output is optimized for segmentation accuracy.
To further enhance the FFM's performance, we integrate the Atrous Spatial Pyramid Pooling (ASPP) module and the decoder from DeepLabV3+ []. These components refine the features and boost the model's segmentation capabilities, and can be described as follows (a sketch of the ASPP module follows the list):
- ASPP: Introduced in [], atrous spatial pyramid pooling employs parallel atrous convolutions with varying dilation rates, enabling the model to capture features across multiple scales. This approach improves the model’s ability to handle objects of different sizes.
- Decoder: We directly incorporate the decoder from DeepLabV3+ [], which combines the outputs from the ASPP module and FPN. This decoder upscales the feature maps, resulting in a richer and more detailed feature representation, ultimately contributing to more accurate segmentation.
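A compact sketch of the ASPP branch structure follows. The dilation rates (6, 12, 18) and the 256-channel width mirror common DeepLabV3+ defaults and are assumptions rather than our exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Parallel atrous convolutions at several dilation rates plus global pooling."""
    def __init__(self, in_ch, out_ch=256, rates=(6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([nn.Sequential(   # 1x1 branch
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))])
        for r in rates:                                 # one 3x3 atrous branch per rate
            self.branches.append(nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True)))
        self.pool = nn.Sequential(                      # image-level context branch
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.ReLU(inplace=True))
        self.project = nn.Sequential(                   # fuse all branches
            nn.Conv2d(out_ch * (len(rates) + 2), out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [branch(x) for branch in self.branches]
        pooled = F.interpolate(self.pool(x), size=(h, w),
                               mode="bilinear", align_corners=False)
        return self.project(torch.cat(feats + [pooled], dim=1))
```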
2.1.4. Transfer Learning
To boost the performance of our deep learning model for microplastic segmentation, we employed transfer learning during training []. Specifically, we first pre-trained our model on the Cityscapes dataset [], which provided a rich set of features adaptable to the specific task of microplastic segmentation. After this pre-training, the model was fine-tuned on our microplastic image dataset.
The rationale behind this transfer learning approach is that the low-level features learned from the Cityscapes dataset, such as edge detection, texture recognition, and color identification, are likely to be useful for the microplastic task as well. As shown in Section 3.4, this pre-training raises the mIoU from 0.655 to 0.690 and the F1 score from 0.696 to 0.733, an improvement of roughly 5%.
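The sketch below illustrates the pre-train-then-fine-tune workflow. Because our exact model class lives in the linked repository, a torchvision DeepLabV3 model stands in here, and the checkpoint filename is hypothetical.

```python
import torch
import torch.nn as nn
from torchvision.models.segmentation import deeplabv3_mobilenet_v3_large

# Stage 1: pre-train on Cityscapes (19 evaluation classes), then save the weights.
model = deeplabv3_mobilenet_v3_large(num_classes=19)
# ... training loop over Cityscapes images and masks ...
torch.save(model.state_dict(), "cityscapes_pretrain.pth")  # hypothetical filename

# Stage 2: reload the pre-trained weights and swap in a 6-class head
# (five microplastic categories plus background) before fine-tuning.
model.load_state_dict(torch.load("cityscapes_pretrain.pth", map_location="cpu"))
model.classifier[-1] = nn.Conv2d(model.classifier[-1].in_channels, 6, kernel_size=1)
# ... fine-tuning loop over the microplastic dataset ...
```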
2.1.5. Loss Function
A loss function measures the difference between a model’s predictions and the actual target values, guiding the optimization process during training. It quantifies how well or poorly the model is performing by calculating the error between predicted outputs and true values. For our case, where the main objective is to perform semantic segmentation of microplastics in images, the loss function evaluates how accurately the model identifies and classifies different microplastic types and how well it delineates their boundaries. If the model incorrectly classifies a fragment as a fiber or misses parts of a foam, the loss function reflects this discrepancy, prompting adjustments to improve the model’s accuracy and precision in detecting and segmenting microplastics.
Our model uses a combination of cross-entropy loss and Dice loss, each contributing uniquely to the training process.
The cross-entropy loss is defined in Equation (1):

$$\mathcal{L}_{CE} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M} w_c \, y_{i,c} \log p_{i,c} \quad (1)$$

In this equation, $N$ is the total number of samples being evaluated and $M$ is the number of classes of microplastics; $y_{i,c}$ is the label of sample $i$ for class $c$, $p_{i,c}$ is the predicted probability of sample $i$ being in class $c$ generated by the model, and $w_c$ is the weight of each predicted class.
As described in [], this loss function quantifies the dissimilarity between the predicted and actual categorical distributions, encouraging the model to assign higher probabilities to the correct class labels. By minimizing cross-entropy loss, the model improves its ability to accurately classify each pixel in the image into one of the predefined microplastic categories.
Dice loss is computed using Equation (2):

$$\mathcal{L}_{Dice} = 1 - \frac{1}{M}\sum_{c=1}^{M}\frac{2\sum_{i=1}^{N} \hat{y}_{i,c}\, y_{i,c}}{\sum_{i=1}^{N} \hat{y}_{i,c} + \sum_{i=1}^{N} y_{i,c}} \quad (2)$$

In Equation (2), $\hat{y}_{i,c}$ is the predicted label of sample $i$ for class $c$ by the model.
As described in [], this loss is particularly effective in tasks where the precise delineation of object boundaries is crucial. It is derived from the Dice coefficient, which is a measure of overlap between the predicted segmentation and the ground truth. By minimizing Dice loss, the model enhances its ability to produce accurate and contiguous segmentation masks, ensuring that the shapes and boundaries of microplastics are well-defined.
Combining cross-entropy loss and Dice loss, the total loss function is defined in Equation (3):

$$\mathcal{L} = \mathcal{L}_{CE} + \mathcal{L}_{Dice} \quad (3)$$
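A minimal implementation of Equations (1)–(3) is sketched below, assuming an equally weighted sum of the two terms; `logits` is a (B, C, H, W) tensor and `target` a (B, H, W) integer mask.

```python
import torch
import torch.nn.functional as F

def combined_loss(logits, target, class_weights=None, eps=1e-6):
    """Cross-entropy (Eq. 1) plus Dice loss (Eq. 2), summed as in Eq. (3)."""
    ce = F.cross_entropy(logits, target, weight=class_weights)

    probs = torch.softmax(logits, dim=1)
    onehot = F.one_hot(target, num_classes=logits.shape[1]).permute(0, 3, 1, 2).float()
    inter = (probs * onehot).sum(dim=(0, 2, 3))                 # per-class overlap
    denom = probs.sum(dim=(0, 2, 3)) + onehot.sum(dim=(0, 2, 3))
    dice = 1.0 - ((2.0 * inter + eps) / (denom + eps)).mean()   # averaged over classes

    return ce + dice
```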
2.2. Datasets
2.2.1. Microplastic Dataset
The microplastic dataset used in our study originates from the Moore Institute for Plastic Pollution Research and consists of 5653 microscope images. Each image is annotated with labels indicating the types of microplastics present, as well as detailed contours outlining each microplastic within the image.
Microplastics in this dataset are categorized into five types: spheres, fragments, fibers, films, and foams. The definition of these categories, their count and annotations are listed in Table 2.
Table 2.
Detailed information of each MP category.
While the dataset provides labels for the type of microplastic in each image, it does not include pixel-level segmentation masks. A significant part of this work therefore involved annotating the images to generate precise pixel-level masks, which are essential for training and evaluating semantic segmentation models. Figure 2 shows samples from this dataset together with their segmentation masks. The annotation tool we used is LabelMe v5.9.0 (https://github.com/labelmeai/labelme, accessed on 23 November 2025), a Python-based tool designed for pixel-level annotation. We utilized its polygon annotation feature to trace the microplastic boundaries, and these polygons were subsequently converted into pixel-level bitmap segmentation masks for model training, as in the sketch below. A snapshot of the annotation process is shown in Figure 3. After annotation, detailed statistics of the microplastics in this dataset were computed; they are provided in Figure 4.
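For reference, the sketch below shows how LabelMe's polygon JSON can be rasterized into a single-channel class-ID mask; the label-to-ID mapping is an assumption for illustration.

```python
import json
import numpy as np
from PIL import Image, ImageDraw

# Assumed class-ID mapping; background pixels stay 0.
CLASS_IDS = {"fiber": 1, "fragment": 2, "sphere": 3, "foam": 4, "film": 5}

def labelme_to_mask(json_path):
    """Rasterize LabelMe polygon annotations into a class-ID bitmap mask."""
    with open(json_path) as f:
        ann = json.load(f)
    mask = Image.new("L", (ann["imageWidth"], ann["imageHeight"]), 0)
    draw = ImageDraw.Draw(mask)
    for shape in ann["shapes"]:
        points = [tuple(p) for p in shape["points"]]
        draw.polygon(points, fill=CLASS_IDS.get(shape["label"], 0))
    return np.array(mask)
```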
Figure 2.
Sample microplastic images and corresponding annotated masks.
Figure 3.
Annotating microplastics in an image using LabelMe.
Figure 4.
Details of the dataset: (a) Percentage of each microplastic’s count and area. (b) Percentage of images containing each microplastic. (c) Distribution of the mask’s coverage area. (d) Distribution of the mask’s center position. (e) Breakdown of images by microplastic object count. (f) Breakdown of images by microplastic coverage area.
From Figure 2 and Figure 4, we can see that the dataset encompasses a diverse range of imaging conditions, including variations in size, aspect ratio, magnification, lighting, camera angles, coverage area, center position, background materials, etc.
Figure 5 depicts the distribution of image size and aspect ratio. As shown in Figure 5, the dataset contains images with a wide range of sizes, from as small as 132 × 180 pixels to as large as 3286 × 4096 pixels. The aspect ratios of the images also vary significantly.
Figure 5.
Image size and aspect ratio distributions: (a) distribution of image width and height; (b) distribution of image aspect ratio.
This diversity ensures that the dataset is representative of real-world challenges in microplastic detection and analysis, thus enhancing the robustness and generalizability of the models developed.
To facilitate model training and evaluation, the dataset is split into training and validation sets in a 7:3 ratio. Specifically, the training set consists of 3957 images, while the validation set includes 1696 images. This division allows for a comprehensive assessment of the model’s performance on unseen data, providing reliable estimates of its ability to generalize effectively to new and varied samples.
2.2.2. Cityscapes Dataset
As mentioned in Section 2.1.4, we use the Cityscapes dataset to pre-train our model. This is a large-scale dataset designed originally for semantic urban scene understanding. It contains images captured in various street scenes across 50 cities, primarily in Germany. The dataset includes pixel-level annotations for 33 different classes, such as vehicles, pedestrians, buildings, and road markings, making it a comprehensive resource for training models on urban scene segmentation tasks. The images in Cityscapes are captured under diverse conditions, including varying weather, lighting, and traffic scenarios [], providing a robust foundation for pre-training models that require strong feature extraction capabilities, even if the focus of the target task is different, such as microplastic segmentation. More information on the Cityscapes dataset can be found on the Cityscapes website https://www.cityscapes-dataset.com/, accessed on 23 November 2025.
2.3. Data Preprocessing and Augmentation
As shown in Figure 5, the images in the microplastic dataset vary in size, with most being around 400 × 600 or 1200 × 1500 pixels. The majority of these images have an aspect ratio close to 0.75. To standardize the input for our experiments, we resize all images to 640 × 480 pixels. Specifically, we first pad the images to achieve a 4:3 aspect ratio to ensure that objects in the image do not deform significantly, then resize them to 640 × 480. For images smaller than 640 × 480, only the padding step is applied.
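The padding-then-resizing step can be sketched as follows; centering the original image on the padded canvas and using black padding are illustrative assumptions.

```python
from PIL import Image

def pad_and_resize(img, target=(640, 480)):
    """Pad to a 4:3 aspect ratio without deforming the content, then resize.
    Images already smaller than the target are only padded."""
    w, h = img.size
    if w * 3 >= h * 4:                    # too wide: grow height to reach 4:3
        new_w, new_h = w, (w * 3 + 3) // 4
    else:                                 # too tall: grow width to reach 4:3
        new_w, new_h = (h * 4 + 2) // 3, h
    canvas = Image.new(img.mode, (new_w, new_h), 0)           # black padding (assumed)
    canvas.paste(img, ((new_w - w) // 2, (new_h - h) // 2))   # centered (assumed)
    if new_w <= target[0] and new_h <= target[1]:
        return canvas                     # small images: padding only
    return canvas.resize(target, Image.BILINEAR)
```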
In addition to preprocessing, we also apply data augmentation techniques to improve the robustness and generalization of our model. These methods include random rotations, flips, and color jittering, which help the model learn to recognize microplastics under varying conditions and enhance its performance on unseen data. Finally, we generate a series of synthetic images containing microplastics to increase the diversity within the dataset. This approach addresses the imbalance in the original dataset, which contains an overrepresentation of fragment-type microplastics and a limited number of foam- and film-type microplastics, as illustrated in Figure 4.
By diversifying the training data, we aim to reduce overfitting and ensure the model's effectiveness in segmenting microplastics across different imaging scenarios. The augmentation methods used in this work are listed in Table 3, and an illustrative pipeline is sketched after the table.
Table 3.
Image augmentation methods.
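An illustrative torchvision version of these augmentations is shown below. The rotation range and jitter strengths are assumed values, and in practice the geometric transforms must be applied with the same random parameters to an image and its mask so the two stay aligned.

```python
from torchvision import transforms

# Photometric transform: applied to the image only (masks are unaffected).
color_jitter = transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2)

# Geometric transforms: must share random parameters between image and mask.
geometric = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.RandomRotation(degrees=30),
])
```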
2.4. Evaluation Metrics
In this work, we use a variety of metrics to evaluate our approach. We employ mean Intersection over Union (mIoU) and F1 score (F1) as measures of segmentation accuracy; both range from 0 to 1, with higher values indicating better performance. Besides accuracy metrics, we also employ the number of parameters (params), FLOPs (floating point operations), and inference time to assess the model's efficiency and computational demands. Together, these metrics provide a detailed picture of the computational performance of our model. A brief description of each metric is provided below (a minimal implementation of the accuracy metrics follows the list):
- mIoU: This metric measures the average overlap between the predicted segmentation mask and the ground truth mask. A higher mIoU value indicates better agreement between the predicted and actual segmentation.
- F1: This metric is calculated as the harmonic mean of precision and recall. It provides a balanced measure by considering both false positives and false negatives.
- params: This metric quantifies the number of learnable parameters in a model. While a higher parameter count can improve the model’s capacity to learn complex patterns, it also increases the risk of overfitting and can lead to longer inference times.
- FLOPs: This metric represents the total number of floating-point operations required for a model's computation. A higher FLOPs value indicates greater computational resource usage.
- Inference time: This metric refers to the average time taken to process a single input and generate an output. A good segmentation model typically achieves an inference time of only a few milliseconds.
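A minimal NumPy implementation of the accuracy metrics follows, computed from a pixel-level confusion matrix accumulated over the validation set (six classes assumed: the background plus five microplastic types).

```python
import numpy as np

def confusion_matrix(pred, gt, num_classes=6):
    """Accumulate a pixel-level confusion matrix from integer label masks."""
    idx = num_classes * gt.reshape(-1).astype(int) + pred.reshape(-1).astype(int)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def miou_and_f1(cm):
    """Mean IoU and mean F1 over all classes from a confusion matrix."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp          # predicted as class c but labeled otherwise
    fn = cm.sum(axis=1) - tp          # labeled class c but predicted otherwise
    iou = tp / np.maximum(tp + fp + fn, 1.0)
    f1 = 2.0 * tp / np.maximum(2.0 * tp + fp + fn, 1.0)
    return iou.mean(), f1.mean()
```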
3. Experiments
3.1. Experiment Setup
We trained each model for 300 epochs with a batch size of 8, using the RMSprop optimization algorithm []. This batch size was deliberately chosen to ensure a fair comparison across all benchmarked models. Specifically, with an input image resolution of 640 × 480 pixels (Section 2.3), one of the larger comparison models (Nested U-Net) reached its maximum stable batch size of 8 on the 24 GB VRAM GPU. Therefore, to maintain identical training conditions and prevent the batch size from influencing comparative results, we standardized the batch size at 8 for all architectures, including our own model.
The learning rate was dynamically adjusted using the cosine annealing algorithm, where the learning rate at epoch $t$ is defined as follows:

$$\eta_t = \eta_{\min} + \frac{1}{2}\left(\eta_{\max} - \eta_{\min}\right)\left(1 + \cos\frac{t\pi}{T}\right)$$

Here, $\eta_{\min}$ and $\eta_{\max}$ represent the minimum and maximum learning rates, respectively, and $T$ denotes the total number of training epochs; in our experiments, we set $\eta_{\min} = 0$ and $T = 300$. Details of the hardware and software configurations used during training are provided in Table 4, and a usage sketch of this schedule follows the table.
Table 4.
Configuration of the experiment.
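The optimizer and schedule can be reproduced with PyTorch's built-ins as sketched below; the stand-in model and the maximum learning rate shown are assumed placeholders.

```python
import torch.nn as nn
from torch.optim import RMSprop
from torch.optim.lr_scheduler import CosineAnnealingLR

model = nn.Conv2d(3, 6, kernel_size=1)            # stand-in for MNv4-Conv-M-fpn
optimizer = RMSprop(model.parameters(), lr=1e-3)  # lr plays the role of eta_max (assumed value)
scheduler = CosineAnnealingLR(optimizer, T_max=300, eta_min=0.0)  # eta_min = 0, T = 300

for epoch in range(300):
    # ... one training epoch over the microplastic dataset ...
    scheduler.step()                              # follows the cosine schedule above
```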
As discussed in Section 2.1.4, we pre-trained our model on the Cityscapes dataset before fine-tuning it on the microplastics dataset. The Cityscapes dataset includes 5000 finely annotated images and 19,997 coarsely annotated images, all of which were used in the pre-training phase. The training loss curves for both the Cityscapes and microplastic datasets are illustrated in Figure 6. These curves clearly demonstrate the rapid convergence and low final loss achieved during the pre-training phase and subsequent fine-tuning phase on the microplastic data, validating the effectiveness of our approach.
Figure 6.
Loss curves on pre-trained and fine-tuned datasets: (a) Cityscapes coarse dataset; (b) Cityscapes fine dataset; (c) MicroPlastics dataset.
3.2. Comparison Study
To evaluate the performance of our model against other established segmentation approaches, we conducted experiments using a selection of well-known models. Below is a list of these models, along with a brief description of each:
- FCN []: The Fully Convolutional Network (FCN) is a pioneering model that introduced the concept of fully convolutional architectures for dense pixel-wise predictions, eliminating the need for fully connected layers.
- BiSeNetV1 []: BiSeNetV1 achieves real-time performance by utilizing a dual-path architecture—a spatial path for capturing detailed spatial information and a context path for rich semantic context.
- BiSeNetV2 []: This is an enhanced version of BiSeNetV1, featuring a bilateral structure, adaptive aggregation module, and spatial enhancement modules to further improve efficiency and accuracy.
- DeepLabV3+ []: Known for its high accuracy, DeepLabV3+ uses an encoder–decoder architecture with atrous convolution and a spatial pyramid pooling module to capture multi-scale contextual information. It offers three backbone options: ResNet, MobileNet, and Xception.
- HRNet []: HRNet maintains high-resolution representations throughout the network by employing parallel multi-resolution subnetworks, enabling the capture of both rich spatial details and contextual information.
- U-Net []: U-Net utilizes an encoder-decoder architecture with skip connections, which help to preserve spatial information during the segmentation process.
- Nested U-Net []: An extension of U-Net, Nested U-Net incorporates a nested and multi-scale architecture to capture hierarchical features, leading to improved segmentation accuracy.
- Swin U-Net []: Swin U-Net combines the Swin Transformer with the U-Net architecture, effectively modeling both local and global spatial relationships for enhanced segmentation performance.
This comparison highlights the strengths and innovations of our model relative to these established approaches.
3.3. Experiment Results
3.3.1. Visual Comparison
We present a set of representative visual segmentation results for all models in Figure 7. The figure shows that for spherical and fragment-shaped microplastics, such as in inputs 2, 6, and 11, most models perform well, accurately segmenting these objects due to their distinct shapes, consistent textures, and simple background interactions.
Figure 7.
Visual performance comparison of different models.
However, film and fiber-shaped microplastics, like those in inputs 9 and 12, pose significant challenges for most models. Fiber-shaped microplastics are typically thin, leading to discontinuous segmentation results, while film-shaped microplastics are often transparent, making them difficult to distinguish from the background. The majority of models also struggle to segment foam-shaped microplastics, as seen in inputs 4 and 13. This is somewhat counter-intuitive, given that foam microplastics are visually identifiable, but their underrepresentation in the dataset (only a small fraction of the images contain at least one foam-shaped microplastic; see Figure 4) likely contributes to this difficulty. Despite this, our model produces accurate segmentations for inputs 4 and 13, demonstrating its strong capability to handle images with underrepresented training samples.
As shown in Figure 7, our model excels at detecting and accurately delineating the boundaries of small-sized microplastics, such as in inputs 9 and 10, where several other models fall short. This superior performance is attributed to the incorporation of a feature pyramid network and a feature fusion module in our model, which enable it to effectively capture small objects and integrate information across different resolutions for enhanced boundary detection.
3.3.2. Evaluation Results
To accurately compare the performance of our model with the models listed in Section 3.2, we present detailed results on the evaluation metrics described in Section 2.4 in Table 5 and Figure 8.
Table 5.
Model’s overall performance on metrics.
Figure 8.
mIoU, params and inference time for each model.
As shown in Table 5, our model outperforms all others in both mIoU and F1 score, indicating it is the most accurate among those tested. Additionally, our model has the lowest FLOPs, demonstrating its efficiency in terms of computational power. In contrast, while Swin U-Net has more than twice the number of parameters as our model, it performs poorly in both accuracy and inference time, suggesting that the Swin Transformer may not be the best choice for segmenting microplastics.
This is further illustrated in Figure 8, where our model surpasses the second-best model in mIoU by a clear margin while maintaining a 4.5 ms inference time, making it fast enough for real-time applications.
To further demonstrate our model’s effectiveness in segmenting various types of microplastics, we present detailed mIoU and F1 scores for each type of microplastic across all models in Table 6. From this table, it is evident that our model outperforms all others in accuracy across all microplastic types. Notably, for fiber-shaped microplastics, it is the only model to achieve an F1 score above 0.75.
Table 6.
mIoU and F1 score breakdown for each model across different microplastic types.
3.4. Ablation Study
To evaluate the contribution of each component in our model, we conducted an ablation study by systematically removing specific modules and observing the impact on performance. This analysis helps identify which modules are critical to the model’s success in segmenting microplastics images, providing deeper insights into its design and effectiveness.
Our model comprises three main components: a backbone network, a feature pyramid network, and a feature fusion module. We used the backbone network (MNv4-Conv-M) as the baseline and focused this ablation study on the effects of the feature pyramid network and the feature fusion module.
The results of this experiment are presented in Figure 9. It is evident that integrating the feature pyramid network into the baseline model leads to a substantial increase in mIoU, from 0.36 to 0.57, and a corresponding rise in F1 score, from 0.38 to 0.61. The subsequent addition of the feature fusion module further boosts the mIoU and F1 score to 0.69 and 0.73, respectively. These findings clearly demonstrate that both the feature pyramid network and the feature fusion module play crucial roles in significantly enhancing the accuracy of our model. Moreover, the synergy between the feature pyramid network and the feature fusion module further amplifies their individual contributions to performance improvement.
Figure 9.
Ablation experiment.
To assess the impact of transfer learning, we compared the performance of the full MNv4-Conv-M-fpn model when trained from scratch versus when trained using weights pre-trained on the Cityscapes dataset. Training the model from scratch yields an mIoU of 0.655 and an F1 score of 0.696. In contrast, incorporating the Cityscapes pre-training step results in the highest achieved performance: an mIoU of 0.690 and an F1 score of 0.733. This performance lift confirms that pre-training is a highly effective and necessary component of our methodology.
4. Discussion
4.1. The Scientific and Technical Advancement
This work introduces MNv4-Conv-M-fpn, a novel deep learning architecture that significantly advances the capability for automated microplastic analysis. To the best of our knowledge, our study is the first to integrate advanced deep learning components, specifically MobileNetV4, Feature Pyramid Networks, and a Feature Fusion Module, for multi-category microplastic segmentation under varied and complicated backgrounds. This approach directly addresses a major analytical bottleneck in marine science.
Most existing studies on automated microplastic identification focus on binary segmentation, distinguishing only between microplastic and background. In stark contrast, our methodology focuses on the far more complex task of segmenting images into six categories, including five distinct microplastic morphologies and the background. This granular, multi-class segmentation poses a significantly greater technical challenge but is essential for robust marine pollution measurement and analysis. Morphological differentiation allows researchers to link pollution findings to specific sources (e.g., distinguishing fibers from fragments or pellets), which is critical for effective source control and management.
Furthermore, much of the related research relies on U-Net or its variants for segmentation. Our approach introduces a more advanced deep learning model, which, as demonstrated in Section 3.3.2, consistently outperforms U-Net in accuracy. To ensure a comprehensive assessment against contemporary AI tools, we compared MNv4-Conv-M-fpn with other state-of-the-art deep convolutional neural network models using established metrics such as mean Intersection over Union (mIoU), F1 score, number of parameters, FLOPs, and inference time. These rigorous comparisons confirm that our model is not only highly accurate but also lightweight and capable of real-time processing, a crucial feature for field deployability. An ablation study was also conducted to rigorously verify the unique contribution of each architectural component to the overall performance improvement, showing that the synergistic design of our proposed model is well suited to high-performance microplastic segmentation.
While deep learning has also been applied to microplastic classification—for example, through standardized visual protocols [], holographic methods [], and models like SqueezeNet or ResNet []—segmentation provides superior environmental insight. Classification provides only overall category information without specifying the location or size of particles. In contrast, segmentation effectively differentiates microplastics from complex backgrounds, accurately identifying the type, precise location, shape, and area of each particle. While technically more challenging, segmentation offers more detailed and useful insights essential for accurate ecotoxicological and risk assessment and tracking the fate of pollutants in marine environments.
4.2. Environmental Deployment and Policy Impact
The high computational efficiency and real-time processing capabilities of MNv4-Conv-M-fpn significantly enhance its utility for large-scale exposure assessment and ecotoxicological monitoring. This lightweight architecture is specifically designed for deployment on mobile and embedded platforms, where portability and efficiency are paramount for scalable research. By integrating this advanced AI tool into field devices, researchers and environmental agencies can perform rapid, on-site microplastic analysis. This capability circumvents the reliance on extensive and time-consuming laboratory infrastructure, which has historically created a bottleneck in collecting the high-volume data needed for robust risk modeling and understanding the full extent of microplastic hazard. This streamlined process enables more immediate and widespread characterization of microplastic pollution on a global scale.
This high-efficiency approach offers a direct pathway to improve the risk management and protection of the natural environment. Generating real-time, morphology-specific data supports more agile policy decisions, allowing for the rapid identification of microplastic hotspots and the efficient targeting of mitigation efforts. Crucially, the ability to characterize microplastics by shape and size in real time provides data essential for understanding differential exposure routes and predicting toxicological consequences based on particle morphology. Furthermore, the adaptability of this segmentation approach provides a foundation for broader applications, such as the real-time detection of other morphologically distinct marine pollutants or contaminants, further extending its value as a novel tool for environmental toxicology.
4.3. Limitations and Future Work
Despite the strong capabilities demonstrated, our model has certain limitations that warrant further attention. The segmentation performance on some microplastic types is constrained by dataset imbalance, particularly for underrepresented categories such as foam. To address this, future work will focus on the collection of additional high-quality microscope images containing a broader diversity of challenging microplastic morphologies. Furthermore, the segmentation of transparent objects, such as film microplastics, is currently less effective due to background interference. We plan to optimize the model’s architecture and loss function to specifically improve the accurate segmentation of transparent objects. Additionally, we aim to further enhance the model’s performance and efficiency by incorporating the latest advancements in computer vision, such as SAM2 [], ensuring the model remains at the forefront of automated marine environmental analysis.
5. Conclusions
In this work, we introduced MNv4-Conv-M-fpn, a novel deep learning model engineered for accurate and morphologically-specific multi-class microplastic segmentation, directly addressing the measurement bottleneck in environmental analysis.
By combining the lightweight MNv4-Conv-M backbone with a Feature Pyramid Network and a dedicated Feature Fusion Module, our model achieves high accuracy in multi-class microplastic detection, while maintaining a low computational load and fast inference speed. This efficiency makes the model ideal for high-throughput deployment in environmental laboratories and resource-constrained monitoring efforts, providing the granular morphological data essential for differential toxicological studies and risk assessment.
This innovative approach serves as a valuable, scalable tool for ecotoxicological monitoring, exposure assessment, and informed environmental risk management.
Author Contributions
Conceptualization, Y.Y. and H.F.; methodology, Y.Y. and W.X.; software, W.X.; validation, W.X.; investigation, Y.Y. and W.X.; resources, Y.Y., W.X. and H.F.; data curation, W.X.; writing—original draft preparation, Y.Y. and W.X.; writing—review and editing, Y.Y. and H.F.; visualization, W.X.; supervision, Y.Y. and H.F.; project administration, H.F.; funding acquisition, Y.Y. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the Wuhan Institute of Technology under Grant Numbers K2023006 and K202238, and by the Wuhan Municipal Bureau of Talents under Grant Number X24D090. The authors sincerely appreciate this generous support.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The raw data supporting the conclusions of this article will be made available by the authors on request.
Acknowledgments
We gratefully acknowledge the Moore Institute for Plastic Pollution Research for providing the microplastic images used in this study.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Warguła, J. Not only foils and packaging. Part 1: About the applications of polymers in ophthalmology, dentistry and medical equipment. Prospect. Pharm. Sci. 2024, 22, 135–145. [Google Scholar] [CrossRef]
- Kwon, B.G. Behavior of invisible microplastics in aquatic environments. Mar. Pollut. Bull. 2026, 222, 118739. [Google Scholar] [CrossRef] [PubMed]
- Rodríguez, A.M.; Martín, M.N.D.; Pérez, E.M.R.; Amey, J.; Borges, J.H.; Nuez, E.F.; Machín, F.; Moreno, D.V. Vertical distribution and composition of microplastics and marine litter in the open ocean surrounding the Canary Islands (0–1200 m depth). Mar. Pollut. Bull. 2026, 222, 118836. [Google Scholar] [CrossRef] [PubMed]
- Bermúdez, J.R.; Metian, M.; Swarzenski, P.W.; Bank, M.S.; Bjorøy, Ø.; Cajas, J.; Bucheli, R.; González-Muñoz, R.; Lynch, J.; Piguave, E.; et al. Marine microplastics on the rise in the Eastern Tropical Pacific: Abundance doubles in 11 years and a ten-fold increase is projected by 2100. Mar. Pollut. Bull. 2025, 221, 118437. [Google Scholar] [CrossRef] [PubMed]
- Zhang, J.; Choi, C. A transport mechanism for deep-sea microplastics: Hydroplaning of clay-laden sediment gravity flows. Mar. Pollut. Bull. 2025, 218, 118191. [Google Scholar] [CrossRef]
- Li, Q.; Zhu, L.; Wei, N.; Bai, M.; Liu, K.; Wang, X.; Jiang, C.; Zong, C.; Zhang, F.; Li, C.; et al. Transport mechanism of microplastics from a still water system to a dynamic estuarine system: A case study in Macao SAR. Mar. Pollut. Bull. 2025, 218, 118223. [Google Scholar] [CrossRef]
- Zeng, H.; Liu, D.; Liu, S.; Li, Z.; Ding, Y.; Wang, T. Occurrence, characteristics and ecological risk assessment of microplastics in the surface water of the Central South China Sea. Mar. Pollut. Bull. 2026, 222, 118760. [Google Scholar] [CrossRef]
- Aytan, Ü.; Şentürk Koca, Y.; Pasli, S.; Güven, O.; Ceylan, Y.; Basaran, B. Microplastics in commercial fish and their habitats in the important fishing ground of the Black Sea: Characteristic, concentration, and risk assessment. Mar. Pollut. Bull. 2025, 221, 118434. [Google Scholar] [CrossRef]
- Arienzo, M.; Donadio, C. Microplastic-Pharmaceuticals Interaction in Water Systems. J. Mar. Sci. Eng. 2023, 11, 1437. [Google Scholar] [CrossRef]
- Zhang, J.; Wang, X.; Li, Q.; Li, B.; Ma, H.; Wang, S.; Ying, Q.; Ran, S.; Leonteva, E.O.; Shilin, M.B.; et al. Microplastic exposure reshapes the virome and virus–bacteria networks with implications for immune regulation in Mytilus coruscus. Mar. Pollut. Bull. 2026, 222, 118785. [Google Scholar] [CrossRef]
- Remanan, C.; Sahadevan, P.; Anju, K.; Ankitha, C.; Lal, M.; Sanjeevan, V. Field and laboratory-based evidence of microplastic ingestion by the Asian green mussel, Perna viridis from the northern Malabar coast of India. Mar. Pollut. Bull. 2025, 220, 118468. [Google Scholar] [CrossRef] [PubMed]
- Ho, C.M.; Feng, W.; Li, X.; Kalaipandian, S.; Ngien, S.K.; Yu, X. Exploring the complex interactions between microplastics and marine contaminants. Mar. Pollut. Bull. 2026, 222, 118697. [Google Scholar] [CrossRef] [PubMed]
- Barker, M.; Singha, T.; Willans, M.; Hackett, M.; Pham, D.S. A Domain-Adaptive Deep Learning Approach for Microplastic Classification. Microplastics 2025, 4, 69. [Google Scholar] [CrossRef]
- Wootton, N.; Gillanders, B.M.; Leterme, S.; Noble, W.; Wilson, S.P.; Blewitt, M.; Swearer, S.E.; Reis-Santos, P. Research priorities on microplastics in marine and coastal environments: An Australian perspective to advance global action. Mar. Pollut. Bull. 2024, 205, 116660. [Google Scholar] [CrossRef]
- Hartmann, N.B.; Huffer, T.; Thompson, R.C.; Hassellov, M.; Verschoor, A.; Daugaard, A.E.; Rist, S.; Karlsson, T.; Brennholt, N.; Cole, M.; et al. Are we speaking the same language? Recommendations for a definition and categorization framework for plastic debris. Environ. Sci. Technol. 2019, 53, 1039–1047. [Google Scholar] [CrossRef]
- Villegas-Camacho, O.; Francisco-Valencia, I.; Alejo-Eleuterio, R.; Granda-Gutiérrez, E.E.; Martínez-Gallegos, S.; Villanueva-Vásquez, D. FTIR-Based Microplastic Classification: A Comprehensive Study on Normalization and ML Techniques. Recycling 2025, 10, 46. [Google Scholar] [CrossRef]
- Huang, H.; Cai, H.; Qureshi, J.U.; Mehdi, S.R.; Song, H.; Liu, C.; Di, Y.; Shi, H.; Yao, W.; Sun, Z. Proceeding the categorization of microplastics through deep learning-based image segmentation. Sci. Total Environ. 2023, 896, 165308. [Google Scholar] [CrossRef]
- Odhiambo, J.M.; Mvurya, M. Deep Learning Algorithm for Identifying Microplastics in Open Sewer Systems: A Systematic Review. Int. J. Eng. Sci. 2022, 11, 11–18. [Google Scholar]
- Fajardo-Urbina, J.M.; Liu, Y.; Georgievska, S.; Gräwe, U.; Clercx, H.J.; Gerkema, T.; Duran-Matute, M. Efficient deep learning surrogate method for predicting the transport of particle patches in coastal environments. Mar. Pollut. Bull. 2024, 209, 117251. [Google Scholar] [CrossRef]
- Zhao, F.; Huang, B.; Wang, J.; Shao, X.; Wu, Q.; Xi, D.; Liu, Y.; Chen, Y.; Zhang, G.; Ren, Z.; et al. Seafloor debris detection using underwater images and deep learning-driven image restoration: A case study from Koh Tao, Thailand. Mar. Pollut. Bull. 2025, 214, 117710. [Google Scholar] [CrossRef]
- Baek, J.Y.; de Guzman, M.K.; Park, H.m.; Park, S.; Shin, B.; Velickovic, T.C.; Van Messem, A.; De Neve, W. Developing a Segmentation Model for Microscopic Images of Microplastics Isolated from Clams. In Proceedings of the International Conference on Pattern Recognition, Virtual Event, 10–15 January 2021; Springer: Berlin/Heidelberg, Germany, 2021; pp. 86–97. [Google Scholar]
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv 2015. [Google Scholar] [CrossRef]
- Lee, G.; Jhang, K. Neural network analysis for microplastic segmentation. Sensors 2021, 21, 7030. [Google Scholar] [CrossRef]
- Ibtehaz, N.; Rahman, M.S. MultiResUNet: Rethinking the U-Net architecture for multimodal biomedical image segmentation. Neural Netw. 2020, 121, 74–87. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. arXiv 2015. [Google Scholar] [CrossRef]
- Xu, J.; Wang, Z. Efficient and accurate microplastics identification and segmentation in urban waters using convolutional neural networks. Sci. Total Environ. 2024, 911, 168696. [Google Scholar] [CrossRef] [PubMed]
- Wang, J.; Sun, K.; Cheng, T.; Jiang, B.; Deng, C.; Zhao, Y.; Liu, D.; Mu, Y.; Tan, M.; Wang, X.; et al. Deep High-Resolution Representation Learning for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 3349–3364. [Google Scholar] [CrossRef]
- Jeong, J.; Lee, G.; Jeong, J.; Kim, J.; Kim, J.; Jhang, K. Microplastic Binary Segmentation with Resolution Fusion and Large Convolution Kernels. J. Comput. Sci. Eng 2024, 18, 29–35. [Google Scholar] [CrossRef]
- Royer, S.J.; Wolter, H.; Delorme, A.E.; Lebreton, L.; Poirion, O.B. Computer vision segmentation model—Deep learning for categorizing microplastic debris. Front. Environ. Sci. 2024, 12, 1386292. [Google Scholar] [CrossRef]
- Lee, K.S.; Chen, H.L.; Ng, Y.S.; Maul, T.; Gibbins, C.; Ting, K.N.; Amer, M.; Camara, M. U-Net skip-connection architectures for the automated counting of microplastics. Neural Comput. Appl. 2022, 34, 7283–7297. [Google Scholar] [CrossRef]
- Park, H.m.; Park, S.; de Guzman, M.K.; Baek, J.Y.; Cirkovic Velickovic, T.; Van Messem, A.; De Neve, W. MP-Net: Deep learning-based segmentation for fluorescence microscopy images of microplastics isolated from clams. PLoS ONE 2022, 17, e0269449. [Google Scholar] [CrossRef]
- Shi, B.; Patel, M.; Yu, D.; Yan, J.; Li, Z.; Petriw, D.; Pruyn, T.; Smyth, K.; Passeport, E.; Miller, R.D.; et al. Automatic quantification and classification of microplastics in scanning electron micrographs via deep learning. Sci. Total Environ. 2022, 825, 153903. [Google Scholar] [CrossRef]
- Qin, D.; Leichner, C.; Delakis, M.; Fornoni, M.; Luo, S.; Yang, F.; Wang, W.; Banbury, C.; Ye, C.; Akin, B.; et al. MobileNetV4-Universal Models for the Mobile Ecosystem. arXiv 2024, arXiv:2404.10518. [Google Scholar]
- Karen, S. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. arXiv 2018. [Google Scholar] [CrossRef]
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- Sun, Y.; Chen, G.; Zhou, T.; Zhang, Y.; Liu, N. Context-aware Cross-level Fusion Network for Camouflaged Object Detection. arXiv 2021. [Google Scholar] [CrossRef]
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. In Computer Vision—ECCV 2014; Springer International Publishing: Berlin/Heidelberg, Germany, 2014; pp. 346–361. [Google Scholar] [CrossRef]
- Zhuang, F.; Qi, Z.; Duan, K.; Xi, D.; Zhu, Y.; Zhu, H.; Xiong, H.; He, Q. A comprehensive survey on transfer learning. Proc. IEEE 2020, 109, 43–76. [Google Scholar] [CrossRef]
- Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The Cityscapes Dataset for Semantic Urban Scene Understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
- Mao, A.; Mohri, M.; Zhong, Y. Cross-entropy loss functions: Theoretical analysis and applications. In Proceedings of the International Conference on Machine Learning, PMLR, Honolulu, HI, USA, 23–29 July 2023; pp. 23803–23828. [Google Scholar]
- Milletari, F.; Navab, N.; Ahmadi, S.A. V-net: Fully convolutional neural networks for volumetric medical image segmentation. In Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, 25–28 October 2016; pp. 565–571. [Google Scholar]
- Luo, L.; Xiong, Y.; Liu, Y.; Sun, X. Adaptive Gradient Methods with Dynamic Bound of Learning Rate. arXiv 2019. [Google Scholar] [CrossRef]
- Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. arXiv 2015. [Google Scholar] [CrossRef]
- Yu, C.; Wang, J.; Peng, C.; Gao, C.; Yu, G.; Sang, N. BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation. arXiv 2018. [Google Scholar] [CrossRef]
- Yu, C.; Gao, C.; Wang, J.; Yu, G.; Shen, C.; Sang, N. BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation. arXiv 2020. [Google Scholar] [CrossRef]
- Zhou, Z.; Siddiquee, M.M.R.; Tajbakhsh, N.; Liang, J. UNet++: A Nested U-Net Architecture for Medical Image Segmentation. arXiv 2018. [Google Scholar] [CrossRef]
- Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation. arXiv 2021. [Google Scholar] [CrossRef]
- Lusher, A.L.; Bråte, I.L.N.; Munno, K.; Hurley, R.R.; Welden, N.A. Is it or isn’t it: The importance of visual classification in microplastic characterization. Appl. Spectrosc. 2020, 74, 1139–1153. [Google Scholar] [CrossRef]
- Zhu, Y.; Yeung, C.H.; Lam, E.Y. Microplastic pollution monitoring with holographic classification and deep learning. J. Phys. Photonics 2021, 3, 024013. [Google Scholar] [CrossRef]
- Teng, T.W.; Sheikh, U.U. Deep learning based microplastic classification. In Proceedings of the AIP Conference Proceedings, Male’ City, Maldives, 15–17 November 2023; AIP Publishing: Melville, NY, USA, 2024; Volume 3245. [Google Scholar]
- Ravi, N.; Gabeur, V.; Hu, Y.T.; Hu, R.; Ryali, C.; Ma, T.; Khedr, H.; Rädle, R.; Rolland, C.; Gustafson, L.; et al. Sam 2: Segment anything in images and videos. arXiv 2024, arXiv:2408.00714. [Google Scholar]