1. Introduction
Computer vision tasks, such as object detection and segmentation, require large-scale datasets to train neural network models [1]. An object detection task involves identifying object boundaries in an image, while a segmentation task requires classifying every pixel. These tasks pose several challenges for researchers. The first is image collection; the second concerns the preparation of high-quality annotations for the dataset. Obtaining precise annotations is a time-consuming and costly process, especially for large-scale datasets [2].
Computer vision tasks in the agriculture domain are even more challenging [3,4]. Plants are highly diverse and variable in appearance. Moreover, many practical problems require an even more difficult task: distinguishing plant parts rather than the whole plant. Part segmentation masks identify different parts of the plant, such as leaves, stems, and fruits; this information can be used to quantify plant traits such as leaf area, stem diameter, and fruit size, and it can also help in identifying diseases that affect specific parts of the plant. Computer vision systems in agriculture can automate many tasks, such as crop monitoring, weed detection, and yield estimation, and accurate plant and plant part segmentation masks are essential for developing such automated systems. A plant part segmentation model can serve as a component of a larger pipeline for precision agriculture: for example, it can segment leaves, stems, and fruits in images captured by drones or other imaging devices, and this information can then be used to analyze plant health and growth, optimize irrigation and fertilization, and detect diseases or pests early on. The model can also reduce manual labor and increase the efficiency of agricultural operations, since automating plant part segmentation saves the time and resources that would otherwise be spent on manual inspection and analysis.
For this task, we usually need fine-grained manual annotations. However, it is not feasible to collect and annotate plant part datasets for every plant variety and condition. In such cases, it is reasonable to use data augmentation techniques to enlarge the dataset; synthetic data can sometimes save thousands of hours of manual annotation [5]. Besides classical augmentations such as color and geometric transformations, there are more advanced techniques [6,7]. For instance, one can apply object-based augmentation (OBA) [8,9]. The key idea of OBA is to crop foreground target objects from an image using their masks, apply augmentations to these instances, and then paste them onto a new background. OBA is more flexible than classical image-based augmentation, providing more ways to handle the target objects [10]. However, to extend a custom dataset with OBA, masks of the target objects are necessary [11]. Semantic segmentation annotation requires more time and resources than image-level annotation because it involves defining per-pixel boundaries of the target objects. One promising way to ease pixel-wise annotation is weakly supervised semantic segmentation (WSSS), which utilizes weak supervision such as image-level labels and bounding boxes. Using WSSS to obtain masks of target objects for OBA significantly accelerates the creation of custom datasets and simplifies data labeling.
Several approaches have recently been proposed to deal with the task of WSSS. Assigning image-level labels is the most convenient and cost-effective type of image annotation. Most recent WSSS studies that use image-level labels employ a class activation map (CAM) method to generate pseudo-masks. The CAM is obtained from a classification network with a global average pooling (GAP) layer [12]. The classification network activates specific features of the input image depending on the class label, and the CAM highlights the most important parts of the image on which the class prediction is based. However, WSSS methods based on the CAM suffer from underactivation: the CAM produces a high response only in the most discriminative regions while ignoring other regions that can be important for segmentation. Therefore, many studies are devoted to enlarging the region coverage provided by the CAM. It is important to emphasize that OBA places tough demands on CAM quality, because overly noisy or corrupted pseudo-masks can ruin OBA as well as the training process.
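For illustration, the sketch below computes a CAM-based pseudo-mask for a GAP-based classifier in PyTorch; the backbone choice (ResNet-18) and the binarization threshold are illustrative assumptions, not the configuration used in our experiments.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Minimal CAM sketch: weight the last convolutional feature maps by the classifier
# weights of the target class (assumes a GAP-based classifier such as ResNet).
model = models.resnet18(weights="IMAGENET1K_V1").eval()

features = {}
model.layer4.register_forward_hook(lambda m, i, o: features.update(maps=o))

def class_activation_map(image, class_idx, threshold=0.4):
    # image: (1, 3, H, W) float tensor normalized as expected by the backbone
    with torch.no_grad():
        model(image)
    fmap = features["maps"][0]                   # (C, h, w) feature maps before GAP
    weights = model.fc.weight[class_idx]         # (C,) GAP classifier weights of the class
    cam = F.relu(torch.einsum("c,chw->hw", weights, fmap))
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    # Upsample the coarse CAM to input resolution and binarize it into a pseudo-mask
    cam = F.interpolate(cam[None, None], size=image.shape[-2:], mode="bilinear")[0, 0]
    return (cam > threshold).float()             # illustrative threshold
```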
Typically, WSSS refers to methods that address semantic segmentation, an important computer vision task applied in critical systems such as aerial image analysis [13], unmanned aerial vehicles (UAVs) [14], autonomous vehicles (AVs) [15], robotics [16], and environmental analysis [17]. However, the high cost of pixel-wise annotations limits progress in these research fields. Combining OBA and WSSS can significantly improve progress and increase the size of annotated datasets, which in turn can have a positive impact on neural network training in general. This work aims to obtain segmentation masks using only a limited amount of weak supervision, such as class labels and bounding boxes. All experiments in this work were performed on agricultural images, which present several challenges for computer vision. First, plants exhibit a wide range of morphological variations and can vary in appearance at different stages of growth, making accurate recognition difficult. Second, the appearance of plants can be significantly influenced by environmental factors and imaging properties, such as lighting conditions, background clutter, and occlusion, which further complicate recognition. Third, plants can share similar visual characteristics, making it challenging to distinguish between different species or varieties. Additionally, acquiring large amounts of high-quality labeled data for training plant segmentation models can be difficult and costly, especially for rare or exotic plant species. Lastly, the computational complexity of plant segmentation tasks can be high, particularly when dealing with large-scale datasets that require significant computational resources and specialized hardware.
The novelty of this paper lies in the exploration of weakly supervised approaches for object part segmentation.
The main contributions of the work are the following:
- We collect and annotate a dataset of images in the agricultural domain. The dataset covers multiple subdomains and has segmentation masks for each plant part.
- We provide a detailed review of weakly supervised and unsupervised image segmentation methods.
- We present a new robust weakly supervised algorithm that allows training instance segmentation models with only bounding box annotations.
- We present a pipeline with instance-level augmentation based on weakly supervised segmentation and prove its efficiency.
The remainder of the paper is organized as follows: Section 2 reviews the literature on recent WSSS methods; Section 3 describes the experimental methodology and the methods used; Section 4 and Section 5 report the results and include the discussion.
4. Results
Table 4 and Table 5 compare the mAP@0.5 and mAP@0.5:0.95 of the predictions provided by YOLOv8 trained on different types of labels for the instance segmentation task. Both tables report the relative percentage gain in metrics. The gray color in these tables marks the baseline case, where the bounding box is used as the segmentation mask.
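As a reference for how these metrics are produced, the sketch below trains and validates a YOLOv8 segmentation model with the Ultralytics API; the model size and the dataset configuration file are placeholders, not the exact settings of our experiments.

```python
from ultralytics import YOLO

# Train a YOLOv8 segmentation model on one of the label variants
# (box-as-mask baseline, SAM pseudo-masks, TransCAM+MiDaS pseudo-masks, or GT masks).
model = YOLO("yolov8n-seg.pt")
model.train(data="plant_parts.yaml", epochs=100, imgsz=640)  # placeholder dataset config

# Validate and read the mask metrics reported in Tables 4 and 5
metrics = model.val()
print("mAP@0.5      :", metrics.seg.map50)
print("mAP@0.5:0.95 :", metrics.seg.map)
```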
Table 4 shows that for plant part segmentation, SAM-based annotations work better for most of the tasks. This approach increases mAP by 13 to 16% compared with the baseline, which is close to the result of a model trained on the real part segmentation masks. However, we must note that for very thin objects, such as plant roots, the SAM-based approach is weaker than the baseline; for this class, it is therefore more suitable to use TransCAM with MiDaS masks.
We can also observe that metrics for the stem category provided by the SAM method outperform the ground truth result.
Figure 6 shows SAM pseudo-masks for the stem category. It reveals that, apart from the target objects, SAM also detects tomato stalks located near the stems because of their similar semantic structure. Consequently, the SAM pseudo-masks provide additional information about the target category by highlighting semantically similar objects within the image.
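For clarity, the following sketch shows how a SAM pseudo-mask can be obtained from a bounding-box prompt with the segment_anything package; the checkpoint path and the box-prompting setup are assumptions for illustration.

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load the SAM model and prompt it with a bounding-box annotation to obtain a pseudo-mask.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")  # placeholder checkpoint path
predictor = SamPredictor(sam)

def sam_pseudo_mask(image_rgb, box_xyxy):
    predictor.set_image(image_rgb)                       # HxWx3 RGB uint8 image
    masks, scores, _ = predictor.predict(
        box=np.asarray(box_xyxy), multimask_output=True  # box prompt from the weak annotation
    )
    return masks[np.argmax(scores)]                      # keep the highest-scoring mask
```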
Table 5 shows the results of full plant segmentation. For this objective, a model trained with masks obtained using TransCAM and MiDaS generally works better: on average, it provides a 40% relative increase in mAP. This approach significantly outperforms SAM-based masks because the MiDaS depth masks allow us to distinguish the borders of overlapping objects better. The only exception is the result on the Herbarium plants dataset, which is explained by the simplicity of this dataset: it contains a single plant per image, and the background is always uniform.
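As a hedged illustration of combining an activation map with depth, the sketch below refines a CAM heatmap using a MiDaS depth map loaded via torch.hub; the model variant, thresholds, and depth quantile are illustrative assumptions and not the exact procedure of this work.

```python
import numpy as np
import torch
import torch.nn.functional as F

# Load MiDaS for relative depth estimation via torch.hub (the model variant is illustrative).
midas = torch.hub.load("intel-isl/MiDaS", "DPT_Large").eval()
midas_transform = torch.hub.load("intel-isl/MiDaS", "transforms").dpt_transform

def refine_cam_with_depth(image_rgb, cam, cam_thr=0.3, depth_quantile=0.5):
    # Keep CAM activations only where the scene is close to the camera, which helps
    # to separate a foreground plant from background clutter and overlapping objects.
    # image_rgb: HxWx3 RGB uint8 array; cam: HxW activation map in [0, 1].
    with torch.no_grad():
        depth = midas(midas_transform(image_rgb))            # (1, h, w) relative inverse depth
    depth = F.interpolate(depth.unsqueeze(1), size=image_rgb.shape[:2],
                          mode="bicubic", align_corners=False)[0, 0].cpu().numpy()
    foreground = depth > np.quantile(depth, depth_quantile)  # larger values are closer
    return ((cam > cam_thr) & foreground).astype(np.uint8)
```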
Table 6 and Table 7 display the results of combining object-based augmentation with the SAM technique, applied to the object parts and the full plants. The values in brackets show the gains in metrics relative to the original SAM results from Table 4 and Table 5.
In Table 6 and Table 7, one can see that object-based augmentation boosts the performance of weakly supervised solutions even further: a 26% increase in mAP is observed for the object part case and a 72% increase for the full plant case. The obtained metrics confirm that the OBA approach significantly improves performance on the instance segmentation task when pseudo-masks and a small-sized dataset are used. However, it should be noted that for plants with a rich morphological structure and overlapping objects, such as those from the Cassava plants dataset, the given approach faces challenges and provides lower metrics than the original approach.
The following pattern was observed: TransCAM with MiDaS exhibits superior results for the whole plant case, while SAM performs well for the plant part case. These approaches have different strengths and weaknesses. Consequently, we decided to construct a meta-algorithm combining TransCAM and MiDaS with SAM to compensate for their mutual weaknesses and achieve improved performance.
The core of the meta-model is the Passive Aggressive Classifier (PAC) from the Scikit-Learn package. We utilized SAM pseudo-instance masks and TransCAM heatmaps as input for PAC, with ground truth masks as the desired output.
From the dataset, we selected 50 images on which we trained the PAC with the hinge loss function. The proposed meta-algorithm was then employed to generate annotations for the dataset, and YOLOv8 was subsequently trained on the obtained annotations. The results comparing the quality of the meta-algorithm with the TransCAM and MiDaS approach are presented in Table 8. The experiment was conducted on the full plant case, where the TransCAM with MiDaS approach demonstrated the best results.
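The following minimal sketch illustrates the described meta-model as a per-pixel Passive Aggressive Classifier with the hinge loss; the feature construction and the helper names are simplified assumptions rather than the exact implementation.

```python
import numpy as np
from sklearn.linear_model import PassiveAggressiveClassifier

# Meta-model sketch: a per-pixel Passive Aggressive Classifier (hinge loss) that fuses
# the SAM pseudo-mask and the TransCAM heatmap into a refined foreground prediction.
pac = PassiveAggressiveClassifier(loss="hinge", max_iter=1000, random_state=0)

def pixel_features(sam_mask, transcam_heatmap):
    # One feature vector per pixel: [SAM foreground flag, TransCAM activation]
    return np.stack([sam_mask.ravel(), transcam_heatmap.ravel()], axis=1)

def fit_meta_model(train_samples):
    # train_samples: iterable of (sam_mask, transcam_heatmap, gt_mask) tuples
    # for the ~50 annotated training images
    X = np.concatenate([pixel_features(s, t) for s, t, _ in train_samples])
    y = np.concatenate([g.ravel() for _, _, g in train_samples])
    pac.fit(X, y)

def predict_pseudo_mask(sam_mask, transcam_heatmap):
    pred = pac.predict(pixel_features(sam_mask, transcam_heatmap))
    return pred.reshape(sam_mask.shape).astype(np.uint8)
```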
In Table 8, one can see that the meta-algorithm outperforms the TransCAM and MiDaS approach for the full plant case by 5%. Combining two algorithms to eliminate their weaknesses can be a promising approach: by leveraging the strengths of each algorithm and compensating for their limitations, the resulting combination has the potential to achieve improved performance and robustness. The proposed approach allows for a more comprehensive solution that addresses multiple aspects of the observed problems.
To prove the statistical significance of the obtained metrics, we used the Kruskal–Wallis test [51]. This method determines whether there are significant differences in the medians of three or more independent groups. In our experiment, we divided the data into five folds and performed the training process five times for every type of mask, creating tables with the metrics and the metric values averaged over the five experiments. First, we calculated the gain in mAP relative to the bounding-box-based masks (the baseline case) in every experiment. Then, we calculated the p-value using the Kruskal–Wallis test for these metric gains for the following types of masks: TransCAM and MiDaS pseudo-masks, SAM pseudo-masks, and meta-model pseudo-masks. We wanted to verify that the improvements in the metrics provided by the proposed and considered methods were not a coincidence. We set the significance level (alpha) to 0.05. It is the threshold used to reject the null hypothesis that the considered data groups have the same median value, and it represents the maximum acceptable probability of observing a result as extreme as, or more extreme than, the observed one, assuming the null hypothesis is true. The p-value obtained from the Kruskal–Wallis test equals 0.017, which is lower than the significance level of 0.05. Given the small amount of statistical data, this is sufficient evidence that our results have different statistical parameters and that the obtained metric values for the considered methods relative to the baseline case are not a coincidence.
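For reference, the significance test can be reproduced with SciPy as sketched below; the listed mAP gains are placeholder values, not the actual experimental numbers.

```python
from scipy.stats import kruskal

# Placeholder mAP gains over the bounding-box baseline from the five folds
# (one list per mask type); the real values come from the repeated training runs.
gains_transcam_midas = [0.31, 0.42, 0.38, 0.45, 0.40]
gains_sam            = [0.12, 0.15, 0.14, 0.16, 0.13]
gains_meta_model     = [0.35, 0.47, 0.41, 0.49, 0.44]

statistic, p_value = kruskal(gains_transcam_midas, gains_sam, gains_meta_model)
alpha = 0.05
print(f"p-value = {p_value:.3f}, significant: {p_value < alpha}")
```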
Figure 7 presents comparative qualitative results obtained with the different kinds of masks.
Author Contributions
Conceptualization, S.N. and S.I.; methodology, A.S.; software, S.M.; validation, S.M., S.N. and S.I.; formal analysis, S.M.; investigation, S.M.; resources, A.S.; data curation, S.N.; writing—original draft preparation, S.M. and S.N.; writing—review and editing, S.I. and A.S.; visualization, S.M.; supervision, A.S.; project administration, S.N.; funding acquisition, S.I. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the Ministry of Science and Higher Education, grant No. 075-10-2021-068.
Data Availability Statement
The dataset is available upon request.
Conflicts of Interest
The authors declare no conflict of interest.
Abbreviations
The following abbreviations are used in this manuscript:
A2GNN | Affinity Attention Graph Neural Network |
ACFN | Atrous Convolutional Feature Network |
AdvCAM | Adversarial Class Activation Map |
AV | Autonomous Vehicle |
AP | Average Precision |
AuxSegNet | Auxiliary Segmentation Network |
BAP | Background Average Pooling |
CCAM | Class-Agnostic Activation Map |
CLIP | Contrastive Language–Image Pretraining |
CMA | Collaborative Multi-Attention |
CODNet | Co-attention Dictionary Network |
CAM | Class Activation Map |
CPN | Complementary Patch Network |
DCAMA | Dense Pixel-Wise Cross-Query-and-Support Attention-Weighted Mask Aggregation |
DINO | Self-Distillation with No Labels |
DA | Discriminative Activation |
dCRF | Dense Conditional Random Field |
EDAM | Embedded Discriminative Attention Mechanism |
FP | False Positive |
GPT | Generative Pretrained Transformer |
GT | Ground Truth |
GAP | Global Average Pooling |
HDMNet | Hierarchically Decoupled Matching Network |
HSSP | Hybrid Spatial Pyramid Pooling |
H-DSRG | Hierarchical Deep Seeded Region Growing |
ISIM | Image Segmentation with Iterative Masking |
LOST | Localizing Objects with Self-Supervised Transformers |
MSANet | Multi-Similarity and Attention Network |
MiDaS | Monocular Depth Estimation |
MIL | Multiple Instance Learning |
MP | Multiple Point |
mIoU | Mean Intersection Over Union |
mAP | Mean Average Precision |
NAL | Noise-Aware Loss |
NSRM | Nonsalient Region Masking |
PAC | Passive Aggressive Classifier |
PPL | Progressive Patch Learning |
POM | Potential Object Mining |
PCM | Pixel Correlation Module |
PRCM | Pixel-Region Correlation Module |
OA-CAM | Online Accumulated Class Attention Map |
OBA | Object-Based Augmentation |
SVD | Singular Value Decomposition |
SVF | Singular Value Fine-tuning |
SegGPT | Segmentation Generative Pretrained Transformer |
SLAM | Semantic Learning-Based Activation Map |
SEAM | Self-Supervised Equivariant Attention Mechanism |
SAM | Segment Anything Model |
TransCAM | Transformer Class Activation Map |
TP | True Positive |
UAV | Unmanned Aerial Vehicle |
VWE | Visual Word Encoder |
WSSS | Weakly-Supervised Semantic Segmentation |
WSIS | Weakly-Supervised Instance Segmentation |
YOLO | You Only Look Once |
References
- Sorscher, B.; Geirhos, R.; Shekhar, S.; Ganguli, S.; Morcos, A. Beyond neural scaling laws: Beating power law scaling via data pruning. Adv. Neural Inf. Process. Syst. 2022, 35, 19523–19536.
- Paton, N. Automating data preparation: Can we? should we? must we? In Proceedings of the 21st International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data, Lisbon, Portugal, 26 March 2019.
- Lemikhova, L.; Nesteruk, S.; Somov, A. Transfer Learning for Few-Shot Plants Recognition: Antarctic Station Greenhouse Use-Case. In Proceedings of the 2022 IEEE 31st International Symposium on Industrial Electronics (ISIE), Anchorage, AK, USA, 1–3 June 2022; pp. 715–720.
- Nesteruk, S.; Shadrin, D.; Pukalchik, M.; Somov, A.; Zeidler, C.; Zabel, P.; Schubert, D. Image compression and plants classification using machine learning in controlled-environment agriculture: Antarctic station use case. IEEE Sens. J. 2021, 21, 17564–17572.
- Markov, I.; Nesteruk, S.; Kuznetsov, A.; Dimitrov, D. RusTitW: Russian Language Text Dataset for Visual Text in-the-Wild Recognition. arXiv 2023, arXiv:2303.16531.
- Shorten, C.; Khoshgoftaar, T.M. A survey on image data augmentation for deep learning. J. Big Data 2019, 6, 1–48.
- Illarionova, S.; Nesteruk, S.; Shadrin, D.; Ignatiev, V.; Pukalchik, M.; Oseledets, I. MixChannel: Advanced augmentation for multispectral satellite images. Remote Sens. 2021, 13, 2181.
- Illarionova, S.; Nesteruk, S.; Shadrin, D.; Ignatiev, V.; Pukalchik, M.; Oseledets, I. Object-based augmentation for building semantic segmentation: Ventura and Santa Rosa case study. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 1659–1668.
- Ghiasi, G.; Cui, Y.; Srinivas, A.; Qian, R.; Lin, T.Y.; Cubuk, E.D.; Le, Q.V.; Zoph, B. Simple copy-paste is a strong data augmentation method for instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 2918–2928.
- Nesteruk, S.; Illarionova, S.; Akhtyamov, T.; Shadrin, D.; Somov, A.; Pukalchik, M.; Oseledets, I. XtremeAugment: Getting more from your data through combination of image collection and image augmentation. IEEE Access 2022, 10, 24010–24028.
- Illarionova, S.; Shadrin, D.; Ignatiev, V.; Shayakhmetov, S.; Trekin, A.; Oseledets, I. Augmentation-Based Methodology for Enhancement of Trees Map Detalization on a Large Scale. Remote Sens. 2022, 14, 2281.
- Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; Torralba, A. Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2921–2929.
- Xia, G.S.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A large-scale dataset for object detection in aerial images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 3974–3983.
- Bouguettaya, A.; Zarzour, H.; Kechida, A.; Taberkit, A.M. Vehicle detection from UAV imagery with deep learning: A review. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 6047–6067.
- Feng, D.; Haase-Schütz, C.; Rosenbaum, L.; Hertlein, H.; Glaeser, C.; Timm, F.; Wiesbeck, W.; Dietmayer, K. Deep multi-modal object detection and semantic segmentation for autonomous driving: Datasets, methods, and challenges. IEEE Trans. Intell. Transp. Syst. 2020, 22, 1341–1360.
- Ruiz-del Solar, J.; Loncomilla, P.; Soto, N. A survey on deep learning methods for robot vision. arXiv 2018, arXiv:1803.10862.
- Illarionova, S.; Shadrin, D.; Tregubova, P.; Ignatiev, V.; Efimov, A.; Oseledets, I.; Burnaev, E. A Survey of Computer Vision Techniques for Forest Characterization and Carbon Monitoring Tasks. Remote Sens. 2022, 14, 5861.
- Zhang, B.; Xiao, J.; Jiao, J.; Wei, Y.; Zhao, Y. Affinity Attention Graph Neural Network for Weakly Supervised Semantic Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 8082–8096.
- Yao, Y.; Chen, T.; Xie, G.S.; Zhang, C.; Shen, F.; Wu, Q.; Tang, Z.; Zhang, J. Non-salient region object mining for weakly supervised semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 2623–2632.
- Wang, Y.; Zhang, J.; Kan, M.; Shan, S.; Chen, X. Self-supervised equivariant attention mechanism for weakly supervised semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 12275–12284.
- Wu, T.; Huang, J.; Gao, G.; Wei, X.; Wei, X.; Luo, X.; Liu, C.H. Embedded discriminative attention mechanism for weakly supervised semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021.
- Bircanoglu, C.; Arica, N. ISIM: Iterative Self-Improved Model for Weakly Supervised Segmentation. arXiv 2022, arXiv:2211.12455.
- Zhang, F.; Gu, C.; Zhang, C.; Dai, Y. Complementary patch for weakly supervised semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 7242–7251.
- Li, J.; Jie, Z.; Wang, X.; Zhou, Y.; Wei, X.; Ma, L. Weakly Supervised Semantic Segmentation via Progressive Patch Learning. IEEE Trans. Multimed. 2022, 25, 1686–1699.
- Oh, Y.; Kim, B.; Ham, B. Background-Aware Pooling and Noise-Aware Loss for Weakly-Supervised Semantic Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021.
- Ma, T.; Wang, Q.; Zhang, H.; Zuo, W. Delving Deeper Into Pixel Prior for Box-Supervised Semantic Segmentation. IEEE Trans. Image Process. 2022, 31, 1406–1417.
- Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.Y.; et al. Segment anything. arXiv 2023, arXiv:2304.02643.
- Xu, L.; Xue, H.; Bennamoun, M.; Boussaid, F.; Sohel, F. Atrous convolutional feature network for weakly supervised semantic segmentation. Neurocomputing 2021, 421, 115–126.
- Chen, J.; Zhao, X.; Liu, M.; Shen, L. SLAM: Semantic Learning based Activation Map for Weakly Supervised Semantic Segmentation. arXiv 2022, arXiv:2210.12417.
- Xu, L.; Ouyang, W.; Bennamoun, M.; Boussaid, F.; Sohel, F.; Xu, D. Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021.
- Wan, W.; Chen, J.; Yang, M.H.; Ma, H. Co-attention dictionary network for weakly-supervised semantic segmentation. Neurocomputing 2022, 486, 272–285.
- Chong, Y.; Chen, X.; Tao, Y.; Pan, S. Erase then grow: Generating correct class activation maps for weakly-supervised semantic segmentation. Neurocomputing 2021, 453, 97–108.
- Ru, L.; Du, B.; Wu, C. Learning Visual Words for Weakly-Supervised Semantic Segmentation. In Proceedings of the 13th International Joint Conference on Artificial Intelligence, Virtual, 19–27 August 2021.
- Lee, J.; Kim, E.; Yoon, S. Anti-adversarially manipulated attributions for weakly and semi-supervised semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 4071–4080.
- Siméoni, O.; Puy, G.; Vo, H.V.; Roburin, S.; Gidaris, S.; Bursuc, A.; Pérez, P.; Marlet, R.; Ponce, J. Localizing objects with self-supervised transformers and no labels. arXiv 2021, arXiv:2109.14279.
- Caron, M.; Touvron, H.; Misra, I.; Jégou, H.; Mairal, J.; Bojanowski, P.; Joulin, A. Emerging properties in self-supervised vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 9650–9660.
- Wang, Y.; Shen, X.; Hu, S.X.; Yuan, Y.; Crowley, J.L.; Vaufreydaz, D. Self-supervised transformers for unsupervised object discovery using normalized cut. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 14543–14553.
- Melas-Kyriazi, L.; Rupprecht, C.; Laina, I.; Vedaldi, A. Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 8364–8375.
- Sauvalle, B.; de La Fortelle, A. Unsupervised Multi-object Segmentation Using Attention and Soft-argmax. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 2–7 January 2023; pp. 3267–3276.
- Xie, J.; Xiang, J.; Chen, J.; Hou, X.; Zhao, X.; Shen, L. C2AM: Contrastive learning of Class-agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation. arXiv 2022, arXiv:2203.13505.
- Sun, Y.; Chen, Q.; He, X.; Wang, J.; Feng, H.; Han, J.; Ding, E.; Cheng, J.; Li, Z.; Wang, J. Singular Value Fine-tuning: Few-shot Segmentation requires Few-parameters Fine-tuning. Adv. Neural Inf. Process. Syst. 2022, 35, 37484–37496.
- Iqbal, E.; Safarov, S.; Bang, S. MSANet: Multi-Similarity and Attention Guidance for Boosting Few-Shot Segmentation. arXiv 2022, arXiv:2206.09667.
- Shi, X.; Wei, D.; Zhang, Y.; Lu, D.; Ning, M.; Chen, J.; Ma, K.; Zheng, Y. Dense cross-query-and-support attention weighted mask aggregation for few-shot segmentation. In Computer Vision–ECCV 2022, Proceedings of the 17th European Conference, Tel Aviv, Israel, 23–27 October 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 151–168.
- Peng, B.; Tian, Z.; Wu, X.; Wang, C.; Liu, S.; Su, J.; Jia, J. Hierarchical Dense Correlation Distillation for Few-Shot Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 23641–23651.
- Wang, X.; Zhang, X.; Cao, Y.; Wang, W.; Shen, C.; Huang, T. SegGPT: Segmenting everything in context. arXiv 2023, arXiv:2304.03284.
- Li, R.; Mai, Z.; Zhang, Z.; Jang, J.; Sanner, S. TransCAM: Transformer Attention-based CAM Refinement for Weakly Supervised Semantic Segmentation. J. Vis. Commun. Image Represent. 2023, 92, 103800.
- Ranftl, R.; Lasinger, K.; Hafner, D.; Schindler, K.; Koltun, V. Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 1623–1637.
- Rezaei, M.; Farahanipad, F.; Dillhoff, A.; Elmasri, R.; Athitsos, V. Weakly-supervised hand part segmentation from depth images. In Proceedings of the 14th PErvasive Technologies Related to Assistive Environments Conference, Virtual, 29 June–2 July 2021; pp. 218–225.
- Ergül, M.; Alatan, A. Depth is all you Need: Single-Stage Weakly Supervised Semantic Segmentation From Image-Level Supervision. In Proceedings of the 2022 IEEE International Conference on Image Processing (ICIP), Bordeaux, France, 16–19 October 2022; pp. 4233–4237.
- Khoreva, A.; Benenson, R.; Hosang, J.; Hein, M.; Schiele, B. Simple does it: Weakly supervised instance and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 876–885.
- Ostertagova, E.; Ostertag, O.; Kováč, J. Methodology and application of the Kruskal-Wallis test. Appl. Mech. Mater. 2014, 611, 115–120.
- Zhang, Q.; Yang, M.; Zheng, Q.; Zhang, X. Segmentation of hand gesture based on dark channel prior in projector-camera system. In Proceedings of the 2017 IEEE/CIC International Conference on Communications in China (ICCC), Qingdao, China, 22–24 October 2017; pp. 1–6.
- Zheng, Q.; Yang, M.; Tian, X.; Wang, X.; Wang, D. Rethinking the Role of Activation Functions in Deep Convolutional Neural Networks for Image Classification. Eng. Lett. 2020, 28, 1–13.
- Illarionova, S.; Shadrin, D.; Ignatiev, V.; Shayakhmetov, S.; Trekin, A.; Oseledets, I. Estimation of the Canopy Height Model From Multispectral Satellite Imagery with Convolutional Neural Networks. IEEE Access 2022, 10, 34116–34132.
- Zheng, Q.; Zhao, P.; Li, Y.; Wang, H.; Yang, Y. Spectrum interference-based two-level data augmentation method in deep learning for automatic modulation classification. Neural Comput. Appl. 2021, 33, 7723–7745.
- Zheng, Q.; Zhao, P.; Wang, H.; Elhanashi, A.; Saponara, S. Fine-grained modulation classification using multi-scale radio transformer with dual-channel representation. IEEE Commun. Lett. 2022, 26, 1298–1302.
- Nesteruk, S.; Zherebtsov, I.; Illarionova, S.; Shadrin, D.; Somov, A.; Bezzateev, S.V.; Yelina, T.; Denisenko, V.; Oseledets, I. CISA: Context Substitution for Image Semantics Augmentation. Mathematics 2023, 11, 1818.
- Nesteruk, S.; Shadrin, D.; Kovalenko, V.; Rodríguez-Sanchez, A.; Somov, A. Plant growth prediction through intelligent embedded sensing. In Proceedings of the 2020 IEEE 29th International Symposium on Industrial Electronics (ISIE), Delft, The Netherlands, 17–19 June 2020; pp. 411–416.
- Illarionova, S.; Shadrin, D.; Shukhratov, I.; Evteeva, K.; Popandopulo, G.; Sotiriadi, N.; Oseledets, I.; Burnaev, E. Benchmark for Building Segmentation on Up-Scaled Sentinel-2 Imagery. Remote Sens. 2023, 15, 2347.
- Fu, Y.; Yao, X. A review on manufacturing defects and their detection of fiber reinforced resin matrix composites. Compos. Part C Open Access 2022, 8, 100276.
- Illarionova, S.; Trekin, A.; Ignatiev, V.; Oseledets, I. Tree species mapping on Sentinel-2 satellite imagery with weakly supervised classification and object-wise sampling. Forests 2021, 12, 1413.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).