Article

Classification of Blackcurrant Genotypes by Ploidy Levels on Stomata Microscopic Images with Deep Learning: Convolutional Neural Networks and Vision Transformers

by Aleksandra Konopka 1,*, Ryszard Kozera 1,2, Agnieszka Marasek-Ciołakowska 3 and Aleksandra Machlańska 3
1 Institute of Information Technology, Warsaw University of Life Sciences—SGGW, ul. Nowoursynowska 159, 02-776 Warsaw, Poland
2 School of Physics, Mathematics and Computing, The University of Western Australia, 35 Stirling Highway, Crawley, Perth, WA 6009, Australia
3 Department of Applied Biology, The National Institute of Horticultural Research, ul. Konstytucji 3 Maja 1/3, 96-100 Skierniewice, Poland
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(19), 10735; https://doi.org/10.3390/app151910735
Submission received: 22 August 2025 / Revised: 30 September 2025 / Accepted: 1 October 2025 / Published: 5 October 2025

Abstract

Plants vary in their number of chromosome sets (ploidy levels), which can influence morphological traits, including the size and density of stomata cells. Although biologists can detect these differences under a microscope, the process is often time-consuming and tedious. This study aims to automate the classification of blackcurrant (Ribes nigrum L.) ploidy levels—diploid, triploid, and tetraploid—by leveraging deep learning techniques. Convolutional Neural Networks and Vision Transformers are employed to perform microscopic image classification across two distinct blackcurrant datasets. Initial experiments demonstrate that these models can effectively classify ploidy levels when trained and tested on subsets derived from the same dataset. However, the primary challenge lies in proposing a model capable of yielding satisfactory classification results across different datasets, ensuring robustness and generalization, which is a critical step toward developing a universal ploidy classification system. In this research, a variety of experiments is performed, including the application of a data augmentation technique. Model efficacy is evaluated with standard metrics, and its interpretability is ensured through Gradient-weighted Class Activation Mapping visualizations. Finally, future research directions are outlined, involving the application of other advanced state-of-the-art machine learning methods to further refine ploidy level prediction in botanical studies.

1. Introduction

Plants differ in their number of chromosome sets (ploidy levels), and these variations can impact morphological features of plants at the microscopic level. The differences are evident in the characteristics of stomata, which are pores in plants responsible for gas exchange, controlling water transpiration, and regulating light intake [1]. The objective of this research is to verify whether machine learning algorithms are capable of distinguishing ploidy levels in blackcurrant (Ribes nigrum L.) based on microscopic images of leaves with visible stomata structures. In this study, the considered datasets consist of diploids, triploids, and tetraploids, which refer to the number of chromosome sets.
Existing literature suggests that stomatal size increases and stomatal density decreases with higher ploidy levels [2,3]. Biologists can manually measure these differences on microscopic images; however, this approach is often time-consuming and tedious. When studying aspects such as distinguishing different ploidy levels or analyzing the impact of biofertilizers on stomata characteristics, the manual measurements must be repeated many times to draw significant conclusions. This is the reason why these processes need to be automated with a computerized approach, powered by artificial intelligence (AI), ultimately reducing the amount of time needed for the analysis. Although there are numerous papers on computerized stomata counting and measuring [4,5,6], our research applying AI to the classification of ploidy levels is a novel approach.
Historically, stomata measurements were performed manually. To assist manual measurement, software such as ImageJ [7] was commonly applied. With the development of computer vision and machine learning techniques, stomata instances were initially described with handcrafted features. As an example, in a work from 2017 [8], the authors measured features such as area, major axis length, minor axis length, and eccentricity. The field of stomata analysis was transformed by deep learning [9], which became the primary tool for detection, segmentation, and feature extraction. In 2020, Ref. [10] applied the YOLO algorithm for stomata detection in common beans (Phaseolus vulgaris L.), barley plants (Hordeum vulgare L. cv. Henley), and two soybean (Glycine max. L.) cultivars (PI 398223 and PI 567201). In 2021, Ref. [11] proposed a system for tracking and monitoring stomata dynamics in wheat leaves, where MobileNet was used for feature extraction. In 2025, Ref. [12] applied U-Net for image segmentation of root images on a dataset comprising monocotyledonous spring and winter wheat (Triticum aestivum L.) and dicotyledonous faba bean (Vicia faba L.). Stomata microscopic images are rarely employed to solve classification problems. Existing research has addressed specific tasks: classification of certain species [9] (deep learning algorithms) and verifying open or closed stomata structures [13] (Support Vector Machine).
The segmentation of stomata in microscopic images poses a challenge due to the presence of surrounding leaf structures, which exhibit high contrast compared to the background. Because of this, standard traditional image analysis approaches cannot be directly applied to most datasets. In our previous work [14], the YOLO algorithm was successfully utilized for the segmentation task to specify the region of interest, i.e., the area containing stomata cells. Despite the satisfactory results, this approach had the following drawbacks: a time-consuming labeling process was required, and an additional classification method had to be applied (yielding a two-step process). In this work, it is investigated whether a single-step classification of whole images—without prior segmentation of individual stomata—can yield satisfactory results. This study compares the performance of two leading deep learning architectures: Convolutional Neural Networks (CNNs) [15] and Vision Transformers (ViTs) [16]. Although CNNs have been frequently applied to stomata image analysis, ViTs represent a state-of-the-art approach that, to the best of our knowledge, has not yet been tested on such images.
In our latest work [14], the models learned on a training dataset and were tested on a testing dataset, both derived from one set of blackcurrant microscopic images prepared under identical conditions and including selected cultivars. In contrast to that prior work, this study performs computations on two distinct datasets which include three classes: diploid, triploid, and tetraploid. These datasets differ in the selected cultivars and the time of capturing the samples (beginning or end of different vegetation seasons). The two datasets are split into train, validation, and test subsets. The models are trained on the training subset from one set and tested iteratively on distinct subsets from the other set. This approach allows us to evaluate model performance not only on a set with the same characteristics, but also on another, separate dataset. Consequently, the proposed evaluation approach enables us to verify whether the model performance is reproducible on other datasets or whether the results are satisfactory only for the set prepared in the initial training conditions.
The objective of this research is to verify whether AI models can successfully classify diploidy, triploidy, and tetraploidy. The models should be applicable to diverse datasets and not limited to a constrained sample pool. In this work, the performance of the models is evaluated with metrics, and model interpretability is analyzed to understand the decision-making criteria of the considered architectures. The Gradient-weighted Class Activation Mapping (Grad-CAM) method applied in this research was used in another paper on stomata detection [17], where it was applied to interpret the YOLO model. A data augmentation technique is also tested to enhance the generalization of the achieved results [18].
The key novel contributions of this research involve:
  • Computerized classification of ploidy levels based on microscopic images of stomata, whereas other researchers focus on the detection, segmentation, and measurement of stomata. To the best of our knowledge, no other research group has yet applied artificial intelligence to ploidy-level classification.
  • The approach of training a model on one dataset and testing it on a second, different dataset for ploidy-level classification, as opposed to our previous research [14], where both training and testing were performed on subsets of a single dataset.
  • Application of Vision Transformers to stomata microscopic images, as, to the best of our knowledge, other works have not applied ViTs to any stomata-related tasks.

2. Datasets

The experiments were conducted in the Experimental Pomological Orchard in Skierniewice, central Poland, which is a part of the National Institute of Horticultural Research. Blackcurrant plants (Ribes nigrum L.) were cultivated under natural light in 50 L pots filled with a 1:1 mixture of peat substrate and soil supplemented with slow-release fertilizers. A drip system was employed to manage the irrigation regimes, and the moisture content of the growing medium was monitored using a dielectric probe (TEROS-12; METER, San Francisco, CA, USA). The water potential in the growth medium was maintained at −10 kPa to ensure optimal irrigation. The plants were protected against pests and diseases according to the Integrated Pest Management guidelines.
The datasets comprise three classes of blackcurrant genotypes distinguished by their chromosome counts: a diploid (2n = 2x = 16) cultivar (class 0), a triploid (2n = 3x = 24) cultivar (class 1), and an autotetraploid (2n = 4x = 32) clone (class 2). All images were captured using a VHX-7000N KEYENCE digital microscope to ensure high-quality images suitable for detailed analysis (see Figure 1).
In this research, computations are performed on two distinct datasets.

2.1. Dataset1

Class 0 contains 252 images of the diploid ‘Gofert’ cultivar and 252 images of the diploid ‘Polares’ cultivar. Class 1 includes 502 images of the triploid ‘Dlinnokistnaja’ cultivar, while class 2 consists of 251 images of the autotetraploid clone of the ‘Gofert’ cultivar and 250 images of the ‘Polares’ cultivar, both obtained by in vitro polyploidization [19]. Leaves were sampled for photo documentation at the end of a vegetation season, between 5 and 25 September 2024.

2.2. Dataset2

Class 0 contains 500 images of the diploid ‘Gofert’ cultivar. Class 1 includes 500 images of the triploid ‘Dlinnokistnaja’ cultivar, and class 2 contains 500 images of an autotetraploid clone of ‘Gofert’ [19]. Leaves were sampled for photo documentation at the beginning of the vegetation season, between 14 and 30 April 2025.

3. Methods

Neural networks [20] are machine learning models which evolved from a single artificial neuron. This neuron was trained with the perceptron algorithm, which is capable of solving linearly separable binary problems [21]. Neural networks comprise neurons which form layers. The number of trainable parameters in a neural network increases with the complexity of its architecture. When a network consists of many layers, it is referred to as a deep neural network, and its training process is termed deep learning. Neural networks are applied in many fields including computer vision [22], signal processing [23], and natural language processing [24].
In the classical approach, the input to the neural network is a matrix of features. These features are handcrafted by IT specialists in collaboration with domain experts, who determine which traits are relevant for a specific task [25]. For instance, in the classification of stomata images with handcrafted features, one could consider computing the number of stomata objects in each image, as well as their size and shape eccentricity. The surge of the deep learning field has resulted in new algorithms leveraging automatic feature extraction [22,26]. This process is performed with the two mechanisms employed in this study—convolution [27] in Convolutional Neural Networks and self-attention [16] in Transformers.
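For illustration, such a handcrafted-feature pipeline can be sketched with scikit-image as below; the thresholding step, file name, and area cutoff are assumptions for this example and do not come from the cited works.

```python
import numpy as np
from skimage import io, filters, measure

# Minimal sketch of the classical handcrafted-feature approach (assumed
# preprocessing): threshold a grayscale microscopic image to obtain a binary
# mask of candidate stomata, then describe each connected component.
image = io.imread("leaf_sample.png", as_gray=True)       # hypothetical file
mask = image < filters.threshold_otsu(image)             # assumed segmentation

labels = measure.label(mask)
features = []
for region in measure.regionprops(labels):
    if region.area < 50:           # discard small artifacts (arbitrary cutoff)
        continue
    features.append([
        region.area,               # stoma size
        region.major_axis_length,  # major axis length
        region.minor_axis_length,  # minor axis length
        region.eccentricity,       # shape eccentricity
    ])

# Image-level descriptor: stomata count plus mean per-stoma features.
feature_row = [len(features)] + list(np.mean(features, axis=0))
```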

3.1. Convolutional Neural Networks

Convolutional Neural Networks [28] are deep neural networks which are commonly applied in image analysis for tasks involving classification, detection, or segmentation. CNNs were first studied in the 1980s with models proposed by Yann LeCun [15]. Further development of these architectures accelerated in the early 21st century. CNNs comprise two modules: a feature extraction module and a task-specific part (responsible, e.g., for classification or regression).
The feature extraction module processes input image data and generates a matrix representation through automatically learned features. This module is composed of alternating convolutional and pooling layers. The resulting feature matrix is subsequently fed into the second module, usually implemented as a Multilayer Perceptron [29], where the task-specific prediction is performed.
In convolutional layers [27], filters are applied to extract features with a sliding window operation. This operation is a cross-correlation between a filter and image patches, which is also referred to as convolution. These filters are matrices of values (weights), trained along with the other network parameters. The weights are adjusted to detect spatial features such as edges or textures.
The pooling layers [30] are applied to reduce the spatial dimensions of the image. Despite this reduction, each layer is designed in such a way that the most important details are retained. Pooling can be performed in different ways, e.g., as max pooling (only the maximal value within a patch is preserved) or average pooling (the mean value of all elements within the window is calculated). The pooling operation is applied to reduce computational complexity and mitigate overfitting.
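A minimal Keras sketch of these building blocks is given below (illustrative layer sizes and input resolution, not the architecture used later in this study): alternating convolutional and pooling layers form the feature extraction module, followed by a Multilayer Perceptron head.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative CNN: convolution + pooling feature extractor, then an MLP head.
model = keras.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),  # learned filters
    layers.MaxPooling2D((2, 2)),                                   # spatial reduction
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),   # task-specific module (MLP)
    layers.Dense(3, activation="softmax"),  # three ploidy classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```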
Residual Neural Network (ResNet) is a CNN architecture introduced by He et al. in 2016 [31]. In neural networks trained with the backpropagation algorithm, a problem with unstable gradients occurs in the deeper layers of CNNs, leading to exploding or vanishing gradient phenomena. These issues are addressed by ResNets through the application of skip connections [32,33], which are connections that bypass one or more layers. The gradient can efficiently propagate through such a network, which improves its stability and performance.
ResNet architectures are modified to enhance accuracy (ACC) and performance. In ResNet152v2 [34], the order of the alternating layers is changed from convolution, batch normalization, Rectified Linear Unit (ReLU) [35] to batch normalization, ReLU, and convolution. This adjustment enhances training stability and further reduces gradient vanishing.
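The pre-activation ordering and the skip connection can be illustrated with the following simplified Keras functional-API sketch; it is not the full ResNet152v2 block and assumes the input already has the target number of channels.

```python
from tensorflow.keras import layers

def preactivation_residual_block(x, filters):
    """Simplified ResNetv2-style block: BN -> ReLU -> Conv, with a skip connection.
    Assumes `x` already has `filters` channels so the addition is valid."""
    shortcut = x                                   # identity path (skip connection)
    y = layers.BatchNormalization()(x)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, (3, 3), padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, (3, 3), padding="same")(y)
    return layers.Add()([shortcut, y])             # gradient can bypass the block
```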

3.2. Transformers

Transformers are deep neural networks introduced in 2017 [16] which were originally created for natural language processing tasks.
The core of this methodology is the tokenization technique, which splits the training corpus into smaller units. This approach is applied to reduce the dimensionality of the data (so that each character does not have to be processed separately) and then to model dependencies between phrases in subsequent steps. Each token is assigned an individual ID which is selected based on its frequency of appearance in the training corpus. To capture dependencies between tokens, an embedding mechanism is applied, designed to represent relations between words as vectors in a high-dimensional space [36]. Preserving the order of words in a text is essential for learning their context. In addition, the distance between specific words is represented by positional encoding, which adds positional information to the embedding input [37].
Then, the self-attention mechanism is applied to weigh the relevance between tokens in the input data. An attention score is computed between pairs of tokens, where a high attention score represents high dependency. To obtain these relations, specific matrices W_Q (query), W_K (key), and W_V (value) are trained with the backpropagation algorithm alongside the rest of the model’s parameters.
Vision Transformers, proposed in 2020 [38], extend the Transformer architecture to computer vision tasks by processing images as input. Both ViTs and traditional Transformers employ the same self-attention mechanism; however, ViTs differ in how they process input data. ViTs decompose images into patches, which are then linearly embedded into tokens. These tokens serve as the input sequence for the Transformer encoder [39].
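A minimal NumPy sketch of this scaled dot-product self-attention is shown below; X stands for a matrix of token embeddings, and the weight matrices are random placeholders for parameters that would normally be learned by backpropagation.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n_tokens, d_model, d_k = 6, 32, 16

X = rng.normal(size=(n_tokens, d_model))      # token embeddings (+ positional encoding)
W_Q = rng.normal(size=(d_model, d_k))         # learned in practice, random here
W_K = rng.normal(size=(d_model, d_k))
W_V = rng.normal(size=(d_model, d_k))

Q, K, V = X @ W_Q, X @ W_K, X @ W_V
scores = softmax(Q @ K.T / np.sqrt(d_k))      # attention scores between token pairs
attended = scores @ V                         # weighted combination of value vectors
```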
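The decomposition of a 224 × 224 image into 16 × 16 patches and their linear embedding can be sketched as follows (NumPy, with a random projection standing in for the learned embedding weights).

```python
import numpy as np

patch, d_model = 16, 768
image = np.random.rand(224, 224, 3)                            # illustrative input

# Split into non-overlapping 16 x 16 patches and flatten each one.
h, w = image.shape[0] // patch, image.shape[1] // patch        # 14 x 14 patches
patches = image.reshape(h, patch, w, patch, 3).transpose(0, 2, 1, 3, 4)
tokens = patches.reshape(h * w, patch * patch * 3)             # (196, 768)

# Linear embedding of each patch into a token vector (learned in a real ViT).
W_embed = np.random.rand(tokens.shape[1], d_model)
embedded_tokens = tokens @ W_embed                             # encoder input sequence
```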

4. Experiments and Results

The experiments were performed on two sets of images, Dataset1 and Dataset2. Each dataset contained three classes of microscopic images with different ploidy levels (class 0: diploid, class 1: triploid, and class 2: tetraploid). The images were processed before being used as input to the classification methods. Some of the images in Dataset2 contained a label with information about the magnification. For this reason, it was decided to reduce the size of all images to remove the labels, yielding a final image size of 2880 × 1950 × 3. Each of the two datasets was split into three subsets: train, validation, and test, in a 7:2:1 ratio. All computations were performed in the Python programming language on an AMD Ryzen 5 5650 with 32 GB DDR4 and an RTX 3060 GPU. ResNet152v2 and ViTs were applied for classification purposes.
It is worth emphasizing that both ResNets and ViTs demonstrated the ability to accurately classify the three classes on the test subset corresponding to their training subset. For example, when a model was trained on the Dataset1 train subset and tested on the Dataset1 test subset, the results were higher than 0.97 accuracy—reaching even 100% for some models. However, our aim was to train a model that could distinguish any diploid, triploid, and tetraploid, rather than only recognizing samples contained within a specific dataset. For that reason, this work performs experiments with the following approach—each model is trained on the training set derived from one dataset and tested on the other dataset. The testing was conducted on whole datasets, with performance results presented in Table 1, Table 2 and Table 3.
Additionally, to ensure the stability of the obtained results, the validation was performed multiple times. Dataset1 and Dataset2 were split into five distinct subsets each. These subsets contained a balanced number of images from the three classes (0, 1, 2), yielding sets of 100–101 samples per class (around 300 samples in each of the 5 sets). Then, each of these five sets was applied to verify the performance of a model, e.g., when the model was trained on the training set from Dataset1, it was tested on all five subsets from Dataset2, and when the model was trained on the training set from Dataset2, it was tested on all five subsets from Dataset1. This approach was applied in all the experiments in this work, and the mean values and standard deviations of the five results computed for each of the models are available in the Supporting Information.
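A sketch of this preprocessing and splitting procedure is given below; the directory layout, crop box, and use of scikit-learn are assumptions for illustration, with only the 2880 × 1950 target size and the 7:2:1 ratio taken from the text.

```python
from pathlib import Path
from PIL import Image
from sklearn.model_selection import train_test_split

# Sketch of the preprocessing and split described above (file layout and crop
# box are assumptions; only the 2880 x 1950 target size comes from the text).
TARGET_W, TARGET_H = 2880, 1950

def crop_label(path: Path) -> Image.Image:
    """Crop away the bottom region that may contain the magnification label."""
    img = Image.open(path)
    return img.crop((0, 0, TARGET_W, TARGET_H))

paths = sorted(Path("dataset1").glob("class_*/*.png"))      # hypothetical layout
labels = [p.parent.name for p in paths]

# 7:2:1 split: hold out 10% for testing, then 2/9 of the remainder for validation.
train_p, test_p, train_y, test_y = train_test_split(
    paths, labels, test_size=0.1, stratify=labels, random_state=42)
train_p, val_p, train_y, val_y = train_test_split(
    train_p, train_y, test_size=2 / 9, stratify=train_y, random_state=42)
```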

4.1. ResNet Experimental Setup

The Residual Neural Network applied in this research was the ResNet152v2 implementation from the Keras library [40]. The images from the datasets were applied as input to ResNet with their size reduced by a factor of 4 (our experiments showed that resizing images to the default 224 × 224 yields significantly lower accuracy results). ResNet152v2 was the first part of a sequential model which also consisted of the following layers: a dense layer with 512 neurons and a ReLU activation function, a dropout layer with a 30% drop rate, again a dense layer with 512 neurons and a ReLU activation function, a dropout layer with a 30% drop rate, and a dense layer with 3 neurons and a SoftMax activation function which outputs the probabilities that a given sample belongs to each of the three classes. The optimizer applied in this research was Adam [41] with a learning rate of 0.01, and the categorical cross-entropy loss function from Keras was used. The model was pretrained on the ImageNet dataset and fine-tuned from the conv5_block1_1_conv layer. The model was trained for up to 1000 epochs with a batch size of 8 and an early-stopping patience of 50. The training usually stopped around 100 epochs.
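The described setup corresponds approximately to the following Keras sketch; the exact input resolution, the global average pooling after the backbone, and the data pipeline are simplifications or assumptions rather than details stated in the text.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sketch of the ResNet152v2 setup described above (input size approximated as
# one quarter of the 2880 x 1950 images; pooling="avg" is an assumption).
input_shape = (1950 // 4, 2880 // 4, 3)

backbone = keras.applications.ResNet152V2(
    include_top=False, weights="imagenet", pooling="avg", input_shape=input_shape)

# Fine-tune from the conv5_block1_1_conv layer onward; freeze earlier layers.
trainable = False
for layer in backbone.layers:
    if layer.name == "conv5_block1_1_conv":
        trainable = True
    layer.trainable = trainable

model = keras.Sequential([
    backbone,
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(3, activation="softmax"),   # diploid / triploid / tetraploid
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.01),
              loss="categorical_crossentropy", metrics=["accuracy"])

early_stop = keras.callbacks.EarlyStopping(patience=50, restore_best_weights=True)
# model.fit(train_ds, validation_data=val_ds, epochs=1000, batch_size=8,
#           callbacks=[early_stop])   # hypothetical tf.data pipelines
```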

4.2. ViT Experimental Setup

The ViTs applied in this research were two pretrained models: vit-large-patch16-224-in21k and vit-base-patch16-224-in21k from Hugging Face’s Transformers library [42]. These two models differ in size (~86M parameters for the base model and ~307M parameters for the large one), leading to differences in computational cost and performance. The models are pretrained on the ImageNet-21k dataset, and all of their layers are fine-tuned. The models resize input images to 224 × 224 and classify data into three categories. The batch size was set to 16 and the number of epochs to 10 (in most cases, the model converged around the sixth epoch). The optimizer is AdamW [43] with a learning rate of 2 × 10−5, and the loss function is cross-entropy (from the PyTorch library [44]).
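A condensed sketch of this fine-tuning setup with the Hugging Face Trainer is shown below; the hub identifier with the google/ prefix and the omitted dataset preparation are assumptions.

```python
from transformers import (ViTForImageClassification, ViTImageProcessor,
                          TrainingArguments, Trainer)

# Sketch of the ViT fine-tuning setup described above (hub id assumed to be
# the google/ variant; dataset preparation is omitted).
model_name = "google/vit-base-patch16-224-in21k"
processor = ViTImageProcessor.from_pretrained(model_name)   # resizes to 224 x 224
model = ViTForImageClassification.from_pretrained(model_name, num_labels=3)

args = TrainingArguments(
    output_dir="vit-ploidy",
    per_device_train_batch_size=16,
    num_train_epochs=10,
    learning_rate=2e-5,            # AdamW and cross-entropy loss are the defaults
)

# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_ds, eval_dataset=val_ds)  # hypothetical datasets
# trainer.train()
```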

4.3. Model Evaluation Metrics

The models were evaluated with the following computed metrics: accuracy, precision, recall, and F1-score. Accuracy measures the percentage of all correct predictions. Precision is the proportion of true positives divided by true and false positives combined. Recall is the proportion of true positives divided by true positives and false negatives combined. F1-score is the harmonic mean of precision (p) and recall (r): F1 = 2 × p × r / (p + r).
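These metrics can be computed, for example, with scikit-learn, as in the following minimal sketch operating on placeholder label arrays.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [0, 1, 2, 1, 0, 2]            # placeholder ground-truth labels
y_pred = [0, 2, 2, 1, 0, 0]            # placeholder model predictions

acc = accuracy_score(y_true, y_pred)
# Per-class precision, recall and F1 (average=None keeps one value per class).
p, r, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1, 2], average=None, zero_division=0)
print(f"ACC={acc:.2f}",
      dict(zip([0, 1, 2], zip(p.round(2), r.round(2), f1.round(2)))))
```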
Although this research focuses on maximizing the accuracy, the other tested measures are also helpful for understanding how the model manages to discern instances within each specific class.

4.4. Experiments on Raw Datasets

The models under consideration (ResNet152v2, vit-large-patch16-224-in21k, and vit-base-patch16-224-in21k) were trained on the Dataset1 training set and tested on Dataset2, and then trained on the Dataset2 training set and tested on Dataset1 (see Table 1). In two out of three tested scenarios, better accuracy results are achieved when the model learns on Dataset1 and is tested on Dataset2. The best accuracy result, equal to 0.68, is obtained with the Residual Network. The large ViT model obtained higher ACC results than the base one. Most of the models have problems with the correct classification of class 1, which is illustrated by the low values of precision or recall for this class. The model that performed best in classifying class 1 is the ResNet trained on Dataset1.

4.4.1. Experiments on Two Classes

Separate experiments were performed on the raw datasets Dataset1 and Dataset2 considering only class 0 and class 2. These two classes differ to the greatest extent in the number of chromosome sets, so it was decided to verify the classification results obtained by the analyzed ViT and ResNet models for this combination (see Table 2). Both the base and large ViT architectures outperform ResNet152v2 on both tested datasets. The highest yielded accuracy is 0.88 for the vit-base-patch16-224-in21k model and the lowest is 0.79 for ResNet152v2.

4.4.2. Experiments on Augmented Datasets

The images in classes 0, 1, and 2 in each of the two considered datasets differed in shades of green color. Despite that fact, models could successfully learn to distinguish classes in the different sets. It was decided to apply augmentation [18] to prevent overfitting and enhance the generalization of the model. Each image in the dataset was augmented into three new images. The augmentation process included the following modifications: random brightness (±10–40%), contrast (±10–40%), saturation (±10–40%), and hue (±15%) adjustments. Additionally, auto-contrast correction was applied with a 30–70% probability, while sharpness enhancement (sharpness factor: 1.5–2.5) was introduced with a 20–40% probability. These random values were generated from uniform distributions. The implementation was carried out using the Pillow [45] and TorchVision [46] libraries.
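This augmentation can be approximated with TorchVision transforms as sketched below; ColorJitter draws factors uniformly around 1, so the lower 10% bounds of the reported ranges are only approximated, and the per-image application probabilities are fixed to representative values rather than drawn from the reported intervals.

```python
from PIL import Image
from torchvision import transforms

# Approximate reimplementation of the described augmentation (ranges and
# probabilities simplified to fixed representative values).
augment = transforms.Compose([
    transforms.ColorJitter(brightness=0.4, contrast=0.4,
                           saturation=0.4, hue=0.15),       # color perturbations
    transforms.RandomAutocontrast(p=0.5),                   # reported 30-70% chance
    transforms.RandomAdjustSharpness(sharpness_factor=2.0,  # reported factor 1.5-2.5
                                     p=0.3),                # reported 20-40% chance
])

image = Image.open("stomata_sample.png").convert("RGB")     # hypothetical file
augmented_images = [augment(image) for _ in range(3)]       # three new images each
```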
The classification was performed on the augmented datasets, Dataset1 and Dataset2 (yielding 12,000 images in both sets combined), and on classes 0 and 2 extracted from these two datasets. The classification performance was evaluated on the raw datasets (not the augmented ones). To verify whether augmentation increased classification accuracy, let us first compare the results from Table 1 with Table 3. In the case of vit-base-patch16-224-in21k, augmentation increased accuracy results for the model trained on Dataset1 by 2 percentage points (pp) and on Dataset2 by 8 pp. In the case of vit-large-patch16-224-in21k, the accuracy decreased by 3 pp for the model trained on augmented Dataset1. The use of augmentation had a mostly positive impact on the classification of class 1 (the most problematic class to classify) when employing Vision Transformers, as evidenced by the increase in both precision and recall. Before augmentation, class 1 showed particularly poor performance, with recall often at 0%, indicating that the model completely failed to detect these instances in many cases. For ResNet152v2, training on augmented Dataset1 decreased classification accuracy by 8 pp, while for Dataset2 ACC increased by 5 pp.
Comparing the results obtained in Table 2 with Table 3, one can notice that augmentation did not increase accuracy in most scenarios for Vision Transformers when two classes were classified. The decrease in ACC was between 1 and 8 pp, and the only increase (by 6 pp) was observed for vit-large-patch16-224-in21k trained on Dataset2. In the case of the Residual Neural Network, the accuracy decreased by 1 pp for the model trained on Dataset1 and increased by 5 pp for the model trained on Dataset2.
Although augmentation increased accuracy results for many models, these models did not outperform the best models trained without augmentation. For three classes, the highest ACC was 0.68 for ResNet152v2 trained on Dataset1, and for two classes it was 0.88, achieved by vit-base-patch16-224-in21k trained on Dataset2 without augmentation and by vit-large-patch16-224-in21k trained on Dataset2 with augmentation applied.

4.4.3. Gradient-Weighted Class Activation Mapping

To visualize the areas of the images which were the most important in the classification process, the Gradient-weighted Class Activation Mapping method [47] was employed on both ResNets and Vision Transformers. In the Grad-CAMs for both ResNet and ViT, the gradient weights were computed with global average pooling. The resulting activation map was normalized with min-max scaling and then resampled to the original image size with bicubic interpolation. The heatmap was then overlaid on the image with an opacity of 0.5. The results were tested on many images, models, and their layers. Figure 2 presents the Grad-CAM activation maps of the two models with the highest accuracy results for ViT and for ResNet in the classification of two classes. The first considered model was ResNet152v2 trained on Dataset2 classes 0 and 2 with augmented data, which obtained 0.85 ACC when tested on Dataset1. The second model was vit-base-patch16-224-in21k trained on Dataset2 classes 0 and 2 with only raw data, which yielded 0.88 ACC when tested on Dataset1. In the case of ResNet, the Grad-CAM heatmap presents results obtained on the conv5_block1_1_conv layer. In the case of ViT, Grad-CAM visualizes the last dense layer in the final encoder block. Figure 2 presents the evaluation of these models on two images: an image from the class 0 testing set of Dataset2 and an image from the class 2 testing set of Dataset1. In the case of ViT, the heatmaps consistently highlight stomata cells, as indicated by warm colors (red and yellow) in the relevant regions. In contrast, ResNet’s heatmaps exhibit diffuse attention, focusing on irregular and less interpretable areas of the image that do not correlate with stomata positions. As deeper layers of ResNet were tested, the coverage of red areas expanded, further reducing localization accuracy.
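The described Grad-CAM computation (gradient weights obtained by global average pooling, min-max scaling, bicubic upsampling, and a 0.5-opacity overlay) can be sketched for a Keras model as follows; the sketch assumes a functional model in which the target convolutional layer is directly accessible, which is a simplification of the sequential model used here.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

def grad_cam(model, image, layer_name="conv5_block1_1_conv", class_idx=None):
    """Grad-CAM as described: GAP of gradients, min-max scaling, bicubic resize."""
    grad_model = keras.Model(model.inputs,
                             [model.get_layer(layer_name).output, model.output])
    x = tf.convert_to_tensor(image[np.newaxis], dtype=tf.float32)
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(x)
        if class_idx is None:
            class_idx = int(tf.argmax(preds[0]))
        score = preds[:, class_idx]
    grads = tape.gradient(score, conv_out)
    weights = tf.reduce_mean(grads, axis=(1, 2))             # global average pooling
    cam = tf.reduce_sum(conv_out * weights[:, None, None, :], axis=-1)[0]
    cam = tf.nn.relu(cam)
    cam = (cam - tf.reduce_min(cam)) / (tf.reduce_max(cam) - tf.reduce_min(cam) + 1e-8)
    cam = tf.image.resize(cam[..., None], image.shape[:2], method="bicubic")
    return cam.numpy().squeeze()

# heatmap = grad_cam(resnet_model, test_image)   # hypothetical model and image
# overlay the normalized heatmap on the image with 0.5 opacity (e.g., via matplotlib)
```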

5. Conclusions

In this work, a classification of three ploidy levels (diploidy, triploidy, and tetraploidy) was performed on microscopic images of blackcurrant. All the models were trained on one dataset and tested on an entirely independent dataset collected under different conditions and containing a different set of blackcurrant genotypes. The ViT and ResNet models could achieve exceptional classification performance on three classes when one dataset was split into training, validation, and testing sets, and the model learned on the training set and was tested on the testing set. These subsets, even though they contained different images, included blackcurrant images of specific cultivars and were taken under identical conditions. The models obtained accuracy above 0.97, confirming that distinguishing between ploidy levels within homogeneous data presents no significant challenge.
The primary scientific contribution of this work lies in evaluating model generalizability across distinct datasets. In our previous research on ploidy-level classification with AI [14], training and testing were performed on distinct subsets both derived from a single dataset, whose samples were prepared under the same conditions. This research focuses on training models and testing them on distinct datasets which include different sets of blackcurrant cultivars and were prepared under varying conditions. In the previous work [14], the stomata instances were manually labeled, and the process was then continued with the YOLO algorithm; the images with extracted stomata instances served as input to the classification methods. In this work, the classification is reduced to a single-step approach: the tested Residual Neural Networks and Vision Transformers take whole images as input. The results reveal that these models can successfully distinguish blackcurrant ploidy levels, reaching up to 0.68 ACC on three classes with ResNet152v2 and 0.88 with vit-base-patch16-224-in21k in the binary classification of the most distinct ploidy levels. In three-class classification, the maximal ACC obtained with a ViT was 0.60, achieved by vit-base-patch16-224-in21k with augmentation applied. The augmentation approach increased accuracy in most cases when three classes were classified. This technique increased the ability of the model to classify the most challenging class 1 (triploidy). This class is the most problematic to distinguish because stomatal size and density change gradually with ploidy level, and class 1 (triploid) lies between class 0 (diploid) and class 2 (tetraploid). Comparing the results obtained in Table 1, where no augmentation was used, with Table 3, where augmentation was applied, one can notice an increase in the values of precision and recall for class 1 when the model is trained on Dataset2: their values are no longer equal to 0 when the augmentation technique is employed.
Through Grad-CAM visualization, we uncovered fundamental differences in decision-making processes of the models. ViTs demonstrated biologically interpretable behavior, consistently focusing on stomata structures. In contrast, ResNets exhibited progressively broader attention patterns in deeper layers, incorporating more global image features in their classifications.

6. Discussion

These findings demonstrate that deep learning can effectively differentiate ploidy levels in blackcurrant plants, though several opportunities for improvement remain. In further research, we suggest the development of more diverse, multi-species datasets to enhance generalizability. A deeper analysis of Grad-CAM results obtained with different classification methods and various parameter settings could be presented, together with a metric-based evaluation of the resulting maps. Other augmentation techniques should also be employed, including geometric modifications such as rotation or flipping.
It is also worth considering different Convolutional Neural Networks (e.g., EfficientNet [48], ConvNeXt [49]) and different Transformer architectures (e.g., Swin Transformer [50]). A possible direction would be to evaluate hybrid approaches (e.g., Mamba [51]), which leverage both the detailed feature extraction of CNNs and the broader contextual understanding of self-attention mechanisms from Transformers.
This analysis shows that ViTs are a promising direction in the analysis of stomata images. Transformers should also be applied to stomata microscopic images in other contexts, such as stomata detection and measurement. The methodologies and research directions proposed in this work contribute to several scientific domains, including the analysis of stomata behaviour under different environmental conditions, studies involving biofertilizer production, and cross-species stomata comparisons.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/app151910735/s1.

Author Contributions

Conceptualization, A.K. and A.M.-C.; methodology, A.K.; software, A.K.; validation, A.K. and R.K.; formal analysis, A.K.; investigation, A.K. and A.M.-C.; resources, A.M.-C. and A.M.; data curation, A.K.; writing-original draft preparation, A.K.; writing-review and editing, R.K., A.M.-C., and A.M.; visualization, A.K.; supervision, R.K.; project administration, A.K.; funding acquisition. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data and code supporting the findings of this study are available from the corresponding author upon request.

Acknowledgments

The authors are grateful to Stanisław Pluta and Małgorzata Podwyszyńska for providing access to the plant materials used in the experiments.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Casson, S.A.; Hetherington, A.M. Environmental regulation of stomatal development. Curr. Opin. Plant Biol. 2010, 13, 90–95. [Google Scholar] [CrossRef]
  2. Mtileni, M.; Venter, N.; Glennon, K. Ploidy differences affect leaf functional traits, but not water stress responses in a mountain endemic plant population. S. Afr. J. Bot. 2021, 138, 76–83. [Google Scholar] [CrossRef]
  3. Van Laere, K.; França, S.C.; Vansteenkiste, H.; Van Huylenbroeck, J.; Steppe, K.; Van Labeke, M.C. Influence of ploidy level on morphology, growth and drought susceptibility in Spathiphyllum wallisii. Acta Physiol. Plant. 2010, 33, 1149–1156. [Google Scholar] [CrossRef]
  4. Sai, N.; Bockman, J.P.; Chen, H.; Watson-Haigh, N.; Xu, B.; Feng, X.; Piechatzek, A.; Shen, C.; Gilliham, M. StomaAI: An efficient and user-friendly tool for measurement of stomatal pores and density using deep computer vision. New Phytol. 2023, 238, 904–915. [Google Scholar] [CrossRef]
  5. Wu, T.L.; Chen, P.Y.; Du, X.; Wu, H.; Ou, J.Y.; Zheng, P.X.; Wu, Y.L.; Wang, R.S.; Hsu, T.C.; Lin, C.Y.; et al. StomaVision: Stomatal trait analysis through deep learning. bioRxiv 2024. [Google Scholar] [CrossRef]
  6. Fetter, K.C.; Eberhardt, S.; Barclay, R.S.; Wing, S.; Keller, S.R. StomataCounter: A neural network for automatic stomata identification and counting. New Phytol. 2019, 223, 1671–1681. [Google Scholar] [CrossRef]
  7. Schneider, C.A.; Rasband, W.S.; Eliceiri, K.W. NIH Image to ImageJ: 25 years of image analysis. Nat. Methods 2012, 9, 671–675. [Google Scholar] [CrossRef]
  8. Jayakody, H.; Liu, S.; Whitty, M.; Petrie, P. Microscope image based fully automated stomata detection and pore measurement method for grapevines. Plant Methods 2017, 13, 94. [Google Scholar] [CrossRef]
  9. Andayani, U.; Sumantri, I.B.; Pahala, A.; Muchtar, M.A. The implementation of deep learning using Convolutional Neural Network to classify based on stomata microscopic image of curcuma herbal plants. IOP Conf. Ser. Mater. Sci. Eng. 2020, 851, 012035. [Google Scholar] [CrossRef]
  10. Casado-García, A.; del Canto, A.; Sanz-Saez, A.; Pérez-López, U.; Bilbao-Kareaga, A.; Fritschi, F.B.; Miranda-Apodaca, J.; Muñoz-Rueda, A.; Sillero-Martínez, A.; Yoldi-Achalandabaso, A.; et al. LabelStoma: A tool for stomata detection based on the YOLO algorithm. Comput. Electron. Agric. 2020, 178, 105751. [Google Scholar] [CrossRef]
  11. Sun, Z.; Song, Y.; Li, Q.; Cai, J.; Wang, X.; Zhou, Q.; Huang, M.; Jiang, D. An integrated method for tracking and monitoring stomata dynamics from microscope videos. Plant Phenomics 2021, 2021, 9835961. [Google Scholar] [CrossRef]
  12. Wacker, T.S.; Smith, A.G.; Jensen, S.M.; Pflüger, T.; Hertz, V.G.; Rosenqvist, E.; Liu, F.; Dresbøll, D.B. Stomata morphology measurement with interactive machine learning: Accuracy, speed, and biological relevance? Plant Methods 2025, 21, 95. [Google Scholar] [CrossRef]
  13. Razzaq, A.; Shahid, S.; Akram, M.; Ashraf, M.; Iqbal, S.; Hussain, A.; Azam Zia, M.; Qadri, S.; Saher, N.; Shahzad, F.; et al. Stomatal state identification and classification in quinoa microscopic imprints through deep learning. Complexity 2021, 2021, 9938013. [Google Scholar] [CrossRef]
  14. Konopka, A.; Struniawski, K.; Kozera, R.; Ortenzi, L.; Marasek-Ciołakowska, A.; Machlańska, A. Deep Learning Classification of Blackcurrant Genotypes by Ploidy Levels on Stomata Microscopic Images. In Computational Science—ICCS 2025 Workshops; Springer Nature: Cham, Switzerland, 2025; pp. 135–148. [Google Scholar] [CrossRef]
  15. LeCun, Y.; Boser, B.; Denker, J.S.; Henderson, D.; Howard, R.E.; Hubbard, W.; Jackel, L.D. Backpropagation applied to handwritten zip code recognition. Neural Comput. 1989, 1, 541–551. [Google Scholar] [CrossRef]
  16. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. arXiv 2017, arXiv:1706.03762. [Google Scholar] [CrossRef]
  17. Li, X.; Guo, S.; Gong, L.; Lan, Y. An automatic plant leaf stoma detection method based on YOLOv5. IET Image Process. 2022, 17, 67–76. [Google Scholar] [CrossRef]
  18. Shorten, C.; Khoshgoftaar, T.M. A survey on image data augmentation for deep learning. J. Big Data 2019, 6, 60. [Google Scholar] [CrossRef]
  19. Podwyszyńska, M.; Pluta, S. In vitro tetraploid induction of the blackcurrant (Ribes nigrum L.) and preliminary phenotypic observations. Zemdirb. Agric. 2019, 106, 151–158. [Google Scholar] [CrossRef]
  20. Haykin, S. Neural Networks and Learning Machines, 3rd ed.; Pearson: Bloomington, MN, USA, 2008. [Google Scholar]
  21. Rosenblatt, F. The Perceptron—A Perceiving and Recognizing Automaton; Report 85-460-1; Cornell Aeronautical Laboratory: Buffalo, NY, USA, 1957. [Google Scholar]
  22. Konopka, A.; Struniawski, K.; Kozera, R. Performance analysis of Residual Neural Networks in soil bacteria microscopic image classification. In Proceedings of the 37th Annual European Simulation and Modelling Conference, ESM 2023, Toulouse, France, 24–26 October 2023; pp. 144–148. [Google Scholar]
  23. Slavutskii, L.; Lazareva, N.; Portnov, M.; Slavutskaya, E. Neural net without deep learning: Signal approximation by multilayer perceptron. In Proceedings of the 2nd International Conference on Computer Applications for Management and Sustainable Development of Production and Industry (CMSD-II-2022), SPIE, Dushanbe, Tajikistan, 21–23 December 2022; p. 11. [Google Scholar] [CrossRef]
  24. Wang, L.; Meng, Z. Multichannel two-dimensional Convolutional Neural Network based on interactive features and group strategy for chinese sentiment analysis. Sensors 2022, 22, 714. [Google Scholar] [CrossRef]
  25. Konopka, A.; Kozera, R.; Sas-Paszt, L.; Trzcinski, P.; Lisek, A. Identification of the selected soil bacteria genera based on their geometric and dispersion features. PLoS ONE 2023, 18, e0293362. [Google Scholar] [CrossRef]
  26. Konopka, A.; Kozera, R.; Sas-Paszt, L.; Trzciński, P. Automated imaging and machine learning for soil bacteria classification: Challenges and insights. Eng. Appl. Artif. Intell. 2025, 159, 111369. [Google Scholar] [CrossRef]
  27. Hsu, C.Y.; Tseng, C.C.; Lee, S.L.; Xiao, B.Y. Image classification using Convolutional Neural Networks with different convolution operations. In Proceedings of the 2020 IEEE International Conference on Consumer Electronics—Taiwan (ICCE-Taiwan), Taoyuan, Taiwan, 28–30 September 2020; pp. 1–2. [Google Scholar] [CrossRef]
  28. LeCun, Y.; Jackel, L.D.; Bottou, L.; Cortes, C.; Denker, J.S.; Drucker, H.; Guyon, I.; Muller, U.A.; Sackinger, E.; Simard, P.; et al. Learning algorithms for classification: A comparison on handwritten digit recognition. In Neural Networks; World Scientific: Pohang, Republic of Korea, 1995; pp. 261–276. [Google Scholar] [CrossRef]
  29. Vang-Mata, R. Multilayer Perceptrons: Theory and Applications; Nova Science Publishers, Inc.: Hauppauge, NY, USA, 2020. [Google Scholar]
  30. Zhao, L.; Zhang, Z. A improved pooling method for Convolutional Neural Networks. Sci. Rep. 2024, 14, 1589. [Google Scholar] [CrossRef]
  31. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar] [CrossRef]
  32. Borawar, L.; Kaur, R. ResNet: Solving Vanishing Gradient in Deep Networks. In Proceedings of International Conference on Recent Trends in Computing; Springer Nature: Singapore, 2023; pp. 235–247. [Google Scholar] [CrossRef]
  33. Philipp, G.; Song, D.; Carbonell, J.G. Gradients explode—Deep networks are shallow—ResNet explained. In Proceedings of the 6th International Conference on Learning Representations, ICLR 2018—Workshop Track Proceedings, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
  34. He, K.; Zhang, X.; Ren, S.; Sun, J. Identity mappings in Deep Residual Networks. arXiv 2016, arXiv:1603.05027. [Google Scholar] [CrossRef]
  35. Varshney, M.; Singh, P. Optimizing nonlinear activation function for convolutional neural networks. Signal Image Video Process 2021, 15, 1323–1330. [Google Scholar] [CrossRef]
  36. Azar, G.A.; Emami, M.; Fletcher, A.; Rangan, S. Learning embedding representations in high dimensions. In Proceedings of the 2024 58th Annual Conference on Information Sciences and Systems (CISS), Princeton, NJ, USA, 13–15 March 2024; pp. 1–6. [Google Scholar] [CrossRef]
  37. Zheng, C.; Gao, Y.; Shi, H.; Huang, M.; Li, J.; Xiong, J.; Ren, X.; Ng, M.; Jiang, X.; Li, Z.; et al. DAPE: Data-Adaptive Positional Encoding for length extrapolation. Adv. Neural Inf. Process. Syst. 2024, 37, 26659–26700. [Google Scholar]
  38. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar] [CrossRef]
  39. Shah, S.M.A.H.; Khan, M.Q.; Ghadi, Y.Y.; Jan, S.U.; Mzoughi, O.; Hamdi, M. A hybrid neuro-fuzzy approach for heterogeneous patch encoding in ViTs using contrastive embeddings and deep knowledge dispersion. IEEE Access 2023, 11, 83171–83186. [Google Scholar] [CrossRef]
  40. Keras. 2015. Available online: https://keras.io (accessed on 30 September 2025).
  41. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar] [CrossRef]
  42. Wolf, T.; Debut, L.; Sanh, V.; Chaumond, J.; Delangue, C.; Moi, A.; Cistac, P.; Rault, T.; Louf, R.; Funtowicz, M.; et al. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Online, 16–20 November 2020; pp. 38–45. [Google Scholar]
  43. Loshchilov, I.; Hutter, F. Decoupled weight decay regularization. arXiv 2017, arXiv:1711.05101. [Google Scholar] [CrossRef]
  44. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32; Curran Associates, Inc.: Red Hook, NY, USA, 2019; pp. 8024–8035. [Google Scholar]
  45. Clark, A. Pillow (PIL Fork) Documentation. 2015. Available online: https://buildmedia.readthedocs.org/media/pdf/pillow/latest/pillow.pdf (accessed on 30 September 2025).
  46. Marcel, S.; Rodriguez, Y. Torchvision the machine-vision package of torch. In Proceedings of the 18th ACM International Conference on Multimedia, Firenze, Italy, 25–29 October 2010; MM ’10. ACM: New York, NY, USA; pp. 1485–1488. [Google Scholar] [CrossRef]
  47. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. arXiv 2016, arXiv:1610.02391. [Google Scholar] [CrossRef]
  48. Tan, M.; Le, Q.V. EfficientNet: Rethinking model scaling for Convolutional Neural Networks. arXiv 2019, arXiv:1905.11946. [Google Scholar] [CrossRef]
  49. Liu, Z.; Mao, H.; Wu, C.Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. arXiv 2022, arXiv:2201.03545. [Google Scholar] [CrossRef]
  50. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer using shifted windows. arXiv 2021, arXiv:2103.14030. [Google Scholar] [CrossRef]
  51. Zhu, L.; Liao, B.; Zhang, Q.; Wang, X.; Liu, W.; Wang, X. Vision Mamba: Efficient visual representation learning with bidirectional state space model. arXiv 2024, arXiv:2401.09417. [Google Scholar] [CrossRef]
Figure 1. Microscopic images of blackcurrant (Ribes nigrum L.) leaves showing stomata cells across different ploidy levels: (a–c) diploid (class 0), triploid (class 1), and tetraploid (class 2) samples from Dataset1; (d–f) corresponding ploidy classes from Dataset2. All images were captured using a VHX-7000N KEYENCE digital microscope.
Figure 2. Grad-CAM activation maps comparing ViT (vit-base-patch16-224-in21k) trained on Dataset2 raw images with the ResNet152v2 model trained on Dataset2 with augmentation applied, for ploidy level classification (class 0: diploid, class 2: tetraploid), tested on selected images.
Table 1. Comparative performance analysis of Vision Transformer and ResNet architectures on datasets Dataset1 (1) and Dataset2 (2) for all 3 classes. Models were evaluated on whole datasets and the values of the metrics—accuracy (acc), precision (p), recall (r), and F1-score (f1) are presented. vit-b: vit-base-patch16-224-in21k; vit-l: vit-large-patch16-224-in21k; RN: ResNet152v2. Training (Tr) and testing (Te) sets are indicated for each experiment.

Tr  Te  Model  acc   0_p   0_r   0_f1  1_p   1_r   1_f1  2_p   2_r   2_f1
1   2   vit-b  0.57  0.83  0.72  0.77  0.82  0.12  0.21  0.44  0.88  0.59
2   1   vit-b  0.52  0.49  0.99  0.66  1     0     0.01  0.57  0.56  0.57
1   2   vit-l  0.56  0.90  0.59  0.71  0.96  0.16  0.28  0.43  0.93  0.59
2   1   vit-l  0.59  0.72  0.96  0.82  1     0     0.01  0.48  0.79  0.60
1   2   RN     0.68  0.72  0.66  0.69  0.63  0.91  0.75  0.72  0.47  0.57
2   1   RN     0.54  0.91  0.64  0.75  0     0     0     0.42  0.97  0.59
Table 2. Comparative performance analysis of Vision Transformer and ResNet architectures on datasets Dataset1 (1_02) and Dataset2 (2_02) for two classes: 0—diploid and 2—tetraploid. Models were evaluated on whole datasets and the values of the metrics—accuracy (acc), precision (p), recall (r), and F1-score (f1) are presented. vit-b: vit-base-patch16-224-in21k; vit-l: vit-large-patch16-224-in21k; RN: ResNet152v2. Training (Tr) and testing (Te) sets are indicated for each experiment.

Tr    Te    Model  acc   0_p   0_r   0_f1  2_p   2_r   2_f1
1_02  2_02  vit-b  0.81  0.88  0.71  0.78  0.76  0.90  0.82
2_02  1_02  vit-b  0.88  0.84  0.93  0.88  0.92  0.82  0.87
1_02  2_02  vit-l  0.82  0.93  0.69  0.79  0.75  0.95  0.84
2_02  1_02  vit-l  0.82  0.74  0.99  0.85  0.98  0.66  0.79
1_02  2_02  RN     0.79  0.88  0.68  0.76  0.74  0.90  0.81
2_02  1_02  RN     0.80  0.94  0.63  0.76  0.72  0.96  0.83
Table 3. Comparative performance analysis of Vision Transformer and ResNet architectures on datasets Dataset1 (1) and Dataset2 (2) for all three classes and for two classes 0—diploid and 2—tetraploid (1_02 and 2_02, respectively). Models are trained on augmented training datasets (Tr) (1_a, 2_a, 1_a_02, 2_a_02) and tested on raw testing datasets (Te). Models were evaluated on whole datasets and the values of the metrics—accuracy (acc), precision (p), recall (r), and F1-score (f1) are presented. vit-b: vit-base-patch16-224-in21k; vit-l: vit-large-patch16-224-in21k; RN: ResNet152v2.

Tr      Te    Model  acc   0_p   0_r   0_f1  1_p   1_r   1_f1  2_p   2_r   2_f1
1_a     2     vit-b  0.59  0.91  0.71  0.80  0.91  0.14  0.25  0.45  0.92  0.60
2_a     1     vit-b  0.60  0.69  0.91  0.78  1     0.07  0.13  0.52  0.83  0.64
1_a_02  2_02  vit-b  0.73  0.93  0.49  0.64  -     -     -     0.65  0.97  0.78
2_a_02  1_02  vit-b  0.80  0.72  0.97  0.83  -     -     -     0.95  0.63  0.76
1_a     2     vit-l  0.53  0.94  0.44  0.60  0.90  0.19  0.31  0.42  0.97  0.58
2_a     1     vit-l  0.59  0.69  0.89  0.78  1     0.08  0.14  0.50  0.82  0.62
1_a_02  2_02  vit-l  0.78  0.91  0.62  0.73  -     -     -     0.71  0.94  0.81
2_a_02  1_02  vit-l  0.88  0.85  0.91  0.88  -     -     -     0.90  0.84  0.87
1_a     2     RN     0.60  0.52  0.90  0.66  0.68  0.44  0.54  0.73  0.47  0.57
2_a     1     RN     0.59  0.72  0.82  0.77  1     0.07  0.14  0.49  0.87  0.62
1_a_02  2_02  RN     0.78  0.91  0.63  0.74  -     -     -     0.72  0.94  0.81
2_a_02  1_02  RN     0.85  0.87  0.83  0.85  -     -     -     0.84  0.88  0.86
