Classification of Maize Images Enhanced with Slot Attention Mechanism in Deep Learning Architectures

Cömert, Zafer; Karadeniz, Alper Talha; Basaran, Erdal; Celik, Yuksel

doi:10.3390/electronics14132635

¹ Department of Software Engineering, Faculty of Engineering and Natural Sciences, Samsun University, Samsun 55420, Turkey

² Vocational School Department of Computer Technology, Agri Ibrahim Cecen University, Agri 04200, Turkey

³ Information Security and Digital Forensics (ISDF), University at Albany, State University of New York, Albany, NY 12222, USA

⁴ Computer Engineering, Faculty of Engineering and Architecture, Nisantasi University, Istanbul 34398, Turkey

* Author to whom correspondence should be addressed.

Electronics 2025, 14(13), 2635; https://doi.org/10.3390/electronics14132635

Submission received: 22 May 2025 / Revised: 19 June 2025 / Accepted: 26 June 2025 / Published: 30 June 2025

(This article belongs to the Special Issue Data-Related Challenges in Machine Learning: Theory and Application)

Abstract

Maize is a vital crop and a fundamental component of global food security. To support sustainable maize production, accurate classification of maize seeds, particularly distinguishing haploid from diploid types, is essential for enhancing breeding efficiency. Conventional methods that rely on manual inspection or simple machine learning are error-prone and unsuitable for large-scale data. To overcome these limitations, we propose Slot-Maize, a novel deep learning architecture that integrates Convolutional Neural Networks (CNN), Slot Attention, Gated Recurrent Units (GRU), and Long Short-Term Memory (LSTM) layers. The Slot Attention module improves feature representation by focusing on object-centric regions within seed images. The GRU captures short-term sequential patterns in the extracted features, while the LSTM models long-range dependencies. Furthermore, Grad-CAM is used as an explainable AI technique to make the model’s decisions interpretable. The Slot-Maize model was evaluated on two datasets, the Maize Seed Dataset and the Maize Variety Dataset, achieving accuracies of 96.97% and 92.30%, respectively, and outperforming existing methods in both cases. These results demonstrate the model’s robustness and generalizability, as well as its potential to accelerate automated maize breeding workflows. By combining accuracy with explainability, Slot-Maize provides a robust, interpretable tool for automated maize seed classification and precision agriculture.

Keywords:

maize seed classification; deep learning; machine learning; Grad-CAM

1. Introduction

Maize (Zea mays) is a globally significant cereal crop that occupies a pivotal position in both the food and industrial sectors [1]. As a principal source of carbohydrates, maize is a major component of the human diet, particularly in developing countries. Owing to its high starch content, it is a key raw material for flour and other starch-based products, as well as sweeteners and biofuels [2]. Maize is also widely used as animal feed and therefore indirectly influences global meat and dairy production. In industry, it serves as a raw material for bioethanol, bioplastics, and a range of chemical products, and it is also used in the pharmaceutical and cosmetic industries. These diverse uses make maize an indispensable agricultural crop and a vital element of global economic and food security [3].

The haploid doubling technique is an efficient method for rapidly generating genetically pure lines in maize breeding. It involves doubling the chromosomes of naturally occurring or induced haploid plants, thereby producing homozygous diploid plants [4]. Haploid doubling accelerates the production of pure lines within breeding programs, reducing the time required for hybrid variety development and enhancing genetic uniformity. However, the technique requires a high level of expertise and sophisticated laboratory infrastructure. The viability rates of haploid plants are often low, and the diploidization process can be difficult, which may limit the effectiveness of this approach to some extent. Despite these challenges, haploid doubling is a preferred approach in maize breeding because of its demonstrated potential to accelerate genetic gains [5].

Figure 1 illustrates the distinctive morphology of maize seeds, highlighting the differences between haploid (a) and diploid (b) seeds. The haploid seed (Figure 1a) has a colorless embryo, while its endosperm expresses the R1-nj marker. This pattern arises because a haploid embryo carries only the maternal chromosome set and therefore lacks the R1-nj allele supplied by the pollen of the inducer line; haploid embryos also tend to be smaller and weaker. In contrast, the diploid seed (Figure 1b) displays R1-nj pigmentation in both the endosperm and the embryo, reflecting an embryo that carries the paternal genome and typically develops more robustly [6]. Although both seed types share fundamental structures such as the endosperm and seed coat, haploid seeds are generally smaller, less filled, and have lower germination rates. Diploid seeds, by contrast, are larger, fuller, and more viable, which is why they are more frequently preferred in agricultural production.

Haploid doubling is therefore a critical tool in maize breeding for the rapid development of genetically pure lines. New technologies and methods designed to improve its effectiveness have led to significant advances under both laboratory and field conditions. In particular, microspore culture techniques and doubled haploid technology allow haploid plants to be obtained quickly and improve overall efficiency. Furthermore, gene-editing tools such as CRISPR-Cas9 enable direct and precise modification of desired genetic traits in haploid plants [7]. Further improvements in haploid induction systems and the optimization of in vitro techniques have increased haploid doubling rates, reinforcing the applicability and genetic gains of this method in maize breeding programs [8].

The haploid doubling process nevertheless faces several significant challenges, including the low viability of haploid plants, the difficulty of diploidization, and the difficulty of identifying haploid plants phenotypically. Embryo and endosperm development in haploid plants is frequently inadequate, which reduces their growth potential and germination rates [9]. Furthermore, improper chromosome pairing during diploidization can result in genetic aberrations. To address these challenges, computer-assisted digital imaging systems are becoming increasingly important [10]. Such systems enable rapid and precise identification of the phenotypic characteristics of haploid and diploid plants, thereby accelerating the breeding process. In particular, integrating image processing algorithms with artificial intelligence is of critical importance for improving genetic uniformity and efficiency through the automated analysis of plant morphology [11].

2. Related Studies

Ayaz et al. proposed DeepMaizeNet, a hybrid deep learning model that integrates a Convolutional Block Attention Module (CBAM), hypercolumns, and residual blocks. The model achieved 94.13% accuracy in classifying haploid and diploid maize seeds, underscoring its potential for maize breeding [12]. Dönmez et al. developed an ensemble deep learning model combining five Convolutional Neural Network (CNN) architectures and achieved 90.96% accuracy in seed classification, demonstrating the effectiveness of ensemble learning for automating this process [13]. Rodrigues Ribeiro et al. used near-infrared (NIR) spectroscopy with partial least squares discriminant analysis (PLS-DA), achieving 100% accuracy in classifying maize seeds and showing the reliability of NIR for identifying haploid seeds in breeding programs [14]. Dönmez et al. combined EfficientNetV2B0 with ResMLP, achieving 96.33% accuracy while balancing performance and computational efficiency [15]. Güneş and Dönmez implemented an interactive model using batch-mode active learning and a Support Vector Machine (SVM), reducing labeling costs by 66% and thereby minimizing the time and cost of seed classification [16]. He et al. employed near-infrared hyperspectral imaging (NIR-HSI) in conjunction with multivariate methods, attaining 90.31% accuracy and improving the robustness of automated seed classification [17]. Dönmez used CNNs for deep feature extraction and Minimum Redundancy Maximum Relevance (MRMR) for feature selection, achieving 96.74% accuracy and highlighting the effectiveness of deep feature selection strategies [18]. Ge et al. used nuclear magnetic resonance (NMR) spectra with multi-manifold learning, achieving 98.33% accuracy in haploid kernel recognition and demonstrating the utility of NMR in maize breeding [19].

Dönmez used AlexNet for deep feature extraction, achieving 89.5% accuracy in maize seed classification and highlighting the value of deep learning in breeding programs [20]. Altuntaş et al. applied transfer learning with CNNs, with VGG-19 reaching 94.22% accuracy in seed classification, confirming the effectiveness of CNNs for automation [6]. Liao et al. combined hyperspectral imaging with transfer learning, achieving 96.32% accuracy in identifying haploid seeds and showing the potential of hyperspectral data for this task [21]. Altuntaş et al. adopted a texture-based approach using the Gray-Level Co-Occurrence Matrix (GLCM) and decision trees, attaining 84.48% accuracy, which indicates that texture features can support seed classification for breeding purposes [22].

2.1. Motivation and Contributions

The objective of this study is to improve the precision and efficiency of maize seed classification, with a particular focus on distinguishing haploid from diploid seeds. Accurate identification is crucial for accelerating genetic gains and optimizing hybrid variety development in maize breeding. Traditional approaches, which rely on manual inspection or basic machine learning, are prone to errors and do not scale to large datasets. Existing deep learning models are effective, but they often lack the interpretability needed in agricultural applications, where understanding model decisions is essential. The key contributions of this study are as follows:

(1): Adoption of a biologically focused benchmark dataset (Rovile): Although the Rovile Maize Seed Dataset is relatively small, it is one of the few publicly available datasets designed to capture biologically meaningful regions such as the embryo and endosperm. These regions are critical for haploid–diploid differentiation, which strengthens the biological relevance of the task. Using Grad-CAM, we highlight these regions, offering interpretable evidence that our model focuses on anatomically significant structures.
(2): The development of an advanced deep learning model that integrates CNN, Slot Attention, Gated Recurrent Unit (GRU), and Long Short-Term Memory (LSTM) layers, specifically designed to enhance the accuracy and efficiency of maize seed classification.
(3): A comprehensive ablation study that systematically evaluates the contribution of each model component (Slot Attention, GRU, LSTM) to overall performance, providing insight into the effectiveness of these layers.
(4): Validation on both small-scale and large-scale datasets: To evaluate its generalization capability and robustness, we tested our model not only on the Rovile dataset but also on the newly published Maize Variety Dataset (2024), which includes over 17,000 seed images from three different varieties. This dual-dataset evaluation validates the model’s scalability across simple and complex classification tasks.
(5): The integration of Grad-CAM as an explainable artificial intelligence (XAI) technique, which makes the model’s decision-making process transparent and interpretable, a crucial requirement for real-world agricultural applications.

2.2. Organization

The remainder of this study is organized as follows. Section 3 presents the datasets and methods, together with a detailed account of the proposed model architecture, which incorporates CNN, slot attention, LSTM, and GRU layers. Section 4 describes the experimental setup and reports the results. Section 5 analyzes and evaluates these findings, and Section 6 presents the conclusions.

3. Material and Methods

This section describes the datasets used and the preprocessing procedures. It then explains the model architecture in detail, including the structure of each layer, the input and output dimensions, the activation functions, and the optimization techniques employed. Finally, it outlines the training and validation procedures used to develop the model and evaluate its performance.

The proposed model (Figure 2) begins with a CNN consisting of convolutional and max-pooling layers that extract features from maize seed images. These features are flattened and passed to a slot attention module, which refines the feature representation through iterative attention. The output slots are then processed by a GRU layer to capture sequential dependencies. The resulting features pass through fully connected layers with dropout and a final SoftMax layer for classification. Finally, XAI techniques are used to visualize the image regions that most influence the predictions.
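To make the data flow concrete, the sketch below traces tensor shapes through a CNN, slot attention, GRU, and classifier pipeline. Every size in it (a 224 × 224 RGB input, three "same"-padded convolution stages each followed by 2 × 2 max pooling, 4 slots of dimension 64, and 64 GRU units) is a hypothetical placeholder, not the paper's configuration, and the helper `trace_shapes` is illustrative only. It also assumes that "flattened" means reshaping the H × W grid into a set of H·W feature vectors, which is the input format slot attention expects.

```python
def trace_shapes(img_size=224, channels=3, conv_filters=(32, 64, 128),
                 num_slots=4, slot_dim=64, gru_units=64, num_classes=2):
    """Trace shapes through CNN -> slot attention -> GRU -> dense/softmax.

    All sizes are hypothetical placeholders. Each CNN stage is assumed to be a
    'same'-padded convolution followed by 2x2 max pooling with stride 2.
    """
    h = w = img_size
    c = channels
    shapes = [("input", (h, w, c))]
    for f in conv_filters:
        c = f                    # 'same' convolution keeps H, W and sets channels
        h, w = h // 2, w // 2    # 2x2 max pooling halves each spatial dimension
        shapes.append((f"conv{f}+pool", (h, w, c)))
    shapes.append(("flatten to set", (h * w, c)))              # N = H*W vectors of size C
    shapes.append(("slot attention", (num_slots, slot_dim)))  # K slots of dimension D
    shapes.append(("GRU over slots", (gru_units,)))            # final hidden state
    shapes.append(("dense + softmax", (num_classes,)))         # class probabilities
    return shapes


for name, shape in trace_shapes():
    print(f"{name:16s} {shape}")
```

With these placeholder sizes the convolutional backbone ends at a 28 × 28 × 128 map, i.e. a set of 784 feature vectors handed to slot attention; setting `num_classes=3` gives the output size for the three-variety dataset.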

3.1. Datasets

As summarized in Table 1, the Haploid and Diploid Maize Seeds Dataset (Rovile) comprises 3000 high-resolution images for maize seed classification, with a moderately imbalanced distribution of 1230 haploid (41%) and 1770 diploid (59%) seeds. The dataset was designed for researchers in maize breeding and supports studies on doubled haploid technology and the automation of seed classification using machine learning. Because the images were captured under controlled conditions, they facilitate the development of accurate models, making the dataset a valuable resource for agronomy research and for optimizing breeding programs through precise seed identification [6].
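The class proportions follow directly from the counts above. The short sketch below (hypothetical helper `class_shares`) computes them together with the accuracy of a trivial classifier that always predicts the majority class. Assuming an evaluation split that preserves these proportions, such a classifier would score about 0.59, which is a useful floor when reading the accuracies reported in Section 4.

```python
def class_shares(counts):
    """Return each class's share of the data and the majority-class accuracy."""
    total = sum(counts.values())
    shares = {name: n / total for name, n in counts.items()}
    return shares, max(shares.values())


# Class counts reported for the Rovile dataset.
shares, majority_acc = class_shares({"haploid": 1230, "diploid": 1770})
print(shares, majority_acc)  # haploid ~0.41, diploid ~0.59, majority ~0.59
```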

The Maize Variety Dataset for Classification presented in Table 2 comprises 17,724 color images of three maize seed varieties—Wang Dataa, Sanzal Sima, and Bihilifa—captured with a 12-megapixel camera in Ghana. The dataset is designed for deep learning-based image classification tasks, particularly in precision agriculture. The images have been resized to 224 × 224 pixels and categorized into subfolders for each variety. The objective of the dataset is to facilitate efficient and cost-effective maize classification, thereby reducing the necessity of human involvement in seed grading for marketing and production purposes [23].
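Because the Maize Variety images are stored in one subfolder per variety, class labels can be derived directly from the folder names. The stdlib-only sketch below uses a hypothetical helper, `index_by_subfolder`, and an assumed set of image file extensions; it pairs each image path with an integer label assigned in sorted folder order, so the label mapping is reproducible across runs.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed image file types


def index_by_subfolder(root):
    """Map every image under root/<class_name>/ to an integer class label.

    Labels follow the sorted order of subfolder names; non-image files are skipped.
    """
    root = Path(root)
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    label_of = {name: i for i, name in enumerate(class_names)}
    samples = [
        (path, label_of[name])
        for name in class_names
        for path in sorted((root / name).iterdir())
        if path.suffix.lower() in IMAGE_SUFFIXES
    ]
    return samples, class_names
```

For folders named after the three varieties, sorted order would give Bihilifa → 0, Sanzal Sima → 1, and Wang Dataa → 2.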

3.2. Convolution Layers

In a deep CNN, a convolution layer applies a set of filters (kernels) to the input data, producing feature maps that capture spatial patterns and hierarchies. Each filter is slid across the input, and at each position the dot product between the filter and the corresponding input patch is computed. Mathematically, the convolution is expressed as follows:

(f * g)(i, j) = \sum_{m} \sum_{n} f(m, n) \, g(i - m, j - n)

(1)

In this context, $f(m, n)$ represents the input, while $g(i - m, j - n)$ represents the convolutional filter, which is applied systematically across the input to generate the feature map.
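Equation (1) can be checked numerically. The NumPy sketch below (hypothetical helper names) implements the double sum literally as a "full" convolution, alongside the unflipped sliding dot product that deep learning frameworks typically compute in convolution layers (cross-correlation). The two differ only by a 180° flip of the filter, which makes no practical difference in a CNN because the filter weights are learned.

```python
import numpy as np


def conv2d_full(f, g):
    """Literal Eq. (1): (f * g)(i, j) = sum_m sum_n f(m, n) g(i - m, j - n).

    f is the input (H, W) and g the filter (kH, kW); terms whose filter index
    falls outside g are zero, giving an output of shape (H + kH - 1, W + kW - 1).
    """
    H, W = f.shape
    kH, kW = g.shape
    out = np.zeros((H + kH - 1, W + kW - 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for m in range(H):
                for n in range(W):
                    if 0 <= i - m < kH and 0 <= j - n < kW:
                        out[i, j] += f[m, n] * g[i - m, j - n]
    return out


def cross_correlate_valid(x, k):
    """Sliding dot product without flipping the filter ('valid' positions only)."""
    kH, kW = k.shape
    H, W = x.shape
    out = np.empty((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kH, j:j + kW] * k)  # filter . input patch
    return out


print(conv2d_full(np.array([[1.0, 2.0, 3.0]]), np.array([[1.0, 1.0]])))  # [[1. 3. 5. 3.]]
```

Cross-correlating with `k` equals the interior ("valid") part of the full convolution with the flipped filter `k[::-1, ::-1]`.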

3.3. Slot Attention Block

Slot attention is a mechanism that enhances the ability of neural networks, particularly CNN-based models, to process and interpret complex visual scenes by enabling structured, object-centric attention [24]. It is especially useful when understanding the individual components of a scene and their relationships is paramount, as in object detection, segmentation, and scene understanding [25]. Slot attention iteratively refines a set of latent variables called slots, each of which represents a different part or object in the input (e.g., a different object in an image). The slots are updated by an attention mechanism that selectively focuses on different parts of the input feature map [26]. Algorithm 1 summarizes the slot attention pipeline.

Algorithm 1 Slot attention pipeline
1	Input: $X \in \mathbb{R}^{H \times W \times C}$, $s_{dim}$, $n_{t}$; Output: slots $S$. Here, $H$ and $W$ are the height and width of the feature map, $C$ is the number of channels, $s_{dim}$ is the slot dimension, and $n_{t}$ is the total number of iterations.
2	$S_{0} = \{s_{0}^{(1)}, s_{0}^{(2)}, \dots, s_{0}^{(K)}\}$: initialize a set of $K$ slots, where $s_{0}^{(k)} \in \mathbb{R}^{D}$ is a learnable vector of dimension $D$ (the slot dimension $s_{dim}$).
3	$S_{0} = \mu + \exp(\log \sigma) \cdot \epsilon$: at the start, each slot is randomly initialized, where $\epsilon \sim \mathcal{N}(0, 1)$ is random noise.
4	for $t = 0$ to $n_{t} - 1$ do
5	    $a_{t}^{(k)} = \mathrm{softmax}(s_{t}^{(k)} \cdot X)$: compute the attention weights $a_{t}^{(k)}$ of the $k$-th slot at iteration $t$.
6	    $s_{t+1}^{(k)} = \mathrm{GRU}(s_{t}^{(k)}, \sum_{i,j} a_{t}^{(k)}(i, j) X_{ij})$: the GRU updates the slot's state from its current state $s_{t}^{(k)}$ and the attended feature vector $\sum_{i,j} a_{t}^{(k)}(i, j) X_{ij}$.
7	end for
8	return $S$: return the slots.

The slot attention mechanism allows each slot to iteratively focus on distinct regions of the input feature map, refining its representation and capturing unique aspects of the input data. The attention mechanism encourages each slot to attend to relevant spatial locations, and the GRU updates the slot’s state based on the attended features [24].
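For illustration, the following NumPy sketch runs the loop of Algorithm 1 on a random feature map. It follows the widely used formulation of Locatello et al. (2020), in which the softmax is taken over the slots (so slots compete for each input position) and the weights are then renormalized over positions to form a weighted mean; Algorithm 1's compact notation leaves the normalization axis implicit, so this choice is an assumption. Keys, values, and queries come from random projections standing in for learned weights, the helper names are hypothetical, and the layer normalization and residual MLP of the original formulation are omitted for brevity.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def gru_cell(h, x, p):
    """Batched GRU update in the form of Eqs. (2)-(5); h and x have shape (K, D)."""
    hx = np.concatenate([h, x], axis=-1)                 # (K, 2D)
    z = sigmoid(hx @ p["Wz"] + p["bz"])                  # update gate
    r = sigmoid(hx @ p["Wr"] + p["br"])                  # reset gate
    h_hat = np.tanh(np.concatenate([r * h, x], axis=-1) @ p["W"] + p["b"])
    return z * h + (1.0 - z) * h_hat


def slot_attention(X, num_slots=4, slot_dim=16, n_iters=3, seed=0):
    """Minimal slot attention over a feature map X of shape (H, W, C)."""
    rng = np.random.default_rng(seed)
    H, W, C = X.shape
    D = slot_dim
    inputs = X.reshape(H * W, C)                         # N = H*W input vectors
    # Random projections stand in for learned weights (hypothetical).
    Wk, Wv = rng.normal(0.0, C ** -0.5, (2, C, D))
    Wq = rng.normal(0.0, D ** -0.5, (D, D))
    gru = {n: rng.normal(0.0, (2 * D) ** -0.5, (2 * D, D)) for n in ("Wz", "Wr", "W")}
    gru.update({n: np.zeros(D) for n in ("bz", "br", "b")})
    mu, log_sigma = np.zeros(D), np.zeros(D)             # learnable in a real model
    slots = mu + np.exp(log_sigma) * rng.standard_normal((num_slots, D))
    k, v = inputs @ Wk, inputs @ Wv                      # (N, D) keys and values
    for _ in range(n_iters):                             # n_t refinement iterations
        q = slots @ Wq                                   # (K, D) queries
        logits = k @ q.T / np.sqrt(D)                    # (N, K) similarities
        logits -= logits.max(axis=1, keepdims=True)      # numerical stability
        attn = np.exp(logits)
        attn /= attn.sum(axis=1, keepdims=True)          # softmax over slots
        weights = attn / (attn.sum(axis=0, keepdims=True) + 1e-8)  # mean over inputs
        updates = weights.T @ v                          # (K, D) attended features
        slots = gru_cell(slots, updates, gru)            # GRU slot update
    return slots, attn
```

Calling `slot_attention` on a 7 × 7 × 32 feature map returns 4 slots of dimension 16 and a 49 × 4 attention matrix whose rows sum to one.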

3.4. Gated Recurrent Unit (GRU)

The GRU is a type of Recurrent Neural Network (RNN) architecture designed to capture dependencies in sequential data efficiently. It does so by mitigating the vanishing gradient problem that affects conventional RNNs [27]. The GRU simplifies the LSTM architecture by merging its forget and input gates into a single update gate, which reduces computational complexity while maintaining performance. The update gate determines how much past information is carried forward [28], balancing the preservation of the previous hidden state against the incorporation of new input.

z_{t} = \sigma(W_{z} [h_{t-1}, x_{t}] + b_{z})

(2)

In this context, $W_{z}$ is the weight matrix of the update gate, $b_{z}$ is its bias term, $h_{t-1}$ is the previous hidden state, $x_{t}$ is the current input, and $[h_{t-1}, x_{t}]$ denotes their concatenation. The reset gate determines how much past information is discarded, thereby controlling the influence of the previous hidden state on the current one [29].

r_{t} = \sigma(W_{r} [h_{t-1}, x_{t}] + b_{r})

(3)

where $\sigma$ is the sigmoid activation function, $W_{r}$ is the weight matrix of the reset gate, and $b_{r}$ is its bias term. The candidate hidden state, which is modulated by the reset gate, represents new information that may be added to the hidden state.

\hat{h}_{t} = \tanh(W [r_{t} \odot h_{t-1}, x_{t}] + b)

(4)

where $W$ is the weight matrix, $b$ is the bias term, $\odot$ denotes element-wise multiplication, and $\hat{h}_{t}$ is the candidate hidden state. The final hidden state $h_{t}$ is a linear interpolation between the previous hidden state $h_{t-1}$ and the candidate hidden state $\hat{h}_{t}$, controlled by the update gate [30].

h_{t} = z_{t} \odot h_{t-1} + (1 - z_{t}) \odot \hat{h}_{t}

(5)

The update mechanism enables the GRU to retain crucial information over extended sequences while discarding superfluous details.
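Equations (2)-(5) translate directly into code. The NumPy sketch below implements one GRU step and runs it over a short sequence, for example the slot vectors produced by slot attention treated as a sequence. The weights are random placeholders and the helper names are hypothetical; a trained model would learn these parameters.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def gru_step(h_prev, x_t, p):
    """One GRU step implementing Eqs. (2)-(5)."""
    hx = np.concatenate([h_prev, x_t])                                      # [h_{t-1}, x_t]
    z_t = sigmoid(p["W_z"] @ hx + p["b_z"])                                 # Eq. (2)
    r_t = sigmoid(p["W_r"] @ hx + p["b_r"])                                 # Eq. (3)
    h_hat = np.tanh(p["W"] @ np.concatenate([r_t * h_prev, x_t]) + p["b"])  # Eq. (4)
    return z_t * h_prev + (1.0 - z_t) * h_hat                               # Eq. (5)


def gru_over_sequence(xs, hidden, seed=0):
    """Run a GRU over xs of shape (T, input_dim); return the final hidden state.

    Weights are random placeholders (hypothetical), standing in for trained ones.
    """
    rng = np.random.default_rng(seed)
    in_dim = hidden + xs.shape[1]
    p = {k: rng.normal(0.0, in_dim ** -0.5, (hidden, in_dim)) for k in ("W_z", "W_r", "W")}
    p.update({k: np.zeros(hidden) for k in ("b_z", "b_r", "b")})
    h = np.zeros(hidden)
    for x_t in xs:
        h = gru_step(h, x_t, p)
    return h
```

Driving the update gate towards 1 (for instance, through a large bias $b_{z}$) makes $h_{t} \approx h_{t-1}$, which is how the GRU carries information across long sequences; driving it towards 0 replaces the state with the candidate $\hat{h}_{t}$.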

3.5. Explainable Artificial Intelligence (XAI)

Gradient-weighted Class Activation Mapping (Grad-CAM) is a technique that explains the decisions of a deep learning model, particularly in image classification [31]. It produces visual maps that highlight the regions of an input image that most strongly influence the model’s output [32]. The Grad-CAM heatmap $L_{\text{Grad-CAM}}^{c}$ for class $c$ is given by the following:

L_{\text{Grad-CAM}}^{c} = \mathrm{ReLU}\left(\sum_{k} \alpha_{k}^{c} A^{k}\right)

(6)

where $A^{k}$ denotes the $k$-th activation map of the chosen convolutional layer and $\alpha_{k}^{c}$ are weights obtained by global average pooling of the gradients of the class score $y^{c}$ with respect to that map, that is, $\alpha_{k}^{c} = \frac{1}{Z} \sum_{i} \sum_{j} \partial y^{c} / \partial A_{ij}^{k}$, where $Z$ is the number of spatial locations. The ReLU retains only the features that have a positive influence on the class of interest, which makes Grad-CAM particularly useful for interpreting machine learning models [33].
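Given the activation maps and their gradients, Eq. (6) reduces to a few array operations. In the NumPy sketch below, the inputs are assumed to come from a framework's automatic differentiation (for example, TensorFlow's `tf.GradientTape`) for the last convolutional layer; the hypothetical helper `grad_cam` shows only the pooling, weighting, and ReLU steps, and upsampling the map to the input resolution for overlay is omitted.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Combine activations A and gradients dy^c/dA into a Grad-CAM map (Eq. 6).

    activations, gradients: arrays of shape (H, W, K) for the chosen conv layer.
    Returns an (H, W) heatmap scaled to [0, 1].
    """
    alpha = gradients.mean(axis=(0, 1))          # global average pooling -> alpha_k^c, shape (K,)
    cam = np.maximum(activations @ alpha, 0.0)   # ReLU(sum_k alpha_k^c A^k), shape (H, W)
    peak = cam.max()
    return cam / peak if peak > 0 else cam       # normalize for display
```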

3.6. Performance Metrics and Model Evaluation

A confusion matrix is a key evaluation tool in machine learning that provides a detailed overview of a classification model’s performance. It displays the counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). This breakdown allows for a better understanding of the model’s accuracy, sensitivity (or recall), specificity, and precision. This enables the identification of the model’s strengths and weaknesses in predicting different classes [34].

Accuracy (Acc) is the ratio of correctly classified instances to the total number of instances. It is calculated as follows:

\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}

(7)

Sensitivity (Se, or recall) measures the model’s ability to correctly identify positive instances:

\mathrm{Se} = \frac{TP}{TP + FN}

(8)

Specificity (Sp) measures the model’s ability to correctly identify negative instances. It is calculated as follows:

\mathrm{Sp} = \frac{TN}{TN + FP}

(9)

The F1 score balances false positives and false negatives by taking the harmonic mean of precision (positive predictive value, $TP / (TP + FP)$) and sensitivity (recall), providing a single measure of how well the model classifies positive instances. It is defined as follows:

\mathrm{F1\ Score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}

(10)
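The sketch below (hypothetical helper `binary_metrics`) computes Eqs. (7)-(10) from confusion-matrix counts, with precision defined as TP / (TP + FP). The counts in the example are made up for illustration, and zero denominators are not handled. For a multi-class dataset such as the Maize Variety Dataset, the same counts would typically be computed per class in a one-vs-rest manner.

```python
def binary_metrics(tp, tn, fp, fn):
    """Compute Eqs. (7)-(10) from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)           # Eq. (7)
    se = tp / (tp + fn)                             # Eq. (8), sensitivity / recall
    sp = tn / (tn + fp)                             # Eq. (9), specificity
    precision = tp / (tp + fp)                      # positive predictive value
    f1 = 2 * precision * se / (precision + se)      # Eq. (10)
    return {"acc": acc, "se": se, "sp": sp, "precision": precision, "f1": f1}


m = binary_metrics(tp=90, tn=80, fp=20, fn=10)  # acc 0.85, se 0.90, sp 0.80, f1 ~0.857
```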

4. Results

Four models were developed through ablation experiments, progressively integrating Slot Attention, GRU, and LSTM to assess their impact on model performance. All models were trained with a batch size of 16 for up to 128 epochs. Data shuffling was enabled for training but disabled for validation. The Adam optimizer was used for weight updates, and categorical cross-entropy served as the loss function. Early stopping was applied with a patience of three epochs, using validation accuracy as the monitored metric. The optimal weights of the model were restored from the epoch that exhibited the minimum validation loss. Table 3 lists the hyperparameters and their values.
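Patience-based early stopping is why epoch counts can differ between models trained under identical settings. The pure-Python sketch below (hypothetical helper `early_stopping`, made-up accuracy values) simulates stopping on validation accuracy with a patience of three epochs and a cap of 128 epochs; it mirrors the general logic of framework callbacks but is not any library's exact implementation.

```python
def early_stopping(val_acc, patience=3, max_epochs=128):
    """Return (epochs_run, best_epoch) under patience-based early stopping.

    Training stops once `patience` consecutive epochs fail to improve on the
    best validation accuracy seen so far; epochs are 0-indexed in best_epoch.
    """
    best, best_epoch, wait = float("-inf"), -1, 0
    for epoch, acc in enumerate(val_acc[:max_epochs]):
        if acc > best:
            best, best_epoch, wait = acc, epoch, 0   # new best: reset the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch + 1, best_epoch         # stop after this epoch
    return min(len(val_acc), max_epochs), best_epoch


print(early_stopping([0.70, 0.78, 0.83, 0.82, 0.83, 0.81, 0.90]))  # (6, 2)
```

In the example, training stops after six epochs and never reaches the later 0.90, even though the rule itself is identical for every model.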

In the ablation study, four models with progressively integrated components were compared: a baseline CNN, a CNN augmented with Slot Attention, a CNN with Slot Attention and a GRU, and a CNN with Slot Attention and an LSTM. This progression isolates the impact of each additional component on model performance.

Figure 3 presents the validation accuracy and loss curves for four models. The CNN model exhibits the lowest and most fluctuating accuracy. Incorporating slot attention enhances accuracy, while GRU and LSTM further improve performance, with CNN + Slot + GRU achieving the highest and most stable accuracy. The loss curves mirror this trend, with the baseline CNN showing the highest and most erratic loss, whereas slot attention, combined with GRU and LSTM, progressively reduces loss, indicating improved learning efficiency and model robustness.

It is important to note that all models were trained under identical experimental conditions, including the same learning rate settings, batch size, optimizer (Adam), and early stopping criteria. The differences in the number of training epochs observed in Figure 3 arise solely from the early stopping mechanism, which was uniformly configured with a patience of three epochs and monitored on validation accuracy. This adaptive stopping strategy helps prevent overfitting and ensures efficient convergence. Therefore, the variations in epoch count do not reflect inconsistencies in the training protocol but instead indicate the individual convergence behaviors of each model. As such, the comparisons made between models remain methodologically fair and valid.

The confusion matrices, as shown in Figure 4, indicate progressive improvements in the classification accuracy across models. The baseline CNN shows the highest misclassification rates, particularly in distinguishing haploid seeds. Incorporating Slot Attention, GRU, and LSTM significantly reduces misclassifications, with the CNN + Slot + GRU and CNN + Slot + LSTM models achieving the most accurate and balanced classification outcomes.

Table 4 summarizes the performance metrics of the four evaluated models: CNN, CNN + Slot, CNN + Slot + GRU, and CNN + Slot + LSTM. The baseline CNN shows the lowest overall performance, with an accuracy of 0.8800, sensitivity of 0.9209, specificity of 0.8211, and an F1 score of 0.9006, indicating a moderate ability to distinguish between classes. Adding slot attention raises sensitivity to 0.9633 and the F1 score to 0.9204, although specificity remains relatively unchanged. The CNN + Slot + GRU model achieves the highest performance across all metrics, particularly in specificity (0.9434) and F1 score (0.9717), demonstrating its ability to classify both positive and negative instances accurately. The CNN + Slot + LSTM model also improves on the baseline, with an accuracy of 0.9250, sensitivity of 0.9322, and specificity of 0.9146. These findings indicate that combining slot attention with a GRU yields the largest improvement in model performance.

The Receiver Operating Characteristic (ROC) curves in Figure 5 show the incremental improvement in performance across the CNN architectures. The baseline CNN achieves a satisfactory but lower area under the curve (AUC = 0.95), and incorporating slot attention (AUC = 0.97) improves class discrimination. The models enhanced with GRU and LSTM both achieve AUC values close to 1.0, indicating near-perfect separation of haploid and diploid seeds. These results indicate that integrating Slot Attention with sequential layers such as GRU and LSTM markedly enhances the model’s capacity to differentiate between classes, improving the accuracy and reliability of classification.
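AUC has a simple probabilistic reading: it is the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one (the Mann-Whitney statistic). The quadratic-time sketch below (hypothetical helper `auc_from_scores`, illustrative scores) computes it directly from that definition, which also shows why an AUC near 1.0 reflects near-perfect ranking of scores rather than a particular accuracy at a fixed decision threshold.

```python
def auc_from_scores(pos_scores, neg_scores):
    """AUC as P(score of a random positive > score of a random negative).

    Ties between a positive and a negative score count as one half.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


print(auc_from_scores([0.9, 0.8, 0.4], [0.5, 0.3, 0.2]))  # 8 of 9 pairs ranked correctly
```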

To enhance the model’s robustness against variations commonly encountered in real-world agricultural imaging, additional data augmentation techniques were incorporated during training. The training set was enriched with random brightness adjustments (range: 0.9–1.1), slight zooming (±10%), and both horizontal and vertical flipping, simulating changes in lighting, scale, and orientation. The validation set remained unaugmented, apart from normalization, to ensure a fair and unbiased evaluation. A comparative assessment was performed using identical model architectures trained with and without augmentation. Notably, the CNN + Slot + GRU model achieved the highest specificity (0.9631) and an F1 Score of 0.9190 under augmentation, indicating superior generalization. While the baseline CNN model exhibited a slight drop in accuracy (from 0.8800 to 0.8700), its specificity improved (from 0.8211 to 0.8293), suggesting a reduced false-positive rate. The CNN + Slot + LSTM model also maintained strong performance, achieving 0.9200 accuracy and 0.9335 F1 Score with augmented data. However, data augmentation did not uniformly enhance all performance metrics; for example, the CNN + Slot model showed a decrease in sensitivity (from 0.9633 to 0.8955). These results suggest that while augmentation generally improves generalization and model robustness, its effectiveness is architecture-dependent. Models combining slot attention with sequential layers (GRU or LSTM) benefited the most, particularly in terms of specificity and balanced classification performance.
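The augmentations listed above can be sketched in NumPy as follows. The helper `augment` is hypothetical, and it assumes float images scaled to [0, 1], nearest-neighbour resampling for zoom, edge replication outside the original frame, and a 50% probability for each flip; the framework layers actually used for training may differ in these details. As described in the text, it would be applied to training images only.

```python
import numpy as np


def augment(img, rng, brightness=(0.9, 1.1), zoom=0.10):
    """Randomly perturb an (H, W, C) image with values in [0, 1].

    Brightness is scaled by a factor in `brightness`, the image is zoomed about
    its centre by up to +/-`zoom`, and horizontal/vertical flips each occur
    with probability 0.5.
    """
    H, W = img.shape[:2]
    out = img * rng.uniform(*brightness)                 # brightness scaling
    z = 1.0 + rng.uniform(-zoom, zoom)                   # zoom factor in [0.9, 1.1]
    rows = np.clip(np.round((np.arange(H) - (H - 1) / 2) / z + (H - 1) / 2), 0, H - 1).astype(int)
    cols = np.clip(np.round((np.arange(W) - (W - 1) / 2) / z + (W - 1) / 2), 0, W - 1).astype(int)
    out = out[rows][:, cols]                             # centre zoom, nearest neighbour
    if rng.random() < 0.5:
        out = out[:, ::-1]                               # horizontal flip
    if rng.random() < 0.5:
        out = out[::-1, :]                               # vertical flip
    return np.clip(out, 0.0, 1.0)


rng = np.random.default_rng(0)
batch = [augment(np.random.default_rng(i).random((8, 6, 3)), rng) for i in range(4)]
```

A zoom factor above 1 samples coordinates closer to the centre (zoom in), while a factor below 1 reaches past the border, where the edge pixels are repeated.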

To evaluate the generalization capacity and robustness of the models on a different data distribution, they were also tested on the larger Maize Variety Dataset. Table 5 summarizes the classification performance of the CNN-based models on this dataset. The CNN + Slot + GRU model outperforms the others, achieving the highest accuracy (0.9230), sensitivity (0.9220), specificity (0.9631), and F1 score (0.9190), with an AUC of 0.99. These results are consistent with the findings in Table 4, where the CNN + Slot + GRU model also performed best. The addition of Slot Attention and recurrent layers thus improves model robustness on more complex data distributions, reaffirming the contribution of these components to generalization and classification accuracy across datasets.

A further experiment evaluated the influence of data augmentation on model performance using the Maize Variety Dataset. Interestingly, while augmentation improved generalization on the Rovile Maize Seed Dataset, its effects were not uniformly beneficial on the more complex and diverse Maize Variety Dataset. As shown in Table 5, the performance of the baseline CNN declined markedly under augmentation, with accuracy dropping from 0.8205 to 0.7342 and the F1 score from 0.8095 to 0.7078. A similar trend was observed for the CNN + Slot model, whose metrics remained stagnant or slightly deteriorated. This suggests that simpler architectures may overfit to augmented features or struggle with the additional variance that augmentation introduces in more heterogeneous datasets. In contrast, deeper models with attention and recurrence were considerably less affected. For example, the CNN + Slot + GRU model, despite a decrease in specificity (from 0.9631 to 0.8902), maintained high accuracy (0.8983) and the highest F1 score (0.9130) among all augmented models. Likewise, the CNN + Slot + LSTM model sustained strong classification performance, achieving 0.8917 accuracy and a 0.9075 F1 score. These results indicate that while augmentation may introduce noise or complexity that is detrimental to simpler models, architectures that combine spatial (Slot Attention) and sequential (GRU/LSTM) mechanisms are better equipped to handle augmented variability, yielding more reliable and balanced predictions.

Figure 6a shows that the CNN + Slot + GRU model attains the highest accuracy on both datasets: 0.9667 on the Maize Seed Dataset and 0.9230 on the Maize Variety Dataset. The baseline CNN has the lowest accuracy, particularly on the Maize Variety Dataset (0.8205), which underscores the contribution of Slot Attention and the GRU. As shown in Figure 6b, the CNN + Slot + GRU model also achieves the highest sensitivity (0.9689 on Maize Seed and 0.9220 on Maize Variety), indicating that it is particularly effective at identifying true positives, whereas the baseline CNN has the lowest sensitivity, most clearly on the Maize Variety Dataset. In Figure 6c, the CNN + Slot + GRU model reaches a specificity of 0.9434 on Maize Seed and 0.9631 on Maize Variety, reflecting effective identification of true negatives; the baseline CNN shows comparatively lower specificity on the Maize Seed Dataset (0.8211), implying a tendency to produce more false positives. Finally, Figure 6d shows that the CNN + Slot + GRU model achieves the highest F1 scores, 0.9717 on Maize Seed and 0.9190 on Maize Variety, reflecting balanced performance across both datasets.

Figure 7 compares original maize seed images with their Grad-CAM heatmaps, which highlight the image regions that most influence the model’s classification decisions. The original images are shown on the left and the corresponding heatmap overlays on the right. Grad-CAM thus visualizes the regions that the model prioritizes when assigning each maize seed to a class.
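Grad-CAM [31] weights each feature map A^k of a convolutional layer by the spatial average of the gradient of the class score y^c with respect to that map, sums the weighted maps, and applies a ReLU. The NumPy sketch below implements only this final step and assumes that the activations and gradients for one image have already been extracted (for example, with a framework's automatic differentiation). The choice of layer, upsampling to the input resolution, and the color overlay used in Figure 7 are omitted, and the function name `grad_cam` is hypothetical.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM heatmap for one image from one convolutional layer.

    activations: feature maps A^k, shape (K, H, W).
    gradients:   d y^c / d A^k for the target class c, same shape.
    Returns an (H, W) map scaled to [0, 1].
    """
    # alpha_k^c: global-average-pool the gradients over the spatial positions
    weights = gradients.mean(axis=(1, 2))                              # (K,)
    # Weighted sum of the feature maps; ReLU keeps only positive class evidence
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)  # (H, W)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Toy check: one channel fires at the centre and all gradients are positive
acts = np.zeros((2, 3, 3))
acts[0, 1, 1] = 5.0
heatmap = grad_cam(acts, np.ones((2, 3, 3)))
```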

Notably, the highlighted regions frequently coincide with biologically meaningful morphological features. In diploid seeds, the model consistently focuses on the embryo and endosperm regions, where R1-nj pigmentation is distinctly expressed. For haploid seeds, in contrast, the attention maps emphasize the central embryonic zone, where the absence of pigmentation and a smaller embryo are characteristic. This suggests that the model attends to phenotypically relevant traits, such as the presence or absence of anthocyanin coloration and the development of the embryo, which are known indicators of ploidy status in maize. This biological grounding supports the interpretability of the model and indicates that its decisions are guided by agriculturally and genetically informative visual cues, which strengthens confidence in the model’s reliability for real-world maize breeding applications, especially in doubled haploid technology.

4.1. Ablation Study of Model Components

To further validate the Slot Attention mechanism used in the proposed model, we compared it with three alternative attention methods: Squeeze-and-Excitation (SE), the Convolutional Block Attention Module (CBAM), and Transformer-based attention. Table 6 summarizes the results on the Maize Variety and Rovile Datasets. On the Variety Dataset, the proposed CNN + Slot + GRU model outperforms all alternatives on every metric, reaching an accuracy of 92.30%, whereas the SE, CBAM, and Transformer-based models remain below 76%. On the Rovile Dataset, the alternatives are competitive: CBAM matches the accuracy of the proposed model (0.9667) and achieves higher specificity (0.9756 vs. 0.9434), and Transformer attention achieves higher sensitivity (0.9859 vs. 0.9689). However, none of them surpasses the F1 score of the proposed model (0.9717). These findings suggest that Slot Attention is better able to capture object-centric and spatially distinct features within seed images, an advantage that becomes especially evident on large-scale and visually heterogeneous datasets. The integration of the GRU further enhances the model’s temporal understanding and robustness. This comparative analysis justifies the selection of Slot Attention in this study and highlights its contribution to higher generalization and classification accuracy relative to other widely adopted attention schemes.
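To make the object-centric behavior more concrete, the NumPy sketch below runs a few iterations of Slot Attention as described in its original formulation (Locatello et al., 2020): each input feature is softly assigned to the slots through a softmax over the slot axis, each slot takes an attention-weighted mean of the inputs, and a GRU cell updates the slots. This is an untrained forward pass with random weights; it assumes that the inputs are flattened CNN feature vectors, omits the residual MLP and the learned slot initialization of the original formulation, and is not the Slot-Maize implementation. The `num_slots` and `dim` arguments correspond to the "Slot Count" and "Slot Dim" hyperparameters in Table 8 (3 to 5 slots, 64 or 128 dimensions).

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)


def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def slot_attention(inputs, num_slots=3, dim=64, iters=3, seed=0, eps=1e-8):
    """Untrained Slot Attention forward pass; inputs has shape (N, D_in).

    Returns (slots of shape (num_slots, dim), attention of shape (N, num_slots)).
    """
    rng = np.random.default_rng(seed)

    def init(fan_in, fan_out):  # random stand-ins for learned projection weights
        return rng.normal(0.0, fan_in ** -0.5, size=(fan_in, fan_out))

    d_in = inputs.shape[1]
    W_k, W_v, W_q = init(d_in, dim), init(d_in, dim), init(dim, dim)
    # Weights of a minimal GRU cell (biases omitted) that updates the slots
    W_z, U_z, W_r, U_r, W_h, U_h = (init(dim, dim) for _ in range(6))

    x = layer_norm(inputs)
    k, v = x @ W_k, x @ W_v                        # keys and values, (N, dim)
    slots = rng.normal(size=(num_slots, dim))      # random slot initialization
    attn = None
    for _ in range(iters):
        prev = slots
        q = layer_norm(slots) @ W_q                # queries, (S, dim)
        logits = k @ q.T / np.sqrt(dim)            # (N, S)
        attn = softmax(logits, axis=1)             # slots compete for each input
        weights = (attn + eps) / (attn + eps).sum(axis=0, keepdims=True)
        updates = weights.T @ v                    # weighted mean per slot, (S, dim)
        z = sigmoid(updates @ W_z + prev @ U_z)    # update gate
        r = sigmoid(updates @ W_r + prev @ U_r)    # reset gate
        h = np.tanh(updates @ W_h + (r * prev) @ U_h)
        slots = (1.0 - z) * prev + z * h
    return slots, attn


# e.g. a 7 x 7 CNN feature map with 32 channels, flattened to 49 vectors
feats = np.random.default_rng(1).normal(size=(49, 32))
slots, attn = slot_attention(feats, num_slots=3, dim=16)
```

Because the softmax is taken over slots rather than over inputs, each image location must divide its attention among the slots, which is what pushes different slots toward different regions.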

4.2. Benchmarking Against Modern Deep Learning Models

To position the Slot-Maize model fairly among current state-of-the-art architectures, we also evaluated InceptionV3, Vision Transformer, and ConvNeXt under the same training conditions. As shown in Table 7, all three architectures performed competitively, but the proposed model achieved the highest accuracy and F1 score on both datasets. ConvNeXt came closest, with F1 scores of 0.9181 and 0.9629, and it achieved higher sensitivity on the Variety Dataset (0.9661) and higher specificity on the Rovile Dataset (0.9634); nevertheless, Slot-Maize retained the highest accuracy and balanced metrics across both datasets, supporting its robustness and generalizability.

These additional baselines (InceptionV3, Vision Transformer, and ConvNeXt) provide further support for the proposed design. Although ConvNeXt and Vision Transformer showed high sensitivity and accuracy, especially on the Rovile Dataset, Slot-Maize achieved the highest overall F1 score and consistently high specificity on both datasets (0.9631 and 0.9434). This suggests that combining Slot Attention with recurrent layers not only preserves critical spatial features but also adds sequential contextualization, which purely feed-forward models may lack.

4.3. Ensemble Learning with Multiple Slot-Maize Variants

To enhance robustness under uncertain conditions, we constructed five Slot-Maize variants with distinct hyperparameter settings, as detailed in Table 8. Figure 8 compares their final epoch accuracy and loss. By integrating their predictions through ensemble methods, such as averaging or voting, we aim to reduce variance and improve the classification stability in challenging agricultural imaging scenarios.
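The paragraph names averaging and voting without specifying the exact rule. A minimal Python sketch of both is shown below, assuming that each variant outputs per-class probabilities; `soft_vote` averages the probabilities and `hard_vote` takes a majority over predicted labels. Both function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def soft_vote(prob_lists: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Average class probabilities across models, then take the argmax per sample.

    prob_lists[m][i][c] is the probability model m assigns to class c for sample i.
    """
    n_models = len(prob_lists)
    preds = []
    for i in range(len(prob_lists[0])):
        n_classes = len(prob_lists[0][i])
        avg = [sum(p[i][c] for p in prob_lists) / n_models for c in range(n_classes)]
        preds.append(max(range(n_classes), key=avg.__getitem__))
    return preds


def hard_vote(label_lists: Sequence[Sequence[int]]) -> List[int]:
    """Majority vote over predicted labels; ties go to the label seen first."""
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*label_lists)]


# Three hypothetical variants, one sample, two classes (e.g. haploid = 0, diploid = 1)
probs = [[[0.95, 0.05]], [[0.45, 0.55]], [[0.45, 0.55]]]
labels = [[0], [1], [1]]  # argmax of each model's probabilities
print(soft_vote(probs), hard_vote(labels))  # [0] [1]
```

The example shows that the two rules can disagree: under averaging, one confident model outweighs two weakly opposed ones, whereas majority voting ignores confidence. Which behavior is preferable depends on how well calibrated the variants' probabilities are.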

Figure 8 shows substantial performance differences among the Slot-Maize variants. Slot-Maize C achieved the highest accuracy and the lowest loss, indicating strong convergence and generalization. Slot-Maize E also performed well, suggesting that a lower learning rate may be advantageous, although Variant E also uses more slots (five), so the two effects cannot be separated. In contrast, Slot-Maize A and B reached moderate accuracy but relatively high loss, which may indicate overfitting or unstable learning. These findings show that performance is sensitive to hyperparameter choices and support an ensemble approach that exploits the complementary strengths of the variants.

5. Discussion

This section analyzes the factors that influence model performance, including dataset characteristics, model complexity, generalization, and robustness. It also discusses the advantages of the proposed techniques and of explainable AI (XAI), particularly relative to the related studies summarized in Table 9. The aim is to clarify how these elements contribute to the effectiveness of maize seed classification models and to guide future research and practical applications in agricultural technology.

The characteristics of the datasets have a significant impact on model performance. Spectral data, as in [14], can yield very high accuracy (100% in [14]) because they capture rich, detailed information at different wavelengths, which enhances feature differentiation. Grayscale images, as used in [22,35], generally result in lower accuracy (84.48% in [22] and 91.23% in [35]), likely because the lack of color information limits feature extraction. Color images, used in most other studies, provide a balance: the additional RGB channels increase feature richness and support high accuracy. The number of images also affects performance: larger datasets, such as the Maize Variety Dataset, provide more training data, which improves generalization but also requires more computing power.

The trade-off between model complexity and performance is evident when comparing hybrid deep learning models to traditional machine learning techniques [22,36]. Hybrid models, such as those combining CNNs with Slot Attention and GRU/LSTM, often achieve higher accuracy due to their ability to capture complex patterns and interactions within the data. However, this increased accuracy comes at the cost of significantly higher computational requirements, making these models more resource-intensive and slower to train. In contrast, simpler methods such as GLCM with decision trees offer faster processing and lower computational requirements, but may struggle with the intricacies of complex datasets, resulting in lower accuracy. This trade-off requires careful consideration of the specific application and available resources when choosing a model.
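For contrast with the hybrid models, the traditional pipeline in [22] extracts gray-level co-occurrence matrix (GLCM) texture features and passes them to classifiers such as decision trees. The dependency-free Python sketch below computes a normalized GLCM for one pixel offset and three common texture descriptors; the offset, number of gray levels, and feature set are illustrative assumptions rather than the settings used in [22], and the classifier stage is omitted. In practice, scikit-image's `graycomatrix` and `graycoprops` functions provide equivalent functionality.

```python
from typing import Dict, List


def glcm(image: List[List[int]], levels: int, dx: int = 1, dy: int = 0) -> List[List[float]]:
    """Normalized, non-symmetric gray-level co-occurrence matrix for offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:   # count only pairs inside the image
                counts[image[y][x]][image[ny][nx]] += 1
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("offset leaves no pixel pairs inside the image")
    return [[c / total for c in row] for row in counts]


def glcm_features(p: List[List[float]]) -> Dict[str, float]:
    """A few classic Haralick-style texture descriptors of a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),
    }


# Tiny 3-level image; each row contributes two horizontal neighbor pairs
features = glcm_features(glcm([[0, 0, 1], [0, 0, 1], [2, 2, 2]], levels=3))
```

Such a feature vector is only a handful of numbers per image, which is why these pipelines are fast but can miss the spatial structure that convolutional and attention layers exploit.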

The generalization and robustness of the models are crucial for assessing their applicability to different types of datasets. The proposed approach, which integrates CNN, Slot Attention, and GRU/LSTM layers, demonstrates strong generalization, as evidenced by its high performance on both the Maize Seed Dataset (96.97% accuracy) and the Maize Variety Dataset (92.30% accuracy). This consistency across datasets suggests that the model effectively captures the underlying patterns and can handle different data distributions. In contrast, models that perform well on a single dataset but poorly on others may lack robustness, limiting their broader applicability. The ability of our model to maintain high accuracy across different datasets highlights its potential for reliable performance in real-world agricultural applications, where data variability is common.

Our evaluation of the model’s performance under occlusion and sensor noise further highlights its robustness. In occlusion sensitivity experiments, we systematically masked different regions of maize seed images to analyze the impact on classification accuracy. Model predictions were most affected by occlusions of the embryo and endosperm regions, which Grad-CAM had also identified as critical. This suggests that the model’s decisions rely on biologically significant features relevant to the agricultural context. Furthermore, we simulated sensor noise by applying Gaussian noise and motion blur to the input images. Although classification performance declined slightly under severe noise, the CNN + Slot + GRU model remained resilient, with F1 score reductions below 5%. These outcomes underscore the model’s robustness and its potential effectiveness in practical agricultural applications, even under imperfect imaging conditions.
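The occlusion experiment amounts to masking one patch at a time, re-running the classifier, and recording how much the target-class score drops. The NumPy sketch below follows that procedure and adds a Gaussian-noise perturbation. The patch size, stride, fill value, noise level, and the `predict` callable are all assumptions, since this section does not report them, and motion blur is not shown.

```python
from typing import Callable

import numpy as np


def occlusion_map(image: np.ndarray, predict: Callable[[np.ndarray], float],
                  patch: int = 8, stride: int = 8, fill: float = 0.0) -> np.ndarray:
    """Drop in the target-class score when each patch of `image` (H, W[, C]) is masked."""
    h, w = image.shape[:2]
    base = predict(image)
    rows = (h - patch) // stride + 1
    cols = (w - patch) // stride + 1
    heat = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            occluded = image.copy()
            y, x = i * stride, j * stride
            occluded[y:y + patch, x:x + patch] = fill   # mask one square region
            heat[i, j] = base - predict(occluded)       # large value = important region
    return heat


def add_gaussian_noise(image: np.ndarray, sigma: float, seed: int = 0) -> np.ndarray:
    """Additive Gaussian sensor noise, clipped to the [0, 1] intensity range."""
    rng = np.random.default_rng(seed)
    return np.clip(image + rng.normal(0.0, sigma, image.shape), 0.0, 1.0)


# Toy usage with a hypothetical scorer that only looks at one central region
img = np.ones((32, 32))
heat = occlusion_map(img, lambda im: float(im[8:16, 8:16].mean()))
noisy = add_gaussian_noise(np.full((4, 4), 0.5), sigma=0.1)
```

With a real classifier, `predict` would return the softmax probability of the true class, and the resulting map can be compared with the Grad-CAM heatmap for the same image.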

As an explainable AI technique, Grad-CAM provides critical insights into the decision-making process of deep learning models, a feature missing from the studies reported in Table 9. By highlighting the specific regions of maize seed images that influence classification decisions, Grad-CAM increases transparency and confidence in the model’s predictions. This is particularly beneficial in agricultural applications, where understanding why a model makes certain decisions is critical to ensuring accuracy and reliability. The use of XAI techniques such as Grad-CAM not only improves model interpretability, but also facilitates more informed decision-making in practical settings, distinguishing our study from previous work.

The proposed model introduces the innovative integration of Slot Attention and GRU, a combination not explored in previous studies, which significantly improves the classification performance. Slot Attention improves the model’s ability to focus on relevant features within the maize seed images, enabling the more accurate identification of subtle differences between haploid and diploid seeds. The GRU further refines this process by efficiently capturing dependencies in the data, especially in sequential contexts. This synergistic integration leads to superior generalization and robustness across diverse datasets, setting our model apart from conventional approaches and demonstrating its potential for more precise agricultural applications.
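One concrete sense in which a GRU is more efficient than an LSTM is its parameter count: a GRU layer has three gate blocks, whereas an LSTM has four. The Python sketch below applies the standard per-layer formula, assuming (as in Variant A of Table 8) 64-dimensional slot vectors as inputs and 128 recurrent units; the exact input dimension of the recurrent layer in Slot-Maize is not stated in this section.

```python
def recurrent_params(input_dim: int, units: int, gates: int, extra_bias: bool = False) -> int:
    """Trainable parameters of one GRU (gates=3) or LSTM (gates=4) layer.

    Each gate block has an input kernel (input_dim x units), a recurrent kernel
    (units x units), and a bias (units). extra_bias=True adds the second bias
    vector used by the Keras GRU default (reset_after=True).
    """
    biases = 2 * units if extra_bias else units
    return gates * (units * (input_dim + units) + biases)


gru = recurrent_params(64, 128, gates=3)
gru_keras = recurrent_params(64, 128, gates=3, extra_bias=True)
lstm = recurrent_params(64, 128, gates=4)
print(gru, gru_keras, lstm, f"{gru / lstm:.2f}")  # 74112 74496 98816 0.75
```

Under these assumptions, the GRU needs about 25% fewer parameters than an LSTM of the same width, with correspondingly fewer multiply-accumulate operations per step.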

The proposed model, which integrates CNN, Slot Attention, GRU, and LSTM layers, outperforms the other models listed in Table 9. It achieves an accuracy of 96.97% on the Maize Seed Dataset, outperforming models such as DeepMaizeNet (94.13%) [12] and the majority-voting CNN ensemble (90.96%) [13] and, by a smaller margin, the hybrid approaches in [15] (96.33%) and [18] (96.74%). The improved performance can be attributed to the enhanced feature extraction capabilities of Slot Attention and the efficient handling of sequential data by the GRU and LSTM, which together allow the model to capture complex patterns in the data. However, performance metrics alone do not provide a complete picture. Differences in dataset characteristics and validation techniques can have a significant impact on these metrics, and without a thorough evaluation of the datasets and methodologies used, comparisons based solely on performance metrics can be misleading and may overestimate the effectiveness of a model in real-world applications.

6. Conclusions

The objective of this study was to enhance the precision and efficiency of maize seed classification, particularly in distinguishing between haploid and diploid seeds, through the development of the Slot-Maize deep learning model. The proposed model demonstrated superior accuracy and robustness, significantly outperforming existing methods and showcasing strong generalization across diverse datasets. The integration of Slot Attention, GRU, and LSTM layers within the Slot-Maize model effectively captured complex patterns within the seed images, leading to more accurate classifications. The Slot-Maize model demonstrated a classification accuracy of 96.97% on the Maize Seed Dataset and 92.30% on the Maize Variety Dataset. It showed better performance than existing methods, with improved sensitivity, specificity, and F1 scores across both datasets. This highlights the efficacy of the Slot-Maize model in accurately distinguishing between haploid and diploid seeds. The results of this study indicate that Slot-Maize has the potential to markedly enhance the efficacy of maize breeding programs, facilitating accelerated genetic advancement and optimized hybrid variety development. However, the investigation also uncovered certain constraints, particularly in the model’s performance when applied to smaller, less diverse datasets, underscoring the necessity for further optimization. This study utilized Grad-CAM as an explainable AI technique, offering visual insights into the model’s decision-making process. By highlighting the key regions in maize seed images that influenced classification, Grad-CAM enhanced the transparency and interpretability of the Slot-Maize model, making its predictions more understandable and reliable for agricultural applications.

Additional research could explore the integration of more explainable artificial intelligence (XAI) techniques to improve model transparency. It could also examine the application of Slot-Maize to different crop types, thereby expanding its usefulness in agricultural technology. Overall, Slot-Maize is a significant advancement in automated seed classification, providing a reliable and interpretable tool that enhances the accuracy and efficiency of agricultural practices.

Author Contributions

Writing—original draft preparation, Z.C.; methodology, Z.C.; software, A.T.K.; formal analysis, Y.C.; resources, E.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This article does not contain any data or other information from studies or experimentation involving human or animal subjects.

Data Availability Statement

The dataset is available on the GitHub repository (https://github.com/TechResearchLab/Maize-Images-Data, accessed on 25 June 2025).

Conflicts of Interest

The authors declare that there are no conflicts of interest related to this paper.

References

1. Ssemugenze, B.; Ocwa, A.; Bojtor, C.; Illés, Á.; Esimu, J.; Nagy, J. Impact of Research on Maize Production Challenges in Hungary. Heliyon 2024, 10, e26099.
2. Pérez Ruiz, R.V.; Aguilar Toalá, J.E.; Cruz Monterrosa, R.G.; Rayas Amor, A.A.; Rodríguez, M.H.; Villasana, Y.C.; Pérez, J.H. Mexican Native Maize: Origin, Races and Impact on Food and Gastronomy. Int. J. Gastron. Food Sci. 2024, 37, 100978.
3. Li, H.; Fernie, A.R.; Yang, X. Using Systems Metabolic Engineering Strategies for High-Oil Maize Breeding. Curr. Opin. Biotechnol. 2023, 79, 102847.
4. Xu, L.; Najeeb, U.; Tang, G.X.; Gu, H.H.; Zhang, G.Q.; He, Y.; Zhou, W.J. Haploid and Doubled Haploid Technology. In Rapeseed Breeding; Academic Press: Cambridge, MA, USA, 2007; Volume 45, pp. 181–216. ISSN 0065-2296.
5. Qu, Y.; Fernie, A.R.; Liu, J.; Yan, J. Doubled Haploid Technology and Synthetic Apomixis: Recent Advances and Applications in Future Crop Breeding. Mol. Plant 2024, 17, 1005–1018.
6. Altuntaş, Y.; Cömert, Z.; Kocamaz, A.F. Identification of Haploid and Diploid Maize Seeds Using Convolutional Neural Networks and a Transfer Learning Approach. Comput. Electron. Agric. 2019, 163, 104874.
7. Ahmar, S.; Usman, B.; Hensel, G.; Jung, K.-H.; Gruszka, D. CRISPR Enables Sustainable Cereal Production for a Greener Future. Trends Plant Sci. 2024, 29, 179–195.
8. Zaefarian, F.; Cowieson, A.J.; Pontoppidan, K.; Abdollahi, M.R.; Ravindran, V. Trends in Feed Evaluation for Poultry with Emphasis on in Vitro Techniques. Anim. Nutr. 2021, 7, 268–281.
9. Kaur, H.; Kyum, M.; Sandhu, S.; Singh, G.; Sharma, P. Protocol Optimization and Assessment of Genotypic Response for Inbred Line Development through Doubled Haploid Production in Maize. BMC Plant Biol. 2023, 23, 219.
10. Salimi, A.; Ghobrial, T.; Bonakdari, H. A Comprehensive Review of AI-Based Methods Used for Forecasting Ice Jam Floods Occurrence, Severity, Timing, and Location. Cold Reg. Sci. Technol. 2024, 227, 104305.
11. Ram, B.G.; Oduor, P.; Igathinathane, C.; Howatt, K.; Sun, X. A Systematic Review of Hyperspectral Imaging in Precision Agriculture: Analysis of Its Current State and Future Prospects. Comput. Electron. Agric. 2024, 222, 109037.
12. Ayaz, I.; Kutlu, F.; Cömert, Z. DeepMaizeNet: A Novel Hybrid Approach Based on CBAM for Implementing the Doubled Haploid Technique. Agron. J. 2024, 116, 861–870.
13. Dönmez, E.; Diker, A.; Elen, A.; Ulu, M. Multiple Deep Learning by Majority-Vote to Classify Haploid and Diploid Maize Seeds. Sci. Hortic. 2024, 337, 113549.
14. Rodrigues Ribeiro, M.; Lúcia Ferreira Simeone, M.; dos Santos Trindade, R.; Antônio dos Santos Dias, L.; José Moreira Guimarães, L.; Salete Tibola, C.; Cristina de Azevedo, T. Near Infrared Spectroscopy (NIR) and Chemometrics Methods to Identification of Haploids in Maize. Microchem. J. 2023, 190, 108604.
15. Dönmez, E.; Kılıçarslan, S.; Közkurt, C.; Diker, A.; Demir, F.B.; Elen, A. Identification of Haploid and Diploid Maize Seeds Using Hybrid Transformer Model. Multimed. Syst. 2023, 29, 3833–3845.
16. Güneş, A.; Dönmez, E. Interactive Classification of Maize Seeds with Batch Mode Active Learning. In Proceedings of the 2023 Innovations in Intelligent Systems and Applications Conference (ASYU), Sivas, Turkey, 11–13 October 2023; pp. 1–5.
17. He, X.; Liu, L.; Liu, C.; Li, W.; Sun, J.; Li, H.; He, Y.; Yang, L.; Zhang, D.; Cui, T.; et al. Discriminant Analysis of Maize Haploid Seeds Using Near-Infrared Hyperspectral Imaging Integrated with Multivariate Methods. Biosyst. Eng. 2022, 222, 142–155.
18. Dönmez, E. Enhancing Classification Capacity of CNN Models with Deep Feature Selection and Fusion: A Case Study on Maize Seed Classification. Data Knowl. Eng. 2022, 141, 102075.
19. Ge, W.; Li, J.; Wang, Y.; Yu, X.; An, D.; Chen, S. Maize Haploid Recognition Study Based on Nuclear Magnetic Resonance Spectrum and Manifold Learning. Comput. Electron. Agric. 2020, 170, 105219.
20. Dönmez, E. Discrimination of Haploid and Diploid Maize Seeds Based on Deep Features. In Proceedings of the 2020 28th Signal Processing and Communications Applications Conference (SIU), Gaziantep, Turkey, 5–7 October 2020; pp. 1–4.
21. Liao, W.; Wang, X.; An, D.; Wei, Y. Hyperspectral Imaging Technology and Transfer Learning Utilized in Haploid Maize Seeds Identification. In Proceedings of the 2019 International Conference on High Performance Big Data and Intelligent Systems (HPBD&IS), Shenzhen, China, 9–11 May 2019; pp. 157–162.
22. Altuntaş, Y.; Kocamaz, A.F.; Cömert, Z.; Cengiz, R.; Esmeray, M. Identification of Haploid Maize Seeds Using Gray Level Co-Occurrence Matrix and Machine Learning Techniques. In Proceedings of the 2018 International Conference on Artificial Intelligence and Data Processing (IDAP), Malatya, Turkey, 28–30 September 2018; pp. 1–5.
23. Asante, E.; Appiah, O.; Opoku, E. Maize Variety Dataset for Classification; University of Energy and Natural Resources: Sunyani, Ghana, 2024.
24. Wang, J.; Ye, X.; Wu, D.; Gong, J.; Tang, X.; Li, Z. Evolution of Siamese Visual Tracking with Slot Attention. Electronics 2024, 13, 586.
25. Zhou, Y.; Zhu, H.; Zhang, Y.; Liang, S.; Wang, Y.; Yang, W. Generalized Category Discovery in Aerial Image Classification via Slot Attention. Drones 2024, 8, 160.
26. Ye, J.; Wang, Y.; Xie, F.; Wang, Q.; Gu, X.; Wu, Z. Slot-VTON: Subject-Driven Diffusion-Based Virtual Try-on with Slot Attention. Vis. Comput. 2024, 41, 3297–3308.
27. Sankalp, S.; Rao, U.M.; Patra, K.C.; Sahoo, S.N. Chapter 11—Modeling Gated Recurrent Unit (GRU) Neural Network in Forecasting Surface Soil Wetness for Drought Districts of Odisha. In Modeling and Mitigation Measures for Managing Extreme Hydrometeorological Events Under a Warming Climate; Kasiviswanathan, K.S., Soundharajan, B., Patidar, S., He, J., Ojha, C.S.P., Eds.; Elsevier: Amsterdam, The Netherlands, 2023; Volume 14, pp. 217–229. ISSN 1474-8177.
28. Zhao, Z.; Fu, Y.; Pu, J.; Wang, Z.; Shen, S.; Ma, D.; Xie, Q.; Zhou, F. Performance Decay Prediction Model of Proton Exchange Membrane Fuel Cell Based on Particle Swarm Optimization and Gate Recurrent Unit. Energy AI 2024, 17, 100399.
29. Natu, M.; Bachute, M.; Kotecha, K. HCLA_CBiGRU: Hybrid Convolutional Bidirectional GRU Based Model for Epileptic Seizure Detection. Neurosci. Inform. 2023, 3, 100135.
30. Chang, S.-C.; Chang, C.-Y.; Lee, H.-Y.; Chien, I.-L. Grade Transition Optimization by Using Gated Recurrent Unit Neural Network for Styrene-Acrylonitrile Copolymer Process. In Proceedings of the 14th International Symposium on Process Systems Engineering, Kyoto, Japan, 19–23 June 2022; Yamashita, Y., Kano, M., Eds.; Elsevier: Amsterdam, The Netherlands, 2022; Volume 49, pp. 1723–1728. ISSN 1570-7946.
31. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 618–626.
32. Ali, S.; Akhlaq, F.; Imran, A.S.; Kastrati, Z.; Daudpota, S.M.; Moosa, M. The Enlightening Role of Explainable Artificial Intelligence in Medical & Healthcare Domains: A Systematic Literature Review. Comput. Biol. Med. 2023, 166, 107555.
33. Li, L.; Wang, B.; Verma, M.; Nakashima, Y.; Kawasaki, R.; Nagahara, H. SCOUTER: Slot Attention-Based Classifier for Explainable Image Recognition. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Virtual, 11–17 October 2021; pp. 1026–1035.
34. Cömert, Z.; Kocamaz, A.F.; Subha, V. Prognostic Model Based on Image-Based Time-Frequency Features and Genetic Algorithm for Fetal Hypoxia Assessment. Comput. Biol. Med. 2018, 99, 85–97.
35. Alejandrino, J.D.; Concepcion, R.; Bandala, A.; Palconit, M.G.; Vicerra, R.R.; Dadios, E.P. ChromoCorn: Zea mays Chromosome Classification Using Zero Order Fuzzy Inference System Based on Kernel’s Haralick Grey Level Texture Phenes. In Proceedings of the 2022 IEEE 14th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment, and Management (HNICEM), Boracay, Philippines, 1–4 December 2022; pp. 1–6.
36. Altuntaş, Y.; Kocamaz, A.F.; Cengiz, R.; Esmeray, M. Classification of Haploid and Diploid Maize Seeds by Using Image Processing Techniques and Support Vector Machines. In Proceedings of the 2018 26th Signal Processing and Communications Applications Conference (SIU), Izmir, Turkey, 2–5 May 2018; pp. 1–4.

Figure 1. Distinct morphological features of haploid (a) and diploid (b) maize seeds, with emphasis on the expression of R1-nj in the endosperm and embryo.

Figure 2. Schematic representation of the Slot-Maize model, highlighting the integration of convolutional, slot attention, and GRU layers.

Figure 3. (a) Validation accuracy and (b) validation loss of baseline and enhanced CNN models using slot attention, GRU, and LSTM.

Figure 4. Confusion matrices comparing classification performance of baseline and enhanced CNN models: (a) CNN, (b) CNN + Slot, (c) CNN + Slot + GRU, (d) CNN + Slot + LSTM.

Figure 5. ROC curves comparing classification performance of baseline and enhanced CNN models: (a) CNN, (b) CNN + Slot Attention, (c) CNN + Slot Attention + GRU, (d) CNN + Slot Attention + LSTM.

Figure 6. Comparative performance of CNN-based models on the Maize Seed and Maize Variety Datasets: (a) accuracy, (b) sensitivity, (c) specificity, and (d) F1 score across model variations.

Figure 7. Comparison of original maize seed images with Grad-CAM outputs for model interpretability.

Figure 8. Evaluation of Slot-Maize variants with varying slot counts, dimensions, GRU units, and learning rates.

Table 1. Summary of haploid and diploid maize seed counts in the Rovile dataset.

Class	Total
Haploid	1230
Diploid	1770
Total	3000

Table 2. Number of images for each maize variety in the Maize Variety Dataset.

Class	Total
Bhihilifa	6480
Sanzal Sima	5100
Wang Dataa	6144
Total	17,724

Table 3. Hyperparameter configuration and descriptions for CNN training.

Parameter	Values	Short Description
Batch size	16	Number of images processed in each batch during training.
Epochs	128	Maximum number of complete passes through the training dataset (early stopping may end training sooner).
Shuffle	True/False	Whether the data are shuffled (True for training, False for validation).
Optimizer	Adam	Optimization algorithm used to update the model’s weights during training.
Loss	Categorical Cross Entropy	Loss function used for multi-class classification.
Patience	3	Number of epochs without improvement after which training is stopped early.
Monitor	Validation Accuracy	Metric monitored to trigger early stopping.
Restore Best Weights	True	Whether to restore the model’s weights from the epoch with the best value of the monitored metric (validation accuracy).
Train/test split ratio	8:2	Ratio of training to test data; the dataset was partitioned into separate training and test sets.
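The Patience, Monitor, and Restore Best Weights settings in Table 3 describe an early-stopping rule. The Python sketch below simulates that rule on a list of per-epoch validation accuracies, assuming that any strict increase counts as an improvement (no minimum delta). It mirrors the behavior of the Keras `EarlyStopping` callback in spirit but is not the authors' training code, and the accuracy values are invented for illustration.

```python
from typing import Sequence, Tuple


def early_stopping(val_scores: Sequence[float], patience: int = 3) -> Tuple[int, int]:
    """Return (best_epoch, stop_epoch), both 0-indexed, for a metric to be maximized.

    best_epoch is the epoch whose weights would be restored; stop_epoch is the
    last epoch that runs before training halts.
    """
    best_score = float("-inf")
    best_epoch = -1
    wait = 0
    for epoch, score in enumerate(val_scores):
        if score > best_score:            # strict improvement resets the counter
            best_score, best_epoch, wait = score, epoch, 0
        else:
            wait += 1
            if wait >= patience:          # `patience` epochs in a row without improvement
                return best_epoch, epoch
    return best_epoch, len(val_scores) - 1


# Illustrative validation accuracies; the late peak at epoch 7 is never reached
print(early_stopping([0.80, 0.85, 0.84, 0.86, 0.86, 0.85, 0.83, 0.90]))  # (3, 6)
```

With a patience of 3, training on this sequence halts at epoch 6 and the weights from epoch 3 are restored, which illustrates why the 128-epoch budget in Table 3 acts as an upper bound.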

Table 4. Summary of classification performance for CNN-based models on Rovile Maize Seed Dataset.

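Models	Data Augmentation	Dataset	Accuracy	Sensitivity	Specificity	F1 Score
CNN	No	Rovile Maize Seed Dataset	0.8800	0.9209	0.8211	0.9006
CNN + Slot	No	Rovile Maize Seed Dataset	0.9017	0.9633	0.8130	0.9204
CNN + Slot + GRU	No	Rovile Maize Seed Dataset	0.9667	0.9689	0.9434	0.9717
CNN + Slot + LSTM	No	Rovile Maize Seed Dataset	0.9250	0.9322	0.9146	0.9362
CNN	Yes	Rovile Maize Seed Dataset	0.8700	0.8983	0.8293	0.8908
CNN + Slot	Yes	Rovile Maize Seed Dataset	0.8583	0.8955	0.8049	0.8818
CNN + Slot + GRU	Yes	Rovile Maize Seed Dataset	0.9230	0.9220	0.9631	0.9190
CNN + Slot + LSTM	Yes	Rovile Maize Seed Dataset	0.9200	0.9520	0.8740	0.9335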

Table 5. Summary of classification performance for CNN-based models on Maize Variety Dataset.

Models	Data Augmentation	Dataset	Accuracy	Sensitivity	Specificity	F1 Score
CNN	No	Maize Variety Dataset	0.8205	0.8091	0.9112	0.8095
CNN + Slot	No	Maize Variety Dataset	0.8696	0.8638	0.9363	0.8626
CNN + Slot + GRU	No	Maize Variety Dataset	0.9230	0.9220	0.9631	0.9190
CNN + Slot + LSTM	No	Maize Variety Dataset	0.8598	0.8569	0.9315	0.8537
CNN	Yes	Maize Variety Dataset	0.7342	0.7110	0.8680	0.7078
CNN + Slot	Yes	Maize Variety Dataset	0.8205	0.8091	0.9112	0.8095
CNN + Slot + GRU	Yes	Maize Variety Dataset	0.8983	0.9040	0.8902	0.9130
CNN + Slot + LSTM	Yes	Maize Variety Dataset	0.8917	0.9011	0.8780	0.9075

Table 6. Comparative performance analysis of the proposed Slot Attention-based model and other attention mechanisms (SE, CBAM, Transformer) across two maize classification datasets.

Models	Dataset	Accuracy	Sensitivity	Specificity	F1 Score
CNN + SE	Variety Dataset	0.7582	0.7456	0.8818	0.7460
CNN + CBAM	Variety Dataset	0.7506	0.7442	0.8794	0.7389
CNN + Transformer Attention	Variety Dataset	0.7523	0.7289	0.8766	0.7552
Proposed model (CNN + Slot + GRU)	Variety Dataset	0.9230	0.9220	0.9631	0.9190
CNN + SE	Rovile Dataset	0.9550	0.9520	0.9593	0.9615
CNN + CBAM	Rovile Dataset	0.9667	0.9605	0.9756	0.9714
CNN + Transformer Attention	Rovile Dataset	0.9633	0.9859	0.9309	0.9694
Proposed model (CNN + Slot + GRU)	Rovile Dataset	0.9667	0.9689	0.9434	0.9717

Table 7. Comparative performance of Slot-Maize and state-of-the-art models under identical training conditions.

Models	Dataset	Accuracy	Sensitivity	Specificity	F1 Score
InceptionV3	Variety Dataset	0.8598	0.8569	0.9315	0.8537
Vision Transformer	Variety Dataset	0.8800	0.9209	0.8211	0.9006
ConvNeXt	Variety Dataset	0.8983	0.9661	0.8008	0.9181
Proposed model (CNN + Slot + GRU)	Variety Dataset	0.9230	0.9220	0.9631	0.9190
InceptionV3	Rovile Dataset	0.9250	0.9322	0.9146	0.9362
Vision Transformer	Rovile Dataset	0.9467	0.9887	0.8862	0.9563
ConvNeXt	Rovile Dataset	0.9567	0.9520	0.9634	0.9629
Proposed model (CNN + Slot + GRU)	Rovile Dataset	0.9667	0.9689	0.9434	0.9717

Table 8. Hyperparameter configurations of Slot-Maize model variants.

Variant	Slot Count	Slot Dim	GRU Units	Learning Rate
Slot Maize A	3	64	128	0.001
Slot Maize B	4	64	128	0.001
Slot Maize C	3	128	128	0.0005
Slot Maize D	3	64	256	0.001
Slot Maize E	5	64	128	0.0001

Table 9. Summary of methods and performance in maize seed classification using different approaches.

Year	Ref.	Methods	Model Type	Dataset	# of Images and Classes	Performance Measures (%)
2024	[12]	DeepMaizeNet hybrid model utilizing CBAM, hypercolumn, 2D upsampling, and residual blocks	Hybrid	Maize Seed Dataset	3000 maize seed images with 2 classes	94.13
2024	[13]	Majority-voting model with five CNN architectures	Hybrid	Maize Seed Dataset	3000 maize seed images with 2 classes	90.96
2023	[14]	NIR and PLS-DA for classification of maize seeds and plants	Chemometrics	Spectral Data	556 spectral images with 2 classes	100
2023	[15]	Hybrid model combining EfficientNetV2B0 with ResMLP	Hybrid	Maize Seed Dataset	3000 maize seed images with 2 classes	96.33
2023	[16]	Interactive classification with batch-mode active learning and SVM	ML	Maize Seed Dataset	3000 maize seed images with 2 classes	N/A
2022	[17]	NIR-HSI combined with SPA, CARS, UVE, and PLS-DA	Chemometrics	Maize Seed Dataset	400 spectral images with 2 classes	90.31
2022	[18]	Deep feature extraction from six CNN models, MRMR selection, feature fusion, and SVM classification	Hybrid	Maize Seed Dataset	3000 maize seed images with 2 classes	96.74
2022	[35]	Zero-Order Fuzzy Inference System based on Haralick Grey-Level Texture Phenes	Fuzzy Logic	Maize Kernel Dataset	284 grayscale images with 2 classes	91.23
2020	[19]	NMR spectrum with multi-manifold learning framework	Hybrid	Maize Kernel Dataset	400 spectral samples with 2 classes	98.33 (high-oil), 90.00 (conventional)
2020	[20]	AlexNet deep feature extraction and SVM classification	Hybrid	Maize Seed Dataset	3000 maize seed images with 2 classes	89.50
2019	[6]	CNN models with transfer learning; best performance achieved by VGG-19	DL	Maize Seed Dataset	3000 maize seed images with 2 classes	94.22
2019	[21]	Hyperspectral imaging combined with VGG-19 transfer learning	DL	Zhengdan958	200 images with 2 classes	96.32
2018	[22]	GLCM texture features with Decision Trees, kNN, and ANN	ML	Maize Seed Dataset	413 maize seed images with 2 classes	84.48
2024	This study	CNN, slot attention, GRU, and Grad-CAM	DL	Maize Seed Dataset	3000 maize seed images with 2 classes	96.97
2024	This study	CNN, slot attention, GRU, and Grad-CAM	DL	Maize Variety Dataset for Classification	17,724 maize seed images with 3 classes	92.30
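The sketch below makes the Maize Seed Dataset comparison in Table 9 concrete. It keeps only the rows that used the same 3000-image, two-class set and computes the margin of this study's 96.97% over the best prior figure. Three rows are excluded: [16], which reports N/A, and [17] and [22], which used 400- and 413-image sets. The sketch treats the reported percentages as directly comparable, an assumption the table itself does not establish. Variable and function names are hypothetical.

```python
# Hypothetical structure: Table 9 results (%) on the 3000-image, 2-class
# Maize Seed Dataset; [16] is omitted because it reports N/A.
PRIOR_WORK = {
    "[12]": 94.13,
    "[13]": 90.96,
    "[15]": 96.33,
    "[18]": 96.74,
    "[20]": 89.50,
    "[6]": 94.22,
}
THIS_STUDY = 96.97


def margin_over_prior(prior: dict[str, float], ours: float) -> tuple[str, float]:
    """Return the best prior reference and the margin over it in percentage points."""
    best_ref = max(prior, key=prior.get)
    return best_ref, round(ours - prior[best_ref], 2)


best_ref, margin = margin_over_prior(PRIOR_WORK, THIS_STUDY)
print(f"Best prior result: {best_ref} ({PRIOR_WORK[best_ref]}%), margin {margin} pp")
```

For these rows the strongest prior result is [18] at 96.74%, giving a margin of 0.23 percentage points.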

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Cömert, Z.; Karadeniz, A.T.; Basaran, E.; Celik, Y. Classification of Maize Images Enhanced with Slot Attention Mechanism in Deep Learning Architectures. Electronics 2025, 14, 2635. https://doi.org/10.3390/electronics14132635

AMA Style

Cömert Z, Karadeniz AT, Basaran E, Celik Y. Classification of Maize Images Enhanced with Slot Attention Mechanism in Deep Learning Architectures. Electronics. 2025; 14(13):2635. https://doi.org/10.3390/electronics14132635

Chicago/Turabian Style

Cömert, Zafer, Alper Talha Karadeniz, Erdal Basaran, and Yuksel Celik. 2025. "Classification of Maize Images Enhanced with Slot Attention Mechanism in Deep Learning Architectures" Electronics 14, no. 13: 2635. https://doi.org/10.3390/electronics14132635

APA Style

Cömert, Z., Karadeniz, A. T., Basaran, E., & Celik, Y. (2025). Classification of Maize Images Enhanced with Slot Attention Mechanism in Deep Learning Architectures. Electronics, 14(13), 2635. https://doi.org/10.3390/electronics14132635

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
