1. Introduction
The tomato is an important vegetable crop cultivated globally. However, tomato diseases frequently arise from natural environmental factors, such as climate change, and from human interventions, such as poor drainage and insufficient fertilization, resulting in substantial reductions in yield and economic value [1]. Early disease detection enables precisely targeted pesticide application to prevent the spread of disease and minimize crop loss [2]. Most disease symptoms on tomato plants appear on the leaves as spots, yellowing, necrosis, leaf distortion, and similar signs, which serve as important indicators for disease identification [3]. Traditional disease detection usually relies on visual examination or laboratory tests, both of which are not only time- and labor-intensive but also require specialized knowledge, making large-scale disease monitoring and accurate diagnosis difficult to achieve [4].
The emergence and development of deep learning technologies, particularly Convolutional Neural Networks (CNNs), has led to their introduction into agricultural disease recognition [5], which not only alleviates the workload but also enhances the accuracy of disease detection. Vini et al. [6] proposed TrioConvTomatoNet, a deep convolutional neural network architecture that achieved a precision of 99.39% in classifying tomato diseases. Chakrabarty et al. [7] introduced a hybrid framework integrating a transformer architecture with a lightweight CNN, which showed high precision and recall when detecting rice leaf diseases. Zhang et al. [8] adopted ResNet-50 as the foundational framework of their proposed model, whose precision for tomato disease recognition on the Plant Village dataset [9] reached 98.25%. Liu et al. [10] introduced a multi-scale constrained deformable convolution network, MCDCNet, which enhanced apple leaf disease detection by extracting reliable features across varying scales and geometries; its accuracy reached 66.8% for apple leaf disease in a complex natural environment, an improvement of 3.85% over existing state-of-the-art models. Although CNNs demonstrate strong performance in agricultural disease detection tasks, they still face challenges in agricultural scenarios that require real-time processing, because their inference is slow on devices with constrained computational capabilities. In response to these challenges, the You Only Look Once (YOLO) series of models [11,12,13,14,15,16,17,18] was proposed. YOLO predicts bounding boxes and class probabilities for an entire image via a single neural network, which greatly simplifies the traditional multi-step detection process and achieves excellent accuracy while maintaining a high detection speed [11]. YOLO therefore has strong advantages in scenarios requiring both high precision and real-time performance, such as agricultural automation and crop health monitoring. Li et al. [19] introduced an improved lightweight model based on YOLOv5s for identifying vegetable diseases; the model achieved an mAP@0.5 of 93.1% on a dataset with five diseases, effectively reducing the missed and false detections caused by complex backgrounds and small-scale disease symptoms. Guo et al. [20] developed YOLOv7-TMRTM to rapidly and accurately detect rice leaf disease symptoms of various sizes; it outperforms the baseline YOLOv7-tiny model in detecting leaf spots of various sizes and small targets of various types. Yang et al. [21] introduced the slim-neck module and Global Attention Mechanism (GAM) into YOLOv8, achieving improvements of 3.56%, 7.3%, 3.79%, and 4.65% in the mAP@0.5, mAP@0.5:0.95, precision, and recall for corn leaf disease detection, respectively. Yan et al. [22] introduced FSM-YOLO, an improved convolutional neural network for detecting apple leaf diseases, which enhanced detection accuracy by introducing adaptive feature capture and spatial context awareness; the model achieved a 2.7% improvement in the mAP@0.5 over the baseline YOLOv8s on the ALDD dataset. However, YOLO models and CNNs are constrained by their local receptive fields and usually find it difficult to capture global spatial features and dependencies [23], which can cause them to miss scattered lesion spots and subtle disease signs on leaves in complex background scenarios.
The State Space Model (SSM)-based approach, exemplified by Mamba [24], performs well in modeling long-distance dependencies while maintaining computational complexity that is linear in sequence length, and it has become an efficient and widely applicable sequence model that addresses the computational inefficiency of transformers on long sequences. Several studies have applied SSMs to object detection with good results. FER-YOLO-Mamba [25] combines Mamba and YOLO, integrating the inherent advantage of convolutional layers in local feature extraction with the strength of SSMs in revealing long-distance dependencies, and shows strong robustness and generalization in facial expression detection and classification. Mamba-YOLO [26] is an object-detection model based on the SSM that not only optimizes the SSM foundation but also adapts it specifically for object detection. Extensive experiments on public benchmark datasets, such as COCO and VOC, demonstrated that Mamba-YOLO surpasses existing YOLO series models, showcasing its substantial potential and competitive edge. More recently, Mamba2 [27] further refined the Selective State Space Model (S6) by introducing State Space Duality (SSD): it treats the state-transition matrix as a scalar and extends the dimensions of the state space, improving both model performance and the efficiency of training and inference. Hence, in this research, we incorporated Mamba2 into the neck network and propose the YOLO-BSMamba model for tomato leaf disease recognition. The innovations of the algorithm are as follows:
(1) A Similarity-Based Attention Mechanism (SimAM) [28] is introduced into the backbone network to reduce background noise interference, highlight the diseased areas, and further enhance the model's adaptability to various complex backgrounds (a minimal sketch of SimAM is given after this list).
(2) A Hybrid Convolutional Mamba (HCMamba) module is proposed, which fuses the local detail information extracted by convolution with the global context information provided by the SSM (see the schematic sketch after this list). This design enhances the model's capacity to capture both global and fine-grained image features, thereby improving disease localization and classification.
(3) The weighted bidirectional feature pyramid network (BiFPN) is used as the feature-fusion module of the network. BiFPN's multi-scale feature fusion improves the model's ability to detect diseases of differing severity, and its weighted feature fusion improves the model's sensitivity to key disease areas.
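To make contribution (1) concrete, the following is a minimal PyTorch sketch of SimAM as defined in the original paper [28]: a parameter-free mechanism that weights every neuron by an inverse energy term computed from its deviation from the channel mean. The module name and the default λ value follow the SimAM paper; where the module is placed in the backbone is not shown here.

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free SimAM attention: neurons that deviate more from their
    channel mean receive lower energy and therefore higher weights."""
    def __init__(self, e_lambda: float = 1e-4):
        super().__init__()
        self.e_lambda = e_lambda  # regularizer from the SimAM paper

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        n = h * w - 1
        # Squared deviation of each position from its channel mean.
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
        # Per-channel variance estimate.
        v = d.sum(dim=(2, 3), keepdim=True) / n
        # Inverse energy; lower energy marks more distinctive neurons.
        e_inv = d / (4 * (v + self.e_lambda)) + 0.5
        return x * torch.sigmoid(e_inv)
```

For contribution (2), the paper describes HCMamba only at the level of its design idea (a convolutional branch for local detail fused with an SSM branch for global context), so the sketch below is a hypothetical schematic rather than the authors' exact module. It assumes the `mamba_ssm` package for the SSM branch; the depthwise-separable local branch and the 1 × 1 fusion convolution are illustrative choices.

```python
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # assumption: pip install mamba-ssm (CUDA required)

class HCMambaSketch(nn.Module):
    """Schematic hybrid conv + SSM block; NOT the paper's exact HCMamba."""
    def __init__(self, channels: int):
        super().__init__()
        # Local branch: depthwise 3x3 + pointwise 1x1 for fine-grained detail.
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.Conv2d(channels, channels, 1),
            nn.BatchNorm2d(channels),
            nn.SiLU(),
        )
        # Global branch: Mamba over the flattened spatial sequence.
        self.norm = nn.LayerNorm(channels)
        self.ssm = Mamba(d_model=channels)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        local = self.local(x)
        seq = x.flatten(2).transpose(1, 2)           # (B, H*W, C)
        glob = self.ssm(self.norm(seq))              # linear-time global context
        glob = glob.transpose(1, 2).reshape(b, c, h, w)
        return self.fuse(torch.cat([local, glob], dim=1)) + x  # residual fusion
```

For example, `HCMambaSketch(256)` would process a (B, 256, H, W) neck feature map while preserving its shape.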
3. Results
This section provides a detailed account of the performance of the YOLO-BSMamba model on the tomato leaf disease detection task. We first introduce the experimental platform and parameter settings, including the hardware configuration and the key training parameters. We then demonstrate, through comparative experiments against the YOLOv8s model, the improvements YOLO-BSMamba achieves across various metrics. Ablation experiments verify the contributions of the proposed modules (the HCMamba module, the SimAM attention mechanism, and BiFPN) to the model's performance. Finally, comparison with other YOLO series models further substantiates the superiority of YOLO-BSMamba for tomato leaf disease detection.
3.1. Experimental Platform and Parameter Settings
All experiments were carried out using Python 3.8 and PyTorch 2.0.0, and an RTX 3090 GPU with 24 GB of memory was used for training. The detailed configurations of the experimental setup are outlined in Table 2, ensuring transparency and facilitating replication of the study.
In the training process, the input image dimensions for the network were set to 640 × 640 pixels, and stochastic gradient descent (SGD) was employed as the optimization strategy. The initial learning rate was set to 0.01, with a momentum of 0.9 and a weight decay of 0.0005. The batch size was 16, and the number of epochs was 300. We also used a warmup strategy for the first three epochs, gradually raising the learning rate to 0.01. Training for 300 epochs took 10 h.
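For reference, these settings correspond to the following PyTorch sketch. The placeholder model and loop skeleton are illustrative only, since YOLOv8-style trainers configure the optimizer and warmup internally.

```python
import torch
import torch.nn as nn

# Placeholder for the YOLO-BSMamba network; a real trainer builds the full model.
model = nn.Conv2d(3, 16, 3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=0.0005)

BASE_LR, WARMUP_EPOCHS, EPOCHS, BATCH_SIZE = 0.01, 3, 300, 16

for epoch in range(EPOCHS):
    if epoch < WARMUP_EPOCHS:
        # Linear warmup: ramp the learning rate up to BASE_LR over 3 epochs.
        for g in optimizer.param_groups:
            g["lr"] = BASE_LR * (epoch + 1) / WARMUP_EPOCHS
    # ... run one training epoch on 640x640 inputs with batch size 16 ...
```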
3.2. Comparison of Performance Between YOLO-BSMamba and YOLOv8s
Figure 7 illustrates the changes in the mAP@0.5 and the total loss during the training of both models. The loss curves show that the total loss of YOLO-BSMamba (blue curve) declines rapidly during the initial training phase and remains below that of YOLOv8s (red curve) throughout training, with the pronounced early decline reflecting swift convergence. Similarly, the mAP@0.5 curves show that, after stabilizing, YOLO-BSMamba consistently outperforms YOLOv8s on the validation set. This indicates that YOLO-BSMamba possesses higher accuracy and stronger generalization ability.
Table 3 shows the results for the YOLOv8s and YOLO-BSMamba models and indicates that YOLO-BSMamba outperforms YOLOv8s across several key metrics. Specifically, YOLO-BSMamba shows improvements of 3.0%, 3.1%, 2.0%, 4.8%, and 4.3% in P, R, F1 score, mAP@0.5, and mAP@0.5:0.95, respectively, compared to YOLOv8s.
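For clarity, these metrics follow their standard object-detection definitions, where TP, FP, and FN denote true positives, false positives, and false negatives, AP_i is the average precision of class i (the area under its precision-recall curve), and N is the number of disease categories:

$$P = \frac{TP}{TP + FP}, \quad R = \frac{TP}{TP + FN}, \quad F1 = \frac{2PR}{P + R}, \quad mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i$$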
Table 4 compares the performance of the YOLOv8s and YOLO-BSMamba models in detecting the various tomato leaf diseases, offering a detailed evaluation of their detection capabilities in each category. Overall, YOLO-BSMamba demonstrates superior performance to YOLOv8s, exhibiting a higher mAP@0.5, P, and R across the majority of disease categories. The mAP@0.5 results show that YOLO-BSMamba surpasses YOLOv8s in all categories, with the most significant improvements in early blight and septoria detection, where the mAP@0.5 increases by 6.5% and 18.8%, respectively. Meanwhile, the P and R metrics for late blight, spider mites, and yellow leaf curl virus are slightly lower than those of YOLOv8s in some cases, while YOLO-BSMamba achieves superior performance across the remaining six disease categories, underscoring its robustness in disease-specific detection.
3.3. Visual Analysis
To visually illustrate the detection performance of the YOLO-BSMamba model, we analyzed it using a confusion matrix and feature heatmaps. The confusion matrix illustrates the model's classification accuracy, while the heatmaps show, in an intuitive visual form, where the model focuses its attention.
Figure 8 displays the normalized confusion matrices of YOLOv8s and YOLO-BSMamba. Darker cells on the main diagonal indicate higher detection accuracy, while darker off-diagonal cells indicate that the corresponding row and column classes are more frequently mistaken for each other. Overall, the YOLO-BSMamba model demonstrates superior detection accuracy across most disease categories compared to YOLOv8s. As can be observed from Figure 8b, the confusion matrix of YOLO-BSMamba shows darker diagonal elements than that of YOLOv8s, pointing to higher detection accuracy across most disease categories. YOLO-BSMamba also exhibits a lower rate of misclassifying the various categories as "background" than YOLOv8s, indicating superior performance in distinguishing the classes from the background. Although its performance in detecting spider mites is slightly lower than that of YOLOv8s, the difference is small and does not significantly affect the overall improvement. These results suggest that YOLO-BSMamba offers superior detection accuracy for most categories of tomato leaf disease, with a reduced misclassification rate, showcasing stronger generalization ability and making it more appropriate for complex and diverse agricultural disease detection applications. We also note confusion between some diseases; for example, the confusion between late blight and septoria may originate from the similar shapes of the lesion areas they produce on leaves. Moreover, under complex background interference, the model does not capture the boundaries and texture features of these lesion areas accurately enough, which affects classification accuracy.
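For readers reproducing this analysis, a row-normalized confusion matrix of the kind shown in Figure 8 can be computed as in the short sketch below; the labels are hypothetical stand-ins for matched detections, with an extra class index representing "background".

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical per-detection labels; class 3 stands in for "background"
# (a missed ground truth appears as true class -> predicted background).
y_true = np.array([0, 0, 1, 2, 2, 3, 1])
y_pred = np.array([0, 1, 1, 2, 3, 3, 1])

cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2, 3])
# Row-normalize: each cell is the fraction of a true class assigned to each prediction.
cm_norm = cm / cm.sum(axis=1, keepdims=True)
print(cm_norm.round(2))
```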
Figure 9 presents the original images (Figure 9a) alongside the feature heatmaps of YOLOv8s (Figure 9b) and YOLO-BSMamba (Figure 9c). The red regions in the heatmaps represent the areas on which the model focuses. The figure makes clear that the two models differ in their attention to the diseased areas.
In the heatmap for YOLOv8s shown in
Figure 9b, the distribution of high-attention areas is relatively sparse, with some lesion areas not being effectively marked and some background areas being falsely detected. This indicates that YOLOv8s is prone to interference when dealing with complex backgrounds, leading to a dispersion of attention on the diseased areas and a decrease in recognition accuracy. In contrast, the heatmap for YOLO-BSMamba in
Figure 9c demonstrates denser and more continuous high-attention areas, with the core regions and edge features of the diseased leaves captured more distinctly. This indicates that YOLO-BSMamba focuses its attention more tightly on the diseased areas in complex background scenarios, covers the lesion regions more comprehensively and accurately, and captures the boundaries and morphological details of the affected areas more effectively.
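The text does not specify how the heatmaps in Figure 9 were generated, so the following is a hedged, generic sketch of a hook-based activation heatmap (Grad-CAM-style tools are a common alternative); the tiny stand-in model, the hooked layer, and the random input are all illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in network and input; in practice these would be the trained detector
# and a real 640x640 image tensor.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.SiLU(),
                      nn.Conv2d(8, 8, 3, padding=1))
image = torch.randn(1, 3, 640, 640)

# Capture the activations of a late layer via a forward hook.
feats = {}
handle = model[2].register_forward_hook(lambda m, i, o: feats.update(map=o.detach()))
with torch.no_grad():
    _ = model(image)
handle.remove()

# Average channel activations, upsample to image size, and min-max normalize
# so the result can be overlaid on the input as a heatmap.
heat = feats["map"].mean(dim=1, keepdim=True)
heat = F.interpolate(heat, size=image.shape[-2:], mode="bilinear", align_corners=False)
heat = (heat - heat.min()) / (heat.max() - heat.min() + 1e-6)
```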
3.4. Ablation Experiments
To evaluate the influence of the introduced modules on the overall network performance, we carried out a series of ablation studies, and the results on the test set are presented in
Table 5.
Table 5 shows that introducing the HCMamba module alone yields increases of 0.4% in P, 1.2% in R, 0.8% in the F1 score, 2.7% in the mAP@0.5, and 1.6% in the mAP@0.5:0.95, indicating that the HCMamba module can significantly improve the model's general detection performance. When the SimAM attention mechanism is incorporated independently, the model achieves an mAP@0.5 of 0.842 and an mAP@0.5:0.95 of 0.693, improvements of 2.3% and 1.9%, respectively, while the F1 score rises to 0.804. This enhancement is likely attributable to SimAM's ability to suppress non-target information in complex backgrounds, thereby strengthening the representation of critical features. When both the HCMamba module and SimAM are incorporated, the model improves across all evaluation metrics: the F1 score increases to 0.811, and the mAP@0.5 and mAP@0.5:0.95 reach 0.860 and 0.710, respectively. These results suggest that the two modules complement each other: SimAM suppresses non-target information, while HCMamba excels at extracting critical features, together improving the capacity for feature representation. When all three modules (HCMamba, SimAM, and BiFPN) are integrated, the model achieves its best overall performance: P increases to 0.858, the F1 score reaches 0.819, and the mAP@0.5 and mAP@0.5:0.95 improve to 0.867 and 0.720, respectively. Through bidirectional cross-scale connections and weighted feature fusion, BiFPN ensures comprehensive integration of features across levels and scales, effectively mitigating the loss of critical feature information. The integration of BiFPN, HCMamba, and SimAM further improves the model's capacity to detect leaf diseases in complex scenarios.
To further assess the effectiveness of the HCMamba module for tomato leaf disease detection, we compared it experimentally with the Swin Transformer module. The Swin Transformer, an advanced transformer-based model, excels in many computer vision tasks; it has strong feature extraction and representation capabilities, making it suitable for complex image data and for capturing long-range dependencies. In the experiment, we replaced the HCMamba module in the neck network with a Swin Transformer module, and the experimental results are shown in
Table 6.
The experimental results show that HCMamba outperforms the Swin Transformer in recall, F1 score, and mean average precision. Specifically, HCMamba's F1 score of 0.819 surpasses the Swin Transformer's 0.797, indicating a better balance between precision and recall. In addition, HCMamba achieves an mAP@0.5 of 0.846 and an mAP@0.5:0.95 of 0.693, surpassing the Swin Transformer's 0.839 and 0.676, respectively, demonstrating its capability for accurate lesion localization and classification across different confidence thresholds.
3.5. Performance Comparison with YOLO Series Models
A comparative evaluation framework was implemented to assess the YOLO-BSMamba model's performance against other YOLO series models. In the experiment, YOLOv5s, YOLOv6s, YOLOv7-tiny, YOLOv8s, YOLOv9s, YOLOv10s, and YOLOv11s were used as comparison models, with the same datasets and experimental parameters for training and testing. The experimental results are shown in
Table 7.
As evidenced in
Table 7, the YOLO-BSMamba model demonstrates superior performance compared to the other models, achieving strong results in P, R, F1 score, mAP@0.5, and mAP@0.5:0.95. YOLO-BSMamba achieved a P of 0.858, higher than all other models under comparison. For recall (R), it achieved 0.784, also a strong result. Moreover, its F1 score is notably superior to those of the comparative models, suggesting that the model not only identifies targets accurately but also minimizes omissions. Regarding the mAP@0.5, YOLO-BSMamba achieved 0.864, outperforming all other models. For the mAP@0.5:0.95, YOLO-BSMamba surpassed all models except YOLOv9s, indicating a comparable level of detection accuracy. These results indicate that YOLO-BSMamba strikes a superior balance in comprehensive performance, establishing its competitiveness and practicality for tomato disease detection.
To more clearly evaluate the accuracy of all compared models for tomato disease detection, we compared the mAP@0.5 of each model across all categories, which comprehensively reflects each model's accuracy on every category.
Table 8 presents the mAP@0.5 for the detection of the various diseases across all compared models, with the best performance shown in bold. As the table shows, our proposed YOLO-BSMamba model achieved the highest mAP@0.5 for early blight, healthy, leaf miner, mosaic virus, and septoria. Although its detection accuracy for late blight, leaf mold, spider mites, and yellow leaf curl virus is somewhat lower than that of more recent models such as YOLOv9s, YOLOv10s, and YOLOv11s, YOLO-BSMamba remains the best in overall performance, suggesting that it maintains a competitive advantage in tomato disease detection.
4. Discussion
This study proposed the YOLO-BSMamba model for tomato leaf disease detection in complex background scenarios. The experimental results show that the model's precision, recall, and mean average precision (mAP) are all significantly improved, confirming the model's potential for agricultural applications.
Through ablation experiments, we verified the effectiveness of the HCMamba module, the SimAM attention mechanism, and BiFPN in improving the model's performance. Compared to other models, YOLO-BSMamba has several distinct advantages. Unlike traditional CNN-based models, which struggle to capture long-range dependencies and global context in complex background scenarios, YOLO-BSMamba leverages the SSM within the HCMamba module to model these dependencies effectively, leading to more accurate disease localization and classification. SimAM allows the model to focus on disease-related regions, lessening the influence of complex backgrounds, and BiFPN ensures comprehensive integration of features across scales, improving the model's sensitivity to diseases of different severities and morphologies. YOLO-BSMamba's improved accuracy and generalization ability make it a promising tool for precision agriculture tasks such as disease detection and targeted pesticide application.
Although these components jointly enhance the model's performance, the detection accuracy for certain diseases, such as spider mites and yellow leaf curl virus, remains relatively low. This indicates that the model is inefficient at extracting features of subtle morphological patterns: spider mite infestations present minute chlorotic spots that demand correspondingly fine-scale features, while yellow leaf curl virus manifests as curled leaves whose symptomatic regions occupy a relatively small effective pixel area in the image. There is also confusion between some disease categories; for example, late blight and gray mold show a degree of mutual misclassification. These findings suggest that future model optimization should enhance the ability to distinguish between similar disease characteristics, for instance by introducing more suitable feature-extraction modules or expanding the training data, to improve the model's robustness and accuracy on similar-disease classification tasks.
5. Conclusions
In this study, we proposed the YOLO-BSMamba model, which incorporates the HCMamba module, the SimAM attention mechanism, and BiFPN to address the challenges of detecting tomato leaf diseases in complex background scenarios. Through extensive experiments and comparisons with state-of-the-art models, YOLO-BSMamba demonstrated strong performance across multiple evaluation metrics. On the tomato leaf disease dataset we constructed, YOLO-BSMamba significantly improved precision, recall, F1 score, mAP@0.5, and mAP@0.5:0.95, with respective gains of 3.0%, 3.1%, 3.0%, 4.8%, and 4.3% over YOLOv8s. The ablation experiments further verified the effectiveness of each module and showed that the combination of HCMamba, SimAM, and BiFPN synergistically improves the overall performance of the model. The HCMamba module effectively extracts both fine-grained and coarse-grained features, significantly enhancing the model's capacity to distinguish subtle disease patterns from background noise. The SimAM attention mechanism further refines the focus on disease-relevant regions, while the BiFPN module facilitates efficient multi-scale feature fusion, ensuring reliable detection across diverse scales and complexities. The weighting mechanisms currently used in BiFPN are mainly softmax-based fusion and fast normalized fusion (a minimal sketch of the latter is given below). Although BiFPN has achieved remarkable results through its weighted feature-fusion mechanism, introducing a learnable attention-based weighting mechanism could further optimize feature integration; future research could explore incorporating such a mechanism, since integrating attention into BiFPN's weighting process would allow the model to adaptively learn the importance of different feature paths.
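As a reference for the discussion above, the following is a minimal sketch of EfficientDet-style fast normalized fusion, the non-softmax weighting commonly used in BiFPN; the class name and the default epsilon are illustrative.

```python
import torch
import torch.nn as nn

class FastNormalizedFusion(nn.Module):
    """Fuse same-shape feature maps with learnable non-negative weights
    normalized by their sum (EfficientDet's fast normalized fusion)."""
    def __init__(self, num_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, *feats: torch.Tensor) -> torch.Tensor:
        w = torch.relu(self.weights)       # keep fusion weights non-negative
        w = w / (w.sum() + self.eps)       # cheap normalization without softmax
        return sum(wi * f for wi, f in zip(w, feats))
```

For example, `FastNormalizedFusion(2)(p4_td, p4_in)` would fuse a top-down feature with its same-scale lateral input, and the attention-based variant suggested above would replace these scalar weights with weights predicted from the features themselves.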
Compared to other models, YOLO-BSMamba has several distinct advantages; as discussed above, leveraging the SSM within the HCMamba module lets it capture the long-range dependencies and global context that traditional CNN-based models struggle with, leading to more accurate disease localization and classification. The architectural design of YOLO-BSMamba also possesses a degree of universality, making it potentially applicable to detection tasks for other crop diseases. To verify this universality, future research could conduct experiments on other crop disease datasets and investigate whether the model can be applied, via fine-tuning, to diseases of other crops such as cotton, wheat, or corn. Future research will also focus on optimizing the model architecture through techniques such as pruning and quantization to reduce computational costs. Furthermore, we expect to enhance the model's adaptability to specific diseases, such as spider mites and yellow leaf curl virus, through further optimization of the model structure or adjustment of training strategies, with the aim of achieving even higher detection accuracy and broader applicability in agricultural disease detection.