Article

FishMambaNet: A Mamba-Based Vision Model for Detecting Fish Diseases in Aquaculture

College of Artificial Intelligence, Zhongkai University of Agriculture and Engineering, Guangzhou 510225, China
*
Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Fishes 2025, 10(12), 649; https://doi.org/10.3390/fishes10120649
Submission received: 18 November 2025 / Revised: 13 December 2025 / Accepted: 15 December 2025 / Published: 16 December 2025
(This article belongs to the Special Issue Application of Artificial Intelligence in Aquaculture)

Abstract

The growth of aquaculture poses significant challenges for disease management, impacting economic sustainability and global food security. Traditional diagnostics are slow and require expertise, while current deep learning models, including CNNs and Transformers, face a trade-off between capturing global symptom context and maintaining computational efficiency. This paper introduces FishMambaNet, a novel framework that integrates selective state space models (SSMs) with convolutional networks for accurate and efficient fish disease diagnosis. FishMambaNet features two core components: the Fish Disease Detection State Space block (FSBlock), which models long-range symptom dependencies via SSMs while preserving local details with gated convolutions, and the Multi-Scale Convolutional Attention (MSCA) mechanism, which enriches multi-scale feature representation with low computational cost. Experiments demonstrate state-of-the-art performance, with FishMambaNet achieving a mean Average Precision at 50% Intersection over Union (mAP@50) of 86.7% using only 4.3 M parameters and 10.7 GFLOPs, significantly surpassing models like YOLOv8-m and RT-DETR. This work establishes a new paradigm for lightweight, powerful disease detection in aquaculture, offering a practical solution for real-time deployment in resource-constrained environments.
Key Contribution: This paper pioneers the integration of selective state space models into aquatic animal health monitoring, introducing the novel FSBlock and MSCA modules that collaboratively achieve global dependency modeling and multi-scale feature extraction with linear computational complexity, setting a new benchmark for efficient and accurate fish disease detection.

1. Introduction

Aquaculture is one of the fastest-growing global food production sectors, accounting for a substantial share of the world’s edible fish and serving as the primary source of seafood, thereby playing a crucial role in food security [1,2,3]. However, this rapid expansion poses major disease management challenges that threaten its economic sustainability and undermine its role in global food security. According to the United Nations Food and Agriculture Organization (FAO), the aquaculture industry incurs substantial annual losses from aquatic animal diseases [4]. The most prevalent and damaging diseases in farmed fish encompass bacterial infections (e.g., aeromoniasis and vibriosis); fungal infections, notably saprolegniasis; parasitic infections (e.g., white spot disease and dactylogyrosis); and the rapidly spreading white tail disease [5,6,7]. The intensive nature of modern aquaculture—characterized by high-density farming in confined environments—creates ideal conditions for disease transmission, frequently resulting in catastrophic economic impacts [8].
Traditional diagnosis of fish diseases relies primarily on visual inspection by experienced veterinarians, supplemented by laboratory microscopy [9]. Although established, these methods have several inherent limitations: the process is time-consuming, causing critical delays in outbreak response; diagnostic accuracy is highly dependent on the expertise of the practitioner; and a global shortage of aquaculture health specialists hinders effective disease surveillance. These limitations are especially critical for diseases like white tail disease, whose rapid progression and highly contagious nature require immediate intervention to prevent mass mortality [10].
The emergence of computer vision and artificial intelligence has opened new pathways for automated fish disease diagnosis, with technical approaches evolving through three distinct phases. Initial research employed traditional machine learning, relying on handcrafted features with classifiers like Support Vector Machines (SVM) and Random Forests [11,12,13]. While promising in controlled settings, these methods struggled in practical aquaculture due to variations in water quality, lighting, and fish behavior. The advent of deep learning, particularly Convolutional Neural Networks (CNNs) [14,15], marked a significant advancement. End-to-end frameworks like Faster R-CNN and YOLO substantially improved detection accuracy [16,17,18,19,20,21,22,23,24]. For instance, Huang et al. developed CNN-OSELM [25], a multi-layer fusion network with an attention mechanism for precise disease identification; Sanjay Kumaar et al. proposed FishNet [26], increasing freshwater fish disease detection accuracy by 2%; Yu et al. introduced MobileNet3-GELU-YOLOv4 [27], achieving a 12.39% mAP increase and a 19.31 frames per second (FPS) speed boost over YOLOv4 with fewer parameters; Li et al. proposed YOLO-FD, extending YOLOv8 with a semantic segmentation branch for lesion localization [27]; and Wu et al. addressed limited data and symptom similarity with YOLOv11-SDiseasedFishNet, attaining 94.8% mAP, 93.9% recall, and 97.1% precision [28]. However, CNNs are constrained by limited receptive fields, hindering their ability to capture long-range spatial dependencies of symptoms distributed across a fish’s body—a critical factor for diagnosing diseases with correlated symptom patterns. To address this, researchers have turned to Transformer architectures that leverage self-attention to model global contexts [29]. Nath et al. combined Vision Transformer (ViT) with CNNs to integrate global analysis and local features, enhancing classification efficiency and accuracy [30]. Bhattacharjee et al. achieved state-of-the-art 97.92% accuracy with a ViT-based model, surpassing CNNs [31]. Alluhaidan et al. developed an enhanced Swin Transformer for automated pathogen detection [32]. Despite excelling at capturing comprehensive symptom distributions, these Transformer-based methods face a major obstacle: their quadratic computational complexity makes real-time processing of high-resolution underwater images prohibitive, presenting a significant implementation barrier in resource-constrained aquaculture environments.
The recent advent of SSMs [33,34], exemplified by the Mamba architecture [35], presents a promising alternative to Transformers by combining comparable long-range dependency modeling with linear computational complexity. This approach efficiently processes sequential data while maintaining a global receptive field. VMamba [36] is a vision-centric adaptation of the Mamba state space model, which extends the 1D selective scan to 2D for efficient long-range dependency modeling in images. These characteristics are well-suited to aquatic disease diagnosis. However, the potential of SSMs in aquatic animal health monitoring remains largely unexplored, with no existing research systematically investigating their application to fish disease detection in aquaculture. In addition, while not specifically designed for aquatic disease diagnosis, general computer vision research has increasingly focused on improving model robustness in complex environments. For instance, methodologies from robust feature learning in complex scene parsing [37], and geometric-attack-resistant image representation [38], provide relevant strategies for handling environmental interference and feature distortion. These works offer valuable cross-domain insights that could inform future improvements in the environmental adaptability and stability of aquatic vision models. Building on these perspectives—and to bridge the identified gap while overcoming the limitations of existing approaches—this paper introduces FishMambaNet, a novel detection framework that strategically integrates selective state space models with convolutional networks.
The framework introduces three key architectural innovations:
  • The FSBlock integrates a VMamba-based (2D Selective Scan) SS2D module for capturing global disease patterns with a GCBlock’s convolutional operations for extracting local symptomatic features, stabilized through residual learning.
  • The MSCA mechanism employs attention branches to efficiently capture contextual information, utilizing parallel partial convolutions and channel splitting to gather multi-scale information while maintaining computational efficiency.
  • The overall network architecture strategically positions the FSBlock and MSCA modules at critical backbone and neck stages, thereby significantly enhancing feature representation.
This study makes three primary contributions: (1) It pioneers the comprehensive application of selective state space models to fish disease diagnosis, establishing a new paradigm that synergizes SSMs and CNNs for aquatic health monitoring. (2) It introduces two novel components—the FSBlock and the MSCA mechanism—that collaboratively address the challenges of global dependency modeling and multi-scale feature extraction in underwater environments. (3) Through extensive experiments, it validates FishMambaNet’s state-of-the-art performance in multi-category fish disease detection, achieving a mAP@50 of 86.7% with computational efficiency suitable for real-world deployment (4.3 M parameters, 10.7 GFLOPs), offering a practical solution for commercial aquaculture. The remainder of this paper is organized as follows: Section 2 details the dataset construction and the FishMambaNet architecture; Section 3 presents the experimental setup and results; Section 4 discusses practical applications and limitations; and Section 5 provides concluding remarks and future directions.

2. Materials and Methods

2.1. Datasets

Data Collection and Composition

The experimental dataset was systematically constructed from two primary sources: local aquaculture farms in Zhenlong Town, Xinyi City, Maoming, Guangdong Province (22.35° N, 110.94° E)—a region representative of intensive aquaculture in southern China—and publicly available online images of diseased fish. Data collection spanned from November 2024 to August 2025, thereby encompassing a complete production cycle and capturing seasonal variations. The dataset composition is illustrated in Figure 1.
The dataset comprises five annotated categories: Bacterial Diseases, Fungal Diseases, Healthy Fish, Parasitic Diseases, and White Tail Diseases. In total, 1628 bounding boxes were annotated across 927 images, which were partitioned into training (725 images), validation (96 images), and testing (106 images) sets. The distribution of bounding boxes per category is detailed in Table 1.
Figure 2 presents a multi-faceted analysis of the training set. The bar chart (top-left) illustrates the instance distribution per category, revealing a significant class imbalance where “Healthy Fish” samples substantially outnumber diseased categories. The grid (top-right) visualizes the clustering results of bounding box dimensions in the training set, which define the initial prior boxes (anchors) for the detection model. These nine anchor sizes are automatically learned to better match the typical scale and aspect ratio of fish and lesions in our dataset. The scatter plots below show the concentration of bounding box centers near the image center and the distribution of their width-height ratios, respectively. These analyses confirm consistent patterns in target position and scale, thereby validating the dataset’s construction quality.
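The anchor estimation described above can be sketched as follows. This is a hedged illustration only: the paper states that nine prior boxes are learned from the training-set box dimensions but does not give the exact procedure, so the IoU-distance k-means below (in the style of YOLO anchor estimation) and all numeric values are assumptions.

```python
import numpy as np

def kmeans_anchors(boxes, k=9, iters=50, seed=0):
    """Cluster (width, height) pairs into k anchor priors using 1 - IoU as
    the distance, as popularized by YOLO-style anchor estimation (sketch)."""
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), k, replace=False)].astype(float)
    area_b = boxes[:, 0] * boxes[:, 1]
    for _ in range(iters):
        # IoU between every box and every anchor, assuming shared top-left corner
        inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0]) *
                 np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
        area_a = anchors[:, 0] * anchors[:, 1]
        iou = inter / (area_b[:, None] + area_a[None, :] - inter)
        assign = iou.argmax(axis=1)          # each box joins its best anchor
        for j in range(k):
            members = boxes[assign == j]
            if len(members):
                anchors[j] = members.mean(axis=0)
    return anchors
```

With well-separated size clusters, the recovered anchors settle near the per-cluster mean dimensions, which then serve as the detector's initial priors.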

2.2. State Space Models

SSMs, which form the basis for architectures like Mamba, originate from linear time-invariant systems. The core principles of these models are built upon classical continuous systems, which map a one-dimensional input sequence $x(t) \in \mathbb{R}$ to an output $y(t) \in \mathbb{R}$ through a hidden state $h(t) \in \mathbb{R}^{N}$. This mapping is typically described by a set of linear ordinary differential equations (ODEs):

$$h'(t) = A\,h(t) + B\,x(t)$$
$$y(t) = C\,h(t)$$

where $A \in \mathbb{R}^{N \times N}$ is the state matrix, while $B \in \mathbb{R}^{N \times 1}$ and $C \in \mathbb{R}^{1 \times N}$ are the projection parameters.
The S4 and Mamba models bridge the gap between these continuous-time systems and deep learning through discretization. A key step in this process is the introduction of a timescale parameter $\Delta$, which transforms the continuous parameters $A$ and $B$ into their discrete counterparts $\bar{A}$ and $\bar{B}$. This transformation is accomplished using the zero-order hold (ZOH) discretization rule:

$$\bar{A} = \exp(\Delta A)$$
$$\bar{B} = (\Delta A)^{-1}\,(\exp(\Delta A) - I) \cdot \Delta B$$

After discretization, Equation (1) can be rewritten as:

$$h_t = \bar{A}\,h_{t-1} + \bar{B}\,x_t$$
$$y_t = C\,h_t$$

The output sequence $y$ can equivalently be computed via a global convolution between the input sequence $x$ (of length $L_x$) and a structured kernel $\bar{K}$:

$$\bar{K} = \left(C\bar{B},\; C\bar{A}\bar{B},\; \ldots,\; C\bar{A}^{L_x - 1}\bar{B}\right)$$
$$y = x * \bar{K}$$

where $\bar{K} \in \mathbb{R}^{L_x}$ is a structured convolutional kernel.
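The ZOH discretization and the equivalence between the recurrent and convolutional forms can be checked with a minimal NumPy sketch. Assuming a toy diagonal state matrix $A = \mathrm{diag}(a)$ (as in S4D/Mamba-style models) so the matrix exponential reduces to an elementwise one; all values are illustrative:

```python
import numpy as np

def zoh_discretize(a, b, delta):
    """ZOH for diagonal A = diag(a): A_bar = exp(dA), B_bar = (dA)^-1(exp(dA)-I)dB."""
    A_bar = np.exp(delta * a)
    B_bar = (A_bar - 1.0) / a * b          # (delta*a)^-1 (exp(delta*a)-1) * delta*b
    return A_bar, B_bar

def ssm_scan(a, b, c, delta, x):
    """Recurrent form: h_t = A_bar h_{t-1} + B_bar x_t,  y_t = c . h_t."""
    A_bar, B_bar = zoh_discretize(a, b, delta)
    h = np.zeros_like(a)
    ys = []
    for xt in x:
        h = A_bar * h + B_bar * xt
        ys.append(c @ h)
    return np.array(ys)

def ssm_conv(a, b, c, delta, x):
    """Equivalent global convolution with kernel K_bar = (cB_bar, cA_barB_bar, ...)."""
    A_bar, B_bar = zoh_discretize(a, b, delta)
    L = len(x)
    K = np.array([c @ (A_bar ** k * B_bar) for k in range(L)])
    return np.convolve(x, K)[:L]           # causal convolution y = x * K_bar
```

Running both paths on the same input produces identical outputs, which is what lets SSMs train convolutionally yet run recurrently at inference.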

2.3. Overall Architecture

FishMambaNet adopts a YOLOv8-based architecture, integrating custom modules into its backbone and neck to form an efficient multi-scale detection system. The backbone network extracts hierarchical features, beginning with two standard convolutional layers for initial downsampling and channel expansion. The innovative FSBlock is then systematically integrated across four different scales. Each FSBlock captures both global disease patterns and local lesion details by combining an SS2D module with a Gated Convolution Block (GCBlock). Subsequently, feature maps undergo multi-scale context aggregation via the SPPF module and are refined by the MSCA mechanism. In the neck network, a feature pyramid structure fuses high-resolution shallow features with high-semantic deep features through upsampling and concatenation. These fused features are processed by the C2f module, ultimately producing three feature maps of different scales for the detection head. This multi-scale design enables the simultaneous detection of lesions of various sizes. The integration of the custom FSBlock and MSCA modules significantly enhances the accuracy and robustness of multi-category fish disease diagnosis in complex underwater environments. The overall architecture is depicted in Figure 3.

2.4. Fish Disease Detection State Space Block

The FSBlock is architected to address two complementary objectives: modeling global disease distributions and extracting local lesion features. This dual design is motivated by the clinical presentation of fish diseases. For example, symptoms of white tail disease—such as white spots or ulcers—often distribute spatially across the fish’s body, forming correlated patterns that necessitate holistic analysis for accurate diagnosis. Traditional CNNs, constrained by limited receptive fields, typically fail to capture these long-range dependencies, resulting in incomplete symptom recognition. The integrated SS2D module overcomes this limitation by providing a global receptive field with linear computational complexity, enabling efficient contextual modeling across the entire image through selective state space mechanisms. This capability allows the model to recognize interacting symptom patterns across different body regions, significantly enhancing diagnostic precision for systemic conditions. Conversely, identifying localized pathologies like scale shedding morphology or gill congestion requires specialized local feature extraction. This function is handled by the GCBlock, which captures fine-grained details through its dual-path architecture combining depthwise convolutions with gating mechanisms. Within a residual learning framework, the synergistic operation of SS2D and GCBlock produces comprehensive feature representations spanning both micro-level details and macro-level patterns, making FSBlock particularly suited for the complex multi-scale nature of aquatic disease manifestations. The specific structure is shown in Figure 4.
Within the FSBlock, the input feature map $X \in \mathbb{R}^{B \times C \times H \times W}$ is first processed by a projection layer composed of a 1 × 1 convolution, batch normalization, and a SiLU activation. This layer adjusts the channel dimension to a unified hidden size and prepares the features for subsequent operations. The resulting features are then fed into a GCBlock, which enhances local feature representation through its gating mechanism, yielding the output $X_{pre}$ prior to SS2D-based global modeling:

$$X_{proj} = \mathrm{SiLU}(\mathrm{BN}(\mathrm{Conv}_{1\times1}(X)))$$
$$X_{pre} = \mathrm{GCBlock}(X_{proj})$$
Subsequently, $X_{pre}$ undergoes normalization before being input to the SS2D module. As the core component implementing the selective state space model, SS2D captures the long-range dependencies crucial for fish disease detection while maintaining the linear computational complexity of the S6 block. The SS2D module comprises three core components: a scan expansion operation, the S6 module, and a scan merging operation (see Figure 4). The scan expansion operation unfolds the 2D feature map into sequences along four distinct directions (top-left to bottom-right, bottom-right to top-left, top-right to bottom-left, and bottom-left to top-right). The S6 module, which originates from Mamba, then processes these sequences; by dynamically adjusting the SSM parameters based on the input, it adds a selection mechanism on top of S4 that retains critical information while filtering out irrelevant content. Finally, the scan merging operation integrates the features from the four directional sequences to reconstruct a 2D output feature map with dimensions matching the input. The pseudocode for the S6 module is detailed in Algorithm 1.
Algorithm 1 Pseudo-code for the S6 block in SS2D
Input: x, the feature with shape [B, L, D] (batch size, token length, dimension)
Params: A, an nn.Parameter; D, an nn.Parameter
Operator: Linear(·), the linear projection layer
Output: y, the feature with shape [B, L, D]
1: Δ, B, C = Linear(x), Linear(x), Linear(x)
2: Ā = exp(ΔA)
3: B̄ = (ΔA)⁻¹(exp(ΔA) − I) · ΔB
4: hₜ = Ā hₜ₋₁ + B̄ xₜ
5: yₜ = C hₜ + D xₜ
6: y = [y₁, y₂, …, yₜ, …, y_L]
7: return y
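Algorithm 1 can be made concrete with a small NumPy implementation. The softplus parameterization of Δ, the diagonal per-channel state matrix A, and the exact shapes below are assumptions following common Mamba implementations, not details given in the paper:

```python
import numpy as np

def s6_block(x, W_delta, W_B, W_C, A, D):
    """Selective scan sketch: Delta, B, C are input-dependent projections.
    x: (L, d) tokens; A: (d, N) diagonal state per channel; D: (d,) skip."""
    L, d = x.shape
    delta = np.log1p(np.exp(x @ W_delta))        # softplus -> positive timescale, (L, d)
    B = x @ W_B                                  # (L, N), input-dependent
    C = x @ W_C                                  # (L, N), input-dependent
    h = np.zeros((d, A.shape[1]))
    ys = np.empty((L, d))
    for t in range(L):
        dA = delta[t][:, None] * A               # (d, N)
        A_bar = np.exp(dA)                       # ZOH, diagonal A
        B_bar = (A_bar - 1.0) / A * B[t][None, :]
        h = A_bar * h + B_bar * x[t][:, None]    # selective recurrence
        ys[t] = h @ C[t] + D * x[t]              # readout plus skip connection
    return ys
```

Because Δ, B, and C are functions of the input tokens, the recurrence can amplify or suppress individual positions, which is the selection mechanism the text describes.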
Within the FSBlock, the output of the SS2D module, denoted $X_{SS2D}$, carries the captured global contextual information. It is then projected and activated before a second GCBlock, and the result is fused with $X_{SS2D}$ via a residual connection to enhance the model’s nonlinear representational capacity. This process is formally defined as:

$$X_{SS2D} = \mathrm{SS2D}(\mathrm{BN}(X_{pre}))$$
$$X_{out} = \mathrm{GCBlock}(\mathrm{SiLU}(\mathrm{Conv}_{1\times1}(X_{SS2D}))) + X_{SS2D}$$
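The four-directional scan expansion and merging used by SS2D can be sketched as follows. Treating the four directions as row- and column-major traversals, each forward and reversed, is an assumption based on VMamba's cross-scan; the per-sequence S6 processing between expand and merge is omitted here:

```python
import numpy as np

def scan_expand(feat):
    """Unfold an (H, W, C) feature map into four 1D token sequences,
    one per scan direction (row-/column-major, forward and reversed)."""
    H, W, C = feat.shape
    row = feat.reshape(H * W, C)                     # row-major, forward
    col = feat.transpose(1, 0, 2).reshape(W * H, C)  # column-major, forward
    return [row, row[::-1], col, col[::-1]]

def scan_merge(seqs, H, W):
    """Fold the four (processed) sequences back to (H, W, C) and sum them,
    so every position aggregates context from all four directions."""
    C = seqs[0].shape[-1]
    m1 = seqs[0].reshape(H, W, C)
    m2 = seqs[1][::-1].reshape(H, W, C)
    m3 = seqs[2].reshape(W, H, C).transpose(1, 0, 2)
    m4 = seqs[3][::-1].reshape(W, H, C).transpose(1, 0, 2)
    return m1 + m2 + m3 + m4
```

With an identity sequence operation in place of S6, merging simply recovers four aligned copies of the input, confirming that the unfold/fold pair is shape- and position-consistent.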

2.5. Multi-Scale Convolutional Attention Model

To tackle the challenge of diverse fish disease symptoms—which vary in scale, morphology, and spatial distribution within underwater environments—this study introduces an MSCA mechanism. This module acts as an efficient hybrid unit that integrates local detailed features with global contextual information. It employs parallel multi-scale convolutional branches and a global attention branch to enhance the model’s ability to discriminate between symptoms in complex backgrounds. The design incorporates lightweight partial convolution (PConv) to significantly improve feature representation while maintaining low computational overhead. The structure of the MSCA mechanism is detailed in Figure 5.
The core of the MSCA design consists of parallel processing paths. For an input feature map $X \in \mathbb{R}^{B \times C \times H \times W}$, the module first projects the input into query ($Q$), key ($K$), and value ($V$) tensors using a lightweight generation layer:

$$Q, K, V = \mathrm{Split}(\mathrm{DWConv}_{3\times3}(\mathrm{Conv}_{1\times1}(X)))$$
This generation layer employs depthwise separable convolution, which embeds local spatial context directly into Q , K , and V during their formation, thereby providing a more discriminative foundational representation for subsequent attention computation.
$$\hat{Q}, \hat{K} = \mathrm{L2Normalize}(\mathrm{Reshape}(Q)),\; \mathrm{L2Normalize}(\mathrm{Reshape}(K))$$
The scaled dot-product attention is then calculated. A key innovation is the inclusion of a learnable temperature parameter $T \in \mathbb{R}^{H \times 1 \times 1}$ that adaptively sharpens the attention distribution:
$$F_{attn} = \mathrm{Softmax}\!\left(\frac{\hat{Q}\hat{K}^{\top} \cdot T}{\sqrt{d_k}}\right) V$$
Here, $d_k$ denotes the dimension of the key vectors, and the $1/\sqrt{d_k}$ scaling keeps the dot products from growing so large that the softmax saturates and destabilizes training.
This attention mechanism enables the model to identify and reinforce spatially dispersed but semantically related disease manifestations—such as symptoms co-occurring in gills and tail fins—which is crucial for accurate disease diagnosis.
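A small NumPy sketch of the temperature-scaled attention above. The single-head 2D tensor shapes, the scalar temperature, and its exact placement in the logits are simplifying assumptions for illustration:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # numerically stable softmax
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def l2norm(z, axis=-1, eps=1e-8):
    return z / (np.linalg.norm(z, axis=axis, keepdims=True) + eps)

def msca_attention(Q, K, V, T=1.0):
    """Dot-product attention over L2-normalized Q and K with a temperature T
    that sharpens the distribution as it grows. Q, K, V: (L, d_k).
    Returns the attended output and the attention weights."""
    d_k = Q.shape[-1]
    logits = (l2norm(Q) @ l2norm(K).T) * T / np.sqrt(d_k)
    attn = softmax(logits)
    return attn @ V, attn
```

Raising the temperature concentrates each row of the attention matrix on fewer positions, which is the sharpening effect the learnable parameter provides.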
The multi-scale convolutional branch employs parallel convolutional operations with different receptive fields to capture features ranging from local to contextual scales. Specifically: Branch 1 uses three consecutive 3 × 3 partial convolutions to capture fine-grained local features like individual white spots or tiny blood vessels. Branch 2 employs three 3 × 3 partial convolutions with a dilation rate of 2, expanding the receptive field to extract intermediate-scale patterns such as localized color anomalies. Branch 3 utilizes three 3 × 3 partial convolutions with a dilation rate of 3, further enlarging the receptive field to capture broader contextual information like large-scale ulcer distributions.
All branches utilize PConv to maintain computational efficiency. The core principle of PConv leverages the inherent redundancy between feature map channels [39]. In this implementation, each parallel PConv processes only a subset of input channels while preserving the remainder unchanged. Compared to standard convolutions, this approach significantly reduces floating-point operations (FLOPs) and memory access, substantially lowering computational complexity while maintaining information integrity through identity connections. This achieves an optimal balance between computational efficiency and feature representation. The resulting architecture is particularly suitable for resource-constrained environments, enabling efficient high-precision detection of fish disease manifestations. The detailed structure of this multi-scale convolutional branch is presented in Figure 6.
The outputs from the three parallel branches are concatenated and fused using a 1 × 1 convolution, generating comprehensive multi-scale convolutional features $F_{conv} \in \mathbb{R}^{B \times C \times H \times W}$. These features are then combined with the original input through a residual connection:

$$F_{conv} = \mathrm{Conv}_{1\times1}\!\left(\mathrm{Concat}\!\left(\mathrm{PConv}_{d=1}(X),\; \mathrm{PConv}_{d=2}(X),\; \mathrm{PConv}_{d=3}(X)\right)\right) + X$$

where $d$ denotes the dilation rate of each branch.
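The PConv principle can be sketched in NumPy as follows. Convolving only the channels covered by the weight tensor and passing the rest through unchanged follows the FasterNet-style partial convolution described above; the naive loops (rather than an optimized conv) and the specific channel split are illustrative assumptions:

```python
import numpy as np

def pconv(x, weight):
    """Partial 3x3 convolution: convolve only the first n_active channels
    (given by weight.shape[0]) with 'same' zero padding; the remaining
    channels pass through unchanged (the identity path)."""
    n_active = weight.shape[0]          # weight: (n_active, n_active, 3, 3)
    C, H, W = x.shape
    out = x.copy()                      # identity for channels >= n_active
    xp = np.pad(x[:n_active], ((0, 0), (1, 1), (1, 1)))
    for o in range(n_active):
        acc = np.zeros((H, W))
        for i in range(n_active):
            for u in range(3):
                for v in range(3):
                    acc += weight[o, i, u, v] * xp[i, u:u + H, v:v + W]
        out[o] = acc
    return out
```

Because only a channel subset is convolved, FLOPs scale with (n_active / C)² of a standard convolution while the untouched channels preserve information exactly, which is the efficiency/integrity trade-off the text describes.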
Finally, the MSCA module aggregates the outputs from both processing paths through residual connections:
$$Y = \mathrm{Conv}_{1\times1}(F_{attn}) + F_{conv}$$
Here, $F_{attn}$ denotes the output of the attention branch, and $F_{conv}$ the output of the multi-scale convolutional branch.
In FishMambaNet, the MSCA module acts as the final feature refinement stage, integrating and enhancing multi-level features from the backbone. It produces high-level semantic representations that fuse rich detail with global context, thereby significantly boosting the model’s accuracy and robustness in detecting multi-scale fish diseases within complex underwater environments.

3. Results

3.1. Experimental Environment and Evaluation Metrics

3.1.1. Experimental Environment

To ensure experimental reproducibility, we provide a detailed specification of the computational environment in Table 2, as results in deep learning can be influenced by specific hardware and software configurations.
The hyperparameters were configured as follows: 200 training epochs, a batch size of 8, stochastic gradient descent (SGD) optimization with a learning rate of 0.01 and a weight decay coefficient of 0.0005, and 8 workers for data loading.

3.1.2. Evaluation Metrics

The model’s performance is evaluated using standard object detection metrics: precision, recall, mean Average Precision (mAP), and the F1 score. Their formal definitions are provided below.
$$P = \frac{TP}{TP + FP}$$
$$R = \frac{TP}{TP + FN}$$
$$AP = \int_0^1 P(R)\,dR, \qquad mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i$$
$$F1 = \frac{2 \times P \times R}{P + R}$$
Precision quantifies the proportion of correctly identified positive predictions among all instances predicted as positive, while recall measures the proportion of actual positives that were successfully identified. The Average Precision (AP) metric, derived from the area under the precision-recall curve, provides a comprehensive assessment of a model’s performance for a single class. The mAP extends this by computing the mean of AP values across all classes, serving as the primary indicator of overall detection accuracy in multi-class scenarios. The F1 score, as the harmonic mean of precision and recall, offers a balanced single metric. Collectively, these metrics constitute a rigorous framework for evaluating detection performance from multiple perspectives.
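As a concrete check of these definitions, a minimal Python sketch; the counts and scores below are hypothetical values for illustration, not results from the paper's experiments:

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall, and F1 from raw true/false positive and false negative counts."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return p, r, 2 * p * r / (p + r)

def average_precision(scores, hits, n_gt):
    """AP as the area under the precision-recall curve (all-point form).
    scores: prediction confidences; hits: 1 if the prediction matched a
    ground-truth box at the IoU threshold; n_gt: number of ground-truth boxes."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    for i in order:                       # sweep the confidence threshold
        if hits[i]:
            tp += 1
        else:
            fp += 1
        recall = tp / n_gt
        precision = tp / (tp + fp)
        ap += precision * (recall - prev_recall)
        prev_recall = recall
    return ap
```

For example, a detector with 8 true positives, 2 false positives, and 2 false negatives scores P = R = F1 = 0.8, and a detector whose every prediction is correct attains AP = 1.0.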
For model efficiency, we evaluate three key metrics: the number of parameters (Params), which reflects model complexity and memory footprint; FLOPs, which indicates computational complexity and potential inference speed; and the model weight file size, which directly impacts storage and deployment overhead. These efficiency metrics are crucial for assessing a model’s practicality in resource-constrained environments and, when combined with the accuracy metrics above, provide a holistic view of the model’s capability to balance performance with efficiency.

3.2. Compared to Common Detection Models

This study conducts a comparative performance evaluation of Faster R-CNN, multiple YOLO series variants, RT-DETR [40], and the proposed FishMambaNet. The evaluation encompasses both accuracy and efficiency metrics (with the best and second-best performances highlighted in bold and underline, respectively), with detailed results provided in Table 3.
As shown in Table 3, FishMambaNet achieves outstanding performance in both accuracy and efficiency, setting a new benchmark for practical aquaculture disease detection. It attains the highest scores in precision (91.0%), mAP@50 (86.7%), and F1-score (84.1%), while maintaining competitive recall (78.2%), demonstrating a robust ability to minimize false positives without compromising the identification of true diseases. Notably, this superior performance is achieved with remarkably low computational demands—only 4.3 M parameters, 10.7 GFLOPs, and an 8.66 MB model size. This efficiency rivals that of ultra-lightweight models like YOLOv5-n and YOLOv8-n, yet its accuracy significantly surpasses larger, more complex models such as YOLOv8-m and RT-DETR-r34. These results demonstrate that the strategic integration of selective state space models with convolutional neural networks in FishMambaNet successfully overcomes the traditional accuracy-efficiency trade-off. By combining the global receptive field of SSMs with the local feature extraction of CNNs, the architecture delivers a highly effective and computationally tractable solution, underscoring its strong potential for real-time deployment in practical aquaculture environments.
Figure 7 provides a comprehensive visualization of FishMambaNet’s performance through three analytical perspectives. The normalized confusion matrix in Figure 7a shows strong diagonal concentration, indicating high per-class accuracy, particularly for fungal diseases. Some off-diagonal misclassifications reveal challenges in distinguishing visually similar symptoms. Figure 7b displays F1 score-confidence curves, where peak values in high-confidence regions demonstrate the model’s ability to maintain high recall while minimizing false positives. Figure 7c shows precision-recall curves with areas consistently above 0.85 across categories. Collectively, these visualizations validate the robustness of FishMambaNet for multi-class fish disease detection, achieved through its synergistic combination of selective state space models and multi-scale attention mechanisms.
Figure 8 provides a qualitative comparison of detection results for Bacterial Diseases across seven models: (a) Faster R-CNN, (b) YOLOv5n, (c) YOLOv8n, (d) YOLOv5s, (e) YOLOv8s, (f) RT-DETR-r34, and (g) FishMambaNet. The comparison reveals that Faster R-CNN produced a false detection by misclassifying the disease as healthy fish, while the other six models correctly identified the bacterial infection. However, detection confidence varied significantly among the correct models. Most achieved confidence levels around 0.5, whereas FishMambaNet attained a substantially higher confidence of 0.73, demonstrating superior detection capability for Bacterial Diseases.
In the disease category-specific detection capability evaluation, this study further analyzed the AP@50 performance of different detection models across five categories of fish health status to validate the detection stability of the models for different types of diseases. Bold indicates the best performance, while underlining indicates the second-best performance. The detailed results are shown in Table 4.
Analysis of the category-specific results in Table 4 demonstrates FishMambaNet’s superior performance across most disease types, highlighting its robust capability to handle diverse pathologies. The model achieves the highest AP@50 for bacterial diseases (93.6%), outperforming the second-best YOLOv8-n by 1.1%, a result attributable to the effective synergy of SSM-based global pattern recognition and multi-scale feature extraction. Its performance is particularly strong in fungal disease detection, where a 91.7% AP@50—significantly higher than other models—showcases the MSCA module’s proficiency in identifying subtle textural and multi-scale manifestations. For the rapidly spreading white tail disease, FishMambaNet delivers the best detection results (86.7% AP@50), leveraging the FSBlock’s capacity to model long-range dependencies between distributed symptoms. Although it slightly trails RT-DETR-r18 and YOLOv5-s in healthy fish identification, this minor trade-off reflects a strategic focus on pathological features, while still maintaining high, balanced performance across all categories. These results confirm that FishMambaNet excels not only in overall accuracy but also in specificity and stability across diverse disease types, meeting the complex demands of practical aquaculture diagnostics.
Figure 9 provides comprehensive visual validation of FishMambaNet’s detection performance across all disease categories against six benchmark models. The qualitative comparison reveals several critical advantages of our proposed method. For Bacterial Diseases, FishMambaNet achieves the highest confidence with precise lesion localization, whereas Faster R-CNN produces false positives in healthy areas. In the challenging White Tail Diseases category, FishMambaNet accurately identifies and delineates affected regions, while several comparison models misclassify diseased fish as healthy. For Fungal Diseases, FishMambaNet successfully detects all three instances with high confidence, unlike other models that miss smaller lesions. Across all categories, including Healthy Fish identification, FishMambaNet demonstrates superior performance in both detection accuracy and confidence calibration. This visual evidence strongly corroborates the quantitative results, showing how the integration of selective state space models enables robust feature representation, while the multi-scale attention mechanism effectively handles lesions of varying sizes and morphologies. The consistent performance across diverse disease manifestations underscores FishMambaNet’s robustness and practical utility for real-world aquaculture monitoring.
Figure 9 further illustrates the confusion and misdetection between healthy fish and diseased cases. Specifically, for certain fish species, white tail disease can appear extremely similar to a healthy tail in visual appearance. The second row of Figure 9 clearly demonstrates such erroneous detection, where many models misclassify such fish as healthy. The primary reason for this confusion lies in feature similarity: in its early stages or in certain species, white tail disease manifests only as slight fading or transparency at the caudal peduncle. Its visual characteristics differ minimally from the natural luster and texture of a healthy fish tail, especially in underwater environments with complex lighting or limited image quality, where key discriminative details are easily lost. As a result, models relying on generic feature representations often struggle to capture such subtle pathological differences, leading to the misclassification of diseased states as healthy.
The consistent performance advantage underscores the model’s practical utility, and cross-dataset evaluation demonstrates promising generalization ability. To validate this generalizability, we further evaluated FishMambaNet on a publicly available, large-scale, and domain-distinct benchmark: the Roboflow Fish Detection Dataset (https://public.roboflow.com/object-detection/fish, accessed on 10 December 2025). This dataset presents a significant domain shift, featuring marine species (vs. our freshwater species) in different underwater environments for a generic detection task (vs. disease-specific detection). As shown in Table 5, when trained and tested from scratch on this out-of-distribution data, FishMambaNet achieved a competitive mAP@50 of 51.5%, outperforming similarly lightweight models (e.g., YOLOv8-n at 48.5%) while maintaining its parameter efficiency. This result provides concrete, empirical evidence that the architectural advantages of FishMambaNet, namely its efficient modeling of both global context and local details, translate into robust and transferable feature representations, substantiating its potential for broader application beyond the original data distribution.

3.3. Results of the Ablation Study for the FishMambaNet

An ablation study was conducted to evaluate the individual and combined contributions of the three core modules: MSCA, FSBlock, and PConv. The results, presented in Table 6, validate each module’s effectiveness and their synergistic interactions. (Best and second-best performances are highlighted in bold and underline, respectively.)
Analysis of Table 6 confirms that each module enhances overall performance through distinct mechanisms. The standalone MSCA module improves mAP@50 by 1.0%, attesting to the value of its multi-scale attention in capturing features from fine-grained lesions to broader pathological contexts. Independently, the FSBlock delivers a more substantial 2.7% gain, highlighting the critical role of state-space models in capturing long-range spatial relationships between distributed symptoms—a capability beyond conventional convolutions. Notably, their combined integration achieves 85.5% mAP@50, exceeding the sum of individual improvements and revealing a clear synergy: MSCA’s refined local features provide superior input for FSBlock’s global reasoning, while FSBlock’s contextual understanding guides MSCA’s attention to semantically relevant regions. The complete model, augmented with PConv, further elevates mAP@50 to 86.7% while maintaining computational efficiency. PConv’s partial channel processing reduces redundant computations without compromising representational capacity, thus optimizing the accuracy-efficiency trade-off. These results validate the complementary roles of the three components, demonstrating that the strategic integration of multi-scale attention, state-space modeling, and efficient convolution successfully addresses the multifaceted challenges of aquatic disease detection. Achieving this performance with only 4.3 M parameters and 10.7 GFLOPs underscores the model’s practical viability for deployment in resource-constrained aquaculture environments.
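The efficiency benefit attributed to PConv above follows directly from its partial channel processing: only a fraction of the channels pass through the dense convolution, while the rest are carried forward untouched. A back-of-the-envelope sketch of the FLOPs saving, assuming the typical partial ratio r = 1/4 used in the PConv literature (the function names and example shapes are ours, for illustration only):

```python
def conv_flops(h, w, c_in, c_out, k):
    # multiply-accumulates of a dense k x k convolution on an h x w map
    return h * w * c_in * c_out * k * k

def pconv_flops(h, w, c, k, r=0.25):
    # partial convolution: only c_p = r * c channels are convolved;
    # the remaining (1 - r) * c channels pass through as identity
    cp = int(c * r)
    return conv_flops(h, w, cp, cp, k)

h, w, c, k = 40, 40, 256, 3
ratio = pconv_flops(h, w, c, k) / conv_flops(h, w, c, c, k)
print(ratio)  # → 0.0625, i.e. 1/16 of the dense cost at r = 1/4
```

The quadratic dependence on channel count is why convolving a quarter of the channels costs only a sixteenth of the FLOPs, consistent with the small parameter and FLOPs deltas between the PConv-equipped variants in Table 6.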
The per-category analysis in Table 7 reveals the specialized diagnostic strengths of each module. The FSBlock excels at detecting diseases with distributed symptoms; its global state space modeling elevates the AP@50 for white-tail disease from 78.2% to 83.9% and provides the most significant improvement in healthy fish recognition by distinguishing normal variations from early pathology. In contrast, the MSCA module proves most effective for pathologies requiring multi-scale analysis, such as fungal diseases, where its parallel convolutional branches raise AP@50 from 87.8% to 88.9% by capturing both fine textures and broader infection patterns. The synergy between MSCA and FSBlock is evident in their combined performance—achieving 92.7% and 84.9% AP@50 for bacterial and white-tail diseases, respectively—where MSCA’s local refinement complements FSBlock’s establishment of long-range spatial relationships. The complete model with PConv achieves optimal performance in most categories (e.g., 91.7% for fungal diseases, 86.7% for white-tail disease), validating the integrated architecture’s effectiveness. This performance profile confirms that the complementary strengths of SSM-based global modeling, multi-scale attention, and efficient convolution collectively address the full spectrum of visual characteristics in aquatic disease diagnostics.
To quantitatively assess the overhead and benefit of the unique scan expansion operation within the FSBlock, we conducted a controlled benchmarking study. We compared three variants: (1) the full FSBlock, (2) an FSBlock without scan expansion (replaced by simple flattening), and (3) a standard convolutional block (C2f) with a comparable receptive field. We specifically quantified the impact of this operation on inference latency and GPU memory; the experiments and results are shown in Table 8.
As shown in Table 8, the scan expansion operation increases the peak GPU memory usage by approximately 31% (from 2.64 GB to 3.45 GB). This overhead stems from the need to create and store intermediate sequential representations for the four scanning directions. The operation also introduces about 17% additional inference latency (from 4.1 ms to 4.8 ms), primarily due to the computational cost of transforming the 2D feature maps into multi-directional sequences and reconstructing the output. In return, it delivers an absolute mAP improvement of 1.4% (84.1% vs. 82.7%), confirming its effectiveness in capturing distributed lesion patterns through explicit modeling of directional long-range dependencies. The experiments quantitatively validate the existence of a trade-off between accuracy and efficiency introduced by the scan expansion operation.
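The scan expansion discussed above can be sketched as follows. This is a minimal NumPy illustration in the style of VMamba-type cross-scanning [36]; the function names are ours, and the per-direction sequence model is replaced by an identity so that the expand-merge round trip is exactly lossless. The four stored sequences also make the memory overhead reported in Table 8 intuitive:

```python
import numpy as np

def scan_expand(x):
    """Unfold an (H, W) feature map into four 1-D scan sequences:
    row-major, column-major, and their reverses."""
    return [
        x.reshape(-1),          # left-to-right, top-to-bottom
        x.T.reshape(-1),        # top-to-bottom, left-to-right
        x.reshape(-1)[::-1],    # reversed row-major
        x.T.reshape(-1)[::-1],  # reversed column-major
    ]

def scan_merge(seqs, h, w):
    """Invert each scan back to (H, W) and average the four views."""
    y = np.zeros((h, w))
    y += seqs[0].reshape(h, w)
    y += seqs[1].reshape(w, h).T
    y += seqs[2][::-1].reshape(h, w)
    y += seqs[3][::-1].reshape(w, h).T
    return y / 4.0

x = np.arange(12, dtype=float).reshape(3, 4)
# With an identity sequence op, expand followed by merge recovers x.
assert np.allclose(scan_merge(scan_expand(x), 3, 4), x)
```

In the real block, each of the four sequences would be processed by a selective SSM before merging, which is where the directional long-range modeling, and the extra latency and memory, come from.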

4. Discussion

FishMambaNet demonstrates outstanding comprehensive performance in fish disease detection, achieving superior overall accuracy (86.7% mAP@50) compared to mainstream models like the YOLO series and RT-DETR [40], while maintaining exceptional efficiency with only 4.3 M parameters and 10.7 GFLOPs. This performance is primarily attributed to the synergistic integration of SSMs with convolutional neural networks. The FSBlock validates the strength of selective SSMs in capturing global spatial dependencies through linear sequence modeling, effectively overcoming the limited receptive field of traditional CNNs. This is crucial for diagnosing diseases with correlated symptom distributions, as evidenced by FishMambaNet’s high AP@50 scores for white-tail disease (86.7%) and fungal infections (91.7%) in Table 3 and Table 4. The MSCA module functionally complements the FSBlock by providing sensitive multi-scale perception of local lesions. Ablation experiments confirm their strong synergy, as their combined performance gain exceeds the sum of individual contributions. Furthermore, the integration of partial convolution optimizes computational efficiency without sacrificing accuracy, enhancing the model’s suitability for resource-constrained environments.
This study has several limitations. The dataset, while covering a full production cycle, originates from a single aquaculture region in Southern China. This limits the model’s validated robustness against key environmental and operational variables prevalent in real-world deployments, such as variations in water quality (e.g., turbidity), lighting conditions, diverse fish species, and different imaging equipment. Although computationally efficient, the model’s real-time inference performance and energy consumption on edge devices in practical aquaculture settings require empirical validation. Finally, the current framework focuses exclusively on external diseases and does not address internal pathologies or behavioral abnormalities.
Despite these limitations, FishMambaNet establishes a viable pathway for intelligent aquaculture disease management. Its lightweight architecture enables deployment of high-precision diagnostic models on embedded systems near fish ponds, facilitating early detection and timely intervention to reduce economic losses. Future work will, therefore, prioritize robustness benchmarking and domain adaptation research alongside expanding dataset diversity and scale. This includes exploring integration of temporal behavior analysis to develop a more comprehensive fish health monitoring system.

5. Conclusions

Addressing the critical need for effective disease management in global aquaculture, this study developed an automated detection solution that balances high accuracy with computational efficiency. To overcome the limited receptive fields of CNNs and the high computational cost of Transformers, we introduced FishMambaNet, a novel framework based on SSMs. The principal contributions are threefold: (1) We established a new architectural paradigm by pioneering the integration of SSMs with CNNs for fish disease detection, leveraging global dependency modeling and local feature extraction. (2) We designed two core innovations—the FSBlock for capturing global disease patterns and local lesion details, and the MSCA mechanism for efficient multi-scale context fusion. (3) Extensive validation demonstrated state-of-the-art performance, with FishMambaNet achieving 86.7% mAP@50 while requiring only 4.3 M parameters and 10.7 GFLOPs, significantly outperforming existing models and offering a practical solution for real-time diagnosis in resource-limited settings.
Future research will focus on three directions. The foremost is rigorous robustness evaluation and enhancement: systematically constructing benchmarks that quantify performance under varying turbidity, illumination, and viewpoint conditions, and conducting small-scale field deployments across different farming systems to gather authentic feedback and drive domain-adaptive improvements. Second, building upon the promising cross-dataset generalization shown in this work, we will expand dataset diversity through collaborative efforts to construct and release a large-scale, open-access benchmark spanning multiple species, diseases, and farming environments, while advancing edge deployment and validation in operational aquaculture settings. Third, we will integrate temporal analysis for the early detection of behavioral abnormalities, ultimately enabling a comprehensive health monitoring system that extends from external symptoms to behavioral cues.

Author Contributions

Conceptualization, Z.L. and R.C.; methodology, R.C. and Z.L.; formal analysis, Z.L. and R.C.; investigation, S.L.; data curation, Z.L. and R.C.; resources, J.Z.; writing—original draft preparation, Z.L. and R.C.; writing—review and editing, J.G.; supervision, S.L.; project administration, J.G.; funding acquisition, J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the 2022 Graduate Education Innovation Plan Project of Zhongkai University of Agriculture and Engineering (KA220160228); the Guangdong Rural Science and Technology Commissioner Project (KTP20240633); the Guangdong Basic and Applied Basic Research Foundation (20aA1515011230); the Special Projects in Key Fields of Ordinary Universities in Guangdong Province (2025ZDZX4025); and the Guangdong Province Graduate Education Innovation Program Project (2024ANLK_049). The authors extend their sincere gratitude for the financial and technical support provided by these programs.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. The State of World Fisheries and Aquaculture 2022; FAO: Rome, Italy, 2022; ISBN 978-92-5-136364-5.
  2. Verdegem, M.; Buschmann, A.H.; Latt, U.W.; Dalsgaard, A.J.T.; Lovatelli, A. The Contribution of Aquaculture Systems to Global Aquaculture Production. J. World Aquac. Soc. 2023, 54, 206–250. [Google Scholar] [CrossRef]
  3. Garlock, T.M.; Asche, F.; Anderson, J.L.; Eggert, H.; Anderson, T.M.; Che, B.; Chávez, C.A.; Chu, J.; Chukwuone, N.; Dey, M.M.; et al. Author Correction: Environmental, Economic, and Social Sustainability in Aquaculture: The Aquaculture Performance Indicators. Nat. Commun. 2024, 15, 5965. [Google Scholar] [CrossRef]
  4. Li, D.; Li, X.; Wang, Q.; Hao, Y. Advanced Techniques for the Intelligent Diagnosis of Fish Diseases: A Review. Animals 2022, 12, 2938. [Google Scholar] [CrossRef]
  5. Ziarati, M.; Zorriehzahra, M.J.; Hassantabar, F.; Mehrabi, Z.; Dhawan, M.; Sharun, K.; Emran, T.B.; Dhama, K.; Chaicumpa, W.; Shamsi, S. Zoonotic Diseases of Fish and Their Prevention and Control. Vet. Q 2022, 42, 95–118. [Google Scholar] [CrossRef]
  6. Anjur, N.; Sabran, S.F.; Daud, H.M.; Othman, N.Z. An Update on the Ornamental Fish Industry in Malaysia: Aeromonas Hydrophila-Associated Disease and Its Treatment Control. Vet. World 2021, 14, 1143–1152. [Google Scholar] [CrossRef]
  7. Pridgeon, J.W.; Klesius, P.H. Major Bacterial Diseases in Aquaculture and Their Vaccine Development. CABI Rev. 2012, 7, 1–16. [Google Scholar] [CrossRef]
  8. Stentiford, G.D.; Sritunyalucksana, K.; Flegel, T.W.; Williams, B.A.P.; Withyachumnarnkul, B.; Itsathitphaisarn, O.; Bass, D. New Paradigms to Help Solve the Global Aquaculture Disease Crisis. PLoS Pathog. 2017, 13, e1006160. [Google Scholar] [CrossRef]
  9. Kumar, V.; Roy, S.; Behera, B.K.; Das, B.K. Disease Diagnostic Tools for Health Management in Aquaculture. In Advances in Fisheries Biotechnology; Pandey, P.K., Parhi, J., Eds.; Springer Nature: Singapore, Singapore, 2021; pp. 363–382. ISBN 978-981-16-3214-3. [Google Scholar]
  10. Pillai, D.; Bonami, J.R. A Review on the Diseases of Freshwater Prawns with Special Focus on White Tail Disease of Macrobrachium Rosenbergii. Aquac. Res. 2012, 43, 1029–1037. [Google Scholar] [CrossRef]
  11. Ahmed, M.S.; Aurpa, T.T.; Azad, A.K. Fish Disease Detection Using Image Based Machine Learning Technique in Aquaculture. J. King Saud Univ. Comput. Inf. Sci. 2022, 34, 5170–5182. [Google Scholar] [CrossRef]
  12. Jhansi, G.; Sujatha, K. HRFSVM: Identification of Fish Disease Using Hybrid Random Forest and Support Vector Machine. Environ. Monit. Assess. 2023, 195, 918. [Google Scholar] [CrossRef]
  13. Malik, S.; Kumar, T.; Sahoo, A.K. Image Processing Techniques for Identification of Fish Disease. In Proceedings of the 2017 IEEE 2nd International Conference on Signal and Image Processing (ICSIP), Singapore, 4–6 August 2017; pp. 55–59. [Google Scholar]
  14. LeCun, Y.; Bengio, Y.; Hinton, G. Deep Learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef]
  15. Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-Based Learning Applied to Document Recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
  16. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef]
  17. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  18. Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
  19. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar] [CrossRef]
  20. Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934. [Google Scholar] [CrossRef]
  21. Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications. arXiv 2022, arXiv:2209.02976. [Google Scholar] [CrossRef]
  22. Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar]
  23. Wang, C.-Y.; Yeh, I.-H.; Liao, H.-Y.M. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. In European Conference on Computer Vision; Springer Nature: Cham, Switzerland, 2024. [Google Scholar]
  24. Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-Time End-to-End Object Detection. Adv. Neural Inf. Process. Syst. 2024, 37, 107984–108011. [Google Scholar]
  25. Huang, Y.-P.; Khabusi, S.P. A CNN-OSELM Multi-Layer Fusion Network With Attention Mechanism for Fish Disease Recognition in Aquaculture. IEEE Access 2023, 11, 58729–58744. [Google Scholar] [CrossRef]
  26. Sanjay Kumaar, A.; Vishnu Vignesh, A.; Deepak, K. FishNet Freshwater Fish Disease Detection Using Deep Learning Techniques. In Proceedings of the 2024 2nd International Conference on Advancement in Computation & Computer Technologies (InCACCT), Gharuan, India, 2–3 May 2024; pp. 368–373. [Google Scholar]
  27. Li, X.; Zhao, S.; Chen, C.; Cui, H.; Li, D.; Zhao, R. YOLO-FD: An Accurate Fish Disease Detection Method Based on Multi-Task Learning. Expert Syst. Appl. 2024, 258, 125085. [Google Scholar] [CrossRef]
  28. Wu, Z.; Li, J.; Shi, R.; Dai, H.; Cui, Z.; Wang, Y.; Yu, H. YOLOv11-SDiseasedFishNet:Recognition of Body Surface Symptoms of Diseased Fish Based on Automatic Combination Augmentation and Multi-Scale Feature Fusion. Aquaculture 2025, 613, 743336. [Google Scholar] [CrossRef]
  29. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  30. Nath, S.; Ani, S.R.; Arefin, F.; Islam, M.I.; Afnan, T.; Chowdhury, M.T.; Niloy, N.T. Towards Fish Disease Detection Using the Vision Transformer Model and Convolutional Neural Network Model. In Proceedings of the 2024 27th International Conference on Computer and Information Technology (ICCIT), Cox’s Bazar, Bangladesh, 20–22 December 2024; pp. 2742–2747. [Google Scholar]
  31. Bhattacharjee, S.; Murad, H.; Tista, S.C. Towards Developing a Fish Disease Recognizer: A Deep Learning-Based Approach. In Proceedings of the 2024 6th International Conference on Sustainable Technologies for Industry 5.0 (STI), Narayanganj, Bangladesh, 14–15 December 2024; pp. 1–6. [Google Scholar]
  32. Alluhaidan, A.S.; Pachiyannan, P.; Aziz, R.; Basheer, S. Automated Detection of Waterborne Pathogens in Aquaculture Using an Enhanced Swin-Transformer Model. Trait. Signal 2025, 42, 2301. [Google Scholar] [CrossRef]
  33. Gu, A.; Johnson, I.; Goel, K.; Saab, K.; Dao, T.; Rudra, A.; Ré, C. Combining Recurrent, Convolutional, and Continuous-Time Models with Linear State-Space Layers. Adv. Neural Inf. Process. Syst. 2021, 34, 572–585. [Google Scholar]
  34. Gu, A.; Goel, K.; Ré, C. Efficiently Modeling Long Sequences with Structured State Spaces. arXiv 2022, arXiv:2111.00396. [Google Scholar] [CrossRef]
  35. Gu, A.; Dao, T. Mamba: Linear-Time Sequence Modeling with Selective State Spaces. arXiv 2023, arXiv:2312.00752. [Google Scholar] [CrossRef]
  36. Liu, Y.; Tian, Y.; Zhao, Y.; Yu, H.; Xie, L.; Wang, Y.; Ye, Q.; Jiao, J.; Liu, Y. VMamba: Visual State Space Model. Adv. Neural Inf. Process. Syst. 2024, 37, 103031–103063. [Google Scholar]
  37. Liu, Y.; Wang, C.; Lu, M.; Yang, J.; Gui, J.; Zhang, S. From Simple to Complex Scenes: Learning Robust Feature Representations for Accurate Human Parsing. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 5449–5462. [Google Scholar] [CrossRef]
  38. Wang, C.; Zhang, Q.; Wang, X.; Zhou, L.; Li, Q.; Xia, Z.; Ma, B.; Shi, Y.-Q. Light-Field Image Multiple Reversible Robust Watermarking Against Geometric Attacks. IEEE Trans. Dependable Secur. Comput. 2025, 22, 5861–5875. [Google Scholar] [CrossRef]
  39. Chen, J.; Kao, S.; He, H.; Zhuo, W.; Wen, S.; Lee, C.-H.; Chan, S.-H.G. Run, Don’t Walk: Chasing Higher FLOPS for Faster Neural Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023. [Google Scholar]
  40. Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs Beat YOLOs on Real-Time Object Detection. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 16965–16974. [Google Scholar]
Figure 1. Dataset collection process.
Figure 2. Comprehensive statistical analysis of training set data. The blue color gradient in the scatter plots represents the local density of data points. The color at a specific location indicates how many data points are concentrated in that area of the 2D space.
Figure 3. Overall model architecture.
Figure 4. Fish Disease Detection State Space Block (FSBlock) Structural Diagram.
Figure 5. Multi-Scale Convolutional Attention Model Structural Diagram.
Figure 6. Internal operation diagram of convolution. (a) Standard convolution; (b) Depthwise separable convolution; (c) Partial convolution.
Figure 7. Confusion Matrix, F1-Confidence Curve, and Precision-Recall Curve of FishMambaNet. (a) Normalized confusion matrix, showing per-class accuracy and misclassification trends; (b) F1-Confidence curve, indicating the optimal confidence threshold (≈0.512) for maximal F1 score; (c) Precision-Recall curve, with the mean average precision (mAP@0.5) reaching 0.867.
Figure 8. Visual comparison of detection results across models.
Figure 9. Detection results comparison across models and disease categories: (a) Faster R-CNN, (b) YOLOv5n, (c) YOLOv8n, (d) YOLOv5s, (e) YOLOv8s, (f) RT-DETR-r34, (g) FishMambaNet.
Table 1. Dataset categories and number of annotation boxes.
Class | Annotation Bounding Boxes
Bacterial Diseases | 152
Fungal Diseases | 229
Healthy Fish | 962
Parasitic Diseases | 175
White Tail Diseases | 110
Table 2. Experimental environment configuration.
Environment | Configuration
CPU | 13th Gen Intel Core i5-13490F 2.50 GHz processor
GPU | NVIDIA GeForce RTX 4070 Super 12 GB graphics card
System | Windows 11
CUDA | 11.8
PyTorch | 2.1.0
Python | 3.10
Table 3. Performance comparison of FishMambaNet with other detection models on the fish disease dataset.
Method | Precision ↑ | Recall ↑ | mAP@50 ↑ | F1 ↑ | Params ↓ | FLOPs ↓ | Weights ↓
Faster R-CNN | 81.8 | 70.3 | 80.1 | 75.6 | 44.1 M | 178 G | 108.2 MB
YOLOv5-n | 83.3 | 70.0 | 80.6 | 76.0 | 2.5 M | 7.1 G | 5.05 MB
YOLOv8-n | 84.1 | 70.1 | 81.4 | 76.4 | 3.0 M | 8.1 G | 5.98 MB
YOLOv5-s | 83.7 | 70.8 | 81.8 | 76.7 | 9.1 M | 23.5 G | 17.6 MB
YOLOv8-s | 84.3 | 74.9 | 83.3 | 79.3 | 11.1 M | 28.4 G | 21.4 MB
YOLOv5-m | 83.8 | 78.3 | 84.7 | 80.9 | 25.0 M | 64.0 G | 48.1 MB
YOLOv8-m | 87.8 | 79.4 | 86.0 | 83.3 | 25.8 M | 79.1 G | 49.6 MB
RT-DETR-r18 | 86.4 | 70.4 | 82.3 | 77.6 | 20.2 M | 58.4 G | 37.1 MB
RT-DETR-r34 | 90.3 | 76.8 | 85.6 | 83.0 | 30.4 M | 89.3 G | 56.2 MB
FishMambaNet | 91.0 | 78.2 | 86.7 | 84.1 | 4.3 M | 10.7 G | 8.66 MB
Note: Bold indicates the best performance; underlined numbers indicate the second-best performance; An upward arrow (↑) indicates that a higher value is better for that metric (e.g., Precision, Recall). A downward arrow (↓) indicates that a lower value is better (e.g., Params, FLOPs).
Table 4. FishMambaNet vs. Various Detection Models in AP@50 Performance Across Different Diseases (%).
Method | Bacterial Diseases | Fungal Diseases | Healthy Fish | Parasitic Diseases | White Tail Diseases
Faster R-CNN | 88.7 | 79.4 | 77.0 | 77.6 | 77.8
YOLOv5-n | 91.9 | 81.0 | 75.7 | 76.1 | 78.1
YOLOv8-n | 92.5 | 87.8 | 76.3 | 72.3 | 78.2
YOLOv5-s | 86.1 | 87.0 | 84.6 | 72.2 | 79.3
YOLOv8-s | 91.9 | 79.9 | 84.4 | 79.8 | 80.7
RT-DETR-r18 | 91.3 | 80.4 | 86.1 | 75.7 | 77.9
FishMambaNet | 93.6 | 91.7 | 82.6 | 79.0 | 86.7
Note: Bold indicates the best performance; underlined numbers indicate the second-best performance; An upward arrow (↑) indicates that a higher value is better for that metric.
Table 5. Performance comparison of FishMambaNet with mainstream models on the Fish Detection dataset.
Model | Precision ↑ | Recall ↑ | F1 ↑ | mAP@50 ↑ | Params ↓ | FLOPs ↓
YOLOv5-n | 43.5 | 63.6 | 51.7 | 46.1 | 2.5 M | 7.1 G
YOLOv8-n | 45.2 | 71.5 | 55.4 | 48.5 | 3.0 M | 8.1 G
YOLOv5-s | 44.2 | 73.0 | 55.1 | 47.9 | 9.1 M | 23.5 G
YOLOv8-s | 42.7 | 80.8 | 55.9 | 49.5 | 11.1 M | 28.4 G
YOLOv5-m | 44.8 | 70.6 | 54.8 | 49.7 | 48.1 M | 64.0 G
YOLOv8-m | 48.9 | 78.5 | 60.2 | 54.1 | 49.6 M | 79.1 G
RT-DETR-r34 | 43.9 | 78.1 | 56.2 | 50.3 | 56.2 M | 89.3 G
FishMambaNet | 45.9 | 77.0 | 57.5 | 51.5 | 4.3 M | 10.7 G
Note: Bold indicates the best performance; underlined numbers indicate the second-best performance; An upward arrow (↑) indicates that a higher value is better for that metric (e.g., Precision, Recall). A downward arrow (↓) indicates that a lower value is better (e.g., Params, FLOPs).
Table 6. Ablation study results of core modules on the fish disease dataset.
MSCA | FSBlock | PConv | mAP@50 ↑ | F1 ↑ | Params ↓ | FLOPs ↓ | Weights ↓
 | | | 81.4 | 76.4 | 3.0 M | 8.1 G | 5.98 MB
√ | | | 82.4 | 78.2 | 3.4 M | 8.5 G | 6.62 MB
 | √ | | 84.1 | 80.3 | 4.1 M | 10.5 G | 8.15 MB
 | | √ | 83.9 | 80.1 | 3.2 M | 8.3 G | 6.49 MB
√ | √ | | 85.5 | 82.6 | 4.4 M | 10.8 G | 8.78 MB
√ | √ | √ | 86.7 | 84.1 | 4.3 M | 10.7 G | 8.66 MB
Note: Bold indicates the best performance; underlined numbers indicate the second-best performance; An upward arrow (↑) indicates that a higher value is better for that metric (e.g., Precision, Recall). A downward arrow (↓) indicates that a lower value is better (e.g., Params, FLOPs); (√) A checkmark indicates that the specific module or design (as denoted by the column header) is included in the model variant for that row. Its absence denotes the ablation setting where that component is removed.
Table 7. AP@50 comparison of module ablation experiments across disease categories (%).
MSCA | FSBlock | PConv | Bacterial Diseases | Fungal Diseases | Healthy Fish | Parasitic Diseases | White Tail Diseases
 | | | 92.5 | 87.8 | 76.3 | 72.3 | 78.2
√ | | | 91.3 | 88.9 | 77.1 | 74.7 | 79.9
 | √ | | 91.8 | 88.1 | 81.2 | 75.5 | 83.9
 | | √ | 92.3 | 89.3 | 79.4 | 76.3 | 82.1
√ | √ | | 92.7 | 89.9 | 82.8 | 77.2 | 84.9
√ | √ | √ | 93.6 | 91.7 | 82.6 | 79.0 | 86.7
Note: Bold indicates the best performance; underlined numbers indicate the second-best performance; An upward arrow (↑) indicates that a higher value is better for that metric; (√) A checkmark indicates that the specific module or design (as denoted by the column header) is included in the model variant for that row.
Table 8. Efficiency comparison of scan expansion operation variants.
Module | Params (M) | FLOPs (G) | Latency (ms) | Memory (GB) | mAP@50 (%)
+FSBlock | 4.1 | 10.5 | 4.8 | 3.45 | 84.1
+FSBlock (w/o scan) | 4.1 | 8.5 | 4.1 | 2.64 | 82.7
+C2f | 3.0 | 8.1 | 3.5 | 2.66 | 81.4
