Article

Orga-Dete: An Improved Lightweight Deep Learning Model for Lung Organoid Detection and Classification

1 College of Electrical and Automation Engineering, Nanjing Normal University, Nanjing 210023, China
2 Engineering Laboratory of Advanced In Vitro Diagnostic Technology, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou 215163, China
* Authors to whom correspondence should be addressed.
Appl. Sci. 2025, 15(15), 8377; https://doi.org/10.3390/app15158377
Submission received: 1 July 2025 / Revised: 22 July 2025 / Accepted: 23 July 2025 / Published: 28 July 2025

Abstract

Lung organoids play a crucial role in modeling drug responses in pulmonary diseases. However, their morphological analysis remains hindered by the inefficiency of manual detection and the high computational cost of existing algorithms. To overcome these challenges, this study proposes Orga-Dete, a lightweight, high-precision detection model based on YOLOv11n. It first employs data augmentation to mitigate the limitations of the small dataset and its class imbalance, and then applies a triple co-optimization strategy: a bi-directional feature pyramid network (BiFPN) for enhanced multi-scale feature fusion, multi-path coordinate attention (MPCA) for a stronger feature response to micro-organoids, and EMASlideLoss to counter class imbalance. Validated on a lung organoid microscopy dataset, Orga-Dete achieves 81.4% mAP@0.5 with only 2.25 M parameters and 6.3 GFLOPs, surpassing the baseline YOLOv11n by 3.5%. Ablation experiments confirm the synergistic effect of these modules in enhancing morphological feature extraction. Balancing precision and efficiency, Orga-Dete offers a scalable solution for high-throughput organoid analysis, underscoring its potential for personalized medicine and drug screening.

1. Introduction

The advancement of biology increasingly depends on innovations in in vitro cell culture technologies. Traditional two-dimensional (2D) cell culture models lack tissue-specific architecture and fail to accurately replicate complex physiological environments [1]. In contrast, organoids—self-organized, three-dimensional (3D) cultures—can mimic the cellular composition, structural organization, biological properties, and key functions of native tissues. As such, they serve as powerful platforms for disease modeling, drug screening, and precision medicine [2]. To monitor organoid development, researchers routinely employ fluorescence microscopy for the periodic assessment of morphology and growth dynamics [3]. Common analytical tasks include cell enumeration, quantification of region-specific biomarker intensities, and morphometric analysis (e.g., size, shape) [4]. However, manual quantification remains labor-intensive: a single experiment may require processing hundreds of microscopy images, each containing dozens to hundreds of organoids [5]. This bottleneck limits the scalability of high-throughput organoid technologies. To address this challenge, Borten et al. [6] introduced OrganoSeg, an open-source tool for automated image quantification. However, its reliance on classical image processing methods—such as threshold segmentation and morphological filtering—restricts its adaptability across diverse imaging conditions. These limitations underscore the growing need for deep learning–based approaches that offer superior generalization and cross-scenario robustness.
The convergence of artificial intelligence (AI) and bioinformatics is catalyzing a paradigm shift in organoid analysis, enabling scalable, high-throughput pipelines that enhance both biological discovery and translational applications [7]. Among AI methodologies, object detection [8] has emerged as a critical tool, combining classification with spatial localization to support quantitative organoid counting, spatial distribution analysis, and heterogeneity characterization. Within this domain, the YOLO (You Only Look Once) series [9] is known for its real-time detection speed. However, its anchor-based architecture exhibits limited sensitivity to small and overlapping organoid structures, leading to frequent missed detections. Conversely, Faster R-CNN [10] improves detection accuracy through its region proposal network (RPN), but suffers from computational inefficiencies when applied to high-resolution microscopy images [11]. Several organoid-specific detection models have recently demonstrated promising results. Kassis et al. [12] developed OrgaQuant, a CNN-based tool for detecting and quantifying intestinal organoids in bright-field microscopy images. It achieved a mean average precision at an intersection over union (IoU) threshold of 0.5 (mAP@0.5) of 80%, substantially increasing analytical throughput. Similarly, Abdul et al. [13] introduced Deep-LUMEN, which detects morphological transitions in lung epithelial spheroids and distinguishes luminal from non-luminal structures in bright-field images, attaining an mAP@0.5 of 73% without data augmentation. This offers a novel framework for respiratory organoid analysis. Kegeles et al. [14] applied ResNet50 to develop a three-stage classification system for retinal organoids, achieving an mAP@0.5 of 84% by leveraging early-stage morphological features for fate prediction—outperforming expert manual evaluations. Wang et al. [15] employed YOLO to build an automated detection system for induced pluripotent stem cell (iPSC) colonies, reaching an mAP@0.5 of 89.8%, thus demonstrating strong reliability and surpassing manual quantification in consistency. Addressing the complexity of cellular morphology, Li et al. [16] incorporated a feature pyramid network (FPN) into the Faster R-CNN framework to enhance the detection of abnormal cervical cells in cytology images, resulting in a 6–9% mAP@0.5 improvement over traditional methods. The OrgaNet algorithm developed by Bian et al. [17] achieved a 96.9% classification accuracy for organoid viability on a dedicated dataset but demonstrated limited adaptability to bright-field imaging. To overcome this limitation, Powell et al. [18] introduced DeepOrganoid, a model integrating bright-field microscopy, Z-stack projections, and biochemical multimodal data. Through cross-dimensional feature fusion, it achieved 97% mAP@0.5 for viability assessment and classification, supporting dynamic, personalized treatment decision-making. Okamoto et al. [19] proposed a synergistic framework combining U-Net for microscopic image feature extraction and DenseNet-201 for the classification of six colorectal cancer organoid subtypes based on textural characteristics. This approach enabled both morphological and functional analysis, achieving an mAP@0.5 of 73.6%. Abdul et al. [20] developed D-CryptO, a deep learning tool that attained mAP@0.5 scores of 98% and 90.87% on two colorectal organoid datasets, while also facilitating quantitative assessment of structural maturity. Du et al. [21] introduced OrgaTracker, which integrates YOLOv5 with U-Net and was trained on datasets of human colon tumors and murine intestinal organoids. It achieved an mAP@0.5 of 96.4% by combining real-time regional segmentation with dynamic fusion analysis, thus enabling multidimensional evaluation of organoid quality. Addressing morphological diversity in intestinal organoids, Domènech-Moreno et al. [22] developed Tellu, a YOLOv5-based model capable of category-sensitive analysis and real-time processing, achieving an mAP@0.5 of 79%. Yang et al. [23] constructed a gastric organoid senescence detection model by embedding the convolutional block attention module (CBAM) into the YOLOv3 architecture. This enabled fine-grained recognition of senescence features under complex microscopic conditions, achieving an mAP@0.5 of 93.2%. Leng et al. [24] developed Deep-Orga based on a YOLOX architecture, attaining mAP@0.5 scores of 90% and 81% on human and murine intestinal datasets, respectively, thereby demonstrating robust cross-sample generalization. To address classification challenges between cystic and solid colorectal cancer organoids, Huang et al. [25] evaluated six object detection models; YOLOv4 achieved the highest mAP@0.5 of 36.9%, revealing a significant correlation between these morphological subtypes and biologically viable organoids. In contrast, Sun et al. [26] proposed Deliod, an optimized YOLOv8s-based architecture. It achieved an mAP@0.5 of 87.5% on murine intestinal organoid detection tasks while significantly reducing computational and parameter overhead, making it suitable for edge-device deployment.
In summary, while object detection models have advanced organoid morphological analysis, two persistent challenges hinder high-throughput application: (1) the need for fast and accurate target localization and classification and (2) model lightweighting to enable edge-device deployment. Existing deep learning models often struggle to strike a balance between computational efficiency and performance, limiting their utility in real-world high-throughput settings. To address these challenges, this study introduces Orga-Dete, a lightweight lung organoid detection model based on YOLOv11n [27]. Orga-Dete achieves an optimal trade-off among feature extraction capability, computational efficiency, and model complexity, while ensuring multi-scale adaptability and robustness in complex imaging scenarios.
The main contributions of this study are as follows:
To address defocus blur and multi-scale feature fusion challenges in microscopic imaging, we replace the original FPN + PAN structure in YOLOv11n with a BiFPN module. BiFPN employs cross-layer bidirectional connections and learnable weights to effectively differentiate organoids from background noise and imaging artifacts.
To overcome weak discriminative feature extraction caused by densely packed small organoids and low-contrast micro-morphological variation, we incorporate the multi-path coordinate attention (MPCA) mechanism into the YOLOv11n backbone. MPCA enhances the separability of subtle morphological features, significantly improving detection accuracy for small, clustered organoids.
To address class imbalance in lung organoid datasets, we replace the standard classification loss function with EMASlideLoss, a dynamic threshold-calibrated loss function. EMASlideLoss enhances attention to hard samples by adaptively adjusting classification thresholds via an exponential moving average, thus improving model performance on underrepresented classes.
The remainder of this paper is organized as follows: Section 2 details the proposed methods and theoretical foundations. Section 3 presents the experimental setup and results. Section 4 discusses key findings and concludes the study.

2. Methodology

This section provides a systematic overview of the key methodologies and conceptual models underlying our approach.

2.1. YOLOv11n

The YOLO series, a paradigm-defining family of object detectors, was first introduced by Redmon et al. [9] in 2016. The latest iteration, YOLOv11, developed by the Ultralytics team, is composed of three key modules—backbone, neck, and head—and is offered in five scaled variants: nano (n), small (s), medium (m), large (l), and extra-large (x). Among these, the nano version is the most lightweight, optimized for deployment on resource-constrained edge devices. Compared to YOLOv8 [28], YOLOv11 introduces several architectural enhancements that collectively improve performance. In both the backbone and neck, the C3K2 module replaces the earlier C2F design; this new module integrates dynamic multi-kernel convolution and channel separation strategies to improve multi-scale feature aggregation. Additionally, the Cross-Stage Partial with Pyramid Squeeze Attention (C2PSA) module is introduced after the retained spatial pyramid pooling fusion (SPPF) block, further enhancing feature extraction. The detection head adopts an anchor-free, decoupled structure that separates the detection task into distinct regression and classification branches: the regression branch uses standard convolutions to preserve localization accuracy, while the classification branch employs depthwise separable convolution (DWConv) to reduce parameter redundancy. Collectively, these improvements yield a 22% reduction in parameter count. The architecture of YOLOv11 is illustrated in Figure 1.
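To make the parameter savings of the DWConv classification branch concrete, the following sketch compares a standard 3 × 3 convolution with a depthwise + pointwise pair; the channel width of 256 is illustrative, not YOLOv11's actual layer width.

```python
import torch.nn as nn

# Standard 3x3 convolution: C_in * C_out * 3 * 3 weights.
standard = nn.Conv2d(256, 256, kernel_size=3, padding=1, bias=False)

# Depthwise separable alternative: a per-channel 3x3 depthwise convolution
# followed by a 1x1 pointwise convolution that mixes channels.
depthwise = nn.Conv2d(256, 256, kernel_size=3, padding=1, groups=256, bias=False)
pointwise = nn.Conv2d(256, 256, kernel_size=1, bias=False)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(standard))                      # 589,824
print(count(depthwise) + count(pointwise))  # 67,840, roughly 8.7x fewer
```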

2.2. Proposed Model Orga-Dete

The proposed Orga-Dete model builds upon YOLOv11n, incorporating targeted enhancements to the backbone, neck, and loss function to better accommodate the unique characteristics of lung organoid images. These modifications are detailed in the following sections, culminating in a comprehensive overview of the Orga-Dete architecture.

2.2.1. Bi-Directional Feature Pyramid Network

Earlier versions, such as YOLOv3 [29], employed the conventional feature pyramid network (FPN) [30] to fuse multi-scale features using a top-down pathway. However, conflicts between shallow spatial features and deep semantic representations led to inefficient cross-layer information flow, especially during large object detection. This necessitated repeated downsampling and upsampling operations, which increased computational overhead and compromised accuracy. To overcome these limitations, later models—including YOLOv5 [31], YOLOv8, and YOLOv11—adopted the Path Aggregation Network (PANet) [32] as a complement to the FPN. The resulting FPN + PAN architecture introduces a bottom-up pathway in addition to the original top-down design, allowing low-level localization features to be transmitted directly through lateral connections. This structure enables complementary fusion of semantic and spatial information, as illustrated in Figure 2a,b. Despite the bidirectional feature propagation enabled by FPN + PAN, its reliance on a fixed-weight summation mechanism limits its adaptability. This inflexibility hinders the effective balancing of spatial detail and semantic abstraction, ultimately weakening the model’s ability to distinguish targets from background noise.
To address these issues—particularly the defocus-induced background noise common in lung organoid microscopy—we employ the bi-directional feature pyramid network (BiFPN) [33], shown in Figure 2c, for superior multi-scale feature fusion and spatial-semantic alignment.
BiFPN introduces a novel cross-scale, bidirectional architecture that unifies top-down and bottom-up pathways into a single cohesive structure. Unlike FPN + PAN, which processes each direction independently, BiFPN integrates both into a single network layer, allowing for progressive feature refinement across multiple stacked layers. Each stacking iteration functions as a recalibration cycle, progressively improving target localization. Structurally, BiFPN reduces parameter overhead by eliminating nodes that receive only unidirectional inputs, thus maintaining efficient cross-scale connectivity while minimizing computation. These innovations enable BiFPN to deliver more effective fusion of low-level spatial cues (e.g., organoid boundaries) with high-level semantic context (e.g., cell type classification), achieving a critical balance between accuracy and efficiency in feature pyramid design.
BiFPN introduces a fast normalized fusion mechanism that dynamically evaluates and weights the contributions of multi-scale features using learnable parameters. The core principle is illustrated in Equation (1):
$$O = \sum_{i} \frac{\omega_i}{\varepsilon + \sum_{j} \omega_j} \, I_i \quad (1)$$

where $I_i$ denotes the input features, $O$ represents the output features, $\omega_i$ and $\omega_j$ are learnable scalar weights passed through ReLU to ensure non-negativity, and $\varepsilon = 0.0001$ prevents division by zero while maintaining numerical stability. This mechanism prioritizes high-resolution features while suppressing background noise, thereby significantly enhancing recognition accuracy in critical regions.
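A minimal PyTorch sketch of this fusion operator follows; the module name and feature shapes are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FastNormalizedFusion(nn.Module):
    """Fast normalized fusion of Equation (1): learnable non-negative
    weights, normalized by their sum plus a small epsilon for stability."""

    def __init__(self, num_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, inputs):
        # inputs: list of feature maps with identical shapes.
        w = F.relu(self.weights)          # ReLU keeps the weights non-negative
        w = w / (self.eps + w.sum())      # normalize the contributions
        return sum(wi * x for wi, x in zip(w, inputs))

# Fuse two same-shaped feature maps, e.g., a lateral and a top-down path.
fuse = FastNormalizedFusion(num_inputs=2)
out = fuse([torch.randn(1, 64, 40, 40), torch.randn(1, 64, 40, 40)])
```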

2.2.2. Multi-Path Coordinate Attention

In organoid microscopy object detection, attention mechanisms significantly improve the accuracy of small target detection by optimizing multi-level feature representations. This is particularly critical for organoid images, where high-density small objects and low-contrast micro-morphological variations make discriminative feature extraction difficult [34]. To address these challenges, we introduce a multi-path coordinate attention (MPCA) [35] mechanism after the C2PSA module in the Backbone. MPCA enhances small target feature extraction by refining the attention mechanism to better distinguish subtle morphological differences.
The MPCA module is an enhanced version of the coordinate attention (CA) model [36], integrating a multi-path architecture to extract more comprehensive features across varying scales and orientations. It splits the input feature map into four separate pathways, each tailored to capture scale- or orientation-specific information through coordinate attention. This spatial decoupling allows the parallel analysis of distinct spatial characteristics. Each pathway performs bidirectional global average pooling: horizontally, by compressing the feature map along rows to encode horizontal spatial relationships, and vertically, by compressing along columns to capture vertical distributions. This dual-path pooling enables each stream to gather global contextual information from the entire H × W-dimensional space. The resulting encoding vectors from all four pathways are concatenated channel-wise and processed through a 1 × 1 convolution to enable cross-path interaction and dimensionality reduction. These features are then passed through a multi-layer perceptron (MLP) to generate channel attention weights. These weights are element-wise multiplied with the original feature map to recalibrate and emphasize critical regions. Finally, the outputs from all pathways are fused via weighted summation. The architecture of the MPCA module is shown in Figure 3.
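The following simplified PyTorch sketch illustrates this pipeline, assuming a channel-wise split into four pathways and softmax-weighted fusion; the class names, reduction ratio, and exact path construction are illustrative simplifications of MPCA [35], not the original implementation.

```python
import torch
import torch.nn as nn

class CoordAttnPath(nn.Module):
    """One pathway: bidirectional global average pooling (rows and columns),
    a shared 1x1 conv for cross-channel interaction, and sigmoid gating."""

    def __init__(self, channels: int, reduction: int = 8):  # reduction assumed
        super().__init__()
        mid = max(channels // reduction, 4)
        self.conv1 = nn.Conv2d(channels, mid, 1)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, 1)
        self.conv_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        x_h = x.mean(dim=3, keepdim=True)                  # (b, c, h, 1): row encoding
        x_w = x.mean(dim=2, keepdim=True)                  # (b, c, 1, w): column encoding
        y = torch.cat([x_h, x_w.transpose(2, 3)], dim=2)   # stack both encodings
        y = self.act(self.conv1(y))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                   # (b, c, h, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.transpose(2, 3)))   # (b, c, 1, w)
        return x * a_h * a_w                                    # recalibrate features

class MPCASketch(nn.Module):
    """Four parallel coordinate-attention pathways over channel groups,
    fused by a learnable weighted sum (a simplification of MPCA)."""

    def __init__(self, channels: int, num_paths: int = 4):
        super().__init__()
        assert channels % num_paths == 0
        self.paths = nn.ModuleList(
            CoordAttnPath(channels // num_paths) for _ in range(num_paths)
        )
        self.fuse_w = nn.Parameter(torch.ones(num_paths))

    def forward(self, x):
        chunks = x.chunk(len(self.paths), dim=1)
        w = torch.softmax(self.fuse_w, dim=0)
        outs = [wi * p(ci) for wi, p, ci in zip(w, self.paths, chunks)]
        return torch.cat(outs, dim=1)

# attn = MPCASketch(256); y = attn(torch.randn(1, 256, 40, 40))
```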
Given the abundance of small targets with nuanced micromorphological features in pulmonary organoid images, conventional methods often fail to distinguish them effectively. The MPCA module, through its three-stage optimization—path decoupling, bidirectional encoding, and dynamic weighting—significantly improves micro-scale organoid detection accuracy while maintaining computational efficiency.

2.2.3. EMASlideLoss

Pulmonary organoid detection also faces severe sample imbalance, characterized by a significant disparity between simple and hard samples. Simple samples tend to have high IoU scores due to greater alignment with ground-truth annotations. The IoU is defined as follows:
$$\mathrm{IoU} = \frac{|A \cap B|}{|A \cup B|}$$
where A represents the predicted bounding box and B denotes the ground-truth bounding box. High-IoU samples often exhibit regular shapes and clear boundaries, making them easier to detect. In contrast, small, occluded, or morphologically irregular objects tend to have low IoU scores. Analysis reveals that the binary cross entropy (BCE) loss used in YOLOv11 focuses solely on classification confidence and lacks a mechanism to correlate localization accuracy with classification scores. It is also insensitive to object scale, leading to compounded localization errors for hard samples. To address this, we propose EMASlideLoss [37].
EMASlideLoss is a novel loss function tailored for sample imbalance in object detection. It combines an exponential moving average (EMA) with a dynamic sliding threshold mechanism to adaptively weight hard and easy samples during training. Unlike conventional SlideLoss [38], which uses fixed thresholds, EMASlideLoss dynamically adjusts thresholds based on the mean IoU of predicted bounding boxes in each training batch:
$$\mu_t = \alpha \mu_{t-1} + (1-\alpha)\,\mathrm{mean}(\mathrm{IoU}_{\mathrm{batch}})$$

where $\alpha$ is the smoothing factor, enabling the threshold to adapt to diverse data distributions. Based on the relationship between a sample's IoU and the dynamic threshold $\mu$, different weights are assigned: samples with IoU below $\mu$ are categorized as hard samples and assigned higher weights, while the weights of easy samples with IoU above $\mu$ are progressively decayed. This weighting strategy significantly enhances the model's localization capability for hard samples in scenarios involving small targets, dense occlusions, and complex backgrounds. The weighting function $f(x)$ is defined as follows:
$$f(x) = \begin{cases} 1, & x \le \mu - 0.1 \\ e^{1-\mu}, & \mu - 0.1 < x < \mu \\ e^{1-x}, & x \ge \mu \end{cases}$$
By applying EMA to the sliding threshold, the loss surface becomes smoother, reducing noise and improving generalization. This adaptive weighting mechanism allows the model to better focus on hard samples, making EMASlideLoss especially effective in complex tasks like organoid detection and small object recognition.
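The two ingredients, the EMA threshold update and the piecewise sample weighting, can be sketched as follows; $\alpha$ = 0.9 and the initial threshold of 0.5 are illustrative assumptions rather than the paper's settings.

```python
import math
import torch

def slide_weight(iou: torch.Tensor, mu: float) -> torch.Tensor:
    """Piecewise weighting f(x): low-IoU samples keep weight 1, the
    transition zone gets e^(1-mu), and easy samples decay as e^(1-x)."""
    return torch.where(
        iou <= mu - 0.1,
        torch.ones_like(iou),
        torch.where(
            iou < mu,
            torch.full_like(iou, math.exp(1.0 - mu)),
            torch.exp(1.0 - iou),
        ),
    )

class EMAThreshold:
    """Exponential moving average of the batch mean IoU (the mu_t update)."""

    def __init__(self, alpha: float = 0.9, init: float = 0.5):  # assumed values
        self.alpha, self.mu = alpha, init

    def update(self, batch_iou: torch.Tensor) -> float:
        self.mu = self.alpha * self.mu + (1 - self.alpha) * batch_iou.mean().item()
        return self.mu

# Per batch: update mu from the predicted-box IoUs, then scale each
# sample's BCE classification loss by its slide weight.
ema = EMAThreshold()
ious = torch.tensor([0.20, 0.45, 0.55, 0.80])
mu = ema.update(ious)
weights = slide_weight(ious, mu)
```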

2.2.4. Orga-Dete

In this study, we present Orga-Dete, an organoid detection model built upon the core architecture of YOLOv11n and specifically optimized for the imaging characteristics of lung organoids. The architectural schematic of Orga-Dete is shown in Figure 4.
To mitigate background noise introduced by defocus blur—a common artifact in lung organoid microscopy—we replaced the original feature fusion network in YOLOv11n with BiFPN. BiFPN facilitates bidirectional information flow and incorporates learnable weighted feature fusion, thereby enabling cross-scale interaction within the feature pyramid. This design effectively suppresses background noise and improves target discrimination.
To address the difficulty of extracting discriminative features from dense, small-scale, low-contrast organoids, we introduce the MPCA attention mechanism. MPCA combines coordinate attention with a multi-path architecture to enhance recognition of subtle morphological differences under complex backgrounds. During feature fusion, MPCA assigns learnable channel-wise weights across hierarchical layers, adaptively emphasizing scale-specific information via backpropagation.
To overcome the issue of sample imbalance in lung organoid datasets, we substitute the original classification loss function with EMASlideLoss. This loss function not only compensates for class imbalance but also improves training stability by smoothing gradient updates through exponential moving averages. The result is enhanced detection accuracy and reduced noise during optimization.

3. Experiments and Results

3.1. Datasets

The dataset used in this study, sourced from [13], comprises 4008 bright-field microscopy images of alveolar epithelial organoids cultured in a 3D hydrogel matrix. Standardized annotation was applied to all images: (1) bounding boxes were drawn around in-focus spheroids, and (2) binary classification was performed based on morphological features. Spheroids exhibiting a pronounced central indentation were labeled as ‘With Lumen’, while those lacking such features were labeled ‘No Lumen’. Exclusion criteria included (1) defocused or blurred structures, (2) abnormal morphologies, and (3) planar monolayer structures. The limited scale and inherent class imbalance of the original dataset severely constrained the model’s generalization capability. To address this, we employed the Albumentations toolkit [39] to perform targeted data augmentation on minority class samples (including methods such as RandomBrightnessContrast, HueSaturationValue, VerticalFlip, RandomRotate90, and GaussNoise), resulting in a final dataset of 6440 images comprising 15,892 “No Lumen” samples and 8872 “Lumen” samples. Representative samples are displayed in Figure 5. A randomized partitioning strategy was used to allocate 90% of the data for training and 10% for testing.
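A minimal Albumentations sketch of such a minority-class pipeline follows; the probabilities, file name, and box coordinates are illustrative rather than the exact settings used in this study.

```python
import albumentations as A
import cv2

# Augmentation pipeline built from the transforms listed above; bounding
# boxes stay consistent with the images via YOLO-format bbox_params.
augment = A.Compose(
    [
        A.RandomBrightnessContrast(p=0.5),
        A.HueSaturationValue(p=0.3),
        A.VerticalFlip(p=0.5),
        A.RandomRotate90(p=0.5),
        A.GaussNoise(p=0.3),
    ],
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
)

image = cv2.imread("organoid.png")  # placeholder file name
# YOLO format: (x_center, y_center, width, height), normalized to [0, 1].
out = augment(image=image, bboxes=[(0.52, 0.48, 0.11, 0.12)], class_labels=["Lumen"])
aug_image, aug_boxes = out["image"], out["bboxes"]
```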

3.2. Training Strategies

All experiments were conducted on a Windows 11 platform using the PyTorch 2.3.0 framework. The hardware setup included an AMD Ryzen 5 7500F processor and an NVIDIA GeForce RTX 4070 Ti Super (16 GB) GPU. Detailed experimental conditions are provided in Table 1.
Training was initialized with a learning rate of 0.01 and a weight decay of 0.0005; the detailed hyperparameters are provided in Table 2.
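For orientation, a sketch of a baseline training call with the Table 2 hyperparameters via the Ultralytics API; "yolo11n.pt" and "organoid.yaml" are placeholder names for the pretrained weights and dataset configuration.

```python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")  # YOLOv11n baseline weights (placeholder path)
model.train(
    data="organoid.yaml",   # dataset configuration (placeholder path)
    epochs=300,
    batch=8,
    workers=4,
    lr0=0.01,               # initial learning rate
    lrf=0.01,               # final learning rate factor
    momentum=0.937,
    weight_decay=0.0005,
    warmup_epochs=3,
    warmup_momentum=0.8,
)
```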
To comprehensively evaluate detection performance, we implemented a diverse benchmarking framework that includes single-stage detectors (YOLOv8n, YOLOv10n [40], YOLOv11n, and YOLOv12n [41]), two-stage detectors (Faster R-CNN with a ResNet50 backbone), and Transformer-based architectures (RTDETR-r18 [42]). Training configurations for each model are detailed in Table 3.

3.3. Evaluation Metrics

Model performance was primarily assessed using mean average precision (mAP), which is computed as the area under the precision–recall (PR) curve, providing a unified measure of both classification accuracy and localization precision. All reported mAP values use an IoU threshold of 0.50 (mAP@0.50). The corresponding calculation formula is detailed below.
$$P = \frac{TP}{TP + FP} \times 100\%$$

$$R = \frac{TP}{TP + FN} \times 100\%$$

$$AP = \sum_{i=1}^{N} p_i \, \Delta r_i$$

$$mAP = \frac{1}{n} \sum_{i=1}^{n} AP_i$$
where TP (true positive) is the count of correctly detected targets; FP (false positive) refers to cases where the model erroneously identifies a region of one organoid class as another; FN (false negative) denotes instances where the model fails to detect an actual organoid; AP (average precision) quantifies the accuracy for a specific category; $N$ denotes the number of points on the precision–recall curve for that category; $p_i$ is the precision at the $i$-th point; $\Delta r_i$ is the corresponding increment in recall; and $n$ represents the total number of target categories. To comprehensively evaluate detection performance on medium and small targets, this study incorporates the COCO evaluation metrics [43], which categorize objects into three size classes: large (area > 96 × 96 pixels), medium (32 × 32 < area ≤ 96 × 96 pixels), and small (area ≤ 32 × 32 pixels). Under this protocol, average precision (AP) is computed as the mean over 10 IoU thresholds ranging from 0.5 to 0.95 in steps of 0.05, providing increased sensitivity to localization precision.
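To make the AP summation concrete, the small NumPy sketch below evaluates it over toy precision–recall points; mAP then averages the per-class AP values.

```python
import numpy as np

def average_precision(precisions: np.ndarray, recalls: np.ndarray) -> float:
    """AP as the area under the PR curve: sum of p_i * delta r_i."""
    order = np.argsort(recalls)                    # sort points by recall
    p, r = precisions[order], recalls[order]
    delta_r = np.diff(np.concatenate(([0.0], r)))  # recall increments
    return float(np.sum(p * delta_r))

# Toy PR points: AP = 0.2*1.0 + 0.3*0.8 + 0.4*0.6 = 0.68
ap = average_precision(np.array([1.0, 0.8, 0.6]), np.array([0.2, 0.5, 0.9]))
print(f"AP = {ap:.2f}")
```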
Additionally, we introduce TIDE [44], a general toolbox for identifying object detection errors that overcomes the aggregation limitations of the conventional mAP metric. TIDE applies a decoupled error decomposition approach, incorporating a dynamic IoU threshold strategy: it defines both a foreground determination threshold $t_f$ and a background exclusion threshold $t_b$, then quantifies the spatial correlation between false-positive (FP) detection boxes and ground truth (GT) annotations using the maximum IoU overlap ($\mathrm{IoU}_{\max}$). This enables systematic decomposition of detection errors into six core categories (a toy classification sketch follows the list):
(1) Classification error (Cls): the predicted box overlaps a GT correctly ($\mathrm{IoU}_{\max} \ge t_f$) but is assigned an incorrect class label.
(2) Localization error (Loc): the class label is correct, but the box lacks sufficient spatial alignment with the GT ($t_b \le \mathrm{IoU}_{\max} \le t_f$).
(3) Classification + localization error (Cls + Loc): both the class prediction and the spatial alignment are incorrect ($t_b \le \mathrm{IoU}_{\max} \le t_f$).
(4) Duplicate detection error (Dupe): multiple detection boxes match a single GT instance ($\mathrm{IoU}_{\max} \ge t_f$); only the highest-confidence box is valid, while the others are redundant.
(5) Background error (Bkg): the background is detected as a target, with $\mathrm{IoU}_{\max} \le t_b$ between the predicted box and any GT.
(6) Missed detection error (Miss): a GT instance is not matched by any detection box, and the miss is not attributable to the preceding error categories.
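The toy sketch below assigns a single false positive to one of these categories using TIDE's default thresholds ($t_f$ = 0.5, $t_b$ = 0.1); the Miss category concerns unmatched GT instances and therefore falls outside this per-detection view.

```python
def tide_category(iou_max: float, class_correct: bool, duplicate: bool,
                  t_f: float = 0.5, t_b: float = 0.1) -> str:
    """Classify one detection following the definitions above."""
    if iou_max >= t_f:
        if not class_correct:
            return "Cls"                 # well localized, wrong label
        if duplicate:
            return "Dupe"                # GT already claimed by a better box
        return "TP"                      # correctly matched detection
    if iou_max >= t_b:
        return "Loc" if class_correct else "Cls+Loc"
    return "Bkg"                         # fired on background

print(tide_category(0.72, class_correct=False, duplicate=False))  # Cls
print(tide_category(0.30, class_correct=True, duplicate=False))   # Loc
print(tide_category(0.05, class_correct=True, duplicate=False))   # Bkg
```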

3.4. Experimental Results and Analysis

The experimental design centers on three core axes: model performance validation, component-wise effectiveness analysis, and cross-dataset generalization. First, comparative experiments assess the detection accuracy and computational efficiency of Orga-Dete relative to state-of-the-art models. Second, ablation studies quantify the individual contributions of key components—such as attention mechanisms and loss function strategies—to overall detection performance. Finally, generalization capability is evaluated across multiple organoid datasets, with comparisons drawn against baseline methods from existing literature.

3.4.1. Experimental Results and Comparative Analysis of Different Models

We analyze model performance across four dimensions: detection accuracy, error decomposition, computational complexity, and practical detection effectiveness. Detection accuracy is measured using metrics such as mAP@0.5, APsmall and APmedium. Since the lung organoid dataset lacks large objects (area > 96 × 96 pixels), APlarge is consistently −1 and excluded from the analysis to preserve evaluative rigor. Error weight analysis identifies dominant error sources, while computational metrics (e.g., parameter count, model size, and frames per second [FPS]) are evaluated to balance accuracy with deployment efficiency. Finally, visual comparisons of detection outcomes illustrate model-specific strengths and limitations in organoid detection.
This study employed 10-fold cross-validation, retaining one fold as the validation set in each iteration; the reported performance is the mean of the ten validation results, with an average mean squared error below 0.001 across folds. To evaluate the effect of data augmentation, we trained both YOLOv11n and Orga-Dete on the original and the augmented dataset. The results, summarized in Table 4, demonstrate that data augmentation significantly enhances detection accuracy.
Orga-Dete achieves the best performance across all metrics. With data augmentation, it reaches 81.4% mAP@0.5, surpassing every other model, and its per-class accuracies of 76.4% (Lumen) and 86.5% (No Lumen) lead the field.
It is particularly strong on small and medium organoids, with 51.6% APsmall and 63.5% APmedium, far above Faster R-CNN (26.1% and 45.8%) and ahead of all YOLO variants. Even without augmentation, it outperforms most competing models, highlighting its inherent effectiveness and confirming its superiority in both overall and category-specific detection.
To assess model-specific error patterns, TIDE-based error analysis is conducted, with the results summarized in Table 5.
Background false-positive errors are prevalent across the evaluated models, primarily due to defocused organoid background noise within the dataset. Faster R-CNN performs notably poorly in the classification error category, consistent with its overall low detection accuracy. In contrast, Orga-Dete records the lowest total error weight across categories, confirming its effectiveness in improving detection precision and minimizing cumulative errors. Its strong performance in reducing classification errors is primarily attributed to the EMASlideLoss function, which effectively mitigates sample imbalance in lung organoid datasets.
This study visualizes the absolute error weights of each model using bar charts (Figure 6). Among the evaluated models, Orga-Dete exhibits a balanced distribution across error types, highlighting its robust performance and generalization capability in lung organoid detection.
A comprehensive evaluation of computational cost (FLOPs) and parameter count for each model was conducted using a 640 × 640 input resolution. The results are summarized in Table 6.
Orga-Dete requires 2.25 M parameters, 6.3 GFLOPs, and 4.6 MB of storage while running at 246 FPS. It uses fewer parameters and less storage than YOLOv8n/10n/11n/12n, matches YOLOv11n in GFLOPs, offers competitive speed, and is far more efficient than RTDETR-r18 and Faster R-CNN.
A 3D scatter plot of detection accuracy versus computational efficiency (Figure 7) illustrates the trade-off between performance and resource usage. Given that Faster R-CNN’s computational cost (149.6 GFLOPs) is 9.8× higher than the average of the other models (15.3 GFLOPs)—with no corresponding gain in accuracy—it is excluded from this plot to prevent skewed analysis due to scale imbalance. The figure demonstrates that the proposed Orga-Dete model not only boosts detection performance effectively but also lowers computational requirements and parameter counts, thereby outperforming the other models compared.
To assess real-world detection capability, four high-performing models—YOLOv8n, YOLOv11n, RTDETR-r18, and Orga-Dete—were tested on representative lung organoid images. As shown in Figure 8, Orga-Dete consistently delivers the highest detection accuracy across all test cases. Its superiority is especially pronounced in low-contrast scenes and Lumen regions with subtle deformations, where it demonstrates markedly enhanced fine-grained feature extraction compared to the alternatives.

3.4.2. Comparative Experiments on Feature Pyramid Architectures

To evaluate performance discrepancies among feature pyramid architectures for pulmonary organoid detection, we benchmarked BiMAFPN, EMBSFPN, and the Bi-directional Feature Pyramid Network (BiFPN) based on YOLOv11n. As detailed in Table 7, all three architectures enhanced detection accuracy, with BiFPN achieving optimal performance without additional computational overhead.

3.4.3. Comparative Experiments on Different Attention Mechanisms

To validate the superiority of the MPCA attention mechanism, we compared mainstream attention mechanisms—including coordinate attention (CA), parameter-free attention (SimAM), and combined spatial-channel attention (CBAM)—inserted at the end of the backbone network in YOLOv11n (augmented with BiFPN). As shown in Table 8, MPCA achieved 80.7% mAP@0.5 for pulmonary organoid detection, surpassing the baseline by 0.8% and significantly outperforming the other mechanisms, without notably increasing computational complexity.

3.4.4. Comparative Experiments on Different Loss Functions

To address class imbalance, this study introduces the EMASlideLoss function and benchmarks it against mainstream IoU variants. As shown in Table 9, compared with other loss functions, EMASlideLoss achieves the best performance. It significantly improves the detection accuracy of hard samples without increasing computational load.

3.4.5. Ablation Experiment Analysis

To systematically validate the contributions of Orga-Dete’s improvement modules, a series of ablation experiments was conducted using the lung organoid detection dataset. Enhancement modules were incrementally introduced to the baseline model (YOLOv11n), forming the following experimental configurations: A: Baseline YOLOv11n; B: Replaces FPN + PAN with BiFPN; C: Adds MPCA after C2PSA; D: Upgrades the classification loss to EMASlideLoss. The results are summarized in Table 10 and Table 11.
Replacing the original FPN + PAN with BiFPN (Experiment B) yields substantial performance gains: AP for small and medium targets increases by 3.7% and 2.6%, respectively, without incurring significant computational overhead. Moreover, parameter count and model size are reduced by 25.6% and 23.1%, respectively, confirming that BiFPN effectively suppresses background noise and enhances target discrimination through its bidirectional, cross-layer fusion and learnable weighting.
Adding the MPCA module after C2PSA (Experiment C) further improves performance, increasing AP for small targets by 4.0% and medium targets by 1.1%. While this enhancement leads to a 12.1% increase in parameters (from 2.58 M to 2.89 M), the frame rate remains high (300 FPS), maintaining real-time viability.
Finally, upgrading the classification loss to EMASlideLoss (Experiment D) introduces no additional parameters or computational cost, yet significantly mitigates the sample imbalance problem, resulting in a measurable accuracy boost.

3.4.6. Performance on Other Organoid Datasets

To rigorously assess cross-domain generalization, this study employs a dual validation strategy using two external datasets: a human duodenal biopsy organoid dataset (1750 images, 14,242 instance annotations) from [12] and a murine intestinal organoid dataset (840 images, 23,066 four-class annotations) from [22]. Representative images are presented in Figure 9, highlighting the substantial visual disparities among the three datasets.
Table 12 lists the comparison results of the detection performance of Orga-Dete with mainstream detection models on these two representative datasets.
As shown in Table 12, Orga-Dete achieves the highest detection accuracy on the human duodenal organoid dataset (Dataset (b), mAP@0.5: 92.5%), validating its cross-domain generalization. On the murine intestinal organoid dataset (Dataset (c)), Orga-Dete achieves an mAP@0.5 of 84.4%, slightly below YOLOv8n’s 84.6%. However, Orga-Dete significantly reduces model complexity, with only 2.25 M parameters and 6.3 GFLOPs—25.0% and 22.2% less than YOLOv8n, respectively. These results demonstrate that Orga-Dete maintains competitive performance with a lightweight architecture well-suited for edge deployment.

4. Discussion and Conclusions

Organoids—three-dimensional in vitro cultures that closely mimic the structural and functional characteristics of human organs—are emerging as essential tools for disease modeling and drug discovery. However, traditional microscopic evaluation methods are hindered by subjectivity and low throughput. Deep learning approaches address these limitations by enabling automated, quantitative analysis of organoid morphology, improving both detection accuracy and processing efficiency.
To meet the challenges of detecting dense, multi-scale organoids with low-contrast morphologies, this study introduces Orga-Dete, a lightweight detection model built upon YOLOv11n and optimized through three architectural innovations: BiFPN replaces the original FPN + PAN to enable bidirectional multi-scale feature fusion without increasing computational overhead. MPCA, inserted after C2PSA, enhances micro-organoid feature representation by combining coordinate attention with a multi-path structure. EMASlideLoss, a dynamic threshold calibration loss, mitigates class imbalance and reduces the omission of hard samples. Under the dataset and experimental settings of this study, compared to YOLO variants, Faster R-CNN, and Transformer-based RTDETR-r18, Orga-Dete demonstrates superior accuracy, computational efficiency, and parameter compactness, though its generalizability remains to be validated on larger-scale cross-scenario data.
The practical value of Orga-Dete lies in its ability to perform high-speed, non-invasive morphological analysis of organoids from microscopic images—achieving millisecond-level parsing while preserving sample viability. Its parallel processing capability reduces contamination risks and overcomes throughput limitations associated with manual inspection. This enables standardized, high-throughput analysis frameworks for drug toxicity screening and dynamic efficacy evaluation, accelerating the shift from 2D cell assays to 3D tissue-based platforms in pharmaceutical research.
Nevertheless, this study acknowledges two key limitations. First, the skewed class distribution constrains Orga-Dete's performance: although EMASlideLoss partially compensates for this imbalance, inter-class accuracy gaps remain significant, indicating the need for more balanced and comprehensive datasets. Second, performance degrades in multi-class tasks with strong morphological heterogeneity: Orga-Dete slightly underperforms state-of-the-art models on the murine intestinal organoid dataset, highlighting limitations in generalizing to morphologically diverse targets.

Author Contributions

X.H.: investigation, literature search, writing—original draft preparation. Q.G., H.Z. and F.M.: writing—review and editing. D.L.: investigation, supervision. G.L.: project administration, funding acquisition. All authors contributed to the article and approved the submitted version. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Suzhou Key Technology (Research) Project of Critical and Infectious Diseases Precaution and Control (Grant No. GWZX202102) and the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDA16021102).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset (a) can be downloaded from https://osf.io/g2a7r/ (accessed on 5 March 2025); the dataset (b) can be downloaded from https://osf.io/etz8r (accessed on 16 March 2025); the dataset (c) can be downloaded from https://zenodo.org/records/6768583 (accessed on 1 April 2025).

Acknowledgments

The authors extend their sincere gratitude to the academic teams at Suzhou Institute of Biomedical Engineering and Technology (Chinese Academy of Sciences) and Nanjing Normal University’s School of Electrical Engineering and Automation for their instrumental contributions to experimental design and technical validation throughout this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Jackson, E.; Lu, H. Three-dimensional models for studying development and disease: Moving on from organisms to organs-on-a-chip and organoids. Integr. Biol. 2016, 8, 672–683. [Google Scholar] [CrossRef]
  2. Zhao, Z.; Chen, X.; Dowbaj, A.M.; Sljukic, A.; Bratlie, K.; Lin, L.; Fong, E.L.S.; Balachander, G.M.; Chen, Z.; Soragni, A. Organoids. Nat. Rev. Methods Primers 2022, 2, 94. [Google Scholar] [CrossRef]
  3. Keshara, R.; Kim, Y.H.; Grapin-Botton, A. Organoid imaging: Seeing development and function. Annu. Rev. Cell Dev. Biol. 2022, 38, 447–466. [Google Scholar] [CrossRef]
  4. Park, T.; Kim, T.K.; Han, Y.D.; Kim, K.-A.; Kim, H.; Kim, H.S. Development of a deep learning based image processing tool for enhanced organoid analysis. Sci. Rep. 2023, 13, 19841. [Google Scholar] [CrossRef] [PubMed]
  5. Fei, K.; Zhang, J.; Yuan, J.; Xiao, P. Present application and perspectives of organoid imaging technology. Bioengineering 2022, 9, 121. [Google Scholar] [CrossRef] [PubMed]
  6. Borten, M.A.; Bajikar, S.S.; Sasaki, N.; Clevers, H.; Janes, K.A. Automated brightfield morphometry of 3D organoid populations by OrganoSeg. Sci. Rep. 2018, 8, 5319. [Google Scholar] [CrossRef] [PubMed]
  7. Bai, L.; Wu, Y.; Li, G.; Zhang, W.; Zhang, H.; Su, J. AI-enabled organoids: Construction, analysis, and application. Bioact. Mater. 2024, 31, 525–548. [Google Scholar] [CrossRef]
  8. Zhao, Z.-Q.; Zheng, P.; Xu, S.-t.; Wu, X. Object detection with deep learning: A review. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 3212–3232. [Google Scholar] [CrossRef] [PubMed]
  9. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  10. Girshick, R. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
  11. Jaderberg, M.; Simonyan, K.; Zisserman, A. Spatial transformer networks. In Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, Montreal, QC, Canada, 7–12 December 2015. [Google Scholar]
  12. Kassis, T.; Hernandez-Gordillo, V.; Langer, R.; Griffith, L.G. OrgaQuant: Human intestinal organoid localization and quantification using deep convolutional neural networks. Sci. Rep. 2019, 9, 12479. [Google Scholar] [CrossRef]
  13. Abdul, L.; Rajasekar, S.; Lin, D.S.; Raja, S.V.; Sotra, A.; Feng, Y.; Liu, A.; Zhang, B. Deep-LUMEN assay–human lung epithelial spheroid classification from brightfield images using deep learning. Lab Chip 2020, 20, 4623–4631. [Google Scholar] [CrossRef]
  14. Kegeles, E.; Naumov, A.; Karpulevich, E.A.; Volchkov, P.; Baranov, P. Convolutional neural networks can predict retinal differentiation in retinal organoids. Front. Cell. Neurosci. 2020, 14, 171. [Google Scholar] [CrossRef]
  15. Wang, X.; Liao, J.; Yue, G.; He, L.; Wang, T.; Zhou, G.; Lei, B. Induced pluripotent stem cells detection via ensemble Yolo network. In Proceedings of the 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Virtual, 31 October–4 November 2021; pp. 3738–3741. [Google Scholar]
  16. Li, X.; Xu, Z.; Shen, X.; Zhou, Y.; Xiao, B.; Li, T.-Q. Detection of cervical cancer cells in whole slide images using deformable and global context aware faster RCNN-FPN. Curr. Oncol. 2021, 28, 3585–3601. [Google Scholar] [CrossRef]
  17. Bian, X.; Li, G.; Wang, C.; Shen, S.; Liu, W.; Lin, X.; Chen, Z.; Cheung, M.; Luo, X. OrgaNet: A deep learning approach for automated evaluation of organoids viability in drug screening. In Proceedings of the Bioinformatics Research and Applications: 17th International Symposium, ISBRA 2021, Shenzhen, China, 26–28 November 2021; Proceedings 17; pp. 411–423. [Google Scholar]
  18. Powell, R.T.; Moussalli, M.J.; Guo, L.; Bae, G.; Singh, P.; Stephan, C.; Shureiqi, I.; Davies, P.J. deepOrganoid: A brightfield cell viability model for screening matrix-embedded organoids. SLAS Discov. 2022, 27, 175–184. [Google Scholar] [CrossRef]
  19. Okamoto, T.; Natsume, Y.; Doi, M.; Nosato, H.; Iwaki, T.; Yamanaka, H.; Yamamoto, M.; Kawachi, H.; Noda, T.; Nagayama, S. Integration of human inspection and artificial intelligence-based morphological typing of patient-derived organoids reveals interpatient heterogeneity of colorectal cancer. Cancer Sci. 2022, 113, 2693–2703. [Google Scholar] [CrossRef]
  20. Abdul, L.; Xu, J.; Sotra, A.; Chaudary, A.; Gao, J.; Rajasekar, S.; Anvari, N.; Mahyar, H.; Zhang, B. D-CryptO: Deep learning-based analysis of colon organoid morphology from brightfield images. Lab Chip 2022, 22, 4118–4128. [Google Scholar] [CrossRef] [PubMed]
  21. Du, X.; Cui, W.; Song, J.; Cheng, Y.; Qi, Y.; Zhang, Y.; Li, Q.; Zhang, J.; Sha, L.; Ge, J. Sketch the Organoids from Birth to Death–Development of an Intelligent OrgaTracker System for Multi-Dimensional Organoid Analysis and Recreation. bioRxiv 2022. [Google Scholar] [CrossRef]
  22. Domènech-Moreno, E.; Brandt, A.; Lemmetyinen, T.T.; Wartiovaara, L.; Mäkelä, T.P.; Ollila, S. Tellu–an object-detector algorithm for automatic classification of intestinal organoids. Dis. Models Mech. 2023, 16, dmm049756. [Google Scholar] [CrossRef] [PubMed]
  23. Yang, R.; Du, Y.; Kwan, W.; Yan, R.; Shi, Q.; Zang, L.; Zhu, Z.; Zhang, J.; Li, C.; Yu, Y. A quick and reliable image-based AI algorithm for evaluating cellular senescence of gastric organoids. Cancer Biol. Med. 2023, 20, 519–536. [Google Scholar] [CrossRef]
  24. Leng, B.; Jiang, H.; Wang, B.; Wang, J.; Luo, G. Deep-Orga: An improved deep learning-based lightweight model for intestinal organoid detection. Comput. Biol. Med. 2024, 169, 107847. [Google Scholar] [CrossRef] [PubMed]
  25. Huang, K.; Li, M.; Li, Q.; Chen, Z.; Zhang, Y.; Gu, Z. Image-based profiling and deep learning reveal morphological heterogeneity of colorectal cancer organoids. Comput. Biol. Med. 2024, 173, 108322. [Google Scholar] [CrossRef]
  26. Sun, Y.; Zhang, H.; Huang, F.; Gao, Q.; Li, P.; Li, D.; Luo, G. Deliod a lightweight detection model for intestinal organoids based on deep learning. Sci. Rep. 2025, 15, 5040. [Google Scholar] [CrossRef] [PubMed]
  27. Khanam, R.; Hussain, M. Yolov11: An overview of the key architectural enhancements. arXiv 2024. [Google Scholar] [CrossRef]
  28. Varghese, R.; Sambath, M. Yolov8: A novel object detection algorithm with enhanced performance and robustness. In Proceedings of the 2024 International Conference on Advances in Data Engineering and Intelligent Computing Systems (ADICS), Chennai, India, 18–19 April 2024; pp. 1–6. [Google Scholar]
  29. Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018. [Google Scholar] [CrossRef]
  30. Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
  31. Jocher, G.; Stoken, A.; Borovec, J.; Changyu, L.; Hogan, A.; Diaconu, L.; Poznanski, J.; Yu, L.; Rai, P.; Ferriday, R. ultralytics/yolov5: v3.0; Zenodo: Geneva, Switzerland, 2020. [Google Scholar]
  32. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768. [Google Scholar]
  33. Tan, M.; Pang, R.; Le, Q.V. Efficientdet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790. [Google Scholar]
  34. Fang, H.; Liao, Z.; Wang, X.; Chang, Y.; Yan, L. Differentiated attention guided network over hierarchical and aggregated features for intelligent UAV surveillance. IEEE Trans. Ind. Inform. 2023, 19, 9909–9920. [Google Scholar] [CrossRef]
  35. Huang, J.; Zhang, W.; Jin, W.; Hu, H. Surface defect detection of planar optical components based on OPT-YOLO. Opt. Lasers Eng. 2025, 190, 108974. [Google Scholar] [CrossRef]
  36. Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13713–13722. [Google Scholar]
  37. Jiang, T.; Zhou, J.; Xie, B.; Liu, L.; Ji, C.; Liu, Y.; Liu, B.; Zhang, B. Improved YOLOv8 model for lightweight pigeon egg detection. Animals 2024, 14, 1226. [Google Scholar] [CrossRef]
  38. Yu, Z.; Huang, H.; Chen, W.; Su, Y.; Liu, Y.; Wang, X. Yolo-facev2: A scale and occlusion aware face detector. Pattern Recognit. 2024, 155, 110714. [Google Scholar] [CrossRef]
  39. Buslaev, A.; Iglovikov, V.I.; Khvedchenya, E.; Parinov, A.; Druzhinin, M.; Kalinin, A.A. Albumentations: Fast and flexible image augmentations. Information 2020, 11, 125. [Google Scholar] [CrossRef]
  40. Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J. Yolov10: Real-time end-to-end object detection. Adv. Neural Inf. Process. Syst. 2024, 37, 107984–108011. [Google Scholar]
  41. Tian, Y.; Ye, Q.; Doermann, D. Yolov12: Attention-centric real-time object detectors. arXiv 2025, arXiv:2502.12524. [Google Scholar] [CrossRef]
  42. Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. Detrs beat yolos on real-time object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 16965–16974. [Google Scholar]
  43. Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft coco: Common objects in context. In Proceedings of the Computer Vision—ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Part v 13; pp. 740–755. [Google Scholar]
  44. Bolya, D.; Foley, S.; Hays, J.; Hoffman, J. Tide: A general toolbox for identifying object detection errors. In Proceedings of the Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Part III 16; pp. 558–573. [Google Scholar]
Figure 1. Network architecture diagram of YOLOv11.
Figure 2. Feature fusion network architectures: (a) FPN; (b) FPN + PAN; (c) BiFPN.
Figure 3. Network architecture diagram of MPCA.
Figure 4. Network architecture diagram of Orga-Dete.
Figure 5. Representative samples of lung organoids: (a) detection output of Orga-Dete, where “L” denotes “With Lumen” and “N” denotes “No Lumen”; (b) lung organoid with lumen; (c) lung organoid without lumen; (d) defocused lung organoid.
Figure 6. Comparison of absolute error weights across models.
Figure 7. Detection performance, computational cost, and parameter count for each model.
Figure 8. Comparison of practical detection effectiveness across models.
Figure 9. Representative images of organoid datasets: (a) lung organoids; (b) duodenal biopsy organoid dataset; (c) murine intestinal organoid dataset.
Table 1. The experimental environment.

| Software Environment | Hardware Environment |
| Operating system: Windows 11 | CPU: AMD R5-7500F |
| Programming language: Python 3.10 | GPU: RTX 4070 Ti Super (16 GB) × 1 |
| Deep learning framework: PyTorch 2.3.0 | |
| Accelerated environment: CUDA 12.1 | |
Table 2. Hyperparameter settings.

| Parameter | Value |
| lr0 | 0.01 |
| lrf | 0.01 |
| Momentum | 0.937 |
| Weight decay | 0.0005 |
| Warmup epochs | 3 |
| Warmup momentum | 0.8 |
| Batch size | 8 |
| Epochs | 300 |
| Workers | 4 |
Table 3. Model parameters.

| Model | Optimizer | Initial Learning Rate | Final Learning Rate | Batch Size | Epoch Number |
| YOLOv8n | SGD | 0.01 | 0.01 | 8 | 300 |
| YOLOv10n | SGD | 0.01 | 0.01 | 8 | 300 |
| YOLOv11n | SGD | 0.01 | 0.01 | 8 | 300 |
| YOLOv12n | SGD | 0.01 | 0.01 | 8 | 300 |
| Faster R-CNN | Adam | 0.0001 | 0.01 | 8 | 300 |
| RTDETR-r18 | AdamW | 0.0001 | 1.0 | 8 | 300 |
| Ours | SGD | 0.01 | 0.01 | 8 | 300 |
Table 4. Performance comparison between various network models.

| Model | Lumen | No Lumen | mAP@0.5 | APsmall | APmedium | APall |
| YOLOv8n | 74.0 ± 0.2% | 82.4 ± 0.3% | 78.2 ± 0.2% | 46.1 ± 0.1% | 60.3 ± 0.2% | 59.4 ± 0.2% |
| YOLOv10n | 70.6 ± 0.2% | 84.2 ± 0.1% | 77.3 ± 0.2% | 48.2 ± 0.2% | 55.7 ± 0.3% | 54.6 ± 0.2% |
| YOLOv11n (without data augmentation) | 65.8 ± 0.1% | 82.8 ± 0.2% | 74.3 ± 0.1% | 40.1 ± 0.2% | 55.8 ± 0.1% | 54.2 ± 0.2% |
| YOLOv11n | 70.8 ± 0.2% | 84.9 ± 0.2% | 77.9 ± 0.1% | 45.1 ± 0.2% | 58.3 ± 0.1% | 58.1 ± 0.2% |
| YOLOv12n | 70.1 ± 0.1% | 83.7 ± 0.1% | 76.9 ± 0.1% | 41.6 ± 0.2% | 54.4 ± 0.1% | 53.6 ± 0.1% |
| RTDETR-r18 | 70.8 ± 0.1% | 84.1 ± 0.2% | 77.4 ± 0.1% | 48.1 ± 0.1% | 59.5 ± 0.1% | 57.4 ± 0.1% |
| Faster R-CNN | 69.4 ± 0.2% | 72.6 ± 0.1% | 71.0 ± 0.1% | 26.1 ± 0.2% | 45.8 ± 0.3% | 44.1 ± 0.3% |
| Ours (without data augmentation) | 73.4 ± 0.1% | 83.5 ± 0.3% | 78.5 ± 0.2% | 49.8 ± 0.1% | 60.5 ± 0.1% | 59.6 ± 0.1% |
| Ours | 76.4 ± 0.1% | 86.5 ± 0.2% | 81.4 ± 0.2% | 51.6 ± 0.1% | 63.5 ± 0.2% | 62.8 ± 0.1% |
Table 5. Comparison of the absolute weight of each model error.

| Model | Cls | Loc | Both | Dupe | Bkg | Miss |
| YOLOv8n | 6.33 ± 0.44 | 0.70 ± 0.18 | 0.41 ± 0.12 | 0.00 ± 0.00 | 11.06 ± 0.74 | 0.00 ± 0.00 |
| YOLOv10n | 6.78 ± 0.47 | 0.60 ± 0.17 | 0.53 ± 0.14 | 0.01 ± 0.01 | 12.03 ± 0.78 | 0.00 ± 0.00 |
| YOLOv11n | 6.60 ± 0.45 | 0.63 ± 0.18 | 0.47 ± 0.10 | 0.01 ± 0.00 | 12.31 ± 0.81 | 0.00 ± 0.00 |
| YOLOv12n | 7.11 ± 0.49 | 0.68 ± 0.20 | 0.41 ± 0.13 | 0.00 ± 0.00 | 11.14 ± 0.76 | 0.00 ± 0.00 |
| RTDETR-r18 | 13.66 ± 0.75 | 0.57 ± 0.18 | 0.46 ± 0.14 | 0.01 ± 0.01 | 8.61 ± 0.66 | 0.00 ± 0.00 |
| Faster R-CNN | 32.88 ± 1.36 | 2.19 ± 0.37 | 1.73 ± 0.30 | 2.29 ± 0.37 | 2.84 ± 0.41 | 0.20 ± 0.12 |
| Ours | 6.23 ± 0.44 | 0.44 ± 0.15 | 0.32 ± 0.11 | 0.00 ± 0.00 | 10.61 ± 0.72 | 0.00 ± 0.00 |
Table 6. Comparison of the number of computations and parameters for each model.

| Model | Params (M) | GFLOPs | FPS | Model Size (MB) |
| YOLOv8n | 3.0 | 8.1 | 303 ± 5 | 6.0 |
| YOLOv10n | 2.7 | 8.2 | 416 ± 8 | 5.5 |
| YOLOv11n | 2.58 | 6.3 | 280 ± 6 | 5.2 |
| YOLOv12n | 2.5 | 5.8 | 231 ± 4 | 5.2 |
| RTDETR-r18 | 19.9 | 56.9 | 100 ± 3 | 38.6 |
| Faster R-CNN | 28.28 | 149.6 | 69 ± 3 | 108 |
| Ours | 2.25 | 6.3 | 246 ± 5 | 4.6 |
Table 7. Comparative experiments on feature pyramid architectures.

| Model | mAP@0.5 | Params (M) | GFLOPs |
| YOLOv11n | 77.9 ± 0.1% | 2.58 | 6.3 |
| YOLOv11n + BiFPN | 79.9 ± 0.1% | 1.92 | 6.3 |
| YOLOv11n + BiMAFPN | 78.8 ± 0.2% | 2.12 | 6.6 |
| YOLOv11n + EMBSFPN | 79.1 ± 0.1% | 2.60 | 6.8 |
Table 8. Comparative experiments on different attention mechanisms.

| Model | mAP@0.5 | Params (M) | GFLOPs |
| YOLOv11n + BiFPN | 79.9 ± 0.1% | 1.92 | 6.3 |
| YOLOv11n + BiFPN + CA | 80.1 ± 0.1% | 1.93 | 6.3 |
| YOLOv11n + BiFPN + CBAM | 78.3 ± 0.3% | 1.99 | 6.3 |
| YOLOv11n + BiFPN + SimAM | 77.1 ± 0.2% | 1.92 | 6.3 |
| YOLOv11n + BiFPN + MPCA | 80.7 ± 0.1% | 2.25 | 6.3 |
Table 9. Comparative experiments on different loss functions.

| Loss Function | Lumen | No Lumen | mAP@0.5 |
| CIoU | 75.7 ± 0.3% | 84.5 ± 0.2% | 80.1 ± 0.2% |
| EIoU | 72.6 ± 0.2% | 83.4 ± 0.0% | 78.0 ± 0.1% |
| GIoU | 75.1 ± 0.2% | 81.9 ± 0.2% | 78.5 ± 0.2% |
| Focal loss | 76.8 ± 0.1% | 85.4 ± 0.3% | 81.1 ± 0.2% |
| EMASlideLoss | 76.2 ± 0.2% | 86.5 ± 0.1% | 81.4 ± 0.1% |
Table 10. Comparison results of ablation experiments of different models.

| A | B | C | D | Lumen | No Lumen | mAP@0.5 | APsmall | APmedium | APall |
| ✓ | | | | 70.8 ± 0.2% | 84.9 ± 0.2% | 77.9 ± 0.2% | 45.1 ± 0.2% | 58.8 ± 0.1% | 58.1 ± 0.1% |
| ✓ | ✓ | | | 73.8 ± 0.1% | 86.0 ± 0.2% | 79.9 ± 0.1% | 48.8 ± 0.3% | 61.4 ± 0.2% | 60.7 ± 0.2% |
| ✓ | | ✓ | | 72.3 ± 0.2% | 86.1 ± 0.1% | 79.2 ± 0.1% | 49.1 ± 0.1% | 59.9 ± 0.1% | 59.1 ± 0.2% |
| ✓ | | | ✓ | 74.9 ± 0.2% | 83.7 ± 0.1% | 79.3 ± 0.1% | 45.4 ± 0.2% | 59.0 ± 0.2% | 58.5 ± 0.1% |
| ✓ | ✓ | ✓ | | 74.3 ± 0.1% | 87.0 ± 0.2% | 80.7 ± 0.1% | 50.7 ± 0.2% | 62.7 ± 0.2% | 62.1 ± 0.2% |
| ✓ | ✓ | ✓ | ✓ | 76.2 ± 0.1% | 86.5 ± 0.2% | 81.4 ± 0.2% | 51.6 ± 0.1% | 63.5 ± 0.2% | 62.8 ± 0.1% |
Table 11. Computational cost of ablation experiments using different models.

| A | B | C | D | Params (M) | GFLOPs | FPS | Model Size (MB) |
| ✓ | | | | 2.58 | 6.3 | 280 ± 6 | 5.2 |
| ✓ | ✓ | | | 1.92 | 6.3 | 269 ± 9 | 4.0 |
| ✓ | | ✓ | | 2.89 | 6.4 | 300 ± 13 | 5.3 |
| ✓ | | | ✓ | 2.58 | 6.3 | 276 ± 8 | 5.2 |
| ✓ | ✓ | ✓ | | 2.25 | 6.3 | 243 ± 4 | 4.6 |
| ✓ | ✓ | ✓ | ✓ | 2.25 | 6.3 | 246 ± 5 | 4.6 |
Table 12. Performance on other organoid datasets.

| Dataset | Detection Model | mAP@0.5 |
| Dataset (b) | YOLOv8n | 91.3 ± 0.4% |
| | YOLOv11n | 91.6 ± 0.3% |
| | RTDETR-r18 | 90.1 ± 0.2% |
| | YOLOv12n | 91.7 ± 0.3% |
| | Ours | 92.5 ± 0.3% |
| Dataset (c) | YOLOv8n | 84.6 ± 0.2% |
| | YOLOv11n | 83.5 ± 0.2% |
| | RTDETR-r18 | 83.2 ± 0.1% |
| | YOLOv12n | 82.3 ± 0.3% |
| | Ours | 84.4 ± 0.3% |