Article

Optimal Cell Segmentation and Counting Strategy for Embedding in Low-Power AIoT Devices

Department of Smart Automotive, Soonchunhyang University, 22 Soonchunhyang-ro, Sinchang-myeon, Asan-si 31538, Chungcheongnam-do, Republic of Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(1), 357; https://doi.org/10.3390/app16010357
Submission received: 9 November 2025 / Revised: 19 December 2025 / Accepted: 27 December 2025 / Published: 29 December 2025
(This article belongs to the Special Issue Advanced Intelligent Technologies in Bioinformatics and Biomedicine)

Abstract

This study proposes an end-to-end (E2E) optimization methodology for a white blood cell (WBC) cell segmentation and counting (CSC) pipeline with a focus on deployment to low-power Artificial Intelligence of Things (AIoT) devices. The proposed framework addresses not only the selection of the segmentation model but also the corresponding loss function design, watershed threshold optimization for cell counting, and model compression strategies to balance accuracy, latency, and model size in embedded AIoT applications. For segmentation model selection, UNet, UNet++, ResUNet, EffUNet, FPN, BiFPN, PFPN, CellViT, EvitUNet, and MaxViTUNet were employed, and three loss functions—binary cross-entropy (BCE), focal loss, and Dice loss—were used for model training. For cell-counting accuracy optimization, a distance transform-based watershed algorithm was applied, and the optimal threshold value was determined experimentally by searching the range of 0.4 to 0.9. Quantization and pruning techniques were also considered for model compression. Experimental results demonstrate that using an FPN model trained with focal loss and setting the watershed threshold to 0.65 yields the optimal configuration. Compared to the latest baseline techniques, the proposed CSC E2E pipeline achieves a 21.1% improvement in cell-counting accuracy while reducing model size by 74.5% and latency by 16.8% through model compression. These findings verify the effectiveness of the proposed optimization strategy as a lightweight and efficient solution for real-time biomedical applications on low-power AIoT devices.

1. Introduction

Accurate segmentation and counting of white blood cells (WBCs) in peripheral blood smear images play a critical role not only in hematological diagnosis but also in a wide range of clinical and industrial applications, including automated hematology analyzers, tele-hematology platforms, and clinical laboratory automation systems [1,2,3]. Morphological differentiation and quantitative analysis of WBCs provide concentration-related information that serves as a key biomarker for diagnosing and assessing the prognosis of infectious, autoimmune, and hematologic neoplastic diseases. Traditionally, hematologists manually identify and classify cells under a microscope using stained smear slides. However, this process is time-consuming, labor-intensive, and subject to human error and inter-observer variability [4,5]. Such manual dependence not only limits clinical reliability but also reduces scalability for large-population studies and high-throughput biomedical research.
The cell segmentation and counting (CSC) technique has significant industrial potential across various biomedical fields. In clinical laboratories, it enables standardized, rapid, and reproducible diagnostic workflows, thereby reducing the workload of hematologists and minimizing human errors [6]. In digital pathology and tele-hematology, these automated systems provide a foundation for remote diagnostic platforms and deliver substantial value in regions with limited access to specialized hematologists [7]. In the pharmaceutical industry, automated WBC quantification plays an essential role in evaluating drug toxicity and immune responses across large-scale sample cohorts, thereby contributing to drug development pipelines and personalized precision medicine [8]. Integrating AI-based cell segmentation models into automated hematology analyzers holds the potential to transform the healthcare industry by enhancing both clinical utility and economic scalability.
Recent advances in computer vision and artificial intelligence [9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25] have created new opportunities to automate WBC segmentation and counting with improved accuracy and efficiency. Early studies utilized conventional image-processing methods such as thresholding and morphological operations [4,20,21]. However, these methods were sensitive to illumination variations, staining inconsistencies, and cell overlap, making them unreliable in practical clinical environments. In contrast, recent deep learning-based approaches, particularly CNN-based architectures such as U-Net [9], U-Net++ [10], Mask R-CNN [22], and Hover-Net [23], have demonstrated superior pixel-wise segmentation performance compared with rule-based techniques. These models exhibit strong boundary extraction and separation capabilities in overlapping-cell regions, and their performance is commonly evaluated using metrics such as Dice coefficient, mean intersection over union (mIoU), and sensitivity. Furthermore, recent studies [20,21,24,25] have compared various segmentation networks to identify optimal architectures for cell segmentation. Nevertheless, their performance remains limited, as they focus primarily on segmentation without incorporating counting.
However, optimizing segmentation accuracy alone does not necessarily guarantee reliable cell counting performance, as errors in boundary precision and instance separation can propagate and be amplified in subsequent counting stages. In practice, segmentation, counting, and deployment constraints are often optimized independently, leading to fragmented pipelines that lack consistency across tasks. This motivates the need for an integrated end-to-end (E2E) optimization framework that jointly considers segmentation quality, counting calibration, and on-device efficiency to ensure robust and deployable CSC performance.
In this paper, we propose an optimization strategy for cell segmentation and counting using the WBC dataset [26] and the Chinese Hamster Ovary (CHO) cell dataset [27], and we further present a deep learning-based model compression strategy for commercialization in biomedical diagnostic systems. The proposed optimization framework achieves state-of-the-art (SOTA) segmentation and counting performance while maintaining high model efficiency optimized for low-power AIoT deployment.
The primary contributions of this study are summarized as follows:
  • Optimization of Cell Segmentation: This study presents a performance optimization strategy for cell segmentation by evaluating various data augmentation techniques, segmentation backbones, and loss functions. The proposed approach enhances segmentation performance in terms of accuracy, Dice coefficient, sensitivity, and mIoU.
  • Optimization of the End-to-End CSC Pipeline: To improve the accuracy of the CSC pipeline, multiple segmentation models and distance transform-based watershed threshold configurations were compared. Through this analysis, the optimal combination of segmentation and counting techniques for the CSC E2E pipeline was derived.
  • Model Lightweighting and AIoT Deployment Optimization: A model lightweighting methodology was proposed to enable efficient deployment on low-power AIoT devices. This includes strategies for balancing accuracy, latency, and model size to achieve optimal performance under hardware-constrained environments.

2. Related Works

Early studies on cell segmentation primarily relied on image preprocessing and morphological operations. Techniques such as Otsu thresholding, K-means clustering, and the watershed transform were widely adopted to separate nuclei and cytoplasm based on staining intensity differences [4,5]. However, these conventional methods are highly sensitive to illumination non-uniformity, staining variability, and overlapping cells in smear images, making it difficult to ensure consistent segmentation performance in clinical environments [20].
In recent years, data-driven cell segmentation approaches based on convolutional neural networks (CNNs) [9,10,22,25] have achieved remarkable progress. Among them, UNet [9] demonstrated outstanding performance in biomedical image segmentation through its symmetric encoder–decoder architecture, while UNet++ [10,25] improved boundary restoration by introducing nested and dense skip connections. Mask R-CNN [22] extended object detection frameworks to instance-level segmentation, enabling cell-wise prediction within complex scenes. More recently, Transformer-based models [16] have been introduced to overcome the locality limitation of CNNs. Models such as CellViT [17], EvitUNet [18] and MaxVitUNet [19] leverage a self-attention mechanism to capture global structural relationships among cells, achieving robust segmentation, even under low-contrast or highly overlapped conditions. These approaches have demonstrated improved accuracy over CNN-based methods, particularly in challenging overlapping or densely packed cell regions.
Traditionally, cell counting has been performed using blob detection or watershed-based centroid localization. However, in high-density or overlapping regions, these methods often suffer from unstable centroid estimation. To address this limitation, density regression-based approaches were proposed. Lempitsky and Zisserman [28] introduced a density map regression technique that models object centers with Gaussian kernels, while CSRNet [29] employed dilated convolutions to enhance counting accuracy in highly congested cellular regions. More recently, end-to-end architectures that jointly perform segmentation and counting have emerged. These models learn both the segmentation decoder and the density regression head simultaneously, achieving consistent optimization between pixel-level and instance-level predictions [30]. In particular, multi-task learning frameworks have been shown to improve both boundary accuracy in segmentation and quantitative precision in counting by enabling mutual feature sharing between tasks.
Recent advances in cell segmentation have increasingly focused on foundation models and label-efficient learning strategies to improve generalization across diverse microscopy conditions. Study [31] introduced Segment Anything for Microscopy (μSAM), which adapts the Segment Anything Model (SAM) to both light and electron microscopy through domain-specific fine-tuning. By unifying interactive and automatic segmentation within a single framework, μSAM demonstrated competitive or superior performance compared to CellPose and traditional microscopy tools across multiple datasets and modalities. Similarly, CellSAM [33] extended the foundation-model paradigm to cell segmentation by leveraging large-scale pretraining, enabling robust segmentation under limited annotation scenarios and heterogeneous imaging conditions. In parallel, recent studies [32] on unsupervised cell instance segmentation have explored object-centric embedding learning, allowing accurate separation of individual cells without pixel-level supervision. Such approaches significantly reduce annotation costs while maintaining competitive segmentation accuracy. Collectively, these works indicate a clear trend toward foundation models and self-supervised learning as key enablers for scalable and generalizable cell segmentation systems.
Despite these advancements, most existing studies are limited to dataset-specific optimization and lack sufficient validation under domain shift conditions. Furthermore, visual uncertainty factors in microscopic imagery—such as staining variations, defocus blur, and overlapping cells—remain largely unaddressed. Moreover, foundation model-based and unsupervised learning methods typically require large model capacities in the absence of explicit labels, which leads to increased computational and memory demands, thereby limiting their applicability to low-power IoT devices.
In parallel, model optimization for lightweight and real-time inference has become an increasingly important research direction. In medical imaging, reducing model size and computational complexity is essential for on-device inference in embedded or portable systems. Quantization and pruning techniques [34] are representative approaches for model compression, reducing both parameter size and FLOPs by applying fixed-point arithmetic or eliminating redundant weights. More recent studies leverage Knowledge Distillation (KD) [35] and Neural Architecture Search (NAS) [36] to automatically discover efficient architectures that preserve accuracy while being suitable for deployment on embedded AI hardware. These strategies provide the technological foundation for implementing real-time cell segmentation and counting models in low-power edge devices such as clinical microscopes and portable hematology analyzers.

3. System Model

Figure 1 illustrates the proposed methodology for optimizing the end-to-end (E2E) cell segmentation and counting (CSC) pipeline designed for low-power AIoT devices. The overall optimization process consists of three sequential stages: cell segmentation, cell counting, and model compression.

3.1. Cell Segmentation

3.1.1. Segmentation Model Selection

For the cell segmentation task, various encoder–decoder-based deep neural networks were considered, including the UNet family—UNet [9], UNet++ [10], and ResUNet [11]—as well as an EfficientNet-based variant (EffUNet) [12]. Additionally, feature pyramid network (FPN) structures such as FPN [13], BiFPN [14], and PFPN [15], and transformer-based architectures including CellViT [17], EvitUNet [18] and MaxVitUNet [19] were employed for performance comparison.
Although all models share an encoder–decoder backbone structure, their feature extraction mechanisms and skip-connection strategies lead to distinct representational capacities and computational efficiencies. The baseline UNet [9] employs a symmetric encoder–decoder architecture that effectively reconstructs clear boundaries between nuclei and cytoplasm. UNet++ [10], built upon DenseNet [37], introduces dense skip connections to enhance multi-scale feature fusion between encoder and decoder, thereby improving boundary delineation among adjacent cells. ResUNet [11] incorporates residual connections to mitigate the gradient vanishing problem in deep neural networks and to achieve robust representation learning under morphological discontinuities or staining inconsistencies. EffUNet combines an EfficientNet encoder with a UNet decoder to provide computationally efficient segmentation performance optimized for embedded AI applications.
The FPN [13] and BiFPN [14] architectures utilize feature pyramid structures to jointly capture high-resolution cell structures and low-resolution global context. BiFPN further refines this approach using bidirectional feature fusion, enhancing efficiency in feature aggregation and providing robust segmentation performance in blood smear images with variable cell size and shape. PFPN [15] adds a pyramid attention mechanism that assigns spatially adaptive weights, enabling better differentiation between cellular centers and boundary regions.
Recently, transformer-based segmentation models such as CellViT [17], EvitUNet [18], and MaxVitUNet [19] have gained attention for their ability to learn global contextual relationships among cells via self-attention. CellViT adopts a hybrid structure combining a Vision Transformer and a CNN backbone to simultaneously capture local morphology and global context. EvitUNet introduces an efficient token-reduction mechanism to lower the computational complexity of the transformer while maintaining precise boundary recognition. MaxViT-UNet represents a lightweight UNet-based medical image segmentation framework that incorporates multi-axis attention into both the encoder and decoder, effectively combining local convolutional representations with global Transformer-based contextual modeling.
These architectures, each with distinct structural characteristics, were evaluated under identical dataset and experimental conditions to determine the optimal model for the cell segmentation task in terms of accuracy, complexity, and training efficiency. All models were trained using the same preprocessing pipeline, hyperparameters, and three loss functions—binary cross entropy (BCE) [38], Dice loss [39], and focal loss [40]—and were evaluated with Dice coefficient, mIoU, precision, and recall metrics.
To further clarify the rationale behind the segmentation model selection, the considered architectures were intentionally chosen to represent diverse design philosophies along three key dimensions: feature fusion strategy, contextual modeling capability, and computational efficiency. UNet-based models emphasize precise spatial localization through skip connections, making them suitable for delineating fine cellular boundaries in microscopy images. FPN-based models extend this capability by explicitly modeling multi-scale representations, which is particularly effective in blood smear images where cell sizes and shapes vary significantly. In contrast, transformer-based models aim to capture long-range dependencies and global context via self-attention, which can be advantageous in dense or overlapping cell regions. However, such global modeling often incurs higher computational cost and memory overhead. By systematically evaluating these heterogeneous architectures under identical experimental conditions, this study aims to provide a comprehensive analysis of how different architectural choices impact segmentation accuracy, boundary sensitivity, and deployment feasibility on low-power AIoT devices. This comparative design enables a principled selection of segmentation models that balance accuracy and efficiency within an end-to-end CSC pipeline.
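As an illustration of how such a comparison can be set up, the sketch below instantiates several of the CNN-based candidates under identical encoder settings. The use of the segmentation_models_pytorch package and the resnet34 encoder are assumptions made only for this example; the paper does not state its implementation, and the transformer-based models would come from their own repositories.

```python
import segmentation_models_pytorch as smp

# Shared configuration so every candidate is built under identical settings.
common = dict(encoder_name="resnet34", encoder_weights="imagenet",
              in_channels=3, classes=1)

# A subset of the evaluated CNN architectures (illustrative only).
candidates = {
    "UNet":   smp.Unet(**common),
    "UNet++": smp.UnetPlusPlus(**common),
    "FPN":    smp.FPN(**common),
}

for name, model in candidates.items():
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.1f} M parameters")
```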

3.1.2. Training Loss Selection

For training the aforementioned segmentation models, three loss functions were employed: BCE, Dice loss, and focal loss. Each loss function targets different aspects of segmentation performance: pixel-level prediction accuracy, class-imbalance correction, and robustness in boundary regions, respectively.
The BCE loss computes pixel-wise cross-entropy between the predicted probability $p_i$ and the ground-truth label $g_i \in \{0, 1\}$, enforcing discrimination between cell (positive) and background (negative) pixels:

$$\mathcal{L}_{\mathrm{BCE}} = -\frac{1}{N}\sum_{i=1}^{N}\left[ g_i \log(p_i) + (1 - g_i)\log(1 - p_i) \right]$$
Although BCE directly reflects prediction confidence, it becomes less effective under severe class imbalance, where positive (cell) regions contribute less to the total loss.
Dice loss measures similarity based on the overlap between prediction and ground-truth masks. Given a predicted mask $P$ and a ground-truth mask $G$, it is defined as

$$\mathcal{L}_{\mathrm{Dice}} = 1 - \frac{2\sum_{i=1}^{N} p_i g_i + \epsilon}{\sum_{i=1}^{N} p_i + \sum_{i=1}^{N} g_i + \epsilon}$$

where $\epsilon$ is a small constant for numerical stability. Dice loss effectively compensates for class imbalance and enhances segmentation performance near cell boundaries.
Focal loss extends BCE by reducing the relative loss contribution of well-classified samples and focusing on hard examples [40]. It is formulated as

$$\mathcal{L}_{\mathrm{Focal}} = -\frac{1}{N}\sum_{i=1}^{N}\left[ \alpha (1 - p_i)^{\gamma} g_i \log(p_i) + (1 - \alpha)\, p_i^{\gamma} (1 - g_i) \log(1 - p_i) \right]$$

Here, $\alpha \in [0, 1]$ is the weighting factor for the positive class, and $\gamma > 0$ is a focusing parameter. A higher $\gamma$ value reduces the influence of easy samples, forcing the model to learn from difficult ones. In this study, $\alpha = 0.25$ and $\gamma = 2.0$ were adopted.
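For reference, a minimal PyTorch sketch of the three losses is given below. Here `pred` holds raw logits and `target` the binary ground-truth mask, both of shape (B, 1, H, W); the α, γ, and ε values follow the settings stated above. This is an illustrative implementation under those assumptions, not the authors' exact training code.

```python
import torch
import torch.nn.functional as F

def bce_loss(pred, target):
    # Pixel-wise binary cross-entropy computed on logits.
    return F.binary_cross_entropy_with_logits(pred, target)

def dice_loss(pred, target, eps=1e-6):
    # Soft Dice loss computed on probabilities, averaged over the batch.
    p = torch.sigmoid(pred)
    inter = (p * target).sum(dim=(1, 2, 3))
    denom = p.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    return (1.0 - (2.0 * inter + eps) / (denom + eps)).mean()

def focal_loss(pred, target, alpha=0.25, gamma=2.0):
    # Focal loss: down-weights easy pixels via the (1 - p_t)^gamma factor.
    p = torch.sigmoid(pred)
    ce = F.binary_cross_entropy_with_logits(pred, target, reduction="none")
    p_t = p * target + (1.0 - p) * (1.0 - target)          # prob. of the true class
    alpha_t = alpha * target + (1.0 - alpha) * (1.0 - target)
    return (alpha_t * (1.0 - p_t) ** gamma * ce).mean()
```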

3.2. Cell Counting

Distance Transform-Based Watershed

To quantify the number of cells, this study applies a distance transform-based watershed algorithm to the cell segmentation result mask. This approach defines the center of each cell as a seed and simulates the spread of water from the center toward the boundary to determine borders between overlapping cells.
Let the cell segmentation result map be represented as $M(x, y) \in \{0, 1\}$. The distance transform $D(x, y)$ is defined as the Euclidean distance from each pixel to the nearest cell boundary as follows:

$$D(x, y) = \min_{(x', y') \in \partial M} \left\lVert (x, y) - (x', y') \right\rVert_2,$$

where $\partial M$ denotes the set of boundary pixels, and the value of $D(x, y)$ reaches its maximum at the cell center.
The local maxima of the distance transform are used as candidates for cell-center seeds. However, due to noise or structural variations within cells, unnecessary seeds may be generated. Therefore, in this study, a watershed threshold $T_w$ is applied to select valid seeds:

$$S(x, y) = \begin{cases} 1, & \text{if } D(x, y) \geq T_w \cdot D_{\max}, \\ 0, & \text{otherwise}, \end{cases}$$

where $D_{\max} = \max_{(x, y)} D(x, y)$, and $T_w \in [0, 1]$ is a hyperparameter controlling the density of seed points. A larger $T_w$ value reduces the number of seeds, potentially leading to under-segmentation, whereas a smaller $T_w$ value may cause over-segmentation.
The watershed algorithm treats the distance map as a topographical elevation model and performs segmentation by simulating the water-flooding process. When the distance transform is negated, cell centers correspond to valleys and boundaries to ridges:

$$H(x, y) = -D(x, y)$$
Then, the seed map $S(x, y)$ is used as the initial minima for the watershed operation:

$$L(x, y) = \mathrm{Watershed}\big(H(x, y),\, S(x, y)\big)$$

The final labeled map $L(x, y)$ contains unique labels representing individual cells, and the total number of distinct labels, $N_c = |\mathrm{unique}(L(x, y))|$, corresponds to the final cell count.
The watershed threshold $T_w$ is optimized to minimize the mean absolute error (MAE) between the predicted and ground-truth cell counts:

$$T_w^{*} = \arg\min_{T_w \in [0, 1]} \mathrm{MAE}\big(N_c(T_w),\, N_{GT}\big),$$

where $N_{GT}$ denotes the ground-truth cell count.
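A compact sketch of this counting stage and the threshold search is shown below. OpenCV and scikit-image are used here as illustrative implementations of the distance transform and watershed; the function and variable names are ours, and the grid of thresholds mirrors the 0.4–0.9 range examined in the experiments.

```python
import numpy as np
import cv2
from scipy import ndimage as ndi
from skimage.segmentation import watershed

def count_cells(mask: np.ndarray, t_w: float = 0.65):
    """Count cells in a binary segmentation mask (0/1) via distance-transform watershed."""
    # Euclidean distance from every foreground pixel to the nearest background pixel.
    dist = cv2.distanceTransform(mask.astype(np.uint8), cv2.DIST_L2, 5)
    # Seed map: keep pixels whose distance exceeds T_w * D_max (the seed rule above).
    seeds = (dist >= t_w * dist.max()).astype(np.uint8)
    markers, _ = ndi.label(seeds)
    # Flood the negated distance map from the seeds, restricted to the mask.
    labels = watershed(-dist, markers, mask=mask.astype(bool))
    return labels, int(labels.max())

def tune_threshold(masks, gt_counts, grid=np.arange(0.40, 0.91, 0.05)):
    """Grid-search the T_w that minimizes the MAE against ground-truth counts."""
    best_t, best_mae = None, float("inf")
    for t in grid:
        preds = np.array([count_cells(m, t)[1] for m in masks])
        mae = np.mean(np.abs(preds - np.array(gt_counts)))
        if mae < best_mae:
            best_t, best_mae = float(t), mae
    return best_t, best_mae
```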

3.3. Model Compression

To enable lightweight and real-time inference of the cell segmentation model, various model compression techniques were applied. Specifically, considering the trade-off between accuracy, latency, and model size, FP16 quantization, INT8 quantization, and pruning methods were analyzed and compared. Among these, the configuration that achieved the optimal balance between performance and efficiency was selected as the final model.

3.3.1. FP16 Quantization

FP16 quantization reduces the precision of floating-point operations from 32-bit (FP32) to 16-bit, effectively halving memory consumption while maintaining a comparable dynamic range. FP16 operations are natively supported by GPUs and AI accelerators and can achieve nearly the same inference speed as FP32 while providing roughly a 50% reduction in model size. In this study, FP16 quantization was applied as a post-training procedure using PyTorch 2.8.0's native quantization utilities (torch.ao.quantization). Neither quantization-aware training nor additional calibration steps were required, as the numerical range of FP16 sufficiently preserved the distribution of weights and activations in the evaluated models. As a result, the accuracy degradation after quantization remained below 1% across all architectures, confirming that FP16 quantization provides an effective and low-overhead compression strategy for real-time CSC deployment on low-power AIoT devices.
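The paper reports using PyTorch's quantization utilities; as a minimal illustration of the same idea, the sketch below simply casts a tiny stand-in network to half precision, which is one common way to obtain FP16 weights in PyTorch. The network, file name, and input shape are placeholders, not the authors' configuration.

```python
import torch
import torch.nn as nn

# Tiny stand-in network; in practice the trained segmentation model is cast instead.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, kernel_size=1),
).eval()

model_fp16 = model.half()  # cast all weights and buffers to 16-bit floats

torch.save(model_fp16.state_dict(), "segmenter_fp16.pt")  # roughly half the FP32 checkpoint size

# FP16 inference assumes a device with native half-precision support (e.g., a GPU).
if torch.cuda.is_available():
    model_fp16 = model_fp16.cuda()
    with torch.no_grad():
        x = torch.rand(1, 3, 256, 256, device="cuda", dtype=torch.float16)
        prob = torch.sigmoid(model_fp16(x))
```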

3.3.2. INT8 Quantization

INT8 quantization converts model weights and activations into 8-bit integers, substantially reducing computation and memory access. In this study, quantization-aware training (QAT) was adopted to integrate the quantization process directly into training. Each layer's weights $w$ and activations $a$ are scaled as follows:

$$Q(w) = \mathrm{clip}\left(\mathrm{round}\left(\frac{w}{s_w}\right), q_{\min}, q_{\max}\right), \qquad Q(a) = \mathrm{clip}\left(\mathrm{round}\left(\frac{a}{s_a}\right), q_{\min}, q_{\max}\right),$$

where $s_w$ and $s_a$ are scale factors, and $q_{\min}, q_{\max}$ represent the integer range. Through QAT, quantization-induced error accumulation was minimized, theoretically reducing model size by a factor of four compared to FP32 precision.
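The following sketch outlines the QAT workflow with torch.ao.quantization on a small stand-in network. The module names and the "fbgemm" backend choice are illustrative assumptions rather than the authors' exact configuration; the real pipeline would wrap the chosen segmentation model and reuse its normal training loop between preparation and conversion.

```python
import torch
import torch.nn as nn
from torch.ao import quantization as tq

class TinySeg(nn.Module):
    """Stand-in network; the real pipeline wraps the chosen segmentation model."""
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()        # float -> int8 entry point
        self.body = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=1),
        )
        self.dequant = tq.DeQuantStub()    # int8 -> float at the output

    def forward(self, x):
        return self.dequant(self.body(self.quant(x)))

model = TinySeg().train()
model.qconfig = tq.get_default_qat_qconfig("fbgemm")   # x86 backend; "qnnpack" on ARM devices
tq.prepare_qat(model, inplace=True)                     # insert fake-quantization observers

# ... run the usual segmentation training loop here so scales and zero-points are learned ...

model.eval()
int8_model = tq.convert(model)                          # materialize INT8 weights (~4x smaller than FP32)
```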

3.3.3. Structured Pruning

Pruning eliminates parameters with low contribution to obtain model sparsity and reduce computational cost. In this study, a structured filter pruning method was applied to evaluate filter-level importance. The importance of the $k$-th filter, $\alpha_k$, is defined by its mean magnitude:

$$\alpha_k = \frac{1}{N_k}\sum_{i=1}^{N_k} |w_{k,i}|, \qquad \mathcal{P} = \{\, k \mid \alpha_k < \tau_p \,\},$$

where $\tau_p$ denotes the pruning threshold. Filters with lower importance values were removed, followed by fine-tuning of the remaining structure to recover potential performance loss.
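A brief sketch of magnitude-based structured filter pruning with torch.nn.utils.prune is shown below for a single convolutional layer. The 30% ratio mirrors the setting reported in Section 4.3, but the layer and names are illustrative; a real deployment would additionally rebuild the thinner layers and fine-tune, as described above.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# One convolutional layer used as an illustration; the same call is applied
# layer by layer across the segmentation network in practice.
conv = nn.Conv2d(16, 32, kernel_size=3, padding=1)

# Structured L1-norm pruning over output filters (dim=0 removes whole filters),
# equivalent to ranking filters by the magnitude criterion alpha_k above.
prune.ln_structured(conv, name="weight", amount=0.3, n=1, dim=0)
prune.remove(conv, "weight")  # bake the pruning mask into the weight tensor

# Explicit per-filter importance (mean |w|), matching the alpha_k definition.
with torch.no_grad():
    alpha = conv.weight.abs().mean(dim=(1, 2, 3))
    print("non-zero filters after pruning:", int((alpha > 0).sum()))
```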

4. Experimental Results

The results of the cell segmentation model and training loss methods used for optimizing the CSC E2E pipeline are presented in Table 1. In addition, the optimization results for cell counting are shown in Table 2, and the outcomes of the CSC model compression are summarized in Table 3.
For clarity, the ground truth (GT) used in this study is defined as follows. The GT for cell segmentation was directly obtained from the pixel-wise segmentation annotations provided in the publicly available dataset. For cell counting, the GT values were derived by manually counting individual cells based on the corresponding GT segmentation masks, ensuring consistency between segmentation and counting evaluations. This separation of GT definitions allows a clear and fair assessment of both segmentation accuracy and downstream counting performance within the proposed CSC E2E pipeline.

4.1. Performance Analysis of Cell Segmentation

Table 1 presents the segmentation performance results on the blood cell segmentation dataset, comparing three loss functions: BCE, focal, and Dice. The evaluation metrics include accuracy (Acc), Dice coefficient (Dice), sensitivity (Sens), and mean intersection over union (mIoU).
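For clarity, the sketch below shows one common way to compute these metrics from binary prediction and ground-truth masks. It follows the standard confusion-matrix formulation and may differ in detail from the authors' exact implementation (for example, in how mIoU averages over classes).

```python
import numpy as np

def segmentation_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Compute Acc, Dice, Sens, and mIoU from binary (0/1) masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    tn = np.logical_and(~pred, ~gt).sum()
    eps = 1e-9
    acc = (tp + tn) / (tp + tn + fp + fn + eps)
    dice = 2 * tp / (2 * tp + fp + fn + eps)
    sens = tp / (tp + fn + eps)
    # mIoU averaged over the cell (foreground) and background classes,
    # one common convention for binary segmentation.
    iou_fg = tp / (tp + fp + fn + eps)
    iou_bg = tn / (tn + fp + fn + eps)
    return {"acc": acc, "dice": dice, "sens": sens, "miou": (iou_fg + iou_bg) / 2}
```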
Overall, all models achieve over 97% accuracy, 95% Dice, 93% sensitivity, and 89% mIoU, demonstrating stable segmentation performance across most configurations.
From a loss-function perspective, BCE, focal, and Dice losses all yield stable accuracy, Dice coefficient, and sensitivity results, while mIoU under focal loss is slightly less stable. Under BCE and Dice losses, most models achieve nearly identical Dice scores, whereas focal loss tends to produce slightly lower Dice values (by about 0.2–0.4 percentage points on average). This is because focal loss emphasizes hard examples near cell boundaries, providing better boundary discrimination; however, since this dataset features relatively balanced blood-cell distributions, its benefits are only partially realized.
From a model perspective, the UNet family (UNet, UNet++, and ResUNet) consistently achieves the highest accuracy, Dice coefficient, sensitivity, and mIoU values under BCE and Dice losses. Meanwhile, Transformer-based architectures such as CellViT, Evit-UNet and MaxVitUNet, while offering strong global feature representation and better generalization across morphological variations, show slightly lower boundary precision when trained on smaller datasets.
Regarding model size and latency efficiency, FPN and ResUNet perform the best. FPN achieves the smallest model size (16.7 MB) and the fastest inference time (21.5 ms) without any loss of accuracy, making it ideal for real-time applications. ResUNet also provides balanced performance, with a model size of 26.9 MB and latency of 21.5 ms, benefiting from its residual lightweight architecture. In contrast, UNet++ shows higher latency (28.9 ms) due to its multiple skip-path structures, while EffUNet experiences increased latency due to the multi-stage complexity of the Mobile inverted bottleneck convolution (MBConv) in its EfficientNet encoder. Similarly, BiFPN suffers from heavier computation caused by multiple bidirectional feature fusion paths.
In conclusion, UNet, ResUNet, and FPN deliver the most balanced performance, achieving high segmentation accuracy (Dice ≥ 96.5%) and low latency (≤22 ms). Among them, FPN demonstrates the best trade-off between accuracy and efficiency, being particularly suitable for real-time blood-cell segmentation tasks due to its small model size (16.7 MB) and low computation cost.

4.2. Performance Analysis of Cell Counting

Table 2 presents the variation in average cell counting accuracy according to the watershed threshold ($T_w$) based on the segmentation results from Table 1. In the table, the highest accuracy per model and threshold setting is highlighted in bold. Since the threshold directly controls seed selection in the distance transform process, lower $T_w$ values lead to over-segmentation, while higher values result in under-segmentation. Therefore, the choice of $T_w$ significantly affects counting accuracy.
From a loss-function perspective, focal loss consistently achieves the highest counting accuracy. This differs from Table 1, where BCE and Dice losses yielded the best segmentation results, confirming that higher segmentation accuracy does not necessarily translate to higher counting accuracy. This can be attributed to focal loss’s robustness near uneven boundary intensities, which stabilizes seed-map generation in the counting stage.
When examining thresholds, the optimal value $T_w^{*}$ varies with the loss function. For BCE loss, the optimum occurs around 0.85–0.9; for focal loss, it lies between 0.55 and 0.65; and for Dice loss, it is around 0.7 for all models. This indicates that each loss function requires a distinct threshold tuning strategy for optimal deployment.
This behavior can be further explained by the different boundary characteristics induced by each loss function during segmentation training. BCE and Dice loss tend to produce relatively sharp and high-contrast boundaries between foreground and background regions, which results in distance-transform maps with steep gradients near cell borders. Consequently, higher watershed thresholds are required to suppress spurious local maxima and avoid over-segmentation. In contrast, focal loss emphasizes hard-to-classify pixels near ambiguous or low-contrast boundaries, leading to smoother boundary transitions and more gradual distance-transform gradients. As a result, meaningful cell-center seeds emerge at lower relative distance values, shifting the optimal watershed threshold toward a smaller T w . This observation highlights that watershed threshold sensitivity is intrinsically linked to segmentation boundary sharpness and noise smoothness and therefore must be jointly optimized with the chosen loss function rather than treated as an independent post-processing parameter.
Under the focal loss setting, which yields the best overall performance, a model-wise comparison shows that MaxVitUNet and UNet++ achieve the highest cell counting accuracies of 0.96 and 0.95, respectively. In addition, UNet also demonstrates strong performance, while all other evaluated models consistently achieve stable cell counting accuracy exceeding 0.91 when trained with focal loss.
Therefore, from a pipeline optimization perspective, focal loss is recommended for counting tasks, MaxVitUNet and UNet++ for accuracy-oriented applications, and FPN or ResUNet for efficiency-oriented implementations.
Figure 2 and Figure 3 illustrate representative examples of cell segmentation and counting results for both sparse and dense cell distributions. In sparse distributions, most model–loss combinations maintain stable cell counting accuracy above 92%, whereas in dense distributions the accuracy degrades to approximately 60% due to severe cell overlap and ambiguous boundaries. In the visualization, white regions indicate confidently segmented cell interiors, black regions correspond to non-cell background, and gray regions represent ambiguous transition areas between cell and non-cell regions. Although these gray areas may appear visually connected to cell contours, they are not interpreted as definitive cell boundaries unless they form sufficiently strong gradients in the distance transform. As a result, some visually merged contours in dense regions do not generate independent seed points, leading to an apparent mismatch between the number of visible contours and the final cell count N. Notably, compared to the baseline CellViT model trained with Dice loss, the proposed focal loss-based UNet++ improves counting accuracy by up to 55.7% in dense distributions, while the focal loss-based FPN achieves an improvement of up to 47%. This improvement indicates that the proposed CSC pipeline—through optimized segmentation model selection, loss function design, and adaptive watershed thresholding—produces smoother and more discriminative distance-transform gradients, enabling more reliable seed generation and boundary separation even in challenging dense-cell scenarios.

4.3. Performance Analysis of Model Compression

Table 3 compares cell counting accuracy (Acc), model size, and latency across three compression techniques: no quantization (baseline), FP16 quantization, and structured pruning. All results are measured on the same dataset under identical threshold conditions. For structured pruning, a consistent pruning ratio of 30% was applied across all architectures, followed by fine-tuning for 20 epochs to recover performance. This uniform pruning and retraining protocol ensures fair comparison among models and helps explain the observed differences in pruning sensitivity across network architectures.
Overall, FP16 Quantization provides the best efficiency without accuracy degradation, maintaining a balance between model size and inference speed. Compared to the FP32 baseline, FP16 reduces memory usage by approximately 50% and latency by 10–15%, while keeping accuracy loss below 1%. For example, UNet++ maintains 95% accuracy after FP16 quantization, while its model size decreases from 138.8 MB to 67.0 MB. Similarly, FPN and ResUNet preserve accuracies of 92% and 91%, respectively, with latency variations under 1 ms.
Conversely, structured pruning achieves substantial size reduction but introduces larger accuracy degradation in some models. BiFPN and EffUNet exhibit 3–4% accuracy loss as pruning intensity increases, likely due to the removal of critical skip-path connections between encoder and decoder layers. Furthermore, pruning occasionally increases latency slightly, despite model sparsity, due to unbalanced memory access patterns and non-uniform sparsity. For example, UNet latency rises modestly from 21.6 ms to 25.5 ms after pruning.
In terms of model trends, UNet++, UNet, and FPN demonstrate the best overall trade-offs under FP16 Quantization, maintaining stable counting accuracies of 95%, 93%, and 92%, respectively. Meanwhile, Transformer-based models such as CellViT and Evit-UNet show minor accuracy drops (92%→ 91%) due to the sensitivity of attention-weight scaling but negligible latency increases (1–2 ms).
In summary, FP16 Quantization is the optimal compression strategy, improving efficiency without sacrificing precision. Although Structured Pruning provides significant size reduction, it may negatively affect segmentation and counting performance. The best trade-off is achieved with FP16-Quantized UNet++, which delivers 95% accuracy, a 67 MB model size, and 29.5 ms latency. Thus, the proposed FP16-based lightweight counting pipeline is identified as the most efficient solution for low-power AIoT environments.

4.4. Cross-Dataset Validation on CHO Cell Images

In the previous sections, the proposed end-to-end CSC framework was extensively evaluated on the WBC dataset to analyze which segmentation models are most effective in terms of accuracy and computational efficiency, as well as which combinations of loss functions and watershed thresholds maximize cell counting performance. In this section, the same evaluation procedure is applied to a different cell domain, namely the Chinese Hamster Ovary (CHO) cell line dataset, in order to assess the generalization capability of the proposed approach. Specifically, segmentation performance across different models and loss functions, as well as cell counting accuracy under varying watershed thresholds, are systematically analyzed.
As shown in Table 4, results on the CHO cell dataset exhibit trends consistent with those observed in Table 1 for the WBC dataset. Most segmentation models demonstrate stable performance across different training losses, with UNet++ achieving the highest accuracy, Dice coefficient, sensitivity, and mIoU among all evaluated architectures. Meanwhile, FPN and ResUNet again demonstrate the most favorable efficiency characteristics when considering model size and inference latency, confirming their suitability for resource-constrained deployment.
Table 5 presents the cell counting accuracy on the CHO cell dataset across different watershed threshold values. As illustrated in Figure 4 and Figure 5, the CHO cell images generally exhibit lower cell density compared to the WBC dataset. Consequently, robust counting performance is maintained over a wider range of watershed thresholds, indicating that the proposed CSC pipeline remains stable under less crowded cellular conditions.

5. Conclusions

This study presented an optimized end-to-end cell segmentation and counting (CSC) pipeline for low-power AIoT devices by jointly considering segmentation model selection, watershed-based counting optimization, and model compression. Experimental results demonstrate that a focal loss-based FPN model with a watershed threshold of $T_w = 0.65$ achieves the best overall counting performance. With FP16 quantization and pruning, the proposed pipeline improves cell counting accuracy by 21.1%, while reducing model size by 74.5% and inference latency by 16.8% compared to the baseline. These results confirm the effectiveness of the proposed framework as a lightweight and accurate CSC solution suitable for real-time AIoT deployment.

Author Contributions

Conceptualization, S.L.; methodology, S.L.; software, G.P., J.P. and S.L.; validation, S.L.; formal analysis, S.L.; investigation, S.L.; resources, S.L.; data curation, S.L.; writing—original draft preparation, S.L.; writing—review and editing, S.L.; visualization, S.L.; supervision, S.L.; project administration, S.L.; funding acquisition, S.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Korea Evaluation Institute of Industrial Technology (KEIT) grant funded by the Korea government (MOTIE) (No. RS-2025-02313838, Project Name: Development of AI Training, Internalization, and Prediction System Technology for High-Speed, Multiplex, and Ultra-Precision Detection Based on Bio-Semiconductors).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Kobara, Y.M.; Akpan, I.J.; Nam, A.D.; AlMukthar, F.H.; Peter, M. Artificial intelligence and data science methods for automatic detection of white blood cells in images. J. Imaging Inform. Med. 2025. [Google Scholar] [CrossRef]
  2. Naouali, S.; Othmani, O.E. AI-driven automated blood cell anomaly detection: Enhancing diagnostics and telehealth in hematology. J. Imaging 2025, 11, 157. [Google Scholar] [PubMed]
  3. Abozeid, A.; Alrashdi, I.; Krushnasamy, V.S.; Gudla, C.; Ulmas, Z.; Nimma, D.; El-Ebiary, Y.A.B.; Abdulhadi, R. White blood cells detection using deep learning in healthcare applications. Alex. Eng. J. 2025, 124, 135–146. [Google Scholar] [CrossRef]
  4. Rezatofighi, S.; Soltanian-Zadeh, H. Automatic recognition of five types of white blood cells in peripheral blood. Cytom. Part A 2011, 79A, 747–757. [Google Scholar] [CrossRef]
  5. Nimmy, T.; Sreejith, V. A Review on White Blood Cells Segmentation. In Proceedings of the International Conference on Recent Advancements and Effectual Researches in Engineering Science and Technology (RAEREST), Kerala State, India, 20–21 April 2018. [Google Scholar] [CrossRef]
  6. Mohamed, A.; Farag, M.; Ghazal, H. Automatic white blood cell segmentation using deep learning. IEEE Access 2019, 7, 180449–180458. [Google Scholar]
  7. Xu, J.; Luo, F.; Li, S. AI-assisted digital hematology for remote blood cell analysis. Front. Med. 2021, 8, 678913. [Google Scholar]
  8. Darling, H.E.; Wheeler, R.T.; Lakowicz, J.R. Quantitative analysis of leukocyte morphology and viability for drug toxicity testing. Biotechnol. Bioeng. 2021, 118, 2234–2246. [Google Scholar]
  9. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015), Munich, Germany, 5–9 October 2015; Springer: Cham, Switzerland, 2015; pp. 234–241. [Google Scholar]
  10. Zhou, Z.; Siddiquee, M.M.R.; Tajbakhsh, N.; Liang, J. UNet++: A Nested U-Net Architecture for Medical Image Segmentation. arXiv 2018, arXiv:1807.10165. [Google Scholar]
  11. Diakogiannis, F.I.; Waldner, F.; Caccetta, P.; Wu, C. ResUNet-a: A deep learning framework for semantic segmentation of remotely sensed data. arXiv 2020, arXiv:1904.00592. [Google Scholar] [CrossRef]
  12. Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv 2019, arXiv:1905.11946. [Google Scholar]
  13. Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. arXiv 2019, arXiv:1612.03144. [Google Scholar]
  14. Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), Seattle, WA, USA, 13–19 June 2020. [Google Scholar]
  15. Kirillov, A.; Girshick, R.; He, K.; Dollár, P. Panoptic Feature Pyramid Networks. arXiv 2019, arXiv:1901.02446. [Google Scholar] [CrossRef]
  16. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  17. Hörst, F.; Rempe, M.; Heine, L.; Seibold, C.; Keyl, J.; Baldini, G.; Ugurel, S.; Siveke, J.; Grünwald, B.; Egger, J.; et al. CellViT: Vision Transformers for Precise Cell Segmentation and Classification. arXiv 2023, arXiv:2306.15350. [Google Scholar] [CrossRef]
  18. Li, X.; Zhu, W.; Dong, X.; Dumitrascu, O.M.; Wang, Y. EViT-Unet: U-Net Like Efficient Vision Transformer for Medical Image Segmentation on Mobile and Edge Devices. arXiv 2023, arXiv:2410.15036. [Google Scholar]
  19. Khan, A.R.; Khan, A. Multi-axis vision transformer for medical image segmentation. Eng. Appl. Artif. Intell. 2025, 158, 111251. [Google Scholar] [CrossRef]
  20. Depto, D.S.; Rahman, S.; Hosen, M.M.; Akter, M.S.; Reme, T.R.; Rahman, A.; Zunair, H.; Rahman, M.S.; Mahdy, M.R.C. Automatic segmentation of blood cells from microscopic slides: A comparative analysis. Tissue Cell 2021, 73, 101653. [Google Scholar] [CrossRef]
  21. Sheikh, I.M.; Chachoo, M.A. A hybrid cell image segmentation method based on the multilevel improvement of data. Tissue Cell 2023, 84, 102169. [Google Scholar] [CrossRef]
  22. He, K.; Gkioxari, G.; Dollar, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
  23. Graham, S.; Vu, Q.D.; Raza, S.E.A.; Azam, A.; Tsang, Y.W.; Kwak, J.T.; Rajpoot, N. Hover-Net: Simultaneous segmentation and classification of nuclei in multi-tissue histology images. Med. Image Anal. 2019, 58, 101563. [Google Scholar] [CrossRef] [PubMed]
  24. Adnan, N.; Umer, F.; Malik, S. Implementation of transfer learning for the segmentation of human mesenchymal stem cells—A validation study. Tissue Cell 2023, 83, 102149. [Google Scholar] [CrossRef]
  25. Hoorali, F.; Khosravi, H.; Moradi, B. Automatic microscopic diagnosis of diseases using an improved UNet++ architecture. Tissue Cell 2022, 76, 101816. [Google Scholar] [CrossRef]
  26. Blood Cell Segmentation Dataset. Available online: https://www.kaggle.com/datasets/jeetblahiri/bccd-dataset-with-mask (accessed on 26 December 2025).
  27. Chinese Hamster Ovary Cells. Available online: https://bbbc.broadinstitute.org/BBBC030?utm_source=chatgpt.com (accessed on 26 December 2025).
  28. Lempitsky, V.; Zisserman, A. Learning to Count Objects in Images. Adv. Neural Inf. Process. Syst. 2010, 23. [Google Scholar]
  29. Li, Y.; Zhang, X.; Chen, D. CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes. In Proceedings of the CVPR 2018, Salt Lake City, UT, USA, 18–22 June 2018. [Google Scholar]
  30. Xie, H.; He, Y.; Xu, D.; Kuo, J.Y.; Lei, H.; Lei, B. Joint segmentation and classification task via adversarial network: Application to HEp-2 cell images. Appl. Soft Comput. 2022, 114, 108156. [Google Scholar] [CrossRef]
  31. Archit, A.; Freckmann, L.; Nair, S.; Khalid, N.; Hilt, P.; Rajashekar, V.; Freitag, M.; Teuber, C.; Spitzner, M.; Contreras, C.T.; et al. Segment Anything for Microscopy. Nat. Methods 2025, 22, 579–591. [Google Scholar] [CrossRef] [PubMed]
  32. Wolf, S.; Lalit, M.; McDole, K.; Funke, J. Unsupervised Learning of Object-Centric Embeddings for Cell Instance Segmentation in Microscopy Images. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2–6 October 2023; pp. 21206–21215. [Google Scholar]
  33. Israel, U.; Marks, M.; Dilip, R.; Li, Q.; Yu, C.; Laubscher, E.; Iqbal, A.; Pradhan, E.; Ates, A.; Abt, M.; et al. CellSAM: A foundation model for cell segmentation. Nat. Methods 2025, 22, 2585–2593. [Google Scholar] [CrossRef] [PubMed]
  34. Lee, H.; Lee, N.; Lee, S. A Method of Deep Learning Model Optimization for Image Classification on Edge Device. Sensors 2022, 22, 7344. [Google Scholar] [CrossRef] [PubMed]
  35. Hinton, G.; Vinyals, O.; Dean, J. Distilling the Knowledge in a Neural Network. In Proceedings of the NIPS 2014 Deep Learning Workshop, Montreal, QC, Canada, 12–13 December 2014. [Google Scholar]
  36. Elsken, T.; Metzen, J.H.; Hutter, F. Neural Architecture Search: A Survey. arXiv 2018, arXiv:1808.05377. [Google Scholar]
  37. Huang, G.; Liu, Z.; Maaten, L.v.; Weinberger, K.Q. Densely Connected Convolutional Networks. arXiv 2018, arXiv:1608.06993. [Google Scholar]
  38. Bishop, C.M.; Nasrabadi, N.M. Pattern Recognition and Machine Learning; Springer: New York, NY, USA, 2006. [Google Scholar]
  39. Dice, L.R. Measures of the Amount of Ecologic Association Between Species. Ecology 1945, 26, 297–302. [Google Scholar] [CrossRef]
  40. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
Figure 1. E2E pipeline optimization of cell segmentation and counting for low-power AIoT devices: The E2E pipeline optimization is performed by jointly considering cell segmentation, cell counting, and model compression.
Figure 2. Result images of segmentation and counting in sparse cell distribution: GT denotes the ground truth annotation, N represents the total number of counted cells, and A indicates the cell counting accuracy. The green contours correspond to the segmented cell boundaries obtained by each segmentation model. Each sub-figure illustrates the cell segmentation and counting results produced by a specific combination of segmentation model and training loss, where the optimal watershed threshold is selected individually for each configuration.
Figure 3. Result images of cell segmentation and counting in dense cell distribution: GT denotes the ground truth annotation, N represents the total number of counted cells, and A indicates the cell counting accuracy. The green contours indicate the predicted cell boundaries generated by each segmentation model. Each sub-figure presents representative examples of cell segmentation and counting results using the optimal watershed threshold selected for each segmentation model and training loss combination under dense cell distribution conditions.
Figure 4. Result images of segmentation and counting in sparse cell distribution: GT denotes the ground truth annotation, N represents the total number of counted cells, and A indicates the cell counting accuracy. The green contours indicate the predicted cell boundaries generated by each segmentation model. Each subfigure presents representative examples of cell segmentation and counting results using the optimal watershed threshold selected for each segmentation model and training loss combination under sparse cell distribution conditions.
Figure 5. Result images of cell segmentation and counting in dense cell distribution: GT denotes the ground truth annotation, N represents the total number of counted cells, and A indicates the cell counting accuracy. The green contours indicate the predicted cell boundaries generated by each segmentation model. Each subfigure presents representative examples of cell segmentation and counting results using the optimal watershed threshold selected for each segmentation model and training loss combination under dense cell distribution conditions.
Table 1. Cell segmentation results for blood cell segmentation dataset.

| Model | BCE: Acc | BCE: Dice | BCE: Sens | BCE: mIoU | Focal: Acc | Focal: Dice | Focal: Sens | Focal: mIoU |
|---|---|---|---|---|---|---|---|---|
| UNet | 98.0 | 96.5 | 96.8 | 93.4 | 97.9 | 96.3 | 96.4 | 93.0 |
| UNet++ | 98.0 | 96.5 | 97.2 | 93.4 | 97.9 | 96.4 | 96.2 | 93.2 |
| ResUNet | 98.0 | 96.5 | 97.3 | 93.2 | 97.8 | 96.2 | 95.3 | 92.8 |
| EffUNet | 97.8 | 96.3 | 96.9 | 92.9 | 97.7 | 96.0 | 95.9 | 92.4 |
| FPN | 97.9 | 96.4 | 96.9 | 93.1 | 97.9 | 96.3 | 96.2 | 93.0 |
| BiFPN | 97.4 | 95.6 | 96.3 | 91.5 | 97.9 | 96.3 | 96.2 | 93.0 |
| PFPN | 97.9 | 96.4 | 96.5 | 93.1 | 97.9 | 96.3 | 96.2 | 93.0 |
| CellVit | 97.2 | 95.2 | 95.6 | 90.8 | 96.9 | 94.7 | 93.9 | 89.9 |
| EvitUNet | 97.8 | 96.1 | 96.8 | 92.5 | 97.4 | 95.5 | 94.6 | 91.5 |
| MaxViTUNet | 97.9 | 96.4 | 96.7 | 93.0 | 97.8 | 96.1 | 96.0 | 92.6 |

| Model | Dice loss: Acc | Dice loss: Dice | Dice loss: Sens | Dice loss: mIoU | Size (MB) | Latency (ms) |
|---|---|---|---|---|---|---|
| UNet | 98.0 | 96.6 | 97.1 | 93.4 | 124.3 | 21.6 |
| UNet++ | 98.0 | 96.5 | 97.1 | 93.3 | 138.8 | 28.9 |
| ResUNet | 98.0 | 96.5 | 97.3 | 93.3 | 26.9 | 21.5 |
| EffUNet | 97.9 | 96.4 | 97.0 | 93.1 | 69.8 | 46.4 |
| FPN | 98.0 | 96.5 | 97.1 | 93.2 | 16.7 | 21.5 |
| BiFPN | 97.4 | 95.6 | 96.3 | 91.7 | 32.4 | 56.3 |
| PFPN | 98.0 | 96.5 | 96.8 | 93.3 | 47.5 | 27.2 |
| CellVit | 97.2 | 95.3 | 95.8 | 91.0 | 33.5 | 27.3 |
| EvitUNet | 97.6 | 96.0 | 97.0 | 92.3 | 68.4 | 26.5 |
| MaxViTUNet | 97.9 | 96.4 | 96.9 | 93.1 | 133.5 | 42.7 |
Table 2. Cell counting accuracy according to threshold for the WBC dataset [26].

BCE loss:

| Model | 0.45 | 0.50 | 0.55 | 0.60 | 0.65 | 0.70 | 0.75 | 0.80 | 0.85 | 0.90 |
|---|---|---|---|---|---|---|---|---|---|---|
| UNet | 0.81 | 0.82 | 0.83 | 0.84 | 0.86 | 0.88 | 0.89 | 0.91 | 0.91 | 0.92 |
| UNet++ | 0.80 | 0.81 | 0.82 | 0.83 | 0.81 | 0.81 | 0.89 | 0.90 | 0.91 | 0.91 |
| ResUNet | 0.79 | 0.79 | 0.80 | 0.81 | 0.82 | 0.84 | 0.86 | 0.87 | 0.89 | 0.90 |
| EffUNet | 0.84 | 0.87 | 0.89 | 0.90 | 0.91 | 0.91 | 0.91 | 0.92 | 0.92 | 0.92 |
| FPN | 0.81 | 0.82 | 0.83 | 0.84 | 0.86 | 0.87 | 0.88 | 0.90 | 0.91 | 0.92 |
| BiFPN | 0.78 | 0.79 | 0.80 | 0.82 | 0.84 | 0.86 | 0.88 | 0.90 | 0.91 | 0.91 |
| PFPN | 0.79 | 0.81 | 0.82 | 0.83 | 0.85 | 0.87 | 0.89 | 0.91 | 0.91 | 0.92 |
| CellVit | 0.78 | 0.79 | 0.81 | 0.82 | 0.84 | 0.85 | 0.87 | 0.88 | 0.89 | 0.90 |
| EvitUNet | 0.79 | 0.80 | 0.82 | 0.84 | 0.85 | 0.87 | 0.89 | 0.91 | 0.92 | 0.92 |
| MaxViTUNet | 0.84 | 0.86 | 0.88 | 0.91 | 0.93 | 0.94 | 0.95 | 0.95 | 0.95 | 0.96 |

Focal loss:

| Model | 0.45 | 0.50 | 0.55 | 0.60 | 0.65 | 0.70 | 0.75 | 0.80 | 0.85 | 0.90 |
|---|---|---|---|---|---|---|---|---|---|---|
| UNet | 0.86 | 0.91 | 0.92 | 0.93 | 0.92 | 0.92 | 0.92 | 0.91 | 0.89 | 0.85 |
| UNet++ | 0.90 | 0.94 | 0.95 | 0.95 | 0.95 | 0.93 | 0.93 | 0.92 | 0.90 | 0.89 |
| ResUNet | 0.85 | 0.87 | 0.90 | 0.90 | 0.91 | 0.91 | 0.91 | 0.90 | 0.89 | 0.84 |
| EffUNet | 0.89 | 0.91 | 0.91 | 0.91 | 0.91 | 0.91 | 0.91 | 0.91 | 0.91 | 0.91 |
| FPN | 0.82 | 0.86 | 0.90 | 0.91 | 0.92 | 0.91 | 0.90 | 0.88 | 0.85 | 0.75 |
| BiFPN | 0.84 | 0.89 | 0.91 | 0.91 | 0.91 | 0.91 | 0.91 | 0.91 | 0.91 | 0.89 |
| PFPN | 0.84 | 0.89 | 0.92 | 0.92 | 0.92 | 0.92 | 0.92 | 0.90 | 0.87 | 0.82 |
| CellVit | 0.83 | 0.88 | 0.90 | 0.91 | 0.92 | 0.91 | 0.91 | 0.89 | 0.87 | 0.80 |
| EvitUNet | 0.90 | 0.92 | 0.92 | 0.92 | 0.92 | 0.92 | 0.91 | 0.91 | 0.91 | 0.91 |
| MaxViTUNet | 0.93 | 0.95 | 0.95 | 0.95 | 0.95 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 |

Dice loss:

| Model | 0.45 | 0.50 | 0.55 | 0.60 | 0.65 | 0.70 | 0.75 | 0.80 | 0.85 | 0.90 |
|---|---|---|---|---|---|---|---|---|---|---|
| UNet | 0.80 | 0.80 | 0.80 | 0.80 | 0.80 | 0.80 | 0.80 | 0.80 | 0.80 | 0.81 |
| UNet++ | 0.81 | 0.81 | 0.81 | 0.81 | 0.85 | 0.87 | 0.81 | 0.81 | 0.81 | 0.81 |
| ResUNet | 0.79 | 0.79 | 0.79 | 0.79 | 0.79 | 0.79 | 0.79 | 0.79 | 0.79 | 0.79 |
| EffUNet | 0.89 | 0.91 | 0.91 | 0.91 | 0.91 | 0.91 | 0.91 | 0.91 | 0.91 | 0.91 |
| FPN | 0.80 | 0.80 | 0.80 | 0.80 | 0.80 | 0.80 | 0.80 | 0.80 | 0.80 | 0.80 |
| BiFPN | 0.76 | 0.76 | 0.76 | 0.76 | 0.76 | 0.76 | 0.76 | 0.76 | 0.76 | 0.76 |
| PFPN | 0.80 | 0.80 | 0.80 | 0.81 | 0.81 | 0.81 | 0.81 | 0.81 | 0.81 | 0.82 |
| CellVit | 0.76 | 0.76 | 0.76 | 0.77 | 0.77 | 0.77 | 0.77 | 0.77 | 0.78 | 0.78 |
| EvitUNet | 0.81 | 0.81 | 0.81 | 0.81 | 0.82 | 0.82 | 0.82 | 0.82 | 0.82 | 0.82 |
| MaxViTUNet | 0.82 | 0.82 | 0.82 | 0.82 | 0.82 | 0.82 | 0.83 | 0.83 | 0.83 | 0.83 |
Table 3. Cell counting accuracy according to model compression techniques (Acc in %, size in MB, latency in ms).

| Model | No quantization: Acc | Size | Latency | FP16: Acc | Size | Latency | Pruning: Acc | Size | Latency |
|---|---|---|---|---|---|---|---|---|---|
| UNet | 93 | 124.3 | 21.6 | 93 | 62.2 | 23.6 | 91 | 79.4 | 25.5 |
| UNet++ | 95 | 138.8 | 28.9 | 95 | 67.0 | 29.5 | 93 | 85.4 | 31.2 |
| ResUNet | 91 | 26.9 | 21.5 | 91 | 13.4 | 21.4 | 88 | 17.1 | 22.7 |
| FPN | 92 | 16.7 | 21.1 | 92 | 8.4 | 22.7 | 89 | 13.2 | 23.1 |
| BiFPN | 91 | 32.4 | 41.3 | 91 | 16.4 | 36.1 | 87 | 27.2 | 43.4 |
| PFPN | 92 | 47.5 | 27.2 | 92 | 23.8 | 24.3 | 88 | 30.2 | 20.4 |
| EffUNet | 91 | 69.8 | 26.4 | 92 | 35.0 | 29.8 | 87 | 44.8 | 25.1 |
| CellVit | 92 | 33.5 | 27.3 | 91 | 16.8 | 28.4 | 89 | 28.1 | 29.3 |
| EvitUNet | 92 | 68.4 | 26.5 | 91 | 34.2 | 26.9 | 88 | 42.7 | 31.2 |
| MaxViTUNet | 96 | 133.5 | 44.9 | 96 | 63.8 | 44.0 | 57 | 117.28 | 50.6 |
Table 4. Cell segmentation results for CHO cell segmentation dataset [27].

| Model | BCE: Acc | BCE: Dice | BCE: Sens | BCE: mIoU | Focal: Acc | Focal: Dice | Focal: Sens | Focal: mIoU |
|---|---|---|---|---|---|---|---|---|
| UNet | 99.7 | 91.9 | 92.9 | 85.3 | 99.7 | 92.2 | 91.6 | 85.7 |
| UNet++ | 99.7 | 93.1 | 93.8 | 87.3 | 99.7 | 92.5 | 92.5 | 86.3 |
| ResUNet | 99.7 | 92.5 | 92.2 | 86.3 | 99.7 | 92.0 | 90.0 | 85.5 |
| EffUNet | 99.5 | 86.0 | 81.6 | 75.9 | 99.4 | 83.8 | 75.1 | 72.7 |
| FPN | 99.7 | 92.3 | 93.8 | 85.9 | 99.6 | 91.0 | 89.6 | 83.8 |
| BiFPN | 99.3 | 81.9 | 80.9 | 69.6 | 99.3 | 79.6 | 74.3 | 66.6 |
| PFPN | 99.7 | 91.7 | 91.8 | 84.9 | 99.7 | 91.3 | 90.9 | 84.3 |
| CellVit | 99.2 | 78.6 | 76.3 | 65.5 | 99.2 | 77.4 | 69.7 | 63.9 |
| EvitUNet | 99.7 | 92.0 | 91.9 | 85.4 | 99.6 | 90.7 | 89.4 | 83.3 |
| MaxViTUNet | 99.7 | 91.5 | 92.0 | 84.5 | 99.6 | 91.1 | 91.5 | 84.0 |

| Model | Dice loss: Acc | Dice loss: Dice | Dice loss: Sens | Dice loss: mIoU | Size (MB) | Latency (ms) |
|---|---|---|---|---|---|---|
| UNet | 99.7 | 92.8 | 92.7 | 86.7 | 124.3 | 11.28 |
| UNet++ | 99.7 | 93.2 | 95.0 | 87.3 | 138.8 | 17.04 |
| ResUNet | 99.7 | 92.8 | 93.2 | 86.8 | 26.9 | 11.33 |
| EffUNet | 99.5 | 88.0 | 84.3 | 79.1 | 69.8 | 19.35 |
| FPN | 99.7 | 92.6 | 94.2 | 86.3 | 16.7 | 10.46 |
| BiFPN | 99.3 | 80.5 | 78.6 | 67.8 | 32.4 | 32.09 |
| PFPN | 99.7 | 92.8 | 93.2 | 86.8 | 47.5 | 11.41 |
| CellVit | 99.0 | 73.2 | 69.6 | 58.4 | 33.5 | 20.72 |
| EvitUNet | 99.6 | 91.0 | 91.5 | 83.8 | 68.4 | 13.34 |
| MaxViTUNet | 99.7 | 92.0 | 92.3 | 85.4 | 133.5 | 31.52 |
Table 5. Cell counting accuracy according to threshold for the CHO dataset [27].

BCE loss:

| Model | 0.45 | 0.50 | 0.55 | 0.60 | 0.65 | 0.70 | 0.75 | 0.80 | 0.85 | 0.90 |
|---|---|---|---|---|---|---|---|---|---|---|
| UNet | 0.94 | 0.94 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 |
| UNet++ | 0.96 | 0.96 | 0.95 | 0.95 | 0.94 | 0.94 | 0.95 | 0.95 | 0.95 | 0.96 |
| ResUNet | 0.95 | 0.94 | 0.94 | 0.94 | 0.93 | 0.94 | 0.92 | 0.92 | 0.93 | 0.91 |
| EffUNet | 0.95 | 0.94 | 0.94 | 0.94 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.96 |
| FPN | 0.94 | 0.94 | 0.94 | 0.94 | 0.95 | 0.95 | 0.94 | 0.95 | 0.95 | 0.95 |
| BiFPN | 0.95 | 0.95 | 0.94 | 0.94 | 0.94 | 0.93 | 0.92 | 0.92 | 0.91 | 0.91 |
| PFPN | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.96 | 0.97 |
| CellVit | 0.92 | 0.92 | 0.92 | 0.92 | 0.92 | 0.91 | 0.91 | 0.91 | 0.91 | 0.91 |
| EvitUNet | 0.95 | 0.95 | 0.94 | 0.94 | 0.94 | 0.96 | 0.96 | 0.96 | 0.94 | 0.94 |
| MaxViTUNet | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 |

Focal loss:

| Model | 0.45 | 0.50 | 0.55 | 0.60 | 0.65 | 0.70 | 0.75 | 0.80 | 0.85 | 0.90 |
|---|---|---|---|---|---|---|---|---|---|---|
| UNet | 0.95 | 0.95 | 0.96 | 0.95 | 0.95 | 0.95 | 0.95 | 0.96 | 0.96 | 0.95 |
| UNet++ | 0.95 | 0.96 | 0.96 | 0.96 | 0.95 | 0.95 | 0.95 | 0.95 | 0.96 | 0.96 |
| ResUNet | 0.95 | 0.94 | 0.94 | 0.94 | 0.93 | 0.94 | 0.92 | 0.92 | 0.93 | 0.91 |
| EffUNet | 0.96 | 0.94 | 0.94 | 0.95 | 0.95 | 0.95 | 0.96 | 0.96 | 0.95 | 0.93 |
| FPN | 0.94 | 0.95 | 0.94 | 0.94 | 0.95 | 0.95 | 0.94 | 0.94 | 0.90 | 0.91 |
| BiFPN | 0.95 | 0.94 | 0.93 | 0.94 | 0.94 | 0.94 | 0.95 | 0.92 | 0.92 | 0.89 |
| PFPN | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 | 0.95 | 0.96 | 0.97 | 0.97 |
| CellVit | 0.91 | 0.90 | 0.90 | 0.91 | 0.91 | 0.91 | 0.91 | 0.91 | 0.90 | 0.90 |
| EvitUNet | 0.95 | 0.96 | 0.95 | 0.95 | 0.96 | 0.95 | 0.95 | 0.94 | 0.95 | 0.93 |
| MaxViTUNet | 0.94 | 0.94 | 0.94 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.97 | 0.97 |

Dice loss:

| Model | 0.45 | 0.50 | 0.55 | 0.60 | 0.65 | 0.70 | 0.75 | 0.80 | 0.85 | 0.90 |
|---|---|---|---|---|---|---|---|---|---|---|
| UNet | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 |
| UNet++ | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 |
| ResUNet | 0.95 | 0.95 | 0.96 | 0.96 | 0.96 | 0.97 | 0.97 | 0.97 | 0.97 | 0.97 |
| EffUNet | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 |
| FPN | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 |
| BiFPN | 0.93 | 0.93 | 0.92 | 0.92 | 0.92 | 0.92 | 0.93 | 0.93 | 0.94 | 0.94 |
| PFPN | 0.96 | 0.96 | 0.96 | 0.97 | 0.97 | 0.97 | 0.97 | 0.97 | 0.97 | 0.97 |
| CellVit | 0.88 | 0.88 | 0.88 | 0.88 | 0.88 | 0.88 | 0.88 | 0.88 | 0.88 | 0.88 |
| EvitUNet | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 |
| MaxViTUNet | 0.95 | 0.94 | 0.94 | 0.94 | 0.94 | 0.94 | 0.94 | 0.94 | 0.94 | 0.95 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
