Revealing the Best Strategies for Rare Cell Type Detection in Multi-Sample Single-Cell Datasets

Zhiwei Ye; Yinqiao Yan; Yuanyuan Yu; Hao Wu

doi:10.3390/genes17010031

¹ Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen 518055, China

² Faculty of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, Shenzhen 518107, China

³ School of Mathematics, Statistics and Mechanics, Beijing University of Technology, Beijing 100124, China

⁴ Institute of Advanced Computing and Digital Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China

Genes 2026, 17(1), 31; https://doi.org/10.3390/genes17010031

This article belongs to the Section Bioinformatics

Abstract

Background: Single-cell RNA sequencing (scRNA-seq) enables high-resolution characterization of cellular heterogeneity and makes it possible to identify rare cell populations that may be obscured in bulk transcriptomic data. Despite growing interest in rare-cell discovery, most existing detection methods were originally developed for single-sample datasets, and their behavior in multi-sample settings, where batch effects, sample imbalance, and heterogeneous cell-type compositions are common, remains poorly understood. This study aims to systematically evaluate representative rare cell detection methods in multi-sample settings and to identify the most effective analytical strategies. Methods: We benchmarked five widely used rare cell detection tools (CellSIUS, GapClust, GiniClust, scCAD, and SCISSORS) together with a scGPT-based rare cell detection method that uses Isolation Forest. Each method was evaluated under three analytical strategies: individual sample detection, pooled sample detection, and batch-corrected pooled sample detection. Performance was assessed across multiple publicly available scRNA-seq datasets using standardized evaluation metrics. Results: Batch-corrected pooled sample detection consistently achieved the highest overall performance across methods and datasets, whereas individual sample detection produced the weakest results. Among the evaluated tools, scCAD showed the most robust and stable performance across dataset types and analytical conditions. Conclusions: This study provides a strategy-level comparison of rare cell detection in multi-sample settings. Our findings highlight the importance of batch correction and pooled analysis for improving rare cell detection accuracy, and they offer practical guidance for selecting methods and analytical workflows in large-scale single-cell transcriptomic studies.

Keywords:

rare cell detection; single-cell RNA sequencing (scRNA-seq); population-level analysis
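
The three analytical strategies named in the abstract differ only in how cells are grouped and preprocessed before a detector is run. The following is a minimal sketch of that structure, not the benchmark's implementation. It makes several assumptions that are not in the article: a k-nearest-neighbour distance score stands in for the real detectors (CellSIUS, scCAD, Isolation Forest on scGPT embeddings, and so on), per-sample mean centering stands in for a real batch-correction method, and all function names, the `frac` and `k` parameters, and the toy data are hypothetical.

```python
from collections import defaultdict
from math import dist


def knn_scores(points, k=2):
    """Stand-in detector (assumption): distance to the k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        d = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(d[min(k, len(d)) - 1] if d else 0.0)
    return scores


def flag_rare(points, frac=0.2, k=2):
    """Return positions of the top `frac` highest-scoring cells."""
    scores = knn_scores(points, k)
    n_flag = max(1, round(frac * len(points)))
    ranked = sorted(range(len(points)), key=scores.__getitem__, reverse=True)
    return set(ranked[:n_flag])


def group_by_sample(samples):
    """Map each sample label to the global indices of its cells."""
    groups = defaultdict(list)
    for idx, label in enumerate(samples):
        groups[label].append(idx)
    return groups


def individual(cells, samples, **kw):
    """Strategy 1: run the detector separately within each sample."""
    flagged = set()
    for idxs in group_by_sample(samples).values():
        local = flag_rare([cells[i] for i in idxs], **kw)
        flagged |= {idxs[j] for j in local}  # map back to global indices
    return flagged


def pooled(cells, samples, **kw):
    """Strategy 2: concatenate all samples and run the detector once."""
    return flag_rare(cells, **kw)


def center_per_sample(cells, samples):
    """Stand-in batch correction (assumption): subtract each sample's mean."""
    corrected = list(cells)
    for idxs in group_by_sample(samples).values():
        dim = len(cells[idxs[0]])
        mean = [sum(cells[i][d] for i in idxs) / len(idxs) for d in range(dim)]
        for i in idxs:
            corrected[i] = tuple(x - m for x, m in zip(cells[i], mean))
    return corrected


def batch_corrected_pooled(cells, samples, **kw):
    """Strategy 3: correct batch effects, then pool and detect."""
    return flag_rare(center_per_sample(cells, samples), **kw)


# Toy 2-D "embeddings": two samples, the second shifted by a batch effect
# of +5 on the first axis. Cells 4 and 9 are the planted rare cells.
cells = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (3.0, 3.0),
         (5.0, 0.0), (5.1, 0.0), (5.0, 0.1), (5.1, 0.1), (8.0, 3.0)]
samples = ["A"] * 5 + ["B"] * 5

for strategy in (individual, pooled, batch_corrected_pooled):
    print(strategy.__name__, sorted(strategy(cells, samples)))
```

The toy data are far too small and clean to reproduce the performance differences reported in the abstract; the sketch only shows where the strategies diverge in the workflow. In practice, the detector and the batch-correction step would each be replaced by the corresponding published tool.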
