| Category | Method | Approach | Scale (cells) | Task | Mechanism | Key advantage | Ref. |
|---|---|---|---|---|---|---|---|
| Statistical models | Scarf | Graph-based t-stochastic neighbour embedding | 4 million | Visualization | Graph-based neighbour embedding and hierarchical clustering | Emphasizes rare cells and lineage trajectories | 27 |
| | iNMF | Online integrative non-negative matrix factorization | 1.3 million | Data integration | Jointly decomposes inputs into shared and dataset-specific metagenes | Integrates datasets without requiring the entire dataset during training | 28 |
| | scMerge2 | Hierarchical integration of single cells | 11 million | Data integration | Hierarchical integration capturing local and global variations | Integrates incoming datasets without requiring complete dataset availability during training | 29 |
| | Seurat v5 | Dictionary learning | 8.6 million | Data integration for multi-omic data | Decomposes cells into a multi-omic dictionary | Integrates data independently of single-cell omics measurements | 30 |
| Deep-learning methods | Cumulus | Supervised learning | 1.3 million | Visualization | Learns to project unseen cells via subsampling | Ensures a higher sampling rate for rare cells | 26 |
| | INSCT | Semi-supervised learning | 2.6 million | Data integration | Employs a batch-aware triplet network to generate a combined embedding space | Projects unseen single-cell data into pre-generated embeddings | 24 |
| | Fugue | Self-supervised learning | 18 million | Data integration | Encodes batch information in an unsupervised network | Maintains consistent memory usage across varying data magnitudes | 25 |
| | SCALEX | Unsupervised learning | 4 million | Data integration | Applies a VAE to project cells into a batch-invariant space | Incorporates incoming data without recalculation | 31 |
| | scPoli | Semi-supervised learning | 7.8 million | Data integration | Applies a conditional VAE to regress out batch effects | Explains sample- and cell-level variations with sample embeddings | 32 |
| | Concerto | Self-supervised learning | 10 million | Data integration for multi-omic data | Uses an asymmetric teacher-student architecture for cell pairing and batch separation | Pioneers multi-omics data integration | 33 |
| Large-scale single-cell pre-training | iSEEEK | Masked language modelling | 11.9 million | Cell clustering, developmental trajectory, cell-cell communication | Uses the top 126 genes per cell; predicts masked genes with bidirectional self-attention | Enables focused analysis and noise reduction in single-cell data; enhances contextual understanding | 20 |
| | Geneformer | Masked language modelling | 29.9 million | Chromatin network and therapeutic target inference | Uses all genes within each cell; predicts masked genes using bidirectional self-attention | Fosters a comprehensive understanding of the cellular context; enhances contextual understanding | 22 |
| | tGPT | Auto-regressive modelling | 22.3 million | Cell clustering, cell phenotype, developmental trajectory, therapeutic target inference | Uses the top 64/126 genes per cell; predicts the next gene from previously generated genes | Enables focused analysis and noise reduction; suits single-cell data with temporal or positional order | 21 |
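The online factorization idea behind the iNMF row — updating shared metagenes from streaming batches without revisiting earlier data — can be sketched with a toy NumPy implementation. This is an illustrative sketch using standard multiplicative updates on accumulated sufficient statistics, not the published iNMF algorithm; all variable names, batch sizes, and iteration counts below are assumptions for demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)
k, n_genes = 5, 50                      # number of metagenes, number of genes

# Ground-truth non-negative factors used only to simulate streaming batches.
W_true = rng.random((k, n_genes))

W = rng.random((k, n_genes)) + 0.1      # shared metagene dictionary, learned online
A = np.zeros((k, k))                    # running sum of H^T H
B = np.zeros((k, n_genes))              # running sum of H^T X
eps = 1e-9

def fit_loadings(X, W, n_iter=50):
    """Solve for non-negative cell loadings H (X ~ H @ W) with W held fixed."""
    H = np.full((X.shape[0], W.shape[0]), 0.5)
    WWt = W @ W.T
    for _ in range(n_iter):
        H *= (X @ W.T) / (H @ WWt + eps)  # multiplicative update keeps H >= 0
    return H

for _ in range(30):                      # 30 incoming mini-batches of 40 cells each
    H_sim = rng.random((40, k))
    X_batch = H_sim @ W_true             # simulated expression batch
    H = fit_loadings(X_batch, W)
    A += H.T @ H                         # accumulate sufficient statistics;
    B += H.T @ X_batch                   # the raw batch can now be discarded
    W *= B / (A @ W + eps)               # update metagenes from statistics alone

# Reconstruction quality on the most recent batch.
H = fit_loadings(X_batch, W)
err = np.linalg.norm(X_batch - H @ W) / np.linalg.norm(X_batch)
print(round(err, 3))
```

The key design point is that the dictionary update touches only the fixed-size statistics `A` (k×k) and `B` (k×genes), so memory stays constant no matter how many cells stream through — the property that lets such methods scale to millions of cells.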