API Reference
This section provides detailed documentation for all classes and functions in scDataset.
Main Dataset Class
Iterable PyTorch Dataset for on-disk data collections with flexible sampling strategies.
Multi-Modal Data Support
Container for multiple indexable objects that should be indexed together.
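To make "indexed together" concrete, here is a minimal sketch of the idea in plain Python. The class name `PairedIndexable` and its methods are hypothetical and are not the library's own container; the sketch only assumes that every part has the same length and that one set of row indices should be applied to all parts at once.

```python
from typing import Any, Dict, List, Sequence, Union


class PairedIndexable:
    """Hypothetical stand-in (not scDataset's class): keeps several
    equally long sequences aligned under a single row index."""

    def __init__(self, **parts: Sequence[Any]) -> None:
        lengths = {len(p) for p in parts.values()}
        if len(lengths) > 1:
            raise ValueError("all parts must have the same length")
        self.parts: Dict[str, Sequence[Any]] = parts

    def __len__(self) -> int:
        # Length of any part (they are all equal), or 0 if there are none.
        return len(next(iter(self.parts.values()))) if self.parts else 0

    def __getitem__(self, idx: Union[str, slice, List[int]]):
        if isinstance(idx, str):
            # A string key selects one part by name.
            return self.parts[idx]
        if isinstance(idx, slice):
            return PairedIndexable(**{k: v[idx] for k, v in self.parts.items()})
        # A list of row indices is applied to every part in the same order.
        return PairedIndexable(
            **{k: [v[i] for i in idx] for k, v in self.parts.items()}
        )


# Expression rows and their labels stay aligned after reordering.
batch = PairedIndexable(x=[10, 20, 30, 40], label=["a", "b", "c", "d"])
subset = batch[[2, 0]]
```

Because the same index list is applied to every part, a sampler only has to produce one set of indices per batch.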
Transform Functions
Transform AnnData/AnnCollection batch to MultiIndexable with optional obs columns.
Transform HuggingFace Tahoe-100M sparse gene expression data to dense tensors.
Fetch callback for BioNeMo SingleCellMemMapDataset.
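The sparse-to-dense step mentioned above can be illustrated without any library. The sketch below assumes a sparse row is given as parallel lists of column indices and values, which is a common layout but not necessarily the exact field layout of any particular dataset. The function name `densify_row` is hypothetical.

```python
from typing import List, Sequence


def densify_row(
    indices: Sequence[int], values: Sequence[float], n_features: int
) -> List[float]:
    """Expand one sparse row (assumed layout: parallel index/value lists)
    into a dense list of length n_features, with zeros elsewhere."""
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    row = [0.0] * n_features
    for col, val in zip(indices, values):
        row[col] = float(val)
    return row


dense = densify_row(indices=[1, 3], values=[2.5, 1.0], n_features=5)
```

A real transform would typically do this for a whole batch at once and return a tensor rather than a list.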
Sampling Strategies
Abstract base class for sampling strategies.
Sequential streaming sampling strategy with optional buffer-level shuffling.
Block-based shuffling sampling strategy.
Weighted sampling with block-based shuffling.
Class-balanced sampling with automatic weight computation.
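Two of the ideas named above can be sketched in a few lines of standard-library Python. This is one plausible reading, not the library's implementation: block-based shuffling is taken to mean "shuffle the order of contiguous blocks while keeping rows inside each block adjacent", and class-balanced weights are taken to be inverse class frequencies. The function names are hypothetical.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence


def block_shuffle_indices(n: int, block_size: int, seed: int = 0) -> List[int]:
    """Assumed interpretation: split 0..n-1 into contiguous blocks,
    shuffle the block order, and keep each block's rows in order."""
    rng = random.Random(seed)
    blocks = [
        list(range(start, min(start + block_size, n)))
        for start in range(0, n, block_size)
    ]
    rng.shuffle(blocks)
    return [i for block in blocks for i in block]


def balanced_weights(labels: Sequence[Hashable]) -> List[float]:
    """Assumed weighting: each sample gets 1 / (size of its class),
    so every class carries the same total weight."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


order = block_shuffle_indices(n=10, block_size=4, seed=1)
weights = balanced_weights(["a", "a", "a", "b"])
```

Keeping blocks contiguous preserves mostly sequential reads from disk, while shuffling the block order still varies the composition of successive batches.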
Experimental Features
Warning
Features in the experimental module are subject to change and may be modified significantly or removed entirely in future releases.
Suggest optimal parameters for scDataset based on system resources.
Estimate the memory size of a single sample from the data collection.
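The reasoning behind these two helpers can be sketched as follows: probe a few samples to estimate bytes per sample, then divide a memory budget by that estimate to see how many rows fit. The sketch below is an illustration under stated assumptions, not the experimental module's code. It assumes samples are `array.array` rows, and the names `estimate_bytes_per_sample` and `rows_that_fit` are hypothetical.

```python
from array import array
from typing import List


def estimate_bytes_per_sample(rows: List[array], n_probe: int = 8) -> float:
    """Average payload size in bytes over the first n_probe rows
    (assumes each row is an array.array; container overhead is ignored)."""
    probe = rows[:n_probe]
    if not probe:
        raise ValueError("need at least one sample to probe")
    return sum(len(r) * r.itemsize for r in probe) / len(probe)


def rows_that_fit(budget_bytes: int, bytes_per_sample: float) -> int:
    """Number of rows that fit in the budget, never less than one."""
    return max(1, int(budget_bytes // bytes_per_sample))


rows = [array("f", [0.0] * 100) for _ in range(5)]
per_sample = estimate_bytes_per_sample(rows)
fetchable = rows_that_fit(budget_bytes=10 * int(per_sample), bytes_per_sample=per_sample)
```

An estimate like this is only a starting point, since real loaders also hold intermediate copies and per-worker buffers.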