scdataset.transforms.bionemo_to_tensor

scdataset.transforms.bionemo_to_tensor(data_collection, idx: int | slice | Sequence[int] | ndarray | Tensor) → Tensor

Fetch callback for BioNeMo SingleCellMemMapDataset.

This callback provides custom indexing logic for BioNeMo’s SingleCellMemMapDataset. That dataset returns rows as sparse matrices, which must be collated into a batch and densified before scDataset can use them.

Pass it as the fetch_callback argument when constructing an scDataset.

Parameters:
  • data_collection (SingleCellMemMapDataset) – The BioNeMo dataset to fetch from.

  • idx (int, slice, sequence, or tensor) – Indices to fetch. Can be:

      - int: Single index
      - slice: Slice object
      - list/ndarray/tensor: Batch of indices
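
The accepted idx forms listed above can all be reduced to a flat list of row indices before any rows are fetched. The following is a minimal sketch of that idea, not the library's implementation: normalize_indices is a hypothetical helper, and relying on a tolist() method to recognise ndarray and Tensor inputs is an assumption made so that the example needs no third-party imports.

```python
from typing import Any, List


def normalize_indices(idx: Any, length: int) -> List[int]:
    """Return idx as a flat list of non-negative row indices.

    Hypothetical helper for illustration only; it is not part of scdataset.
    """
    if isinstance(idx, slice):
        # slice.indices() resolves None bounds and negative values for us.
        return list(range(*idx.indices(length)))
    if hasattr(idx, "tolist"):
        # numpy.ndarray and torch.Tensor both expose tolist().
        idx = idx.tolist()
    if isinstance(idx, int):
        idx = [idx]

    resolved = []
    for i in idx:
        i = int(i)
        if i < 0:
            i += length  # allow Python-style negative indexing
        if not 0 <= i < length:
            raise IndexError(f"index {i} is out of range for length {length}")
        resolved.append(i)
    return resolved
```

With a helper like this, normalize_indices(slice(2, 5), 10) and normalize_indices([2, 3, 4], 10) describe the same three rows, so the fetch logic only has to handle one input shape.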

Returns:

Dense tensor of shape (batch_size, num_genes) with expression values.

Return type:

torch.Tensor

Examples

>>> from scdataset import scDataset, BlockShuffling
>>> from scdataset.transforms import bionemo_to_tensor
>>> from bionemo.scdl.io.single_cell_memmap_dataset import SingleCellMemMapDataset
>>>
>>> bionemo_data = SingleCellMemMapDataset(data_path='/path/to/data')
>>> dataset = scDataset(
...     bionemo_data,
...     BlockShuffling(),
...     batch_size=64,
...     fetch_callback=bionemo_to_tensor
... )

Notes

This callback requires the bionemo-scdl package to be installed. The collate function handles the sparse matrix format used by BioNeMo.
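
To illustrate what densifying a sparse batch involves, here is a small sketch in plain Python. It assumes each row arrives as a pair of (values, column indices), which is an assumption about the sparse layout rather than something this page states; densify_rows is a hypothetical name, and the real callback returns a torch.Tensor rather than nested lists.

```python
from typing import List, Sequence, Tuple

# Assumed sparse layout: the non-zero values of a row and their column indices.
SparseRow = Tuple[Sequence[float], Sequence[int]]


def densify_rows(rows: Sequence[SparseRow], num_genes: int) -> List[List[float]]:
    """Expand sparse rows into a dense (batch_size, num_genes) nested list."""
    dense = [[0.0] * num_genes for _ in rows]
    for r, (values, columns) in enumerate(rows):
        if len(values) != len(columns):
            raise ValueError("values and columns must have the same length")
        for value, col in zip(values, columns):
            # Genes that are not listed in the row keep their zero entry.
            dense[r][col] = float(value)
    return dense


batch = densify_rows([([1.0, 2.0], [0, 3]), ([5.0], [2])], num_genes=4)
```

Each output row has exactly num_genes entries, which matches the (batch_size, num_genes) shape described under Returns.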

Raises:

ImportError – If bionemo-scdl is not installed.