scdataset.transforms.adata_to_mindex
- scdataset.transforms.adata_to_mindex(batch, columns: List[str] | None = None)
Transform AnnData/AnnCollection batch to MultiIndexable with optional obs columns.
This transform converts a batch from an AnnCollection (or backed AnnData) into a MultiIndexable object containing the expression matrix and optionally selected observation columns. The MultiIndexable can then be indexed in subsequent batch operations.
- Parameters:
batch (AnnData-like) –
Batch from AnnCollection or backed AnnData. Must have:
  - .to_memory() method (for AnnCollection/backed AnnData)
  - .X attribute (expression matrix)
  - .obs attribute (observation metadata)
columns (list of str, optional) – List of observation column names to include in the output. If None, only the X matrix is included.
- Returns:
A MultiIndexable object with:
  - 'X': Dense expression matrix as a numpy array
  - Additional keys for each column in columns (as numpy arrays)
- Return type:
MultiIndexable
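The description above says the returned object can be indexed in later batch operations. The sketch below illustrates what synchronized indexing means, using a plain dict as a stand-in for MultiIndexable. The `take_rows` helper and the dict layout are assumptions made for illustration, not part of the scdataset API.

```python
import numpy as np

# Plain dict standing in for the returned MultiIndexable (assumption).
batch = {
    "X": np.arange(12, dtype=float).reshape(4, 3),   # 4 cells x 3 genes
    "cell_type": np.array(["T", "B", "T", "NK"]),    # one label per cell
}


def take_rows(data, indices):
    """Apply the same row indices to every array (hypothetical helper)."""
    return {key: value[indices] for key, value in data.items()}


# Selecting cells 2 and 0 keeps X rows and labels aligned.
subset = take_rows(batch, np.array([2, 0]))
```

Because every key is sliced with the same indices, each row of `subset["X"]` still lines up with its entry in `subset["cell_type"]`.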
Examples
>>> # Basic usage - just X matrix
>>> from scdataset import scDataset, BlockShuffling
>>> from scdataset.transforms import adata_to_mindex
>>>
>>> dataset = scDataset(
...     ann_collection,
...     BlockShuffling(),
...     batch_size=64,
...     fetch_transform=adata_to_mindex
... )

>>> # With observation columns using functools.partial
>>> from functools import partial
>>> fetch_fn = partial(adata_to_mindex, columns=['cell_type', 'batch'])
>>> dataset = scDataset(
...     ann_collection,
...     BlockShuffling(),
...     batch_size=64,
...     fetch_transform=fetch_fn
... )
>>> for batch in dataset:
...     X = batch['X']
...     cell_types = batch['cell_type']
...     break
Notes
This transform calls .to_memory() to materialize the AnnData object, which is necessary when working with backed or lazy AnnCollection objects.
Sparse matrices are automatically converted to dense numpy arrays for compatibility with standard indexing operations.
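The steps described in these notes can be sketched roughly as follows. This is an illustrative re-implementation under stated assumptions, not the library's source: `FakeBatch` and `to_dense_dict` are hypothetical names, the result is a plain dict rather than a MultiIndexable, and sparse input is detected by duck-typing on a `.toarray()` method (which scipy.sparse matrices provide).

```python
import numpy as np


class FakeBatch:
    """Hypothetical stand-in for an AnnData-like batch (not part of scdataset)."""

    def __init__(self, X, obs):
        self.X = X
        self.obs = obs

    def to_memory(self):
        # A real backed AnnData would load its data from disk here.
        return self


def to_dense_dict(batch, columns=None):
    """Illustrative sketch; returns a plain dict, not a MultiIndexable."""
    adata = batch.to_memory()          # materialize backed/lazy data
    X = adata.X
    if hasattr(X, "toarray"):          # sparse matrices expose .toarray()
        X = X.toarray()
    out = {"X": np.asarray(X)}
    for col in columns or []:          # columns=None means X only
        out[col] = np.asarray(adata.obs[col])
    return out


batch = FakeBatch(
    X=[[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]],
    obs={"cell_type": ["T", "B", "T"]},
)
result = to_dense_dict(batch, columns=["cell_type"])
```

Calling `to_dense_dict(batch)` without `columns` yields only the `'X'` key, mirroring the documented default.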
See also
MultiIndexable: Container for synchronized multi-modal data