scdataset.transforms.adata_to_mindex


scdataset.transforms.adata_to_mindex(batch, columns: List[str] | None = None)

Transform AnnData/AnnCollection batch to MultiIndexable with optional obs columns.

This transform converts a batch from an AnnCollection (or backed AnnData) into a MultiIndexable object containing the expression matrix and optionally selected observation columns. The MultiIndexable can then be indexed in subsequent batch operations.
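To illustrate what "indexed in subsequent batch operations" means in practice, here is a minimal sketch of synchronized indexing. It uses a plain dict of NumPy arrays as a stand-in for the transform's output, and `take_rows` is a hypothetical helper written for this example, not part of scdataset. The point is only that a single row index is applied to every field, so expression rows and their labels stay aligned.

```python
import numpy as np

# Stand-in for the transform's output: arrays that share the same first axis.
# (A plain dict is used here for illustration; it is not a MultiIndexable.)
fetched = {
    "X": np.arange(8, dtype=np.float32).reshape(4, 2),
    "cell_type": np.array(["T", "B", "T", "NK"]),
}


def take_rows(fields, idx):
    """Apply one row index to every array so rows stay aligned (hypothetical helper)."""
    return {name: values[idx] for name, values in fields.items()}


minibatch = take_rows(fetched, np.array([2, 0]))
print(minibatch["X"])          # rows 2 and 0 of X, in that order
print(minibatch["cell_type"])  # ['T' 'T']
```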

Parameters:
  • batch (AnnData-like) –

    Batch from AnnCollection or backed AnnData. Must have:

    • .to_memory() method (for AnnCollection/backed AnnData)

    • .X attribute (expression matrix)

    • .obs attribute (observation metadata)

  • columns (list of str, optional) – List of observation column names to include in the output. If None, only the X matrix is included.

Returns:

A MultiIndexable object with:

  • 'X': Dense expression matrix as numpy array

  • Additional keys for each column in columns (as numpy arrays)

Return type:

MultiIndexable

Examples

>>> # Basic usage: only the X matrix.
>>> # `ann_collection` is assumed to be an existing AnnCollection or backed AnnData.
>>> from scdataset import scDataset, BlockShuffling
>>> from scdataset.transforms import adata_to_mindex
>>>
>>> dataset = scDataset(
...     ann_collection,
...     BlockShuffling(),
...     batch_size=64,
...     fetch_transform=adata_to_mindex
... )
>>> # With observation columns using functools.partial
>>> from functools import partial
>>> fetch_fn = partial(adata_to_mindex, columns=['cell_type', 'batch'])
>>> dataset = scDataset(
...     ann_collection,
...     BlockShuffling(),
...     batch_size=64,
...     fetch_transform=fetch_fn
... )
>>> for batch in dataset:
...     X = batch['X']
...     cell_types = batch['cell_type']
...     break

Notes

This transform calls .to_memory() to materialize the AnnData object, which is necessary when working with backed or lazy AnnCollection objects.

Sparse matrices are automatically converted to dense numpy arrays for compatibility with standard indexing operations.
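The two notes above can be sketched roughly as follows. This is not the library's source; it is a simplified approximation under stated assumptions: the function name `adata_to_mapping_sketch` and the `FakeBatch` class are hypothetical, the result is a plain dict rather than a MultiIndexable, and sparse input is detected by the presence of a `.toarray()` method (which SciPy sparse matrices provide).

```python
import numpy as np


def adata_to_mapping_sketch(batch, columns=None):
    """Hypothetical sketch of the transform; returns a plain dict, not a MultiIndexable."""
    # Materialize backed or lazy data if the object supports it.
    adata = batch.to_memory() if hasattr(batch, "to_memory") else batch

    X = adata.X
    # SciPy sparse matrices expose .toarray(); dense arrays pass through unchanged.
    if hasattr(X, "toarray"):
        X = X.toarray()
    out = {"X": np.asarray(X)}

    # Copy each requested observation column as a NumPy array.
    for col in columns or []:
        out[col] = np.asarray(adata.obs[col])
    return out


class FakeBatch:
    """Tiny stand-in for an AnnData-like batch (illustration only)."""

    def __init__(self, X, obs):
        self.X = X
        self.obs = obs

    def to_memory(self):
        return self


batch = FakeBatch(
    X=np.arange(6, dtype=np.float32).reshape(3, 2),
    obs={"cell_type": ["T", "B", "T"], "batch": [0, 0, 1]},
)
result = adata_to_mapping_sketch(batch, columns=["cell_type"])
print(sorted(result))     # ['X', 'cell_type']
print(result["X"].shape)  # (3, 2)
```

Because the expression matrix is densified, memory use per fetch grows with the number of cells times the number of genes, which is worth keeping in mind when choosing the batch size for wide matrices.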

See also

MultiIndexable

Container for synchronized multi-modal data