scdataset.MultiIndexable
- class scdataset.MultiIndexable(*indexables, names: List[str] | None = None, unstructured: Dict[str, Any] | None = None, **named_indexables)
Bases: object
Container for multiple indexable objects that should be indexed together.
This class allows you to group multiple indexable objects (arrays, lists, etc.) and index them synchronously. It’s particularly useful for scenarios like:
Multi-modal single-cell data (gene expression + protein data)
Features and labels (X, y) that need to stay aligned
Multiple data modalities that share the same sample dimension
The class supports both positional and named access to the contained indexables, and ensures all indexables have the same length along the first dimension.
Additionally, it supports storing unstructured metadata that is not indexed but remains accessible after indexing operations. This is useful for keeping metadata like gene names, dataset info, or other non-sample-aligned data.
- Parameters:
*indexables (indexable objects or dict) – Variable number of indexable objects that should be indexed together, OR a single dictionary where keys become names and values are indexables. All indexables must have the same length in the first dimension.
names (list of str, optional) – Names for the indexables when using positional arguments. Must have the same length as the number of indexables. Cannot be used with dictionary input.
unstructured (dict, optional) – Dictionary of non-indexable metadata. This data is preserved unchanged when the MultiIndexable is indexed/subsetted. Useful for storing metadata like gene names, dataset descriptions, or configuration.
**named_indexables (dict, optional) – Named indexable objects passed as keyword arguments. Cannot be used together with positional indexables.
- Variables:
- Raises:
ValueError – If indexables have different lengths along the first dimension, or if the number of names doesn’t match the number of indexables.
TypeError – If both positional and keyword indexables are provided, or if unstructured is not a dictionary.
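The error conditions listed above can be pictured with a small stand-alone sketch. `check_inputs` is a hypothetical helper written for illustration, not part of scdataset. It only mirrors the documented conditions; the library's actual check order and messages may differ, and the dictionary-as-positional form is not handled here.

```python
from typing import Any, Dict, List, Optional, Sequence


def check_inputs(
    indexables: Sequence[Any],
    names: Optional[List[str]] = None,
    unstructured: Optional[Dict[str, Any]] = None,
    named_indexables: Optional[Dict[str, Any]] = None,
) -> int:
    """Hypothetical helper mirroring the documented checks; returns the shared length."""
    named_indexables = named_indexables or {}
    # Positional and keyword indexables are mutually exclusive (documented TypeError).
    if indexables and named_indexables:
        raise TypeError("pass indexables positionally or by keyword, not both")
    # Unstructured metadata must be a dictionary (documented TypeError).
    if unstructured is not None and not isinstance(unstructured, dict):
        raise TypeError("unstructured must be a dict")
    values = list(indexables) if indexables else list(named_indexables.values())
    # One name per indexable when names are given (documented ValueError).
    if names is not None and len(names) != len(values):
        raise ValueError("number of names must match number of indexables")
    # All indexables must share the same first-dimension length (documented ValueError).
    lengths = {len(v) for v in values}
    if len(lengths) > 1:
        raise ValueError(f"indexables have different lengths: {sorted(lengths)}")
    return lengths.pop() if lengths else 0
```

For instance, `check_inputs([[1, 2, 3], [4, 5, 6]], names=['x', 'y'])` returns the shared length, while passing lists of length 2 and 3 raises `ValueError`.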
Examples
Create with positional arguments:
>>> import numpy as np
>>> from scdataset import MultiIndexable
>>> x = np.random.randn(100, 50)
>>> y = np.random.randint(0, 3, 100)
>>> multi = MultiIndexable(x, y, names=['features', 'labels'])
>>> len(multi)
100
>>> multi.count
2
Create with dictionary as positional argument:
>>> data_dict = {
...     'genes': np.random.randn(100, 2000),
...     'proteins': np.random.randn(100, 100)
... }
>>> multi = MultiIndexable(data_dict)
>>> subset = multi[10:20]  # Get samples 10-19 from both modalities
>>> subset['genes'].shape
(10, 2000)
Create with keyword arguments:
>>> multi = MultiIndexable(
...     genes=np.random.randn(100, 2000),
...     proteins=np.random.randn(100, 100)
... )
>>> multi.names
['genes', 'proteins']
Create with unstructured metadata:
>>> gene_names = ['Gene_' + str(i) for i in range(2000)]
>>> multi = MultiIndexable(
...     X=np.random.randn(100, 2000),
...     unstructured={'gene_names': gene_names, 'dataset_name': 'MyDataset'}
... )
>>> multi.unstructured['gene_names'][:3]
['Gene_0', 'Gene_1', 'Gene_2']
>>> subset = multi[10:20]  # Unstructured data is preserved
>>> subset.unstructured['dataset_name']
'MyDataset'
Access by name or position:
>>> multi = MultiIndexable(x, y, names=['x', 'y'])
>>> same_x1 = multi[0]  # Access by position
>>> same_x2 = multi['x']  # Access by name
>>> np.array_equal(same_x1, same_x2)
True
Use with scDataset:
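>>> from scdataset import scDataset, Streaming
>>> multi = MultiIndexable(
...     genes=np.random.randn(100, 2000),
...     proteins=np.random.randn(100, 100)
... )
>>> dataset = scDataset(multi, Streaming(), batch_size=32)
>>> for batch in dataset:
...     genes, proteins = batch[0], batch[1]  # or batch['genes'], batch['proteins']
...     break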
See also
scdataset.scDataset – Main dataset class that can use MultiIndexable objects
Methods
__init__(*indexables[, names, unstructured]) – Initialize MultiIndexable with indexable objects.
items() – Iterate over (name, indexable) pairs.
keys() – Get the names or indices of indexables.
values() – Get the indexable objects.
Attributes
count – Number of indexables contained in this object.
names – Names of the indexables, if provided.
unstructured – Dictionary of non-indexable metadata.
unstructured_keys – List of keys in the unstructured metadata dictionary.
- __init__(*indexables, names: List[str] | None = None, unstructured: Dict[str, Any] | None = None, **named_indexables)
Initialize MultiIndexable with indexable objects.
Can be initialized in four ways:
1. Positional: MultiIndexable(x, y, z)
2. Positional with names: MultiIndexable(x, y, names=['x', 'y'])
3. Dictionary as positional: MultiIndexable({'x': x_data, 'y': y_data})
4. Named keywords: MultiIndexable(x=x_data, y=y_data)
All variants support the optional unstructured parameter for non-indexable metadata.
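One way to picture how the four call styles could reduce to a single internal form is the sketch below. `normalize_inputs` is a hypothetical helper, not the library's code; it assumes the result is a pair of (names or None, list of values) and skips all validation.

```python
from typing import Any, List, Optional, Tuple


def normalize_inputs(
    *indexables: Any, names: Optional[List[str]] = None, **named_indexables: Any
) -> Tuple[Optional[List[str]], List[Any]]:
    """Hypothetical helper: reduce the four call styles to (names, values)."""
    if named_indexables:
        # Style 4: keyword arguments supply both names and values.
        return list(named_indexables), list(named_indexables.values())
    if len(indexables) == 1 and isinstance(indexables[0], dict):
        # Style 3: a single dict; its keys become the names.
        mapping = indexables[0]
        return list(mapping), list(mapping.values())
    # Styles 1 and 2: positional values, with names only if they were given.
    return (list(names) if names is not None else None), list(indexables)
```

Under these assumptions, the dictionary and keyword styles give the same result, and the plain positional style is the only one that leaves the names unset.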
- property count: int
Number of indexables contained in this object.
- property unstructured: Dict[str, Any]
Dictionary of non-indexable metadata.
This data is preserved unchanged when the MultiIndexable is indexed or subsetted. For efficiency, the property returns the internal dictionary itself rather than a copy, so in-place changes modify the stored metadata; copy it first if you need to preserve the original.
- Returns:
Dictionary containing unstructured metadata.
- Return type:
Dict[str, Any]
Examples
>>> multi = MultiIndexable(
...     X=np.zeros((10, 5)),
...     unstructured={'gene_names': ['A', 'B', 'C', 'D', 'E']}
... )
>>> multi.unstructured['gene_names']
['A', 'B', 'C', 'D', 'E']
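Because the property hands back the internal dictionary rather than a copy, edits made through it are visible on the object. The sketch below shows that aliasing with a hypothetical stand-in class, `MetadataHolder`, which is not part of scdataset, and uses `copy.deepcopy` as one way to get an independent working copy.

```python
import copy
from typing import Any, Dict


class MetadataHolder:
    """Hypothetical stand-in for an object whose property exposes its internal dict."""

    def __init__(self, unstructured: Dict[str, Any]) -> None:
        self._unstructured = unstructured

    @property
    def unstructured(self) -> Dict[str, Any]:
        # Returns the internal dict itself, not a copy.
        return self._unstructured


holder = MetadataHolder({"gene_names": ["A", "B"]})

# Mutating the returned dict changes the holder's own metadata.
holder.unstructured["dataset"] = "test"

# Deep-copy first when the original must stay untouched.
scratch = copy.deepcopy(holder.unstructured)
scratch["gene_names"].append("C")
```

After this runs, `holder.unstructured` contains the added `dataset` key, while its `gene_names` list is unaffected by the change made to `scratch`.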
- property unstructured_keys: List[str]
List of keys in the unstructured metadata dictionary.
Examples
>>> multi = MultiIndexable(
...     X=np.zeros((10, 5)),
...     unstructured={'gene_names': ['A', 'B'], 'dataset': 'test'}
... )
>>> multi.unstructured_keys
['gene_names', 'dataset']
- __getitem__(key: int | str | slice | Sequence[int] | ndarray)
Index the MultiIndexable object.
- Parameters:
key (int, str, slice, or array-like) –
int: Return the indexable at that position
str: Return the indexable with that name (if names provided)
slice/array: Return new MultiIndexable with subsets at those sample indices
- Returns:
Single indexable if key is int or str
New MultiIndexable with subsets if key represents sample indices
- Return type:
object or MultiIndexable
Notes
When subsetting with slices or arrays, the unstructured metadata is preserved unchanged in the resulting MultiIndexable.
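The key dispatch described above can be sketched with plain Python lists. `ToyMultiIndexable` is a hypothetical stand-in written for illustration, not the library class; it assumes list-backed data and reuses the same metadata dictionary in the subset, which the real implementation may or may not do.

```python
from typing import Any, Dict, List, Optional


class ToyMultiIndexable:
    """Hypothetical stand-in illustrating the documented dispatch; not the library class."""

    def __init__(
        self,
        values: List[Any],
        names: Optional[List[str]] = None,
        unstructured: Optional[Dict[str, Any]] = None,
    ) -> None:
        self.values = values
        self.names = names
        self.unstructured = unstructured or {}

    def __getitem__(self, key: Any) -> Any:
        if isinstance(key, str):
            # str: look up one indexable by name.
            if self.names is None or key not in self.names:
                raise KeyError(key)
            return self.values[self.names.index(key)]
        if isinstance(key, int):
            # int: return the indexable at that position.
            return self.values[key]
        if isinstance(key, slice):
            subsets = [v[key] for v in self.values]
        else:
            # Sequence of sample indices; plain lists need element-wise lookup.
            subsets = [[v[i] for i in key] for v in self.values]
        # Sample-index keys produce a new container; metadata is carried over as-is.
        return ToyMultiIndexable(subsets, self.names, self.unstructured)


toy = ToyMultiIndexable(
    [[10, 11, 12, 13], ["a", "b", "c", "d"]],
    names=["x", "y"],
    unstructured={"info": "demo"},
)
subset = toy[1:3]      # new container holding samples 1 and 2 of every indexable
picked = toy[[0, 3]]   # new container holding samples 0 and 3
```

Going by the dispatch documented above, an integer key selects a whole indexable rather than a sample, so a single sample would be requested with a slice or a one-element sequence instead.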
- __iter__()
Iterate over indexables.
- items()
Iterate over (name, indexable) pairs.
- Yields:
tuple – (name, indexable) pairs if names are available, (index, indexable) pairs otherwise.
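The fallback from names to positions can be sketched as a small generator. `iter_items` is a hypothetical helper, not the library's method; it assumes the indexables are held in a list and the names, when present, in a parallel list.

```python
from typing import Any, Iterator, List, Optional, Tuple, Union


def iter_items(
    values: List[Any], names: Optional[List[str]] = None
) -> Iterator[Tuple[Union[str, int], Any]]:
    """Hypothetical helper mirroring the documented items() behaviour."""
    # Use names as keys when available, positions otherwise.
    keys = names if names is not None else range(len(values))
    yield from zip(keys, values)
```

Calling `list(iter_items(values, names))` gives name-keyed pairs, and omitting `names` gives position-keyed pairs in the same order.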
- keys()
Get the names or indices of indexables.
- Returns:
List of names if available, list of indices otherwise.
- Return type: