Installation
Requirements
scDataset requires Python 3.8 or higher and the following dependencies:
- torch >= 1.2.0
- numpy >= 1.17.0
Optional dependencies for specific data formats:
- anndata: for AnnData support
- datasets: for HuggingFace Datasets support
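Before installing, it can be useful to see which of these packages are already present in the active environment. The following is a minimal sketch, not part of scDataset: the helper names `parse_version`, `check`, and `report` are hypothetical, and it assumes the pip distribution names match the package names listed above. Version parsing is deliberately simple and treats pre-release or local suffixes (for example `+cu118`) as the plain release.

```python
import re
import sys
from importlib import metadata  # standard library since Python 3.8

# Minimum versions taken from the requirements list; None means "any version".
REQUIRED = {"torch": (1, 2, 0), "numpy": (1, 17, 0)}
OPTIONAL = {"anndata": None, "datasets": None}


def parse_version(text):
    """Turn a string such as '2.1.0+cu118' into a 3-tuple such as (2, 1, 0)."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad so that '1.17' compares equal to (1, 17, 0).
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check(name, minimum):
    """Return (installed_version_or_None, meets_minimum)."""
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        return None, False
    ok = minimum is None or parse_version(installed) >= minimum
    return installed, ok


def report():
    """Build one status line for Python itself and one per dependency."""
    py_ok = sys.version_info >= (3, 8)
    lines = [
        f"python {sys.version_info.major}.{sys.version_info.minor}: "
        f"{'ok' if py_ok else 'too old'}"
    ]
    for name, minimum in {**REQUIRED, **OPTIONAL}.items():
        installed, ok = check(name, minimum)
        kind = "required" if name in REQUIRED else "optional"
        status = "missing" if installed is None else ("ok" if ok else "too old")
        lines.append(f"{name} ({kind}): {installed or '-'} [{status}]")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

A missing optional package is not an error; it only means the corresponding data format is unavailable until that package is installed.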
Install from PyPI
The easiest way to install scDataset is from PyPI:
pip install scDataset
This will install the latest stable release along with all required dependencies.
Install from GitHub
To get the latest development version, install directly from GitHub:
pip install git+https://github.com/Kidara/scDataset.git
Development Installation
For development, clone the repository and install in editable mode:
git clone https://github.com/Kidara/scDataset.git
cd scDataset
pip install -e .
To install development dependencies:
pip install -e ".[dev]"
Verify Installation
You can verify your installation by importing the package:
import scdataset
print(scdataset.__version__)
Or run a quick test:
from scdataset import scDataset, Streaming
import numpy as np
# Create test data
data = np.random.randn(100, 50)
dataset = scDataset(data, Streaming(), batch_size=10)
# Test iteration
for batch in dataset:
    print(f"Batch shape: {batch.shape}")
    break
print("Installation successful!")
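If the import fails, it may help to check whether pip recorded the package at all, without importing it. The sketch below uses the standard-library `importlib.metadata` module (Python 3.8+). It assumes the distribution is registered under the name used in the pip command, `scDataset`; the helper `installed_version` is hypothetical and not part of the package.

```python
from importlib import metadata


def installed_version(dist_name="scDataset"):
    """Return the installed version string, or None if no such distribution is found."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("scDataset is not installed in this environment")
else:
    print(f"scDataset {version}")
```

If this reports a version but the import still fails, the problem is more likely a missing dependency than a missing installation.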
Troubleshooting
- ImportError: No module named 'torch'
Make sure PyTorch is installed. Visit pytorch.org for installation instructions.
- Performance Issues
For best performance with large datasets, consider installing:
pip install numba # For faster numerical operations
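For the ImportError case above, a common cause is that pip installed packages into a different Python interpreter than the one running the code. The following sketch, using only the standard library, prints which interpreter is active and whether it can locate a given module. The helper `diagnose` is a hypothetical name introduced here for illustration.

```python
import importlib.util
import sys


def diagnose(module_name="torch"):
    """Report the active interpreter and whether it can find a top-level module."""
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,
        "module": module_name,
        "found": spec is not None,
        # Path of the module's entry file when found, otherwise None.
        "location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in diagnose("torch").items():
        print(f"{key}: {value}")
```

If `found` is False, installing with `python -m pip install ...` using the interpreter path shown ensures the package lands in the environment that is actually in use.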