Changelog

This document tracks all notable changes to scDataset.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

- Core scDataset class with flexible sampling strategies
- Sampling strategies:
  - Streaming - Sequential sampling without shuffling
  - BlockShuffling - Block-based shuffling for locality
  - BlockWeightedSampling - Weighted sampling with blocks
  - ClassBalancedSampling - Automatic class balancing
- Support for multiple data formats:
  - NumPy arrays
  - AnnData objects
  - HuggingFace Datasets
  - PyTorch Datasets
  - Any object with __getitem__ and __len__
- Performance optimizations:
  - Block-based data fetching
  - Configurable fetch factors
  - Multiprocessing support
- Customization features:
  - Custom fetch callbacks
  - Custom batch callbacks
  - Fetch and batch transforms
- Comprehensive documentation and examples
- Test suite with >90% coverage
- GitHub Actions CI/CD pipeline
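
To make two of the entries above concrete, here is a minimal sketch in plain Python of what "any object with __getitem__ and __len__" and "block-based shuffling for locality" can mean. It does not use or reproduce the scDataset API: `ListCollection` and `block_shuffled_indices` are hypothetical names invented for illustration, and the shuffling shown (permuting contiguous blocks while keeping the order inside each block) is only one plausible reading of the idea, not necessarily how BlockShuffling is implemented.

```python
import random


class ListCollection:
    """Hypothetical minimal collection: exposes only __len__ and __getitem__."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, idx):
        return self._items[idx]


def block_shuffled_indices(n, block_size, seed=0):
    """Return indices 0..n-1 with contiguous blocks in random order.

    Indices inside a block stay sequential, so reads remain mostly local
    while the block order is randomized. Illustrative assumption only.
    """
    rng = random.Random(seed)
    starts = list(range(0, n, block_size))
    rng.shuffle(starts)  # permute the blocks, not the individual indices
    order = []
    for start in starts:
        order.extend(range(start, min(start + block_size, n)))
    return order


collection = ListCollection(x * 10 for x in range(10))
order = block_shuffled_indices(len(collection), block_size=4)
shuffled = [collection[i] for i in order]
```

Because the collection is accessed only through `len()` and integer indexing, the same index-generation logic would apply to any container that supports those two operations.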

Note

This changelog will be updated with each release. See the GitHub releases page for the most up-to-date information.