Changelog#
This document tracks all notable changes to scDataset.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[Unreleased]#
[0.2.0] - 2025-08-13#
Added#
Core
scDatasetclass with flexible sampling strategiesSampling strategies:
Streaming- Sequential sampling without shufflingBlockShuffling- Block-based shuffling for localityBlockWeightedSampling- Weighted sampling with blocksClassBalancedSampling- Automatic class balancing
Support for multiple data formats:
NumPy arrays
AnnData objects
HuggingFace Datasets
PyTorch Datasets
Any object with
__getitem__and__len__
Performance optimizations:
Block-based data fetching
Configurable fetch factors
Multiprocessing support
Customization features:
Custom fetch callbacks
Custom batch callbacks
Fetch and batch transforms
Comprehensive documentation and examples
Test suite with >90% coverage
GitHub Actions CI/CD pipeline
Technical Details#
Minimum Python version: 3.8
Core dependencies:
torch >= 1.2.0,numpy >= 1.17.0Compatible with PyTorch DataLoader
Supports both eager and lazy data loading patterns
Known Issues#
Performance may vary depending on storage backend (SSD vs HDD)
Note
This changelog will be updated with each release. See the GitHub releases page for the most up-to-date information.