Contributing#
We welcome contributions to scDataset! This document outlines how to contribute to the project.
Getting Started#
Fork the repository on GitHub
Clone your fork locally:
git clone https://github.com/yourusername/scDataset.git
cd scDataset
Create a development environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -e ".[dev]"
Create a feature branch:
git checkout -b feature/your-feature-name
Development Setup#
Install development dependencies:
pip install -e ".[dev,test,docs]"
This installs:
Core dependencies: torch, numpy
Development tools: black, flake8, mypy
Testing: pytest, pytest-cov
Documentation: sphinx, sphinx-book-theme
Code Style#
We use several tools to maintain code quality:
Black for code formatting:
black src/ tests/
Flake8 for linting:
flake8 src/ tests/
MyPy for type checking:
mypy src/
Pre-commit hooks (optional but recommended):
pre-commit install
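The three checks above can also be run in one pass before committing. The following is a minimal sketch, not part of the project's tooling: the `run_checks` helper and the `CHECKS` list are hypothetical names, and the paths simply mirror the commands listed above. It uses `black --check`, which only reports files that would be reformatted, whereas the plain `black src/ tests/` command rewrites them in place.

```python
import subprocess
import sys

# Commands mirror the ones listed in this section; `--check` makes Black
# report problems without modifying any files.
CHECKS = [
    ["black", "--check", "src/", "tests/"],
    ["flake8", "src/", "tests/"],
    ["mypy", "src/"],
]


def run_checks(commands):
    """Run each command in order and return the ones that failed.

    A command counts as failed if it exits with a non-zero status or if
    its executable cannot be found on PATH.
    """
    failed = []
    for command in commands:
        try:
            result = subprocess.run(command)
        except FileNotFoundError:
            print(f"not installed: {command[0]}", file=sys.stderr)
            failed.append(command)
            continue
        if result.returncode != 0:
            failed.append(command)
    return failed
```

Calling `run_checks(CHECKS)` from the repository root returns an empty list when every tool passes, so a wrapper script could exit with a non-zero status whenever the list is non-empty.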
Testing#
Run the test suite:
pytest
Run with coverage:
pytest --cov=scdataset --cov-report=html
Test specific modules:
pytest tests/test_strategy.py
Writing Tests#
Place tests in the tests/ directory
Use descriptive test names: test_streaming_strategy_returns_correct_indices
Test both success and failure cases
Add tests for any new functionality
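As an illustration of these guidelines, here is a small sketch of a test module with one success case and one failure case. The `batch_indices` function is a hypothetical stand-in defined only for this example; it is not part of the scDataset API.

```python
def batch_indices(n_items: int, batch_size: int) -> list[list[int]]:
    """Hypothetical helper: split range(n_items) into consecutive batches."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [
        list(range(start, min(start + batch_size, n_items)))
        for start in range(0, n_items, batch_size)
    ]


def test_batch_indices_returns_consecutive_indices():
    # Success case: the last batch is shorter when sizes do not divide evenly.
    assert batch_indices(5, 2) == [[0, 1], [2, 3], [4]]


def test_batch_indices_rejects_nonpositive_batch_size():
    # Failure case: invalid input should raise instead of returning garbage.
    try:
        batch_indices(10, 0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for batch_size=0")
```

In a real test module, `pytest.raises(ValueError)` would be the usual way to write the second test; the try/except form is used here only to keep the sketch free of third-party imports.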
Documentation#
Build documentation locally:
cd docs
make html
View the built documentation:
open build/html/index.html # On macOS
# On Linux: xdg-open build/html/index.html
Writing Documentation#
Use reStructuredText format for documentation files
Add docstrings to all public functions and classes
Follow NumPy docstring style
Include examples in docstrings when helpful
Example docstring:
def my_function(param1: int, param2: str = "default") -> bool:
    """
    Brief description of the function.

    Parameters
    ----------
    param1 : int
        Description of param1.
    param2 : str, default="default"
        Description of param2.

    Returns
    -------
    bool
        Description of return value.

    Examples
    --------
    >>> my_function(42, "test")
    True
    """
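Examples in a docstring are only useful if they stay correct. One way to check them is the standard library's `doctest` module, sketched below. Whether the project runs doctests as part of its test suite is not stated here, so treat this as an optional local check; `is_positive` and `check_docstring` are hypothetical names used only for illustration.

```python
import doctest


def is_positive(value: int) -> bool:
    """
    Return whether ``value`` is strictly positive.

    Parameters
    ----------
    value : int
        Number to check.

    Returns
    -------
    bool
        True if ``value`` is greater than zero.

    Examples
    --------
    >>> is_positive(42)
    True
    >>> is_positive(-1)
    False
    """
    return value > 0


def check_docstring(func) -> doctest.TestResults:
    """Run the ``>>>`` examples found in ``func``'s docstring."""
    parser = doctest.DocTestParser()
    # The globals dict must contain every name the examples refer to.
    test = parser.get_doctest(
        func.__doc__ or "", {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)
```

`check_docstring(is_positive)` returns a result whose `failed` field is the number of examples that produced unexpected output, so a value of zero means the documented examples match the code.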
Types of Contributions#
Bug Reports#
When reporting bugs, please include:
Clear description of the problem
Minimal example to reproduce the issue
System information (OS, Python version, package versions)
Expected vs actual behavior
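The system information can be gathered with a short script and pasted into the issue. This is a minimal sketch using only the standard library; `collect_environment` is a hypothetical helper, and the distribution names passed to it (for example `"scDataset"`, `"torch"`, `"numpy"`) are assumptions that may need adjusting to match how the packages are installed.

```python
import platform
from importlib import metadata


def collect_environment(packages):
    """Return OS, Python version, and installed versions of ``packages``."""
    info = {
        "os": platform.platform(),
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report missing packages explicitly instead of failing.
            info[name] = "not installed"
    return info


def format_environment(info):
    """Render the collected information as one ``key: value`` pair per line."""
    return "\n".join(f"{key}: {value}" for key, value in info.items())
```

For example, `print(format_environment(collect_environment(["scDataset", "torch", "numpy"])))` prints a block that can be copied directly into a bug report.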
Feature Requests#
For new features:
Describe the use case and motivation
Provide examples of how it would be used
Consider backwards compatibility
Code Contributions#
Start with an issue to discuss the change
Keep changes focused - one feature/fix per PR
Add tests for new functionality
Update documentation as needed
Follow code style guidelines
Pull Request Process#
Create an issue first (unless it’s a small fix)
Fork and clone the repository
Create a feature branch
Make your changes:
Write code
Add tests
Update documentation
Run tests and style checks
Commit your changes:
git add .
git commit -m "feat: add new sampling strategy"
Push to your fork:
git push origin feature/your-feature-name
Create a Pull Request on GitHub
Commit Message Guidelines#
We follow Conventional Commits:
feat: New feature
fix: Bug fix
docs: Documentation changes
test: Adding tests
refactor: Code refactoring
style: Code style changes
ci: CI/CD changes
Examples:
feat: add weighted sampling strategy
fix: resolve memory leak in block shuffling
docs: improve quickstart examples
test: add integration tests for scDataset
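To make the format concrete, the sketch below checks a commit subject line against the types listed above. It is an illustrative helper, not a tool the project ships: `is_valid_commit_subject` is a hypothetical name, and accepting an optional `(scope)` and `!` marker follows the Conventional Commits specification rather than anything stated in this guide.

```python
import re

# Types taken from the list in this section.
ALLOWED_TYPES = ("feat", "fix", "docs", "test", "refactor", "style", "ci")

# type, optional "(scope)", optional "!", then ": " and a non-empty description.
SUBJECT_PATTERN = re.compile(
    r"(?:" + "|".join(ALLOWED_TYPES) + r")(?:\([^()\s]+\))?!?: \S.*"
)


def is_valid_commit_subject(message: str) -> bool:
    """Check only the first line of ``message`` against the pattern."""
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    return SUBJECT_PATTERN.fullmatch(subject) is not None
```

The check is case-sensitive and requires a space after the colon, so `Feat: add x` and `feat:add x` are both rejected.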
Review Process#
All submissions require review. We use GitHub pull requests for this purpose.
Automated checks must pass (tests, linting, etc.)
At least one maintainer must approve
All conversations must be resolved
Documentation must be updated if needed
Release Process#
Releases are handled by maintainers:
Version bump following semantic versioning
Update changelog
Create GitHub release
Publish to PyPI
Community Guidelines#
Be respectful and inclusive
Follow the code of conduct
Help others and share knowledge
Stay on topic in discussions
Getting Help#
GitHub Issues: Bug reports and feature requests
GitHub Discussions: Questions and general discussion
Documentation: Check the docs first!
Thank you for contributing to scDataset! 🎉