Contributing#

We welcome contributions to scDataset! This document outlines how to contribute to the project.

Getting Started#

Fork the repository on GitHub

Clone your fork locally:

git clone https://github.com/yourusername/scDataset.git
cd scDataset

Create a development environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -e ".[dev]"

Create a feature branch:

git checkout -b feature/your-feature-name

Development Setup#

Install the development, test, and documentation dependencies:

pip install -e ".[dev,test,docs]"

This installs:

Core dependencies: torch, numpy
Development tools: black, flake8, mypy
Testing: pytest, pytest-cov
Documentation: sphinx, sphinx-book-theme

Code Style#

We use several tools to maintain code quality:

Black for code formatting:

black src/ tests/

Flake8 for linting:

flake8 src/ tests/

MyPy for type checking:

mypy src/

Pre-commit hooks (optional but recommended; requires pre-commit to be installed):

pre-commit install
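
The checks above can also be chained from a single script before committing. The following is a minimal sketch, not a script shipped with scDataset: the names `CHECKS`, `run_command`, and `run_checks` are hypothetical, and the commands mirror the ones listed in this section (with `--check` added so that Black reports problems instead of rewriting files).

```python
import subprocess
from typing import Callable, List, Sequence

# Commands mirror the ones in this section; `--check` makes Black report
# formatting problems instead of rewriting files.
CHECKS = (
    ("black", "--check", "src/", "tests/"),
    ("flake8", "src/", "tests/"),
    ("mypy", "src/"),
)


def run_command(command: Sequence[str]) -> int:
    """Run one tool and return its exit code (127 if it is not installed)."""
    try:
        return subprocess.run(list(command), check=False).returncode
    except FileNotFoundError:
        return 127


def run_checks(run: Callable[[Sequence[str]], int] = run_command) -> List[str]:
    """Run every check and return the names of the tools that failed."""
    return [command[0] for command in CHECKS if run(command) != 0]
```

Calling `run_checks()` from the repository root returns an empty list when every tool exits with status 0. The `run` parameter exists only so the function can be exercised without the tools installed.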

Testing#

Run the test suite:

pytest

Run with coverage:

pytest --cov=scdataset --cov-report=html

Test specific modules:

pytest tests/test_strategy.py

Writing Tests#

Place tests in the tests/ directory
Use descriptive test names: test_streaming_strategy_returns_correct_indices
Test both success and failure cases
Add tests for any new functionality
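
As an illustration of these guidelines, here is a small sketch of a test module with one success case and one failure case. The helper `streaming_indices` is hypothetical and defined inline so the example is self-contained; it is not part of the scDataset API.

```python
import pytest


def streaming_indices(n_items: int) -> list:
    """Hypothetical helper: return the indices 0..n_items-1 in order."""
    if n_items < 0:
        raise ValueError("n_items must be non-negative")
    return list(range(n_items))


def test_streaming_indices_returns_indices_in_order():
    # Success case: indices come back sequentially.
    assert streaming_indices(4) == [0, 1, 2, 3]


def test_streaming_indices_rejects_negative_length():
    # Failure case: invalid input raises a clear error.
    with pytest.raises(ValueError, match="non-negative"):
        streaming_indices(-1)
```

Each test name states the behavior being checked, so a failing test is readable from the pytest summary alone.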

Documentation#

Build documentation locally:

cd docs
make html

View the built documentation:

open build/html/index.html  # On macOS
# On Linux: xdg-open build/html/index.html

Writing Documentation#

Use reStructuredText format for documentation files
Add docstrings to all public functions and classes
Follow NumPy docstring style
Include examples in docstrings when helpful

Example docstring:

def my_function(param1: int, param2: str = "default") -> bool:
    """
    Brief description of the function.

    Parameters
    ----------
    param1 : int
        Description of param1.
    param2 : str, default="default"
        Description of param2.

    Returns
    -------
    bool
        Description of return value.

    Examples
    --------
    >>> my_function(42, "test")
    True
    """
    return True  # Placeholder body so the doctest example above passes.

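Examples in docstrings are easiest to trust when they are executed. The sketch below shows one way to do that with the standard-library `doctest` module. Whether scDataset runs doctests as part of its test suite is not stated here, and both `clamp_batch_size` and `count_doctest_failures` are hypothetical names used only for illustration.

```python
import doctest


def clamp_batch_size(batch_size: int, n_cells: int) -> int:
    """
    Limit a batch size to the number of available cells.

    Parameters
    ----------
    batch_size : int
        Requested batch size. Must be positive.
    n_cells : int
        Number of cells in the dataset.

    Returns
    -------
    int
        The smaller of ``batch_size`` and ``n_cells``.

    Examples
    --------
    >>> clamp_batch_size(64, 1000)
    64
    >>> clamp_batch_size(64, 10)
    10
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return min(batch_size, n_cells)


def count_doctest_failures(func) -> int:
    """Run the examples in ``func``'s docstring and return how many failed."""
    test = doctest.DocTestParser().get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    return doctest.DocTestRunner(verbose=False).run(test).failed
```

A result of 0 from `count_doctest_failures(clamp_batch_size)` means every example in the docstring produced the output it documents.
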
Types of Contributions#

Bug Reports#

When reporting bugs, please include:

Clear description of the problem
Minimal example to reproduce the issue
System information (OS, Python version, package versions)
Expected vs actual behavior
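
The system information can be gathered with a few lines of standard-library Python. This is a minimal sketch: `collect_system_info` is a hypothetical helper, and the distribution name `scDataset` is an assumption, so adjust it to whatever name `pip` reports in your environment.

```python
import platform
from importlib import metadata


def collect_system_info(packages=("scDataset", "torch", "numpy")) -> dict:
    """Return OS, Python, and package versions for a bug report."""
    info = {
        "os": platform.platform(),
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report missing packages explicitly instead of failing.
            info[name] = "not installed"
    return info


def format_system_info(info: dict) -> str:
    """Render the collected information as one 'key: value' line per entry."""
    return "\n".join(f"{key}: {value}" for key, value in info.items())
```

Pasting the output of `print(format_system_info(collect_system_info()))` into the issue covers the OS, Python version, and package versions in one step.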

Feature Requests#

For new features:

Describe the use case and motivation
Provide examples of how it would be used
Consider backwards compatibility

Code Contributions#

Start with an issue to discuss the change
Keep changes focused: one feature or fix per PR
Add tests for new functionality
Update documentation as needed
Follow code style guidelines

Pull Request Process#

Create an issue first (unless it’s a small fix)
Fork and clone the repository
Create a feature branch
Make your changes:
- Write code
- Add tests
- Update documentation
- Run tests and style checks

Commit your changes:

git add .
git commit -m "feat: add new sampling strategy"

Push to your fork:

git push origin feature/your-feature-name

Create a Pull Request on GitHub

Commit Message Guidelines#

We follow Conventional Commits:

feat: New feature
fix: Bug fix
docs: Documentation changes
test: Adding tests
refactor: Code refactoring
style: Code style changes
ci: CI/CD changes

Examples:

feat: add weighted sampling strategy
fix: resolve memory leak in block shuffling
docs: improve quickstart examples
test: add integration tests for scDataset
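
To check a commit subject against this convention before pushing, a small validator is enough. The sketch below is illustrative only: `is_conventional_subject` is a hypothetical helper, nothing here says the project enforces the format automatically, and the allowed types are limited to the ones listed above. The optional `(scope)` and `!` marker come from the Conventional Commits format.

```python
import re

# Commit types listed in this guide.
ALLOWED_TYPES = ("feat", "fix", "docs", "test", "refactor", "style", "ci")

# type, optional (scope), optional "!" for breaking changes, then ": description".
_SUBJECT_RE = re.compile(
    r"^(?P<type>[a-z]+)(?:\((?P<scope>[^()]+)\))?(?P<breaking>!)?: (?P<description>\S.*)$"
)


def is_conventional_subject(subject: str) -> bool:
    """Return True if ``subject`` looks like '<type>[(scope)][!]: <description>'."""
    match = _SUBJECT_RE.match(subject)
    return bool(match) and match.group("type") in ALLOWED_TYPES
```

For example, `is_conventional_subject("feat: add weighted sampling strategy")` is true, while a subject such as "Added new stuff" is rejected because it has no recognized type prefix.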

Review Process#

All submissions require review. We use GitHub pull requests for this purpose.

Automated checks must pass (tests, linting, etc.)
At least one maintainer must approve
All conversations must be resolved
Documentation must be updated if needed

Release Process#

Releases are handled by maintainers:

Version bump following semantic versioning
Update changelog
Create GitHub release
Publish to PyPI
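
For readers unfamiliar with semantic versioning, the version bump in the first step amounts to simple arithmetic on a MAJOR.MINOR.PATCH string. The function below is a sketch for illustration; `bump_version` is a hypothetical name, not a tool the maintainers are known to use, and it ignores pre-release and build suffixes.

```python
def bump_version(version: str, part: str) -> str:
    """Return ``version`` (MAJOR.MINOR.PATCH) with ``part`` incremented."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        # Incompatible API change: reset minor and patch.
        return f"{major + 1}.0.0"
    if part == "minor":
        # Backwards-compatible feature: reset patch.
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        # Backwards-compatible bug fix.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")
```

With this sketch, `bump_version("1.4.2", "minor")` gives `"1.5.0"` and `bump_version("1.4.2", "patch")` gives `"1.4.3"`.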

Community Guidelines#

Be respectful and inclusive
Follow the code of conduct
Help others and share knowledge
Stay on topic in discussions

Getting Help#

GitHub Issues: Bug reports and feature requests
GitHub Discussions: Questions and general discussion
Documentation: Check the docs first!

Thank you for contributing to scDataset! 🎉
