Development Guide
This guide covers everything you need to contribute to SDG Hub, from initial setup through CI requirements and contribution workflows.
Setup
Prerequisites
- Python 3.10 or later
- uv package manager
Clone and Install
git clone https://github.com/Red-Hat-AI-Innovation-Team/sdg_hub.git
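cd sdg_hub
Install with development dependencies:
# quotes keep shells such as zsh from treating [dev] as a glob pattern
uv pip install ".[dev]"
# or, to install from uv.lock into the project's .venv:
uv sync --extra dev
Pre-commit Hooks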
Pre-commit hooks are required for all contributors. Install them immediately after cloning:
uv run pre-commit install
uv run pre-commit install --hook-type commit-msg
Testing
All commands below are verified against pyproject.toml and the CI workflow in .github/workflows/test.yml.
Unit Tests
uv run pytest tests/blocks tests/connectors tests/flow tests/utils -m "not (examples or slow)"
Unit Tests with Coverage
uv run pytest --cov=sdg_hub --cov-report=term tests/blocks tests/connectors tests/flow tests/utils
Integration Tests
Integration tests require API keys and the integration extra:
uv sync --extra dev --extra integration
uv run pytest tests/integration -v -s
Running Specific Tests
# Single file
uv run pytest tests/blocks/test_specific_file.py
# Pattern match
uv run pytest -k "test_pattern"
Pytest Configuration
The following settings are defined in pyproject.toml under [tool.pytest.ini_options]:
| Setting | Value |
|---|---|
| asyncio_mode | auto |
| asyncio_default_fixture_loop_scope | function |
| LOG_LEVEL | WARNING (via pytest-env) |
Custom markers:
- integration -- notebook-based end-to-end tests
- slow -- long-running tests (>60 s)
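With these settings, a test module can mark long-running tests and write async tests without extra decorators. The sketch below assumes pytest and pytest-asyncio are installed (pytest-asyncio is the plugin that reads asyncio_mode); the test names and the `fake_llm_call` helper are hypothetical.

```python
import asyncio

import pytest


async def fake_llm_call(prompt: str) -> str:
    # Hypothetical stand-in for an async LLM client call
    await asyncio.sleep(0)
    return prompt.upper()


@pytest.mark.slow
def test_full_flow_run() -> None:
    # Deselected by the default unit-test command: -m "not (examples or slow)"
    assert sum(range(10)) == 45


async def test_async_call() -> None:
    # With asyncio_mode = auto, pytest-asyncio collects this coroutine as a test
    # without an explicit @pytest.mark.asyncio decorator.
    assert await fake_llm_call("hi") == "HI"
```

To run only the slow tests, select the marker instead of excluding it: `uv run pytest -m slow`.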
Linting and Formatting
Ruff
Ruff handles both linting and formatting. Configuration lives in pyproject.toml under [tool.ruff].
# Lint with auto-fix
uv run ruff check --fix src/ tests/
# Format
uv run ruff format src/ tests/
# Check only (no changes) -- same as CI
uv run ruff check src/ tests/
uv run ruff format --check src/ tests/
Enabled rule sets: E (pycodestyle), F (Pyflakes), I (isort), N (pep8-naming), Q (flake8-quotes), TID (flake8-tidy-imports).
Target version: Python 3.10. Line length: 88.
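As a rough illustration of what the enabled rule sets expect, the small module below is written to pass them under ruff's default options for each family: sorted, grouped imports (I), double quotes (Q), CapWords classes and snake_case functions (N). The project's exact per-rule options in [tool.ruff] may differ, and the `PromptConfig` and `render_prompt` names are hypothetical.

```python
# Imports: standard library, alphabetized, straight imports before from-imports
# (I). Relative imports from parent packages would be flagged by TID252 under
# its default setting.
import json
from dataclasses import dataclass, field


@dataclass
class PromptConfig:  # CapWords class name (N801)
    template: str
    variables: list[str] = field(default_factory=list)


def render_prompt(config: PromptConfig, values: dict[str, str]) -> str:
    """Fill the template with values; snake_case function name (N802)."""
    missing = [name for name in config.variables if name not in values]
    if missing:
        # Double-quoted string literals satisfy flake8-quotes (Q000)
        raise KeyError(f"missing variables: {json.dumps(missing)}")
    return config.template.format(**values)
```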
Mypy
uv run mypy src/sdg_hub
The CI workflow runs uv run mypy src/sdg_hub --show-error-codes. Configuration is in pyproject.toml under [tool.mypy] with disable_error_code = ["import-not-found", "import-untyped"], so missing or untyped third-party imports are not reported as errors.
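With --show-error-codes, each mypy error ends with a bracketed code such as [union-attr], which is also the code a targeted `# type: ignore[...]` comment refers to. The hypothetical helper below shows the usual alternative to suppressing such an error: narrowing an Optional value so mypy has nothing to report.

```python
from typing import Optional


def first_word(text: Optional[str]) -> str:
    """Return the first whitespace-separated word, or an empty string."""
    # Calling text.split() before the None check would produce a
    # [union-attr] error, because None has no .split() method.
    if text is None:
        return ""
    words = text.split()
    return words[0] if words else ""
```

If a suppression is genuinely needed, prefer a scoped `# type: ignore[union-attr]` over a bare `# type: ignore` so unrelated errors on the same line still surface.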
Make Targets
The Makefile provides three convenience targets; actionlint and md-lint are optional and require external tools:
make actionlint # Lint GitHub Actions (requires actionlint + shellcheck)
make md-lint # Lint markdown files (requires podman)
make verify # Run ruff check + ruff format --check
Pre-commit Hooks
The following hooks are configured in .pre-commit-config.yaml and run automatically on each commit:
| Hook | Stage | Source | What It Does |
|---|---|---|---|
| uv-lock | pre-commit | astral-sh/uv-pre-commit | Keeps uv.lock in sync with pyproject.toml |
| ruff | pre-commit | astral-sh/ruff-pre-commit | Lints Python code with auto-fix (--fix) |
| ruff-format | pre-commit | astral-sh/ruff-pre-commit | Formats Python code |
| conventional-pre-commit | commit-msg | compilerla/conventional-pre-commit | Validates commit message format |
The conventional-pre-commit hook runs at the commit-msg stage, which is why it requires the separate install command (uv run pre-commit install --hook-type commit-msg).
CI Requirements
All pull requests must pass these checks before merging. The table below lists each check with its exact command and workflow file.
| Check | Command | Workflow |
|---|---|---|
| Ruff formatting | ruff format --check src/ tests/ | lint.yml |
| Ruff linting | ruff check src/ tests/ | lint.yml |
| Mypy type checking | mypy src/sdg_hub --show-error-codes | lint.yml |
| Unit tests | pytest --cov=sdg_hub tests/blocks tests/connectors tests/flow tests/utils -m "not (examples or slow)" | test.yml |
| Conventional commits | commitlint | commitlint.yml |
| Lock file sync | uv lock --check | lock.yml |
| Markdown lint | markdownlint-cli2 | docs.yml |
| GitHub Actions lint | actionlint | actionlint.yml |
| Integration tests | pytest tests/integration -v -s | integration-test.yml (gated) |
Unit tests run on Python 3.10 and 3.11 (Ubuntu), plus Python 3.11 (macOS).
Integration tests are gated: they run on push to main, on workflow_dispatch, or on PRs with the run-integration-tests label when relevant paths change.
Contributing Blocks
Blocks live under src/sdg_hub/core/blocks/ in category directories:
- llm/ -- LLM-powered blocks (chat, prompt building, text parsing)
- transform/ -- data transformation blocks (column operations, text manipulation)
- filtering/ -- data filtering blocks with quality thresholds
- agent/ -- agent framework integration blocks
Creating a Block
- Create a new file in the appropriate category directory.
- Inherit from BaseBlock and implement the generate() method.
- Register with the @BlockRegistry.register() decorator.
- Add tests in tests/blocks/<category>/.
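The registration step follows a common decorator-registry pattern. The stdlib-only sketch below is hypothetical and is not sdg_hub's actual BlockRegistry; it only illustrates how decorating a class with a name, category, and description makes it discoverable by name (presumably what lets a block_type entry in flow.yaml refer to it).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class BlockEntry:
    block_class: type
    category: str
    description: str


class ToyBlockRegistry:
    """Hypothetical name -> class registry; not sdg_hub's BlockRegistry."""

    _entries: dict[str, BlockEntry] = {}

    @classmethod
    def register(
        cls, name: str, category: str, description: str
    ) -> Callable[[type], type]:
        def decorator(block_class: type) -> type:
            if name in cls._entries:
                raise ValueError(f"block {name!r} is already registered")
            cls._entries[name] = BlockEntry(block_class, category, description)
            return block_class  # the decorated class is returned unchanged

        return decorator

    @classmethod
    def get(cls, name: str) -> type:
        return cls._entries[name].block_class

    @classmethod
    def list_category(cls, category: str) -> list[str]:
        return sorted(n for n, e in cls._entries.items() if e.category == category)


@ToyBlockRegistry.register("UppercaseBlock", "transform", "Uppercase a text column")
class UppercaseBlock:
    pass
```

Because the decorator returns the class unchanged, a registered block can still be imported and instantiated directly, as the test template does.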
Minimal template:
from typing import Any

import pandas as pd

from sdg_hub.core.blocks.base import BaseBlock
from sdg_hub.core.blocks.registry import BlockRegistry


@BlockRegistry.register("MyBlock", "transform", "Short description")
class MyBlock(BaseBlock):
    """What this block does.

    Parameters
    ----------
    custom_param : str
        Description of the parameter.
    """

    custom_param: str = "default"

    def generate(self, samples: pd.DataFrame, **kwargs: Any) -> pd.DataFrame:
        # Implementation here
        return samples
Block Test Template
import pandas as pd
import pytest

from sdg_hub.core.blocks.transform.my_block import MyBlock


class TestMyBlock:
    def test_basic(self):
        block = MyBlock(
            block_name="test",
            input_cols=["text"],
            output_cols=["result"],
        )
        df = pd.DataFrame({"text": ["hello", "world"]})
        result = block(df)
        assert "result" in result.columns

    def test_missing_column(self):
        block = MyBlock(
            block_name="test",
            input_cols=["missing"],
            output_cols=["result"],
        )
        df = pd.DataFrame({"other": ["data"]})
        with pytest.raises(Exception):
            block(df)
Block Checklist
- Block placed in the correct category directory
- Inherits from BaseBlock and implements generate()
- Registered with @BlockRegistry.register(name, category, description)
- Pydantic field validation for configuration
- Tests cover success, error, and edge cases
- Docstring with parameter descriptions
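To make the checklist concrete, here is a self-contained sketch of a finished transform block. Since sdg_hub's BaseBlock cannot be assumed here, a hypothetical `ToyBaseBlock` built on pydantic stands in for it, reusing the block_name, input_cols, and output_cols fields from the test template, and the registration decorator is omitted. The real base class may define these fields, their validation, and its call behavior differently. The sketch assumes pandas and pydantic v2 are installed; `TruncateTextBlock` is hypothetical.

```python
from typing import Any

import pandas as pd
from pydantic import BaseModel, field_validator


class ToyBaseBlock(BaseModel):
    """Hypothetical stand-in for sdg_hub's BaseBlock."""

    block_name: str
    input_cols: list[str]
    output_cols: list[str]

    def generate(self, samples: pd.DataFrame, **kwargs: Any) -> pd.DataFrame:
        raise NotImplementedError

    def __call__(self, samples: pd.DataFrame, **kwargs: Any) -> pd.DataFrame:
        # Fail fast on missing input columns before doing any work
        missing = [c for c in self.input_cols if c not in samples.columns]
        if missing:
            raise ValueError(f"{self.block_name}: missing input columns {missing}")
        return self.generate(samples, **kwargs)


class TruncateTextBlock(ToyBaseBlock):
    """Truncate a text column to at most ``max_chars`` characters.

    Parameters
    ----------
    max_chars : int
        Maximum number of characters to keep; must be positive.
    """

    max_chars: int = 100

    @field_validator("max_chars")
    @classmethod
    def check_positive(cls, value: int) -> int:
        # pydantic wraps this ValueError in a ValidationError at construction
        if value <= 0:
            raise ValueError("max_chars must be positive")
        return value

    def generate(self, samples: pd.DataFrame, **kwargs: Any) -> pd.DataFrame:
        result = samples.copy()  # avoid mutating the caller's DataFrame
        source, target = self.input_cols[0], self.output_cols[0]
        result[target] = result[source].str.slice(0, self.max_chars)
        return result
```

pydantic's ValidationError subclasses ValueError, so invalid configuration and missing columns can both be tested with `pytest.raises(ValueError)`.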
Contributing Flows
Flows are defined as YAML files under src/sdg_hub/flows/.
Directory Structure
src/sdg_hub/flows/<category>/<use_case>/<variant>/
    flow.yaml
    prompts/
        prompt_template.yaml
flow.yaml Requirements
Every flow must include:
metadata:
  name: "flow_name"
  version: "1.0.0"
  author: "Author Name"
  description: "What this flow does"
parameters:
  param_name:
    type: "string"
    default: "value"
    description: "What this parameter controls"
blocks:
  - block_type: "BlockTypeName"
    block_config:
      block_name: "unique_name"
      # block-specific config
Flow Checklist
- Directory structure follows the convention above
- flow.yaml includes complete metadata
- Required input columns documented
- Supporting prompt templates included
- Integration tests validate execution
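A quick way to catch incomplete flow.yaml files before running integration tests is a structural pre-check like the hypothetical one below. It works on the already-parsed mapping (for example, the result of PyYAML's `yaml.safe_load`), checks only the fields shown in the flow.yaml template, treats block_name as required to be unique (as the "unique_name" placeholder suggests), and is not sdg_hub's own flow validator.

```python
from typing import Any

REQUIRED_METADATA = ("name", "version", "author", "description")


def check_flow_config(config: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a parsed flow.yaml mapping."""
    problems: list[str] = []
    metadata = config.get("metadata") or {}
    for key in REQUIRED_METADATA:
        if not metadata.get(key):
            problems.append(f"metadata.{key} is missing or empty")

    blocks = config.get("blocks") or []
    if not blocks:
        problems.append("blocks must be a non-empty list")
    seen: set[str] = set()
    for index, block in enumerate(blocks):
        if not block.get("block_type"):
            problems.append(f"blocks[{index}].block_type is missing")
        name = (block.get("block_config") or {}).get("block_name")
        if not name:
            problems.append(f"blocks[{index}].block_config.block_name is missing")
        elif name in seen:
            problems.append(f"duplicate block_name {name!r}")
        else:
            seen.add(name)
    return problems
```

An empty result means the file has the expected shape; it says nothing about whether the blocks themselves will run, which is what the integration tests are for.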
Contributing Connectors
Connectors handle communication with external agent frameworks and live under src/sdg_hub/core/connectors/agent/.
Creating a Connector
- Create a new file in src/sdg_hub/core/connectors/agent/.
- Inherit from BaseAgentConnector.
- Register with @ConnectorRegistry.register("name").
- Implement required methods.
from sdg_hub.core.connectors.agent.base import BaseAgentConnector
from sdg_hub.core.connectors.registry import ConnectorRegistry


@ConnectorRegistry.register("myframework")
class MyFrameworkConnector(BaseAgentConnector):
    def build_request(self, **kwargs):
        # Build the HTTP request for the agent framework
        ...

    def parse_response(self, response):
        # Parse the raw response
        return response

    @classmethod
    def extract_text(cls, response):
        # Extract text from agent response (used by AgentResponseExtractorBlock)
        return None

    @classmethod
    def extract_session_id(cls, response):
        # Extract the session ID from the agent response, if the framework has one
        return None

    @classmethod
    def extract_tool_trace(cls, response):
        # Extract the trace of tool calls from the agent response, if available
        return None
The extract_* class methods let AgentResponseExtractorBlock pull structured data (text, session ID, tool trace) out of agent responses, so supporting a new framework does not require changing the block's code.
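The sketch below illustrates that dispatch pattern with hypothetical names only: a toy registry, a connector for an imagined framework whose responses look like `{"output": ..., "session": ..., "steps": [...]}`, and an `extract_fields` helper playing the role of the extractor block. Neither the response shape nor the helper comes from sdg_hub.

```python
from typing import Any, Callable, Optional

CONNECTORS: dict[str, Any] = {}


def register(name: str) -> Callable[[type], type]:
    def decorator(cls: type) -> type:
        CONNECTORS[name] = cls
        return cls

    return decorator


@register("toyframework")
class ToyFrameworkConnector:
    """Hypothetical connector; the response schema is invented for illustration."""

    @classmethod
    def extract_text(cls, response: dict[str, Any]) -> Optional[str]:
        return response.get("output")

    @classmethod
    def extract_session_id(cls, response: dict[str, Any]) -> Optional[str]:
        return response.get("session")

    @classmethod
    def extract_tool_trace(cls, response: dict[str, Any]) -> Optional[list[Any]]:
        return response.get("steps")


def extract_fields(connector_name: str, response: dict[str, Any]) -> dict[str, Any]:
    # The caller only knows the connector's registered name; the class methods
    # decide how to read that framework's response format.
    connector = CONNECTORS[connector_name]
    return {
        "text": connector.extract_text(response),
        "session_id": connector.extract_session_id(response),
        "tool_trace": connector.extract_tool_trace(response),
    }
```

Supporting another framework then means registering one more connector class; `extract_fields` stays the same.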
Git Workflow
Conventional Commits
Commit messages must follow Conventional Commits format. This is enforced by the conventional-pre-commit hook at the commit-msg stage and validated in CI by commitlint.
Format: <type>(<scope>): <description>
Allowed types: feat, fix, docs, style, refactor, perf, test, build, ci, chore, revert.
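A simplified Python approximation of the header check is below. It is not the conventional-pre-commit or commitlint implementation, both of which apply rules this sketch ignores; it treats the scope as optional and accepts a "!" breaking-change marker, as the Conventional Commits specification allows.

```python
import re

ALLOWED_TYPES = "feat fix docs style refactor perf test build ci chore revert".split()
TYPE_PATTERN = "|".join(ALLOWED_TYPES)

# <type>(<scope>): <description>, with optional scope and optional "!"
HEADER_RE = re.compile(rf"^(?:{TYPE_PATTERN})(?:\([\w\-./]+\))?!?: \S.*$")


def is_valid_header(message: str) -> bool:
    """Check only the first line of a commit message."""
    header = message.splitlines()[0] if message else ""
    return HEADER_RE.match(header) is not None
```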
Examples:
feat(blocks): add TextSummarizerBlock for document summarization
fix(flows): correct parameter validation in QA generation flow
docs(blocks): update LLM block examples with new model config
test(connectors): add integration tests for LangGraph connector
Branch Naming
- feature/<description> -- new features (blocks, flows, connectors)
- fix/<description> -- bug fixes
- docs/<description> -- documentation changes
- chore/<description> -- maintenance tasks
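If you want to check a branch name before pushing, a hypothetical helper like the one below encodes the four prefixes. Nothing here says branch names are enforced by a hook or CI job, and the allowed description characters (lowercase letters, digits, ".", "_", "-") are an assumption, so treat this purely as a local convenience.

```python
import re
from typing import Optional

# Assumed description charset: lowercase letters, digits, ".", "_", "-"
BRANCH_RE = re.compile(r"^(feature|fix|docs|chore)/[a-z0-9][a-z0-9._-]*$")


def branch_kind(branch: str) -> Optional[str]:
    """Return the branch prefix (e.g. "feature"), or None if it doesn't match."""
    match = BRANCH_RE.match(branch)
    return match.group(1) if match else None
```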
Pull Request Process
- Create a feature branch from main.
- Implement changes with tests.
- Run tests and linting locally:
uv run pytest tests/blocks tests/connectors tests/flow tests/utils -m "not (examples or slow)"
uv run ruff check src/ tests/
uv run ruff format --check src/ tests/
uv run mypy src/sdg_hub
- Open a PR with a clear description.
- Address review feedback.
- Squash and merge when approved.
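To run the local test and lint commands in one go, a small script can invoke them in sequence and stop at the first failure. This sketch assumes uv is on PATH and the script runs from the repository root; `run_checks` and `LOCAL_CHECKS` are hypothetical names, not part of the repository.

```python
import subprocess
import sys

UNIT_TEST_DIRS = ["tests/blocks", "tests/connectors", "tests/flow", "tests/utils"]

# Same commands as the local-check step of the PR process. Arguments are passed
# as a list, so the -m expression needs no shell quoting.
LOCAL_CHECKS: list[list[str]] = [
    ["uv", "run", "pytest", *UNIT_TEST_DIRS, "-m", "not (examples or slow)"],
    ["uv", "run", "ruff", "check", "src/", "tests/"],
    ["uv", "run", "ruff", "format", "--check", "src/", "tests/"],
    ["uv", "run", "mypy", "src/sdg_hub"],
]


def run_checks(checks: list[list[str]]) -> int:
    """Run each command in order; return the first non-zero exit code, else 0."""
    for command in checks:
        print("$", " ".join(command), flush=True)
        try:
            returncode = subprocess.run(command).returncode
        except FileNotFoundError:
            print(f"command not found: {command[0]}", file=sys.stderr)
            return 127
        if returncode != 0:
            return returncode
    return 0
```

Saved as a script, it would end with `sys.exit(run_checks(LOCAL_CHECKS))` so the shell sees the failing check's exit code.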
Docstring Guidelines
Docstrings are optional but recommended for public API functions, complex functions, and core framework components. Use NumPy-style format:
def my_function(param1: str, param2: int = 5) -> bool:
    """One-line summary.

    Longer description if needed.

    Parameters
    ----------
    param1 : str
        Description of param1.
    param2 : int, optional
        Description of param2 (default: 5).

    Returns
    -------
    bool
        Description of return value.

    Raises
    ------
    ValueError
        When invalid parameters are provided.
    """
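For reference, here is the same docstring layout applied to a small, complete function. The Examples section is an optional NumPy-docstring convention that these guidelines do not require; it is included only because its doctest line can be checked with `python -m doctest`. The function itself is hypothetical.

```python
def truncate_words(text: str, max_words: int = 5) -> str:
    """Keep at most ``max_words`` whitespace-separated words of ``text``.

    Words are split on any whitespace and rejoined with single spaces.

    Parameters
    ----------
    text : str
        Input text.
    max_words : int, optional
        Maximum number of words to keep (default: 5).

    Returns
    -------
    str
        The truncated text.

    Raises
    ------
    ValueError
        If ``max_words`` is negative.

    Examples
    --------
    >>> truncate_words("one two three four", max_words=2)
    'one two'
    """
    if max_words < 0:
        raise ValueError("max_words must be non-negative")
    return " ".join(text.split()[:max_words])
```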