YAML Reference
Every flow is defined in a single flow.yaml file. This page documents every field, with types and defaults verified against source.
Top-Level Structure
A flow YAML has three top-level sections: metadata, parameters (optional), and blocks.
metadata:
# Flow identity and configuration (see below)
parameters:
# Optional runtime parameter definitions (see below)
blocks:
# Ordered list of block definitions (see below)The FlowValidator (source: src/sdg_hub/core/flow/validation.py) enforces:
blocksis required and must be a non-empty list.- Each block must have
block_typeandblock_configkeys. - Each
block_configmust containblock_name. metadata, if present, must be a dict with a non-emptynamestring.
metadata Section
Source: src/sdg_hub/core/flow/metadata.py -- class FlowMetadata
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
name | str | Yes | -- | Human-readable flow name. Minimum 1 character. |
id | str | No | auto-generated | Unique identifier. Must be lowercase, alphanumeric with hyphens, no leading/trailing hyphens. Auto-generated from name if omitted. |
description | str | No | "" | What the flow does. |
version | str | No | "1.0.0" | Semantic version matching ^\d+\.\d+\.\d+(-[a-zA-Z0-9.-]+)?$. |
author | str | No | "" | Author or contributor name. |
license | str | No | "Apache-2.0" | License identifier. |
tags | list[str] | No | [] | Tags for categorization. Automatically lowercased. |
recommended_models | RecommendedModels | No | None | Model recommendations (see below). |
dataset_requirements | DatasetRequirements | No | None | Input dataset validation rules (see below). |
output_columns | list[str] | No | None | Columns to keep in final output. Original input columns are always preserved. When set, intermediate columns are dropped during and after execution. Must be non-empty if specified; omit entirely to keep all columns. |
recommended_models
Source: src/sdg_hub/core/flow/metadata.py -- class RecommendedModels
recommended_models:
default: "openai/gpt-oss-120b"
compatible:
- "meta-llama/Llama-3.3-70B-Instruct"
- "microsoft/phi-4"
experimental:
- "gpt-4o"| Field | Type | Required | Default | Description |
|---|---|---|---|---|
default | str | Yes | -- | Primary recommended model. Cannot be empty. |
compatible | list[str] | No | [] | Models known to work well. |
experimental | list[str] | No | [] | Models not extensively tested. |
Model selection priority: default first, then compatible in order, then experimental in order.
dataset_requirements
Source: src/sdg_hub/core/flow/metadata.py -- class DatasetRequirements
dataset_requirements:
required_columns:
- "document"
- "document_outline"
optional_columns:
- "metadata"
min_samples: 1
max_samples: 10000
column_types:
document: "string"
description: "Input documents for processing"| Field | Type | Required | Default | Description |
|---|---|---|---|---|
required_columns | list[str] | No | [] | Columns that must be present. Flow fails if missing. |
optional_columns | list[str] | No | [] | Columns that can enhance performance. |
min_samples | int | No | 1 | Minimum row count. Must be >= 1. |
max_samples | int | No | None | Maximum row count. Must be >= min_samples if set. |
column_types | dict[str, str] | No | {} | Expected types for columns (documentation only). |
description | str | No | "" | Human-readable description of requirements. |
parameters Section
Optional. Defines flow-level parameters that can be overridden at runtime. Each parameter is a key-value pair where the key is the parameter name.
parameters:
temperature:
type: "float"
default: 0.7
description: "Sampling temperature for generation"
required: false
max_tokens:
type: "integer"
default: 2048
description: "Maximum token count"| Field | Type | Required | Description |
|---|---|---|---|
default | any | Yes | Default value for the parameter. |
type | str | No | Type hint (e.g., "string", "float", "integer"). |
description | str | No | Human-readable description. |
required | bool | No | Whether the parameter must be provided at runtime. |
blocks Section
An ordered list of block definitions. Each entry specifies a block type and its configuration.
blocks:
- block_type: "PromptBuilderBlock"
block_config:
block_name: "build_summary_prompt"
input_cols:
- "text"
output_cols: "summary_prompt"
prompt_config_path: "prompts/summarize.yaml"
- block_type: "LLMChatBlock"
block_config:
block_name: "generate_summary"
input_cols: "summary_prompt"
output_cols: "raw_summary"
max_tokens: 1024
temperature: 0.3
async_mode: trueBlock entry fields
| Field | Type | Required | Description |
|---|---|---|---|
block_type | str | Yes | Class name of the block (must exist in BlockRegistry). Valid values: AgentBlock, AgentResponseExtractorBlock, ColumnValueFilterBlock, DuplicateColumnsBlock, IndexBasedMapperBlock, JSONParserBlock, JSONStructureBlock, LLMChatBlock, LLMResponseExtractorBlock, MCPAgentBlock, MeltColumnsBlock, PromptBuilderBlock, RegexParserBlock, RenameColumnsBlock, RowMultiplierBlock, SamplerBlock, TagParserBlock, TextConcatBlock, UniformColumnValueSetter. |
block_config | dict | Yes | Configuration passed to the block constructor. |
block_config.block_name | str | Yes | Unique name within the flow. |
The block_config contents depend on the block type. Common fields include input_cols, output_cols, and block-specific parameters. See the Block documentation for per-block configuration.
Path resolution
Path fields in block_config (config_path, config_paths, prompt_config_path) are resolved relative to the directory containing the flow.yaml file. For example, if flow.yaml is at flows/my_flow/flow.yaml and a block specifies prompt_config_path: prompts/summary.yaml, it resolves to flows/my_flow/prompts/summary.yaml.
Complete Annotated Example
This is the Structured Text Insights Extraction Flow, taken from src/sdg_hub/flows/text_analysis/structured_insights/flow.yaml:
metadata:
id: green-clay-812
name: "Structured Text Insights Extraction Flow"
description: >-
Multi-step pipeline for extracting structured insights from text including
summary, keywords, entities, and sentiment analysis combined into a JSON output
version: "1.0.0"
author: "SDG Hub Contributors"
recommended_models:
default: "openai/gpt-oss-120b"
compatible:
- "meta-llama/Llama-3.3-70B-Instruct"
- "microsoft/phi-4"
- "mistralai/Mixtral-8x7B-Instruct-v0.1"
experimental:
- "gpt-4o"
tags:
- "text-analysis"
- "summarization"
- "nlp"
- "structured-output"
- "insights"
- "sentiment-analysis"
- "entity-extraction"
- "keyword-extraction"
license: "Apache-2.0"
dataset_requirements:
required_columns:
- "text"
description: >-
Input dataset should contain text content for analysis. Each text should be
substantial enough for meaningful analysis (minimum 50 words recommended).
blocks:
# Step 1: Build a prompt for summary extraction
- block_type: "PromptBuilderBlock"
block_config:
block_name: "build_summary_prompt"
input_cols:
- "text"
output_cols: "summary_prompt"
prompt_config_path: "prompts/summarize.yaml"
# Step 2: Generate the summary via LLM
- block_type: "LLMChatBlock"
block_config:
block_name: "generate_summary"
input_cols: "summary_prompt"
output_cols: "raw_summary"
max_tokens: 1024
temperature: 0.3
async_mode: true
# Step 3: Extract the assistant message content
- block_type: "LLMResponseExtractorBlock"
block_config:
block_name: "extract_summary"
input_cols: "raw_summary"
extract_content: true
expand_lists: true
# Step 4: Parse the summary from tagged output
- block_type: "TagParserBlock"
block_config:
block_name: "parse_summary"
input_cols: "extract_summary_content"
output_cols: "summary"
start_tags:
- "[SUMMARY]"
end_tags:
- "[/SUMMARY]"
# Steps 5-8: Same pattern for keywords extraction
- block_type: "PromptBuilderBlock"
block_config:
block_name: "build_keywords_prompt"
input_cols:
- "text"
output_cols: "keywords_prompt"
prompt_config_path: "prompts/extract_keywords.yaml"
- block_type: "LLMChatBlock"
block_config:
block_name: "generate_keywords"
input_cols: "keywords_prompt"
output_cols: "raw_keywords"
max_tokens: 512
temperature: 0.3
async_mode: true
- block_type: "LLMResponseExtractorBlock"
block_config:
block_name: "extract_keywords"
input_cols: "raw_keywords"
extract_content: true
expand_lists: true
- block_type: "TagParserBlock"
block_config:
block_name: "parse_keywords"
input_cols: "extract_keywords_content"
output_cols: "keywords"
start_tags:
- "[KEYWORDS]"
end_tags:
- "[/KEYWORDS]"
# Steps 9-12: Entities extraction (same pattern)
- block_type: "PromptBuilderBlock"
block_config:
block_name: "build_entities_prompt"
input_cols:
- "text"
output_cols: "entities_prompt"
prompt_config_path: "prompts/extract_entities.yaml"
- block_type: "LLMChatBlock"
block_config:
block_name: "generate_entities"
input_cols: "entities_prompt"
output_cols: "raw_entities"
max_tokens: 1024
temperature: 0.3
async_mode: true
- block_type: "LLMResponseExtractorBlock"
block_config:
block_name: "extract_entities"
input_cols: "raw_entities"
extract_content: true
expand_lists: true
- block_type: "TagParserBlock"
block_config:
block_name: "parse_entities"
input_cols: "extract_entities_content"
output_cols: "entities"
start_tags:
- "[ENTITIES]"
end_tags:
- "[/ENTITIES]"
# Steps 13-16: Sentiment analysis (same pattern)
- block_type: "PromptBuilderBlock"
block_config:
block_name: "build_sentiment_prompt"
input_cols:
- "text"
output_cols: "sentiment_prompt"
prompt_config_path: "prompts/analyze_sentiment.yaml"
- block_type: "LLMChatBlock"
block_config:
block_name: "generate_sentiment"
input_cols: "sentiment_prompt"
output_cols: "raw_sentiment"
max_tokens: 256
temperature: 0.1
async_mode: true
- block_type: "LLMResponseExtractorBlock"
block_config:
block_name: "extract_sentiment"
input_cols: "raw_sentiment"
extract_content: true
expand_lists: true
- block_type: "TagParserBlock"
block_config:
block_name: "parse_sentiment"
input_cols: "extract_sentiment_content"
output_cols: "sentiment"
start_tags:
- "[SENTIMENT]"
end_tags:
- "[/SENTIMENT]"
# Step 17: Combine all analyses into a JSON structure
- block_type: "JSONStructureBlock"
block_config:
block_name: "create_structured_insights"
input_cols:
- "summary"
- "keywords"
- "entities"
- "sentiment"
output_cols:
- "structured_insights"
ensure_json_serializable: trueThis flow demonstrates the common pattern: PromptBuilderBlock builds messages, LLMChatBlock generates, LLMResponseExtractorBlock extracts the assistant content, and TagParserBlock parses tagged regions.