
Red-Hat-AI-Innovation-Team/sdg_hub


SDG Hub

Composable blocks and flows for synthetic data generation

Requires Python 3.10+.



SDG Hub is a Python framework for building synthetic data generation pipelines. Chain LLM, parsing, transform, filtering, and agent blocks into YAML-defined flows -- then generate training data at scale.
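To make the block-and-flow idea concrete, here is a minimal pure-Python sketch. It is not the SDG Hub API. `Block`, `make_transform`, `make_min_filter`, `BLOCK_TYPES`, `build_flow`, and `run_flow` are hypothetical names, and the flow spec is a Python list standing in for a parsed YAML file. Only transform and filtering blocks are shown; LLM, parsing, and agent blocks, which would call a model or parse its output, are left out.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
# Hypothetical: a "block" is any function mapping a list of rows to a new list of rows.
Block = Callable[[List[Row]], List[Row]]

# Named operations a spec can refer to by string, so the spec stays plain data.
TRANSFORMS: Dict[str, Callable[[object], object]] = {
    "word_count": lambda value: len(str(value).split()),
    "upper": lambda value: str(value).upper(),
}


def make_transform(column: str, output: str, op: str) -> Block:
    """Transform block: write TRANSFORMS[op](row[column]) into row[output]."""
    fn = TRANSFORMS[op]

    def block(rows: List[Row]) -> List[Row]:
        return [{**row, output: fn(row[column])} for row in rows]

    return block


def make_min_filter(column: str, minimum: float) -> Block:
    """Filtering block: keep rows whose numeric `column` is >= `minimum`."""

    def block(rows: List[Row]) -> List[Row]:
        return [row for row in rows if row[column] >= minimum]

    return block


# Hypothetical registry: maps the block type named in a spec to its factory.
BLOCK_TYPES: Dict[str, Callable[..., Block]] = {
    "transform": make_transform,
    "min_filter": make_min_filter,
}


def build_flow(spec: List[dict]) -> List[Block]:
    """Instantiate every block described in a parsed (YAML-like) spec."""
    return [BLOCK_TYPES[step["type"]](**step["params"]) for step in spec]


def run_flow(blocks: List[Block], rows: List[Row]) -> List[Row]:
    """Feed each block's output into the next block, in spec order."""
    for block in blocks:
        rows = block(rows)
    return rows


# A flow spec as it might look after loading YAML: plain dicts and strings.
SPEC = [
    {"type": "transform", "params": {"column": "text", "output": "n_words", "op": "word_count"}},
    {"type": "min_filter", "params": {"column": "n_words", "minimum": 3}},
    {"type": "transform", "params": {"column": "text", "output": "shout", "op": "upper"}},
]

if __name__ == "__main__":
    data = [{"text": "too short"}, {"text": "long enough to keep"}]
    print(run_flow(build_flow(SPEC), data))
    # prints [{'text': 'long enough to keep', 'n_words': 4, 'shout': 'LONG ENOUGH TO KEEP'}]
```

Keeping every block to the same rows-in, rows-out shape is what lets a spec chain them in any order. How SDG Hub itself represents blocks, flows, and datasets is covered in its Core Concepts and Block Reference docs.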

Get Started

pip install sdg-hub

from sdg_hub import FlowRegistry, Flow

# Discover and load a built-in flow
FlowRegistry.discover_flows()
flow = Flow.from_yaml(FlowRegistry.get_flow_path("MCP Server Distillation"))

# Configure and run
flow.set_model_config(model="openai/gpt-4o")
# `dataset` must already contain the input columns this flow expects
result = flow.generate(dataset)

See the Quick Start for a full walkthrough, or browse all built-in flows.

Documentation

Full documentation at ai-innovation.team/sdg_hub

  • Installation -- setup, optional dependencies, development install
  • Quick Start -- end-to-end walkthrough from loading a flow to generating data
  • Core Concepts -- blocks, flows, registries, and dataset handling
  • Block Reference -- LLM, parsing, transform, filtering, agent, and custom blocks
  • Flow Reference -- YAML schema, built-in flows, custom flows
  • API Reference -- auto-generated from source
  • Contributing -- development setup and contribution guidelines

License

Apache License 2.0 -- see LICENSE.


Built by the Red Hat AI Innovation Team