dagsampler

License: MIT

Configurable causal DAG simulator for generating synthetic mixed-type data and benchmarking conditional independence (CI) tests.

Documentation · Changelog

What it provides

  • CausalDataGenerator class for configurable simulation
  • Support for custom and random DAGs
  • Mixed continuous/binary/categorical nodes (configurable categorical cardinality)
  • Structural forms: linear, polynomial, interaction, sigmoid, cos, sin, stratum_means
  • Optional element-wise post_transform (tanh, sin, cos, exp_neg_abs, sqrt_abs, relu, sign)
  • Cross-type mechanisms:
    • continuous -> categorical (categorical_model.name = "threshold")
    • categorical -> continuous (functional_form.name = "stratum_means", including mixed-parent cases with metric_weights)
  • Noise models:
    • additive (gaussian, student_t, gamma, exponential, laplace, cauchy, uniform)
    • multiplicative (gaussian, student_t, gamma, exponential)
    • heteroskedastic (abs_first_parent, abs_parent_plus_const, mean_abs_plus_const)
  • Random weight sampling controls (including exclusion band around zero)
  • force_uniform_marginals for balanced exogenous binary / categorical draws
  • Template helpers (chain_config, fork_config, collider_config, independence_config)
  • Reproducibility via seed_structure and seed_data (or single seed)
  • Optional d-separation CI oracle output (store_ci_oracle=true)
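To make the "exclusion band around zero" idea concrete, here is a minimal NumPy sketch of drawing edge weights uniformly from two intervals that avoid near-zero values. It does not use dagsampler and is not the package's implementation: the function `sample_weights` and its parameters `low`, `high`, and `band` are hypothetical names chosen for illustration, not dagsampler configuration keys.

```python
import numpy as np

def sample_weights(n, low=-2.0, high=2.0, band=0.5, seed=None):
    """Draw n weights uniformly from [low, -band) U [band, high).

    Illustrative only: the parameter names are hypothetical and are not
    dagsampler configuration keys.
    """
    if not (low < -band < 0 < band < high):
        raise ValueError("need low < -band < 0 < band < high")
    rng = np.random.default_rng(seed)
    neg_len = -band - low   # length of the negative interval
    pos_len = high - band   # length of the positive interval
    # Pick a side with probability proportional to its length, so the
    # result is uniform over the union of the two intervals.
    go_negative = rng.random(n) < neg_len / (neg_len + pos_len)
    neg = rng.uniform(low, -band, size=n)
    pos = rng.uniform(band, high, size=n)
    return np.where(go_negative, neg, pos)

weights = sample_weights(1000, seed=0)
```

Keeping weights away from zero avoids edges that exist in the graph but have almost no effect on the data, which would otherwise make a benchmark penalize a CI test for missing a dependence that is practically undetectable.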

Installation

From PyPI:

pip install dagsampler

Or with uv:

uv venv
source .venv/bin/activate
uv pip install dagsampler

From GitHub (latest main):

uv pip install "dagsampler @ git+https://github.com/averinpa/dagsampler.git"

Quick start (Python API)

from dagsampler import CausalDataGenerator

config = {
    "simulation_params": {"n_samples": 200, "seed": 42},
    "graph_params": {
        "type": "custom",
        "nodes": ["X", "Y", "Z1"],
        "edges": [["X", "Z1"], ["Y", "Z1"]],
    },
}

result = CausalDataGenerator(config).simulate()
data = result["data"]
dag = result["dag"]
params = result["parametrization"]
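Assuming each edge is listed as [parent, child], the graph above is the collider X -> Z1 <- Y. The following sketch shows what that structure implies for a CI test, using a hand-written linear-Gaussian mechanism in plain NumPy. It is independent of dagsampler: the coefficients, noise scale, and the helper `partial_corr` are assumptions made for illustration, and the package's own mechanisms may differ.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000

# Hand-written linear-Gaussian collider X -> Z1 <- Y (an assumption for
# illustration, not dagsampler's mechanism).
x = rng.normal(size=n)
y = rng.normal(size=n)
z1 = 1.0 * x + 1.0 * y + 0.5 * rng.normal(size=n)

def partial_corr(a, b, c):
    """Correlation of a and b after linearly regressing out c."""
    design = np.column_stack([np.ones_like(c), c])
    res_a = a - design @ np.linalg.lstsq(design, a, rcond=None)[0]
    res_b = b - design @ np.linalg.lstsq(design, b, rcond=None)[0]
    return np.corrcoef(res_a, res_b)[0, 1]

# X and Y are d-separated by the empty set, so this is close to zero.
marginal = np.corrcoef(x, y)[0, 1]
# Conditioning on the collider Z1 opens the path, so this is clearly negative.
conditional = partial_corr(x, y, z1)
```

A CI test benchmark built on this graph would expect "independent" for X and Y given the empty set, and "dependent" for X and Y given Z1.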

CLI

The package exposes the dagsampler-generate command.

dagsampler-generate \
  --config config.json \
  --output dataset.csv \
  --params-out params.json \
  --edges-out edges.json

config.json must contain the same configuration structure that is passed to CausalDataGenerator in the Python API.
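One way to produce such a file is to serialize the Quick start dictionary with the standard library. This is a small sketch using only `json` and `pathlib`. It writes config.json to the current working directory and reads it back as a sanity check.

```python
import json
from pathlib import Path

# Same configuration as in the Quick start section.
config = {
    "simulation_params": {"n_samples": 200, "seed": 42},
    "graph_params": {
        "type": "custom",
        "nodes": ["X", "Y", "Z1"],
        "edges": [["X", "Z1"], ["Y", "Z1"]],
    },
}

path = Path("config.json")
path.write_text(json.dumps(config, indent=2), encoding="utf-8")

# Round-trip check: the file parses back to the same dictionary.
loaded = json.loads(path.read_text(encoding="utf-8"))
```

Note that JSON spells booleans as true and false, so a Python value such as True is written as true in the file.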

Learn more

Development

uv pip install -e ".[dev]"
pytest -q
