SD-AI

Open source repository for the SD-AI Project.

Contains the engines used by Stella & CoModel, evaluations used to test those engines and a frontend used to explore those evaluations and engines.

Architecture and Data Structures

sd-ai is a NodeJS Express app with a simple JSON-encoded HTTP API
all AI functionality in sd-ai is implemented as an "engine"
an engine is a JavaScript class that can implement AI functionality using any libraries/APIs supported by JavaScript
- the /engines folder contains examples, including the simplest possible engine (predprey) and engines like qualitative, quantitative, and seldon
sd-ai wraps engines to provide endpoints to:
- list all engines
- list parameters required/supported by each specific engine
- generate a model using a specific engine
all engines can be automatically tested for quality using evals

Engine

an engine only needs to do 2 things:
- provide a function to generate a model based on a prompt
- tell us what additional parameters users can pass to it
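The two responsibilities above can be sketched as a bare-bones engine class. This is a hypothetical example: the class name `EchoEngine` and its `greeting` parameter are made up, and it assumes `generate` may be `async` and resolves to an object with `model` and optional `supportingInfo`, mirroring the generate API response. Compare against the predprey engine in /engines for the real conventions.

```javascript
// Hypothetical minimal engine: EchoEngine and its "greeting" parameter are
// illustrative only. It returns the caller's model unchanged.
class EchoEngine {
  // Describes extra parameters using the shape from the parameters API.
  additionalParameters() {
    return [
      {
        name: "greeting",
        type: "string",
        required: false,
        uiElement: "lineedit",
        label: "Greeting",
        description: "Echoed back in supportingInfo",
        defaultValue: "hello",
      },
    ];
  }

  // Always resolves to a complete diagram: the current model if one was
  // passed in, otherwise an empty one.
  async generate(prompt, currentModel, parameters = {}) {
    const model = currentModel ?? { variables: [], relationships: [] };
    return {
      model,
      supportingInfo: { prompt, greeting: parameters.greeting ?? "hello" },
    };
  }
}
```

Returning the incoming model when there is nothing to change keeps the engine compliant with the rule that every request gets a complete diagram back.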

Additional Parameters

defined via the additionalParameters() function on each engine class
the format is specifically crafted so that your engine can be automatically incorporated into the Stella GUI and the sd-ai website

API Example

GET /api/v1/engines/:engine/parameters
Returns

{ 
    success: <bool>, 
    parameters:[{
        name: <string, unique name for the parameter that is passed to the generate call>,
        type: <string, string|number|boolean>,
        required: <boolean, whether or not this parameter must be passed to the generate call>,
        uiElement: <string, type of UI element the client should use so that the user can enter this value.  Valid values are textarea|lineedit|password|combobox|hidden|checkbox>,
        label: <string, name to put next to the UI element in the client>,
        description: <string, description of what this parameter does, used as a tooltip or placeholder text in the client>,
        defaultValue: <string, default value for this parameter if there is one, otherwise skipped>,
        options: <array, of objects with two attributes 'label' and 'value' only used if 'uiElement' is combobox>,
        saveForUser: <string, whether or not this field should be saved for the user by the client, valid values are local|global; leave unspecified if not saved>,
        minHeight: <int, NOT REQUIRED, default 100, only relevant if 'uiElement' is textarea -- this is the minimum height of that text area>,
        maxHeight: <int, NOT REQUIRED, default intmax, only relevant if 'uiElement' is textarea -- this is the maximum height of that text area>
    }] 
}
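On the client side, a parameters response like the one above is enough to pre-fill a form and check required fields before calling generate. The sketch below assumes `defaultValue` arrives as a string (as the schema states) and coerces it using `type`; the helper names and the sample parameters (`apiKey`, `temperature`, `verbose`) are hypothetical.

```javascript
// Hypothetical client helpers for a GET .../parameters response.

// Convert a string defaultValue into the declared parameter type.
function coerce(value, type) {
  if (type === "number") return Number(value);
  if (type === "boolean") return value === true || value === "true";
  return String(value);
}

// Collect default values for every parameter that declares one.
function initialValues(response) {
  if (!response.success) throw new Error("parameters request failed");
  const values = {};
  for (const p of response.parameters) {
    if (p.defaultValue !== undefined) values[p.name] = coerce(p.defaultValue, p.type);
  }
  return values;
}

// Names of required parameters that still have no (non-empty) value.
function missingRequired(response, values) {
  return response.parameters
    .filter((p) => p.required && (values[p.name] === undefined || values[p.name] === ""))
    .map((p) => p.name);
}

// Illustrative response; these parameters are made up.
const sample = {
  success: true,
  parameters: [
    { name: "apiKey", type: "string", required: true, uiElement: "password", label: "API Key", description: "Your key", saveForUser: "global" },
    { name: "temperature", type: "number", required: false, uiElement: "lineedit", label: "Temperature", description: "Sampling temperature", defaultValue: "0.7" },
    { name: "verbose", type: "boolean", required: false, uiElement: "checkbox", label: "Verbose", description: "Return extra info", defaultValue: "false" },
  ],
};

const values = initialValues(sample);
console.log(values); // { temperature: 0.7, verbose: false }
console.log(missingRequired(sample, values)); // [ 'apiKey' ]
```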

Generate

does the job of diagram generation; it's the workhorse of the engine
defined via the generate(prompt, currentModel, parameters) function on each engine class
a complete diagram should be returned by each request, even if that just means returning an empty diagram or the same diagram the user passed in via currentModel

API Example

POST /api/v1/:engine/generate
JSON data

{
    "prompt": "", # Requested model or changes to model to be provided to the AI
    "currentModel": { "relationships": [], "variables": []}, # Optional sd-json representation of the current model
    ...
    # additionalParameters given by `/api/v1/engines/:engine/parameters`
}

Returns

{
    success: <bool>,
    model: {variables: [], relationships: [], specs?: {} },
    supportingInfo?: {} # only provided if supported by engine
}
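A client call to the generate endpoint can be assembled as below. This is a sketch that assumes Node 18+ (for the global `fetch`), a server on `http://localhost:3000`, and the `POST /api/v1/:engine/generate` path exactly as written above; the `Authentication` header is only needed when the server sets `AUTHENTICATION_KEY` (see Setup). The function names are hypothetical.

```javascript
// Build the URL and fetch() options for a generate call. Extra engine
// parameters are merged into the top-level JSON body next to prompt.
function buildGenerateRequest({ baseUrl, engine, prompt, currentModel, params = {}, authKey }) {
  const url = `${baseUrl.replace(/\/+$/, "")}/api/v1/${encodeURIComponent(engine)}/generate`;
  const body = { ...params, prompt };
  if (currentModel) body.currentModel = currentModel;
  const headers = { "Content-Type": "application/json" };
  if (authKey) headers["Authentication"] = authKey;
  return { url, init: { method: "POST", headers, body: JSON.stringify(body) } };
}

// Send the request and return the parsed body, throwing if the server reports failure.
async function generateModel(options) {
  const { url, init } = buildGenerateRequest(options);
  const res = await fetch(url, init);
  const json = await res.json();
  if (!json.success) throw new Error(`generate failed with HTTP ${res.status}`);
  return json; // { success, model, supportingInfo? }
}
```

Spreading `params` before `prompt` means a stray `prompt` key in the extra parameters can never override the real prompt.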

SD-JSON

{
    variables: [{
        name: <string>,
        type: <string>, # stock|flow|variable
        equation?: <string>,
        documentation?: <string>,
        units?: <string>,
        uniflow?: <boolean>, # For flows only: true if flow should never be negative
        inflows?: Array<string>,
        outflows?: Array<string>,
        dimensions?: Array<string>, # Array of dimension names for arrayed variables
        arrayEquations?: [{ # Used for arrayed variables with element-specific equations
            equation: <string>,
            forElements: Array<string> # Array element names matching dimensions
        }],
        crossLevelGhostOf?: <string>, # For modular models: references source variable
        graphicalFunction?: {
            points: [
                {x: <number>, y: <number>}
                ...
            ]
        },
        subType?: <string>, # Discrete-entity sub-type (see below)
        additionalProperties?: {} # Sub-type-specific settings (see below)
    }],
    relationships: [{
        reasoning?: <string>, # Explanation for why this relationship is here
        from: <string>, # The variable the connection starts with
        to: <string>, # The variable the connection ends with
        polarity: <string>, # "+" or "-" or ""
        polarityReasoning?: <string> # Explanation for why this polarity was chosen
    }],
    modules?: [{ # Module definitions for hierarchical model organization
        name: <string>, # Simple module name (alphanumeric + underscores only)
        parentModule: <string> # Parent module name (empty string if top-level)
    }],
    specs?: {
        startTime: <number>,
        stopTime: <number>,
        dt?: <number>,
        timeUnits?: <string>,
        integrationMethod?: <string>, # "Euler" or "RK4"
        arrayDimensions?: [{ # Array dimension definitions (all four fields required)
            type: <string>, # "numeric" or "labels" - numeric auto-generates element names as strings ('1','2','3'), labels use user-defined meaningful names
            name: <string>, # Singular, alphanumeric dimension name (e.g., "region" not "regions")
            size: <number>, # Positive integer - number of elements in dimension
            elements: Array<string> # Element names - for numeric: auto-generated ['1','2','3'], for labels: user-defined ['North','South','East','West']
        }]
    }
}

? denotes an optional attribute
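For reference, a tiny SD-JSON model and a consistency check are sketched below. The model (a population stock fed by a births flow) is made up for illustration; reading the stock's `equation` as its initial value is an assumption, and `danglingReferences` is a hypothetical helper rather than part of sd-ai.

```javascript
// Illustrative SD-JSON model (made-up values): one stock, one inflow, one constant.
const model = {
  variables: [
    { name: "Population", type: "stock", equation: "100", inflows: ["births"], outflows: [] },
    { name: "births", type: "flow", equation: "Population * birth_rate", uniflow: true },
    { name: "birth_rate", type: "variable", equation: "0.05", units: "1/years" },
  ],
  relationships: [
    { from: "birth_rate", to: "births", polarity: "+" },
    { from: "Population", to: "births", polarity: "+" },
    { from: "births", to: "Population", polarity: "+" },
  ],
  specs: { startTime: 0, stopTime: 50, dt: 0.25, timeUnits: "years", integrationMethod: "Euler" },
};

// List references (relationship endpoints, inflows, outflows) that do not
// match any variable name. An empty array means every reference resolves.
function danglingReferences(m) {
  const names = new Set(m.variables.map((v) => v.name));
  const problems = [];
  for (const r of m.relationships) {
    for (const end of [r.from, r.to]) {
      if (!names.has(end)) problems.push(`relationship endpoint "${end}"`);
    }
  }
  for (const v of m.variables) {
    for (const f of [...(v.inflows ?? []), ...(v.outflows ?? [])]) {
      if (!names.has(f)) problems.push(`flow "${f}" listed on "${v.name}"`);
    }
  }
  return problems;
}

console.log(danglingReferences(model)); // []
```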

Arrays in SD-JSON

Variables can be arrayed over one or more dimensions to create multi-dimensional arrays:

Dimensions: Defined in specs.arrayDimensions with all four required fields:
- type: Either "numeric" (auto-generates elements as '1','2','3') or "labels" (user-defined element names)
- name: Singular, alphanumeric dimension name (e.g., "region" not "regions")
- size: Positive integer count of elements
- elements: Array of element names matching the size
Arrayed Variables: Reference dimensions by name in their dimensions array (order matters)
Array Equations:
- If all elements use the SAME formula: use the equation field only
- If elements have DIFFERENT formulas, OR for arrayed STOCKS: use the arrayEquations array with element-specific equations
- Each arrayEquations entry has equation and forElements (ordered to match the variable's dimensions list)
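The dimension and equation rules above can be made concrete with a couple of builders and a check. The helper names, the `inventory` stock, and its values are hypothetical; the check only enforces what the text states (one element per dimension, in the variable's dimension order, each element defined on its dimension).

```javascript
// Hypothetical builders for specs.arrayDimensions entries.
function numericDimension(name, size) {
  if (!Number.isInteger(size) || size <= 0) throw new Error("size must be a positive integer");
  // numeric dimensions name their elements '1', '2', ... as strings
  const elements = Array.from({ length: size }, (_, i) => String(i + 1));
  return { type: "numeric", name, size, elements };
}

function labelsDimension(name, elements) {
  return { type: "labels", name, size: elements.length, elements: [...elements] };
}

// True when every arrayEquations entry lists exactly one element per
// dimension, in the variable's dimension order, and each element exists.
function checkArrayEquations(variable, arrayDimensions) {
  const byName = new Map(arrayDimensions.map((d) => [d.name, d]));
  return (variable.arrayEquations ?? []).every(
    (entry) =>
      entry.forElements.length === variable.dimensions.length &&
      entry.forElements.every((el, i) =>
        Boolean(byName.get(variable.dimensions[i])?.elements.includes(el))
      )
  );
}

const arrayDimensions = [labelsDimension("region", ["North", "South"]), numericDimension("cohort", 2)];

// Arrayed stocks use arrayEquations, one entry per element combination.
const inventory = {
  name: "inventory",
  type: "stock",
  dimensions: ["region", "cohort"],
  arrayEquations: [
    { equation: "10", forElements: ["North", "1"] },
    { equation: "12", forElements: ["North", "2"] },
    { equation: "8", forElements: ["South", "1"] },
    { equation: "9", forElements: ["South", "2"] },
  ],
};

console.log(checkArrayEquations(inventory, arrayDimensions)); // true
```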

Modules in SD-JSON

Models can be organized into modules for better structure and encapsulation:

Module Definition: Modules are defined in the top-level modules array:
- name: Simple module name (alphanumeric + underscores, no spaces, never module-qualified)
- parentModule: Name of containing module (empty string for top-level modules)
- Modules can be nested to create hierarchical structures
Module Naming in Variables: Use dot notation: ModuleName.variableName (e.g., Hares.population, Lynx.births)
Ghost Variables: For inter-module references, create cross-level ghost variables:
- Set crossLevelGhostOf to the fully qualified source variable name
- Leave equation field empty (empty string)
- Ghost variable has same local name as source but exists in consuming module
- All equations in consuming module reference the ghost, not the original source
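A cross-level ghost following the rules above might be built like this. `makeGhost` is a hypothetical helper, and copying the source variable's `type` onto the ghost is an assumption the text does not state; the Hares/Lynx names come from the dot-notation examples above.

```javascript
// Illustrative module tree: two top-level modules.
const modules = [
  { name: "Hares", parentModule: "" },
  { name: "Lynx", parentModule: "" },
];

// Build a ghost of `source` (fully qualified, e.g. "Hares.population")
// that lives in `consumingModule`. The ghost keeps the source's local name,
// points back via crossLevelGhostOf, and has an empty equation.
function makeGhost(source, consumingModule, knownModules) {
  if (!knownModules.some((m) => m.name === consumingModule)) {
    throw new Error(`unknown module ${consumingModule}`);
  }
  const localName = source.name.split(".").pop();
  return {
    name: `${consumingModule}.${localName}`,
    type: source.type, // assumption: ghost mirrors the source's type
    equation: "",
    crossLevelGhostOf: source.name,
  };
}

const harePopulation = { name: "Hares.population", type: "stock", equation: "1000" };
console.log(makeGhost(harePopulation, "Lynx", modules));
```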

Discrete-Entity Sub-Types in SD-JSON

Variables can have a subType field that further classifies them. Sub-types are a refinement of type — the top-level type field remains "stock", "flow", or "variable".

Stock sub-types — also set additionalProperties with the relevant configuration:

`subType`	Description
`"queue"`	A waiting line that holds discrete items until they are dispatched.
`"oven"`	A batch processor where items are held for a fixed cook time then released together.
`"conveyor"`	A pipeline delay where items travel a fixed transit time before exiting from the other end.

Flow sub-types — automatically managed flows. Set subType only; leave equation as an empty string:

`subType`	Description
`"discreteOutflow"`	The output flow from a conveyor or oven.
`"conveyorLeakage"`	The leakage flow from a conveyor. Set `additionalProperties` to configure leakage behavior.
`"queueOutflow"`	The output flow from a queue.
`"queueOverflow"`	The overflow flow emitted when a full queue cannot accept new items (requires `overflow: true` on the queue).

Variable sub-types — set subType on plain "variable" type entities:

`subType`	Description
`"delayVariable"`	A plain variable whose equation contains a `DELAY` or `SMTH` builtin function (e.g. `DELAY1`, `DELAY3`, `DELAY N`, `SMTH1`, `SMTH3`). Set this whenever any DELAY or SMTH variant appears in the equation.
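The `delayVariable` rule can be applied mechanically by scanning equations. This is a sketch: the regular expression treats `DELAY N` and `DELAYN` alike and also accepts a bare `DELAY(` or `SMTH(`, which is an assumption about the builtin spellings, and `withDelaySubType` is a hypothetical helper.

```javascript
// Matches DELAY, DELAY1, DELAY3, DELAYN or "DELAY N", and the same SMTH
// variants, when used as a function call (name followed by "("),
// case-insensitively. A name like delay_time does not match.
const DELAY_OR_SMOOTH = /\b(?:DELAY|SMTH)(?:1|3|\s*N)?\s*\(/i;

// Returns a copy with subType "delayVariable" for plain variables whose
// equation calls a DELAY/SMTH builtin; other variables are returned as-is.
function withDelaySubType(variable) {
  if (variable.type === "variable" && DELAY_OR_SMOOTH.test(variable.equation ?? "")) {
    return { ...variable, subType: "delayVariable" };
  }
  return variable;
}

console.log(withDelaySubType({ name: "perceived_sales", type: "variable", equation: "SMTH1(sales, 3)" }).subType); // delayVariable
```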

additionalProperties fields for conveyor and oven stocks:

Field	Type	Applies to	Description
`processTime`	string (equation)	conveyor, oven	Transit time (conveyor) or cook time (oven). Required.
`capacity`	string (equation)	conveyor, oven	Maximum number of items the element can hold.
`inflowLimit`	string (equation)	conveyor, oven	Maximum inflow rate per time step.
`fillTime`	string (equation)	oven only	Time to fill the element before processing begins.
`cleanTime`	string (equation)	oven only	Clean-up time after emptying before accepting new items.
`sample`	string (equation)	conveyor, oven	Re-samples transit/cook time when this expression is non-zero.
`arrest`	string (equation)	conveyor, oven	Halts movement when this expression is non-zero.
`oneAtATime`	boolean		If true, accepts only one batch per time step.
`splitBatches`	boolean		If true, incoming batches can be split when entering.

additionalProperties fields for regular flows (inflows to a conveyor):

Field	Type	Description
`spreadFlow`	string enum	How this flow distributes along the conveyor when it enters: `"none"` (default, front-entry), `"even"`, `"destination"`, `"distribution"` (requires `distribEq`), `"source"`.
`distribEq`	string (equation)	Distribution table equation. Required when `spreadFlow` is `"distribution"`.

additionalProperties fields for conveyorLeakage flows:

Field	Type	Description
`leakFraction`	string (equation)	Fraction of conveyor contents that leak out per time step.
`exponential`	boolean	If true, leakage is exponential (constant fraction); if false (default), linear.
`leakZoneStart`	string (equation)	Start position (0–100%) along the conveyor where leakage begins. Leave empty for leakage across the entire length.
`leakZoneEnd`	string (equation)	End position (0–100%) along the conveyor where leakage ends. Leave empty for leakage across the entire length.
`leakIntegers`	boolean	If true, leakage amounts are rounded to whole integers.
`ignorePrevZones`	boolean	If true, each leak zone operates independently of losses from earlier zones.
`forceLeakFraction`	boolean	If true, the same leak fraction is applied regardless of transit duration.

additionalProperties fields for queue stocks:

Field	Type	Description
`fifoEnabled`	boolean	If true, dispatches in FIFO order; if false (default), LIFO.
`discrete`	boolean	If true, operates on integer quantities only (discrete mode).
`roundRobin`	boolean	If true, competing outflows are served in round-robin order.
`queueOutflowPriority`	string (equation)	Dispatch priority for the queue outflow.
`purgeEq`	string (equation)	Items older than this age (in time units) are automatically removed.
`overflow`	boolean	If true, a `queueOverflow` flow is automatically created for excess items.
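Putting the sub-type tables together, a queue with an overflow and a conveyor with a leak might look like the variables below. `checkDiscreteEntities` is a hypothetical helper that enforces three rules stated above: automatically managed flows keep an empty equation, conveyors and ovens need `processTime`, and a `queueOverflow` outflow requires `overflow: true` on its queue. All names and values are illustrative.

```javascript
// Illustrative variables; names and numbers are made up.
const variables = [
  // Queue with FIFO dispatch and an overflow outflow.
  {
    name: "waiting_line", type: "stock", subType: "queue", equation: "0",
    inflows: ["arrivals"], outflows: ["served", "turned_away"],
    additionalProperties: { fifoEnabled: true, overflow: true },
  },
  { name: "arrivals", type: "flow", equation: "5" },
  { name: "served", type: "flow", subType: "queueOutflow", equation: "" },
  { name: "turned_away", type: "flow", subType: "queueOverflow", equation: "" },

  // Conveyor with a transit time of 4 time units and linear leakage.
  {
    name: "belt", type: "stock", subType: "conveyor", equation: "0",
    inflows: ["loading"], outflows: ["unloading", "spoilage"],
    additionalProperties: { processTime: "4", capacity: "100" },
  },
  { name: "loading", type: "flow", equation: "3", additionalProperties: { spreadFlow: "none" } },
  { name: "unloading", type: "flow", subType: "discreteOutflow", equation: "" },
  {
    name: "spoilage", type: "flow", subType: "conveyorLeakage", equation: "",
    additionalProperties: { leakFraction: "0.02", exponential: false },
  },
];

const AUTO_FLOW_SUBTYPES = new Set(["discreteOutflow", "conveyorLeakage", "queueOutflow", "queueOverflow"]);

// Returns a list of rule violations; an empty array means all checks pass.
function checkDiscreteEntities(vars) {
  const byName = new Map(vars.map((v) => [v.name, v]));
  const problems = [];
  for (const v of vars) {
    if (v.type === "flow" && AUTO_FLOW_SUBTYPES.has(v.subType) && v.equation !== "") {
      problems.push(`${v.name}: automatically managed flow must have an empty equation`);
    }
    if (v.type === "stock" && (v.subType === "conveyor" || v.subType === "oven") && !v.additionalProperties?.processTime) {
      problems.push(`${v.name}: processTime is required`);
    }
    if (v.type === "stock" && v.subType === "queue") {
      const overflows = (v.outflows ?? []).filter((f) => byName.get(f)?.subType === "queueOverflow");
      if (overflows.length > 0 && v.additionalProperties?.overflow !== true) {
        problems.push(`${v.name}: queueOverflow requires overflow: true`);
      }
    }
  }
  return problems;
}

console.log(checkDiscreteEntities(variables)); // []
```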

Discussion Engine JSON response

{
    output: {
        textContent: <string, the response to the query from the user>
    }
}

Discussion Engine Feedback JSON input

{
    feedbackLoops: [{
        identifier: <string>,
        name: <string>,
        links: [
            { from: <string>, to: <string>, polarity: <string - +|-|? > }
            ...
        ],
        polarity: <string +|-|?>,
        loopset?: <number>,
        "Percent of Model Behavior Explained By Loop"?: [
            { time: <number>, value: <number> }
            ...
        ]
    }],
    dominantLoopsByPeriod?: [{
        dominantLoops: Array<string>,
        startTime: <number>,
        endTime: <number>
    }]   
}
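Two common operations on this input are sketched below: deriving a loop's polarity from its links (the standard System Dynamics rule that a loop is negative when it contains an odd number of negative links) and looking up the dominant loops at a given time. The helper names are hypothetical; treating any `?` link as making the whole loop `?` is an assumption, and so is reading `dominantLoops` entries as loop identifiers rather than names.

```javascript
// Loop polarity from link polarities: "?" if any link is unknown,
// otherwise "-" for an odd number of negative links and "+" for an even number.
function loopPolarity(links) {
  if (links.some((l) => l.polarity === "?")) return "?";
  const negatives = links.filter((l) => l.polarity === "-").length;
  return negatives % 2 === 1 ? "-" : "+";
}

// Names of the loops dominant at time t, or [] if no period covers t.
// Periods are matched inclusively; if two share a boundary, the first wins.
function dominantLoopsAt(feedback, t) {
  const period = (feedback.dominantLoopsByPeriod ?? []).find(
    (p) => t >= p.startTime && t <= p.endTime
  );
  if (!period) return [];
  const nameById = new Map(feedback.feedbackLoops.map((l) => [l.identifier, l.name]));
  return period.dominantLoops.map((id) => nameById.get(id) ?? id);
}

// Illustrative input (made up).
const feedback = {
  feedbackLoops: [
    { identifier: "R1", name: "Births", polarity: "+",
      links: [{ from: "Population", to: "births", polarity: "+" }, { from: "births", to: "Population", polarity: "+" }] },
    { identifier: "B1", name: "Deaths", polarity: "-",
      links: [{ from: "Population", to: "deaths", polarity: "+" }, { from: "deaths", to: "Population", polarity: "-" }] },
  ],
  dominantLoopsByPeriod: [
    { dominantLoops: ["R1"], startTime: 0, endTime: 20 },
    { dominantLoops: ["B1"], startTime: 20, endTime: 50 },
  ],
};

console.log(feedback.feedbackLoops.map((l) => loopPolarity(l.links))); // [ '+', '-' ]
console.log(dominantLoopsAt(feedback, 30)); // [ 'Deaths' ]
```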

WebSocket AI Agent

The agent/ directory contains a WebSocket server that wraps the SD-AI engines in a conversational AI agent for building and modifying System Dynamics models interactively.

Key characteristics:

Stateless — all model state, run data, and conversation history live on the client
All core tools are built-in (get/update model, run simulation, fetch variable data, feedback loops, visualizations)
Clients can optionally register custom tools for application-specific behavior
Agent personalities are configured via Markdown files in agent/config/
Visualizations are returned as raw SVG strings

WebSocket endpoint: ws://localhost:3000/api/v1/agent

Protocol summary: client connects → initialize_session (model type + initial model) → session_ready (agent list) → select_agent → chat messages → agent responds with agent_text, visualization, and tool_call_request messages that the client must answer.
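The handshake order in the protocol summary can be enforced on the client with a small state tracker. This is a hypothetical sketch: it assumes each message carries a `type` string, uses `chat` as a placeholder for whatever type name chat messages actually use, and ignores payloads and tool call responses entirely; agent/README.md is the authority on all of these.

```javascript
// Tracks the handshake: connect -> initialize_session -> session_ready
// -> select_agent -> chat. Message payloads are intentionally not modeled.
class AgentHandshake {
  constructor() {
    this.phase = "connected";
  }

  // Call before sending a client message of the given type; throws if the
  // message would arrive out of order.
  beforeSend(type) {
    if (type === "initialize_session" && this.phase === "connected") {
      this.phase = "initializing";
    } else if (type === "select_agent" && this.phase === "ready") {
      this.phase = "agentSelected";
    } else if (this.phase !== "agentSelected") {
      throw new Error(`cannot send "${type}" while ${this.phase}`);
    }
  }

  // Call for every server message received.
  onReceive(type) {
    if (type === "session_ready" && this.phase === "initializing") {
      this.phase = "ready";
    }
  }
}
```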

See agent/README.md for the full WebSocket protocol, all message types, tool call request/response formats, and example client implementation.

Setup

fork this repo and git clone your fork locally
create a .env file at the top level with the following keys:

OPENAI_API_KEY="sk-asdjkshd" # if you're doing work with engines that use the LLMWrapper class in utils.js (quantitative, qualitative, seldon, etc.)
GEMINI_API_KEY="asdjkshd" # if you're doing work with engines using Gemini models (causal-chains, seldon, quantitative, qualitative)
ANTHROPIC_API_KEY="sk-ant-asdjkshd" # if you're using Claude models for engines or the agent
OPEN_ROUTER_API_KEY="sk-or-asdjkshd" # if you're using OpenRouter-routed models (Qwen, Deepseek, Kimi) for engines or the agent
AUTHENTICATION_KEY="my_secret_key" # only needed for securing publicly accessible deployments. Requires clients to pass an Authentication header matching this value on engine generate requests only, e.g. `curl -H "Authentication: my_secret_key"`
REPORTER_URL="https://your-metrics-server.com/api/metrics" # optional URL to POST engine usage metrics to. If not set, metrics reporting is disabled.
TOKEN_REPORTER_URL="https://your-metrics-server.com/api/token-usage" # optional URL to POST agent LLM token usage and cost to. If not set, token reporting is disabled.

npm install
npm start
(optional) npm run evals -- -e evals/experiments/careful.json
(optional) npm test
(optional) npm run test:coverage

We recommend VS Code with a launch.json configured for Node-type applications (you get a debugger and hot reloading).

Optional Third-Party Requirements

Some engines require additional dependencies to be installed on your system:

Go 1.24.0 or later - Required for the causal-chains engine (installation guide)
Python 3.x - Required for the causal-decoder engine and the agentic tools
Docker (or Podman aliased as docker) - Required for the test-simlin-agent engine (Docker installation guide). The Docker daemon must be running when npm install runs the postinstall hook; if Docker is missing or the daemon is unreachable, the image build is skipped and the engine is disabled.

These dependencies are automatically built/installed when you run npm install via postinstall hooks, but only if the respective toolchains are available on your PATH.

To skip specific components during installation, set the SKIP_THIRD_PARTY_COMPONENTS environment variable to a comma-separated list of component names before running npm install:

Mac/Linux:

SKIP_THIRD_PARTY_COMPONENTS=causal-decoder,PySD-simulator,time-series-behavior-analysis npm install

Windows:

set "SKIP_THIRD_PARTY_COMPONENTS=causal-decoder,PySD-simulator,time-series-behavior-analysis" && npm install

Available component names and what they affect:

Component	Effect of skipping
`causal-chains`	Disables the causal-chains engine
`causal-decoder`	Disables the causal-decoder engine
`PySD-simulator`	Breaks evals
`time-series-behavior-analysis`	Breaks evals
`visualization-engine`	Breaks agentic tools
`simlin-agent`	Disables the test-simlin-agent engine
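A postinstall script could interpret `SKIP_THIRD_PARTY_COMPONENTS` along these lines. This is a hypothetical sketch, not the repository's actual hook; it trims each entry because cmd.exe keeps the space before `&&` in `set VAR=value && ...`, and it only recognizes the component names from the table above.

```javascript
// Component names from the table above.
const KNOWN_COMPONENTS = new Set([
  "causal-chains", "causal-decoder", "PySD-simulator",
  "time-series-behavior-analysis", "visualization-engine", "simlin-agent",
]);

// Parse a comma-separated skip list: trim each entry, drop empties,
// and warn about names that are not known components.
function componentsToSkip(envValue = "") {
  const requested = envValue.split(",").map((s) => s.trim()).filter(Boolean);
  const unknown = requested.filter((c) => !KNOWN_COMPONENTS.has(c));
  if (unknown.length > 0) console.warn(`Ignoring unknown components: ${unknown.join(", ")}`);
  return new Set(requested.filter((c) => KNOWN_COMPONENTS.has(c)));
}

// In a postinstall script (sketch):
// if (!componentsToSkip(process.env.SKIP_THIRD_PARTY_COMPONENTS).has("causal-decoder")) { /* build it */ }
```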

Agent Sandbox (Production Linux Only)

The agentic assistant runs each session's agent in an isolated worker process. On Linux, worker processes are sandboxed using bubblewrap (bwrap), which uses Linux kernel namespaces to confine the agent to its session-specific temp directory. The agent cannot read or write anywhere else on the server filesystem — including other sessions, application source code, or environment variables on disk.

Installing bubblewrap

Install bubblewrap via your system package manager (the package is named bubblewrap on most distros). See the bubblewrap releases page for more options.

What bwrap provides

Isolation	Guarantee
Filesystem writes	Agent can only write to its session temp dir
Filesystem reads	Only app code, system libs, and TLS certs are visible
Cross-session access	Other sessions' temp dirs are not mounted
Process isolation	Separate PID namespace; agent cannot signal other processes
Hostname isolation	Separate UTS namespace

The Python subprocess spawned for visualizations inherits the same bwrap namespace automatically — no separate Python-level sandbox is needed.

Development (macOS / Windows)

bwrap relies on Linux kernel namespaces and is not available on macOS or Windows. On those platforms the agent worker runs unsandboxed with full filesystem access, and a prominent warning is logged at startup. This is acceptable for local development but must not be used for any publicly hosted deployment.

What bwrap does NOT restrict

Network access — the agent worker must reach the upstream LLM APIs (Anthropic, Google, and/or OpenRouter depending on the configured providers). The agent can make arbitrary outbound HTTP requests if prompted to do so. Restrict this at the network/firewall level if needed.

Metrics Reporting

SD-AI includes optional metrics reporting via the GenerateMetricsReporter class. When enabled, it automatically tracks and reports usage data for every engine generation request.

Configuration

Set the REPORTER_URL environment variable in your .env file to enable metrics reporting:

REPORTER_URL="https://your-metrics-server.com/api/metrics"

If REPORTER_URL is not set or is empty, metrics reporting is disabled and no HTTP requests are made.

Reported Metrics

For each call to /api/v1/:engine/generate, the following JSON data is posted to the configured URL:

{
  "engine": "quantitative",
  "underlyingModel": "gpt-4o-mini",
  "duration": 1234,
  "timestamp": "2024-01-15T10:30:00.000Z"
}

Fields:

engine (string): The name of the engine used (e.g., "quantitative", "qualitative", "seldon")
underlyingModel (string|null): The underlying LLM model specified in the request body, or null if not provided
duration (number): Time in milliseconds for the generate call to complete
timestamp (string): ISO 8601 timestamp of when the report was generated

The reporter sends metrics asynchronously and will not block or affect the engine response, even if the reporting endpoint is unavailable.
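The reporting behavior described above (payload shape, disabled when `REPORTER_URL` is empty, never blocking the response) can be sketched as follows. This is not the actual GenerateMetricsReporter code: the function names are hypothetical, reading the model from `requestBody.underlyingModel` is an assumption about the request field name, and `fetchImpl` defaults to Node 18+'s global `fetch` but can be swapped out for testing.

```javascript
// Build the metric payload for one generate call. Reads the model name from
// requestBody.underlyingModel (assumed field name) and falls back to null.
function buildGenerateMetric(engine, requestBody, startedAtMs, finishedAtMs) {
  return {
    engine,
    underlyingModel: requestBody?.underlyingModel ?? null,
    duration: finishedAtMs - startedAtMs,
    timestamp: new Date(finishedAtMs).toISOString(),
  };
}

// Fire-and-forget POST. Returns false (and sends nothing) when the URL is
// unset or empty. Errors, including synchronous throws from fetchImpl, are
// swallowed so they can never affect the engine response.
function reportMetric(url, payload, fetchImpl = globalThis.fetch) {
  if (!url) return false;
  Promise.resolve()
    .then(() =>
      fetchImpl(url, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(payload),
      })
    )
    .catch(() => {});
  return true;
}

// Usage inside a route handler (sketch):
// const start = Date.now();
// const result = await engine.generate(prompt, currentModel, params);
// reportMetric(process.env.REPORTER_URL, buildGenerateMetric(name, req.body, start, Date.now()));
```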

Token Usage Reporting

The agent uses TokenUsageReporter to track token usage and cost for every LLM call made using this service. This is separate from the engine metrics above — it covers the agent's internal Anthropic, Gemini, OpenAI, and OpenRouter calls rather than top-level HTTP engine requests.

Configuration

Set TOKEN_REPORTER_URL in your .env file to enable reporting:

TOKEN_REPORTER_URL="https://your-metrics-server.com/api/token-usage"

Reporting is only active when both TOKEN_REPORTER_URL is set and the client provided a clientId in the initialize_session WebSocket message or as an additional parameter to an engine call. If either is missing, usage is still logged to the server console but not POSTed anywhere.

Console Logging

Regardless of whether remote reporting is enabled, every LLM call logs a line to the server console:

[usage:anthropic]  input=1234($0.003702) output=256($0.003840) cache_write_5m=0($0.000000) cache_write_1h=0($0.000000) cache_read=512($0.000461) total=$0.008003
[usage:gemini]     input=800($0.000160) output=120($0.000072) cached=200($0.000010) thoughts=40($0.000024) total=$0.000266
[usage:openai]     input=600($0.000300) output=150($0.000225) cached=100($0.000025) reasoning=0 total=$0.000550
[usage:openrouter] input=820 output=140 cached=0 cache_write=0 total=$0.001425

Per-token costs are shown in parentheses when pricing data is available for the model. If pricing is unknown the token counts are shown without a cost. The openrouter line omits per-component dollar amounts because OpenRouter returns the authoritative total cost on the response itself — we don't recompute per-token breakdowns locally.
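The dollar figures in these lines are token counts multiplied by per-million-token rates. The sketch below reproduces the input, output, and cache_read parts of the `[usage:anthropic]` example; the rates in `EXAMPLE_RATES` are back-computed from that example line and are not an authoritative price list, and `formatUsageLine` is a hypothetical helper, not the server's logger.

```javascript
// Rates in USD per million tokens, back-computed from the example
// [usage:anthropic] line; NOT an authoritative price list.
const EXAMPLE_RATES = { input: 3.0, output: 15.0, cache_read: 0.9 };

// Dollar cost of `tokens` at `ratePerMTok` dollars per million tokens.
function componentCost(tokens, ratePerMTok) {
  return (tokens * ratePerMTok) / 1_000_000;
}

// Format counts as `field=count($cost)`. Fields without a rate show the bare
// count; when `rates` is omitted entirely, no total is printed either.
function formatUsageLine(provider, counts, rates) {
  const parts = [];
  let total = 0;
  for (const [field, tokens] of Object.entries(counts)) {
    if (rates && field in rates) {
      const cost = componentCost(tokens, rates[field]);
      total += cost;
      parts.push(`${field}=${tokens}($${cost.toFixed(6)})`);
    } else {
      parts.push(`${field}=${tokens}`);
    }
  }
  if (rates) parts.push(`total=$${total.toFixed(6)}`);
  return `[usage:${provider}] ${parts.join(" ")}`;
}

console.log(formatUsageLine("anthropic", { input: 1234, output: 256, cache_read: 512 }, EXAMPLE_RATES));
// [usage:anthropic] input=1234($0.003702) output=256($0.003840) cache_read=512($0.000461) total=$0.008003
```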

Reported Payload

When remote reporting is active, the following JSON is POSTed to TOKEN_REPORTER_URL for each LLM call:

{
  "clientId": "client-provided-id",
  "provider": "anthropic",
  "model": "claude-sonnet-4-6",
  "tokens": {
    "inputTokens": 1234,
    "outputTokens": 256,
    "cacheCreation5mInputTokens": 0,
    "cacheCreation1hInputTokens": 0,
    "cacheReadInputTokens": 512
  },
  "cost": 0.008003,
  "timestamp": "2024-01-15T10:30:00.000Z"
}

The tokens shape varies by provider:

Provider	Token fields
`anthropic`	`inputTokens`, `outputTokens`, `cacheCreation5mInputTokens`, `cacheCreation1hInputTokens`, `cacheReadInputTokens`
`gemini`	`inputTokens`, `outputTokens`, `cachedTokens`, `thoughtsTokens`
`openai`	`inputTokens`, `outputTokens`, `cachedTokens`, `reasoningTokens`
`openrouter`	`inputTokens`, `outputTokens`, `cachedTokens`, `cacheWriteTokens`, `providerCost`

cost is the total dollar cost of the call, or null if pricing data is unavailable for the model. For openrouter, cost is taken directly from the provider's authoritative usage.cost (no local pricing table is consulted), and providerCost in the token block carries that same value for transparency.

The reporter fires asynchronously and never blocks or fails the agent response if the reporting endpoint is unavailable.
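The gating and cost rules above can be condensed into one function that returns the payload to POST, or `null` when reporting is inactive. This is a hypothetical sketch rather than TokenUsageReporter's real API: `buildTokenReport` and its `computedCost` argument are made up, and the example values are taken from the sample payload and console lines above.

```javascript
// Returns the payload to POST, or null when reporting is inactive
// (TOKEN_REPORTER_URL unset or no clientId). Usage is still logged elsewhere.
function buildTokenReport({ reporterUrl, clientId, provider, model, tokens, computedCost, now = new Date() }) {
  if (!reporterUrl || !clientId) return null;
  // OpenRouter's own usage.cost is authoritative and travels as tokens.providerCost;
  // other providers use a locally computed cost, or null when pricing is unknown.
  const cost =
    provider === "openrouter" ? (tokens.providerCost ?? null) : (computedCost ?? null);
  return { clientId, provider, model, tokens, cost, timestamp: now.toISOString() };
}

const report = buildTokenReport({
  reporterUrl: "https://your-metrics-server.com/api/token-usage",
  clientId: "client-provided-id",
  provider: "openrouter",
  model: "example-model",
  tokens: { inputTokens: 820, outputTokens: 140, cachedTokens: 0, cacheWriteTokens: 0, providerCost: 0.001425 },
});
console.log(report.cost); // 0.001425
```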

Testing

Unit Tests

Unit tests are provided for:

HTTP API routes in /routes/v1 folder:
- engineParameters.test.js - Validates that all engines return correct parameters
- engineGenerate.test.js - Tests model generation endpoints with authentication, parameter validation, and response structure
- engines.test.js - Tests engine listing and metadata endpoints
Engine implementations in /engines folder:
- QuantitativeEngineBrain.test.js - Tests quantitative model generation and LLM setup
- QualitativeEngineBrain.test.js - Tests qualitative diagram generation
- SeldonBrain.test.js - Tests discussion engine functionality
Evaluation methods in /evals/categories - Tests cover causal relationship evaluation, conformance validation, and quantitative model assessment
Model output evaluation in /evals/model_output_evaluation - Standalone tools for classifying System Dynamics model output (time series) into behavioral patterns like exponential growth, oscillation, or S-shaped growth

Run tests with:

npm test

Generate code coverage report with:

npm run test:coverage

Tests are built using Jest and Supertest, and use the actual engine implementations (no mocking) to ensure real-world functionality.

Evals

check out the Evals README
for model output behavior classification, see Model Output Evaluation

Inspiration and Related Work

https://github.com/bear96/System-Dynamics-Bot served as departure point for engine prompt development
CoModel, created by the team at Skip Designed to use generative AI in their CBSD work
