Open source repository for the SD-AI Project.
Contains the engines used by Stella & CoModel, evaluations used to test those engines and a frontend used to explore those evaluations and engines.
- sd-ai is a NodeJS Express app with simple JSON-encoded HTTP API
- all AI functionality in sd-ai is implemented as an "engine"
- an engine is a javascript class that can implement ai functionality using any libraries/apis supported by javascript
/enginesfolder contains examples including the simplest possible engine:predpreyand engines likequalitative,quantitative, andseldon
- sd-ai wraps engines to provides endpoints to:
- list all engines
- list parameters required/supported by each specific engine
- generating a model using a specific engine
- all engines can be automatically tested for quality using
evals
- an engine only needs to do 2 things:
- provide a function to generate a model based on a prompt
- tell us what additional parameters users can pass to it
- defined via
additionalParameters()function on each engine class - format specifically crafted to allow your engine to be automatically incorporated into the Stella GUI and the sd-ai website
GET/api/v1/engines/:engine/parameters- Returns
{
success: <bool>,
parameters:[{
name: <string, unique name for the parmater that is passed to generate call>,
type: <string, string|number|boolean>,
required: <boolean, whether or not this parameter must be passed to the generate call>,
uiElement: <string, type of UI element the client should use so that the user can enter this value. Valid values are textarea|lineedit|password|combobox|hidden|checkbox>,
label: <string, name to put next to the UI element in the client>,
description: <string, description of what this parameter does, used as a tooltip or placeholder text in the client>,
defaultValue: <string, default value for this parameter if there is one, otherwise skipped>,
options: <array, of objects with two attributes 'label' and 'value' only used if 'uiElement' is combobox>,
saveForUser: <string, whether or not this field should be saved for the user by the client, valid values are local|global leave unspecified if not saved>,
minHeight: <int, NOT REQUIRED, default 100 only relevant if 'uiElement' is textarea -- this is the minimum height of that text area>,
maxHeight: <int, NOT REQUIRED, default intmax, only relevant if 'uiElement' is textarea -- this is the maximum height of that text area>
}]
}
- does the job of diagram generation, it's the workhorse of the engine
- defined via
generate(prompt, currentModel, parameters)function on each engine class - a complete diagram should be returned by each request, even if that just means returning an empty diagram or the same diagram the user passed in via
currentModel
POST/api/v1/:engine/generate- JSON data
{
"prompt": "", # Requested model or changes to model to be provided to the AI
"currentModel": { "relationships": [], "variables": []} # Optional sd-json representation of the current model
....
# additionalParameters given by `/api/v1/:engine/parameters`
}
- Returns
{
success: <bool>,
model: {variables: [], relationships: [], specs?: {} },
supportingInfo?: {} # only provided if supported by engine
}
{
variables: [{
name: <string>,
type: <string>, # stock|flow|variable
equation?: <string>,
documentation?: <string>,
units?: <string>,
uniflow?: <boolean>, # For flows only: true if flow should never be negative
inflows?: Array<string>,
outflows?: Array<string>,
dimensions?: Array<string>, # Array of dimension names for arrayed variables
arrayEquations?: [{ # Used for arrayed variables with element-specific equations
equation: <string>,
forElements: Array<string> # Array element names matching dimensions
}],
crossLevelGhostOf?: <string>, # For modular models: references source variable
graphicalFunction?: {
points: [
{x: <number>, y: <number>}
...
]
},
subType?: <string>, # Discrete-entity sub-type (see below)
additionalProperties?: { # Sub-type-specific settings (see below) }
}],
relationships: [{
reasoning?: <string>, # Explanation for why this relationship is here
from: <string>, # The variable the connection starts with
to: <string>, # The variable the connection ends with
polarity: <string>, # "+" or "-" or ""
polarityReasoning?: <string> # Explanation for why this polarity was chosen
}],
modules?: [{ # Module definitions for hierarchical model organization
name: <string>, # Simple module name (alphanumeric + underscores only)
parentModule: <string> # Parent module name (empty string if top-level)
}],
specs?: {
startTime: <number>,
stopTime: <number>,
dt?: <number>,
timeUnits?: <string>,
integrationMethod?: <string>, # "Euler" or "RK4"
arrayDimensions?: [{ # Array dimension definitions (all four fields required)
type: <string>, # "numeric" or "labels" - numeric auto-generates element names as strings ('1','2','3'), labels use user-defined meaningful names
name: <string>, # Singular, alphanumeric dimension name (e.g., "region" not "regions")
size: <number>, # Positive integer - number of elements in dimension
elements: Array<string> # Element names - for numeric: auto-generated ['1','2','3'], for labels: user-defined ['North','South','East','West']
}]
}
}
? denotes an optional attribute
Variables can be arrayed over one or more dimensions to create multi-dimensional arrays:
- Dimensions: Defined in
specs.arrayDimensionswith all four required fields:type: Either "numeric" (auto-generates elements as '1','2','3') or "labels" (user-defined element names)name: Singular, alphanumeric dimension name (e.g., "region" not "regions")size: Positive integer count of elementselements: Array of element names matching the size
- Arrayed Variables: Reference dimensions by name in their
dimensionsarray (order matters) - Array Equations:
- If all elements use the SAME formula: uses
equationfield only - If elements have DIFFERENT formulas OR for arrayed STOCKS: uses
arrayEquationsarray with element-specific equations - Each
arrayEquationsentry hasequationandforElements(ordered to match the variable's dimensions list)
- If all elements use the SAME formula: uses
Models can be organized into modules for better structure and encapsulation:
- Module Definition: Modules are defined in the top-level
modulesarray:name: Simple module name (alphanumeric + underscores, no spaces, never module-qualified)parentModule: Name of containing module (empty string for top-level modules)- Modules can be nested to create hierarchical structures
- Module Naming in Variables: Use dot notation:
ModuleName.variableName(e.g.,Hares.population,Lynx.births) - Ghost Variables: For inter-module references, create cross-level ghost variables:
- Set
crossLevelGhostOfto the fully qualified source variable name - Leave
equationfield empty (empty string) - Ghost variable has same local name as source but exists in consuming module
- All equations in consuming module reference the ghost, not the original source
- Set
Variables can have a subType field that further classifies them. Sub-types are a refinement of type — the top-level type field remains "stock", "flow", or "variable".
Stock sub-types — also set additionalProperties with the relevant configuration:
subType |
Description |
|---|---|
"queue" |
A waiting line that holds discrete items until they are dispatched. |
"oven" |
A batch processor where items are held for a fixed cook time then released together. |
"conveyor" |
A pipeline delay where items travel a fixed transit time before exiting from the other end. |
Flow sub-types — automatically managed flows. Set subType only; leave equation as an empty string:
subType |
Description |
|---|---|
"discreteOutflow" |
The output flow from a conveyor or oven. |
"conveyorLeakage" |
The leakage flow from a conveyor. Set additionalProperties to configure leakage behavior. |
"queueOutflow" |
The output flow from a queue. |
"queueOverflow" |
The overflow flow emitted when a full queue cannot accept new items (requires overflow: true on the queue). |
Variable sub-types — set subType on plain "variable" type entities:
subType |
Description |
|---|---|
"delayVariable" |
A plain variable whose equation contains a DELAY or SMTH builtin function (e.g. DELAY1, DELAY3, DELAY N, SMTH1, SMTH3). Set this whenever any DELAY or SMTH variant appears in the equation. |
additionalProperties fields for conveyor and oven stocks:
| Field | Type | Applies to | Description |
|---|---|---|---|
processTime |
string (equation) | conveyor, oven | Transit time (conveyor) or cook time (oven). Required. |
capacity |
string (equation) | conveyor, oven | Maximum number of items the element can hold. |
inflowLimit |
string (equation) | conveyor, oven | Maximum inflow rate per time step. |
fillTime |
string (equation) | oven only | Time to fill the element before processing begins. |
cleanTime |
string (equation) | oven only | Clean-up time after emptying before accepting new items. |
sample |
string (equation) | conveyor, oven | Re-samples transit/cook time when this expression is non-zero. |
arrest |
string (equation) | conveyor, oven | Halts movement when this expression is non-zero. |
oneAtATime |
boolean | If true, accepts only one batch per time step. | |
splitBatches |
boolean | If true, incoming batches can be split when entering. |
additionalProperties fields for regular flows (inflows to a conveyor):
| Field | Type | Description |
|---|---|---|
spreadFlow |
string enum | How this flow distributes along the conveyor when it enters: "none" (default, front-entry), "even", "destination", "distribution" (requires distribEq), "source". |
distribEq |
string (equation) | Distribution table equation. Required when spreadFlow is "distribution". |
additionalProperties fields for conveyorLeakage flows:
| Field | Type | Description |
|---|---|---|
leakFraction |
string (equation) | Fraction of conveyor contents that leak out per time step. |
exponential |
boolean | If true, leakage is exponential (constant fraction); if false (default), linear. |
leakZoneStart |
string (equation) | Start position (0–100%) along the conveyor where leakage begins. Leave empty for leakage across the entire length. |
leakZoneEnd |
string (equation) | End position (0–100%) along the conveyor where leakage ends. Leave empty for leakage across the entire length. |
leakIntegers |
boolean | If true, leakage amounts are rounded to whole integers. |
ignorePrevZones |
boolean | If true, each leak zone operates independently of losses from earlier zones. |
forceLeakFraction |
boolean | If true, the same leak fraction is applied regardless of transit duration. |
additionalProperties fields for queue stocks:
| Field | Type | Description |
|---|---|---|
fifoEnabled |
boolean | If true, dispatches in FIFO order; if false (default), LIFO. |
discrete |
boolean | If true, operates on integer quantities only (discrete mode). |
roundRobin |
boolean | If true, competing outflows are served in round-robin order. |
queueOutflowPriority |
string (equation) | Dispatch priority for the queue outflow. |
purgeEq |
string (equation) | Items older than this age (in time units) are automatically removed. |
overflow |
boolean | If true, a queueOverflow flow is automatically created for excess items. |
{
output: {
textContent: <string, the response to the query from the user>
}
}
{
feedbackLoops: [{
identifier: <string>,
name: <string>,
links: [
{ from: <string>, to: <string>, polarity: <string - +|-|? > }
...
],
polarity: <string +|-|?>,
loopset?: <number>
“Percent of Model Behavior Explained By Loop”?: [
{ time: <number>, value: <number> }
...
]
}],
dominantLoopsByPeriod?: [{
dominantLoops: Array<string>,
startTime: <number>,
endTime: <number>
}]
}
The agent/ directory contains a WebSocket server that wraps the SD-AI engines in a conversational AI agent for building and modifying System Dynamics models interactively.
Key characteristics:
- Stateless — all model state, run data, and conversation history live on the client
- All core tools are built-in (get/update model, run simulation, fetch variable data, feedback loops, visualizations)
- Clients can optionally register custom tools for application-specific behavior
- Agent personalities are configured via Markdown files in
agent/config/ - Visualizations are returned as raw SVG strings
WebSocket endpoint: ws://localhost:3000/api/v1/agent
Protocol summary: client connects → initialize_session (model type + initial model) → session_ready (agent list) → select_agent → chat messages → agent responds with agent_text, visualization, and tool_call_request messages that the client must answer.
See agent/README.md for the full WebSocket protocol, all message types, tool call request/response formats, and example client implementation.
- fork this repo and git clone your fork locally
- create an
.envfile at the top level which has the following keys:
OPENAI_API_KEY="sk-asdjkshd" # if you're doing work with engines that use the LLMWrapper class in utils.js (quantitative, qualitative, seldon, etc.)
GEMINI_API_KEY="asdjkshd" # if you're doing work with engines using Gemini models (causal-chains, seldon, quantitative, qualitative)
ANTHROPIC_API_KEY="sk-ant-asdjkshd" # if you're using Claude models for engines or the agent
OPEN_ROUTER_API_KEY="sk-or-asdjkshd" # if you're using OpenRouter-routed models (Qwen, Deepseek, Kimi) for engines or the agent
AUTHENTICATION_KEY="my_secret_key" # only needed for securing publically accessible deployments. Requires client pass an Authentication header matching this value. e.g. `curl -H "Authentication: my_super_secret_value_in_env_file"` to the engine generate request only
REPORTER_URL="https://your-metrics-server.com/api/metrics" # optional URL to POST engine usage metrics to. If not set, metrics reporting is disabled.
TOKEN_REPORTER_URL="https://your-metrics-server.com/api/token-usage" # optional URL to POST agent LLM token usage and cost to. If not set, token reporting is disabled.
- npm install
- npm start
- (optional) npm run evals -- -e evals/experiments/careful.json
- (optional) npm test
- (optional) npm test:coverage
We recommend VSCode using a launch.json for the Node type applications (you get a debugger, and hot-reloading)
Some engines require additional dependencies to be installed on your system:
- Go 1.24.0 or later - Required for the causal-chains engine (installation guide)
- Python 3.x - Required for the causal-decoder engine and the and the agentic tools
- Docker (or Podman aliased as
docker) - Required for the test-simlin-agent engine (Docker installation guide). The Docker daemon must be running whennpm installruns the postinstall hook; if Docker is missing or the daemon is unreachable, the image build is skipped and the engine is disabled.
These dependencies are automatically built/installed when you run npm install via postinstall hooks, but only if the respective toolchains are available on your PATH.
To skip specific components during installation, set the SKIP_THIRD_PARTY_COMPONENTS environment variable to a comma-separated list of component names before running npm install:
Mac/Linux:
SKIP_THIRD_PARTY_COMPONENTS=causal-decoder,PySD-simulator,time-series-behavior-analysis npm installWindows:
set SKIP_THIRD_PARTY_COMPONENTS=causal-decoder,PySD-simulator,time-series-behavior-analysis && npm installAvailable component names and what they affect:
| Component | Effect of skipping |
|---|---|
causal-chains |
Disables the causal-chains engine |
causal-decoder |
Disables the causal-decoder engine |
PySD-simulator |
Breaks evals |
time-series-behavior-analysis |
Breaks evals |
visualization-engine |
Breaks agentic tools |
simlin-agent |
Disables the test-simlin-agent engine |
The agentic assistant runs each session's agent in an isolated worker process. On Linux, worker processes are sandboxed using bubblewrap (bwrap), which uses Linux kernel namespaces to confine the agent to its session-specific temp directory. The agent cannot read or write anywhere else on the server filesystem — including other sessions, application source code, or environment variables on disk.
Install bubblewrap via your system package manager (bubblewrap on most distros). See the bubblewrap releases page for more options.
| Isolation | Guarantee |
|---|---|
| Filesystem writes | Agent can only write to its session temp dir |
| Filesystem reads | Only app code, system libs, and TLS certs are visible |
| Cross-session access | Other sessions' temp dirs are not mounted |
| Process isolation | Separate PID namespace; agent cannot signal other processes |
| Hostname isolation | Separate UTS namespace |
The Python subprocess spawned for visualizations inherits the same bwrap namespace automatically — no separate Python-level sandbox is needed.
bwrap is a Linux kernel feature and is not available on macOS or Windows. On those platforms the agent worker runs unsandboxed with full filesystem access. A prominent warning is logged at startup. This is acceptable for local development but must not be used for any publicly hosted deployment.
- Network access — the agent worker must reach the upstream LLM APIs (Anthropic, Google, and/or OpenRouter depending on the configured providers). The agent can make arbitrary outbound HTTP requests if prompted to do so. Restrict this at the network/firewall level if needed.
SD-AI includes optional metrics reporting via the GenerateMetricsReporter class. When enabled, it automatically tracks and reports usage data for every engine generation request.
Set the REPORTER_URL environment variable in your .env file to enable metrics reporting:
REPORTER_URL="https://your-metrics-server.com/api/metrics"
If REPORTER_URL is not set or is empty, metrics reporting is disabled and no HTTP requests are made.
For each call to /api/v1/:engine/generate, the following JSON data is posted to the configured URL:
{
"engine": "quantitative",
"underlyingModel": "gpt-4o-mini",
"duration": 1234,
"timestamp": "2024-01-15T10:30:00.000Z"
}Fields:
engine(string): The name of the engine used (e.g., "quantitative", "qualitative", "seldon")underlyingModel(string|null): The underlying LLM model specified in the request body, or null if not providedduration(number): Time in milliseconds for the generate call to completetimestamp(string): ISO 8601 timestamp of when the report was generated
The reporter sends metrics asynchronously and will not block or affect the engine response, even if the reporting endpoint is unavailable.
The agent uses TokenUsageReporter to track token usage and cost for every LLM call made using this service. This is separate from the engine metrics above — it covers the agent's internal Anthropic, Gemini, OpenAI, and OpenRouter calls rather than top-level HTTP engine requests.
Set TOKEN_REPORTER_URL in your .env file to enable reporting:
TOKEN_REPORTER_URL="https://your-metrics-server.com/api/token-usage"
Reporting is only active when both TOKEN_REPORTER_URL is set and the client provided a clientId in the initialize_session WebSocket message or as an additional parameter to an engine call. If either is missing, usage is still logged to the server console but not POSTed anywhere.
Regardless of whether remote reporting is enabled, every LLM call logs a line to the server console:
[usage:anthropic] input=1234($0.003702) output=256($0.003840) cache_write_5m=0($0.000000) cache_write_1h=0($0.000000) cache_read=512($0.000461) total=$0.008003
[usage:gemini] input=800($0.000160) output=120($0.000072) cached=200($0.000010) thoughts=40($0.000024) total=$0.000266
[usage:openai] input=600($0.000300) output=150($0.000225) cached=100($0.000025) reasoning=0 total=$0.000550
[usage:openrouter] input=820 output=140 cached=0 cache_write=0 total=$0.001425
Per-token costs are shown in parentheses when pricing data is available for the model. If pricing is unknown the token counts are shown without a cost. The openrouter line omits per-component dollar amounts because OpenRouter returns the authoritative total cost on the response itself — we don't recompute per-token breakdowns locally.
When remote reporting is active, the following JSON is POSTed to TOKEN_REPORTER_URL for each LLM call:
{
"clientId": "client-provided-id",
"provider": "anthropic",
"model": "claude-sonnet-4-6",
"tokens": {
"inputTokens": 1234,
"outputTokens": 256,
"cacheCreation5mInputTokens": 0,
"cacheCreation1hInputTokens": 0,
"cacheReadInputTokens": 512
},
"cost": 0.008003,
"timestamp": "2024-01-15T10:30:00.000Z"
}The tokens shape varies by provider:
| Provider | Token fields |
|---|---|
anthropic |
inputTokens, outputTokens, cacheCreation5mInputTokens, cacheCreation1hInputTokens, cacheReadInputTokens |
gemini |
inputTokens, outputTokens, cachedTokens, thoughtsTokens |
openai |
inputTokens, outputTokens, cachedTokens, reasoningTokens |
openrouter |
inputTokens, outputTokens, cachedTokens, cacheWriteTokens, providerCost |
cost is the total dollar cost of the call, or null if pricing data is unavailable for the model. For openrouter, cost is taken directly from the provider's authoritative usage.cost (no local pricing table is consulted), and providerCost in the token block carries that same value for transparency.
The reporter fires asynchronously and never blocks or fails the agent response if the reporting endpoint is unavailable.
Unit tests are provided for:
- HTTP API routes in
/routes/v1folder:engineParameters.test.js- Validates that all engines return correct parametersengineGenerate.test.js- Tests model generation endpoints with authentication, parameter validation, and response structureengines.test.js- Tests engine listing and metadata endpoints
- Engine implementations in
/enginesfolder:QuantitativeEngineBrain.test.js- Tests quantitative model generation and LLM setupQualitativeEngineBrain.test.js- Tests qualitative diagram generationSeldonBrain.test.js- Tests discussion engine functionality
- Evaluation methods in
/evals/categories- Tests cover causal relationship evaluation, conformance validation, and quantitative model assessment - Model output evaluation in
/evals/model_output_evaluation- Standalone tools for classifying System Dynamics model output (time series) into behavioral patterns like exponential growth, oscillation, or S-shaped growth
Run tests with:
npm testGenerate code coverage report with:
npm run test:coverageTests are built using Jest and Supertest, and use the actual engine implementations (no mocking) to ensure real-world functionality.
- checkout the Evals README
- for model output behavior classification, see Model Output Evaluation
- https://github.com/bear96/System-Dynamics-Bot served as departure point for engine prompt development
- CoModel created by the team at Skip Designed to use Generative AI in their CBSD work