A quick reference for the most common TraceCore/tracecore CLI issues (the agent-bench legacy alias still works) across installation,
CLI runs, tasks, and the optional web UI. When in doubt, inspect the latest artifact in .agent_bench/runs/.
Tip: Load .agent_bench/runs/<run_id>.json directly, or use tracecore runs list --limit 5 to find recent run IDs. The dashboard trace viewer at /?trace_id=<run_id> surfaces the same validator and harness messages.
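If you would rather inspect the newest artifact from a script, a minimal sketch along these lines may help. It assumes artifacts are plain JSON files stored directly under .agent_bench/runs/; the helper name latest_run_artifact is hypothetical, and the artifact's key names are not assumed.

```python
import json
from pathlib import Path
from typing import Optional


def latest_run_artifact(runs_dir: Path) -> Optional[Path]:
    """Return the most recently modified *.json file in runs_dir, or None."""
    if not runs_dir.is_dir():
        return None
    candidates = list(runs_dir.glob("*.json"))
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    path = latest_run_artifact(Path(".agent_bench/runs"))
    if path is None:
        print("no run artifacts found")
    else:
        artifact = json.loads(path.read_text(encoding="utf-8"))
        # Key names may differ between harness versions, so only list what is present.
        keys = sorted(artifact) if isinstance(artifact, dict) else type(artifact).__name__
        print(path.name, keys)
```

Run it from the directory that contains .agent_bench/ so the relative path resolves.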
Before installing TraceCore, follow FastAPI's virtual environment guidance to create and activate an isolated interpreter (e.g., python -m venv .venv then source .venv/bin/activate or .venv\Scripts\activate). Running installs and CLI commands inside the same shell session avoids PATH confusion.
Development (git clone) setup:
git clone https://github.com/justindobbs/Tracecore.git
cd Tracecore
python -m venv .venv && .venv\Scripts\activate
pip install -e .[dev]
- Live edits to CLI/tasks/agents work immediately
- Uses local files directly from repo
- Run from repo root for relative imports
Pip install setup:
pip install tracecore
# or
pipx install tracecore
- Installed to system/site-packages
- Must reinstall to get updates
- Works from any directory
# From repo root - build and test wheel
python -m build --wheel
python -m venv .tmp-tracecore
.tmp-tracecore\Scripts\pip install dist\tracecore-*.whl
# Verify it works
.tmp-tracecore\Scripts\tracecore --help
.tmp-tracecore\Scripts\tracecore run pairing --list
- Ensure you ran pip install -e .[dev] (development setup) OR pip install tracecore (pip setup).
- Activate the virtualenv before running commands (.venv\Scripts\activate on Windows).
- Verify you are invoking the same interpreter that owns the install (e.g., which python, or where python on Windows).
Development setup specific:
- Run from repo root so relative imports (tasks, agents) resolve correctly.
- Re-run pip install -e .[dev] if the editable link was removed.
Pip install setup specific:
- Should work from any directory - no need to be in repo root.
- Reinstall with pip install --upgrade tracecore to get updates.
Windows-specific
- Add %APPDATA%\Python\Python3x\Scripts (or the pipx shim dir) to PATH. See the "Windows PATH tip" in README.md for step-by-step instructions.
- After editing PATH, open a new terminal so the shell picks up the change.
Common pitfalls
- Launching tracecore from PowerShell after activating the virtualenv in Command Prompt (or vice versa). Activate the env in the same shell you use to run the CLI so PATH and PYTHONPATH match.
- Development setup: Running commands from inside the .venv/ folder. Always run tracecore from the repo root so relative imports (tasks, agents) resolve correctly.
- Pip setup: Expecting live edits to work. Changes require rebuilding and reinstalling the package.
The package is not on PYTHONPATH.
Development setup: Activate the same virtualenv used for installation or reinstall with pip install -e . if the editable link was removed.
Pip setup: Ensure you're using the correct Python environment where TraceCore was installed, or reinstall with pip install tracecore.
If python points at a different interpreter than the one that ran pip, the scripts land in
another site-packages. Pin a single interpreter via py -3.12 -m venv .venv && .venv\Scripts\activate
(Windows) or python3.12 -m venv .venv (macOS/Linux).
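To see which interpreter and install are actually in use, a small diagnostic like the one below can be run with the same python you use for pip. This is only a sketch: the function name describe_install is hypothetical, and the importable module name "tracecore" is an assumption (swap in the real one if it differs).

```python
import importlib.util
import shutil
import sys


def describe_install(module_name: str = "tracecore", cli_name: str = "tracecore") -> dict:
    """Report the running interpreter and where the module and CLI resolve from."""
    spec = importlib.util.find_spec(module_name)  # None if not importable here
    return {
        "interpreter": sys.executable,
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "module_location": spec.origin if spec else None,
        "cli_on_path": shutil.which(cli_name),  # None if the script is not on PATH
    }


if __name__ == "__main__":
    for key, value in describe_install().items():
        print(f"{key}: {value}")
```

If module_location is None while cli_on_path is set (or the reverse), the CLI script and the package most likely belong to different interpreters.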
Generate a stub with the correct reset / observe / act interface:
tracecore new-agent my_agent # creates agents/my_agent_agent.py
tracecore new-agent my-agent # kebab-case → MyAgentAgent
tracecore new-agent my_agent --output-dir src/ # write to a different directory
tracecore new-agent my_agent --force # overwrite an existing file
The generated file is immediately importable and runnable. Replace the # TODO block in act() with your decision logic, then test it:
tracecore run --agent agents/my_agent_agent.py --task filesystem_hidden_config@1 --seed 0
If the file already exists and --force is not set, the command exits non-zero with a clear error rather than silently overwriting.
The fastest way to fire a known-good run without memorizing flags:
tracecore run pairing log_stream_monitor # run by name, seed 0
tracecore run pairing log_stream_monitor --seed 7 # custom seed
tracecore run pairing --list # show all available pairings
If you are inside a directory that contains exactly one paired agent file, the name can be omitted and it auto-selects. If the name is unknown or ambiguous, the CLI prints the pairing list and exits with a non-zero code.
Smoke-test every pairing in sequence (CI-friendly — exits non-zero if any fail):
tracecore run pairing --all
tracecore run pairing --all --seed 7 --timeout 120 # 120 s wall-clock limit per run
Prevent a hung agent from blocking CI indefinitely:
tracecore run --agent agents/toy_agent.py --task filesystem_hidden_config@1 --seed 0 --timeout 60
tracecore run pairing log_stream_monitor --timeout 90
If the run exceeds the limit, the CLI exits immediately with a non-zero code and a clear message. The timeout is enforced via a daemon thread so the process terminates cleanly.
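For intuition about why a daemon thread lets the process exit cleanly, here is a generic sketch of one common wall-clock timeout pattern. It is not TraceCore's actual implementation, and run_with_timeout is a hypothetical name.

```python
import threading
import time
from typing import Callable


def run_with_timeout(fn: Callable[[], None], timeout_s: float) -> bool:
    """Run fn in a daemon thread; return True if it finished within timeout_s."""
    worker = threading.Thread(target=fn, daemon=True)
    worker.start()
    worker.join(timeout_s)  # wait at most timeout_s seconds
    return not worker.is_alive()


if __name__ == "__main__":
    finished = run_with_timeout(lambda: time.sleep(5), timeout_s=0.2)
    # A daemon thread does not keep the interpreter alive, so exiting here
    # ends the process even though the worker is still sleeping.
    raise SystemExit(0 if finished else 1)
```

The design choice is that the main thread only waits; because the worker is a daemon, a hung call cannot block shutdown.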
Print a compact table of recent runs without opening the dashboard:
tracecore runs summary # last 20 runs
tracecore runs summary --task log_stream_monitor@1 # filter by task
tracecore runs summary --failure-type budget_exhausted # filter by outcome
tracecore runs summary --limit 5 # fewer rows
For raw JSON (e.g., for scripting) use tracecore runs list with the same filters.
tracecore maintain forwards flags after --. Use:
tracecore maintain --fix-agent path/to/file.py -- --maxfail=1 -q
See docs/maintainer.md for the full command reference.
- The module must expose a class implementing reset, observe, and act.
- Confirm the file path is importable (relative to repo root or absolute path).
- The loader picks the first class in the module that satisfies the interface; rename or reorder classes if multiple candidates exist.
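As a rough illustration of a class that would satisfy the checklist above, consider the sketch below. The method signatures, the class name EchoAgent, and the "wait" action type are all assumptions made for illustration; docs/agent_interface.md is the authority on the real contract.

```python
from typing import Any, Dict, Optional


class EchoAgent:
    """Hypothetical minimal agent; signatures here are assumed, not confirmed."""

    def reset(self, task_spec: Dict[str, Any]) -> None:
        # Clear all per-run state so repeated runs start from the same point.
        self.task_spec = task_spec
        self.last_observation: Optional[Dict[str, Any]] = None

    def observe(self, observation: Dict[str, Any]) -> None:
        # Store the latest observation for act() to use.
        self.last_observation = observation

    def act(self) -> Dict[str, Any]:
        # Always return a dict carrying both "type" and "args"; never None.
        # "wait" is a placeholder action type, not a documented one.
        return {"type": "wait", "args": {}}
```

Keeping a single agent class per module sidesteps the first-match behaviour of the loader.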
Compare your emitted action schema to the task docs (docs/tasks.md + per-task README).
Common slips:
- Missing args dict
- Wrong action type
- Returning None instead of a dict
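The three slips above can be caught locally before the harness sees them. The following is a hedged sketch: action_problems is a hypothetical helper, the "type" and "args" keys follow the list above, and the set of allowed types must come from the task docs.

```python
from typing import Any, List, Set


def action_problems(action: Any, allowed_types: Set[str]) -> List[str]:
    """Return human-readable problems with an action; an empty list means it looks fine."""
    if action is None:
        return ["act() returned None instead of a dict"]
    if not isinstance(action, dict):
        return [f"expected a dict, got {type(action).__name__}"]
    problems = []
    if action.get("type") not in allowed_types:
        problems.append(f"unknown action type: {action.get('type')!r}")
    if not isinstance(action.get("args"), dict):
        problems.append("missing or non-dict 'args'")
    return problems
```

Calling it on each act() result during local debugging turns a vague rejection into a specific message.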
- Agents that loop without checking observation-provided budgets can churn through steps before making progress. Read the remaining_steps / remaining_tool_calls fields in observations and stop early when low.
- For debugging, add logging around each action to confirm you are not retrying the same failing call.
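A budget check along these lines is one way to stop early. It is a sketch that assumes remaining_steps and remaining_tool_calls appear as top-level keys of the observation dict; the function name and thresholds are hypothetical.

```python
from typing import Any, Dict


def should_stop_early(
    observation: Dict[str, Any], min_steps: int = 1, min_tool_calls: int = 1
) -> bool:
    """Return True when either reported budget is at or below its threshold."""
    steps = observation.get("remaining_steps")
    calls = observation.get("remaining_tool_calls")
    # A missing field is treated as "unknown", not as "exhausted".
    low_steps = steps is not None and steps <= min_steps
    low_calls = calls is not None and calls <= min_tool_calls
    return low_steps or low_calls
```

An agent could call this at the top of act() and switch to its final or cheapest action when it returns True.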
If a re-run produces different traces:
- Confirm you re-used the same --seed, agent, and task.
- Inspect .agent_bench/runs/<run_id>.json for harness_version mismatches.
- Ensure external randomness (e.g., random, numpy, or model sampling) is seeded from task_spec.
- Avoid wall-clock timestamps inside actions; store logical timers instead.
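One way to keep agent-side randomness reproducible is a private generator built from the task spec. This sketch assumes task_spec is a dict with a "seed" key, which is not confirmed by this guide; make_rng is a hypothetical helper.

```python
import random
from typing import Any, Dict


def make_rng(task_spec: Dict[str, Any], default_seed: int = 0) -> random.Random:
    """Build an isolated RNG from the task spec ("seed" key is an assumption)."""
    seed = task_spec.get("seed", default_seed)
    # A private Random instance avoids sharing state with other code that
    # calls the module-level random functions.
    return random.Random(seed)


if __name__ == "__main__":
    first = make_rng({"seed": 7})
    second = make_rng({"seed": 7})
    # Same seed, same sequence.
    print([first.randint(0, 9) for _ in range(5)] == [second.randint(0, 9) for _ in range(5)])
```

For NumPy, numpy.random.default_rng(seed) provides the same kind of isolated generator.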
The harness rejected the agent's output. Inspect the trace entry with "invalid_action_reason"
for field-level details.
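Because the exact nesting of trace entries is not described here, a recursive search is a safe way to pull out every "invalid_action_reason" value from a loaded artifact. The helper name find_key is hypothetical and makes no assumption about the artifact layout.

```python
from typing import Any, Iterator


def find_key(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key` anywhere in nested dicts and lists."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            yield from find_key(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)
```

Usage: after artifact = json.load(f), iterate over find_key(artifact, "invalid_action_reason") and print each reason.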
The validator declared {"ok": false, "terminal": true} or the run ended without a success
condition. Typical causes:
- Required artifact (API key, patch, token) missing or misformatted.
- Agent skipped a validation step (validate.py failed its checks).
Open .agent_bench/runs/<run_id>.json directly or load /?trace_id=<run_id> in the dashboard to see the validator message.
- Budgets are set in the task's task.toml manifest. To debug, temporarily increase them there and re-run.
- Inspect whether you are stuck in recovery loops (e.g., repeating read_file on the same path).
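To spot a recovery loop, it can help to count identical actions. The sketch below assumes you have collected the agent's actions as a list of JSON-serializable dicts (for example via your own logging); repeated_actions and the threshold are hypothetical.

```python
import json
from collections import Counter
from typing import Any, Dict, List


def repeated_actions(actions: List[Dict[str, Any]], threshold: int = 3) -> Dict[str, int]:
    """Map each action (as canonical JSON) to its count when seen >= threshold times."""
    # sort_keys makes dicts with the same content serialize identically.
    counts = Counter(json.dumps(action, sort_keys=True) for action in actions)
    return {encoded: n for encoded, n in counts.items() if n >= threshold}
```

A non-empty result for something like read_file on one path is a strong hint that the agent is retrying instead of changing strategy.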
- Timeouts occur only if you passed --timeout or a task enforces one.
- non_termination is reserved; if you see it, file a bug with the trace and harness version.
- Confirm the FastAPI server log shows the incoming request. If not, CORS or proxy filters might block POSTs.
- Ensure the backend is running in the same environment that has the agents/tasks installed.
- When using --reload, the process restarts on file changes; avoid editing large directories while running tests.
- --reload is for local development only. Do not expose the dashboard with --reload on shared or networked machines. For stable serving, omit the flag.
- Agent path must be relative to repo root or absolute. Use the dropdown or copy/paste from agents/.
- Task IDs require the @version suffix (e.g., filesystem_hidden_config@1).
- Seeds must be integers.
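The last two rules are easy to pre-check before submitting a run. This is a hypothetical helper, not part of TraceCore; it only checks the shape described above and does not verify that the task or version actually exists.

```python
from typing import List


def check_run_inputs(task_id: str, seed: str) -> List[str]:
    """Return problems with a task ID / seed pair; an empty list means both look valid."""
    problems = []
    name, sep, version = task_id.partition("@")
    if not (name and sep and version):
        problems.append("task ID needs an @version suffix, e.g. filesystem_hidden_config@1")
    try:
        int(seed)
    except ValueError:
        problems.append("seed must be an integer")
    return problems
```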
The runs list is cached per session. Force-refresh with Ctrl+Shift+R or call tracecore runs summary --task log_stream_monitor@1 --limit 5 to verify the artifacts exist. If .agent_bench/runs/ is empty, ensure the CLI has write permissions (network shares may block file creation).
You are running in dry-run mode. Add --apply to write changes.
Either the fixer does not target that file, or it is already idempotent. Cross-check the fixer
patterns in docs/maintainer.md and run the dedicated formatter (e.g., ruff format, black)
if needed.
- Re-run locally with the --seed reported in CI.
- Check for hard-coded paths that only exist on CI runners (e.g., /home/runner/work/...). Use repo-relative paths instead.
- If CI lacks the recorded agent dependencies, ensure pyproject.toml extras cover them and pip install -e .[dev] is part of the workflow (see docs/ci_workflow.md).
- Inspect the latest .agent_bench/runs/<run_id>.json for harness + validator messages.
- Reproduce with a fixed --seed and inspect the resulting artifact.
- Capture environment details:
  - python --version
  - pip show tracecore
  - OS / shell
- Cross-reference specialized docs:
  - docs/agent_interface.md for contract issues
  - docs/task_manifest.md + per-task READMEs for action schemas
  - docs/record_mode.md and docs/manual_verification.md for replay workflows
- File an issue with the above details if the problem persists.
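The environment details in the checklist above can be gathered in one step with a short script. This is a sketch: environment_report is a hypothetical name, and the distribution name "tracecore" is taken from the pip install command in this guide.

```python
import platform
import sys
from importlib import metadata
from typing import Dict


def environment_report(package: str = "tracecore") -> Dict[str, str]:
    """Collect the Python version, platform string, and installed package version."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = "not installed"
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        package: version,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Paste the output into the issue alongside the run artifact and the shell you used.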