Troubleshooting Guide

A quick reference for the most common TraceCore/tracecore CLI issues (the agent-bench legacy alias still works) across installation, CLI runs, tasks, and the optional web UI. When in doubt, inspect the latest artifact in .agent_bench/runs/.

Tip: Load .agent_bench/runs/<run_id>.json directly, or use tracecore runs list --limit 5 to find recent run IDs. The dashboard trace viewer at /?trace_id=<run_id> surfaces the same validator and harness messages.

1. Installation & Environment

Before installing TraceCore, follow FastAPI's virtual environment guidance to create and activate an isolated interpreter (e.g., python -m venv .venv then source .venv/bin/activate or .venv\Scripts\activate). Running installs and CLI commands inside the same shell session avoids PATH confusion.

Development vs Pip Install: Which setup do you have?

Development (git clone) setup:

git clone https://github.com/justindobbs/Tracecore.git
cd Tracecore
python -m venv .venv && .venv\Scripts\activate
pip install -e .[dev]

Live edits to CLI/tasks/agents work immediately
Uses local files directly from repo
Run from repo root for relative imports

Pip install setup:

pip install tracecore
# or
pipx install tracecore

Installed to system/site-packages
Must reinstall to get updates
Works from any directory

Testing your packaged install (mirrors CI)

# From repo root - build and test wheel
python -m build --wheel
python -m venv .tmp-tracecore
.tmp-tracecore\Scripts\pip install dist\tracecore-*.whl

# Verify it works
.tmp-tracecore\Scripts\tracecore --help
.tmp-tracecore\Scripts\tracecore run pairing --list

`tracecore: command not found`

Ensure you ran pip install -e .[dev] (development setup) OR pip install tracecore (pip setup).
Activate the virtualenv before running commands (.venv\Scripts\activate on Windows).
Verify you are invoking the same interpreter that owns the install (e.g., which python).

Development setup specific:

Run from repo root so relative imports (tasks, agents) resolve correctly.
Use pip install -e .[dev] if the editable link was removed.

Pip install setup specific:

Should work from any directory - no need to be in repo root.
Reinstall with pip install --upgrade tracecore to get updates.

Windows-specific

Add %APPDATA%\Python\Python3x\Scripts (or the pipx shim dir) to PATH. See the "Windows PATH tip" in README.md for step-by-step instructions.
After editing PATH, open a new terminal so the shell picks up the change.

Common pitfalls

Launching tracecore from PowerShell after activating the virtualenv in Command Prompt (or vice versa). Activate the env in the same shell you use to run the CLI so PATH and PYTHONPATH match.
Development setup: Running commands from inside the .venv/ folder. Always run tracecore from the repo root so relative imports (tasks, agents) resolve correctly.
Pip setup: Expecting live edits to work. Changes require rebuilding and reinstalling the package.

`ModuleNotFoundError: No module named 'agent_bench'`

The package is not on PYTHONPATH.

Development setup: Activate the same virtualenv used for installation or reinstall with pip install -e . if the editable link was removed.

Pip setup: Ensure you're using the correct Python environment where TraceCore was installed, or reinstall with pip install tracecore.

Mixed Python versions between install and runtime

If python points at a different interpreter than the one that ran pip, the scripts land in another site-packages. Pin a single interpreter via py -3.12 -m venv .venv && .venv\Scripts\activate (Windows) or python3.12 -m venv .venv (macOS/Linux).

2. CLI Invocation Errors

Scaffold a new agent: `new-agent`

Generate a stub with the correct reset / observe / act interface:

tracecore new-agent my_agent                        # creates agents/my_agent_agent.py
tracecore new-agent my-agent                        # kebab-case → MyAgentAgent
tracecore new-agent my_agent --output-dir src/      # write to a different directory
tracecore new-agent my_agent --force                # overwrite an existing file

The generated file is immediately importable and runnable. Replace the # TODO block in act() with your decision logic, then test it:

tracecore run --agent agents/my_agent_agent.py --task filesystem_hidden_config@1 --seed 0

If the file already exists and --force is not set, the command exits non-zero with a clear error rather than silently overwriting.

Quick-start: `run pairing`

The fastest way to fire a known-good run without memorizing flags:

tracecore run pairing log_stream_monitor          # run by name, seed 0
tracecore run pairing log_stream_monitor --seed 7 # custom seed
tracecore run pairing --list                      # show all available pairings

If you are inside a directory that contains exactly one paired agent file, the name can be omitted and it auto-selects. If the name is unknown or ambiguous, the CLI prints the pairing list and exits with a non-zero code.

Smoke-test every pairing in sequence (CI-friendly — exits non-zero if any fail):

tracecore run pairing --all
tracecore run pairing --all --seed 7 --timeout 120   # 120 s wall-clock limit per run

Wall-clock timeout: `--timeout`

Prevent a hung agent from blocking CI indefinitely:

tracecore run --agent agents/toy_agent.py --task filesystem_hidden_config@1 --seed 0 --timeout 60
tracecore run pairing log_stream_monitor --timeout 90

If the run exceeds the limit the CLI exits immediately with a non-zero code and a clear message. The timeout is enforced via a daemon thread so the process terminates cleanly.

Inspect recent runs: `runs summary`

Print a compact table of recent runs without opening the dashboard:

tracecore runs summary                                  # last 20 runs
tracecore runs summary --task log_stream_monitor@1      # filter by task
tracecore runs summary --failure-type budget_exhausted  # filter by outcome
tracecore runs summary --limit 5                        # fewer rows

For raw JSON (e.g., for scripting) use tracecore runs list with the same filters.

`pytest got unrecognized arguments: --fix-agent`

tracecore maintain forwards flags after --. Use:

tracecore maintain --fix-agent path/to/file.py -- --maxfail=1 -q

See docs/maintainer.md for the full command reference.

`No compatible agent class found`

The module must expose a class implementing reset, observe, and act.
Confirm the file path is importable (relative to repo root or absolute path).
The loader picks the first class in the module that satisfies the interface; rename or reorder classes if multiple candidates exist.

`Invalid action` failures on start

Compare your emitted action schema to the task docs (docs/tasks.md + per-task README). Common slips:

Missing args dict
Wrong action type
Returning None instead of a dict

`Budget exhausted` immediately

Agents that loop without checking observation-provided budgets can churn through steps before making progress. Read the remaining_steps / remaining_tool_calls fields in observations and stop early when low.
For debugging, add logging around each action to confirm you are not retrying the same failing call.

Determinism drift detected

If a re-run produces different traces:

Confirm you re-used the same --seed, agent, and task.
Inspect .agent_bench/runs/<run_id>.json for harness_version mismatches.
Ensure external randomness (e.g., random, numpy, or model sampling) is seeded from task_spec.
Avoid wall-clock timestamps inside actions; store logical timers instead.

3. Runner / Validator Outcomes

`failure_type: invalid_action`

The harness rejected the agent's output. Inspect the trace entry with "invalid_action_reason" for field-level details.

`failure_type: logic_failure`

The validator declared {"ok": false, "terminal": true} or the run ended without a success condition. Typical causes:

Required artifact (API key, patch, token) missing or misformatted.
Agent skipped a validation step (validate.py failed its checks).

Open .agent_bench/runs/<run_id>.json directly or load /?trace_id=<run_id> in the dashboard to see the validator message.

`failure_type: budget_exhausted`

Budgets are set in the task's task.toml manifest. To debug, temporarily increase them there and re-run.
Inspect whether you are stuck in recovery loops (e.g., repeating read_file on the same path).

`failure_type: timeout` or `non_termination`

Timeouts occur only if you passed --timeout or a task enforces one.
non_termination is reserved; if you see it, file a bug with the trace and harness version.

4. Web UI / Dashboard

Page loads but runs never start

Confirm the FastAPI server log shows the incoming request. If not, CORS or proxy filters might block POSTs.
Ensure the backend is running in the same environment that has the agents/tasks installed.
When using --reload, the process restarts on file changes; avoid editing large directories while running tests.

--reload is for local development only. Do not expose the dashboard with --reload on shared or networked machines. For stable serving, omit the flag.

`422 Unprocessable Entity` when submitting the form

Agent path must be relative to repo root or absolute. Use the dropdown or copy/paste from agents/.
Task IDs require the @version suffix (e.g., filesystem_hidden_config@1).
Seeds must be integers.

Dashboard shows stale runs

Runs list is cached per session. Force-refresh with Ctrl+Shift+R or call

tracecore runs summary --task log_stream_monitor@1 --limit 5

to verify the artifacts exist. If .agent_bench/runs/ is empty, ensure the CLI has write permissions (network shares may block file creation).

5. Maintenance & CI Helpers

`tracecore maintain` says `changed: true` but files are untouched

You are running in dry-run mode. Add --apply to write changes.

`changed: false` but file still violates style

Either the fixer does not target that file, or it is already idempotent. Cross-check the fixer patterns in docs/maintainer.md and run the dedicated formatter (e.g., ruff format, black) if needed.

CI fails with `tracecore run ... exited 1`

Re-run locally with the --seed reported in CI.
Check for relative paths that only exist on CI runners (e.g., /home/runner/work/...). Use repo-relative paths instead.
If CI lacks the recorded agent dependencies, ensure pyproject.toml extras cover them and pip install -e .[dev] is part of the workflow (see docs/ci_workflow.md).

6. When All Else Fails

Inspect the latest .agent_bench/runs/<run_id>.json for harness + validator messages.
Reproduce with a fixed --seed and inspect the resulting artifact.
Capture environment details:
- python --version
- pip show agent-bench
- OS / shell
Cross-reference specialized docs:
- docs/agent_interface.md for contract issues
- docs/task_manifest.md + per-task READMEs for action schemas
- docs/record_mode.md and docs/manual_verification.md for replay workflows
File an issue with the above details if the problem persists.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Troubleshooting Guide

1. Installation & Environment

Development vs Pip Install: Which setup do you have?

Testing your packaged install (mirrors CI)

`tracecore: command not found`

`ModuleNotFoundError: No module named 'agent_bench'`

Mixed Python versions between install and runtime

2. CLI Invocation Errors

Scaffold a new agent: `new-agent`

Quick-start: `run pairing`

Wall-clock timeout: `--timeout`

Inspect recent runs: `runs summary`

`pytest got unrecognized arguments: --fix-agent`

`No compatible agent class found`

`Invalid action` failures on start

`Budget exhausted` immediately

Determinism drift detected

3. Runner / Validator Outcomes

`failure_type: invalid_action`

`failure_type: logic_failure`

`failure_type: budget_exhausted`

`failure_type: timeout` or `non_termination`

4. Web UI / Dashboard

Page loads but runs never start

`422 Unprocessable Entity` when submitting the form

Dashboard shows stale runs

5. Maintenance & CI Helpers

`tracecore maintain` says `changed: true` but files are untouched

`changed: false` but file still violates style

CI fails with `tracecore run ... exited 1`

6. When All Else Fails

FilesExpand file tree

troubleshooting.md

Latest commit

History

troubleshooting.md

File metadata and controls

Troubleshooting Guide

1. Installation & Environment

Development vs Pip Install: Which setup do you have?

Testing your packaged install (mirrors CI)

tracecore: command not found

ModuleNotFoundError: No module named 'agent_bench'

Mixed Python versions between install and runtime

2. CLI Invocation Errors

Scaffold a new agent: new-agent

Quick-start: run pairing

Wall-clock timeout: --timeout

Inspect recent runs: runs summary

pytest got unrecognized arguments: --fix-agent

No compatible agent class found

Invalid action failures on start

Budget exhausted immediately

Determinism drift detected

3. Runner / Validator Outcomes

failure_type: invalid_action

failure_type: logic_failure

failure_type: budget_exhausted

failure_type: timeout or non_termination

4. Web UI / Dashboard

Page loads but runs never start

422 Unprocessable Entity when submitting the form

Dashboard shows stale runs

5. Maintenance & CI Helpers

tracecore maintain says changed: true but files are untouched

changed: false but file still violates style

CI fails with tracecore run ... exited 1

6. When All Else Fails

`tracecore: command not found`

`ModuleNotFoundError: No module named 'agent_bench'`

Scaffold a new agent: `new-agent`

Quick-start: `run pairing`

Wall-clock timeout: `--timeout`

Inspect recent runs: `runs summary`

`pytest got unrecognized arguments: --fix-agent`

`No compatible agent class found`

`Invalid action` failures on start

`Budget exhausted` immediately

`failure_type: invalid_action`

`failure_type: logic_failure`

`failure_type: budget_exhausted`

`failure_type: timeout` or `non_termination`

`422 Unprocessable Entity` when submitting the form

`tracecore maintain` says `changed: true` but files are untouched

`changed: false` but file still violates style

CI fails with `tracecore run ... exited 1`