Last updated: 2026-05-03
This runbook documents how to operate the application that exists in the repository today. It focuses on practical runtime procedures, verification steps, and failure handling.
This repository assumes:
- a single deployed application instance,
- a single SQLite database file,
- externally scheduled cron calls,
- no background worker queue,
- no distributed rate limit,
- paper-trading only.
Because orchestration is route-driven, healthy operations depend on cron timing, environment configuration, and the SQLite file all being correct.
Current production VPS baseline:
- runtime: Next.js served by `forecasterarena.service` under systemd, behind Nginx
- public domain: https://forecasterarena.com
- release shape: immutable directories under `/opt/forecasterarena-release-*`
- state path: `/opt/forecasterarena-state`
- database path: `/opt/forecasterarena-state/data/forecaster.db`
- backup path: `/opt/forecasterarena-state/backups`
- app port: `3010`, intended to be reached through Nginx or localhost cron calls
- scheduler file: `/etc/cron.d/forecasterarena`
Production should define all of the following:
- `OPENROUTER_API_KEY`
- `CRON_SECRET`
- `ADMIN_PASSWORD`
- `DATABASE_PATH=/opt/forecasterarena-state/data/forecaster.db`
- `BACKUP_PATH=/opt/forecasterarena-state/backups`
- `DECISION_COHORT_LIMIT=5` (optional; defaults to `5`)
- `NEXT_PUBLIC_SITE_URL` (recommended)
- `NEXT_PUBLIC_GITHUB_URL` (optional)
Important current behavior:
- If `CRON_SECRET` is missing in production, cron routes fail closed.
- If `ADMIN_PASSWORD` is missing in production, admin login fails closed.
- If `OPENROUTER_API_KEY` is missing in production, decision routes and model calls fail.
- `GET /api/health` reports configuration incompleteness generically; it does not list missing secret names.
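A small preflight check can illustrate the fail-closed behavior. This is a minimal sketch, not the repository's code. `checkRequiredConfig` and `REQUIRED_PRODUCTION_KEYS` are hypothetical names, and the sketch assumes it is only called in production. Like `/api/health`, it reports incompleteness without naming the missing keys.

```typescript
// Hypothetical list of secrets that production cannot run without.
const REQUIRED_PRODUCTION_KEYS = [
  "OPENROUTER_API_KEY",
  "CRON_SECRET",
  "ADMIN_PASSWORD",
];

type ConfigCheck = { status: "ok" } | { status: "error"; message: string };

// Hypothetical preflight: fail closed if any required value is missing or blank.
function checkRequiredConfig(env: Record<string, string | undefined>): ConfigCheck {
  const incomplete = REQUIRED_PRODUCTION_KEYS.some(
    (key) => (env[key] ?? "").trim() === "",
  );
  // Like /api/health, report incompleteness generically and never echo key names.
  return incomplete
    ? { status: "error", message: "Required configuration is incomplete" }
    : { status: "ok" };
}
```

In a Node process this would be called as `checkRequiredConfig(process.env)`.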
The code does not contain its own scheduler. The currently intended cadence is:
```
# Sync top markets from Polymarket
*/5 * * * * curl -s -X POST http://127.0.0.1:3010/api/cron/sync-markets \
-H "Authorization: Bearer $CRON_SECRET"
# Start the weekly cohort at Sunday 00:00 UTC
0 0 * * 0 curl -s -X POST http://127.0.0.1:3010/api/cron/start-cohort \
-H "Authorization: Bearer $CRON_SECRET"
# Run model decisions after cohort creation
5 0 * * 0 curl -s -X POST http://127.0.0.1:3010/api/cron/run-decisions \
-H "Authorization: Bearer $CRON_SECRET"
# Re-check closed markets for resolution
0 * * * * curl -s -X POST http://127.0.0.1:3010/api/cron/check-resolutions \
-H "Authorization: Bearer $CRON_SECRET"
# Mark to market unarchived active cohorts
*/10 * * * * curl -s -X POST http://127.0.0.1:3010/api/cron/take-snapshots \
-H "Authorization: Bearer $CRON_SECRET"
# Check OpenRouter for newer general-purpose model releases
0 9 * * 1 curl -s -X POST http://127.0.0.1:3010/api/cron/check-model-lineup \
-H "Authorization: Bearer $CRON_SECRET"
# Create a database backup before the next weekly cycle
0 23 * * 6 curl -s -X POST http://127.0.0.1:3010/api/cron/backup \
-H "Authorization: Bearer $CRON_SECRET"
```

These entries are wrapped for readability. Cron does not support backslash line continuation, so each entry in `/etc/cron.d/forecasterarena` must be on a single line. Files in `/etc/cron.d` also require a user field after the five schedule fields. `CRON_SECRET` must be available to cron itself, because cron does not read the application's `.env.local`.

Why this schedule matters:
- `run-decisions` assumes `start-cohort` has already run, or that the weekly cohort can still be bootstrapped.
- `run-decisions` only spends LLM calls on unarchived active cohorts inside the latest decision window; older active v2 cohorts remain tracking-only.
- `check-resolutions` only processes markets that are locally `closed`.
- `check-model-lineup` only reads the public OpenRouter catalog and creates an admin review. It never promotes releases or rolls cohorts without operator approval.
- `take-snapshots` is most useful when run regularly. The database schema is timestamp-based, not day-based. Archived v1 cohorts are intentionally excluded from routine snapshots. Open-position valuation uses CLOB prices, with prior-value fallback on anomalies.
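The runbook does not spell out how the latest decision window is computed. For illustration only, the sketch below assumes the window is the `DECISION_COHORT_LIMIT` most recently started cohorts that are active and unarchived. `CohortRow`, its `status` values, and `selectDecisionEligible` are hypothetical names, and the real rule may differ.

```typescript
// Hypothetical cohort shape for illustration.
interface CohortRow {
  id: string;
  status: "active" | "completed";
  archived: boolean; // true for archived v1 cohorts
  startedAt: string; // ISO-8601 UTC week start
}

// Assumption: only the `limit` newest active, unarchived cohorts receive LLM calls.
// `limit` stands in for DECISION_COHORT_LIMIT (default 5).
function selectDecisionEligible(cohorts: CohortRow[], limit = 5): CohortRow[] {
  return cohorts
    .filter((c) => c.status === "active" && !c.archived)
    .sort((a, b) => Date.parse(b.startedAt) - Date.parse(a.startedAt))
    .slice(0, limit);
}
```

Under this assumption, archived cohorts are never selected. Older active cohorts that fall outside the slice remain tracking-only.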
Run these at least once per day.
```bash
curl -s https://forecasterarena.com/api/health | jq
```

Healthy example:
```json
{
  "status": "ok",
  "checks": {
    "database": { "status": "ok" },
    "environment": { "status": "ok" },
    "data_integrity": { "status": "ok" }
  }
}
```

Unhealthy example:
```json
{
  "status": "error",
  "checks": {
    "database": { "status": "ok" },
    "environment": {
      "status": "error",
      "message": "Required configuration is incomplete"
    }
  }
}
```

Interpretation:
- `database = error` means the app could not open or query SQLite.
- `environment = error` means one or more required production settings are missing.
- `data_integrity = error` means the lightweight integrity probe failed or detected an issue.
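When health is checked from a script instead of by eye, the response can be reduced to its failing checks. This sketch relies only on the JSON shape shown in the examples: a top-level `status` plus named `checks`, each with an optional `message`. `failingChecks` is a hypothetical helper.

```typescript
// Shapes inferred from the healthy/unhealthy examples.
interface HealthCheck {
  status: string; // "ok" or "error"
  message?: string;
}

interface HealthResponse {
  status: string;
  checks: Record<string, HealthCheck>;
}

// Hypothetical helper: one line per non-ok check, with its message when present.
function failingChecks(body: HealthResponse): string[] {
  return Object.entries(body.checks)
    .filter(([, check]) => check.status !== "ok")
    .map(([name, check]) => (check.message ? `${name}: ${check.message}` : name));
}
```

For the unhealthy example, `failingChecks` returns `["environment: Required configuration is incomplete"]`.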
```bash
systemctl status forecasterarena --no-pager
journalctl -u forecasterarena --no-pager -n 100
```

Expected:
- the service is `active (running)`,
- no restart loop,
- no repeated OpenRouter timeout or auth failures.
```bash
sqlite3 /opt/forecasterarena-state/data/forecaster.db "
SELECT severity, event_type, created_at
FROM system_logs
WHERE severity = 'error'
ORDER BY created_at DESC
LIMIT 50;
"
```

Pay attention to:
- `cohort_start_error`
- `decisions_run_error`
- `agent_decision_error`
- `agent_decision_execution_failed`
- `market_resolution_partial_failure`
- `take_snapshots_error`
- `market_sync_error`
```bash
sqlite3 /opt/forecasterarena-state/data/forecaster.db "
SELECT MAX(last_updated_at) AS last_market_update,
       COUNT(*) AS total_markets
FROM markets;
"
```

Interpretation:
- If `last_market_update` is stale, `sync-markets` is not running or is failing.
- If `total_markets` is zero in production, the public site will show empty-state copy instead of live benchmark copy.
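Whether `last_market_update` counts as stale depends on the `*/5` sync cadence. The sketch below flags it as stale after three missed cycles. That threshold is illustrative, as are the helper names. The sketch also assumes the column holds UTC timestamps, either ISO-8601 or SQLite's `YYYY-MM-DD HH:MM:SS` form.

```typescript
const SYNC_INTERVAL_MS = 5 * 60 * 1000; // matches the */5 sync-markets schedule
const MISSED_CYCLES_BEFORE_STALE = 3; // hypothetical tolerance

// Assumption: stored timestamps are UTC even when they carry no offset.
function parseUtc(ts: string): number {
  let iso = ts.trim().replace(" ", "T");
  if (!/(Z|[+-]\d{2}:?\d{2})$/.test(iso)) iso += "Z";
  return Date.parse(iso);
}

// Hypothetical check: null means the markets table is empty.
function isMarketSyncStale(lastMarketUpdate: string | null, now: Date): boolean {
  if (lastMarketUpdate === null) return true;
  const ageMs = now.getTime() - parseUtc(lastMarketUpdate);
  return Number.isNaN(ageMs) || ageMs > SYNC_INTERVAL_MS * MISSED_CYCLES_BEFORE_STALE;
}
```

Normalizing before parsing matters because `Date.parse` does not treat offset-less date-time strings as UTC.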
Run these after the Sunday decision window.
The live benchmark now separates:
- `model_families`: stable public competitor slots
- `model_releases`: exact OpenRouter targets
- `benchmark_configs`: default lineup manifests for future cohorts
- `agents`: the frozen family/release/config assignment actually used by a cohort
Operational rule:
- register new releases and promote future configs through the admin benchmark control plane
- do not mutate old `models` rows expecting historical cohorts to follow along safely
- approving a default config affects future cohorts only
- active and historical cohorts keep their frozen release lineage
- only decision-eligible active cohorts receive new LLM calls
- archived v1 cohorts skip new decisions and routine snapshots while remaining available for settlement and historical drilldowns
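The rules above amount to copy-on-create. Starting a cohort copies the current default config into agent rows, so later promotions cannot reach back into existing cohorts. The in-memory model below sketches that idea only. `BenchmarkRegistry`, its methods, and its field names are hypothetical and much simpler than the real `model_families`, `model_releases`, `benchmark_configs`, and `agents` tables.

```typescript
// Hypothetical, simplified shapes.
interface BenchmarkConfig {
  id: string;
  releasesByFamily: Record<string, string>; // family id -> release id
}

interface Agent {
  cohortId: string;
  familyId: string;
  releaseId: string;
  benchmarkConfigId: string;
}

class BenchmarkRegistry {
  private defaultConfig: BenchmarkConfig | null = null;
  readonly agents: Agent[] = [];

  // Promotion only changes which config future cohorts copy.
  promoteDefault(config: BenchmarkConfig): void {
    this.defaultConfig = config;
  }

  // A new cohort freezes the current default into its own agent rows.
  startCohort(cohortId: string): Agent[] {
    const config = this.defaultConfig;
    if (!config) throw new Error("no default benchmark config");
    const created = Object.entries(config.releasesByFamily).map(
      ([familyId, releaseId]): Agent => ({
        cohortId,
        familyId,
        releaseId,
        benchmarkConfigId: config.id,
      }),
    );
    this.agents.push(...created);
    return created;
  }
}
```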
Recommended release-rotation workflow:
- Confirm the new OpenRouter release is reachable and priced correctly.
- Register the exact release under its existing family.
- Build a complete benchmark config for every active family.
- Review the admin benchmark page and confirm every family points to the intended release and price snapshot.
- Promote that config as the future default.
- Verify the next cohort starts with the promoted config while existing cohorts keep their original releases.
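Building and reviewing a complete config can be partly automated. This is a sketch under stated assumptions. `Release` and `validateConfig` are hypothetical names. The check covers only release-to-family consistency, not pricing.

```typescript
// Hypothetical shape: each release belongs to exactly one family.
interface Release {
  id: string;
  familyId: string;
}

// Returns readable problems; an empty array means every active family is covered
// by a known release of that same family.
function validateConfig(
  activeFamilyIds: string[],
  releaseIdByFamily: Record<string, string>,
  releases: Release[],
): string[] {
  const problems: string[] = [];
  const releaseById = new Map(releases.map((r): [string, Release] => [r.id, r]));
  for (const familyId of activeFamilyIds) {
    const releaseId = releaseIdByFamily[familyId];
    if (!releaseId) {
      problems.push(`${familyId}: no release assigned`);
      continue;
    }
    const release = releaseById.get(releaseId);
    if (!release) {
      problems.push(`${familyId}: unknown release ${releaseId}`);
    } else if (release.familyId !== familyId) {
      problems.push(`${familyId}: release ${releaseId} belongs to ${release.familyId}`);
    }
  }
  // Entries for families that are not active are also flagged.
  for (const familyId of Object.keys(releaseIdByFamily)) {
    if (!activeFamilyIds.includes(familyId)) problems.push(`${familyId}: not an active family`);
  }
  return problems;
}
```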
Rollback rule:
- if the wrong default lineup is promoted, promote the previous config again before the next cohort starts
```bash
sqlite3 /opt/forecasterarena-state/data/forecaster.db "
SELECT id, cohort_number, started_at, status, benchmark_config_id
FROM cohorts
ORDER BY started_at DESC
LIMIT 3;
"
```

Expected:
- one newly created weekly cohort,
- `started_at` normalized to the week start,
- `benchmark_config_id` populated for the new cohort,
- no duplicate cohort rows for the same Sunday.
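To confirm that `started_at` is normalized, it helps to know the expected value. The sketch below assumes the week starts at Sunday 00:00 UTC, as the `start-cohort` schedule suggests. `weekStartUtc` is a hypothetical helper, not the application's own function.

```typescript
// Assumption: a cohort week starts Sunday 00:00 UTC (the `0 0 * * 0` schedule).
// UTC has no DST, so subtracting whole days is safe.
function weekStartUtc(at: Date): Date {
  const midnight = Date.UTC(at.getUTCFullYear(), at.getUTCMonth(), at.getUTCDate());
  const daysSinceSunday = at.getUTCDay(); // 0 = Sunday
  return new Date(midnight - daysSinceSunday * 24 * 60 * 60 * 1000);
}
```

For example, any moment from Sunday 2026-05-03 00:00 UTC through the following Saturday maps to `2026-05-03T00:00:00.000Z`. Two cohort rows that map to the same value would be duplicates.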
```bash
sqlite3 /opt/forecasterarena-state/data/forecaster.db "
SELECT cohort_id, decision_week, COUNT(*) AS decisions
FROM decisions
WHERE decision_timestamp > datetime('now', '-24 hours')
GROUP BY cohort_id, decision_week
ORDER BY MAX(decision_timestamp) DESC;
"
```

Expected:
- one decision row per active agent in each decision-eligible unarchived cohort for the run,
- no new decision rows for older active cohorts outside the decision window,
- no duplicate rows for the same `(agent_id, cohort_id, decision_week)`.
Because the application now claims a single canonical decision row per agent/week, reruns should overwrite or reuse that row rather than add another one.
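The canonical-row rule can be pictured as a claim keyed on `(agent_id, cohort_id, decision_week)`. In this sketch, a new claim starts as the `ERROR` / `__IN_PROGRESS__` placeholder that the in-progress query in this runbook looks for, and a rerun gets the same row back. `DecisionStore` and its methods are hypothetical, and `decisionWeek` is assumed to be a number.

```typescript
// Hypothetical in-memory stand-in for the decisions table.
interface DecisionRow {
  agentId: string;
  cohortId: string;
  decisionWeek: number;
  action: string;
  errorMessage: string | null;
}

class DecisionStore {
  private rows = new Map<string, DecisionRow>();

  // JSON encoding avoids separator collisions in the composite key.
  private key(agentId: string, cohortId: string, week: number): string {
    return JSON.stringify([agentId, cohortId, week]);
  }

  // Claim the single canonical row; a rerun reuses it instead of inserting another.
  claim(agentId: string, cohortId: string, decisionWeek: number): DecisionRow {
    const k = this.key(agentId, cohortId, decisionWeek);
    const existing = this.rows.get(k);
    if (existing) return existing;
    const row: DecisionRow = {
      agentId, cohortId, decisionWeek,
      action: "ERROR", errorMessage: "__IN_PROGRESS__",
    };
    this.rows.set(k, row);
    return row;
  }

  // Overwrite the claimed row with the final outcome.
  complete(row: DecisionRow, action: string): void {
    row.action = action;
    row.errorMessage = null;
  }

  get size(): number {
    return this.rows.size;
  }
}
```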
```bash
sqlite3 /opt/forecasterarena-state/data/forecaster.db "
SELECT d.id, d.action, COUNT(t.id) AS trades
FROM decisions d
LEFT JOIN trades t ON t.decision_id = d.id
WHERE d.decision_timestamp > datetime('now', '-24 hours')
GROUP BY d.id, d.action
ORDER BY d.decision_timestamp DESC;
"
```

Interpretation:
- `HOLD` with zero trades is normal.
- `BET` or `SELL` with zero trades is retryable but indicates an execution problem.
```bash
sqlite3 /opt/forecasterarena-state/data/forecaster.db "
SELECT MAX(snapshot_timestamp) AS latest_snapshot,
       COUNT(*) AS snapshots_last_hour
FROM portfolio_snapshots
WHERE snapshot_timestamp > datetime('now', '-1 hour');
"
```

Important:
- The table uses `snapshot_timestamp`, not `snapshot_date`.
- Snapshot rows are upserted per `(agent_id, snapshot_timestamp)`.
- Price provenance for markets used in a run is stored in `market_price_snapshots`; fallback or Gamma/CLOB disagreement events are logged as `price_validation_anomaly`.
```bash
curl -s -X POST http://127.0.0.1:3010/api/cron/sync-markets \
  -H "Authorization: Bearer YOUR_CRON_SECRET"
```

Use when:
- market totals look stale,
- close/resolution transitions appear delayed,
- local development database needs seeding.
```bash
curl -s -X POST http://127.0.0.1:3010/api/cron/start-cohort \
  -H "Authorization: Bearer YOUR_CRON_SECRET"
```

Use when:
- validating a fresh environment,
- recovering after a missed Sunday run.
Current safeguards:
- the operation is week-unique,
- repeated calls in the same week return the existing cohort,
- agent creation is idempotent on the frozen slot key `(cohort_id, benchmark_config_model_id)`,
- the canonical benchmark identity is carried by `benchmark_config_id` and each agent’s `benchmark_config_model_id`.
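These safeguards can be sketched as two keyed lookups: one per week for the cohort and one per `(cohort_id, benchmark_config_model_id)` slot for agents. `CohortStore`, its methods, and the generated cohort ids are hypothetical, and the week start is assumed to be already normalized.

```typescript
// Hypothetical sketch of week-unique cohorts with idempotent agent slots.
interface Cohort {
  id: string;
  startedAt: string; // normalized week start, ISO-8601
  benchmarkConfigId: string;
}

class CohortStore {
  private cohortsByWeek = new Map<string, Cohort>();
  private agentSlots = new Set<string>();

  startCohort(weekStart: string, benchmarkConfigId: string, slotIds: string[]): Cohort {
    // Week-unique: a repeated call in the same week returns the existing cohort.
    let cohort = this.cohortsByWeek.get(weekStart);
    if (!cohort) {
      cohort = { id: `cohort-${this.cohortsByWeek.size + 1}`, startedAt: weekStart, benchmarkConfigId };
      this.cohortsByWeek.set(weekStart, cohort);
    }
    // Agents are keyed by (cohort_id, benchmark_config_model_id), so a rerun
    // only fills slots that are still missing.
    for (const slotId of slotIds) {
      this.agentSlots.add(JSON.stringify([cohort.id, slotId]));
    }
    return cohort;
  }

  get agentCount(): number {
    return this.agentSlots.size;
  }
}
```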
```bash
curl -s -X POST http://127.0.0.1:3010/api/cron/run-decisions \
  -H "Authorization: Bearer YOUR_CRON_SECRET"
```

Use sparingly.
Current runtime behavior:
- the route budget is 10 minutes,
- model calls are capped at 40 seconds each,
- transport retries are effectively disabled by default,
- malformed-output retries remain enabled once per model.
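The limits above can be expressed as two small policy helpers. This is a sketch, not the route's code. The helper names are hypothetical. Clipping each call to the remaining route budget is an assumption about how the two limits interact. In practice the returned timeout could be passed to `fetch` as `signal: AbortSignal.timeout(ms)`.

```typescript
const ROUTE_BUDGET_MS = 10 * 60 * 1000; // 10-minute route budget
const MODEL_CALL_CAP_MS = 40 * 1000; // 40-second cap per model call

type FailureKind = "transport" | "malformed_output";

// Time allowed for the next model call: the per-call cap, clipped to what
// remains of the route budget (never negative).
function nextCallTimeoutMs(routeStartedAtMs: number, nowMs: number): number {
  const remaining = ROUTE_BUDGET_MS - (nowMs - routeStartedAtMs);
  return Math.max(0, Math.min(MODEL_CALL_CAP_MS, remaining));
}

// `retriesSoFar` counts retries already made for this model in this run.
function shouldRetry(kind: FailureKind, retriesSoFar: number): boolean {
  if (kind === "transport") return false; // transport retries disabled by default
  return retriesSoFar < 1; // one malformed-output retry per model
}
```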
```bash
curl -s -X POST http://127.0.0.1:3010/api/cron/check-resolutions \
  -H "Authorization: Bearer YOUR_CRON_SECRET"
```

Important current behavior:
- markets are only marked `resolved` after settlement succeeds,
- if one position fails to settle, the market remains `closed` locally and will be retried later.
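The settle-then-resolve ordering can be sketched as follows. `resolveMarket` is a hypothetical name. The sketch assumes the remaining positions are still attempted after one fails. Because a `closed` market is retried, a real settlement step must be safe to run again for positions that already settled.

```typescript
type MarketStatus = "closed" | "resolved";

// Hypothetical sketch: settle every position first, and report the market as
// resolved only when all settlements succeed.
function resolveMarket(
  positionIds: string[],
  settle: (positionId: string) => void, // throws on failure
): { status: MarketStatus; failed: string[] } {
  const failed: string[] = [];
  for (const id of positionIds) {
    try {
      settle(id);
    } catch {
      failed.push(id); // keep going so one bad position does not block the rest
    }
  }
  // Any failure leaves the market `closed` for the next check-resolutions run.
  return { status: failed.length === 0 ? "resolved" : "closed", failed };
}
```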
```bash
curl -s -X POST http://127.0.0.1:3010/api/cron/take-snapshots \
  -H "Authorization: Bearer YOUR_CRON_SECRET"
```

Important current behavior:
- closed-but-unresolved positions can use prior value as a fallback,
- this prevents the portfolio curve from collapsing incorrectly when external prices become unusable.
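The prior-value fallback can be sketched per position. This relies on several assumptions: a position's value is shares times a price in [0, 1], as with Polymarket outcome shares; a missing or out-of-range price counts as unusable; and `valuePosition` and `isUsablePrice` are hypothetical names. The Gamma/CLOB comparison is not modeled here.

```typescript
interface PositionInput {
  shares: number;
  priorValue: number; // value recorded in the previous snapshot
  clobPrice: number | null; // null when no price could be fetched
}

// Assumption: a usable outcome price is a finite number between 0 and 1.
function isUsablePrice(p: number | null): p is number {
  return p !== null && Number.isFinite(p) && p >= 0 && p <= 1;
}

// Use the CLOB price when usable; otherwise keep the prior value and flag it.
function valuePosition(pos: PositionInput): { value: number; anomaly: boolean } {
  if (isUsablePrice(pos.clobPrice)) {
    return { value: pos.shares * pos.clobPrice, anomaly: false };
  }
  // Falling back keeps the portfolio curve from collapsing on bad price data.
  return { value: pos.priorValue, anomaly: true };
}
```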
```bash
curl -s -X POST http://127.0.0.1:3010/api/cron/backup \
  -H "Authorization: Bearer YOUR_CRON_SECRET"
```

Backups are written to `/opt/forecasterarena-state/backups` in production. If a local or staging environment omits `BACKUP_PATH`, the default is `backups/`.
Admin exports are cookie-authenticated, not cron-authenticated.
```bash
curl -s -X POST http://127.0.0.1:3010/api/admin/export \
  -H "Content-Type: application/json" \
  -H "Cookie: forecaster_admin=..." \
  -d '{
    "cohort_id": "cohort-id",
    "from": "2026-03-01T00:00:00.000Z",
    "to": "2026-03-07T00:00:00.000Z",
    "include_prompts": false
  }'
```

Operational constraints:
- 7 day max range,
- 50,000 row cap per table,
- ZIP archive name is sanitized,
- old archives are cleaned after roughly 24 hours.
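The 7-day limit can be checked before calling the export route. The sketch assumes ISO-8601 `from` and `to` strings, as in the request above, and treats a range of exactly 7 days as allowed. `validateExportRange` is a hypothetical name, and the server's own validation remains authoritative.

```typescript
const MAX_EXPORT_RANGE_MS = 7 * 24 * 60 * 60 * 1000;

// Returns an error description, or null when the range looks acceptable.
function validateExportRange(from: string, to: string): string | null {
  const start = Date.parse(from);
  const end = Date.parse(to);
  if (Number.isNaN(start) || Number.isNaN(end)) return "invalid date";
  if (end <= start) return "`to` must be after `from`";
  if (end - start > MAX_EXPORT_RANGE_MS) return "range exceeds 7 days";
  return null;
}
```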
```sql
SELECT id, agent_id, cohort_id, decision_week, decision_timestamp
FROM decisions
WHERE action = 'ERROR'
  AND error_message = '__IN_PROGRESS__'
ORDER BY decision_timestamp DESC;
```

Interpretation:
- a fresh row may simply represent a running decision,
- a stale row may indicate a crashed run that should be reclaimed by the next decision cycle.
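The distinction between fresh and stale rows can be tied to the 10-minute `run-decisions` budget. The sketch assumes that a row older than that budget plus a margin cannot belong to a run that is still executing. The margin and `classifyInProgress` are hypothetical, and timestamps are taken as epoch milliseconds to sidestep parsing.

```typescript
const ROUTE_BUDGET_MS = 10 * 60 * 1000; // run-decisions route budget
const SAFETY_MARGIN_MS = 5 * 60 * 1000; // hypothetical slack

// Classify an `__IN_PROGRESS__` row by age.
function classifyInProgress(decisionTimestampMs: number, nowMs: number): "fresh" | "stale" {
  return nowMs - decisionTimestampMs > ROUTE_BUDGET_MS + SAFETY_MARGIN_MS ? "stale" : "fresh";
}
```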
```sql
SELECT agent_id, cohort_id, decision_week, COUNT(*) AS cnt
FROM decisions
GROUP BY agent_id, cohort_id, decision_week
HAVING COUNT(*) > 1;

SELECT started_at, COUNT(*) AS cnt
FROM cohorts
GROUP BY started_at
HAVING COUNT(*) > 1;
```

Both queries should return zero rows.
```sql
SELECT id, cohort_number
FROM cohorts
WHERE benchmark_config_id IS NULL
ORDER BY cohort_number DESC;

SELECT id, cohort_id, model_id, family_id, release_id, benchmark_config_model_id
FROM agents
WHERE family_id IS NULL
   OR release_id IS NULL
   OR benchmark_config_model_id IS NULL
ORDER BY created_at DESC;
```

Both queries should return zero rows. Any result indicates the frozen benchmark lineage was bypassed or corrupted.
```sql
SELECT id, polymarket_id, question, close_date, last_updated_at
FROM markets
WHERE status = 'closed'
ORDER BY close_date DESC
LIMIT 50;
```

If rows remain here for too long, inspect resolution logs and retry `/api/cron/check-resolutions`.
```sql
SELECT p.id, p.agent_id, p.market_id, p.side, p.status, m.status AS market_status
FROM positions p
JOIN markets m ON p.market_id = m.id
WHERE p.status = 'open'
  AND m.status IN ('closed', 'resolved')
ORDER BY m.status, p.opened_at DESC;
```

Some rows with an `open` position on a `closed` market are expected before resolution. Rows with an `open` position on a `resolved` market should be investigated.
If `/api/health` reports `environment = error`:

Likely causes:
- missing production env vars,
- process started without the correct `.env.local`,
- deployment secret drift.

What to do:
- inspect the actual process environment,
- confirm `OPENROUTER_API_KEY`, `CRON_SECRET`, and `ADMIN_PASSWORD`,
- restart `forecasterarena.service`,
- re-check `/api/health`.
If the Sunday run produces no new decisions:

Likely causes:
- cron did not call the route,
- OpenRouter auth/config failed,
- model requests timed out,
- decision rows are stuck in `__IN_PROGRESS__`.

What to do:
- inspect logs for `decisions_run_error` and `agent_decision_error`,
- query in-progress decisions,
- manually rerun `/api/cron/run-decisions` if the environment is healthy.
If a market stays `closed` instead of becoming `resolved`:

Likely causes:
- Polymarket still has no decisive winner,
- settlement is partially failing and the market is intentionally being left `closed`,
- resolution checks are not running.

What to do:
- inspect `market_resolution_partial_failure` in `system_logs`,
- inspect `positions` for that market,
- rerun `/api/cron/check-resolutions` after the underlying failure is fixed.
If an admin export fails:

Likely causes:
- invalid date range,
- row cap exceeded,
- temp directory / zip utility failure,
- missing admin session.
What to do:
- reduce the date window,
- confirm session auth,
- check server logs,
- verify the `zip` executable exists in the runtime environment.
After every deploy:
- `curl /api/health`
- `curl /api/leaderboard`
- `curl /api/markets?limit=1`
- verify that one CSS file, one JS chunk, and one font under `/_next/static/*` return `200` with the correct content type,
- verify admin login,
- run a build-local smoke test if the host permits it,
- confirm scheduled jobs still include the correct bearer token.
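The static-asset step of this checklist can be scripted. This sketch separates a pure check from a small runner that uses the global `fetch` available in Node 18+. The expected content types, including `font/woff2` for fonts, and the helper names are assumptions. Asset paths must be taken from the deployed build.

```typescript
// Assumed content types per extension; adjust to what the deployment actually serves.
const EXPECTED_TYPES: Record<string, string[]> = {
  ".css": ["text/css"],
  ".js": ["application/javascript", "text/javascript"],
  ".woff2": ["font/woff2"],
};

// Returns a problem description, or null when the asset looks correct.
function checkStaticAsset(path: string, status: number, contentType: string | null): string | null {
  if (status !== 200) return `${path}: expected 200, got ${status}`;
  const ext = Object.keys(EXPECTED_TYPES).find((e) => path.endsWith(e));
  if (!ext) return `${path}: unrecognized asset type`;
  // Ignore parameters such as "; charset=utf-8".
  const mime = (contentType ?? "").split(";")[0].trim().toLowerCase();
  return EXPECTED_TYPES[ext].includes(mime) ? null : `${path}: unexpected content type "${mime}"`;
}

// Hypothetical runner: `base` is the site URL, `paths` come from the deployed build.
async function checkAssets(base: string, paths: string[]): Promise<string[]> {
  const problems: string[] = [];
  for (const p of paths) {
    const res = await fetch(new URL(p, base));
    const problem = checkStaticAsset(p, res.status, res.headers.get("content-type"));
    if (problem) problems.push(problem);
  }
  return problems;
}
```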
Recommended additional checks after schema-affecting changes:
- verify unique decision rows,
- verify unique weekly cohorts,
- verify snapshot insertion still upserts cleanly.
This repository’s `tsconfig.json` includes `.next/types/**/*.ts`. In practice that means:
- `npm run build` should succeed before `npm run typecheck` is fully meaningful,
- a failed build can cause `typecheck` to complain about missing generated `.next/types` files.
For local validation, use this order:
- `npm run check`
- `npm run test:e2e`
- `npm run test:e2e:empty` when you need explicit empty-state browser coverage
`npm run check` includes `npm run build:standalone`, which copies and verifies
the static assets required by the production standalone server. If a deploy
uses a manually created release directory, run `npm run prepare:standalone-assets`
and `npm run check:standalone-assets` in that release before switching systemd.
Related documentation:
- `docs/ARCHITECTURE.md`
- `docs/API_REFERENCE.md`
- `docs/SECURITY.md`
- `docs/DATABASE_SCHEMA.md`
- `docs/TROUBLESHOOTING.md`