diff --git a/README.md b/README.md
index 4af195be8..4521d7e94 100644
--- a/README.md
+++ b/README.md
@@ -310,6 +310,7 @@ Projects use JSON schema files in the `agentcore/` directory:
 - [Evaluations](docs/evals.md) - Evaluators, on-demand evals, and online monitoring
 - [Batch Evaluation](docs/batch-evaluation.md) - Run evaluators across sessions at scale
 - [Recommendations](docs/recommendations.md) - Optimize prompts and tool descriptions
+- [Insights](docs/insights.md) - Failure-pattern analysis and clustering across agent sessions
 - [A/B Tests](docs/ab-tests.md) - Split traffic between variants and promote the winner
 
 **Operations**
diff --git a/docs/insights.md b/docs/insights.md
new file mode 100644
index 000000000..a084df567
--- /dev/null
+++ b/docs/insights.md
@@ -0,0 +1,146 @@
+# Insights — `[preview]`
+
+Insights run failure-pattern analysis across your agent's sessions. The insights service inspects historical traces,
+clusters bad outcomes into failure categories, and surfaces root causes with recommendations you can act on. Run it
+on-demand with `run insights`, or attach a continuous config with `add online-insights`.
+
+> **Preview:** the insights feature is in preview. Commands and output may change.
+
+## Quick Start
+
+```bash
+# On-demand failure analysis over the last 7 days of sessions
+agentcore run insights \
+  -r MyAgent \
+  --insights Builtin.Insight.FailureAnalysis
+
+# Block until the job finishes
+agentcore run insights -r MyAgent --insights Builtin.Insight.FailureAnalysis --wait
+```
+
+If you omit `--insights`, the CLI defaults to `Builtin.Insight.FailureAnalysis`.
+
+## On-Demand Insights
+
+`run insights` starts a job that analyzes the sessions it finds for your runtime in CloudWatch.
+
+### Choosing the session window
+
+By default, insights looks back 7 days.
Narrow or widen the window with `--lookback-days`, or pin an explicit range
+with `--start-time` / `--end-time`:
+
+```bash
+# Custom lookback window (1–90 days)
+agentcore run insights -r MyAgent --insights Builtin.Insight.FailureAnalysis --lookback-days 14
+
+# Explicit time range (ISO-8601)
+agentcore run insights \
+  -r MyAgent \
+  --insights Builtin.Insight.FailureAnalysis \
+  --start-time 2026-06-01T00:00:00Z \
+  --end-time 2026-06-15T00:00:00Z
+```
+
+### Limiting to specific sessions
+
+```bash
+agentcore run insights -r MyAgent --session-ids
+```
+
+### Using an existing online eval config as the source
+
+```bash
+agentcore run insights --online-eval-config-arn
+```
+
+### Chaining into recommendations
+
+Pass evaluators with `-e` so the resulting batch evaluation can later feed `run recommendation --from-insights`:
+
+```bash
+agentcore run insights -r MyAgent -e Builtin.Correctness
+agentcore run recommendation -r MyAgent -e Builtin.Correctness --type system-prompt --from-insights
+```
+
+## Options Reference
+
+| Option | Description |
+| ---------------------------- | ---------------------------------------------------------------------- |
+| `-r, --runtime ` | Runtime name from project config. |
+| `--insights ` | Insight type(s). Defaults to `Builtin.Insight.FailureAnalysis`. |
+| `-e, --evaluator ` | Evaluator(s) to include (needed for chaining into recommendations). |
+| `--online-eval-config-arn` | Use an existing OnlineEvaluationConfig as the session source. |
+| `-d, --lookback-days ` | Lookback window in days, 1–90 (default: 7). |
+| `--start-time ` | Session filter start time. |
+| `--end-time ` | Session filter end time. |
+| `-s, --session-ids ` | Limit analysis to specific session IDs. |
+| `-n, --name ` | Job name (auto-generated if omitted). |
+| `--endpoint ` | Runtime endpoint name (e.g. `PROMPT_V1`). |
+| `--wait` | Block until the job reaches a terminal state. |
+| `--region ` | AWS region (auto-detected if omitted).
|
+| `--json` | Output as JSON. |
+
+## Output
+
+Insights jobs are fire-and-forget: `run insights` returns the job `id` and an initial `status`
+(`PENDING`/`IN_PROGRESS`) — the failure analysis is **not** available immediately. Pass `--wait` to block until the job
+finishes, or check later with `agentcore view insights `.
+
+```bash
+agentcore run insights -r MyAgent --insights Builtin.Insight.FailureAnalysis --json
+```
+
+A completed job record includes:
+
+| Field | Description |
+| ----------------------- | ---------------------------------------------------------------------------- |
+| `id` | Insights job ID. |
+| `name` | Job name. |
+| `status` | Job status (`PENDING`, `IN_PROGRESS`, `COMPLETED`, `FAILED`). |
+| `insights` | Insight type(s) requested. |
+| `evaluators` | Evaluators included (when chaining into recommendations). |
+| `failureAnalysisResult` | Structured failure categories, each with root causes and recommendations. |
+| `evaluationResults` | Per-evaluator score summaries (when evaluators were included). |
+
+Each failure category carries a name, description, optional group, and one or more root causes. A root cause includes a
+category, description, a recommendation, and the related session IDs.
+
+## Viewing History
+
+List past insights jobs or view one in detail:
+
+```bash
+# List all insights jobs (or open the TUI when run without --json)
+agentcore view insights --json
+
+# Detail for a single job
+agentcore view insights --json
+```
+
+You can also browse jobs interactively via the TUI:
+
+```bash
+agentcore
+# Navigate to: View → Insights
+```
+
+## Continuous Insights
+
+Attach a config that runs insights continuously alongside your online evals:
+
+```bash
+agentcore add online-insights # add a continuous insights config bound to a runtime
+agentcore pause online-insights
+agentcore resume online-insights
+```
+
+Use `--arn ` with `pause`/`resume` to target configs outside the current project.
+
+## Archiving
+
+Delete an insights job on the service and clear local history:
+
+```bash
+agentcore archive insights -i
+agentcore archive insights -i --region us-west-2 --json
+```
diff --git a/src/cli/tui/__tests__/run-insights-copy.test.ts b/src/cli/tui/__tests__/run-insights-copy.test.ts
new file mode 100644
index 000000000..c5115dd0f
--- /dev/null
+++ b/src/cli/tui/__tests__/run-insights-copy.test.ts
@@ -0,0 +1,39 @@
+/**
+ * Regression test: the `run-insights` CLI examples in copy.ts must only reference
+ * flags that the `run insights` command actually registers. Guards against drift
+ * like `--lookback 7` (the real flag is `--lookback-days`).
+ */
+import { createProgram } from '../../cli';
+import { CLI_ONLY_EXAMPLES } from '../copy';
+import { describe, expect, it } from 'vitest';
+
+function registeredFlags(): Set {
+  const program = createProgram();
+  const runCmd = program.commands.find(c => c.name() === 'run');
+  const insights = runCmd?.commands.find(c => c.name() === 'insights');
+  if (!insights) throw new Error('run insights command not found');
+  const flags = new Set();
+  for (const opt of insights.options) {
+    if (opt.short) flags.add(opt.short);
+    if (opt.long) flags.add(opt.long);
+  }
+  return flags;
+}
+
+const examples = CLI_ONLY_EXAMPLES['run-insights']?.examples ??
[];
+
+describe('run-insights copy examples', () => {
+  it('only reference flags the `run insights` command registers', () => {
+    const flags = registeredFlags();
+    const tokens = examples.flatMap(example => example.split(/\s+/)).filter(token => token.startsWith('-'));
+
+    const unknown = tokens.filter(token => !flags.has(token));
+    expect(unknown).toEqual([]);
+  });
+
+  it('does not reference the non-existent --lookback flag', () => {
+    for (const example of examples) {
+      expect(example).not.toMatch(/--lookback\s/);
+    }
+  });
+});
diff --git a/src/cli/tui/copy.ts b/src/cli/tui/copy.ts
index 794dc000b..8b9fe3df9 100644
--- a/src/cli/tui/copy.ts
+++ b/src/cli/tui/copy.ts
@@ -130,9 +130,9 @@ export const CLI_ONLY_EXAMPLES: Record