crew: empirical calibration for /crew:do's topology classifier #64

Description

@jdidion

/crew:do currently classifies tasks with a single Haiku call against a hand-crafted prompt that encodes the Coase/Hayek-topology paper's decision table. There is no validation of how often it picks the right topology.

Proposed

Build a labeled dataset of ~100–200 tasks, each tagged with its "correct" topology (example entries are sketched after the list):

  • solo: refactor with cross-file invariants, long-horizon feature, schema migration.
  • hub-spoke: code review, security audit, multi-file linting.
  • market: LeetCode-medium with tests, subtle regex, dense math with cheap oracle.
  • hybrid: large feature with mix of reasoning + review sub-steps.
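
A minimal sketch of what labeled entries could look like if kept as a Python fixtures module; the field names and example tasks are illustrative, not something that exists in the repo today:

```python
# Hypothetical labeled calibration examples; field names are illustrative.
LABELED_TASKS = [
    {"task": "Refactor the config loader; invariants span six files", "topology": "solo"},
    {"task": "Security audit of the auth middleware across the service", "topology": "hub-spoke"},
    {"task": "Implement interval merging; unit tests are provided", "topology": "market"},
    {"task": "Ship the billing feature: design, implement, then review", "topology": "hybrid"},
]
```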

Run the classifier against the dataset, measure accuracy per category, and iterate on the prompt. Open question: should the labeled dataset live inside the repo, or be externalized to a fixtures file?
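
A rough harness for that loop could look like the following; `classify_topology` stands in for the Haiku-backed classifier and is a hypothetical interface, not an existing function:

```python
from collections import Counter, defaultdict

def evaluate(classify_topology, labeled_tasks):
    """Report per-category and overall accuracy, plus a confusion tally."""
    correct, total = Counter(), Counter()
    confusions = defaultdict(Counter)
    for example in labeled_tasks:
        expected = example["topology"]
        predicted = classify_topology(example["task"])
        total[expected] += 1
        if predicted == expected:
            correct[expected] += 1
        else:
            confusions[expected][predicted] += 1
    for topology in sorted(total):
        accuracy = correct[topology] / total[topology]
        print(f"{topology:10s} {correct[topology]}/{total[topology]} = {accuracy:.0%}")
    print(f"{'overall':10s} {sum(correct.values()) / sum(total.values()):.0%}")
    return confusions
```

The confusion tally is the part worth keeping: prompt iteration is much easier when you can see which pairs of topologies the classifier mixes up.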

Why this matters

Without validation, /crew:do mispicks at some unknown rate, and users who don't know to second-guess it get the wrong topology for their task. If accuracy is, say, 60%, the classifier is wrong on two tasks in five and can't be trusted without manual review. If it's 90%, it's a clear win.

Classifier cost is ~$0.001/call, so an ensemble vote or a more expensive model (Sonnet) is worth considering only if calibration shows Haiku is under-performing.
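
If Haiku does under-perform, a self-consistency vote over a handful of calls is the cheapest thing to try before switching models. A sketch, again assuming the hypothetical `classify_topology` callable:

```python
from collections import Counter

def ensemble_classify(task, classify_topology, n_votes=3):
    """Majority vote over repeated calls; ties fall back to the first vote.

    At ~$0.001 per call, three votes still cost well under a cent per task.
    """
    votes = [classify_topology(task) for _ in range(n_votes)]
    winner, count = Counter(votes).most_common(1)[0]
    return winner if count > 1 else votes[0]
```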

Originally flagged in PR #61's follow-ups list and in the "Known limitations" section of skills/do/SKILL.md.
