build-mcpb/references/workflow/phase-3-implement-and-verify.md (51 changes: 0 additions & 51 deletions)
@@ -67,64 +67,13 @@
make check
```
This runs: `format:check` → `lint` → `typecheck` → `test`

### Integration tests

**Python:**
```bash
make test-integration # real API calls (needs <NAME>_API_KEY in .env)
```

Real API calls against the live service. The template scaffolds `tests-integration/test_core_tools.py` as a stub — you must replace it with real tests.

**How to write them:** Open `api_client.py` and list every public method (skip `__init__`, `close`, `_request`, `_ensure_session`, and dunder methods). For each method, write a test:

- **Read methods** (list, get, search): Call with minimal valid parameters, assert the response has the expected shape (list, dict, or model with expected keys).
- **Write methods** (create, update, delete): Create a test resource, verify it, then clean it up in a `finally` block. If no delete method exists, mark the resource as completed/archived and leave a comment.
- **Chained methods:** Some methods need an ID from a prior call (e.g., `list_workspaces` returns a GID needed by `search_tasks`). Chain them — call the list method first, use the first result's ID.
- **Tier-gated methods:** If the API has premium endpoints that may not be available on the user's plan, write a `has_<feature>_access` helper that probes the endpoint and returns `False` on 401/402/403. Use `pytest.skip()` in the test if access is unavailable.

See `references/PATTERNS.md` → "Integration Test Patterns (Python)" for concrete examples.

**How to run them:**

1. Ask the contributor to add their API key to `.env` (e.g., `ASANA_API_KEY=xxx`). The `.env` file is already in `.gitignore` and `.mcpbignore`. The contributor was asked for this key at the start of the process — they should have it ready.
2. Run `make test-integration`.
3. All tests should pass or skip (for tier-gated features). Fix any failures before proceeding.
4. If the contributor says auth setup is too complex for now (e.g., OAuth flows, multi-step app configuration), proceed — the tests are written and ready to run later. Do not skip writing the tests.

### LLM smoke tests

**Python:**
```bash
make test-llm # needs <NAME>_API_KEY + ANTHROPIC_API_KEY in .env
```

Verify Claude Haiku selects the correct tool given the skill resource. Requires both the service API key and `ANTHROPIC_API_KEY`.

**How to write them:** The template scaffolds `get_server_context()` and `get_anthropic_client()` — leave those as-is. Replace the commented-out test stub with 3–5 real tests, one per key tool. Extract a `call_llm()` helper to avoid repeating the system prompt construction across tests.

Each test sends a natural language prompt and asserts the LLM selected the expected tool. Include concrete values for any required parameters in the prompt (IDs, coordinates, dates) — without them, the LLM will ask for clarification instead of calling the tool.

See `references/PATTERNS.md` → "LLM smoke tests" for the `call_llm()` helper pattern and the concrete-identifiers rule.

**How to run them:**

1. Ask the contributor to add `ANTHROPIC_API_KEY` to `.env` alongside the service API key.
2. Run `make test-llm`.
3. All tests should pass. If a test fails because the LLM picked the wrong tool, adjust the prompt to be more specific before touching the SKILL.md.
4. If the contributor does not have an `ANTHROPIC_API_KEY`, proceed — the tests are written and ready to run later. Do not skip writing the tests.

## Gate

**Criteria:**
- [ ] All tool logic, models, and client code implemented
- [ ] Linting passes with no errors
- [ ] Type checking passes with no errors
- [ ] Unit tests pass (`make check`)
- [ ] Integration tests written with real assertions (not stubs or TODOs)
- [ ] Integration tests pass or skip (if an API key is available)
- [ ] LLM smoke tests written with real assertions (not stubs or TODOs)
- [ ] LLM smoke tests pass (if `ANTHROPIC_API_KEY` is available)

**If any criterion fails:** Fix the reported issues and re-run checks.

build-mcpb/references/workflow/phase-5-embed-skill.md (62 changes: 60 additions & 2 deletions)
@@ -116,13 +116,71 @@
The response should include a resource with `uri: "skill://<name>/usage"`.

> **Note:** The embedded skill is encouraged but not mandatory per the MCPB spec. If no meaningful workflows exist yet (e.g., the server has only 1-2 tools), it's acceptable to skip this phase and add the skill later.
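
If you do wire it, here is a minimal sketch of the resource registration, assuming the Python MCP SDK's FastMCP API (the server name, skill URI, and SKILL.md location are placeholders):

```python
# Hypothetical wiring sketch: server name and SKILL.md location are placeholders.
from pathlib import Path

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("asana")


@mcp.resource("skill://asana/usage")
def usage_skill() -> str:
    """Serve the embedded SKILL.md so it appears in resources/list."""
    return (Path(__file__).parent / "SKILL.md").read_text()
```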

## 5f: Integration & LLM Smoke Tests

### Integration tests

**Python:**
```bash
make test-integration # real API calls (needs <NAME>_API_KEY in .env)
```

Real API calls against the live service. The template scaffolds `tests-integration/test_core_tools.py` as a stub — you must replace it with real tests.

**How to write them:** Open `api_client.py` and list every public method (skip `__init__`, `close`, `_request`, `_ensure_session`, and dunder methods). For each method, write a test:

- **Read methods** (list, get, search): Call with minimal valid parameters, assert the response has the expected shape (list, dict, or model with expected keys).
- **Write methods** (create, update, delete): Create a test resource, verify it, then clean it up in a `finally` block. If no delete method exists, mark the resource as completed/archived and leave a comment.
- **Chained methods:** Some methods need an ID from a prior call (e.g., `list_workspaces` returns a GID needed by `search_tasks`). Chain them — call the list method first, use the first result's ID.
- **Tier-gated methods:** If the API has premium endpoints that may not be available on the user's plan, write a `has_<feature>_access` helper that probes the endpoint and returns `False` on 401/402/403. Use `pytest.skip()` in the test if access is unavailable.

See `references/PATTERNS.md` → "Integration Test Patterns (Python)" for concrete examples.
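
As an illustration of those four patterns, a hedged sketch follows (the `AsanaClient` methods, `ApiError` type, and import path are hypothetical stand-ins for your generated `api_client.py`, shown synchronously for brevity):

```python
# tests-integration/test_core_tools.py: illustrative sketch, not a drop-in file.
# AsanaClient, ApiError, and the import path are hypothetical placeholders.
import os

import pytest

from my_server.api_client import ApiError, AsanaClient  # hypothetical path


@pytest.fixture
def client():
    return AsanaClient(api_key=os.environ["ASANA_API_KEY"])


def test_list_workspaces(client):
    # Read method: minimal valid parameters, assert the response shape.
    workspaces = client.list_workspaces()
    assert isinstance(workspaces, list) and workspaces
    assert "gid" in workspaces[0]


def test_search_tasks_chained(client):
    # Chained method: the workspace GID comes from a prior list call.
    gid = client.list_workspaces()[0]["gid"]
    assert isinstance(client.search_tasks(workspace_gid=gid, text="test"), list)


def test_create_then_delete_task(client):
    # Write method: create a resource, verify it, clean up in a finally block.
    gid = client.list_workspaces()[0]["gid"]
    task = client.create_task(workspace_gid=gid, name="integration-test task")
    try:
        assert task["name"] == "integration-test task"
    finally:
        client.delete_task(task["gid"])


def has_portfolio_access(client) -> bool:
    # Tier-gate probe: premium endpoints return 401/402/403 on lower plans.
    try:
        client.list_portfolios()
        return True
    except ApiError as exc:
        if exc.status in (401, 402, 403):
            return False
        raise


def test_list_portfolios(client):
    if not has_portfolio_access(client):
        pytest.skip("portfolio endpoints not available on this plan")
    assert isinstance(client.list_portfolios(), list)
```

The probe-and-skip helper keeps tier-gated failures from blocking the suite while still exercising the endpoint on plans that allow it.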

**How to run them:**

1. Ask the contributor to add their API key to `.env` (e.g., `ASANA_API_KEY=xxx`). The `.env` file is already in `.gitignore` and `.mcpbignore`. The contributor was asked for this key at the start of the process — they should have it ready.
2. Run `make test-integration`.
3. All tests should pass or skip (for tier-gated features). Fix any failures before proceeding.
4. If the contributor says auth setup is too complex for now (e.g., OAuth flows, multi-step app configuration), proceed — the tests are written and ready to run later. Do not skip writing the tests.

### LLM smoke tests

**Python:**
```bash
make test-llm # needs <NAME>_API_KEY + ANTHROPIC_API_KEY in .env
```

Verify Claude Haiku selects the correct tool given the skill resource. Requires both the service API key and `ANTHROPIC_API_KEY`.

**How to write them:** The template scaffolds `get_server_context()` and `get_anthropic_client()` — leave those as-is. Replace the commented-out test stub with 3–5 real tests, one per key tool. Extract a `call_llm()` helper to avoid repeating the system prompt construction across tests.

Each test sends a natural language prompt and asserts the LLM selected the expected tool. Include concrete values for any required parameters in the prompt (IDs, coordinates, dates) — without them, the LLM will ask for clarification instead of calling the tool.

See `references/PATTERNS.md` → "LLM smoke tests" for the `call_llm()` helper pattern and the concrete-identifiers rule.
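
For the shape of the helper and the tests, a hedged sketch (the model ID, tool names, prompts, and helper module are placeholders; `get_server_context()` and `get_anthropic_client()` are the template's scaffolded functions, assumed here to return the system prompt plus tool definitions and an Anthropic client):

```python
# tests-llm/test_tool_selection.py: illustrative sketch.
# Tool names, prompts, and the helper import are hypothetical placeholders.
from smoke_helpers import get_anthropic_client, get_server_context


def call_llm(prompt: str) -> list[str]:
    """Send one prompt with the server's tools; return the tool names chosen."""
    client = get_anthropic_client()
    system, tools = get_server_context()  # assumed return shape
    response = client.messages.create(
        model="claude-3-5-haiku-latest",  # any Haiku-class model ID works
        max_tokens=1024,
        system=system,
        tools=tools,
        messages=[{"role": "user", "content": prompt}],
    )
    return [block.name for block in response.content if block.type == "tool_use"]


def test_selects_search_tasks():
    # Concrete identifiers (the workspace GID) keep the model from asking
    # for clarification instead of calling the tool.
    assert "search_tasks" in call_llm(
        "Find tasks mentioning 'roadmap' in workspace 12345."
    )


def test_selects_create_task():
    assert "create_task" in call_llm(
        "Create a task named 'Ship v1' in workspace 12345."
    )
```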

**How to run them:**

1. Ask the contributor to add `ANTHROPIC_API_KEY` to `.env` alongside the service API key.
2. Run `make test-llm`.
3. All tests should pass. If a test fails because the LLM picked the wrong tool, adjust the prompt to be more specific before touching the SKILL.md.
4. If the contributor does not have an `ANTHROPIC_API_KEY`, proceed — the tests are written and ready to run later. Do not skip writing the tests.

**Working through failures:** These tests can be challenging depending on the target API's auth method, plan-gated endpoints, and rate limits. Work through failures interactively with the contributor:
- If auth is complex (OAuth, multi-step), help the contributor get a working token and update the test fixtures
- If endpoints are plan-gated, use the tier-skip pattern (see PATTERNS.md) to gracefully skip inaccessible endpoints
- If the contributor doesn't have the required API keys or wants to move on, that's fine — these tests are recommended but **not blocking** for initial release

## Gate

**Criteria:**
- [ ] Contributor has approved the SKILL.md content
- [ ] Skill resource is wired in server code
- [ ] `resources/list` includes `skill://<name>/usage`
- [ ] Integration tests written with real assertions (not stubs or TODOs)
- [ ] Integration tests pass (recommended, not blocking)
- [ ] LLM smoke tests written with real assertions (not stubs or TODOs)
- [ ] LLM smoke tests pass (recommended, not blocking)

**If any criterion fails:** Revisit the relevant sub-step above.
**If any criterion fails:** Revisit the relevant sub-step above. For integration/LLM tests, discuss with the contributor whether to fix now or defer.

**When all pass:** Proceed to Phase 6.
**When all pass (or non-blocking items deferred):** Proceed to Phase 6.