Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
39 changes: 38 additions & 1 deletion docs/skills-execution/ui-testing-plan.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@ plumbing) is explicitly out of scope here.
| Fixture location | `src/client/playwright/fixtures/skills/` (vendored, see its `README.md` for provenance/licensing). Chosen over `e2e/` naming; anticipates a future durable Playwright suite under `src/client/playwright/`. |
| Test workspace | A new, dedicated project (**"Skills QA"**) so test artifacts don't mix with real project data. Created fresh, not reused. |
| Chat/runtime backend | The local llama runtime model already configured in this dev environment (no cloud provider calls). |
| First execution step | Not started yet — this plan is for review first. Once approved, start with the smallest smoke scenario (A1) before the full matrix. |
| First execution step | Started 2026-07-01 — exploratory pass via `playwright-cli --extension` on the **Skills QA** project. See §6 execution log. |

## 2. Fixture inventory

Expand Down Expand Up @@ -117,3 +117,40 @@ not graded on response quality — local model output isn't asserted verbatim.
- Whether any of these scenarios graduate into a durable `@playwright/test`
suite under `src/client/playwright/` is a decision for after this
exploratory pass, once we know which assertions are stable enough for CI.

## 6. Execution log

Workspace: project **Skills QA** (`a976c272-931f-47ee-8d92-34a5374a1199`).

| Guide | ID | Purpose |
|---|---|---|
| Skills QA Guide | `aab4c5fd-27bb-4774-a457-6abcfa13c50d` | Main import + runtime tool-calling |
| Skills QA Gating | `c4164aaf-ea21-4940-b88e-e21bb4f6d91a` | `requires_toolsets` + `fallback_for_toolsets` gating |

### Runtime / chat (section C + tool-calling extensions)

| # | Result | Notes |
|---|---|---|
| C1 | PASS | Discovery lists enabled skills; disabled `arxiv` omitted |
| C2 | PASS | `skills.read` on ocr skill; trace `Source=skills` |
| C3 / T19 | PASS | `docker-management` hidden from discovery when terminal tools off |
| T1–T8 | PASS | `skills.list`, progressive `skills.read`, path traversal, disabled skill |
| T13 | PASS | PDF extraction via `ocr-and-documents` script + `[@files]` |
| T20–T22 | PASS | Gated docker skill: empty list → explicit read works → listed after Run Python/Bash enabled |
| T24 | PASS | Disabled `arxiv` not claimed when asked directly |
| T18 runtime | PASS | Web off → `searxng-search` listed; Web on → suppressed (only `docker-management`) |
| T18 / A10 UI | **FAIL → fixed** | UI showed “Prerequisites met” for `searxng-search` while Web Search was on. Root cause: client `computeSkillGating` ignored `fallback_for_toolsets`. Fixed in `skillGating.ts` + DTO wiring; badge now **Suppressed** with explanatory summary. Re-verify in browser after rebuild. |

### Authoring (section A) — partial

| # | Result | Notes |
|---|---|---|
| A9 | PASS | `docker-management` card **Gated** without terminal; **Prerequisites met** after Run Python/Bash |
| A10 | FAIL (pre-fix) | See T18 UI row above |
| A1–A8, A11–A14, B | Not run / deferred | |

### Bugs found during pass

1. **SkillManifestUpdater** corrupted SKILL.md frontmatter on save — removed; sidecar-only sync (`AssistantSkillMetaSync`).
2. **Fallback gating UI gap** — Skills tab did not mirror server `SkillVisibilityFilter` for `fallback_for_toolsets` (fixed this session).
3. **Fixture README** — `searxng-search` gating description was inverted (fixed).
2 changes: 1 addition & 1 deletion src/client/playwright/fixtures/skills/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@ ships with are not needed here.
| `ocr-and-documents/` | `skills/productivity/ocr-and-documents` | hermes, has `scripts/` + an unrecognized `DESCRIPTION.md` | Import preserves files it doesn't understand |
| `research-paper-writing/` | `skills/research/research-paper-writing` | hermes; **`SKILL.md` is 105,111 chars** | Real-world trip of the 100,000-char cap (`SkillFrontmatter.MaxSkillMarkdownChars`) — must be rejected |
| `kanban-video-orchestrator/` | `optional-skills/creative/kanban-video-orchestrator` | hermes; has all three canonical subfolders (`references/`, `scripts/`, `assets/`) together | Full-tree import; confirms scripts/assets are stored but never auto-run/auto-copied (S11) |
| `searxng-search/` | `optional-skills/research/searxng-search` | hermes; `fallback_for_toolsets: [web]` | Gating: GuideAnts maps `web` → `WebSearch`/`ReadWeb` (`SkillToolsetMapping.ts`) — visible only when the assistant has those tools |
| `searxng-search/` | `optional-skills/research/searxng-search` | hermes; `fallback_for_toolsets: [web]` | Gating: GuideAnts maps `web` → `WebSearch`/`ReadWeb` (`SkillToolsetMapping.ts`) — offered only when those tools are **not** enabled (fallback suppressed when web search is available) |
| `docker-management/` | `optional-skills/devops/docker-management` | hermes; `requires_toolsets: [terminal]` | Gating: `terminal` → `code_interpreter` — visible only when the assistant has the sandbox/code-interpreter capability |
| `pptx-author-collision/` | `optional-skills/finance/pptx-author` | agentskills.io-style, Apache-2.0 (adapted from `anthropics/financial-services`); **`name: pptx-author`** | Name collision: same skill `name` as the shipped bootstrap skill at `Resources/bootstrap/guides/pptx-guide/Skills/pptx-author/` |
| `invalid-missing-name/` | Hand-authored (not from hermes-agent) | n/a | Missing required `name` field — import must reject explicitly, no silent partial acceptance |
Expand Down
2 changes: 2 additions & 0 deletions src/client/src/components/guides/editor/BaseEntityEditor.tsx
Original file line number Diff line number Diff line change
Expand Up @@ -594,6 +594,8 @@ export default function BaseEntityEditor({ entityType, entityId, projectId }: Ba
source: skill.source,
requiresToolsets: [],
requiresTools: [],
fallbackForToolsets: [],
fallbackForTools: [],
files: (skill.filesToAdd ?? []).map((file, index) => ({
id: `pending-${skill.name}-${index}`,
folderKind: file.folderKind,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,8 @@ describe('guideHasSkillScriptsPayload', () => {
source: 'Imported',
requiresToolsets: [],
requiresTools: [],
fallbackForToolsets: [],
fallbackForTools: [],
files: [{ id: '1', folderKind: 'Skill', relativePath: 'Skills/kanban/scripts/monitor.py', created: '' }],
}],
pendingSkillUploads: [],
Expand Down
6 changes: 5 additions & 1 deletion src/client/src/components/guides/editor/skills/SkillCard.tsx
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,8 @@ interface SkillCardProps {
source: import('../../../../types/guides').SkillSourceKind;
requiresToolsets: string[];
requiresTools: string[];
fallbackForToolsets: string[];
fallbackForTools: string[];
files: FileDto[];
};
availableToolTypes: Set<string>;
Expand Down Expand Up @@ -50,6 +52,8 @@ export function SkillCard({
{
requiresToolsets: skill.requiresToolsets,
requiresTools: skill.requiresTools,
fallbackForToolsets: skill.fallbackForToolsets,
fallbackForTools: skill.fallbackForTools,
},
availableToolTypes,
hasCodeInterpreterFiles,
Expand All @@ -71,7 +75,7 @@ export function SkillCard({
{skill.source}
</span>
<span className={`inline-flex rounded px-2 py-0.5 text-xs font-medium ${viewModel.gatingClassName}`}>
{gating.satisfied ? 'Prerequisites met' : 'Gated'}
{gating.statusLabel}
</span>
</div>
<p className="text-sm text-gray-700">{skill.description}</p>
Expand Down
6 changes: 6 additions & 0 deletions src/client/src/components/guides/editor/skills/SkillsTab.tsx
Original file line number Diff line number Diff line change
Expand Up @@ -103,12 +103,16 @@ export function SkillsTab({
const manifest = skill.filesToAdd?.find((file) => file.relativePath.endsWith('/SKILL.md'));
let requiresToolsets: string[] = [];
let requiresTools: string[] = [];
let fallbackForToolsets: string[] = [];
let fallbackForTools: string[] = [];
if (manifest?.contentBytes) {
try {
const markdown = decodePendingFileContent(manifest.contentBytes);
const parsed = parseSkillFrontmatter(markdown);
requiresToolsets = parsed.frontmatter.requiresToolsets;
requiresTools = parsed.frontmatter.requiresTools;
fallbackForToolsets = parsed.frontmatter.fallbackForToolsets;
fallbackForTools = parsed.frontmatter.fallbackForTools;
} catch {
// display without gating metadata if parse fails
}
Expand All @@ -122,6 +126,8 @@ export function SkillsTab({
source: skill.source,
requiresToolsets,
requiresTools,
fallbackForToolsets,
fallbackForTools,
files: (skill.filesToAdd ?? []).map((file, index) => ({
id: `pending-${skill.name}-${index}`,
folderKind: file.folderKind,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -73,22 +73,66 @@ body
describe('skillGating', () => {
it('reports satisfied when required tools are present', () => {
const result = computeSkillGating(
{ requiresToolsets: ['web'], requiresTools: [] },
{
requiresToolsets: ['web'],
requiresTools: [],
fallbackForToolsets: [],
fallbackForTools: [],
},
new Set(['WebSearch', 'ReadWeb']),
false,
);
expect(result.satisfied).toBe(true);
expect(result.statusLabel).toBe('Prerequisites met');
});

it('reports unsatisfied honestly', () => {
const result = computeSkillGating(
{ requiresToolsets: ['web'], requiresTools: [] },
{
requiresToolsets: ['web'],
requiresTools: [],
fallbackForToolsets: [],
fallbackForTools: [],
},
new Set([]),
false,
);
expect(result.satisfied).toBe(false);
expect(result.statusLabel).toBe('Gated');
expect(result.summary).toContain('Will not be offered');
});

it('suppresses fallback skills when the primary toolset is available', () => {
const result = computeSkillGating(
{
requiresToolsets: [],
requiresTools: [],
fallbackForToolsets: ['web'],
fallbackForTools: [],
},
new Set(['WebSearch']),
false,
);
expect(result.satisfied).toBe(false);
expect(result.statusLabel).toBe('Suppressed');
expect(result.summary).toContain('web');
});

it('offers fallback skills when the primary toolset is unavailable', () => {
const result = computeSkillGating(
{
requiresToolsets: [],
requiresTools: [],
fallbackForToolsets: ['web'],
fallbackForTools: [],
},
new Set([]),
false,
);
expect(result.satisfied).toBe(true);
expect(result.statusLabel).toBe('Prerequisites met');
expect(result.summary).toContain('fallback');
});
});

describe('skillToolsetMapping', () => {
Expand All @@ -108,7 +152,12 @@ describe('skillToolsetMapping', () => {
describe('skillCardViewModel', () => {
it('builds source and gating badges', () => {
const gating = computeSkillGating(
{ requiresToolsets: [], requiresTools: [] },
{
requiresToolsets: [],
requiresTools: [],
fallbackForToolsets: [],
fallbackForTools: [],
},
new Set(['WebSearch']),
false,
);
Expand Down Expand Up @@ -147,6 +196,8 @@ describe('skillOrdering', () => {
files: [],
requiresToolsets: [],
requiresTools: [],
fallbackForToolsets: [],
fallbackForTools: [],
},
{
name: 'beta',
Expand All @@ -157,6 +208,8 @@ describe('skillOrdering', () => {
files: [],
requiresToolsets: [],
requiresTools: [],
fallbackForToolsets: [],
fallbackForTools: [],
},
];

Expand All @@ -181,6 +234,8 @@ describe('skillOrdering', () => {
files: [],
requiresToolsets: [],
requiresTools: [],
fallbackForToolsets: [],
fallbackForTools: [],
},
{
name: 'qa-authored-a8',
Expand All @@ -191,6 +246,8 @@ describe('skillOrdering', () => {
files: [],
requiresToolsets: [],
requiresTools: [],
fallbackForToolsets: [],
fallbackForTools: [],
},
{
name: 'kanban-video-orchestrator',
Expand All @@ -201,6 +258,8 @@ describe('skillOrdering', () => {
files: [],
requiresToolsets: [],
requiresTools: [],
fallbackForToolsets: [],
fallbackForTools: [],
},
];

Expand Down
10 changes: 10 additions & 0 deletions src/client/src/components/guides/editor/skills/skillFrontmatter.ts
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,8 @@ export interface ParsedSkillFrontmatter {
displayOrder: number;
requiresToolsets: string[];
requiresTools: string[];
fallbackForToolsets: string[];
fallbackForTools: string[];
source?: string;
}

Expand Down Expand Up @@ -165,6 +167,14 @@ export function parseSkillFrontmatter(markdown: string): SkillFrontmatterParseRe
getStringList(hermes, 'requires_tools'),
getStringList(root, 'allowed-tools'),
),
fallbackForToolsets: coalesceLists(
getStringList(guideants, 'fallback_for_toolsets'),
getStringList(hermes, 'fallback_for_toolsets'),
),
fallbackForTools: coalesceLists(
getStringList(guideants, 'fallback_for_tools'),
getStringList(hermes, 'fallback_for_tools'),
),
source: getString(guideants, 'source') ?? getString(hermes, 'source'),
};

Expand Down
61 changes: 54 additions & 7 deletions src/client/src/components/guides/editor/skills/skillGating.ts
Original file line number Diff line number Diff line change
@@ -1,14 +1,18 @@
import { TOOLSET_TO_TOOLS } from './skillToolsetMapping';
import { isToolsetAvailable, TOOLSET_TO_TOOLS } from './skillToolsetMapping';

export interface SkillGatingInput {
requiresToolsets: string[];
requiresTools: string[];
fallbackForToolsets: string[];
fallbackForTools: string[];
}

export interface SkillGatingResult {
satisfied: boolean;
missingCapabilities: string[];
suppressedByCapabilities: string[];
summary: string;
statusLabel: string;
}

export function computeSkillGating(
Expand Down Expand Up @@ -44,18 +48,61 @@ export function computeSkillGating(
}
}

if (missing.size === 0) {
if (missing.size > 0) {
const labels = [...missing].map((item) => item.replace(/^toolset:|^tool:/, ''));
return {
satisfied: false,
missingCapabilities: labels,
suppressedByCapabilities: [],
summary: `Will not be offered to the model until ${labels.join(', ')} is added.`,
statusLabel: 'Gated',
};
}

const suppressed = new Set<string>();

for (const toolset of skill.fallbackForToolsets) {
if (isToolsetAvailable(toolset, availableToolTypes, hasCodeInterpreterFiles)) {
suppressed.add(toolset);
}
}

for (const tool of skill.fallbackForTools) {
if (availableToolTypes.has(tool)) {
suppressed.add(tool);
}
}

if (suppressed.size > 0) {
const labels = [...suppressed];
return {
satisfied: false,
missingCapabilities: [],
suppressedByCapabilities: labels,
summary: `Will not be offered while ${labels.join(', ')} is available (primary capability replaces this fallback skill).`,
statusLabel: 'Suppressed',
};
}

if (skill.fallbackForToolsets.length > 0 || skill.fallbackForTools.length > 0) {
const fallbackLabels = [
...skill.fallbackForToolsets,
...skill.fallbackForTools,
];
return {
satisfied: true,
missingCapabilities: [],
summary: 'All prerequisites satisfied by the current assistant tools.',
suppressedByCapabilities: [],
summary: `Offered as a fallback when ${fallbackLabels.join(', ')} is unavailable.`,
statusLabel: 'Prerequisites met',
};
}

const labels = [...missing].map((item) => item.replace(/^toolset:|^tool:/, ''));
return {
satisfied: false,
missingCapabilities: labels,
summary: `Will not be offered to the model until ${labels.join(', ')} is added.`,
satisfied: true,
missingCapabilities: [],
suppressedByCapabilities: [],
summary: 'All prerequisites satisfied by the current assistant tools.',
statusLabel: 'Prerequisites met',
};
}
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,25 @@ export const TOOLSET_TO_TOOLS: Record<string, string[]> = {
web: ['WebSearch', 'ReadWeb'],
};

export function isToolsetAvailable(
toolset: string,
availableToolTypes: Set<string>,
hasCodeInterpreterFiles: boolean,
): boolean {
const mapped = TOOLSET_TO_TOOLS[toolset.toLowerCase()];
if (!mapped) {
return false;
}

return mapped.some((toolType) => {
if (toolType === 'code_interpreter') {
return hasCodeInterpreterFiles || availableToolTypes.has('code_interpreter');
}

return availableToolTypes.has(toolType);
});
}

export const CATALOG_TOOL_IDS: Record<string, string> = {
WebSearch: 'b0000000-0000-0000-0000-00000000000d',
ReadWeb: 'b0000000-0000-0000-0000-00000000000e',
Expand Down
Loading
Loading