| Field | Value |
|---|---|
| Platform | |
| Platform version | |
| Check list version | |
| Date tested | |
| Model used | |
| Tester | |
Some behaviors may vary by model within the same platform harness. For example, a platform that lets the user choose between Claude Sonnet and Claude Opus may produce different skill-loading behavior depending on which model is active, because the model itself makes activation decisions. If you test with multiple models, either create separate files for each model or note model-specific differences inline.
When recording results, try to distinguish between:
- Platform-level behavior: Enforced by the harness (deterministic). Example: the platform strips frontmatter before passing content to the model. This won't vary by model or across runs.
- Model-level behavior: Determined by the model's interpretation of instructions (probabilistic). Example: the model decides whether to follow a markdown link and read the referenced file. This may vary by model, prompt language, or even across runs with the same model.
This distinction matters because platform-level behaviors are stable and predictable, while model-level behaviors may need multiple test runs to characterize and may change when the user switches models.
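The stable-vs-varied distinction can be checked mechanically by repeating a probe and tallying outcomes. A minimal sketch (the helper and its labels are my own, not part of any platform's tooling):

```python
# Classify repeated runs of a check as stable (consistent with platform-level
# behavior) or varied (consistent with model-level behavior).
# The labels and threshold logic are illustrative assumptions.
from collections import Counter

def classify_runs(outcomes):
    """outcomes: one entry per run, e.g. True if the canary phrase was visible."""
    distinct = Counter(outcomes)
    if len(distinct) == 1:
        return "stable (consistent with platform-level behavior)"
    return "varied (consistent with model-level behavior; run more trials)"

print(classify_runs([True, True, True]))   # stable (...)
print(classify_runs([True, False, True]))  # varied (...)
```

A handful of runs can't prove determinism, so treat "stable" as provisional until you've repeated the check across models.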
For each check, record what you observed. Use one of these statuses:
- Observed: You tested this and have a finding.
- Inconclusive: You tested this but the result was ambiguous or inconsistent across runs.
- Not tested: You haven't tested this check yet.
Each check includes a Fallback behavior field. This captures what happens when the platform's default behavior doesn't surface content to the model. I hypothesize there may be three patterns:
- Agent self-recovers: The agent independently realizes it needs the content and uses a file-read tool or other mechanism to access it without user intervention. This may be hard to distinguish from platform-level loading; note whether you observed the agent making an explicit tool call to read the file.
- User prompt required: The user must explicitly instruct the agent (e.g., "read the file at references/api-overview.md") to access the content. The content is accessible but only with manual intervention.
- No fallback: The content is truly inaccessible through any mechanism. The platform blocks access or the agent has no tool capable of reaching it.
This matters for skill authors writing portable skills. If a skill's references are invisible on a platform due to a closed directory set, but the user can work around it with an explicit prompt, the skill author can include a note like "If your agent doesn't automatically load the reference files, ask it to read references/api-overview.md." If there's no fallback, the skill simply doesn't work on that platform.
- Benchmark skill: probe-loading — Install the skill, start a new session, and ask the model "Do you know the phrase CARDINAL-ZEBRA-7742?" WITHOUT activating the skill. If the model knows it, the platform loaded the full body at discovery time.
- Status: Not tested
- Observation:
- Evidence:
- Platform-level or model-level?:
- Fallback behavior:
- Benchmark skill: probe-loading — Activate the skill and check steps 3-4 of its instructions. If the model already has the contents of files in references/, scripts/, or assets/ without reading them, the platform loaded them at activation. Look for canary phrases: PELICAN-MANGO-3391, FALCON-QUARTZ-8819, OSPREY-COBALT-5567, HERON-AMBER-2204, CRANE-TOPAZ-6638.
- Status: Not tested
- Observation:
- Evidence:
- Platform-level or model-level?:
- Fallback behavior:
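Checking which canary phrases surfaced can be done by searching a saved session transcript. A hypothetical helper, assuming the transcript is available as plain text (nothing here is platform API; the phrase list comes from the checks above):

```python
# Scan a transcript for the benchmark canary phrases to see which bundled
# files the platform surfaced to the model.
CANARIES = [
    "PELICAN-MANGO-3391",
    "FALCON-QUARTZ-8819",
    "OSPREY-COBALT-5567",
    "HERON-AMBER-2204",
    "CRANE-TOPAZ-6638",
]

def canaries_seen(transcript, phrases=CANARIES):
    """Return the canary phrases that appear verbatim in the transcript."""
    return sorted(p for p in phrases if p in transcript)

transcript = "The file mentions FALCON-QUARTZ-8819 and nothing else."
print(canaries_seen(transcript))  # ['FALCON-QUARTZ-8819']
```

A phrase in the transcript only shows the content was surfaced somehow; distinguishing pre-loading from an explicit file read still requires inspecting tool calls.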
- Benchmark skill: probe-linked-resources — Activate the skill and check whether the model already has the contents of the linked files (PARROT-SILVER-4412, TOUCAN-BRONZE-9931) without reading them. Also check whether the unlinked file (EAGLE-COPPER-1178) was loaded, which distinguishes link-based pre-fetching from bulk directory loading.
- Status: Not tested
- Observation:
- Evidence:
- Platform-level or model-level?:
- Fallback behavior:
- Benchmark skill: probe-loading — Activate the skill and check step 3. The skill has all three spec directories (scripts/, references/, assets/). Does the platform enumerate all of them?
- Status: Not tested
- Observation:
- Evidence:
- Platform-level or model-level?:
- Fallback behavior:
- Benchmark skill: probe-nonstandard-dirs — Activate the skill and check whether resources/ is treated the same way references/ would be. Is the canary phrase SWIFT-OPAL-8156 visible or enumerated?
- Status: Not tested
- Observation:
- Evidence:
- Platform-level or model-level?:
- Fallback behavior:
- Benchmark skill: probe-nonstandard-dirs — Activate the skill and check which of the nonstandard directories (evals/, templates/, resources/) the model is aware of. Look for canary phrases: ROBIN-JADE-3847, WREN-PEARL-6293, SWIFT-OPAL-8156.
- Status: Not tested
- Observation:
- Evidence:
- Platform-level or model-level?:
- Fallback behavior:
- Benchmark skill: probe-loading — Activate the skill. The references/ directory has 3 files (2 linked from SKILL.md, 1 unreferenced). Check whether all 3 are enumerated, only the linked ones, or none. The unreferenced file's canary phrase is OSPREY-COBALT-5567.
- Status: Not tested
- Observation:
- Evidence:
- Platform-level or model-level?:
- Fallback behavior:
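The linked-vs-unreferenced split above can be reproduced locally to sanity-check what an enumeration probe should find. A sketch of that layout; the file names other than SKILL.md are illustrative, and only OSPREY-COBALT-5567 is taken from the check itself:

```python
# Build the references/ layout described above: two files linked from
# SKILL.md plus one unreferenced file carrying a canary phrase.
import pathlib
import tempfile

root = pathlib.Path(tempfile.mkdtemp()) / "probe-loading"
refs = root / "references"
refs.mkdir(parents=True)

(refs / "linked-one.md").write_text("Linked file one.\n")
(refs / "linked-two.md").write_text("Linked file two.\n")
(refs / "unreferenced.md").write_text("Canary: OSPREY-COBALT-5567\n")
(root / "SKILL.md").write_text(
    "See [one](references/linked-one.md) and [two](references/linked-two.md).\n"
)

# The enumeration question: does the platform surface all three files,
# only the two linked ones, or none?
print(sorted(p.name for p in refs.iterdir()))
```

Comparing the platform's enumeration against this ground truth tells you whether it walks the directory or only follows links.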
- Benchmark skill: probe-linked-resources — Activate the skill and have the model try to read files using the relative paths in SKILL.md. Note which directory the paths resolve against.
- Status: Not tested
- Observation:
- Evidence:
- Platform-level or model-level?:
- Fallback behavior:
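The resolution question can be framed as: which candidate base directory makes the relative path point at a real file? A sketch with a fabricated layout (directory names are illustrative):

```python
# Given a relative path as written in SKILL.md, report which candidate base
# directories make it resolve to an existing file.
import pathlib
import tempfile

base = pathlib.Path(tempfile.mkdtemp())
skill_dir = base / "skills" / "probe-linked-resources"
(skill_dir / "references").mkdir(parents=True)
(skill_dir / "references" / "doc.md").write_text("hello\n")

rel = "references/doc.md"
candidates = {"skill dir": skill_dir, "workspace root": base}
resolved = [name for name, d in candidates.items() if (d / rel).is_file()]
print(resolved)  # ['skill dir']
```

If the platform's read tool resolves against the workspace root instead of the skill directory, relative links in SKILL.md will miss, which is exactly the failure mode this check is probing.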
- Benchmark skills: probe-shadow-alpha + probe-shadow-beta — Activate both skills. Have each one read references/API.md. Check which canary phrase appears: STORK-CORAL-4471 (alpha) or EGRET-SLATE-8823 (beta).
- Status: Not tested
- Observation:
- Evidence:
- Platform-level or model-level?:
- Fallback behavior:
- Benchmark skill: probe-traversal — Activate the skill and follow its instructions to attempt reads outside the skill directory (../probe-loading/SKILL.md, ../README.md, ../../loading-behavior.md).
- Status: Not tested
- Observation:
- Evidence:
- Platform-level or model-level?:
- Fallback behavior:
- Benchmark skills: probe-loading, probe-compatibility — Activate each skill and check whether the model can see frontmatter fields (allowed-tools, compatibility, metadata). probe-loading has multiple optional fields; probe-compatibility has a compatibility field with meaningful requirements text.
- Status: Not tested
- Observation:
- Evidence:
- Platform-level or model-level?:
- Fallback behavior:
- Benchmark skill: probe-metadata-values — Activate the skill. If it loads, the platform didn't reject the edge-case metadata values. Check steps 2-3 to see which values the model received and whether any keys were dropped. Look for canary phrase THRUSH-FLINT-8294 to confirm the body loaded.
- Status: Not tested
- Observation:
- Evidence:
- Platform-level or model-level?:
- Fallback behavior:
- Benchmark skill: probe-loading — Activate the skill and check step 2. Ask the model to describe how the skill content was presented to it (raw markdown, XML tags, JSON, etc.).
- Status: Not tested
- Observation:
- Evidence:
- Platform-level or model-level?:
- Fallback behavior:
- Benchmark skill: probe-loading — Activate the skill, have a conversation, then activate it again. Ask the model whether it sees the skill instructions twice in its context.
- Status: Not tested
- Observation:
- Evidence:
- Platform-level or model-level?:
- Fallback behavior:
- Benchmark skill: probe-loading — Activate the skill, then edit the SKILL.md to change the canary phrase from CARDINAL-ZEBRA-7742 to something else. Activate the skill again in the same session and ask for the canary phrase.
- Status: Not tested
- Observation:
- Evidence:
- Platform-level or model-level?:
- Fallback behavior:
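The mid-session edit can be done by hand, but a small helper keeps it repeatable across runs. A sketch; the replacement phrase is made up, and any distinct string works:

```python
# Swap the canary phrase in a SKILL.md body so a re-activation in the same
# session reveals whether the platform re-reads the file or serves a cached copy.
def swap_canary(text, old="CARDINAL-ZEBRA-7742", new="CARDINAL-ZEBRA-0000"):
    """Return text with the canary replaced; fail loudly if it's absent."""
    if old not in text:
        raise ValueError("canary not found; is this the right SKILL.md?")
    return text.replace(old, new)

body = "Canary phrase: CARDINAL-ZEBRA-7742"
print(swap_canary(body))  # Canary phrase: CARDINAL-ZEBRA-0000
```

If the model reports the old phrase after re-activation, the platform cached the body; if it reports the new one, activation re-reads from disk.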
- Benchmark skill: probe-loading — Activate the skill, then have a long conversation (enough to trigger context compaction). Ask the model to recall the canary phrase CARDINAL-ZEBRA-7742 and the skill's specific instructions.
- Status: Not tested
- Observation:
- Evidence:
- Platform-level or model-level?:
- Fallback behavior:
- Benchmark skill: Any benchmark skill — Install at project level in a freshly cloned or untrusted repository. Start a new session and check whether the skill appears in the available skills list, or if the platform prompts for trust approval.
- Status: Not tested
- Observation:
- Evidence:
- Platform-level or model-level?:
- Fallback behavior:
- Benchmark skill: probe-compatibility — Activate the skill and follow its instructions. The skill's compatibility field says "Designed for Claude Code (or similar products). Requires Python 3.14+ and network access." Test on a non-Claude platform to see how it handles the Claude-specific text.
- Status: Not tested
- Observation:
- Evidence:
- Platform-level or model-level?:
- Fallback behavior:
- Benchmark skill: probe-deep-nesting — Install the skill and check the available skills list. Does nested-skill appear as a separate skill? Its SKILL.md is at probe-deep-nesting/references/nested-skill/SKILL.md. Canary phrase: HAWK-ONYX-5534.
- Status: Not tested
- Observation:
- Evidence:
- Platform-level or model-level?:
- Fallback behavior:
- Benchmark skill: probe-deep-nesting — Activate the skill and follow its instructions to read files at 1 level (DOVE-GARNET-1029), 2 levels (LARK-RUBY-4483), and 3 levels (OWL-EMERALD-7756, FINCH-SAPPHIRE-2098) of nesting.
- Status: Not tested
- Observation:
- Evidence:
- Platform-level or model-level?:
- Fallback behavior:
- Benchmark skills: invoke-alpha + invoke-beta — Activate invoke-alpha. Does it successfully activate invoke-beta? Look for canary phrase TERN-MOSS-6647 in the output.
- Status: Not tested
- Observation:
- Evidence:
- Platform-level or model-level?:
- Fallback behavior:
- Benchmark skills: invoke-alpha + invoke-beta + invoke-gamma — Activate invoke-alpha and let the full chain run. Does it reach invoke-gamma? Look for canary phrase JAY-TEAL-9984. If the chain breaks, note which link failed.
- Status: Not tested
- Observation:
- Evidence:
- Platform-level or model-level?:
- Fallback behavior:
- Benchmark skills: probe-circular-alpha + probe-circular-beta — Activate probe-circular-alpha. Does the platform detect the circular reference and stop, or does it loop? Count how many times each canary phrase appears (KITE-ONYX-2251, WREN-SLATE-7738).
- Status: Not tested
- Observation:
- Evidence:
- Platform-level or model-level?:
- Fallback behavior:
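Counting the circular-probe canaries in a transcript gives a rough loop measure. A sketch, again assuming a plain-text transcript (the helper is not platform tooling):

```python
# Count occurrences of each circular-probe canary. A platform that detects
# the cycle and stops should yield small counts; a loop inflates them.
def canary_counts(transcript, phrases=("KITE-ONYX-2251", "WREN-SLATE-7738")):
    return {p: transcript.count(p) for p in phrases}

sample = "KITE-ONYX-2251 ... WREN-SLATE-7738 ... KITE-ONYX-2251"
print(canary_counts(sample))  # {'KITE-ONYX-2251': 2, 'WREN-SLATE-7738': 1}
```

Raw counts overstate activations when a phrase is echoed in summaries, so cross-check against the number of actual activation events if the platform exposes them.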
- Benchmark skills: invoke-alpha + invoke-beta + invoke-gamma — Run the invocation chain test in English, then repeat it in another language (e.g., Japanese: "呼び出しチェーンを開始してください"). Compare success rates across multiple runs.
- Status: Not tested
- Observation:
- Evidence:
- Platform-level or model-level?:
- Fallback behavior:
- Benchmark skills: invoke-alpha + invoke-beta — Same test as cross-skill-invocation. The invoke chain uses prose instructions to express dependencies between skills.
- Status: Not tested
- Observation:
- Evidence:
- Platform-level or model-level?:
- Fallback behavior:
- Benchmark skill: probe-missing-dep — Activate the skill. It references nonexistent-formatter, which doesn't exist. Observe the failure mode: does the model report that the skill doesn't exist, silently skip the step, or attempt to fulfill the task from general knowledge?
- Status: Not tested
- Observation:
- Evidence:
- Platform-level or model-level?:
- Fallback behavior:
- Benchmark skill: probe-nonstandard-fields — Activate the skill. It has requires: probe-loading and depends-on: [probe-shadow-alpha, probe-shadow-beta] in its frontmatter. Check whether the platform acted on these fields or ignored them.
- Status: Not tested
- Observation:
- Evidence:
- Platform-level or model-level?:
- Fallback behavior:
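One way to see what a platform might pass through or drop is to list the top-level frontmatter keys yourself before testing. A sketch using a naive line-based parse rather than a full YAML parser (the sample frontmatter mirrors the fields named in the check):

```python
# List top-level keys in a SKILL.md frontmatter block, so observed
# platform behavior can be compared against the keys actually present.
def frontmatter_keys(skill_md):
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return []  # no frontmatter block
    keys = []
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        if ":" in line and not line.startswith((" ", "\t", "-")):
            keys.append(line.split(":", 1)[0].strip())
    return keys

sample = """---
name: probe-nonstandard-fields
requires: probe-loading
depends-on: [probe-shadow-alpha, probe-shadow-beta]
---
body
"""
print(frontmatter_keys(sample))  # ['name', 'requires', 'depends-on']
```

If the model can only report a subset of these keys after activation, the platform is filtering frontmatter before it reaches the context.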
- Benchmark skills: probe-cross-scope + probe-loading — Install probe-cross-scope at project level and probe-loading at user level. Activate probe-cross-scope and see whether it can invoke probe-loading across scopes. Then remove probe-loading from user level and test again.
- Status: Not tested
- Observation:
- Evidence:
- Platform-level or model-level?:
- Fallback behavior: