
harness: add targeted tests for dedup sensitivity, multi-module section splitting, and security boundary routing #36

@SmartBrandStrategies


Context

After running the full scenario harness and analyzing the results, we identified these test gaps as high-value additions for improving classifier quality.

Proposed additional test scenarios

1. Dedup sensitivity — near-duplicate variants

The current dedup test (`edge-duplicate-injection`) injects identical text. We need tests for:

  • Rephrased duplicates: "Never commit secrets" vs "Do not commit secrets to the repository". Jaccard similarity may fall below the 0.8 threshold, so the rephrased rule is not detected as a duplicate (a false negative).
  • Partial duplicates: a new session adds 3 rules, 2 of which already exist in ADF — only 1 should migrate
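A minimal sketch of why rephrased duplicates can slip through, assuming the dedup step compares lowercased word sets with Jaccard similarity and treats a pair at or above 0.8 as a duplicate. The tokenizer, the `>=` comparison, the sample rules, and the names `tokenize`, `jaccard`, and `filterNew` are all assumptions for illustration, not the harness's actual code. The same helper models the partial-duplicate case: only incoming rules with no match in the existing ADF survive.

```typescript
// Hypothetical tokenizer: lowercase, split on runs of non-alphanumerics, drop empties.
// The harness's real normalization may differ.
function tokenize(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0));
}

// Jaccard similarity |A ∩ B| / |A ∪ B| over the two token sets.
function jaccard(a: string, b: string): number {
  const setA = tokenize(a);
  const setB = tokenize(b);
  const intersection = Array.from(setA).filter((t) => setB.has(t)).length;
  const union = setA.size + setB.size - intersection;
  return union === 0 ? 1 : intersection / union;
}

// Assumed threshold: a pair at or above 0.8 counts as a duplicate.
const DEDUP_THRESHOLD = 0.8;

// Keep only incoming rules that do not duplicate any existing ADF rule.
function filterNew(incoming: string[], existing: string[]): string[] {
  return incoming.filter(
    (rule) => !existing.some((prev) => jaccard(rule, prev) >= DEDUP_THRESHOLD),
  );
}

// Rephrased duplicate: only "commit" and "secrets" are shared, so similarity is 2/8.
const rephrased = jaccard("Never commit secrets", "Do not commit secrets to the repository");
console.log(rephrased); // 0.25

// Partial duplicate (hypothetical rules): 2 of 3 incoming rules already exist.
const existingRules = ["Never commit secrets", "Use pnpm for installs"];
const incomingRules = ["Never commit secrets", "Use pnpm for installs", "Run lint before push"];
console.log(filterNew(incomingRules, existingRules)); // only "Run lint before push" remains
```

Under this tokenization the rephrased pair scores far below 0.8, which is exactly the false negative a near-duplicate test should pin down.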

2. Multi-module section splitting — heading dominates all items

In the current classifier, once a heading routes to a module, every item in that section goes to that module; item-level keywords that point to other modules are ignored.

Example failure:
```
## Database

- D1 bound as `DB` in wrangler.toml → backend.adf (heading wins)
- Run migrations with `wrangler d1 migrate` → backend.adf (should be infra.adf!)
```

A test that verifies cross-keyword items within a section would expose this and track when it's fixed.
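To make the expected behavior concrete, here is a sketch of a router in which an item's own keywords override the section heading. The keyword tables and the `routeByHeading` / `routeItem` functions are hypothetical; the classifier's real trigger lists are not shown in this issue. A cross-keyword test would assert that the second item lands in `infra.adf`, which heading-wins routing currently fails.

```typescript
type Module = "backend.adf" | "infra.adf" | "security.adf";

// Hypothetical heading table; the real classifier's keywords will differ.
const HEADING_ROUTES: Record<string, Module> = {
  database: "backend.adf",
  auth: "security.adf",
};

// Hypothetical item-level triggers that should override the heading.
const ITEM_TRIGGERS: Array<{ pattern: RegExp; module: Module }> = [
  { pattern: /\bd1 migrate\b/i, module: "infra.adf" },
  { pattern: /\bmigrations?\b/i, module: "infra.adf" },
];

// Current behavior as described above: the heading decides for every item.
function routeByHeading(heading: string): Module | undefined {
  return HEADING_ROUTES[heading.toLowerCase()];
}

// Proposed behavior: an item-level trigger wins, otherwise fall back to the heading.
function routeItem(heading: string, item: string): Module | undefined {
  const match = ITEM_TRIGGERS.find((t) => t.pattern.test(item));
  return match ? match.module : routeByHeading(heading);
}

const items = [
  "D1 bound as `DB` in wrangler.toml",
  "Run migrations with `wrangler d1 migrate`",
];

for (const item of items) {
  console.log(`${item}: heading=${routeByHeading("Database")}, item-aware=${routeItem("Database", item)}`);
}
```

Note that bare `wrangler` is deliberately not an infra trigger here, since the first item mentions `wrangler.toml` and is expected to stay in `backend.adf`.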

3. Security boundary routing — auth in backend vs security modules

Auth-related rules appear in two contexts:

  • Implementation rules (how to write auth code): belong in `backend.adf`
  • Security policy rules (what must be enforced): belong in `security.adf`

Currently, an `## Auth` heading routes every item beneath it to `security.adf`. A test with mixed implementation and policy rules under one heading would expose the lack of sub-heading routing.
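A sketch of what such a test could assert: each item under a single `## Auth` heading carries an expected module, and the test collects the items where routing disagrees. The sample rules, the `AuthCase` type, and both functions are hypothetical; `headingOnlyRoute` simply models the heading-wins behavior described above.

```typescript
type Module = "backend.adf" | "security.adf";

interface AuthCase {
  item: string;
  expected: Module;
}

// Hypothetical mixed section: two implementation rules, then two policy rules.
const authSection: AuthCase[] = [
  { item: "Hash passwords with bcrypt in the signup handler", expected: "backend.adf" },
  { item: "Read the session token from the Authorization header", expected: "backend.adf" },
  { item: "All admin endpoints must require MFA", expected: "security.adf" },
  { item: "Tokens must expire after 24 hours", expected: "security.adf" },
];

// Models current behavior: everything under an Auth heading goes to security.adf.
function headingOnlyRoute(_heading: string, _item: string): Module {
  return "security.adf";
}

// Returns the cases the given router sends to the wrong module.
function misrouted(
  heading: string,
  cases: AuthCase[],
  route: (heading: string, item: string) => Module,
): AuthCase[] {
  return cases.filter((c) => route(heading, c.item) !== c.expected);
}

const failures = misrouted("Auth", authSection, headingOnlyRoute);
console.log(`${failures.length} of ${authSection.length} items misrouted`);
```

With heading-only routing, both implementation rules fail, so the test stays red until sub-heading or item-level routing exists.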

4. Trigger prefix collision — short triggers matching unrelated content

The prefix-match fix (removing the trailing `\b` from trigger patterns) introduced a risk of over-matching. For example:

  • Trigger `auth` now matches "authority", "author", "authentic"
  • Trigger `api` matches "apiary", "apiVersion"

A test with content containing "the author of this library" or "apiary endpoint" should verify these don't accidentally route to security/backend modules.
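The collision reproduces with plain regular expressions. This sketch assumes each trigger compiles to a case-insensitive pattern with a leading `\b`, and that the fix only dropped the trailing `\b`; the classifier's actual pattern builder may differ, and `wordTrigger` / `prefixTrigger` are hypothetical names.

```typescript
// Escape regex metacharacters so a trigger is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Whole-word match, as before the fix: \b on both sides.
function wordTrigger(trigger: string): RegExp {
  return new RegExp(`\\b${escapeRegExp(trigger)}\\b`, "i");
}

// Prefix match, as after the fix: trailing \b removed.
function prefixTrigger(trigger: string): RegExp {
  return new RegExp(`\\b${escapeRegExp(trigger)}`, "i");
}

// Content that should NOT route to security/backend modules.
const collisions: Array<[string, string]> = [
  ["auth", "the author of this library"],
  ["auth", "cite an authority"],
  ["api", "apiary endpoint"],
];

for (const [trigger, text] of collisions) {
  const before = wordTrigger(trigger).test(text);
  const after = prefixTrigger(trigger).test(text);
  // Every line reports whole-word=false, prefix=true.
  console.log(`${trigger} in "${text}": whole-word=${before}, prefix=${after}`);
}
```

A scenario built from these strings would fail as soon as a prefix trigger pulls them into `security.adf` or `backend.adf`.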

5. Large injection — 20+ items in one session

Current tests top out at ~13 items per session. A stress test with 25+ items would:

  • Test dedup performance (O(n²) Jaccard comparisons)
  • Verify routing accuracy doesn't degrade at scale
  • Surface any ADF write failures for large patch sets
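A stress scenario can be generated rather than hand-written. This sketch assumes an injection is markdown with bullet items under one heading; the `buildLargeInjection` helper, the topics, and the rule wording are hypothetical. It also counts the comparisons an all-pairs dedup pass would make, n(n−1)/2, which is 300 for 25 items.

```typescript
// Hypothetical topics, cycled so the batch spans several target modules.
const TOPICS = ["database", "deployment", "auth", "testing", "logging"];

// Build a markdown injection with `count` bullet items under one heading.
function buildLargeInjection(count: number): string {
  const lines = ["## Project rules", ""];
  for (let i = 0; i < count; i++) {
    const topic = TOPICS[i % TOPICS.length];
    lines.push(`- Rule ${i + 1}: follow the ${topic} checklist step ${i + 1}`);
  }
  return lines.join("\n") + "\n";
}

// Comparisons made by an all-pairs dedup pass over n items.
function pairwiseComparisons(n: number): number {
  return (n * (n - 1)) / 2;
}

const injection = buildLargeInjection(25);
const itemCount = injection.split("\n").filter((l) => l.startsWith("- ")).length;
console.log(itemCount, pairwiseComparisons(itemCount)); // 25 300
```

Templated items share most of their tokens, so a real stress corpus should use varied wording; otherwise the scenario partly re-tests dedup sensitivity instead of routing accuracy at scale.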

6. Empty/minimal injection — just a heading, no items

Edge case: the AI adds `## Auth\n\n` (a heading with no content). This should produce 0 extractions, cleanly and without errors.
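A minimal sketch of the expected behavior, assuming extraction means collecting bullet items under `##` headings. The `extractItems` function and the `Extraction` shape are hypothetical, not the classifier's parser. A heading with no items yields an empty array rather than an error or an empty-string item.

```typescript
interface Extraction {
  heading: string;
  item: string;
}

// Collect "- " or "* " bullet items, tagging each with the nearest preceding "## " heading.
function extractItems(markdown: string): Extraction[] {
  const results: Extraction[] = [];
  let heading: string | null = null;
  for (const rawLine of markdown.split("\n")) {
    const line = rawLine.trim();
    if (line.startsWith("## ")) {
      heading = line.slice(3).trim();
    } else if (heading !== null && /^[-*]\s+\S/.test(line)) {
      results.push({ heading, item: line.replace(/^[-*]\s+/, "") });
    }
    // Blank lines, empty bullets, and plain text are skipped, so an empty section adds nothing.
  }
  return results;
}

console.log(extractItems("## Auth\n\n").length); // 0
console.log(extractItems("## Auth\n\n- Rotate keys every 90 days\n").length); // 1
```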

Implementation

Add these to `harness/corpus/edge-cases.ts` as additional `Scenario` objects. The trigger prefix collision test (#4) is particularly important to add before the prefix-match change ships in a release.
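The `Scenario` type itself is not shown in this issue, so the interface below is a stand-in whose field names (`id`, `description`, `injection`, `expected`) are assumptions to be mapped onto the real type in `harness/corpus/edge-cases.ts`. It sketches two of the proposed entries, the trigger prefix collision (#4) and the empty heading (#6); the ids follow the `edge-` naming of `edge-duplicate-injection` but are hypothetical.

```typescript
// Stand-in for the harness's Scenario type; field names are assumptions.
interface Scenario {
  id: string;
  description: string;
  injection: string;
  expected: {
    extractions?: number;
    // Modules that must NOT receive any item from this injection.
    forbiddenModules?: string[];
  };
}

export const proposedEdgeCases: Scenario[] = [
  {
    id: "edge-trigger-prefix-collision",
    description: "Prefix triggers must not route 'author' or 'apiary' text to security/backend",
    injection:
      "## Notes\n\n- Credit the author of this library in the README\n- The apiary endpoint is a demo fixture\n",
    expected: { forbiddenModules: ["security.adf", "backend.adf"] },
  },
  {
    id: "edge-empty-heading",
    description: "A heading with no items produces zero extractions without errors",
    injection: "## Auth\n\n",
    expected: { extractions: 0 },
  },
];
```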
