feat: Add specification for anti-hallucination v0 #100

Merged
DocRoms merged 1 commit into main from feat/Add-anti-hallu-v0
May 29, 2026

DocRoms commented on May 27, 2026

Summary
0.8.7 — Anti-hallucination program + usage/cost panel + test-coverage push + CI hardening

added

  • Anti-hallucination program (Phase 1+2): sourcing discipline injected at the single runner chokepoint (covers every
    agent surface incl. Ollama), off | warn | enforce mode (Settings + config.toml). Post-output lint with 2 tiers: cheap
    unsourced-claim heuristic (EN/FR) + mechanical [src: …] citation verification (path-jailed, line-bounds-checked,
    ungameable). Per-message pill (red=fabricated, amber=unsourced, green ✓=verified) + detail panel. Migration 062
    (messages.lint_report)
  • Architecture pivot: docs/AGENTS.md is now the canonical anti-hallu source — deterministic audit STEP 0
    inserts/refreshes a kronn:section block (no LLM call), runtime preambles collapse to pointers, legacy projects adopt
    it in 1 click (/anti-hallu/status|inject + ProjectCard badge). Open convention spec served at
    /api/conventions/agents-md-format-v1
  • New "Sourcing & Anti-hallucination" Settings card (extracted from Identity) with the 3-mode dropdown + inline spec
    viewer. "Strict" tagged "(preview · 0.8.8)" with a disclosure toast (behaves like Warn until write-refusal ships)
  • Agent usage & cost panel via ccusage (GET /api/usage + Settings card): REAL token + cache breakdown + live pricing
    of detected CLIs from their local logs (replaces the ~6× over-estimating estimate_cost). Daily/weekly/monthly toggle,
    per-agent chips, paginated history
  • MCP remote-control tools: qp_batch_run, workflow_run_discussions, workflow_wait_for_completion (long-poll to
    terminal status with next_check hint) — completes the mobile remote-control surface
  • Convention authoring surface: convention_get MCP tool (fetches the spec verbatim, allowlisted) + builtin
    kronn-doc-author skill (provenance grammar cheat-sheet, auto-triggers on AGENTS.md edits)
  • Backend tests +570 (Lines 72.66% → 77.48%, Functions 75.17% → 81.49%, crossed 80% Functions); frontend tests +229
    (Lines 57.76% → 59.16%). New pre-seed handler test pattern (~7× the wrong-id-sweep ratio). 0%-coverage modules now
    covered: crypto, auth middleware, db/mcps, reconciliation, error-hint, Settings cards
    (DebugSection/OllamaCard/ProfilesSection/Identity/Usage), lib/api.ts (24%→48%)
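The mechanical citation check described in the anti-hallucination bullet above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the PR only shows the form `[src: …]`, so the `path:start-end` grammar used here is an assumption, as are the names `verifyCitations` and `jail` and the in-memory line-count table that stands in for reading files from disk.

```typescript
// Hypothetical citation shape: [src: relative/path.ext:START-END] (END optional).
// The real grammar is not given in this PR; this shape is assumed for the sketch.
type Verdict = "verified" | "fabricated";

interface CitationResult {
  raw: string;
  verdict: Verdict;
  reason: string;
}

// Normalize a relative path without touching the disk.
// Returns null when the path is absolute or climbs out of the project root.
function jail(relPath: string): string | null {
  if (relPath.charAt(0) === "/") return null;
  const out: string[] = [];
  for (const seg of relPath.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") {
      if (out.length === 0) return null; // would escape the root
      out.pop();
    } else {
      out.push(seg);
    }
  }
  return out.length > 0 ? out.join("/") : null;
}

// lineCounts maps a normalized project-relative path to its number of lines.
function verifyCitations(
  text: string,
  lineCounts: Record<string, number>
): CitationResult[] {
  const pattern = /\[src:\s*([^\]\s:]+):(\d+)(?:-(\d+))?\]/g;
  const results: CitationResult[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(text)) !== null) {
    const raw = m[0];
    const resolved = jail(m[1]);
    const start = parseInt(m[2], 10);
    const end = m[3] !== undefined ? parseInt(m[3], 10) : start;
    if (resolved === null) {
      results.push({ raw, verdict: "fabricated", reason: "path escapes the project root" });
    } else if (!Object.prototype.hasOwnProperty.call(lineCounts, resolved)) {
      results.push({ raw, verdict: "fabricated", reason: "file not found" });
    } else if (start < 1 || end < start || end > lineCounts[resolved]) {
      results.push({ raw, verdict: "fabricated", reason: "line range out of bounds" });
    } else {
      results.push({ raw, verdict: "verified", reason: "path and line range exist" });
    }
  }
  return results;
}
```

Because every verdict depends only on whether the cited path and line range exist, an agent cannot pass the check by wording alone, which is presumably what "ungameable" refers to. The unsourced-claim heuristic (the amber tier) is a separate pass and is not modeled here.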

fixes

  • ChatInput double-send race: two sync clicks fired onSend twice before the parent flipped sending — useRef guard +
    queueMicrotask release, 5 tests
  • Add project → Discover repos: state now resets on modal close (it was sending a filtered source_ids list, so the
    user saw "only my perso GitHub key detected"); per-source failures are surfaced as amber chips instead of being
    swallowed in logs
  • Anti-hallu false positives on cross-repo absolute citations: absolute [src:] paths now existence-checked on the host
    (via resolve_host_path) instead of jailed to one project root (broke linked_repos/monorepo citations); relative paths
    still jailed
  • Unread-badge inflation: tool calls + cached-summary System rows are no longer counted (non_system_message_count):
    26 workflow discussions showed 400+ instead of ~52
  • Symlink-escape guard restored on relative citations; migration idempotency strengthened (byte-stable sqlite_master +
    seed row-count diff); create_batch_run FK-violation rollback pinned
  • App.test missing version mock (masked Dashboard regressions); 4 empty describe blocks in api.coverage.test (broke
    lcov writeback); 8 TS build errors in new tests
  • 2 audit false positives confirmed (P0-3 SSRF already had 22 tests, P0-10 useAsyncGuard had 4) — codified "grep
    before fix"
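The ChatInput double-send fix in the list above combines a synchronous guard with a microtask release. A minimal sketch of that idea, written without React so it stands alone: the `Ref` shape mimics what `useRef` returns, and `makeGuardedSend` is a hypothetical name, not the component's actual code.

```typescript
// Stand-in for the object React's useRef returns.
interface Ref<T> {
  current: T;
}

// Wraps onSend so that several synchronous calls in the same tick fire it once.
// The guard is released in a microtask, after the current click handlers finish.
function makeGuardedSend(onSend: (text: string) => void): (text: string) => void {
  const inFlight: Ref<boolean> = { current: false };
  return (text: string): void => {
    if (inFlight.current) return; // a send was already fired in this tick
    inFlight.current = true;
    onSend(text);
    queueMicrotask(() => {
      inFlight.current = false; // later sends are gated by the parent's state again
    });
  };
}
```

A ref is used rather than state because a state update is not visible to a second click handler running in the same tick, which is the race the bullet describes.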

ci

  • E2E job now runs inside the official Playwright container (mcr.microsoft.com/playwright:v1.59.1-noble): browsers
    pre-baked, zero cdn.playwright.dev download (the CDN connection half-closed after the download reached 100% and
    hung the job until timeout, 3× in project history)
  • Coverage regression floors: backend cargo llvm-cov --fail-under-lines 77 --fail-under-functions 81
    --fail-under-regions 78, frontend coverage.thresholds {statements 55 / branches 50 / functions 47 / lines 58}
  • Node 24 runtime opt-in (silences Node 20 deprecation ahead of 2026-06-02 cutover); gitignored frontend/coverage/ +
    backend/coverage/
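The regression floors in the list above amount to a per-metric comparison that fails the job when any figure drops below its floor. A small sketch of that check using the frontend numbers from this PR; the `belowFloor` helper and the `Coverage` type are illustrative assumptions, since the real enforcement is done by the coverage tooling itself.

```typescript
type Metric = "statements" | "branches" | "functions" | "lines";
type Coverage = Record<Metric, number>;

// Frontend floors as listed in this PR.
const FRONTEND_FLOORS: Coverage = { statements: 55, branches: 50, functions: 47, lines: 58 };

// Returns one message per metric that fell below its floor; empty means the job passes.
function belowFloor(actual: Coverage, floors: Coverage): string[] {
  const metrics: Metric[] = ["statements", "branches", "functions", "lines"];
  const failures: string[] = [];
  for (const metric of metrics) {
    if (actual[metric] < floors[metric]) {
      failures.push(`${metric}: ${actual[metric]}% is below the ${floors[metric]}% floor`);
    }
  }
  return failures;
}
```

With the lines floor at 58, the post-PR frontend figure (59.16%) passes while the pre-PR figure (57.76%) would not, so the floor locks in the gain from this change.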

DocRoms added the ci-test label (Start all the CI Test steps) on May 27, 2026
DocRoms force-pushed the feat/Add-anti-hallu-v0 branch from 33c77c5 to 41d6755 on May 28, 2026 07:06
DocRoms added and removed the ci-test label on May 28, 2026
DocRoms force-pushed the feat/Add-anti-hallu-v0 branch 2 times, most recently from 6210248 to d1662db on May 29, 2026 06:18
DocRoms added and removed the ci-test label on May 29, 2026
DocRoms force-pushed the feat/Add-anti-hallu-v0 branch from d1662db to d8633b4 on May 29, 2026 06:49
DocRoms added and removed the ci-test label on May 29, 2026
Signed-off-by: Romuald <Romuald.priol@protonmail.com>
DocRoms force-pushed the feat/Add-anti-hallu-v0 branch from d8633b4 to cb73e51 on May 29, 2026 07:13
DocRoms added and removed the ci-test label on May 29, 2026
DocRoms self-assigned this on May 29, 2026
DocRoms merged commit 913658a into main on May 29, 2026
13 checks passed
Labels

ci-test Start all the CI Test steps

1 participant