fix: harden path sandboxing with symlink protection, safe defaults, and sensitive file guards #204

Closed
anandgupta42 wants to merge 10000 commits into main from fix/security-hardening-v1

anandgupta42 commented Mar 16, 2026

What does this PR do?

Hardens the security posture of Altimate Code's path sandboxing and permission system, addressing the same class of vulnerabilities that led to CVEs in Codex (GHSA-w5fx-fh39-j5rw, CVSS 8.6) and Claude Code (CVE-2025-54794, CVSS 7.7). Reviewed by 6 AI models (Claude, GPT 5.2 Codex, Gemini 3.1 Pro, Kimi K2.5, MiniMax M2.5, GLM-5).

Five areas of improvement:

  1. Symlink escape protection: Filesystem.containsReal() resolves symlinks via realpathSync before checking containment. It rejects paths with unresolved .. segments to prevent divergence between realpathSync (lexical .. normalization) and kernel behavior (follows the symlink, then applies ..). It also adds an isAbsolute(rel) check to block the Windows cross-drive bypass.

  2. Safe permission defaults — Destructive shell/git commands (rm -rf, git push --force, git reset --hard) now prompt before execution instead of running silently. Database DDL (DROP DATABASE, DROP SCHEMA, TRUNCATE) is blocked entirely. Users can override in config.

  3. Sensitive file guards — New assertSensitiveWrite() check on write, edit, apply_patch, and move operations that prompts before modifying .env*, .ssh/, .aws/, .git/, credential files, and private keys (.pem, .key) — even inside the project boundary. Case-insensitive on macOS/Windows.

  4. Critical fix: realpathSync vs kernel divergence — Gemini 3.1 Pro discovered that realpathSync("project/symlink/..") returns project/ (lexical) while the kernel writes to the parent of the symlink target (outside project). Fixed by rejecting any unresolved path containing .. segments.

  5. Documentation — Updated SECURITY.md, permissions docs with 3 recommended configs and rule ordering guidance, and security FAQ with new sections on sensitive file prompts, default command protections, and best practices.
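
A minimal sketch of the containment check described in items 1 and 4, assuming Node's `fs` and `path` modules. The function bodies and the helper names `resolveExisting` and `isSymlink` are hypothetical illustrations, not the code in this PR; only the three ideas (resolve symlinks first, reject unresolved `..`, reject an absolute relative path) come from the description above.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// True when the path exists as a symlink (even a dangling one).
function isSymlink(p: string): boolean {
  try {
    return fs.lstatSync(p).isSymbolicLink();
  } catch {
    return false;
  }
}

// Resolve symlinks on the deepest existing ancestor, so that targets that do
// not exist yet (new files) can still be checked. Returns null when the path
// cannot be resolved safely.
function resolveExisting(absolute: string): string | null {
  let current = absolute;
  const tail: string[] = [];
  for (;;) {
    try {
      return path.join(fs.realpathSync(current), ...tail);
    } catch {
      if (isSymlink(current)) return null; // dangling or looping symlink: refuse
      const parent = path.dirname(current);
      if (parent === current) return null; // reached the filesystem root
      tail.unshift(path.basename(current));
      current = parent;
    }
  }
}

// Illustrative stand-in for the containment check; not the project's code.
export function containsReal(root: string, candidate: string): boolean {
  // Reject unresolved ".." segments before any resolution happens.
  if (candidate.split(/[\\/]+/).some((segment) => segment === "..")) return false;

  const realRoot = fs.realpathSync(root);
  const real = resolveExisting(path.resolve(realRoot, candidate));
  if (real === null) return false;

  const rel = path.relative(realRoot, real);
  // On Windows, path.relative() returns an absolute path across drives.
  if (path.isAbsolute(rel)) return false;
  return rel === "" || (rel !== ".." && !rel.startsWith(".." + path.sep));
}
```

Comparing with `".." + path.sep` rather than a bare `".."` prefix keeps a legitimate in-project directory such as `..cache` from being treated as an escape.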

UX Impact

| Change | User Impact | What users will notice |
|---|---|---|
| Symlink-aware path checks | None | Transparent; only blocks symlinks pointing outside the project |
| Windows cross-drive fix | None | Only relevant in a Windows edge case |
| .. segment rejection | None | Normal tools already strip `..`; only crafted paths are caught |
| Bash defaults: destructive → ask | Low | rm -rf, git push --force, etc. now prompt instead of running silently. "Allow always" available. |
| Bash defaults: database DDL → deny | Low | DROP DATABASE, TRUNCATE blocked entirely. Override in config if needed. |
| Sensitive file prompts | Medium | First edit of .env, .ssh/, .aws/, credential files prompts. "Allow always" per file for the session. |

Key UX decisions:

  • ask instead of deny for shell/git commands — Blocking rm -rf ./build or git push --force after rebase would break common workflows. Prompting lets users approve safely.
  • "Allow always" for sensitive files — Users click once per .env file per session, not on every edit.
  • .github NOT in sensitive dirs — CI/CD workflow editing is a core use case; prompting every time would cause approval fatigue.
  • Database DDL is the only hard block: DROP DATABASE, DROP SCHEMA, TRUNCATE are almost never intentional in agent context.
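
The ask/deny split and the config override can be pictured as an ordered rule list in which the last matching rule wins. The sketch below is an assumption about the general shape only: the `Rule` type, the `*` wildcard syntax, the catch-all `allow` rule, and the exact default patterns are hypothetical and are not taken from the project's config schema.

```typescript
type Action = "allow" | "ask" | "deny";

interface Rule {
  pattern: string; // "*" matches any run of characters (assumed syntax)
  action: Action;
}

// Convert a "*" wildcard pattern into an anchored, case-insensitive RegExp.
function toRegExp(pattern: string): RegExp {
  const escaped = pattern
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${escaped}$`, "i");
}

// Hypothetical defaults mirroring the behavior described above.
const DEFAULT_RULES: Rule[] = [
  { pattern: "*", action: "allow" },
  { pattern: "rm -rf *", action: "ask" },
  { pattern: "git push --force*", action: "ask" },
  { pattern: "git reset --hard*", action: "ask" },
  { pattern: "*DROP DATABASE*", action: "deny" },
  { pattern: "*DROP SCHEMA*", action: "deny" },
  { pattern: "*TRUNCATE*", action: "deny" },
];

// User rules are appended after the defaults, so a later match overrides an
// earlier one (last-match-wins).
export function evaluate(command: string, userRules: Rule[] = []): Action {
  let result: Action = "ask";
  for (const rule of [...DEFAULT_RULES, ...userRules]) {
    if (toRegExp(rule.pattern).test(command)) result = rule.action;
  }
  return result;
}
```

Under this model, rule order in user config matters: a broad `allow` placed after a narrow `deny` would silently re-enable the denied command, which is why ordering guidance belongs in the docs.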

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Documentation update

Issue for this PR

Closes #202

How did you verify your code works?

  • 105 tests passing (27 unit + 73 e2e + 5 existing external-directory tests)
  • E2E tests use real filesystem operations (symlinks, tmp dirs, real path resolution)
  • Test coverage includes:
    • Symlink escape: file symlink, directory symlink, chained symlinks, relative symlinks, symlink/../ kernel divergence attack
    • Path traversal via File.read/File.list
    • Absolute path escape and prefix collision
    • Non-git project worktree safety
    • Sensitive dir/file detection: .git, .ssh, .aws, .env*, credentials, private keys
    • Case-insensitive detection on macOS/Windows
    • Certificate extension detection (.pem, .key, .p12, .pfx)
    • assertSensitiveWrite prompting for sensitive files, no-op for normal files
    • Bash deny defaults evaluation: database DDL denied, shell commands prompted
    • User config override merge semantics (last-match-wins)
    • Windows backslash path handling
  • Typecheck passes (4/4 turbo tasks)
  • Multi-model code review by 6 models with all findings addressed
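
For readers who want a concrete picture of the sensitive-path cases in the coverage list, here is a rough sketch. The function name `isSensitivePath`, the credential file names, and the `caseInsensitive` parameter are assumptions for illustration; the directory names and certificate extensions are the ones listed above.

```typescript
const SENSITIVE_DIRS = new Set([".git", ".ssh", ".aws"]);
const SENSITIVE_EXTENSIONS = [".pem", ".key", ".p12", ".pfx"];
// Assumed examples of "credential files"; the real list may differ.
const SENSITIVE_NAMES = new Set(["credentials", "credentials.json"]);

export function isSensitivePath(
  filePath: string,
  // macOS and Windows filesystems are usually case-insensitive.
  caseInsensitive: boolean = process.platform === "darwin" || process.platform === "win32",
): boolean {
  const normalized = caseInsensitive ? filePath.toLowerCase() : filePath;
  // Split on both separators so Windows backslash paths are handled.
  const segments = normalized.split(/[\\/]+/).filter((s) => s.length > 0);
  const name = segments[segments.length - 1] ?? "";

  // Exact segment match: ".github" is deliberately not caught by ".git".
  if (segments.some((segment) => SENSITIVE_DIRS.has(segment))) return true;
  if (name.startsWith(".env")) return true; // .env, .env.local, ...
  if (SENSITIVE_NAMES.has(name)) return true;
  return SENSITIVE_EXTENSIONS.some((ext) => name.endsWith(ext));
}
```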

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have added tests that prove my fix is effective
  • New and existing tests pass locally with my changes

🤖 Generated with Claude Code

jerome-benoit and others added 30 commits March 3, 2026 16:00
…sion (#15762)

Co-authored-by: Test User <test@test.com>
Co-authored-by: Shoubhit Dash <shoubhit2005@gmail.com>
Co-authored-by: Adam <2363879+adamdotdevin@users.noreply.github.com>
* fix: auto-bootstrap Python engine before starting bridge

Bridge.start() now calls ensureEngine() to download uv, create an
isolated venv, and install altimate-engine before spawning the Python
subprocess. resolvePython() also checks the managed venv path so
the correct interpreter is used after bootstrapping.

Previously, resolvePython() would fall through to system python3
which doesn't have altimate_engine installed, causing
ModuleNotFoundError on first run.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add bridge client tests for ensureEngine and resolvePython

- Export resolvePython() from client.ts for direct unit testing
- Test that ALTIMATE_CLI_PYTHON env var takes highest priority
- Test that managed engine venv is used when present on disk
- Test fallback to python3 when no venvs exist
- Test that ensureEngine() is called before bridge spawn
- Mock only bridge/engine module to avoid leaking into other tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
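
The interpreter priority these tests describe (env var, then managed venv, then system `python3`) could look roughly like the sketch below. The `ResolveOptions` shape, the injected `exists` callback, and the venv layout are assumptions made so the logic can be exercised without touching disk; the actual `resolvePython()` signature is not shown in this log.

```typescript
import * as path from "node:path";

interface ResolveOptions {
  env: Record<string, string | undefined>;
  managedVenvDir: string; // hypothetical location of the managed venv
  exists: (p: string) => boolean; // injected so tests need no real files
  platform?: string;
}

export function resolvePython(opts: ResolveOptions): string {
  // 1. An explicit override always wins.
  const override = opts.env["ALTIMATE_CLI_PYTHON"];
  if (override) return override;

  // 2. Prefer the managed venv's interpreter when it is present on disk.
  const platform = opts.platform ?? process.platform;
  const bin = platform === "win32" ? path.join("Scripts", "python.exe") : path.join("bin", "python");
  const managed = path.join(opts.managedVenvDir, bin);
  if (opts.exists(managed)) return managed;

  // 3. Fall back to whatever python3 is on PATH.
  return "python3";
}
```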
Move existing data engineering docs into data-engineering/ subdirectory and
add 29 new pages covering platform features: TUI, CLI, web UI, IDE and CI/CD
integration, configuration, providers, tools, agents, models, themes, keybinds,
commands, formatters, permissions, LSP, MCP, ACP, skills, custom tools, SDK,
server, plugins, ecosystem, network, troubleshooting, and Windows/WSL.

All content adapted with altimate-code branding (env vars, config paths,
package names). mkdocs builds with zero warnings.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
anandgupta42 and others added 14 commits March 15, 2026 14:09
* fix: gate release workflow on tests and add build branding guards

Release workflow:
- Add test job (typecheck + bun test) that must pass before build/publish
- Both build and publish-engine now depend on test job

Branding guard tests (upstream-merge-guard.test.ts):
- build.ts binary name is 'altimate' not 'opencode'
- build.ts user-agent is 'altimate/' not 'opencode/'
- build.ts embeds ALTIMATE_ENGINE_VERSION
- build.ts reads engine version from pyproject.toml
- build.ts creates altimate-code backward-compat symlink
- build.ts has sourcemap: "external"
- package.json bin entries are correct (no 'opencode')
- package.json has no junk fields or echo-stub scripts
- bin/opencode does not exist
- keepOurs config includes build.ts, publish.ts, bin/**, CHANGELOG.md

Publish package test:
- Assert 'altimate' bin points to ./bin/altimate
- Assert no 'opencode' bin entry exists

* fix: scope release test job to release-critical tests only

Scope the release workflow test job to branding and install tests
only, with --timeout 30000 for consistency with package.json.
Prevents unrelated flaky tests from blocking releases while still
catching branding regressions and packaging issues.

* fix: fix pr-management duplicate check and update TEAM_MEMBERS

- Replace broken `curl altimate.ai/install` with `npm install -g @altimateai/altimate-code`
  (altimate.ai/install returns HTML, not a shell script)
- Replace upstream TEAM_MEMBERS with actual AltimateAI collaborators
  so internal PRs skip the duplicate check

* fix: address code review findings — regex precedence and case-sensitive TEAM_MEMBERS

- Split symlink guard regex into two explicit assertions to prevent
  false positives (Unix symlink + Windows .exe checked independently)
- Lowercase TEAM_MEMBERS entries and make grep case-insensitive (-i)
  since GitHub logins are case-insensitive
…ection (#161)

* fix: v0.3.0 post-release — upgrade crash, install banner, upgrade detection

- Backfill NULL migration names in `__drizzle_migrations` before Drizzle
  `migrate()` runs. Drizzle beta.16 matches by `name` field; old DBs from
  v0.2.x have NULL names causing all migrations to re-run and crash on
  `CREATE TABLE project` (table already exists).

- Move postinstall `printWelcome()` from `console.log` to
  `process.stderr.write` so the banner is visible under npm v7+ (which
  silences postinstall stdout).

- Remove changelog dump from `showWelcomeBannerIfNeeded()` — the
  postinstall banner now handles the install notification.

- Fix `method()` package detection: `opencode-ai` → `@altimateai/altimate-code`,
  brew formula `opencode` → `altimate-code`.

- Fix `latest()` npm registry URL and brew formulae URL to use
  `@altimateai/altimate-code` and `altimate-code.json`.

- Fix `getBrewFormula()` to reference `AltimateAI/tap/altimate-code`.

- Add `altimate_change` markers to all modified upstream-shared files.

- Add `db.ts` to required marker files in upstream-merge-guard test.

- Add 5 branding guard tests for installation detection, brew formula,
  npm registry URL, and `getBrewFormula()`.

- Add 8 E2E install tests covering new user (clean DB) and existing user
  (v0.2.x upgrade) scenarios, including backfill correctness,
  idempotency, and edge cases.

- Add 5 unit tests for migration name backfill logic.

Closes #160

* fix: use precise `altimate-code` match in `getBrewFormula()`

Address code review finding: `includes("altimate")` could match
unrelated packages. Narrowed to `includes("altimate-code")`.

* fix: restore brief upgrade banner in `welcome.ts`

The previous commit removed the entire banner display, leaving
`showWelcomeBannerIfNeeded()` as dead code. Brew and curl install
users don't run postinstall.mjs, so they'd never see any feedback.

Restore a single-line "installed successfully!" message on stderr.
The verbose changelog dump remains removed — only the postinstall
box shows the full get-started hints.

Addresses review feedback from GPT 5.2 Codex, GLM-5, MiniMax M2.5.

* docs: fix stale @opencode-ai/plugin reference in CONTRIBUTING.md

Update to @altimateai/altimate-code-plugin to match the rebranded
npm package name.

* docs: fix .opencode/memory/ references to .altimate-code/memory/

Update 7 occurrences in memory-tools.md to use the rebranded
project config directory name.

* ci: skip unaffected jobs using path-based change detection

Add a `changes` job using dorny/paths-filter to detect which areas
of the codebase were modified. Jobs now skip when their paths are
unaffected:

- TypeScript tests: only on packages/opencode/**, bun.lock, etc.
- Python tests (3 matrix jobs): only on packages/altimate-engine/**
- Lint: only on packages/altimate-engine/src/**
- Marker Guard: always runs on PRs (unchanged)

Push to main always runs all jobs (safety net).

For a docs-only or CI-config-only change, this skips ~4 minutes of
unnecessary TypeScript + Python test runs.
…148)

* Add AI Teammate repositioning design document

Comprehensive design for repositioning altimate from "AI tool" to "AI
teammate" — including trainable knowledge system (/teach, /train,
/feedback), Deep Research mode for multi-step investigations, team
memory that persists via git, and UX reframing from "agent modes" to
"teammate roles."

https://claude.ai/code/session_01V17Kk3qCZFp9ZJiuNYucoq

* Enrich design doc with OpenClaw research and proactive behaviors

Add detailed competitive analysis from OpenClaw (self-improving memory,
heartbeat scheduler, meet-users-where-they-are), Devin ($10.2B
valuation, "junior partner" framing), and Factory AI (workflow
embedding). Add proactive behaviors section with background monitors
(cost alerts, freshness checks, schema drift, PII scanning) and
auto-promotion of learned corrections.

https://claude.ai/code/session_01V17Kk3qCZFp9ZJiuNYucoq

* Implement AI Teammate training system and Deep Research mode

Core training infrastructure built on top of existing memory system:

Training Store & Types:
- TrainingStore wraps MemoryStore with training-specific conventions
- Four knowledge kinds: pattern, rule, glossary, standard
- Structured metadata (applied count, source, acceptance tracking)
- Training blocks stored in .opencode/memory/training/ (git-committable)
- One person teaches, whole team benefits via git

Training Tools:
- training_save: Save learned patterns, rules, glossary, standards
- training_list: List all learned knowledge with applied counts
- training_remove: Remove outdated training entries

Training Skills:
- /teach: Learn patterns from example files in the codebase
- /train: Learn standards from documents or style guides
- /training-status: Dashboard of all learned knowledge

System Prompt Injection:
- Training knowledge injected alongside memory at session start
- Structured by kind: rules first, then patterns, standards, glossary
- Budget-limited to 6000 chars to control prompt size
- Zero LLM calls on startup — just reads files from disk

Deep Research Agent Mode:
- New "researcher" agent for multi-step investigations
- 4-phase protocol: Plan → Gather → Analyze → Report
- Read-only access to all warehouse, schema, FinOps tools
- Structured reports with evidence, root causes, action items

Agent Awareness:
- All agent prompts updated with training awareness section
- Agents offer to save corrections as rules when users correct behavior
- Training tools permitted in all agent modes

Tests:
- 88 new tests across 5 test files (types, store, prompt, tools, integration)
- All tests standalone (no Instance dependency)
- Full lifecycle tests: save → list → format → inject → remove
- Edge cases: budget limits, meta roundtrips, coexistence with memory

https://claude.ai/code/session_01V17Kk3qCZFp9ZJiuNYucoq

* Polish AI Teammate training UX: auto-lowercase names, update detection, budget visibility

- Fix researcher agent permissions: add training_save/remove (was read-only)
- Auto-lowercase + space-to-hyphen name transform in training_save (ARR → arr)
- Detect update vs new save, show "Updated" with preserved applied count
- Show training budget usage (chars/percent) on save, list, and remove
- Improve training_list: group by kind, show most-applied entries, budget %
- Improve training_remove: show available entries on not-found, applied count
- Show similar entry names in duplicate warnings (not just count)
- Raise content limit from 1800 to 2500 chars
- Export TRAINING_BUDGET constant, add budgetUsage() to TrainingPrompt
- Add 30 new tests: auto-lowercase, update detection, budget overflow,
  name collision, scale (80 entries), improved messaging
- All 118 training tests + 305 memory tests pass

https://claude.ai/code/session_01V17Kk3qCZFp9ZJiuNYucoq

* Enhance training UX: attribution, correction detection, priority sorting

- Builder prompt: add attribution instructions (cite training entries that
  influenced output), correction detection (explicit + implicit patterns),
  conflict flagging between contradictory training entries
- Add /teach, /train, /training-status to Available Skills list in builder prompt
- Sort training entries by applied count (descending) in prompt injection so
  most-used entries get priority within the 6000-char budget
- Restructure Teammate Training section with clear subsections

https://claude.ai/code/session_01V17Kk3qCZFp9ZJiuNYucoq

* Fix experience gaps from user journey simulations

Simulation findings and fixes:

1. training_save now echoes back saved content so user can verify
   what was captured (new saves show content preview, updates show
   old vs new diff)

2. When training limit is reached, error now lists existing entries
   sorted by applied count and suggests the least-applied entry
   for removal

3. Researcher prompt now documents training_save/remove permissions
   (was contradicting its own permissions by saying "read-only" while
   having write access to training)

4. Added 10 new tests: content echo, update diff, limit suggestion,
   special character preservation (SQL -->, Jinja, HTML comments,
   code blocks), priority sorting verification

Verified: --> in content does NOT corrupt meta block (false positive).
The non-greedy regex terminates at the meta block's own --> correctly.

128 training tests + 305 memory tests all pass.

https://claude.ai/code/session_01V17Kk3qCZFp9ZJiuNYucoq

* Add self-improvement loop: applied tracking, insights, staleness detection

OpenClaw-inspired self-improvement mechanisms:

1. Wire up incrementApplied at injection time — counters now actually
   increment once per session per entry (deduped via session-scoped set),
   making "Most Applied" dashboard and priority sorting meaningful

2. TrainingInsights module analyzes training metadata and surfaces:
   - Stale entries (7+ days old, never applied) — suggests cleanup
   - High-value entries (5+ applications) — highlights most impactful
   - Near-limit warnings (18-19 of 20 entries per kind)
   - Consolidation opportunities (3+ entries with shared name prefix)

3. Insights automatically shown in training_list output

4. 24 new tests covering all insight types, boundary conditions,
   session tracking dedup, and format output

152 training tests + 305 memory tests all pass.

https://claude.ai/code/session_01V17Kk3qCZFp9ZJiuNYucoq
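
The four insight types listed in this commit can be sketched as a single pass over entry metadata. The `TrainingEntry` and `Insight` shapes, the use of `updated` for age (a later commit in this log switches stale detection to `e.updated`), and the "text before the first hyphen" definition of a shared prefix are all assumptions; only the thresholds come from the commit message.

```typescript
interface TrainingEntry {
  name: string;
  kind: string;
  applied: number;
  updated: Date;
}

interface Insight {
  type: "stale" | "high-value" | "near-limit" | "consolidation";
  subject: string;
}

const DAY_MS = 24 * 60 * 60 * 1000;
const MAX_PER_KIND = 20;

function bump(counts: Map<string, number>, key: string): void {
  counts.set(key, (counts.get(key) ?? 0) + 1);
}

export function analyze(entries: TrainingEntry[], now: Date): Insight[] {
  const insights: Insight[] = [];
  const perKind = new Map<string, number>();
  const perPrefix = new Map<string, number>();

  for (const entry of entries) {
    const ageDays = (now.getTime() - entry.updated.getTime()) / DAY_MS;
    // Stale: at least 7 days old and never applied.
    if (entry.applied === 0 && ageDays >= 7) insights.push({ type: "stale", subject: entry.name });
    // High-value: applied 5 or more times.
    if (entry.applied >= 5) insights.push({ type: "high-value", subject: entry.name });
    bump(perKind, entry.kind);
    // Assumption: the shared prefix is the text before the first hyphen.
    bump(perPrefix, entry.name.split("-")[0] ?? entry.name);
  }

  perKind.forEach((count, kind) => {
    // Near-limit: 18 or 19 of the 20 allowed entries for a kind.
    if (count >= MAX_PER_KIND - 2 && count < MAX_PER_KIND) insights.push({ type: "near-limit", subject: kind });
  });
  perPrefix.forEach((count, prefix) => {
    if (count >= 3) insights.push({ type: "consolidation", subject: prefix });
  });
  return insights;
}
```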

* fix: add dedicated training feature flag and remove unused insight type

- Add `ALTIMATE_DISABLE_TRAINING` flag independent of memory's disable flag
- Use new flag in session prompt injection and tool registry
- Remove unused `budget-warning` insight type from `TrainingInsight`

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: reset training session tracking, add error logging, fix list truncation

- Call `TrainingPrompt.resetSession()` at session start (step === 1)
  to prevent applied counters from growing unbounded across sessions
- Add structured error logging to all three training tools
- Add truncation indicator (`...`) when training list preview is cut off

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use `.altimate-code/memory` as primary storage path with `.opencode` fallback

Memory store was hardcoded to `.opencode/memory/` but the config system
already uses `.altimate-code` as primary with `.opencode` as fallback.

Now checks for `.altimate-code/` directory first, falls back to `.opencode/`,
and defaults to `.altimate-code/` for new projects. Result is cached per
process to avoid repeated filesystem checks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
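
A small sketch of the lookup order described here, assuming Node's `fs` and `path`. The function name `memoryDir` and the injected `exists` callback are hypothetical. The cache is keyed by project directory rather than held in a single module-level variable, which is the shape a later commit in this log moves to.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// One cached result per project directory.
const cache = new Map<string, string>();

export function memoryDir(
  projectDir: string,
  exists: (p: string) => boolean = fs.existsSync,
): string {
  const cached = cache.get(projectDir);
  if (cached !== undefined) return cached;

  const primary = path.join(projectDir, ".altimate-code");
  const legacy = path.join(projectDir, ".opencode");

  // Primary wins when present; legacy is used only when it alone exists;
  // brand-new projects default to the primary name.
  let base = primary;
  if (!exists(primary) && exists(legacy)) base = legacy;

  const resolved = path.join(base, "memory");
  cache.set(projectDir, resolved);
  return resolved;
}
```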

* feat: add Trainer agent mode with pattern discovery and training validation

Add dedicated trainer mode — the 8th primary agent — for systematically
building the AI teammate's knowledge base. Unlike inline corrections in
other modes, trainer mode actively scans codebases, validates training
against reality, and guides knowledge curation.

Changes:
- New `trainer` agent mode with read-only permissions (no write/edit/sql_execute)
- New `training_scan` tool: auto-discover patterns in models, SQL, config, tests, docs
- New `training_validate` tool: check training compliance against actual codebase
- Expand `TrainingKind` to 6 types: add `context` (background "why" knowledge)
  and `playbook` (multi-step procedures)
- Update `count()` to derive from enum (prevents drift when kinds change)
- Add KIND_HEADERS for context and playbook in prompt injection
- Update injection order: rules first, playbooks last (budget priority)
- Update training-save and training-list descriptions for new kinds

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add comprehensive training guide with scenarios and limitations

- New `data-engineering/training/index.md` (350+ lines):
  - Quick start with 3 entry points (trainer mode, inline corrections, /train skill)
  - Deep dive into all 4 trainer workflows (scan, validate, teach, gap analysis)
  - 5 comprehensive scenarios: new project onboarding, post-incident learning,
    quarterly review, business domain teaching, pre-migration documentation
  - Explicit limitations section (not a hard gate, budget limits, no auto-learning,
    heuristic validation, no conflict resolution, no version history)
  - Full reference tables for tools, skills, limits, and feature flag
- Updated `agent-modes.md`: add Researcher and Trainer mode sections with
  examples, capabilities, and "when to use" guidance
- Updated `getting-started.md`: add training link to "Next steps"
- Updated `mkdocs.yml`: add Training nav section under Data Engineering

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: increase training budget to 16K chars and rewrite docs as harness customization guide

Training is not a CLAUDE.md replacement — it's the mechanism by which users
customize the data engineering harness for their specific project. The agent
works WITH the user to discover what it needs to know, rather than requiring
users to write perfect static instructions.

Changes:
- Increase TRAINING_BUDGET from 6000 to 16000 chars (removes the #1 criticism
  from user simulations — budget was worse than unlimited CLAUDE.md)
- Complete docs rewrite with correct positioning:
  - "Customizing Your AI Teammate" framing (not "Training Your AI Teammate")
  - Research-backed "why" section (40-70% knowledge omission, guided discovery)
  - Clear comparison table: training vs CLAUDE.md (complementary, not competing)
  - 6 real-world scenarios including Databricks, Salesforce quirks, cost spikes
  - Honest limitations section (not a linter, not an audit trail, not automatic)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: merge training into memory with context-aware relevance scoring

Replace two parallel injection systems (memory 8KB + training 16KB)
with a single unified injection that scores blocks by relevance to
the current agent.

How it works:
- All blocks (memory + training) loaded in one pass
- Each block scored: agent tag match (+10), training kind relevance
  per agent (+1-5), applied count bonus (+0-3), recency (+0-2),
  non-training base (+5)
- Builder sees rules/patterns first; analyst sees glossary/context first
- Budget is 20KB unified, filled greedily by score
- Training blocks still tracked with applied counts (fire-and-forget)

Architecture:
- memory/prompt.ts: new scoreBlock(), unified inject() with InjectionContext
- memory/types.ts: UNIFIED_INJECTION_BUDGET, AGENT_TRAINING_RELEVANCE weights
- session/prompt.ts: single inject call with agent context (was 2 separate)
- training/prompt.ts: deprecated, delegates to MemoryPrompt (backward compat)

No changes to: MemoryStore, TrainingStore, training tools, memory tools.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
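
To make the scoring scheme concrete, here is a sketch under stated assumptions. The point ranges follow the list above, but the `Block` shape, the contents of the relevance table, the mapping from applied count to the 0-3 bonus, and the recency cutoffs are all invented for illustration. Applying the applied-count bonus to training blocks only is also an assumption.

```typescript
interface Block {
  id: string;
  content: string;
  tags: string[];
  trainingKind?: string; // undefined for plain memory blocks
  applied: number;
  updated: Date;
}

interface InjectionContext {
  agent: string;
  now: Date;
}

// Hypothetical per-agent weights in the +1..5 range.
const AGENT_TRAINING_RELEVANCE: Record<string, Record<string, number>> = {
  builder: { rule: 5, pattern: 5, standard: 3, glossary: 1 },
  analyst: { glossary: 5, context: 5, rule: 2, pattern: 1 },
};

const DAY_MS = 24 * 60 * 60 * 1000;

export function scoreBlock(block: Block, ctx: InjectionContext): number {
  let score = 0;
  if (block.tags.indexOf(ctx.agent) !== -1) score += 10; // agent tag match
  if (block.trainingKind !== undefined) {
    score += AGENT_TRAINING_RELEVANCE[ctx.agent]?.[block.trainingKind] ?? 1;
    score += Math.min(3, Math.floor(block.applied / 2)); // assumed mapping to 0..3
  } else {
    score += 5; // non-training base
  }
  const ageDays = (ctx.now.getTime() - block.updated.getTime()) / DAY_MS;
  score += ageDays <= 7 ? 2 : ageDays <= 30 ? 1 : 0; // assumed recency cutoffs
  return score;
}

// Fill the budget greedily, highest score first; blocks that do not fit are
// skipped so that smaller, lower-scored blocks can still be included.
export function inject(blocks: Block[], ctx: InjectionContext, budget: number = 20 * 1024): Block[] {
  const ranked = blocks
    .map((block) => ({ block, score: scoreBlock(block, ctx) }))
    .sort((a, b) => b.score - a.score);
  const chosen: Block[] = [];
  let used = 0;
  for (const { block } of ranked) {
    if (used + block.content.length <= budget) {
      chosen.push(block);
      used += block.content.length;
    }
  }
  return chosen;
}
```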

* refactor: cut training_scan and training_validate, simplify docs

Research from 8 independent evaluations + SkillsBench (7,308 test runs)
found that compact focused context beats comprehensive docs by 20pp.
The training system's value is in correction capture (2-sec saves) and
team propagation (git sync) — not in regex scanning or keyword grep.

Removed:
- training_scan (255 lines) — regex pattern counting, not discovery
- training_validate (315 lines) — keyword grep, not validation

Simplified:
- trainer.txt: removed scan/validate workflows, focused on guided
  teaching and curation
- agent-modes.md: updated trainer section with correction-focused example
- training docs: complete rewrite with new pitch:
  "Correct the agent once. It remembers forever. Your team inherits it."
  Backed by SkillsBench research showing compact > comprehensive.

Net: -753 lines. 152 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove dead accepted/rejected fields, add training tips, expand limitations

Gaps found by simulation team:

1. Remove `accepted`/`rejected` counters from TrainingBlockMeta — they were
   never incremented anywhere in the codebase (dead code since inception)
2. Add 5 training discoverability tips to TUI tips (was 0 mentions in 152 tips)
3. Expand limitations section in docs with honest, complete list:
   context budget, 20/kind limit, no approval workflow, SQL-focused,
   git discipline required

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update site-wide docs for training and new agent modes

- Homepage: update from "Four agents" to "Seven agents" — add Researcher,
  Trainer, Executive cards with descriptions
- Getting Started: update training link to match new pitch
  "Corrections That Stick"
- Tools index: add Training row (3 tools + 3 skills) with link
- All references now consistent with simplified training system

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address Sentry review findings — 7 bugs fixed

1. stripTrainingMeta/parseTrainingMeta regex: remove multiline `m` flag
   that could match user content starting with `<!-- training` mid-string
   (types.ts, store.ts)

2. training_save content limit: reduce from 2500 to 1800 chars to account
   for ~200 char metadata overhead against MemoryStore's 2048 char limit
   (training-save.ts)

3. injectTrainingOnly: change `break` to `continue` so budget-exceeding
   section headers skip to next kind instead of stopping all injection
   (memory/prompt.ts)

4. injectTrainingOnly: track itemCount and return empty string when no
   items injected (was returning header-only string, inflating budget
   reports) (memory/prompt.ts)

5. projectDir cache: replace module-level singleton with Map keyed by
   Instance.directory to prevent stale paths when AsyncLocalStorage
   context changes across concurrent requests (memory/store.ts)

6. budgetUsage side effect: already fixed — delegates to injectTrainingOnly
   which is read-only (no applied count increment). Sentry comments were
   against pre-refactor code.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
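
Finding 1 is easy to reproduce in isolation. The sketch below assumes a metadata block of the form `<!-- training ... -->` at the very start of the stored content; the exact format and the regex are hypothetical, but the effect of the `m` flag on `^` is standard JavaScript behavior.

```typescript
// With the "m" flag, "^" matches at the start of every line, so a block that
// merely appears inside user content is treated as metadata.
const META_MULTILINE = /^<!-- training\n[\s\S]*?\n-->\n?/m;
// Without it, "^" matches only at the start of the whole string.
const META_ANCHORED = /^<!-- training\n[\s\S]*?\n-->\n?/;

// Illustrative stand-in for stripTrainingMeta(); not the project's code.
export function stripTrainingMeta(raw: string): string {
  return raw.replace(META_ANCHORED, "");
}

// Same operation with the buggy flag, kept only for comparison.
export function stripTrainingMetaBuggy(raw: string): string {
  return raw.replace(META_MULTILINE, "");
}

// User content that documents the format in the middle of the text.
export const userContent = "Example of the format:\n<!-- training\nkind: rule\n-->\nEnd.";
```

The non-greedy `[\s\S]*?` stops at the first closing `-->`, so a later `-->` in the user's own content (for example in a SQL or HTML comment) does not extend the match.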

* fix: CI failure + new Sentry finding — orphaned headers and agent test

1. Agent test: add researcher + trainer to "all disabled" test so it
   correctly expects "no primary visible agent" when ALL agents are off

2. Orphaned section headers: add pre-check that at least one entry fits
   before adding section header in both injectTrainingOnly and inject
   memory section (prevents header-only output inflating budget reports)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address multi-model code review findings

Fixes from 6-model consensus review (Claude + GPT + Gemini + Kimi + MiniMax + GLM-5):

1. training_remove: add name validation regex matching training_save
   (Gemini finding — prevents path traversal via malformed names)

2. training_save: improve name transform to strip ALL non-alphanumeric
   chars, not just whitespace (Gemini finding — "don't-use-float!"
   now becomes "don-t-use-float" instead of failing regex)

3. incrementApplied: replace silent `.catch(() => {})` with warning
   log (Kimi + GLM-5 consensus — fire-and-forget is by design but
   failures should be visible in logs for debugging)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
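
The name transform in item 2 and the validation in item 1 can be sketched together. The function names and the exact validation pattern are assumptions; the example input and output (`don't-use-float!` becoming `don-t-use-float`) are the ones given in the commit message.

```typescript
// Lowercase, collapse every run of non-alphanumeric characters into a single
// hyphen, then trim hyphens from both ends.
export function normalizeName(input: string): string {
  return input
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Assumed validation pattern: hyphen-separated lowercase alphanumerics.
// Because "/" and "." are not allowed, a name can never traverse directories.
const NAME_PATTERN = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

export function isValidName(name: string): boolean {
  return NAME_PATTERN.test(name);
}
```

Running the same pattern in both save and remove paths means a name that could be saved can always be removed, and nothing else reaches the filesystem layer.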

* fix: address new Sentry findings — regex m flag and off-by-one budget check

1. formatTrainingEntry regex: remove multiline `m` flag that could
   match user content mid-string (memory/prompt.ts:82)

2. Memory block budget check: change `<` to `<=` so blocks that fit
   exactly into remaining budget are included (memory/prompt.ts:204)

3 prior Sentry findings already fixed in earlier commits:
   - projectDir cache (Map keyed by Instance.directory)
   - injectTrainingOnly header-only return (itemCount guard)
   - orphaned section headers (first-entry pre-check)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
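
The budget fixes across these Sentry rounds (`continue` instead of `break`, `<=` instead of `<`, the first-entry pre-check, and the empty-string return) fit in one short loop. This is a simplified sketch: the `Section` type and function name are hypothetical, and separators between parts are not counted against the budget.

```typescript
interface Section {
  header: string;
  entries: string[];
}

export function renderSections(sections: Section[], budget: number): string {
  const parts: string[] = [];
  let used = 0;
  let itemCount = 0;

  for (const section of sections) {
    const first = section.entries[0];
    // Pre-check: emit the header only if it fits together with the first
    // entry. "continue" moves on to the next section instead of stopping.
    if (first === undefined || used + section.header.length + first.length > budget) continue;

    parts.push(section.header);
    used += section.header.length;

    for (const entry of section.entries) {
      // "<=" so an entry that fills the budget exactly is still included.
      if (used + entry.length <= budget) {
        parts.push(entry);
        used += entry.length;
        itemCount++;
      }
    }
  }

  // Never return header-only output.
  return itemCount === 0 ? "" : parts.join("\n");
}
```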

* fix: address 6-model consensus review — 4 remaining bugs

Fixes from consensus across Claude, GPT 5.2, Gemini 3.1, Kimi K2.5,
MiniMax M2.5, and GLM-5:

1. parseTrainingMeta: check safeParse().success before accessing .data
   (GLM-5 + MiniMax consensus — accessing .data on failed parse returns
   undefined, could cause downstream errors)

2. Stale detection: use `e.updated` not `e.created` so entries updated
   recently aren't incorrectly flagged as stale (MiniMax finding)

3. training_list: pass scope/kind filter to count() so summary table
   matches the filtered entries list (GPT finding)

4. training_remove: show hint entries from same scope only, not all
   scopes (GPT + MiniMax finding)

Prior fixes already addressed: name validation on remove (Gemini),
name transform punctuation (Gemini), silent incrementApplied catch
(Kimi + GLM-5), regex m flag (MiniMax + Sentry).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
npm v7+ suppresses ALL postinstall output (stdout AND stderr), so
the welcome box was never visible after `npm install`. Users only
saw "added 2 packages" with no feedback.

Move the full welcome box into `showWelcomeBannerIfNeeded()` which
runs in the CLI middleware before the TUI starts. The postinstall
script now only writes the marker file — no output.

Flow:
1. `npm install` → postinstall writes `.installed-version` marker
2. First `altimate` run → CLI reads marker, shows welcome box, deletes marker
3. Subsequent runs → no marker, no banner

Closes #160
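
The three-step flow above can be sketched with two small functions, assuming Node's `fs` and `path`. The marker file name comes from the commit message; the `dataDir` parameter, the injected `print` callback, and the banner text are placeholders, not the project's actual implementation.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

const MARKER = ".installed-version";

// Step 1: postinstall only records the version; it prints nothing.
export function writeInstallMarker(dataDir: string, version: string): void {
  fs.mkdirSync(dataDir, { recursive: true });
  fs.writeFileSync(path.join(dataDir, MARKER), version, "utf8");
}

// Steps 2 and 3: the CLI shows the banner once, then deletes the marker so
// later runs stay quiet. Returns true when the banner was shown.
export function showWelcomeBannerIfNeeded(dataDir: string, print: (msg: string) => void): boolean {
  const marker = path.join(dataDir, MARKER);
  let version: string;
  try {
    version = fs.readFileSync(marker, "utf8").trim();
  } catch {
    return false; // no marker: not a fresh install
  }
  print(`v${version} installed successfully!`); // placeholder banner text
  try {
    fs.unlinkSync(marker);
  } catch {
    // Best effort: a marker that cannot be deleted should not crash the CLI.
  }
  return true;
}
```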
Multiple scripts and CI workflows were fetching/pushing tags in ways
that caused ~900 upstream OpenCode tags to leak into our origin remote:

- CI `git fetch upstream` auto-followed tags — added `--no-tags`
- Release scripts used `git push --tags` pushing ALL local tags to
  origin — changed to push only the specific new tag
- Release scripts used `git fetch --force --tags` without explicit
  remote — added explicit `origin`
- `script/publish.ts` used `--tags` flag — push only `v${version}`
- Docs referenced `git fetch upstream --tags` — fixed to `--no-tags`

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…erge (#168)

* fix: sidebar shows `OpenCode` instead of `Altimate Code` after upstream merge

- Replace `<b>Open</b><b>Code</b>` with `<b>Altimate</b><b> Code</b>` in sidebar footer
- Add `altimate_change` markers to protect branding block from future upstream merges
- Add TUI branding guard tests to `upstream-merge-guard.test.ts`

Closes #167

* fix: remove stale `accepted`/`rejected` properties from `TrainingBlockMeta` test

These fields were removed from the type but the test wasn't updated.
* feat: add a skill for data storytelling and visualizations/data products

* fix: rename skill to data-viz

* fix: reduce skills.md and references files by 60%

---------

Co-authored-by: Saurabh Arora <saurabh@altimate.ai>
* feat: add Langfuse tracing for CLI and Spider2-DBT benchmark

Instrument `altimate run` with Langfuse observability (activated via
LANGFUSE_* env vars) so benchmark runs capture tool calls, tokens, cost,
and timing as traces. After evaluation, traces are updated with
benchmark_pass scores for end-to-end visibility in the Langfuse dashboard.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: populate generation input/output in Langfuse traces

Generations previously had empty input (no context) and empty output
when the model only produced tool calls. Now:
- Input shows tool results from the preceding step
- Output falls back to "[tool calls: read, write, ...]" when no text
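
A small sketch of the output fallback described above. The function name and parameters are hypothetical; only the `[tool calls: ...]` format comes from the commit message:

```typescript
// Prefer the model's text; otherwise summarize which tools were called.
function generationOutput(text: string, toolNames: string[]): string {
  if (text.trim().length > 0) return text
  if (toolNames.length === 0) return ""
  return `[tool calls: ${toolNames.join(", ")}]`
}
```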

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: replace Langfuse with local-first tracing system

Replace the Langfuse SDK integration with a vendor-neutral, exporter-based
tracing system that captures full session traces locally by default.

Core changes:
- New `Tracer` class with `FileExporter` (local JSON) and `HttpExporter` (remote)
- 4-mode HTML trace viewer (Waterfall, Tree, Chat, Log) served via `Bun.serve`
- `altimate-code trace list` and `altimate-code trace view` CLI commands
- Data engineering semantic attributes (`de-attributes.ts`) for warehouse/SQL/dbt
- Tracing config schema: `tracing.enabled`, `tracing.dir`, `tracing.maxFiles`, `tracing.exporters`
- Integrated into both `run` command and TUI worker with per-session tracers

Key design decisions:
- Crash-safe: `writeFileSync` in signal handlers, atomic tmp+rename snapshots
- Config-aware: TUI and CLI both honor `tracing.*` config settings
- Fail-safe: all tracing methods wrapped in try-catch, never crashes the app
- Localhost-only: trace viewer server binds to `127.0.0.1`
- Session finalization: TUI traces properly end on `session.status=idle`
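
The "atomic tmp+rename" decision can be illustrated with a short sketch. The helper name, temp-file naming, and demo paths are assumptions, not the `FileExporter` implementation:

```typescript
import { mkdtempSync, readFileSync, readdirSync, renameSync, writeFileSync } from "node:fs"
import { tmpdir } from "node:os"
import { join } from "node:path"

// Write to a temporary sibling file, then rename it over the target.
// A rename within one directory replaces the target in a single step, so a
// reader sees either the old snapshot or the new one, never a partial file.
function writeSnapshotAtomic(path: string, data: unknown): void {
  const tmp = `${path}.${process.pid}.tmp`
  writeFileSync(tmp, JSON.stringify(data), "utf8")
  renameSync(tmp, path)
}

// Demo: two successive snapshots; only the latest survives, with no stray temp file.
const dir = mkdtempSync(join(tmpdir(), "trace-"))
const snapshotPath = join(dir, "session.json")
writeSnapshotAtomic(snapshotPath, { spans: 1 })
writeSnapshotAtomic(snapshotPath, { spans: 2 })
const latest = JSON.parse(readFileSync(snapshotPath, "utf8")) as { spans: number }
const leftovers = readdirSync(dir)
```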

Tests: 380 tests across 13 files (unit, integration, adversarial, e2e)
Docs: `docs/docs/configure/tracing.md` + CLI docs updated

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: pad status text before wrapping with ANSI codes in `trace list`

`padEnd(10)` was called on strings containing ANSI color codes,
causing the invisible escape characters to be counted toward the
padding length. This broke column alignment for non-"ok" statuses.

Fix: pad the visible text first, then wrap with color codes.
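
To make the difference concrete, here is a self-contained sketch of both orderings. The helper names are illustrative and not taken from the `trace list` source:

```typescript
const RED = "\x1b[31m"
const RESET = "\x1b[0m"

// Buggy order: the escape codes count toward the length, so padding comes up short.
function colorThenPad(text: string, color: string, width: number): string {
  return `${color}${text}${RESET}`.padEnd(width)
}

// Fixed order: pad the visible text first, then wrap it in color codes.
function padThenColor(text: string, color: string, width: number): string {
  return `${color}${text.padEnd(width)}${RESET}`
}

// Strip ANSI color sequences to measure what the terminal actually displays.
function visibleLength(s: string): number {
  return s.replace(/\x1b\[[0-9;]*m/g, "").length
}

const buggyWidth = visibleLength(colorThenPad("error", RED, 10)) // 5: no padding was added
const fixedWidth = visibleLength(padThenColor("error", RED, 10)) // 10: column is full width
```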

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: kulvirgit <kulvirgithub@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…188)

Add fallback retry without `--label` flag when `gh issue create` fails
due to missing labels in the repository. The tool now:
1. Tries to create the issue with labels first
2. If that fails, retries without labels
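
A sketch of the two-step retry, with the process runner injected so the logic can be exercised without the `gh` binary. The `GhRunner` type and function name are assumptions. The sketch also retries on any failure, whereas the actual tool may inspect the error first:

```typescript
// Hypothetical runner: receives the arguments for `gh` and reports success.
type GhRunner = (args: string[]) => { ok: boolean; output: string }

function createIssueWithFallback(
  run: GhRunner,
  title: string,
  body: string,
  labels: string[],
): { ok: boolean; output: string } {
  const base = ["issue", "create", "--title", title, "--body", body]
  // 1. Try with labels first (one --label flag per label).
  const withLabels = [...base, ...labels.flatMap((label) => ["--label", label])]
  const first = run(withLabels)
  if (first.ok || labels.length === 0) return first
  // 2. If that fails, retry without labels.
  return run(base)
}
```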

Also created the missing `from-cli`, `enhancement`, and `improvement`
labels in the GitHub repository.

Closes #187

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
)

- Add `clearTimeout` in `.finally()` to `withTimeout` so the event loop
  exits immediately after `endTrace()` instead of hanging for 5 seconds
- Log a `console.warn` when an exporter times out (uses the previously
  unused `name` parameter for diagnostics)
- Align `HttpExporter` internal `AbortSignal.timeout` from 10s to 5s to
  match the per-exporter wrapper timeout
- Clean up safety-net timer in adversarial test to prevent open handles
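
The first two points can be sketched as below. This is an approximation under stated assumptions: the signature, the choice to resolve with `undefined` on timeout, and the warning text are illustrative rather than the project's actual `withTimeout`:

```typescript
// Race a promise against a timer. Clearing the timer in finally() means a
// fast promise does not leave a pending timeout keeping the event loop alive.
function withTimeout<T>(name: string, promise: Promise<T>, ms: number): Promise<T | undefined> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<undefined>((resolve) => {
    timer = setTimeout(() => {
      console.warn(`exporter "${name}" timed out after ${ms}ms`)
      resolve(undefined)
    }, ms)
  })
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer))
}
```

Without the `clearTimeout`, the process would stay alive until the timer fired even when the wrapped promise had already settled.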

Closes #190

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: replace UI.println with Log in engine bootstrap to prevent TUI prompt corruption

Engine bootstrap messages (downloading uv, engine ready, etc.) were written via
UI.println() which writes to stderr. In TUI mode the framework captures stderr,
causing garbled text like "readyltimate-engine 0.4.0..." in the prompt input area.

Switched all status messages in engine.ts to use Log.Default.info() which routes
to the log file instead of stderr, keeping the TUI display clean.

https://claude.ai/code/session_01MBVNhRX6XKTHqtZ6zS1i6w

* chore: update bun.lock after dependency install

https://claude.ai/code/session_01MBVNhRX6XKTHqtZ6zS1i6w

* fix: replace console.log/error with Log.Default in TUI code to prevent prompt corruption

All console.log and console.error calls in TUI components and bridge
client wrote to stdout/stderr, which the TUI framework captures and
renders as garbled text in the prompt input area. Replace with
Log.Default.info/error which writes to the log file instead.

Affected: route navigation, bootstrap, theme resolution, route changes,
clipboard detection, session creation, auto-enhance, MCP toggle,
workspace creation, and altimate-engine stderr.

https://claude.ai/code/session_01MBVNhRX6XKTHqtZ6zS1i6w

* fix: resolve code review findings for TUI prompt corruption PR

- Move "workspace created" log inside success branch (was logging on failure)
- Use structured metadata in `client.ts` stderr handler instead of template string
- Downgrade noisy diagnostic logs to `debug` level (clipboard, theme, route, sync)
- Replace platform-dependent binaries (`echo`, `python3`, `tar`) with `process.execPath`
  in tests for CI portability
- Replace hollow Log test with real `Log.init({ print: false })` integration test
- Use regex matching instead of brittle exact-string matching in log assertion tests
- Add source-scanning regression tests for TUI files and `client.ts` to prevent
  future `console.log/error` regressions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
…nd sensitive file guards

- Add `Filesystem.containsReal()` with `realpathSync` to prevent symlink escape attacks
  (same class of bug as Codex GHSA-w5fx-fh39-j5rw and Claude Code CVE-2025-54794)
- Add `isAbsolute(rel)` check to `Filesystem.contains()` for Windows cross-drive bypass
- Update `Instance.containsPath()` to use symlink-aware `containsReal()`
- Add safe permission defaults: deny `rm -rf`, `git push --force`, `git reset --hard`,
  `DROP DATABASE`, `TRUNCATE` out of the box
- Add `Protected.isSensitiveWrite()` to detect writes to `.git/`, `.ssh/`, `.aws/`,
  `.env*`, credential files even inside the project boundary
- Add `assertSensitiveWrite()` guard to write, edit, and apply_patch tools
- Remove resolved TODO comments from `file/index.ts`
- Update SECURITY.md, permissions docs, and security FAQ with practical guidance
- Add 94 tests including 62 e2e tests covering symlink attacks, path traversal,
  sensitive file detection, and combined attack scenarios

Closes #202

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

anandgupta42 and others added 2 commits March 16, 2026 17:03
… paths, TOCTOU docs

- Fix critical bug: bash deny defaults had `"*": "ask"` LAST which overrode deny rules
  due to last-match-wins semantics. Moved `"*": "ask"` to first position so deny rules
  take precedence.
- Fix all doc examples with same ordering bug (security-faq.md, permissions.md)
- Fix `isSensitiveWrite` to use regex split `/[/\\]/` for cross-platform path handling
- Allow per-path "Always" approval for sensitive file writes (reduces approval fatigue)
- Document TOCTOU limitation in `containsReal` JSDoc
- Add doc clarification about last-match-wins rule ordering with examples
- Add tests: bash deny defaults evaluation, user override merge, Windows backslash paths
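
The ordering bug is easiest to see with a toy evaluator. The rule representation, the `*` glob matcher, and the `"ask"` fallback below are assumptions for illustration; only the last-match-wins behavior is taken from the commit message:

```typescript
type Action = "allow" | "ask" | "deny"
type Rule = [pattern: string, action: Action]

// Minimal glob: `*` matches any run of characters, everything else is literal.
function matches(pattern: string, command: string): boolean {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, ".*")
  return new RegExp(`^${escaped}$`).test(command)
}

// Last-match-wins: every matching rule overwrites the previous result.
function evaluate(rules: Rule[], command: string): Action {
  let result: Action = "ask"
  for (const [pattern, action] of rules) {
    if (matches(pattern, command)) result = action
  }
  return result
}

// Catch-all LAST: it matches everything and overrides the deny rule.
const broken: Rule[] = [["rm -rf *", "deny"], ["*", "ask"]]
// Catch-all FIRST: the more specific deny rule gets the final word.
const fixed: Rule[] = [["*", "ask"], ["rm -rf *", "deny"]]

const brokenResult = evaluate(broken, "rm -rf ./build") // "ask"
const fixedResult = evaluate(fixed, "rm -rf ./build") // "deny"
```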

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nsitive matching, expanded patterns

Fixes from consensus across GPT 5.2, Kimi K2.5, MiniMax M2.5, and GLM-5 reviews:

- Add `assertSensitiveWrite(ctx, movePath)` for move destinations in `apply_patch`
  (CRITICAL: 3 models flagged that moves to `.ssh/`, `.env` bypassed sensitive check)
- Add case-insensitive matching on macOS/Windows for sensitive dirs and files
  (`.GIT/config`, `.SSH/id_rsa` now correctly detected on case-insensitive FS)
- Expand `SENSITIVE_FILES` with `.htpasswd`, `.pgpass`
- Add `SENSITIVE_EXTENSIONS` for private keys: `.pem`, `.key`, `.p12`, `.pfx`
- Add tests: case-insensitive matching, certificate extensions, credential files
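
A rough sketch of the matching described in the last two commits. The list contents come from the commit messages, but the function signature, the explicit `caseInsensitive` flag (the real code presumably derives this from the platform), and the return value are assumptions:

```typescript
const SENSITIVE_DIRS = [".git", ".ssh", ".aws"]
const SENSITIVE_FILES = [".htpasswd", ".pgpass"]
const SENSITIVE_EXTENSIONS = [".pem", ".key", ".p12", ".pfx"]

// Returns the pattern that matched, or undefined when the path is not sensitive.
function isSensitiveWriteSketch(filePath: string, caseInsensitive: boolean): string | undefined {
  const norm = (s: string) => (caseInsensitive ? s.toLowerCase() : s)
  // Split on both separators so Windows-style paths are handled too.
  const segments = filePath.split(/[/\\]/).filter(Boolean).map(norm)
  const dir = SENSITIVE_DIRS.find((d) => segments.includes(d))
  if (dir) return dir
  const base = segments[segments.length - 1] ?? ""
  if (base.startsWith(".env")) return ".env*"
  const file = SENSITIVE_FILES.find((f) => f === base)
  if (file) return file
  return SENSITIVE_EXTENSIONS.find((ext) => base.endsWith(ext))
}
```

For example, `.GIT/config` is reported as sensitive only when `caseInsensitive` is true, which corresponds to the macOS/Windows behavior described above.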

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment on lines +47 to +56
await ctx.ask({
  permission: "edit",
  patterns: [relativePath],
  always: [relativePath],
  metadata: {
    filepath: target,
    sensitive: matched,
    reason: `This file is in a sensitive location (${matched}). Modifications could affect credentials, version control, or security configuration.`,
  },
})

This comment was marked as outdated.

…on divergence

Gemini 3.1 Pro found that `realpathSync` and the OS kernel disagree on
`symlink/../file.txt`:
- `realpathSync("project/link/..")` → `project/` (lexical normalization)
- `writeFile("project/link/../f")` → writes to parent of symlink TARGET (kernel)

This means `containsReal` would approve a write that the OS places OUTSIDE
the project boundary. The fix rejects any unresolved path containing `..`
segments, since their behavior through symlinks is fundamentally unpredictable
at the application level.
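
A simplified sketch of a symlink-aware containment check with the `..` rejection. It is not the project's `Filesystem.containsReal()`: the signature is assumed, and for brevity it treats paths that do not exist yet as outside, which a real write path would need to handle by resolving the nearest existing parent. Like any check-then-use scheme it is also subject to the TOCTOU limitation noted earlier.

```typescript
import { mkdirSync, mkdtempSync, realpathSync, symlinkSync, writeFileSync } from "node:fs"
import { tmpdir } from "node:os"
import { isAbsolute, join, relative, resolve, sep } from "node:path"

function containsRealSketch(root: string, candidate: string): boolean {
  // Reject unresolved `..` segments outright, per the fix described above.
  if (candidate.split(/[/\\]/).includes("..")) return false
  let realRoot: string
  let realCandidate: string
  try {
    realRoot = realpathSync(root)
    realCandidate = realpathSync(resolve(root, candidate))
  } catch {
    return false // simplification: nonexistent paths are treated as outside
  }
  const rel = relative(realRoot, realCandidate)
  // An absolute `rel` means a different drive on Windows (cross-drive bypass).
  return rel !== ".." && !rel.startsWith(`..${sep}`) && !isAbsolute(rel)
}

// Demo layout: project/link is a symlink to a sibling directory outside the project.
const base = mkdtempSync(join(tmpdir(), "sandbox-"))
const project = join(base, "project")
const outside = join(base, "outside")
mkdirSync(project)
mkdirSync(outside)
writeFileSync(join(project, "a.txt"), "inside")
writeFileSync(join(outside, "secret.txt"), "outside")
symlinkSync(outside, join(project, "link"), "dir")

const insideOk = containsRealSketch(project, "a.txt") // true
const viaSymlink = containsRealSketch(project, "link/secret.txt") // false
const viaDotDot = containsRealSketch(project, "link/../a.txt") // false
```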

Also adds `.github` to `SENSITIVE_DIRS` (workflow injection vector).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
anandgupta42 and others added 2 commits March 16, 2026 17:28
… from sensitive dirs

UX impact evaluation of each change:

1. **Bash defaults softened**: Changed destructive shell/git commands from
   `deny` (blocked silently) to `ask` (prompted). `rm -rf ./build` and
   `git push --force` after rebase are legitimate workflows — blocking them
   without a prompt is poor UX. Database DDL (`DROP DATABASE`, `TRUNCATE`)
   stays `deny` since it's almost never intentional in agent context.

2. **Removed `.github` from sensitive dirs**: Editing CI/CD workflows is a
   core use case. Prompting on every workflow edit would cause severe
   approval fatigue.

3. **Expanded FAQ**: Added "Why am I being prompted to edit .env files?"
   with table of protected patterns and guidance on "Allow always".
   Added "What commands are blocked or prompted by default?" with clear
   table showing which commands prompt vs block. Reordered best practices
   to lead with "work on a branch" (most effective, least friction).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…icky DDL deny rules

Fixes two issues flagged by Sentry automated review:

1. `assertSensitiveWrite` now uses `permission: "sensitive_write"` instead of
   `"edit"`, preventing agents with `edit: "allow"` from silently bypassing
   sensitive file prompts for `.env`, `.ssh/`, `.aws/`, etc.

2. Database DDL deny rules (`DROP DATABASE`, `DROP SCHEMA`, `TRUNCATE`) are now
   merged as a `safetyDenials` layer AFTER user/agent configs via `userWithSafety`.
   This ensures wildcard `bash: "allow"` in agent configs cannot override these
   denials (last-match-wins). Users who need to override must use specific patterns
   like `"DROP DATABASE test_db": "allow"`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@anandgupta42
Contributor Author

Replaced by #205 — rebased onto main with clean commit history (7 commits, 16 files instead of 100+ commits, 995 files).


Development

Successfully merging this pull request may close these issues.

fix: harden path sandboxing — symlink escape, cross-drive bypass, no OS sandbox