
Commit 988871f

anandgupta42 and claude authored
feat: add Altimate Memory — persistent cross-session memory with TTL, namespaces, citations, and audit logging (#136)
* feat: add Letta-style persistent memory blocks for cross-session agent context

  Adds a file-based persistent memory system that allows the AI agent to retain and recall context across sessions: warehouse configurations, naming conventions, team preferences, and past analysis decisions.

  Three new tools: memory_read, memory_write, memory_delete with global and project scoping, YAML frontmatter format, atomic writes, size/count limits, and system prompt injection support.

  Closes #135

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: rebrand to Altimate Memory, add comprehensive docs and side-effect analysis

  - Rename tool IDs to altimate_memory_read/write/delete
  - Add comprehensive documentation at docs/data-engineering/tools/memory-tools.md
  - Document context window impact, stale memory risks, wrong memory detection, security considerations, and mitigation strategies
  - Add altimate_change markers consistent with codebase conventions
  - Update tools index to include Altimate Memory category

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add TTL expiration, hierarchical namespaces, dedup detection, audit logging, citations, session extraction, and global opt-out to Altimate Memory

  Implements P0/P1 improvements:

  - TTL expiration via optional `expires` field with automatic filtering
  - Hierarchical namespace IDs with slash-separated paths mapped to subdirectories
  - Deduplication detection on write with tag-overlap warnings
  - Audit log for all CREATE/UPDATE/DELETE operations
  - Citation-backed memories with file/line/note references
  - Session-end batch extraction tool (opt-in via ALTIMATE_MEMORY_AUTO_EXTRACT)
  - Global opt-out via ALTIMATE_DISABLE_MEMORY environment variable
  - Comprehensive tests: 175 tests covering all new features

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: harden Altimate Memory against path traversal, add adversarial tests

  Security fixes:

  - Replace permissive ID regex with segment-based validation that rejects '..', '.', '//', and all path traversal patterns (a/../b, a/./b, etc.)
  - Use unique temp file names (timestamp + random suffix) to prevent race condition crashes during concurrent writes to the same block ID

  The old regex /^[a-z0-9][a-z0-9_/.-]*[a-z0-9]$/ allowed dangerous IDs like "a/../b" or "a/./b" that could escape the memory directory via path.join(). The new regex validates each path segment individually.

  Adds 71 adversarial tests covering:

  - Path traversal attacks (10 tests)
  - Frontmatter injection and parsing edge cases (9 tests)
  - Unicode and special character handling (6 tests)
  - TTL/expiration boundary conditions (6 tests)
  - Deduplication edge cases (7 tests)
  - Concurrent operations and race conditions (4 tests)
  - ID validation gaps (11 tests)
  - Malformed files on disk (7 tests)
  - Serialization round-trip edge cases (5 tests)
  - Schema validation with adversarial inputs (6 tests)

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: expired blocks no longer count against capacity limit, add path guard

  Addresses PR review comments:

  1. Expired blocks counted against capacity (sentry[bot] MEDIUM):
     - write() now only counts non-expired blocks against MEMORY_MAX_BLOCKS_PER_SCOPE
     - Auto-cleans expired blocks from disk when total file count hits capacity
     - Users no longer see "scope full" errors when all blocks are expired
  2. Path traversal defense-in-depth (sentry[bot] CRITICAL):
     - Added runtime path.resolve() guard in blockPath() to verify the resolved path stays within the memory directory, as a second layer behind the segment-based ID regex from the previous commit

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address consensus code review findings for Altimate Memory

  - Add schema validation on disk reads (MemoryBlockSchema.safeParse)
  - Add safe ID regex to MemoryReadTool and MemoryDeleteTool parameters
  - Fix include_expired ignored when reading by specific ID
  - Fix duplicate tags inflating dedup overlap count (dedupe with Set)
  - Move expired block cleanup to after successful write
  - Eliminate double directory scan in write() by passing preloaded blocks
  - Fix docs/code mismatch: max ID length 128 -> 256
  - Add 22 new tests covering all fixes

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire up memory injection into system prompt with telemetry

  - Inject memory blocks into system prompt at session start, gated by ALTIMATE_DISABLE_MEMORY flag
  - Add memory_operation and memory_injection telemetry events to App Insights
  - Add memory tool categorization for telemetry
  - Document disabling memory for benchmarks/CI
  - Add injection integration tests and telemetry event tests

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent cc88de9 commit 988871f

23 files changed

Lines changed: 4433 additions & 0 deletions


docs/docs/data-engineering/tools/index.md

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -10,5 +10,6 @@ altimate has 55+ specialized tools organized by function.
1010
| [Lineage Tools](lineage-tools.md) | 1 tool | Column-level lineage tracing with confidence scoring |
1111
| [dbt Tools](dbt-tools.md) | 2 tools + 6 skills | Run, manifest parsing, test generation, scaffolding |
1212
| [Warehouse Tools](warehouse-tools.md) | 6 tools | Environment scanning, connection management, discovery, testing |
13+
| [Altimate Memory](memory-tools.md) | 3 tools | Persistent cross-session memory for warehouse config, conventions, and preferences |
1314

1415
All tools are available in the interactive TUI. The agent automatically selects the right tools based on your request.
Lines changed: 258 additions & 0 deletions
@@ -0,0 +1,258 @@
1+
# Altimate Memory Tools
2+
3+
Altimate Memory gives your data engineering agent **persistent, cross-session memory**. Instead of re-explaining your warehouse setup, naming conventions, or team preferences every session, the agent remembers what matters and picks up where you left off.
4+
5+
Memory blocks are plain Markdown files stored on disk — human-readable, version-controllable, and fully under your control.
6+
7+
## Why memory matters for data engineering
8+
9+
General-purpose coding agents treat every session as a blank slate. For data engineering, this is especially painful because:
10+
11+
- **Warehouse context is stable** — your Snowflake warehouse name, default database, and connection details rarely change, but you re-explain them every session.
12+
- **Naming conventions are tribal knowledge**: `stg_` for staging, `int_` for intermediate, `fct_`/`dim_` for marts. The agent needs to learn these once, not every time.
13+
- **Past analyses inform future work** — if the agent optimized a query or traced lineage for a table last week, recalling that context avoids redundant work.
14+
- **User preferences accumulate** — SQL style, preferred dialects, dbt patterns, warehouse sizing decisions.
15+
16+
Altimate Memory solves this with three tools that let the agent save, recall, and manage its own persistent knowledge.
17+
18+
## Tools
19+
20+
### altimate_memory_read
21+
22+
Read memory blocks from previous sessions. Automatically called at session start to give the agent context.
23+
24+
```
25+
> Read my memory about warehouse configuration
26+
27+
Memory: 1 block(s)
28+
29+
### warehouse-config (project) [snowflake, warehouse]
30+
## Warehouse Configuration
31+
32+
- **Provider**: Snowflake
33+
- **Default warehouse**: ANALYTICS_WH (XS for dev, M for prod)
34+
- **Default database**: ANALYTICS_DB
35+
- **Naming convention**: stg_ for staging, int_ for intermediate, fct_/dim_ for marts
36+
```
37+
38+
**Parameters:**
39+
40+
| Parameter | Type | Default | Description |
41+
|---|---|---|---|
42+
| `scope` | `"global" \| "project" \| "all"` | `"all"` | Filter by scope |
43+
| `tags` | `string[]` | `[]` | Filter to blocks containing all specified tags |
44+
| `id` | `string` | | Read a specific block by ID |
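
The parameter semantics in the table above can be illustrated with a small filter. This is a minimal sketch, not the tool's actual implementation: the `Block` and `ReadFilter` shapes and the `filterBlocks` function are hypothetical names introduced here, and the only behavior taken from the table is that `scope` defaults to `"all"` and that `tags` matches blocks containing all specified tags.

```typescript
// Hypothetical, simplified block shape used only for this illustration.
interface Block {
  id: string
  scope: "global" | "project"
  tags: string[]
}

// Mirrors the three documented parameters; all are optional.
interface ReadFilter {
  scope?: "global" | "project" | "all"
  tags?: string[]
  id?: string
}

// Keep a block only if it matches the scope, the ID (when given),
// and carries every requested tag.
function filterBlocks(blocks: Block[], filter: ReadFilter = {}): Block[] {
  const scope = filter.scope ?? "all"
  const wanted = filter.tags ?? []
  return blocks.filter((block) => {
    if (scope !== "all" && block.scope !== scope) return false
    if (filter.id !== undefined && block.id !== filter.id) return false
    return wanted.every((tag) => block.tags.includes(tag))
  })
}
```

With this sketch, an empty `tags` array matches every block, which is consistent with the documented default of `[]`.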
45+
46+
---
47+
48+
### altimate_memory_write
49+
50+
Create or update a persistent memory block.
51+
52+
```
53+
> Remember that our Snowflake warehouse is ANALYTICS_WH and we use stg_ prefix for staging models
54+
55+
Memory: Created "warehouse-config"
56+
```
57+
58+
The agent automatically calls this when it learns something worth persisting — you can also explicitly ask it to "remember" something.
59+
60+
**Parameters:**
61+
62+
| Parameter | Type | Required | Description |
63+
|---|---|---|---|
64+
| `id` | `string` | Yes | Unique identifier (lowercase, hyphens/underscores). Examples: `warehouse-config`, `naming-conventions` |
65+
| `scope` | `"global" \| "project"` | Yes | `global` for user-wide preferences, `project` for project-specific knowledge |
66+
| `content` | `string` | Yes | Markdown content (max 2,048 characters) |
67+
| `tags` | `string[]` | No | Up to 10 tags for categorization (max 64 chars each) |
68+
69+
---
70+
71+
### altimate_memory_delete
72+
73+
Remove a memory block that is outdated, incorrect, or no longer relevant.
74+
75+
```
76+
> Forget the old warehouse config, we migrated to BigQuery
77+
78+
Memory: Deleted "warehouse-config"
79+
```
80+
81+
**Parameters:**
82+
83+
| Parameter | Type | Required | Description |
84+
|---|---|---|---|
85+
| `id` | `string` | Yes | ID of the block to delete |
86+
| `scope` | `"global" \| "project"` | Yes | Scope of the block to delete |
87+
88+
## Scoping
89+
90+
Memory blocks live in two scopes:
91+
92+
| Scope | Storage location | Use case |
93+
|---|---|---|
94+
| **global** | `~/.local/share/altimate-code/memory/` | User-wide preferences: SQL style, preferred models, general conventions |
95+
| **project** | `.opencode/memory/` (in project root) | Project-specific: warehouse config, naming conventions, data model notes, past analyses |
96+
97+
Project memory travels with your repo. Add `.opencode/memory/` to `.gitignore` if it contains sensitive information, or commit it to share team conventions.
98+
99+
## File format
100+
101+
Each block is a Markdown file with YAML frontmatter:
102+
103+
```markdown
104+
---
105+
id: warehouse-config
106+
scope: project
107+
created: 2026-03-14T10:00:00.000Z
108+
updated: 2026-03-14T10:00:00.000Z
109+
tags: ["snowflake", "warehouse"]
110+
---
111+
112+
## Warehouse Configuration
113+
114+
- **Provider**: Snowflake
115+
- **Default warehouse**: ANALYTICS_WH
116+
- **Default database**: ANALYTICS_DB
117+
```
118+
119+
Files are human-readable and editable. You can create, edit, or delete them manually — the agent will pick up changes on the next session.
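
To make the file format concrete, the following sketch splits a block file into its frontmatter fields and Markdown body. It is an illustration under stated assumptions, not the project's parser: `parseBlock` and `ParsedBlock` are hypothetical names, only flat `key: value` lines are handled, and list values are assumed to use the JSON-style inline form shown in the example above.

```typescript
interface ParsedBlock {
  meta: Record<string, string | string[]>
  content: string
}

// Split "---\n<frontmatter>\n---\n<body>" into metadata and Markdown content.
// Returns undefined when the file does not start with a frontmatter fence.
function parseBlock(raw: string): ParsedBlock | undefined {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(raw)
  if (!match) return undefined
  const [, header = "", body = ""] = match

  const meta: Record<string, string | string[]> = {}
  for (const line of header.split(/\r?\n/)) {
    // Split on the first colon only, so ISO timestamps keep their colons.
    const idx = line.indexOf(":")
    if (idx === -1) continue
    const key = line.slice(0, idx).trim()
    const value = line.slice(idx + 1).trim()
    if (value.startsWith("[")) {
      // Assumption: inline lists such as ["snowflake", "warehouse"] are valid JSON.
      try {
        const parsed = JSON.parse(value)
        meta[key] = Array.isArray(parsed) ? parsed.map(String) : value
      } catch {
        meta[key] = value
      }
    } else {
      meta[key] = value
    }
  }
  return { meta, content: body.trim() }
}
```

A full YAML parser would be the safer choice for hand-edited files, since this sketch ignores quoting, nesting, and multi-line values.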
120+
121+
## Limits and safety
122+
123+
| Limit | Value | Rationale |
124+
|---|---|---|
125+
| Max block size | 2,048 characters | Prevents any single block from consuming too much context |
126+
| Max blocks per scope | 50 | Bounds total memory footprint |
127+
| Max tags per block | 10 | Keeps metadata manageable |
128+
| Max tag length | 64 characters | Prevents tag abuse |
129+
| Max ID length | 256 characters | Reasonable filename length |
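
The commit message for this change describes two layers of protection for block IDs: segment-by-segment validation and a `path.resolve()` guard in `blockPath()`. The sketch below shows one way those two layers could fit together. The exact segment pattern is an assumption (the real regex is not shown in this section), and `isSafeId` and `blockPathSketch` are hypothetical names.

```typescript
import * as path from "node:path"

const MAX_ID_LENGTH = 256 // from the limits table above

// Assumed segment rule: lowercase alphanumerics, with '-' or '_' allowed
// between them. Because '.' is never allowed, "." and ".." cannot match.
const SEGMENT = /^[a-z0-9]([a-z0-9_-]*[a-z0-9])?$/

// Layer 1: validate each slash-separated segment on its own.
// Empty segments (from "//", or a leading or trailing "/") are rejected.
function isSafeId(id: string): boolean {
  if (id.length === 0 || id.length > MAX_ID_LENGTH) return false
  return id.split("/").every((segment) => SEGMENT.test(segment))
}

// Layer 2: resolve the final path and verify it is still inside baseDir.
function blockPathSketch(baseDir: string, id: string): string {
  if (!isSafeId(id)) throw new Error(`Invalid memory block ID: ${id}`)
  const root = path.resolve(baseDir)
  const resolved = path.resolve(root, `${id}.md`)
  if (!resolved.startsWith(root + path.sep)) {
    throw new Error(`Memory block path escapes the memory directory: ${id}`)
  }
  return resolved
}
```

The second check looks redundant once the first passes, but keeping both means a future loosening of the ID pattern does not silently reopen the traversal hole.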
130+
131+
### Atomic writes
132+
133+
Blocks are written to a temporary file first, then atomically renamed. This prevents corruption if the process is interrupted mid-write.
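
The write-then-rename pattern described above can be sketched with Node's `fs/promises` API. This is an illustrative version rather than the store's actual code: `writeFileAtomic` is a hypothetical name, and the temp-file naming (timestamp plus random suffix) follows the description in the commit message.

```typescript
import { promises as fs } from "node:fs"
import * as path from "node:path"

// Write data to a uniquely named temp file next to the target, then rename
// it into place. A rename within one directory replaces the target in a
// single step, so readers see either the old file or the new one.
async function writeFileAtomic(target: string, data: string): Promise<void> {
  await fs.mkdir(path.dirname(target), { recursive: true })

  // Unique suffix so concurrent writers to the same block do not collide.
  const suffix = `${Date.now()}-${Math.random().toString(36).slice(2, 10)}`
  const tmp = `${target}.${suffix}.tmp`

  try {
    await fs.writeFile(tmp, data, "utf8")
    await fs.rename(tmp, target)
  } catch (err) {
    // Best-effort cleanup of the temp file before surfacing the error.
    await fs.rm(tmp, { force: true })
    throw err
  }
}
```

Keeping the temp file in the same directory as the target matters, because a rename across filesystems is not guaranteed to be atomic.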
134+
135+
## Disabling memory
136+
137+
Set the environment variable to disable all memory functionality — tools and automatic injection:
138+
139+
```bash
140+
ALTIMATE_DISABLE_MEMORY=true
141+
```
142+
143+
This is useful for **benchmarks**, CI pipelines, or any environment where persistent memory should not influence agent behavior. When disabled, memory tools are removed from the tool registry and no memory blocks are injected into the system prompt.
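
As a rough illustration of how such a flag might be evaluated, the sketch below checks the variable against an environment map. The parsing rule is an assumption: this section does not show how `altTruthy` interprets values, so the sketch treats `true` and `1` (case-insensitive) as enabled. The fallback name `OPENCODE_DISABLE_MEMORY` comes from `flag.ts` in this commit; `isMemoryDisabled` and `truthyValue` are hypothetical helpers.

```typescript
type Env = Record<string, string | undefined>

// Assumed rule: only "true" or "1" (any case, surrounding spaces ignored) enable a flag.
function truthyValue(value: string | undefined): boolean {
  const normalized = value?.trim().toLowerCase()
  return normalized === "true" || normalized === "1"
}

// Check the Altimate name first, then the OpenCode fallback name.
function isMemoryDisabled(env: Env): boolean {
  return truthyValue(env["ALTIMATE_DISABLE_MEMORY"]) || truthyValue(env["OPENCODE_DISABLE_MEMORY"])
}
```

In a benchmark harness this would typically be called once at startup, for example as `isMemoryDisabled(process.env)`, before registering any memory tools.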
144+
145+
## Context window impact
146+
147+
Altimate Memory automatically injects relevant blocks into the system prompt at session start, subject to a configurable character budget (default: 8,000 characters). Blocks are sorted by last-updated timestamp, so the most recently updated information is loaded first. The agent also has access to memory tools (`altimate_memory_read`, `altimate_memory_write`, `altimate_memory_delete`) to manage blocks on demand during a session.
148+
149+
**What this means in practice:**
150+
151+
- With a typical block size of 200-500 characters, the default budget comfortably fits 15-40 blocks
152+
- Memory injection adds a one-time cost at session start — it does not grow during the session
153+
- If you notice context pressure, reduce the number of blocks or keep them concise
154+
- The agent's own tool calls and responses consume far more context than memory blocks
155+
- To disable injection entirely (e.g., for benchmarks), set `ALTIMATE_DISABLE_MEMORY=true`
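
The 15-40 block estimate above follows from simple arithmetic against the budget. The sketch below reproduces it with a stop-at-first-overflow loop, the same shape as the loop in `MemoryPrompt.inject` from this commit. `blocksThatFit` is a hypothetical helper, and the `overhead` parameter stands in for the section header; per-block heading lines are ignored for simplicity.

```typescript
// Count how many blocks fit in the budget when they are added in order and
// the loop stops at the first block that would exceed it.
function blocksThatFit(blockLengths: number[], budget = 8000, overhead = 0): number {
  let used = overhead
  let count = 0
  for (const length of blockLengths) {
    if (used + length > budget) break
    used += length
    count++
  }
  return count
}

// 500-character blocks: 16 fit in 8,000 characters (15 with a 100-character header).
// 200-character blocks: 40 fit in 8,000 characters.
```

Because the loop stops rather than skips, one oversized block near the front of the ordering can keep smaller, older blocks from being injected.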
156+
157+
!!! tip
158+
Keep blocks concise and focused. A block titled "warehouse-config" with 5 bullet points is better than a wall of text. The agent can always call `altimate_memory_read` to fetch specific blocks on demand.
159+
160+
## Potential side effects and how to handle them
161+
162+
### Stale or incorrect memory
163+
164+
Memory blocks persist until they are deleted or, if an `expires` value is set, until they expire. If your warehouse configuration changes or a convention is updated, the agent will continue using outdated information until the block is updated or deleted.
165+
166+
**How to detect:** If the agent makes assumptions that don't match your current setup (e.g., references an old warehouse name), check what's in memory:
167+
168+
```
169+
> Show me all memory blocks
170+
171+
> Delete the warehouse-config block, it's outdated
172+
```
173+
174+
**How to prevent:**
175+
176+
- Review memory blocks periodically — they're plain Markdown files you can inspect directly
177+
- Ask the agent to "forget" outdated information when things change
178+
- Keep blocks focused on stable facts rather than ephemeral details
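
The commit message for this change also mentions an optional `expires` field that filters out expired blocks automatically, which gives short-lived facts a built-in cleanup path. A minimal sketch of such a check follows. It assumes `expires` holds an ISO 8601 timestamp like the `created` and `updated` fields; `isExpiredSketch` is a hypothetical stand-in for the exported `isExpired`, whose real implementation is not shown in this section.

```typescript
interface Expirable {
  expires?: string
}

// A block is expired once its `expires` timestamp is at or before `now`.
// Assumption: a missing or unparseable value means the block never expires.
function isExpiredSketch(block: Expirable, now: Date = new Date()): boolean {
  if (!block.expires) return false
  const expiresAt = Date.parse(block.expires)
  if (Number.isNaN(expiresAt)) return false
  return expiresAt <= now.getTime()
}
```

Passing `now` as a parameter keeps the check deterministic in tests, which is useful for boundary cases such as a block expiring at exactly the current millisecond.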
179+
180+
### Wrong information getting saved
181+
182+
The agent decides what to save based on conversation context. It may occasionally save incorrect inferences or overly specific details that don't generalize well.
183+
184+
**How to detect:**
185+
186+
- After a session where the agent saved memory, review what was written:
187+
```bash
188+
ls .opencode/memory/ # project memory
189+
cat .opencode/memory/*.md # inspect all blocks
190+
```
191+
- The agent always reports when it creates or updates a memory block, so watch for `Memory: Created "..."` or `Memory: Updated "..."` messages in the session output
192+
193+
**How to fix:**
194+
195+
- Delete the bad block: ask the agent or run `rm .opencode/memory/bad-block.md`
196+
- Edit the file directly — it's just Markdown
197+
- Ask the agent to rewrite it: "Update the warehouse-config memory with the correct warehouse name"
198+
199+
### Context bloat
200+
201+
With 50 blocks per scope at up to 2,048 characters each, a single scope can hold roughly 100 KB of memory in theory. In practice, the default 8,000-character budget caps what is injected at 8,000 characters, no matter how many blocks exist.
202+
203+
**Signs of context bloat:**
204+
205+
- Frequent auto-compaction (visible in the TUI)
206+
- The agent losing track of your current task because memory is crowding out working context
207+
208+
**How to mitigate:**
209+
210+
- Keep the total block count low (10-20 active blocks is a sweet spot)
211+
- Delete blocks you no longer need
212+
- Use tags to categorize and let the agent filter to what's relevant
213+
- Reduce the injection budget if needed
214+
215+
### Security considerations
216+
217+
Memory blocks are stored as plaintext files on disk. Be mindful of what gets saved:
218+
219+
- **Do not** save credentials, API keys, or connection strings in memory blocks
220+
- **Do** save structural information (warehouse names, naming conventions, schema patterns)
221+
- If using project-scoped memory in a shared repo, add `.opencode/memory/` to `.gitignore` to avoid committing sensitive context
222+
- Memory blocks are scoped per-user (global) and per-project — there is no cross-user or cross-project leakage
223+
224+
!!! warning
225+
Memory blocks are not encrypted. Treat them like any other configuration file on your machine. Do not store secrets or PII in memory blocks.
226+
227+
## Examples
228+
229+
### Data engineering team setup
230+
231+
```
232+
> Remember: we use Snowflake with warehouse COMPUTE_WH for dev and ANALYTICS_WH for prod.
233+
Our dbt project uses the staging/intermediate/marts pattern with stg_, int_, fct_, dim_ prefixes.
234+
Always use QUALIFY instead of subqueries for deduplication.
235+
236+
Memory: Created "team-conventions" in project scope
237+
```
238+
239+
### Personal SQL preferences
240+
241+
```
242+
> Remember globally: I prefer CTEs over subqueries, always use explicit column lists
243+
(no SELECT *), and format SQL with lowercase keywords.
244+
245+
Memory: Created "sql-preferences" in global scope
246+
```
247+
248+
### Recalling past work
249+
250+
```
251+
> What do you remember about our warehouse?
252+
253+
Memory: 2 block(s)
254+
### warehouse-config (project) [snowflake]
255+
...
256+
### team-conventions (project) [dbt, conventions]
257+
...
258+
```

packages/opencode/src/altimate/index.ts

Lines changed: 3 additions & 0 deletions
@@ -76,3 +76,6 @@ export * from "./tools/warehouse-discover"
7676
export * from "./tools/warehouse-list"
7777
export * from "./tools/warehouse-remove"
7878
export * from "./tools/warehouse-test"
79+
80+
// Memory
81+
export * from "../memory"

packages/opencode/src/altimate/telemetry/index.ts

Lines changed: 21 additions & 0 deletions
@@ -254,6 +254,26 @@ export namespace Telemetry {
254254
tool_count: number
255255
resource_count: number
256256
}
257+
| {
258+
type: "memory_operation"
259+
timestamp: number
260+
session_id: string
261+
operation: "write" | "delete"
262+
scope: "global" | "project"
263+
block_id: string
264+
is_update: boolean
265+
duplicate_count: number
266+
tags_count: number
267+
}
268+
| {
269+
type: "memory_injection"
270+
timestamp: number
271+
session_id: string
272+
block_count: number
273+
total_chars: number
274+
budget: number
275+
scopes_used: string[]
276+
}
257277

258278
const FILE_TOOLS = new Set(["read", "write", "edit", "glob", "grep", "bash"])
259279

@@ -266,6 +286,7 @@ export namespace Telemetry {
266286
{ category: "dbt", keywords: ["dbt"] },
267287
{ category: "warehouse", keywords: ["warehouse", "connection"] },
268288
{ category: "lineage", keywords: ["lineage", "dag"] },
289+
{ category: "memory", keywords: ["memory"] },
269290
]
270291

271292
export function categorizeToolName(name: string, type: "standard" | "mcp"): string {

packages/opencode/src/flag/flag.ts

Lines changed: 6 additions & 0 deletions
@@ -30,6 +30,12 @@ export namespace Flag {
3030
export const OPENCODE_CONFIG_CONTENT = process.env["OPENCODE_CONFIG_CONTENT"]
3131
export const OPENCODE_DISABLE_AUTOUPDATE = truthy("OPENCODE_DISABLE_AUTOUPDATE")
3232
export const OPENCODE_DISABLE_PRUNE = truthy("OPENCODE_DISABLE_PRUNE")
33+
// altimate_change start - global opt-out for Altimate Memory
34+
export const ALTIMATE_DISABLE_MEMORY = altTruthy("ALTIMATE_DISABLE_MEMORY", "OPENCODE_DISABLE_MEMORY")
35+
// altimate_change end
36+
// altimate_change start - opt-in for session-end auto-extraction
37+
export const ALTIMATE_MEMORY_AUTO_EXTRACT = altTruthy("ALTIMATE_MEMORY_AUTO_EXTRACT", "OPENCODE_MEMORY_AUTO_EXTRACT")
38+
// altimate_change end
3339
export const OPENCODE_DISABLE_TERMINAL_TITLE = truthy("OPENCODE_DISABLE_TERMINAL_TITLE")
3440
export const OPENCODE_PERMISSION = process.env["OPENCODE_PERMISSION"]
3541
export const OPENCODE_DISABLE_DEFAULT_PLUGINS = truthy("OPENCODE_DISABLE_DEFAULT_PLUGINS")
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
1+
export { MemoryStore, isExpired } from "./store"
2+
export { MemoryPrompt } from "./prompt"
3+
export { MemoryReadTool } from "./tools/memory-read"
4+
export { MemoryWriteTool } from "./tools/memory-write"
5+
export { MemoryDeleteTool } from "./tools/memory-delete"
6+
export { MemoryAuditTool } from "./tools/memory-audit"
7+
export { MemoryExtractTool } from "./tools/memory-extract"
8+
export { MEMORY_MAX_BLOCK_SIZE, MEMORY_MAX_BLOCKS_PER_SCOPE, MEMORY_MAX_CITATIONS, MEMORY_DEFAULT_INJECTION_BUDGET } from "./types"
9+
export type { MemoryBlock, Citation } from "./types"
Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
1+
import { MemoryStore, isExpired } from "./store"
2+
import { MEMORY_DEFAULT_INJECTION_BUDGET, type MemoryBlock } from "./types"
3+
import { Telemetry } from "@/altimate/telemetry"
4+
5+
export namespace MemoryPrompt {
6+
export function formatBlock(block: MemoryBlock): string {
7+
const tagsStr = block.tags.length > 0 ? ` [${block.tags.join(", ")}]` : ""
8+
const expiresStr = block.expires ? ` (expires: ${block.expires})` : ""
9+
let result = `### ${block.id} (${block.scope})${tagsStr}${expiresStr}\n${block.content}`
10+
11+
if (block.citations && block.citations.length > 0) {
12+
const citationLines = block.citations.map((c) => {
13+
const lineStr = c.line ? `:${c.line}` : ""
14+
const noteStr = c.note ? ` — ${c.note}` : ""
15+
return `- \`${c.file}${lineStr}\`${noteStr}`
16+
})
17+
result += "\n\n**Sources:**\n" + citationLines.join("\n")
18+
}
19+
20+
return result
21+
}
22+
23+
export async function inject(budget: number = MEMORY_DEFAULT_INJECTION_BUDGET): Promise<string> {
24+
const blocks = await MemoryStore.listAll()
25+
if (blocks.length === 0) return ""
26+
27+
const header = "## Altimate Memory\n\nThe following memory blocks were saved from previous sessions:\n"
28+
let result = header
29+
let used = header.length
30+
let injectedCount = 0
31+
const scopesSeen = new Set<string>()
32+
33+
for (const block of blocks) {
34+
if (isExpired(block)) continue
35+
const formatted = formatBlock(block)
36+
const needed = formatted.length + 2
37+
if (used + needed > budget) break
38+
result += "\n" + formatted + "\n"
39+
used += needed
40+
injectedCount++
41+
scopesSeen.add(block.scope)
42+
}
43+
44+
if (injectedCount > 0) {
45+
Telemetry.track({
46+
type: "memory_injection",
47+
timestamp: Date.now(),
48+
session_id: Telemetry.getContext().sessionId,
49+
block_count: injectedCount,
50+
total_chars: used,
51+
budget,
52+
scopes_used: [...scopesSeen],
53+
})
54+
}
55+
56+
return result
57+
}
58+
}
