Model directories as first-class records in the index DB instead of deriving them purely from file paths. Today there is no directory concept at all, which makes empty directories unrepresentable and forces a placeholder-file kludge in the cloud (see basicmachines-co/basic-memory-cloud#1014).
Background — why this is needed
DirectoryService.get_directory_tree() calls entity_repository.find_all() and synthesizes directory nodes by splitting each file's file_path. There is no directories table and no directory entity — a folder exists only as a path-prefix of some file. Consequences:
Empty directories can't exist. No file under a prefix ⇒ no node ⇒ the folder is invisible.
The cloud works around this by writing a placeholder note (.keep) into new folders. Because EntitySchema.file_path regenerates the filename from the (sanitized, kebab'd) title, .keep is stored as keep.md with title .keep, gets indexed as a real note, and then has to be filtered out of the UI per-call. That filter is fragile and already leaks (cloud #1014).
Tree/list reads don't scale. get_directory_tree() loads every entity and rebuilds the whole tree in Python on each call, which is a full scan for large (12k–18k file) projects.
Folder move/rename rewrites file_path on every descendant entity.
Goal
A directory is a real row in the index. The directory tree is the union of (a) directories derived from file paths and (b) explicit directory records, so empty folders are representable, reads are level-scoped, and structure ops are cheap on the index side.
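To make the "union" concrete, here is a minimal sketch of how the two sources could combine. The function names are hypothetical and not part of basic-memory; paths are assumed to be relative and `/`-separated.

```python
import posixpath


def derived_directories(file_paths):
    """Collect every ancestor directory implied by a set of file paths."""
    dirs = set()
    for file_path in file_paths:
        # Assumption: paths are project-relative; strip any leading slash.
        parent = posixpath.dirname(file_path.lstrip("/"))
        while parent:
            dirs.add(parent)
            parent = posixpath.dirname(parent)
    return dirs


def directory_tree_paths(file_paths, explicit_directories):
    """Union of (a) path-derived directories and (b) explicit directory records."""
    return derived_directories(file_paths) | set(explicit_directories)
```

With only source (a), a folder such as `notes/2024/empty` disappears as soon as it holds no file; adding it to source (b) keeps it in the result.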
Hard constraints
SQLite and Postgres must both be supported. basic-memory is local-first (SQLite) and cloud (per-tenant Postgres), so Postgres ltree is not a portable option: either avoid it, or treat it as a Postgres-only optimization behind a SQLite-compatible fallback (two implementations). Default to a backend-agnostic model: adjacency list (parent_id), plus a materialized path text column for prefix queries and a depth integer for level scoping. This works identically on both engines.
Files keep their actual directory paths — local and cloud. No id-addressed/opaque storage. The bucket (cloud) and the folder on disk (local) must remain a readable mirror of the human path tree. So a folder move still physically moves the underlying files; the directory model makes the index side cheap and the reads fast, it does not eliminate the file moves.
Proposed design (sketch — open to iteration)
Schema
New directories table: id, project_id, parent_id (adjacency), name, path (materialized full path, text), depth (int). Index on (project_id, parent_id) and (project_id, path).
entity gains directory_id FK (its containing directory). file_path stays as the real path / storage key.
Recommend a separate table over an is_directory flag on entity: folders have no content, no frontmatter, no search vector, and different move/lifecycle semantics; mixing taxes every entity query.
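A rough sketch of the proposed schema in SQLite follows, to show the shape rather than the final migration. Several details are assumptions, not decisions from this issue: the index names, the uniqueness of (project_id, path), depth counted as the number of path segments, the cut-down entity table, and the hypothetical helper `ensure_directory`.

```python
import sqlite3

SCHEMA = """
CREATE TABLE directories (
    id         INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL,
    parent_id  INTEGER REFERENCES directories(id),  -- NULL for a top-level directory
    name       TEXT NOT NULL,
    path       TEXT NOT NULL,                       -- materialized full path
    depth      INTEGER NOT NULL                     -- assumption: number of path segments
);
CREATE INDEX ix_directories_parent ON directories (project_id, parent_id);
-- Assumption: one row per (project, path), so this index is made unique.
CREATE UNIQUE INDEX ix_directories_path ON directories (project_id, path);

-- Cut-down stand-in for the existing entity table.
CREATE TABLE entity (
    id           INTEGER PRIMARY KEY,
    project_id   INTEGER NOT NULL,
    file_path    TEXT NOT NULL,                     -- stays the real path / storage key
    directory_id INTEGER REFERENCES directories(id)
);
"""


def ensure_directory(conn, project_id, path):
    """Create the directory row for `path` and any missing ancestors; return its id."""
    parent_id = None
    parts = [p for p in path.split("/") if p]
    for depth, name in enumerate(parts, start=1):
        current = "/".join(parts[:depth])
        row = conn.execute(
            "SELECT id FROM directories WHERE project_id = ? AND path = ?",
            (project_id, current),
        ).fetchone()
        if row is None:
            cur = conn.execute(
                "INSERT INTO directories (project_id, parent_id, name, path, depth) "
                "VALUES (?, ?, ?, ?, ?)",
                (project_id, parent_id, name, current, depth),
            )
            parent_id = cur.lastrowid
        else:
            parent_id = row[0]
    return parent_id
```

The DDL uses only column types and index syntax that Postgres also accepts, apart from the `INTEGER PRIMARY KEY` auto-id, which would need the Postgres equivalent.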
Directory tree / list
get_directory_tree() / list_directory(dir, depth) become level-scoped queries (WHERE parent_id = :dir / WHERE path LIKE :prefix AND depth = :n) instead of find_all() + full Python rebuild. Big win for large projects; enables lazy per-level loading.
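The two level-scoped queries could look roughly like this. It is a sketch against a minimal directories table; the helper names (`escape_like`, `list_children`, `list_at_depth`, `demo_connection`) and the sample rows are made up for illustration. One portability caveat: `LIKE` is case-insensitive for ASCII in SQLite by default but case-sensitive in Postgres, so the prefix form may need extra care to behave the same on both.

```python
import sqlite3


def escape_like(value):
    """Escape LIKE wildcards so folder names containing % or _ match literally."""
    return value.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def list_children(conn, project_id, parent_id):
    """Names of the direct children of one directory (parent_id None = top level)."""
    if parent_id is None:
        sql = ("SELECT name FROM directories "
               "WHERE project_id = ? AND parent_id IS NULL ORDER BY name")
        params = (project_id,)
    else:
        sql = ("SELECT name FROM directories "
               "WHERE project_id = ? AND parent_id = ? ORDER BY name")
        params = (project_id, parent_id)
    return [row[0] for row in conn.execute(sql, params)]


def list_at_depth(conn, project_id, prefix, depth):
    """Paths under `prefix` at exactly `depth`, via path LIKE :prefix AND depth = :n."""
    pattern = escape_like(prefix.rstrip("/")) + "/%"
    sql = ("SELECT path FROM directories "
           "WHERE project_id = ? AND path LIKE ? ESCAPE '\\' AND depth = ? "
           "ORDER BY path")
    return [row[0] for row in conn.execute(sql, (project_id, pattern, depth))]


def demo_connection():
    """In-memory database with a handful of illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE directories (id INTEGER PRIMARY KEY, project_id INTEGER, "
        "parent_id INTEGER, name TEXT, path TEXT, depth INTEGER)"
    )
    conn.executemany(
        "INSERT INTO directories VALUES (?, 1, ?, ?, ?, ?)",
        [
            (1, None, "notes", "notes", 1),
            (2, 1, "2024", "notes/2024", 2),
            (3, 2, "empty", "notes/2024/empty", 3),
            (4, 1, "2023", "notes/2023", 2),
            (5, None, "notes_old", "notes_old", 1),
            (6, 5, "x", "notes_old/x", 2),
            (7, None, "notesXold", "notesXold", 1),
            (8, 7, "y", "notesXold/y", 2),
        ],
    )
    return conn
```

A lazy tree view would call `list_children` once per expanded node instead of loading every entity.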
Empty-directory persistence (durable across reindex/restore)
Adopt the S3-native directory-marker convention: an empty folder is a zero-byte object whose key ends in / (e.g. notes/2024/empty/) — the same thing the S3 console writes for "New folder." Its basename is empty, so it is unambiguously a directory, not a file.
Sync recognizes a trailing-slash key as a directory and creates a directories row — it must NOT be parsed as markdown, given a title, or indexed as a note.
Locally, an empty directory on disk maps to the same directories row (no marker file needed; the real empty dir is the source of truth).
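The trailing-slash rule amounts to a small classification step at the front of sync. The sketch below is illustrative only: `SyncItem` and `classify_key` are hypothetical names, and it looks at the key alone, without checking that the object is actually zero bytes.

```python
import posixpath
from typing import NamedTuple


class SyncItem(NamedTuple):
    kind: str  # "directory" or "file"
    path: str  # directory path without the trailing slash, or the file key


def classify_key(key):
    """Treat a key ending in '/' as a directory marker; everything else is a file."""
    if key.endswith("/"):
        # posixpath.basename(key) is '' here, so there is no filename to parse.
        return SyncItem("directory", key.rstrip("/"))
    return SyncItem("file", key)


def directories_from_keys(keys):
    """Directory paths that should get a directories row instead of a note."""
    return [item.path for item in map(classify_key, keys) if item.kind == "directory"]
```

Only items classified as "file" would continue into markdown parsing and note indexing.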
Move / rename
Index side: structural update (re-point parent_id, rewrite path/depth for the moved subtree — one set-based statement).
Storage side: still moves the underlying files (unavoidable, S3 has no rename / local is a real mv) — should run in the background, not the request path (mirrors cloud #1012).
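For the index side, the subtree rewrite can be sketched as one set-based UPDATE plus a single-row re-point of parent_id. `move_directory` and `demo_connection` are hypothetical helpers, the move keeps the directory's name, and there is no guard against moving a directory into its own subtree. The SQL sticks to `substr`, `length`, and `||`, which exist in both SQLite and Postgres.

```python
import sqlite3

MOVE_SUBTREE = """
UPDATE directories
SET path  = :new || substr(path, length(:old) + 1),
    depth = depth + :delta
WHERE project_id = :pid
  AND (path = :old OR substr(path, 1, length(:old) + 1) = :old || '/')
"""


def move_directory(conn, project_id, dir_id, new_parent_id):
    """Re-parent one directory and rewrite path/depth for its whole subtree."""
    old_path, old_depth, name = conn.execute(
        "SELECT path, depth, name FROM directories WHERE id = ? AND project_id = ?",
        (dir_id, project_id),
    ).fetchone()
    parent_path, parent_depth = conn.execute(
        "SELECT path, depth FROM directories WHERE id = ? AND project_id = ?",
        (new_parent_id, project_id),
    ).fetchone()
    params = {
        "pid": project_id,
        "old": old_path,
        "new": f"{parent_path}/{name}",
        "delta": parent_depth + 1 - old_depth,
    }
    # One statement covers the moved directory and every descendant.
    conn.execute(MOVE_SUBTREE, params)
    # Only the moved directory itself changes parent; descendants keep theirs.
    conn.execute("UPDATE directories SET parent_id = ? WHERE id = ?", (new_parent_id, dir_id))


def demo_connection():
    """In-memory database with a handful of illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE directories (id INTEGER PRIMARY KEY, project_id INTEGER, "
        "parent_id INTEGER, name TEXT, path TEXT, depth INTEGER)"
    )
    conn.executemany(
        "INSERT INTO directories VALUES (?, 1, ?, ?, ?, ?)",
        [
            (1, None, "notes", "notes", 1),
            (2, 1, "2024", "notes/2024", 2),
            (3, 2, "empty", "notes/2024/empty", 3),
            (4, None, "archive", "archive", 1),
            (5, 4, "old", "archive/old", 2),
            (6, 1, "2024-drafts", "notes/2024-drafts", 2),
        ],
    )
    return conn
```

The WHERE clause compares against `:old || '/'` rather than a bare prefix, so a sibling such as `notes/2024-drafts` is left alone when `notes/2024` moves.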
Migration
Convert existing keep.md placeholders into directories rows (+ trailing-slash markers in cloud buckets) and stop creating them.
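As a starting point, the migration could first compute a plan and apply it separately. The sketch below assumes a placeholder is identifiable as a file named keep.md whose title is .keep, as described in the background section; `plan_placeholder_migration` and `marker_keys` are hypothetical names, and the input is a plain list of (file_path, title) pairs rather than real entity rows.

```python
import posixpath


def plan_placeholder_migration(entities):
    """Split (file_path, title) pairs into directories to create and placeholders to drop."""
    directories = set()
    placeholders = []
    for file_path, title in entities:
        if posixpath.basename(file_path) == "keep.md" and title == ".keep":
            placeholders.append(file_path)
            parent = posixpath.dirname(file_path)
            if parent:  # a placeholder at the project root needs no directory row
                directories.add(parent)
    return sorted(directories), placeholders


def marker_keys(directories):
    """Trailing-slash marker keys to write into the cloud bucket."""
    return [path + "/" for path in directories]
```

A real note that happens to be called keep.md but has a different title is left untouched by this rule.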
Out of scope / non-goals
Decoupling storage keys from human paths (would break local-first readability): explicitly not doing this (constraint #2).
ltree as the primary model (constraint #1): only acceptable as an optional PG optimization with a SQLite-equivalent path.
Related
Cloud web-UI bug this unblocks: basicmachines-co/basic-memory-cloud#1014 (empty folder leaks a .keep/keep.md note). A short-term filter fix is being applied cloud-side; this issue is the real fix.
Cloud folder-move latency: basicmachines-co/basic-memory-cloud#1012 (server-side move-directory reindex still inline).