Model directories as first-class records in the index DB (enable empty dirs, faster tree/list, cheap folder moves)

## Summary

Model directories as **first-class records in the index DB** instead of deriving them purely from file paths. Today there is no directory concept at all, which makes empty directories unrepresentable and forces a placeholder-file kludge in the cloud (see basicmachines-co/basic-memory-cloud#1014).

## Background — why this is needed

`DirectoryService.get_directory_tree()` calls `entity_repository.find_all()` and **synthesizes** directory nodes by splitting each file's `file_path`. There is no `directories` table and no directory entity — a folder exists *only* as a path-prefix of some file. Consequences:

- **Empty directories can't exist.** No file under a prefix ⇒ no node ⇒ the folder is invisible.
- The cloud works around this by writing a placeholder note (`.keep`) into new folders. Because `EntitySchema.file_path` regenerates the filename from the sanitized, kebab-cased title, `.keep` is stored as `keep.md` with title `.keep`, is indexed as a real note, and then has to be filtered out of the UI on every call. That filter is fragile and already leaks (cloud #1014).
- **Tree/list reads don't scale.** `get_directory_tree()` loads *every* entity and rebuilds the whole tree in Python on each call, which is a full scan even on large (12k–18k file) projects.
- **Folder move/rename** rewrites `file_path` on every descendant entity.
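
To make the first consequence concrete, here is a minimal sketch of path-derived directories. It is not the actual `DirectoryService` code; `derive_directories` is a hypothetical helper that only mimics the idea of splitting each `file_path`.

```python
from pathlib import PurePosixPath


def derive_directories(file_paths: list[str]) -> set[str]:
    """Return every directory implied by a list of file paths."""
    dirs: set[str] = set()
    for file_path in file_paths:
        # Walk up the ancestors of each file; "." stands for the project root.
        for parent in PurePosixPath(file_path).parents:
            if str(parent) != ".":
                dirs.add(str(parent))
    return dirs


paths = ["notes/2024/plan.md", "notes/ideas.md"]
derived = derive_directories(paths)
# "notes" and "notes/2024" appear because files live under them.
# A folder such as "notes/2024/empty" has no file, so nothing can produce it.
```

With this approach a folder can only appear as a side effect of a file, which is why the placeholder note was introduced.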

## Goal

A directory is a real row in the index. The directory tree is the union of (a) directories derived from file paths and (b) explicit directory records, so empty folders are representable, reads are level-scoped, and structure ops are cheap on the index side.

## Hard constraints

1. **SQLite *and* Postgres must both be supported.** basic-memory is local-first (SQLite) and cloud (per-tenant Postgres), so **Postgres `ltree` is not a portable option**: either avoid it, or treat it as a Postgres-only optimization behind a SQLite-compatible fallback (two implementations). Default to a backend-agnostic model: an **adjacency list (`parent_id`) plus a materialized `path` text column** for prefix queries, and a `depth` integer for level scoping. This works on both engines without engine-specific types.
2. **Files keep their actual directory paths — local and cloud.** No id-addressed/opaque storage. The bucket (cloud) and the folder on disk (local) must remain a readable mirror of the human path tree. So a folder move still physically moves the underlying files; the directory model makes the *index* side cheap and the reads fast, it does not eliminate the file moves.

## Proposed design (sketch — open to iteration)

**Schema**
- New `directories` table: `id`, `project_id`, `parent_id` (adjacency), `name`, `path` (materialized full path, text), `depth` (int). Index on `(project_id, parent_id)` and `(project_id, path)`.
- `entity` gains `directory_id` FK (its containing directory). `file_path` stays as the real path / storage key.
- Recommend a **separate table** over an `is_directory` flag on `entity`: folders have no content, no frontmatter, no search vector, and different move/lifecycle semantics; mixing taxes every entity query.
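
A rough sketch of the schema above, written as SQLite-flavored DDL and exercised through Python's `sqlite3`. The column names come from the proposal; the index names, the `depth = 1` convention for top-level directories, and the trimmed-down `entity` table are assumptions for illustration only. The real migration would go through the project's own tooling, and Postgres would need its own id-generation syntax.

```python
import sqlite3

# Column names follow the proposal. Index names and the tiny `entity`
# stand-in table are placeholders, not the real basic-memory schema.
SCHEMA = """
CREATE TABLE directories (
    id         INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL,
    parent_id  INTEGER REFERENCES directories(id),  -- NULL for a top-level directory
    name       TEXT NOT NULL,
    path       TEXT NOT NULL,      -- materialized full path, e.g. 'notes/2024'
    depth      INTEGER NOT NULL    -- 1 for a top-level directory (assumed convention)
);
CREATE INDEX ix_directories_parent ON directories (project_id, parent_id);
CREATE INDEX ix_directories_path   ON directories (project_id, path);

CREATE TABLE entity (
    id           INTEGER PRIMARY KEY,
    project_id   INTEGER NOT NULL,
    file_path    TEXT NOT NULL,    -- still the real path / storage key
    directory_id INTEGER REFERENCES directories(id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO directories (id, project_id, parent_id, name, path, depth) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 1, None, "notes", "notes", 1), (2, 1, 1, "empty", "notes/empty", 2)],
)
conn.execute(
    "INSERT INTO entity (project_id, file_path, directory_id) "
    "VALUES (1, 'notes/plan.md', 1)"
)

# Directories with no files directly inside them are now ordinary rows.
empty_dirs = [
    row[0]
    for row in conn.execute(
        "SELECT d.path FROM directories d "
        "LEFT JOIN entity e ON e.directory_id = d.id "
        "WHERE e.id IS NULL ORDER BY d.path"
    )
]
```

Here `notes/empty` exists in the index without any note or placeholder file backing it.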

**Directory tree / list**
- `get_directory_tree()` / `list_directory(dir, depth)` become **level-scoped queries** (`WHERE parent_id = :dir` / `WHERE path LIKE :prefix AND depth = :n`) instead of `find_all()` + full Python rebuild. Big win for large projects; enables lazy per-level loading.
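
The two query shapes above could look roughly like this. The helpers `list_children`, `list_at_depth`, and `like_prefix` are hypothetical names, the table is a cut-down version of the proposed `directories` table, and `sqlite3` stands in for whichever engine is in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE directories (id INTEGER PRIMARY KEY, project_id INTEGER, "
    "parent_id INTEGER, name TEXT, path TEXT, depth INTEGER)"
)
conn.executemany(
    "INSERT INTO directories VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, None, "notes", "notes", 1),
        (2, 1, 1, "2024", "notes/2024", 2),
        (3, 1, 1, "ideas", "notes/ideas", 2),
        (4, 1, 2, "empty", "notes/2024/empty", 3),
    ],
)


def list_children(conn: sqlite3.Connection, project_id: int, parent_id: int) -> list[str]:
    """One level of the tree: direct children of a directory."""
    cur = conn.execute(
        "SELECT name FROM directories "
        "WHERE project_id = ? AND parent_id = ? ORDER BY name",
        (project_id, parent_id),
    )
    return [row[0] for row in cur]


def like_prefix(path: str) -> str:
    """Escape LIKE wildcards in a path and append '/%' for a subtree match."""
    escaped = path.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return escaped + "/%"


def list_at_depth(conn: sqlite3.Connection, project_id: int, root: str, depth: int) -> list[str]:
    """All directories under `root` at exactly the given depth."""
    cur = conn.execute(
        "SELECT path FROM directories "
        "WHERE project_id = ? AND path LIKE ? ESCAPE '\\' AND depth = ? "
        "ORDER BY path",
        (project_id, like_prefix(root), depth),
    )
    return [row[0] for row in cur]


children = list_children(conn, 1, 1)
level_three = list_at_depth(conn, 1, "notes", 3)
```

One portability detail to check: SQLite's `LIKE` is case-insensitive for ASCII by default while Postgres `LIKE` is case-sensitive, so the prefix query may need a collation or pragma decision to behave the same on both engines. Listing top-level directories would need `parent_id IS NULL` rather than `parent_id = ?`.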

**Empty-directory persistence (durable across reindex/restore)**
- Adopt the **S3-native directory-marker convention**: an empty folder is a zero-byte object whose key ends in `/` (e.g. `notes/2024/empty/`) — the same thing the S3 console writes for "New folder." Its basename is empty, so it is unambiguously a directory, not a file.
- **Sync recognizes a trailing-slash key as a directory** and creates a `directories` row — it must NOT be parsed as markdown, given a title, or indexed as a note.
- Locally, an empty directory on disk maps to the same `directories` row (no marker file needed; the real empty dir is the source of truth).
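
A minimal sketch of the trailing-slash rule on the sync side. `classify_key` is a hypothetical helper; the real sync code path and its types are not shown in this issue.

```python
def classify_key(key: str) -> tuple[str, str]:
    """Classify a storage key as a directory marker or a regular file.

    A key ending in "/" is treated as a directory marker and returned
    without the trailing slash, so it can be stored as a `directories.path`.
    """
    if key.endswith("/"):
        return ("directory", key.rstrip("/"))
    return ("file", key)


keys = ["notes/2024/plan.md", "notes/2024/empty/"]
classified = [classify_key(k) for k in keys]
```

Only keys classified as `"file"` would go on to markdown parsing and note indexing; `"directory"` keys would create or confirm a `directories` row.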

**Move / rename**
- Index side: a structural update. Re-point `parent_id` on the moved directory and rewrite `path`/`depth` for the moved subtree in one set-based statement.
- Storage side: the underlying files still have to move. This is unavoidable (S3 has no rename, and locally it is a real `mv`), so it should run in the background rather than in the request path (mirrors cloud #1012).
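
The index-side half of a move might be sketched as follows. `move_directory` is a hypothetical function and the table is a cut-down `directories` table. It covers directory rows only: rewriting `file_path` on descendant entities and the background storage move are not shown. The subtree match uses `substr` rather than `LIKE` to sidestep wildcard escaping, at the likely cost of not using the `path` index.

```python
import sqlite3


def move_directory(conn, project_id, old_path, new_path, new_parent_id):
    """Move a directory subtree in the index (directory rows only)."""
    depth_delta = new_path.count("/") - old_path.count("/")
    with conn:  # one transaction; rolls back if either statement fails
        # Set-based rewrite of path/depth for the directory and its descendants.
        conn.execute(
            "UPDATE directories "
            "SET path = ? || substr(path, ?), depth = depth + ? "
            "WHERE project_id = ? AND (path = ? OR substr(path, 1, ?) = ?)",
            (new_path, len(old_path) + 1, depth_delta, project_id,
             old_path, len(old_path) + 1, old_path + "/"),
        )
        # Only the moved directory changes parent; descendants keep theirs.
        conn.execute(
            "UPDATE directories SET parent_id = ?, name = ? "
            "WHERE project_id = ? AND path = ?",
            (new_parent_id, new_path.rsplit("/", 1)[-1], project_id, new_path),
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE directories (id INTEGER PRIMARY KEY, project_id INTEGER, "
    "parent_id INTEGER, name TEXT, path TEXT, depth INTEGER)"
)
conn.executemany(
    "INSERT INTO directories VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, None, "notes", "notes", 1),
        (2, 1, 1, "2024", "notes/2024", 2),
        (3, 1, 1, "2024-drafts", "notes/2024-drafts", 2),
        (4, 1, 2, "empty", "notes/2024/empty", 3),
    ],
)

# Move notes/2024 to the top level of the project.
move_directory(conn, 1, "notes/2024", "2024", None)
result = conn.execute(
    "SELECT id, parent_id, path, depth FROM directories ORDER BY id"
).fetchall()
```

The sibling `notes/2024-drafts` is left alone because the match requires either the exact path or the path followed by `/`.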

**Migration**
- Convert existing `keep.md` placeholders into `directories` rows (plus trailing-slash markers in cloud buckets), and stop creating new placeholders.

## Out of scope / non-goals
- Decoupling storage keys from human paths (would break local-first readability) — explicitly not doing this (constraint #2).
- `ltree` as the primary model (constraint #1) — only acceptable as an optional PG optimization with a SQLite-equivalent path.

## Related
- Cloud web-UI bug this unblocks: basicmachines-co/basic-memory-cloud#1014 (empty folder leaks a `.keep`/`keep.md` note). A short-term filter fix is being applied cloud-side; this issue is the real fix.
- Cloud folder-move latency: basicmachines-co/basic-memory-cloud#1012 (server-side move-directory reindex still inline).

