53 changes: 53 additions & 0 deletions README.md
@@ -163,6 +163,59 @@ end
# removed old_file.ex
```

### Working with the workspace (agent loop)

`Exgit.Workspace` is a working tree on top of a ref. Reads pass
through to the ref until a write happens; each write produces a new
in-memory tree SHA, so every state of the workspace is a real git
tree object — snapshots are 20-byte values, branching is free,
commits are an O(1) hash-and-store.

```elixir
ws = Exgit.Workspace.open(repo, "main")

{:ok, ws} = Exgit.Workspace.write(ws, "lib/foo.ex", new_source)
{:ok, ws} = Exgit.Workspace.rm(ws, "lib/old.ex")

{:ok, content, ws} = Exgit.Workspace.read(ws, "lib/foo.ex")
{:ok, [{:modified, "lib/foo.ex"}, {:deleted, "lib/old.ex"}], ws} =
  Exgit.Workspace.diff(ws)

{:ok, commit_sha, ws} =
  Exgit.Workspace.commit(ws,
    message: "agent: refactor",
    author: %{name: "agent", email: "agent@example.com"},
    update_ref: "refs/heads/agent-turn-1")

# Snapshot is an opaque value — persist it and replay later
saved = Exgit.Workspace.snapshot(ws)
ws = Exgit.Workspace.restore(ws, saved)
```

The struct is a plain value, so branching the agent's state for
parallel exploration is just `ws_b = ws_a` — both diverge
independently from there.

### Mounting through `:vfs`

If you depend on [`:vfs`](https://github.com/ivarvong/vfs),
`Exgit.Workspace` ships a `VFS.Mountable` defimpl, so it composes
with other backends (in-memory scratch, postgres, S3) under one
mount table. Capabilities: `[:read, :write, :lazy]`.

```elixir
fs =
  VFS.new()
  |> VFS.mount("/repo", Exgit.Workspace.open(repo, "main"))
  |> VFS.mount("/scratch", VFS.Memory.new())

{:ok, content, fs} = VFS.read_file(fs, "/repo/lib/foo.ex")
{:ok, fs} = VFS.write_file(fs, "/repo/lib/foo.ex", new_source)
```

`:vfs` is an optional dependency; `Exgit.Workspace` is fully
usable without it.

## Performance

Every hot path emits [`:telemetry`](https://hexdocs.pm/telemetry/)
174 changes: 174 additions & 0 deletions docs/VFS.md
@@ -0,0 +1,174 @@
# VFS integration

`:vfs` ([github.com/ivarvong/vfs](https://github.com/ivarvong/vfs)) is
the protocol-based virtual-filesystem layer that sits *above* exgit.
Agents in production rarely consume git in isolation — they compose a
read-write working tree (git), a scratch (in-memory), and a durable
per-tenant store (postgres/S3) under one tree. `:vfs` provides the
mount table and the `VFS.Mountable` protocol; exgit ships
`Exgit.Workspace`, a working-tree-on-top-of-a-ref, as a backend.

## Concept

The integration is a wrapper struct, `Exgit.Workspace`, that bundles
the triple `(repository, base_ref, head_tree)`:

* `:base_ref` — the starting point ("HEAD", a branch, a commit SHA).
* `:head_tree` — the working tree's SHA (20 bytes), or `nil` when
the workspace is pristine. Reads use `head_tree || base_ref`.

Every state of the workspace is a real git tree object. That gives
us:

* **Snapshot is free.** Just save the head_tree binary.
* **Branching is free.** `ws_b = ws_a` — both diverge independently.
* **Commit is instantaneous.** The work is already done; we have
the tree.
* **Diff is structured.** `Exgit.Diff.trees/4` against `base_ref`.

Trade-off: empty directories don't exist (git doesn't store them) and
writes cost O(depth × log fanout) rather than O(1). For typical agent
workloads (depth <10, <100 entries per dir) writes are ~100µs each
in-memory.
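To make the "every state is a tree object" idea concrete, here is a minimal sketch of a content-addressed tree store. It is a toy model, not Exgit's implementation: `TreeSketch`, its map-based store, and the use of `:erlang.term_to_binary/1` as a serialization are all assumptions made for illustration. It shows why a snapshot is just a 20-byte hash, why two branches can share a base without copying, and why a write rewrites one tree per path segment.

```elixir
defmodule TreeSketch do
  # Toy content-addressed store (hypothetical, not Exgit's implementation).
  # A tree is a map of name => {:blob, content} | {:tree, sha};
  # the store is a map of sha => tree.

  # Hash the tree with SHA-1 (20 bytes), store it, and return {sha, store}.
  def put(store, tree) do
    sha = :crypto.hash(:sha, :erlang.term_to_binary(Enum.sort(tree)))
    {sha, Map.put(store, sha, tree)}
  end

  # Write a blob at the given path segments under the tree `sha`
  # (nil means "start from an empty tree"). One tree is rewritten per
  # segment, so the cost grows with path depth.
  def write(store, sha, [name], content) do
    tree = Map.get(store, sha, %{})
    put(store, Map.put(tree, name, {:blob, content}))
  end

  def write(store, sha, [dir | rest], content) do
    tree = Map.get(store, sha, %{})

    child_sha =
      case tree do
        %{^dir => {:tree, existing}} -> existing
        _ -> nil
      end

    {new_child, store} = write(store, child_sha, rest, content)
    put(store, Map.put(tree, dir, {:tree, new_child}))
  end
end

# Two branches diverge from the same base snapshot without copying anything.
{base, store} = TreeSketch.write(%{}, nil, ["lib", "foo.ex"], "v1")
{branch_a, store} = TreeSketch.write(store, base, ["lib", "foo.ex"], "v2")
{branch_b, _store} = TreeSketch.write(store, base, ["lib", "bar.ex"], "new")
```

In this model `base`, `branch_a`, and `branch_b` are plain binaries; the old trees stay in the store untouched, which is the property that makes restoring a snapshot a matter of pointing back at an earlier hash.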

## Dependency direction

`:exgit` takes `:vfs` as an **optional** dep. `:vfs` never depends on
`:exgit`.

```elixir
# mix.exs in :exgit (already wired)
{:vfs, github: "ivarvong/vfs", ref: "...", optional: true, only: [:dev, :test]}
```

The `Exgit.Workspace.VFS` module wraps the entire defimpl in an
`if Code.ensure_loaded?(VFS.Mountable)` guard. Builds in which `:vfs`
is not resolved simply compile without the defimpl; `Exgit.Workspace`
itself is fully usable as a standalone API.
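The guard pattern can be sketched in isolation as follows. The module names (`OptionalDepSketch.Missing`, `GuardedSketch.*`) are hypothetical stand-ins; only the use of `Code.ensure_loaded?/1` around a definition comes from the text above. The first guard refers to a module that does not exist, so its body is never defined; the second refers to `Enum`, which is always available.

```elixir
# Hypothetical stand-in for an optional dependency that is NOT present.
if Code.ensure_loaded?(OptionalDepSketch.Missing) do
  defmodule GuardedSketch.WithMissingDep do
    def available?, do: true
  end
end

# Stand-in for a dependency that IS present (Enum ships with Elixir).
if Code.ensure_loaded?(Enum) do
  defmodule GuardedSketch.WithEnum do
    def available?, do: true
  end
end

# Only the second module was defined.
IO.inspect(Code.ensure_loaded?(GuardedSketch.WithMissingDep))
IO.inspect(Code.ensure_loaded?(GuardedSketch.WithEnum))
```

The check runs when the file is compiled, so the decision is made once per build rather than on every call.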

## Capabilities

```elixir
MapSet.new([:read, :write, :lazy])
```

Not `:mkdir` — git trees can't represent empty directories, so a
faithful `mkdir/3` has no honest semantics. `write_file/4` implicitly
creates parent directories (vfs explicitly supports this for
flat-keyed backends).
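As a rough illustration of how a caller might consult a capability set before attempting an operation, consider the sketch below. `CapabilitySketch`, `supports?/1`, and the `{:error, :unsupported}` return value are hypothetical and are not taken from vfs; only the capability set itself comes from this document.

```elixir
defmodule CapabilitySketch do
  # The capability set advertised above. Everything else in this module
  # is a hypothetical illustration, not the vfs API.
  def capabilities, do: MapSet.new([:read, :write, :lazy])

  def supports?(cap), do: MapSet.member?(capabilities(), cap)

  # A caller-side gate: refuse mkdir up front when the backend does not
  # advertise it, instead of inventing empty-directory semantics.
  def mkdir(_path) do
    if supports?(:mkdir), do: :ok, else: {:error, :unsupported}
  end
end

IO.inspect(CapabilitySketch.supports?(:write))
IO.inspect(CapabilitySketch.mkdir("/repo/empty"))
```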

## State threading

`VFS.Mountable` requires every op to return the (possibly updated)
backend impl as the last element of its success tuple. The workspace
threads two pieces of state on each call:

1. `repo.object_store` — grows on lazy partial-clone fetches.
2. `head_tree` — advances on every write.

The conformance suite's "state threading" tests exercise this
explicitly: a write returns a workspace whose subsequent reads
reflect the write.
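The threading contract can be illustrated with a toy backend. `ThreadingSketch` and its fields are hypothetical; the only thing borrowed from the text is the shape of the return values, where the updated backend is the last element of each success tuple and the caller must keep using the returned value.

```elixir
defmodule ThreadingSketch do
  # Hypothetical toy backend: a map of path => content plus a read counter
  # standing in for state that changes even on reads (e.g. a lazy cache).
  defstruct files: %{}, reads: 0

  def new, do: %__MODULE__{}

  # Success tuples carry the updated backend as their last element.
  def write_file(%__MODULE__{} = backend, path, content) do
    {:ok, %{backend | files: Map.put(backend.files, path, content)}}
  end

  def read_file(%__MODULE__{} = backend, path) do
    case Map.fetch(backend.files, path) do
      {:ok, content} -> {:ok, content, %{backend | reads: backend.reads + 1}}
      :error -> {:error, :enoent}
    end
  end
end

backend = ThreadingSketch.new()

result =
  with {:ok, backend} <- ThreadingSketch.write_file(backend, "/a.txt", "hello"),
       {:ok, content, backend} <- ThreadingSketch.read_file(backend, "/a.txt") do
    {content, backend.reads}
  end

IO.inspect(result)
```

Reading from the original `backend` instead of the one returned by `write_file/3` would yield `{:error, :enoent}`, which is the kind of mistake the "state threading" tests are meant to catch.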

## Path translation

vfs paths are absolute with a leading `/`. `Exgit.FS` paths are
slash-tolerant but treat `""` as the root tree. The defimpl strips
the leading slash before calling FS:

```elixir
defp strip_leading("/"), do: ""
defp strip_leading("/" <> rest), do: rest
```

vfs's mount-table dispatcher already normalizes paths and strips the
mount prefix before reaching the backend.
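A small standalone sketch of the translation, assuming the backend only ever receives absolute paths as described above. `PathSketch` and `segments/1` are hypothetical names; `to_fs_path/1` mirrors the two `strip_leading/1` clauses.

```elixir
defmodule PathSketch do
  # Mirrors the two strip_leading/1 clauses: "/" is the root tree,
  # anything else loses its leading slash.
  def to_fs_path("/"), do: ""
  def to_fs_path("/" <> rest), do: rest

  # Hypothetical helper: split the translated path into tree segments.
  def segments(vfs_path) do
    vfs_path
    |> to_fs_path()
    |> String.split("/", trim: true)
  end
end

IO.inspect(PathSketch.to_fs_path("/lib/foo.ex"))
IO.inspect(PathSketch.segments("/"))
```

Because there is no clause for a path without a leading slash, a relative path raises `FunctionClauseError` in this sketch; that is only acceptable when the dispatcher guarantees absolute paths.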

## Materialize

The `materialize` op calls `Exgit.Repository.materialize/2`, not
`Exgit.FS.prefetch/3`. `prefetch/3` populates the cache without
flipping the repo to `mode: :eager`, so streaming ops (`walk`, `grep`)
still raise `ArgumentError` (see `Exgit.FS.require_eager!/2` at
`lib/exgit/fs.ex:1414-1423`). `materialize/2` does both in one step.

## Walk

`Exgit.FS.walk/2` requires the underlying repo to be `:eager`. After
a write, the head_tree is resident in the object store but the repo's
mode flag is unchanged, so `VFS.walk/3` still requires
`VFS.materialize/2` to be called first on lazy partial-clone repos.

For an agent loop this is the natural sequence anyway: clone lazy →
materialize → search/edit. Loosening `walk/2` to allow walking a
fully-resident tree without `:eager` is tracked as a possible
follow-up in `Exgit.FS`.

## Walk-emitted stat caveats

* **`size` is 0.** Git tree entries don't carry blob size; only an
explicit `stat/2` per path resolves the blob and returns the
real number.
* **`mtime` is the epoch.** Git blobs aren't dated; only commits
are. Walking history per blob to invent an mtime is expensive
and rarely correct.

## Git-aware ops live on the workspace, not the protocol

`commit/2`, `snapshot/1`, `restore/2`, `diff/1`, `checkout/2`, and
`materialize/1` aren't part of `VFS.Mountable`. Agents reach for
them on the workspace struct directly:

```elixir
ws = Exgit.Workspace.open(repo, "main")

# Filesystem ops via vfs (interoperable with other mounts)
fs = VFS.new() |> VFS.mount("/repo", ws)
{:ok, content, fs} = VFS.read_file(fs, "/repo/lib/foo.ex")

# Or directly on the workspace (when ws is the only thing you have)
{:ok, content, ws} = Exgit.Workspace.read(ws, "lib/foo.ex")

# Git-aware: workspace API only
snapshot = Exgit.Workspace.snapshot(ws)
{:ok, sha, ws} = Exgit.Workspace.commit(ws, message: "...", author: %{...})
```

## Conformance

vfs ships `VFS.ConformanceCase` — a parametrized macro every backend
runs through. The exgit-side conformance test lives at
`test/exgit/workspace_vfs_test.exs` and is tagged `:vfs` so it's
skipped when the dep isn't resolved (e.g. on the Elixir 1.17 CI tier
where vfs requires ~> 1.18).

A backend that ships without conformance is shipping with unverified
contract behavior — which is exactly how `VFS.Test.AppService`
silently ignored `:byte_range` / `:line_range` / `:chunk_size` until
the audit (vfs CHANGELOG, 2026-05-02). New behavior gets caught here.

The harness currently lives in vfs's `test/support/`; we load it via
`Code.require_file/1` from `test_helper.exs`. Once vfs publishes
`VFS.ConformanceCase` in `lib/`, the `Code.require_file/1` workaround
can be dropped and `use VFS.ConformanceCase` works directly.

## What this doesn't try to be

* **Not an index.** No "staged vs working tree" distinction. The
workspace IS the working tree; commit takes everything-or-nothing.
* **Not a merger.** Single parent on commit. Multi-parent merge is
a future concern.
* **Not auto-committing.** Writes never produce a commit by
themselves. The agent decides when to checkpoint via
`Exgit.Workspace.commit/2`.
* **Not a sync layer.** Push/pull aren't workspace ops — they're
`Exgit.push/3` against the underlying repo.

## References

* vfs repo: <https://github.com/ivarvong/vfs>
* vfs SPEC: vfs `SPEC.md`
* Working impl: `lib/exgit/workspace.ex`
* VFS defimpl: `lib/exgit/workspace/vfs.ex`
* Workspace tests: `test/exgit/workspace_test.exs`
* Conformance test: `test/exgit/workspace_vfs_test.exs`
78 changes: 78 additions & 0 deletions lib/exgit/fs.ex
@@ -1241,6 +1241,84 @@ defmodule Exgit.FS do
{:ok, sha, %{repo | object_store: store}}
end

@doc """
Remove the entry at `path` from the tree at `reference`. Returns
`{:ok, new_tree_sha, repo}` — the new tree omits the entry; existing
blob/tree objects are left untouched (git is content-addressed; orphan
objects are GC'd separately).

## Options

* `:recursive` — when `true`, removing a directory also removes its
contents. Default `false`; removing a directory without
`:recursive` returns `{:error, :eisdir}`.

Errors:

* `{:error, :not_found}` — `path` does not exist in the tree
* `{:error, :eisdir}` — `path` is a directory and `:recursive` is
not set
* `{:error, :cannot_rm_root}` — `path` is empty or `"/"`

Mirrors `write_path/5`'s tree-rewrite shape so a workspace can chain
`rm_path` and `write_path` calls to assemble multi-file edits before
committing.
"""
@spec rm_path(Repository.t(), ref(), path(), keyword()) ::
        {:ok, binary(), Repository.t()} | {:error, term()}
def rm_path(%Repository{} = repo, reference, path, opts \\ []) do
  recursive = Keyword.get(opts, :recursive, false)
  segments = normalize_path(path)

  if segments == [] do
    {:error, :cannot_rm_root}
  else
    with {:ok, tree_sha, repo} <- resolve_tree(repo, reference) do
      remove_entry_from_tree(repo, tree_sha, segments, recursive)
    end
  end
end

# Last path segment: drop the named entry from this tree and store the result.
defp remove_entry_from_tree(repo, tree_sha, [name], recursive) do
  with {:ok, %Tree{entries: entries}, repo} <- fetch_object(repo, tree_sha) do
    case Enum.find(entries, fn {_, n, _} -> n == name end) do
      nil ->
        {:error, :not_found}

      {"40000", _, _} when not recursive ->
        {:error, :eisdir}

      _ ->
        new_entries = Enum.reject(entries, fn {_, n, _} -> n == name end)
        new_tree = Tree.new(new_entries)
        {:ok, sha, store} = ObjectStore.put(repo.object_store, new_tree)
        {:ok, sha, %{repo | object_store: store}}
    end
  end
end

# Intermediate segment: recurse into the child tree, then rewrite this
# tree so the directory entry points at the new child.
# Note: as written, a child tree that ends up empty is still kept as a
# "40000" entry in its parent.
defp remove_entry_from_tree(repo, tree_sha, [dir | rest], recursive) do
  with {:ok, %Tree{entries: entries}, repo} <- fetch_object(repo, tree_sha) do
    case Enum.find(entries, fn {m, n, _} -> n == dir and m == "40000" end) do
      nil ->
        {:error, :not_found}

      {_, _, child_sha} ->
        case remove_entry_from_tree(repo, child_sha, rest, recursive) do
          {:ok, new_child_sha, repo} ->
            other_entries = Enum.reject(entries, fn {_, n, _} -> n == dir end)
            new_entries = other_entries ++ [{"40000", dir, new_child_sha}]
            new_tree = Tree.new(new_entries)
            {:ok, sha, store} = ObjectStore.put(repo.object_store, new_tree)
            {:ok, sha, %{repo | object_store: store}}

          {:error, _} = err ->
            err
        end
    end
  end
end

# ----------------------------------------------------------------------
# Internal: object fetch that threads the repo for Promisor-backed stores
# ----------------------------------------------------------------------
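The error contract documented for `rm_path` can be exercised on a simplified model. The sketch below is a toy under stated assumptions: `RmSketch` is hypothetical, directories are plain nested maps and files are binaries, and nothing is content-addressed. It reproduces only the recursion shape and the `:not_found` / `:eisdir` / `:cannot_rm_root` outcomes.

```elixir
defmodule RmSketch do
  # Toy model (hypothetical): a directory is a map, a file is a binary.
  def rm(_tree, [], _opts), do: {:error, :cannot_rm_root}

  # Last segment: delete the entry, refusing directories unless recursive.
  def rm(tree, [name], opts) do
    case Map.fetch(tree, name) do
      :error ->
        {:error, :not_found}

      {:ok, %{}} ->
        if opts[:recursive] do
          {:ok, Map.delete(tree, name)}
        else
          {:error, :eisdir}
        end

      {:ok, _file} ->
        {:ok, Map.delete(tree, name)}
    end
  end

  # Intermediate segment: recurse, then rebuild this level around the new child.
  def rm(tree, [dir | rest], opts) do
    with {:ok, %{} = child} <- Map.fetch(tree, dir),
         {:ok, new_child} <- rm(child, rest, opts) do
      {:ok, Map.put(tree, dir, new_child)}
    else
      {:error, _} = err -> err
      _ -> {:error, :not_found}
    end
  end
end

tree = %{"lib" => %{"foo.ex" => "a", "old.ex" => "b"}, "README.md" => "r"}
IO.inspect(RmSketch.rm(tree, ["lib", "old.ex"], []))
IO.inspect(RmSketch.rm(tree, ["lib"], []))
```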