CLAUDE.md (new file: 83 additions, 0 deletions)
@@ -0,0 +1,83 @@
# exgit — notes for Claude

## Before every commit

Run these in order. CI runs all of them; catching failures locally is faster.

```sh
mix format
mix compile --warnings-as-errors
MIX_ENV=dev mix credo --strict
MIX_ENV=dev mix dialyzer
mix test --warnings-as-errors
```
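Running the five steps by hand makes it easy to skip one. Below is a minimal sketch of a hypothetical `scripts/precommit.exs` that runs them in the listed order and stops at the first non-zero exit. The script path, the `Precommit` module and its injectable `runner` argument are assumptions, not part of the repo. The script sets `MIX_ENV` explicitly per step, mirroring the commands above.

```elixir
# Hypothetical scripts/precommit.exs (not in the repo): run the checklist
# in order and stop at the first failing step.
defmodule Precommit do
  @dev [{"MIX_ENV", "dev"}]

  # {extra env vars, mix args}, in the same order as the checklist.
  @steps [
    {[], ["format"]},
    {[], ["compile", "--warnings-as-errors"]},
    {@dev, ["credo", "--strict"]},
    {@dev, ["dialyzer"]},
    # mix test must run in :test; set it in case MIX_ENV is exported.
    {[{"MIX_ENV", "test"}], ["test", "--warnings-as-errors"]}
  ]

  # `runner` receives (args, env) and returns an exit status. It is
  # injectable so the sequencing can be exercised without spawning mix.
  def run(runner \\ &mix/2) do
    Enum.reduce_while(@steps, :ok, fn {env, args}, :ok ->
      case runner.(args, env) do
        0 -> {:cont, :ok}
        status -> {:halt, {:error, args, status}}
      end
    end)
  end

  # Streams mix output to the terminal and returns its exit status.
  defp mix(args, env) do
    {_output, status} =
      System.cmd("mix", args,
        env: env,
        into: IO.stream(:stdio, :line),
        stderr_to_stdout: true
      )

    status
  end
end
```

Locally, `Precommit.run()` would be the whole gate. Note that `mix format` (without `--check-formatted`) rewrites files, so re-stage them after a run.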

Dialyzer is slow (~2 min on a cold PLT, ~30s warm). Run it every time anyway:
it catches type errors that Credo and the compiler miss entirely. The Dialyzer
warnings that have shown up in this codebase:

- **`pattern_match`**: a clause that can never match given the inferred
  types (e.g. `{:literal, s}` in a function Dialyzer proves only receives
  `binary | %Regex{}`). Remove the dead clause.
- **`unmatched_return`**: a call whose return type includes `{:error, _}`
  but whose result is discarded. Write `_ = call(...)` to mark the
  ignored result as intentional.
- **`improper_list_constr`**: `[list | binary]` builds an improper list.
  Use `[list, binary]` instead.
- **`pattern_match` on `with`-else**: Dialyzer traces through the `with`
  chain and knows exactly which values can reach the `else` block. If it
  says `{:error, _}` can't match, the values reaching `else` have a more
  specific inferred type (e.g. always one particular tuple shape). Match
  that shape, or use a catch-all `else err -> err`.
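The fixes above fit in a few lines each. The module and function names in this sketch are made up for illustration. The `with` case uses `Map.fetch/2`, which fails with a bare `:error`, as a stand-in for any chain whose failure values are narrower than `{:error, _}`.

```elixir
defmodule DialyzerFixes do
  # improper_list_constr: `[acc | bin]` with a binary tail is an improper
  # list. It is still valid iodata, but Dialyzer flags the construction.
  # Nesting keeps the list proper and produces the same bytes.
  @spec append(iodata(), binary()) :: iodata()
  def append(acc, bin) when is_binary(bin), do: [acc, bin]

  # unmatched_return: File.rm/1 returns :ok | {:error, posix}. Binding
  # the result to `_` records that ignoring the error is deliberate.
  @spec cleanup(Path.t()) :: :ok
  def cleanup(path) do
    _ = File.rm(path)
    :ok
  end

  # pattern_match on with-else: Map.fetch/2 fails with a bare :error, so
  # an `{:error, _}` clause in `else` could never match. A catch-all
  # passes whatever actually failed straight through.
  @spec sum_ab(map()) :: {:ok, number()} | :error
  def sum_ab(map) do
    with {:ok, a} <- Map.fetch(map, :a),
         {:ok, b} <- Map.fetch(map, :b) do
      {:ok, a + b}
    else
      err -> err
    end
  end
end
```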

## What CI checks (`.github/workflows/ci.yml`)

| Step | Command | Notes |
|---|---|---|
| Compile | `mix compile --warnings-as-errors` | Warnings are errors |
| **Format** | `mix format --check-formatted` | Formatter is non-negotiable |
| Unused deps | `mix deps.unlock --check-unused` | Run after adding/removing deps |
| **Credo** | `MIX_ENV=dev mix credo --strict` | Strict = all categories, no exceptions |
| Dialyzer | `MIX_ENV=dev mix dialyzer` | Slow; primary matrix only |
| Tests | `mix test --warnings-as-errors` | Warnings are errors here too |
| Extended | `mix test --warnings-as-errors --include slow --include real_git` | Includes real git binary tests |
| Integration | `mix test --warnings-as-errors --only integration` | Live network (pyex) — primary only |

## Common Credo traps

- **Alias ordering**: aliases must be alphabetical within a group.
`Exgit.Object` before `Exgit.Pack`, `{Blob, Commit}` before `{Tree}`.
- **TODO comments**: Credo flags `# TODO` at design level. Use
`# TODO(owner):` to suppress, or rephrase as `# follow-up:`.
- **Function complexity**: cyclomatic complexity cap is 12. Extract
helpers if you get close — `do_fetch` hit 13 and needed splitting.
- **Unused aliases**: every `alias` in a file must be referenced.
Test files are checked too.
- **Unused private functions**: dead private helpers in test files are
  compiler warnings, which `mix test --warnings-as-errors` turns into failures.
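One way to split a function that is nearing the complexity cap works like this. Keep the public function a flat `with` pipeline, and move each decision into small multi-clause helpers, so the branches no longer pile up in one body. This is a hypothetical sketch, not exgit's `do_fetch`. `classify/1` and its map shape are made up, while the numeric codes are git's pack object types (5 is reserved).

```elixir
defmodule ComplexitySketch do
  # The public function only sequences steps; each decision lives in a
  # helper, so no single body accumulates all the branches.
  @spec classify(%{type: integer(), size: integer()}) ::
          {:ok, atom(), non_neg_integer()} | {:error, term()}
  def classify(%{type: type, size: size}) do
    with {:ok, name} <- type_name(type),
         :ok <- check_size(size) do
      {:ok, name, size}
    end
  end

  # Object type codes from git pack entry headers.
  defp type_name(1), do: {:ok, :commit}
  defp type_name(2), do: {:ok, :tree}
  defp type_name(3), do: {:ok, :blob}
  defp type_name(4), do: {:ok, :tag}
  defp type_name(6), do: {:ok, :ofs_delta}
  defp type_name(7), do: {:ok, :ref_delta}
  defp type_name(other), do: {:error, {:unknown_type, other}}

  defp check_size(size) when is_integer(size) and size >= 0, do: :ok
  defp check_size(size), do: {:error, {:bad_size, size}}
end
```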

## Architecture invariants (don't break these)

- **No hidden state**: no ETS, no Process dictionary, no persistent_term
in the hot path. State lives on the struct the caller holds.
- **No disk in the agent path**: `Exgit.clone/2` (default) uses Memory
stores. `File.*` calls belong only in `ObjectStore.Disk`, `RefStore.Disk`,
`Config`, and `Index`.
- **No auth on public repos**: never pass a PAT/token to `Exgit.clone/2`
for a public repo. Credential exposure for no benefit.
- **StreamParser is pure**: `ingest/2` and `finalize/1` are pure functions
that return updated state. No side effects except writes through the
`ObjectStore` protocol callbacks.
- **ObjectStore protocol**: `put/2` returns `{:ok, sha, new_store}` — always
thread `new_store` forward. Same for `open_write/close_write`.
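The purity rule and the store-threading rule fit together. The parser carries the store in its own state, and every write replaces it with the returned `new_store`. The sketch below is a minimal illustration under stated assumptions: a made-up record format (4-byte big-endian length, then payload) and a toy `Store` that only mimics the `{:ok, sha, new_store}` shape of `put/2`. It is not `Exgit.Pack.StreamParser` or `ObjectStore.Memory`. SHAs are computed the way git hashes blobs.

```elixir
defmodule PureIngestSketch do
  # Toy stand-in for a Memory store (hypothetical; mimics put/2's shape).
  defmodule Store do
    defstruct objects: %{}

    def put(%__MODULE__{objects: objs} = store, data) when is_binary(data) do
      sha = :crypto.hash(:sha, "blob #{byte_size(data)}\0" <> data)
      {:ok, sha, %{store | objects: Map.put(objs, sha, data)}}
    end
  end

  defstruct store: %Store{}, buffer: <<>>, count: 0

  def new(store), do: %__MODULE__{store: store}

  # Pure: returns a new parser value. Bytes that do not yet form a whole
  # record stay buffered until the next chunk arrives.
  def ingest(%__MODULE__{} = p, chunk) when is_binary(chunk),
    do: drain(%{p | buffer: p.buffer <> chunk})

  def finalize(%__MODULE__{buffer: <<>>, count: n, store: store}), do: {:ok, n, store}
  def finalize(%__MODULE__{buffer: rest}), do: {:error, {:truncated, byte_size(rest)}}

  # Consume every complete record, threading new_store into the state.
  defp drain(%{buffer: <<len::32, data::binary-size(len), rest::binary>>} = p) do
    {:ok, _sha, new_store} = Store.put(p.store, data)
    drain(%{p | store: new_store, buffer: rest, count: p.count + 1})
  end

  defp drain(p), do: {:ok, p}
end
```

Dropping `new_store` in `drain/1` would still return `{:ok, n, store}`, but with an empty store. That silent failure mode is exactly what the invariant guards against.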

## Test tags

| Tag | Meaning |
|---|---|
| (none) | Runs in default `mix test` |
| `:slow` | Long-running; included in extended CI tier |
| `:real_git` | Requires `git` binary on PATH |
| `:integration` | Live network; primary CI only |
| `:github_private` | Requires secrets; push-to-main only |
| `:memory` | Memory regression guard; run with `--include memory` |
| `:git_cross_check` | Cross-checks against real git binary |
| `:network` | Live network; excluded by default |
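
Assume every tagged row above is opt-in. In that case, the table maps onto ExUnit's `:exclude` list plus per-test `@tag`s, and `mix test --include <tag>` or `--only <tag>` re-enables a group. This is a sketch of what `test/test_helper.exs` could look like under that assumption (the repo's actual helper may differ). The demo test module would normally live in its own `_test.exs` file.

```elixir
# Sketch of test/test_helper.exs, assuming all tags in the table are opt-in.
ExUnit.start(
  exclude: [:slow, :real_git, :integration, :github_private, :memory, :git_cross_check, :network]
)

defmodule TagSketchTest do
  use ExUnit.Case, async: true

  test "untagged tests run under plain `mix test`" do
    assert Enum.sum([1, 2, 3]) == 6
  end

  # Skipped unless enabled, e.g. `mix test --include real_git`.
  @tag :real_git
  test "git binary is on PATH" do
    assert System.find_executable("git")
  end
end
```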

bench/local_pack_eval.exs (new file: 144 additions, 0 deletions)
@@ -0,0 +1,144 @@
# Local pack evaluation — streaming parser vs Pack.Reader
#
# Runs both parsers against the local opencode .git packfiles
# (no network, pure parser performance and memory comparison).
#
# mix run bench/local_pack_eval.exs

alias Exgit.ObjectStore.Memory
alias Exgit.Pack.{Reader, StreamParser}

defmodule LocalEval.Fmt do
  def us(t) when t >= 1_000_000, do: "#{Float.round(t / 1_000_000, 2)}s"
  def us(t) when t >= 1_000, do: "#{Float.round(t / 1_000, 1)}ms"
  def us(t), do: "#{t}µs"
  def bytes(b) when b >= 1_073_741_824, do: "#{Float.round(b / 1_073_741_824, 2)} GB"
  def bytes(b) when b >= 1_048_576, do: "#{Float.round(b / 1_048_576, 1)} MB"
  def bytes(b) when b >= 1_024, do: "#{Float.round(b / 1_024, 1)} KB"
  def bytes(b), do: "#{b} B"
end

alias LocalEval.Fmt

packs = [
  {"opencode 34MB",
   "/Users/ivar/code/opencode/.git/objects/pack/pack-c6597be5752d52a1569f84052ce7bc96a2071210.pack"},
  {"opencode 135MB",
   "/Users/ivar/code/opencode/.git/objects/pack/pack-87af0cf7c6779ce067dfbfaf9ef8368804204b3a.pack"}
]

IO.puts("""

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
exgit local pack benchmark — StreamParser vs Pack.Reader
#{DateTime.utc_now() |> Calendar.strftime("%Y-%m-%d %H:%M:%S")} UTC
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
""")

for {label, path} <- packs do
  if File.exists?(path) do
    pack_size = File.stat!(path).size
    IO.puts("── #{label} (#{Fmt.bytes(pack_size)}) ──────────────────────────────")

    # ── Pack.Reader (baseline) ──────────────────────────────────────────────
    :erlang.garbage_collect()
    :timer.sleep(100)
    mem_before_reader = :erlang.memory(:total)

    {t_reader, reader_result} =
      :timer.tc(fn ->
        pack = File.read!(path)
        Reader.parse(pack)
      end)

    :erlang.garbage_collect()
    # VM-wide delta; includes the parsed objects still held by reader_result.
    reader_growth = :erlang.memory(:total) - mem_before_reader

    case reader_result do
      {:ok, objects} ->
        IO.puts("  Pack.Reader")
        IO.puts("    time        : #{Fmt.us(t_reader)}")
        IO.puts("    objects     : #{length(objects)}")
        IO.puts("    heap growth : #{Fmt.bytes(reader_growth)}")
        IO.puts("    growth/pack : #{Float.round(reader_growth / pack_size, 2)}×")

      {:error, reason} ->
        IO.puts("  Pack.Reader FAILED: #{inspect(reason)}")
    end

    IO.puts("")

    # ── StreamParser ────────────────────────────────────────────────────────
    :erlang.garbage_collect()
    :timer.sleep(100)
    mem_before_stream = :erlang.memory(:total)

    {t_stream, stream_result} =
      :timer.tc(fn ->
        store = Memory.new()
        parser = StreamParser.new(store)

        # Feed in 64KB chunks to simulate network streaming.
        chunk_size = 64 * 1024

        result =
          path
          |> File.stream!(chunk_size)
          |> Enum.reduce_while({:ok, parser}, fn chunk, {:ok, p} ->
            case StreamParser.ingest(p, chunk) do
              {:ok, p2} -> {:cont, {:ok, p2}}
              {:error, _} = err -> {:halt, err}
            end
          end)

        case result do
          {:ok, parser} -> StreamParser.finalize(parser)
          {:error, _} = err -> err
        end
      end)

    :erlang.garbage_collect()
    stream_growth = :erlang.memory(:total) - mem_before_stream

    case stream_result do
      {:ok, n, _store} ->
        IO.puts("  StreamParser")
        IO.puts("    time        : #{Fmt.us(t_stream)}")
        IO.puts("    objects     : #{n}")
        IO.puts("    heap growth : #{Fmt.bytes(stream_growth)}")
        IO.puts("    growth/pack : #{Float.round(stream_growth / pack_size, 2)}×")

        # Compare only when the baseline parse succeeded.
        case reader_result do
          {:ok, reader_objs} ->
            speedup = Float.round(t_reader / t_stream, 2)

            # GC between samples can make the baseline delta zero or negative.
            mem_ratio =
              if reader_growth > 0,
                do: "#{Float.round(stream_growth / reader_growth, 2)}×",
                else: "n/a"

            IO.puts("")
            IO.puts("  Comparison (vs Pack.Reader)")
            IO.puts("    time ratio   : #{speedup}× (>1 = StreamParser faster)")
            IO.puts("    memory ratio : #{mem_ratio} (>1 = StreamParser uses more)")
            IO.puts("    objects match: #{length(reader_objs) == n}")

          _ ->
            :ok
        end

      {:error, reason} ->
        IO.puts("  StreamParser FAILED: #{inspect(reason)}")
    end

    IO.puts("")
  else
    IO.puts("SKIP #{label}: #{path} not found")
  end
end

IO.puts("""
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Notes
Pack.Reader : loads full binary into heap, parses all at once
StreamParser: 64KB chunks → inflate → streaming deflate write to Memory
heap never holds full pack binary; peaks at O(max_object_size)
Memory usage includes the store (all objects compressed), not just parse overhead
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
""")