
Consumer ergonomics for the Morris cutover (hash / store / registry / preset / emulator)#11

Merged: dratner merged 2 commits into main from feat/cutover-helpers on Jun 8, 2026


@dratner dratner commented Jun 8, 2026


Additive, app-neutral helpers the Morris team asked for to smooth their cutover. No new dependencies; no go.mod change. Each helper leaves policy decisions with the app.

What's added

  • content: SHA256HexBytes / SHA256HexString / SHA256HexReader(ctx), content-addressing primitives (the §10 hash helper). Documented as primitives only: the app decides whether a digest is a raw-source, artifact, chunk, or cache identity. SHA256HexReader streams its input, returns the byte count alongside the digest, and honors ctx.
  • store: IsNotFound(err) and DeleteIfExists(ctx, s, key). These keep ObjectStore.Delete strict (the contract stays ErrObjectNotFound) while giving Morris a clean idempotent path for rollback/cleanup.
  • extract: Registry.Supports + Registry.SupportedMediaTypes() (sorted, canonical; same normalization as Extract), and TextArtifacts / SingleTextArtifact. Per review, "text" is structural (Blob == nil && Text != ""), and SingleTextArtifact returns ErrNoTextArtifact / ErrMultipleTextArtifacts so multi-artifact resolution stays the app's call.
  • extract/preset: an opt-in document bundle providing RegisterDocuments, NewDocumentRegistry(opts…), and SupportedDocumentMediaTypes() (which delegates to the registry so it can't drift), wiring text/html/pdf/docx/markdown. Added to the depguard deny-list; importing it pulls the format deps by design while core stays clean.
  • store/gcs: NewEmulator(ctx, bucket, endpoint), a convenience over WithEndpoint + WithoutAuthentication for dev/test wiring.

Tests

New unit tests for every helper (preset and store helpers at 100% coverage), plus NewWithClient panic coverage, NewEmulator validation, and an emulator round-trip in the integration suite.

Decisions carried from review

  • text-artifact = structural (Blob == nil && Text != ""), not MediaType-based.
  • preset lives at extract/preset with RegisterDocuments / NewDocumentRegistry / SupportedDocumentMediaTypes.

Verification

make lint, go test ./..., go vet, go test -race, and make test-integration all pass under go1.26.4. Core remains dependency-free (go list -deps ./store and ./extract show 0 cloud/adapter deps).

After merge this would be v0.2.0 (Morris bumps from v0.1.0 to pick these up).

🤖 Generated with Claude Code

…ry, preset, emulator)

Additive, app-neutral helpers requested by the Morris team. No new dependencies;
no go.mod change.

- content: SHA256HexBytes/String/Reader(ctx) content-addressing primitives. Doc'd
  as primitives only — the app decides what a digest identifies (the §10 helper).
- store: IsNotFound(err) and DeleteIfExists(ctx,s,key) — keep ObjectStore.Delete
  strict while making rollback/cleanup idempotent.
- extract: Registry.Supports + SupportedMediaTypes (sorted, canonical); and
  TextArtifacts / SingleTextArtifact (structural: Blob==nil && Text!="") with
  ErrNoTextArtifact / ErrMultipleTextArtifacts — multi-artifact policy stays the
  app's.
- extract/preset: opt-in document bundle — RegisterDocuments, NewDocumentRegistry,
  SupportedDocumentMediaTypes (delegates) wiring text/html/pdf/docx/markdown. Added
  to the depguard deny-list; importing it pulls the format deps by design, core
  stays clean.
- store/gcs: NewEmulator(ctx,bucket,endpoint) convenience over WithEndpoint +
  WithoutAuthentication for dev/test. Also adds unit tests for the NewWithClient
  panics and NewEmulator validation, and an emulator round-trip integration test.

Verified: make lint, go test ./... (preset & store helpers 100%), go vet,
go test -race, and make test-integration all pass under go1.26.4. Core remains
dependency-free (0 cloud/adapter deps in ./store and ./extract).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 8, 2026 21:05

Copilot AI left a comment


Pull request overview

This PR adds small, additive helper APIs across content, store, and extract to improve consumer ergonomics (hashing primitives, idempotent delete helpers, extractor registry introspection + text-artifact helpers), plus a convenience “document bundle” registry preset and a GCS emulator constructor—keeping core packages dependency-free and pushing optional deps into subpackages.

Changes:

  • Add SHA-256 lowercase-hex helpers in content (bytes/string/streaming reader with ctx support).
  • Add store-level helpers (IsNotFound, DeleteIfExists) and GCS adapter convenience (NewEmulator) with accompanying unit/integration tests.
  • Add extractor registry helpers (Supports, SupportedMediaTypes, TextArtifacts, SingleTextArtifact) and an extract/preset bundle, plus depguard enforcement for the new preset package.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| store/helpers.go | Adds IsNotFound and DeleteIfExists helper APIs on top of store.ObjectStore semantics. |
| store/helpers_test.go | Unit tests covering the new store helper behaviors. |
| store/gcs/gcs.go | Adds NewEmulator convenience constructor for emulator endpoint + no-auth configuration. |
| store/gcs/gcs_test.go | Adds unit tests for NewWithClient panic behavior and NewEmulator validation. |
| store/gcs/gcs_integration_test.go | Adds an integration round-trip test using NewEmulator. |
| extract/preset/preset.go | Introduces opt-in document registry bundle wiring common extractors. |
| extract/preset/preset_test.go | Tests for preset registry support, extraction, sorting, and option pass-through. |
| extract/helpers.go | Adds Registry.Supports, Registry.SupportedMediaTypes, and text-artifact helpers + errors. |
| extract/helpers_test.go | Unit tests for new extract helpers. |
| content/hash.go | Adds SHA-256 lowercase-hex primitives, including streaming reader hashing with ctx awareness. |
| content/hash_test.go | Unit tests for hash helpers including known vectors and ctx cancellation. |
| .golangci.yaml | Extends depguard deny-list to prevent core packages importing extract/preset. |


Comment thread content/hash_test.go Outdated
Comment thread store/helpers.go
…oc Metadata aliasing

Addresses external review + Copilot on #11.
- DeleteIfExists now calls IsNotFound (single source of truth for not-found).
- hash test compares byte count as int64 (no int conversion / overflow risk).
- TextArtifacts/SingleTextArtifact docs note the returned artifacts are shallow
  copies sharing Metadata (use content.Artifact.Clone for independence).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
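
The Metadata aliasing noted in the commit above is ordinary Go behavior: copying a struct by value copies the map header, not the map contents. The sketch below shows the effect and one way to break the sharing. The `Artifact` type and its `clone` method are hypothetical stand-ins; the real content.Artifact.Clone may behave differently.

```go
package main

import "fmt"

// Artifact is a hypothetical, simplified stand-in with a map-valued field.
type Artifact struct {
	Text     string
	Metadata map[string]string
}

// clone returns a copy whose Metadata map is independent of the original.
func (a Artifact) clone() Artifact {
	c := a
	if a.Metadata != nil {
		c.Metadata = make(map[string]string, len(a.Metadata))
		for k, v := range a.Metadata {
			c.Metadata[k] = v
		}
	}
	return c
}

func main() {
	orig := Artifact{Text: "t", Metadata: map[string]string{"lang": "en"}}

	// A plain assignment is a shallow copy: both values share one map.
	shallow := orig
	shallow.Metadata["lang"] = "fr"
	fmt.Println(orig.Metadata["lang"]) // the original sees the write

	// A clone owns its own map, so later writes stay local to it.
	deep := orig.clone()
	deep.Metadata["lang"] = "de"
	fmt.Println(orig.Metadata["lang"], deep.Metadata["lang"])
}
```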
@dratner dratner merged commit ba796c3 into main Jun 8, 2026
6 checks passed
@dratner dratner deleted the feat/cutover-helpers branch June 8, 2026 21:15