Consumer ergonomics for the Morris cutover (hash / store / registry / preset / emulator) #11
Merged
…ry, preset, emulator)

Additive, app-neutral helpers requested by the Morris team. No new dependencies; no go.mod change.

- content: SHA256HexBytes/String/Reader(ctx) content-addressing primitives. Doc'd as primitives only: the app decides what a digest identifies (the §10 helper).
- store: IsNotFound(err) and DeleteIfExists(ctx,s,key). Keeps ObjectStore.Delete strict while making rollback/cleanup idempotent.
- extract: Registry.Supports + SupportedMediaTypes (sorted, canonical); and TextArtifacts / SingleTextArtifact (structural: Blob==nil && Text!="") with ErrNoTextArtifact / ErrMultipleTextArtifacts. Multi-artifact policy stays the app's.
- extract/preset: opt-in document bundle. RegisterDocuments, NewDocumentRegistry, SupportedDocumentMediaTypes (delegates) wiring text/html/pdf/docx/markdown. Added to the depguard deny-list; importing it pulls the format deps by design, core stays clean.
- store/gcs: NewEmulator(ctx,bucket,endpoint) convenience over WithEndpoint + WithoutAuthentication for dev/test.

Also adds unit tests for the NewWithClient panics and NewEmulator validation, and an emulator round-trip integration test.

Verified: make lint, go test ./... (preset & store helpers 100%), go vet, go test -race, and make test-integration all pass under go1.26.4. Core remains dependency-free (0 cloud/adapter deps in ./store and ./extract).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pull request overview
This PR adds small, additive helper APIs across `content`, `store`, and `extract` to improve consumer ergonomics (hashing primitives, idempotent delete helpers, extractor registry introspection + text-artifact helpers), plus a convenience “document bundle” registry preset and a GCS emulator constructor. Core packages stay dependency-free, and optional deps are pushed into subpackages.
Changes:
- Add SHA-256 lowercase-hex helpers in `content` (bytes/string/streaming reader with ctx support).
- Add store-level helpers (`IsNotFound`, `DeleteIfExists`) and GCS adapter convenience (`NewEmulator`) with accompanying unit/integration tests.
- Add extractor registry helpers (`Supports`, `SupportedMediaTypes`, `TextArtifacts`, `SingleTextArtifact`) and an `extract/preset` bundle, plus depguard enforcement for the new preset package.
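As an illustration of the registry introspection helpers listed above, here is a minimal sketch of `Supports` and `SupportedMediaTypes` over a map keyed by canonical media type. The `Registry` layout, the `Extractor` function type, the `Register` method, and the `canonical` normalization (via `mime.ParseMediaType`) are all assumptions; the PR only states that the output is sorted and uses the same normalization as `Extract`.

```go
package main

import (
	"fmt"
	"mime"
	"sort"
	"strings"
)

// Extractor is a hypothetical stand-in for the library's extractor type.
type Extractor func(data []byte) (string, error)

// Registry maps canonical media types to extractors (assumed layout).
type Registry struct {
	byType map[string]Extractor
}

// canonical lowercases the media type and drops parameters such as charset.
// The real normalization rule is not shown in the PR; this is one plausible choice.
func canonical(mediaType string) string {
	if mt, _, err := mime.ParseMediaType(mediaType); err == nil {
		return mt
	}
	return strings.ToLower(strings.TrimSpace(mediaType))
}

// Register stores e under the canonical form of mediaType.
func (r *Registry) Register(mediaType string, e Extractor) {
	if r.byType == nil {
		r.byType = make(map[string]Extractor)
	}
	r.byType[canonical(mediaType)] = e
}

// Supports reports whether an extractor is registered for mediaType.
func (r *Registry) Supports(mediaType string) bool {
	_, ok := r.byType[canonical(mediaType)]
	return ok
}

// SupportedMediaTypes returns the registered canonical types in sorted order.
func (r *Registry) SupportedMediaTypes() []string {
	out := make([]string, 0, len(r.byType))
	for mt := range r.byType {
		out = append(out, mt)
	}
	sort.Strings(out)
	return out
}

func main() {
	r := &Registry{}
	r.Register("Text/HTML; charset=utf-8", nil)
	r.Register("application/pdf", nil)
	fmt.Println(r.Supports("text/html"), r.SupportedMediaTypes())
}
```

Routing both registration and lookup through one normalization function is what lets a preset's media-type list delegate to the registry without drifting from it.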
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `store/helpers.go` | Adds `IsNotFound` and `DeleteIfExists` helper APIs on top of `store.ObjectStore` semantics. |
| `store/helpers_test.go` | Unit tests covering the new store helper behaviors. |
| `store/gcs/gcs.go` | Adds `NewEmulator` convenience constructor for emulator endpoint + no-auth configuration. |
| `store/gcs/gcs_test.go` | Adds unit tests for `NewWithClient` panic behavior and `NewEmulator` validation. |
| `store/gcs/gcs_integration_test.go` | Adds an integration round-trip test using `NewEmulator`. |
| `extract/preset/preset.go` | Introduces opt-in document registry bundle wiring common extractors. |
| `extract/preset/preset_test.go` | Tests for preset registry support, extraction, sorting, and option pass-through. |
| `extract/helpers.go` | Adds `Registry.Supports`, `Registry.SupportedMediaTypes`, and text-artifact helpers + errors. |
| `extract/helpers_test.go` | Unit tests for new extract helpers. |
| `content/hash.go` | Adds SHA-256 lowercase-hex primitives, including streaming reader hashing with ctx awareness. |
| `content/hash_test.go` | Unit tests for hash helpers including known vectors and ctx cancellation. |
| `.golangci.yaml` | Extends depguard deny-list to prevent core packages importing `extract/preset`. |
…oc Metadata aliasing

Addresses external review + Copilot on #11.

- DeleteIfExists now calls IsNotFound (single source of truth for not-found).
- hash test compares byte count as int64 (no int conversion / overflow risk).
- TextArtifacts/SingleTextArtifact docs note the returned artifacts are shallow copies sharing Metadata (use content.Artifact.Clone for independence).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Additive, app-neutral helpers the Morris team asked for to smooth their cutover. No new dependencies; no `go.mod` change. Each leaves policy with the app.

What's added

- `content`: `SHA256HexBytes` / `SHA256HexString` / `SHA256HexReader(ctx)` content-addressing primitives (the §10 hash helper). Documented as primitives only: the app decides whether a digest is raw-source / artifact / chunk / cache identity. `SHA256HexReader` streams, returns the byte count, and honors ctx.
- `store`: `IsNotFound(err)` and `DeleteIfExists(ctx, s, key)`. Keeps `ObjectStore.Delete` strict (the contract stays `ErrObjectNotFound`) while giving Morris a clean idempotent path for rollback/cleanup.
- `extract`: `Registry.Supports` + `Registry.SupportedMediaTypes()` (sorted, canonical; same normalization as `Extract`); and `TextArtifacts` / `SingleTextArtifact`. Per review, "text" is structural (`Blob == nil && Text != ""`), and `SingleTextArtifact` returns `ErrNoTextArtifact` / `ErrMultipleTextArtifacts` so multi-artifact resolution stays the app's call.
- `extract/preset`: `RegisterDocuments`, `NewDocumentRegistry(opts…)`, `SupportedDocumentMediaTypes()` (delegates to the registry so it can't drift), wiring text/html/pdf/docx/markdown. Added to the depguard deny-list; importing it pulls the format deps by design while core stays clean.
- `store/gcs`: `NewEmulator(ctx, bucket, endpoint)` convenience over `WithEndpoint` + `WithoutAuthentication` for dev/test wiring.

Tests

New unit tests for every helper (preset & store-helpers at 100%), plus `NewWithClient` panic coverage, `NewEmulator` validation, and an emulator round-trip in the integration suite.

Decisions carried from review

- "Text" is structural (`Blob == nil && Text != ""`), not MediaType-based.
- Preset: `extract/preset` with `RegisterDocuments` / `NewDocumentRegistry` / `SupportedDocumentMediaTypes`.

Verification

`make lint`, `go test ./...`, `go vet`, `go test -race`, and `make test-integration` all pass under go1.26.4. Core remains dependency-free (`go list -deps ./store` and `./extract` show 0 cloud/adapter deps).

After merge this would be v0.2.0 (Morris bumps from v0.1.0 to pick these up).
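To make the structural "text" rule and the two sentinel errors concrete, here is a minimal sketch of how `TextArtifacts` and `SingleTextArtifact` could behave. The `Artifact` struct shown here (including `Blob` as `[]byte` and `Metadata` as a string map) and the function signatures are assumptions; only the predicate `Blob == nil && Text != ""`, the error names, and the shallow-copy note come from this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// Artifact is a hypothetical, reduced stand-in for content.Artifact.
type Artifact struct {
	MediaType string
	Text      string
	Blob      []byte
	Metadata  map[string]string
}

var (
	ErrNoTextArtifact        = errors.New("no text artifact")
	ErrMultipleTextArtifacts = errors.New("multiple text artifacts")
)

// isText applies the structural rule: no blob and non-empty text.
// MediaType is deliberately not consulted.
func isText(a Artifact) bool { return a.Blob == nil && a.Text != "" }

// TextArtifacts returns the text artifacts in input order. The results are
// shallow copies, so each one shares its Metadata map with the original.
func TextArtifacts(arts []Artifact) []Artifact {
	var out []Artifact
	for _, a := range arts {
		if isText(a) {
			out = append(out, a)
		}
	}
	return out
}

// SingleTextArtifact returns the only text artifact, or a sentinel error so
// the caller decides what to do with zero or several.
func SingleTextArtifact(arts []Artifact) (Artifact, error) {
	texts := TextArtifacts(arts)
	switch len(texts) {
	case 0:
		return Artifact{}, ErrNoTextArtifact
	case 1:
		return texts[0], nil
	default:
		return Artifact{}, ErrMultipleTextArtifacts
	}
}

func main() {
	arts := []Artifact{
		{Text: "hello", Metadata: map[string]string{"lang": "en"}},
		{Blob: []byte{0x89}, MediaType: "image/png"},
	}
	a, err := SingleTextArtifact(arts)
	a.Metadata["lang"] = "fr" // also changes arts[0].Metadata: the copy is shallow
	fmt.Println(a.Text, err, arts[0].Metadata["lang"])
}
```

Returning distinct errors for "none" and "several" keeps the helper free of policy: an app can choose to concatenate, pick the first, or fail, depending on which error it sees.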
🤖 Generated with Claude Code