Merged
25 changes: 25 additions & 0 deletions .github/workflows/integration.yml
@@ -0,0 +1,25 @@
name: Integration

# Manual-dispatch only: these tests need Docker (a fake-gcs-server emulator) and
# are not part of the default network-free CI. Run them before merging changes to
# real-service adapters such as store/gcs.
on:
  workflow_dispatch:

permissions:
  contents: read

jobs:
  integration:
    name: Integration (GCS emulator)
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Set up Go
        uses: actions/setup-go@v5
        with:
          go-version-file: go.mod
          cache: true
      - name: Integration tests
        run: make test-integration
15 changes: 14 additions & 1 deletion Makefile
@@ -1,4 +1,6 @@
.PHONY: build test test-coverage lint fix fix-imports tidy install-lint install-goimports install-hooks clean
.PHONY: build test test-integration test-coverage lint fix fix-imports tidy install-lint install-goimports install-hooks clean

FAKEGCS_CONTAINER := maestro-cms-fakegcs

# Build all packages.
build: lint
@@ -9,6 +11,17 @@ build: lint
test:
	go test -cover $(TESTARGS) ./...

# Run build-tagged integration tests against a Dockerized fake-gcs-server.
# Requires Docker. Starts the emulator, waits for readiness, runs the
# integration-tagged tests with STORAGE_EMULATOR_HOST set, then tears it down.
# Single test: make test-integration TESTARGS='-run TestGCSRoundTrip ./store/gcs/...'
test-integration:
	@docker rm -f $(FAKEGCS_CONTAINER) >/dev/null 2>&1 || true
	docker run -d --rm --name $(FAKEGCS_CONTAINER) -p 4443:4443 fsouza/fake-gcs-server -scheme http -backend memory -public-host localhost:4443 >/dev/null
	@for i in $$(seq 1 50); do curl -sf "http://localhost:4443/storage/v1/b?project=test" >/dev/null 2>&1 && break || sleep 0.2; done
	@STORAGE_EMULATOR_HOST=http://localhost:4443 go test -tags=integration $(TESTARGS) ./... ; \
	status=$$? ; docker stop $(FAKEGCS_CONTAINER) >/dev/null 2>&1 || true ; exit $$status

# Generate an HTML coverage report.
test-coverage:
	@mkdir -p coverage
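
The readiness wait in the `test-integration` recipe above is a bounded poll: up to 50 probes, 0.2 seconds apart. A minimal Go sketch of the same idea follows. The names `waitReady`, `httpProbe`, and `errNotReady` are hypothetical and are not part of this PR. Unlike the shell loop, which falls through to `go test` even if the emulator never answers, this sketch returns an error when every probe fails.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"time"
)

// errNotReady is a hypothetical sentinel for "the service never became ready".
var errNotReady = errors.New("emulator not ready")

// waitReady calls probe up to attempts times, sleeping delay after each
// failure. It returns nil on the first success, or an error wrapping
// errNotReady if every attempt fails.
func waitReady(probe func() error, attempts int, delay time.Duration) error {
	var last error
	for i := 0; i < attempts; i++ {
		if last = probe(); last == nil {
			return nil
		}
		time.Sleep(delay)
	}
	return fmt.Errorf("%w after %d attempts: %v", errNotReady, attempts, last)
}

// httpProbe returns a probe that succeeds on any 2xx response from url.
// This is roughly what `curl -sf` checks in the Makefile recipe.
func httpProbe(url string) func() error {
	client := &http.Client{Timeout: time.Second}
	return func() error {
		resp, err := client.Get(url)
		if err != nil {
			return err
		}
		defer resp.Body.Close()
		if resp.StatusCode < 200 || resp.StatusCode > 299 {
			return fmt.Errorf("unexpected status %s", resp.Status)
		}
		return nil
	}
}

func main() {
	// Simulated probe so the demo runs without Docker: it fails twice,
	// then succeeds on the third call.
	calls := 0
	simulated := func() error {
		calls++
		if calls < 3 {
			return errors.New("connection refused")
		}
		return nil
	}
	if err := waitReady(simulated, 50, 10*time.Millisecond); err != nil {
		fmt.Println("giving up:", err)
		return
	}
	fmt.Println("ready after", calls, "probes")
}
```

Against the real emulator, the probe would be `httpProbe("http://localhost:4443/storage/v1/b?project=test")` with 50 attempts and a 200 ms delay, matching the values in the recipe.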
30 changes: 15 additions & 15 deletions docs/deferred-tooling.md
@@ -4,21 +4,21 @@ Repo-tooling items intentionally left out of the initial scaffolding
(`maestro-llms` has them; we don't yet, because the trigger doesn't exist).
Add each when its trigger lands so we don't rediscover the gap later.

Status: items 1 and 4 open; items 2 and 3 done (kept here for the audit trail).

## 1. Integration test target + workflow

- **What `maestro-llms` has:** a `test-integration` Make target (OS-aware:
macOS routes through an ad-hoc-codesign script, Linux/CI runs
`go test -tags=integration`), a `test-integration-local` escape hatch, and a
manual-dispatch `integration.yml` GitHub workflow that runs live tests against
real services.
- **Why deferred:** `maestro-cms` has no integration tests yet. Core packages
(extract/chunk/content/tokens) are pure and unit-tested.
- **Add when:** the first adapter that talks to a real external service lands —
e.g. `store/gcs` (real GCS / emulator) or `index/pgvector` (real Postgres).
At that point add the build-tagged tests, the `test-integration` target, and a
manual-dispatch workflow; keep the default `make test` and CI network-free.
Status: item 4 open; items 1, 2, and 3 done (kept here for the audit trail).

## 1. Integration test target + workflow — DONE

- **Done:** landed with the first real-service adapter, `store/gcs`. A
`test-integration` Make target starts a Dockerized `fsouza/fake-gcs-server`
(no official Google GCS emulator exists), waits for readiness, runs the
`//go:build integration` tests with `STORAGE_EMULATOR_HOST` set, and tears the
container down. A manual-dispatch `integration.yml` workflow runs the same on
CI. The default `make test` and CI stay network-/Docker-free: the tagged tests
are excluded without the `integration` tag and `t.Skip` when the emulator host
is unset.
- **Extend when:** the next real-service adapter lands (e.g. `index/pgvector`
against real Postgres) — add its build-tagged tests under the same target. If a
macOS ad-hoc-codesign step is ever needed (as in `maestro-llms`), add it then.

## 2. golangci-lint depguard: core-must-not-import-adapters — DONE
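
Item 1 above describes two guards that keep the default `make test` free of network and Docker: the `//go:build integration` constraint, and a `t.Skip` when the emulator host is unset. The sketch below illustrates only the second guard as a plain function so it can run standalone. `emulatorHost` is a hypothetical helper, not code from this PR. In a real test file the first line would be `//go:build integration`, and the false branch would call `t.Skip`.

```go
package main

import (
	"fmt"
	"os"
)

// emulatorHost reports the configured emulator endpoint, if any.
// lookup has the same shape as os.LookupEnv so tests can inject a fake.
// An empty value is treated the same as an unset variable.
func emulatorHost(lookup func(string) (string, bool)) (string, bool) {
	host, ok := lookup("STORAGE_EMULATOR_HOST")
	if !ok || host == "" {
		return "", false
	}
	return host, true
}

func main() {
	// In an integration test this branch would be:
	//   if !ok { t.Skip("STORAGE_EMULATOR_HOST is unset") }
	if host, ok := emulatorHost(os.LookupEnv); ok {
		fmt.Println("would run integration tests against", host)
	} else {
		fmt.Println("would skip: STORAGE_EMULATOR_HOST is unset")
	}
}
```

Keeping both guards means a stray `go test -tags=integration ./...` without the emulator skips the tagged tests instead of failing them.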

2 changes: 1 addition & 1 deletion docs/spec-v1.md
@@ -222,7 +222,7 @@ type BatchFailure struct {
| `chunk` | ✅ | **Pure, boundary-aware**: segments at semantic boundaries (`Paragraphs` default; `Headings` for Markdown — fence-aware ATX/setext, ADR 0008; pluggable `Boundaries` for pages/sections/code/transcripts/caller units), packs units to a token budget, and hard-splits an oversize unit only as a last resort (code-aware for fenced spans: line-boundary cuts, ADR 0008). Token estimation is a budget *constraint*, not the strategy: injected `func(string) int` — standard injection `llms.EstimateTextTokens` (v0.6.0+), local rune-counted char/4 default. Imports no `maestro-llms`. |
| `content` | ✅ | `Source` + `Artifact` + media type + single-parent provenance + stable IDs + optional neutral metadata map. New, minimal code. |
| `embed` | ✅ | **Runner**, not a contract: `Run` takes `[]Input` (a chunk + its source/artifact provenance, so batches span documents), packs them by input-count and token budget, and embeds over `llms.EmbeddingClient`, returning persist-ready `Record`s in input order. Defensive ID matching (opaque per-input IDs, dup/missing/unknown → batch failure); retry is delegated to `llms` middleware (the runner does not retry); a failed batch is bisected by default to isolate a poison input (`DisableBisect` to opt out). Invalid inputs are reported as failures, never panics. The optional `extract→chunk→embed` `Pipeline` is deferred until a real consumer (Morris) shapes its ergonomics. (Failure semantics: ADR 0004; vocabulary: ADR 0001.) |
| `store` | ✅ | `Get/Put/Delete/Exists(key)` object-store interface — opaque, adapter-defined keys, **no path conventions** — plus optional GCS adapter. Clean lift from Morris. A `content.StoreHandle{Backend, Key}` names which adapter resolves a given key. |
| `store` | ✅ | `Get/Put/Delete/Exists(key)` object-store interface — opaque, adapter-defined keys, **no path conventions**. The optional `store/gcs` adapter (over `cloud.google.com/go/storage`, an opt-in subpackage per ADR 0006) has landed: keys are GCS object names verbatim, `storage.ErrObjectNotExist` maps to `store.ErrObjectNotFound`, and `Delete` reports not-found per this interface (not Morris's idempotent delete). Integration-tested against a Dockerized fake-gcs-server (`make test-integration`). A `content.StoreHandle{Backend, Key}` names which adapter resolves a given key. |
| `testcms` | ✅ | Deterministic fakes, including a fake embedder. |
| `retrieval`| v1.x | Search request/response, context-window, source-handle, citation contracts. Deferred until a consumer is ready (§6). |
| `graph` | v2 | Generic directed-graph primitive with caller-defined schema (ADR 0005). |
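
The `store` row above says the adapter maps `storage.ErrObjectNotExist` to `store.ErrObjectNotFound` and that `Delete` reports not-found instead of deleting idempotently. A minimal sketch of that mapping follows, using stand-ins so it runs without the GCS client. `errBackendNotExist`, `fakeBucket`, `mapErr`, and `isNotFound` are hypothetical names, and the actual adapter code may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// errBackendNotExist stands in for storage.ErrObjectNotExist (assumption:
// the real adapter checks the client library's sentinel instead).
var errBackendNotExist = errors.New("storage: object doesn't exist")

// ErrObjectNotFound stands in for store.ErrObjectNotFound.
var ErrObjectNotFound = errors.New("store: object not found")

// mapErr translates the backend's not-exist error into the store-level
// sentinel and passes every other error (including nil) through unchanged.
func mapErr(err error) error {
	if errors.Is(err, errBackendNotExist) {
		return fmt.Errorf("%w: %v", ErrObjectNotFound, err)
	}
	return err
}

// fakeBucket is a hypothetical in-memory backend keyed by object name.
type fakeBucket map[string][]byte

// remove deletes key and returns the backend's not-exist error if absent.
func (b fakeBucket) remove(key string) error {
	if _, ok := b[key]; !ok {
		return fmt.Errorf("delete %q: %w", key, errBackendNotExist)
	}
	delete(b, key)
	return nil
}

// Delete reports not-found for a missing key rather than succeeding silently.
func Delete(b fakeBucket, key string) error {
	return mapErr(b.remove(key))
}

// isNotFound lets callers test for the store-level sentinel.
func isNotFound(err error) bool {
	return errors.Is(err, ErrObjectNotFound)
}

func main() {
	b := fakeBucket{"docs/a.txt": []byte("hello")}
	fmt.Println("first delete error:", Delete(b, "docs/a.txt"))
	fmt.Println("second delete is not-found:", isNotFound(Delete(b, "docs/a.txt")))
}
```

Wrapping with `%w` keeps `errors.Is(err, ErrObjectNotFound)` working for callers while the backend's message stays in the error text for debugging.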
51 changes: 50 additions & 1 deletion go.mod
@@ -1,9 +1,58 @@
module github.com/SnapdragonPartners/maestro-cms

go 1.26.3
go 1.26.4

require (
	cloud.google.com/go/storage v1.62.3
	github.com/SnapdragonPartners/maestro-llms v0.7.1
	github.com/dslipak/pdf v0.0.2
	golang.org/x/net v0.55.0
	google.golang.org/api v0.274.0
)

require (
	cel.dev/expr v0.25.1 // indirect
	cloud.google.com/go v0.123.0 // indirect
	cloud.google.com/go/auth v0.19.0 // indirect
	cloud.google.com/go/auth/oauth2adapt v0.2.8 // indirect
	cloud.google.com/go/compute/metadata v0.9.0 // indirect
	cloud.google.com/go/iam v1.7.0 // indirect
	cloud.google.com/go/monitoring v1.24.3 // indirect
	github.com/GoogleCloudPlatform/opentelemetry-operations-go/detectors/gcp v1.31.0 // indirect
	github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/metric v0.55.0 // indirect
	github.com/GoogleCloudPlatform/opentelemetry-operations-go/internal/resourcemapping v0.55.0 // indirect
	github.com/cespare/xxhash/v2 v2.3.0 // indirect
	github.com/cncf/xds/go v0.0.0-20251210132809-ee656c7534f5 // indirect
	github.com/envoyproxy/go-control-plane/envoy v1.36.0 // indirect
	github.com/envoyproxy/protoc-gen-validate v1.3.0 // indirect
	github.com/felixge/httpsnoop v1.0.4 // indirect
	github.com/go-jose/go-jose/v4 v4.1.4 // indirect
	github.com/go-logr/logr v1.4.3 // indirect
	github.com/go-logr/stdr v1.2.2 // indirect
	github.com/google/s2a-go v0.1.9 // indirect
	github.com/google/uuid v1.6.0 // indirect
	github.com/googleapis/enterprise-certificate-proxy v0.3.14 // indirect
	github.com/googleapis/gax-go/v2 v2.21.0 // indirect
	github.com/planetscale/vtprotobuf v0.6.1-0.20240319094008-0393e58bdf10 // indirect
	github.com/spiffe/go-spiffe/v2 v2.6.0 // indirect
	go.opentelemetry.io/auto/sdk v1.2.1 // indirect
	go.opentelemetry.io/contrib/detectors/gcp v1.39.0 // indirect
	go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc v0.63.0 // indirect
	go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.61.0 // indirect
	go.opentelemetry.io/otel v1.43.0 // indirect
	go.opentelemetry.io/otel/metric v1.43.0 // indirect
	go.opentelemetry.io/otel/sdk v1.43.0 // indirect
	go.opentelemetry.io/otel/sdk/metric v1.43.0 // indirect
	go.opentelemetry.io/otel/trace v1.43.0 // indirect
	golang.org/x/crypto v0.51.0 // indirect
	golang.org/x/oauth2 v0.36.0 // indirect
	golang.org/x/sync v0.20.0 // indirect
	golang.org/x/sys v0.45.0 // indirect
	golang.org/x/text v0.37.0 // indirect
	golang.org/x/time v0.15.0 // indirect
	google.golang.org/genproto v0.0.0-20260319201613-d00831a3d3e7 // indirect
	google.golang.org/genproto/googleapis/api v0.0.0-20260401024825-9d38bb4040a9 // indirect
	google.golang.org/genproto/googleapis/rpc v0.0.0-20260401024825-9d38bb4040a9 // indirect
	google.golang.org/grpc v1.80.0 // indirect
	google.golang.org/protobuf v1.36.11 // indirect
)