Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
25 changes: 22 additions & 3 deletions .github/workflows/test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -68,9 +68,22 @@ jobs:

# SQL E2E: build the worker fat JAR, then run the sqllogictest suite against it
# through the real signed `vgi` community extension via standalone haybarn.
#
# The SAME suite runs once per VGI transport (subprocess/stdio, HTTP, AF_UNIX
# launcher). The vgi extension picks the transport from the worker LOCATION:
# subprocess -> `java -jar <jar>` (DuckDB spawns it, Arrow over stdio)
# http -> `http://127.0.0.1:<p>` (we boot `--http`, wait /health 200)
# unix -> `unix://<sock>` (we boot `--unix <sock>`, wait sock)
# The vgi Java SDK Worker.runFromArgs supports --http/--host/--port (prints
# `PORT:<n>` on stdout, serves /health) and --unix <path>. See ci/README.md.
integration:
needs: resolve-haybarn
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
transport: [subprocess, http, unix]
name: SQL E2E (${{ matrix.transport }})
steps:
- uses: actions/checkout@v4

Expand Down Expand Up @@ -98,15 +111,21 @@ jobs:
- name: Resolve runner + worker paths
run: |
# Absolute paths: run-integration.sh cd's into a staging dir before
# invoking the runner, and DuckDB spawns the worker LOCATION as a
# subprocess — so both must be absolute.
# invoking the runner, and (subprocess) DuckDB spawns the worker
# LOCATION as a subprocess — so both must be absolute. For http/unix
# the script boots the JAR itself (cwd = stage dir).
UNITTEST="$PWD/$(find hb -name 'haybarn-unittest' -type f | head -1)"
chmod +x "$UNITTEST"
JAR="$PWD/$(ls build/libs/vgi-ical-*-all.jar | head -1)"
echo "Worker JAR: $JAR"
echo "HAYBARN_UNITTEST=$UNITTEST" >> "$GITHUB_ENV"
# The fat JAR manifest sets Add-Opens, so a bare `java -jar` works.
# subprocess transport uses this stdio command; http/unix derive the
# JAR path from it (or from VGI_ICAL_WORKER_JAR). The fat JAR manifest
# sets Add-Opens, so a bare `java -jar` works.
echo "VGI_ICAL_WORKER=java -jar $JAR" >> "$GITHUB_ENV"
echo "VGI_ICAL_WORKER_JAR=$JAR" >> "$GITHUB_ENV"

- name: Run SQL E2E suite
run: ci/run-integration.sh
env:
TRANSPORT: ${{ matrix.transport }}
81 changes: 72 additions & 9 deletions ci/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -25,23 +25,86 @@ extension from the Haybarn community channel:
{community,core}; LOAD <ext>;`. `require-env` and everything else pass
through untouched.
4. **Run** — [`run-integration.sh`](run-integration.sh) stages the preprocessed
tree plus the committed `.ics` fixtures, points `VGI_ICAL_WORKER` at the fat
JAR command, **warms the extension cache once** (`INSTALL vgi FROM
community;`) so each test file's explicit `LOAD vgi;` succeeds, then runs the
suite in a single `haybarn-unittest` invocation. Any failed assertion exits
non-zero and fails the job.
tree plus the committed `.ics` fixtures, brings up the worker on the chosen
transport, points `VGI_ICAL_WORKER` at it, **warms the extension cache once**
(`INSTALL vgi FROM community;`) so each test file's explicit `LOAD vgi;`
succeeds, then runs the suite in a single `haybarn-unittest` invocation. Any
failed assertion exits non-zero and fails the job.

## Transport matrix

The `vgi` extension chooses the transport from the worker `LOCATION` string, so
the **same suite** exercises a different transport just by changing what
`VGI_ICAL_WORKER` resolves to. `run-integration.sh` is parameterized by a
`TRANSPORT` env var (default `subprocess`), and the `integration` CI job runs it
once per transport (`strategy.matrix.transport: [subprocess, http, unix]`):

| `TRANSPORT` | `VGI_ICAL_WORKER` | how the worker is started | readiness gate |
|--------------|-------------------------|---------------------------|----------------|
| `subprocess` | `java -jar <jar>` | DuckDB spawns it per attach (Arrow-IPC over stdio) | n/a |
| `http` | `http://127.0.0.1:<p>` | this script boots `java -jar <jar> --http --host 127.0.0.1 --port <p>` (default 18110) | poll `GET /health` until HTTP 200 |
| `unix` | `unix://<sock>` | this script boots `java -jar <jar> --unix <sock>` | wait for the AF_UNIX socket file to appear |

The vgi Java SDK's `Worker.runFromArgs` (farm.query:vgi) supports these flags
directly: `--http`/`--host`/`--port` (it serves `/health` and also prints
`PORT:<n>` to stdout, so `--port 0` would advertise the chosen port), and
`--unix <path>` (the AF_UNIX launcher), plus an optional `--idle-timeout`.

For `http`/`unix` the worker is started **by this script** rather than by
DuckDB, so it is launched with **cwd = the stage dir** — the `.test` files
reference fixtures by the relative path `test/sql/data/*.ics`, which the worker
resolves against its own cwd. (For `subprocess`, DuckDB spawns the worker with
that same cwd.) In all three cases the committed `.ics` fixtures are staged into
the stage dir, so fixture resolution is identical across transports.

The suite itself is transport-agnostic — pure table-function / scalar reads with
SQL-side `ORDER BY` (the runner sorts results), no inline log streaming,
partition-local state, input-buffering, or wire-order-dependent assertions — so
no test needs a per-transport gate; all three legs run the full suite.

### httpfs is required for the HTTP transport

The `vgi` extension's HTTP transport uses DuckDB's HTTP client, which lives in
the **`httpfs`** extension — without it, ATTACH over an `http://` LOCATION raises
`VGI HTTP transport requires the httpfs extension`. On the standalone
`haybarn-unittest` runner, extension auto-load/auto-install is off, so `httpfs`
must be installed and loaded explicitly. Each `.test` therefore carries a
`require httpfs` line (preprocessed into `INSTALL httpfs FROM core; LOAD
httpfs;`) and the warm step installs it once. `httpfs` is a harmless no-op for
the subprocess/unix transports.

This was the one real failure surfaced by adding the http leg: before the
`httpfs` fix, the http job appeared green but every test was **silently skipped**
— some `haybarn-unittest` builds carry a built-in `skip on error_message
matching 'HTTP'` rule, and the `httpfs`-missing BinderException contains the
substring `HTTP`, so the runner skipped rather than failed. Always confirm the
http leg reports `All tests passed`, not `All tests were skipped`.

## Run it locally

```bash
./gradlew shadowJar # build the fat JAR
# point HAYBARN_UNITTEST at a haybarn-unittest binary (or `uv tool install
# haybarn-unittest`), and the worker at the fat JAR command:
JAR="$PWD/build/libs/$(ls build/libs | grep -- -all.jar | head -1)"

# subprocess (default):
HAYBARN_UNITTEST=/path/to/haybarn-unittest \
VGI_ICAL_WORKER="java -jar $JAR" \
ci/run-integration.sh

# http / unix: the script boots the JAR itself, so give it the JAR path:
HAYBARN_UNITTEST=/path/to/haybarn-unittest TRANSPORT=http \
VGI_ICAL_WORKER_JAR="$JAR" ci/run-integration.sh
HAYBARN_UNITTEST=/path/to/haybarn-unittest TRANSPORT=unix \
VGI_ICAL_WORKER_JAR="$JAR" ci/run-integration.sh
```

Or use `make test-sql`, which builds the JAR and runs the suite via a
`haybarn-unittest` already on PATH.
Or use `make test-sql`, which builds the JAR and runs the (subprocess) suite via
a `haybarn-unittest` already on PATH.

> Note: some prebuilt `haybarn-unittest` binaries carry a built-in
> `skip on error_message matching 'HTTP'` rule that silently skips a `.test`
> file whenever *any* error string contains `HTTP` — including unrelated local
> issues like a community-extension download miss for your dev platform. If the
> `http` leg reports "All tests were skipped" locally, that is this masking rule,
> not a worker fault; the Linux CI runner (where the signed extension installs
> cleanly) is the source of truth for the `http` leg.
155 changes: 145 additions & 10 deletions ci/run-integration.sh
Original file line number Diff line number Diff line change
Expand Up @@ -5,22 +5,48 @@
# VGI worker, using a prebuilt standalone `haybarn-unittest` and the signed
# community `vgi` extension — no C++ build from source. See ci/README.md.
#
# Parameterized by TRANSPORT (the same suite, a different VGI transport):
# subprocess (default) — VGI_ICAL_WORKER is a stdio command; DuckDB spawns
# the fat JAR per attach and speaks Arrow-IPC over its
# stdio. cwd of the spawned worker is the stage dir.
# http — we start the fat JAR once in `--http` mode bound to a fixed
# high port, wait for `/health` to return 200, then point
# VGI_ICAL_WORKER at http://127.0.0.1:<port>.
# unix — we start the fat JAR once in `--unix <sock>` (AF_UNIX
# launcher) mode, wait for the socket to appear, then point
# VGI_ICAL_WORKER at unix://<sock>.
#
# For http/unix the worker is started by THIS script (not by DuckDB), so it is
# launched with cwd = the stage dir — the .test files reference fixtures by the
# relative path test/sql/data/*.ics, which the worker resolves against its own
# cwd. (The stage dir is where we cp the committed fixtures.)
#
# Required environment:
# HAYBARN_UNITTEST path to the haybarn-unittest binary
# VGI_ICAL_WORKER worker LOCATION the .test files attach (a stdio command
# such as `java -jar /abs/path/vgi-ical-<ver>-all.jar`, or
# an http:// URL)
# VGI_ICAL_WORKER for subprocess: the stdio command (e.g.
# `java -jar /abs/path/vgi-ical-<ver>-all.jar`).
# For http/unix it is computed here, so set
# VGI_ICAL_WORKER_JAR (or VGI_ICAL_WORKER as the bare jar
# command) instead — see below.
# Optional:
# TRANSPORT subprocess | http | unix (default: subprocess)
# VGI_ICAL_WORKER_JAR abs path to the fat JAR (required for http/unix; if
# unset, derived from VGI_ICAL_WORKER by stripping a
# leading `java -jar `).
# HTTP_PORT fixed port for http mode (default: 18110)
# STAGE scratch dir for the preprocessed test tree (default: mktemp)
set -euo pipefail

TRANSPORT="${TRANSPORT:-subprocess}"

: "${HAYBARN_UNITTEST:?path to the haybarn-unittest binary}"
: "${VGI_ICAL_WORKER:?worker LOCATION (stdio command or http:// URL)}"

HERE="$(cd "$(dirname "$0")" && pwd)"
REPO="$(cd "$HERE/.." && pwd)"
STAGE="${STAGE:-$(mktemp -d)}"

echo "Transport: $TRANSPORT"

echo "Staging preprocessed tests into $STAGE ..."
mkdir -p "$STAGE/test/sql"
for f in "$REPO"/test/sql/*.test; do
Expand All @@ -29,28 +55,137 @@ done

# The .test files reference committed fixtures under test/sql/data via relative
# paths; stage them alongside the preprocessed tests so the runner (which cd's
# into $STAGE) resolves them.
# into $STAGE) and the out-of-band http/unix worker (started with cwd=$STAGE)
# both resolve them.
if [ -d "$REPO/test/sql/data" ]; then
cp -R "$REPO/test/sql/data" "$STAGE/test/sql/data"
fi

# ---------------------------------------------------------------------------
# Bring up the worker transport and export VGI_ICAL_WORKER for the .test files.
# ---------------------------------------------------------------------------
WORKER_PID=""
cleanup() {
if [ -n "$WORKER_PID" ]; then
kill "$WORKER_PID" 2>/dev/null || true
wait "$WORKER_PID" 2>/dev/null || true
fi
}
trap cleanup EXIT INT TERM

# Resolve the fat JAR command for http/unix. Prefer VGI_ICAL_WORKER_JAR; else
# strip a leading `java -jar ` from VGI_ICAL_WORKER.
resolve_jar() {
if [ -n "${VGI_ICAL_WORKER_JAR:-}" ]; then
printf '%s' "$VGI_ICAL_WORKER_JAR"
return
fi
if [ -n "${VGI_ICAL_WORKER:-}" ]; then
# e.g. "java -jar /abs/x-all.jar" -> "/abs/x-all.jar"
printf '%s' "${VGI_ICAL_WORKER#java -jar }"
return
fi
echo "::error::http/unix transport needs VGI_ICAL_WORKER_JAR (or VGI_ICAL_WORKER as the jar command)" >&2
exit 1
}

case "$TRANSPORT" in
subprocess)
: "${VGI_ICAL_WORKER:?worker LOCATION (stdio command) for subprocess transport}"
echo "subprocess worker: $VGI_ICAL_WORKER"
;;

http)
JAR="$(resolve_jar)"
PORT="${HTTP_PORT:-18110}"
echo "Starting fat JAR in --http mode on 127.0.0.1:$PORT (cwd=$STAGE) ..."
# cwd = $STAGE so relative fixture paths (test/sql/data/*.ics) resolve.
( cd "$STAGE" && exec java -jar "$JAR" --http --host 127.0.0.1 --port "$PORT" ) \
> "$STAGE/worker-http.log" 2>&1 &
WORKER_PID=$!

# Readiness: poll /health until it returns 200 (and the worker is alive).
ready=""
for _ in $(seq 1 60); do
if ! kill -0 "$WORKER_PID" 2>/dev/null; then
echo "::error::http worker exited during startup; log follows:" >&2
cat "$STAGE/worker-http.log" >&2 || true
exit 1
fi
if curl -fsS "http://127.0.0.1:$PORT/health" -o /dev/null 2>/dev/null; then
ready=1
break
fi
sleep 0.5
done
if [ -z "$ready" ]; then
echo "::error::http worker /health never returned 200; log follows:" >&2
cat "$STAGE/worker-http.log" >&2 || true
exit 1
fi
echo "http worker healthy on port $PORT (also advertised PORT:<n> on stdout)"
export VGI_ICAL_WORKER="http://127.0.0.1:$PORT"
;;

unix)
JAR="$(resolve_jar)"
SOCK="${UNIX_SOCK:-$STAGE/vgi-ical.sock}"
echo "Starting fat JAR in --unix mode on $SOCK (cwd=$STAGE) ..."
( cd "$STAGE" && exec java -jar "$JAR" --unix "$SOCK" ) \
> "$STAGE/worker-unix.log" 2>&1 &
WORKER_PID=$!

# Readiness: wait for the AF_UNIX socket file to appear (and worker alive).
ready=""
for _ in $(seq 1 60); do
if ! kill -0 "$WORKER_PID" 2>/dev/null; then
echo "::error::unix worker exited during startup; log follows:" >&2
cat "$STAGE/worker-unix.log" >&2 || true
exit 1
fi
if [ -S "$SOCK" ]; then
ready=1
break
fi
sleep 0.5
done
if [ -z "$ready" ]; then
echo "::error::unix worker socket never appeared; log follows:" >&2
cat "$STAGE/worker-unix.log" >&2 || true
exit 1
fi
echo "unix worker listening on $SOCK"
export VGI_ICAL_WORKER="unix://$SOCK"
;;

*)
echo "::error::unknown TRANSPORT '$TRANSPORT' (want subprocess|http|unix)" >&2
exit 1
;;
esac

cd "$STAGE"

# Warm the extension cache once: vgi from the signed community channel. A miss
# here is only a warning — but it is what provisions the signed `vgi`
# extension so each test file's explicit `LOAD vgi;` succeeds on a clean runner.
echo "Warming the extension cache (vgi from community) ..."
# Warm the extension cache once: vgi from the signed community channel, plus
# httpfs from core (the vgi HTTP transport requires httpfs for DuckDB's HTTP
# client). A miss here is only a warning — but it is what provisions the signed
# extensions so each test file's explicit `LOAD vgi;` / `require httpfs` succeed
# on a clean runner. httpfs is harmless for the subprocess/unix transports.
echo "Warming the extension cache (vgi from community, httpfs from core) ..."
mkdir -p "$STAGE/test"
cat > "$STAGE/test/_warm.test" <<'EOF'
# name: test/_warm.test
# group: [warm]
statement ok
INSTALL vgi FROM community;

statement ok
INSTALL httpfs FROM core;
EOF
"$HAYBARN_UNITTEST" "test/_warm.test" >/dev/null 2>&1 || echo "::warning::extension warm step did not fully succeed"
rm -f "$STAGE/test/_warm.test"

# Run the whole suite in one invocation, streaming the runner's native
# sqllogictest report. Any failed assertion exits non-zero and fails the job.
echo "Running suite (worker: $VGI_ICAL_WORKER) ..."
echo "Running suite (transport: $TRANSPORT, worker: $VGI_ICAL_WORKER) ..."
"$HAYBARN_UNITTEST" "test/sql/*"
6 changes: 6 additions & 0 deletions test/sql/events.test
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,12 @@

require-env VGI_ICAL_WORKER

# httpfs supplies DuckDB's HTTP client, which the vgi extension requires for the
# HTTP worker transport (http:// LOCATION); it is a harmless no-op for the
# subprocess and unix transports. CI's preprocess-require.awk rewrites this into
# an explicit `INSTALL httpfs FROM core; LOAD httpfs;`.
require httpfs

# NOTE: under haybarn-unittest `require vgi` SKIPS the file; load explicitly.
statement ok
LOAD vgi;
Expand Down
5 changes: 5 additions & 0 deletions test/sql/scalars.test
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,11 @@

require-env VGI_ICAL_WORKER

# httpfs supplies DuckDB's HTTP client, which the vgi extension requires for the
# HTTP worker transport (http:// LOCATION); a no-op for subprocess/unix. CI's
# preprocess-require.awk rewrites this into `INSTALL httpfs FROM core; LOAD httpfs;`.
require httpfs

statement ok
LOAD vgi;

Expand Down
5 changes: 5 additions & 0 deletions test/sql/todos.test
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,11 @@

require-env VGI_ICAL_WORKER

# httpfs supplies DuckDB's HTTP client, which the vgi extension requires for the
# HTTP worker transport (http:// LOCATION); a no-op for subprocess/unix. CI's
# preprocess-require.awk rewrites this into `INSTALL httpfs FROM core; LOAD httpfs;`.
require httpfs

statement ok
LOAD vgi;

Expand Down
Loading