huggingface · dacorvo · Jun 26, 2026 · Jun 25, 2026 · Jun 25, 2026
diff --git a/examples/transformers-agentic/.gitignore b/examples/transformers-agentic/.gitignore
@@ -0,0 +1,9 @@
+# Build artefacts and run outputs — never commit (the bundle alone is ~16 GB).
+# Anchored to this dir so they don't match like-named subdirs (e.g. the skill's
+# own transformers/ folder).
+/toolenv/
+/transformers/
+/inputs/
+/sandbox-*/
+/xet-test/
+/.agentcap/
diff --git a/examples/transformers-agentic/README.md b/examples/transformers-agentic/README.md
@@ -0,0 +1,69 @@
+# transformers-agentic
+
+agentcap port of the [`is-it-agentic-enough`](https://github.com/huggingface/is-it-agentic-enough)
+task suite (see the [blog post](https://huggingface.co/blog/is-it-agentic-enough)):
+16 prompts that each ask an agent to run a **named** Hugging Face model
+(classify sentiment, transcribe audio, caption an image, …) and report the
+result. Because each task pins a specific model, the agent has to actually
+load and run it rather than answer from world knowledge.
+
+Here it's used to **compare models/agents** through agentcap's capture path,
+not to reproduce the article's scoring. agentcap records the agent ↔ model
+wire traffic; match %, token counts, and CLI-vs-`pipeline()` marker analysis
+are the upstream harness's job (the captures contain what's needed to compute
+them later).
+
+## How the agent actually runs transformers
+
+The agent's task work executes **inside the podman sandbox**, which ships only
+the agent CLI and no transformers. The images are not rebuilt; instead, a
+self-contained, relocatable `transformers` bundle is mounted read-only via
+`agentcap run --tool-dir` and put on the agent's PATH:
+
+```bash
+./build-toolenv.sh        # one-time: builds ./toolenv/ + prewarms the model cache
+```
+
+`build-toolenv.sh` builds the bundle **inside `ubuntu:24.04`** — the base of
+every agentcap agent image — so the venv's interpreter and torch `.so`s are
+ABI-identical when mounted into any sandbox. It pins the exact transformers
+commit that carries the (still unreleased) agentic CLI, installs CPU torch, and
+prewarms every corpus model into `./toolenv/hf-cache/`. The venv configures
+itself to use that cache — a `.pth` points `HF_HOME` at it (resolved from the
+venv root, so it holds wherever the bundle is mounted) and defaults to offline —
+so runs read models from the read-only mount with no network or re-downloads.
+
+## Tiers (the article's discovery conditions)
+
+| `--tier` | what the agent gets |
+|---|---|
+| `bare`  | empty cwd; only the mounted `transformers` bundle |
+| `clone` | cwd is a git worktree of `./transformers` @ the bundle's commit (AGENTS.md / `cli/agentic/*.py` auto-discover) |
+| `skill` | empty cwd + the packaged transformers Skill (`./skill`) in context |
+
+## Run
+
+```bash
+# server: any OpenAI-compat /v1 on $UPSTREAM (default http://127.0.0.1:8001)
+./run.sh --agent pi     --model unsloth/GLM-4.5-Air-GGUF --tier skill
+./run.sh --agent hermes --model unsloth/GLM-4.5-Air-GGUF --tier bare
+```
+
+Run `./run.sh --help` for the env knobs. The script pins `AGENTCAP_WORKSPACE`
+to this directory, so runs live under `./.agentcap/`; list them with
+`agentcap ls` from here, and publish with `agentcap export <run-id|--all> --push <owner>/<dataset>`.
+
+`tasks.txt` is the full 16-task corpus; pass `--tasks <file>` to run a subset.
+
+## Caveats vs. the article
+
+- **One cwd per (agent, model, tier) run**, reused across the corpus's tasks
+  (agentcap runs a corpus in a single sandbox), where the article isolates each
+  task in its own worktree. File writes from one task can persist into the next.
+- The agentic CLI is unreleased; the bundle pins commit
+  `4d15b215f3` (`is-it-agentic-enough`'s "w/ CLI + Skill" ref).
+- Prewarm uses the classic HTTPS backend (`HF_HUB_DISABLE_XET=1`): xet stalled
+  once on a transient CAS hiccup during a long bulk download, and HTTPS is
+  steadier for a one-shot prewarm. (xet into the bind-mounted cache itself works
+  fine — verified; it's not a mount problem.) Runs read the cache offline, so
+  xet is never invoked at run time regardless.
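Review note on the README's "the venv configures itself" paragraph: the behaviour depends on `os.environ.setdefault`, so values that are already exported (as in the prewarm step) take precedence over the bundle's offline defaults. The sketch below mirrors the helper's logic, with the venv root and the environment passed in as arguments so it can be exercised outside a real venv. The function name `configure_hf_env` and the temporary directory are illustrative assumptions, not part of the PR.

```python
import os
import tempfile


def configure_hf_env(prefix: str, environ: dict) -> dict:
    """Apply the bundle's defaults, leaving any pre-set variables untouched."""
    cache = os.path.join(prefix, "hf-cache")
    if os.path.isdir(cache):
        # setdefault only fills in keys that are missing.
        environ.setdefault("HF_HOME", cache)
        environ.setdefault("HF_HUB_OFFLINE", "1")
        environ.setdefault("TRANSFORMERS_OFFLINE", "1")
    return environ


with tempfile.TemporaryDirectory() as bundle:
    os.mkdir(os.path.join(bundle, "hf-cache"))

    # Run time: nothing exported, so the offline defaults apply.
    run_time = configure_hf_env(bundle, {})

    # Prewarm: the online flags are exported first and survive.
    prewarm = configure_hf_env(
        bundle, {"HF_HUB_OFFLINE": "0", "TRANSFORMERS_OFFLINE": "0"}
    )

    # No hf-cache under the prefix: the helper is a no-op.
    no_cache = configure_hf_env(os.path.join(bundle, "missing"), {})

    print(run_time["HF_HUB_OFFLINE"], prewarm["HF_HUB_OFFLINE"], no_cache)
```

Because the cache path is derived from the prefix rather than hard-coded, the same logic holds at whatever path the bundle is mounted.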
diff --git a/examples/transformers-agentic/build-toolenv.sh b/examples/transformers-agentic/build-toolenv.sh
@@ -0,0 +1,121 @@
+#!/usr/bin/env bash
+# One-time builder for the self-contained `transformers` bundle the corpus
+# mounts via `agentcap run --tool-dir`. Everything (interpreter, torch, the
+# agentic-CLI transformers, and a prewarmed model cache) lives under ./toolenv,
+# built INSIDE ubuntu:24.04 — the exact base of every agentcap agent image — so
+# the venv's /usr/bin/python3.12 base and torch .so's are ABI-identical when the
+# bundle is mounted (read-only) into any agent sandbox.
+#
+#   ./toolenv/                relocatable venv (bin/transformers, bin/python, lib/)
+#   ./toolenv/hf-cache/       prewarmed HF cache; the venv points HF_HOME here
+#   ./transformers/           transformers checkout @ PINNED_SHA (clone-tier source)
+#   ./inputs/                 corpus inputs (cat.jpg, sample.wav), fetched from the blog repo
+#
+# Re-run to prewarm any missing models; the heavy build is skipped if ./toolenv
+# already has a working `transformers`. Pass HF_TOKEN for faster, rate-limit-free
+# downloads. The agentic CLI is unreleased, so we pin the exact commit.
+
+set -euo pipefail
+HERE="$(cd "$(dirname "$0")" && pwd)"
+
+# transformers @ the "agent-first CLI" effort (is-it-agentic-enough's
+# `4d15b215f3` / "w/ CLI + Skill"). Not on main, not in any release.
+PINNED_SHA="4d15b215f37bcb25a2d6472b2147e34b3d465186"
+
+# Every model named in the corpus (tasks.txt). Prewarmed so runs read offline.
+MODELS="
+distilbert/distilbert-base-uncased-finetuned-sst-2-english
+dslim/bert-base-NER
+openai/whisper-tiny
+llava-hf/llava-interleave-qwen-0.5b-hf
+HuggingFaceTB/SmolLM2-360M-Instruct
+facebook/bart-large-cnn
+distilbert/distilbert-base-cased-distilled-squad
+distilbert/distilbert-base-uncased
+facebook/bart-large-mnli
+google/vit-base-patch16-224
+facebook/detr-resnet-50
+laion/clap-htsat-unfused
+Helsinki-NLP/opus-mt-en-fr
+"
+
+# Corpus inputs (the cat image + audio clip the tasks reference) live in the
+# is-it-agentic-enough repo; fetch them at a pinned commit instead of vendoring
+# binaries here. Idempotent — skipped if already present.
+INPUTS_SHA="1655d61abf056c58ee2bc8682cb2f0d336ce31ae"
+INPUTS_URL="https://raw.githubusercontent.com/huggingface/is-it-agentic-enough/${INPUTS_SHA}/src/ae/data/inputs"
+mkdir -p "$HERE/inputs"
+for f in cat.jpg sample.wav; do
+    [ -f "$HERE/inputs/$f" ] || { echo ">>> fetching input $f"; curl -fsSL "$INPUTS_URL/$f" -o "$HERE/inputs/$f"; }
+done
+
+podman run --rm -i \
+    -e PINNED_SHA="$PINNED_SHA" -e MODELS="$MODELS" \
+    -e HF_TOKEN="${HF_TOKEN:-}" \
+    -e HF_HUB_DISABLE_XET=1 \
+    -v "$HERE:$HERE" -w "$HERE" \
+    ubuntu:24.04 bash -s <<'IN_CONTAINER'
+set -e
+export DEBIAN_FRONTEND=noninteractive
+apt-get update -q >/dev/null
+apt-get install -y -q --no-install-recommends python3 python3-venv python3-pip git ca-certificates >/dev/null
+HERE="$(pwd)"; TE="$HERE/toolenv"; TFSRC="$HERE/transformers"
+# Prewarm into the bundle's cache, explicitly online (the .pth written below
+# makes the venv default to offline at run time; setdefault leaves these be).
+export HF_HOME="$TE/hf-cache" HF_HUB_OFFLINE=0 TRANSFORMERS_OFFLINE=0
+
+if [ ! -x "$TE/bin/transformers" ]; then
+    echo ">>> fetching transformers @ $PINNED_SHA"
+    rm -rf "$TFSRC"; mkdir -p "$TFSRC"; cd "$TFSRC"
+    git init -q; git remote add origin https://github.com/huggingface/transformers
+    git fetch -q --depth 1 origin "$PINNED_SHA"; git checkout -q FETCH_HEAD
+    cd "$HERE"
+    echo ">>> building venv + CPU torch + transformers + task deps"
+    python3 -m venv "$TE"
+    "$TE/bin/pip" install -q --no-cache-dir --upgrade pip
+    "$TE/bin/pip" install -q --no-cache-dir torch torchvision --index-url https://download.pytorch.org/whl/cpu
+    "$TE/bin/pip" install -q --no-cache-dir "$TFSRC"
+    "$TE/bin/pip" install -q --no-cache-dir timm pillow sentencepiece sacremoses librosa soundfile scipy accelerate protobuf openai
+else
+    echo ">>> toolenv present; skipping build"
+fi
+
+# Self-configuring bundle: a .pth points HF_HOME at the in-bundle hf-cache,
+# resolved from the venv root (sys.prefix) so it holds wherever the bundle is
+# mounted read-only. The agent invokes the bundle's python/transformers, so HF
+# reads the prewarmed cache offline with no per-sandbox env setup. (Ubuntu venvs
+# don't auto-import sitecustomize, hence the .pth + helper module.)
+SP="$("$TE/bin/python" -c 'import sysconfig; print(sysconfig.get_path("purelib"))')"
+cat > "$SP/_agentcap_hf_home.py" <<'PY'
+import os
+import sys
+
+_cache = os.path.join(sys.prefix, "hf-cache")
+if os.path.isdir(_cache):
+    os.environ.setdefault("HF_HOME", _cache)
+    os.environ.setdefault("HF_HUB_OFFLINE", "1")
+    os.environ.setdefault("TRANSFORMERS_OFFLINE", "1")
+PY
+echo 'import _agentcap_hf_home' > "$SP/_agentcap_hf_home.pth"
+
+echo ">>> sanity: CLI + pipeline import"
+"$TE/bin/transformers" --help >/dev/null && echo "    transformers CLI OK"
+"$TE/bin/python" -c "from transformers import pipeline" && echo "    pipeline import OK"
+
+echo ">>> prewarming model cache (xet disabled)"
+for m in $MODELS; do
+    printf '    %-58s ' "$m"
+    if "$TE/bin/python" - "$m" <<'PY' 2>/tmp/dl.err
+import sys
+from huggingface_hub import snapshot_download
+# PyTorch + safetensors only; skip the TF/Flax/ONNX/Rust/GGUF weight copies
+# transformers never loads (they triple the download for no benefit).
+snapshot_download(sys.argv[1], ignore_patterns=[
+    "*.h5", "tf_model*", "*.msgpack", "flax_model*", "*.onnx", "onnx/**",
+    "*.tflite", "rust_model.ot", "*.gguf",
+])
+PY
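+    then echo "ok"; else echo "FAILED"; tail -2 /tmp/dl.err; failed=1; fi
+done
+echo ">>> DONE. bundle at $TE ($(du -sh "$TE" | cut -f1))"; [ -z "${failed:-}" ] || { echo ">>> some models FAILED to prewarm; re-run to retry" >&2; exit 1; }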
+IN_CONTAINER
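Review note on the prewarm filter in `build-toolenv.sh`: a quick way to check which files the `ignore_patterns` list would skip is to apply the same globs to a file listing. This sketch assumes the patterns are matched fnmatch-style against repo-relative paths, and the file listing is hypothetical rather than taken from any of the corpus models.

```python
from fnmatch import fnmatchcase

# Same patterns as the snapshot_download call in build-toolenv.sh.
IGNORE_PATTERNS = [
    "*.h5", "tf_model*", "*.msgpack", "flax_model*", "*.onnx", "onnx/**",
    "*.tflite", "rust_model.ot", "*.gguf",
]


def kept(files, patterns=IGNORE_PATTERNS):
    """Return the files that match none of the ignore patterns."""
    return [f for f in files if not any(fnmatchcase(f, p) for p in patterns)]


# Hypothetical repo listing, for illustration only.
repo_files = [
    "config.json",
    "model.safetensors",
    "pytorch_model.bin",
    "tokenizer.json",
    "tf_model.h5",
    "flax_model.msgpack",
    "rust_model.ot",
    "onnx/model.onnx",
]

print(kept(repo_files))
```

Under that assumption only the config, tokenizer, and PyTorch/safetensors weights remain, which matches the intent stated in the script's comment.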
diff --git a/examples/transformers-agentic/run.sh b/examples/transformers-agentic/run.sh
@@ -0,0 +1,95 @@
+#!/usr/bin/env bash
+# Drive the transformers-agentic corpus (the `is-it-agentic-enough` task suite)
+# through any registered agent, in one of three assistance tiers. Each task
+# names a specific HF model the agent must actually load and run, so the agent
+# needs a runnable `transformers` — provided by a self-contained, relocatable
+# bundle mounted via `agentcap run --tool-dir` (build it once with
+# ./build-toolenv.sh). The agent's own model is served on $UPSTREAM as usual.
+#
+# Tiers (the article's bare/clone/skill discovery conditions):
+#   bare   empty cwd; only the mounted transformers bundle is available.
+#   clone  cwd is a detached git worktree of ./transformers @ the bundle's
+#          commit, so AGENTS.md / cli/agentic/*.py auto-discover from cwd.
+#   skill  empty cwd + the packaged transformers Skill (./skill) in context.
+#
+# Usage:
+#   ./run.sh --agent <name> --model <id> [--tier bare|clone|skill] [--tasks <file>]
+#
+# Examples:
+#   ./run.sh --agent pi      --model unsloth/GLM-4.5-Air-GGUF --tier skill
+#   ./run.sh --agent hermes  --model unsloth/GLM-4.5-Air-GGUF --tier bare
+#
+# Captures land under $HERE/.agentcap/<run-id>/; publish with `agentcap export`.
+#
+# Env knobs:
+#   UPSTREAM   model server URL                http://127.0.0.1:8001
+#   TURNS      turns per task                  1
+#   FOLLOWUP   continue | templates | synthesized   continue
+#   TIMEOUT    per-turn timeout (seconds)      900
+
+set -euo pipefail
+HERE="$(cd "$(dirname "$0")" && pwd)"
+export AGENTCAP_WORKSPACE="$HERE"
+
+AGENT="" MODEL="" TIER="bare" TASKS="$HERE/tasks.txt"
+while [[ $# -gt 0 ]]; do
+    case "$1" in
+        --agent) AGENT="$2"; shift 2 ;;
+        --model) MODEL="$2"; shift 2 ;;
+        --tier)  TIER="$2";  shift 2 ;;
+        --tasks) TASKS="$2"; shift 2 ;;
+        -h|--help) sed -n '/^# Usage:/,/^set -euo/p' "$0" | sed -E 's/^# ?//; /^set -euo/d'; exit 0 ;;
+        *) echo "ERROR: unexpected arg: $1" >&2; exit 2 ;;
+    esac
+done
+[[ -n "$AGENT" && -n "$MODEL" ]] || { echo "ERROR: --agent and --model are required. See: $0 --help" >&2; exit 2; }
+[[ "$TIER" =~ ^(bare|clone|skill)$ ]] || { echo "ERROR: --tier must be bare|clone|skill" >&2; exit 2; }
+[[ -f "$TASKS" ]] || { echo "ERROR: tasks file not found: $TASKS" >&2; exit 2; }
+
+UPSTREAM="${UPSTREAM:-http://127.0.0.1:8001}"
+TURNS="${TURNS:-1}"
+FOLLOWUP="${FOLLOWUP:-continue}"
+TIMEOUT="${TIMEOUT:-900}"
+
+TOOLENV="$HERE/toolenv"
+[[ -x "$TOOLENV/bin/transformers" ]] || {
+    echo "ERROR: transformers bundle missing at $TOOLENV. Build it first:" >&2
+    echo "         ./build-toolenv.sh" >&2
+    exit 2
+}
+
+# Per-tier sandbox cwd, rebuilt fresh each invocation, with inputs/ seeded.
+SANDBOX="$HERE/sandbox-$TIER"
+if [[ -e "$SANDBOX/.git" ]]; then
+    git -C "$HERE/transformers" worktree remove --force "$SANDBOX" 2>/dev/null || true
+fi
+rm -rf "$SANDBOX"
+if [[ "$TIER" == "clone" ]]; then
+    [[ -d "$HERE/transformers/.git" ]] || { echo "ERROR: clone tier needs $HERE/transformers (built by ./build-toolenv.sh)" >&2; exit 2; }
+    SHA="$(git -C "$HERE/transformers" rev-parse HEAD)"
+    git -C "$HERE/transformers" worktree add --detach "$SANDBOX" "$SHA" >/dev/null
+else
+    mkdir -p "$SANDBOX"
+fi
+cp -r "$HERE/inputs" "$SANDBOX/inputs"
+
+# Only the skill tier passes --skills; empty otherwise. Expanded set-u-safe below
+# (bash before 4.4, e.g. macOS's 3.2, treats "${arr[@]}" of an empty array as unbound).
+skill_args=()
+[[ "$TIER" == "skill" ]] && skill_args=(--skills "$HERE/skill")
+
+echo ">>> agent=$AGENT model=$MODEL tier=$TIER tasks=$(basename "$TASKS") upstream=$UPSTREAM" >&2
+agentcap run \
+    --agent     "$AGENT" \
+    --model     "$MODEL" \
+    --upstream  "$UPSTREAM" \
+    --sandbox   "$SANDBOX" \
+    --tool-dir  "$TOOLENV" \
+    --label     "$TIER" \
+    "${skill_args[@]+"${skill_args[@]}"}" \
+    --tasks     "$TASKS" \
+    --turns     "$TURNS" \
+    --followup  "$FOLLOWUP" \
+    --timeout   "$TIMEOUT"
+
+echo "done. captures under $HERE/.agentcap/ (agentcap ls). publish: agentcap export" >&2
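Review note on how `run.sh` assembles the `agentcap run` invocation: the only per-tier difference in the argument list is the optional `--skills` pair. The sketch below rebuilds that list in Python to make the difference easy to see. The function `build_agentcap_argv` and the `/work` path are hypothetical; the flags and defaults are the ones that appear in the script.

```python
def build_agentcap_argv(agent, model, tier, here,
                        upstream="http://127.0.0.1:8001", tasks=None,
                        turns=1, followup="continue", timeout=900):
    """Assemble the argument list run.sh passes to `agentcap run`."""
    if tier not in ("bare", "clone", "skill"):
        raise ValueError("tier must be bare|clone|skill")
    # Only the skill tier contributes --skills; otherwise nothing is added.
    skill_args = ["--skills", f"{here}/skill"] if tier == "skill" else []
    return [
        "agentcap", "run",
        "--agent", agent,
        "--model", model,
        "--upstream", upstream,
        "--sandbox", f"{here}/sandbox-{tier}",
        "--tool-dir", f"{here}/toolenv",
        "--label", tier,
        *skill_args,
        "--tasks", tasks or f"{here}/tasks.txt",
        "--turns", str(turns),
        "--followup", followup,
        "--timeout", str(timeout),
    ]


bare = build_agentcap_argv("hermes", "unsloth/GLM-4.5-Air-GGUF", "bare", "/work")
skill = build_agentcap_argv("pi", "unsloth/GLM-4.5-Air-GGUF", "skill", "/work")
print(" ".join(skill))
```

Splicing a possibly empty list avoids passing an empty-string argument, which is the same concern the script's `${skill_args[@]+...}` expansion addresses in bash.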
diff --git a/examples/transformers-agentic/skill/agents/AGENTS.md b/examples/transformers-agentic/skill/agents/AGENTS.md
@@ -0,0 +1,17 @@
+<skills>
+
+You have additional SKILLs documented in directories containing a "SKILL.md" file.
+
+These skills are:
+ - transformers -> "skills/transformers/SKILL.md"
+
+IMPORTANT: You MUST read the SKILL.md file whenever the description of the skills matches the user intent, or may help accomplish their task.
+
+<available_skills>
+
+transformers: `Run one-off Hugging Face Transformers inference from the command line — classify, ner, qa, fill-mask, summarize, translate, tokenize, caption, image-classify, detect, vqa, transcribe, audio-classify, generate, and more. Use this skill whenever a task asks you to run a named model on text, an image, or audio: invoke the `transformers` CLI (e.g. `transformers --format json classify --text "..." --model ...`) rather than hand-writing a Python `pipeline(...)` script. Run `transformers --help` for the full command list.`
+</available_skills>
+
+Paths referenced within SKILL folders are relative to that SKILL.
+
+</skills>