Merged
9 changes: 9 additions & 0 deletions examples/transformers-agentic/.gitignore
@@ -0,0 +1,9 @@
# Build artefacts and run outputs — never commit (the bundle alone is ~16 GB).
# Anchored to this dir so they don't match like-named subdirs (e.g. the skill's
# own transformers/ folder).
/toolenv/
/transformers/
/inputs/
/sandbox-*/
/xet-test/
/.agentcap/
69 changes: 69 additions & 0 deletions examples/transformers-agentic/README.md
@@ -0,0 +1,69 @@
# transformers-agentic

agentcap port of the [`is-it-agentic-enough`](https://github.com/huggingface/is-it-agentic-enough)
task suite (see the [blog post](https://huggingface.co/blog/is-it-agentic-enough)):
16 prompts that each ask an agent to run a **named** Hugging Face model
(classify sentiment, transcribe audio, caption an image, …) and report the
result. Because each task pins a specific model, the agent has to actually
load and run it rather than answer from world knowledge.

Here it's used to **compare models/agents** through agentcap's capture path,
not to reproduce the article's scoring. agentcap records the agent ↔ model
wire traffic; match %, token counts, and CLI-vs-`pipeline()` marker analysis
remain the upstream harness's job (the captures contain what's needed to
compute them later).

## How the agent actually runs transformers

The agent's task work executes **inside the podman sandbox**, which ships only
the agent CLI — no transformers. Rather than rebuild the images, a
self-contained, relocatable `transformers` bundle is mounted read-only via
`agentcap run --tool-dir` and put on the agent's PATH:

```bash
./build-toolenv.sh # one-time: builds ./toolenv/ + prewarms the model cache
```

`build-toolenv.sh` builds the bundle **inside `ubuntu:24.04`** — the base of
every agentcap agent image — so the venv's interpreter and torch `.so`s are
ABI-identical when mounted into any sandbox. It pins the exact transformers
commit that carries the (still unreleased) agentic CLI, installs CPU torch, and
prewarms every corpus model into `./toolenv/hf-cache/`. The venv configures
itself to use that cache — a `.pth` points `HF_HOME` at it (resolved from the
venv root, so it holds wherever the bundle is mounted) and defaults to offline —
so runs read models from the read-only mount with no network or re-downloads.
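The `.pth` mechanism described above can be tried in isolation. The following is a minimal sketch, not the bundle's actual helper: it uses hypothetical names (`_demo_hf_home`, `DEMO_HF_HOME`, `DEMO_HF_OFFLINE`) so it cannot clobber real settings, and it resolves the cache from the helper's own location rather than `sys.prefix` so that it runs outside a venv. It relies only on the documented behaviour that Python's `site` module executes any `.pth` line beginning with `import`.

```python
import os
import site
import tempfile
import textwrap


def demo_pth_bootstrap() -> str:
    """Create a throwaway site dir whose .pth file sets env defaults on load."""
    root = tempfile.mkdtemp(prefix="pth-demo-")
    os.makedirs(os.path.join(root, "hf-cache"))

    # Helper module: same shape as the bundle's helper, but with hypothetical
    # variable names and a __file__-relative cache path (an assumption made
    # so the demo does not depend on sys.prefix).
    with open(os.path.join(root, "_demo_hf_home.py"), "w") as fh:
        fh.write(textwrap.dedent('''
            import os
            _here = os.path.dirname(os.path.abspath(__file__))
            _cache = os.path.join(_here, "hf-cache")
            if os.path.isdir(_cache):
                os.environ.setdefault("DEMO_HF_HOME", _cache)
                os.environ.setdefault("DEMO_HF_OFFLINE", "1")
        '''))

    # A .pth line that starts with "import" is executed when the directory
    # is processed as a site directory.
    with open(os.path.join(root, "_demo_hf_home.pth"), "w") as fh:
        fh.write("import _demo_hf_home\n")

    # In a real venv this happens at interpreter start-up for site-packages;
    # here it is triggered by hand.
    site.addsitedir(root)
    return root


ROOT = demo_pth_bootstrap()
print(os.environ["DEMO_HF_HOME"])
print(os.environ["DEMO_HF_OFFLINE"])
```

Because the helper uses `setdefault`, a value already present in the environment wins, which is how the prewarm step can stay online while runs default to offline.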

## Tiers (the article's discovery conditions)

| `--tier` | what the agent gets |
|---|---|
| `bare` | empty cwd; only the mounted `transformers` bundle |
| `clone` | cwd is a git worktree of `./transformers` @ the bundle's commit (AGENTS.md / `cli/agentic/*.py` auto-discover) |
| `skill` | empty cwd + the packaged transformers Skill (`./skill`) in context |

## Run

```bash
# server: any OpenAI-compat /v1 on $UPSTREAM (default http://127.0.0.1:8001)
./run.sh --agent pi --model unsloth/GLM-4.5-Air-GGUF --tier skill
./run.sh --agent hermes --model unsloth/GLM-4.5-Air-GGUF --tier bare
```

Run `./run.sh --help` for the env knobs. It pins `AGENTCAP_WORKSPACE` to this
directory, so runs live under `./.agentcap/`. List them with `agentcap ls` from
this directory, and publish with `agentcap export <run-id|--all> --push <owner>/<dataset>`.

`tasks.txt` is the full 16-task corpus; pass `--tasks <file>` to run a subset.

## Caveats vs. the article

- **One cwd per (agent, model, tier) run**, reused across the corpus's tasks
  (agentcap runs a corpus in a single sandbox), whereas the article isolates each
  task in its own worktree. File writes from one task can persist into the next.
- The agentic CLI is unreleased; the bundle pins commit
  `4d15b215f3` (`is-it-agentic-enough`'s "w/ CLI + Skill" ref).
- Prewarm uses the classic HTTPS backend (`HF_HUB_DISABLE_XET=1`): xet stalled
  once on a transient CAS hiccup during a long bulk download, and HTTPS is
  steadier for a one-shot prewarm. (Downloading through xet into the bind-mounted
  cache itself works fine, as verified, so it's not a mount problem.) Runs read
  the cache offline, so xet is never invoked at run time regardless.
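Since runs read the cache offline, a model that failed to prewarm only surfaces once a task tries to load it. Below is a minimal sketch of a pre-flight check, assuming the standard Hugging Face hub cache layout (`$HF_HOME/hub/models--<org>--<name>/snapshots/<revision>/`) and no `HF_HUB_CACHE` override. The function names are hypothetical and not part of agentcap or `huggingface_hub`.

```python
import tempfile
from pathlib import Path


def repo_cache_dir(hf_home: Path, repo_id: str) -> Path:
    """Map a model repo id to its folder under the hub cache (assumed layout)."""
    return hf_home / "hub" / ("models--" + repo_id.replace("/", "--"))


def missing_models(hf_home: Path, repo_ids: list[str]) -> list[str]:
    """Return the repo ids that have no downloaded snapshot under hf_home."""
    missing = []
    for repo_id in repo_ids:
        snapshots = repo_cache_dir(hf_home, repo_id) / "snapshots"
        # A prewarmed model has at least one revision directory here.
        if not (snapshots.is_dir() and any(snapshots.iterdir())):
            missing.append(repo_id)
    return missing


# Demo against a fake cache: one model "prewarmed", one absent.
with tempfile.TemporaryDirectory() as tmp:
    hf_home = Path(tmp)
    fake_snapshot = repo_cache_dir(hf_home, "openai/whisper-tiny") / "snapshots" / "abc123"
    fake_snapshot.mkdir(parents=True)
    MISSING = missing_models(hf_home, ["openai/whisper-tiny", "dslim/bert-base-NER"])

print(MISSING)
```

Against the real bundle, `hf_home` would be `./toolenv/hf-cache` and the list would come from the `MODELS` variable in `build-toolenv.sh`. The check only confirms that a snapshot directory exists, not that every file in it is complete.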
121 changes: 121 additions & 0 deletions examples/transformers-agentic/build-toolenv.sh
@@ -0,0 +1,121 @@
#!/usr/bin/env bash
# One-time builder for the self-contained `transformers` bundle the corpus
# mounts via `agentcap run --tool-dir`. Everything (interpreter, torch, the
# agentic-CLI transformers, and a prewarmed model cache) lives under ./toolenv,
# built INSIDE ubuntu:24.04 — the exact base of every agentcap agent image — so
# the venv's /usr/bin/python3.12 base and torch .so's are ABI-identical when the
# bundle is mounted (read-only) into any agent sandbox.
#
# ./toolenv/ relocatable venv (bin/transformers, bin/python, lib/)
# ./toolenv/hf-cache/ prewarmed HF cache; the venv points HF_HOME here
# ./transformers/ transformers checkout @ PINNED_SHA (clone-tier source)
# ./inputs/ corpus inputs (cat.jpg, sample.wav), fetched from the blog repo
#
# Re-run to prewarm any missing models; the heavy build is skipped if ./toolenv
# already has a working `transformers`. Pass HF_TOKEN for faster, rate-limit-free
# downloads. The agentic CLI is unreleased, so we pin the exact commit.

set -euo pipefail
HERE="$(cd "$(dirname "$0")" && pwd)"

# transformers @ the "agent-first CLI" effort (is-it-agentic-enough's
# `4d15b215f3` / "w/ CLI + Skill"). Not on main, not in any release.
PINNED_SHA="4d15b215f37bcb25a2d6472b2147e34b3d465186"

# Every model named in the corpus (tasks.txt). Prewarmed so runs read offline.
MODELS="
distilbert/distilbert-base-uncased-finetuned-sst-2-english
dslim/bert-base-NER
openai/whisper-tiny
llava-hf/llava-interleave-qwen-0.5b-hf
HuggingFaceTB/SmolLM2-360M-Instruct
facebook/bart-large-cnn
distilbert/distilbert-base-cased-distilled-squad
distilbert/distilbert-base-uncased
facebook/bart-large-mnli
google/vit-base-patch16-224
facebook/detr-resnet-50
laion/clap-htsat-unfused
Helsinki-NLP/opus-mt-en-fr
"

# Corpus inputs (the cat image + audio clip the tasks reference) live in the
# is-it-agentic-enough repo; fetch them at a pinned commit instead of vendoring
# binaries here. Idempotent — skipped if already present.
INPUTS_SHA="1655d61abf056c58ee2bc8682cb2f0d336ce31ae"
INPUTS_URL="https://raw.githubusercontent.com/huggingface/is-it-agentic-enough/${INPUTS_SHA}/src/ae/data/inputs"
mkdir -p "$HERE/inputs"
for f in cat.jpg sample.wav; do
[ -f "$HERE/inputs/$f" ] || { echo ">>> fetching input $f"; curl -fsSL "$INPUTS_URL/$f" -o "$HERE/inputs/$f"; }
done

podman run --rm -i \
-e PINNED_SHA="$PINNED_SHA" -e MODELS="$MODELS" \
-e HF_TOKEN="${HF_TOKEN:-}" \
-e HF_HUB_DISABLE_XET=1 \
-v "$HERE:$HERE" -w "$HERE" \
ubuntu:24.04 bash -s <<'IN_CONTAINER'
set -e
export DEBIAN_FRONTEND=noninteractive
apt-get update -q >/dev/null
apt-get install -y -q --no-install-recommends python3 python3-venv python3-pip git ca-certificates >/dev/null
HERE="$(pwd)"; TE="$HERE/toolenv"; TFSRC="$HERE/transformers"
# Prewarm into the bundle's cache, explicitly online (the .pth written below
# makes the venv default to offline at run time; setdefault leaves these be).
export HF_HOME="$TE/hf-cache" HF_HUB_OFFLINE=0 TRANSFORMERS_OFFLINE=0

if [ ! -x "$TE/bin/transformers" ]; then
  echo ">>> fetching transformers @ $PINNED_SHA"
  rm -rf "$TFSRC"; mkdir -p "$TFSRC"; cd "$TFSRC"
  git init -q; git remote add origin https://github.com/huggingface/transformers
  git fetch -q --depth 1 origin "$PINNED_SHA"; git checkout -q FETCH_HEAD
  cd "$HERE"
  echo ">>> building venv + CPU torch + transformers + task deps"
  python3 -m venv "$TE"
  "$TE/bin/pip" install -q --no-cache-dir --upgrade pip
  "$TE/bin/pip" install -q --no-cache-dir torch torchvision --index-url https://download.pytorch.org/whl/cpu
  "$TE/bin/pip" install -q --no-cache-dir "$TFSRC"
  "$TE/bin/pip" install -q --no-cache-dir timm pillow sentencepiece sacremoses librosa soundfile scipy accelerate protobuf openai
else
  echo ">>> toolenv present; skipping build"
fi

# Self-configuring bundle: a .pth points HF_HOME at the in-bundle hf-cache,
# resolved from the venv root (sys.prefix) so it holds wherever the bundle is
# mounted read-only. The agent invokes the bundle's python/transformers, so HF
# reads the prewarmed cache offline with no per-sandbox env setup. (Ubuntu venvs
# don't auto-import sitecustomize, hence the .pth + helper module.)
SP="$("$TE/bin/python" -c 'import sysconfig; print(sysconfig.get_path("purelib"))')"
cat > "$SP/_agentcap_hf_home.py" <<'PY'
import os
import sys

_cache = os.path.join(sys.prefix, "hf-cache")
if os.path.isdir(_cache):
    os.environ.setdefault("HF_HOME", _cache)
    os.environ.setdefault("HF_HUB_OFFLINE", "1")
    os.environ.setdefault("TRANSFORMERS_OFFLINE", "1")
PY
echo 'import _agentcap_hf_home' > "$SP/_agentcap_hf_home.pth"

echo ">>> sanity: CLI + pipeline import"
"$TE/bin/transformers" --help >/dev/null && echo " transformers CLI OK"
"$TE/bin/python" -c "from transformers import pipeline" && echo " pipeline import OK"

echo ">>> prewarming model cache (xet disabled)"
for m in $MODELS; do
printf ' %-58s ' "$m"
if "$TE/bin/python" - "$m" <<'PY' 2>/tmp/dl.err
import sys
from huggingface_hub import snapshot_download
# PyTorch + safetensors only; skip the TF/Flax/ONNX/Rust/GGUF weight copies
# transformers never loads (they triple the download for no benefit).
snapshot_download(sys.argv[1], ignore_patterns=[
"*.h5", "tf_model*", "*.msgpack", "flax_model*", "*.onnx", "onnx/**",
"*.tflite", "rust_model.ot", "*.gguf",
])
PY
then echo "ok"; else echo "FAILED"; tail -2 /tmp/dl.err; fi
done
echo ">>> DONE. bundle at $TE ($(du -sh "$TE" | cut -f1))"
IN_CONTAINER
95 changes: 95 additions & 0 deletions examples/transformers-agentic/run.sh
@@ -0,0 +1,95 @@
#!/usr/bin/env bash
# Drive the transformers-agentic corpus (the `is-it-agentic-enough` task suite)
# through any registered agent, in one of three assistance tiers. Each task
# names a specific HF model the agent must actually load and run, so the agent
# needs a runnable `transformers` — provided by a self-contained, relocatable
# bundle mounted via `agentcap run --tool-dir` (build it once with
# ./build-toolenv.sh). The agent's own model is served on $UPSTREAM as usual.
#
# Tiers (the article's bare/clone/skill discovery conditions):
# bare empty cwd; only the mounted transformers bundle is available.
# clone cwd is a detached git worktree of ./transformers @ the bundle's
# commit, so AGENTS.md / cli/agentic/*.py auto-discover from cwd.
# skill empty cwd + the packaged transformers Skill (./skill) in context.
#
# Usage:
# ./run.sh --agent <name> --model <id> [--tier bare|clone|skill] [--tasks <file>]
#
# Examples:
# ./run.sh --agent pi --model unsloth/GLM-4.5-Air-GGUF --tier skill
# ./run.sh --agent hermes --model unsloth/GLM-4.5-Air-GGUF --tier bare
#
# Captures land under $HERE/.agentcap/<run-id>/; publish with `agentcap export`.
#
# Env knobs:
# UPSTREAM model server URL http://127.0.0.1:8001
# TURNS turns per task 1
# FOLLOWUP continue | templates | synthesized continue
# TIMEOUT per-turn timeout (seconds) 900

set -euo pipefail
HERE="$(cd "$(dirname "$0")" && pwd)"
export AGENTCAP_WORKSPACE="$HERE"

AGENT="" MODEL="" TIER="bare" TASKS="$HERE/tasks.txt"
while [[ $# -gt 0 ]]; do
  case "$1" in
    --agent) AGENT="$2"; shift 2 ;;
    --model) MODEL="$2"; shift 2 ;;
    --tier) TIER="$2"; shift 2 ;;
    --tasks) TASKS="$2"; shift 2 ;;
    # -E (ERE) keeps the optional-space match portable: BSD sed does not
    # support the GNU-only `\?` in basic regular expressions.
    -h|--help) sed -n '/^# Usage:/,/^set -euo/p' "$0" | sed -E 's/^# ?//; /^set -euo/d'; exit 0 ;;
    *) echo "ERROR: unexpected arg: $1" >&2; exit 2 ;;
  esac
done
[[ -n "$AGENT" && -n "$MODEL" ]] || { echo "ERROR: --agent and --model are required. See: $0 --help" >&2; exit 2; }
[[ "$TIER" =~ ^(bare|clone|skill)$ ]] || { echo "ERROR: --tier must be bare|clone|skill" >&2; exit 2; }
[[ -f "$TASKS" ]] || { echo "ERROR: tasks file not found: $TASKS" >&2; exit 2; }

UPSTREAM="${UPSTREAM:-http://127.0.0.1:8001}"
TURNS="${TURNS:-1}"
FOLLOWUP="${FOLLOWUP:-continue}"
TIMEOUT="${TIMEOUT:-900}"

TOOLENV="$HERE/toolenv"
[[ -x "$TOOLENV/bin/transformers" ]] || {
echo "ERROR: transformers bundle missing at $TOOLENV. Build it first:" >&2
echo " ./build-toolenv.sh" >&2
exit 2
}

# Per-tier sandbox cwd, rebuilt fresh each invocation, with inputs/ seeded.
SANDBOX="$HERE/sandbox-$TIER"
if [[ -e "$SANDBOX/.git" ]]; then
  git -C "$HERE/transformers" worktree remove --force "$SANDBOX" 2>/dev/null || true
fi
rm -rf "$SANDBOX"
if [[ "$TIER" == "clone" ]]; then
  [[ -d "$HERE/transformers/.git" ]] || { echo "ERROR: clone tier needs $HERE/transformers (built by ./build-toolenv.sh)" >&2; exit 2; }
  SHA="$(git -C "$HERE/transformers" rev-parse HEAD)"
  git -C "$HERE/transformers" worktree add --detach "$SANDBOX" "$SHA" >/dev/null
else
  mkdir -p "$SANDBOX"
fi
cp -r "$HERE/inputs" "$SANDBOX/inputs"

# Only the skill tier passes --skills; empty otherwise. Expanded set-u-safe below
# (bash 3.2 treats "${arr[@]}" of an empty array as an unbound-variable error).
skill_args=()
[[ "$TIER" == "skill" ]] && skill_args=(--skills "$HERE/skill")

echo ">>> agent=$AGENT model=$MODEL tier=$TIER tasks=$(basename "$TASKS") upstream=$UPSTREAM" >&2
agentcap run \
--agent "$AGENT" \
--model "$MODEL" \
--upstream "$UPSTREAM" \
--sandbox "$SANDBOX" \
--tool-dir "$TOOLENV" \
--label "$TIER" \
"${skill_args[@]+"${skill_args[@]}"}" \
--tasks "$TASKS" \
--turns "$TURNS" \
--followup "$FOLLOWUP" \
--timeout "$TIMEOUT"

echo "done. captures under $HERE/.agentcap/ (agentcap ls). publish: agentcap export" >&2
17 changes: 17 additions & 0 deletions examples/transformers-agentic/skill/agents/AGENTS.md
@@ -0,0 +1,17 @@
<skills>

You have additional SKILLs documented in directories containing a "SKILL.md" file.

These skills are:
- transformers -> "skills/transformers/SKILL.md"

IMPORTANT: You MUST read the SKILL.md file whenever the description of the skills matches the user intent, or may help accomplish their task.

<available_skills>

transformers: `Run one-off Hugging Face Transformers inference from the command line — classify, ner, qa, fill-mask, summarize, translate, tokenize, caption, image-classify, detect, vqa, transcribe, audio-classify, generate, and more. Use this skill whenever a task asks you to run a named model on text, an image, or audio: invoke the `transformers` CLI (e.g. `transformers --format json classify --text "..." --model ...`) rather than hand-writing a Python `pipeline(...)` script. Run `transformers --help` for the full command list.`
</available_skills>

Paths referenced within SKILL folders are relative to that SKILL.

</skills>