Merged
3 changes: 2 additions & 1 deletion README.md
@@ -39,7 +39,8 @@ Other install methods: [pip install](#alternative-install-with-pip) | [uv instal

 ## 🔥🔥🔥 News (Pacific Time)

-- June 5, 2026 (latest, **v3.05.82**): **Adaptive Markdown streaming — live output stays correct on every device** by auto-selecting a per-device tier (`live` in-place redraw on capable terminals incl. modern SSH emulators, append-only `commit` for SSH/Apple Terminal/pipes/CJK text so frames never duplicate, `plain` fallback); also ships a visual `/context` usage grid and a 1M context window for `deepseek-v4-flash`. Details: [docs/guides/features.md](docs/guides/features.md) · [docs/news.md](docs/news.md).
+- June 5, 2026 (latest, **v3.05.82**): **User-controllable token/cost budgets** — `/budget $5` / `/budget 200k` / `/budget daily $20` cap spend per session or per day, enforced before each model call; on hit the session auto-saves and you're shown how to `/resume` or raise the cap and continue (warns at ≥80%/95%; `--budget` sets it at startup). Details: [docs/guides/features.md](docs/guides/features.md) · [docs/news.md](docs/news.md).
+- June 5, 2026: **Adaptive Markdown streaming — live output stays correct on every device** by auto-selecting a per-device tier (`live` in-place redraw on capable terminals incl. modern SSH emulators, append-only `commit` for SSH/Apple Terminal/pipes/CJK text so frames never duplicate, `plain` fallback); also ships a visual `/context` usage grid and a 1M context window for `deepseek-v4-flash`. Details: [docs/guides/features.md](docs/guides/features.md) · [docs/news.md](docs/news.md).
 - June 4, 2026 (**v3.05.81**): **Claude-Code-style quiet output** hides per-tool execution and shows one summary line per turn (on by default), with a live spinner timer + token estimate and a `✻ Worked for…` footer; `/verbose` overrides, toggle with `/quiet`. Details: [docs/guides/features.md](docs/guides/features.md) · [docs/news.md](docs/news.md).
 - June 4, 2026: **Context-window override** — `/config context_window=<N>` sets the context length that drives the prompt `%`, `/context`, the compaction trigger, and the output cap consistently (distinct from `max_tokens`; read live, no restart). Details: [docs/guides/reference.md](docs/guides/reference.md) · [docs/news.md](docs/news.md).
 - June 4, 2026: **Rich Live streaming** keeps long responses live via a bounded tail window — redrawing only the most recent screenful and committing the full output when done, fixing duplicate/stale frames (builds on PR #133). Details: [docs/guides/features.md](docs/guides/features.md) · [docs/news.md](docs/news.md).
41 changes: 38 additions & 3 deletions agent.py
@@ -57,6 +57,19 @@ class PermissionRequest:
     description: str
     granted: bool = False

+@dataclass
+class QuotaPause:
+    """Yielded when a configured budget is reached, instead of making a billable
+    call. The REPL auto-saves the session and tells the user how to resume or
+    raise the budget. ``usage`` is the snapshot from quota.get_usage(); the
+    key/scope/unit/limit identify which cap broke so the hint targets it."""
+    reason: str
+    usage: dict = field(default_factory=dict)
+    key: str | None = None
+    scope: str | None = None
+    unit: str | None = None
+    limit: float | None = None
+

 # ── Agent loop ─────────────────────────────────────────────────────────────
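
The `QuotaPause` event is designed to be picked up by whatever loop consumes the agent's event stream. The sketch below illustrates that hand-off under stated assumptions: `fake_run` and `consume` are hypothetical stand-ins (not functions from this PR), and only the dataclass fields mirror the diff.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class QuotaPause:
    # Same fields as the dataclass added in the diff.
    reason: str
    usage: dict = field(default_factory=dict)
    key: Optional[str] = None
    scope: Optional[str] = None
    unit: Optional[str] = None
    limit: Optional[float] = None


def fake_run(budget_left: int) -> Iterator[object]:
    """Hypothetical stand-in for the agent loop: yields text, then pauses
    instead of making another (billable) call when no budget is left."""
    yield "partial answer"
    if budget_left <= 0:
        yield QuotaPause("session token budget reached",
                         usage={"tokens": 20_000},
                         key="session_token_budget",
                         scope="session", unit="tok", limit=20_000)
        return
    yield "rest of the answer"


def consume(events: Iterator[object]):
    """Collect text until a QuotaPause arrives; return the pause so the
    caller can save the session and print a resume hint."""
    texts = []
    for event in events:
        if isinstance(event, QuotaPause):
            return texts, event
        texts.append(event)
    return texts, None
```

Carrying `key`, `scope`, `unit`, and `limit` on the event lets the consumer build its hint without querying the quota module again.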

@@ -149,12 +162,34 @@ def run(
                       removed=_before_len - len(state.messages))

         # ── Quota check — before spending tokens ──────────────────────────
+        # Project this request's INPUT so a single large (tool-heavy) call can't
+        # blow past the cap, then clamp the OUTPUT cap to the remaining headroom
+        # so the response can't either — keeping the overshoot near zero.
+        _proj_tokens, _proj_cost = 0, 0.0
+        _call_config = config
+        if any(config.get(k) for k in ("session_token_budget", "session_cost_budget",
+                                       "daily_token_budget", "daily_cost_budget")):
+            try:
+                from compaction import estimate_tokens as _est_tok
+                from providers import calc_cost as _calc_cost
+                _proj_tokens = (_est_tok(state.messages)
+                                + _est_tok([{"role": "system", "content": system_prompt}]))
+                _proj_cost = _calc_cost(config["model"], _proj_tokens, 0)
+            except Exception:
+                _proj_tokens, _proj_cost = 0, 0.0
         try:
-            _quota.check_quota(session_id, config)
+            _quota.check_quota(session_id, config,
+                               projected_tokens=_proj_tokens, projected_cost=_proj_cost)
         except _quota.QuotaExceeded as qe:
             _log.warn("quota_exceeded", session_id=session_id, reason=qe.reason)
-            yield TextChunk(f"\n[Quota exceeded — {qe.reason}]\n")
+            yield QuotaPause(qe.reason, _quota.get_usage(session_id),
+                             key=qe.key, scope=qe.scope, unit=qe.unit, limit=qe.limit)
             break
+        _room = _quota.output_room(session_id, config, _proj_tokens, _proj_cost)
+        if _room is not None:
+            _cur_cap = config.get("max_tokens") or 4096
+            if _room < _cur_cap:
+                _call_config = {**config, "max_tokens": max(256, int(_room))}

         # NIM-only: when build.nvidia.com rate-limits a model, cycle to
         # the next free-tier model before consuming a regular retry. Capped
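
The clamp in this hunk can be read in isolation. The following is a minimal sketch, assuming a token-only budget: `output_room` here is a hypothetical simplification of `quota.output_room` (whose real signature takes the session id and config), while `clamp_config` follows the `max(256, int(_room))` logic from the diff.

```python
from typing import Optional

MIN_OUTPUT_TOKENS = 256     # floor used in the diff's max(256, int(_room))
DEFAULT_MAX_TOKENS = 4096   # fallback when config has no max_tokens


def output_room(limit: Optional[int], used: int, projected_input: int) -> Optional[int]:
    """Hypothetical headroom helper: tokens left after the projected input.
    Returns None when no token budget is configured."""
    if limit is None:
        return None
    return max(0, limit - used - projected_input)


def clamp_config(config: dict, room: Optional[int]) -> dict:
    """Return a per-call config whose max_tokens fits the remaining headroom.
    The caller's config dict is never mutated."""
    if room is None:
        return config
    cur_cap = config.get("max_tokens") or DEFAULT_MAX_TOKENS
    if room < cur_cap:
        return {**config, "max_tokens": max(MIN_OUTPUT_TOKENS, int(room))}
    return config
```

Note that the 256-token floor means a call made with less than 256 tokens of headroom can still exceed the cap by a small amount, which is consistent with the comment's "near zero" rather than "zero" overshoot.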
@@ -177,7 +212,7 @@ def run(
                 system=system_prompt,
                 messages=state.messages,
                 tool_schemas=get_tool_schemas(),
-                config=config,
+                config=_call_config,
             ):
                 if isinstance(event, (TextChunk, ThinkingChunk)):
                     yield event
62 changes: 60 additions & 2 deletions cheetahclaws.py
@@ -23,6 +23,7 @@
   /history      Print conversation history
   /context      Show context window usage
   /cost         Show API cost this session
+  /budget       View or set token/cost budgets (session + daily)
   /status       Show current session status (model, mode, tokens, cost)
   /verbose      Toggle verbose mode
   /quiet        Toggle compact tool display (hide execution, show per-turn summary)
@@ -239,7 +240,7 @@ def __getattr__(self, name):

 # ── Core commands ──────────────────────────────────────────────────────────
 from commands.core import (
-    cmd_help, cmd_clear, cmd_context, cmd_cost, cmd_compact,
+    cmd_help, cmd_clear, cmd_context, cmd_cost, cmd_budget, cmd_compact,
     cmd_init, cmd_export, cmd_copy, cmd_status, cmd_doctor,
     cmd_proactive, cmd_image, cmd_circuit, cmd_web, run_setup_wizard,
 )
@@ -452,6 +453,7 @@ def _proactive_watcher_loop(config):
     "search": cmd_search,
     "context": cmd_context,
     "cost": cmd_cost,
+    "budget": cmd_budget,
     "verbose": cmd_verbose,
     "quiet": cmd_quiet,
     "thinking": cmd_thinking,
@@ -615,6 +617,7 @@ def handle_slash(line: str, state, config) -> Union[bool, tuple]:
     "search": ("Search past sessions", []),
     "context": ("Visualize context-window usage by category", []),
     "cost": ("Show cost estimate", []),
+    "budget": ("View or set token/cost budgets (session + daily)", ["session", "daily", "clear"]),
     "verbose": ("Toggle verbose output", []),
     "quiet": ("Toggle compact tool display", []),
     "thinking": ("Toggle extended thinking", []),
@@ -895,7 +898,7 @@ def _headless_run_query(prompt: str, is_background: bool = False) -> None:
 def repl(config: dict, initial_prompt: str = None):
     from cc_config import HISTORY_FILE
     from context import build_system_prompt
-    from agent import AgentState, run, TextChunk, ThinkingChunk, ToolStart, ToolEnd, TurnDone, PermissionRequest
+    from agent import AgentState, run, TextChunk, ThinkingChunk, ToolStart, ToolEnd, TurnDone, PermissionRequest, QuotaPause

     if HAS_PROMPT_TOOLKIT:
         # Inject live providers so ui.input's completer enumerates the same
@@ -1101,6 +1104,7 @@ def run_query(user_input: str, is_background: bool = False):
         turn_start = time.monotonic()
         turn_in_tokens = 0
         turn_out_tokens = 0
+        quota_paused = False  # set when a budget is reached mid-turn
         streamed_chars = 0

         # Rebuild system prompt each turn (picks up cwd changes, etc.)
@@ -1251,6 +1255,38 @@ def run_query(user_input: str, is_background: bool = False):
                             f"\n [tokens: +{event.input_tokens} in / "
                             f"+{event.output_tokens} out]", "dim"
                         ))
+
+                elif isinstance(event, QuotaPause):
+                    # A configured budget was reached BEFORE making the next
+                    # (billable) call. Auto-save so nothing is lost, then tell
+                    # the user how to resume or raise the budget and continue.
+                    _stop_tool_spinner()
+                    spinner_shown = False
+                    flush_response()
+                    quota_paused = True
+                    print()
+                    print(clr(f" ⛔ Budget reached — {event.reason}", "yellow", "bold"))
+                    # save_latest() prints the saved paths itself — don't echo.
+                    try:
+                        from commands.session import save_latest
+                        save_latest("", state, config)
+                    except Exception:
+                        pass
+                    # Suggest raising the cap that actually broke, in its own
+                    # unit/scope — a token cap can't be lifted with a $ amount.
+                    try:
+                        import quota as _q
+                        _pre = "daily " if event.scope == "daily" else ""
+                        _amt = _q.fmt_amount((event.limit or 0) * 2, event.unit or "tok")
+                        _raise_cmd = f"/budget {_pre}{_amt}" if event.limit else "/budget 40k"
+                    except Exception:
+                        _raise_cmd = "/budget 40k"
+                    print(clr(" To continue:", "bold"))
+                    print(" • raise it: " + clr(_raise_cmd, "cyan")
+                          + " (or " + clr("/budget clear", "cyan") + "), then resend your message")
+                    print(" • later: restart and run " + clr("/resume", "cyan")
+                          + " to pick up where you left off")
+                    print(" • view usage: " + clr("/budget", "cyan"))
         except KeyboardInterrupt:
             _stop_tool_spinner()
             flush_response()
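
The "raise it" hint in this hunk doubles the breached limit and renders it in the breached unit and scope. A rough sketch of that computation follows; `fmt_amount` here is a hypothetical formatter (the real `quota.fmt_amount` is not shown in this diff and may format differently), and `raise_hint` is an illustrative name.

```python
from typing import Optional


def fmt_amount(value: float, unit: str) -> str:
    """Hypothetical formatter: "$10.00" for usd, "40k" / "2m" style for tokens."""
    if unit == "usd":
        return f"${value:.2f}"
    if value >= 1_000_000:
        return f"{value / 1_000_000:g}m"
    if value >= 1_000:
        return f"{value / 1_000:g}k"
    return f"{value:g}"


def raise_hint(limit: Optional[float], unit: Optional[str], scope: Optional[str]) -> str:
    """Suggest doubling the cap that broke, in its own unit and scope."""
    if not limit:
        return "/budget 40k"  # same fallback string as the diff
    prefix = "daily " if scope == "daily" else ""
    return f"/budget {prefix}{fmt_amount(limit * 2, unit or 'tok')}"
```

Keeping the hint in the breached unit matters because, as the diff's comment notes, a token cap cannot be lifted by setting a dollar amount.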
@@ -1285,6 +1321,15 @@ def run_query(user_input: str, is_background: bool = False):
         if quiet:
             print_turn_stats(time.monotonic() - turn_start,
                              turn_in_tokens, turn_out_tokens)
+        # Budget proximity warnings (≥80% / ≥95%) — heads-up before the hard
+        # stop arrives. Skipped when this turn already hit the cap.
+        if not quota_paused:
+            try:
+                import quota as _quota
+                for _level, _msg in _quota.warnings(config.get("_session_id", "default"), config):
+                    (err if _level == "crit" else warn)(f" ⚠ Budget: {_msg} — /budget to view")
+            except Exception:
+                pass
         print(clr("╰──────────────────────────────────────────────", "dim"))
         print()
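
The loop above expects `quota.warnings` to return `(level, message)` pairs, with `"crit"` routed to `err` and anything else to `warn`. The function below is only a guess at how such a threshold check might look: the name `budget_warnings`, its parameters, the `"warn"` level string, and the message wording are all assumptions; only the 80%/95% thresholds and the `"crit"` level come from the diff.

```python
from typing import List, Optional, Tuple

WARN_PCT = 80.0   # thresholds named in the diff's comment
CRIT_PCT = 95.0


def budget_warnings(used: float, limit: Optional[float],
                    label: str = "session tokens") -> List[Tuple[str, str]]:
    """Hypothetical proximity check returning (level, message) pairs."""
    if not limit:  # None or 0 means no cap is configured
        return []
    pct = used * 100 / limit
    if pct >= CRIT_PCT:
        return [("crit", f"{label} at {pct:.0f}% of cap")]
    if pct >= WARN_PCT:
        return [("warn", f"{label} at {pct:.0f}% of cap")]
    return []
```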

@@ -1912,6 +1957,10 @@ def main():
                         help="Show each tool call instead of a per-turn summary")
     parser.add_argument("--thinking", action="store_true",
                         help="Enable extended thinking")
+    parser.add_argument("--budget", metavar="AMOUNT",
+                        help="Session budget cap, e.g. --budget $5 (cost) or "
+                             "--budget 200k (tokens). Auto-saves and prompts to "
+                             "resume / raise when reached.")
     parser.add_argument("--version", action="store_true", help="Print version")
     parser.add_argument("--setup", action="store_true", help="Run interactive setup wizard")
     parser.add_argument("--web", action="store_true",
@@ -1994,6 +2043,15 @@ def main():
         config["quiet"] = False
     if args.thinking:
         config["thinking"] = True
+    if getattr(args, "budget", None):
+        import quota as _quota
+        try:
+            _kind, _val = _quota.parse_budget(args.budget)
+            config[_quota.BUDGET_KEYS[(_kind, "session")]] = _val
+            _shown = _quota.fmt_amount(_val, "usd" if _kind == "cost" else "tok")
+            print(clr(f" Session {'cost' if _kind == 'cost' else 'token'} budget: {_shown}", "dim"))
+        except ValueError as _e:
+            warn(f"--budget: {_e} (e.g. --budget $5 or --budget 200k); ignoring.")

     # ── Setup wizard: --setup flag or first-run auto-trigger ─────────────
     from cc_config import CONFIG_FILE
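
Both `--budget` and `/budget` rely on `quota.parse_budget`, which is not part of the visible diff. From its call sites it appears to return a `(kind, value)` pair with `kind` being `"cost"` or `"tokens"`, and to raise `ValueError` on bad input. One possible shape, offered purely as an assumption about its behavior:

```python
import math
from typing import Tuple


def parse_budget(text: str) -> Tuple[str, float]:
    """Hypothetical parser: "$5" -> ("cost", 5.0); "200k" -> ("tokens", 200000.0).
    Raises ValueError on anything it cannot interpret."""
    s = text.strip().lower().replace(",", "")
    if s.startswith("$"):
        kind, s = "cost", s[1:]
    else:
        kind = "tokens"
    mult = 1.0
    if kind == "tokens" and s[-1:] in ("k", "m"):
        mult = 1_000.0 if s[-1] == "k" else 1_000_000.0
        s = s[:-1]
    try:
        value = float(s) * mult
    except ValueError:
        raise ValueError(f"unrecognized budget amount: {text!r}") from None
    if not math.isfinite(value) or value <= 0:
        raise ValueError("budget must be a positive number")
    return kind, value
```

Using the `$` prefix as the only signal for a cost cap keeps the grammar unambiguous: any bare number, with or without a `k`/`m` suffix, is read as tokens.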
80 changes: 80 additions & 0 deletions commands/core.py
@@ -221,6 +221,86 @@ def cmd_cost(_args: str, state, config) -> bool:
     return True


+def _budget_bar(pct: float | None, width: int = 16) -> str:
+    filled = int(round((pct or 0) / 100 * width))
+    filled = max(0, min(width, filled))
+    return "█" * filled + "░" * (width - filled)
+
+
+def cmd_budget(args: str, state, config) -> bool:
+    """View or set token / cost budgets (session + daily).
+
+    /budget              show usage vs every budget (bars + %)
+    /budget $5           session cost cap (the $ means USD)
+    /budget 200k         session token cap (supports 200k / 1.5m / 200000)
+    /budget daily $20    daily cost cap · /budget daily 2m daily tokens
+    /budget clear        remove all caps (unlimited)
+    """
+    import quota as _quota
+    from cc_config import save_config
+
+    arg = args.strip()
+    sid = config.get("_session_id", "default")
+
+    # ── view ────────────────────────────────────────────────────────────────
+    if not arg:
+        rows = _quota.usage_vs_limits(sid, config)
+        print(clr(" Token Budget", "bold"))
+        any_set = False
+        for r in rows:
+            used = _quota.fmt_amount(r["used"], r["unit"])
+            if r["limit"] is None:
+                print(f" {r['label']:<15} {used:>9} " + clr("unlimited", "dim"))
+                continue
+            any_set = True
+            lim = _quota.fmt_amount(r["limit"], r["unit"])
+            pct = r["pct"] or 0
+            color = "red" if pct >= 95 else ("yellow" if pct >= 80 else "green")
+            print(f" {r['label']:<15} {used:>9} / {lim:<9} "
+                  f"{clr(_budget_bar(pct), color)} {pct:4.0f}%")
+        print()
+        if any_set:
+            info(" Change: /budget $5 · /budget 200k · /budget daily $20 · /budget clear")
+        else:
+            info(" No budgets set (unlimited). Set one: /budget $5 · /budget 200k · /budget daily $20")
+        return True
+
+    # ── clear ─────────────────────────────────────────────────────────────────
+    if arg.lower() in ("clear", "off", "none", "reset", "unlimited"):
+        for key in _quota.BUDGET_KEYS.values():
+            config[key] = None
+        save_config(config)
+        ok("All budgets cleared (unlimited).")
+        return True
+
+    # ── set ───────────────────────────────────────────────────────────────────
+    parts = arg.split()
+    scope = "session"
+    if parts[0].lower() in ("session", "daily"):
+        scope, rest = parts[0].lower(), " ".join(parts[1:])
+    else:
+        rest = arg
+    if not rest.strip():
+        err("Usage: /budget [session|daily] <amount> — e.g. /budget $5 · /budget daily 2m")
+        return True
+    try:
+        kind, value = _quota.parse_budget(rest)
+    except ValueError as e:
+        err(f"{e}. Examples: /budget $5 (cost) · /budget 200k (tokens) · /budget daily $20")
+        return True
+    config[_quota.BUDGET_KEYS[(kind, scope)]] = value
+    # One budget per scope: a new cap replaces the other unit for that scope, so
+    # e.g. setting a $ cap clears a leftover token cap that would still block.
+    config[_quota.BUDGET_KEYS[("tokens" if kind == "cost" else "cost", scope)]] = None
+    save_config(config)
+    shown = _quota.fmt_amount(value, "usd" if kind == "cost" else "tok")
+    ok(f"{scope.capitalize()} budget set to {shown} "
+       f"({'cost' if kind == 'cost' else 'tokens'}).")
+    info(f"Replaces any previous {scope} cap. Checked before each model call; "
+         "auto-saves and shows how to resume when reached.")
+    return True
+
+
 def cmd_compact(args: str, state, config) -> bool:
     """Manually compact conversation history."""
     from compaction import manual_compact
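
The "one budget per scope" rule and the usage bar from `cmd_budget` can be exercised without the REPL. In this sketch the config key names are taken from the `agent.py` hunk and the `(kind, scope)` lookup from the call sites, but the exact contents of `quota.BUDGET_KEYS` are an assumption; `set_budget` is an illustrative helper, not a function from the PR.

```python
# Assumed layout of quota.BUDGET_KEYS: (kind, scope) -> config key.
BUDGET_KEYS = {
    ("tokens", "session"): "session_token_budget",
    ("cost", "session"): "session_cost_budget",
    ("tokens", "daily"): "daily_token_budget",
    ("cost", "daily"): "daily_cost_budget",
}


def set_budget(config: dict, kind: str, scope: str, value: float) -> None:
    """Set one cap and clear the other unit for the same scope, so a stale
    cap in the other unit cannot keep blocking calls."""
    other = "tokens" if kind == "cost" else "cost"
    config[BUDGET_KEYS[(kind, scope)]] = value
    config[BUDGET_KEYS[(other, scope)]] = None


def budget_bar(pct, width: int = 16) -> str:
    """Same logic as _budget_bar in the diff: a fixed-width usage bar."""
    filled = max(0, min(width, int(round((pct or 0) / 100 * width))))
    return "█" * filled + "░" * (width - filled)
```

Setting a session cost cap leaves the daily caps untouched, since the replacement is scoped to the `(kind, scope)` pair being changed.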
1 change: 1 addition & 0 deletions docs/guides/features.md
@@ -55,5 +55,6 @@ and indexed in the [README Documentation section](../../README.md#documentation)
 | Cloud sync | `/cloudsave` syncs sessions to private GitHub Gists; auto-sync on exit; load from cloud by Gist ID. No new dependencies (stdlib `urllib`). |
 | Extended Thinking | Toggle on/off for Claude models; native `<think>` block streaming for local Ollama reasoning models (deepseek-r1, qwen3, gemma4) |
 | Cost tracking | Token usage + estimated USD cost |
+| Token / cost budgets | `/budget` sets and views spend caps — per-session or per-day, in tokens or USD (`/budget $5`, `/budget 200k`, `/budget daily $20`, `/budget clear`; or `--budget $5` at startup). **One budget per scope**: a new cap replaces the other unit for that scope (so switching tokens↔USD just works, no stale cap left blocking). Enforced before each model call, and **tight** — it projects the next request's input and clamps its output cap, so a single tool-heavy turn can't overshoot the budget. Warns at ≥80%/95%. When a cap is hit the session is **auto-saved** and you're shown how to `/resume` later or raise the **same** cap (the hint matches the breached unit) and continue — nothing is lost. Backed by `quota.py`; the daemon ships conservative defaults (200k tok / $2 per session) in `serve` mode. |
 | Non-interactive mode | `--print` flag for scripting / CI |
 | **Web UI** | `--web` opens the browser. Multi-user accounts (bcrypt + JWT), SQLite-persisted history, session CRUD + markdown export, light/dark/system theme, `/health` + `/metrics`, auto-picks a free port if 8080 is busy. `pip install 'cheetahclaws[web]'`. See [web-ui.md](web-ui.md). |