✨ Feat: model capacity foundation — context management upgrade#3293
wuyuanfr wants to merge 142 commits into
…city-and-request-safety
Add context management upgrade design documents:
- Context management production plan (EN/CN)
- Memory improvement analysis and architecture
- 16 workstreams for context management upgrade
…view
Add review documents and update workstreams:
- Phase 1-5 review documents
- Findings registry and impact analysis
- Updated 16 workstreams with detailed specs
- Context management weekly design summary (CN)
…egistry
Introduces the contract surface for W1 (Correct Model Token-Capacity Configuration) so W2/W3 development can begin against stable types. No runtime behaviour change: resolver/registry implementations land in the follow-up PR.
New modules:
- sdk/nexent/core/models/capacity_resolver.py: CapabilityProfile and ModelCapacitySnapshot (Pydantic v2, frozen), typed ResolverError hierarchy, compute_fingerprint() implementing the SHA-256/canonical-JSON contract from W1 ADR Decision 3, RESOLVER_VERSION constant, and a resolve_capacity() stub.
- sdk/nexent/core/models/tokenizer_registry.py: TokenizerAdapter Protocol, empty REGISTRY, FallbackEstimator (char/4 heuristic that always returns counting_mode='estimated'), and resolve() function. Family-name validation pattern enforces the naming convention fixed in the ADR.
- backend/consts/capability_profiles.py: CATALOG with eight approved day-one entries (openai/gpt-4o, openai/gpt-4.1, dashscope/qwen-plus, qwen-turbo, glm-5.1, silicon DeepSeek-V4-Flash, Qwen3.6-27B, Kimi-K2.6) plus CATALOG_REVISION.
Design reference: doc/working/context-management-workstreams/W1_ADR_Capability_Catalog_Storage_and_Fingerprint.md (locally hosted; team sharing channel separate from this repo per doc/.gitignore policy).
Smoke-tested:
- fingerprint is deterministic and order-independent across unknown_capabilities and field_sources
- ModelCapacitySnapshot rejects mutation
- tokenizer resolve() falls back to estimated for unknown families
- resolve_capacity stub raises NotImplementedError
- CATALOG imports cleanly with all 8 entries
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
feat(W1): add type skeleton for ModelCapacityResolver and tokenizer registry
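The fingerprint and fallback-estimator contracts described in this commit can be illustrated with a minimal sketch. It assumes the fingerprint input is a plain dict and that the char/4 heuristic rounds up; the real compute_fingerprint() and FallbackEstimator signatures in the SDK may differ, so treat the shapes below as illustrative only.

```python
import hashlib
import json
import math


def compute_fingerprint(contract: dict) -> str:
    """Return a SHA-256 hex digest of the contract's canonical JSON form."""
    normalized = dict(contract)
    # List order carries no meaning here, so sort it before hashing.
    normalized["unknown_capabilities"] = sorted(
        contract.get("unknown_capabilities", [])
    )
    # sort_keys makes nested dicts (e.g. field_sources) order-independent;
    # fixed separators remove whitespace variation.
    canonical = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class FallbackEstimator:
    """Character-count heuristic used when no tokenizer adapter is known."""

    counting_mode = "estimated"

    def count(self, text: str) -> int:
        # Roughly four characters per token; rounding up is an assumption.
        return math.ceil(len(text) / 4)
```

Two contracts that differ only in key order or in the order of unknown_capabilities hash to the same digest, which is the property the smoke test checks.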
Adds seven nullable capacity fields to model_record_t so the ModelCapacityResolver can read operator overrides per W1 ADR:
- context_window_tokens
- max_input_tokens
- max_output_tokens
- default_output_reserve_tokens
- tokenizer_family
- capacity_source
- capability_profile_version
All columns are nullable, no defaults that change semantics. Legacy max_tokens is left untouched and continues to behave as a deprecated output-cap alias until consumers migrate (separate follow-up).
Touchpoints:
- docker/sql/v2.2.0_0615_add_capacity_fields_to_model_record_t.sql: idempotent upgrade with ALTER TABLE ... ADD COLUMN IF NOT EXISTS + COMMENT ON COLUMN.
- docker/init.sql: fresh-install CREATE TABLE inline plus COMMENT ON COLUMN.
- k8s/helm/nexent/charts/nexent-common/files/init.sql: same for k8s deploys.
- backend/database/db_models.py: ModelRecord ORM columns.
- backend/consts/model.py: ModelRequest Pydantic schema fields so CRUD round-trips the new values.
Design reference: doc/working/context-management-workstreams/W1_ADR_Capability_Catalog_Storage_and_Fingerprint.md (Decision 1, schema).
Verification:
- ORM exposes all 7 columns
- Pydantic ModelRequest exposes all 7 fields
- All three SQL files contain 14 occurrences (column + COMMENT per field)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Move W1_ADR_Capability_Catalog_Storage_and_Fingerprint.md from context-management-workstreams to context-management-workstream/ADRs for better organization. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
feat(W1): add capacity columns to model_record_t (additive migration)
Replaces the resolve_capacity NotImplementedError stub with the real ModelCapacityResolver per W1 ADR. The resolver:
- Looks up the (provider, model_name) entry in the capability profile catalog passed by the caller.
- Merges operator overrides over the profile (operator wins).
- Validates that hard capacity is known and not impossible (output cap cannot exceed combined window; capacities must be positive).
- Defaults requested_output_tokens to the profile's default_output_reserve_tokens; rejects requests that exceed max_output_tokens.
- Derives provider_input_limit_tokens as min(max_input_tokens, context_window_tokens - requested_output_tokens) using only the limits that are defined.
- Asks tokenizer_registry for (adapter, counting_mode); records capability gaps in unknown_capabilities.
- Computes the deterministic SHA-256/canonical-JSON fingerprint from the resolved contract and builds an immutable ModelCapacitySnapshot.
The resolver stays pure: the SDK never reads DB or env; backend callers supply the capability_profiles dict and operator_overrides. This matches CLAUDE.md's SDK layer rules.
Typed failures raised on invalid input:
- ProviderCapabilityUnknown (no hard capacity)
- InvalidCapacityConfiguration (non-positive values, output > window, derived input limit non-positive)
- RequestedOutputExceedsCap (request above max_output_tokens)
Tests (15, all passing):
- Catalog lookup + override precedence
- Uncataloged with operator-supplied capacity
- Rejection: missing capacity, impossible values, negative values, requested-output overflow
- Default requested_output behavior
- Separate-input-limit path (synthetic, no day-one model uses it)
- Combined window + separate input limit takes minimum
- Snapshot immutability (Pydantic ValidationError on mutation)
- Fingerprint determinism and sensitivity to request changes
- Tokenizer estimated-mode flag appears in unknown_capabilities
Design reference: doc/working/context-management-workstreams/W1_ADR_Capability_Catalog_Storage_and_Fingerprint.md.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
feat(W1): implement resolve_capacity with catalog + operator override
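The validation and derivation rules listed for the resolver can be sketched as one pure function. The exception names come from the commit message; the function name derive_input_limit, its argument list, and the ResolverError base-class layout are assumptions made for illustration and are not the SDK's actual resolve_capacity() signature.

```python
from typing import Optional


class ResolverError(Exception):
    """Base class for typed resolver failures (hierarchy is assumed)."""


class ProviderCapabilityUnknown(ResolverError):
    pass


class InvalidCapacityConfiguration(ResolverError):
    pass


class RequestedOutputExceedsCap(ResolverError):
    pass


def derive_input_limit(
    context_window_tokens: Optional[int],
    max_input_tokens: Optional[int],
    max_output_tokens: Optional[int],
    requested_output_tokens: int,
) -> int:
    """Derive the provider input limit from whichever limits are defined."""
    if context_window_tokens is None and max_input_tokens is None:
        raise ProviderCapabilityUnknown("no hard capacity is known")

    values = (context_window_tokens, max_input_tokens,
              max_output_tokens, requested_output_tokens)
    if any(v is not None and v <= 0 for v in values):
        raise InvalidCapacityConfiguration("capacities must be positive")

    if (context_window_tokens is not None and max_output_tokens is not None
            and max_output_tokens > context_window_tokens):
        raise InvalidCapacityConfiguration("output cap exceeds window")

    if max_output_tokens is not None and requested_output_tokens > max_output_tokens:
        raise RequestedOutputExceedsCap("request above max_output_tokens")

    # Take the minimum over only the limits that are actually defined.
    candidates = []
    if max_input_tokens is not None:
        candidates.append(max_input_tokens)
    if context_window_tokens is not None:
        candidates.append(context_window_tokens - requested_output_tokens)

    limit = min(candidates)
    if limit <= 0:
        raise InvalidCapacityConfiguration("derived input limit is non-positive")
    return limit
```

For a 128,000-token combined window with 4,096 requested output tokens and no separate input limit, this yields 123,904; adding a 100,000-token input limit lowers the result to 100,000.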
…LLM output cap
ModelConfig (sdk/nexent/core/agents/agent_model.py):
- Add max_output_tokens as the preferred name per W1 ADR.
- Keep max_tokens as a deprecated alias; a model_validator backfills the unset side so old and new callers both work during migration.
- Add the remaining capacity-snapshot fields so a ModelConfig can carry the resolved values from backend service down to the SDK: context_window_tokens, max_input_tokens, default_output_reserve_tokens, tokenizer_family, capacity_source, capability_profile_version.
OpenAIModel (sdk/nexent/core/models/openai_llm.py):
- Accept max_output_tokens (preferred) and max_tokens (deprecated). If only the legacy name is passed, log a debug message and remap to max_output_tokens.
- Internal attribute renamed to self.max_output_tokens; self.max_tokens is kept as an alias for any reader.
- chat.completions.create still receives wire field max_tokens; only the internal name changed.
NexentAgent.create_model (sdk/nexent/core/agents/nexent_agent.py):
- Construct OpenAIModel with max_output_tokens=model_config.max_output_tokens so the new name flows through end-to-end.
Backward compatibility:
- Existing callers that set ModelConfig.max_tokens see no behavior change (validator copies it into max_output_tokens; the wire payload is identical).
- Existing callers reading OpenAIModel.max_tokens see no behavior change (alias attribute returns the same value).
Verified by table-driven smoke test of all four (max_tokens, max_output_tokens) combinations on ModelConfig.
Design reference: doc/working/context-management-workstreams/W1_*.md and W1 ADR. Provider adapters (step 3) and create_agent_info (step 6) follow.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
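The alias backfill can be sketched without Pydantic. The commit uses a Pydantic model_validator on ModelConfig; the stand-in below uses a standard-library dataclass with __post_init__ purely so the example has no third-party dependency, and the class name ModelConfigSketch is hypothetical. What happens when both names are set to different values is not stated in the commit, so the sketch leaves both untouched in that case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelConfigSketch:
    """Stand-in for ModelConfig covering only the two output-cap names."""

    max_output_tokens: Optional[int] = None  # preferred name
    max_tokens: Optional[int] = None         # deprecated alias

    def __post_init__(self) -> None:
        # Backfill whichever side was left unset so both readers agree.
        if self.max_output_tokens is None and self.max_tokens is not None:
            self.max_output_tokens = self.max_tokens
        elif self.max_tokens is None and self.max_output_tokens is not None:
            self.max_tokens = self.max_output_tokens


# Table-driven check over the four (max_tokens, max_output_tokens) cases.
CASES = [
    (None, None, (None, None)),
    (2048, None, (2048, 2048)),
    (None, 4096, (4096, 4096)),
    (2048, 4096, (2048, 4096)),  # both set: left as given (assumption)
]
results = [
    (ModelConfigSketch(max_tokens=mt, max_output_tokens=mot), expected)
    for mt, mot, expected in CASES
]
```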
…p legacy max_tokens
Replaces the long-standing bug where `model_info['max_tokens']` (a deprecated output cap, semantically wrong) was assigned to ContextManagerConfig.token_threshold (an input/context budget). The fix wires ModelCapacityResolver into the runtime path so the context manager receives a real input budget derived from the capacity snapshot.
Changes in backend/agents/create_agent_info.py:
- Add _resolve_input_budget(model_info): pulls operator overrides from the new model_record_t capacity columns, calls resolve_capacity(...) with the CATALOG from backend.consts.capability_profiles, and returns snapshot.provider_input_limit_tokens.
- On ProviderCapabilityUnknown (uncataloged model with no operator-supplied hard capacity), falls back to a safe constant _TOKEN_THRESHOLD_LEGACY_FALLBACK (8192) so the migration window doesn't break existing setups. Logged prominently so admins know to backfill.
- create_agent_config: stops reading model_info['max_tokens'] and passes the resolved input_budget into ContextManagerConfig.token_threshold.
- create_model_config_list: passes all seven new capacity columns (context_window_tokens, max_input_tokens, max_output_tokens, default_output_reserve_tokens, tokenizer_family, capacity_source, capability_profile_version) through to the SDK ModelConfig so end-to-end capacity flow works.
This is the end of the legacy max_tokens-as-context-threshold confusion. ModelConfig.max_tokens stays as a deprecated alias per W1 step 4; this commit removes its only known misuse from the runtime path.
The fallback constant is intentionally conservative: it kicks compression early for unmigrated models so behavior degrades gracefully rather than overflowing provider context. W2 will subtract its 10% uncertainty reserve on top of the resolver's output once enforcement phase begins.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
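The fallback path can be sketched as a thin wrapper. The constant name and value (8192) and the ProviderCapabilityUnknown exception come from the commit; the resolve parameter is a hypothetical callable standing in for the real resolve_capacity(...) call plus the read of snapshot.provider_input_limit_tokens, and the log wording is invented for the example.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)

_TOKEN_THRESHOLD_LEGACY_FALLBACK = 8192


class ProviderCapabilityUnknown(Exception):
    """Raised when no hard capacity is known for a model."""


def resolve_input_budget(model_info: dict, resolve: Callable[[dict], int]) -> int:
    """Return the resolved input budget, or a conservative fallback."""
    try:
        return resolve(model_info)
    except ProviderCapabilityUnknown:
        # Log loudly so admins know which record needs capacity backfilled.
        logger.warning(
            "No capacity known for model %r; using fallback input budget %d",
            model_info.get("model_name"),
            _TOKEN_THRESHOLD_LEGACY_FALLBACK,
        )
        return _TOKEN_THRESHOLD_LEGACY_FALLBACK
```

Only the "capacity unknown" failure is caught here; other resolver errors, such as an impossible configuration, would still propagate to the caller in this sketch.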
…neering methodology and recommendations for Nexent's evolution
…nexent into doc/context-management-upgrade
Restore W1_ADR_Capability_Catalog_Storage_and_Fingerprint.md from doc/context-management-upgrade branch to context-management-workstreams/ADRs directory. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Persist resolved model capacity snapshot metadata on model monitoring records so per-request telemetry can report total window, output reserve, safe input budget, source, tokenizer mode, unknown capabilities, and fingerprint.
- add nullable monitoring columns to ORM, fresh-install SQL, and idempotent upgrade migration
- bind resolved capacity snapshots from agent creation into SDK monitoring context
- enrich LLM, client-level, and record_model_call monitoring rows with snapshot fields
- cover enqueue and ORM payload behavior in SDK monitoring tests
Verification:
- env PYTHONPATH=/home/feiran/nexent/sdk:/home/feiran/nexent:/home/feiran/nexent/backend uv run --project /home/feiran/nexent/backend pytest --rootdir=/home/feiran/nexent --import-mode=importlib /home/feiran/nexent/test/sdk/monitor/test_monitoring.py
- env PYTHONPATH=/home/feiran/nexent/sdk:/home/feiran/nexent:/home/feiran/nexent/backend uv run --project /home/feiran/nexent/backend pytest --rootdir=/home/feiran/nexent --import-mode=importlib /home/feiran/nexent/test/sdk/core/models/test_capacity_resolver.py
- env PYTHONPATH=/home/feiran/nexent/sdk:/home/feiran/nexent:/home/feiran/nexent/backend uv run --project /home/feiran/nexent/backend python -m py_compile backend/agents/create_agent_info.py backend/database/db_models.py sdk/nexent/core/agents/agent_model.py sdk/nexent/core/agents/run_agent.py sdk/nexent/monitor/monitoring.py sdk/nexent/monitor/__init__.py
Co-Authored-By: Codex <codex@openai.com>
Expose provider-supplied token-capacity metadata as advisory candidate fields in discovery responses without promoting them into persisted model records.
- add shared candidate extraction for common context, output, input, reserve, and tokenizer aliases
- wire SiliconFlow, DashScope, TokenPony, and ModelEngine adapters to attach provider_candidate hints when present
- keep prepare_model_dict from persisting provider_candidate fields automatically
- cover positive and no-hint paths for provider discovery
Verification:
- env PYTHONPATH=/home/feiran/nexent/sdk:/home/feiran/nexent:/home/feiran/nexent/backend uv run --project /home/feiran/nexent/backend pytest --rootdir=/home/feiran/nexent --import-mode=importlib /home/feiran/nexent/test/backend/services/providers/test_silicon_provider.py /home/feiran/nexent/test/backend/services/providers/test_dashscope_provider.py /home/feiran/nexent/test/backend/services/providers/test_tokenpony_provider.py /home/feiran/nexent/test/backend/services/providers/test_modelengine_provider.py /home/feiran/nexent/test/backend/services/test_model_provider_service.py::test_prepare_model_dict_does_not_persist_provider_capacity_candidates
- env PYTHONPATH=/home/feiran/nexent/sdk:/home/feiran/nexent:/home/feiran/nexent/backend uv run --project /home/feiran/nexent/backend python -m py_compile backend/services/providers/base.py backend/services/providers/silicon_provider.py backend/services/providers/dashscope_provider.py backend/services/providers/tokenpony_provider.py backend/services/providers/modelengine_provider.py
Co-Authored-By: Codex <codex@openai.com>
Add explicit model-capacity controls to model management so operators can promote known capacity values through the existing model create and update flows.
- extend frontend model types and service request/response mappings for capacity fields
- add shared capacity form controls with tokenizer autocomplete, source badge, profile version text, and legacy max_tokens warning
- wire capacity validation and operator payloads into Add/Edit Model dialogs
- localize labels, tooltips, source names, and validation messages in en/zh
Verification:
- npm run type-check
- node -e "const fs=require('fs'); for (const f of ['frontend/public/locales/en/common.json','frontend/public/locales/zh/common.json']) { JSON.parse(fs.readFileSync(f,'utf8').replace(/^\uFEFF/,'')); } console.log('locale json ok')"
Co-Authored-By: Codex <codex@openai.com>
Review and accept decisions for 5 findings:
- CM-018: structural validation blocks commit, semantic quality routes to W15 SLO
- CM-021: source lineage + mandatory presence validation blocks, semantic coverage to W15
- CM-024: use claim-scoped production readiness terminology
- CM-017: finite initial conflict set with explicit unresolved failure
- CM-025: subagent as independent agent with parent_session_id, async tool delegation, no recursion
Updated: finding-review-decisions.md, findings-registry.md (20/26 complete), W4, W6, W10, W11, W12, W13, parent plan.
Added: pending-findings-decision-sheet.md for decision tracking.
Remaining 6 findings (CM-009, CM-010, CM-014, CM-015, CM-022, CM-026) pending individual discussion.
…lease 1 gates Remove multimodal testing from Release 1 SLO gates. W15 covers text modality only; add modality contracts when specific product requirements emerge. Updated: finding-review-decisions.md, findings-registry.md (21/26 complete), W15, W3, pending-findings-decision-sheet.md.
…ents
Architectural simplification: checkpoints are no longer an independent subsystem (W7). Compression results are stored as compression.snapshot events within the W5 execution event log. Recovery finds the latest compression.snapshot event and replays subsequent events.
Eliminates:
- Independent checkpoint table and CAS concurrency control
- Redis checkpoint cache layer
- W8 checkpoint-specific validation
- CM-014 checkpoint schema migration (covered by CM-005)
- W7 publication outbox for cross-system consistency
Updated: W5 (compression.snapshot event type, recovery flow, dirty-state flush), W6, W8, W9, W13, W14, W15, parent plan, README, review artifacts.
Deleted: W7_Durable_Multi_Worker_Context_State.md.
CM-014 marked N/A (22/26 findings complete).
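The recovery flow described above (find the latest compression.snapshot event, then replay what follows it) can be sketched over an in-memory event list. The Event shape, the seq ordering field, and every event type other than compression.snapshot are hypothetical; the W5 event log's real schema is not shown in this PR conversation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    seq: int                      # monotonically increasing log position
    type: str                     # e.g. "compression.snapshot"
    payload: Dict[str, Any] = field(default_factory=dict)


def recover(events: List[Event]) -> Tuple[Optional[Event], List[Event]]:
    """Return (latest snapshot or None, events to replay after it)."""
    ordered = sorted(events, key=lambda e: e.seq)
    latest_index = None
    for index, event in enumerate(ordered):
        if event.type == "compression.snapshot":
            latest_index = index
    if latest_index is None:
        # No snapshot yet: the whole log has to be replayed.
        return None, ordered
    return ordered[latest_index], ordered[latest_index + 1:]


log = [
    Event(1, "message"),
    Event(2, "compression.snapshot", {"summary": "first"}),
    Event(3, "message"),
    Event(4, "compression.snapshot", {"summary": "second"}),
    Event(5, "tool_call"),
]
snapshot, to_replay = recover(log)
```

With this log, recovery starts from the snapshot at seq 4 and replays only the event at seq 5, which is why no separate checkpoint table is needed.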
…plementation measurement Do not pre-define workload envelopes. After W1-W16 implementation, use W15 measurement infrastructure to collect real performance data and define envelopes based on observed data. No production-scale claim until envelopes are defined. Aligns with CM-004 (measure before optimizing) and CM-011 (evidence-based gates). Progress: 23/26 findings complete.
…mentation measurement Do not pre-define numeric availability, RPO, RTO, rebuild time, queue lag, or storage capacity targets. After W1-W16 implementation, use W15 measurement infrastructure to collect real recovery/availability data per topology and define targets based on observed data. No production-scale claim until targets are defined. Aligns with CM-009 (measure before defining envelopes) and CM-011 (evidence-based gates). Progress: 24/26 findings complete.
…ata validation
W7 retirement eliminates the primary O(history) hashing consumer. Replace content hashing with metadata-based validation at three points:
1. compression.snapshot: partial_after_erasure + version fields
2. W6 materialized cache: snapshot validity + event count + version fields
3. Physical erasure: one-time partial_after_erasure flag
No Merkle trees or segmented hashing needed. Storage-layer integrity handled by database checksums, not W8.
Progress: 25/26 findings complete.
…ed OpenTelemetry spec Consolidate all decision trace requirements (W5, W6, W10, W15) into a single unified telemetry/observability specification (low priority, post-core). Use OpenTelemetry-style spans/attributes/events collected by external observability infrastructure, not product-internal persistence. Updated: W15 (replace decision trace persistence with OTel output), parent plan (replace decision trace references with unified telemetry spec), finding-review-decisions.md, findings-registry.md (26/26 complete), pending-findings-decision-sheet.md. All 26 findings now reviewed and decided.
Step 7 added capacity controls to ModelEditDialog (the OpenAI-API-Compatible
"custom model" edit path) but missed ProviderConfigEditDialog, the dialog
opened by the per-model gear icon under provider-categorized sections
(SiliconFlow / DashScope / TokenPony / ModelEngine). For any model whose
model_factory matches a recognized provider — including the W1 catalog
keys 'dashscope' / 'silicon' / 'tokenpony' — that gear icon was the only
edit path, leaving operators no way to set context_window_tokens et al.
Changes:
- ProviderConfigEditDialog: accept optional initialCapacity and
hideCapacityFields props; render ModelCapacityFields when supported;
include capacity payload in onSave callback shape.
- modelService.updateBatchModel: accept and forward the 6 capacity
fields (context_window_tokens, max_input_tokens, max_output_tokens,
default_output_reserve_tokens, tokenizer_family, capacity_source) to
the existing batch_update_models endpoint, which already passes through
arbitrary update_data per backend/services/model_management_service.py
line 347.
- ModelDeleteDialog single-model gear path: pass current capacity values
from selectedSingleModel as initialCapacity, and forward saved capacity
fields into the updateBatchModel call.
- ModelDeleteDialog provider-level "Edit Config" path: pass
hideCapacityFields={true} since handleProviderConfigSave applies
settings batch-wise to all models from one provider and per-model
capacity is not a batch concept.
No behavior change for callers that don't pass initialCapacity (backward
compatible). Verified with npm run type-check.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…odules pollution
Two tests (test_get_models_llm_success, test_get_models_embedding_success) failed intermittently when test_model_provider_service.py ran after test_capacity_resolver.py or test_silicon_provider.py.
Root cause: silicon_provider is loaded under two distinct sys.modules keys: `services.providers.silicon_provider` (the path production code uses) and `backend.services.providers.silicon_provider` (the path some test files use). Each binding gets its own `SILICON_GET_URL` attribute because `silicon_provider.py` does `from consts.provider import SILICON_GET_URL`, which copies the value into the importing module's namespace. When both keys are present, mock.patch targeting only the `backend.` path silently fails to override the value used by the production code path that SiliconModelProvider.get_models executes.
Fix: introduce _patch_provider_module_constant context manager that patches the named attribute on every loaded copy of the module. Apply to all four SILICON_GET_URL mock.patch sites in this file.
Verification:
- 289 tests pass under the previously-failing combined order: test/sdk/core/models/test_capacity_resolver.py + test/sdk/monitor/test_monitoring.py + test/backend/services/providers/ + test/backend/services/test_model_provider_service.py
The helper is order-independent and safe even when one of the two sys.modules paths is absent.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
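A helper of the kind described can be sketched with contextlib.ExitStack and mock.patch.object. This is not the PR's _patch_provider_module_constant itself: the function name, its parameters, and the fake module keys in the usage section are assumptions chosen so the example runs on its own.

```python
import sys
import types
from contextlib import ExitStack, contextmanager
from unittest import mock


@contextmanager
def patch_module_constant(module_names, attr, value):
    """Patch attr on every listed module that is currently loaded."""
    with ExitStack() as stack:
        for name in module_names:
            module = sys.modules.get(name)
            # Skip keys that are absent so the helper is order-independent.
            if module is not None and hasattr(module, attr):
                stack.enter_context(mock.patch.object(module, attr, value))
        yield
    # Leaving the ExitStack undoes every patch that was applied.


# Usage: two fake modules stand in for the double-imported provider module.
KEYS = ["services.providers.fake_provider",
        "backend.services.providers.fake_provider"]
for key in KEYS:
    fake = types.ModuleType(key)
    fake.SILICON_GET_URL = "https://real.example/models"
    sys.modules[key] = fake

with patch_module_constant(KEYS + ["not.loaded.module"],
                           "SILICON_GET_URL", "https://mock.example/models"):
    patched = [sys.modules[k].SILICON_GET_URL for k in KEYS]

restored = [sys.modules[k].SILICON_GET_URL for k in KEYS]
```

Patching each module object directly avoids depending on which import path a given test file happened to trigger first.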
Align provider URL detection with the frontend hint table in
frontend/const/modelConfig.ts and expand the catalog:
- HOST_PROVIDER_PATTERNS: add aliyuncs, deepseek, jina, bytedance and
broaden api.openai.com to openai; drop the openrouter -> modelengine
guess (OpenRouter is a multi-provider gateway, base_url alone cannot
identify the backing model).
- pick_provider_from_base_url now substring-matches the lower-cased
full URL instead of just the hostname, mirroring the frontend
detectProviderFromUrl helper so self-hosted reverse proxies that
embed the provider in the path are recognised.
- CATALOG: add ("deepseek", "deepseek-v4-flash") and
("deepseek", "deepseek-v4-pro") with the 1M / 384K specs from
https://api-docs.deepseek.com/zh-cn/quick_start/pricing. Realign
deepseek-chat and deepseek-reasoner to the same numbers because they
alias to deepseek-v4-flash non-thinking and thinking modes per
DeepSeek docs; note the 2026-07-24 deprecation in a comment so we
remove them after the cutover. Add ("dashscope", "qwen3.7-max")
cross-checked against help.aliyun.com/zh/model-studio/models and
llm-stats.com/models/qwen3.7-max. Drop the obsolete
("silicon", "deepseek-ai/DeepSeek-V4-Flash") entry. CATALOG_REVISION
bumped to 2026-06-23.4.
- test_model_capacity_suggestion_service: cover the extended host
patterns (deepseek, jina, Azure OpenAI, broader aliyuncs, reverse
proxy) and the dashscope-over-aliyuncs ordering.
- create_agent_info: drop leftover merge conflict markers around the
create_agent_run_info signature.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
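The URL matching behavior described in this commit (substring match against the lower-cased full URL, first matching pattern wins) can be sketched as follows. The pattern table is illustrative only: the real HOST_PROVIDER_PATTERNS contents and the provider each substring maps to are not shown here, and the "aliyun-generic" label in particular is a placeholder.

```python
from typing import Optional, Tuple

# Ordered: earlier entries win, so the more specific "dashscope" is
# checked before the broader "aliyuncs". Mappings are placeholders.
HOST_PROVIDER_PATTERNS: Tuple[Tuple[str, str], ...] = (
    ("dashscope", "dashscope"),
    ("aliyuncs", "aliyun-generic"),
    ("deepseek", "deepseek"),
    ("openai", "openai"),
    ("jina", "jina"),
)


def pick_provider_from_base_url(base_url: str) -> Optional[str]:
    """Return the first provider whose pattern occurs anywhere in the URL."""
    haystack = base_url.lower()
    for needle, provider in HOST_PROVIDER_PATTERNS:
        if needle in haystack:
            return provider
    # Gateways such as OpenRouter match nothing and stay unidentified.
    return None
```

Matching the whole URL rather than only the hostname is what lets a reverse proxy such as https://proxy.internal.example/deepseek/v1 resolve to a provider; the trade-off is that any URL containing one of the substrings by coincidence will also match.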
Single-model add: stop forwarding the hidden default `form.provider`
("modelengine") as `provider_hint` to /suggest-capacity. The dropdown
is only rendered in batch mode, so single-mode requests were silently
pinning catalog lookup to modelengine and never falling through to the
base_url inference.
Apply/save: stop overwriting `provider` / `model_factory` / single-model
`source` with `suggestion.suggested_provider`. The catalog's provider
namespace (deepseek, openai, jina, volcengine, ...) is a superset of
the frontend dropdown values (modelengine / silicon / dashscope /
tokenpony / custom); writing an unknown one back made the model vanish
from the active list and the edit dropdown, and reclassified custom
models that fuzzy-matched a known provider.
Capacity numerics (context_window_tokens, max_output_tokens, reserve,
tokenizer_family) and `canonical_model_name` are still applied --
that is the suggestion's actual job.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`capacityFormFromModel` previously auto-promoted `model.max_tokens` into
the `maxOutputTokens` form field whenever the new column was empty. That
made the edit dialog show a value the user never approved, and once
saved, persisted the legacy number into max_output_tokens as if the
operator had typed it in.
Now the legacy value is surfaced via a new `legacyMaxTokensCandidate`
prop on ModelCapacityFields. When the input is empty and the record has
a legacy value, the panel renders a warning Alert with the actual number
plus an [Apply] button; clicking it writes the value into the form and
the prompt clears itself. The prompt is independent of the suggest-capacity
flow: it shows whenever the condition holds, with no extra trigger.
Two call sites in ModelEditDialog (main edit dialog and
ProviderConfigEditDialog) pass the candidate. Batch flows in
ModelAddDialog already avoided passing legacy max_tokens, so they need
no change.
Locale keys added: model.dialog.capacity.legacyMaxTokensDetected (zh/en,
with {{value}} interpolation) and .apply.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Four small revisions in the explainer to match what the code actually does now -- no behavioral claims, just removing stale "future work" hedges and one outright-wrong UI-visibility note.
- §2.1 footnote: defaultOutputReserveTokens IS rendered in both Add and Edit modes (see ModelCapacityFields.tsx:399-407); update the note about the Add flow and mention that the W11 suggest button pre-fills all four capacity fields on a catalog hit.
- §3 third paragraph: same correction; clarify reserve only falls back to the SDK default (4096) when the operator explicitly leaves the field empty, not because the UI hides it.
- §4 example 4 fix: W11's capacity-coverage badge and the "lacks capacity" hint in the delete / edit panels are shipped, not future work; "suggest" is the one-click fix for catalog-known rows.
- §5 troubleshooting row about new models getting truncated at 4K: cause/fix rewritten -- Add now exposes the field, so the failure mode is "operator left it empty" and the preferred remedy is the W11 suggest button (manual edit still listed as fallback).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…into feature/w11-capacity-suggestion
… warnings, and legacy hygiene fixes (pull request #8)
feat(W11): capacity suggestion API, frontend UX, catalog coverage warnings, and legacy hygiene fixes
…ement-upgrade-no-working-docs # Conflicts: # backend/agents/create_agent_info.py # test/sdk/core/models/test_openai_llm.py
…rfaces
The Tokenizer Family input was rendered on Add, Edit, batch Add, and the provider-level "bulk modify config" surfaces. Per the W1 ADR the value is consumed only by `sdk/nexent/core/models/tokenizer_registry.resolve`, which today has no registered adapters and unconditionally returns `(FallbackEstimator, "estimated")` -- so the input never affects runtime behavior and forcing operators to type/choose it surfaces an irrelevant implementation detail.
Hidden, not removed: the field stays in form state, payload builders, batch row mapping, and DB. W11 catalog suggestions still write it silently, existing DB values are still preserved through edits, and any future adapter registration becomes a one-line change with no UI work.
Backend/SDK fully decoupled:
- backend `consts/model.py` request schemas keep `tokenizer_family`
- catalog entries in `consts/capability_profiles.py` still set it
- SDK consumes it via `tokenizer_registry.resolve` and W2's `_UNKNOWN_CAPABILITIES_REQUIRING_RESERVE` continues to trigger the 10% reserve when counting_mode is estimated
Changes in this commit:
- ModelCapacityFields.tsx: drop the AutoComplete input block + the `TOKENIZER_FAMILY_OPTIONS` constant + the `AutoComplete` import + the `hideTokenizer` prop (interface + destructure)
- ModelEditDialog.tsx: drop the `hideTokenizer` prop from the bulk-apply call site and the now-stale "Tokenizer hidden" comment
- zh/en common.json: drop the two unused locale keys
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…aults
Both fields are no longer required at any of the six capacity write
surfaces. An empty input renders a gray placeholder showing what value
would land if the user saves without typing; the form state stays "" so
nothing is silently mutated client-side. At save time, the wire-payload
builder substitutes the default into the API call only when the operator
truly left the field empty -- otherwise the typed value (or existing DB
value loaded into the form) is sent unchanged.
Defaults chosen to mirror the existing SDK fallbacks so observed runtime
behavior does not change when defaults land:
- DEFAULT_CONTEXT_WINDOW_TOKENS = 32_768
(matches `_TOKEN_THRESHOLD_LEGACY_FALLBACK` in capacity_resolver.py)
- DEFAULT_MAX_OUTPUT_TOKENS = 4_096
(matches `_DEFAULT_REQUESTED_OUTPUT_TOKENS` in capacity_resolver.py)
Constants exported from ModelCapacityFields.tsx so the snake_case mirror
in ModelAddDialog stays in sync.
Six-surface contract -- single-row write paths apply defaults; the
bulk-apply broadcast preserves "empty means do not broadcast":
- 1) ModelAddDialog single-add form -> capacityFormToSnakePayload
applies defaults
- 2) ModelEditDialog single-edit form -> buildCapacityPayload
(applyDefaults=true default)
- 3) ModelAddDialog batch-import top-defaults panel ->
capacityFormToSnakePayload(form) for batchDefaults; per-row
`model.X ?? batchDefaults.X` now never falls through to undefined
in the gate at isFormValid (the gate becomes defense-in-depth,
comment updated)
- 4) ModelAddDialog batch per-row gear (Settings Modal) ->
capacityFormToSnakePayload(modelCapacity); preload-from-row-or-
batch-default means "no-op save" already carries non-empty input
and goes through toInt unchanged. Only "row=NULL plus batch-empty"
materializes the defaults
- 5) ProviderConfigEditDialog per-row gear
(hideCapacityFields=false) -> buildCapacityPayload(capacityForm)
- 6) ProviderConfigEditDialog "modify config" bulk-apply
(hideCapacityFields=true) -> buildCapacityPayload(form,
{ applyDefaults: false }); `applyDefaultsOnEmpty={false}` on the
panel suppresses the gray placeholder so operators do not read
"empty means 32K/4K will be broadcast"
requiredFields stripped from every validateCapacityForm call site
and every ModelCapacityFields prop usage. validateCapacityForm still
enforces the data-shape checks (positive integers, output <= window,
reserve <= output) -- those are not affected by removing the
"must be non-empty" requirement.
Backend and SDK unchanged: the wire payload still ships the same
snake_case keys; the only difference is that on save, those keys are
guaranteed to carry a number (not null) for single-row writes, which
makes the `_is_bare_capacity_model` badge and the W11 catalog-coverage
banner clear themselves automatically for new rows.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
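The "defaults only when the field was left empty" rule can be sketched as a small payload builder. The real builders (buildCapacityPayload, capacityFormToSnakePayload) live in the TypeScript frontend; the Python rendition below only mirrors the logic described in this commit, and its function name, argument shape, and string-valued form fields are assumptions. The two default constants and their values are taken from the commit message.

```python
from typing import Dict, Optional

DEFAULT_CONTEXT_WINDOW_TOKENS = 32_768
DEFAULT_MAX_OUTPUT_TOKENS = 4_096


def _to_int(raw: Optional[str]) -> Optional[int]:
    """Convert a form string to int; empty or missing input becomes None."""
    text = (raw or "").strip()
    return int(text) if text else None


def build_capacity_payload(
    form: Dict[str, str], apply_defaults: bool = True
) -> Dict[str, Optional[int]]:
    """Build the snake_case wire payload from untouched form state."""
    window = _to_int(form.get("context_window_tokens"))
    output = _to_int(form.get("max_output_tokens"))
    if apply_defaults:
        # Single-row writes: substitute defaults only for empty fields.
        if window is None:
            window = DEFAULT_CONTEXT_WINDOW_TOKENS
        if output is None:
            output = DEFAULT_MAX_OUTPUT_TOKENS
    # With apply_defaults=False (bulk apply), None means "do not broadcast".
    return {"context_window_tokens": window, "max_output_tokens": output}
```

Because the form dict is never modified, the empty string stays in form state and the substitution happens only at the moment the payload is built.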
Four failure clusters reported by CI after merging upstream/develop
into this PR branch:
1) test_prepare_agent_run -- assert_called_once_with(...) on
create_agent_run_info was missing `tool_params=None`. Production
code at agent_service.py:2245 now passes
`tool_params=agent_request.tool_params` and AgentRequest defaults
`tool_params` to None when the fixture does not set it. Add the
kwarg to the expected call.
2) update_agent_info_impl_* (14 tests) -- W2 added
`_validate_requested_output_tokens_for_agent(request, tenant_id)`
at agent_service.py:1164. The validator reads
`request.requested_output_tokens` and compares it against the
model's `max_output_tokens`. The existing tests build their
request via `MagicMock(spec=AgentInfoRequest)` and never set
`requested_output_tokens`, so:
- either the spec exposes the field as a fresh MagicMock and the
`> max_output_tokens` comparison fails with TypeError,
- or Pydantic-v2 field introspection through dir() omits the
name and the access raises AttributeError.
Both branches are unrelated to what these tests cover, so this
commit adds a module-level autouse fixture that stubs the
validator to a no-op. Tests that want to exercise the validator
in the future can still patch it locally; module-level autouse
loses to per-test patches.
3) test_import_agent_by_agent_id_publish_version_error --
import_agent_by_agent_id reads `import_agent_info.requested_output_tokens`
directly at agent_service.py:1874 (no validator involved), so the
autouse fixture from (2) does not help. Set
`mock_agent_info.requested_output_tokens = None` on the existing
`MagicMock(spec=ExportAndImportAgentInfo)` so the access returns a
defined value instead of raising AttributeError.
4) test_create_model_success / test_create_model_deep_thinking_success
(test_nexent_agent.py) -- W1 renamed the SDK's OpenAIModel kwarg
from `max_tokens` to `max_output_tokens`. The two `assert_called_once_with`
blocks still asserted on the old name. Updated to `max_output_tokens`.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
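The two failure branches described in item 2, and the effect of stubbing the validator, can be reproduced with `unittest.mock` alone. This is a minimal sketch, not the project's test code: `Service`, `DefaultedRequest`, and `AnnotatedRequest` are hypothetical stand-ins, and the annotation-only class merely imitates a field that `dir()` does not list.

```python
from typing import Optional
from unittest.mock import MagicMock, patch


class DefaultedRequest:
    # Class attribute exists, so a spec'd mock exposes it as a child MagicMock.
    requested_output_tokens: Optional[int] = None


class AnnotatedRequest:
    # Annotation only: no class attribute, so dir() omits the name.
    requested_output_tokens: Optional[int]


class Service:
    """Hypothetical stand-in for the module that owns the validator."""

    @staticmethod
    def validate_requested_output_tokens(request, max_output_tokens: int) -> None:
        requested = request.requested_output_tokens
        if requested is not None and requested > max_output_tokens:
            raise ValueError("requested_output_tokens exceeds max_output_tokens")

    @staticmethod
    def update_agent(request) -> str:
        Service.validate_requested_output_tokens(request, max_output_tokens=4096)
        return "updated"


def failure_mode(request_cls) -> str:
    """Name the exception raised when an unconfigured spec'd mock is validated."""
    try:
        Service.update_agent(MagicMock(spec=request_cls))
    except (TypeError, AttributeError) as exc:
        return type(exc).__name__
    return "ok"


def update_with_stubbed_validator(request_cls) -> str:
    # Same effect as the autouse fixture body: the validator becomes a no-op.
    with patch.object(Service, "validate_requested_output_tokens", return_value=None):
        return Service.update_agent(MagicMock(spec=request_cls))
```

In a pytest module the `patch.object(...)` context would typically live inside a fixture declared with `autouse=True`, which is the shape the commit describes.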
…ponse shape
The production response shape at agent_service.py:1112 now includes
`requested_output_tokens` (added by W2). The mocked
`search_agent_info` payload does not include the key, so the function
returns `None` for it via `.get(...)`. Add the key to expected_result
to match.
test_import_agent_by_agent_id_publish_version_error still fails for an
unrelated reason: `create_agent`'s `mock.return_value` is configured to
`{"agent_id": 100}` but the test result shows `create_agent(...)`
returning the auto-MagicMock instead of the dict. Static analysis of
the patch wiring shows nothing wrong; needs a local repro to inspect
the mock state. Saving the partial progress first.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…lish_version_error
The test claimed to verify "import_agent_by_agent_id swallows
publish_version_impl exceptions and still returns the new agent id",
but the three lines that actually configure the patched mocks were
missing from the body:
mock_query_tools.return_value = []
mock_create.return_value = {"agent_id": 100}
mock_publish.side_effect = Exception("Publish error")
Without them every patched mock returned the default auto-MagicMock,
so `create_agent(...)` returned a MagicMock instead of the dict,
`new_agent["agent_id"]` returned `MagicMock.__getitem__()`,
publish_version_impl never raised, and `assert result == 100` failed
against the MagicMock return value.
Likely lost during the upstream/develop merge that introduced
`requested_output_tokens` to the import flow (the missing-attribute
error surfaced first, masking the deeper issue). Adding the three
configuration lines back lets the test exercise the actual code path
it was designed to cover.
Verified locally: full test_agent_service.py passes 217/217.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
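The masking effect described in this commit follows from how unconfigured `MagicMock` objects behave. The sketch below uses a hypothetical, heavily reduced `import_agent` function (not the real `import_agent_by_agent_id`) to show the difference the three restored lines make.

```python
from unittest.mock import MagicMock


def import_agent(create_agent, publish_version):
    """Hypothetical reduction of the import flow: create, then best-effort publish."""
    new_agent = create_agent()
    try:
        publish_version(new_agent["agent_id"])
    except Exception:
        pass  # publish failures are swallowed; the new id is still returned
    return new_agent["agent_id"]


# Unconfigured mocks: every call and subscript yields another auto-created MagicMock.
bare_create, bare_publish = MagicMock(), MagicMock()
bare_result = import_agent(bare_create, bare_publish)

# Configured mocks, mirroring the restored configuration lines.
create = MagicMock(return_value={"agent_id": 100})
publish = MagicMock(side_effect=Exception("Publish error"))
configured_result = import_agent(create, publish)
```

With the bare mocks the function returns a `MagicMock`, so a comparison against `100` cannot succeed; with the configured mocks the publish error is raised, swallowed, and the real id comes back.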
🔍 Code Review Comments: 1. [Security/Vulnerability] 2. [Logic flaw] 3. [Code style]
YehongPan
left a comment
Code Review
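- [Security/Vulnerability] `_CAPACITY_WARNING_EMITTED` is a module-level `set()` with no lock protection, so it is subject to a race condition in multi-threaded or multi-coroutine environments. Use `threading.Lock`, or confirm that the set operations involved are atomic.
- [Logic flaw] In `_resolve_input_budget`, when `model_info.get("model_factory")` returns `None`, `provider` becomes an empty string. `resolve_capacity` may not accept an empty provider, which leads to a silent fallback to the legacy threshold and masks configuration errors. Consider logging an explicit WARNING when the provider is empty.
- [Code style] In the `create_agent_config` function signature, the `request_requested_output_tokens` parameter is missing its leading indentation, which violates the PEP 8 parameter alignment rule.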
…edup with a lock
Two small fixes reported during review:
1) `request_requested_output_tokens` in the `create_agent_config`
signature was flush-left (zero indent) while every other parameter
sits at four-space indent. Python's parser tolerates this inside
parentheses, but linters and humans both stumble on it. Re-indent
to align with the rest of the signature.
2) `_CAPACITY_WARNING_EMITTED` is a per-process dedup set for the
"model has no W1/W2 capacity configured" operator warning. The
`if dedup_key in S: return; S.add(dedup_key)` pattern was a
check-then-add race: two threads on the same model could both pass
the membership test before either added, leading to duplicate
WARNING lines that defeat the per-process dedup contract. Wrap the
test-and-set in a `threading.Lock`. The lock is released before
`logger.warning(...)` so warning I/O is not serialised across
paths; only the dedup decision is.
Verified locally: test/backend/agents/test_create_agent_info.py
171/171 passes.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
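The locked test-and-set from fix 2 can be sketched as follows. This is an illustration under stated assumptions rather than the committed code: `_CAPACITY_WARNING_LOCK`, `warn_once`, and `emit_concurrently` are hypothetical names, and only `_CAPACITY_WARNING_EMITTED` comes from the commit message.

```python
import logging
import threading
from concurrent.futures import ThreadPoolExecutor

logger = logging.getLogger("capacity_sketch")

_CAPACITY_WARNING_EMITTED: set = set()
_CAPACITY_WARNING_LOCK = threading.Lock()  # hypothetical name, not from the PR


def warn_once(dedup_key: str, message: str) -> bool:
    """Log `message` at most once per process for `dedup_key`.

    Returns True only for the caller that actually emitted the warning.
    """
    with _CAPACITY_WARNING_LOCK:
        # Membership test and add form one critical section.
        if dedup_key in _CAPACITY_WARNING_EMITTED:
            return False
        _CAPACITY_WARNING_EMITTED.add(dedup_key)
    # Lock already released: log I/O does not block other callers.
    logger.warning(message)
    return True


def emit_concurrently(dedup_key: str, workers: int = 16) -> int:
    """Call warn_once from several threads and count how many emitted."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(
            pool.map(
                lambda _: warn_once(dedup_key, f"no capacity configured for {dedup_key}"),
                range(workers),
            )
        )
    return sum(results)
```

Keeping `logger.warning(...)` outside the `with` block matches the design choice stated in the commit: only the dedup decision is serialised, not the log write.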
Items 1 and 3 are the same as what @JasonW404 mentioned; that issue was fixed in commit https://github.com/ModelEngine-Group/nexent/commit/72e378eaafab2eabf8555357984ca3e6436094c2. Item 2 is fixed in 10a41ca.
(1509, 'ASSET_OWNER', 'VISIBILITY', 'LEFT_NAV_MENU', '/agent-space', '/resource-space'),
(1510, 'ASSET_OWNER', 'VISIBILITY', 'LEFT_NAV_MENU', '/mcp-space', '/resource-space'),
(1511, 'ASSET_OWNER', 'VISIBILITY', 'LEFT_NAV_MENU', '/skill-space', '/resource-space'); No newline at end of file
(1511, 'ASSET_OWNER', 'VISIBILITY', 'LEFT_NAV_MENU', '/skill-space', '/resource-space');




Overview
Delivers the first three workstreams of the context-management production plan (W1, W2, W11).
Replaces the conflated `max_tokens` field with explicit context/input/output semantics, enforces the resolved budget at the LLM dispatch boundary, and gives operators a one-click "Suggest" path to populate capacity from an approved catalog.
136 commits, 77 files changed, ~+8.6K / -0.6K LOC. Working design notes are intentionally excluded from the diff (kept in `doc/working/` locally for collaboration).
What changes
W1: Correct token-capacity configuration
- Splits `max_tokens` into five typed fields on `model_record_t`: `context_window_tokens`, `max_input_tokens`, `max_output_tokens`, `default_output_reserve_tokens`, `tokenizer_family` (+ `capacity_source`, `capability_profile_version` provenance).
- `ModelCapacityResolver` (`sdk/nexent/core/models/capacity_resolver.py`) produces a `ModelCapacitySnapshot` with `provider_input_limit_tokens` derived from `min(max_input_tokens, context_window_tokens - requested_output_tokens)`.
- `CATALOG` in `backend/consts/capability_profiles.py` covers OpenAI / DashScope / Silicon / DeepSeek production deployments.
- `max_tokens` is retained as a deprecated alias of `max_output_tokens` for migration compatibility; it is never used as a context threshold after this PR.
W2: Output and safety capacity reserve
- `SafeInputBudgetCalculator` (`sdk/nexent/core/models/capacity_budget.py`) emits `SafeInputBudgetSnapshot { hard_input_budget, soft_input_budget, uncertainty_reserve, requested_output }` per dispatch.
- `soft_limit_ratio` defaults to 0.8 (CM-027); per-tenant override via `tenant_config_t.config_key = 'context.soft_limit_ratio'`.
- Per-agent (`ag_tenant_agent_t.requested_output_tokens`) and per-request (`AgentRequest.requested_output_tokens`) output-reserve overrides (CM-028), with validation against the model's `max_output_tokens`.
- `sdk/nexent/core/models/openai_llm.py:391-412` rejects caller-supplied `max_tokens` that does not match the W2 snapshot's `requested_output_tokens` and pins the snapshot value before the provider call. This is the trusted server-side boundary required by the production plan.
W11: Capacity suggestion on model add (post-acceptance follow-up to W1)
- `POST /model/suggest-capacity` with catalog-exact / normalized / fuzzy matching, and base-url → provider inference mirroring the frontend `PROVIDER_HINTS` map (10 substring patterns).
- `GET /model/capacity-coverage` surfaces "bare" LLM/VLM rows (used by the inline banner in the model management page and the provider management dialog).
- `max_tokens` migration prompt with explicit Apply button (no more silent promotion); Tokenizer Family input hidden on all four model-config surfaces (catalog hits still write the value silently; the field is consumed by `tokenizer_registry`, which has no registered adapters today, so forcing operators to type it has no runtime effect).
- `context_window_tokens` and `max_output_tokens` are no longer required in the UI. Empty input shows a gray placeholder (`32_768` / `4_096`, matching the SDK fallback constants `_TOKEN_THRESHOLD_LEGACY_FALLBACK` and `_DEFAULT_REQUESTED_OUTPUT_TOKENS`). On Save, defaults are substituted into the wire payload so the bare-capacity badge clears automatically. Verified across all six write surfaces (single add/edit, batch top-defaults, batch per-row gear, provider per-row gear, provider bulk-apply broadcast); the bulk-apply path preserves "empty = do not broadcast" semantics.
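The W1 limit derivation and the W2 dispatch check described above can be sketched together. This is a simplified illustration, not the SDK implementation: `CapacitySketch`, `OutputBudgetMismatch`, and `pin_max_tokens` are hypothetical names, and only the `min(...)` formula and the reject-then-pin behaviour are taken from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapacitySketch:
    """Hypothetical, reduced stand-in for the capacity snapshot."""

    context_window_tokens: int
    max_input_tokens: int
    requested_output_tokens: int

    @property
    def provider_input_limit_tokens(self) -> int:
        # Formula quoted in the W1 section of the description.
        return min(
            self.max_input_tokens,
            self.context_window_tokens - self.requested_output_tokens,
        )


class OutputBudgetMismatch(ValueError):
    """Raised when a caller-supplied max_tokens disagrees with the snapshot."""


def pin_max_tokens(call_kwargs: dict, snapshot: CapacitySketch) -> dict:
    """Reject a mismatching caller value, then pin the snapshot value."""
    supplied = call_kwargs.get("max_tokens")
    if supplied is not None and supplied != snapshot.requested_output_tokens:
        raise OutputBudgetMismatch(
            f"max_tokens={supplied} does not match "
            f"requested_output_tokens={snapshot.requested_output_tokens}"
        )
    # Return a copy so the caller's dict is not mutated.
    return {**call_kwargs, "max_tokens": snapshot.requested_output_tokens}
```

For example, a 32,768-token window with a 30,000-token input cap and a 4,096-token output request gives an input limit of `min(30000, 32768 - 4096) = 28672`, so the window minus the output reserve is the binding constraint rather than the input cap.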
Schema migrations
Six idempotent SQL files under `docker/sql/`:
- `v2.2.0_0615_add_capacity_fields_to_model_record_t.sql`: five W1 columns
- `v2.2.0_0615_add_capacity_snapshot_to_model_monitoring_record_t.sql`: W1 observability
- `v2.2.0_0617_add_requested_output_tokens_to_ag_tenant_agent_t.sql`: W2 agent override
- `v2.2.0_0617_add_w2_budget_snapshot_to_model_monitoring_record_t.sql`: W2 observability
- `v2.2.0_0617_backfill_w2_capacity_from_w1_catalog.sql`: one-time backfill from catalog
- `v2.2.0_0618_reconcile_max_tokens_alias.sql`: coerce `max_tokens` ↔ `max_output_tokens`
All use `IF NOT EXISTS` / `ON CONFLICT DO NOTHING`. Existing rows with NULL capacity continue to work through the SDK fallback until edited.
Backward compatibility
- `model_record_t` rows with `max_tokens` populated and the new W1 columns NULL keep working; the SDK promotes the legacy column as `max_output_tokens` at resolve time.
- `chat.completions.create` callers that previously passed `max_tokens` are now rejected unless the value matches the W2 snapshot. No other production call sites changed in this PR; the broader dispatch hardening across remaining bypasses is the W10 follow-up.
- `CAPACITY_SUGGESTION_ENABLED` gates the W11 endpoints; turning it off restores pre-W11 behavior with no UI surface for catalog suggestion.
Test coverage
- `test_capacity_resolver.py`, `test_capacity_budget.py`, `test_openai_llm.py` (dispatch enforcement), `test_monitoring.py` (snapshot fields).
- `test_model_capacity_suggestion_service.py`, `test_model_management_service.py`, `test_config_utils.py`, provider tests for the four batch-add adapters.
- `tsc --noEmit` clean; six-surface matrix manually verified.
Notes for reviewers
- `tokenizer_registry` is intentionally empty in this PR: `tokenizer_family` is persisted from the catalog and consumed downstream, but `resolve()` returns `(FallbackEstimator, "estimated")` for every value today. The 10% uncertainty reserve fires uniformly as a consequence; this is the documented W1 ADR conservative path until verified adapters land.
- Dropped `provider_hint` from the single-add dialog (the hidden default `form.provider="modelengine"` would otherwise pin catalog lookup to ModelEngine for every operator). The hint is only sent in batch mode, where the dropdown is user-controlled.
- `model_factory`: the catalog's `suggested_provider` namespace (`deepseek`, `openai`, `jina`, ...) is a superset of the frontend dropdown's allowed values, and writing an unknown one back made models vanish from the active list / edit dropdown (root cause of the per-row reclassification bug we fixed during testing).
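The first note above, about an empty registry always resolving to the estimator, can be illustrated with a small sketch. It assumes the char/4 heuristic mentioned in the W1 commit message; rounding up, the `"exact"` label for registered adapters, and the function names are assumptions made for this example, not the SDK's actual API.

```python
import math
from typing import Callable, Dict, Tuple

TokenCounter = Callable[[str], int]

# Empty registry, as in this PR: no verified adapters are registered yet.
REGISTRY: Dict[str, TokenCounter] = {}


def fallback_estimate(text: str) -> int:
    """char/4 heuristic; rounding up is an assumption of this sketch."""
    return math.ceil(len(text) / 4)


def resolve(tokenizer_family: str) -> Tuple[TokenCounter, str]:
    """Return (counter, counting_mode) for a tokenizer family."""
    adapter = REGISTRY.get(tokenizer_family)
    if adapter is None:
        # Unknown family: fall back to the estimator and say so.
        return fallback_estimate, "estimated"
    return adapter, "exact"  # "exact" is a hypothetical label
```

Because every lookup currently misses, the counting mode is `"estimated"` for all models, which is why the uncertainty reserve applies uniformly until adapters are registered.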