Honor provider cooldown hints and add gateway OpenAI compat#1233

Merged
gh-xj merged 10 commits into eastreams:dev from littlepenguin66:feat/983-996-rate-limit-openai-compat
Apr 14, 2026

Conversation

@littlepenguin66
Collaborator

Summary

  • Problem:
    The provider cooldown path ignored shorter provider retry hints, and the gateway still lacked a complete OpenAI-compatible /v1/chat/completions + /v1/models surface that inherited shared runtime semantics.
  • Why it matters:
    Ignoring provider rate-limit hints causes unnecessary waits or premature retries, and a provider-only gateway compat path drifts from the shared turn/runtime behavior, usage reporting, and streaming semantics that OpenAI-compatible clients expect.
  • What changed:
    Added provider rate-limit observation/parsing and threaded it through transport/failover/cooldown selection; made cooldown resolution honor provider hints when present; added the gateway OpenAI-compatible surface; routed accepted compat requests through the shared runtime path with usage propagation; disambiguated duplicate model ids; ensured streaming error termination with [DONE]; tightened Feishu/WeCom streaming request assertions.
  • What did not change (scope boundary):
    Tool calling on the OpenAI-compatible gateway surface is still explicitly unsupported; channel entrypoints did not gain a new high-level fallback integration test for streaming-disabled transports.
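
The cooldown precedence described above can be sketched as follows. This is a minimal illustration, not the code in crates/app/src/provider/rate_limit.rs: it assumes the provider hint arrives as an HTTP `Retry-After` header in delta-seconds form, and the function names `parse_retry_after_secs` and `resolve_cooldown` are hypothetical.

```rust
use std::time::Duration;

/// Hypothetical helper: parses a `Retry-After` header value given in
/// delta-seconds form. The HTTP-date form is not handled in this sketch
/// and yields `None`.
fn parse_retry_after_secs(value: &str) -> Option<Duration> {
    value.trim().parse::<u64>().ok().map(Duration::from_secs)
}

/// Hypothetical helper: picks the cooldown for a provider candidate.
/// A provider hint wins whenever it is present, even if it is shorter
/// than the policy default; the policy default is only a fallback.
fn resolve_cooldown(policy_default: Duration, provider_hint: Option<Duration>) -> Duration {
    provider_hint.unwrap_or(policy_default)
}
```

With a 30-second policy default, a parsed hint of 2 seconds would give a 2-second cooldown, while a missing or unparseable hint falls back to 30 seconds.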

Linked Issues

Change Type

  • Bug fix
  • Feature
  • Refactor
  • Documentation
  • Security hardening
  • CI / workflow / release

Touched Areas

  • Kernel / policy / approvals
  • Contracts / protocol / spec
  • Daemon / CLI / install
  • Providers / routing
  • Tools
  • Browser automation
  • Channels / integrations
  • ACP / conversation / session runtime
  • Memory / context assembly
  • Config / migration / onboarding
  • Docs / contributor workflow
  • CI / release / workflows

Risk Track

  • Track A (routine / low-risk)
  • Track B (higher-risk / policy-impacting)

If Track B, fill these in:

  • Risk notes:
    This changes provider cooldown behavior, provider request parsing, shared turn/runtime wiring, gateway request validation, and streaming behavior.
  • Rollout / guardrails:
    Gateway compat rejects unsupported tool fields and non-user-ending message sequences; duplicate model ids are surfaced as profile_id:model; package-level tests cover provider, gateway, and channel paths.
  • Rollback path:
    Revert commit 6b617e3f (and follow-up doc commit if needed) to restore the pre-change provider cooldown and gateway behavior.
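
The duplicate-model-id guardrail mentioned above could look roughly like this. It is a sketch under assumptions: the function name `public_model_ids` and the input shape are hypothetical, and it assumes a model id that is configured only once keeps its bare name, which the PR text does not state explicitly.

```rust
use std::collections::HashMap;

/// Hypothetical helper: turns configured `(profile_id, model)` pairs into
/// the ids a `/v1/models` listing would expose. A model id configured under
/// more than one profile is surfaced as `profile_id:model`; a unique model
/// id is assumed to stay bare.
fn public_model_ids(configured: &[(&str, &str)]) -> Vec<String> {
    // Count how many profiles configure each model id.
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for (_, model) in configured {
        *counts.entry(*model).or_insert(0) += 1;
    }

    // Preserve configuration order so the aliases stay stable.
    configured
        .iter()
        .map(|(profile, model)| {
            if counts[model] > 1 {
                format!("{profile}:{model}")
            } else {
                (*model).to_string()
            }
        })
        .collect()
}
```

Because the alias is derived only from the profile id and the model id, it does not depend on listing order or on which other profiles happen to be enabled.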

Validation

  • cargo fmt --all -- --check
  • cargo clippy --workspace --all-targets --all-features -- -D warnings
  • cargo test --workspace --locked
  • cargo test --workspace --all-features --locked
  • Relevant architecture / dep-graph / docs checks for touched areas
  • Additional scenario, benchmark, or manual checks when behavior changed
  • If this changes config/env fallback, limits, or defaults: include before/after behavior and regression coverage for explicit path, fallback path, and boundary values
  • If tests mutate process-global env: document how state is restored or serialized

Commands and evidence:

cargo fmt --all -- --check
cargo test -p loongclaw-app --lib
cargo test -p loongclaw

All three commands passed locally after the final fix set.

Before/after notes:
- Provider cooldowns now prefer provider hints even when shorter than the previous policy floor.
- Gateway compat accepted requests now route through shared runtime semantics instead of direct provider execution.
- Duplicate configured model ids are exposed as stable `profile_id:model` aliases.
- Streaming error responses now terminate with `[DONE]`.
- OpenAI-compatible requests that do not end in a `user` message are rejected instead of silently bypassing the shared runtime path.
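
The `[DONE]` termination rule in the notes above can be illustrated with a small sketch. It assumes each upstream chunk is already serialized JSON and that an error is sent to the client as one final data event; `sse_data` and `render_stream` are hypothetical names, not functions from the gateway crate.

```rust
/// Formats one server-sent event carrying `payload` as its data field.
fn sse_data(payload: &str) -> String {
    format!("data: {payload}\n\n")
}

/// Hypothetical helper: renders a stream of pre-serialized chunks into an
/// SSE body. `Err` carries a pre-serialized error object. The stream stops
/// at the first error, and `[DONE]` is written on both the success path and
/// the error path so clients never wait on an unterminated stream.
fn render_stream(chunks: &[Result<&str, &str>]) -> String {
    let mut out = String::new();
    for chunk in chunks {
        match chunk {
            Ok(json) => out.push_str(&sse_data(json)),
            Err(error_json) => {
                out.push_str(&sse_data(error_json));
                break; // nothing after the error is forwarded
            }
        }
    }
    out.push_str(&sse_data("[DONE]"));
    out
}
```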

User-visible / Operator-visible Changes

  • OpenAI-compatible clients can call /v1/models and /v1/chat/completions on the gateway and get shared-runtime behavior, usage propagation, stable duplicate-model aliases, and terminated SSE streams.
  • Provider cooldown behavior now honors provider retry hints more precisely.

Failure Recovery

  • Fast rollback or disable path:
    Revert the branch tip or disable use of the OpenAI-compatible gateway surface until the revert lands.
  • Observable failure symptoms reviewers should watch for:
    Unexpected 400s for malformed/non-user-ending gateway chat payloads, incorrect model alias selection, missing usage in non-streaming compat responses, or channels no longer sending "stream": true in provider requests.
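
For reviewers triaging the 400s listed above, the request checks can be pictured with the sketch below. The types `ChatRequest` and `CompatRejection`, the check order, and the empty-message case are all assumptions made for illustration; only the two rejection rules (tool fields, non-user-ending sequences) come from this PR.

```rust
/// Hypothetical reasons a compat request would be answered with HTTP 400.
#[derive(Debug, PartialEq)]
enum CompatRejection {
    ToolFieldsUnsupported,
    EmptyMessages, // assumed case, not stated in the PR
    LastMessageNotUser,
}

/// Hypothetical, already-decoded view of a `/v1/chat/completions` request:
/// only the message roles and whether any tool-related field was present.
struct ChatRequest<'a> {
    roles: Vec<&'a str>,
    has_tool_fields: bool,
}

/// Rejects requests the compat surface does not support instead of letting
/// them bypass the shared runtime path.
fn validate(req: &ChatRequest<'_>) -> Result<(), CompatRejection> {
    if req.has_tool_fields {
        return Err(CompatRejection::ToolFieldsUnsupported);
    }
    match req.roles.last().copied() {
        None => Err(CompatRejection::EmptyMessages),
        Some("user") => Ok(()),
        Some(_) => Err(CompatRejection::LastMessageNotUser),
    }
}
```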

Reviewer Focus

  • Provider hint parsing and cooldown precedence in crates/app/src/provider/rate_limit.rs and crates/app/src/provider/model_candidate_cooldown_runtime.rs.
  • Shared-runtime routing and usage propagation in crates/daemon/src/gateway/openai_compat.rs, crates/app/src/agent_runtime.rs, crates/app/src/chat.rs, and crates/app/src/conversation/turn_coordinator.rs.
  • Channel streaming assertions in crates/app/src/channel/feishu/webhook.rs, crates/app/src/channel/feishu/websocket.rs, and crates/app/src/channel/wecom.rs.

@github-actions bot added the documentation, daemon, providers, channels, conversation, config, and size: XL labels on Apr 12, 2026
@littlepenguin66 force-pushed the feat/983-996-rate-limit-openai-compat branch from 2ec715e to 3eba385 on April 13, 2026 07:05
@github-actions bot added the tools label on Apr 13, 2026
@gh-xj self-assigned this on Apr 14, 2026
@gh-xj
Collaborator

gh-xj commented Apr 14, 2026

LoongClaw QA Review — PR #1233

Reviewed commit: 77b73ddd
Risk: high
Agent: ai-scientist

Findings

  • high: Streamed OpenAI-compatible turn execution can drop streamed tool calls because the streaming parser only reconstructs text deltas and finish markers, not streamed tool-call chunks. crates/app/src/provider/contracts.rs:64
  • medium: The non-streaming OpenAI-compatible gateway flattens propagated upstream 4xx/429 failures into HTTP 500, which breaks client-visible retry/auth semantics. crates/daemon/src/gateway/openai_compat.rs:229
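
To make the first finding concrete: in the OpenAI streaming format, a tool call arrives as several delta fragments that share an `index`, with the `arguments` string split across chunks. A parser that only collects text deltas loses them. The sketch below shows one way such fragments could be merged; `ToolCall`, `ToolCallDelta`, and `reconstruct` are hypothetical names, and the fragments are assumed to be already decoded from JSON.

```rust
use std::collections::BTreeMap;

/// A fully reassembled tool call.
#[derive(Debug, Default, PartialEq)]
struct ToolCall {
    id: String,
    name: String,
    arguments: String,
}

/// One already-decoded tool-call fragment from a streamed delta.
/// Every field except `index` may be absent in any given chunk.
struct ToolCallDelta<'a> {
    index: usize,
    id: Option<&'a str>,
    name: Option<&'a str>,
    arguments: Option<&'a str>,
}

/// Merges fragments by `index`, concatenating name and argument pieces in
/// arrival order. Results are returned sorted by index.
fn reconstruct(deltas: &[ToolCallDelta<'_>]) -> Vec<ToolCall> {
    let mut calls: BTreeMap<usize, ToolCall> = BTreeMap::new();
    for delta in deltas {
        let call = calls.entry(delta.index).or_default();
        if let Some(id) = delta.id {
            call.id = id.to_string();
        }
        if let Some(name) = delta.name {
            call.name.push_str(name);
        }
        if let Some(arguments) = delta.arguments {
            call.arguments.push_str(arguments);
        }
    }
    calls.into_values().collect()
}
```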

Coverage

  • Rust-specific review: applied (manual code-path review plus targeted Rust tests)
  • Harness review: applied (repo-owned tests green; wrapper-path loongclaw-dev QA still lacks behavioral coverage)
  • Adversarial challenge: applied (the participant_id QA build blocker did not reproduce under direct cargo build, so it remains a harness inconsistency rather than a confirmed branch defect)

Open Questions

  • GitHub currently reports this PR as DIRTY; rebase is required before closure.
  • Wrapper-path E2E proof through the OpenAI-compatible gateway is still missing and should be rerun after rebase from a clean QA lane.

Verdict

Blocking findings on the current head; rebase onto dev, fix the two runtime issues, and rerun boundary-aligned gateway QA before re-dispatch.

gh-xj added 2 commits April 13, 2026 20:18
Merges dev into the PR branch to clear the current conflict set.
The release support drift docs moved on dev, so the stale top-level report stays deleted and the updated support-side artifact wins.

Constraint: PR eastreams#1233 is currently CONFLICTING against dev
Rejected: Keep the deleted top-level drift report | dev already removed that path
Confidence: high
Scope-risk: narrow
Directive: Re-run CI-parity gates after any follow-up code edits on top of this merge
Tested: Merge conflict resolution only
Not-tested: Runtime/code changes after the merge

The compat gateway branch now rebuilds streamed OpenAI tool calls and
keeps upstream HTTP status classes visible on the non-streaming gateway
surface. This keeps the PR aligned with the shared runtime after the
dev merge while preserving client-visible retry and auth behavior.

Constraint: PR eastreams#1233 had to absorb origin/dev before follow-up fixes could land
Constraint: Gateway turns currently surface provider failures as strings, so status preservation had to reuse embedded failover snapshots
Rejected: Keep generic HTTP 500 mapping | breaks OpenAI-compatible retry/auth semantics
Rejected: Duplicate failover snapshot parsing inside the gateway | reuse the existing provider parser instead
Confidence: high
Scope-risk: moderate
Directive: If streamed OpenAI turns gain richer chunk shapes, extend the parser/tests before widening observer-backed streaming further
Tested: cargo fmt --all -- --check; cargo clippy --workspace --all-targets --all-features -- -D warnings; cargo test -p loongclaw-app reconstructs_openai_tool_calls -- --nocapture; cargo test -p loongclaw gateway_openai_chat_completion_preserves_provider_rate_limit_status -- --nocapture; cargo test -p loongclaw-app provider:: -- --nocapture; cargo test -p loongclaw gateway_openai_chat_completion -- --nocapture; cargo test --workspace; cargo test --workspace --all-features
Not-tested: loongclaw-dev wrapper-path QA rerun from a clean slot after these fixes
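
The status-preservation fix described in this commit can be pictured as a small mapping. This is an assumed shape only: `gateway_status` is a hypothetical name, the upstream status is taken as already extracted from the failover snapshot, and the handling of upstream 5xx (collapsed to 500 here) is a guess, since the commit message only says that status classes stay visible.

```rust
/// Hypothetical helper: chooses the HTTP status the gateway returns for a
/// failed non-streaming turn. Upstream 4xx codes (including 429) pass
/// through so clients keep their retry and auth semantics; anything else,
/// or a failure with no known upstream status, becomes 500.
fn gateway_status(upstream: Option<u16>) -> u16 {
    match upstream {
        Some(code @ 400..=499) => code,
        _ => 500,
    }
}
```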
@github-actions bot removed the documentation label on Apr 14, 2026
The governance workflow regenerates the tracked monthly architecture drift
report from the merged tree. The compat parser and gateway updates changed
those tracked metrics, so the checked-in report needed to be refreshed to
match the new merged reality.

Constraint: governance requires docs/releases/support/architecture-drift-2026-04.md to match a fresh generated report on the PR merge commit
Rejected: Leave the stale report in place | keeps governance red even though the code is otherwise mergeable
Confidence: high
Scope-risk: narrow
Directive: Whenever provider/chat/turn-coordinator size shifts on a release-tracked branch, refresh the monthly drift report before expecting governance to pass
Tested: bash scripts/check_architecture_drift_freshness.sh docs/releases/support/architecture-drift-2026-04.md
Not-tested: Full governance workflow rerun before push
@github-actions bot added the documentation label on Apr 14, 2026
@gh-xj merged commit bae1305 into eastreams:dev on Apr 14, 2026
18 checks passed
@littlepenguin66 deleted the feat/983-996-rate-limit-openai-compat branch on April 14, 2026 09:54

Labels

  • channels: Channel adapters and external integration surfaces.
  • config: Runtime config parsing, schema, and defaults.
  • conversation: Conversation runtime, session flow, and prompt assembly.
  • daemon: Daemon binary, CLI entrypoints, and install flow.
  • documentation: Improvements or additions to documentation.
  • providers: Provider routing, selection, and transport behavior.
  • size: XL: Very large pull request: more than 1000 changed lines.
  • tools: Tool runtime, policy adapters, and tool catalog behavior.

Development

Successfully merging this pull request may close these issues.

  • OpenAI-compatible API surface in the gateway
  • Rate-limit header observation for provider cooldown

2 participants