
Add image generation support (bridge Tabbit's built-in image gen) #10

Description

@liwei9745

Background

The Tabbit browser has a built-in image generation capability (triggered via task mode), but tabbit2api currently bridges only the text chat interface (/v1/chat/completions) and does not expose image generation to downstream clients such as CherryStudio, Codex, or Claude Code.

Packet Capture Analysis

We captured network traffic with Playwright to reverse-engineer how Tabbit handles image generation internally.

Image Generation API Call Chain

  1. POST /api/v1/chat/completion: the user sends an image prompt (e.g., "draw a cat")
  2. GET /proxy/v0/browser-task/graph: Tabbit polls the task-mode execution status
  3. POST /proxy/v0/cos/presigned-download-url: Tabbit fetches a presigned download URL from Tencent COS
  4. The image URL is returned in the SSE stream, embedded as Markdown: ![image](cos_url)
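Step 4 means the bridge has to pull image links out of ordinary assistant text. A minimal sketch in JavaScript (the bridge's language), assuming the URL contains no whitespace or closing parenthesis, which holds for the presigned COS URLs in the capture. `extractImageUrls` is a hypothetical helper name, not existing tabbit2api code:

```javascript
// Matches Markdown images of the form ![alt](url). Assumes the URL has no
// whitespace or ")" in it, which holds for the presigned COS URLs captured.
const MARKDOWN_IMAGE_RE = /!\[([^\]]*)\]\(([^)\s]+)\)/g;

// Hypothetical helper: returns every image URL in an assistant message,
// in the order they appear.
function extractImageUrls(content) {
  return Array.from(content.matchAll(MARKDOWN_IMAGE_RE), (match) => match[2]);
}
```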

Key Internal Endpoints

Endpoint                              | Method | Purpose
/api/v1/chat/completion               | POST   | Send a chat or image generation request
/proxy/v0/browser-task/graph          | GET    | Poll task execution status
/proxy/v0/cos/presigned-download-url  | POST   | Get a COS presigned download URL

Image Storage

  • Provider: Tencent Cloud COS (Singapore region)
  • Path pattern: https://tab-sg-1300456063.cos.ap-singapore.myqcloud.com/{user_id}/{session_id}/image_generation/{YYYYMMDD}/{UUID}.png
  • Access: Presigned URLs with time-limited signatures
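If the bridge needs to tell generated images apart from other links the model emits, the path pattern above is enough to recognize them. A rough sketch, assuming the pattern is stable and the file always ends in `.png`; `parseCosImageUrl` is a hypothetical helper that inspects only the pathname and leaves the presigned query string untouched:

```javascript
// Hypothetical helper: splits a Tabbit COS image URL into the segments of
// /{user_id}/{session_id}/image_generation/{YYYYMMDD}/{UUID}.png.
// Returns null for URLs that do not follow that pattern.
function parseCosImageUrl(rawUrl) {
  const url = new URL(rawUrl);
  const parts = url.pathname.split("/").filter(Boolean);
  if (parts.length !== 5 || parts[2] !== "image_generation") {
    return null; // not an image-generation object
  }
  const [userId, sessionId, , date, fileName] = parts;
  if (!/^\d{8}$/.test(date) || !fileName.endsWith(".png")) {
    return null;
  }
  return {
    host: url.host,
    userId,
    sessionId,
    date, // YYYYMMDD
    imageId: fileName.slice(0, -".png".length),
    // Presigned URLs carry q-sign-* query parameters (see the SSE example).
    signed: url.searchParams.has("q-sign-algorithm"),
  };
}
```

Because the signatures are time-limited, the bridge should probably relay these URLs to clients as-is rather than cache them.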

Example Request

POST /api/v1/chat/completion
{
  "messages": [
    {"role": "user", "content": "Generate a picture of a cat"}
  ],
  "model": "gpt-4o",
  "stream": true
}

Example Response (SSE)

data: {"choices":[{"delta":{"content":"![image](https://tab-sg-1300456063.cos.ap-singapore.myqcloud.com/3bc8.../image_generation/20260623/32a9...png?q-sign-algorithm=sha1&...)"}}]}
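To get from the raw stream to that Markdown, the bridge has to concatenate the `delta.content` fragments. A sketch, assuming each event is a single `data: <json>` line as in the example above; skipping a trailing `data: [DONE]` sentinel is borrowed from OpenAI's streaming format and was not confirmed in the capture. `collectContentFromSse` is a hypothetical name:

```javascript
// Minimal reader for OpenAI-style SSE chunks. Only "data:" lines are used;
// blank lines, comments, and a "[DONE]" sentinel are skipped.
function collectContentFromSse(sseText) {
  let content = "";
  for (const line of sseText.split(/\r?\n/)) {
    if (!line.startsWith("data:")) continue;
    const payload = line.slice("data:".length).trim();
    if (payload === "" || payload === "[DONE]") continue;
    const chunk = JSON.parse(payload);
    for (const choice of chunk.choices ?? []) {
      content += choice.delta?.content ?? "";
    }
  }
  return content;
}
```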

Proposed Solutions

Option A: Extend Existing /v1/chat/completions (Recommended for MVP)

Parse Markdown image syntax from the SSE stream and return image URLs as part of the assistant message content.

Pros: Minimal code changes, backward compatible.
Cons: Clients need to parse Markdown images; no standard /v1/images/generations endpoint.
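One detail Option A has to handle is that a single `![image](...)` link may arrive split across several SSE chunks, since presigned URLs are long. A sketch of a hypothetical `MarkdownImageDetector` that the relay could call on every delta, while still forwarding each delta to the client unchanged:

```javascript
// Hypothetical detector: reports image URLs as soon as a complete
// ![alt](url) link has been seen, even if it spanned several deltas.
// The caller keeps forwarding every delta to the client unchanged.
class MarkdownImageDetector {
  constructor() {
    this.buffer = "";
  }

  // Feed one delta; returns the image URLs completed by this delta.
  push(delta) {
    this.buffer += delta;
    const re = /!\[[^\]]*\]\(([^)\s]+)\)/g;
    const urls = [];
    let consumed = 0;
    let match;
    while ((match = re.exec(this.buffer)) !== null) {
      urls.push(match[1]);
      consumed = re.lastIndex;
    }
    // Keep only the tail that could still start an unfinished image link,
    // including a lone trailing "!" whose "[" arrives in the next delta.
    const rest = this.buffer.slice(consumed);
    let start = rest.lastIndexOf("![");
    if (start === -1 && rest.endsWith("!")) start = rest.length - 1;
    this.buffer = start === -1 ? "" : rest.slice(start);
    return urls;
  }
}
```

The buffer only retains the tail that could still begin an image link; a production version would also cap its size in case a `![` is never closed.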

Option B: Add /v1/images/generations Endpoint (OpenAI Compatible)

Implement the standard OpenAI image generation API:

POST /v1/images/generations
{
  "prompt": "a cat",
  "n": 1,
  "size": "1024x1024"
}

Response:

{
  "created": 1234567890,
  "data": [{ "url": "https://..." }]
}

Pros: Fully compatible with OpenAI API spec; any OpenAI client can use it directly.
Cons: The bridge must wait for the upstream SSE stream to finish before it can respond, so latency is higher.
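The Option B translation can be sketched as two small functions: one maps an OpenAI images request onto the chat body from the capture, and one wraps the URLs found in the finished stream. Assumptions: the upstream model name is bridge configuration (defaulting to the `gpt-4o` seen in the example request), `size` is dropped because no size parameter appeared in the captured traffic, and it is unverified whether a plain prompt triggers image generation without task mode. Both function names are hypothetical:

```javascript
// Hypothetical Option B adapter: maps an OpenAI-style images request to the
// upstream chat body seen in the capture. "n" and "size" are not forwarded.
function buildUpstreamRequest(imagesRequest, upstreamModel = "gpt-4o") {
  const { prompt } = imagesRequest;
  if (typeof prompt !== "string" || prompt.trim() === "") {
    throw new TypeError("prompt must be a non-empty string");
  }
  return {
    messages: [{ role: "user", content: prompt }],
    model: upstreamModel, // bridge configuration, not the client's "model"
    stream: true,
  };
}

// Wraps image URLs collected from the finished stream in the OpenAI
// images response shape, keeping at most n entries.
function buildImagesResponse(imageUrls, n = 1, nowMs = Date.now()) {
  return {
    created: Math.floor(nowMs / 1000), // Unix timestamp in seconds
    data: imageUrls.slice(0, n).map((url) => ({ url })),
  };
}
```

The capture only shows one image per response, so honoring `n > 1` would likely mean issuing several upstream requests; the sketch simply truncates to `n`.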

Option C: Browser Automation Click (Fallback)

Use Playwright to simulate clicking the "image generation" button in the Tabbit UI.

Pros: Does not depend on internal API stability.
Cons: Fragile to UI changes; high overhead.

Recommended Implementation

Phase 1 (Short-term): Extend chat/completions to detect and relay image URLs.
Phase 2 (Medium-term): Add native /v1/images/generations endpoint.

Environment

  • tabbit2api v0.1.3
  • Tabbit browser 1.41.10
  • OS: Windows 10
  • Capture date: 2026-06-23

Additional Context

  • Models flagged with supports_images: true (e.g., GPT-5.5, Claude-Opus-4.7) should be able to use image generation
  • Image generation currently requires Tabbit's "task mode" to be activated in the UI
  • The existing Playwright-based bridge (tabbit-web-bridge.js) could be extended to pass image prompts
