From 3e2642d95f0b263466c423292a699c0aaf1d1d04 Mon Sep 17 00:00:00 2001 From: RiskeyL <7a8y@163.com> Date: Fri, 27 Mar 2026 19:33:28 +0800 Subject: [PATCH] feat: add shared writing guides, skills, and docs infrastructure --- .../skills/dify-docs-api-reference/SKILL.md | 220 +++ .../references/audit-checklist.md | 71 + .../references/codebase-paths.md | 41 + .../references/common-mistakes.md | 36 + .claude/skills/dify-docs-env-vars/SKILL.md | 137 ++ .../skills/dify-docs-env-vars/deep-dive.md | 1279 +++++++++++++++++ .../dify-docs-env-vars/verify-env-docs.py | 236 +++ .claude/skills/dify-docs-guides/SKILL.md | 62 + .claude/skills/dify-docs-reader-test/SKILL.md | 44 + .../skills/dify-docs-release-sync/SKILL.md | 260 ++++ .githooks/pre-commit | 11 + .gitignore | 2 - .mintignore | 2 + AGENTS.md | 94 +- CLAUDE.md | 211 +-- README.md | 103 ++ tools/translate/derive-termbase.py | 162 +++ tools/translate/formatting-ja.md | 49 + tools/translate/formatting-zh.md | 49 + tools/translate/termbase_i18n.md | 366 ++++- .../translate/translation-system-overview.md | 345 +++++ writing-guides/formatting-guide.md | 274 ++++ writing-guides/glossary.md | 358 +++++ writing-guides/index.md | 21 + writing-guides/style-guide.md | 80 ++ 25 files changed, 4269 insertions(+), 244 deletions(-) create mode 100644 .claude/skills/dify-docs-api-reference/SKILL.md create mode 100644 .claude/skills/dify-docs-api-reference/references/audit-checklist.md create mode 100644 .claude/skills/dify-docs-api-reference/references/codebase-paths.md create mode 100644 .claude/skills/dify-docs-api-reference/references/common-mistakes.md create mode 100644 .claude/skills/dify-docs-env-vars/SKILL.md create mode 100644 .claude/skills/dify-docs-env-vars/deep-dive.md create mode 100644 .claude/skills/dify-docs-env-vars/verify-env-docs.py create mode 100644 .claude/skills/dify-docs-guides/SKILL.md create mode 100644 .claude/skills/dify-docs-reader-test/SKILL.md create mode 100644 
.claude/skills/dify-docs-release-sync/SKILL.md create mode 100755 .githooks/pre-commit create mode 100644 .mintignore create mode 100644 README.md create mode 100755 tools/translate/derive-termbase.py create mode 100644 tools/translate/formatting-ja.md create mode 100644 tools/translate/formatting-zh.md create mode 100644 tools/translate/translation-system-overview.md create mode 100644 writing-guides/formatting-guide.md create mode 100644 writing-guides/glossary.md create mode 100644 writing-guides/index.md create mode 100644 writing-guides/style-guide.md diff --git a/.claude/skills/dify-docs-api-reference/SKILL.md b/.claude/skills/dify-docs-api-reference/SKILL.md new file mode 100644 index 000000000..1aa384d8f --- /dev/null +++ b/.claude/skills/dify-docs-api-reference/SKILL.md @@ -0,0 +1,220 @@ +--- +name: dify-docs-api-reference +description: > + Use when editing, auditing, or creating OpenAPI specs for the Dify + documentation repo. Applies to files in en/api-reference/. Covers + formatting rules, error code conventions, example standards, + operationId patterns. +--- + +# Dify API Reference Documentation + +## Before Starting + +Read these shared guides: + +1. `writing-guides/style-guide.md` +2. `writing-guides/formatting-guide.md` +3. `writing-guides/glossary.md` + +**When auditing**, also load: +- `references/audit-checklist.md` +- `references/common-mistakes.md` + +**When tracing code paths**, also load: +- `references/codebase-paths.md` + +## Reader Persona + +Backend developers integrating Dify apps or knowledge bases into their own applications via REST APIs. Assume strong coding ability, familiarity with HTTP, authentication patterns, and JSON. Focus on precision: exact parameter types, required vs optional, error codes, and realistic examples. Don't explain what a REST API is. + +## Code Fidelity (Non-Negotiable) + +**Every detail in the spec MUST be verifiable against the codebase.** When the spec disagrees with the code, the spec is wrong. 
+ +### What must match the code exactly + +- **Schema constraints**: `default`, `minimum`/`maximum`, `enum` must exactly match Pydantic `Field()` arguments. E.g., `le=101` -> `"maximum": 101` -- not 100. +- **Required/optional**: `Field(default=...)` = optional, no default = required; `FetchUserArg(required=True)` = required. +- **Error codes**: Only errors the endpoint actually raises. Trace `except` -> `raise` -> exception class -> `error_code` and `code` attributes. See [Error Responses](#error-responses). +- **Response status codes**: Must match the status code in the controller's `return` statement. +- **Response body fields**: Must match what the code actually returns. For streaming endpoints, verify the event type `enum` against actual events yielded by the task pipeline. Each event type must have a corresponding discriminator mapping entry. +- **Error messages**: Must match the exception's `description` attribute or the string passed to werkzeug exceptions. + +### How to verify + +1. Read the controller method. +2. For each parameter: find the Pydantic model or `request.args.get()`, note `Field()` arguments. +3. **Trace string fields beyond the controller.** The controller may declare `str`, but the service layer may cast to `StrEnum`, `Literal`, or validate against a fixed list. Common patterns: `SomeEnum(value)` cast, `Literal["a", "b"]` downstream, explicit `if field not in ALLOWED_VALUES` checks. If any exist, the spec MUST have `enum`. +4. For errors: trace `except` -> `raise` -> exception class -> `error_code` and `code` in `error.py`. +5. For responses: check `return` statement. **Important:** Response converters (e.g., `convert_blocking_full_response`) may flatten, restructure, or inject fields not present in the Pydantic entity. Always read the converter. +6. For service calls: read the service method to see what it returns or raises. + +### Flagging suspected code bugs + +The code is the source of truth, but the **code itself may have bugs**. 
When you encounter something irregular: + +1. **Flag it explicitly** -- do NOT silently document the suspected bug. +2. **Show the evidence** -- quote the exact code line and explain why it looks wrong. +3. **Ask the user for a decision** -- (a) document as-is, or (b) treat as upstream bug. +4. **Never auto-correct** -- do not silently write the "correct" value when the code says otherwise. + +Common code smells: off-by-one in `le`/`ge`, response body with 204, inconsistent error handling across similar endpoints, missing error handlers that sibling endpoints have, `required` mismatches. + +### Professional judgment + +You are a professional API documentation writer. Beyond code fidelity: +- **Challenge questionable decisions** with reasoning. +- **Suggest improvements** to API consistency or developer experience (clearly separated from required fixes). +- **Question conflicting instructions** -- push back with evidence. + +## Spec Structure + +| Spec File | App Type | `AppMode` values | Key Endpoints | +|-----------|----------|------------------|---------------| +| `openapi_chat.json` | Chat & Agent | `CHAT`, `AGENT_CHAT` | `/chat-messages`, conversations | +| `openapi_chatflow.json` | Chatflow | `ADVANCED_CHAT` | Same as chat, mode `advanced-chat` | +| `openapi_workflow.json` | Workflow | `WORKFLOW` | `/workflows/run`, workflow logs | +| `openapi_completion.json` | Completion | `COMPLETION` | `/completion-messages` | +| `openapi_knowledge.json` | Knowledge | *(N/A)* | datasets, documents, segments, metadata | + +Shared endpoints (file upload, audio, feedback, app info, parameters, meta, site, end-user) appear in chat/chatflow/workflow/completion specs. + +### App-Type Scoping (Critical) + +The codebase uses shared controllers and Pydantic models across app modes. The **documentation separates** these into per-app-type specs. You MUST filter through the app type lens: + +1. **Shared Pydantic models** -- only include fields relevant to this spec's app type. +2. 
**Shared error handlers** -- only include errors triggerable under this spec's app type. +3. **Internal-only fields** (e.g., `retriever_from`) -- omit from all specs. + +**How to determine relevance:** Check the controller's `AppMode` guard. For fields: "does this field have any effect in this mode?" For errors: "can this error be triggered in this mode?" When in doubt, trace through `AppGenerateService.generate()`. + +## Style Overrides + +These rules are specific to API reference docs and override or extend the general style guide. + +### Endpoint Summaries + +Must start with an imperative verb. Title Case. Standard vocabulary: + +| Verb | Method | When to use | +|------|--------|-------------| +| `Get` | GET | Single JSON resource by ID or fixed path | +| `List` | GET | Collection (paginated array) | +| `Download` | GET | Binary file content | +| `Create` | POST | New persistent resource | +| `Send` | POST | Message or request dispatch | +| `Submit` | POST | Feedback or input on existing resource | +| `Upload` | POST | File upload | +| `Convert` | POST | Format transformation | +| `Run` | POST | Execute workflow or process | +| `Stop` | POST | Halt running task | +| `Configure` | POST | Enable/disable setting | +| `Rename` | POST | Rename existing resource | +| `Update` | PUT/PATCH | Modify fields on existing resource | +| `Delete` | DELETE | Remove resource | + +**Do NOT use `Retrieve`** -- use `Get` or `List`. Verb-object order: `Upload File` not `File Upload`. + +### operationId Convention + +Pattern: `{verb}{AppType}{Resource}` + +| App Type | Prefix | Examples | +|----------|--------|---------| +| Chat | `Chat` | `createChatMessage`, `listChatConversations` | +| Chatflow | `Chatflow` | `createChatflowMessage` | +| Workflow | `Workflow` | `runWorkflow`, `getWorkflowLogs` | +| Completion | `Completion` | `createCompletionMessage` | +| Knowledge | *(none)* | `createDataset`, `listDocuments` | + +**Legacy operationIds**: Do NOT rename existing ones. 
Changing operationIds is a breaking change for SDK users. Apply this convention to **new endpoints only**. + +### Descriptions + +- **User-centric**: Write for developers, not the codebase. Name by what developers want to accomplish (e.g., "Download" not "Preview" for an endpoint serving raw file bytes). +- **Terminology consistency**: All user-facing text within a spec must use consistent terms. Code-derived names (paths, fields, schema names) stay as-is. Watch for: "segment" vs "chunk" (use "chunk"), "dataset" vs "knowledge base" (use "knowledge base"). +- **Descriptions must add value**: `"Session identifier."` is a label, not a description. Instead: `"The \`user\` identifier provided in API requests."`. +- **Nullable/conditional fields**: Explain when present or `null`. + +### Cross-API Links + +When a description mentions another endpoint, add a markdown link. +Pattern: `/api-reference/{category}/{endpoint-name}` (kebab-case from endpoint summary). + +## Parameters + +- Every parameter MUST have a `description`. +- **Schema constraints must exactly match code.** Transcribe `Field()` arguments verbatim. +- Do NOT have `example` field on parameters. +- **Do NOT repeat schema metadata in descriptions.** If `default: 20` is in schema, don't repeat in description. +- **Do NOT repeat enum values in descriptions** unless explaining when to choose each value. +- Mark `required` accurately based on code. +- **Request fields**: Use `enum` for known value sets. Trace string fields through service layer for hidden enums. +- **Response fields**: Do NOT use `enum`. Explain values in `description` instead (Mintlify renders duplicate "Available options" list). +- **Backtick all values** in descriptions: literal values, field names, code references. +- **Space between numbers and units**: `100 s`, `15 MB` -- not `100s`, `15MB`. +- **Descriptions must be specific**: `"Available options."` is not acceptable. 
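The "transcribe `Field()` arguments verbatim" rule can be checked mechanically: Pydantic's generated JSON Schema shows exactly what the spec should contain. A minimal sketch with an invented model (not an actual Dify controller model):

```python
from pydantic import BaseModel, Field

# Invented request model for illustration -- not an actual Dify controller model
class RetrievalRequest(BaseModel):
    top_k: int = Field(default=2, ge=1, le=101)

schema = RetrievalRequest.model_json_schema()
# le=101 must be transcribed as "maximum": 101 -- not rounded to 100
print(schema["properties"]["top_k"])
```

Because `top_k` declares a default, it is optional and must not appear in the schema's `required` array.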
+ +## Responses + +### Success Responses + +Only 200/201 as the primary response. For multiple response modes (blocking/streaming), use markdown bullets in the 200 description. + +**Every 200/201 JSON response MUST have at least one `examples` entry** with realistic values. + +**Binary/file responses**: Use `content` with appropriate media type and `format: binary`. Use `audio/mpeg` for audio, `application/octet-stream` for generic files. Put details in response `description`, not endpoint description. + +**Schema description duplication**: When using `$ref`, the schema definition MUST NOT have a top-level `description`. Mintlify renders both, causing duplication. + +### Error Responses + +Each endpoint MUST list its specific error codes, grouped by HTTP status. + +#### Error Tracing Rules + +1. **`BaseHTTPException` subclasses** (in `error.py`): Use `error_code` attribute as code name, `code` attribute as HTTP status. +2. **Werkzeug built-in exceptions** (`BadRequest`, `NotFound`): Use generic codes -- `bad_request`, `not_found`. NOT the service-layer exception name. +3. **Custom werkzeug `HTTPException` subclasses** (NOT `BaseHTTPException`): Global handler converts class name to snake_case via regex. E.g., `FilenameNotExistsError` -> `filename_not_exists_error`. +4. **Fire-and-forget methods**: If a service method never raises, do NOT invent error responses. +5. **No custom error handling**: If controller only uses `@validate_app_token` with no `try/except`, the only error is 401 (global auth). Do NOT add empty error sections. +6. **Error messages**: Use the exact string from the exception's `description` attribute or werkzeug string argument. + +#### Error Format + +- **No `$ref` schema** in error responses -- omit `"schema"` entirely. +- **Description** lists error codes as markdown bullets with backticked names. +- **Examples** required for every error response (provides Mintlify dropdown selector). 
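The snake_case conversion in tracing rule 3 can be sketched as follows; the exact regex in Dify's global handler may differ, so treat this as illustrative:

```python
import re

def class_name_to_error_code(name: str) -> str:
    # Insert an underscore before each interior capital, then lowercase
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()

print(class_name_to_error_code("FilenameNotExistsError"))  # filename_not_exists_error
```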
+ +## Schemas + +- **Prefer inline** over `$ref` for simple objects. +- Only use `$ref` for genuinely reused or complex schemas. +- **Array items must define `properties`** -- no bare `"type": "object"`. +- **`required` arrays on request schemas only** -- not response schemas. +- **`oneOf` options**: Each must have a `title` property. Parent schema must NOT have `description`. + +## Examples + +- **Realistic values only.** Real-looking UUIDs, timestamps, text, metadata. +- **Verify example values against code.** Enum-like fields must use values the code actually returns. +- Request and response examples must correspond. +- **Titles**: `"summary": "Request Example"` (single) or `"summary": "Request Example-Streaming mode"` (multiple). Error examples: use error code as summary. + +## Tag Naming + +- **Plural** for countable resources: `Chats`, `Files`, `Conversations`. +- **Singular** for uncountable nouns or abbreviations: `Feedback`, `TTS`. +- Title Case. + +## Endpoint Ordering + +**CRUD lifecycle**: POST create -> GET list/detail -> PUT/PATCH update -> DELETE. + +Exception: Tags without a create operation (e.g., Conversations). GET list comes first; non-create POST placed after GETs but before PUT/DELETE. + +## Post-Writing Verification + +After completing the document, invoke `dify-docs-reader-test` to verify it from the reader's perspective. diff --git a/.claude/skills/dify-docs-api-reference/references/audit-checklist.md b/.claude/skills/dify-docs-api-reference/references/audit-checklist.md new file mode 100644 index 000000000..fd26e58f7 --- /dev/null +++ b/.claude/skills/dify-docs-api-reference/references/audit-checklist.md @@ -0,0 +1,71 @@ +# Audit Checklist (Per Endpoint) + +Use this checklist when auditing or reviewing an OpenAPI spec against the Dify codebase. + +## Pre-Audit + +1. **Identify the spec's app type**: Determine which `AppMode` values this spec covers (see SKILL.md Spec Structure table). 
All subsequent checks are filtered through this app-type scope. +2. **Compare routes**: Check `api/controllers/service_api/__init__.py` for registered routes, then each controller file. + +## Per-Endpoint Checks + +3. **App-type scoping**: For shared controllers/models, only include fields, parameters, and errors relevant to this spec's app type. Trace code paths to confirm relevance. +4. **Missing endpoints**: Present in code but not in spec. +5. **Ghost endpoints**: Present in spec but not in code. +6. **Request schemas**: Verify params, types, required/optional, defaults, enums against every `Field()` argument. +7. **Hidden enums on request string fields**: For every `string` field without `enum`, trace through the service layer to check for `StrEnum` casts, `Literal` types, or validation against fixed lists. Do NOT trust the controller-level type annotation alone. +8. **Response schemas**: Verify fields, types, status codes. Check the status code in the `return` statement and read response converters (they may flatten or inject fields). +9. **Error codes -- completeness**: All errors the endpoint raises are documented. Trace every `except` -> `raise` chain; read service methods to confirm they actually raise. +10. **Error codes -- correctness**: No phantom codes. Remove errors the controller does not raise. +11. **Error code names**: Must match `error_code` attribute (custom exceptions) or werkzeug generic name (`bad_request`, `not_found`). Never use Python class names or service exception names. +12. **Error messages**: Must match the `description` attribute or string argument. Copy from code verbatim. +13. **Example values**: Match actual code output (e.g., enum values returned by the code). No unresolved `{message}` placeholders. +14. **operationId convention**: Follows `{verb}{AppType}{Resource}` pattern for new endpoints; legacy IDs left as-is. +15. **Description quality**: Useful explanations, not just field-name labels. +16. 
**200/201 responses have examples**: Every JSON success response must have at least one `examples` entry with realistic values. +17. **No schema description duplication**: `$ref` response schemas must not have a top-level `description` (Mintlify shows both). +18. **Binary responses**: Use `content` with `format: binary` schema; details in response `description`. +19. **`oneOf` options have `title`**: Each option object needs a descriptive `title`. Parent schema has no `description`. +20. **`required` arrays on request schemas only**: Not on response schemas. +21. **`enum` on request schemas only**: Not on response schemas (Mintlify renders duplicate "Available options"). +22. **Response array items have `properties`**: No bare `"type": "object"` -- Mintlify renders `object[]` with no expandable fields. +23. **Terminology consistency**: No synonym mixing within a tag (e.g., "segment" vs "chunk"). +24. **Values backticked, number-unit spacing correct**: All literal values backticked; space between numbers and units. +25. **Endpoint ordering**: Follows CRUD lifecycle (POST create -> GET list/detail -> PUT/PATCH update -> DELETE). +26. **Tag naming**: Plural for countable resources, singular for uncountable nouns/abbreviations, Title Case. + +## Two-Agent Workflow + +- **Agent 1 (Fixer)**: Audits the spec and applies fixes using this checklist and all rules from SKILL.md. +- **Agent 2 (Reviewer)**: Reads the fixed spec and verifies compliance. Reports remaining issues WITHOUT making edits. If issues are found, fix and optionally re-run the reviewer. + +Always validate JSON (`python -m json.tool`) after fixes. + +## Cross-Spec Propagation + +Shared endpoints (file upload, audio, feedback, app info, parameters, meta, site, end-user) appear in chat, chatflow, completion, and workflow specs. When a fix is applied to one spec, check all sibling specs for the same issue. 
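The JSON validation step can cover all sibling specs in one pass. A sketch assuming the `en/api-reference/openapi_*.json` layout described in `references/codebase-paths.md`:

```python
import glob
import json

# Validate every spec in the docs repo; run from the repo root
for path in sorted(glob.glob("en/api-reference/openapi_*.json")):
    with open(path, encoding="utf-8") as f:
        json.load(f)  # raises json.JSONDecodeError on invalid JSON
    print(f"OK: {path}")
```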
+ +## Verification Rigor + +**Every reported issue must be correct.** False positives erode trust and waste time. + +1. **Trace the full path.** Don't stop at the controller. Follow errors through global handlers (`external_api.py`), check whether service methods actually raise. +2. **Check app-type relevance.** Don't flag `workflow_id` as missing from the chat spec. +3. **Verify every claim has evidence.** You must have read the actual code line. No speculative claims. +4. **Self-review before reporting.** Re-read each finding and ask: + - "Did I read the actual code, or am I assuming?" + - "Did I check global error handlers for bare `raise ValueError/Exception`?" + - "Is this field/error relevant to THIS spec's app type?" + - "Am I confusing the Python class name with the `error_code` attribute?" + - "Did I check the service method body, or did I assume it raises?" +5. **When uncertain, investigate further.** Report fewer verified issues rather than many unverified ones. Mark uncertain items as "unverified -- needs manual check." 
+ +### Common False-Positive Patterns + +- Assuming bare `ValueError` is a 500 (global handler converts to 400 `invalid_param`) +- Flagging shared-model fields as missing from a spec covering a different app type +- Assuming a service method raises when it's actually fire-and-forget +- Using the Python exception class name instead of the `error_code` attribute +- Inventing errors for code paths that don't exist under the spec's app mode +- Documenting an unreachable `except` clause (controller catches exception the service never raises for this endpoint) +- Adding `enum` to a genuinely dynamic/provider-specific string field (e.g., `voice`, `embedding_model_name`) diff --git a/.claude/skills/dify-docs-api-reference/references/codebase-paths.md b/.claude/skills/dify-docs-api-reference/references/codebase-paths.md new file mode 100644 index 000000000..3b94a33b8 --- /dev/null +++ b/.claude/skills/dify-docs-api-reference/references/codebase-paths.md @@ -0,0 +1,41 @@ +# Codebase Paths + +Mapping of concepts to file paths in the Dify codebase and docs repo. 
+ +## Dify Docs Repo + +| What | Path | +|------|------| +| OpenAPI specs | `en/api-reference/openapi_*.json` | +| Navigation config | `docs.json` | + +## Dify Codebase + +| What | Path | +|------|------| +| App controllers | `api/controllers/service_api/app/` | +| Dataset controllers | `api/controllers/service_api/dataset/` | +| App error definitions | `api/controllers/service_api/app/error.py` | +| Dataset error definitions | `api/controllers/service_api/dataset/error.py` | +| Auth/rate-limit wrapper | `api/controllers/service_api/wraps.py` | +| Global error handlers | `api/libs/external_api.py` | +| Route registration | `api/controllers/service_api/__init__.py` | + +### Global Error Handlers + +The handlers in `api/libs/external_api.py` are critical for error tracing: + +- `ValueError` -> 400 `invalid_param` +- `AppInvokeQuotaExceededError` -> 429 `too_many_requests` +- Generic `Exception` -> 500 + +Always check these when tracing bare `raise ValueError(...)` or unhandled exceptions. + +### Error Code Sources + +| Error Type | Source | +|------------|--------| +| App-level errors | `api/controllers/service_api/app/error.py` | +| Knowledge errors | `api/controllers/service_api/dataset/error.py` | +| Auth/rate-limit | `api/controllers/service_api/wraps.py` | +| Global handlers | `api/libs/external_api.py` | diff --git a/.claude/skills/dify-docs-api-reference/references/common-mistakes.md b/.claude/skills/dify-docs-api-reference/references/common-mistakes.md new file mode 100644 index 000000000..1e1ebae4c --- /dev/null +++ b/.claude/skills/dify-docs-api-reference/references/common-mistakes.md @@ -0,0 +1,36 @@ +# Common Mistakes + +Quick-reference table of frequent spec issues. Each row is a distinct mistake pattern -- fix column shows what to do. + +| # | Mistake | Fix | +|---|---------|-----| +| 1 | Schema constraint doesn't match code (`Field(le=101)` -> spec says 100) | Transcribe `Field()` arguments exactly. Do not round or "correct." 
| +| 2 | `example` field on parameters | Remove. Use request body `examples` instead. | +| 3 | Missing error codes on endpoint | Trace controller `except` -> `raise` chain. Add all raised errors. | +| 4 | Phantom error codes not raised by controller | Remove. Only document errors with a traceable raise path. | +| 5 | Error code uses exception class name instead of `error_code` | Use the `error_code` attribute from `error.py`, not the Python class name. | +| 6 | Werkzeug exception -> wrong error code | `BadRequest` -> `bad_request`, `NotFound` -> `not_found`. Never use service-layer exception name. | +| 7 | `$ref` to ErrorResponse in error responses | Remove schema. Use description + examples only (avoids Mintlify rendering issue). | +| 8 | Error response missing `description` or `examples` | Every error response needs backticked code in `description` AND example objects. | +| 9 | Invented error for fire-and-forget method | Read the service method body. If it returns `None` and never raises, no error response. | +| 10 | Error from unreachable `except` clause documented | Verify the service method actually raises the caught exception for this endpoint. | +| 11 | Field/error from wrong app type included | Filter by spec's `AppMode`. E.g., `workflow_id` belongs in chatflow, not chat. | +| 12 | Internal-only field exposed in spec | Fields like `retriever_from` are internal. Omit from all specs. | +| 13 | Request string field missing `enum` -- controller says `str` but service casts to Enum | Trace through service layer. If cast to `StrEnum`, `Literal`, or validated against list, add `enum`. | +| 14 | `enum` on response schema field | Remove. Mintlify renders duplicate "Available options." Explain values in `description` instead. | +| 15 | Values listed in request description but no `enum` | Add `enum` to request schema when specific values are known. | +| 16 | Request schema missing `required` array | Add based on Pydantic model non-optional fields. 
| +| 17 | `required` array on response schema | Remove. Only request schemas should have `required` arrays. | +| 18 | Response array items with bare `"type": "object"` | Define `properties` on array items. Mintlify shows `object[]` with no expandable fields otherwise. | +| 19 | Deeply nested `$ref` for simple objects | Inline the properties directly. | +| 20 | `oneOf` options show "Option 1", "Option 2" | Add `"title"` to each `oneOf` option object. | +| 21 | Parent description on `oneOf` wrapper | Remove. Describe only the parent array/property that references the wrapper. | +| 22 | 200 response missing examples | Every JSON 200/201 response must have at least one `examples` entry. | +| 23 | Schema has `description` AND response has `description` | Remove `description` from referenced schema. Mintlify shows both. | +| 24 | Binary response missing `format: binary` | Use `content` with media type + `{ "type": "string", "format": "binary" }`. Details in response `description`. | +| 25 | Response schema matches Pydantic entity, not actual API output | Response converters can flatten or inject fields. Read the converter method. | +| 26 | Streaming event enum missing event types | Cross-reference against task pipeline event yields. Every event type needs a discriminator mapping entry. | +| 27 | Unresolved `{message}` placeholder in example | Replace format placeholders with realistic static text. | +| 28 | Fix applied to one spec but not siblings | Shared endpoints exist in chat/chatflow/completion/workflow specs. Propagate fixes. | +| 29 | Mixed synonyms for same concept | Pick one user-facing term per concept (e.g., "chunk" not "segment"). Code-derived names stay as-is. | +| 30 | Mentioning another API without a link | Add markdown link: `[API Name](/api-reference/category/endpoint)`. 
| diff --git a/.claude/skills/dify-docs-env-vars/SKILL.md b/.claude/skills/dify-docs-env-vars/SKILL.md new file mode 100644 index 000000000..ef9447cb0 --- /dev/null +++ b/.claude/skills/dify-docs-env-vars/SKILL.md @@ -0,0 +1,137 @@ +--- +name: dify-docs-env-vars +description: > + Use when writing, rewriting, or auditing environment variable documentation + for Dify self-hosted deployment. Applies to + en/self-host/configuration/environments.mdx. Covers the full process from + codebase tracing to user-facing descriptions. +--- + +# Dify Environment Variable Documentation + +## Before Starting + +Read these shared guides: + +1. `writing-guides/style-guide.md` +2. `writing-guides/formatting-guide.md` +3. `writing-guides/glossary.md` + +## Four-Step Process + +**This process applies to every variable without exception.** Do not skip variables because they seem "obvious" — every variable must be traced, explained, and described. + +### Step 1: Trace the Variable in the Codebase + +**Agent granularity**: When using subagents for tracing, assign 3–5 related variables per agent. + +**Tracing depth depends on variable type:** + +- **Python config variables** (defined in `api/configs/`): Full tracing — find definition, all usage locations, and behavior when empty vs set. +- **Frontend variables** (mapped in `web/docker/entrypoint.sh`): Trace from `entrypoint.sh` to find the Docker-to-`NEXT_PUBLIC_*` mapping, verify the default in both `docker/.env.example` and `web/.env.example`, and check whether the variable is also used in Python code (dual-purpose). For Next.js-only variables (UI knobs like `MAX_TOOLS_NUM`), light verification is sufficient. +- **Docker/container service variables** (only in `docker-compose.yaml`): Light verification — grep to confirm the variable is not used in Python code, then document from `.env.example` comments. +- **Plugin daemon variables** (`PLUGIN_*` not in `api/configs/`): Document from `.env.example` comments. 
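The tracing tiers above all begin by locating a variable's definition and its usages. A sketch of such a search, run from the Dify codebase root; the variable name is illustrative and the paths assume the repo layout (`api/configs/` for definitions, `dify_config.*` attribute access for usages):

```python
import os
import re

# Illustrative variable to trace -- substitute the one you are documenting
TARGET = "FILES_URL"

def find_usages(root: str, pattern: str) -> list[str]:
    # Walk Python files under root and collect lines matching the pattern
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="ignore") as f:
                for lineno, line in enumerate(f, 1):
                    if re.search(pattern, line):
                        hits.append(f"{path}:{lineno}: {line.strip()}")
    return hits

# Step 1: definition in api/configs/; Step 2: usages via the config attribute
for hit in find_usages("api/configs", TARGET) + find_usages("api", rf"dify_config\.{TARGET}"):
    print(hit)
```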
+ +**For full tracing**, search the Dify codebase: + +1. **Find the definition** in `api/configs/` — note the Pydantic field type, default, description, and any `validation_alias` (fallback) settings. +2. **Find every usage** — grep for both the env var name and the Python attribute (e.g., `dify_config.VARIABLE_NAME`). Read surrounding code to understand what each usage does. +3. **Determine behavior when empty vs set** — trace fallback chains and identify what features break. + +### Step 2: Write a Plain-Language Explanation + +Write an explanation covering: + +- What the variable actually does (in practical terms, not code terms) +- Specific features that depend on it (name them) +- What happens if left empty (what breaks, what falls back) +- What happens if set (what works) +- Key code locations (file paths, no line numbers — they shift) + +Save to `deep-dive.md` (in this skill directory) under the appropriate section heading. + +### Step 3: Write the User-Facing Description + +Transform the explanation into a concise documentation description. The description must: + +- **Lead with the practical impact**, not the technical mechanism +- **Name the features** that require this variable (e.g., "Required for the Human Input node" not "used for frontend references") +- **Explain what breaks** if misconfigured (e.g., "If empty, email links will be broken") +- **Mention fallback behavior** if the variable has one (e.g., "falls back to `CONSOLE_API_URL`") +- **Include relationships** with other variables when relevant +- **End with an example value** for non-obvious variables + +### Step 4: Confirm with Reviewer + +Present the proposed description to the user for review before editing the documentation file. + +## Document Structure + +The env var doc is organized into three sections following `docker/.env.example` section order: + +1. **Backend (API + Worker)** — Python API server and Celery worker variables. +2. **Frontend (Web)** — Next.js frontend variables. 
Uses `<Tabs>` to show Docker and source code variable names. +3. **Infrastructure (Docker Compose / AWS AMI Only)** — database, Redis, Nginx, and other container variables. Not applicable to source code deployments. + +**When to use tables**: Groups of related, straightforward variables (connection settings, credentials, tuning knobs). + +**When to use individual headings**: Important variables needing explanation — typically enum-type selectors (`STORAGE_TYPE`, `VECTOR_STORE`) or variables where the "why" matters (`SECRET_KEY`, `FILES_URL`). + +**When to use tabs**: Frontend section variables where Docker and source code deployments use different variable names. Tabs cannot be placed inside table cells, so all tabbed variables require individual headings. + +**When to use accordions**: Provider-specific configuration (storage backends, vector databases, mail providers) — users only need one provider. + +## Reader Persona + +Same audience as `en/self-host/` documentation (see `dify-docs-guides` skill): DevOps engineers and system administrators deploying Dify. Assume strong infrastructure knowledge. + +**Additional context for env var docs:** Readers are actively configuring a deployment. They need to know what each variable does, when to change it, and what breaks if they get it wrong. They are not reading linearly—they are scanning for a specific variable. + +## Style Overrides + +Rules specific to env var docs (override or extend the shared style guide): + +- Use `(empty)` for empty-string defaults, not `""` or blank +- For empty defaults with a fallback: `(empty; falls back to X)` or `(empty; defaults to X)` +- Never include real or example secret keys — GitHub push protection blocks `sk-*` patterns. Use descriptions like `(pre-filled in .env.example; must be replaced for production)` + +**Consistency over variety in reference tables.** The general style guide says to vary sentence patterns. In reference tables, consistency aids scanning. 
Use predictable patterns for connection credentials (hostname, port, username, password) across providers. Vary descriptions only when variables genuinely differ in behavior or purpose. + +**Variable descriptions should be self-contained.** The general style guide says not to restate the heading. Variable descriptions must state what the variable does—even if the name partially implies it. Not all variable names are self-explanatory, and users may arrive at a description via search without seeing the surrounding section context. + +**Include actionable technical mechanisms.** The general style guide favors user outcomes over technical mechanisms. For env var docs, include technical mechanisms that help users configure, troubleshoot, or understand trade-offs—algorithm names, encoding behavior, fallback chains, version requirements. Exclude mechanisms that only describe code architecture—factory patterns, lazy imports, class names—unless understanding them is necessary for configuration. + +- **Keep**: "URL-encoded in the connection string, so `@`, `:`, `%` are safe to use", "HMAC-SHA256", "Requires Milvus >= 2.5.0", "Falls back to `CONSOLE_API_URL`" +- **Remove**: "Dify's storage dispatcher lazily imports the selected backend", "Sends POST to /v1/sandbox/run with X-Api-Key header" + +**No specific recommended values for tuning parameters.** For numeric tuning parameters without clear boundaries (connection pool sizes, worker counts, timeouts, buffer sizes), do not prescribe values. Describe the symptom that indicates the value needs changing: "If you experience connection rejections under load, try increasing this value." Exception: when a value has a well-established recommendation (e.g., PostgreSQL `shared_buffers` = 25% of RAM), include it with a reference link. 
+ +## Description Anti-Patterns + +| Anti-Pattern | Better | +|---|---| +| "Used for frontend references" | "Required for the Human Input node — form links in email notifications are built from this URL" | +| "The backend URL of the console API" | "Set this if you use OAuth login (GitHub, Google) or Notion integration — these features need an absolute callback URL" | +| "Upload file size limit, default 15" | "Maximum file size in MB for uploads" | +| Restating the code comment verbatim | Explaining when you'd change it and what happens if you don't | + +## Verification + +Run after completing any documentation change: + +```bash +python3 .claude/skills/dify-docs-env-vars/verify-env-docs.py \ + --env-example \ + --docs +``` + +The script reports: +- **Missing from docs**: Variables in `.env.example` not yet documented (address over time) +- **Extra in docs**: Variables documented but not in `.env.example` (verify manually) +- **Default mismatches**: Documented defaults that don't match `.env.example` — **must be zero before work is complete** + +Use `.env.example` defaults (what Docker Compose users actually get), not Pydantic code defaults. + +## Post-Writing Verification + +After completing the document, invoke `dify-docs-reader-test` to verify it from the reader's perspective. diff --git a/.claude/skills/dify-docs-env-vars/deep-dive.md b/.claude/skills/dify-docs-env-vars/deep-dive.md new file mode 100644 index 000000000..9d89bc5bf --- /dev/null +++ b/.claude/skills/dify-docs-env-vars/deep-dive.md @@ -0,0 +1,1279 @@ +# Dify Environment Variables — Deep Dive Reference + +This document records detailed code-traced explanations of each Dify environment variable. It serves as a long-term study reference and will be updated as new variables are analyzed. + +--- + +## Common Variables + +### CONSOLE_API_URL + +**Default:** `""` (empty) + +**What it actually does:** This is the address of Dify's backend API server. The code uses it in three main ways: + +1.
**OAuth login redirects** — When a user clicks "Log in with GitHub" or "Log in with Google," Dify tells GitHub/Google: "after the user approves, send them back to `{CONSOLE_API_URL}/console/api/oauth/authorize/github`." The same pattern is used for Notion integration, plugin OAuth, and MCP connections. There are ~22 places in the code that build callback URLs this way. + +2. **Icon and file URLs** — The frontend loads plugin icons and file previews from URLs built on this base, like `{CONSOLE_API_URL}/console/api/workspaces/current/plugin/icon`. + +3. **Cookie security** — The code checks whether this URL starts with `https` to decide whether to use secure cookies. + +**If left empty:** For a simple local deployment behind Nginx on the same domain, it works — the frontend makes relative API calls. But any OAuth login (GitHub, Google, Notion) will break because OAuth providers require absolute callback URLs. Plugin icons may also fail to load. + +**If set:** All OAuth flows and icon URLs work correctly. + +**Key code locations:** +- Definition: `api/configs/feature/__init__.py` (EndpointConfig) +- OAuth callbacks: `api/controllers/console/auth/oauth.py` +- Plugin OAuth: `api/controllers/console/workspace/tool_providers.py`, `trigger_providers.py` +- Icon URLs: `api/core/tools/tool_manager.py`, `api/services/tools/tools_transform_service.py` +- Cookie security: `api/libs/token.py` + +--- + +### CONSOLE_WEB_URL + +**Default:** `""` (empty) + +**What it actually does:** This is the address of Dify's frontend web interface. The code uses it for: + +1. **Email links** — Every email Dify sends contains links built from this URL: invitation activation links (`{CONSOLE_WEB_URL}/activate?token=...`), password reset links (`/reset-password`), login links (`/signin`), and dataset notification links (`/datasets`). + +2. 
**OAuth completion redirects** — After an OAuth flow finishes on the backend, the server redirects the user's browser back to the frontend using `redirect(f"{CONSOLE_WEB_URL}/oauth-callback")` or `redirect(f"{CONSOLE_WEB_URL}/signin?message=...")`. + +3. **CORS fallback** — If `CONSOLE_CORS_ALLOW_ORIGINS` is not set, the system uses this value as the allowed origin for cross-domain requests. + +**If left empty:** Email links break (they'd look like `/signin` with no domain). OAuth login can't redirect back to the frontend. For local single-domain deployments behind Nginx, CORS may still work via the fallback, but emails are still broken. + +**If set:** Emails contain clickable links, OAuth redirects work, and CORS is properly configured. + +**Key code locations:** +- Definition: `api/configs/feature/__init__.py` (EndpointConfig) +- Email links: `api/tasks/mail_invite_member_task.py`, `mail_register_task.py`, `mail_reset_password_task.py` +- OAuth redirects: `api/controllers/console/auth/oauth.py`, `data_source_oauth.py` +- Plugin/trigger OAuth: `api/controllers/console/workspace/tool_providers.py`, `trigger_providers.py`, `datasource_auth.py` +- CORS fallback: `api/configs/feature/__init__.py` (HttpConfig, via validation_alias) +- Scheduled tasks: `api/schedule/mail_clean_document_notify_task.py` + +--- + +### SERVICE_API_URL + +**Default:** `""` (empty) + +**What it actually does:** This is the simplest one. It's used in exactly **2 places** in the backend, and both do the same thing: + +```python +(dify_config.SERVICE_API_URL or request.host_url.rstrip("/")) + "/v1" +``` + +It provides the "API Base URL" shown to developers in the Dify console — the URL they copy-paste into their code when calling the Dify API (e.g., `https://api.example.com/v1`). + +**If left empty:** Falls back to the current request's host URL. This works fine for single-domain setups — the frontend just uses whatever URL it's already talking to. 
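The fallback expression above is simple enough to exercise in isolation. A minimal sketch, where the function name and parameters are illustrative rather than Dify's actual code:

```python
def api_base_url(service_api_url: str, request_host_url: str) -> str:
    # Mirrors the expression shown above: prefer the configured
    # SERVICE_API_URL; otherwise derive the base from the request's host.
    return (service_api_url or request_host_url.rstrip("/")) + "/v1"

print(api_base_url("", "http://localhost:5001/"))           # http://localhost:5001/v1
print(api_base_url("https://api.example.com", "http://10.0.0.5/"))  # https://api.example.com/v1
```

Note that the fallback means two users reaching the console via different hostnames see different API base URLs, which is exactly why setting the variable matters behind a load balancer.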
+ +**If set:** Ensures all users see the same API base URL regardless of how they access the console (e.g., via IP vs domain name, or behind a load balancer). + +**Key code locations:** +- Definition: `api/configs/feature/__init__.py` (EndpointConfig) +- App model: `api/models/model.py` (`api_base_url` property) +- Dataset endpoint: `api/controllers/console/datasets/datasets.py` (`DatasetApiBaseUrlApi`) + +--- + +### APP_API_URL + +**Default:** `""` (empty; Docker image defaults to `http://127.0.0.1:5001`) + +**What it actually does:** This variable is **not used in the Python backend at all**. It's only used in the web frontend's Docker entrypoint script: + +```bash +export NEXT_PUBLIC_PUBLIC_API_PREFIX=${APP_API_URL}/api +``` + +It tells the WebApp frontend (the published app interface, not the console) where the API server is. + +**If left empty:** The frontend Docker image has a hardcoded fallback of `http://127.0.0.1:5001`. + +**If set:** The WebApp frontend sends API requests to the specified address. + +**Key code locations:** +- Docker entrypoint: `web/docker/entrypoint.sh` +- Dockerfile default: `web/Dockerfile` + +--- + +### APP_WEB_URL + +**Default:** `""` (empty) + +**What it actually does:** This is the address of the WebApp frontend (where published apps live). It's used in **4 places**, all related to the **Human Input node** in workflows: + +1. Building form URLs for workflow pause forms: `{APP_WEB_URL}/form/{token}` +2. Including those form URLs in notification emails sent to users +3. Displaying form links in the workflow run details +4. Testing delivery methods for human input + +The `Site` model also uses it as the base URL for published apps, with a fallback to the current request URL. + +**If left empty:** The `Site` model falls back to the current request URL for basic display. But form links in emails return `None` or are malformed — the Human Input email notification feature is broken. 
+ +**If set:** Human Input workflow forms work correctly, and email notifications contain valid links. + +**Key code locations:** +- Definition: `api/configs/feature/__init__.py` (EndpointConfig) +- Site base URL: `api/models/model.py` (`app_base_url` property on Site) +- Workflow form URLs: `api/controllers/console/app/workflow_run.py` +- Email delivery: `api/tasks/mail_human_input_delivery_task.py` +- Delivery test: `api/services/human_input_delivery_test_service.py` + +--- + +### TRIGGER_URL + +**Default:** `http://localhost:5001` + +**What it actually does:** This is the externally reachable address where webhook and plugin triggers can reach Dify. The code uses it to build two types of endpoint URLs: + +- Plugin triggers: `{TRIGGER_URL}/triggers/plugin/{endpoint_id}` +- Webhook triggers: `{TRIGGER_URL}/triggers/webhook/{webhook_id}` + +These URLs are given to external systems so they know where to send events to invoke Dify workflows. + +**If left empty or set to `localhost`:** Triggers only work locally. External systems can't reach your Dify instance. + +**If set to a public URL:** External services can invoke your workflows via webhooks and plugin triggers. + +**Key code locations:** +- Definition: `api/configs/feature/__init__.py` (EndpointConfig) +- URL generation: `api/core/trigger/utils/endpoint.py` +- Trigger subscriptions: `api/services/trigger/trigger_subscription_builder_service.py` +- Trigger provider: `api/services/trigger/trigger_provider_service.py` + +--- + +### FILES_URL + +**Default:** `""` (empty; falls back to `CONSOLE_API_URL` via Pydantic alias) + +**What it actually does:** This is the base URL for all file preview and download links. 
The code builds signed URLs like: + +- `{FILES_URL}/files/tools/{file_id}?timestamp=X&nonce=Y&sign=Z` +- `{FILES_URL}/files/datasources/{file_id}?...` +- `{FILES_URL}/files/workspaces/{tenant_id}/webapp-logo` + +These signed URLs are given to the frontend for displaying images/files, to multi-modal models as input, and to observability/tracing integrations (LangFuse, LangSmith, etc.). + +**Fallback mechanism:** If `FILES_URL` is not set, Pydantic's `validation_alias=AliasChoices("FILES_URL", "CONSOLE_API_URL")` causes it to use `CONSOLE_API_URL` instead. + +**If both are empty:** File previews, tool outputs, and workspace logos all fail to display. + +**If set:** All file URLs resolve correctly. Required for file processing plugins. + +**Key code locations:** +- Definition: `api/configs/feature/__init__.py` (FileAccessConfig) +- Datasource files: `api/core/datasource/datasource_file_manager.py` +- Tool files: `api/core/tools/signature.py` (when `for_external=True`) +- Workspace logos: `api/services/workspace_service.py`, `api/controllers/web/site.py` +- Observability: `api/core/ops/langfuse_trace/`, `langsmith_trace/`, `opik_trace/`, etc. + +--- + +### INTERNAL_FILES_URL + +**Default:** `""` (empty; falls back to `FILES_URL`) + +**What it actually does:** Same purpose as `FILES_URL`, but for communication **between services inside the Docker network**. Every usage in the code follows this pattern: + +```python +base_url = dify_config.INTERNAL_FILES_URL or dify_config.FILES_URL +``` + +It's used when plugins, PDF extractors, or Word document processors need to access files. These internal services may not be able to reach the external `FILES_URL` (which might go through Nginx, a CDN, or a public domain that isn't routable from inside Docker). + +**If left empty:** Falls back to `FILES_URL`. Works fine if internal services can reach the external URL. + +**If set (e.g., `http://api:5001`):** Services communicate directly within Docker, avoiding external routing. 
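The signed-URL scheme shared by `FILES_URL` and `INTERNAL_FILES_URL` can be sketched as follows. The message format and function names here are assumptions for illustration only; the real logic lives in `api/core/tools/signature.py`:

```python
import hashlib
import hmac
import os
import time

SECRET_KEY = "change-this-to-a-random-secret"  # assumed example; signing uses SECRET_KEY
FILES_ACCESS_TIMEOUT = 300                     # seconds a signed URL stays valid

def sign_file_url(files_url: str, file_id: str) -> str:
    # Illustrative timestamp/nonce/sign pattern, not Dify's exact format.
    timestamp = str(int(time.time()))
    nonce = os.urandom(8).hex()
    msg = f"{file_id}:{timestamp}:{nonce}".encode()
    sign = hmac.new(SECRET_KEY.encode(), msg, hashlib.sha256).hexdigest()
    return f"{files_url}/files/tools/{file_id}?timestamp={timestamp}&nonce={nonce}&sign={sign}"

def verify_signature(file_id: str, timestamp: str, nonce: str, sign: str) -> bool:
    msg = f"{file_id}:{timestamp}:{nonce}".encode()
    expected = hmac.new(SECRET_KEY.encode(), msg, hashlib.sha256).hexdigest()
    # Reject tampered URLs and URLs older than FILES_ACCESS_TIMEOUT seconds.
    return hmac.compare_digest(expected, sign) and \
        time.time() - int(timestamp) <= FILES_ACCESS_TIMEOUT
```

Because the signature covers the file ID, timestamp, and nonce, a leaked URL cannot be edited to point at another file, and it expires on its own.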
+ +**Key code locations:** +- Definition: `api/configs/feature/__init__.py` (FileAccessConfig) +- Tool files: `api/core/tools/tool_file_manager.py`, `api/core/tools/signature.py` (when `for_external=False`) +- Document extraction: `api/core/rag/extractor/pdf_extractor.py`, `word_extractor.py` +- Workflow runtime: `api/core/app/workflow/file_runtime.py` + +--- + +### FILES_ACCESS_TIMEOUT + +**Default:** `300` (seconds, i.e., 5 minutes) + +**What it actually does:** Controls how long signed file URLs remain valid. Every file URL Dify generates includes a timestamp and HMAC signature. When the URL is accessed, the verification logic checks: + +```python +current_time - timestamp <= FILES_ACCESS_TIMEOUT +``` + +If the URL is older than `FILES_ACCESS_TIMEOUT` seconds, it's rejected. + +**If set to a larger value (e.g., 3600):** URLs stay valid longer, useful for long-running processes. Less protection against URL reuse/replay. + +**If set to a smaller value (e.g., 60):** URLs expire faster, more secure but may break slow operations. + +**Key code locations:** +- Definition: `api/configs/feature/__init__.py` (FileAccessConfig) +- Verification: `api/core/datasource/datasource_file_manager.py`, `api/core/tools/tool_file_manager.py`, `api/core/tools/signature.py` +- Workflow runtime: `api/core/app/workflow/file_runtime.py` + +--- + +## Server Configuration + +### SECRET_KEY + +**Default:** `change-this-to-a-random-secret` (pre-filled in .env.example; must be replaced for production) + +**What it actually does:** This is the most critical security variable. It's used for four distinct purposes: + +1. **Session cookie signing**—Flask uses it to sign browser session cookies, preventing forgery. +2. **JWT token signing**—All authentication tokens (login sessions) are signed with HS256 using this key. Both creation and verification depend on it. +3. 
**File URL signing**—Every file preview/download URL includes an HMAC-SHA256 signature derived from this key, making URLs tamper-proof and time-limited. +4. **OAuth credential encryption**—Third-party OAuth credentials (client_id, client_secret for plugin integrations) are encrypted with AES-256-CBC using a key derived from SECRET_KEY. + +**If changed after deployment:** +- All users are immediately logged out (session cookies and JWT tokens become invalid) +- All existing file preview URLs break +- All encrypted OAuth credentials become undecryptable—plugin integrations stop working +- This is essentially a "reset everything" action + +**Key code locations:** +- Definition: `api/configs/feature/__init__.py` (SecurityConfig) +- Flask session: `api/extensions/ext_set_secretkey.py` +- JWT: `api/libs/passport.py` +- File signing: `api/core/tools/signature.py`, `api/core/datasource/datasource_file_manager.py` +- OAuth encryption: `api/core/tools/utils/system_oauth_encryption.py` + +--- + +### INIT_PASSWORD + +**Default:** (empty) + +**What it actually does:** An optional security gate for the initial admin account setup. It is NOT defined in the Pydantic config—it's read directly from the environment via `os.environ.get("INIT_PASSWORD")`. + +When set, Dify requires this password to be entered at the `/install` page before the admin account can be created. The `@setup_required` decorator on API endpoints blocks all access until this validation passes. + +When empty, the setup page is open without any password—anyone who can reach the `/install` URL can create the admin account. + +**Only relevant during first-time setup.** Once the `DifySetup` database record exists (i.e., setup is complete), this variable has no further effect. Maximum length: 30 characters. 
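The gate described above amounts to a few lines. A sketch with a hypothetical helper name (the real check lives in `api/controllers/console/init_validate.py`):

```python
import os

def validate_init_password(submitted: str) -> bool:
    # Read directly from the environment, as noted above:
    # INIT_PASSWORD is not part of the Pydantic config.
    init_password = os.environ.get("INIT_PASSWORD")
    if not init_password:
        return True  # no gate: the /install page is open to anyone
    return submitted == init_password

os.environ["INIT_PASSWORD"] = "letmein"       # example value for the sketch
print(validate_init_password("letmein"))      # True
print(validate_init_password("wrong"))        # False
```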
+ +**Key code locations:** +- Validation: `api/controllers/console/init_validate.py` +- Setup gate: `api/controllers/console/wraps.py` (`@setup_required` decorator) + +--- + +### DEBUG + +**Default:** `false` + +**What it actually does:** Enables verbose logging across many subsystems. When `true`: + +1. **App startup timing**—logs how long each extension takes to initialize. +2. **Workflow debugging**—adds a `DebugLoggingLayer` to the GraphEngine that logs all node inputs and outputs. +3. **Tool execution details**—prints colored console output showing tool invocations, inputs, outputs (truncated to 1000 chars), and errors. +4. **LLM invocation logging**—logs the full prompt messages sent to models, streaming chunks as they arrive, and final responses with usage stats. +5. **Exception details**—various app generators log full stack traces on errors instead of silently failing. + +This is primarily a developer/debugging tool. Not recommended for production due to the volume of output and potential exposure of sensitive data in logs. + +**Key code locations:** +- Definition: `api/configs/deploy/__init__.py` +- Workflow debug: `api/core/workflow/workflow_entry.py` +- Tool callbacks: `api/core/callback_handler/agent_tool_callback_handler.py` +- LLM logging: `api/dify_graph/model_runtime/model_providers/__base/large_language_model.py` + +--- + +### FLASK_DEBUG + +**Default:** `false` + +**What it actually does:** Defined in `.env.example` but **not actively used** in the Dify codebase. Flask's standard `FLASK_DEBUG` would enable Flask's auto-reloader and interactive debugger, but Dify doesn't leverage this mechanism—`DEBUG` is the primary control instead. + +--- + +### ENABLE_REQUEST_LOGGING + +**Default:** `false` + +**What it actually does:** Registers Flask signal handlers that log HTTP request and response details. 
When enabled: + +- **Always logs** a compact access line for every request: `{METHOD} {PATH} {STATUS_CODE} {DURATION_MS} {TRACE_ID}` +- **If LOG_LEVEL is also DEBUG**, additionally logs full request and response bodies as pretty-printed JSON. + +Useful for debugging API issues, but can produce very large log volumes in production. The body logging only activates when both this variable AND DEBUG-level logging are enabled—enabling just this variable gives you the compact access log without the bodies. + +**Key code locations:** +- Definition: `api/configs/deploy/__init__.py` +- Implementation: `api/extensions/ext_request_logging.py` + +--- + +### DEPLOY_ENV + +**Default:** `PRODUCTION` + +**What it actually does:** Purely an **observability label**—it tags monitoring data but does NOT change any application behavior. Despite the `.env.example` comment about a "distinct color label on the front-end page," the backend code has no conditional logic based on this value. + +It's sent to: +- **Sentry** as the `environment` tag for error grouping +- **OpenTelemetry** as the deployment environment resource attribute +- **HTTP response headers** as `X-Env` on every response + +Setting it to `TESTING`, `STAGING`, or any custom value simply changes these labels. The frontend color label behavior, if it exists, is handled by the web frontend independently. + +**Key code locations:** +- Definition: `api/configs/deploy/__init__.py` +- Sentry: `api/extensions/ext_sentry.py` +- OTEL: `api/extensions/ext_otel.py` +- Response header: `api/extensions/ext_app_metrics.py` + +--- + +### MIGRATION_ENABLED + +**Default:** `true` + +**What it actually does:** Controls whether database schema migrations run automatically when the Docker container starts. 
This is handled in the Docker entrypoint script (not in Python config): + +```bash +if [[ "${MIGRATION_ENABLED}" == "true" ]]; then + flask upgrade-db +fi +``` + +When `true`, the container runs `flask upgrade-db` before starting the API server, ensuring the database schema matches the code version. This is essential during upgrades. + +When `false`, migrations are skipped—useful if you want to run them manually or if a separate migration job handles them. For source code deployments (non-Docker), you always run migrations manually with `flask db upgrade`. + +**Key code locations:** +- Docker entrypoint: `docker/docker-compose.yaml` and `api/docker/entrypoint.sh` + +--- + +### CHECK_UPDATE_URL + +**Default:** `https://updates.dify.ai` + +**What it actually does:** The console has a version check feature that calls this URL with the current version. The remote endpoint returns information about newer versions (version number, release date, release notes, whether auto-update is possible). + +If set to empty, the version check is skipped entirely—the console shows the current version but never indicates that updates are available. This is useful in air-gapped environments or if you don't want the system making external HTTP calls. + +**Key code locations:** +- Definition: `api/configs/feature/__init__.py` (UpdateConfig) +- Usage: `api/controllers/console/version.py` + +--- + +### OPENAI_API_BASE + +**Default:** `https://api.openai.com/v1` + +**What it actually does:** This is a **legacy variable** that appears in `.env.example` but is **not actively used** in the current codebase. The modern implementation uses hosted service configurations (`HOSTED_OPENAI_API_BASE`, etc.) instead. It may still be picked up by the OpenAI Python SDK if present in the environment, but Dify's own code does not reference it. + +--- + +### ACCESS_TOKEN_EXPIRE_MINUTES + +**Default:** `60` (1 hour) + +**What it actually does:** Controls how long a login session's access token remains valid. 
Dify uses JWT tokens stored in HTTP-only cookies. The access token has a short lifespan—when it expires, the browser automatically uses the refresh token to get a new one without requiring re-login. + +**Key code locations:** +- JWT creation: `api/services/account_service.py` +- Cookie settings: `api/libs/token.py` +- WebApp tokens: `api/controllers/web/passport.py` + +--- + +### REFRESH_TOKEN_EXPIRE_DAYS + +**Default:** `30` (1 month) + +**What it actually does:** Controls how long a user can stay logged in without re-entering credentials. The refresh token is stored in Redis with this TTL and as an HTTP-only cookie. When the access token expires (every 60 minutes by default), the refresh token is used to silently generate a new access token. + +If the user doesn't visit for longer than this period, they'll need to log in again. + +**Key code locations:** +- Token storage: `api/services/account_service.py` +- Cookie settings: `api/libs/token.py` + +--- + +### APP_MAX_EXECUTION_TIME + +**Default:** `1200` (20 minutes) + +**What it actually does:** Sets the maximum time an app execution (chat completion, workflow run, etc.) can run before being forcefully terminated. The queue listener that streams results monitors elapsed time and publishes a stop event when this limit is reached. + +This prevents runaway executions from consuming resources indefinitely—for example, a workflow with an infinite loop or an extremely slow LLM call. + +**Key code locations:** +- Definition: `api/configs/feature/__init__.py` (AppExecutionConfig) +- Enforcement: `api/core/app/apps/base_app_queue_manager.py` + +--- + +### APP_DEFAULT_ACTIVE_REQUESTS + +**Default:** `0` (unlimited) + +**What it actually does:** Sets the default concurrent request limit per app. When an app doesn't have a custom `max_active_requests` setting configured in the UI, this value is used as the fallback. + +If set to `5`, each app allows at most 5 simultaneous executions by default. 
New requests while the limit is reached are rejected. `0` means no limit. + +Works together with `APP_MAX_ACTIVE_REQUESTS`—the effective limit is the smaller of the two non-zero values. + +**Key code locations:** +- Definition: `api/configs/feature/__init__.py` (AppExecutionConfig) +- Enforcement: `api/services/app_generate_service.py` + +--- + +### APP_MAX_ACTIVE_REQUESTS + +**Default:** `0` (unlimited) + +**What it actually does:** Sets the global hard ceiling for concurrent requests across all apps. Even if an app's individual limit (or `APP_DEFAULT_ACTIVE_REQUESTS`) is higher, this value can never be exceeded. + +Think of it as: `APP_DEFAULT_ACTIVE_REQUESTS` is the per-app default, and `APP_MAX_ACTIVE_REQUESTS` is the system-wide maximum that overrides everything. + +**Key code locations:** +- Definition: `api/configs/feature/__init__.py` (AppExecutionConfig) +- Enforcement: `api/services/app_generate_service.py` + +--- + +## Datasource Configuration + +### ENABLE_WEBSITE_JINAREADER / ENABLE_WEBSITE_FIRECRAWL / ENABLE_WEBSITE_WATERCRAWL + +**Defaults:** `true` / `true` / `true` + +**What they actually do:** These are **frontend-only** feature flags. They control whether the corresponding web crawling service appears as an option in the dataset creation UI. The backend does not check these variables—it always supports all crawlers. Setting one to `false` simply hides that option from the UI. + +**Key code locations:** +- Frontend config: `web/config/index.ts`, `web/env.ts` +- UI rendering: `web/app/components/datasets/create/website/index.tsx` + +--- + +### NEXT_PUBLIC_ENABLE_SINGLE_DOLLAR_LATEX + +**Default:** `false` + +**What it actually does:** **Frontend-only.** Controls whether single dollar signs (`$...$`) trigger inline LaTeX rendering in chat responses. Disabled by default because single dollar signs commonly appear in regular text (prices, code), causing unintended math formatting. 
When enabled, both `$...$` (inline) and `$$...$$` (block) trigger LaTeX. When disabled, only double-dollar works. + +**Key code locations:** +- Frontend: `web/env.ts`, `web/app/components/base/markdown/streamdown-wrapper.tsx` + +--- + +## Database Configuration — Connection Pool + +### SQLALCHEMY_MAX_OVERFLOW + +**Default:** `10` + +**What it actually does:** When all `SQLALCHEMY_POOL_SIZE` connections are in use, SQLAlchemy can create up to this many additional temporary connections. So with `pool_size=30` and `max_overflow=10`, up to 40 connections can exist simultaneously. These overflow connections are closed immediately after use (not returned to the pool). If even the overflow is exhausted, new requests wait up to `SQLALCHEMY_POOL_TIMEOUT` seconds. + +--- + +### SQLALCHEMY_POOL_PRE_PING + +**Default:** `false` + +**What it actually does:** Before handing out a connection from the pool, sends a lightweight test query (`SELECT 1`) to verify the connection is still alive. If the connection is dead (e.g., database restarted, network blip), it's discarded and a fresh one is created. Adds a small amount of latency to every database operation, but prevents "connection lost" errors. Recommended for production deployments with long idle periods or unreliable networks. + +--- + +### SQLALCHEMY_POOL_USE_LIFO + +**Default:** `false` (FIFO) + +**What it actually does:** Controls the order connections are reused from the pool. FIFO (default) rotates through all connections evenly—good for distributing load. LIFO reuses the most recently returned connection—keeps fewer connections "warm" and can reduce overhead when the pool is larger than typical demand. + +--- + +### SQLALCHEMY_POOL_TIMEOUT + +**Default:** `30` (seconds) + +**What it actually does:** When a request needs a database connection but none are available (all `pool_size + max_overflow` connections are busy), it waits this many seconds. If no connection frees up in time, the request fails with a timeout error. 
This is a safety valve preventing requests from hanging indefinitely during database overload. + +--- + +### PostgreSQL / MySQL Performance Tuning Variables + +These are **not read by Dify's Python code**. They are passed as startup arguments to the database container in `docker-compose.yaml`. For example, `POSTGRES_SHARED_BUFFERS=128MB` becomes `postgres -c 'shared_buffers=128MB'`. They configure the database server itself, not the application. + +--- + +## Redis Configuration + +### REDIS_USE_CLUSTERS / REDIS_CLUSTERS / REDIS_CLUSTERS_PASSWORD + +**What they actually do:** Enable Redis Cluster mode (as opposed to standalone or Sentinel mode). When `REDIS_USE_CLUSTERS=true`, Dify creates a `RedisCluster` client that connects to multiple Redis nodes for automatic sharding and high availability. `REDIS_CLUSTERS` is a comma-separated list of nodes (`host1:port1,host2:port2`). Cluster mode is mutually exclusive with Sentinel mode—you use one or the other. + +**Key code locations:** +- Definition: `api/configs/middleware/cache/redis_config.py` +- Client creation: `api/extensions/ext_redis.py` + +--- + +### REDIS_MAX_CONNECTIONS + +**Default:** (empty; uses redis-py library default) + +**What it actually does:** Limits the total number of connections in the Redis connection pool. Applied to standalone, Sentinel, and Cluster modes. When the pool is exhausted, new operations block waiting for a connection to free up. Leave unset to use the library default. Set this if you need to limit Redis connections to match your Redis server's `maxclients` setting. + +--- + +### Redis SSL Variables + +`REDIS_SSL_CERT_REQS`, `REDIS_SSL_CA_CERTS`, `REDIS_SSL_CERTFILE`, `REDIS_SSL_KEYFILE` only take effect when `REDIS_USE_SSL=true`. 
They configure TLS certificate verification and mutual TLS (mTLS) authentication: + +- `CERT_REQS`: Level of verification (`CERT_NONE` = no verification, `CERT_REQUIRED` = full verification) +- `CA_CERTS`: Path to the CA certificate for verifying the server +- `CERTFILE` + `KEYFILE`: Client certificate and key for mutual TLS + +These same SSL settings are also applied to the Celery broker when `BROKER_USE_SSL=true`. + +--- + +## Celery Configuration + +### CELERY_BACKEND + +**Default:** `redis` + +**What it actually does:** Controls where Celery stores task results after execution. Options: `redis` (stores in Redis, fast), `database` (stores in the main PostgreSQL/MySQL database). For most deployments, `redis` is the right choice. + +--- + +### CELERY_TASK_ANNOTATIONS + +**Default:** `null` + +**What it actually does:** Applies runtime configuration to specific Celery tasks. Format is a JSON dictionary mapping task names to options like rate limits or time limits. Example: `{"tasks.add": {"rate_limit": "10/s"}}` limits that task to 10 executions per second. Most users don't need this. + +--- + +### CELERY_SENTINEL_PASSWORD + +**Default:** (empty) + +**What it actually does:** Password for authenticating with Redis Sentinel nodes when using Sentinel mode for the Celery broker. This is separate from `REDIS_SENTINEL_PASSWORD`—they can differ if you use different Sentinel clusters for caching vs task queuing, though in practice they're usually the same value. + +--- + +### BROKER_USE_SSL + +**Default:** (auto-detected from URL scheme) + +**What it actually does:** This is a computed property, not something you set directly. It returns `true` when `CELERY_BROKER_URL` starts with `rediss://` (note the double `s`). When true, the Redis SSL certificate settings are applied to the Celery broker connection. 
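The scheme detection behind `BROKER_USE_SSL` can be sketched as a standalone function (a simplified stand-in for the computed property, not Dify's actual Celery config code):

```python
from urllib.parse import urlparse

def broker_use_ssl(celery_broker_url: str) -> bool:
    # TLS is enabled when the broker URL uses the rediss:// scheme
    # (note the double "s"), as described above.
    return urlparse(celery_broker_url).scheme == "rediss"

print(broker_use_ssl("rediss://:password@redis:6379/1"))  # True
print(broker_use_ssl("redis://redis:6379/1"))             # False
```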
+ +--- + +## CORS Configuration + +### COOKIE_DOMAIN / NEXT_PUBLIC_COOKIE_DOMAIN + +**What they actually do:** These work together to enable cross-subdomain authentication. + +By default (both empty), Dify uses `__Host-` prefixed cookies—the most secure option, but cookies are locked to a single domain. When your frontend and backend are on different subdomains (e.g., `console.example.com` and `api.example.com`), set both to the shared top-level domain (`example.com`) so authentication cookies can be shared across subdomains. + +`COOKIE_DOMAIN` is used by the backend when setting cookies. `NEXT_PUBLIC_COOKIE_DOMAIN` is used by the frontend to know the cookie domain. + +--- + +### NEXT_PUBLIC_BATCH_CONCURRENCY + +**Default:** `5` + +**What it actually does:** **Frontend-only.** Controls how many concurrent API calls the web UI makes during batch operations (e.g., bulk dataset indexing). Does not affect the backend. + +--- + +## Datasource Configuration (continued) + +### ENABLE_WEBSITE_JINAREADER / ENABLE_WEBSITE_FIRECRAWL / ENABLE_WEBSITE_WATERCRAWL + +Already documented above. + +### NEXT_PUBLIC_ENABLE_SINGLE_DOLLAR_LATEX + +Already documented above. + +--- + +## Database Configuration (continued) + +### DB_TYPE + +**Default:** `postgresql` + +**What it actually does:** Controls the SQLAlchemy database driver. When set to `postgresql`, uses the `postgresql` driver. Any other value (`mysql`, `oceanbase`, `seekdb`) uses `mysql+pymysql`. Also affects connect_args: PostgreSQL gets `-c timezone=UTC` appended to force UTC timezone on all connections. + +**Key code locations:** +- Definition and URI construction: `api/configs/middleware/__init__.py` (DatabaseConfig) + +--- + +### DB_USERNAME + +**Default:** `postgres` + +**What it actually does:** Database username, URL-encoded into the SQLAlchemy connection URI. There is NO MySQL root-only restriction enforced in the code despite the `.env.example` comment—any valid MySQL user works. 
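The URL-encoding behavior described for `DB_USERNAME` (and `DB_PASSWORD`, below) can be sketched as follows. The driver mapping is simplified from the `DB_TYPE` description above, and the function itself is illustrative, not the actual `DatabaseConfig` code:

```python
from urllib.parse import quote_plus

def sqlalchemy_uri(db_type: str, username: str, password: str,
                   host: str, port: int, database: str) -> str:
    # postgresql keeps the postgresql driver; other values (mysql,
    # oceanbase, seekdb) use mysql+pymysql, as described above.
    driver = "postgresql" if db_type == "postgresql" else "mysql+pymysql"
    # Credentials are URL-encoded so @ : % in passwords are safe to use.
    return (f"{driver}://{quote_plus(username)}:{quote_plus(password)}"
            f"@{host}:{port}/{database}")

print(sqlalchemy_uri("postgresql", "postgres", "p@ss:w%rd", "db_postgres", 5432, "dify"))
# postgresql://postgres:p%40ss%3Aw%25rd@db_postgres:5432/dify
```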
+ +--- + +### DB_PASSWORD + +**Default:** `difyai123456` + +**What it actually does:** Database password, URL-encoded into the connection URI to handle special characters like `@`, `:`, `%`. + +--- + +### DB_HOST + +**Default:** `db_postgres` (Docker service name) / `localhost` (code default) + +**What it actually does:** Database server hostname, used directly in the SQLAlchemy connection URI. + +--- + +### DB_PORT + +**Default:** `5432` (always, regardless of DB_TYPE) + +**What it actually does:** Database server port. Important: the default is hardcoded to 5432 (PostgreSQL). If you switch to MySQL, you MUST explicitly set `DB_PORT=3306`—there is no auto-detection based on DB_TYPE. + +--- + +### DB_DATABASE + +**Default:** `dify` + +**What it actually does:** Database/schema name in the connection URI. + +--- + +### SQLALCHEMY_POOL_SIZE + +**Default:** `30` + +**What it actually does:** Number of persistent connections SQLAlchemy keeps open in its pool. These connections are pre-created and held open, ready for immediate use. Increasing this allows more concurrent database operations but uses more database server resources. + +--- + +### SQLALCHEMY_POOL_RECYCLE + +**Default:** `3600` (1 hour) + +**What it actually does:** Automatically closes and recreates connections that have been alive longer than this many seconds. Solves the problem of database servers silently dropping idle connections—without recycling, SQLAlchemy would try to use a dead connection and fail. + +--- + +### SQLALCHEMY_ECHO + +**Default:** `false` + +**What it actually does:** When enabled, logs every SQL statement SQLAlchemy executes to the Python logger. Output includes the raw SQL with bound parameters. Generates massive log volume—for development/debugging only. + +--- + +## Logging Configuration (continued) + +### LOG_LEVEL + +**Default:** `INFO` + +**What it actually does:** Sets the root Python logger level via `logging.basicConfig(level=...)`. 
Controls the minimum severity that gets logged across all handlers (file + console). Levels from least to most severe: `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`. + +--- + +### LOG_OUTPUT_FORMAT + +**Default:** `text` + +**What it actually does:** Chooses between two log formatters. `text` produces human-readable lines with timestamp, level, thread, file:line, trace ID, and message. Supports timezone conversion via `LOG_TZ`. `json` produces structured JSON suitable for log aggregation tools (ELK, Datadog, etc.)—but does NOT support `LOG_TZ` (always UTC). + +**Key code locations:** +- Formatter selection: `api/extensions/ext_logging.py` + +--- + +### LOG_FILE + +**Default:** (empty; console-only logging) + +**What it actually does:** When set, enables file-based logging with automatic rotation via Python's `RotatingFileHandler`. The directory is created automatically if it doesn't exist. Also passed to Celery workers as `worker_logfile`. When empty, logs only go to console (stdout). + +--- + +### LOG_FILE_MAX_SIZE + +**Default:** `20` (MB) + +**What it actually does:** Maximum size of a single log file before rotation. Converted to bytes internally (`value * 1024 * 1024`). When the active log file exceeds this size, it's renamed to `.1`, the previous `.1` becomes `.2`, etc. + +--- + +### LOG_FILE_BACKUP_COUNT + +**Default:** `5` + +**What it actually does:** Number of rotated log files to keep. With default settings, you'll have at most 6 log files: the active file plus 5 backups (`.1` through `.5`). The oldest is deleted when a new rotation occurs. + +--- + +### LOG_DATEFORMAT + +**Default:** (empty; uses Python default `%Y-%m-%d %H:%M:%S`) + +**What it actually does:** Timestamp format string for text-format logs. Uses Python's strftime codes. Only applies to text format—JSON format ignores this. + +--- + +### LOG_TZ + +**Default:** `UTC` + +**What it actually does:** Timezone for log timestamps. 
Accepts pytz timezone strings (e.g., `America/New_York`, `Asia/Shanghai`). Only applies to text format logs—JSON format always uses UTC. Also sets Celery's internal timezone for task scheduling. + +--- + +## Redis Configuration (continued) + +### REDIS_HOST + +**Default:** `localhost` (code) / `redis` (.env.example) + +**What it actually does:** Redis server hostname. Only used in standalone mode—ignored when Sentinel or Cluster mode is enabled. + +--- + +### REDIS_PORT + +**Default:** `6379` + +**What it actually does:** Redis server port. Only used in standalone mode. + +--- + +### REDIS_USERNAME + +**Default:** (empty) + +**What it actually does:** Redis 6.0+ ACL username. Applies to all three modes (standalone, Sentinel, Cluster). When empty, uses Redis's default user. + +--- + +### REDIS_PASSWORD + +**Default:** `difyai123456` + +**What it actually does:** Redis authentication password. Applies to standalone and Sentinel modes. For Cluster mode, use `REDIS_CLUSTERS_PASSWORD` instead. + +--- + +### REDIS_DB + +**Default:** `0` + +**What it actually does:** Redis database number (0-15). Only applies to standalone and Sentinel modes (Cluster mode doesn't support database selection). Important: Celery broker parses its own database number from `CELERY_BROKER_URL`—make sure they don't collide. Default setup uses DB 0 for cache and DB 1 for Celery. + +--- + +### REDIS_USE_SSL + +**Default:** `false` + +**What it actually does:** Changes the Redis connection class from plain `Connection` to `SSLConnection`. Also auto-detects in pub/sub URL building (changes scheme to `rediss://`). Does NOT automatically apply SSL to Sentinel protocol—Sentinel runs its own protocol. + +--- + +### REDIS_USE_SENTINEL + +**Default:** `false` + +**What it actually does:** Switches from standalone to Sentinel mode. When enabled, `REDIS_HOST`/`REDIS_PORT` are ignored. 
Instead, Dify connects to the Sentinel nodes listed in `REDIS_SENTINELS`, asks them for the current master node for the service named `REDIS_SENTINEL_SERVICE_NAME`, and connects to that master. Failover is automatic—if the master goes down, Sentinel promotes a replica and Dify follows. + +--- + +### REDIS_SENTINELS + +**Default:** (empty) + +**What it actually does:** Comma-separated list of Sentinel nodes, parsed by splitting on `,` then on `:` to get `[(host1, port1), (host2, port2), ...]`. These are the Sentinel instances, not the actual Redis servers. + +--- + +### REDIS_SENTINEL_SERVICE_NAME + +**Default:** (empty) + +**What it actually does:** The logical service name that Sentinel monitors (configured in sentinel.conf as `sentinel monitor ...`). Dify calls `sentinel.master_for(service_name)` to get the current master's address. + +--- + +### REDIS_SENTINEL_USERNAME / REDIS_SENTINEL_PASSWORD + +**Defaults:** (both empty) + +**What they actually do:** Authentication for the Sentinel instances themselves—NOT for the Redis master/replica servers (those use `REDIS_USERNAME`/`REDIS_PASSWORD`). Sentinel nodes may require separate credentials. + +--- + +### REDIS_SENTINEL_SOCKET_TIMEOUT + +**Default:** `0.1` (seconds) + +**What it actually does:** Socket timeout for communicating with Sentinel nodes. If too low, Sentinel health checks and master discovery may time out intermittently. Default 0.1s assumes fast local network. For cloud/WAN deployments, increase to 1.0-5.0s. + +--- + +## Celery Configuration (continued) + +### CELERY_BROKER_URL + +Already documented above. Additional detail: the URL is parsed by Kombu's `parse_url()` to extract hostname, port, password, and database number. Supports both `redis://` and `rediss://` schemes. + +--- + +### CELERY_USE_SENTINEL / CELERY_SENTINEL_MASTER_NAME / CELERY_SENTINEL_SOCKET_TIMEOUT + +Already documented above. These form a cohesive unit: `CELERY_USE_SENTINEL` is the toggle, the other two provide the configuration. 
They configure `broker_transport_options` which Celery uses for Sentinel-aware broker connections. + +--- + +## CORS Configuration (continued) + +### WEB_API_CORS_ALLOW_ORIGINS + +**Default:** `*` + +**What it actually does:** Comma-separated string that gets split into a list. Applied to the Web API and Service API blueprints, covering public API endpoints (chat messages, embedded bots, authenticated API calls). + +--- + +### CONSOLE_CORS_ALLOW_ORIGINS + +**Default:** (empty; falls back to `CONSOLE_WEB_URL` via Pydantic AliasChoices) + +**What it actually does:** Same format as above, but applied to the console API blueprint and FastOpenAPI endpoints. If not explicitly set, Pydantic's `AliasChoices("CONSOLE_CORS_ALLOW_ORIGINS", "CONSOLE_WEB_URL")` falls back to `CONSOLE_WEB_URL`—so for single-domain setups, you only need to set `CONSOLE_WEB_URL`. + +--- + +## Container Startup Configuration (continued) + +### DIFY_BIND_ADDRESS + +**Default:** `0.0.0.0` + +**What it actually does:** Network interface the API server binds to. `0.0.0.0` listens on all interfaces. Set to `127.0.0.1` to restrict to localhost only. Used in both Flask debug mode and Gunicorn production mode. + +--- + +### DIFY_PORT + +**Default:** `5001` + +**What it actually does:** Port the API server listens on. Combined with `DIFY_BIND_ADDRESS` for the full socket binding in the entrypoint script. + +--- + +### SERVER_WORKER_AMOUNT + +**Default:** `1` + +**What it actually does:** Number of Gunicorn worker processes. With gevent (default), each worker handles multiple concurrent connections via greenlets, so 1 is usually sufficient. For sync workers, Gunicorn docs recommend `(2 x CPU cores) + 1`. + +--- + +### SERVER_WORKER_CLASS + +**Default:** `gevent` + +**What it actually does:** Gunicorn worker type. Gevent provides lightweight concurrency via greenlets. 
After gevent monkey-patches Python's standard library, a `post_patch()` hook patches psycopg2 (PostgreSQL driver) and gRPC for async compatibility. Changing this breaks these patches and requires removing gevent dependencies. + +--- + +### SERVER_WORKER_CONNECTIONS + +**Default:** `10` + +**What it actually does:** Maximum concurrent connections per worker. Only applies to async workers (gevent). With default settings (1 worker, 10 connections), the server handles up to 10 concurrent requests. Increase for high-concurrency deployments. + +--- + +### GUNICORN_TIMEOUT + +**Default:** `360` + +**What it actually does:** If a worker doesn't respond within this many seconds, Gunicorn kills and restarts it. The client request is lost. Set to 360 (6 minutes) to support long-lived SSE (Server-Sent Events) connections used for streaming LLM responses. + +--- + +### CELERY_WORKER_CLASS + +**Default:** (empty; defaults to `gevent`) + +**What it actually does:** Worker type for Celery task processing. Same gevent patching requirements as `SERVER_WORKER_CLASS`. The entrypoint script checks `CELERY_WORKER_POOL` first, then `CELERY_WORKER_CLASS`, then falls back to `gevent`. + +--- + +### CELERY_WORKER_AMOUNT + +**Default:** (empty; defaults to `1`) + +**What it actually does:** Number of Celery worker processes (concurrency level). Only used when autoscaling is disabled. Passed as the `-c` flag to Celery. + +--- + +### CELERY_AUTO_SCALE / CELERY_MAX_WORKERS / CELERY_MIN_WORKERS + +**Defaults:** `false` / (empty; defaults to CPU count) / (empty; defaults to 1) + +**What they actually do:** When `CELERY_AUTO_SCALE=true`, Celery uses `--autoscale=MAX,MIN` instead of fixed concurrency. Celery monitors queue depth and spawns/kills workers dynamically between MIN and MAX. Useful for variable workloads with spiky task queues. 
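The interaction between `CELERY_AUTO_SCALE`, `CELERY_MAX_WORKERS`, `CELERY_MIN_WORKERS`, and `CELERY_WORKER_AMOUNT` can be sketched as follows. The real entrypoint is a shell script; this Python sketch only mirrors the documented rules, and the function name is an assumption:

```python
import os

def celery_concurrency_args(env: dict) -> list:
    """Hypothetical sketch of the entrypoint logic described above:
    autoscale flags when enabled, otherwise fixed concurrency."""
    if env.get("CELERY_AUTO_SCALE", "false").lower() == "true":
        # --autoscale=MAX,MIN: MAX defaults to CPU count, MIN to 1.
        max_workers = env.get("CELERY_MAX_WORKERS") or str(os.cpu_count())
        min_workers = env.get("CELERY_MIN_WORKERS") or "1"
        return [f"--autoscale={max_workers},{min_workers}"]
    # Fixed concurrency: -c N, defaulting to a single worker.
    return ["-c", env.get("CELERY_WORKER_AMOUNT") or "1"]
```

For example, `CELERY_AUTO_SCALE=true` with `CELERY_MAX_WORKERS=8` and `CELERY_MIN_WORKERS=2` yields `--autoscale=8,2`; with autoscaling off and no overrides, the worker runs with `-c 1`.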
+ +--- + +### API_TOOL_DEFAULT_CONNECT_TIMEOUT / API_TOOL_DEFAULT_READ_TIMEOUT + +**Defaults:** `10` / `60` + +**What they actually do:** Timeout values (in seconds) for HTTP requests made by API Tool nodes in workflows. Connect timeout controls how long to wait for establishing a TCP connection; read timeout controls how long to wait for the response. When exceeded, the tool invocation fails. Read directly via `os.getenv()` in `api/core/tools/custom_tool/tool.py`. + +--- + +## Knowledge Configuration + +### UPLOAD_FILE_SIZE_LIMIT + +**Default:** `15` (MB) + +**What it actually does:** Maximum file size for general document uploads (PDFs, Word docs, etc.). Enforced in `FileService.is_file_size_within_limit()`—users get a `FileTooLargeError` when exceeded. Does not apply to images, videos, or audio (they have separate limits). + +### UPLOAD_FILE_BATCH_LIMIT + +**Default:** `5` + +**What it actually does:** Returned to the frontend as `batch_count_limit` in the upload config endpoint. Primarily a frontend hint for the file picker dialog, not strictly enforced server-side for individual batches. + +### UPLOAD_FILE_EXTENSION_BLACKLIST + +**Default:** (empty—all file types allowed) + +**What it actually does:** Comma-separated list of blocked file extensions (lowercase, no dots). Enforced in `FileService.upload_file()`—raises `BlockedFileExtensionError`. Users see: "File extension '.exe' is not allowed for security reasons." + +### SINGLE_CHUNK_ATTACHMENT_LIMIT + +**Default:** `10` + +**What it actually does:** Maximum number of images/files that can be embedded in a single knowledge base chunk (segment). A "chunk" is a segment of content in the knowledge base. When creating or editing a segment with more images than this limit, it raises a ValueError. + +### IMAGE_FILE_BATCH_LIMIT + +**Default:** `10` + +**What it actually does:** Maximum images per upload batch. Returned to frontend in upload config. 
Different from `SINGLE_CHUNK_ATTACHMENT_LIMIT` which limits images per knowledge base segment. + +### ATTACHMENT_IMAGE_FILE_SIZE_LIMIT + +**Default:** `2` (MB) + +**What it actually does:** Maximum size for images embedded in knowledge base content from external URLs. When Dify indexes a document with markdown images (`![alt](url)`), it fetches them. Images larger than this are skipped. Different from `UPLOAD_IMAGE_FILE_SIZE_LIMIT` (10 MB) which applies to direct image uploads via UI. + +### ATTACHMENT_IMAGE_DOWNLOAD_TIMEOUT + +**Default:** `60` (seconds) + +**What it actually does:** Timeout for downloading images from external URLs during knowledge base indexing. Uses SSRF proxy for the request. If an external image server is slow or unresponsive, the download is abandoned after this timeout and the image is skipped. + +### ETL_TYPE + +**Default:** `dify` + +**What it actually does:** Chooses the document extraction library. `dify` uses built-in extractors (supports txt, md, pdf, html, xlsx, docx, csv). `Unstructured` uses Unstructured.io (adds support for doc, msg, eml, ppt, pptx, xml, epub). The choice affects which file types appear as uploadable in the UI. + +### UNSTRUCTURED_API_URL / UNSTRUCTURED_API_KEY + +**Defaults:** both empty + +**What they actually do:** Connection settings for Unstructured.io API. Only needed when `ETL_TYPE=Unstructured`. The URL is also checked to enable .ppt file support (old PowerPoint format only works with Unstructured). + +### TOP_K_MAX_VALUE + +**Default:** `10` + +**What it actually does:** Maximum value users can set for the `top_k` parameter in knowledge base retrieval. Defined in `.env.example` and docker-compose but not yet in Python config classes. + +--- + +## Model Configuration + +### PROMPT_GENERATION_MAX_TOKENS / CODE_GENERATION_MAX_TOKENS + +**Defaults:** `512` / `1024` + +**What they actually do:** Defined in `.env.example` but not yet implemented in Python config. 
Intended to limit LLM output tokens when auto-generating prompts or code. + +### PLUGIN_BASED_TOKEN_COUNTING_ENABLED + +**Default:** `false` + +**What it actually does:** When enabled, uses plugin-based token counting via PluginModelClient for accurate token usage tracking. When disabled, token counting returns 0 (faster, but cost/usage tracking is less accurate). + +--- + +## Multi-modal Configuration + +### MULTIMODAL_SEND_FORMAT + +**Default:** `base64` + +**What it actually does:** When sending files to multi-modal LLMs, `base64` embeds the file data directly in the request (more compatible, works offline, larger requests). `url` sends a signed URL for the model to fetch (faster, smaller requests, but requires FILES_URL to be externally accessible and the model must have internet access). + +### UPLOAD_IMAGE_FILE_SIZE_LIMIT / UPLOAD_VIDEO_FILE_SIZE_LIMIT / UPLOAD_AUDIO_FILE_SIZE_LIMIT + +**Defaults:** `10` / `100` / `50` (MB) + +**What they actually do:** Maximum file sizes for direct uploads via UI, enforced in `FileService.is_file_size_within_limit()`. Each applies to its respective file type category (images: jpg/png/webp/gif/svg; videos: mp4/mov/mpeg/webm; audio: mp3/m4a/wav/amr/mpga). + +--- + +## Sentry Configuration + +### SENTRY_DSN vs API_SENTRY_DSN + +`SENTRY_DSN` is the canonical variable used in the Python backend. `API_SENTRY_DSN` is a Docker-level alias that maps to it. They are effectively the same. + +### WEB_SENTRY_DSN + +Frontend-only (maps to `NEXT_PUBLIC_SENTRY_DSN` in the Next.js web app). Not used by the backend. + +### PLUGIN_SENTRY_ENABLED / PLUGIN_SENTRY_DSN + +Placeholder variables for future plugin daemon Sentry integration. Not yet implemented in Python code. + +--- + +## Notion Integration + +### NOTION_INTEGRATION_TYPE + +**Default:** `public` + +**What it actually does:** Selects between two Notion API authentication modes. `public` uses standard OAuth 2.0 (requires HTTPS for redirect URL, needs CLIENT_ID + CLIENT_SECRET). 
`internal` uses a direct integration token (works with HTTP, only needs NOTION_INTERNAL_SECRET). Use `internal` for local deployments since Notion's OAuth redirect URL requires HTTPS. + +--- + +## Mail Configuration + +### SMTP_USE_TLS vs SMTP_OPPORTUNISTIC_TLS + +Three modes: +- **Implicit TLS** (`SMTP_USE_TLS=true`, `SMTP_OPPORTUNISTIC_TLS=false`): Uses `SMTP_SSL` on port 465. TLS from the start. +- **Explicit TLS/STARTTLS** (`SMTP_USE_TLS=true`, `SMTP_OPPORTUNISTIC_TLS=true`): Uses `SMTP` on port 587, then upgrades to TLS via STARTTLS command. +- **Plain** (`SMTP_USE_TLS=false`, `SMTP_OPPORTUNISTIC_TLS=false`): No encryption. +- `SMTP_USE_TLS=false` + `SMTP_OPPORTUNISTIC_TLS=true` is invalid and raises an error. + +--- + +## Others Configuration + +### CODE_EXECUTION_ENDPOINT / CODE_EXECUTION_API_KEY + +The sandbox is a separate Go service that executes Python/JavaScript/Jinja2 code nodes. Dify sends POST requests to `{endpoint}/v1/sandbox/run` with the API key in `X-Api-Key` header. The sandbox runs code in isolation with configurable network access. + +### WORKFLOW_MAX_EXECUTION_STEPS / WORKFLOW_MAX_EXECUTION_TIME / WORKFLOW_CALL_MAX_DEPTH + +Enforced by `ExecutionLimitsLayer` in the graph engine. Steps counts every node execution. Time is wall-clock. Depth limits nested workflow-calls-workflow. Exceeding any of these terminates the workflow. + +### MAX_VARIABLE_SIZE + +**Default:** `204800` (200 KB) + +Checked when creating variables: `if result.size > MAX_VARIABLE_SIZE` raises `VariableError`. Prevents memory attacks via extremely large variable values. + +### HTTP_REQUEST_MAX_CONNECT_TIMEOUT / HTTP_REQUEST_MAX_READ_TIMEOUT / HTTP_REQUEST_MAX_WRITE_TIMEOUT + +These are maximum ceilings. Users can set per-node timeouts in the workflow editor, but those values cannot exceed these limits. + +### WEBHOOK_REQUEST_BODY_MAX_SIZE + +Checked in webhook service's `_validate_content_length()`. Rejects payloads larger than this with `RequestEntityTooLarge`. 
Prevents webhook payload bombing. + +### SSRF_PROXY_HTTP_URL / SSRF_PROXY_HTTPS_URL + +All outbound HTTP requests from Dify (HTTP nodes, extension requests, image downloads) route through this SSRF proxy. The proxy (Squid) blocks requests to internal/private IP ranges, preventing Server-Side Request Forgery attacks. + +### RESPECT_XFORWARD_HEADERS_ENABLED + +When enabled, wraps the WSGI app with Flask's ProxyFix middleware, trusting X-Forwarded-For/Proto/Port headers. Only enable behind a single trusted reverse proxy—otherwise allows IP spoofing. + +--- + +## Plugin Daemon Configuration + +### PLUGIN_DAEMON_URL / PLUGIN_DAEMON_KEY + +The plugin daemon is a separate process. All plugin operations (list, install, execute) flow as HTTP requests to this URL with the key in `X-Api-Key` header. + +### PLUGIN_DIFY_INNER_API_KEY + +The reverse direction: when the plugin daemon needs to call back to the Dify API (e.g., to access files or models), it authenticates with this key via `X-Inner-Api-Key` header. Must match between API and plugin daemon services. + +### MARKETPLACE_ENABLED / MARKETPLACE_API_URL + +When disabled, only locally installed plugins are available. When enabled, Dify fetches plugin manifests from the marketplace for browsing, installation, and auto-upgrade checking. + +### FORCE_VERIFYING_SIGNATURE + +When true, plugin packages must have valid signatures before installation. Prevents installing tampered or unsigned plugins. + +### PIP_MIRROR_URL + +Used by the plugin daemon (not the Python API) when installing plugin dependencies. Set to a local PyPI mirror for faster installs or air-gapped environments. + +--- + +## OTLP / OpenTelemetry + +### ENABLE_OTEL + +Master switch. When enabled, instruments Flask with OpenTelemetry for distributed tracing and metrics. + +### OTLP endpoint fallback chain + +If `OTLP_TRACE_ENDPOINT` is set, use it. Otherwise, use `OTLP_BASE_ENDPOINT + "/v1/traces"`. Same pattern for metrics. 
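The fallback chain above is easy to express directly. This sketch uses an assumed function name and signature, not Dify's actual code:

```python
def resolve_otlp_endpoint(explicit_endpoint: str, base_endpoint: str, signal_path: str) -> str:
    """Illustrative sketch of the documented fallback: a signal-specific
    endpoint wins; otherwise the base endpoint plus the signal path."""
    if explicit_endpoint:
        return explicit_endpoint
    return base_endpoint.rstrip("/") + signal_path
```

With only `OTLP_BASE_ENDPOINT=http://collector:4318` set, traces go to `http://collector:4318/v1/traces`; setting `OTLP_TRACE_ENDPOINT` overrides that entirely. The same shape applies to the metrics endpoint with `/v1/metrics`.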
+ +### OTEL_SAMPLING_RATE + +**Default:** `0.1` (10%) + +Probabilistic sampling: only 10% of traces are exported by default. Reduces overhead in high-traffic production environments. + +--- + +## Scheduled Tasks + +### ENABLE_CLEAN_EMBEDDING_CACHE_TASK + +Deletes embedding cache records older than the configured retention period. Manages database size for the embeddings table. + +### ENABLE_CLEAN_UNUSED_DATASETS_TASK + +Disables documents in knowledge bases that haven't had activity within the retention period. Logs cleanup in `DatasetAutoDisableLog`. + +### ENABLE_CLEAN_MESSAGES + +Deletes conversation messages older than `SANDBOX_EXPIRED_RECORDS_RETENTION_DAYS`. If billing is enabled, only cleans sandbox-tier tenants. + +### ENABLE_MAIL_CLEAN_DOCUMENT_NOTIFY_TASK + +Sends email to workspace owners listing which knowledge bases had documents auto-disabled by the unused datasets cleanup task. + +### ENABLE_DATASETS_QUEUE_MONITOR + +Monitors the dataset processing queue length in Redis. Sends email alerts to `QUEUE_MONITOR_ALERT_EMAILS` when the backlog exceeds `QUEUE_MONITOR_THRESHOLD`. + +### ENABLE_CHECK_UPGRADABLE_PLUGIN_TASK + +Fetches all plugin manifests from marketplace, compares with installed versions, and dispatches upgrade tasks according to each tenant's auto-upgrade schedule. + +--- + +## Sandbox / Web Frontend / Nginx / Docker + +### TEXT_GENERATION_TIMEOUT_MS + +Frontend-only. Controls UI timeout for streaming text generation—pauses rendering if the stream exceeds this duration. + +### SANDBOX_* + +Configure the isolated code execution sandbox (a separate Go service). `SANDBOX_ENABLE_NETWORK` controls whether code can make outbound HTTP requests. `SANDBOX_WORKER_TIMEOUT` limits individual code execution time. + +### COMPOSE_PROFILES + +Docker Compose feature. Each service declares which profiles it belongs to. 
The value `${VECTOR_STORE:-weaviate},${DB_TYPE:-postgresql}` starts the correct database and vector store containers automatically based on your choices. + +--- + +## File Storage Configuration + +### STORAGE_TYPE + +**Default:** `opendal` + +**What it actually does:** The central dispatcher in `api/extensions/ext_storage.py` uses a match/case pattern to select and initialize the storage backend. 12 backend types are supported. The `opendal` type is the modern default—it wraps Apache OpenDAL which provides a unified interface to many storage services. The old `local` type is deprecated but still works—it internally uses `OpenDALStorage(scheme="fs")`, the same as opendal with filesystem scheme. + +### OPENDAL_SCHEME / OPENDAL_FS_ROOT + +**Defaults:** `fs` / `storage` + +**What they actually do:** When `STORAGE_TYPE=opendal`, Dify scans environment variables matching `OPENDAL_<SCHEME>_*` and passes them as kwargs to the OpenDAL Operator. For example, with `OPENDAL_SCHEME=s3`, it scans for `OPENDAL_S3_ACCESS_KEY_ID`, `OPENDAL_S3_SECRET_ACCESS_KEY`, etc. Environment variables take precedence over `.env` file values. + +For the default `fs` scheme, `OPENDAL_FS_ROOT` sets the local directory path. The directory is created automatically. + +### S3_USE_AWS_MANAGED_IAM + +**Default:** `false` + +**What it actually does:** When `true`, creates a `boto3.Session()` without explicit credentials—boto3 auto-discovers credentials from EC2 instance metadata, ECS task roles, etc. When `false`, explicitly passes `S3_ACCESS_KEY` and `S3_SECRET_KEY` to the boto3 client. + +### ARCHIVE_STORAGE_* + +**What they actually do:** Separate S3-compatible storage for workflow run log archival. Used by the paid plan retention system to archive workflow runs older than 90 days to JSONL format. Tables archived include workflow_runs, workflow_node_executions, workflow_trigger_logs, etc. Requires `BILLING_ENABLED=true` and `ARCHIVE_STORAGE_ENABLED=true`.
Different from main storage (which stores active files)—archive storage is write-once, read-rarely for compliance/audit. + +### Provider credential variables + +All provider-specific credential variables (S3_ENDPOINT, AZURE_BLOB_ACCOUNT_NAME, ALIYUN_OSS_ACCESS_KEY, etc.) are defined in Pydantic config classes under `api/configs/middleware/storage/` and flow directly to their respective client libraries (boto3, azure-storage-blob, oss2, etc.) during initialization in `api/extensions/storage/`. + +--- + +## Vector Database Configuration + +### VECTOR_STORE + +**Default:** `weaviate` (from .env.example; code default is `None`) + +**What it actually does:** The factory in `api/core/rag/datasource/vdb/vector_factory.py` uses a match/case pattern with `VectorType` enum to select and initialize the vector store backend. Supports 37+ backends with lazy imports. Falls back to the dataset's existing index type if the dataset already has one. + +### VECTOR_INDEX_NAME_PREFIX + +**Default:** `Vector_index` + +**What it actually does:** Used in `Dataset.gen_collection_name_by_id()` to generate collection names: `{prefix}_{dataset_id}_Node`. Dataset ID hyphens are converted to underscores. + +### WEAVIATE_GRPC_ENDPOINT + +Separate gRPC endpoint for high-performance binary protocol communication alongside REST. Parses URL to extract host, port, and security scheme. Falls back to inferring from HTTP endpoint if not set. gRPC provides significantly better performance for batch operations. + +### MILVUS_ENABLE_HYBRID_SEARCH + +When enabled, creates a BM25 sparse index for full-text search alongside vector similarity search. Requires Milvus >= 2.5.0. If the collection was created without this flag, it must be recreated after enabling. + +### ELASTICSEARCH_USE_CLOUD + +Toggles between self-hosted mode (uses HOST/PORT/USERNAME/PASSWORD) and Elastic Cloud mode (uses CLOUD_URL/API_KEY). Different credential requirements and client initialization paths. 
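The collection-naming rule described under `VECTOR_INDEX_NAME_PREFIX` above can be sketched directly. This is an illustrative reimplementation, not the actual `Dataset.gen_collection_name_by_id()` method:

```python
def gen_collection_name(prefix: str, dataset_id: str) -> str:
    """Illustrative sketch of the documented rule: {prefix}_{id}_Node,
    with UUID hyphens converted to underscores."""
    # Many vector backends reject "-" in collection/index names.
    return f"{prefix}_{dataset_id.replace('-', '_')}_Node"
```

For example, with the default prefix, dataset `2f36f53c-1b2a-4c6e-9d1e-0a1b2c3d4e5f` maps to the collection `Vector_index_2f36f53c_1b2a_4c6e_9d1e_0a1b2c3d4e5f_Node`.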
+ +### OPENSEARCH_AUTH_METHOD + +Two modes: `basic` (username/password via http_auth) and `aws_managed_iam` (SigV4 request signing via Boto3 credentials). The `aws_service` setting distinguishes between Elasticsearch Service (`es`) and OpenSearch Serverless (`aoss`). + +### OCEANBASE_ENABLE_HYBRID_SEARCH + +Similar to Milvus—enables fulltext index creation for BM25 queries alongside vector search. Requires OceanBase >= 4.3.5.1. Collections must be recreated after enabling. diff --git a/.claude/skills/dify-docs-env-vars/verify-env-docs.py b/.claude/skills/dify-docs-env-vars/verify-env-docs.py new file mode 100644 index 000000000..5bb33a890 --- /dev/null +++ b/.claude/skills/dify-docs-env-vars/verify-env-docs.py @@ -0,0 +1,236 @@ +#!/usr/bin/env python3 +""" +Verify Dify environment variable documentation against .env.example. + +Parses both files, extracts variable names and defaults, and reports +discrepancies between what the documentation says and what .env.example defines. + +Usage: + python3 verify-env-docs.py [--env-example PATH] [--docs PATH] + +Both arguments are required (no defaults). +""" + +import argparse +import re +import sys +from pathlib import Path + + +def parse_env_example(path: str) -> dict[str, str]: + """Parse .env.example and return {VARIABLE_NAME: default_value}.""" + variables = {} + with open(path, encoding="utf-8") as f: + for line in f: + line = line.strip() + # Skip comments and empty lines + if not line or line.startswith("#"): + continue + # Match VARIABLE=value (value can be empty) + match = re.match(r"^([A-Z][A-Z0-9_]+)=(.*)", line) + if match: + name = match.group(1) + value = match.group(2).strip() + variables[name] = value + return variables + + +def parse_mdx_docs(path: str) -> dict[str, str]: + """Parse MDX documentation and extract documented defaults. + + Handles two formats: + 1. Table rows: | `VARIABLE` | `value` | description | + 2. 
Heading + Default line: + ### VARIABLE + Default: `value` + """ + variables = {} + with open(path, encoding="utf-8") as f: + lines = f.readlines() + + i = 0 + while i < len(lines): + line = lines[i].strip() + + # Format 1: Table rows — | `VAR_NAME` | `default` | description | + table_match = re.match( + r"^\|\s*`([A-Z][A-Z0-9_]+)`\s*\|\s*(.*?)\s*\|", line + ) + if table_match: + name = table_match.group(1) + default_cell = table_match.group(2).strip() + # Extract value from backticks if present + backtick_match = re.match(r"^`(.*)`$", default_cell) + if backtick_match: + variables[name] = backtick_match.group(1) + elif default_cell.startswith("("): + # (empty), (empty; falls back to...), (auto-generated), etc. + variables[name] = "" + else: + variables[name] = default_cell + i += 1 + continue + + # Format 2: ### VARIABLE_NAME followed by Default: `value` + heading_match = re.match(r"^###\s+([A-Z][A-Z0-9_]+)\s*$", line) + if heading_match: + name = heading_match.group(1) + # Look ahead for "Default:" line within next 5 lines + for j in range(1, 6): + if i + j >= len(lines): + break + next_line = lines[i + j].strip() + if not next_line: + continue + default_match = re.match(r"^Default:\s*(.+)", next_line) + if default_match: + raw = default_match.group(1).strip() + # Extract from backticks + bt = re.match(r"^`(.*)`", raw) + if bt: + variables[name] = bt.group(1) + elif raw.startswith("("): + variables[name] = "" + else: + variables[name] = raw + break + # Stop if we hit content that's not the default line + if next_line.startswith("#") or next_line.startswith("|"): + break + i += 1 + continue + + i += 1 + + return variables + + +PLACEHOLDER_PATTERNS = [ + "your-", "your_", "YOUR-", "YOUR_", + "xxx", "difyai", "dify-sandbox", + "sk-9f73s", "lYkiYYT6", "QaHbTe77", + "testaccount", "testpassword", "difypassword", + "gp-test.", "gp-ab", + "instance-name", + "WVF5YTha", +] + +PLACEHOLDER_EXACT = { + "dify", "password", "admin", +} + +PLACEHOLDER_CONTAINS = [ + 
"your-object-storage", + "xxx-vector", +] + + +def is_placeholder(value: str) -> bool: + """Check if a value is a placeholder (not a real default).""" + v = value.strip() + if v in PLACEHOLDER_EXACT: + return True + for pattern in PLACEHOLDER_PATTERNS: + if v.startswith(pattern): + return True + for pattern in PLACEHOLDER_CONTAINS: + if pattern in v: + return True + return False + + +def normalize(value: str) -> str: + """Normalize a value for comparison.""" + v = value.strip().strip('"').strip("'") + # Normalize boolean representations + if v.lower() in ("true", "yes", "1"): + return "true" + if v.lower() in ("false", "no", "0"): + return "false" + # Normalize empty + if v in ("null", "None", "none", ""): + return "" + # Treat placeholder values as empty (they're not real defaults) + if is_placeholder(v): + return "" + return v + + +def main(): + parser = argparse.ArgumentParser( + description="Verify env var documentation against .env.example" + ) + parser.add_argument( + "--env-example", + required=True, + help="Path to .env.example file (e.g., /path/to/dify/docker/.env.example)", + ) + parser.add_argument( + "--docs", + required=True, + help="Path to MDX documentation file (e.g., en/self-host/configuration/environments.mdx)", + ) + args = parser.parse_args() + + if not Path(args.env_example).exists(): + print(f"ERROR: .env.example not found at {args.env_example}") + sys.exit(1) + if not Path(args.docs).exists(): + print(f"ERROR: Documentation not found at {args.docs}") + sys.exit(1) + + env_vars = parse_env_example(args.env_example) + doc_vars = parse_mdx_docs(args.docs) + + print(f"Parsed {len(env_vars)} variables from .env.example") + print(f"Parsed {len(doc_vars)} variables from documentation") + print() + + # --- Check 1: Variables in .env.example but missing from docs --- + missing_from_docs = sorted(set(env_vars.keys()) - set(doc_vars.keys())) + if missing_from_docs: + print(f"=== MISSING FROM DOCS ({len(missing_from_docs)}) ===") + for name in 
missing_from_docs: + print(f" {name}={env_vars[name]}") + print() + + # --- Check 2: Variables in docs but not in .env.example --- + extra_in_docs = sorted(set(doc_vars.keys()) - set(env_vars.keys())) + if extra_in_docs: + print(f"=== IN DOCS BUT NOT IN .env.example ({len(extra_in_docs)}) ===") + for name in extra_in_docs: + print(f" {name} (doc default: {doc_vars[name]!r})") + print() + + # --- Check 3: Default value mismatches --- + common = sorted(set(env_vars.keys()) & set(doc_vars.keys())) + mismatches = [] + for name in common: + env_val = normalize(env_vars[name]) + doc_val = normalize(doc_vars[name]) + if env_val != doc_val: + mismatches.append((name, env_vars[name], doc_vars[name])) + + if mismatches: + print(f"=== DEFAULT MISMATCHES ({len(mismatches)}) ===") + for name, env_val, doc_val in mismatches: + print(f" {name}:") + print(f" .env.example: {env_val!r}") + print(f" documentation: {doc_val!r}") + print() + + # --- Summary --- + total_issues = len(missing_from_docs) + len(extra_in_docs) + len(mismatches) + if total_issues == 0: + print("ALL CHECKS PASSED — documentation matches .env.example") + else: + print(f"TOTAL ISSUES: {total_issues}") + print(f" Missing from docs: {len(missing_from_docs)}") + print(f" Extra in docs: {len(extra_in_docs)}") + print(f" Default mismatches: {len(mismatches)}") + + return 1 if total_issues > 0 else 0 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/.claude/skills/dify-docs-guides/SKILL.md b/.claude/skills/dify-docs-guides/SKILL.md new file mode 100644 index 000000000..3048354b9 --- /dev/null +++ b/.claude/skills/dify-docs-guides/SKILL.md @@ -0,0 +1,62 @@ +--- +name: dify-docs-guides +description: > + Use when writing, improving, or reviewing Dify user guide documentation. + Covers pages in en/use-dify/, en/develop-plugin/, and en/self-host/. + Triggers: "write docs for [feature]", "improve this page", + "review this documentation section". 
+--- + +# Dify Documentation Guides + +## Before Starting + +Read these files before beginning any documentation task: + +1. `writing-guides/style-guide.md` — voice, tone, writing patterns +2. `writing-guides/formatting-guide.md` — MDX formatting, Mintlify components +3. `writing-guides/glossary.md` — standardized terminology + +When optimizing Chinese or Japanese translations, also read: +- `tools/translate/formatting-zh.md` or `tools/translate/formatting-ja.md` + +## Reader Personas + +Adjust tone and assumed knowledge based on the document path: + +### en/use-dify/ +Product users building AI applications on Dify. Mix of developers and non-technical users. Assume basic AI familiarity but not infrastructure or deep coding knowledge. Explain technical concepts when they appear. Prioritize task completion and outcomes. + +### en/self-host/ +DevOps engineers and system administrators deploying Dify. Assume strong infrastructure knowledge (Docker, databases, networking, environment variables). Be precise with technical details. Don't over-explain standard operations. + +### en/develop-plugin/ +Developers building custom Dify plugins. Assume strong Python skills and familiarity with Dify's core concepts. Focus on API contracts, extension points, and code patterns. Code examples are essential. + +## Collaboration Model + +This is a team effort. The user brings documentation expertise and user empathy; Claude brings AI domain knowledge and broader technical perspective. Actively leverage this dynamic rather than passively executing writing tasks. + +**Explain the "why" behind AI concepts.** When an AI concept comes up, explain why it's designed this way and what problem it solves—not just what it does. For example, if asked why a tool role appears in conversation history, explain from the LLM API mechanism level. 
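To make that concrete, here is a minimal OpenAI-style sketch of how a tool role enters conversation history. The field names and values are illustrative assumptions about that API style, not Dify's actual payload:

```python
# Illustrative OpenAI-style chat history (hypothetical values, not Dify's
# actual payload). The model cannot execute tools itself: it emits a tool
# call, the application runs it, and the result comes back as a message
# with role "tool" so the next model turn can ground its answer in the
# real output rather than a guess.
conversation = [
    {"role": "user", "content": "What's the weather in Tokyo?"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": "call_1",  # hypothetical call ID
                "type": "function",
                "function": {"name": "get_weather", "arguments": '{"city": "Tokyo"}'},
            }
        ],
    },
    # The tool role records what the tool actually returned, separate from
    # what the user said or what the model claimed.
    {"role": "tool", "tool_call_id": "call_1", "content": '{"temp_c": 21}'},
    {"role": "assistant", "content": "It's about 21°C in Tokyo right now."},
]

roles = [m["role"] for m in conversation]
assert roles == ["user", "assistant", "tool", "assistant"]
```

This is the mechanism-level answer to "why does a tool role appear": the transcript needs a dedicated slot for tool output so the model can distinguish it from user and assistant turns.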
+
+**Help judge design decisions.** When the user questions a product design: assess whether it's common in the AI field, clarify if it's Dify-specific or industry standard, and offer perspective on how users might understand it.
+
+**Provide analogies and concrete scenarios.** When concepts are abstract, use specific scenarios rather than technical jargon. Help the user understand "why users need this feature" from a practical standpoint.
+
+**Analyze wording from the user's cognitive perspective.** When the user is unsure about phrasing, consider: Can users understand this term? Is it accurate in the AI context? Is there a term closer to the user's mental model?
+
+**Proactively flag issues.** If a design seems unusual, a concept may have been misunderstood, or a term is inaccurate in the AI domain—speak up directly rather than waiting to be asked.
+
+## Verifying Feature Behavior
+
+- For existing features: verify against the `main` branch of the Dify codebase. The user will provide the codebase path or it will be configured as an additional working directory.
+- For new features: the user may specify a development branch. Code may be in flux—when behavior is ambiguous, ask rather than assume.
+- Trust the codebase over existing documentation. Existing docs may be outdated or inaccurate.
+
+## Style Overrides
+
+No overrides. Follow `writing-guides/style-guide.md` as written.
+
+## Post-Writing Verification
+
+After completing the document, invoke `dify-docs-reader-test` to verify it from the reader's perspective.
diff --git a/.claude/skills/dify-docs-reader-test/SKILL.md b/.claude/skills/dify-docs-reader-test/SKILL.md
new file mode 100644
index 000000000..36862ed04
--- /dev/null
+++ b/.claude/skills/dify-docs-reader-test/SKILL.md
@@ -0,0 +1,44 @@
+---
+name: dify-docs-reader-test
+description: >
+  Post-writing verification from the reader's perspective. Invoke after
+  completing any documentation task to simulate a real user reading the
+  document for the first time.
+--- + +# Reader Experience Test + +## Purpose + +Verify documentation from the reader's perspective. A clean-context agent reads the finished document as the target persona, with no access to source material, codebase, or prior conversation. + +## How to Invoke + +Dispatch a subagent with exactly two inputs: +1. The document content (the page just written or updated) +2. The reader persona description (from the writing skill being used) + +Do NOT provide: source material, codebase access, prior conversation, or any context the reader wouldn't have. + +## What the Test Agent Checks + +- Can I accomplish the task described without prior knowledge? +- Are there steps that assume context not provided on this page? +- Are there terms used without explanation (and not in the glossary)? +- Is the information I need actually here, or do I have to guess? +- Do the code examples make sense on their own? +- After reading, do I know what to do next? + +## What the Test Agent Does NOT Do + +- Style or formatting review (that is the writer's job) +- Fact-check against the codebase (that is the writer's job) +- Rewrite anything—only report what was confusing + +## Output Format + +Structured feedback: +- **Got stuck at**: [section/step where understanding broke down] +- **Didn't understand**: [terms, concepts, or references that were unclear] +- **Missing context**: [assumptions the document makes that weren't established] +- **Verdict**: Clear / Minor gaps / Needs revision diff --git a/.claude/skills/dify-docs-release-sync/SKILL.md b/.claude/skills/dify-docs-release-sync/SKILL.md new file mode 100644 index 000000000..dc08f692a --- /dev/null +++ b/.claude/skills/dify-docs-release-sync/SKILL.md @@ -0,0 +1,260 @@ +--- +name: dify-docs-release-sync +description: > + Use when a Dify release milestone is ready and documentation needs + pre-release sync. User provides the milestone name manually. Covers + API reference, help documentation, and environment variable changes. 
+--- + +# Dify Release Documentation Sync + +## Overview + +Analyzes a GitHub milestone's merged PRs to identify documentation impact, generates a structured report, then executes updates after user approval. Three tracks: API reference (→ `dify-docs-api-reference`), help documentation (→ `dify-docs-guides`), and environment variables (→ `dify-docs-env-vars`). + +**Input**: Milestone name, provided by the user exactly as it appears on GitHub (e.g., `v1.14.0`). Never auto-detect — always ask if not provided. + +## Workflow + +```dot +digraph { + rankdir=TB; + "User provides milestone name" -> "Fetch merged PRs from GitHub"; + "Fetch merged PRs from GitHub" -> "Categorize PRs by doc impact"; + "Categorize PRs by doc impact" -> "Generate report"; + "Generate report" -> "Present report, STOP"; + "Present report, STOP" -> "User approves / adjusts" [style=dashed]; + "User approves / adjusts" -> "Execute API spec updates"; + "User approves / adjusts" -> "Execute help doc updates"; + "User approves / adjusts" -> "Execute env var updates"; + "Execute API spec updates" -> "Auto-translated on PR push"; + "Execute help doc updates" -> "Auto-translated on PR push"; + "Execute env var updates" -> "Auto-translated on PR push"; +} +``` + +## Phase 1: Analysis + +### 1.1 Fetch Milestone PRs + +```bash +# Get milestone number from name +MILESTONE_NUM=$(gh api repos/langgenius/dify/milestones --paginate \ + --jq '.[] | select(.title=="MILESTONE_NAME") | .number') + +# List closed issues/PRs for that milestone +gh api "repos/langgenius/dify/issues?milestone=$MILESTONE_NUM&state=closed&per_page=100" \ + --paginate --jq '.[] | select(.pull_request) | {number, title}' +``` + +For each PR, fetch changed files and description: +```bash +gh pr view PR_NUMBER --repo langgenius/dify --json number,title,body,labels,files +``` + +### 1.2 Categorize PRs + +For each PR, check changed files against these mappings. 
+ +**Skip** (no doc impact): PRs that only touch `tests/`, `.github/`, `dev/`, or are pure refactoring with no behavior change (confirm from PR description). + +#### API Reference Detection (Deterministic) + +Any file matching these patterns means the corresponding spec is affected: + +| Source path | Affected spec(s) | +|---|---| +| `controllers/service_api/app/chat.py` | chat, chatflow | +| `controllers/service_api/app/completion.py` | completion | +| `controllers/service_api/app/workflow.py` | workflow, chatflow | +| `controllers/service_api/app/audio.py`, `file.py`, `site.py`, `app.py` | all 4 app specs | +| `controllers/service_api/app/message.py` | chat, chatflow, completion | +| `controllers/service_api/app/conversation.py` | chat, chatflow | +| `controllers/service_api/app/annotation.py` | chat, chatflow | +| `controllers/service_api/dataset/` | knowledge | +| `controllers/service_api/app/error.py` | all 4 app specs | +| `controllers/service_api/dataset/error.py` | knowledge | +| `controllers/service_api/wraps.py` | all 5 specs | +| `controllers/service_api/__init__.py` | all 5 specs (route changes) | +| `libs/external_api.py` | all 5 specs | + +Also check: Pydantic models and `fields/` serializers used by Service API controllers. If a PR modifies a model or serializer referenced by a Service API endpoint, that spec is affected. + +#### Help Documentation Detection (Heuristic) + +Read the PR description for context. 
Map changed source paths to likely doc areas: + +| Source path pattern | Likely doc area | +|---|---| +| `api/core/workflow/nodes/` | `en/use-dify/workflow/nodes/` | +| `api/core/rag/` | `en/use-dify/knowledge/` | +| `api/core/model_runtime/` | `en/use-dify/model-providers/` | +| `api/core/tools/` | `en/use-dify/tools/` or workflow tool node docs | +| `api/core/agent/` | `en/use-dify/build-apps/agent.mdx` | +| `api/core/app/` | `en/use-dify/build-apps/` | +| `web/app/components/` | UI-related docs (check PR description for specifics) | +| `docker/`, deployment configs | `en/getting-started/install/` | +| `api/configs/` | Configuration/environment variable docs | + +**Important**: These mappings are heuristic. For every candidate match: + +1. **Read the PR title and description** to confirm the change is user-facing (not purely internal). +2. **Read the existing doc page** to check whether the current documentation covers the affected area at a level of detail that warrants an update. If the doc doesn't cover the topic (e.g., a node doc that mentions model selection but never discusses model parameters), a PR that changes model parameter behavior may not require a doc update. +3. 
**Assess priority**: + - **High**: PR changes behavior that the doc explicitly describes → doc is now inaccurate + - **Medium**: PR adds a new capability in an area the doc covers at a general level → doc could be enhanced + - **Low / Skip**: PR changes something the doc doesn't cover at all → no update needed unless the feature is significant enough to warrant a new section + +**Also watch for**: +- "Breaking change" labels → high priority +- New feature PRs → may need new doc pages +- Deprecation notices → update existing docs +- Behavior changes → verify current docs are still accurate + +#### Environment Variable Detection (Deterministic) + +Any file matching these patterns means env var documentation is affected: + +| Source path | Impact | +|---|---| +| `docker/.env.example` | New vars, changed defaults, removed vars | +| `api/configs/**/*.py` | Pydantic config models define backend vars | +| `web/docker/entrypoint.sh` | Frontend Docker-to-NEXT_PUBLIC mapping | +| `docker-compose.yaml` | Infrastructure/container vars | + +When detected, the report should list which variables were added, removed, or had defaults changed, which config file(s) were modified, and priority (High if new/removed vars, Medium if default changes only). + +#### UI i18n Change Detection + +Check PRs that touch `web/i18n/en-US/` files: +1. Compare changed i18n keys against the UI Labels section of `writing-guides/glossary.md` +2. If a changed key exists in the glossary → flag for glossary update (value may have changed) +3. If a changed key is new and falls within terminology scope (feature names, field labels, menu names, button names, status labels) → flag as candidate for glossary addition +4. Report as a separate section in Phase 2 with: key, old value, new value, glossary status + +i18n source files: `web/i18n/{en-US,zh-Hans,ja-JP}/` (~30 JSON files each, ~4,875 keys total). 
Focus on: `common.json`, `app.json`, `workflow.json`, `dataset.json`, `dataset-creation.json`, `dataset-documents.json`—these contain the most documentation-relevant UI labels. + +## Phase 2: Report + +Generate the report and **STOP**. Do not execute until the user reviews and approves. + +### Report Template + +```markdown +# Pre-Release Doc Sync Report: [milestone] + +## Summary +- **PRs analyzed**: X merged PRs in milestone +- **API reference impact**: Y PRs → Z spec files +- **Help documentation impact**: W PRs → V doc pages +- **Environment variable impact**: E PRs → F variables +- **UI i18n changes**: G PRs → H glossary entries affected +- **No doc impact**: N PRs + +## API Reference Changes + +### openapi_chat.json / openapi_chatflow.json + +| PR | Title | Change Type | Details | +|---|---|---|---| +| #1234 | Add streaming retry | New parameter | `retry_count` on `/chat-messages` | +| #1235 | Fix error handling | Error codes | New `rate_limit_exceeded` on `/chat-messages` | + +### openapi_knowledge.json +| PR | Title | Change Type | Details | +|---|---|---|---| +| #1240 | Add metadata filter | New parameter | `metadata_filter` on list segments | + +## Help Documentation Changes + +| PR | Title | Affected Doc(s) | Priority | Change Needed | +|---|---|---|---|---| +| #1250 | Add semantic chunking | `knowledge/chunking.mdx` | High | Doc describes chunking strategies — new option must be added | +| #1251 | New HTTP node timeout | `workflow/nodes/http.mdx` | Low | Doc doesn't cover timeout config at this level of detail | + +## Environment Variable Changes + +| PR | Title | Variables | Change Type | Priority | +|---|---|---|---|---| +| #1270 | Add Redis sentinel | `REDIS_SENTINEL_*` (3 new) | New variables | High | +| #1271 | Change default log level | `LOG_LEVEL` default INFO→WARNING | Default change | Medium | + +## UI i18n Changes (Glossary Impact) + +| PR | Key | Old Value | New Value | Glossary Status | +|---|---|---|---|---| +| #1280 | `dataset.indexMethod` 
| Index Method | Indexing Method | Exists — update needed | +| #1281 | `workflow.nodeGroup` | (new) | Node Group | Candidate for addition | + +## No Documentation Impact + +| PR | Title | Reason | +|---|---|---| +| #1260 | Refactor internal cache | Internal only | +| #1261 | Update CI pipeline | Infrastructure | +``` + +## Phase 3: Execution + +After user approval (they may add, remove, or adjust items): + +### Codebase Preparation + +Checkout the release tag/branch in the Dify codebase (configured as an additional working directory) before auditing: +```bash +git fetch --tags +git checkout v1.14.0 # or the release branch +``` + +### API Reference Updates + +For each affected spec, dispatch a parallel audit agent with `dify-docs-api-reference` skill: +1. Audit the spec against the code, focusing on changes from the report (but audit fully — PRs may have side effects) +2. Fix the EN spec +3. Validate all modified JSON files + +**Cross-spec propagation**: Shared endpoints (file upload, audio, feedback, app info) appear in all 4 app specs. When fixing one, propagate to siblings. + +Translation of API specs is handled automatically by the workflow when changes are pushed — no manual translation step needed. + +### Help Documentation Updates + +For each affected doc page, use `dify-docs-guides` skill: +1. Read the current doc and the relevant PR(s) for context +2. Update content to reflect changes +3. Translation is handled automatically by the Dify workflow on PR push — no manual translation needed + +### Environment Variable Updates + +For each affected variable group, use `dify-docs-env-vars` skill: +1. Trace the variable in the release codebase +2. Update `en/self-host/configuration/environments.mdx` +3. Run the verification script to confirm zero mismatches +4. 
Translation is handled automatically by the Dify workflow on PR push + +### Parallel Execution + +- API spec audits: one agent per spec (parallel) +- Help doc updates: one agent per doc page (parallel) +- Env var updates: sequential (single target file) +- API, help doc, and env var tracks: can run in parallel + +## Key Paths + +| What | Path | +|---|---| +| Dify codebase | Configured as an additional working directory | +| OpenAPI specs | `dify-docs/{en,zh,ja}/api-reference/openapi_*.json` | +| GitHub repo | `langgenius/dify` | + +## Common Mistakes + +| Mistake | Fix | +|---|---| +| Auto-detecting milestone name | Always ask the user — naming is non-standard | +| Executing before report approval | STOP after report — user must review | +| Missing shared endpoint propagation | Fix in one spec → check all 4 app specs | +| Ignoring PR description | File paths are heuristic for non-API — description has the real context | +| Skipping Pydantic model changes | A model change may affect multiple endpoints — trace which controllers use it | +| Forgetting to checkout release tag | Audit against the release code, not main HEAD | +| Manually translating after EN fixes | Translation is automatic on PR push — never run manual translation scripts | diff --git a/.githooks/pre-commit b/.githooks/pre-commit new file mode 100755 index 000000000..51d86f8de --- /dev/null +++ b/.githooks/pre-commit @@ -0,0 +1,11 @@ +#!/bin/sh +set -e + +# Auto-regenerate termbase_i18n.md when glossary.md is staged. + +if git diff --cached --name-only | grep -q "^writing-guides/glossary.md$"; then + echo "glossary.md changed — regenerating termbase_i18n.md..." + python3 tools/translate/derive-termbase.py + git add tools/translate/termbase_i18n.md + echo "termbase_i18n.md regenerated and staged." 
+fi diff --git a/.gitignore b/.gitignore index 40d0f71ea..e5efc6ae9 100644 --- a/.gitignore +++ b/.gitignore @@ -2,7 +2,5 @@ .DS_Store .venv __pycache__/ -CLAUDE.md -AGENTS.md .claude/CLAUDE.local.md .claude/settings.local.json diff --git a/.mintignore b/.mintignore new file mode 100644 index 000000000..aa610350e --- /dev/null +++ b/.mintignore @@ -0,0 +1,2 @@ +writing-guides/ +tools/translate/config.json diff --git a/AGENTS.md b/AGENTS.md index a6c97c37b..07c4908aa 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1,50 +1,44 @@ -# Mintlify documentation - -## Working relationship -- You can push back on ideas-this can lead to better documentation. Cite sources and explain your reasoning when you do so -- ALWAYS ask for clarification rather than making assumptions -- NEVER lie, guess, or make up anything - -## Project context -- Format: MDX files with YAML frontmatter -- Config: docs.json for navigation, theme, settings -- Components: Mintlify components - -## Content strategy -- Document just enough for user success - not too much, not too little -- Prioritize accuracy and usability -- Make content evergreen when possible -- Search for existing content before adding anything new. 
Avoid duplication unless it is done for a strategic reason -- Check existing patterns for consistency -- Start by making the smallest reasonable changes - -## docs.json - -- Refer to the [docs.json schema](https://mintlify.com/docs.json) when building the docs.json file and site navigation - -## Frontmatter requirements for pages -- title: Clear, descriptive page title -- description: Concise summary for SEO/navigation - -## Writing standards -- Second-person voice ("you") -- Prerequisites at start of procedural content -- Test all code examples before publishing -- Match style and formatting of existing pages -- Include both basic and advanced use cases -- Language tags on all code blocks -- Alt text on all images -- Relative paths for internal links - -## Git workflow -- NEVER use --no-verify when committing -- Ask how to handle uncommitted changes before starting -- Create a new branch when no clear branch exists for changes -- Commit frequently throughout development -- NEVER skip or disable pre-commit hooks - -## Do not -- Skip frontmatter on any MDX file -- Use absolute URLs for internal links -- Include untested code examples -- Make assumptions - always ask for clarification \ No newline at end of file +# Dify Documentation — AI Agent Instructions + +For documentation tasks, read these guides before starting: + +1. `writing-guides/style-guide.md` — Voice, tone, writing patterns +2. `writing-guides/formatting-guide.md` — MDX formatting, Mintlify components +3. `writing-guides/glossary.md` — Standardized terminology + +For task-specific guidance, see `writing-guides/index.md`. + +## Key Rules + +- Write in English only, except when specifically optimizing Chinese + or Japanese translations. +- Only edit the English section in `docs.json`. Translation sections sync + automatically. +- MDX files require `title` and `description` in YAML frontmatter. +- Never use `--no-verify` when committing. 
+ +## Repository Structure + +en/, zh/, ja/ Documentation content (en is source) +writing-guides/ Style guide, formatting guide, glossary +tools/translate/ Translation pipeline and language-specific formatting +.claude/skills/ Documentation writing skills (auto-discovered) +docs.json Navigation structure + +## Development + +mintlify dev Local preview at localhost:3000 + +## Commit and PR Title Conventions + +{type}: {description} — lowercase, imperative, no trailing period, under 72 chars. + +| Type | When | Example | +|:-----|:-----|:--------| +| `docs` | New or updated content | `docs: add workflow node configuration guide` | +| `fix` | Typos, broken links, incorrect info | `fix: correct broken link in knowledge base page` | +| `feat` | Tooling or structural changes | `feat: add search index to knowledge section` | +| `refactor` | Reorganization without content changes | `refactor: restructure knowledge base section` | +| `translate` | Translation additions or updates | `translate: update Japanese workflow pages` | +| `style` | Formatting-only changes | `style: fix heading levels in plugin guide` | +| `chore` | Dependencies, config | `chore: bump mintlify to 4.0.710` | \ No newline at end of file diff --git a/CLAUDE.md b/CLAUDE.md index d1c5f0d41..85d750d35 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -1,187 +1,48 @@ -# CLAUDE.md +# Dify Documentation Repository -This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. +Documentation for Dify, built with Mintlify. English is the source language; +Chinese and Japanese translations are generated automatically. -## Project Overview +## Before Any Documentation Task -Dify documentation repository built with Mintlify, supporting multi-language documentation (English, Chinese, Japanese) with AI-powered automatic translation via Dify workflows. +Read `writing-guides/index.md` to identify the correct skill and shared +references for your task. 
-## Documentation Structure +## Key Rules -- **Format**: MDX files with YAML frontmatter (`title` and `description` required) -- **Languages**: `en/`, `cn/`, `jp/` (source: en) -- **Configuration**: `docs.json` (navigation structure) - [Mintlify schema](https://mintlify.com/docs.json) +- Write in English only, except when specifically optimizing Chinese or + Japanese translations. +- Only edit the English section in `docs.json`. Translation sections sync + automatically. +- MDX files require `title` and `description` in YAML frontmatter. +- When writing about a feature, verify behavior against the Dify codebase, + not just existing docs. Existing docs may be outdated. +- For new features, the user may specify a development branch. Code on + development branches may be in flux—when behavior is ambiguous, ask + rather than assume. -## Translation System +## Repository Structure -### Language Configuration +en/, zh/, ja/ Documentation content (en is source) +writing-guides/ Style guide, formatting guide, glossary +tools/translate/ Translation pipeline and language-specific formatting +.claude/skills/ Documentation writing skills (auto-discovered) +docs.json Navigation structure -All language settings in `tools/translate/config.json` (single source of truth): +## Development -```json -{ - "source_language": "en", - "target_languages": ["cn", "jp"], - "languages": { - "en": {"code": "en", "name": "English", "directory": "en"}, - "cn": { - "code": "cn", - "name": "Chinese", - "directory": "cn", - "translation_notice": "⚠️ AI translation..." - } - } -} -``` +mintlify dev Local preview at localhost:3000 -**Adding new language**: Edit config.json only - add to `target_languages` and `languages` object with required fields. +## Commit and PR Title Conventions -### Workflow +{type}: {description} — lowercase, imperative, no trailing period, under 72 chars. 
-- **Trigger**: Push to non-main branches with `.md/.mdx` changes in `en/` -- **Process**: Dify API streaming mode with terminology database (`termbase_i18n.md`) -- **Timing**: New files ~30-60s/lang | Modified files ~2-3min/lang (context-aware with git diff) -- **Auto-operations**: Translation notices, incremental docs.json sync - -### Surgical Reconciliation (Move & Rename) - -**Purpose**: Detect and apply structural changes (moves, renames) from English section to cn/jp automatically. - -**How it works**: -1. Compares English section between base commit and HEAD -2. Detects **moves** (same file, different `group_path`) and **renames** (deleted+added in same location) -3. Applies identical operations to cn/jp using **index-based navigation** - -**Index-based navigation**: -- Groups matched by position index, not name (works across translations: "Nodes" ≠ "节点") -- Location tracked as `group_indices: [0, 1]` (parent group index 0, child index 1) -- Navigates nested structures regardless of translated group names - -**Rename specifics**: -- Detects file extension (.md, .mdx) from physical file -- Preserves extension when renaming cn/jp files -- Updates docs.json entries (stored without extensions) - -### Navigation Sync Behavior - -**Manual editing**: Only edit English (`en`) section in docs.json - workflow syncs to cn/jp automatically. - -**Auto-sync operations**: -- **Added files**: Fresh translation, inserted at same index position as English -- **Modified files**: Context-aware update using existing translation + git diff -- **Deleted files**: Removed from all language sections + physical files -- **Moved files**: Detected via `group_path` changes, cn/jp relocated using index-based navigation -- **Renamed files**: Detected when deleted+added in same location, physical files renamed with extension preserved - -### First-Time Contributor Approval Flow - -For PRs from forks by contributors who are not OWNER/MEMBER/COLLABORATOR: - -1. 
**PR opened** → Analyze workflow runs → Execute workflow checks for approval -2. **No approval found** → Execute skips translation, posts "pending approval" comment -3. **Maintainer approves PR** → `sync_docs_on_approval.yml` triggers automatically -4. **"Approval received" comment posted** → Analyze workflow re-runs with fresh artifacts -5. **Execute workflow runs** → Finds approval → Creates translation PR - -**Approval requirements**: -- Reviewer must have OWNER, MEMBER, or COLLABORATOR association -- Approval triggers immediate translation (no additional push needed) -- Each approval posts a new comment preserving the timeline - -**Edge cases**: -- If translation PR already exists when approval happens → info comment posted, no re-run -- If Analyze run is too old to re-run → error comment with instructions to push a small commit -- Internal PRs (same repo, not fork) → no approval gate, auto-translates immediately - -**Manual trigger**: If approval flow fails, maintainers can manually trigger via Actions → Execute Documentation Sync → Run workflow (enter PR number) - -## Development Commands - -```bash -# Local preview -npm i -g mintlify -mintlify dev - -# Local translation testing -pip install -r tools/translate/requirements.txt -echo "DIFY_API_KEY=your_key" > tools/translate/.env -python tools/translate/main.py -``` - -**Configuration**: -- Terminology: `tools/translate/termbase_i18n.md` -- Languages: `tools/translate/config.json` -- Model: Configure in Dify Studio - -**Git Rules**: -- NEVER use `--no-verify` or skip hooks -- Create new branch for each feature/fix -- Commit frequently with descriptive messages - -## Testing & Debugging - -### Test Translation Workflow - -Create test PR with branch name `test/{operation}-{scope}`: - -- **Add**: New file + docs.json entry -- **Delete**: Remove file + docs.json entry -- **Update**: Modify existing file content -- **Move**: Move file between groups in docs.json (e.g., Getting Started → Nodes) -- **Rename**: Rename 
file + update docs.json entry (tests extension preservation) - -### Common Issues - -**Translation failures**: -- **HTTP 504**: Verify `response_mode: "streaming"` in `main.py` (NOT `"blocking"`) -- **Missing output**: Check Dify workflow has output variable `output1` -- **Failed workflow**: Review Dify workflow logs for node errors - -**Move/Rename issues**: -- **Not detected**: Check logs for "INFO: Detected X moves, Y renames" - if 0 when expecting changes, verify `group_path` actually changed between commits -- **Wrong location**: Structure mismatch between languages - verify group indices align (same nested structure) -- **File not found**: Extension detection failed - ensure file has .md or .mdx extension - -**Success log pattern**: -``` -INFO: Detected 1 moves, 0 renames, 0 adds, 0 deletes -INFO: Moving en/test-file from 'Dropdown > GroupA' to 'Dropdown > GroupB' -SUCCESS: Moved cn/test-file to new location -SUCCESS: Moved jp/test-file to new location -``` - -**Approval flow issues** (first-time contributors): -- **Translation not starting after approval**: Check Actions tab for `Retrigger Sync on Approval` workflow status -- **"Could not automatically start translation"**: Analyze run too old - push a small commit to trigger fresh workflow -- **Approval from non-maintainer**: Only OWNER/MEMBER/COLLABORATOR approvals unlock the gate -- **Multiple "pending approval" comments**: Normal - each commit triggers Execute which posts if no approval found - -## Translation A/B Testing - -For comparing translation quality between models or prompt variations: - -```bash -cd tools/translate-test-dify -./setup.sh -source venv/bin/activate -python run_test.py -python compare.py results// -``` - -**Important**: -- Never commit `results/`, `mock_docs/`, or real API keys -- Always redact keys with `app-***` before committing -- See `tools/translate-test-dify/README.md` for details - -## Key Paths - -- `docs.json` - Navigation structure -- `tools/translate/config.json` - 
Language configuration (single source of truth) -- `tools/translate/termbase_i18n.md` - Translation terminology database -- `tools/translate/sync_and_translate.py` - Core translation + surgical reconciliation logic -- `tools/translate-test-dify/` - Translation A/B testing framework -- `.github/workflows/sync_docs_analyze.yml` - Analyzes PR changes, uploads artifacts -- `.github/workflows/sync_docs_execute.yml` - Creates translation PRs (triggered by Analyze) -- `.github/workflows/sync_docs_update.yml` - Updates existing translation PRs -- `.github/workflows/sync_docs_cleanup.yml` - Cleans up sync PRs when original PR closes -- `.github/workflows/sync_docs_on_approval.yml` - Retriggers translation on maintainer approval +| Type | When | Example | +|:-----|:-----|:--------| +| `docs` | New or updated content | `docs: add workflow node configuration guide` | +| `fix` | Typos, broken links, incorrect info | `fix: correct broken link in knowledge base page` | +| `feat` | Tooling or structural changes | `feat: add search index to knowledge section` | +| `refactor` | Reorganization without content changes | `refactor: restructure knowledge base section` | +| `translate` | Translation additions or updates | `translate: update Japanese workflow pages` | +| `style` | Formatting-only changes | `style: fix heading levels in plugin guide` | +| `chore` | Dependencies, config | `chore: bump mintlify to 4.0.710` | diff --git a/README.md b/README.md new file mode 100644 index 000000000..f8fc235f8 --- /dev/null +++ b/README.md @@ -0,0 +1,103 @@ +# Dify Documentation + +Official documentation for [Dify](https://dify.ai), available in English, Chinese, and Japanese. + +--- + +## Contributing + +We welcome contributions! All content should be submitted in **English only** — Chinese and Japanese translations are generated automatically by our translation pipeline. + +### Quick Start + +1. Fork and clone the repository. +2. 
Create a branch, make your changes, and open a pull request against `main`. +3. Your PR will be reviewed by a maintainer. Once approved, translations are generated automatically. + +### Repository Structure + +``` +dify-docs/ +├── en/ # English documentation (source language) +├── zh/ # Chinese translations (auto-generated) +├── ja/ # Japanese translations (auto-generated) +├── writing-guides/ # Style guide, formatting guide, glossary +├── .claude/skills/ # Claude Code documentation skills +├── tools/translate/ # Translation pipeline +├── docs.json # Navigation structure +``` + +- All content changes should be made in the `en/` directory. Do not edit `zh/` or `ja/` directly except when specifically optimizing Chinese or Japanese translations. +- If you add or move a page, update the English section in `docs.json` — translations sync automatically. + +### File Format + +Documentation files use MDX with required YAML frontmatter: + +```mdx +--- +title: Your Page Title +description: A brief description of the page content. +--- + +Page content here... +``` + +### Commit and PR Conventions + +Commits and PR titles follow the same format: `{type}: {description}` + +- Lowercase, imperative mood ("add", not "added" or "adds"). +- No trailing period. +- Under 72 characters. 
+ +| Type | When | Example | +|:-----|:-----|:--------| +| `docs` | New or updated content | `docs: add workflow node configuration guide` | +| `fix` | Typos, broken links, incorrect info | `fix: correct broken link in knowledge base page` | +| `feat` | Tooling or structural changes | `feat: add search index to knowledge section` | +| `refactor` | Reorganization without content changes | `refactor: restructure knowledge base section` | +| `translate` | Translation additions or updates | `translate: update Japanese workflow pages` | +| `style` | Formatting-only changes | `style: fix heading levels in plugin guide` | +| `chore` | Dependencies, config | `chore: bump mintlify to 4.0.710` | + +For non-obvious changes, add a body after a blank line explaining why: + +``` +fix: switch API response mode to streaming + +Blocking mode was causing HTTP 504 timeouts on large pages. +``` + +### Local Preview + +```bash +npm i -g mintlify +mintlify dev +``` + +This starts a local development server at `http://localhost:3000`. + +### Setup + +To enable the pre-commit hook (auto-regenerates the terminology database when you commit glossary changes): + +```bash +git config core.hooksPath .githooks +``` + +### Formatting Standards + +We maintain a formatting guide in [`writing-guides/formatting-guide.md`](writing-guides/formatting-guide.md). If you use an AI-powered editor or assistant (Cursor, Claude Code, Copilot, etc.), you can point it to that file to check your work before submitting. + +### AI-Assisted Contributing + +This repository includes Claude Code skills in `.claude/skills/` that provide writing assistance for different documentation types. If you use Claude Code, these skills are available automatically after cloning. + +### Guidelines + +- **One topic per PR.** Don't combine unrelated changes. +- **English only.** Translations are handled automatically. +- **Update navigation.** If you add a new page, add it to the English section of `docs.json`. 
+- **Test locally.** Run `mintlify dev` to verify your changes render correctly before opening a PR. +- **No secrets.** Never commit API keys, credentials, or `.env` files. diff --git a/tools/translate/derive-termbase.py b/tools/translate/derive-termbase.py new file mode 100755 index 000000000..e20bcefc8 --- /dev/null +++ b/tools/translate/derive-termbase.py @@ -0,0 +1,162 @@ +#!/usr/bin/env python3 +"""Derive termbase_i18n.md from writing-guides/glossary.md. + +Reads the rich glossary (with Notes and i18n Key columns), +strips them, and writes the lean termbase for the translation pipeline. +""" + +import argparse +import re +import sys +from pathlib import Path + +REPO_ROOT = Path(__file__).resolve().parent.parent.parent +GLOSSARY_PATH = REPO_ROOT / "writing-guides" / "glossary.md" +TERMBASE_PATH = Path(__file__).resolve().parent / "termbase_i18n.md" + +STATIC_FOOTER = """ +## General Guidelines + +Technical accuracy, English identifiers preserved, markdown formatting +maintained, professional tone. +""" + +# H2 sections to skip entirely (non-term content) +SKIP_SECTIONS: list[str] = [] + + +def parse_glossary(glossary_text: str) -> list[tuple[str, list[tuple[str, list[list[str]]]]]]: + """Parse glossary into top-level sections of (h2_heading, [(h3_heading, rows)]). + + Each row is [English, Chinese, Japanese] with Notes/i18n Key stripped. 
+ """ + top_sections = [] + current_h2 = None + current_h3 = None + current_rows = [] + h3_groups = [] + header_seen = False + + def flush_h3(): + nonlocal current_h3, current_rows + if current_h3 and current_rows: + h3_groups.append((current_h3, current_rows)) + current_h3 = None + current_rows = [] + + def flush_h2(): + nonlocal current_h2, h3_groups + flush_h3() + if current_h2 and current_h2 not in SKIP_SECTIONS and h3_groups: + top_sections.append((current_h2, list(h3_groups))) + current_h2 = None + h3_groups = [] + + for line in glossary_text.splitlines(): + # Detect H2 headings + h2_match = re.match(r"^## (.+)", line) + if h2_match and not line.startswith("### "): + flush_h2() + current_h2 = h2_match.group(1).strip() + header_seen = False + continue + + # Detect H3 headings + h3_match = re.match(r"^### (.+)", line) + if h3_match: + flush_h3() + current_h3 = h3_match.group(1).strip() + header_seen = False + continue + + # Skip sections we don't want + if current_h2 and current_h2 in SKIP_SECTIONS: + continue + + # Detect table rows + if line.startswith("|"): + if not header_seen: + if re.match(r"\|[\s:]-", line): + header_seen = True + continue + if re.match(r"\|[\s:]-", line): + continue + + cells = [c.strip() for c in line.split("|")[1:-1]] + + if len(cells) >= 3: + english = cells[0] + chinese = cells[1] + japanese = cells[2] + + # Strip (UI) suffix from column values + english = re.sub(r"\s*\(UI\)\s*$", "", english) + + current_rows.append([english, chinese, japanese]) + + flush_h2() + return top_sections + + +def generate_termbase(top_sections: list[tuple[str, list[tuple[str, list[list[str]]]]]]) -> str: + """Generate lean termbase markdown from parsed sections.""" + lines = ["# Terminology Database", ""] + + for h2_heading, h3_groups in top_sections: + lines.append(f"## {h2_heading}") + lines.append("") + for h3_heading, rows in h3_groups: + lines.append(f"### {h3_heading}") + lines.append("") + lines.append("| English | Chinese | Japanese |") + 
lines.append("|:--------|:--------|:---------|") + for english, chinese, japanese in rows: + lines.append(f"| {english} | {chinese} | {japanese} |") + lines.append("") + + lines.append(STATIC_FOOTER.strip()) + lines.append("") + + return "\n".join(lines) + + +def main(): + parser = argparse.ArgumentParser( + description="Derive termbase_i18n.md from glossary.md" + ) + parser.add_argument( + "--check", + action="store_true", + help="Check if termbase is in sync (exit 1 if not). Does not write.", + ) + args = parser.parse_args() + + if not GLOSSARY_PATH.exists(): + print(f"Error: glossary not found at {GLOSSARY_PATH}", file=sys.stderr) + sys.exit(1) + + glossary_text = GLOSSARY_PATH.read_text(encoding="utf-8") + top_sections = parse_glossary(glossary_text) + generated = generate_termbase(top_sections) + + if args.check: + if not TERMBASE_PATH.exists(): + print(f"Error: termbase not found at {TERMBASE_PATH}", file=sys.stderr) + sys.exit(1) + current = TERMBASE_PATH.read_text(encoding="utf-8") + if current != generated: + print( + "termbase_i18n.md is out of sync with glossary.md.\n" + "Run `python3 tools/translate/derive-termbase.py` and commit the result.", + file=sys.stderr, + ) + sys.exit(1) + print("termbase_i18n.md is in sync with glossary.md.") + sys.exit(0) + + TERMBASE_PATH.write_text(generated, encoding="utf-8") + print(f"Generated {TERMBASE_PATH}") + + +if __name__ == "__main__": + main() diff --git a/tools/translate/formatting-ja.md b/tools/translate/formatting-ja.md new file mode 100644 index 000000000..b6f19074b --- /dev/null +++ b/tools/translate/formatting-ja.md @@ -0,0 +1,49 @@ +# Japanese Documentation Formatting Guide + +Formatting rules specific to Japanese (ja) translations. These supplement the general formatting guide at `writing-guides/formatting-guide.md`. + +--- + +## CJK-Latin Spacing + +Insert a space between Japanese characters and adjacent Latin letters, numbers, or backticked code—same principle as Chinese. 
+ +| Correct | Incorrect | +|:--------|:----------| +| Docker を使用 | Dockerを使用 | +| 最大 15 MB | 最大15MB | +| `Temperature` パラメータ | `Temperature`パラメータ | + +## Punctuation + +Use full-width punctuation in Japanese text: + +| Type | Full-width | Half-width (do not use) | +|:-----|:-----------|:------------------------| +| Comma | 、 | , | +| Period | 。 | . | +| Middle dot | ・ | · | +| Parentheses | () | () | + +**Exception:** Use half-width punctuation inside code, URLs, and backticked text. + +## Katakana Conventions + +Use katakana for foreign loanwords and technical terms that have established katakana equivalents. Refer to the glossary for standard translations. + +When a loanword ending in "-er", "-or", "-ar" is written in katakana, follow the Microsoft Language Portal convention: include the trailing long vowel mark (ー) for words of 3 morae or fewer, omit it for longer words. + +| English | Katakana | +|:--------|:---------| +| server | サーバー | +| parameter | パラメータ | +| provider | プロバイダー | + +## Emphasis + +Do not use italic emphasis (`*text*`) in Japanese text. CJK italic rendering is poor in most fonts. Use bold (`**text**`) instead when emphasis is needed. + +## Numbers + +- Use Arabic numerals, not kanji numerals, for technical content. +- Use half-width numbers even in Japanese text. diff --git a/tools/translate/formatting-zh.md b/tools/translate/formatting-zh.md new file mode 100644 index 000000000..2a52ae9f8 --- /dev/null +++ b/tools/translate/formatting-zh.md @@ -0,0 +1,49 @@ +# Chinese Documentation Formatting Guide + +Formatting rules specific to Chinese (zh) translations. These supplement the general formatting guide at `writing-guides/formatting-guide.md`. + +--- + +## CJK-Latin Spacing + +Always insert a space between Chinese characters and adjacent Latin letters, numbers, or backticked code. 
+ +| Correct | Incorrect | +|:--------|:----------| +| 使用 Docker 部署 | 使用Docker部署 | +| 最大文件大小为 15 MB | 最大文件大小为15MB | +| 设置 `Temperature` 参数 | 设置`Temperature`参数 | +| 支持 3 种模型 | 支持3种模型 | + +## Punctuation + +Use full-width punctuation in Chinese text: + +| Type | Full-width | Half-width (do not use) | +|:-----|:-----------|:------------------------| +| Comma | , | , | +| Period | 。 | . | +| Colon | : | : | +| Semicolon | ; | ; | +| Question mark | ? | ? | +| Exclamation | ! | ! | +| Parentheses | () | () | + +**Exception:** Use half-width punctuation inside code, URLs, and backticked text. + +## Quotation Marks + +Use corner bracket quotation marks: + +- Single: 「 」 +- Double (nested): 『 』 + +## Emphasis + +Do not use italic emphasis (`*text*`) in Chinese text. CJK italic rendering is poor in most fonts. Use bold (`**text**`) instead when emphasis is needed. + +## Numbers + +- Use Arabic numerals (1, 2, 3), not Chinese numerals (一、二、三), for + technical content. +- Use half-width numbers even in Chinese text. diff --git a/tools/translate/termbase_i18n.md b/tools/translate/termbase_i18n.md index 800a71b8b..ea74e60c0 100644 --- a/tools/translate/termbase_i18n.md +++ b/tools/translate/termbase_i18n.md @@ -1,23 +1,355 @@ -# Translation Termbase for i18n +# Terminology Database -This termbase provides consistent terminology for translating Dify documentation. 
+## General Terms -## Technical Terms +### Core Concepts -- **Workflow** → 工作流 (CN) / ワークフロー (JP) -- **Agent** → 智能体 (CN) / エージェント (JP) -- **Knowledge Base** → 知识库 (CN) / ナレッジベース (JP) -- **Model** → 模型 (CN) / モデル (JP) -- **Node** → 节点 (CN) / ノード (JP) -- **Variable** → 变量 (CN) / 変数 (JP) -- **Parameter** → 参数 (CN) / パラメータ (JP) -- **API** → API (CN) / API (JP) -- **Token** → 令牌 (CN) / トークン (JP) -- **Prompt** → 提示词 (CN) / プロンプト (JP) +| English | Chinese | Japanese | +|:--------|:--------|:---------| +| Workflow | 工作流 | ワークフロー | +| Chatflow | 对话流 | チャットフロー | +| Agent | Agent | Agent | +| Text Generator | 文本生成应用 | テキストジェネレーター | +| Agent app | Agent 应用 | エージェントアプリ | +| knowledge base | 知识库 | ナレッジベース | +| plugin | 插件 | プラグイン | +| Dify tool | Dify 工具 | ツール | +| workspace | 工作区 | ワークスペース | +| template | 模板 | テンプレート | +| WebApp | WebApp | WebApp | + +### Models + +| English | Chinese | Japanese | +|:--------|:--------|:---------| +| model | 模型 | モデル | +| model provider | 模型供应商 | モデルプロバイダー | +| LLM (large language model) | 大语言模型 | 大規模言語モデル | +| chat model | 对话模型 | チャットモデル | +| completion model | 文本续写模型 | 補完モデル | +| embedding model | 嵌入模型 | 埋め込みモデル | +| text embedding model | 文本嵌入模型 | テキスト埋め込みモデル | +| multimodal embedding model | 多模态嵌入模型 | マルチモーダル埋め込みモデル | +| rerank model | 重排序模型 | リランクモデル | +| reasoning model | 推理模型 | 推論モデル | +| moderation model | 内容审核模型 | モデレーションモデル | +| TTS model | 文字转语音模型 | TTSモデル | +| Speech2Text model | 语音转文字模型 | 音声認識モデル | +| token | token | token | +| API token | API 令牌 | APIトークン | +| prompt | 提示词 | プロンプト | +| system instruction | 系统指令 | システムインストラクション | +| user message | User 消息 | ユーザーメッセージ | +| assistant message | Assistant 消息 | アシスタントメッセージ | +| model tag | 模型标签 | モデルタグ | + +### Nodes + +| English | Chinese | Japanese | +|:--------|:--------|:---------| +| node | 节点 | ノード | +| User Input | 用户输入 | ユーザー入力 | +| Output | 输出 | 出力 | +| Answer | 直接回复 | 回答 | +| LLM | LLM | LLM | +| Knowledge Retrieval | 知识检索 | ナレッジ検索 | +| Question Classifier | 问题分类器 | 質問分類器 | 
+| IF/ELSE | 条件分支 | IF/ELSE | +| Code | 代码执行 | コード実行 | +| Template | 模板转换 | テンプレート | +| HTTP Request | HTTP 请求 | HTTP リクエスト | +| Variable Aggregator | 变量聚合器 | 変数集約器 | +| Variable Assigner | 变量赋值器 | 変数代入器 | +| Tool | 工具 | ツール | +| Parameter Extractor | 参数提取器 | パラメータ抽出 | +| Iteration | 迭代 | イテレーション | +| Loop | 循环 | ループ | +| Doc Extractor | 文档提取器 | テキスト抽出 | +| List Operator | 列表操作 | リスト処理 | +| Agent | Agent | エージェント | +| Human Input | 人工介入 | 人間の入力 | +| Schedule Trigger | 定时触发器 | スケジュールトリガー | +| Webhook Trigger | Webhook 触发器 | Webhook トリガー | +| Plugin Trigger | 插件触发器 | プラグイントリガー | +| Command | 命令 | コマンド | +| Upload File to Sandbox | 上传文件至沙盒 | サンドボックスへのファイルアップロード | + +### Knowledge & Retrieval + +| English | Chinese | Japanese | +|:--------|:--------|:---------| +| knowledge base | 知识库 | ナレッジベース | +| chunk | 分段 | チャンク | +| chunking | 分段 | チャンキング | +| retrieval | 检索 | 検索 | +| retrieval mode | 检索模式 | 検索モード | +| indexing | 索引 | インデックス | +| index method | 索引方法 | インデックス方法 | +| embedding | 嵌入 | 埋め込み | +| metadata | 元数据 | メタデータ | +| delimiter | 分隔符 | デリミタ | +| maximum chunk length | 最大分段长度 | 最大チャンク長 | +| chunk overlap | 分段重叠 | チャンクオーバーラップ | +| General mode | 通用模式 | 汎用モード | +| Parent-child mode | 父子模式 | 親子モード | +| parent chunk | 父分段 | 親チャンク | +| child chunk | 子分段 | 子チャンク | +| Paragraph mode | 段落模式 | 段落モード | +| Full Doc mode | 全文档模式 | 全文書モード | +| pre-processing | 预处理 | 前処理 | +| Summary Auto-Gen | 摘要自动生成 | 要約自動生成 | +| Top K | Top K | Top K | +| score threshold | 分数阈值 | スコアしきい値 | + +### Configuration & Parameters + +| English | Chinese | Japanese | +|:--------|:--------|:---------| +| variable | 变量 | 変数 | +| environment variable | 环境变量 | 環境変数 | +| Top P | Top P | Top P | +| context variable | 上下文变量 | コンテキスト変数 | +| conversation memory | 对话记忆 | 会話メモリ | +| window size | 窗口大小 | ウィンドウサイズ | +| structured output | 结构化输出 | 構造化出力 | +| input field | 输入字段 | 入力フィールド | +| request form | 请求表单 | リクエストフォーム | +| Assemble Variable | 变量组装 | 変数アセンブル | + +### Agent + +| English | Chinese | Japanese | 
+|:--------|:--------|:---------| +| Max Iterations | 最大迭代次数 | 最大イテレーション数 | + +### Infrastructure + +| English | Chinese | Japanese | +|:--------|:--------|:---------| +| self-hosted | 自托管 | セルフホスト | +| SaaS | SaaS | SaaS | +| Docker | Docker | Docker | +| sandbox | 沙箱 | サンドボックス | +| API | API | API | +| runtime | 运行时 | ランタイム | +| classic runtime | 经典运行时 | クラシックランタイム | +| sandboxed runtime | 沙盒运行时 | サンドボックスランタイム | +| skill | 技能 | スキル | +| file system | 文件系统 | ファイルシステム | + +### Marketplace + +| English | Chinese | Japanese | +|:--------|:--------|:---------| +| Marketplace | 市场 | マーケットプレイス | +| Creator Center | 创作者中心 | クリエイターセンター | + +## UI Labels + +### Sidebar & Navigation + +| English | Chinese | Japanese | +|:--------|:--------|:---------| +| Studio | 工作室 | スタジオ | +| Knowledge | 知识库 | ナレッジ | +| Explore | 探索 | 探索 | +| Plugins | 插件 | プラグイン | +| Tools | 工具 | ツール | + +### App Detail Tabs + +| English | Chinese | Japanese | +|:--------|:--------|:---------| +| Orchestrate | 编排 | オーケストレート | +| Monitoring | 监测 | 監視 | +| API Access | 访问 API | API アクセス | +| Logs & Annotations | 日志与标注 | ログ&注釈 | + +### Knowledge Detail Tabs + +| English | Chinese | Japanese | +|:--------|:--------|:---------| +| Documents | 文档 | ドキュメント | +| Retrieval Testing | 召回测试 | 検索テスト | +| Settings | 设置 | 設定 | +| Pipeline | 流水线 | パイプライン | + +### Settings Panel + +| English | Chinese | Japanese | +|:--------|:--------|:---------| +| My account | 我的账户 | マイアカウント | +| Members | 成员 | メンバー | +| Model Provider | 模型供应商 | モデルプロバイダー | +| Data Source | 数据来源 | データソース | +| API Extension | API 扩展 | API 拡張 | +| Billing | 账单 | 請求 | +| Integrations | 集成 | 統合 | +| System Model Settings | 系统模型设置 | システムモデル設定 | +| System Reasoning Model | 系统推理模型 | システム推論モデル | +| Embedding Model | Embedding 模型 | 埋め込みモデル | +| Rerank Model | Rerank 模型 | Rerank モデル | +| Speech-to-Text Model | 语音转文本模型 | 音声-to-テキストモデル | +| Text-to-Speech Model | 文本转语音模型 | テキスト-to-音声モデル | +| Load Balancing | 负载均衡 | 負荷分散 | +| Message Credits | 消息额度 | クレジット | + 
+### Workspace Roles + +| English | Chinese | Japanese | +|:--------|:--------|:---------| +| Owner | 所有者 | オーナー | +| Admin | 管理员 | 管理者 | +| Editor | 编辑 | エディター | +| Builder | 构建器 | ビルダー | +| Knowledge Admin | 知识库管理员 | ナレッジ管理員 | +| Normal | 成员 | 通常 | + +### App Type Selectors + +| English | Chinese | Japanese | +|:--------|:--------|:---------| +| Assistant | 助手 | アシスタント | +| Chatbot | 聊天助手 | チャットボット | +| Chatflow | Chatflow | チャットフロー | +| Completion | 文本生成 | テキスト生成 | + +### App Actions + +| English | Chinese | Japanese | +|:--------|:--------|:---------| +| Duplicate | 复制 | 複製 | +| Export DSL | 导出 DSL | DSL をエクスポート | +| Import DSL file | 导入 DSL 文件 | DSL ファイルをインポート | +| Create from Blank | 创建空白应用 | 最初から作成 | +| Create from Template | 从应用模板创建 | テンプレートから作成 | +| Tracing | 追踪 | 追跡 | +| Web App Access Control | Web 应用访问控制 | Web アプリアクセス制御 | + +### Workflow Node Names + +| English | Chinese | Japanese | +|:--------|:--------|:---------| +| User Input | 用户输入 | ユーザー入力 | +| LLM | LLM | LLM | +| Knowledge Retrieval | 知识检索 | 知識検索 | +| IF/ELSE | 条件分支 | IF/ELSE | +| Code | 代码执行 | コード実行 | +| Template | 模板转换 | テンプレート | +| Question Classifier | 问题分类器 | 質問分類器 | +| HTTP Request | HTTP 请求 | HTTP リクエスト | +| Variable Aggregator | 变量聚合器 | 変数集約器 | +| Variable Assigner | 变量赋值 | 変数代入 | +| Iteration | 迭代 | イテレーション | +| Loop | 循环 | ループ | +| Parameter Extractor | 参数提取器 | パラメータ抽出 | +| Doc Extractor | 文档提取器 | テキスト抽出 | +| List Operator | 列表操作 | リスト処理 | +| Output | 输出 | 出力 | +| Answer | 直接回复 | 回答 | +| Human Input | 人工介入 | 人間の入力 | +| Webhook Trigger | Webhook 触发器 | Webhook トリガー | +| Schedule Trigger | 定时触发器 | スケジュールトリガー | +| Plugin Trigger | 插件触发器 | プラグイントリガー | +| Knowledge Base | 知识库 | 知識ベース | + +### Workflow Controls + +| English | Chinese | Japanese | +|:--------|:--------|:---------| +| Publish | 发布 | 公開する | +| Published | 已发布 | 公開済み | +| Unpublished | 未发布 | 未公開 | +| Preview | 预览 | プレビュー | +| Test Run | 测试运行 | テスト実行 | +| Run App | 运行 | アプリを実行 | +| Features | 功能 | 機能 | +| Version History | 版本历史 | 
バージョン履歴 | +| Workflow as Tool | 发布为工具 | ワークフローをツールとして公開する | +| Embed Into Site | 嵌入网站 | サイトに埋め込む | +| Conversation Variables | 会话变量 | 会話変数 | +| Environment Variables | 环境变量 | 環境変数 | +| System Variables | 系统变量 | システム変数 | + +### Agent Node Config + +| English | Chinese | Japanese | +|:--------|:--------|:---------| +| Agentic Strategy | Agent 策略 | エージェンティック戦略 | +| Query Variable | 查询变量 | 検索変数 | +| Metadata Filtering | 元数据过滤 | メタデータフィルタ | + +### Knowledge Retrieval Methods + +| English | Chinese | Japanese | +|:--------|:--------|:---------| +| Vector Search | 向量检索 | ベクトル検索 | +| Full-Text Search | 全文检索 | 全文検索 | +| Hybrid Search | 混合检索 | ハイブリッド検索 | +| Inverted Index | 倒排索引 | 転置インデックス | +| Weighted Score | 权重设置 | ウェイト設定 | + +### Knowledge Settings + +| English | Chinese | Japanese | +|:--------|:--------|:---------| +| External Knowledge Base | 外部知识库 | 外部知識ベース | +| External API | 外部 API | 外部 API | +| Service API | 服务 API | サービスAPI | +| Multimodal | 多模态 | マルチモーダル | + +### Chunking Mode Labels + +| English | Chinese | Japanese | +|:--------|:--------|:---------| +| General | 通用 | 汎用 | +| Parent-child | 父子 | 親子 | +| Q&A | 问答 | Q&A | +| Graph | 图 | グラフ | +| Full-doc | 全文 | 全体 | +| Paragraph | 段落 | 段落 | + +### Document Processing + +| English | Chinese | Japanese | +|:--------|:--------|:---------| +| Index Method | 索引方式 | インデックス方法 | +| High Quality | 高质量 | 高品質 | +| Economical | 经济 | 経済的 | +| Chunk Settings | 分段设置 | チャンク設定 | +| Text Pre-processing Rules | 文本预处理规则 | テキストの前処理ルール | +| Automatic | 自动分段与清洗 | 自動 | +| Custom | 自定义 | カスタム | +| Preview Chunk | 预览块 | チャンクをプレビュー | +| Data Source | 选择数据源 | データソース | +| Document Processing | 文本分段与清洗 | テキスト進行中 | +| Execute & Finish | 处理并完成 | 実行と完成 | +| Import from file | 导入已有文本 | テキストファイルからインポート | +| Sync from Notion | 同步自 Notion 内容 | Notion から同期 | +| Sync from website | 同步自 Web 站点 | ウェブサイトから同期 | + +### Document Status Labels + +| English | Chinese | Japanese | +|:--------|:--------|:---------| +| Enabled | 已启用 | 有効 | +| Disabled | 已禁用 | 
無効 | +| Archived | 已归档 | アーカイブ済み | +| Available | 可用 | 利用可能 | +| Indexing | 索引中 | インデックス化中 | +| Queuing | 排队中 | キューイング中 | +| Paused | 已暂停 | 一時停止中 | +| Error | 错误 | エラー | + +### Document Embedding Modes + +| English | Chinese | Japanese | +|:--------|:--------|:---------| +| Chunking Setting | 分段模式 | チャンキングモード | +| High-quality mode | 高质量模式 | 高品質モード | +| Economy mode | 经济模式 | 経済モード | ## General Guidelines -- Maintain technical accuracy while adapting to local conventions -- Keep code examples and technical identifiers in English -- Preserve markdown formatting and structure -- Maintain a professional and clear tone +Technical accuracy, English identifiers preserved, markdown formatting +maintained, professional tone. diff --git a/tools/translate/translation-system-overview.md b/tools/translate/translation-system-overview.md new file mode 100644 index 000000000..cf248969f --- /dev/null +++ b/tools/translate/translation-system-overview.md @@ -0,0 +1,345 @@ +# Dify Documentation Translation System + +A complete overview of the automated translation pipeline in the dify-docs repository. Covers the Python translation engine, GitHub Actions workflows, and the end-to-end flow from PR creation to translated documentation. + +--- + +## Architecture + +The system has two layers: + +1. **Translation engine** (`tools/translate/`) — Python code that calls the Dify API to translate documents. +2. **Automation** (`.github/workflows/sync_docs_*.yml`) — GitHub Actions that trigger the engine automatically on PR events. + +``` +PR created/updated with changes in en/ + │ + ▼ + ┌─────────────┐ + │ Analyze │ Detects what changed, generates a sync plan + └──────┬──────┘ + │ + ▼ + ┌─────────────┐ + │ Route │──→ New PR? ──→ Execute (creates translation PR) + │ │──→ Existing translation PR? 
──→ Update (incremental) + └──────┬──────┘ + │ + ▼ + ┌─────────────┐ + │ Cleanup │ When original PR is closed/merged + └─────────────┘ +``` + +--- + +## Translation Engine (`tools/translate/`) + +### `main.py` — Local CLI Tool + +Entry point for manual, single-file translation. Useful for testing. + +```bash +# Interactive mode +python main.py + +# Translate a specific file +python main.py en/some-file.mdx + +# With explicit API key +python main.py en/some-file.mdx app-xxx +``` + +- Calls Dify API in **streaming mode** (avoids HTTP 504 timeouts). +- Retry logic with exponential backoff: 30s, 60s, 120s, 240s, 300s. +- 600s timeout for streaming response. +- Concurrency limited to 2 simultaneous translations. + +### `sync_and_translate.py` — Core Engine (~2,100 lines) + +The main orchestrator. Handles the full translation lifecycle: + +| Responsibility | How | +|:---------------|:----| +| Detect changes | Git diff between commits | +| Translate new files | Fresh translation of full content | +| Update modified files | Context-aware: sends existing translation + git diff to AI | +| Delete removed files | Removes target language files + docs.json entries | +| Sync docs.json | Inserts/removes entries at matching index positions | +| Surgical reconciliation | Auto-detects moves/renames in docs.json structure | + +**Key class: `DocsSynchronizer`** + +- Loads config from `config.json`. +- Translates files via Dify API with terminology database context. +- Inserts AI translation notice after frontmatter in each translated file. +- Syncs `docs.json` navigation structure across languages. + +**Surgical reconciliation** detects when files are moved between sections or renamed in `docs.json`: + +- Compares English section structure between base commit and HEAD. +- Detects **moves** (same file, different group path) and **renames** (deleted + added in same location). 
+- Applies identical operations to Chinese and Japanese using **index-based navigation** — groups are matched by position index, not by name. This works because "Nodes" in English is "节点" in Chinese and "ノード" in Japanese. + +### `translate_pr.py` — PR Orchestration + +Manages the translation branch lifecycle for GitHub workflow use: + +- Creates `docs-sync-pr-{PR_NUMBER}` branch from `origin/main` (not from PR branch — this prevents stale state). +- Checks out only the files the PR actually changed. +- Merges `docs.json` structure: main's structure + branch's translations. +- Commits and pushes translated files. + +**Stale PR handling:** When other PRs are merged to main before this PR processes, the translation branch is created from the latest main. Only this PR's changed files are checked out. This prevents reverting changes from other merged PRs. + +### `pr_analyzer.py` — PR Analysis + +Validates and categorizes PRs: + +- Classifies PR as `source`, `translation`, or `none`. +- Rejects mixed PRs (both source and translation changes). +- Validates file paths (no directory traversal). +- Checks ignore list from config. +- Enforces file size limits (10MB). + +### `json_formatter.py` — Format-Preserving JSON + +Detects original JSON formatting (indent style, indent size, trailing newlines, key spacing) and rewrites `docs.json` maintaining the exact format. Prevents noisy diffs from formatting changes. + +### `sync_by_path.py` — Utility for Specific Paths + +Translate specific files or directories on demand: + +```bash +python sync_by_path.py --file en/test.mdx --api-key app-xxx +python sync_by_path.py --dir en/guides/ --api-key app-xxx --dry-run +``` + +### `openapi/` — OpenAPI Translation Pipeline + +Separate pipeline for OpenAPI specification files: + +- `extractor.py` — Extracts translatable fields from OpenAPI JSON. +- `translator.py` — Translates field values via Dify API. +- `rehydrator.py` — Rebuilds OpenAPI JSON with translated values. 
+ +Translatable fields are defined in `config.json`: `title`, `summary`, `description`. + +--- + +## Configuration + +### `config.json` — Single Source of Truth + +```json +{ + "source_language": "en", + "target_languages": ["zh", "ja"], + "languages": { + "en": { "code": "en", "name": "English", "directory": "en" }, + "zh": { "code": "zh", "name": "Chinese", "directory": "zh", "translation_notice": "..." }, + "ja": { "code": "ja", "name": "Japanese", "directory": "ja", "translation_notice": "..." } + }, + "max_files_per_run": 10, + "max_openapi_files_per_run": 5 +} +``` + +Also contains: versioned doc paths, ignore list, label translations, and translatable OpenAPI fields. + +**Adding a new language:** + +1. Add language code to `target_languages`. +2. Add language entry to `languages` with `code`, `name`, `directory`, `translation_notice`. +3. Create the directory structure. +4. Update workflow path filters in `.github/workflows/sync_docs_analyze.yml`. + +### `termbase_i18n.md` — Terminology Database + +A markdown file containing standardized translations for technical terms (Workflow, Agent, Knowledge Base, Node, Variable, etc.). Passed to the Dify API alongside each document to ensure consistent terminology across all translations. + +### `.env` / `.env.example` + +``` +DIFY_API_KEY=your_dify_api_key_here +``` + +Required for local usage. In GitHub Actions, the key is stored as a repository secret. + +--- + +## GitHub Actions Workflows + +### `sync_docs_analyze.yml` — Analyze + +**Trigger:** PR opened, updated, or reopened with changes to `en/`, `zh/`, `ja/`, or `docs.json`. + +**What it does:** + +1. Determines comparison range: + - New PR: uses merge-base (where branch diverged from main). + - Updated PR: uses `Last-Processed-Commit` from translation PR for incremental range. +2. Calls `pr_analyzer.py` to classify changes. +3. Validates file paths and sizes. +4. Generates sync plan via `SyncPlanGenerator`. +5. 
Uploads artifacts (1-day retention): `sync_plan.json`, `analysis.json`, `changed_files.txt`. + +### `sync_docs_execute.yml` — Execute + +**Trigger:** Analyze workflow succeeds (new PR, no existing translation PR). + +**What it does:** + +1. Downloads analysis artifacts from Analyze. +2. **Approval gate for fork PRs:** + - Checks if PR author is OWNER, MEMBER, or COLLABORATOR. + - If not: posts "pending approval" comment, skips translation. + - If approved by maintainer: proceeds. +3. Calls `translate_pr.py` with PR number, head SHA, base SHA. +4. Creates `docs-sync-pr-{NUMBER}` branch with translated files. +5. Opens translation PR linking back to the source PR. +6. Comments on source PR with link to translation PR. + +### `sync_docs_update.yml` — Update + +**Trigger:** Analyze workflow succeeds (existing translation PR found). + +**What it does:** + +1. Finds existing translation PR/branch. +2. Reads `Last-Processed-Commit` from translation PR to determine incremental range. +3. Calls `translate_pr.py --is-incremental` — only translates files changed since last processing. +4. Context-aware: passes existing translation + git diff to AI for each modified file. +5. Pushes new commits to translation branch. +6. Comments on both PRs about the update. + +### `sync_docs_cleanup.yml` — Cleanup + +**Trigger:** Original PR closed (merged or abandoned). + +**What it does:** + +- If original PR was **merged**: leaves translation PR open (it can be merged independently). +- If original PR was **closed without merging**: closes the translation PR with an explanatory comment. + +### `sync_docs_on_approval.yml` — Approval Gate + +**Trigger:** PR review submitted with "Approved" state. + +**What it does:** + +1. Validates reviewer is OWNER, MEMBER, or COLLABORATOR. +2. Checks if this is a fork PR that needs the gate. +3. If translation PR already exists: posts info comment, skips. +4. Posts "Approval received" comment. +5. Re-runs the most recent Analyze workflow for this PR. +6. 
This triggers Execute, which now finds the approval and proceeds. + +If the Analyze run is too old to re-run, it posts a comment suggesting the contributor push a small commit to trigger a fresh workflow. + +--- + +## How a Translation Actually Happens + +The Python code calls the Dify API endpoint (`https://api.dify.ai/v1/workflows/run`) in streaming mode with these inputs: + +| Input | Value | +|:------|:------| +| `the_doc` | Full document content | +| `termbase` | Contents of `termbase_i18n.md` | +| `original_language` | "English" | +| `output_language1` | "Chinese" or "Japanese" | +| `the_doc_exist` | Existing translation (for modified files only) | +| `diff_original` | Git diff (for modified files only) | + +The Dify workflow returns the translated content via the `output1` variable. The Python code then inserts a translation notice after the frontmatter. + +**For new files:** Fresh full translation. Typically takes ~30–60 seconds per language. + +**For modified files:** Context-aware update. The AI receives the current translation and only the diff, so it updates the relevant sections rather than retranslating from scratch. Typically takes ~2–3 minutes per language. + +**Translation direction is always English → Chinese and English → Japanese.** Not Chinese → Japanese. + +--- + +## End-to-End Flow + +### Internal Maintainer PR + +``` +1. Maintainer creates PR with changes in en/ +2. Analyze runs → generates sync plan +3. Execute runs → no approval gate → calls Dify API → creates translation PR +4. Translation PR (docs-sync-pr-{NUMBER}) is created automatically +5. If maintainer pushes more commits → Analyze → Update runs incrementally +6. Maintainer merges source PR → Cleanup leaves translation PR open +7. Maintainer reviews and merges translation PR +``` + +### External Contributor PR (Fork) + +``` +1. Contributor creates PR from fork with changes in en/ +2. Analyze runs → generates sync plan +3. 
Execute runs → detects fork PR, author not trusted → posts "pending approval" comment
+4. Maintainer reviews and approves the PR
+5. On Approval workflow triggers → posts "Approval received" → re-runs Analyze
+6. Execute runs again → finds approval → creates translation PR
+7. Normal flow from here (same as internal)
+```
+
+---
+
+## Safeguards
+
+| Safeguard | Details |
+|:----------|:--------|
+| Streaming mode | Avoids HTTP 504 gateway timeouts on long translations |
+| Retry with backoff | 30s, 60s, 120s, 240s, 300s intervals |
+| Concurrency limit | Max 2 simultaneous translations |
+| File processing limit | Max 10 docs + 5 OpenAPI files per run |
+| File size limit | 10MB per file |
+| Path validation | Rejects directory traversal attempts |
+| Mixed PR rejection | PRs cannot contain both source and translation changes |
+| Stale PR protection | Translation branch from main, only PR files checked out |
+| Format preservation | JSON formatting detected and maintained |
+| Approval gate | Fork PRs require OWNER/MEMBER/COLLABORATOR approval |
+
+---
+
+## Translation Testing Framework (`tools/translate-test-dify/`)
+
+A/B testing framework for comparing translation quality between models or prompt variations:
+
+```bash
+cd tools/translate-test-dify
+./setup.sh
+source venv/bin/activate
+python run_test.py spec.md
+python compare.py results/<run_dir>/
+```
+
+Test specs define multiple API keys (different models/prompts) and test content. The framework runs the same content through each variant and generates comparison reports.
+
+**Important:** Never commit `results/`, `mock_docs/`, or real API keys. Redact keys with `app-***` before committing.
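In outline, the streaming call and retry schedule described above look something like the following sketch. This is illustrative only, not the actual `translate_pr.py`/`sync_and_translate.py` code; the SSE event shape and the `user` field are assumptions based on the public Dify workflow API:

```python
import json
import time
import urllib.request

DIFY_URL = "https://api.dify.ai/v1/workflows/run"
RETRY_DELAYS = [30, 60, 120, 240, 300]  # seconds, matching the safeguards table


def build_payload(doc: str, termbase: str, target_language: str,
                  existing: str = "", diff: str = "") -> dict:
    """Assemble the workflow inputs listed in the inputs table above."""
    inputs = {
        "the_doc": doc,
        "termbase": termbase,
        "original_language": "English",
        "output_language1": target_language,
    }
    if existing:  # modified files get a context-aware, incremental update
        inputs["the_doc_exist"] = existing
        inputs["diff_original"] = diff
    # "user" is an arbitrary caller identifier (assumed), not a documented input.
    return {"inputs": inputs, "response_mode": "streaming", "user": "docs-sync"}


def run_translation(api_key: str, payload: dict) -> str:
    """Stream the workflow run, retrying with backoff on failure."""
    for delay in [0] + RETRY_DELAYS:
        if delay:
            time.sleep(delay)
        try:
            req = urllib.request.Request(
                DIFY_URL,
                data=json.dumps(payload).encode(),
                headers={
                    "Authorization": f"Bearer {api_key}",
                    "Content-Type": "application/json",
                },
            )
            with urllib.request.urlopen(req, timeout=600) as resp:
                for raw in resp:  # SSE stream: payload lines start with "data: "
                    line = raw.strip()
                    if not line.startswith(b"data: "):
                        continue
                    event = json.loads(line[len(b"data: "):])
                    # Assumed event shape: the final event carries the outputs.
                    if event.get("event") == "workflow_finished":
                        return event["data"]["outputs"]["output1"]
        except OSError:
            continue  # network/HTTP error: back off and retry
    raise RuntimeError("translation failed after all retries")
```

The streaming response mode is what keeps long translations from hitting gateway timeouts; the caller only needs the final `workflow_finished` event's `output1`.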
+ +--- + +## Key File Reference + +| File | Purpose | +|:-----|:--------| +| `tools/translate/config.json` | Language config, processing limits, ignore list | +| `tools/translate/termbase_i18n.md` | Terminology database for consistent translations | +| `tools/translate/main.py` | Local CLI translation tool | +| `tools/translate/sync_and_translate.py` | Core translation + reconciliation engine | +| `tools/translate/translate_pr.py` | PR-level translation orchestration | +| `tools/translate/pr_analyzer.py` | PR change analysis and validation | +| `tools/translate/json_formatter.py` | Format-preserving JSON serialization | +| `tools/translate/sync_by_path.py` | Translate specific files/directories | +| `tools/translate/openapi/` | OpenAPI spec translation pipeline | +| `.github/workflows/sync_docs_analyze.yml` | PR analysis workflow | +| `.github/workflows/sync_docs_execute.yml` | Translation PR creation workflow | +| `.github/workflows/sync_docs_update.yml` | Incremental translation update workflow | +| `.github/workflows/sync_docs_cleanup.yml` | PR cleanup on close/merge | +| `.github/workflows/sync_docs_on_approval.yml` | Fork PR approval gate | diff --git a/writing-guides/formatting-guide.md b/writing-guides/formatting-guide.md new file mode 100644 index 000000000..ae72e16b3 --- /dev/null +++ b/writing-guides/formatting-guide.md @@ -0,0 +1,274 @@ +# Dify Documentation Formatting Guide + +This document defines the formatting standards for Dify documentation. Contributors can load this guide into an AI assistant to automatically check and fix formatting issues during the writing process. + +> **For AI assistants:** When a contributor asks you to review their documentation, check every rule in this guide and report violations with line-level references. When asked to fix formatting, apply all rules in a single pass. 
+
+---
+
+## Frontmatter
+
+Every MDX file must start with YAML frontmatter containing at least `title` and `description`:
+
+```yaml
+---
+title: "Page Title"
+description: "A concise summary of what this page covers."
+---
+```
+
+- `title` is **required**. `description` is required for new pages; existing pages should be updated over time.
+- `sidebarTitle` is optional. Add it when the title is too long for the sidebar, or when other pages in the same group follow a specific `sidebarTitle` convention.
+- Use double quotes around values that contain special characters.
+- Leave one blank line after the closing `---` before the document body.
+
+---
+
+## Headings
+
+- Use **title case**: `## Model Selection and Parameters`, not `## Model selection and parameters`.
+- Page titles and section titles starting with a verb should use the base form (imperative), not the "-ing" form: `## Create a Workflow`, not `## Creating a Workflow`.
+- Use H2 (`##`) for major sections, H3 (`###`) for subsections, H4 (`####`) for deeper subsections.
+- Do not skip heading levels (e.g., don't jump from H2 to H4).
+- One blank line before and after each heading.
+- Do not add a trailing `#` to headings.
+
+---
+
+## Bold and Italic
+
+### Bold (`**text**`)
+
+Use bold for:
+
+- **UI elements**: button names, menu items, tab labels, field names.
+  - Example: `Click **Save and Authorize** to confirm.`
+- **Key terms** when first introduced or when emphasis is critical.
+
+Do not use bold for general emphasis in running text. If everything is bold, nothing stands out.
+
+### Italic (`*text*`)
+
+- Always use single asterisks (`*text*`), never underscores (`_text_`).
+- Use sparingly—for semantic emphasis, alternative phrasings, or example values.
+  - Example: `Both plugin triggers and webhook triggers make your workflow *event-driven*.`
+
+---
+
+## Lists
+
+- Use dashes (`-`) for unordered lists, not asterisks (`*`).
+- Use numbered lists (`1.`, `2.`, `3.`) only for sequential steps.
+- Leave a blank line before the first list item and after the last.
+- For descriptive lists, use the **bold label + colon** pattern:
+
+```markdown
+- **Delivery method**: How the request form reaches recipients.
+- **Form content**: What information recipients will see.
+```
+
+- End list items with a period when they are complete sentences or clauses. Omit periods for short phrases or fragments.
+- Nested lists: indent with 2 spaces.
+- If an expanded description, callout, or image belongs to a specific list item rather than the main body text, indent it with two spaces below that item to ensure correct rendering.
+
+---
+
+## Code
+
+### Inline Code
+
+Use backticks for:
+
+- Variable names: `` `{{variable_name}}` ``
+- File paths and extensions: `` `.env` ``, `` `docker-compose.yml` ``
+- Configuration values: `` `streaming` ``, `` `true` ``
+- Special characters and delimiters: `` `\n\n` ``
+
+Do not use backticks for product names, UI labels, or general English words.
+
+### Code Blocks
+
+- Always specify a language tag: `` ```python ``, `` ```bash ``, `` ```json ``, `` ```text ``.
+- Use `` ```text `` for plain text, prompts, or output that isn't code.
+- One blank line before and after the code block.
+- No indentation inside code blocks (start at column 0).
+
+---
+
+## Links
+
+### Internal Links
+
+- Use absolute paths from the language root: `[Link text](/en/path/to/page)`
+- Include anchor references when linking to a specific section: `[Retrieval Settings](/en/use-dify/knowledge/create-knowledge/setting-indexing-methods#setting-the-retrieval-setting)`
+- Use descriptive link text. Never use "click here" or "here" as link text.
+
+### External Links
+
+- Use full URLs: `[Dify Marketplace](https://marketplace.dify.ai/)`
+- Ensure all external links are HTTPS.
+
+---
+
+## Images
+
+### Preferred Format (Frame Component)
+
+```mdx
+<Frame caption="LLM Node Overview">
+  <img src="/path/to/image.png" alt="LLM Node Overview" />
+</Frame>
+```
+
+- Include a `caption` for images that need a title or description.
+
+- Always include a descriptive `alt` attribute.
+
+---
+
+## Mintlify Components
+
+### Info, Tip, Note, Warning
+
+Use these for callouts instead of italics or raw text. Each serves a different purpose:
+
+| Component | When to use |
+|:----------|:------------|
+| `<Info>` | General informational content—helpful context, version-specific or deployment-specific details |
+| `<Tip>` | Helpful suggestions or shortcuts |
+| `<Note>` | Important information that requires attention—missing it could lead to potential complications |
+| `<Warning>` | Actions that could cause errors or data loss |
+
+Format:
+
+```mdx
+<Info>
+Configure at least one model provider in **System Settings** > **Model Providers** before using LLM nodes.
+</Info>
+```
+
+- One blank line before and after the component.
+- Content inside can include bold, links, and other inline formatting.
+
+### Tabs
+
+Use for presenting alternatives (e.g., different methods, OS-specific instructions):
+
+```mdx
+<Tabs>
+  <Tab title="First Option">
+    Content for this tab.
+  </Tab>
+  <Tab title="Second Option">
+    Content for this tab.
+  </Tab>
+</Tabs>
+```
+
+- Each `<Tab>` must have a `title` attribute.
+
+### Steps
+
+Use for sequential procedures:
+
+````mdx
+<Steps>
+  <Step title="Clone the repository">
+    Clone the source code to your local machine.
+
+    ```bash
+    git clone https://github.com/langgenius/dify.git
+    ```
+  </Step>
+  <Step title="Start the containers">
+    Navigate to the docker directory and start the containers.
+  </Step>
+</Steps>
+````
+
+### Accordion
+
+Use for optional or supplementary content:
+
+```mdx
+<Accordion title="What is a webhook?">
+  A webhook allows one system to automatically send real-time data to another system.
+</Accordion>
+```
+
+### CodeGroup
+
+Use for showing multiple code variants of the same operation:
+
+````mdx
+<CodeGroup>
+  ```bash Docker Compose V2
+  docker compose up -d
+  ```
+  ```bash Docker Compose V1
+  docker-compose up -d
+  ```
+</CodeGroup>
+````
+
+---
+
+## Tables
+
+- Left-align columns by default using `:---`.
+- Use bold for header row content only when it adds clarity.
+- For multi-line content within cells, prefer lists or components. When manual line breaks are needed, use `<br/>` between lines.
+- Mintlify components (`<Info>`, `<Tip>`) can be embedded within table cells when necessary.
+
+```markdown
+| Setting | Description |
+|:--------------|:-----------------------------------------|
+| **Name** | Identifies the knowledge base. |
+| **Description** | Indicates the knowledge base's purpose.|
+```
+
+---
+
+## UI Element References
+
+| Element type | Format | Example |
+|:-------------|:-------|:--------|
+| Buttons | Bold | `**Save and Authorize**` |
+| Menu paths | Bold with arrow | `**System Settings** > **Model Providers**` |
+| Tab/section names | Bold | `**Quick Settings**` |
+| Field names | Bold | `**Temperature**` |
+| Status indicators | Bold | `**Edited**`, `**Healthy**` |
+
+- Use the arrow character `>` (not `→`, `->`, or `=>`) for menu paths.
+
+---
+
+## Spacing
+
+- **One blank line** between paragraphs, before/after headings, before/after components, before/after code blocks.
+- **No double blank lines** anywhere in the file.
+- Do not leave trailing whitespace at the end of lines.
+
+---
+
+## Punctuation
+
+- **Em dashes**: No spaces around em dashes; write `word—word`, not `word — word`.
+- **En dashes**: No spaces around en dashes in ranges; write `2–4`, not `2 – 4`.
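Several of the mechanical rules above (trailing whitespace, double blank lines, spaced dashes) lend themselves to automated checking. A minimal sketch of such a checker follows; it is illustrative only and not part of the repo's actual tooling:

```python
import re


def check_formatting(text: str) -> list[str]:
    """Report violations of the spacing and punctuation rules in this guide."""
    problems = []
    lines = text.split("\n")
    for i, line in enumerate(lines, start=1):
        if line != line.rstrip():
            problems.append(f"line {i}: trailing whitespace")
        if re.search(r"\S — \S", line):  # em dash with surrounding spaces
            problems.append(f"line {i}: spaces around em dash")
        if re.search(r"\d – \d", line):  # en dash range with surrounding spaces
            problems.append(f"line {i}: spaces around en dash")
    for i in range(1, len(lines)):
        if lines[i - 1] == "" and lines[i] == "":
            problems.append(f"line {i + 1}: double blank line")
    return problems
```

A contributor (or a pre-commit hook) could run this over an `.mdx` file and fix each reported line before opening a PR.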
+ +--- + +## Quick Checklist + +Before submitting, verify: + +- [ ] Frontmatter has both `title` and `description` +- [ ] Headings use title case; verb-leading titles use imperative form +- [ ] Headings follow the correct hierarchy (H2 → H3 → H4) +- [ ] Bold is used for UI elements and key terms +- [ ] Italics use `*asterisks*`, not `_underscores_` +- [ ] Lists use dashes (`-`), not asterisks (`*`) +- [ ] Code blocks have language tags +- [ ] Internal links use absolute paths (`/en/...`) +- [ ] Images have `alt` attributes +- [ ] No double blank lines +- [ ] Em dashes and en dashes have no surrounding spaces diff --git a/writing-guides/glossary.md b/writing-guides/glossary.md new file mode 100644 index 000000000..a5544a202 --- /dev/null +++ b/writing-guides/glossary.md @@ -0,0 +1,358 @@ +# Dify Documentation Glossary + +Standard terminology for Dify documentation. Single source of truth for writers, translators, and the automated translation pipeline. + +--- + +## General Terms + +Terms appear in body text exactly as written in this table. Capitalize them further only when external rules require it (start of a sentence, title case headings). + +### Core Concepts + +| English | Chinese | Japanese | Notes | +|:--------|:--------|:---------|:------| +| Workflow | 工作流 | ワークフロー | | +| Chatflow | 对话流 | チャットフロー | | +| Agent | Agent | Agent | Dify App type (alongside Workflow, Chatflow, etc.) 
that autonomously uses tools| +| Text Generator | 文本生成应用 | テキストジェネレーター | | +| knowledge base | 知识库 | ナレッジベース | Always lowercase unless at sentence start | +| plugin | 插件 | プラグイン | | +| Dify tool | Dify 工具 | ツール | | +| workspace | 工作区 | ワークスペース | | +| template | 模板 | テンプレート | Published app that others can download from Dify Marketplace and use | +| WebApp | WebApp | WebApp | | + +### Models + +| English | Chinese | Japanese | Notes | +|:--------|:--------|:---------|:------| +| model | 模型 | モデル | | +| model provider | 模型供应商 | モデルプロバイダー | | +| LLM (large language model) | 大语言模型 | 大規模言語モデル | | +| chat model | 对话模型 | チャットモデル | Models that support role-based conversations (System/User/Assistant) | +| completion model | 文本续写模型 | 補完モデル | Models designed for simple text continuation | +| embedding model | 嵌入模型 | 埋め込みモデル | | +| text embedding model | 文本嵌入模型 | テキスト埋め込みモデル | Models that convert text into vector representations | +| multimodal embedding model | 多模态嵌入模型 | マルチモーダル埋め込みモデル | Models that convert text and images into vector representations | +| rerank model | 重排序模型 | リランクモデル | Models that reorder retrieval results by relevance | +| reasoning model | 推理模型 | 推論モデル | Models that output thinking process before final response | +| moderation model | 内容审核模型 | モデレーションモデル | Models that detect and filter inappropriate content | +| TTS model | 文字转语音模型 | TTSモデル | Text-to-speech models | +| Speech2Text model | 语音转文字模型 | 音声認識モデル | Speech-to-text models | +| token | token | token | | +| API token | API 令牌 | APIトークン | | +| prompt | 提示词 | プロンプト | | +| system instruction | 系统指令 | システムインストラクション | Defines the model's behavior and role | +| user message | User 消息 | ユーザーメッセージ | Passes user input or provides example queries | +| assistant message | Assistant 消息 | アシスタントメッセージ | Provides example responses to guide model behavior | +| model tag | 模型标签 | モデルタグ | Indicators of model capabilities (Vision, Tool Call, context window, etc.) 
| + +### Nodes + +| English | Chinese | Japanese | Notes | +|:--------|:--------|:---------|:------| +| node | 节点 | ノード | | +| User Input | 用户输入 | ユーザー入力 | Start node; collects information from end users during application runtime | +| Output | 输出 | 出力 | End node for Workflows; defines output variables | +| Answer | 直接回复 | 回答 | End node for Chatflows; streams response text to the user | +| LLM | LLM | LLM | Node that calls large language models to generate responses | +| Knowledge Retrieval | 知识检索 | ナレッジ検索 | Retrieves relevant information from knowledge bases | +| Question Classifier | 问题分类器 | 質問分類器 | Classifies user input into categories using an LLM | +| IF/ELSE | 条件分支 | IF/ELSE | Splits workflow into branches based on conditions | +| Code | 代码执行 | コード実行 | Executes custom Python or JavaScript code | +| Template | 模板转换 | テンプレート | Transforms data using Jinja2 templates | +| HTTP Request | HTTP 请求 | HTTP リクエスト | Sends HTTP requests to external APIs | +| Variable Aggregator | 变量聚合器 | 変数集約器 | Aggregates multi-branch variables into one | +| Variable Assigner | 变量赋值器 | 変数代入器 | Assigns values to conversation/environment variables | +| Tool | 工具 | ツール | Calls external tools and services | +| Parameter Extractor | 参数提取器 | パラメータ抽出 | Extracts structured parameters from natural language using an LLM | +| Iteration | 迭代 | イテレーション | Processes array items sequentially | +| Loop | 循环 | ループ | Repeats steps until a condition is met | +| Doc Extractor | 文档提取器 | テキスト抽出 | Extracts text content from document files | +| List Operator | 列表操作 | リスト処理 | Filters, sorts, and limits list data | +| Agent | Agent | Agent | Workflow node (distinct from Agent app type above) | +| Human Input | 人工介入 | 人間の入力 | Pauses workflow execution to request human review or decisions | +| Schedule Trigger | 定时触发器 | スケジュールトリガー | Triggers workflow execution on a cron schedule | +| Webhook Trigger | Webhook 触发器 | Webhook トリガー | Triggers workflow execution via incoming HTTP webhook | +| Plugin Trigger | 插件触发器 | 
プラグイントリガー | Triggers workflow execution from a plugin | +| Command | 命令 | コマンド | Executes commands in the sandboxed runtime environment | +| Upload File to Sandbox | 上传文件至沙盒 | サンドボックスへのファイルアップロード | Uploads files to the sandboxed runtime environment | + +### Knowledge & Retrieval + +| English | Chinese | Japanese | Notes | +|:--------|:--------|:---------|:------| +| knowledge base | 知识库 | ナレッジベース | Always lowercase unless at sentence start | +| chunk | 分段 | チャンク | Use "chunk" not "segment"; a segment of text resulting from the chunking process | +| chunking | 分段 | チャンキング | Use "chunking" consistently; avoid "segmentation" or "splitting" | +| retrieval | 检索 | 検索 | Always lowercase in body text | +| retrieval mode | 检索模式 | 検索モード | Strategy for finding and ranking relevant chunks | +| indexing | 索引 | インデックス | Use "Index Method" consistently in documentation | +| index method | 索引方法 | インデックス方法 | Also referred to as "Index Method" in some UI contexts | +| embedding | 嵌入 | 埋め込み | | +| metadata | 元数据 | メタデータ | | +| delimiter | 分隔符 | デリミタ | The character or sequence used to split text during chunking | +| maximum chunk length | 最大分段长度 | 最大チャンク長 | The maximum size of each chunk in characters | +| chunk overlap | 分段重叠 | チャンクオーバーラップ | Characters overlapping between adjacent chunks; specific to General mode | +| General mode | 通用模式 | 汎用モード | Single-tier chunking where all chunks use the same settings | +| Parent-child mode | 父子模式 | 親子モード | Two-tier hierarchical chunking strategy | +| parent chunk | 父分段 | 親チャンク | Larger text blocks that provide context to the LLM | +| child chunk | 子分段 | 子チャンク | Smaller, fine-grained pieces used for semantic search | +| Paragraph mode | 段落模式 | 段落モード | Parent chunk creation mode that splits documents into multiple chunks | +| Full Doc mode | 全文档模式 | 全文書モード | Parent chunk creation mode where the entire document is a single chunk; capitalize "Full Doc" | +| pre-processing | 预处理 | 前処理 | Text cleaning operations before chunking; use "pre-processing" 
(noun) or "pre-process" (verb) | +| Summary Auto-Gen | 摘要自动生成| 要約自動生成 | Feature that automatically generates summaries for chunks | +| Top K | Top K | Top K | Number of most relevant chunks to retrieve | +| score threshold | 分数阈值 | スコアしきい値 | Minimum relevance score required for chunks to be included | + +### Configuration & Parameters + +| English | Chinese | Japanese | Notes | +|:--------|:--------|:---------|:------| +| variable | 变量 | 変数 | | +| environment variable | 环境变量 | 環境変数 | | +| Top P | Top P | Top P | | +| context variable | 上下文变量 | コンテキスト変数 | Variables injected to provide additional information to the LLM node | +| conversation memory | 对话记忆 | 会話メモリ | Feature that retains recent chat history (Chatflows only) | +| window size | 窗口大小 | ウィンドウサイズ | Controls how many recent exchanges to retain in memory | +| structured output | 结构化输出 | 構造化出力 | Feature that enforces JSON schema for reliable formatting | +| input field | 输入字段 | 入力フィールド | Form fields where people provide requested information | +| request form | 请求表单 | リクエストフォーム | The form sent to recipients asking for input/review; use "request form" not "request page" | +| Assemble Variable | 变量组装 | 変数アセンブル | On-demand data transformation using natural language descriptions | + +### Agent + +| English | Chinese | Japanese | Notes | +|:--------|:--------|:---------|:------| +| Max Iterations | 最大迭代次数 | 最大イテレーション数 | Limits the maximum number of reasoning loops and tool actions | + +### Infrastructure + +| English | Chinese | Japanese | Notes | +|:--------|:--------|:---------|:------| +| self-hosted | 自托管 | セルフホスト | | +| SaaS | SaaS | SaaS | | +| Docker | Docker | Docker | | +| sandbox | 沙箱 | サンドボックス | | +| API | API | API | | +| runtime | 运行时 | ランタイム | The execution environment for workflow nodes | +| classic runtime | 经典运行时 | クラシックランタイム | Original lightweight execution environment focused on speed and token efficiency | +| sandboxed runtime | 沙盒运行时 | サンドボックスランタイム | Enhanced execution environment with file 
system access and autonomous tool installation | +| skill | 技能 | スキル | Reusable expertise packages that eliminate repetitive prompt writing (Sandboxed runtime) | +| file system | 文件系统 | ファイルシステム | Sandboxed file access for reading/writing during execution | + +### Marketplace + +| English | Chinese | Japanese | Notes | +|:--------|:--------|:---------|:------| +| Marketplace | 市场 | マーケットプレイス | Platform where users publish and discover app templates | +| Creator Center | 创作者中心 | クリエイターセンター | Interface for managing template submissions and publications | + +## UI Labels + +Terms in this section must match the Dify product interface exactly. When these terms appear **bolded** in documentation, translations MUST use the corresponding UI string from the product. + +### Sidebar & Navigation + +| English (UI) | Chinese (UI) | Japanese (UI) | i18n Key | Notes | +|:-------------|:-------------|:--------------|:---------|:------| +| Studio | 工作室 | スタジオ | common.menus.apps | Sidebar menu label for the app workspace | +| Knowledge | 知识库 | ナレッジ | common.menus.datasets | Sidebar menu label; not to be confused with lowercase "knowledge base" in prose | +| Explore | 探索 | 探索 | common.menus.explore | Sidebar menu label | +| Plugins | 插件 | プラグイン | common.menus.plugins | Sidebar menu label | +| Tools | 工具 | ツール | common.menus.tools | Sidebar menu label | + +### App Detail Tabs + +| English (UI) | Chinese (UI) | Japanese (UI) | i18n Key | Notes | +|:-------------|:-------------|:--------------|:---------|:------| +| Orchestrate | 编排 | オーケストレート | common.appMenus.promptEng | App configuration tab | +| Monitoring | 监测 | 監視 | common.appMenus.overview | App metrics/overview tab | +| API Access | 访问 API | API アクセス | common.appMenus.apiAccess | Also used in Knowledge detail | +| Logs & Annotations | 日志与标注 | ログ&注釈 | common.appMenus.logAndAnn | | + +### Knowledge Detail Tabs + +| English (UI) | Chinese (UI) | Japanese (UI) | i18n Key | Notes | 
+|:-------------|:-------------|:--------------|:---------|:------| +| Documents | 文档 | ドキュメント | common.datasetMenus.documents | | +| Retrieval Testing | 召回测试 | 検索テスト | common.datasetMenus.hitTesting | | +| Settings | 设置 | 設定 | common.datasetMenus.settings | | +| Pipeline | 流水线 | パイプライン | common.datasetMenus.pipeline | | + +### Settings Panel + +| English (UI) | Chinese (UI) | Japanese (UI) | i18n Key | Notes | +|:-------------|:-------------|:--------------|:---------|:------| +| My account | 我的账户 | マイアカウント | common.settings.account | | +| Members | 成员 | メンバー | common.settings.members | | +| Model Provider | 模型供应商 | モデルプロバイダー | common.settings.provider | | +| Data Source | 数据来源 | データソース | common.settings.dataSource | | +| API Extension | API 扩展 | API 拡張 | common.settings.apiBasedExtension | | +| Billing | 账单 | 請求 | common.settings.billing | | +| Integrations | 集成 | 統合 | common.settings.integrations | | +| System Model Settings | 系统模型设置 | システムモデル設定 | common.modelProvider.systemModelSettings | | +| System Reasoning Model | 系统推理模型 | システム推論モデル | common.modelProvider.systemReasoningModel.key | | +| Embedding Model | Embedding 模型 | 埋め込みモデル | common.modelProvider.embeddingModel.key | | +| Rerank Model | Rerank 模型 | Rerank モデル | common.modelProvider.rerankModel.key | | +| Speech-to-Text Model | 语音转文本模型 | 音声-to-テキストモデル | common.modelProvider.speechToTextModel.key | | +| Text-to-Speech Model | 文本转语音模型 | テキスト-to-音声モデル | common.modelProvider.ttsModel.key | | +| Load Balancing | 负载均衡 | 負荷分散 | common.modelProvider.loadBalancing | | +| Message Credits | 消息额度 | クレジット | common.modelProvider.credits | | + +### Workspace Roles + +| English (UI) | Chinese (UI) | Japanese (UI) | i18n Key | Notes | +|:-------------|:-------------|:--------------|:---------|:------| +| Owner | 所有者 | オーナー | common.members.owner | | +| Admin | 管理员 | 管理者 | common.members.admin | | +| Editor | 编辑 | エディター | common.members.editor | | +| Builder | 构建器 | ビルダー | common.members.builder | REVIEW: ZH "构建器" seems 
unusual for a role name | +| Knowledge Admin | 知识库管理员 | ナレッジ管理員 | common.members.datasetOperator | Formerly "Dataset Operator" in code | +| Normal | 成员 | 通常 | common.members.normal | REVIEW: ZH uses "成员" (member); verify intended translation | + +### App Type Selectors + +| English (UI) | Chinese (UI) | Japanese (UI) | i18n Key | Notes | +|:-------------|:-------------|:--------------|:---------|:------| +| Assistant | 助手 | アシスタント | app.newApp.chatApp | Chat app creation label | +| Chatbot | 聊天助手 | チャットボット | app.typeSelector.chatbot | App type filter | +| Chatflow | Chatflow | チャットフロー | app.typeSelector.advanced | App type filter | +| Completion | 文本生成 | テキスト生成 | app.typeSelector.completion | App type filter | + +### App Actions + +| English (UI) | Chinese (UI) | Japanese (UI) | i18n Key | Notes | +|:-------------|:-------------|:--------------|:---------|:------| +| Duplicate | 复制 | 複製 | app.duplicate | | +| Export DSL | 导出 DSL | DSL をエクスポート | app.export | | +| Import DSL file | 导入 DSL 文件 | DSL ファイルをインポート | app.importDSL | | +| Create from Blank | 创建空白应用 | 最初から作成 | app.newApp.startFromBlank | | +| Create from Template | 从应用模板创建 | テンプレートから作成 | app.newApp.startFromTemplate | | +| Tracing | 追踪 | 追跡 | app.tracing.tracing | LLMOps tracing feature | +| Web App Access Control | Web 应用访问控制 | Web アプリアクセス制御 | app.accessControl | | + +### Workflow Node Names + +| English (UI) | Chinese (UI) | Japanese (UI) | i18n Key | Notes | +|:-------------|:-------------|:--------------|:---------|:------| +| User Input | 用户输入 | ユーザー入力 | workflow.blocks.start | Start node display name | +| LLM | LLM | LLM | workflow.blocks.llm | | +| Knowledge Retrieval | 知识检索 | 知識検索 | workflow.blocks.knowledge-retrieval | | +| IF/ELSE | 条件分支 | IF/ELSE | workflow.blocks.if-else | | +| Code | 代码执行 | コード実行 | workflow.blocks.code | | +| Template | 模板转换 | テンプレート | workflow.blocks.template-transform | Jinja template transform node | +| Question Classifier | 问题分类器 | 質問分類器 | workflow.blocks.question-classifier 
| | +| HTTP Request | HTTP 请求 | HTTP リクエスト | workflow.blocks.http-request | | +| Variable Aggregator | 变量聚合器 | 変数集約器 | workflow.blocks.variable-aggregator | | +| Variable Assigner | 变量赋值 | 変数代入 | workflow.blocks.assigner | | +| Iteration | 迭代 | イテレーション | workflow.blocks.iteration | | +| Loop | 循环 | ループ | workflow.blocks.loop | | +| Parameter Extractor | 参数提取器 | パラメータ抽出 | workflow.blocks.parameter-extractor | | +| Doc Extractor | 文档提取器 | テキスト抽出 | workflow.blocks.document-extractor | | +| List Operator | 列表操作 | リスト処理 | workflow.blocks.list-operator | | +| Output | 输出 | 出力 | workflow.blocks.end | End/output node | +| Answer | 直接回复 | 回答 | workflow.blocks.answer | | +| Human Input | 人工介入 | 人間の入力 | workflow.blocks.human-input | | +| Webhook Trigger | Webhook 触发器 | Webhook トリガー | workflow.blocks.trigger-webhook | | +| Schedule Trigger | 定时触发器 | スケジュールトリガー | workflow.blocks.trigger-schedule | | +| Plugin Trigger | 插件触发器 | プラグイントリガー | workflow.blocks.trigger-plugin | | +| Knowledge Base | 知识库 | 知識ベース | workflow.blocks.knowledge-index | Knowledge index node | + +### Workflow Controls + +| English (UI) | Chinese (UI) | Japanese (UI) | i18n Key | Notes | +|:-------------|:-------------|:--------------|:---------|:------| +| Publish | 发布 | 公開する | workflow.common.publish | | +| Published | 已发布 | 公開済み | workflow.common.published | Status label | +| Unpublished | 未发布 | 未公開 | workflow.common.unpublished | Status label | +| Preview | 预览 | プレビュー | workflow.common.debugAndPreview | Debug & preview button | +| Test Run | 测试运行 | テスト実行 | workflow.common.run | | +| Run App | 运行 | アプリを実行 | workflow.common.runApp | | +| Features | 功能 | 機能 | workflow.common.features | Panel for web app features | +| Version History | 版本历史 | バージョン履歴 | workflow.common.versionHistory | | +| Workflow as Tool | 发布为工具 | ワークフローをツールとして公開する | workflow.common.workflowAsTool | REVIEW: ZH/JA much longer than EN label | +| Embed Into Site | 嵌入网站 | サイトに埋め込む | workflow.common.embedIntoSite | | +| Conversation Variables | 
会话变量 | 会話変数 | workflow.chatVariable.panelTitle | Panel label | +| Environment Variables | 环境变量 | 環境変数 | workflow.env.envPanelTitle | Panel label | +| System Variables | 系统变量 | システム変数 | workflow.globalVar.title | Panel label | + +### Agent Node Config + +| English (UI) | Chinese (UI) | Japanese (UI) | i18n Key | Notes | +|:-------------|:-------------|:--------------|:---------|:------| +| Agentic Strategy | Agent 策略 | エージェンティック戦略 | workflow.nodes.agent.strategy.label | | +| Query Variable | 查询变量 | 検索変数 | workflow.nodes.knowledgeRetrieval.queryVariable | Knowledge retrieval node config | +| Metadata Filtering | 元数据过滤 | メタデータフィルタ | workflow.nodes.knowledgeRetrieval.metadata.title | Knowledge retrieval node config | + +### Knowledge Retrieval Methods + +| English (UI) | Chinese (UI) | Japanese (UI) | i18n Key | Notes | +|:-------------|:-------------|:--------------|:---------|:------| +| Vector Search | 向量检索 | ベクトル検索 | dataset.retrieval.semantic_search.title | | +| Full-Text Search | 全文检索 | 全文検索 | dataset.retrieval.full_text_search.title | | +| Hybrid Search | 混合检索 | ハイブリッド検索 | dataset.retrieval.hybrid_search.title | | +| Inverted Index | 倒排索引 | 転置インデックス | dataset.retrieval.invertedIndex.title | | +| Weighted Score | 权重设置 | ウェイト設定 | dataset.weightedScore.title | Rerank strategy option | + +### Knowledge Settings + +| English (UI) | Chinese (UI) | Japanese (UI) | i18n Key | Notes | +|:-------------|:-------------|:--------------|:---------|:------| +| External Knowledge Base | 外部知识库 | 外部知識ベース | dataset.externalKnowledgeBase | | +| External API | 外部 API | 外部 API | dataset.externalAPI | | +| Service API | 服务 API | サービスAPI | dataset.serviceApi.title | | +| Multimodal | 多模态 | マルチモーダル | dataset.multimodal | REVIEW: Verify this is user-facing | + +### Chunking Mode Labels + +| English (UI) | Chinese (UI) | Japanese (UI) | i18n Key | Notes | +|:-------------|:-------------|:--------------|:---------|:------| +| General | 通用 | 汎用 | dataset.chunkingMode.general | | +| 
Parent-child | 父子 | 親子 | dataset.chunkingMode.parentChild | | +| Q&A | 问答 | Q&A | dataset.chunkingMode.qa | | +| Graph | 图 | グラフ | dataset.chunkingMode.graph | | +| Full-doc | 全文 | 全体 | dataset.parentMode.fullDoc | Parent chunk mode | +| Paragraph | 段落 | 段落 | dataset.parentMode.paragraph | Parent chunk mode | + +### Document Processing + +| English (UI) | Chinese (UI) | Japanese (UI) | i18n Key | Notes | +|:-------------|:-------------|:--------------|:---------|:------| +| Index Method | 索引方式 | インデックス方法 | dataset-creation.stepTwo.indexMode | | +| High Quality | 高质量 | 高品質 | dataset-creation.stepTwo.qualified | Index quality level | +| Economical | 经济 | 経済的 | dataset-creation.stepTwo.economical | Index quality level | +| Chunk Settings | 分段设置 | チャンク設定 | dataset-creation.stepTwo.segmentation | | +| Text Pre-processing Rules | 文本预处理规则 | テキストの前処理ルール | dataset-creation.stepTwo.rules | | +| Automatic | 自动分段与清洗 | 自動 | dataset-creation.stepTwo.auto | Processing mode | +| Custom | 自定义 | カスタム | dataset-creation.stepTwo.custom | Processing mode | +| Preview Chunk | 预览块 | チャンクをプレビュー | dataset-creation.stepTwo.previewChunk | | +| Data Source | 选择数据源 | データソース | dataset-creation.steps.one | Wizard step 1 label | +| Document Processing | 文本分段与清洗 | テキスト進行中 | dataset-creation.steps.two | Wizard step 2 label | +| Execute & Finish | 处理并完成 | 実行と完成 | dataset-creation.steps.three | Wizard step 3 label | +| Import from file | 导入已有文本 | テキストファイルからインポート | dataset-creation.stepOne.dataSourceType.file | | +| Sync from Notion | 同步自 Notion 内容 | Notion から同期 | dataset-creation.stepOne.dataSourceType.notion | | +| Sync from website | 同步自 Web 站点 | ウェブサイトから同期 | dataset-creation.stepOne.dataSourceType.web | | + +### Document Status Labels + +| English (UI) | Chinese (UI) | Japanese (UI) | i18n Key | Notes | +|:-------------|:-------------|:--------------|:---------|:------| +| Enabled | 已启用 | 有効 | dataset-documents.list.status.enabled | | +| Disabled | 已禁用 | 無効 | dataset-documents.list.status.disabled 
| | +| Archived | 已归档 | アーカイブ済み | dataset-documents.list.status.archived | | +| Available | 可用 | 利用可能 | dataset-documents.list.status.available | | +| Indexing | 索引中 | インデックス化中 | dataset-documents.list.status.indexing | | +| Queuing | 排队中 | キューイング中 | dataset-documents.list.status.queuing | | +| Paused | 已暂停 | 一時停止中 | dataset-documents.list.status.paused | | +| Error | 错误 | エラー | dataset-documents.list.status.error | | + +### Document Embedding Modes + +| English (UI) | Chinese (UI) | Japanese (UI) | i18n Key | Notes | +|:-------------|:-------------|:--------------|:---------|:------| +| Chunking Setting | 分段模式 | チャンキングモード | dataset-documents.embedding.mode | Section heading | +| High-quality mode | 高质量模式 | 高品質モード | dataset-documents.embedding.highQuality | | +| Economy mode | 经济模式 | 経済モード | dataset-documents.embedding.economy | | + diff --git a/writing-guides/index.md b/writing-guides/index.md new file mode 100644 index 000000000..0776d7efa --- /dev/null +++ b/writing-guides/index.md @@ -0,0 +1,21 @@ +# Documentation Task Guide + +## Which Skill to Use + +| Task | Skill | Paths | References | +|:-----|:------|:------|:-----------| +| Write or improve a user guide | dify-docs-guides | `en/use-dify/`, `en/develop-plugin/`, `en/self-host/` | style-guide, formatting-guide, glossary | +| Write or audit API reference specs | dify-docs-api-reference | `en/api-reference/` | style-guide, formatting-guide, glossary | +| Write or audit env var docs | dify-docs-env-vars | `en/self-host/configuration/environments.mdx` | style-guide, formatting-guide, glossary | + +When paths overlap, the most specific match takes precedence. + +## Without a Skill + +If no skill matches your task (e.g., fixing a typo, updating navigation), follow the style guide and formatting guide directly. 
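The "most specific match takes precedence" rule above can be sketched as longest-prefix routing. This is a hypothetical illustration, not a tool that exists in the repo; the `skill_for` name is invented, and the path table simply mirrors the table above:

```python
# Hypothetical sketch of the "most specific match wins" routing rule.
# The path->skill mapping mirrors the table in writing-guides/index.md.
SKILL_PATHS = {
    "en/use-dify/": "dify-docs-guides",
    "en/develop-plugin/": "dify-docs-guides",
    "en/self-host/": "dify-docs-guides",
    "en/api-reference/": "dify-docs-api-reference",
    "en/self-host/configuration/environments.mdx": "dify-docs-env-vars",
}

def skill_for(path: str):
    """Return the skill whose registered path most specifically matches `path`."""
    matches = [(p, s) for p, s in SKILL_PATHS.items() if path.startswith(p)]
    if not matches:
        # No skill applies: follow the style and formatting guides directly.
        return None
    # The longest registered prefix is the most specific match.
    return max(matches, key=lambda ps: len(ps[0]))[1]
```

For example, `en/self-host/configuration/environments.mdx` matches both the `en/self-host/` prefix and its own full path; the longer (more specific) entry wins, so the env-vars skill is selected.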
+ +## Reference Files + +- **style-guide.md** — Voice, tone, writing patterns, callout usage +- **formatting-guide.md** — MDX formatting, Mintlify components, headings, lists, code +- **glossary.md** — Standardized terminology with Chinese and Japanese translations diff --git a/writing-guides/style-guide.md b/writing-guides/style-guide.md new file mode 100644 index 000000000..77fc4227e --- /dev/null +++ b/writing-guides/style-guide.md @@ -0,0 +1,80 @@ +# Dify Documentation Style Guide + +## Voice and Tone + +Use **active voice** whenever natural and clear. Passive voice is acceptable when the actor is unknown or when it reads more naturally. + +Be conversational but professional. Prefer everyday language over formal equivalents—"ask questions" over "submit queries". Avoid robotic, AI-sounding phrasing. + +This documentation serves both developers and non-technical users. Write to be accessible to both. + +## Clarity and Conciseness + +Express ideas clearly and concisely. Every sentence should add value. Cut unnecessary words without losing meaning, but don't sacrifice readability for minimalism—the goal is the shortest version that still reads naturally. + +Choose precision when it prevents confusion. A specific, descriptive term is better than a shorthand that assumes shared context with the reader. + +When a heading already states the topic, the first sentence should add new information—not restate the heading. + +## Formatting Principles + +Use **Title Case** for all headings. + +Prefer prose over bullet points when explaining concepts or processes. Use bullet points only for genuinely discrete, enumerable items. Write in paragraphs when ideas connect. + +Use tabs (not numbered lists) when presenting parallel options users choose between. Numbered lists imply sequence; tabs signal alternatives. 
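The tabs-over-numbered-lists rule above looks roughly like this in Mintlify MDX (`<Tabs>`/`<Tab>` are standard Mintlify components; the two deployment options shown are illustrative, not taken from the docs):

```mdx
<Tabs>
  <Tab title="Docker Compose">
    Run the bundled Compose stack from the `docker/` directory.
  </Tab>
  <Tab title="Local Source">
    Start the API and web services separately from a source checkout.
  </Tab>
</Tabs>
```

A numbered list here would wrongly suggest readers perform both steps in order; tabs signal that they pick one.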
+
+## Callout Usage
+
+- **Info**: General informational content—helpful context, version-specific or deployment-specific details
+- **Tip**: Helpful suggestions or shortcuts
+- **Note**: Important information that requires attention—missing it can cause complications
+- **Warning**: Actions that could cause errors or data loss
+
+Place critical limitations at the start of a section when users need them before taking action, not only at the end.
+
+**Avoid overuse.** Too many callouts dilute their importance and interrupt reading flow. When a section accumulates multiple callouts, restructure into flowing paragraphs with inline bold text instead. Reserve callout visual weight for genuinely critical information.
+
+These are principles, not absolute rules. Apply them when they improve clarity; use editorial judgment when they don't.
+
+## Patterns to Use
+
+**Direct instructions.** Use the imperative for required actions: "Click **Generate** to create the output." Reserve "you can" for optional actions to signal choice.
+
+**Task-oriented headings.** "Import Your Data" instead of "Data Import Feature."
+
+**Location-first instructions.** When an operation involves a specific UI location, name the location before the action: "In the **Settings** panel, enable the toggle." This prevents users from completing an action in the wrong place.
+
+**User outcomes over technical mechanisms.** Focus on what users achieve, not how the system works internally. "Answer follow-up questions coherently" (outcome) over "maintain conversational context across turns" (mechanism).
+
+**Problem → solution structure.** Introduce features by stating the problem they solve, then the solution.
+
+**Purpose-oriented descriptions.** Describe actions with their purpose: "Add comments to share ideas and discuss design decisions" is more useful than "Click the comment icon to add comments."
+
+**Progressive disclosure.** Lead with the essential, add details as needed. Don't over-segment simple tasks into excessive steps.
+
+**Natural transitions.** Connect ideas smoothly. Avoid mechanical connectors or repetitive sentence openers across a section.
+
+**Parallel structure for dependencies.** Keep interdependent configurations in one sentence. Splitting them implies a sequence or suggests one is more important than the other.
+
+**Decision-making information.** Provide applicable scenarios and trade-offs rather than prescribing specific configurations. Users have diverse needs; give them what they need to make informed choices.
+
+**Genuine insight.** Add the "why" and "how it connects", not just a reorganization of information already visible in the product.
+
+## Patterns to Avoid
+
+**Excessive bullets.** Don't fragment continuous reasoning into bullet lists. If the items connect, use prose.
+
+**Passive voice overuse.** "The file is uploaded by the user" → "You upload the file."
+
+**Feature-centric framing.** "This feature allows you to..." → "You can..." When an action is optional, "you can" is preferable; when it's required, use the imperative.
+
+**Redundant phrases.** Cut "in order to", "it should be noted that", "please note that", and similar filler.
+
+**Repeating context.** Don't restate the scenario conditions established by the section heading or earlier prose. If a section is titled "Configure Webhooks", individual steps shouldn't keep saying "to configure webhooks."
+
+**Repeating the UI.** Don't describe interface elements users can see directly—default values, field labels, button names. Documentation provides context and rationale not visible in the UI.
+
+**Repetitive structures.** Vary sentence patterns across related sections to avoid a mechanical feel.
+
+**Over-simplification.** Don't sacrifice clarity for brevity. Choose precision when a specific term prevents confusion, even if it's longer.
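The "Redundant phrases" rule above is mechanical enough to check automatically. As a minimal sketch (the `find_filler` name and the phrase list are illustrative, not an existing repo tool):

```python
import re

# Filler phrases called out under "Patterns to Avoid"; extend as needed.
FILLER = ["in order to", "it should be noted that", "please note that"]

def find_filler(text: str):
    """Return (line_number, phrase) pairs for each filler phrase found."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for phrase in FILLER:
            if re.search(re.escape(phrase), line, flags=re.IGNORECASE):
                hits.append((lineno, phrase))
    return hits
```

A check like this could run over `en/**/*.mdx` in a pre-commit hook, flagging lines for a human to rewrite rather than editing them automatically.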