From 70892536f5dd2c3adfc36d749079dfe78b64bfe2 Mon Sep 17 00:00:00 2001 From: DrMelone <27028174+Classic298@users.noreply.github.com> Date: Mon, 2 Feb 2026 17:01:13 +0100 Subject: [PATCH 01/10] analytics --- docs/features/analytics/index.mdx | 342 ++++++++++++++++++++++++++++++ docs/features/index.mdx | 2 + 2 files changed, 344 insertions(+) create mode 100644 docs/features/analytics/index.mdx diff --git a/docs/features/analytics/index.mdx b/docs/features/analytics/index.mdx new file mode 100644 index 000000000..b9fbe81e6 --- /dev/null +++ b/docs/features/analytics/index.mdx @@ -0,0 +1,342 @@ +--- +sidebar_position: 1050 +title: "Analytics" +--- + +# Analytics + +The **Analytics** feature in Open WebUI provides administrators with comprehensive insights into usage patterns, token consumption, and model performance across their instance. This powerful tool helps you understand how your users are interacting with AI models and make data-driven decisions about resource allocation and model selection. + +:::info Admin-Only Feature +Analytics is only accessible to users with **admin** role. Access it via **Admin Panel > Analytics**. +::: + +## Overview + +The Analytics dashboard gives you a bird's-eye view of your Open WebUI instance's activity, including: + +- **Message volume** across different models and time periods +- **Token usage** tracking for cost estimation and resource planning +- **User activity** patterns to understand engagement +- **Time-series data** showing trends over hours, days, or months + +All analytics data is derived from the message history stored in your instance's database. When the Analytics feature is enabled, Open WebUI automatically tracks and indexes messages to provide fast, queryable insights. + +--- + +## Accessing Analytics + +1. Log in with an **admin** account +2. Navigate to **Admin Panel** (click your profile icon → Admin Panel) +3. Click on the **Analytics** tab in the admin navigation + +--- + +## Dashboard Features + +### Time Period Selection + +At the top right of the Analytics dashboard, you can filter all data by time period: + +- **Last 24 hours** - Hourly granularity for real-time monitoring +- **Last 7 days** - Daily overview of the past week +- **Last 30 days** - Monthly snapshot +- **Last 90 days** - Quarterly trends +- **All time** - Complete historical data + +All metrics on the page update automatically when you change the time period. + +### Summary Statistics + +The dashboard header displays key metrics for the selected time period: + +- **Total Messages** - Number of assistant responses generated +- **Total Tokens** - Sum of all input and output tokens processed +- **Total Chats** - Number of unique conversations +- **Total Users** - Number of users who sent messages + +:::note Message Counting +Analytics counts **assistant responses** rather than user messages. This provides a more accurate measure of AI model usage and token consumption. +::: + +### Message Timeline Chart + +The interactive timeline chart visualizes message volume over time, broken down by model. 
Key features: + +- **Hourly or Daily granularity** - Automatically adjusts based on selected time period +- **Multi-model visualization** - Shows up to 8 models with distinct colors +- **Hover tooltips** - Display exact counts and percentages for each model at any point in time +- **Trend identification** - Quickly spot usage patterns, peak hours, and model adoption + +This chart helps you: +- Identify busy periods for capacity planning +- Track model adoption after deployment +- Detect unusual activity spikes +- Monitor the impact of changes or announcements + +### Model Usage Table + +A detailed breakdown of how each model is being used: + +| Column | Description | +|--------|-------------| +| **#** | Rank by message count | +| **Model** | Model name with icon | +| **Messages** | Total assistant responses generated | +| **Tokens** | Total tokens (input + output) consumed | +| **%** | Percentage share of total messages | + +**Features:** +- **Sortable columns** - Click column headers to sort by name or message count +- **Model icons** - Visual identification with profile images +- **Token tracking** - See which models consume the most resources + +**Use cases:** +- Identify your most popular models +- Calculate cost per model (by multiplying tokens by provider rates) +- Decide which models to keep or remove +- Plan infrastructure upgrades based on usage + +### User Activity Table + +Track user engagement and token consumption per user: + +| Column | Description | +|--------|-------------| +| **#** | Rank by activity | +| **User** | Username with profile picture | +| **Messages** | Total messages sent by this user | +| **Tokens** | Total tokens consumed by this user | + +**Features:** +- **Sortable columns** - Organize by name or activity level +- **User identification** - Profile pictures and display names +- **Token attribution** - See resource consumption per user + +**Use cases:** +- Monitor power users and their token consumption +- Identify inactive or low-usage accounts +- Plan user quotas or rate limits +- Calculate per-user costs for billing purposes + +--- + +## Token Usage Tracking + +### What Are Tokens? + +Tokens are the units that language models use to process text. Both the **input** (your prompt) and **output** (the model's response) consume tokens. Most AI providers charge based on token usage, making token tracking essential for cost management. + +### How Token Tracking Works + +Open WebUI automatically captures token usage from model responses and stores it with each message. The Analytics feature aggregates this data to show: + +- **Input tokens** - Tokens in user prompts and context +- **Output tokens** - Tokens in model responses +- **Total tokens** - Sum of input and output + +Token data is normalized across different model providers (OpenAI, Ollama, llama.cpp, etc.) to provide consistent metrics regardless of which backend you're using. 
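+
+As a rough illustration of what this normalization involves (the exact fields Open WebUI reads internally may differ), OpenAI-compatible backends report `prompt_tokens`/`completion_tokens`, while Ollama reports `prompt_eval_count`/`eval_count`:
+
+```python
+def normalize_usage(response: dict) -> dict:
+    """Map provider-specific usage fields onto input/output/total tokens (illustrative sketch)."""
+    usage = response.get("usage", {})
+    if "prompt_tokens" in usage:
+        # OpenAI-compatible backends
+        input_tokens = usage.get("prompt_tokens", 0)
+        output_tokens = usage.get("completion_tokens", 0)
+    else:
+        # Ollama-style responses
+        input_tokens = response.get("prompt_eval_count", 0)
+        output_tokens = response.get("eval_count", 0)
+    return {
+        "input_tokens": input_tokens,
+        "output_tokens": output_tokens,
+        "total_tokens": input_tokens + output_tokens,
+    }
+```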
+ +### Token Usage Metrics + +The **Token Usage** section (accessible via the Tokens endpoint or dashboard) provides: + +- **Per-model token breakdown** - Input, output, and total tokens for each model +- **Total token consumption** - Instance-wide token usage +- **Message count correlation** - Tokens per message for efficiency analysis + +:::tip Cost Estimation +To estimate costs, multiply the token counts by your provider's pricing: +``` +Cost = (input_tokens × input_price) + (output_tokens × output_price) +``` + +Example for GPT-4: +- Input: 1,000,000 tokens × $0.03/1K = $30 +- Output: 500,000 tokens × $0.06/1K = $30 +- **Total: $60** +::: + +--- + +## Use Cases + +### 1. Resource Planning + +**Scenario:** You're running Open WebUI for a team and need to plan infrastructure capacity. + +**How Analytics helps:** +- View the **Message Timeline** to identify peak usage hours +- Check **Model Usage** to see which models need more resources +- Monitor **Token Usage** to estimate future costs +- Track **User Activity** to plan for team growth + +### 2. Model Evaluation + +**Scenario:** You've deployed several models and want to know which ones your users prefer. + +**How Analytics helps:** +- Compare **message counts** across models to see adoption rates +- Check **token efficiency** (tokens per message) to identify verbose models +- Monitor **trends** in the timeline chart after introducing new models +- Combine with the [Evaluation feature](../evaluation/index.mdx) for quality insights + +### 3. Cost Management + +**Scenario:** You're using paid API providers and need to control costs. + +**How Analytics helps:** +- Track **total token consumption** by model and user +- Identify **high-usage users** for quota discussions +- Compare **token costs** across different model providers +- Set up regular reviews using time-period filters + +### 4. User Engagement + +**Scenario:** You want to understand how your team is using AI tools. + +**How Analytics helps:** +- Monitor **active users** vs. registered accounts +- Identify **power users** who might need support or training +- Track **adoption trends** over time +- Correlate usage with team initiatives or training sessions + +### 5. Compliance & Auditing + +**Scenario:** Your organization requires usage reporting for compliance. + +**How Analytics helps:** +- Generate **activity reports** for specific time periods +- Track **user attribution** for all AI interactions +- Monitor **model usage** for approved vs. unapproved models +- Export data via API for external reporting tools + +--- + +## Technical Details + +### Data Storage + +Analytics data is stored in the `chat_message` table, which contains: + +- **Message content** - User and assistant messages +- **Metadata** - Model ID, user ID, timestamps +- **Token usage** - Input, output, and total tokens +- **Relationships** - Links to parent messages and chats + +When you enable Analytics (via migration), Open WebUI: +1. Creates the `chat_message` table with optimized indexes +2. **Backfills existing messages** from your chat history +3. 
**Dual-writes** new messages to both the chat JSON and the message table + +This dual-write approach ensures: +- **Backward compatibility** - Existing features continue working +- **Fast queries** - Analytics doesn't impact chat performance +- **Data consistency** - All messages are captured + +### Database Indexes + +The following indexes optimize analytics queries: + +- `chat_id` - Fast lookup of all messages in a chat +- `user_id` - Quick user activity reports +- `model_id` - Efficient model usage queries +- `created_at` - Time-range filtering +- Composite indexes for common query patterns + +### API Endpoints + +For advanced users and integrations, Analytics provides REST API endpoints: + +``` +GET /api/v1/analytics/summary +GET /api/v1/analytics/models +GET /api/v1/analytics/users +GET /api/v1/analytics/messages +GET /api/v1/analytics/daily +GET /api/v1/analytics/tokens +``` + +All endpoints support `start_date` and `end_date` parameters (Unix timestamps) for time-range filtering. + +:::tip API Access +All Analytics endpoints require admin authentication. Include your admin bearer token: +```bash +curl -H "Authorization: Bearer YOUR_ADMIN_TOKEN" \ + https://your-instance.com/api/v1/analytics/summary +``` +::: + +--- + +## Privacy & Data Considerations + +### What Gets Tracked? + +Analytics tracks: +- ✅ Message timestamps and counts +- ✅ Token usage per message +- ✅ Model IDs and user IDs +- ✅ Chat IDs and message relationships + +Analytics **does not** track: +- ❌ Message content display in the dashboard (only metadata) +- ❌ External sharing or exports +- ❌ Individual message content outside the database + +### Data Retention + +Analytics data follows your instance's chat retention policy. When you delete: +- **A chat** - All associated messages are removed from analytics +- **A user** - All their messages are disassociated +- **Message history** - Analytics data is also cleared + +--- + +## Frequently Asked Questions + +### Why are message counts different from what I expected? + +Analytics counts **assistant responses**, not user messages. If a chat has 10 user messages and 10 assistant responses, the count is 10. This provides a more accurate measure of AI usage and token consumption. + +### How accurate is token tracking? + +Token accuracy depends on your model provider: +- **OpenAI/Anthropic** - Exact counts from API responses +- **Ollama** - Accurate for models with token reporting +- **llama.cpp** - Reports tokens when available +- **Custom providers** - Depends on implementation + +Missing token data appears as 0 in analytics. + +### Can I export analytics data? + +Yes, via the API endpoints. Use tools like `curl`, Python scripts, or BI tools to fetch and export data: + +```bash +curl -H "Authorization: Bearer TOKEN" \ + "https://instance.com/api/v1/analytics/summary?start_date=1704067200&end_date=1706745600" \ + > analytics_export.json +``` + +--- + +## Summary + +Open WebUI's Analytics feature transforms your instance into a data-driven platform by providing: + +- 📊 **Real-time insights** into model and user activity +- 💰 **Token tracking** for cost management and optimization +- 📈 **Trend analysis** to understand usage patterns over time +- 👥 **User engagement** metrics for community building +- 🔒 **Privacy-focused** design keeping all data on your instance + +Whether you're managing a personal instance or a large organizational deployment, Analytics gives you the visibility needed to optimize performance, control costs, and better serve your users. 
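+
+To put this into practice, the sketch below pulls the per-model breakdown from the Analytics API documented above and applies the cost formula from the Token Usage section. The JSON field names (`model_id`, `input_tokens`, `output_tokens`) and list-shaped response are assumptions for illustration; check your instance's actual response and substitute your provider's real prices:
+
+```python
+import requests
+
+BASE_URL = "https://your-instance.com/api/v1/analytics"
+HEADERS = {"Authorization": "Bearer YOUR_ADMIN_TOKEN"}
+# (input price, output price) in USD per 1K tokens - example rates only
+PRICES = {"gpt-4": (0.03, 0.06)}
+
+
+def model_cost_report(start_date: int, end_date: int) -> dict:
+    """Estimate cost per model for a Unix-timestamp date range."""
+    resp = requests.get(
+        f"{BASE_URL}/models",
+        headers=HEADERS,
+        params={"start_date": start_date, "end_date": end_date},
+        timeout=30,
+    )
+    resp.raise_for_status()
+    report = {}
+    for entry in resp.json():
+        model = entry.get("model_id", "unknown")
+        input_price, output_price = PRICES.get(model, (0.0, 0.0))
+        report[model] = (
+            entry.get("input_tokens", 0) / 1000 * input_price
+            + entry.get("output_tokens", 0) / 1000 * output_price
+        )
+    return report
+
+
+if __name__ == "__main__":
+    print(model_cost_report(1704067200, 1706745600))
+```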
+ +--- + +## Related Features + +- [**Evaluation**](../evaluation/index.mdx) - Measure model quality through user feedback +- [**RBAC**](../rbac/index.mdx) - Control access to models and features per user +- [**Data Controls**](../data-controls/index.mdx) - Manage chat history and exports diff --git a/docs/features/index.mdx b/docs/features/index.mdx index f001caf84..9fbc8a058 100644 --- a/docs/features/index.mdx +++ b/docs/features/index.mdx @@ -426,6 +426,8 @@ import { TopBanners } from "@site/src/components/TopBanners"; - 👥 **Active Users Indicator**: Monitor the number of active users and which models are being utilized by whom to assist in gauging when performance may be impacted due to a high number of users. +- 📊 **Analytics Dashboard**: Comprehensive usage insights for administrators including message volume, token consumption, user activity, and model performance metrics with interactive time-series charts and detailed breakdowns. Track costs, identify trends, and make data-driven decisions about resource allocation. [Learn more about Analytics](/features/analytics). + - 🔒 **Default Sign-Up Role**: Specify the default role for new sign-ups to `pending`, `user`, or `admin`, providing flexibility in managing user permissions and access levels for new users. - 🤖 **Bulk Model Management & Filtering**: Administrators can effortlessly manage large model collections from external providers with tools to bulk enable/disable models and filter the admin list by status (Enabled, Disabled, Hidden, etc.) to maintain a clean workspace. [Learn about Admin Models](/features/workspace/models#global-model-management-admin). From 5c3f2562c0a291465bf9cdcc62937824f770ba60 Mon Sep 17 00:00:00 2001 From: DrMelone <27028174+Classic298@users.noreply.github.com> Date: Tue, 3 Feb 2026 21:26:28 +0100 Subject: [PATCH 02/10] redis --- docs/getting-started/env-configuration.mdx | 8 +++- docs/tutorials/integrations/redis.md | 54 ++++++++++++++++++++++ 2 files changed, 61 insertions(+), 1 deletion(-) diff --git a/docs/getting-started/env-configuration.mdx b/docs/getting-started/env-configuration.mdx index 4a1b5f996..5237c1980 100644 --- a/docs/getting-started/env-configuration.mdx +++ b/docs/getting-started/env-configuration.mdx @@ -5804,7 +5804,7 @@ If you're running Open WebUI as a single instance with `UVICORN_WORKERS=1` (the - Type: `bool` - Default: `False` -- Description: Connect to a Redis Cluster instead of a single instance or using Redis Sentinels. If `True`, `REDIS_URL` must also be defined. +- Description: Connect to a Redis Cluster instead of a single instance or using Redis Sentinels. If `True`, `REDIS_URL` must also be defined. This mode is compatible with AWS Elasticache Serverless and other Redis Cluster implementations. :::info @@ -5812,6 +5812,12 @@ This option has no effect if `REDIS_SENTINEL_HOSTS` is defined. ::: +:::tip OpenTelemetry Support + +Redis Cluster mode is fully compatible with OpenTelemetry instrumentation. When `ENABLE_OTEL` is enabled, Redis operations are properly traced regardless of whether you're using a single Redis instance or Redis Cluster mode. 
+ +::: + #### `REDIS_KEY_PREFIX` - Type: `str` diff --git a/docs/tutorials/integrations/redis.md b/docs/tutorials/integrations/redis.md index 47d49577f..9db6a0b26 100644 --- a/docs/tutorials/integrations/redis.md +++ b/docs/tutorials/integrations/redis.md @@ -222,6 +222,41 @@ To enhance resilience during Sentinel failover—the window when a new master is - **`REDIS_SENTINEL_MAX_RETRY_COUNT`**: Sets the maximum number of retries for Redis operations when using Sentinel (Default: `2`). - **`REDIS_RECONNECT_DELAY`**: Adds an optional delay in **milliseconds** between retry attempts (e.g., `REDIS_RECONNECT_DELAY=500`). This prevents tight retry loops that may otherwise overwhelm the event loop or block the application before a new master is ready. +### Redis Cluster Mode + +For deployments using Redis Cluster (including managed services like **AWS Elasticache Serverless**), enable cluster mode with the following configuration: + +```bash +REDIS_URL="redis://your-cluster-endpoint:6379/0" +REDIS_CLUSTER="true" +``` + +:::info + +**Key Configuration Notes** + +- `REDIS_CLUSTER` enables cluster-aware connection handling +- The `REDIS_URL` should point to your cluster's configuration endpoint +- This option has no effect if `REDIS_SENTINEL_HOSTS` is defined (Sentinel takes precedence) +- When using cluster mode, the `REDIS_KEY_PREFIX` is automatically formatted as `{prefix}:` to ensure multi-key operations target the same hash slot + +::: + +#### AWS Elasticache Serverless + +For AWS Elasticache Serverless deployments, use the following configuration: + +```bash +REDIS_URL="rediss://your-elasticache-endpoint.serverless.use1.cache.amazonaws.com:6379/0" +REDIS_CLUSTER="true" +``` + +Note the `rediss://` scheme (with double 's') which enables TLS, required for Elasticache Serverless. + +#### OpenTelemetry Support + +Redis Cluster mode is fully compatible with OpenTelemetry instrumentation. When `ENABLE_OTEL` is enabled, Redis operations are properly traced regardless of whether you're using a single Redis instance, Redis Sentinel, or Redis Cluster mode. + ### Complete Example Configuration Here's a complete example showing all Redis-related environment variables: @@ -245,6 +280,25 @@ REDIS_KEY_PREFIX="open-webui" For Redis Sentinel deployments specifically, ensure `REDIS_SOCKET_CONNECT_TIMEOUT` is set to prevent application hangs during master failover. 
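+
+Before pointing Open WebUI at your deployment, it can save time to confirm that the URL you plan to use is actually reachable from the host or container network. A minimal sketch using the `redis` Python package (swap in `redis.cluster.RedisCluster.from_url` when testing a cluster endpoint):
+
+```python
+import redis
+
+# Use the same value you intend to set as REDIS_URL; the rediss:// scheme implies TLS
+client = redis.Redis.from_url("redis://your-redis-host:6379/0", socket_connect_timeout=5)
+print(client.ping())  # True means the instance is reachable with these settings
+```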
+#### Redis Cluster Mode Example + +For Redis Cluster deployments (including AWS Elasticache Serverless): + +```bash +# Required for Redis Cluster +REDIS_URL="rediss://your-cluster-endpoint:6379/0" +REDIS_CLUSTER="true" + +# Required for websocket support +ENABLE_WEBSOCKET_SUPPORT="true" +WEBSOCKET_MANAGER="redis" +WEBSOCKET_REDIS_URL="rediss://your-cluster-endpoint:6379/0" +WEBSOCKET_REDIS_CLUSTER="true" + +# Optional +REDIS_KEY_PREFIX="open-webui" +``` + ### Docker Run Example When running Open WebUI using Docker, connect it to the same Docker network and include all necessary Redis variables: From 7a3a25e23e093c17901794b55fe24bf329a07d88 Mon Sep 17 00:00:00 2001 From: DrMelone <27028174+Classic298@users.noreply.github.com> Date: Wed, 4 Feb 2026 17:53:44 +0100 Subject: [PATCH 03/10] Update env-configuration.mdx --- docs/getting-started/env-configuration.mdx | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/docs/getting-started/env-configuration.mdx b/docs/getting-started/env-configuration.mdx index 5237c1980..d1be55e27 100644 --- a/docs/getting-started/env-configuration.mdx +++ b/docs/getting-started/env-configuration.mdx @@ -4048,9 +4048,21 @@ Strictly return in JSON format: - Type: `str` - Default: `512x512` -- Description: Sets the default output dimensions for generated images in WIDTHxHEIGHT format (e.g., `1024x1024`). +- Description: Sets the default output dimensions for generated images in WIDTHxHEIGHT format (e.g., `1024x1024`). Set to `auto` to let the model determine the appropriate size (only supported by models matching `IMAGE_AUTO_SIZE_MODELS_REGEX_PATTERN`). - Persistence: This environment variable is a `PersistentConfig` variable. +#### `IMAGE_AUTO_SIZE_MODELS_REGEX_PATTERN` + +- Type: `str` +- Default: `^gpt-image` +- Description: A regex pattern to match model names that support `IMAGE_SIZE = "auto"`. When a model matches this pattern, the `auto` size option becomes available, allowing the model to determine the appropriate output dimensions. By default, only models starting with `gpt-image` (e.g., `gpt-image-1`) are matched. + +#### `IMAGE_URL_RESPONSE_MODELS_REGEX_PATTERN` + +- Type: `str` +- Default: `^gpt-image` +- Description: A regex pattern to match model names that return image URLs directly instead of base64-encoded data. Models matching this pattern will not include `response_format: b64_json` in API requests. By default, only models starting with `gpt-image` are matched. For other models, Open WebUI requests base64 responses and handles the conversion internally. + #### `IMAGE_STEPS` - Type: `int` From 8709187535a01a0471927ef15dcadcdd3ff8eefb Mon Sep 17 00:00:00 2001 From: DrMelone <27028174+Classic298@users.noreply.github.com> Date: Wed, 4 Feb 2026 18:18:32 +0100 Subject: [PATCH 04/10] filters --- .../plugin/development/reserved-args.mdx | 21 ++ docs/features/plugin/functions/filter.mdx | 206 ++++++++++++++++-- docs/getting-started/api-endpoints.md | 75 +++++++ 3 files changed, 289 insertions(+), 13 deletions(-) diff --git a/docs/features/plugin/development/reserved-args.mdx b/docs/features/plugin/development/reserved-args.mdx index 8b36d2f63..133381ed7 100644 --- a/docs/features/plugin/development/reserved-args.mdx +++ b/docs/features/plugin/development/reserved-args.mdx @@ -132,6 +132,27 @@ A `dict` with wide ranging information about the chat, model, files, etc. 
+:::tip Detecting Request Source + +The `interface` field indicates where the request originated: +- **`"open-webui"`** - Request came from the web interface +- **Other/missing** - Request likely came from a direct API call + +For direct API calls, some fields like `chat_id`, `message_id`, and `session_id` may be absent or `null` if not explicitly provided by the API client. You can use this to distinguish between WebUI and API requests in your filters: + +```python +def inlet(self, body: dict, __metadata__: dict = None) -> dict: + if __metadata__ and __metadata__.get("interface") == "open-webui": + # Request from WebUI + pass + else: + # Direct API request + pass + return body +``` + +::: + ### `__model__` A `dict` with information about the model. diff --git a/docs/features/plugin/functions/filter.mdx b/docs/features/plugin/functions/filter.mdx index aa365f3ec..c1815e882 100644 --- a/docs/features/plugin/functions/filter.mdx +++ b/docs/features/plugin/functions/filter.mdx @@ -243,10 +243,15 @@ Understanding the difference between these two types is key to using the filter - Do **not** show up in the chat integrations menu (⚙️ icon) **Use Cases:** -- Content moderation (always filter inappropriate content) -- PII scrubbing (always remove sensitive data) -- System-level transformations (always apply certain formatting) -- Security/compliance filters +- **Content moderation** - Filter profanity, hate speech, or inappropriate content +- **PII scrubbing** - Automatically redact emails, phone numbers, SSNs, credit card numbers +- **Prompt injection detection** - Block attempts to manipulate the system prompt +- **Input/output logging** - Track all conversations for audit or analytics +- **Cost tracking** - Estimate and log token usage for billing +- **Rate limiting** - Enforce request limits per user or globally +- **Language enforcement** - Ensure responses are in a specific language +- **Company policy enforcement** - Inject legal disclaimers or compliance notices +- **Model routing** - Redirect requests to different models based on content **Example:** ```python @@ -271,12 +276,16 @@ class ContentModerationFilter: - `defaultFilterIds` controls their initial state (ON or OFF) **Use Cases:** -- Web search integration (user decides when to search) -- Citation mode (user controls when to require sources) -- Verbose output mode (user toggles detailed responses) -- Translation filters (user enables when needed) -- Code formatting (user chooses when to apply) -- Thinking/reasoning toggle (user controls whether to show model reasoning) +- **Web search integration** - User decides when to search the web for context +- **Citation mode** - User controls when to require sources in responses +- **Verbose/detailed mode** - User toggles between concise and detailed responses +- **Translation filters** - User enables translation to/from specific languages +- **Code formatting** - User chooses when to apply syntax highlighting or linting +- **Thinking/reasoning toggle** - Show or hide model's chain-of-thought reasoning +- **Markdown rendering** - Toggle between raw text and formatted output +- **Anonymization mode** - User enables when discussing sensitive topics +- **Expert mode** - Inject domain-specific context (legal, medical, technical) +- **Creative writing mode** - Adjust temperature and style for creative tasks **Example:** ```python @@ -342,6 +351,149 @@ Here's the complete flow from admin configuration to filter execution: --- +### 📡 Filter Behavior with API Requests + +When using Open WebUI's API 
endpoints directly (e.g., via `curl` or external applications), filters behave differently than when the request comes from the web interface. Understanding these differences is crucial for building effective filters. + +#### Key Behavioral Differences + +| Function | WebUI Request | Direct API Request | +|----------|--------------|-------------------| +| `inlet()` | ✅ Always called | ✅ Always called | +| `stream()` | ✅ Called during streaming | ✅ Called during streaming | +| `outlet()` | ✅ Called after response | ❌ **NOT called** by default | +| `__event_emitter__` | ✅ Shows UI feedback | ⚠️ Runs but no UI to display | + +:::warning Outlet Not Called for API Requests +The `outlet()` function is **only triggered for WebUI chat requests**, not for direct API calls to `/api/chat/completions`. This is because `outlet()` is invoked by the WebUI's `/api/chat/completed` endpoint after the chat is finished. + +If you need `outlet()` processing for API requests, your API client must call `/api/chat/completed` after receiving the full response. +::: + +#### Triggering Outlet for API Requests + +To invoke `outlet()` filters for API requests, your client must make a second request to `/api/chat/completed` after receiving the complete response: + +```bash +# After receiving the full response from /api/chat/completions, call: +curl -X POST http://localhost:3000/api/chat/completed \ + -H "Authorization: Bearer YOUR_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{ + "model": "llama3.1", + "messages": [ + {"role": "user", "content": "Hello"}, + {"role": "assistant", "content": "Hi there! How can I help you?"} + ], + "chat_id": "optional-chat-id", + "session_id": "optional-session-id" + }' +``` + +:::tip +Include the full conversation in `messages`, including the assistant's response. The `chat_id` and `session_id` are optional but recommended for proper logging and state tracking. 
+::: + +#### Detecting API vs WebUI Requests + +You can detect whether a request originates from the WebUI or a direct API call by checking the `__metadata__` argument: + +```python +def inlet(self, body: dict, __metadata__: dict = None) -> dict: + # Check if request is from WebUI + interface = __metadata__.get("interface") if __metadata__ else None + + if interface == "open-webui": + print("Request from WebUI") + else: + print("Direct API request") + + # You can also check for presence of chat context + chat_id = __metadata__.get("chat_id") if __metadata__ else None + if not chat_id: + print("No chat context - likely a direct API call") + + return body +``` + +#### Example: Rate Limiting for All Requests + +Since `inlet()` is always called, use it for rate limiting that applies to both WebUI and API requests: + +```python +from pydantic import BaseModel, Field +from typing import Optional +import time + +class Filter: + class Valves(BaseModel): + requests_per_minute: int = Field(default=60, description="Max requests per minute per user") + + def __init__(self): + self.valves = self.Valves() + self.user_requests = {} # Track requests per user + + def inlet(self, body: dict, __user__: dict = None) -> dict: + if not __user__: + return body + + user_id = __user__.get("id") + current_time = time.time() + + # Clean old entries and count recent requests + if user_id not in self.user_requests: + self.user_requests[user_id] = [] + + # Keep only requests from the last minute + self.user_requests[user_id] = [ + t for t in self.user_requests[user_id] + if current_time - t < 60 + ] + + if len(self.user_requests[user_id]) >= self.valves.requests_per_minute: + raise Exception(f"Rate limit exceeded: {self.valves.requests_per_minute} requests/minute") + + self.user_requests[user_id].append(current_time) + return body +``` + +#### Example: Logging All API Usage + +Track token usage and requests for both WebUI and direct API calls: + +```python +from pydantic import BaseModel, Field +from typing import Optional +import logging + +class Filter: + class Valves(BaseModel): + log_level: str = Field(default="INFO", description="Logging level") + + def __init__(self): + self.valves = self.Valves() + self.logger = logging.getLogger("api_usage") + + def inlet(self, body: dict, __user__: dict = None, __metadata__: dict = None) -> dict: + user_email = __user__.get("email", "unknown") if __user__ else "anonymous" + model = body.get("model", "unknown") + interface = __metadata__.get("interface", "api") if __metadata__ else "api" + chat_id = __metadata__.get("chat_id") if __metadata__ else None + + self.logger.info( + f"Request: user={user_email}, model={model}, " + f"interface={interface}, chat_id={chat_id or 'none'}" + ) + + return body +``` + +:::note Event Emitter Behavior +Filters that use `__event_emitter__` will still execute for API requests, but since there's no WebUI to display the events, the status messages won't be visible. The filter logic still runs—only the visual feedback is missing. +::: + +--- + ### ⚡ Filter Priority & Execution Order When multiple filters are active, they execute in a specific order determined by their **priority** value. Understanding this is crucial when building filter chains where one filter depends on another's changes. @@ -612,6 +764,18 @@ Modify and return the `body`. The modified version of the `body` is what the LLM 4. 
**Streamlining User Input**: If your model’s output improves with additional guidance, you can use the `inlet` to inject clarifying instructions automatically! +5. **Rate Limiting**: Track requests per user and reject requests that exceed your quota (works for both WebUI and API requests). + +6. **Request Logging**: Log all incoming requests for analytics, debugging, or billing purposes. + +7. **Language Detection**: Detect the user's language and inject translation instructions or route to a language-specific model. + +8. **Prompt Injection Detection**: Scan user input for attempts to manipulate the system prompt and block malicious requests. + +9. **Cost Estimation**: Estimate input tokens before sending to the model for budget tracking. + +10. **A/B Testing**: Route users to different model configurations based on user ID or random selection. + ##### 💡 Example Use Cases: Build on Food Prep ###### 🥗 Example 1: Adding System Context @@ -669,9 +833,12 @@ The **`stream` function** is a new feature introduced in Open WebUI **0.5.17** t Unlike `outlet`, which processes an entire completed response, `stream` operates on **individual chunks** as they are received from the model. ##### 🛠️ When to Use the Stream Hook? -- Modify **streaming responses** before they are displayed to users. -- Implement **real-time censorship or cleanup**. -- **Monitor streamed data** for logging/debugging. +- **Real-time content filtering** - Censor profanity or sensitive content as it streams +- **Live word replacement** - Replace brand names, competitor mentions, or outdated terms +- **Streaming analytics** - Count tokens and track response length in real-time +- **Progress indicators** - Detect specific patterns to show loading states +- **Debugging** - Log each chunk for troubleshooting streaming issues +- **Format correction** - Fix common formatting issues as they appear ##### 📜 Example: Logging Streaming Chunks @@ -719,6 +886,19 @@ The `outlet` function is like a **proofreader**: tidy up the AI's response (or m - Prefer logging over direct edits in the outlet (e.g., for debugging or analytics). - If heavy modifications are needed (like formatting outputs), consider using the **pipe function** instead. +##### 🛠️ Use Cases for `outlet`: +- **Response logging** - Track all model outputs for analytics or compliance +- **Token usage tracking** - Count output tokens after completion for billing +- **Langfuse/observability integration** - Send traces to monitoring platforms +- **Citation formatting** - Reformat reference links in the final output +- **Disclaimer injection** - Append legal notices or AI disclosure statements +- **Response caching** - Store responses for future retrieval +- **Quality scoring** - Run automated quality checks on model outputs + +:::warning Outlet and API Requests +Remember: `outlet()` is **not called** for direct API requests to `/api/chat/completions`. If you need outlet processing for API calls, see the [Filter Behavior with API Requests](#-filter-behavior-with-api-requests) section above. 
+::: + 💡 **Example Use Case**: Strip out sensitive API responses you don't want the user to see: ```python def outlet(self, body: dict, __user__: Optional[dict] = None) -> dict: diff --git a/docs/getting-started/api-endpoints.md b/docs/getting-started/api-endpoints.md index 189bd61b4..7ec3f89b5 100644 --- a/docs/getting-started/api-endpoints.md +++ b/docs/getting-started/api-endpoints.md @@ -81,6 +81,81 @@ Access detailed API documentation for different services provided by Open WebUI: return response.json() ``` +### 🔧 Filter and Function Behavior with API Requests + +When using the API endpoints directly, filters (Functions) behave differently than when requests come from the web interface. + +:::info Authentication Note +Open WebUI accepts both **API keys** (prefixed with `sk-`) and **JWT tokens** for API authentication. This is intentional—the web interface uses JWT tokens internally for the same API endpoints. Both authentication methods provide equivalent API access. +::: + +#### Filter Execution + +| Filter Function | WebUI Request | Direct API Request | +|----------------|--------------|-------------------| +| `inlet()` | ✅ Runs | ✅ Runs | +| `stream()` | ✅ Runs | ✅ Runs | +| `outlet()` | ✅ Runs | ❌ **Does NOT run** | + +The `inlet()` function always executes, making it ideal for: +- **Rate limiting** - Track and limit requests per user +- **Request logging** - Log all API usage for monitoring +- **Input validation** - Reject invalid requests before they reach the model + +#### Triggering Outlet Processing + +The `outlet()` function only runs when the WebUI calls `/api/chat/completed` after a chat finishes. For direct API requests, you must call this endpoint yourself if you need outlet processing: + +- **Endpoint**: `POST /api/chat/completed` +- **Description**: Triggers outlet filter processing for a completed chat + +- **Curl Example**: + + ```bash + curl -X POST http://localhost:3000/api/chat/completed \ + -H "Authorization: Bearer YOUR_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{ + "model": "llama3.1", + "messages": [ + {"role": "user", "content": "Hello"}, + {"role": "assistant", "content": "Hi! How can I help you today?"} + ], + "chat_id": "optional-uuid", + "session_id": "optional-session-id" + }' + ``` + +- **Python Example**: + + ```python + import requests + + def complete_chat_with_outlet(token, model, messages, chat_id=None): + """ + Call after receiving the full response from /api/chat/completions + to trigger outlet filter processing. + """ + url = 'http://localhost:3000/api/chat/completed' + headers = { + 'Authorization': f'Bearer {token}', + 'Content-Type': 'application/json' + } + payload = { + 'model': model, + 'messages': messages # Include the full conversation with assistant response + } + if chat_id: + payload['chat_id'] = chat_id + + response = requests.post(url, headers=headers, json=payload) + return response.json() + ``` + +:::tip +For more details on writing filters that work with API requests, see the [Filter Function documentation](/features/plugin/functions/filter#-filter-behavior-with-api-requests). +::: + ### 🦙 Ollama API Proxy Support If you want to interact directly with Ollama models—including for embedding generation or raw prompt streaming—Open WebUI offers a transparent passthrough to the native Ollama API via a proxy route. 
From 3cebd1c6e125dcf41df09bc63a43b4fdad8f0865 Mon Sep 17 00:00:00 2001 From: DrMelone <27028174+Classic298@users.noreply.github.com> Date: Wed, 4 Feb 2026 18:25:57 +0100 Subject: [PATCH 05/10] access control knowledge bases --- docs/features/rag/index.md | 4 +++ docs/features/rbac/groups.md | 4 +++ docs/features/workspace/knowledge.md | 37 ++++++++++++++++++++++++++++ docs/features/workspace/models.md | 2 +- 4 files changed, 46 insertions(+), 1 deletion(-) diff --git a/docs/features/rag/index.md b/docs/features/rag/index.md index 47b53e542..ab43f1386 100644 --- a/docs/features/rag/index.md +++ b/docs/features/rag/index.md @@ -168,6 +168,10 @@ The **File Context** capability controls whether Open WebUI performs RAG (Retrie When File Context is disabled, file content is **not automatically extracted or injected**. Open WebUI does not forward files to the model's native API. If you disable this, the only way the model can access file content is through builtin tools (if enabled) that query knowledge bases or retrieve attached files on-demand (agentic file processing). ::: +:::tip Per-File Retrieval Mode +Individual files and knowledge bases can also be set to bypass RAG entirely using the **"Using Entire Document"** toggle. This injects the full file content into every message regardless of native function calling settings. See [Full Context vs Focused Retrieval](/features/workspace/knowledge#full-context-vs-focused-retrieval) for details. +::: + :::info The File Context toggle only appears when **File Upload** is enabled for the model. ::: diff --git a/docs/features/rbac/groups.md b/docs/features/rbac/groups.md index 7b9d70c4a..01d4fd9d9 100644 --- a/docs/features/rbac/groups.md +++ b/docs/features/rbac/groups.md @@ -69,6 +69,10 @@ You can restrict access to specific objects (like a proprietary Model or sensiti 1. **Tag the Resource**: When creating/editing a Model or Knowledge Base, set its visibility to **Private** or **Restricted**. 2. **Grant Access**: Select the specific **Groups** (or individual users) that should have "Read" or "Write" access. +:::tip Knowledge Scoping for Models +Beyond visibility, knowledge access is also scoped by model configuration. When a model has **attached knowledge bases**, it can only access those specific KBs (not all user-accessible KBs). See [Knowledge Scoping with Native Function Calling](/features/workspace/knowledge#knowledge-scoping-with-native-function-calling) for details. +::: + ### Access Control Object At a deeper level, resources store an access control list (ACL) looking like this: diff --git a/docs/features/workspace/knowledge.md b/docs/features/workspace/knowledge.md index 8b3eac22f..8ef16e41c 100644 --- a/docs/features/workspace/knowledge.md +++ b/docs/features/workspace/knowledge.md @@ -56,6 +56,43 @@ Autonomous knowledge base exploration works best with frontier models (GPT-5, Cl These tools enable models to autonomously explore and retrieve information from your knowledge bases, making conversations more contextually aware and grounded in your stored documents. 
+#### Knowledge Scoping with Native Function Calling + +When native function calling is enabled, the model's access to knowledge bases depends on whether you've attached specific knowledge to the model: + +| Model Configuration | Knowledge Access | +|-------------------|------------------| +| **No KB attached** | Model can access **all** knowledge bases the user has access to (public KBs, user's own KBs) | +| **KB attached to model** | Model is **limited** to only the attached knowledge base(s) | + +:::tip Restricting Knowledge Access +If you want a model to focus on specific documents, attach those knowledge bases to the model in **Workspace > Models > Edit**. This prevents the model from searching other available knowledge bases. +::: + +### Full Context vs Focused Retrieval + +When attaching files, notes, or knowledge bases to a model, you can choose between two retrieval modes by clicking on the attached item: + +#### 🔍 Focused Retrieval (Default) + +- Uses **RAG (Retrieval Augmented Generation)** to find relevant chunks +- Only injects the most relevant portions of documents based on the user's query +- Best for large documents or knowledge bases where only specific sections are relevant +- With native function calling enabled, the model decides when to search + +#### 📄 Using Entire Document (Full Context) + +- Injects the **complete content** of the file/note into every message +- Bypasses RAG entirely—no chunking or semantic search +- Best for short reference documents, style guides, or context that's always relevant +- **Always injected** regardless of native function calling settings + +:::info Full Context with Native Function Calling +When "Using Entire Document" is enabled for a file or knowledge base, its content is **always injected** into the conversation, even when native function calling is enabled. The model does not need to call any tools to access this content—it's automatically included in the context. + +Files set to Focused Retrieval (the default) will only be accessed when the model calls the appropriate knowledge tools. +::: + :::note Per-Model Control The Knowledge Base tools require the **Knowledge Base** category to be enabled for the model in **Workspace > Models > Edit > Builtin Tools** (enabled by default). Administrators can disable this category per-model to prevent autonomous knowledge base access. ::: diff --git a/docs/features/workspace/models.md b/docs/features/workspace/models.md index b581a693e..7b1c7da6c 100644 --- a/docs/features/workspace/models.md +++ b/docs/features/workspace/models.md @@ -72,7 +72,7 @@ Clicking **Show** on **Advanced Params** allows you to fine-tune the inference g You can transform a generic model into a specialized agent by toggling specific capabilities and binding resources. -- **Knowledge**: Instead of manually selecting documents for every chat, you can bind a specific knowledgebase **Collection** or **File** to this model. Whenever this model is selected, RAG (Retrieval Augmented Generation) is automatically active for those specific files. +- **Knowledge**: Instead of manually selecting documents for every chat, you can bind a specific knowledgebase **Collection** or **File** to this model. Whenever this model is selected, RAG (Retrieval Augmented Generation) is automatically active for those specific files. Click on attached items to toggle between **Focused Retrieval** (RAG chunks) and **Using Entire Document** (full content injection). 
See [Full Context vs Focused Retrieval](/features/workspace/knowledge#full-context-vs-focused-retrieval) for details. - **Tools**: Force specific tools to be enabled by default (e.g., always enable the **Calculator** tool for a "Math Bot"). - **Filters**: Attach specific Pipelines/Filters (e.g., a Profanity Filter or PII Redaction script) to run exclusively on this model. - **Actions**: Attach actionable scripts like `Add to Memories` or `Button` triggers. From a6512e6c861f342ac9bcd8a8625ad7022e47e6b8 Mon Sep 17 00:00:00 2001 From: DrMelone <27028174+Classic298@users.noreply.github.com> Date: Wed, 4 Feb 2026 18:29:20 +0100 Subject: [PATCH 06/10] follow up --- .../chat-features/follow-up-prompts.md | 47 +++++++++++++++++++ docs/features/chat-features/index.mdx | 2 + 2 files changed, 49 insertions(+) create mode 100644 docs/features/chat-features/follow-up-prompts.md diff --git a/docs/features/chat-features/follow-up-prompts.md b/docs/features/chat-features/follow-up-prompts.md new file mode 100644 index 000000000..11e2b810a --- /dev/null +++ b/docs/features/chat-features/follow-up-prompts.md @@ -0,0 +1,47 @@ +--- +sidebar_position: 9 +title: "Follow-Up Prompts" +--- + +# Follow-Up Prompts + +Open WebUI can automatically generate follow-up question suggestions after each model response. These suggestions appear as clickable chips below the response, helping you explore topics further without typing new prompts. + +## Settings + +Configure follow-up prompt behavior in **Settings > Interface** under the **Chat** section: + +### Follow-Up Auto-Generation + +**Default: On** + +Automatically generates follow-up question suggestions after each response. These suggestions are generated by the [task model](/getting-started/admin-panel#task-model) based on the conversation context. + +- **On**: Follow-up prompts are generated after each model response +- **Off**: No follow-up suggestions are generated + +### Keep Follow-Up Prompts in Chat + +**Default: Off** + +By default, follow-up prompts only appear for the most recent message and disappear when you continue the conversation. + +- **On**: Follow-up prompts are preserved and remain visible for all messages in the chat history +- **Off**: Only the last message shows follow-up prompts + +:::tip Perfect for Knowledge Exploration +Enable this setting when exploring a knowledge base. You can see all the suggested follow-ups from previous responses, making it easy to revisit and explore alternative paths through the information. +::: + +### Insert Follow-Up Prompt to Input + +**Default: Off** + +Controls what happens when you click a follow-up prompt. + +- **On**: Clicking a follow-up inserts the text into the input field, allowing you to edit it before sending +- **Off**: Clicking a follow-up immediately sends it as your next message + +## Regenerating Follow-Ups + +If you want to regenerate follow-up suggestions for a specific response, you can use the [Regenerate Followups](https://openwebui.com/f/silentoplayz/regenerate_followups) action button from the community. diff --git a/docs/features/chat-features/index.mdx b/docs/features/chat-features/index.mdx index 93f191b8b..9ac8814f5 100644 --- a/docs/features/chat-features/index.mdx +++ b/docs/features/chat-features/index.mdx @@ -24,3 +24,5 @@ Open WebUI provides a comprehensive set of chat features designed to enhance you - **[🕒 Temporal Awareness](./temporal-awareness.mdx)**: How models understand time and date, including native tools for precise time calculations. 
- **[🧠 Reasoning & Thinking Models](./reasoning-models.mdx)**: Specialized support for models that generate internal chains of thought using thinking tags. + +- **[💬 Follow-Up Prompts](./follow-up-prompts.md)**: Automatic generation of suggested follow-up questions after model responses. From 6ecf9275e40121cc82dc9aa47ef3e106791c7923 Mon Sep 17 00:00:00 2001 From: DrMelone <27028174+Classic298@users.noreply.github.com> Date: Wed, 4 Feb 2026 18:41:45 +0100 Subject: [PATCH 07/10] nginx --- docs/troubleshooting/performance.md | 40 +++++ docs/tutorials/https/nginx.md | 145 ++++++++++++++++++ docs/tutorials/tab-nginx/LetsEncrypt.md | 7 +- docs/tutorials/tab-nginx/NginxProxyManager.md | 14 ++ docs/tutorials/tab-nginx/SelfSigned.md | 6 +- docs/tutorials/tab-nginx/Windows.md | 6 +- 6 files changed, 215 insertions(+), 3 deletions(-) diff --git a/docs/troubleshooting/performance.md b/docs/troubleshooting/performance.md index e20ba1349..678eb956d 100644 --- a/docs/troubleshooting/performance.md +++ b/docs/troubleshooting/performance.md @@ -176,8 +176,48 @@ Defines the number of worker threads available for handling requests. - **Env Var**: `THREAD_POOL_SIZE=2000` +#### AIOHTTP Client Timeouts +Long LLM completions can exceed default HTTP client timeouts. Configure these to prevent requests being cut off mid-response: + +- **Env Var**: `AIOHTTP_CLIENT_TIMEOUT=1800` (30 minutes for completions) +- **Env Var**: `AIOHTTP_CLIENT_TIMEOUT_MODEL_LIST=15` (shorter for model listing) +- **Env Var**: `AIOHTTP_CLIENT_TIMEOUT_OPENAI_MODEL_LIST=15` + +#### Container Resource Limits +For Docker deployments, ensure adequate resource allocation: + +```yaml +deploy: + resources: + limits: + memory: 8G # Adjust based on usage + cpus: '4.0' + reservations: + memory: 4G + cpus: '2.0' + +# Increase file descriptor limits +ulimits: + nofile: + soft: 65536 + hard: 65536 +``` + +**Diagnosis commands:** +```bash +# Check container resource usage +docker stats openwebui --no-stream + +# Check connection states +docker exec openwebui netstat -an | grep -E "ESTABLISHED|TIME_WAIT|CLOSE_WAIT" | sort | uniq -c + +# Check open file descriptors +docker exec openwebui ls -la /proc/1/fd | wc -l +``` + --- + ## ☁️ Cloud Infrastructure Latency When deploying Open WebUI in cloud Kubernetes environments (AKS, EKS, GKE), you may notice significant performance degradation compared to local Kubernetes (Rancher Desktop, kind, Minikube) or bare-metal deployments—even with identical resource allocations. This is almost always caused by **latency** in the underlying infrastructure. diff --git a/docs/tutorials/https/nginx.md b/docs/tutorials/https/nginx.md index dafa1c95d..88dd3b120 100644 --- a/docs/tutorials/https/nginx.md +++ b/docs/tutorials/https/nginx.md @@ -89,6 +89,151 @@ import Windows from '../tab-nginx/Windows.md'; +## Complete Optimized NGINX Configuration + +This section provides a production-ready NGINX configuration optimized for Open WebUI streaming, WebSocket connections, and high-concurrency deployments. 
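+
+Whenever you change any of the blocks below, a quick streaming smoke test against the proxy tells you whether responses still arrive incrementally or are being buffered somewhere in the chain. A sketch (the URL, API key, and model name are placeholders for your own values):
+
+```python
+import time
+
+import requests
+
+URL = "https://your-domain.com/api/chat/completions"
+HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}
+PAYLOAD = {
+    "model": "llama3.1",
+    "stream": True,
+    "messages": [{"role": "user", "content": "Count slowly from 1 to 20."}],
+}
+
+start = time.time()
+with requests.post(URL, headers=HEADERS, json=PAYLOAD, stream=True, timeout=1800) as resp:
+    resp.raise_for_status()
+    for line in resp.iter_lines():
+        if line:
+            # Steadily increasing timestamps mean streaming is working end to end;
+            # a single burst at the very end means something is still buffering.
+            print(f"{time.time() - start:6.1f}s  {line[:60]!r}")
+```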
+ +### Upstream Configuration + +Define an upstream with keepalive connections to reduce connection setup overhead: + +```nginx +upstream openwebui { + server 127.0.0.1:3000; + keepalive 128; # Persistent connections + keepalive_timeout 1800s; # 30 minutes + keepalive_requests 10000; +} +``` + +### Timeout Configuration + +Long-running LLM completions require extended timeouts: + +```nginx +location /api/ { + proxy_connect_timeout 1800; # 30 minutes + proxy_send_timeout 1800; + proxy_read_timeout 1800; +} + +# WebSocket connections need even longer timeouts +location ~ ^/(ws/|socket\.io/) { + proxy_connect_timeout 86400; # 24 hours + proxy_send_timeout 86400; + proxy_read_timeout 86400; +} +``` + +### Header and Body Size Limits + +Prevent errors with large requests or OAuth tokens: + +```nginx +# In http {} or server {} block +client_max_body_size 100M; # Large file uploads +proxy_buffer_size 128k; # Large headers (OAuth tokens) +proxy_buffers 4 256k; +proxy_busy_buffers_size 256k; +large_client_header_buffers 4 32k; +``` + +### Common Streaming Mistakes + +| Setting | Impact on Streaming | +|---------|---------------------| +| `gzip on` with `application/json` | 🔴 Buffers for compression | +| `proxy_buffering on` | 🔴 Buffers entire response | +| `tcp_nopush on` | 🔴 Waits for full packets | +| `chunked_transfer_encoding on` | 🟡 Can break SSE | +| `proxy_cache` enabled on `/api/` | 🟡 Adds overhead | + +### Full Example Configuration + +```nginx +upstream openwebui { + server 127.0.0.1:3000; + keepalive 128; + keepalive_timeout 1800s; + keepalive_requests 10000; +} + +server { + listen 443 ssl http2; + server_name your-domain.com; + + # SSL configuration... + + # Compression - EXCLUDE streaming content types + gzip on; + gzip_types text/plain text/css application/javascript image/svg+xml; + # DO NOT include: application/json, text/event-stream + + # API endpoints - streaming optimized + location /api/ { + proxy_pass http://openwebui; + proxy_http_version 1.1; + proxy_set_header Upgrade $http_upgrade; + proxy_set_header Connection "upgrade"; + proxy_set_header Host $host; + proxy_set_header X-Real-IP $remote_addr; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + proxy_set_header X-Forwarded-Proto $scheme; + + # CRITICAL: Disable all buffering for streaming + gzip off; + proxy_buffering off; + proxy_request_buffering off; + proxy_cache off; + tcp_nodelay on; + add_header X-Accel-Buffering "no" always; + add_header Cache-Control "no-store" always; + + # Extended timeouts for LLM completions + proxy_connect_timeout 1800; + proxy_send_timeout 1800; + proxy_read_timeout 1800; + } + + # WebSocket endpoints + location ~ ^/(ws/|socket\.io/) { + proxy_pass http://openwebui; + proxy_http_version 1.1; + proxy_set_header Upgrade $http_upgrade; + proxy_set_header Connection "upgrade"; + + gzip off; + proxy_buffering off; + proxy_cache off; + + # 24-hour timeout for persistent connections + proxy_connect_timeout 86400; + proxy_send_timeout 86400; + proxy_read_timeout 86400; + } + + # Static assets - CAN buffer and cache + location /static/ { + proxy_pass http://openwebui; + proxy_buffering on; + proxy_cache_valid 200 7d; + add_header Cache-Control "public, max-age=604800, immutable"; + } + + # Default location + location / { + proxy_pass http://openwebui; + proxy_http_version 1.1; + proxy_set_header Host $host; + proxy_set_header X-Real-IP $remote_addr; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + proxy_set_header X-Forwarded-Proto $scheme; + } +} +``` + +--- + ## 
Caching Configuration

Proper caching significantly improves Open WebUI performance by reducing backend load and speeding up page loads. This section provides guidance for advanced users who want to implement server-side and client-side caching.

diff --git a/docs/tutorials/tab-nginx/LetsEncrypt.md b/docs/tutorials/tab-nginx/LetsEncrypt.md
index 2e17a4eb3..21e3b7ce0 100644
--- a/docs/tutorials/tab-nginx/LetsEncrypt.md
+++ b/docs/tutorials/tab-nginx/LetsEncrypt.md
@@ -270,7 +270,12 @@ With the certificate saved in your `ssl` directory, you can now update the Nginx
         proxy_set_header X-Real-IP $remote_addr;
         proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
         proxy_set_header X-Forwarded-Proto $scheme;
-        proxy_read_timeout 10m;
+
+        # Extended timeout for long LLM completions (30 minutes)
+        proxy_read_timeout 1800;
+        proxy_send_timeout 1800;
+        proxy_connect_timeout 1800;
+
         proxy_buffering off;
         proxy_cache off;
         client_max_body_size 20M;
diff --git a/docs/tutorials/tab-nginx/NginxProxyManager.md b/docs/tutorials/tab-nginx/NginxProxyManager.md
index 1f8c0cf1a..93859dd5d 100644
--- a/docs/tutorials/tab-nginx/NginxProxyManager.md
+++ b/docs/tutorials/tab-nginx/NginxProxyManager.md
@@ -94,6 +94,20 @@ Without this, Nginx re-chunks SSE streams, breaking markdown formatting (visible
 
 :::
 
+:::tip Extended Timeouts for Long Completions
+
+Long LLM completions (30+ minutes for complex tasks) may exceed the default 60-second timeout. Add these directives in the **Advanced** tab → **Custom Nginx Configuration**:
+
+```nginx
+proxy_read_timeout 1800;
+proxy_send_timeout 1800;
+proxy_connect_timeout 1800;
+```
+
+This sets a 30-minute timeout. Adjust as needed for your use case.
+
+:::
+
 :::tip Caching Best Practice
 
 While Nginx Proxy Manager handles most configuration automatically, be aware that:
diff --git a/docs/tutorials/tab-nginx/SelfSigned.md b/docs/tutorials/tab-nginx/SelfSigned.md
index aa024d579..369bd24a4 100644
--- a/docs/tutorials/tab-nginx/SelfSigned.md
+++ b/docs/tutorials/tab-nginx/SelfSigned.md
@@ -83,7 +83,11 @@ Using self-signed certificates is suitable for development or internal use where
         proxy_cache off;
         client_max_body_size 20M;
-        proxy_read_timeout 10m;
+
+        # Extended timeout for long LLM completions (30 minutes)
+        proxy_read_timeout 1800;
+        proxy_send_timeout 1800;
+        proxy_connect_timeout 1800;
 
         add_header Cache-Control "public, max-age=300, must-revalidate";
     }
diff --git a/docs/tutorials/tab-nginx/Windows.md b/docs/tutorials/tab-nginx/Windows.md
index 3675c788a..42a8763f5 100644
--- a/docs/tutorials/tab-nginx/Windows.md
+++ b/docs/tutorials/tab-nginx/Windows.md
@@ -156,7 +156,11 @@ http {
             proxy_buffering off;
             proxy_cache off;
             client_max_body_size 20M;
-            proxy_read_timeout 10m;
+
+            # Extended timeout for long LLM completions (30 minutes)
+            proxy_read_timeout 1800;
+            proxy_send_timeout 1800;
+            proxy_connect_timeout 1800;
 
             add_header Cache-Control "public, max-age=300, must-revalidate";
         }

From c3c4edd32a86522a703e9136dfb79a09037dfeef Mon Sep 17 00:00:00 2001
From: DrMelone <27028174+Classic298@users.noreply.github.com>
Date: Wed, 4 Feb 2026 18:46:55 +0100
Subject: [PATCH 08/10] reranking

---
 docs/getting-started/env-configuration.mdx | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/docs/getting-started/env-configuration.mdx b/docs/getting-started/env-configuration.mdx
index d1be55e27..3daa7d1a0 100644
--- a/docs/getting-started/env-configuration.mdx
+++ b/docs/getting-started/env-configuration.mdx
@@ -2907,6 +2907,14 @@ If you are embedding externally via API, ensure your rate limits are high enough
 
 ### Reranking
 
+#### `RAG_RERANKING_ENGINE`
+
+- Type: `str`
+- Options: `external`, or empty for local Sentence-Transformer CrossEncoder
+- Default: Empty string (local reranking)
+- Description: Specifies the reranking engine to use. Set to `external` to use an external reranker API (requires `RAG_EXTERNAL_RERANKER_URL`). Leave empty to use a local Sentence-Transformer CrossEncoder model.
+- Persistence: This environment variable is a `PersistentConfig` variable.
+
 #### `RAG_RERANKING_MODEL`
 
 - Type: `str`

From c593e9fec969e9b6a2e006e19f3de34c812c48ca Mon Sep 17 00:00:00 2001
From: DrMelone <27028174+Classic298@users.noreply.github.com>
Date: Wed, 4 Feb 2026 18:49:20 +0100
Subject: [PATCH 09/10] Update amazon-bedrock.md

---
 docs/tutorials/integrations/amazon-bedrock.md | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/docs/tutorials/integrations/amazon-bedrock.md b/docs/tutorials/integrations/amazon-bedrock.md
index 9ff014462..c5d3ce48d 100644
--- a/docs/tutorials/integrations/amazon-bedrock.md
+++ b/docs/tutorials/integrations/amazon-bedrock.md
@@ -73,6 +73,23 @@ docker run -e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY=$AWS
 
 You should now be able to access the BAG's swagger page at: http://localhost:8000/docs
 
+:::warning Troubleshooting: Container Exits Immediately
+
+If the Bedrock Gateway container starts and immediately exits (especially on Windows), check the logs with `docker logs <container_name>`. If you see Python/Uvicorn errors, this is likely a **Python 3.13 compatibility issue** with the BAG's Dockerfile.
+
+**Workaround:** Edit the `Dockerfile` before building and change the Python version from 3.13 to 3.12:
+
+```dockerfile
+# Change this line:
+FROM python:3.13-slim
+# To:
+FROM python:3.12-slim
+```
+
+Then rebuild with `docker build . -f Dockerfile -t bedrock-gateway`.
+
+:::
+
 ![Bedrock Access Gateway Swagger](/images/tutorials/amazon-bedrock/amazon-bedrock-proxy-api.png)
 
 ## Step 3: Add Connection in Open-WebUI

From 6a3da9b50c1652124d38cacb872e13a5e49abf8f Mon Sep 17 00:00:00 2001
From: DrMelone <27028174+Classic298@users.noreply.github.com>
Date: Wed, 4 Feb 2026 19:01:56 +0100
Subject: [PATCH 10/10] knowledge

---
 .../chat-features/follow-up-prompts.md |  2 +-
 docs/features/plugin/tools/index.mdx   | 77 ++++++++++++-------
 docs/features/workspace/knowledge.md   | 15 ++++
 3 files changed, 64 insertions(+), 30 deletions(-)

diff --git a/docs/features/chat-features/follow-up-prompts.md b/docs/features/chat-features/follow-up-prompts.md
index 11e2b810a..d2af865fe 100644
--- a/docs/features/chat-features/follow-up-prompts.md
+++ b/docs/features/chat-features/follow-up-prompts.md
@@ -15,7 +15,7 @@ Configure follow-up prompt behavior in **Settings > Interface** under the **Chat
 
 **Default: On**
 
-Automatically generates follow-up question suggestions after each response. These suggestions are generated by the [task model](/getting-started/admin-panel#task-model) based on the conversation context.
+Automatically generates follow-up question suggestions after each response. These suggestions are generated by the [task model](/getting-started/env-configuration#task_model) based on the conversation context.
 
 - **On**: Follow-up prompts are generated after each model response
 - **Off**: No follow-up suggestions are generated
diff --git a/docs/features/plugin/tools/index.mdx b/docs/features/plugin/tools/index.mdx
index 9797f4bb0..0e08e4390 100644
--- a/docs/features/plugin/tools/index.mdx
+++ b/docs/features/plugin/tools/index.mdx
@@ -210,35 +210,43 @@ These models excel at multi-step reasoning, proper JSON formatting, and autonomo
 | `get_current_timestamp` | Get the current UTC Unix timestamp and ISO date. |
 | `calculate_timestamp` | Calculate relative timestamps (e.g., "3 days ago"). |
 
-#### Tool Parameters Reference
-
-| Tool | Parameters |
-|------|------------|
-| `search_web` | `query` (required), `count` (default: 5) |
-| `fetch_url` | `url` (required) |
-| `list_knowledge_bases` | `count` (default: 10), `skip` (default: 0) |
-| `query_knowledge_bases` | `query` (required), `count` (default: 5) |
-| `search_knowledge_bases` | `query` (required), `count` (default: 5), `skip` (default: 0) |
-| `query_knowledge_files` | `query` (required), `knowledge_ids` (optional), `count` (default: 5) |
-| `search_knowledge_files` | `query` (required), `knowledge_id` (optional), `count` (default: 5), `skip` (default: 0) |
-| `view_knowledge_file` | `file_id` (required) |
-| `generate_image` | `prompt` (required) |
-| `edit_image` | `prompt` (required), `image_urls` (required) |
-| `search_memories` | `query` (required), `count` (default: 5) |
-| `add_memory` | `content` (required) |
-| `replace_memory_content` | `memory_id` (required), `content` (required) |
-| `search_notes` | `query` (required), `count` (default: 5), `start_timestamp` (optional), `end_timestamp` (optional) |
-| `view_note` | `note_id` (required) |
-| `write_note` | `title` (required), `content` (required) |
-| `replace_note_content` | `note_id` (required), `content` (required), `title` (optional) |
-| `search_chats` | `query` (required), `count` (default: 5), `start_timestamp` (optional), `end_timestamp` (optional) |
-| `view_chat` | `chat_id` (required) |
-| `search_channels` | `query` (required), `count` (default: 5) |
-| `search_channel_messages` | `query` (required), `count` (default: 10), `start_timestamp` (optional), `end_timestamp` (optional) |
-| `view_channel_message` | `message_id` (required) |
-| `view_channel_thread` | `parent_message_id` (required) |
-| `get_current_timestamp` | None |
-| `calculate_timestamp` | `days_ago` (default: 0), `weeks_ago` (default: 0), `months_ago` (default: 0), `years_ago` (default: 0) |
+#### Tool Reference
+
+| Tool | Parameters | Output |
+|------|------------|--------|
+| **Search & Web** | | |
+| `search_web` | `query` (required), `count` (default: 5) | Array of `{title, link, snippet}` |
+| `fetch_url` | `url` (required) | Plain text content (max 50,000 chars) |
+| **Knowledge Base** | | |
+| `list_knowledge_bases` | `count` (default: 10), `skip` (default: 0) | Array of `{id, name, description, file_count}` |
+| `query_knowledge_bases` | `query` (required), `count` (default: 5) | Array of `{id, name, description}` by similarity |
+| `search_knowledge_bases` | `query` (required), `count` (default: 5), `skip` (default: 0) | Array of `{id, name, description, file_count}` |
+| `query_knowledge_files` | `query` (required), `knowledge_ids` (optional), `count` (default: 5) | Array of `{id, filename, content_snippet, knowledge_id}` |
+| `search_knowledge_files` | `query` (required), `knowledge_id` (optional), `count` (default: 5), `skip` (default: 0) | Array of `{id, filename, knowledge_id, knowledge_name}` |
+| `view_knowledge_file` | `file_id` (required) | `{id, filename, content}` |
+| **Image Gen** | | |
+| `generate_image` | `prompt` (required) | `{status, message, images}` — auto-displayed |
+| `edit_image` | `prompt` (required), `image_urls` (required) | `{status, message, images}` — auto-displayed |
+| **Memory** | | |
+| `search_memories` | `query` (required), `count` (default: 5) | Array of `{id, date, content}` |
+| `add_memory` | `content` (required) | `{status: "success", id}` |
+| `replace_memory_content` | `memory_id` (required), `content` (required) | `{status: "success", id, content}` |
+| **Notes** | | |
+| `search_notes` | `query` (required), `count` (default: 5), `start_timestamp`, `end_timestamp` | Array of `{id, title, snippet, updated_at}` |
+| `view_note` | `note_id` (required) | `{id, title, content, updated_at, created_at}` |
+| `write_note` | `title` (required), `content` (required) | `{status: "success", id}` |
+| `replace_note_content` | `note_id` (required), `content` (required), `title` (optional) | `{status: "success", id, title}` |
+| **Chat History** | | |
+| `search_chats` | `query` (required), `count` (default: 5), `start_timestamp`, `end_timestamp` | Array of `{id, title, snippet, updated_at}` |
+| `view_chat` | `chat_id` (required) | `{id, title, messages: [{role, content}]}` |
+| **Channels** | | |
+| `search_channels` | `query` (required), `count` (default: 5) | Array of `{id, name, description}` |
+| `search_channel_messages` | `query` (required), `count` (default: 10), `start_timestamp`, `end_timestamp` | Array of `{id, channel_id, content, user_name, created_at}` |
+| `view_channel_message` | `message_id` (required) | `{id, content, user_name, created_at, reply_count}` |
+| `view_channel_thread` | `parent_message_id` (required) | Array of `{id, content, user_name, created_at}` |
+| **Time Tools** | | |
+| `get_current_timestamp` | None | `{current_timestamp, current_iso}` |
+| `calculate_timestamp` | `days_ago`, `weeks_ago`, `months_ago`, `years_ago` (all default: 0) | `{current_timestamp, current_iso, calculated_timestamp, calculated_iso}` |
 
 :::info Automatic Timezone Detection
 Open WebUI automatically detects and stores your timezone when you log in. This allows time-related tools and features to provide accurate local times without any manual configuration. Your timezone is determined from your browser settings.
 :::
@@ -253,6 +261,17 @@ The native `query_knowledge_files` tool uses **simple vector search** with a def
 
 For the full RAG pipeline with hybrid search and reranking, use the **File Context** capability (attach files via `#` or knowledge base assignment) instead of relying on autonomous tool calls.
 :::
 
+:::warning Knowledge is NOT Auto-Injected in Native Mode
+**Important:** When using Native Function Calling, attached knowledge is **not automatically injected** into the conversation. The model must actively call knowledge tools to search and retrieve information.
+
+**If your model isn't using attached knowledge:**
+1. **Add instructions to your system prompt** telling the model to discover and query knowledge bases. Example: *"When users ask questions, first use list_knowledge_bases to see what knowledge is available, then use query_knowledge_files to search the relevant knowledge base before answering."*
+2. **Or disable Native Function Calling** for that model to restore automatic RAG injection.
+3. **Or use "Full Context" mode** for attached knowledge (click on the attachment and select "Use Entire Document") which always injects the full content.
+
+See [Knowledge Scoping with Native Function Calling](/features/workspace/knowledge#knowledge-scoping-with-native-function-calling) for more details.
+:::
+
 **Why use these?** It allows for **Deep Research** (searching the web multiple times, or querying knowledge bases), **Contextual Awareness** (looking up previous chats or notes), **Dynamic Personalization** (saving facts), and **Precise Automation** (generating content based on existing notes or documents).
 
 #### Disabling Builtin Tools (Per-Model)
diff --git a/docs/features/workspace/knowledge.md b/docs/features/workspace/knowledge.md
index 8ef16e41c..787938706 100644
--- a/docs/features/workspace/knowledge.md
+++ b/docs/features/workspace/knowledge.md
@@ -65,6 +65,21 @@ When native function calling is enabled, the model's access to knowledge bases d
 | **No KB attached** | Model can access **all** knowledge bases the user has access to (public KBs, user's own KBs) |
 | **KB attached to model** | Model is **limited** to only the attached knowledge base(s) |
 
+:::warning Knowledge is NOT Auto-Injected with Native Function Calling
+
+**Important behavioral difference:** When using Native Function Calling, attached knowledge is **not automatically injected** into the conversation. Instead, the model must actively call the knowledge tools to search and retrieve information.
+
+**If your model isn't using attached knowledge:**
+
+1. **Add instructions to your system prompt** telling the model to discover and query knowledge bases. For example:
+   > "When users ask questions, first use list_knowledge_bases to see what knowledge is available, then use query_knowledge_files to search the relevant knowledge base before answering."
+
+2. **Or disable Native Function Calling** for that model to restore the automatic RAG injection behavior from earlier versions.
+
+3. **Or use "Full Context" mode** for the attached knowledge (click on the attachment and select "Use Entire Document") which bypasses RAG and always injects the full content.
+
+:::
+
 :::tip Restricting Knowledge Access
 If you want a model to focus on specific documents, attach those knowledge bases to the model in **Workspace > Models > Edit**. This prevents the model from searching other available knowledge bases.
 :::
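
As a rough illustration of the behavior described in the two warnings above, the sketch below shows what an OpenAI-style tool definition for the builtin `query_knowledge_files` tool could look like, assembled only from the parameters listed in the Tool Reference table, together with the system prompt recommended in these docs. This is an assumption-level sketch of the schema shape, not Open WebUI's actual internal payload, which may differ.

```python
# Hedged sketch: an OpenAI-style function definition for the builtin
# `query_knowledge_files` tool, built only from the parameters documented
# in the Tool Reference table above. It is NOT Open WebUI's internal schema.
# The point: in native function-calling mode the model only receives tool
# definitions like this one, so no knowledge-base content enters the context
# unless the model actually decides to call the tool.
import json

query_knowledge_files_tool = {
    "type": "function",
    "function": {
        "name": "query_knowledge_files",
        "description": "Vector search over files in accessible knowledge bases.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query."},
                "knowledge_ids": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Optional list of knowledge base IDs to restrict the search.",
                },
                "count": {"type": "integer", "default": 5},
            },
            "required": ["query"],
        },
    },
}

# The system prompt recommended above nudges the model to call the knowledge
# tools instead of answering from its own parametric memory.
recommended_system_prompt = (
    "When users ask questions, first use list_knowledge_bases to see what "
    "knowledge is available, then use query_knowledge_files to search the "
    "relevant knowledge base before answering."
)

if __name__ == "__main__":
    print(json.dumps(query_knowledge_files_tool, indent=2))
    print(recommended_system_prompt)
```

Without explicit guidance like this system prompt, a model may simply never call the tool, which is why attached knowledge can appear to be ignored in native mode.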