diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml
new file mode 100644
index 0000000..01608e1
--- /dev/null
+++ b/.github/workflows/docs.yml
@@ -0,0 +1,63 @@
+name: Docs
+
+on:
+  push:
+    branches: [main]
+    paths:
+      - "docs/**"
+      - "mkdocs.yml"
+      - "src/adk_redis/**"
+      - "pyproject.toml"
+      - ".github/workflows/docs.yml"
+  workflow_dispatch:
+
+permissions:
+  contents: read
+  pages: write
+  id-token: write
+
+concurrency:
+  group: pages
+  cancel-in-progress: false
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v4
+        with:
+          version: "latest"
+
+      - name: Set up Python
+        run: uv python install 3.12
+
+      - name: Install docs dependencies
+        # docs extra pulls in mkdocs-material, mkdocstrings, llmstxt, etc.
+        # The `all` extra is included so mkdocstrings can import the package
+        # surface (e.g., redisvl) when rendering API reference pages.
+        run: uv sync --extra docs --extra all
+
+      - name: Build site
+        # No --strict: pre-existing griffe warnings in tools/memory/*
+        # docstrings would block deploys. Tracked separately.
+        run: uv run mkdocs build
+
+      - name: Upload site
+        uses: actions/upload-pages-artifact@v3
+        with:
+          path: site
+
+  deploy:
+    if: github.ref == 'refs/heads/main'
+    needs: build
+    runs-on: ubuntu-latest
+    environment:
+      name: github-pages
+      url: ${{ steps.deployment.outputs.page_url }}
+    steps:
+      - name: Deploy to GitHub Pages
+        id: deployment
+        uses: actions/deploy-pages@v4
diff --git a/docs/concepts/adk_overview.md b/docs/concepts/adk_overview.md
index 347c637..4f39411 100644
--- a/docs/concepts/adk_overview.md
+++ b/docs/concepts/adk_overview.md
@@ -1,30 +1,53 @@
 # ADK Overview
 
-The [Google Agent Development Kit (ADK)](https://github.com/google/adk-python) is a framework for building AI agents with Google's Gemini models. `adk-redis` provides Redis-backed implementations of ADK's service interfaces.
+The [Google Agent Development Kit (ADK)](https://github.com/google/adk-python) is a framework for building AI agents with Google's Gemini models. `adk-redis` provides Redis-backed implementations of ADK's service interfaces so you can move from prototype to production without rewriting your agent.
 
-## ADK abstractions
+## Architecture
 
-| Abstraction | What it does | Redis implementation |
-|-------------|-------------|---------------------|
-| **Agent** | The reasoning core: plans, calls tools, responds | No change (ADK provides this) |
-| **Session** | Conversation state across turns | `RedisSessionService` |
-| **Memory** | Persistent knowledge across sessions | `RedisMemoryService` |
-| **Tool** | Functions the agent can call | RedisVL search tools |
+```mermaid
+flowchart TD
+    subgraph Agent [Your ADK Agent]
+        SS[Session Service<br/>working memory]
+        MS[Memory Service<br/>long-term]
+        ST[Search Tools<br/>vector · hybrid · SQL]
+        SC[Semantic Cache<br/>before/after callbacks]
+    end
 
-## Where Redis fits
+    SS & MS -->|REST / MCP| AMS
+    ST -->|RedisVL / MCP| R
+    SC -->|RedisVL / LangCache| R
 
-Redis replaces the default in-memory implementations with durable, scalable alternatives:
+    subgraph AMS [Agent Memory Server]
+        WM[Working Memory API]
+        LTM[Long-Term Memory API]
+    end
 
-- **Sessions** are stored as Redis JSON documents with optional TTL
-- **Memory** is proxied to the Redis Agent Memory Server for two-tier storage
-- **Search tools** use RedisVL for vector similarity search
-- **Caching** uses Redis for semantic LLM response caching
+    AMS --> R
 
-## When to use adk-redis
+    subgraph R [Redis 8.4+]
+        JSON[(JSON storage)]
+        VEC[(Vector index)]
+        FTS[(Full-text index)]
+    end
+```
 
-Use `adk-redis` when you are building a Google ADK agent and need:
+## ADK Interfaces
 
-- Session persistence across process restarts
-- Long-term memory that survives beyond a single conversation
-- Vector search over your own documents
-- Production deployment with Redis as the data layer
+`adk-redis` implements four ADK extension points. Each one maps to a concept page with full details.
+
+| ADK Interface | `adk-redis` implementation | Concept page |
+|---------------|---------------------------|-------------|
+| `BaseSessionService` | `RedisWorkingMemorySessionService` | [Sessions + Memory Services](sessions.md) |
+| `BaseMemoryService` | `RedisLongTermMemoryService` | [Sessions + Memory Services](sessions.md) |
+| `BaseTool` | Search tools (`RedisVectorSearchTool`, `RedisHybridSearchTool`, etc.) and memory tools (`SearchMemoryTool`, `CreateMemoryTool`, etc.) | [Search Tools](search.md), [Memory MCP + Tools](memory.md) |
+| Model callbacks | `LLMResponseCache` with `RedisVLCacheProvider` or `LangCacheProvider` | [Semantic Caching](caching.md) |
+
+## Running Your Agent
+
+ADK provides several ways to run and test agents:
+
+- **`adk web`**: browser-based UI for interactive development and debugging.
+- **`adk run`**: terminal-based interaction.
+- **`adk api_server`**: RESTful API for production deployment.
+
+See the [ADK runtime documentation](https://google.github.io/adk-docs/runtime/) for details.
diff --git a/docs/concepts/caching.md b/docs/concepts/caching.md
new file mode 100644
index 0000000..2328cfb
--- /dev/null
+++ b/docs/concepts/caching.md
@@ -0,0 +1,144 @@
+# Semantic Caching
+
+`adk-redis` provides semantic caching that skips LLM calls when a user sends a prompt that is similar (or identical) to one already answered. This reduces latency and cost without changing agent behavior.
+
+## Quick Reference
+
+| Feature | Details |
+|---------|---------|
+| **What it caches** | LLM responses keyed by prompt similarity |
+| **Similarity** | Vector distance between prompt embeddings |
+| **Providers** | `RedisVLCacheProvider` (self-hosted) or `LangCacheProvider` (managed) |
+| **TTL** | Configurable per-entry expiration |
+| **Integration** | ADK `before_model_callback` / `after_model_callback` hooks |
+
+## How It Works
+
+```mermaid
+flowchart TD
+    U([User prompt]) --> BC[before_model_callback<br/>embed prompt, search cache]
+    BC --> D{Cache hit?}
+    D -->|Yes| CR([Return cached response<br/>no LLM call])
+    D -->|No| LLM[Call LLM]
+    LLM --> AC[after_model_callback<br/>store response in cache]
+    AC --> R([Return LLM response])
+
+    subgraph Cache [Redis Cache]
+        SE[(Semantic index<br/>prompt embeddings)]
+    end
+
+    BC <--> Cache
+    AC --> Cache
+```
+
+1. Before the LLM is called, `LLMResponseCache` embeds the prompt and searches for a semantically similar entry in the cache.
+2. If the distance is below the configured threshold, the cached response is returned immediately (no LLM call).
+3. If no match is found, the LLM runs normally and the response is stored in the cache for future hits.
+
+## Two Provider Options
+
+### Self-Hosted (RedisVL)
+
+Use `RedisVLCacheProvider` when you run your own Redis instance and want full control over the vectorizer and cache index.
+
+```python
+from redisvl.utils.vectorize import HFTextVectorizer
+
+from adk_redis.cache import (
+    LLMResponseCache,
+    LLMResponseCacheConfig,
+    RedisVLCacheProvider,
+    RedisVLCacheProviderConfig,
+)
+
+vectorizer = HFTextVectorizer(model="redis/langcache-embed-v1")
+
+provider = RedisVLCacheProvider(
+    config=RedisVLCacheProviderConfig(
+        redis_url="redis://localhost:6379",
+        name="my_cache",
+        ttl=3600,
+        distance_threshold=0.1,
+    ),
+    vectorizer=vectorizer,
+)
+```
+
+**Requirements**: `pip install 'adk-redis[search]'` and a running Redis instance.
+
+### Managed (LangCache)
+
+Use `LangCacheProvider` with [Redis LangCache](https://redis.io/langcache) for a fully managed service. No local vectorizer needed; embeddings are handled server-side.
+
+```python
+from adk_redis.cache import (
+    LLMResponseCache,
+    LLMResponseCacheConfig,
+    LangCacheProvider,
+    LangCacheProviderConfig,
+)
+
+provider = LangCacheProvider(
+    config=LangCacheProviderConfig(
+        cache_id="your-cache-id",
+        api_key="your-api-key",
+        server_url="https://aws-us-east-1.langcache.redis.io",
+        ttl=3600,
+    ),
+)
+```
+
+**Requirements**: `pip install 'adk-redis[langcache]'` and a LangCache account.
+
+## Wiring Into an Agent
+
+Both providers use the same `LLMResponseCache` wrapper, which produces ADK-compatible callbacks:
+
+```python
+from google.adk import Agent
+
+from adk_redis.cache import create_llm_cache_callbacks
+
+llm_cache = LLMResponseCache(
+    provider=provider,
+    config=LLMResponseCacheConfig(
+        first_message_only=True,   # only cache the first user message
+        include_app_name=True,     # scope cache keys by app
+        include_user_id=True,      # scope cache keys by user
+    ),
+)
+
+before_cb, after_cb = create_llm_cache_callbacks(llm_cache)
+
+agent = Agent(
+    model="gemini-2.0-flash",
+    name="my_agent",
+    before_model_callback=before_cb,
+    after_model_callback=after_cb,
+)
+```
+
+## When to Use Which
+
+| Provider | Use when |
+|----------|----------|
+| **RedisVL** | You already run Redis, want local embeddings, need full control over cache index schema. |
+| **LangCache** | You want a managed service with no infrastructure, server-side embeddings, and built-in analytics. |
+
+## Configuration Options
+
+| Option | Provider | Default | Description |
+|--------|----------|---------|-------------|
+| `distance_threshold` | Both | `0.1` | Max vector distance for a cache hit (lower = stricter) |
+| `ttl` | Both | `None` | Time-to-live in seconds for cache entries |
+| `name` | RedisVL | `llmcache` | Redis index name for the cache |
+| `redis_url` | RedisVL | `redis://localhost:6379` | Redis connection string |
+| `cache_id` | LangCache | Required | LangCache instance identifier |
+| `api_key` | LangCache | Required | LangCache API key |
+| `use_exact_search` | LangCache | `True` | Enable exact (hash) matching in addition to semantic |
+| `use_semantic_search` | LangCache | `True` | Enable semantic (vector) matching |
+
+## Next Steps
+
+- [Semantic cache example](https://github.com/redis-developer/adk-redis/tree/main/examples/semantic_cache) for a runnable self-hosted demo.
+- [LangCache example](https://github.com/redis-developer/adk-redis/tree/main/examples/langcache_cache) for a runnable managed demo.
+- [Sessions + Memory Services](sessions.md) and [Sessions + Memory MCP + Tools](memory.md) for the other Redis-backed features.
+- [ADK runtime options](https://google.github.io/adk-docs/runtime/) for `adk web`, `adk run`, and `adk api_server`.
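The lookup steps under "How It Works" in `docs/concepts/caching.md` can be illustrated with a minimal, standard-library-only sketch. It assumes cosine distance and a linear scan over stored entries. `ToySemanticCache` and its methods are hypothetical stand-ins, not the `adk_redis` API; the real providers rely on an embedding model and a Redis vector index rather than hand-written vectors.

```python
import math
import time


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 1.0  # treat a zero vector as unrelated to everything
    return 1.0 - dot / norm


class ToySemanticCache:
    """Hypothetical in-memory cache keyed by prompt embedding."""

    def __init__(self, distance_threshold=0.1, ttl=None):
        self.distance_threshold = distance_threshold
        self.ttl = ttl
        self._entries = []  # list of (embedding, response, stored_at)

    def store(self, embedding, response, now=None):
        # Mirrors the after-model step: remember the response for later hits.
        now = time.time() if now is None else now
        self._entries.append((embedding, response, now))

    def lookup(self, embedding, now=None):
        # Mirrors the before-model step: return the closest live entry, if any.
        now = time.time() if now is None else now
        best = None
        for stored, response, stored_at in self._entries:
            if self.ttl is not None and now - stored_at > self.ttl:
                continue  # entry has expired
            distance = cosine_distance(embedding, stored)
            if distance <= self.distance_threshold:
                if best is None or distance < best[0]:
                    best = (distance, response)
        return None if best is None else best[1]


cache = ToySemanticCache(distance_threshold=0.1, ttl=3600)
cache.store([1.0, 0.0], "Paris", now=0)

near_hit = cache.lookup([0.99, 0.05], now=10)   # almost the same direction
far_miss = cache.lookup([0.0, 1.0], now=10)     # orthogonal prompt
expired = cache.lookup([1.0, 0.0], now=4000)    # identical prompt, past the TTL
```

A miss (`None`) is the case where the LLM would be called and the response stored afterwards. Lowering `distance_threshold` makes matching stricter, as the configuration table notes.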
diff --git a/docs/concepts/index.md b/docs/concepts/index.md
index 4fc1408..e9c7785 100644
--- a/docs/concepts/index.md
+++ b/docs/concepts/index.md
@@ -4,7 +4,12 @@ description: Foundational concepts for adk-redis.
 
 # Concepts
 
-How adk-redis maps Google ADK service interfaces onto Redis.
+`adk-redis` maps Google ADK service interfaces onto Redis, the Agent Memory
+Server, and RedisVL. These pages explain the **what** and **why** behind each
+feature. For step-by-step setup instructions, see the
+[User Guide](../user_guide/index.md).
+
+There are four ways to use this integration. Pick the page that matches your goal.
 
 <div class="grid cards" markdown>
 
@@ -12,24 +17,42 @@ How adk-redis maps Google ADK service interfaces onto Redis.
 
     ---
 
-    A short tour of the Google Agent Development Kit interfaces this package implements.
+    Architecture diagram and the ADK interfaces this package implements.
+
+-   :material-brain:{ .lg .middle } **[Sessions + Memory Services](sessions.md)**
+
+    ---
+
+    Framework-managed sessions and memory. The ADK Runner handles everything automatically.
 
--   :material-account-multiple:{ .lg .middle } **[Sessions](sessions.md)**
+-   :material-tools:{ .lg .middle } **[Sessions + Memory MCP + Tools](memory.md)**
 
     ---
 
-    Session storage model and ADK session-service contract.
+    LLM-controlled memory via MCP or REST-based tools. The agent decides when to remember and recall.
 
--   :material-brain:{ .lg .middle } **[Memory](memory.md)**
+-   :material-database-search:{ .lg .middle } **[RedisVL MCP + Search Tools](search.md)**
 
     ---
 
-    Short-term and long-term memory layered over Agent Memory Server.
+    Vector, hybrid, range, text, and SQL search over your own data via in-process tools or MCP.
 
--   :material-database-search:{ .lg .middle } **[Search](search.md)**
+-   :material-cached:{ .lg .middle } **[Semantic Caching](caching.md)**
 
     ---
 
-    Vector and lexical search backing the ADK tool surface.
+    Skip repeat LLM calls with self-hosted (RedisVL) or managed (LangCache) semantic caching.
 
 </div>
+
+## Where to Start
+
+| Goal | Read this |
+|------|-----------|
+| Understand the big picture | [ADK overview](adk_overview.md) |
+| Let the framework manage sessions and memory | [Sessions + Memory Services](sessions.md) |
+| Give the LLM explicit memory tools | [Sessions + Memory MCP + Tools](memory.md) |
+| Search your own knowledge base | [RedisVL MCP + Search Tools](search.md) |
+| Reduce LLM latency and cost | [Semantic Caching](caching.md) |
+| Get a working agent running | [Quickstart](../user_guide/01_integration.md) |
+| Run and test your agent | [ADK runtime](https://google.github.io/adk-docs/runtime/) |
diff --git a/docs/concepts/memory.md b/docs/concepts/memory.md
index d75ab08..611a3aa 100644
--- a/docs/concepts/memory.md
+++ b/docs/concepts/memory.md
@@ -1,33 +1,161 @@
-# Memory
+# Sessions + Memory with MCP + Tools
 
-`RedisMemoryService` connects ADK agents to the Redis Agent Memory Server, providing two-tier memory that persists across sessions.
+Use ADK's native `McpToolset` to connect your agent to the Agent Memory Server's MCP endpoint, or use the REST-based memory tools directly. The LLM decides when to search, create, update, or delete memories.
 
-## Two-tier model
+## Quick Reference
 
-| Tier | Scope | Storage | Search |
-|------|-------|---------|--------|
-| **Working memory** | Current session | Redis JSON | None (use session state) |
-| **Long-term memory** | All sessions | Redis vector index | Semantic, keyword, hybrid |
+| Feature | Details |
+|---------|---------|
+| **Protocol** | MCP (via SSE or Streamable HTTP) or REST-based ADK tools |
+| **Control** | LLM-driven: the agent chooses when to remember and recall |
+| **Session storage** | Agent Memory Server working memory |
+| **Long-term memory** | Agent Memory Server with vector + full-text indexes |
+| **Language support** | MCP works with Python, TypeScript, and any MCP-compatible client |
 
-## How it works
+## How It Works
 
-1. During a conversation, the agent accumulates facts and preferences
-2. When the session ends (or on explicit flush), memories are extracted and stored in long-term memory
-3. On future sessions, the agent searches long-term memory to recall relevant context
+```mermaid
+flowchart TD
+    U([User message]) --> A[ADK Agent]
+    A -->|"LLM decides to search"| MCP{MCP or REST?}
 
-## Configuration
+    MCP -->|MCP| MCPS[McpToolset<br/>search · create · prompt]
+    MCP -->|REST| REST[Memory Tools<br/>SearchMemoryTool · CreateMemoryTool]
 
-`RedisMemoryService` connects to a running Agent Memory Server instance:
+    MCPS --> AMS[Agent Memory Server]
+    REST --> AMS
+
+    AMS --> WM[Working Memory]
+    AMS --> LTM[Long-Term Memory]
+
+    AMS -->|results| A
+    A --> R([Agent response])
+
+    subgraph Redis [Redis 8.4+]
+        J[(JSON)]
+        V[(Vector index)]
+        FT[(Full-text index)]
+    end
+
+    AMS --- Redis
+```
+
+Unlike the [services approach](sessions.md), where the framework handles memory automatically, here the LLM explicitly calls memory tools during reasoning. This gives you fine-grained control over what gets stored and retrieved.
+
+## Option 1: MCP Tools
+
+Connect to the Agent Memory Server's MCP endpoint using ADK's `McpToolset`. This is the recommended approach when you need multi-language support or when several agents share the same memory server.
+
+```python
+from google.adk import Agent
+from google.adk.tools.mcp_tool import McpToolset
+from google.adk.tools.mcp_tool.mcp_session_manager import SseConnectionParams
+
+memory_tools = McpToolset(
+    connection_params=SseConnectionParams(url="http://localhost:9000/sse"),
+    tool_filter=[
+        "search_long_term_memory",
+        "create_long_term_memories",
+        "memory_prompt",
+    ],
+)
+
+agent = Agent(
+    model="gemini-2.5-flash",
+    name="my_agent",
+    tools=[memory_tools],
+    instruction="Search memory before answering. Store important facts.",
+)
+```
+
+### Available MCP Tools
+
+| Tool | Description |
+|------|-------------|
+| `search_long_term_memory` | Semantic, keyword, or hybrid search across memories |
+| `create_long_term_memories` | Store new memories with topics, types, and metadata |
+| `get_long_term_memory` | Retrieve a specific memory by ID |
+| `edit_long_term_memory` | Update an existing memory |
+| `delete_long_term_memories` | Remove memories by ID |
+| `memory_prompt` | Enrich a prompt with relevant memories |
+| `set_working_memory` | Write to the current session's working memory |
+
+## Option 2: REST-Based Tools
+
+Use the Python memory tool classes for direct REST access. No MCP server needed; the tools call the Agent Memory Server REST API.
 
 ```python
-from adk_redis import RedisMemoryService
+from google.adk import Agent
+
+from adk_redis import (
+    SearchMemoryTool,
+    CreateMemoryTool,
+    UpdateMemoryTool,
+    DeleteMemoryTool,
+    MemoryPromptTool,
+    MemoryToolConfig,
+)
+
+config = MemoryToolConfig(
+    api_base_url="http://localhost:8000",
+    default_namespace="my_app",
+    recency_boost=True,
+)
 
-memory = RedisMemoryService(
-    memory_server_url="http://localhost:8000",
-    namespace="my-app",
+agent = Agent(
+    model="gemini-2.0-flash",
+    name="my_agent",
+    tools=[
+        SearchMemoryTool(config=config),
+        CreateMemoryTool(config=config),
+        UpdateMemoryTool(config=config),
+        DeleteMemoryTool(config=config),
+        MemoryPromptTool(config=config),
+    ],
 )
 ```
 
-## Relationship to Agent Memory Server
+### Available REST Tools
+
+| Tool | Description |
+|------|-------------|
+| `SearchMemoryTool` | Semantic search with optional recency boost |
+| `CreateMemoryTool` | Store a new memory (semantic, episodic, or message) |
+| `GetMemoryTool` | Retrieve a memory by ID |
+| `UpdateMemoryTool` | Update content, topics, or metadata |
+| `DeleteMemoryTool` | Remove memories by ID |
+| `MemoryPromptTool` | Enrich a system prompt with relevant memories |
+
+## MCP vs REST Decision
+
+| | MCP | REST Tools |
+|---|---|---|
+| **Multi-language** | Yes (Python, TypeScript, any MCP client) | Python only |
+| **Shared server** | Yes, multiple agents connect to one MCP endpoint | Each agent connects directly to REST API |
+| **Extra service** | Requires MCP server running | No extra service (direct HTTP) |
+| **Tool filtering** | `tool_filter` on `McpToolset` | Choose which tool classes to instantiate |
+
+## Configuration (REST Tools)
+
+| Option | Default | Description |
+|--------|---------|-------------|
+| `api_base_url` | `http://localhost:8000` | Agent Memory Server URL |
+| `timeout` | `30` | HTTP timeout in seconds |
+| `default_namespace` | `default` | Namespace for memory isolation |
+| `search_top_k` | `10` | Default max search results |
+| `recency_boost` | `True` | Bias scoring toward newer memories |
+| `distance_threshold` | `None` | Max vector distance for search results |
+| `deduplicate` | `True` | Deduplicate when creating memories |
+
+Launch with the [ADK web UI](https://google.github.io/adk-docs/runtime/) for interactive testing:
+
+```bash
+adk web .
+```
+
+## Next Steps
 
-`RedisMemoryService` is a thin client that proxies to the Agent Memory Server REST API. It does not implement memory logic itself. The server handles extraction, deduplication, and search.
+- [Sessions + Memory services](sessions.md) for the framework-managed alternative.
+- [Fitness coach example](https://github.com/redis-developer/adk-redis/tree/main/examples/fitness_coach_mcp) for a working MCP-based agent.
+- [Search tools](search.md) for RedisVL-backed index search (separate from memory search).
+- [ADK runtime options](https://google.github.io/adk-docs/runtime/) for `adk web`, `adk run`, and `adk api_server`.
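The LLM-driven control flow described in `docs/concepts/memory.md` (the model emits a tool call, and only the tools you expose can run) can be sketched without any server. Everything here is a hypothetical stand-in: `ToyMemoryStore`, `build_tools`, `dispatch`, and the `query` / `texts` argument names are illustrative assumptions, and the keyword-overlap search is a placeholder for the server's semantic search. Only the three tool names are taken from the MCP tool table.

```python
class ToyMemoryStore:
    """Hypothetical in-memory replacement for the Agent Memory Server."""

    def __init__(self):
        self._memories = {}
        self._next_id = 1

    def create(self, text):
        memory_id = str(self._next_id)
        self._next_id += 1
        self._memories[memory_id] = text
        return memory_id

    def search(self, query):
        # Keyword overlap only; a real server would rank by vector similarity.
        words = set(query.lower().split())
        return [
            text
            for text in self._memories.values()
            if words & set(text.lower().split())
        ]

    def delete(self, memory_ids):
        for memory_id in memory_ids:
            self._memories.pop(memory_id, None)
        return len(self._memories)


def build_tools(store, tool_filter):
    """Expose only the tools named in tool_filter (an allowlist)."""
    all_tools = {
        "search_long_term_memory": lambda args: store.search(args["query"]),
        "create_long_term_memories": lambda args: [
            store.create(text) for text in args["texts"]
        ],
        "delete_long_term_memories": lambda args: store.delete(args["ids"]),
    }
    return {name: fn for name, fn in all_tools.items() if name in tool_filter}


def dispatch(tools, call):
    """Run one model-issued tool call of the form {'name': ..., 'args': ...}."""
    name = call["name"]
    if name not in tools:
        raise KeyError(f"tool not exposed to the agent: {name}")
    return tools[name](call.get("args", {}))


store = ToyMemoryStore()
tools = build_tools(
    store, ["search_long_term_memory", "create_long_term_memories"]
)

created = dispatch(
    tools,
    {
        "name": "create_long_term_memories",
        "args": {"texts": ["User prefers morning workouts"]},
    },
)
found = dispatch(
    tools, {"name": "search_long_term_memory", "args": {"query": "morning"}}
)
```

Because `delete_long_term_memories` is left out of the allowlist, a model that tries to call it gets an error instead of removing data. This is the same effect that `tool_filter` on `McpToolset`, or choosing which REST tool classes to instantiate, is meant to achieve.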
diff --git a/docs/concepts/search.md b/docs/concepts/search.md
index a385c7c..9c9e4d8 100644
--- a/docs/concepts/search.md
+++ b/docs/concepts/search.md
@@ -1,28 +1,154 @@
-# Search
+# RedisVL MCP + Search Tools
 
-`adk-redis` provides RedisVL-backed search tools that ADK agents can call during reasoning.
+Search your own data with [RedisVL](https://docs.redisvl.com)-backed tools. Use in-process Python tools for fine-grained control, or connect to RedisVL's MCP server for language-agnostic, multi-agent access.
 
-## How it works
+## Quick Reference
 
-The search tools use RedisVL to perform vector similarity search over a Redis index:
+| Feature | Details |
+|---------|---------|
+| **Engine** | Redis Query Engine (built into Redis 8.4+) |
+| **Library** | RedisVL (`redisvl>=0.18.2`) |
+| **Search types** | Vector (semantic), hybrid (vector + BM25), range, text (BM25), SQL |
+| **Vectorizers** | HuggingFace, OpenAI, Cohere, Ollama, and more via RedisVL |
+| **MCP server** | `rvl mcp` exposes `search-records` and `upsert-records` tools |
+| **Install** | `pip install 'adk-redis[search]'` (add `[sql]` for SQL search) |
 
-1. The agent decides it needs to search for information
-2. It calls the search tool with a natural language query
-3. The tool generates an embedding for the query
-4. RedisVL performs a vector similarity search against the index
-5. The top results are returned to the agent as context
+## How It Works
 
-## Available tools
+```mermaid
+flowchart TD
+    A([ADK Agent]) -->|natural language query| T{In-process or MCP?}
+
+    T -->|In-process| IP[Search Tool<br/>e.g. RedisHybridSearchTool]
+    IP -->|1. embed query| V[Vectorizer]
+    V -->|2. build query| RVL[RedisVL]
+    RVL -->|FT.SEARCH / FT.HYBRID| R[(Redis 8.4+<br/>Vector + Full-text index)]
+
+    T -->|MCP| MCP[McpToolset<br/>→ rvl mcp server]
+    MCP -->|search-records| R
+
+    R -->|top-k results| A
+```
+
+## Option 1: In-Process Python Tools
+
+Each tool subclasses `BaseTool` and wraps a RedisVL query type. Bind it to an existing index and pass it to your agent.
+
+```python
+from google.adk import Agent
+from redisvl.index import SearchIndex
+from redisvl.utils.vectorize import HFTextVectorizer
+
+from adk_redis import RedisVectorSearchTool, RedisVectorQueryConfig
+
+vectorizer = HFTextVectorizer(model="sentence-transformers/all-MiniLM-L6-v2")
+index = SearchIndex.from_existing("products", redis_url="redis://localhost:6379")
+
+tool = RedisVectorSearchTool(
+    index=index,
+    vectorizer=vectorizer,
+    config=RedisVectorQueryConfig(
+        vector_field_name="embedding",
+        num_results=5,
+    ),
+    return_fields=["title", "price", "category"],
+    name="search_products",
+    description="Semantic search across the product catalog.",
+)
+
+agent = Agent(model="gemini-2.0-flash", name="my_agent", tools=[tool])
+```
+
+### Available Tools
+
+| Tool | Query type | Vectorizer | Use when |
+|------|-----------|------------|----------|
+| `RedisVectorSearchTool` | Semantic (cosine/IP/L2) | Yes | Finding semantically similar content |
+| `RedisHybridSearchTool` | Vector + BM25 keyword | Yes | Best of both worlds: meaning and exact terms |
+| `RedisRangeSearchTool` | Vector within distance | Yes | All content within a similarity radius |
+| `RedisTextSearchTool` | BM25 full-text only | No | Keyword matching without embeddings |
+| `RedisSQLSearchTool` | SQL `SELECT` | No | Structured queries with filters and aggregations |
+
+### Hybrid Search
+
+Hybrid search combines vector similarity with BM25 keyword scoring. On Redis 8.4+ with `redisvl>=0.18.2`, the tool uses the native `FT.HYBRID` command; on older versions it falls back to an aggregation-based query.
+
+```python
+from adk_redis import RedisHybridSearchTool, RedisHybridQueryConfig
+
+tool = RedisHybridSearchTool(
+    index=index,
+    vectorizer=vectorizer,
+    config=RedisHybridQueryConfig(
+        text_field_name="description",
+        combination_method="LINEAR",
+        linear_alpha=0.7,  # 70% text, 30% vector
+        num_results=10,
+    ),
+)
+```
+
+### SQL Search
+
+The LLM writes SQL `SELECT` statements; the tool translates them into `FT.SEARCH` or `FT.AGGREGATE` calls.
+
+```python
+from adk_redis import RedisSQLSearchTool
+
+tool = RedisSQLSearchTool(index=index)
+# LLM emits: SELECT title, price FROM products WHERE category = 'electronics'
+```
+
+Install with `pip install 'adk-redis[sql]'`.
+
+## Option 2: RedisVL MCP Server
+
+Connect to RedisVL's MCP server (`rvl mcp`) using ADK's `McpToolset`. The server exposes schema-aware tools per index.
+
+```python
+from google.adk import Agent
+from google.adk.tools.mcp_tool import McpToolset
+from google.adk.tools.mcp_tool.mcp_session_manager import StdioConnectionParams
+from mcp import StdioServerParameters
+
+agent = Agent(
+    model="gemini-2.5-flash",
+    name="redis_mcp_agent",
+    tools=[
+        McpToolset(
+            connection_params=StdioConnectionParams(
+                server_params=StdioServerParameters(
+                    command="rvl",
+                    args=["mcp", "--config", "mcp.yaml", "--read-only"],
+                ),
+                timeout=30,
+            ),
+            tool_filter=["search-records"],
+        ),
+    ],
+)
+```
+
+For a remote server, swap `StdioConnectionParams` for `StreamableHTTPConnectionParams(url=..., headers={...})`.
+
+Install with `pip install 'redisvl[mcp]>=0.18.2'`.
+
+### MCP Tools
 
 | Tool | Description |
 |------|-------------|
-| `redis_vector_search` | Semantic search over a Redis vector index |
-| `redis_hybrid_search` | Combined vector + keyword search (native FT.HYBRID on Redis 8.4+, aggregation fallback elsewhere) |
-| `redis_range_search` | Vector search with a distance threshold |
-| `redis_text_search` | Keyword full-text search via BM25 |
-| `redis_sql_search` | SQL `SELECT` against a bound index via `redisvl.query.SQLQuery`. Requires the `adk-redis[sql]` extra. |
+| `search-records` | Schema-aware search (vector, fulltext, or hybrid, chosen at server start) |
+| `upsert-records` | Writes records to the index. Disable with `--read-only` on the server |
+
+## In-Process vs MCP Decision
 
-In addition to the in-process Python tools, you can connect an agent to RedisVL's own MCP server (one index per server) using ADK's standard `McpToolset` pointed at a running `rvl mcp` instance. The server exposes schema-aware `search-records` and `upsert-records` tools and is useful when the same index needs to be served to multiple agents or non-Python clients. See the [search tools how-to](../user_guide/how_to_guides/search_tools.md) for the decision matrix and code samples.
+| | In-process tools | MCP (`rvl mcp`) |
+|---|---|---|
+| **Control** | Full config over query params, vectorizer | Schema-driven, server chooses search mode |
+| **Multi-language** | Python only | Any MCP client (Python, TypeScript, Claude Desktop) |
+| **Shared index** | Each agent connects directly | Multiple agents share one server |
+| **Deployment** | No extra service | Requires running `rvl mcp` process |
+| **Read-only guard** | Application-level | `--read-only` flag on server |
 
 ## Indexing
 
@@ -36,6 +162,15 @@ index.create(overwrite=True)
 index.load(documents)
 ```
 
-## Relationship to RedisVL
+Launch with the [ADK web UI](https://google.github.io/adk-docs/runtime/) for interactive testing:
+
+```bash
+adk web .
+```
+
+## Next Steps
 
-The search tools are thin wrappers around RedisVL query types (`VectorQuery`, `FilterQuery`). They translate the agent's natural language query into a structured RedisVL search.
+- [Search tools how-to](../user_guide/how_to_guides/search_tools.md) for complete code samples.
+- [RedisVL MCP search example](https://github.com/redis-developer/adk-redis/tree/main/examples/redisvl_mcp_search) for a working agent.
+- [Sessions + Memory](sessions.md) for cross-session knowledge (different from index search).
+- [ADK runtime options](https://google.github.io/adk-docs/runtime/) for `adk web`, `adk run`, and `adk api_server`.
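The `LINEAR` combination used in the hybrid search snippet of `docs/concepts/search.md` can be illustrated with a small scoring sketch. It follows the weighting stated in that snippet's comment (`linear_alpha=0.7` meaning 70% text, 30% vector) and assumes both scores are already normalized to the range 0 to 1. The function names and sample documents are hypothetical; this is not how RedisVL or `FT.HYBRID` computes scores internally.

```python
def linear_hybrid_score(text_score, vector_score, alpha=0.7):
    """Blend two normalized scores; alpha is the weight given to the text score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * text_score + (1.0 - alpha) * vector_score


def rank(docs, alpha=0.7, num_results=10):
    """Return document ids ordered by blended score, best first."""
    scored = [
        (linear_hybrid_score(d["text_score"], d["vector_score"], alpha), d["id"])
        for d in docs
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:num_results]]


# Hypothetical, pre-normalized scores for three documents.
docs = [
    {"id": "a", "text_score": 0.9, "vector_score": 0.2},   # strong keyword match
    {"id": "b", "text_score": 0.3, "vector_score": 0.95},  # strong semantic match
    {"id": "c", "text_score": 0.6, "vector_score": 0.6},   # balanced
]

text_heavy = rank(docs, alpha=0.7)
vector_heavy = rank(docs, alpha=0.2)
```

With a text-heavy weight the keyword match `a` ranks first; shifting the weight toward the vector score promotes `b`. That trade-off is what `linear_alpha` tunes.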
diff --git a/docs/concepts/sessions.md b/docs/concepts/sessions.md
index 640eb02..cd59dd7 100644
--- a/docs/concepts/sessions.md
+++ b/docs/concepts/sessions.md
@@ -1,27 +1,162 @@
-# Sessions
+# Sessions + Memory with Services
 
-`RedisSessionService` stores ADK session state in Redis, making it durable and shareable across processes.
+Use `RedisWorkingMemorySessionService` and `RedisLongTermMemoryService` when you want the ADK `Runner` to manage sessions and memory automatically. Plug them in and let the framework handle the rest.
 
-## How sessions are stored
+## Quick Reference
 
-Each session is a Redis JSON document keyed by `adk:session:{app_name}:{user_id}:{session_id}`. The document contains:
+| Feature | Details |
+|---------|---------|
+| **Session storage** | Agent Memory Server working memory (Redis JSON) |
+| **Long-term memory** | Agent Memory Server with vector + full-text indexes |
+| **Auto-summarization** | Older messages are summarized once the conversation exceeds `context_window_max` tokens |
+| **Memory extraction** | Background promotion of facts to long-term storage |
+| **Search** | Semantic, keyword, and hybrid search across sessions |
+| **Multi-process** | Safe for horizontal scaling; all state lives in Redis |
 
-- **Messages** - The conversation history (user, agent, tool calls)
-- **State** - Arbitrary key-value data for the session
-- **Metadata** - Timestamps, app name, user ID
+## How It Works
 
-## TTL behavior
+```mermaid
+flowchart TD
+    U([User message]) --> R[ADK Runner]
+    R -->|append_event| WM[Working Memory<br/>messages · context · data]
+    WM -->|auto-summarize| WM
+    WM -->|background extraction| LTM[Long-Term Memory<br/>vector + full-text index]
+    LTM -->|search_memory| R
+    R --> A([Agent response])
 
-Sessions support configurable TTL:
+    subgraph AMS [Agent Memory Server]
+        WM
+        LTM
+    end
 
-- TTL is refreshed on every read or write
-- Expired sessions are automatically cleaned up by Redis
-- Default: no expiration (persistent until explicitly deleted)
+    subgraph Redis [Redis 8.4+]
+        J[(JSON storage)]
+        V[(Vector index)]
+        FT[(Full-text index)]
+    end
 
-## Cross-process semantics
+    AMS --- Redis
+```
 
-Because sessions are stored in Redis (not in memory), multiple processes can share the same session. This enables:
+1. The ADK `Runner` calls `append_event()` after every turn, forwarding the message to the Agent Memory Server.
+2. When the conversation exceeds `context_window_max` tokens, the server summarizes older messages and stores the summary in a `context` field.
+3. A background task extracts structured memories (facts, preferences, events) and promotes them to long-term storage.
+4. In future sessions, `search_memory()` retrieves relevant memories via hybrid search.
 
-- Horizontal scaling of ADK agents
-- Seamless failover between instances
-- Background workers that access session state
+## Usage
+
+```python
+from google.adk.agents import Agent
+from google.adk.runners import Runner
+
+from adk_redis import (
+    RedisLongTermMemoryService,
+    RedisLongTermMemoryServiceConfig,
+    RedisWorkingMemorySessionService,
+    RedisWorkingMemorySessionServiceConfig,
+)
+
+session_service = RedisWorkingMemorySessionService(
+    config=RedisWorkingMemorySessionServiceConfig(
+        api_base_url="http://localhost:8000",
+        default_namespace="my_app",
+        model_name="gpt-4o",
+        context_window_max=8000,
+    ),
+)
+
+memory_service = RedisLongTermMemoryService(
+    config=RedisLongTermMemoryServiceConfig(
+        api_base_url="http://localhost:8000",
+        default_namespace="my_app",
+        recency_boost=True,
+    ),
+)
+
+agent = Agent(
+    model="gemini-2.0-flash",
+    name="my_agent",
+    instruction="You are a helpful assistant with memory.",
+)
+
+runner = Runner(
+    agent=agent,
+    app_name="my_app",
+    session_service=session_service,
+    memory_service=memory_service,
+)
+```
+
+Launch with the [ADK web UI](https://google.github.io/adk-docs/runtime/) for interactive testing:
+
+```bash
+adk web .
+```
+
+## Configuration
+
+### Session Service (`RedisWorkingMemorySessionServiceConfig`)
+
+| Option | Default | Description |
+|--------|---------|-------------|
+| `api_base_url` | `http://localhost:8000` | Agent Memory Server URL |
+| `timeout` | `30.0` | HTTP request timeout in seconds |
+| `default_namespace` | `None` | Logical grouping for multi-tenant isolation |
+| `model_name` | `None` | Model name used for context window sizing and summarization |
+| `context_window_max` | `None` | Token limit that triggers auto-summarization |
+| `extraction_strategy` | `discrete` | How memories are extracted (`discrete`, `summary`, `preferences`, `custom`) |
+| `session_ttl_seconds` | `None` | Optional TTL; expired sessions are cleaned up by Redis |
+
+### Memory Service (`RedisLongTermMemoryServiceConfig`)
+
+| Option | Default | Description |
+|--------|---------|-------------|
+| `api_base_url` | `http://localhost:8000` | Agent Memory Server URL |
+| `timeout` | `30.0` | HTTP request timeout in seconds |
+| `default_namespace` | `None` | Namespace for memory isolation |
+| `search_top_k` | `10` | Max results returned from `search_memory()` |
+| `distance_threshold` | `None` | Max vector distance for search results (0.0-1.0) |
+| `recency_boost` | `True` | Bias search scoring toward newer memories |
+| `semantic_weight` | `0.8` | Weight for semantic similarity (0.0-1.0) |
+| `recency_weight` | `0.2` | Weight for recency score (0.0-1.0) |
+| `extraction_strategy` | `discrete` | How memories are extracted (`discrete`, `summary`, `preferences`, `custom`) |
+
+## Automatic Summarization
+
+When conversation messages exceed `context_window_max` tokens, the server:
+
+1. Summarizes older messages into a compact paragraph.
+2. Stores the summary in the `context` field of working memory.
+3. Removes the summarized messages to free space.
+4. Keeps recent messages intact.
+
+```mermaid
+flowchart LR
+    M["msg1 msg2 ... msg10"] -->|exceeds threshold| S[Summarize]
+    S --> C["context: 'User discussed trip planning...'"]
+    S --> K["msg8 msg9 msg10<br/>(recent kept)"]
+```
+
+## Memory Types
+
+The server stores three types of memories derived from conversations:
+
+| Type | Description | Example |
+|------|-------------|---------|
+| **Semantic** | Facts, preferences, general knowledge | "User prefers window seats" |
+| **Episodic** | Events with temporal context | "User visited Paris in March 2024" |
+| **Message** | Conversation records (auto-generated) | Stored from working memory messages |
+
+## Cross-Process Scaling
+
+Because all state lives in the Agent Memory Server (backed by Redis), multiple processes can share sessions:
+
+- **Horizontal scaling**: deploy multiple agent replicas behind a load balancer.
+- **Seamless failover**: if one instance goes down, another picks up the session.
+- **Background workers**: separate processes can read session state for analytics.
+
+## Next Steps
+
+- [Session service how-to](../user_guide/how_to_guides/session_service.md) for setup details.
+- [Memory service how-to](../user_guide/how_to_guides/memory_service.md) for memory configuration.
+- [Sessions + Memory MCP + Tools](memory.md) for the MCP-based alternative.
+- [Fitness coach example](https://github.com/redis-developer/adk-redis/tree/main/examples/fitness_coach_mcp) for a working agent.
diff --git a/docs/examples/fitness_coach.md b/docs/examples/fitness_coach.md
deleted file mode 100644
index 2ca3620..0000000
--- a/docs/examples/fitness_coach.md
+++ /dev/null
@@ -1,23 +0,0 @@
-# Fitness Coach
-
-A fitness coaching agent that uses MCP for memory operations and Redis for session persistence.
-
-## What it demonstrates
-
-- `RedisSessionService` for durable sessions
-- MCP integration for accessing Agent Memory Server
-- Multi-turn conversation with personalized fitness advice
-
-## Running
-
-```bash
-cd examples/fitness_coach_mcp
-pip install -r requirements.txt
-python main.py
-```
-
-## Architecture
-
-The fitness coach stores workout preferences and history in long-term memory via MCP. Each session is backed by Redis, so the agent can be restarted without losing conversation context.
-
-See the [full source on GitHub](https://github.com/redis-developer/adk-redis/tree/main/examples/fitness_coach_mcp).
diff --git a/docs/examples/index.md b/docs/examples/index.md
index fab8df6..15395c4 100644
--- a/docs/examples/index.md
+++ b/docs/examples/index.md
@@ -4,26 +4,29 @@ description: Worked agents built with adk-redis and the Google ADK.
 
 # Examples
 
-Worked agents built with the Google ADK and Redis backends.
+Runnable agents built with the Google ADK and Redis. Each entry links to its
+source directory and summarizes what it demonstrates.
 
-<div class="grid cards" markdown>
+## Memory and Sessions
 
--   :material-run-fast:{ .lg .middle } **[Fitness coach](fitness_coach.md)**
+| Example | What it shows |
+|---------|---------------|
+| [**Simple Redis memory**](https://github.com/redis-developer/adk-redis/tree/main/examples/simple_redis_memory) | Minimal agent with `RedisWorkingMemorySessionService` and `RedisLongTermMemoryService`. |
+| [**Fitness coach (MCP)**](https://github.com/redis-developer/adk-redis/tree/main/examples/fitness_coach_mcp) | MCP-based memory with `McpToolset` and Agent Memory Server. |
+| [**Travel agent (hybrid)**](https://github.com/redis-developer/adk-redis/tree/main/examples/travel_agent_memory_hybrid) | Framework-managed sessions + memory with vector search over travel docs. |
+| [**Travel agent (tools)**](https://github.com/redis-developer/adk-redis/tree/main/examples/travel_agent_memory_tools) | Same travel agent using LLM-controlled memory tools instead of framework services. |
 
-    ---
+## Search
 
-    A coaching agent that remembers user goals and progress over time.
+| Example | What it shows |
+|---------|---------------|
+| [**Redis search tools**](https://github.com/redis-developer/adk-redis/tree/main/examples/redis_search_tools) | Vector, text, and range search tools in one agent. |
+| [**SQL search**](https://github.com/redis-developer/adk-redis/tree/main/examples/redis_sql_search) | `RedisSQLSearchTool` answering catalog questions via parameterized SQL. |
+| [**RedisVL MCP search**](https://github.com/redis-developer/adk-redis/tree/main/examples/redisvl_mcp_search) | Same knowledge base served via `rvl mcp` over MCP. |
 
--   :material-airplane:{ .lg .middle } **[Travel agent (hybrid)](travel_agent_hybrid.md)**
+## Semantic Caching
 
-    ---
-
-    Combines vector and structured search for itinerary planning.
-
--   :material-magnify:{ .lg .middle } **[Redis search tools](redis_search_tools.md)**
-
-    ---
-
-    Drop-in tools that expose Redis FT.SEARCH to an ADK agent.
-
-</div>
+| Example | What it shows |
+|---------|---------------|
+| [**Semantic cache (RedisVL)**](https://github.com/redis-developer/adk-redis/tree/main/examples/semantic_cache) | Self-hosted semantic cache with `RedisVLCacheProvider`. |
+| [**LangCache cache**](https://github.com/redis-developer/adk-redis/tree/main/examples/langcache_cache) | Managed semantic cache with `LangCacheProvider`. |
diff --git a/docs/examples/redis_search_tools.md b/docs/examples/redis_search_tools.md
deleted file mode 100644
index e61568f..0000000
--- a/docs/examples/redis_search_tools.md
+++ /dev/null
@@ -1,19 +0,0 @@
-# Redis Search Tools
-
-An ADK agent with RedisVL-backed semantic search over custom documents.
-
-## What it demonstrates
-
-- Creating RedisVL search tools for ADK
-- Indexing documents into Redis
-- Agent using search results as context for answers
-
-## Running
-
-```bash
-cd examples/redis_search_tools
-pip install -r requirements.txt
-python main.py
-```
-
-See the [full source on GitHub](https://github.com/redis-developer/adk-redis/tree/main/examples/redis_search_tools).
diff --git a/docs/examples/travel_agent_hybrid.md b/docs/examples/travel_agent_hybrid.md
deleted file mode 100644
index d105a6c..0000000
--- a/docs/examples/travel_agent_hybrid.md
+++ /dev/null
@@ -1,25 +0,0 @@
-# Travel Agent (Hybrid Memory)
-
-A travel planning agent using both session-scoped memory and long-term persistent memory.
-
-## What it demonstrates
-
-- `RedisSessionService` for conversation state
-- `RedisMemoryService` for long-term user preferences
-- Hybrid memory pattern: session + persistent
-
-## Running
-
-```bash
-cd examples/travel_agent_memory_hybrid
-pip install -r requirements.txt
-python main.py
-```
-
-## How it works
-
-1. The agent starts a session and loads the user's travel preferences from long-term memory
-2. During the conversation, it plans trips based on preferences and real-time input
-3. At session end, new preferences are extracted and stored in long-term memory
-
-See the [full source on GitHub](https://github.com/redis-developer/adk-redis/tree/main/examples/travel_agent_memory_hybrid).
diff --git a/docs/index.md b/docs/index.md
index 77a7591..4fa0af8 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -1,5 +1,5 @@
 ---
-description: adk-redis documentation. Redis backends for the Google Agent Development Kit.
+description: adk-redis documentation. Redis integrations for the Google Agent Development Kit.
 ---
 
 <div class="rds-hero" markdown>
@@ -8,7 +8,7 @@ description: adk-redis documentation. Redis backends for the Google Agent Develo
 
 # adk-redis
 
-Redis backends for the Google Agent Development Kit
+Redis integrations for the Google Agent Development Kit
 { .rds-hero__tagline }
 
 </div>
@@ -23,7 +23,7 @@ pip install adk-redis
 docker run -d --name redis -p 6379:6379 redis:8
 ```
 
-→ *[Integration walkthrough](user_guide/01_integration.md)*
+→ *[Quickstart](user_guide/01_integration.md)*
 
 ---
 
@@ -47,7 +47,7 @@ docker run -d --name redis -p 6379:6379 redis:8
 
     ---
 
-    Worked agents: fitness coach, hybrid travel agent, Redis search tools.
+    Nine runnable agents covering memory, search, and caching.
 
 -   :material-api:{ .lg .middle } **[API Reference](api/index.md)**
 
diff --git a/docs/llms.txt b/docs/llms.txt
index 17d6e6a..fd6f8ae 100644
--- a/docs/llms.txt
+++ b/docs/llms.txt
@@ -1,6 +1,6 @@
 # adk-redis
 
-> Redis backends for Google's Agent Development Kit. Provides ADK
+> Redis integrations for Google's Agent Development Kit. Provides ADK
 > `BaseSessionService` and `BaseMemoryService` implementations, five
 > RedisVL-backed search tools, MCP toolsets for RedisVL and Agent Memory
 > Server, and semantic cache providers (self-hosted RedisVL or managed
@@ -17,10 +17,11 @@ For task-oriented recipes, browse the
 - [Sessions](concepts/sessions.md): working memory backed by Agent Memory Server.
 - [Memory](concepts/memory.md): long-term semantic memory with recency boosting.
 - [Search](concepts/search.md): vector, hybrid, range, text, and SQL search over RedisVL.
+- [Caching](concepts/caching.md): semantic caching with RedisVL or LangCache.
 
 ## User guide
 
-- [Integration walkthrough](user_guide/01_integration.md): end-to-end wiring of services into an ADK Runner.
+- [Quickstart](user_guide/01_integration.md): three steps to a running agent with Redis sessions and memory.
 - [How-to guides index](user_guide/how_to_guides/index.md): pointers to setup and task guides.
 
 ## How-to guides
@@ -28,8 +29,9 @@ For task-oriented recipes, browse the
 - [Redis setup](user_guide/how_to_guides/redis_setup.md): start Redis 8.4 locally or in Redis Cloud.
 - [Memory server setup](user_guide/how_to_guides/memory_server_setup.md): run Agent Memory Server.
 - [Session service](user_guide/how_to_guides/session_service.md): wire `RedisWorkingMemorySessionService`.
-- [Memory service](user_guide/how_to_guides/memory_service.md): wire `RedisLongTermMemoryService` and the memory tools.
+- [Memory service](user_guide/how_to_guides/memory_service.md): wire `RedisLongTermMemoryService`.
 - [Search tools](user_guide/how_to_guides/search_tools.md): use the five search tools and the RedisVL MCP toolset.
+- [Semantic cache](user_guide/how_to_guides/semantic_cache.md): add self-hosted or managed semantic caching.
 
 ## API reference
 
diff --git a/docs/stylesheets/redis-brand.css b/docs/stylesheets/redis-brand.css
index b56643c..49a1ee2 100644
--- a/docs/stylesheets/redis-brand.css
+++ b/docs/stylesheets/redis-brand.css
@@ -30,6 +30,17 @@
   --md-accent-fg-color--transparent: rgba(255, 68, 56, 0.1);
 }
 
+/* Layout: pin the left sidebar to the viewport's left edge and the right
+ * TOC to the viewport's right edge by removing Material's centered grid
+ * cap. Sidebars keep their default 12.1rem widths; the content column
+ * absorbs all remaining space. Header/tabs use the same grid so they
+ * also span full width, keeping the layout consistent. */
+@media screen and (min-width: 76.25em) {
+  .md-grid {
+    max-width: none;
+  }
+}
+
 /* Bold every top-level entry in the left sidebar. With navigation.sections,
  * group labels (Concepts, User Guide, ...) get the section-title treatment,
  * but single-page top-level items like Home stay regular weight. This rule
diff --git a/docs/user_guide/01_integration.md b/docs/user_guide/01_integration.md
index 13518e1..775cfdf 100644
--- a/docs/user_guide/01_integration.md
+++ b/docs/user_guide/01_integration.md
@@ -1,67 +1,23 @@
-# Integration Guide
+# Quickstart
 
-> **Canonical docs:** The full integration guide is published at [redis.io/docs/latest/integrate/google-adk/](https://redis.io/docs/latest/integrate/google-adk/). This file is a quick-reference for contributors working in the repo.
+Get an ADK agent running with Redis-backed sessions and long-term memory in
+three steps. For the concepts behind each feature, see the
+[Concepts](../concepts/index.md) section.
 
-Complete guide for integrating Redis Agent Memory Server with adk-redis.
+> **Full guide on redis.io:**
+> [redis.io/docs/latest/integrate/google-adk/](https://redis.io/docs/latest/integrate/google-adk/)
 
-## Architecture
+## 1. Start infrastructure
 
-```
-┌─────────────────────────────────────────────────────────────────┐
-│                         ADK Application                         │
-│  ┌──────────────────────────────────────────────────────────┐   │
-│  │                      ADK Agent                           │   │
-│  │  ┌────────────────────┐    ┌──────────────────────────┐  │   │
-│  │  │ Session Service    │    │   Memory Service         │  │   │
-│  │  │ (Working Memory)   │    │   (Long-Term Memory)     │  │   │
-│  │  └────────┬───────────┘    └──────────┬───────────────┘  │   │
-│  └───────────┼────────────────────────────┼──────────────────┘   │
-└──────────────┼────────────────────────────┼──────────────────────┘
-               │                            │
-               │    HTTP API (port 8000)    │
-               ▼                            ▼
-┌─────────────────────────────────────────────────────────────────┐
-│              Redis Agent Memory Server                          │
-│  ┌──────────────────────┐    ┌──────────────────────────────┐  │
-│  │  Working Memory API  │    │  Long-Term Memory API        │  │
-│  │ - Session messages  │    │ - Semantic search           │  │
-│  │ - Auto-summarize    │    │ - Memory extraction         │  │
-│  │ - Context window    │    │ - Recency boosting          │  │
-│  └──────────┬───────────┘    └──────────┬───────────────────┘  │
-└─────────────┼────────────────────────────┼──────────────────────┘
-              │                            │
-              │    Redis Protocol          │
-              ▼                            ▼
-┌─────────────────────────────────────────────────────────────────┐
-│                         Redis 8.4+                              │
-│ - JSON storage                                                 │
-│ - Vector search (Redis Query Engine)                           │
-│ - Full-text search                                             │
-│ - Persistence                                                  │
-└─────────────────────────────────────────────────────────────────┘
-```
-
-## Component Responsibilities
-
-| Component | Responsibility |
-|-----------|----------------|
-| **ADK Agent** | Agent logic, tool execution, response generation |
-| **adk-redis Session Service** | Implements ADK's `BaseSessionService` interface |
-| **adk-redis Memory Service** | Implements ADK's `BaseMemoryService` interface |
-| **Agent Memory Server** | Memory extraction, summarization, vector search |
-| **Redis 8.4+** | Data storage, vector indexing, full-text search, persistence |
-
----
-
-## Complete Setup
-
-### 1. Start Infrastructure
+Follow the [Redis setup](how_to_guides/redis_setup.md) and
+[Agent Memory Server setup](how_to_guides/memory_server_setup.md) how-to guides,
+or use this minimal start:
 
 ```bash
-# Start Redis 8.4
+# Redis 8.4
 docker run -d --name redis -p 6379:6379 redis:8.4-alpine
 
-# Start Agent Memory Server
+# Agent Memory Server (dev mode)
 docker run -d --name agent-memory-server \
   -p 8000:8000 \
   -e REDIS_URL=redis://host.docker.internal:6379 \
@@ -77,53 +33,48 @@ docker run -d --name agent-memory-server \
 curl http://localhost:8000/v1/health
 ```
 
-> **Note**: Redis 8.4 includes the Redis Query Engine (evolved from RediSearch) with native support for vector search, full-text search, and JSON operations. Redis Stack is no longer needed.
-
-**Note:** On Linux, replace `host.docker.internal` with `172.17.0.1` or use `--network host` mode.
+!!! note
+    On Linux, replace `host.docker.internal` with `172.17.0.1` or use
+    `--network host`.
 
-### 2. Install Dependencies
+## 2. Install dependencies
 
 ```bash
 pip install google-adk "adk-redis[memory]"
 ```
 
-### 3. Configure Services
+## 3. Wire services into an agent
 
 ```python
 from google.adk import Agent
 from google.adk.runners import Runner
-from adk_redis.memory import RedisLongTermMemoryService, RedisLongTermMemoryServiceConfig
-from adk_redis.sessions import RedisWorkingMemorySessionService, RedisWorkingMemorySessionServiceConfig
+from adk_redis import (
+    RedisWorkingMemorySessionService,
+    RedisWorkingMemorySessionServiceConfig,
+    RedisLongTermMemoryService,
+    RedisLongTermMemoryServiceConfig,
+)
 
-# Configure session service (Tier 1: Working Memory)
-session_config = RedisWorkingMemorySessionServiceConfig(
-    api_base_url="http://localhost:8000",
-    default_namespace="my_app",
-    model_name="gpt-4o",
-    context_window_max=8000,
-    extraction_strategy="discrete",
+session_service = RedisWorkingMemorySessionService(
+    config=RedisWorkingMemorySessionServiceConfig(
+        api_base_url="http://localhost:8000",
+        default_namespace="my_app",
+    )
 )
-session_service = RedisWorkingMemorySessionService(config=session_config)
 
-# Configure memory service (Tier 2: Long-Term Memory)
-memory_config = RedisLongTermMemoryServiceConfig(
-    api_base_url="http://localhost:8000",
-    default_namespace="my_app",
-    extraction_strategy="discrete",
-    recency_boost=True,
-    semantic_weight=0.8,
-    recency_weight=0.2,
+memory_service = RedisLongTermMemoryService(
+    config=RedisLongTermMemoryServiceConfig(
+        api_base_url="http://localhost:8000",
+        default_namespace="my_app",
+    )
 )
-memory_service = RedisLongTermMemoryService(config=memory_config)
 
-# Create agent
 agent = Agent(
     name="memory_agent",
     model="gemini-2.0-flash",
     instruction="You are a helpful assistant with long-term memory.",
 )
 
-# Create runner with both services
 runner = Runner(
     agent=agent,
     app_name="my_app",
@@ -132,256 +83,27 @@ runner = Runner(
 )
 ```
 
----
-
-## Configuration Reference
-
-### RedisWorkingMemorySessionServiceConfig
-
-| Parameter | Type | Default | Description |
-|-----------|------|---------|-------------|
-| `api_base_url` | `str` | `http://localhost:8000` | Agent Memory Server URL |
-| `default_namespace` | `str` | `None` | Namespace for session isolation |
-| `model_name` | `str` | `None` | LLM model for summarization |
-| `context_window_max` | `int` | `None` | Max tokens before auto-summarization |
-| `extraction_strategy` | `str` | `discrete` | `discrete`, `summary`, `preferences`, `custom` |
-| `session_ttl_seconds` | `int` | `None` | Session expiration time |
-| `timeout` | `float` | `30.0` | HTTP request timeout |
-
-### RedisLongTermMemoryServiceConfig
-
-| Parameter | Type | Default | Description |
-|-----------|------|---------|-------------|
-| `api_base_url` | `str` | `http://localhost:8000` | Agent Memory Server URL |
-| `default_namespace` | `str` | `None` | Namespace for memory isolation |
-| `search_top_k` | `int` | `10` | Max memories per search |
-| `distance_threshold` | `float` | `None` | Max distance for search results (0.0-1.0) |
-| `recency_boost` | `bool` | `True` | Enable recency-aware ranking |
-| `semantic_weight` | `float` | `0.8` | Weight for semantic similarity (0.0-1.0) |
-| `recency_weight` | `float` | `0.2` | Weight for recency score (0.0-1.0) |
-| `extraction_strategy` | `str` | `discrete` | `discrete`, `summary`, `preferences`, `custom` |
-| `timeout` | `float` | `30.0` | HTTP request timeout |
-
----
-
-## Running Examples
-
-### Memory Example
-
-```bash
-cd examples/simple_redis_memory
-pip install "adk-redis[web]"
-
-# Set environment
-export GOOGLE_API_KEY=your-google-key
-export REDIS_MEMORY_SERVER_URL=http://localhost:8000
-
-# Run
-python main.py
-```
-
-Open http://localhost:8080
-
-**Test conversation:**
-1. Session 1: "Hi, I'm Alice. I love pizza and Python programming."
-2. Wait 5 seconds for memory extraction
-3. Session 2 (new session): "What do you remember about me?"
-
-### Search Tools Example
+Launch the agent with the ADK runtime
+([`adk web`](https://google.github.io/adk-docs/runtime/)):
 
 ```bash
-cd examples/redis_search_tools
-pip install adk-redis
-
-# Set environment
-export REDIS_URL=redis://localhost:6379
-export GOOGLE_API_KEY=your-google-key
-
-# Load data
-python load_data.py
-
-# Run agent
-adk web redis_search_tools_agent
-```
-
----
-
-## Data Flow
-
-### Session Message Flow
-
-```
-1. User sends message
-   ↓
-2. ADK Agent processes with RedisWorkingMemorySessionService
-   ↓
-3. Session service stores message in Agent Memory Server (Working Memory API)
-   ↓
-4. Agent Memory Server stores in Redis
-   ↓
-5. Background task extracts memories to Long-Term Memory
+adk web my_app
 ```
 
-### Memory Search Flow
-
-```
-1. ADK Agent needs context
-   ↓
-2. RedisLongTermMemoryService.search_memory() called
-   ↓
-3. Query sent to Agent Memory Server (Long-Term Memory API)
-   ↓
-4. Agent Memory Server performs vector search in Redis
-   ↓
-5. Results ranked with recency boosting
-   ↓
-6. Memories returned to agent
-```
-
----
-
-## Three Ways to Use Memory
-
-adk-redis provides three approaches for memory integration:
-
-### 1. Memory Services (Framework-Managed)
-
-Best for: Full ADK integration with automatic memory management.
-
-```python
-from adk_redis import (
-    RedisWorkingMemorySessionService,
-    RedisLongTermMemoryService,
-)
-
-# Framework handles memory automatically
-runner = Runner(
-    agent=agent,
-    session_service=session_service,
-    memory_service=memory_service,
-)
-```
+**Try it out:**
 
-### 2. REST-Based Tools (LLM-Controlled)
-
-Best for: Explicit LLM control over memory operations.
-
-```python
-from adk_redis import (
-    SearchMemoryTool,
-    GetMemoryTool,
-    CreateMemoryTool,
-    UpdateMemoryTool,
-    DeleteMemoryTool,
-    MemoryToolConfig,
-)
-
-config = MemoryToolConfig(
-    api_base_url="http://localhost:8000",
-    default_namespace="my_app",
-)
-
-agent = Agent(
-    name="memory_agent",
-    tools=[
-        SearchMemoryTool(config=config),
-        GetMemoryTool(config=config),
-        CreateMemoryTool(config=config),
-    ],
-)
-```
-
-### 3. MCP-Based Tools (Protocol-Based)
-
-Best for: MCP ecosystem integration and standardized tool discovery. Use ADK's native `McpToolset` with `SseConnectionParams` to connect to Agent Memory Server's SSE endpoint.
-
-```python
-from google.adk import Agent
-from google.adk.tools.mcp_tool import McpToolset
-from google.adk.tools.mcp_tool.mcp_session_manager import SseConnectionParams
-
-memory_tools = McpToolset(
-    connection_params=SseConnectionParams(url="http://localhost:8000/sse"),
-    tool_filter=["search_long_term_memory", "create_long_term_memories"],
-)
-
-agent = Agent(
-    name="memory_agent",
-    tools=[memory_tools],
-)
-```
-
-See the [fitness_coach_mcp example](../examples/fitness_coach_mcp/) for a complete MCP integration example.
-
-### Decision Matrix
-
-| Use Case | Recommended Approach |
-|----------|---------------------|
-| Full ADK integration | Memory Services |
-| LLM decides when to remember | REST Tools |
-| MCP ecosystem | MCP Tools (native `McpToolset`) |
-| Debugging/development | REST Tools |
-| Multi-agent systems | MCP Tools |
-
----
-
-## Troubleshooting
-
-### No memories found
-
-**Cause:** Memory extraction hasn't completed
-
-**Solution:** Wait 5-10 seconds after sending messages for background extraction
-
-### Connection refused
-
-**Cause:** Agent Memory Server not running
-
-**Solution:**
-```bash
-docker ps | grep agent-memory-server
-curl http://localhost:8000/v1/health
-```
-
-### Import errors
-
-**Cause:** Missing dependencies
-
-**Solution:**
-```bash
-pip install "adk-redis[memory]"
-```
-
----
-
-## Alternative: MCP Integration
-
-For MCP (Model Context Protocol) based integration, see the [fitness_coach_mcp example](../examples/fitness_coach_mcp/).
-
-MCP provides a standardized protocol for connecting agents to tools via Server-Sent Events (SSE):
-
-```python
-from google.adk import Agent
-from google.adk.tools.mcp_tool import McpToolset
-from google.adk.tools.mcp_tool.mcp_session_manager import SseConnectionParams
-
-# Connect to Agent Memory Server's MCP endpoint
-memory_tools = McpToolset(
-    connection_params=SseConnectionParams(url="http://localhost:9000/sse"),
-    tool_filter=["search_long_term_memory", "create_long_term_memories"],
-)
-
-agent = Agent(
-    model="gemini-2.0-flash",
-    name="my_agent",
-    tools=[memory_tools],
-)
-```
+1. "Hi, I'm Alice. I love pizza and Python."
+2. Wait for background memory extraction (about 5 seconds if `EXTRACTION_DEBOUNCE_SECONDS` is lowered to `5`; the server default is `300`).
+3. Start a new session: "What do you remember about me?"
 
-**When to use MCP vs Services:**
+## What next?
 
-| Approach | Best For |
-|----------|----------|
-| **ADK Services** | Full framework integration, automatic memory extraction |
-| **REST Tools** | LLM-controlled memory with explicit tool calls |
-| **MCP Tools** | Standard MCP protocol, automatic tool discovery |
+| Goal | Page |
+|------|------|
+| Understand sessions, memory, search, and caching | [Concepts](../concepts/index.md) |
+| Configure session or memory services in detail | [Session service how-to](how_to_guides/session_service.md), [Memory service how-to](how_to_guides/memory_service.md) |
+| Give the LLM explicit memory tools | [Sessions + Memory MCP + Tools](../concepts/memory.md) |
+| Add vector or hybrid search | [Search tools how-to](how_to_guides/search_tools.md) |
+| Reduce LLM cost with semantic caching | [Semantic Caching](../concepts/caching.md) |
+| See a full working example | [Examples](../examples/index.md) |
+| Run or deploy your agent | [ADK runtime](https://google.github.io/adk-docs/runtime/) |
diff --git a/docs/user_guide/how_to_guides/index.md b/docs/user_guide/how_to_guides/index.md
index 468bda4..e67fd2c 100644
--- a/docs/user_guide/how_to_guides/index.md
+++ b/docs/user_guide/how_to_guides/index.md
@@ -38,4 +38,10 @@ Task-oriented recipes for adk-redis.
 
     Expose Redis vector and lexical search as ADK tools.
 
+-   :material-cached:{ .lg .middle } **[Semantic cache](semantic_cache.md)**
+
+    ---
+
+    Skip repeat LLM calls with self-hosted (RedisVL) or managed (LangCache) caching.
+
 </div>
diff --git a/docs/user_guide/how_to_guides/memory_server_setup.md b/docs/user_guide/how_to_guides/memory_server_setup.md
index f9870b7..0cd2349 100644
--- a/docs/user_guide/how_to_guides/memory_server_setup.md
+++ b/docs/user_guide/how_to_guides/memory_server_setup.md
@@ -127,7 +127,7 @@ docker compose up -d
 | Variable | Default | Description |
 |----------|---------|-------------|
 | `DISABLE_AUTH` | `false` | Disable authentication (dev only) |
-| `GENERATION_MODEL` | `gpt-5` | LLM model for summarization and memory extraction |
+| `GENERATION_MODEL` | `gpt-4o` | LLM model for summarization and memory extraction |
 | `EMBEDDING_MODEL` | `text-embedding-3-small` | Embedding model for semantic search |
 | `REDISVL_VECTOR_DIMENSIONS` | `1536` | Embedding dimensions (required for some models like Ollama) |
 | `EXTRACTION_DEBOUNCE_SECONDS` | `300` | Debounce period (in seconds) for memory extraction. Lower values (e.g., 5) provide faster memory extraction, while higher values reduce API calls |
diff --git a/docs/user_guide/how_to_guides/memory_service.md b/docs/user_guide/how_to_guides/memory_service.md
index 5e21278..196ec0f 100644
--- a/docs/user_guide/how_to_guides/memory_service.md
+++ b/docs/user_guide/how_to_guides/memory_service.md
@@ -1,42 +1,77 @@
 # Memory Service
 
-This guide shows how to wire `RedisMemoryService` into a Google ADK agent for persistent long-term memory.
+This guide shows how to wire `RedisLongTermMemoryService` into a Google ADK
+agent for persistent long-term memory backed by the
+[Agent Memory Server](memory_server_setup.md).
+
+For the concepts behind long-term memory, see
+[Sessions + Memory Services](../../concepts/sessions.md).
 
 ## Prerequisites
 
-- Redis Agent Memory Server running on `localhost:8000`
-- `adk-redis` installed: `pip install adk-redis`
+- Agent Memory Server running on `localhost:8000`
+  (see [Memory server setup](memory_server_setup.md)).
+- `adk-redis` with the memory extra: `pip install "adk-redis[memory]"`.
 
 ## Basic usage
 
 ```python
-from google.adk.agents import Agent
-from adk_redis import RedisMemoryService
+from google.adk import Agent
+from google.adk.runners import Runner
+from adk_redis import (
+    RedisLongTermMemoryService,
+    RedisLongTermMemoryServiceConfig,
+    RedisWorkingMemorySessionService,
+    RedisWorkingMemorySessionServiceConfig,
+)
+
+session_service = RedisWorkingMemorySessionService(
+    config=RedisWorkingMemorySessionServiceConfig(
+        api_base_url="http://localhost:8000",
+        default_namespace="my_app",
+    )
+)
 
-# Create Redis-backed memory service
-memory_service = RedisMemoryService(
-    memory_server_url="http://localhost:8000",
-    namespace="my-app",
+memory_service = RedisLongTermMemoryService(
+    config=RedisLongTermMemoryServiceConfig(
+        api_base_url="http://localhost:8000",
+        default_namespace="my_app",
+        recency_boost=True,
+    )
 )
 
-# Use with your ADK agent
 agent = Agent(
     model="gemini-2.0-flash",
     name="my_agent",
     instruction="You are a helpful assistant with memory.",
 )
+
+runner = Runner(
+    agent=agent,
+    app_name="my_app",
+    session_service=session_service,
+    memory_service=memory_service,
+)
 ```
 
 ## How memories flow
 
-1. The agent converses with the user in a session
-2. At session end, the memory service extracts key facts
-3. Facts are stored in the Agent Memory Server as long-term memories
-4. On future sessions, the agent retrieves relevant memories via semantic search
+1. The agent converses with the user in a session.
+2. The Agent Memory Server extracts key facts in the background.
+3. Facts are stored as long-term memories with vector embeddings.
+4. In future sessions, the runner calls `search_memory()` and injects
+   relevant memories into the prompt automatically.
 
 ## Configuration options
 
 | Option | Default | Description |
 |--------|---------|-------------|
-| `memory_server_url` | `http://localhost:8000` | Agent Memory Server URL |
-| `namespace` | `default` | Memory namespace for isolation |
+| `api_base_url` | `http://localhost:8000` | Agent Memory Server URL |
+| `default_namespace` | `None` | Namespace for memory isolation |
+| `search_top_k` | `10` | Max memories per search |
+| `distance_threshold` | `None` | Max distance for results (0.0-1.0) |
+| `recency_boost` | `True` | Enable recency-aware ranking |
+| `semantic_weight` | `0.8` | Weight for semantic similarity |
+| `recency_weight` | `0.2` | Weight for recency score |
+| `extraction_strategy` | `discrete` | Extraction strategy: `discrete`, `summary`, `preferences`, or `custom` |
+| `timeout` | `30.0` | HTTP request timeout in seconds |
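Note on the ranking options above: `semantic_weight` and `recency_weight` read as the two weights of a blended relevance score. The following is a minimal sketch of that idea, assuming a plain linear combination of two scores in the 0.0 to 1.0 range. The diff does not state the formula the Agent Memory Server actually uses, so `Candidate`, `blended_score`, `rank`, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    semantic_similarity: float  # 0.0 to 1.0, higher means closer to the query
    recency_score: float        # 0.0 to 1.0, higher means more recent


def blended_score(candidate, semantic_weight=0.8, recency_weight=0.2):
    # ASSUMPTION: a linear blend of the two scores. The real server-side
    # ranking may use a different formula.
    return (
        semantic_weight * candidate.semantic_similarity
        + recency_weight * candidate.recency_score
    )


def rank(candidates, top_k=10, **weights):
    # Sort by blended score, best first, and keep at most top_k results.
    ordered = sorted(
        candidates, key=lambda c: blended_score(c, **weights), reverse=True
    )
    return ordered[:top_k]


candidates = [
    Candidate("old but on-topic", 0.90, 0.10),
    Candidate("recent, loosely related", 0.60, 0.95),
    Candidate("recent and on-topic", 0.85, 0.90),
]
ranked = rank(candidates, top_k=2)
```

With the default 0.8 / 0.2 split, a highly similar but old memory still outranks a recent but loosely related one. Shifting weight toward recency reverses that ordering.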
diff --git a/docs/user_guide/how_to_guides/redis_setup.md b/docs/user_guide/how_to_guides/redis_setup.md
index fc92b2e..2b349e5 100644
--- a/docs/user_guide/how_to_guides/redis_setup.md
+++ b/docs/user_guide/how_to_guides/redis_setup.md
@@ -6,21 +6,14 @@ This guide covers Redis deployment options for use with adk-redis.
 
 ## Deployment Options
 
-### Option 1: Redis Stack (Docker)
+### Option 1: Redis 8.4+ (Docker)
 
-**Use case:** Local development with search tools (RedisVL integration)
-
-**Features:**
-- Redis with Search & Query modules
-- JSON support
-- RedisInsight GUI on port 8001
+**Use case:** Recommended for all new projects. Redis 8.4 includes the Redis
+Query Engine (vector search, full-text search, JSON) with no extra modules.
 
 **Installation:**
 ```bash
-docker run -d --name redis-stack \
-  -p 6379:6379 \
-  -p 8001:8001 \
-  redis/redis-stack:latest
+docker run -d --name redis -p 6379:6379 redis:8.4-alpine
 ```
 
 **Connection URL:**
@@ -30,28 +23,19 @@ redis://localhost:6379
 
 **Verification:**
 ```bash
-# Check container status
-docker ps | grep redis-stack
-
 # Test connection
 redis-cli ping
 # Expected: PONG
 
-# Verify Search module
+# Verify the Query Engine is available
 redis-cli FT._LIST
 # Expected: (empty array) or list of indices
 ```
 
-**Ports:**
-- `6379`: Redis server
-- `8001`: RedisInsight GUI (http://localhost:8001)
-
-**Environment variables:** None required
-
 **Stop/Remove:**
 ```bash
-docker stop redis-stack
-docker rm redis-stack
+docker stop redis
+docker rm redis
 ```
 
 ---
@@ -72,7 +56,11 @@ docker rm redis-stack
 docker run -d --name agent-memory-server \
   -p 8000:8000 \
   -e REDIS_URL=redis://host.docker.internal:6379 \
-  -e OPENAI_API_KEY=your-openai-key \
+  -e GEMINI_API_KEY=your-gemini-api-key \
+  -e GENERATION_MODEL=gemini/gemini-2.0-flash-exp \
+  -e EMBEDDING_MODEL=gemini/text-embedding-004 \
+  -e EXTRACTION_DEBOUNCE_SECONDS=5 \
+  -e DISABLE_AUTH=true \
   redislabs/agent-memory-server:0.13.2 \
   agent-memory api --host 0.0.0.0 --port 8000 --task-backend=asyncio
 ```
@@ -85,7 +73,7 @@ http://localhost:8000
 **Verification:**
 ```bash
 # Health check
-curl http://localhost:8000/health
+curl http://localhost:8000/v1/health
 # Expected: {"status":"healthy"}
 
 # API docs (open in browser)
@@ -97,14 +85,14 @@ curl http://localhost:8000/health
 
 **Required environment variables:**
 - `REDIS_URL`: Redis connection string
- - Mac/Windows: `redis://host.docker.internal:6379`
- - Linux: `redis://172.17.0.1:6379` or use `--network host`
-- `OPENAI_API_KEY`: OpenAI API key for embeddings (or configure alternative provider)
+    - Mac/Windows: `redis://host.docker.internal:6379`
+    - Linux: `redis://172.17.0.1:6379` or use `--network host`
+- LLM provider API key (e.g. `GEMINI_API_KEY`, `OPENAI_API_KEY`)
 
 **Optional environment variables:**
 - `DISABLE_AUTH=true`: Disable authentication (development only)
-- `GENERATION_MODEL=gpt-4o`: LLM model for summarization
-- `EMBEDDING_MODEL=text-embedding-3-small`: Embedding model
+- `GENERATION_MODEL`: LLM model for summarization (e.g. `gemini/gemini-2.0-flash-exp`)
+- `EMBEDDING_MODEL`: Embedding model (e.g. `gemini/text-embedding-004`)
 
 **Stop/Remove:**
 ```bash
@@ -189,10 +177,10 @@ redis+sentinel://sentinel1:26379,sentinel2:26379/mymaster
 
 | Redis Deployment | Search Tools | Memory Services Backend | Use Case |
 |------------------|--------------|-------------------------|----------|
-| Redis Stack | ✅ | ✅ (via Agent Memory Server) | Local development |
-| Redis Cloud | ✅ | ✅ (via Agent Memory Server) | Production |
+| Redis 8.4+ | ✅ | ✅ (via Agent Memory Server) | Recommended for all new projects |
+| Redis Cloud | ✅ | ✅ (via Agent Memory Server) | Managed production |
 | Redis Enterprise | ✅ | ✅ (via Agent Memory Server) | Enterprise production |
 | Redis Sentinel | ✅ | ✅ (via Agent Memory Server) | High availability |
 
-**Note:** Memory services require both a Redis deployment AND Agent Memory Server. Search tools only need Redis with Search module.
+**Note:** Memory services require both a Redis deployment AND Agent Memory Server. Search tools only need Redis with the Query Engine.
 
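Note on the health check above: the `curl` call against `/v1/health` can also be scripted, for example to gate an agent's startup on the Agent Memory Server being reachable. This is a minimal standard-library sketch, assuming the response body is the `{"status":"healthy"}` JSON shown in the guide. `is_healthy` and `check_memory_server` are hypothetical helper names, not part of adk-redis.

```python
import json
import urllib.request


def is_healthy(body: bytes) -> bool:
    """Return True when the response body reports a healthy status."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Not valid JSON (or not valid UTF-8)
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def check_memory_server(
    base_url: str = "http://localhost:8000", timeout: float = 5.0
) -> bool:
    """Call the /v1/health endpoint and report whether the server is healthy."""
    url = base_url.rstrip("/") + "/v1/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200 and is_healthy(response.read())
    except OSError:
        # Covers connection refused, DNS failures, timeouts, and HTTP errors
        return False
```

Calling `check_memory_server()` before building the `Runner` turns a misconfigured `REDIS_URL` or a stopped container into an early, explicit failure rather than a request error on the first turn.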
diff --git a/docs/user_guide/how_to_guides/semantic_cache.md b/docs/user_guide/how_to_guides/semantic_cache.md
new file mode 100644
index 0000000..49d6dfb
--- /dev/null
+++ b/docs/user_guide/how_to_guides/semantic_cache.md
@@ -0,0 +1,133 @@
+# Semantic Cache
+
+This guide shows how to add semantic caching to a Google ADK agent so that
+near-duplicate prompts return a cached LLM response instead of making a new
+call.
+
+For the concepts behind semantic caching, see
+[Semantic Caching](../../concepts/caching.md).
+
+## Option A: Self-hosted with RedisVL
+
+Use `RedisVLCacheProvider` when you run your own Redis instance and want full
+control over the vectorizer and cache index.
+
+### Prerequisites
+
+- Redis 8.4+ running locally (see [Redis setup](redis_setup.md)).
+- `adk-redis` with the search extra: `pip install 'adk-redis[search]'`.
+
+### Setup
+
+```python
+from google.adk import Agent
+from redisvl.utils.vectorize import HFTextVectorizer
+
+from adk_redis import (
+    LLMResponseCache,
+    LLMResponseCacheConfig,
+    RedisVLCacheProvider,
+    RedisVLCacheProviderConfig,
+    create_llm_cache_callbacks,
+)
+
+# 1. Create a vectorizer (runs locally, no API key needed)
+vectorizer = HFTextVectorizer(model="redis/langcache-embed-v1")
+
+# 2. Create the cache provider
+provider = RedisVLCacheProvider(
+    config=RedisVLCacheProviderConfig(
+        redis_url="redis://localhost:6379",
+        name="my_cache",
+        ttl=3600,
+        distance_threshold=0.1,
+    ),
+    vectorizer=vectorizer,
+)
+
+# 3. Create the cache and wire callbacks into the agent
+llm_cache = LLMResponseCache(
+    provider=provider,
+    config=LLMResponseCacheConfig(first_message_only=True),
+)
+before_cb, after_cb = create_llm_cache_callbacks(llm_cache)
+
+agent = Agent(
+    model="gemini-2.0-flash",
+    name="cached_agent",
+    before_model_callback=before_cb,
+    after_model_callback=after_cb,
+)
+```
+
+See the
+[semantic_cache example](https://github.com/redis-developer/adk-redis/tree/main/examples/semantic_cache)
+for a runnable version.
+
+---
+
+## Option B: Managed with LangCache
+
+Use `LangCacheProvider` with
+[Redis LangCache](https://redis.io/langcache) for a fully managed service. No
+local vectorizer or Redis instance is needed; embeddings are handled server-side.
+
+### Prerequisites
+
+- A LangCache account and cache ID (sign up at
+  [redis.io/langcache](https://redis.io/langcache)).
+- `adk-redis` with the langcache extra: `pip install 'adk-redis[langcache]'`.
+
+### Setup
+
+```python
+from google.adk import Agent
+
+from adk_redis import (
+    LLMResponseCache,
+    LLMResponseCacheConfig,
+    LangCacheProvider,
+    LangCacheProviderConfig,
+    create_llm_cache_callbacks,
+)
+
+provider = LangCacheProvider(
+    config=LangCacheProviderConfig(
+        cache_id="your-cache-id",
+        api_key="your-api-key",
+        server_url="https://aws-us-east-1.langcache.redis.io",
+        ttl=3600,
+    ),
+)
+
+llm_cache = LLMResponseCache(
+    provider=provider,
+    config=LLMResponseCacheConfig(first_message_only=False),
+)
+before_cb, after_cb = create_llm_cache_callbacks(llm_cache)
+
+agent = Agent(
+    model="gemini-2.0-flash",
+    name="langcache_agent",
+    before_model_callback=before_cb,
+    after_model_callback=after_cb,
+)
+```
+
+See the
+[langcache_cache example](https://github.com/redis-developer/adk-redis/tree/main/examples/langcache_cache)
+for a runnable version.
+
+---
+
+## Configuration options
+
+| Option | Provider | Default | Description |
+|--------|----------|---------|-------------|
+| `distance_threshold` | Both | `0.1` | Max vector distance for a cache hit (lower = stricter) |
+| `ttl` | Both | `None` | Time-to-live in seconds for cache entries |
+| `name` | RedisVL | `llmcache` | Redis index name |
+| `redis_url` | RedisVL | `redis://localhost:6379` | Redis connection string |
+| `cache_id` | LangCache | Required | LangCache instance identifier |
+| `api_key` | LangCache | Required | LangCache API key |
+| `first_message_only` | Cache config | `True` | Only cache the first message per session |
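Note on `distance_threshold` and `ttl` above: the interaction between the two is easier to see in a toy model. The sketch below is a pure-Python stand-in and not the RedisVL or LangCache implementation. It assumes cosine distance between embeddings and uses hand-written two-dimensional vectors where a real vectorizer would supply them. `ToySemanticCache` and `cosine_distance` are hypothetical names.

```python
import math
import time


def cosine_distance(a, b):
    # 0.0 means same direction; 1.0 means orthogonal vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return 1.0  # treat a zero vector as "not similar"
    return 1.0 - dot / norm


class ToySemanticCache:
    """In-memory illustration of threshold-based cache hits with a TTL."""

    def __init__(self, distance_threshold=0.1, ttl=None):
        self.distance_threshold = distance_threshold
        self.ttl = ttl
        self._entries = []  # (embedding, response, stored_at)

    def store(self, embedding, response, now=None):
        stored_at = time.monotonic() if now is None else now
        self._entries.append((embedding, response, stored_at))

    def lookup(self, embedding, now=None):
        now = time.monotonic() if now is None else now
        best = None
        for stored, response, stored_at in self._entries:
            if self.ttl is not None and now - stored_at > self.ttl:
                continue  # entry has expired
            distance = cosine_distance(embedding, stored)
            # A hit requires the distance to be within the threshold;
            # among hits, keep the closest entry.
            if distance <= self.distance_threshold and (
                best is None or distance < best[0]
            ):
                best = (distance, response)
        return None if best is None else best[1]


cache = ToySemanticCache(distance_threshold=0.1, ttl=3600)
cache.store([1.0, 0.0], "cached answer", now=0.0)

near_duplicate = cache.lookup([0.99, 0.05], now=10.0)   # within threshold
unrelated = cache.lookup([0.0, 1.0], now=10.0)          # too far away
expired = cache.lookup([1.0, 0.0], now=4000.0)          # past the TTL
```

Lowering `distance_threshold` makes hits rarer but safer, while a shorter `ttl` bounds how long a stale response can be served.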
diff --git a/docs/user_guide/how_to_guides/session_service.md b/docs/user_guide/how_to_guides/session_service.md
index ee4cace..c7ce0d4 100644
--- a/docs/user_guide/how_to_guides/session_service.md
+++ b/docs/user_guide/how_to_guides/session_service.md
@@ -1,45 +1,64 @@
 # Session Service
 
-This guide shows how to wire `RedisSessionService` into a Google ADK agent for durable session state.
+This guide shows how to wire `RedisWorkingMemorySessionService` into a Google
+ADK agent for durable, auto-summarizing session state backed by the
+[Agent Memory Server](memory_server_setup.md).
+
+For the concepts behind sessions and working memory, see
+[Sessions + Memory Services](../../concepts/sessions.md).
 
 ## Prerequisites
 
-- Redis running on `localhost:6379`
-- `adk-redis` installed: `pip install adk-redis`
+- Agent Memory Server running on `localhost:8000`
+  (see [Memory server setup](memory_server_setup.md)).
+- `adk-redis` with the memory extra: `pip install "adk-redis[memory]"`.
 
 ## Basic usage
 
 ```python
-from google.adk.agents import Agent
-from adk_redis import RedisSessionService
+from google.adk import Agent
+from google.adk.runners import Runner
+from adk_redis import (
+    RedisWorkingMemorySessionService,
+    RedisWorkingMemorySessionServiceConfig,
+)
 
-# Create Redis-backed session service
-session_service = RedisSessionService(
-    redis_url="redis://localhost:6379",
-    app_name="my-agent",
+session_service = RedisWorkingMemorySessionService(
+    config=RedisWorkingMemorySessionServiceConfig(
+        api_base_url="http://localhost:8000",
+        default_namespace="my_app",
+    )
 )
 
-# Create your ADK agent with Redis sessions
 agent = Agent(
     model="gemini-2.0-flash",
     name="my_agent",
     instruction="You are a helpful assistant.",
 )
 
+runner = Runner(
+    agent=agent,
+    app_name="my_app",
+    session_service=session_service,
+)
+
 # Create a session
 session = await session_service.create_session(
-    app_name="my-agent",
+    app_name="my_app",
     user_id="alice",
 )
 
-# The session is now persisted in Redis and survives process restarts
+# The session is persisted in the Agent Memory Server and survives process restarts
 ```
 
 ## Configuration options
 
 | Option | Default | Description |
 |--------|---------|-------------|
-| `redis_url` | `redis://localhost:6379` | Redis connection string |
-| `app_name` | Required | Application namespace |
-| `ttl` | `None` | Session TTL in seconds |
-| `key_prefix` | `adk:session` | Redis key prefix |
+| `api_base_url` | `http://localhost:8000` | Agent Memory Server URL |
+| `default_namespace` | `None` | Namespace for session isolation |
+| `model_name` | `None` | LLM model for summarization |
+| `context_window_max` | `None` | Max tokens before auto-summarization |
+| `extraction_strategy` | `discrete` | Extraction strategy: `discrete`, `summary`, `preferences`, or `custom` |
+| `session_ttl_seconds` | `None` | Session expiration time in seconds |
+| `timeout` | `30.0` | HTTP request timeout in seconds |
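Note on the basic usage snippet above: `await session_service.create_session(...)` appears at module level, which only works in a notebook or an async REPL. In a regular script the call has to run inside a coroutine. Here is a minimal sketch of that wrapping, using a hypothetical in-memory `StubSessionService` so the example runs without an Agent Memory Server. The stub is not part of adk-redis and only mimics the `create_session` call shape shown in the guide.

```python
import asyncio
import uuid
from dataclasses import dataclass, field


@dataclass
class StubSession:
    app_name: str
    user_id: str
    id: str = field(default_factory=lambda: uuid.uuid4().hex)


class StubSessionService:
    """Hypothetical stand-in for RedisWorkingMemorySessionService."""

    async def create_session(self, *, app_name: str, user_id: str) -> StubSession:
        # The real service would talk to the Agent Memory Server here.
        return StubSession(app_name=app_name, user_id=user_id)


async def main() -> StubSession:
    session_service = StubSessionService()
    # Same call shape as in the guide, now inside a coroutine
    session = await session_service.create_session(
        app_name="my_app",
        user_id="alice",
    )
    return session


# asyncio.run creates the event loop, runs main(), and closes the loop
session = asyncio.run(main())
```

Swapping the stub for the configured `RedisWorkingMemorySessionService` should leave the structure of `main()` unchanged.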
diff --git a/docs/user_guide/index.md b/docs/user_guide/index.md
index 9be3278..f52d182 100644
--- a/docs/user_guide/index.md
+++ b/docs/user_guide/index.md
@@ -10,16 +10,16 @@ Integrate adk-redis into a Google ADK agent.
 
 <div class="grid cards" markdown>
 
--   :material-rocket-launch:{ .lg .middle } **[1. Integration walkthrough](01_integration.md)**
+-   :material-rocket-launch:{ .lg .middle } **[Quickstart](01_integration.md)**
 
     ---
 
-    Stand up Redis, register the services, and run an end-to-end ADK agent.
+    Stand up Redis, wire in session and memory services, and run your first agent.
 
 </div>
 
 ## How-to guides
 
 Specific recipes are in [How-To Guides](how_to_guides/index.md): Redis
-setup, memory server setup, session service, memory service, and search
-tools.
+setup, memory server setup, session service, memory service, search
+tools, and semantic caching.
diff --git a/mkdocs.yml b/mkdocs.yml
index feab5d6..41987b8 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -1,6 +1,6 @@
 site_name: Google ADK + Redis
-site_description: Redis backends for Google Agent Development Kit. Sessions, memory, tools, cache.
-site_url: https://ai.redis.io/adk/
+site_description: Redis integrations for Google Agent Development Kit. Sessions, memory, tools, cache.
+site_url: https://redis-developer.github.io/adk-redis/
 repo_url: https://github.com/redis-developer/adk-redis
 repo_name: redis-developer/adk-redis
 edit_uri: edit/main/docs/
@@ -162,12 +162,13 @@ nav:
   - Concepts:
       - concepts/index.md
       - ADK overview: concepts/adk_overview.md
-      - Sessions: concepts/sessions.md
-      - Memory: concepts/memory.md
-      - Search: concepts/search.md
+      - Sessions + Memory Services: concepts/sessions.md
+      - Sessions + Memory MCP + Tools: concepts/memory.md
+      - RedisVL MCP + Search Tools: concepts/search.md
+      - Semantic Caching: concepts/caching.md
   - User Guide:
       - user_guide/index.md
-      - Integration: user_guide/01_integration.md
+      - Quickstart: user_guide/01_integration.md
       - How-To Guides:
           - user_guide/how_to_guides/index.md
           - Redis setup: user_guide/how_to_guides/redis_setup.md
@@ -175,11 +176,9 @@ nav:
           - Session service: user_guide/how_to_guides/session_service.md
           - Memory service: user_guide/how_to_guides/memory_service.md
           - Search tools: user_guide/how_to_guides/search_tools.md
+          - Semantic cache: user_guide/how_to_guides/semantic_cache.md
   - Examples:
       - examples/index.md
-      - Fitness coach: examples/fitness_coach.md
-      - Travel agent (hybrid): examples/travel_agent_hybrid.md
-      - Redis search tools: examples/redis_search_tools.md
   - API Reference:
       - api/index.md
       - Python package:
diff --git a/pyproject.toml b/pyproject.toml
index 280fb36..f72de45 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -66,6 +66,15 @@ examples = [
     "python-dotenv",
 ]
 
+# Docs build (mkdocs + plugins used in mkdocs.yml)
+docs = [
+    "mkdocs-material>=9.5",
+    "mkdocstrings[python]>=0.27",
+    "mkdocs-llmstxt>=0.5",
+    "mkdocs-section-index>=0.3",
+    "mkdocs-autorefs>=1.2",
+]
+
 # All Redis integrations
 all = [
     "adk-redis[memory,search,langcache,sql]",