diff --git a/docs/en/llama_stack/install.mdx b/docs/en/llama_stack/install.mdx
index 3ca1ab1..26ca655 100644
--- a/docs/en/llama_stack/install.mdx
+++ b/docs/en/llama_stack/install.mdx
@@ -34,7 +34,9 @@ After the operator is installed, deploy Llama Stack Server by creating a `LlamaS
 > - **Inference URL**: `VLLM_URL` must point at a **vLLM OpenAI-compatible** HTTP base URL (for example an in-cluster vLLM or KServe InferenceService) that serves the target model.
 > - **Secret (optional)**: `VLLM_API_TOKEN` is only needed when the vLLM endpoint requires authentication. If vLLM has no auth, do not set it. When required, create a Secret in the same namespace and reference it from `containerSpec.env` (see the commented example in the manifest below).
 > - **Storage Class**: Ensure the `default` Storage Class exists in the cluster; otherwise the PVC cannot be bound and the resource will not become ready.
+> - **PostgreSQL storage**: The `starter` distribution in this release requires PostgreSQL for Llama Stack persistence. Configure the `POSTGRES_*` environment variables for the server pod before deploying.
 > - **PGVector (optional)**: To use `vector_stores` with `provider_id="pgvector"`, provide `PGVECTOR_*` environment variables to the server pod. ACP-provided PostgreSQL can be used directly because it already includes the `pgvector` extension.
+> - **Milvus (optional)**: To use `vector_stores` with `provider_id="milvus-remote"`, provide `MILVUS_ENDPOINT` and, when authentication is enabled, `MILVUS_TOKEN`. Set `MILVUS_CONSISTENCY_LEVEL` to a valid Milvus consistency level such as `Strong`.
 > - **Embedding model download**: Llama Stack includes a default embedding model configuration for vector-store usage, but the model artifacts are downloaded from Hugging Face on first use. If a mirror or proxy is needed, configure `HF_ENDPOINT`. For fully offline environments, pre-download the model files into the server PVC before running the first vector-store request.
 
 ```yaml
@@ -68,8 +70,27 @@ spec:
         #       key: token
         #       name: vllm-api-token
 
+        # Required: PostgreSQL-backed Llama Stack persistence for this starter
+        # distribution image.
+        - name: POSTGRES_HOST
+          value: "<postgresql-service>"
+        - name: POSTGRES_PORT
+          value: "5432"
+        - name: POSTGRES_DB
+          value: "<database-name>"
+        - name: POSTGRES_USER
+          value: "<database-username>"
+        - name: POSTGRES_PASSWORD
+          valueFrom:
+            secretKeyRef:
+              name: <postgresql-credentials-secret>
+              key: password
+
         # Optional: enable PGVector-backed vector stores.
-        # Omit the entire block below if you do not need vector store APIs.
+        # Omit the entire block below if you do not need PGVector vector stores.
+        # These settings configure the vector DB provider and are separate from
+        # the POSTGRES_* persistence settings above, although they may point to
+        # the same PostgreSQL instance when it has the pgvector extension.
         # ACP-provided PostgreSQL already includes the pgvector extension.
         # - name: ENABLE_PGVECTOR
         #   value: "true"
@@ -87,6 +108,23 @@ spec:
         #       name: <pgvector-credentials-secret>
         #       key: password
 
+        # Optional: enable remote Milvus-backed vector stores.
+        # Use provider_id="milvus-remote" from the client API.
+        # - name: MILVUS_ENDPOINT
+        #   value: "http://<milvus-endpoint-host-and-port>"
+        # - name: MILVUS_TOKEN
+        #   valueFrom:
+        #     secretKeyRef:
+        #       name: <milvus-credentials-secret>
+        #       key: token
+        # - name: MILVUS_CONSISTENCY_LEVEL
+        #   value: "Strong"
+
+        # Required for PGVector or Milvus vector stores that use local
+        # sentence-transformers embeddings.
+        # - name: ENABLE_SENTENCE_TRANSFORMERS
+        #   value: "true"
+        #
         # Optional: configure a Hugging Face mirror or proxy for the default
         # embedding model download path.
         # - name: HF_ENDPOINT
@@ -118,6 +156,18 @@ status:
   serviceURL: http://demo-service.default.svc.cluster.local:8321
 ```
 
+## Configure PostgreSQL Storage
+
+The `starter` distribution image used by this release requires PostgreSQL for Llama Stack persistence. Configure these server environment variables in the `LlamaStackDistribution`:
+
+- `POSTGRES_HOST`
+- `POSTGRES_PORT`
+- `POSTGRES_DB`
+- `POSTGRES_USER`
+- `POSTGRES_PASSWORD`
+
+These settings are for Llama Stack server state. They are not the same as `PGVECTOR_*`, which only configures the optional PGVector vector-store provider. You may use the same PostgreSQL instance for both roles when it has the required database, credentials, and `pgvector` extension.
+
 ## Tool calling with vLLM on KServe
 
 The following applies to the **vLLM predictor** on KServe, not to the `LlamaStackDistribution` manifest. For agent flows that use **tools** (client-side tools or MCP), the vLLM process must expose tool-call support. Add predictor container `args` as required by upstream vLLM, for example:
@@ -139,12 +189,26 @@ Recommended preparation:
 
 1. Prepare an ACP PostgreSQL instance and record its service name, database name, username, and password.
 2. Expose the database connection to the `LlamaStackDistribution` with `PGVECTOR_HOST`, `PGVECTOR_PORT`, `PGVECTOR_DB`, `PGVECTOR_USER`, and `PGVECTOR_PASSWORD`.
-3. Use the default embedding model provided by Llama Stack, and make sure its model files can be fetched on first use.
+3. Set `ENABLE_SENTENCE_TRANSFORMERS=true` and make sure the default embedding model files can be fetched on first use.
 4. If the cluster uses a Hugging Face mirror or proxy, set `HF_ENDPOINT` accordingly.
 5. If the cluster is fully offline, pre-download the embedding model files into the server PVC and enable offline cache-related environment variables.
 
 After the distribution is ready, you can validate the setup with the PGVector section in the [Quickstart](./quickstart) notebook.
 
+## Enable Milvus Vector Store
+
+When `MILVUS_ENDPOINT` is set on the server, clients can create Milvus-backed vector stores by passing `provider_id="milvus-remote"` through the client API.
+
+Recommended preparation:
+
+1. Prepare a Milvus endpoint reachable from the Llama Stack Server pod. `MILVUS_ENDPOINT` must include the scheme, either `http://` or `https://`, and the port required by your Milvus service.
+2. Expose the Milvus connection to the `LlamaStackDistribution` with `MILVUS_ENDPOINT`.
+3. If Milvus authentication is enabled, set `MILVUS_TOKEN` from a Secret.
+4. Set `MILVUS_CONSISTENCY_LEVEL` to a string value such as `Strong`; the Milvus provider requires this field.
+5. Set `ENABLE_SENTENCE_TRANSFORMERS=true` and make sure the embedding model files can be fetched or are already present in the server PVC.
+
+After the distribution is ready, validate the setup with the Milvus section in the [Quickstart](./quickstart) notebook. The client creates the vector store with `provider_id="milvus-remote"` and passes the selected embedding model ID and embedding dimension in `extra_body`.
+
 ## Hugging Face Access For Embedding Models
 
 Llama Stack uses a default embedding model for vector-store operations. On first use, the server downloads the model files from Hugging Face into its local cache.
@@ -179,4 +243,4 @@ Common deployment modes:
      value: "1"
    ```
 
-If the cache path is pre-populated correctly, the server can create PGVector-backed vector stores without downloading model artifacts at runtime.
+If the cache path is pre-populated correctly, the server can create PGVector-backed or Milvus-backed vector stores without downloading model artifacts at runtime.
diff --git a/docs/en/llama_stack/overview/features.mdx b/docs/en/llama_stack/overview/features.mdx
index 59de484..1ae0b12 100644
--- a/docs/en/llama_stack/overview/features.mdx
+++ b/docs/en/llama_stack/overview/features.mdx
@@ -26,5 +26,5 @@ weight: 20
 ## Integration
 
 - **Python Client**: `llama-stack-client` for Python 3.12+ with full agent and model APIs
-- **Vector Store APIs**: Create and query vector stores from the client, including PGVector-backed stores when the server is configured with `ENABLE_PGVECTOR=true`
+- **Vector Store APIs**: Create and query vector stores from the client, including PGVector-backed stores with `provider_id="pgvector"` and Milvus-backed stores with `provider_id="milvus-remote"`
 - **REST-Friendly**: Server exposes APIs for inference, agents, and tool runtime; can be wrapped in FastAPI or other web frameworks for production use
diff --git a/docs/en/llama_stack/quickstart.mdx b/docs/en/llama_stack/quickstart.mdx
index 7c42417..68e9370 100644
--- a/docs/en/llama_stack/quickstart.mdx
+++ b/docs/en/llama_stack/quickstart.mdx
@@ -9,9 +9,9 @@ This section provides a quickstart example for creating an AI Agent with Llama S
 ## Prerequisites
 
 - Python 3.12 or higher (if not satisfied, refer to [FAQ: How to prepare Python 3.12 in Notebook](#how-to-prepare-python-312-in-notebook))
-- Llama Stack Server installed and running via Operator (see [Install Llama Stack](./install)), with **`VLLM_URL` pointing at a vLLM-served model endpoint** (see install notes)
+- Llama Stack Server installed and running via Operator (see [Install Llama Stack](./install)), with **`VLLM_URL` pointing at a vLLM-served model endpoint** and `POSTGRES_*` configured for server persistence (see install notes)
 - Access to a Notebook environment (e.g., Jupyter Notebook, JupyterLab)
-- Python environment with `llama-stack-client==0.6.0`, `fastmcp` (for the MCP section), and other notebook dependencies installed
+- Python environment with `llama-stack-client==0.7.1`, `fastmcp` (for the MCP section), and other notebook dependencies installed
 
 ## Quickstart Example
 
@@ -25,19 +25,26 @@ The notebook demonstrates:
 
 - **Two tool options:** client-side tools (`@client_tool`) and MCP tools (FastMCP + `toolgroups.register`)
 - **Shared agent flow:** connect to Llama Stack Server, select a model, create an `Agent` with `tools=AGENT_TOOLS`, then run sessions and streaming turns
-- **Optional PGVector flow:** upload a file, create a `pgvector`-backed vector store, and run a hybrid search query
+- **Optional vector store flows:** upload a file, create a `pgvector`-backed or `milvus-remote`-backed vector store, and run a search query
 - Streaming responses and event logging
 - Optional FastAPI deployment of the `agent`
 
-## PGVector Usage
+## Vector Store Usage
 
-The downloadable notebook includes an optional PGVector section. To run it, start the server with `ENABLE_PGVECTOR=true` and valid `PGVECTOR_*` connection settings, then execute the PGVector cells in the notebook. ACP-provided PostgreSQL can be used directly because it already includes the `pgvector` extension.
+The downloadable notebook includes optional PGVector and Milvus sections.
+
+For PGVector, start the server with `ENABLE_PGVECTOR=true` and valid `PGVECTOR_*` connection settings, then execute the PGVector cells in the notebook. ACP-provided PostgreSQL can be used directly because it already includes the `pgvector` extension.
+
+For Milvus, start the server with `MILVUS_ENDPOINT` and `MILVUS_CONSISTENCY_LEVEL` set, plus `MILVUS_TOKEN` if Milvus authentication is enabled, then execute the Milvus cells in the notebook. Use `provider_id="milvus-remote"` in the client request.
+
+For both vector-store examples, `client.models.list()` must include an embedding model, for example `sentence-transformers/nomic-ai/nomic-embed-text-v1.5`. If it only returns LLM models, restart the `LlamaStackDistribution` with `ENABLE_SENTENCE_TRANSFORMERS=true` and configure Hugging Face cache/download access as described in [Install Llama Stack](./install).
 
 The notebook example covers:
 
 - Uploading a file through `client.files.create(...)`
-- Creating a vector store with `provider_id="pgvector"`
-- Running a hybrid search with `client.vector_stores.search(...)` and `search_mode="hybrid"`
+- Creating a vector store with `provider_id="pgvector"` or `provider_id="milvus-remote"`
+- Passing `embedding_model` and `embedding_dimension` through `client.vector_stores.create(..., extra_body=...)`
+- Running a search with `client.vector_stores.search(...)`; PGVector uses `search_mode="hybrid"` in `extra_body`
 
 ## FAQ
 
diff --git a/docs/public/llama-stack/llama-stack_quickstart.ipynb b/docs/public/llama-stack/llama-stack_quickstart.ipynb
index 71ff102..64a0b81 100644
--- a/docs/public/llama-stack/llama-stack_quickstart.ipynb
+++ b/docs/public/llama-stack/llama-stack_quickstart.ipynb
@@ -7,12 +7,12 @@
       "source": [
         "# Llama Stack Quick Start Demo\n",
         "\n",
-        "This notebook demonstrates how to use Llama Stack for agent workflows and PGVector-backed vector store access:\n",
+        "This notebook demonstrates how to use Llama Stack for agent workflows and vector store access:\n",
         "\n",
         "- **Option A (section 2):** define a **client-side** weather tool with `@client_tool`; the cell sets **`AGENT_TOOLS`**.\n",
         "- **Option B (section 2):** run an **MCP** weather tool with **FastMCP** and register it with the server; the register cell sets **`AGENT_TOOLS`**.\n",
         "- **Section 3** uses the **same** connect / model selection / `Agent` construction / run flow for both options. The only difference is the value of **`AGENT_TOOLS`** passed into `Agent`.\n",
-        "- **Section 4** shows how to upload a file and query a **PGVector**-backed vector store.\n",
+        "- **Section 4** shows how to upload a file and query **PGVector**-backed and **Milvus**-backed vector stores.\n",
         "\n",
         "### Inference backend (`LlamaStackDistribution`)\n",
         "\n",
@@ -49,7 +49,7 @@
         "# Use current kernel's Python so PATH does not point to another env\n",
         "# If download is slow, add: -i https://pypi.tuna.tsinghua.edu.cn/simple\n",
         "import sys\n",
-        "!{sys.executable} -m pip install \"llama-stack-client==0.6.0\" \"requests\" \"fastapi\" \"uvicorn\" \"fastmcp\""
+        "!{sys.executable} -m pip install \"llama-stack-client==0.7.1\" \"requests\" \"fastapi\" \"uvicorn\" \"fastmcp\""
       ]
     },
     {
@@ -462,26 +462,26 @@
       "id": "pgvector-title-md",
       "metadata": {},
       "source": [
-        "## 4. PGVector Vector Store Example\n",
+        "## 4. Vector Store Examples\n",
         "\n",
-        "This section shows how to upload a file and query a PGVector-backed vector store.\n",
+        "This section shows how to upload a file and query PGVector-backed and Milvus-backed vector stores.\n",
         "\n",
-        "Prerequisites:\n",
-        "- The server distribution is configured with `ENABLE_PGVECTOR=true` and valid `PGVECTOR_*` connection settings.\n",
-        "- ACP-provided PostgreSQL can be used directly; it already includes the `pgvector` extension.\n",
-        "- Llama Stack includes a default embedding model configuration, but the model files are downloaded from Hugging Face on first use.\n",
-        "- If the cluster uses a Hugging Face mirror or proxy, configure `HF_ENDPOINT`.\n",
-        "- If the cluster is fully offline, pre-download the model files into `/home/lls/.lls/huggingface/hub` and set offline cache-related environment variables.\n"
+        "### Shared Embedding Model Selection\n",
+        "\n",
+        "Run this cell once before the PGVector or Milvus example. Both vector stores use the selected embedding model ID and dimension in `client.vector_stores.create(..., extra_body=...)`.\n",
+        "\n",
+        "Before continuing, `client.models.list()` must include an embedding model, for example `sentence-transformers/nomic-ai/nomic-embed-text-v1.5`. If it only shows LLM models, restart the server distribution with `ENABLE_SENTENCE_TRANSFORMERS=true` and the embedding model cache/download settings described in the install guide.\n"
       ]
     },
     {
       "cell_type": "code",
       "execution_count": null,
-      "id": "pgvector-demo-code",
+      "id": "vector-store-shared-embedding-code",
       "metadata": {},
       "outputs": [],
       "source": [
         "import json\n",
+        "import os\n",
         "import time\n",
         "\n",
         "\n",
@@ -498,16 +498,58 @@
         "\n",
         "\n",
         "models = client.models.list()\n",
+        "print(\"models(list) response:\")\n",
+        "if hasattr(models, \"model_dump\"):\n",
+        "    print(json.dumps(models.model_dump(mode=\"json\"), ensure_ascii=False, indent=2))\n",
+        "else:\n",
+        "    print(\n",
+        "        json.dumps(\n",
+        "            [getattr(model, \"model_dump\", lambda: str(model))() for model in models],\n",
+        "            ensure_ascii=False,\n",
+        "            default=str,\n",
+        "            indent=2,\n",
+        "        )\n",
+        "    )\n",
+        "\n",
+        "preferred_embedding_model_id = os.getenv(\n",
+        "    \"EMBEDDING_MODEL\",\n",
+        "    os.getenv(\"TEST_EMBEDDING_MODEL\", \"sentence-transformers/nomic-ai/nomic-embed-text-v1.5\"),\n",
+        ")\n",
+        "preferred_embedding_dimension = int(\n",
+        "    os.getenv(\"EMBEDDING_DIMENSION\", os.getenv(\"TEST_EMBEDDING_DIMENSION\", \"768\"))\n",
+        ")\n",
+        "\n",
         "embedding_model = next(\n",
         "    (\n",
         "        model\n",
         "        for model in models\n",
-        "        if get_model_metadata(model).get(\"model_type\") == \"embedding\"\n",
+        "        if getattr(model, \"id\", \"\") == preferred_embedding_model_id\n",
         "    ),\n",
         "    None,\n",
         ")\n",
+        "\n",
         "if embedding_model is None:\n",
-        "    raise RuntimeError(\"No embedding model found from client.models.list()\")\n",
+        "    print(\n",
+        "        f\"Preferred embedding model {preferred_embedding_model_id!r} was not found; \"\n",
+        "        \"falling back to the first model tagged as embedding.\"\n",
+        "    )\n",
+        "    embedding_model = next(\n",
+        "        (\n",
+        "            model\n",
+        "            for model in models\n",
+        "            if get_model_metadata(model).get(\"model_type\") == \"embedding\"\n",
+        "        ),\n",
+        "        None,\n",
+        "    )\n",
+        "if embedding_model is None:\n",
+        "    raise RuntimeError(\n",
+        "        \"No embedding model found from client.models.list(). The server currently \"\n",
+        "        \"exposes only LLM models, so vector store examples cannot run yet. \"\n",
+        "        \"Restart the LlamaStackDistribution with ENABLE_SENTENCE_TRANSFORMERS=true \"\n",
+        "        \"and make sure the embedding model is registered and its files are available. \"\n",
+        "        \"If you use a different registered embedding model, set EMBEDDING_MODEL and \"\n",
+        "        \"EMBEDDING_DIMENSION in the notebook environment before running this cell.\"\n",
+        "    )\n",
         "\n",
         "embedding_metadata = get_model_metadata(embedding_model)\n",
         "resolved_dimension = (\n",
@@ -515,14 +557,45 @@
         "    or embedding_metadata.get(\"dimensions\")\n",
         "    or getattr(embedding_model, \"embedding_dimension\", None)\n",
         "    or getattr(embedding_model, \"dimensions\", None)\n",
+        "    or preferred_embedding_dimension\n",
         ")\n",
-        "if resolved_dimension is None:\n",
-        "    raise RuntimeError(\n",
-        "        f\"Could not determine embedding dimension for model {embedding_model.id!r}. \"\n",
-        "        \"Set it explicitly to match the embedding model used by the server.\"\n",
-        "    )\n",
         "embedding_dimension = int(resolved_dimension)\n",
         "\n",
+        "print(\n",
+        "    json.dumps(\n",
+        "        {\n",
+        "            \"embedding_model\": embedding_model.id,\n",
+        "            \"embedding_dimension\": embedding_dimension,\n",
+        "        },\n",
+        "        ensure_ascii=False,\n",
+        "        indent=2,\n",
+        "    )\n",
+        ")\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "pgvector-prerequisites-md",
+      "metadata": {},
+      "source": [
+        "### PGVector Vector Store Example\n",
+        "\n",
+        "Prerequisites:\n",
+        "- The server distribution is configured with `ENABLE_PGVECTOR=true` and valid `PGVECTOR_*` connection settings.\n",
+        "- ACP-provided PostgreSQL can be used directly; it already includes the `pgvector` extension.\n",
+        "- The server distribution is configured with `ENABLE_SENTENCE_TRANSFORMERS=true` so an embedding model is registered.\n",
+        "- Llama Stack includes a default embedding model configuration, but the model files are downloaded from Hugging Face on first use.\n",
+        "- If the cluster uses a Hugging Face mirror or proxy, configure `HF_ENDPOINT`.\n",
+        "- If the cluster is fully offline, pre-download the model files into `/home/lls/.lls/huggingface/hub` and set offline cache-related environment variables.\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "pgvector-demo-code",
+      "metadata": {},
+      "outputs": [],
+      "source": [
         "document = \"\"\"ACP PostgreSQL with pgvector can be used as the vector backend.\n",
         "Unique token: pgvector-demo-token\n",
         "This document is used to verify vector store indexing and retrieval.\n",
@@ -561,6 +634,67 @@
         "print(json.dumps(search_result, ensure_ascii=False, indent=2))\n"
       ]
     },
+    {
+      "cell_type": "markdown",
+      "id": "milvus-title-md",
+      "metadata": {},
+      "source": [
+        "### Milvus Vector Store Example\n",
+        "\n",
+        "This section shows how to upload a file and query a remote Milvus-backed vector store.\n",
+        "\n",
+        "Prerequisites:\n",
+        "- The server distribution is configured with `MILVUS_ENDPOINT` pointing at a Milvus endpoint reachable from the Llama Stack Server pod.\n",
+        "- If Milvus authentication is enabled, the server distribution is configured with `MILVUS_TOKEN`.\n",
+        "- `MILVUS_CONSISTENCY_LEVEL` is set to a valid Milvus consistency level, for example `Strong`.\n",
+        "- `ENABLE_SENTENCE_TRANSFORMERS=true` is set when using the default local embedding model.\n",
+        "- Llama Stack can download the embedding model files from Hugging Face, or the files are preloaded into `/home/lls/.lls/huggingface/hub` for offline clusters.\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "milvus-demo-code",
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "document = \"\"\"Remote Milvus can be used as the vector backend for Llama Stack.\n",
+        "Unique token: milvus-demo-token\n",
+        "This document is about Shanghai and verifies Milvus vector store retrieval.\n",
+        "\"\"\"\n",
+        "\n",
+        "file_object = client.files.create(\n",
+        "    file=(\"milvus-demo.txt\", document.encode(\"utf-8\"), \"text/plain\"),\n",
+        "    purpose=\"assistants\",\n",
+        ")\n",
+        "\n",
+        "vector_store = client.vector_stores.create(\n",
+        "    name=f\"milvus-demo-{int(time.time())}\",\n",
+        "    file_ids=[file_object.id],\n",
+        "    extra_body={\n",
+        "        \"provider_id\": \"milvus-remote\",\n",
+        "        \"embedding_model\": embedding_model.id,\n",
+        "        \"embedding_dimension\": embedding_dimension,\n",
+        "    },\n",
+        ")\n",
+        "\n",
+        "search_result = client.vector_stores.search(\n",
+        "    vector_store_id=vector_store.id,\n",
+        "    query=\"milvus-demo-token\",\n",
+        "    max_num_results=3,\n",
+        ")\n",
+        "\n",
+        "if hasattr(vector_store, \"model_dump\"):\n",
+        "    vector_store = vector_store.model_dump(mode=\"json\")\n",
+        "if hasattr(search_result, \"model_dump\"):\n",
+        "    search_result = search_result.model_dump(mode=\"json\")\n",
+        "\n",
+        "print(\"Milvus vector store:\")\n",
+        "print(json.dumps(vector_store, ensure_ascii=False, indent=2))\n",
+        "print(\"\\nMilvus search result:\")\n",
+        "print(json.dumps(search_result, ensure_ascii=False, indent=2))\n"
+      ]
+    },
     {
       "cell_type": "markdown",
       "id": "6f8d31d0",