Merged

39 changes: 39 additions & 0 deletions inference-platforms/README.md
@@ -19,6 +19,43 @@ Elastic Stack.
* [AgC](AgC) - with [OpenTelemetry export][AgC]
* [vLLM](vllm) - with [OpenTelemetry POC][vllm] configuration

## MCP Agent flow

[agent.py](agent.py) uses the [OpenAI Agents SDK][openai-agents] to search for
flights via [Kiwi's MCP server][kiwi-mcp], proxied through an inference
platform like [Envoy AI Gateway](aigw).

```mermaid
sequenceDiagram
participant Agent
participant Gateway as AI Gateway
participant LLM as LLM Server
participant MCP as MCP Server

Agent ->> Gateway: user: "Use the search-flight tool to search for flights from New York to Los Angeles on 18/03/2026"
Gateway ->> LLM: ChatCompletion
activate LLM
LLM ->> Gateway: tool_call: search-flight({flyFrom: "JFK", flyTo: "LAX", departureDate: "18/03/2026"})
deactivate LLM
Gateway ->> Agent:
activate Agent

Agent ->> Gateway: tools/call: search-flight
Gateway ->> MCP: tools/call: search-flight
activate MCP
MCP -->> Gateway: {flights: [{price: 177, route: "JFK→ATL→LAX"}, ...]}
deactivate MCP
Gateway -->> Agent:
deactivate Agent

Agent ->> Gateway: [user, assistant, tool: {flights}]
Gateway ->> LLM: ChatCompletion
activate LLM
LLM ->> Gateway: "The cheapest flight is JFK → ATL → LAX for €177..."
deactivate LLM
Gateway ->> Agent:
```
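The `tools/call` step in the diagram is a JSON-RPC 2.0 request, as defined by the Model Context Protocol. The sketch below shows roughly what the agent sends to the gateway's `/mcp` endpoint. It assumes the `flyFrom`/`flyTo`/`departureDate` argument names from Kiwi's `search-flight` input schema, uses a hypothetical `tools_call_request` helper, and leaves out transport details such as session headers.

```python
import json


def tools_call_request(request_id: int, name: str, arguments: dict) -> str:
    """Build an MCP `tools/call` JSON-RPC 2.0 request body (hypothetical helper)."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": name, "arguments": arguments},
        }
    )


# Arguments mirror the tool_call the LLM produced in the diagram above.
body = tools_call_request(
    1,
    "search-flight",
    {"flyFrom": "JFK", "flyTo": "LAX", "departureDate": "18/03/2026"},
)
print(body)
```

The gateway only proxies this request. The tool arguments themselves come from the LLM's `tool_call`, which is why their format has to match the MCP server's schema.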

If you use the Elastic Stack, the resulting trace looks like this in Kibana:

![Kibana screenshot](./kibana-trace.jpg)
@@ -114,3 +151,5 @@ To start and use Ollama, do the following:
[uv]: https://docs.astral.sh/uv/getting-started/installation/
[ollama-dl]: https://ollama.com/download
[otel-tui]: https://github.com/ymtdzzz/otel-tui
[openai-agents]: https://github.com/openai/openai-agents-python
[kiwi-mcp]: https://mcp.kiwi.com
5 changes: 3 additions & 2 deletions inference-platforms/agent.py
@@ -41,10 +41,11 @@ async def run_agent(tools: list[Tool], model_name: str, use_responses: bool):
tools=tools,
)

next_week = (datetime.now() + timedelta(weeks=1)).strftime("%Y-%m-%d")
# Small models can't convert between date formats that may be required by tools so this format needs to be precise
Comment from the PR author:

FYI, it was probably wrong to ever use YYYY-MM-DD, but since I looked it up, here it is. I just wasn't thinking when I originally wrote this and probably should have checked. But indeed, LLMs should be able to figure this out!

$ npx @modelcontextprotocol/inspector --cli https://mcp.kiwi.com --transport http --method tools/list
{
  "tools": [
    {
      "name": "search-flight",
      "description": "\n# Search for a flight\n\n## Description\n\nUses the Kiwi API to search for available flights between two locations on a specific date.\n\n## How it works\n\nThe tool will:\n1. Search for matching locations to resolve airport codes\n2. Find available flights for the specified route and date range\n\n## Method\n\nCall this tool whenever a user wants to search for flights, regardless of whether they provided exact airport codes or just city names.\n\nYou should display the returned results in a markdown table format: Group the results by price (those who are the cheapest), duration (those who are the shortest, i.e. have the smallest 'totalDurationInSeconds') and the rest (those that could still be interesting).\n\nAlways display for each flight in order:\n  - In the 1st column: The departure and arrival airports, including layovers (e.g. \"Paris CDG → Barcelona BCN → Lisbon LIS\")\n  - In the 2nd column: The departure and arrival dates & times in the local timezones, and duration of the flight (e.g. \"03/08 06:05 → 09:30 (3h 25m)\", use 'durationInSeconds' to display the duration and not 'totalDurationInSeconds')\n  - In the 3rd column: The cabin class (e.g. \"Economy\")\n  - (In case of return flight only) In the 4th column: The return flight departure and arrival airports, including layovers (e.g. \"Paris CDG → Barcelona BCN → Lisbon LIS\")\n  - (In case of return flight only) In the 5th column: The return flight departure and arrival dates & times in the local timezones, and duration of the flight (e.g. \"03/08 06:05 → 09:30 (3h 25m)\", use 'return.durationInSeconds' to display the duration)\n  - (In case of return flight only) In the 6th column: The return flight cabin class (e.g. \"Economy\")\n  - In the previous-to-last column: The total price of the flight\n  - In the last column: The deep link to book the flight\n\nFinally, provide a summary highlighting the best prices, the shortest flights and a recommendation. End wishing a nice trip to the user with a short fun fact about the destination!\n",
      "inputSchema": {
        "type": "object",
        "properties": {
          "flyFrom": {
            "type": "string",
            "minLength": 1,
            "description": "Location to fly from: It could be a city or an airport name or code"
          },
          "flyTo": {
            "type": "string",
            "description": "Location to fly to: It could be a city or an airport name or code"
          },
          "departureDate": {
            "type": "string",
            "pattern": "^\\d{2}\\/\\d{2}\\/\\d{4}$",
            "description": "Departure date in dd/mm/yyyy format"
          },
          "departureDateFlexRange": {
            "type": "integer",
            "minimum": 0,
            "maximum": 3,
            "default": 0,
            "description": "Departure date flexibility range in days (0 to 3 days before/after the selected departure date)"
          },
          "returnDate": {
            "type": "string",
            "pattern": "^\\d{2}\\/\\d{2}\\/\\d{4}$",
            "description": "Return date in dd/mm/yyyy format"
          },
          "returnDateFlexRange": {
            "type": "integer",
            "minimum": 0,
            "maximum": 3,
            "default": 0,
            "description": "Return date flexibility range in days (0 to 3 days before/after the selected return date)"
          },
          "passengers": {
            "type": "object",
            "properties": {
              "adults": {
                "type": "integer",
                "minimum": 0,
                "maximum": 9,
                "default": 1,
                "description": "Number of adults (over 12 years old included)"
              },
              "children": {
                "type": "integer",
                "minimum": 0,
                "maximum": 8,
                "default": 0,
                "description": "Number of children (from 3 to 11 years old included)"
              },
              "infants": {
                "type": "integer",
                "minimum": 0,
                "maximum": 4,
                "default": 0,
                "description": "Number of infants (under 2 years old)"
              }
            },
            "additionalProperties": false,
            "default": {
              "adults": 1,
              "children": 0,
              "infants": 0
            },
            "description": "Passengers details. The total number of passengers must be between 1 and 9. There must be at least one adult. There must be at least one adult per infant."
          },
          "cabinClass": {
            "type": "string",
            "enum": [
              "M",
              "W",
              "C",
              "F"
            ],
            "description": "Cabin class: M (economy), W (economy premium), C (business), F (first class)"
          },
          "sort": {
            "type": "string",
            "enum": [
              "price",
              "duration",
              "quality",
              "date"
            ],
            "default": "date",
            "description": "Sort results by: price, duration, quality or date (default: date)"
          },
          "curr": {
            "type": "string",
            "default": "EUR",
            "description": "Currency for response (examples: EUR, USD, GBP, JPY, CAD, AUD, NZD, CHF etc.)"
          },
          "locale": {
            "type": "string",
            "minLength": 2,
            "maxLength": 5,
            "default": "en",
            "description": "Language of city names and kiwi.com website links (examples: en, uk, de, fr, es, it, ru etc.)"
          }
        },
        "required": [
          "flyFrom",
          "flyTo",
          "departureDate"
        ],
        "additionalProperties": false,
        "$schema": "http://json-schema.org/draft-07/schema#"
      },
      "annotations": {
        "title": "Search for flights with Kiwi.com",
        "readOnlyHint": true,
        "openWorldHint": true
      }
    },
    {
      "name": "feedback-to-devs",
      "description": "Send feedback to the dev of the Kiwi MCP server.",
      "inputSchema": {
        "type": "object",
        "properties": {
          "text": {
            "type": "string",
            "minLength": 1,
            "description": "The content of the feedback. Don't hesitate to include any text relevant to the issue (logs, error message) if you are having one."
          }
        },
        "required": [
          "text"
        ],
        "additionalProperties": false,
        "$schema": "http://json-schema.org/draft-07/schema#"
      },
      "annotations": {
        "title": "Send feedback to the devs of the Kiwi.com MCP server",
        "readOnlyHint": false,
        "destructiveHint": false,
        "idempotentHint": false,
        "openWorldHint": true
      }
    }
  ]
}

next_week = (datetime.now() + timedelta(weeks=1)).strftime("%d/%m/%Y")
result = await Runner.run(
starting_agent=agent,
input=f"Give me the best flight from New York to Kota Kinabalu on {next_week}",
input=f"Use the search-flight tool to search for flights from New York to Los Angeles on {next_week}",
run_config=RunConfig(workflow_name="flight search"),
)
print(result.final_output)
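The date format change follows from the tool schema in the review comment above. `departureDate` must match `^\d{2}\/\d{2}\/\d{4}$` (dd/mm/yyyy), so `%Y-%m-%d` output can never validate. The sketch below checks both formats against that pattern. `format_departure_date` is a hypothetical helper, not code from `agent.py`.

```python
import re
from datetime import datetime, timedelta

# Equivalent to the schema pattern "^\\d{2}\\/\\d{2}\\/\\d{4}$" (dd/mm/yyyy).
DEPARTURE_DATE_PATTERN = re.compile(r"^\d{2}/\d{2}/\d{4}$")


def format_departure_date(day: datetime) -> str:
    """Format a date as dd/mm/yyyy for the search-flight tool (hypothetical helper)."""
    return day.strftime("%d/%m/%Y")


next_week = format_departure_date(datetime.now() + timedelta(weeks=1))
old_style = (datetime.now() + timedelta(weeks=1)).strftime("%Y-%m-%d")

print(next_week, bool(DEPARTURE_DATE_PATTERN.match(next_week)))  # matches
print(old_style, bool(DEPARTURE_DATE_PATTERN.match(old_style)))  # does not match
```

Emitting the exact format in the prompt lets small models copy the date into the tool call verbatim instead of converting it.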
24 changes: 22 additions & 2 deletions inference-platforms/aigw/README.md
@@ -30,6 +30,8 @@ Start Ollama and your OpenTelemetry Collector via this repository's [README](../

## Run Envoy AI Gateway

### Run with Docker

```bash
docker compose up --force-recreate --pull always --remove-orphans --wait -d
```
@@ -40,6 +42,20 @@ Clean up when finished, like this:
docker compose down
```

### Run with Go

Download [shdotenv](https://github.com/ko1nksm/shdotenv) to load `env.local` when running.

```bash
curl -O -L https://github.com/ko1nksm/shdotenv/releases/download/v0.14.0/shdotenv
chmod +x ./shdotenv
```
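`./shdotenv -e env.local <command>` loads the variables from `env.local` into the environment of `<command>`. The sketch below illustrates that idea only. It is not shdotenv's parser, which also handles quoting and other dotenv syntax. It covers just the plain `KEY=VALUE` lines that `env.local` uses, and both helper names are hypothetical.

```python
import os
import subprocess


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without '='
            env[key.strip()] = value.strip()
    return env


def run_with_env(path: str, cmd: list[str]) -> int:
    """Run cmd with the file's variables layered over the current environment."""
    with open(path, encoding="utf-8") as f:
        overrides = parse_env_file(f.read())
    return subprocess.run(cmd, env={**os.environ, **overrides}).returncode


sample = """
# OpenTelemetry configuration
OTEL_SERVICE_NAME=openai-agent
MCP_HEADERS=
"""
print(parse_env_file(sample))
```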

Run `aigw` from source, with the variables from `env.local` loaded, like this:
```bash
./shdotenv -e env.local go run github.com/envoyproxy/ai-gateway/cmd/aigw@latest run --mcp-json '{"mcpServers":{"kiwi":{"type":"http","url":"https://mcp.kiwi.com"}}}'
```
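The `--mcp-json` value is a JSON object that maps server names under `mcpServers` to a transport `type` and `url`. If you add more servers, generating the argument is less error-prone than hand-editing the quoted string. Here is a minimal sketch in which only the `kiwi` entry comes from the command above.

```python
import json
import shlex

# Server map; the "kiwi" entry matches the command above.
mcp_servers = {"mcpServers": {"kiwi": {"type": "http", "url": "https://mcp.kiwi.com"}}}

# Compact separators reproduce the single-line JSON used on the command line.
mcp_json = json.dumps(mcp_servers, separators=(",", ":"))

# shlex.join adds the shell quoting around the JSON argument.
print(shlex.join(["aigw", "run", "--mcp-json", mcp_json]))
```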

## Call Envoy AI Gateway with Python

Once Envoy AI Gateway is running, use [uv][uv] to make an OpenAI request via
@@ -51,6 +67,11 @@ Once Envoy AI Gateway is running, use [uv][uv] to make an OpenAI request via
OPENAI_BASE_URL=http://localhost:1975/v1 uv run --exact -q --env-file env.local ../chat.py
```

Or, for the OpenAI Responses API:
```bash
OPENAI_BASE_URL=http://localhost:1975/v1 uv run --exact -q --env-file env.local ../chat.py --use-responses-api
```

### MCP Agent

```bash
@@ -60,10 +81,9 @@ OPENAI_BASE_URL=http://localhost:1975/v1 MCP_URL=http://localhost:1975/mcp uv ru
## Notes

Here are some constraints about the Envoy AI Gateway implementation:
* Until [this][openai-responses] resolves, don't use `--use-responses-api`.
* Access log integration currently requires the OTLP gRPC transport (`OTEL_EXPORTER_OTLP_PROTOCOL=grpc`).
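Switching to gRPC also means switching ports. By OpenTelemetry convention, OTLP/gRPC uses port 4317 and OTLP/HTTP uses port 4318. That is why this change moves `OTEL_EXPORTER_OTLP_ENDPOINT` from `:4318` to `:4317` alongside `OTEL_EXPORTER_OTLP_PROTOCOL=grpc`. The sketch below shows that pairing with a hypothetical `otlp_endpoint` helper.

```python
# Default OTLP ports by convention: 4317 for OTLP/gRPC, 4318 for OTLP/HTTP.
OTLP_DEFAULT_PORTS = {"grpc": 4317, "http/protobuf": 4318, "http/json": 4318}


def otlp_endpoint(protocol: str, host: str = "localhost") -> str:
    """Return the default OTLP endpoint for an OTEL_EXPORTER_OTLP_PROTOCOL value."""
    try:
        port = OTLP_DEFAULT_PORTS[protocol]
    except KeyError:
        raise ValueError(f"unsupported OTLP protocol: {protocol!r}") from None
    return f"http://{host}:{port}"


print(otlp_endpoint("grpc"))  # http://localhost:4317
print(otlp_endpoint("grpc", "host.docker.internal"))  # http://host.docker.internal:4317
```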

---
[docs]: https://aigateway.envoyproxy.io/docs/cli/
[openinference]: https://github.com/Arize-ai/openinference/tree/main/spec
[uv]: https://docs.astral.sh/uv/getting-started/installation/
[openai-responses]: https://github.com/envoyproxy/ai-gateway/issues/980
2 changes: 1 addition & 1 deletion inference-platforms/aigw/docker-compose.yml
@@ -36,7 +36,7 @@ services:
environment:
- OTEL_SERVICE_NAME=aigw
- OPENAI_BASE_URL=http://host.docker.internal:11434/v1
- OTEL_EXPORTER_OTLP_ENDPOINT=http://host.docker.internal:4318
- OTEL_EXPORTER_OTLP_ENDPOINT=http://host.docker.internal:4317
ports:
- "1975:1975" # OpenAI compatible endpoint at /v1, MCP server at /mcp
configs:
6 changes: 2 additions & 4 deletions inference-platforms/aigw/env.local
@@ -10,10 +10,8 @@ MCP_HEADERS=

# OpenTelemetry configuration
OTEL_SERVICE_NAME=openai-agent
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
OTEL_TRACES_EXPORTER=otlp
OTEL_METRICS_EXPORTER=otlp
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
OTEL_EXPORTER_OTLP_PROTOCOL=grpc

# Reduce trace and metrics export delay for demo purposes
OTEL_BSP_SCHEDULE_DELAY=100
2 changes: 1 addition & 1 deletion inference-platforms/chat.py
@@ -45,7 +45,7 @@ def main():
response = client.responses.create(
model=model, input=messages[0]["content"], temperature=0, extra_body=extra_body
)
print(response.output[0].content[0].text)
print(response.output_text)
else:
chat_completion = client.chat.completions.create(
model=model, messages=messages, temperature=0, extra_body=extra_body
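The switch to `response.output_text` matters because `response.output` is a list of items that is not guaranteed to start with the assistant message. Reasoning models, for example, may emit a reasoning item first. In the `openai` Python SDK, `output_text` is a convenience property that joins the text parts of the message items. Below is a simplified, hypothetical re-implementation over stand-in objects, for illustration only.

```python
from types import SimpleNamespace


def output_text(response) -> str:
    """Join the text of every output_text part in every message item.

    Simplified re-implementation of the idea behind the SDK's
    Response.output_text property, for illustration only.
    """
    return "".join(
        part.text
        for item in response.output
        if item.type == "message"
        for part in item.content
        if part.type == "output_text"
    )


# Hypothetical stand-in for a Responses API result where a reasoning
# item comes before the assistant message.
response = SimpleNamespace(
    output=[
        SimpleNamespace(type="reasoning", summary=[]),
        SimpleNamespace(
            type="message",
            content=[SimpleNamespace(type="output_text", text="Hello from the model")],
        ),
    ]
)

print(output_text(response))  # Hello from the model
# response.output[0].content[0].text would fail here: output[0] is not a message.
```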
Binary file modified inference-platforms/kibana-trace.jpg