diff --git a/README.md b/README.md index 2c50404..73b8905 100644 --- a/README.md +++ b/README.md @@ -21,8 +21,8 @@ Open http://localhost:3000. ## Edit - Page content lives in `*.mdx` files. -- Navigation is configured in [`mint.json`](./mint.json). -- Add a new page by creating the `.mdx` file and listing it under the right group in `mint.json`. +- Navigation is configured in [`docs.json`](./docs.json). +- Add a new page by creating the `.mdx` file and listing it under the right group in `docs.json`. ## Deploy diff --git a/api-reference/account/api-keys.mdx b/api-reference/account/api-keys.mdx index 16a4154..e73924c 100644 --- a/api-reference/account/api-keys.mdx +++ b/api-reference/account/api-keys.mdx @@ -1,11 +1,11 @@ --- -title: "Manage API keys" -description: "List, mint, and revoke API keys." +title: "List, create, and revoke API keys" +description: "List, create, and revoke long-lived API keys for your org." --- ## `GET /me/api-keys` -Lists keys for the current **org** (sorted by `created_at` desc). Same shape as `GET /me`'s `api_keys` field. +Lists keys for the current **org**, sorted by `created_at` descending. The response matches the `api_keys` field in `GET /me`. ```http GET /me/api-keys[?include_revoked=true] @@ -18,7 +18,7 @@ Authorization: Bearer ## `POST /me/api-keys` -Mints a new key. Returns the plaintext value **once.** +Creates a new key. The plaintext value is returned **once**. ```http POST /me/api-keys @@ -50,7 +50,7 @@ Content-Type: application/json - Metadata (id, prefix, display_id, name, …). + Metadata for the key, including id, prefix, display_id, name, scopes, and timestamps. ```json @@ -84,13 +84,13 @@ Authorization: Bearer - Revocation is **irreversible.** Existing JWTs minted from a revoked key keep working until they expire (max 24h after revocation). + Revocation is **irreversible.** Existing JWTs created from a revoked key keep working until they expire, up to 24h after revocation. 
## Examples -```bash cURL — list, mint, revoke +```bash cURL - list, mint, revoke # List active keys curl -H "Authorization: Bearer $JWT" \ https://api.getmetacognition.com/me/api-keys @@ -126,6 +126,6 @@ httpx.delete(f"{api}/me/api-keys/{key_id}", headers=H) ## Operational tips -- One key per environment. Mint `production`, `staging`, `local-dev-sauhard` separately. -- Set `last_used_at` thresholds in your monitoring — alert if a key hasn't been used in 30 days (probably abandoned). -- Don't share keys across services — each service gets its own so revocation has surgical blast radius. +- One key per environment. Mint `production`, `staging`, and `local-dev` separately. +- Alert if a key has not been used in 30 days. It may be abandoned. +- Do not share keys across services. Give each service its own key so revocation is narrow. diff --git a/api-reference/account/me.mdx b/api-reference/account/me.mdx index 47d98d4..42ed96c 100644 --- a/api-reference/account/me.mdx +++ b/api-reference/account/me.mdx @@ -1,10 +1,10 @@ --- -title: "Current org" -description: "Read your org, user, and active API keys." +title: "Get current organization" +description: "Read the authenticated org, user, and visible API keys." api: "GET https://api.getmetacognition.com/me" --- -Returns the authenticated org and user, plus all API keys for that **org** (not just the calling user). +Returns the authenticated org and user, plus all API keys visible to that **org**. ## Headers @@ -23,7 +23,7 @@ Authorization: Bearer - Array of API key metadata for the org (sorted by `created_at` desc). Plaintext keys are **not** included — only metadata. To get the plaintext value, mint a new key via [`POST /me/api-keys`](/api-reference/account/api-keys). + API key metadata for the org, sorted by `created_at` descending. Plaintext keys are **not** included. To get a plaintext key, create a new one with [`POST /me/api-keys`](/api-reference/account/api-keys). 
## Example @@ -68,6 +68,6 @@ print(me["org_id"], "→", len(me["api_keys"]), "keys") ## When to use -- Build a "logged-in as…" UI in your own product. +- Build a "logged-in as" UI in your own product. - Audit which keys are still active before a rotation. -- Verify that `last_used_at` advances after a deploy (sanity check). +- Verify that `last_used_at` advances after a deploy. diff --git a/api-reference/auth/refresh.mdx b/api-reference/auth/refresh.mdx index 2449240..c45b0f0 100644 --- a/api-reference/auth/refresh.mdx +++ b/api-reference/auth/refresh.mdx @@ -1,10 +1,13 @@ --- -title: "Refresh token" -description: "Mint a new access token using a refresh token." +title: "Refresh access token" +description: "Use a refresh token to get a new access token." api: "POST https://api.getmetacognition.com/auth/refresh" +icon: "arrows-rotate" --- -Use this when an `access_token` has expired but the `refresh_token` is still valid. The SDK does this automatically on 401. +import { TokenRetryVisual } from "/snippets/flow-visuals.mdx"; + +Use this when an `access_token` has expired and the `refresh_token` is still valid. The SDK does this automatically after a 401. ## Body @@ -25,7 +28,7 @@ Use this when an `access_token` has expired but the `refresh_token` is still val - Possibly rotated; persist whichever the response returns. + May be rotated. Store the value returned by the response. @@ -58,15 +61,10 @@ tokens = resp.json() ## When refresh fails -If the refresh token is itself expired (>7d old) or revoked (the underlying API key was revoked), `/auth/refresh` returns `401`. Fall back to `/auth/token-exchange` with the original API key — or, if the API key is also gone, surface a re-login flow. +If the refresh token is expired (more than 7 days old) or revoked, `/auth/refresh` returns `401`. At that point, call `/auth/token-exchange` with the original API key. If the API key is also gone, ask the user or service to authenticate again. 
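The fallback described above can be sketched without the SDK. This is a minimal illustration, not the SDK's actual implementation: the `post` callable, the `renew_tokens` helper, the local `AuthenticationError` class, and the `refresh_token` / `api_key` request body keys are all assumptions made for the example. Only the two endpoint paths and the `401` behavior come from this page.

```python
from typing import Callable, Dict, Tuple

# Hypothetical transport: post(path, json_body) -> (status_code, json_response).
# In a real client this would wrap your HTTP library of choice.
Post = Callable[[str, Dict[str, str]], Tuple[int, Dict[str, str]]]


class AuthenticationError(Exception):
    """Local stand-in, raised when refresh and token exchange both fail."""


def renew_tokens(post: Post, refresh_token: str, api_key: str) -> Dict[str, str]:
    """Try /auth/refresh first, then fall back to /auth/token-exchange."""
    # Assumed request body key: "refresh_token".
    status, body = post("/auth/refresh", {"refresh_token": refresh_token})
    if status == 200:
        # The refresh token may be rotated; keep the old one only if the
        # response does not return a new value.
        body.setdefault("refresh_token", refresh_token)
        return body
    if status != 401:
        raise RuntimeError(f"unexpected status {status} from /auth/refresh")

    # Refresh token expired or revoked: exchange the original API key.
    # Assumed request body key: "api_key".
    status, body = post("/auth/token-exchange", {"api_key": api_key})
    if status == 200:
        return body

    # Both steps failed: the caller has to authenticate again.
    raise AuthenticationError("refresh and token exchange both failed")
```

Injecting `post` keeps the sequence testable without network access. After a successful renewal, retry the original call once with the new access token.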
-```mermaid -flowchart TD - A[401 on a normal call] --> B[Try POST /auth/refresh] - B -->|200| C[Retry original call] - B -->|401| D[Try POST /auth/token-exchange] - D -->|200| C - D -->|401| E[Surface AuthenticationError to user] -``` +### After 401 + + -The Python SDK runs this entire flow internally on a single 401 response — you only see `AuthenticationError` if the whole chain fails. +Without the SDK, implement this sequence in your HTTP client. With the SDK, `AuthenticationError` usually means refresh **and** exchange both failed. diff --git a/api-reference/auth/signup.mdx b/api-reference/auth/signup.mdx index 9cf4c0c..c688276 100644 --- a/api-reference/auth/signup.mdx +++ b/api-reference/auth/signup.mdx @@ -1,10 +1,10 @@ --- -title: "Create account" -description: "Create a new org and mint the first API key." +title: "Sign up" +description: "Create an org and receive the first API key." api: "POST https://api.getmetacognition.com/signup" --- -Creates a new organization, a default user inside it, and the first API key for that user. The plaintext API key appears in the response **once and only once**. +Creates a new organization, a default user, and the first API key. The plaintext API key appears in the response **once**. ## Body @@ -16,7 +16,7 @@ Creates a new organization, a default user inside it, and the first API key for ``` - Optional org id. Auto-generated as `org_<10-char base62>` when omitted. Must be **alphanumeric** (with `-` or `_` allowed) and **≤ 64 chars**. Server returns `409 Conflict` if the id already exists. + Optional org id. If omitted, the server generates `org_<10-char base62>`. Must be alphanumeric, with `-` or `_` allowed, and at most 64 chars. Returns `409 Conflict` if the id already exists. @@ -34,7 +34,7 @@ Creates a new organization, a default user inside it, and the first API key for - The plaintext API key. **Save it now — there's no recovery.** + The plaintext API key. **Save it now. 
It cannot be recovered later.** @@ -82,11 +82,11 @@ api_key = data["api_key"] # store this securely ``` - The `api_key` field appears only in this response. Persist it server-side. We can't recover or re-display it. + The `api_key` field appears only in this response. Store it server-side. We cannot recover or re-display it. - This endpoint is unauthenticated by design — anyone can create an org. To gate it (e.g. private launch), put your existing auth provider in front (Auth0, Clerk) and only forward to `/signup` after their checks pass. + This endpoint is unauthenticated by design. Anyone can create an org. For a private launch, put your existing auth provider in front and only forward to `/signup` after your checks pass. ## Errors diff --git a/api-reference/auth/token-exchange.mdx b/api-reference/auth/token-exchange.mdx index 765c6f3..5e5c0ee 100644 --- a/api-reference/auth/token-exchange.mdx +++ b/api-reference/auth/token-exchange.mdx @@ -1,10 +1,12 @@ --- -title: "Exchange API key" -description: "Exchange an API key for a short-lived JWT pair." +title: "Exchange API key for tokens" +description: "Exchange an API key for short-lived access and refresh JWTs." api: "POST https://api.getmetacognition.com/auth/token-exchange" --- -Trades an API key for a JWT access/refresh pair. The Python SDK does this automatically — call this directly only if you're integrating from another language or brokering tokens in a separate service. +Exchange your API key for an access token and refresh token. This is the HTTP version of what the SDK does on first use. See [Authentication](/authentication) for the full flow. + +Call this directly only when you are not using the Python SDK, or when another service brokers tokens for your app. 
## Body @@ -80,7 +82,7 @@ const tokens = await resp.json(); ## JWT contents -Decode the access token (don't *trust* the contents — verify with the [JWKS endpoint](https://api.getmetacognition.com/.well-known/jwks.json) if you need to): +Decode the access token if you want to inspect its claims. If you need to trust those claims, verify the token with the [JWKS endpoint](https://api.getmetacognition.com/.well-known/jwks.json). ```json { diff --git a/api-reference/memory/ingest-memory.mdx b/api-reference/memory/ingest-memory.mdx index 83ebb15..ea28db0 100644 --- a/api-reference/memory/ingest-memory.mdx +++ b/api-reference/memory/ingest-memory.mdx @@ -1,10 +1,10 @@ --- -title: "Remember conversation" -description: "Persist conversation turns into memory." +title: "Ingest conversation memory" +description: "Write turns under a scope. Active memory is saved first; enrichment continues in the background." api: "POST https://api.getmetacognition.com/ingestion/memory" --- -The REST equivalent of `tex.conversations.remember(...)`. Active write completes synchronously; passive enrichment runs in the background. +This is the REST version of **`tex.conversations.remember`**. Use it to write turns under a scope. Tex saves active memory first, then continues enrichment in the background. ## Headers @@ -40,7 +40,7 @@ Content-Type: application/json ``` - Your org id (≥ 1 char). Server uses the JWT's `org_id` claim for tenancy regardless — this field is required for Pydantic validation only. + Your org id. Minimum length is 1 character. The server still uses the JWT's `org_id` claim for tenancy; this field is required for request validation. @@ -52,15 +52,15 @@ Content-Type: application/json - At least one turn (`min_length=1`). Each turn: `{role, text, timestamp, observations?}`. See [memory model](/concepts/memory-model). + At least one turn (`min_length=1`). Each turn: `{role, text, timestamp, observations?}`. See [How memory works](/concepts/memory-model). 
- Optional dual-write toggles: `{ write_active: bool = true, write_passive: bool = true }`. Disabling either lets advanced callers bypass one of the storage tiers. + Optional write toggles: `{ write_active: bool = true, write_passive: bool = true }`. Advanced callers can disable one storage tier. - Free-form metadata. **Currently stored but not surfaced** in retrieval — reserved for future filters. + Free-form metadata. It is stored today and reserved for future filters. ## Response — `202 Accepted` @@ -70,7 +70,7 @@ Content-Type: application/json - Active-memory fragment ids — already recallable. + Active-memory fragment ids. These are already recallable. @@ -127,4 +127,4 @@ resp.raise_for_status() ## Idempotency -Tex computes a stable hash per turn (`role + text + timestamp`). Re-sending the same turn is a no-op — no duplicate fragments, no double-billing. Safe to retry on network failure. +Tex computes a stable hash per turn from `role`, `text`, and `timestamp`. Re-sending the same turn is a no-op. It does not create duplicate fragments or double bill the turn. It is safe to retry after a network failure. diff --git a/api-reference/memory/recall.mdx b/api-reference/memory/recall.mdx index a802640..b6fc6b6 100644 --- a/api-reference/memory/recall.mdx +++ b/api-reference/memory/recall.mdx @@ -1,10 +1,10 @@ --- title: "Recall memory" -description: "Retrieve the most relevant slice of memory." +description: "Search memory with a natural-language query and get ranked hits, confidence, and usage." api: "POST https://api.getmetacognition.com/recall" --- -REST equivalent of `tex.recall(...)`. +This is the REST version of **`tex.recall`**. Use it before generation to get the memory your model should read. For **`mode`**, **`top_k`**, and confidence, see [Recall and ranking](/concepts/retrieval). ## Headers @@ -29,11 +29,11 @@ Content-Type: application/json ``` - Your org id (≥ 1 char). 
Server uses the JWT's `org_id` claim for tenancy regardless — this field is required for Pydantic validation only. + Your org id. Minimum length is 1 character. The server still uses the JWT's `org_id` claim for tenancy; this field is required for request validation. - Session to search. Optional — falls back to the JWT's `session_id`. + Session to search. Optional. Falls back to the JWT's `session_id`. @@ -41,11 +41,11 @@ Content-Type: application/json - Retrieval depth. See [retrieval](/concepts/retrieval). + Retrieval depth. See [Recall and ranking](/concepts/retrieval). - Hits to return across all kinds. **Defaults to 15 (active) / 25 (deep).** Pydantic validates `1 ≤ top_k ≤ 50`; the runtime then caps at **30**, so values above 30 are silently clamped. + Hits to return across all kinds. **Defaults to 15 (active) / 25 (deep).** Request validation accepts `1 <= top_k <= 50`; runtime caps the final value at **30**. @@ -63,7 +63,7 @@ Content-Type: application/json - Linked entities. Each entity has `{id?, label, score}` — **not** the same shape as turns/observations. + Linked entities. Each entity has `{id?, label, score}`. This is different from turns and observations. @@ -82,7 +82,7 @@ Content-Type: application/json `{tokens_in, tokens_out}` for this call. -### Hit shape +### Hit fields ```json { diff --git a/api-reference/overview.mdx b/api-reference/overview.mdx index c8913fa..6d30fb0 100644 --- a/api-reference/overview.mdx +++ b/api-reference/overview.mdx @@ -1,9 +1,24 @@ --- -title: "Overview" -description: "Use Tex directly over HTTPS — for languages and runtimes the SDK doesn't cover yet." +title: "REST API overview" +description: "Base URL, JWT auth, correlation IDs, errors, limits, and retries." +icon: "globe" --- -The Tex API is a small REST surface. If you're writing Python, prefer the [SDK](/sdk/installation) — it handles auth, refresh, and serialization. 
Use the raw API when you're in a language we don't ship a client for, or when you're building a sidecar. + + In Python, prefer the [SDK](/sdk/installation). It handles token exchange and refresh for you. Use this page for curl, other languages, or raw HTTP debugging. + + +## Quick reference + +| Topic | Where | +| --- | --- | +| Live host | `https://api.getmetacognition.com` | +| Auth header | `Authorization: Bearer ` | +| Get tokens | [`POST /auth/token-exchange`](/api-reference/auth/token-exchange) | +| Trace failures | `X-Correlation-ID` + JSON `request_id` | +| Limits & metering | [Usage, quotas, and billing](/concepts/usage-billing) | + +The API is small, and auth works the same way across it: **`/me`**, **`/ingestion/memory`**, **`/recall`**, and **`/usage/*`**. ## Base URL @@ -13,13 +28,13 @@ https://api.getmetacognition.com ## Auth -Every Tex-unique endpoint expects: +Every product endpoint expects: ```http Authorization: Bearer ``` -Where `` is a JWT obtained by exchanging your API key. See [`POST /auth/token-exchange`](/api-reference/auth/token-exchange). +Create `` by posting your API key to [`POST /auth/token-exchange`](/api-reference/auth/token-exchange). Then send it like any other short-lived bearer token. ## Content type @@ -29,17 +44,15 @@ Always: Content-Type: application/json ``` -## Correlation ID +## Correlation IDs -Every request gets an `X-Correlation-ID` (UUID) generated on the server if you don't supply one. The SDK generates one client-side. Pass it through your own logs to trace requests end-to-end: +Each request can include an `X-Correlation-ID` UUID. If you do not send one, the server creates one. The SDK always sends one so your logs and ours match. Include this value in support threads. ```http X-Correlation-ID: 4f1d8e3c-2a9b-4c0d-9e6f-1a2b3c4d5e6f ``` -Quote this header in support tickets. 
- -## Errors +## HTTP errors Standard HTTP status codes: @@ -53,11 +66,11 @@ Standard HTTP status codes: | `401` | Auth failure | | `403` | Forbidden | | `404` | Not found | -| `422` | Validation error (FastAPI shape) | +| `422` | Validation error (FastAPI request format) | | `429` | Daily quota exceeded | -| `5xx` | Our problem | +| `5xx` | Tex platform fault. Retry with backoff, then escalate with the correlation ID. | -Error body shape: +Error body: ```json { @@ -71,20 +84,17 @@ Error body shape: ## Rate limits -Daily quota is enforced per-org. Currently: - -- `tokens_in`: 1,000,000 / day -- `tokens_out`: 5,000,000 / day - -Reset 00:00 UTC. See [Usage and billing](/concepts/usage-billing). + + Limits are per organization. The free tier allows **1,000,000** `tokens_in` and **5,000,000** `tokens_out` each UTC day. Both reset at **00:00 UTC**. [Usage, quotas, and billing](/concepts/usage-billing) explains how this appears in responses and dashboards. + ## Retries -Recommended client-side policy: 2 retries with exponential backoff on `408`, `500`, `502`, `503`, `504`, and network errors. Honor the `Retry-After` header. +Retry **twice** with exponential backoff on `408`, `500`, `502`, `503`, `504`, and hard network failures. Respect `Retry-After` when the API sends it. -Don't retry `400`, `401`, `403`, `404`, `422`, `429` — they're terminal. +Do not retry `400`, `401`, `403`, `404`, `422`, or quota `429`. A replay will usually fail the same way. -## Endpoint index +## Endpoints diff --git a/api-reference/usage/summary.mdx b/api-reference/usage/summary.mdx index 2571077..8ac4e06 100644 --- a/api-reference/usage/summary.mdx +++ b/api-reference/usage/summary.mdx @@ -1,10 +1,10 @@ --- -title: "Monthly usage" -description: "Calendar-month token totals." +title: "Monthly usage summary" +description: "Read monthly token rollups in UTC." api: "GET https://api.getmetacognition.com/usage/summary" --- -Calendar-month rollup, UTC. Defaults to the current month. 
+Returns a calendar-month usage rollup in UTC. If you omit `month`, the endpoint returns the current month. ## Headers @@ -25,11 +25,11 @@ Authorization: Bearer - First moment of the month, UTC. + First moment of the month in UTC. - First moment of the next month, UTC. + First moment of the next month in UTC. diff --git a/api-reference/usage/today.mdx b/api-reference/usage/today.mdx index 1834f77..fcf668e 100644 --- a/api-reference/usage/today.mdx +++ b/api-reference/usage/today.mdx @@ -1,10 +1,10 @@ --- title: "Today's usage" -description: "Today's token totals and the active daily quota." +description: "Read today's token totals, limits, and reset time." api: "GET https://api.getmetacognition.com/usage/today" --- -Returns today's `tokens_in` / `tokens_out` plus the limits that will trigger a `429`. Doesn't count against your own quota — poll as often as you want. +Returns today's `tokens_in` and `tokens_out`, plus the limits that trigger `429`. This request does **not** count against your quota, so you can poll it from dashboards or monitors. ## Headers @@ -15,11 +15,11 @@ Authorization: Bearer ## Response — `200` - Tokens billed today, ingress. + Input tokens billed today. - Tokens billed today, egress. + Output tokens billed today. diff --git a/authentication.mdx b/authentication.mdx index 8faa80c..5454a67 100644 --- a/authentication.mdx +++ b/authentication.mdx @@ -1,38 +1,80 @@ --- -title: "Authentication" -description: "How API keys, JWTs, and the refresh loop work." +title: "Authentication & API keys" +description: "How your API key becomes a JWT, how the SDK refreshes it, and where to store secrets in dev and prod." +icon: "key" --- -You hold an **API key.** The SDK exchanges it for a short-lived **JWT** the first time it's needed and refreshes transparently. You never see the JWT. +import { AuthSequence } from "/snippets/flow-visuals.mdx"; -## The auth flow + + If you only need a working client, start with the [Quickstart](/quickstart). 
Come back here when you are wiring secrets, using your own JWT, or rotating keys. + + +The SDK starts with your API key. On the first real call, it exchanges that key for short-lived access and refresh tokens. + +After that, the client keeps the tokens in memory, refreshes them when needed, and retries the original call. Most apps never need to handle raw token strings. + +## Flow + +The diagram shows what happens when your app calls the SDK. ```mermaid sequenceDiagram - participant App as Your app - participant SDK as Tex SDK - participant API as Tex API - - App->>SDK: Tex(api_key="tex_live_…") - Note over SDK: Lazy — no call yet - App->>SDK: tex.recall(...) - SDK->>API: POST /auth/token-exchange - API-->>SDK: access_token (24h) + refresh_token (7d) - SDK->>API: POST /recall (Authorization: Bearer access_token) - API-->>SDK: hits - - Note over SDK: 24h later, on next call: - SDK->>API: POST /recall (expired) - API-->>SDK: 401 - SDK->>API: POST /auth/refresh - API-->>SDK: new access_token - SDK->>API: POST /recall (retry) - API-->>SDK: hits + autonumber + participant App as Your app + participant SDK as Tex SDK + participant API as Tex API + + App->>SDK: Tex(api_key=...) + note over SDK: Lazy — no network until a real method runs + App->>SDK: tex.recall(...) / remember / usage + SDK->>API: POST /auth/token-exchange + API-->>SDK: access_token (24h) + refresh_token (7d) + SDK->>API: Product call (Authorization: Bearer access_token) + API-->>SDK: 200 + payload + SDK-->>App: return value + + note over App,API: Later: access token expired + App->>SDK: tex.recall(...) 
+ SDK->>API: Product call (expired Bearer) + API-->>SDK: 401 + SDK->>API: POST /auth/refresh + API-->>SDK: new access_token + SDK->>API: Retry product call + API-->>SDK: 200 + payload + SDK-->>App: return value ``` -## What you provide - -The constructor accepts any **one** of three auth modes: +### Steps in the SDK + +.", + }, + { + id: "refresh", + title: "When access expires", + detail: "The SDK calls POST /auth/refresh, gets a new access token, and retries once.", + }, + ]} +/> + +## Auth mode + +Most apps use an API key. Use one of the other modes only when you already manage auth somewhere else. @@ -42,7 +84,7 @@ The constructor accepts any **one** of three auth modes: base_url="https://api.getmetacognition.com", ) ``` - Default for almost every app. The SDK handles exchange and refresh. + This is the default for most apps. The SDK handles token exchange and refresh. ```python @@ -55,10 +97,10 @@ The constructor accepts any **one** of three auth modes: ) ``` - Useful when you broker auth in a separate service and pass JWTs to your client. + Use this when another service already creates the JWTs. - Unlike the api-key flow, BYO-JWT does **not** auto-fill `org_id` / `user_id` from `/auth/verify`. Pass them explicitly — every `remember` and `recall` needs them in the request scope. + BYO-JWT does **not** auto-fill `org_id` / `user_id` from `/auth/verify`. Pass them explicitly. Every `remember` and `recall` needs them in the request scope. @@ -71,31 +113,31 @@ The constructor accepts any **one** of three auth modes: ``` - `/auth/login` is **disabled** in production (returns 403 unless the integration backend runs with `DEBUG=true`). Use this only for local development against a self-hosted backend. For production apps, use an API key. + `/auth/login` is **disabled** in production. It returns 403 unless the integration backend runs with `DEBUG=true`. Use this only for local development against a self-hosted backend. In production, use an API key. 
-## Where to keep the key +## Key storage `.env` file. Add it to `.gitignore`. Load with `python-dotenv`. - Secret manager → mounted as `TEX_API_KEY` env var. + Secret manager mounted as `TEX_API_KEY` env var. - Project Environment Variables → `TEX_API_KEY`. + Project environment variable named `TEX_API_KEY`. - Repository secret → exposed via `${{ secrets.TEX_API_KEY }}`. + Repository secret exposed as `${{ secrets.TEX_API_KEY }}`. The SDK reads `TEX_API_KEY` from the environment automatically when `api_key=` is omitted. -## Rotating keys without downtime +## Rotation @@ -105,14 +147,14 @@ The SDK reads `TEX_API_KEY` from the environment automatically when `api_key=` i Deploy with `TEX_API_KEY=`. - Check the dashboard's "last used" column or your own logs. + Check the dashboard's `last_used_at` value or your own logs. - Click **Revoke** on the old key. Existing JWTs minted from key A keep working for up to 24h, so step 4 has zero customer-visible impact. + Click **Revoke** on the old key. JWTs created from key A can keep working for up to 24h, so customers do not see a sudden failure. -## What happens on a bad key +## Bad key ```python Python @@ -137,7 +179,7 @@ $ curl -X POST https://api.getmetacognition.com/auth/token-exchange \ - **Token lifetimes.** Access JWT lasts 24h. Refresh JWT lasts 7d. Beyond that, the SDK re-exchanges your API key automatically. JWTs are stateless — to invalidate one, revoke the underlying API key. + **Token lifetimes.** Access JWTs last 24h. Refresh JWTs last 7d. After that, the SDK exchanges your API key again. To invalidate tokens, revoke the API key they came from. diff --git a/benchmarks.mdx b/benchmarks.mdx index e621e37..183f7fa 100644 --- a/benchmarks.mdx +++ b/benchmarks.mdx @@ -1,6 +1,7 @@ --- -title: "Benchmarks" -description: "#1 on LoCoMo (93.3%) and LongMemEval_S (92.2%) — the two industry-standard long-term memory benchmarks." 
+title: "Benchmarks and methodology" +description: "LoCoMo and LongMemEval_S scores, with category tables, latency, tokens, and methodology." +icon: "trophy" --- export const Bar = ({label, pct, value, win}) => ( @@ -20,20 +21,22 @@ export const Chart = ({title, children}) => ( ); -Tex is the state of the art on long-term conversational memory. Below are the full results from the two benchmarks the field uses to compare systems: **LoCoMo** (EMNLP 2024) and **LongMemEval** (ICLR 2025). +Tex scores **93.3%** overall on **LoCoMo** with the full system. Tex scores **92.2%** on **LongMemEval_S** with active retrieval only. + +This page shows the category tables, latency, tokens, and methodology behind those numbers. - Full system. Previous SOTA: **EverMemOS at 92.3%**. + Full Tex system vs published baselines. EverMemOS was the prior headline number at **92.3%**. - Active-only. Previous SOTA among retrieval systems: **Emergence AI at 86.0%**. + Active retrieval track vs other retrieval-first systems. Emergence AI posted **86.0%** on comparable reporting. - - Both numbers are reported with **gpt-4o-mini** as the reader and **gpt-4o** as the LLM-as-judge. Single-machine deployment; no distributed infrastructure. Methodology and reproducibility notes below. - + + We generated answers with **gpt-4o-mini** and graded them with **gpt-4o** in an LLM-as-judge setup. Each evaluation ran on a **single** machine. The exact setup is in **Methodology** below. + ## LoCoMo @@ -65,7 +68,7 @@ LoCoMo evaluates long-conversation memory across 10 multi-session conversations - Tex's **99.33% on adversarial questions** is the strongest signal: the system correctly *refuses* to answer unanswerable questions rather than hallucinating. Most competitors don't disclose this category. + On adversarial items Tex is **99.33%**. It declines to answer when the transcript does not support an answer. If a public benchmark skips that bucket, compare headline numbers carefully. 
## LongMemEval_S @@ -97,19 +100,19 @@ LongMemEval_S evaluates memory over **500 questions across ~48 sessions each (~1 - + - Tex's **92.2% matches Oracle GPT-o3 (92.0%)** and beats Oracle GPT-4o (82.4%) — meaning Tex's retrieval surfaces relevant context more effectively than oracle-selected sessions for many question types. + **92.2%** is close to Oracle GPT-o3 (**92.0%**) and well above Oracle GPT-4o (**82.4%**). That means retrieval is finding evidence close to what you would hand-pick for each question. ## Latency -| Configuration | p50 | p90 | End-to-end (incl. reader) | +| Configuration | p50 | p90 | End-to-end (including the model that reads context) | | --- | --- | --- | --- | | **Tex (Active)** | **~120ms** | ~200ms | ~0.6s | | **Tex (Full System)** | ~350ms | ~500ms | ~1.0s | @@ -152,18 +155,16 @@ LongMemEval_S evaluates memory over **500 questions across ~48 sessions each (~1 - **Tex uses zero LLM tokens during ingestion.** All embedding and indexing runs on offline models — no provider call per turn. Competitors typically run an LLM pass on every ingested message; those costs aren't always disclosed but are real. + Ingest here costs **zero LLM tokens**. This run used offline embeddings only. If a system runs an LLM per message during ingest, that cost should show up in token usage. ### Headline efficiency claims -- **27% more accurate than Mem0** with **87% fewer tokens** and **~95% lower latency**. -- **5.8% more accurate than MemMachine Memory mode** with **43% fewer tokens**. -- **Lowest tokens-per-correct-answer on the board** at ~1,296. +Compared with published Mem0 numbers, Tex is about **27%** higher on accuracy, uses roughly **87%** fewer tokens, and runs at about **95%** lower latency. Against MemMachine's memory-only configuration, Tex is about **5.8%** more accurate on **43%** fewer tokens. On LoCoMo, Tex also has the lowest tokens-per-correct-answer in this comparison: about **1,296**. 
-## Ablation — what each capability contributes +## Ablation: what each part of the pipeline adds -On LongMemEval_S, peeling layers off the production retrieval pipeline: +On LongMemEval_S, removing pieces of the retrieval pipeline changes accuracy like this: | Capability | What it does | Δ accuracy | | --- | --- | --- | @@ -177,18 +178,15 @@ On LongMemEval_S, peeling layers off the production retrieval pipeline: ## Methodology -- **Reader model**: `gpt-4o-mini` -- **Judge model**: `gpt-4o` -- **Evaluation**: LLM-as-judge, category-specific judge prompts, binary yes/no, averaged per category -- **LoCoMo**: 10 conversations, 1,984 QA pairs, Tex Full System -- **LongMemEval_S**: 500 questions, ~48 sessions each, Tex Active Only -- **Infrastructure**: Single machine. No distributed cluster. Complete eval runs end-to-end without manual intervention. +Answers came from **`gpt-4o-mini`**. We graded them with an LLM-as-judge setup built on **`gpt-4o`**, using category-specific prompts, binary pass/fail per item, and a straight average inside each category. + +**LoCoMo** used all ten provided conversations (**1,984** question-answer pairs) against the **full Tex system** (not retrieval-only). **LongMemEval_S** used **500** questions with about **48** sessions each (~115K tokens per trace) against **Tex Active** retrieval only. + +The full pipeline ran on a **single** machine. There was no multi-node orchestration. -## What's next +## On our roadmap for evals -- **MemoryAgentBench (ICLR 2026)** — next eval target; tests accurate retrieval, test-time learning, long-range understanding, conflict resolution. -- **Multi-step reasoning** — two-pass retrieval + reasoning for complex counting / computation. -- **LongMemEval_M** — scaling to 500+ sessions per question for extreme long-memory evaluation. 
+We plan to add **MemoryAgentBench (ICLR 2026)**, stronger multi-step reasoning for questions that need counting or arithmetic over evidence, and **LongMemEval_M** for questions that span hundreds of sessions. ## References @@ -200,6 +198,6 @@ On LongMemEval_S, peeling layers off the production retrieval pipeline: 6. Supermemory. *State-of-the-Art Agent Memory Research.* 2026. 7. Mastra. *Observational Memory: 95% on LongMemEval.* 2026. - - Five minutes from `pip install` to first recall. + + Install the SDK, store one turn, and print a recall result. diff --git a/changelog.mdx b/changelog.mdx index 21c8667..d2499bb 100644 --- a/changelog.mdx +++ b/changelog.mdx @@ -1,6 +1,7 @@ --- title: "Changelog" -description: "What's new in the Tex SDK." +description: "SDK and API release notes." +icon: "clock-rotate-left" --- ## 1.1.0 — 2026-05 @@ -9,7 +10,7 @@ description: "What's new in the Tex SDK." - New `tex.usage.today()` and `tex.usage.summary(month=...)` methods. - Every `remember` and `recall` response now carries a `usage` field with `tokens_in` and `tokens_out`. -- Daily quotas enforced at the engine — `RateLimitError` on exceed. +- Daily quotas enforced at the engine. Exceeding them raises `RateLimitError`. **Conversation-native verbs** @@ -32,7 +33,7 @@ Initial public release. Supermemory-compatible verbs (`add`, `search`, `profile` - Native async methods on a parallel `AsyncTex` class. Same API surface, awaitable. Targeting Q3. + Native async methods on an `AsyncTex` class. Same methods, but awaitable. Targeting Q3. `tex.recall(..., user_id=...)` without constructing a new client. Targeting 1.2. @@ -47,6 +48,6 @@ Initial public release. Supermemory-compatible verbs (`add`, `search`, `profile` Email notifications at 80% / 100% of either daily cap. Targeting 1.2. - `read` / `write` / `usage:read` scopes for principle-of-least-privilege keys. + Separate **`read`**, **`write`**, and **`usage:read`** scopes so each key does only what it should. 
diff --git a/concepts/memory-model.mdx b/concepts/memory-model.mdx index 53cab3b..26911e8 100644 --- a/concepts/memory-model.mdx +++ b/concepts/memory-model.mdx @@ -1,67 +1,110 @@ --- title: "How memory works" -description: "Three layers, two write phases, one query API." +description: "What Tex stores after remember, and how soon each layer can appear in recall." +icon: "brain" --- -Tex stores memory in three layers, each with different latency and recall characteristics. +import { PipelineFlow } from "/snippets/flow-visuals.mdx"; + +Call **`remember`** when you have new turns to store. Call **`recall`** when you have a question and want the best matches back. + +Most of the time you work with **turns**. Tex also builds **observations** and **entities** in the background. A write usually becomes recallable in about **150 ms**. The richer memory layers continue after that. + +## Layers - The raw transcript. What was said, by whom, when. + Raw lines: who said what, when. - Atomic facts extracted from turns. *"User avoids shellfish."* + Small facts inferred from turns, such as dietary constraints or locations. - Recurring things — people, places, projects — linked across observations. + People, places, and organizations that show up across observations. -## How writes work +## Writes -```mermaid -flowchart LR - A[remember] -->|~150ms| B[Active memory] - B -->|returns| C[Caller] - A -.->|async| D[Passive enrichment] - D --> E[Observations + Entities + Temporal] -``` +When you call **`remember`**, Tex first saves the turn in active memory. That is the fast path. Then it keeps building richer memory in the background. -- **Active write** is synchronous. Turns are recallable in ~150ms. -- **Passive enrichment** runs in the background. Observations and entities surface in subsequent recalls. + -You don't wait for enrichment. The active turn is enough for the next request. +### Fast path -## How reads work +Your code gets control back quickly. 
New turns are usually recallable within about **150 ms**. -```mermaid -flowchart LR - Q[Query] --> X[Expansion] - X --> H[Hybrid retrieval] - H --> R[Cross-encoder rerank] - R --> C[Calibrated confidence] - C --> O[Top-k hits] -``` +### Background + +Observations, entities, and timeline work continue after the response. They improve recall on later questions. -Vector search + temporal scoring + entity-graph hits, fused, reranked, then calibrated. The output: scored turns, observations, and entities — plus a confidence number. +You do not need the background work to finish before the next user message. The latest turn can still be enough. -## What gets extracted +## Reads -Take this turn: +For reads, pass a natural-language **`q`** and the scope to search. Tex retrieves candidates, ranks them, and returns **`hits`** with a **`confidence`** score. + +Over HTTP, **`POST /recall`** takes **`q`**, **`scope`**, and options like **`mode`**, **`top_k`**, and **`include_timeline`**. The response includes ranked turns, observations, entities, token **`usage`**, and an optional **`timeline`** string. The full request and response fields are in [Recall memory](/api-reference/memory/recall). + + + +Tune **`mode`**, **`top_k`**, and **`confidence`** behavior in [Recall and ranking](/concepts/retrieval). 
+ +## One example turn ```python {"role": "user", "text": "I just moved from Seattle to Austin for a job at Acme.", "timestamp": "..."} ``` -| Layer | What's stored | +| Layer | What you get | | --- | --- | -| Turn | Full text, role, timestamp, dedup hash | -| Observations | `lives_in: Austin`, `previously_lived_in: Seattle`, `works_at: Acme` | -| Entities | `Person(self)`, `Place(Seattle)`, `Place(Austin)`, `Org(Acme)` — linked | -| Temporal | A point on the user's timeline | +| Turn | Full text, role, timestamp, dedupe metadata | +| Observations | Facts like current city, previous city, employer | +| Entities | Typed nodes (person, place, org) wired together | +| Temporal | Events on a lightweight timeline | + +## BYO facts -You don't think about extraction — it's automatic. You *can* pre-extract and pass observations inline. See [`conversations.remember`](/sdk/conversations-remember). +Let Tex extract facts for you. If you already have facts from your own system, attach them to **`remember`**. See [`conversations.remember`](/sdk/conversations-remember). - - Org / user / session partitioning. + + How `org_id`, `user_id`, and `session_id` isolate memory. diff --git a/concepts/retrieval.mdx b/concepts/retrieval.mdx index 98ae76b..f70d351 100644 --- a/concepts/retrieval.mdx +++ b/concepts/retrieval.mdx @@ -1,42 +1,54 @@ --- title: "Recall and ranking" -description: "Active vs deep, top_k tuning, confidence gating." +description: "Choose active or deep recall, set top_k, and decide when confidence is strong enough." +icon: "magnifying-glass" --- -`tex.recall(q, session_id, ...)` is your read path. +[How memory works](/concepts/memory-model) explains what Tex stores. This page explains how to choose what comes back. + +Use **`mode="active"`** for chat and copilots. Use **`mode="deep"`** when the user can wait longer, or when active recall is not finding enough. + +Use **`top_k`** to choose how many hits you give the model. Smaller values keep prompts tight. 
Larger values help summaries, digests, and long answers. Since hits count toward **`tokens_out`**, this also affects cost. + +If **`confidence`** stays under about **0.3**, do not force the memory into the prompt. Try **`mode="deep"`** once, raise **`top_k`**, or ask a clearer **`q`**. + +Python uses **`tex.recall(q, session_id, ...)`**. HTTP uses **`POST /recall`** with the same ideas. The REST field list is in [Recall memory](/api-reference/memory/recall). ## Modes - - - Single-pass. **1.5–2.5s.** Use for every interactive call. - - - Two-pass with iterative rerank. **3–6s.** For periodic analysis or low-confidence retries. - - +### `active` (default) + +- **Best for:** Chat, copilots, and live user flows. +- **Rough latency:** about **1.5-2.5 s** end-to-end in typical setups. +- **Behavior:** Single-pass retrieval and ranking. + +### `deep` + +- **Best for:** Offline jobs, decision reviews, or a second pass after weak `active` results. +- **Rough latency:** about **3-6 s**. +- **Behavior:** Two-pass with heavier reranking. ## `top_k` -Defaults: **15** (active) / **25** (deep). Server caps at **30** regardless of what you send. +Defaults: **15** (`active`) / **25** (`deep`). The server caps at **30** no matter what you send. -| Use case | `top_k` | +| Situation | Starting `top_k` | | --- | --- | -| Live chat, small context | 3–5 | -| Live chat, large context | 8–15 | -| Summaries / digests | 20–30 | +| Tight assistant prompt | 3-5 | +| Standard chat with citations | 8-15 | +| Summaries or long answers | 20-30 | -Larger `top_k` = more `tokens_out` billed. +Larger `top_k` directly increases **`tokens_out`** on your bill. How that maps to quota is in [Usage, quotas, and billing](/concepts/usage-billing). ## Confidence -Every recall returns a `confidence ∈ [0, 1]`, calibrated so `P(hits relevant | c) ≈ c`. +Every recall returns **`confidence` in [0, 1]**, calibrated so that roughly **`P(relevant hits | confidence) ≈ confidence`**. 
-| Range | Meaning | Action | +| Range | How to read it | Practical move | | --- | --- | --- | -| ≥ 0.6 | Strong | Use as-is | -| 0.3 – 0.6 | Useful but uncertain | Use, cite the sources | -| < 0.3 | Weak | Try `mode="deep"`, or skip memory | +| **≥ 0.6** | Strong | Pass context to the model as-is. | +| **0.3 - 0.6** | Mixed | Use hits, but cite or summarize sources for the user. | +| **< 0.3** | Weak | Try `mode="deep"`, rephrase `q`, or skip memory for this turn. | ```python hits = tex.recall(q=q, session_id=sid) @@ -44,25 +56,27 @@ if hits.confidence < 0.3: hits = tex.recall(q=q, session_id=sid, mode="deep") ``` -## What's in a hit +## Hit fields ```python RecallHit(id, text, score, kind, timestamp) # turns + observations RecallEntity(id, label, score) # entities ``` -`hits.hits.turns` is what you stuff into a system prompt. `hits.hits.observations` are atomic facts. `hits.hits.entities` are linked things — useful for analytical queries. +- **`hits.hits.turns`** - use these for most prompts. +- **`hits.hits.observations`** - small facts extracted from prior turns. +- **`hits.hits.entities`** - people, places, and organizations that help with "who", "what", and "where" questions. -## Timeline +## Timeline string ```python hits = tex.recall(q="when did we discuss pricing?", session_id=sid, include_timeline=True) -print(hits.timeline) # a pre-rendered chronological summary string +print(hits.timeline) # optional pre-rendered string ``` -`timeline` is an `Optional[str]` — drop it into a prompt as-is. It's not iterable. +`timeline` is an **`Optional[str]`**: either drop it straight into a prompt or ignore it. It is not a list you iterate. - - Tokens in / out and quota. + + How recall choices affect `tokens_in` / `tokens_out`. 
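The confidence table above can be sketched as one small gate: try `active`, retry once in `deep`, and skip memory if the result is still weak. This is a minimal sketch, not the SDK's own helper. `FakeRecall`, `stub_recall`, and `recall_with_fallback` are hypothetical names used for illustration, and the recall function is passed in as a parameter so the example runs without the SDK.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Weak-confidence threshold from the table above; the constant name is illustrative.
WEAK = 0.3


@dataclass
class FakeRecall:
    """Hypothetical stand-in for a recall result; only two fields are modeled."""
    confidence: float
    mode: str


def recall_with_fallback(
    recall_fn: Callable[..., FakeRecall],
    q: str,
    session_id: str,
    weak: float = WEAK,
) -> Optional[FakeRecall]:
    """Try active recall, retry once in deep mode, return None if still weak."""
    result = recall_fn(q=q, session_id=session_id, mode="active")
    if result.confidence >= weak:
        return result
    # One deep retry only, so a weak session cannot trigger repeated slow calls.
    result = recall_fn(q=q, session_id=session_id, mode="deep")
    return result if result.confidence >= weak else None


def stub_recall(q: str, session_id: str, mode: str = "active") -> FakeRecall:
    # Hypothetical behavior: pretend deep mode finds stronger evidence than active.
    return FakeRecall(confidence=0.5 if mode == "deep" else 0.1, mode=mode)


result = recall_with_fallback(stub_recall, "when did we discuss pricing?", "chat-1")
print(result.mode if result else "skip memory for this turn")
```

Returning `None` keeps the "skip memory" decision explicit at the call site, so the caller can build the prompt without any recalled context.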
diff --git a/concepts/scopes.mdx b/concepts/scopes.mdx index 533f8c6..95e465e 100644 --- a/concepts/scopes.mdx +++ b/concepts/scopes.mdx @@ -1,40 +1,45 @@ --- -title: "Multi-user memory" -description: "Org / user / session — how Tex partitions memory." +title: "Scopes and multi-tenancy" +description: "Use org_id, user_id, and session_id to keep memory separated." +icon: "layer-group" --- -Every turn and every recall is keyed by `(org_id, user_id, session_id)`. +Every memory call is scoped by **`org_id`**, **`user_id`**, and **`session_id`**. -| Field | Set by | Required | +Your API key decides **`org_id`**. The SDK usually gets **`user_id`** from the token. Your app chooses **`session_id`**. That is the field you use for a chat thread, Slack channel, agent run, or tenant-specific memory. + +| Field | Source | You set it? | | --- | --- | --- | -| `org_id` | The JWT (your API key) — server-injected | ✓ | -| `user_id` | The JWT, or override per-call for sub-user scoping | ✓ | -| `session_id` | **You, per call.** | ✓ | +| `org_id` | JWT minted from your API key | Almost never | +| `user_id` | JWT (or per-call override when supported) | Sometimes | +| `session_id` | Your application | **Always** | -`org_id` is locked to your API key. You almost never set it manually. `session_id` is where you do the work. +In most apps, **`session_id`** is the field you set on every call. -## Picking a `session_id` +## `session_id` -Free-form string. Reuse, don't enumerate. +Pick a stable pattern: - + `chat-{conversation_uuid}` `slack-{channel_id}` - + `agent-{task_id}` - + `bio-{user_id}` -## Multi-user SaaS +Use the same string for the same logical thread. That way **`recall`** searches the same memory each time. + +## SaaS (one key) -Each end-user gets their own memory. **Encode the end-user into the `session_id`** with one shared client: +For a SaaS app, map each customer or user conversation to a distinct **`session_id`**. 
One shared **`Tex`** client is enough: ```python tex = Tex(api_key=os.environ["TEX_API_KEY"], base_url=BASE_URL) @@ -45,12 +50,12 @@ def chat(user_msg: str, x_user_id: str, conv_id: str): ... ``` -One API key, one bill, one client, full isolation between end-users. +That gives you one bill and separated memory. The only rule is simple: never reuse one customer's **`session_id`** for another customer. - Per-call `user_id` scoping is on the SDK roadmap (1.2). Until then, encoding into `session_id` is the cleanest pattern. See the [multi-tenant SaaS recipe](/recipes/multi-tenant-saas). + Per-call `user_id` overrides are planned for **SDK 1.2**. Until then, include the end user in `session_id`. The [multi-tenant SaaS recipe](/recipes/multi-tenant-saas) shows the full pattern. - - Active vs deep, top_k, confidence. + + `top_k`, modes, confidence. diff --git a/concepts/usage-billing.mdx b/concepts/usage-billing.mdx index 7e66226..0276849 100644 --- a/concepts/usage-billing.mdx +++ b/concepts/usage-billing.mdx @@ -1,9 +1,12 @@ --- -title: "Usage and billing" -description: "Tokens in, tokens out, daily quotas." +title: "Usage, quotas, and billing" +description: "What Tex meters, how daily caps work, and how to keep memory costs predictable." +icon: "receipt" --- -The billable unit is **tokens** — `tokens_in` (everything you sent us) and `tokens_out` (everything we returned). Counted with `tiktoken cl100k_base`, same encoding as `text-embedding-3-large`. +Tex tracks two numbers: **`tokens_in`** and **`tokens_out`**. It counts them with **`tiktoken`** and the **`cl100k_base`** vocabulary. + +On the free tier, each org gets **1M** tokens in and **5M** tokens out per UTC day. If either limit is exceeded, the API returns **`429`**. Every **`remember`** and **`recall`** response includes a **`usage`** object. The SDK helpers **`tex.usage.today()`** and **`tex.usage.summary()`** show the same totals as the dashboard. 
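As a rough sketch of how the two free-tier caps combine, the check below takes the larger of the two usage fractions, since crossing either cap returns `429`. The cap values come from the paragraph above. The function names and the plain-integer inputs are assumptions for illustration; in a real app the counts would come from your own reading of the usage totals.

```python
# Free-tier daily caps stated above (per org, per UTC day).
FREE_TIER_IN = 1_000_000
FREE_TIER_OUT = 5_000_000


def quota_fraction(
    tokens_in_used: int,
    tokens_out_used: int,
    cap_in: int = FREE_TIER_IN,
    cap_out: int = FREE_TIER_OUT,
) -> float:
    """Return the larger of the two usage fractions, since either cap can trip."""
    return max(tokens_in_used / cap_in, tokens_out_used / cap_out)


def should_skip_memory(
    tokens_in_used: int, tokens_out_used: int, limit: float = 0.9
) -> bool:
    """True when either counter is past `limit` of its daily cap."""
    return quota_fraction(tokens_in_used, tokens_out_used) > limit


# Hypothetical counts: input is at 10% of its cap, output is at 92%.
print(should_skip_memory(100_000, 4_600_000))
```

Checking both counters matters because a recall-heavy workload can exhaust `tokens_out` long before `tokens_in`.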
## Free tier @@ -16,48 +19,52 @@ The billable unit is **tokens** — `tokens_in` (everything you sent us) and `to -Reset 00:00 UTC. Both caps trigger `429 RateLimitError`. +Both reset at **00:00 UTC**. Crossing either limit raises **`RateLimitError`** (`HTTP 429`). + +## Usage in code -## Reading usage +### Per response -Every `remember` and `recall` response carries its own usage: +Every **`remember`** and **`recall`** returns usage for that call: ```python hits = tex.recall(q="...", session_id=sid) print(hits.usage.tokens_in, hits.usage.tokens_out) ``` -For org-wide totals: +### Per org (dashboards, cron jobs) ```python -today = tex.usage.today() # daily + active quota +today = tex.usage.today() # daily totals + quota headroom month = tex.usage.summary() # current calendar month march = tex.usage.summary("2026-03") ``` -Or visit the [Dashboard → Usage](https://app.getmetacognition.com/dashboard/usage) for graph form. +The [Dashboard Usage page](https://app.getmetacognition.com/dashboard/usage) shows the same numbers. -## Cost control +## Cost knobs -- **Cap `top_k`** — defaults 15/25. Cut to 5–8 for live chat. -- **Use `mode="active"`** — faster *and* cheaper than `deep`. -- **Pre-filter writes** — strip system messages and "ok" / "thanks" turns. -- **Batch `remember`** — pass dozens of turns in one call instead of looping. -- **Quota-aware routing** — gate non-essential paths when usage > 90%. +| Lever | Why it helps | +| --- | --- | +| **Lower `top_k`** | Defaults are 15 / 25. Live chat often needs only 5-8. | +| **Stay on `active`** | `deep` mode costs more time and tokens than `active`. | +| **Trim noisy writes** | Skip one-word acks and redundant system spam in `remember`. | +| **Batch turns** | Send many turns in one `remember` instead of dozens of calls. | +| **Quota-aware routing** | Fall back to “no memory” for non-critical paths when you are near the cap. 
| ```python if tex.usage.today().tokens_in_used / 1_000_000 > 0.9: return generate_without_memory(query) ``` -## Alerts +## Alerting -Today, alerting is opt-in: poll `tex.usage.today()` from your own monitoring (every 5 min) and page on `> 0.9` of either cap. Server-side email alerts at 80% are on the [roadmap](/changelog). +There is no hosted email alert yet. Poll **`tex.usage.today()`** from your own monitor. Page your team when either usage column crosses **~90%** of quota. Server-side emails near **80%** are on the [roadmap](/changelog). -## After launch +## Pricing (later) -Pay-as-you-go. Pricing TBA. The daily caps stay as soft alerts — we'll email you, not 429 you. +Billing will be pay-as-you-go once pricing is published. Daily caps stay in place as safety rails. Until the billing docs change, treat today's **`429`** behavior as the source of truth. - - Set up the Python client. + + `pip install tex-sdk` diff --git a/docs.json b/docs.json new file mode 100644 index 0000000..11550ae --- /dev/null +++ b/docs.json @@ -0,0 +1,180 @@ +{ + "$schema": "https://mintlify.com/docs.json", + "theme": "mint", + "name": "Tex | Memory API for agents", + "colors": { + "primary": "#F32C05", + "light": "#FF5530", + "dark": "#C82200" + }, + "favicon": "/favicon.png", + "navigation": { + "anchors": [ + { + "anchor": "Documentation", + "icon": "book-open", + "groups": [ + { + "group": "Get started", + "pages": ["quickstart", "introduction", "authentication"] + }, + { + "group": "Concepts", + "pages": [ + { + "group": "Memory & retrieval", + "pages": ["concepts/memory-model", "concepts/retrieval"] + }, + { + "group": "Tenancy & usage", + "pages": ["concepts/scopes", "concepts/usage-billing"] + } + ] + }, + { + "group": "Cookbook", + "pages": [ + { + "group": "Apps", + "pages": ["recipes/fastapi", "recipes/streamlit"] + }, + { + "group": "Agents & channels", + "pages": ["recipes/langchain", "recipes/slack-bot"] + }, + { + "group": "Production patterns", + "pages": 
["recipes/azure-openai-rag", "recipes/multi-tenant-saas"] + } + ] + }, + { + "group": "Evaluation", + "pages": ["benchmarks"] + }, + { + "group": "Migration", + "pages": [ + "migration/from-redis", + "migration/from-langchain-memory", + "migration/from-supermemory" + ] + }, + { + "group": "Support", + "pages": ["troubleshooting", "changelog"] + } + ] + }, + { + "anchor": "Python SDK", + "icon": "python", + "groups": [ + { + "group": "Python SDK", + "pages": [ + { + "group": "Setup", + "pages": ["sdk/installation", "sdk/client"] + }, + { + "group": "Calls", + "pages": [ + "sdk/conversations-remember", + "sdk/recall", + "sdk/usage", + "sdk/errors" + ] + } + ] + } + ] + }, + { + "anchor": "REST API", + "icon": "code", + "groups": [ + { + "group": "REST API", + "pages": [ + "api-reference/overview", + { + "group": "Auth & sessions", + "pages": [ + "api-reference/auth/signup", + "api-reference/auth/token-exchange", + "api-reference/auth/refresh" + ] + }, + { + "group": "Organization", + "pages": [ + "api-reference/account/me", + "api-reference/account/api-keys" + ] + }, + { + "group": "Memory", + "pages": [ + "api-reference/memory/ingest-memory", + "api-reference/memory/recall" + ] + }, + { + "group": "Usage", + "pages": [ + "api-reference/usage/today", + "api-reference/usage/summary" + ] + } + ] + } + ] + }, + { + "anchor": "GitHub", + "href": "https://github.com/metacoglabs", + "icon": "github" + } + ] + }, + "logo": { + "light": "/logo/light.png", + "dark": "/logo/dark.png" + }, + "background": { + "color": { + "light": "#FFFFFF", + "dark": "#0A0A0A" + } + }, + "navbar": { + "links": [ + { + "label": "Dashboard", + "href": "https://app.getmetacognition.com" + } + ], + "primary": { + "type": "button", + "label": "Get an API key", + "href": "https://app.getmetacognition.com/signup" + } + }, + "footer": { + "socials": { + "github": "https://github.com/metacoglabs", + "linkedin": "https://www.linkedin.com/company/metacognition-ai" + } + }, + "fonts": { + "heading": { + 
"family": "Geist", + "weight": 700 + }, + "body": { + "family": "Geist", + "weight": 400 + } + } +} diff --git a/introduction.mdx b/introduction.mdx index 955a6ad..e0574dd 100644 --- a/introduction.mdx +++ b/introduction.mdx @@ -1,35 +1,53 @@ --- -title: "Introduction" -description: "#1 on every major long-term memory benchmark. 93.3% on LoCoMo. 92.2% on LongMemEval_S. Sub-200ms retrieval. Zero LLM tokens during ingestion." +title: "Overview - What is Tex?" +description: "Long-term memory for assistants and agents. Store turns, recall the useful ones, and keep prompts small." +icon: "book-open" --- -Tex is the memory layer for AI agents — and the state of the art on every long-term memory benchmark we know of. Stream conversation turns to it; pull back the relevant slice on every new request. Bounded prompts, cross-session continuity, no Redis to babysit. + + New here? Start with the [Quickstart](/quickstart). Then read this overview and [Authentication](/authentication) before you ship. + + +Most chat apps make you choose between two bad options. You either send the whole chat history to the model, or you lose memory when the page refreshes. + +Tex gives you a simpler path. Store each turn as it happens. When the user asks the next question, ask Tex for the few memories that matter. Then call your model with that smaller context. + +Your app still runs the model, routes, and UI. Tex handles storage, search, ranking, and usage tracking. + +## API + +| Call | When | +| --- | --- | +| **`remember`** | Store new turns, plus optional metadata. | +| **`recall`** | Before you call the model, ask for the most relevant memories. | + + + Need access? Create an account in the [dashboard](https://app.getmetacognition.com/signup), copy the API key once, then follow the [Quickstart](/quickstart). Locally, set `TEX_API_KEY` or pass `api_key=` to the client. + + +## Start - First call in 5 minutes. + Install `tex-sdk`, store one turn, recall it, and print the score. 
- - Sign up at the dashboard. + + LoCoMo and LongMemEval_S results with splits, latency, and token counts. -## Best in class +## Benchmarks - **#1.** Beats EverMemOS (92.3%), MemMachine (91.7%), Zep (~85%), Mem0 (~66%). + Full-system benchmark. Tex is ahead of EverMemOS (**92.3%**), MemMachine v0.2 (**91.7%**), Zep (**~85%**), and Mem0 (**~66%**). See [Benchmarks](/benchmarks) for splits and methodology. - **#1.** Beats Emergence AI (86%), Supermemory (81.6%), Oracle GPT-4o (82.4%). + Active retrieval track. Tex is ahead of Emergence AI (**86.0%**), Supermemory (**81.6%**), and Zep (**71.2%**). See [Benchmarks](/benchmarks) for per-ability tables. - - Per-category breakdowns. Latency. Token efficiency. Ablations. - - -## The loop +## Loop @@ -46,37 +64,45 @@ Tex is the memory layer for AI agents — and the state of the art on every long ``` - Drop `context` into your system prompt. Reply. Persist the new turns. Loop. + Put `context` where your model reads it. Answer the user. Store the new turns. -## Concepts + + Low `confidence` at the start usually means the session has very little memory. Store more real turns and the score becomes more useful. + + +## More + +### Latency: active write vs background work + +The fast part of **`remember`** returns quickly. New turns are usually recallable within about **150 ms**. Tex then continues background work, such as observations, entities, and timeline updates. The diagrams and timing notes are in [How memory works](/concepts/memory-model). + +### Isolation between customers + +Use **`org_id`**, **`user_id`**, and **`session_id`** to keep memory separated. [Scopes and multi-tenancy](/concepts/scopes) shows how to map those fields to your users and tenants. + +### Python vs raw HTTP + +Use the [Python SDK](/sdk/installation) if you want token exchange and refresh handled for you. Use the [REST API](/api-reference/overview) from another language, or when your service already owns HTTP calls. 
+ +### Quotas and billing + +Tex meters `tokens_in` and `tokens_out` with daily caps. [Usage, quotas, and billing](/concepts/usage-billing) explains what counts and when limits reset. + +## Docs - Turns, observations, entities — what we store and why. + What lands in storage after `remember`. - Active vs deep modes. Confidence calibration. + Modes, `top_k`, confidence. - - Org / user / session — multi-tenant in two lines. - - - Tokens in / out. Quota. Cost control. - - - -## Build with it - - - `pip install tex-sdk` - - - Direct HTTP, any language. + Install and client setup. - Chatbot · agent · Slack · Streamlit + Apps, agents, production patterns. diff --git a/migration/from-langchain-memory.mdx b/migration/from-langchain-memory.mdx index d37f856..9a59d9c 100644 --- a/migration/from-langchain-memory.mdx +++ b/migration/from-langchain-memory.mdx @@ -1,25 +1,25 @@ --- -title: "From LangChain" -description: "Migrate from BaseChatMemory subclasses to Tex retrieval." +title: "Migrate from LangChain chat memory" +description: "Move from LangChain chat buffers to Tex-backed recall." --- -LangChain's built-in memory classes (`ConversationBufferMemory`, `ConversationBufferWindowMemory`, `ConversationSummaryMemory`, `ConversationKGMemory`) are **buffers, summarizers, or graphs** that live inside your process. Tex is a **hosted retrieval service.** +If you came from the [LangChain recipe](/recipes/langchain), this page gives the longer comparison. -The migration moves you from "give the LLM the whole sliding window" to "give the LLM the relevant slice." +LangChain memory classes live inside your process. They keep buffers, summaries, or graphs near the chain. Tex stores memory outside the process and returns the relevant slice when you call **`recall`**. ## Mapping | LangChain | Tex equivalent | | --- | --- | -| `ConversationBufferMemory` | `recall(q=user_msg, session_id=sid, top_k=20)` — surfaces the relevant 20 turns instead of the last N. 
| -| `ConversationBufferWindowMemory(k=10)` | `recall(q=user_msg, session_id=sid, top_k=10)` — same `k`, but ranked by relevance. | -| `ConversationSummaryMemory` | Tex stores extracted observations automatically. Surface them via `recall`'s `hits.observations`. | +| `ConversationBufferMemory` | `recall(q=user_msg, session_id=sid, top_k=20)` returns the relevant 20 turns instead of the last N. | +| `ConversationBufferWindowMemory(k=10)` | `recall(q=user_msg, session_id=sid, top_k=10)` uses the same `k`, but ranks by relevance. | +| `ConversationSummaryMemory` | Tex stores extracted observations automatically. Read them from `recall`'s `hits.observations`. | | `ConversationKGMemory` | Tex builds an entity graph in the background. Query via `hits.entities` (linked across observations). | ## Migration - + ```python from langchain.chains import ConversationChain from langchain.memory import ConversationBufferWindowMemory @@ -34,7 +34,7 @@ The migration moves you from "give the LLM the whole sliding window" to "give th answer = chain.invoke({"input": user_msg})["response"] ``` - + ```python from langchain.prompts import ChatPromptTemplate from langchain_openai import ChatOpenAI @@ -63,22 +63,13 @@ The migration moves you from "give the LLM the whole sliding window" to "give th -## What you give up +## Trade-offs -LangChain's memory classes are simple in-process objects with no network call. Tex adds: +LangChain memory has no network hop. Tex adds a network call: writes are usually around 150ms, and reads can take a few seconds. -- Network round-trip (~150ms `remember`, ~1.7s `recall`) — your request gets slower. -- A dependency on our service availability. +In return, memory survives deploys, prompts stay bounded, sessions can share memory, and each recall has a confidence score. -What you gain: - -- Memory survives restarts, deploys, and shard moves. -- Bounded prompt size — never hit the context limit. -- Cross-session continuity — same user across days/weeks/months. 
-- Confidence scoring — gate fallback paths. -- Observation extraction — structured "facts" without writing prompts. - -For a hobby project, LangChain's buffer is fine. For anything user-facing in production, Tex's tradeoff is right. +For a small hobby bot, a buffer may be enough. For a customer-facing app, Tex is usually the cleaner path. ## Drop-in adapter (optional) @@ -117,7 +108,7 @@ class TexChatMemory(BaseChatMemory): ) def clear(self) -> None: - # No-op — Tex memory persists by design + # No-op. Tex memory persists by design. pass ``` diff --git a/migration/from-redis.mdx b/migration/from-redis.mdx index 37fd58c..9ff22ed 100644 --- a/migration/from-redis.mdx +++ b/migration/from-redis.mdx @@ -1,11 +1,11 @@ --- -title: "From Redis" -description: "Replace your Redis-backed conversation log with Tex memory." +title: "Migrate from Redis (or a homegrown log)" +description: "Replace prompt-stuffed chat logs with remember and recall." --- -If your chatbot stores chat history in Redis (or Postgres, or Mongo) and dumps the whole thing into every prompt, this is the migration for you. +Use this guide if you store chat history in Redis, Postgres, or Mongo and send too much of it to the model on every request. -## The pattern you're replacing +## Before (Redis log) ```python before.py # Append every turn @@ -23,7 +23,7 @@ Pain points: - "Last 50" is a guess. Older relevant context gets evicted. - Redis cost grows linearly forever. -## The Tex version +## After (Tex) ```python after.py hits = tex.recall(q=user_msg, session_id=sid, top_k=8) @@ -35,11 +35,11 @@ tex.conversations.remember(session_id=sid, turns=[ ]) ``` -Three things change for the better: +Three things change: -1. **Bounded prompts.** You pull the *relevant* 8 turns regardless of how many exist. +1. **Bounded prompts.** You pull the relevant 8 turns regardless of how many exist. 2. **Cross-session continuity.** Use `f"chat-{user_id}"` to share memory across conversations. -3. 
**No retention policy to maintain.** Tex's confidence scoring decides what stays useful. +3. **Less prompt trimming.** You stop guessing which recent turns fit in context. ## Backfill plan @@ -54,7 +54,7 @@ Three things change for the better: {"role": t["role"], "text": t["text"], "timestamp": t["ts"]} for t in turns ] - # Big batches are fine — pass all turns at once + # Big batches are fine. Pass all turns at once. tex.conversations.remember(session_id=sid, turns=formatted) ``` @@ -65,7 +65,7 @@ Three things change for the better: - `recall(q=)` finds the right turn - Run **both** read paths in production for a week. Log when Tex's `confidence < 0.2`. If those rate is below your tolerance, proceed. + Run **both** read paths in production for a week. Log when Tex's `confidence < 0.2`. If that rate stays below your tolerance, proceed. Switch the prompt to use Tex hits. Keep the Redis writes for one more week as backup. @@ -77,7 +77,7 @@ Three things change for the better: ## Edge cases -- **Streaming responses.** Persist with `remember` *after* the stream completes. Background-task it so the next request isn't delayed. -- **System messages.** Don't migrate them — they consume tokens and add no recall value. -- **Tool calls.** Store the *result* of a tool call as an assistant turn, not the raw JSON. Recall surfaces text. -- **Audit log.** Tex isn't an audit store. Keep Redis (or a real append-only log) for compliance and use Tex for retrieval — two systems, two purposes. +- **Streaming responses.** Persist with `remember` after the stream completes. Run it in the background so the next request is not delayed. +- **System messages.** Do not migrate them. They consume tokens and add little recall value. +- **Tool calls.** Store the result of a tool call as an assistant turn, not the raw JSON. Recall returns text. +- **Audit log.** Tex is not an audit store. Keep Redis or another append-only log for compliance, and use Tex for retrieval. 
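The shadow-read step in the backfill plan comes down to one number: the share of recalls whose `confidence` fell under 0.2. A minimal sketch of that check follows, assuming you have logged one confidence value per shadow recall. The function names and the sample values are hypothetical.

```python
from typing import Iterable


def low_confidence_rate(confidences: Iterable[float], threshold: float = 0.2) -> float:
    """Fraction of logged recalls whose confidence is below `threshold`."""
    scores = list(confidences)
    if not scores:
        # No shadow traffic means there is nothing to base a decision on.
        raise ValueError("no confidence values logged")
    low = sum(1 for c in scores if c < threshold)
    return low / len(scores)


def ready_to_cut_over(
    confidences: Iterable[float], tolerance: float, threshold: float = 0.2
) -> bool:
    """True when the low-confidence rate is below your own tolerance."""
    return low_confidence_rate(confidences, threshold) < tolerance


# Hypothetical week of shadow reads: one of four recalls was weak.
logged = [0.1, 0.5, 0.9, 0.9]
print(low_confidence_rate(logged), ready_to_cut_over(logged, tolerance=0.3))
```

The tolerance is yours to choose. The source only says to proceed once the rate stays below it.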
diff --git a/migration/from-supermemory.mdx b/migration/from-supermemory.mdx index 6878da4..d82b4f5 100644 --- a/migration/from-supermemory.mdx +++ b/migration/from-supermemory.mdx @@ -1,9 +1,9 @@ --- -title: "From Supermemory" -description: "Migrate from Supermemory's SDK to Tex with minimal code changes." +title: "Migrate from Supermemory" +description: "Map Supermemory calls to Tex calls." --- -The Tex Python SDK is **resource-shape compatible** with Supermemory for the verbs we both implement. If you're using `supermemory.add(...)`, `client.search(...)`, or `client.profile(...)`, the migration is mostly a pip install and a base-url swap. +The Tex Python SDK keeps Supermemory-compatible calls for the verbs both SDKs support. If you use `supermemory.add(...)`, `client.search(...)`, or `client.profile(...)`, most of the migration is installing `tex-sdk` and changing the base URL. ## Drop-in compatibility @@ -21,14 +21,14 @@ These calls work the same in Tex: | `client.documents.batch_add([...])` | `tex.documents.batch_add([...])` | - Tex's `search` is a **resource**, not a callable — `tex.search(q)` will raise `TypeError`. Use `tex.search.documents(q)` or `tex.search.memories(q)`. There's also a generic `tex.search.execute(q)` alias for `tex.search.documents(q)` if you want a single entry point. + Tex's `search` is a **resource**, not a callable. `tex.search(q)` raises `TypeError`. Use `tex.search.documents(q)` or `tex.search.memories(q)`. `tex.search.execute(q)` is also available as an alias for `tex.search.documents(q)`. -Tex's `/v3/*` and `/v4/*` paths accept the **raw API key** as Bearer (no JWT exchange) for full Supermemory compatibility — the SDK routes them automatically. +Tex's `/v3/*` and `/v4/*` paths accept the raw API key as Bearer for Supermemory compatibility. The SDK routes those calls automatically. 
## Exclusive Tex features -These are Tex-only verbs you can adopt incrementally: +These Tex-only verbs can be adopted one at a time: | Verb | What it adds | | --- | --- | @@ -60,15 +60,15 @@ These are Tex-only verbs you can adopt incrementally: Existing `add` / `search` / `profile` calls keep working unchanged. - Replace your conversation-history pattern with `tex.conversations.remember` + `tex.recall` for stronger temporal awareness and confidence scoring. See [memory model](/concepts/memory-model). + Replace your conversation-history pattern with `tex.conversations.remember` + `tex.recall` for stronger temporal awareness and confidence scoring. The mental model is in [How memory works](/concepts/memory-model). ## Behavioral differences -- **Auth.** Supermemory uses the API key as Bearer. Tex does the same on `/v3/*` and `/v4/*` (compat paths) but exchanges to JWT for Tex-native paths. The SDK handles both transparently. -- **Retention.** Persistent on both. No "free tier expires" wipe on Tex — your memory stays until you delete it. -- **Pricing.** Tex bills on `tokens_in` / `tokens_out`, not document count. Chatty workloads are usually cheaper; document-heavy ones — measure with `tex.usage.summary()` after a representative day. +- **Auth.** Supermemory uses the API key as Bearer. Tex does the same on `/v3/*` and `/v4/*`, but exchanges to JWT for Tex-native paths. The SDK handles both. +- **Retention.** Memory is persistent. Tex does not wipe free-tier memory on a timer. It stays until you delete it. +- **Pricing.** Tex bills on `tokens_in` / `tokens_out`, not document count. For document-heavy workloads, measure with `tex.usage.summary()` after a representative day. - **Profiles.** Tex's `profile` aggregates over `container_tag` like Supermemory's. Backfill works without changes. If you hit a Supermemory verb that doesn't behave identically, file an issue. 
diff --git a/mint.json b/mint.json deleted file mode 100644 index 9e84229..0000000 --- a/mint.json +++ /dev/null @@ -1,160 +0,0 @@ -{ - "$schema": "https://mintlify.com/schema.json", - "name": "Tex", - "logo": { - "dark": "/logo/dark.png", - "light": "/logo/light.png" - }, - "favicon": "/favicon.png", - "colors": { - "primary": "#F32C05", - "light": "#FF5530", - "dark": "#C82200", - "background": { - "light": "#FFFFFF", - "dark": "#0A0A0A" - }, - "anchors": { - "from": "#F32C05", - "to": "#FF5530" - } - }, - "font": { - "headings": { - "family": "Geist", - "weight": 700 - }, - "body": { - "family": "Source Serif 4", - "weight": 400 - } - }, - "topbarLinks": [ - { - "name": "Dashboard", - "url": "https://app.getmetacognition.com" - } - ], - "topbarCtaButton": { - "name": "Get an API key", - "url": "https://app.getmetacognition.com/signup" - }, - "anchors": [ - { - "name": "Python SDK", - "icon": "python", - "url": "sdk" - }, - { - "name": "API Reference", - "icon": "code", - "url": "api-reference" - }, - { - "name": "GitHub", - "icon": "github", - "url": "https://github.com/metacoglabs" - } - ], - "navigation": [ - { - "group": "Get started", - "pages": [ - "introduction", - "quickstart", - "authentication" - ] - }, - { - "group": "Benchmarks", - "pages": [ - "benchmarks" - ] - }, - { - "group": "Guides", - "pages": [ - "concepts/memory-model", - "concepts/retrieval", - "concepts/scopes", - "concepts/usage-billing" - ] - }, - { - "group": "Python SDK", - "pages": [ - "sdk/installation", - "sdk/client", - "sdk/conversations-remember", - "sdk/recall", - "sdk/usage", - "sdk/errors" - ] - }, - { - "group": "API Reference", - "pages": [ - "api-reference/overview", - { - "group": "Auth", - "pages": [ - "api-reference/auth/signup", - "api-reference/auth/token-exchange", - "api-reference/auth/refresh" - ] - }, - { - "group": "Account", - "pages": [ - "api-reference/account/me", - "api-reference/account/api-keys" - ] - }, - { - "group": "Memory", - "pages": [ - 
"api-reference/memory/ingest-memory", - "api-reference/memory/recall" - ] - }, - { - "group": "Usage", - "pages": [ - "api-reference/usage/today", - "api-reference/usage/summary" - ] - } - ] - }, - { - "group": "Cookbook", - "pages": [ - "recipes/fastapi", - "recipes/langchain", - "recipes/azure-openai-rag", - "recipes/slack-bot", - "recipes/streamlit", - "recipes/multi-tenant-saas" - ] - }, - { - "group": "Migrate", - "pages": [ - "migration/from-redis", - "migration/from-langchain-memory", - "migration/from-supermemory" - ] - }, - { - "group": "Help", - "pages": [ - "troubleshooting", - "changelog" - ] - } - ], - "footerSocials": { - "github": "https://github.com/metacoglabs", - "linkedin": "https://www.linkedin.com/company/metacognition-ai" - } -} diff --git a/quickstart.mdx b/quickstart.mdx index b63912a..0c3be6f 100644 --- a/quickstart.mdx +++ b/quickstart.mdx @@ -1,26 +1,47 @@ --- title: "Quickstart" -description: "First call in five minutes." +description: "Install tex-sdk, set TEX_API_KEY, store one turn, recall it, and read the score." +icon: "rocket" --- +## Goal + +In a few minutes, you will run one Python script that does the core Tex flow: + +1. Store a turn with **`remember`**. +2. Ask a question with **`recall`**. +3. Print the matching memory, confidence, and token usage. + +You need **Python 3.9+**. + - Sign up at [app.getmetacognition.com](https://app.getmetacognition.com/signup) and copy the key shown once. + Open [app.getmetacognition.com/signup](https://app.getmetacognition.com/signup), create an account, and copy the key shown once. - The key appears only on the screen after signup — store it now. + You only see the full key when you create it. Store it now in a password manager or secret store. - - ```bash + + + ```bash pip pip install tex-sdk ``` - Or `uv add tex-sdk` / `poetry add tex-sdk`. Requires Python ≥ 3.9. Distribution name is `tex-sdk`; import name is `tex`. 
+ ```bash uv + uv add tex-sdk + ``` + + ```bash poetry + poetry add tex-sdk + ``` + + + On PyPI the package is **`tex-sdk`**. In Python you **`import tex`**. - + ```python first_call.py import os from tex import Tex @@ -47,26 +68,52 @@ description: "First call in five minutes." export TEX_API_KEY="tex_live_..." python first_call.py ``` + + + You should see the shellfish line with a score, plus `confidence` and token usage. If you get `AuthenticationError`, check your key and `base_url`, then use [Troubleshooting](/troubleshooting). + +## Next + +Once the script works, decide how you want to load secrets and whether you want to stay on the SDK. + +### Load the key from a `.env` file + +Use `python-dotenv` or your framework's loader. Keep `.env` out of git. In production, load `TEX_API_KEY` from your normal secret store. + +### Call the API without the SDK + +If you do not use the SDK, first exchange your API key for an **access token** and **refresh token**. Send `Authorization: Bearer ...` on ingest and recall. Refresh the access token when it expires. The [REST API overview](/api-reference/overview) lists the endpoints. + +The SDK does this for you. + +## Reads + - Confidence will be low when memory is fresh — keep adding turns and watch it climb. + If **`confidence`** is low but you know the memory exists, try **`mode="deep"`** once, raise **`top_k`**, or ask the question closer to the stored wording. [Recall and ranking](/concepts/retrieval) explains the knobs. -## Next +| Goal | Page | +| --- | --- | +| Understand what gets stored | [How memory works](/concepts/memory-model) | +| Tune recall quality | [Recall and ranking](/concepts/retrieval) | +| Ship real users | [Scopes and multi-tenancy](/concepts/scopes) | +| Put a backend in front of a UI | [Production chatbot (FastAPI)](/recipes/fastapi) | +| Production errors | [Errors and retries](/sdk/errors) | - What's stored and how. + Turns, observations, entities after each `remember`. - - Multi-user partitioning. 
+ + Map `session_id` (and tenants) to your product. - - Production chatbot in 40 lines. + + Small service pattern behind a UI. - - Every exception, every retry rule. + + What the SDK throws and what it retries. diff --git a/recipes/azure-openai-rag.mdx b/recipes/azure-openai-rag.mdx index 6d1ab14..434f3b3 100644 --- a/recipes/azure-openai-rag.mdx +++ b/recipes/azure-openai-rag.mdx @@ -1,95 +1,103 @@ --- -title: "RAG with Azure GPT-4o" -description: "Tex memory + GPT-4o on Azure for production RAG." +title: "RAG on Azure OpenAI" +description: "Use Tex recall with Azure OpenAI chat completions." +icon: "cloud" --- -The canonical "memory + LLM" loop with Azure OpenAI. Same shape works for OpenAI direct, Anthropic, or any chat-completion API — swap the SDK. - -## Install - -```bash -pip install tex-sdk openai -``` - -## Environment - -```bash -# .env -TEX_API_KEY=tex_live_... -TEX_BASE_URL=https://api.getmetacognition.com - -AZURE_OPENAI_ENDPOINT=https://.openai.azure.com -AZURE_OPENAI_API_KEY=... -AZURE_OPENAI_DEPLOYMENT=gpt-4o -AZURE_OPENAI_API_VERSION=2025-04-01-preview -``` - -## Code - -```python rag.py -import os -from datetime import datetime, timezone -from openai import AzureOpenAI -from tex import Tex, RateLimitError, APITimeoutError - -tex = Tex(api_key=os.environ["TEX_API_KEY"], base_url=os.environ["TEX_BASE_URL"], timeout=10) -gpt = AzureOpenAI( - api_key=os.environ["AZURE_OPENAI_API_KEY"], - api_version=os.environ["AZURE_OPENAI_API_VERSION"], - azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"], -) - -def now_iso() -> str: - return datetime.now(timezone.utc).isoformat().replace("+00:00", "Z") - -def answer(query: str, sid: str) -> dict: - # 1. Recall — soft-fail - memory: list[str] = [] - confidence = 0.0 - try: - hits = tex.recall(q=query, session_id=sid, top_k=5) - memory = [h.text for h in hits.hits.turns] - confidence = hits.confidence - except (RateLimitError, APITimeoutError): - pass - - # 2. Generate - sys_msg = ( - "You are a helpful assistant. 
" - + (f"Relevant memory:\n{chr(10).join('- ' + m for m in memory)}" - if memory else "") +This recipe uses the same flow as the [FastAPI recipe](/recipes/fastapi). Recall memory, answer with Azure OpenAI, then store the new turn. + + + + ```bash + pip install tex-sdk openai + ``` + + + + ```bash + TEX_API_KEY=tex_live_... + TEX_BASE_URL=https://api.getmetacognition.com + + AZURE_OPENAI_ENDPOINT=https://.openai.azure.com + AZURE_OPENAI_API_KEY=... + AZURE_OPENAI_DEPLOYMENT=gpt-4o + AZURE_OPENAI_API_VERSION=2025-04-01-preview + ``` + + + + ```python + import os + from openai import AzureOpenAI + from tex import Tex + + tex = Tex( + api_key=os.environ["TEX_API_KEY"], + base_url=os.environ["TEX_BASE_URL"], + timeout=10, ) - chat = gpt.chat.completions.create( - model=os.environ["AZURE_OPENAI_DEPLOYMENT"], - messages=[ - {"role": "system", "content": sys_msg}, - {"role": "user", "content": query}, - ], - temperature=0.4, + gpt = AzureOpenAI( + api_key=os.environ["AZURE_OPENAI_API_KEY"], + api_version=os.environ["AZURE_OPENAI_API_VERSION"], + azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"], ) - reply = chat.choices[0].message.content - - # 3. Remember - tex.conversations.remember(session_id=sid, turns=[ - {"role":"user","text": query, "timestamp": now_iso()}, - {"role":"assistant","text": reply, "timestamp": now_iso()}, - ]) - - return {"answer": reply, "confidence": confidence, "memory_used": len(memory)} - -if __name__ == "__main__": - import json - print(json.dumps(answer("any food restrictions?", "demo"), indent=2)) -``` - -## Run - -```bash -python rag.py -``` - -## Notes - -- **Cite sources.** Pass each hit's `id` to GPT-4o and instruct it to cite `[mem:abc]` — you can later resolve those IDs back to the turn for click-through. -- **Confidence-gated retry.** If `confidence < 0.4`, retry with `mode="deep"`. Costs +1–4s, rescues most weak-recall cases. -- **Streaming.** `stream=True` works as expected. 
Push `remember` to a background task so it doesn't delay the stream. + ``` + + + + ```python + from datetime import datetime, timezone + from tex import RateLimitError, APITimeoutError + + def now_iso() -> str: + return datetime.now(timezone.utc).isoformat().replace("+00:00", "Z") + + def answer(query: str, sid: str) -> dict: + memory: list[str] = [] + confidence = 0.0 + try: + hits = tex.recall(q=query, session_id=sid, top_k=5) + memory = [h.text for h in hits.hits.turns] + confidence = hits.confidence + except (RateLimitError, APITimeoutError): + pass + + sys_msg = ( + "You are a helpful assistant. " + + (f"Relevant memory:\n{chr(10).join('- ' + m for m in memory)}" if memory else "") + ) + chat = gpt.chat.completions.create( + model=os.environ["AZURE_OPENAI_DEPLOYMENT"], + messages=[ + {"role": "system", "content": sys_msg}, + {"role": "user", "content": query}, + ], + temperature=0.4, + ) + reply = chat.choices[0].message.content + + tex.conversations.remember( + session_id=sid, + turns=[ + {"role": "user", "text": query, "timestamp": now_iso()}, + {"role": "assistant", "text": reply, "timestamp": now_iso()}, + ], + ) + return {"answer": reply, "confidence": confidence, "memory_used": len(memory)} + ``` + + + + ```python + if __name__ == "__main__": + import json + print(json.dumps(answer("any food restrictions?", "demo-session"), indent=2)) + ``` + + + +## Production notes + +- **Citations:** pass hit ids into the prompt and ask the model to quote `[mem:]`. Use that id to link answers back to memory. +- **Weak recall:** if `confidence < 0.4`, call `tex.recall(..., mode="deep")` once before answering. +- **Streaming:** set `stream=True`, then enqueue `remember` in a background worker so the stream can start quickly. 
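
Note on the "Weak recall" bullet in `recipes/azure-openai-rag.mdx`: the retry-once rule can be sketched as a small wrapper. This is a sketch under stated assumptions, not the SDK's own helper. The function name is hypothetical, and `recall` is passed in as a callable so the example runs without the SDK. The `mode="deep"` argument, the `confidence` attribute, and the `0.4` threshold are taken from the recipe text.

```python
from typing import Any, Callable


def recall_with_deep_retry(
    recall: Callable[..., Any],
    *,
    q: str,
    session_id: str,
    top_k: int = 5,
    threshold: float = 0.4,
) -> Any:
    """Recall once; if confidence is below the threshold, retry a single time in deep mode."""
    result = recall(q=q, session_id=session_id, top_k=top_k)
    if result.confidence < threshold:
        # One deep retry only, so worst-case latency stays bounded.
        result = recall(q=q, session_id=session_id, top_k=top_k, mode="deep")
    return result
```

In the recipe's `answer` function this would replace the direct call, for example `hits = recall_with_deep_retry(tex.recall, q=query, session_id=sid)`, inside the existing `try` block so rate-limit and timeout errors still degrade to no-memory generation.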
diff --git a/recipes/fastapi.mdx b/recipes/fastapi.mdx index 6e24b0c..7546cc5 100644 --- a/recipes/fastapi.mdx +++ b/recipes/fastapi.mdx @@ -1,20 +1,138 @@ --- -title: "Chatbot backend (FastAPI)" -description: "Drop-in chatbot backend with Tex memory." +title: "Production chatbot (FastAPI)" +description: "Build a FastAPI chat route that recalls memory, calls your model, and stores the new turn." +icon: "server" --- -Chatbot backend in ~40 lines. One Tex client per process; memory recall on every turn. +This recipe puts the quickstart loop behind one HTTP route: -## Layout +1. Recall memory for the incoming message. +2. Call your model. +3. Store the user and assistant turns. -```text -app/ -├── deps.py # Tex singleton -├── main.py # FastAPI app -└── chat.py # /chat endpoint -``` +Create one **`Tex`** client per process and reuse it across requests. + +## Layout -## Code +| File | Job | +| --- | --- | +| `deps.py` | Cached Tex | +| `chat.py` | `/chat` | +| `main.py` | App entry | + + + + Create this structure: + + ```text + app/ + ├── deps.py # cached Tex client + ├── main.py # FastAPI entry + └── chat.py # /chat route + ``` + + Rename `app/` if you want. Keep the import paths consistent in `uvicorn`. + + + + Read secrets from the environment and construct Tex **once**: + + ```python deps.py + from functools import cache + from tex import Tex + import os + + @cache + def tex_client() -> Tex: + return Tex( + api_key=os.environ["TEX_API_KEY"], + base_url=os.environ.get( + "TEX_BASE_URL", "https://api.getmetacognition.com" + ), + timeout=10, + ) + ``` + + Use `tex_client()` inside FastAPI `Depends(...)` so every route shares the same connection pool. + + + + Derive `session_id` from the user and the chat. Recall with a small `top_k`. If Tex times out or quota is exhausted, answer without memory. 
Then store both sides of the turn: + + ```python chat.py + from datetime import datetime, timezone + from fastapi import APIRouter, Depends, Header + from pydantic import BaseModel + from tex import Tex, RateLimitError, APITimeoutError + from .deps import tex_client + + router = APIRouter() + + def now_iso() -> str: + return datetime.now(timezone.utc).isoformat().replace("+00:00", "Z") + + class ChatBody(BaseModel): + text: str + session_id: str + + @router.post("/chat") + def chat( + body: ChatBody, + x_user_id: str = Header(...), + tex: Tex = Depends(tex_client), + ): + sid = f"u_{x_user_id}-{body.session_id}" + + memory: list[str] = [] + try: + hits = tex.recall(q=body.text, session_id=sid, top_k=5) + memory = [h.text for h in hits.hits.turns] + except (RateLimitError, APITimeoutError): + memory = [] + + answer = your_llm.complete(body.text, memory=memory) + + tex.conversations.remember( + session_id=sid, + turns=[ + {"role": "user", "text": body.text, "timestamp": now_iso()}, + {"role": "assistant", "text": answer, "timestamp": now_iso()}, + ], + ) + + return {"answer": answer} + ``` + + Replace `your_llm.complete(...)` with your model call. + + + + Mount the router once: + + ```python main.py + from fastapi import FastAPI + from .chat import router + + app = FastAPI() + app.include_router(router) + ``` + + + + Export your key and launch uvicorn: + + ```bash + export TEX_API_KEY=tex_live_... + uvicorn app.main:app --reload + ``` + + Send `POST /chat` with JSON `{"text":"...","session_id":"..."}` and header `x-user-id`. + + + +## Full files + +If you prefer one copy block, paste these files: ```python deps.py @@ -58,7 +176,6 @@ def chat( ): sid = f"u_{x_user_id}-{body.session_id}" - # 1. Recall — gracefully degrade on timeout/quota memory: list[str] = [] try: hits = tex.recall(q=body.text, session_id=sid, top_k=5) @@ -66,10 +183,8 @@ def chat( except (RateLimitError, APITimeoutError): memory = [] - # 2. 
Generate — your LLM of choice answer = your_llm.complete(body.text, memory=memory) - # 3. Remember (fire and forget — see notes) tex.conversations.remember(session_id=sid, turns=[ {"role": "user", "text": body.text, "timestamp": now_iso()}, {"role": "assistant", "text": answer, "timestamp": now_iso()}, @@ -87,31 +202,36 @@ app.include_router(router) ``` -## Run - -```bash -export TEX_API_KEY=tex_live_... -uvicorn app.main:app --reload -``` - ## Production tweaks -**Push `remember` to a background task.** Holding the request open for the write adds 100–250ms to tail latency: +### Run `remember` in the background + +Do not make the user wait for **`remember`**. Enqueue it in the background: ```python from fastapi import BackgroundTasks @router.post("/chat") -def chat(body: ChatBody, bg: BackgroundTasks, tex: Tex = Depends(tex_client)): - ... - bg.add_task(tex.conversations.remember, - session_id=sid, turns=[user_turn, assistant_turn]) +def chat( + body: ChatBody, + bg: BackgroundTasks, + x_user_id: str = Header(...), + tex: Tex = Depends(tex_client), +): + # ... recall + answer ... + bg.add_task( + tex.conversations.remember, + session_id=sid, + turns=[user_turn, assistant_turn], + ) return {"answer": answer} ``` -**Bound recall latency** with `Tex(timeout=2.0)`; catch `APITimeoutError` and degrade to no-memory generation. +### Bound recall latency + +Set **`Tex(timeout=2.0)`** and catch **`APITimeoutError`**. If recall is slow, answer without memory instead of blocking the whole chat request. -**Health check:** +### Add a health probe ```python @app.get("/healthz") diff --git a/recipes/langchain.mdx b/recipes/langchain.mdx index 24246cb..445de01 100644 --- a/recipes/langchain.mdx +++ b/recipes/langchain.mdx @@ -1,107 +1,145 @@ --- -title: "Agent memory (LangChain)" -description: "Use Tex as an agent's memory tool." +title: "LangChain agents with memory" +description: "Add Tex memory to LangChain by injecting recall before the chain or exposing recall as a tool." 
+icon: "link" --- -Two patterns: - -1. **Memory tool** — agent explicitly calls `recall` when it needs context. -2. **Pre-prompt injection** — controller code recalls before the LLM call and stuffs results into the prompt. - -Pattern 2 is simpler and almost always sufficient. Pattern 1 is right for true autonomous agents. - -## Pattern 1: memory as a tool - -```python -# pip install tex-sdk langchain langchain-openai -import os -from datetime import datetime, timezone -from tex import Tex -from langchain.tools import tool -from langchain.agents import create_react_agent, AgentExecutor -from langchain_openai import ChatOpenAI - -tex = Tex(api_key=os.environ["TEX_API_KEY"], base_url=os.environ["TEX_BASE_URL"]) -SESSION = "agent-1" - -def now_iso() -> str: - return datetime.now(timezone.utc).isoformat().replace("+00:00", "Z") - -@tool -def recall_memory(query: str) -> str: - """Look up the agent's long-term memory for relevant context. - Returns up to 5 most-relevant past statements.""" - hits = tex.recall(q=query, session_id=SESSION, top_k=5) - if not hits.hits.turns: - return "(no relevant memory)" - return "\n".join(f"- {h.text}" for h in hits.hits.turns) - -@tool -def remember_fact(text: str) -> str: - """Persist a fact for future recall.""" - tex.conversations.remember(session_id=SESSION, turns=[ - {"role":"system","text":text,"timestamp":now_iso()}, - ]) - return "remembered" - -agent = create_react_agent( - ChatOpenAI(model="gpt-4o"), - tools=[recall_memory, remember_fact, ...], - prompt="...", -) -``` +There are two common ways to use Tex with LangChain. + +Most chat apps should recall memory before the chain runs. Agents that choose their own steps can receive Tex as tools. -The agent decides when to invoke `recall_memory` — useful for multi-step plans where some steps need history and others don't. +| Pattern | When to use | +| --- | --- | +| **Inject** | Your code recalls once per user message. 
| +| **Tools** | The model decides when to read or write memory. | -## Pattern 2: injection at the controller level +## Inject (default) -```python -import os -from tex import Tex + + + ```bash + pip install tex-sdk langchain langchain-openai + ``` + -tex = Tex(api_key=os.environ["TEX_API_KEY"], base_url=os.environ["TEX_BASE_URL"]) + + ```python + import os + from tex import Tex -def chain(user_msg: str, sid: str) -> str: - # 1. Recall - hits = tex.recall(q=user_msg, session_id=sid, top_k=5) - memory = "\n".join(f"- {h.text}" for h in hits.hits.turns) + tex = Tex( + api_key=os.environ["TEX_API_KEY"], + base_url=os.environ["TEX_BASE_URL"], + ) + ``` + - # 2. Build prompt with memory + + ```python + from datetime import datetime, timezone from langchain.prompts import ChatPromptTemplate from langchain_openai import ChatOpenAI - prompt = ChatPromptTemplate.from_messages([ - ("system", "Relevant memory about the user:\n{memory}"), - ("user", "{input}"), - ]) - chain = prompt | ChatOpenAI(model="gpt-4o") - - answer = chain.invoke({"memory": memory, "input": user_msg}).content - - # 3. 
Persist - tex.conversations.remember(session_id=sid, turns=[ - {"role":"user","text":user_msg,"timestamp": now_iso()}, - {"role":"assistant","text":answer,"timestamp": now_iso()}, - ]) - return answer -``` - -## Replacing `BaseChatMemory` + def now_iso() -> str: + return datetime.now(timezone.utc).isoformat().replace("+00:00", "Z") + + def answer_turn(user_msg: str, session_id: str) -> str: + hits = tex.recall(q=user_msg, session_id=session_id, top_k=5) + memory = "\n".join(f"- {h.text}" for h in hits.hits.turns) + + prompt = ChatPromptTemplate.from_messages([ + ("system", "Relevant memory about the user:\n{memory}"), + ("user", "{input}"), + ]) + chain = prompt | ChatOpenAI(model="gpt-4o") + + reply = chain.invoke({"memory": memory, "input": user_msg}).content + + tex.conversations.remember( + session_id=session_id, + turns=[ + {"role": "user", "text": user_msg, "timestamp": now_iso()}, + {"role": "assistant", "text": reply, "timestamp": now_iso()}, + ], + ) + return reply + ``` + + + +## Tools + + + + ```python + import os + from datetime import datetime, timezone + from tex import Tex + from langchain.tools import tool + from langchain.agents import create_react_agent, AgentExecutor + from langchain_openai import ChatOpenAI -LangChain's built-in memory (`ConversationBufferMemory`, etc.) is a buffer. Tex is retrieval. 
The migration is: + tex = Tex(api_key=os.environ["TEX_API_KEY"], base_url=os.environ["TEX_BASE_URL"]) + SESSION = "agent-1" + + def now_iso() -> str: + return datetime.now(timezone.utc).isoformat().replace("+00:00", "Z") + + @tool + def recall_memory(query: str) -> str: + """Look up long-term memory; returns bullet list of statements.""" + hits = tex.recall(q=query, session_id=SESSION, top_k=5) + if not hits.hits.turns: + return "(no relevant memory)" + return "\n".join(f"- {h.text}" for h in hits.hits.turns) + + @tool + def remember_fact(text: str) -> str: + """Persist a fact for later recall.""" + tex.conversations.remember( + session_id=SESSION, + turns=[{"role": "system", "text": text, "timestamp": now_iso()}], + ) + return "remembered" + ``` + + + + ```python + agent = create_react_agent( + ChatOpenAI(model="gpt-4o"), + tools=[recall_memory, remember_fact], + prompt="...", # you supply + ) + + executor = AgentExecutor(agent=agent, tools=[recall_memory, remember_fact]) + ``` + + + + + Add other LangChain tools to the same `tools=[...]` list. The Tex tools behave like normal tools. + + +## Compared to `BaseChatMemory` + +LangChain buffers keep history in process. Tex stores memory outside the process and returns the top matches for the current question. That keeps prompts smaller and survives deploys. + + +```python Before +from langchain.chains import ConversationChain +from langchain.memory import ConversationBufferMemory -```python -# BEFORE — full history in the prompt memory = ConversationBufferMemory() chain = ConversationChain(llm=llm, memory=memory) +``` -# AFTER — top-k relevant turns, regardless of recency +```python After hits = tex.recall(q=user_msg, session_id=sid, top_k=5) prompt = stitch(hits.hits.turns, user_msg) answer = llm.invoke(prompt) -tex.conversations.remember(session_id=sid, turns=[...]) +tex.conversations.remember(session_id=sid, turns=[...]) # use your normal turn format ``` + -You stop maintaining a sliding window. 
You stop hitting context-length errors. The retrieval is bounded. - -See [Migrating from LangChain memory](/migration/from-langchain-memory) for a full walk-through. +For the migration details, continue to [Migrating from LangChain memory](/migration/from-langchain-memory). diff --git a/recipes/multi-tenant-saas.mdx b/recipes/multi-tenant-saas.mdx index c6173ac..09e05ec 100644 --- a/recipes/multi-tenant-saas.mdx +++ b/recipes/multi-tenant-saas.mdx @@ -1,108 +1,131 @@ --- -title: "Multi-tenant SaaS" -description: "One Tex key, partitioned per end-user." +title: "Multi-tenant SaaS pattern" +description: "Choose one shared Tex org or one Tex org per customer." +icon: "building" --- -You're building a product where each of your customers has their own private memory. Two patterns; almost everyone wants Pattern A. - -## Pattern A — one Tex key, encode end-user into session (recommended) - -You hold a single Tex API key in your secrets manager. End-users are partitioned by encoding their id into the `session_id` you pass on each call. One shared, long-lived client serves all your traffic. - -```python deps.py -from functools import cache -from tex import Tex -import os - -@cache -def shared_tex() -> Tex: - return Tex( - api_key=os.environ["TEX_API_KEY"], - base_url=os.environ["TEX_BASE_URL"], - ) -``` - -```python chat.py -from fastapi import APIRouter, Header -from .deps import shared_tex - -router = APIRouter() - -@router.post("/chat") -def chat(body: ChatBody, x_user_id: str = Header(...)): - tex = shared_tex() - sid = f"u_{x_user_id}-{body.session_id}" # user-scoped session - - hits = tex.recall(q=body.text, session_id=sid) - answer = your_llm(body.text, memory=hits) - tex.conversations.remember(session_id=sid, turns=[...]) - return {"answer": answer} -``` +There are two common ways to isolate tenants. Most teams should start with **Pattern A**. 
+ +## A - one key, put the user in `session_id` + + + + ```python deps.py + from functools import cache + from tex import Tex + import os + + @cache + def shared_tex() -> Tex: + return Tex( + api_key=os.environ["TEX_API_KEY"], + base_url=os.environ["TEX_BASE_URL"], + ) + ``` + + + + ```python chat.py + from pydantic import BaseModel + from fastapi import APIRouter, Header + from .deps import shared_tex + + router = APIRouter() + + class ChatBody(BaseModel): + text: str + session_id: str + + @router.post("/chat") + def chat(body: ChatBody, x_user_id: str = Header(...)): + tex = shared_tex() + sid = f"u_{x_user_id}-{body.session_id}" + + hits = tex.recall(q=body.text, session_id=sid) + answer = your_llm(body.text, memory=hits) + tex.conversations.remember( + session_id=sid, + turns=[ + {"role": "user", "text": body.text, "timestamp": "2026-05-13T00:00:00Z"}, + {"role": "assistant", "text": answer, "timestamp": "2026-05-13T00:00:00Z"}, + ], + ) + return {"answer": answer} + ``` + + Replace `your_llm(...)` with your model call. Reuse the same turn format you already store. + + | Trait | Pattern A | | --- | --- | -| Number of Tex keys to manage | 1 | -| Bills | 1 (yours) | -| Dashboard access for end-users | None (you build your own UI) | -| Memory isolation | By `session_id` prefix — strict, no cross-talk | -| Latency | Best (one warm client, no per-user TLS) | +| Tex keys you operate | **1** | +| Bills | **1** (yours) | +| Dashboard for end-users | You build it | +| Isolation | Strong, as long as you do not collide `session_id` | - The Tex constructor also accepts `user_id=...`, which sends the value as `scope.user_id` in the request body. That works today, but **the SDK scopes `user_id` per client, not per call** — using it for per-end-user partitioning would force one `Tex` instance per end-user, which is slow. Per-call `user_id` scoping is on the SDK roadmap (1.2). Until then, encode the end-user into `session_id`. 
+ The SDK accepts `user_id` in the constructor, but scopes are per client instance today. Creating one `Tex` client per end user would waste connection pools. Until per-call `user_id` ships in SDK **1.2**, keep the tenant in `session_id`. -## Pattern B — one Tex key per end-user - -You're *reselling* Tex — your customers want to log into our dashboard with their own account, see their own usage, get their own bill. - -```python signup.py -import httpx - -def onboard_end_user(end_user_email: str) -> str: - """Mint a fresh Tex org for this end-user. Persist the key in your DB.""" - resp = httpx.post( - "https://api.getmetacognition.com/signup", - json={"name": end_user_email}, - ) - resp.raise_for_status() - data = resp.json() - db.users.update(end_user_email, tex_api_key=data["api_key"], tex_org_id=data["org_id"]) - return data["api_key"] -``` - -Then per-request, look up that user's key: - -```python -def tex_for_user(end_user_id: str) -> Tex: - row = db.users.get(end_user_id) - return Tex(api_key=row.tex_api_key, base_url=os.environ["TEX_BASE_URL"]) -``` - -Cache the `Tex` per user-id in a TTL'd dict (≤ 1h) so you reuse connections without keeping every user's client live forever. +## B - one key per customer org + + + + ```python signup.py + import httpx + + def onboard_end_user(end_user_email: str) -> str: + resp = httpx.post( + "https://api.getmetacognition.com/signup", + json={"name": end_user_email}, + ) + resp.raise_for_status() + data = resp.json() + db.users.update( + end_user_email, + tex_api_key=data["api_key"], + tex_org_id=data["org_id"], + ) + return data["api_key"] + ``` + + + + ```python + def tex_for_user(end_user_id: str) -> Tex: + row = db.users.get(end_user_id) + return Tex(api_key=row.tex_api_key, base_url=os.environ["TEX_BASE_URL"]) + ``` + + Cache instances in a TTL map for about 1 hour. That keeps warm connections without keeping every customer client forever. 
+ + | Trait | Pattern B | | --- | --- | -| Number of Tex keys to manage | One per end-user — store in your DB. | -| Bills | One per end-user (we send each org an invoice). | -| Dashboard access | Each end-user has their own dashboard login. | -| Memory isolation | At the org level — completely separate. | +| Tex keys | **One per paying customer** | +| Bills | **Per customer org** | +| Dashboard | Each customer can log into Tex directly | +| Isolation | Hard boundary at org level | -## Picking +## Which pattern to choose - You're building an app on top of Tex. Most cases. Cleanest ops. + You ship an app on top of Tex. Infrastructure is shared, ops are simpler, and metering stays in your product. - - You're reselling Tex as part of your platform. Customers want their own bill. + + You resell Tex and customers expect their own bill and console. -## Quota strategy under Pattern A +## Shared quota (A only) + +Daily quotas are **per Tex org**. Under Pattern A, every user shares **your** quota. One noisy tenant can affect everyone. -Daily quotas are per-org. Under Pattern A, all end-users share your one quota — so a runaway end-user can starve everyone else. +Add these controls in your own app: -Mitigations: -- Track per-user token usage yourself (the `usage` field on every response makes this free). -- Soft-cap each user at, say, 10% of your daily limit; switch them to no-memory generation when they exceed it. -- Use [`tex.usage.today()`](/sdk/usage) to gate non-essential paths when total usage > 90%. +- You track per-user bytes/tokens yourself (`usage` is on every response). +- You soft-cap heavy users (for example switch off memory after they consume 10% of your daily budget). +- You poll [`tex.usage.today()`](/sdk/usage) and degrade gracefully after ~90%. 
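The per-user soft cap described above can be sketched in plain Python. `SoftCap` is a hypothetical helper, not an SDK class. It assumes your app already knows the org's daily token limit and reads a token count from each response's `usage` field; the exact layout of that field is not shown here, so the caller passes the number in.

```python
from collections import defaultdict
from typing import DefaultDict


class SoftCap:
    """Hypothetical in-process tracker for per-user spend under one shared quota."""

    def __init__(self, daily_limit: int, per_user_share: float = 0.10) -> None:
        # Each user may consume at most this many tokens before memory is switched off.
        self.per_user_budget = int(daily_limit * per_user_share)
        self._used: DefaultDict[str, int] = defaultdict(int)

    def record(self, user_id: str, tokens: int) -> None:
        """Add the tokens billed for one call to the user's running total."""
        self._used[user_id] += tokens

    def memory_allowed(self, user_id: str) -> bool:
        """True while the user is still under their share of the daily budget."""
        return self._used[user_id] < self.per_user_budget

    def reset(self) -> None:
        """Clear all counters, for example when the daily quota window resets."""
        self._used.clear()
```

In the chat handler you would check `memory_allowed(user_id)` before calling `recall`, and skip memory for users who are over budget. The counters live in one process, so a multi-worker deployment would need a shared store instead.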
diff --git a/recipes/slack-bot.mdx b/recipes/slack-bot.mdx index 23f5ce4..edb5e20 100644 --- a/recipes/slack-bot.mdx +++ b/recipes/slack-bot.mdx @@ -1,80 +1,97 @@ --- -title: "Slack channel memory" -description: "Channel-scoped memory for a Slack workspace." +title: "Slack bot with channel memory" +description: "Build a Slack Bolt app that remembers channel messages and answers mentions with Tex recall." +icon: "hashtag" --- -A Slack bot that remembers everything said in each channel and answers contextual questions when summoned. - -## Install - -```bash -pip install tex-sdk slack-bolt python-dotenv -``` - -## Environment - -```bash -SLACK_BOT_TOKEN=xoxb-... -SLACK_APP_TOKEN=xapp-... # if using socket mode -TEX_API_KEY=tex_live_... -TEX_BASE_URL=https://api.getmetacognition.com -``` - -## Code - -```python bot.py -import os -from datetime import datetime, timezone -from slack_bolt import App -from slack_bolt.adapter.socket_mode import SocketModeHandler -from tex import Tex - -tex = Tex(api_key=os.environ["TEX_API_KEY"], base_url=os.environ["TEX_BASE_URL"]) -app = App(token=os.environ["SLACK_BOT_TOKEN"]) - -def now_iso() -> str: - return datetime.now(timezone.utc).isoformat().replace("+00:00", "Z") - -def session_for(channel: str) -> str: - return f"slack-{channel}" - -@app.event("message") -def remember_message(event, say): - """Remember every non-bot message in the channel.""" - if event.get("subtype") or event.get("bot_id"): - return - tex.conversations.remember( - session_id=session_for(event["channel"]), - turns=[{ - "role": "user", - "text": f"<@{event['user']}>: {event['text']}", - "timestamp": now_iso(), - }], - ) - -@app.event("app_mention") -def answer(event, say): - """When mentioned, answer with channel-scoped memory.""" - query = event["text"].split(">", 1)[-1].strip() - if not query: - say("Ask me something — I'll dig through this channel's memory.") - return - - hits = tex.recall(q=query, session_id=session_for(event["channel"]), top_k=5) - - if not 
hits.hits.turns or hits.confidence < 0.2: - say(f"<@{event['user']}> I don't have anything relevant in memory yet.") - return - - body = "\n".join(f"• {h.text}" for h in hits.hits.turns[:3]) - say(f"<@{event['user']}> here's what I remember:\n{body}") - -if __name__ == "__main__": - SocketModeHandler(app, os.environ["SLACK_APP_TOKEN"]).start() -``` - -## Patterns - -- **Channel vs DM memory.** `slack-{channel}` shares memory across everyone in the channel. For private notes use `slack-{channel}-{user}`. -- **Skip noisy messages.** Filter joins, leaves, uploads, and reactions before `remember`. Saves tokens and reduces recall noise. -- **Acknowledge slow recalls.** `recall` takes 1–3s. React with `:thinking_face:` immediately, then post the answer. +Give each Slack channel its own Tex **`session_id`**. Save normal messages with **`remember`**. When someone mentions the bot, use **`recall`** to answer from that channel's memory. + +This uses the same isolation pattern as [Scopes and multi-tenancy](/concepts/scopes). + + + + ```bash + pip install tex-sdk slack-bolt python-dotenv + ``` + + + + Load the bot and app tokens. This example uses socket mode: + + ```bash + SLACK_BOT_TOKEN=xoxb-... + SLACK_APP_TOKEN=xapp-... + TEX_API_KEY=tex_live_... 
+ TEX_BASE_URL=https://api.getmetacognition.com + ``` + + + + Ignore bot messages, remember human text, and answer mentions with recall: + + ```python bot.py + import os + from datetime import datetime, timezone + from slack_bolt import App + from slack_bolt.adapter.socket_mode import SocketModeHandler + from tex import Tex + + tex = Tex(api_key=os.environ["TEX_API_KEY"], base_url=os.environ["TEX_BASE_URL"]) + app = App(token=os.environ["SLACK_BOT_TOKEN"]) + + def now_iso() -> str: + return datetime.now(timezone.utc).isoformat().replace("+00:00", "Z") + + def session_for(channel: str) -> str: + return f"slack-{channel}" + + @app.event("message") + def remember_message(event, say): + if event.get("subtype") or event.get("bot_id"): + return + tex.conversations.remember( + session_id=session_for(event["channel"]), + turns=[{ + "role": "user", + "text": f"<@{event['user']}>: {event['text']}", + "timestamp": now_iso(), + }], + ) + + @app.event("app_mention") + def answer(event, say): + query = event["text"].split(">", 1)[-1].strip() + if not query: + say("Ask me something and I'll search this channel's memory.") + return + + hits = tex.recall(q=query, session_id=session_for(event["channel"]), top_k=5) + + if not hits.hits.turns or hits.confidence < 0.2: + say(f"<@{event['user']}> I don't have anything relevant in memory yet.") + return + + body = "\n".join(f"• {h.text}" for h in hits.hits.turns[:3]) + say(f"<@{event['user']}> here's what I remember:\n{body}") + + if __name__ == "__main__": + SocketModeHandler(app, os.environ["SLACK_APP_TOKEN"]).start() + ``` + + + + ```bash + python bot.py + ``` + + Type in a channel, then ask `@YourBot what did we decide?` to verify recall. + + + +## Slack tweaks + +| Topic | What you do | +| --- | --- | +| **Channel vs DM** | You keep `slack-{channel}` for shared rooms; you append `-{user}` if you need private scratch space. | +| **Noise** | You filter joins, uploads, reactions **before** `remember` so you do not pay tokens for junk. 
| +| **Latency** | React with `:thinking_face:` right away. `recall` can take 1-3s. Post the final answer after recall finishes. | diff --git a/recipes/streamlit.mdx b/recipes/streamlit.mdx index 4a60426..e5337ff 100644 --- a/recipes/streamlit.mdx +++ b/recipes/streamlit.mdx @@ -1,20 +1,142 @@ --- -title: "Chat UI (Streamlit)" -description: "A chat UI with persistent memory in 50 lines." +title: "Streamlit chat UI" +description: "Build one Streamlit page that recalls from Tex, streams GPT output, and shows the memories used." +icon: "desktop" --- -A single-page Streamlit chat that uses Tex for memory across page reloads, browser refreshes, and even cleared sessions. +This recipe builds a small browser demo. Streamlit can rerun the script often, but Tex keeps the long-term memory. `st.session_state` only stores UI state for the current browser session. -## Install + + + ```bash + pip install tex-sdk streamlit openai + ``` + -```bash -pip install tex-sdk streamlit openai -``` + + Wrap `Tex` and `OpenAI` in `@st.cache_resource` so Streamlit does not recreate clients on every interaction: + + ```python + import os + import streamlit as st + from openai import OpenAI + from tex import Tex + + st.set_page_config(page_title="Tex chat", page_icon="🧠") + + @st.cache_resource + def get_clients(): + tex = Tex( + api_key=os.environ["TEX_API_KEY"], + base_url=os.environ.get("TEX_BASE_URL", "https://api.getmetacognition.com"), + ) + gpt = OpenAI() + return tex, gpt + + tex, gpt = get_clients() + ``` + + + + Store a stable `sid` in session state. 
You can also read `?uid=` from the query string to test different users: + + ```python + if "sid" not in st.session_state: + st.session_state.sid = f"web-{st.query_params.get('uid', 'anon')}-default" + sid = st.session_state.sid + + if "messages" not in st.session_state: + st.session_state.messages = [] + ``` + + + + Call `tex.usage.today()` to show current quota usage during demos: + + ```python + with st.sidebar: + st.write("## Usage today") + today = tex.usage.today() + pct_in = min(1.0, today.tokens_in_used / max(1, today.tokens_in_limit)) + pct_out = min(1.0, today.tokens_out_used / max(1, today.tokens_out_limit)) + st.progress(pct_in, f"in: {today.tokens_in_used:,} / {today.tokens_in_limit:,}") + st.progress(pct_out, f"out: {today.tokens_out_used:,} / {today.tokens_out_limit:,}") + st.caption(f"session: `{sid}`") + ``` + + + + Render messages from `st.session_state.messages`, then handle `st.chat_input`: + + ```python + for m in st.session_state.messages: + with st.chat_message(m["role"]): + st.write(m["text"]) + + if prompt := st.chat_input("Talk to me…"): + import datetime + now = datetime.datetime.now(datetime.timezone.utc).isoformat().replace("+00:00", "Z") + + with st.chat_message("user"): + st.write(prompt) + + with st.chat_message("assistant"): + with st.spinner("Recalling…"): + hits = tex.recall(q=prompt, session_id=sid, top_k=5) + + if hits.hits.turns: + st.caption(f"confidence {hits.confidence:.2f}") + with st.expander("Memory used"): + for h in hits.hits.turns: + st.write(f"`{h.score:.2f}` — {h.text}") + + memory_block = "\n".join(f"- {h.text}" for h in hits.hits.turns) + sys_prompt = f"You are a thoughtful assistant. 
Memory:\n{memory_block}" + + chat = gpt.chat.completions.create( + model="gpt-4o-mini", + messages=[ + {"role": "system", "content": sys_prompt}, + {"role": "user", "content": prompt}, + ], + stream=True, + ) + answer = st.write_stream( + chunk.choices[0].delta.content or "" for chunk in chat + ) + + tex.conversations.remember( + session_id=sid, + turns=[ + {"role": "user", "text": prompt, "timestamp": now}, + {"role": "assistant", "text": answer, "timestamp": now}, + ], + ) + st.session_state.messages += [ + {"role": "user", "text": prompt}, + {"role": "assistant", "text": answer}, + ] + ``` + -## Code + + ```bash + export TEX_API_KEY=tex_live_... + export OPENAI_API_KEY=sk-... + streamlit run app.py + ``` + + Open `http://localhost:8501/?uid=alice` and `?uid=bob` in two tabs to test isolation. + + + +## One file + +If you want one block to paste, use this full `app.py`: ```python app.py -import os, datetime +import os +import datetime import streamlit as st from openai import OpenAI from tex import Tex @@ -27,21 +149,18 @@ def get_clients(): api_key=os.environ["TEX_API_KEY"], base_url=os.environ.get("TEX_BASE_URL", "https://api.getmetacognition.com"), ) - gpt = OpenAI() # or AzureOpenAI(...) 
+ gpt = OpenAI() return tex, gpt tex, gpt = get_clients() -# Stable session for the browser tab — survives reloads if "sid" not in st.session_state: st.session_state.sid = f"web-{st.query_params.get('uid', 'anon')}-default" sid = st.session_state.sid -# Chat history we display this page render only — Tex is the source of truth if "messages" not in st.session_state: st.session_state.messages = [] -# Sidebar: usage + memory stats with st.sidebar: st.write("## Usage today") today = tex.usage.today() @@ -51,25 +170,22 @@ with st.sidebar: st.progress(pct_out, f"out: {today.tokens_out_used:,} / {today.tokens_out_limit:,}") st.caption(f"session: `{sid}`") -# Render chat for m in st.session_state.messages: with st.chat_message(m["role"]): st.write(m["text"]) -# Input if prompt := st.chat_input("Talk to me…"): - now = datetime.datetime.utcnow().isoformat() + "Z" + now = datetime.datetime.now(datetime.timezone.utc).isoformat().replace("+00:00", "Z") with st.chat_message("user"): st.write(prompt) with st.chat_message("assistant"): - # Recall with st.spinner("Recalling…"): hits = tex.recall(q=prompt, session_id=sid, top_k=5) if hits.hits.turns: - st.caption(f"📚 confidence {hits.confidence:.2f}") + st.caption(f"confidence {hits.confidence:.2f}") with st.expander("Memory used"): for h in hits.hits.turns: st.write(f"`{h.score:.2f}` — {h.text}") @@ -77,7 +193,6 @@ if prompt := st.chat_input("Talk to me…"): memory_block = "\n".join(f"- {h.text}" for h in hits.hits.turns) sys_prompt = f"You are a thoughtful assistant. 
Memory:\n{memory_block}" - # Generate chat = gpt.chat.completions.create( model="gpt-4o-mini", messages=[ @@ -88,29 +203,23 @@ if prompt := st.chat_input("Talk to me…"): ) answer = st.write_stream(chunk.choices[0].delta.content or "" for chunk in chat) - # Persist + update local state - tex.conversations.remember(session_id=sid, turns=[ - {"role":"user","text": prompt, "timestamp": now}, - {"role":"assistant","text": answer, "timestamp": now}, - ]) + tex.conversations.remember( + session_id=sid, + turns=[ + {"role": "user", "text": prompt, "timestamp": now}, + {"role": "assistant", "text": answer, "timestamp": now}, + ], + ) st.session_state.messages += [ - {"role":"user","text": prompt}, - {"role":"assistant","text": answer}, + {"role": "user", "text": prompt}, + {"role": "assistant", "text": answer}, ] ``` -## Run - -```bash -export TEX_API_KEY=tex_live_... -export OPENAI_API_KEY=sk-... -streamlit run app.py -``` - -Open with `?uid=alice` in the URL to scope memory to "alice" — `?uid=bob` gets a separate memory pool. +## Why Streamlit + Tex -## Why this works +This pattern pairs a disposable browser UI with durable memory in Tex. -- **Refresh-proof** — Streamlit's session state dies on refresh; Tex doesn't. -- **No glue code** — no Postgres / Redis layer for chat history. -- **Built-in debug panel** — the expander shows exactly which memory items the model saw. +- **Refreshes keep memory:** Streamlit session state can reset; Tex memory stays. +- **No Redis or Postgres for demos:** Tex stores the long-term memory. +- **Visible recall:** the expander shows exactly what the model saw. diff --git a/sdk/client.mdx b/sdk/client.mdx index 82fbcad..e1c43ea 100644 --- a/sdk/client.mdx +++ b/sdk/client.mdx @@ -1,10 +1,13 @@ --- title: "Configure the client" -description: "Constructor reference, environment variables, lifecycle." +description: "Set API keys, base URL, timeouts, retries, and client lifetime." 
+icon: "sliders" --- ## `Tex(...)` constructor +Create one **`Tex`** client and reuse it. The same client handles **`remember`**, **`recall`**, token refresh, retries, and usage calls. + ```python Tex( api_key: str | None = None, @@ -32,7 +35,7 @@ Tex( - Default `org_id` for all requests. Optional — the SDK auto-fills from your JWT. + Default `org_id` for all requests. Optional. The SDK auto-fills it from your JWT. @@ -40,19 +43,19 @@ Tex( - A default `session_id`. You usually pass this per-call instead. + Default `session_id`. Most apps pass this per call. Bring-your-own JWT. If set, the SDK skips the `api_key` exchange. - With BYO-JWT, the SDK does **not** auto-fill `org_id` / `user_id` from `/auth/verify`. Pass them explicitly to the constructor — `remember` and `recall` need them in the request `scope`. + With BYO-JWT, the SDK does **not** auto-fill `org_id` / `user_id` from `/auth/verify`. Pass them to the constructor. `remember` and `recall` need them in the request `scope`. - Companion to `access_token` — used for transparent refresh on 401. + Companion to `access_token`. Used for refresh on 401. @@ -60,7 +63,7 @@ Tex( - Retries on transient errors (408 / 429 / 5xx / network). Exponential backoff. + Retries transient errors: 408, 429, 5xx, and network failures. @@ -83,7 +86,7 @@ TEX_BASE_URL=https://api.getmetacognition.com ## Lifecycle -The client maintains a pooled `httpx.Client` under the hood. **Construct once, reuse everywhere.** +The client keeps a pooled `httpx.Client` under the hood. **Construct once, reuse everywhere.** @@ -115,7 +118,7 @@ The client maintains a pooled `httpx.Client` under the hood. **Construct once, r tex = Tex(api_key=...) # opens a new TLS session every request return tex.recall(...) ``` - This costs you the TCP+TLS handshake on every call. + This opens a new TCP and TLS connection on every call. @@ -123,7 +126,7 @@ The client maintains a pooled `httpx.Client` under the hood. 
**Construct once, r The client is **thread-safe for read traffic** (`recall`, `usage.today`). -For write traffic under high RPS, push to a worker pool: +For high write volume, push **`remember`** calls to a worker pool: ```python from concurrent.futures import ThreadPoolExecutor @@ -133,7 +136,7 @@ def remember_async(turns, sid): pool.submit(tex.conversations.remember, turns=turns, session_id=sid) ``` -A native async client is on the roadmap. Open an issue if you need it sooner. +A native async client is on the roadmap. ## Closing @@ -141,7 +144,7 @@ A native async client is on the roadmap. Open an issue if you need it sooner. tex.close() # closes the underlying httpx.Client ``` -Or use the context manager pattern (above) — `__exit__` calls `close()`. +Or use the context manager pattern above. `__exit__` calls `close()`. Push conversation turns into memory. diff --git a/sdk/conversations-remember.mdx b/sdk/conversations-remember.mdx index e62e684..ab682b8 100644 --- a/sdk/conversations-remember.mdx +++ b/sdk/conversations-remember.mdx @@ -1,8 +1,11 @@ --- -title: "Remember" -description: "Persist conversation turns to memory." +title: "Remember conversation turns" +description: "Store turns in a session, with optional observations and metadata." +icon: "pen-to-square" --- +Use **`tex.conversations.remember`** to store new conversation turns. This is the write side of the loop from the [Quickstart](/quickstart). + ```python RememberResponse = tex.conversations.remember( turns: list[dict], @@ -15,7 +18,7 @@ RememberResponse = tex.conversations.remember( ## Parameters - A list of turn dicts. **At least one turn is required** (server rejects empty lists with `422`). Each turn: + List of turn dicts. **At least one turn is required**. Empty lists return `422`. Each turn: ```python { @@ -32,17 +35,17 @@ RememberResponse = tex.conversations.remember( - Free-form metadata attached to the batch. Searchable in deep mode. + Free-form metadata attached to the batch. 
Deep mode can use it during search. ## Returns - Stable identifier for this write. Use it to correlate with logs. Always present in production. + Stable identifier for this write. Use it to match SDK logs with server logs. - IDs of the active-memory fragments. These are recallable immediately. + IDs of the active-memory fragments. These are recallable right away. @@ -96,7 +99,7 @@ tex.conversations.remember(session_id=f"chat-{conv_id}", turns=turns) ### Pre-extracted observations -If you've already extracted structured facts on your side (e.g. with your own LLM call), pass them inline to skip Tex's extraction step: +If your app already extracted structured facts, pass them inline: ```python turns = [{ @@ -114,26 +117,26 @@ tex.conversations.remember(session_id="chat-1", turns=turns) ## Behavior - - The call returns after active memory persists (~150ms). The turn is recallable immediately. + + The call returns after active memory is saved, usually around 150ms. The turn is recallable right away. - - Observations and entities are extracted and indexed in the background. They become available in subsequent recalls within seconds to a minute. + + Tex extracts observations and entities in the background. They appear in later recalls. ## Best practices - **Batch.** Pass dozens of turns in one call. Don't loop one-per-turn. -- **UTC ISO 8601** for timestamps (`...Z` suffix). Avoids timezone surprises in temporal queries. +- **Use UTC ISO 8601** for timestamps (`...Z` suffix). This keeps temporal queries clear. - **Skip system messages.** They consume tokens and add noise to recall. -- **Background-task `remember`.** Users shouldn't wait for it on the request path — push to Celery / RQ / a `BackgroundTasks` queue. +- **Run `remember` off the request path.** Use Celery, RQ, or a `BackgroundTasks` queue when users should not wait. ## Idempotency -Tex computes a stable hash per turn (text + timestamp + role). 
Re-sending the same turn is a no-op — there's no double-counting in active memory or in your token bill. +Tex computes a stable hash per turn from text, timestamp, and role. Re-sending the same turn is a no-op. It does not create duplicate active memory or double bill the same turn. -This means you can safely retry a `remember` call after a network blip without worrying about duplicates. +This makes retries safe after a network blip. Pull the relevant slice of memory. diff --git a/sdk/errors.mdx b/sdk/errors.mdx index cabeda6..44689f8 100644 --- a/sdk/errors.mdx +++ b/sdk/errors.mdx @@ -1,9 +1,12 @@ --- -title: "Handle errors" -description: "Every exception class and the SDK's automatic retry behavior." +title: "Errors and retries" +description: "Exception types, which status codes retry automatically, and what to log before you open a ticket." +icon: "triangle-exclamation" --- -All Tex exceptions inherit from `tex.TexError`. You almost always want to catch one of: +When something fails, start with the exception class. If you need a symptom-based guide, use [Troubleshooting](/troubleshooting). + +All Tex exceptions inherit from `tex.TexError`. Most apps catch one of these: | Class | Status | Inherits | When | | --- | --- | --- | --- | @@ -17,25 +20,25 @@ All Tex exceptions inherit from `tex.TexError`. You almost always want to catch | [`InternalServerError`](#internalservererror) | 5xx | `APIStatusError` | Our problem; SDK retried | | [`APITimeoutError`](#apitimeouterror) | — | `APIConnectionError` → `APIError` | Network or server too slow | | [`APIConnectionError`](#apiconnectionerror) | — | `APIError` | DNS, TLS, connection reset | -| [`APIResponseValidationError`] | — | `APIError` | Server returned an unexpected shape | +| [`APIResponseValidationError`] | - | `APIError` | Server returned an unexpected response | `TexHTTPError` (alias of `APIStatusError`) and `TexAuthError` (alias of `AuthenticationError`) are kept for backward compatibility. 
-## Common shape +## Common fields ```python # All TexError subclasses e.message # human-readable -# APIStatusError subclasses (everything with an HTTP status — see "Inherits" column above) +# APIStatusError subclasses (everything with an HTTP status) e.status_code # int -e.request_id # X-Correlation-ID — quote in support tickets -e.details # dict — server's full JSON, may include field-level errors +e.request_id # X-Correlation-ID; include this in support tickets +e.details # dict; server JSON, may include field errors e.response_text # raw response body, capped at 2KB ``` - `APITimeoutError` and `APIConnectionError` are **network errors**, not HTTP errors — they don't have `status_code`, `request_id`, `details`, or `response_text` because the request never produced a response. Catch them separately. + `APITimeoutError` and `APIConnectionError` are **network errors**, not HTTP errors. They do not have `status_code`, `request_id`, `details`, or `response_text` because the request never produced a response. Catch them separately. ```python @@ -59,47 +62,47 @@ except BadRequestError as e: ### `BadRequestError` -Returned when a payload is malformed. Common causes: +Raised when a payload is malformed. Common causes: - Missing required field on a turn (e.g. no `text`) - Invalid `mode` value on `recall` -- Invalid `session_id` shape (must be a string) +- Invalid `session_id` value. It must be a string. -`e.details` includes a Pydantic-style `loc` list that tells you exactly which field broke. +`e.details` includes a Pydantic-style `loc` list that points to the bad field. ### `AuthenticationError` -Status 401. The SDK already attempted a JWT refresh (once) before raising. +Status 401. The SDK already tried one JWT refresh before raising. 
```python try: tex.recall(q=q, session_id=sid) except AuthenticationError as e: if "Invalid API key" in e.message: - # Real bad-key situation + # Bad API key rotate_key_alarm() else: - # JWT refresh failed — likely revoked + # JWT refresh failed, likely revoked notify_user("please log in again") ``` ### `PermissionDeniedError` -Status 403. The credential is valid but lacks scope. Mostly relevant for keys you've scoped down (admin keys, read-only keys). For default keys this shouldn't fire. +Status 403. The credential is valid but lacks scope. This mostly matters for scoped keys. Default keys should not hit this. ### `NotFoundError` -Status 404. You referenced something that doesn't exist — typically a stale `key_id` on `DELETE /me/api-keys/{id}`. +Status 404. You referenced something that does not exist. This is often a stale `key_id` on `DELETE /me/api-keys/{id}`. ### `UnprocessableEntityError` -Status 422. FastAPI's Pydantic validation rejected the payload. The SDK builds payloads for you, so this usually only fires when you've passed an unexpected type. +Status 422. FastAPI validation rejected the payload. The SDK builds payloads for you, so this usually means an argument has the wrong type. ### `RateLimitError` Status 429. The SDK retries on `429` like other transient codes (with exponential backoff and `Retry-After` honored), so by the time you see this exception the SDK has already exhausted retries. -For **daily-quota 429s** (the normal case), retries are futile until midnight UTC — set `max_retries=0` for paths where you'd rather fail fast and degrade: +For **daily-quota 429s**, retries will not help until midnight UTC. 
Set `max_retries=0` on paths where you would rather fail fast: ```python try: @@ -108,7 +111,7 @@ except RateLimitError as e: return generate_without_memory(q) # graceful degradation ``` -The `e.details` payload tells you which cap fired (`tokens_in_daily` or `tokens_out_daily`) and when it resets: +The `e.details` payload tells you which cap was exceeded (`tokens_in_daily` or `tokens_out_daily`) and when it resets: ```python e.details @@ -117,11 +120,11 @@ e.details ### `InternalServerError` -Status 5xx. The SDK already retried (default 2x with exponential backoff). If you still see this, file a ticket with `e.request_id`. +Status 5xx. The SDK already tried the request again with exponential backoff. If you still see this, file a ticket with `e.request_id`. ### `APITimeoutError` -The request didn't return within `timeout`. The SDK retries on timeout, but ultimately surfaces `APITimeoutError` if all attempts fail. +The request did not return within `timeout`. The SDK retries timeouts. If every attempt fails, it raises `APITimeoutError`. ```python try: @@ -132,7 +135,7 @@ except APITimeoutError: ### `APIConnectionError` -DNS, TLS, or socket-level failure. Same retry behavior as `APITimeoutError`. If you see this in production, check your egress proxy / firewall. +DNS, TLS, or socket-level failure. The retry behavior is the same as `APITimeoutError`. If you see this in production, check your egress proxy or firewall. ## Built-in retries @@ -148,15 +151,15 @@ Default: **2 retries** with exponential backoff (0.5s, 1s). Override: tex = Tex(api_key=..., max_retries=5) ``` -The `Retry-After` header is honored — if the server says wait 3 seconds, the SDK waits at least 3 (not capped at the backoff value). +The SDK honors `Retry-After`. If the server says wait 3 seconds, the SDK waits at least 3 seconds. - `429` from the daily-quota path retries the same way other 429s do — but you'll still be over quota when the retry hits, so the `RateLimitError` ultimately surfaces. 
If that round-trip waste matters, set `max_retries=0` on quota-sensitive paths. + Quota `429`s retry like other `429`s. The retry will still fail if you are over the cap. Set `max_retries=0` on quota-sensitive paths if you want to fail faster. ## Idempotency -`remember` is idempotent (turn-hash dedup). `recall` and `usage.*` are read-only. Safe to retry any of them. +`remember` is idempotent because Tex deduplicates turns by hash. `recall` and `usage.*` are read-only. It is safe to retry any of them. Direct HTTP integration without the SDK. diff --git a/sdk/installation.mdx b/sdk/installation.mdx index 569bc01..9661b48 100644 --- a/sdk/installation.mdx +++ b/sdk/installation.mdx @@ -1,55 +1,68 @@ --- -title: "Install" -description: "Install tex-sdk and verify the install." +title: "Install the SDK" +description: "Install tex-sdk, verify the import, and pin a safe version range." +icon: "download" --- ## Requirements -- Python ≥ 3.9 +- Python 3.9 or newer - `pip`, `uv`, or `poetry` +If you have not made a live **`remember`** and **`recall`** call yet, start with the [Quickstart](/quickstart). + ## Install -```bash + +```bash pip pip install tex-sdk ``` -Or `uv add tex-sdk` / `poetry add tex-sdk`. +```bash uv +uv add tex-sdk +``` + +```bash poetry +poetry add tex-sdk +``` + - PyPI distribution is `tex-sdk`. Import name is `tex`: + PyPI lists the package as **`tex-sdk`**. You **`import tex`** in Python: ```python from tex import Tex ``` -## Verify +## Check version ```bash python -c "import tex; print(tex.__version__)" # 1.1.0 ``` -## Pin a version +## Pin versions ```text requirements.txt tex-sdk>=1.1.0,<2 ``` -For PEP 621 `pyproject.toml`, add `"tex-sdk>=1.1.0,<2"` under `[project].dependencies`. For Poetry, `tex-sdk = "^1.1.0"` under `[tool.poetry.dependencies]`. +For PEP 621 `pyproject.toml`, add `"tex-sdk>=1.1.0,<2"` under `[project].dependencies`. + +For Poetry, use `tex-sdk = "^1.1.0"` under `[tool.poetry.dependencies]`. 
-## Optional extras +## HTTP/2 and proxies -The SDK pulls in `httpx[http2]` automatically, which gives you HTTP/2 multiplexing. If your egress proxy strips HTTP/2, disable it: +The SDK uses `httpx[http2]`. If your proxy strips HTTP/2, disable it when you construct the client: ```python tex = Tex(api_key=..., http2=False) ``` -## Type hints +## Types -The SDK ships type stubs (`py.typed`). Mypy and Pyright pick them up automatically — no `types-` package needed. +Type stubs ship with the package (`py.typed`). Mypy and Pyright pick them up automatically. You do not need a separate `types-` package. ```python from tex import Tex, RecallResponse @@ -60,5 +73,5 @@ reveal_type(hits.confidence) # float ``` - Constructor options, environment variables, lifecycle. + Environment variables, timeouts, and lifecycle hooks. diff --git a/sdk/recall.mdx b/sdk/recall.mdx index 499b47c..9d83f52 100644 --- a/sdk/recall.mdx +++ b/sdk/recall.mdx @@ -1,8 +1,13 @@ --- -title: "Recall" -description: "Pull the most relevant slice of memory for a query." +title: "Recall relevant context" +description: "Query memory with natural language and get ranked hits, confidence, and usage." +icon: "magnifying-glass" --- +Use **`tex.recall`** before you call your model. It returns the memory that best matches the current user message. + +For **`mode`**, **`top_k`**, and **`confidence`**, read [Recall and ranking](/concepts/retrieval). This page lists the Python fields. + ```python RecallResponse = tex.recall( q: str, @@ -15,29 +20,29 @@ RecallResponse = tex.recall( ``` - `recall` is callable on the client itself: `tex.recall(...)`. There's no `tex.recall.search(...)` indirection. + Call recall directly on the client: `tex.recall(...)`. There is no `tex.recall.search(...)`. ## Parameters - Natural-language query. Same shape as the user's most recent turn works well. + Natural-language query. The user's latest message usually works well. - Which session(s) to search. 
Use the same `session_id` you wrote with. + Session to search. Use the same `session_id` you wrote with. - Retrieval depth. See [Retrieval](/concepts/retrieval). + Retrieval depth. See [Recall and ranking](/concepts/retrieval). - Number of hits across all kinds. Defaults to **15** in `active` mode, **25** in `deep` mode. Server caps at **30** regardless of the value you send. + Number of hits across all kinds. Defaults to **15** in `active` mode and **25** in `deep` mode. The server caps the final value at **30**. - Adds a chronological list of relevant events to the response. + When true, the response includes a pre-rendered **`timeline`** string (not a structured list). ## Returns @@ -47,15 +52,15 @@ RecallResponse = tex.recall( - Atomic facts extracted from past turns (e.g. preferences, decisions). + Small facts extracted from past turns, such as preferences or decisions. - People / places / things linked across observations. + People, places, and things linked across observations. - Calibrated confidence in [0, 1]. P(hits relevant | confidence = c) ≈ c. + Calibrated confidence in [0, 1]. Higher means the returned memory is more likely to help. @@ -70,29 +75,29 @@ RecallResponse = tex.recall( `tokens_in` / `tokens_out` billed for this call. Always present in production. -### `RecallHit` shape (turns / observations) +### `RecallHit` fields (turns / observations) ```python @dataclass(frozen=True) class RecallHit: id: str | None # stable; persists across recalls text: str # matched content - score: float # raw relevance, 0.0–1.0 - kind: str # "turn" | "observation" | "entity" — defaults to "turn" + score: float # raw relevance, 0.0-1.0 + kind: str # "turn" | "observation" | "entity"; defaults to "turn" timestamp: str | None ``` -### `RecallEntity` shape (entities only) +### `RecallEntity` fields (entities only) ```python @dataclass(frozen=True) class RecallEntity: id: str | None - label: str # the entity's surface label (e.g. 
"Acme") + label: str # the entity label (e.g. "Acme") score: float ``` -`RecallEntity` is **not** the same shape as `RecallHit` — it has `label` instead of `text` and no `kind` / `timestamp`. +`RecallEntity` is **not** the same as `RecallHit`. It has `label` instead of `text`, and it does not have `kind` or `timestamp`. ## Examples @@ -134,7 +139,7 @@ if hits.timeline: print(hits.timeline) # a pre-rendered chronological summary string ``` -`timeline` is a free-form string that the server pre-renders by walking the temporal index for matched turns. Treat it as a paragraph to drop into a prompt, not a structured array. +`timeline` is a free-form string. Drop it into a prompt as text. Do not treat it like an array. ### Multi-source recall @@ -154,7 +159,7 @@ context = "\n".join(f"- {h.text}" for h in (bio.hits.turns + chat.hits.turns)) | `active` | 1.7s | 2.5s | Every interactive call | | `deep` | 3.5s | 6s | Periodic analysis, low-confidence retries | -Set `timeout=2.0` on the constructor for interactive paths to bound your tail latency. Catch `APITimeoutError` and degrade gracefully: +Set `timeout=2.0` on the constructor for interactive paths. Catch `APITimeoutError` and continue without memory: ```python try: diff --git a/sdk/usage.mdx b/sdk/usage.mdx index 92fc3bb..be458be 100644 --- a/sdk/usage.mdx +++ b/sdk/usage.mdx @@ -1,15 +1,18 @@ --- -title: "Track usage" -description: "Read your org's token totals." +title: "Inspect usage" +description: "Read today's totals and monthly rollups from the SDK." +icon: "chart-pie" --- +Use these methods to read org-wide usage totals. They are useful for dashboards, quota checks, and billing reports. + ```python tex.usage.today() # today's usage + daily quota tex.usage.summary() # current calendar month tex.usage.summary(month="2026-03") # specific month ``` -These calls don't count against your own quota — query them as often as you want. +These helpers are read-only. **They do not count toward your quota**. 
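
For quota checks, one option is a small cached guard around `tex.usage.today()`. The sketch below is an illustration under stated assumptions: `QuotaGuard` and `UsageSnapshot` are hypothetical names, the `fetch_today` callable stands in for `tex.usage.today`, and the result is assumed to expose `tokens_in_used` and `tokens_in_limit`. The 60-second cache and 90% threshold are illustrative defaults.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class UsageSnapshot:
    # Stand-in for the object returned by tex.usage.today() (assumed fields).
    tokens_in_used: int
    tokens_in_limit: int


class QuotaGuard:
    """Hypothetical soft guard: caches usage and reports when it nears the cap."""

    def __init__(
        self,
        fetch_today: Callable[[], UsageSnapshot],
        threshold: float = 0.9,
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_today = fetch_today
        self._threshold = threshold
        self._ttl = ttl_seconds
        self._clock = clock
        self._cached: Optional[UsageSnapshot] = None
        self._fetched_at = 0.0

    def _snapshot(self) -> UsageSnapshot:
        now = self._clock()
        # Refetch only when nothing is cached or the cached value is too old.
        if self._cached is None or now - self._fetched_at >= self._ttl:
            self._cached = self._fetch_today()
            self._fetched_at = now
        return self._cached

    def near_limit(self) -> bool:
        snap = self._snapshot()
        if snap.tokens_in_limit <= 0:
            return False  # no usable limit reported: do not gate
        return snap.tokens_in_used > self._threshold * snap.tokens_in_limit
```

With a real client this might be wired up as `guard = QuotaGuard(tex.usage.today)`, with non-essential memory calls skipped while `guard.near_limit()` is true. Because the value is cached, treat it as a soft guard rather than a strict limiter.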
## `tex.usage.today()` @@ -93,23 +96,23 @@ df.set_index("month")[["tokens_in","tokens_out"]].plot.bar() ## Patterns -- **Per-response usage** is on every `remember` / `recall`: +- **Per-response usage** is on every `remember` and `recall`: ```python hits = tex.recall(q="...", session_id=sid) print(hits.usage.tokens_in, hits.usage.tokens_out) ``` -- **Quota-aware routing** — gate non-essential paths past 90%: +- **Quota-aware routing** - turn off non-essential memory paths past 90%: ```python if tex.usage.today().tokens_in_used > 0.9 * tex.usage.today().tokens_in_limit: return generate_without_memory(query) ``` - Cache `usage.today()` on your side (≤ 60s) — this is soft gating, not strict. + Cache `usage.today()` on your side for up to 60 seconds. Use it as a soft guard, not a strict limiter. -- **Usage charts** — call `summary(month="YYYY-MM")` for each of the last 6 months and render with whatever charting lib you use. +- **Usage charts** - call `summary(month="YYYY-MM")` for each of the last 6 months and render the results in your charting library. Every exception class and how to handle it. diff --git a/snippets/flow-visuals.mdx b/snippets/flow-visuals.mdx new file mode 100644 index 0000000..64a33a2 --- /dev/null +++ b/snippets/flow-visuals.mdx @@ -0,0 +1,116 @@ +export const PipelineFlow = ({ steps, caption }) => ( +
+
+ {caption && ( +
+ {caption} +
+ )} +
+ {steps.map((step, i) => ( +
+
+ {step.phase && ( +
+ {step.phase} +
+ )} +
{step.label}
+ {step.hint && ( +
{step.hint}
+ )} +
+ {i < steps.length - 1 && ( +
+ ↓ +
+ )} +
+ ))} +
+
+
+); + +export const AuthSequence = ({ phases }) => ( +
+
+
+ {phases.map((p, i) => ( +
+
+ {i + 1} +
+
+
{p.title}
+
+ {p.detail} +
+
+
+ ))} +
+
+
+); + +export const TokenRetryVisual = () => ( +
+
+
+
+
+ Start +
+
+ A normal request comes back 401 +
+
+ Usually the access token expired; refresh might still work. +
+
+
+
+
+ First try +
+
POST /auth/refresh
+
+ If that returns 200, you get a new access token and retry what you were doing. +
+
+
+ ↓ if refresh is also 401 +
+
+
+ Fallback +
+
+ POST /auth/token-exchange +
+
+ Send your API key again. 200 means retry with the new token. 401 means the key is dead and you need a new + one or a proper login flow. +
+
+
+

+ In most SDK setups a single failed request can walk through refresh (and then exchange) before your code + sees an error. The error surfaces only if the full chain returns 401.

+
+
+); diff --git a/troubleshooting.mdx b/troubleshooting.mdx index a22af6b..2e38bf4 100644 --- a/troubleshooting.mdx +++ b/troubleshooting.mdx @@ -1,42 +1,68 @@ --- title: "Troubleshooting" -description: "Common symptoms, root causes, and fixes." +description: "Match an error or symptom, apply the fix, and collect the right details for support." +icon: "wrench" --- -## Symptom → cause → fix + + Start with the symptom table. Find the error or behavior you see, then try the fix in the same row. + + +## Match your symptom | Symptom | Likely cause | Fix | | --- | --- | --- | -| `AuthenticationError: Invalid API key` on first call | Wrong key, key revoked, or wrong `base_url` | Re-mint at the [dashboard](https://app.getmetacognition.com); verify `TEX_BASE_URL` | -| `BadRequestError: 'scope' field required` | You're calling REST directly without the SDK | Use the SDK; it builds `scope` for you. Or include `scope: {org_id, session_id}` in the body. | -| `recall` returns 0 hits | New session, or memory hasn't finished passive enrichment yet | Wait 1–2s after `remember`; query a broader `q`; switch to `mode="deep"` | -| `recall.confidence` always low | Your `q` doesn't match anything stored | Re-phrase; switch to `mode="deep"`; raise `top_k` | -| `RateLimitError` mid-day | Hit the daily quota | Wait until 00:00 UTC, reduce `top_k`, pre-filter writes | -| Slow `recall` (> 5s) | Likely `mode="deep"` or cold cache | Use `mode="active"`; warm-start the client | -| `httpx.RemoteProtocolError` / HTTP/2 issues | Egress proxy strips h2 | `Tex(http2=False)` | -| Long `remember` blocks the user | You're awaiting it on the request path | Push to a background worker — see [FastAPI recipe](/recipes/fastapi#production-tweaks) | -| Tests are flaky against Tex | You're hitting the real cluster from CI | Use a dedicated CI org and a daily-cleanup script | -| `last_used_at` on the dashboard isn't updating | Caching / stale UI | Hard refresh; the field updates within seconds of a real call | +| 
`AuthenticationError: Invalid API key` on first call | Wrong key, revoked key, or wrong `base_url` | Mint a fresh key in the [dashboard](https://app.getmetacognition.com); confirm `TEX_BASE_URL` points to the right environment | +| `BadRequestError: 'scope' field required` | Raw REST call without the fields the SDK normally adds | Use the SDK, or include `scope: {org_id, session_id}` yourself in the JSON body | +| `recall` returns zero hits | Brand-new session, enrichment still catching up, or query mismatch | Wait a second after `remember`; broaden `q`; try `mode="deep"` once to see if signal appears | +| `recall.confidence` looks stuck low | Query does not overlap stored content | Rephrase closer to the stored wording; raise `top_k`; try `mode="deep"` | +| `RateLimitError` mid-day | Org exhausted daily quota | Wait for **00:00 UTC**, lower `top_k`, or trim noisy writes. See [Usage and billing](/concepts/usage-billing) | +| Slow `recall` (> 5s) | `mode="deep"` or cold caches | Default to `mode="active"` in user-facing paths; warm the client on deploy | +| `httpx.RemoteProtocolError` / HTTP/2 noise | Middlebox stripped HTTP/2 | Instantiate with `Tex(http2=False)` | +| Long `remember` blocks UX | You await ingestion on the hot request path | Push persistence to a worker. The [FastAPI recipe](/recipes/fastapi#production-tweaks) shows the pattern | +| Flaky CI against prod | Shared org contention or dirty sessions | Give CI its own org and sweep data on a schedule | +| Dashboard `last_used_at` looks stale | Browser cache | Hard refresh; the field updates within seconds of real traffic | -## Filing a ticket +## Extra + +### Auth still failing after you rotated keys + +SDK clients cache JWTs until expiry. After you deploy a new **`TEX_API_KEY`**, restart workers or recreate the client. That stops them from sending tokens created from the old key. Also confirm **`https://api.getmetacognition.com`** is spelled correctly in staging configs. 
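
When debugging auth failures over raw REST, it can help to picture the order in which tokens are retried. The sketch below follows the refresh-then-exchange sequence described in the token retry visual (`POST /auth/refresh`, then `POST /auth/token-exchange`). The callables, the `(status, token)` return shape, and the `KeyRejected` exception are hypothetical stand-ins for illustration, not the SDK's internals.

```python
from typing import Callable, Optional, Tuple

# Each callable returns (status_code, access_token_or_None). They are
# hypothetical stand-ins for POST /auth/refresh and POST /auth/token-exchange.
TokenCall = Callable[[], Tuple[int, Optional[str]]]


class KeyRejected(Exception):
    """Raised when both refresh and token exchange return 401."""


def recover_access_token(refresh: TokenCall, exchange: TokenCall) -> str:
    """Try refresh first, then fall back to exchanging the API key."""
    status, token = refresh()
    if status == 200 and token:
        return token  # refresh worked: retry the original request with this token
    if status != 401:
        raise RuntimeError(f"unexpected refresh status: {status}")

    # Refresh was also rejected, so send the API key again.
    status, token = exchange()
    if status == 200 and token:
        return token
    if status == 401:
        raise KeyRejected("API key rejected; create a new key or log in again")
    raise RuntimeError(f"unexpected token-exchange status: {status}")
```

A `KeyRejected` here corresponds to the case where the whole chain returns 401, which is when rotating the key and recreating the client is the appropriate fix.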
+ +### You see hits but they feel unrelated -Include: +Recall ranks by **relevance**, not chronological order. Set **`include_timeline=True`** when the model needs time order. If results still look wrong, check that you are using the same **`session_id`** for write and read. [Scopes and multi-tenancy](/concepts/scopes) explains the mapping. + +### Confidence swings between identical queries + +Small score changes can happen when candidates are close together. If the spread is bigger than **~0.1**, capture **`request_id`** and file a ticket. + +## Filing a ticket -1. The full `e.request_id` from the exception (a UUID). -2. The approximate timestamp (UTC). -3. The verb you called (`recall`, `remember`, …) and the `session_id`. -4. The SDK version (`tex.__version__`). -5. Anything sensitive — *redact before sharing.* + + + Copy `e.request_id` from any SDK exception. It is safe to share. + + + Give us the approximate UTC time the call failed. + + + Include the method (`recall`, `remember`, etc.), `session_id`, and whether you hit REST or the SDK. + + + Run `python -c "import tex; print(tex.__version__)"`. Scrub API keys or PII before you press send. + + -Email `support@getmetacognition.com` or open an issue on [GitHub](https://github.com/metacoglabs). +You can email `support@getmetacognition.com` or open an issue on [GitHub](https://github.com/metacoglabs). -## Diagnostics — quick checks +## Copy-paste diagnostics ```python # 1. Auth works? -tex.usage.today() # if this raises AuthenticationError, your key is bad +tex.usage.today() # AuthenticationError here means your key or host is wrong -# 2. Round-trip works? +# 2. Round-trip latency? import time t0 = time.perf_counter() tex.recall(q="ping", session_id="diag") @@ -46,8 +72,8 @@ print(f"recall RTT: {(time.perf_counter()-t0)*1000:.0f}ms") import tex; print(tex.__version__) ``` -## Common-but-not-bugs +## Things that look like bugs but are not -- **Turns out of order?** Recall ranks by relevance, not chronology. 
For chronological order, set `include_timeline=True`. -- **Confidence varies between identical queries?** Some randomness is intrinsic (rerank ties broken by hash). Variation > 0.1 suggests an actual issue — file a ticket. -- **`session_id` mismatch returns nothing.** By design — sessions are isolated. See [scopes](/concepts/scopes) for cross-session patterns. +- **Hits are not chronological:** relevance ordering is intentional. Set `include_timeline=True` when you need chronology. +- **Empty cross-session recall:** sessions are isolated until you design a scope strategy. See [Scopes and multi-tenancy](/concepts/scopes). +- **Identical queries, tiny score deltas:** expect minor movement; swings larger than ~0.1 merit a ticket.
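
To tell minor movement from a swing worth reporting, you could compare confidence values collected from repeated identical queries. The helpers below are a hypothetical sketch: the input is a plain list of floats you gather yourself (for example from `hits.confidence`), and the 0.1 threshold mirrors the rough figure mentioned in this guide rather than a documented constant.

```python
from typing import Sequence


def confidence_spread(values: Sequence[float]) -> float:
    """Return the gap between the highest and lowest confidence observed."""
    if not values:
        raise ValueError("need at least one confidence value")
    return max(values) - min(values)


def worth_a_ticket(values: Sequence[float], threshold: float = 0.1) -> bool:
    """True when the spread exceeds the threshold (assumed to be about 0.1)."""
    return confidence_spread(values) > threshold
```

If `worth_a_ticket` returns true, capture the `request_id` for the affected calls before contacting support.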