From 5d1de4db04b6e742ed9da36c15ca32ce45fea3e6 Mon Sep 17 00:00:00 2001
From: venkat1701
Date: Thu, 14 May 2026 05:10:48 +0530
Subject: [PATCH 1/2] docs: Mintlify nav, humanized copy, and fixed flow visuals

- Replace mint.json with consolidated docs.json to remove duplicate Documentation nav
- Add snippets/flow-visuals (vertical PipelineFlow, Tailwind styling, TokenRetryVisual)
- Humanize benchmarks, concepts, API overview, introduction, troubleshooting
- Fix authentication import path and add Mermaid + AuthSequence flow
- Update README navigation pointer to docs.json

Co-authored-by: Cursor
---
 README.md | 4 +-
 api-reference/account/api-keys.mdx | 4 +-
 api-reference/account/me.mdx | 4 +-
 api-reference/auth/refresh.mdx | 20 +--
 api-reference/auth/signup.mdx | 4 +-
 api-reference/auth/token-exchange.mdx | 4 +-
 api-reference/memory/ingest-memory.mdx | 4 +-
 api-reference/memory/recall.mdx | 2 +-
 api-reference/overview.mdx | 50 +++---
 api-reference/usage/summary.mdx | 4 +-
 api-reference/usage/today.mdx | 2 +-
 authentication.mdx | 98 +++++++----
 benchmarks.mdx | 54 +++---
 changelog.mdx | 3 +-
 concepts/memory-model.mdx | 111 +++++++++----
 concepts/retrieval.mdx | 63 ++++---
 concepts/scopes.mdx | 41 ++---
 concepts/usage-billing.mdx | 47 +++---
 docs.json | 180 ++++++++++++++++++++
 introduction.mdx | 94 +++++++----
 migration/from-langchain-memory.mdx | 21 +--
 migration/from-redis.mdx | 10 +-
 migration/from-supermemory.mdx | 4 +-
 mint.json | 160 ------------------
 quickstart.mdx | 71 ++++++--
 recipes/azure-openai-rag.mdx | 188 +++++++++++----------
 recipes/fastapi.mdx | 214 ++++++++++++++++++------
 recipes/langchain.mdx | 218 ++++++++++++++-----------
 recipes/multi-tenant-saas.mdx | 193 ++++++++++++----------
 recipes/slack-bot.mdx | 169 ++++++++++---------
 recipes/streamlit.mdx | 187 ++++++++++++++++-----
 sdk/client.mdx | 3 +-
 sdk/conversations-remember.mdx | 5 +-
 sdk/errors.mdx | 7 +-
 sdk/installation.mdx | 37 +++--
 sdk/recall.mdx | 5 +-
 sdk/usage.mdx | 5 +-
 snippets/flow-visuals.mdx | 116 +++++++++++++
 troubleshooting.mdx | 82 +++++++---
 39 files changed, 1567 insertions(+), 921 deletions(-)
 create mode 100644 docs.json
 delete mode 100644 mint.json
 create mode 100644 snippets/flow-visuals.mdx

diff --git a/README.md b/README.md
index 2c50404..73b8905 100644
--- a/README.md
+++ b/README.md
@@ -21,8 +21,8 @@ Open http://localhost:3000.

 ## Edit

 - Page content lives in `*.mdx` files.
-- Navigation is configured in [`mint.json`](./mint.json).
-- Add a new page by creating the `.mdx` file and listing it under the right group in `mint.json`.
+- Navigation is configured in [`docs.json`](./docs.json).
+- Add a new page by creating the `.mdx` file and listing it under the right group in `docs.json`.

 ## Deploy

diff --git a/api-reference/account/api-keys.mdx b/api-reference/account/api-keys.mdx
index 16a4154..2876262 100644
--- a/api-reference/account/api-keys.mdx
+++ b/api-reference/account/api-keys.mdx
@@ -1,6 +1,6 @@
 ---
-title: "Manage API keys"
-description: "List, mint, and revoke API keys."
+title: "List, create, and revoke API keys"
+description: "Manage long-lived keys for your org—mint, list metadata, and revoke without taking production down."
 ---

 ## `GET /me/api-keys`
diff --git a/api-reference/account/me.mdx b/api-reference/account/me.mdx
index 47d98d4..1fbb926 100644
--- a/api-reference/account/me.mdx
+++ b/api-reference/account/me.mdx
@@ -1,6 +1,6 @@
 ---
-title: "Current org"
-description: "Read your org, user, and active API keys."
+title: "Get current organization"
+description: "Authenticated org, user, and all API keys visible to that org."
 api: "GET https://api.getmetacognition.com/me"
 ---

diff --git a/api-reference/auth/refresh.mdx b/api-reference/auth/refresh.mdx
index 2449240..b172eb0 100644
--- a/api-reference/auth/refresh.mdx
+++ b/api-reference/auth/refresh.mdx
@@ -1,9 +1,12 @@
 ---
-title: "Refresh token"
-description: "Mint a new access token using a refresh token."
+title: "Refresh access token" +description: "Rotate access tokens using a valid refresh token; the SDK calls this after a 401." api: "POST https://api.getmetacognition.com/auth/refresh" +icon: "arrows-rotate" --- +import { TokenRetryVisual } from "../../snippets/flow-visuals.mdx"; + Use this when an `access_token` has expired but the `refresh_token` is still valid. The SDK does this automatically on 401. ## Body @@ -60,13 +63,8 @@ tokens = resp.json() If the refresh token is itself expired (>7d old) or revoked (the underlying API key was revoked), `/auth/refresh` returns `401`. Fall back to `/auth/token-exchange` with the original API key — or, if the API key is also gone, surface a re-login flow. -```mermaid -flowchart TD - A[401 on a normal call] --> B[Try POST /auth/refresh] - B -->|200| C[Retry original call] - B -->|401| D[Try POST /auth/token-exchange] - D -->|200| C - D -->|401| E[Surface AuthenticationError to user] -``` +### After 401 + + -The Python SDK runs this entire flow internally on a single 401 response — you only see `AuthenticationError` if the whole chain fails. +Without the SDK, implement the same sequence in your HTTP client. With the SDK you usually only surface `AuthenticationError` if refresh **and** exchange both fail. diff --git a/api-reference/auth/signup.mdx b/api-reference/auth/signup.mdx index 9cf4c0c..bc0ad42 100644 --- a/api-reference/auth/signup.mdx +++ b/api-reference/auth/signup.mdx @@ -1,6 +1,6 @@ --- -title: "Create account" -description: "Create a new org and mint the first API key." +title: "Sign up" +description: "Create an org and receive your first API key—plaintext key is shown once in the response." 
api: "POST https://api.getmetacognition.com/signup" --- diff --git a/api-reference/auth/token-exchange.mdx b/api-reference/auth/token-exchange.mdx index 765c6f3..1dcb8e6 100644 --- a/api-reference/auth/token-exchange.mdx +++ b/api-reference/auth/token-exchange.mdx @@ -1,6 +1,6 @@ --- -title: "Exchange API key" -description: "Exchange an API key for a short-lived JWT pair." +title: "Exchange API key for tokens" +description: "Trade an API key for short-lived access + refresh JWTs—mirrors what the Python SDK does on first call." api: "POST https://api.getmetacognition.com/auth/token-exchange" --- diff --git a/api-reference/memory/ingest-memory.mdx b/api-reference/memory/ingest-memory.mdx index 83ebb15..34a7b35 100644 --- a/api-reference/memory/ingest-memory.mdx +++ b/api-reference/memory/ingest-memory.mdx @@ -1,6 +1,6 @@ --- -title: "Remember conversation" -description: "Persist conversation turns into memory." +title: "Ingest conversation memory" +description: "REST `remember`: write turns under a scope; active path is synchronous, enrichment continues async." api: "POST https://api.getmetacognition.com/ingestion/memory" --- diff --git a/api-reference/memory/recall.mdx b/api-reference/memory/recall.mdx index a802640..7b522d5 100644 --- a/api-reference/memory/recall.mdx +++ b/api-reference/memory/recall.mdx @@ -1,6 +1,6 @@ --- title: "Recall memory" -description: "Retrieve the most relevant slice of memory." +description: "REST `recall`: natural-language query, ranked hits, confidence, token usage in the response." api: "POST https://api.getmetacognition.com/recall" --- diff --git a/api-reference/overview.mdx b/api-reference/overview.mdx index c8913fa..aad410c 100644 --- a/api-reference/overview.mdx +++ b/api-reference/overview.mdx @@ -1,9 +1,24 @@ --- -title: "Overview" -description: "Use Tex directly over HTTPS — for languages and runtimes the SDK doesn't cover yet." 
+title: "REST API overview" +description: "Base URL, JWT auth, correlation IDs, error bodies, rate limits, retries—everything before you open the endpoint list." +icon: "globe" --- -The Tex API is a small REST surface. If you're writing Python, prefer the [SDK](/sdk/installation) — it handles auth, refresh, and serialization. Use the raw API when you're in a language we don't ship a client for, or when you're building a sidecar. + + If you are in Python, prefer the [SDK](/sdk/installation) so JWT refresh stays handled for you. Stay on this page for curl, other languages, or when you are debugging raw HTTP. + + +## Quick reference + +| Topic | Where | +| --- | --- | +| Live host | `https://api.getmetacognition.com` | +| Auth header | `Authorization: Bearer ` | +| Get tokens | [`POST /auth/token-exchange`](/api-reference/auth/token-exchange) | +| Trace failures | `X-Correlation-ID` + JSON `request_id` | +| Limits & metering | [Usage, quotas, and billing](/concepts/usage-billing) | + +Small surface: **`/me`**, **`/ingestion/memory`**, **`/recall`**, **`/usage/*`**. ## Base URL @@ -13,13 +28,13 @@ https://api.getmetacognition.com ## Auth -Every Tex-unique endpoint expects: +Every product endpoint expects: ```http Authorization: Bearer ``` -Where `` is a JWT obtained by exchanging your API key. See [`POST /auth/token-exchange`](/api-reference/auth/token-exchange). +You mint `` by posting your API key to [`POST /auth/token-exchange`](/api-reference/auth/token-exchange). After that you treat it like any short-lived bearer token. ## Content type @@ -29,17 +44,15 @@ Always: Content-Type: application/json ``` -## Correlation ID +## Correlation IDs -Every request gets an `X-Correlation-ID` (UUID) generated on the server if you don't supply one. The SDK generates one client-side. Pass it through your own logs to trace requests end-to-end: +Each request should carry (or receive) an `X-Correlation-ID` UUID. 
The server mints one if you skip it; the SDK always sends its own so your service logs and ours match. Paste that value into support threads. ```http X-Correlation-ID: 4f1d8e3c-2a9b-4c0d-9e6f-1a2b3c4d5e6f ``` -Quote this header in support tickets. - -## Errors +## HTTP errors Standard HTTP status codes: @@ -55,7 +68,7 @@ Standard HTTP status codes: | `404` | Not found | | `422` | Validation error (FastAPI shape) | | `429` | Daily quota exceeded | -| `5xx` | Our problem | +| `5xx` | Tex platform fault—retry with backoff, then escalate with the correlation ID | Error body shape: @@ -71,20 +84,17 @@ Error body shape: ## Rate limits -Daily quota is enforced per-org. Currently: - -- `tokens_in`: 1,000,000 / day -- `tokens_out`: 5,000,000 / day - -Reset 00:00 UTC. See [Usage and billing](/concepts/usage-billing). + + Limits are per organization. Today the free tier allows **1,000,000** `tokens_in` and **5,000,000** `tokens_out` each UTC day, resetting at **00:00 UTC**. See [Usage, quotas, and billing](/concepts/usage-billing) for how that shows up in responses and dashboards. + ## Retries -Recommended client-side policy: 2 retries with exponential backoff on `408`, `500`, `502`, `503`, `504`, and network errors. Honor the `Retry-After` header. +You should retry **twice** with exponential backoff on `408`, `500`, `502`, `503`, `504`, and hard network failures. Respect `Retry-After` when the API sends it. -Don't retry `400`, `401`, `403`, `404`, `422`, `429` — they're terminal. +Skip retries for `400`, `401`, `403`, `404`, `422`, and `429`—those will not magically clear on a replay. -## Endpoint index +## Endpoints diff --git a/api-reference/usage/summary.mdx b/api-reference/usage/summary.mdx index 2571077..6c4855b 100644 --- a/api-reference/usage/summary.mdx +++ b/api-reference/usage/summary.mdx @@ -1,6 +1,6 @@ --- -title: "Monthly usage" -description: "Calendar-month token totals." 
+title: "Monthly usage summary" +description: "Calendar-month rollups in UTC—handy for finance and capacity reviews." api: "GET https://api.getmetacognition.com/usage/summary" --- diff --git a/api-reference/usage/today.mdx b/api-reference/usage/today.mdx index 1834f77..434c0bb 100644 --- a/api-reference/usage/today.mdx +++ b/api-reference/usage/today.mdx @@ -1,6 +1,6 @@ --- title: "Today's usage" -description: "Today's token totals and the active daily quota." +description: "Today's token totals, limits, and headroom before you hit a 429." api: "GET https://api.getmetacognition.com/usage/today" --- diff --git a/authentication.mdx b/authentication.mdx index 8faa80c..4eb0465 100644 --- a/authentication.mdx +++ b/authentication.mdx @@ -1,38 +1,78 @@ --- -title: "Authentication" -description: "How API keys, JWTs, and the refresh loop work." +title: "Authentication & API keys" +description: "How your API key becomes a JWT, how the SDK refreshes it, and where to store secrets in dev and prod." +icon: "key" --- -You hold an **API key.** The SDK exchanges it for a short-lived **JWT** the first time it's needed and refreshes transparently. You never see the JWT. +import { AuthSequence } from "/snippets/flow-visuals.mdx"; -## The auth flow + + If you only need a working client, start with the [Quickstart](/quickstart). Return here when you are wiring secrets, bringing your own JWT, or planning key rotation. + + +Your API key is exchanged for short-lived access and refresh tokens the first time the SDK hits a real product route. After that the client keeps JWTs in memory (or your hooks), refreshes them before they expire, and you rarely handle raw token strings yourself. + +## Flow + +The numbered list below matches this sequence—your app calls the SDK; the SDK talks to the API (including exchange, refresh, and retries). 
```mermaid sequenceDiagram - participant App as Your app - participant SDK as Tex SDK - participant API as Tex API - - App->>SDK: Tex(api_key="tex_live_…") - Note over SDK: Lazy — no call yet - App->>SDK: tex.recall(...) - SDK->>API: POST /auth/token-exchange - API-->>SDK: access_token (24h) + refresh_token (7d) - SDK->>API: POST /recall (Authorization: Bearer access_token) - API-->>SDK: hits - - Note over SDK: 24h later, on next call: - SDK->>API: POST /recall (expired) - API-->>SDK: 401 - SDK->>API: POST /auth/refresh - API-->>SDK: new access_token - SDK->>API: POST /recall (retry) - API-->>SDK: hits + autonumber + participant App as Your app + participant SDK as Tex SDK + participant API as Tex API + + App->>SDK: Tex(api_key=...) + note over SDK: Lazy — no network until a real method runs + App->>SDK: tex.recall(...) / remember / usage + SDK->>API: POST /auth/token-exchange + API-->>SDK: access_token (24h) + refresh_token (7d) + SDK->>API: Product call (Authorization: Bearer access_token) + API-->>SDK: 200 + payload + SDK-->>App: return value + + note over App,API: Later: access token expired + App->>SDK: tex.recall(...) + SDK->>API: Product call (expired Bearer) + API-->>SDK: 401 + SDK->>API: POST /auth/refresh + API-->>SDK: new access_token + SDK->>API: Retry product call + API-->>SDK: 200 + payload + SDK-->>App: return value ``` -## What you provide - -The constructor accepts any **one** of three auth modes: +### Steps in the SDK + + + +## Auth mode + +Use **one** of these: @@ -76,7 +116,7 @@ The constructor accepts any **one** of three auth modes: -## Where to keep the key +## Key storage @@ -95,7 +135,7 @@ The constructor accepts any **one** of three auth modes: The SDK reads `TEX_API_KEY` from the environment automatically when `api_key=` is omitted. 
-## Rotating keys without downtime +## Rotation @@ -112,7 +152,7 @@ The SDK reads `TEX_API_KEY` from the environment automatically when `api_key=` i -## What happens on a bad key +## Bad key ```python Python diff --git a/benchmarks.mdx b/benchmarks.mdx index e621e37..116b5d7 100644 --- a/benchmarks.mdx +++ b/benchmarks.mdx @@ -1,6 +1,7 @@ --- -title: "Benchmarks" -description: "#1 on LoCoMo (93.3%) and LongMemEval_S (92.2%) — the two industry-standard long-term memory benchmarks." +title: "Benchmarks & methodology" +description: "LoCoMo and LongMemEval_S scores, per-category tables, latency, and token efficiency—so you can verify claims yourself." +icon: "trophy" --- export const Bar = ({label, pct, value, win}) => ( @@ -20,20 +21,20 @@ export const Chart = ({title, children}) => ( ); -Tex is the state of the art on long-term conversational memory. Below are the full results from the two benchmarks the field uses to compare systems: **LoCoMo** (EMNLP 2024) and **LongMemEval** (ICLR 2025). +On **LoCoMo**, the full Tex stack scores **93.3%** overall. On **LongMemEval_S** (active retrieval only), Tex is at **92.2%**. The sections that follow break out categories, latency, and tokens in plain view so you can reproduce the story yourself. - Full system. Previous SOTA: **EverMemOS at 92.3%**. + Full Tex system vs published baselines—EverMemOS was the prior headline number at **92.3%**. - Active-only. Previous SOTA among retrieval systems: **Emergence AI at 86.0%**. + Active retrieval track vs other retrieval-first systems—Emergence AI posted **86.0%** on comparable reporting. - - Both numbers are reported with **gpt-4o-mini** as the reader and **gpt-4o** as the LLM-as-judge. Single-machine deployment; no distributed infrastructure. Methodology and reproducibility notes below. - + + We generated answers with **gpt-4o-mini** and graded them with **gpt-4o** in an LLM-as-judge configuration. 
Each evaluation ran on a **single** machine from start to finish (no multi-node setup). The precise recipe is in **Methodology** below. + ## LoCoMo @@ -65,7 +66,7 @@ LoCoMo evaluates long-conversation memory across 10 multi-session conversations - Tex's **99.33% on adversarial questions** is the strongest signal: the system correctly *refuses* to answer unanswerable questions rather than hallucinating. Most competitors don't disclose this category. + On adversarial items Tex is **99.33%**: it declines to answer when the transcript does not support one. If a vendor’s public write-up skips that bucket, treat their headline number with the skepticism it deserves. ## LongMemEval_S @@ -97,19 +98,19 @@ LongMemEval_S evaluates memory over **500 questions across ~48 sessions each (~1 - + - Tex's **92.2% matches Oracle GPT-o3 (92.0%)** and beats Oracle GPT-4o (82.4%) — meaning Tex's retrieval surfaces relevant context more effectively than oracle-selected sessions for many question types. + **92.2%** sits next to Oracle GPT-o3 (**92.0%**) and well above Oracle GPT-4o (**82.4%**). In practice that means the retrieval step is surfacing roughly the same evidence you would hand-pick for each question—without you doing the hand-picking. ## Latency -| Configuration | p50 | p90 | End-to-end (incl. reader) | +| Configuration | p50 | p90 | End-to-end (including the model that reads context) | | --- | --- | --- | --- | | **Tex (Active)** | **~120ms** | ~200ms | ~0.6s | | **Tex (Full System)** | ~350ms | ~500ms | ~1.0s | @@ -152,16 +153,14 @@ LongMemEval_S evaluates memory over **500 questions across ~48 sessions each (~1 - **Tex uses zero LLM tokens during ingestion.** All embedding and indexing runs on offline models — no provider call per turn. Competitors typically run an LLM pass on every ingested message; those costs aren't always disclosed but are real. + Ingest here costs **zero LLM tokens**—offline embeds only. 
Vendors that run an LLM per message will show up in your bill; ask them before you commit. ### Headline efficiency claims -- **27% more accurate than Mem0** with **87% fewer tokens** and **~95% lower latency**. -- **5.8% more accurate than MemMachine Memory mode** with **43% fewer tokens**. -- **Lowest tokens-per-correct-answer on the board** at ~1,296. +Compared with published Mem0 numbers, Tex lands about **27%** higher on accuracy while using roughly **87%** fewer tokens and running at about **95%** lower latency. Against MemMachine’s memory-only configuration it is about **5.8%** more accurate on **43%** fewer tokens. On LoCoMo the stack also posts the lowest tokens-per-correct-answer we have recorded here — about **1,296** — because ingestion never spends LLM cycles on every message. -## Ablation — what each capability contributes +## Ablation: what each part of the pipeline adds On LongMemEval_S, peeling layers off the production retrieval pipeline: @@ -177,18 +176,15 @@ On LongMemEval_S, peeling layers off the production retrieval pipeline: ## Methodology -- **Reader model**: `gpt-4o-mini` -- **Judge model**: `gpt-4o` -- **Evaluation**: LLM-as-judge, category-specific judge prompts, binary yes/no, averaged per category -- **LoCoMo**: 10 conversations, 1,984 QA pairs, Tex Full System -- **LongMemEval_S**: 500 questions, ~48 sessions each, Tex Active Only -- **Infrastructure**: Single machine. No distributed cluster. Complete eval runs end-to-end without manual intervention. +Answers came from **`gpt-4o-mini`**. We graded them with an LLM-as-judge setup built on **`gpt-4o`**, using category-specific prompts, binary pass/fail per item, and a straight average inside each category. -## What's next +**LoCoMo** used all ten provided conversations (**1,984** question–answer pairs) against the **full Tex system** (not retrieval-only). 
**LongMemEval_S** used **500** questions with about **48** sessions each (~115K tokens per trace) against **Tex Active** retrieval only. -- **MemoryAgentBench (ICLR 2026)** — next eval target; tests accurate retrieval, test-time learning, long-range understanding, conflict resolution. -- **Multi-step reasoning** — two-pass retrieval + reasoning for complex counting / computation. -- **LongMemEval_M** — scaling to 500+ sessions per question for extreme long-memory evaluation. +Hardware was a **single** machine for the full pipeline—no multi-node orchestration—and runs completed without someone babysitting intermediate steps. + +## On our roadmap for evals + +We plan to add **MemoryAgentBench (ICLR 2026)** (retrieval quality, learning on the fly, long context, conflicting facts), stronger **multi-step reasoning** when answers need counting or arithmetic over evidence, and **LongMemEval_M** for the nasty case of hundreds of sessions behind one question. ## References @@ -200,6 +196,6 @@ On LongMemEval_S, peeling layers off the production retrieval pipeline: 6. Supermemory. *State-of-the-Art Agent Memory Research.* 2026. 7. Mastra. *Observational Memory: 95% on LongMemEval.* 2026. - - Five minutes from `pip install` to first recall. + + You go from `pip install tex-sdk` to a printed recall in one sitting. diff --git a/changelog.mdx b/changelog.mdx index 21c8667..6bd40c0 100644 --- a/changelog.mdx +++ b/changelog.mdx @@ -1,6 +1,7 @@ --- title: "Changelog" -description: "What's new in the Tex SDK." +description: "SDK and API releases: metering, verbs, latency improvements—skim before you upgrade." +icon: "clock-rotate-left" --- ## 1.1.0 — 2026-05 diff --git a/concepts/memory-model.mdx b/concepts/memory-model.mdx index 53cab3b..de0f653 100644 --- a/concepts/memory-model.mdx +++ b/concepts/memory-model.mdx @@ -1,67 +1,106 @@ --- title: "How memory works" -description: "Three layers, two write phases, one query API." 
+description: "Turns you write, observations and entities Tex derives—plus how fast each layer shows up in recall." +icon: "brain" --- -Tex stores memory in three layers, each with different latency and recall characteristics. +import { PipelineFlow } from "../snippets/flow-visuals.mdx"; + +Call **`remember`** when you have new turns to store; call **`recall`** when you have a question and want the best-matching slices back. Most of the time you think in **turns**, but Tex also maintains **observations** and **entities** under the hood. The hot path for a write usually settles in about **150 ms**, while heavier indexing continues afterward. + +Those layers have different freshness, so a single **`recall`** can blend raw lines, distilled facts, and linked entities in one payload. + +## Layers - The raw transcript. What was said, by whom, when. + Raw lines: who said what, when. - Atomic facts extracted from turns. *"User avoids shellfish."* + Atomic facts inferred from turns (for example, dietary constraints, locations). - Recurring things — people, places, projects — linked across observations. + Recurring nouns—people, places, orgs—with links across observations. -## How writes work +## Writes -```mermaid -flowchart LR - A[remember] -->|~150ms| B[Active memory] - B -->|returns| C[Caller] - A -.->|async| D[Passive enrichment] - D --> E[Observations + Entities + Temporal] -``` +You call `remember`; Tex commits a **fast** slice you can query immediately, then keeps **enriching** after the response is already on its way back to you. The flow looks like this: -- **Active write** is synchronous. Turns are recallable in ~150ms. -- **Passive enrichment** runs in the background. Observations and entities surface in subsequent recalls. + -You don't wait for enrichment. The active turn is enough for the next request. +### Fast path -## How reads work +You get control back quickly. New turns are usually recallable within about **150 ms** in normal conditions. 
-```mermaid -flowchart LR - Q[Query] --> X[Expansion] - X --> H[Hybrid retrieval] - H --> R[Cross-encoder rerank] - R --> C[Calibrated confidence] - C --> O[Top-k hits] -``` +### Background -Vector search + temporal scoring + entity-graph hits, fused, reranked, then calibrated. The output: scored turns, observations, and entities — plus a confidence number. +Observations, entities, and timeline enrichment continue after the response. They tighten recall quality on **later** queries. -## What gets extracted +You never need enrichment to finish for the **next** user message to work—the latest turn alone can be enough. -Take this turn: +## Reads + +You pass natural-language **`q`** (plus scope); Tex runs fused retrieval, reranking, and emits a calibrated **`confidence`** score alongside the hits. The HTTP shape lives at [recall](/api-reference/memory/recall). + + + +## One example turn ```python {"role": "user", "text": "I just moved from Seattle to Austin for a job at Acme.", "timestamp": "..."} ``` -| Layer | What's stored | +| Layer | What you get | | --- | --- | -| Turn | Full text, role, timestamp, dedup hash | -| Observations | `lives_in: Austin`, `previously_lived_in: Seattle`, `works_at: Acme` | -| Entities | `Person(self)`, `Place(Seattle)`, `Place(Austin)`, `Org(Acme)` — linked | -| Temporal | A point on the user's timeline | +| Turn | Full text, role, timestamp, dedupe metadata | +| Observations | Facts like current city, previous city, employer | +| Entities | Typed nodes (person, place, org) wired together | +| Temporal | Events on a lightweight timeline | + +## BYO facts -You don't think about extraction — it's automatic. You *can* pre-extract and pass observations inline. See [`conversations.remember`](/sdk/conversations-remember). +You can ignore extraction and use defaults. Already have facts? Attach them on `remember`—see [`conversations.remember`](/sdk/conversations-remember). - - Org / user / session partitioning. 
+ + How `org_id`, `user_id`, and `session_id` isolate memory. diff --git a/concepts/retrieval.mdx b/concepts/retrieval.mdx index 98ae76b..b135824 100644 --- a/concepts/retrieval.mdx +++ b/concepts/retrieval.mdx @@ -1,42 +1,57 @@ --- title: "Recall and ranking" -description: "Active vs deep, top_k tuning, confidence gating." +description: "Active vs deep recall, tuning top_k, and when to trust—or ignore—the confidence score." +icon: "magnifying-glass" --- -`tex.recall(q, session_id, ...)` is your read path. +Stick with **`mode="active"`** for anything a person is staring at; switch to **`deep`** when you can spend more time or when **`active`** keeps coming back thin. **`top_k`** is your dial for context size—tight in chat, generous in digests—and it feeds straight into **`tokens_out`**. When **`confidence`** stays under about **0.3**, plan on skipping memory for that turn, widening the query, or trying **`deep`** once. + +Python: `tex.recall(q, session_id, ...)`. HTTP: [recall](/api-reference/memory/recall). ## Modes +### `active` (default) + +- **Best for:** Chat, copilots, anything with a person waiting. +- **Rough latency:** about **1.5–2.5 s** end-to-end in typical setups. +- **Behavior:** Single-pass retrieval and ranking. + +### `deep` + +- **Best for:** Offline jobs, “why did we decide X?” investigations, or a second pass after weak `active` results. +- **Rough latency:** about **3–6 s**. +- **Behavior:** Two-pass with heavier reranking. + - - Single-pass. **1.5–2.5s.** Use for every interactive call. + + Default. Fast path for interactive products. - Two-pass with iterative rerank. **3–6s.** For periodic analysis or low-confidence retries. + Slower, richer retrieval when latency is acceptable. ## `top_k` -Defaults: **15** (active) / **25** (deep). Server caps at **30** regardless of what you send. +Defaults: **15** (`active`) / **25** (`deep`). The server caps at **30** no matter what you send. 
-| Use case | `top_k` | +| Situation | Starting `top_k` | | --- | --- | -| Live chat, small context | 3–5 | -| Live chat, large context | 8–15 | -| Summaries / digests | 20–30 | +| Tight assistant prompt | 3–5 | +| Standard chat with citations | 8–15 | +| Summaries or long answers | 20–30 | -Larger `top_k` = more `tokens_out` billed. +Larger `top_k` directly increases **`tokens_out`** on your bill—see [Usage, quotas, and billing](/concepts/usage-billing). ## Confidence -Every recall returns a `confidence ∈ [0, 1]`, calibrated so `P(hits relevant | c) ≈ c`. +Every recall returns **`confidence` in [0, 1]**, calibrated so that roughly **`P(relevant hits | confidence) ≈ confidence`**. -| Range | Meaning | Action | +| Range | How to read it | Practical move | | --- | --- | --- | -| ≥ 0.6 | Strong | Use as-is | -| 0.3 – 0.6 | Useful but uncertain | Use, cite the sources | -| < 0.3 | Weak | Try `mode="deep"`, or skip memory | +| **≥ 0.6** | Strong | Pass context to the model as-is. | +| **0.3 – 0.6** | Mixed | Use hits, but cite or summarize sources for the user. | +| **< 0.3** | Weak | Try `mode="deep"`, rephrase `q`, or skip memory for this turn. | ```python hits = tex.recall(q=q, session_id=sid) @@ -44,25 +59,27 @@ if hits.confidence < 0.3: hits = tex.recall(q=q, session_id=sid, mode="deep") ``` -## What's in a hit +## Hit fields ```python RecallHit(id, text, score, kind, timestamp) # turns + observations RecallEntity(id, label, score) # entities ``` -`hits.hits.turns` is what you stuff into a system prompt. `hits.hits.observations` are atomic facts. `hits.hits.entities` are linked things — useful for analytical queries. +- **`hits.hits.turns`** — usual choice for stuffing a system prompt. +- **`hits.hits.observations`** — atomic facts. +- **`hits.hits.entities`** — useful for analytical or “who / what / where” questions. 
-## Timeline +## Timeline string ```python hits = tex.recall(q="when did we discuss pricing?", session_id=sid, include_timeline=True) -print(hits.timeline) # a pre-rendered chronological summary string +print(hits.timeline) # optional pre-rendered string ``` -`timeline` is an `Optional[str]` — drop it into a prompt as-is. It's not iterable. +`timeline` is an **`Optional[str]`**: either drop it straight into a prompt or ignore it. It is not a list you iterate. - - Tokens in / out and quota. + + How recall choices affect `tokens_in` / `tokens_out`. diff --git a/concepts/scopes.mdx b/concepts/scopes.mdx index 533f8c6..b70afe5 100644 --- a/concepts/scopes.mdx +++ b/concepts/scopes.mdx @@ -1,40 +1,43 @@ --- -title: "Multi-user memory" -description: "Org / user / session — how Tex partitions memory." +title: "Scopes and multi-tenancy" +description: "How org_id, user_id, and session_id partition memory so your customers never leak into each other." +icon: "layer-group" --- -Every turn and every recall is keyed by `(org_id, user_id, session_id)`. +Every call keys off **`(org_id, user_id, session_id)`**. `org_id` comes from your API key. **`session_id` is yours**—that’s where you put thread / channel / tenant. -| Field | Set by | Required | +| Field | Source | You set it? | | --- | --- | --- | -| `org_id` | The JWT (your API key) — server-injected | ✓ | -| `user_id` | The JWT, or override per-call for sub-user scoping | ✓ | -| `session_id` | **You, per call.** | ✓ | +| `org_id` | JWT minted from your API key | Almost never | +| `user_id` | JWT (or per-call override when supported) | Sometimes | +| `session_id` | Your application | **Always** | -`org_id` is locked to your API key. You almost never set it manually. `session_id` is where you do the work. +`session_id` is the knob you touch daily. -## Picking a `session_id` +## `session_id` -Free-form string. Reuse, don't enumerate. 
+Pick a stable pattern: - + `chat-{conversation_uuid}` `slack-{channel_id}` - + `agent-{task_id}` - + `bio-{user_id}` -## Multi-user SaaS +Keep strings deterministic for the same logical thread so `recall` sees the same corpus every time. -Each end-user gets their own memory. **Encode the end-user into the `session_id`** with one shared client: +## SaaS (one key) + +Map each customer to a distinct **`session_id`** (and optionally `user_id` when you use overrides). One shared `Tex` client is enough: ```python tex = Tex(api_key=os.environ["TEX_API_KEY"], base_url=BASE_URL) @@ -45,12 +48,12 @@ def chat(user_msg: str, x_user_id: str, conv_id: str): ... ``` -One API key, one bill, one client, full isolation between end-users. +That gives you **one bill** and **hard separation** between tenants—as long as you never reuse someone else’s `session_id` by accident. - Per-call `user_id` scoping is on the SDK roadmap (1.2). Until then, encoding into `session_id` is the cleanest pattern. See the [multi-tenant SaaS recipe](/recipes/multi-tenant-saas). + Fine-grained per-call `user_id` overrides are planned for **SDK 1.2**. Until then, encoding the end user into `session_id` is the cleanest pattern. See the [multi-tenant SaaS recipe](/recipes/multi-tenant-saas). - - Active vs deep, top_k, confidence. + + `top_k`, modes, confidence. diff --git a/concepts/usage-billing.mdx b/concepts/usage-billing.mdx index 7e66226..7c27e55 100644 --- a/concepts/usage-billing.mdx +++ b/concepts/usage-billing.mdx @@ -1,9 +1,10 @@ --- -title: "Usage and billing" -description: "Tokens in, tokens out, daily quotas." +title: "Usage, quotas, and billing" +description: "What we meter (tokens in/out), how daily caps work, and how to reason about cost as you scale." +icon: "receipt" --- -The billable unit is **tokens** — `tokens_in` (everything you sent us) and `tokens_out` (everything we returned). Counted with `tiktoken cl100k_base`, same encoding as `text-embedding-3-large`. 
+We bill on **`tokens_in`** plus **`tokens_out`**, counted with **`tiktoken`** using the **`cl100k_base`** vocabulary. The free tier allows **1M** tokens in and **5M** out per UTC day; exceeding either side returns **`429`**. Every response includes a **`usage`** object, and **`tex.usage.today()`** / **`.summary()`** give you org-level rollups. ## Free tier @@ -16,48 +17,52 @@ The billable unit is **tokens** — `tokens_in` (everything you sent us) and `to -Reset 00:00 UTC. Both caps trigger `429 RateLimitError`. +Both reset at **00:00 UTC**. Crossing either limit raises **`RateLimitError`** (`HTTP 429`). -## Reading usage +## Usage in code -Every `remember` and `recall` response carries its own usage: +### Per response + +Every `remember` and `recall` returns usage for that call: ```python hits = tex.recall(q="...", session_id=sid) print(hits.usage.tokens_in, hits.usage.tokens_out) ``` -For org-wide totals: +### Per org (dashboards, cron jobs) ```python -today = tex.usage.today() # daily + active quota +today = tex.usage.today() # daily totals + quota headroom month = tex.usage.summary() # current calendar month march = tex.usage.summary("2026-03") ``` -Or visit the [Dashboard → Usage](https://app.getmetacognition.com/dashboard/usage) for graph form. +The [Dashboard → Usage](https://app.getmetacognition.com/dashboard/usage) page shows the same numbers graphically. -## Cost control +## Cost knobs -- **Cap `top_k`** — defaults 15/25. Cut to 5–8 for live chat. -- **Use `mode="active"`** — faster *and* cheaper than `deep`. -- **Pre-filter writes** — strip system messages and "ok" / "thanks" turns. -- **Batch `remember`** — pass dozens of turns in one call instead of looping. -- **Quota-aware routing** — gate non-essential paths when usage > 90%. +| lever | why it helps | +| --- | --- | +| **Lower `top_k`** | Defaults are 15 / 25; live chat often needs only 5–8. | +| **Stay on `active`** | `deep` mode costs more time and tokens than `active`. 
| +| **Trim noisy writes** | Skip one-word acks and redundant system spam in `remember`. | +| **Batch turns** | Send many turns in one `remember` instead of dozens of calls. | +| **Quota-aware routing** | Fall back to “no memory” for non-critical paths when you are near the cap. | ```python if tex.usage.today().tokens_in_used / 1_000_000 > 0.9: return generate_without_memory(query) ``` -## Alerts +## Alerting -Today, alerting is opt-in: poll `tex.usage.today()` from your own monitoring (every 5 min) and page on `> 0.9` of either cap. Server-side email alerts at 80% are on the [roadmap](/changelog). +There is no hosted email alert yet. Poll **`tex.usage.today()`** every few minutes from your own monitor and page when either usage column crosses **~90%** of quota. Server-side emails near **80%** are on the [roadmap](/changelog). -## After launch +## Pricing (later) -Pay-as-you-go. Pricing TBA. The daily caps stay as soft alerts — we'll email you, not 429 you. +Billing will be pay-as-you-go once pricing is published. Daily caps remain safety rails—we intend to **notify** you as you approach limits rather than surprise-**429** production, but treat today’s behavior as authoritative until the billing docs update. - - Set up the Python client. 
+ + `pip install tex-sdk` diff --git a/docs.json b/docs.json new file mode 100644 index 0000000..11550ae --- /dev/null +++ b/docs.json @@ -0,0 +1,180 @@ +{ + "$schema": "https://mintlify.com/docs.json", + "theme": "mint", + "name": "Tex | Memory API for agents", + "colors": { + "primary": "#F32C05", + "light": "#FF5530", + "dark": "#C82200" + }, + "favicon": "/favicon.png", + "navigation": { + "anchors": [ + { + "anchor": "Documentation", + "icon": "book-open", + "groups": [ + { + "group": "Get started", + "pages": ["quickstart", "introduction", "authentication"] + }, + { + "group": "Concepts", + "pages": [ + { + "group": "Memory & retrieval", + "pages": ["concepts/memory-model", "concepts/retrieval"] + }, + { + "group": "Tenancy & usage", + "pages": ["concepts/scopes", "concepts/usage-billing"] + } + ] + }, + { + "group": "Cookbook", + "pages": [ + { + "group": "Apps", + "pages": ["recipes/fastapi", "recipes/streamlit"] + }, + { + "group": "Agents & channels", + "pages": ["recipes/langchain", "recipes/slack-bot"] + }, + { + "group": "Production patterns", + "pages": ["recipes/azure-openai-rag", "recipes/multi-tenant-saas"] + } + ] + }, + { + "group": "Evaluation", + "pages": ["benchmarks"] + }, + { + "group": "Migration", + "pages": [ + "migration/from-redis", + "migration/from-langchain-memory", + "migration/from-supermemory" + ] + }, + { + "group": "Support", + "pages": ["troubleshooting", "changelog"] + } + ] + }, + { + "anchor": "Python SDK", + "icon": "python", + "groups": [ + { + "group": "Python SDK", + "pages": [ + { + "group": "Setup", + "pages": ["sdk/installation", "sdk/client"] + }, + { + "group": "Calls", + "pages": [ + "sdk/conversations-remember", + "sdk/recall", + "sdk/usage", + "sdk/errors" + ] + } + ] + } + ] + }, + { + "anchor": "REST API", + "icon": "code", + "groups": [ + { + "group": "REST API", + "pages": [ + "api-reference/overview", + { + "group": "Auth & sessions", + "pages": [ + "api-reference/auth/signup", + 
"api-reference/auth/token-exchange", + "api-reference/auth/refresh" + ] + }, + { + "group": "Organization", + "pages": [ + "api-reference/account/me", + "api-reference/account/api-keys" + ] + }, + { + "group": "Memory", + "pages": [ + "api-reference/memory/ingest-memory", + "api-reference/memory/recall" + ] + }, + { + "group": "Usage", + "pages": [ + "api-reference/usage/today", + "api-reference/usage/summary" + ] + } + ] + } + ] + }, + { + "anchor": "GitHub", + "href": "https://github.com/metacoglabs", + "icon": "github" + } + ] + }, + "logo": { + "light": "/logo/light.png", + "dark": "/logo/dark.png" + }, + "background": { + "color": { + "light": "#FFFFFF", + "dark": "#0A0A0A" + } + }, + "navbar": { + "links": [ + { + "label": "Dashboard", + "href": "https://app.getmetacognition.com" + } + ], + "primary": { + "type": "button", + "label": "Get an API key", + "href": "https://app.getmetacognition.com/signup" + } + }, + "footer": { + "socials": { + "github": "https://github.com/metacoglabs", + "linkedin": "https://www.linkedin.com/company/metacognition-ai" + } + }, + "fonts": { + "heading": { + "family": "Geist", + "weight": 700 + }, + "body": { + "family": "Geist", + "weight": 400 + } + } +} diff --git a/introduction.mdx b/introduction.mdx index 955a6ad..f73dcdd 100644 --- a/introduction.mdx +++ b/introduction.mdx @@ -1,35 +1,51 @@ --- -title: "Introduction" -description: "#1 on every major long-term memory benchmark. 93.3% on LoCoMo. 92.2% on LongMemEval_S. Sub-200ms retrieval. Zero LLM tokens during ingestion." +title: "Overview — What is Tex?" +description: "Long-term memory for assistants and agents: you remember turns, you recall what matters, you keep prompts small." +icon: "book-open" --- -Tex is the memory layer for AI agents — and the state of the art on every long-term memory benchmark we know of. Stream conversation turns to it; pull back the relevant slice on every new request. Bounded prompts, cross-session continuity, no Redis to babysit. 
+ + New here? Run through the [Quickstart](/quickstart), skim this overview, then follow [Authentication](/authentication) before you ship to production. + + +Chat apps usually force a bad choice: paste the whole thread every request, or lose state on refresh. Tex stores turns as they happen and pulls back what matches the *current* question—then you call your model. + +## API + +| Call | When | +| --- | --- | +| **`remember`** | New stuff to store (turns + optional metadata). | +| **`recall`** | Call right before generation: pass a natural-language question, get ranked hits and a confidence score. | + +You keep model, routing, and UI. Tex does storage, retrieval, ranking, metering. + + + Need access? Create an account in the [dashboard](https://app.getmetacognition.com/signup), copy the API key once, then follow the [Quickstart](/quickstart). Locally, set `TEX_API_KEY` or pass `api_key=` to the client. + + +## Start - First call in 5 minutes. + Install `tex-sdk`, one `remember`, one `recall`, print scores in the terminal. - - Sign up at the dashboard. + + LoCoMo and LongMemEval_S with splits, latency, and token math laid out in the open. -## Best in class +## Benchmarks - **#1.** Beats EverMemOS (92.3%), MemMachine (91.7%), Zep (~85%), Mem0 (~66%). + Full-system benchmark; splits and baselines on [Benchmarks](/benchmarks). - **#1.** Beats Emergence AI (86%), Supermemory (81.6%), Oracle GPT-4o (82.4%). + Active retrieval track; how we ran the evals is on the same page. - - Per-category breakdowns. Latency. Token efficiency. Ablations. - - -## The loop +## Loop @@ -46,37 +62,47 @@ Tex is the memory layer for AI agents — and the state of the art on every long ``` - Drop `context` into your system prompt. Reply. Persist the new turns. Loop. + Put `context` where your model reads it, answer, append new turns—repeat. -## Concepts + + Low `confidence` at the start just means thin memory—feed real traffic and it moves. 
+ + +## More + + + + The synchronous part of `remember` returns fast; heavier enrichment continues afterward. Diagrams and timing notes live in [How memory works](/concepts/memory-model). + + + + You partition with `org_id`, `user_id`, and `session_id`. Read [Scopes and multi-tenancy](/concepts/scopes) before you map those fields to your auth model. + + + + Prefer the [Python SDK](/sdk/installation) for JWT exchange and refresh. Use [REST](/api-reference/overview) from other languages or when you already centralize HTTP in a gateway. + + + + Metering is token-based with daily caps. Full detail is under [Usage, quotas, and billing](/concepts/usage-billing). + + + +## Docs - Turns, observations, entities — what we store and why. + What lands in storage after `remember`. - Active vs deep modes. Confidence calibration. - - - Org / user / session — multi-tenant in two lines. - - - Tokens in / out. Quota. Cost control. + Modes, `top_k`, confidence. - - -## Build with it - - - `pip install tex-sdk` - - - Direct HTTP, any language. + Install and client setup. - Chatbot · agent · Slack · Streamlit + Apps, agents, production patterns. diff --git a/migration/from-langchain-memory.mdx b/migration/from-langchain-memory.mdx index d37f856..885460a 100644 --- a/migration/from-langchain-memory.mdx +++ b/migration/from-langchain-memory.mdx @@ -1,6 +1,6 @@ --- -title: "From LangChain" -description: "Migrate from BaseChatMemory subclasses to Tex retrieval." +title: "Migrate from LangChain chat memory" +description: "Move off BaseChatMemory-style buffers to Tex-backed recall with a short mapping guide." --- LangChain's built-in memory classes (`ConversationBufferMemory`, `ConversationBufferWindowMemory`, `ConversationSummaryMemory`, `ConversationKGMemory`) are **buffers, summarizers, or graphs** that live inside your process. 
Tex is a **hosted retrieval service.** @@ -63,22 +63,15 @@ The migration moves you from "give the LLM the whole sliding window" to "give th -## What you give up +## Tradeoffs -LangChain's memory classes are simple in-process objects with no network call. Tex adds: +In-process LangChain memory = zero network. Tex = network + ~150ms writes, multi-second reads. -- Network round-trip (~150ms `remember`, ~1.7s `recall`) — your request gets slower. -- A dependency on our service availability. +**Down:** slower request, depends on our API. -What you gain: +**Up:** survives deploys, bounded prompts, cross-session memory, confidence, extracted facts. -- Memory survives restarts, deploys, and shard moves. -- Bounded prompt size — never hit the context limit. -- Cross-session continuity — same user across days/weeks/months. -- Confidence scoring — gate fallback paths. -- Observation extraction — structured "facts" without writing prompts. - -For a hobby project, LangChain's buffer is fine. For anything user-facing in production, Tex's tradeoff is right. +Hobby bot: buffer is fine. Customer-facing: Tex is usually worth it. ## Drop-in adapter (optional) diff --git a/migration/from-redis.mdx b/migration/from-redis.mdx index 37fd58c..363897b 100644 --- a/migration/from-redis.mdx +++ b/migration/from-redis.mdx @@ -1,11 +1,11 @@ --- -title: "From Redis" -description: "Replace your Redis-backed conversation log with Tex memory." +title: "Migrate from Redis (or a homegrown log)" +description: "Swap append-only chat logs for remember/recall while keeping your routing and models the same." --- -If your chatbot stores chat history in Redis (or Postgres, or Mongo) and dumps the whole thing into every prompt, this is the migration for you. +If your chat history lives in Redis/Postgres/Mongo and you dump it all into every prompt, this page is for you. 
-## The pattern you're replacing +## Before (Redis log) ```python before.py # Append every turn @@ -23,7 +23,7 @@ Pain points: - "Last 50" is a guess. Older relevant context gets evicted. - Redis cost grows linearly forever. -## The Tex version +## After (Tex) ```python after.py hits = tex.recall(q=user_msg, session_id=sid, top_k=8) diff --git a/migration/from-supermemory.mdx b/migration/from-supermemory.mdx index 6878da4..9898a21 100644 --- a/migration/from-supermemory.mdx +++ b/migration/from-supermemory.mdx @@ -1,6 +1,6 @@ --- -title: "From Supermemory" -description: "Migrate from Supermemory's SDK to Tex with minimal code changes." +title: "Migrate from Supermemory" +description: "If you already integrated Supermemory, here is the Tex-shaped equivalent of each call." --- The Tex Python SDK is **resource-shape compatible** with Supermemory for the verbs we both implement. If you're using `supermemory.add(...)`, `client.search(...)`, or `client.profile(...)`, the migration is mostly a pip install and a base-url swap. 
diff --git a/mint.json b/mint.json deleted file mode 100644 index 9e84229..0000000 --- a/mint.json +++ /dev/null @@ -1,160 +0,0 @@ -{ - "$schema": "https://mintlify.com/schema.json", - "name": "Tex", - "logo": { - "dark": "/logo/dark.png", - "light": "/logo/light.png" - }, - "favicon": "/favicon.png", - "colors": { - "primary": "#F32C05", - "light": "#FF5530", - "dark": "#C82200", - "background": { - "light": "#FFFFFF", - "dark": "#0A0A0A" - }, - "anchors": { - "from": "#F32C05", - "to": "#FF5530" - } - }, - "font": { - "headings": { - "family": "Geist", - "weight": 700 - }, - "body": { - "family": "Source Serif 4", - "weight": 400 - } - }, - "topbarLinks": [ - { - "name": "Dashboard", - "url": "https://app.getmetacognition.com" - } - ], - "topbarCtaButton": { - "name": "Get an API key", - "url": "https://app.getmetacognition.com/signup" - }, - "anchors": [ - { - "name": "Python SDK", - "icon": "python", - "url": "sdk" - }, - { - "name": "API Reference", - "icon": "code", - "url": "api-reference" - }, - { - "name": "GitHub", - "icon": "github", - "url": "https://github.com/metacoglabs" - } - ], - "navigation": [ - { - "group": "Get started", - "pages": [ - "introduction", - "quickstart", - "authentication" - ] - }, - { - "group": "Benchmarks", - "pages": [ - "benchmarks" - ] - }, - { - "group": "Guides", - "pages": [ - "concepts/memory-model", - "concepts/retrieval", - "concepts/scopes", - "concepts/usage-billing" - ] - }, - { - "group": "Python SDK", - "pages": [ - "sdk/installation", - "sdk/client", - "sdk/conversations-remember", - "sdk/recall", - "sdk/usage", - "sdk/errors" - ] - }, - { - "group": "API Reference", - "pages": [ - "api-reference/overview", - { - "group": "Auth", - "pages": [ - "api-reference/auth/signup", - "api-reference/auth/token-exchange", - "api-reference/auth/refresh" - ] - }, - { - "group": "Account", - "pages": [ - "api-reference/account/me", - "api-reference/account/api-keys" - ] - }, - { - "group": "Memory", - "pages": [ - 
"api-reference/memory/ingest-memory", - "api-reference/memory/recall" - ] - }, - { - "group": "Usage", - "pages": [ - "api-reference/usage/today", - "api-reference/usage/summary" - ] - } - ] - }, - { - "group": "Cookbook", - "pages": [ - "recipes/fastapi", - "recipes/langchain", - "recipes/azure-openai-rag", - "recipes/slack-bot", - "recipes/streamlit", - "recipes/multi-tenant-saas" - ] - }, - { - "group": "Migrate", - "pages": [ - "migration/from-redis", - "migration/from-langchain-memory", - "migration/from-supermemory" - ] - }, - { - "group": "Help", - "pages": [ - "troubleshooting", - "changelog" - ] - } - ], - "footerSocials": { - "github": "https://github.com/metacoglabs", - "linkedin": "https://www.linkedin.com/company/metacognition-ai" - } -} diff --git a/quickstart.mdx b/quickstart.mdx index b63912a..012fb78 100644 --- a/quickstart.mdx +++ b/quickstart.mdx @@ -1,26 +1,41 @@ --- title: "Quickstart" -description: "First call in five minutes." +description: "From zero to a working remember + recall: install tex-sdk, set TEX_API_KEY, run one script, read the scores." +icon: "rocket" --- +## Goal + +One script: `remember` a turn, `recall` it with a question, print scores and token usage. A few minutes with Python 3.9+. + - Sign up at [app.getmetacognition.com](https://app.getmetacognition.com/signup) and copy the key shown once. + Open [app.getmetacognition.com/signup](https://app.getmetacognition.com/signup), create an account, and copy the key shown once. - The key appears only on the screen after signup — store it now. + You only see the full key at creation time. Store it in a password manager or secret store now—rotating later is easy, guessing later is not. - - ```bash + + + ```bash pip pip install tex-sdk ``` - Or `uv add tex-sdk` / `poetry add tex-sdk`. Requires Python ≥ 3.9. Distribution name is `tex-sdk`; import name is `tex`. + ```bash uv + uv add tex-sdk + ``` + + ```bash poetry + poetry add tex-sdk + ``` + + + You need **Python ≥ 3.9**. 
On PyPI the package is **`tex-sdk`**; in code you **`import tex`**. - + ```python first_call.py import os from tex import Tex @@ -47,26 +62,46 @@ description: "First call in five minutes." export TEX_API_KEY="tex_live_..." python first_call.py ``` + + + You should see the shellfish line with a numeric score, plus `confidence` and token usage. If you get `AuthenticationError`, your key or `base_url` is wrong—start with [Troubleshooting](/troubleshooting). + - - Confidence will be low when memory is fresh — keep adding turns and watch it climb. - - ## Next + + + Use `python-dotenv` or your framework’s loader. Keep `.env` out of git. In production, inject `TEX_API_KEY` from the same secret store you use for every other third-party key. + + + + Read [REST API overview](/api-reference/overview): exchange the API key for a JWT, then call ingestion and recall with `Authorization: Bearer …`. The SDK exists so you do not write that refresh loop by hand. + + + +## Reads + +| Goal | Page | +| --- | --- | +| Understand what gets stored | [How memory works](/concepts/memory-model) | +| Tune recall quality | [Recall and ranking](/concepts/retrieval) | +| Ship real users | [Scopes and multi-tenancy](/concepts/scopes) | +| Drop behind a real API | [Production chatbot (FastAPI)](/recipes/fastapi) | +| Production errors | [Errors and retries](/sdk/errors) | + - What's stored and how. + Turns, observations, entities after each `remember`. - - Multi-user partitioning. + + Map `session_id` (and tenants) to your product. - - Production chatbot in 40 lines. + + Small service pattern behind a UI. - - Every exception, every retry rule. + + What the SDK throws and what it retries. diff --git a/recipes/azure-openai-rag.mdx b/recipes/azure-openai-rag.mdx index 6d1ab14..c4d6914 100644 --- a/recipes/azure-openai-rag.mdx +++ b/recipes/azure-openai-rag.mdx @@ -1,95 +1,103 @@ --- -title: "RAG with Azure GPT-4o" -description: "Tex memory + GPT-4o on Azure for production RAG." 
+title: "RAG on Azure OpenAI" +description: "You pair Tex recall with Azure GPT-4o answers—same loop works for any chat-completions API if you swap the client." +icon: "cloud" --- -The canonical "memory + LLM" loop with Azure OpenAI. Same shape works for OpenAI direct, Anthropic, or any chat-completion API — swap the SDK. - -## Install - -```bash -pip install tex-sdk openai -``` - -## Environment - -```bash -# .env -TEX_API_KEY=tex_live_... -TEX_BASE_URL=https://api.getmetacognition.com - -AZURE_OPENAI_ENDPOINT=https://.openai.azure.com -AZURE_OPENAI_API_KEY=... -AZURE_OPENAI_DEPLOYMENT=gpt-4o -AZURE_OPENAI_API_VERSION=2025-04-01-preview -``` - -## Code - -```python rag.py -import os -from datetime import datetime, timezone -from openai import AzureOpenAI -from tex import Tex, RateLimitError, APITimeoutError - -tex = Tex(api_key=os.environ["TEX_API_KEY"], base_url=os.environ["TEX_BASE_URL"], timeout=10) -gpt = AzureOpenAI( - api_key=os.environ["AZURE_OPENAI_API_KEY"], - api_version=os.environ["AZURE_OPENAI_API_VERSION"], - azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"], -) - -def now_iso() -> str: - return datetime.now(timezone.utc).isoformat().replace("+00:00", "Z") - -def answer(query: str, sid: str) -> dict: - # 1. Recall — soft-fail - memory: list[str] = [] - confidence = 0.0 - try: - hits = tex.recall(q=query, session_id=sid, top_k=5) - memory = [h.text for h in hits.hits.turns] - confidence = hits.confidence - except (RateLimitError, APITimeoutError): - pass - - # 2. Generate - sys_msg = ( - "You are a helpful assistant. " - + (f"Relevant memory:\n{chr(10).join('- ' + m for m in memory)}" - if memory else "") +Recall → answer → remember. Same loop on OpenAI direct or Anthropic—swap the client. + + + + ```bash + pip install tex-sdk openai + ``` + + + + ```bash + TEX_API_KEY=tex_live_... + TEX_BASE_URL=https://api.getmetacognition.com + + AZURE_OPENAI_ENDPOINT=https://.openai.azure.com + AZURE_OPENAI_API_KEY=... 
+ AZURE_OPENAI_DEPLOYMENT=gpt-4o + AZURE_OPENAI_API_VERSION=2025-04-01-preview + ``` + + + + ```python + import os + from openai import AzureOpenAI + from tex import Tex + + tex = Tex( + api_key=os.environ["TEX_API_KEY"], + base_url=os.environ["TEX_BASE_URL"], + timeout=10, ) - chat = gpt.chat.completions.create( - model=os.environ["AZURE_OPENAI_DEPLOYMENT"], - messages=[ - {"role": "system", "content": sys_msg}, - {"role": "user", "content": query}, - ], - temperature=0.4, + gpt = AzureOpenAI( + api_key=os.environ["AZURE_OPENAI_API_KEY"], + api_version=os.environ["AZURE_OPENAI_API_VERSION"], + azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"], ) - reply = chat.choices[0].message.content - - # 3. Remember - tex.conversations.remember(session_id=sid, turns=[ - {"role":"user","text": query, "timestamp": now_iso()}, - {"role":"assistant","text": reply, "timestamp": now_iso()}, - ]) - - return {"answer": reply, "confidence": confidence, "memory_used": len(memory)} - -if __name__ == "__main__": - import json - print(json.dumps(answer("any food restrictions?", "demo"), indent=2)) -``` - -## Run - -```bash -python rag.py -``` - -## Notes - -- **Cite sources.** Pass each hit's `id` to GPT-4o and instruct it to cite `[mem:abc]` — you can later resolve those IDs back to the turn for click-through. -- **Confidence-gated retry.** If `confidence < 0.4`, retry with `mode="deep"`. Costs +1–4s, rescues most weak-recall cases. -- **Streaming.** `stream=True` works as expected. Push `remember` to a background task so it doesn't delay the stream. 
+ ``` + + + + ```python + from datetime import datetime, timezone + from tex import RateLimitError, APITimeoutError + + def now_iso() -> str: + return datetime.now(timezone.utc).isoformat().replace("+00:00", "Z") + + def answer(query: str, sid: str) -> dict: + memory: list[str] = [] + confidence = 0.0 + try: + hits = tex.recall(q=query, session_id=sid, top_k=5) + memory = [h.text for h in hits.hits.turns] + confidence = hits.confidence + except (RateLimitError, APITimeoutError): + pass + + sys_msg = ( + "You are a helpful assistant. " + + (f"Relevant memory:\n{chr(10).join('- ' + m for m in memory)}" if memory else "") + ) + chat = gpt.chat.completions.create( + model=os.environ["AZURE_OPENAI_DEPLOYMENT"], + messages=[ + {"role": "system", "content": sys_msg}, + {"role": "user", "content": query}, + ], + temperature=0.4, + ) + reply = chat.choices[0].message.content + + tex.conversations.remember( + session_id=sid, + turns=[ + {"role": "user", "text": query, "timestamp": now_iso()}, + {"role": "assistant", "text": reply, "timestamp": now_iso()}, + ], + ) + return {"answer": reply, "confidence": confidence, "memory_used": len(memory)} + ``` + + + + ```python + if __name__ == "__main__": + import json + print(json.dumps(answer("any food restrictions?", "demo-session"), indent=2)) + ``` + + + +## Harder stuff + +- **You cite sources** — pass hit ids into the prompt and ask the model to quote `[mem:]` so you can deep-link later. +- **You retry deep recall** — if `confidence < 0.4`, you call `tex.recall(..., mode="deep")` once before answering. +- **You stream** — you set `stream=True`, and you enqueue `remember` in a background worker so the stream starts instantly. diff --git a/recipes/fastapi.mdx b/recipes/fastapi.mdx index 6e24b0c..8cf77ad 100644 --- a/recipes/fastapi.mdx +++ b/recipes/fastapi.mdx @@ -1,20 +1,132 @@ --- -title: "Chatbot backend (FastAPI)" -description: "Drop-in chatbot backend with Tex memory." 
+title: "Production chatbot (FastAPI)" +description: "You run a small FastAPI service with Tex on every turn—one client per process, recall, generate, remember." +icon: "server" --- -Chatbot backend in ~40 lines. One Tex client per process; memory recall on every turn. +One `/chat` route: recall → your LLM → remember. One cached `Tex` per process so you are not reconnecting every hit. ## Layout -```text -app/ -├── deps.py # Tex singleton -├── main.py # FastAPI app -└── chat.py # /chat endpoint -``` - -## Code +| File | Job | +| --- | --- | +| `deps.py` | Cached Tex | +| `chat.py` | `/chat` | +| `main.py` | App entry | + + + + Create this structure: + + ```text + app/ + ├── deps.py # cached Tex client + ├── main.py # FastAPI entry + └── chat.py # /chat route + ``` + + You can rename `app/`—just keep the import paths consistent in `uvicorn`. + + + + You read secrets from the environment and construct Tex **once**: + + ```python deps.py + from functools import cache + from tex import Tex + import os + + @cache + def tex_client() -> Tex: + return Tex( + api_key=os.environ["TEX_API_KEY"], + base_url=os.environ.get( + "TEX_BASE_URL", "https://api.getmetacognition.com" + ), + timeout=10, + ) + ``` + + You call `tex_client()` inside FastAPI `Depends(...)` so every route shares the same pool. 
+ + + + You derive `session_id` from headers + body, recall with a small `top_k`, swallow quota/timeouts, then remember both sides of the turn: + + ```python chat.py + from datetime import datetime, timezone + from fastapi import APIRouter, Depends, Header + from pydantic import BaseModel + from tex import Tex, RateLimitError, APITimeoutError + from .deps import tex_client + + router = APIRouter() + + def now_iso() -> str: + return datetime.now(timezone.utc).isoformat().replace("+00:00", "Z") + + class ChatBody(BaseModel): + text: str + session_id: str + + @router.post("/chat") + def chat( + body: ChatBody, + x_user_id: str = Header(...), + tex: Tex = Depends(tex_client), + ): + sid = f"u_{x_user_id}-{body.session_id}" + + memory: list[str] = [] + try: + hits = tex.recall(q=body.text, session_id=sid, top_k=5) + memory = [h.text for h in hits.hits.turns] + except (RateLimitError, APITimeoutError): + memory = [] + + answer = your_llm.complete(body.text, memory=memory) + + tex.conversations.remember( + session_id=sid, + turns=[ + {"role": "user", "text": body.text, "timestamp": now_iso()}, + {"role": "assistant", "text": answer, "timestamp": now_iso()}, + ], + ) + + return {"answer": answer} + ``` + + You swap `your_llm.complete(...)` for whatever stack you already use (OpenAI, Azure, local, etc.). + + + + You mount the router once: + + ```python main.py + from fastapi import FastAPI + from .chat import router + + app = FastAPI() + app.include_router(router) + ``` + + + + You export your key and launch uvicorn: + + ```bash + export TEX_API_KEY=tex_live_... + uvicorn app.main:app --reload + ``` + + You hit `POST /chat` with JSON `{"text":"...","session_id":"..."}` and header `x-user-id`. + + + +## Full files + +If you prefer one copy block, you can still paste the trio together: ```python deps.py @@ -58,7 +170,6 @@ def chat( ): sid = f"u_{x_user_id}-{body.session_id}" - # 1. 
Recall — gracefully degrade on timeout/quota memory: list[str] = [] try: hits = tex.recall(q=body.text, session_id=sid, top_k=5) @@ -66,10 +177,8 @@ def chat( except (RateLimitError, APITimeoutError): memory = [] - # 2. Generate — your LLM of choice answer = your_llm.complete(body.text, memory=memory) - # 3. Remember (fire and forget — see notes) tex.conversations.remember(session_id=sid, turns=[ {"role": "user", "text": body.text, "timestamp": now_iso()}, {"role": "assistant", "text": answer, "timestamp": now_iso()}, @@ -87,38 +196,45 @@ app.include_router(router) ``` -## Run - -```bash -export TEX_API_KEY=tex_live_... -uvicorn app.main:app --reload -``` - -## Production tweaks - -**Push `remember` to a background task.** Holding the request open for the write adds 100–250ms to tail latency: - -```python -from fastapi import BackgroundTasks - -@router.post("/chat") -def chat(body: ChatBody, bg: BackgroundTasks, tex: Tex = Depends(tex_client)): - ... - bg.add_task(tex.conversations.remember, - session_id=sid, turns=[user_turn, assistant_turn]) - return {"answer": answer} -``` - -**Bound recall latency** with `Tex(timeout=2.0)`; catch `APITimeoutError` and degrade to no-memory generation. - -**Health check:** - -```python -@app.get("/healthz") -def healthz(tex: Tex = Depends(tex_client)): - try: - tex.usage.today() - return {"ok": True} - except Exception as e: - return {"ok": False, "error": str(e)}, 503 -``` +## Prod tweaks + + + + Holding the HTTP request open for `remember` adds ~100–250ms tail latency. You enqueue a background task instead: + + ```python + from fastapi import BackgroundTasks + + @router.post("/chat") + def chat( + body: ChatBody, + bg: BackgroundTasks, + x_user_id: str = Header(...), + tex: Tex = Depends(tex_client), + ): + # ... recall + answer ... 
+ bg.add_task( + tex.conversations.remember, + session_id=sid, + turns=[user_turn, assistant_turn], + ) + return {"answer": answer} + ``` + + + + You set `Tex(timeout=2.0)` and catch `APITimeoutError` so a slow recall never blocks your entire generation window. + + + + ```python + @app.get("/healthz") + def healthz(tex: Tex = Depends(tex_client)): + try: + tex.usage.today() + return {"ok": True} + except Exception as e: + return JSONResponse({"ok": False, "error": str(e)}, status_code=503)  # needs: from fastapi.responses import JSONResponse + ``` + + diff --git a/recipes/langchain.mdx b/recipes/langchain.mdx index 24246cb..c5d8db9 100644 --- a/recipes/langchain.mdx +++ b/recipes/langchain.mdx @@ -1,107 +1,143 @@ --- -title: "Agent memory (LangChain)" -description: "Use Tex as an agent's memory tool." +title: "LangChain agents with memory" +description: "You give LangChain a Tex-backed tool, or you inject recall yourself before the chain—pick what matches how much autonomy you want." +icon: "link" --- -Two patterns: - -1. **Memory tool** — agent explicitly calls `recall` when it needs context. -2. **Pre-prompt injection** — controller code recalls before the LLM call and stuffs results into the prompt. - -Pattern 2 is simpler and almost always sufficient. Pattern 1 is right for true autonomous agents. - -## Pattern 1: memory as a tool - -```python -# pip install tex-sdk langchain langchain-openai -import os -from datetime import datetime, timezone -from tex import Tex -from langchain.tools import tool -from langchain.agents import create_react_agent, AgentExecutor -from langchain_openai import ChatOpenAI - -tex = Tex(api_key=os.environ["TEX_API_KEY"], base_url=os.environ["TEX_BASE_URL"]) -SESSION = "agent-1" - -def now_iso() -> str: - return datetime.now(timezone.utc).isoformat().replace("+00:00", "Z") - -@tool -def recall_memory(query: str) -> str: - """Look up the agent's long-term memory for relevant context.
- Returns up to 5 most-relevant past statements.""" - hits = tex.recall(q=query, session_id=SESSION, top_k=5) - if not hits.hits.turns: - return "(no relevant memory)" - return "\n".join(f"- {h.text}" for h in hits.hits.turns) - -@tool -def remember_fact(text: str) -> str: - """Persist a fact for future recall.""" - tex.conversations.remember(session_id=SESSION, turns=[ - {"role":"system","text":text,"timestamp":now_iso()}, - ]) - return "remembered" - -agent = create_react_agent( - ChatOpenAI(model="gpt-4o"), - tools=[recall_memory, remember_fact, ...], - prompt="...", -) -``` - -The agent decides when to invoke `recall_memory` — useful for multi-step plans where some steps need history and others don't. - -## Pattern 2: injection at the controller level - -```python -import os -from tex import Tex - -tex = Tex(api_key=os.environ["TEX_API_KEY"], base_url=os.environ["TEX_BASE_URL"]) - -def chain(user_msg: str, sid: str) -> str: - # 1. Recall - hits = tex.recall(q=user_msg, session_id=sid, top_k=5) - memory = "\n".join(f"- {h.text}" for h in hits.hits.turns) - - # 2. Build prompt with memory +Two setups: + +| | | +| --- | --- | +| **Inject** | You recall before the chain; most chat apps. | +| **Tools** | Agent calls `recall` when it wants; heavier, flexible. | + +## Inject (default) + + + + ```bash + pip install tex-sdk langchain langchain-openai + ``` + + + + ```python + import os + from tex import Tex + + tex = Tex( + api_key=os.environ["TEX_API_KEY"], + base_url=os.environ["TEX_BASE_URL"], + ) + ``` + + + + ```python + from datetime import datetime, timezone from langchain.prompts import ChatPromptTemplate from langchain_openai import ChatOpenAI - prompt = ChatPromptTemplate.from_messages([ - ("system", "Relevant memory about the user:\n{memory}"), - ("user", "{input}"), - ]) - chain = prompt | ChatOpenAI(model="gpt-4o") - - answer = chain.invoke({"memory": memory, "input": user_msg}).content - - # 3. 
Persist - tex.conversations.remember(session_id=sid, turns=[ - {"role":"user","text":user_msg,"timestamp": now_iso()}, - {"role":"assistant","text":answer,"timestamp": now_iso()}, - ]) - return answer -``` - -## Replacing `BaseChatMemory` + def now_iso() -> str: + return datetime.now(timezone.utc).isoformat().replace("+00:00", "Z") + + def answer_turn(user_msg: str, session_id: str) -> str: + hits = tex.recall(q=user_msg, session_id=session_id, top_k=5) + memory = "\n".join(f"- {h.text}" for h in hits.hits.turns) + + prompt = ChatPromptTemplate.from_messages([ + ("system", "Relevant memory about the user:\n{memory}"), + ("user", "{input}"), + ]) + chain = prompt | ChatOpenAI(model="gpt-4o") + + reply = chain.invoke({"memory": memory, "input": user_msg}).content + + tex.conversations.remember( + session_id=session_id, + turns=[ + {"role": "user", "text": user_msg, "timestamp": now_iso()}, + {"role": "assistant", "text": reply, "timestamp": now_iso()}, + ], + ) + return reply + ``` + + + +## Tools + + + + ```python + import os + from datetime import datetime, timezone + from tex import Tex + from langchain.tools import tool + from langchain.agents import create_react_agent, AgentExecutor + from langchain_openai import ChatOpenAI -LangChain's built-in memory (`ConversationBufferMemory`, etc.) is a buffer. Tex is retrieval. 
The migration is: + tex = Tex(api_key=os.environ["TEX_API_KEY"], base_url=os.environ["TEX_BASE_URL"]) + SESSION = "agent-1" + + def now_iso() -> str: + return datetime.now(timezone.utc).isoformat().replace("+00:00", "Z") + + @tool + def recall_memory(query: str) -> str: + """Look up long-term memory; returns bullet list of statements.""" + hits = tex.recall(q=query, session_id=SESSION, top_k=5) + if not hits.hits.turns: + return "(no relevant memory)" + return "\n".join(f"- {h.text}" for h in hits.hits.turns) + + @tool + def remember_fact(text: str) -> str: + """Persist a fact for later recall.""" + tex.conversations.remember( + session_id=SESSION, + turns=[{"role": "system", "text": text, "timestamp": now_iso()}], + ) + return "remembered" + ``` + + + + ```python + agent = create_react_agent( + ChatOpenAI(model="gpt-4o"), + tools=[recall_memory, remember_fact], + prompt="...", # you supply + ) + + executor = AgentExecutor(agent=agent, tools=[recall_memory, remember_fact]) + ``` + + + + + You merge additional LangChain tools into the same `tools=[...]` list whenever your agent needs them—the Tex tools behave like every other tool. + + +## Vs `BaseChatMemory` + +LangChain buffers keep *everything*; Tex retrieves *top‑k*. You delete sliding-window hacks and stop blowing context windows. + + +```python Before +from langchain.chains import ConversationChain +from langchain.memory import ConversationBufferMemory -```python -# BEFORE — full history in the prompt memory = ConversationBufferMemory() chain = ConversationChain(llm=llm, memory=memory) +``` -# AFTER — top-k relevant turns, regardless of recency +```python After hits = tex.recall(q=user_msg, session_id=sid, top_k=5) prompt = stitch(hits.hits.turns, user_msg) answer = llm.invoke(prompt) -tex.conversations.remember(session_id=sid, turns=[...]) +tex.conversations.remember(session_id=sid, turns=[...]) # fill like your prod code ``` + -You stop maintaining a sliding window. You stop hitting context-length errors. 
The retrieval is bounded. - -See [Migrating from LangChain memory](/migration/from-langchain-memory) for a full walk-through. +You want the full playbook? Read [Migrating from LangChain memory](/migration/from-langchain-memory). diff --git a/recipes/multi-tenant-saas.mdx b/recipes/multi-tenant-saas.mdx index c6173ac..6093057 100644 --- a/recipes/multi-tenant-saas.mdx +++ b/recipes/multi-tenant-saas.mdx @@ -1,108 +1,131 @@ --- -title: "Multi-tenant SaaS" -description: "One Tex key, partitioned per end-user." +title: "Multi-tenant SaaS pattern" +description: "You fan out many customers on one Tex org, or you mint one key per customer—pick the pattern that matches your billing story." +icon: "building" --- -You're building a product where each of your customers has their own private memory. Two patterns; almost everyone wants Pattern A. - -## Pattern A — one Tex key, encode end-user into session (recommended) - -You hold a single Tex API key in your secrets manager. End-users are partitioned by encoding their id into the `session_id` you pass on each call. One shared, long-lived client serves all your traffic. - -```python deps.py -from functools import cache -from tex import Tex -import os - -@cache -def shared_tex() -> Tex: - return Tex( - api_key=os.environ["TEX_API_KEY"], - base_url=os.environ["TEX_BASE_URL"], - ) -``` - -```python chat.py -from fastapi import APIRouter, Header -from .deps import shared_tex - -router = APIRouter() - -@router.post("/chat") -def chat(body: ChatBody, x_user_id: str = Header(...)): - tex = shared_tex() - sid = f"u_{x_user_id}-{body.session_id}" # user-scoped session - - hits = tex.recall(q=body.text, session_id=sid) - answer = your_llm(body.text, memory=hits) - tex.conversations.remember(session_id=sid, turns=[...]) - return {"answer": answer} -``` +You isolate tenants two ways. Most teams stay on **Pattern A**. 
+ +## A — one key, bake user into `session_id` + + + + ```python deps.py + from functools import cache + from tex import Tex + import os + + @cache + def shared_tex() -> Tex: + return Tex( + api_key=os.environ["TEX_API_KEY"], + base_url=os.environ["TEX_BASE_URL"], + ) + ``` + + + + ```python chat.py + from pydantic import BaseModel + from fastapi import APIRouter, Header + from .deps import shared_tex + + router = APIRouter() + + class ChatBody(BaseModel): + text: str + session_id: str + + @router.post("/chat") + def chat(body: ChatBody, x_user_id: str = Header(...)): + tex = shared_tex() + sid = f"u_{x_user_id}-{body.session_id}" + + hits = tex.recall(q=body.text, session_id=sid) + answer = your_llm(body.text, memory=hits) + tex.conversations.remember( + session_id=sid, + turns=[ + {"role": "user", "text": body.text, "timestamp": "2026-05-13T00:00:00Z"}, + {"role": "assistant", "text": answer, "timestamp": "2026-05-13T00:00:00Z"}, + ], + ) + return {"answer": answer} + ``` + + You swap `your_llm(...)` for your stack; you list the turns you already send today. + + | Trait | Pattern A | | --- | --- | -| Number of Tex keys to manage | 1 | -| Bills | 1 (yours) | -| Dashboard access for end-users | None (you build your own UI) | -| Memory isolation | By `session_id` prefix — strict, no cross-talk | -| Latency | Best (one warm client, no per-user TLS) | +| Tex keys you operate | **1** | +| Bills | **1** (yours) | +| Dashboard for end-users | You build it | +| Isolation | Strong, as long as you do not collide `session_id` | - The Tex constructor also accepts `user_id=...`, which sends the value as `scope.user_id` in the request body. That works today, but **the SDK scopes `user_id` per client, not per call** — using it for per-end-user partitioning would force one `Tex` instance per end-user, which is slow. Per-call `user_id` scoping is on the SDK roadmap (1.2). Until then, encode the end-user into `session_id`. 
+ The SDK accepts a constructor `user_id`, but **scopes are per client instance today**. If you spun up one `Tex` per end-user you would thrash TLS pools. Until per-call `user_id` ships in SDK **1.2**, you keep encoding the tenant into `session_id`. -## Pattern B — one Tex key per end-user - -You're *reselling* Tex — your customers want to log into our dashboard with their own account, see their own usage, get their own bill. - -```python signup.py -import httpx - -def onboard_end_user(end_user_email: str) -> str: - """Mint a fresh Tex org for this end-user. Persist the key in your DB.""" - resp = httpx.post( - "https://api.getmetacognition.com/signup", - json={"name": end_user_email}, - ) - resp.raise_for_status() - data = resp.json() - db.users.update(end_user_email, tex_api_key=data["api_key"], tex_org_id=data["org_id"]) - return data["api_key"] -``` - -Then per-request, look up that user's key: - -```python -def tex_for_user(end_user_id: str) -> Tex: - row = db.users.get(end_user_id) - return Tex(api_key=row.tex_api_key, base_url=os.environ["TEX_BASE_URL"]) -``` - -Cache the `Tex` per user-id in a TTL'd dict (≤ 1h) so you reuse connections without keeping every user's client live forever. +## B — one key per customer org + + + + ```python signup.py + import httpx + + def onboard_end_user(end_user_email: str) -> str: + resp = httpx.post( + "https://api.getmetacognition.com/signup", + json={"name": end_user_email}, + ) + resp.raise_for_status() + data = resp.json() + db.users.update( + end_user_email, + tex_api_key=data["api_key"], + tex_org_id=data["org_id"], + ) + return data["api_key"] + ``` + + + + ```python + def tex_for_user(end_user_id: str) -> Tex: + row = db.users.get(end_user_id) + return Tex(api_key=row.tex_api_key, base_url=os.environ["TEX_BASE_URL"]) + ``` + + You cache instances in a TTL map (~1h) so warm connections stick around without leaking every user forever. 
+ + | Trait | Pattern B | | --- | --- | -| Number of Tex keys to manage | One per end-user — store in your DB. | -| Bills | One per end-user (we send each org an invoice). | -| Dashboard access | Each end-user has their own dashboard login. | -| Memory isolation | At the org level — completely separate. | +| Tex keys | **One per paying customer** | +| Bills | **Per customer org** | +| Dashboard | Each customer can log into Tex directly | +| Isolation | Hard boundary at org level | -## Picking +## Which one - You're building an app on top of Tex. Most cases. Cleanest ops. + You ship an app *on top of* Tex—shared infra, simplest ops, you own metering. - - You're reselling Tex as part of your platform. Customers want their own bill. + + You *resell* Tex and customers expect their own bill + console. -## Quota strategy under Pattern A +## Shared quota (A only) + +Daily quotas are **per Tex org**. Under Pattern A every user shares **your** quota—one noisy tenant can starve everyone. -Daily quotas are per-org. Under Pattern A, all end-users share your one quota — so a runaway end-user can starve everyone else. +Mitigations **you** layer in: -Mitigations: -- Track per-user token usage yourself (the `usage` field on every response makes this free). -- Soft-cap each user at, say, 10% of your daily limit; switch them to no-memory generation when they exceed it. -- Use [`tex.usage.today()`](/sdk/usage) to gate non-essential paths when total usage > 90%. +- You track per-user bytes/tokens yourself (`usage` is on every response). +- You soft-cap heavy users (for example switch off memory after they consume 10% of your daily budget). +- You poll [`tex.usage.today()`](/sdk/usage) and degrade gracefully after ~90%. diff --git a/recipes/slack-bot.mdx b/recipes/slack-bot.mdx index 23f5ce4..149405e 100644 --- a/recipes/slack-bot.mdx +++ b/recipes/slack-bot.mdx @@ -1,80 +1,95 @@ --- -title: "Slack channel memory" -description: "Channel-scoped memory for a Slack workspace." 
+title: "Slack bot with channel memory" +description: "You run a Slack Bolt app that remembers every channel message and answers @mentions with Tex recall." +icon: "hashtag" --- -A Slack bot that remembers everything said in each channel and answers contextual questions when summoned. - -## Install - -```bash -pip install tex-sdk slack-bolt python-dotenv -``` - -## Environment - -```bash -SLACK_BOT_TOKEN=xoxb-... -SLACK_APP_TOKEN=xapp-... # if using socket mode -TEX_API_KEY=tex_live_... -TEX_BASE_URL=https://api.getmetacognition.com -``` - -## Code - -```python bot.py -import os -from datetime import datetime, timezone -from slack_bolt import App -from slack_bolt.adapter.socket_mode import SocketModeHandler -from tex import Tex - -tex = Tex(api_key=os.environ["TEX_API_KEY"], base_url=os.environ["TEX_BASE_URL"]) -app = App(token=os.environ["SLACK_BOT_TOKEN"]) - -def now_iso() -> str: - return datetime.now(timezone.utc).isoformat().replace("+00:00", "Z") - -def session_for(channel: str) -> str: - return f"slack-{channel}" - -@app.event("message") -def remember_message(event, say): - """Remember every non-bot message in the channel.""" - if event.get("subtype") or event.get("bot_id"): - return - tex.conversations.remember( - session_id=session_for(event["channel"]), - turns=[{ - "role": "user", - "text": f"<@{event['user']}>: {event['text']}", - "timestamp": now_iso(), - }], - ) - -@app.event("app_mention") -def answer(event, say): - """When mentioned, answer with channel-scoped memory.""" - query = event["text"].split(">", 1)[-1].strip() - if not query: - say("Ask me something — I'll dig through this channel's memory.") - return - - hits = tex.recall(q=query, session_id=session_for(event["channel"]), top_k=5) - - if not hits.hits.turns or hits.confidence < 0.2: - say(f"<@{event['user']}> I don't have anything relevant in memory yet.") - return - - body = "\n".join(f"• {h.text}" for h in hits.hits.turns[:3]) - say(f"<@{event['user']}> here's what I remember:\n{body}") 
- -if __name__ == "__main__": - SocketModeHandler(app, os.environ["SLACK_APP_TOKEN"]).start() -``` - -## Patterns - -- **Channel vs DM memory.** `slack-{channel}` shares memory across everyone in the channel. For private notes use `slack-{channel}-{user}`. -- **Skip noisy messages.** Filter joins, leaves, uploads, and reactions before `remember`. Saves tokens and reduces recall noise. -- **Acknowledge slow recalls.** `recall` takes 1–3s. React with `:thinking_face:` immediately, then post the answer. +You mirror each Slack channel into its own Tex `session_id`. You **listen** for plain messages to `remember`, and you **listen** for app mentions to `recall` + reply. + + + + ```bash + pip install tex-sdk slack-bolt python-dotenv + ``` + + + + You load classic bot + app tokens (socket mode shown here): + + ```bash + SLACK_BOT_TOKEN=xoxb-... + SLACK_APP_TOKEN=xapp-... + TEX_API_KEY=tex_live_... + TEX_BASE_URL=https://api.getmetacognition.com + ``` + + + + You ignore bot spam, remember human text, and answer mentions with recall: + + ```python bot.py + import os + from datetime import datetime, timezone + from slack_bolt import App + from slack_bolt.adapter.socket_mode import SocketModeHandler + from tex import Tex + + tex = Tex(api_key=os.environ["TEX_API_KEY"], base_url=os.environ["TEX_BASE_URL"]) + app = App(token=os.environ["SLACK_BOT_TOKEN"]) + + def now_iso() -> str: + return datetime.now(timezone.utc).isoformat().replace("+00:00", "Z") + + def session_for(channel: str) -> str: + return f"slack-{channel}" + + @app.event("message") + def remember_message(event, say): + if event.get("subtype") or event.get("bot_id"): + return + tex.conversations.remember( + session_id=session_for(event["channel"]), + turns=[{ + "role": "user", + "text": f"<@{event['user']}>: {event['text']}", + "timestamp": now_iso(), + }], + ) + + @app.event("app_mention") + def answer(event, say): + query = event["text"].split(">", 1)[-1].strip() + if not query: + say("Ask me something — I'll dig 
through this channel's memory.") + return + + hits = tex.recall(q=query, session_id=session_for(event["channel"]), top_k=5) + + if not hits.hits.turns or hits.confidence < 0.2: + say(f"<@{event['user']}> I don't have anything relevant in memory yet.") + return + + body = "\n".join(f"• {h.text}" for h in hits.hits.turns[:3]) + say(f"<@{event['user']}> here's what I remember:\n{body}") + + if __name__ == "__main__": + SocketModeHandler(app, os.environ["SLACK_APP_TOKEN"]).start() + ``` + + + + ```bash + python bot.py + ``` + + You type in channel, then `@YourBot what did we decide?` to verify recall. + + + +## Slack tweaks + +| Topic | What you do | +| --- | --- | +| **Channel vs DM** | You keep `slack-{channel}` for shared rooms; you append `-{user}` if you need private scratch space. | +| **Noise** | You filter joins, uploads, reactions **before** `remember` so you do not pay tokens for junk. | +| **Latency** | You react with `:thinking_face:` immediately—`recall` can take 1–3s — then you post the final text. | diff --git a/recipes/streamlit.mdx b/recipes/streamlit.mdx index 4a60426..9478e4b 100644 --- a/recipes/streamlit.mdx +++ b/recipes/streamlit.mdx @@ -1,20 +1,142 @@ --- -title: "Chat UI (Streamlit)" -description: "A chat UI with persistent memory in 50 lines." +title: "Streamlit chat UI" +description: "You build one Streamlit page that recalls from Tex, streams GPT output, and shows which memories fired." +icon: "desktop" --- -A single-page Streamlit chat that uses Tex for memory across page reloads, browser refreshes, and even cleared sessions. +You get a **browser-friendly demo** that survives reruns because Tex—not `st.session_state`—owns long-term memory. You still keep lightweight UI state only for what Streamlit redraw needs. 
-## Install + + + ```bash + pip install tex-sdk streamlit openai + ``` + -```bash -pip install tex-sdk streamlit openai -``` + + You wrap `Tex` and `OpenAI` in `@st.cache_resource` so Streamlit does not recreate TLS pools every interaction: + + ```python + import os + import streamlit as st + from openai import OpenAI + from tex import Tex + + st.set_page_config(page_title="Tex chat", page_icon="🧠") + + @st.cache_resource + def get_clients(): + tex = Tex( + api_key=os.environ["TEX_API_KEY"], + base_url=os.environ.get("TEX_BASE_URL", "https://api.getmetacognition.com"), + ) + gpt = OpenAI() + return tex, gpt + + tex, gpt = get_clients() + ``` + + + + You store a stable `sid` in session state; optionally you read `?uid=` from the query string so QA can fork personas quickly: + + ```python + if "sid" not in st.session_state: + st.session_state.sid = f"web-{st.query_params.get('uid', 'anon')}-default" + sid = st.session_state.sid + + if "messages" not in st.session_state: + st.session_state.messages = [] + ``` + + + + You call `tex.usage.today()` for a quick quota sanity check during demos: + + ```python + with st.sidebar: + st.write("## Usage today") + today = tex.usage.today() + pct_in = min(1.0, today.tokens_in_used / max(1, today.tokens_in_limit)) + pct_out = min(1.0, today.tokens_out_used / max(1, today.tokens_out_limit)) + st.progress(pct_in, f"in: {today.tokens_in_used:,} / {today.tokens_in_limit:,}") + st.progress(pct_out, f"out: {today.tokens_out_used:,} / {today.tokens_out_limit:,}") + st.caption(f"session: `{sid}`") + ``` + + + + You paint historic bubbles from `st.session_state.messages`, then watch `st.chat_input`: + + ```python + for m in st.session_state.messages: + with st.chat_message(m["role"]): + st.write(m["text"]) + + if prompt := st.chat_input("Talk to me…"): + import datetime + now = datetime.datetime.now(datetime.timezone.utc).isoformat().replace("+00:00", "Z") + + with st.chat_message("user"): + st.write(prompt) + + with 
st.chat_message("assistant"): + with st.spinner("Recalling…"): + hits = tex.recall(q=prompt, session_id=sid, top_k=5) + + if hits.hits.turns: + st.caption(f"confidence {hits.confidence:.2f}") + with st.expander("Memory used"): + for h in hits.hits.turns: + st.write(f"`{h.score:.2f}` — {h.text}") + + memory_block = "\n".join(f"- {h.text}" for h in hits.hits.turns) + sys_prompt = f"You are a thoughtful assistant. Memory:\n{memory_block}" + + chat = gpt.chat.completions.create( + model="gpt-4o-mini", + messages=[ + {"role": "system", "content": sys_prompt}, + {"role": "user", "content": prompt}, + ], + stream=True, + ) + answer = st.write_stream( + chunk.choices[0].delta.content or "" for chunk in chat + ) + + tex.conversations.remember( + session_id=sid, + turns=[ + {"role": "user", "text": prompt, "timestamp": now}, + {"role": "assistant", "text": answer, "timestamp": now}, + ], + ) + st.session_state.messages += [ + {"role": "user", "text": prompt}, + {"role": "assistant", "text": answer}, + ] + ``` + + + + ```bash + export TEX_API_KEY=tex_live_... + export OPENAI_API_KEY=sk-... + streamlit run app.py + ``` -## Code + You open `http://localhost:8501/?uid=alice` and `?uid=bob` in two tabs to prove isolation. + + + +## One file + +If you want one block to paste, use this full `app.py` (same logic as the steps above): ```python app.py -import os, datetime +import os +import datetime import streamlit as st from openai import OpenAI from tex import Tex @@ -27,21 +149,18 @@ def get_clients(): api_key=os.environ["TEX_API_KEY"], base_url=os.environ.get("TEX_BASE_URL", "https://api.getmetacognition.com"), ) - gpt = OpenAI() # or AzureOpenAI(...) 
+ gpt = OpenAI() return tex, gpt tex, gpt = get_clients() -# Stable session for the browser tab — survives reloads if "sid" not in st.session_state: st.session_state.sid = f"web-{st.query_params.get('uid', 'anon')}-default" sid = st.session_state.sid -# Chat history we display this page render only — Tex is the source of truth if "messages" not in st.session_state: st.session_state.messages = [] -# Sidebar: usage + memory stats with st.sidebar: st.write("## Usage today") today = tex.usage.today() @@ -51,25 +170,22 @@ with st.sidebar: st.progress(pct_out, f"out: {today.tokens_out_used:,} / {today.tokens_out_limit:,}") st.caption(f"session: `{sid}`") -# Render chat for m in st.session_state.messages: with st.chat_message(m["role"]): st.write(m["text"]) -# Input if prompt := st.chat_input("Talk to me…"): - now = datetime.datetime.utcnow().isoformat() + "Z" + now = datetime.datetime.now(datetime.timezone.utc).isoformat().replace("+00:00", "Z") with st.chat_message("user"): st.write(prompt) with st.chat_message("assistant"): - # Recall with st.spinner("Recalling…"): hits = tex.recall(q=prompt, session_id=sid, top_k=5) if hits.hits.turns: - st.caption(f"📚 confidence {hits.confidence:.2f}") + st.caption(f"confidence {hits.confidence:.2f}") with st.expander("Memory used"): for h in hits.hits.turns: st.write(f"`{h.score:.2f}` — {h.text}") @@ -77,7 +193,6 @@ if prompt := st.chat_input("Talk to me…"): memory_block = "\n".join(f"- {h.text}" for h in hits.hits.turns) sys_prompt = f"You are a thoughtful assistant. 
Memory:\n{memory_block}" - # Generate chat = gpt.chat.completions.create( model="gpt-4o-mini", messages=[ @@ -88,29 +203,21 @@ if prompt := st.chat_input("Talk to me…"): ) answer = st.write_stream(chunk.choices[0].delta.content or "" for chunk in chat) - # Persist + update local state - tex.conversations.remember(session_id=sid, turns=[ - {"role":"user","text": prompt, "timestamp": now}, - {"role":"assistant","text": answer, "timestamp": now}, - ]) + tex.conversations.remember( + session_id=sid, + turns=[ + {"role": "user", "text": prompt, "timestamp": now}, + {"role": "assistant", "text": answer, "timestamp": now}, + ], + ) st.session_state.messages += [ - {"role":"user","text": prompt}, - {"role":"assistant","text": answer}, + {"role": "user", "text": prompt}, + {"role": "assistant", "text": answer}, ] ``` -## Run - -```bash -export TEX_API_KEY=tex_live_... -export OPENAI_API_KEY=sk-... -streamlit run app.py -``` - -Open with `?uid=alice` in the URL to scope memory to "alice" — `?uid=bob` gets a separate memory pool. - -## Why this works +## Why bother -- **Refresh-proof** — Streamlit's session state dies on refresh; Tex doesn't. -- **No glue code** — no Postgres / Redis layer for chat history. -- **Built-in debug panel** — the expander shows exactly which memory items the model saw. +- **You survive refreshes** — Streamlit session dies; Tex does not. +- **You skip Redis/Postgres** for early demos. +- **You debug recall visually** — expander shows exactly what the model saw. diff --git a/sdk/client.mdx b/sdk/client.mdx index 82fbcad..6316ad1 100644 --- a/sdk/client.mdx +++ b/sdk/client.mdx @@ -1,6 +1,7 @@ --- title: "Configure the client" -description: "Constructor reference, environment variables, lifecycle." +description: "Constructor arguments, TEX_API_KEY and TEX_BASE_URL, HTTP/2, timeouts, and how the client lives across requests." 
+icon: "sliders" --- ## `Tex(...)` constructor diff --git a/sdk/conversations-remember.mdx b/sdk/conversations-remember.mdx index e62e684..dd1c102 100644 --- a/sdk/conversations-remember.mdx +++ b/sdk/conversations-remember.mdx @@ -1,6 +1,7 @@ --- -title: "Remember" -description: "Persist conversation turns to memory." +title: "Remember conversation turns" +description: "Persist turns (and optional observations) to a session—payload shape, timestamps, and what happens on the wire." +icon: "pen-to-square" --- ```python diff --git a/sdk/errors.mdx b/sdk/errors.mdx index cabeda6..dd8744e 100644 --- a/sdk/errors.mdx +++ b/sdk/errors.mdx @@ -1,6 +1,7 @@ --- -title: "Handle errors" -description: "Every exception class and the SDK's automatic retry behavior." +title: "Errors and retries" +description: "Exception types, which status codes retry automatically, and what to log before you open a ticket." +icon: "triangle-exclamation" --- All Tex exceptions inherit from `tex.TexError`. You almost always want to catch one of: @@ -151,7 +152,7 @@ tex = Tex(api_key=..., max_retries=5) The `Retry-After` header is honored — if the server says wait 3 seconds, the SDK waits at least 3 (not capped at the backoff value). - `429` from the daily-quota path retries the same way other 429s do — but you'll still be over quota when the retry hits, so the `RateLimitError` ultimately surfaces. If that round-trip waste matters, set `max_retries=0` on quota-sensitive paths. + `429` on quota retries the same as other `429`s—but you are still over the cap when the retry lands, so `RateLimitError` still fires. If that waste bugs you, set `max_retries=0` on quota-sensitive paths. ## Idempotency diff --git a/sdk/installation.mdx b/sdk/installation.mdx index 569bc01..5ace042 100644 --- a/sdk/installation.mdx +++ b/sdk/installation.mdx @@ -1,55 +1,64 @@ --- -title: "Install" -description: "Install tex-sdk and verify the install." 
+title: "Install the SDK" +description: "Python 3.9+, pip or uv or poetry, verify the import, pin a version range in requirements." +icon: "download" --- ## Requirements - Python ≥ 3.9 -- `pip`, `uv`, or `poetry` +- `pip`, `uv`, or `poetry`—pick whichever you already standardize on ## Install -```bash + +```bash pip pip install tex-sdk ``` -Or `uv add tex-sdk` / `poetry add tex-sdk`. +```bash uv +uv add tex-sdk +``` + +```bash poetry +poetry add tex-sdk +``` + - PyPI distribution is `tex-sdk`. Import name is `tex`: + PyPI lists the package as **`tex-sdk`**. You **`import tex`** in Python: ```python from tex import Tex ``` -## Verify +## Check version ```bash python -c "import tex; print(tex.__version__)" # 1.1.0 ``` -## Pin a version +## Pin versions ```text requirements.txt tex-sdk>=1.1.0,<2 ``` -For PEP 621 `pyproject.toml`, add `"tex-sdk>=1.1.0,<2"` under `[project].dependencies`. For Poetry, `tex-sdk = "^1.1.0"` under `[tool.poetry.dependencies]`. +For PEP 621 `pyproject.toml`, add `"tex-sdk>=1.1.0,<2"` under `[project].dependencies`. For Poetry, use `tex-sdk = "^1.1.0"` under `[tool.poetry.dependencies]` so minor releases flow in without a major bump surprising you. -## Optional extras +## HTTP/2 and grumpy proxies -The SDK pulls in `httpx[http2]` automatically, which gives you HTTP/2 multiplexing. If your egress proxy strips HTTP/2, disable it: +The SDK depends on `httpx[http2]` for multiplexing. If your corporate proxy strips HTTP/2, disable it when you construct the client: ```python tex = Tex(api_key=..., http2=False) ``` -## Type hints +## Types -The SDK ships type stubs (`py.typed`). Mypy and Pyright pick them up automatically — no `types-` package needed. +Stubs ship with the package (`py.typed`). Mypy and Pyright pick them up automatically—you do not need a separate `types-` stub distribution. ```python from tex import Tex, RecallResponse @@ -60,5 +69,5 @@ reveal_type(hits.confidence) # float ``` - Constructor options, environment variables, lifecycle. 
+ Environment variables, timeouts, and lifecycle hooks. diff --git a/sdk/recall.mdx b/sdk/recall.mdx index 499b47c..6d2bc2f 100644 --- a/sdk/recall.mdx +++ b/sdk/recall.mdx @@ -1,6 +1,7 @@ --- -title: "Recall" -description: "Pull the most relevant slice of memory for a query." +title: "Recall relevant context" +description: "Query memory with natural language, get ranked hits, confidence, and usage—all from the Python client." +icon: "magnifying-glass" --- ```python diff --git a/sdk/usage.mdx b/sdk/usage.mdx index 92fc3bb..2245856 100644 --- a/sdk/usage.mdx +++ b/sdk/usage.mdx @@ -1,6 +1,7 @@ --- -title: "Track usage" -description: "Read your org's token totals." +title: "Inspect usage" +description: "Today's totals and monthly rollups—same fields as the dashboard." +icon: "chart-pie" --- ```python diff --git a/snippets/flow-visuals.mdx b/snippets/flow-visuals.mdx new file mode 100644 index 0000000..e1b85a3 --- /dev/null +++ b/snippets/flow-visuals.mdx @@ -0,0 +1,116 @@ +export const PipelineFlow = ({ steps, caption }) => ( +
+
+ {caption && ( +
+ {caption} +
+ )} +
+ {steps.map((step, i) => ( +
+
+ {step.phase && ( +
+ {step.phase} +
+ )} +
{step.label}
+ {step.hint && ( +
{step.hint}
+ )} +
+ {i < steps.length - 1 && ( +
+ ↓ +
+ )} +
+ ))} +
+
+
+); + +export const AuthSequence = ({ phases }) => ( +
+
+
+ {phases.map((p, i) => ( +
+
+ {i + 1} +
+
+
{p.title}
+
+ {p.detail} +
+
+
+ ))} +
+
+
+); + +export const TokenRetryVisual = () => ( +
+
+
+
+
+ Start +
+
+ A normal request comes back 401 +
+
+ Usually the access token expired; refresh might still work. +
+
+
+
+
+ First try +
+
POST /auth/refresh
+
+ If that returns 200, you get a new access token and retry what you were doing. +
+
+
+ ↓ if refresh is also 401 +
+
+
+ Fallback +
+
+ POST /auth/token-exchange +
+
+ Send your API key again. 200 means retry with the new token. 401 means the key is dead—you need a new
+ one or a proper login flow.
+
+
+

+ In most SDK setups a single failed request can walk through refresh (and then exchange) before your code
+ surfaces an error—unless the full chain returns 401.

+
+
+); diff --git a/troubleshooting.mdx b/troubleshooting.mdx index a22af6b..cab97a5 100644 --- a/troubleshooting.mdx +++ b/troubleshooting.mdx @@ -1,42 +1,70 @@ --- title: "Troubleshooting" -description: "Common symptoms, root causes, and fixes." +description: "Symptoms, likely causes, fixes, and what we need in a support email when you are stuck." +icon: "wrench" --- -## Symptom → cause → fix + + Scan the **symptom table** first and match the log line or exception you are seeing. Open the accordions below when you need more narrative around a fix. + + +## Match your symptom | Symptom | Likely cause | Fix | | --- | --- | --- | -| `AuthenticationError: Invalid API key` on first call | Wrong key, key revoked, or wrong `base_url` | Re-mint at the [dashboard](https://app.getmetacognition.com); verify `TEX_BASE_URL` | -| `BadRequestError: 'scope' field required` | You're calling REST directly without the SDK | Use the SDK; it builds `scope` for you. Or include `scope: {org_id, session_id}` in the body. 
| -| `recall` returns 0 hits | New session, or memory hasn't finished passive enrichment yet | Wait 1–2s after `remember`; query a broader `q`; switch to `mode="deep"` | -| `recall.confidence` always low | Your `q` doesn't match anything stored | Re-phrase; switch to `mode="deep"`; raise `top_k` | -| `RateLimitError` mid-day | Hit the daily quota | Wait until 00:00 UTC, reduce `top_k`, pre-filter writes | -| Slow `recall` (> 5s) | Likely `mode="deep"` or cold cache | Use `mode="active"`; warm-start the client | -| `httpx.RemoteProtocolError` / HTTP/2 issues | Egress proxy strips h2 | `Tex(http2=False)` | -| Long `remember` blocks the user | You're awaiting it on the request path | Push to a background worker — see [FastAPI recipe](/recipes/fastapi#production-tweaks) | -| Tests are flaky against Tex | You're hitting the real cluster from CI | Use a dedicated CI org and a daily-cleanup script | -| `last_used_at` on the dashboard isn't updating | Caching / stale UI | Hard refresh; the field updates within seconds of a real call | +| `AuthenticationError: Invalid API key` on first call | Wrong key, revoked key, or wrong `base_url` | Mint a fresh key in the [dashboard](https://app.getmetacognition.com); confirm `TEX_BASE_URL` matches the environment you think you are hitting | +| `BadRequestError: 'scope' field required` | Raw REST call without the scaffolding the SDK adds | Use the SDK, or include `scope: {org_id, session_id}` yourself in the JSON body | +| `recall` returns zero hits | Brand-new session, enrichment still catching up, or query mismatch | Wait a second after `remember`; broaden `q`; try `mode="deep"` once to see if signal appears | +| `recall.confidence` looks stuck low | Query does not overlap stored content | Rephrase closer to the stored wording; raise `top_k`; try `mode="deep"` | +| `RateLimitError` mid-day | Org exhausted daily quota | Wait for **00:00 UTC**, lower `top_k`, or trim noisy writes—see [Usage and billing](/concepts/usage-billing) | +| 
Slow `recall` (> 5s) | `mode="deep"` or cold caches | Default to `mode="active"` in user-facing paths; warm the client on deploy | +| `httpx.RemoteProtocolError` / HTTP/2 noise | Middlebox stripped HTTP/2 | Instantiate with `Tex(http2=False)` | +| Long `remember` blocks UX | You await ingestion on the hot request path | Push persistence to a worker—pattern in the [FastAPI recipe](/recipes/fastapi#production-tweaks) | +| Flaky CI against prod | Shared org contention or dirty sessions | Give CI its own org and sweep data on a schedule | +| Dashboard `last_used_at` looks stale | Browser cache | Hard refresh; the field updates within seconds of real traffic | -## Filing a ticket +## Extra + + + + SDK clients cache JWTs until expiry. After you deploy a new `TEX_API_KEY`, restart workers or recreate the client so they stop presenting tokens minted from the old secret. Confirm you did not typo `https://api.getmetacognition.com` in staging configs. + -Include: + + Recall ranks by relevance, not chronological order—set `include_timeline=True` when you need strict time order for the model. If overlap is still weak, your `session_id` might not match the one you wrote with—re-read [Scopes and multi-tenancy](/concepts/scopes). + + + + Small jitter can come from rerank tie-breaks keyed by hashes. If spread exceeds ~0.1, capture `request_id` and file a ticket—something else is going on. + + + +## Filing a ticket -1. The full `e.request_id` from the exception (a UUID). -2. The approximate timestamp (UTC). -3. The verb you called (`recall`, `remember`, …) and the `session_id`. -4. The SDK version (`tex.__version__`). -5. Anything sensitive — *redact before sharing.* + + + Copy `e.request_id` from any SDK exception—UUID-shaped, safe to share. + + + Give us the approximate UTC time the call failed. + + + Include the method (`recall`, `remember`, …), `session_id`, and whether you hit REST or the SDK. + + + Run `python -c "import tex; print(tex.__version__)"`. 
Scrub API keys or PII before you press send. + + -Email `support@getmetacognition.com` or open an issue on [GitHub](https://github.com/metacoglabs). +You can email `support@getmetacognition.com` or open an issue on [GitHub](https://github.com/metacoglabs). -## Diagnostics — quick checks +## Copy-paste diagnostics ```python # 1. Auth works? -tex.usage.today() # if this raises AuthenticationError, your key is bad +tex.usage.today() # AuthenticationError here means your key or host is wrong -# 2. Round-trip works? +# 2. Round-trip latency? import time t0 = time.perf_counter() tex.recall(q="ping", session_id="diag") @@ -46,8 +74,8 @@ print(f"recall RTT: {(time.perf_counter()-t0)*1000:.0f}ms") import tex; print(tex.__version__) ``` -## Common-but-not-bugs +## Things that look like bugs but are not -- **Turns out of order?** Recall ranks by relevance, not chronology. For chronological order, set `include_timeline=True`. -- **Confidence varies between identical queries?** Some randomness is intrinsic (rerank ties broken by hash). Variation > 0.1 suggests an actual issue — file a ticket. -- **`session_id` mismatch returns nothing.** By design — sessions are isolated. See [scopes](/concepts/scopes) for cross-session patterns. +- **Hits sorted funny:** relevance ordering is intentional—use timeline mode when you need chronology. +- **Empty cross-session recall:** sessions are isolated until you design a scope strategy—see [Scopes and multi-tenancy](/concepts/scopes). +- **Identical queries, tiny score deltas:** expect minor movement; large swings merit a ticket. From 820c5facd9c12cca5ccc4163e5e818571e27411f Mon Sep 17 00:00:00 2001 From: venkat1701 Date: Sun, 17 May 2026 12:48:39 +0530 Subject: [PATCH 2/2] docs: enhance clarity and consistency across multiple sections - Updated language in authentication, benchmarks, and introduction for improved readability and precision. 
- Revised descriptions and details in API reference, including token management and memory ingestion/recall processes. - Streamlined troubleshooting guidance and symptom-response table for better user experience. - Adjusted formatting and structure in quickstart and changelog for enhanced scannability. - Ensured consistent terminology and phrasing throughout documentation to align with user feedback. --- api-reference/account/api-keys.mdx | 18 ++--- api-reference/account/me.mdx | 10 +-- api-reference/auth/refresh.mdx | 12 +-- api-reference/auth/signup.mdx | 12 +-- api-reference/auth/token-exchange.mdx | 8 +- api-reference/memory/ingest-memory.mdx | 16 ++-- api-reference/memory/recall.mdx | 16 ++-- api-reference/overview.mdx | 22 +++--- api-reference/usage/summary.mdx | 8 +- api-reference/usage/today.mdx | 8 +- authentication.mdx | 38 +++++----- benchmarks.mdx | 32 ++++---- changelog.mdx | 8 +- concepts/memory-model.mdx | 42 ++++++----- concepts/retrieval.mdx | 45 ++++++----- concepts/scopes.mdx | 18 +++-- concepts/usage-billing.mdx | 18 +++-- introduction.mdx | 56 +++++++------- migration/from-langchain-memory.mdx | 28 ++++--- migration/from-redis.mdx | 22 +++--- migration/from-supermemory.mdx | 18 ++--- quickstart.mdx | 40 ++++++---- recipes/azure-openai-rag.mdx | 12 +-- recipes/fastapi.mdx | 100 +++++++++++++------------ recipes/langchain.mdx | 22 +++--- recipes/multi-tenant-saas.mdx | 24 +++--- recipes/slack-bot.mdx | 16 ++-- recipes/streamlit.mdx | 26 ++++--- sdk/client.mdx | 24 +++--- sdk/conversations-remember.mdx | 30 ++++---- sdk/errors.mdx | 50 +++++++------ sdk/installation.mdx | 18 +++-- sdk/recall.mdx | 40 +++++----- sdk/usage.mdx | 14 ++-- snippets/flow-visuals.mdx | 4 +- troubleshooting.mdx | 40 +++++----- 36 files changed, 480 insertions(+), 435 deletions(-) diff --git a/api-reference/account/api-keys.mdx b/api-reference/account/api-keys.mdx index 2876262..e73924c 100644 --- a/api-reference/account/api-keys.mdx +++ 
b/api-reference/account/api-keys.mdx @@ -1,11 +1,11 @@ --- title: "List, create, and revoke API keys" -description: "Manage long-lived keys for your org—mint, list metadata, and revoke without taking production down." +description: "List, create, and revoke long-lived API keys for your org." --- ## `GET /me/api-keys` -Lists keys for the current **org** (sorted by `created_at` desc). Same shape as `GET /me`'s `api_keys` field. +Lists keys for the current **org**, sorted by `created_at` descending. The response matches the `api_keys` field in `GET /me`. ```http GET /me/api-keys[?include_revoked=true] @@ -18,7 +18,7 @@ Authorization: Bearer ## `POST /me/api-keys` -Mints a new key. Returns the plaintext value **once.** +Creates a new key. The plaintext value is returned **once**. ```http POST /me/api-keys @@ -50,7 +50,7 @@ Content-Type: application/json - Metadata (id, prefix, display_id, name, …). + Metadata for the key, including id, prefix, display_id, name, scopes, and timestamps. ```json @@ -84,13 +84,13 @@ Authorization: Bearer - Revocation is **irreversible.** Existing JWTs minted from a revoked key keep working until they expire (max 24h after revocation). + Revocation is **irreversible.** Existing JWTs created from a revoked key keep working until they expire, up to 24h after revocation. ## Examples -```bash cURL — list, mint, revoke +```bash cURL - list, mint, revoke # List active keys curl -H "Authorization: Bearer $JWT" \ https://api.getmetacognition.com/me/api-keys @@ -126,6 +126,6 @@ httpx.delete(f"{api}/me/api-keys/{key_id}", headers=H) ## Operational tips -- One key per environment. Mint `production`, `staging`, `local-dev-sauhard` separately. -- Set `last_used_at` thresholds in your monitoring — alert if a key hasn't been used in 30 days (probably abandoned). -- Don't share keys across services — each service gets its own so revocation has surgical blast radius. +- One key per environment. Mint `production`, `staging`, and `local-dev` separately. 
+- Alert if a key has not been used in 30 days. It may be abandoned. +- Do not share keys across services. Give each service its own key so revocation is narrow. diff --git a/api-reference/account/me.mdx b/api-reference/account/me.mdx index 1fbb926..42ed96c 100644 --- a/api-reference/account/me.mdx +++ b/api-reference/account/me.mdx @@ -1,10 +1,10 @@ --- title: "Get current organization" -description: "Authenticated org, user, and all API keys visible to that org." +description: "Read the authenticated org, user, and visible API keys." api: "GET https://api.getmetacognition.com/me" --- -Returns the authenticated org and user, plus all API keys for that **org** (not just the calling user). +Returns the authenticated org and user, plus all API keys visible to that **org**. ## Headers @@ -23,7 +23,7 @@ Authorization: Bearer - Array of API key metadata for the org (sorted by `created_at` desc). Plaintext keys are **not** included — only metadata. To get the plaintext value, mint a new key via [`POST /me/api-keys`](/api-reference/account/api-keys). + API key metadata for the org, sorted by `created_at` descending. Plaintext keys are **not** included. To get a plaintext key, create a new one with [`POST /me/api-keys`](/api-reference/account/api-keys). ## Example @@ -68,6 +68,6 @@ print(me["org_id"], "→", len(me["api_keys"]), "keys") ## When to use -- Build a "logged-in as…" UI in your own product. +- Build a "logged-in as" UI in your own product. - Audit which keys are still active before a rotation. -- Verify that `last_used_at` advances after a deploy (sanity check). +- Verify that `last_used_at` advances after a deploy. diff --git a/api-reference/auth/refresh.mdx b/api-reference/auth/refresh.mdx index b172eb0..c45b0f0 100644 --- a/api-reference/auth/refresh.mdx +++ b/api-reference/auth/refresh.mdx @@ -1,13 +1,13 @@ --- title: "Refresh access token" -description: "Rotate access tokens using a valid refresh token; the SDK calls this after a 401." 
+description: "Use a refresh token to get a new access token." api: "POST https://api.getmetacognition.com/auth/refresh" icon: "arrows-rotate" --- -import { TokenRetryVisual } from "../../snippets/flow-visuals.mdx"; +import { TokenRetryVisual } from "/snippets/flow-visuals.mdx"; -Use this when an `access_token` has expired but the `refresh_token` is still valid. The SDK does this automatically on 401. +Use this when an `access_token` has expired and the `refresh_token` is still valid. The SDK does this automatically after a 401. ## Body @@ -28,7 +28,7 @@ Use this when an `access_token` has expired but the `refresh_token` is still val - Possibly rotated; persist whichever the response returns. + May be rotated. Store the value returned by the response. @@ -61,10 +61,10 @@ tokens = resp.json() ## When refresh fails -If the refresh token is itself expired (>7d old) or revoked (the underlying API key was revoked), `/auth/refresh` returns `401`. Fall back to `/auth/token-exchange` with the original API key — or, if the API key is also gone, surface a re-login flow. +If the refresh token is expired (more than 7 days old) or revoked, `/auth/refresh` returns `401`. At that point, call `/auth/token-exchange` with the original API key. If the API key is also gone, ask the user or service to authenticate again. ### After 401 -Without the SDK, implement the same sequence in your HTTP client. With the SDK you usually only surface `AuthenticationError` if refresh **and** exchange both fail. +Without the SDK, implement this sequence in your HTTP client. With the SDK, `AuthenticationError` usually means refresh **and** exchange both failed. diff --git a/api-reference/auth/signup.mdx b/api-reference/auth/signup.mdx index bc0ad42..c688276 100644 --- a/api-reference/auth/signup.mdx +++ b/api-reference/auth/signup.mdx @@ -1,10 +1,10 @@ --- title: "Sign up" -description: "Create an org and receive your first API key—plaintext key is shown once in the response." 
+description: "Create an org and receive the first API key." api: "POST https://api.getmetacognition.com/signup" --- -Creates a new organization, a default user inside it, and the first API key for that user. The plaintext API key appears in the response **once and only once**. +Creates a new organization, a default user, and the first API key. The plaintext API key appears in the response **once**. ## Body @@ -16,7 +16,7 @@ Creates a new organization, a default user inside it, and the first API key for ``` - Optional org id. Auto-generated as `org_<10-char base62>` when omitted. Must be **alphanumeric** (with `-` or `_` allowed) and **≤ 64 chars**. Server returns `409 Conflict` if the id already exists. + Optional org id. If omitted, the server generates `org_<10-char base62>`. Must be alphanumeric, with `-` or `_` allowed, and at most 64 chars. Returns `409 Conflict` if the id already exists. @@ -34,7 +34,7 @@ Creates a new organization, a default user inside it, and the first API key for - The plaintext API key. **Save it now — there's no recovery.** + The plaintext API key. **Save it now. It cannot be recovered later.** @@ -82,11 +82,11 @@ api_key = data["api_key"] # store this securely ``` - The `api_key` field appears only in this response. Persist it server-side. We can't recover or re-display it. + The `api_key` field appears only in this response. Store it server-side. We cannot recover or re-display it. - This endpoint is unauthenticated by design — anyone can create an org. To gate it (e.g. private launch), put your existing auth provider in front (Auth0, Clerk) and only forward to `/signup` after their checks pass. + This endpoint is unauthenticated by design. Anyone can create an org. For a private launch, put your existing auth provider in front and only forward to `/signup` after your checks pass. 
 ## Errors
diff --git a/api-reference/auth/token-exchange.mdx b/api-reference/auth/token-exchange.mdx
index 1dcb8e6..5e5c0ee 100644
--- a/api-reference/auth/token-exchange.mdx
+++ b/api-reference/auth/token-exchange.mdx
@@ -1,10 +1,12 @@
 ---
 title: "Exchange API key for tokens"
-description: "Trade an API key for short-lived access + refresh JWTs—mirrors what the Python SDK does on first call."
+description: "Exchange an API key for short-lived access and refresh JWTs."
 api: "POST https://api.getmetacognition.com/auth/token-exchange"
 ---
 
-Trades an API key for a JWT access/refresh pair. The Python SDK does this automatically — call this directly only if you're integrating from another language or brokering tokens in a separate service.
+Exchange your API key for an access token and refresh token. This is the HTTP version of what the SDK does on first use. See [Authentication](/authentication) for the full flow.
+
+Call this directly only when you are not using the Python SDK, or when another service brokers tokens for your app.
 
 ## Body
 
@@ -80,7 +82,7 @@ const tokens = await resp.json();
 
 ## JWT contents
 
-Decode the access token (don't *trust* the contents — verify with the [JWKS endpoint](https://api.getmetacognition.com/.well-known/jwks.json) if you need to):
+Decode the access token if you want to inspect its claims. If you need to trust those claims, verify the token with the [JWKS endpoint](https://api.getmetacognition.com/.well-known/jwks.json).
 
 ```json
 {
diff --git a/api-reference/memory/ingest-memory.mdx
api: "POST https://api.getmetacognition.com/ingestion/memory" --- -The REST equivalent of `tex.conversations.remember(...)`. Active write completes synchronously; passive enrichment runs in the background. +This is the REST version of **`tex.conversations.remember`**. Use it to write turns under a scope. Tex saves active memory first, then continues enrichment in the background. ## Headers @@ -40,7 +40,7 @@ Content-Type: application/json ``` - Your org id (≥ 1 char). Server uses the JWT's `org_id` claim for tenancy regardless — this field is required for Pydantic validation only. + Your org id. Minimum length is 1 character. The server still uses the JWT's `org_id` claim for tenancy; this field is required for request validation. @@ -52,15 +52,15 @@ Content-Type: application/json - At least one turn (`min_length=1`). Each turn: `{role, text, timestamp, observations?}`. See [memory model](/concepts/memory-model). + At least one turn (`min_length=1`). Each turn: `{role, text, timestamp, observations?}`. See [How memory works](/concepts/memory-model). - Optional dual-write toggles: `{ write_active: bool = true, write_passive: bool = true }`. Disabling either lets advanced callers bypass one of the storage tiers. + Optional write toggles: `{ write_active: bool = true, write_passive: bool = true }`. Advanced callers can disable one storage tier. - Free-form metadata. **Currently stored but not surfaced** in retrieval — reserved for future filters. + Free-form metadata. It is stored today and reserved for future filters. ## Response — `202 Accepted` @@ -70,7 +70,7 @@ Content-Type: application/json - Active-memory fragment ids — already recallable. + Active-memory fragment ids. These are already recallable. @@ -127,4 +127,4 @@ resp.raise_for_status() ## Idempotency -Tex computes a stable hash per turn (`role + text + timestamp`). Re-sending the same turn is a no-op — no duplicate fragments, no double-billing. Safe to retry on network failure. 
+Tex computes a stable hash per turn from `role`, `text`, and `timestamp`. Re-sending the same turn is a no-op. It does not create duplicate fragments or double bill the turn. It is safe to retry after a network failure. diff --git a/api-reference/memory/recall.mdx b/api-reference/memory/recall.mdx index 7b522d5..b6fc6b6 100644 --- a/api-reference/memory/recall.mdx +++ b/api-reference/memory/recall.mdx @@ -1,10 +1,10 @@ --- title: "Recall memory" -description: "REST `recall`: natural-language query, ranked hits, confidence, token usage in the response." +description: "Search memory with a natural-language query and get ranked hits, confidence, and usage." api: "POST https://api.getmetacognition.com/recall" --- -REST equivalent of `tex.recall(...)`. +This is the REST version of **`tex.recall`**. Use it before generation to get the memory your model should read. For **`mode`**, **`top_k`**, and confidence, see [Recall and ranking](/concepts/retrieval). ## Headers @@ -29,11 +29,11 @@ Content-Type: application/json ``` - Your org id (≥ 1 char). Server uses the JWT's `org_id` claim for tenancy regardless — this field is required for Pydantic validation only. + Your org id. Minimum length is 1 character. The server still uses the JWT's `org_id` claim for tenancy; this field is required for request validation. - Session to search. Optional — falls back to the JWT's `session_id`. + Session to search. Optional. Falls back to the JWT's `session_id`. @@ -41,11 +41,11 @@ Content-Type: application/json - Retrieval depth. See [retrieval](/concepts/retrieval). + Retrieval depth. See [Recall and ranking](/concepts/retrieval). - Hits to return across all kinds. **Defaults to 15 (active) / 25 (deep).** Pydantic validates `1 ≤ top_k ≤ 50`; the runtime then caps at **30**, so values above 30 are silently clamped. + Hits to return across all kinds. **Defaults to 15 (active) / 25 (deep).** Request validation accepts `1 <= top_k <= 50`; runtime caps the final value at **30**. 
@@ -63,7 +63,7 @@ Content-Type: application/json - Linked entities. Each entity has `{id?, label, score}` — **not** the same shape as turns/observations. + Linked entities. Each entity has `{id?, label, score}`. This is different from turns and observations. @@ -82,7 +82,7 @@ Content-Type: application/json `{tokens_in, tokens_out}` for this call. -### Hit shape +### Hit fields ```json { diff --git a/api-reference/overview.mdx b/api-reference/overview.mdx index aad410c..6d30fb0 100644 --- a/api-reference/overview.mdx +++ b/api-reference/overview.mdx @@ -1,11 +1,11 @@ --- title: "REST API overview" -description: "Base URL, JWT auth, correlation IDs, error bodies, rate limits, retries—everything before you open the endpoint list." +description: "Base URL, JWT auth, correlation IDs, errors, limits, and retries." icon: "globe" --- - If you are in Python, prefer the [SDK](/sdk/installation) so JWT refresh stays handled for you. Stay on this page for curl, other languages, or when you are debugging raw HTTP. + In Python, prefer the [SDK](/sdk/installation). It handles token exchange and refresh for you. Use this page for curl, other languages, or raw HTTP debugging. ## Quick reference @@ -18,7 +18,7 @@ icon: "globe" | Trace failures | `X-Correlation-ID` + JSON `request_id` | | Limits & metering | [Usage, quotas, and billing](/concepts/usage-billing) | -Small surface: **`/me`**, **`/ingestion/memory`**, **`/recall`**, **`/usage/*`**. +The API is small, and auth works the same way across it: **`/me`**, **`/ingestion/memory`**, **`/recall`**, and **`/usage/*`**. ## Base URL @@ -34,7 +34,7 @@ Every product endpoint expects: Authorization: Bearer ``` -You mint `` by posting your API key to [`POST /auth/token-exchange`](/api-reference/auth/token-exchange). After that you treat it like any short-lived bearer token. +Create `` by posting your API key to [`POST /auth/token-exchange`](/api-reference/auth/token-exchange). Then send it like any other short-lived bearer token. 
## Content type @@ -46,7 +46,7 @@ Content-Type: application/json ## Correlation IDs -Each request should carry (or receive) an `X-Correlation-ID` UUID. The server mints one if you skip it; the SDK always sends its own so your service logs and ours match. Paste that value into support threads. +Each request can include an `X-Correlation-ID` UUID. If you do not send one, the server creates one. The SDK always sends one so your logs and ours match. Include this value in support threads. ```http X-Correlation-ID: 4f1d8e3c-2a9b-4c0d-9e6f-1a2b3c4d5e6f @@ -66,11 +66,11 @@ Standard HTTP status codes: | `401` | Auth failure | | `403` | Forbidden | | `404` | Not found | -| `422` | Validation error (FastAPI shape) | +| `422` | Validation error (FastAPI request format) | | `429` | Daily quota exceeded | -| `5xx` | Tex platform fault—retry with backoff, then escalate with the correlation ID | +| `5xx` | Tex platform fault. Retry with backoff, then escalate with the correlation ID. | -Error body shape: +Error body: ```json { @@ -85,14 +85,14 @@ Error body shape: ## Rate limits - Limits are per organization. Today the free tier allows **1,000,000** `tokens_in` and **5,000,000** `tokens_out` each UTC day, resetting at **00:00 UTC**. See [Usage, quotas, and billing](/concepts/usage-billing) for how that shows up in responses and dashboards. + Limits are per organization. The free tier allows **1,000,000** `tokens_in` and **5,000,000** `tokens_out` each UTC day. Both reset at **00:00 UTC**. [Usage, quotas, and billing](/concepts/usage-billing) explains how this appears in responses and dashboards. ## Retries -You should retry **twice** with exponential backoff on `408`, `500`, `502`, `503`, `504`, and hard network failures. Respect `Retry-After` when the API sends it. +Retry **twice** with exponential backoff on `408`, `500`, `502`, `503`, `504`, and hard network failures. Respect `Retry-After` when the API sends it. 
-Skip retries for `400`, `401`, `403`, `404`, `422`, and `429`—those will not magically clear on a replay. +Do not retry `400`, `401`, `403`, `404`, `422`, or quota `429`. A replay will usually fail the same way. ## Endpoints diff --git a/api-reference/usage/summary.mdx b/api-reference/usage/summary.mdx index 6c4855b..8ac4e06 100644 --- a/api-reference/usage/summary.mdx +++ b/api-reference/usage/summary.mdx @@ -1,10 +1,10 @@ --- title: "Monthly usage summary" -description: "Calendar-month rollups in UTC—handy for finance and capacity reviews." +description: "Read monthly token rollups in UTC." api: "GET https://api.getmetacognition.com/usage/summary" --- -Calendar-month rollup, UTC. Defaults to the current month. +Returns a calendar-month usage rollup in UTC. If you omit `month`, the endpoint returns the current month. ## Headers @@ -25,11 +25,11 @@ Authorization: Bearer - First moment of the month, UTC. + First moment of the month in UTC. - First moment of the next month, UTC. + First moment of the next month in UTC. diff --git a/api-reference/usage/today.mdx b/api-reference/usage/today.mdx index 434c0bb..fcf668e 100644 --- a/api-reference/usage/today.mdx +++ b/api-reference/usage/today.mdx @@ -1,10 +1,10 @@ --- title: "Today's usage" -description: "Today's token totals, limits, and headroom before you hit a 429." +description: "Read today's token totals, limits, and reset time." api: "GET https://api.getmetacognition.com/usage/today" --- -Returns today's `tokens_in` / `tokens_out` plus the limits that will trigger a `429`. Doesn't count against your own quota — poll as often as you want. +Returns today's `tokens_in` and `tokens_out`, plus the limits that trigger `429`. This request does **not** count against your quota, so you can poll it from dashboards or monitors. ## Headers @@ -15,11 +15,11 @@ Authorization: Bearer ## Response — `200` - Tokens billed today, ingress. + Input tokens billed today. - Tokens billed today, egress. + Output tokens billed today. 
diff --git a/authentication.mdx b/authentication.mdx index 4eb0465..5454a67 100644 --- a/authentication.mdx +++ b/authentication.mdx @@ -7,14 +7,16 @@ icon: "key" import { AuthSequence } from "/snippets/flow-visuals.mdx"; - If you only need a working client, start with the [Quickstart](/quickstart). Return here when you are wiring secrets, bringing your own JWT, or planning key rotation. + If you only need a working client, start with the [Quickstart](/quickstart). Come back here when you are wiring secrets, using your own JWT, or rotating keys. -Your API key is exchanged for short-lived access and refresh tokens the first time the SDK hits a real product route. After that the client keeps JWTs in memory (or your hooks), refreshes them before they expire, and you rarely handle raw token strings yourself. +The SDK starts with your API key. On the first real call, it exchanges that key for short-lived access and refresh tokens. + +After that, the client keeps the tokens in memory, refreshes them when needed, and retries the original call. Most apps never need to handle raw token strings. ## Flow -The numbered list below matches this sequence—your app calls the SDK; the SDK talks to the API (including exchange, refresh, and retries). +The diagram shows what happens when your app calls the SDK. ```mermaid sequenceDiagram @@ -50,29 +52,29 @@ sequenceDiagram { id: "construct", title: "You build the client", - detail: "Tex(api_key=…) — lazy, zero bytes on the wire until you call a real method.", + detail: "Tex(api_key=...) 
does not call the network until you run a real method.", }, { id: "first", title: "You call recall/remember/usage", - detail: "SDK POSTs /auth/token-exchange, receives access (≈24h) + refresh (≈7d) JWTs.", + detail: "The SDK calls POST /auth/token-exchange and receives access (24h) and refresh (7d) JWTs.", }, { id: "steady", title: "SDK attaches Bearer access token", - detail: "Your business call runs (for example POST /recall) with Authorization: Bearer ….", + detail: "Your call runs with Authorization: Bearer .", }, { id: "refresh", title: "When access expires", - detail: "Next call may 401; SDK POSTs /auth/refresh, retries once with the new access token — you still await a single method call.", + detail: "The SDK calls POST /auth/refresh, gets a new access token, and retries once.", }, ]} /> ## Auth mode -Use **one** of these: +Most apps use an API key. Use one of the other modes only when you already manage auth somewhere else. @@ -82,7 +84,7 @@ Use **one** of these: base_url="https://api.getmetacognition.com", ) ``` - Default for almost every app. The SDK handles exchange and refresh. + This is the default for most apps. The SDK handles token exchange and refresh. ```python @@ -95,10 +97,10 @@ Use **one** of these: ) ``` - Useful when you broker auth in a separate service and pass JWTs to your client. + Use this when another service already creates the JWTs. - Unlike the api-key flow, BYO-JWT does **not** auto-fill `org_id` / `user_id` from `/auth/verify`. Pass them explicitly — every `remember` and `recall` needs them in the request scope. + BYO-JWT does **not** auto-fill `org_id` / `user_id` from `/auth/verify`. Pass them explicitly. Every `remember` and `recall` needs them in the request scope. @@ -111,7 +113,7 @@ Use **one** of these: ``` - `/auth/login` is **disabled** in production (returns 403 unless the integration backend runs with `DEBUG=true`). Use this only for local development against a self-hosted backend. For production apps, use an API key. 
+ `/auth/login` is **disabled** in production. It returns 403 unless the integration backend runs with `DEBUG=true`. Use this only for local development against a self-hosted backend. In production, use an API key. @@ -123,13 +125,13 @@ Use **one** of these: `.env` file. Add it to `.gitignore`. Load with `python-dotenv`.
- Secret manager → mounted as `TEX_API_KEY` env var. + Secret manager mounted as `TEX_API_KEY` env var. - Project Environment Variables → `TEX_API_KEY`. + Project environment variable named `TEX_API_KEY`. - Repository secret → exposed via `${{ secrets.TEX_API_KEY }}`. + Repository secret exposed as `${{ secrets.TEX_API_KEY }}`.
@@ -145,10 +147,10 @@ The SDK reads `TEX_API_KEY` from the environment automatically when `api_key=` i Deploy with `TEX_API_KEY=`. - Check the dashboard's "last used" column or your own logs. + Check the dashboard's `last_used_at` value or your own logs. - Click **Revoke** on the old key. Existing JWTs minted from key A keep working for up to 24h, so step 4 has zero customer-visible impact. + Click **Revoke** on the old key. JWTs created from key A can keep working for up to 24h, so customers do not see a sudden failure. @@ -177,7 +179,7 @@ $ curl -X POST https://api.getmetacognition.com/auth/token-exchange \ - **Token lifetimes.** Access JWT lasts 24h. Refresh JWT lasts 7d. Beyond that, the SDK re-exchanges your API key automatically. JWTs are stateless — to invalidate one, revoke the underlying API key. + **Token lifetimes.** Access JWTs last 24h. Refresh JWTs last 7d. After that, the SDK exchanges your API key again. To invalidate tokens, revoke the API key they came from. diff --git a/benchmarks.mdx b/benchmarks.mdx index 116b5d7..183f7fa 100644 --- a/benchmarks.mdx +++ b/benchmarks.mdx @@ -1,6 +1,6 @@ --- -title: "Benchmarks & methodology" -description: "LoCoMo and LongMemEval_S scores, per-category tables, latency, and token efficiency—so you can verify claims yourself." +title: "Benchmarks and methodology" +description: "LoCoMo and LongMemEval_S scores, with category tables, latency, tokens, and methodology." icon: "trophy" --- @@ -21,19 +21,21 @@ export const Chart = ({title, children}) => ( ); -On **LoCoMo**, the full Tex stack scores **93.3%** overall. On **LongMemEval_S** (active retrieval only), Tex is at **92.2%**. The sections that follow break out categories, latency, and tokens in plain view so you can reproduce the story yourself. +Tex scores **93.3%** overall on **LoCoMo** with the full system. Tex scores **92.2%** on **LongMemEval_S** with active retrieval only. 
+ +This page shows the category tables, latency, tokens, and methodology behind those numbers. - Full Tex system vs published baselines—EverMemOS was the prior headline number at **92.3%**. + Full Tex system vs published baselines. EverMemOS was the prior headline number at **92.3%**. - Active retrieval track vs other retrieval-first systems—Emergence AI posted **86.0%** on comparable reporting. + Active retrieval track vs other retrieval-first systems. Emergence AI posted **86.0%** on comparable reporting. - We generated answers with **gpt-4o-mini** and graded them with **gpt-4o** in an LLM-as-judge configuration. Each evaluation ran on a **single** machine from start to finish (no multi-node setup). The precise recipe is in **Methodology** below. + We generated answers with **gpt-4o-mini** and graded them with **gpt-4o** in an LLM-as-judge setup. Each evaluation ran on a **single** machine. The exact setup is in **Methodology** below. ## LoCoMo @@ -66,7 +68,7 @@ LoCoMo evaluates long-conversation memory across 10 multi-session conversations - On adversarial items Tex is **99.33%**: it declines to answer when the transcript does not support one. If a vendor’s public write-up skips that bucket, treat their headline number with the skepticism it deserves. + On adversarial items Tex is **99.33%**. It declines to answer when the transcript does not support an answer. If a public benchmark skips that bucket, compare headline numbers carefully. ## LongMemEval_S @@ -105,7 +107,7 @@ LongMemEval_S evaluates memory over **500 questions across ~48 sessions each (~1 - **92.2%** sits next to Oracle GPT-o3 (**92.0%**) and well above Oracle GPT-4o (**82.4%**). In practice that means the retrieval step is surfacing roughly the same evidence you would hand-pick for each question—without you doing the hand-picking. + **92.2%** is close to Oracle GPT-o3 (**92.0%**) and well above Oracle GPT-4o (**82.4%**). 
That means retrieval is finding evidence close to what you would hand-pick for each question. ## Latency @@ -153,16 +155,16 @@ LongMemEval_S evaluates memory over **500 questions across ~48 sessions each (~1 - Ingest here costs **zero LLM tokens**—offline embeds only. Vendors that run an LLM per message will show up in your bill; ask them before you commit. + Ingest here costs **zero LLM tokens**. This run used offline embeddings only. If a system runs an LLM per message during ingest, that cost should show up in token usage. ### Headline efficiency claims -Compared with published Mem0 numbers, Tex lands about **27%** higher on accuracy while using roughly **87%** fewer tokens and running at about **95%** lower latency. Against MemMachine’s memory-only configuration it is about **5.8%** more accurate on **43%** fewer tokens. On LoCoMo the stack also posts the lowest tokens-per-correct-answer we have recorded here — about **1,296** — because ingestion never spends LLM cycles on every message. +Compared with published Mem0 numbers, Tex is about **27%** higher on accuracy, uses roughly **87%** fewer tokens, and runs at about **95%** lower latency. Against MemMachine's memory-only configuration, Tex is about **5.8%** more accurate on **43%** fewer tokens. On LoCoMo, Tex also has the lowest tokens-per-correct-answer in this comparison: about **1,296**. ## Ablation: what each part of the pipeline adds -On LongMemEval_S, peeling layers off the production retrieval pipeline: +On LongMemEval_S, removing pieces of the retrieval pipeline changes accuracy like this: | Capability | What it does | Δ accuracy | | --- | --- | --- | @@ -178,13 +180,13 @@ On LongMemEval_S, peeling layers off the production retrieval pipeline: Answers came from **`gpt-4o-mini`**. We graded them with an LLM-as-judge setup built on **`gpt-4o`**, using category-specific prompts, binary pass/fail per item, and a straight average inside each category. 
-**LoCoMo** used all ten provided conversations (**1,984** question–answer pairs) against the **full Tex system** (not retrieval-only). **LongMemEval_S** used **500** questions with about **48** sessions each (~115K tokens per trace) against **Tex Active** retrieval only. +**LoCoMo** used all ten provided conversations (**1,984** question-answer pairs) against the **full Tex system** (not retrieval-only). **LongMemEval_S** used **500** questions with about **48** sessions each (~115K tokens per trace) against **Tex Active** retrieval only. -Hardware was a **single** machine for the full pipeline—no multi-node orchestration—and runs completed without someone babysitting intermediate steps. +The full pipeline ran on a **single** machine. There was no multi-node orchestration. ## On our roadmap for evals -We plan to add **MemoryAgentBench (ICLR 2026)** (retrieval quality, learning on the fly, long context, conflicting facts), stronger **multi-step reasoning** when answers need counting or arithmetic over evidence, and **LongMemEval_M** for the nasty case of hundreds of sessions behind one question. +We plan to add **MemoryAgentBench (ICLR 2026)**, stronger multi-step reasoning for questions that need counting or arithmetic over evidence, and **LongMemEval_M** for questions that span hundreds of sessions. ## References @@ -197,5 +199,5 @@ We plan to add **MemoryAgentBench (ICLR 2026)** (retrieval quality, learning on 7. Mastra. *Observational Memory: 95% on LongMemEval.* 2026. - You go from `pip install tex-sdk` to a printed recall in one sitting. + Install the SDK, store one turn, and print a recall result. diff --git a/changelog.mdx b/changelog.mdx index 6bd40c0..d2499bb 100644 --- a/changelog.mdx +++ b/changelog.mdx @@ -1,6 +1,6 @@ --- title: "Changelog" -description: "SDK and API releases: metering, verbs, latency improvements—skim before you upgrade." +description: "SDK and API release notes." 
icon: "clock-rotate-left" --- @@ -10,7 +10,7 @@ icon: "clock-rotate-left" - New `tex.usage.today()` and `tex.usage.summary(month=...)` methods. - Every `remember` and `recall` response now carries a `usage` field with `tokens_in` and `tokens_out`. -- Daily quotas enforced at the engine — `RateLimitError` on exceed. +- Daily quotas enforced at the engine. Exceeding them raises `RateLimitError`. **Conversation-native verbs** @@ -33,7 +33,7 @@ Initial public release. Supermemory-compatible verbs (`add`, `search`, `profile` - Native async methods on a parallel `AsyncTex` class. Same API surface, awaitable. Targeting Q3. + Native async methods on an `AsyncTex` class. Same methods, but awaitable. Targeting Q3. `tex.recall(..., user_id=...)` without constructing a new client. Targeting 1.2. @@ -48,6 +48,6 @@ Initial public release. Supermemory-compatible verbs (`add`, `search`, `profile` Email notifications at 80% / 100% of either daily cap. Targeting 1.2. - `read` / `write` / `usage:read` scopes for principle-of-least-privilege keys. + Separate **`read`**, **`write`**, and **`usage:read`** scopes so each key does only what it should. diff --git a/concepts/memory-model.mdx b/concepts/memory-model.mdx index de0f653..26911e8 100644 --- a/concepts/memory-model.mdx +++ b/concepts/memory-model.mdx @@ -1,14 +1,14 @@ --- title: "How memory works" -description: "Turns you write, observations and entities Tex derives—plus how fast each layer shows up in recall." +description: "What Tex stores after remember, and how soon each layer can appear in recall." icon: "brain" --- -import { PipelineFlow } from "../snippets/flow-visuals.mdx"; +import { PipelineFlow } from "/snippets/flow-visuals.mdx"; -Call **`remember`** when you have new turns to store; call **`recall`** when you have a question and want the best-matching slices back. Most of the time you think in **turns**, but Tex also maintains **observations** and **entities** under the hood. 
The hot path for a write usually settles in about **150 ms**, while heavier indexing continues afterward. +Call **`remember`** when you have new turns to store. Call **`recall`** when you have a question and want the best matches back. -Those layers have different freshness, so a single **`recall`** can blend raw lines, distilled facts, and linked entities in one payload. +Most of the time you work with **turns**. Tex also builds **observations** and **entities** in the background. A write usually becomes recallable in about **150 ms**. The richer memory layers continue after that. ## Layers @@ -17,16 +17,16 @@ Those layers have different freshness, so a single **`recall`** can blend raw li Raw lines: who said what, when. - Atomic facts inferred from turns (for example, dietary constraints, locations). + Small facts inferred from turns, such as dietary constraints or locations. - Recurring nouns—people, places, orgs—with links across observations. + People, places, and organizations that show up across observations. ## Writes -You call `remember`; Tex commits a **fast** slice you can query immediately, then keeps **enriching** after the response is already on its way back to you. The flow looks like this: +When you call **`remember`**, Tex first saves the turn in active memory. That is the fast path. Then it keeps building richer memory in the background. ### Fast path -You get control back quickly. New turns are usually recallable within about **150 ms** in normal conditions. +Your code gets control back quickly. New turns are usually recallable within about **150 ms**. ### Background -Observations, entities, and timeline enrichment continue after the response. They tighten recall quality on **later** queries. +Observations, entities, and timeline work continue after the response. They improve recall on later questions. -You never need enrichment to finish for the **next** user message to work—the latest turn alone can be enough. 
+You do not need the background work to finish before the next user message. The latest turn can still be enough. ## Reads -You pass natural-language **`q`** (plus scope); Tex runs fused retrieval, reranking, and emits a calibrated **`confidence`** score alongside the hits. The HTTP shape lives at [recall](/api-reference/memory/recall). +For reads, pass a natural-language **`q`** and the scope to search. Tex retrieves candidates, ranks them, and returns **`hits`** with a **`confidence`** score. + +Over HTTP, **`POST /recall`** takes **`q`**, **`scope`**, and options like **`mode`**, **`top_k`**, and **`include_timeline`**. The response includes ranked turns, observations, entities, token **`usage`**, and an optional **`timeline`** string. The full request and response fields are in [Recall memory](/api-reference/memory/recall). +Tune **`mode`**, **`top_k`**, and **`confidence`** behavior in [Recall and ranking](/concepts/retrieval). + ## One example turn ```python @@ -99,7 +103,7 @@ You pass natural-language **`q`** (plus scope); Tex runs fused retrieval, rerank ## BYO facts -You can ignore extraction and use defaults. Already have facts? Attach them on `remember`—see [`conversations.remember`](/sdk/conversations-remember). +Let Tex extract facts for you. If you already have facts from your own system, attach them to **`remember`**. See [`conversations.remember`](/sdk/conversations-remember). How `org_id`, `user_id`, and `session_id` isolate memory. diff --git a/concepts/retrieval.mdx b/concepts/retrieval.mdx index b135824..f70d351 100644 --- a/concepts/retrieval.mdx +++ b/concepts/retrieval.mdx @@ -1,47 +1,44 @@ --- title: "Recall and ranking" -description: "Active vs deep recall, tuning top_k, and when to trust—or ignore—the confidence score." +description: "Choose active or deep recall, set top_k, and decide when confidence is strong enough." 
icon: "magnifying-glass" --- -Stick with **`mode="active"`** for anything a person is staring at; switch to **`deep`** when you can spend more time or when **`active`** keeps coming back thin. **`top_k`** is your dial for context size—tight in chat, generous in digests—and it feeds straight into **`tokens_out`**. When **`confidence`** stays under about **0.3**, plan on skipping memory for that turn, widening the query, or trying **`deep`** once. +[How memory works](/concepts/memory-model) explains what Tex stores. This page explains how to choose what comes back. -Python: `tex.recall(q, session_id, ...)`. HTTP: [recall](/api-reference/memory/recall). +Use **`mode="active"`** for chat and copilots. Use **`mode="deep"`** when the user can wait longer, or when active recall is not finding enough. + +Use **`top_k`** to choose how many hits you give the model. Smaller values keep prompts tight. Larger values help summaries, digests, and long answers. Since hits count toward **`tokens_out`**, this also affects cost. + +If **`confidence`** stays under about **0.3**, do not force the memory into the prompt. Try **`mode="deep"`** once, raise **`top_k`**, or ask a clearer **`q`**. + +Python uses **`tex.recall(q, session_id, ...)`**. HTTP uses **`POST /recall`** with the same ideas. The REST field list is in [Recall memory](/api-reference/memory/recall). ## Modes ### `active` (default) -- **Best for:** Chat, copilots, anything with a person waiting. -- **Rough latency:** about **1.5–2.5 s** end-to-end in typical setups. +- **Best for:** Chat, copilots, and live user flows. +- **Rough latency:** about **1.5-2.5 s** end-to-end in typical setups. - **Behavior:** Single-pass retrieval and ranking. ### `deep` -- **Best for:** Offline jobs, “why did we decide X?” investigations, or a second pass after weak `active` results. -- **Rough latency:** about **3–6 s**. +- **Best for:** Offline jobs, decision reviews, or a second pass after weak `active` results. 
+- **Rough latency:** about **3-6 s**. - **Behavior:** Two-pass with heavier reranking. - - - Default. Fast path for interactive products. - - - Slower, richer retrieval when latency is acceptable. - - - ## `top_k` Defaults: **15** (`active`) / **25** (`deep`). The server caps at **30** no matter what you send. | Situation | Starting `top_k` | | --- | --- | -| Tight assistant prompt | 3–5 | -| Standard chat with citations | 8–15 | -| Summaries or long answers | 20–30 | +| Tight assistant prompt | 3-5 | +| Standard chat with citations | 8-15 | +| Summaries or long answers | 20-30 | -Larger `top_k` directly increases **`tokens_out`** on your bill—see [Usage, quotas, and billing](/concepts/usage-billing). +Larger `top_k` directly increases **`tokens_out`** on your bill. How that maps to quota is in [Usage, quotas, and billing](/concepts/usage-billing). ## Confidence @@ -50,7 +47,7 @@ Every recall returns **`confidence` in [0, 1]**, calibrated so that roughly **`P | Range | How to read it | Practical move | | --- | --- | --- | | **≥ 0.6** | Strong | Pass context to the model as-is. | -| **0.3 – 0.6** | Mixed | Use hits, but cite or summarize sources for the user. | +| **0.3 - 0.6** | Mixed | Use hits, but cite or summarize sources for the user. | | **< 0.3** | Weak | Try `mode="deep"`, rephrase `q`, or skip memory for this turn. | ```python @@ -66,9 +63,9 @@ RecallHit(id, text, score, kind, timestamp) # turns + observations RecallEntity(id, label, score) # entities ``` -- **`hits.hits.turns`** — usual choice for stuffing a system prompt. -- **`hits.hits.observations`** — atomic facts. -- **`hits.hits.entities`** — useful for analytical or “who / what / where” questions. +- **`hits.hits.turns`** - use these for most prompts. +- **`hits.hits.observations`** - small facts extracted from prior turns. +- **`hits.hits.entities`** - people, places, and organizations that help with "who", "what", and "where" questions. 
## Timeline string diff --git a/concepts/scopes.mdx b/concepts/scopes.mdx index b70afe5..95e465e 100644 --- a/concepts/scopes.mdx +++ b/concepts/scopes.mdx @@ -1,10 +1,12 @@ --- title: "Scopes and multi-tenancy" -description: "How org_id, user_id, and session_id partition memory so your customers never leak into each other." +description: "Use org_id, user_id, and session_id to keep memory separated." icon: "layer-group" --- -Every call keys off **`(org_id, user_id, session_id)`**. `org_id` comes from your API key. **`session_id` is yours**—that’s where you put thread / channel / tenant. +Every memory call is scoped by **`org_id`**, **`user_id`**, and **`session_id`**. + +Your API key decides **`org_id`**. The SDK usually gets **`user_id`** from the token. Your app chooses **`session_id`**. That is the field you use for a chat thread, Slack channel, agent run, or tenant-specific memory. | Field | Source | You set it? | | --- | --- | --- | @@ -12,7 +14,7 @@ Every call keys off **`(org_id, user_id, session_id)`**. `org_id` comes from you | `user_id` | JWT (or per-call override when supported) | Sometimes | | `session_id` | Your application | **Always** | -`session_id` is the knob you touch daily. +In most apps, **`session_id`** is the field you set on every call. ## `session_id` @@ -33,11 +35,11 @@ Pick a stable pattern: -Keep strings deterministic for the same logical thread so `recall` sees the same corpus every time. +Use the same string for the same logical thread. That way **`recall`** searches the same memory each time. ## SaaS (one key) -Map each customer to a distinct **`session_id`** (and optionally `user_id` when you use overrides). One shared `Tex` client is enough: +For a SaaS app, map each customer or user conversation to a distinct **`session_id`**. One shared **`Tex`** client is enough: ```python tex = Tex(api_key=os.environ["TEX_API_KEY"], base_url=BASE_URL) @@ -48,12 +50,12 @@ def chat(user_msg: str, x_user_id: str, conv_id: str): ... 
``` -That gives you **one bill** and **hard separation** between tenants—as long as you never reuse someone else’s `session_id` by accident. +That gives you one bill and separated memory. The only rule is simple: never reuse one customer's **`session_id`** for another customer. - Fine-grained per-call `user_id` overrides are planned for **SDK 1.2**. Until then, encoding the end user into `session_id` is the cleanest pattern. See the [multi-tenant SaaS recipe](/recipes/multi-tenant-saas). + Per-call `user_id` overrides are planned for **SDK 1.2**. Until then, include the end user in `session_id`. The [multi-tenant SaaS recipe](/recipes/multi-tenant-saas) shows the full pattern. - + `top_k`, modes, confidence. diff --git a/concepts/usage-billing.mdx b/concepts/usage-billing.mdx index 7c27e55..0276849 100644 --- a/concepts/usage-billing.mdx +++ b/concepts/usage-billing.mdx @@ -1,10 +1,12 @@ --- title: "Usage, quotas, and billing" -description: "What we meter (tokens in/out), how daily caps work, and how to reason about cost as you scale." +description: "What Tex meters, how daily caps work, and how to keep memory costs predictable." icon: "receipt" --- -We bill on **`tokens_in`** plus **`tokens_out`**, counted with **`tiktoken`** using the **`cl100k_base`** vocabulary. The free tier allows **1M** tokens in and **5M** out per UTC day; exceeding either side returns **`429`**. Every response includes a **`usage`** object, and **`tex.usage.today()`** / **`.summary()`** give you org-level rollups. +Tex tracks two numbers: **`tokens_in`** and **`tokens_out`**. It counts them with **`tiktoken`** and the **`cl100k_base`** vocabulary. + +On the free tier, each org gets **1M** tokens in and **5M** tokens out per UTC day. If either limit is exceeded, the API returns **`429`**. Every **`remember`** and **`recall`** response includes a **`usage`** object. The SDK helpers **`tex.usage.today()`** and **`tex.usage.summary()`** show the same totals as the dashboard. 
## Free tier @@ -23,7 +25,7 @@ Both reset at **00:00 UTC**. Crossing either limit raises **`RateLimitError`** ( ### Per response -Every `remember` and `recall` returns usage for that call: +Every **`remember`** and **`recall`** returns usage for that call: ```python hits = tex.recall(q="...", session_id=sid) @@ -38,13 +40,13 @@ month = tex.usage.summary() # current calendar month march = tex.usage.summary("2026-03") ``` -The [Dashboard → Usage](https://app.getmetacognition.com/dashboard/usage) page shows the same numbers graphically. +The [Dashboard Usage page](https://app.getmetacognition.com/dashboard/usage) shows the same numbers. ## Cost knobs -| lever | why it helps | +| Lever | Why it helps | | --- | --- | -| **Lower `top_k`** | Defaults are 15 / 25; live chat often needs only 5–8. | +| **Lower `top_k`** | Defaults are 15 / 25. Live chat often needs only 5-8. | | **Stay on `active`** | `deep` mode costs more time and tokens than `active`. | | **Trim noisy writes** | Skip one-word acks and redundant system spam in `remember`. | | **Batch turns** | Send many turns in one `remember` instead of dozens of calls. | @@ -57,11 +59,11 @@ if tex.usage.today().tokens_in_used / 1_000_000 > 0.9: ## Alerting -There is no hosted email alert yet. Poll **`tex.usage.today()`** every few minutes from your own monitor and page when either usage column crosses **~90%** of quota. Server-side emails near **80%** are on the [roadmap](/changelog). +There is no hosted email alert yet. Poll **`tex.usage.today()`** from your own monitor. Page your team when either usage column crosses **~90%** of quota. Server-side emails near **80%** are on the [roadmap](/changelog). ## Pricing (later) -Billing will be pay-as-you-go once pricing is published. Daily caps remain safety rails—we intend to **notify** you as you approach limits rather than surprise-**429** production, but treat today’s behavior as authoritative until the billing docs update. 
+Billing will be pay-as-you-go once pricing is published. Daily caps stay in place as safety rails. Until the billing docs change, treat today's **`429`** behavior as the source of truth. `pip install tex-sdk` diff --git a/introduction.mdx b/introduction.mdx index f73dcdd..e0574dd 100644 --- a/introduction.mdx +++ b/introduction.mdx @@ -1,23 +1,25 @@ --- -title: "Overview — What is Tex?" -description: "Long-term memory for assistants and agents: you remember turns, you recall what matters, you keep prompts small." +title: "Overview - What is Tex?" +description: "Long-term memory for assistants and agents. Store turns, recall the useful ones, and keep prompts small." icon: "book-open" --- - New here? Run through the [Quickstart](/quickstart), skim this overview, then follow [Authentication](/authentication) before you ship to production. + New here? Start with the [Quickstart](/quickstart). Then read this overview and [Authentication](/authentication) before you ship. -Chat apps usually force a bad choice: paste the whole thread every request, or lose state on refresh. Tex stores turns as they happen and pulls back what matches the *current* question—then you call your model. +Most chat apps make you choose between two bad options. You either send the whole chat history to the model, or you lose memory when the page refreshes. + +Tex gives you a simpler path. Store each turn as it happens. When the user asks the next question, ask Tex for the few memories that matter. Then call your model with that smaller context. + +Your app still runs the model, routes, and UI. Tex handles storage, search, ranking, and usage tracking. ## API | Call | When | | --- | --- | -| **`remember`** | New stuff to store (turns + optional metadata). | -| **`recall`** | Call right before generation: pass a natural-language question, get ranked hits and a confidence score. | - -You keep model, routing, and UI. Tex does storage, retrieval, ranking, metering. 
+| **`remember`** | Store new turns, plus optional metadata. | +| **`recall`** | Before you call the model, ask for the most relevant memories. | Need access? Create an account in the [dashboard](https://app.getmetacognition.com/signup), copy the API key once, then follow the [Quickstart](/quickstart). Locally, set `TEX_API_KEY` or pass `api_key=` to the client. @@ -27,10 +29,10 @@ You keep model, routing, and UI. Tex does storage, retrieval, ranking, metering. - Install `tex-sdk`, one `remember`, one `recall`, print scores in the terminal. + Install `tex-sdk`, store one turn, recall it, and print the score. - LoCoMo and LongMemEval_S with splits, latency, and token math laid out in the open. + LoCoMo and LongMemEval_S results with splits, latency, and token counts. @@ -38,10 +40,10 @@ You keep model, routing, and UI. Tex does storage, retrieval, ranking, metering. - Full-system benchmark; splits and baselines on [Benchmarks](/benchmarks). + Full-system benchmark. Tex is ahead of EverMemOS (**92.3%**), MemMachine v0.2 (**91.7%**), Zep (**~85%**), and Mem0 (**~66%**). See [Benchmarks](/benchmarks) for splits and methodology. - Active retrieval track; how we ran the evals is on the same page. + Active retrieval track. Tex is ahead of Emergence AI (**86.0%**), Supermemory (**81.6%**), and Zep (**71.2%**). See [Benchmarks](/benchmarks) for per-ability tables. @@ -62,33 +64,31 @@ You keep model, routing, and UI. Tex does storage, retrieval, ranking, metering. ``` - Put `context` where your model reads it, answer, append new turns—repeat. + Put `context` where your model reads it. Answer the user. Store the new turns. - Low `confidence` at the start just means thin memory—feed real traffic and it moves. + Low `confidence` at the start usually means the session has very little memory. Store more real turns and the score becomes more useful. ## More - - - The synchronous part of `remember` returns fast; heavier enrichment continues afterward. 
Diagrams and timing notes live in [How memory works](/concepts/memory-model). - +### Latency: active write vs background work + +The fast part of **`remember`** returns quickly. New turns are usually recallable within about **150 ms**. Tex then continues background work, such as observations, entities, and timeline updates. The diagrams and timing notes are in [How memory works](/concepts/memory-model). + +### Isolation between customers + +Use **`org_id`**, **`user_id`**, and **`session_id`** to keep memory separated. [Scopes and multi-tenancy](/concepts/scopes) shows how to map those fields to your users and tenants. + +### Python vs raw HTTP - - You partition with `org_id`, `user_id`, and `session_id`. Read [Scopes and multi-tenancy](/concepts/scopes) before you map those fields to your auth model. - +Use the [Python SDK](/sdk/installation) if you want token exchange and refresh handled for you. Use the [REST API](/api-reference/overview) from another language, or when your service already owns HTTP calls. - - Prefer the [Python SDK](/sdk/installation) for JWT exchange and refresh. Use [REST](/api-reference/overview) from other languages or when you already centralize HTTP in a gateway. - +### Quotas and billing - - Metering is token-based with daily caps. Full detail is under [Usage, quotas, and billing](/concepts/usage-billing). - - +Tex meters `tokens_in` and `tokens_out` with daily caps. [Usage, quotas, and billing](/concepts/usage-billing) explains what counts and when limits reset. ## Docs diff --git a/migration/from-langchain-memory.mdx b/migration/from-langchain-memory.mdx index 885460a..9a59d9c 100644 --- a/migration/from-langchain-memory.mdx +++ b/migration/from-langchain-memory.mdx @@ -1,25 +1,25 @@ --- title: "Migrate from LangChain chat memory" -description: "Move off BaseChatMemory-style buffers to Tex-backed recall with a short mapping guide." +description: "Move from LangChain chat buffers to Tex-backed recall." 
--- -LangChain's built-in memory classes (`ConversationBufferMemory`, `ConversationBufferWindowMemory`, `ConversationSummaryMemory`, `ConversationKGMemory`) are **buffers, summarizers, or graphs** that live inside your process. Tex is a **hosted retrieval service.** +If you came from the [LangChain recipe](/recipes/langchain), this page gives the longer comparison. -The migration moves you from "give the LLM the whole sliding window" to "give the LLM the relevant slice." +LangChain memory classes live inside your process. They keep buffers, summaries, or graphs near the chain. Tex stores memory outside the process and returns the relevant slice when you call **`recall`**. ## Mapping | LangChain | Tex equivalent | | --- | --- | -| `ConversationBufferMemory` | `recall(q=user_msg, session_id=sid, top_k=20)` — surfaces the relevant 20 turns instead of the last N. | -| `ConversationBufferWindowMemory(k=10)` | `recall(q=user_msg, session_id=sid, top_k=10)` — same `k`, but ranked by relevance. | -| `ConversationSummaryMemory` | Tex stores extracted observations automatically. Surface them via `recall`'s `hits.observations`. | +| `ConversationBufferMemory` | `recall(q=user_msg, session_id=sid, top_k=20)` returns the relevant 20 turns instead of the last N. | +| `ConversationBufferWindowMemory(k=10)` | `recall(q=user_msg, session_id=sid, top_k=10)` uses the same `k`, but ranks by relevance. | +| `ConversationSummaryMemory` | Tex stores extracted observations automatically. Read them from `recall`'s `hits.observations`. | | `ConversationKGMemory` | Tex builds an entity graph in the background. Query via `hits.entities` (linked across observations). 
| ## Migration - + ```python from langchain.chains import ConversationChain from langchain.memory import ConversationBufferWindowMemory @@ -34,7 +34,7 @@ The migration moves you from "give the LLM the whole sliding window" to "give th answer = chain.invoke({"input": user_msg})["response"] ``` - + ```python from langchain.prompts import ChatPromptTemplate from langchain_openai import ChatOpenAI @@ -63,15 +63,13 @@ The migration moves you from "give the LLM the whole sliding window" to "give th -## Tradeoffs +## Trade-offs -In-process LangChain memory = zero network. Tex = network + ~150ms writes, multi-second reads. +LangChain memory has no network hop. Tex adds a network call: writes are usually around 150ms, and reads can take a few seconds. -**Down:** slower request, depends on our API. +In return, memory survives deploys, prompts stay bounded, sessions can share memory, and each recall has a confidence score. -**Up:** survives deploys, bounded prompts, cross-session memory, confidence, extracted facts. - -Hobby bot: buffer is fine. Customer-facing: Tex is usually worth it. +For a small hobby bot, a buffer may be enough. For a customer-facing app, Tex is usually the cleaner path. ## Drop-in adapter (optional) @@ -110,7 +108,7 @@ class TexChatMemory(BaseChatMemory): ) def clear(self) -> None: - # No-op — Tex memory persists by design + # No-op. Tex memory persists by design. pass ``` diff --git a/migration/from-redis.mdx b/migration/from-redis.mdx index 363897b..9ff22ed 100644 --- a/migration/from-redis.mdx +++ b/migration/from-redis.mdx @@ -1,9 +1,9 @@ --- title: "Migrate from Redis (or a homegrown log)" -description: "Swap append-only chat logs for remember/recall while keeping your routing and models the same." +description: "Replace prompt-stuffed chat logs with remember and recall." --- -If your chat history lives in Redis/Postgres/Mongo and you dump it all into every prompt, this page is for you. 
+Use this guide if you store chat history in Redis, Postgres, or Mongo and send too much of it to the model on every request. ## Before (Redis log) @@ -35,11 +35,11 @@ tex.conversations.remember(session_id=sid, turns=[ ]) ``` -Three things change for the better: +Three things change: -1. **Bounded prompts.** You pull the *relevant* 8 turns regardless of how many exist. +1. **Bounded prompts.** You pull the relevant 8 turns regardless of how many exist. 2. **Cross-session continuity.** Use `f"chat-{user_id}"` to share memory across conversations. -3. **No retention policy to maintain.** Tex's confidence scoring decides what stays useful. +3. **Less prompt trimming.** You stop guessing which recent turns fit in context. ## Backfill plan @@ -54,7 +54,7 @@ Three things change for the better: {"role": t["role"], "text": t["text"], "timestamp": t["ts"]} for t in turns ] - # Big batches are fine — pass all turns at once + # Big batches are fine. Pass all turns at once. tex.conversations.remember(session_id=sid, turns=formatted) ``` @@ -65,7 +65,7 @@ Three things change for the better: - `recall(q=)` finds the right turn - Run **both** read paths in production for a week. Log when Tex's `confidence < 0.2`. If those rate is below your tolerance, proceed. + Run **both** read paths in production for a week. Log when Tex's `confidence < 0.2`. If that rate stays below your tolerance, proceed. Switch the prompt to use Tex hits. Keep the Redis writes for one more week as backup. @@ -77,7 +77,7 @@ Three things change for the better: ## Edge cases -- **Streaming responses.** Persist with `remember` *after* the stream completes. Background-task it so the next request isn't delayed. -- **System messages.** Don't migrate them — they consume tokens and add no recall value. -- **Tool calls.** Store the *result* of a tool call as an assistant turn, not the raw JSON. Recall surfaces text. -- **Audit log.** Tex isn't an audit store. 
Keep Redis (or a real append-only log) for compliance and use Tex for retrieval — two systems, two purposes. +- **Streaming responses.** Persist with `remember` after the stream completes. Run it in the background so the next request is not delayed. +- **System messages.** Do not migrate them. They consume tokens and add little recall value. +- **Tool calls.** Store the result of a tool call as an assistant turn, not the raw JSON. Recall returns text. +- **Audit log.** Tex is not an audit store. Keep Redis or another append-only log for compliance, and use Tex for retrieval. diff --git a/migration/from-supermemory.mdx b/migration/from-supermemory.mdx index 9898a21..d82b4f5 100644 --- a/migration/from-supermemory.mdx +++ b/migration/from-supermemory.mdx @@ -1,9 +1,9 @@ --- title: "Migrate from Supermemory" -description: "If you already integrated Supermemory, here is the Tex-shaped equivalent of each call." +description: "Map Supermemory calls to Tex calls." --- -The Tex Python SDK is **resource-shape compatible** with Supermemory for the verbs we both implement. If you're using `supermemory.add(...)`, `client.search(...)`, or `client.profile(...)`, the migration is mostly a pip install and a base-url swap. +The Tex Python SDK keeps Supermemory-compatible calls for the verbs both SDKs support. If you use `supermemory.add(...)`, `client.search(...)`, or `client.profile(...)`, most of the migration is installing `tex-sdk` and changing the base URL. ## Drop-in compatibility @@ -21,14 +21,14 @@ These calls work the same in Tex: | `client.documents.batch_add([...])` | `tex.documents.batch_add([...])` | - Tex's `search` is a **resource**, not a callable — `tex.search(q)` will raise `TypeError`. Use `tex.search.documents(q)` or `tex.search.memories(q)`. There's also a generic `tex.search.execute(q)` alias for `tex.search.documents(q)` if you want a single entry point. + Tex's `search` is a **resource**, not a callable. `tex.search(q)` raises `TypeError`. 
Use `tex.search.documents(q)` or `tex.search.memories(q)`. `tex.search.execute(q)` is also available as an alias for `tex.search.documents(q)`. -Tex's `/v3/*` and `/v4/*` paths accept the **raw API key** as Bearer (no JWT exchange) for full Supermemory compatibility — the SDK routes them automatically. +Tex's `/v3/*` and `/v4/*` paths accept the raw API key as Bearer for Supermemory compatibility. The SDK routes those calls automatically. ## Exclusive Tex features -These are Tex-only verbs you can adopt incrementally: +These Tex-only verbs can be adopted one at a time: | Verb | What it adds | | --- | --- | @@ -60,15 +60,15 @@ These are Tex-only verbs you can adopt incrementally: Existing `add` / `search` / `profile` calls keep working unchanged. - Replace your conversation-history pattern with `tex.conversations.remember` + `tex.recall` for stronger temporal awareness and confidence scoring. See [memory model](/concepts/memory-model). + Replace your conversation-history pattern with `tex.conversations.remember` + `tex.recall` for stronger temporal awareness and confidence scoring. The mental model is in [How memory works](/concepts/memory-model). ## Behavioral differences -- **Auth.** Supermemory uses the API key as Bearer. Tex does the same on `/v3/*` and `/v4/*` (compat paths) but exchanges to JWT for Tex-native paths. The SDK handles both transparently. -- **Retention.** Persistent on both. No "free tier expires" wipe on Tex — your memory stays until you delete it. -- **Pricing.** Tex bills on `tokens_in` / `tokens_out`, not document count. Chatty workloads are usually cheaper; document-heavy ones — measure with `tex.usage.summary()` after a representative day. +- **Auth.** Supermemory uses the API key as Bearer. Tex does the same on `/v3/*` and `/v4/*`, but exchanges to JWT for Tex-native paths. The SDK handles both. +- **Retention.** Memory is persistent. Tex does not wipe free-tier memory on a timer. It stays until you delete it. 
+- **Pricing.** Tex bills on `tokens_in` / `tokens_out`, not document count. For document-heavy workloads, measure with `tex.usage.summary()` after a representative day. - **Profiles.** Tex's `profile` aggregates over `container_tag` like Supermemory's. Backfill works without changes. If you hit a Supermemory verb that doesn't behave identically, file an issue. diff --git a/quickstart.mdx b/quickstart.mdx index 012fb78..0c3be6f 100644 --- a/quickstart.mdx +++ b/quickstart.mdx @@ -1,19 +1,25 @@ --- title: "Quickstart" -description: "From zero to a working remember + recall: install tex-sdk, set TEX_API_KEY, run one script, read the scores." +description: "Install tex-sdk, set TEX_API_KEY, store one turn, recall it, and read the score." icon: "rocket" --- ## Goal -One script: `remember` a turn, `recall` it with a question, print scores and token usage. A few minutes with Python 3.9+. +In a few minutes, you will run one Python script that does the core Tex flow: + +1. Store a turn with **`remember`**. +2. Ask a question with **`recall`**. +3. Print the matching memory, confidence, and token usage. + +You need **Python 3.9+**. Open [app.getmetacognition.com/signup](https://app.getmetacognition.com/signup), create an account, and copy the key shown once. - You only see the full key at creation time. Store it in a password manager or secret store now—rotating later is easy, guessing later is not. + You only see the full key when you create it. Store it now in a password manager or secret store. @@ -32,7 +38,7 @@ One script: `remember` a turn, `recall` it with a question, print scores and tok ``` - You need **Python ≥ 3.9**. On PyPI the package is **`tex-sdk`**; in code you **`import tex`**. + On PyPI the package is **`tex-sdk`**. In Python you **`import tex`**. @@ -64,31 +70,37 @@ One script: `remember` a turn, `recall` it with a question, print scores and tok ``` - You should see the shellfish line with a numeric score, plus `confidence` and token usage. 
If you get `AuthenticationError`, your key or `base_url` is wrong—start with [Troubleshooting](/troubleshooting). + You should see the shellfish line with a score, plus `confidence` and token usage. If you get `AuthenticationError`, check your key and `base_url`, then use [Troubleshooting](/troubleshooting). ## Next - - - Use `python-dotenv` or your framework’s loader. Keep `.env` out of git. In production, inject `TEX_API_KEY` from the same secret store you use for every other third-party key. - +Once the script works, decide how you want to load secrets and whether you want to stay on the SDK. + +### Load the key from a `.env` file + +Use `python-dotenv` or your framework's loader. Keep `.env` out of git. In production, load `TEX_API_KEY` from your normal secret store. - - Read [REST API overview](/api-reference/overview): exchange the API key for a JWT, then call ingestion and recall with `Authorization: Bearer …`. The SDK exists so you do not write that refresh loop by hand. - - +### Call the API without the SDK + +If you do not use the SDK, first exchange your API key for an **access token** and **refresh token**. Send `Authorization: Bearer ...` on ingest and recall. Refresh the access token when it expires. The [REST API overview](/api-reference/overview) lists the endpoints. + +The SDK does this for you. ## Reads + + If **`confidence`** is low but you know the memory exists, try **`mode="deep"`** once, raise **`top_k`**, or ask the question closer to the stored wording. [Recall and ranking](/concepts/retrieval) explains the knobs. 
+ + | Goal | Page | | --- | --- | | Understand what gets stored | [How memory works](/concepts/memory-model) | | Tune recall quality | [Recall and ranking](/concepts/retrieval) | | Ship real users | [Scopes and multi-tenancy](/concepts/scopes) | -| Drop behind a real API | [Production chatbot (FastAPI)](/recipes/fastapi) | +| Put a backend in front of a UI | [Production chatbot (FastAPI)](/recipes/fastapi) | | Production errors | [Errors and retries](/sdk/errors) | diff --git a/recipes/azure-openai-rag.mdx b/recipes/azure-openai-rag.mdx index c4d6914..434f3b3 100644 --- a/recipes/azure-openai-rag.mdx +++ b/recipes/azure-openai-rag.mdx @@ -1,10 +1,10 @@ --- title: "RAG on Azure OpenAI" -description: "You pair Tex recall with Azure GPT-4o answers—same loop works for any chat-completions API if you swap the client." +description: "Use Tex recall with Azure OpenAI chat completions." icon: "cloud" --- -Recall → answer → remember. Same loop on OpenAI direct or Anthropic—swap the client. +This recipe uses the same flow as the [FastAPI recipe](/recipes/fastapi). Recall memory, answer with Azure OpenAI, then store the new turn. @@ -96,8 +96,8 @@ Recall → answer → remember. Same loop on OpenAI direct or Anthropic—swap t -## Harder stuff +## Production notes -- **You cite sources** — pass hit ids into the prompt and ask the model to quote `[mem:]` so you can deep-link later. -- **You retry deep recall** — if `confidence < 0.4`, you call `tex.recall(..., mode="deep")` once before answering. -- **You stream** — you set `stream=True`, and you enqueue `remember` in a background worker so the stream starts instantly. +- **Citations:** pass hit ids into the prompt and ask the model to quote `[mem:]`. Use that id to link answers back to memory. +- **Weak recall:** if `confidence < 0.4`, call `tex.recall(..., mode="deep")` once before answering. +- **Streaming:** set `stream=True`, then enqueue `remember` in a background worker so the stream can start quickly. 
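The weak-recall note in the Azure recipe can be sketched as a small helper. This is a minimal sketch, not part of the SDK: `recall_with_deep_retry` and its `recall` parameter are hypothetical names. It assumes only what the recipe itself shows, namely that `tex.recall(...)` accepts `mode="deep"` and returns an object with a `confidence` attribute.

```python
from typing import Any, Callable


def recall_with_deep_retry(
    recall: Callable[..., Any],
    q: str,
    session_id: str,
    top_k: int = 5,
    threshold: float = 0.4,
) -> Any:
    """Run recall once, then retry a single time in deep mode if confidence is weak.

    `recall` is any callable that takes the same keyword arguments as
    `tex.recall` (hypothetical helper; pass `tex.recall` in real code).
    """
    result = recall(q=q, session_id=session_id, top_k=top_k)
    if result.confidence < threshold:
        # Exactly one deep retry, so a weak match can never loop.
        result = recall(q=q, session_id=session_id, top_k=top_k, mode="deep")
    return result
```

Passing the recall function in as an argument keeps the helper easy to test with a fake. In an app you would likely call `recall_with_deep_retry(tex.recall, q=user_msg, session_id=sid)` before building the prompt.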
diff --git a/recipes/fastapi.mdx b/recipes/fastapi.mdx index 8cf77ad..7546cc5 100644 --- a/recipes/fastapi.mdx +++ b/recipes/fastapi.mdx @@ -1,10 +1,16 @@ --- title: "Production chatbot (FastAPI)" -description: "You run a small FastAPI service with Tex on every turn—one client per process, recall, generate, remember." +description: "Build a FastAPI chat route that recalls memory, calls your model, and stores the new turn." icon: "server" --- -One `/chat` route: recall → your LLM → remember. One cached `Tex` per process so you are not reconnecting every hit. +This recipe puts the quickstart loop behind one HTTP route: + +1. Recall memory for the incoming message. +2. Call your model. +3. Store the user and assistant turns. + +Create one **`Tex`** client per process and reuse it across requests. ## Layout @@ -25,11 +31,11 @@ One `/chat` route: recall → your LLM → remember. One cached `Tex` per proces └── chat.py # /chat route ``` - You can rename `app/`—just keep the import paths consistent in `uvicorn`. + Rename `app/` if you want. Keep the import paths consistent in `uvicorn`. - You read secrets from the environment and construct Tex **once**: + Read secrets from the environment and construct Tex **once**: ```python deps.py from functools import cache @@ -47,11 +53,11 @@ One `/chat` route: recall → your LLM → remember. One cached `Tex` per proces ) ``` - You call `tex_client()` inside FastAPI `Depends(...)` so every route shares the same pool. + Use `tex_client()` inside FastAPI `Depends(...)` so every route shares the same connection pool. - You derive `session_id` from headers + body, recall with a small `top_k`, swallow quota/timeouts, then remember both sides of the turn: + Derive `session_id` from the user and the chat. Recall with a small `top_k`. If Tex times out or quota is exhausted, answer without memory. 
Then store both sides of the turn: ```python chat.py from datetime import datetime, timezone @@ -97,11 +103,11 @@ One `/chat` route: recall → your LLM → remember. One cached `Tex` per proces return {"answer": answer} ``` - You swap `your_llm.complete(...)` for whatever stack you already use (OpenAI, Azure, local, etc.). + Replace `your_llm.complete(...)` with your model call. - You mount the router once: + Mount the router once: ```python main.py from fastapi import FastAPI @@ -113,20 +119,20 @@ One `/chat` route: recall → your LLM → remember. One cached `Tex` per proces - You export your key and launch uvicorn: + Export your key and launch uvicorn: ```bash export TEX_API_KEY=tex_live_... uvicorn app.main:app --reload ``` - You hit `POST /chat` with JSON `{"text":"...","session_id":"..."}` and header `x-user-id`. + Send `POST /chat` with JSON `{"text":"...","session_id":"..."}` and header `x-user-id`. ## Full files -If you prefer one copy block, you can still paste the trio together: +If you prefer one copy block, paste these files: ```python deps.py @@ -196,45 +202,43 @@ app.include_router(router) ``` -## Prod tweaks +## Production tweaks - - - Holding the HTTP request open for `remember` adds ~100–250ms tail latency. You enqueue a background task instead: +### Run `remember` in the background - ```python - from fastapi import BackgroundTasks +Do not make the user wait for **`remember`**. Enqueue it in the background: - @router.post("/chat") - def chat( - body: ChatBody, - bg: BackgroundTasks, - x_user_id: str = Header(...), - tex: Tex = Depends(tex_client), - ): - # ... recall + answer ... - bg.add_task( - tex.conversations.remember, - session_id=sid, - turns=[user_turn, assistant_turn], - ) - return {"answer": answer} - ``` - +```python +from fastapi import BackgroundTasks + +@router.post("/chat") +def chat( + body: ChatBody, + bg: BackgroundTasks, + x_user_id: str = Header(...), + tex: Tex = Depends(tex_client), +): + # ... recall + answer ... 
+ bg.add_task( + tex.conversations.remember, + session_id=sid, + turns=[user_turn, assistant_turn], + ) + return {"answer": answer} +``` - - You set `Tex(timeout=2.0)` and catch `APITimeoutError` so a slow recall never blocks your entire generation window. - +### Bound recall latency - - ```python - @app.get("/healthz") - def healthz(tex: Tex = Depends(tex_client)): - try: - tex.usage.today() - return {"ok": True} - except Exception as e: - return {"ok": False, "error": str(e)}, 503 - ``` - - +Set **`Tex(timeout=2.0)`** and catch **`APITimeoutError`**. If recall is slow, answer without memory instead of blocking the whole chat request. + +### Add a health probe + +```python +@app.get("/healthz") +def healthz(tex: Tex = Depends(tex_client)): + try: + tex.usage.today() + return {"ok": True} + except Exception as e: + return {"ok": False, "error": str(e)}, 503 +``` diff --git a/recipes/langchain.mdx b/recipes/langchain.mdx index c5d8db9..445de01 100644 --- a/recipes/langchain.mdx +++ b/recipes/langchain.mdx @@ -1,15 +1,17 @@ --- title: "LangChain agents with memory" -description: "You give LangChain a Tex-backed tool, or you inject recall yourself before the chain—pick what matches how much autonomy you want." +description: "Add Tex memory to LangChain by injecting recall before the chain or exposing recall as a tool." icon: "link" --- -Two setups: +There are two common ways to use Tex with LangChain. -| | | +Most chat apps should recall memory before the chain runs. Agents that choose their own steps can receive Tex as tools. + +| Pattern | When to use | | --- | --- | -| **Inject** | You recall before the chain; most chat apps. | -| **Tools** | Agent calls `recall` when it wants; heavier, flexible. | +| **Inject** | Your code recalls once per user message. | +| **Tools** | The model decides when to read or write memory. 
| ## Inject (default) @@ -116,12 +118,12 @@ Two setups: - You merge additional LangChain tools into the same `tools=[...]` list whenever your agent needs them—the Tex tools behave like every other tool. + Add other LangChain tools to the same `tools=[...]` list. The Tex tools behave like normal tools. -## Vs `BaseChatMemory` +## Compared to `BaseChatMemory` -LangChain buffers keep *everything*; Tex retrieves *top‑k*. You delete sliding-window hacks and stop blowing context windows. +LangChain buffers keep history in process. Tex stores memory outside the process and returns the top matches for the current question. That keeps prompts smaller and survives deploys. ```python Before @@ -136,8 +138,8 @@ chain = ConversationChain(llm=llm, memory=memory) hits = tex.recall(q=user_msg, session_id=sid, top_k=5) prompt = stitch(hits.hits.turns, user_msg) answer = llm.invoke(prompt) -tex.conversations.remember(session_id=sid, turns=[...]) # fill like your prod code +tex.conversations.remember(session_id=sid, turns=[...]) # use your normal turn format ``` -You want the full playbook? Read [Migrating from LangChain memory](/migration/from-langchain-memory). +For the migration details, continue to [Migrating from LangChain memory](/migration/from-langchain-memory). diff --git a/recipes/multi-tenant-saas.mdx b/recipes/multi-tenant-saas.mdx index 6093057..09e05ec 100644 --- a/recipes/multi-tenant-saas.mdx +++ b/recipes/multi-tenant-saas.mdx @@ -1,12 +1,12 @@ --- title: "Multi-tenant SaaS pattern" -description: "You fan out many customers on one Tex org, or you mint one key per customer—pick the pattern that matches your billing story." +description: "Choose one shared Tex org or one Tex org per customer." icon: "building" --- -You isolate tenants two ways. Most teams stay on **Pattern A**. +There are two common ways to isolate tenants. Most teams should start with **Pattern A**. 
-## A — one key, bake user into `session_id` +## A - one key, put the user in `session_id` @@ -53,7 +53,7 @@ You isolate tenants two ways. Most teams stay on **Pattern A**. return {"answer": answer} ``` - You swap `your_llm(...)` for your stack; you list the turns you already send today. + Replace `your_llm(...)` with your model call. Reuse the same turn format you already store. @@ -65,10 +65,10 @@ You isolate tenants two ways. Most teams stay on **Pattern A**. | Isolation | Strong, as long as you do not collide `session_id` | - The SDK accepts a constructor `user_id`, but **scopes are per client instance today**. If you spun up one `Tex` per end-user you would thrash TLS pools. Until per-call `user_id` ships in SDK **1.2**, you keep encoding the tenant into `session_id`. + The SDK accepts `user_id` in the constructor, but scopes are per client instance today. Creating one `Tex` client per end user would waste connection pools. Until per-call `user_id` ships in SDK **1.2**, keep the tenant in `session_id`. -## B — one key per customer org +## B - one key per customer org @@ -98,7 +98,7 @@ You isolate tenants two ways. Most teams stay on **Pattern A**. return Tex(api_key=row.tex_api_key, base_url=os.environ["TEX_BASE_URL"]) ``` - You cache instances in a TTL map (~1h) so warm connections stick around without leaking every user forever. + Cache instances in a TTL map for about 1 hour. That keeps warm connections without keeping every customer client forever. @@ -109,22 +109,22 @@ You isolate tenants two ways. Most teams stay on **Pattern A**. | Dashboard | Each customer can log into Tex directly | | Isolation | Hard boundary at org level | -## Which one +## Which pattern to choose - You ship an app *on top of* Tex—shared infra, simplest ops, you own metering. + You ship an app on top of Tex. Infrastructure is shared, ops are simpler, and metering stays in your product. - You *resell* Tex and customers expect their own bill + console. 
+ You resell Tex and customers expect their own bill and console. ## Shared quota (A only) -Daily quotas are **per Tex org**. Under Pattern A every user shares **your** quota—one noisy tenant can starve everyone. +Daily quotas are **per Tex org**. Under Pattern A, every user shares **your** quota. One noisy tenant can affect everyone. -Mitigations **you** layer in: +Add these controls in your own app: - You track per-user bytes/tokens yourself (`usage` is on every response). - You soft-cap heavy users (for example switch off memory after they consume 10% of your daily budget). diff --git a/recipes/slack-bot.mdx b/recipes/slack-bot.mdx index 149405e..edb5e20 100644 --- a/recipes/slack-bot.mdx +++ b/recipes/slack-bot.mdx @@ -1,10 +1,12 @@ --- title: "Slack bot with channel memory" -description: "You run a Slack Bolt app that remembers every channel message and answers @mentions with Tex recall." +description: "Build a Slack Bolt app that remembers channel messages and answers mentions with Tex recall." icon: "hashtag" --- -You mirror each Slack channel into its own Tex `session_id`. You **listen** for plain messages to `remember`, and you **listen** for app mentions to `recall` + reply. +Give each Slack channel its own Tex **`session_id`**. Save normal messages with **`remember`**. When someone mentions the bot, use **`recall`** to answer from that channel's memory. + +This uses the same isolation pattern as [Scopes and multi-tenancy](/concepts/scopes). @@ -14,7 +16,7 @@ You mirror each Slack channel into its own Tex `session_id`. You **listen** for - You load classic bot + app tokens (socket mode shown here): + Load the bot and app tokens. This example uses socket mode: ```bash SLACK_BOT_TOKEN=xoxb-... @@ -25,7 +27,7 @@ You mirror each Slack channel into its own Tex `session_id`. 
You **listen** for - You ignore bot spam, remember human text, and answer mentions with recall: + Ignore bot messages, remember human text, and answer mentions with recall: ```python bot.py import os @@ -60,7 +62,7 @@ You mirror each Slack channel into its own Tex `session_id`. You **listen** for def answer(event, say): query = event["text"].split(">", 1)[-1].strip() if not query: - say("Ask me something — I'll dig through this channel's memory.") + say("Ask me something and I'll search this channel's memory.") return hits = tex.recall(q=query, session_id=session_for(event["channel"]), top_k=5) @@ -82,7 +84,7 @@ You mirror each Slack channel into its own Tex `session_id`. You **listen** for python bot.py ``` - You type in channel, then `@YourBot what did we decide?` to verify recall. + Type in a channel, then ask `@YourBot what did we decide?` to verify recall. @@ -92,4 +94,4 @@ You mirror each Slack channel into its own Tex `session_id`. You **listen** for | --- | --- | | **Channel vs DM** | You keep `slack-{channel}` for shared rooms; you append `-{user}` if you need private scratch space. | | **Noise** | You filter joins, uploads, reactions **before** `remember` so you do not pay tokens for junk. | -| **Latency** | You react with `:thinking_face:` immediately—`recall` can take 1–3s — then you post the final text. | +| **Latency** | React with `:thinking_face:` right away. `recall` can take 1-3s. Post the final answer after recall finishes. | diff --git a/recipes/streamlit.mdx b/recipes/streamlit.mdx index 9478e4b..e5337ff 100644 --- a/recipes/streamlit.mdx +++ b/recipes/streamlit.mdx @@ -1,10 +1,10 @@ --- title: "Streamlit chat UI" -description: "You build one Streamlit page that recalls from Tex, streams GPT output, and shows which memories fired." +description: "Build one Streamlit page that recalls from Tex, streams GPT output, and shows the memories used." 
icon: "desktop" --- -You get a **browser-friendly demo** that survives reruns because Tex—not `st.session_state`—owns long-term memory. You still keep lightweight UI state only for what Streamlit redraw needs. +This recipe builds a small browser demo. Streamlit can rerun the script often, but Tex keeps the long-term memory. `st.session_state` only stores UI state for the current browser session. @@ -14,7 +14,7 @@ You get a **browser-friendly demo** that survives reruns because Tex—not `st.s - You wrap `Tex` and `OpenAI` in `@st.cache_resource` so Streamlit does not recreate TLS pools every interaction: + Wrap `Tex` and `OpenAI` in `@st.cache_resource` so Streamlit does not recreate clients on every interaction: ```python import os @@ -38,7 +38,7 @@ You get a **browser-friendly demo** that survives reruns because Tex—not `st.s - You store a stable `sid` in session state; optionally you read `?uid=` from the query string so QA can fork personas quickly: + Store a stable `sid` in session state. You can also read `?uid=` from the query string to test different users: ```python if "sid" not in st.session_state: @@ -51,7 +51,7 @@ You get a **browser-friendly demo** that survives reruns because Tex—not `st.s - You call `tex.usage.today()` for a quick quota sanity check during demos: + Call `tex.usage.today()` to show current quota usage during demos: ```python with st.sidebar: @@ -66,7 +66,7 @@ You get a **browser-friendly demo** that survives reruns because Tex—not `st.s - You paint historic bubbles from `st.session_state.messages`, then watch `st.chat_input`: + Render messages from `st.session_state.messages`, then handle `st.chat_input`: ```python for m in st.session_state.messages: @@ -126,13 +126,13 @@ You get a **browser-friendly demo** that survives reruns because Tex—not `st.s streamlit run app.py ``` - You open `http://localhost:8501/?uid=alice` and `?uid=bob` in two tabs to prove isolation. 
+ Open `http://localhost:8501/?uid=alice` and `?uid=bob` in two tabs to test isolation. ## One file -If you want one block to paste, use this full `app.py` (same logic as the steps above): +If you want one block to paste, use this full `app.py`: ```python app.py import os @@ -216,8 +216,10 @@ if prompt := st.chat_input("Talk to me…"): ] ``` -## Why bother +## Why Streamlit + Tex -- **You survive refreshes** — Streamlit session dies; Tex does not. -- **You skip Redis/Postgres** for early demos. -- **You debug recall visually** — expander shows exactly what the model saw. +This pattern pairs a disposable browser UI with durable memory in Tex. + +- **Refreshes keep memory:** Streamlit session state can reset; Tex memory stays. +- **No Redis or Postgres for demos:** Tex stores the long-term memory. +- **Visible recall:** the expander shows exactly what the model saw. diff --git a/sdk/client.mdx b/sdk/client.mdx index 6316ad1..e1c43ea 100644 --- a/sdk/client.mdx +++ b/sdk/client.mdx @@ -1,11 +1,13 @@ --- title: "Configure the client" -description: "Constructor arguments, TEX_API_KEY and TEX_BASE_URL, HTTP/2, timeouts, and how the client lives across requests." +description: "Set API keys, base URL, timeouts, retries, and client lifetime." icon: "sliders" --- ## `Tex(...)` constructor +Create one **`Tex`** client and reuse it. The same client handles **`remember`**, **`recall`**, token refresh, retries, and usage calls. + ```python Tex( api_key: str | None = None, @@ -33,7 +35,7 @@ Tex( - Default `org_id` for all requests. Optional — the SDK auto-fills from your JWT. + Default `org_id` for all requests. Optional. The SDK auto-fills it from your JWT. @@ -41,19 +43,19 @@ Tex( - A default `session_id`. You usually pass this per-call instead. + Default `session_id`. Most apps pass this per call. Bring-your-own JWT. If set, the SDK skips the `api_key` exchange. - With BYO-JWT, the SDK does **not** auto-fill `org_id` / `user_id` from `/auth/verify`. 
Pass them explicitly to the constructor — `remember` and `recall` need them in the request `scope`. + With BYO-JWT, the SDK does **not** auto-fill `org_id` / `user_id` from `/auth/verify`. Pass them to the constructor. `remember` and `recall` need them in the request `scope`. - Companion to `access_token` — used for transparent refresh on 401. + Companion to `access_token`. Used for refresh on 401. @@ -61,7 +63,7 @@ Tex( - Retries on transient errors (408 / 429 / 5xx / network). Exponential backoff. + Retries transient errors: 408, 429, 5xx, and network failures. @@ -84,7 +86,7 @@ TEX_BASE_URL=https://api.getmetacognition.com ## Lifecycle -The client maintains a pooled `httpx.Client` under the hood. **Construct once, reuse everywhere.** +The client keeps a pooled `httpx.Client` under the hood. **Construct once, reuse everywhere.** @@ -116,7 +118,7 @@ The client maintains a pooled `httpx.Client` under the hood. **Construct once, r tex = Tex(api_key=...) # opens a new TLS session every request return tex.recall(...) ``` - This costs you the TCP+TLS handshake on every call. + This opens a new TCP and TLS connection on every call. @@ -124,7 +126,7 @@ The client maintains a pooled `httpx.Client` under the hood. **Construct once, r The client is **thread-safe for read traffic** (`recall`, `usage.today`). -For write traffic under high RPS, push to a worker pool: +For high write volume, push **`remember`** calls to a worker pool: ```python from concurrent.futures import ThreadPoolExecutor @@ -134,7 +136,7 @@ def remember_async(turns, sid): pool.submit(tex.conversations.remember, turns=turns, session_id=sid) ``` -A native async client is on the roadmap. Open an issue if you need it sooner. +A native async client is on the roadmap. ## Closing @@ -142,7 +144,7 @@ A native async client is on the roadmap. Open an issue if you need it sooner. tex.close() # closes the underlying httpx.Client ``` -Or use the context manager pattern (above) — `__exit__` calls `close()`. 
+Or use the context manager pattern above. `__exit__` calls `close()`. Push conversation turns into memory. diff --git a/sdk/conversations-remember.mdx b/sdk/conversations-remember.mdx index dd1c102..ab682b8 100644 --- a/sdk/conversations-remember.mdx +++ b/sdk/conversations-remember.mdx @@ -1,9 +1,11 @@ --- title: "Remember conversation turns" -description: "Persist turns (and optional observations) to a session—payload shape, timestamps, and what happens on the wire." +description: "Store turns in a session, with optional observations and metadata." icon: "pen-to-square" --- +Use **`tex.conversations.remember`** to store new conversation turns. This is the write side of the loop from the [Quickstart](/quickstart). + ```python RememberResponse = tex.conversations.remember( turns: list[dict], @@ -16,7 +18,7 @@ RememberResponse = tex.conversations.remember( ## Parameters - A list of turn dicts. **At least one turn is required** (server rejects empty lists with `422`). Each turn: + List of turn dicts. **At least one turn is required**. Empty lists return `422`. Each turn: ```python { @@ -33,17 +35,17 @@ RememberResponse = tex.conversations.remember( - Free-form metadata attached to the batch. Searchable in deep mode. + Free-form metadata attached to the batch. Deep mode can use it during search. ## Returns - Stable identifier for this write. Use it to correlate with logs. Always present in production. + Stable identifier for this write. Use it to match SDK logs with server logs. - IDs of the active-memory fragments. These are recallable immediately. + IDs of the active-memory fragments. These are recallable right away. @@ -97,7 +99,7 @@ tex.conversations.remember(session_id=f"chat-{conv_id}", turns=turns) ### Pre-extracted observations -If you've already extracted structured facts on your side (e.g. 
with your own LLM call), pass them inline to skip Tex's extraction step: +If your app already extracted structured facts, pass them inline: ```python turns = [{ @@ -115,26 +117,26 @@ tex.conversations.remember(session_id="chat-1", turns=turns) ## Behavior - - The call returns after active memory persists (~150ms). The turn is recallable immediately. + + The call returns after active memory is saved, usually around 150ms. The turn is recallable right away. - - Observations and entities are extracted and indexed in the background. They become available in subsequent recalls within seconds to a minute. + + Tex extracts observations and entities in the background. They appear in later recalls. ## Best practices - **Batch.** Pass dozens of turns in one call. Don't loop one-per-turn. -- **UTC ISO 8601** for timestamps (`...Z` suffix). Avoids timezone surprises in temporal queries. +- **Use UTC ISO 8601** for timestamps (`...Z` suffix). This keeps temporal queries clear. - **Skip system messages.** They consume tokens and add noise to recall. -- **Background-task `remember`.** Users shouldn't wait for it on the request path — push to Celery / RQ / a `BackgroundTasks` queue. +- **Run `remember` off the request path.** Use Celery, RQ, or a `BackgroundTasks` queue when users should not wait. ## Idempotency -Tex computes a stable hash per turn (text + timestamp + role). Re-sending the same turn is a no-op — there's no double-counting in active memory or in your token bill. +Tex computes a stable hash per turn from text, timestamp, and role. Re-sending the same turn is a no-op. It does not create duplicate active memory or double bill the same turn. -This means you can safely retry a `remember` call after a network blip without worrying about duplicates. +This makes retries safe after a network blip. Pull the relevant slice of memory. 
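Note on the idempotency paragraph in `sdk/conversations-remember.mdx` above: it says Tex hashes each turn from text, timestamp, and role, but the patch does not say which hash function or field encoding the server uses. The sketch below shows one way such turn-hash dedup could work. SHA-256 over a canonical JSON encoding is an assumption, the turn keys `text`, `timestamp`, and `role` are assumed from the prose, and `turn_key` and `dedupe_turns` are hypothetical names, not part of the SDK.

```python
import hashlib
import json


def turn_key(turn: dict) -> str:
    """Build a stable key from the three fields the docs name: text, timestamp, role."""
    # Assumption: SHA-256 over canonical JSON. The real server-side scheme is not documented.
    canonical = json.dumps(
        [turn.get("text"), turn.get("timestamp"), turn.get("role")],
        ensure_ascii=False,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def dedupe_turns(turns: list[dict], seen: set[str]) -> list[dict]:
    """Return only the turns whose key is not in `seen`, recording new keys as it goes."""
    fresh = []
    for turn in turns:
        key = turn_key(turn)
        if key not in seen:
            seen.add(key)
            fresh.append(turn)
    return fresh
```

Under these assumptions, re-sending an identical turn yields the same key and is dropped, while changing any of the three fields produces a new key. That matches the retry-safe behavior the section describes.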
diff --git a/sdk/errors.mdx b/sdk/errors.mdx index dd8744e..44689f8 100644 --- a/sdk/errors.mdx +++ b/sdk/errors.mdx @@ -4,7 +4,9 @@ description: "Exception types, which status codes retry automatically, and what icon: "triangle-exclamation" --- -All Tex exceptions inherit from `tex.TexError`. You almost always want to catch one of: +When something fails, start with the exception class. If you need a symptom-based guide, use [Troubleshooting](/troubleshooting). + +All Tex exceptions inherit from `tex.TexError`. Most apps catch one of these: | Class | Status | Inherits | When | | --- | --- | --- | --- | @@ -18,25 +20,25 @@ All Tex exceptions inherit from `tex.TexError`. You almost always want to catch | [`InternalServerError`](#internalservererror) | 5xx | `APIStatusError` | Our problem; SDK retried | | [`APITimeoutError`](#apitimeouterror) | — | `APIConnectionError` → `APIError` | Network or server too slow | | [`APIConnectionError`](#apiconnectionerror) | — | `APIError` | DNS, TLS, connection reset | -| [`APIResponseValidationError`] | — | `APIError` | Server returned an unexpected shape | +| [`APIResponseValidationError`] | - | `APIError` | Server returned an unexpected response | `TexHTTPError` (alias of `APIStatusError`) and `TexAuthError` (alias of `AuthenticationError`) are kept for backward compatibility. 
-## Common shape +## Common fields ```python # All TexError subclasses e.message # human-readable -# APIStatusError subclasses (everything with an HTTP status — see "Inherits" column above) +# APIStatusError subclasses (everything with an HTTP status) e.status_code # int -e.request_id # X-Correlation-ID — quote in support tickets -e.details # dict — server's full JSON, may include field-level errors +e.request_id # X-Correlation-ID; include this in support tickets +e.details # dict; server JSON, may include field errors e.response_text # raw response body, capped at 2KB ``` - `APITimeoutError` and `APIConnectionError` are **network errors**, not HTTP errors — they don't have `status_code`, `request_id`, `details`, or `response_text` because the request never produced a response. Catch them separately. + `APITimeoutError` and `APIConnectionError` are **network errors**, not HTTP errors. They do not have `status_code`, `request_id`, `details`, or `response_text` because the request never produced a response. Catch them separately. ```python @@ -60,47 +62,47 @@ except BadRequestError as e: ### `BadRequestError` -Returned when a payload is malformed. Common causes: +Raised when a payload is malformed. Common causes: - Missing required field on a turn (e.g. no `text`) - Invalid `mode` value on `recall` -- Invalid `session_id` shape (must be a string) +- Invalid `session_id` value. It must be a string. -`e.details` includes a Pydantic-style `loc` list that tells you exactly which field broke. +`e.details` includes a Pydantic-style `loc` list that points to the bad field. ### `AuthenticationError` -Status 401. The SDK already attempted a JWT refresh (once) before raising. +Status 401. The SDK already tried one JWT refresh before raising. 
```python try: tex.recall(q=q, session_id=sid) except AuthenticationError as e: if "Invalid API key" in e.message: - # Real bad-key situation + # Bad API key rotate_key_alarm() else: - # JWT refresh failed — likely revoked + # JWT refresh failed, likely revoked notify_user("please log in again") ``` ### `PermissionDeniedError` -Status 403. The credential is valid but lacks scope. Mostly relevant for keys you've scoped down (admin keys, read-only keys). For default keys this shouldn't fire. +Status 403. The credential is valid but lacks scope. This mostly matters for scoped keys. Default keys should not hit this. ### `NotFoundError` -Status 404. You referenced something that doesn't exist — typically a stale `key_id` on `DELETE /me/api-keys/{id}`. +Status 404. You referenced something that does not exist. This is often a stale `key_id` on `DELETE /me/api-keys/{id}`. ### `UnprocessableEntityError` -Status 422. FastAPI's Pydantic validation rejected the payload. The SDK builds payloads for you, so this usually only fires when you've passed an unexpected type. +Status 422. FastAPI validation rejected the payload. The SDK builds payloads for you, so this usually means an argument has the wrong type. ### `RateLimitError` Status 429. The SDK retries on `429` like other transient codes (with exponential backoff and `Retry-After` honored), so by the time you see this exception the SDK has already exhausted retries. -For **daily-quota 429s** (the normal case), retries are futile until midnight UTC — set `max_retries=0` for paths where you'd rather fail fast and degrade: +For **daily-quota 429s**, retries will not help until midnight UTC. 
Set `max_retries=0` on paths where you would rather fail fast: ```python try: @@ -109,7 +111,7 @@ except RateLimitError as e: return generate_without_memory(q) # graceful degradation ``` -The `e.details` payload tells you which cap fired (`tokens_in_daily` or `tokens_out_daily`) and when it resets: +The `e.details` payload tells you which cap was exceeded (`tokens_in_daily` or `tokens_out_daily`) and when it resets: ```python e.details @@ -118,11 +120,11 @@ e.details ### `InternalServerError` -Status 5xx. The SDK already retried (default 2x with exponential backoff). If you still see this, file a ticket with `e.request_id`. +Status 5xx. The SDK already tried the request again with exponential backoff. If you still see this, file a ticket with `e.request_id`. ### `APITimeoutError` -The request didn't return within `timeout`. The SDK retries on timeout, but ultimately surfaces `APITimeoutError` if all attempts fail. +The request did not return within `timeout`. The SDK retries timeouts. If every attempt fails, it raises `APITimeoutError`. ```python try: @@ -133,7 +135,7 @@ except APITimeoutError: ### `APIConnectionError` -DNS, TLS, or socket-level failure. Same retry behavior as `APITimeoutError`. If you see this in production, check your egress proxy / firewall. +DNS, TLS, or socket-level failure. The retry behavior is the same as `APITimeoutError`. If you see this in production, check your egress proxy or firewall. ## Built-in retries @@ -149,15 +151,15 @@ Default: **2 retries** with exponential backoff (0.5s, 1s). Override: tex = Tex(api_key=..., max_retries=5) ``` -The `Retry-After` header is honored — if the server says wait 3 seconds, the SDK waits at least 3 (not capped at the backoff value). +The SDK honors `Retry-After`. If the server says wait 3 seconds, the SDK waits at least 3 seconds. - `429` on quota retries the same as other `429`s—but you are still over the cap when the retry lands, so `RateLimitError` still fires. 
If that waste bugs you, set `max_retries=0` on quota-sensitive paths. + Quota `429`s retry like other `429`s. The retry will still fail if you are over the cap. Set `max_retries=0` on quota-sensitive paths if you want to fail faster. ## Idempotency -`remember` is idempotent (turn-hash dedup). `recall` and `usage.*` are read-only. Safe to retry any of them. +`remember` is idempotent because Tex deduplicates turns by hash. `recall` and `usage.*` are read-only. It is safe to retry any of them. Direct HTTP integration without the SDK. diff --git a/sdk/installation.mdx b/sdk/installation.mdx index 5ace042..9661b48 100644 --- a/sdk/installation.mdx +++ b/sdk/installation.mdx @@ -1,13 +1,15 @@ --- title: "Install the SDK" -description: "Python 3.9+, pip or uv or poetry, verify the import, pin a version range in requirements." +description: "Install tex-sdk, verify the import, and pin a safe version range." icon: "download" --- ## Requirements -- Python ≥ 3.9 -- `pip`, `uv`, or `poetry`—pick whichever you already standardize on +- Python 3.9 or newer +- `pip`, `uv`, or `poetry` + +If you have not made a live **`remember`** and **`recall`** call yet, start with the [Quickstart](/quickstart). ## Install @@ -46,11 +48,13 @@ python -c "import tex; print(tex.__version__)" tex-sdk>=1.1.0,<2 ``` -For PEP 621 `pyproject.toml`, add `"tex-sdk>=1.1.0,<2"` under `[project].dependencies`. For Poetry, use `tex-sdk = "^1.1.0"` under `[tool.poetry.dependencies]` so minor releases flow in without a major bump surprising you. +For PEP 621 `pyproject.toml`, add `"tex-sdk>=1.1.0,<2"` under `[project].dependencies`. + +For Poetry, use `tex-sdk = "^1.1.0"` under `[tool.poetry.dependencies]`. -## HTTP/2 and grumpy proxies +## HTTP/2 and proxies -The SDK depends on `httpx[http2]` for multiplexing. If your corporate proxy strips HTTP/2, disable it when you construct the client: +The SDK uses `httpx[http2]`. 
If your proxy strips HTTP/2, disable it when you construct the client: ```python tex = Tex(api_key=..., http2=False) @@ -58,7 +62,7 @@ tex = Tex(api_key=..., http2=False) ## Types -Stubs ship with the package (`py.typed`). Mypy and Pyright pick them up automatically—you do not need a separate `types-` stub distribution. +Type stubs ship with the package (`py.typed`). Mypy and Pyright pick them up automatically. You do not need a separate `types-` package. ```python from tex import Tex, RecallResponse diff --git a/sdk/recall.mdx b/sdk/recall.mdx index 6d2bc2f..9d83f52 100644 --- a/sdk/recall.mdx +++ b/sdk/recall.mdx @@ -1,9 +1,13 @@ --- title: "Recall relevant context" -description: "Query memory with natural language, get ranked hits, confidence, and usage—all from the Python client." +description: "Query memory with natural language and get ranked hits, confidence, and usage." icon: "magnifying-glass" --- +Use **`tex.recall`** before you call your model. It returns the memory that best matches the current user message. + +For **`mode`**, **`top_k`**, and **`confidence`**, read [Recall and ranking](/concepts/retrieval). This page lists the Python fields. + ```python RecallResponse = tex.recall( q: str, @@ -16,29 +20,29 @@ RecallResponse = tex.recall( ``` - `recall` is callable on the client itself: `tex.recall(...)`. There's no `tex.recall.search(...)` indirection. + Call recall directly on the client: `tex.recall(...)`. There is no `tex.recall.search(...)`. ## Parameters - Natural-language query. Same shape as the user's most recent turn works well. + Natural-language query. The user's latest message usually works well. - Which session(s) to search. Use the same `session_id` you wrote with. + Session to search. Use the same `session_id` you wrote with. - Retrieval depth. See [Retrieval](/concepts/retrieval). + Retrieval depth. See [Recall and ranking](/concepts/retrieval). - Number of hits across all kinds. 
Defaults to **15** in `active` mode, **25** in `deep` mode. Server caps at **30** regardless of the value you send. + Number of hits across all kinds. Defaults to **15** in `active` mode and **25** in `deep` mode. The server caps the final value at **30**. - Adds a chronological list of relevant events to the response. + When true, the response includes a pre-rendered **`timeline`** string (not a structured list). ## Returns @@ -48,15 +52,15 @@ RecallResponse = tex.recall( - Atomic facts extracted from past turns (e.g. preferences, decisions). + Small facts extracted from past turns, such as preferences or decisions. - People / places / things linked across observations. + People, places, and things linked across observations. - Calibrated confidence in [0, 1]. P(hits relevant | confidence = c) ≈ c. + Calibrated confidence in [0, 1]. Higher means the returned memory is more likely to help. @@ -71,29 +75,29 @@ RecallResponse = tex.recall( `tokens_in` / `tokens_out` billed for this call. Always present in production. -### `RecallHit` shape (turns / observations) +### `RecallHit` fields (turns / observations) ```python @dataclass(frozen=True) class RecallHit: id: str | None # stable; persists across recalls text: str # matched content - score: float # raw relevance, 0.0–1.0 - kind: str # "turn" | "observation" | "entity" — defaults to "turn" + score: float # raw relevance, 0.0-1.0 + kind: str # "turn" | "observation" | "entity"; defaults to "turn" timestamp: str | None ``` -### `RecallEntity` shape (entities only) +### `RecallEntity` fields (entities only) ```python @dataclass(frozen=True) class RecallEntity: id: str | None - label: str # the entity's surface label (e.g. "Acme") + label: str # the entity label (e.g. "Acme") score: float ``` -`RecallEntity` is **not** the same shape as `RecallHit` — it has `label` instead of `text` and no `kind` / `timestamp`. +`RecallEntity` is **not** the same as `RecallHit`. 
It has `label` instead of `text`, and it does not have `kind` or `timestamp`. ## Examples @@ -135,7 +139,7 @@ if hits.timeline: print(hits.timeline) # a pre-rendered chronological summary string ``` -`timeline` is a free-form string that the server pre-renders by walking the temporal index for matched turns. Treat it as a paragraph to drop into a prompt, not a structured array. +`timeline` is a free-form string. Drop it into a prompt as text. Do not treat it like an array. ### Multi-source recall @@ -155,7 +159,7 @@ context = "\n".join(f"- {h.text}" for h in (bio.hits.turns + chat.hits.turns)) | `active` | 1.7s | 2.5s | Every interactive call | | `deep` | 3.5s | 6s | Periodic analysis, low-confidence retries | -Set `timeout=2.0` on the constructor for interactive paths to bound your tail latency. Catch `APITimeoutError` and degrade gracefully: +Set `timeout=2.0` on the constructor for interactive paths. Catch `APITimeoutError` and continue without memory: ```python try: diff --git a/sdk/usage.mdx b/sdk/usage.mdx index 2245856..be458be 100644 --- a/sdk/usage.mdx +++ b/sdk/usage.mdx @@ -1,16 +1,18 @@ --- title: "Inspect usage" -description: "Today's totals and monthly rollups—same fields as the dashboard." +description: "Read today's totals and monthly rollups from the SDK." icon: "chart-pie" --- +Use these methods to read org-wide usage totals. They are useful for dashboards, quota checks, and billing reports. + ```python tex.usage.today() # today's usage + daily quota tex.usage.summary() # current calendar month tex.usage.summary(month="2026-03") # specific month ``` -These calls don't count against your own quota — query them as often as you want. +These helpers are read-only. **They do not count toward your quota**. 
## `tex.usage.today()` @@ -94,23 +96,23 @@ df.set_index("month")[["tokens_in","tokens_out"]].plot.bar() ## Patterns -- **Per-response usage** is on every `remember` / `recall`: +- **Per-response usage** is on every `remember` and `recall`: ```python hits = tex.recall(q="...", session_id=sid) print(hits.usage.tokens_in, hits.usage.tokens_out) ``` -- **Quota-aware routing** — gate non-essential paths past 90%: +- **Quota-aware routing** - turn off non-essential memory paths past 90%: ```python if tex.usage.today().tokens_in_used > 0.9 * tex.usage.today().tokens_in_limit: return generate_without_memory(query) ``` - Cache `usage.today()` on your side (≤ 60s) — this is soft gating, not strict. + Cache `usage.today()` on your side for up to 60 seconds. Use it as a soft guard, not a strict limiter. -- **Usage charts** — call `summary(month="YYYY-MM")` for each of the last 6 months and render with whatever charting lib you use. +- **Usage charts** - call `summary(month="YYYY-MM")` for each of the last 6 months and render the results in your charting library. Every exception class and how to handle it. diff --git a/snippets/flow-visuals.mdx b/snippets/flow-visuals.mdx index e1b85a3..64a33a2 100644 --- a/snippets/flow-visuals.mdx +++ b/snippets/flow-visuals.mdx @@ -102,14 +102,14 @@ export const TokenRetryVisual = () => ( POST /auth/token-exchange
- Send your API key again. 200 means retry with the new token. 401 means the key is dead—you need a new + Send your API key again. 200 means retry with the new token. 401 means the key is dead and you need a new one or a proper login flow.

 In most SDK setups a single failed request can walk through refresh (and then exchange) before your code - surfaces an error—unless the full chain returns 401. + sees an error. Your code gets an error only when the full chain returns 401.

diff --git a/troubleshooting.mdx b/troubleshooting.mdx index cab97a5..2e38bf4 100644 --- a/troubleshooting.mdx +++ b/troubleshooting.mdx @@ -1,55 +1,53 @@ --- title: "Troubleshooting" -description: "Symptoms, likely causes, fixes, and what we need in a support email when you are stuck." +description: "Match an error or symptom, apply the fix, and collect the right details for support." icon: "wrench" --- - Scan the **symptom table** first and match the log line or exception you are seeing. Open the accordions below when you need more narrative around a fix. + Start with the symptom table. Find the error or behavior you see, then try the fix in the same row. ## Match your symptom | Symptom | Likely cause | Fix | | --- | --- | --- | -| `AuthenticationError: Invalid API key` on first call | Wrong key, revoked key, or wrong `base_url` | Mint a fresh key in the [dashboard](https://app.getmetacognition.com); confirm `TEX_BASE_URL` matches the environment you think you are hitting | -| `BadRequestError: 'scope' field required` | Raw REST call without the scaffolding the SDK adds | Use the SDK, or include `scope: {org_id, session_id}` yourself in the JSON body | +| `AuthenticationError: Invalid API key` on first call | Wrong key, revoked key, or wrong `base_url` | Mint a fresh key in the [dashboard](https://app.getmetacognition.com); confirm `TEX_BASE_URL` points to the right environment | +| `BadRequestError: 'scope' field required` | Raw REST call without the fields the SDK normally adds | Use the SDK, or include `scope: {org_id, session_id}` yourself in the JSON body | | `recall` returns zero hits | Brand-new session, enrichment still catching up, or query mismatch | Wait a second after `remember`; broaden `q`; try `mode="deep"` once to see if signal appears | | `recall.confidence` looks stuck low | Query does not overlap stored content | Rephrase closer to the stored wording; raise `top_k`; try `mode="deep"` | -| `RateLimitError` mid-day | Org exhausted daily quota | 
Wait for **00:00 UTC**, lower `top_k`, or trim noisy writes—see [Usage and billing](/concepts/usage-billing) | +| `RateLimitError` mid-day | Org exhausted daily quota | Wait for **00:00 UTC**, lower `top_k`, or trim noisy writes. See [Usage and billing](/concepts/usage-billing) | | Slow `recall` (> 5s) | `mode="deep"` or cold caches | Default to `mode="active"` in user-facing paths; warm the client on deploy | | `httpx.RemoteProtocolError` / HTTP/2 noise | Middlebox stripped HTTP/2 | Instantiate with `Tex(http2=False)` | -| Long `remember` blocks UX | You await ingestion on the hot request path | Push persistence to a worker—pattern in the [FastAPI recipe](/recipes/fastapi#production-tweaks) | +| Long `remember` blocks UX | You await ingestion on the hot request path | Push persistence to a worker. The [FastAPI recipe](/recipes/fastapi#production-tweaks) shows the pattern | | Flaky CI against prod | Shared org contention or dirty sessions | Give CI its own org and sweep data on a schedule | | Dashboard `last_used_at` looks stale | Browser cache | Hard refresh; the field updates within seconds of real traffic | ## Extra - - - SDK clients cache JWTs until expiry. After you deploy a new `TEX_API_KEY`, restart workers or recreate the client so they stop presenting tokens minted from the old secret. Confirm you did not typo `https://api.getmetacognition.com` in staging configs. - +### Auth still failing after you rotated keys - - Recall ranks by relevance, not chronological order—set `include_timeline=True` when you need strict time order for the model. If overlap is still weak, your `session_id` might not match the one you wrote with—re-read [Scopes and multi-tenancy](/concepts/scopes). - +SDK clients cache JWTs until expiry. After you deploy a new **`TEX_API_KEY`**, restart workers or recreate the client. That stops them from sending tokens created from the old key. Also confirm **`https://api.getmetacognition.com`** is spelled correctly in staging configs. 
- - Small jitter can come from rerank tie-breaks keyed by hashes. If spread exceeds ~0.1, capture `request_id` and file a ticket—something else is going on. - - +### You see hits but they feel unrelated + +Recall ranks by **relevance**, not chronological order. Set **`include_timeline=True`** when the model needs time order. If results still look wrong, check that you are using the same **`session_id`** for write and read. [Scopes and multi-tenancy](/concepts/scopes) explains the mapping. + +### Confidence swings between identical queries + +Small score changes can happen when candidates are close together. If the spread is bigger than **~0.1**, capture **`request_id`** and file a ticket. ## Filing a ticket - Copy `e.request_id` from any SDK exception—UUID-shaped, safe to share. + Copy `e.request_id` from any SDK exception. It is safe to share. Give us the approximate UTC time the call failed. - Include the method (`recall`, `remember`, …), `session_id`, and whether you hit REST or the SDK. + Include the method (`recall`, `remember`, etc.), `session_id`, and whether you hit REST or the SDK. Run `python -c "import tex; print(tex.__version__)"`. Scrub API keys or PII before you press send. @@ -76,6 +74,6 @@ import tex; print(tex.__version__) ## Things that look like bugs but are not -- **Hits sorted funny:** relevance ordering is intentional—use timeline mode when you need chronology. -- **Empty cross-session recall:** sessions are isolated until you design a scope strategy—see [Scopes and multi-tenancy](/concepts/scopes). +- **Hits are not chronological:** relevance ordering is intentional. Use timeline mode when you need chronology. +- **Empty cross-session recall:** sessions are isolated until you design a scope strategy. See [Scopes and multi-tenancy](/concepts/scopes). - **Identical queries, tiny score deltas:** expect minor movement; large swings merit a ticket.
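Note on the "Filing a ticket" steps in `troubleshooting.mdx` above: they ask for `e.request_id`, an approximate UTC time, the method and `session_id`, and the SDK version. A minimal sketch of gathering those fields in one place follows. `ticket_details` is a hypothetical helper, not an SDK function, and the dict keys are illustrative. It reads `request_id` with `getattr` because the error docs say network errors do not carry one.

```python
from datetime import datetime, timezone


def ticket_details(exc: Exception, method: str, session_id: str, sdk_version: str) -> dict:
    """Collect the fields the 'Filing a ticket' steps ask for (hypothetical helper)."""
    return {
        # Network errors have no request_id, so fall back to None.
        "request_id": getattr(exc, "request_id", None),
        # Approximate UTC time of the failure, taken when the exception is handled.
        "failed_at_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "method": method,
        "session_id": session_id,
        "transport": "sdk",
        "sdk_version": sdk_version,
        "error_class": type(exc).__name__,
    }
```

In an `except` block you might call it as `ticket_details(e, "recall", sid, tex.__version__)`. Scrub API keys and PII from the result before sending it, as the steps say.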