Skip to content

Latest commit

 

History

History
386 lines (316 loc) · 13.6 KB

File metadata and controls

386 lines (316 loc) · 13.6 KB

AGENTS.md — DataGems (instructions for Codex)

This file is the briefing packet for Codex, the AI coding agent developing DataGems.

DataGems is a full-stack TypeScript web app + API for generating synthetic datasets by making one LLM-style inference call per record. It uses Ratio1’s CStore for shared state and @ratio1/cstore-auth-ts for authentication, plus R1FS for large artifacts.


Deployment endpoint

  • Production app URL: https://datagems.app

0) Non‑negotiables

  • Use @ratio1/cstore-auth-ts for auth. All actions require auth (UI and API).
  • Persist everything in CStore/R1FS:
    • CStore: users (via cstore-auth-ts), job base records, per‑peer progress, metrics, user indexes.
    • R1FS: job details (schema + instructions) and per‑peer results.
  • Inference API uses an OpenAI-like /create_chat_completion endpoint.
  • Constants: base inference URL, path, and the system prompt(s) must be constants.
  • One request per record (N records = N inference calls).
  • UI is minimal but must show:
    • progress for each in‑flight record
    • a dashboard of all jobs and their status
    • download completed results as JSON or CSV
  • Multi‑instance: Each replica works its assigned slice only via CStore + R1FS (no peer HTTP).

1) Tech choices (use these unless you have a strong reason)

  • Next.js (App Router) + TypeScript for one‑repo full‑stack.
  • Minimal UI: plain React + basic CSS.
  • Node runtime for backend routes (avoid Edge runtime for crypto/session libs).

2) Repo layout (current)

/app
  /(auth)/login/page.tsx
  /(auth)/register/page.tsx           # signup-by-email
  /(app)/page.tsx                     # dashboard UI
  /api
    /auth
      /login/route.ts
      /logout/route.ts
      /me/route.ts
      /register/route.ts
    /metrics/route.ts
    /tasks/route.ts                   # GET list only
    /tasks/schema/route.ts            # POST draft schema
    /tasks/confirm/route.ts           # POST confirm -> create job
    /tasks/[id]/route.ts              # GET job + peers + details
    /tasks/[id]/export/route.ts       # GET download json/csv
    /user
      /settings/route.ts              # GET/POST inference settings (optional)
      /models/route.ts                # POST list models
/components
  /TasksPanel.tsx                     # schema-first flow + job cards
  /LogoutButton.tsx
/lib
  /ratio1
    client.ts                         # edge-sdk client init
    auth.ts                           # CStoreAuth init + helpers
    keys.ts                           # key naming + helpers
    mock.ts                           # in-memory mock mode
    r1fs.ts                           # R1FS helpers
  /auth
    session.ts                        # session cookie/jwt helpers
    requireAuth.ts                    # route guard
    mailer.ts                         # SMTP helper for signup email
  /datagen
    constants.ts                      # INFERENCE_BASE_URL, CREATE_CHAT_COMPLETION_PATH, prompts
    types.ts                          # job + peer + details types
    jobStore.ts                       # CStore persistence
    jobWorker.ts                      # polling worker + local cache
    inference.ts                      # inference call + parsing
    draftToken.ts                     # signed schema draft token
    metrics.ts                        # counters + aggregation
    userIndex.ts                      # user index in CStore
    polling.ts                        # UI polling intervals
    peers.ts                          # peer config + assignment split
    exporters.ts                      # json/csv export

3) Environment variables (local + prod)

Ratio1 Edge SDK (CStore/R1FS endpoints)

  • EE_CHAINSTORE_API_HOST + EE_CHAINSTORE_API_PORT (fallback: EE_CHAINSTORE_API_URL)
  • EE_R1FS_API_HOST + EE_R1FS_API_PORT (fallback: EE_R1FS_API_URL)

cstore-auth-ts (preferred names)

  • R1EN_CSTORE_AUTH_HKEY
  • R1EN_CSTORE_AUTH_SECRET
  • R1EN_CSTORE_AUTH_BOOTSTRAP_ADMIN_PWD

Legacy EE_CSTORE_AUTH_* names are supported; prefer R1EN_* in new code.

DataGems app secrets

  • DATAGEN_SESSION_SECRET (used to sign cookies/JWTs and draft tokens)
  • DATAGEN_APP_HOST + DATAGEN_APP_PORT (defaults: $R1EN_HOST_IP + 3000)
  • DATAGEN_MAX_RECORDS_PER_JOB (default 200)

Inference API

  • DATAGEN_INFERENCE_HOST + DATAGEN_INFERENCE_PORT (defaults: $R1EN_HOST_IP + $API_PORT)
  • DATAGEN_INFERENCE_BASE_URL (fallback for host/port)

Multi-instance / peer identity

  • R1EN_CHAINSTORE_PEERS (comma-separated peer ids OR JSON array string, e.g. ["peerA","peerB"])
  • R1EN_HOST_ADDR (must match one entry in R1EN_CHAINSTORE_PEERS)

Worker behavior

  • DATAGEN_JOB_POLL_SECONDS (default 5)
  • DATAGEN_UPDATE_EVERY_K_REQUESTS (default 5)
  • DATAGEN_MAX_CONCURRENT_JOBS_PER_INSTANCE (default 1)
  • DATAGEN_LOCAL_CACHE_DIR (default /_local_cache/datagen)

Signup email

  • DATAGEN_SMTP_HOST
  • DATAGEN_SMTP_PORT
  • DATAGEN_SMTP_USER
  • DATAGEN_SMTP_PASS
  • DATAGEN_SMTP_FROM

Runtime flags / UI knobs

  • DATAGEN_MOCK_CSTORE — in-memory mock CStore/auth (admin/admin, test_user/testtest)
  • DATAGEN_MOCK_INFERENCE_API — in-memory inference stub
  • LOG_INFERENCE_REQUESTS — logs outgoing inference request (Authorization redacted)
  • DATAGEN_LOG_R1FS_CALLS — logs R1FS call start/success/error events
  • RETRY_INFERENCE_ON_FAILURE — retry one extra inference call on failure/parse errors
  • NEXT_PUBLIC_SHOW_FAILURES — show failure count in UI
  • DATAGEN_MAX_EXTERNAL_API_CONFIGS — max saved external API profiles per user (default 10)
  • DATAGEN_ACTIVE_POLL_SECONDS / DATAGEN_IDLE_POLL_SECONDS — UI polling intervals (defaults: 10 / 30)
  • NEXT_PUBLIC_DATAGEN_UI_TEST_PRESET — optional JSON string to auto-fill UI generation form fields for testing

4) Authentication model (required)

  • Username/password login using CStoreAuth.simple.authenticate(...).
  • Server issues a signed session token as an HttpOnly cookie.
  • All API routes require auth (including /api/metrics).
  • UI never talks directly to CStore/R1FS; it only calls our API routes.

5) Persistence model in CStore + R1FS

Key naming

  • datagen:metrics (metrics hash)
  • datagen:jobs (hash: jobId -> DataGemsJobBase JSON)
  • datagen:job:{jobId}:peers (hash: peerId -> DataGemsJobPeerState JSON)
  • datagen:user:{username}:jobs (hash: jobId -> summary JSON)
  • datagen:users (hash: username -> DataGemsUserIndex JSON)
  • datagen:user:{username}:settings (JSON)

Job base (CStore)

type JobStatus = "queued" | "running" | "succeeded" | "failed";

type DataGemsJobBase = {
  id: string;
  owner: string;
  title: string;
  status: JobStatus;
  totalRecords: number;
  datasetMode?: boolean;
  peers: string[];
  peerCount: number;
  totalGenerated: number;
  totalOk: number;
  totalFailed: number;
  jobDetailsCid: string;
  createdAt: string;
  schemaGeneratedAt: string;
  jobStartedAt?: string;
  jobFinishedAt?: string;
  schemaDurationMs: number;
  recordsDurationMs?: number;
  schemaRefreshes: number;
  updatedAt: string;
};

Peer state (CStore)

type DataGemsJobPeerState = {
  peerId: string;
  assigned: number;
  range: { start: number; end: number };
  generatedOk: number;
  generatedFailed: number;
  lastUpdateAt?: string;
  startedAt?: string;
  finishedAt?: string;
  resultCid?: string;
  errors?: Array<{ index: number; message: string }>;
};

Job details (R1FS)

type DataGemsJobDetails = {
  id: string;
  owner: string;
  description: string;
  instructions: string;
  schema: unknown;
  inference: {
    baseUrl: string;
    path: string;
    model?: string;
    parameters?: Record<string, unknown>;
  };
  datasetMode?: boolean;
  createdAt: string;
  schemaGeneratedAt: string;
  schemaDurationMs: number;
  schemaRefreshes: number;
};

Peer results (R1FS)

  • JSONL file: one line per record { i, ok, data } or { i, ok:false, error }.

6) Inference API integration

Constants (must exist)

  • INFERENCE_BASE_URL
  • CREATE_CHAT_COMPLETION_PATH = "/create_chat_completion"
  • SYSTEM_PROMPT, SCHEMA_SYSTEM_PROMPT, DATASET_RECORD_SYSTEM_PROMPT, DATASET_SCHEMA_SYSTEM_PROMPT

One call per record

  • Schema draft: single call.
  • Record generation: N calls (sequential; concurrency later).

Structured output (response_format)

  • Schema draft uses response_format: { "type": "json_object" } to guarantee valid JSON.
  • Record generation uses response_format: { "type": "json_object", "schema": <JSON Schema> } when a schema is available.

Schema validation

  • Job confirmation validates the draft schema against the JSON Schema 2020-12 meta-schema and rejects invalid drafts.
  • Draft schemas are sanitized before validation to correct obvious mistakes (wrong $schema URL, non-object property values).

7) API endpoints (current)

Auth

  • POST /api/auth/login
  • POST /api/auth/logout
  • GET /api/auth/me
  • POST /api/auth/register (name/email/country; emails generated password)

Schema + Jobs

  • POST /api/tasks/schema — returns schema + signed draft token
  • POST /api/tasks/confirm — persists job to CStore/R1FS and returns { jobId }
  • GET /api/tasks — list jobs for current user
  • GET /api/tasks/[id] — job base + peer states + job details
  • GET /api/tasks/[id]/export?format=json|csv — merged export

Metrics

  • GET /api/metrics{ metrics: { totalJobs, totalRecordsRequested, totalRecordsGenerated, activeJobs, failedJobs, lastJobAt } }

8) Worker loop (multi-instance)

  • Singleton polling worker per instance (lib/datagen/jobWorker.ts).
  • Polls every DATAGEN_JOB_POLL_SECONDS.
  • Each peer processes only its assigned range from R1EN_CHAINSTORE_PEERS.
  • Writes progress to CStore every DATAGEN_UPDATE_EVERY_K_REQUESTS records.
  • Writes results locally (DATAGEN_LOCAL_CACHE_DIR) for resume, then uploads JSONL to R1FS when done.
  • Updates job totals in CStore and marks succeeded when all peers finished.

9) UI (minimal)

  • Schema‑first flow: generate schema → confirm job.
  • Job cards (collapsed + expanded) with:
    • status, progress, timestamps, durations
    • description/instructions
    • schema viewer
    • per‑peer stats table
  • Download JSON/CSV when job completed.

10) Commands (expected)

  • Install: npm install
  • Dev: npm run dev
  • Lint: npm run lint
  • Test: npm test

11) Guardrails

  • Do not store raw passwords anywhere outside cstore-auth-ts.
  • Do not log secrets, session tokens, or inference API keys.
  • Validate user input:
    • record count within [1, DATAGEN_MAX_RECORDS_PER_JOB]
    • prompt/instructions non‑empty and size‑limited
  • Avoid overwriting job records: update job base and peer states independently.

14) BUILD-CRITIC protocol (SOTA iterative development + evaluation, 2026 update)

For all non-trivial implementation tasks (especially after /metrics), use a BUILD-CRITIC loop:

BUILD phase (small, testable increments)

  • Create a short task card with:
    • goal
    • constraints
    • acceptance checks
    • rollback plan
  • Break work into stepwise requirement turns (inspired by SR-Eval, 2025) instead of one large patch.
  • Produce the smallest coherent patch that can be validated end-to-end.

CRITIC phase (tool + model-assisted critique)

  • Run tool-interactive critique first (inspired by CRITIC, 2024):
    • type checks
    • lint
    • targeted tests
    • route-level contract checks
    • security/safety checks (auth enforced, secret leaks, input bounds)
  • Add structured LLM critique on top of tool output:
    • correctness
    • edge cases
    • maintainability
    • failure handling
    • data integrity in CStore
  • Use selective critique (inspired by RefineCoder/ACR, 2025): spend critique budget where confidence is low, tests fail, or risk is high.

ITERATE phase

  • Apply exactly the fixes justified by critique evidence.
  • Re-run the CRITIC phase.
  • Keep a compact reflection log (inspired by Reflexion, 2023):
    • root cause
    • patch applied
    • prevention note for next iteration

Stop criteria (must be explicit)

  • All acceptance checks pass.
  • No auth regressions.
  • No persisted-data schema regressions.
  • No untriaged high-severity critic findings.

Evaluation policy (offline + live)

  • Always report:
    • pass@1 on deterministic tests
    • regression count
    • iteration count to green
    • time-to-green
    • cost/latency per successful task
  • Maintain two evaluation tracks:
    • Static benchmark track (e.g., SWE-bench-style reproducible suites).
    • Fresh/live track (inspired by SWE-bench-Live, 2025) to reduce contamination and overfitting to stale tasks.
  • Prefer contamination-aware evaluation design and periodically refresh holdout tasks.
  • When relevant, validate on repository-level, multi-file tasks, not only function-level tasks (aligned with recent SWE-agent benchmarks and SWE-Universe scaling results, 2026).

Recommended references for this protocol


If any assumption here conflicts with reality (package API differences, env var names, etc.), update this file and align the codebase accordingly.