netsky-lab/aiml-continuation-lab

AiML Continuation Lab

AiML Continuation Lab is currently a static React prototype for an AI/ML research-agent IDE. The working app lives in prototype/AiML Prototype.html and renders a multi-panel workspace for experiment runs, traces, policy overlays, graph views, and chat.

Current State

  • No package manager or build manifest is present at the repo root.
  • The prototype is browser-run from a single HTML entrypoint.
  • React 18.3.1, ReactDOM 18.3.1, and Babel Standalone 7.29.0 are loaded from CDNs in the HTML shell.
  • Application state is hydrated from static browser globals defined in prototype/data.js, prototype/backend_slice_loader.js, and prototype/first_trace_payload.js; first-run LLM, retrieval, tabular, and vision workspaces resolve generated backend slices before fixture fallbacks.
  • Repository verification is currently a deterministic Node smoke baseline captured in .supervisor/project.json and executed without npm scripts or a build toolchain.
  • The current machine-verification baseline comprises nineteen deterministic Node smokes: prototype/first_trace_render_smoke.js, prototype/live_resume_smoke.js, prototype/codex_resume_bridge_smoke.js, prototype/codex_resume_stream_smoke.js, prototype/codex_resume_persistence_smoke.js, and prototype/queued_followup_execution_smoke.js, followed by the resume history, queued-followup UI, and local bridge preflight UI/data smokes, the runtime contract and frontend slice checks, the orchestration and retriever smokes, the graph and memory schema checks, structured corpus validation, the Codex runtime adapter smoke, and knowledge/codex_first_loop_smoke.js.
  • Product direction is Codex-first: the agent loop is intended to run through Codex CLI or SDK integration first, with Claude compatibility designed in from the start and implemented later.
  • The orchestration control plane is documented in ARCHITECTURE.md.
  • Multi-turn continuation now includes one deterministic executable Codex-first follow-up path on top of the canonical checkpoint/resume contract, while staying runtime-agnostic at the orchestration boundary. The browser prototype hydrates the canonical resumed-turn follow-up slice and can apply local bridge resume and queued follow-up transport results through the same frontend slice/current-run selection path. Bridge-produced resume outputs are also persisted as repo-owned runtime state under knowledge/generated/runtime_state/codex_resume/.
  • When served through prototype/codex_resume_bridge.js, the multi-turn run panel shows a compact local-runtime preflight surface before live resume and queued follow-up controls. It reports the status probe result, bridge id, scoped resume/stream/queued-follow-up/history capabilities, supported queued action kinds, and runtime-state history counts without executing actions.
  • Multi-turn run panels also include a read-only checkpoint lineage branch inspector that groups checkpoint creation, resume, branch, and queued-follow-up runtime-history refs by checkpoint/action provenance without adding branch mutation controls.
  • Multi-turn turn/run/checkpoint legality is now centralized in knowledge/lib/multi_turn_state_machine.js, and adapter/loop/frontend continuation projections consume that deterministic state machine instead of ad hoc transition logic.
  • When multiple backend slices contribute competing current runs for one domain, prototype/data.js applies an explicit deterministic policy: continuation-bearing resumed slices outrank queued-follow-up slices, which outrank other runtime-backed backend currents, which outrank legacy current tags and final first-run fallback. Tabular and vision now use backend-backed first-run current slices while the older tabular static rows remain additive history.
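The current-run precedence described in the last bullet can be pictured as a simple rank lookup. The following is a minimal sketch, not the code in prototype/data.js: the `kind` labels, the `runId` field, and the `selectCurrentRun` helper are hypothetical names chosen for illustration.

```javascript
// Hypothetical sketch of the current-run precedence. The kind labels below are
// illustrative and are not taken from prototype/data.js.
const PRECEDENCE = [
  'resumed_continuation', // continuation-bearing resumed slices
  'queued_followup',      // queued-follow-up slices
  'runtime_backed',       // other runtime-backed backend currents
  'legacy_current_tag',   // legacy current tags
  'first_run_fallback',   // final first-run fallback
];

// Returns the candidate with the highest precedence, or null if none qualifies.
function selectCurrentRun(candidates) {
  let best = null;
  let bestRank = PRECEDENCE.length;
  for (const candidate of candidates) {
    const rank = PRECEDENCE.indexOf(candidate.kind);
    if (rank === -1) continue; // unknown kinds never win
    if (rank < bestRank) {
      // Strict comparison keeps the earliest candidate on ties,
      // so the result does not depend on anything but input order.
      best = candidate;
      bestRank = rank;
    }
  }
  return best;
}
```

Keeping the order in one array means the policy stays deterministic and can be audited in a single place.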

Project Layout

How To Run

Open prototype/AiML Prototype.html in a browser. Because dependencies are loaded from public CDNs, internet access is required unless those assets are vendored later. Backend slices now resolve generated artifacts under knowledge/generated/ before falling back to knowledge/runtime_examples/, so prefer serving the repo from the root with a simple static server such as python3 -m http.server instead of relying on file:// mode.

To run the prototype with the local Codex resume bridge, use:

node prototype/codex_resume_bridge.js

Then open http://127.0.0.1:4177/prototype/AiML%20Prototype.html. This bridge is a local development/runtime boundary for checkpoint-backed resume, not a standalone inference API.

Each successful POST /aiml/codex/resume writes durable repo artifacts under knowledge/generated/runtime_state/codex_resume/. The manifest links the refreshed checkpoint, applied frontend slice, runtime transcript, queued follow-up action state, lineage refs, and the canonical generated slice path loaded by prototype/backend_slice_loader.js after a browser reload or bridge restart. POST /aiml/codex/queued-followup can then select a persisted pending experiment.queue_run action by checkpointId and actionId, execute it through the same Codex-first path, and refresh the persisted artifacts.
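As an illustration of the queued follow-up call, the sketch below builds and sends a request against the local bridge. The README names only the endpoint and the checkpointId and actionId selectors, so the flat JSON body, the helper names, and the error handling are assumptions rather than the bridge's documented contract.

```javascript
// Assumed request shape: only checkpointId and actionId are named in this README.
// The flat JSON body and both helper names are illustrative.
const BRIDGE_ORIGIN = 'http://127.0.0.1:4177';

function buildQueuedFollowupRequest(checkpointId, actionId) {
  if (!checkpointId || !actionId) {
    throw new Error('checkpointId and actionId are both required');
  }
  return {
    url: `${BRIDGE_ORIGIN}/aiml/codex/queued-followup`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ checkpointId, actionId }),
    },
  };
}

// Requires Node 18+ (global fetch) and a running bridge process.
async function postQueuedFollowup(checkpointId, actionId) {
  const { url, init } = buildQueuedFollowupRequest(checkpointId, actionId);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`bridge returned HTTP ${response.status}`);
  }
  return response.json();
}
```

Separating request construction from the network call keeps the shape checkable without a live bridge.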

Verification

Run the verified repository checks:

node prototype/first_trace_render_smoke.js
node prototype/live_resume_smoke.js
node prototype/codex_resume_bridge_smoke.js
node prototype/codex_resume_stream_smoke.js
node prototype/codex_resume_persistence_smoke.js
node prototype/queued_followup_execution_smoke.js
node prototype/codex_resume_history_smoke.js
node prototype/queued_followup_ui_smoke.js
node prototype/codex_resume_preflight_smoke.js
node knowledge/runtime_contract_smoke.js
node knowledge/frontend_slice_smoke.js
node knowledge/orchestration_core_smoke.js
node knowledge/retriever_query_contract_smoke.js
node knowledge/filesystem_corpus_retriever_smoke.js
node knowledge/graph_access_smoke.js
node knowledge/experiment_memory_smoke.js
node knowledge/structured_corpus_smoke.js
node knowledge/codex_runtime_adapter_smoke.js
node knowledge/codex_first_loop_smoke.js

The same 19-command machine baseline is registered in .supervisor/project.json as one chained sequential testCommand. It intentionally excludes lint, typecheck, build, coverage, and browser automation because those toolchain stages do not exist in the current repository.

Several prototype and Codex loop smokes intentionally write canonical files under knowledge/generated/ so generated-first hydration can be verified. Run those smokes sequentially; if two shared-writer smokes overlap, the smoke lock fails fast with a generated-artifact lock message instead of allowing ambiguous partial JSON reads. Smokes that do not need canonical loader paths should pass an isolated generatedDir through the Codex loop.
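One common way to get this fail-fast behavior is an exclusive lock file. The sketch below shows the general technique only: the `.smoke.lock` file name, the error text, and both helper names are hypothetical, and the repository's actual lock may work differently.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical lock helper. The lock file name and message are illustrative.
// Works for synchronous callbacks only.
function withGeneratedLock(generatedDir, fn) {
  const lockPath = path.join(generatedDir, '.smoke.lock');
  let fd;
  try {
    // The 'wx' flag creates the file and fails with EEXIST if it already exists.
    fd = fs.openSync(lockPath, 'wx');
  } catch (err) {
    if (err.code === 'EEXIST') {
      throw new Error(`generated-artifact lock held: ${lockPath}`);
    }
    throw err;
  }
  try {
    return fn();
  } finally {
    // Always release the lock, even if the callback throws.
    fs.closeSync(fd);
    fs.unlinkSync(lockPath);
  }
}

// Smokes that do not need canonical loader paths can avoid contention
// entirely by writing into a throwaway directory.
function makeIsolatedGeneratedDir() {
  return fs.mkdtempSync(path.join(os.tmpdir(), 'aiml-generated-'));
}
```

Failing on an existing lock, rather than waiting for it, matches the stated goal of never reading partially written JSON.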

To run the optional live Codex CLI integration check on top of the deterministic loop coverage, use:

AIML_ENABLE_LIVE_CODEX_SMOKE=1 node knowledge/codex_first_loop_smoke.js

The loop smoke evaluates the higher-level codex_first_loop path locally, including canonical envelope normalization, command auditing with fail-closed mismatch behavior, provenance propagation, and append-only memory writes across repeated runs.
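To make "fail-closed mismatch behavior" concrete, here is a minimal sketch of an audit that rejects any divergence between the commands a loop declared and the commands a runtime reported. The function name, argument shapes, and error format are assumptions. The actual audit in codex_first_loop may compare different data.

```javascript
// Hypothetical fail-closed audit: any difference between declared and observed
// commands is a hard failure, never a warning.
function auditCommands(declared, observed) {
  const mismatches = [];
  const length = Math.max(declared.length, observed.length);
  for (let i = 0; i < length; i += 1) {
    if (declared[i] !== observed[i]) {
      mismatches.push({
        index: i,
        declared: declared[i] ?? null, // null marks a missing entry
        observed: observed[i] ?? null,
      });
    }
  }
  if (mismatches.length > 0) {
    const error = new Error(`command audit failed closed: ${mismatches.length} mismatch(es)`);
    error.mismatches = mismatches;
    throw error;
  }
  return { ok: true, audited: declared.length };
}
```

Throwing rather than returning a soft status means a caller cannot accidentally continue past an unaudited command.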

Multi-Turn Status

  • The canonical multi-turn continuation contract is defined in knowledge/schemas/runtime_checkpoint.schema.json and the paired fixtures under knowledge/runtime_examples/.
  • Repo-owned checkpoints are the source of truth for resume semantics. Runtime-managed handles such as OpenAI previous_response_id chains or Claude session resume remain adapter-local implementation details.
  • The canonical resumed turn uses turn.resume, re-checks corpus and policy state, appends memory, and can queue bounded follow-up actions named by pending_follow_up_action_ids.
  • Frontend continuation state is additive through optional runtime_session, turn_timeline, and continuation objects so existing single-turn slice consumers stay valid.
  • prototype/backend_slice_loader.js and prototype/data.js now hydrate first-run slices plus canonical resumed LLM, retrieval, and additive tabular follow-up slices through the existing backend fixture path. The tabular continuation slice can be inspected through its generated or fallback artifact but is not yet a bridge/UI execution target.
  • The frontend-visible multi-turn flow remains generated-slice/fallback friendly, and the resumed LLM card can now call window.AIML_CODEX_RESUME_TRANSPORT.resumeCheckpointFollowup(request) or window.AIML_CODEX_RESUME_TRANSPORT.executeQueuedFollowup(request) when the local bridge is available. Runtime session handles stay adapter-local in the request/response boundary; repo checkpoint ids and continuation records remain canonical in the applied frontend slice.
  • Local bridge streaming is intentionally narrow: POST /aiml/codex/resume/stream accepts the same scoped codex.resume_checkpoint_followup JSON body as POST /aiml/codex/resume and returns newline-delimited JSON. Ordered runtime_event records are emitted first and projected into turn_timeline.messages while the resume is running; the final frontend_slice record carries the same canonical response shape used by the non-streaming endpoint, and prototype/data.js still applies that final slice through applyBackendSlice(...).
  • Bridge resume results are durable across browser reloads and bridge restarts because the bridge persists repo-owned runtime state manifests plus checkpoint, frontend slice, runtime transcript, queued action, and lineage artifacts while leaving generated-first slice hydration on the existing loader paths.
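The newline-delimited JSON stream described above can be consumed with a small buffering collector. This is a sketch under assumptions: it presumes each record carries a `type` field with the value runtime_event or frontend_slice, which this README does not confirm, and `createNdjsonCollector` is a hypothetical helper.

```javascript
// Hypothetical NDJSON collector. The `type` discriminator field is an assumption
// about the record shape, not a documented part of the bridge protocol.
function createNdjsonCollector() {
  let buffer = '';
  const events = [];
  let finalSlice = null;

  function handleLine(line) {
    if (line.trim() === '') return; // tolerate blank lines
    const record = JSON.parse(line);
    if (record.type === 'runtime_event') {
      if (finalSlice) throw new Error('runtime_event received after frontend_slice');
      events.push(record);
    } else if (record.type === 'frontend_slice') {
      finalSlice = record;
    }
  }

  return {
    // Accepts arbitrary chunks; a record may be split across chunk boundaries.
    push(chunk) {
      buffer += chunk;
      const lines = buffer.split('\n');
      buffer = lines.pop(); // keep the trailing partial line for the next chunk
      lines.forEach(handleLine);
    },
    // Flushes any unterminated last line and returns what was collected.
    finish() {
      if (buffer !== '') handleLine(buffer);
      buffer = '';
      return { events, finalSlice };
    },
  };
}
```

A caller would feed decoded response chunks to `push`, render `events` as they arrive, and hand `finalSlice` to the same slice-application path used by the non-streaming endpoint.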

Gaps

  • No package.json, lockfile, or formal dev environment.
  • No automated lint, typecheck, build, coverage, or browser test pipeline.
  • No documented product requirements beyond the prototype assets and research notes.
  • The knowledge corpus now includes normalized seed slices for llm_finetuning, retrieval_reranking, tabular_classification, and vision_object_detection, but ingestion is still intentionally curated and incomplete.
  • The local Codex resume bridge is intentionally narrow and deterministic; broader streamed runtime updates and broader queued action-family execution are still future work.

Current Corpus Focus

The first usable corpus slice targeted llm_finetuning, aligned with the current prototype's strongest example flow:

  • knowledge/playbooks/llm_finetuning_first_run_review.yaml
  • knowledge/metric_rules/llm_finetuning.yaml
  • knowledge/stage_rules/first_run_review.yaml
  • knowledge/intervention_library/llm_overfitting_after_step_inflection.yaml
  • knowledge/policy/llm_baseline_guardrails.yaml
  • knowledge/trajectories/human_curated/llm_first_run_overfit_trace.json

The structured corpus now also covers high-signal first-run review scenarios for retrieval, tabular classification, and object detection. The newest vision_object_detection records add:

  • low small-object AP/recall despite acceptable aggregate bbox mAP
  • validation mAP regression after aggressive augmentation while train loss improves
  • provenance-backed playbook, metric, intervention, policy, trajectory, run, trace, and lineage records for both scenarios

Runtime Direction

The intended execution model is:

  • primary runtime: Codex via CLI or SDK
  • secondary runtime: Claude via CLI or SDK later
  • not a standalone inference API as the primary product shape

This means the core system should be designed around agent orchestration, tool execution, corpus traversal, experiment memory, and run-control workflows. Any API surface should be treated as optional glue, not the center of the architecture.

The current source of truth for that control plane is ARCHITECTURE.md, which defines the runtime-agnostic loop, adapter boundaries, and checkpoint model.
