GitHub - Mike-E-Log/learn-ai-eval: The Eval Codex — Claude-tutored AI-eval learning engine. Build eval expertise via guided practice.

learn-ai-eval · The Eval Codex

A Claude-tutored learning engine — 13 AI-eval fundamentals as a gamified skill tree,
tutored through a spaced-repetition loop grounded in canonical sources.

What it is. A personal learning engine for getting conversant in the AI-evaluation canon and staying current. It turns the 13 core eval fundamentals into a gamified skill tree, tutors each one with Claude through a spaced-repetition loop grounded in canonical sources, and keeps a full audit trail of what was learned.

What building it exercised (the part a hiring manager cares about):

LLM integration — the Anthropic API as a stateful, multi-step tutor with a structured pedagogy, not a chatbot wrapper.
Secret hygiene — the API key is read from an env var by a localhost-only proxy and injected server-side; it never enters the browser or the repo.
Learning science as product mechanics — retrieval practice, spaced review (expanding intervals), interleaving, and predict-then-test calibration are wired into the loop, not just mentioned.
Auditability — append-only event log + readable per-session transcripts.
Eval domain — the curriculum maps the canon: LLM-as-judge & its biases, eval datasets & contamination, metrics, agentic/trajectory eval, benchmarks & Goodhart, statistics for eval, RAG eval, red-teaming.

How it works

Skill tree — 13 nodes (the eval canon). Each: status todo → gap → shaky → solid, a canonical source anchor, the tutor loop.
Tutor loop (v2) — predict your score → teach with two examples → you write notes in your own words → cold quiz → graded against your notes and the canonical anchor → mock-interview transfer question.
Spaced review — solid nodes resurface on expanding intervals (1/3/7/16/35 days); a "due" queue stops a sprint from evaporating before it's needed.
Audit trail — every session, message, status change, and review is logged and exportable.

Architecture

index.html — single-file app, vanilla JS, no build step. Gamified UI, light/dark.
proxy.js — zero-dependency Node relay. Reads the key from ANTHROPIC_LEARN_AI_EVAL_TUTOR, binds 127.0.0.1 only, injects the key server-side, and is not a generic relay (whitelists fields, only calls Anthropic).
The key never lives in the repo or the browser. Full design + security notes in DESIGN.md.

Run it

Set ANTHROPIC_LEARN_AI_EVAL_TUTOR to a (spend-capped) Anthropic API key.
node proxy.js
Open http://127.0.0.1:8791 — pick a model in ⚙, start a node.

See RUN.md for the file:// fallback and details.

Status

Frozen as-built baseline — see DESIGN.md for problem, premises, success criteria, the learning-science basis, and the security posture. Now public as a portfolio piece in the AI-evaluation engineering portfolio (below).

AI-evaluation engineering portfolio — five repos, one discipline:

ai-eval-toolkit — judge-vs-human calibration (Cohen's κ / Kendall-τ vs Landis–Koch bands)
agentic-eval-harness — eval-gated Claude Code phase boundaries with cross-vendor scorecards
ai-eval-atlas — practitioner + technique map, source-linked
ai-engineer-best-practices — handbook + score MCP tool (3-vendor judge ensemble)
learn-ai-eval (you are here) — Claude-tutored learning engine for the eval canon

Profile: github.com/Mike-E-Log · website: mikeilog.com

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
docs		docs
glowup		glowup
.gitignore		.gitignore
DESIGN.md		DESIGN.md
LICENSE		LICENSE
README.md		README.md
RUN.md		RUN.md
index.html		index.html
proxy.js		proxy.js
social-preview.png		social-preview.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

learn-ai-eval · The Eval Codex

How it works

Architecture

Run it

Status

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

learn-ai-eval · The Eval Codex

How it works

Architecture

Run it

Status

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages