ExamCraft MVP: bank ingestion → AI exam generation → chat revision #1
Merged
Backend (uv-managed FastAPI on Python 3.10+) exposes /api/health and is wired for upcoming auth/banks/samples/generations/chat routers. Pydantic settings load from repo-root .env and create the data/uploads/pages/jobs directories on first boot.
Frontend (Next.js 15.5.15 + Tailwind + Inter/Fraunces) renders an editorial home page that fetches the backend health JSON server-side, plus a placeholder /login route so typedRoutes builds clean.
Makefile brings up both servers in parallel; README documents the brew install poppler libreoffice prerequisite and the .env workflow.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Backend gets a real schema (User, Bank) on aiosqlite + WAL, with a
lazily-initialized async engine so tests can pin EXAMCRAFT_DATA_DIR to
tmp_path. Auth is HMAC-signed cookies via itsdangerous — no DB sessions —
issued on /api/auth/login, cleared on /logout, validated on /me. Bank
routes are scoped to the calling user; tests cover the auth roundtrip,
CRUD lifecycle, and cross-user isolation.
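The stateless cookie scheme described above can be illustrated with a minimal sketch. It uses the standard-library hmac module as a stand-in for itsdangerous, so it shows the idea rather than the project's actual code; SECRET, sign_session, and verify_session are hypothetical names.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

SECRET = b"dev-only-secret"  # hypothetical; a real key would come from settings


def sign_session(payload: dict, secret: bytes = SECRET) -> str:
    """Serialize the payload and append an HMAC-SHA256 signature."""
    body = base64.urlsafe_b64encode(json.dumps(payload, sort_keys=True).encode())
    sig = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return f"{body.decode()}.{sig}"


def verify_session(cookie: str, secret: bytes = SECRET) -> Optional[dict]:
    """Return the payload if the signature matches, otherwise None."""
    body, sep, sig = cookie.rpartition(".")
    if not sep:
        return None  # no signature part at all
    expected = hmac.new(secret, body.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None
    return json.loads(base64.urlsafe_b64decode(body))
```

Because the server only needs the secret to validate a cookie, no session table is required; the trade-off is that a single session cannot be revoked without rotating the key or adding an expiry check.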
Frontend route layout splits into a public /login and an auth-gated (app)
group. The (app) layout calls getMe() server-side and redirect("/login")s
when no session exists; SSR forwards the examcraft_session cookie to the
backend via cookies(). Dashboard lists banks with status pills; an inline
CreateBankCard handles the new-bank form. Bank detail is a shell with
M3/M4 placeholders.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Backend grows three services and one router. docrender shells out to LibreOffice (serialized via asyncio.Lock + per-invocation -env:UserInstallation profile dir) for docx/doc/odt/rtf, then to pdf2image/poppler for the per-page PNGs at 200dpi. llm wraps litellm.acompletion for chat / chat_json / vision_json, all hitting the gateway in $OPENAI_BASE_URL with model gpt-5.4. ingestion stitches them together: per-page vision with concurrency=5, then a bank-level aggregation that produces a style + topic profile JSON.
JobRegistry holds references to the asyncio tasks so they outlive the request/response cycle, and an on-startup sweep flips any extracting/analyzing/running rows to error so a restart mid-job is recoverable.
New tables: SampleExam, SampleExamPage, both cascading from Bank.
Frontend bank-detail page gets three live sections: drop-zone uploader, sample list with 2s polling while any item is in flight, and an analysis panel that renders style profile, knowledge-point bars, problem-type bars, and the raw JSON behind a disclosure. Server actions + cookie forwarding follow the M2 pattern; new client helpers cover upload/delete/refresh.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
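A rough sketch of the task-holding idea behind JobRegistry follows. The class shape, the submit method, and the integer return type of in_flight() are assumptions for illustration; only the names JobRegistry and in_flight appear in the commits.

```python
import asyncio
from typing import Coroutine, Dict


class JobRegistry:
    """Keeps strong references to background tasks so they are not garbage-collected."""

    def __init__(self) -> None:
        self._tasks: Dict[str, asyncio.Task] = {}

    def submit(self, job_id: str, coro: Coroutine) -> asyncio.Task:
        # Hypothetical API: schedule the coroutine and remember the task by id.
        task = asyncio.create_task(coro)
        self._tasks[job_id] = task
        # Drop the reference once the task finishes, whether it succeeded or failed.
        task.add_done_callback(lambda _t, jid=job_id: self._tasks.pop(jid, None))
        return task

    def in_flight(self) -> int:
        """Number of tasks that have not finished yet."""
        return len(self._tasks)


async def demo() -> list:
    registry = JobRegistry()
    seen = []

    async def fake_ingest(name: str) -> None:
        await asyncio.sleep(0.01)  # stands in for extraction + vision calls
        seen.append(name)

    registry.submit("a", fake_ingest("a"))
    registry.submit("b", fake_ingest("b"))
    started = registry.in_flight()
    while registry.in_flight():  # how a seeder could wait without touching internals
        await asyncio.sleep(0.005)
    return [started, sorted(seen), registry.in_flight()]
```

Holding the task in a dict matters because the event loop keeps only weak references to tasks; without a strong reference a fire-and-forget task may be collected before it finishes.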
M4 — generation pipeline
- image_gen calls gpt-image-2 via httpx with retry-jitter (4 attempts),
bounded asyncio.Semaphore(3), decoder that handles all three observed
response shapes (b64_json, data:image/png;base64,…, http url), and an
ImageGenError on persistent failure.
- generation.run_generation orchestrates the full pipeline: bank profile
→ exam spec via gpt-5.4 → per-page descriptive English prompts via
gpt-5.4 → image fan-out with an on_page callback that updates the DB
and emits SSE events as each page lands. Spec JSON is the source of
truth, the PNG is a stylized companion.
- New tables GenerationJob + GeneratedPage. Startup sweep flips any
queued/running jobs to failed so a restart is recoverable.
- New per-job EventBus (app/sse.py) with bounded replay so the watch
page survives refresh.
- Frontend: /generations/[id] watch page subscribes to /events via
EventSource, drives a progress bar, page gallery that fills live, an
activity log, and a structured spec viewer with toggleable answers +
raw JSON.
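The per-job event bus with bounded replay mentioned in the M4 notes could look roughly like this. It is a sketch under assumptions, not the contents of app/sse.py: the method names, the replay size, and the "page" / "done" event types are hypothetical.

```python
import asyncio
from collections import deque
from typing import AsyncIterator, Deque, List


class EventBus:
    """Fans events out to subscribers and replays the most recent ones to late joiners."""

    def __init__(self, replay_size: int = 100) -> None:
        # deque(maxlen=...) silently drops the oldest event once the buffer is full.
        self._history: Deque[dict] = deque(maxlen=replay_size)
        self._subscribers: List[asyncio.Queue] = []

    def publish(self, event: dict) -> None:
        self._history.append(event)
        for queue in self._subscribers:
            queue.put_nowait(event)

    async def subscribe(self) -> AsyncIterator[dict]:
        queue: asyncio.Queue = asyncio.Queue()
        for past in self._history:  # replay first, so a refreshed page catches up
            queue.put_nowait(past)
        self._subscribers.append(queue)
        try:
            while True:
                event = await queue.get()
                yield event
                if event.get("type") == "done":  # assumed terminal event
                    return
        finally:
            self._subscribers.remove(queue)


async def demo() -> list:
    bus = EventBus(replay_size=2)
    bus.publish({"type": "page", "index": 1})
    bus.publish({"type": "page", "index": 2})
    bus.publish({"type": "page", "index": 3})
    bus.publish({"type": "done"})
    # A subscriber that connects late only sees the last two events.
    return [event async for event in bus.subscribe()]
```

Bounding the replay buffer keeps memory flat for long jobs; a client that needs the full state would still read it from the database rather than from the bus.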
M5 — chat revision
- ChatMessage table, /api/generations/{id}/chat endpoints,
revision.apply_revision worker. Worker has the LLM rewrite the spec
given chat history, re-runs the layout planner, diffs prompts, and
re-renders only the pages whose prompt changed.
- Frontend ReviseChat panel embedded in the watch page; SSE streams the
assistant reply and the re-rendered pages back into place.
M6 — polish
- examcraft-seed CLI: auto-creates a user + bank and queues ingestion
of every supported file in a source directory (default
~/Desktop/personal/试卷/), with a --no-aggregate flag for stepping
manually through the pipeline.
- JobRegistry.in_flight() so the seeder waits without touching
internals; ruff sweep across new modules.
Also fixes a real footgun: .gitignore's "lib/" was matching web/lib/ at
any depth, silently dropping the entire frontend API layer from M1+M2
commits. Pinned to /lib/ and committed the missing files. Without this
the repo did not actually build from a fresh clone.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
image_gen.generate_many now returns Path | Exception per index instead of raising on the first failure, and run_generation hooks both on_page and on_page_error so the DB records per-page status. The job ends in done state with current_step explaining how many failed; the gallery shows a "failed — chat to retry this page" placeholder for those pages, and a new SSE event page_error keeps the watch UI in sync.
Caught from the first end-to-end run: 6/7 pages rendered fine, page 2 got persistent provider 400s through all 4 retries, and the whole job status flipped to failed. With this change the user gets a usable result plus a clear retry path through chat revision.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
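The partial-failure behaviour can be sketched with asyncio.gather and return_exceptions=True. The signature below is an assumption: render_one is a hypothetical callable standing in for the real image request, and only the name generate_many and the concurrency of 3 come from the commits.

```python
import asyncio
from pathlib import Path
from typing import Awaitable, Callable, List, Sequence, Union

RenderFn = Callable[[int, str], Awaitable[Path]]


async def generate_many(
    prompts: Sequence[str], render_one: RenderFn, concurrency: int = 3
) -> List[Union[Path, BaseException]]:
    """Render every page; a failure is returned in its slot instead of raised."""
    sem = asyncio.Semaphore(concurrency)

    async def guarded(index: int, prompt: str) -> Path:
        async with sem:  # at most `concurrency` requests in flight
            return await render_one(index, prompt)

    # return_exceptions=True keeps one bad page from cancelling the rest.
    return await asyncio.gather(
        *(guarded(i, p) for i, p in enumerate(prompts)),
        return_exceptions=True,
    )


async def demo():
    async def fake_render(index: int, prompt: str) -> Path:
        await asyncio.sleep(0)
        if index == 1:
            raise RuntimeError("provider returned 400")  # simulated persistent failure
        return Path(f"page_{index}.png")

    results = await generate_many(["a", "b", "c"], fake_render)
    failed = [i for i, r in enumerate(results) if isinstance(r, Exception)]
    return results, failed
```

The caller can then walk the result list by index, recording a success or an error per page, which is what lets the job finish as done with a count of failed pages.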
Two fixes after the first end-to-end chat-revision run took ~15 min for a single problem swap:
- The diff was string-comparing prompt text. The layout planner produces slightly different prose every call even when the underlying problem assignment is identical, so essentially every page looked "changed" and got re-rendered. Now we diff the *set of problem_ids per page*, reading the previous layout from prompts.json and writing the new one back so subsequent revisions can diff against it. A revision that swaps one problem usually only re-renders that one page.
- Re-renders ran serially in a for-loop instead of going through the shared image_gen semaphore. Now uses asyncio.gather, with the same bounded concurrency=3 and partial-failure handling as the initial generation. on_page_error mirrors the generate path's behavior so the watch UI shows failed pages with a "chat to retry" placeholder.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
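The set-based diff from the first fix can be sketched as follows. The layout shape (page number mapped to a list of problem ids) is an assumption, since the actual structure of prompts.json is not shown here; pages_to_rerender is a hypothetical name.

```python
from typing import Dict, List


def pages_to_rerender(
    previous: Dict[int, List[str]], current: Dict[int, List[str]]
) -> List[int]:
    """Return the pages whose set of problem ids differs from the previous layout."""
    changed = []
    for page, problem_ids in current.items():
        # Compare as sets: reordering or reworded prompt prose does not count as a change.
        if page not in previous or set(previous[page]) != set(problem_ids):
            changed.append(page)
    return sorted(changed)


previous_layout = {1: ["p1", "p2"], 2: ["p3", "p4"], 3: ["p5"]}
# Page 1 is only reordered, page 2 swaps p4 for p9, page 3 is untouched.
current_layout = {1: ["p2", "p1"], 2: ["p3", "p9"], 3: ["p5"]}
stale_pages = pages_to_rerender(previous_layout, current_layout)
```

One consequence of comparing ids only: an edit that rewrites a problem's text while keeping its id would not trigger a re-render under this sketch, so a real implementation may need a content hash or version per problem.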
The chat endpoint flips the job to status='running' so the watch UI shows progress, but apply_revision was only emitting SSE events without mirroring the terminal state back to the DB. After every page rendered, the job sat in 'running' forever until a manual restart. Now apply_revision calls _set_status on done / no-op done / failure, and also resets progress_pct to 1.0 with a sensible current_step. It reuses _set_status from generation.py so the revision and generation paths stay identical.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors the browseruse-bench claude.yml: triggers on @claude mentions in comments / reviews / issues, and on every PR open or push. Uses the same self-hosted runner pattern because the LiteLLM gateway is on the internal network and isn't reachable from GitHub-hosted runners. Repo secrets ANTHROPIC_API_KEY and ANTHROPIC_BASE_URL are set; pass them through to anthropics/claude-code-action@v1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This reverts a82ebad. No self-hosted runner is available for this repo, and the LiteLLM gateway is internal-only so a GitHub-hosted runner can't reach it. Repo secrets ANTHROPIC_API_KEY and ANTHROPIC_BASE_URL have been removed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
waple0820 added a commit that referenced this pull request May 3, 2026
#2 — math typesetting
The LLM emits LaTeX delimiters (\(x^2\), \[…\], $…$, $$…$$) verbatim in problem text and choices, so the page was showing raw "\(-3\)" instead of "−3". Adds a tiny <MathText> component that tokenises the four common delimiter pairs and runs each math chunk through KaTeX renderToString; non-math text is preserved verbatim, including whitespace. Wired through ExamView for problem.content / choices / answer. KaTeX CSS imported once in globals.css.
#3a — print cleanup
The (app) layout's sticky header (brand / locale / sign-out) and the "← back to bank" link on the generation page weren't marked data-no-print, so they were leaking into the printed sheet. Added the attribute to both. Combined with the existing data-no-print on the header progress block, action toggles, FigureSlot loading/error states, problem-tag row, and chat panel, "打印" (Print) now produces just the exam article: title + meta + sections + problem text + figures + (optional) answers.
#1 — English in profile (delivered via re-aggregation, not code)
The legacy bank profile carried English snake_case keys for problem_type_distribution and an English string for style_profile.tone because it was aggregated before the prompt-language fix. Triggered a re-aggregation; the new profile has Chinese keys (选择题 "multiple choice", 填空题 "fill in the blank", etc.) and Chinese values for tone / header_template / layout_pattern / typography. The frontend dictionary still carries fallback labels for any English-keyed banks aggregated before the prompt update.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
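The tokenising step of the math-typesetting change can be illustrated independently of the React component. The sketch below is in Python and covers only the splitting logic; the regex, the tokenize name, and the token labels are assumptions, and the KaTeX rendering step is left out.

```python
import re
from typing import List, Tuple

# Order matters: $$…$$ must be tried before $…$ so display math is not split in two.
MATH_RE = re.compile(
    r"\$\$(?P<dd>.+?)\$\$"    # $$…$$  display
    r"|\\\[(?P<br>.+?)\\\]"   # \[…\]  display
    r"|\\\((?P<pa>.+?)\\\)"   # \(…\)  inline
    r"|\$(?P<d>.+?)\$",       # $…$    inline
    re.DOTALL,
)


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Split text into ("text" | "inline" | "display", content) tokens."""
    tokens: List[Tuple[str, str]] = []
    pos = 0
    for match in MATH_RE.finditer(text):
        if match.start() > pos:
            tokens.append(("text", text[pos:match.start()]))  # plain text kept verbatim
        is_display = match.group("dd") is not None or match.group("br") is not None
        body = next(g for g in match.groups() if g is not None)
        tokens.append(("display" if is_display else "inline", body))
        pos = match.end()
    if pos < len(text):
        tokens.append(("text", text[pos:]))
    return tokens


sample_tokens = tokenize(r"Compute \(-3\) + $x^2$ then $$y$$.")
```

A known limitation of this kind of matching is that two literal dollar signs in ordinary prose (for example prices) would be read as inline math, so a production tokenizer may need an escape rule.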
Summary
First end-to-end build of ExamCraft from an empty repo.
Plan file: ~/.claude/plans/buzzing-sparking-raven.md. Memory: ~/.claude/projects/-Users-avatar-Desktop-projects-ExamCraft/memory/.
Validated end-to-end
- ~/Desktop/personal/试卷/ → 8 pages extracted → vision LLM correctly identified 标题 (title) + 知识点 (knowledge points) → bank profile captures Hubei middle-school style.
- backend/data/jobs/{id}/page_*.png after running).
Commits (7)
- lib/ gitignore footgun that was dropping web/lib/* from M1+M2)
Known caveats (worth a glance during review)
- /api/system/check endpoint to verify.
Test plan
- make setup then make dev (backend on :8000, web on :3000)
- http://localhost:3000/login
- ~/Desktop/personal/试卷/, watch the status badge progress through extracting → analyzing → ready
- failed on next boot
- cd backend && uv run pytest -q → 7/7 green
Shortcut for review
🤖 Generated with Claude Code