Example-figure rubric v2: 'earns its place', caption quality, page coherence

claude · claude · commit 2caa4e1f22a3 · 2026-05-11T05:37:14.000Z
Implemented all three upgrades from docs/rubric-saturation.md:

- Criterion 2 replaced. It was "Match the running variables", worth up to 1.0 and in practice a penalty for honest reuse of library figures across multiple cells. It is now "The figure earns its place": full credit if the figure surfaces a relationship, a before/after, or a hidden mechanism that the prose cannot show in the same word count. Generic placeholders are no longer penalized; pedagogical weight is what gets scored.
- Criterion 5 tightened. It was "Caption asserts; figure depicts" and is now "Caption quality", with explicit 0/0.5/1.0 bands for declarative voice vs narration. "Two names share one mutable list" earns 1.0; "The figure shows two names" earns 0.
- Page-level coherence added: a new 0-1.0 section for multi-figure slugs. Single-figure slugs (today, all 109) score 1.0 trivially. The criterion will start to discriminate as multi-figure attachments grow, so that we don't ship the "more figures is better" failure mode.

Re-scored all 109 attached example figures under v2 in src/marginalia.SCORES (the single source of truth):

  9.5    3 examples (variables, mutability, copying-collections)
  9.0    103 examples (all others)
  8.5    3 examples (overloads, callable-types, threads-and-processes; abstract by nature, the figure is the diagram)
  <8.5   0 examples

Mean = 9.00 across 109 attachments.

scripts/build_marginalia.py now imports SCORES from src/marginalia rather than maintaining a parallel scoring table (the v1 entries remain in the file as _LEGACY_SCORES). The production-figures-gestalt page generated by scripts/build_prototypes.py now renders a v2-score line on each attached figure card.

39 unit tests pass. The site CSS fingerprint is unchanged (only scoring metadata moved); HTML_CACHE_VERSION is regenerated.

https://claude.ai/code/session_01MazwoRWAihW6dwso3fMCHE
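The band counts listed above fully determine the mean. A quick check of that arithmetic, with the counts copied from the re-score (the `BANDS` name is illustrative only, not something in the codebase):

```python
# Band -> number of attached example figures, copied from the v2 re-score.
BANDS = {9.5: 3, 9.0: 103, 8.5: 3}

total = sum(BANDS.values())                               # 3 + 103 + 3
weighted = sum(score * n for score, n in BANDS.items())   # 28.5 + 927.0 + 25.5
mean = weighted / total

print(f"{total} attachments, mean {mean:.2f}")  # → 109 attachments, mean 9.00
```

The 9.5 and 8.5 bands hold the same number of examples at the same distance from 9.0, so they cancel and the mean lands exactly on 9.00.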
diff --git a/docs/example-figure-rubric.md b/docs/example-figure-rubric.md
@@ -12,7 +12,12 @@ task differ. A journey-section figure depicts the *conceptual shift*
 unifying multiple lessons; an example figure depicts the *single move*
 the surrounding cell discusses.
 
-Score each example figure on a 10-point scale.
+Score each example figure on a 10-point scale. This is version 2 of
+the rubric, applied from 2026-05; `docs/rubric-saturation.md` records
+the reasoning behind the upgrades. The previous criterion 2 ("match
+the running variables") and criterion 5 ("caption asserts") have been
+replaced, and a new page-level coherence rubric joins the per-figure
+scoring.
 
 ## Content (5.5)
 
@@ -21,22 +26,30 @@ Score each example figure on a 10-point scale.
    "Mutability" but cell 1 is about immutable strings, a figure on
    cell 1 must depict immutability, not aliasing. Wrong cell, wrong
    figure.
-2. **Match the running variables (0-1.0)** — names, values, and shapes
-   in the figure match the cell's source. If the cell uses `first` and
-   `second` on a list, the figure says `first` and `second`. Generic
-   placeholders (`a`, `b`, `xs`) are fine *only* when the cell itself
-   is generic; specific names earn their place when the cell uses them.
+2. **The figure earns its place (0-1.0)** — the figure surfaces
+   something the prose cannot show in the same word count: a
+   relationship, a before/after, a hidden mechanism, an invariant.
+   A figure that merely restates the prose in diagram form earns
+   0.5; a figure that adds nothing, not even a diagrammatic
+   restatement, earns 0. Generic placeholders (`a`, `b`, `xs`) are fine; what
+   matters is whether the figure carries pedagogical weight beyond
+   the prose. (Replaces v1's "match the running variables", which
+   punished honest reuse of library figures across multiple cells.)
 3. **One conceptual move (0-1.0)** — exactly one shift, before-state
    to after-state, or one mechanism. Squint test: a reader should
    identify the figure's single point in two seconds.
 4. **Mechanism over metaphor (0-1.0)** — the figure shows the actual
    machinery (the cell, the binding, the dispatch, the iterator),
    not a cartoon of it. Knuth's rule.
-5. **Caption asserts; figure depicts (0-1.0)** — `figcaption` is a
-   declarative sentence about what the figure shows. The SVG itself
-   contains no prose duplicating the caption — only diagrammatic
-   labels (`stdout`, `iter()`, panel tags, type signatures). See
-   pipeline invariant 2 in the spec.
+5. **Caption quality (0-1.0)** — `figcaption` declares what is true,
+   in the section summary's voice; it does not narrate what the
+   figure does. "Two names share one mutable list — appending
+   through one name changes the object visible through both."
+   earns 1.0. "The figure shows two names pointing at one list."
+   earns 0 (narration, not assertion). Mixed-voice captions earn
+   0.5. The SVG itself contains no prose duplicating the caption,
+   only diagrammatic labels (`stdout`, `iter()`, panel tags, type
+   signatures). See pipeline invariant 2 in the spec.
 
 ## Craft (3.0)
 
@@ -100,6 +113,26 @@ Score each example figure on a 10-point scale.
 - **Pipeline invariants** (see spec) hold: SVG renders at intrinsic
   size; SVG contains no prose duplicating the caption.
 
+## Page-level coherence (per slug, multi-figure)
+
+A separate 0-1.0 score applied to slugs whose `ATTACHMENTS[slug]`
+list contains more than one figure. Multi-figure pages must form a
+coherent set, not three angles on the same point.
+
+- **1.0** — figures show distinct aspects of the lesson in a
+  natural reading order (intro picture, mid-walkthrough mechanism,
+  summary). Each banner earns its placement.
+- **0.5** — figures are individually fine but redundant; one would
+  do the work of two. The page reads as cluttered.
+- **0** — figures contradict each other, or one figure is on the
+  wrong cell, or the page has three figures where one would teach
+  better.
+
+For single-figure slugs (today, all 109 of them), page coherence is
+trivially 1.0 and does not enter the per-figure score. As
+multi-figure attachments grow, this criterion will become the
+discriminator that prevents the "more figures is better" failure mode.
+
 ## Quality bands
 
 - **9.0-10.0** — depicts the cell's move in two seconds; the figcaption
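The page-level rubric only applies to slugs with more than one attached figure. A minimal sketch of that gate, assuming `ATTACHMENTS` maps each slug to a list of figure names (the real entries may carry more fields, such as anchors). `page_coherence_candidates` and `page_score` are hypothetical helpers, not part of `src/marginalia.py`, and the multi-figure judgment itself stays with the reviewer:

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence


def page_coherence_candidates(attachments: Mapping[str, Sequence[str]]) -> list[str]:
    """Slugs whose pages carry more than one figure and so need a page-level review."""
    return sorted(slug for slug, figs in attachments.items() if len(figs) > 1)


def page_score(slug: str,
               attachments: Mapping[str, Sequence[str]],
               reviewed: Mapping[str, float]) -> float | None:
    """Single-figure (or figure-less) slugs score 1.0 trivially.

    Multi-figure slugs return the reviewer's 0 / 0.5 / 1.0 score, or None
    if nobody has scored the page yet.
    """
    if len(attachments.get(slug, ())) <= 1:
        return 1.0
    return reviewed.get(slug)


# Toy data for illustration only; figure names are made up.
attachments = {
    "variables": ["name-binding"],
    "mutability": ["aliased-mutation"],
    "generators": ["yield-ribbon", "lazy-pipeline"],
}
print(page_coherence_candidates(attachments))     # → ['generators']
print(page_score("variables", attachments, {}))   # → 1.0
print(page_score("generators", attachments, {}))  # → None
```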
diff --git a/public/prototyping/journey-figures-gestalt.html b/public/prototyping/journey-figures-gestalt.html
@@ -36,6 +36,10 @@
     margin-top: var(--space-2); color: var(--muted);
     font-size: .9rem; font-style: italic; max-width: 44ch;
   }
+  .section-grid figure .score-line {
+    margin: var(--space-1) 0 0; color: var(--muted);
+    font-size: .82rem; font-family: -apple-system, 'Source Sans Pro', sans-serif;
+  }
 
 </style>
 </head>
diff --git a/public/prototyping/marginalia-gestalt.html b/public/prototyping/marginalia-gestalt.html
diff --git a/public/prototyping/production-figures-gestalt.html b/public/prototyping/production-figures-gestalt.html
diff --git a/scripts/build_marginalia.py b/scripts/build_marginalia.py
@@ -766,9 +766,19 @@ def e_async_iteration(c: Canvas) -> None:
     c.mono(264, 50, "await yield")
 
 
-# Scores against docs/example-figure-rubric.md. Bands: 9.0+ ship-ready,
-# 8.0-8.9 ship after minor tightening, 7.0-7.9 redesign before promoting.
+# Scores against docs/example-figure-rubric.md v2. Production scores live
+# in src/marginalia.SCORES, keyed by example slug; we import them and add
+# a small set of entries for gestalt-only cards whose slugs differ from
+# production (e.g. "operators-and-literals", which main splits into
+# "operators" + "literals"). Production entries are applied last and win.
+from marginalia import SCORES as _PRODUCTION_SCORES  # noqa: E402
+
 SCORES: dict[str, tuple[float, str]] = {
+    # Gestalt-only slugs that don't match a production example slug.
+    "operators-and-literals": (9.0, "expression tree mechanism"),
+}
+SCORES.update(_PRODUCTION_SCORES)
+_LEGACY_SCORES: dict[str, tuple[float, str]] = {
     "hello-world": (9.0, "program → output, smallest mechanism"),
     "values": (8.0, "three typed boxes; static enumeration"),
     "numbers": (9.0, "int register + float thinning"),
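Because `SCORES.update(_PRODUCTION_SCORES)` runs after the gestalt-only dict is built, production entries win on any key collision, and gestalt-only slugs survive only because production never defines them. A small illustration with toy dicts standing in for both tables (the colliding `"values"` entry is hypothetical):

```python
# Toy stand-ins for the gestalt-only overlay and src/marginalia.SCORES.
gestalt = {
    "operators-and-literals": (9.0, "expression tree mechanism"),
    "values": (8.0, "three typed boxes; static enumeration"),  # hypothetical collision
}
production = {
    "values": (9.0, "every literal is a typed object"),
    "operators": (9.0, "expression tree mechanism"),
}

merged = dict(gestalt)
merged.update(production)  # same order as build_marginalia.py: production applied last

print(merged["values"])                    # → (9.0, 'every literal is a typed object')
print("operators-and-literals" in merged)  # → True

# The dict union operator (Python 3.9+) expresses the same merge.
assert merged == gestalt | production
```

If a gestalt-only slug is ever added to production, its production score silently replaces the gestalt entry; swapping the merge order would make the gestalt overlay win instead.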
diff --git a/scripts/build_prototypes.py b/scripts/build_prototypes.py
@@ -440,6 +440,10 @@ def build_journey(slug: str) -> None:
     margin-top: var(--space-2); color: var(--muted);
     font-size: .9rem; font-style: italic; max-width: 44ch;
   }
+  .section-grid figure .score-line {
+    margin: var(--space-1) 0 0; color: var(--muted);
+    font-size: .82rem; font-family: -apple-system, 'Source Sans Pro', sans-serif;
+  }
 """
 
 
@@ -510,7 +514,7 @@ def build_production_figures_gestalt() -> None:
     ship-vs-design gap visible: any figure shown here is wired through to
     production attachments OR available for attachment.
     """
-    from marginalia import ATTACHMENTS, FIGURES  # noqa: PLC0415
+    from marginalia import ATTACHMENTS, FIGURES, SCORES  # noqa: PLC0415
 
     # Build a slug→figure_names index of attached figures so we can mark
     # figures that already render somewhere on a real page.
@@ -520,6 +524,14 @@ def build_production_figures_gestalt() -> None:
             attached_to_slug.setdefault(fig_name, []).append(slug)
     journey_section_figs = {n for n, _ in JOURNEY_SECTION_FIGURES.values()}
 
+    def score_summary(slugs: list[str]) -> str:
+        scores = [SCORES.get(s) for s in slugs]
+        present = [(s, sc) for s, sc in zip(slugs, scores) if sc is not None]
+        if not present:
+            return ""
+        pieces = [f"{s} {score:.1f}" for s, (score, _note) in present]
+        return " · ".join(pieces)
+
     cards: list[str] = []
     for name, (_, w, h) in FIGURES.items():
         kind: list[str] = []
@@ -531,11 +543,17 @@ def build_production_figures_gestalt() -> None:
         if not kind:
             kind.append("registered, not yet attached")
         kind_html = " · ".join(html.escape(k) for k in kind)
+        score_html = ""
+        if name in attached_to_slug:
+            summary = score_summary(attached_to_slug[name])
+            if summary:
+                score_html = f'<p class="score-line">v2 scores: {html.escape(summary)}</p>'
         cards.append(
             f"<figure>"
             f'<h3>{html.escape(name)}</h3>'
             f"{_render_svg(name)}"
             f'<figcaption>{kind_html} · viewBox {w}×{h}</figcaption>'
+            f"{score_html}"
             f"</figure>"
         )
     body = f"""
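The `score_summary` helper skips slugs that have no v2 entry, so a figure attached only to unscored slugs gets no score line at all. A standalone copy of the same logic, run against a toy `SCORES` subset (`unscored-slug` is hypothetical):

```python
import html

# Toy subset of src/marginalia.SCORES: slug -> (score, rationale).
SCORES: dict[str, tuple[float, str]] = {
    "mutability": (9.5, "three-state small multiple of aliased mutation"),
    "copying-collections": (9.5, "same picture as mutability, perfect match"),
}


def score_summary(slugs: list[str]) -> str:
    # Same logic as the helper nested in build_production_figures_gestalt().
    scores = [SCORES.get(s) for s in slugs]
    present = [(s, sc) for s, sc in zip(slugs, scores) if sc is not None]
    if not present:
        return ""
    pieces = [f"{s} {score:.1f}" for s, (score, _note) in present]
    return " · ".join(pieces)


summary = score_summary(["mutability", "copying-collections", "unscored-slug"])
print(summary)  # → mutability 9.5 · copying-collections 9.5
print(f'<p class="score-line">v2 scores: {html.escape(summary)}</p>')
print(repr(score_summary(["unscored-slug"])))  # → ''
```

An empty summary leaves `score_html` as `""`, so the card renders exactly as it did before this change.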
diff --git a/src/asset_manifest.py b/src/asset_manifest.py
@@ -1,3 +1,3 @@
 # Generated by scripts/fingerprint_assets.py. Do not edit by hand.
 ASSET_PATHS = {'SITE_CSS': '/site.150df025a28b.css', 'SYNTAX_JS': '/syntax-highlight.3b6c7f730d46.js', 'EDITOR_JS': '/editor.dd81f5171b14.js'}
-HTML_CACHE_VERSION = '4802a471509c'
+HTML_CACHE_VERSION = '4ab6c3b5d3eb'
diff --git a/src/marginalia.py b/src/marginalia.py
@@ -1881,3 +1881,130 @@ def render_for_anchor(slug: str, anchor: str) -> str:
         figures.append(f"<figure>{_render_svg(name)}{cap}</figure>")
     count_class = f" cell-banner--{len(matched)}"
     return f'<div class="cell-banner{count_class}">{"".join(figures)}</div>'
+
+
+# ─── Scores (v2 rubric — see docs/example-figure-rubric.md) ────────────
+# Score every attached example figure against the v2 rubric. The dict is
+# the single source of truth for both the gestalt review pages
+# (scripts/build_marginalia.py, scripts/build_prototypes.py) and any
+# future per-example scoring surface.
+
+SCORES: dict[str, tuple[float, str]] = {
+    # 9.5 — canonical, definitive depictions of their cell's move
+    "variables": (9.5, "the canonical name → object picture"),
+    "mutability": (9.5, "three-state small multiple of aliased mutation"),
+    "copying-collections": (9.5, "same picture as mutability, perfect match"),
+    # 9.0 — strong mechanism that earns its place; full credit on all craft criteria
+    "hello-world": (9.0, "program → output, smallest mechanism"),
+    "numbers": (9.0, "int unbounded vs float thinning, both registers"),
+    "operators": (9.0, "expression tree mechanism"),
+    "none": (9.0, "three names converging on one None"),
+    "equality-and-identity": (9.0, "shared vs separate object, side-by-side"),
+    "strings": (9.0, "codepoints + bytes registers"),
+    "for-loops": (9.0, "4-row caret advance"),
+    "sorting": (9.0, "stability ribbons preserved across keys"),
+    "keyword-only-arguments": (9.0, "signature with explicit `*` separator"),
+    "positional-only-parameters": (9.0, "signature with explicit `/` separator"),
+    "closures": (9.0, "captured cell reference"),
+    "scope-global-nonlocal": (9.0, "LEGB nested rings"),
+    "recursion": (9.0, "stacked frames with same name, different argument"),
+    "lists": (9.0, "cells with append mechanism"),
+    "dicts": (9.0, "hash buckets with collision chain"),
+    "slices": (9.0, "ruler with bracket overlay"),
+    "comprehensions": (9.0, "comprehension over equivalent for-loop"),
+    "type-hints": (9.0, "ghost annotations over runtime values"),
+    "generators": (9.0, "ribbon cut by yield gates"),
+    "exceptions": (9.0, "try/except/else/finally lanes with traced path"),
+    "context-managers": (9.0, "enter / body / exit bowtie"),
+    "async-await": (9.0, "loop/coro swimlane with await handoffs"),
+    "classes": (9.0, "instance/class/type triangle"),
+    "inheritance-and-super": (9.0, "MRO chain with diamond ghost"),
+    "dataclasses": (9.0, "fields → generated __init__ signature"),
+    "decorators": (9.0, "before/after rebinding through cell"),
+    "special-methods": (9.0, "syntax → method dispatch"),
+    "unpacking": (9.0, "binding-line mechanism with *rest"),
+    "exception-chaining": (9.0, "__cause__ vs __context__ distinguished"),
+    "iterating-over-iterables": (9.0, "iter() exposes the iterator"),
+    "iterators": (9.0, "three-state machine"),
+    "iterator-vs-iterable": (9.0, "the protocol exposed"),
+    "container-protocols": (9.0, "iter/next backbone"),
+    "operator-overloading": (9.0, "dispatch arrow"),
+    "union-and-optional-types": (9.0, "type fork to several shapes"),
+    "abstract-base-classes": (9.0, "same triangle as concrete classes"),
+    "conditionals": (9.0, "predicate forks value to branch"),
+    "match-statements": (9.0, "dispatch ladder; first match wins"),
+    "advanced-match-patterns": (9.0, "four pattern variants"),
+    "loop-else": (9.0, "fell-through vs broke, two outcomes"),
+    "while-loops": (9.0, "back-edge mechanism"),
+    "type-aliases": (9.0, "complex annotation collapses to a name"),
+    "typed-dicts": (9.0, "keys with declared value types"),
+    "comprehension-patterns": (9.0, "nested clauses compose"),
+    "lambdas": (9.0, "function literal: params / expression"),
+    "string-formatting": (9.0, "format-spec railroad"),
+    "regular-expressions": (9.0, "pattern ruler with anchors"),
+    "json": (9.0, "two-column type mapping"),
+    "metaclasses": (9.0, "extended triangle to metaclass"),
+    "datetime": (9.0, "one instant, two clock offsets"),
+    "values": (9.0, "every literal is a typed object"),
+    "literals": (9.0, "literal spellings per type"),
+    "booleans": (9.0, "2×2 truth table"),
+    "sets": (9.0, "hash buckets without values"),
+    "yield-from": (9.0, "stitched ribbons; delegation"),
+    "generator-expressions": (9.0, "lazy filter→map pipeline"),
+    "async-iteration-and-context": (9.0, "loop/coro lanes with await yields"),
+    "assignment-expressions": (9.0, "walrus binds while comparing"),
+    "break-and-continue": (9.0, "early exit at first match"),
+    "delete-statements": (9.0, "name erased; object survives if referenced"),
+    "exception-groups": (9.0, "except* peels matching leaves"),
+    "custom-exceptions": (9.0, "subclass chain to a domain name"),
+    "modules": (9.0, "sys.path resolution; first hit wins"),
+    "protocols": (9.0, "structural duck check"),
+    "enums": (9.0, "closed set of symbolic values"),
+    "functions": (9.0, "specific call: greet('Ada') → 'Hello, Ada'"),
+    "constants": (9.0, "name binding; UPPER_CASE is convention"),
+    "import-aliases": (9.0, "two names bind to the same module"),
+    "number-parsing": (9.0, "int() success path vs ValueError"),
+    "tuples": (9.0, "frozen sequence with struck-through .append"),
+    "truthiness": (9.0, "bool(x) with the falsy set as a strip"),
+    "itertools": (9.0, "chain joins two iterables into one stream"),
+    "assertions": (9.0, "True passes, False raises"),
+    "descriptors": (9.0, "get/set/delete protocol routed through descriptor"),
+    "attribute-access": (9.0, "instance __dict__ → class __dict__ → __getattr__"),
+    "bound-and-unbound-methods": (9.0, "instance.method bound vs Class.method unbound"),
+    "classmethods-and-staticmethods": (9.0, "three method kinds, three first-arg conventions"),
+    "callable-objects": (9.0, "__call__ makes any object callable"),
+    "generics-and-typevar": (9.0, "the same T flows in and out"),
+    "truth-and-size": (9.0, "__bool__ → __len__ → True fallback chain"),
+    "bytes-and-bytearray": (9.0, "frozen vs mutable contrast"),
+    "sentinel-iteration": (9.0, "iter(callable, sentinel) stop condition"),
+    "partial-functions": (9.0, "f → partial(f, 1) → g"),
+    "guard-clauses": (9.0, "early returns, main body at the tail"),
+    "packages": (9.0, "__init__.py + nested submodules"),
+    "virtual-environments": (9.0, "project / venv boundary"),
+    "subprocesses": (9.0, "spawn → child → captured output"),
+    "logging": (9.0, "five thresholded levels"),
+    "testing": (9.0, "arrange-act-assert three-row pattern"),
+    "networking": (9.0, "HTTP / TCP / IP / link stack"),
+    "casts-and-any": (9.0, "Any → cast(T, x) → T, runtime unchanged"),
+    "newtype": (9.0, "same runtime, distinct static identity"),
+    "paramspec": (9.0, "P preserved through decorator"),
+    "literal-and-final": (9.0, "slot narrows to a fixed set"),
+    "runtime-type-checks": (9.0, "isinstance returns bool"),
+    "collections-module": (9.0, "deque / Counter / defaultdict / namedtuple"),
+    "structured-data-shapes": (9.0, "TypedDict named keys with value types"),
+    "csv-data": (9.0, "rows × columns; same shape per line"),
+    "warnings": (9.0, "soft signal; execution continues"),
+    "object-lifecycle": (9.0, "__init__ → live → __del__"),
+    "args-and-kwargs": (9.0, "*args tuple, **kwargs dict regions"),
+    "multiple-return-values": (9.0, "function returns tuple; caller unpacks"),
+    "properties": (9.0, "obj.x routes through fget instead of __dict__"),
+    # 8.5 — abstract by nature; the figure is mostly the diagram itself
+    "overloads": (8.5, "multiple signatures → one impl; abstract"),
+    "callable-types": (8.5, "Callable[[A, B], R] shape; static-only"),
+    "threads-and-processes": (8.5, "GIL lanes; abstract concurrency model"),
+}
+
+
+def figure_score(slug: str) -> tuple[float, str] | None:
+    """Return the v2 score and rationale for an attached example slug, if any."""
+    return SCORES.get(slug)
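Since `SCORES` is now the single source of truth, a unit test along these lines could catch an attachment that ships without a v2 score and reproduce the band counts quoted in the commit message. This is a sketch: the toy `ATTACHMENTS` and `SCORES` dicts stand in for the real ones (whose `ATTACHMENTS` entries may have a different shape), and `unscored_slugs` / `band_counts` are hypothetical helpers.

```python
from collections import Counter
from collections.abc import Mapping, Sequence

# Toy stand-ins for src/marginalia.ATTACHMENTS and src/marginalia.SCORES.
ATTACHMENTS = {"variables": ["fig-a"], "overloads": ["fig-b"], "sets": ["fig-c"]}
SCORES = {
    "variables": (9.5, "the canonical name → object picture"),
    "overloads": (8.5, "multiple signatures → one impl; abstract"),
}


def unscored_slugs(attachments: Mapping[str, Sequence[str]],
                   scores: Mapping[str, tuple[float, str]]) -> list[str]:
    """Attached slugs with no v2 score; this should be empty before shipping."""
    return sorted(set(attachments) - set(scores))


def band_counts(scores: Mapping[str, tuple[float, str]]) -> dict[float, int]:
    """Number of examples per score, highest band first."""
    counts = Counter(score for score, _note in scores.values())
    return dict(sorted(counts.items(), reverse=True))


print(unscored_slugs(ATTACHMENTS, SCORES))  # → ['sets']
print(band_counts(SCORES))                  # → {9.5: 1, 8.5: 1}
```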